Similar articles
 20 similar articles found (search time: 15 ms)
1.
Lü L, Zhang ZK, Zhou T. PLoS ONE, 2010, 5(12): e14139

Background

Zipf's law and Heaps' law are observed in disparate complex systems. Of particular interest, these two laws often appear together. Many theoretical models and analyses have been developed to understand their co-occurrence in real systems, but a clear picture of their relation is still lacking.

Methodology/Principal Findings

We show that Heaps' law can be considered a derivative phenomenon if the system obeys Zipf's law. Furthermore, we refine the known approximate solution for the Heaps' exponent given the Zipf's exponent. We show that the approximate solution is indeed an asymptotic solution for infinite systems, while in a finite-size system the Heaps' exponent is sensitive to the system size. Extensive empirical analysis of tens of disparate systems demonstrates that our refined results better capture the relation between the Zipf's and Heaps' exponents.

Conclusions/Significance

The present analysis provides a clear picture of the relation between Zipf's law and Heaps' law without the help of any specific stochastic model, namely that Heaps' law is indeed a derivative phenomenon of Zipf's law. The presented numerical method gives a considerably better estimate of the Heaps' exponent given the Zipf's exponent and the system size. Our analysis provides some insights into, and implications for, real complex systems. For example, one can naturally obtain a better explanation of the accelerated growth of scale-free networks.
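
The dependence of vocabulary growth on the Zipf's exponent can be explored numerically. The following is a minimal sketch, not the authors' numerical method: it draws tokens from a finite Zipf distribution and takes the log-log slope of the vocabulary growth curve as a rough Heaps' exponent. The function names, the vocabulary size, the seed and the plain least-squares fit are all illustrative assumptions. The commonly quoted asymptotic approximation is a Heaps' exponent of about 1/α for a Zipf's exponent α > 1 and about 1 for α < 1; finite samples are expected to deviate from it, as the abstract notes.

```python
import math
import random
from itertools import accumulate


def zipf_tokens(num_tokens, vocab_size, alpha, seed=0):
    """Draw ranks 1..vocab_size with probability proportional to rank**-alpha."""
    rng = random.Random(seed)
    cum_weights = list(accumulate(r ** -alpha for r in range(1, vocab_size + 1)))
    return rng.choices(range(1, vocab_size + 1), cum_weights=cum_weights, k=num_tokens)


def vocabulary_growth(tokens):
    """Number of distinct tokens seen after each position in the stream."""
    seen = set()
    growth = []
    for token in tokens:
        seen.add(token)
        growth.append(len(seen))
    return growth


def loglog_slope(ys):
    """Least-squares slope of log(y) against log(t) for t = 1..len(ys)."""
    xs = [math.log(t) for t in range(1, len(ys) + 1)]
    ls = [math.log(y) for y in ys]
    mean_x = sum(xs) / len(xs)
    mean_l = sum(ls) / len(ls)
    sxy = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, ls))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Illustrative parameters only (assumptions, not values from the paper).
tokens = zipf_tokens(num_tokens=20000, vocab_size=5000, alpha=1.5)
growth = vocabulary_growth(tokens)
heaps_exponent = loglog_slope(growth)
```

Repeating the run with different `num_tokens` values is one way to see how the fitted exponent shifts with system size.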

2.

Background

Zipf's discovery that word frequency distributions obey a power law established parallels between biological and physical processes, and language, laying the groundwork for a complex systems perspective on human communication. More recent research has also identified scaling regularities in the dynamics underlying the successive occurrences of events, suggesting the possibility of similar findings for language as well.

Methodology/Principal Findings

By considering frequent words in USENET discussion groups and in disparate databases where the language has different levels of formality, here we show that the distributions of distances between successive occurrences of the same word display bursty deviations from a Poisson process and are well characterized by a stretched exponential (Weibull) scaling. The extent of this deviation depends strongly on semantic type – a measure of the logicality of each word – and less strongly on frequency. We develop a generative model of this behavior that fully determines the dynamics of word usage.

Conclusions/Significance

Recurrence patterns of words are well described by a stretched exponential distribution of recurrence times, an empirical scaling that cannot be anticipated from Zipf's law. Because the use of words provides a uniquely precise and powerful lens on human thought and activity, our findings also have implications for other overt manifestations of collective human dynamics.
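
As a rough illustration of the quantities involved, the sketch below measures the distances between successive occurrences of a word and evaluates a stretched exponential (Weibull) survival function. The function names and the toy token list are assumptions made for this example; the abstract does not give the authors' fitting procedure. With `beta = 1` the curve reduces to the plain exponential expected for a Poisson-like process, and `beta < 1` gives the heavier tail associated with bursty usage.

```python
import math


def recurrence_distances(tokens, word):
    """Distances (in tokens) between successive occurrences of `word`."""
    positions = [i for i, token in enumerate(tokens) if token == word]
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


def stretched_exponential_survival(x, scale, beta):
    """Weibull survival function: P(distance > x) = exp(-(x / scale) ** beta)."""
    return math.exp(-((x / scale) ** beta))


# Toy example (hypothetical data).
distances = recurrence_distances(["a", "b", "a", "a", "c", "a"], "a")
```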

3.
Wang L, Li X, Zhang YQ, Zhang Y, Zhang K. PLoS ONE, 2011, 6(7): e21197

Background

Zipf's law and Heaps' law are two representative scaling concepts that play a significant role in the study of complexity science. The coexistence of Zipf's law and Heaps' law has motivated different interpretations of the dependence between these two scalings, which has still hardly been clarified.

Methodology/Principal Findings

In this article, we observe an evolution process of the scalings: Zipf's law and Heaps' law are naturally shaped to coexist at early times, while a crossover comes with the emergence of their inconsistency at later times before a stable state is reached, in which Heaps' law still holds although strict Zipf's law has disappeared. Such findings are illustrated with a scenario of large-scale spatial epidemic spreading, and the empirical results of pandemic disease support a universal analysis of the relation between the two laws regardless of the biological details of the disease. Employing United States domestic air transportation and demographic data to construct a metapopulation model for simulating pandemic spread at the U.S. country level, we find that the broad heterogeneity of the infrastructure plays a key role in the evolution of scaling emergence.

Conclusions/Significance

The analyses of large-scale spatial epidemic spreading help in understanding the temporal evolution of scalings, indicating that the coexistence of Zipf's law and Heaps' law depends on the collective dynamics of epidemic processes. The heterogeneity of epidemic spread indicates the significance of performing targeted containment strategies at the early stage of a pandemic disease.

4.

Background

Metagenomics has great potential to discover previously unattainable information about microbial communities. An important prerequisite for such discoveries is to accurately estimate the composition of microbial communities. Most prevalent homology-based approaches rely solely on the results of an alignment tool such as BLAST, limiting their estimation accuracy to high ranks of the taxonomy tree.

Results

We developed a new homology-based approach called Taxonomic Analysis by Elimination and Correction (TAEC), which utilizes the similarity in the genomic sequence in addition to the result of an alignment tool. The proposed method is comprehensively tested on various simulated benchmark datasets of diverse complexity of microbial structure. Compared with other available methods designed for estimating taxonomic composition at a relatively low taxonomic rank, TAEC demonstrates greater accuracy in quantification of genomes in a given microbial sample. We also applied TAEC to two real metagenomic datasets, an oral cavity dataset and a Crohn’s disease dataset. Our results, while agreeing with previous findings at higher ranks of the taxonomy tree, provide accurate estimation of taxonomic compositions at the species/strain level, narrowing down which species/strains need more attention in the study of the oral cavity and Crohn’s disease.

Conclusions

By taking account of the similarity in the genomic sequence TAEC outperforms other available tools in estimating taxonomic composition at a very low rank, especially when closely related species/strains exist in a metagenomic sample.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-242) contains supplementary material, which is available to authorized users.

5.
6.
It is well known that word frequencies arrange themselves according to Zipf's law. However, little is known about the dependency between the parameters of the law and the complexity of a communication system. Many models of the evolution of language assume that the exponent of the law remains constant as the complexity of a communication system increases. Using longitudinal studies of child language, we analysed the word rank distribution for the speech of children and adults participating in conversations. The adults typically included family members (e.g., parents) or the investigators conducting the research. Our analysis of the evolution of Zipf's law yields two main unexpected results. First, in children the exponent of the law tends to decrease over time while this tendency is weaker in adults, thus suggesting this is not a mere mirror effect of adult speech. Second, although the exponent of the law is more stable in adults, their exponents fall below 1, which is the typical value of the exponent assumed in both children and adults. Our analysis also shows a tendency of the mean length of utterances (MLU), a simple estimate of syntactic complexity, to increase as the exponent decreases. The parallel evolution of the exponent and a simple indicator of syntactic complexity (MLU) supports the hypothesis that the exponent of Zipf's law and linguistic complexity are inter-related. The assumption that Zipf's law for word ranks is a power-law with a constant exponent of one in both adults and children needs to be revised.
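
For orientation, the exponent discussed above can be estimated from a rank-frequency table. The sketch below is an assumption-laden illustration, not the procedure used in the study: it fits a straight line to log frequency against log rank by ordinary least squares, which is simple but can be biased on real data. The function name and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def zipf_exponent(tokens):
    """Estimate the Zipf exponent as minus the log-log slope of frequency vs. rank."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


# Toy corpus whose frequencies are exactly 60 / rank (hypothetical data).
toy_tokens = []
for rank in range(1, 6):
    toy_tokens.extend([f"w{rank}"] * (60 // rank))
toy_exponent = zipf_exponent(toy_tokens)
```

Applying the same function to successive recording sessions would give an exponent per session, which is the kind of time series the abstract describes.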

7.

Background

The dairy cattle breeding industry is a highly globalized business, which needs internationally comparable and reliable breeding values of sires. The international Bull Evaluation Service, Interbull, was established in 1983 to respond to this need. Currently, Interbull performs multiple-trait across country evaluations (MACE) for several traits and breeds in dairy cattle and provides international breeding values to its member countries. Estimating parameters for MACE is challenging since the structure of datasets and conventional use of multiple-trait models easily result in over-parameterized genetic covariance matrices. The number of parameters to be estimated can be reduced by taking into account only the leading principal components of the traits considered. For MACE, this is readily implemented in a random regression model.

Methods

This article compares two principal component approaches to estimate variance components for MACE using real datasets. The methods tested were a REML approach that directly estimates the genetic principal components (direct PC) and the so-called bottom-up REML approach (bottom-up PC), in which traits are sequentially added to the analysis and the statistically significant genetic principal components are retained. Furthermore, this article evaluates the utility of the bottom-up PC approach to determine the appropriate rank of the (co)variance matrix.

Results

Our study demonstrates the usefulness of both approaches and shows that they can be applied to large multi-country models considering all concerned countries simultaneously. These strategies can thus replace the current practice of estimating the covariance components required through a series of analyses involving selected subsets of traits. Our results support the importance of using the appropriate rank in the genetic (co)variance matrix. Using too low a rank resulted in biased parameter estimates, whereas too high a rank did not result in bias, but increased standard errors of the estimates and notably the computing time.

Conclusions

In terms of estimation accuracy, both principal component approaches performed equally well and permitted the use of more parsimonious models through random regression MACE. The advantage of the bottom-up PC approach is that it does not need any previous knowledge of the rank. However, with a predetermined rank, the direct PC approach needs less computing time than the bottom-up PC approach.

8.

Background

Word frequency is the most important variable in language research. However, despite the growing interest in the Chinese language, there are only a few sources of word frequency measures available to researchers, and the quality is less than what researchers in other languages are used to.

Methodology

Following recent work by New, Brysbaert, and colleagues in English, French and Dutch, we assembled a database of word and character frequencies based on a corpus of film and television subtitles (46.8 million characters, 33.5 million words). In line with what has been found in the other languages, the new word and character frequencies explain significantly more of the variance in Chinese word naming and lexical decision performance than measures based on written texts.

Conclusions

Our results confirm that word frequencies based on subtitles are a good estimate of daily language exposure and capture much of the variance in word processing efficiency. In addition, our database is the first to include information about the contextual diversity of the words and to provide good frequency estimates for multi-character words and the different syntactic roles in which the words are used. The word frequencies are freely available for research purposes.

9.

Background

When we talk to one another face-to-face, body gestures accompany our speech. Motion tracking technology enables us to include body gestures in avatar-mediated communication, by mapping one's movements onto one's own 3D avatar in real time, so the avatar is self-animated. We conducted two experiments to investigate (a) whether head-mounted display virtual reality is useful for researching the influence of body gestures in communication; and (b) whether body gestures are used to help in communicating the meaning of a word. Participants worked in pairs and played a communication game, where one person had to describe the meanings of words to the other.

Principal Findings

In experiment 1, participants used significantly more hand gestures and successfully described significantly more words when nonverbal communication was available to both participants (i.e. both describing and guessing avatars were self-animated, compared with both avatars in a static neutral pose). Participants ‘passed’ (gave up describing) significantly more words when they were talking to a static avatar (no nonverbal feedback available). In experiment 2, participants' performance was significantly worse when they were talking to an avatar with a prerecorded listening animation, compared with an avatar animated by their partners' real movements. In both experiments participants used significantly more hand gestures when they played the game in the real world.

Conclusions

Taken together, the studies show how (a) virtual reality can be used to systematically study the influence of body gestures; (b) it is important that nonverbal communication is bidirectional (real nonverbal feedback in addition to nonverbal communication from the describing participant); and (c) there are differences in the amount of body gestures that participants use with and without the head-mounted display, and we discuss possible explanations for this and ideas for future investigation.

10.
Claude Cyr, Luc Lanthier. CMAJ, 2007, 177(12): 1536-1538

Background

Canada's Neo Rhino Party, a joke political party created in 2006 as a successor to the Parti Rhinocéros, is planning a new regulation to repeal the law of gravity, which could have an important impact on diseases attributable to gravity on earth.

Methods

We sought to estimate the number of quality-adjusted life-years that would be saved if the proposed regulation is passed and determine the cost-effectiveness of adapting Boris Volfson's antigravity machine [1] for use on earth. We performed an economic analysis using a hidden Markov model.

Results

Our results suggest that a microgravity environment would save over 2 million quality-adjusted life-years. The cost for every quality-adjusted life-year saved is estimated to be $328.

Interpretation

Microgravity is the solution to the health care crisis in Canada. In addition, using technological, statistical and medical jargon gives us the opportunity to defy the laws of physics, mathematics and medicine.

Canada's Neo Rhino Party is a joke federal political party that was created in Montréal, Quebec, in 2006 as the successor to the Parti Rhinocéros. Commonly called the Rhinoceros Party in English, this party was a registered political party in Canada from the 1960s to the 1990s. It was founded in 1963 by Jacques Ferron, a Canadian physician and author. The party's basic credo was “a promise to keep none of our promises.” Its election platforms comprised impossible schemes that were designed to amuse and entertain the voting public. Some Rhinoceros Party promises included reducing the speed of light because it's much too fast, paving Manitoba to create the world's largest parking lot, providing higher education by building taller schools and repealing the law of gravity. When this last promise was made in the 1980s, it was unthinkable, that is, until Boris Volfson of Huntington, Indiana, received US Patent 6 960 975 for his design of an antigravity machine [1]. The Neo Rhino Party is currently planning a new regulation to repeal the law of gravity that could have an important impact on diseases and other health outcomes attributable to gravity on earth.

11.

[Purpose]

Arterial stiffness is an independent predictor of cardiovascular risk and may contribute to reduced running capacity in humans. This study investigated the relationship between course record and arterial stiffness in marathoners who participated in the Seoul International Marathon in 2012.

[Methods]

A total of 30 amateur marathoners (Males n = 28, Females n = 2, mean age = 51.6 ± 8.3 years) were assessed before and after the marathon race. Brachial-ankle pulse wave velocity (ba-PWV) was assessed by VP-1000 plus (Omron Healthcare Co., Ltd., Kyoto, Japan) before and immediately after the marathon race. Pearson's correlation coefficient was used to determine the relationship between race record and ba-PWV. In addition, the Wilcoxon signed rank test was used to determine the difference in ba-PWV between before and after the race.

[Results]

There was no significant change in the ba-PWV of marathoners before and after the race (1271.1 ± 185 vs. 1268.8 ± 200 cm/s, P = 0.579). Both the full course record (Pearson's correlation coefficient = 0.416, P = 0.022) and the record at the halfway line (Pearson's correlation coefficient = 0.482, P = 0.007) were positively related with the difference in ba-PWV, suggesting that reduced arterial stiffness is associated with a better running record in the marathon.

[Conclusion]

These results may suggest that good vascular function contributes to a better running record in the marathon race.

12.

Background

The aim was to evaluate the readability of research information leaflets (RIL) for minors asked to participate in biomedical research studies and to assess the factors influencing this readability.

Methods and Findings

All the pediatric protocols from three French pediatric clinical research units were included (N = 104). Three criteria were used to evaluate readability: length of the text, Flesch's readability score and presence of illustrations. We compared the readability of RIL to texts specifically written for children (school textbooks, school exams or extracts from literary works). We assessed the effect of protocol characteristics on readability. The RIL had a median length of 608 words [350 words, 25th percentile; 1005 words, 75th percentile], corresponding to two pages. The readability of the RIL, with a median Flesch score of 40 [30; 47], was much poorer than that of pediatric reference texts, with a Flesch score of 67 [60; 73]. A small proportion of RIL (13/91; 14%) were illustrated. The RIL were longer (p<0.001), more readable (p<0.001) and more likely to be illustrated (p<0.009) for industrial than for institutional sponsors.

Conclusion

Researchers should routinely compute the reading ease of study information sheets and make greater efforts to improve the readability of written documents for potential participants.
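
Computing a reading-ease score is straightforward once the counts are available. The sketch below uses the standard English-language Flesch Reading Ease formula and takes word, sentence and syllable counts as inputs. It is only an illustration: the study concerned French leaflets and may have used a variant adapted to French, which the abstract does not specify, and the function name and example counts are assumptions.

```python
def flesch_reading_ease(num_words, num_sentences, num_syllables):
    """Standard (English) Flesch Reading Ease; higher scores mean easier text."""
    words_per_sentence = num_words / num_sentences
    syllables_per_word = num_syllables / num_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical counts for a short passage: 100 words, 5 sentences, 150 syllables.
score = flesch_reading_ease(100, 5, 150)
```

Counting syllables reliably is the hard part in practice and is language dependent, so it is left outside this sketch.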

13.

Background

The image formed by the eye's optics is inherently blurred by aberrations specific to an individual's eyes. We examined how visual coding is adapted to the optical quality of the eye.

Methods and Findings

We assessed the relationship between perceived blur and the retinal image blur resulting from high order aberrations in an individual's optics. Observers judged perceptual blur in a psychophysical two-alternative forced choice paradigm, on stimuli viewed through perfectly corrected optics (using a deformable mirror to compensate for the individual's aberrations). Realistic blur of different amounts and forms was computer simulated using real aberrations from a population. The blur levels perceived as best focused were close to the levels predicted by an individual's high order aberrations over a wide range of blur magnitudes, and were systematically biased when observers were instead adapted to the blur reproduced from a different observer's eye.

Conclusions

Our results provide strong evidence that spatial vision is calibrated for the specific blur levels present in each individual's retinal image and that this adaptation at least partly reflects how spatial sensitivity is normalized in the neural coding of blur.

14.

Introduction

Complete reporting assists readers in confirming the methodological rigor and validity of findings and allows replication. The reporting quality of observational functional magnetic resonance imaging (fMRI) studies involving clinical participants is unclear.

Objectives

We sought to determine the quality of reporting in observational fMRI studies involving clinical participants.

Methods

We searched OVID MEDLINE for fMRI studies in six leading journals between January 2010 and December 2011. Three independent reviewers abstracted data from articles using an 83-item checklist adapted from the guidelines proposed by Poldrack et al. (Neuroimage 2008; 40: 409–14). We calculated the percentage of articles reporting each item of the checklist and the percentage of reported items per article.

Results

A random sample of 100 eligible articles was included in the study. Thirty-one items were reported by fewer than 50% of the articles and 13 items were reported by fewer than 20% of the articles. The median percentage of reported items per article was 51% (ranging from 30% to 78%). Although most articles reported statistical methods for within-subject modeling (92%) and for between-subject group modeling (97%), none of the articles reported observed effect sizes for any negative finding (0%). Few articles reported justifications for fixed-effect inferences used for group modeling (3%) and temporal autocorrelations used to account for within-subject variances and correlations (18%). Other under-reported areas included whether and how the task design was optimized for efficiency (22%) and distributions of inter-trial intervals (23%).

Conclusions

This study indicates that substantial improvement in the reporting of observational clinical fMRI studies is required. Poldrack et al.'s guidelines provide a means of improving overall reporting quality. Nonetheless, these guidelines are lengthy and may be at odds with strict word limits for publication; creating a shortened version of Poldrack's checklist that contains the most relevant items may be useful in this regard.

15.

Background

Within the structural and grammatical bounds of a common language, all authors develop their own distinctive writing styles. Whether the relative occurrence of common words can be measured to produce accurate models of authorship is of particular interest. This work introduces a new score that helps to highlight such variations in word occurrence, and is applied to produce models of authorship of a large group of plays from the Shakespearean era.

Methodology

A text corpus containing 55,055 unique words was generated from 168 plays from the Shakespearean era (16th and 17th centuries) of undisputed authorship. A new score, CM1, is introduced to measure variation patterns based on the frequency of occurrence of each word for the authors John Fletcher, Ben Jonson, Thomas Middleton and William Shakespeare, compared to the rest of the authors in the study (which provides a reference of relative word usage at that time). A total of 50 WEKA methods were applied for Fletcher, Jonson and Middleton, to identify those which were able to produce models yielding over 90% classification accuracy. This ensemble of WEKA methods was then applied to model Shakespearean authorship across all 168 plays, yielding a Matthews' correlation coefficient (MCC) performance of over 90%. Furthermore, the best model yielded an MCC of 99%.

Conclusions

Our results suggest that different authors, while adhering to the structural and grammatical bounds of a common language, develop measurably distinct styles by the tendency to over-utilise or avoid particular common words and phrasings. Considering language and the potential of words as an abstract chaotic system with a high entropy, similarities can be drawn to the Maxwell's Demon thought experiment; authors subconsciously favour or filter certain words, modifying the probability profile in ways that could reflect their individuality and style.
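
The MCC reported above is computed from the four cells of a binary confusion matrix. The sketch below shows the standard definition; the function name, the convention of returning 0 when the denominator vanishes, and the example counts are assumptions for illustration and do not come from the study.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention assumed here: report 0 when a row or column is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts: 6 true positives, 3 true negatives, 1 false positive, 2 false negatives.
example_mcc = matthews_corrcoef(tp=6, tn=3, fp=1, fn=2)
```

The value ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect classification), so an MCC of 99% corresponds to 0.99 on this scale.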

16.

Context

Prior research has faulted the US News and World Report hospital specialty rankings for excessive reliance on reputation, a subjective measure of a hospital's performance.

Objective

To determine whether and to what extent reputation correlates with objective measures of research productivity among cancer hospitals.

Design

A retrospective observational study.

Setting

Automated search of NIH Reporter, BioEntrez, BioMedline and Clinicaltrials.gov databases.

Participants

The 50 highest ranked cancer hospitals in 2013's US News and World Report Rankings.

Exposure

We ascertained the number of NCI funded grants, and the cumulative funds received by each cancer center. Additionally, we identified the number of phase I, phase II, and phase III studies published and indexed in MEDLINE, and registered at clinicaltrials.gov. All counts were over the preceding 5 years. For published articles, we summed the impact factor of the journals in which they appeared. Trials were attributed to centers on the basis of the affiliation of the lead author or study principal investigator.

Main Outcome

Correlation coefficients from simple and multiple linear regressions for measures of research productivity and a center's reputation.

Results

All measures of research productivity demonstrated robust correlation with reputation (mean r-squared = 0.65, median r-squared = 0.68, minimum r-squared = 0.41, maximum r-squared = 0.80). A multivariable model showed that 93% of the variation in reputation is explained by objective measures.

Conclusion

Contrary to prior criticism, the majority of the variation in reputation, as used in the US News and World Report rankings, can be explained by objective measures of research productivity among cancer hospitals.

17.

Background

Categories of imperilment like the global IUCN Red List have been transformed to probabilities of extinction and used to rank species by the amount of imperiled evolutionary history they represent (e.g. by the Edge of Existence programme). We investigate the stability of such lists when ranks are converted to probabilities of extinction under different scenarios.

Methodology and Principal Findings

Using a simple example and computer simulation, we show that preserving the categories when converting such list designations to probabilities of extinction does not guarantee the stability of the resulting lists.

Significance

Care must be taken when choosing a suitable transformation, especially if conservation dollars are allocated to species in a ranked fashion. We advocate routine sensitivity analyses.
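
A toy sensitivity check makes the instability concrete. In the sketch below, two hypothetical species are scored as evolutionary distinctiveness multiplied by probability of extinction, under two category-to-probability mappings that both preserve the ordering of the threat categories. All species, numbers, mappings and function names are invented for illustration and are not taken from the paper or from any real conservation programme.

```python
def rank_by_expected_loss(species, extinction_probability):
    """Rank species by distinctiveness * P(extinction), highest score first."""
    scores = {
        name: distinctiveness * extinction_probability[category]
        for name, (distinctiveness, category) in species.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical species: (evolutionary distinctiveness, threat category).
species = {"A": (10.0, "VU"), "B": (4.0, "CR")}

# Two hypothetical mappings; both keep CR more imperiled than VU.
mapping_1 = {"VU": 0.1, "CR": 0.5}
mapping_2 = {"VU": 0.4, "CR": 0.5}

ranking_1 = rank_by_expected_loss(species, mapping_1)
ranking_2 = rank_by_expected_loss(species, mapping_2)
```

Here the two rankings disagree even though neither mapping reorders the categories, which is the kind of outcome a routine sensitivity analysis is meant to surface.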

18.
Z Chen, Y Zhang, Z Liu, Y Liu, A Dyregrov. PLoS ONE, 2012, 7(8): e41741

Aim

PTSD symptoms are pervasive among children and adolescents after experiencing or being exposed to traumatic events. Screening and diagnosis of PTSD symptoms is crucial in trauma-related research and practice. The 13-item Children's Revised Impact of Event Scale (CRIES) has been demonstrated to be a valid and reliable tool to achieve this goal. This study was designed to examine the psychometric properties of the 13-item CRIES in a sample of Chinese debris flood victims.

Methods

A total of 268 participants (145 girls, 123 boys) aged 8–18 years were recruited as part of a service-oriented project supported by the Institute of Psychology, Chinese Academy of Sciences, following the debris flood. The participants were given the 13-item CRIES 3 months after the debris flood.

Results

The results of confirmatory factor analysis indicated that a two-factor structure (intrusion+arousal vs avoidance) provided the best model fit in the total sample and in the boys and girls subsamples. The scale was also demonstrated to have good internal consistency (Cronbach's alpha = 0.83).

Conclusion

The study confirmed the good psychometric properties of the CRIES and its applicability to Chinese children and adolescents. Moreover, these findings imply that the CRIES factor structure is stable across age, gender, and different types of trauma.
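
For readers unfamiliar with the internal-consistency figure quoted above, the sketch below computes Cronbach's alpha from per-item score lists using the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The function name and the toy scores are assumptions; no data from the study are reproduced.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """item_scores: one list per item, each holding one score per respondent."""
    num_items = len(item_scores)
    totals = [sum(responses) for responses in zip(*item_scores)]
    summed_item_variance = sum(variance(item) for item in item_scores)
    return num_items / (num_items - 1) * (1 - summed_item_variance / variance(totals))


# Toy data (hypothetical): two items answered by three respondents.
toy_alpha = cronbach_alpha([[1, 2, 3], [1, 3, 2]])
```

The function assumes at least two items and total scores that are not all identical; otherwise the formula is undefined.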

19.

Background

We study the evolutionary Prisoner's Dilemma on two social network substrates obtained from actual relational data.

Methodology/Principal Findings

We find very different cooperation levels on each of them that cannot be easily understood in terms of global statistical properties of both networks. We claim that the result can be understood at the mesoscopic scale, by studying the community structure of the networks. We explain the dependence of the cooperation level on the temptation parameter in terms of the internal structure of the communities and their interconnections. We then test our results on community-structured, specifically designed artificial networks, finding a good agreement with the observations in both real substrates.

Conclusion

Our results support the conclusion that studies of evolutionary games on model networks and their interpretation in terms of global properties may not be sufficient to study specific, real social systems. Further, the study allows us to define new quantitative parameters that summarize the mesoscopic structure of any network. In addition, the community perspective may be helpful to interpret the origin and behavior of existing networks as well as to design structures that show resilient cooperative behavior.
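
To make the role of the temptation parameter concrete, the sketch below computes accumulated payoffs for one round of a Prisoner's Dilemma played on a network. The payoff parametrization is an assumption (a commonly used "weak" form in which mutual cooperation pays 1, defecting against a cooperator pays b, and all other outcomes pay 0); the abstract does not state the payoffs or update rule actually used, and the function names and the three-node graph are hypothetical.

```python
def network_payoffs(neighbors, strategy, b):
    """Accumulated payoff of each node against all of its neighbors.

    Assumed payoffs: C vs C -> 1, D vs C -> b, anything vs D -> 0.
    """
    payoffs = {}
    for node, adjacent in neighbors.items():
        total = 0.0
        for other in adjacent:
            if strategy[other] == "C":
                total += 1.0 if strategy[node] == "C" else b
        payoffs[node] = total
    return payoffs


def cooperation_level(strategy):
    """Fraction of nodes currently cooperating."""
    return sum(choice == "C" for choice in strategy.values()) / len(strategy)


# Hypothetical triangle network with one defector.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
strategy = {0: "C", 1: "C", 2: "D"}
round_payoffs = network_payoffs(triangle, strategy, b=1.5)
```

An evolutionary simulation would follow each payoff round with a strategy update (for example, imitating a better-scoring neighbor) and track `cooperation_level` as a function of `b`.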

20.