Similar literature
 Found 20 similar articles.
1.
Words are built from smaller meaning-bearing parts, called morphemes. As one word can contain multiple morphemes, one morpheme can be present in different words. The number of distinct words a morpheme can be found in is its family size. Here we used Birth-Death-Innovation Models (BDIMs) to analyze the distribution of morpheme family sizes in English and German vocabulary over the last 200 years. Rather than just fitting to a probability distribution, these mechanistic models allow for the direct interpretation of identified parameters. Despite the complexity of language change, we indeed found that a specific variant of this purely stochastic model, the second order linear balanced BDIM, significantly fitted the observed distributions. In this model, birth and death rates are increased for smaller morpheme families. This finding indicates an influence of morpheme family sizes on vocabulary changes. This could be an effect of word formation, perception or both. On a more general level, we give an example of how mechanistic models can enable the identification of statistical trends in language change usually hidden by cultural influences.

2.
Previous “one tone per word” analyses of Somali wordhood fall short in a number of ways due to the morphological and prosodic complexity of the language. While the presence of a single accentual high tone is generally a good diagnostic for prosodic wordhood in the language, it is a poor predictor of grammatical wordhood. In this paper, we aim to refine the criteria needed to define both. We explore the culminative role played by tonal accent in the formation of prosodic words and the contributions of morphosyntactic and phonological phenomena in defining larger phrases that are sometimes considered single words in the language. We explore positive and negative correlations between prosodic and grammatical wordhood, and in doing so, we find that the differing accentual behavior of Somali words depends largely on the prosodic structure of their constituent morphemes and the position of these morphemes on a wordhood cline. We illustrate that while each maximal prosodic word in the language exhibits one tone, a minimal prosodic word is better defined in terms of its accentual properties. In addition, while prosodic and grammatical wordhood often align with one another, grammatical wordhood cannot be unambiguously defined based on tone or accent location.

3.
We report here trends in the usage of “mood” words, that is, words carrying emotional content, in 20th century English language books, using the data set provided by Google that includes word frequencies in roughly 4% of all books published up to the year 2008. We find evidence for distinct historical periods of positive and negative moods, underlain by a general decrease in the use of emotion-related words through time. Finally, we show that, in books, American English has become decidedly more “emotional” than British English in the last half-century, as a part of a more general increase of the stylistic divergence between the two variants of English language.

4.

Background

Zipf's discovery that word frequency distributions obey a power law established parallels between biological and physical processes, and language, laying the groundwork for a complex systems perspective on human communication. More recent research has also identified scaling regularities in the dynamics underlying the successive occurrences of events, suggesting the possibility of similar findings for language as well.

Methodology/Principal Findings

By considering frequent words in USENET discussion groups and in disparate databases where the language has different levels of formality, here we show that the distributions of distances between successive occurrences of the same word display bursty deviations from a Poisson process and are well characterized by a stretched exponential (Weibull) scaling. The extent of this deviation depends strongly on semantic type – a measure of the logicality of each word – and less strongly on frequency. We develop a generative model of this behavior that fully determines the dynamics of word usage.

Conclusions/Significance

Recurrence patterns of words are well described by a stretched exponential distribution of recurrence times, an empirical scaling that cannot be anticipated from Zipf's law. Because the use of words provides a uniquely precise and powerful lens on human thought and activity, our findings also have implications for other overt manifestations of collective human dynamics.
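
As an illustration of the contrast drawn in this abstract, the following minimal sketch compares the survival function of a stretched exponential (Weibull) recurrence-time distribution with that of a Poisson process. The function names, the parameter names `scale` and `beta`, and the numeric values are illustrative assumptions, not quantities taken from the paper; the two curves are compared at the same scale, not at the same mean gap.

```python
import math


def stretched_exponential_survival(tau, scale, beta):
    """P(gap > tau) under a Weibull (stretched exponential) law.

    `scale` and `beta` are hypothetical parameter names; beta < 1 stretches the tail.
    """
    return math.exp(-((tau / scale) ** beta))


def poisson_survival(tau, mean_gap):
    """P(gap > tau) for a Poisson process, whose gaps are exponentially distributed."""
    return math.exp(-tau / mean_gap)


if __name__ == "__main__":
    scale = 100.0  # illustrative value only
    for tau in (10.0, 100.0, 1000.0):
        bursty = stretched_exponential_survival(tau, scale, beta=0.5)
        memoryless = poisson_survival(tau, scale)
        print(f"tau={tau:7.1f}  stretched={bursty:.4f}  poisson={memoryless:.4f}")
```

With `beta = 1` the two functions coincide. With `beta < 1` and the same scale, the stretched form gives both more very short gaps and more very long gaps than the exponential, which is one way to read the "bursty" deviations the abstract mentions.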

5.
Scientists studying how languages change over time often make an analogy between biological and cultural evolution, with words or grammars behaving like traits subject to natural selection. Recent work has exploited this analogy by using models of biological evolution to explain the properties of languages and other cultural artefacts. However, the mechanisms of biological and cultural evolution are very different: biological traits are passed between generations by genes, while languages and concepts are transmitted through learning. Here we show that these different mechanisms can have the same results, demonstrating that the transmission of frequency distributions over variants of linguistic forms by Bayesian learners is equivalent to the Wright–Fisher model of genetic drift. This simple learning mechanism thus provides a justification for the use of models of genetic drift in studying language evolution. In addition to providing an explicit connection between biological and cultural evolution, this allows us to define a ‘neutral’ model that indicates how languages can change in the absence of selection at the level of linguistic variants. We demonstrate that this neutral model can account for three phenomena: the s-shaped curve of language change, the distribution of word frequencies, and the relationship between word frequencies and extinction rates.
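
The Wright–Fisher model named in this abstract can be sketched for the simplest case of two competing variants of a linguistic form. This is a generic textbook-style simulation of neutral drift, not the authors' Bayesian-learner derivation; the function names, the population size and the seed are assumptions chosen for illustration.

```python
import random


def wright_fisher_step(freq, population_size, rng):
    """One generation: each of N learners independently adopts the variant with probability `freq`."""
    adopters = sum(1 for _ in range(population_size) if rng.random() < freq)
    return adopters / population_size


def simulate_drift(initial_freq, population_size, generations, seed=0):
    """Return the trajectory of the variant's frequency under neutral drift (no selection)."""
    rng = random.Random(seed)
    trajectory = [initial_freq]
    for _ in range(generations):
        trajectory.append(wright_fisher_step(trajectory[-1], population_size, rng))
    return trajectory


if __name__ == "__main__":
    # Illustrative parameters only: 50 speakers, 200 generations.
    path = simulate_drift(initial_freq=0.5, population_size=50, generations=200, seed=1)
    print("final frequency:", path[-1])
```

Because there is no selection term, the frequency only wanders through sampling noise, and 0 and 1 are absorbing states: once a variant is lost or has taken over, it stays that way.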

6.
This paper outlines a neurocognitive approach to human language, focusing on inflectional morphology and grammatical function in English. Taking as a starting point the selective deficits for regular inflectional morphology of a group of non-fluent patients with left hemisphere damage, we argue for a core decompositional network linking left inferior frontal cortex with superior and middle temporal cortex, connected via the arcuate fasciculus. This network handles the processing of regularly inflected words (such as joined or treats), which are argued not to be stored as whole forms and which require morpho-phonological parsing in order to segment complex forms into stems and inflectional affixes. This parsing process operates early and automatically upon all potential inflected forms and is triggered by their surface phonological properties. The predictions of this model were confirmed in a further neuroimaging study, using event-related functional magnetic resonance imaging (fMRI), on unimpaired young adults. The salience of grammatical morphemes for the language system is highlighted by new research showing that similarly early and blind segmentation also operates for derivationally complex forms (such as darkness or rider). These findings are interpreted as evidence for a hidden decompositional substrate to human language processing and related to a functional architecture derived from non-human primate models.

7.
This paper identifies two difficulties with treatments of derivation in Algonquian languages. In traditional approaches to grammar, in which the morpheme is seen as a unitary entity, morphemes are understood as minimal units of meaning and/or function. Definitions share an appeal to the morpheme’s indivisibility. In the Algonquianist literature, in contrast, some morphemes (‘components’) can themselves contain other morphemes (which we call ‘formatives’) and they can also be synchronically derived from other components or stems. Drawing data from Menominee, we propose that these difficulties disappear if the formatives are seen as historical rather than synchronic units, while the components are the synchronic morphemes. Formatives bear the hallmarks of historical products of morphologization (phonetic/phonological reduction, semantic bleaching, and increase in grammatical function), and we conclude that they are not part of synchronic grammatical computation. This resolves problems present in traditional and modern theoretical approaches to Algonquian derivation, and has broader ramifications for linguistic theory: in both the structuralist and generativist traditions, synchronic grammar has often been seen as expansive, responsible for generating surface patterns that may instead be products of history. This has been the case in phonology and syntax. The present paper provides a study of the phenomenon in derivational morphology, and suggests that a more modest role for synchronic rules is called for.

8.
We present data from 17 languages on the frequency with which a common set of words is used in everyday language. The languages are drawn from six language families representing 65 per cent of the world's 7000 languages. Our data were collected from linguistic corpora that record frequencies of use for the 200 meanings in the widely used Swadesh fundamental vocabulary. Our interest is to assess evidence for shared patterns of language use around the world, and for the relationship of language use to rates of lexical replacement, defined as the replacement of a word by a new unrelated or non-cognate word. Frequencies of use for words in the Swadesh list range from just a few per million words of speech to 191 000 or more. The average inter-correlation among languages in the frequency of use across the 200 words is 0.73 (p < 0.0001). The first principal component of these data accounts for 70 per cent of the variance in frequency of use. Elsewhere, we have shown that frequently used words in the Indo-European languages tend to be more conserved, and that this relationship holds separately for different parts of speech. A regression model combining the principal factor loadings derived from the worldwide sample along with their part of speech predicts 46 per cent of the variance in the rates of lexical replacement in the Indo-European languages. This suggests that Indo-European lexical replacement rates might be broadly representative of worldwide rates of change. Evidence for this speculation comes from using the same factor loadings and part-of-speech categories to predict a word's position in a list of 110 words ranked from slowest to most rapidly evolving among 14 of the world's language families. This regression model accounts for 30 per cent of the variance. Our results point to a remarkable regularity in the way that human speakers use language, and hint that the words for a shared set of meanings have been slowly evolving and others more rapidly evolving throughout human history.

9.
In this study we examine linguistic variation and its dependence on both social and geographic factors. We follow dialectometry in applying a quantitative methodology and focusing on dialect distances, and social dialectology in the choice of factors we examine in building a model to predict word pronunciation distances from the standard Dutch language to 424 Dutch dialects. We combine linear mixed-effects regression modeling with generalized additive modeling to predict the pronunciation distance of 559 words. Although geographical position is the dominant predictor, several other factors emerged as significant. The model predicts a greater distance from the standard for smaller communities, for communities with a higher average age, for nouns (as contrasted with verbs and adjectives), for more frequent words, and for words with relatively many vowels. The impact of the demographic variables, however, varied from word to word. For a majority of words, larger, richer and younger communities are moving towards the standard. For a smaller minority of words, larger, richer and younger communities emerge as driving a change away from the standard. Similarly, the strength of the effects of word frequency and word category varied geographically. The peripheral areas of the Netherlands showed a greater distance from the standard for nouns (as opposed to verbs and adjectives) as well as for high-frequency words, compared to the more central areas. Our findings indicate that changes in pronunciation have been spreading (in particular for low-frequency words) from the Hollandic center of economic power to the peripheral areas of the country, meeting resistance that is stronger wherever, for well-documented historical reasons, the political influence of Holland was reduced. Our results are also consistent with the theory of lexical diffusion, in that distances from the Hollandic norm vary systematically and predictably on a word by word basis.

10.
Language is about words and rules. While there is some discussion to what extent rules are learned or innate, it is clear that words have to be learned. Here I construct a mathematical framework for the population dynamics of language evolution with particular emphasis on how words are propagated over generations. I define the basic reproductive ratio of a word, R, and show that R > 1 is required for words to be maintained in the lexicon of a language. Assuming that the frequency distribution of words follows Zipf's law, an upper limit is obtained for the number of words in a language that relies exclusively on oral transmission.

11.
Tria F, Galantucci B, Loreto V. PLoS One. 2012;7(6):e37744
The lexicons of human languages organize their units at two distinct levels. At a first combinatorial level, meaningless forms (typically referred to as phonemes) are combined into meaningful units (typically referred to as morphemes). Thanks to this, many morphemes can be obtained by relatively simple combinations of a small number of phonemes. At a second compositional level of the lexicon, morphemes are composed into larger lexical units, the meaning of which is related to the individual meanings of the composing morphemes. This duality of patterning is not a necessity for lexicons and the question remains wide open regarding how a population of individuals is able to bootstrap such a structure and the evolutionary advantages of its emergence. Here we address this question in the framework of a multi-agent model, where a population of individuals plays simple naming games in a conceptual environment modeled as a graph. We demonstrate that errors in communication as well as a blending repair strategy, which crucially exploits a shared conceptual representation of the environment, are sufficient conditions for the emergence of duality of patterning, which can thus be explained in a purely cultural way. Compositional lexicons turn out to be faster to lead to successful communication than purely combinatorial lexicons, suggesting that meaning played a crucial role in the evolution of language.

12.
Memory traces for words are frequently conceptualized neurobiologically as networks of neurons interconnected via reciprocal links developed through associative learning in the process of language acquisition. Neurophysiological reflection of activation of such memory traces has been reported using the mismatch negativity brain potential (MMN), which demonstrates an enhanced response to meaningful words over meaningless items. This enhancement is believed to be generated by the activation of strongly intraconnected long-term memory circuits for words that can be automatically triggered by spoken linguistic input and that are absent for unfamiliar phonological stimuli. This conceptual framework critically predicts different amounts of activation depending on the strength of the word's lexical representation in the brain. The frequent use of words should lead to more strongly connected representations, whereas less frequent items would be associated with more weakly linked circuits. A word with higher frequency of occurrence in the subject's language should therefore lead to a more pronounced lexical MMN response than its low-frequency counterpart. We tested this prediction by comparing the event-related potentials elicited by low- and high-frequency words in a passive oddball paradigm; physical stimulus contrasts were kept identical. We found that, consistent with our prediction, presenting the high-frequency stimulus led to a significantly more pronounced MMN response relative to the low-frequency one, a finding that is highly similar to previously reported MMN enhancement to words over meaningless pseudowords. Furthermore, activation elicited by the higher-frequency word peaked earlier relative to the low-frequency one, suggesting more rapid access to frequently used lexical entries. These results lend further support to the above view on word memory traces as strongly connected assemblies of neurons. The speed and magnitude of their activation appear to be linked to the strength of internal connections in a memory circuit, which is in turn determined by the everyday use of language elements.

13.
Nabeshima T, Gunji YP. Bio Systems. 2004;73(2):131-139
The frequency distribution of word usage in a word sequence generated by capping is estimated in terms of the number of "hits" in the retrieval of web pages, to evaluate the structure of semantics proper to a language rather than to a particular text. In particular, we compare the distribution of English sequences with Japanese ones and find that, for English and Japanese phonograms, the frequency of word usage against rank follows a power-law function with exponent 1, while for Japanese ideograms it follows a stretched exponential (Weibull distribution) function. We also discuss how such a difference can result from the difference between phonogram-based (English) and ideogram-based (Japanese) languages.

14.
Statistical studies of languages have focused on the rank-frequency distribution of words. Instead, we introduce here a measure of how word ranks change in time and call this distribution rank diversity. We calculate this diversity for books published in six European languages since 1800, and find that it follows a universal lognormal distribution. Based on the mean and standard deviation associated with the lognormal distribution, we define three different word regimes of languages: “heads” consist of words which hardly change their rank in time, “bodies” are words of general use, while “tails” are composed of context-specific words and vary their rank considerably in time. The heads and bodies reflect the size of language cores identified by linguists for basic communication. We propose a Gaussian random walk model which reproduces the rank variation of words in time and thus the diversity. Rank diversity of words can be understood as the result of random variations in rank, where the size of the variation depends on the rank itself. We find that the core size is similar for all languages studied.
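
The abstract does not spell out how rank diversity is computed. The sketch below assumes one common reading: for each rank, count the distinct words that occupy it across the time slices and divide by the number of slices. That definition, the function name `rank_diversity` and the toy word lists are all assumptions made for illustration.

```python
def rank_diversity(rankings):
    """Assumed definition: distinct words seen at each rank divided by the number of time slices.

    `rankings` is a list of ranked word lists, one per time slice (index 0 = rank 1).
    Only ranks present in every slice are scored.
    """
    n_slices = len(rankings)
    depth = min(len(ranked) for ranked in rankings)
    return [len({ranked[k] for ranked in rankings}) / n_slices for k in range(depth)]


if __name__ == "__main__":
    # Toy data, invented for illustration: three yearly rankings of three words each.
    yearly = [
        ["the", "of", "cat"],
        ["the", "of", "dog"],
        ["the", "and", "sun"],
    ]
    print(rank_diversity(yearly))
```

In the toy data the top rank never changes hands, so its diversity is the minimum possible (one word over three slices), while the third rank is held by a different word every year and reaches the maximum of 1. This mirrors the "heads" versus "tails" distinction in the abstract.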

15.
Aim

To demonstrate that parsimony analysis of endemicity (PAE) is not analogous to a cladistic biogeographical analysis.

Location

We used six data sets from previously published studies from around the world.

Methods

In order to test the efficiency of PAE in recovering historical relationships among areas, we performed an empirical comparison of nodes recovered with PAE, primary Brooks parsimony analysis (BPA), and an event‐based method using three models (maximum codivergence, reconciled trees, and the default model of the treefitter program) for six data sets. We measured the performance of PAE in recovering historical area relationships by counting the number and examining the content of nodes recovered by PAE and by historical methods. The dispersal/vicariance ratio was calculated to assess the prevalence of dispersal or vicariance in each reconstruction and its relationship to the performance of PAE.

Results

Our results show that PAE recovers an average of 17.25% of historical nodes. PAE and BPA tend to provide similar results; however, in relation to the event‐based models, PAE performance was poor under all the tested scenarios. Although in some cases PAE reconstructions are more resolved than historical reconstructions, this does not necessarily mean that PAE produces more informative answers. These additional nodes correspond to unsupported statements that are based solely on the distributional data of taxa and not on their phylogenetic history. In other words, these nodes were not found by the historical methods, which take phylogenetics into account. The number of historical nodes recovered using PAE was in general negatively correlated with the dispersal/vicariance ratio.

Main conclusions

Our results show that PAE is unable to recover historical patterns and therefore does not fit into the current paradigm of historical biogeography. These findings raise doubts regarding conclusions derived from biogeographical studies that interpret PAE trees as area cladograms. We acknowledge that PAE aims to describe but does not explain the current distribution of organisms. It is therefore a useful tool in other biogeographical or ecological analyses for exploring the distribution of taxa or for establishing hypotheses of primary homology between areas.

16.
Q’eqchi’ (Mayan stock, K’ichean subgroup) is an ergative language; a finite verb form obligatorily carries information on the person and number of the absolutive participant, i.e., the unique argument of an intransitive verb or the direct object of a transitive one. The set of personal absolutive markers includes five morphemes; the third person singular has no overt marker. These morphemes in Modern Q’eqchi’ are prefixes in a finite verbal predication and enclitics in a non-finite predication. In a finite verb form, the place of an absolutive prefix is between the tense-aspect prefix and the personal ergative prefix (in a transitive predication) or the verb root (in an intransitive one). This paper argues that during Colonial Q’eqchi’ (used in the second half of the 16th century and slightly later) the general structure of a verbal complex was completely different, and all personal absolutive markers were in fact enclitics. They were encliticized to tense-aspect morphemes that functioned syntactically as main predicates of a complex construction. Further diachronic change consolidated a verbal complex, conditioning the transition to affixation.

17.
Patterns of word use both reflect and influence a myriad of human activities and interactions. Like other entities that are reproduced and evolve, words rise or decline depending upon a complex interplay between their intrinsic properties and the environments in which they function. Using Internet discussion communities as model systems, we define the concept of a word niche as the relationship between the word and the characteristic features of the environments in which it is used. We develop a method to quantify two important aspects of the size of the word niche: the range of individuals using the word and the range of topics it is used to discuss. Controlling for word frequency, we show that these aspects of the word niche are strong determinants of changes in word frequency. Previous studies have already indicated that word frequency itself is a correlate of word success at historical time scales. Our analysis of changes in word frequencies over time reveals that the relative sizes of word niches are far more important than word frequencies in the dynamics of the entire vocabulary at shorter time scales, as the language adapts to new concepts and social groupings. We also distinguish endogenous versus exogenous factors as additional contributors to the fates of words, and demonstrate the force of this distinction in the rise of novel words. Our results indicate that short-term nonstationarity in word statistics is strongly driven by individual proclivities, including inclinations to provide novel information and to project a distinctive social identity.

18.
Conversations reflect the existing norms of a language. Previously, we found that utterance lengths in English fictional conversations in books and movies have shortened over a period of 200 years. In this work, we show that this shortening occurs even for a brief period of 3 years (September 2009–December 2012) using 229 million utterances from Twitter. Furthermore, the subset of geographically-tagged tweets from the United States shows an inverse proportion between utterance lengths and the state-level percentage of the Black population. We argue that shortening of utterances can be explained by the increasing usage of jargon including coined words.

19.
Game Dynamics with Learning and Evolution of Universal Grammar (total citations: 1; self-citations: 0; citations by others: 1)
We investigate a model of language evolution, based on population game dynamics with learning. First, we examine the case of two genetic variants of universal grammar (UG), the heart of the human language faculty, assuming each admits two possible grammars. The dynamics are driven by a communication game. We prove using dynamical systems techniques that if the payoff matrix obeys certain constraints, then the two UGs are stable against invasion by each other, that is, they are evolutionarily stable. Then, we prove a similar theorem for an arbitrary number of disjoint UGs. In both theorems, the constraints are independent of the learning process. Intuitively, if a mutation in UG results in grammars that are incompatible with the established languages, then the mutation will die out because mutants will be unable to communicate and therefore unable to realize any potential benefit of the mutation. An example for which these theorems do not apply shows that compatible mutations may or may not be able to invade, depending on the population's history and the learning process. These results suggest that the genetic history of language is constrained by the need for compatibility and that mutations in the language faculty may have died out or taken over due more to historical accident than to any straightforward notion of relative fitness. MSC 1991: 37N25 · 92D15 · 91F20

20.
A lexical decision task in an event-related potential experiment was used to determine how polymorphemic words are organized in the mental lexicon: are they stored as unanalyzable items or as separate morphemes? The results indicate the latter: while monomorphemic words elicit the N400 component, usually related to lexical-semantic processing, prefixed words and prefixed pseudo-words elicit a left anterior negativity (LAN), usually related to grammatical (morphosyntactic) processes. These components indicate that speakers apply grammatical (i.e., word-formation) rules and combine morphemes in order to obtain the lexical meaning of the prefixed word.
