首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
The frequent appearance of empirical rank-frequency laws, such as Zipf’s law, in a wide range of domains reinforces the importance of understanding and modeling these laws and rank-frequency distributions in general. In this spirit, we utilize a simple stochastic cascade process to simulate several empirical rank-frequency distributions longitudinally. We focus especially on limiting the process’s complexity to increase accessibility for non-experts in mathematics. The process provides a good fit for many empirical distributions because the stochastic multiplicative nature of the process leads to an often observed concave rank-frequency distribution (on a log-log scale) and the finiteness of the cascade replicates real-world finite size effects. Furthermore, we show that repeated trials of the process can roughly simulate the longitudinal variation of empirical ranks. However, we find that the empirical variation is often less that the average simulated process variation, likely due to longitudinal dependencies in the empirical datasets. Finally, we discuss the process limitations and practical applications.  相似文献   

2.
We propose a model that explains the reliable emergence of power laws (e.g., Zipf’s law) during the development of different human languages. The model incorporates the principle of least effort in communications, minimizing a combination of the information-theoretic communication inefficiency and direct signal cost. We prove a general relationship, for all optimal languages, between the signal cost distribution and the resulting distribution of signals. Zipf’s law then emerges for logarithmic signal cost distributions, which is the cost distribution expected for words constructed from letters or phonemes.  相似文献   

3.
When sending text messages on their mobile phone to friends, children often use a special type of register, which is called textese. This register allows the omission of words and the use of textisms: instances of non-standard written language such as 4ever (forever). Previous studies have shown that textese has a positive effect on children’s literacy abilities. In addition, it is possible that children’s grammar system is affected by textese as well, as grammar rules are often transgressed in this register. Therefore, the main aim of this study was to investigate whether the use of textese influences children’s grammar performance, and whether this effect is specific to grammar or language in general. Additionally, studies have not yet investigated the influence of textese on children’s cognitive abilities. Consequently, the secondary aim of this study was to find out whether textese affects children’s executive functions. To investigate this, 55 children between 10 and 13 years old were tested on a receptive vocabulary and grammar performance (sentence repetition) task and various tasks measuring executive functioning. In addition, text messages were elicited and the number of omissions and textisms in children’s messages were calculated. Regression analyses showed that omissions were a significant predictor of children’s grammar performance after various other variables were controlled for: the more words children omitted in their text messages, the better their performance on the grammar task. Although textisms correlated (marginally) significantly with vocabulary, grammar and selective attention scores and omissions marginally significantly with vocabulary scores, no other significant effects were obtained for measures of textese in the regression analyses: neither for the language outcomes, nor for the executive function tasks. Hence, our results show that textese is positively related to children’s grammar performance. On the other hand, use of textese does not affect—positively nor negatively—children’s executive functions.  相似文献   

4.
Statistical studies of languages have focused on the rank-frequency distribution of words. Instead, we introduce here a measure of how word ranks change in time and call this distribution rank diversity. We calculate this diversity for books published in six European languages since 1800, and find that it follows a universal lognormal distribution. Based on the mean and standard deviation associated with the lognormal distribution, we define three different word regimes of languages: “heads” consist of words which almost do not change their rank in time, “bodies” are words of general use, while “tails” are comprised by context-specific words and vary their rank considerably in time. The heads and bodies reflect the size of language cores identified by linguists for basic communication. We propose a Gaussian random walk model which reproduces the rank variation of words in time and thus the diversity. Rank diversity of words can be understood as the result of random variations in rank, where the size of the variation depends on the rank itself. We find that the core size is similar for all languages studied.  相似文献   

5.
The brain''s decoding of fast sensory streams is currently impossible to emulate, even approximately, with artificial agents. For example, robust speech recognition is relatively easy for humans but exceptionally difficult for artificial speech recognition systems. In this paper, we propose that recognition can be simplified with an internal model of how sensory input is generated, when formulated in a Bayesian framework. We show that a plausible candidate for an internal or generative model is a hierarchy of ‘stable heteroclinic channels’. This model describes continuous dynamics in the environment as a hierarchy of sequences, where slower sequences cause faster sequences. Under this model, online recognition corresponds to the dynamic decoding of causal sequences, giving a representation of the environment with predictive power on several timescales. We illustrate the ensuing decoding or recognition scheme using synthetic sequences of syllables, where syllables are sequences of phonemes and phonemes are sequences of sound-wave modulations. By presenting anomalous stimuli, we find that the resulting recognition dynamics disclose inference at multiple time scales and are reminiscent of neuronal dynamics seen in the real brain.  相似文献   

6.
7.
Because the volume of information available online is growing at breakneck speed, keeping up with meaning and information communicated by the media and netizens is a new challenge both for scholars and for companies who must address public relations crises. Most current theories and tools are directed at identifying one website or one piece of online news and do not attempt to develop a rapid understanding of all websites and all news covering one topic. This paper represents an effort to integrate statistics, word segmentation, complex networks and visualization to analyze headlines’ keywords and words relationships in online Chinese news using two samples: the 2011 Bohai Bay oil spill and the 2010 Gulf of Mexico oil spill. We gathered all the news headlines concerning the two trending events in the search results from Baidu, the most popular Chinese search engine. We used Simple Chinese Word Segmentation to segment all the headlines into words and then took words as nodes and considered adjacent relations as edges to construct word networks both using the whole sample and at the monthly level. Finally, we develop an integrated mechanism to analyze the features of words’ networks based on news headlines that can account for all the keywords in the news about a particular event and therefore track the evolution of news deeply and rapidly.  相似文献   

8.
Despite being a paradigm of quantitative linguistics, Zipf’s law for words suffers from three main problems: its formulation is ambiguous, its validity has not been tested rigorously from a statistical point of view, and it has not been confronted to a representatively large number of texts. So, we can summarize the current support of Zipf’s law in texts as anecdotic. We try to solve these issues by studying three different versions of Zipf’s law and fitting them to all available English texts in the Project Gutenberg database (consisting of more than 30 000 texts). To do so we use state-of-the art tools in fitting and goodness-of-fit tests, carefully tailored to the peculiarities of text statistics. Remarkably, one of the three versions of Zipf’s law, consisting of a pure power-law form in the complementary cumulative distribution function of word frequencies, is able to fit more than 40% of the texts in the database (at the 0.05 significance level), for the whole domain of frequencies (from 1 to the maximum value), and with only one free parameter (the exponent).  相似文献   

9.
Opportunities for associationist learning of word meaning, where a word is heard or read contemperaneously with information being available on its meaning, are considered too infrequent to account for the rate of language acquisition in children. It has been suggested that additional learning could occur in a distributional mode, where information is gleaned from the distributional statistics (word co-occurrence etc.) of natural language. Such statistics are relevant to meaning because of the Distributional Principle that ‘words of similar meaning tend to occur in similar contexts’. Computational systems, such as Latent Semantic Analysis, have substantiated the viability of distributional learning of word meaning, by showing that semantic similarities between words can be accurately estimated from analysis of the distributional statistics of a natural language corpus. We consider whether appearance similarities can also be learnt in a distributional mode. As grounds for such a mode we advance the Appearance Hypothesis that ‘words with referents of similar appearance tend to occur in similar contexts’. We assess the viability of such learning by looking at the performance of a computer system that interpolates, on the basis of distributional and appearance similarity, from words that it has been explicitly taught the appearance of, in order to identify and name objects that it has not been taught about. Our experiment tests with a set of 660 simple concrete noun words. Appearance information on words is modelled using sets of images of examples of the word. Distributional similarity is computed from a standard natural language corpus. Our computation results support the viability of distributional learning of appearance.  相似文献   

10.
In his review of neural binding problems, Feldman (Cogn Neurodyn 7:1–11, 2013) addressed two types of models as solutions of (novel) variable binding. The one type uses labels such as phase synchrony of activation. The other (‘connectivity based’) type uses dedicated connections structures to achieve novel variable binding. Feldman argued that label (synchrony) based models are the only possible candidates to handle novel variable binding, whereas connectivity based models lack the flexibility required for that. We argue and illustrate that Feldman’s analysis is incorrect. Contrary to his conclusion, connectivity based models are the only viable candidates for models of novel variable binding because they are the only type of models that can produce behavior. We will show that the label (synchrony) based models analyzed by Feldman are in fact examples of connectivity based models. Feldman’s analysis that novel variable binding can be achieved without existing connection structures seems to result from analyzing the binding problem in a wrong frame of reference, in particular in an outside instead of the required inside frame of reference. Connectivity based models can be models of novel variable binding when they possess a connection structure that resembles a small-world network, as found in the brain. We will illustrate binding with this type of model with episode binding and the binding of words, including novel words, in sentence structures.  相似文献   

11.
Feminist news media researchers have long contended that masculine news values shape journalists’ quotidian decisions about what is newsworthy. As a result, it is argued, topics and issues traditionally regarded as primarily of interest and relevance to women are routinely marginalised in the news, while men’s views and voices are given privileged space. When women do show up in the news, it is often as “eye candy,” thus reinforcing women’s value as sources of visual pleasure rather than residing in the content of their views. To date, evidence to support such claims has tended to be based on small-scale, manual analyses of news content. In this article, we report on findings from our large-scale, data-driven study of gender representation in online English language news media. We analysed both words and images so as to give a broader picture of how gender is represented in online news. The corpus of news content examined consists of 2,353,652 articles collected over a period of six months from more than 950 different news outlets. From this initial dataset, we extracted 2,171,239 references to named persons and 1,376,824 images resolving the gender of names and faces using automated computational methods. We found that males were represented more often than females in both images and text, but in proportions that changed across topics, news outlets and mode. Moreover, the proportion of females was consistently higher in images than in text, for virtually all topics and news outlets; women were more likely to be represented visually than they were mentioned as a news actor or source. Our large-scale, data-driven analysis offers important empirical evidence of macroscopic patterns in news content concerning the way men and women are represented.  相似文献   

12.
The amino acid sequences of proteins determine their three-dimensional structures and functions. However, how sequence information is related to structures and functions is still enigmatic. In this study, we show that at least a part of the sequence information can be extracted by treating amino acid sequences of proteins as a collection of English words, based on a working hypothesis that amino acid sequences of proteins are composed of short constituent amino acid sequences (SCSs) or “words”. We first confirmed that the English language highly likely follows Zipf''s law, a special case of power law. We found that the rank-frequency plot of SCSs in proteins exhibits a similar distribution when low-rank tails are excluded. In comparison with natural English and “compressed” English without spaces between words, amino acid sequences of proteins show larger linear ranges and smaller exponents with heavier low-rank tails, demonstrating that the SCS distribution in proteins is largely scale-free. A distribution pattern of SCSs in proteins is similar among species, but species-specific features are also present. Based on the availability scores of SCSs, we found that sequence motifs are enriched in high-availability sites (i.e., “key words”) and vice versa. In fact, the highest availability peak within a given protein sequence often directly corresponds to a sequence motif. The amino acid composition of high-availability sites within motifs is different from that of entire motifs and all protein sequences, suggesting the possible functional importance of specific SCSs and their compositional amino acids within motifs. We anticipate that our availability-based word decoding approach is complementary to sequence alignment approaches in predicting functionally important sites of unknown proteins from their amino acid sequences.  相似文献   

13.
The word-frequency distribution of a text written by an author is well accounted for by a maximum entropy distribution, the RGF (random group formation)-prediction. The RGF-distribution is completely determined by the a priori values of the total number of words in the text (M), the number of distinct words (N) and the number of repetitions of the most common word (kmax). It is here shown that this maximum entropy prediction also describes a text written in Chinese characters. In particular it is shown that although the same Chinese text written in words and Chinese characters have quite differently shaped distributions, they are nevertheless both well predicted by their respective three a priori characteristic values. It is pointed out that this is analogous to the change in the shape of the distribution when translating a given text to another language. Another consequence of the RGF-prediction is that taking a part of a long text will change the input parameters (M, N, kmax) and consequently also the shape of the frequency distribution. This is explicitly confirmed for texts written in Chinese characters. Since the RGF-prediction has no system-specific information beyond the three a priori values (M, N, kmax), any specific language characteristic has to be sought in systematic deviations from the RGF-prediction and the measured frequencies. One such systematic deviation is identified and, through a statistical information theoretical argument and an extended RGF-model, it is proposed that this deviation is caused by multiple meanings of Chinese characters. The effect is stronger for Chinese characters than for Chinese words. The relation between Zipf’s law, the Simon-model for texts and the present results are discussed.  相似文献   

14.
On the evolutionary trajectory that led to human language there must have been a transition from a fairly limited to an essentially unlimited communication system. The structure of modern human languages reveals at least two steps that are required for such a transition: in all languages (i) a small number of phonemes are used to generate a large number of words; and (ii) a large number of words are used to a produce an unlimited number of sentences. The first (and simpler) step is the topic of the current paper. We study the evolution of communication in the presence of errors and show that this limits the number of objects (or concepts) that can be described by a simple communication system. The evolutionary optimum is achieved by using only a small number of signals to describe a few valuable concepts. Adding more signals does not increase the fitness of a language. This represents an error limit for the evolution of communication. We show that this error limit can be overcome by combining signals (phonemes) into words. The transition from an analogue to a digital system was a necessary step toward the evolution of human language.  相似文献   

15.
Current metrics for estimating a scientist’s academic performance treat the author’s publications as if these were solely attributable to the author. However, this approach ignores the substantive contributions of co-authors, leading to misjudgments about the individual’s own scientific merits and consequently to misallocation of funding resources and academic positions. This problem is becoming the more urgent in the biomedical field where the number of collaborations is growing rapidly, making it increasingly harder to support the best scientists. Therefore, here we introduce a simple harmonic weighing algorithm for correcting citations and citation-based metrics such as the h-index for co-authorships. This weighing algorithm can account for both the nvumber of co-authors and the sequence of authors on a paper. We then derive a measure called the ‘profit (p)-index’, which estimates the contribution of co-authors to the work of a given author. By using samples of researchers from a renowned Dutch University hospital, Spinoza Prize laureates (the most prestigious Dutch science award), and Nobel Prize laureates in Physiology or Medicine, we show that the contribution of co-authors to the work of a particular author is generally substantial (i.e., about 80%) and that researchers’ relative rankings change materially when adjusted for the contributions of co-authors. Interestingly, although the top University hospital researchers had the highest h-indices, this appeared to be due to their significantly higher p-indices. Importantly, the ranking completely reversed when using the profit adjusted h-indices, with the Nobel laureates having the highest, the Spinoza Prize laureates having an intermediate, and the top University hospital researchers having the lowest profit adjusted h-indices, respectively, suggesting that exceptional researchers are characterized by a relatively high degree of scientific independency/originality. The concepts and methods introduced here may thus provide a more fair impression of a scientist’s autonomous academic performance.  相似文献   

16.
What are the limits of unconscious language processing? Can language circuits process simple grammatical constructions unconsciously and integrate the meaning of several unseen words? Using behavioural priming and electroencephalography (EEG), we studied a specific rule-based linguistic operation traditionally thought to require conscious cognitive control: the negation of valence. In a masked priming paradigm, two masked words were successively (Experiment 1) or simultaneously presented (Experiment 2), a modifier (‘not’/‘very’) and an adjective (e.g. ‘good’/‘bad’), followed by a visible target noun (e.g. ‘peace’/‘murder’). Subjects indicated whether the target noun had a positive or negative valence. The combination of these three words could either be contextually consistent (e.g. ‘very bad - murder’) or inconsistent (e.g. ‘not bad - murder’). EEG recordings revealed that grammatical negations could unfold partly unconsciously, as reflected in similar occipito-parietal N400 effects for conscious and unconscious three-word sequences forming inconsistent combinations. However, only conscious word sequences elicited P600 effects, later in time. Overall, these results suggest that multiple unconscious words can be rapidly integrated and that an unconscious negation can automatically ‘flip the sign’ of an unconscious adjective. These findings not only extend the limits of subliminal combinatorial language processes, but also highlight how consciousness modulates the grammatical integration of multiple words.  相似文献   

17.
Speech perception is thought to be linked to speech motor production. This linkage is considered to mediate multimodal aspects of speech perception, such as audio-visual and audio-tactile integration. However, direct coupling between articulatory movement and auditory perception has been little studied. The present study reveals a clear dissociation between the effects of a listener’s own speech action and the effects of viewing another’s speech movements on the perception of auditory phonemes. We assessed the intelligibility of the syllables [pa], [ta], and [ka] when listeners silently and simultaneously articulated syllables that were congruent/incongruent with the syllables they heard. The intelligibility was compared with a condition where the listeners simultaneously watched another’s mouth producing congruent/incongruent syllables, but did not articulate. The intelligibility of [ta] and [ka] were degraded by articulating [ka] and [ta] respectively, which are associated with the same primary articulator (tongue) as the heard syllables. But they were not affected by articulating [pa], which is associated with a different primary articulator (lips) from the heard syllables. In contrast, the intelligibility of [ta] and [ka] was degraded by watching the production of [pa]. These results indicate that the articulatory-induced distortion of speech perception occurs in an articulator-specific manner while visually induced distortion does not. The articulator-specific nature of the auditory-motor interaction in speech perception suggests that speech motor processing directly contributes to our ability to hear speech.  相似文献   

18.
A ‘Sleeping Beauty in Science’ is a publication that goes unnoticed (‘sleeps’) for a long time and then, almost suddenly, attracts a lot of attention (‘is awakened by a prince’). The aim of this paper is to present a general methodology to investigate (1) important properties of Sleeping Beauties such as the time-dependent distribution, author characteristics, journals and fields, and (2) the cognitive environment of Sleeping Beauties. We are particularly interested to find out to what extent Sleeping Beauties are application-oriented and thus are potential Sleeping Innovations. In this study we focus primarily on physics (including materials science and astrophysics) and present first results for chemistry and for engineering & computer science. We find that more than half of the SBs are application-oriented. To study the cognitive environments of Sleeping Beauties we develop a new approach in which the cognitive environment of the SBs is analyzed, based on the mapping of Sleeping Beauties using their citation links and conceptual relations, particularly co-citation mapping. In this way we investigate the research themes in which the SBs are ‘used’ and possible causes of why the premature work in the SBs becomes topical, i.e., the trigger of the awakening of the SBs. This approach is tested with a blue skies SB and an application-oriented SB. We think that the mapping procedures discussed in this paper are not only important for bibliometric analyses. They also provide researchers with useful, interactive tools to discover both relevant older work as well as new developments, for instance in themes related to Sleeping Beauties that are also Sleeping Innovations.  相似文献   

19.
The most influential theory of learning to read is based on the idea that children rely on phonological decoding skills to learn novel words. According to the self-teaching hypothesis, each successful decoding encounter with an unfamiliar word provides an opportunity to acquire word-specific orthographic information that is the foundation of skilled word recognition. Therefore, phonological decoding acts as a self-teaching mechanism or ‘built-in teacher’. However, all previous connectionist models have learned the task of reading aloud through exposure to a very large corpus of spelling–sound pairs, where an ‘external’ teacher supplies the pronunciation of all words that should be learnt. Such a supervised training regimen is highly implausible. Here, we implement and test the developmentally plausible phonological decoding self-teaching hypothesis in the context of the connectionist dual process model. In a series of simulations, we provide a proof of concept that this mechanism works. The model was able to acquire word-specific orthographic representations for more than 25 000 words even though it started with only a small number of grapheme–phoneme correspondences. We then show how visual and phoneme deficits that are present at the outset of reading development can cause dyslexia in the course of reading development.  相似文献   

20.
Previous research has shown that political leanings correlate with various psychological factors. While surveys and experiments provide a rich source of information for political psychology, data from social networks can offer more naturalistic and robust material for analysis. This research investigates psychological differences between individuals of different political orientations on a social networking platform, Twitter. Based on previous findings, we hypothesized that the language used by liberals emphasizes their perception of uniqueness, contains more swear words, more anxiety-related words and more feeling-related words than conservatives’ language. Conversely, we predicted that the language of conservatives emphasizes group membership and contains more references to achievement and religion than liberals’ language. We analysed Twitter timelines of 5,373 followers of three Twitter accounts of the American Democratic and 5,386 followers of three accounts of the Republican parties’ Congressional Organizations. The results support most of the predictions and previous findings, confirming that Twitter behaviour offers valid insights to offline behaviour.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号