Similar literature
20 similar documents found (search time: 31 ms)
1.

Background

Word frequency is the most important variable in language research. However, despite the growing interest in the Chinese language, there are only a few sources of word frequency measures available to researchers, and the quality is less than what researchers in other languages are used to.

Methodology

Following recent work by New, Brysbaert, and colleagues in English, French and Dutch, we assembled a database of word and character frequencies based on a corpus of film and television subtitles (46.8 million characters, 33.5 million words). In line with what has been found in the other languages, the new word and character frequencies explain significantly more of the variance in Chinese word naming and lexical decision performance than measures based on written texts.

Conclusions

Our results confirm that word frequencies based on subtitles are a good estimate of daily language exposure and capture much of the variance in word processing efficiency. In addition, our database is the first to include information about the contextual diversity of the words and to provide good frequency estimates for multi-character words and the different syntactic roles in which the words are used. The word frequencies are freely available for research purposes.
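
The two measures named in this abstract, word frequency and contextual diversity, can be illustrated with a minimal sketch. It assumes the subtitles have already been segmented into words (the abstract does not say which segmenter was used) and takes contextual diversity to be the percentage of subtitle files in which a word occurs; the function name and the toy data are hypothetical.

```python
from collections import Counter


def frequency_and_diversity(subtitle_files):
    """Compute per-word counts, frequency per million, and contextual diversity.

    subtitle_files maps a film identifier to its list of already-segmented words
    (segmentation itself is assumed to have happened upstream).
    """
    counts = Counter()      # total occurrences of each word across all files
    doc_counts = Counter()  # number of files in which each word occurs at least once
    for words in subtitle_files.values():
        counts.update(words)
        doc_counts.update(set(words))

    total = sum(counts.values())
    n_files = len(subtitle_files)
    return {
        word: {
            "count": count,
            "per_million": count * 1_000_000 / total,
            "cd_percent": 100.0 * doc_counts[word] / n_files,
        }
        for word, count in counts.items()
    }


# Toy corpus of two "films"; real input would be millions of words.
toy = {"film_a": ["你好", "我", "你好"], "film_b": ["我", "谢谢"]}
stats = frequency_and_diversity(toy)
```

In the toy corpus, 你好 and 我 have the same raw count, but 我 appears in both films and so has twice the contextual diversity, which is the distinction the database is said to capture.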

2.
To what extent do phonological codes constrain orthographic output in handwritten production? We investigated how phonological codes constrain the selection of orthographic codes via sublexical and lexical routes in Chinese written production. Participants wrote down picture names in a picture-naming task in Experiment 1 or response words in a symbol-word associative writing task in Experiment 2. A sublexical phonological property of picture names (phonetic regularity: regular vs. irregular) in Experiment 1 and a lexical phonological property of response words (homophone density: dense vs. sparse) in Experiment 2, as well as word frequency of the targets in both experiments, were manipulated. A facilitatory effect of word frequency was found in both experiments, in which words with high frequency were produced faster than those with low frequency. More importantly, we observed an inhibitory phonetic regularity effect, in which low-frequency picture names with regular first characters were slower to write than those with irregular ones, and an inhibitory homophone density effect, in which characters with dense homophone density were produced more slowly than those with sparse homophone density. Results suggested that phonological codes constrained handwritten production via lexical and sublexical routes.

3.
The most influential theory of learning to read is based on the idea that children rely on phonological decoding skills to learn novel words. According to the self-teaching hypothesis, each successful decoding encounter with an unfamiliar word provides an opportunity to acquire word-specific orthographic information that is the foundation of skilled word recognition. Therefore, phonological decoding acts as a self-teaching mechanism or ‘built-in teacher’. However, all previous connectionist models have learned the task of reading aloud through exposure to a very large corpus of spelling–sound pairs, where an ‘external’ teacher supplies the pronunciation of all words that should be learnt. Such a supervised training regimen is highly implausible. Here, we implement and test the developmentally plausible phonological decoding self-teaching hypothesis in the context of the connectionist dual process model. In a series of simulations, we provide a proof of concept that this mechanism works. The model was able to acquire word-specific orthographic representations for more than 25 000 words even though it started with only a small number of grapheme–phoneme correspondences. We then show how visual and phoneme deficits that are present at the outset of reading development can cause dyslexia in the course of reading development.

4.
Reading familiar words differs from reading unfamiliar non-words in two ways. First, word reading is faster and more accurate than reading of unfamiliar non-words. Second, effects of letter length are reduced for words, particularly when they are presented in the right visual field in familiar formats. Two experiments are reported in which right-handed participants read aloud non-words presented briefly in their left and right visual fields before and after training on those items. The non-words were interleaved with familiar words in the naming tests. Before training, naming was slow and error prone, with marked effects of length in both visual fields. After training, fewer errors were made, naming was faster, and the effect of length was much reduced in the right visual field compared with the left. We propose that word learning creates orthographic word forms in the mid-fusiform gyrus of the left cerebral hemisphere. Those word forms allow words to access their phonological and semantic representations on a lexical basis. But orthographic word forms also interact with more posterior letter recognition systems in the middle/inferior occipital gyri, inducing more parallel processing of right visual field words than is possible for any left visual field stimulus, or for unfamiliar non-words presented in the right visual field.

5.
Cognitive science has a rich history of interest in the ways that languages represent abstract and concrete concepts (e.g., idea vs. dog). Until recently, this focus has centered largely on aspects of word meaning and semantic representation. However, recent corpus analyses have demonstrated that abstract and concrete words are also marked by phonological, orthographic, and morphological differences. These regularities in sound-meaning correspondence potentially allow listeners to infer certain aspects of semantics directly from word form. We investigated this relationship between form and meaning in a series of four experiments. In Experiments 1-2 we examined the role of metalinguistic knowledge in semantic decision by asking participants to make semantic judgments for aurally presented nonwords selectively varied by specific acoustic and phonetic parameters. Participants consistently associated increased word length and diminished wordlikeness with abstract concepts. In Experiment 3, participants completed a semantic decision task (i.e., abstract or concrete) for real words varied by length and concreteness. Participants were more likely to misclassify longer, inflected words (e.g., "apartment") as abstract and shorter uninflected abstract words (e.g., "fate") as concrete. In Experiment 4, we used a multiple regression to predict trial-level naming data from a large corpus of nouns, which revealed significant interaction effects between concreteness and word form. Together these results provide converging evidence for the hypothesis that listeners map sound to meaning through a non-arbitrary process using prior knowledge about statistical regularities in the surface forms of words.

6.
In typical readers, orthographic knowledge has been shown to influence phonological decisions. In the present study, we used visual rhyme and spelling tasks to investigate the interaction of orthographic and phonological information in adults with varying reading skill. Word pairs that shared both orthography and phonology (e.g., throat/boat), differed in both orthography and phonology (e.g., snow/arm), shared only orthography (e.g., farm/warm), and shared only phonology (e.g., vote/boat) were visually presented to university students who varied in reading ability. For rhyme judgment, participants were slower and less accurate to accept rhyming pairs when words were spelled differently and to reject non-rhyming pairs when words were spelled similarly. Similarly, for spelling judgments, participants were slower and less accurate when indicating that word endings were spelled differently when words rhymed, and slower and less accurate when indicating that words were spelled similarly when words did not rhyme. Crucially, while these effects were clear at the group level, there were large individual differences in the extent to which participants were impacted by conflict. In two separate samples, reading skill was associated with the extent to which orthographic conflict impacted rhyme decisions such that individuals with better nonword reading performance were less impacted by orthographic conflict. Thus, university students with poorer reading skills may differ from their peers either in the reading strategies they use or in the degree to which they automatically access word form information. Understanding these relationships is important for understanding the roles that reading processes play in readers of different skill.

7.
Developmental dyslexia is a neurological condition that is characterized by severe impairment in reading skill acquisition in people with adequate intelligence and typical schooling [1], [2] and [3]. For English readers, reading impairment is critically associated with a phonological processing disorder [3], [4] and [5], which may co-occur with an orthographic (visual word form) processing deficit [6], but not with a general visual processing dysfunction in most dyslexics [7]. The pathophysiology of dyslexia varies across languages [8]: for instance, unlike English, written Chinese maps visually intricate graphic forms (characters) onto meanings; pronunciation of Chinese characters must be rote memorized. This suggests that, in Chinese, a fine-grained visuospatial analysis must be performed to activate characters' phonology and meaning; consequently, disordered phonological processing may commonly co-exist with abnormal visuospatial processing in Chinese dyslexia. To test this hypothesis, we conducted an fMRI experiment in which 12 Chinese dyslexics, shown previously [9] to exhibit a phonological disorder, performed a physical size judgment measuring visuospatial dimensions. Compared with 12 control subjects, the dyslexics showed weaker activations in left intraparietal sulcus (IPS) mediating visuospatial processing. Analyses of individual dyslexics' performances further suggest that developmental dyslexia in Chinese is commonly associated with the co-existence of a visuospatial deficit and a phonological disorder.

8.
It is a well-known fact that languages react differently when foreign words denoting new concepts have to be integrated into the native system. The procedure mostly depends on the degree of purism present in a linguistic community: some languages are rather open to foreign influences and do not demonstrate any special hostility towards new words, which are easily accepted and adapted to the phonological and morphological systems of the receiving language. Languages that have a strong puristic tradition usually channel their borrowings into the loan translation field, using internal word formation resources as a means of creating neologisms. Regardless of whether they are built of native elements or appear as loans, neologisms are necessarily the result of linguistic changes.

9.
We present data from 17 languages on the frequency with which a common set of words is used in everyday language. The languages are drawn from six language families representing 65 per cent of the world's 7000 languages. Our data were collected from linguistic corpora that record frequencies of use for the 200 meanings in the widely used Swadesh fundamental vocabulary. Our interest is to assess evidence for shared patterns of language use around the world, and for the relationship of language use to rates of lexical replacement, defined as the replacement of a word by a new unrelated or non-cognate word. Frequencies of use for words in the Swadesh list range from just a few per million words of speech to 191 000 or more. The average inter-correlation among languages in the frequency of use across the 200 words is 0.73 (p < 0.0001). The first principal component of these data accounts for 70 per cent of the variance in frequency of use. Elsewhere, we have shown that frequently used words in the Indo-European languages tend to be more conserved, and that this relationship holds separately for different parts of speech. A regression model combining the principal factor loadings derived from the worldwide sample along with their part of speech predicts 46 per cent of the variance in the rates of lexical replacement in the Indo-European languages. This suggests that Indo-European lexical replacement rates might be broadly representative of worldwide rates of change. Evidence for this speculation comes from using the same factor loadings and part-of-speech categories to predict a word's position in a list of 110 words ranked from slowest to most rapidly evolving among 14 of the world's language families. This regression model accounts for 30 per cent of the variance. Our results point to a remarkable regularity in the way that human speakers use language, and hint that the words for a shared set of meanings have been slowly evolving and others more rapidly evolving throughout human history.

10.
Scientists studying how languages change over time often make an analogy between biological and cultural evolution, with words or grammars behaving like traits subject to natural selection. Recent work has exploited this analogy by using models of biological evolution to explain the properties of languages and other cultural artefacts. However, the mechanisms of biological and cultural evolution are very different: biological traits are passed between generations by genes, while languages and concepts are transmitted through learning. Here we show that these different mechanisms can have the same results, demonstrating that the transmission of frequency distributions over variants of linguistic forms by Bayesian learners is equivalent to the Wright–Fisher model of genetic drift. This simple learning mechanism thus provides a justification for the use of models of genetic drift in studying language evolution. In addition to providing an explicit connection between biological and cultural evolution, this allows us to define a ‘neutral’ model that indicates how languages can change in the absence of selection at the level of linguistic variants. We demonstrate that this neutral model can account for three phenomena: the s-shaped curve of language change, the distribution of word frequencies, and the relationship between word frequencies and extinction rates.
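
For readers unfamiliar with the Wright–Fisher model mentioned in this abstract, the following is a minimal sketch of neutral drift for two competing variants of a linguistic form. It shows only the standard drift model, not the Bayesian-learner derivation the abstract refers to; the function name, the two-variant restriction, and the parameter names are assumptions made for illustration.

```python
import random


def wright_fisher(n_speakers, initial_freq, generations, seed=0):
    """Simulate neutral drift of one variant's frequency over discrete generations.

    In each generation every one of n_speakers independently adopts the variant
    with probability equal to its frequency in the previous generation, so the
    number of adopters is binomially distributed and no variant is favoured.
    """
    rng = random.Random(seed)
    freq = initial_freq
    trajectory = [freq]
    for _ in range(generations):
        adopters = sum(rng.random() < freq for _ in range(n_speakers))
        freq = adopters / n_speakers
        trajectory.append(freq)
    return trajectory


# One run: 100 speakers, variant starts at 50 per cent, 200 generations.
run = wright_fisher(n_speakers=100, initial_freq=0.5, generations=200)
```

Frequencies 0 and 1 are absorbing states: once a variant is lost or has replaced its competitor, sampling alone cannot bring the other one back, which is how variants can go extinct without any selection.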

11.
Evidence from previous psycholinguistic research suggests that phonological units such as phonemes have a privileged role during phonological planning in Dutch and English (aka the segment-retrieval hypothesis). However, the syllable-retrieval hypothesis previously proposed for Mandarin assumes that only the entire syllable unit (without the tone) can be prepared in advance in speech planning. Using Cantonese Chinese as a test case, the present study was conducted to investigate whether the syllable-retrieval hypothesis can be applied to other Chinese spoken languages. In four implicit priming (form-preparation) experiments, participants were asked to learn various sets of prompt-response di-syllabic word pairs and to utter the corresponding response word upon seeing each prompt. The response words in a block were either phonologically related (homogeneous) or unrelated (heterogeneous). Participants' naming responses were significantly faster in the homogeneous than in the heterogeneous conditions when the response words shared the same word-initial syllable (without the tone) (Exps. 1 and 4) or body (Exps. 3 and 4), but not when they shared merely the same word-initial phoneme (Exp. 2). Furthermore, the priming effect observed in the syllable-related condition was significantly larger than that in the body-related condition (Exp. 4). Although the observed syllable priming effects and the null effect of word-initial phoneme are consistent with the syllable-retrieval hypothesis, the body-related (sub-syllabic) priming effects obtained in this Cantonese study are not. These results suggest that the syllable-retrieval hypothesis is not generalizable to all Chinese spoken languages and that both syllable and sub-syllabic constituents are legitimate planning units in Cantonese speech production.

12.
Reading disability (RD), or dyslexia, is a complex cognitive disorder manifested by difficulties in learning to read, in otherwise normal individuals. Individuals with RD manifest deficits in several reading and language skills. Previous research has suggested the existence of a quantitative-trait locus (QTL) for RD on the short arm of chromosome 6. In the present study, RD subjects' performance in several measures of word recognition and component skills of orthographic coding, phonological decoding, and phoneme awareness were individually subjected to QTL analysis, with a new sample of 126 sib pairs, by means of a multipoint mapping method and eight informative DNA markers on chromosome 6 (D6S461, D6S276, D6S105, D6S306, D6S258, D6S439, D6S291, and D6S1019). The results indicate significant linkage across a distance of at least 5 cM for deficits in orthographic (LOD = 3.10) and phonological (LOD = 2.42) skills, confirming previous findings.

13.
Chow BW, Ho CS, Wong SW, Waye MM, Bishop DV. PLoS ONE, 2011, 6(2): e16640
This study investigated the etiology of individual differences in Chinese language and reading skills in 312 typically developing Chinese twin pairs aged from 3 to 11 years (228 pairs of monozygotic twins and 84 pairs of dizygotic twins; 166 male pairs and 146 female pairs). Children were individually given tasks of Chinese word reading, receptive vocabulary, phonological memory, tone awareness, syllable and rhyme awareness, rapid automatized naming, morphological awareness and orthographic skills, and Raven's Coloured Progressive Matrices. All analyses controlled for the effects of age. There were moderate to substantial genetic influences on word reading, tone awareness, phonological memory, morphological awareness and rapid automatized naming (estimates ranged from .42 to .73), while shared environment exerted moderate to strong effects on receptive vocabulary, syllable and rhyme awareness and orthographic skills (estimates ranged from .35 to .63). Results were largely unchanged when scores were adjusted for nonverbal reasoning as well as age. Findings of this study are mostly similar to those found for English, a language with very different characteristics, and suggest the universality of genetic and environmental influences across languages.

14.
Lewis Carroll's English word game Doublets is represented as a system of networks with each node being an English word and each connectivity edge confirming that the two words it joins are equal in letter length but differ by exactly one letter. We show that this system, which we call the Doublets net, constitutes a complex body of linguistic knowledge concerning English word structure that has computable multiscale features. Distributed morphological, phonological and orthographic constraints and the language's local redundancy are seen at the node level. Phonological communities are seen at the network level. And a balancing act between the language's global efficiency and redundancy is seen at the system level. We develop a new measure of intrinsic node-to-node distance and a computational algorithm, called community geometry, which reveal the implicit multiscale structure within binary networks. Because the Doublets net is a modular complex cognitive system, the community geometry and computable multi-scale structural information may provide a foundation for understanding computational learning in many systems whose network structure has yet to be fully analyzed.
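
The network construction described in this abstract (nodes are words, edges join words of equal length that differ in exactly one letter) can be sketched as follows. This is only an illustration of the edge rule plus a breadth-first search for the shortest Doublets chain; it does not implement the authors' distance measure or their community geometry algorithm, and the function names and the five-word list are hypothetical.

```python
from collections import deque
from itertools import combinations


def differ_by_one(a, b):
    """True when a and b have equal length and differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def build_doublets_net(words):
    """Return an adjacency map linking every pair of words that differ by one letter."""
    net = {w: set() for w in words}
    for a, b in combinations(words, 2):
        if differ_by_one(a, b):
            net[a].add(b)
            net[b].add(a)
    return net


def shortest_chain(net, start, goal):
    """Breadth-first search for a shortest chain of one-letter changes, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(net[path[-1]]):  # sorted only to make the result deterministic
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


net = build_doublets_net(["cold", "cord", "card", "ward", "warm"])
chain = shortest_chain(net, "cold", "warm")
```

The pairwise comparison is quadratic in the size of the word list, which is adequate for a toy list; a full dictionary would more likely be indexed by wildcard patterns such as `c_ld`.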

15.
Fiez JA, Balota DA, Raichle ME, Petersen SE. Neuron, 1999, 24(1): 205-218
Functional neuroimaging was used to investigate three factors that affect reading performance: first, whether a stimulus is a word or pronounceable non-word (lexicality), second, how often a word is encountered (frequency), and third, whether the pronunciation has a predictable spelling-to-sound correspondence (consistency). Comparisons between word naming (reading) and visual fixation scans revealed stimulus-related activation differences in seven regions. A left frontal region showed effects of consistency and lexicality, indicating a role in orthographic to phonological transformation. Motor cortex showed an effect of consistency bilaterally, suggesting that motoric processes beyond high-level representations of word phonology influence reading performance. Implications for the integration of these results into theoretical models of word reading are discussed.

16.

Background

Genome sequences can be conceptualized as arrangements of motifs or words. The frequencies and positional distributions of these words within particular non-coding genomic segments provide important insights into how the words function in processes such as mRNA stability and regulation of gene expression.

Results

Using an enumerative word discovery approach, we investigated the frequencies and positional distributions of all 65,536 different 8-letter words in the genome of Arabidopsis thaliana. Focusing on promoter regions, introns, and 3' and 5' untranslated regions (3'UTRs and 5'UTRs), we compared word frequencies in these segments to genome-wide frequencies. The statistically interesting words in each segment were clustered with similar words to generate motif logos. We investigated whether words were clustered at particular locations or were distributed randomly within each genomic segment, and we classified the words using gene expression information from public repositories. Finally, we investigated whether particular sets of words appeared together more frequently than others.

Conclusion

Our studies provide a detailed view of the word composition of several segments of the non-coding portion of the Arabidopsis genome. Each segment contains a unique word-based signature. The respective signatures consist of the sets of enriched words, 'unwords', and word pairs within a segment, as well as the preferential locations and functional classifications for the signature words. Additionally, the positional distributions of enriched words within the segments highlight possible functional elements, and the co-associations of words in promoter regions likely represent the formation of higher order regulatory modules. This work is an important step toward fully cataloguing the functional elements of the Arabidopsis genome.
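
The enumerative approach in this abstract rests on simple arithmetic: a four-letter DNA alphabet gives 4^8 = 65,536 possible 8-letter words. The sketch below enumerates them, counts overlapping words in a sequence, and compares a segment against a genome-wide background. The abstract does not state which statistic was used to call a word enriched, so the plain ratio of relative frequencies here is an assumption, as are the function names.

```python
from collections import Counter
from itertools import product


def all_words(k, alphabet="ACGT"):
    """Enumerate every possible word of length k over the alphabet."""
    return ["".join(letters) for letters in product(alphabet, repeat=k)]


def count_words(seq, k):
    """Count overlapping words of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def enrichment(segment, genome, k):
    """Ratio of a word's relative frequency in the segment to that in the genome.

    Values above 1 mean the word is over-represented in the segment. Words
    never seen in the genome are skipped to avoid dividing by zero.
    """
    seg, gen = count_words(segment, k), count_words(genome, k)
    seg_total, gen_total = sum(seg.values()), sum(gen.values())
    return {
        word: (seg[word] / seg_total) / (gen[word] / gen_total)
        for word in seg
        if gen[word] > 0
    }


n_words = len(all_words(8))  # 4 ** 8 possible 8-letter words
ratios = enrichment("AAAA", "AAAACCCC", k=2)
```

Under this framing, the "unwords" of a segment would be the members of `all_words(k)` that never appear in `count_words(segment, k)`.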

17.
18.
On the basis of data from the synthetic and agglutinative South American language Wichi (Mataguayan, Argentina/Bolivia), I argue in favor of regarding interface phenomena as typological variables. In particular, in this paper I discuss what type of interactions these are, arguing that they do not affect wordhood but do contribute to its formation. I will defend the hypothesis that linguistic level interactions within the word are of two types and different in nature: overlapping on the one hand and conditioning and alteration on the other. Conditioning only takes place in morphophonological and morphosemantic interactions and it follows the wordhood requirements of the language. Conversely, the interaction of morphology with all linguistic levels shows overlapping of units: the phonological word and the grammatical word in the morphophonological relation; the word and the simple clause or nominal phrase in the morphosyntactic relation; and the word and the semantic unit in the morphosemantic relation. This explains why the word is generally defined by phonological, morphological, syntactic and semantic criteria. It is to be hoped that the conclusions arrived at in this paper will contribute to deepening our knowledge of the notion of wordhood in synthetic languages in South America as well as our understanding of language structure and functioning.

19.

Purpose

The purpose of the present study was to extend previous research by analyzing the ability of adults who stutter to use phonological working memory in conjunction with lexical access to perform a word jumble task.

Method

Forty English words consisting of 3-, 4-, 5-, and 6-letters (n = 10 per letter length category) were randomly jumbled using a web-based application. During the experimental task, 26 participants were asked to silently manipulate the scrambled letters to form a real word. Each vocal response was coded for accuracy and speech reaction time (SRT).
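
The stimulus preparation step can be sketched as below. The abstract says only that a web-based application jumbled the words at random, so this stand-in is an assumption: it shuffles the letters and rejects any result identical to the original word. The function name and the seed parameter are hypothetical.

```python
import random


def jumble(word, seed=None):
    """Return the letters of word in a random order that differs from the original."""
    rng = random.Random(seed)
    letters = list(word)
    if len(set(letters)) < 2:
        return word  # all letters identical, so no different arrangement exists
    while True:
        rng.shuffle(letters)
        candidate = "".join(letters)
        if candidate != word:
            return candidate


stimulus = jumble("planet", seed=1)
```

A shuffle can land on a different real word (for example, an anagram), so a stimulus set built this way would still need checking by hand.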

Results

Adults who stutter attempted to solve fewer word jumble stimuli than adults who do not stutter at the 4-letter, 5-letter, and 6-letter lengths. Additionally, adults who stutter were significantly less accurate solving word jumble tasks at the 4-letter, 5-letter, and 6-letter lengths compared to adults who do not stutter. At the longest word length (6-letter), SRT was significantly slower for the adults who stutter than the fluent controls.

Conclusion

Results of the current study lend further support to the notion that various aspects of phonological processing, including vision-to-sound conversions, sub-vocal stimulus manipulation, and/or lexical access, are compromised in adults who stutter.

20.
Oligonucleotide signatures, especially tetranucleotide signatures, have been used as a method for homology binning by exploiting an organism’s inherent biases towards the use of specific oligonucleotide words. Tetranucleotide signatures have been especially useful in environmental metagenomics samples as many of these samples contain organisms from poorly classified phyla which cannot be easily identified using traditional homology methods, including NCBI BLAST. This study examines oligonucleotide signatures across 1,424 completed genomes from across the tree of life, substantially expanding upon previous work. A comprehensive analysis of mononucleotide through nonanucleotide word lengths suggests that longer word lengths substantially improve the classification of DNA fragments across a range of sizes of relevance to high throughput sequencing. We find that, at present, heptanucleotide signatures represent an optimal balance between prediction accuracy and computational time for resolving taxonomy using both genomic and metagenomic fragments. We directly compare the ability of tetranucleotide and heptanucleotide word lengths (tetranucleotide signatures are the current standard for oligonucleotide word usage analyses) for taxonomic binning of metagenome reads. We present evidence that heptanucleotide word lengths consistently provide more taxonomic resolving power, particularly in distinguishing between closely related organisms that are often present in metagenomic samples. This implies that longer oligonucleotide word lengths should replace tetranucleotide signatures for most analyses. Finally, we show that the application of longer word lengths to metagenomic datasets leads to more accurate taxonomic binning of DNA scaffolds and has the potential to substantially improve taxonomic assignment and assembly of metagenomic data.
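
As a rough illustration of signature-based binning, the sketch below turns a sequence into a fixed-length vector of relative word frequencies and assigns a fragment to the reference whose vector is closest. The abstract does not say which classifier or distance the study used, so the Euclidean nearest-reference rule, the function names, and the toy sequences are all assumptions.

```python
import math
from itertools import product


def signature(seq, k):
    """Relative frequency of every possible DNA word of length k, in a fixed order.

    Windows containing characters other than A, C, G, T (such as N) are skipped.
    """
    words = ["".join(letters) for letters in product("ACGT", repeat=k)]
    counts = dict.fromkeys(words, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in words]


def nearest_reference(fragment, references, k):
    """Name of the reference whose signature is closest to the fragment's."""
    frag_sig = signature(fragment, k)
    return min(
        references,
        key=lambda name: math.dist(frag_sig, signature(references[name], k)),
    )


references = {"at_rich": "ATATATATATATATAT", "gc_rich": "GCGCGCGCGCGCGCGC"}
assigned = nearest_reference("TATATATA", references, k=4)
```

The vector length grows as 4^k (256 entries for tetranucleotides, 16,384 for heptanucleotides), which is one way to see the trade-off between accuracy and computing time that the abstract describes.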
