Similar literature
Found 20 similar documents (search time: 31 ms)
1.

Background

Word frequency is the most important variable in language research. However, despite the growing interest in the Chinese language, there are only a few sources of word frequency measures available to researchers, and the quality is less than what researchers in other languages are used to.

Methodology

Following recent work by New, Brysbaert, and colleagues in English, French and Dutch, we assembled a database of word and character frequencies based on a corpus of film and television subtitles (46.8 million characters, 33.5 million words). In line with what has been found in the other languages, the new word and character frequencies explain significantly more of the variance in Chinese word naming and lexical decision performance than measures based on written texts.

Conclusions

Our results confirm that word frequencies based on subtitles are a good estimate of daily language exposure and capture much of the variance in word processing efficiency. In addition, our database is the first to include information about the contextual diversity of the words and to provide good frequency estimates for multi-character words and the different syntactic roles in which the words are used. The word frequencies are freely available for research purposes.
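As an illustration of the measures this abstract names, the sketch below computes word counts, frequency per million tokens, and contextual diversity from a set of subtitle files. It is not the authors' pipeline: the function name `frequency_measures`, the toy data, and the reading of contextual diversity as the share of films containing a word are all assumptions made here, and the input is assumed to be already segmented into words, which for Chinese text requires a separate segmentation step.

```python
from collections import Counter


def frequency_measures(documents):
    """Compute simple frequency measures from pre-segmented subtitle files.

    documents: list of token lists, one list per film or episode
    (hypothetical input format, assumed to be already word-segmented).
    """
    word_counts = Counter()  # total occurrences of each word
    doc_counts = Counter()   # number of documents containing each word
    for tokens in documents:
        word_counts.update(tokens)
        doc_counts.update(set(tokens))  # count each word once per document

    total_tokens = sum(word_counts.values())
    n_docs = len(documents)
    return {
        word: {
            "count": count,
            "per_million": count * 1_000_000 / total_tokens,
            # Assumption: contextual diversity as the share of documents
            # in which the word occurs at least once.
            "contextual_diversity": doc_counts[word] / n_docs,
        }
        for word, count in word_counts.items()
    }


# Toy, invented data: four tiny "films", three tokens each.
subtitles = [
    ["我", "喜欢", "电影"],
    ["我", "喜欢", "你"],
    ["电影", "很", "好"],
    ["我", "很", "好"],
]
measures = frequency_measures(subtitles)
print(measures["我"])
```

Character frequencies could be obtained the same way by passing lists of characters instead of lists of words.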

2.
Alcohol abuse is closely connected with so much hurt and pain in northern communities that it had to be addressed in this session. Much of what is done in the way of prevention and treatment of alcohol abuse originates from outside indigenous cultures. However, many Native people have either remained sober or become sober without ever going into a formal treatment program. Ironically, until very recently, little research effort has gone into understanding the backgrounds and attitudes of this population. “The People Awakening Project,” a collaborative effort between a group of Alaska Natives interested in sobriety and the University of Alaska Fairbanks, has changed that. Although the project is not finished, this presentation provides a clear sense of how the research is being conducted, what kinds of data are emerging from it, and what some of the preliminary results look like. Chase Hensel gave the original presentation in Quebec City. Svenne Haakenson and Gerry Mohatt, who are heavily involved in the project, join him in authoring this written version.

3.
When analyzed statistically, the distribution of verbal elements in some written sources, like that of objects in archeological sites, yields culture-historical data independent of artifacts' intended meaning. A 1559 case study from highland Ecuador uses personal names and their parts to detect cultural differences in aboriginal society and their changing relation to superordinate Inca culture. This can be done even with undeciphered names from an extinct language, suggesting a fortiori that the method is generalizable.

4.
5.
Languages, like genes, evolve by a process of descent with modification. This striking similarity between biological and linguistic evolution allows us to apply phylogenetic methods to explore how languages, as well as the people who speak them, are related to one another through evolutionary history. Language phylogenies constructed with lexical data have so far revealed population expansions of Austronesian, Indo-European and Bantu speakers. However, how robustly a phylogenetic approach can chart the history of language evolution and what language phylogenies reveal about human prehistory must be investigated more thoroughly on a global scale. Here we report a phylogeny of 59 Japonic languages and dialects. We used this phylogeny to estimate time depth of its root and compared it with the time suggested by an agricultural expansion scenario for Japanese origin. In agreement with the scenario, our results indicate that Japonic languages descended from a common ancestor approximately 2182 years ago. Together with archaeological and biological evidence, our results suggest that the first farmers of Japan had a profound impact on the origins of both people and languages. On a broader level, our results are consistent with a theory that agricultural expansion is the principal factor for shaping global linguistic diversity.

6.
This paper focuses on what electrical and magnetic recordings of human brain activity reveal about spoken language understanding. Based on the high temporal resolution of these recordings, a fine-grained temporal profile of different aspects of spoken language comprehension can be obtained. Crucial aspects of speech comprehension are lexical access, selection and semantic integration. Results show that for words spoken in context, there is no 'magic moment' when lexical selection ends and semantic integration begins. Irrespective of whether words have early or late recognition points, semantic integration processing is initiated before words can be identified on the basis of the acoustic information alone. Moreover, for one particular event-related brain potential (ERP) component (the N400), equivalent impact of sentence- and discourse-semantic contexts is observed. This indicates that in comprehension, a spoken word is immediately evaluated relative to the widest interpretive domain available. In addition, this happens very quickly. Findings are discussed that show that often an unfolding word can be mapped onto discourse-level representations well before the end of the word. Overall, the time course of the ERP effects is compatible with the view that the different information types (lexical, syntactic, phonological, pragmatic) are processed in parallel and influence the interpretation process incrementally, that is as soon as the relevant pieces of information are available. This is referred to as the immediacy principle.

7.

Background

Due to the rapidly expanding body of biomedical literature, biologists require increasingly sophisticated and efficient systems to help them to search for relevant information. Such systems should account for the multiple written variants used to represent biomedical concepts, and allow the user to search for specific pieces of knowledge (or events) involving these concepts, e.g., protein-protein interactions. Such functionality requires access to detailed information about words used in the biomedical literature. Existing databases and ontologies often have a specific focus and are oriented towards human use. Consequently, biological knowledge is dispersed amongst many resources, which often do not attempt to account for the large and frequently changing set of variants that appear in the literature. Additionally, such resources typically do not provide information about how terms relate to each other in texts to describe events.

Results

This article provides an overview of the design, construction and evaluation of a large-scale lexical and conceptual resource for the biomedical domain, the BioLexicon. The resource can be exploited by text mining tools at several levels, e.g., part-of-speech tagging, recognition of biomedical entities, and the extraction of events in which they are involved. As such, the BioLexicon must account for real usage of words in biomedical texts. In particular, the BioLexicon gathers together different types of terms from several existing data resources into a single, unified repository, and augments them with new term variants automatically extracted from biomedical literature. Extraction of events is facilitated through the inclusion of biologically pertinent verbs (around which events are typically organized) together with information about typical patterns of grammatical and semantic behaviour, which are acquired from domain-specific texts. In order to foster interoperability, the BioLexicon is modelled using the Lexical Markup Framework, an ISO standard.

Conclusions

The BioLexicon contains over 2.2 M lexical entries and over 1.8 M terminological variants, as well as over 3.3 M semantic relations, including over 2 M synonymy relations. Its exploitation can benefit both application developers and users. We demonstrate some such benefits by describing integration of the resource into a number of different tools, and evaluating improvements in performance that this can bring.

8.
9.

Background

Normal reading requires eye guidance and activation of lexical representations so that words in text can be identified accurately. However, little is known about how the visual content of text supports eye guidance and lexical activation, and thereby enables normal reading to take place.

Methods and Findings

To investigate this issue, we examined eye movement performance when reading sentences displayed as normal and when the spatial frequency content of text was filtered to contain just one of 5 types of visual content: very coarse, coarse, medium, fine, and very fine. The effect of each type of visual content specifically on lexical activation was assessed using a target word of either high or low lexical frequency embedded in each sentence.

Results

No type of visual content produced normal eye movement performance but eye movement performance was closest to normal for medium and fine visual content. However, effects of lexical frequency emerged early in the eye movement record for coarse, medium, fine, and very fine visual content, and were observed in total reading times for target words for all types of visual content.

Conclusion

These findings suggest that while the orchestration of multiple scales of visual content is required for normal eye-guidance during reading, a broad range of visual content can activate processes of word identification independently. Implications for understanding the role of visual content in reading are discussed.

10.
In the course of interviews with Israeli women who had recently been treated for breast cancer, we found that our informants tended to offer us "treatment narratives" rather than, or sometimes in addition to, the "illness narratives" made famous by Arthur Kleinman. For the women we interviewed, treatment narratives constitute verbal platforms on which to explore what it means to be human during a period in which one's body, spirit, and social identity are undergoing intense transformations. A central theme in these narratives is the Hebrew word yachas, loosely translated as "attitude," "attention," or "relationship." The women consistently contrasted the good yachas of medical staff who treated them "like humans" or like "real friends" with the bad yachas of staff who treated them like numbers, machines, or strangers. We argue that the women used language (in various contexts) as a means of resisting the medical culture's pattern of treating patients as "nonhumans."

11.
12.
What should human languages be like if humans are the products of Darwinian evolution? Between Darwin's day and our own, expectations about evolution's imprint on language have changed dramatically. It is now a commonplace that, for good Darwinian reasons, no language is more highly evolved than any other. But Darwin, in The descent of man, defended the opposite view: different languages, like the peoples speaking them, are higher or lower in an evolutionarily generated scale. This paper charts some of the changes in the Darwinian tradition that transformed the notion of human linguistic equality from creationist heresy to evolutionist orthodoxy. Darwin's position in particular is considered in detail, for there is disagreement about what it was, and about the bearing of a famous paragraph in the Descent comparing languages and species.

13.
We present substantial evidence that word length can be an essential lexical structural feature for word evolution in written Chinese. The data used in this study are diachronic Chinese short narrative texts with a time span of over 2000 years. We show that the increase of word length is an essential regularity in word evolution. On the one hand, word frequency is found to depend on word length, and their relation is in line with the power-law function y = ax^(-b). On the other hand, our deeper analyses show that the increase of word length results in the simplification of characters for balance in written Chinese. Moreover, the correspondence between written and spoken Chinese is discussed. We conclude that the disyllabic trend may account for the increase of word length, and its impacts can be explained by "the principle of least effort".
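To make the power-law relation between word frequency and word length concrete, the sketch below estimates a and b in y = ax^(-b) by ordinary least squares on log-transformed values. The fitting procedure is an assumption chosen for illustration, since the abstract does not say how the authors fitted their curve, and the function name `fit_power_law` and the data are invented.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**(-b) by least squares in log-log space.

    Taking logs gives log(y) = log(a) - b * log(x), a straight line
    whose slope is -b and whose intercept is log(a).
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


# Invented data: frequencies generated exactly from a = 1000, b = 2,
# so the fit should recover those values up to rounding error.
lengths = [1, 2, 3, 4]
freqs = [1000 * length ** -2.0 for length in lengths]
a, b = fit_power_law(lengths, freqs)
print(a, b)
```

With real corpus counts the points will not lie exactly on a line, and a log-log fit weights small frequencies more heavily than a direct nonlinear fit would.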

14.
As more U.S. youth claim "mixed" heritages, some adults are proposing to erase race words altogether from the nation's inequality analysis. Yet such proposals, as detailed ethnography shows, ignore the complex realities of continuing racialized practice. At an urban California high school in the 1990s, "mixed" youth strategically employed simple "race" categories to describe themselves and inequality orders, even as they regularly challenged these very labels' accuracy. In so "bending" race categories, these youth modeled a practical and theoretical strategy crucial for dealing thoughtfully with race in 21st century America. [race, youth, youth culture, discourse, language]

15.
To what extent do phonological codes constrain orthographic output in handwritten production? We investigated how phonological codes constrain the selection of orthographic codes via sublexical and lexical routes in Chinese written production. Participants wrote down picture names in a picture-naming task in Experiment 1 or response words in a symbol-word associative writing task in Experiment 2. A sublexical phonological property of picture names (phonetic regularity: regular vs. irregular) in Experiment 1 and a lexical phonological property of response words (homophone density: dense vs. sparse) in Experiment 2, as well as word frequency of the targets in both experiments, were manipulated. A facilitatory effect of word frequency was found in both experiments, in which words with high frequency were produced faster than those with low frequency. More importantly, we observed an inhibitory phonetic regularity effect, in which low-frequency picture names with regular first characters were slower to write than those with irregular ones, and an inhibitory homophone density effect, in which characters with dense homophone density were produced more slowly than those with sparse homophone density. Results suggested that phonological codes constrained handwritten production via lexical and sublexical routes.

16.
The language of ecosystem science is pervaded by value-laden terms such as pristine, fragile, disturbance, balance, dominance and alien species. Such terms have high status and are often used in the rhetoric of the conservation ethic. Here, I consider the possibility of the use of less value-laden terms such as change, increase, decrease and so on. This would distinguish between values and perceived trends or states and leave ecosystem science to deal with what is verifiable. However, I also consider the opposite point of view, in that the value-laden terms, like 'the balance of nature', relate to how a wide range of people feel about nature and are effective emotive motivators of the conservation ethic in society, providing a common language for a discourse between ecosystem scientists and other people.

17.
Many of the world's aboriginal peoples are currently engaged in struggles over land and self-government with the states that encompass them. In Canada, aboriginal people have effectively used the concept of "aboriginal title" to force the government to negotiate land and self-government agreements with them. Such agreements, however, along with the notion of "aboriginal title" itself, are based on the European concept of "property"; they grant First Nations "ownership" of certain lands and spell out the rights they possess in relation to those lands. This means that aboriginal people have had to learn to think and speak in the "language of property" as a precondition for even engaging government officials in a dialogue over land and sovereignty. Yet the concept of property is in many ways incompatible with many Canadian First Nation people's views about proper human-animal/land relations. In this article, I argue that the land claim process, because it forces aboriginal people to think and speak in the language of property, tends to undermine the very beliefs and practices that a land claim agreement is meant to preserve. [Key words: property, First Nations, aboriginal land claims, Canada, Subarctic]

18.
Skilled sentence production involves distinct stages of message conceptualization (deciding what to talk about) and message formulation (deciding how to talk about it). Eye-movement paradigms provide a mechanism for observing how speakers accomplish these aspects of production in real time. These methods have recently been applied to children with autism spectrum disorder (ASD) and specific language impairment (LI) in an effort to reveal qualitative differences between groups in sentence production processes. Findings support a multiple-deficit account in which language production is influenced not only by lexical and syntactic constraints, but also by variation in attention control, inhibition and social competence. Thus, children with ASD are especially vulnerable to atypical patterns of visual inspection and verbal utterance. The potential to influence attentional focus and prime appropriate language structures is considered as a mechanism for facilitating language adaptation and learning.

19.
This paper presents results from a corpus-based study investigating lexical variation in BSL. An earlier study investigating variation in BSL numeral signs found that younger signers were using a decreasing variety of regionally distinct variants, suggesting that levelling may be taking place. Here, we report findings from a larger investigation looking at regional lexical variants for colours, countries, numbers and UK placenames elicited as part of the BSL Corpus Project. Age, school location and language background were significant predictors of lexical variation, with younger signers using a more levelled variety. This change appears to be happening faster in particular sub-groups of the deaf community (e.g., signers from hearing families). Also, we find that for the names of some UK cities, signers from outside the region use a different sign than those who live in the region.

20.
This article aims at investigating the linguistic criteria to determine what a word is in Wichi (Matacoan), a polysynthetic and agglutinative language spoken in the Gran Chaco Region, in South America. The main phonological criteria proposed are phonological rules and stress. We also apply some grammatical criteria that have been proposed cross-linguistically, some of which are useful to determine the boundaries of grammatical words in Wichi. Finally, we explore the relationship between the phonological and grammatical word with the written word. We base our analysis of written words on a textbook (Tsalanawu) used in many bilingual schools in Northeastern Argentina.
