Similar literature
1.
A comparison of words in the Akarimojong and Gaelic languages reveals a large number of similar words with related meanings. Not only are there many isolated words in common between the two languages, but there are also many thematic words related to cultural activities such as construction of dwellings, animal husbandry, cultivation, food processing and personal adornment. Extension of the research reveals the existence of Akarimojong words with similar meanings to words or phrases in Hebrew, Sumerian, Akkadian, Spanish and Tibetan, among other languages. The geographic spread of these modern languages and the extensive commonalities between thematic word lists suggest that the various languages formed a single entity some time in the past, possibly as early as the Late Pleistocene.

2.
Signed languages exhibit iconicity (resemblance between form and meaning) across their vocabulary, and many non-Indo-European spoken languages feature sizable classes of iconic words known as ideophones. In comparison, Indo-European languages like English and Spanish are believed to be arbitrary outside of a small number of onomatopoeic words. In three experiments with English and two with Spanish, we asked native speakers to rate the iconicity of ~600 words from the English and Spanish MacArthur-Bates Communicative Developmental Inventories. We found that iconicity in the words of both languages varied in a theoretically meaningful way with lexical category. In both languages, adjectives were rated as more iconic than nouns and function words, and corresponding to typological differences between English and Spanish in verb semantics, English verbs were rated as relatively iconic compared to Spanish verbs. We also found that both languages exhibited a negative relationship between iconicity ratings and age of acquisition. Words learned earlier tended to be more iconic, suggesting that iconicity in early vocabulary may aid word learning. Altogether these findings show that iconicity is a graded quality that pervades vocabularies of even the most “arbitrary” spoken languages. The findings provide compelling evidence that iconicity is an important property of all languages, signed and spoken, including Indo-European languages.

3.
We present data from 17 languages on the frequency with which a common set of words is used in everyday language. The languages are drawn from six language families representing 65 per cent of the world's 7000 languages. Our data were collected from linguistic corpora that record frequencies of use for the 200 meanings in the widely used Swadesh fundamental vocabulary. Our interest is to assess evidence for shared patterns of language use around the world, and for the relationship of language use to rates of lexical replacement, defined as the replacement of a word by a new unrelated or non-cognate word. Frequencies of use for words in the Swadesh list range from just a few per million words of speech to 191 000 or more. The average inter-correlation among languages in the frequency of use across the 200 words is 0.73 (p < 0.0001). The first principal component of these data accounts for 70 per cent of the variance in frequency of use. Elsewhere, we have shown that frequently used words in the Indo-European languages tend to be more conserved, and that this relationship holds separately for different parts of speech. A regression model combining the principal factor loadings derived from the worldwide sample along with their part of speech predicts 46 per cent of the variance in the rates of lexical replacement in the Indo-European languages. This suggests that Indo-European lexical replacement rates might be broadly representative of worldwide rates of change. Evidence for this speculation comes from using the same factor loadings and part-of-speech categories to predict a word's position in a list of 110 words ranked from slowest to most rapidly evolving among 14 of the world's language families. This regression model accounts for 30 per cent of the variance. 
Our results point to a remarkable regularity in the way that human speakers use language, and hint that the words for some of a shared set of meanings have been slowly evolving and others more rapidly evolving throughout human history.

4.
Statistical studies of languages have focused on the rank-frequency distribution of words. Instead, we introduce here a measure of how word ranks change in time and call this distribution rank diversity. We calculate this diversity for books published in six European languages since 1800, and find that it follows a universal lognormal distribution. Based on the mean and standard deviation associated with the lognormal distribution, we define three different word regimes of languages: “heads” consist of words which hardly change their rank in time, “bodies” are words of general use, while “tails” comprise context-specific words and vary their rank considerably in time. The heads and bodies reflect the size of language cores identified by linguists for basic communication. We propose a Gaussian random walk model which reproduces the rank variation of words in time and thus the diversity. Rank diversity of words can be understood as the result of random variations in rank, where the size of the variation depends on the rank itself. We find that the core size is similar for all languages studied.
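The abstract does not give a formula for rank diversity. One plausible reading, assumed here purely for illustration, is that the diversity of rank k is the number of distinct words that occupy rank k across the time slices, divided by the number of slices. The function name and the toy counts below are hypothetical and are not taken from the paper.

```python
def rank_diversity(counts_by_slice):
    """Assumed definition: for each rank k, the number of distinct words
    seen at rank k across all time slices, divided by the number of slices.

    counts_by_slice: list of dicts mapping word -> frequency, one per slice.
    """
    words_at_rank = {}
    for counts in counts_by_slice:
        # Rank by descending frequency; break ties alphabetically so the
        # result is deterministic.
        ranked = sorted(counts, key=lambda w: (-counts[w], w))
        for rank, word in enumerate(ranked, start=1):
            words_at_rank.setdefault(rank, set()).add(word)
    n_slices = len(counts_by_slice)
    return {rank: len(ws) / n_slices for rank, ws in words_at_rank.items()}


# Hypothetical toy data: three time slices with four words each.
toy = [
    {"the": 100, "of": 60, "cat": 5, "dog": 4},
    {"the": 120, "of": 70, "dog": 9, "cat": 3},
    {"the": 90, "of": 50, "sun": 8, "cat": 7},
]
diversity = rank_diversity(toy)
for rank in sorted(diversity):
    print(rank, diversity[rank])
```

In this toy input the top two ranks are always held by the same word (low diversity, like the "heads"), while the lower ranks change hands between slices (higher diversity, like the "tails"). The sketch assumes every slice ranks the same number of words.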

5.
The Homeric epics are among the greatest masterpieces of literature, but when they were produced is not known with certainty. Here we apply evolutionary‐linguistic phylogenetic statistical methods to differences in Homeric, Modern Greek and ancient Hittite vocabulary items to estimate a date of approximately 710–760 BCE for these great works. Our analysis compared a common set of vocabulary items among the three pairs of languages, recording for each item whether the words in the two languages were cognate – derived from a shared ancestral word – or not. We then used a likelihood‐based Markov chain Monte Carlo procedure to estimate the most probable times in years separating these languages given the percentage of words they shared, combined with knowledge of the rates at which different words change. Our date for the epics is in close agreement with historians' and classicists' beliefs derived from historical and archaeological sources.

6.
7.
Recovering discrete words from continuous speech is one of the first challenges facing language learners. Infants and adults can make use of the statistical structure of utterances to learn the forms of words from unsegmented input, suggesting that this ability may be useful for bootstrapping language-specific cues to segmentation. It is unknown, however, whether performance shown in small-scale laboratory demonstrations of “statistical learning” can scale up to allow learning of the lexicons of natural languages, which are orders of magnitude larger. Artificial language experiments with adults can be used to test whether the mechanisms of statistical learning are in principle scalable to larger lexicons. We report data from a large-scale learning experiment that demonstrates that adults can learn words from unsegmented input in much larger languages than previously documented and that they retain the words they learn for years. These results suggest that statistical word segmentation could be scalable to the challenges of lexical acquisition in natural language learning.
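The abstract does not say which statistic learners are assumed to track. A common operationalization in the statistical-learning literature is the transitional probability between adjacent syllables, with word boundaries placed where that probability dips. The sketch below illustrates that idea under this assumption; the three-word artificial language, the function names, and the threshold are all hypothetical and are not the experiment's materials.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Estimate P(next | current) for each adjacent syllable pair."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # Count each syllable only where it has a successor.
    first_counts = Counter(syllables[:-1])
    return {pair: c / first_counts[pair[0]] for pair, c in pair_counts.items()}


def segment(syllables, threshold):
    """Place a word boundary wherever the transitional probability
    between two adjacent syllables falls below `threshold`."""
    tp = transitional_probabilities(syllables)
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tp[(a, b)] < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words


# Hypothetical unsegmented stream built from "golabu", "padoti", "bidaku".
stream = ("go la bu pa do ti go la bu bi da ku "
          "pa do ti bi da ku go la bu").split()
tp = transitional_probabilities(stream)
print(segment(stream, threshold=0.75))
```

Within-word transitions in this toy stream are fully predictable, while transitions across word boundaries are not, so a single threshold recovers the three words. Natural-language input is far noisier, which is exactly the scaling question the abstract raises.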

8.
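Because Chinese characters have a square written structure and form-mapping rules that differ from those of alphabetic languages, many studies have examined whether the two kinds of language involve different reading mechanisms, but the question is still under debate. This study used functional magnetic resonance imaging (fMRI), with real characters (words), pseudo-characters (pseudowords) and non-characters (nonwords) in Chinese and English as stimuli, to investigate the question further. The results show that for high-frequency characters (words), Chinese and English share similar reading mechanisms, but the mechanisms differ greatly when reading pseudo-characters (pseudowords). Specifically, English pseudowords activated the left supramarginal gyrus, whereas Chinese pseudo-characters activated the left middle frontal gyrus. The results suggest that: 1) Chinese-English bilinguals may use two different dual-route mechanisms to read Chinese characters and English words; 2) when reading English pseudowords, Chinese-English bilinguals must rely on the supramarginal gyrus for form conversion, whereas reading Chinese pseudo-characters requires stroke analysis through the middle frontal gyrus.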

9.
It is a well-known fact that languages react differently when foreign words denoting new concepts have to be integrated into the native system. The procedure mostly depends on the degree of purism present in a linguistic community: some languages are rather open to foreign influences and do not demonstrate any special hostility towards new words, which are easily accepted and adapted to the phonological and morphological systems of the receiving language. Languages that have a strong puristic tradition usually channel their borrowings into the loan translation field, using internal word formation resources as a means of creating neologisms. Regardless of whether they are built of native elements or appear as loans, neologisms are necessarily the result of linguistic changes.

10.
The suffixes of the nominal declension in the Old Canary and Etruscan languages are very similar to the corresponding elements of the Sumerian and Ural-Altaic tongues. Many words of funerary and generally cultic provenance are also derived from common roots in these languages. One may therefore assume that the Indo-European tongues of (West) Europe overlaid a common substratum of Ural-Altaic type that was still alive in the Megalithic period.

11.
The claim that Eskimo languages have words for different types of snow is well-known among the public, but has been greatly exaggerated through popularization and is therefore viewed with skepticism by many scholars of language. Despite the prominence of this claim, to our knowledge the line of reasoning behind it has not been tested broadly across languages. Here, we note that this reasoning is a special case of the more general view that language is shaped by the need for efficient communication, and we empirically test a variant of it against multiple sources of data, including library reference works, Twitter, and large digital collections of linguistic and meteorological data. Consistent with the hypothesis of efficient communication, we find that languages that use the same linguistic form for snow and ice tend to be spoken in warmer climates, and that this association appears to be mediated by lower communicative need to talk about snow and ice. Our results confirm that variation in semantic categories across languages may be traceable in part to local communicative needs. They suggest moreover that despite its awkward history, the topic of “words for snow” may play a useful role as an accessible instance of the principle that language supports efficient communication.

12.
A survey is given of the evolution of the idea of time in mankind's thinking, from its beginnings to the term's application in the sciences and in philosophy. As some languages, living as well as extinct, show, the words for time are derived etymologically from several roots or stems, which mostly represent different meanings. But with increasing abstraction in all civilized languages, the process of stripping the different words of their concrete accompaniments led to a narrowing of the diverse meanings, which converged towards the common understanding of time in modern sciences. Nevertheless, time is not an unequivocal term, as linguistic and mathematical analysis can show. In particular, by means of the theory of differential equations and set theory, the chimerical nature of time is demonstrable, so that time is only an abstraction of abstractions.

13.
Language origins and diversification are vital for mapping human history. Traditionally, the reconstruction of language trees has been based on cognate forms among related languages, with ancestral protolanguages inferred by individual investigators. Disagreement among competing authorities is typically extensive, without empirical grounds for resolving alternative hypotheses. Here, we apply analytical methods derived from DNA sequence optimization algorithms to Uto‐Aztecan languages, treating words as sequences of sounds. Our analysis yields novel relationships and suggests a resolution to current conflicts about the Proto‐Uto‐Aztecan homeland. The techniques used for Uto‐Aztecan are applicable to written and unwritten languages, and should enable more empirically robust hypotheses of language relationships, language histories, and linguistic evolution.

14.
Text tokenization is a fundamental pre-processing step for almost all information processing applications. The task is nontrivial for resource-scarce languages such as Urdu, because spaces between words are used inconsistently. In this paper, a morpheme-matching approach is proposed for Urdu text tokenization, along with other algorithms that address the additional issues of boundary detection for compound words, affixation, reduplication, names and abbreviations. The study achieved 97.28% precision, 93.71% recall, and 95.46% F1-measure when tokenizing a corpus of 57000 words using a morpheme list with 6400 entries.
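The three reported scores can be checked against each other, assuming the F1-measure here is the usual harmonic mean of precision and recall (the abstract does not state the formula). The helper name below is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units for both)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
f1 = f1_score(97.28, 93.71)
print(round(f1, 2))  # 95.46, matching the reported F1-measure
```

Under that assumption the reported figures are mutually consistent.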

15.
On the evolutionary trajectory that led to human language there must have been a transition from a fairly limited to an essentially unlimited communication system. The structure of modern human languages reveals at least two steps that are required for such a transition: in all languages (i) a small number of phonemes are used to generate a large number of words; and (ii) a large number of words are used to produce an unlimited number of sentences. The first (and simpler) step is the topic of the current paper. We study the evolution of communication in the presence of errors and show that this limits the number of objects (or concepts) that can be described by a simple communication system. The evolutionary optimum is achieved by using only a small number of signals to describe a few valuable concepts. Adding more signals does not increase the fitness of a language. This represents an error limit for the evolution of communication. We show that this error limit can be overcome by combining signals (phonemes) into words. The transition from an analogue to a digital system was a necessary step toward the evolution of human language.

16.
In all European countries, the eighteenth century was characterised by efforts to improve the vernaculars. The Transylvanian case study shows how both codified medical language and ordinary language were constructed and enriched by a large number of medical books and brochures. The publication of medical literature in Central European vernacular languages in order to popularise new medical knowledge was a comprehensive programme, designed on the one hand by intellectual, political and religious elites who urged the improvement of the fatherland and the promotion of the common good by perfecting the arts and sciences. On the other hand, the imperial administration's initiatives affected local forms of medical knowledge and the construction of vernacular languages. In the eighteenth century, the construction of vernacular languages in the Habsburg Monarchy took on a significant political character. However, in the process of building the scientific and medical vocabulary, the main preoccupation was the precision, clarity and accessibility of the neologisms being invented to encompass the medical phenomena being described. In spite of political conflicts among the 'nations' living in Transylvania, physicians borrowed words from German, Hungarian and Romanian. Thus they elevated several words used in everyday language to the upper social stratum of language use, leading to the invention of new terms to describe particular medical practices or phenomena.

17.
Past research has demonstrated cross-linguistic, cross-modal, and task-dependent differences in neighborhood density effects, indicating a need to control for neighborhood variables when developing and interpreting research on language processing. The goals of the present paper are two-fold: (1) to introduce CLEARPOND (Cross-Linguistic Easy-Access Resource for Phonological and Orthographic Neighborhood Densities), a centralized database of phonological and orthographic neighborhood information, both within and between languages, for five commonly-studied languages: Dutch, English, French, German, and Spanish; and (2) to show how CLEARPOND can be used to compare general properties of phonological and orthographic neighborhoods across languages. CLEARPOND allows researchers to input a word or list of words and obtain phonological and orthographic neighbors, neighborhood densities, mean neighborhood frequencies, word lengths by number of phonemes and graphemes, and spoken-word frequencies. Neighbors can be defined by substitution, deletion, and/or addition, and the database can be queried separately along each metric or summed across all three. Neighborhood values can be obtained both within and across languages, and outputs can optionally be restricted to neighbors of higher frequency. To enable researchers to more quickly and easily develop stimuli, CLEARPOND can also be searched by features, generating lists of words that meet precise criteria, such as a specific range of neighborhood sizes, lexical frequencies, and/or word lengths. CLEARPOND is freely-available to researchers and the public as a searchable, online database and for download at http://clearpond.northwestern.edu.
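The three neighbor definitions named in the abstract (substitution, deletion, addition of a single symbol) can be illustrated with a small orthographic example. This is only a sketch of the general idea and is not CLEARPOND's interface or implementation; the function name, the toy lexicon, and the restriction to lowercase ASCII letters are assumptions made for the example.

```python
import string


def orthographic_neighbors(word, lexicon, alphabet=string.ascii_lowercase):
    """Return lexicon entries that differ from `word` by one letter,
    grouped by the kind of single-letter edit that produces them."""
    subs, dels, adds = set(), set(), set()
    for i in range(len(word)):
        # Deletion: drop the letter at position i.
        dels.add(word[:i] + word[i + 1:])
        # Substitution: replace the letter at position i.
        for ch in alphabet:
            if ch != word[i]:
                subs.add(word[:i] + ch + word[i + 1:])
    for i in range(len(word) + 1):
        # Addition: insert a letter at position i.
        for ch in alphabet:
            adds.add(word[:i] + ch + word[i:])
    lex = set(lexicon) - {word}
    return {
        "substitution": sorted(subs & lex),
        "deletion": sorted(dels & lex),
        "addition": sorted(adds & lex),
    }


# Hypothetical toy lexicon.
toy_lexicon = ["cat", "cot", "cut", "at", "cart", "chat", "dog", "bat"]
result = orthographic_neighbors("cat", toy_lexicon)
density = sum(len(v) for v in result.values())
print(result, density)
```

Keeping the three edit types separate mirrors the option, described in the abstract, of querying each metric on its own or summing across all three to get a total neighborhood density.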

18.
Language evolution is traditionally described in terms of family trees with ancestral languages splitting into descendent languages. However, it has long been recognized that language evolution also entails horizontal components, most commonly through lexical borrowing. For example, the English language was heavily influenced by Old Norse and Old French; eight per cent of its basic vocabulary is borrowed. Borrowing is a distinctly non-tree-like process--akin to horizontal gene transfer in genome evolution--that cannot be recovered by phylogenetic trees. Here, we infer the frequency of hidden borrowing among 2346 cognates (etymologically related words) of basic vocabulary distributed across 84 Indo-European languages. The dataset includes 124 (5%) known borrowings. Applying the uniformitarian principle to inventory dynamics in past and present basic vocabularies, we find that 1373 (61%) of the cognates have been affected by borrowing during their history. Our approach correctly identified 117 (94%) known borrowings. Reconstructed phylogenetic networks that capture both vertical and horizontal components of evolutionary history reveal that, on average, eight per cent of the words of basic vocabulary in each Indo-European language were involved in borrowing during evolution. Basic vocabulary is often assumed to be relatively resistant to borrowing. Our results indicate that the impact of borrowing is far more widespread than previously thought.

19.

Background

Word frequency is the most important variable in language research. However, despite the growing interest in the Chinese language, there are only a few sources of word frequency measures available to researchers, and the quality is less than what researchers in other languages are used to.

Methodology

Following recent work by New, Brysbaert, and colleagues in English, French and Dutch, we assembled a database of word and character frequencies based on a corpus of film and television subtitles (46.8 million characters, 33.5 million words). In line with what has been found in the other languages, the new word and character frequencies explain significantly more of the variance in Chinese word naming and lexical decision performance than measures based on written texts.

Conclusions

Our results confirm that word frequencies based on subtitles are a good estimate of daily language exposure and capture much of the variance in word processing efficiency. In addition, our database is the first to include information about the contextual diversity of the words and to provide good frequency estimates for multi-character words and the different syntactic roles in which the words are used. The word frequencies are freely available for research purposes.

20.
We propose a model that explains the reliable emergence of power laws (e.g., Zipf’s law) during the development of different human languages. The model incorporates the principle of least effort in communications, minimizing a combination of the information-theoretic communication inefficiency and direct signal cost. We prove a general relationship, for all optimal languages, between the signal cost distribution and the resulting distribution of signals. Zipf’s law then emerges for logarithmic signal cost distributions, which is the cost distribution expected for words constructed from letters or phonemes.
