Similar articles
Found 20 similar articles.
1.
Combined with neural language models, distributed word representations offer significant advantages in computational linguistics and text mining. Most existing models estimate distributed word vectors from large-scale data in an unsupervised fashion and therefore do not take rich linguistic knowledge into consideration. Linguistic knowledge can be represented as either link-based knowledge or preference-based knowledge, and we propose knowledge regularized word representation models (KRWR) to incorporate this prior knowledge when learning distributed word representations. Experimental results demonstrate that our estimated word representations achieve better performance on the task of semantic relatedness ranking. This indicates that our methods can efficiently encode both prior knowledge from knowledge bases and statistical knowledge from large-scale text corpora into a unified word representation model, which will benefit many tasks in text mining.

2.
Predation pressure may affect many aspects of prey behavior, including forming groups and changes in social interactions. We studied the aggregation behavior of competing gammarids Dikerogammarus villosus and Pontogammarus robustoides (Amphipoda, Crustacea) to check whether they modify their preferences for conspecifics or heterospecifics in response to predator (the racer goby Babka gymnotrachelus) kairomones in the presence or absence of stone shelters (alternative protection source). Both species exhibited preferences toward shelters occupied by conspecifics over empty shelters and conspecifics apart from shelters, suggesting that their aggregation depends not only on habitat heterogeneity, but also on their social interactions. Moreover, gammarids in the presence of shelters (safer conditions) preferred conspecifics over heterospecifics, but predator kairomones made them form aggregations irrespective of species. In the predator presence, P. robustoides increased its aggregation level only in the sheltered conditions, whereas D. villosus exhibited this response only in the absence of shelters, suggesting that this behavior can protect it against predators. Therefore, we tested the antipredator effectiveness of D. villosus aggregations by exposing them to fish predation. Gobies foraged most effectively on immobile single gammarids compared to moving and aggregated individuals. Fish also avoided aggregated prey, confirming the protective character of aggregations. We have demonstrated that the predator presence increases aggregation level of prey gammarids and affects their social behavior by reducing antagonistic interactions and avoidance between competing species. This is likely to affect their distribution and functioning in the wild, where predator pressure is a standard situation.

3.
MOTIVATION: Studies of efficient and sensitive sequence comparison methods are driven by a need to find homologous regions of weak similarity between large genomes. RESULTS: We describe an improved method for finding similar regions between two sets of DNA sequences. The new method generalizes existing methods by locating word matches between sequences under two or more word models and extending word matches into high-scoring segment pairs (HSPs). The method is implemented as a computer program named DDS2. Experimental results show that DDS2 can find more HSPs by using several word models than by using one word model. AVAILABILITY: The DDS2 program is freely available for academic use in binary code form at http://bioinformatics.iastate.edu/aat/align/align.html and in source code form from the corresponding author.
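
The idea of locating word matches under several word models and then extending them can be sketched as follows. This is only an illustration, not the DDS2 implementation: it assumes that a word model is a pattern of checked ("1") and ignored ("0") positions, and the function names, scoring values and drop-off rule are hypothetical choices made for the example.

```python
from collections import defaultdict


def mask_word(seq, start, model):
    """Return the characters of seq at the positions the model checks ('1')."""
    return "".join(seq[start + i] for i, flag in enumerate(model) if flag == "1")


def word_matches(a, b, models):
    """Return (pos_a, pos_b, model) triples where a and b agree on all checked positions."""
    hits = set()
    for model in models:
        span = len(model)
        # Index every window of sequence a by its masked word.
        index = defaultdict(list)
        for i in range(len(a) - span + 1):
            index[mask_word(a, i, model)].append(i)
        # Look up every window of sequence b in that index.
        for j in range(len(b) - span + 1):
            for i in index.get(mask_word(b, j, model), []):
                hits.add((i, j, model))
    return hits


def extend_right(a, b, i, j, match=1, mismatch=-1, x_drop=3):
    """Extend an ungapped segment rightwards from (i, j); return (best_score, length)."""
    score = best = best_len = k = 0
    while i + k < len(a) and j + k < len(b):
        score += match if a[i + k] == b[j + k] else mismatch
        k += 1
        if score > best:
            best, best_len = score, k
        elif best - score >= x_drop:
            break  # score fell too far below the best seen so far
    return best, best_len


a, b = "ACGTACGT", "ACGAACGT"
one_model = word_matches(a, b, ["1111"])
two_models = word_matches(a, b, ["1111", "1101"])
```

In this toy input the second model tolerates the single mismatch at the third checked window, so `two_models` contains a hit at positions (1, 1) that `one_model` misses, which mirrors the claim that several word models find more candidate matches than one.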

4.
5.
Dai Q, Liu X, Yao Y, Zhao F. Amino Acids, 2012, 42(5): 1867-1877
There are two crucial problems with statistical measures for sequence comparison: overlapping structures and background information of words in biological sequences. Word normalization in the improved composition vector method took these problems into account and achieved better performance in evolutionary analysis. This word normalization is desirable but not sufficient, because it assumes that the four bases A, C, T, and G occur randomly with equal chance. This paper proposes an improved word normalization which uses a Markov model to estimate the exact k-word distribution from the observed biological sequence and thus can adjust the background information of the k-word frequencies in biological sequences. The improved word normalization was tested in three experiments and compared with the existing word normalization. The experimental results confirm that the improved word normalization, which uses a Markov model to estimate the exact k-word distribution in biological sequences, is more efficient.
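
As a rough illustration of estimating a k-word background from the observed sequence, the sketch below uses the (k-2)-order Markov estimate familiar from composition vector methods, in which the expected frequency of a word is f(prefix) * f(suffix) / f(middle). The abstract does not give the paper's exact estimator, so this formula and the function names are assumptions made for the example.

```python
from collections import Counter


def word_freqs(seq, k):
    """Relative frequency of every overlapping k-word observed in seq."""
    total = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(total))
    return {w: c / total for w, c in counts.items()}


def markov_expected(seq, k):
    """Expected k-word frequencies under a (k-2)-order Markov background (k >= 3)."""
    f_k1 = word_freqs(seq, k - 1)  # frequencies of (k-1)-words
    f_k2 = word_freqs(seq, k - 2)  # frequencies of (k-2)-words
    return {
        w: f_k1[w[:-1]] * f_k1[w[1:]] / f_k2[w[1:-1]]
        for w in word_freqs(seq, k)
    }


def normalized(seq, k):
    """Relative deviation of observed k-word frequencies from the Markov background."""
    observed = word_freqs(seq, k)
    expected = markov_expected(seq, k)
    return {w: (observed[w] - expected[w]) / expected[w] for w in observed}


seq = "ACGTACGTAC"
expected_3 = markov_expected(seq, 3)
deviation_3 = normalized(seq, 3)
```

Because the background is computed from the sequence's own shorter words, it does not assume that A, C, G and T are equally likely, which is the limitation of the earlier normalization that the abstract points out.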

6.
German linking elements are sometimes classified as inflectional affixes, sometimes as derivational affixes, and in any case as morphological units with at least seven realisations (e.g. -s-, -es-, -(e)n-, -e-). This article seeks to show that linking elements are hybrid elements situated between morphology and phonology. On the one hand, they have a clear morphological status since they occur only within compounds (and before a very small set of suffixes) and support the listener in decoding them. On the other hand, they also have to be analysed on the phonological level, as will be shown in this article. Thus, they are marginal morphological units on the pathway to phonology (including prosodics). Although some alloforms can sometimes be considered former inflectional endings and in some cases even continue to demonstrate some inflectional behaviour (such as relatedness to gender and inflection class), they are on their way to becoming markers of ill-formed phonological words. In fact, linking elements, above all the linking -s-, which is extremely productive, help the listener decode compounds containing a bad phonological word as their first constituent, such as Geburt+s+tag ‘birthday’ or Religion+s+unterricht ‘religious education’. By marking the end of a first constituent that differs from an unmarked monopedal phonological word, the linking element aids the listener in correctly decoding and analysing the compound. German compounds are known for their length and complexity, both of which have increased over time, along with the occurrence of linking elements, especially -s-. Thus, a profound instance of language change can be observed in contemporary German, one indicating its typological shift from syllable language to word language.

7.

Background

DNA clustering is an important technique for automatically finding the inherent relationships in large collections of DNA sequences, but its quality can still be improved greatly. The DNA sequence similarity metric is one of the key points of clustering. The alignment-free methodology is a very popular way to calculate DNA sequence similarity. It normally converts a sequence into a feature space based on the probability distribution of words rather than directly matching strings. Existing alignment-free models, e.g. k-tuple, merely employ word frequency information and ignore many types of useful information contained in the DNA sequence, such as classifications of nucleotide bases, position and the like. It is believed that better data mining results can be achieved with compounded information. Therefore, we present a new alignment-free model that employs compounded information to improve the DNA clustering quality.

Results

This paper proposes a Category-Position-Frequency (CPF) model, which utilizes the word frequency, position and classification information of nucleotide bases from DNA sequences. The CPF model converts a DNA sequence into three sequences according to the categories of nucleotide bases, and then yields a 12-dimensional feature vector. The feature values are computed by an entropy-based model that takes both local word frequency and position information into account. We conduct DNA clustering experiments on several datasets and compare with some mainstream alignment-free models for evaluation, including k-tuple, DMk, TSM, AMI and CV. The experiments show that the CPF model is superior to the other models in terms of the clustering results and optimal settings.

Conclusions

The following conclusions can be drawn from the experiments. (1) The hybrid information model is better than the model based on word frequency only. (2) For DNA sequences of no more than 5000 characters, the preferred sliding-window size for CPF is two, which greatly improves system performance. (3) The CPF model achieves efficient, stable performance and broad generalization.
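
The first step of such a model, converting one DNA sequence into three category sequences, can be sketched as below. The three groupings used here (purine/pyrimidine, amino/keto, weak/strong hydrogen bonding) are the standard chemical classifications of nucleotide bases and are assumed rather than taken from the abstract. The feature function is a deliberately simplified stand-in: it reports plain two-letter word frequencies in each category sequence, which happens to give 12 values, whereas the paper computes its 12 features with an entropy-based model that also uses position information.

```python
from collections import Counter
from itertools import product

# Assumed groupings; the abstract only says "categories of nucleotide bases".
CATEGORIES = {
    "purine/pyrimidine": {"A": "R", "G": "R", "C": "Y", "T": "Y"},
    "amino/keto": {"A": "M", "C": "M", "G": "K", "T": "K"},
    "weak/strong": {"A": "W", "T": "W", "C": "S", "G": "S"},
}


def to_category_sequences(dna):
    """Rewrite a DNA string as three two-letter-alphabet sequences."""
    return {
        name: "".join(table[base] for base in dna)
        for name, table in CATEGORIES.items()
    }


def word_frequency_features(dna, window=2):
    """Simplified features: frequency of each window-sized word per category sequence."""
    features = []
    for name, cat_seq in to_category_sequences(dna).items():
        letters = sorted(set(CATEGORIES[name].values()))
        total = len(cat_seq) - window + 1
        counts = Counter(cat_seq[i:i + window] for i in range(total))
        for word in product(letters, repeat=window):
            features.append(counts["".join(word)] / total)
    return features


category_seqs = to_category_sequences("ACGT")
features = word_frequency_features("ACGTTGCA")
```

With a window of two and a two-letter alphabet, each category sequence contributes four word frequencies, so the three sequences together yield a 12-element vector.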

8.
The cognitive analysis of adult language disorders continues to draw heavily on linguistic theory, but increasingly it reflects the influence of connectionist, spreading activation models of cognition. In the area of spoken word production, ‘localist’ connectionist models represent a natural evolution from the psycholinguistic theories of earlier decades. By contrast, the parallel distributed processing framework forces more radical rethinking of aphasic impairments. This paper exemplifies these multiple influences in contemporary cognitive aphasiology. Topics include (i) what aphasia reveals about semantic-phonological interaction in lexical access; (ii) controversies surrounding the interpretation of semantic errors and (iii) a computational account of the relationship between naming and word repetition in aphasia. Several of these topics have been addressed using case series methods, including computational simulation of the individual, quantitative error patterns of diverse groups of patients and analysis of brain lesions that correlate with error rates and patterns. Efforts to map the lesion correlates of nonword errors in naming and repetition highlight the involvement of sensorimotor areas in the brain and suggest the need to better integrate models of word production with models of speech and action.

9.
Scientists studying how languages change over time often make an analogy between biological and cultural evolution, with words or grammars behaving like traits subject to natural selection. Recent work has exploited this analogy by using models of biological evolution to explain the properties of languages and other cultural artefacts. However, the mechanisms of biological and cultural evolution are very different: biological traits are passed between generations by genes, while languages and concepts are transmitted through learning. Here we show that these different mechanisms can have the same results, demonstrating that the transmission of frequency distributions over variants of linguistic forms by Bayesian learners is equivalent to the Wright–Fisher model of genetic drift. This simple learning mechanism thus provides a justification for the use of models of genetic drift in studying language evolution. In addition to providing an explicit connection between biological and cultural evolution, this allows us to define a ‘neutral’ model that indicates how languages can change in the absence of selection at the level of linguistic variants. We demonstrate that this neutral model can account for three phenomena: the s-shaped curve of language change, the distribution of word frequencies, and the relationship between word frequencies and extinction rates.
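
For readers unfamiliar with the Wright–Fisher model, a minimal simulation of neutral drift looks roughly like this. It is a sketch of the textbook model only: it omits mutation (innovation of new variants) and says nothing about the Bayesian-learner derivation in the paper, and the variant names and population size are made up for the example.

```python
import random


def wright_fisher_step(counts, rng):
    """Build the next generation: each of N individuals copies a variant
    drawn in proportion to that variant's current count."""
    variants = list(counts)
    n = sum(counts.values())
    draws = rng.choices(variants, weights=[counts[v] for v in variants], k=n)
    return {v: draws.count(v) for v in variants}


def simulate(counts, generations, seed=0):
    """Run drift until the generation limit or until one variant has taken over."""
    rng = random.Random(seed)
    history = [dict(counts)]
    for _ in range(generations):
        counts = wright_fisher_step(counts, rng)
        history.append(counts)
        if max(counts.values()) == sum(counts.values()):
            break  # fixation: only one variant is left
    return history


# Two competing forms of a word, each used by 10 of 20 speakers.
history = simulate({"form_a": 10, "form_b": 10}, generations=50)
```

No variant is favoured at any step, yet frequencies wander and one form can eventually replace the other, which is the sense in which the model describes change "in the absence of selection".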

10.
This paper places models of language evolution within the framework of information theory. We study how signals become associated with meaning. If there is a probability of mistaking signals for each other, then evolution leads to an error limit: increasing the number of signals does not increase the fitness of a language beyond a certain limit. This error limit can be overcome by word formation: a linear increase of the word length leads to an exponential increase of the maximum fitness. We develop a general model of word formation and demonstrate the connection between the error limit and Shannon's noisy coding theorem.

11.
We present an off-line cursive word recognition system based completely on neural networks: reading models and models of early visual processing. The first stage (normalization) preprocesses the input image in order to reduce letter position uncertainty; the second stage (feature extraction) is based on the feedforward model of orientation selectivity; the third stage (letter pre-recognition) is based on a convolutional neural network, and the last stage (word recognition) is based on the interactive activation model.

12.
The quantitative modeling of semantic representations in the brain plays a key role in understanding the neural basis of semantic processing. Previous studies have demonstrated that word vectors, which were originally developed for use in the field of natural language processing, provide a powerful tool for such quantitative modeling. However, whether semantic representations in the brain revealed by the word vector-based models actually capture our perception of semantic information remains unclear, as there has been no study explicitly examining the behavioral correlates of the modeled brain semantic representations. To address this issue, we compared the semantic structure of nouns and adjectives in the brain estimated from word vector-based brain models with that evaluated from human behavior. The brain models were constructed using voxelwise modeling to predict the functional magnetic resonance imaging (fMRI) response to natural movies from semantic contents in each movie scene through a word vector space. The semantic dissimilarity of brain word representations was then evaluated using the brain models. Meanwhile, data on human behavior reflecting the perception of semantic dissimilarity between words were collected in psychological experiments. We found a significant correlation between brain model- and behavior-derived semantic dissimilarities of words. This finding suggests that semantic representations in the brain modeled via word vectors appropriately capture our perception of word meanings.

13.
In culture-contact situations, it is commonplace for words to be borrowed from other unrelated vernaculars, for their pronunciations to be changed, and their meanings modified to fit new contexts. The Arandic word altyerre is a rather extreme example of this, and at the end of the nineteenth century, the ‘translation’ of the related word Alcheringa as ‘dream-times’ sparked a debate that, in some forms, continues to this day. In this article, I discuss some of the reasons why this particular word struck such a controversial chord. I give an updated semantic perspective on the word altyerre, drawing on evidence from Arandic languages and from other languages in Central Australia. Then I examine some of the consequences of both religious and secular interpretations of altyerre and show how the popularisation of this word and its translations has impacted on its meanings in current usage.

14.
We conducted a preliminary study to examine whether Chinese readers’ spontaneous word segmentation processing is consistent with the national standard rules of word segmentation based on the Contemporary Chinese language word segmentation specification for information processing (CCLWSSIP). Participants were asked to segment Chinese sentences into individual words according to their prior knowledge of words. The results showed that Chinese readers did not follow the segmentation rules of the CCLWSSIP, and their word segmentation processing was influenced by the syntactic categories of consecutive words. In many cases, the participants did not treat auxiliary words, adverbs, adjectives, nouns, verbs, numerals and quantifiers as single word units. Generally, Chinese readers tended to combine function words with content words to form single word units, indicating they were inclined to chunk single words into large information units during word segmentation. Additionally, the “overextension of monosyllable words” hypothesis was tested and may need to be corrected to some degree, implying that word length has an implicit influence on Chinese readers’ segmentation processing. Implications of these results for models of word recognition and eye movement control are discussed.

15.
One of the critical requirements of data analysis involving large DNA sequences is an effective statistical summarization of those sequences. In this article DNA sequences have been analyzed based on word frequencies. Our analysis focuses on the detection of structural signature of a genome reflected in word frequencies and identification of phylogenetic relationships among different species reflected in the variation of word distributions in their DNA sequences. We have carried out a statistical study of the complete genome of baker's yeast, of various ribosomal RNA sequences from different prokaryotic and eukaryotic organisms and of the full genomes of some bacteriophages. Our exploratory analysis amply demonstrates the usefulness of DNA word frequencies in reducing the dimensionality of large sequences while retaining some of the structural information there that can have biological significance. Some conceptual issues that arise in course of our investigation have been addressed. A few interesting problems related to the statistics of DNA words have been pointed out with some indication of their possible solutions. The work has been partially motivated by the fact that sequence alignment and homology techniques that are quite popular for comparing and analyzing relatively smaller DNA sequences of nearly equal sizes are not applicable to data consisting of large sequences with widely varying sizes, which may contain segments with unknown or no biological functions, and consequently their comparison through functional homology is either impossible or extremely difficult.
Received: 15 October 2000 / Revised version: 8 October 2002 / Published online: 28 February 2003
Current address: CF186, Salt Lake, Calcutta 700064, India
Research presented here was supported in part by a grant from Indian Statistical Institute.
Key words or phrases: Average linkage clustering; Chernoff's faces; Dendrograms; DNA words; F-ranks of words; F-ratios of words; l1-distance; Phylogenetic relationships; Rank correlation; Single linkage clustering
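
To make the combination of DNA word frequencies and the l1-distance from the keyword list concrete, the following sketch compares two sequences of different lengths through their word distributions. The word length, the toy sequences and the function names are assumptions for illustration; the article's own statistics (F-ratios, rank correlations, clustering) are not reproduced here.

```python
from collections import Counter


def word_distribution(seq, k):
    """Relative frequency of each overlapping DNA word of length k."""
    total = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(total))
    return {word: count / total for word, count in counts.items()}


def l1_distance(p, q):
    """Sum of absolute differences between two word distributions."""
    return sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in set(p) | set(q))


# Sequences of different lengths become comparable once summarized.
short_seq = word_distribution("ACGTACGT", 2)
long_seq = word_distribution("ACGTACGTACGTACGTACGT", 2)
other_seq = word_distribution("AAAAAAAAAA", 2)

near = l1_distance(short_seq, long_seq)
far = l1_distance(short_seq, other_seq)
```

Each sequence is reduced to at most 4^k numbers regardless of its length, which is the dimensionality reduction the abstract refers to, and the distance ranges from 0 for identical distributions to 2 for distributions with no words in common.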

16.
Recent mechanistic evidence demonstrates that spa-based therapy (or, as we propose, crenotherapy, from the Greek word κρήνη, spring fountain) is indeed based on solid scientific data. This mini-review highlights the latest insights into the mechanisms of crenotherapy derived from in vitro experiments, studies on animal models, and carefully designed clinical trials. Although more basic and clinical data are still needed, crenotherapy is coming of age as a modern, scientifically sound therapy. As the underlying mechanisms are uncovered, it is becoming possible to choose the most appropriate applications of this centuries-old practice, possibly reducing medical costs, thus explaining the current worldwide renewed interest in crenotherapy.

17.
To characterize the functional role of the left-ventral occipito-temporal cortex (lvOT) during reading in a quantitatively explicit and testable manner, we propose the lexical categorization model (LCM). The LCM assumes that lvOT optimizes linguistic processing by allowing fast meaning access when words are familiar and filtering out orthographic strings without meaning. The LCM successfully simulates benchmark results from functional brain imaging described in the literature. In a second evaluation, we empirically demonstrate that quantitative LCM simulations predict lvOT activation better than alternative models across three functional magnetic resonance imaging studies. We found that word-likeness, assumed as input into a lexical categorization process, is represented posteriorly to lvOT, whereas a dichotomous word/non-word output of the LCM could be localized to the downstream frontal brain regions. Finally, training the process of lexical categorization resulted in more efficient reading. In sum, we propose that word recognition in the ventral visual stream involves word-likeness extraction followed by lexical categorization before one can access word meaning.

18.
Reading disability involves deficits in different cognitive domains, including word reading fluency, word reading accuracy, phonological awareness, rapid automatized naming and morphological awareness. To identify the genetic basis of Chinese reading disability, we conducted a genome-wide association study (GWAS) of the cognitive traits related to Chinese reading disability in 2284 unrelated Chinese children. Among the traits analyzed in the present GWAS, we detected one genome-wide significant association (p < 5 × 10^−8) on word reading fluency for one SNP on 4p16.2, within EVC genes (rs6446395, p = 7.33 × 10^−10). Rs6446395 also showed significant association with Chinese character reading accuracy (p = 2.95 × 10^−4), phonological awareness (p = 7.11 × 10^−3) and rapid automatized naming (p = 4.71 × 10^−3), implying multiple effects of this variant. The eQTL data showed that rs6446395 affected EVC expression in the cerebellum. Gene-based analyses identified a gene (PRDM10) to be associated with word reading fluency at the genome-wide level. Our study discovered a new candidate susceptibility variant for reading ability and provided new insights into the genetics of developmental dyslexia in Chinese children.

19.
A number of studies on network analysis have focused on language networks based on free word association, which reflects human lexical knowledge, and have demonstrated the small-world and scale-free properties in the word association network. Nevertheless, there have been very few attempts at applying network analysis to distributional semantic models, despite the fact that these models have been studied extensively as computational or cognitive models of human lexical knowledge. In this paper, we analyze three network properties, namely, small-world, scale-free, and hierarchical properties, of semantic networks created by distributional semantic models. We demonstrate that the created networks generally exhibit the same properties as word association networks. In particular, we show that the distribution of the number of connections in these networks follows the truncated power law, which is also observed in an association network. This indicates that distributional semantic models can provide a plausible model of lexical knowledge. Additionally, the observed differences in the network properties of various implementations of distributional semantic models are consistently explained or predicted by considering the intrinsic semantic features of a word-context matrix and the functions of matrix weighting and smoothing. Furthermore, to simulate a semantic network with the observed network properties, we propose a new growing network model based on the model of Steyvers and Tenenbaum. The idea underlying the proposed model is that both preferential and random attachments are required to reflect different types of semantic relations in network growth process. We demonstrate that this model provides a better explanation of network behaviors generated by distributional semantic models.
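
The general idea of growing a network with a mixture of preferential and random attachment can be sketched as follows. This is a generic toy model, not the authors' proposal and not the Steyvers and Tenenbaum procedure: the mixing probability `p_pref`, the number of links per new node `m`, and the fully connected starting core are all assumptions chosen to keep the example short.

```python
import random


def grow_network(n_nodes, m=2, p_pref=0.5, seed=0):
    """Grow an undirected network; each new node links to m distinct existing nodes.

    Each target is chosen preferentially (proportional to degree) with
    probability p_pref, and uniformly at random otherwise.
    """
    rng = random.Random(seed)
    # Start from a small fully connected core of m + 1 nodes.
    edges = {(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)}
    degree = {i: m for i in range(m + 1)}
    for new in range(m + 1, n_nodes):
        existing = list(degree)
        targets = set()
        while len(targets) < m:
            if rng.random() < p_pref:
                # Preferential attachment: well-connected nodes attract more links.
                weights = [degree[v] for v in existing]
                pick = rng.choices(existing, weights=weights)[0]
            else:
                # Random attachment: every existing node is equally likely.
                pick = rng.choice(existing)
            targets.add(pick)
        for target in targets:
            edges.add((target, new))
            degree[target] += 1
        degree[new] = m
    return edges, degree


edges, degree = grow_network(50, m=2, p_pref=0.5)
```

Setting `p_pref` to 1 gives purely preferential growth, which tends to produce a few highly connected hubs, while setting it to 0 gives purely random growth; intermediate values blend the two mechanisms that the abstract argues are both needed.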

20.
Sarah White. Dreaming, 1999, 9(1): 11-21
This essay proposes that etymology, the study of word roots, presents analogies with dreamwork, although parallels between them must be carefully framed. Quoting Freud and the seventh-century encyclopedist Isidore of Seville, the essay identifies weaknesses in their use of etymological arguments. Theories forged from word origins should not blur distinctions between word and thing or force linguistic process into support of a preconceived theoretical project. To explore Freud's notion of contraries in words and dreams, examples are offered of single Indo-European word roots capable of engendering divergent or contradictory modern meanings, as well as examples of divergent or contradictory modern meanings for words that have two or more derivations, e.g., the English word dream and French rêve. Tracing a place-name (Campidoglio) in an actual dream demonstrates that etymology and dreamwork are both reconstructive processes that should avoid determinism, accept uncertainty, and respect complexity.
