Similar Articles (20 results)
1.
Distributional learning of speech sounds (i.e., learning from simple exposure to frequency distributions of speech sounds in the environment) has been observed in the lab repeatedly in both infants and adults. The current study is the first attempt to examine whether the capacity for using the mechanism differs between adults and infants. To this end, a previous event-related potential study that had shown distributional learning of the English vowel contrast /æ/∼/ε/ in 2-to-3-month-old Dutch infants was repeated with Dutch adults. Specifically, the adults were exposed to either a bimodal distribution that suggested the existence of the two vowels (as appropriate in English), or to a unimodal distribution that did not (as appropriate in Dutch). After exposure the participants were tested on their discrimination of a representative [æ] and a representative [ε], in an oddball paradigm for measuring mismatch responses (MMRs). Bimodally trained adults did not have a significantly larger MMR amplitude, and hence did not show significantly better neural discrimination of the test vowels, than unimodally trained adults. A direct comparison of the normalized MMR amplitudes of the adults with those of the previously tested infants showed that within a reasonable range of normalization parameters, the bimodal advantage is reliably smaller in adults than in infants, indicating that distributional learning is a weaker mechanism for learning speech sounds in adults (if it exists in that group at all) than in infants.
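The bimodal/unimodal exposure manipulation can be sketched in a few lines. This is a toy illustration, not the study's actual stimulus set: the 8-step continuum and the token frequencies below are invented for the example.

```python
import random
from collections import Counter

# Hypothetical 8-step F1 continuum between [ae] and [eh]; the token
# frequencies are illustrative, not the counts used in the study.
BIMODAL  = {1: 1, 2: 4, 3: 3, 4: 2, 5: 2, 6: 3, 7: 4, 8: 1}  # peaks at steps 2 and 7
UNIMODAL = {1: 1, 2: 2, 3: 3, 4: 4, 5: 4, 6: 3, 7: 2, 8: 1}  # single central peak

def exposure_sequence(freqs, seed=0):
    """Shuffled familiarization sequence realizing a frequency distribution."""
    tokens = [step for step, n in freqs.items() for _ in range(n)]
    random.Random(seed).shuffle(tokens)
    return tokens

bi, uni = exposure_sequence(BIMODAL), exposure_sequence(UNIMODAL)
# Both groups hear the same 20 tokens' worth of exposure; only the
# distribution shape over the continuum differs.
print(Counter(bi) == BIMODAL, Counter(uni) == UNIMODAL)  # → True True
```

The key design property, matched total exposure with different distribution shapes, is what lets any group difference be attributed to the distribution itself.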

2.
We propose a new model for speaker-independent vowel recognition which uses the flexibility of the dynamic linking that results from the synchronization of oscillating neural units. The system consists of an input layer and three neural layers, which are referred to as the A-, B- and C-centers. The input signals are a time series of linear predictive coding (LPC) spectrum envelopes of auditory signals. At each time-window within the series, the A-center receives input signals and extracts local peaks of the spectrum envelope, i.e., formants, and encodes them into local groups of independent oscillations. Speaker-independent vowel characteristics are embedded as a connection matrix in the B-center according to statistical data of Japanese vowels. The associative interaction in the B-center and the reciprocal interaction between the A- and B-centers selectively activate a vowel as a global synchronized pattern over the two centers. The C-center evaluates the synchronized activities among the three formant regions to give the selective output of the category among the five Japanese vowels. Thus, a flexible ability for dynamical linking among features is achieved over the three centers. The capability of the present system was investigated for speaker-independent recognition of Japanese vowels. The system demonstrated a remarkable ability for the recognition of vowels, very similar to that of human listeners, including misleading vowels. In addition, it showed stable recognition for unsteady input signals and robustness against background noise. The optimum condition for the frequency of oscillation is discussed in comparison with stimulus-dependent synchronizations observed in neurophysiological experiments on the cortex. Received: 20 July 1993 / Accepted in revised form: 22 December 1993

3.
Opportunities for associationist learning of word meaning, where a word is heard or read contemporaneously with information being available on its meaning, are considered too infrequent to account for the rate of language acquisition in children. It has been suggested that additional learning could occur in a distributional mode, where information is gleaned from the distributional statistics (word co-occurrence etc.) of natural language. Such statistics are relevant to meaning because of the Distributional Principle that ‘words of similar meaning tend to occur in similar contexts’. Computational systems, such as Latent Semantic Analysis, have substantiated the viability of distributional learning of word meaning by showing that semantic similarities between words can be accurately estimated from analysis of the distributional statistics of a natural language corpus. We consider whether appearance similarities can also be learnt in a distributional mode. As grounds for such a mode we advance the Appearance Hypothesis that ‘words with referents of similar appearance tend to occur in similar contexts’. We assess the viability of such learning by looking at the performance of a computer system that interpolates, on the basis of distributional and appearance similarity, from words whose appearance it has been explicitly taught, in order to identify and name objects that it has not been taught about. Our experiment uses a test set of 660 simple concrete nouns. Appearance information on a word is modelled using sets of images of examples of the word. Distributional similarity is computed from a standard natural language corpus. Our computational results support the viability of distributional learning of appearance.

4.
The perception of vowels was studied in chimpanzees and humans, using a reaction time task in which reaction times for discrimination of vowels were taken as an index of similarity between vowels. The vowels used were five synthetic and natural Japanese vowels and eight natural French vowels. The chimpanzees required long reaction times for discrimination of synthetic [i] from [u] and [e] from [o]; that is, they needed long latencies for discrimination between vowels based on differences in the frequency of the second formant. A similar tendency was observed for discrimination of natural [i] from [u]. The human subject required long reaction times for discrimination between vowels along the first formant axis. These differences can be explained by differences in auditory sensitivity between the two species and by the motor theory of speech perception. A vowel pronounced by different speakers has different acoustic properties; nevertheless, humans can perceive these speech sounds as the same vowel. This phenomenon of perceptual constancy in speech perception was studied in chimpanzees using natural vowels and a synthetic [o]-[a] continuum. The chimpanzees ignored the difference in the sex of the speakers and showed a capacity for vocal tract normalization.

5.
The sequential organization of sound over time can interact with the concurrent organization of sounds across frequency. Previous studies using simple acoustic stimuli have suggested that sequential streaming cues can retroactively affect the perceptual organization of sounds that have already occurred. It is unknown whether such effects generalize to the perception of speech sounds. Listeners’ ability to identify two simultaneously presented vowels was measured in the following conditions: no context, a preceding context stream (precursors), and a following context stream (postcursors). The context stream was composed of brief repetitions of one of the two vowels, and the primary measure of performance was listeners’ ability to identify the other vowel. Results in the precursor condition showed a significant advantage for the identification of the second vowel compared to the no-context condition, suggesting that sequential grouping mechanisms aided the segregation of the concurrent vowels, in agreement with previous work. However, performance in the postcursor condition was significantly worse compared to the no-context condition, providing no evidence for an effect of stream segregation and suggesting a possible interference effect. Two additional experiments involving inharmonic (jittered) vowels were performed to provide additional cues to aid retroactive stream segregation; however, neither manipulation enabled listeners to improve their identification of the target vowel. Taken together with earlier studies, the results suggest that retroactive streaming may require large spectral differences between concurrent sources and thus may not provide a robust segregation cue for natural broadband sounds such as speech.

6.
Four male Long-Evans rats were trained to discriminate between synthetic vowel sounds using a GO/NOGO response choice task. The vowels were characterized by an increase in fundamental frequency correlated with an upward shift in formant frequencies. In an initial phase we trained the subjects to discriminate between two vowel categories using two exemplars from each category. In a subsequent phase the ability of the rats to generalize the discrimination between the two categories was tested. To test whether rats might exploit the fact that attributes of training stimuli covaried, we used non-standard stimuli with a reversed relation between fundamental frequency and formants. The overall results demonstrate that rats are able to generalize the discrimination to new instances of the same vowels. We present evidence that the performance of the subjects depended on the relation between fundamental and formant frequencies that they had previously been exposed to. Simple simulation results with artificial neural networks could reproduce most of the behavioral results and support the hypothesis that equivalence classes for vowels are associated with an experience-driven process based on general properties of peripheral auditory coding mixed with elementary learning mechanisms. These results suggest that rats use spectral and temporal cues similarly to humans despite differences in basic auditory capabilities.

7.
Filik R, Barber E. PLoS ONE. 2011;6(10):e25782
While reading silently, we often have the subjective experience of inner speech. However, there is currently little evidence regarding whether this inner voice resembles our own voice while we are speaking out loud. To investigate this issue, we compared reading behaviour of Northern and Southern English participants who have differing pronunciations for words like 'glass', in which the vowel duration is short in a Northern accent and long in a Southern accent. Participants' eye movements were monitored while they silently read limericks in which the end words of the first two lines (e.g., glass/class) would be pronounced differently by Northern and Southern participants. The final word of the limerick (e.g., mass/sparse) then either did or did not rhyme, depending on the reader's accent. Results showed disruption to eye movement behaviour when the final word did not rhyme, determined by the reader's accent, suggesting that inner speech resembles our own voice.

8.
A central challenge for articulatory speech synthesis is the simulation of realistic articulatory movements, which is critical for the generation of highly natural and intelligible speech. This includes modeling coarticulation, i.e., the context-dependent variation of the articulatory and acoustic realization of phonemes, especially of consonants. Here we propose a method to simulate the context-sensitive articulation of consonants in consonant-vowel syllables. To achieve this, the vocal tract target shape of a consonant in the context of a given vowel is derived as the weighted average of three measured and acoustically-optimized reference vocal tract shapes for that consonant in the context of the corner vowels /a/, /i/, and /u/. The weights are determined by mapping the target shape of the given context vowel into the vowel subspace spanned by the corner vowels. The model was applied for the synthesis of consonant-vowel syllables with the consonants /b/, /d/, /g/, /l/, /r/, /m/, /n/ in all combinations with the eight long German vowels. In a perception test, the mean recognition rate for the consonants in the isolated syllables was 82.4%. This demonstrates the potential of the approach for highly intelligible articulatory speech synthesis.
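The shape-blending step described above amounts to barycentric interpolation inside the /a/-/i/-/u/ triangle. Below is a minimal sketch under the assumption that vowels can be placed in a 2-D (F1, F2) plane; the formant values and the toy "shape" vectors are illustrative, not the paper's measured data.

```python
# Hypothetical (F1, F2) locations in Hz for the three corner vowels.
CORNERS = {"a": (800.0, 1200.0), "i": (300.0, 2300.0), "u": (300.0, 800.0)}

def barycentric_weights(p, tri=CORNERS):
    """Weights of point p relative to the corner-vowel triangle (sum to 1)."""
    (x1, y1), (x2, y2), (x3, y3) = tri["a"], tri["i"], tri["u"]
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w_a = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    w_i = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    return {"a": w_a, "i": w_i, "u": 1.0 - w_a - w_i}

def blended_shape(ref_shapes, weights):
    """Weighted average of the three reference shapes (here: plain vectors)."""
    n = len(next(iter(ref_shapes.values())))
    return [sum(weights[v] * ref_shapes[v][k] for v in ref_shapes) for k in range(n)]

# Toy stand-ins for the measured /b/ reference shapes in the three contexts.
ref_b = {"a": [1.0, 2.0, 3.0], "i": [1.0, 1.0, 1.0], "u": [3.0, 2.0, 1.0]}
w = barycentric_weights((550.0, 1500.0))   # a mid context vowel, e.g. /e:/-like
print([round(x, 2) for x in blended_shape(ref_b, w)])  # → [1.33, 1.67, 2.0]
```

At each corner the weights collapse to a single 1, so the blend reproduces the measured reference shape exactly, which is the property the interpolation scheme needs.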

9.
We studied the perception of vowel-like stimuli formed by alternating pulses with spectra of various shapes, repeated with a period of 14 ms. Earlier investigations using similar stimuli revealed a phenomenon of accumulation of spectral information over the course of a stimulus and allowed rejection of the hypothesis that the auditory dynamic spectrum is averaged directly. In the first part of the investigation, we tested, and did not confirm, the hypothesis that spectral configurations distinctly different from natural vowels are screened out of the bulk of the accumulated data. In the second part, we obtained new results that are predicted by the hypothesis of averaged vowel-feature values (calculated from the pulse spectra) and that contradict the hypothesis of ongoing phonemic discrimination with accumulation of values of the similarity between the spectrum and phonemic standards.

10.
Social learning is a powerful method for the cultural propagation of knowledge and skills, relying on a complex interplay of learning strategies, social ecology and the human propensity for both learning and tutoring. Social learning has the potential to be an equally potent learning strategy for artificial systems, and robots in particular. However, given the complexity and unstructured nature of social learning, implementing social machine learning proves to be a challenging problem. We study one particular aspect of social machine learning: that of offering social cues during the learning interaction. Specifically, we study whether people are sensitive to social cues offered by a learning robot, in a similar way to children’s social bids for tutoring. We use a child-like social robot and a task in which the robot has to learn the meaning of words. For this, a simple turn-based interaction is used, based on language games. Two conditions are tested: one in which the robot uses social means to invite a human teacher to provide information based on what the robot requires to fill gaps in its knowledge (i.e., expression of a learning preference); the other in which the robot does not provide social cues to communicate a learning preference. We observe that conveying a learning preference through the use of social cues results in better and faster learning by the robot. People also seem to form a “mental model” of the robot, tailoring their tutoring to the robot’s performance rather than teaching at random. In addition, the social learning shows a clear gender effect, with female participants being responsive to the robot’s bids while male teachers appear to be less receptive. This work shows how additional social cues in social machine learning can lead people to offer better-quality learning input to artificial systems, resulting in improved learning performance.

11.
Previous studies have shown that concurrent vowel identification improves with increasing temporal onset asynchrony of the vowels, even if the vowels have the same fundamental frequency. The current study investigated the possible underlying neural processing involved in concurrent vowel perception. The individual vowel stimuli from a previously published study were used as inputs for a phenomenological auditory-nerve (AN) model. Spectrotemporal representations of simulated neural excitation patterns were constructed (i.e., neurograms) and then matched quantitatively with the neurograms of the single vowels using the Neurogram Similarity Index Measure (NSIM). A novel computational decision model was used to predict concurrent vowel identification. To facilitate optimum matches between the model predictions and the behavioral human data, internal noise was added either at neurogram generation or at neurogram matching using the NSIM procedure. The best fit to the behavioral data was achieved with a signal-to-noise ratio (SNR) of 8 dB for internal noise added at neurogram generation, but with a much smaller amount of internal noise (SNR of 60 dB) for internal noise added at the level of the NSIM computations. The results suggest that accurate modeling of concurrent vowel data from listeners with normal hearing may partly depend on internal noise and on where internal noise is hypothesized to occur during the concurrent vowel identification process.
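The internal-noise manipulation can be illustrated by perturbing a neurogram at a chosen SNR. The function below is a generic sketch with a made-up firing-rate array; it is not the auditory-nerve model or NSIM code used in the study.

```python
import math
import random

def add_internal_noise(neurogram, snr_db, seed=0):
    """Add zero-mean Gaussian noise to a 2-D neurogram at a chosen SNR,
    using SNR = 10*log10(signal_power / noise_power)."""
    rng = random.Random(seed)
    flat = [v for row in neurogram for v in row]
    signal_power = sum(v * v for v in flat) / len(flat)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in neurogram]

# Toy "neurogram" of firing rates (frequency channels x time bins).
neurogram = [[10.0, 20.0], [30.0, 40.0]]
noisy_8db = add_internal_noise(neurogram, snr_db=8)    # heavy perturbation
noisy_60db = add_internal_noise(neurogram, snr_db=60)  # nearly clean
```

At 60 dB SNR the perturbation is tiny, which mirrors the paper's finding that noise injected at the NSIM stage must be far smaller than noise injected at neurogram generation to fit the same behavioral data.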

12.
A major theoretical debate in language acquisition research concerns the learnability of hierarchical structures. The artificial grammar learning methodology is increasingly influential in approaching this question. Studies using an artificial centre-embedded A(n)B(n) grammar without semantics draw conflicting conclusions. This study investigates the facilitating effect of distributional biases in simple AB adjacencies in the input sample (caused in natural languages by, among other factors, semantic biases) on learning a centre-embedded structure. A mathematical simulation of the linguistic input and the learning, comparing various distributional biases in AB pairs, suggests that strong distributional biases might help learners grasp the complex A(n)B(n) hierarchical structure at a later stage. This theoretical investigation may contribute to our understanding of how distributional features of the input, including those caused by semantic variation, help in learning complex structures in natural languages.

13.
The notion that linguistic forms and meanings are related only by convention and not by any direct relationship between sounds and semantic concepts is a foundational principle of modern linguistics. Though the principle generally holds across the lexicon, systematic exceptions have been identified. These “sound symbolic” forms have been identified in lexical items and linguistic processes in many individual languages. This paper examines sound symbolism in the languages of Australia. We conduct a statistical investigation of the evidence for several common patterns of sound symbolism, using data from a sample of 120 languages. The patterns examined here include the association of meanings denoting “smallness” or “nearness” with front vowels or palatal consonants, and the association of meanings denoting “largeness” or “distance” with back vowels or velar consonants. Our results provide evidence for the expected associations of vowels and consonants with meanings of “smallness” and “proximity” in Australian languages. However, the patterns uncovered in this region are more complicated than predicted. Several sound-meaning relationships are only significant for segments in prominent positions in the word, and the prevailing mapping between vowel quality and magnitude meaning cannot be characterized by a simple link between gradients of magnitude and vowel F2, contrary to the claims of previous studies.

14.
Four experiments sought evidence that listeners can use coherent changes in the frequency or amplitude of harmonics to segregate concurrent vowels. Segregation was not helped by giving the harmonics of competing vowels different patterns of frequency or amplitude modulation. However, modulating the frequencies of the components of one vowel was beneficial when the other vowel was not modulated, provided that both vowels were composed of components placed randomly in frequency. In addition, staggering the onsets of the two vowels, so that the amplitude of one vowel increased abruptly while the amplitude of the other was stationary, was also beneficial. Thus, the results demonstrate that listeners can group changing harmonics and can segregate them from stationary harmonics, but cannot use coherence of change to separate two sets of changing harmonics.

15.
Adaptation of saccade amplitude in response to intra-saccadic target displacement is a type of implicit motor learning that is required to compensate for physiological changes in saccade performance. Once established, trials without intra-saccadic target displacement lead to de-adaptation or extinction, which has been attributed either to extra-retinal mechanisms of spatial constancy or to the influence of the stable visual surroundings. We therefore investigated whether visual deprivation (“Ganzfeld” stimulation or sleep) can partially maintain this motor learning compared with free viewing of the natural surroundings. Thirty-five healthy volunteers performed two adaptation blocks of 100 inward adaptation trials, interspersed by an extinction block, which were followed by a two-hour break with or without visual deprivation (VD). Using additional adaptation and extinction blocks, short- and long-term (4 weeks) memory of this implicit motor learning was tested. In the short term, motor memory tested immediately after free viewing was superior to adaptation performance after VD. In the long run, however, the effects were opposite: motor memory and relearning of adaptation were superior in the VD conditions. This could imply independent mechanisms underlying the short-term ability to retrieve learned saccadic gain and its long-term consolidation. We suggest that subjects mainly rely on visual cues (i.e., retinal error) in the free-viewing condition, which makes them prone to changes of the visual stimulus in the extinction block. This indicates the role of a stable visual array in resetting adapted saccade amplitudes. In contrast, visual deprivation (Ganzfeld stimulation and sleep) might train subjects to rely on extra-retinal cues, e.g., efference copy or prediction, to remap their internal representations of saccade targets, thus leading to better consolidation of saccadic adaptation.

16.
Recovering discrete words from continuous speech is one of the first challenges facing language learners. Infants and adults can make use of the statistical structure of utterances to learn the forms of words from unsegmented input, suggesting that this ability may be useful for bootstrapping language-specific cues to segmentation. It is unknown, however, whether performance shown in small-scale laboratory demonstrations of “statistical learning” can scale up to allow learning of the lexicons of natural languages, which are orders of magnitude larger. Artificial language experiments with adults can be used to test whether the mechanisms of statistical learning are in principle scalable to larger lexicons. We report data from a large-scale learning experiment that demonstrates that adults can learn words from unsegmented input in much larger languages than previously documented and that they retain the words they learn for years. These results suggest that statistical word segmentation could be scalable to the challenges of lexical acquisition in natural language learning.
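The statistical-learning computation behind such experiments is usually described in terms of transitional probabilities: TP(x→y) = count(xy) / count(x) is high inside words and dips at word boundaries. A toy sketch with an invented three-word lexicon (not the stimuli of the study):

```python
from collections import Counter

# Invented mini-lexicon; the word order is fixed so that every word is
# followed by several different words, creating boundary TP dips.
LEXICON = {"A": "tupiro", "B": "golabu", "C": "bidaku"}
word_order = list("ABCBACCAB") * 10
stream = "".join(LEXICON[w] for w in word_order)          # unsegmented input
syls = [stream[i:i + 2] for i in range(0, len(stream), 2)]

pairs = Counter(zip(syls, syls[1:]))
firsts = Counter(syls[:-1])
tp = {p: n / firsts[p[0]] for p, n in pairs.items()}      # TP(x->y) = c(xy)/c(x)

def segment(syls, tp, threshold=0.9):
    """Posit a word boundary wherever the transitional probability dips."""
    out, cur = [], syls[0]
    for a, b in zip(syls, syls[1:]):
        if tp[(a, b)] < threshold:
            out.append(cur)
            cur = b
        else:
            cur += b
    out.append(cur)
    return out

print(sorted(set(segment(syls, tp))))  # → ['bidaku', 'golabu', 'tupiro']
```

Word-internal transitions here have TP 1.0 while boundary transitions stay at or below 2/3, so thresholding the TP sequence recovers the lexicon exactly; the scaling question the abstract raises is whether this still works when the lexicon has thousands of words rather than three.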

17.
Anthrozoös. 2013;26(3):373-380

Vowel triangle area is a phonetic measure of the clarity of vowel articulation. Compared with speech to adults, people hyperarticulate vowels in speech to infants and foreigners but not to pets, despite other similarities in infant- and pet-directed speech. This suggests that vowel hyperarticulation has a didactic function positively related to the actual, or even the expected, degree of linguistic competence of the audience. Parrots have some degree of linguistic competence, yet no studies have examined vowel hyperarticulation in speech to parrots. Here, we compared the speech of 11 adults to another adult, a dog, a parrot, and an infant. A significant linear increase in vowel triangle area was found across the four conditions, showing that the degree of vowel hyperarticulation increased from adult- and dog-directed speech to parrot-directed speech, then to infant-directed speech. This suggests that the degree of vowel hyperarticulation is related to the audience's actual or expected linguistic competence. The results are discussed in terms of the relative roles of speakers' expectations versus listeners' feedback in the production of vowel hyperarticulation; and suggestions for further studies, manipulating speaker expectation and listener feedback, are provided.
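Vowel triangle area itself is straightforward to compute from the corner vowels' first two formants with the shoelace formula. The (F1, F2) values in Hz below are illustrative, not measurements from this study.

```python
def triangle_area(p1, p2, p3):
    """Area of the triangle spanned by three (F1, F2) points (shoelace formula)."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0

# Hypothetical corner-vowel formants: hyperarticulated ("clear") vowels are
# pushed toward the edges of the vowel space, enlarging the triangle.
casual = {"i": (350, 2100), "a": (750, 1200), "u": (400, 900)}
clear  = {"i": (300, 2300), "a": (850, 1200), "u": (330, 800)}

for label, v in (("casual", casual), ("clear", clear)):
    print(label, triangle_area(v["i"], v["a"], v["u"]))
# → casual 217500.0
# → clear 396000.0
```

The larger area for the "clear" set is exactly the kind of condition-wise difference the study measures across adult-, dog-, parrot- and infant-directed speech.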

18.
Latinus M, Belin P. PLoS ONE. 2012;7(7):e41384
Humans can identify individuals from their voice, suggesting the existence of a perceptual representation of voice identity. We used perceptual aftereffects, shifts in perceived stimulus quality after brief exposure to a repeated adaptor stimulus, to further investigate the representation of voice identity in two experiments. Healthy adult listeners were familiarized with several voices until they reached a recognition criterion. They were then tested on identification tasks that used vowel stimuli generated by morphing between the different identities, presented either in isolation (baseline) or following short exposure to different types of voice adaptors (adaptation). Experiment 1 showed that adaptation to a given voice induced categorization shifts away from that adaptor's identity, even when the adaptors consisted of vowels different from the probe stimuli. Moreover, original voices and caricatures resulted in comparable aftereffects, ruling out an explanation of identity aftereffects in terms of adaptation to low-level features. In Experiment 2, we showed that adaptors with a disrupted configuration, i.e., altered fundamental frequency or formant frequencies, failed to produce perceptual aftereffects, showing the importance of the preserved configuration of these acoustical cues in the representation of voices. These two experiments indicate a high-level, dynamic representation of voice identity based on the combination of several lower-level acoustical features into a specific voice configuration.

19.
Monolingual infants start learning the prosodic properties of their native language around 6 to 9 months of age, a fact marked by the development of preferences for predominant prosodic patterns and a decrease in sensitivity to non-native prosodic properties. The present study evaluates the effects of bilingual acquisition on speech perception by exploring how stress pattern perception may differ in French-learning 10-month-olds raised in bilingual as opposed to monolingual environments. Experiment 1 shows that monolinguals can discriminate stress patterns following a long familiarization to one of two patterns, but not after a short familiarization. In Experiment 2, two subgroups of bilingual infants growing up learning both French and another language (varying across infants) in which stress is used lexically were tested under the more difficult short familiarization condition: one with balanced input, and one receiving more input in the language other than French. Discrimination was clearly found for the other-language-dominant subgroup, establishing heightened sensitivity to stress pattern contrasts in these bilinguals as compared to monolinguals. However, the balanced bilinguals' performance was not better than that of monolinguals, establishing an effect of the relative balance of the language input. This pattern of results is compatible with the proposal that sensitivity to prosodic contrasts is maintained or enhanced in a bilingual population compared to a monolingual population in which these contrasts are non-native, provided that this dimension is used in one of the two languages in acquisition, and that infants receive enough input from that language.
