Similar Articles
 20 similar articles found
1.
Humans can recognize spoken words with unmatched speed and accuracy. Hearing the initial portion of a word such as "formu…" is sufficient for the brain to identify "formula" from the thousands of other words that partially match. Two alternative computational accounts propose that partially matching words (1) inhibit each other until a single word is selected ("formula" inhibits "formal" by lexical competition) or (2) are used to predict upcoming speech sounds more accurately (segment prediction error is minimal after sequences like "formu…"). To distinguish these theories we taught participants novel words (e.g., "formubo") that sound like existing words ("formula") on two successive days. Computational simulations show that knowing "formubo" increases lexical competition when hearing "formu…", but reduces segment prediction error. Conversely, when the sounds in "formula" and "formubo" diverge, the reverse is observed. The time course of magnetoencephalographic brain responses in the superior temporal gyrus (STG) is uniquely consistent with a segment prediction account. We propose a predictive coding model of spoken word recognition in which STG neurons represent the difference between predicted and heard speech sounds. This prediction error signal explains the efficiency of human word recognition and simulates neural responses in auditory regions.
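The two accounts can be made concrete with a toy computation. The sketch below (an illustration under assumed definitions, not the authors' simulation code) treats lexical competition as the size of the cohort of words consistent with the heard prefix, and segment prediction error as the surprisal of the next phoneme given that prefix, both estimated from a small hypothetical lexicon:

```python
# Toy illustration (not the authors' simulations): lexical competition as
# cohort size, segment prediction error as next-phoneme surprisal, both
# estimated from a small hypothetical lexicon.
import math

def cohort(lexicon, prefix):
    """Words still consistent with the heard prefix (lexical competitors)."""
    return {w for w in lexicon if w.startswith(prefix)}

def surprisal(lexicon, prefix, segment):
    """-log P(next segment | prefix), estimated by counting cohort members."""
    matches = cohort(lexicon, prefix)
    consistent = {w for w in matches if w[len(prefix):].startswith(segment)}
    return math.log(len(matches) / len(consistent))

before = {"formula", "formal", "fortune"}   # hypothetical adult lexicon
after = before | {"formubo"}                # after learning the novel word

for label, lexicon in (("before", before), ("after", after)):
    print(label,
          "| cohort at 'formu':", len(cohort(lexicon, "formu")),
          "| surprisal of 'u' after 'form':",
          round(surprisal(lexicon, "form", "u"), 2),
          "| surprisal of 'l' after 'formu':",
          round(surprisal(lexicon, "formu", "l"), 2))
```

Adding "formubo" enlarges the cohort at "formu…" (more competition) while making the shared "u" more expected after "form" (less prediction error); at the divergence point the surprisal of "l" rises instead, mirroring the dissociation described above.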

2.

Background

Humans can easily restore a speech signal that is temporally masked by an interfering sound (e.g., a cough masking parts of a word in a conversation), and listeners have the illusion that the speech continues through the interfering sound. This perceptual restoration for human speech is affected by prior experience. Here we provide evidence for perceptual restoration in complex vocalizations of a songbird that are acquired by vocal learning, much as humans learn their language.

Methodology/Principal Findings

European starlings were trained in a same/different paradigm to report salient differences between successive sounds. The birds' response latency for discriminating between a stimulus pair is an indicator of the salience of the difference, and these latencies can be used to evaluate perceptual distances using multi-dimensional scaling. For familiar motifs the birds showed a large perceptual distance when discriminating between complete motifs and motifs that were muted for brief periods. If the muted periods were filled with noise, the perceptual distance was reduced. For unfamiliar motifs no such difference was observed.
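To make the latency-to-distance step concrete, here is a minimal sketch with made-up latencies (the study's data and analysis code are not reproduced here), converting pairwise response latencies into dissimilarities and embedding them with multi-dimensional scaling via scikit-learn:

```python
# Minimal sketch with made-up response latencies (ms): shorter latency means
# a more salient difference, hence a larger perceptual distance. Latencies
# are inverted into dissimilarities and embedded with metric MDS.
import numpy as np
from sklearn.manifold import MDS

stimuli = ["complete", "muted", "noise-filled"]
latency = np.array([[0., 180., 450.],        # hypothetical pairwise values
                    [180., 0., 320.],
                    [450., 320., 0.]])

# any monotonically decreasing mapping would do; inversion is one choice
dissim = np.zeros_like(latency)
nonzero = latency > 0
dissim[nonzero] = 1000.0 / latency[nonzero]

coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dissim)
for name, (x, y) in zip(stimuli, coords):
    print(f"{name:>12}: ({x:+.2f}, {y:+.2f})")
```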

Conclusions/Significance

The results suggest that starlings are able to perceptually restore partly masked sounds and, similarly to humans, rely on prior experience. They may be a suitable model to study the mechanism underlying experience-dependent perceptual restoration.

3.
Songbirds learn their song from an adult conspecific tutor when they are young, much like the acquisition of speech in human infants. When an adult zebra finch is re-exposed to its tutor's song, there is increased neuronal activation in the caudomedial nidopallium (NCM), the songbird equivalent of the auditory association cortex. This neuronal activation is related to the fidelity of song imitation, suggesting that the NCM may contain the neural representation of song memory. We found that bilateral neurotoxic lesions to the NCM of adult male zebra finches impaired tutor-song recognition but did not affect the males' song production or their ability to discriminate calls. These findings demonstrate that the NCM performs an essential role in the representation of tutor-song memory. In addition, our results show that tutor-song memory and a motor program for the bird's own song have separate neural representations in the songbird brain. Thus, in both humans and songbirds, the cognitive systems of vocal production and auditory recognition memory are subserved by distinct brain regions.

4.
Parrots and songbirds learn their vocalizations from a conspecific tutor, much like human infants acquire spoken language. Parrots can learn human words and it has been suggested that they can use them to communicate with humans. The caudomedial pallium in the parrot brain is homologous with that of songbirds, and analogous to the human auditory association cortex, involved in speech processing. Here we investigated neuronal activation, measured as expression of the protein product of the immediate early gene ZENK, in relation to auditory learning in the budgerigar (Melopsittacus undulatus), a parrot. Budgerigar males successfully learned to discriminate two Japanese words spoken by another male conspecific. Re-exposure to the two discriminanda led to increased neuronal activation in the caudomedial pallium, but not in the hippocampus, compared to untrained birds that were exposed to the same words, or were not exposed to words. Neuronal activation in the caudomedial pallium of the experimental birds was correlated significantly and positively with the percentage of correct responses in the discrimination task. These results suggest that in a parrot, the caudomedial pallium is involved in auditory learning. Thus, in parrots, songbirds and humans, analogous brain regions may contain the neural substrate for auditory learning and memory.

5.
The motor theory of speech perception holds that we perceive the speech of another in terms of a motor representation of that speech. However, when we have learned to recognize a foreign accent, it seems plausible that recognition of a word rarely involves reconstructing the speech gestures of the speaker rather than those of the listener. To better assess the motor theory and this observation, we proceed in three stages. Part 1 places the motor theory of speech perception in a larger framework based on our earlier models of the adaptive formation of mirror neurons for grasping, and for viewing extensions of that mirror system as part of a larger system for neuro-linguistic processing, augmented by the present consideration of recognizing speech in a novel accent. Part 2 then offers a novel computational model of how a listener comes to understand the speech of someone speaking the listener’s native language with a foreign accent. The core tenet of the model is that the listener uses hypotheses about the word the speaker is currently uttering to update probabilities linking the sound produced by the speaker to phonemes in the native language repertoire of the listener. This, on average, improves the recognition of later words. This model is neutral regarding the nature of the representations it uses (motor vs. auditory). It serves as a reference point for the discussion in Part 3, which proposes a dual-stream neuro-linguistic architecture to revisit claims for and against the motor theory of speech perception and the relevance of mirror neurons, and extracts some implications for the reframing of the motor theory.
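The core tenet lends itself to a small sketch. The version below is a toy stand-in, with a made-up phoneme inventory and a simple count-based (Dirichlet-style) update; the paper's model is richer and deliberately neutral about motor versus auditory representations:

```python
# Toy stand-in for the model's core tenet: a hypothesized word supplies
# canonical phonemes, which update P(native phoneme | accented sound) via
# simple counts (add-one / Dirichlet-style prior). Inventory is made up.
PHONEMES = ["th", "z", "s", "d", "i", "ae", "t"]
counts = {c: {p: 1.0 for p in PHONEMES} for c in PHONEMES}  # uniform prior

def update(heard, canonical):
    """After hypothesizing a word, align heard sounds with its phonemes."""
    for c, p in zip(heard, canonical):
        counts[c][p] += 1.0

def prob(sound, phoneme):
    """Current belief that an accented sound maps to a native phoneme."""
    return counts[sound][phoneme] / sum(counts[sound].values())

# an accented speaker systematically produces 'z' for the native 'th'
update(heard=("z", "i", "s"), canonical=("th", "i", "s"))    # "this"
update(heard=("z", "ae", "t"), canonical=("th", "ae", "t"))  # "that"

print(round(prob("z", "th"), 3))  # 0.333: rises above the 1/7 prior
print(round(prob("z", "z"), 3))   # 0.111: identity mapping falls behind
```

As the mapping sharpens, later accented words are, on average, easier to recognize, which is the improvement the model predicts.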

6.
Early language acquisition: cracking the speech code
Infants learn language with remarkable speed, but how they do it remains a mystery. New data show that infants use computational strategies to detect the statistical and prosodic patterns in language input, and that this leads to the discovery of phonemes and words. Social interaction with another human being affects speech learning in a way that resembles communicative learning in songbirds. The brain's commitment to the statistical and prosodic patterns that are experienced early in life might help to explain the long-standing puzzle of why infants are better language learners than adults. Successful learning by infants, as well as constraints on that learning, are changing theories of language acquisition.
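One classic computational strategy of the kind reviewed here is segmentation by transitional probabilities between syllables (cf. Saffran and colleagues' infant studies). The sketch below uses a hypothetical three-word language and is purely illustrative:

```python
# Toy sketch of statistical segmentation: transitional probabilities (TPs)
# between syllables are high inside words and dip at word boundaries.
# The three-word "language" is hypothetical, not stimuli from the article.
import random
from collections import Counter

words = ["bidaku", "padoti", "golabu"]
random.seed(0)
stream = [random.choice(words) for _ in range(300)]       # fluent stream
syllables = [w[i:i + 2] for w in stream for i in range(0, 6, 2)]

pair_counts = Counter(zip(syllables, syllables[1:]))
first_counts = Counter(syllables[:-1])

def tp(a, b):
    """P(next syllable = b | current syllable = a)."""
    return pair_counts[(a, b)] / first_counts[a]

print(f"within word  'bi'->'da': {tp('bi', 'da'):.2f}")   # 1.00
print(f"across words 'ku'->'pa': {tp('ku', 'pa'):.2f}")   # about 0.33
```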

7.
The most influential theory of learning to read is based on the idea that children rely on phonological decoding skills to learn novel words. According to the self-teaching hypothesis, each successful decoding encounter with an unfamiliar word provides an opportunity to acquire word-specific orthographic information that is the foundation of skilled word recognition. Therefore, phonological decoding acts as a self-teaching mechanism or ‘built-in teacher’. However, all previous connectionist models have learned the task of reading aloud through exposure to a very large corpus of spelling–sound pairs, where an ‘external’ teacher supplies the pronunciation of all words that should be learnt. Such a supervised training regimen is highly implausible. Here, we implement and test the developmentally plausible phonological decoding self-teaching hypothesis in the context of the connectionist dual process model. In a series of simulations, we provide a proof of concept that this mechanism works. The model was able to acquire word-specific orthographic representations for more than 25 000 words even though it started with only a small number of grapheme–phoneme correspondences. We then show how visual and phoneme deficits that are present at the outset of reading development can cause dyslexia as learning proceeds.
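The self-teaching loop itself can be caricatured in a few lines. The sketch below is a toy stand-in for the connectionist dual process model (which learns distributed representations rather than dictionary entries); the grapheme–phoneme table and word lists are hypothetical:

```python
# Toy caricature of self-teaching (the real model is connectionist): decode
# an unfamiliar printed word with a few grapheme-phoneme correspondences
# (GPCs); a successful decoding stores a word-specific orthographic entry
# that later supports direct recognition. GPCs and words are hypothetical.
GPC = {"c": "k", "a": "a", "t": "t", "s": "s", "i": "i", "p": "p"}
SPOKEN = {"kat", "sit", "tip"}       # words known from spoken language
orthographic_lexicon = {}            # word-specific entries acquired so far

def decode(printed):
    """Serial GPC decoding: the 'built-in teacher'."""
    return "".join(GPC.get(ch, "?") for ch in printed)

def read(printed):
    if printed in orthographic_lexicon:            # direct, skilled route
        return orthographic_lexicon[printed]
    candidate = decode(printed)                    # phonological route
    if candidate in SPOKEN:                        # success: self-teach
        orthographic_lexicon[printed] = candidate
    return candidate

read("cat")                          # first exposure: decoded via GPCs
print(read("cat"))                   # second exposure: read directly
print(sorted(orthographic_lexicon))  # ['cat'] has been self-taught
```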

8.
Speech impairment is one of the most intriguing and least understood effects of alcohol on cognitive function, largely due to the lack of data on alcohol effects on vocalizations in the context of an appropriate experimental model organism. Zebra finches, a representative songbird and a premier model for understanding the neurobiology of vocal production and learning, learn song in a manner analogous to how humans learn speech. Here we show that when allowed access, finches readily drink alcohol, increase their blood ethanol concentrations (BEC) significantly, and sing a song with altered acoustic structure. The most pronounced effects were decreased amplitude and increased entropy, the latter likely reflecting a disruption in the birds’ ability to maintain the spectral structure of song under alcohol. Furthermore, specific syllables, which have distinct acoustic structures, were differentially influenced by alcohol, likely reflecting a diversity in the neural mechanisms required for their production. Remarkably, these effects on vocalizations occurred without overt effects on general behavioral measures, and importantly, they occurred within a range of BEC that can be considered risky for humans. Our results suggest that the variable effects of alcohol on finch song reflect differential alcohol sensitivity of the brain circuitry elements that control different aspects of song production. They also point to finches as an informative model for understanding how alcohol affects the neuronal circuits that control the production of learned motor behaviors.
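For readers who want a handle on the entropy measure: Wiener entropy, the log ratio of the geometric to the arithmetic mean of the power spectrum, is a standard statistic in song analysis. The sketch below assumes that measure for illustration; the study's exact analysis pipeline may differ:

```python
# Assumed measure for illustration: Wiener entropy, the log ratio of the
# geometric to the arithmetic mean of the power spectrum. Values near 0 are
# noise-like (high entropy); strongly negative values are tonal.
import numpy as np

def wiener_entropy(signal):
    power = np.abs(np.fft.rfft(signal)) ** 2
    power = power[power > 0]                   # drop exact zeros
    log_geometric_mean = np.mean(np.log(power))
    return log_geometric_mean - np.log(np.mean(power))

fs = 44100
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 3000 * t)            # clean, tonal "syllable"
noisy = tone + 0.8 * np.random.default_rng(0).standard_normal(fs)

print(f"tonal syllable: {wiener_entropy(tone):.1f}")   # strongly negative
print(f"noisy syllable: {wiener_entropy(noisy):.1f}")  # much closer to 0
```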

9.
Understanding foreign speech is difficult, in part because of unusual mappings between sounds and words. It is known that listeners in their native language can use lexical knowledge (about how words ought to sound) to learn how to interpret unusual speech-sounds. We therefore investigated whether subtitles, which provide lexical information, support perceptual learning about foreign speech. Dutch participants, unfamiliar with Scottish and Australian regional accents of English, watched Scottish or Australian English videos with Dutch, English or no subtitles, and then repeated audio fragments of both accents. Repetition of novel fragments was worse after Dutch-subtitle exposure but better after English-subtitle exposure. Native-language subtitles appear to create lexical interference, but foreign-language subtitles assist speech learning by indicating which words (and hence sounds) are being spoken.

10.
A fundamental challenge in robotics today is building robots that can learn new skills by observing humans and imitating human actions. We propose a new Bayesian approach to robotic learning by imitation inspired by the developmental hypothesis that children use self-experience to bootstrap the process of intention recognition and goal-based imitation. Our approach allows an autonomous agent to: (i) learn probabilistic models of actions through self-discovery and experience, (ii) utilize these learned models for inferring the goals of human actions, and (iii) perform goal-based imitation for robotic learning and human-robot collaboration. Such an approach allows a robot to leverage its increasing repertoire of learned behaviors to interpret increasingly complex human actions and use the inferred goals for imitation, even when the robot has very different actuators from humans. We demonstrate our approach using two different scenarios: (i) a simulated robot that learns human-like gaze following behavior, and (ii) a robot that learns to imitate human actions in a tabletop organization task. In both cases, the agent learns a probabilistic model of its own actions, and uses this model for goal inference and goal-based imitation. We also show that the robotic agent can use its probabilistic model to seek human assistance when it recognizes that its inferred actions are too uncertain, risky, or impossible to perform, thereby opening the door to human-robot collaboration.
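A toy version of the goal-inference step is sketched below, with hand-set forward models standing in for the self-learned ones; the goals, actions, and uncertainty threshold are all hypothetical:

```python
# Toy goal-inference step: self-learned forward models P(action | goal) are
# inverted with Bayes' rule, and high posterior entropy triggers a request
# for help. Goals, actions, and the threshold are hypothetical.
import math

likelihood = {                        # stands in for learned action models
    "stack": {"reach": 0.2, "grasp": 0.4, "lift": 0.4},
    "sort":  {"reach": 0.5, "grasp": 0.4, "lift": 0.1},
}
prior = {"stack": 0.5, "sort": 0.5}

def posterior(observed_actions):
    post = dict(prior)
    for action in observed_actions:
        post = {g: post[g] * likelihood[g][action] for g in post}
        total = sum(post.values())
        post = {g: p / total for g in post}
    return post

post = posterior(["reach", "grasp", "lift"])
entropy = -sum(p * math.log(p) for p in post.values() if p > 0)
print({g: round(p, 2) for g, p in post.items()})
if entropy > 0.6:                     # too uncertain: ask the human
    print("requesting human assistance")
else:
    print("imitating goal:", max(post, key=post.get))
```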

11.
As in human infant speech development, vocal imitation in songbirds involves sensory acquisition and memorization of adult-produced vocal signals, followed by a protracted phase of vocal motor practice. The internal model of adult tutor song in the juvenile male brain, termed ‘the template’, is central to the vocal imitation process. However, even the most fundamental aspects of the template, such as when, where and how it is encoded in the brain, remain poorly understood. A major impediment to progress is that current studies of songbird vocal learning use protracted tutoring over days, weeks or months, complicating dissection of the template encoding process. Here, we take the key step of tightly constraining the timing of template acquisition. We show that, in the zebra finch, template encoding can be time locked to, on average, a 2 h period of juvenile life and based on just 75 s of cumulative tutor song exposure. Crucially, we find that vocal changes occurring on the day of training correlate with eventual imitative success. This paradigm will lead to insights on how the template is instantiated in the songbird brain, with general implications for deciphering how internal models are formed to guide learning of complex social behaviours.

12.
13.
The brain's decoding of fast sensory streams is currently impossible to emulate, even approximately, with artificial agents. For example, robust speech recognition is relatively easy for humans but exceptionally difficult for artificial speech recognition systems. In this paper, we propose that recognition can be simplified with an internal model of how sensory input is generated, when formulated in a Bayesian framework. We show that a plausible candidate for an internal or generative model is a hierarchy of ‘stable heteroclinic channels’. This model describes continuous dynamics in the environment as a hierarchy of sequences, where slower sequences cause faster sequences. Under this model, online recognition corresponds to the dynamic decoding of causal sequences, giving a representation of the environment with predictive power on several timescales. We illustrate the ensuing decoding or recognition scheme using synthetic sequences of syllables, where syllables are sequences of phonemes and phonemes are sequences of sound-wave modulations. By presenting anomalous stimuli, we find that the resulting recognition dynamics disclose inference at multiple time scales and are reminiscent of neuronal dynamics seen in the real brain.
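The model's dynamical ingredient can be simulated directly: a stable heteroclinic channel is commonly written as a generalized Lotka-Volterra system whose asymmetric inhibition passes activity from one saddle state to the next. The simulation below uses toy parameters chosen for illustration, not those of the paper:

```python
# Toy simulation of a stable heteroclinic channel as a generalized
# Lotka-Volterra system (cf. Rabinovich and colleagues): the asymmetric
# competition matrix rho hands activity from each saddle to the next,
# producing the sequence 0 -> 1 -> 2 -> 0; weak noise keeps it cycling.
import numpy as np

rng = np.random.default_rng(0)
sigma = np.ones(3)                      # growth rates
rho = np.array([[1.0, 2.0, 0.5],        # rho[i+1][i] < 1 < rho[i-1][i]
                [0.5, 1.0, 2.0],
                [2.0, 0.5, 1.0]])

x = np.array([0.9, 1e-4, 1e-4])
dt, winners = 0.01, []
for _ in range(60000):
    dx = x * (sigma - rho @ x)                        # Lotka-Volterra step
    x = np.clip(x + dt * dx + 1e-6 * rng.random(3), 1e-9, None)
    winners.append(int(np.argmax(x)))

# order in which the three states dominate over the run
sequence = [w for i, w in enumerate(winners) if i == 0 or w != winners[i - 1]]
print(sequence[:9])   # e.g. [0, 1, 2, 0, 1, 2, 0, 1, 2]
```

Stacking such channels, with a slow one modulating the parameters of a faster one, gives the hierarchy of sequences the paper uses as a generative model.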

14.
A long-standing debate concerns whether humans are specialized for speech perception, which some researchers argue is demonstrated by the ability to understand synthetic speech with significantly reduced acoustic cues to phonetic content. We tested a chimpanzee (Pan troglodytes) that recognizes 128 spoken words, asking whether she could understand such speech. Three experiments presented 48 individual words, with the animal selecting a corresponding visuographic symbol from among four alternatives. Experiment 1 tested spectrally reduced, noise-vocoded (NV) synthesis, originally developed to simulate input received by human cochlear-implant users. Experiment 2 tested "impossibly unspeechlike" sine-wave (SW) synthesis, which reduces speech to just three moving tones. Although receiving only intermittent and noncontingent reward, the chimpanzee performed well above chance level, including when hearing synthetic versions for the first time. Recognition of SW words was least accurate but improved in experiment 3 when natural words in the same session were rewarded. The chimpanzee was more accurate with NV than SW versions, as were 32 human participants hearing these items. The chimpanzee's ability to spontaneously recognize acoustically reduced synthetic words suggests that experience rather than specialization is critical for speech-perception capabilities that some have suggested are uniquely human.
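Noise-vocoded synthesis, as commonly implemented, splits speech into frequency bands, extracts each band's amplitude envelope, and reimposes the envelopes on band-limited noise. The sketch below follows that recipe with illustrative parameters (band count and edges are assumptions, not the study's settings):

```python
# Illustrative noise-vocoder: split into log-spaced bands, take each band's
# Hilbert envelope, reimpose it on band-limited noise, and sum. Band count
# and edges are assumptions, not the study's settings.
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def noise_vocode(x, fs, n_bands=4, lo=100.0, hi=8000.0, seed=0):
    edges = np.geomspace(lo, hi, n_bands + 1)
    noise = np.random.default_rng(seed).standard_normal(len(x))
    out = np.zeros(len(x))
    for a, b in zip(edges[:-1], edges[1:]):
        sos = butter(4, [a, b], btype="bandpass", fs=fs, output="sos")
        envelope = np.abs(hilbert(sosfilt(sos, x)))   # band envelope
        out += envelope * sosfilt(sos, noise)         # modulate noise
    return out / (np.max(np.abs(out)) + 1e-12)

fs = 22050
t = np.arange(fs) / fs
toy_speech = np.sin(2 * np.pi * 200 * t) * (1 + np.sin(2 * np.pi * 3 * t))
print(noise_vocode(toy_speech, fs).shape)             # (22050,)
```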

15.
Extracting invariant features in an unsupervised manner is crucial to perform complex computations such as object recognition, analyzing music or understanding speech. While various algorithms have been proposed to perform such a task, Slow Feature Analysis (SFA) uses time as a means of detecting those invariants, extracting the slowly time-varying components in the input signals. In this work, we address the question of how such an algorithm can be implemented by neurons, and apply it in the context of audio stimuli. We propose a projected gradient implementation of SFA that can be adapted to a Hebbian-like learning rule dealing with biologically plausible neuron models. Furthermore, we show that a Spike-Timing Dependent Plasticity learning rule, shaped as a smoothed second derivative, implements SFA for spiking neurons. The theory is supported by numerical simulations, and to illustrate a simple use of SFA, we have applied it to auditory signals. We show that a single SFA neuron can learn to extract the tempo in sound recordings.
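For orientation, here is the standard linear formulation of SFA (not the paper's spiking or projected-gradient implementation): whiten the input so every projection has unit variance, then take the direction whose temporal derivative has the least variance:

```python
# Standard linear SFA sketch (not the paper's spiking implementation):
# whiten the input so all projections have unit variance, then take the
# direction whose temporal derivative has the least variance.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 20, 4000)
slow = np.sin(t)                                  # slow latent feature
fast = np.sin(37 * t + rng.uniform(0, 6.28))      # fast distractor
X = np.stack([slow + 0.1 * fast, fast + 0.1 * slow], axis=1)
X = X - X.mean(axis=0)

cov = X.T @ X / len(X)                            # whitening transform
evals, evecs = np.linalg.eigh(cov)
Z = X @ (evecs / np.sqrt(evals))

dZ = np.diff(Z, axis=0)                           # temporal derivative
dvals, dvecs = np.linalg.eigh(dZ.T @ dZ / len(dZ))
slow_feature = Z @ dvecs[:, 0]                    # slowest direction

print(abs(np.corrcoef(slow_feature, slow)[0, 1])) # close to 1.0
```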

16.
Memorizing and producing complex strings of sound are requirements for spoken human language. We share these behaviours with likely more than 4000 species of songbirds, making birds our primary model for studying the cognitive basis of vocal learning and, more generally, an important model for how memories are encoded in the brain. In songbirds, as in humans, the sounds that a juvenile learns later in life depend on auditory memories formed early in development. Experiments on a wide variety of songbird species suggest that the formation and lability of these auditory memories, in turn, depend on auditory predispositions that stimulate learning when a juvenile hears relevant, species-typical sounds. We review evidence that variation in key features of these auditory predispositions is determined by variation in genes underlying the development of the auditory system. We argue that increased investigation of the neuronal basis of auditory predispositions expressed early in life in combination with modern comparative genomic approaches may provide insights into the evolution of vocal learning.

17.
The brain mechanism of extracting visual features for recognizing various objects has consistently been a controversial issue in computational models of object recognition. To extract visual features, we introduce a new, biologically motivated model for facial categorization, which is an extension of the Hubel and Wiesel simple-to-complex cell hierarchy. To address the synaptic stability versus plasticity dilemma, we apply the Adaptive Resonance Theory (ART) for extracting informative intermediate level visual features during the learning process, which also makes this model stable against the destruction of previously learned information while learning new information. Such a mechanism has been suggested to be embedded within known laminar microcircuits of the cerebral cortex. To reveal the strength of the proposed visual feature learning mechanism, we show that when we use this mechanism in the training process of a well-known biologically motivated object recognition model (the HMAX model), it performs better than the HMAX model in face/non-face classification tasks. Furthermore, we demonstrate that our proposed mechanism is capable of following similar trends in performance as humans in a psychophysical experiment using a face versus non-face rapid categorization task.
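The stability-plasticity mechanism can be illustrated with a fuzzy-ART-flavored toy: an input either resonates with an existing category (match above a vigilance threshold, prototype refined) or recruits a new category, so earlier prototypes are never overwritten. The vectors and parameters below are made up:

```python
# Toy sketch of the ART-style stability/plasticity mechanism: an input
# either resonates with an existing category (match above the vigilance
# threshold, prototype refined) or recruits a new one, so earlier learning
# is never overwritten. Fuzzy-ART-flavored, not the paper's exact model.
import numpy as np

VIGILANCE, LEARNING_RATE = 0.75, 0.5
prototypes = []                      # learned intermediate-level features

def art_learn(x):
    for k, w in enumerate(prototypes):
        match = np.minimum(x, w).sum() / x.sum()      # fuzzy match score
        if match >= VIGILANCE:                        # resonance: refine
            prototypes[k] = (1 - LEARNING_RATE) * w \
                + LEARNING_RATE * np.minimum(x, w)
            return k
    prototypes.append(x.copy())                       # novelty: new category
    return len(prototypes) - 1

face = np.linspace(0.2, 1.0, 16)     # toy "face" feature vector
other = face[::-1]                   # dissimilar "non-face" pattern
for x in [face, face * 0.98, other, face * 1.01]:
    print("category:", art_learn(x))  # -> 0, 0, 1, 0
```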

18.
Human speech and bird vocalization are complex communicative behaviors with notable similarities in development and underlying mechanisms. However, there is an important difference between humans and birds in the way vocal complexity is generally produced. Human speech originates from independent modulatory actions of a sound source, e.g., the vibrating vocal folds, and an acoustic filter, formed by the resonances of the vocal tract (formants). Modulation in bird vocalization, in contrast, is thought to originate predominantly from the sound source, whereas the role of the resonance filter is only subsidiary in emphasizing the complex time-frequency patterns of the source. However, it has been suggested that, analogous to human speech production, tongue movements observed in parrot vocalizations modulate formant characteristics independently from the vocal source. As yet, direct evidence of such a causal relationship is lacking. In five Monk parakeets, Myiopsitta monachus, we replaced the vocal source, the syrinx, with a small speaker that generated a broad-band sound, and we measured the effects of tongue placement on the sound emitted from the beak. The results show that tongue movements cause significant frequency changes in two formants and amplitude changes in all four formants present between 0.5 and 10 kHz. We suggest that lingual articulation may thus in part explain the well-known ability of parrots to mimic human speech and, even more intriguingly, may also underlie a speech-like formant system in natural parrot vocalizations.
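The source-filter logic of the experiment can be mimicked in a few lines: a broadband source (standing in for the speaker that replaced the syrinx) is shaped by second-order resonators at assumed formant frequencies; all values are illustrative:

```python
# Source-filter sketch: broadband noise (standing in for the speaker that
# replaced the syrinx) shaped by two-pole resonators at assumed formant
# frequencies; a "tongue movement" would amount to shifting these values.
import numpy as np
from scipy.signal import lfilter

def resonator(freq, bw, fs):
    """Coefficients of a second-order resonance (formant) filter."""
    r = np.exp(-np.pi * bw / fs)
    theta = 2 * np.pi * freq / fs
    return [1 - r], [1.0, -2 * r * np.cos(theta), r * r]

fs = 22050
source = np.random.default_rng(0).standard_normal(fs)   # broadband source

out = source
for formant in (1000, 2300, 3800):        # hypothetical formants (Hz)
    b, a = resonator(formant, bw=150, fs=fs)
    out = lfilter(b, a, out)

spectrum = np.abs(np.fft.rfft(out))
freqs = np.fft.rfftfreq(len(out), 1 / fs)
print(f"strongest resonance near {freqs[np.argmax(spectrum)]:.0f} Hz")
```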

19.
Songbird males learn to sing their songs from an adult ‘tutor’ early in life, much as human infants learn to speak. Similar to humans, in the songbird brain there are separate neural substrates for vocal production and for auditory memory. In adult songbirds, the caudal pallium, the avian equivalent of the auditory association cortex, has been proposed to contain the neural substrate of tutor song memory, while the song system is involved in song production as well as sensorimotor learning. If this hypothesis is correct, there should be neuronal activation in the caudal pallium, and not in the song system, while the young bird is hearing the tutor song. We found increased song-induced molecular neuronal activation, measured as the expression of an immediate early gene, in the caudal pallium of juvenile zebra finch males that were in the process of learning to sing their songs. No such activation was found in the song system. Molecular neuronal activation was significantly greater in response to tutor song than to novel song or silence in the medial part of the caudomedial nidopallium (NCM). In the caudomedial mesopallium, there was significantly greater molecular neuronal activation in response to tutor song than to silence. In addition, in the NCM there was a significant positive correlation between spontaneous molecular neuronal activation during sleep and the strength of song learning. These results suggest that the caudal pallium contains the neural substrate for tutor song memory, which is activated during sleep when the young bird is in the process of learning its song. The findings provide insight into the formation of auditory memories that guide vocal production learning, a process fundamental for human speech acquisition.

20.
The present article outlines the contribution of the mismatch negativity (MMN), and its magnetic equivalent MMNm, to our understanding of the perception of speech sounds in the human brain. MMN data indicate that each sound, speech and non-speech alike, develops a neural representation in the neurophysiological substrate of auditory sensory memory that corresponds to the percept of that sound. The accuracy of this representation, determining the accuracy of the discrimination between different sounds, can be probed with MMN separately for any auditory feature or stimulus type such as phonemes. Furthermore, MMN data show that the perception of phonemes, and probably also of larger linguistic units (syllables and words), is based on language-specific phonetic traces developed in the posterior part of the left-hemisphere auditory cortex. These traces serve as recognition models for the corresponding speech sounds in listening to speech.
