Similar literature
20 similar documents found (search time: 15 ms)
1.
Seeing the articulatory gestures of the speaker ("speech reading") enhances speech perception, especially in noisy conditions. Recent neuroimaging studies tentatively suggest that speech reading activates the speech motor system, which then influences superior-posterior temporal lobe auditory areas via an efference copy. Here, nineteen healthy volunteers were presented with silent videoclips of a person articulating the Finnish vowels /a/, /i/ (non-targets), and /o/ (targets) during event-related functional magnetic resonance imaging (fMRI). Speech reading significantly activated visual cortex, posterior fusiform gyrus (pFG), posterior superior temporal gyrus and sulcus (pSTG/S), and the speech motor areas, including premotor cortex, parts of the inferior (IFG) and middle (MFG) frontal gyri extending into frontal polar (FP) structures, somatosensory areas, and supramarginal gyrus (SMG). Structural equation modelling (SEM) of these data suggested that information flows first from extrastriate visual cortex to pFG, and from there, in parallel, to pSTG/S and MFG/FP. From pSTG/S, information flow continues to IFG or SMG and eventually to somatosensory areas. Feedback connectivity was estimated to run from MFG/FP to IFG and pSTG/S. The direct functional connection from pFG to MFG/FP and the feedback connection from MFG/FP to pSTG/S and IFG support the hypothesis that prefrontal speech motor areas influence auditory speech processing in pSTG/S via an efference copy.

2.
Resting-state studies of spontaneous fluctuations in the functional magnetic resonance imaging (fMRI) blood oxygen level dependent signal have shown great potential in mapping the intrinsic functional connectivity of the human brain underlying cognitive functions. The aim of the present study was to explore the developmental changes in functional networks of the developing human brain, exemplified by the language network in typically developing preschool children. To this end, resting-state fMRI data were obtained from native Chinese children at ages 3 and 5 years, with 15 children in each age group. Resting-state functional connectivity (RSFC) was analyzed for four regions of interest: the left and right anterior superior temporal gyrus (aSTG), left posterior superior temporal gyrus (pSTG), and left inferior frontal gyrus (IFG). The comparison of these RSFC maps between 3- and 5-year-olds revealed that RSFC decreased in the right aSTG and increased in the left hemisphere between the aSTG seed and IFG, between the pSTG seed and IFG, and between the IFG seed and posterior superior temporal sulcus. In a subsequent analysis, functional asymmetry of the language network seeded in aSTG, pSTG, and IFG was further investigated. The results showed an increase of left lateralization in the RSFC of both pSTG and IFG from age 3 to 5 years. The IFG showed a leftward lateralized trend in 3-year-olds, while pSTG demonstrated rightward asymmetry in 5-year-olds. These findings suggest clear developmental trajectories of the language network between 3- and 5-year-olds, characterized by increasing long-range connections and dynamic hemispheric lateralization with age. Our study provides new insights into the developmental changes of a well-established functional network in young children and also offers a basis for future cross-culture and cross-age studies of the resting-state language network.
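Seed-based RSFC analysis of the kind used in this study boils down to correlating a seed region's mean time course with every other voxel's time course and Fisher-transforming the result for group comparison. The sketch below is a minimal illustration with synthetic data; the array layout, the seed mask, and all parameter choices are assumptions for demonstration, not the authors' pipeline.

```python
# Minimal seed-based RSFC sketch (assumed data layout: timepoints x voxels).
import numpy as np

def seed_rsfc(bold, seed_mask):
    """Correlate the mean seed time course with every voxel's time course."""
    seed_ts = bold[:, seed_mask].mean(axis=1)            # mean signal in the seed ROI
    bold_z = (bold - bold.mean(0)) / bold.std(0)          # z-score each voxel
    seed_z = (seed_ts - seed_ts.mean()) / seed_ts.std()   # z-score the seed
    r = bold_z.T @ seed_z / len(seed_z)                   # Pearson r per voxel
    return np.arctanh(r)                                  # Fisher z for group statistics

# Toy example: 150 timepoints, 1000 voxels, a hypothetical "IFG seed" of 20 voxels
rng = np.random.default_rng(0)
bold = rng.standard_normal((150, 1000))
seed = np.zeros(1000, dtype=bool)
seed[:20] = True
print(seed_rsfc(bold, seed).shape)  # (1000,) connectivity map for this seed
```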

3.
It has traditionally been assumed that cochlear implant users de facto perform atypically in audiovisual tasks. However, a recent study that combined an auditory task with visual distractors suggests that only those cochlear implant users who are not proficient at recognizing speech sounds may show abnormal audiovisual interactions. The present study aims to reinforce this notion by investigating the audiovisual segregation abilities of cochlear implant users in a visual task with auditory distractors. Speechreading was assessed in two groups of cochlear implant users (proficient and non-proficient at sound recognition), as well as in normal controls. A visual speech recognition task (i.e., speechreading) was administered either in silence or in combination with three types of auditory distractors: (i) noise, (ii) reversed speech, and (iii) unaltered speech. Cochlear implant users proficient at speech recognition performed like normal controls in all conditions, whereas non-proficient users showed significantly different audiovisual segregation patterns in both speech conditions. These results confirm that normal-like audiovisual segregation is possible in highly skilled cochlear implant users and, consequently, that proficient and non-proficient CI users cannot be lumped into a single group. This important feature must be taken into account in further studies of audiovisual interactions in cochlear implant users.

4.
Listening to speech in the presence of other sounds (total citations: 1; self-citations: 0; cited by others: 1)
Although most research on the perception of speech has been conducted with speech presented without any competing sounds, we almost always listen to speech against a background of other sounds, which we are adept at ignoring. Nevertheless, such additional irrelevant sounds can cause severe problems for speech recognition algorithms and for the hard of hearing, as well as posing a challenge to theories of speech perception. A variety of problems are created by the presence of additional sound sources: detection of features that are partially masked, allocation of detected features to the appropriate sound sources, and recognition of sounds on the basis of partial information. The separation of sounds is attracting substantial attention in psychoacoustics and in computer science. An effective solution to the problem of separating sounds would have important practical applications.

5.
Humans can recognize spoken words with unmatched speed and accuracy. Hearing the initial portion of a word such as "formu…" is sufficient for the brain to identify "formula" from the thousands of other words that partially match. Two alternative computational accounts propose that partially matching words (1) inhibit each other until a single word is selected ("formula" inhibits "formal" by lexical competition) or (2) are used to predict upcoming speech sounds more accurately (segment prediction error is minimal after sequences like "formu…"). To distinguish these theories, we taught participants novel words (e.g., "formubo") that sound like existing words ("formula") on two successive days. Computational simulations show that knowing "formubo" increases lexical competition when hearing "formu…", but reduces segment prediction error. Conversely, when the sounds in "formula" and "formubo" diverge, the reverse is observed. The time course of magnetoencephalographic brain responses in the superior temporal gyrus (STG) is uniquely consistent with a segment prediction account. We propose a predictive coding model of spoken word recognition in which STG neurons represent the difference between predicted and heard speech sounds. This prediction error signal explains the efficiency of human word recognition and simulates neural responses in auditory regions.
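The contrast between the two accounts can be made concrete with a toy simulation. In the hedged sketch below, lexical competition is counted as the number of words still matching the heard prefix, and segment prediction error as one minus the proportion of matching words that support the heard segment; the tiny lexicon and uniform word priors are illustrative assumptions, not the authors' model.

```python
# Toy contrast of lexical competition vs. segment prediction error.

def candidates(lexicon, prefix):
    """Words still compatible with the heard prefix (the competitor set)."""
    return [w for w in lexicon if w.startswith(prefix)]

def prediction_error(lexicon, prefix, heard):
    """1 - P(heard segment | prefix), with P estimated from the competitor set."""
    active = candidates(lexicon, prefix)
    support = [w for w in active if len(w) > len(prefix) and w[len(prefix)] == heard]
    return 1.0 - (len(support) / len(active) if active else 0.0)

before = ["formula", "formal"]              # lexicon before learning
after = ["formula", "formal", "formubo"]    # after learning the novel word

for lex in (before, after):
    competition = len(candidates(lex, "formu"))
    error = prediction_error(lex, "form", "u")
    print(competition, round(error, 2))
# Learning "formubo" raises competition after "formu" (1 -> 2 candidates)
# but lowers the prediction error for the shared segment "u" (0.50 -> 0.33).
```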

6.
The present article outlines the contribution of the mismatch negativity (MMN), and its magnetic equivalent MMNm, to our understanding of the perception of speech sounds in the human brain. MMN data indicate that each sound, whether speech or non-speech, develops a neural representation corresponding to the percept of that sound in the neurophysiological substrate of auditory sensory memory. The accuracy of this representation, which determines the accuracy of discrimination between different sounds, can be probed with MMN separately for any auditory feature or stimulus type, such as phonemes. Furthermore, MMN data show that the perception of phonemes, and probably also of larger linguistic units (syllables and words), is based on language-specific phonetic traces developed in the posterior part of the left-hemisphere auditory cortex. These traces serve as recognition models for the corresponding speech sounds when listening to speech.

7.
As we speak, we use not only the arbitrary form–meaning mappings of the speech channel but also motivated form–meaning correspondences, i.e., iconic gestures that accompany speech (e.g., an inverted V-shaped hand wiggling across gesture space to depict walking). This article reviews what we know about the processing of semantic information from speech and iconic gestures in spoken languages during comprehension of such composite utterances. Several studies have shown that comprehension of iconic gestures involves brain activations known to be involved in semantic processing of speech: modulation of the electrophysiological component N400, which is sensitive to the ease of semantic integration of a word with its previous context, and recruitment of the left-lateralized frontal–posterior temporal network (left inferior frontal gyrus (IFG), middle temporal gyrus (MTG), and superior temporal gyrus/sulcus (STG/S)). Furthermore, we integrate the information coming from both channels, recruiting brain areas such as left IFG, posterior superior temporal sulcus (STS)/MTG, and even motor cortex. Finally, this integration is flexible: the temporal synchrony between the iconic gesture and the speech segment, as well as the perceived communicative intent of the speaker, modulate the integration process. Whether these findings are specific to gestures or are shared with actions or other visual accompaniments to speech (e.g., lips) or other visual symbols such as pictures is discussed, as well as the implications for a multimodal view of language.

8.
Several acoustic cues contribute to auditory distance estimation. Nonacoustic cues, including familiarity, may also play a role. We tested participants' ability to distinguish the distances of acoustically similar sounds that differed in familiarity. Participants were better able to judge the distances of familiar sounds. Electroencephalographic (EEG) recordings collected while participants performed this auditory distance judgment task revealed that several cortical regions responded in different ways depending on sound familiarity. Surprisingly, these differences were observed in auditory cortical regions as well as other cortical regions distributed throughout both hemispheres. These data suggest that learning about subtle, distance-dependent variations in complex speech sounds involves processing in a broad cortical network that contributes both to speech recognition and to how spatial information is extracted from speech.

9.
How the human auditory system extracts perceptually relevant acoustic features of speech is unknown. To address this question, we used intracranial recordings from nonprimary auditory cortex in the human superior temporal gyrus to determine what acoustic information in speech sounds can be reconstructed from population neural activity. We found that slow and intermediate temporal fluctuations, such as those corresponding to syllable rate, were accurately reconstructed using a linear model based on the auditory spectrogram. However, reconstruction of fast temporal fluctuations, such as syllable onsets and offsets, required a nonlinear sound representation based on temporal modulation energy. Reconstruction accuracy was highest within the range of spectro-temporal fluctuations that have been found to be critical for speech intelligibility. The decoded speech representations allowed readout and identification of individual words directly from brain activity during single-trial sound presentations. These findings reveal neural encoding mechanisms of speech acoustic parameters in higher-order human auditory cortex.
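The linear reconstruction model described here maps (time-lagged) population neural activity back onto the auditory spectrogram. The following sketch shows the simplest ridge-regression version of such a decoder on synthetic data; the shapes, lags, and regularization value are assumptions for illustration, not the authors' fitted model.

```python
# Linear stimulus reconstruction sketch: decode a spectrogram from neural responses.
import numpy as np

def lag_features(R, lags):
    """Stack time-lagged copies of the responses (timepoints x electrodes) as features."""
    T, E = R.shape
    X = np.zeros((T, E * len(lags)))
    for i, lag in enumerate(lags):
        X[:, i * E:(i + 1) * E] = np.roll(R, lag, axis=0)
    return X

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X'X + alpha*I)^-1 X'Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

rng = np.random.default_rng(1)
S = rng.standard_normal((500, 32))                        # "spectrogram": 500 frames, 32 bands
R = S @ rng.standard_normal((32, 64)) + 0.1 * rng.standard_normal((500, 64))  # fake responses
X = lag_features(R, lags=range(5))                        # 0-4 frame lags
W = ridge_fit(X, S)
S_hat = X @ W                                             # reconstructed spectrogram
print(round(np.corrcoef(S_hat[:, 0], S[:, 0])[0, 1], 3))  # reconstruction accuracy, band 0
```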

10.
Bottlenose dolphins (Tursiops truncatus) use the frequency contour of whistles produced by conspecifics for individual recognition. Here we tested a bottlenose dolphin's ability to recognize frequency-modulated, whistle-like sounds using a three-alternative matching-to-sample paradigm. The dolphin was first trained to select a specific object (object A) in response to a specific sound (sound A), for a total of three object-sound associations. The sounds were then transformed by amplitude, duration, or frequency transposition while still preserving the frequency contour of each sound. For comparison purposes, 30 human participants completed an identical task with the same sounds, objects, and training procedure. The dolphin's ability to correctly match objects to sounds was robust to changes in amplitude, with only a minor decrement in performance for short durations. The dolphin failed to recognize sounds that were frequency transposed by plus or minus half an octave. Human participants demonstrated robust recognition with all acoustic transformations. The results indicate that this dolphin's acoustic recognition of whistle-like sounds was constrained by absolute pitch. Unlike human speech, which varies considerably in average frequency, signature whistles are relatively stable in frequency, which may have selected for a whistle recognition system invariant to frequency transposition.
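For readers who want to reproduce the kind of stimulus manipulations used here, the sketch below applies amplitude scaling, duration change, and a half-octave (6 semitone) frequency transposition to a recording while leaving the shape of its frequency contour intact. librosa's time_stretch and pitch_shift are stand-ins chosen for illustration; they are not the authors' actual signal-processing chain, and the file path is hypothetical.

```python
# Hedged sketch of the three acoustic transformations (amplitude, duration, transposition).
import librosa

def transform_whistle(y, sr):
    return {
        "louder": 2.0 * y,                                              # amplitude scaling
        "shorter": librosa.effects.time_stretch(y, rate=1.5),            # duration change
        "up_half_octave": librosa.effects.pitch_shift(y, sr=sr, n_steps=6),
        "down_half_octave": librosa.effects.pitch_shift(y, sr=sr, n_steps=-6),
    }

# Usage with a hypothetical recording:
# y, sr = librosa.load("whistle.wav", sr=None)
# variants = transform_whistle(y, sr)
```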

11.
Virtually every human faculty engages with imitation. One of the most natural and yet unexplored objects for the study of the mimetic elements in language is onomatopoeia, as it implies an imitation-driven transformation of a sound of nature into a word. Notably, simple sounds are transformed into complex strings of vowels and consonants, making it difficult to identify what is acoustically preserved in this operation. In this work we propose a definition of vocal imitation by which sounds are transformed into the speech elements that minimize their spectral difference within the constraints of the vocal system. In order to test this definition, we use a computational model that allows the recovery of anatomical features of the vocal system from experimental sound data. We explore the vocal configurations that best reproduce non-speech sounds, like striking blows on a door or the sharp sounds generated by pressing light switches or computer mouse buttons. From the anatomical point of view, the configurations obtained are readily associated with co-articulated consonants, and we show perceptual evidence that these consonants are positively associated with the original sounds. Moreover, the vowel-consonant pairs that compose these co-articulations correspond to the most stable syllables found in the knock and click onomatopoeias across languages, suggesting a mechanism by which vocal imitation naturally embeds single sounds into more complex speech structures. Other mimetic forces have received extensive attention from the scientific community, such as cross-modal associations between speech and visual categories. The present approach helps build a global view of the mimetic forces acting on language and opens a new avenue for a quantitative study of word formation in terms of vocal imitation.

12.
Automatic recognition of insect sounds based on MFCC and GMM (total citations: 1; self-citations: 0; cited by others: 1)
竺乐庆, 张真. 《昆虫学报》 (Acta Entomologica Sinica), 2012, 55(4): 466-471
Insects produce sounds when moving, feeding, and calling. These sounds show within-species similarity and between-species differences, and can therefore be used to identify insect species. Automatic insect species detection based on insect sounds is of great value in helping agricultural and forestry practitioners identify insect species conveniently. In this study, sound parameterization techniques from the field of speech recognition were adopted to achieve automatic identification of insect sounds. After preprocessing, Mel-frequency cepstral coefficients (MFCCs) were extracted from the sound samples as features, and the MFCC feature sets extracted from these samples were used to train Gaussian mixture models (GMMs). Finally, the trained GMMs were used to classify insect sound samples of unknown species. The method was evaluated on a sample library containing the sounds of 58 insect species and achieved high recognition accuracy (mean precision of 98.95%) and satisfactory runtime performance. These results demonstrate that sound parameterization techniques based on MFCC and GMM can be used to identify insect species effectively.
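The pipeline described in this abstract, MFCC features per frame and one Gaussian mixture model per species, is straightforward to prototype. The sketch below uses librosa and scikit-learn as convenient stand-ins; the file paths, 13 coefficients, 8 mixture components, and diagonal covariances are illustrative assumptions rather than the authors' reported settings.

```python
# MFCC + GMM species classification sketch (maximum average log-likelihood decision).
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(path, sr=22050, n_mfcc=13):
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # (frames, n_mfcc)

def train_models(train_files, n_components=8):
    """train_files: dict mapping species name -> list of wav paths for that species."""
    models = {}
    for species, paths in train_files.items():
        X = np.vstack([mfcc_frames(p) for p in paths])
        models[species] = GaussianMixture(n_components=n_components,
                                          covariance_type="diag").fit(X)
    return models

def classify(path, models):
    X = mfcc_frames(path)
    # score() is the mean per-frame log-likelihood under each species' GMM
    return max(models, key=lambda species: models[species].score(X))

# Usage with hypothetical recordings:
# models = train_models({"cricket": ["cricket_01.wav"], "cicada": ["cicada_01.wav"]})
# print(classify("unknown_insect.wav", models))
```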

13.
An artificial neural network that incorporates anatomical and physiological findings on the afferent pathway from the ear to the cortex is presented, and the roles of its constituent functions in the recognition of continuous speech are examined. The network processes successive spectra of speech sounds through a cascade of neural layers: a lateral excitation layer (LEL), a lateral inhibition layer (LIL), and a pile of feature detection layers (FDLs). These layers are shown to be effective for recognizing spoken words. First, the LEL reduces the distortion of the sound spectrum caused by the pitch of speech sounds. Next, the LIL emphasizes the major energy peaks of the sound spectrum, the formants. Last, the FDLs detect syllables and words in successive formants, where two functions, time delay and strong adaptation, play important roles: time delay makes it possible to retain the pattern of formant changes long enough to detect spoken words successively, while strong adaptation contributes to removing the time-warp of formant changes. Digital computer simulations show that the network detects isolated syllables, isolated words, and connected words in continuous speech, while reproducing the fundamental responses found in the auditory system, such as ON, OFF, ON-OFF, and SUSTAINED patterns.
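Of the layers described, the lateral inhibition layer is the easiest to illustrate in isolation: convolving a spectrum with an on-center/off-surround kernel suppresses the smooth background and emphasizes formant-like peaks. The sketch below is a generic difference-of-Gaussians implementation with made-up kernel widths and a toy spectrum, not the paper's network.

```python
# Lateral inhibition sketch: sharpen spectral peaks with a difference-of-Gaussians kernel.
import numpy as np

def lateral_inhibition(spectrum, center_sigma=1.0, surround_sigma=4.0):
    """Convolve with an on-center/off-surround kernel and half-wave rectify."""
    x = np.arange(-15, 16, dtype=float)
    center = np.exp(-x**2 / (2 * center_sigma**2))
    surround = np.exp(-x**2 / (2 * surround_sigma**2))
    kernel = center / center.sum() - surround / surround.sum()
    out = np.convolve(spectrum, kernel, mode="same")
    return np.maximum(out, 0.0)               # keep only excitatory (positive) responses

# Toy spectrum: two broad "formant" peaks riding on a smooth background
freq = np.arange(256)
spectrum = np.exp(-(freq - 60) ** 2 / 200) + 0.8 * np.exp(-(freq - 150) ** 2 / 300) + 0.2
sharpened = lateral_inhibition(spectrum)
print(int(np.argmax(sharpened[:100])), int(np.argmax(sharpened[100:])) + 100)
# -> approximately 60 and 150: the formant positions survive and stand out
```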

14.
Animals recognize biologically relevant sounds, such as the non-harmonic sounds made by some predators, and respond with adaptive behaviors, such as escaping. To clarify which acoustic parameters are used for identifying non-harmonic, noise-like, broadband sounds, guinea pigs were conditioned to a natural target sound using a novel training procedure in which 2 or 3 guinea pigs in a group competed for food. A set of distinct behavioral reactions was reliably induced almost exclusively by the target sound over 2 weeks of operant training. When fully conditioned, individual animals were separately tested for recognition of a set of target-like sounds that had been modified from the target sound, with spectral ranges eliminated or with fine or coarse temporal structures altered. The results show that guinea pigs are able to identify noise-like, non-harmonic natural sounds by relying on gross spectral composition and/or fine temporal structure, just as birds are thought to do in the recognition of harmonic birdsongs. These findings are discussed with regard to similarities and dissimilarities to harmonic sound recognition. The results suggest that similar but not identical processing, requiring different time scales, might be used to recognize harmonic and non-harmonic sounds, at least in small mammals.

15.
The aims of this study were (1) to document the recognition of environmental sounds (ESs) in Mandarin-speaking children with cochlear implants (CIs) and to analyze factors possibly associated with ESs recognition; (2) to examine the relationship between perception of ESs and receptive vocabulary level; and (3) to explore the acoustic factors relevant to the perception of daily ESs in pediatric CI users. Forty-seven prelingually deafened children between the ages of 4 and 10 years participated in this study. They were divided into a pre-school group (group A: ages 4-6) and a school-age group (group B: ages 7-10). The Sound Effects Recognition Test (SERT) and the Chinese version of the revised Peabody Picture Vocabulary Test (PPVT-R) were used to assess auditory perception ability. The average correct percentage on the SERT was 61.2% in the preschool group and 72.3% in the older group, with no significant difference between the two groups. The ESs recognition performance of children with CIs was poorer than that of their hearing peers (90% on average). No correlation existed between ESs recognition and receptive vocabulary comprehension. Two predictive factors, pre-implantation residual hearing and duration of CI usage, were found to be associated with recognition of daily-encountered ESs. Acoustically, sounds with distinct temporal patterning were easier for children with CIs to identify. In conclusion, we have demonstrated that ESs recognition is not easy for children with CIs and that only a low correlation existed between linguistic sounds and ESs recognition in these subjects. Recognition of ESs in children with CIs can be achieved only through natural exposure to daily-encountered auditory stimuli if non-speech sounds are given little emphasis in routine verbal/oral habilitation programs. Therefore, task-specific measures other than speech materials can be helpful for capturing the full profile of auditory perceptual progress after implantation.

16.
Deng Y, Guo R, Ding G, Peng D. PLoS ONE, 2012, 7(3): e33337
Both the ventral and dorsal visual streams in the human brain are known to be involved in reading. However, the interaction of these two pathways and their responses to different cognitive demands remain unclear. In this study, activation of neural pathways during Chinese character reading was measured using functional magnetic resonance imaging (fMRI). Visual-spatial analysis (mediated by the dorsal pathway) was dissociated from lexical recognition (mediated by the ventral pathway) via a spatial-based lexical decision task and effective connectivity analysis. Connectivity results revealed that, during spatial processing, the left superior parietal lobule (SPL) positively modulated the left fusiform gyrus (FG), while during lexical processing, the left SPL received positive modulatory input from the left inferior frontal gyrus (IFG) and sent negative modulatory output to the left FG. These findings suggest that the dorsal stream is highly involved in lexical recognition and acts as a top-down modulator for lexical processing.

17.
The multiple-channel cochlear implant is the first sensorineural prosthesis to bring electronic technology effectively and safely into a direct physiological relation with the central nervous system and human consciousness, and to give speech perception to severely-profoundly deaf people and spoken language to children. Research showed that the place and temporal coding of sound frequencies could be partly replicated by multiple-channel stimulation of the auditory nerve. This required safety studies on how to protect the cochlea from the effects of trauma, electrical stimulation, biomaterials, and middle ear infection. The mechanical properties of an electrode array and the mode of stimulation for the place coding of speech frequencies were determined. A fully implantable receiver-stimulator was developed, as well as procedures for the clinical assessment of deaf people and the surgical placement of the device. The perception of electrically coded sounds was determined, and a speech processing strategy was discovered that enabled late-deafened adults to comprehend running speech. The brain processing systems for patterns of electrical stimuli reproducing speech were elucidated. The research was developed industrially, and improvements in speech processing were made by presenting additional speech frequencies through place coding. Finally, the importance of the multiple-channel cochlear implant for early-deafened children was established.

18.
Sounds in our environment, like voices, animal calls, or musical instruments, are easily recognized by human listeners. Understanding the key features underlying this robust sound recognition is an important question in auditory science. Here, we studied human listeners' recognition of new classes of sounds: acoustic and auditory sketches, sounds that are severely impoverished but still recognizable. Starting from a time-frequency representation, a sketch is obtained by keeping only sparse elements of the original signal, here by means of a simple peak-picking algorithm. Two time-frequency representations were compared: a biologically grounded one, the auditory spectrogram, which simulates peripheral auditory filtering, and a simple acoustic spectrogram, based on a Fourier transform. Three degrees of sparsity were also investigated. Listeners were asked to recognize the category to which a sketch sound belongs: singing voices, bird calls, musical instruments, or vehicle engine noises. Results showed that, with the exception of voice sounds, very sparse representations of sounds (10 features, or energy peaks, per second) could be recognized above chance. No clear differences were observed between the acoustic and the auditory sketches. For the voice sounds, however, a completely different pattern of results emerged, with at-chance or even below-chance recognition performance, suggesting that the important features of the voice, whatever they are, were removed by the sketch process. Overall, these perceptual results were well correlated with a model of auditory distances based on spectro-temporal excitation patterns (STEPs). This study confirms the potential of these new classes of sounds, acoustic and auditory sketches, for studying sound recognition.
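The core of the sketch procedure, keeping only a handful of time-frequency energy peaks per second, can be prototyped in a few lines. The code below implements a simplified acoustic-spectrogram variant with a global magnitude threshold and resynthesis using the original phase; the thresholding rule and all parameters are assumptions, and the biologically grounded auditory-spectrogram variant is not reproduced.

```python
# Simplified "acoustic sketch": keep ~N strongest STFT peaks per second, discard the rest.
import numpy as np
import librosa

def acoustic_sketch(y, sr, peaks_per_second=10):
    D = librosa.stft(y)                            # complex Fourier spectrogram
    mag = np.abs(D)
    n_keep = max(1, int(peaks_per_second * len(y) / sr))
    thresh = np.sort(mag.ravel())[-n_keep]         # magnitude of the n-th largest bin
    mask = mag >= thresh                           # sparse set of the strongest peaks
    return librosa.istft(D * mask)                 # resynthesize the impoverished sound

# Toy usage: a 1 s rising tone at a 22.05 kHz sampling rate
sr = 22050
t = np.linspace(0, 1, sr, endpoint=False)
y = np.sin(2 * np.pi * (440 + 200 * t) * t).astype(np.float32)
sketch = acoustic_sketch(y, sr, peaks_per_second=10)
print(y.shape, sketch.shape)                       # sketch is roughly the original length
```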

19.
Environmental sounds are highly complex stimuli whose recognition depends on the interaction of top-down and bottom-up processes in the brain. Their semantic representations were shown to yield repetition suppression effects, i.e., a decrease in activity during exposure to a sound that is perceived as belonging to the same source as a preceding sound. Making use of the high spatial resolution of 7T fMRI, we investigated the representations of sound objects within early-stage auditory areas on the supratemporal plane. The primary auditory cortex was identified by means of tonotopic mapping, and the non-primary areas by comparison with previous histological studies. Repeated presentations of different exemplars of the same sound source, as compared to the presentation of different sound sources, yielded significant repetition suppression effects within a subset of early-stage areas. This effect was found within the right hemisphere in primary areas A1 and R as well as in two non-primary areas on the antero-medial part of the planum temporale, and within the left hemisphere in A1 and a non-primary area on the medial part of Heschl's gyrus. Thus several, but not all, early-stage auditory areas encode the meaning of environmental sounds.

20.
The processing of species-specific communication signals in the auditory system represents an important aspect of animal behavior and is crucial for social interactions, reproduction, and survival. In this article, the neuronal mechanisms underlying the processing of communication signals in the higher centers of the auditory system, namely the inferior colliculus (IC), medial geniculate body (MGB), and auditory cortex (AC), are reviewed, with particular attention to the guinea pig. The selectivity of neuronal responses for individual calls in these auditory centers in the guinea pig is usually low: most neurons respond to calls as well as to artificial sounds, and the coding of complex sounds in the central auditory nuclei is apparently based on the representation of the temporal and spectral features of acoustic stimuli in neural networks. Neuronal response patterns in the IC reliably match the sound envelope for calls characterized by one or more short impulses, but do not exactly fit the envelope for long calls. The main spectral peaks are also represented by neuronal firing rates in the IC. In comparison with the IC, response patterns in the MGB and AC demonstrate a less precise representation of the sound envelope, especially in the case of longer calls. The spectral representation is worse in the case of low-frequency calls, but not in the case of broad-band calls. The emotional content of a call may influence neuronal responses in the auditory pathway, which can be demonstrated by stimulation with time-reversed calls or by measurements performed under different levels of anesthesia. Investigating the principles of the neural coding of species-specific vocalizations offers some keys to understanding the neural mechanisms underlying human speech perception.
