Similar Articles
20 similar articles found.
1.
Spatial release from masking refers to the benefit for speech understanding that occurs when a target talker and a masker talker are spatially separated: speech intelligibility for the target is then typically higher than when both talkers are at the same location. In cochlear implant listeners, spatial release from masking is much reduced or absent compared with normal-hearing listeners, perhaps because cochlear implant listeners cannot effectively attend to spatial cues. Three experiments examined factors that may interfere with deploying spatial attention to a target talker masked by another talker. To simulate cochlear implant listening, stimuli were vocoded with two distinctive features: first, 50-Hz low-pass filtered speech envelopes and noise carriers, strongly reducing the possibility of temporal pitch cues; second, co-modulation imposed on target and masker utterances to enhance perceptual fusion between the two sources. Stimuli were presented over headphones. Experiments 1 and 2 presented high-fidelity spatial cues with unprocessed and vocoded speech; Experiment 3 maintained faithful long-term average interaural level differences but presented scrambled interaural time differences with vocoded speech. Results show a robust spatial release from masking in Experiments 1 and 2 and a greatly reduced release in Experiment 3. Faithful long-term average interaural level differences were thus insufficient to produce spatial release from masking, suggesting that appropriate interaural time differences are necessary for restoring it, at least in situations with few viable alternative segregation cues.
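A minimal sketch of the noise-vocoding scheme described above, in Python with NumPy/SciPy; the band edges, filter orders, and the function name noise_vocode are illustrative assumptions, not the authors' implementation:

import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(speech, fs, band_edges, env_cutoff=50.0):
    # For each analysis band: extract a 50-Hz low-pass amplitude envelope
    # (discarding temporal pitch cues) and impose it on a band-limited
    # noise carrier, then sum the bands.
    rng = np.random.default_rng(0)
    env_sos = butter(4, env_cutoff, btype="low", fs=fs, output="sos")
    out = np.zeros(len(speech))
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band_sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, speech)
        env = np.clip(sosfiltfilt(env_sos, np.abs(hilbert(band))), 0.0, None)
        carrier = sosfiltfilt(band_sos, rng.standard_normal(len(speech)))
        out += env * carrier
    return out

For example, noise_vocode(speech, 16000, [100, 400, 1000, 2400, 6000]) yields a four-band vocoded signal in which the slow amplitude envelopes survive but periodicity cues do not.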

2.
In reverberant rooms with multiple people talking, spatial separation between speech sources improves recognition of attended speech, even though both the head-shadow and interaural-interaction unmasking cues are limited by numerous reflections. It is the perceptual integration of the direct wave with its reflections that bridges the direct-reflection temporal gaps and produces spatial unmasking under reverberant conditions. This study further investigated (1) the temporal dynamics of this direct-reflection-integration-based spatial unmasking as a function of reflection delay, and (2) whether these dynamics are correlated with listeners' auditory ability to temporally retain raw acoustic signals (i.e., the fast-decaying primitive auditory memory, PAM). The results showed that recognition of target speech against a speech-masker background is a descending exponential function of the delay of the simulated target reflection. In addition, the temporal extent of PAM is frequency dependent and markedly longer than that of perceptual fusion. More importantly, the temporal dynamics of the speech-recognition function are significantly correlated with the temporal extent of PAM for low-frequency raw signals. We therefore propose that a chain process linking earlier-stage PAM with later-stage correlation computation, perceptual integration, and attention facilitation plays a role in spatially unmasking target speech under reverberant conditions.
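The reported "descending exponential function" of reflection delay can be written compactly; a sketch in our own notation (not the authors' formula), where I is the recognition score, τ the reflection delay, I_0 and I_∞ the zero-delay and long-delay asymptotes, and λ a listener-specific decay constant:

    I(\tau) = I_{\infty} + (I_{0} - I_{\infty})\, e^{-\tau/\lambda}

On this reading, the correlation reported above amounts to λ tracking the temporal extent of low-frequency PAM.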

3.
Many behaviourally relevant sensory events, such as motion stimuli and speech, have an intrinsic spatio-temporal structure. This engages intentional and most likely unintentional (automatic) prediction mechanisms that enhance the perception of upcoming stimuli in the event stream. Here we sought to probe the anticipatory processes that are automatically driven by rhythmic input streams in terms of their spatial and temporal components. To this end, we employed an apparent visual motion paradigm testing the effects of pre-target motion on lateralized visual target discrimination. The motion stimuli either moved towards or away from peripheral target positions (valid vs. invalid spatial motion cueing) at a rhythmic or arrhythmic pace (valid vs. invalid temporal motion cueing). Crucially, we emphasized automatic motion-induced anticipatory processes by rendering the motion stimuli non-predictive of upcoming target position (by design) and task-irrelevant (by instruction), and by instead creating endogenous (orthogonal) expectations using symbolic cueing. Our data revealed that the apparent motion cues automatically engaged both spatial and temporal anticipatory processes, but that these processes were dissociated. We further found evidence for lateralization of anticipatory temporal, but not spatial, processes. This indicates that distinct mechanisms may drive automatic spatial and temporal extrapolation of upcoming events from rhythmic event streams, in contrast with previous findings suggesting an interaction between spatial and temporal attention processes when endogenously driven. Our results further highlight the need to isolate intentional from unintentional processes to better understand the various anticipatory mechanisms engaged in processing behaviourally relevant stimuli with predictable spatio-temporal structure, such as motion and speech.

4.
Klinge A, Beutelmann R, Klump GM. PLoS ONE. 2011;6(10):e26124.
The amount of masking of sounds from one source (signals) by sounds from a competing source (maskers) depends heavily on the sound characteristics of the masker and the signal and on their relative spatial locations. Numerous studies have investigated the ability to detect a signal in a speech or noise masker, or the effect of spatially separating signal and masker on the amount of masking, but few studies have investigated the combined effects of many cues, as is typical of natural listening situations. The current study, using free-field listening, systematically evaluates the combined effects of harmonicity and inharmonicity cues in multi-tone maskers, together with cues resulting from spatial separation of target signal and masker, on the detection of a pure tone in a multi-tone or noise masker. A linear binaural processing model was implemented to predict the masked thresholds, in order to estimate whether the observed thresholds can be accounted for by energetic masking in the auditory periphery or whether other effects are involved. Thresholds were determined for combinations of two target frequencies (1 and 8 kHz), two spatial configurations (masker and target either co-located or separated by 90 degrees azimuth), and five masker types (four complex multi-tone stimuli and one noise masker). Spatial separation of target and masker produced a release from masking for all masker types. The amount of masking depended significantly on masker type and frequency range. The various harmonic and inharmonic relations between target and masker, or among components of the masker, produced a complex pattern of masked thresholds that were higher or lower than predicted from energetic masking. The results indicate that harmonicity cues affect the detectability of a tonal target in a complex masker.

5.
Speech perception often benefits from vision of the speaker's lip movements when they are available. One potential mechanism underlying this gain from audio-visual integration is on-line prediction. In this study we ask whether preceding speech context in a single modality can improve audiovisual processing, and whether any such improvement is based on on-line information transfer across sensory modalities. In each trial, a speech fragment (context) presented in a single sensory modality (voice or lips) was immediately continued by an audiovisual target fragment, and participants made speeded judgments about whether voice and lips agreed in the target fragment. The leading single-modality context and the subsequent audiovisual target could be continuous in one modality only, in both modalities (the context modality continued into both modalities of the target), or in neither (i.e., discontinuous). Audiovisual matching responses were quicker when the context was continuous with the target within either the visual or the auditory channel (Experiment 1). Critically, prior visual context also provided an advantage when it was cross-modally continuous (with the auditory channel in the target), whereas auditory-to-visual cross-modal continuity yielded no advantage (Experiment 2). This suggests that visual speech information can provide an on-line benefit for processing the upcoming auditory input through predictive mechanisms. We hypothesize that this benefit is expressed at an early level of speech analysis.

6.
The purpose of the present study was to determine whether different cues to increase loudness in speech result in different internal targets (or goals) for respiratory movement, and whether the neural control of the respiratory system is sensitive to changes in the speaker's internal loudness target. Respiratory mechanisms were examined during speech in 30 young adults at a comfortable loudness level and at increased loudness levels. Increased loudness was elicited using three methods: asking subjects to target a specific sound pressure level, asking subjects to speak twice as loud as comfortable, and asking subjects to speak in noise. All three loud conditions resulted in similar increases in sound pressure level. However, the respiratory mechanisms used to support the increase in loudness differed significantly depending on how the louder speech was elicited. When asked to target a particular sound pressure level, subjects initiated speech at a higher lung volume to take advantage of higher recoil pressures. When asked to speak twice as loud as comfortable, subjects mostly increased expiratory muscle tension to raise the pressure for speech. In the most natural of the elicitation methods, speaking in noise, subjects used a combined respiratory approach, exploiting both increased recoil pressures and increased expiratory muscle tension. In noise, an additional target, possibly improved speech intelligibility, was reflected in a slower speech rate and larger volume excursions even though the speakers produced the same number of syllables.

7.
Most people are right-handed and left-cerebrally dominant for speech, leading historically to the general notion of left-hemispheric dominance and, more recently, to genetic models proposing a single lateralizing gene. Such a hypothetical gene can account for the higher incidence of right-handers among those with left-cerebral dominance for speech. It remains unclear how this dominance relates to the right-cerebral dominance for some nonverbal functions, such as spatial or emotional processing. Here we used functional magnetic resonance imaging in a sample of 155 subjects to measure asymmetrical activation induced by speech production in the frontal lobes, by face processing in the temporal lobes, and by spatial processing in the parietal lobes. Left-frontal, right-temporal, and right-parietal dominance were all intercorrelated, suggesting that right-cerebral biases may be at least in part complementary to the left-hemispheric dominance for language. However, handedness and parietal asymmetry for spatial processing were uncorrelated, implying independent lateralizing processes: one producing a leftward bias most closely associated with handedness, and the other a rightward bias most closely associated with spatial attention.

8.
Hasson U, Skipper JI, Nusbaum HC, Small SL. Neuron. 2007;56(6):1116-1126.
Is there a neural representation of speech that transcends its sensory properties? Using fMRI, we investigated whether there are brain areas where neural activity during observation of sublexical audiovisual input corresponds to a listener's speech percept (what is "heard") independent of the sensory properties of the input. A target audiovisual stimulus was preceded by stimuli that (1) shared the target's auditory features (auditory overlap), (2) shared the target's visual features (visual overlap), or (3) shared neither the target's auditory nor visual features but were perceived as the target (perceptual overlap). In two left-hemisphere regions (pars opercularis, planum polare), the target evoked less activity when it was preceded by the perceptually overlapping stimulus than when preceded by stimuli that shared one of its sensory components. This pattern of neural facilitation indicates that these regions code sublexical speech at an abstract level corresponding to that of the speech percept.

9.
Our auditory system must organize and pick out a target sound that has many components, sometimes rejecting irrelevant sound components and sometimes forming multiple streams, including the target stream. This situation is well described by the concept of auditory scene analysis, to which research on speech perception in noise is closely related. This paper briefly reviews the concept of auditory scene analysis and previous and ongoing research on speech perception in noise, and discusses future directions for research. Further experimental investigation is needed to better understand our perceptual mechanisms.

10.
Music has a pervasive tendency to engage our bodies rhythmically; in contrast, synchronization with speech is rare. Music's superiority over speech in driving movement probably results from the isochrony of musical beats, as opposed to irregular speech stresses. Moreover, the presence of regular patterns of embedded periodicities (i.e., meter) may be critical in making music particularly conducive to movement. We investigated these possibilities by asking participants to synchronize with isochronous auditory stimuli (the target) while music and speech distractors were presented at one of various phase relationships to the target. In Experiment 1, familiar musical excerpts and fragments of children's poetry were used as distractors; the stimuli were manipulated in terms of beat/stress isochrony and average pitch to achieve maximum comparability. In Experiment 2, the distractors were well-known songs performed with lyrics, on a reiterated syllable, and as spoken lyrics, all having the same meter. Music perturbed synchronization with the target stimuli more than the speech fragments did. However, music's superiority over speech disappeared when the distractors shared isochrony and the same meter. Music's peculiarly regular temporal structure is thus likely the main factor fostering tight coupling between sound and movement.

11.
Bishop CW, Miller LM. PLoS ONE. 2011;6(8):e24016.
Speech is the most important form of human communication, but ambient sounds and competing talkers often degrade its acoustics. Fortunately, the brain can use visual information, especially its highly precise spatial information, to improve speech comprehension in noisy environments. Previous studies have demonstrated that audiovisual integration depends strongly on spatiotemporal factors. However, some integrative phenomena, such as McGurk interference, persist even with gross spatial disparities, suggesting that spatial alignment is not necessary for robust integration of audiovisual place-of-articulation cues. It is therefore unclear how speech cues interact with audiovisual spatial-integration mechanisms. Here, we combine two well-established psychophysical phenomena, the McGurk effect and the ventriloquist illusion, to explore this dependency. Our results demonstrate that conflicting spatial cues may not interfere with audiovisual integration of speech, but conflicting speech cues can impede integration in space. This suggests a direct but asymmetrical influence between the ventral "what" and dorsal "where" pathways.

12.
Nasir SM, Ostry DJ. Current Biology. 2006;16(19):1918-1923.
Speech production depends on both auditory and somatosensory feedback. Although audition may appear to be the dominant sensory modality in speech production, somatosensory information plays a role that extends from brainstem responses to cortical control. Accordingly, the motor commands that underlie speech movements may have somatosensory as well as auditory goals. Here we provide evidence that, independent of the acoustics, somatosensory information is central to achieving the precision requirements of speech movements. We dissociated auditory and somatosensory feedback by using a robotic device that altered the jaw's motion path, and hence proprioception, without affecting speech acoustics. The loads were designed to target either the consonant- or vowel-related portion of an utterance, because these are the major sound categories in speech. We found that, even in the absence of any effect on the acoustics, subjects corrected with learning to an equal extent for both kinds of loads. This finding suggests that there are comparable somatosensory precision requirements for both kinds of speech sounds. We provide experimental evidence that the neural control of stiffness or impedance (the resistance to displacement) provides for somatosensory precision in speech production.
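The impedance notion can be unpacked with the standard mechanical relation (our notation, not the authors'): the force opposing a jaw displacement is

    F(t) = K\,\Delta x(t) + B\,\Delta \dot{x}(t)

where K is stiffness and B is damping; raising K shrinks the positional error produced by a load of a given size, which is one way somatosensory precision could be maintained even when the acoustics are unaffected.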

13.
This paper considers the possibility that the central mechanisms of stereognosis and speech function develop correlatively during ontogenesis, by comparing changes in the spatial organization of interregional interactions among cortical areas in children of three age groups (5-6, 7-8, and 9-10 years) and in adults during stereognostic, verbal-mnestic, and manual motor (tapping-test) tasks. With age, the spatial structure of EEG interrelations characteristic of stereognostic task performance becomes significantly more similar to the patterns of long-range EEG connections observed during speech tasks. In contrast, the similarity between the EEG patterns for stereognostic tasks and those for the tapping test does not increase with age. Overall, the data suggest that, as children grow older, the topological similarity increases between the systemic cortical interactions supporting stereognostic and speech functions. This progressive convergence of the distributed neurophysiological mechanisms serving speech and stereognosis supports the concept that these higher psychical functions form correlatively during postnatal ontogenesis. The results also indicate that the correlative interfunctional interactions promoting the progressive development of cognitive functions in child ontogenesis may be realized through long associative and commissural fiber pathways, which form the longitudinal-transverse morphofunctional "skeleton" of the neocortex, in close interaction with thalamo-cortical integrative systems.

14.
Motor alalia refers to a group of expressive speech disorders caused by dysfunction of cerebral structures during the period when the speech system is still forming. This form of speech disorder is considered a language disorder characterized by persistently disturbed assimilation of the system of linguistic units. A possible cause of deviations in the development of speech function in children is a disproportion between the levels of development of speech structures in the left and right hemispheres; this temporary dominance is often associated with increased activity in the right hemisphere. Electroencephalographic studies of children aged five to six years reveal two types of changes in the systemic interaction of bioelectric potentials across the cerebral cortex: disorders of the spatial organization of interregional EEG correlations are more pronounced in either the left or the right hemisphere. Thus, motor alalia can be accompanied either by disturbed interaction between Broca's and Wernicke's areas of the left hemisphere, or by disturbed interaction between the symmetrical areas of the right hemisphere.

15.
The role of left- and right-hemisphere structures in the formation of speech function and memory was studied through comprehensive examination of children with developmental speech disorders. Based on EEG assessment of the functional state of the brain, the children were classified into two groups according to the side on which changes in electrical activity were localized: those with local changes in the left hemisphere (group I) and those with changes in the right hemisphere (group II). The medical histories suggested that the observed topography of local changes in electrical activity was linked to the character of prenatal and labor complications and their consequences, which lead to disorders in the embryonic and ontogenetic development of different brain regions. Comparison of the neuropsychological findings in the two groups showed that different cortical regions of both the left and the right hemisphere are involved in speech formation. However, a specific role of the right hemisphere in the formation and actualization of automatic speech series was revealed. It is suggested that the integrity of the gnostic functions of the right hemisphere, primarily the spatial organization of perception and movement, is a necessary factor in the development of auditory-speech and nominative memory.

16.
We describe two design strategies that could substantially improve the performance of speech enhancement systems, and present results from a preliminary study of pulse recovery to illustrate their potential benefits. The first strategy is a direct application of a non-linear, adaptive signal-processing approach to the recovery of speech in noise. The second strategy optimizes performance by maximizing the enhancement system's ability to evoke target speech percepts. This approach may lead to better performance because the design is optimized on a measure directly related to the ultimate goal of speech enhancement: accurate communication of the speech percept. In both systems, recently developed "neural network" learning algorithms can be used to determine appropriate parameters for the enhancement processing.
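The first strategy, adaptive recovery of speech in noise, belongs to the family of adaptive filtering. As a minimal illustration from that family (a classic LMS noise canceller in Python, not the authors' system), assuming a reference channel that picks up noise only:

import numpy as np

def lms_cancel(primary, noise_ref, n_taps=32, mu=0.01):
    # Learn a filter that maps the noise reference onto the noise in the
    # primary (speech + noise) channel; the residual error is the enhanced
    # speech estimate. mu is the step size and must be small for stability.
    w = np.zeros(n_taps)
    out = np.zeros(len(primary))
    for n in range(n_taps, len(primary)):
        x = noise_ref[n - n_taps:n][::-1]  # most-recent-first tap vector
        y = w @ x                          # current estimate of the noise
        e = primary[n] - y                 # error signal = speech estimate
        w += 2.0 * mu * e * x              # LMS weight update
        out[n] = e
    return out

The second strategy would instead tune such parameters against a perceptual objective (evoking the correct speech percept) rather than a waveform error.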

17.
Mutations of FOXP2 are associated with altered brain structure, including the striatal part of the basal ganglia, and cause a severe speech and language disorder. Songbirds serve as a tractable neurobiological model for speech and language research. Experimental downregulation of FoxP2 in zebra finch Area X, a nucleus of the striatal song-control circuitry, affects synaptic transmission and spine densities; it also renders song learning and production inaccurate and imprecise, similar to the speech impairment of patients carrying FOXP2 mutations. Here we show that experimental downregulation of FoxP2 in Area X using lentiviral vectors leads to reduced expression of CNTNAP2, a FOXP2 target gene in humans. In addition, natural downregulation of FoxP2 by age or by singing also downregulated CNTNAP2 expression. Furthermore, we report that FoxP2 binds to and activates the avian CNTNAP2 promoter in vitro. Taken together, these data establish CNTNAP2 as a direct FoxP2 target gene in songbirds, likely affecting synaptic function relevant to song learning and song maintenance.

18.
Convergence between cells that differ in both spatial and temporal properties creates higher-order neurons with response properties distinctly different from those of the input neurons. The spatial properties of target neurons are not necessarily cosine-tuned. In addition, unlike the independence between spatial and temporal properties in cosine-tuned afferent neurons, higher-order target cells generally exhibit a dependence of temporal dynamics on spatial properties. The response properties of target neurons receiving spatio-temporal convergence (STC) from tonic and phasic-tonic or phasic afferents are investigated here by considering a general case where the dynamic input is represented by a fractional, leaky, derivative transfer function. It is shown that, at frequencies below the corner frequency of the dynamic input, the temporal properties of target neurons can be described by leaky differentiators whose time constants are a function of spatial direction. Thus, STC target neurons exhibit tonic temporal response properties during stimulation along some spatial directions (having small time constants) and phasic properties along other directions (having large time constants). Specifically, target neurons encode the complete derivative of the stimulus along certain spatial directions. STC thus acts as a directionally specific high-pass filter and produces complete derivatives from fractional, leaky-derivative afferent signals. In addition, spatio-temporal transformations can generate novel temporal dynamics in the central nervous system. These observations suggest that spatio-temporal computations might constitute an alternative to parallel, independent spatial and temporal channels.
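The direction-dependent time constant can be seen by summing a tonic and a phasic cosine-tuned input; a first-order sketch in our own notation (the paper treats the more general fractional, leaky case):

    R(\theta, s) = g_t \cos(\theta - \theta_t) + g_p \cos(\theta - \theta_p)\,\frac{s\tau}{1 + s\tau}

For s\tau \ll 1 this reduces to R \approx g_t \cos(\theta - \theta_t)\,[1 + s\,\tau_{eff}(\theta)] with \tau_{eff}(\theta) = \tau\, g_p \cos(\theta - \theta_p) / [g_t \cos(\theta - \theta_t)]: a leaky differentiator whose effective time constant depends on stimulus direction, giving tonic behaviour along some directions and phasic behaviour along others.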

19.
Robust speech-signal representation based on the integration of temporal and place mechanisms
Traditional extraction of spectral features from speech is based on FFT energy-spectrum analysis, which in noisy conditions treats the spectral components of the noise and of the speech signal on an equal footing: noise components receive the same weight as speech components. In a noisy environment, this approach clearly lets the noise mask components of the speech signal. In the auditory system, this style of processing corresponds to the frequency-analysis function of the cochlear filters, that is, the place mechanism. In reality, however, the auditory system does not treat noise and periodic signals equally: it is sensitive to periodic signals and insensitive to noise, and auditory nerve fibers encode the stimulus through the periodic intervals of their spike discharges, corresponding to the temporal coding mechanism of auditory processing. Building on these two mechanisms, this paper proposes a method that integrates the place and temporal mechanisms, which is precisely how the auditory system processes stimuli. The method combines the advantages of both mechanisms and can effectively detect speech signals in noisy environments.
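A toy illustration of the two coding principles (our own Python sketch, not the paper's algorithm): an FFT magnitude spectrum weights noise and periodic components alike, whereas an autocorrelation-based periodicity measure responds selectively to the periodic component:

import numpy as np

fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(1)
x = np.sin(2 * np.pi * 200 * t) + rng.standard_normal(fs)  # 200-Hz "voiced" tone in noise

# Place-style code: the FFT energy spectrum treats every bin alike
spectrum = np.abs(np.fft.rfft(x))

# Temporal-style code: the autocorrelation peaks at the tone's 40-sample period
ac = np.correlate(x, x, mode="full")[len(x) - 1:]
ac /= ac[0]
lag = np.argmax(ac[20:61]) + 20  # search lags corresponding to 133-400 Hz
print(f"detected period: {lag} samples ({fs / lag:.0f} Hz)")

The periodicity estimate degrades more gracefully with added noise than any single spectral bin does, which is the intuition behind integrating the two representations.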

20.
Models of speech production typically assume that control over the timing of speech movements is governed by the selection of higher-level linguistic units, such as segments or syllables. This study used real-time magnetic resonance imaging of the vocal tract to investigate the anticipatory movements speakers make prior to producing a vocal response. Two factors were varied: preparation (whether or not speakers had foreknowledge of the target response) and pre-response constraint (whether or not speakers were required to maintain a specific vocal tract posture prior to the response). In prepared responses, many speakers produced pre-response anticipatory movements with a variety of articulators, showing that speech movements can be readily dissociated from higher-level linguistic units. Substantial variation was observed across speakers in the articulators used for anticipatory posturing and the contexts in which anticipatory movements occurred. These findings have important consequences for models of speech production and for our understanding of the normal range of variation in anticipatory speech behaviors.
