Similar Documents
20 similar documents retrieved.
1.

Background

Recent research has addressed the suppression of cortical sensory responses to altered auditory feedback that occurs at the onset of a speech utterance. However, there is reason to assume that the mechanisms underlying sensorimotor processing at mid-utterance differ from those involved in sensorimotor control at utterance onset. The present study examined the dynamics of event-related potentials (ERPs) to different acoustic versions of auditory feedback at mid-utterance.

Methodology/Principal Findings

Subjects produced a vowel sound while hearing their pitch-shifted voice (100 cents), a sum of their vocalization and pure tones, or a sum of their vocalization and white noise at mid-utterance via headphones. Subjects also passively listened to playback of what they heard during active vocalization. Cortical ERPs were recorded in response to different acoustic versions of feedback changes during both active vocalization and passive listening. The results showed that, relative to passive listening, active vocalization yielded enhanced P2 responses to the 100 cents pitch shifts, whereas suppression effects of P2 responses were observed when voice auditory feedback was distorted by pure tones or white noise.
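For concreteness, the 100-cent shift used here corresponds to one equal-tempered semitone, i.e., a frequency ratio of 2^(100/1200) ≈ 1.06. Below is a minimal sketch of that conversion; the helper name and the example fundamental frequency are illustrative assumptions, not values taken from the study.

```python
import numpy as np

def cents_to_ratio(cents: float) -> float:
    """Convert a pitch shift in cents to a multiplicative frequency ratio."""
    return 2.0 ** (cents / 1200.0)

# A 100-cent (one semitone) upward shift scales every frequency component
# of the fed-back voice by about 5.95%.
ratio = cents_to_ratio(100.0)        # ~1.0595
shifted_f0 = 220.0 * ratio           # e.g. a 220 Hz voice fundamental -> ~233.1 Hz
print(f"shift ratio = {ratio:.4f}, 220 Hz -> {shifted_f0:.1f} Hz")
```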

Conclusion/Significance

The present findings, for the first time, demonstrate a dynamic modulation of cortical activity as a function of the quality of acoustic feedback at mid-utterance, suggesting that auditory cortical responses can be enhanced or suppressed to distinguish self-produced speech from externally-produced sounds.

2.
3.
An artificial neural network that uses anatomical and physiological findings on the afferent pathway from the ear to the cortex is presented, and the roles of the constituent functions in recognition of continuous speech are examined. The network deals with successive spectra of speech sounds by a cascade of several neural layers: a lateral excitation layer (LEL), a lateral inhibition layer (LIL), and a pile of feature detection layers (FDLs). These layers are shown to be effective for recognizing spoken words. Namely, first, the LEL reduces the distortion of the sound spectrum caused by the pitch of speech sounds. Next, the LIL emphasizes the major energy peaks of the sound spectrum, the formants. Last, the FDLs detect syllables and words in successive formants, where two functions, time-delay and strong adaptation, play important roles: time-delay makes it possible to retain the pattern of formant changes for a period in order to detect spoken words successively; strong adaptation contributes to removing the time-warp of formant changes. Digital computer simulations show that the network detects isolated syllables, isolated words, and connected words in continuous speech, while reproducing the fundamental responses found in the auditory system, such as ON, OFF, ON-OFF, and SUSTAINED patterns.
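The lateral-inhibition stage described above can be pictured as a difference-of-Gaussians across frequency channels. The sketch below is a modern NumPy illustration of that idea with arbitrary kernel widths and a toy spectrum; it is not the authors' original implementation.

```python
import numpy as np

def gaussian_kernel(width: float, size: int = 31) -> np.ndarray:
    x = np.arange(size) - size // 2
    k = np.exp(-0.5 * (x / width) ** 2)
    return k / k.sum()

def lateral_inhibition(spectrum: np.ndarray,
                       excite_width: float = 1.5,
                       inhibit_width: float = 6.0) -> np.ndarray:
    """Narrow excitation minus broad inhibition across frequency channels:
    a difference-of-Gaussians that sharpens local energy peaks (formants)."""
    excite = np.convolve(spectrum, gaussian_kernel(excite_width), mode="same")
    inhibit = np.convolve(spectrum, gaussian_kernel(inhibit_width), mode="same")
    return np.maximum(excite - inhibit, 0.0)   # half-wave rectify, like a firing rate

# Toy spectrum: two formant-like peaks on a flat pedestal across 128 channels.
channels = np.arange(128)
spectrum = (np.exp(-0.5 * ((channels - 30) / 4) ** 2)
            + 0.8 * np.exp(-0.5 * ((channels - 80) / 5) ** 2)
            + 0.3)
print(lateral_inhibition(spectrum).argmax())   # channel near the strongest peak (~30)
```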

4.
Pulse-resonance sounds play an important role in animal communication and auditory object recognition, yet very little is known about the cortical representation of this class of sounds. In this study we shine light on one simple aspect: how well the firing rate of cortical neurons resolves the resonant (“formant”) frequencies of vowel-like pulse-resonance sounds. We recorded neural responses in the primary auditory cortex (A1) of anesthetized rats to two-formant pulse-resonance sounds, and estimated their formant resolving power using a statistical kernel smoothing method that takes into account the natural variability of cortical responses. While formant-tuning functions were diverse in structure across different penetrations, most were sensitive to changes in formant frequency, with a frequency resolution comparable to that reported for rat cochlear filters.
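As a rough picture of the kernel smoothing step, a formant-tuning function can be estimated by Nadaraya-Watson regression of trial-wise firing rates on formant frequency. The bandwidth and the simulated data below are illustrative assumptions, not the values or data used in the study.

```python
import numpy as np

def kernel_smooth(f_probe, f_stim, rates, bandwidth=200.0):
    """Nadaraya-Watson estimate of the mean firing rate at formant frequency
    f_probe (Hz), given trial-wise formant frequencies f_stim and spike rates."""
    w = np.exp(-0.5 * ((f_probe - f_stim) / bandwidth) ** 2)
    return np.sum(w * rates) / np.sum(w)

# Invented data: 400 trials whose F2 values tile 500-2500 Hz, with a unit tuned near 1200 Hz.
rng = np.random.default_rng(0)
f_stim = rng.uniform(500, 2500, 400)
rates = 20 * np.exp(-0.5 * ((f_stim - 1200) / 300) ** 2) + rng.poisson(2.0, 400)
probe = np.arange(500, 2501, 50)
tuning = np.array([kernel_smooth(f, f_stim, rates) for f in probe])
print(probe[tuning.argmax()])   # best formant frequency of this simulated unit, ~1200 Hz
```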

5.
Wang XD, Gu F, He K, Chen LH, Chen L. PLoS One. 2012;7(1):e30027.

Background

Extraction of linguistically relevant auditory features is critical for speech comprehension in complex auditory environments, in which the relationships between acoustic stimuli are often abstract and constant while the stimuli per se are varying. These relationships are referred to as the abstract auditory rule in speech and have been investigated for their underlying neural mechanisms at an attentive stage. However, whether there is a sensory intelligence that enables one to automatically encode abstract auditory rules in speech at a preattentive stage has not yet been thoroughly addressed.

Methodology/Principal Findings

We chose Chinese lexical tones for the current study because they help to define word meaning and hence facilitate the construction of an abstract auditory rule in a speech sound stream. We continuously presented native Chinese speakers with Chinese vowels differing in formant structure, intensity, and pitch level to construct a complex and varying auditory stream. In this stream, most of the sounds shared flat lexical tones to form an embedded abstract auditory rule. Occasionally the rule was randomly violated by sounds with a rising or falling lexical tone. The results showed that the violation of the abstract auditory rule of lexical tones evoked a robust preattentive auditory response, as revealed by whole-head electrical recordings of the mismatch negativity (MMN), even though none of the subjects acquired explicit knowledge of the rule or became aware of the violation.
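The MMN referred to here is conventionally quantified as the deviant-minus-standard difference waveform. A minimal sketch with invented single-channel epochs follows; the sampling rate, trial counts, and latencies are illustrative only.

```python
import numpy as np

def mismatch_negativity(standard_epochs: np.ndarray,
                        deviant_epochs: np.ndarray) -> np.ndarray:
    """Deviant-minus-standard difference wave (epochs are trials x samples).
    The MMN appears as a frontocentral negativity roughly 100-250 ms after deviance onset."""
    return deviant_epochs.mean(axis=0) - standard_epochs.mean(axis=0)

# Invented single-channel epochs: 0.5 s at 500 Hz sampling, 200 standards vs 40 deviants.
rng = np.random.default_rng(1)
t = np.arange(0, 0.5, 1 / 500)
standard = rng.normal(0, 1, (200, t.size))
deviant = rng.normal(0, 1, (40, t.size)) - 2.0 * np.exp(-0.5 * ((t - 0.17) / 0.03) ** 2)
mmn = mismatch_negativity(standard, deviant)
print(t[mmn.argmin()])   # latency of the most negative point, ~0.17 s in this toy example
```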

Conclusions/Significance

Our results demonstrate that there is an auditory sensory intelligence in the perception of Chinese lexical tones. The existence of this intelligence suggests that humans can automatically extract abstract auditory rules in speech at a preattentive stage to ensure speech communication in complex and noisy auditory environments without drawing on conscious resources.

6.
Frequency modulation (FM) is a basic constituent of vocalisation in many animals as well as in humans. In human speech, short rising and falling FM-sweeps of around 50 ms duration, called formant transitions, characterise individual speech sounds. There are two representations of FM in the ascending auditory pathway: a spectral representation, holding the instantaneous frequency of the stimuli; and a sweep representation, consisting of neurons that respond selectively to FM direction. To date, computational models have used feedforward mechanisms to explain FM encoding. However, from neuroanatomy we know that there are massive feedback projections in the auditory pathway. Here, we found that a classical FM-sweep perceptual effect, the sweep pitch shift, cannot be explained by standard feedforward processing models. We hypothesised that the sweep pitch shift is caused by a predictive feedback mechanism. To test this hypothesis, we developed a novel model of FM encoding incorporating a predictive interaction between the sweep and the spectral representations. The model was designed to encode sweeps of the duration, modulation rate, and modulation shape of formant transitions. It fully accounted for experimental data that we acquired in a perceptual experiment with human participants, as well as for previously published experimental results. We also designed a new class of stimuli for a second perceptual experiment to further validate the model. Combined, our results indicate that predictive interaction between the frequency-encoding and direction-encoding neural representations plays an important role in the neural processing of FM. In the brain, this mechanism is likely to occur at early stages of the processing hierarchy.

7.
An increasing number of neuroscience papers capitalize on the assumption, published in this journal, that visual speech is typically 150 ms ahead of auditory speech. However, the estimate of audiovisual asynchrony in the reference paper is valid only in very specific cases: for isolated consonant-vowel syllables or at the beginning of a speech utterance, in what we call “preparatory gestures”. When syllables are chained in sequences, as they typically are in most parts of a natural speech utterance, asynchrony should be defined in a different way. This is what we call “comodulatory gestures”, which provide auditory and visual events more or less in synchrony. We provide audiovisual data on sequences of plosive-vowel syllables (pa, ta, ka, ba, da, ga, ma, na) showing that audiovisual synchrony is actually rather precise, varying between 20 ms audio lead and 70 ms audio lag. We show how more complex speech material should result in a range typically varying between 40 ms audio lead and 200 ms audio lag, and we discuss how this natural coordination is reflected in the so-called temporal integration window for audiovisual speech perception. Finally, we present a toy model of auditory and audiovisual predictive coding, showing that visual lead is actually not necessary for visual prediction.
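One generic way to obtain audio lead/lag figures of this kind is to cross-correlate an acoustic envelope with a lip-aperture trace and read off the lag of the correlation peak. The sketch below illustrates that idea with invented signals; it is not the authors' measurement procedure, and the sign convention and sampling rate are assumptions.

```python
import numpy as np

def av_lag_ms(audio_env: np.ndarray, lip_aperture: np.ndarray, fs: float) -> float:
    """Lag (ms) at which the lip-aperture trace best aligns with the audio envelope.
    Positive values mean the visual signal leads the audio."""
    a = (audio_env - audio_env.mean()) / audio_env.std()
    v = (lip_aperture - lip_aperture.mean()) / lip_aperture.std()
    xcorr = np.correlate(a, v, mode="full")
    lags = np.arange(-len(v) + 1, len(a))
    return 1000.0 * lags[xcorr.argmax()] / fs

# Invented example at 1 kHz: the lip-opening peak precedes the acoustic burst by 50 ms.
fs = 1000.0
t = np.arange(0, 1, 1 / fs)
lip = np.exp(-0.5 * ((t - 0.40) / 0.05) ** 2)
env = np.exp(-0.5 * ((t - 0.45) / 0.05) ** 2)
print(av_lag_ms(env, lip, fs))   # ~ +50.0, i.e. 50 ms of visual lead
```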

8.
Nasir SM, Ostry DJ. Current Biology. 2006;16(19):1918-1923.
Speech production is dependent on both auditory and somatosensory feedback. Although audition may appear to be the dominant sensory modality in speech production, somatosensory information plays a role that extends from brainstem responses to cortical control. Accordingly, the motor commands that underlie speech movements may have somatosensory as well as auditory goals. Here we provide evidence that, independent of the acoustics, somatosensory information is central to achieving the precision requirements of speech movements. We were able to dissociate auditory and somatosensory feedback by using a robotic device that altered the jaw's motion path, and hence proprioception, without affecting speech acoustics. The loads were designed to target either the consonant- or vowel-related portion of an utterance because these are the major sound categories in speech. We found that, even in the absence of any effect on the acoustics, with learning, subjects corrected to an equal extent for both kinds of loads. This finding suggests that there are comparable somatosensory precision requirements for both kinds of speech sounds. We provide experimental evidence that the neural control of stiffness or impedance (the resistance to displacement) provides for somatosensory precision in speech production.

9.
The auditory responsiveness of a number of neurones in the meso- and metathoracic ganglia of the locust, Locusta migratoria, was found to change systematically during concomitant wind stimulation. Changes in responsiveness were of three kinds: a suppression of the response to low frequency sound (5 kHz), but an unchanged or increased response to high frequency (12 kHz) sound; an increased response to all sound; a decrease in the excitatory, and an increase in the inhibitory, components of a response to sound. Suppression of the response to low frequency sound was mediated by wind, rather than by the flight motor. Wind stimulation caused an increase in membrane conductance and concomitant depolarization in recorded neurones. Wind stimulation potentiated the spike response to a given depolarizing current, and the spike response to a high frequency sound, by about the same amount. The strongest wind-related input to interneuron 714 was via the metathoracic N6, which carries the axons of auditory receptors from the ear. The EPSP evoked in central neurones by electrical stimulation of metathoracic N6 was suppressed by wind stimulation, and by low frequency (5 kHz), but not high frequency (10 kHz), sound. This suppression disappeared when N6 was cut distally to the stimulating electrodes. Responses to low frequency (5 kHz), rather than high frequency (12 kHz), sounds could be suppressed by a second low frequency tone with an intensity above 50-55 dB SPL for a 5 kHz suppressing tone. Suppression of the electrically-evoked EPSP in neurone 714 was greatest at those sound frequencies represented maximally in the spectrum of the locust's wingbeat. It is concluded that the acoustic components of a wind stimulus are able to mediate both inhibition and excitation in the auditory pathway. By suppressing the responses to low frequency sounds, wind stimulation would effectively shift the frequency-response characteristics of central auditory neurones during flight.

10.
Sayles M, Winter IM. Neuron. 2008;58(5):789-801.
Accurate neural coding of the pitch of complex sounds is an essential part of auditory scene analysis; differences in pitch help segregate concurrent sounds, while similarities in pitch can help group sounds from a common source. In quiet, nonreverberant backgrounds, pitch can be derived from timing information in broadband high-frequency auditory channels and/or from frequency and timing information carried in narrowband low-frequency auditory channels. Recording from single neurons in the cochlear nucleus of anesthetized guinea pigs, we show that the neural representation of pitch based on timing information is severely degraded in the presence of reverberation. This degradation increases with both increasing reverberation strength and channel bandwidth. In a parallel human psychophysical pitch-discrimination task, reverberation impaired the ability to distinguish a high-pass harmonic sound from noise. Together, these findings explain the origin of perceptual difficulties experienced by both normal-hearing and hearing-impaired listeners in reverberant spaces.

11.
The scope of lexical planning, that is, how far ahead speakers plan lexically before they start producing an utterance, is an important issue for research into speech production, but remains highly controversial. The present research investigated this issue using the semantic blocking effect: the widely observed finding that participants take longer to name pictured items aloud when the pictures in a block of trials depict items belonging to the same semantic category than when they depict items from different categories. As this effect is often interpreted as a reflection of difficulty in lexical selection, the current study took the semantic blocking effect and its associated pattern of event-related brain potentials (ERPs) as a proxy to test whether lexical planning during sentence production extends beyond the first noun when a subject noun-phrase includes two nouns, such as “The chair and the boat are both red” and “The chair above the boat is red”. The results showed a semantic blocking effect both in onset latencies and in ERPs during the utterance of the first noun of these complex noun-phrases, but not for the second noun. The indication, therefore, is that the lexical planning scope does not encompass this second noun-phrase. Indeed, the present findings are in line with accounts that propose radically incremental lexical planning, in which speakers plan ahead only one word at a time. This study also provides a highly novel example of using ERPs to examine the production of long utterances, and it is hoped that the present demonstration of the effectiveness of this approach inspires further application of ERP techniques in this area of research.

12.
Pell MD, Kotz SA. PLoS One. 2011;6(11):e27256.
How quickly do listeners recognize emotions from a speaker's voice, and does the time course for recognition vary by emotion type? To address these questions, we adapted the auditory gating paradigm to estimate how much vocal information is needed for listeners to categorize five basic emotions (anger, disgust, fear, sadness, happiness) and neutral utterances produced by male and female speakers of English. Semantically-anomalous pseudo-utterances (e.g., The rivix jolled the silling) conveying each emotion were divided into seven gate intervals according to the number of syllables that listeners heard from sentence onset. Participants (n = 48) judged the emotional meaning of stimuli presented at each gate duration interval, in a successive, blocked presentation format. Analyses looked at how recognition of each emotion evolves as an utterance unfolds and estimated the “identification point” for each emotion. Results showed that anger, sadness, fear, and neutral expressions are recognized more accurately at short gate intervals than happiness, and particularly disgust; however, as speech unfolds, recognition of happiness improves significantly towards the end of the utterance (and fear is recognized more accurately than other emotions). When the gate associated with the emotion identification point of each stimulus was calculated, data indicated that fear (M = 517 ms), sadness (M = 576 ms), and neutral (M = 510 ms) expressions were identified from shorter acoustic events than the other emotions. These data reveal differences in the underlying time course for conscious recognition of basic emotions from vocal expressions, which should be accounted for in studies of emotional speech processing.
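The "identification point" in a gating paradigm is commonly defined as the earliest gate from which a listener's response is correct and remains correct through the last gate. A minimal sketch of that definition follows, with an invented response sequence; this is a generic formulation, not the authors' analysis code.

```python
def identification_gate(responses, target):
    """1-based index of the earliest gate from which every subsequent response
    matches the target emotion, or None if the listener never settles on it."""
    point = None
    for gate, response in enumerate(responses, start=1):
        if response == target:
            if point is None:
                point = gate
        else:
            point = None   # a later error resets the identification point
    return point

# Invented trial: seven gates of one pseudo-utterance as labelled by a listener.
print(identification_gate(
    ["neutral", "fear", "fear", "fear", "fear", "fear", "fear"], "fear"))   # -> 2
```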

13.
Computational and experimental research has revealed that auditory sensory predictions are derived from regularities of the current environment by using internal generative models. However, what has not yet been addressed is how the auditory system handles situations giving rise to redundant or even contradictory predictions derived from different sources of information. To this end, we measured error signals in the event-related brain potentials (ERPs) in response to violations of auditory predictions. Sounds could be predicted on the basis of overall probability, i.e., one sound was presented frequently and another sound rarely. Furthermore, each sound was predicted by an informative visual cue. Participants' task was to use the cue and to discriminate the two sounds as fast as possible. Violations of the probability-based prediction (i.e., a rare sound) as well as violations of the visual-auditory prediction (i.e., an incongruent sound) elicited error signals in the ERPs (Mismatch Negativity [MMN] and Incongruency Response [IR]). The respective error signals were observed even when the overall probability and the visual cue predicted different sounds. That is, the auditory system concurrently maintains and tests contradictory predictions. Moreover, if the same sound was predicted by both sources, we observed an additive error signal (in scalp potential and primary current density) equaling the sum of the specific error signals. Thus, the auditory system maintains and tolerates redundant and contradictory predictions that are represented functionally independently. We argue that the auditory system exploits all currently active regularities in order to optimally prepare for future events.

14.
Heterochronic formation of basic and language-specific speech sounds in the first year of life has been studied in infants from different ethnic groups (Chechens, Russians, and Mongols). Spectral analysis of the frequency, amplitude, and formant characteristics of speech sounds has shown a universal pattern of organization of the basic sound repertoire and “language-specific” sounds in the babbling and prattle of infants from different ethnic groups. Possible mechanisms of the formation of specific speech sounds in early ontogeny are discussed.

15.
Speech is the most interesting and one of the most complex sounds dealt with by the auditory system. The neural representation of speech needs to capture those features of the signal on which the brain depends in language communication. Here we describe the representation of speech in the auditory nerve and in a few sites in the central nervous system from the perspective of the neural coding of important aspects of the signal. The representation is tonotopic, meaning that the speech signal is decomposed by frequency and different frequency components are represented in different populations of neurons. Essential to the representation are the properties of frequency tuning and nonlinear suppression. Tuning creates the decomposition of the signal by frequency, and nonlinear suppression is essential for maintaining the representation across sound levels. The representation changes in central auditory neurons by becoming more robust against changes in stimulus intensity and more transient. However, it is probable that the form of the representation at the auditory cortex is fundamentally different from that at lower levels, in that stimulus features other than the distribution of energy across frequency are analysed.

16.
Formants are important phonetic elements of human speech that are also used by humans and non-human mammals to assess the body size of potential mates and rivals. As a consequence, it has been suggested that formant perception, which is crucial for speech perception, may have evolved through sexual selection. Somewhat surprisingly, though, no previous studies have examined whether sexes differ in their ability to use formants for size evaluation. Here, we investigated whether men and women differ in their ability to use the formant frequency spacing of synthetic vocal stimuli to make auditory size judgements over a wide range of fundamental frequencies (the main determinant of vocal pitch). Our results reveal that men are significantly better than women at comparing the apparent size of stimuli, and that lower pitch improves the ability of both men and women to perform these acoustic size judgements. These findings constitute the first demonstration of a sex difference in formant perception, and lend support to the idea that acoustic size normalization, a crucial prerequisite for speech perception, may have been sexually selected through male competition. We also provide the first evidence that vocalizations with relatively low pitch improve the perception of size-related formant information.
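The size information carried by formant spacing follows from the uniform-tube approximation of the vocal tract, in which the average spacing between adjacent formants ΔF relates to vocal tract length L as L ≈ c/(2ΔF). A minimal sketch of that relation; the speed-of-sound constant is an assumed round value, not a parameter from the study.

```python
SPEED_OF_SOUND = 350.0   # m/s; an assumed round value for warm, humid vocal-tract air

def apparent_vtl_cm(formant_spacing_hz: float) -> float:
    """Apparent vocal tract length (cm) of a uniform-tube talker whose adjacent
    formants are spaced formant_spacing_hz apart on average."""
    return 100.0 * SPEED_OF_SOUND / (2.0 * formant_spacing_hz)

# Narrower formant spacing implies a longer vocal tract, hence a larger apparent talker.
print(apparent_vtl_cm(1000.0))   # ~17.5 cm, in the range of an adult male vocal tract
print(apparent_vtl_cm(1200.0))   # ~14.6 cm, a smaller apparent talker
```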

17.
The dual-route model of speech processing includes a dorsal stream that maps auditory to motor features at the sublexical level rather than at the lexico-semantic level. However, the literature on gesture is an invitation to revise this model because it suggests that the premotor cortex of the dorsal route is a major site of lexico-semantic interaction. Here we investigated lexico-semantic mapping using word-gesture pairs that were either congruent or incongruent. Using fMRI-adaptation in 28 subjects, we found that temporo-parietal and premotor activity during auditory processing of single action words was modulated by the prior audiovisual context in which the words had been repeated. The BOLD signal was suppressed following repetition of the auditory word alone, and further suppressed following repetition of the word accompanied by a congruent gesture (e.g., [“grasp” + grasping gesture]). Conversely, repetition suppression was not observed when the same action word was accompanied by an incongruent gesture (e.g., [“grasp” + sprinkling gesture]). We propose a simple model to explain these results: auditory and visual information converge on the premotor cortex, where they are represented in a comparable format that allows the (in)congruence between speech and gesture to be determined. This ability of the dorsal route to detect audiovisual semantic (in)congruence suggests that its function is not restricted to the sublexical level.

18.
To study how auditory cortical processing is affected by the anticipation and hearing of long emotional sounds, we recorded auditory evoked magnetic fields with a whole-scalp MEG device from 15 healthy adults who were listening to emotional or neutral sounds. Pleasant, unpleasant, or neutral sounds, each lasting 6 s, were played in a random order, preceded by 100-ms cue tones (0.5, 1, or 2 kHz) 2 s before the onset of the sound. The cue tones, indicating the valence of the upcoming emotional sounds, evoked typical transient N100m responses in the auditory cortex. During the rest of the anticipation period (until the beginning of the emotional sound), the auditory cortices of both hemispheres generated slow shifts of the same polarity as the N100m. During anticipation, the relative strengths of the auditory-cortex signals depended on the upcoming sound: towards the end of the anticipation period, the activity became stronger when the subject was anticipating emotional rather than neutral sounds. During the actual emotional and neutral sounds, sustained fields were predominant in the left hemisphere for all sounds. The DC MEG signals measured during both the anticipation and the hearing of emotional sounds imply that, following a cue indicating the valence of the upcoming sound, auditory-cortex activity is modulated by the upcoming sound category during the anticipation period.

19.
The brain activity of a fully awake chimpanzee being presented with her name was investigated. Event-related potentials (ERPs) were measured for each of the following auditory stimuli: the vocal sound of the subject's own name (SON), the vocal sound of a familiar name of another group member, the vocal sound of an unfamiliar name, and a non-vocal sound. Some differences in ERP waveforms were detected between the kinds of stimuli at latencies at which P3 and Nc components are typically observed in humans. Following stimulus onset, an Nc-like negative shift at approximately 500 ms latency was observed, particularly in response to the SON. Such specific ERP patterns suggest that the chimpanzee processes her name differently from other sounds.

20.
Some combinations of musical tones sound pleasing to Western listeners, and are termed consonant, while others sound discordant, and are termed dissonant. The perceptual phenomenon of consonance has been traced to the acoustic property of harmonicity. It has been repeatedly shown that neural correlates of consonance can be found as early as the auditory brainstem, as reflected in the harmonicity of the scalp-recorded frequency-following response (FFR). “Neural Pitch Salience” (NPS) measured from FFRs (essentially a time-domain equivalent of the classic pattern-recognition models of pitch) has been found to correlate with behavioral judgments of consonance for synthetic stimuli. Following the idea that the auditory system has evolved to process behaviorally relevant natural sounds, and in order to test the generalizability of this finding made with synthetic tones, we recorded FFRs for consonant and dissonant intervals composed of synthetic and natural stimuli. We found that NPS correlated with behavioral judgments of consonance and dissonance for synthetic but not for naturalistic sounds. These results suggest that while some form of harmonicity can be computed from the auditory brainstem response, the general percept of consonance and dissonance is not captured by this measure. It might either be represented in the brainstem by a different code (such as a place code) or arise at higher levels of the auditory pathway. Our findings further illustrate the importance of using natural sounds, as a complementary tool to fully controlled synthetic sounds, when probing auditory perception.
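Although the paper's exact NPS computation is not given here, a time-domain pitch-salience measure of this general kind can be sketched as the height of the normalized autocorrelation peak of the FFR within the pitch range. The lag limits and the toy signal below are assumptions for illustration, not the study's parameters.

```python
import numpy as np

def pitch_salience(ffr: np.ndarray, fs: float,
                   fmin: float = 80.0, fmax: float = 500.0) -> float:
    """Peak of the normalized autocorrelation of an FFR waveform within the
    plausible pitch-lag range; values near 1 indicate a strongly periodic response."""
    x = ffr - ffr.mean()
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    acf /= acf[0]                                  # normalize by the lag-0 energy
    lo, hi = int(fs / fmax), int(fs / fmin)
    return float(acf[lo:hi + 1].max())

# Toy FFR: a 220 Hz periodic response buried in noise, sampled at 10 kHz for 200 ms.
fs = 10_000.0
t = np.arange(0, 0.2, 1 / fs)
ffr = np.sin(2 * np.pi * 220 * t) + 0.8 * np.random.default_rng(2).normal(size=t.size)
print(round(pitch_salience(ffr, fs), 2))
```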
