The processing of continuous and complex auditory signals such as speech relies on the ability to use statistical cues (e.g. transitional probabilities). In this study, participants heard short auditory sequences composed either of Italian syllables or bird songs and completed a regularity-rating task. Behaviorally, participants were better at differentiating between levels of regularity in the syllable sequences than in the bird song sequences. Inter-individual differences in sensitivity to regularity for speech stimuli were correlated with variations in surface-based cortical thickness (CT). These correlations were found in several cortical areas including regions previously associated with statistical structure processing (e.g. bilateral superior temporal sulcus, left precentral sulcus and inferior frontal gyrus), as well as other regions (e.g. left insula, bilateral superior frontal gyrus/sulcus and supramarginal gyrus). In all regions, this correlation was positive, suggesting that thicker cortex is related to higher sensitivity to variations in the statistical structure of auditory sequences. Overall, these results suggest that inter-individual differences in CT within a distributed network of cortical regions involved in statistical structure processing, attention and memory are predictive of the ability to detect statistical structure in auditory speech sequences.
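As an illustration of the statistical cue named in this abstract, the sketch below estimates transitional probabilities, P(next item | current item), from adjacent pairs in a sequence. The function name and the syllable stream are hypothetical and are not taken from the study's stimuli or analysis.

```python
from collections import Counter

def transitional_probabilities(sequence):
    """Estimate P(next | current) for every adjacent pair in a sequence."""
    pair_counts = Counter(zip(sequence, sequence[1:]))
    # Count each item only at positions where it has a successor.
    first_counts = Counter(sequence[:-1])
    return {
        (a, b): count / first_counts[a]
        for (a, b), count in pair_counts.items()
    }

# Hypothetical syllable stream (an assumption for illustration only).
stream = ["pa", "ti", "ku", "pa", "ti", "go", "pa", "ti", "ku"]
tp = transitional_probabilities(stream)
# "pa" is always followed by "ti", so that transition is fully predictable,
# whereas "ti" is followed by "ku" in two of its three occurrences.
```

A highly regular sequence yields transitional probabilities near 1 for most pairs; a less regular one spreads probability across many possible successors.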

Ranging, the ability to judge the distance to a sound source, depends on the presence of predictable patterns of attenuation. We measured long-range sound propagation in coastal waters to assess whether humpback whales might use frequency degradation cues to range singing whales. Two types of neural networks, a multi-layer and a single-layer perceptron, were trained to classify recorded sounds by distance traveled based on their frequency content. The multi-layer network successfully classified received sounds, demonstrating that the distorting effects of underwater propagation on frequency content provide sufficient cues to estimate source distance. Normalizing received sounds with respect to ambient noise levels increased the accuracy of distance estimates by single-layer perceptrons, indicating that familiarity with background noise can potentially improve a listening whale's ability to range. To assess whether frequency patterns predictive of source distance were likely to be perceived by whales, recordings were pre-processed using a computational model of the humpback whale's peripheral auditory system. Although signals processed with this model contained less information than the original recordings, neural networks trained with these physiologically based representations estimated source distance more accurately, suggesting that listening whales should be able to range singers using distance-dependent changes in frequency content.

Nasir SM, Ostry DJ. Current Biology: CB, 2006, 16(19): 1918-1923
Speech production is dependent on both auditory and somatosensory feedback. Although audition may appear to be the dominant sensory modality in speech production, somatosensory information plays a role that extends from brainstem responses to cortical control. Accordingly, the motor commands that underlie speech movements may have somatosensory as well as auditory goals. Here we provide evidence that, independent of the acoustics, somatosensory information is central to achieving the precision requirements of speech movements. We were able to dissociate auditory and somatosensory feedback by using a robotic device that altered the jaw's motion path, and hence proprioception, without affecting speech acoustics. The loads were designed to target either the consonant- or vowel-related portion of an utterance because these are the major sound categories in speech. We found that, even in the absence of any effect on the acoustics, with learning, subjects corrected to an equal extent for both kinds of loads. This finding suggests that there are comparable somatosensory precision requirements for both kinds of speech sounds. We provide experimental evidence that the neural control of stiffness or impedance (the resistance to displacement) provides for somatosensory precision in speech production.

The coding of complex sounds in the early auditory system has a 'standard model' based on the known physiology of the cochlea and main brainstem pathways. This model accounts for a wide range of perceptual capabilities. It is generally accepted that high cortical areas encode abstract qualities such as spatial location or speech sound identity. Between the early and late auditory system, the role of primary auditory cortex (A1) is still debated. A1 is clearly much more than a 'whiteboard' of acoustic information: neurons in A1 have complex response properties, showing sensitivity to both low-level and high-level features of sounds.

A central goal in auditory neuroscience is to understand the neural coding of species-specific communication and human speech sounds. Low-rate repetitive sounds are elemental features of communication sounds, and core auditory cortical regions have been implicated in processing these information-bearing elements. Repetitive sounds could be encoded by at least three neural response properties: 1) the event-locked spike-timing precision, 2) the mean firing rate, and 3) the interspike interval (ISI). To determine how well these response aspects capture information about the repetition rate stimulus, we measured local group responses of cortical neurons in cat anterior auditory field (AAF) to click trains and calculated their mutual information based on these different codes. ISIs of the multiunit responses carried substantially higher information about low repetition rates than either spike-timing precision or firing rate. Combining firing rate and ISI codes was synergistic and captured modestly more repetition information. Spatial distribution analyses showed distinct local clustering properties for each encoding scheme for repetition information indicative of a place code. Diversity in local processing emphasis and distribution of different repetition rate codes across AAF may give rise to concurrent feed-forward processing streams that contribute differently to higher-order sound analysis.

Luo H, Poeppel D. Neuron, 2007, 54(6): 1001-1010
How natural speech is represented in the auditory cortex constitutes a major challenge for cognitive neuroscience. Although many single-unit and neuroimaging studies have yielded valuable insights about the processing of speech and matched complex sounds, the mechanisms underlying the analysis of speech dynamics in human auditory cortex remain largely unknown. Here, we show that the phase pattern of theta band (4-8 Hz) responses recorded from human auditory cortex with magnetoencephalography (MEG) reliably tracks and discriminates spoken sentences and that this discrimination ability is correlated with speech intelligibility. The findings suggest that an approximately 200 ms temporal window (period of theta oscillation) segments the incoming speech signal, resetting and sliding to track speech dynamics. This hypothesized mechanism for cortical speech analysis is based on the stimulus-induced modulation of inherent cortical rhythms and provides further evidence implicating the syllable as a computational primitive for the representation of spoken language.
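The link between the 4-8 Hz theta band and the roughly 200 ms window mentioned in this abstract follows from the period of an oscillation being the reciprocal of its frequency. The arithmetic below is a reader's check, not a calculation taken from the paper:

```latex
T = \frac{1}{f}, \qquad
T_{4\,\mathrm{Hz}} = \frac{1}{4\ \mathrm{s^{-1}}} = 250\ \mathrm{ms}, \qquad
T_{8\,\mathrm{Hz}} = \frac{1}{8\ \mathrm{s^{-1}}} = 125\ \mathrm{ms}, \qquad
T = 200\ \mathrm{ms} \iff f = 5\ \mathrm{Hz}
```

A 200 ms window therefore corresponds to a 5 Hz oscillation, which falls inside the theta band.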

Zimmer U, Macaluso E. Neuron, 2005, 47(6): 893-905
Our brain continuously receives complex combinations of sounds originating from different sources and relating to different events in the external world. Timing differences between the two ears can be used to localize sounds in space, but only when the inputs to the two ears have similar spectrotemporal profiles (high binaural coherence). We used fMRI to investigate any modulation of auditory responses by binaural coherence. We assessed how processing of these cues depends on whether spatial information is task relevant and whether brain activity correlates with subjects' localization performance. We found that activity in Heschl's gyrus increased with increasing coherence, irrespective of whether localization was task relevant. Posterior auditory regions also showed increased activity for high coherence, primarily when sound localization was required and subjects successfully localized sounds. We conclude that binaural coherence cues are processed throughout the auditory cortex and that these cues are used in posterior regions for successful auditory localization.

Functional neuroimaging research provides detailed observations of the response patterns that natural sounds (e.g. human voices and speech, animal cries, environmental sounds) evoke in the human brain. The computational and representational mechanisms underlying these observations, however, remain largely unknown. Here we combine high spatial resolution (3 and 7 Tesla) functional magnetic resonance imaging (fMRI) with computational modeling to reveal how natural sounds are represented in the human brain. We compare competing models of sound representations and select the model that most accurately predicts fMRI response patterns to natural sounds. Our results show that the cortical encoding of natural sounds entails the formation of multiple representations of sound spectrograms with different degrees of spectral and temporal resolution. The cortex derives these multi-resolution representations through frequency-specific neural processing channels and through the combined analysis of the spectral and temporal modulations in the spectrogram. Furthermore, our findings suggest that a spectral-temporal resolution trade-off may govern the modulation tuning of neuronal populations throughout the auditory cortex. Specifically, our fMRI results suggest that neuronal populations in posterior/dorsal auditory regions preferably encode coarse spectral information with high temporal precision. Conversely, neuronal populations in anterior/ventral auditory regions preferably encode fine-grained spectral information with low temporal precision. We propose that such a multi-resolution analysis may be crucially relevant for flexible and behaviorally-relevant sound processing and may constitute one of the computational underpinnings of functional specialization in auditory cortex.

This paper reviews the basic aspects of auditory processing that play a role in the perception of speech. The frequency selectivity of the auditory system, as measured using masking experiments, is described and used to derive the internal representation of the spectrum (the excitation pattern) of speech sounds. The perception of timbre and distinctions in quality between vowels are related to both static and dynamic aspects of the spectra of sounds. The perception of pitch and its role in speech perception are described. Measures of the temporal resolution of the auditory system are described and a model of temporal resolution based on a sliding temporal integrator is outlined. The combined effects of frequency and temporal resolution can be modelled by calculation of the spectro-temporal excitation pattern, which gives good insight into the internal representation of speech sounds. For speech presented in quiet, the resolution of the auditory system in frequency and time usually markedly exceeds the resolution necessary for the identification or discrimination of speech sounds, which partly accounts for the robust nature of speech perception. However, for people with impaired hearing, speech perception is often much less robust.

Evidence that the auditory system contains specialised motion detectors is mixed. Many psychophysical studies confound speed cues with distance and duration cues and present sound sources that do not appear to move in external space. Here we use the ‘discrimination contours’ technique to probe the probabilistic combination of speed, distance and duration for stimuli moving in a horizontal arc around the listener in virtual auditory space. The technique produces a set of motion discrimination thresholds that define a contour in the distance-duration plane for different combinations of the three cues, based on a 3-interval oddity task. The orientation of the contour (typically elliptical in shape) reveals which cue or combination of cues dominates. If the auditory system contains specialised motion detectors, stimuli moving over different distances and durations but defining the same speed should be more difficult to discriminate. The resulting discrimination contours should therefore be oriented obliquely along iso-speed lines within the distance-duration plane. However, we found that over a wide range of speeds, distances and durations, the ellipses aligned with distance-duration axes and were stretched vertically, suggesting that listeners were most sensitive to duration. A second experiment showed that listeners were able to make speed judgements when distance and duration cues were degraded by noise, but that performance was worse. Our results therefore suggest that speed is not a primary cue to motion in the auditory system, but that listeners are able to use speed to make discrimination judgements when distance and duration cues are unreliable.

The present article outlines the contribution of the mismatch negativity (MMN), and its magnetic equivalent MMNm, to our understanding of the perception of speech sounds in the human brain. MMN data indicate that each sound, both speech and non-speech, develops its neural representation corresponding to the percept of this sound in the neurophysiological substrate of auditory sensory memory. The accuracy of this representation, determining the accuracy of the discrimination between different sounds, can be probed with MMN separately for any auditory feature or stimulus type such as phonemes. Furthermore, MMN data show that the perception of phonemes, and probably also of larger linguistic units (syllables and words), is based on language-specific phonetic traces developed in the posterior part of the left-hemisphere auditory cortex. These traces serve as recognition models for the corresponding speech sounds in listening to speech.

Lewald J, Getzmann S. PLoS ONE, 2011, 6(9): e25146
The modulation of brain activity as a function of auditory location was investigated using electro-encephalography in combination with standardized low-resolution brain electromagnetic tomography. Auditory stimuli were presented at various positions under anechoic conditions in free-field space, thus providing the complete set of natural spatial cues. Variation of electrical activity in cortical areas depending on sound location was analyzed by contrasts between sound locations at the time of the N1 and P2 responses of the auditory evoked potential. A clear-cut double dissociation with respect to the cortical locations and the points in time was found, indicating spatial processing (1) in the primary auditory cortex and posterodorsal auditory cortical pathway at the time of the N1, and (2) in the anteroventral pathway regions about 100 ms later at the time of the P2. Thus, it seems as if both auditory pathways are involved in spatial analysis but at different points in time. It is possible that the late processing in the anteroventral auditory network reflected the sharing of this region by analysis of object-feature information and spectral localization cues or even the integration of spatial and non-spatial sound features.

Multisensory integration was once thought to be the domain of brain areas high in the cortical hierarchy, with early sensory cortical fields devoted to unisensory processing of inputs from their given set of sensory receptors. More recently, a wealth of evidence documenting visual and somatosensory responses in auditory cortex, even as early as the primary fields, has changed this view of cortical processing. These multisensory inputs may serve to enhance responses to sounds that are accompanied by other sensory cues, effectively making them easier to hear, but may also act more selectively to shape the receptive field properties of auditory cortical neurons to the location or identity of these events. We discuss the new, converging evidence that multiplexing of neural signals may play a key role in informatively encoding and integrating signals in auditory cortex across multiple sensory modalities. We highlight some of the many open research questions that exist about the neural mechanisms that give rise to multisensory integration in auditory cortex, which should be addressed in future experimental and theoretical studies.

Selective attention is the mechanism that allows focusing one’s attention on a particular stimulus while filtering out a range of other stimuli, for instance, on a single conversation in a noisy room. Attending to one sound source rather than another changes activity in the human auditory cortex, but it is unclear whether attention to different acoustic features, such as voice pitch and speaker location, modulates subcortical activity. Studies using a dichotic listening paradigm indicated that auditory brainstem processing may be modulated by the direction of attention. We investigated whether endogenous selective attention to one of two speech signals affects amplitude and phase locking in auditory brainstem responses when the signals were either discriminable by frequency content alone, or by frequency content and spatial location. Frequency-following responses to the speech sounds were significantly modulated in both conditions. The modulation was specific to the task-relevant frequency band. The effect was stronger when both frequency and spatial information were available. Patterns of response were variable between participants, and were correlated with psychophysical discriminability of the stimuli, suggesting that the modulation was biologically relevant. Our results demonstrate that auditory brainstem responses are susceptible to efferent modulation related to behavioral goals. Furthermore, they suggest that mechanisms of selective attention actively shape activity at early subcortical processing stages according to task relevance and based on frequency and spatial cues.

Immediate early genes (IEGs) are widely used as markers to delineate neuronal circuits because they show fast and transient expression induced by various behavioral paradigms. In this study, we investigated the expression of the IEGs c-fos and Arc in the auditory cortex of the mouse after auditory cued fear conditioning using quantitative polymerase chain reaction and microarray analysis. To test for the specificity of the IEG induction, we included several control groups that allowed us to test for factors other than associative learning to sounds that could lead to an induction of IEGs. We found that both c-fos and Arc showed strong and robust induction after auditory fear conditioning. However, we also observed increased expression of both genes in any control paradigm that involved shocks, even when no sounds were presented. Using mRNA microarrays and comparing the effect of the various behavioral paradigms on mRNA expression levels, we did not find genes being selectively upregulated in the auditory fear conditioned group. In summary, our results indicate that the use of IEGs to identify neuronal circuits involved specifically in processing of sound cues in the fear conditioning paradigm can be limited by the effects of the aversive unconditional stimulus and that activity levels in a particular primary sensory cortical area can be strongly influenced by stimuli mediated by other modalities.



Background

Recent research has addressed the suppression of cortical sensory responses to altered auditory feedback that occurs at the onset of speech utterances. However, there is reason to assume that the mechanisms underlying sensorimotor processing at mid-utterance are different than those involved in sensorimotor control at utterance onset. The present study attempted to examine the dynamics of event-related potentials (ERPs) to different acoustic versions of auditory feedback at mid-utterance.

Methodology/Principal findings

Subjects produced a vowel sound while hearing their pitch-shifted voice (100 cents), a sum of their vocalization and pure tones, or a sum of their vocalization and white noise at mid-utterance via headphones. Subjects also passively listened to playback of what they heard during active vocalization. Cortical ERPs were recorded in response to different acoustic versions of feedback changes during both active vocalization and passive listening. The results showed that, relative to passive listening, active vocalization yielded enhanced P2 responses to the 100 cents pitch shifts, whereas suppression effects of P2 responses were observed when voice auditory feedback was distorted by pure tones or white noise.


Conclusions/Significance

The present findings, for the first time, demonstrate a dynamic modulation of cortical activity as a function of the quality of acoustic feedback at mid-utterance, suggesting that auditory cortical responses can be enhanced or suppressed to distinguish self-produced speech from externally-produced sounds.
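For readers unfamiliar with the unit, the 100-cent pitch shift used in this study can be turned into a frequency ratio with the standard definition of the cent (1200 cents per octave). The sketch below is illustrative only: the function names and the 200 Hz voice pitch are assumptions, not details of the study's apparatus.

```python
def cents_to_ratio(cents):
    """Frequency ratio for a pitch interval in cents (1200 cents = 1 octave)."""
    return 2.0 ** (cents / 1200.0)

def shift_frequency(f0_hz, cents):
    """Frequency heard after shifting f0_hz by the given number of cents."""
    return f0_hz * cents_to_ratio(cents)

# Hypothetical voice pitch of 200 Hz (an assumption for illustration).
shifted = shift_frequency(200.0, 100)  # about 211.9 Hz, one semitone up
```

A 100-cent shift is one equal-tempered semitone, i.e. a change of about 5.9% in fundamental frequency.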

The measurement of time is fundamental to the perception of complex, temporally structured acoustic signals such as speech and music, yet the mechanisms of temporal sensitivity in the auditory system remain largely unknown. Recently, temporal feature detectors have been discovered in several vertebrate auditory systems. For example, midbrain neurons in the fish Pollimyrus are activated by specific rhythms contained in the simple sounds they use for communication. This poses the significant challenge of uncovering the neuro-computational mechanisms that underlie temporal feature detection. Here we describe a model network that responds selectively to temporal features of communication sounds, yielding temporal selectivity in output neurons that matches the selectivity functions found in the auditory system of Pollimyrus. The output of the network depends upon the timing of excitatory and inhibitory input and post-inhibitory rebound excitation. Interval tuning is achieved in a behaviorally relevant range (10 to 40 ms) using a biologically constrained model, providing a simple mechanism that is suitable for the neural extraction of the relatively long duration temporal cues (i.e. tens to hundreds of ms) that are important in animal communication and human speech.

Models of auditory processing, particularly of speech, face many difficulties. These include variability among speakers, variability in speech rate, and robustness to moderate distortions such as time compression. We constructed a system based on ensembles of feature detectors derived from fragments of an onset-sensitive sound representation. This method is based on the idea of ‘spectro-temporal response fields’ and uses convolution to measure the degree of similarity through time between the feature detectors and the stimulus. The output from the ensemble was used to derive segmentation cues and patterns of response, which were used to train an artificial neural network (ANN) classifier. This allowed us to estimate a lower bound for the mutual information between the class of the input and the class of the output. Our results suggest that there is significant information in the output of our system, and that this is robust with respect to the exact choice of feature set, time compression in the stimulus, and speaker variation. In addition, the robustness to time compression in the stimulus has features in common with human psychophysics. Similar experiments using feature detectors derived from fragments of non-speech sounds performed less well. This result is interesting in the light of results showing aberrant cortical development in animals exposed to impoverished auditory environments during the developmental phase.
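The idea of measuring similarity through time between a feature detector and a stimulus can be sketched as a sliding dot product. This is a deliberately simplified, one-dimensional illustration: the abstract describes detectors built from spectro-temporal fragments, and the function name and toy signals below are assumptions, not the authors' implementation. A sliding dot product is equivalent to convolution with the time-reversed detector.

```python
def sliding_similarity(stimulus, detector):
    """Dot product of the detector with each same-length stimulus window."""
    n, m = len(stimulus), len(detector)
    return [
        sum(stimulus[t + k] * detector[k] for k in range(m))
        for t in range(n - m + 1)
    ]

# Toy 1-D signals (hypothetical): the detector pattern occurs twice.
stimulus = [0, 0, 1, 2, 1, 0, 0, 1, 2, 1]
detector = [1, 2, 1]

response = sliding_similarity(stimulus, detector)
# Index of the first window where the detector matches best.
best = max(range(len(response)), key=response.__getitem__)
```

Peaks in the response mark the times at which the stimulus most resembles the detector; in an ensemble, each detector would produce its own response trace.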



The well-established left hemisphere specialisation for language processing has long been claimed to be based on a low-level auditory specialization for specific acoustic features in speech, particularly regarding ‘rapid temporal processing’.


A novel analysis/synthesis technique was used to construct a variety of sounds based on simple sentences which could be manipulated in spectro-temporal complexity, and whether they were intelligible or not. All sounds consisted of two noise-excited spectral prominences (based on the lower two formants in the original speech) which could be static or varying in frequency and/or amplitude independently. Dynamically varying both acoustic features based on the same sentence led to intelligible speech but when either or both acoustic features were static, the stimuli were not intelligible. Using the frequency dynamics from one sentence with the amplitude dynamics of another led to unintelligible sounds of comparable spectro-temporal complexity to the intelligible ones. Positron emission tomography (PET) was used to compare which brain regions were active when participants listened to the different sounds.


Neural activity to spectral and amplitude modulations sufficient to support speech intelligibility (without actually being intelligible) was seen bilaterally, with a right temporal lobe dominance. A left dominant response was seen only to intelligible sounds. It thus appears that the left hemisphere specialisation for speech is based on the linguistic properties of utterances, not on particular acoustic features.
