Similar Literature
20 similar documents retrieved (search time: 15 ms)
1.
Speech is the most interesting and one of the most complex sounds dealt with by the auditory system. The neural representation of speech needs to capture those features of the signal on which the brain depends in language communication. Here we describe the representation of speech in the auditory nerve and in a few sites in the central nervous system, from the perspective of the neural coding of important aspects of the signal. The representation is tonotopic, meaning that the speech signal is decomposed by frequency and different frequency components are represented in different populations of neurons. Essential to the representation are the properties of frequency tuning and nonlinear suppression. Tuning creates the decomposition of the signal by frequency, and nonlinear suppression is essential for maintaining the representation across sound levels. In central auditory neurons the representation changes, becoming more transient and more robust to changes in stimulus intensity. However, the form of the representation at the auditory cortex is probably fundamentally different from that at lower levels, in that stimulus features other than the distribution of energy across frequency are analysed.

2.
Pitch perception is important for understanding speech prosody, perceiving music, recognizing tones in tonal languages, and perceiving speech in noisy environments. The two principal pitch perception theories consider the place of maximum neural excitation along the auditory nerve and the temporal pattern of the auditory neurons’ action potentials (spikes) as pitch cues. This paper describes a biophysical mechanism by which fine-structure temporal information can be extracted from the spikes generated at the auditory periphery. Deriving meaningful pitch-related information from spike times requires neural structures specialized in capturing synchronous or correlated activity from amongst neural events. The emergence of such pitch-processing neural mechanisms is described through a computational model of auditory processing. Simulation results show that a correlation-based, unsupervised, spike-based form of Hebbian learning can explain the development of neural structures required for recognizing the pitch of simple and complex tones, with or without the fundamental frequency. The temporal code is robust to variations in the spectral shape of the signal and thus can explain the phenomenon of pitch constancy.
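To make the temporal pitch code concrete, here is a minimal Python sketch (not the authors' Hebbian network; the function name, parameter values, and the idealized phase-locked spike train are all assumptions). It reads pitch off an all-order inter-spike-interval histogram, which, as the abstract notes, works with or without the fundamental component:

```python
import numpy as np

def pitch_from_spikes(spike_times, max_period=0.02, bin_width=5e-4):
    """Hypothetical helper: estimate pitch from auditory-nerve-like spike
    times via an all-order inter-spike-interval histogram. The shortest
    strongly represented interval is taken as the pitch period, which
    survives even when the fundamental component is absent."""
    t = np.sort(np.asarray(spike_times))
    iv = (t[None, :] - t[:, None]).ravel()          # all pairwise intervals
    iv = iv[(iv > 0) & (iv < max_period)]
    edges = np.arange(bin_width / 2, max_period, bin_width)
    hist, _ = np.histogram(iv, bins=edges)
    # shortest interval whose count is close to the global maximum:
    # the period itself rather than one of its integer multiples
    first = np.nonzero(hist >= 0.7 * hist.max())[0][0]
    return 1.0 / (edges[first] + bin_width / 2)

# idealized train phase-locked to a 200 Hz tone, with small jitter
rng = np.random.default_rng(0)
spikes = np.arange(0.0, 1.0, 1 / 200.0) + rng.normal(0, 1e-4, 200)
print(round(pitch_from_spikes(spikes)))             # -> 200
```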

3.
The measurement of time is fundamental to the perception of complex, temporally structured acoustic signals such as speech and music, yet the mechanisms of temporal sensitivity in the auditory system remain largely unknown. Recently, temporal feature detectors have been discovered in several vertebrate auditory systems. For example, midbrain neurons in the fish Pollimyrus are activated by specific rhythms contained in the simple sounds they use for communication. This poses the significant challenge of uncovering the neuro-computational mechanisms that underlie temporal feature detection. Here we describe a model network that responds selectively to temporal features of communication sounds, yielding temporal selectivity in output neurons that matches the selectivity functions found in the auditory system of Pollimyrus. The output of the network depends upon the timing of excitatory and inhibitory input and post-inhibitory rebound excitation. Interval tuning is achieved in a behaviorally relevant range (10 to 40 ms) using a biologically constrained model, providing a simple mechanism that is suitable for the neural extraction of the relatively long duration temporal cues (i.e. tens to hundreds of ms) that are important in animal communication and human speech.
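The rebound mechanism can be caricatured in a few lines. The sketch below is a deliberately simplified stand-in for the paper's biologically constrained network: each pulse delivers brief subthreshold excitation and, after a fixed delay standing in for inhibition followed by post-inhibitory rebound, a second brief depolarization. Only when the second pulse's direct excitation coincides with the first pulse's rebound does the summed drive cross threshold. All parameter values are assumptions:

```python
import numpy as np

def detects_interval(isi_ms, rebound_delay=20.0, tau=2.0, dt=0.05):
    """Toy interval detector: leaky integrator driven by 1-ms direct
    excitation at each pulse plus 1-ms rebound excitation arriving
    `rebound_delay` ms after each pulse. Fires only on coincidence."""
    pulses = [0.0, isi_ms]
    v = 0.0
    for i in range(1, int((isi_ms + 2 * rebound_delay) / dt)):
        t = i * dt
        drive = sum(0.8 for p in pulses if 0.0 <= t - p < 1.0)         # direct
        drive += sum(0.8 for p in pulses
                     if rebound_delay <= t - p < rebound_delay + 1.0)  # rebound
        v += dt * (-v / tau + drive)     # leaky integration (Euler step)
        if v > 0.95:                     # threshold crossing = output spike
            return True
    return False

# interval tuning over the behaviorally relevant 10-40 ms range
print([isi for isi in range(10, 41) if detects_interval(float(isi))])
# prints only intervals near the 20 ms rebound delay
```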

4.
The present article outlines the contribution of the mismatch negativity (MMN), and its magnetic equivalent MMNm, to our understanding of the perception of speech sounds in the human brain. MMN data indicate that each sound, whether speech or non-speech, develops a neural representation corresponding to the percept of that sound in the neurophysiological substrate of auditory sensory memory. The accuracy of this representation, which determines the accuracy of discrimination between different sounds, can be probed with the MMN separately for any auditory feature or stimulus type, such as phonemes. Furthermore, MMN data show that the perception of phonemes, and probably also of larger linguistic units (syllables and words), is based on language-specific phonetic traces developed in the posterior part of the left-hemisphere auditory cortex. These traces serve as recognition models for the corresponding speech sounds when listening to speech.

5.
The past 30 years have seen a remarkable development in our understanding of how the auditory system--particularly the peripheral system--processes complex sounds. Perhaps the most significant advance has been in our understanding of the mechanisms underlying auditory frequency selectivity and their importance for normal and impaired auditory processing. Physiologically vulnerable cochlear filtering can account for many aspects of normal and impaired psychophysical frequency selectivity, with important consequences for the perception of complex sounds. For normal hearing, remarkable mechanisms in the organ of Corti, involving enhancement of mechanical tuning (in mammals probably by feedback of electro-mechanically generated energy from the hair cells), produce exquisite tuning, reflected in the tuning properties of cochlear nerve fibres. Recent comparisons of physiological (cochlear nerve) and psychophysical frequency selectivity in the same species indicate that the ear's overall frequency selectivity can be accounted for by this cochlear filtering, at least in bandwidth terms. Because this cochlear filtering is physiologically vulnerable, it deteriorates under deleterious cochlear conditions--hypoxia, disease, drugs, noise overexposure, mechanical disturbance--and this deterioration is reflected in impaired psychophysical frequency selectivity. This is a fundamental feature of sensorineural hearing loss of cochlear origin, and is of diagnostic value. Cochlear filtering, particularly as reflected in the temporal firing patterns of cochlear fibres to complex sounds, is remarkably robust over a wide range of stimulus levels. Furthermore, cochlear filtering properties are a prime determinant of the 'place' and 'time' coding of frequency at the cochlear nerve level, both of which appear to be involved in pitch perception. The problem of how the place and time coding of complex sounds is effected over the ear's remarkably wide dynamic range is briefly addressed. In the auditory brainstem, particularly the dorsal cochlear nucleus, inhibitory mechanisms are responsible for enhancing the spectral and temporal contrasts in complex sounds; these mechanisms are now being dissected neuropharmacologically. At the cortical level, mechanisms are evident that are capable of abstracting biologically relevant features of complex sounds. Fundamental studies of how the auditory system encodes and processes complex sounds are vital to promising recent applications in the diagnosis and rehabilitation of the hearing impaired.

6.
What do we hear when someone speaks, and what does auditory cortex (AC) do with that sound? Given how meaningful speech is, it might be hypothesized that AC is most active when other people talk, so that their productions get decoded. Here, neuroimaging meta-analyses show the opposite: AC is least active, and sometimes deactivated, when participants listen to meaningful speech compared to less meaningful sounds. Results are explained by an active hypothesis-and-test mechanism whereby speech production (SP) regions are neurally re-used to predict auditory objects associated with available context. By this model, more AC activity for less meaningful sounds occurs because predictions from context are less successful, requiring further hypotheses to be tested. This also explains the large overlap of AC co-activity for less meaningful sounds with meta-analyses of SP. An experiment showed a similar pattern of results for non-verbal context. Specifically, words produced less activity in AC and SP regions when preceded by co-speech gestures that visually described those words, compared to the same words without gestures. Results collectively suggest that what we ‘hear’ during real-world speech perception may come more from the brain than from our ears, and that the function of AC is to confirm or deny internal predictions about the identity of sounds.

7.
How the human auditory system extracts perceptually relevant acoustic features of speech is unknown. To address this question, we used intracranial recordings from nonprimary auditory cortex in the human superior temporal gyrus to determine what acoustic information in speech sounds can be reconstructed from population neural activity. We found that slow and intermediate temporal fluctuations, such as those corresponding to syllable rate, were accurately reconstructed using a linear model based on the auditory spectrogram. However, reconstruction of fast temporal fluctuations, such as syllable onsets and offsets, required a nonlinear sound representation based on temporal modulation energy. Reconstruction accuracy was highest within the range of spectro-temporal fluctuations that have been found to be critical for speech intelligibility. The decoded speech representations allowed readout and identification of individual words directly from brain activity during single trial sound presentations. These findings reveal neural encoding mechanisms of speech acoustic parameters in higher order human auditory cortex.
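The linear part of this reconstruction approach admits a compact sketch: ridge regression from time-lagged population activity to spectrogram channels. This is a generic stimulus-reconstruction recipe, not the authors' exact pipeline; the array shapes, lag count, and regularizer are assumptions:

```python
import numpy as np

def fit_reconstruction(resp, spec, n_lags=10, lam=1e2):
    """Fit linear reconstruction filters G so that lagged neural
    activity predicts the auditory spectrogram: spec ~= X @ G.
    resp: (T, n_electrodes); spec: (T, n_freq_channels)."""
    X = np.hstack([np.roll(resp, k, axis=0) for k in range(n_lags)])
    X[:n_lags] = 0.0                        # discard wrap-around samples
    reg = lam * np.eye(X.shape[1])          # ridge penalty
    G = np.linalg.solve(X.T @ X + reg, X.T @ spec)
    return G, X @ G                         # filters and reconstruction
```

Reconstruction accuracy would then be scored, for example, as the per-channel correlation between `spec` and `X @ G` on held-out data.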

8.
Livshits MS. Biofizika. 2000;45(5):922-926.
The study is based on a model of sound perception involving two systems for measuring the frequency of a perceived sound. The first system, which analyzes the periodicity of spike sequences in the axons of neurons innervating the inner auditory hair cells excited by the traveling wave, is less precise, but it can estimate the frequency of any periodic sound. Exact measurement of the frequency of a sinusoidal sound is derived from the spikes in the axons of neurons innervating the inner hair cells of the auditory reception field, which uses the entire train of waves excited by this sound in the critical layer of the waveguide of the cochlea corresponding to the frequency of the sound. The octave effect is explained by the fact that the frequency spectrum of speech, singing and music coincides with the region of the audible range in which the impulses of the auditory nerve fibers are synchronized by incoming signals. Octave similarity, i.e., the similarity in the sound of harmonic signals whose frequencies stand in ratios of powers of two (2:1, etc.), is explained by an unambiguous match between the sound frequency and the pulse rate in auditory fibers coming from the auditory reception field. The presence in the inferior colliculi (the posterior tubercles of the midbrain) of multipeak neurons whose peaks stand in octave ratios confirms the crucial role of the exact frequency-measurement system in the phenomenon of octave similarity. The phenomenon of diplacusis, which is particularly pronounced in persons with Ménière's disease, is caused by a shift in the position of the auditory reception field in the diseased ear relative to the healthy ear. The alternating switching of reception from one ear to the other is related to a disturbance of the unitary image of pitch.

9.
Luo H, Poeppel D. Neuron. 2007;54(6):1001-1010.
How natural speech is represented in the auditory cortex constitutes a major challenge for cognitive neuroscience. Although many single-unit and neuroimaging studies have yielded valuable insights about the processing of speech and matched complex sounds, the mechanisms underlying the analysis of speech dynamics in human auditory cortex remain largely unknown. Here, we show that the phase pattern of theta band (4-8 Hz) responses recorded from human auditory cortex with magnetoencephalography (MEG) reliably tracks and discriminates spoken sentences and that this discrimination ability is correlated with speech intelligibility. The findings suggest that an approximately 200 ms temporal window (period of theta oscillation) segments the incoming speech signal, resetting and sliding to track speech dynamics. This hypothesized mechanism for cortical speech analysis is based on the stimulus-induced modulation of inherent cortical rhythms and provides further evidence implicating the syllable as a computational primitive for the representation of spoken language.
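The quantities the study analyzes are standard to compute: theta-band phase via band-pass filtering plus the Hilbert transform, and across-trial phase reliability via inter-trial coherence. The sketch below is a textbook construction, not the authors' analysis code; the filter order and function names are assumptions:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def theta_phase(x, fs, band=(4.0, 8.0)):
    """Theta-band phase time series for one trial: zero-phase band-pass
    filter, then the analytic-signal angle."""
    b, a = butter(3, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="bandpass")
    return np.angle(hilbert(filtfilt(b, a, x)))

def phase_dissimilarity(trials, fs):
    """1 minus time-averaged inter-trial phase coherence: small values
    indicate a theta phase pattern that reliably tracks the sentence."""
    ph = np.stack([theta_phase(tr, fs) for tr in trials])
    itc = np.abs(np.exp(1j * ph).mean(axis=0))   # coherence at each time point
    return 1.0 - itc.mean()
```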

10.
The processing of species-specific communication signals in the auditory system represents an important aspect of animal behavior and is crucial for social interactions, reproduction, and survival. In this article the neuronal mechanisms underlying the processing of communication signals in the higher centers of the auditory system--the inferior colliculus (IC), medial geniculate body (MGB) and auditory cortex (AC)--are reviewed, with particular attention to the guinea pig. The selectivity of neuronal responses for individual calls in these auditory centers in the guinea pig is usually low--most neurons respond to calls as well as to artificial sounds; the coding of complex sounds in the central auditory nuclei is apparently based on the representation of temporal and spectral features of acoustic stimuli in neural networks. Neuronal response patterns in the IC reliably match the sound envelope for calls characterized by one or more short impulses, but do not exactly fit the envelope for long calls. The main spectral peaks are also represented by neuronal firing rates in the IC. In comparison with the IC, response patterns in the MGB and AC demonstrate a less precise representation of the sound envelope, especially for longer calls. The spectral representation is worse for low-frequency calls, but not for broad-band calls. The emotional content of a call may influence neuronal responses in the auditory pathway, as can be demonstrated by stimulation with time-reversed calls or by measurements performed under different levels of anesthesia. The investigation of the principles of the neural coding of species-specific vocalizations offers some keys to understanding the neural mechanisms underlying human speech perception.

11.
Temporal summation was estimated by measuring detection thresholds for pulses 1–50 ms in duration presented against noise maskers. The purpose of the study was to examine the effects of the spectral profiles and intensities of the maskers on temporal summation, to look for signatures of peripheral processing of pulses with various frequency-time structures in auditory responses, and to test whether temporal summation can be used to assess speech recognition. The center frequencies of the pulses and maskers were matched. The maskers had rippled amplitude spectra of two types: in the 'on' maskers the center frequency coincided with a spectral peak, whereas in the 'off' maskers it coincided with a spectral dip. When the auditory system resolved the masker ripples, the detection thresholds for stimuli presented with the two types of maskers differed. Assessing temporal summation together with the threshold difference between the on- and off-masker conditions allowed conclusions about auditory sensitivity and about the resolution of the maskers' spectral structure (frequency selectivity) for pulses of various durations within local frequency regions. To estimate the effect of the dynamic properties of hearing on sensitivity and frequency selectivity, the masker intensity was varied. Temporal summation was measured with on- and off-maskers of various intensities in two frequency ranges (2 and 4 kHz) in four subjects with normal hearing and in one person with age-related hearing impairment who complained of reduced speech recognition in noise. Pulses shorter than 10 ms were treated as simple models of consonant sounds, whereas tone pulses longer than 10 ms were treated as simple models of vowels. In the subjects with normal hearing, at moderate masker intensities, temporal summation was enhanced for the short (consonant-like) pulses, and resolution of the rippled masker spectra improved for both the short and the tone pulses, i.e., for both consonant-like and vowel-like sounds. We suppose that the enhanced summation is related to the refractoriness of auditory nerve fibers. In the 4-kHz range, the subject with age-related hearing impairment failed to resolve the ripple structure of the maskers in the presence of the short (consonant-like) pulses. We suppose that this deficit is caused by abnormal synchronization of the pulse-evoked responses of the auditory nerve fibers, resulting in reduced speech recognition.
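For readers who want to reproduce the stimulus logic, here is one common way to build on/off rippled maskers. The abstract does not specify the construction, ripple density, or depth used in the study, so everything below is an assumption:

```python
import numpy as np

def rippled_masker(fs, dur_s, probe_hz, ripples_per_oct=2.0,
                   depth=0.9, on=True, seed=0):
    """Noise whose amplitude spectrum ripples sinusoidally along a
    log-frequency axis. With on=True the probe frequency falls on a
    spectral peak ('on' masker); with on=False it falls in a dip
    ('off' masker)."""
    rng = np.random.default_rng(seed)
    n = int(fs * dur_s)
    spec = np.fft.rfft(rng.standard_normal(n))       # white-noise spectrum
    f = np.fft.rfftfreq(n, 1.0 / fs)
    octaves = np.log2(np.maximum(f, 1.0) / probe_hz)  # 0 at the probe freq
    phase = 0.0 if on else np.pi                      # peak vs dip at probe
    ripple = 1.0 + depth * np.cos(2 * np.pi * ripples_per_oct * octaves + phase)
    return np.fft.irfft(spec * ripple, n)

masker_on = rippled_masker(44100, 0.5, probe_hz=2000.0, on=True)
masker_off = rippled_masker(44100, 0.5, probe_hz=2000.0, on=False)
```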

12.
We have developed a sparse mathematical representation of speech that minimizes the number of active model neurons needed to represent typical speech sounds. The model learns several well-known acoustic features of speech, such as harmonic stacks, formants, onsets and terminations, but we also find more exotic structures in the spectrogram representation of sound, such as localized checkerboard patterns and frequency-modulated excitatory subregions flanked by suppressive sidebands. Moreover, several of these novel features resemble neuronal receptive fields reported in the inferior colliculus (IC), as well as auditory thalamus and cortex, and our model neurons exhibit the same tradeoff in spectrotemporal resolution as has been observed in the IC. To our knowledge, this is the first demonstration that receptive fields of neurons in the ascending mammalian auditory pathway beyond the auditory nerve can be predicted based on coding principles and the statistical properties of recorded sounds.
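The "few active model neurons" idea maps onto a standard sparse-coding recipe: L1-penalized inference over a learned dictionary of spectrogram patches. The sketch below (generic ISTA inference plus a gradient dictionary update) is in the spirit of the model, not the authors' exact training procedure; all parameter values are assumptions:

```python
import numpy as np

def ista(D, x, lam=0.1, n_iter=200):
    """Sparse inference: find coefficients a minimizing
    0.5 * ||x - D a||^2 + lam * ||a||_1 (few active units)."""
    L = np.linalg.norm(D, 2) ** 2              # Lipschitz constant of gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = a + D.T @ (x - D @ a) / L          # gradient step on the residual
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # soft threshold
    return a

def dict_update(D, X, A, lr=0.1):
    """One gradient step on the dictionary (the learned 'receptive
    fields'), followed by column renormalization. X: (dim, n_samples),
    A: (n_units, n_samples), D: (dim, n_units)."""
    D = D + lr * (X - D @ A) @ A.T / A.shape[1]
    return D / np.maximum(np.linalg.norm(D, axis=0), 1e-9)
```

Alternating `ista` over a batch of spectrogram patches with `dict_update` is the classic loop under which harmonic stacks and formant-like features tend to emerge from speech statistics.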

13.
Spatial frequency is a fundamental visual feature coded in primary visual cortex, relevant for perceiving textures, objects, hierarchical structures, and scenes, as well as for directing attention and eye movements. Temporal amplitude-modulation (AM) rate is a fundamental auditory feature coded in primary auditory cortex, relevant for perceiving auditory objects, scenes, and speech. Spatial frequency and temporal AM rate are thus fundamental building blocks of visual and auditory perception. Recent results suggest that crossmodal interactions are commonplace across the primary sensory cortices and that some of the underlying neural associations develop through consistent multisensory experience such as audio-visually perceiving speech, gender, and objects. We demonstrate that people consistently and absolutely (rather than relatively) match specific auditory AM rates to specific visual spatial frequencies. We further demonstrate that this crossmodal mapping allows amplitude-modulated sounds to guide attention to and modulate awareness of specific visual spatial frequencies. Additional results show that the crossmodal association is approximately linear, based on physical spatial frequency, and generalizes to tactile pulses, suggesting that the association develops through multisensory experience during manual exploration of surfaces.

14.
Functional neuroimaging research provides detailed observations of the response patterns that natural sounds (e.g. human voices and speech, animal cries, environmental sounds) evoke in the human brain. The computational and representational mechanisms underlying these observations, however, remain largely unknown. Here we combine high spatial resolution (3 and 7 Tesla) functional magnetic resonance imaging (fMRI) with computational modeling to reveal how natural sounds are represented in the human brain. We compare competing models of sound representations and select the model that most accurately predicts fMRI response patterns to natural sounds. Our results show that the cortical encoding of natural sounds entails the formation of multiple representations of sound spectrograms with different degrees of spectral and temporal resolution. The cortex derives these multi-resolution representations through frequency-specific neural processing channels and through the combined analysis of the spectral and temporal modulations in the spectrogram. Furthermore, our findings suggest that a spectral-temporal resolution trade-off may govern the modulation tuning of neuronal populations throughout the auditory cortex. Specifically, our fMRI results suggest that neuronal populations in posterior/dorsal auditory regions preferentially encode coarse spectral information with high temporal precision. Conversely, neuronal populations in anterior/ventral auditory regions preferentially encode fine-grained spectral information with low temporal precision. We propose that such a multi-resolution analysis may be crucially relevant for flexible and behaviorally relevant sound processing and may constitute one of the computational underpinnings of functional specialization in auditory cortex.
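The spectral-temporal modulation analysis at the heart of the winning model class can be illustrated in a few lines: the 2-D Fourier transform of a (log-)spectrogram separates energy by temporal modulation rate and spectral modulation density, the two axes of the trade-off described above. This is a textbook construction, not the authors' full encoding model:

```python
import numpy as np

def modulation_spectrum(spec):
    """Spectral-temporal modulation content of a spectrogram
    (freq x time): the magnitude of its centered 2-D Fourier
    transform. One axis indexes temporal modulation (Hz), the
    other spectral modulation (cycles/octave on a log-freq axis)."""
    s = np.log(spec + 1e-6)          # compressive nonlinearity
    s = s - s.mean()                 # remove DC before transforming
    return np.abs(np.fft.fftshift(np.fft.fft2(s)))
```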

15.
A sequence of sounds may be heard as coming from a single source (called fusion or coherence) or from two or more sources (called fission or stream segregation). Each perceived source is called a 'stream'. When the differences between successive sounds are very large, fission nearly always occurs, whereas when the differences are very small, fusion nearly always occurs. When the differences are intermediate in size, the percept often 'flips' between one stream and multiple streams, a property called 'bistability'. The flips do not generally occur regularly in time. The tendency to hear two streams builds up over time, but can be partially or completely reset by a sudden change in the properties of the sequence or by switches in attention. Stream formation depends partly on the extent to which successive sounds excite different 'channels' in the peripheral auditory system. However, other factors can play a strong role; multiple streams may be heard when successive sounds are presented to the same ear and have essentially identical excitation patterns in the cochlea. Differences between successive sounds in temporal envelope, fundamental frequency, phase spectrum and lateralization can all induce a percept of multiple streams. Regularities in the temporal pattern of elements within a stream can help in stabilizing that stream.

16.
This paper proposes a novel and effective pitch-extraction method based on an auditory model. The method simulates the pitch-perception function of the human auditory system: on top of a cross-channel summary autocorrelation, it adds a simulation of the temporal continuity of neural perception. Because information distributed over both place and time is accumulated and combined, the proposed method can not only extract the pitch of speech signals buried in various kinds of noise, but can also determine whether the signal under analysis consists of overlapping speech signals and, if so, extract the independent pitch tracks of the superimposed voices. Preliminary experimental results demonstrate the effectiveness of the proposed model.
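A minimal sketch of the cross-channel summary autocorrelation is shown below, with exponential smoothing across frames as a crude stand-in for the model's temporal-continuity accumulation. The function name and parameters are assumptions, and the cochlear filterbank that would produce the per-channel envelopes is omitted:

```python
import numpy as np

def summary_acf(frame_channels, fs, prev_sacf=None, alpha=0.8,
                max_lag_ms=20.0):
    """Summary autocorrelation for one analysis frame: autocorrelate
    each cochlear-channel envelope, sum across channels, smooth
    against the previous frame, and read pitch off the peak lag."""
    max_lag = int(fs * max_lag_ms / 1000)     # frames must exceed this length
    sacf = np.zeros(max_lag)
    for ch in frame_channels:                 # one 1-D envelope per channel
        ch = ch - ch.mean()
        ac = np.correlate(ch, ch, mode="full")[len(ch) - 1:]
        sacf += ac[:max_lag]                  # cross-channel accumulation
    if prev_sacf is not None:
        sacf = alpha * prev_sacf + (1 - alpha) * sacf   # temporal continuity
    min_lag = int(fs / 500)                   # ignore pitches above 500 Hz
    lag = min_lag + int(np.argmax(sacf[min_lag:]))
    return fs / lag, sacf                     # pitch estimate (Hz) and SACF
```

For overlapping voices, secondary peaks of the smoothed SACF would be the natural candidates for the second pitch track, though the paper's actual separation procedure is not reproduced here.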

17.
Several illusory phenomena in auditory perception are accounted for by the event construction model that Nakajima et al. (2000) introduced to explain the gap transfer illusion. This model assumes that onsets and offsets of sounds are detected perceptually as if they were independent auditory elements, which are then connected to one another according to the proximity principle to constitute auditory events. This model seems to contribute to a general cross-modal theory of perception in which the idea of edge integration plays an important role. We indicate potential directions for connecting the present paradigm with speech perception and suggest possibilities for improving artificial auditory environments.

18.
Pulse-resonance sounds play an important role in animal communication and auditory object recognition, yet very little is known about the cortical representation of this class of sounds. In this study we shed light on one simple aspect: how well the firing rate of cortical neurons resolves the resonant ("formant") frequencies of vowel-like pulse-resonance sounds. We recorded neural responses in the primary auditory cortex (A1) of anesthetized rats to two-formant pulse-resonance sounds, and estimated their formant-resolving power using a statistical kernel smoothing method that takes into account the natural variability of cortical responses. While formant-tuning functions were diverse in structure across different penetrations, most were sensitive to changes in formant frequency, with a frequency resolution comparable to that reported for rat cochlear filters.
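The kernel-smoothing idea can be illustrated with a plain Nadaraya-Watson estimator of firing rate as a function of formant frequency. This is a simplified variant: the paper's method additionally models response variability, which this sketch ignores, and all names and the bandwidth are assumptions:

```python
import numpy as np

def smoothed_tuning(formant_hz, spike_rate, grid_hz, bandwidth=100.0):
    """Nadaraya-Watson kernel estimate of a formant-tuning function:
    a Gaussian-weighted running average of trial firing rates.
    formant_hz, spike_rate: per-trial arrays; grid_hz: evaluation grid."""
    w = np.exp(-0.5 * ((grid_hz[:, None] - formant_hz[None, :]) / bandwidth) ** 2)
    return (w @ spike_rate) / w.sum(axis=1)

# e.g. tuning evaluated on a 500 Hz - 4 kHz grid
grid = np.linspace(500.0, 4000.0, 200)
```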

19.
Sounds in our environment like voices, animal calls or musical instruments are easily recognized by human listeners. Understanding the key features underlying this robust sound recognition is an important question in auditory science. Here, we studied the recognition by human listeners of new classes of sounds: acoustic and auditory sketches, sounds that are severely impoverished but still recognizable. Starting from a time-frequency representation, a sketch is obtained by keeping only sparse elements of the original signal, here, by means of a simple peak-picking algorithm. Two time-frequency representations were compared: a biologically grounded one, the auditory spectrogram, which simulates peripheral auditory filtering, and a simple acoustic spectrogram, based on a Fourier transform. Three degrees of sparsity were also investigated. Listeners were asked to recognize the category to which a sketch sound belongs: singing voices, bird calls, musical instruments, and vehicle engine noises. Results showed that, with the exception of voice sounds, very sparse representations of sounds (10 features, or energy peaks, per second) could be recognized above chance. No clear differences could be observed between the acoustic and the auditory sketches. For the voice sounds, however, a completely different pattern of results emerged, with at-chance or even below-chance recognition performances, suggesting that the important features of the voice, whatever they are, were removed by the sketch process. Overall, these perceptual results were well correlated with a model of auditory distances, based on spectro-temporal excitation patterns (STEPs). This study confirms the potential of these new classes of sounds, acoustic and auditory sketches, to study sound recognition.
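The acoustic-sketch construction is easy to approximate. The sketch below uses a global magnitude threshold rather than the paper's peak-picking algorithm, and covers only the Fourier (acoustic) variant, not the auditory spectrogram; the STFT window length and parameter names are assumptions:

```python
import numpy as np
from scipy.signal import stft, istft

def acoustic_sketch(x, fs, peaks_per_second=10):
    """Keep only the largest time-frequency magnitudes of a plain
    Fourier spectrogram, zero the rest, and resynthesize: a severely
    impoverished but potentially still recognizable 'sketch'."""
    f, t, Z = stft(x, fs=fs, nperseg=512)
    mag = np.abs(Z)
    n_keep = max(1, int(peaks_per_second * len(x) / fs))
    thresh = np.sort(mag.ravel())[-n_keep]       # n_keep-th largest value
    Z_sparse = np.where(mag >= thresh, Z, 0.0)   # sparsified STFT
    _, y = istft(Z_sparse, fs=fs, nperseg=512)
    return y
```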

20.
Speech perception at the interface of neurobiology and linguistics
Speech perception consists of a set of computations that take continuously varying acoustic waveforms as input and generate discrete representations that make contact with the lexical representations stored in long-term memory as output. Because the perceptual objects recognized by speech perception enter into subsequent linguistic computation, the format used for lexical representation and processing fundamentally constrains the speech perceptual processes. Consequently, theories of speech perception must, at some level, be tightly linked to theories of lexical representation. Minimally, speech perception must yield representations that smoothly and rapidly interface with stored lexical items. Adopting the perspective of Marr, we argue for, and provide neurobiological and psychophysical evidence in support of, the following research programme. First, at the implementational level, speech perception is a multi-time resolution process, with perceptual analyses occurring concurrently on at least two time scales (approx. 20-80 ms, approx. 150-300 ms), commensurate with (sub)segmental and syllabic analyses, respectively. Second, at the algorithmic level, we suggest that perception proceeds on the basis of internal forward models, or uses an 'analysis-by-synthesis' approach. Third, at the computational level (in the sense of Marr), the theory of lexical representation that we adopt is principally informed by phonological research and assumes that words are represented in the mental lexicon as sequences of discrete segments composed of distinctive features. One important goal of the research programme is to develop linking hypotheses between putative neurobiological primitives (e.g. temporal primitives) and those primitives derived from linguistic inquiry, to arrive ultimately at a biologically sensible and theoretically satisfying model of representation and computation in speech.
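The multi-time resolution claim can be made concrete with two concurrent short-time analyses of the same waveform. The 25 ms and 200 ms window lengths below are illustrative choices within the ranges quoted above, not values prescribed by the paper:

```python
import numpy as np
from scipy.signal import stft

def two_timescale_analysis(x, fs):
    """Concurrent spectral analysis on two windows, echoing the
    ~20-80 ms (sub)segmental and ~150-300 ms syllabic scales."""
    f1, t1, fast = stft(x, fs=fs, nperseg=int(0.025 * fs))   # segmental scale
    f2, t2, slow = stft(x, fs=fs, nperseg=int(0.200 * fs))   # syllabic scale
    return (f1, t1, np.abs(fast)), (f2, t2, np.abs(slow))
```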
