Similar Literature
1.
Listening to speech in the presence of other sounds
Although most research on the perception of speech has been conducted with speech presented without any competing sounds, we almost always listen to speech against a background of other sounds which we are adept at ignoring. Nevertheless, such additional irrelevant sounds can cause severe problems for speech recognition algorithms and for the hard of hearing, as well as posing a challenge to theories of speech perception. A variety of different problems are created by the presence of additional sound sources: detection of features that are partially masked, allocation of detected features to the appropriate sound sources, and recognition of sounds on the basis of partial information. The separation of sounds is attracting substantial attention in psychoacoustics and in computer science. An effective solution to the problem of separating sounds would have important practical applications.
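The allocation problem described above can be made concrete with an "ideal binary mask", a standard construct from computational auditory scene analysis rather than anything proposed in this abstract: each time-frequency cell of a mixture is assigned to whichever source dominates it, and only the target-dominated cells are kept. Below is a minimal Python sketch with purely synthetic signals; all parameter values are illustrative assumptions.

import numpy as np
from scipy.signal import stft, istft

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
target = np.sin(2 * np.pi * 440 * t)       # stand-in for the attended "speech"
masker = 0.8 * np.random.randn(t.size)     # competing background sound
mixture = target + masker

# Time-frequency analysis of the two sources and of their mixture.
_, _, T = stft(target, fs=fs, nperseg=512)
_, _, M = stft(masker, fs=fs, nperseg=512)
_, _, X = stft(mixture, fs=fs, nperseg=512)

# Allocate each time-frequency cell to the target only where the target dominates.
mask = (np.abs(T) > np.abs(M)).astype(float)
_, separated = istft(mask * X, fs=fs)

snr = 10 * np.log10(np.sum(target ** 2) / np.sum(masker ** 2))
print(f"mixture SNR: {snr:.1f} dB; fraction of cells allocated to the target: {mask.mean():.0%}")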

2.
It has traditionally been assumed that cochlear implant users de facto perform atypically in audiovisual tasks. However, a recent study that combined an auditory task with visual distractors suggests that only those cochlear implant users that are not proficient at recognizing speech sounds might show abnormal audiovisual interactions. The present study aims at reinforcing this notion by investigating the audiovisual segregation abilities of cochlear implant users in a visual task with auditory distractors. Speechreading was assessed in two groups of cochlear implant users (proficient and non-proficient at sound recognition), as well as in normal controls. A visual speech recognition task (i.e. speechreading) was administered either in silence or in combination with three types of auditory distractors: i) noise ii) reverse speech sound and iii) non-altered speech sound. Cochlear implant users proficient at speech recognition performed like normal controls in all conditions, whereas non-proficient users showed significantly different audiovisual segregation patterns in both speech conditions. These results confirm that normal-like audiovisual segregation is possible in highly skilled cochlear implant users and, consequently, that proficient and non-proficient CI users cannot be lumped into a single group. This important feature must be taken into account in further studies of audiovisual interactions in cochlear implant users.

3.
Heterochronic formation of basic and language-specific speech sounds in the first year of life in infants from different ethnic groups (Chechens, Russians, and Mongols) has been studied. Spectral analysis of the frequency, amplitude, and formant characteristics of speech sounds has shown a universal pattern of organization of the basic sound repertoire and “language-specific” sounds in the process of babbling and prattle of infants of different ethnic groups. Possible mechanisms of the formation of specific speech sounds in early ontogeny are discussed.
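One standard way to obtain the formant characteristics mentioned above is linear predictive coding (LPC). The sketch below is an illustrative Python re-implementation, not the authors' analysis pipeline: it synthesises a vowel-like sound with two known resonances and recovers their frequencies from the roots of the LPC polynomial; the pitch and formant values are arbitrary assumptions.

import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

fs = 16000

def resonator_poles(freq_hz, bandwidth_hz):
    """Complex-conjugate pole pair for a single formant resonance."""
    r = np.exp(-np.pi * bandwidth_hz / fs)
    w = 2 * np.pi * freq_hz / fs
    return [r * np.exp(1j * w), r * np.exp(-1j * w)]

# Synthetic vowel-like sound: ~300 Hz pulse train through two formant resonances.
true_formants = [800.0, 1400.0]                       # illustrative values only
poles = sum((resonator_poles(f, 80.0) for f in true_formants), [])
a = np.real(np.poly(poles))                           # all-pole vocal-tract model
excitation = np.zeros(int(0.05 * fs))
excitation[:: fs // 300] = 1.0
vowel = lfilter([1.0], a, excitation)

# LPC by the autocorrelation method: solve the normal equations.
order = 2 * len(true_formants)
autocorr = np.correlate(vowel, vowel, mode="full")[vowel.size - 1:]
coeffs = solve_toeplitz(autocorr[:order], autocorr[1:order + 1])
lpc = np.concatenate(([1.0], -coeffs))

# Formant estimates are the frequencies of the LPC roots with positive angles.
roots = np.roots(lpc)
estimates = sorted(np.angle(roots[np.imag(roots) > 0]) * fs / (2 * np.pi))
print("estimated formants (Hz):", [round(f) for f in estimates])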

4.
The present article outlines the contribution of the mismatch negativity (MMN), and its magnetic equivalent MMNm, to our understanding of the perception of speech sounds in the human brain. MMN data indicate that each sound, both speech and non-speech, develops its neural representation corresponding to the percept of this sound in the neurophysiological substrate of auditory sensory memory. The accuracy of this representation, determining the accuracy of the discrimination between different sounds, can be probed with MMN separately for any auditory feature or stimulus type such as phonemes. Furthermore, MMN data show that the perception of phonemes, and probably also of larger linguistic units (syllables and words), is based on language-specific phonetic traces developed in the posterior part of the left-hemisphere auditory cortex. These traces serve as recognition models for the corresponding speech sounds in listening to speech.
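As a concrete illustration of how the MMN is conventionally obtained (a generic textbook recipe, not taken from this article): responses to frequent "standard" and rare "deviant" sounds in an oddball sequence are averaged separately, and the MMN is the deviant-minus-standard difference wave. The sketch below uses simulated single-trial data; all latencies, amplitudes and the sampling rate are illustrative assumptions.

import numpy as np

fs = 500                                  # EEG sampling rate in Hz (illustrative)
t = np.arange(-0.1, 0.4, 1 / fs)          # epoch from -100 to +400 ms

def simulated_epochs(n_trials, deviant):
    """Noisy auditory ERPs; deviants carry an extra negativity around 150-200 ms."""
    erp = 2.0 * np.exp(-((t - 0.10) / 0.05) ** 2)             # obligatory N1-like wave
    if deviant:
        erp = erp - 3.0 * np.exp(-((t - 0.175) / 0.04) ** 2)  # the mismatch response
    return erp + np.random.randn(n_trials, t.size)            # single-trial noise

standards = simulated_epochs(800, deviant=False)
deviants = simulated_epochs(200, deviant=True)

mmn = deviants.mean(axis=0) - standards.mean(axis=0)          # the difference wave
peak = t[np.argmin(mmn)]
print(f"MMN peak latency ~{peak * 1000:.0f} ms, amplitude {mmn.min():.2f} (a.u.)")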

5.
Several acoustic cues contribute to auditory distance estimation. Nonacoustic cues, including familiarity, may also play a role. We tested participants' ability to distinguish the distances of acoustically similar sounds that differed in familiarity. Participants were better able to judge the distances of familiar sounds. Electroencephalographic (EEG) recordings collected while participants performed this auditory distance judgment task revealed that several cortical regions responded in different ways depending on sound familiarity. Surprisingly, these differences were observed in auditory cortical regions as well as other cortical regions distributed throughout both hemispheres. These data suggest that learning about subtle, distance-dependent variations in complex speech sounds involves processing in a broad cortical network that contributes both to speech recognition and to how spatial information is extracted from speech.

6.
Accumulating evidence suggests that storing speech sounds requires transposing rapidly fluctuating sound waves into more easily encoded oromotor sequences. If so, then the classical speech areas in the caudalmost portion of the superior temporal gyrus (pSTG) and in the inferior frontal gyrus (IFG) may be critical for performing this acoustic-oromotor transposition. We tested this proposal by applying repetitive transcranial magnetic stimulation (rTMS) to each of these left-hemisphere loci, as well as to a nonspeech locus, while participants listened to pseudowords. After 5 minutes these stimuli were re-presented together with new ones in a recognition test. Compared to control-site stimulation, pSTG stimulation produced a highly significant increase in recognition error rate, without affecting reaction time. By contrast, IFG stimulation led only to a weak, non-significant trend toward recognition memory impairment. Importantly, the impairment after pSTG stimulation was not due to interference with perception, since the same stimulation failed to affect pseudoword discrimination examined with short interstimulus intervals. Our findings suggest that pSTG is essential for transforming speech sounds into stored motor plans for reproducing the sound. Whether or not the IFG also plays a role in speech-sound recognition could not be determined from the present results.

7.
Neural specializations for speech and pitch: moving beyond the dichotomies
The idea that speech processing relies on unique, encapsulated, domain-specific mechanisms has been around for some time. Another well-known idea, often espoused as being in opposition to the first proposal, is that processing of speech sounds entails general-purpose neural mechanisms sensitive to the acoustic features that are present in speech. Here, we suggest that these dichotomous views need not be mutually exclusive. Specifically, there is now extensive evidence that spectral and temporal acoustical properties predict the relative specialization of right and left auditory cortices, and that this is a parsimonious way to account not only for the processing of speech sounds, but also for non-speech sounds such as musical tones. We also point out that there is equally compelling evidence that neural responses elicited by speech sounds can differ depending on more abstract, linguistically relevant properties of a stimulus (such as whether it forms part of one's language or not). Tonal languages provide a particularly valuable window to understand the interplay between these processes. The key to reconciling these phenomena probably lies in understanding the interactions between afferent pathways that carry stimulus information, with top-down processing mechanisms that modulate these processes. Although we are still far from the point of having a complete picture, we argue that moving forward will require us to abandon the dichotomy argument in favour of a more integrated approach.

8.
9.

Background

The well-established left hemisphere specialisation for language processing has long been claimed to be based on a low-level auditory specialisation for specific acoustic features in speech, particularly regarding ‘rapid temporal processing’.

Methodology

A novel analysis/synthesis technique was used to construct a variety of sounds based on simple sentences, which could be manipulated in spectro-temporal complexity and in whether or not they were intelligible. All sounds consisted of two noise-excited spectral prominences (based on the lower two formants in the original speech) which could be static or varying in frequency and/or amplitude independently. Dynamically varying both acoustic features based on the same sentence led to intelligible speech, but when either or both acoustic features were static, the stimuli were not intelligible. Using the frequency dynamics from one sentence with the amplitude dynamics of another led to unintelligible sounds of comparable spectro-temporal complexity to the intelligible ones. Positron emission tomography (PET) was used to compare which brain regions were active when participants listened to the different sounds.
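The following Python sketch shows, in spirit only, how such stimuli can be synthesised; it is an assumed re-implementation, not the authors' code. Each spectral prominence is a narrow band of noise whose centre frequency and amplitude can each be held static or made to follow a time-varying track (in the study these tracks came from the lower two formants of real sentences; here they are arbitrary ramps).

import numpy as np

fs = 16000
dur = 1.0
n = int(fs * dur)
t = np.arange(n) / fs

def noise_prominence(centre_hz, amplitude, bandwidth_hz=200.0):
    """Narrow noise band whose centre frequency and amplitude follow the given tracks."""
    centre_hz = np.broadcast_to(centre_hz, (n,))      # accepts static or time-varying tracks
    amplitude = np.broadcast_to(amplitude, (n,))
    kernel = np.hanning(int(fs / bandwidth_hz))       # low-pass noise sets the band's width
    baseband = np.convolve(np.random.randn(n), kernel / kernel.sum(), mode="same")
    phase = 2 * np.pi * np.cumsum(centre_hz) / fs     # heterodyne onto the frequency track
    return amplitude * baseband * np.cos(phase)

# "Dynamic" condition: both centre frequencies and amplitudes vary (illustrative tracks).
f1 = np.linspace(400, 800, n)                         # lower prominence rises
f2 = np.linspace(2200, 1800, n)                       # upper prominence falls
env = 0.5 * (1 + np.sin(2 * np.pi * 4 * t))           # 4 Hz amplitude modulation
dynamic = noise_prominence(f1, env) + noise_prominence(f2, env)

# "Static" control condition: fix both the centre frequencies and the amplitudes.
static = noise_prominence(600.0, 1.0) + noise_prominence(2000.0, 1.0)
print(dynamic.shape, static.shape)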

Conclusions

Neural activity to spectral and amplitude modulations sufficient to support speech intelligibility (without actually being intelligible) was seen bilaterally, with a right temporal lobe dominance. A left dominant response was seen only to intelligible sounds. It thus appears that the left hemisphere specialisation for speech is based on the linguistic properties of utterances, not on particular acoustic features.

10.
The aim of this study was to investigate the informational sufficiency of children's speech during the second year of life. We tried to identify those elements of a child's speech that matter most for adults' subsequent recognition of the meaning of the child's utterances. The vowel phonemes of the child's words and the syllabic structure of words in the child's utterances were recognized quite successfully, whereas unclearly articulated consonant phonemes caused recognition errors. Listeners with professional experience in analyzing children's sounds described the phoneme sequences of the child's signals and the sense of the child's words more accurately.

11.
The aims of this study were (1) to document the recognition of environmental sounds (ESs) in Mandarin-speaking children with cochlear implants (CIs) and to analyze factors possibly associated with ESs recognition; (2) to examine the relationship between perception of ESs and receptive vocabulary level; and (3) to explore the acoustic factors relevant to perceptual outcomes for daily ESs in pediatric CI users. Forty-seven prelingually deafened children between the ages of 4 and 10 years participated in this study. They were divided into a pre-school group (group A: ages 4–6) and a school-age group (group B: ages 7–10). The Sound Effects Recognition Test (SERT) and the Chinese version of the revised Peabody Picture Vocabulary Test (PPVT-R) were used to assess auditory perception ability. The average percentage correct on the SERT was 61.2% in the preschool group and 72.3% in the older group, with no significant difference between the two groups. The ESs recognition performance of children with CIs was poorer than that of their hearing peers (90% on average). No correlation existed between ESs recognition and receptive vocabulary comprehension. Two predictive factors, pre-implantation residual hearing and duration of CI usage, were found to be associated with recognition performance for daily-encountered ESs. Acoustically, sounds with distinct temporal patterning were easier for children with CIs to identify. In conclusion, we have demonstrated that ESs recognition is not easy for children with CIs and that only a low correlation existed between linguistic sounds and ESs recognition in these subjects. If sounds other than speech stimuli receive little emphasis in routine verbal/oral habilitation programs, recognition of ESs in children with CIs can only be achieved through natural exposure to daily-encountered auditory stimuli. Therefore, task-specific measures other than speech materials can be helpful in capturing the full profile of auditory perceptual progress after implantation.

12.
Temporal summation was estimated by measuring detection thresholds for pulses with durations of 1–50 ms in the presence of noise maskers. The purpose of the study was to examine the effects of the spectral profiles and intensities of noise maskers on temporal summation, to investigate how the peripheral processing of pulses with various frequency-time structures is reflected in auditory responses, and to test the possibility of using temporal summation measures in relation to speech recognition. The central frequencies of the pulses and maskers were the same. The maskers had rippled amplitude spectra of two types: in some maskers the central frequency coincided with a spectral hump, whereas in others it coincided with a spectral dip (so-called on- and off-maskers). When the auditory system resolved the masker ripples, the difference between the detection thresholds for stimuli presented with the two types of maskers was nonzero. Assessing temporal summation and the difference in pulse detection thresholds under on- and off-masker conditions allowed us to draw conclusions about auditory sensitivity and about the resolution of the spectral structure of the maskers (i.e., frequency selectivity) for pulses of various durations in local frequency regions. In order to estimate the effect of the dynamic properties of hearing on sensitivity and frequency selectivity, we varied the intensity of the maskers. We measured temporal summation with on- and off-maskers of various intensities in two frequency ranges (2 and 4 kHz) in four subjects with normal hearing and in one person with age-related hearing impairment who complained of reduced speech recognition in noise. Pulses shorter than 10 ms were treated as simple models of consonant sounds, whereas tone pulses longer than 10 ms were treated as simple models of vowel sounds. In subjects with normal hearing, at moderate masker intensities we observed an enhancement of temporal summation for the short pulses (consonant models) and an improvement in the resolution of the rippled masker spectra for both the short and the tone pulses (consonant and vowel models). We supposed that the enhancement of summation was related to the refractoriness of auditory nerve fibers. In the 4 kHz range, the subject with age-related hearing impairment did not resolve the ripple structure of the maskers in the presence of the short pulses (consonant models). We supposed that this impairment was caused by abnormal synchronization of the auditory nerve fiber responses evoked by the pulses, resulting in a decrease in speech recognition.
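On- and off-maskers of this kind can be generated, in a generic way that is assumed here rather than taken from the authors' methods, by giving broadband noise a sinusoidally rippled amplitude spectrum and placing the probe frequency either on a hump or in a dip. A minimal Python sketch, with illustrative parameter values, follows.

import numpy as np

fs = 44100
n = int(0.5 * fs)                                     # 500 ms masker
freqs = np.fft.rfftfreq(n, 1 / fs)

def ripple_masker(probe_hz, ripples_per_octave=2.0, on_probe=True, lo=500.0, hi=8000.0):
    """Noise whose log-frequency amplitude spectrum ripples around the probe frequency."""
    band = (freqs >= lo) & (freqs <= hi)
    octaves = np.zeros_like(freqs)
    octaves[band] = np.log2(freqs[band] / probe_hz)
    phase = 0.0 if on_probe else np.pi                # hump vs. dip at the probe frequency
    envelope = np.where(band, 1 + 0.9 * np.cos(2 * np.pi * ripples_per_octave * octaves + phase), 0.0)
    random_phase = np.exp(2j * np.pi * np.random.rand(freqs.size))
    return np.fft.irfft(envelope * random_phase, n)

on_masker = ripple_masker(2000.0, on_probe=True)      # spectral hump at the 2 kHz probe
off_masker = ripple_masker(2000.0, on_probe=False)    # spectral dip at the 2 kHz probe
print(on_masker.shape, off_masker.shape)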

13.
Humans can recognize spoken words with unmatched speed and accuracy. Hearing the initial portion of a word such as "formu…" is sufficient for the brain to identify "formula" from the thousands of other words that partially match. Two alternative computational accounts propose that partially matching words (1) inhibit each other until a single word is selected ("formula" inhibits "formal" by lexical competition) or (2) are used to predict upcoming speech sounds more accurately (segment prediction error is minimal after sequences like "formu…"). To distinguish these theories we taught participants novel words (e.g., "formubo") that sound like existing words ("formula") on two successive days. Computational simulations show that knowing "formubo" increases lexical competition when hearing "formu…", but reduces segment prediction error. Conversely, when the sounds in "formula" and "formubo" diverge, the reverse is observed. The time course of magnetoencephalographic brain responses in the superior temporal gyrus (STG) is uniquely consistent with a segment prediction account. We propose a predictive coding model of spoken word recognition in which STG neurons represent the difference between predicted and heard speech sounds. This prediction error signal explains the efficiency of human word recognition and simulates neural responses in auditory regions.
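A toy illustration (not the authors' simulations) makes the contrast concrete. With a uniform prior over a small made-up lexicon, learning the neighbour "formubo" enlarges the cohort after "formu…" (more lexical competition), lowers the surprisal of the shared segments (less prediction error during the overlap), and raises the surprisal at the point where the words diverge; the filler words "formal" and "fortune" are arbitrary assumptions.

import math

def cohort(lexicon, prefix):
    """Words still consistent with the portion of speech heard so far."""
    return [w for w in lexicon if w.startswith(prefix)]

def surprisal(lexicon, prefix, segment):
    """-log2 p(next segment | prefix), with each cohort word equally likely."""
    matches = cohort(lexicon, prefix)
    if not matches:
        return float("inf")
    p = sum(1 for w in matches if len(w) > len(prefix) and w[len(prefix)] == segment) / len(matches)
    return float("inf") if p == 0 else -math.log2(p)

before = ["formula", "formal", "fortune"]
after = before + ["formubo"]                          # the novel word once it is learned

for lexicon, label in [(before, "before learning"), (after, "after learning 'formubo'")]:
    competition = len(cohort(lexicon, "formu"))       # lexical competition after "formu…"
    overlap_err = surprisal(lexicon, "form", "u")     # prediction error on a shared segment
    diverge_err = surprisal(lexicon, "formu", "l")    # prediction error where the words diverge
    print(f"{label}: cohort size {competition}, "
          f"surprisal(u|form) {overlap_err:.2f} bits, surprisal(l|formu) {diverge_err:.2f} bits")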

14.
Bottlenose dolphins (Tursiops truncatus) use the frequency contour of whistles produced by conspecifics for individual recognition. Here we tested a bottlenose dolphin’s (Tursiops truncatus) ability to recognize frequency modulated whistle-like sounds using a three alternative matching-to-sample paradigm. The dolphin was first trained to select a specific object (object A) in response to a specific sound (sound A) for a total of three object-sound associations. The sounds were then transformed by amplitude, duration, or frequency transposition while still preserving the frequency contour of each sound. For comparison purposes, 30 human participants completed an identical task with the same sounds, objects, and training procedure. The dolphin’s ability to correctly match objects to sounds was robust to changes in amplitude, with only a minor decrement in performance for short durations. The dolphin failed to recognize sounds that were frequency transposed by plus or minus ½ octaves. Human participants demonstrated robust recognition with all acoustic transformations. The results indicate that this dolphin’s acoustic recognition of whistle-like sounds was constrained by absolute pitch. Unlike human speech, which varies considerably in average frequency, signature whistles are relatively stable in frequency, which may not have selected for a whistle recognition system invariant to frequency transposition.

15.
Azadpour M, Balaban E. PLoS ONE. 2008;3(4):e1966
Neuroimaging studies of speech processing increasingly rely on artificial speech-like sounds whose perceptual status as speech or non-speech is assigned by simple subjective judgments; brain activation patterns are interpreted according to these status assignments. The naïve perceptual status of one such stimulus, spectrally-rotated speech (not consciously perceived as speech by naïve subjects), was evaluated in discrimination and forced identification experiments. Discrimination of variation in spectrally-rotated syllables in one group of naïve subjects was strongly related to the pattern of similarities in phonological identification of the same stimuli provided by a second, independent group of naïve subjects, suggesting either that (1) naïve rotated syllable perception involves phonetic-like processing, or (2) that perception is solely based on physical acoustic similarity, and similar sounds are provided with similar phonetic identities. Analysis of acoustic (Euclidean distances of center frequency values of formants) and phonetic similarities in the perception of the vowel portions of the rotated syllables revealed that discrimination was significantly and independently influenced by both acoustic and phonological information. We conclude that simple subjective assessments of artificial speech-like sounds can be misleading, as perception of such sounds may initially and unconsciously utilize speech-like, phonological processing.
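Spectral rotation itself is a simple, well-documented operation; the sketch below is a generic implementation (assumed here, not the authors' exact stimulus pipeline). For a real signal band-limited to fs/2, multiplying sample n by (-1)^n mirrors the spectrum about fs/4, so with material resampled to 8 kHz the rotation axis sits near 2 kHz; the synthetic "syllable" is purely illustrative.

import numpy as np

def spectrally_rotate(x):
    """Mirror the spectrum of a real, band-limited signal about fs/4."""
    return x * np.cos(np.pi * np.arange(x.size))      # +1, -1, +1, ... = (-1)^n

fs = 8000                                             # rotation axis at fs/4 = 2 kHz
t = np.arange(0, 0.3, 1 / fs)
# Synthetic harmonic "syllable"; its 600 Hz component ends up at 4000 - 600 = 3400 Hz.
syllable = sum(np.sin(2 * np.pi * f * t) / k for k, f in enumerate([150, 300, 450, 600], start=1))
rotated = spectrally_rotate(syllable)
print(syllable.shape, rotated.shape)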

16.
We have developed a sparse mathematical representation of speech that minimizes the number of active model neurons needed to represent typical speech sounds. The model learns several well-known acoustic features of speech such as harmonic stacks, formants, onsets and terminations, but we also find more exotic structures in the spectrogram representation of sound such as localized checkerboard patterns and frequency-modulated excitatory subregions flanked by suppressive sidebands. Moreover, several of these novel features resemble neuronal receptive fields reported in the Inferior Colliculus (IC), as well as auditory thalamus and cortex, and our model neurons exhibit the same tradeoff in spectrotemporal resolution as has been observed in IC. To our knowledge, this is the first demonstration that receptive fields of neurons in the ascending mammalian auditory pathway beyond the auditory nerve can be predicted based on coding principles and the statistical properties of recorded sounds.
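The general recipe behind such a model can be sketched with an off-the-shelf sparse-coding routine; the code below is an assumed, generic illustration rather than the authors' model, and it trains on synthetic spectrogram patches (harmonic stacks and sweeps) instead of recorded speech.

import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)

def synthetic_patch(n_freq=16, n_time=16):
    """Toy spectrogram patch containing a harmonic stack or an upward FM sweep."""
    patch = 0.05 * rng.standard_normal((n_freq, n_time))
    if rng.random() < 0.5:
        f0 = rng.integers(1, 4)
        patch[f0::f0, :] += 1.0                       # harmonic stack: every f0-th channel
    else:
        start = rng.integers(0, n_freq - n_time // 2)
        for step in range(n_time):
            patch[min(start + step // 2, n_freq - 1), step] += 1.0   # slow upward sweep
    return patch.ravel()

X = np.stack([synthetic_patch() for _ in range(2000)])

# Learn a dictionary such that each patch is encoded by only a few active units.
learner = MiniBatchDictionaryLearning(n_components=40, alpha=1.0, batch_size=64, random_state=0)
codes = learner.fit_transform(X)
active = np.mean(np.count_nonzero(codes, axis=1))
print(f"dictionary shape: {learner.components_.shape}, mean active units per patch: {active:.1f} of 40")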

17.
In this paper, we describe domain-general auditory processes that we believe are prerequisite to the linguistic analysis of speech. We discuss biological evidence for these processes and how they might relate to processes that are specific to human speech and language. We begin with a brief review of (i) the anatomy of the auditory system and (ii) the essential properties of speech sounds. Section 4 describes the general auditory mechanisms that we believe are applied to all communication sounds, and how functional neuroimaging is being used to map the brain networks associated with domain-general auditory processing. Section 5 discusses recent neuroimaging studies that explore where such general processes give way to those that are specific to human speech and language.

18.
An artificial neural network which uses anatomical and physiological findings on the afferent pathway from the ear to the cortex is presented, and the roles of the constituent functions in recognition of continuous speech are examined. The network deals with successive spectra of speech sounds by a cascade of several neural layers: a lateral excitation layer (LEL), a lateral inhibition layer (LIL), and a pile of feature detection layers (FDL's). These layers are shown to be effective for recognizing spoken words. Namely, first, LEL reduces the distortion of the sound spectrum caused by the pitch of speech sounds. Next, LIL emphasizes the major energy peaks of the sound spectrum, the formants. Last, FDL's detect syllables and words in successive formants, where two functions, time-delay and strong adaptation, play important roles: time-delay makes it possible to retain the pattern of formant changes for a period to detect spoken words successively; strong adaptation contributes to removing the time-warp of formant changes. Digital computer simulations show that the network detects isolated syllables, isolated words, and connected words in continuous speech, while reproducing the fundamental responses found in the auditory system such as ON, OFF, ON-OFF, and SUSTAINED patterns.
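A minimal Python sketch of the general idea (an assumed re-implementation, not the authors' network): lateral excitation smooths the spectrum across neighbouring frequency channels, lateral inhibition then sharpens its peaks so the formants stand out, and a strongly adapting stage responds mainly to changes between successive frames; kernel widths and rates are illustrative.

import numpy as np

def lateral_excitation(spectrum, width=3):
    """Spread activity to neighbouring channels, smoothing pitch-harmonic ripple."""
    kernel = np.hanning(2 * width + 1)
    return np.convolve(spectrum, kernel / kernel.sum(), mode="same")

def lateral_inhibition(spectrum, surround=5, strength=0.8):
    """Subtract a broad local average so spectral peaks (formants) stand out."""
    kernel = np.ones(2 * surround + 1) / (2 * surround + 1)
    sharpened = spectrum - strength * np.convolve(spectrum, kernel, mode="same")
    return np.maximum(sharpened, 0.0)                 # half-wave rectification

def strong_adaptation(frames, rate=0.7):
    """Adapting units respond mainly to changes between successive frames."""
    state = np.zeros(frames.shape[1])
    responses = []
    for frame in frames:
        responses.append(np.maximum(frame - state, 0.0))
        state = rate * state + (1 - rate) * frame
    return np.array(responses)

# Illustrative input: successive synthetic spectra with a slowly moving formant peak.
channels = np.arange(64)
frames = np.stack([np.exp(-((channels - (20 + 3 * k)) / 4.0) ** 2) for k in range(10)])
processed = strong_adaptation(np.stack([lateral_inhibition(lateral_excitation(f)) for f in frames]))
print(processed.shape)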

19.
Research into speech perception by nonhuman animals can be crucially informative in assessing whether specific perceptual phenomena in humans have evolved to decode speech, or reflect more general traits. Birds share with humans not only the capacity to use complex vocalizations for communication but also many characteristics of its underlying developmental and mechanistic processes; thus, birds are a particularly interesting group for comparative study. This review first discusses commonalities between birds and humans in perception of speech sounds. Several psychoacoustic studies have shown striking parallels in seemingly speech-specific perceptual phenomena, such as categorical perception of voice-onset-time variation, categorization of consonants that lack phonetic invariance, and compensation for coarticulation. Such findings are often regarded as evidence for the idea that the objects of human speech perception are auditory or acoustic events rather than articulations. Next, I highlight recent research on the production side of avian communication that has revealed the existence of vocal tract filtering and articulation in bird species-specific vocalization, which has traditionally been considered a hallmark of human speech production. Together, findings in birds show that many characteristics of human speech perception are not uniquely human, but also that a comparative approach to the question of what the objects of perception are (articulatory or auditory events) requires careful consideration of species-specific vocal production mechanisms.

20.
Speaker verification and speech recognition are closely related technologies. They both operate on spoken language that has been input to a microphone or telephone, and they both employ digital signal processing (DSP) techniques to extract information about acoustic data and patterns from that input. The principal distinction between speech recognition and speaker verification is functional: the two systems differ markedly in what they do with the speech data once it has been processed.
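A small Python sketch of this shared-front-end, divergent-back-end structure (a generic illustration; MFCC features and cosine scoring are assumed choices, not named in the abstract):

import numpy as np
import librosa

sr = 16000
y = 0.1 * np.random.randn(sr).astype(np.float32)      # stand-in for one second of speech

# Shared front end: frame the signal and extract per-frame feature vectors (MFCCs here).
features = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T          # shape (frames, 13)

# A recognizer would pass `features` to an acoustic model and decoder to find words;
# a verifier instead reduces them to a fixed-length voiceprint and scores it against
# an enrolled one (cosine similarity shown as a simple placeholder).
half = features.shape[0] // 2
enrolled = features[:half].mean(axis=0)
test = features[half:].mean(axis=0)
score = float(np.dot(enrolled, test) / (np.linalg.norm(enrolled) * np.linalg.norm(test)))
print(f"frames: {features.shape[0]}, speaker-similarity score: {score:.2f}")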
