Similar Documents
20 similar documents found.
1.
This paper reviews progress in understanding the psychology of lipreading and audio-visual speech perception. It considers four questions. What distinguishes better from poorer lipreaders? What are the effects of introducing a delay between the acoustical and optical speech signals? What have attempts to produce computer animations of talking faces contributed to our understanding of the visual cues that distinguish consonants and vowels? Finally, how should the process of audio-visual integration in speech perception be described; that is, how are the sights and sounds of talking faces represented at their conflux?

2.
Spoken language communication is arguably the most important activity that distinguishes humans from non-human species. This paper provides an overview of the review papers that make up this theme issue on the processes underlying speech communication. The volume includes contributions from researchers who specialize in a wide range of topics within the general area of speech perception and language processing. It also includes contributions from key researchers in neuroanatomy and functional neuro-imaging, in an effort to cut across traditional disciplinary boundaries and foster cross-disciplinary interactions in this important and rapidly developing area of the biological and cognitive sciences.

3.
We are all voice experts. First and foremost, we can produce and understand speech, and this makes us a unique species. But in addition to speech perception, we routinely extract from voices a wealth of socially-relevant information in what constitutes a more primitive, and probably more universal, non-linguistic mode of communication. Consider the following example: you are sitting in a plane, and you can hear a conversation in a foreign language in the row behind you. You do not see the speakers' faces, and you cannot understand the speech content because you do not know the language. Yet, an amazing amount of information is available to you. You can evaluate the physical characteristics of the different protagonists, including their gender, approximate age and size, and associate an identity with the different voices. You can form a good idea of each speaker's mood and affective state, and pick up more subtle cues such as the perceived attractiveness or dominance of the protagonists. In brief, you can form a fairly detailed picture of the type of social interaction unfolding, which a brief glance backwards can on occasion help refine, sometimes surprisingly so. What are the acoustical cues that carry these different types of vocal information? How does our brain process and analyse this information? Here we briefly review an emerging field and the main tools used in voice perception research.

4.
Much of our daily communication occurs in the presence of background noise, compromising our ability to hear. While understanding speech in noise is a challenge for everyone, it becomes increasingly difficult as we age. Although aging is generally accompanied by hearing loss, this perceptual decline cannot fully account for the difficulties experienced by older adults for hearing in noise. Decreased cognitive skills concurrent with reduced perceptual acuity are thought to contribute to the difficulty older adults experience understanding speech in noise. Given that musical experience positively impacts speech perception in noise in young adults (ages 18-30), we asked whether musical experience benefits an older cohort of musicians (ages 45-65), potentially offsetting the age-related decline in speech-in-noise perceptual abilities and associated cognitive function (i.e., working memory). Consistent with performance in young adults, older musicians demonstrated enhanced speech-in-noise perception relative to nonmusicians along with greater auditory, but not visual, working memory capacity. By demonstrating that speech-in-noise perception and related cognitive function are enhanced in older musicians, our results imply that musical training may reduce the impact of age-related auditory decline.

5.
Zatorre RJ. Neuron, 2001, 31(1): 13-14
Primary sensory cortices are generally thought to be devoted to a single sensory modality, such as vision, hearing, or touch. Surprising interactions between these sensory modes have recently been reported. One example demonstrates that people with cochlear implants show increased activity in visual cortex when listening to speech; this may be related to enhanced lipreading ability.

6.
Little is known about the brain mechanisms involved in word learning during infancy and in second language acquisition, or about the way these new words become stable representations that sustain language processing. In several studies we have adopted the human simulation perspective, studying the effects of brain lesions and combining different neuroimaging techniques such as event-related potentials and functional magnetic resonance imaging in order to examine the language learning (LL) process. In the present article, we review this evidence, focusing on how different brain signatures relate to (i) the extraction of words from speech, (ii) the discovery of their embedded grammatical structure, and (iii) how meaning derived from verbal contexts can inform us about the cognitive mechanisms underlying the learning process. We compile these findings and frame them within an integrative neurophysiological model that tries to delineate the major neural networks that might be involved in the initial stages of LL. Finally, we propose that LL simulations can help us to understand natural language processing and how recovery from language disorders in infants and adults can be accomplished.

7.
This special issue presents research concerning multistable perception in different sensory modalities. Multistability occurs when a single physical stimulus produces alternations between different subjective percepts. Multistability was first described for vision, where it occurs, for example, when different stimuli are presented to the two eyes or for certain ambiguous figures. It has since been described for other sensory modalities, including audition, touch and olfaction. The key features of multistability are: (i) stimuli have more than one plausible perceptual organization; (ii) these organizations are not compatible with each other. We argue here that most if not all cases of multistability are based on competition in selecting and binding stimulus information. Binding refers to the process whereby the different attributes of objects in the environment, as represented in the sensory array, are bound together within our perceptual systems, to provide a coherent interpretation of the world around us. We argue that multistability can be used as a method for studying binding processes within and across sensory modalities. We emphasize this theme while presenting an outline of the papers in this issue. We end with some thoughts about open directions and avenues for further research.

8.
This paper proposes a novel and effective pitch-extraction method based on an auditory model. The method simulates the pitch-perception function of the human auditory system: building on cross-channel summed autocorrelation, it adds a model of the temporal continuity of neural perception. Because information distributed over both channels and time is accumulated, the proposed method can not only extract the pitch of speech signals buried in various kinds of noise, but can also determine whether the input consists of overlapping speech signals and, if so, further extract the separate pitch tracks of the overlapped voices. Preliminary experimental results confirm the effectiveness of the proposed model.
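The core of such a method, cross-channel summed autocorrelation, can be sketched in a few lines of Python. This is a minimal illustration under assumed parameters (a crude three-band filterbank, half-wave rectification as the hair-cell stage, a 60-400 Hz pitch search range), not the authors' implementation, and it omits the paper's accumulation of temporal continuity across frames:

```python
import numpy as np
from scipy.signal import butter, lfilter

def summary_autocorrelation_pitch(x, fs, bands=((80, 400), (400, 1200), (1200, 3000)),
                                  fmin=60.0, fmax=400.0):
    """Estimate pitch by summing per-channel autocorrelations across bands."""
    sac = np.zeros(len(x))
    for lo, hi in bands:
        b, a = butter(2, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        chan = lfilter(b, a, x)                      # auditory-like channel
        chan = np.maximum(chan, 0.0)                 # half-wave rectification
        ac = np.correlate(chan, chan, mode="full")[len(chan) - 1:]
        sac += ac / (ac[0] + 1e-12)                  # normalize, then accumulate
    lag_lo, lag_hi = int(fs / fmax), int(fs / fmin)
    best_lag = lag_lo + np.argmax(sac[lag_lo:lag_hi])
    return fs / best_lag                             # pitch estimate in Hz
```

A peak in the summed correlogram survives noise that masks any single channel, which is why the cross-channel accumulation buys robustness; detecting two overlapped voices amounts to finding two stable peaks rather than one.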

9.
The speech code is a vehicle of language: it defines a set of forms used by a community to carry information. Such a code is necessary to support the linguistic interactions that allow humans to communicate. How then may a speech code be formed prior to the existence of linguistic interactions? Moreover, the human speech code is discrete and compositional, shared by all the individuals of a community but different across communities, and phoneme inventories are characterized by statistical regularities. How can a speech code with these properties form? We try to approach these questions in the paper, using the "methodology of the artificial". We build a society of artificial agents, and detail a mechanism that shows the formation of a discrete speech code without pre-supposing the existence of linguistic capacities or of coordinated interactions. The mechanism is based on a low-level model of sensory-motor interactions. We show that the integration of certain very simple and non-language-specific neural devices leads to the formation of a speech code that has properties similar to the human speech code. This result relies on the self-organizing properties of a generic coupling between perception and production within agents, and on the interactions between agents. The artificial system helps us to develop better intuitions on how speech might have appeared, by showing how self-organization might have helped natural selection to find speech.
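The self-organizing flavor of such a mechanism can be conveyed by a deliberately abstracted imitation-game sketch. Everything below (the 2-D space, the prototype count, the update rule) is an illustrative assumption, not the paper's sensory-motor neural model:

```python
import numpy as np

rng = np.random.default_rng(1)
N_AGENTS, N_PROTO, STEPS, STEP, NOISE = 10, 8, 20000, 0.1, 0.02

# Each agent holds prototype vocalizations in an abstract 2-D articulatory space.
agents = rng.uniform(0.0, 1.0, size=(N_AGENTS, N_PROTO, 2))

for _ in range(STEPS):
    spk, lst = rng.choice(N_AGENTS, size=2, replace=False)  # speaker, listener
    heard = agents[spk, rng.integers(N_PROTO)] + rng.normal(0.0, NOISE, 2)
    k = np.argmin(np.linalg.norm(agents[lst] - heard, axis=1))
    agents[lst, k] += STEP * (heard - agents[lst, k])  # pull nearest prototype toward input

# Over many interactions the prototypes of all agents collapse onto a small
# shared set of clusters: a discrete, community-wide code emerges without
# any pre-coordinated protocol.
```

The point of the sketch is only that a generic perception-production coupling plus pairwise interaction suffices for a shared discrete inventory to emerge.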

10.
The present article outlines the contribution of the mismatch negativity (MMN), and its magnetic equivalent MMNm, to our understanding of the perception of speech sounds in the human brain. MMN data indicate that each sound, both speech and non-speech, develops its neural representation corresponding to the percept of this sound in the neurophysiological substrate of auditory sensory memory. The accuracy of this representation, determining the accuracy of the discrimination between different sounds, can be probed with MMN separately for any auditory feature or stimulus type such as phonemes. Furthermore, MMN data show that the perception of phonemes, and probably also of larger linguistic units (syllables and words), is based on language-specific phonetic traces developed in the posterior part of the left-hemisphere auditory cortex. These traces serve as recognition models for the corresponding speech sounds in listening to speech.

11.
Pitch changes that occur in speech and melodies can be described in terms of contour patterns of rises and falls in pitch and the actual pitches at each point in time. This study investigates whether training can improve the perception of these different features. One group of ten adults trained on a pitch-contour discrimination task, a second group trained on an actual-pitch discrimination task, and a third group trained on a contour comparison task between pitch sequences and their visual analogs. A fourth group did not undergo training. It was found that training on pitch sequence comparison tasks gave rise to improvements in pitch-contour perception. This occurred irrespective of whether the training task required the discrimination of contour patterns or the actual pitch details. In contrast, none of the training tasks were found to improve the perception of the actual pitches in a sequence. The results support psychological models of pitch processing where contour processing is an initial step before actual pitch details are analyzed. Further studies are required to determine whether pitch-contour training is effective in improving speech and melody perception.
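The contrast between contour and actual-pitch information is easy to make concrete: a contour is just the sign pattern of successive pitch changes, so sequences built from entirely different pitches can share one contour. A minimal sketch (the example pitches are arbitrary):

```python
import numpy as np

def contour(pitches_hz):
    """Return the rise/fall/repeat pattern of a pitch sequence."""
    return np.sign(np.diff(pitches_hz)).astype(int)  # +1 rise, -1 fall, 0 repeat

a = [220.0, 262.0, 247.0, 294.0]  # A3, C4, B3, D4
b = [330.0, 392.0, 370.0, 440.0]  # E4, G4, F#4, A4: no pitch in common with a
print(contour(a))  # [ 1 -1  1]
print(contour(b))  # [ 1 -1  1], same contour despite different actual pitches
```

A contour-discrimination task only requires recovering these signs, whereas an actual-pitch task requires the values themselves, which is the distinction the training groups were built around.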

12.
Spectrotemporal modulation (STM) detection performance was examined for cochlear implant (CI) users. The test involved discriminating between an unmodulated steady noise and a modulated stimulus. The modulated stimulus presents spectral modulation patterns that drift in frequency over time. In order to examine STM detection performance under different modulation conditions, two temporal modulation rates (5 and 10 Hz) and three spectral modulation densities (0.5, 1.0, and 2.0 cycles/octave) were employed, producing a total of six STM stimulus conditions. In order to explore how electric hearing constrains STM sensitivity for CI users differently from acoustic hearing, normal-hearing (NH) and hearing-impaired (HI) listeners were also tested on the same tasks. STM detection performance was best in NH subjects, followed by HI subjects. On average, CI subjects showed the poorest performance, but some CI subjects showed levels of STM detection performance comparable to acoustic hearing. Significant correlations were found between STM detection performance and speech identification performance in quiet and in noise. In order to understand the relative contributions of spectral and temporal modulation cues to the speech perception abilities of CI users, spectral and temporal modulation detection were measured separately and related to STM detection and speech perception performance. The results suggest that slow spectral modulation, rather than slow temporal modulation, may be important for determining the speech perception capabilities of CI users. Lastly, test–retest reliability for STM detection was good, with no learning effect. The present study demonstrates that STM detection may be a useful tool to evaluate the ability of CI sound processing strategies to deliver clinically pertinent acoustic modulation information.
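A moving spectrotemporal ripple of the kind used in such tests can be synthesized as a dense bank of random-phase tones whose amplitudes follow a sinusoid drifting in log-frequency. The rate and density arguments below mirror the conditions described; the carrier details (tone count, frequency span, modulation depth) are assumptions:

```python
import numpy as np

def moving_ripple(rate_hz=5.0, density_cpo=1.0, dur=1.0, fs=16000,
                  f_lo=250.0, f_hi=8000.0, n_tones=200, depth=0.9):
    """Synthesize a moving ripple (rate in Hz, density in cycles/octave)."""
    t = np.arange(int(dur * fs)) / fs
    freqs = f_lo * (f_hi / f_lo) ** np.linspace(0.0, 1.0, n_tones)  # log-spaced carriers
    rng = np.random.default_rng(0)
    sig = np.zeros_like(t)
    for f in freqs:
        x_oct = np.log2(f / f_lo)  # spectral position in octaves
        env = 1.0 + depth * np.sin(2 * np.pi * (rate_hz * t + density_cpo * x_oct))
        sig += env * np.sin(2 * np.pi * f * t + rng.uniform(0.0, 2 * np.pi))
    return sig / np.max(np.abs(sig))

# The six conditions of the study: 2 rates x 3 densities.
stimuli = {(r, d): moving_ripple(rate_hz=r, density_cpo=d)
           for r in (5.0, 10.0) for d in (0.5, 1.0, 2.0)}
```

Setting depth to zero yields the unmodulated reference noise, so the detection task reduces to telling a flat spectrum from one whose peaks sweep across frequency.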

13.
The purpose of this study was to use mismatch responses (MMRs) to explore the dynamic changes of Mandarin speech perception abilities from early to middle childhood. Twenty preschoolers, 18 school-aged children, and 26 adults participated in this study. Two sets of synthesized speech stimuli varying in Mandarin consonant (alveolo-palatal affricate vs. fricative) and lexical tone features (rising vs. contour tone) were used to examine the developmental course of speech perception abilities. The results indicated that only the adult group demonstrated typical early mismatch negativity (MMN) responses, suggesting that the ability to discriminate specific speech cues in Mandarin consonant and lexical tone is a continuing process in preschool- and school-aged children. Additionally, distinct MMR patterns provided evidence indicating diverse developmental courses to different speech characteristics. By incorporating data from the two speech conditions, we propose using MMR profiles consisting of MMN, positive mismatch response (p-MMR), and late discriminative negativity (LDN) as possible brain indices to investigate speech perception development.

14.
Inner ear prostheses, or cochlear implants, are neuroprostheses providing profoundly deaf people with sound sensations through electrical stimulation of the auditory nerve. Approximately 2,000 of these devices are currently in use worldwide. Five models are each being used by more than 50 patients. Among other features, they differ from one another mainly in the design and placement of the electrodes and in the particular speech coding strategy used, with the periodicity and tonotopy principles exploited to highly varying degrees. Nevertheless, 4 of the 5 devices lead to quite comparable degrees of speech understanding without additional lipreading.

15.
Listening to speech in the presence of other sounds
Although most research on the perception of speech has been conducted with speech presented without any competing sounds, we almost always listen to speech against a background of other sounds which we are adept at ignoring. Nevertheless, such additional irrelevant sounds can cause severe problems for speech recognition algorithms and for the hard of hearing as well as posing a challenge to theories of speech perception. A variety of different problems are created by the presence of additional sound sources: detection of features that are partially masked, allocation of detected features to the appropriate sound sources and recognition of sounds on the basis of partial information. The separation of sounds is arousing substantial attention in psychoacoustics and in computer science. An effective solution to the problem of separating sounds would have important practical applications.

16.
Vowel identification in noise using consonant-vowel-consonant (CVC) logatomes was used to investigate a possible interplay of speech information from different frequency regions. It was hypothesized that the periodicity conveyed by the temporal envelope of a high frequency stimulus can enhance the use of the information carried by auditory channels in the low-frequency region that share the same periodicity. It was further hypothesized that this acts as a strobe-like mechanism and would increase the signal-to-noise ratio for the voiced parts of the CVCs. In a first experiment, different high-frequency cues were provided to test this hypothesis, whereas a second experiment examined more closely the role of amplitude modulations and intact phase information within the high-frequency region (4–8 kHz). CVCs were either natural or vocoded speech (both limited to a low-pass cutoff frequency of 2.5 kHz) and were presented in stationary 3-kHz low-pass filtered masking noise. The experimental results did not support the hypothesized use of periodicity information for aiding low-frequency perception.
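For readers unfamiliar with vocoding, a minimal noise-vocoder sketch follows: the temporal envelope of each analysis band modulates band-limited noise, preserving envelope cues while discarding temporal fine structure. The band edges (topping out at the study's 2.5 kHz limit) and the band count are illustrative assumptions, not the study's processing chain:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, edges=(100, 400, 1000, 2500), order=4):
    """Replace each band's fine structure with envelope-modulated noise."""
    out = np.zeros(len(x))
    rng = np.random.default_rng(0)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(order, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band))                        # temporal envelope
        carrier = sosfiltfilt(sos, rng.standard_normal(len(x)))
        out += env * carrier                               # envelope imposed on noise
    return out / np.max(np.abs(out))
```

Comparing natural against vocoded CVCs in this way isolates how much the listeners relied on envelope periodicity as opposed to intact fine structure.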

17.
On the basis of the general character and operation of the process of perception, a formalism is sought to mathematically describe the subjective or abstract/mental process of perception. It is shown that the formalism of orthodox quantum theory of measurement, where the observer plays a key role, is a broader mathematical foundation which can be adopted to describe the dynamics of the subjective experience. The mathematical formalism describes the psychophysical dynamics of the subjective or cognitive experience as communicated to us by the subject. Subsequently, the formalism is used to describe simple perception processes and, in particular, to describe the probability distribution of dominance duration obtained from the testimony of subjects experiencing binocular rivalry. Using this theory and parameters based on known values of neuronal oscillation frequencies and firing rates, the calculated probability distribution of dominance duration of rival states in binocular rivalry under various conditions is found to be in good agreement with available experimental data. This theory naturally explains an observed marked increase in dominance duration in binocular rivalry upon periodic interruption of the stimulus and yields testable predictions for the distribution of perceptual alternations over time.
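For context, empirical dominance-duration histograms in binocular rivalry are classically well fit by a gamma distribution, the usual benchmark against which a model's predicted distribution is judged. The form below is standard background, not necessarily the expression derived in the paper:

$$p(t) \;=\; \frac{\lambda^{k}\, t^{k-1}\, e^{-\lambda t}}{\Gamma(k)}, \qquad t > 0,$$

with shape k and rate λ fitted to each observer's data; a candidate model should reproduce both this unimodal, right-skewed shape and its changes under stimulus manipulation.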

18.

Objectives

(1) To report the speech perception and intelligibility results of Mandarin-speaking patients with large vestibular aqueduct syndrome (LVAS) after cochlear implantation (CI); (2) to compare their performance with a group of CI users without LVAS; (3) to understand the effects of age at implantation and duration of implant use on the CI outcomes. The obtained data may be used to guide decisions about CI candidacy and surgical timing.

Methods

Forty-two patients with LVAS participating in this study were divided into two groups: the early group received CI before 5 years of age and the late group after 5. Open-set speech perception tests (on Mandarin tones, words and sentences) were administered one year after implantation and at the most recent follow-up visit. Categories of auditory perception (CAP) and Speech Intelligibility Rating (SIR) scale scores were also obtained.

Results

The patients with LVAS with more than 5 years of implant use (18 cases) achieved a mean score higher than 80% on the most recent speech perception tests and reached the highest level on the CAP/SIR scales. The early group developed speech perception and intelligibility steadily over time, while the late group had a rapid improvement during the first year after implantation. The two groups, regardless of their age at implantation, reached a similar performance level at the most recent follow-up visit.

Conclusion

High levels of speech performance are reached after 5 years of implant use in patients with LVAS. These patients do not necessarily need to wait until their hearing thresholds exceed 90 dB HL or their PB word scores fall below 40% to receive CI; they can receive it earlier, when their speech perception and/or speech intelligibility fall short of the performance levels suggested in this study.

19.
It is well known that simultaneous presentation of incongruent audio and visual stimuli can lead to illusory percepts. Recent data suggest that distinct processes underlie non-specific intersensory speech as opposed to non-speech perception. However, the development of both speech and non-speech intersensory perception across childhood and adolescence remains poorly defined. Thirty-eight observers aged 5 to 19 were tested on the McGurk effect (an audio-visual illusion involving speech), the Illusory Flash effect and the Fusion effect (two audio-visual illusions not involving speech) to investigate the development of audio-visual interactions and contrast speech vs. non-speech developmental patterns. Whereas the strength of audio-visual speech illusions varied as a direct function of maturational level, performance on non-speech illusory tasks appeared to be homogeneous across all ages. These data support the existence of independent maturational processes underlying speech and non-speech audio-visual illusory effects.

20.
Recent evidence from experiments on immediate memory indicates unambiguously that silent speech perception can produce typically 'auditory' effects while there is either active or passive mouthing of the relevant articulatory gestures. This result falsifies previous theories of auditory sensory memory (pre-categorical acoustic store) that insisted on external auditory stimulation as indispensable for access to the system. A resolution is proposed that leaves the properties of pre-categorical acoustic store much as they were assumed to be before but adds the possibility that visual information can affect the selection of auditory features in a pre-categorical stage of speech perception. In common terms, a speaker's facial gestures (or one's own) can influence auditory experience independently of determining what it was that was said. Some results in word perception that encourage this view are discussed.
