Similar Articles
20 similar articles found (search time: 31 ms)
1.
Nonnative speech poses a challenge to speech perception, especially in challenging listening environments. Audiovisual (AV) cues are known to improve native speech perception in noise. The extent to which AV cues benefit nonnative speech perception in noise, however, is much less well understood. Here, we examined native American English-speaking and native Korean-speaking listeners' perception of English sentences produced by a native American English speaker and a native Korean speaker across a range of signal-to-noise ratios (SNRs; −4 to −20 dB) in audio-only and audiovisual conditions. We employed psychometric function analyses to characterize the pattern of AV benefit across SNRs. For native English speech, the largest AV benefit occurred at an intermediate SNR (i.e., −12 dB); but for nonnative English speech, the largest AV benefit occurred at a higher SNR (−4 dB). The psychometric function analyses demonstrated that the AV benefit patterns were different between native and nonnative English speech. The nativeness of the listener exerted negligible effects on the AV benefit across SNRs. However, the nonnative listeners' ability to gain AV benefit in native English speech was related to their proficiency in English. These findings suggest that the native language background of both the speaker and listener clearly modulates the optimal use of AV cues in speech recognition.
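For readers unfamiliar with psychometric function analyses of this kind, the sketch below illustrates the general approach: fit a logistic function of proportion correct against SNR separately for the audio-only and audiovisual conditions, then take the difference of the fitted curves as the AV benefit. The two-parameter logistic form, the SciPy fitting routine, and all numeric values are illustrative assumptions, not the study's actual model or data.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(snr, midpoint, slope):
    """Two-parameter logistic psychometric function: proportion correct vs. SNR (dB)."""
    return 1.0 / (1.0 + np.exp(-slope * (snr - midpoint)))

# Illustrative group-mean proportions correct at SNRs spanning the -20 to -4 dB range.
snrs = np.array([-20.0, -16.0, -12.0, -8.0, -4.0])
audio_only = np.array([0.05, 0.20, 0.45, 0.70, 0.85])   # made-up values
audiovisual = np.array([0.15, 0.50, 0.80, 0.90, 0.95])  # made-up values

params_a, _ = curve_fit(logistic, snrs, audio_only, p0=[-12.0, 0.5])
params_av, _ = curve_fit(logistic, snrs, audiovisual, p0=[-12.0, 0.5])

# AV benefit as the difference between the two fitted curves on a fine SNR grid.
grid = np.linspace(-20.0, -4.0, 161)
benefit = logistic(grid, *params_av) - logistic(grid, *params_a)
print(f"Largest AV benefit of {benefit.max():.2f} at {grid[benefit.argmax()]:.1f} dB SNR")
```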

2.
This paper reviews progress in understanding the psychology of lipreading and audio-visual speech perception. It considers four questions. What distinguishes better from poorer lipreaders? What are the effects of introducing a delay between the acoustical and optical speech signals? What have attempts to produce computer animations of talking faces contributed to our understanding of the visual cues that distinguish consonants and vowels? Finally, how should the process of audio-visual integration in speech perception be described; that is, how are the sights and sounds of talking faces represented at their conflux?

3.
Much of our daily communication occurs in the presence of background noise, compromising our ability to hear. While understanding speech in noise is a challenge for everyone, it becomes increasingly difficult as we age. Although aging is generally accompanied by hearing loss, this perceptual decline cannot fully account for the difficulties experienced by older adults for hearing in noise. Decreased cognitive skills concurrent with reduced perceptual acuity are thought to contribute to the difficulty older adults experience understanding speech in noise. Given that musical experience positively impacts speech perception in noise in young adults (ages 18-30), we asked whether musical experience benefits an older cohort of musicians (ages 45-65), potentially offsetting the age-related decline in speech-in-noise perceptual abilities and associated cognitive function (i.e., working memory). Consistent with performance in young adults, older musicians demonstrated enhanced speech-in-noise perception relative to nonmusicians along with greater auditory, but not visual, working memory capacity. By demonstrating that speech-in-noise perception and related cognitive function are enhanced in older musicians, our results imply that musical training may reduce the impact of age-related auditory decline.

4.
Effects of background speech on reading were examined by playing aloud different types of background speech while participants read long, syntactically complex and less complex sentences embedded in text. Readers’ eye movement patterns were used to study online sentence comprehension. Effects of background speech were primarily seen in rereading time. In Experiment 1, foreign-language background speech did not disrupt sentence processing. Experiment 2 demonstrated robust disruption in reading as a result of semantically and syntactically anomalous scrambled background speech preserving normal sentence-like intonation. Scrambled speech that was constructed from the to-be-read text did not disrupt reading more than scrambled speech constructed from a different, semantically unrelated text. Experiment 3 showed that scrambled speech exacerbated the syntactic complexity effect more than coherent background speech, which also interfered with reading. Experiment 4 demonstrated that both semantically and syntactically anomalous speech produced no more disruption in reading than semantically anomalous but syntactically correct background speech. The pattern of results is best explained by a semantic account that stresses the importance of similarity in semantic processing, but not similarity in semantic content, between the reading task and background speech.

5.
Speech perception often benefits from vision of the speaker's lip movements when they are available. One potential mechanism underlying this reported gain in perception arising from audio-visual integration is on-line prediction. In this study we address whether the preceding speech context in a single modality can improve audiovisual processing and whether this improvement is based on on-line information transfer across sensory modalities. In the experiments presented here, during each trial, a speech fragment (context) presented in a single sensory modality (voice or lips) was immediately continued by an audiovisual target fragment. Participants made speeded judgments about whether voice and lips were in agreement in the target fragment. The leading single-sensory context and the subsequent audiovisual target fragment could be continuous in one modality only, in both modalities (context in one modality continues into both modalities in the target fragment), or in neither (i.e., discontinuous). The results showed quicker audiovisual matching responses when context was continuous with the target within either the visual or auditory channel (Experiment 1). Critically, prior visual context also provided an advantage when it was cross-modally continuous (with the auditory channel in the target), but auditory-to-visual cross-modal continuity resulted in no advantage (Experiment 2). This suggests that visual speech information can provide an on-line benefit for processing the upcoming auditory input through the use of predictive mechanisms. We hypothesize that this benefit is expressed at an early level of speech analysis.

6.
BACKGROUND: Integrating information from the different senses markedly enhances the detection and identification of external stimuli. Compared with unimodal inputs, semantically and/or spatially congruent multisensory cues speed discrimination and improve reaction times. Discordant inputs have the opposite effect, reducing performance and slowing responses. These behavioural features of crossmodal processing appear to have parallels in the response properties of multisensory cells in the superior colliculi and cerebral cortex of non-human mammals. Although spatially concordant multisensory inputs can produce a dramatic, often multiplicative, increase in cellular activity, spatially disparate cues tend to induce a profound response depression. RESULTS: Using functional magnetic resonance imaging (fMRI), we investigated whether similar indices of crossmodal integration are detectable in human cerebral cortex, and for the synthesis of complex inputs relating to stimulus identity. Ten human subjects were exposed to varying epochs of semantically congruent and incongruent audio-visual speech and to each modality in isolation. Brain activations to matched and mismatched audio-visual inputs were contrasted with the combined response to both unimodal conditions. This strategy identified an area of heteromodal cortex in the left superior temporal sulcus that exhibited significant supra-additive response enhancement to matched audio-visual inputs and a corresponding sub-additive response to mismatched inputs. CONCLUSIONS: The data provide fMRI evidence of crossmodal binding by convergence in the human heteromodal cortex. They further suggest that response enhancement and depression may be a general property of multisensory integration operating at different levels of the neuroaxis and irrespective of the purpose for which sensory inputs are combined.
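As a rough illustration of the supra-/sub-additivity criterion behind the contrast described above (the audiovisual response compared against the sum of the two unimodal responses), consider the following sketch. The beta values, region, and statistical test are hypothetical and only show how the comparison is computed; they are not the study's actual analysis pipeline.

```python
import numpy as np
from scipy import stats

# Hypothetical per-subject BOLD parameter estimates (betas) for one candidate
# superior temporal sulcus region. Columns: AV congruent, AV incongruent,
# audio-only, visual-only. Values are made up for illustration.
betas = np.array([
    [1.9, 0.3, 0.8, 0.6],
    [2.2, 0.4, 0.9, 0.7],
    [1.7, 0.2, 0.7, 0.8],
    [2.0, 0.5, 0.8, 0.6],
])

av_congruent, av_incongruent, audio_only, visual_only = betas.T
unimodal_sum = audio_only + visual_only

# Supra-additive enhancement: congruent AV exceeds the summed unimodal responses.
# Sub-additive depression: incongruent AV falls below the summed unimodal responses.
supra = av_congruent - unimodal_sum
sub = av_incongruent - unimodal_sum

print("supra-additive effect:", supra.mean(), stats.ttest_1samp(supra, 0.0).pvalue)
print("sub-additive effect:  ", sub.mean(), stats.ttest_1samp(sub, 0.0).pvalue)
```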

7.
It is well known that simultaneous presentation of incongruent audio and visual stimuli can lead to illusory percepts. Recent data suggest that distinct processes underlie non-specific intersensory speech as opposed to non-speech perception. However, the development of both speech and non-speech intersensory perception across childhood and adolescence remains poorly defined. Thirty-eight observers aged 5 to 19 were tested on the McGurk effect (an audio-visual illusion involving speech), the Illusory Flash effect and the Fusion effect (two audio-visual illusions not involving speech) to investigate the development of audio-visual interactions and contrast speech vs. non-speech developmental patterns. Whereas the strength of audio-visual speech illusions varied as a direct function of maturational level, performance on non-speech illusory tasks appeared to be homogeneous across all ages. These data support the existence of independent maturational processes underlying speech and non-speech audio-visual illusory effects.

8.
Previous cue integration studies have examined continuous perceptual dimensions (e.g., size) and have shown that human cue integration is well described by a normative model in which cues are weighted in proportion to their sensory reliability, as estimated from single-cue performance. However, this normative model may not be applicable to categorical perceptual dimensions (e.g., phonemes). In tasks defined over categorical perceptual dimensions, optimal cue weights should depend not only on the sensory variance affecting the perception of each cue but also on the environmental variance inherent in each task-relevant category. Here, we present a computational and experimental investigation of cue integration in a categorical audio-visual (articulatory) speech perception task. Our results show that human performance during audio-visual phonemic labeling is qualitatively consistent with the behavior of a Bayes-optimal observer. Specifically, we show that the participants in our task are sensitive, on a trial-by-trial basis, to the sensory uncertainty associated with the auditory and visual cues during phonemic categorization. In addition, we show that while sensory uncertainty is a significant factor in determining cue weights, it is not the only one, and participants' performance is consistent with an optimal model in which environmental, within-category variability also plays a role in determining cue weights. Furthermore, we show that in our task, the sensory variability affecting the visual modality during cue-combination is not well estimated from single-cue performance, but can be estimated from multi-cue performance. The findings and computational principles described here represent a principled first step towards characterizing the mechanisms underlying human cue integration in categorical tasks.
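A compact way to see the argument about categorical tasks is a linear-Gaussian toy model in which each cue's effective reliability combines its sensory noise with the within-category (environmental) variability. The function below is a simplified sketch under that assumption; it is not the Bayes-optimal observer actually fitted in the study, and every parameter value is invented.

```python
import numpy as np

def cue_weights(sigma_a, sigma_v, sigma_cat_a=0.0, sigma_cat_v=0.0):
    """
    Normalised audio/visual cue weights in a linear-Gaussian cue-combination model.
    With sigma_cat_* = 0 this reduces to the classic reliability-weighted rule
    (weights proportional to 1 / sensory variance). For a categorical task, the
    within-category (environmental) variance is added to each cue's sensory
    variance before computing reliability, as argued in the abstract above.
    """
    rel_a = 1.0 / (sigma_a**2 + sigma_cat_a**2)
    rel_v = 1.0 / (sigma_v**2 + sigma_cat_v**2)
    w_a = rel_a / (rel_a + rel_v)
    return w_a, 1.0 - w_a

# Continuous-dimension case: weights track sensory reliability only
# (audio gets weight 0.8 vs. 0.2 here, i.e. four times more).
print(cue_weights(sigma_a=1.0, sigma_v=2.0))

# Categorical case: large within-category variability on the auditory dimension
# pulls weight toward vision even though auditory sensory noise is lower.
print(cue_weights(sigma_a=1.0, sigma_v=2.0, sigma_cat_a=3.0, sigma_cat_v=0.5))
```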

9.

Background

Visual cross-modal re-organization is a neurophysiological process that occurs in deafness. The intact sensory modality of vision recruits cortical areas from the deprived sensory modality of audition. Such compensatory plasticity is documented in deaf adults and animals, and is related to deficits in speech perception performance in cochlear-implanted adults. However, it is unclear whether visual cross-modal re-organization takes place in cochlear-implanted children and whether it may be a source of variability contributing to speech and language outcomes. Thus, the aim of this study was to determine if visual cross-modal re-organization occurs in cochlear-implanted children, and whether it is related to deficits in speech perception performance.

Methods

Visual evoked potentials (VEPs) were recorded via high-density EEG in 41 normal-hearing children and 14 cochlear-implanted children, aged 5–15 years, in response to apparent motion and form change. Comparisons of VEP amplitude and latency, as well as source localization results, were conducted between the groups to look for evidence of visual cross-modal re-organization. Finally, performance on speech perception in background noise was correlated with the visual response in the implanted children.

Results

Distinct VEP morphological patterns were observed in both the normal hearing and cochlear-implanted children. However, the cochlear-implanted children demonstrated larger VEP amplitudes and earlier latency, concurrent with activation of right temporal cortex including auditory regions, suggestive of visual cross-modal re-organization. The VEP N1 latency was negatively related to speech perception in background noise for children with cochlear implants.

Conclusion

Our results are among the first to describe cross-modal re-organization of auditory cortex by the visual modality in deaf children fitted with cochlear implants. Our findings suggest that, as a group, children with cochlear implants show evidence of visual cross-modal recruitment, which may be a contributing source of variability in speech perception outcomes with their implant.

10.
In a recent study in younger adults (19-29 year olds) we showed evidence that distributed audiovisual attention resulted in improved discrimination performance for audiovisual stimuli compared to focused visual attention. Here, we extend our findings to healthy older adults (60-90 year olds), showing that performance benefits of distributed audiovisual attention in this population match those of younger adults. Specifically, improved performance was revealed in faster response times for semantically congruent audiovisual stimuli during distributed relative to focused visual attention, without any differences in accuracy. For semantically incongruent stimuli, discrimination accuracy was significantly improved during distributed relative to focused attention. Furthermore, event-related neural processing showed intact crossmodal integration in higher performing older adults similar to younger adults. Thus, there was insufficient evidence to support an age-related deficit in crossmodal attention.

11.
In a natural setting, speech is often accompanied by gestures. Like language, speech-accompanying iconic gestures convey semantic information to some extent. However, whether comprehension of the information carried in the auditory and visual modalities depends on the same or different brain networks remains largely unknown. In this fMRI study, we aimed to identify the cortical areas engaged in supramodal processing of semantic information. BOLD changes were recorded in 18 healthy right-handed male subjects watching video clips showing an actor who either performed speech (S, acoustic) or gestures (G, visual) in more (+) or less (−) meaningful varieties. In the experimental conditions, familiar speech or isolated iconic gestures were presented; during the visual control condition the volunteers watched meaningless gestures (G−), while during the acoustic control condition a foreign language was presented (S−). The conjunction of visual and acoustic semantic processing revealed activations extending from the left inferior frontal gyrus to the precentral gyrus, and included bilateral posterior temporal regions. We conclude that proclaiming this frontotemporal network the brain's core language system would be to take too narrow a view. Our results instead indicate that these regions constitute a supramodal semantic processing network.

12.

Objective

To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users.

Methods

Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280×720, 640×480, 320×240, 160×120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcameras (Logitech Pro9000, C600 and C500) and image/sound delays (0–500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for live Skype™ video connection and live face-to-face communication were assessed.

Results

Higher frame rate (>7 fps), higher camera resolution (>640×480 px) and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full-screen mode. There was a significant median gain of +8.5%pts (p = 0.009) in speech perception for all 21 CI users when visual cues were additionally shown. CI users with poor open-set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8%pts, p = 0.032).

Conclusion

Webcameras have the potential to improve telecommunication for hearing-impaired individuals.

13.
Numerous studies have reported subliminal repetition and semantic priming in the visual modality. We transferred this paradigm to the auditory modality. Prime awareness was manipulated by a reduction of sound intensity level. Uncategorized prime words (according to a post-test) were followed by semantically related, unrelated, or repeated target words (presented without intensity reduction) and participants performed a lexical decision task (LDT). Participants with slower reaction times in the LDT showed semantic priming (faster reaction times for semantically related compared to unrelated targets) and negative repetition priming (slower reaction times for repeated compared to semantically related targets). This is the first report of semantic priming in the auditory modality without conscious categorization of the prime.

14.
Gottfried JA, Dolan RJ. Neuron. 2003;39(2):375–386
Human olfactory perception is notoriously unreliable, but shows substantial benefits from visual cues, suggesting important crossmodal integration between these primary sensory modalities. We used event-related fMRI to determine the underlying neural mechanisms of olfactory-visual integration in the human brain. Subjects participated in an olfactory detection task, whereby odors and pictures were delivered separately or together. By manipulating the degree of semantic correspondence between odor-picture pairs, we show a perceptual olfactory facilitation for semantically congruent (versus incongruent) trials. This behavioral advantage was associated with enhanced neural activity in anterior hippocampus and rostromedial orbitofrontal cortex. We suggest these findings can be interpreted as indicating that human hippocampus mediates reactivation of crossmodal semantic associations, even in the absence of explicit memory processing.

15.
We present a sceptical view of multimodal multistability, drawing most of our examples from the relation between audition and vision. We begin by summarizing some of the principal ways in which audio-visual binding takes place. We review the evidence that unambiguous stimulation in one modality may affect the perception of a multistable stimulus in another modality. Cross-modal influences of one multistable stimulus on the multistability of another are different: they have occurred only in speech perception. We then argue that the strongest relation between perceptual organization in vision and perceptual organization in audition is likely to be by way of analogous Gestalt laws. We conclude with some general observations about multimodality.

16.
Listening to speech amidst noise is facilitated by a variety of cues, including the predictable use of certain words in certain contexts. A recent fMRI study of the interaction between noise and semantic predictability has identified a cortical network involved in speech comprehension.

17.
Although several cognitive processes, including speech processing, have been studied during sleep, working memory (WM) has not been explored until now. Our study assessed the capacity of WM by testing speech perception when the level of background noise and the sentential semantic length (SSL; the amount of semantic information required to perceive the incongruence of a sentence) were modulated. Speech perception was explored with the N400 component of the event-related potentials recorded to sentence-final words (50% semantically congruent with the sentence, 50% semantically incongruent). During sleep stage 2 and paradoxical sleep: (1) without noise, a larger N400 was observed for (short and long SSL) sentences ending with a semantically incongruent word compared to a congruent word (i.e., an N400 effect); (2) with moderate noise, the N400 effect (observed at wake with short and long SSL sentences) was attenuated for long SSL sentences. Our results suggest that WM for linguistic information is partially preserved during sleep, with a smaller capacity than during wakefulness.

18.
This experiment investigated the effect of signal modality on time perception in 5- and 8-year-old children as well as young adults using a duration bisection task in which auditory and visual signals were presented in the same test session and shared common anchor durations. Durations were judged shorter for visual than for auditory signals by all age groups. However, the magnitude of this modality difference was larger in the children than in the adults. Sensitivity to time was also observed to increase with age for both modalities. Taken together, these two observations suggest that the greater modality effect on duration judgments for the children, for whom attentional abilities are considered limited, is the result of visual signals requiring more attentional resources than are needed for the processing of auditory signals. Within the framework of the information-processing model of Scalar Timing Theory, these effects are consistent with a developmental difference in the operation of the "attentional switch" used to transfer pulses from the pacemaker into the accumulator. Specifically, although timing is more automatic for auditory than visual signals in both children and young adults, children have greater difficulty in keeping the switch in the closed state during the timing of visual signals.
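The pacemaker-accumulator account with an attentional switch can be made concrete with a toy simulation: pulses are emitted at a fixed rate and only counted while the switch is closed, so a switch that stays closed less reliably (as assumed here for children timing visual signals) yields shorter duration estimates. The parameter values below are purely illustrative assumptions, not estimates from the experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

def perceived_duration(true_duration_s, pacemaker_hz=10.0, p_switch_closed=1.0):
    """
    Toy pacemaker-accumulator from Scalar Timing Theory: pulses are emitted at
    pacemaker_hz and only accumulate while the attentional switch is closed.
    A lower p_switch_closed (flickering attention) means fewer accumulated
    pulses, i.e. a shorter subjective duration for the same physical duration.
    """
    n_pulses = rng.poisson(pacemaker_hz * true_duration_s)
    accumulated = rng.binomial(n_pulses, p_switch_closed)
    return accumulated / pacemaker_hz  # convert pulses back to seconds

# Illustrative parameters: children keep the switch closed less reliably for
# visual than for auditory signals, so visual durations are judged shorter.
for label, p in [("auditory", 0.95), ("visual (adult)", 0.90), ("visual (child)", 0.70)]:
    estimates = [perceived_duration(4.0, p_switch_closed=p) for _ in range(2000)]
    print(f"{label:16s} mean judged duration: {np.mean(estimates):.2f} s")
```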

19.
The semantic content, or the meaning, is the essence of autobiographical memories. In comparison to previous research, which has mainly focused on the phenomenological experience and the age distribution of retrieved events, the present study provides a novel view on the retrieval of event information by quantifying the information as semantic representations. We investigated the semantic representation of sensory cued autobiographical events and studied the modality hierarchy within the multimodal retrieval cues. The experiment comprised a cued recall task, where the participants were presented with visual, auditory, olfactory or multimodal retrieval cues and asked to recall autobiographical events. The results indicated that the three different unimodal retrieval cues generate significantly different semantic representations. Further, the auditory and the visual modalities contributed the most to the semantic representation of the multimodally retrieved events. Finally, the semantic representation of the multimodal condition could be described as a combination of the three unimodal conditions. In conclusion, these results suggest that the meaning of the retrieved event information depends on the modality of the retrieval cues.

20.
To obtain a coherent perception of the world, our senses need to be in alignment. When we encounter misaligned cues from two sensory modalities, the brain must infer which cue is faulty and recalibrate the corresponding sense. We examined whether and how the brain uses cue reliability to identify the miscalibrated sense by measuring the audiovisual ventriloquism aftereffect for stimuli of varying visual reliability. To adjust for modality-specific biases, visual stimulus locations were chosen based on perceived alignment with auditory stimulus locations for each participant. During an audiovisual recalibration phase, participants were presented with bimodal stimuli with a fixed perceptual spatial discrepancy; they localized one modality, cued after stimulus presentation. Unimodal auditory and visual localization was measured before and after the audiovisual recalibration phase. We compared participants’ behavior to the predictions of three models of recalibration: (a) Reliability-based: each modality is recalibrated based on its relative reliability—less reliable cues are recalibrated more; (b) Fixed-ratio: the degree of recalibration for each modality is fixed; (c) Causal-inference: recalibration is directly determined by the discrepancy between a cue and its estimate, which in turn depends on the reliability of both cues, and inference about how likely the two cues derive from a common source. Vision was hardly recalibrated by audition. Auditory recalibration by vision changed idiosyncratically as visual reliability decreased: the extent of auditory recalibration either decreased monotonically, peaked at medium visual reliability, or increased monotonically. The latter two patterns cannot be explained by either the reliability-based or fixed-ratio models. Only the causal-inference model of recalibration captures the idiosyncratic influences of cue reliability on recalibration. We conclude that cue reliability, causal inference, and modality-specific biases guide cross-modal recalibration indirectly by determining the perception of audiovisual stimuli.
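To make the three candidate models concrete, the sketch below computes the auditory shift each one predicts for a fixed audiovisual discrepancy as visual reliability varies. The exact functional forms and every parameter value are simplified assumptions for illustration; they are not the models as implemented in the study.

```python
import numpy as np

def recalibration_shifts(discrepancy, sigma_a, sigma_v,
                         p_prior_common=0.5, sigma_prior=15.0, ratio_a=0.8):
    """
    Predicted auditory shift (toward vision) under simplified versions of the
    three models described in the abstract, for a fixed audio-visual
    discrepancy (in degrees). All parameters and model forms are illustrative.
    """
    rel_a, rel_v = 1.0 / sigma_a**2, 1.0 / sigma_v**2

    # (a) Reliability-based: the less reliable cue shifts more, so the auditory
    #     shift scales with vision's relative reliability.
    shift_reliability = discrepancy * rel_v / (rel_a + rel_v)

    # (b) Fixed-ratio: a constant fraction of the discrepancy, regardless of reliability.
    shift_fixed = discrepancy * ratio_a

    # (c) Causal-inference: the shift also scales with the posterior probability
    #     that the two cues share a common source, which falls as the discrepancy
    #     becomes implausible given the combined sensory noise.
    sigma_combined2 = sigma_a**2 + sigma_v**2
    like_common = np.exp(-discrepancy**2 / (2 * sigma_combined2)) / np.sqrt(2 * np.pi * sigma_combined2)
    like_separate = 1.0 / (2 * sigma_prior)  # flat likelihood over a +/- sigma_prior range
    p_common = (like_common * p_prior_common /
                (like_common * p_prior_common + like_separate * (1 - p_prior_common)))
    shift_causal = p_common * discrepancy * rel_v / (rel_a + rel_v)

    return shift_reliability, shift_fixed, shift_causal

# As visual reliability degrades (sigma_v grows), the reliability-based model
# predicts monotonically less auditory recalibration, while the causal-inference
# model can yield non-monotonic patterns because p(common source) trades off
# against the reliability weighting.
for sigma_v in (1.0, 4.0, 8.0):
    print(sigma_v, recalibration_shifts(discrepancy=10.0, sigma_a=4.0, sigma_v=sigma_v))
```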

