Similar Documents
20 similar documents were retrieved.
1.
Although infant speech perception is often studied in isolated modalities, infants' experience with speech is largely multimodal (i.e., the speech sounds they hear are accompanied by articulating faces). Across two experiments, we tested infants' sensitivity to the relationship between the auditory and visual components of audiovisual speech in their native (English) and non-native (Spanish) language. In Experiment 1, infants' looking times were measured during a preferential looking task in which they saw two simultaneous visual speech streams articulating a story, one in English and the other in Spanish, while they heard either the English or the Spanish version of the story. In Experiment 2, looking times from another group of infants were measured as they watched single displays of congruent and incongruent combinations of English and Spanish audio and visual speech streams. Findings demonstrated an age-related increase in looking towards the native relative to the non-native visual speech stream when accompanied by the corresponding (native) auditory speech. This increase in native-language preference did not appear to be driven by a difference in preference for native vs. non-native audiovisual congruence, as we observed no difference in looking times at the audiovisual streams in Experiment 2.

2.
Hasson U, Skipper JI, Nusbaum HC, Small SL. Neuron. 2007;56(6):1116-1126.
Is there a neural representation of speech that transcends its sensory properties? Using fMRI, we investigated whether there are brain areas where neural activity during observation of sublexical audiovisual input corresponds to a listener's speech percept (what is "heard") independent of the sensory properties of the input. A target audiovisual stimulus was preceded by stimuli that (1) shared the target's auditory features (auditory overlap), (2) shared the target's visual features (visual overlap), or (3) shared neither the target's auditory nor visual features but were perceived as the target (perceptual overlap). In two left-hemisphere regions (pars opercularis, planum polare), the target evoked less activity when it was preceded by the perceptually overlapping stimulus than when preceded by stimuli that shared one of its sensory components. This pattern of neural facilitation indicates that these regions code sublexical speech at an abstract level corresponding to that of the speech percept.

3.
The brain is adaptive. The speed of propagation through air, and of low-level sensory processing, differs markedly between auditory and visual stimuli; yet the brain can adapt to compensate for the resulting cross-modal delays. Studies investigating temporal recalibration to audiovisual speech have used prolonged adaptation procedures, suggesting that adaptation is sluggish. Here, we show that adaptation to asynchronous audiovisual speech occurs rapidly. Participants viewed a brief clip of an actor pronouncing a single syllable. The voice was either advanced or delayed relative to the corresponding lip movements, and participants were asked to make a synchrony judgement. Although we did not use an explicit adaptation procedure, we demonstrate rapid recalibration based on a single audiovisual event. We find that the point of subjective simultaneity (PSS) on each trial is highly contingent upon the modality order of the preceding trial. We find compelling evidence that rapid recalibration generalizes across different stimuli and different actors. Finally, we demonstrate that rapid recalibration occurs even when the auditory and visual events clearly belong to different actors. These results suggest that rapid temporal recalibration to audiovisual speech is primarily mediated by basic temporal factors, rather than by higher-order factors such as perceived simultaneity and source identity.
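The key analysis described above, in which the PSS on each trial depends on the modality order of the preceding trial, amounts to fitting the synchrony-judgment curve separately for trials that follow an audio-leading versus a video-leading presentation. The following Python sketch illustrates that split on simulated data; the Gaussian model of the synchrony curve, the SOA range, and all numerical values are illustrative assumptions, not the authors' actual pipeline.

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss(soa, amp, pss, sigma):
    """Proportion of 'synchronous' responses modeled as a Gaussian over SOA;
    the location of its peak is taken as the PSS."""
    return amp * np.exp(-(soa - pss) ** 2 / (2 * sigma ** 2))

def pss_by_previous_trial(soa, sync_resp, prev_audio_led):
    """Fit the synchrony curve separately for trials preceded by an
    audio-leading vs. a video-leading trial and return both PSS estimates."""
    out = {}
    for label, mask in (("after audio-led trial", prev_audio_led),
                        ("after video-led trial", ~prev_audio_led)):
        soas = np.unique(soa[mask])
        p_sync = np.array([sync_resp[mask & (soa == s)].mean() for s in soas])
        (_, pss, _), _ = curve_fit(gauss, soas, p_sync, p0=[0.9, 0.0, 100.0])
        out[label] = round(float(pss), 1)
    return out

# Simulated trials (illustrative only): SOA in ms (positive = audio lags video),
# binary synchrony responses, and the modality order of the preceding trial.
rng = np.random.default_rng(2)
n = 4000
soa = rng.choice(np.arange(-300, 301, 50), size=n).astype(float)
prev_audio_led = rng.random(n) < 0.5
true_pss = np.where(prev_audio_led, -20.0, 40.0)   # PSS depends on previous trial
p_sync_true = np.exp(-(soa - true_pss) ** 2 / (2 * 120.0 ** 2))
sync_resp = (rng.random(n) < p_sync_true).astype(float)
print(pss_by_previous_trial(soa, sync_resp, prev_audio_led))
```

With a simulated shift of roughly 60 ms between the two conditioning histories, the two fitted PSS values come out clearly apart, which is the qualitative signature of rapid recalibration.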

4.
In natural environments, sensory information is embedded in temporally contiguous streams of events. This is typically the case when seeing and listening to a speaker or when engaged in scene analysis. In such contexts, two mechanisms are needed to single out and build a reliable representation of an event (or object): the temporal parsing of information and the selection of relevant information in the stream. It has previously been shown that rhythmic events naturally build temporal expectations that improve sensory processing at predictable points in time. Here, we asked to what extent temporal regularities can improve the detection and identification of events across sensory modalities. To do so, we used a dynamic visual conjunction search task accompanied by auditory cues that were either synchronized or not synchronized with the color change of the target (a horizontal or vertical bar). Sounds synchronized with the visual target improved search efficiency for temporal rates below 1.4 Hz but did not affect efficiency above that stimulation rate. Desynchronized auditory cues consistently impaired visual search below 3.3 Hz. Our results are interpreted in the context of the Dynamic Attending Theory: specifically, we suggest that a cognitive operation structures events in time irrespective of the sensory modality of input. Our results further support and specify recent neurophysiological findings by showing strong temporal selectivity for audiovisual integration in the auditory-driven improvement of visual search efficiency.

5.
To obtain a coherent perception of the world, our senses need to be in alignment. When we encounter misaligned cues from two sensory modalities, the brain must infer which cue is faulty and recalibrate the corresponding sense. We examined whether and how the brain uses cue reliability to identify the miscalibrated sense by measuring the audiovisual ventriloquism aftereffect for stimuli of varying visual reliability. To adjust for modality-specific biases, visual stimulus locations were chosen based on perceived alignment with auditory stimulus locations for each participant. During an audiovisual recalibration phase, participants were presented with bimodal stimuli with a fixed perceptual spatial discrepancy; they localized one modality, cued after stimulus presentation. Unimodal auditory and visual localization was measured before and after the audiovisual recalibration phase. We compared participants' behavior to the predictions of three models of recalibration: (a) Reliability-based: each modality is recalibrated based on its relative reliability (less reliable cues are recalibrated more); (b) Fixed-ratio: the degree of recalibration for each modality is fixed; (c) Causal-inference: recalibration is directly determined by the discrepancy between a cue and its estimate, which in turn depends on the reliability of both cues and on inference about how likely it is that the two cues derive from a common source. Vision was hardly recalibrated by audition. Auditory recalibration by vision changed idiosyncratically as visual reliability decreased: the extent of auditory recalibration either decreased monotonically, peaked at medium visual reliability, or increased monotonically. The latter two patterns cannot be explained by either the reliability-based or fixed-ratio models. Only the causal-inference model of recalibration captures the idiosyncratic influences of cue reliability on recalibration. We conclude that cue reliability, causal inference, and modality-specific biases guide cross-modal recalibration indirectly by determining the perception of audiovisual stimuli.
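A small numerical sketch can make the contrast between the candidate models concrete. The Python snippet below compares the reliability-based and fixed-ratio predictions for how a fixed audiovisual discrepancy is apportioned between the two senses; the noise parameters, the fixed ratios, and the 10-degree discrepancy are hypothetical values chosen for illustration, not quantities from the study.

```python
import numpy as np

def reliability_based_shift(discrepancy, sigma_a, sigma_v):
    """Reliability-based model: each modality is recalibrated in proportion to the
    relative reliability (inverse variance) of the other cue, so the less reliable
    cue moves more. One illustrative formulation, not the paper's equations."""
    w_v = (1 / sigma_v**2) / (1 / sigma_a**2 + 1 / sigma_v**2)  # relative reliability of vision
    shift_auditory = w_v * discrepancy        # audition moves toward vision
    shift_visual = -(1 - w_v) * discrepancy   # vision moves toward audition
    return shift_auditory, shift_visual

def fixed_ratio_shift(discrepancy, ratio_a=0.6, ratio_v=0.1):
    """Fixed-ratio model: each modality shifts by a constant fraction of the
    discrepancy regardless of cue reliability (the fractions are assumptions)."""
    return ratio_a * discrepancy, -ratio_v * discrepancy

discrepancy_deg = 10.0              # hypothetical audiovisual spatial offset (degrees)
for sigma_v in (1.0, 4.0, 12.0):    # increasing visual noise = decreasing visual reliability
    a_shift, v_shift = reliability_based_shift(discrepancy_deg, sigma_a=6.0, sigma_v=sigma_v)
    fa_shift, fv_shift = fixed_ratio_shift(discrepancy_deg)
    print(f"sigma_v={sigma_v:5.1f}: reliability-based A/V shifts "
          f"{a_shift:5.2f}/{v_shift:6.2f} deg; fixed-ratio A/V shifts "
          f"{fa_shift:5.2f}/{fv_shift:6.2f} deg")
```

Under the reliability-based model the auditory shift shrinks monotonically as visual reliability drops, while the fixed-ratio model predicts constant shifts; as the abstract notes, neither pattern fits all observers, which is why a causal-inference account is needed.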

6.
Speech perception is critical to everyday life. Oftentimes noise can degrade a speech signal; however, because of the cues available to the listener, such as visual and semantic cues, noise rarely prevents conversations from continuing. The interaction of visual and semantic cues in aiding speech perception has been studied in young adults, but the extent to which these two cues interact for older adults has not been studied. To investigate the effect of visual and semantic cues on speech perception in older and younger adults, we recruited forty-five young adults (ages 18–35) and thirty-three older adults (ages 60–90) to participate in a speech perception task. Participants were presented with semantically meaningful and anomalous sentences in audio-only and audiovisual conditions. We hypothesized that young adults would outperform older adults across signal-to-noise ratios (SNRs), modalities, and semantic contexts. In addition, we hypothesized that both young and older adults would receive a greater benefit from a semantically meaningful context in the audiovisual relative to the audio-only modality. We predicted that young adults would receive greater visual benefit in semantically meaningful contexts relative to anomalous contexts, whereas older adults could receive a greater visual benefit in either semantically meaningful or anomalous contexts. Results suggested that in the most supportive context, that is, semantically meaningful sentences presented in the audiovisual modality, older adults performed similarly to young adults. In addition, both groups received the same amount of visual and semantic benefit. Lastly, across groups, a semantically meaningful context provided more benefit in the audiovisual modality relative to the audio-only modality, and the presence of visual cues provided more benefit in semantically meaningful contexts relative to anomalous contexts. These results suggest that older adults can perceive speech as well as younger adults when both semantic and visual cues are available to the listener.
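The visual and semantic benefits discussed above are typically computed as simple difference scores between conditions. The sketch below shows one such computation on made-up accuracy data; the condition means, the sample size, and the raw-difference definition of benefit are illustrative assumptions, not the study's values or exact metric.

```python
import numpy as np

# Hypothetical proportion-correct scores per condition (one row of values per condition,
# 30 simulated participants each): modality crossed with semantic context.
rng = np.random.default_rng(3)
scores = {
    ("audio", "anomalous"):        rng.normal(0.45, 0.08, 30).clip(0, 1),
    ("audio", "meaningful"):       rng.normal(0.55, 0.08, 30).clip(0, 1),
    ("audiovisual", "anomalous"):  rng.normal(0.60, 0.08, 30).clip(0, 1),
    ("audiovisual", "meaningful"): rng.normal(0.78, 0.08, 30).clip(0, 1),
}

# Visual benefit: audiovisual minus audio-only, within each semantic context.
for context in ("anomalous", "meaningful"):
    vb = scores[("audiovisual", context)] - scores[("audio", context)]
    print(f"visual benefit ({context}): {vb.mean():.3f}")

# Semantic benefit: meaningful minus anomalous, within each modality.
for modality in ("audio", "audiovisual"):
    sb = scores[(modality, "meaningful")] - scores[(modality, "anomalous")]
    print(f"semantic benefit ({modality}): {sb.mean():.3f}")
```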

7.
A combination of signals across modalities can facilitate sensory perception. The audiovisual facilitative effect strongly depends on the features of the stimulus. Here, we investigated how sound frequency, one of the basic features of an auditory signal, modulates audiovisual integration. In this study, the task of the participant was to respond to a visual target stimulus by pressing a key while ignoring auditory stimuli consisting of tones of different frequencies (0.5, 1, 2.5 and 5 kHz). A significant facilitation of reaction times was obtained following audiovisual stimulation, irrespective of whether the task-irrelevant sounds were low or high frequency. Using event-related potentials (ERPs), audiovisual integration was found over the occipital area for 0.5 kHz auditory stimuli from 190–210 ms, for 1 kHz stimuli from 170–200 ms, for 2.5 kHz stimuli from 140–200 ms, and for 5 kHz stimuli from 100–200 ms. These findings suggest that a higher-frequency sound paired with a visual stimulus might be processed or integrated earlier, even though the auditory stimuli were task-irrelevant. Furthermore, audiovisual integration in late-latency (300–340 ms) ERPs with a fronto-central topography was found for auditory stimuli of lower frequencies (0.5, 1 and 2.5 kHz). Our results confirm that audiovisual integration is affected by the frequency of an auditory stimulus. Taken together, the neurophysiological results provide unique insight into how the brain processes a visual signal combined with auditory stimuli of different frequencies.
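Audiovisual integration in ERP data of this kind is commonly assessed with an additive-model comparison, contrasting the response to audiovisual stimulation (AV) with the sum of the unimodal responses (A + V) within specific time windows. The snippet below sketches that comparison on simulated per-subject mean amplitudes; the simulated values, the paired t-test, and the window handling are illustrative assumptions rather than the authors' analysis.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects, n_times = 20, 600            # simulated subjects and 1 ms time samples
t_ms = np.arange(n_times)

# Simulated per-subject mean occipital ERP amplitudes (arbitrary units) per condition.
erp_av = rng.normal(0.8, 0.5, (n_subjects, n_times))   # audiovisual
erp_a  = rng.normal(0.3, 0.5, (n_subjects, n_times))   # auditory alone
erp_v  = rng.normal(0.3, 0.5, (n_subjects, n_times))   # visual alone

def integration_test(erp_av, erp_a, erp_v, t_ms, win):
    """Compare AV against (A + V) within a time window (additive-model criterion).
    A reliable difference is taken as evidence of audiovisual integration."""
    mask = (t_ms >= win[0]) & (t_ms < win[1])
    av  = erp_av[:, mask].mean(axis=1)
    a_v = (erp_a + erp_v)[:, mask].mean(axis=1)
    return stats.ttest_rel(av, a_v)

# Windows loosely modeled on those reported for the different tone frequencies.
for freq, win in {"0.5 kHz": (190, 210), "1 kHz": (170, 200),
                  "2.5 kHz": (140, 200), "5 kHz": (100, 200)}.items():
    res = integration_test(erp_av, erp_a, erp_v, t_ms, win)
    print(f"{freq}: t = {res.statistic:.2f}, p = {res.pvalue:.4f}")
```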

8.
Visual motion information from dynamic environments is important in multisensory temporal perception. However, it is unclear how visual motion information influences the integration of multisensory temporal perceptions. We investigated whether visual apparent motion affects audiovisual temporal perception. Visual apparent motion is a phenomenon in which two flashes presented in sequence at different positions are perceived as continuous motion. Across three experiments, participants performed temporal order judgment (TOJ) tasks. Experiment 1 was a TOJ task conducted in order to assess audiovisual simultaneity during the perception of apparent motion. The results showed that the point of subjective simultaneity (PSS) was shifted toward sound-leading stimuli, and the just noticeable difference (JND) was reduced compared with a normal TOJ task with a single flash. This indicates that visual apparent motion affects audiovisual simultaneity and improves temporal discrimination in audiovisual processing. Experiment 2 was a TOJ task conducted in order to remove the influence of the amount of flash stimulation from Experiment 1. The PSS and JND during perception of apparent motion were almost identical to those in Experiment 1, but differed from those for successive perception when long temporal intervals were included between two flashes without motion. This showed that the result obtained under the apparent motion condition was unaffected by the amount of flash stimulation. Because apparent motion was produced by a constant interval between the two flashes, the results may be accounted for by specific prediction. In Experiment 3, we eliminated the influence of prediction by randomizing the intervals between the two flashes. However, the PSS and JND did not differ from those in Experiment 1. It therefore became clear that the results obtained for the perception of visual apparent motion were not attributable to prediction. Our findings suggest that visual apparent motion changes temporal simultaneity perception and improves temporal discrimination in audiovisual processing.
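PSS and JND in a TOJ task are conventionally obtained by fitting a psychometric function to the proportion of one response category across stimulus onset asynchronies: the 50% point gives the PSS and the slope gives the JND. The sketch below shows that standard fit with a cumulative Gaussian; the SOA levels and response proportions are invented for illustration and are not data from the study.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

# Hypothetical TOJ data: SOA in ms (negative = sound leads) and the proportion
# of "flash first" responses at each SOA; values are invented for illustration.
soa = np.array([-240, -120, -60, -30, 0, 30, 60, 120, 240], dtype=float)
p_flash_first = np.array([0.05, 0.12, 0.25, 0.40, 0.55, 0.70, 0.82, 0.93, 0.98])

def cum_gauss(x, pss, sigma):
    """Cumulative Gaussian psychometric function."""
    return norm.cdf(x, loc=pss, scale=sigma)

(pss, sigma), _ = curve_fit(cum_gauss, soa, p_flash_first,
                            p0=[0.0, 80.0], bounds=([-300.0, 1.0], [300.0, 500.0]))

# PSS: the SOA yielding 50% "flash first" responses.
# JND: half the 25%-75% interval, i.e. sigma * z(0.75) for a cumulative Gaussian.
jnd = sigma * norm.ppf(0.75)
print(f"PSS = {pss:.1f} ms, JND = {jnd:.1f} ms")
```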

9.
Visual inputs can distort auditory perception, and accurate auditory processing requires the ability to detect and ignore visual input that is simultaneous with, but incongruent with, the auditory information. However, while the integration of audiovisual inputs has been intensively researched, the neural basis of this selection of auditory information from audiovisual input remains unknown. Here, we tested the hypothesis that the inferior frontal gyrus (IFG) and superior temporal sulcus (STS) are involved in top-down and bottom-up processing, respectively, of target auditory information from audiovisual inputs. We recorded high gamma activity (HGA), which is associated with neuronal firing in local brain regions, using electrocorticography while patients with epilepsy judged the syllable spoken by a voice while looking at a voice-congruent or -incongruent lip movement from the speaker. The STS exhibited stronger HGA when patients were presented with stimuli of large rather than small audiovisual incongruence, especially when the auditory information was correctly identified. The IFG, in contrast, exhibited stronger HGA in trials with small audiovisual incongruence when patients correctly perceived the auditory information than when patients perceived it incorrectly due to the mismatched visual information. These results indicate that the IFG and STS have dissociated roles in selective auditory processing, and suggest that the neural basis of selective auditory processing changes dynamically in accordance with the degree of incongruity between auditory and visual information.

10.

Background

Visual cross-modal re-organization is a neurophysiological process that occurs in deafness. The intact sensory modality of vision recruits cortical areas from the deprived sensory modality of audition. Such compensatory plasticity is documented in deaf adults and animals, and is related to deficits in speech perception performance in cochlear-implanted adults. However, it is unclear whether visual cross-modal re-organization takes place in cochlear-implanted children and whether it may be a source of variability contributing to speech and language outcomes. Thus, the aim of this study was to determine if visual cross-modal re-organization occurs in cochlear-implanted children, and whether it is related to deficits in speech perception performance.

Methods

Visual evoked potentials (VEPs) were recorded via high-density EEG in 41 normal-hearing children and 14 cochlear-implanted children, aged 5–15 years, in response to apparent motion and form change. Comparisons of VEP amplitude and latency, as well as source localization results, were conducted between the groups in order to assess evidence of visual cross-modal re-organization. Finally, speech perception in background noise was correlated with the visual response in the implanted children.

Results

Distinct VEP morphological patterns were observed in both the normal-hearing and cochlear-implanted children. However, the cochlear-implanted children demonstrated larger VEP amplitudes and earlier latencies, concurrent with activation of right temporal cortex including auditory regions, suggestive of visual cross-modal re-organization. The VEP N1 latency was negatively related to speech perception in background noise for children with cochlear implants.

Conclusion

Our results are among the first to describe cross-modal re-organization of auditory cortex by the visual modality in deaf children fitted with cochlear implants. Our findings suggest that, as a group, children with cochlear implants show evidence of visual cross-modal recruitment, which may be a contributing source of variability in speech perception outcomes with their implant.

11.
When dealing with natural scenes, sensory systems have to process an often messy and ambiguous flow of information. A stable perceptual organization nevertheless has to be achieved in order to guide behavior. The neural mechanisms involved can be highlighted by intrinsically ambiguous situations. In such cases, bistable perception occurs: distinct interpretations of the unchanging stimulus alternate spontaneously in the mind of the observer. Bistable stimuli have been used extensively for more than two centuries to study visual perception. Here we demonstrate that bistable perception also occurs in the auditory modality. We compared the temporal dynamics of percept alternations observed during auditory streaming with those observed for visual plaids and the susceptibilities of both modalities to volitional control. Strong similarities indicate that auditory and visual alternations share common principles of perceptual bistability. The absence of correlation across modalities for subject-specific biases, however, suggests that these common principles are implemented at least partly independently across sensory modalities. We propose that visual and auditory perceptual organization could rely on distributed but functionally similar neural competition mechanisms aimed at resolving sensory ambiguities.

12.
Research on the neural basis of speech-reading implicates a network of auditory language regions involving inferior frontal cortex, premotor cortex and sites along superior temporal cortex. In audiovisual speech studies, neural activity is consistently reported in the posterior superior temporal sulcus (pSTS), and this site has been implicated in multimodal integration. Traditionally, multisensory interactions are considered high-level processing that engages heteromodal association cortices (such as STS). Recent work, however, challenges this notion and suggests that multisensory interactions may occur in low-level unimodal sensory cortices. While previous audiovisual speech studies demonstrate that high-level multisensory interactions occur in pSTS, what remains unclear is how early in the processing hierarchy these multisensory interactions may occur. The goal of the present fMRI experiment is to investigate how visual speech can influence activity in auditory cortex above and beyond its response to auditory speech. In an audiovisual speech experiment, subjects were presented with auditory speech with and without congruent visual input. Holding the auditory stimulus constant across the experiment, we investigated how the addition of visual speech influences activity in auditory cortex. We demonstrate that congruent visual speech increases activity in auditory cortex.

13.
An increasing number of neuroscience papers capitalize on the assumption, published in this journal, that visual speech is typically 150 ms ahead of auditory speech. The estimation of audiovisual asynchrony in that reference paper, however, is valid only in very specific cases: for isolated consonant-vowel syllables or at the beginning of a speech utterance, in what we call “preparatory gestures”. When syllables are chained in sequences, as they typically are in most parts of a natural speech utterance, asynchrony should be defined in a different way. This is what we call “comodulatory gestures”, which provide auditory and visual events more or less in synchrony. We provide audiovisual data on sequences of plosive-vowel syllables (pa, ta, ka, ba, da, ga, ma, na) showing that audiovisual synchrony is actually rather precise, varying between 20 ms audio lead and 70 ms audio lag. We show how more complex speech material should result in a range typically varying between 40 ms audio lead and 200 ms audio lag, and we discuss how this natural coordination is reflected in the so-called temporal integration window for audiovisual speech perception. Finally, we present a toy model of auditory and audiovisual predictive coding, showing that visual lead is actually not necessary for visual prediction.
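The audio lead and lag values reported here can be estimated, for any recorded utterance, by cross-correlating the acoustic amplitude envelope with a visual articulatory time series such as lip aperture and locating the correlation peak. The sketch below assumes both signals have already been extracted and resampled to a common 100 Hz frame rate; the function and variable names, the sampling rate, and the normalized cross-correlation approach are assumptions for illustration, not the authors' measurement method.

```python
import numpy as np

def av_lead_ms(audio_env, lip_aperture, fs=100.0, max_lag_ms=300):
    """Estimate the audiovisual offset as the lag of the peak cross-correlation
    between an audio envelope and a lip-aperture time series sampled at fs Hz.
    Positive return values mean the audio leads the video; negative means it lags."""
    a = (audio_env - audio_env.mean()) / audio_env.std()
    v = (lip_aperture - lip_aperture.mean()) / lip_aperture.std()
    max_lag = int(max_lag_ms * fs / 1000)
    lags = np.arange(-max_lag, max_lag + 1)
    # Correlate the audio shifted by each candidate lag against the lip signal.
    xcorr = [np.corrcoef(np.roll(a, k), v)[0, 1] for k in lags]
    best = lags[int(np.argmax(xcorr))]
    return 1000.0 * best / fs

# Toy check: an audio envelope delayed by 5 frames relative to the lip signal
# (a 50 ms audio lag) should yield an estimate close to -50 ms.
rng = np.random.default_rng(1)
lip = rng.normal(size=2000)
audio = np.roll(lip, 5) + 0.1 * rng.normal(size=2000)
print(f"estimated offset: {av_lead_ms(audio, lip):.0f} ms")
```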

14.
Multisensory integration may occur independently of visual attention, as previously shown with compound face-voice stimuli. We investigated in two experiments whether the perception of whole-body expressions and the perception of voices influence each other when observers are not aware of seeing the bodily expression. In the first experiment, participants categorized masked happy and angry bodily expressions while ignoring congruent or incongruent emotional voices. The onset asynchrony between target and mask varied from -50 to +133 ms. Results show that the congruency between the emotion in the voice and the bodily expression influences audiovisual perception independently of the visibility of the stimuli. In the second experiment, participants categorized the emotional voices, combined with masked bodily expressions, as fearful or happy. This experiment showed that bodily expressions presented outside visual awareness still influence prosody perception. Our experiments show that audiovisual integration between bodily expressions and affective prosody can take place outside of, and independently of, visual awareness.

15.
It has traditionally been assumed that cochlear implant users de facto perform atypically in audiovisual tasks. However, a recent study that combined an auditory task with visual distractors suggests that only those cochlear implant users who are not proficient at recognizing speech sounds might show abnormal audiovisual interactions. The present study aims to reinforce this notion by investigating the audiovisual segregation abilities of cochlear implant users in a visual task with auditory distractors. Speechreading was assessed in two groups of cochlear implant users (proficient and non-proficient at sound recognition), as well as in normal controls. A visual speech recognition task (i.e. speechreading) was administered either in silence or in combination with three types of auditory distractors: i) noise, ii) reversed speech, and iii) unaltered speech. Cochlear implant users proficient at speech recognition performed like normal controls in all conditions, whereas non-proficient users showed significantly different audiovisual segregation patterns in both speech conditions. These results confirm that normal-like audiovisual segregation is possible in highly skilled cochlear implant users and, consequently, that proficient and non-proficient cochlear implant users cannot be lumped into a single group. This important feature must be taken into account in further studies of audiovisual interactions in cochlear implant users.

16.
Recognizing other individuals by integrating different sensory modalities is a crucial ability of social animals, including humans. Although cross-modal individual recognition has been demonstrated in mammals, the extent of its use by birds remains unknown. Herein, we report the first evidence of cross-modal recognition of group members by a highly social bird, the large-billed crow (Corvus macrorhynchos). A cross-modal expectancy violation paradigm was used to test whether crows were sensitive to identity congruence between visual presentation of a group member and the subsequent playback of a contact call. Crows looked more rapidly and for a longer duration when the visual and auditory stimuli were incongruent than when congruent. Moreover, these responses were not observed with non-group member stimuli. These results indicate that crows spontaneously associate visual and auditory information of group members but not of non-group members, which is a demonstration of cross-modal audiovisual recognition of group members in birds.

17.
Events encoded in separate sensory modalities, such as audition and vision, can seem to be synchronous across a relatively broad range of physical timing differences. This may suggest that the precision of audio-visual timing judgments is inherently poor. Here we show that this is not necessarily true. We contrast timing sensitivity for isolated streams of audio and visual speech, and for streams of audio and visual speech accompanied by additional, temporally offset, visual speech streams. We find that the precision with which synchronous streams of audio and visual speech are identified is enhanced by the presence of additional streams of asynchronous visual speech. Our data suggest that timing perception is shaped by selective grouping processes, which can result in enhanced precision in temporally cluttered environments. The imprecision suggested by previous studies might therefore be a consequence of examining isolated pairs of audio and visual events. We argue that when an isolated pair of cross-modal events is presented, they tend to group perceptually and to seem synchronous as a consequence. We have revealed greater precision by providing multiple visual signals, possibly allowing a single auditory speech stream to group selectively with the most synchronous visual candidate. The grouping processes we have identified might be important in daily life, such as when we attempt to follow a conversation in a crowded room.

18.

Background

The timing at which sensory input reaches the level of conscious perception is an intriguing question still awaiting an answer. It is often assumed that both visual and auditory percepts have a modality-specific processing delay and that their difference determines the perceived temporal offset.

Methodology/Principal Findings

Here, we show that the perception of audiovisual simultaneity can change flexibly and fluctuates over a short period of time while subjects observe a constant stimulus. We investigated the mechanisms underlying the spontaneous alternations in this audiovisual illusion and found that attention plays a crucial role. When attention was distracted from the stimulus, the perceptual transitions disappeared. When attention was directed to a visual event, the perceived timing of an auditory event was attracted towards that event.

Conclusions/Significance

This multistable display illustrates how flexible perceived timing can be, and at the same time offers a paradigm to dissociate perceptual from stimulus-driven factors in crossmodal feature binding. Our findings suggest that the perception of crossmodal synchrony depends on perceptual binding of audiovisual stimuli as a common event.

19.
In humans, emotions from music serve important communicative roles. Despite a growing interest in the neural basis of music perception, action and emotion, the majority of previous studies in this area have focused on the auditory aspects of music performances. Here we investigate how the brain processes the emotions elicited by audiovisual music performances. We used event-related functional magnetic resonance imaging, and in Experiment 1 we defined the areas responding to audiovisual (musician's movements with music), visual (musician's movements only), and auditory emotional (music only) displays. Subsequently a region of interest analysis was performed to examine if any of the areas detected in Experiment 1 showed greater activation for emotionally mismatching performances (combining the musician's movements with mismatching emotional sound) than for emotionally matching music performances (combining the musician's movements with matching emotional sound) as presented in Experiment 2 to the same participants. The insula and the left thalamus were found to respond consistently to visual, auditory and audiovisual emotional information and to have increased activation for emotionally mismatching displays in comparison with emotionally matching displays. In contrast, the right thalamus was found to respond to audiovisual emotional displays and to have similar activation for emotionally matching and mismatching displays. These results suggest that the insula and left thalamus have an active role in detecting emotional correspondence between auditory and visual information during music performances, whereas the right thalamus has a different role.

20.
Evidence from human neuroimaging and animal electrophysiological studies suggests that signals from different sensory modalities interact early in cortical processing, including in primary sensory cortices. The present study aimed to test whether functional near-infrared spectroscopy (fNIRS), an emerging, non-invasive neuroimaging technique, is capable of measuring such multisensory interactions. Specifically, we tested for a modulatory influence of sounds on activity in visual cortex, while varying the temporal synchrony between trains of transient auditory and visual events. Related fMRI studies have consistently reported enhanced activation in response to synchronous compared to asynchronous audiovisual stimulation. Unexpectedly, we found that synchronous sounds significantly reduced the fNIRS response from visual cortex, compared both to asynchronous sounds and to a visual-only baseline. It is possible that this suppressive effect of synchronous sounds reflects the use of an efficacious visual stimulus, chosen for consistency with previous fNIRS studies. Discrepant results may also be explained by differences between studies in how attention was deployed to the auditory and visual modalities. The presence and relative timing of sounds did not significantly affect performance in a simultaneously conducted behavioral task, although the data were suggestive of a positive relationship between the strength of the fNIRS response from visual cortex and the accuracy of visual target detection. Overall, the present findings indicate that fNIRS is capable of measuring multisensory cortical interactions. In multisensory research, fNIRS can offer complementary information to the more established neuroimaging modalities, and may prove advantageous for testing in naturalistic environments and with infant and clinical populations.
