Similar Literature
20 related records found.
1.
The language difficulties often seen in individuals with autism might stem from an inability to integrate audiovisual information, a skill important for language development. We investigated whether 9-month-old siblings of older children with autism, who are at an increased risk of developing autism, are able to integrate audiovisual speech cues. We used an eye-tracker to record where infants looked when shown a screen displaying two faces of the same model, where one face is articulating /ba/ and the other /ga/, with one face congruent with the syllable sound being presented simultaneously, the other face incongruent. This method was successful in showing that infants at low risk can integrate audiovisual speech: they looked for the same amount of time at the mouths in both the fusible visual /ga/-audio /ba/ and the congruent visual /ba/-audio /ba/ displays, indicating that the auditory and visual streams fuse into a McGurk-type syllabic percept in the incongruent condition. It also showed that low-risk infants could perceive a mismatch between auditory and visual cues: they looked longer at the mouth in the mismatched, non-fusible visual /ba/-audio /ga/ display compared with the congruent visual /ga/-audio /ga/ display, demonstrating that they perceive an uncommon, and therefore interesting, speech-like percept when looking at the incongruent mouth (repeated-measures ANOVA: displays x fusion/mismatch conditions interaction: F(1,16) = 17.153, p = 0.001). The looking behaviour of high-risk infants did not differ according to the type of display, suggesting difficulties in matching auditory and visual information (repeated-measures ANOVA, displays x conditions interaction: F(1,25) = 0.09, p = 0.767), in contrast to low-risk infants (repeated-measures ANOVA: displays x conditions x low/high-risk groups interaction: F(1,41) = 4.466, p = 0.041). In some cases this reduced ability might lead to the poor communication skills characteristic of autism.
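The statistical test reported above is a 2 x 2 repeated-measures ANOVA on looking times, with display (congruent vs. incongruent face) and condition (fusible vs. mismatch) as within-subject factors. The sketch below illustrates how such an interaction term can be computed; it is not the authors' analysis code, and the synthetic data, column names, and use of statsmodels' AnovaRM are illustrative assumptions.

```python
# A minimal sketch, not the authors' analysis code: a 2 x 2 repeated-measures
# ANOVA on looking time with display (congruent vs. incongruent face) and
# condition (fusible vs. mismatch) as within-subject factors. The synthetic
# data and column names are assumptions for illustration only.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
n_infants = 17  # e.g., a group size giving df = (1, 16) as in the reported F-test

rows = []
for infant in range(n_infants):
    for display in ("congruent", "incongruent"):
        for condition in ("fusible", "mismatch"):
            rows.append({
                "infant": infant,
                "display": display,
                "condition": condition,
                # hypothetical looking time (s) to the mouth region
                "looking_time": rng.normal(loc=3.0, scale=0.5),
            })
df = pd.DataFrame(rows)

# The display x condition interaction is the term reported in the abstract,
# e.g. F(1,16) = 17.153 for the low-risk group.
res = AnovaRM(df, depvar="looking_time", subject="infant",
              within=["display", "condition"]).fit()
print(res)
```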

2.
This study investigated whether an odor can affect infants' attention to visually presented objects and whether it can selectively direct visual gaze at visual targets as a function of their meaning. Four-month-old infants (n = 48) were exposed to their mother's body odors while their visual exploration was recorded with an eye-movement tracking system. Two groups of infants, who were assigned to either an odor condition or a control condition, looked at a scene composed of still pictures of faces and cars. As expected, infants looked longer at the faces than at the cars, but this spontaneous preference for faces was significantly enhanced in the presence of the odor. As expected also, when looking at the face, the infants looked longer at the eyes than at any other facial region, but, again, they looked at the eyes significantly longer in the presence of the odor. Thus, 4-month-old infants are sensitive to the contextual effects of odors while looking at faces. This suggests that early social attention to faces is mediated by visual as well as non-visual cues.

3.
Speech perception often benefits from vision of the speaker's lip movements when they are available. One potential mechanism underlying this reported gain in perception arising from audio-visual integration is on-line prediction. In this study we address whether the preceding speech context in a single modality can improve audiovisual processing and whether this improvement is based on on-line information transfer across sensory modalities. In the experiments presented here, during each trial, a speech fragment (context) presented in a single sensory modality (voice or lips) was immediately continued by an audiovisual target fragment. Participants made speeded judgments about whether voice and lips were in agreement in the target fragment. The leading single-sensory context and the subsequent audiovisual target fragment could be continuous in one modality only, in both (context in one modality continues into both modalities in the target fragment), or in neither modality (i.e., discontinuous). The results showed quicker audiovisual matching responses when context was continuous with the target within either the visual or auditory channel (Experiment 1). Critically, prior visual context also provided an advantage when it was cross-modally continuous (with the auditory channel in the target), but auditory-to-visual cross-modal continuity resulted in no advantage (Experiment 2). This suggests that visual speech information can provide an on-line benefit for processing the upcoming auditory input through the use of predictive mechanisms. We hypothesize that this benefit is expressed at an early level of speech analysis.

4.
Speech and emotion perception are dynamic processes in which it may be optimal to integrate synchronous signals emitted from different sources. Studies of audio-visual (AV) perception of neutrally expressed speech demonstrate supra-additive (i.e., where AV > [unimodal auditory + unimodal visual]) responses in left STS to crossmodal speech stimuli. However, emotions are often conveyed simultaneously with speech: through the voice in the form of speech prosody and through the face in the form of facial expression. Previous studies of AV nonverbal emotion integration showed a role for right (rather than left) STS. The current study therefore examined whether the integration of facial and prosodic signals of emotional speech is associated with supra-additive responses in left (cf. results for speech integration) or right (due to emotional content) STS. As emotional displays are sometimes difficult to interpret, we also examined whether supra-additive responses were affected by emotional incongruence (i.e., ambiguity). Using magnetoencephalography, we continuously recorded eighteen participants as they viewed and heard AV congruent emotional and AV incongruent emotional speech stimuli. Significant supra-additive responses were observed in right STS within the first 250 ms for emotionally incongruent and emotionally congruent AV speech stimuli, which further underscores the role of right STS in processing crossmodal emotive signals.

5.
It has been proposed that internal simulation of the talking face of visually-known speakers facilitates auditory speech recognition. One prediction of this view is that brain areas involved in auditory-only speech comprehension interact with visual face-movement sensitive areas, even under auditory-only listening conditions. Here, we test this hypothesis using connectivity analyses of functional magnetic resonance imaging (fMRI) data. Participants (17 normal participants, 17 developmental prosopagnosics) first learned six speakers via brief voice-face or voice-occupation training (<2 min/speaker). This was followed by an auditory-only speech recognition task and a control task (voice recognition) involving the learned speakers’ voices in the MRI scanner. As hypothesized, we found that, during speech recognition, familiarity with the speaker’s face increased the functional connectivity between the face-movement sensitive posterior superior temporal sulcus (STS) and an anterior STS region that supports auditory speech intelligibility. There was no difference between normal participants and prosopagnosics. This was expected because previous findings have shown that both groups use the face-movement sensitive STS to optimize auditory-only speech comprehension. Overall, the present findings indicate that learned visual information is integrated into the analysis of auditory-only speech and that this integration results from the interaction of task-relevant face-movement and auditory speech-sensitive areas.

6.
Speech perception provides compelling examples of a strong link between auditory and visual modalities. This link originates in the mechanics of speech production, which, in shaping the vocal tract, determine the movement of the face as well as the sound of the voice. In this paper, we present evidence that equivalent information about identity is available cross-modally from both the face and voice. Using a delayed matching-to-sample task, XAB, we show that people can match the video of an unfamiliar face, X, to an unfamiliar voice, A or B, and vice versa, but only when stimuli are moving and are played forward. The critical role of time-varying information is underlined by the ability to match faces to voices containing only the coarse spatial and temporal information provided by sine wave speech [5]. The effect of varying sentence content across modalities was small, showing that identity-specific information is not closely tied to particular utterances. We conclude that the physical constraints linking faces to voices result in bimodally available dynamic information, not only about what is being said, but also about who is saying it.
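Sine-wave speech, mentioned above, replaces the formants of an utterance with a small number of time-varying sinusoids, preserving only coarse spectro-temporal structure. The sketch below shows the basic synthesis step under stated assumptions: the formant frequency and amplitude tracks are invented for illustration, whereas real sine-wave speech would use tracks estimated from a recording (e.g., by LPC formant tracking).

```python
# A minimal sketch of sine-wave speech synthesis: each formant is replaced by a
# single time-varying sinusoid, so only coarse spectro-temporal structure is
# kept. The formant and amplitude tracks below are invented for illustration;
# real sine-wave speech uses tracks estimated from a recorded utterance.
import numpy as np

def sinewave_speech(freq_tracks, amp_tracks, sr=16000):
    """Sum one sinusoid per formant track, accumulating phase sample by sample."""
    out = np.zeros(freq_tracks.shape[1])
    for freqs, amps in zip(freq_tracks, amp_tracks):
        phase = 2 * np.pi * np.cumsum(freqs) / sr  # integrate instantaneous frequency
        out += amps * np.sin(phase)
    return out / np.max(np.abs(out))

sr, dur = 16000, 1.0
t = np.linspace(0, dur, int(sr * dur), endpoint=False)
# three hypothetical formant tracks (Hz) drifting slowly over time
freqs = np.vstack([500 + 100 * np.sin(2 * np.pi * 2.0 * t),
                   1500 + 300 * np.sin(2 * np.pi * 1.5 * t),
                   2500 + 200 * np.sin(2 * np.pi * 1.0 * t)])
amps = np.vstack([np.full_like(t, 1.0),
                  np.full_like(t, 0.5),
                  np.full_like(t, 0.3)])
audio = sinewave_speech(freqs, amps, sr)  # write to a WAV file to listen
```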

7.
Auditory and visual signals generated by a single source tend to be temporally correlated, such as the synchronous sounds of footsteps and the limb movements of a walker. Continuous tracking and comparison of the dynamics of auditory-visual streams is thus useful for the perceptual binding of information arising from a common source. Although language-related mechanisms have been implicated in the tracking of speech-related auditory-visual signals (e.g., speech sounds and lip movements), it is not well known what sensory mechanisms generally track ongoing auditory-visual synchrony for non-speech signals in a complex auditory-visual environment. To begin to address this question, we used music and visual displays that varied in the dynamics of multiple features (e.g., auditory loudness and pitch; visual luminance, color, size, motion, and organization) across multiple time scales. Auditory activity (monitored using auditory steady-state responses, ASSR) was selectively reduced in the left hemisphere when the music and dynamic visual displays were temporally misaligned. Importantly, ASSR was not affected when attentional engagement with the music was reduced, or when visual displays presented dynamics clearly dissimilar to the music. These results appear to suggest that left-lateralized auditory mechanisms are sensitive to auditory-visual temporal alignment, but perhaps only when the dynamics of auditory and visual streams are similar. These mechanisms may contribute to correct auditory-visual binding in a busy sensory environment.
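Auditory steady-state responses are typically quantified as the spectral amplitude of the MEG/EEG signal at the stimulus tagging frequency. The sketch below is a minimal illustration of that measurement, not the authors' pipeline; the 40 Hz tag, sampling rate, and simulated epoch are assumptions.

```python
# A minimal sketch of how an auditory steady-state response (ASSR) can be
# quantified: the spectral amplitude of an MEG/EEG epoch at the stimulus
# tagging frequency. The 40 Hz tag, sampling rate, and simulated epoch are
# assumptions for illustration, not the parameters used in the study above.
import numpy as np

def assr_amplitude(epoch, sr, tag_freq):
    """Amplitude of the epoch's spectrum at the tagging frequency."""
    spectrum = np.abs(np.fft.rfft(epoch)) / len(epoch)
    freqs = np.fft.rfftfreq(len(epoch), d=1.0 / sr)
    return spectrum[np.argmin(np.abs(freqs - tag_freq))]

sr, dur, tag = 1000, 10.0, 40.0                  # hypothetical values
t = np.arange(0, dur, 1.0 / sr)
epoch = 0.5 * np.sin(2 * np.pi * tag * t) + np.random.randn(t.size)  # fake data
print(assr_amplitude(epoch, sr, tag))            # close to 0.25 (= amplitude / 2)
```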

8.
Rigoulot S, Pell MD. PLoS ONE 2012, 7(1): e30740
Interpersonal communication involves the processing of multimodal emotional cues, particularly facial expressions (visual modality) and emotional speech prosody (auditory modality), which can interact during information processing. Here, we investigated whether the implicit processing of emotional prosody systematically influences gaze behavior to facial expressions of emotion. We analyzed the eye movements of 31 participants as they scanned a visual array of four emotional faces portraying fear, anger, happiness, and neutrality, while listening to an emotionally-inflected pseudo-utterance ("Someone migged the pazing") uttered in a congruent or incongruent tone. Participants heard the emotional utterance during the first 1250 milliseconds of a five-second visual array and then performed an immediate recall decision about the face they had just seen. The frequency and duration of first saccades and of total looks in three temporal windows ([0-1250 ms], [1250-2500 ms], [2500-5000 ms]) were analyzed according to the emotional content of faces and voices. Results showed that participants looked longer and more frequently at faces that matched the prosody in all three time windows (emotion congruency effect), although this effect was often emotion-specific (with greatest effects for fear). Effects of prosody on visual attention to faces persisted over time and could be detected long after the auditory information was no longer present. These data imply that emotional prosody is processed automatically during communication and that these cues play a critical role in how humans respond to related visual cues in the environment, such as facial expressions.
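The gaze analysis above accumulates look durations to each face within three fixed time windows of the 5 s display. A minimal sketch of that binning step is given below; the fixation table and its column names are hypothetical, not the authors' data format.

```python
# A minimal sketch of the time-window analysis: each fixation is clipped to the
# three windows of the 5 s display and total look duration per face is summed
# per window. The fixation table and its column names are hypothetical.
import pandas as pd

WINDOWS = [(0, 1250), (1250, 2500), (2500, 5000)]  # ms, as in the abstract

def window_look_durations(fixations: pd.DataFrame) -> pd.DataFrame:
    """fixations: one row per fixation with onset/offset in ms and the fixated face."""
    rows = []
    for w_start, w_end in WINDOWS:
        # portion of each fixation falling inside the current window
        overlap = (fixations["offset"].clip(upper=w_end)
                   - fixations["onset"].clip(lower=w_start)).clip(lower=0)
        for face, dur in overlap.groupby(fixations["face"]).sum().items():
            rows.append({"window": f"{w_start}-{w_end} ms",
                         "face": face,
                         "total_look_ms": dur})
    return pd.DataFrame(rows)

fix = pd.DataFrame({"onset": [100, 900, 1300, 2600],
                    "offset": [600, 1400, 2100, 4800],
                    "face": ["fear", "anger", "fear", "happy"]})
print(window_look_durations(fix))
```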

9.
Recognizing other individuals by integrating different sensory modalities is a crucial ability of social animals, including humans. Although cross-modal individual recognition has been demonstrated in mammals, the extent of its use by birds remains unknown. Herein, we report the first evidence of cross-modal recognition of group members by a highly social bird, the large-billed crow (Corvus macrorhynchos). A cross-modal expectancy violation paradigm was used to test whether crows were sensitive to identity congruence between visual presentation of a group member and the subsequent playback of a contact call. Crows looked more rapidly and for a longer duration when the visual and auditory stimuli were incongruent than when congruent. Moreover, these responses were not observed with non-group member stimuli. These results indicate that crows spontaneously associate visual and auditory information of group members but not of non-group members, which is a demonstration of cross-modal audiovisual recognition of group members in birds.

10.
Although infant speech perception is often studied in isolated modalities, infants' experience with speech is largely multimodal (i.e., the speech sounds they hear are accompanied by articulating faces). Across two experiments, we tested infants’ sensitivity to the relationship between the auditory and visual components of audiovisual speech in their native (English) and non-native (Spanish) language. In Experiment 1, infants’ looking times were measured during a preferential looking task in which they saw two simultaneous visual speech streams articulating a story, one in English and the other in Spanish, while they heard either the English or the Spanish version of the story. In Experiment 2, looking times from another group of infants were measured as they watched single displays of congruent and incongruent combinations of English and Spanish audio and visual speech streams. Findings demonstrated an age-related increase in looking towards the native relative to the non-native visual speech stream when accompanied by the corresponding (native) auditory speech. This increase in native language preference did not appear to be driven by a difference in preference for native vs. non-native audiovisual congruence, as we observed no difference in looking times at the audiovisual streams in Experiment 2.

11.
The role of the corpus callosum (CC) in the interhemispheric interaction of prosodic and syntactic information during speech comprehension was investigated in patients with lesions in the CC, and in healthy controls. The event-related brain potential experiment examined the effect of prosodic phrase structure on the processing of a verb whose argument structure matched or did not match the prior prosody-induced syntactic structure. While controls showed an N400-like effect for prosodically mismatching verb argument structures, thus indicating a stable interplay between prosody and syntax, patients with lesions in the posterior third of the CC did not show this effect. Because these patients displayed a prosody-independent semantic N400 effect, the present data indicate that the posterior third of the CC is the crucial neuroanatomical structure for the interhemispheric interplay of suprasegmental prosodic information and syntactic information.

12.
Many voice disorders are the result of intricate neural and/or biomechanical impairments that are poorly understood. The limited knowledge of their etiological and pathophysiological mechanisms hampers effective clinical management. Behavioral studies have been used concurrently with computational models to better understand typical and pathological laryngeal motor control. Thus far, however, a unified computational framework that quantitatively integrates physiologically relevant models of phonation with the neural control of speech has not been developed. Here, we introduce LaDIVA, a novel neurocomputational model with physiologically based laryngeal motor control. We combined the DIVA model (an established neural network model of speech motor control) with the extended body-cover model (a physics-based vocal fold model). The resulting integrated model, LaDIVA, was validated by comparing its model simulations with behavioral responses to perturbations of auditory vocal fundamental frequency (fo) feedback in adults with typical speech. LaDIVA demonstrated the capability to simulate different modes of laryngeal motor control, ranging from short-term (i.e., reflexive) and long-term (i.e., adaptive) auditory feedback paradigms, to generating prosodic contours in speech. Simulations showed that LaDIVA’s laryngeal motor control displays properties of motor equivalence, i.e., LaDIVA could robustly generate compensatory responses to reflexive vocal fo perturbations with varying initial laryngeal muscle activation levels leading to the same output. The model can also generate prosodic contours for studying laryngeal motor control in running speech. LaDIVA can expand the understanding of the physiology of human phonation to enable, for the first time, the investigation of causal effects of neural motor control in the fine structure of the vocal signal.
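LaDIVA itself couples the full DIVA controller to a physics-based body-cover vocal fold model, which is far beyond a few lines of code. As a much simpler illustration of the reflexive mode described above, the sketch below implements a toy negative-feedback controller that partially compensates when the auditory fo feedback is perturbed. The gain, time step, and perturbation size are assumptions for illustration only, not parameters of the model.

```python
# This is not LaDIVA. It is only a toy sketch of the reflexive mode described
# above: a negative-feedback controller that partially corrects the motor
# command when the auditory fo feedback is shifted. Gain, step count, and
# perturbation size are assumptions for illustration.
import numpy as np

def simulate_reflexive_fo(target_fo=200.0, perturb_cents=100.0,
                          feedback_gain=0.3, n_steps=50, onset=10):
    """Produced fo (Hz) over time under a step perturbation of auditory feedback."""
    produced = np.full(n_steps, target_fo)
    for t in range(1, n_steps):
        shift = perturb_cents if t >= onset else 0.0
        heard = produced[t - 1] * 2 ** (shift / 1200.0)        # feedback shifted in cents
        error = target_fo - heard                               # mismatch with auditory target
        produced[t] = produced[t - 1] + feedback_gain * error   # partial correction
    return produced

trace = simulate_reflexive_fo()
print(trace[9], trace[-1])  # fo drifts below 200 Hz to oppose the upward shift
```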

13.
Young infants are typically thought to prefer looking at smiling expressions. Although some accounts suggest that the preference is automatic and universal, we hypothesized that it is not rigid and may be influenced by other face dimensions, most notably the face’s gender. Infants are sensitive to the gender of faces; for example, 3-month-olds raised by female caregivers typically prefer female over male faces. We presented neutral versus smiling pairs of faces from the same female or male individuals to 3.5-month-old infants (n = 25), controlling for low-level cues. Infants looked longer at the smiling face when faces were female but longer at the neutral face when faces were male, i.e., there was an effect of face gender on the looking preference for smiling. The results indicate that a preference for smiling in 3.5-month-olds is limited to female faces, possibly reflective of differential experience with male and female faces.

14.
It has recently been shown that some non-human animals can cross-modally recognize members of their own taxon. What is unclear is just how plastic this recognition system can be. In this study, we investigate whether an animal, the domestic horse, is capable of spontaneous cross-modal recognition of individuals from a morphologically very different species. We also provide the first insights into how cross-modal identity information is processed by examining whether there are hemispheric biases in this important social skill. In our preferential looking paradigm, subjects were presented with two people and playbacks of their voices to determine whether they were able to match the voice with the person. When presented with familiar handlers, subjects could match the specific familiar person with the correct familiar voice. Horses were significantly better at performing the matching task when the congruent person was standing on their right, indicating marked hemispheric specialization (left hemisphere bias) in this ability. These results are the first to demonstrate that cross-modal recognition in animals can extend to individuals from phylogenetically very distant species. They also indicate that processes governed by the left hemisphere are central to the cross-modal matching of visual and auditory information from familiar individuals in a naturalistic setting.

15.
We describe an illusion in which a stranger's voice, when presented as the auditory concomitant of a participant's own speech, is perceived as a modified version of their own voice. When the congruence between utterance and feedback breaks down, the illusion is also broken. Compared to a baseline condition in which participants heard their own voice as feedback, hearing a stranger's voice induced robust changes in the fundamental frequency (F0) of their production. Moreover, the shift in F0 appears to be feedback dependent, since shift patterns depended reliably on the relationship between the participant's own F0 and the stranger-voice F0. The shift in F0 was evident both when the illusion was present and after it was broken, suggesting that auditory feedback from production may be used separately for self-recognition and for vocal motor control. Our findings indicate that self-recognition of voices, like other body attributes, is malleable and context dependent.

16.
Guellai B, Streri A. PLoS ONE 2011, 6(4): e18610
Previous studies showed that, from birth, speech and eye gaze are two important cues in guiding early face processing and social cognition. These studies tested the role of each cue independently; however, infants normally perceive speech and eye gaze together. Using a familiarization-test procedure, we first familiarized newborn infants (n = 24) with videos of unfamiliar talking faces with either direct gaze or averted gaze. Newborns were then tested with photographs of the previously seen face and of a new one. The newborns looked longer at the face that previously talked to them, but only in the direct gaze condition. These results highlight the importance of both speech and eye gaze as socio-communicative cues by which infants identify others. They suggest that gaze and infant-directed speech, experienced together, are powerful cues for the development of early social skills.

17.
In a natural setting, speech is often accompanied by gestures. Like language, speech-accompanying iconic gestures convey semantic information to some extent. However, whether comprehension of the information conveyed in the auditory and the visual modality depends on the same or on different brain networks is largely unknown. In this fMRI study, we aimed at identifying the cortical areas engaged in supramodal processing of semantic information. BOLD changes were recorded in 18 healthy right-handed male subjects watching video clips showing an actor who either performed speech (S, acoustic) or gestures (G, visual) in more (+) or less (−) meaningful varieties. In the experimental conditions, familiar speech or isolated iconic gestures were presented; during the visual control condition the volunteers watched meaningless gestures (G−), while during the acoustic control condition a foreign language was presented (S−). The conjunction of the visual and acoustic semantic processing revealed activations extending from the left inferior frontal gyrus to the precentral gyrus, and included bilateral posterior temporal regions. We conclude that proclaiming this frontotemporal network the brain's core language system is to take too narrow a view. Our results rather indicate that these regions constitute a supramodal semantic processing network.

18.
Mochida T, Gomi H, Kashino M. PLoS ONE 2010, 5(11): e13866

Background

There has been plentiful evidence of kinesthetically induced rapid compensation for unanticipated perturbation in speech articulatory movements. However, the role of auditory information in stabilizing articulation has been little studied except for the control of voice fundamental frequency, voice amplitude and vowel formant frequencies. Although the influence of auditory information on the articulatory control process is evident in unintended speech errors caused by delayed auditory feedback, the direct and immediate effect of auditory alteration on the movements of articulators has not been clarified.

Methodology/Principal Findings

This work examined whether temporal changes in the auditory feedback of bilabial plosives immediately affects the subsequent lip movement. We conducted experiments with an auditory feedback alteration system that enabled us to replace or block speech sounds in real time. Participants were asked to produce the syllable /pa/ repeatedly at a constant rate. During the repetition, normal auditory feedback was interrupted, and one of three pre-recorded syllables /pa/, /Φa/, or /pi/, spoken by the same participant, was presented once at a different timing from the anticipated production onset, while no feedback was presented for subsequent repetitions. Comparisons of the labial distance trajectories under altered and normal feedback conditions indicated that the movement quickened during the short period immediately after the alteration onset, when /pa/ was presented 50 ms before the expected timing. Such change was not significant under other feedback conditions we tested.

Conclusions/Significance

The earlier articulation rapidly induced by the progressive auditory input suggests that a compensatory mechanism helps to maintain a constant speech rate by detecting errors between the internally predicted and actually provided auditory information associated with self-movement. The timing- and context-dependent effects of feedback alteration suggest that the sensory error detection works in a temporally asymmetric window where acoustic features of the syllable to be produced may be coded.
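A real-time feedback-alteration system like the one described in the Methodology above passes the speaker's voice back over headphones, but can silence it or substitute a pre-recorded syllable at a chosen moment. The sketch below is a minimal, hypothetical illustration using the sounddevice duplex stream API; block size, latency handling, and the triggering logic of the actual experiment are not reproduced, and all settings are assumptions.

```python
# A minimal, hypothetical sketch of a real-time feedback-alteration loop:
# microphone input is normally fed back to the headphones, but can be silenced
# or replaced by a pre-recorded syllable buffer. This is not the authors'
# system; latency handling and trigger timing are omitted.
import numpy as np
import sounddevice as sd

SR = 44100
mode = "normal"                                   # "normal" | "blocked" | "replace"
replacement = np.zeros(SR // 2, dtype="float32")  # would hold a recorded /pa/ token
replace_pos = 0

def callback(indata, outdata, frames, time, status):
    global replace_pos
    if mode == "normal":
        outdata[:] = indata                       # unaltered sidetone
    elif mode == "blocked":
        outdata.fill(0.0)                         # auditory feedback removed
    else:                                         # "replace": play the stored syllable
        chunk = replacement[replace_pos:replace_pos + frames]
        out = np.zeros(frames, dtype="float32")
        out[:len(chunk)] = chunk
        outdata[:] = out[:, None]
        replace_pos += frames

# 'mode' would be switched by the experiment logic at the intended timing.
with sd.Stream(samplerate=SR, channels=1, dtype="float32",
               blocksize=256, callback=callback):
    sd.sleep(5000)  # run the duplex stream for 5 s
```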

19.
The processing of audio-visual speech: empirical and neural bases
In this selective review, I outline a number of ways in which seeing the talker affects auditory perception of speech, including, but not confined to, the McGurk effect. To date, studies suggest that all linguistic levels are susceptible to visual influence, and that two main modes of processing can be described: a complementary mode, whereby vision provides information more efficiently than hearing for some under-specified parts of the speech stream, and a correlated mode, whereby vision partially duplicates information about dynamic articulatory patterning. Cortical correlates of seen speech suggest that at the neurological as well as the perceptual level, auditory processing of speech is affected by vision, so that 'auditory speech regions' are activated by seen speech. The processing of natural speech, whether it is heard, seen or heard and seen, activates the perisylvian language regions (left > right). It is highly probable that activation occurs in a specific order. First, superior temporal, then inferior parietal and finally inferior frontal regions (left > right) are activated. There is some differentiation of the visual input stream to the core perisylvian language system, suggesting that complementary seen speech information makes special use of the visual ventral processing stream, while for correlated visual speech, the dorsal processing stream, which is sensitive to visual movement, may be relatively more involved.

20.
Visual inputs can distort auditory perception, and accurate auditory processing requires the ability to detect and ignore visual input that is simultaneous and incongruent with auditory information. However, the neural basis of this auditory selection from audiovisual information is unknown, whereas the integration process of audiovisual inputs has been intensively researched. Here, we tested the hypothesis that the inferior frontal gyrus (IFG) and superior temporal sulcus (STS) are involved in top-down and bottom-up processing, respectively, of target auditory information from audiovisual inputs. We recorded high gamma activity (HGA), which is associated with neuronal firing in local brain regions, using electrocorticography while patients with epilepsy judged the syllable spoken by a voice while looking at a voice-congruent or -incongruent lip movement from the speaker. The STS exhibited stronger HGA if the patient was presented with information of large audiovisual incongruence than of small incongruence, especially if the auditory information was correctly identified. On the other hand, the IFG exhibited stronger HGA in trials with small audiovisual incongruence when patients correctly perceived the auditory information than when patients incorrectly perceived the auditory information due to the mismatched visual information. These results indicate that the IFG and STS have dissociated roles in selective auditory processing, and suggest that the neural basis of selective auditory processing changes dynamically in accordance with the degree of incongruity between auditory and visual information.
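High gamma activity is commonly estimated by band-pass filtering each electrocorticography channel in a high-gamma range and taking the amplitude envelope of the analytic signal. The sketch below shows that generic step; the 70-150 Hz band, sampling rate, filter order, and simulated channel are assumptions rather than the authors' exact pipeline.

```python
# A minimal sketch of one common way to estimate high gamma activity (HGA) from
# a single ECoG channel: band-pass filtering in a high-gamma range followed by
# the analytic-amplitude (Hilbert) envelope. Band, sampling rate, filter order,
# and the simulated channel are assumptions, not the authors' pipeline.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def high_gamma_envelope(channel, sr, band=(70.0, 150.0), order=4):
    """Band-pass filter the channel, then return its Hilbert amplitude envelope."""
    nyq = sr / 2.0
    b, a = butter(order, [band[0] / nyq, band[1] / nyq], btype="bandpass")
    return np.abs(hilbert(filtfilt(b, a, channel)))

sr = 1000
t = np.arange(0, 2.0, 1.0 / sr)
# fake channel: a 100 Hz burst starting at 1 s, embedded in noise
channel = np.sin(2 * np.pi * 100 * t) * (t > 1.0) + 0.1 * np.random.randn(t.size)
hga = high_gamma_envelope(channel, sr)
print(hga[:1000].mean(), hga[1000:].mean())  # envelope is larger after burst onset
```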
