Similar Literature
 20 similar articles found.
1.
Research on the neural basis of speech-reading implicates a network of auditory language regions involving inferior frontal cortex, premotor cortex and sites along superior temporal cortex. In audiovisual speech studies, neural activity is consistently reported in posterior superior temporal sulcus (pSTS) and this site has been implicated in multimodal integration. Traditionally, multisensory interactions are considered high-level processing that engages heteromodal association cortices (such as STS). Recent work, however, challenges this notion and suggests that multisensory interactions may occur in low-level unimodal sensory cortices. While previous audiovisual speech studies demonstrate that high-level multisensory interactions occur in pSTS, what remains unclear is how early in the processing hierarchy these multisensory interactions may occur. The goal of the present fMRI experiment is to investigate how visual speech can influence activity in auditory cortex above and beyond its response to auditory speech. In an audiovisual speech experiment, subjects were presented with auditory speech with and without congruent visual input. Holding the auditory stimulus constant across the experiment, we investigated how the addition of visual speech influences activity in auditory cortex. We demonstrate that congruent visual speech increases the activity in auditory cortex.

2.
Hasson U  Skipper JI  Nusbaum HC  Small SL 《Neuron》2007,56(6):1116-1126
Is there a neural representation of speech that transcends its sensory properties? Using fMRI, we investigated whether there are brain areas where neural activity during observation of sublexical audiovisual input corresponds to a listener's speech percept (what is "heard") independent of the sensory properties of the input. A target audiovisual stimulus was preceded by stimuli that (1) shared the target's auditory features (auditory overlap), (2) shared the target's visual features (visual overlap), or (3) shared neither the target's auditory nor visual features but were perceived as the target (perceptual overlap). In two left-hemisphere regions (pars opercularis, planum polare), the target evoked less activity when it was preceded by the perceptually overlapping stimulus than when preceded by stimuli that shared one of its sensory components. This pattern of neural facilitation indicates that these regions code sublexical speech at an abstract level corresponding to that of the speech percept.

3.
Social animals learn to perceive their social environment, and their social skills and preferences are thought to emerge from greater exposure to and hence familiarity with some social signals rather than others. Familiarity appears to be tightly linked to multisensory integration. The ability to differentiate and categorize familiar and unfamiliar individuals and to build a multisensory representation of known individuals emerges from successive social interactions, in particular with adult, experienced models. In different species, adults have been shown to shape the social behavior of young by promoting selective attention to multisensory cues. The question of what representation of known conspecifics adult-deprived animals may build therefore arises. Here we show that starlings raised with no experience with adults fail to develop a multisensory representation of familiar and unfamiliar starlings. Electrophysiological recordings of neuronal activity throughout the primary auditory area of these birds, while they were exposed to audio-only or audiovisual familiar and unfamiliar cues, showed that visual stimuli did, as in wild-caught starlings, modulate auditory responses but that, unlike what was observed in wild-caught birds, this modulation was not influenced by familiarity. Thus, adult-deprived starlings seem to fail to discriminate between familiar and unfamiliar individuals. This suggests that adults may shape multisensory representation of known individuals in the brain, possibly by focusing the young's attention on relevant, multisensory cues. Multisensory stimulation by experienced, adult models may thus be ubiquitously important for the development of social skills (and of the neural properties underlying such skills) in a variety of species.

4.
To form a veridical percept of the environment, the brain needs to integrate sensory signals from a common source but segregate those from independent sources. Thus, perception inherently relies on solving the “causal inference problem.” Behaviorally, humans solve this problem optimally as predicted by Bayesian Causal Inference; yet, the underlying neural mechanisms are unexplored. Combining psychophysics, Bayesian modeling, functional magnetic resonance imaging (fMRI), and multivariate decoding in an audiovisual spatial localization task, we demonstrate that Bayesian Causal Inference is performed by a hierarchy of multisensory processes in the human brain. At the bottom of the hierarchy, in auditory and visual areas, location is represented on the basis that the two signals are generated by independent sources (= segregation). At the next stage, in posterior intraparietal sulcus, location is estimated under the assumption that the two signals are from a common source (= forced fusion). Only at the top of the hierarchy, in anterior intraparietal sulcus, the uncertainty about the causal structure of the world is taken into account and sensory signals are combined as predicted by Bayesian Causal Inference. Characterizing the computational operations of signal interactions reveals the hierarchical nature of multisensory perception in human neocortex. It unravels how the brain accomplishes Bayesian Causal Inference, a statistical computation fundamental for perception and cognition. Our results demonstrate how the brain combines information in the face of uncertainty about the underlying causal structure of the world.
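For readers who want a concrete sense of the computation, below is a minimal numerical sketch of a Bayesian Causal Inference observer for audiovisual localization, written in the spirit of the standard model-averaging formulation rather than the authors' fitted implementation. The noise and prior parameters, the zero-centred spatial prior, and the example measurements are illustrative assumptions.

```python
import numpy as np

def bci_localize(x_a, x_v, sigma_a=8.0, sigma_v=2.0, sigma_p=15.0, p_common=0.5):
    """Minimal Bayesian Causal Inference observer for audiovisual localization.

    x_a, x_v   -- noisy internal auditory / visual location measurements (deg)
    sigma_a/v  -- sensory noise SDs; sigma_p -- SD of a zero-centred spatial prior
    p_common   -- prior probability that both signals share one source
    Returns (auditory estimate, visual estimate, posterior P(common source)).
    Parameter values are illustrative, not taken from the study.
    """
    va, vv, vp = sigma_a ** 2, sigma_v ** 2, sigma_p ** 2

    # C = 1 (common source): reliability-weighted fusion of both cues and the prior
    s_fused = (x_a / va + x_v / vv) / (1 / va + 1 / vv + 1 / vp)

    # C = 2 (independent sources): each cue is combined with the prior only
    s_a_seg = (x_a / va) / (1 / va + 1 / vp)
    s_v_seg = (x_v / vv) / (1 / vv + 1 / vp)

    # Likelihood of the measurement pair under each causal structure
    like_c1 = np.exp(-0.5 * ((x_a - x_v) ** 2 * vp + x_a ** 2 * vv + x_v ** 2 * va)
                     / (va * vv + va * vp + vv * vp)) \
              / (2 * np.pi * np.sqrt(va * vv + va * vp + vv * vp))
    like_c2 = np.exp(-0.5 * (x_a ** 2 / (va + vp) + x_v ** 2 / (vv + vp))) \
              / (2 * np.pi * np.sqrt((va + vp) * (vv + vp)))

    # Posterior probability that the two signals came from a common source
    post_c1 = like_c1 * p_common / (like_c1 * p_common + like_c2 * (1 - p_common))

    # Model averaging: weight fusion and segregation estimates by the causal posterior
    s_hat_a = post_c1 * s_fused + (1 - post_c1) * s_a_seg
    s_hat_v = post_c1 * s_fused + (1 - post_c1) * s_v_seg
    return s_hat_a, s_hat_v, post_c1

print(bci_localize(x_a=10.0, x_v=2.0))    # small disparity -> largely fused
print(bci_localize(x_a=25.0, x_v=-20.0))  # large disparity -> mostly segregated
```

With a small audiovisual disparity the causal posterior favours a common source and the estimates converge; with a large disparity the observer falls back on segregated, prior-combined unisensory estimates, mirroring the hierarchy described in the abstract.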

5.
To form a percept of the multisensory world, the brain needs to integrate signals from common sources weighted by their reliabilities and segregate those from independent sources. Previously, we have shown that anterior parietal cortices combine sensory signals into representations that take into account the signals’ causal structure (i.e., common versus independent sources) and their sensory reliabilities as predicted by Bayesian causal inference. The current study asks to what extent and how attentional mechanisms can actively control how sensory signals are combined for perceptual inference. In a pre- and postcueing paradigm, we presented observers with audiovisual signals at variable spatial disparities. Observers were precued to attend to auditory or visual modalities prior to stimulus presentation and postcued to report their perceived auditory or visual location. Combining psychophysics, functional magnetic resonance imaging (fMRI), and Bayesian modelling, we demonstrate that the brain moulds multisensory inference via two distinct mechanisms. Prestimulus attention to vision enhances the reliability and influence of visual inputs on spatial representations in visual and posterior parietal cortices. Poststimulus report determines how parietal cortices flexibly combine sensory estimates into spatial representations consistent with Bayesian causal inference. Our results show that distinct neural mechanisms control how signals are combined for perceptual inference at different levels of the cortical hierarchy.

A combination of psychophysics, computational modelling and fMRI reveals novel insights into how the brain controls the binding of information across the senses, such as the voice and lip movements of a speaker.
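As a complement to the abstract, here is a tiny sketch of the forced-fusion (reliability-weighted) combination rule that the parietal representations are described as approximating. The attend_vision_gain parameter is a hypothetical stand-in for the idea that prestimulus attention to vision boosts visual reliability; the numbers are made up and this is not the authors' model.

```python
def reliability_weighted_fusion(x_a, x_v, sigma_a, sigma_v, attend_vision_gain=1.0):
    """Forced-fusion estimate under the common-source assumption.

    Each cue is weighted by its reliability (inverse variance). The hypothetical
    attend_vision_gain illustrates attention effectively increasing visual
    reliability; all values here are illustrative.
    """
    r_a = 1.0 / sigma_a ** 2
    r_v = attend_vision_gain / sigma_v ** 2
    s_hat = (r_a * x_a + r_v * x_v) / (r_a + r_v)
    sigma_hat = (r_a + r_v) ** -0.5   # fused estimate is more reliable than either cue
    return s_hat, sigma_hat

# Attending to vision pulls the fused location estimate towards the visual cue.
print(reliability_weighted_fusion(x_a=10.0, x_v=0.0, sigma_a=8.0, sigma_v=2.0))
print(reliability_weighted_fusion(x_a=10.0, x_v=0.0, sigma_a=8.0, sigma_v=2.0,
                                  attend_vision_gain=2.0))
```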

6.
Seitz AR  Kim R  Shams L 《Current biology : CB》2006,16(14):1422-1427
Numerous studies show that practice can result in performance improvements on low-level visual perceptual tasks [1-5]. However, such learning is characteristically difficult and slow, requiring many days of training [6-8]. Here, we show that a multisensory audiovisual training procedure facilitates visual learning and results in significantly faster learning than unisensory visual training. We trained one group of subjects with an audiovisual motion-detection task and a second group with a visual motion-detection task, and compared performance on trials containing only visual signals across ten days of training. Whereas observers in both groups showed improvements of visual sensitivity with training, subjects trained with multisensory stimuli showed significantly more learning both within and across training sessions. These benefits of multisensory training are particularly surprising given that the learning of visual motion stimuli is generally thought to be mediated by low-level visual brain areas [6, 9, 10]. Although crossmodal interactions are ubiquitous in human perceptual processing [11-13], the contribution of crossmodal information to perceptual learning has not been studied previously. Our results show that multisensory interactions can be exploited to yield more efficient learning of sensory information and suggest that multisensory training programs would be most effective for the acquisition of new skills.

7.
Audiovisual integration of letters in the human brain
Raij T  Uutela K  Hari R 《Neuron》2000,28(2):617-625
Letters of the alphabet have auditory (phonemic) and visual (graphemic) qualities. To investigate the neural representations of such audiovisual objects, we recorded neuromagnetic cortical responses to auditorily, visually, and audiovisually presented single letters. The auditory and visual brain activations first converged around 225 ms after stimulus onset and then interacted predominantly in the right temporo-occipito-parietal junction (280345 ms) and the left (380-540 ms) and right (450-535 ms) superior temporal sulci. These multisensory brain areas, playing a role in audiovisual integration of phonemes and graphemes, participate in the neural network supporting the supramodal concept of a "letter." The dynamics of these functions bring new insight into the interplay between sensory and association cortices during object recognition.  相似文献   

8.
Debate currently exists regarding the interplay between multisensory processes and bottom-up and top-down influences. However, few studies have looked at neural responses to newly paired audiovisual stimuli that differ in their prescribed relevance. For such newly associated audiovisual stimuli, optimal facilitation of motor actions was observed only when both components of the audiovisual stimuli were targets. Relevant auditory stimuli were found to significantly increase the amplitudes of the event-related potentials at the occipital pole during the first 100 ms post-stimulus onset, though this early integration was not predictive of multisensory facilitation. Activity related to multisensory behavioral facilitation was observed approximately 166 ms post-stimulus, at left central and occipital sites. Furthermore, optimal multisensory facilitation was found to be associated with a latency shift of induced oscillations in the beta range (14–30 Hz) at right hemisphere parietal scalp regions. These findings demonstrate the importance of stimulus relevance to multisensory processing by providing the first evidence that the neural processes underlying multisensory integration are modulated by the relevance of the stimuli being combined. We also provide evidence that such facilitation may be mediated by changes in neural synchronization in occipital and centro-parietal neural populations at early and late stages of neural processing that coincided with stimulus selection, and the preparation and initiation of motor action.

9.
Audiovisual integration of speech falters under high attention demands
One of the most commonly cited examples of human multisensory integration occurs during exposure to natural speech, when the vocal and the visual aspects of the signal are integrated in a unitary percept. Audiovisual association of facial gestures and vocal sounds has been demonstrated in nonhuman primates and in prelinguistic children, arguing for a general basis for this capacity. One critical question, however, concerns the role of attention in such multisensory integration. Although both behavioral and neurophysiological studies have converged on a preattentive conceptualization of audiovisual speech integration, this mechanism has rarely been measured under conditions of high attentional load, when the observers' attention resources are depleted. We tested the extent to which audiovisual integration was modulated by the amount of available attentional resources by measuring the observers' susceptibility to the classic McGurk illusion in a dual-task paradigm. The proportion of visually influenced responses was severely, and selectively, reduced if participants were concurrently performing an unrelated visual or auditory task. In contrast with the assumption that crossmodal speech integration is automatic, our results suggest that these multisensory binding processes are subject to attentional demands.

10.
A combination of signals across modalities can facilitate sensory perception. The audiovisual facilitative effect strongly depends on the features of the stimulus. Here, we investigated how sound frequency, which is one of the basic features of an auditory signal, modulates audiovisual integration. In this study, the task of the participant was to respond to a visual target stimulus by pressing a key while ignoring auditory stimuli comprising tones of different frequencies (0.5, 1, 2.5 and 5 kHz). A significant facilitation of reaction times was obtained following audiovisual stimulation, irrespective of whether the task-irrelevant sounds were low or high frequency. Using event-related potentials (ERPs), audiovisual integration was found over the occipital area for 0.5 kHz auditory stimuli from 190–210 ms, for 1 kHz stimuli from 170–200 ms, for 2.5 kHz stimuli from 140–200 ms, and for 5 kHz stimuli from 100–200 ms. These findings suggest that a higher frequency sound signal paired with visual stimuli might be processed or integrated earlier, even though the auditory stimuli were task-irrelevant. Furthermore, audiovisual integration in late latency (300–340 ms) ERPs with fronto-central topography was found for auditory stimuli of lower frequencies (0.5, 1 and 2.5 kHz). Our results confirmed that audiovisual integration is affected by the frequency of an auditory stimulus. Taken together, the neurophysiological results provide unique insight into how the brain processes a visual signal paired with auditory stimuli of different frequencies.
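A common way to index audiovisual interactions in ERP data is the additive criterion, comparing the AV response against the sum of the unisensory A and V responses. The sketch below illustrates that comparison on synthetic placeholder data; the sampling rate, channel count, occipital channel indices and time window are assumptions, not the authors' pipeline.

```python
import numpy as np

fs = 500                           # sampling rate (Hz), illustrative
t = np.arange(-0.1, 0.5, 1 / fs)   # peristimulus time axis (s)

# Trial-averaged ERPs (channels x time) per condition; random placeholders here
n_channels = 32
rng = np.random.default_rng(0)
erp_av = rng.standard_normal((n_channels, t.size))
erp_a = rng.standard_normal((n_channels, t.size))
erp_v = rng.standard_normal((n_channels, t.size))

# Additive criterion: multisensory interaction is indexed by AV - (A + V)
interaction = erp_av - (erp_a + erp_v)

# Inspect the interaction in an early occipital window, e.g. 100-200 ms post-stimulus
win = (t >= 0.100) & (t <= 0.200)
occipital = [28, 29, 30, 31]       # hypothetical occipital channel indices
print(interaction[occipital][:, win].mean())
```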

11.
We rely on rich and complex sensory information to perceive and understand our environment. Our multisensory experience of the world depends on the brain's remarkable ability to combine signals across sensory systems. Behavioural, neurophysiological and neuroimaging experiments have established principles of multisensory integration and candidate neural mechanisms. Here we review how targeted manipulations of neural activity using invasive and non-invasive neuromodulation techniques have advanced our understanding of multisensory processing. Neuromodulation studies have provided detailed characterizations of brain networks causally involved in multisensory integration. Despite substantial progress, important questions regarding multisensory networks remain unanswered. Critically, experimental approaches will need to be combined with theory in order to understand how distributed activity across multisensory networks collectively supports perception.

12.
An increasing number of neuroscience papers capitalize on the assumption published in this journal that visual speech is typically 150 ms ahead of auditory speech. In fact, the estimate of audiovisual asynchrony in the reference paper is valid only in very specific cases: for isolated consonant-vowel syllables or at the beginning of a speech utterance, in what we call “preparatory gestures”. However, when syllables are chained in sequences, as they typically are in most parts of a natural speech utterance, asynchrony should be defined in a different way. This is what we call “comodulatory gestures”, which provide auditory and visual events more or less in synchrony. We provide audiovisual data on sequences of plosive-vowel syllables (pa, ta, ka, ba, da, ga, ma, na) showing that audiovisual synchrony is actually rather precise, varying between 20 ms audio lead and 70 ms audio lag. We show how more complex speech material should result in a range typically varying between 40 ms audio lead and 200 ms audio lag, and we discuss how this natural coordination is reflected in the so-called temporal integration window for audiovisual speech perception. Finally, we present a toy model of auditory and audiovisual predictive coding, showing that visual lead is actually not necessary for visual prediction.

13.
Bishop CW  Miller LM 《PloS one》2011,6(8):e24016
Speech is the most important form of human communication but ambient sounds and competing talkers often degrade its acoustics. Fortunately, the brain can use visual information, especially its highly precise spatial information, to improve speech comprehension in noisy environments. Previous studies have demonstrated that audiovisual integration depends strongly on spatiotemporal factors. However, some integrative phenomena such as McGurk interference persist even with gross spatial disparities, suggesting that spatial alignment is not necessary for robust integration of audiovisual place-of-articulation cues. It is therefore unclear how speech cues interact with audiovisual spatial integration mechanisms. Here, we combine two well-established psychophysical phenomena, the McGurk effect and the ventriloquist's illusion, to explore this dependency. Our results demonstrate that conflicting spatial cues may not interfere with audiovisual integration of speech, but conflicting speech cues can impede integration in space. This suggests a direct but asymmetrical influence between ventral 'what' and dorsal 'where' pathways.

14.
The human brain tracks amplitude fluctuations of both speech and music, which reflects acoustic processing in addition to the encoding of higher-order features and one’s cognitive state. Comparing neural tracking of speech and music envelopes can elucidate stimulus-general mechanisms, but direct comparisons are confounded by differences in their envelope spectra. Here, we use a novel method of frequency-constrained reconstruction of stimulus envelopes using EEG recorded during passive listening. We expected to see music reconstruction match speech in a narrow range of frequencies, but instead we found that speech was reconstructed better than music for all frequencies we examined. Additionally, models trained on all stimulus types performed as well as or better than the stimulus-specific models at higher modulation frequencies, suggesting a common neural mechanism for tracking speech and music. However, speech envelope tracking at low frequencies, below 1 Hz, was associated with increased weighting over parietal channels, which was not present for the other stimuli. Our results highlight the importance of low-frequency speech tracking and suggest an origin from speech-specific processing in the brain.
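Envelope tracking of this kind is typically quantified with a backward (stimulus-reconstruction) model. The sketch below shows a generic lagged ridge-regression decoder on placeholder data; it is not the authors' frequency-constrained reconstruction method, and the lag range, regularization strength, channel count and sampling rate are assumptions.

```python
import numpy as np

def lagged_design(eeg, max_lag):
    """Stack time-lagged copies of the EEG (time x channels) up to max_lag samples."""
    cols = [np.roll(eeg, lag, axis=0) for lag in range(max_lag + 1)]
    X = np.concatenate(cols, axis=1)
    X[:max_lag] = 0.0                  # zero out rows affected by wrap-around
    return X

def train_decoder(eeg, envelope, max_lag=16, lam=1e2):
    """Ridge-regression backward model mapping lagged EEG to the stimulus envelope."""
    X = lagged_design(eeg, max_lag)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ envelope)

def reconstruct(eeg, w, max_lag=16):
    return lagged_design(eeg, max_lag) @ w

# Toy data: 64-channel EEG and a stimulus envelope, both sampled at 64 Hz (placeholders)
rng = np.random.default_rng(1)
eeg = rng.standard_normal((6400, 64))
env = rng.standard_normal(6400)
w = train_decoder(eeg, env)
print(np.corrcoef(reconstruct(eeg, w), env)[0, 1])   # reconstruction accuracy (r)
```

In practice the decoder is trained and tested on separate data segments, and the correlation between the reconstructed and true envelopes serves as the tracking measure.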

15.
Although infant speech perception is often studied in isolated modalities, infants' experience with speech is largely multimodal (i.e., speech sounds they hear are accompanied by articulating faces). Across two experiments, we tested infants’ sensitivity to the relationship between the auditory and visual components of audiovisual speech in their native (English) and non-native (Spanish) language. In Experiment 1, infants’ looking times were measured during a preferential looking task in which they saw two simultaneous visual speech streams articulating a story, one in English and the other in Spanish, while they heard either the English or the Spanish version of the story. In Experiment 2, looking times from another group of infants were measured as they watched single displays of congruent and incongruent combinations of English and Spanish audio and visual speech streams. Findings demonstrated an age-related increase in looking towards the native relative to non-native visual speech stream when accompanied by the corresponding (native) auditory speech. This increase in native language preference did not appear to be driven by a difference in preference for native vs. non-native audiovisual congruence as we observed no difference in looking times at the audiovisual streams in Experiment 2.

16.
Given the extraordinary ability of humans and animals to recognize communication signals over a background of noise, describing noise-invariant neural responses is critical not only to pinpoint the brain regions that are mediating our robust perceptions but also to understand the neural computations that are performing these tasks and the underlying circuitry. Although invariant neural responses, such as rotation-invariant face cells, are well described in the visual system, high-level auditory neurons that can represent the same behaviorally relevant signal in a range of listening conditions have yet to be discovered. Here we found neurons in a secondary area of the avian auditory cortex that exhibit noise-invariant responses in the sense that they responded with similar spike patterns to song stimuli presented in silence and over a background of naturalistic noise. By characterizing the neurons' tuning in terms of their responses to modulations in the temporal and spectral envelope of the sound, we then show that noise invariance is partly achieved by selectively responding to long sounds with sharp spectral structure. Finally, to demonstrate that such computations could explain noise invariance, we designed a biologically inspired noise-filtering algorithm that can be used to separate song or speech from noise. This novel noise-filtering method performs as well as other state-of-the-art de-noising algorithms and could be used in clinical or consumer-oriented applications. Our biologically inspired model also shows how high-level noise-invariant responses could be created from neural responses typically found in primary auditory cortex.
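The core idea here, that noise invariance can arise from a preference for slow temporal modulations and sharp spectral structure, can be caricatured as a modulation-domain filter applied to the spectrogram. The sketch below is only that caricature, not the published algorithm; the STFT window length and the temporal-modulation cutoff are assumptions.

```python
import numpy as np
from scipy.signal import stft, istft

def modulation_filter(audio, fs, max_temporal_mod_hz=8.0):
    """Crude spectrogram-domain filter keeping only slow temporal modulations.

    Illustrates the general principle described in the abstract (preference for
    long sounds with sharp spectral structure); not the published algorithm.
    """
    f, t, Z = stft(audio, fs=fs, nperseg=512)
    mag, phase = np.abs(Z), np.angle(Z)

    # Fourier-transform each frequency channel's envelope over time and low-pass it
    frame_rate = 1.0 / (t[1] - t[0])
    mod_freqs = np.fft.rfftfreq(mag.shape[1], d=1.0 / frame_rate)
    env_spec = np.fft.rfft(mag, axis=1)
    env_spec[:, mod_freqs > max_temporal_mod_hz] = 0.0
    mag_filt = np.clip(np.fft.irfft(env_spec, n=mag.shape[1], axis=1), 0.0, None)

    # Resynthesize with the original phase
    _, clean = istft(mag_filt * np.exp(1j * phase), fs=fs, nperseg=512)
    return clean

# Usage (hypothetical waveform): denoised = modulation_filter(noisy_song, fs=24000)
```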

17.
Neural dynamics of envelope coding
We consider the processing of narrowband signals that modulate carrier waveforms in sensory systems. The tuning of sensory neurons to the carrier frequency results in a high sensitivity to the amplitude modulations of the carrier. Recent work has revealed how specialized circuitry can extract the lower-frequency modulation associated with the slow envelope of a narrowband signal, and send it to higher brain areas along with the full signal. This paper first summarizes the experimental evidence for this processing in the context of electroreception, where the narrowband signals arise in the context of social communication between the animals. It then examines the mechanism of this extraction by single neurons and neural populations, using intracellular recordings and new modeling results contrasting envelope extraction and stochastic resonance. Low noise and peri-threshold stimulation are necessary to obtain a firing pattern that shows high coherence with the envelope of the input. Further, the output must be fed through a slow synapse. Averaging networks are then considered for their ability to detect, using additional noise, signals with power in the envelope bandwidth. The circuitry that does support envelope extraction beyond the primary receptors is available in many areas of the brain including cortex. The mechanism of envelope extraction and its gating by noise and bias currents is thus accessible to non-carrier-based coding as well, as long as the input to the circuit is a narrowband signal. Novel results are also presented on a more biophysical model of the receptor population, showing that it can encode a narrowband signal, but not its envelope, as observed experimentally. The model is modified from previous models by reducing stimulus contrast in order to make it sufficiently linear to agree with the experimental data.
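A minimal stand-in for the envelope-extraction circuit described here is half-wave rectification followed by a low-pass filter, with the low-pass stage playing the role of the slow synapse. The sketch below demonstrates this on a synthetic amplitude-modulated carrier; the carrier, modulation and cutoff frequencies are illustrative choices, not values from the paper.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def extract_envelope(narrowband_signal, fs, cutoff_hz=20.0):
    """Half-wave rectify and low-pass filter to recover the slow envelope.

    The low-pass stage stands in for the slow synapse discussed in the paper;
    the cutoff value is illustrative.
    """
    rectified = np.maximum(narrowband_signal, 0.0)
    b, a = butter(4, cutoff_hz / (fs / 2), btype="low")
    return filtfilt(b, a, rectified)

# Toy narrowband input: a 300 Hz carrier amplitude-modulated at 4 Hz
fs = 10000
t = np.arange(0, 2.0, 1 / fs)
carrier = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 300 * t)
envelope = extract_envelope(carrier, fs)   # slow 4 Hz envelope, carrier removed
```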

18.
The relative positions of the brain and mouth are of central importance for models of chordate evolution. The dorsal hollow neural tube and the mouth have often been thought of as developmentally distinct structures that may have followed independent evolutionary paths. In most chordates, however, including vertebrates and ascidians, the mouth primordia have been shown to fate to the anterior neural boundary. In ascidians such as Ciona there is a particularly intimate relationship between brain and mouth development, with a thin canal connecting the neural tube lumen to the mouth primordium at larval stages. This so-called neurohypophyseal canal was previously thought to be a secondary connection that formed relatively late, after the independent formation of the mouth primordium and the neural tube. Here we show that the Ciona neurohypophyseal canal is present from the end of neurulation and represents the anteriormost neural tube, and that the future mouth opening is actually derived from the anterior neuropore. The mouth thus forms at the anterior midline transition between neural tube and surface ectoderm. In the vertebrate Xenopus, we find that although the mouth primordium is not topologically continuous with the neural tube lumen, it nonetheless forms at this same transition point. This close association between the mouth primordium and the anterior neural tube in both ascidians and amphibians suggests that the evolution of these two structures may be more closely linked than previously appreciated.

19.
Speech production involves the movement of the mouth and other regions of the face, resulting in visual motion cues. These visual cues enhance intelligibility and detection of auditory speech. As such, face-to-face speech is fundamentally a multisensory phenomenon. If speech is fundamentally multisensory, it should be reflected in the evolution of vocal communication: similar behavioral effects should be observed in other primates. Old World monkeys share with humans vocal production biomechanics and communicate face-to-face with vocalizations. It is unknown, however, if they, too, combine faces and voices to enhance their perception of vocalizations. We show that they do: monkeys combine faces and voices in noisy environments to enhance their detection of vocalizations. Their behavior parallels that of humans performing an identical task. We explored what common computational mechanism(s) could explain the pattern of results we observed across species. Standard explanations or models such as the principle of inverse effectiveness and a "race" model failed to account for their behavior patterns. Conversely, a "superposition model", positing the linear summation of activity patterns in response to visual and auditory components of vocalizations, served as a straightforward but powerful explanatory mechanism for the observed behaviors in both species. As such, it represents a putative homologous mechanism for integrating faces and voices across primates.
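To make the model comparison concrete, the sketch below contrasts a race-model (probability-summation) prediction with a superposition-style prediction in which unisensory response strengths sum linearly before being compared with a detection threshold. It is a simplification for illustration only, not the authors' fitted superposition model; all response values and the threshold are made up.

```python
def race_model_prediction(p_auditory, p_visual):
    """Probability-summation ('race') bound on audiovisual detection:
    detection occurs if either unisensory channel alone detects the vocalization."""
    return p_auditory + p_visual - p_auditory * p_visual

def superposition_prediction(resp_auditory, resp_visual, threshold):
    """Superposition model: unisensory response strengths add linearly and the
    summed response is compared with a detection threshold (illustrative values)."""
    return (resp_auditory + resp_visual) >= threshold

# Weak unisensory cues that each miss threshold can still be detected when their
# responses superpose, which a race model cannot capture.
print(race_model_prediction(0.3, 0.4))           # 0.58
print(superposition_prediction(0.6, 0.6, 1.0))   # True
```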

20.
Kim RS  Seitz AR  Shams L 《PloS one》2008,3(1):e1532

Background

Studies of perceptual learning have largely focused on unisensory stimuli. However, multisensory interactions are ubiquitous in perception, even at early processing stages, and thus can potentially play a role in learning. Here, we examine the effect of auditory-visual congruency on visual learning.

Methodology/Principal Findings

Subjects were trained over five days on a visual motion coherence detection task with either congruent or incongruent audiovisual stimuli. Comparing performance on visual-only trials, we find that training with congruent audiovisual stimuli produces significantly better learning than training with incongruent audiovisual stimuli or with only visual stimuli.

Conclusions/Significance

This advantage from stimulus congruency during training suggests that the benefits of multisensory training may result from audiovisual interactions at a perceptual rather than cognitive level.
