期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Speech cues contribute to audiovisual spatial integration

Bishop CW Miller LM 《PloS one》2011,6(8):e24016

Speech is the most important form of human communication but ambient sounds and competing talkers often degrade its acoustics. Fortunately the brain can use visual information, especially its highly precise spatial information, to improve speech comprehension in noisy environments. Previous studies have demonstrated that audiovisual integration depends strongly on spatiotemporal factors. However, some integrative phenomena such as McGurk interference persist even with gross spatial disparities, suggesting that spatial alignment is not necessary for robust integration of audiovisual place-of-articulation cues. It is therefore unclear how speech-cues interact with audiovisual spatial integration mechanisms. Here, we combine two well established psychophysical phenomena, the McGurk effect and the ventriloquist's illusion, to explore this dependency. Our results demonstrate that conflicting spatial cues may not interfere with audiovisual integration of speech, but conflicting speech-cues can impede integration in space. This suggests a direct but asymmetrical influence between ventral 'what' and dorsal 'where' pathways. 相似文献

2.

Effect of audiovisual training on monaural spatial hearing in horizontal plane

Strelnikov K Rosito M Barone P 《PloS one》2011,6(3):e18344

The article aims to test the hypothesis that audiovisual integration can improve spatial hearing in monaural conditions when interaural difference cues are not available. We trained one group of subjects with an audiovisual task, where a flash was presented in parallel with the sound and another group in an auditory task, where only sound from different spatial locations was presented. To check whether the observed audiovisual effect was similar to feedback, the third group was trained using the visual feedback paradigm. Training sessions were administered once per day, for 5 days. The performance level in each group was compared for auditory only stimulation on the first and the last day of practice. Improvement after audiovisual training was several times higher than after auditory practice. The group trained with visual feedback demonstrated a different effect of training with the improvement smaller than the group with audiovisual training. We conclude that cross-modal facilitation is highly important to improve spatial hearing in monaural conditions and may be applied to the rehabilitation of patients with unilateral deafness and after unilateral cochlear implantation. 相似文献

3.

Audiovisual integration of speech falters under high attention demands 总被引：11，自引：0，他引：11

Alsius A Navarra J Campbell R Soto-Faraco S 《Current biology : CB》2005,15(9):839-843

One of the most commonly cited examples of human multisensory integration occurs during exposure to natural speech, when the vocal and the visual aspects of the signal are integrated in a unitary percept. Audiovisual association of facial gestures and vocal sounds has been demonstrated in nonhuman primates and in prelinguistic children, arguing for a general basis for this capacity. One critical question, however, concerns the role of attention in such multisensory integration. Although both behavioral and neurophysiological studies have converged on a preattentive conceptualization of audiovisual speech integration, this mechanism has rarely been measured under conditions of high attentional load, when the observers' attention resources are depleted. We tested the extent to which audiovisual integration was modulated by the amount of available attentional resources by measuring the observers' susceptibility to the classic McGurk illusion in a dual-task paradigm. The proportion of visually influenced responses was severely, and selectively, reduced if participants were concurrently performing an unrelated visual or auditory task. In contrast with the assumption that crossmodal speech integration is automatic, our results suggest that these multisensory binding processes are subject to attentional demands. 相似文献

4.

Effects of Sound Frequency on Audiovisual Integration: An Event-Related Potential Study

Weiping Yang Jingjing Yang Yulin Gao Xiaoyu Tang Yanna Ren Satoshi Takahashi Jinglong Wu 《PloS one》2015,10(9)

A combination of signals across modalities can facilitate sensory perception. The audiovisual facilitative effect strongly depends on the features of the stimulus. Here, we investigated how sound frequency, which is one of basic features of an auditory signal, modulates audiovisual integration. In this study, the task of the participant was to respond to a visual target stimulus by pressing a key while ignoring auditory stimuli, comprising of tones of different frequencies (0.5, 1, 2.5 and 5 kHz). A significant facilitation of reaction times was obtained following audiovisual stimulation, irrespective of whether the task-irrelevant sounds were low or high frequency. Using event-related potential (ERP), audiovisual integration was found over the occipital area for 0.5 kHz auditory stimuli from 190–210 ms, for 1 kHz stimuli from 170–200 ms, for 2.5 kHz stimuli from 140–200 ms, 5 kHz stimuli from 100–200 ms. These findings suggest that a higher frequency sound signal paired with visual stimuli might be early processed or integrated despite the auditory stimuli being task-irrelevant information. Furthermore, audiovisual integration in late latency (300–340 ms) ERPs with fronto-central topography was found for auditory stimuli of lower frequencies (0.5, 1 and 2.5 kHz). Our results confirmed that audiovisual integration is affected by the frequency of an auditory stimulus. Taken together, the neurophysiological results provide unique insight into how the brain processes a multisensory visual signal and auditory stimuli of different frequencies. 相似文献

5.

Audiovisual integration: an investigation of the "streaming-bouncing" phenomenon

Remijn GB Ito H Nakajima Y 《Journal of PHYSIOLOGICAL ANTHROPOLOGY and Applied Human Science》2004,23(6):243-247

Temporal aspects of the perceptual integration of audiovisual information were investigated by utilizing the visual "streaming-bouncing" phenomenon. When two identical visual objects move towards each other, coincide, and then move away from each other, the objects can either be seen as streaming past one another or bouncing off each other. Although the streaming percept is dominant, the bouncing percept can be induced by presenting an auditory stimulus during the visual coincidence of the moving objects. Here we show that the bounce-inducing effect of the auditory stimulus is paramount when its onset and offset occur in temporal proximity of the onset and offset of the period of visual coincidence of the moving objects. When the duration of the auditory stimulus exceeded this period, visual bouncing disappears. Implications for a temporal window of audiovisual integration and the design of effective audiovisual warning signals are discussed. 相似文献

6.

Effects of Auditory Stimuli in the Horizontal Plane on Audiovisual Integration: An Event-Related Potential Study

Weiping Yang Qi Li Tatsuya Ochi Jingjing Yang Yulin Gao Xiaoyu Tang Satoshi Takahashi Jinglong Wu 《PloS one》2013,8(6)

This article aims to investigate whether auditory stimuli in the horizontal plane, particularly originating from behind the participant, affect audiovisual integration by using behavioral and event-related potential (ERP) measurements. In this study, visual stimuli were presented directly in front of the participants, auditory stimuli were presented at one location in an equidistant horizontal plane at the front (0°, the fixation point), right (90°), back (180°), or left (270°) of the participants, and audiovisual stimuli that include both visual stimuli and auditory stimuli originating from one of the four locations were simultaneously presented. These stimuli were presented randomly with equal probability; during this time, participants were asked to attend to the visual stimulus and respond promptly only to visual target stimuli (a unimodal visual target stimulus and the visual target of the audiovisual stimulus). A significant facilitation of reaction times and hit rates was obtained following audiovisual stimulation, irrespective of whether the auditory stimuli were presented in the front or back of the participant. However, no significant interactions were found between visual stimuli and auditory stimuli from the right or left. Two main ERP components related to audiovisual integration were found: first, auditory stimuli from the front location produced an ERP reaction over the right temporal area and right occipital area at approximately 160–200 milliseconds; second, auditory stimuli from the back produced a reaction over the parietal and occipital areas at approximately 360–400 milliseconds. Our results confirmed that audiovisual integration was also elicited, even though auditory stimuli were presented behind the participant, but no integration occurred when auditory stimuli were presented in the right or left spaces, suggesting that the human brain might be particularly sensitive to information received from behind than both sides. 相似文献

7.

Neuronal plasticity and multisensory integration in filial imprinting

Town SM McCabe BJ 《PloS one》2011,6(3):e17777

Many organisms sample their environment through multiple sensory systems and the integration of multisensory information enhances learning. However, the mechanisms underlying multisensory memory formation and their similarity to unisensory mechanisms remain unclear. Filial imprinting is one example in which experience is multisensory, and the mechanisms of unisensory neuronal plasticity are well established. We investigated the storage of audiovisual information through experience by comparing the activity of neurons in the intermediate and medial mesopallium of imprinted and naïve domestic chicks (Gallus gallus domesticus) in response to an audiovisual imprinting stimulus and novel object and their auditory and visual components. We find that imprinting enhanced the mean response magnitude of neurons to unisensory but not multisensory stimuli. Furthermore, imprinting enhanced responses to incongruent audiovisual stimuli comprised of mismatched auditory and visual components. Our results suggest that the effects of imprinting on the unisensory and multisensory responsiveness of IMM neurons differ and that IMM neurons may function to detect unexpected deviations from the audiovisual imprinting stimulus. 相似文献

8.

Cross-modal prediction in speech perception

Sánchez-García C Alsius A Enns JT Soto-Faraco S 《PloS one》2011,6(10):e25198

Speech perception often benefits from vision of the speaker's lip movements when they are available. One potential mechanism underlying this reported gain in perception arising from audio-visual integration is on-line prediction. In this study we address whether the preceding speech context in a single modality can improve audiovisual processing and whether this improvement is based on on-line information-transfer across sensory modalities. In the experiments presented here, during each trial, a speech fragment (context) presented in a single sensory modality (voice or lips) was immediately continued by an audiovisual target fragment. Participants made speeded judgments about whether voice and lips were in agreement in the target fragment. The leading single sensory context and the subsequent audiovisual target fragment could be continuous in either one modality only, both (context in one modality continues into both modalities in the target fragment) or neither modalities (i.e., discontinuous). The results showed quicker audiovisual matching responses when context was continuous with the target within either the visual or auditory channel (Experiment 1). Critically, prior visual context also provided an advantage when it was cross-modally continuous (with the auditory channel in the target), but auditory to visual cross-modal continuity resulted in no advantage (Experiment 2). This suggests that visual speech information can provide an on-line benefit for processing the upcoming auditory input through the use of predictive mechanisms. We hypothesize that this benefit is expressed at an early level of speech analysis. 相似文献

9.

An fMRI Study of Audiovisual Speech Perception Reveals Multisensory Interactions in Auditory Cortex

Kayoko Okada Jonathan H. Venezia William Matchin Kourosh Saberi Gregory Hickok 《PloS one》2013,8(6)

Research on the neural basis of speech-reading implicates a network of auditory language regions involving inferior frontal cortex, premotor cortex and sites along superior temporal cortex. In audiovisual speech studies, neural activity is consistently reported in posterior superior temporal Sulcus (pSTS) and this site has been implicated in multimodal integration. Traditionally, multisensory interactions are considered high-level processing that engages heteromodal association cortices (such as STS). Recent work, however, challenges this notion and suggests that multisensory interactions may occur in low-level unimodal sensory cortices. While previous audiovisual speech studies demonstrate that high-level multisensory interactions occur in pSTS, what remains unclear is how early in the processing hierarchy these multisensory interactions may occur. The goal of the present fMRI experiment is to investigate how visual speech can influence activity in auditory cortex above and beyond its response to auditory speech. In an audiovisual speech experiment, subjects were presented with auditory speech with and without congruent visual input. Holding the auditory stimulus constant across the experiment, we investigated how the addition of visual speech influences activity in auditory cortex. We demonstrate that congruent visual speech increases the activity in auditory cortex. 相似文献

10.

Why Middle-Aged Listeners Have Trouble Hearing in Everyday Settings

Ruggles D Bharadwaj H Shinn-Cunningham BG 《Current biology : CB》2012,22(15):1417-1422

Anecdotally, middle-aged listeners report difficulty conversing in social settings, even when they have normal audiometric thresholds [1-3]. Moreover, young adult listeners with "normal" hearing vary in their ability to selectively attend to speech amid similar streams of speech. Ignoring age, these individual differences correlate with physiological differences in temporal coding precision present in the auditory brainstem, suggesting that the fidelity of encoding of suprathreshold sound helps explain individual differences [4]. Here, we revisit the conundrum of whether early aging influences an individual's ability to communicate in everyday settings. Although absolute selective attention ability is not predicted by age, reverberant energy interferes more with selective attention as age increases. Breaking the brainstem response down into components corresponding to coding of?stimulus fine structure and envelope, we find that age alters which brainstem component predicts performance. Specifically, middle-aged listeners appear to rely heavily on temporal fine structure, which is more disrupted by reverberant energy than temporal envelope structure is. In contrast, the fidelity of envelope cues predicts performance in younger adults. These results hint that temporal envelope cues influence spatial hearing in reverberant settings more than is commonly appreciated and help explain why middle-aged listeners have particular difficulty communicating in daily life. 相似文献

11.

Dissociated Roles of the Inferior Frontal Gyrus and Superior Temporal Sulcus in Audiovisual Processing: Top-Down and Bottom-Up Mismatch Detection

Takeshi Uno Kensuke Kawai Katsuyuki Sakai Toshihiro Wakebe Takuya Ibaraki Naoto Kunii Takeshi Matsuo Nobuhito Saito 《PloS one》2015,10(3)

Visual inputs can distort auditory perception, and accurate auditory processing requires the ability to detect and ignore visual input that is simultaneous and incongruent with auditory information. However, the neural basis of this auditory selection from audiovisual information is unknown, whereas integration process of audiovisual inputs is intensively researched. Here, we tested the hypothesis that the inferior frontal gyrus (IFG) and superior temporal sulcus (STS) are involved in top-down and bottom-up processing, respectively, of target auditory information from audiovisual inputs. We recorded high gamma activity (HGA), which is associated with neuronal firing in local brain regions, using electrocorticography while patients with epilepsy judged the syllable spoken by a voice while looking at a voice-congruent or -incongruent lip movement from the speaker. The STS exhibited stronger HGA if the patient was presented with information of large audiovisual incongruence than of small incongruence, especially if the auditory information was correctly identified. On the other hand, the IFG exhibited stronger HGA in trials with small audiovisual incongruence when patients correctly perceived the auditory information than when patients incorrectly perceived the auditory information due to the mismatched visual information. These results indicate that the IFG and STS have dissociated roles in selective auditory processing, and suggest that the neural basis of selective auditory processing changes dynamically in accordance with the degree of incongruity between auditory and visual information. 相似文献

12.

No,There Is No 150 ms Lead of Visual Speech on Auditory Speech,but a Range of Audiovisual Asynchronies Varying from Small Audio Lead to Large Audio Lag

Jean-Luc Schwartz Christophe Savariaux 《PLoS computational biology》2014,10(7)

An increasing number of neuroscience papers capitalize on the assumption published in this journal that visual speech would be typically 150 ms ahead of auditory speech. It happens that the estimation of audiovisual asynchrony in the reference paper is valid only in very specific cases, for isolated consonant-vowel syllables or at the beginning of a speech utterance, in what we call “preparatory gestures”. However, when syllables are chained in sequences, as they are typically in most parts of a natural speech utterance, asynchrony should be defined in a different way. This is what we call “comodulatory gestures” providing auditory and visual events more or less in synchrony. We provide audiovisual data on sequences of plosive-vowel syllables (pa, ta, ka, ba, da, ga, ma, na) showing that audiovisual synchrony is actually rather precise, varying between 20 ms audio lead and 70 ms audio lag. We show how more complex speech material should result in a range typically varying between 40 ms audio lead and 200 ms audio lag, and we discuss how this natural coordination is reflected in the so-called temporal integration window for audiovisual speech perception. Finally we present a toy model of auditory and audiovisual predictive coding, showing that visual lead is actually not necessary for visual prediction. 相似文献

13.

The music of your emotions: neural substrates involved in detection of emotional correspondence between auditory and visual music actions

Petrini K Crabbe F Sheridan C Pollick FE 《PloS one》2011,6(4):e19165

In humans, emotions from music serve important communicative roles. Despite a growing interest in the neural basis of music perception, action and emotion, the majority of previous studies in this area have focused on the auditory aspects of music performances. Here we investigate how the brain processes the emotions elicited by audiovisual music performances. We used event-related functional magnetic resonance imaging, and in Experiment 1 we defined the areas responding to audiovisual (musician's movements with music), visual (musician's movements only), and auditory emotional (music only) displays. Subsequently a region of interest analysis was performed to examine if any of the areas detected in Experiment 1 showed greater activation for emotionally mismatching performances (combining the musician's movements with mismatching emotional sound) than for emotionally matching music performances (combining the musician's movements with matching emotional sound) as presented in Experiment 2 to the same participants. The insula and the left thalamus were found to respond consistently to visual, auditory and audiovisual emotional information and to have increased activation for emotionally mismatching displays in comparison with emotionally matching displays. In contrast, the right thalamus was found to respond to audiovisual emotional displays and to have similar activation for emotionally matching and mismatching displays. These results suggest that the insula and left thalamus have an active role in detecting emotional correspondence between auditory and visual information during music performances, whereas the right thalamus has a different role. 相似文献

14.

How bodies and voices interact in early emotion perception

Jessen S Obleser J Kotz SA 《PloS one》2012,7(4):e36070

Successful social communication draws strongly on the correct interpretation of others' body and vocal expressions. Both can provide emotional information and often occur simultaneously. Yet their interplay has hardly been studied. Using electroencephalography, we investigated the temporal development underlying their neural interaction in auditory and visual perception. In particular, we tested whether this interaction qualifies as true integration following multisensory integration principles such as inverse effectiveness. Emotional vocalizations were embedded in either low or high levels of noise and presented with or without video clips of matching emotional body expressions. In both, high and low noise conditions, a reduction in auditory N100 amplitude was observed for audiovisual stimuli. However, only under high noise, the N100 peaked earlier in the audiovisual than the auditory condition, suggesting facilitatory effects as predicted by the inverse effectiveness principle. Similarly, we observed earlier N100 peaks in response to emotional compared to neutral audiovisual stimuli. This was not the case in the unimodal auditory condition. Furthermore, suppression of beta-band oscillations (15-25 Hz) primarily reflecting biological motion perception was modulated 200-400 ms after the vocalization. While larger differences in suppression between audiovisual and audio stimuli in high compared to low noise levels were found for emotional stimuli, no such difference was observed for neutral stimuli. This observation is in accordance with the inverse effectiveness principle and suggests a modulation of integration by emotional content. Overall, results show that ecologically valid, complex stimuli such as joined body and vocal expressions are effectively integrated very early in processing. 相似文献

15.

Affect induction through musical sounds: an ethological perspective

David Huron 《Philosophical transactions of the Royal Society of London. Series B, Biological sciences》2015,370(1664)

How does music induce or evoke feeling states in listeners? A number of mechanisms have been proposed for how sounds induce emotions, including innate auditory responses, learned associations and mirror neuron processes. Inspired by ethology, it is suggested that the ethological concepts of signals, cues and indices offer additional analytic tools for better understanding induced affect. It is proposed that ethological concepts help explain why music is able to induce only certain emotions, why some induced emotions are similar to the displayed emotion (whereas other induced emotions differ considerably from the displayed emotion), why listeners often report feeling mixed emotions and why only some musical expressions evoke similar responses across cultures. 相似文献

16.

Audiovisual integration of letters in the human brain 总被引：5，自引：0，他引：5

Raij T Uutela K Hari R 《Neuron》2000,28(2):617-625

Letters of the alphabet have auditory (phonemic) and visual (graphemic) qualities. To investigate the neural representations of such audiovisual objects, we recorded neuromagnetic cortical responses to auditorily, visually, and audiovisually presented single letters. The auditory and visual brain activations first converged around 225 ms after stimulus onset and then interacted predominantly in the right temporo-occipito-parietal junction (280345 ms) and the left (380-540 ms) and right (450-535 ms) superior temporal sulci. These multisensory brain areas, playing a role in audiovisual integration of phonemes and graphemes, participate in the neural network supporting the supramodal concept of a "letter." The dynamics of these functions bring new insight into the interplay between sensory and association cortices during object recognition. 相似文献

17.

The processing of audio-visual speech: empirical and neural bases 总被引：2，自引：0，他引：2

Campbell R 《Philosophical transactions of the Royal Society of London. Series B, Biological sciences》2008,363(1493):1001-1010

In this selective review, I outline a number of ways in which seeing the talker affects auditory perception of speech, including, but not confined to, the McGurk effect. To date, studies suggest that all linguistic levels are susceptible to visual influence, and that two main modes of processing can be described: a complementary mode, whereby vision provides information more efficiently than hearing for some under-specified parts of the speech stream, and a correlated mode, whereby vision partially duplicates information about dynamic articulatory patterning.Cortical correlates of seen speech suggest that at the neurological as well as the perceptual level, auditory processing of speech is affected by vision, so that 'auditory speech regions' are activated by seen speech. The processing of natural speech, whether it is heard, seen or heard and seen, activates the perisylvian language regions (left>right). It is highly probable that activation occurs in a specific order. First, superior temporal, then inferior parietal and finally inferior frontal regions (left>right) are activated. There is some differentiation of the visual input stream to the core perisylvian language system, suggesting that complementary seen speech information makes special use of the visual ventral processing stream, while for correlated visual speech, the dorsal processing stream, which is sensitive to visual movement, may be relatively more involved. 相似文献

18.

Behavioral and Neurogenomic Responses to Acoustic and Visual Sexual Cues are Correlated in Female Torrent Frogs

Longhui ZHAO Jichao WANG Yanlin CAI Jianghong RAN Steven E.BRAUTH Yezhong TANG Jianguo CUI 《亚洲两栖爬行动物研究(英文版)》2021,(1):88-99

Diverse a nimal species use multimodal communica tion signals to coordina te reproductive behavior.Despite active research in this field,the brain mechanisms underlying multimodal communication remain poorly understood.Similar to humans and many mammalian species,anurans often produce auditory signals accompanied by conspicuous visual cues(e.g.,vocal sac inflation).In this study,we used video playbacks to determine the role of vocal-sac inflation in little torrent frogs(Amolops torrentis).Then we exposed females to blank,visual,auditory,and audiovisual stimuli and analyzed whole brain tissue gene expression changes using RNAseq.The results showed that both auditory cues(i.e.,male advertisement calls)and visual cues were attractive to female frogs,although auditory cues were more attractive than visual cues.Females preferred simultaneous bimodal cues to unimodal cues.The hierarchical clustering of differentially expressed genes showed a close relationship between neurogenomic states and momentarily expressed sexual signals.We also found that the Gene Ontology terms and KEGG pathways involved in energy metabolism were mostly increased in blank contrast versus visual,acoustic,or audiovisual stimuli,indicating that brain energy use may play an important role in response to these stimuli.In sum,behavioral and neurogenomic responses to acoustic and visual cues are correlated in female little torrent frogs. 相似文献

19.

The attentional window modulates capture by audiovisual events

E Van der Burg CN Olivers J Theeuwes 《PloS one》2012,7(7):e39137

Visual search is markedly improved when a target color change is synchronized with a spatially non-informative auditory signal. This "pip and pop" effect is an automatic process as even a distractor captures attention when accompanied by a tone. Previous studies investigating visual attention have indicated that automatic capture is susceptible to the size of the attentional window. The present study investigated whether the pip and pop effect is modulated by the extent to which participants divide their attention across the visual field We show that participants were better in detecting a synchronized audiovisual event when they divided their attention across the visual field relative to a condition in which they focused their attention. We argue that audiovisual capture is reduced under focused conditions relative to distributed settings. 相似文献

20.

Encoding of illusory continuity in primary auditory cortex

Petkov CI O'Connor KN Sutter ML 《Neuron》2007,54(1):153-165

When interfering objects occlude a scene, the visual system restores the occluded information. Similarly, when a sound of interest (a "foreground" sound) is interrupted (occluded) by loud noise, the auditory system restores the occluded information. This process, called auditory induction, can be exploited to create a continuity illusion. When a segment of a foreground sound is deleted and loud noise fills the missing portion, listeners incorrectly report hearing the foreground continuing through the noise. Here we reveal the neurophysiological underpinnings of illusory continuity in single-neuron responses from awake macaque monkeys' primary auditory cortex (A1). A1 neurons represented the missing segment of occluded tonal foregrounds by responding to discontinuous foregrounds interrupted by intense noise as if they were responding to the complete foregrounds. By comparison, simulated peripheral responses represented only the noise and not the occluded foreground. The results reveal that many A1 single-neuron responses closely follow the illusory percept. 相似文献