Similar literature
Found 20 similar articles; search took 15 ms
1.
Audiovisual integration of speech falters under high attention demands (total citations: 11; self-citations: 0; citations by others: 11)
One of the most commonly cited examples of human multisensory integration occurs during exposure to natural speech, when the vocal and the visual aspects of the signal are integrated into a unitary percept. Audiovisual association of facial gestures and vocal sounds has been demonstrated in nonhuman primates and in prelinguistic children, arguing for a general basis for this capacity. One critical question, however, concerns the role of attention in such multisensory integration. Although both behavioral and neurophysiological studies have converged on a preattentive conceptualization of audiovisual speech integration, this mechanism has rarely been measured under conditions of high attentional load, when the observers' attentional resources are depleted. We tested the extent to which audiovisual integration was modulated by the amount of available attentional resources by measuring the observers' susceptibility to the classic McGurk illusion in a dual-task paradigm. The proportion of visually influenced responses was severely, and selectively, reduced if participants were concurrently performing an unrelated visual or auditory task. In contrast with the assumption that crossmodal speech integration is automatic, our results suggest that these multisensory binding processes are subject to attentional demands.

2.
Elucidating the structure and function of joint vocal displays (e.g. duet, chorus) recorded with a conventional microphone has proved difficult in some animals owing to the complex acoustic properties of the combined signal, a problem reminiscent of multi-speaker conversations in humans. Towards this goal, we set out to simultaneously compare air-transmitted (AT) with radio-transmitted (RT) vocalizations in one pair of humans and one pair of captive Bolivian grey titi monkeys (Plecturocebus donacophilus), all equipped with an accelerometer (or vibration transducer) closely apposed to the larynx. First, we observed no crosstalk between the two radio transmitters when subjects produced vocalizations at the same time close to each other. Second, compared with AT acoustic recordings, sound segmentation and pitch tracking of the RT signal were more accurate, particularly in a noisy and reverberating environment. Third, RT signals were less noisy than AT signals and displayed more stable amplitude regardless of distance, orientation and environment of the animal. The microphone outperformed the accelerometer with respect to sound spectral bandwidth and speech intelligibility: the sounds of RT speech were more attenuated and dampened as compared to AT speech. Importantly, we show that vocal telemetry allows reliable separation of the subjects’ voices during production of joint vocalizations, which has great potential for future applications of this technique with free-ranging animals.

3.
Rigoulot S, Pell MD. PLoS ONE. 2012;7(1):e30740
Interpersonal communication involves the processing of multimodal emotional cues, particularly facial expressions (visual modality) and emotional speech prosody (auditory modality), which can interact during information processing. Here, we investigated whether the implicit processing of emotional prosody systematically influences gaze behavior to facial expressions of emotion. We analyzed the eye movements of 31 participants as they scanned a visual array of four emotional faces portraying fear, anger, happiness, and neutrality, while listening to an emotionally-inflected pseudo-utterance ("Someone migged the pazing") uttered in a congruent or incongruent tone. Participants heard the emotional utterance during the first 1250 milliseconds of a five-second visual array and then performed an immediate recall decision about the face they had just seen. The frequency and duration of first saccades and of total looks in three temporal windows ([0-1250 ms], [1250-2500 ms], [2500-5000 ms]) were analyzed according to the emotional content of faces and voices. Results showed that participants looked longer and more frequently at faces that matched the prosody in all three time windows (emotion congruency effect), although this effect was often emotion-specific (with greatest effects for fear). Effects of prosody on visual attention to faces persisted over time and could be detected long after the auditory information was no longer present. These data imply that emotional prosody is processed automatically during communication and that these cues play a critical role in how humans respond to related visual cues in the environment, such as facial expressions.

4.
Language is a uniquely human trait, and questions of how and why it evolved have been intriguing scientists for years. Nonhuman primates (hereafter, primates) are our closest living relatives, and their behavior can be used to estimate the capacities of our extinct ancestors. As humans and many primate species rely on vocalizations as their primary mode of communication, the vocal behavior of primates has been an obvious target for studies investigating the evolutionary roots of human speech and language. By studying the similarities and differences between human and primate vocalizations, comparative research has the potential to clarify the evolutionary processes that shaped human speech and language. This review examines some of the seminal and recent studies that contribute to our knowledge regarding the link between primate calls and human language and speech. We focus on three main aspects of primate vocal behavior: functional reference, call combinations, and vocal learning. Studies in these areas indicate that despite important differences, primate vocal communication exhibits some key features characterizing human language. They also indicate, however, that some critical aspects of speech, such as vocal plasticity, are not shared with our primate cousins. We conclude that comparative research on primate vocal behavior is a very promising tool for deepening our understanding of the evolution of human speech and language, but much is still to be done as many aspects of monkey and ape vocalizations remain largely unexplored.

5.
Infant-directed (ID) speech provides exaggerated auditory and visual prosodic cues. Here we investigated whether infants were sensitive to the match between the auditory and visual correlates of ID speech prosody. We presented 8-month-old infants with two silent line-joined point-light displays of faces speaking different ID sentences, and a single vocal-only sentence matched to one of the displays. Infants looked longer at the matched than at the mismatched visual signal both when full-spectrum speech was presented and when the vocal signals contained speech low-pass filtered at 400 Hz. When the visual display was separated into rigid (head only) and non-rigid (face only) motion, the infants looked longer at the visual match in the rigid condition and at the visual mismatch in the non-rigid condition. Overall, the results suggest that 8-month-olds can extract information about the prosodic structure of speech from voice and head kinematics and are sensitive to their match, and that they are less sensitive to the match between lip and voice information in connected speech.

6.
Vocal-tract resonances (or formants) are acoustic signatures in the voice and are related to the shape and length of the vocal tract. Formants play an important role in human communication, helping us not only to distinguish several different speech sounds [1], but also to extract important information related to the physical characteristics of the speaker, so-called indexical cues. How did formants come to play such an important role in human vocal communication? One hypothesis suggests that the ancestral role of formant perception (a role that might be present in extant nonhuman primates) was to provide indexical cues [2-5]. Although formants are present in the acoustic structure of vowel-like calls of monkeys [3-8] and implicated in the discrimination of call types [8-10], it is not known whether they use this feature to extract indexical cues. Here, we investigate whether rhesus monkeys can use the formant structure in their "coo" calls to assess the age-related body size of conspecifics. Using a preferential-looking paradigm [11, 12] and synthetic coo calls in which formant structure simulated an adult/large- or juvenile/small-sounding individual, we demonstrate that untrained monkeys attend to formant cues and link large-sounding coos to large faces and small-sounding coos to small faces; in essence, they can, like humans [13], use formants as indicators of age-related body size.

7.
Species-specific vocalizations fall into two broad categories: those that emerge during maturation, independent of experience, and those that depend on early life interactions with conspecifics. Human language and the communication systems of a small number of other species, including songbirds, fall into this latter class of vocal learning. Self-monitoring has been assumed to play an important role in the vocal learning of speech, and studies demonstrate that perception of one's own voice is crucial for both the development and lifelong maintenance of vocalizations in humans and songbirds. Experimental modifications of auditory feedback can also change vocalizations in both humans and songbirds. However, with the exception of large manipulations of timing, no study to date has directly examined the use of auditory feedback in speech production under the age of 4. Here we use a real-time formant perturbation task to compare the response of toddlers, children, and adults to altered feedback. Children and adults reacted to this manipulation by changing their vowels in a direction opposite to the perturbation. Surprisingly, toddlers' speech did not change in response to altered feedback, suggesting that long-held assumptions regarding the role of self-perception in articulatory development need to be reconsidered.

8.
Vocal and facial masculinity are cues to underlying testosterone in men and influence women’s mate preferences. Consistent with the proposal that facial and vocal masculinity signal common information about men, prior work has revealed correlated female preferences for male facial and vocal masculinity. Previous studies have assessed women’s preferences for male facial and vocal masculinity by presenting faces and voices independently and using static face stimuli. By contrast, here we presented women with short video clips in which male faces and voices were simultaneously manipulated in masculinity. We found that women who preferred masculine faces also preferred masculine voices. Furthermore, women whose faces were rated as relatively more attractive preferred both facial and vocal masculinity more than did women whose faces were rated as less attractive. These findings complement other evidence for cross‐modal masculinity preferences among women and demonstrate that preferences observed in studies using still images and/or independently presented vocal stimuli are also observed when dynamic faces and voices are displayed simultaneously in video format.

9.
The observed respect and attention to elders' speech in traditional cultures appears to have a ‘universal’ component, which raises the question of its possible biological bases. Animals pay differential attention to the vocalizations of other individuals according to their characteristics, but little is known about the potential propensity to pay more attention to vocalizations of elders. On the basis of several hundred recorded vocal exchanges, here we show that aged female Campbell's monkeys (Cercopithecus campbelli), despite being significantly less ‘loquacious’ than their younger adult counterparts, elicit many more responses when calling. These findings show that attention to elders' vocal production appears in non-human primates, leading to new lines of questioning on human culture and language evolution.

10.
Evidence is presented from recordings made from captive gelada monkeys (Theropithecus gelada) that these monkeys are capable of synchronizing the onsets of their own vocal sounds to the anticipated onsets of sounds produced by other gelada voices. The possibility is discussed that in order to synchronize the onsets of their own sounds to the anticipated onsets of sounds made by other voices, geladas have to possess the ability to “figure out” the tempo and rhythm of the vocal strings produced by the other voices and to precisely control the timing of their own voices. It is suggested that geladas do synchronize their voices by using precise temporal and rhythmical controls on the outputs of their voices that are analogous to the temporal and rhythmical abilities humans use in many of the supra-segmental aspects of speech.

11.
Considerable knowledge is available on the neural substrates for speech and language from brain-imaging studies in humans, but until recently there was a lack of data for comparison from other animal species on the evolutionarily conserved brain regions that process species-specific communication signals. To obtain new insights into the relationship of the substrates for communication in primates, we compared the results from several neuroimaging studies in humans with those that have recently been obtained from macaque monkeys and chimpanzees. The recent work in humans challenges the longstanding notion of highly localized speech areas. As a result, the brain regions that have been identified in humans for speech and nonlinguistic voice processing show a striking general correspondence to how the brains of other primates analyze species-specific vocalizations or information in the voice, such as voice identity. The comparative neuroimaging work has begun to clarify evolutionary relationships in brain function, supporting the notion that the brain regions that process communication signals in the human brain arose from a precursor network of regions that is present in nonhuman primates and is used for processing species-specific vocalizations. We conclude by considering how the stage now seems to be set for comparative neurobiology to characterize the ancestral state of the network that evolved in humans to support language.

12.
Songbirds are one of the few groups of animals that learn the sounds used for vocal communication during development. Like humans, songbirds memorize vocal sounds based on auditory experience with vocalizations of adult “tutors”, and then use auditory feedback of self-produced vocalizations to gradually match their motor output to the memory of tutor sounds. In humans, investigations of early vocal learning have focused mainly on perceptual skills of infants, whereas studies of songbirds have focused on measures of vocal production. In order to fully exploit songbirds as a model for human speech, understand the neural basis of learned vocal behavior, and investigate links between vocal perception and production, studies of songbirds must examine both behavioral measures of perception and neural measures of discrimination during development. Here we used behavioral and electrophysiological assays of the ability of songbirds to distinguish vocal calls of varying frequencies at different stages of vocal learning. The results show that neural tuning in auditory cortex mirrors behavioral improvements in the ability to make perceptual distinctions of vocal calls as birds are engaged in vocal learning. Thus, separate measures of neural discrimination and behavioral perception yielded highly similar trends during the course of vocal development. The timing of this improvement in the ability to distinguish vocal sounds correlates with our previous work showing substantial refinement of axonal connectivity in cortico-basal ganglia pathways necessary for vocal learning.

13.
Research on the neural basis of speech-reading implicates a network of auditory language regions involving inferior frontal cortex, premotor cortex and sites along superior temporal cortex. In audiovisual speech studies, neural activity is consistently reported in posterior superior temporal sulcus (pSTS), and this site has been implicated in multimodal integration. Traditionally, multisensory interactions are considered high-level processing that engages heteromodal association cortices (such as STS). Recent work, however, challenges this notion and suggests that multisensory interactions may occur in low-level unimodal sensory cortices. While previous audiovisual speech studies demonstrate that high-level multisensory interactions occur in pSTS, what remains unclear is how early in the processing hierarchy these multisensory interactions may occur. The goal of the present fMRI experiment is to investigate how visual speech can influence activity in auditory cortex above and beyond its response to auditory speech. In an audiovisual speech experiment, subjects were presented with auditory speech with and without congruent visual input. Holding the auditory stimulus constant across the experiment, we investigated how the addition of visual speech influences activity in auditory cortex. We demonstrate that congruent visual speech increases the activity in auditory cortex.

14.
Most previous studies of vocal attractiveness have focused on preferences for physical characteristics of voices such as pitch. Here we examine the content of vocalizations in interaction with such physical traits, finding that vocal cues of social interest modulate the strength of men's preferences for raised pitch in women's voices. Men showed stronger preferences for raised pitch when judging the voices of women who appeared interested in the listener than when judging the voices of women who appeared relatively uninterested in the listener. These findings show that voice preferences are not determined solely by physical properties of voices and that men integrate information about voice pitch and the degree of social interest expressed by women when forming voice preferences. Women's preferences for raised pitch in women's voices were not modulated by cues of social interest, suggesting that the integration of cues of social interest and voice pitch when men judge the attractiveness of women's voices may reflect adaptations that promote efficient allocation of men's mating effort.

15.
The perception of emotions is often suggested to be multimodal in nature, and bimodal as compared to unimodal (auditory or visual) presentation of emotional stimuli can lead to superior emotion recognition. In previous studies, contrastive aftereffects in emotion perception caused by perceptual adaptation have been shown for faces and for auditory affective vocalization, when adaptors were of the same modality. By contrast, crossmodal aftereffects in the perception of emotional vocalizations have not been demonstrated yet. In three experiments we investigated the influence of emotional voice as well as dynamic facial video adaptors on the perception of emotion-ambiguous voices morphed on an angry-to-happy continuum. Contrastive aftereffects were found for unimodal (voice) adaptation conditions, in that test voices were perceived as happier after adaptation to angry voices, and vice versa. Bimodal (voice + dynamic face) adaptors tended to elicit larger contrastive aftereffects. Importantly, crossmodal (dynamic face) adaptors also elicited substantial aftereffects in male, but not in female participants. Our results (1) support the idea of contrastive processing of emotions, (2) show for the first time crossmodal adaptation effects under certain conditions, consistent with the idea that emotion processing is multimodal in nature, and (3) suggest gender differences in the sensory integration of facial and vocal emotional stimuli.

16.
Implicit multisensory associations influence voice recognition (total citations: 4; self-citations: 1; citations by others: 3)
Natural objects provide partially redundant information to the brain through different sensory modalities. For example, voices and faces both give information about the speech content, age, and gender of a person. Thanks to this redundancy, multimodal recognition is fast, robust, and automatic. In unimodal perception, however, only part of the information about an object is available. Here, we addressed whether, even under conditions of unimodal sensory input, crossmodal neural circuits that have been shaped by previous associative learning become activated and underpin a performance benefit. We measured brain activity with functional magnetic resonance imaging before, while, and after participants learned to associate either sensory redundant stimuli, i.e. voices and faces, or arbitrary multimodal combinations, i.e. voices and written names, ring tones, and cell phones or brand names of these cell phones. After learning, participants were better at recognizing unimodal auditory voices that had been paired with faces than those paired with written names, and association of voices with faces resulted in an increased functional coupling between voice and face areas. No such effects were observed for ring tones that had been paired with cell phones or names. These findings demonstrate that brief exposure to ecologically valid and sensory redundant stimulus pairs, such as voices and faces, induces specific multisensory associations. Consistent with predictive coding theories, associative representations become thereafter available for unimodal perception and facilitate object recognition. These data suggest that for natural objects effective predictive signals can be generated across sensory systems and proceed by optimization of functional connectivity between specialized cortical sensory modules.

17.
Research into speech perception by nonhuman animals can be crucially informative in assessing whether specific perceptual phenomena in humans have evolved to decode speech, or reflect more general traits. Birds share with humans not only the capacity to use complex vocalizations for communication but also many characteristics of its underlying developmental and mechanistic processes; thus, birds are a particularly interesting group for comparative study. This review first discusses commonalities between birds and humans in perception of speech sounds. Several psychoacoustic studies have shown striking parallels in seemingly speech-specific perceptual phenomena, such as categorical perception of voice-onset-time variation, categorization of consonants that lack phonetic invariance, and compensation for coarticulation. Such findings are often regarded as evidence for the idea that the objects of human speech perception are auditory or acoustic events rather than articulations. Next, I highlight recent research on the production side of avian communication that has revealed the existence of vocal tract filtering and articulation in bird species-specific vocalization, which has traditionally been considered a hallmark of human speech production. Together, findings in birds show that many characteristics of human speech perception are not uniquely human, but also that a comparative approach to the question of what the objects of perception are (articulatory or auditory events) requires careful consideration of species-specific vocal production mechanisms.

18.
Here we report findings from neuropsychological investigations showing the existence, in humans, of intersensory integrative systems representing space through the multisensory coding of visual and tactile events. In addition, these findings show that visuo-tactile integration may take place in a privileged manner within a limited sector of space closely surrounding the body surface, i.e., the near-peripersonal space. They also demonstrate that the representation of near-peripersonal space is not static, as objects in the out-of-reach space can be processed as nearer, depending upon the (illusory) visual information about hand position in space, and the use of tools as physical extensions of the reachable space. Finally, new evidence is provided suggesting that the multisensory coding of peripersonal space can be achieved through bottom-up processing that, at least in some instances, is not necessarily modulated by more "cognitive" top-down processing, such as the expectation regarding the possibility of being touched. These findings are entirely consistent with the functional properties of multisensory neuronal structures coding near-peripersonal space in monkeys, as well as with behavioral and neuroimaging evidence for the cross-modal coding of space in normal subjects. This high level of convergence ultimately favors the idea that multisensory space coding is achieved through similar multimodal structures in both humans and non-human primates.

19.
The ability to recognize faces is an important socio-cognitive skill that is associated with a number of cognitive specializations in humans. While numerous studies have examined the presence of these specializations in non-human primates, species where face recognition would confer distinct advantages in social situations, results have been mixed. The majority of studies in chimpanzees support homologous face-processing mechanisms with humans, but results from monkey studies appear largely dependent on the type of testing methods used. Studies that employ passive viewing paradigms, like the visual paired comparison task, report evidence of similarities between monkeys and humans, but tasks that use more stringent, operant response tasks, like the matching-to-sample task, often report species differences. Moreover, the data suggest that monkeys may be less sensitive than chimpanzees and humans to the precise spacing of facial features, in addition to the surface-based cues reflected in those features, information that is critical for the representation of individual identity. The aim of this paper is to provide a comprehensive review of the available data from face-processing tasks in non-human primates with the goal of understanding the evolution of this complex cognitive skill.

20.
For humans and animals, the ability to discriminate speech and conspecific vocalizations is an important physiological task of the auditory system. To reveal the underlying neural mechanism, many electrophysiological studies have investigated the neural responses of the auditory cortex to conspecific vocalizations in monkeys. The data suggest that vocalizations may be hierarchically processed along an anterior/ventral stream from the primary auditory cortex (A1) to the ventral prefrontal cortex. To date, the organization of vocalization processing has not been well investigated in the auditory cortex of other mammals. In this study, we examined the spike activities of single neurons in two early auditory cortical regions with different anteroposterior locations, the anterior auditory field (AAF) and the posterior auditory field (PAF), in awake cats as the animals were passively listening to forward and backward conspecific calls (meows) and human vowels. We found that the neural response patterns in PAF were more complex and had longer latency than those in AAF. The selectivity for different vocalizations based on the mean firing rate was low in both AAF and PAF, and not significantly different between them; however, more vocalization information was transmitted when the temporal response profiles were considered, and the maximum transmitted information by PAF neurons was higher than that by AAF neurons. Discrimination accuracy based on the activities of an ensemble of PAF neurons was also better than that of AAF neurons. Our results suggest that AAF and PAF are similar with regard to which vocalizations they represent but differ in the way they represent these vocalizations, and there may be a complex processing stream between them.
