Similar articles
20 similar articles found.
1.
A complete neurobiological understanding of speech motor control requires determination of the relationship between simultaneously recorded neural activity and the kinematics of the lips, jaw, tongue, and larynx. Many speech articulators are internal to the vocal tract, and therefore simultaneously tracking the kinematics of all articulators is nontrivial—especially in the context of human electrophysiology recordings. Here, we describe a noninvasive, multi-modal imaging system to monitor vocal tract kinematics, demonstrate this system in six speakers during production of nine American English vowels, and provide new analysis of such data. Classification and regression analysis revealed considerable variability in the articulator-to-acoustic relationship across speakers. Non-negative matrix factorization extracted basis sets capturing vocal tract shapes allowing for higher vowel classification accuracy than traditional methods. Statistical speech synthesis generated speech from vocal tract measurements, and we demonstrate perceptual identification. We demonstrate the capacity to predict lip kinematics from ventral sensorimotor cortical activity. These results demonstrate a multi-modal system to non-invasively monitor articulator kinematics during speech production, describe novel analytic methods for relating kinematic data to speech acoustics, and provide the first decoding of speech kinematics from electrocorticography. These advances will be critical for understanding the cortical basis of speech production and the creation of vocal prosthetics.
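A minimal sketch of the kind of non-negative matrix factorization described above: extracting a small set of basis "vocal tract shapes" whose non-negative mixtures reconstruct per-frame articulator profiles. This is not the authors' implementation; the data here are synthetic stand-ins with a known rank-3 structure.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Synthetic stand-in for nonnegative articulator data:
# 200 frames, each a 40-point midline profile mixing 3 underlying shapes.
true_bases = rng.random((3, 40))
activations = rng.random((200, 3))
X = activations @ true_bases

model = NMF(n_components=3, init="nndsvda", max_iter=1000, random_state=0)
W = model.fit_transform(X)   # per-frame activation of each basis shape
H = model.components_        # extracted basis "vocal tract shapes"

relative_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

Classification can then operate on the low-dimensional activations `W` rather than the raw kinematic frames, which is the sense in which the basis set can improve vowel classification.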

2.
This article describes a neural network model that addresses the acquisition of speaking skills by infants and subsequent motor equivalent production of speech sounds. The model learns two mappings during a babbling phase. A phonetic-to-orosensory mapping specifies a vocal tract target for each speech sound; these targets take the form of convex regions in orosensory coordinates defining the shape of the vocal tract. The babbling process wherein these convex region targets are formed explains how an infant can learn phoneme-specific and language-specific limits on acceptable variability of articulator movements. The model also learns an orosensory-to-articulatory mapping wherein cells coding desired movement directions in orosensory space learn articulator movements that achieve these orosensory movement directions. The resulting mapping provides a natural explanation for the formation of coordinative structures. This mapping also makes efficient use of redundancy in the articulator system, thereby providing the model with motor equivalent capabilities. Simulations verify the model's ability to compensate for constraints or perturbations applied to the articulators automatically and without new learning and to explain contextual variability seen in human speech production. Supported in part by AFOSR F49620-92-J-0499.

3.
We address the hypothesis that postures adopted during grammatical pauses in speech production are more "mechanically advantageous" than absolute rest positions for facilitating efficient postural motor control of vocal tract articulators. We quantify vocal tract posture corresponding to inter-speech pauses, absolute rest intervals as well as vowel and consonant intervals using automated analysis of video captured with real-time magnetic resonance imaging during production of read and spontaneous speech by 5 healthy speakers of American English. We then use locally-weighted linear regression to estimate the articulatory forward map from low-level articulator variables to high-level task/goal variables for these postures. We quantify the overall magnitude of the first derivative of the forward map as a measure of mechanical advantage. We find that postures assumed during grammatical pauses in speech as well as speech-ready postures are significantly more mechanically advantageous than postures assumed during absolute rest. Further, these postures represent empirical extremes of mechanical advantage, between which lie the postures assumed during various vowels and consonants. Relative mechanical advantage of different postures might be an important physical constraint influencing planning and control of speech production.
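A sketch, under simplifying assumptions, of the analysis idea above: estimate the forward map with locally-weighted linear regression and take the magnitude of its local first derivative (the slope matrix) as "mechanical advantage." The toy map and all variable names are hypothetical, not the paper's data.

```python
import numpy as np

def lwr_jacobian(X, Y, x0, tau=0.2):
    """Locally-weighted linear regression around x0: fit Y ~ a + J (x - x0)
    with Gaussian weights and return the local slope matrix J, i.e. the
    first derivative of the articulatory forward map at that posture."""
    sw = np.sqrt(np.exp(-np.sum((X - x0) ** 2, axis=1) / (2 * tau ** 2)))
    Xd = np.hstack([np.ones((len(X), 1)), X - x0])       # intercept + centered inputs
    beta, *_ = np.linalg.lstsq(Xd * sw[:, None], Y * sw[:, None], rcond=None)
    return beta[1:].T          # rows: task variables, cols: articulator variables

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (500, 2))                               # articulator variables
Y = np.column_stack([np.sin(2 * X[:, 0]), X[:, 0] * X[:, 1]])  # task variables

J = lwr_jacobian(X, Y, np.zeros(2))
advantage = np.linalg.norm(J)   # overall derivative magnitude at this "posture"
```

Near the origin the true slopes are d(sin 2x)/dx = 2 and 0 elsewhere, so `advantage` should come out close to 2; comparing this quantity across postures is the comparison the abstract describes.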

4.
Extensive research shows that inter-talker variability (i.e., changing the talker) affects recognition memory for speech signals. However, relatively little is known about the consequences of intra-talker variability (i.e., changes in speaking style within a talker) on the encoding of speech signals in memory. It is well established that speakers can modulate the characteristics of their own speech and produce a listener-oriented, intelligibility-enhancing speaking style in response to communication demands (e.g., when speaking to listeners with hearing impairment or non-native speakers of the language). Here we conducted two experiments to examine the role of speaking style variation in spoken language processing. First, we examined the extent to which clear speech provided benefits in challenging listening environments (i.e., speech-in-noise). Second, we compared recognition memory for sentences produced in conversational and clear speaking styles. In both experiments, semantically normal and anomalous sentences were included to investigate the role of higher-level linguistic information in the processing of speaking style variability. The results show that acoustic-phonetic modifications implemented in listener-oriented speech lead to improved speech recognition in challenging listening conditions and, crucially, to a substantial enhancement in recognition memory for sentences.

5.
Speech production has always been a subject of interest at both the morphological and acoustic levels. This knowledge is useful for a better understanding of all the involved mechanisms and for the construction of articulatory models. Magnetic resonance imaging (MRI) is a powerful technique that allows the study of the whole vocal tract, with good soft tissue contrast and resolution, and permits the calculation of area functions towards a better understanding of this mechanism. Thus, our aim is to demonstrate the value and application of MRI in the study of speech production and its relationship with engineering, namely biomedical engineering. After extraction of the vocal tract contours, data were processed for 3D reconstruction, culminating in the construction of models of some of the sounds of European Portuguese. MRI provides useful morphological data about the position and shape of the different speech articulators, and biomedical engineering provides the computational tools for their analysis.

6.
Different kinds of articulators, such as the upper and lower lips, jaw, and tongue, are precisely coordinated in speech production. Based on a perturbation study of the production of a fricative consonant using the upper and lower lips, it has been suggested that increasing the stiffness in the muscle linkage between the upper lip and jaw is beneficial for maintaining the constriction area between the lips (Gomi et al. 2002). This hypothesis is crucial for examining the mechanism of speech motor control, that is, whether mechanical impedance is controlled for speech motor coordination. To test this hypothesis, in the current study we performed a dynamical simulation of lip compensatory movements based on a muscle linkage model and then evaluated the performance of compensatory movements. The temporal pattern of stiffness of muscle linkage was obtained from the electromyogram (EMG) of the orbicularis oris superior (OOS) muscle by using the temporal transformation (second-order dynamics with time delay) from EMG to stiffness, whose parameters were experimentally determined. The dynamical simulation using stiffness estimated from empirical EMG successfully reproduced the temporal profile of the upper lip compensatory articulations. Moreover, the estimated stiffness variation contributed significantly to reproducing a functional modulation of the compensatory response. This result supports the idea that mechanical impedance contributes substantially to organizing coordination among the lips and jaw. The motor command would be programmed not only to generate movement in each articulator but also to regulate mechanical impedance among articulators for robust coordination of speech motor control.
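The EMG-to-stiffness transformation named above (second-order dynamics with a time delay) can be sketched as a critically damped second-order filter applied to delayed, rectified EMG. The parameter values below (natural frequency, delay, gain) are illustrative placeholders, not the experimentally determined ones.

```python
import numpy as np

def emg_to_stiffness(emg, dt=0.001, fn=5.0, zeta=1.0, delay=0.02, gain=100.0):
    """Second-order dynamics with a pure time delay, discretized by forward
    Euler: stiffness responds to delayed, rectified EMG like a critically
    damped mass-spring system. All parameter values are hypothetical."""
    wn = 2 * np.pi * fn
    lag = int(round(delay / dt))
    u = np.abs(np.concatenate([np.zeros(lag), emg]))[: len(emg)]  # delayed, rectified
    k = np.zeros(len(emg))
    kdot = 0.0
    for i in range(1, len(emg)):
        kddot = wn ** 2 * (gain * u[i] - k[i - 1]) - 2 * zeta * wn * kdot
        kdot += kddot * dt
        k[i] = k[i - 1] + kdot * dt
    return k

t = np.arange(0, 1, 0.001)
emg = (t > 0.1).astype(float) * 0.5    # a step burst of muscle activity
k = emg_to_stiffness(emg)              # stiffness rises smoothly after the delay
```

The stiffness trace lags the EMG burst by the delay and settles at `gain` times the EMG level, which is the shape of temporal profile used to drive the compensatory-movement simulation.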

7.
Human beings are thought to be unique amongst the primates in their capacity to produce rapid changes in the shape of their vocal tracts during speech production. Acoustically, vocal tracts act as resonance chambers, whose geometry determines the position and bandwidth of the formants. Formants provide the acoustic basis for vowels, which enable speakers to refer to external events and to produce other kinds of meaningful communication. Formant-based referential communication is also present in non-human primates, most prominently in Diana monkey alarm calls. Previous work has suggested that the acoustic structure of these calls is the product of a non-uniform vocal tract capable of some degree of articulation. In this study we test this hypothesis by providing morphological measurements of the vocal tract of three adult Diana monkeys, using both radiography and dissection. We use these data to generate a vocal tract computational model capable of simulating the formant structures produced by wild individuals. The model performed best when it combined a non-uniform vocal tract consisting of three different tubes with a number of articulatory manoeuvres. We discuss the implications of these findings for evolutionary theories of human and non-human vocal production.

8.
Research into speech perception by nonhuman animals can be crucially informative in assessing whether specific perceptual phenomena in humans have evolved to decode speech, or reflect more general traits. Birds share with humans not only the capacity to use complex vocalizations for communication but also many characteristics of the underlying developmental and mechanistic processes; thus, birds are a particularly interesting group for comparative study. This review first discusses commonalities between birds and humans in perception of speech sounds. Several psychoacoustic studies have shown striking parallels in seemingly speech-specific perceptual phenomena, such as categorical perception of voice-onset-time variation, categorization of consonants that lack phonetic invariance, and compensation for coarticulation. Such findings are often regarded as evidence for the idea that the objects of human speech perception are auditory or acoustic events rather than articulations. Next, I highlight recent research on the production side of avian communication that has revealed the existence of vocal tract filtering and articulation in birds' species-specific vocalizations, which has traditionally been considered a hallmark of human speech production. Together, findings in birds show that many characteristics of human speech perception are not uniquely human, but also that a comparative approach to the question of whether the objects of perception are articulatory or auditory events requires careful consideration of species-specific vocal production mechanisms.

9.
Evidence regarding visually guided limb movements suggests that the motor system learns and maintains neural maps between motor commands and sensory feedback. Such systems are hypothesized to be used in a feed-forward control strategy that permits precision and stability without the delays of direct feedback control. Human vocalizations involve precise control over vocal and respiratory muscles. However, little is known about the sensorimotor representations underlying speech production. Here, we manipulated the heard fundamental frequency of the voice during speech to demonstrate learning of auditory-motor maps. Mandarin speakers repeatedly produced words with specific pitch patterns (tone categories). On each successive utterance, the frequency of their auditory feedback was increased by 1/100 of a semitone until they heard their feedback one full semitone above their true pitch. Subjects automatically compensated for these changes by lowering their vocal pitch. When feedback was unexpectedly returned to normal, speakers significantly increased the pitch of their productions beyond their initial baseline frequency. This adaptation was found to generalize to the production of another tone category. However, results indicate that a more robust adaptation was produced for the tone that was spoken during feedback alteration. The immediate aftereffects suggest a global remapping of the auditory-motor relationship after an extremely brief training period. However, this learning does not represent a complete transformation of the mapping; rather, it is in part target dependent.
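The perturbation schedule above is easy to make concrete: 1/100 of a semitone is one cent, and a shift of n cents scales frequency by 2^(n/1200). A short worked example (the 200 Hz baseline is a hypothetical speaker, not a value from the paper):

```python
# One semitone = 100 cents; a shift of n cents scales frequency by 2**(n/1200).
def shift_hz(f0, cents):
    return f0 * 2 ** (cents / 1200)

f0 = 200.0                                          # hypothetical true pitch (Hz)
steps = [shift_hz(f0, n) for n in range(0, 101)]    # one extra cent per utterance

# After 100 utterances the feedback sits a full semitone high:
# 200 * 2**(1/12) ≈ 211.89 Hz, a shift the speaker compensates for by
# lowering produced pitch by roughly the same ratio.
full_shift = steps[100]
```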

10.
This report attempts to define the physiological parameter used to describe "voice tremor" in psychological stress evaluation machines, and to find its sources. This parameter was found to be a low frequency (5–20 Hz) random process which frequency modulates the vocal cord waveform and (independently) affects the frequency range of the third speech formant. The frequency variations in unstressed speakers were found to be the result of forced muscular undulations driven by central nervous signals and not of a passive resonant phenomenon. In this paper, various physiological and clinical experiments that lead to the above conclusions are discussed. a) It is shown that induced muscular activity in the vocal tract and vocal cord regions can generate tremor in the voice. b) It is shown that relaxed subjects exhibit significant tremor correlation between spontaneously generated speech and EMG, with the EMG leading the speech tremor. c) Tremor in the electrical activity recorded from muscles overlying the vocal tract area was correlated with the demodulated third-formant signal, and demodulated vocal cord pitch tremor was correlated with demodulated first-formant tremor. d) Enhanced tremor was found in Parkinson's patients and diminished tremor in patients with some traumatic brain injuries.

11.
Human speech and bird vocalization are complex communicative behaviors with notable similarities in development and underlying mechanisms. However, there is an important difference between humans and birds in the way vocal complexity is generally produced. Human speech originates from independent modulatory actions of a sound source, e.g., the vibrating vocal folds, and an acoustic filter, formed by the resonances of the vocal tract (formants). Modulation in bird vocalization, in contrast, is thought to originate predominantly from the sound source, whereas the role of the resonance filter is only subsidiary in emphasizing the complex time-frequency patterns of the source (e.g., but see ). However, it has been suggested that, analogous to human speech production, tongue movements observed in parrot vocalizations modulate formant characteristics independently from the vocal source. As yet, direct evidence of such a causal relationship is lacking. In five Monk parakeets, Myiopsitta monachus, we replaced the vocal source, the syrinx, with a small speaker that generated a broad-band sound, and we measured the effects of tongue placement on the sound emitted from the beak. The results show that tongue movements cause significant frequency changes in two formants and cause amplitude changes in all four formants present between 0.5 and 10 kHz. We suggest that lingual articulation may thus in part explain the well-known ability of parrots to mimic human speech, and, even more intriguingly, may also underlie a speech-like formant system in natural parrot vocalizations.

12.

Background

Birdsong and human vocal communication are both complex behaviours that show striking similarities, mainly thought to lie in development and learning. Recent studies, however, suggest that there are also parallels in vocal production mechanisms. While it has long been thought that vocal tract filtering, as it occurs in human speech, plays only a minor role in birdsong, an increasing number of studies indicate the presence of sound-filtering mechanisms in bird vocalizations as well.

Methodology/Principal Findings

Correlating high-speed X-ray cinematographic imaging of singing zebra finches (Taeniopygia guttata) to song structures, we identified beak gape and the expansion of the oropharyngeal-esophageal cavity (OEC) as potential articulators. We subsequently manipulated both structures in an experiment in which we played sound through the vocal tract of dead birds. Comparing acoustic input with acoustic output showed that OEC expansion causes an energy shift towards lower frequencies and an amplitude increase, whereas a wide beak gape emphasizes frequencies around 5 kilohertz and above.

Conclusion

These findings confirm that birds can modulate their song by using vocal tract filtering and demonstrate how OEC and beak gape contribute to this modulation.

13.
Established linguistic theoretical frameworks propose that alphabetic language speakers use phonemes as phonological encoding units during speech production whereas Mandarin Chinese speakers use syllables. This framework was challenged by recent neural evidence of facilitation induced by overlapping initial phonemes, raising the possibility that phonemes also contribute to the phonological encoding process in Chinese. However, there is no evidence of non-initial phoneme involvement in Chinese phonological encoding among representative Chinese speakers, rendering the functional role of phonemes in spoken Chinese controversial. Here, we addressed this issue by systematically investigating the word-initial and non-initial phoneme repetition effect on the electrophysiological signal using a picture-naming priming task in which native Chinese speakers produced disyllabic word pairs. We found that overlapping phonemes in both the initial and non-initial position evoked more positive ERPs in the 180- to 300-ms interval, indicating a position-invariant repetition facilitation effect during phonological encoding. Our findings thus revealed the fundamental role of phonemes as independent phonological encoding units in Mandarin Chinese.

14.
A key feature of speech is its stereotypical 5 Hz rhythm. One theory posits that this rhythm evolved through the modification of rhythmic facial movements in ancestral primates. If the hypothesis has any validity, then a comparative approach may shed some light. We tested this idea by using cineradiography (X-ray movies) to characterize and quantify the internal dynamics of the macaque monkey vocal tract during lip-smacking (a rhythmic facial expression) versus chewing. Previous human studies showed that speech movements are faster than chewing movements, and the functional coordination between vocal tract structures is different between the two behaviors. If rhythmic speech evolved through a rhythmic ancestral facial movement, then one hypothesis is that monkey lip-smacking versus chewing should also exhibit these differences. We found that the lips, tongue, and hyoid move with a speech-like 5 Hz rhythm during lip-smacking, but not during chewing. Most importantly, the functional coordination between these structures was distinct for each behavior. These data provide empirical support for the idea that the human speech rhythm evolved from the rhythmic facial expressions of ancestral primates.

15.
The sounds of human speech make human language a rapid medium of communication through a process of speech "encoding." The presence of sounds like the vowels [a], [i], and [u] makes this process possible. The supralaryngeal vocal tracts of newborn Homo sapiens and chimpanzee are similar and resemble the reconstructed vocal tract of the fossil La Chapelle-aux-Saints Neanderthal man. Vocal tract area functions that were directed toward making the best possible approximations to the human vowels [a], [i], and [u], as well as certain consonantal configurations, were modeled by means of a computer program. The lack of these vowels in the phonetic repertories of these creatures, who lack a supralaryngeal pharyngeal region like that of adult Homo sapiens, may be concomitant with the absence of speech encoding and, consequently, a linguistic ability inferior to that of modern man.
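The simplest case of the vocal-tract-area modeling described above is the textbook uniform tube, closed at the glottis and open at the lips, whose resonances follow the quarter-wavelength formula F_n = (2n - 1)c / 4L and approximate a neutral (schwa-like) vowel. A sketch (the 17.5 cm length is a conventional adult-male figure, not a value from this paper):

```python
# Formants of a uniform tube closed at the glottis, open at the lips:
# F_n = (2n - 1) * c / (4 * L), the quarter-wavelength approximation for a
# neutral vocal tract; c is the speed of sound in warm, humid air (m/s).
def tube_formants(length_m, n=3, c=350.0):
    return [(2 * k - 1) * c / (4 * length_m) for k in range(1, n + 1)]

f = tube_formants(0.175)   # ≈ [500, 1500, 2500] Hz for a 17.5 cm tract
```

Point vowels like [a], [i], and [u] require strongly non-uniform area functions, which is why modeling them demands the two-cavity supralaryngeal geometry the abstract argues these species lack.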

16.
A central challenge for articulatory speech synthesis is the simulation of realistic articulatory movements, which is critical for the generation of highly natural and intelligible speech. This includes modeling coarticulation, i.e., the context-dependent variation of the articulatory and acoustic realization of phonemes, especially of consonants. Here we propose a method to simulate the context-sensitive articulation of consonants in consonant-vowel syllables. To achieve this, the vocal tract target shape of a consonant in the context of a given vowel is derived as the weighted average of three measured and acoustically-optimized reference vocal tract shapes for that consonant in the context of the corner vowels /a/, /i/, and /u/. The weights are determined by mapping the target shape of the given context vowel into the vowel subspace spanned by the corner vowels. The model was applied for the synthesis of consonant-vowel syllables with the consonants /b/, /d/, /g/, /l/, /r/, /m/, /n/ in all combinations with the eight long German vowels. In a perception test, the mean recognition rate for the consonants in the isolated syllables was 82.4%. This demonstrates the potential of the approach for highly intelligible articulatory speech synthesis.
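The weighting scheme above can be sketched as barycentric interpolation: express the context vowel as a weighted combination of the corner vowels /a/, /i/, /u/ (weights summing to one), then blend the three reference consonant shapes with the same weights. The formant values and stand-in shape vectors below are hypothetical, and the paper's actual vowel subspace need not be formant space.

```python
import numpy as np

# Hypothetical (F1, F2) positions of the corner vowels and a context vowel /e/.
corners = np.array([[850.0, 1200.0],   # /a/
                    [290.0, 2300.0],   # /i/
                    [310.0,  700.0]])  # /u/
vowel = np.array([450.0, 1900.0])      # /e/

# Solve sum_k w_k * corner_k = vowel subject to sum_k w_k = 1
# (barycentric coordinates of the vowel in the corner-vowel triangle).
A = np.vstack([corners.T, np.ones(3)])
b = np.concatenate([vowel, [1.0]])
w = np.linalg.solve(A, b)

# Blend the three measured reference consonant shapes with the same weights
# (stand-in 2-D vectors here; real shapes are full vocal tract parameter sets).
ref_shapes = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # hypothetical
target = w @ ref_shapes   # context-sensitive consonant target shape
```

For a vowel inside the /a/-/i/-/u/ triangle the weights are all positive, so the consonant target varies smoothly with vowel context, which is the coarticulation behavior the model is after.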

17.
Recent comparative data reveal that formant frequencies are cues to body size in animals, due to a close relationship between formant frequency spacing, vocal tract length and overall body size. Accordingly, intriguing morphological adaptations to elongate the vocal tract in order to lower formants occur in several species, with the size exaggeration hypothesis being proposed to justify most of these observations. While the elephant trunk is strongly implicated to account for the low formants of elephant rumbles, it is unknown whether elephants emit these vocalizations exclusively through the trunk, or whether the mouth is also involved in rumble production. In this study we used a sound visualization method (an acoustic camera) to record rumbles of five captive African elephants during spatial separation and subsequent bonding situations. Our results showed that the female elephants in our analysis produced two distinct types of rumble vocalizations based on vocal path differences: a nasally- and an orally-emitted rumble. Interestingly, nasal rumbles predominated during contact calling, whereas oral rumbles were mainly produced in bonding situations. In addition, nasal and oral rumbles varied considerably in their acoustic structure. In particular, the values of the first two formants reflected the estimated lengths of the vocal paths, corresponding to a vocal tract length of around 2 meters for nasal, and around 0.7 meters for oral rumbles. These results suggest that African elephants may be switching vocal paths to actively vary vocal tract length (with considerable variation in formants) according to context, and call for further research investigating the function of formant modulation in elephant vocalizations. 
Furthermore, by confirming the use of the elephant trunk in long distance rumble production, our findings provide an explanation for the extremely low formants in these calls, and may also indicate that formant lowering functions to increase call propagation distances in this species.
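The vocal tract lengths cited above follow from the uniform-tube relation between formant spacing and tube length, ΔF = c / 2L, so L = c / 2ΔF. A worked sketch (the spacing values are illustrative round numbers consistent with the reported lengths, not measurements from the paper):

```python
# For a uniform tube, adjacent formants are spaced dF = c / (2 * L), so the
# vocal tract length can be estimated from measured spacing: L = c / (2 * dF).
def vtl_from_spacing(delta_f_hz, c=350.0):
    return c / (2.0 * delta_f_hz)

nasal_path = vtl_from_spacing(87.5)   # ~87.5 Hz spacing -> 2.0 m (trunk + nasal)
oral_path = vtl_from_spacing(250.0)   # ~250 Hz spacing  -> 0.7 m (oral)
```

A halving of spacing implies a doubling of effective tract length, which is why the nasal and oral rumbles separate so cleanly on their first two formants.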

18.
The perception of vowels was studied in chimpanzees and humans, using a reaction time task in which reaction times for discrimination of vowels were taken as an index of similarity between vowels. Vowels used were five synthetic and natural Japanese vowels and eight natural French vowels. The chimpanzees required long reaction times for discrimination of synthetic [i] from [u] and [e] from [o]; that is, they needed long latencies for discrimination between vowels based on differences in frequency of the second formant. A similar tendency was observed for discrimination of natural [i] from [u]. The human subject required long reaction times for discrimination between vowels along the first formant axis. These differences can be explained by differences in auditory sensitivity between the two species and the motor theory of speech perception. A vowel, when pronounced by different speakers, has different acoustic properties. However, humans can perceive these speech sounds as the same vowel. This phenomenon of perceptual constancy in speech perception was studied in chimpanzees using natural vowels and a synthetic [o]-[a] continuum. The chimpanzees ignored the difference in the sex of the speakers and showed a capacity for vocal tract normalization.

19.
To better understand the role of each of the laryngeal muscles in producing vocal fold movement, activation of these muscles was correlated with laryngeal movement during different tasks such as sniff, cough or throat clear, and speech syllable production. Four muscles [the posterior cricoarytenoid, lateral cricoarytenoid, cricothyroid (CT), and thyroarytenoid (TA)] were recorded with bipolar hooked wire electrodes placed bilaterally in four normal subjects. A nasoendoscope was used to record vocal fold movement while simultaneously recording muscle activity. Muscle activation level was correlated with ipsilateral vocal fold angle for vocal fold opening and closing. Pearson correlation coefficients and their statistical significance were computed for each trial. Significant effects of muscle (P ≤ 0.0005) and task (P = 0.034) were found on the r (transformed to Fisher's Z') values. All of the posterior cricoarytenoid recordings related significantly with vocal opening, whereas CT activity was significantly correlated with opening only during sniff. The TA and lateral cricoarytenoid activities were significantly correlated with vocal fold closing during cough. During speech, the CT and TA activity correlated with both opening and closing. Laryngeal muscle patterning to produce vocal fold movement differed across tasks; reciprocal muscle activity occurred only during cough, whereas speech and sniff often involved simultaneous contraction of muscle antagonists. In conclusion, different combinations of muscle activation are used for biomechanical control of vocal fold opening and closing movements during respiratory, airway protection, and speech tasks.
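The Fisher Z' transform used above is the standard variance-stabilizing step applied to Pearson r values before testing effects across trials: Z' = arctanh(r). A minimal sketch (the r values are made up for illustration):

```python
import numpy as np

def fisher_z(r):
    """Fisher's Z' transform: variance-stabilizes Pearson r so per-trial
    correlations can be averaged and entered into significance tests."""
    return np.arctanh(r)

rs = np.array([0.3, 0.6, 0.9])   # hypothetical per-trial correlations
zs = fisher_z(rs)
mean_r = np.tanh(zs.mean())      # back-transform the averaged Z' to an r value
```

Averaging in Z' space, then back-transforming, avoids the bias that comes from averaging raw r values, whose sampling distribution is increasingly skewed as |r| approaches 1.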

20.
Vocal production in songbirds requires the control of the respiratory system, the syrinx as sound source and the vocal tract as acoustic filter. Vocal tract movements consist of beak, tongue and hyoid movements, which change the volume of the oropharyngeal–esophageal cavity (OEC), glottal movements and tracheal length changes. The respective contributions of each movement to filter properties are not completely understood, but the effects of this filtering are thought to be very important for acoustic communication in birds. One of the most striking movements of the upper vocal tract during vocal behavior in songbirds involves the OEC. This study measured the acoustic effect of OEC adjustments in zebra finches by comparing resonance acoustics between an utterance with OEC expansion (calls) and a similar utterance without OEC expansion (respiratory sounds induced by a bilateral syringeal denervation). X-ray cineradiography confirmed the presence of an OEC motor pattern during song and call production, and a custom-built Hall-effect collar system confirmed that OEC expansion movements were not present during respiratory sounds. The spectral emphasis during zebra finch call production ranging between 2.5 and 5 kHz was not present during respiratory sounds, indicating strongly that it can be attributed to the OEC expansion.
