Similar Documents
20 similar documents found (search time: 15 ms)
1.
A central challenge for articulatory speech synthesis is the simulation of realistic articulatory movements, which is critical for the generation of highly natural and intelligible speech. This includes modeling coarticulation, i.e., the context-dependent variation of the articulatory and acoustic realization of phonemes, especially of consonants. Here we propose a method to simulate the context-sensitive articulation of consonants in consonant-vowel syllables. To achieve this, the vocal tract target shape of a consonant in the context of a given vowel is derived as the weighted average of three measured and acoustically optimized reference vocal tract shapes for that consonant in the context of the corner vowels /a/, /i/, and /u/. The weights are determined by mapping the target shape of the given context vowel into the vowel subspace spanned by the corner vowels. The model was applied to the synthesis of consonant-vowel syllables with the consonants /b/, /d/, /g/, /l/, /r/, /m/, and /n/ in all combinations with the eight long German vowels. In a perception test, the mean recognition rate for the consonants in the isolated syllables was 82.4%, demonstrating the potential of the approach for highly intelligible articulatory speech synthesis.
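The weighting scheme described above amounts to computing barycentric coordinates of the context vowel within the corner-vowel triangle. A minimal sketch, assuming vowel targets can be summarized as points in a two-dimensional plane and consonant references as parameter vectors (all names and numbers below are illustrative, not the authors' implementation):

```python
import numpy as np

def barycentric_weights(p, a, i, u):
    """Weights (w_a, w_i, w_u) such that p = w_a*a + w_i*i + w_u*u, sum = 1."""
    T = np.column_stack([a - u, i - u])
    w = np.linalg.solve(T, p - u)            # solve the 2x2 system
    return np.array([w[0], w[1], 1.0 - w.sum()])

# Illustrative corner-vowel targets (F1, F2 in Hz) and a context vowel like /e/.
corner = {"a": np.array([800.0, 1200.0]),
          "i": np.array([300.0, 2300.0]),
          "u": np.array([300.0, 800.0])}
w = barycentric_weights(np.array([450.0, 1900.0]),
                        corner["a"], corner["i"], corner["u"])

# Hypothetical reference vocal tract shapes of one consonant measured in
# /a/, /i/, /u/ context (toy 4-parameter vectors, one row per context).
refs = np.array([[1.0, 0.2, 0.8, 0.5],
                 [0.6, 0.9, 0.4, 0.7],
                 [0.7, 0.3, 0.9, 0.9]])
context_shape = w @ refs                     # weighted average = context-sensitive target
print(w.round(3), context_shape.round(3))
```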

2.
The activation of the listener's motor system during speech processing was first demonstrated by the enhancement of electromyographic tongue potentials evoked by single-pulse transcranial magnetic stimulation (TMS) over the tongue motor cortex. This technique is, however, technically challenging and enables only a rather coarse measurement of this motor mirroring. Here, we applied TMS to listeners' tongue motor area in association with ultrasound tissue Doppler imaging to describe the fine-grained tongue kinematic synergies evoked by passive listening to speech. Subjects listened to syllables requiring different patterns of dorso-ventral and antero-posterior movements (/ki/, /ko/, /ti/, /to/). The results show that passive listening to speech sounds evokes a pattern of motor synergies mirroring those occurring during speech production. Moreover, mirror motor synergies were more evident in subjects who performed well in discriminating speech in noise, demonstrating a role of the speech-related mirror system in feed-forward processing of the speaker's ongoing motor plan.

3.
Mochida T, Gomi H, Kashino M. PLoS ONE. 2010;5(11):e13866.

Background

There is ample evidence of kinesthetically induced rapid compensation for unanticipated perturbations in speech articulatory movements. However, the role of auditory information in stabilizing articulation has been little studied, except for the control of voice fundamental frequency, voice amplitude, and vowel formant frequencies. Although the influence of auditory information on the articulatory control process is evident in unintended speech errors caused by delayed auditory feedback, the direct and immediate effect of auditory alteration on the movements of articulators has not been clarified.

Methodology/Principal Findings

This work examined whether temporal changes in the auditory feedback of bilabial plosives immediately affect the subsequent lip movement. We conducted experiments with an auditory feedback alteration system that enabled us to replace or block speech sounds in real time. Participants were asked to produce the syllable /pa/ repeatedly at a constant rate. During the repetition, normal auditory feedback was interrupted, and one of three pre-recorded syllables /pa/, /Φa/, or /pi/, spoken by the same participant, was presented once at a different timing from the anticipated production onset, while no feedback was presented for subsequent repetitions. Comparison of the labial distance trajectories under altered and normal feedback conditions indicated that the movement quickened during the short period immediately after the alteration onset when /pa/ was presented 50 ms before the expected timing. No significant change was observed under the other feedback conditions tested.

Conclusions/Significance

The earlier articulation rapidly induced by the temporally advanced auditory input suggests that a compensatory mechanism helps to maintain a constant speech rate by detecting errors between the internally predicted and the actually provided auditory information associated with one's own movement. The timing- and context-dependent effects of feedback alteration suggest that this sensory error detection operates in a temporally asymmetric window in which acoustic features of the syllable to be produced may be coded.

4.
5.
P. Hansen. Bioacoustics. 2013;22(2):147-154.
ABSTRACT

The focus of this study was to determine whether individual vocal identification of Scops Owls (Otus scops) is possible and whether hoot calls remain stable over a short time period in the same individuals. Spontaneous vocalizations of 13 owls were recorded in 2004 in southern Tuscany, Italy. Visual analysis of spectrograms and quantitative multivariate analysis of six vocal features showed marked individual differences. In some owls a repertoire of two different hoot types was found. In 2005, 10 Scops Owls were recorded three times in the same breeding season (2 hours and 10 days after the first session). Statistical analysis showed that 60% of the owls did not change call features over time. However, slight but significant variability between successive vocal performances of the same owl was found in 40% of cases. This variability may decrease the recognition power of acoustic analysis. To overcome this obstacle I suggest a multi-step qualitative/quantitative approach. A Difference Index (DI) was calculated to set a threshold between the slight intra-individual and the very high inter-individual variability. This method allowed the recognition of the calls of each owl recorded over time in 2005.
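The abstract does not give the formula for its Difference Index. One common way to contrast intra- with inter-individual variability is to scale the difference of per-recording feature means by the pooled standard deviation; the sketch below uses that assumed form, not necessarily the author's.

```python
import numpy as np

def difference_index(calls_a, calls_b):
    """Assumed DI: absolute mean difference of one vocal feature between two
    recordings, scaled by the pooled standard deviation. Large values suggest
    different individuals; small values suggest the same owl re-recorded."""
    a, b = np.asarray(calls_a, float), np.asarray(calls_b, float)
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2.0)
    return abs(a.mean() - b.mean()) / pooled_sd

# Invented hoot fundamental frequencies (Hz) from two recording sessions.
same_owl  = difference_index([440, 452, 447], [445, 450, 441])
other_owl = difference_index([440, 452, 447], [492, 501, 488])
print(same_owl, other_owl)  # a threshold between the two values separates the cases
```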

6.
The perception of vowels was studied in chimpanzees and humans, using a reaction time task in which reaction times for discrimination of vowels were taken as an index of similarity between vowels. The vowels used were five synthetic and natural Japanese vowels and eight natural French vowels. The chimpanzees required long reaction times for discrimination of synthetic [i] from [u] and [e] from [o]; that is, they needed long latencies to discriminate between vowels based on differences in the frequency of the second formant. A similar tendency was observed for discrimination of natural [i] from [u]. The human subject required long reaction times for discrimination between vowels along the first-formant axis. These differences can be explained by differences in auditory sensitivity between the two species and by the motor theory of speech perception. A vowel pronounced by different speakers has different acoustic properties, yet humans perceive these speech sounds as the same vowel. This phenomenon of perceptual constancy in speech perception was studied in chimpanzees using natural vowels and a synthetic [o]-[a] continuum. The chimpanzees ignored the difference in the sex of the speakers and showed a capacity for vocal tract normalization.

7.
The human voice provides a rich source of information about individual attributes such as body size, developmental stability, and emotional state. Moreover, there is evidence that female voice characteristics change across the menstrual cycle. A previous study reported that women speak with higher fundamental frequency (F0) in the high-fertility than in the low-fertility phase. To gain further insight into the mechanisms underlying this variation in perceived attractiveness and the relationship between vocal quality and the timing of ovulation, we combined hormone measurements and acoustic analyses to characterize voice changes on a day-to-day basis throughout the menstrual cycle. Voice characteristics were measured from free speech as well as sustained vowels. In addition, we asked men to rate vocal attractiveness from selected samples. The free speech samples revealed marginally significant variation in F0, with an increase prior to and a distinct drop during ovulation. Overall variation throughout the cycle, however, precluded unequivocal identification of the period with the highest conception risk. The analysis of vowel samples revealed a significant increase in the degree of unvoiceness and the noise-to-harmonic ratio during menstruation, possibly related to an increase in tissue water content. Neither estrogen nor progestogen levels predicted the observed changes in acoustic characteristics. The perceptual experiments revealed a preference by males for voice samples recorded during the pre-ovulatory period over other periods in the cycle. While we overall confirm earlier findings that women speak with a higher and more variable fundamental frequency just prior to ovulation, the present study highlights the importance of taking the full range of variation into account before drawing conclusions about the value of these cues for the detection of ovulation.
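F0 tracking of the kind used in such studies is commonly done with short-time autocorrelation. A minimal sketch, not the authors' actual pipeline (which likely used a dedicated tool such as Praat):

```python
import numpy as np

def estimate_f0(frame, fs, fmin=75.0, fmax=500.0):
    """Crude autocorrelation pitch estimate for one voiced frame."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)   # search only plausible pitch lags
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag

fs = 16000
t = np.arange(int(0.04 * fs)) / fs            # one 40 ms analysis frame
frame = np.sin(2 * np.pi * 210 * t) + 0.3 * np.sin(2 * np.pi * 420 * t)
print(estimate_f0(frame, fs))                 # ~210 Hz
```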

8.
A complete neurobiological understanding of speech motor control requires determining the relationship between simultaneously recorded neural activity and the kinematics of the lips, jaw, tongue, and larynx. Many speech articulators are internal to the vocal tract, so simultaneously tracking the kinematics of all articulators is nontrivial, especially in the context of human electrophysiology recordings. Here, we describe a noninvasive, multi-modal imaging system to monitor vocal tract kinematics, demonstrate this system in six speakers during production of nine American English vowels, and provide new analyses of such data. Classification and regression analyses revealed considerable variability in the articulator-to-acoustic relationship across speakers. Non-negative matrix factorization extracted basis sets capturing vocal tract shapes that allowed higher vowel classification accuracy than traditional methods. Statistical speech synthesis generated speech from the vocal tract measurements, and we demonstrate perceptual identification of the synthesized speech. We also demonstrate the capacity to predict lip kinematics from ventral sensorimotor cortical activity. These results demonstrate a multi-modal system to noninvasively monitor articulator kinematics during speech production, describe novel analytic methods for relating kinematic data to speech acoustics, and provide the first decoding of speech kinematics from electrocorticography. These advances will be critical for understanding the cortical basis of speech production and for the creation of vocal prosthetics.
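Extracting shape bases with non-negative matrix factorization and then classifying vowels from the resulting weights can be sketched as below. The data are synthetic, and the component count and classifier are illustrative choices, not the paper's settings:

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Toy data: 180 frames x 24 nonnegative articulator coordinates, 9 vowel labels.
X = rng.gamma(shape=2.0, scale=1.0, size=(180, 24))
y = np.repeat(np.arange(9), 20)

# Factor X ~ W @ H: rows of H are basis vocal tract shapes, W the per-frame weights.
nmf = NMF(n_components=5, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(X)

# Classify vowels from NMF weights rather than raw articulator positions.
acc = cross_val_score(LogisticRegression(max_iter=1000), W, y, cv=5).mean()
print(f"cross-validated vowel classification accuracy: {acc:.2f}")  # ~chance on noise
```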

9.
Sounds produced by male cichlids Metriaclima zebra during aggressive interactions were recorded to conduct a detailed acoustic analysis and to search for potential individual acoustic signatures. Fish from two size groups (small and large individuals) were analysed. The two groups differed significantly on all acoustic variables considered; six of seven features showed significant interindividual variability, and most were correlated with the size of the emitter. A cross-validated and permuted discriminant function analysis (pDFA) separated the two groups and assigned around 50% of the sounds to the correct individual. The acoustic features that best distinguished among males were the instantaneous frequency of sounds and the modulation of pulse amplitude. These results suggest that acoustic signals could carry information about individual identity. The long-term stability of this signature is likely to be weak, however, since the signature of a growing individual may change over time.
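A cross-validated, permuted discriminant function analysis of this kind can be approximated with scikit-learn's LDA plus a permutation test. The sketch below uses invented features and group sizes, purely for illustration:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, permutation_test_score

rng = np.random.default_rng(1)
n_fish, calls_per_fish = 8, 15
# Toy acoustic features (e.g., instantaneous frequency, pulse-amplitude modulation).
centers = rng.normal(0, 1.5, size=(n_fish, 6))          # per-individual signature
X = np.vstack([c + rng.normal(0, 1.0, size=(calls_per_fish, 6)) for c in centers])
y = np.repeat(np.arange(n_fish), calls_per_fish)

# Cross-validated classification rate, compared against label permutations.
score, perm_scores, pvalue = permutation_test_score(
    LinearDiscriminantAnalysis(), X, y,
    cv=StratifiedKFold(5), n_permutations=200, random_state=1)
print(f"classification rate {score:.2f} vs chance {perm_scores.mean():.2f}, "
      f"p = {pvalue:.3f}")
```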

10.
Human beings are thought to be unique amongst the primates in their capacity to produce rapid changes in the shape of their vocal tracts during speech production. Acoustically, vocal tracts act as resonance chambers, whose geometry determines the position and bandwidth of the formants. Formants provide the acoustic basis for vowels, which enable speakers to refer to external events and to produce other kinds of meaningful communication. Formant-based referential communication is also present in non-human primates, most prominently in Diana monkey alarm calls. Previous work has suggested that the acoustic structure of these calls is the product of a non-uniform vocal tract capable of some degree of articulation. In this study we test this hypothesis by providing morphological measurements of the vocal tract of three adult Diana monkeys, using both radiography and dissection. We use these data to generate a vocal tract computational model capable of simulating the formant structures produced by wild individuals. The model performed best when it combined a non-uniform vocal tract consisting of three different tubes with a number of articulatory manoeuvres. We discuss the implications of these findings for evolutionary theories of human and non-human vocal production.
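A vocal tract model built from concatenated uniform tubes can be evaluated with lossless plane-wave chain matrices: formants appear as peaks of the glottis-to-lips volume-velocity transfer function. A minimal sketch of that general technique; the tube geometry below is illustrative and is not the measured Diana monkey tract:

```python
import numpy as np

C = 350.0            # speed of sound in the tract, m/s
RHO_C = 1.14 * C     # rho*c for warm humid air, kg m^-2 s^-1

def formants(areas_cm2, lengths_cm, fmax=5000.0, df=1.0):
    """Resonances of a lossless concatenated-tube tract with an open lip end."""
    freqs = np.arange(df, fmax, df)
    gain = np.empty_like(freqs)
    for n, f in enumerate(freqs):
        k = 2.0 * np.pi * f / C
        M = np.eye(2, dtype=complex)
        for A_cm2, L_cm in zip(areas_cm2, lengths_cm):   # glottis -> lips
            A, L = A_cm2 * 1e-4, L_cm * 1e-2
            Zc = RHO_C / A                               # characteristic impedance
            T = np.array([[np.cos(k * L), 1j * Zc * np.sin(k * L)],
                          [1j * np.sin(k * L) / Zc, np.cos(k * L)]])
            M = M @ T
        gain[n] = 1.0 / abs(M[1, 1])   # p_lips = 0  =>  U_lips/U_glottis = 1/M[1,1]
    return [float(freqs[i]) for i in range(1, len(gain) - 1)
            if gain[i - 1] < gain[i] > gain[i + 1]]

# Sanity check: a uniform 17.5 cm tube resonates near 500, 1500, 2500 Hz.
print(formants([3.0], [17.5])[:3])
# Illustrative non-uniform three-tube configuration.
print(formants([1.0, 6.0, 2.0], [4.0, 8.0, 5.0])[:3])
```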

11.
Speech perception is remarkably robust. This paper examines how acoustic and auditory properties of vowels and consonants help to ensure intelligibility. First, the source-filter theory of speech production is briefly described, and the relationship between vocal-tract properties and formant patterns is demonstrated for some commonly occurring vowels. Next, two accounts of the structure of preferred sound inventories, quantal theory and dispersion theory, are described and some of their limitations are noted. Finally, it is suggested that certain aspects of quantal and dispersion theories can be unified in a principled way so as to achieve reasonable predictive accuracy.
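Dispersion theory's core idea, that vowel inventories spread out to maximize perceptual contrast, can be illustrated by searching for a set of points in a bounded F1/F2 region that maximizes the minimum pairwise distance. A crude random-search sketch; the bounds and raw-Hz distance metric are simplifications (real dispersion models use an auditory scale such as Bark):

```python
import numpy as np

rng = np.random.default_rng(2)
N_VOWELS = 5
# Rough bounds of a vowel space (F1, F2 in Hz), for illustration only.
LO, HI = np.array([250.0, 600.0]), np.array([850.0, 2500.0])

def min_pairwise_distance(pts):
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    return d[np.triu_indices(len(pts), k=1)].min()

best, best_d = None, -1.0
for _ in range(20000):                     # random search over candidate inventories
    pts = rng.uniform(LO, HI, size=(N_VOWELS, 2))
    d = min_pairwise_distance(pts)
    if d > best_d:
        best, best_d = pts, d
print(np.round(best))   # a dispersed inventory: extremes of the space, /a i u/-like
```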

12.
Changes in the oral cavity resulting from the loss of teeth and the subsequent reconstruction of the dentition with dentures (partial or complete) may cause changes in the speech and voice of the patient. The aim of the present investigation was to study the changes in speech and voice in patients suffering from tooth loss and the degree of speech improvement achieved with dentures. Voice and speech parameters of a set of test syllables were analysed in 10 patients at the 2nd Clinic of Stomatology. The analysis was carried out by means of FFT in the SoundForge 5.0 program. Acoustic changes of varying degree in both consonants and vowels were found in a proportion of the patients examined. These concerned especially the sibilant ("s", "(see text)"), labiodental ("f", "v"), and vibrant ("r", "(see text)") consonants. Changes in the FFT spectrum and air leakage in constrictive consonants were also found. In some patients the vowels, especially the closed ones ("i", "u"), may change their fundamental frequency and show an admixture of noise manifested as a blurred delimitation of the formants. A denture should, among other things, make it possible for the patient to produce the same articulation to which he or she had been accustomed before the loss of teeth. For the construction of dentures, the most important factors from a phonetic point of view appear to be the following: overbite, overjet, the height of the plate, the thickness of the palatal material, the incisor position, and the modelling of the rugae palatinae on the hard palate. If the denture is constructed incorrectly, the acoustic changes may persist, imposing a stress load on the patient that depends on sex, age, psychological condition, and the seriousness of the problem.

13.
Drawing on phonology research within the generative linguistics tradition, stochastic methods, and notions from complex systems, we develop a modelling paradigm linking phonological structure, expressed in terms of syllables, to speech movement data acquired with 3D electromagnetic articulography and X-ray microbeam methods. The essential variable in the models is syllable structure. When mapped to discrete coordination topologies, syllabic organization imposes systematic patterns of variability on the temporal dynamics of speech articulation. We simulated these dynamics under different syllabic parses and evaluated the simulations against experimental data from Arabic and English, two languages claimed to parse similar strings of segments into different syllabic structures. Model simulations replicated several key experimental results, including the fallibility of past phonetic heuristics for syllable structure, and exposed the range of conditions under which such heuristics remain valid. More importantly, the modelling approach consistently diagnosed syllable structure, proving resilient to multiple sources of variability in the experimental data, including measurement variability, speaker variability, and contextual variability. Prospects for extending our modelling paradigm to acoustic data are also discussed.
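The diagnostic logic, that different coordination topologies impose different temporal-stability patterns, can be seen in a toy simulation: under a complex-onset parse the midpoint ("c-center") of the onset consonants keeps the most stable interval to a vowel anchor, while under a simplex parse the rightmost consonant does. This is a sketch in the spirit of such models; all timing values and noise levels are invented:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 500
anchor = 300.0 + rng.normal(0, 10, N)            # anchor, e.g., end of the vowel (ms)

def intervals(complex_onset):
    """Simulate C1-C2-V tokens; return left-edge, c-center, right-edge intervals."""
    plateau = 60.0 + rng.normal(0, 15, N)        # variable C1-to-C2 spacing
    if complex_onset:
        cc = 150.0 + rng.normal(0, 8, N)         # c-center coordinated with anchor
        c1, c2 = cc - plateau / 2, cc + plateau / 2
    else:
        c2 = 150.0 + rng.normal(0, 8, N)         # only prevocalic C coordinated
        c1 = c2 - plateau
        cc = (c1 + c2) / 2
    return {"left": anchor - c1, "ccenter": anchor - cc, "right": anchor - c2}

for parse in (True, False):
    ivs = intervals(complex_onset=parse)
    rsd = {k: float(v.std() / v.mean()) for k, v in ivs.items()}
    print("complex" if parse else "simplex",
          {k: round(r, 3) for k, r in rsd.items()})  # most stable interval differs
```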

14.
In Japanese, vowel duration can distinguish the meaning of words. In order for infants to learn this phonemic contrast using simple distributional analyses, there should be reliable differences in the duration of short and long vowels, and the frequency distribution of vowels must make these differences salient enough in the input. In this study, we evaluate these requirements of phonemic learning by analyzing the duration of vowels from over 11 hours of Japanese infant-directed speech. We found that long vowels are substantially longer than short vowels in the input directed to infants, for each of the five oral vowels. However, we also found that learning phonemic length from the overall distribution of vowel duration is not going to be easy for a simple distributional learner, because of the large base-rate effect (i.e., 94% of vowels are short), and because of the many factors that influence vowel duration (e.g., intonational phrase boundaries, word boundaries, and vowel height). Therefore, a successful learner would need to take into account additional factors such as prosodic and lexical cues in order to discover that duration can contrast the meaning of words in Japanese. These findings highlight the importance of taking into account the naturalistic distributions of lexicons and acoustic cues when modeling early phonemic learning.
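The base-rate problem described here is easy to reproduce: fit a two-component Gaussian mixture to a 94%/6% blend of short and long vowel durations and inspect where the components land. The durations below are invented but in a plausible range:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
# 94% short vowels (~70 ms) and 6% long vowels (~140 ms).
short = rng.normal(70.0, 20.0, 9400)
long_ = rng.normal(140.0, 30.0, 600)
durations = np.clip(np.concatenate([short, long_]), 20.0, None).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(durations)
print("fitted means (ms):", gmm.means_.ravel().round(1))
print("fitted weights:   ", gmm.weights_.round(3))
# Whether the fitted components recover the 70/140 ms generators depends on the
# base rate and the spread: widen the SDs to mimic prosodic and contextual
# variation, and the components stop aligning with phonemic length.
```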

15.
Aim: The aim of this contribution is to present the formant chart of the Czech vowels a, e, i, o, u and to show that this can be achieved by means of digital sound-processing methods. Method: A group of 35 Czech students of the Pedagogical Faculty of Palacky University was tested, and a recording of whispered vowels was taken from each of them. The recording was digitized and processed with the Discrete Fourier Transform. The result is the power spectrum of the individual vowels; the graphic output consists of a plot of the relative power of the individual frequencies in the original sound. The values of the first two maxima, which represent the first and second formants, were determined from the graph and plotted on a formant chart. Results: Altogether, 175 spectral analyses of individual vowels were performed. In the resulting power spectra, the first and second formant frequencies were identified. The first formant was plotted against the second, and pure vowel formant regions were identified. Conclusion: The frequency bands for the Czech vowel "a" were 850-1150 Hz for the first formant (F1) and 1200-2000 Hz for the second formant (F2). Similarly, the bands for the vowel "e" were 700-950 Hz (F1) and 1700-3000 Hz (F2); for "i", 300-450 Hz (F1) and 2000-3600 Hz (F2); for "o", 600-800 Hz (F1) and 600-1400 Hz (F2); and for "u", 100-400 Hz (F1) and 400-1200 Hz (F2). Discussion: At low frequencies it is feasible to invoke the source-filter model of voice production and associate vowel identity with the frequencies of the first two formants in the voice spectrum. On the other hand, under intonation, singing, or other forms of exposed voice (such as emotional or focused speech), the formant regions tend to spread; other frequencies dominate the spectral analysis, so specific formant frequency bands are not easily recognizable. Although the resulting formant map is not very different from the formant map of Peterson, it carries basic information about the specific Czech vowels. The results may be used in further research and in education.
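The described pipeline, a DFT power spectrum followed by picking the first two maxima as F1 and F2, looks roughly like the sketch below. The window length, smoothing width, and test signal are illustrative choices:

```python
import numpy as np

def first_two_formants(x, fs, fmin=150.0, fmax=4000.0):
    """First two peaks of the smoothed power spectrum of one frame."""
    w = x * np.hanning(len(x))
    power = np.abs(np.fft.rfft(w)) ** 2
    freqs = np.fft.rfftfreq(len(w), 1.0 / fs)
    smooth = np.convolve(power, np.ones(15) / 15.0, mode="same")
    peaks = []
    for i in range(1, len(smooth) - 1):
        if fmin <= freqs[i] <= fmax and smooth[i - 1] < smooth[i] > smooth[i + 1]:
            peaks.append(float(freqs[i]))
        if len(peaks) == 2:
            break
    return peaks

# Toy stand-in for a whispered vowel frame: two damped resonances near the
# "a"-like values reported above (F1 ~ 900 Hz, F2 ~ 1400 Hz).
fs = 16000
t = np.arange(4096) / fs
x = (np.exp(-40 * t) * np.sin(2 * np.pi * 900 * t)
     + 0.7 * np.exp(-40 * t) * np.sin(2 * np.pi * 1400 * t))
print(first_two_formants(x, fs))   # ~[900, 1400]
```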

16.
The sequential organization of sound over time can interact with the concurrent organization of sounds across frequency. Previous studies using simple acoustic stimuli have suggested that sequential streaming cues can retroactively affect the perceptual organization of sounds that have already occurred. It is unknown whether such effects generalize to the perception of speech sounds. Listeners' ability to identify two simultaneously presented vowels was measured in the following conditions: no context, a preceding context stream (precursors), and a following context stream (postcursors). The context stream consisted of brief repetitions of one of the two vowels, and the primary measure of performance was listeners' ability to identify the other vowel. Results in the precursor condition showed a significant advantage for the identification of the second vowel compared to the no-context condition, suggesting that sequential grouping mechanisms aided the segregation of the concurrent vowels, in agreement with previous work. However, performance in the postcursor condition was significantly worse than in the no-context condition, providing no evidence for an effect of stream segregation and suggesting a possible interference effect. Two additional experiments involving inharmonic (jittered) vowels were performed to provide additional cues to aid retroactive stream segregation; however, neither manipulation enabled listeners to improve their identification of the target vowel. Taken together with earlier studies, the results suggest that retroactive streaming may require large spectral differences between concurrent sources and thus may not provide a robust segregation cue for natural broadband sounds such as speech.

17.
Objective: To objectively evaluate clinical speech adaptation before and after the wearing of removable partial dentures for maxillary Kennedy Class I dentition defects. Methods: Speech changes in 30 patients with maxillary Kennedy Class I dentition defects were analyzed before denture placement, at initial placement, and at 1, 2, 4, and 8 weeks after placement. Results: For the consonant /j/ in the acrylic-resin group, F2 at 2, 4, and 8 weeks after placement differed significantly from the value at initial placement (P<0.05); F2 at initial placement also differed significantly from the pre-placement value (P<0.05), as did the value at 1 week after placement (P<0.05). Comparing the two groups at the same time point, the F2 of /j/ differed significantly at initial placement (P<0.01). For /sh/, F1 at initial placement differed significantly from the values at 2 and 8 weeks after placement (P<0.05). For /z/, in both the acrylic-resin and cast-framework groups, F1 and F2 differed significantly between the pre-placement measurement and 2, 4, and 8 weeks after placement (P<0.05). Conclusion: The thickness of the removable partial denture base mainly affects articulation at the anterior hard palate; clinically, reducing the base thickness in this region provides useful guidance for prosthodontic treatment.

18.
Summary: The stages of growth of the acoustic pathway (peripheral branch) were studied with the electron microscope in serial sections of the acoustic organs of 3- to 7-day chick embryos. Migration of cells from the acoustic epithelium was found at three days of incubation; these cells are presumably the future ganglion cells. Fascicles of nerve fibers penetrate the epithelium through gaps in the basement membrane at 4-5 days of incubation. A dilatation develops in the intraepithelial fibers at about six days, and long, thin prolongations grow from these dilatations and distribute themselves among the cells. In the course of the next day the fibers embrace the foot of the sensory cell and the prolongations become shortened. Many of these extensions are filled with vesicles. At this stage (seven days), specialized structures (synaptic bars) differentiate in the region of the sensory cell contacting the large nerve ending (calyx) or its short extensions. Each cell may show several synaptic bars, and each prolongation may contact more than one cell. Research sponsored by the Air Force Office of Scientific Research, Office of Aerospace Research, United States Air Force, under AFOSR Grant Nr. 313-67.

19.
Microarray experiments are affected by several sources of variability. This paper demonstrates the major role of day-to-day variability, underlines the importance of a randomized block design when processing replicates over several days to avoid systematic biases, and proposes a simple algorithm that minimizes the day dependence.
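The abstract does not spell out its algorithm. A common minimal approach consistent with its description is per-day centering of each gene under a randomized block design, so that the day (block) effect cancels without biasing condition contrasts. A sketch with invented data, not the paper's actual method:

```python
import numpy as np

rng = np.random.default_rng(5)
days = np.repeat(["d1", "d2", "d3"], 2)     # randomized block: each day processes
cond = np.tile(["ctrl", "treat"], 3)        # one array of each condition
expr = rng.normal(8.0, 0.5, (100, 6))       # 100 genes x 6 arrays (log scale)
expr += np.array([{"d1": 0.0, "d2": 0.8, "d3": -0.5}[d] for d in days])  # day bias
expr[:10, cond == "treat"] += 1.5           # true treatment effect in 10 genes

# Remove the day effect: center each gene within each day's block of arrays.
adjusted = expr.copy()
for d in np.unique(days):
    cols = days == d
    adjusted[:, cols] -= adjusted[:, cols].mean(axis=1, keepdims=True)

# Day-to-day shifts are gone, but within-day ctrl/treat contrasts are untouched.
diff = (adjusted[:, cond == "treat"].mean(axis=1)
        - adjusted[:, cond == "ctrl"].mean(axis=1))
print(diff[:10].round(2))                   # ~1.5 for the affected genes
```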

20.
Seeing the articulatory gestures of the speaker ("speech reading") enhances speech perception, especially in noisy conditions. Recent neuroimaging studies tentatively suggest that speech reading activates the speech motor system, which then influences superior-posterior temporal lobe auditory areas via an efference copy. Here, nineteen healthy volunteers were presented with silent video clips of a person articulating the Finnish vowels /a/, /i/ (non-targets), and /o/ (targets) during event-related functional magnetic resonance imaging (fMRI). Speech reading significantly activated visual cortex, posterior fusiform gyrus (pFG), posterior superior temporal gyrus and sulcus (pSTG/S), and the speech motor areas, including premotor cortex, parts of the inferior (IFG) and middle (MFG) frontal gyri extending into frontal polar (FP) structures, somatosensory areas, and supramarginal gyrus (SMG). Structural equation modelling (SEM) of these data suggested that information flows first from extrastriate visual cortex to pFG, and from there, in parallel, to pSTG/S and MFG/FP. From pSTG/S, information flow continues to IFG or SMG and eventually to somatosensory areas. Feedback connectivity was estimated to run from MFG/FP to IFG and pSTG/S. The direct functional connection from pFG to MFG/FP and the feedback connection from MFG/FP to pSTG/S and IFG support the hypothesis that prefrontal speech motor areas influence auditory speech processing in pSTG/S via an efference copy.

