Similar Articles
 Found 20 similar articles (search time: 15 ms)
1.
Models of speech production typically assume that control over the timing of speech movements is governed by the selection of higher-level linguistic units, such as segments or syllables. This study used real-time magnetic resonance imaging of the vocal tract to investigate the anticipatory movements speakers make prior to producing a vocal response. Two factors were varied: preparation (whether or not speakers had foreknowledge of the target response) and pre-response constraint (whether or not speakers were required to maintain a specific vocal tract posture prior to the response). In prepared responses, many speakers were observed to produce pre-response anticipatory movements with a variety of articulators, showing that speech movements can be readily dissociated from higher-level linguistic units. Substantial variation was observed across speakers with regard to the articulators used for anticipatory posturing and the contexts in which anticipatory movements occurred. The findings of this study have important consequences for models of speech production and for our understanding of the normal range of variation in anticipatory speech behaviors.

2.
Objective assessments of lip movement can be beneficial in many disciplines including visual speech recognition, for surgical outcome assessment in patients with cleft lip and for the rehabilitation of patients with facial nerve impairments. The aim of this study was to develop an outcome measure for lip shape during speech using statistical shape analysis techniques. Lip movements during speech were captured from a sample of adult subjects considered average using a three-dimensional motion capture system. Geometric Morphometrics was employed to extract three-dimensional coordinate data for lip shape during four spoken words decomposed into seven visemes (which included the resting lip shape). Canonical variate analysis was carried out in an attempt to statistically discriminate the seven visemes. The results showed that the second canonical variate discriminated the resting lip shape from articulation of the utterances and accounted for 17.2% of the total variance of the model. The first canonical variate was significant in discriminating between the utterances and accounted for 72.8% of the total variance of the model. The outcome measure was created using the 95% confidence intervals of the canonical variate scores for each subject plotted as ellipses for each viseme. The method and outcome model are proposed as a reference to compare lip movement during speech in similar population groups.
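The canonical variate analysis described above reduces, in its basic form, to a generalized eigenproblem on between-group and within-group scatter matrices, with each eigenvalue's share of the total giving the variance explained by that canonical variate. The sketch below illustrates that computation on synthetic coordinate data; it is a minimal illustration, not the authors' Geometric Morphometrics pipeline, and the data dimensions are invented for the example:

```python
import numpy as np

def canonical_variates(X, y):
    """Canonical variate analysis: eigenvectors of Sw^-1 Sb.

    X: (n_samples, n_features) coordinate data; y: integer class labels
    (e.g. viseme indices). Returns (variance_explained, scores), sorted
    by decreasing eigenvalue.
    """
    classes = np.unique(y)
    mean = X.mean(axis=0)
    Sw = np.zeros((X.shape[1], X.shape[1]))   # within-group scatter
    Sb = np.zeros_like(Sw)                    # between-group scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        d = (mc - mean)[:, None]
        Sb += len(Xc) * (d @ d.T)
    # Generalized eigenproblem Sb v = lam Sw v, solved via Sw^-1 Sb
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(eigvals.real)[::-1]
    eigvals = eigvals.real[order]
    eigvecs = eigvecs.real[:, order]
    var_explained = eigvals / eigvals.sum()
    scores = (X - mean) @ eigvecs             # canonical variate scores
    return var_explained, scores
```

With k groups, at most k - 1 canonical variates carry between-group separation, which is why two variates suffice to describe most of the variance in a small set of visemes.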

3.
The potential role of a size-scaling principle in orofacial movements for speech was examined by using between-group (adults vs. 5-yr-old children) as well as within-group correlational analyses. Movements of the lower lip and jaw were recorded during speech production, and anthropometric measures of orofacial structures were made. Adult women produced speech movements of equal amplitude and velocity to those of adult men. The children produced speech movement amplitudes equal to those of adults, but they had significantly lower peak velocities of orofacial movement. Thus we found no evidence supporting a size-scaling principle for orofacial speech movements. Young children have a relatively large-amplitude, low-velocity movement strategy for speech production compared with young adults. This strategy may reflect the need for more time to plan speech movement sequences and an increased reliance on sensory feedback as young children develop speech motor control processes.

4.
Automatic speech recognition (ASR) is currently used in many assistive technologies, such as helping individuals with speech impairment in their communication ability. One challenge in ASR for speech-impaired individuals is the difficulty in obtaining a good speech database of impaired speakers for building an effective speech acoustic model. Because there are very few existing databases of impaired speech, which are also limited in size, the obvious solution to build a speech acoustic model of impaired speech is by employing adaptation techniques. However, issues that have not been addressed in existing studies in the area of adaptation for speech impairment are as follows: (1) identifying the most effective adaptation technique for impaired speech; and (2) the use of suitable source models to build an effective impaired-speech acoustic model. This research investigates the above-mentioned two issues on dysarthria, a type of speech impairment affecting millions of people. We applied both unimpaired and impaired speech as the source model with well-known adaptation techniques such as maximum likelihood linear regression (MLLR) and constrained MLLR (C-MLLR). The recognition accuracy of each impaired speech acoustic model is measured in terms of word error rate (WER), with further assessments, including phoneme insertion, substitution and deletion rates. Unimpaired speech when combined with limited high-quality speech-impaired data improves performance of ASR systems in recognising severely impaired dysarthric speech. The C-MLLR adaptation technique was also found to be better than MLLR in recognising mildly and moderately impaired speech based on the statistical analysis of the WER. It was found that phoneme substitution was the biggest contributing factor in WER in dysarthric speech for all levels of severity.
The results show that the speech acoustic models derived from suitable adaptation techniques improve the performance of ASR systems in recognising impaired speech with limited adaptation data.
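The word error rate and the insertion, substitution and deletion counts used in [4] are standard by-products of a Levenshtein alignment between the reference and the hypothesis transcription. A minimal sketch (not the authors' evaluation code):

```python
def wer_counts(ref, hyp):
    """Levenshtein alignment between reference and hypothesis word lists.

    Returns (substitutions, deletions, insertions) for a minimum-edit
    alignment.
    """
    n, m = len(ref), len(hyp)
    # d[i][j] = (cost, subs, dels, ins) for ref[:i] vs hyp[:j]
    d = [[None] * (m + 1) for _ in range(n + 1)]
    d[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        d[i][0] = (i, 0, i, 0)          # all deletions
    for j in range(1, m + 1):
        d[0][j] = (j, 0, 0, j)          # all insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if ref[i - 1] == hyp[j - 1]:
                d[i][j] = d[i - 1][j - 1]        # match: carry counts over
                continue
            sub, dele, ins = d[i - 1][j - 1], d[i - 1][j], d[i][j - 1]
            best = min((sub, 0), (dele, 1), (ins, 2), key=lambda t: t[0][0])
            c, s, dl, io = best[0]
            if best[1] == 0:
                d[i][j] = (c + 1, s + 1, dl, io)
            elif best[1] == 1:
                d[i][j] = (c + 1, s, dl + 1, io)
            else:
                d[i][j] = (c + 1, s, dl, io + 1)
    return d[n][m][1:]

def wer(ref, hyp):
    """WER = (S + D + I) / number of reference words."""
    s, dl, ins = wer_counts(ref, hyp)
    return (s + dl + ins) / max(len(ref), 1)
```

Breaking WER into its three components is what allows the kind of finding reported above, namely that substitutions dominate the error rate for dysarthric speech.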

5.
6.
Nasir SM, Ostry DJ. Current Biology: CB, 2006, 16(19): 1918-1923
Speech production is dependent on both auditory and somatosensory feedback. Although audition may appear to be the dominant sensory modality in speech production, somatosensory information plays a role that extends from brainstem responses to cortical control. Accordingly, the motor commands that underlie speech movements may have somatosensory as well as auditory goals. Here we provide evidence that, independent of the acoustics, somatosensory information is central to achieving the precision requirements of speech movements. We were able to dissociate auditory and somatosensory feedback by using a robotic device that altered the jaw's motion path, and hence proprioception, without affecting speech acoustics. The loads were designed to target either the consonant- or vowel-related portion of an utterance because these are the major sound categories in speech. We found that, even in the absence of any effect on the acoustics, with learning subjects corrected to an equal extent for both kinds of loads. This finding suggests that there are comparable somatosensory precision requirements for both kinds of speech sounds. We provide experimental evidence that the neural control of stiffness or impedance (the resistance to displacement) provides for somatosensory precision in speech production.

7.
Head and facial movements can provide valuable cues to identity in addition to their primary roles in communicating speech and expression [1-8]. Here we report experiments in which we have used recent motion capture and animation techniques to animate an average head [9]. These techniques have allowed the isolation of motion from other cues and have enabled us to separate rigid translations and rotations of the head from nonrigid facial motion. In particular, we tested whether human observers can judge sex and identity on the basis of this information. Results show that people can discriminate both between individuals and between males and females from motion-based information alone. Rigid head movements appear particularly useful for categorization on the basis of identity, while nonrigid motion is more useful for categorization on the basis of sex. Accuracy for both sex and identity judgements is reduced when faces are presented upside down, and this finding shows that performance is not based on low-level motion cues alone and suggests that the information is represented in an object-based motion-encoding system specialized for upright faces. Playing animations backward also reduced performance for sex judgements and emphasized the importance of direction specificity in admitting access to stored representations of characteristic male and female movements.

8.
The study of the production of co-speech gestures (CSGs), i.e., meaningful hand movements that often accompany speech during everyday discourse, provides an important opportunity to investigate the integration of language, action, and memory because of the semantic overlap between gesture movements and speech content. Behavioral studies of CSGs and speech suggest that they have a common base in memory and predict that overt production of both speech and CSGs would be preceded by neural activity related to memory processes. However, to date the neural correlates and timing of CSG production are still largely unknown. In the current study, we addressed these questions with magnetoencephalography and a semantic association paradigm in which participants overtly produced speech or gesture responses that were either meaningfully related to a stimulus or not. Using spectral and beamforming analyses to investigate the neural activity preceding the responses, we found a desynchronization in the beta band (15–25 Hz), which originated 900 ms prior to the onset of speech and was localized to motor and somatosensory regions in the cortex and cerebellum, as well as right inferior frontal gyrus. Beta desynchronization is often seen as an indicator of motor processing and thus reflects motor activity related to the hand movements that gestures add to speech. Furthermore, our results show oscillations in the high gamma band (50–90 Hz), which originated 400 ms prior to speech onset and were localized to the left medial temporal lobe. High gamma oscillations have previously been found to be involved in memory processes and we thus interpret them to be related to contextual association of semantic information in memory. The results of our study show that high gamma oscillations in medial temporal cortex play an important role in the binding of information in human memory during speech and CSG production.

9.
Pei X, Hill J, Schalk G. IEEE Pulse, 2012, 3(1): 43-46
From the 1980s movie Firefox to the more recent Avatar, popular science fiction has speculated about the possibility of a person's thoughts being read directly from his or her brain. Such brain-computer interfaces (BCIs) might allow people who are paralyzed to communicate with and control their environment, and there might also be applications in military situations where silent user-to-user communication is desirable. Previous studies have shown that BCI systems can use brain signals related to movements and movement imagery or attention-based character selection. Although these systems have successfully demonstrated the possibility of controlling devices using brain function, directly inferring which word a person intends to communicate has been elusive. A BCI using imagined speech might provide such a practical, intuitive device. Toward this goal, our studies to date addressed two scientific questions: (1) Can brain signals accurately characterize different aspects of speech? (2) Is it possible to predict spoken or imagined words or their components using brain signals?

10.
Different kinds of articulators, such as the upper and lower lips, jaw, and tongue, are precisely coordinated in speech production. Based on a perturbation study of the production of a fricative consonant using the upper and lower lips, it has been suggested that increasing the stiffness in the muscle linkage between the upper lip and jaw is beneficial for maintaining the constriction area between the lips (Gomi et al. 2002). This hypothesis is crucial for examining the mechanism of speech motor control, that is, whether mechanical impedance is controlled for speech motor coordination. To test this hypothesis, in the current study we performed a dynamical simulation of lip compensatory movements based on a muscle linkage model and then evaluated the performance of compensatory movements. The temporal pattern of stiffness of muscle linkage was obtained from the electromyogram (EMG) of the orbicularis oris superior (OOS) muscle by using the temporal transformation (second-order dynamics with time delay) from EMG to stiffness, whose parameters were experimentally determined. The dynamical simulation using stiffness estimated from empirical EMG successfully reproduced the temporal profile of the upper lip compensatory articulations. Moreover, the estimated stiffness variation contributed significantly to reproducing a functional modulation of the compensatory response. This result supports the idea that mechanical impedance contributes substantially to organizing coordination among the lips and jaw. The motor command would be programmed not only to generate movement in each articulator but also to regulate mechanical impedance among articulators for robust coordination of speech motor control.
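A delayed second-order transformation from EMG to stiffness, of the general kind described above, can be sketched as a linear filter integrated numerically. The parameter values (delay, natural frequency, damping, gain) below are illustrative assumptions, not the experimentally determined values of the study:

```python
import numpy as np

def emg_to_stiffness(emg, dt=0.001, delay=0.05, omega=30.0, zeta=1.0, gain=1.0):
    """Map rectified EMG to stiffness via a delayed second-order low-pass:

        k'' + 2*zeta*omega*k' + omega^2 * k = gain * omega^2 * emg(t - delay)

    Integrated with semi-implicit Euler at time step dt (seconds).
    """
    n = len(emg)
    shift = int(round(delay / dt))
    u = np.zeros(n)
    if shift > 0:
        u[shift:] = emg[: n - shift]    # delayed input
    else:
        u = np.asarray(emg, dtype=float).copy()
    k = np.zeros(n)
    v = 0.0                             # stiffness rate of change
    for t in range(1, n):
        a = gain * omega**2 * u[t - 1] - 2 * zeta * omega * v - omega**2 * k[t - 1]
        v += a * dt
        k[t] = k[t - 1] + v * dt
    return k
```

A critically damped (zeta = 1) second-order filter gives a smooth, lag-compensating rise of stiffness after a burst of EMG, which is the qualitative behavior such a transformation needs.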

11.
Studies of the control of complex sequential movements have dissociated two aspects of movement planning: control over the sequential selection of movement plans, and control over the precise timing of movement execution. This distinction is particularly relevant in the production of speech: utterances contain sequentially ordered words and syllables, but articulatory movements are often executed in a non-sequential, overlapping manner with precisely coordinated relative timing. This study presents a hybrid dynamical model in which competitive activation controls selection of movement plans and coupled oscillatory systems govern coordination. The model departs from previous approaches by ascribing an important role to competitive selection of articulatory plans within a syllable. Numerical simulations show that the model reproduces a variety of speech production phenomena, such as effects of preparation and utterance composition on reaction time, and asymmetries in patterns of articulatory timing associated with onsets and codas. The model furthermore provides a unified understanding of a diverse group of phonetic and phonological phenomena which have not previously been related.

12.
Highly spontaneous, conversational, and potentially emotional and noisy speech is known to be a challenge for today’s automatic speech recognition (ASR) systems, which highlights the need for advanced algorithms that improve speech features and models. Histogram Equalization is an efficient method to reduce the mismatch between clean and noisy conditions by normalizing all moments of the probability distribution of the feature vector components. In this article, we propose to combine histogram equalization and multi-condition training for robust keyword detection in noisy speech. To better cope with conversational speaking styles, we show how contextual information can be effectively exploited in a multi-stream ASR framework that dynamically models context-sensitive phoneme estimates generated by a long short-term memory neural network. The proposed techniques are evaluated on the SEMAINE database—a corpus containing emotionally colored conversations with a cognitive system for “Sensitive Artificial Listener”.
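Histogram equalization of feature-vector components, as described above, can be sketched as a rank-based mapping of each component onto a reference distribution. A standard Gaussian is one common choice of reference; the paper's exact reference distribution may differ, so treat this as a generic sketch:

```python
import numpy as np
from statistics import NormalDist

def histogram_equalize(features):
    """Map each feature-vector component onto a standard normal
    distribution by matching its empirical CDF to the Gaussian quantile
    function.

    features: (n_frames, n_dims) array, e.g. MFCCs. Returns same shape.
    """
    n = features.shape[0]
    out = np.empty_like(features, dtype=float)
    inv = NormalDist().inv_cdf
    for d in range(features.shape[1]):
        ranks = features[:, d].argsort().argsort()   # 0 .. n-1
        cdf = (ranks + 0.5) / n                      # midpoints avoid 0 and 1
        out[:, d] = [inv(p) for p in cdf]
    return out
```

Because the mapping is monotone, it preserves the rank order of each component while forcing all moments of its distribution toward those of the reference, which is what reduces the clean/noisy mismatch.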

13.
Any human-computer interface requires both a means of transducing information flowing from the person and a way of classifying this information in a form that can be used by an application program. Since several interface devices exploit the head movements of disabled people to control computers, this paper includes a discussion of existing technologies based on head movements. As an alternative to simple techniques based on pointing to classify this information, this paper studies the possibility of using a combination of pointing and movement gestures to control an application program. By using hidden Markov models to classify movements into ‘yes’, ‘no’ and spurious gestures, it was possible to control a simple graphics application program. Subsequent analysis showed that the hidden Markov models achieved a 74% success rate.
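Classification of movement gestures with hidden Markov models, as in [13], can be sketched with the forward algorithm and one model per gesture class, rejecting low-likelihood sequences as spurious. The models, symbols, and rejection threshold below are invented for illustration, not those of the paper:

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM
    (pi: initial probs, A: transition matrix, B[state, symbol]: emission
    probs), computed with the scaled forward algorithm."""
    alpha = pi * B[:, obs[0]]
    ll = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()
        ll += np.log(s)
        alpha /= s
    return ll

# Symbols: 0=up, 1=down, 2=left, 3=right (quantised head-movement directions).
# Hand-set 2-state models: a 'yes' nod alternates up/down, a 'no' shake
# alternates left/right.
pi = np.array([0.5, 0.5])
A = np.array([[0.1, 0.9], [0.9, 0.1]])          # states alternate
B_yes = np.array([[0.85, 0.05, 0.05, 0.05],
                  [0.05, 0.85, 0.05, 0.05]])
B_no = np.array([[0.05, 0.05, 0.85, 0.05],
                 [0.05, 0.05, 0.05, 0.85]])

def classify(obs, reject=-15.0):
    """Pick the most likely gesture model; below `reject`, call it spurious."""
    scores = {"yes": forward_loglik(obs, pi, A, B_yes),
              "no": forward_loglik(obs, pi, A, B_no)}
    label = max(scores, key=scores.get)
    return label if scores[label] > reject else "spurious"
```

In practice the per-class model parameters would be trained with Baum-Welch on recorded gestures rather than hand-set, but the decision rule (maximum likelihood with a rejection threshold) is the same.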

14.
Recent advances in animal tracking and telemetry technology have allowed the collection of location data at an ever-increasing rate and accuracy, and these advances have been accompanied by the development of new methods of data analysis for portraying space use, home ranges and utilization distributions. New statistical approaches include data-intensive techniques such as kriging and nonlinear generalized regression models for habitat use. In addition, mechanistic home-range models, derived from models of animal movement behaviour, promise to offer new insights into how home ranges emerge as the result of specific patterns of movements by individuals in response to their environment. Traditional methods such as kernel density estimators are likely to remain popular because of their ease of use. Large datasets make it possible to apply these methods over relatively short periods of time such as weeks or months, and these estimates may be analysed using mixed effects models, offering another approach to studying temporal variation in space-use patterns. Although new technologies open new avenues in ecological research, our knowledge of why animals use space in the ways we observe will only advance by researchers using these new technologies and asking new and innovative questions about the empirical patterns they observe.
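A kernel density estimate of a utilization distribution, and the area of the 95% home range derived from it, can be sketched as follows; the isotropic Gaussian kernel, fixed bandwidth, and grid resolution are illustrative choices, not recommendations from the review:

```python
import numpy as np

def kde_home_range(xy, bandwidth, grid_n=100, level=0.95):
    """Kernel density estimate of a utilization distribution from
    relocation points xy (n, 2), plus the area of the `level` home range
    (smallest region holding that fraction of the probability mass).

    Returns (grid_x, grid_y, density, area).
    """
    pad = 3 * bandwidth
    xs = np.linspace(xy[:, 0].min() - pad, xy[:, 0].max() + pad, grid_n)
    ys = np.linspace(xy[:, 1].min() - pad, xy[:, 1].max() + pad, grid_n)
    gx, gy = np.meshgrid(xs, ys)
    pts = np.stack([gx.ravel(), gy.ravel()], axis=1)
    # Sum of isotropic Gaussian kernels centred on each relocation
    d2 = ((pts[:, None, :] - xy[None, :, :]) ** 2).sum(-1)
    dens = np.exp(-d2 / (2 * bandwidth**2)).sum(1)
    dens /= dens.sum()                      # normalise over the grid
    # Smallest set of cells holding `level` of the probability mass
    order = np.argsort(dens)[::-1]
    csum = np.cumsum(dens[order])
    n_cells = np.searchsorted(csum, level) + 1
    cell_area = (xs[1] - xs[0]) * (ys[1] - ys[0])
    return gx, gy, dens.reshape(grid_n, grid_n), n_cells * cell_area
```

Bandwidth selection (reference, least-squares cross-validation, plug-in) is the main practical decision with this estimator and drives much of the variation between published home-range sizes.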

15.
The performance of objective speech and audio quality measures for the prediction of the perceived quality of frequency-compressed speech in hearing aids is investigated in this paper. A number of existing quality measures have been applied to speech signals processed by a hearing aid, which compresses speech spectra along frequency in order to make information contained in higher frequencies audible for listeners with severe high-frequency hearing loss. Quality measures were compared with subjective ratings obtained from normal hearing and hearing impaired children and adults in an earlier study. High correlations were achieved with quality measures computed by quality models that are based on the auditory model of Dau et al., namely, the measure PSM, computed by the quality model PEMO-Q; the measure qc, computed by the quality model proposed by Hansen and Kollmeier; and the linear subcomponent of the HASQI. For the prediction of quality ratings by hearing impaired listeners, extensions of some models incorporating hearing loss were implemented and shown to achieve improved prediction accuracy. Results indicate that these objective quality measures can potentially serve as tools for assisting in initial setting of frequency compression parameters.

16.
A method for the flexible docking of high-resolution atomic structures into lower resolution densities derived from electron microscopy is presented. The atomic structure is deformed by an iterative process using combinations of normal modes to obtain the best fit of the electron microscopical density. The quality of the computed structures has been evaluated by several techniques borrowed from crystallography. Two atomic structures of the SERCA1 Ca-ATPase corresponding to different conformations were used as a starting point to fit the electron density corresponding to a different conformation. The fitted models have been compared to published models obtained by rigid domain docking, and their relation to the known crystallographic structures is explored by normal mode analysis. We find that only a few modes contribute significantly to the transition. The associated motions involve almost exclusively rotation and translation of the cytoplasmic domains as well as displacement of cytoplasmic loops. We suggest that the movements of the cytoplasmic domains are driven by the conformational change that occurs between nonphosphorylated and phosphorylated intermediate, the latter being mimicked by the presence of vanadate at the phosphorylation site in the electron microscopy structure.

17.
Post-traumatic stress disorder (PTSD) is a clinical condition that affects numerous people from diverse walks of life all over the world. Many researchers have reported difficulties in estimating the severity of PTSD symptoms, and obtaining an accurate diagnosis remains a complicated task. Therefore, this paper develops a speech-based post-traumatic stress disorder monitoring method whose main objective is to determine whether patients are affected by PTSD. The proposed approach comprises three steps: pre-processing (pre-emphasis), feature extraction, and classification. The input speech signal is first passed to the pre-processing phase, where the speech is segmented into frames. Features are then extracted from each speech frame and classified using an XGBoost-based Teamwork Optimization (XGB-TWO) algorithm. In addition, two datasets, TIMIT and FEMH, are used to evaluate and classify PTSD from the speech signals. Based on the evaluation of the proposed model for diagnosing PTSD patients, various metrics, namely accuracy, specificity, sensitivity, and recall, are computed. Finally, the experimental investigation and comparative analysis demonstrate that the proposed technique achieves an accuracy of 98.25%.
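The pre-processing stage described above (pre-emphasis followed by segmentation into overlapping frames) is standard in speech pipelines and can be sketched as follows; the filter coefficient and frame/hop sizes are conventional defaults, not values taken from the paper:

```python
import numpy as np

def preemphasize(signal, alpha=0.97):
    """First-order pre-emphasis filter y[n] = x[n] - alpha * x[n-1],
    boosting high frequencies before feature extraction."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_signal(signal, frame_len, hop_len):
    """Split a 1-D signal into overlapping frames of frame_len samples,
    advancing hop_len samples per frame (the tail shorter than a full
    frame is dropped, no padding)."""
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx]
```

For 16 kHz speech, typical values are 25 ms frames (frame_len = 400) with a 10 ms hop (hop_len = 160); each resulting frame would then feed the feature-extraction step.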

18.
19.
Two models have been proposed to explain the adventurous gliding motility of Myxococcus xanthus: (i) polar secretion of slime and (ii) an unknown motor that uses cell surface adhesion complexes that form periodic attachments along the cell length. Gliding movements of the leading poles of cephalexin-treated filamentous cells were observed but not equivalent movements of the lagging poles. This demonstrates that the adventurous-motility motors are not confined to the rear of the cell.

20.
Nonequilibrium response spectroscopy (NRS) has been proposed recently to complement standard electrophysiological techniques used to investigate ion channels. It involves application of rapidly oscillating potentials that drive the ion channel ensemble far from equilibrium. It is argued that new, so far undiscovered features of ion channel gating kinetics may become apparent under such nonequilibrium conditions. In this paper we explore the possibility of using regular, sinusoidal voltages with the NRS protocols to facilitate Markov model selection for ion channels. As a test case we consider the Shaker potassium channel for which various Markov models have been proposed recently. We concentrate on certain classes of such models and show that while some models might be virtually indistinguishable using standard methods, they show marked differences when driven with an oscillating voltage. Model currents are compared to experimental data obtained for the Shaker K+ channel expressed in mammalian cells (tsA 201).
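The idea of driving a channel model with an oscillating potential can be illustrated with the simplest possible case: a two-state (closed/open) Markov scheme with exponentially voltage-dependent rates, integrated under a constant versus a sinusoidal command voltage. All rate parameters below are invented for illustration and do not describe a Shaker model:

```python
import numpy as np

def open_probability(t, v_of_t, alpha0=1.0, beta0=1.0, s=0.04):
    """Open probability of a two-state (C <-> O) channel under a
    time-varying voltage, integrating the master equation

        dp/dt = alpha(V) * (1 - p) - beta(V) * p

    with forward Euler. Rates depend exponentially on voltage (mV)."""
    p = np.empty_like(t)
    p[0] = 0.0                           # start fully closed
    for i in range(1, len(t)):
        dt = t[i] - t[i - 1]
        v = v_of_t(t[i - 1])
        a = alpha0 * np.exp(s * v)       # opening rate, 1/ms
        b = beta0 * np.exp(-s * v)       # closing rate, 1/ms
        p[i] = p[i - 1] + dt * (a * (1 - p[i - 1]) - b * p[i - 1])
    return p

t = np.arange(0, 200, 0.01)                              # ms
step = lambda tt: 40.0                                   # constant depolarisation
sine = lambda tt: 40.0 * np.sin(2 * np.pi * tt / 20.0)   # 50 Hz oscillation
```

Under the constant voltage the open probability relaxes to its equilibrium value a/(a+b), whereas under the sinusoid it never equilibrates; models that share steady-state behavior can differ in exactly this driven response, which is the leverage the NRS protocol exploits.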

