Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
The activation of the listener's motor system during speech processing was first demonstrated by the enhancement of electromyographic tongue potentials evoked by single-pulse transcranial magnetic stimulation (TMS) over the tongue motor cortex. This technique is, however, technically challenging and enables only a rather coarse measurement of this motor mirroring. Here, we applied TMS to listeners' tongue motor area in association with ultrasound tissue Doppler imaging to describe fine-grained tongue kinematic synergies evoked by passive listening to speech. Subjects listened to syllables requiring different patterns of dorso-ventral and antero-posterior movements (/ki/, /ko/, /ti/, /to/). Results show that passive listening to speech sounds evokes a pattern of motor synergies mirroring those occurring during speech production. Moreover, mirror motor synergies were more evident in those subjects who performed well in discriminating speech in noise, demonstrating a role of the speech-related mirror system in feed-forward processing of the speaker's ongoing motor plan.

2.
The purpose of this project was to assess the feasibility of imaging the velopharynx of adult volunteers during repetitive speech, using gated magnetic resonance imaging (MRI). Although a number of investigators have used conventional MRI in the study of the human vocal tract, the mismatch between the lengthy time necessary to acquire sufficiently detailed images and the rapidity of movement of the vocal tract during speech has forced investigators to acquire images either while the subject is at rest or during sustained utterances. The technique used here acquired a portion of each image during repetitive utterances, building the full image over multiple utterance cycles. The velopharyngeal portal was imaged on a 1.5-Tesla GE Signa LX 8.2 platform with gated fast spoiled gradient echo protocol. An external 1-Hertz trigger was fed to the cardiac gate. Subjects synchronized utterance of consonant-vowel syllables to a flashing light synchronized with the external trigger. Each acquisition of 30 phases per second at a single-slice location took 22 to 29 seconds. Four consonant-vowel syllables (/pa/, /ma/, /sa/, and /ka/) were evaluated. Subjects vocalized throughout the acquisition, beginning 5 to 6 seconds beforehand to establish a regular rhythm. Imaging of the velopharyngeal portal was performed for sagittal, velopharyngeal axial (aligned perpendicular to the "knee" of the velum), axial, and coronal planes. Volumes were obtained by sequential acquisition of six to 10 slices (each with 30 phases) in the axial or sagittal planes during repetition of the /pa/ syllable. Spatiotemporal volumes of the single-slice data were sectioned to provide time-motion images (analogous to M-mode echocardiograms). Three-dimensional dynamic volume renderings of palate motion were displayed interactively (Vortex; CieMed, Singapore). A method suitable for the collection and visualization of four-dimensional information regarding monosyllabic speech using gated MRI was developed. 
These techniques were applied to a population of adult volunteer subjects with no history of speech problems and to two patients with a history of cleft lip and palate. The techniques allowed good real-time visualization of velopharyngeal anatomy during its entire range of motion and were also able to image pathology-specific anatomic differences in the subjects with cleft lip and cleft palate. These methods may be applicable to a wide spectrum of problems in speech physiology research, to clinical decision-making regarding surgery for speech, and to outcomes analysis.
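
The gating scheme described in this abstract (a 1 Hz external trigger, 30 phases per cycle, and an image built up over many repetitions of the same syllable) can be illustrated with a minimal Python sketch. It only shows how acquisition timestamps might be sorted into phase bins relative to the trigger. The function names `phase_index` and `bin_by_phase` and the sample timestamps are hypothetical and are not taken from the scanner protocol.

```python
from collections import defaultdict


def phase_index(t, trigger_period=1.0, n_phases=30):
    """Return the phase bin (0 .. n_phases - 1) for a timestamp t in seconds.

    Assumes the trigger fires at t = 0, trigger_period, 2 * trigger_period, ...
    """
    fraction = (t % trigger_period) / trigger_period  # position within the cycle
    return min(n_phases - 1, int(fraction * n_phases))


def bin_by_phase(timestamps, trigger_period=1.0, n_phases=30):
    """Group timestamps from many utterance cycles by their phase bin."""
    bins = defaultdict(list)
    for t in timestamps:
        bins[phase_index(t, trigger_period, n_phases)].append(t)
    return dict(bins)


# Hypothetical timestamps spread over four trigger cycles.
samples = [0.0, 0.5, 1.25, 2.5, 3.25]
print(bin_by_phase(samples))
```

Data that fall in the same bin come from different utterance cycles but the same point in the syllable, which is why the subject must keep a regular rhythm for the combined image to be sharp.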

3.
This paper presents a novel inverse estimation approach for the active contraction stresses of tongue muscles during speech. The proposed method is based on variational data assimilation using a mechanical tongue model and 3D tongue surface shapes for speech production. The mechanical tongue model considers nonlinear hyperelasticity, finite deformation, actual geometry from computed tomography (CT) images, and anisotropic active contraction by muscle fibers, the orientations of which are ideally determined using anatomical drawings. The tongue deformation is obtained by solving a stationary force-equilibrium equation using a finite element method. An inverse problem is established to find the combination of muscle contraction stresses that minimizes the Euclidean distance between the tongue surfaces from the mechanical analysis and from the CT results of speech production, where a signed-distance function represents the tongue surface. Our approach is validated through an ideal numerical example and extended to the real-world case of two Japanese vowels, /ʉ/ and /ɯ/. The results capture the target shape completely and provide an excellent estimation of the active contraction stresses in the ideal case, and exhibit tendencies similar to those in previous observations and simulations for the actual vowel cases. The present approach can reveal the relative relationship among the muscle contraction stresses in similar utterances with different tongue shapes, and enables the investigation of the coordination of tongue muscles during speech using only the deformed tongue shape obtained from medical images. This will enhance our understanding of speech motor control.

4.
A central challenge for articulatory speech synthesis is the simulation of realistic articulatory movements, which is critical for the generation of highly natural and intelligible speech. This includes modeling coarticulation, i.e., the context-dependent variation of the articulatory and acoustic realization of phonemes, especially of consonants. Here we propose a method to simulate the context-sensitive articulation of consonants in consonant-vowel syllables. To achieve this, the vocal tract target shape of a consonant in the context of a given vowel is derived as the weighted average of three measured and acoustically-optimized reference vocal tract shapes for that consonant in the context of the corner vowels /a/, /i/, and /u/. The weights are determined by mapping the target shape of the given context vowel into the vowel subspace spanned by the corner vowels. The model was applied for the synthesis of consonant-vowel syllables with the consonants /b/, /d/, /g/, /l/, /r/, /m/, /n/ in all combinations with the eight long German vowels. In a perception test, the mean recognition rate for the consonants in the isolated syllables was 82.4%. This demonstrates the potential of the approach for highly intelligible articulatory speech synthesis.
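
The weighted-average idea in this abstract can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that the vowel subspace is two-dimensional and that the weights are the barycentric coordinates of the context vowel in the triangle formed by /a/, /i/, and /u/. The abstract does not state how the mapping is actually computed, and the function names, coordinates, and shape parameters below are hypothetical.

```python
def barycentric_weights(p, a, i, u):
    """Return (w_a, w_i, w_u) with w_a + w_i + w_u = 1 such that
    p = w_a * a + w_i * i + w_u * u, where all points are 2D tuples."""
    (px, py), (ax, ay), (ix, iy), (ux, uy) = p, a, i, u
    det = (iy - uy) * (ax - ux) + (ux - ix) * (ay - uy)
    if det == 0:
        raise ValueError("corner vowels must not be collinear")
    w_a = ((iy - uy) * (px - ux) + (ux - ix) * (py - uy)) / det
    w_i = ((uy - ay) * (px - ux) + (ax - ux) * (py - uy)) / det
    return w_a, w_i, 1.0 - w_a - w_i


def blend_shapes(weights, shapes):
    """Weighted average of equally long parameter vectors, one per corner vowel."""
    return [sum(w * s[k] for w, s in zip(weights, shapes))
            for k in range(len(shapes[0]))]


# Hypothetical 2D positions of the corner vowels and of one context vowel.
corner_a, corner_i, corner_u = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
weights = barycentric_weights((0.25, 0.25), corner_a, corner_i, corner_u)

# Hypothetical consonant shapes (two parameters each) measured next to /a/, /i/, /u/.
consonant_shape = blend_shapes(weights, [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(weights, consonant_shape)
```

A context vowel that coincides with a corner vowel receives weight 1 for that corner, so the measured reference shape is reproduced unchanged.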

5.
The present study investigated the effects of sequence complexity, defined in terms of phonemic similarity and phonotactic probability, on the timing and accuracy of serial ordering for speech production in healthy speakers and speakers with either hypokinetic or ataxic dysarthria. Sequences consisted of strings of consonant-vowel (CV) syllables with each syllable containing the same vowel, /a/, paired with a different consonant. High complexity sequences contained phonemically similar consonants, and sounds and syllables that had low phonotactic probabilities; low complexity sequences contained phonemically dissimilar consonants and high probability sounds and syllables. Sequence complexity effects were evaluated by analyzing speech error rates and within-syllable vowel and pause durations. This analysis revealed that speech error rates were significantly higher and speech duration measures were significantly longer during production of high complexity sequences than during production of low complexity sequences. Although speakers with dysarthria produced longer overall speech durations than healthy speakers, the effects of sequence complexity on error rates and speech durations were comparable across all groups. These findings indicate that the duration and accuracy of processes for selecting items in a speech sequence are influenced by their phonemic similarity and/or phonotactic probability. Moreover, this robust complexity effect is present even in speakers with damage to subcortical circuits involved in serial control for speech.

6.
Speech perception is thought to be linked to speech motor production. This linkage is considered to mediate multimodal aspects of speech perception, such as audio-visual and audio-tactile integration. However, direct coupling between articulatory movement and auditory perception has been little studied. The present study reveals a clear dissociation between the effects of a listener’s own speech action and the effects of viewing another’s speech movements on the perception of auditory phonemes. We assessed the intelligibility of the syllables [pa], [ta], and [ka] when listeners silently and simultaneously articulated syllables that were congruent/incongruent with the syllables they heard. The intelligibility was compared with a condition where the listeners simultaneously watched another’s mouth producing congruent/incongruent syllables, but did not articulate. The intelligibility of [ta] and [ka] was degraded by articulating [ka] and [ta] respectively, which are associated with the same primary articulator (tongue) as the heard syllables. But it was not affected by articulating [pa], which is associated with a different primary articulator (lips) from the heard syllables. In contrast, the intelligibility of [ta] and [ka] was degraded by watching the production of [pa]. These results indicate that the articulatory-induced distortion of speech perception occurs in an articulator-specific manner while visually induced distortion does not. The articulator-specific nature of the auditory-motor interaction in speech perception suggests that speech motor processing directly contributes to our ability to hear speech.

7.
Mochida T, Gomi H, Kashino M. PLoS ONE, 2010, 5(11): e13866

Background

There has been plentiful evidence of kinesthetically induced rapid compensation for unanticipated perturbation in speech articulatory movements. However, the role of auditory information in stabilizing articulation has been little studied except for the control of voice fundamental frequency, voice amplitude and vowel formant frequencies. Although the influence of auditory information on the articulatory control process is evident in unintended speech errors caused by delayed auditory feedback, the direct and immediate effect of auditory alteration on the movements of articulators has not been clarified.

Methodology/Principal Findings

This work examined whether temporal changes in the auditory feedback of bilabial plosives immediately affect the subsequent lip movement. We conducted experiments with an auditory feedback alteration system that enabled us to replace or block speech sounds in real time. Participants were asked to produce the syllable /pa/ repeatedly at a constant rate. During the repetition, normal auditory feedback was interrupted, and one of three pre-recorded syllables /pa/, /Φa/, or /pi/, spoken by the same participant, was presented once at a different timing from the anticipated production onset, while no feedback was presented for subsequent repetitions. Comparisons of the labial distance trajectories under altered and normal feedback conditions indicated that the movement quickened during the short period immediately after the alteration onset, when /pa/ was presented 50 ms before the expected timing. Such change was not significant under other feedback conditions we tested.

Conclusions/Significance

The earlier articulation rapidly induced by the progressive auditory input suggests that a compensatory mechanism helps to maintain a constant speech rate by detecting errors between the internally predicted and actually provided auditory information associated with self movement. The timing- and context-dependent effects of feedback alteration suggest that the sensory error detection works in a temporally asymmetric window where acoustic features of the syllable to be produced may be coded.

8.
Objective assessments of lip movement can be beneficial in many disciplines, including visual speech recognition, surgical outcome assessment in patients with cleft lip, and the rehabilitation of patients with facial nerve impairments. The aim of this study was to develop an outcome measure for lip shape during speech using statistical shape analysis techniques. Lip movements during speech were captured from a sample of adult subjects considered average using a three-dimensional motion capture system. Geometric morphometrics was employed to extract three-dimensional coordinate data for lip shape during four spoken words decomposed into seven visemes (which included the resting lip shape). Canonical variate analysis was carried out in an attempt to statistically discriminate the seven visemes. The results showed that the second canonical variate discriminated the resting lip shape from articulation of the utterances and accounted for 17.2% of the total variance of the model. The first canonical variate was significant in discriminating between the utterances and accounted for 72.8% of the total variance of the model. The outcome measure was created using the 95% confidence intervals of the canonical variate scores for each subject plotted as ellipses for each viseme. The method and outcome model are proposed as a reference for comparing lip movement during speech in similar population groups.

9.
Knowledge of the comparative anatomy of tongue musculature is crucial to the discussion of the origin and the evolution of speech because of the indispensable role played by this organ in speech. However, the tongue musculature of primates has rarely been studied. In a previous study, the author analyzed human tongue musculature and developed a 3D model of this organ [Takemoto, Journal of Speech, Language, and Hearing Research 44:95-107, 2001]. In this study, the tongue musculature of chimpanzees was examined using methods similar to those used for humans. Results showed that tongue musculature was topologically the same for both humans and chimpanzees. As in humans, the tongue musculature of chimpanzees consisted of inner and outer regions. The inner musculature was composed of serial "structural units," made up of two types of laminae whose fibers were perpendicular to the tongue surface. The outer musculature was a thin layer of fibers oriented parallel to the surface and superficial to the inner musculature. Although the tongue musculature of humans and chimpanzees is similar, the external shapes differ: the chimpanzee tongue is flat, whereas the human tongue is round. Applying the muscular hydrostat theory to the external shape of the tongue suggests that the primary actions of the chimpanzee tongue are protrusion and retrusion, whereas the human tongue can be deformed in the oral cavity with a high degree of freedom. It is hypothesized that the evolution of the external shape of the tongue is one of the factors that led to the development of human speech. The results of this study suggest that modeling based on muscular hydrostatic theory of the effects of changes in external tongue shape on articulatory movements should be included in discussions on the origin of speech.

10.
Although the acoustic variability of speech is often described as a problem for phonetic recognition, there is little research examining acoustic-phonetic variability over time. We measured naturally occurring acoustic variability in speech production at nine specific time points (three per day over three days) to examine daily change in production as well as change across days for citation-form vowels. Productions of seven different vowels (/EE/, /IH/, /AH/, /UH/, /AE/, /OO/, /EH/) were recorded at 9AM, 3PM and 9PM over the course of each testing day on three different days, every other day, over a span of five days. Results indicate significant systematic change in F1 and F0 values over the course of a day for each of the seven vowels recorded, whereas F2 and F3 remained stable. Despite this systematic change within a day, however, talkers did not show significant changes in F0, F1, F2, and F3 between days, demonstrating that speakers are capable of producing vowels with great reliability over days without any extrinsic feedback besides their own auditory monitoring. The data show that in spite of substantial day-to-day variability in the specific listening and speaking experiences of these participants and thus exposure to different acoustic tokens of speech, there is a high degree of internal precision and consistency for the production of citation form vowels.

11.
Neural encoding of temporal speech features is a key component of acoustic and phonetic analyses. We examined the temporal encoding of the syllables /da/ and /ta/, which differ along the temporally based, phonetic parameter of voice onset time (VOT), in primary auditory cortex (A1) of awake monkeys using concurrent multilaminar recordings of auditory evoked potentials (AEP), the derived current source density, and multiunit activity. A general sequence of A1 activation consisting of a lamina-specific profile of parallel and sequential excitatory and inhibitory processes is described. VOT is encoded in the temporal response patterns of phase-locked activity to the periodic speech segments and by “on” responses to stimulus and voicing onset. A transformation occurs between responses in the thalamocortical (TC) fiber input and A1 cells. TC fibers are more likely to encode VOT with “on” responses to stimulus onset followed by phase-locked responses during the voiced segment, whereas A1 responses are more likely to exhibit transient responses both to stimulus and voicing onset. The relevance of these findings to subcortical speech processing, the human AEP, and speech psychoacoustics is discussed. A mechanism for categorical differentiation of voiced and unvoiced consonants is proposed.

12.
A computer-assisted three-dimensional (3D) system, 3D-DIASemb, has been developed that allows reconstruction and motion analysis of cells and nuclei in a developing embryo. In the system, 75 optical sections through a live embryo are collected in the z axis by using differential interference contrast microscopy. Optical sections for one reconstruction are collected in a 2.5-s period, and this process is repeated every 5 s. The outer perimeter and nuclear perimeter of each cell in the embryo are outlined in each optical section, converted into beta-spline models, and then used to construct 3D faceted images of the surface and nucleus of every cell in the developing embryo. Because all individual components of the embryo (i.e., each cell surface and each nuclear surface) are individually reconstructed, 3D-DIASemb allows isolation and analysis of (1) all or select nuclei in the absence of cell surfaces, (2) any single cell lineage, and (3) any single nuclear lineage through embryogenesis. Because all reconstructions represent mathematical models, 3D-DIASemb computes over 100 motility and dynamic morphology parameters for every cell, nucleus, or group of cells in the developing embryo at time intervals as short as 5 s. Finally, 3D-DIASemb reconstructs and motion analyzes cytoplasmic flow through the generation and analysis of "vector flow plots." To demonstrate the unique capabilities of this new technology, a Caenorhabditis elegans embryo is reconstructed and motion analyzed through the 28-cell stage. Although 3D-DIASemb was developed by using the C. elegans embryo as the experimental model, it can be applied to other embryonic systems. 3D-DIASemb therefore provides a new method for reconstructing and motion analyzing in 4D every cell and nucleus in a live, developing embryo, and should provide a powerful tool for assessing the effects of drugs, environmental perturbations, and mutations on the cellular and nuclear dynamics accompanying embryogenesis.

13.
Studies of the control of complex sequential movements have dissociated two aspects of movement planning: control over the sequential selection of movement plans, and control over the precise timing of movement execution. This distinction is particularly relevant in the production of speech: utterances contain sequentially ordered words and syllables, but articulatory movements are often executed in a non-sequential, overlapping manner with precisely coordinated relative timing. This study presents a hybrid dynamical model in which competitive activation controls selection of movement plans and coupled oscillatory systems govern coordination. The model departs from previous approaches by ascribing an important role to competitive selection of articulatory plans within a syllable. Numerical simulations show that the model reproduces a variety of speech production phenomena, such as effects of preparation and utterance composition on reaction time, and asymmetries in patterns of articulatory timing associated with onsets and codas. The model furthermore provides a unified understanding of a diverse group of phonetic and phonological phenomena which have not previously been related.

14.
Objective

Dynamic PET imaging is extensively used in brain imaging to estimate parametric maps. Inter-frame motion can substantially disrupt the voxel-wise time-activity curves (TACs), leading to erroneous maps during kinetic modelling. Therefore, it is important to characterize the robustness of kinetic parameters under various motion and kinetic model related factors.

Methods

Fully 4D brain simulations ([15O]H2O and [18F]FDG dynamic datasets) were performed using a variety of clinically observed motion patterns. Increasing levels of head motion were investigated as well as varying temporal frames of motion initiation. Kinetic parameter estimation was performed using both post-reconstruction kinetic analysis and direct 4D image reconstruction to assess bias from inter-frame emission blurring and emission/attenuation mismatch.

Results

Kinetic parameter bias heavily depends on the time point of motion initiation. Motion initiated towards the end of the scan results in the most biased parameters. For the [18F]FDG data, k4 is the more sensitive parameter to positional changes, while K1 and blood volume were proven to be relatively robust to motion. Direct 4D image reconstruction appeared more sensitive to changes in TACs due to motion, with parameter bias spatially propagating and depending on the level of motion.

Conclusion

Kinetic parameter bias highly depends upon the time frame at which motion occurred, with late frame motion-induced TAC discontinuities resulting in the least accurate parameters. This is of importance during prolonged data acquisition as is often the case in neuro-receptor imaging studies. In the absence of a motion correction, use of TOF information within 4D image reconstruction could limit the error propagation.

15.
When acquiring language, young children may use acoustic spectro-temporal patterns in speech to derive phonological units in spoken language (e.g., prosodic stress patterns, syllables, phonemes). Children appear to learn acoustic-phonological mappings rapidly, without direct instruction, yet the underlying developmental mechanisms remain unclear. Across different languages, a relationship between amplitude envelope sensitivity and phonological development has been found, suggesting that children may make use of amplitude modulation (AM) patterns within the envelope to develop a phonological system. Here we present the Spectral Amplitude Modulation Phase Hierarchy (S-AMPH) model, a set of algorithms for deriving the dominant AM patterns in child-directed speech (CDS). Using Principal Components Analysis, we show that rhythmic CDS contains an AM hierarchy comprising 3 core modulation timescales. These timescales correspond to key phonological units: prosodic stress (Stress AM, ~2 Hz), syllables (Syllable AM, ~5 Hz) and onset-rime units (Phoneme AM, ~20 Hz). We argue that these AM patterns could in principle be used by naïve listeners to compute acoustic-phonological mappings without lexical knowledge. We then demonstrate that the modulation statistics within this AM hierarchy indeed parse the speech signal into a primitive hierarchically-organised phonological system comprising stress feet (proto-words), syllables and onset-rime units. We apply the S-AMPH model to two other CDS corpora, one spontaneous and one deliberately-timed. The model accurately identified 72–82% (freely-read CDS) and 90–98% (rhythmically-regular CDS) stress patterns, syllables and onset-rime units. This in-principle demonstration that primitive phonology can be extracted from speech AMs is termed Acoustic-Emergent Phonology (AEP) theory. 
AEP theory provides a set of methods for examining how early phonological development is shaped by the temporal modulation structure of speech across languages. The S-AMPH model reveals a crucial developmental role for stress feet (AMs ~2 Hz). Stress feet underpin different linguistic rhythm typologies, and speech rhythm underpins language acquisition by infants in all languages.

16.
17.
We examined three bioacoustical analysis methods for comparing complex sounds among different populations. We chose the D‐syllable of the chick‐a‐dee call of the black‐capped chickadee (Poecile atricapilla) because it is a broadband sound representative of a class of vocalizations, common in many animals, that resists simple subjective classification for comparative studies. We examined the properties of the D‐syllable in field‐recorded samples from three different populations. The first method of data extraction sampled the amplitude values of a spectrum obtained in a single fast Fourier transform (SFFT) taken at the midpoint of each D‐syllable using multi‐speech software. The second method employed spectrogram cross‐correlation (SPCC) to obtain a matrix of similarity values between D‐syllables in the samples using canary software. The third method calculated similarity values obtained from the evaluation of four acoustic features of the D‐syllables derived from multi‐taper spectral analysis (MTSA) using sound analysis software. Following data extraction by these three techniques, we used multivariate statistical procedures to reduce the data for examination of differences among populations and to represent in scatter‐plots the patterns of clustering of the sounds. We found that the SFFT in the middle of the D‐syllable provided the poorest population discrimination following statistical processing, the SPCC method produced the next clearest population separation, and the MTSA method resulted in the most distinct separation of the three populations of D‐syllables. In carrying out these comparisons, we discovered that the characteristic environmental noise of a recording area can influence the signal properties of broadband sounds being compared by automated procedures, and could lead to faulty conclusions unless appropriate care is taken to mitigate the noise in which the signals of interest are embedded. 
Consequently, we re‐analyzed our data following noise reduction and found less discrete population separation overall. However, the methods of SPCC and MTSA retained the ability to separate populations, with MTSA providing the sharpest discrimination among groups.
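
As a rough illustration of what spectrogram cross-correlation (SPCC) measures, the following Python sketch slides one spectrogram over another in time and reports the highest correlation found. It is a simplification under stated assumptions: spectrograms are plain lists of time frames, similarity is the Pearson correlation over the overlapping frames, and the names `spcc_peak` and `min_frames` are hypothetical. It is not the procedure implemented in the canary software mentioned in the abstract.

```python
import math


def _pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def spcc_peak(spec_a, spec_b, max_shift, min_frames=2):
    """Return (peak correlation, time shift in frames) between two spectrograms.

    Each spectrogram is a list of time frames; each frame lists the amplitudes
    in the same frequency bins. Frame t of spec_a is compared with frame
    t + shift of spec_b; shifts with fewer than min_frames overlapping
    frames are skipped.
    """
    best_r, best_shift = -1.0, 0
    for shift in range(-max_shift, max_shift + 1):
        xs, ys, overlap = [], [], 0
        for t, frame in enumerate(spec_a):
            u = t + shift
            if 0 <= u < len(spec_b):
                xs.extend(frame)
                ys.extend(spec_b[u])
                overlap += 1
        if overlap >= min_frames:
            r = _pearson(xs, ys)
            if r > best_r:
                best_r, best_shift = r, shift
    return best_r, best_shift


# Hypothetical toy spectrograms: four frames with two frequency bins each.
call_a = [[0, 1], [2, 0], [1, 3], [0, 0]]
call_b = [[5, 5], [0, 1], [2, 0], [1, 3]]  # call_a delayed by one frame
print(spcc_peak(call_a, call_a, max_shift=2))
print(spcc_peak(call_a, call_b, max_shift=2))
```

Because every time-frequency cell contributes to the correlation, background noise shared by recordings from one site can raise their similarity, which is consistent with the abstract's caution about environmental noise.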

18.
19.
Work-related musculoskeletal disorders (WMSD) are commonly observed among workers involved in material handling tasks such as lifting. To improve workplace safety, it is necessary to assess the musculoskeletal and biomechanical risk exposures associated with these tasks. Such assessments have mainly been conducted using surface marker-based methods, which are time-consuming and tedious. During the past decade, computer vision based pose estimation techniques have gained increasing interest and may be a viable alternative to surface marker-based human movement analysis. The aim of this study is to develop and validate a computer vision based marker-less motion capture method to assess 3D joint kinematics of lifting tasks. Twelve subjects performing three types of symmetrical lifting tasks were filmed from two views using optical cameras. The joint kinematics were calculated by the proposed computer vision based motion capture method as well as by a surface marker-based motion capture method. The joint kinematics estimated by the computer vision based method were practically comparable to those obtained by the surface marker-based method. The mean and standard deviation of the difference between the joint angles estimated by the computer vision based method and those obtained by the surface marker-based method were 2.31 ± 4.00°. One potential application of the proposed computer vision based marker-less method is to noninvasively assess 3D joint kinematics of industrial tasks such as lifting.
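
An agreement figure of the form "mean ± standard deviation of the difference" between two measurement methods can be computed as in the short Python sketch below. The abstract does not say whether signed or absolute differences, or a sample or population standard deviation, were used; the sketch assumes signed differences and the sample standard deviation. The function name `difference_summary` and the angle values are hypothetical.

```python
import statistics


def difference_summary(angles_a, angles_b):
    """Mean and sample standard deviation of paired differences (a - b), in degrees."""
    if len(angles_a) != len(angles_b):
        raise ValueError("both methods must provide the same number of samples")
    diffs = [a - b for a, b in zip(angles_a, angles_b)]
    return statistics.mean(diffs), statistics.stdev(diffs)


# Hypothetical joint angles (degrees) for the same three frames.
vision_based = [10.0, 20.0, 30.0]
marker_based = [9.0, 18.0, 33.0]
mean_diff, sd_diff = difference_summary(vision_based, marker_based)
print(f"{mean_diff:.2f} ± {sd_diff:.2f}°")
```

A mean near zero with a large standard deviation would indicate little systematic offset but noticeable frame-to-frame disagreement, so both numbers are needed to judge comparability.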

20.
Purpose

To validate the accuracy of 4D Monte Carlo (4DMC) simulations to calculate dose deliveries to a deforming anatomy in the presence of realistic respiratory motion traces. A previously developed deformable lung phantom comprising an elastic tumor was modified to enable programming of arbitrary motion profiles. 4D simulations of the dose delivered to the phantom were compared with the measurements.

Methods

The deformable lung phantom moving with irregular breathing patterns was irradiated using static and VMAT beam deliveries. Using the RADPOS 4D dosimetry system, point doses were measured inside and outside the tumor. Dose profiles were acquired using films along the motion path of the tumor (S-I). In addition to dose measurements, RADPOS was used to record the motion of the tumor during dose deliveries. Dose measurements were then compared against 4DMC simulations with EGSnrc/4DdefDOSXYZnrc using the recorded tumor motion.

Results

The agreements between dose profiles from measurements and simulations were determined to be within 2%/2 mm. Point dose agreements were within 2σ of experimental and/or positional/dose reading uncertainties. 4DMC simulations were shown to accurately predict the sensitivity of delivered dose to the starting phase of breathing motions. We have demonstrated that our 4DMC method, combined with RADPOS, can accurately simulate realistic dose deliveries to a deforming anatomy moving with realistic breathing traces. This 4DMC tool has the potential to be used as a quality assurance tool to verify treatments involving respiratory motion. Adaptive treatment delivery is another area that may benefit from the potential of this 4DMC tool.
