Similar Documents
20 similar documents found
1.
This report attempts to define the physiological parameter used to describe “voice tremor” in psychological stress evaluation machines, and to identify its sources. The parameter was found to be a low-frequency (5–20 Hz) random process that frequency-modulates the vocal cord waveform and, independently, affects the frequency range of the third speech formant. The frequency variations in unstressed speakers were found to result from forced muscular undulations driven by central nervous signals rather than from a passive resonant phenomenon. Various physiological and clinical experiments leading to these conclusions are discussed: a) induced muscular activity in the vocal tract and vocal cord regions can generate tremor in the voice; b) relaxed subjects exhibit significant tremor correlation between spontaneously generated speech and EMG, with the EMG leading the speech tremor; c) tremor in the electrical activity recorded from muscles overlying the vocal tract was correlated with the demodulated third-formant signal, and demodulated vocal cord pitch tremor was correlated with demodulated first-formant tremor; d) enhanced tremor was found in Parkinson patients and diminished tremor in patients with certain traumatic brain injuries.
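The low-frequency frequency modulation described above can be illustrated with a small demodulation sketch. The Python code below, a minimal sketch rather than the authors' method, estimates the instantaneous frequency of a voiced tone via an FFT-based analytic signal and reads off the dominant tremor rate; all function names and parameter values are illustrative.

```python
import numpy as np

def instantaneous_frequency(x, fs):
    """Instantaneous frequency (Hz) via an FFT-based analytic signal."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(X * h)
    phase = np.unwrap(np.angle(analytic))
    return np.diff(phase) * fs / (2 * np.pi)

def tremor_frequency(x, fs):
    """Dominant modulation frequency of the pitch contour (the 'tremor')."""
    f_inst = instantaneous_frequency(x, fs)
    f_inst = f_inst - f_inst.mean()          # remove the carrier (mean pitch)
    spec = np.abs(np.fft.rfft(f_inst))
    freqs = np.fft.rfftfreq(len(f_inst), 1 / fs)
    return freqs[np.argmax(spec)]
```

Applied to a synthetic 120 Hz tone frequency-modulated at 8 Hz, the estimate falls in the 5–20 Hz tremor band discussed in the abstract.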

2.
An artificial neural network based on anatomical and physiological findings on the afferent pathway from the ear to the cortex is presented, and the roles of its constituent functions in the recognition of continuous speech are examined. The network processes successive spectra of speech sounds with a cascade of neural layers: a lateral excitation layer (LEL), a lateral inhibition layer (LIL), and a stack of feature detection layers (FDLs). These layers are shown to be effective for recognizing spoken words. First, the LEL reduces the distortion of the sound spectrum caused by the pitch of speech sounds. Next, the LIL emphasizes the major energy peaks of the spectrum, the formants. Last, the FDLs detect syllables and words in successive formants, where two functions, time delay and strong adaptation, play important roles: time delay makes it possible to retain the pattern of formant changes long enough to detect spoken words successively, and strong adaptation helps remove the time-warp of formant changes. Digital computer simulations show that the network detects isolated syllables, isolated words, and connected words in continuous speech, while reproducing fundamental responses found in the auditory system such as ON, OFF, ON-OFF, and SUSTAINED patterns.
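The peak-emphasis role of the lateral inhibition layer can be sketched in a few lines of numpy. This is a toy illustration, not the paper's network: the inhibition weight `k` and the two-neighbour kernel are illustrative choices.

```python
import numpy as np

def lateral_inhibition(spectrum, k=0.5):
    """Sharpen spectral peaks: each channel is suppressed in proportion
    to the mean activity of its two neighbours, then half-wave rectified."""
    s = np.asarray(spectrum, dtype=float)
    left = np.roll(s, 1);  left[0] = 0.0      # no neighbour past the edges
    right = np.roll(s, -1); right[-1] = 0.0
    return np.maximum(0.0, s - k * (left + right) / 2.0)
```

On a spectrum with a single peak, the peak-to-sidelobe ratio increases, which is the formant-emphasis behaviour the abstract attributes to the LIL.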

3.
Unbalanced bipolar stimulation, delivered using charge-balanced pulses, was used to produce “phantom stimulation”: stimulation beyond the most apical contact of a cochlear implant's electrode array. The Phantom channel was allocated audio frequencies below 300 Hz in a speech coding strategy, conveying energy some two octaves lower than the clinical strategy and hence delivering the fundamental frequency of speech and of many musical tones. A group of 12 Advanced Bionics cochlear implant recipients took part in a chronic study investigating the fitting of the Phantom strategy and speech and music perception when using it. Speech in noise was evaluated immediately after fitting Phantom for the first time (Session 1) and after one month of take-home experience (Session 2). A repeated-measures analysis of variance (ANOVA) with factors strategy (Clinical, Phantom) and time (Session 1, Session 2) revealed a significant strategy-by-time interaction: Phantom yielded a significant improvement in speech intelligibility after one month of use. Furthermore, a trend towards better performance with Phantom (48%) than with F120 (37%) after one month of use failed to reach significance after type I error correction. Questionnaire results show a preference for Phantom when listening to music, likely driven by an improved balance between high and low frequencies.

4.
Robust Speech Signal Representation Based on the Integration of Temporal and Place Mechanisms
Conventional spectral feature extraction for speech signals is based on FFT energy-spectrum analysis, which in noisy environments treats the spectral components of noise and of speech on an equal footing; that is, noise components carry the same weight as speech components. Clearly, in noisy conditions this approach lets noise mask the speech components. In the auditory system, this style of coding corresponds to the frequency-analysis function of the cochlear filters, i.e., the place mechanism. In reality, however, the auditory system does not treat noise and periodic signals equally: it is sensitive to periodic signals and insensitive to noise, and auditory nerve fibers encode stimuli through the intervals between their spike discharges, which corresponds to the temporal coding mechanism of auditory processing. Building on these two mechanisms, this paper proposes a method that integrates the place and temporal mechanisms, which is precisely how the auditory system processes stimuli. The combined method exploits the advantages of both mechanisms and can effectively detect speech signals in noisy environments.
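The temporal mechanism described above, sensitivity to periodicity via inter-spike intervals, is often approximated in signal processing by autocorrelation. The sketch below, an illustration rather than the paper's algorithm, scores how periodic a signal is: high for a voiced tone, low for noise.

```python
import numpy as np

def periodicity_salience(x, min_lag, max_lag):
    """Peak of the normalized autocorrelation over a lag range: near 1 for
    periodic signals, near 0 for noise (a proxy for the temporal mechanism)."""
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    ac = ac / ac[0]
    lag = min_lag + np.argmax(ac[min_lag:max_lag])
    return ac[lag], lag
```

The best lag also recovers the period, i.e., the dominant inter-spike interval a nerve fiber would phase-lock to.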

5.

Background

Improvement of the cochlear implant (CI) front-end signal acquisition is needed to increase speech recognition in noisy environments. To suppress the directional noise, we introduce a speech-enhancement algorithm based on microphone array beamforming and spectral estimation. The experimental results indicate that this method is robust to directional mobile noise and strongly enhances the desired speech, thereby improving the performance of CI devices in a noisy environment.

Methods

The spectrum estimation and array beamforming methods were combined to suppress the ambient noise. The directivity coefficient was estimated in the noise-only intervals and updated to track the moving noise.

Results

The proposed algorithm was implemented within the CI speech strategy. In the actual implementation, a maximally flat (maxflat) filter is used to obtain fractional sampling points, and a cepstrum-based method distinguishes desired-speech frames from noise frames. Broadband adjustment coefficients were added to compensate for the energy loss in the low-frequency band.

Discussions

The approximation of the directivity coefficient is tested and its errors are discussed. We also analyze the algorithm's constraints on noise estimation and distortion in CI processing. The performance of the proposed algorithm is analyzed and compared with other prevalent methods.

Conclusions

A hardware platform was constructed for the experiments. The speech-enhancement results showed that the algorithm suppresses non-stationary noise with a high SNR improvement, and excellent performance was obtained in both the speech-enhancement experiments and mobile testing. The signal-distortion results indicate that the algorithm is robust, combining high SNR improvement with low speech distortion.
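The core idea of updating a noise estimate only in noise-only intervals can be sketched as frame-wise spectral subtraction. This is a simplified stand-in for the paper's method: a crude energy threshold replaces the cepstrum-based speech/noise classifier, there is no beamforming, and all parameter values (`frame`, `alpha`, `floor`) are illustrative.

```python
import numpy as np

def spectral_subtraction(x, fs, frame=256, alpha=2.0, floor=0.05):
    """Frame-wise magnitude spectral subtraction. The noise spectrum is
    estimated (and kept up to date) only in frames classified as noise-only."""
    noise_mag = None
    out = np.zeros(len(x))
    energies = [np.sum(x[i:i + frame] ** 2)
                for i in range(0, len(x) - frame + 1, frame)]
    thresh = 2.0 * np.median(energies)       # crude noise/speech decision
    for i in range(0, len(x) - frame + 1, frame):
        seg = x[i:i + frame]
        X = np.fft.rfft(seg)
        mag, ph = np.abs(X), np.angle(X)
        if np.sum(seg ** 2) < thresh:        # noise-only frame: update estimate
            noise_mag = mag if noise_mag is None else 0.9 * noise_mag + 0.1 * mag
        if noise_mag is not None:            # subtract, with a spectral floor
            mag = np.maximum(mag - alpha * noise_mag, floor * mag)
        out[i:i + frame] = np.fft.irfft(mag * np.exp(1j * ph), n=frame)
    return out
```

On a noisy signal with a tone burst, the noise-only regions are strongly attenuated while the "speech" region is largely preserved.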

6.
Inferences on the evolution of human speech based on anatomical data must take into account its physiology, acoustics and perception. Human speech is generated by the supralaryngeal vocal tract (SVT) acting as an acoustic filter on noise sources generated by turbulent airflow and quasi-periodic phonation generated by the activity of the larynx. The formant frequencies, which are major determinants of phonetic quality, are the frequencies at which relative energy maxima will pass through the SVT filter. Neither the articulatory gestures of the tongue nor their acoustic consequences can be fractionated into oral and pharyngeal cavity components. Moreover, the acoustic cues that specify individual consonants and vowels are “encoded”, i.e., melded together. Formant frequency encoding makes human speech a vehicle for rapid vocal communication. Non-human primates lack the anatomy that enables modern humans to produce sounds that enhance this process, as well as the neural mechanisms necessary for the voluntary control of speech articulation. The specific claims of Duchin (1990) are discussed.
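The SVT-as-filter view has a standard textbook idealization: a uniform tube closed at the glottis and open at the lips resonates at odd quarter-wavelength frequencies, F_k = (2k - 1) * c / (4 * L). The helper below is that standard formula, not a model from the paper.

```python
def uniform_tube_formants(length_cm, n=3, c=35000.0):
    """Resonances of a uniform tube closed at one end and open at the other:
    F_k = (2k - 1) * c / (4 * L), with c the speed of sound in cm/s."""
    return [(2 * k - 1) * c / (4.0 * length_cm) for k in range(1, n + 1)]

# A 17.5 cm adult male vocal tract gives the classic schwa-like pattern:
# uniform_tube_formants(17.5)  ->  [500.0, 1500.0, 2500.0]  (Hz)
```

Deviations from this uniform-tube pattern are exactly what tongue gestures produce, which is why, as the abstract notes, the resulting formant patterns cannot be cleanly fractionated into oral and pharyngeal components.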

7.
Recording electrical auditory brainstem responses (EABR) provides clinical insight into the responses of the residual post-cochlear neural system to electrical stimulation in profoundly deaf patients. A new strategy is presented for stimulating patients already implanted with a 15-electrode cochlear implant. Since the device is fully re-programmable via an RS-232 PC interface, it was possible to load a specific stimulation strategy designed to improve the spatial locus and the temporal structure of the impulse stimulation. Waves III to V emerge more clearly when this method is applied.

8.
Frequency modulation (FM) is a basic constituent of vocalisation in many animals as well as in humans. In human speech, short rising and falling FM sweeps of around 50 ms duration, called formant transitions, characterise individual speech sounds. There are two representations of FM in the ascending auditory pathway: a spectral representation, holding the instantaneous frequency of the stimuli, and a sweep representation, consisting of neurons that respond selectively to FM direction. To date, computational models have used feedforward mechanisms to explain FM encoding. However, neuroanatomy shows that there are massive feedback projections in the auditory pathway. Here, we found that a classical FM-sweep perceptual effect, the sweep pitch shift, cannot be explained by standard feedforward processing models. We hypothesised that the sweep pitch shift is caused by a predictive feedback mechanism. To test this hypothesis, we developed a novel model of FM encoding incorporating a predictive interaction between the sweep and the spectral representations. The model was designed to encode sweeps of the duration, modulation rate, and modulation shape of formant transitions. It fully accounted for experimental data that we acquired in a perceptual experiment with human participants, as well as previously published experimental results. We also designed a new class of stimuli for a second perceptual experiment to further validate the model. Combined, our results indicate that predictive interaction between the frequency-encoding and direction-encoding neural representations plays an important role in the neural processing of FM. In the brain, this mechanism is likely to occur at early stages of the processing hierarchy.

9.
Cortical processing associated with orofacial somatosensory function in speech has received limited experimental attention due to the difficulty of providing precise and controlled stimulation. This article introduces a technique for recording somatosensory event-related potentials (ERP) that uses a novel mechanical stimulation method involving skin deformation using a robotic device. Controlled deformation of the facial skin is used to modulate kinesthetic inputs through excitation of cutaneous mechanoreceptors. By combining somatosensory stimulation with electroencephalographic recording, somatosensory evoked responses can be successfully measured at the level of the cortex. Somatosensory stimulation can be combined with the stimulation of other sensory modalities to assess multisensory interactions. For speech, orofacial stimulation is combined with speech sound stimulation to assess the contribution of multisensory processing, including the effects of timing differences. The ability to precisely control orofacial somatosensory stimulation during speech perception and speech production with ERP recording is an important tool that provides new insight into the neural organization and neural representations for speech.

10.
We describe two design strategies that could substantially improve the performance of speech enhancement systems. Results from a preliminary study of pulse recovery are presented to illustrate the potential benefits of such strategies. The first strategy is a direct application of a non-linear, adaptive signal processing approach for recovery of speech in noise. The second strategy optimizes performance by maximizing the enhancement system's ability to evoke target speech percepts. This approach may lead to better performance because the design is optimized on a measure directly related to the ultimate goal of speech enhancement: accurate communication of the speech percept. In both systems, recently developed ‘neural network’ learning algorithms can be used to determine appropriate parameters for enhancement processing.  相似文献   

11.
Passive membrane properties of neurons, characterized by a linear voltage response to constant current stimulation, were investigated by using a system-model approach. This approach uses the derived expression for the input impedance of a network that simulates the passive properties of neurons to correlate measured intracellular recordings with the response of network models. In this study, the input impedances of different network configurations and of dentate granule neurons were derived as functions of the network elements and were validated with computer simulations. The parameters of the system model, which are the values of the network elements, were estimated using an optimization strategy. The system model provides better estimation of the network elements than the previously described signal model, due to its explicit nature; in contrast, the signal model is an implicit function of the network elements and requires intermediate steps to estimate some of the passive properties.
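The simplest input-impedance expression of this kind is that of a single parallel RC compartment, the textbook model of a passive membrane: Z(w) = R / (1 + j*w*R*C). The sketch below illustrates that formula only; the paper's networks and the granule-cell parameters are more elaborate, and the R and C values used here are illustrative.

```python
import math
import numpy as np

def parallel_rc_impedance(freq_hz, R, C):
    """Input impedance of a parallel RC circuit (passive membrane model):
    Z(w) = R / (1 + 1j*w*R*C), with w = 2*pi*f."""
    w = 2 * np.pi * np.asarray(freq_hz, dtype=float)
    return R / (1 + 1j * w * R * C)
```

At DC the magnitude equals the input resistance R; at the corner frequency f_c = 1 / (2*pi*R*C) it has fallen to R / sqrt(2), which is the kind of frequency-domain signature fitted to intracellular recordings.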

12.
This paper discusses modeling and automatic feedback control of (postural and rest) tremor for adaptive-control-methodology-based estimation of deep brain stimulation (DBS) parameters. The simplest linear oscillator-based tremor model, between stimulation amplitude and tremor, is investigated by utilizing input-output knowledge. Further, a nonlinear generalization of the oscillator-based tremor model, useful for derivation of a control strategy involving incorporation of parametric-bound knowledge, is provided. Using the Lyapunov method, a robust adaptive output feedback control law, based on measurement of the tremor signal from the fingers of a patient, is formulated to estimate the stimulation amplitude required to control the tremor. By means of the proposed control strategy, an algorithm is developed for estimation of DBS parameters such as amplitude, frequency and pulse width, which provides a framework for development of an automatic clinical device for control of motor symptoms. The DBS parameter estimation results for the proposed control scheme are verified through numerical simulations.
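The linear oscillator-based tremor model can be illustrated with a toy simulation: a lightly self-excited second-order oscillator stands in for sustained tremor, and a simple velocity-feedback "stimulation" term damps it. This is only a sketch of the modeling idea; the paper's Lyapunov-based adaptive controller and all parameter values here (tremor frequency, damping, gain) are not taken from the source.

```python
import numpy as np

def simulate_tremor(k_feedback, T=10.0, dt=1e-3, omega=2 * np.pi * 5.0):
    """Semi-implicit Euler simulation of x'' + 2*zeta*omega*x' + omega^2*x = -u
    with u = k_feedback * x' as an illustrative output-feedback stimulation law.
    Returns the tremor amplitude over the final second."""
    zeta = -0.01                 # slight negative damping: self-sustained tremor
    x, v = 1.0, 0.0
    amps = []
    for _ in range(int(T / dt)):
        u = k_feedback * v
        a = -2 * zeta * omega * v - omega ** 2 * x - u
        v += a * dt              # update velocity first (symplectic Euler)
        x += v * dt
        amps.append(abs(x))
    return max(amps[-1000:])
```

With the feedback off the 5 Hz oscillation grows; with feedback on, the added damping drives the tremor amplitude toward zero, mimicking the intended effect of the estimated stimulation amplitude.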

13.
Noise-vocoded (NV) speech is often regarded as conveying phonetic information primarily through temporal-envelope cues rather than spectral cues. However, listeners may infer the formant frequencies in the vocal-tract output, a key source of phonetic detail, from across-band differences in amplitude when speech is processed through a small number of channels. The potential utility of this spectral information was assessed for NV speech created by filtering sentences into six frequency bands, and using the amplitude envelope of each band (≤30 Hz) to modulate a matched noise-band carrier (N). Bands were paired, corresponding to F1 (≈N1 + N2), F2 (≈N3 + N4) and the higher formants (F3' ≈ N5 + N6), such that the frequency contour of each formant was implied by variations in relative amplitude between the bands within the corresponding pair. Three-formant analogues (F0 = 150 Hz) of the NV stimuli were synthesized using frame-by-frame reconstruction of the frequency and amplitude of each formant. These analogues were less intelligible than the NV stimuli or analogues created using contours extracted from spectrograms of the original sentences, but more intelligible than when the frequency contours were replaced with constant (mean) values. Across-band comparisons of amplitude envelopes in NV speech can provide phonetically important information about the frequency contours of the underlying formants.
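A single channel of the vocoding scheme described above can be sketched as: bandpass-filter the signal, extract a low-pass (≤30 Hz) amplitude envelope, and use the envelope to modulate a matched noise band. The sketch below uses ideal FFT filters for brevity; the band edges, envelope cutoff, and function names are illustrative, not the stimulus parameters of the study.

```python
import numpy as np

def band_envelope(x, fs, lo, hi, env_cut=30.0):
    """Ideal FFT bandpass to [lo, hi] Hz, then an amplitude envelope via
    rectification followed by an ideal low-pass at env_cut Hz."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1 / fs)
    band = np.fft.irfft(np.where((f >= lo) & (f <= hi), X, 0), n=len(x))
    E = np.fft.rfft(np.abs(band))
    env = np.fft.irfft(np.where(f <= env_cut, E, 0), n=len(x))
    return np.maximum(env, 0.0)

def noise_vocode_band(x, fs, lo, hi, rng):
    """One noise-vocoded channel: the band's envelope modulates a matched
    noise-band carrier."""
    env = band_envelope(x, fs, lo, hi)
    noise = rng.standard_normal(len(x))
    N = np.fft.rfft(noise)
    f = np.fft.rfftfreq(len(x), 1 / fs)
    carrier = np.fft.irfft(np.where((f >= lo) & (f <= hi), N, 0), n=len(x))
    return env * carrier
```

The vocoded channel discards the fine structure of the band but preserves its slow amplitude contour, which is exactly the cue whose across-band comparison the study investigates.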

14.
A speech act is a linguistic action intended by a speaker. Speech act classification is an essential part of a dialogue understanding system because the speech act of an utterance is closely tied with the user's intention in the utterance. We propose a neural network model for Korean speech act classification. In addition, we propose a method that extracts morphological features from surface utterances and selects effective ones among the morphological features. Using the feature selection method, the proposed neural network can partially increase precision and decrease training time. In the experiment, the proposed neural network showed better results than other models using comparatively high-level linguistic features. Based on the experimental result, we believe that the proposed neural network model is suitable for real field applications because it is easy to expand the neural network model into other domains. Moreover, we found that neural networks can be useful in speech act classification if we can convert surface sentences into vectors with fixed dimensions by using an effective feature selection method.

15.
The multiple-channel cochlear implant is the first sensori-neural prosthesis to effectively and safely bring electronic technology into a direct physiological relation with the central nervous system and human consciousness, and to give speech perception to severely-profoundly deaf people and spoken language to children. Research showed that the place and temporal coding of sound frequencies could be partly replicated by multiple-channel stimulation of the auditory nerve. This required safety studies on how to prevent the effects to the cochlea of trauma, electrical stimuli, biomaterials and middle ear infection. The mechanical properties of an array and mode of stimulation for the place coding of speech frequencies were determined. A fully implantable receiver-stimulator was developed, as well as the procedures for the clinical assessment of deaf people, and the surgical placement of the device. The perception of electrically coded sounds was determined, and a speech processing strategy discovered that enabled late-deafened adults to comprehend running speech. The brain processing systems for patterns of electrical stimuli reproducing speech were elucidated. The research was developed industrially, and improvements in speech processing made through presenting additional speech frequencies by place coding. Finally, the importance of the multiple-channel cochlear implant for early deafened children was established.

16.
Cochlear implant speech processors stimulate the auditory nerve by delivering amplitude-modulated electrical pulse trains to intracochlear electrodes. Studying how auditory nerve cells encode modulation information is of fundamental importance, therefore, to understanding cochlear implant function and improving speech perception in cochlear implant users. In this paper, we analyze simulated responses of the auditory nerve to amplitude-modulated cochlear implant stimuli using a point process model. First, we quantify the information encoded in the spike trains by testing an ideal observer’s ability to detect amplitude modulation in a two-alternative forced-choice task. We vary the amount of information available to the observer to probe how spike timing and averaged firing rate encode modulation. Second, we construct a neural decoding method that predicts several qualitative trends observed in psychophysical tests of amplitude modulation detection in cochlear implant listeners. We find that modulation information is primarily available in the sequence of spike times. The performance of an ideal observer, however, is inconsistent with observed trends in psychophysical data. Using a neural decoding method that jitters spike times to degrade its temporal resolution and then computes a common measure of phase locking from spike trains of a heterogeneous population of model nerve cells, we predict the correct qualitative dependence of modulation detection thresholds on modulation frequency and stimulus level. The decoder does not predict the observed loss of modulation sensitivity at high carrier pulse rates, but this framework can be applied to future models that better represent auditory nerve responses to high carrier pulse rate stimuli. The supplemental material of this article contains the article’s data in an active, re-usable format.
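A common measure of phase locking of the kind the decoder computes is the vector strength: project each spike onto the unit circle at its phase within the modulation cycle and take the magnitude of the mean. The implementation below is that standard measure, not the paper's full decoding pipeline (which also jitters spike times and pools a heterogeneous population).

```python
import numpy as np

def vector_strength(spike_times, mod_freq):
    """Phase locking of spikes to a modulation frequency:
    VS = | mean over spikes of exp(2*pi*1j*f*t) |; 1 = perfect locking,
    ~0 = spikes uniformly distributed over the modulation cycle."""
    phases = 2 * np.pi * mod_freq * np.asarray(spike_times, dtype=float)
    return np.abs(np.mean(np.exp(1j * phases)))
```

Spikes locked to one phase of the cycle give VS near 1, while spike times unrelated to the modulation give VS near 0, which is why degrading temporal resolution (jittering) lowers the measured modulation sensitivity.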

17.
Formants are important phonetic elements of human speech that are also used by humans and non-human mammals to assess the body size of potential mates and rivals. As a consequence, it has been suggested that formant perception, which is crucial for speech perception, may have evolved through sexual selection. Somewhat surprisingly, though, no previous studies have examined whether sexes differ in their ability to use formants for size evaluation. Here, we investigated whether men and women differ in their ability to use the formant frequency spacing of synthetic vocal stimuli to make auditory size judgements over a wide range of fundamental frequencies (the main determinant of vocal pitch). Our results reveal that men are significantly better than women at comparing the apparent size of stimuli, and that lower pitch improves the ability of both men and women to perform these acoustic size judgements. These findings constitute the first demonstration of a sex difference in formant perception, and lend support to the idea that acoustic size normalization, a crucial prerequisite for speech perception, may have been sexually selected through male competition. We also provide the first evidence that vocalizations with relatively low pitch improve the perception of size-related formant information.
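The acoustic link between formant spacing and body size follows from the uniform-tube approximation: adjacent formants of a tube of length L are spaced dF = c / (2 * L), so a listener can invert the spacing to estimate an apparent vocal-tract length. The helper below is that standard relation, offered as an illustration rather than the analysis used in the study.

```python
def apparent_vtl(formants_hz, c=350.0):
    """Apparent vocal-tract length (m) from mean formant spacing:
    for a uniform tube, dF = c / (2 * L), hence L = c / (2 * dF)."""
    spacings = [f2 - f1 for f1, f2 in zip(formants_hz, formants_hz[1:])]
    mean_df = sum(spacings) / len(spacings)
    return c / (2.0 * mean_df)

# Formants at 500, 1500, 2500 Hz (spacing 1000 Hz) imply L = 0.175 m.
```

Narrower formant spacing thus signals a longer vocal tract, and hence a larger vocaliser, which is the size cue whose perception the study compares between sexes.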

18.
In this work a new strategy for automatic detection of ischemic episodes is proposed. The key points of the methodology are a new measure of ST deviation based on time–frequency analysis of the ECG, and the use of a reduced set of Hermite basis functions to characterize T wave and QRS complex morphology. Ischemia usually manifests itself in the ECG signal as ST segment deviation or as changes in QRS complex and T wave morphology, and these effects may occur simultaneously. Time–frequency methods are especially well suited to detecting small transient characteristics hidden in the ECG, such as ST segment alterations, and a Wigner–Ville transform-based approach is proposed to estimate the ST shift. To characterize alterations in T wave and QRS morphology, each cardiac beat is described by an expansion in Hermite functions, which proved suitable for capturing the most relevant morphologic characteristics of the signal. A lead-dependent neural network classifier takes as inputs the ST segment deviation and the Hermite expansion coefficients. The ability of the proposed method to detect ischemic episodes is evaluated using the European Society of Cardiology ST–T database. A sensitivity of 96.7% and a positive predictivity of 96.2% reveal the capacity of the proposed strategy to identify ischemic episodes.
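The Hermite-expansion idea can be sketched directly: build the orthonormal Hermite functions phi_n(t) = H_n(t) * exp(-t^2/2) / norm from the polynomial recurrence, and project a (centred, time-scaled) beat onto them to obtain a compact morphology descriptor. The number of functions and the time grid below are illustrative, not the paper's settings.

```python
import math
import numpy as np

def hermite_functions(n_funcs, t):
    """Orthonormal Hermite functions phi_n(t) = H_n(t)*exp(-t^2/2)/norm,
    using the physicists' recurrence H_{n+1} = 2t*H_n - 2n*H_{n-1}."""
    H = [np.ones_like(t), 2.0 * t]
    for n in range(1, n_funcs):
        H.append(2.0 * t * H[n] - 2.0 * n * H[n - 1])
    phi = []
    for n in range(n_funcs):
        norm = math.sqrt(2.0 ** n * math.factorial(n) * math.sqrt(math.pi))
        phi.append(H[n] * np.exp(-t ** 2 / 2.0) / norm)
    return np.array(phi)

def hermite_coefficients(beat, t, n_funcs=6):
    """Project a beat (sampled on grid t) onto the Hermite basis; the few
    coefficients summarize QRS/T-wave morphology compactly."""
    phi = hermite_functions(n_funcs, t)
    dt = t[1] - t[0]
    return phi @ beat * dt
```

A smooth, roughly Gaussian beat is captured almost entirely by the first coefficient, which is why a reduced set of Hermite functions suffices as a classifier input.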

19.

Background

Autism is a neurodevelopmental disorder characterized by a specific triad of symptoms: abnormalities in social interaction, abnormalities in communication, and restricted activities and interests. While verbal autistic subjects may show a correct mastery of the formal aspects of speech, they have difficulties with prosody (the music of speech), leading to communication disorders. A few behavioural studies have revealed a prosodic impairment in children with autism, and among the few fMRI studies assessing the neural network involved in language, none has specifically studied prosodic speech. The aim of the present study was to characterize specific prosodic components, such as linguistic prosody (intonation, rhythm and emphasis) and emotional prosody, and to correlate them with the neural network underlying them.

Methodology/Principal Findings

We used a behavioural test (Profiling Elements of the Prosodic System, PEPS) and fMRI to characterize prosodic deficits and investigate the neural network underlying prosodic processing. Results revealed a link between perceptual and productive prosodic deficits for some prosodic components (rhythm, emphasis and affect) in high-functioning autism (HFA), and showed that the neural network involved in prosodic speech perception exhibits abnormal activation in the left SMG compared to controls (activation positively correlated with intonation and emphasis), together with an absence of deactivation patterns in regions involved in the default mode.

Conclusions/Significance

These prosodic impairments may result not only from abnormal activation patterns but also from an inability to adequately inhibit the default network; both mechanisms have to be considered to explain decreased task performance in high-functioning autism.

20.
Assessing voice quality objectively is of great relevance to clinicians, both for quantifying surgical or pharmacological effectiveness and for detecting and classifying voice pathology. A large number of objective indexes have been proposed in the literature and implemented in commercially available software tools. However, clinicians commonly resort to a small subset of these indexes, since many are difficult to set up or interpret. This paper presents a new user-friendly voice analysis tool named BioVoice. At present, BioVoice evaluates a few but important indexes, devoting great effort to their robust and automatic estimation, although extensions are foreseeable. Specifically, fundamental frequency, along with irregularity (jitter, relative average perturbation), noise, and formant frequencies, is tracked on the voiced parts of the signal only, and mean and standard deviation values are calculated and displayed. This high-resolution estimation procedure is further strengthened by an adaptive estimation of the optimal length of the analysis frames, linked to varying signal characteristics. Moreover, BioVoice automatically analyzes any kind of voice signal as far as F0 range and sampling frequency are concerned, with no manual setting required. The tool is thus usable by non-experts from different scientific fields, thanks to its simple interface. Here, the proposed approach is applied to patients who underwent micro-laryngoscopic direct exeresis to remove cysts and polyps. Pre- and post-surgical indexes were estimated with BioVoice and compared with the output of one of the most common commercial software tools, both to assess voice-quality recovery and to evaluate the new method's capabilities.
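Two of the irregularity indexes mentioned above, jitter and relative average perturbation (RAP), have standard definitions over a sequence of pitch periods. The helpers below implement those common definitions as an illustration; they are not BioVoice's implementation.

```python
def jitter_local(periods):
    """Local jitter: mean absolute difference between consecutive pitch
    periods, relative to the mean period."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def rap(periods):
    """Relative average perturbation: mean absolute deviation of each period
    from its 3-period moving average, relative to the mean period."""
    devs = [abs(periods[i] - (periods[i - 1] + periods[i] + periods[i + 1]) / 3.0)
            for i in range(1, len(periods) - 1)]
    return (sum(devs) / len(devs)) / (sum(periods) / len(periods))
```

A perfectly regular voice yields zero for both indexes; cycle-to-cycle perturbation raises them, with RAP smoothed by the 3-period average and therefore smaller than local jitter for alternating perturbations.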

