Similar Documents
Found 20 similar documents (search time: 15 ms)
1.
Numerous speech processing techniques have been applied to assist hearing-impaired subjects with extreme high-frequency hearing losses who can be helped only to a limited degree with conventional hearing aids. The results of providing this class of deaf subjects with a speech encoding hearing aid, which is able to reproduce intelligible speech for their particular needs, have generally been disappointing. There are at least four problems related to bandwidth compression applied to the voiced portion of speech: (1) the problem of pitch extraction in real time; (2) pitch extraction under realistic listening conditions, i.e. when competing speech and noise sources are present; (3) an insufficient data base for successful compression of voiced speech; and (4) the introduction of undesirable spectral energies in the bandwidth-compressed signal, due to the compression process itself. Experiments seem to indicate that voiced speech segments bandwidth-limited to f = 1000 Hz, even at the cost of losing the higher formant frequencies, are in most instances superior in intelligibility to bandwidth-compressed voiced speech segments of the same bandwidth, even if pitch can be extracted with no error. Given the added complexity of real-time pitch extraction, which has to function in actual listening conditions, it is doubtful that a speech encoding hearing aid based on bandwidth compression of the voiced portion of speech could be successfully implemented. However, if bandwidth compression is applied to the unvoiced portions of speech only, the above limitations can be overcome (1).

2.
It is known from the literature that (1) sounds with complex spectral composition are assessed by summing the partial outputs of the spectral channels; (2) electrical stimuli used in cochlear implant systems bring about the perception of a frequency band; and (3) removal of different parts of the auditory spectrum significantly affects phrase intelligibility. The level of acoustic pressure (AP) at a comfortable loudness level and phrase intelligibility after comb filtering of a speech signal were measured in normally hearing subjects. Using a software program for spectral transformation of the speech signal, the phrase spectrum was divided into frequency bands of various widths and only the odd-numbered bands were summed. In three series, the width of the odd bands was 50, 100, or 150 Hz, and the width of the even bands was varied. The filter period was equal to the sum of the even- and odd-band widths. For a fixed filter period, the acoustic pressure of the output signal had to be increased to bring comb-filtered speech to the comfortable loudness level, and the narrower the test bands, the larger the required AP increase. For a fixed test-band width, the required increase grew with the filter period. The redundancy of the speech signal with respect to its spectral content can equal or even exceed 97.5%.
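A minimal sketch of this kind of comb filtering (assuming NumPy/SciPy and a mono 16-kHz signal; the comb_filter helper, STFT parameters, and default band widths are illustrative, not the study's exact processing):

    # Keep only the odd-numbered frequency bands of a speech signal,
    # zeroing the even-numbered ones, via an STFT mask.
    import numpy as np
    from scipy.signal import stft, istft

    def comb_filter(x, fs, odd_bw=50.0, even_bw=150.0):
        """Retain odd-numbered spectral bands of width `odd_bw` Hz,
        separated by zeroed even bands of width `even_bw` Hz."""
        f, t, Z = stft(x, fs, nperseg=1024)
        period = odd_bw + even_bw            # filter period in Hz
        keep = (f % period) < odd_bw         # True inside odd-numbered bands
        Z *= keep[:, None]
        _, y = istft(Z, fs, nperseg=1024)
        return y[:len(x)]

    fs = 16000
    x = np.random.randn(2 * fs)              # stand-in for a recorded phrase
    y = comb_filter(x, fs)

Raising the output AP to the comfortable loudness level, as the study did, would be a separate gain stage applied after this filtering.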

3.
Twenty-three children with Down's syndrome, aged between 3.7 and 17.5 years, underwent partial glossectomy for improvement of cosmetic appearance. Improved speech was also expected. Preoperative and postoperative audiotaped samples of spoken words and connected speech on a standardized articulation test were rated by three lay and three expert listeners on a five-point intelligibility scale. Five subjects were eliminated from both tasks and another four from connected-speech testing because of inability to complete the experimental tasks. Statistical analyses of ratings for words in 18 subjects and connected speech in 14 of them revealed no significant difference between preoperative and postoperative acoustic speech intelligibility. The findings suggest that a wedge-excision partial glossectomy in children with Down's syndrome does not result in significant improvement in acoustic speech intelligibility; in some patients, however, there may be an aesthetic improvement during speech.

4.
We systematically determined which spectrotemporal modulations in speech are necessary for comprehension by human listeners. Speech comprehension has been shown to be robust to spectral and temporal degradations, but the specific relevance of particular degradations is arguable due to the complexity of the joint spectral and temporal information in the speech signal. We applied a novel modulation filtering technique to recorded sentences to restrict acoustic information quantitatively and to obtain a joint spectrotemporal modulation transfer function for speech comprehension, the speech MTF. For American English, the speech MTF showed the criticality of low modulation frequencies in both time and frequency. Comprehension was significantly impaired when temporal modulations <12 Hz or spectral modulations <4 cycles/kHz were removed. More specifically, the MTF was bandpass in temporal modulations and low-pass in spectral modulations: temporal modulations from 1 to 7 Hz and spectral modulations <1 cycle/kHz were the most important. We evaluated the importance of spectrotemporal modulations for vocal gender identification and found a different region of interest: removing spectral modulations between 3 and 7 cycles/kHz significantly increased gender misidentifications of female speakers. The determination of the speech MTF furnishes an additional method for producing speech signals with reduced bandwidth but high intelligibility. Such compression could be used for audio applications such as file compression or noise removal and for clinical applications such as signal processing for cochlear implants.
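One common way to approximate this kind of modulation filtering (a sketch under assumptions, not the authors' exact pipeline): take the 2-D Fourier transform of a log-spectrogram, zero modulations outside chosen temporal (Hz) and spectral (cycles/kHz) cutoffs, and invert. The filter_modulations helper and STFT parameters are illustrative; resynthesis back to a waveform is a separate step omitted here.

    # Low-pass the joint spectrotemporal modulation spectrum of a signal.
    import numpy as np
    from scipy.signal import stft

    def filter_modulations(x, fs, wt_max=12.0, wf_max=4.0):
        """Keep temporal modulations up to `wt_max` Hz and spectral
        modulations up to `wf_max` cycles/kHz in the log-spectrogram."""
        f, t, Z = stft(x, fs, nperseg=512, noverlap=384)
        S = np.log(np.abs(Z) + 1e-9)                       # log-spectrogram
        M = np.fft.fft2(S)                                 # modulation domain
        wf = np.fft.fftfreq(S.shape[0], d=(f[1] - f[0]) / 1000.0)  # cyc/kHz
        wt = np.fft.fftfreq(S.shape[1], d=t[1] - t[0])             # Hz
        keep = (np.abs(wf)[:, None] <= wf_max) & (np.abs(wt)[None, :] <= wt_max)
        return np.real(np.fft.ifft2(M * keep))             # filtered log-spec

    fs = 16000
    x = np.random.randn(fs)                  # stand-in for a recorded sentence
    S_filtered = filter_modulations(x, fs)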

5.
Luo H, Poeppel D. Neuron. 2007;54(6):1001-1010.
How natural speech is represented in the auditory cortex constitutes a major challenge for cognitive neuroscience. Although many single-unit and neuroimaging studies have yielded valuable insights about the processing of speech and matched complex sounds, the mechanisms underlying the analysis of speech dynamics in human auditory cortex remain largely unknown. Here, we show that the phase pattern of theta band (4-8 Hz) responses recorded from human auditory cortex with magnetoencephalography (MEG) reliably tracks and discriminates spoken sentences and that this discrimination ability is correlated with speech intelligibility. The findings suggest that an approximately 200 ms temporal window (period of theta oscillation) segments the incoming speech signal, resetting and sliding to track speech dynamics. This hypothesized mechanism for cortical speech analysis is based on the stimulus-induced modulation of inherent cortical rhythms and provides further evidence implicating the syllable as a computational primitive for the representation of spoken language.
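The theta-phase tracking described here can be illustrated with bandpass-plus-Hilbert phase extraction and an inter-trial phase coherence (ITPC) measure; this is a simplified stand-in for the paper's phase-pattern classification, and the helper names, sampling rate, and filter order below are all assumptions.

    # Extract 4-8 Hz phase and quantify how reliably it repeats across
    # presentations of the same sentence (inter-trial phase coherence).
    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    def theta_phase(sig, fs, lo=4.0, hi=8.0):
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        return np.angle(hilbert(filtfilt(b, a, sig)))

    def itpc(trials, fs):
        phases = np.array([theta_phase(tr, fs) for tr in trials])
        return np.abs(np.mean(np.exp(1j * phases), axis=0))  # 0..1 per sample

    fs = 200.0
    trials = np.random.randn(10, int(3 * fs))  # 10 repeats of a 3-s sentence
    reliability = itpc(trials, fs)             # near 1 where phase is locked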

6.
Elucidating the structure and function of joint vocal displays (e.g. duet, chorus) recorded with a conventional microphone has proved difficult in some animals owing to the complex acoustic properties of the combined signal, a problem reminiscent of multi-speaker conversations in humans. Towards this goal, we set out to simultaneously compare air-transmitted (AT) with radio-transmitted (RT) vocalizations in one pair of humans and one pair of captive Bolivian grey titi monkeys (Plecturocebus donacophilus), all equipped with an accelerometer – or vibration transducer – closely apposed to the larynx. First, we observed no crosstalk between the two radio transmitters when subjects produced vocalizations at the same time close to each other. Second, compared with AT acoustic recordings, sound segmentation and pitch tracking of the RT signal were more accurate, particularly in a noisy and reverberating environment. Third, RT signals were less noisy than AT signals and displayed more stable amplitude regardless of the distance, orientation and environment of the animal. The microphone outperformed the accelerometer with respect to sound spectral bandwidth and speech intelligibility: the sounds of RT speech were more attenuated and dampened as compared to AT speech. Importantly, we show that vocal telemetry allows reliable separation of the subjects’ voices during production of joint vocalizations, which has great potential for future applications of this technique with free-ranging animals.

7.
This study tested the hypothesis that the previously reported advantage of musicians over non-musicians in understanding speech in noise arises from more efficient or robust coding of periodic voiced speech, particularly in fluctuating backgrounds. Speech intelligibility was measured in listeners with extensive musical training, and in those with very little musical training or experience, using normal (voiced) or whispered (unvoiced) grammatically correct nonsense sentences in noise that was spectrally shaped to match the long-term spectrum of the speech, and was either continuous or gated with a 16-Hz square wave. Performance was also measured in clinical speech-in-noise tests and in pitch discrimination. Musicians exhibited enhanced pitch discrimination, as expected. However, no systematic or statistically significant advantage for musicians over non-musicians was found in understanding either voiced or whispered sentences in either continuous or gated noise. Musicians also showed no statistically significant advantage in the clinical speech-in-noise tests. Overall, the results provide no evidence for a significant difference between young adult musicians and non-musicians in their ability to understand speech in noise.
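A sketch of the masker construction implied above (noise shaped to the long-term speech spectrum, optionally gated at 16 Hz); the randomized-phase FFT method is one common choice, and the helper names and parameters are assumptions:

    # Build noise with the long-term spectrum of a speech corpus, then
    # gate it on/off with a 16-Hz square wave (50% duty cycle).
    import numpy as np
    from scipy.signal import square

    def speech_shaped_noise(speech, fs):
        mag = np.abs(np.fft.rfft(speech))                   # target spectrum
        phase = np.exp(2j * np.pi * np.random.rand(mag.size))
        return np.fft.irfft(mag * phase, n=len(speech))

    def gate(noise, fs, rate=16.0):
        t = np.arange(len(noise)) / fs
        return noise * (square(2 * np.pi * rate * t) > 0)

    fs = 16000
    speech = np.random.randn(2 * fs)      # stand-in for concatenated sentences
    masker = gate(speech_shaped_noise(speech, fs), fs)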

8.
The frequency content of muscular sound (MS), detected by placing a contact sensor transducer over the belly of the biceps brachii during 10 isometric contractions of 4 s each [10-100% of maximal voluntary contraction (MVC)] in seven sedentary men, was analyzed by the maximum entropy spectral estimation and the fast Fourier transform methods. With increasing %MVC, the power spectrum of the MS enlarges and tends to become multimodal beyond 30% MVC. Independent of the method, the mean frequency is approximately 11 Hz at the lower tasks, and it then increases up to 15 Hz at 80% MVC and to 22 Hz at 100% MVC. As the effort increases, the relative power in the 15- to 45-Hz bandwidth (the firing-rate range of motor units with fast-twitch fibers) rises from 20% to 55% of the power in the 6- to 45-Hz bandwidth (the firing-rate range of motor units with slow- and fast-twitch fibers). Our results, obtained by the two different modeling approaches, confirm the reliability of the sound signal. Moreover, it appears that the motor unit activation pattern can be retrieved from the MS.
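The two spectral summaries used here (power-weighted mean frequency, and the 15-45 Hz share of 6-45 Hz power) are straightforward to compute; a minimal sketch using Welch's method rather than the maximum-entropy estimator from the study, with an assumed sampling rate:

    # Mean frequency and fast-twitch band-power ratio of a muscle-sound record.
    import numpy as np
    from scipy.signal import welch

    def mean_frequency(x, fs):
        f, p = welch(x, fs, nperseg=1024)
        return np.sum(f * p) / np.sum(p)          # power-weighted mean (Hz)

    def band_power_ratio(x, fs, num=(15.0, 45.0), den=(6.0, 45.0)):
        f, p = welch(x, fs, nperseg=1024)
        num_p = p[(f >= num[0]) & (f <= num[1])].sum()
        den_p = p[(f >= den[0]) & (f <= den[1])].sum()
        return num_p / den_p

    fs = 1000
    ms = np.random.randn(4 * fs)              # stand-in for a 4-s contraction
    print(mean_frequency(ms, fs), band_power_ratio(ms, fs))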

9.
Reverberation is known to reduce the temporal envelope modulations present in the signal and to affect the shape of the modulation spectrum. A non-intrusive intelligibility measure for reverberant speech is proposed, motivated by the fact that the area of the modulation spectrum decreases with increasing reverberation. The proposed measure is based on the average modulation area computed across four acoustic frequency bands spanning the signal bandwidth. High correlations (r = 0.98) were observed with sentence intelligibility scores obtained by cochlear implant listeners. The proposed measure outperformed other measures, including an intrusive speech-transmission-index-based measure.
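A minimal sketch of the modulation-area idea (envelope per acoustic band, modulation spectrum of that envelope, area under it, averaged across four bands); the band edges, modulation-frequency limit, and estimator are assumptions, not the published parameters:

    # Average area of the envelope modulation spectrum across four bands.
    import numpy as np
    from scipy.signal import butter, sosfilt, hilbert, welch

    def modulation_area(x, fs, n_bands=4, f_lo=100.0, f_hi=6000.0, fm_max=32.0):
        edges = np.geomspace(f_lo, f_hi, n_bands + 1)      # log-spaced bands
        areas = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
            env = np.abs(hilbert(sosfilt(sos, x)))         # temporal envelope
            fm, pm = welch(env, fs, nperseg=4096)
            m = fm <= fm_max
            areas.append(np.trapz(pm[m], fm[m]))           # modulation area
        return np.mean(areas)

    fs = 16000
    reverberant = np.random.randn(2 * fs)                  # stand-in signal
    print(modulation_area(reverberant, fs))                # shrinks with reverb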

10.
1. Pure tone displacement sensitivity and bandwidth were measured from the saccule of the ear in two anabantid species (Trichogaster trichopterus and Helostoma temincki) using microphonic potentials with a 1 microV RMS threshold for the second harmonic of the stimulus frequency. 2. Saccular microphonics were recorded in both species from 80 to 1600 Hz, with the lowest thresholds between 100 and 200 Hz. The overall microphonic response curves (sensitivity and bandwidth) of the two species were statistically similar to one another by analysis of variance, although thresholds differed statistically at 100 and 800 Hz. 3. The hair cell orientation patterns of the saccular epithelia differ in the two species. Consequently, the comparative sizes of the saccular sensory epithelium and the numbers of sensory hair cells were examined. The saccular sensory epithelium of Helostoma is about 40% larger and contains nearly 50% more hair cells than the saccular epithelium of a comparably sized Trichogaster. 4. An extracranial air bubble, located in the suprabranchial chamber, is found in both species. The bubble has direct access to the saccular chamber in Trichogaster through a foramen which is absent in Helostoma. Despite the difference in morphology and the larger number of sensory hair cells in Helostoma, hearing sensitivity and bandwidth are similar in the two species. Although the structural differences in the auditory periphery do not affect pure tone sensitivity and bandwidth, other aspects of fish hearing, such as frequency discrimination, discrimination of signals in the presence of noise, and/or sound localization ability, may be affected by these structural differences.

11.
The effect of stimulation frequency on twitch force potentiation was examined in the adductor pollicis muscle of ten normal subjects. The ulnar nerve was supramaximally stimulated at the wrist, and isometric twitch force was measured from a 3-Hz train lasting 1 s. Test stimulation frequencies of 5, 10, 20, 25, 30, 40, 50 and 100 Hz were applied for 5 s each in random order (5 min apart), and the twitches (3 Hz) were applied immediately before and after (1 s) the test frequency and at intervals up to 5 min afterwards (10 s, and 1, 2 and 5 min). Poststimulation twitches were expressed as a percentage of the prestimulation twitch. Low-frequency fatigue was not induced by the protocol, since the 20:50 Hz ratio did not alter within each session. The degree of twitch potentiation was frequency dependent, with potentiation increasing up to 50 Hz [mean 173 (SD 16)%], but the effect was markedly less at 100 Hz [mean 133 (SD 25)%, P < 0.01] for all subjects. The reduced potentiation at 100 Hz may have occurred due to high-frequency fatigue produced by the 100-Hz test stimulation train. The optimal frequency of those examined in the experimental group was 50 Hz, but this produced maximal potentiation in only six of the ten subjects, and 100 Hz always produced less potentiation. These findings have implications for electrical stimulation of muscle in the clinical setting.

12.
Speech perception is thought to be linked to speech motor production. This linkage is considered to mediate multimodal aspects of speech perception, such as audio-visual and audio-tactile integration. However, direct coupling between articulatory movement and auditory perception has been little studied. The present study reveals a clear dissociation between the effects of a listener’s own speech action and the effects of viewing another’s speech movements on the perception of auditory phonemes. We assessed the intelligibility of the syllables [pa], [ta], and [ka] when listeners silently and simultaneously articulated syllables that were congruent/incongruent with the syllables they heard. The intelligibility was compared with a condition where the listeners simultaneously watched another’s mouth producing congruent/incongruent syllables, but did not articulate. The intelligibility of [ta] and [ka] was degraded by articulating [ka] and [ta], respectively, which are associated with the same primary articulator (the tongue) as the heard syllables, but it was not affected by articulating [pa], which is associated with a different primary articulator (the lips). In contrast, the intelligibility of [ta] and [ka] was degraded by watching the production of [pa]. These results indicate that the articulatory-induced distortion of speech perception occurs in an articulator-specific manner, while the visually induced distortion does not. The articulator-specific nature of the auditory-motor interaction in speech perception suggests that speech motor processing directly contributes to our ability to hear speech.

13.

Objectives

(1) To report the speech perception and intelligibility results of Mandarin-speaking patients with large vestibular aqueduct syndrome (LVAS) after cochlear implantation (CI); (2) to compare their performance with a group of CI users without LVAS; (3) to understand the effects of age at implantation and duration of implant use on the CI outcomes. The obtained data may be used to guide decisions about CI candidacy and surgical timing.

Methods

Forty-two patients with LVAS participating in this study were divided into two groups: the early group received CI before 5 years of age and the late group after 5. Open-set speech perception tests (on Mandarin tones, words and sentences) were administered one year after implantation and at the most recent follow-up visit. Categories of auditory perception (CAP) and Speech Intelligibility Rating (SIR) scale scores were also obtained.

Results

The patients with LVAS with more than 5 years of implant use (18 cases) achieved a mean score higher than 80% on the most recent speech perception tests and reached the highest level on the CAP/SIR scales. The early group developed speech perception and intelligibility steadily over time, while the late group had a rapid improvement during the first year after implantation. The two groups, regardless of their age at implantation, reached a similar performance level at the most recent follow-up visit.

Conclusion

High levels of speech performance are reached after 5 years of implant use in patients with LVAS. These patients do not necessarily need to wait until their hearing thresholds exceed 90 dB HL or their PB word scores fall below 40% to receive CI; implantation can occur “earlier,” when their speech perception and/or speech intelligibility have not yet reached the performance levels suggested in this study.

14.
The purpose of the present study was to determine whether different cues to increase loudness in speech result in different internal targets (or goals) for respiratory movement and whether the neural control of the respiratory system is sensitive to changes in the speaker's internal loudness target. This study examined respiratory mechanisms during speech in 30 young adults at comfortable and increased loudness levels. Increased loudness was elicited using three methods: asking subjects to target a specific sound pressure level, asking subjects to speak twice as loud as comfortable, and asking subjects to speak in noise. All three loud conditions resulted in similar increases in sound pressure level. However, the respiratory mechanisms used to support the increase in loudness differed significantly depending on how the louder speech was elicited. When asked to target a particular sound pressure level, subjects increased the lung volume at which speech was initiated to take advantage of higher recoil pressures. When asked to speak twice as loud as comfortable, subjects increased expiratory muscle tension, for the most part, to increase the pressure for speech. However, in the most natural of the elicitation methods, speaking in noise, subjects used a combined respiratory approach, relying on both increased recoil pressures and increased expiratory muscle tension. In noise, an additional target, possibly improving the intelligibility of speech, was reflected in the slowing of speech rate and in larger volume excursions even though the speakers were producing the same number of syllables.

15.
Unbalanced bipolar stimulation, delivered using charge-balanced pulses, was used to produce “Phantom stimulation”: stimulation beyond the most apical contact of a cochlear implant’s electrode array. The Phantom channel was allocated audio frequencies below 300 Hz in a speech coding strategy, conveying energy some two octaves lower than the clinical strategy and hence delivering the fundamental frequency of speech and of many musical tones. A group of 12 Advanced Bionics cochlear implant recipients took part in a chronic study investigating the fitting of the Phantom strategy and speech and music perception when using Phantom. Speech in noise was evaluated immediately after fitting Phantom for the first time (Session 1) and after one month of take-home experience (Session 2). A repeated-measures analysis of variance (ANOVA) with within-subject factors strategy (Clinical, Phantom) and time (Session 1, Session 2) revealed a significant time-by-strategy interaction: Phantom showed a significant improvement in speech intelligibility after one month of use. Furthermore, a trend towards better performance with Phantom (48%) relative to F120 (37%) after one month of use failed to reach significance after type-1-error correction. Questionnaire results show a preference for Phantom when listening to music, likely driven by an improved balance between high and low frequencies.
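The analysis described above is a two-way repeated-measures ANOVA; a sketch using statsmodels' AnovaRM on a synthetic long-format table (subject count and factor levels mirror the abstract, the score values are made up):

    # Two-way repeated-measures ANOVA: strategy x session, 12 subjects.
    import numpy as np
    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    rng = np.random.default_rng(1)
    rows = [(subj, strat, sess, rng.uniform(30, 60))
            for subj in range(12)
            for strat in ("Clinical", "Phantom")
            for sess in ("Session1", "Session2")]
    df = pd.DataFrame(rows, columns=["subject", "strategy", "session", "score"])

    # The strategy x session interaction is the effect reported above.
    res = AnovaRM(df, depvar="score", subject="subject",
                  within=["strategy", "session"]).fit()
    print(res)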

16.
Eight patients with Down syndrome, aged 9 years and 10 months to 25 years and 4 months, underwent partial glossectomy. Preoperative and postoperative videotaped samples of spoken words and connected speech were randomized and rated by two groups of listeners, only one of which knew of the surgery. Aesthetic appearance of speech, or visual acceptability of the patient while speaking, was judged from visual information only. Judgments of speech intelligibility were made from the auditory portion of the videotapes. Acceptability and intelligibility also were judged together during audiovisual presentation. Statistical analysis revealed that speech was significantly more acceptable aesthetically after surgery. No significant difference was found in speech intelligibility preoperatively and postoperatively. Ratings did not differ significantly depending on whether the rater knew of the surgery. Analysis of results obtained in various presentation modes revealed that the aesthetics of speech did not significantly affect judgment of intelligibility. Conversely, speech acceptability was greater in the presence of higher levels of intelligibility.

17.
Significant scientific and translational questions remain in auditory neuroscience surrounding the neural correlates of perception. Relating perceptual and neural data collected from humans can be useful; however, human-based neural data are typically limited to evoked far-field responses, which lack anatomical and physiological specificity. Laboratory-controlled preclinical animal models offer the advantage of comparing single-unit and evoked responses from the same animals. This ability provides opportunities to develop invaluable insight into proper interpretations of evoked responses, which benefits both basic-science studies of neural mechanisms and translational applications, e.g., diagnostic development. However, these comparisons have been limited by a disconnect between the types of spectrotemporal analyses used with single-unit spike trains and evoked responses, a disconnect that arises because these response types are fundamentally different (point-process versus continuous-valued signals) even though the responses themselves are related. Here, we describe a unifying framework to study temporal coding of complex sounds that allows spike-train and evoked-response data to be analyzed and compared using the same advanced signal-processing techniques. The framework uses a set of peristimulus-time histograms computed from single-unit spike trains in response to polarity-alternating stimuli to allow advanced spectral analyses of both slow (envelope) and rapid (temporal fine structure) response components. Demonstrated benefits include: (1) novel spectrally specific temporal-coding measures that are less confounded by distortions due to hair-cell transduction, synaptic rectification, and neural stochasticity than previous metrics, e.g., the correlogram peak-height; (2) spectrally specific analyses of spike-train modulation coding (magnitude and phase), which can be directly compared to modern perceptually based models of speech intelligibility (e.g., those that depend on modulation filter banks); and (3) superior spectral resolution in analyzing the neural representation of nonstationary sounds, such as speech and music. This unifying framework significantly expands the potential of preclinical animal models to advance our understanding of the physiological correlates of perceptual deficits in real-world listening following sensorineural hearing loss.
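The core construction (PSTHs from responses to opposite-polarity stimuli, whose sum emphasizes envelope coding and whose difference emphasizes temporal fine structure) can be sketched as follows; the bin width, repetition counts, and stand-in uniform-random spike times are assumptions:

    # Sum/difference PSTHs from polarity-alternating stimulus responses.
    import numpy as np

    def psth(spike_trains, dur, binw=1e-4):
        """Peristimulus-time histogram (spikes/s) from per-repetition
        spike-time arrays."""
        edges = np.arange(0.0, dur + binw, binw)
        counts = sum(np.histogram(st, edges)[0] for st in spike_trains)
        return counts / (len(spike_trains) * binw)

    dur, rng = 0.5, np.random.default_rng(0)
    pos = [np.sort(rng.uniform(0, dur, 50)) for _ in range(100)]  # polarity +
    neg = [np.sort(rng.uniform(0, dur, 50)) for _ in range(100)]  # polarity -

    p_pos, p_neg = psth(pos, dur), psth(neg, dur)
    env_psth = 0.5 * (p_pos + p_neg)   # sum: envelope-following component
    tfs_psth = 0.5 * (p_pos - p_neg)   # difference: fine-structure component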

18.
Sensorineural hearing loss occurs due to damage to the inner and outer hair cells of the peripheral auditory system. Hearing loss can reduce the audibility, dynamic range, and frequency and temporal resolution of the auditory system, and all of these effects are known to affect speech intelligibility. In this study, a new reference-free speech intelligibility metric is proposed using 2-D neurograms constructed from the output of a computational model of the auditory periphery. The responses of auditory-nerve fibers with a wide range of characteristic frequencies were simulated to construct the neurograms. The features of the neurograms were extracted using third-order statistics referred to as the bispectrum. The phase coupling in the neurogram bispectrum provides a unique insight into the presence (or deficit) of supra-threshold nonlinearities beyond audibility for listeners with normal hearing (or hearing loss). The speech intelligibility scores predicted by the proposed method were compared to behavioral scores for listeners with normal hearing and hearing loss, both in quiet and under noisy background conditions. The results were also compared to the performance of some existing methods. The predicted results showed a good fit with small error, suggesting that subjective scores can be estimated reliably using the proposed neural-response-based metric. The proposed metric also had a wide dynamic range, and the predicted scores were well separated as a function of hearing loss. The proposed metric successfully captures the effects of hearing loss and supra-threshold nonlinearities on speech intelligibility. This metric could be applied to evaluate the performance of various speech-processing algorithms designed for hearing aids and cochlear implants.
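A direct (segment-averaged FFT) bispectrum estimate is sketched below on a generic 1-D signal; constructing neurograms from an auditory-periphery model is beyond this sketch, and the segment and FFT sizes are illustrative:

    # Direct bispectrum estimate B(f1, f2) = E[X(f1) X(f2) X*(f1+f2)],
    # averaged over non-overlapping windowed segments.
    import numpy as np

    def bispectrum(x, nfft=128, nseg=32):
        segs = x[: nseg * nfft].reshape(nseg, nfft) * np.hanning(nfft)
        X = np.fft.fft(segs, axis=1)
        B = np.zeros((nfft // 2, nfft // 2), dtype=complex)
        for f1 in range(nfft // 2):
            for f2 in range(nfft // 2 - f1):
                B[f1, f2] = np.mean(X[:, f1] * X[:, f2] * np.conj(X[:, f1 + f2]))
        return B

    x = np.random.randn(4096)       # stand-in for one neurogram row
    B = bispectrum(x)               # quadratic phase coupling appears off-axis

Phase coupling between frequency pairs shows up as structure in the phase of B, which is the property the metric exploits.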

19.
The purpose of the study was to examine the effect of prolonged vibration on the force fluctuations during a force-matching task performed at low force levels. Fourteen young healthy men performed a submaximal force-matching task of isometric plantar flexion before and after Achilles tendon vibration (n = 8, vibration subjects) or lying without vibration (n = 6, control subjects) for 30 min. The target forces were 2.5-10% of the previbration maximal voluntary contraction force. The standard deviation of force decreased by a mean of 29 ± 20% across target forces after vibration, whereas it did not decrease significantly in control subjects (-5 ± 12%). This change was significantly greater compared with control subjects (P < 0.01 for both). Power spectral density of the force was predominantly composed of signals of low-frequency bandwidth (…)

20.
The aim of this study was to evaluate the effect of modulated microwave exposure on the human EEG of individual subjects. The experiments were carried out on four different groups of healthy volunteers. Microwave radiation at 450 MHz, modulated at 7 Hz (first group, 19 subjects), 14 and 21 Hz (second group, 13 subjects), 40 and 70 Hz (third group, 15 subjects), or 217 and 1000 Hz (fourth group, 19 subjects), was applied. The field power density at the scalp was 0.16 mW/cm². The calculated spatial peak SAR averaged over 1 g was 0.303 W/kg. Ten cycles of exposure (1 min off and 1 min on) at fixed modulation frequencies were applied. All subjects completed the experimental protocols with exposure and sham. The exposed and sham-exposed subjects were randomly assigned, and a computer also randomly assigned the succession of modulation frequencies. Our results showed that microwave exposure increased the EEG energy. Relative changes in EEG beta1 power in the P3-P4 channels were selected for evaluation of individual sensitivity. The proportion of significantly affected subjects was similar in all groups except the 1000 Hz group: in the first group, 3 subjects (16%) at 7 Hz modulation; in the second group, 4 subjects (31%) at 14 Hz and 3 subjects (23%) at 21 Hz; in the third group, 3 subjects (20%) at 40 Hz and 2 subjects (13%) at 70 Hz; and in the fourth group, 3 subjects (16%) at 217 Hz and none at 1000 Hz.
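The per-subject sensitivity measure (relative change in beta1-band power between exposure-on and exposure-off intervals) can be sketched as below; the beta1 band edges (taken here as 15-20 Hz), sampling rate, and Welch parameters are assumptions, since the abstract does not specify them:

    # Relative change in EEG beta1 power between on- and off-exposure segments.
    import numpy as np
    from scipy.signal import welch

    def band_power(x, fs, lo=15.0, hi=20.0):
        f, p = welch(x, fs, nperseg=int(2 * fs))
        m = (f >= lo) & (f <= hi)
        return np.trapz(p[m], f[m])

    fs = 250
    off = np.random.randn(60 * fs)   # stand-in 1-min off-exposure EEG (P3-P4)
    on = np.random.randn(60 * fs)    # stand-in 1-min on-exposure EEG
    rel_change = (band_power(on, fs) - band_power(off, fs)) / band_power(off, fs)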
