Similar Articles
20 similar articles retrieved.
1.
The most common complaint of older hearing impaired (OHI) listeners is difficulty understanding speech in the presence of noise. However, tests of consonant identification and sentence reception threshold (SeRT) provide different perspectives on the magnitude of impairment. Here we quantified speech perception difficulties in 24 OHI listeners in unaided and aided conditions by analyzing (1) consonant-identification thresholds and consonant confusions for 20 onset and 20 coda consonants in consonant-vowel-consonant (CVC) syllables presented at consonant-specific signal-to-noise ratio (SNR) levels, and (2) SeRTs obtained with the Quick Speech in Noise Test (QSIN) and the Hearing in Noise Test (HINT). Compared to older normal hearing (ONH) listeners, nearly all unaided OHI listeners showed abnormal consonant-identification thresholds, abnormal consonant confusions, and reduced psychometric function slopes. Average elevations in consonant-identification thresholds exceeded 35 dB, correlated strongly with impairments in mid-frequency hearing, and were greater for hard-to-identify consonants. Advanced digital hearing aids (HAs) improved average consonant-identification thresholds by more than 17 dB, with significant HA benefit seen in 83% of OHI listeners. HAs partially normalized consonant-identification thresholds, reduced abnormal consonant confusions, and increased the slope of psychometric functions. Unaided OHI listeners showed much smaller elevations in SeRTs (mean 6.9 dB) than in consonant-identification thresholds, and SeRTs in unaided listening conditions correlated strongly (r = 0.91) with identification thresholds of easily identified consonants. HAs produced minimal SeRT benefit (2.0 dB), with only 38% of OHI listeners showing significant improvement. HA benefit on SeRTs was accurately predicted (r = 0.86) by HA benefit on easily identified consonants. Consonant-identification tests can accurately predict sentence processing deficits and HA benefit in OHI listeners.
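For readers who want a concrete picture of the thresholds and slopes discussed above, here is a minimal sketch (not the authors' code) of fitting a logistic psychometric function to percent-correct consonant identification as a function of SNR. The SNRs and scores below are illustrative placeholders, and the chance level assumes a 20-alternative consonant set.

```python
# Hypothetical sketch: fit a logistic psychometric function to identification
# scores vs. SNR and read off a threshold (midpoint) and slope.
import numpy as np
from scipy.optimize import curve_fit

def logistic(snr, threshold, slope, chance=0.05):
    """Proportion correct vs. SNR; 'chance' is the guessing floor (1/20)."""
    return chance + (1.0 - chance) / (1.0 + np.exp(-slope * (snr - threshold)))

snr = np.array([-12.0, -6.0, 0.0, 6.0, 12.0])          # presentation SNRs (dB)
p_correct = np.array([0.08, 0.22, 0.55, 0.83, 0.95])   # illustrative scores

params, _ = curve_fit(logistic, snr, p_correct, p0=[0.0, 0.5])
threshold_db, slope = params
print(f"threshold ≈ {threshold_db:.1f} dB SNR, slope ≈ {slope:.2f} per dB")
```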

2.

Objective

To investigate acoustic features and classification methods for classifying three groups of fricative consonants differing in place of articulation.

Method

A support vector machine (SVM) algorithm was used to classify the fricatives extracted from the TIMIT database in quiet and also in speech babble noise at various signal-to-noise ratios (SNRs). Spectral features including four spectral moments, peak, slope, Mel-frequency cepstral coefficients (MFCCs), Gammatone filter outputs, and magnitudes of the fast Fourier transform (FFT) spectrum were used for the classification. The analysis frame was restricted to only 8 ms. In addition, commonly used linear and nonlinear principal component analysis dimensionality reduction techniques, which project a high-dimensional feature vector onto a lower-dimensional space, were examined.

Results

With 13 MFCC coefficients, or with 14 or 24 Gammatone filter outputs, classification performance was greater than or equal to 85% in quiet and at +10 dB SNR. Using 14 Gammatone filter outputs above 1 kHz, classification accuracy remained high (greater than 80%) for a wide range of SNRs from +20 to +5 dB SNR.

Conclusions

High levels of classification accuracy for fricative consonants in quiet and in noise could be achieved using only spectral features extracted from a short time window. Results of this work have a direct impact on the development of speech enhancement algorithms for hearing devices.
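As a rough illustration of the pipeline described in the Method, the following is a hedged Python sketch: four spectral moments computed from a single short analysis frame feed an SVM classifier. It is not the authors' implementation; the sampling rate, the random stand-in "corpus", and the three-class labels are placeholders.

```python
# Minimal sketch: spectral moments from an 8-ms frame -> SVM place classifier.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

FS = 16000                      # assumed sampling rate (Hz)
FRAME = int(0.008 * FS)         # 8 ms analysis frame

def spectral_moments(frame, fs=FS):
    """First four spectral moments (centroid, spread, skewness, kurtosis)."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    p = spec / (spec.sum() + 1e-12)           # normalise to a distribution
    centroid = np.sum(freqs * p)
    spread = np.sqrt(np.sum(((freqs - centroid) ** 2) * p))
    skew = np.sum(((freqs - centroid) ** 3) * p) / (spread ** 3 + 1e-12)
    kurt = np.sum(((freqs - centroid) ** 4) * p) / (spread ** 4 + 1e-12)
    return np.array([centroid, spread, skew, kurt])

# Placeholder corpus: random frames standing in for TIMIT fricative excerpts.
rng = np.random.default_rng(0)
frames = rng.standard_normal((60, FRAME))
labels = rng.integers(0, 3, size=60)          # 3 place-of-articulation classes

X = np.array([spectral_moments(f) for f in frames])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```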

3.
Sensorineural hearing loss occurs due to damage to the inner and outer hair cells of the peripheral auditory system. Hearing loss can cause decreases in audibility, dynamic range, frequency and temporal resolution of the auditory system, and all of these effects are known to affect speech intelligibility. In this study, a new reference-free speech intelligibility metric is proposed using 2-D neurograms constructed from the output of a computational model of the auditory periphery. The responses of auditory-nerve fibers with a wide range of characteristic frequencies were simulated to construct neurograms. The features of the neurograms were extracted using third-order statistics referred to as the bispectrum. The phase coupling of the neurogram bispectrum provides unique insight into the presence (or deficit) of supra-threshold nonlinearities beyond audibility for listeners with normal hearing (or hearing loss). The speech intelligibility scores predicted by the proposed method were compared to the behavioral scores for listeners with normal hearing and hearing loss both in quiet and under noisy background conditions. The results were also compared to the performance of some existing methods. The predicted results showed a good fit with a small error, suggesting that the subjective scores can be estimated reliably using the proposed neural-response-based metric. The proposed metric also had a wide dynamic range, and the predicted scores were well-separated as a function of hearing loss. The proposed metric successfully captures the effects of hearing loss and supra-threshold nonlinearities on speech intelligibility. This metric could be applied to evaluate the performance of various speech-processing algorithms designed for hearing aids and cochlear implants.
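To make the "third-order statistics" concrete, here is a generic direct (FFT-based) bispectrum estimator with a quadratic-phase-coupling toy example. It is a sketch of the standard definition, not the study's neurogram pipeline, and the input signal is synthetic.

```python
# Generic bispectrum estimate: B(f1, f2) = E[X(f1) X(f2) conj(X(f1 + f2))].
import numpy as np

def bispectrum(x, nfft=128, hop=64):
    """Average the segment-wise bispectrum over overlapping windows."""
    segs = [x[i:i + nfft] for i in range(0, len(x) - nfft + 1, hop)]
    B = np.zeros((nfft // 2, nfft // 2), dtype=complex)
    for seg in segs:
        X = np.fft.fft(seg * np.hanning(nfft))
        for f1 in range(nfft // 2):
            for f2 in range(nfft // 2):
                B[f1, f2] += X[f1] * X[f2] * np.conj(X[f1 + f2])
    return B / max(len(segs), 1)

# Quadratic phase coupling (f3 = f1 + f2 with locked phases) shows up as a
# bispectral peak -- the kind of supra-threshold nonlinearity the metric uses.
fs = 1000
t = np.arange(0, 2.0, 1.0 / fs)
x = (np.sin(2 * np.pi * 60 * t) + np.sin(2 * np.pi * 90 * t)
     + np.sin(2 * np.pi * 150 * t))
B = bispectrum(x)
print("peak bispectral magnitude:", np.abs(B).max())
```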

4.
For deaf individuals with residual low-frequency acoustic hearing, combined use of a cochlear implant (CI) and hearing aid (HA) typically provides better speech understanding than with either device alone. Because of coarse spectral resolution, CIs do not provide fundamental frequency (F0) information that contributes to understanding of tonal languages such as Mandarin Chinese. The HA can provide good representation of F0 and, depending on the range of aided acoustic hearing, first and second formant (F1 and F2) information. In this study, Mandarin tone, vowel, and consonant recognition in quiet and noise was measured in 12 adult Mandarin-speaking bimodal listeners with the CI-only and with the CI+HA. Tone recognition was significantly better with the CI+HA in noise, but not in quiet. Vowel recognition was significantly better with the CI+HA in quiet, but not in noise. There was no significant difference in consonant recognition between the CI-only and the CI+HA in quiet or in noise. There was a wide range in bimodal benefit, with improvements often greater than 20 percentage points in some tests and conditions. The bimodal benefit was compared to CI subjects’ HA-aided pure-tone average (PTA) thresholds between 250 and 2000 Hz; subjects were divided into two groups: “better” PTA (<50 dB HL) or “poorer” PTA (>50 dB HL). The bimodal benefit differed significantly between groups only for consonant recognition. The bimodal benefit for tone recognition in quiet was significantly correlated with CI experience, suggesting that bimodal CI users learn to better combine low-frequency spectro-temporal information from acoustic hearing with temporal envelope information from electric hearing. Given the small number of subjects in this study (n = 12), further research with Chinese bimodal listeners may provide more information regarding the contribution of acoustic and electric hearing to tonal language perception.

5.
In the real world, human speech recognition nearly always involves listening in background noise. The impact of such noise on speech signals and on intelligibility performance increases with the separation of the listener from the speaker. The present behavioral experiment provides an overview of the effects of such acoustic disturbances on speech perception in conditions approaching ecologically valid contexts. We analysed the intelligibility loss in spoken word lists with increasing listener-to-speaker distance in a typical low-level natural background noise. The noise was combined with the simple spherical amplitude attenuation due to distance, basically changing the signal-to-noise ratio (SNR). Therefore, our study draws attention to some of the most basic environmental constraints that have pervaded spoken communication throughout human history. We evaluated the ability of native French participants to recognize French monosyllabic words (spoken at 65.3 dB(A), reference at 1 meter) at distances from 11 to 33 meters, which corresponded to the SNRs most revealing of the progressive effect of the selected natural noise (−8.8 dB to −18.4 dB). Our results showed that in such conditions vowel identity is largely preserved, with strikingly few vowel confusions. The results also confirmed the functional role of consonants during lexical identification. The extensive analysis of recognition scores, confusion patterns and associated acoustic cues revealed that sonorant, sibilant and burst properties were the most important parameters influencing phoneme recognition. Altogether, these analyses allowed us to extract a resistance scale from consonant recognition scores. We also identified specific perceptual consonant confusion groups depending on position in the word (onset vs. coda). Finally, our data suggested that listeners may access some acoustic cues of the CV transition, opening interesting perspectives for future studies.
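The mapping from distance to SNR used in this design can be illustrated with the usual free-field spreading law. The sketch below assumes simple spherical attenuation of the speech and a constant noise floor of about 53.3 dB(A), a value inferred from the SNRs quoted above rather than reported by the study; with those assumptions it reproduces the −8.8 dB and −18.4 dB figures at 11 m and 33 m.

```python
# Distance -> SNR under spherical (1/r) spreading of the speech only.
import numpy as np

def snr_at_distance(distance_m, speech_db_at_1m=65.3, noise_db=53.3):
    """SNR (dB) after 20*log10(d) attenuation of the speech level."""
    speech_db = speech_db_at_1m - 20.0 * np.log10(distance_m)
    return speech_db - noise_db

for d in (11, 22, 33):
    print(f"{d:>2} m -> SNR ≈ {snr_at_distance(d):.1f} dB")
```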

6.
The capability to identify the direction of movement of sound images (“upward” or “downward”) was studied in listeners of two age groups: 19–27 years old (11 subjects) and 55–73 years old (9 subjects). Various sound models of movement in the median plane were used as stimuli. Initially, a model of movement was developed based on filtering broadband noise pulses with sets of “non-individualized” head-related transfer functions, i.e., functions measured in other listeners. These functions corresponded to consecutive sound-source positions spaced 5.6° apart in elevation, from −45° to 45°. The signals generated from the transfer-function sets of 23 subjects differed in the regularity and size of the spectral-minimum shift as well as in dynamic changes of the spectral maximum. These dynamic changes were taken into account when creating “synthetic” signals. In these signals, the vertical movement of sound images in the median plane was simulated either by a consecutive shift of the spectral minimum in broadband noise pulses or by a combination of the spectral-minimum shift with a simultaneous change of the spectral-maximum width and power. The data showed that young listeners with high capability for vertical localization could identify the direction of sound-image movement based on displacement of the spectral minimum in the broadband noise. To identify the direction of sound-image movement, younger listeners with poor capability for vertical localization and the older listeners relied on dynamic changes of signal power, which were connected mainly with the spectral-maximum range.
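A hedged sketch of the kind of "synthetic" stimulus described here, with broadband noise bursts whose spectral minimum (notch) is stepped upward from burst to burst, is given below. The burst duration, notch width, filter order and frequency range are illustrative assumptions, not the study's parameters.

```python
# Train of noise bursts with a notch centre stepped upward across bursts,
# a crude stand-in for an "upward" moving sound image in the median plane.
import numpy as np
from scipy.signal import butter, lfilter

FS = 44100
BURST = int(0.05 * FS)                      # 50-ms noise bursts (assumed)

def notched_noise(center_hz, width_hz=2000, fs=FS, n=BURST):
    """White-noise burst with a band-stop (notch) filter around center_hz."""
    lo, hi = center_hz - width_hz / 2, center_hz + width_hz / 2
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="bandstop")
    return lfilter(b, a, np.random.randn(n))

centers = np.linspace(5000, 11000, 8)        # notch centre moves up in steps
stimulus = np.concatenate([notched_noise(c) for c in centers])
print("stimulus duration:", len(stimulus) / FS, "s")
```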

7.
A significant fraction of newly implanted cochlear implant recipients use a hearing aid in their non-implanted ear. SCORE bimodal is a sound processing strategy developed for this configuration, aimed at normalising loudness perception and improving binaural loudness balance. Speech perception performance in quiet and noise and sound localisation ability of six bimodal listeners were measured with and without application of SCORE. Speech perception in quiet was measured either with only acoustic, only electric, or bimodal stimulation, at soft and normal conversational levels. For speech in quiet there was a significant improvement with application of SCORE. Speech perception in noise was measured for either steady-state noise, fluctuating noise, or a competing talker, at conversational levels with bimodal stimulation. For speech in noise there was no significant effect of application of SCORE. Modelling of interaural loudness differences in a long-term-average-speech-spectrum-weighted click train indicated that left-right discrimination of sound sources can improve with application of SCORE. As SCORE was found to leave speech perception unaffected or to improve it, it seems suitable for implementation in clinical devices.

8.
Petkov CI, O'Connor KN, Sutter ML. Neuron 2007, 54(1): 153–165.
When interfering objects occlude a scene, the visual system restores the occluded information. Similarly, when a sound of interest (a "foreground" sound) is interrupted (occluded) by loud noise, the auditory system restores the occluded information. This process, called auditory induction, can be exploited to create a continuity illusion. When a segment of a foreground sound is deleted and loud noise fills the missing portion, listeners incorrectly report hearing the foreground continuing through the noise. Here we reveal the neurophysiological underpinnings of illusory continuity in single-neuron responses from awake macaque monkeys' primary auditory cortex (A1). A1 neurons represented the missing segment of occluded tonal foregrounds by responding to discontinuous foregrounds interrupted by intense noise as if they were responding to the complete foregrounds. By comparison, simulated peripheral responses represented only the noise and not the occluded foreground. The results reveal that many A1 single-neuron responses closely follow the illusory percept.

9.
One of the most common examples of audiovisual speech integration is the McGurk effect. As an example, an auditory syllable /ba/ dubbed onto incongruent lip movements producing "ga" typically causes listeners to hear "da". This report hypothesizes reasons why certain clinical populations and listeners who are hard of hearing might be more susceptible to visual influence. Conversely, we also examine why other listeners appear less susceptible to the McGurk effect (i.e., they report hearing just the auditory stimulus without being influenced by the visual). These explanations are accompanied by a mechanistic account of integration phenomena, including visual inhibition of auditory information and a slower rate of accumulation of inputs. First, simulations of a linear dynamic parallel interactive model were instantiated using inhibition and facilitation to examine potential mechanisms underlying integration. In a second set of simulations, we systematically manipulated the inhibition parameter values to model data obtained from listeners with autism spectrum disorder. In summary, we argue that cross-modal inhibition parameter values explain individual variability in McGurk perceptibility. Nonetheless, different mechanisms should continue to be explored in an effort to better understand current data patterns in the audiovisual integration literature.
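To illustrate what a linear dynamic parallel interactive model with cross-modal inhibition can look like, here is a generic two-accumulator sketch. It is not the model fitted in the report, and all parameter values (inputs, coupling weight, threshold) are illustrative.

```python
# Two linear accumulators (auditory, visual) coupled by a cross-modal weight:
# negative coupling = inhibition, positive = facilitation.
import numpy as np

def accumulate(input_a=1.0, input_v=0.8, coupling=-0.4,
               dt=0.001, t_max=2.0, threshold=0.9):
    """Euler-integrated coupled accumulators; returns first threshold crossing."""
    a = v = 0.0
    for step in range(int(t_max / dt)):
        da = (input_a + coupling * v) * dt      # visual channel acts on auditory
        dv = (input_v + coupling * a) * dt      # and vice versa
        a, v = a + da, v + dv
        if max(a, v) >= threshold:
            return (step + 1) * dt
    return None

# Stronger cross-modal inhibition slows evidence accumulation (longer decision
# times), one way individual variability in McGurk susceptibility could arise.
for w in (0.0, -0.2, -0.4):
    t = accumulate(coupling=w)
    print(f"coupling {w:+.1f} -> decision time {t:.2f} s")
```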

10.
Previous studies have shown that concurrent vowel identification improves with increasing temporal onset asynchrony of the vowels, even if the vowels have the same fundamental frequency. The current study investigated the possible underlying neural processing involved in concurrent vowel perception. The individual vowel stimuli from a previously published study were used as inputs for a phenomenological auditory-nerve (AN) model. Spectrotemporal representations of simulated neural excitation patterns were constructed (i.e., neurograms) and then matched quantitatively with the neurograms of the single vowels using the Neurogram Similarity Index Measure (NSIM). A novel computational decision model was used to predict concurrent vowel identification. To facilitate optimum matches between the model predictions and the behavioral human data, internal noise was added at either neurogram generation or neurogram matching using the NSIM procedure. The best fit to the behavioral data was achieved with a signal-to-noise ratio (SNR) of 8 dB for internal noise added at the neurogram but with a much smaller amount of internal noise (SNR of 60 dB) for internal noise added at the level of the NSIM computations. The results suggest that accurate modeling of concurrent vowel data from listeners with normal hearing may partly depend on internal noise and on where internal noise is hypothesized to occur during the concurrent vowel identification process.
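A small helper like the one below shows one straightforward reading of "adding internal noise at a given SNR" to a neurogram: Gaussian noise scaled so that signal power over noise power matches the target SNR. It is an illustration, not the published model, and the neurogram here is a random placeholder array.

```python
# Add zero-mean Gaussian internal noise to a 2-D neurogram at a target SNR.
import numpy as np

def add_internal_noise(neurogram, snr_db, rng=np.random.default_rng(1)):
    """Return neurogram + Gaussian noise at the requested SNR (dB)."""
    sig_power = np.mean(neurogram ** 2)
    noise_power = sig_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=neurogram.shape)
    return neurogram + noise

neurogram = np.random.rand(64, 200)                      # placeholder freq x time
noisy_8db = add_internal_noise(neurogram, snr_db=8.0)    # noise at generation
noisy_60db = add_internal_noise(neurogram, snr_db=60.0)  # noise at NSIM matching
print(np.std(noisy_8db - neurogram), np.std(noisy_60db - neurogram))
```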

11.

Background

Enjoyment of music is an important part of life that may be degraded for people with hearing impairments, especially those using cochlear implants. The ability to follow separate lines of melody is an important factor in music appreciation. This ability relies on effective auditory streaming, which is much reduced in people with hearing impairment, contributing to difficulties in music appreciation. The aim of this study was to assess whether visual cues could reduce the subjective difficulty of segregating a melody from interleaved background notes in normally hearing listeners, those using hearing aids, and those using cochlear implants.

Methodology/Principal Findings

Normally hearing listeners (N = 20), hearing aid users (N = 10), and cochlear implant users (N = 11) were asked to rate the difficulty of segregating a repeating four-note melody from random interleaved distracter notes. The pitch of the background notes was gradually increased or decreased throughout blocks, providing a range of difficulty from easy (with a large pitch separation between melody and distracter) to impossible (with the melody and distracter completely overlapping). Visual cues were provided on half the blocks, and difficulty ratings for blocks with and without visual cues were compared between groups. Visual cues reduced the subjective difficulty of extracting the melody from the distracter notes for normally hearing listeners and cochlear implant users, but not hearing aid users.

Conclusion/Significance

Simple visual cues may improve the ability of cochlear implant users to segregate lines of music, thus potentially increasing their enjoyment of music. More research is needed to determine what type of acoustic cues to encode visually in order to optimise the benefits they may provide.

12.
The perception of consonants followed by the vowel [a] was studied in chimpanzees and humans, using a reaction time task in which reaction times for discrimination of syllables were taken as an index of similarity between consonants. Consonants used were 20 natural French consonants and six natural and synthetic Japanese stop consonants. Cluster and MDSCAL analyses of reaction times for discrimination of the French consonants suggested that the manner of articulation is the major determinant of the structure of the perception of consonants by the chimpanzees. Discrimination of stop consonants suggested that the major grouping in the chimpanzees was by voicing. The place of articulation from the lips to the velum was reproduced only in the perception of the synthetic unvoiced stop consonants in the two-dimensional MDSCAL space. The phoneme-boundary effect (categorical perception) for the voicing and place-of-articulation features was also examined by a chimpanzee using synthetic [ga]-[ka] and [ba]-[da] continua, respectively. The chimpanzee showed enhanced discriminability at or near the phonetic boundaries between the velar voiced and unvoiced stops and also between the voiced bilabial and alveolar stops. These results suggest that the basic mechanism for the identification of consonants in chimpanzees is similar to that in humans, although chimpanzees are less accurate than humans in discrimination of consonants.

13.
Contralateral masking is the phenomenon where a masker presented to one ear affects the ability to detect a signal in the opposite ear. For normal hearing listeners, contralateral masking results in masking patterns that are both sharper and dramatically smaller in magnitude than ipsilateral masking. The goal of this study was to investigate whether medial olivocochlear (MOC) efferents are needed for the sharpness and relatively small magnitude of the contralateral masking function. To do this, bilateral cochlear implant patients were tested because, by directly stimulating the auditory nerve, cochlear implants circumvent the effects of the MOC efferents. The results indicated that, as with normal hearing listeners, the contralateral masking function was sharper than the ipsilateral masking function. However, although there was a reduction in the magnitude of the contralateral masking function compared to the ipsilateral masking function, it was relatively modest. This is in sharp contrast to the results of normal hearing listeners where the magnitude of the contralateral masking function is greatly reduced. These results suggest that MOC function may not play a large role in the sharpness of the contralateral masking function but may play a considerable role in the magnitude of the contralateral masking function.

14.
The performance of objective speech and audio quality measures for the prediction of the perceived quality of frequency-compressed speech in hearing aids is investigated in this paper. A number of existing quality measures have been applied to speech signals processed by a hearing aid, which compresses speech spectra along frequency in order to make information contained in higher frequencies audible for listeners with severe high-frequency hearing loss. Quality measures were compared with subjective ratings obtained from normal hearing and hearing impaired children and adults in an earlier study. High correlations were achieved with quality measures computed by quality models that are based on the auditory model of Dau et al., namely, the measure PSM, computed by the quality model PEMO-Q; the measure qc, computed by the quality model proposed by Hansen and Kollmeier; and the linear subcomponent of the HASQI. For the prediction of quality ratings by hearing impaired listeners, extensions of some models incorporating hearing loss were implemented and shown to achieve improved prediction accuracy. Results indicate that these objective quality measures can potentially serve as tools for assisting in initial setting of frequency compression parameters.
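Evaluations of this kind typically come down to correlating the objective measure's outputs with the listeners' ratings. The snippet below shows that step with placeholder numbers, not data from the study.

```python
# Correlate an objective quality measure's predictions with subjective ratings.
import numpy as np
from scipy.stats import pearsonr

objective_scores = np.array([0.42, 0.55, 0.61, 0.70, 0.78, 0.86])  # e.g. PSM-like
subjective_mos = np.array([1.8, 2.4, 2.9, 3.3, 3.9, 4.4])          # mean ratings

r, p = pearsonr(objective_scores, subjective_mos)
print(f"Pearson r = {r:.3f} (p = {p:.3g})")
```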

15.
In this study, we propose a novel estimate of listening effort based on electroencephalographic data. This method translates our past findings, obtained from evoked electroencephalographic activity, to oscillatory EEG activity. To test this technique, electroencephalographic data were recorded from experienced hearing aid users with moderate hearing loss while they wore their hearing aids. The investigated hearing aid settings were: a directional microphone combined with a noise reduction algorithm in a medium and a strong setting, the noise reduction setting turned off, and a setting using omnidirectional microphones without any noise reduction. The results suggest that the electroencephalographic estimate of listening effort is a useful tool for mapping the effort exerted by the participants. In addition, the results indicate that a directional processing mode can reduce listening effort in multitalker listening situations.

16.

Objective

To investigate the performance of monaural and binaural beamforming technology with an additional noise reduction algorithm in cochlear implant recipients.

Method

This experimental study was conducted as a single-subject repeated-measures design within a large German cochlear implant centre. Twelve experienced users of an Advanced Bionics HiRes90K or CII implant with a Harmony speech processor were enrolled. The cochlear implant processor of each subject was connected to one of two bilaterally placed state-of-the-art hearing aids (Phonak Ambra) providing three alternative directional processing options: an omnidirectional setting, an adaptive monaural beamformer, and a binaural beamformer. A further noise reduction algorithm (ClearVoice) was applied to the signal on the cochlear implant processor itself. The speech signal was presented from 0° and speech-shaped noise was presented from loudspeakers placed at ±70°, ±135° and 180°. The Oldenburg sentence test was used to determine the signal-to-noise ratio at which subjects scored 50% correct.

Results

Both the adaptive and binaural beamformers were significantly better than the omnidirectional condition (improvements of 5.3±1.2 dB and 7.1±1.6 dB, respectively; p<0.001). The best score was achieved with the binaural beamformer in combination with the ClearVoice noise reduction algorithm, with a significant improvement in SRT of 7.9±2.4 dB (p<0.001) over the omnidirectional-alone condition.

Conclusions

The study showed that the binaural beamformer implemented in the Phonak Ambra hearing aid could be used in conjunction with a Harmony speech processor to produce substantial average improvements in SRT of 7.1 dB. The monaural, adaptive beamformer provided an averaged SRT improvement of 5.3 dB.
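For readers unfamiliar with how an SRT at 50% correct is obtained, the sketch below simulates a simple 1-up/1-down adaptive track against a hypothetical listener. The step size, trial count and psychometric function are illustrative and do not reproduce the Oldenburg sentence test's actual adaptive rule.

```python
# Simulated adaptive SRT track: SNR steps down after a correct response and
# up after an incorrect one, converging near the 50%-correct point.
import numpy as np

rng = np.random.default_rng(3)

def simulated_listener(snr_db, true_srt=-4.0, slope=0.8):
    """Probability of repeating the sentence correctly at a given SNR."""
    return 1.0 / (1.0 + np.exp(-slope * (snr_db - true_srt)))

snr, step, track = 0.0, 2.0, []
for trial in range(40):
    correct = rng.random() < simulated_listener(snr)
    track.append(snr)
    snr += -step if correct else step      # 1-down / 1-up targets 50% correct

srt_estimate = np.mean(track[-20:])        # average SNR over the last 20 trials
print(f"estimated SRT ≈ {srt_estimate:.1f} dB SNR")
```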

17.
Spectrotemporal modulation (STM) detection performance was examined for cochlear implant (CI) users. The test involved discriminating between an unmodulated steady noise and a modulated stimulus. The modulated stimulus contains spectral modulation patterns that shift in frequency over time. In order to examine STM detection performance for different modulation conditions, two different temporal modulation rates (5 and 10 Hz) and three different spectral modulation densities (0.5, 1.0, and 2.0 cycles/octave) were employed, producing a total of 6 different STM stimulus conditions. In order to explore how electric hearing constrains STM sensitivity for CI users differently from acoustic hearing, normal-hearing (NH) and hearing-impaired (HI) listeners were also tested on the same tasks. STM detection performance was best in NH subjects, followed by HI subjects. On average, CI subjects showed the poorest performance, but some CI subjects showed high levels of STM detection performance comparable to acoustic hearing. Significant correlations were found between STM detection performance and speech identification performance in quiet and in noise. In order to understand the relative contribution of spectral and temporal modulation cues to speech perception abilities for CI users, spectral and temporal modulation detection was performed separately and related to STM detection and speech perception performance. The results suggest that slow spectral modulation rather than slow temporal modulation may be important for determining speech perception capabilities for CI users. Lastly, test–retest reliability for STM detection was good, with no learning effect. The present study demonstrates that STM detection may be a useful tool to evaluate the ability of CI sound processing strategies to deliver clinically pertinent acoustic modulation information.
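A generic moving-ripple generator along the lines of the STM stimuli described above is sketched below. The carrier spacing, frequency range, duration and modulation depth are assumptions, not the study's stimulus parameters.

```python
# Moving ripple: sinusoidal modulation drifting across log-spaced carriers at a
# temporal rate (Hz) and spectral density (cycles/octave).
import numpy as np

FS = 44100

def stm_stimulus(rate_hz=5.0, density_cpo=1.0, dur=1.0,
                 f_lo=350.0, f_hi=5600.0, n_carriers=100, depth=0.9, fs=FS):
    t = np.arange(int(dur * fs)) / fs
    freqs = f_lo * 2.0 ** np.linspace(0, np.log2(f_hi / f_lo), n_carriers)
    rng = np.random.default_rng(0)
    x = np.zeros_like(t)
    for f in freqs:
        octaves = np.log2(f / f_lo)                       # spectral position
        env = 1.0 + depth * np.sin(2 * np.pi * (rate_hz * t + density_cpo * octaves))
        x += env * np.sin(2 * np.pi * f * t + rng.uniform(0, 2 * np.pi))
    return x / np.max(np.abs(x))

ripple = stm_stimulus(rate_hz=10.0, density_cpo=2.0)       # one test condition
print("generated", len(ripple) / FS, "s of STM stimulus")
```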

18.
Listener-specific features of the recognition of different emotional intonations (positive, negative and neutral) of male and female speakers, in the presence or absence of background noise, were studied in 49 adults aged 20-79 years. In all listeners, noise produced the most pronounced decrease in recognition accuracy for the positive emotional intonation ("joy") as compared to other intonations, whereas it did not influence the recognition accuracy of "anger" in 65-79-year-old listeners. Higher emotion recognition rates for noisy signals were observed for emotional intonations expressed by female speakers. Acoustic characteristics of noisy and clear speech signals underlying the perception of emotional speech prosody were identified for adult listeners of different ages and genders.

19.
Spatial release from masking refers to a benefit for speech understanding. It occurs when a target talker and a masker talker are spatially separated. In those cases, speech intelligibility for target speech is typically higher than when both talkers are at the same location. In cochlear implant listeners, spatial release from masking is much reduced or absent compared with normal hearing listeners. Perhaps this reduced spatial release occurs because cochlear implant listeners cannot effectively attend to spatial cues. Three experiments examined factors that may interfere with deploying spatial attention to a target talker masked by another talker. To simulate cochlear implant listening, stimuli were vocoded with two unique features. First, we used 50-Hz low-pass filtered speech envelopes and noise carriers, strongly reducing the possibility of temporal pitch cues; second, co-modulation was imposed on target and masker utterances to enhance perceptual fusion between the two sources. Stimuli were presented over headphones. Experiments 1 and 2 presented high-fidelity spatial cues with unprocessed and vocoded speech. Experiment 3 maintained faithful long-term average interaural level differences but presented scrambled interaural time differences with vocoded speech. Results show a robust spatial release from masking in Experiments 1 and 2, and a greatly reduced spatial release in Experiment 3. Faithful long-term average interaural level differences were insufficient for producing spatial release from masking. This suggests that appropriate interaural time differences are necessary for restoring spatial release from masking, at least for a situation where there are few viable alternative segregation cues.
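As a concrete illustration of the vocoding described in this design, here is a bare-bones noise-vocoder sketch: band-pass analysis, envelope extraction with a 50-Hz low-pass filter, and noise carriers modulated by those envelopes. The channel count and band edges are placeholders, and the co-modulation manipulation is not included.

```python
# Minimal noise vocoder with 50-Hz low-pass filtered band envelopes.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

FS = 16000

def bandpass(x, lo, hi, fs=FS, order=4):
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="bandpass")
    return filtfilt(b, a, x)

def noise_vocode(speech, edges=(100, 400, 1000, 2200, 4500), fs=FS):
    b_env, a_env = butter(4, 50.0 / (fs / 2))            # 50-Hz envelope lowpass
    rng = np.random.default_rng(0)
    out = np.zeros_like(speech)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = bandpass(speech, lo, hi, fs)
        env = filtfilt(b_env, a_env, np.abs(hilbert(band)))   # slow envelope
        carrier = bandpass(rng.standard_normal(len(speech)), lo, hi, fs)
        out += np.clip(env, 0, None) * carrier
    return out / (np.max(np.abs(out)) + 1e-12)

speech = np.random.default_rng(1).standard_normal(FS)     # 1-s placeholder input
vocoded = noise_vocode(speech)
print("vocoded length:", len(vocoded))
```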

20.
Schaette R, Turtle C, Munro KJ. PLoS ONE 2012, 7(6): e35238.
Tinnitus, a phantom auditory sensation, is associated with hearing loss in most cases, but it is unclear if hearing loss causes tinnitus. Phantom auditory sensations can be induced in normal hearing listeners when they experience severe auditory deprivation such as confinement in an anechoic chamber, which can be regarded as somewhat analogous to a profound bilateral hearing loss. As this condition is relatively uncommon among tinnitus patients, induction of phantom sounds by a lesser degree of auditory deprivation could advance our understanding of the mechanisms of tinnitus. In this study, we therefore investigated the reporting of phantom sounds after continuous use of an earplug. 18 healthy volunteers with normal hearing wore a silicone earplug continuously in one ear for 7 days. The attenuation provided by the earplugs simulated a mild high-frequency hearing loss: mean attenuation increased from <10 dB at 0.25 kHz to >30 dB at 3 and 4 kHz. 14 out of 18 participants reported phantom sounds during earplug use. 11 participants presented with stable phantom sounds on day 7 and underwent tinnitus spectrum characterization with the earplug still in place. The spectra showed that the phantom sounds were perceived predominantly as high-pitched, corresponding to the frequency range most affected by the earplug. In all cases, the auditory phantom disappeared when the earplug was removed, indicating a causal relation between auditory deprivation and phantom sounds. This relation matches the predictions of our computational model of tinnitus development, which proposes a possible mechanism by which a stabilization of neuronal activity through homeostatic plasticity in the central auditory system could lead to the development of a neuronal correlate of tinnitus when auditory nerve activity is reduced due to the earplug.

