Similar documents
Found 20 similar documents (search time: 15 ms)
1.
In this paper, a novel artificial neural network (ANN) based multi-sensor, multi-band adaptive signal-processing scheme is described for enhancing speech corrupted by real noise and reverberation. Numerically robust adaptation algorithms are employed for the ANN-based sub-band filters, and new simulation experiments using real reverberant automobile data demonstrate that the proposed speech-enhancement system outperforms conventional linear-filtering-based wide-band and multi-band noise-cancellation schemes.
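The conventional linear noise-cancellation baseline that the abstract compares against can be sketched with a classic LMS adaptive filter: a reference microphone picks up noise correlated with the noise in the primary channel, and the adaptive filter estimates and subtracts it. This is a minimal single-band illustration under assumed parameter values; it does not reproduce the paper's ANN-based sub-band filters.

```python
def lms_noise_canceller(primary, reference, num_taps=4, mu=0.05):
    """Single-band LMS noise canceller (linear baseline sketch).

    `primary` contains speech plus noise; `reference` contains noise
    correlated with the noise in `primary`. Tap count and step size
    `mu` are illustrative choices, not values from the paper.
    """
    w = [0.0] * num_taps                  # adaptive FIR weights
    buf = [0.0] * num_taps                # delay line for the reference signal
    enhanced = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]              # shift the newest reference sample in
        y = sum(wi * bi for wi, bi in zip(w, buf))        # noise estimate
        e = d - y                         # error = enhanced output sample
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]  # LMS weight update
        enhanced.append(e)
    return enhanced
```

With a purely noise-driven primary channel, the residual output energy drops sharply once the filter converges, which is the cancellation behaviour the multi-band schemes generalize.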

2.

Objective

To investigate the performance of monaural and binaural beamforming technology with an additional noise reduction algorithm, in cochlear implant recipients.

Method

This experimental study was conducted as a single-subject repeated-measures design within a large German cochlear implant centre. Twelve experienced users of an Advanced Bionics HiRes90K or CII implant with a Harmony speech processor were enrolled. The cochlear implant processor of each subject was connected to one of two bilaterally placed state-of-the-art hearing aids (Phonak Ambra) providing three alternative directional processing options: an omnidirectional setting, an adaptive monaural beamformer, and a binaural beamformer. A further noise reduction algorithm (ClearVoice) was applied to the signal on the cochlear implant processor itself. The speech signal was presented from 0°, and speech-shaped noise was presented from loudspeakers placed at ±70°, ±135° and 180°. The Oldenburg sentence test was used to determine the signal-to-noise ratio at which subjects scored 50% correct.

Results

Both the adaptive and the binaural beamformer were significantly better than the omnidirectional condition (improvements of 5.3±1.2 dB and 7.1±1.6 dB respectively; p<0.001). The best score was achieved with the binaural beamformer in combination with the ClearVoice noise reduction algorithm, with a significant improvement in speech reception threshold (SRT) of 7.9±2.4 dB (p<0.001) over the omnidirectional-alone condition.

Conclusions

The study showed that the binaural beamformer implemented in the Phonak Ambra hearing aid could be used in conjunction with a Harmony speech processor to produce substantial average improvements in SRT of 7.1 dB. The monaural, adaptive beamformer provided an average SRT improvement of 5.3 dB.
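The SRT measurement used in this study can be illustrated with a simple adaptive staircase: a 1-up/1-down track converges on the SNR that yields 50% correct. This is a hedged toy sketch, not the actual Oldenburg sentence test procedure (which scores five words per sentence and shrinks its step size over trials); `prob_correct` is a hypothetical stand-in for a listener's psychometric function.

```python
import random

def measure_srt(prob_correct, start_snr=10.0, step=2.0, trials=200, rng=random):
    """Estimate the SNR for 50% intelligibility with a 1-up/1-down staircase.

    After each correct response the SNR is lowered (harder); after each
    miss it is raised (easier), so the track hovers around the 50% point.
    Simplified sketch; step-size adaptation and sentence scoring omitted.
    """
    snr, track = start_snr, []
    for _ in range(trials):
        correct = rng.random() < prob_correct(snr)
        snr += -step if correct else step   # harder after a hit, easier after a miss
        track.append(snr)
    tail = track[len(track) // 2:]          # discard the convergence phase
    return sum(tail) / len(tail)
```

Averaging the second half of the track gives the SRT estimate; equal up and down steps are what make the procedure converge on the 50%-correct point.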

3.
A variety of perceptual features can be used for the successful separation of information flows and higher speech intelligibility. The binaural system, which exploits the spatial separation of speech sources, plays the most important role in attaining this goal. This review discusses how the mechanisms of spatial hearing provide selective attention to a target speech source and promote the recognition of a masked target signal.

4.
We describe two design strategies that could substantially improve the performance of speech enhancement systems. Results from a preliminary study of pulse recovery are presented to illustrate the potential benefits of such strategies. The first strategy is a direct application of a non-linear, adaptive signal processing approach for recovery of speech in noise. The second strategy optimizes performance by maximizing the enhancement system's ability to evoke target speech percepts. This approach may lead to better performance because the design is optimized on a measure directly related to the ultimate goal of speech enhancement: accurate communication of the speech percept. In both systems, recently developed ‘neural network’ learning algorithms can be used to determine appropriate parameters for enhancement processing.

5.
Auditory information is processed in a fine-to-crude hierarchical scheme, from low-level acoustic information to high-level abstract representations, such as phonological labels. We now ask whether fine acoustic information, which is not retained at high levels, can still be used to extract speech from noise. Previous theories suggested either full availability of low-level information or availability that is limited by task difficulty. We propose a third alternative, based on the Reverse Hierarchy Theory (RHT), originally derived to describe the relations between the processing hierarchy and visual perception. RHT asserts that only the higher levels of the hierarchy are immediately available for perception. Direct access to low-level information requires specific conditions, and can be achieved only at the cost of concurrent comprehension. We tested the predictions of these three views in a series of experiments in which we measured the benefits from utilizing low-level binaural information for speech perception, and compared them to the benefit predicted by a model of the early auditory system. Only auditory RHT could account for the full pattern of the results, suggesting that similar defaults and tradeoffs underlie the relations between hierarchical processing and perception in the visual and auditory modalities.

7.
The objective was to determine whether one of the neural temporal features, neural adaptation, can account for the across-subject variability in behavioral measures of temporal processing and speech perception performance in cochlear implant (CI) recipients. Neural adaptation is the phenomenon in which neural responses are strongest at the beginning of the stimulus and decline with stimulus repetition (e.g., in stimulus trains). It is unclear how this temporal property of neural responses relates to psychophysical measures of temporal processing (e.g., gap detection) or speech perception. The adaptation of the electrically evoked compound action potential (ECAP) was obtained using 1000 pulses-per-second (pps) biphasic pulse trains presented directly to the electrode. The adaptation of the late auditory evoked potential (LAEP) was obtained using a sequence of 1-kHz tone bursts presented acoustically, through the cochlear implant. Behavioral temporal processing was measured using the Random Gap Detection Test at the most comfortable listening level. Consonant-nucleus-consonant (CNC) words and AzBio sentences were also tested. The results showed that both the ECAP and the LAEP display adaptive patterns, with substantial across-subject variability in the amount of adaptation. No correlations between the amount of neural adaptation and gap detection thresholds (GDTs) or speech perception scores were found. The correlations between the degree of neural adaptation and demographic factors showed that CI users with more LAEP adaptation were likely to be those implanted at a younger age than CI users with less LAEP adaptation. The results suggest that neural adaptation, at least this feature alone, cannot account for the across-subject variability in temporal processing ability in CI users.
However, the finding that the LAEP adaptive pattern was less prominent in the CI group than in the normal-hearing group may point to an important role for a normal adaptation pattern at the cortical level in speech perception.

8.
The recently introduced wavelet transform is a member of the class of time-frequency representations which include the Gabor short-time Fourier transform and Wigner-Ville distribution. Such techniques are of significance because of their ability to display the spectral content of a signal as time elapses. The value of the wavelet transform as a signal analysis tool has been demonstrated by its successful application to the study of turbulence and processing of speech and music. Since, in common with these subjects, both the time and frequency content of physiological signals are often of interest (the ECG being an obvious example), the wavelet transform represents a particularly relevant means of analysis. Following a brief introduction to the wavelet transform and its implementation, this paper describes a preliminary investigation into its application to the study of both ECG and heart rate variability data. In addition, the wavelet transform can be used to perform multiresolution signal decomposition. Since this process can be considered as a sub-band coding technique, it offers the opportunity for data compression, which can be implemented using efficient pyramidal algorithms. Results of the compression and reconstruction of ECG data are given which suggest that the wavelet transform is well suited to this task.
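The multiresolution decomposition and sub-band-coding view of compression described above can be sketched in a few lines. The abstract does not name a wavelet family, so a Haar transform is used here purely for brevity; the test signal and threshold value are illustrative assumptions.

```python
def haar_dwt(x):
    """One level of the orthonormal Haar wavelet transform (even-length input)."""
    s = 2 ** -0.5
    approx = [(a + b) * s for a, b in zip(x[0::2], x[1::2])]
    detail = [(a - b) * s for a, b in zip(x[0::2], x[1::2])]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of one Haar level: interleave sums and differences back."""
    s = 2 ** -0.5
    out = []
    for a, d in zip(approx, detail):
        out += [(a + d) * s, (a - d) * s]
    return out

def compress(x, levels=3, threshold=0.08):
    """Pyramidal multiresolution decomposition with coefficient thresholding.

    Small detail (sub-band) coefficients are zeroed; the mostly-zero
    sub-bands are what make the representation compressible.
    """
    details = []
    for _ in range(levels):
        x, d = haar_dwt(x)
        details.append([c if abs(c) >= threshold else 0.0 for c in d])
    return x, details

def reconstruct(approx, details):
    """Rebuild the signal from the coarse approximation and kept details."""
    for d in reversed(details):
        approx = haar_idwt(approx, d)
    return approx
```

With the threshold set to zero the pyramid reconstructs the signal exactly; with a small threshold most detail coefficients vanish while the reconstruction error stays small, which is the trade-off exploited when compressing ECG data.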

9.
Bilateral cochlear implants aim to provide hearing to both ears for children who are deaf and promote binaural/spatial hearing. Benefits are limited by mismatched devices and unilaterally-driven development which could compromise the normal integration of left and right ear input. We thus asked whether children hear a fused image (i.e., 1 vs. 2 sounds) from their bilateral implants and if this “binaural fusion” reduces listening effort. Binaural fusion was assessed by asking 25 deaf children with cochlear implants and 24 peers with normal hearing whether they heard one or two sounds when listening to bilaterally presented acoustic click-trains/electric pulses (250 Hz trains of 36 ms presented at 1 Hz). Reaction times and pupillary changes were recorded simultaneously to measure listening effort. Bilaterally implanted children heard one image of bilateral input less frequently than normal hearing peers, particularly when intensity levels on each side were balanced. Binaural fusion declined as brainstem asymmetries increased and age at implantation decreased. Children implanted later had access to acoustic input prior to implantation due to progressive deterioration of hearing. Increases in both pupil diameter and reaction time occurred as perception of binaural fusion decreased. Results indicate that, without binaural level cues, children have difficulty fusing input from their bilateral implants to perceive one sound, which costs them increased listening effort. Brainstem asymmetries exacerbate this issue. By contrast, later implantation, reflecting longer access to bilateral acoustic hearing, may have supported development of auditory pathways underlying binaural fusion. Improved integration of bilateral cochlear implant signals for children is required to improve their binaural hearing.

10.
Extensive research shows that inter-talker variability (i.e., changing the talker) affects recognition memory for speech signals. However, relatively little is known about the consequences of intra-talker variability (i.e., changes in speaking style within a talker) on the encoding of speech signals in memory. It is well established that speakers can modulate the characteristics of their own speech and produce a listener-oriented, intelligibility-enhancing speaking style in response to communication demands (e.g., when speaking to listeners with hearing impairment or non-native speakers of the language). Here we conducted two experiments to examine the role of speaking style variation in spoken language processing. First, we examined the extent to which clear speech provided benefits in challenging listening environments (i.e., speech in noise). Second, we compared recognition memory for sentences produced in conversational and clear speaking styles. In both experiments, semantically normal and anomalous sentences were included to investigate the role of higher-level linguistic information in the processing of speaking style variability. The results show that acoustic-phonetic modifications implemented in listener-oriented speech lead to improved speech recognition in challenging listening conditions and, crucially, to a substantial enhancement in recognition memory for sentences.

11.
The cortical organization of speech processing
Despite decades of research, the functional neuroanatomy of speech processing has been difficult to characterize. A major impediment to progress may have been the failure to consider task effects when mapping speech-related processing systems. We outline a dual-stream model of speech processing that remedies this situation. In this model, a ventral stream processes speech signals for comprehension, and a dorsal stream maps acoustic speech signals to frontal lobe articulatory networks. The model assumes that the ventral stream is largely bilaterally organized (although there are important computational differences between the left- and right-hemisphere systems) and that the dorsal stream is strongly left-hemisphere dominant.

12.
Pleiotropy and preadaptation in the evolution of human language capacity
The capacity for spoken language in the human is a genetic trait, but the information communicated by this means is to a large extent culturally determined. Using a gene-culture coevolutionary approach, we model the hypothesis that speech evolved as a channel for the communication of adaptive cultural traits from parent to offspring. The motivation for this paper is a condition obtained previously that initial increase of communication would require at least a two-fold advantage for the transmitted trait. Here, we show that under reasonable assumptions the invasion condition becomes less stringent. In Model 1, we assume that two adaptive cultural traits can be transmitted, and we follow a gene which permits communication of the second adaptive trait. In Model 2, we assume that a related function such as greater memory capacity is a prerequisite for speech, and that this function confers an advantage independent of its association with speech. In both models we assume haploid sexual genetics and a simple scheme of vertical transmission. The stability properties of all corner and edge equilibria of the models are analyzed. The two models taken together suggest a possible scenario for the initial stages of the evolution of speech.

13.
We propose in this paper a new class of model processes for the extraction of spectral information from the neural representation of acoustic signals in mammals. We are concerned particularly with mechanisms for detecting the phase-locked activity of auditory neurons in response to frequencies and intensities of sound associated with speech perception. Recent psychophysical tests on deaf human subjects implanted with intracochlear stimulating electrodes as an auditory prosthesis have produced results which are in conflict with the predictions of the classical place-pitch and periodicity-pitch theories. In our model, the detection of synchronicity between two phase-locked signals derived from sources spaced a finite distance apart on the basilar membrane can be used to extract spectral information from the spatiotemporal pattern of basilar membrane motion. Computer simulations of this process suggest an optimal spacing of about 0.3–0.4 of the wavelength of the frequency to be detected. This interval is consistent with a number of psychophysical, neurophysiological, and anatomical observations, including the results of high resolution frequency-mapping of the anteroventral cochlear nucleus which are presented here. One particular version of this model, invoking the binaurally sensitive cells of the medial superior olive as the critical detecting elements, has properties which are useful in accounting for certain complex binaural psychophysical observations.

14.
Decoding the hierarchical structure of information processing, the cortical response mechanisms, and the functional connectivity patterns involved in speech processing is a central topic in neurolinguistics. Based on the temporal order of speech information processing, this cognitive process can be divided into three stages: spectrotemporal analysis of primary acoustic signals, phonemic processing, and lexical-semantic processing. The neural mechanisms of each stage have been studied extensively and in depth, but the various theoretical models and hypotheses are difficult to integrate and complement, and a systematic review is needed. Taking the three stages of speech processing in the brain as its main thread, and with an emphasis on electrophysiological paradigms, this article reviews the current state of research on the neural basis of each stage, including cortical mapping, neural oscillation patterns, and event-related response mechanisms, with the aim of providing a reference for further research on how speech signals are processed and represented in the human brain.

15.
Auditory brainstem responses (ABRs), middle latency responses (MLRs), and slow cortical potentials (SCPs) were recorded in normal-hearing adults to trains of low-frequency acoustic signals delivered binaurally against a background of a continuous masking noise. Two stimulus conditions, labelled as binaural homophasic and binaural antiphasic paradigms, respectively, were systematically compared. In the homophasic paradigm both the signals and the masker were in phase at the two ears. In the antiphasic paradigm the signals were 180 degrees out of phase at the two ears, while the masker was in phase. The psychoacoustic release from masking in the antiphasic vs. the homophasic paradigm was regularly accompanied by an increase in amplitudes and a shortening in peak latencies of the SCPs. In contrast, no differences were evidenced between the homophasic and the antiphasic paradigms with respect to the ABRs and the MLRs. Considering the generation loci of the studied electric responses, it is concluded that the binaural psychoacoustic phenomenon, referred to as the masking level difference, is operated primarily at the cortical level.
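The two stimulus conditions compared above can be sketched directly: the masker is identical at the two ears, and the signal is either in phase (homophasic, S0N0) or inverted at one ear (antiphasic, SπN0). This is an illustrative construction only; the study's levels, ramps, and calibration are omitted.

```python
def mld_stimuli(signal, noise):
    """Build (left, right) sample pairs for the two binaural paradigms.

    Homophasic: signal and masker in phase at the two ears.
    Antiphasic: masker in phase, signal inverted at the right ear.
    """
    homophasic = ([s + n for s, n in zip(signal, noise)],
                  [s + n for s, n in zip(signal, noise)])
    antiphasic = ([s + n for s, n in zip(signal, noise)],
                  [-s + n for s, n in zip(signal, noise)])
    return homophasic, antiphasic
```

Subtracting the two ear channels cancels the in-phase masker and doubles the signal in the antiphasic condition but yields nothing in the homophasic one, a simple account of the interaural cue underlying the masking level difference.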

16.
The effect of binaural decorrelation on the processing of interaural level difference cues in the barn owl (Tyto alba) was examined behaviorally and electrophysiologically. The electrophysiology experiment measured the effect of variations in binaural correlation on the first stage of interaural level difference encoding in the central nervous system. The responses of single neurons in the posterior part of the ventral nucleus of the lateral lemniscus were recorded to stimulation with binaurally correlated and binaurally uncorrelated noise. No significant differences in interaural level difference sensitivity were found between conditions. Neurons in the posterior part of the ventral nucleus of the lateral lemniscus encode the interaural level difference of binaurally correlated and binaurally uncorrelated noise with equal accuracy and precision. This nucleus therefore supplies higher auditory centers with an undegraded interaural level difference signal for sound stimuli that lack a coherent interaural time difference. The behavioral experiment measured auditory saccades in response to interaural level differences presented in binaurally correlated and binaurally uncorrelated noise. The precision and accuracy of sound localization based on interaural level difference was reduced but not eliminated for binaurally uncorrelated signals. The observation that barn owls continue to vary auditory saccades with the interaural level difference of binaurally uncorrelated stimuli suggests that neurons that drive head saccades can be activated by incomplete auditory spatial information.

17.
The presentation of two sinusoidal tones, one to each ear, with a slight frequency mismatch yields an auditory illusion of a beating frequency equal to the frequency difference between the two tones; this is known as a binaural beat (BB). The effect of brief BB stimulation on scalp EEG has not been conclusively demonstrated. Further, no studies have examined the impact of musical training associated with BB stimulation, yet musicians' brains are often associated with enhanced auditory processing. In this study, we analysed EEG brain responses from two groups, musicians and non-musicians, when stimulated by short presentations (1 min) of binaural beats with beat frequency varying from 1 Hz to 48 Hz. We focused our analysis on alpha and gamma band EEG signals, which were analysed in terms of spectral power and functional connectivity, as measured by two phase-synchrony-based measures, phase locking value and phase lag index. Finally, these measures were used to characterize the degree of centrality, segregation and integration of the functional brain network. We found that beat frequencies belonging to the alpha band produced the most significant steady-state responses across groups. Further, processing of low-frequency (delta, theta, alpha) binaural beats had a significant impact on cortical network patterns in the alpha band oscillations. Altogether these results provide a neurophysiological account of cortical responses to BB stimulation at varying frequencies, demonstrate a modulation of cortico-cortical connectivity in musicians' brains, and further suggest a form of neuronal entrainment with both linear and nonlinear relationships to the beating frequencies.
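The stimulus itself is simple to construct: one pure tone per ear, offset by the desired beat frequency. The sketch below is a minimal generator with illustrative carrier frequencies and no level calibration or onset ramps; the perceived beat rate is |f_left − f_right|.

```python
import math

def binaural_beat(f_left, f_right, sr=8000, dur=1.0):
    """Generate the two-channel (left, right) binaural-beat stimulus:
    a pure tone of `f_left` Hz for the left ear and `f_right` Hz for
    the right, sampled at `sr` Hz for `dur` seconds."""
    n = int(sr * dur)
    left = [math.sin(2 * math.pi * f_left * t / sr) for t in range(n)]
    right = [math.sin(2 * math.pi * f_right * t / sr) for t in range(n)]
    return left, right
```

If the two channels were mixed acoustically, the trigonometric identity sin A + sin B = 2 sin((A+B)/2) cos((A−B)/2) would give an audible amplitude beat; in the binaural case each ear receives only one tone, so the beat percept arises centrally.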

18.
Normal sound localization requires precise comparisons of sound timing and pressure levels between the two ears. The primary localization cues are interaural time differences, ITD, and interaural level differences, ILD. Voltage-gated potassium channels, including Kv3.3, are highly expressed in the auditory brainstem and are thought to underlie the exquisite temporal precision and rapid spike rates that characterize brainstem binaural pathways. An autosomal dominant mutation in the gene encoding Kv3.3 has been demonstrated in a large Filipino kindred manifesting as spinocerebellar ataxia type 13 (SCA13). This kindred provides a rare opportunity to test in vivo the importance of a specific channel subunit for human hearing. Here, we demonstrate psychophysically that individuals with the mutant allele exhibit profound deficits in both ITD and ILD sensitivity, despite showing no obvious impairment in pure-tone sensitivity with either ear. Surprisingly, several individuals exhibited the auditory deficits even though they were pre-symptomatic for SCA13. We would expect that impairments of binaural processing as great as those observed in this family would result in prominent deficits in localization of sound sources and in loss of the "spatial release from masking" that aids in understanding speech in the presence of competing sounds.

19.
In recent years, extensive studies have been conducted on the diagnosis of Alzheimer's disease (AD) using non-invasive speech-signal recognition methods. In this study, Farsi speech signals were analyzed using an auditory model system (AMS) in order to recognize AD. For this purpose, after pre-processing of the speech signals and applying the AMS, 4D outputs as a function of time, frequency, rate, and scale were obtained. The AMS outputs were averaged over time to analyze the rate-frequency-scale representation for both groups, Alzheimer's and healthy control subjects. Thereafter, the maxima of the spectral and temporal modulation and frequency were extracted and classified by a support vector machine (SVM). The SVM achieved promising recognition accuracy compared with prevalent approaches in the field of speech processing. These results demonstrate the applicability of the proposed algorithm for non-invasive, low-cost recognition of Alzheimer's disease using only a few features extracted from the speech signal.
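The final classification step can be illustrated with a minimal linear SVM trained by Pegasos-style sub-gradient descent. This is a toy stand-in under assumed hyperparameters, not the study's classifier (which feeds auditory-model modulation features into an SVM, likely via a standard library implementation); the two-cluster data in the usage example are synthetic.

```python
def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Train a linear soft-margin SVM on (feature tuple, label ±1) data
    with Pegasos-style sub-gradient updates. Returns weights and bias."""
    dim = len(X[0])
    w, b, t = [0.0] * dim, 0.0, 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)                       # decaying step size
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [(1.0 - eta * lam) * wj for wj in w]    # regularisation shrink
            if margin < 1.0:                            # hinge-loss violator
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b

def predict(w, b, x):
    """Sign of the decision function: +1 or -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else -1
```

On well-separated feature clusters (standing in for the AD vs. control modulation maxima) the trained hyperplane classifies essentially all points correctly, which is the behaviour the abstract's accuracy claim rests on.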

20.
A significant fraction of newly implanted cochlear implant recipients use a hearing aid in their non-implanted ear. SCORE bimodal is a sound processing strategy developed for this configuration, aimed at normalising loudness perception and improving binaural loudness balance. Speech perception performance in quiet and noise and sound localisation ability of six bimodal listeners were measured with and without application of SCORE. Speech perception in quiet was measured either with only acoustic, only electric, or bimodal stimulation, at soft and normal conversational levels. For speech in quiet there was a significant improvement with application of SCORE. Speech perception in noise was measured for either steady-state noise, fluctuating noise, or a competing talker, at conversational levels with bimodal stimulation. For speech in noise there was no significant effect of application of SCORE. Modelling of interaural loudness differences in a long-term-average-speech-spectrum-weighted click train indicated that left-right discrimination of sound sources can improve with application of SCORE. As SCORE was found to leave speech perception unaffected or to improve it, it seems suitable for implementation in clinical devices.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号