Similar Documents
20 similar documents found (search time: 31 ms)
1.
Highly spontaneous, conversational, and potentially emotional and noisy speech is known to be a challenge for today’s automatic speech recognition (ASR) systems, which highlights the need for advanced algorithms that improve speech features and models. Histogram Equalization is an efficient method to reduce the mismatch between clean and noisy conditions by normalizing all moments of the probability distribution of the feature vector components. In this article, we propose to combine histogram equalization and multi-condition training for robust keyword detection in noisy speech. To better cope with conversational speaking styles, we show how contextual information can be effectively exploited in a multi-stream ASR framework that dynamically models context-sensitive phoneme estimates generated by a long short-term memory neural network. The proposed techniques are evaluated on the SEMAINE database—a corpus containing emotionally colored conversations with a cognitive system for “Sensitive Artificial Listening”.
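Histogram equalization generalizes mean/variance normalization: each feature dimension's empirical distribution is mapped onto a reference distribution, so that all moments, not only the first two, are matched between clean and noisy conditions. A minimal sketch of the general technique, assuming per-utterance equalization of a frames-by-dimensions feature matrix against a standard-normal target; the function name and the target choice are illustrative, not this paper's implementation:

```python
import numpy as np
from scipy.stats import norm

def histogram_equalize(features):
    """Per-dimension histogram equalization of a (frames x dims) feature
    matrix: ranks give each value's empirical CDF position, which is then
    pushed through the inverse CDF of a standard-normal reference."""
    frames, dims = features.shape
    equalized = np.empty_like(features, dtype=float)
    for d in range(dims):
        ranks = np.argsort(np.argsort(features[:, d]))
        cdf = (ranks + 0.5) / frames      # offset keeps CDF values off 0 and 1
        equalized[:, d] = norm.ppf(cdf)   # map onto the reference distribution
    return equalized
```

Because only ranks are used, any monotonic distortion of a feature dimension is undone by the mapping, which is one reason the method reduces the clean/noisy mismatch.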

2.

Background

Hearing ability is essential for normal speech development; however, the precise mechanisms linking auditory input and the improvement of speaking ability remain poorly understood. Auditory feedback during speech production is believed to play a critical role by providing the nervous system with information about speech outcomes that is used to learn and subsequently fine-tune speech motor output. Surprisingly, few studies have directly investigated such auditory-motor learning in the speech production of typically developing children.

Methodology/Principal Findings

In the present study, we manipulated auditory feedback during speech production in a group of 9- to 11-year-old children, as well as in adults. Following a period of speech practice under conditions of altered auditory feedback, compensatory changes in speech production and perception were examined. Consistent with prior studies, the adults exhibited compensatory changes in both their speech motor output and their perceptual representations of speech sound categories. The children exhibited compensatory changes in the motor domain, with a change in speech output that was similar in magnitude to that of the adults; however, the children showed no reliable compensatory effect on their perceptual representations.

Conclusions

The results indicate that 9- to 11-year-old children, whose speech motor and perceptual abilities are still not fully developed, are nonetheless capable of auditory-feedback-based sensorimotor adaptation, supporting a role for such learning processes in speech motor development. Auditory feedback may play a more limited role, however, in the fine-tuning of children's perceptual representations of speech sound categories.

3.
Selective attention is the mechanism that allows one to focus attention on a particular stimulus while filtering out a range of other stimuli, for instance on a single conversation in a noisy room. Attending to one sound source rather than another changes activity in the human auditory cortex, but it is unclear whether attention to different acoustic features, such as voice pitch and speaker location, modulates subcortical activity. Studies using a dichotic listening paradigm indicated that auditory brainstem processing may be modulated by the direction of attention. We investigated whether endogenous selective attention to one of two speech signals affects amplitude and phase locking in auditory brainstem responses when the signals were discriminable either by frequency content alone or by frequency content and spatial location. Frequency-following responses to the speech sounds were significantly modulated in both conditions. The modulation was specific to the task-relevant frequency band. The effect was stronger when both frequency and spatial information were available. Patterns of response were variable between participants and were correlated with psychophysical discriminability of the stimuli, suggesting that the modulation was biologically relevant. Our results demonstrate that auditory brainstem responses are susceptible to efferent modulation related to behavioral goals. Furthermore, they suggest that mechanisms of selective attention actively shape activity at early subcortical processing stages according to task relevance and based on frequency and spatial cues.
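Phase locking of frequency-following responses within a particular band is commonly quantified as the mean resultant length of per-trial phases (a phase-locking value). A sketch under that assumption, since the abstract does not specify the exact analysis; the band-pass/Hilbert pipeline and function names are illustrative:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def phase_locking_value(trials, fs, band):
    """trials: (n_trials, n_samples) array of brainstem-response epochs.
    Returns the across-trial phase-locking value per sample within `band`:
    1.0 means identical phase on every trial, 0.0 means random phase."""
    low, high = band
    b, a = butter(4, [low / (fs / 2), high / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, trials, axis=1)
    phases = np.angle(hilbert(filtered, axis=1))
    return np.abs(np.mean(np.exp(1j * phases), axis=0))
```

Restricting `band` to the task-relevant frequency region is what allows an attention effect like the one reported here to be tested band by band.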

4.
Motor alalia refers to a number of disorders of expressive speech that are caused by the dysfunction of cerebral structures in the period when the formation of the speech system is not complete. This form of speech disorder is considered a language disorder characterized by a persistent disturbance of the assimilation of a system of linguistic units. A possible cause of deviations in the development of speech function in children is a disproportion between the levels of development of speech structures in the left and right hemispheres, and this temporary dominance is often associated with increased activity in the right hemisphere. According to the results of electroencephalographic studies, children aged five to six years show two types of changes in the systemic interaction of bioelectrical potentials across the cerebral cortex: the disorders of the spatial organization of interregional EEG correlations are more pronounced in either the left or the right hemisphere. Thus, motor alalia can be accompanied either by disturbances in the interaction between Broca’s and Wernicke’s areas of the left hemisphere or by disturbances between symmetrical areas of the right hemisphere.

5.
《Anthrozoös》2013,26(3):373-380
ABSTRACT

Vowel triangle area is a phonetic measure of the clarity of vowel articulation. Compared with speech to adults, people hyperarticulate vowels in speech to infants and foreigners but not to pets, despite other similarities in infant- and pet-directed speech. This suggests that vowel hyperarticulation has a didactic function positively related to the actual, or even the expected, degree of linguistic competence of the audience. Parrots have some degree of linguistic competence, yet no studies have examined vowel hyperarticulation in speech to parrots. Here, we compared the speech of 11 adults to another adult, a dog, a parrot, and an infant. A significant linear increase in vowel triangle area was found across the four conditions, showing that the degree of vowel hyperarticulation increased from adult- and dog-directed speech to parrot-directed speech, then to infant-directed speech. This suggests that the degree of vowel hyperarticulation is related to the audience's actual or expected linguistic competence. The results are discussed in terms of the relative roles of speakers' expectations versus listeners' feedback in the production of vowel hyperarticulation, and suggestions for further studies, manipulating speaker expectation and listener feedback, are provided.
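Vowel triangle area is conventionally computed from the (F1, F2) coordinates of the corner vowels /i/, /a/, /u/ with the shoelace formula. A short sketch; the formant values in the usage line are generic textbook figures, not data from this study:

```python
def vowel_triangle_area(f_i, f_a, f_u):
    """Area (in Hz^2) of the triangle spanned by the corner vowels /i/, /a/,
    and /u/, each given as an (F1, F2) pair, via the shoelace formula."""
    (x1, y1), (x2, y2), (x3, y3) = f_i, f_a, f_u
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2

# Illustrative corner-vowel formants (Hz), roughly adult-directed speech
area = vowel_triangle_area(f_i=(300, 2300), f_a=(750, 1200), f_u=(350, 800))
```

A larger area means the corner vowels are spread farther apart in formant space, i.e., articulated more distinctly, which is why the measure indexes hyperarticulation.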

6.
The present article contains no ultimate truths. As indicated in the title, the author's aim is to present, on the basis of presently available data, a few hypotheses concerning so-called "inner speech" in order to provide a foundation for experimental investigations. These hypotheses have as their premise the general conception of processes of speech generation that is current in contemporary Soviet psychology, in particular in L. S. Vygotsky's school, as well as in another school of Soviet physiology of higher nervous activity associated with the name of N. A. Bernshteyn. However, these hypotheses are not, in principle, incompatible with certain other theories of verbal activity (verbal behavior); indeed, as will be evident later, they are partially based on material accumulated with the aid of these theories.

7.
In utero RNAi of the dyslexia-associated gene Kiaa0319 in rats (KIA-) degrades cortical responses to speech sounds and increases trial-by-trial variability in onset latency. We tested the hypothesis that KIA- rats would be impaired at speech sound discrimination. KIA- rats needed twice as much training in quiet conditions to perform at control levels and remained impaired at several speech tasks. Focused training using truncated speech sounds was able to normalize speech discrimination in quiet and background-noise conditions. Training also normalized trial-by-trial neural variability and temporal phase locking. Cortical activity from speech-trained KIA- rats was sufficient to accurately discriminate between similar consonant sounds. These results provide the first direct evidence that reduced expression of the dyslexia-associated gene KIAA0319 can cause phoneme-processing impairments similar to those seen in dyslexia, and that intensive behavioral therapy can eliminate these impairments.

8.
Assessing brain activity during complex voluntary motor behaviors that require the recruitment of multiple neural sites is a field of active research. Our current knowledge is primarily based on human brain imaging studies that have clear limitations in terms of temporal and spatial resolution. We developed a physiologically informed non-linear multi-compartment stochastic neural model to simulate functional brain activity coupled with neurotransmitter release during complex voluntary behavior, such as speech production. Due to its state-dependent modulation of neural firing, dopaminergic neurotransmission plays a key role in the organization of functional brain circuits controlling speech and language and thus has been incorporated in our neural population model. A rigorous mathematical proof establishing existence and uniqueness of solutions to the proposed model as well as a computationally efficient strategy to numerically approximate these solutions are presented. Simulated brain activity during the resting state and sentence production was analyzed using functional network connectivity, and graph theoretical techniques were employed to highlight differences between the two conditions. We demonstrate that our model successfully reproduces characteristic changes seen in empirical data between the resting state and speech production, and dopaminergic neurotransmission evokes pronounced changes in modeled functional connectivity by acting on the underlying biological stochastic neural model. Specifically, model and data networks in both speech and rest conditions share task-specific network features: both the simulated and empirical functional connectivity networks show an increase in nodal influence and segregation in speech over the resting state. These commonalities confirm that dopamine is a key neuromodulator of the functional connectome of speech control. Based on reproducible characteristic aspects of empirical data, we suggest a number of extensions of the proposed methodology building upon the current model.
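The "nodal influence" and "segregation" summaries used to compare conditions are standard graph measures computed on a functional-connectivity matrix, for example weighted node strength and the weighted clustering coefficient. A simplified sketch of that style of analysis; the thresholding step and the specific metric choices are assumptions, not the authors' exact pipeline:

```python
import numpy as np
import networkx as nx

def network_summary(fc, threshold=0.3):
    """fc: symmetric functional-connectivity matrix (e.g., pairwise
    correlations between regional time series). Returns mean node strength
    (influence) and mean weighted clustering (segregation) after discarding
    weak edges (one common, simplified analysis choice)."""
    adj = np.where(np.abs(fc) >= threshold, np.abs(fc), 0.0)
    np.fill_diagonal(adj, 0.0)
    g = nx.from_numpy_array(adj)
    strength = np.mean([deg for _, deg in g.degree(weight="weight")])
    clustering = np.mean(list(nx.clustering(g, weight="weight").values()))
    return strength, clustering
```

Comparing these two numbers between rest and speech connectivity matrices is the kind of check that would expose the reported increase in nodal influence and segregation during speech.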

9.
Speech perception is thought to be linked to speech motor production. This linkage is considered to mediate multimodal aspects of speech perception, such as audio-visual and audio-tactile integration. However, direct coupling between articulatory movement and auditory perception has been little studied. The present study reveals a clear dissociation between the effects of a listener’s own speech action and the effects of viewing another’s speech movements on the perception of auditory phonemes. We assessed the intelligibility of the syllables [pa], [ta], and [ka] when listeners silently and simultaneously articulated syllables that were congruent/incongruent with the syllables they heard. The intelligibility was compared with a condition where the listeners simultaneously watched another’s mouth producing congruent/incongruent syllables, but did not articulate. The intelligibility of [ta] and [ka] was degraded by articulating [ka] and [ta] respectively, which are associated with the same primary articulator (the tongue) as the heard syllables. However, it was not affected by articulating [pa], which is associated with a different primary articulator (the lips) from the heard syllables. In contrast, the intelligibility of [ta] and [ka] was degraded by watching the production of [pa]. These results indicate that the articulatory-induced distortion of speech perception occurs in an articulator-specific manner, while visually induced distortion does not. The articulator-specific nature of the auditory-motor interaction in speech perception suggests that speech motor processing directly contributes to our ability to hear speech.

10.
11.
This study investigated associations between men's facial attractiveness, perceived personality, attitudes towards children, and the quality of their child-directed (CD) speech. Sixty-three males were photographed and completed a brief questionnaire concerning their family background and attitudes towards children. They then performed a task in which they gave directions to (imaginary) adults and children. Analyses of the acoustic properties of speech produced under each condition were performed in order to determine the extent to which individual men changed their speech to accommodate a child listener (i.e., exhibited CD speech). The men's faces were rated by 59 female participants, who assessed perceived prosociality, masculinity, health, and short- and long-term attractiveness. Although women's ratings of attractiveness and prosociality were related to men's self-reported liking for children, they were negatively correlated with men's use of CD speech (i.e., less attractive men used more features of CD speech when addressing an imaginary child). These findings are discussed in the context of halo effects and strategic pluralism in male mating behaviors.

12.
It has been proposed that internal simulation of the talking face of visually-known speakers facilitates auditory speech recognition. One prediction of this view is that brain areas involved in auditory-only speech comprehension interact with visual face-movement sensitive areas, even under auditory-only listening conditions. Here, we test this hypothesis using connectivity analyses of functional magnetic resonance imaging (fMRI) data. Participants (17 normal participants, 17 developmental prosopagnosics) first learned six speakers via brief voice-face or voice-occupation training (<2 min/speaker). This was followed by an auditory-only speech recognition task and a control task (voice recognition) involving the learned speakers’ voices in the MRI scanner. As hypothesized, we found that, during speech recognition, familiarity with the speaker’s face increased the functional connectivity between the face-movement sensitive posterior superior temporal sulcus (STS) and an anterior STS region that supports auditory speech intelligibility. There was no difference between normal participants and prosopagnosics. This was expected because previous findings have shown that both groups use the face-movement sensitive STS to optimize auditory-only speech comprehension. Overall, the present findings indicate that learned visual information is integrated into the analysis of auditory-only speech and that this integration results from the interaction of task-relevant face-movement and auditory speech-sensitive areas.

13.
The article discusses the probable role of the many factors that determine the individual variety of the neurophysiological mechanisms enabling a person to learn and fluently use two or more languages. The formation of the speech function is affected both by factors common to bilinguals and monolinguals and by the specific characteristics of bilingualism. General factors include genetic and environmental influences that explain the diversity of individual options for the development of the morphofunctional organization of the speech function. Bilinguals evidently show even wider variation in the central maintenance of speech ability, owing to the combination of different conditions that shape the language environment, including the age of second-language acquisition, language proficiency, the linguistic similarity of the languages, the method of their acquisition, the intensity of use, and the area where each language is used. The influence of these factors can be mediated in different ways by the individual characteristics of the bilingual's brain. Being exposed to two languages from the first days of life, the child draws on unique features of the brain that exist only at the initial stages of postnatal ontogenesis to develop speech skills. At an older age, mastering a second language requires much more effort: in the course of maturation the brain acquires new additional possibilities but permanently loses the special "bonus" that nature gives to a small child only in the first months of life. The large individual variability in patterns of cortical activation during verbal activity in older bilinguals, compared with younger ones, allows us to assume that the brain of an older bilingual mastering a new language is forced to manipulate a large number of backup mechanisms, and this is reflected in increased variation of the cerebral processes responsible for speech functions. In addition, there is serious reason to believe that learning a second language contributes to the expansion of the functional capabilities of the brain and creates the basis for successful cognitive activity.

14.
When we speak, we provide ourselves with auditory speech input. Efficient monitoring of speech is often hypothesized to depend on matching the predicted sensory consequences of internal motor commands (the forward model) with actual sensory feedback. In this paper we tested the forward model hypothesis using functional magnetic resonance imaging. We administered an overt picture naming task in which we parametrically reduced the quality of verbal feedback by noise masking. Presentation of the same auditory input in the absence of overt speech served as a listening control condition. Our results suggest that a match between predicted and actual sensory feedback results in inhibition or cancellation of auditory activity, because speaking with normal unmasked feedback reduced activity in the auditory cortex compared to the listening control conditions. Moreover, during self-generated speech, activation in auditory cortex increased as the feedback quality of the self-generated speech decreased. We conclude that during speaking early auditory cortex is involved in matching external signals with an internally generated model or prediction of sensory consequences, the locus of which may reside in auditory or higher-order brain areas. Matching at early auditory cortex may provide a very sensitive monitoring mechanism that highlights speech production errors at very early levels of processing and may efficiently determine the self-agency of speech input.
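The comparator logic of the forward-model account can be caricatured in a few lines: auditory activity behaves like the residual between actual feedback and the internally predicted feedback, so a good match cancels the response and noise masking leaves a larger residual. A purely illustrative toy, not the study's model:

```python
import numpy as np

def monitoring_residual(predicted, actual):
    """Toy forward-model comparator: the monitored auditory response is the
    mismatch between actual feedback and the internal prediction."""
    return actual - predicted

prediction = np.ones(10)                    # internally predicted feedback
clean = monitoring_residual(prediction, np.ones(10))        # residual ~ 0
rng = np.random.default_rng(0)
masked = monitoring_residual(prediction,
                             np.ones(10) + rng.normal(0, 0.5, 10))
# np.abs(masked).mean() > np.abs(clean).mean(): degraded feedback quality
# leaves a larger residual, mirroring the increased auditory activity.
```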

15.
16.
Numerous speech processing techniques have been applied to assist hearing-impaired subjects with extreme high-frequency hearing losses who can be helped only to a limited degree by conventional hearing aids. The results of providing this class of deaf subjects with a speech encoding hearing aid, which is able to reproduce intelligible speech for their particular needs, have generally been disappointing. There are at least four problems related to bandwidth compression applied to the voiced portion of speech: (1) the problem of pitch extraction in real time; (2) pitch extraction under realistic listening conditions, i.e. when competing speech and noise sources are present; (3) an insufficient data base for successful compression of voiced speech; and (4) the introduction of undesirable spectral energies in the bandwidth-compressed signal, due to the compression process itself. Experiments seem to indicate that voiced speech segments band-limited to f = 1000 Hz, even at a loss of higher formant frequencies, are in most instances superior in intelligibility to bandwidth-compressed voiced speech segments of the same bandwidth, even if pitch can be extracted with no error. With the added complexity of real-time pitch extraction, which has to function in actual listening conditions, it is doubtful that a speech encoding hearing aid based on bandwidth compression of the voiced portion of speech could be successfully implemented. However, if bandwidth compression is applied to the unvoiced portions of speech only, the above limitations can be overcome (1).
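Problem (1), real-time pitch extraction, is classically attacked with short-frame autocorrelation. A naive single-frame sketch (illustrative only; making this robust under the competing speech and noise of problem (2) is precisely the difficulty the article points to):

```python
import numpy as np

def autocorrelation_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate F0 of one voiced frame by locating the autocorrelation peak
    within the plausible pitch-period range [1/fmax, 1/fmin] seconds."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), min(int(fs / fmin), len(ac) - 1)
    lag = lo + np.argmax(ac[lo:hi])
    return fs / lag
```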

17.
The study of the production of co-speech gestures (CSGs), i.e., meaningful hand movements that often accompany speech during everyday discourse, provides an important opportunity to investigate the integration of language, action, and memory because of the semantic overlap between gesture movements and speech content. Behavioral studies of CSGs and speech suggest that they have a common base in memory and predict that overt production of both speech and CSGs would be preceded by neural activity related to memory processes. However, to date the neural correlates and timing of CSG production are still largely unknown. In the current study, we addressed these questions with magnetoencephalography and a semantic association paradigm in which participants overtly produced speech or gesture responses that were either meaningfully related to a stimulus or not. Using spectral and beamforming analyses to investigate the neural activity preceding the responses, we found a desynchronization in the beta band (15–25 Hz), which originated 900 ms prior to the onset of speech and was localized to motor and somatosensory regions in the cortex and cerebellum, as well as right inferior frontal gyrus. Beta desynchronization is often seen as an indicator of motor processing and thus reflects motor activity related to the hand movements that gestures add to speech. Furthermore, our results show oscillations in the high gamma band (50–90 Hz), which originated 400 ms prior to speech onset and were localized to the left medial temporal lobe. High gamma oscillations have previously been found to be involved in memory processes and we thus interpret them to be related to contextual association of semantic information in memory. The results of our study show that high gamma oscillations in medial temporal cortex play an important role in the binding of information in human memory during speech and CSG production.
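Beta desynchronization of the kind reported here is conventionally measured as a relative drop in band power from a baseline window to the pre-response window. A minimal sketch using Welch power estimates; the 15–25 Hz band follows the abstract, while the window handling and function names are assumptions:

```python
import numpy as np
from scipy.signal import welch

def band_power(signal, fs, band):
    """Mean power spectral density within `band` (Hz), via Welch's method."""
    freqs, psd = welch(signal, fs=fs, nperseg=min(len(signal), 1024))
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return psd[mask].mean()

def relative_power_change(baseline, window, fs, band=(15.0, 25.0)):
    """Relative band-power change from baseline to the analysis window;
    negative values indicate desynchronization (ERD)."""
    p_base = band_power(baseline, fs, band)
    return (band_power(window, fs, band) - p_base) / p_base
```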

18.
The extent of research on children's speech in general and on disordered speech specifically is very limited. In this article, we describe the process of creating databases of children's speech, and the possibilities for using such databases, which have been created by the LANNA research group in the Faculty of Electrical Engineering at Czech Technical University in Prague. These databases have been compiled principally for medical research but also for use in other areas, such as linguistics. Two databases were recorded: one of healthy children's speech (recorded in kindergarten and in the first level of elementary school) and the other of pathological speech of children with a Specific Language Impairment (recorded at speech and language therapists' surgeries and at the hospital). Both databases were subdivided according to the specific demands of medical research. Their utilization can extend beyond medicine, specifically to linguistic research and pedagogical use, as well as to studies of speech-signal processing.

19.
Inferences on the evolution of human speech based on anatomical data must take into account its physiology, acoustics and perception. Human speech is generated by the supralaryngeal vocal tract (SVT) acting as an acoustic filter on noise sources generated by turbulent airflow and quasi-periodic phonation generated by the activity of the larynx. The formant frequencies, which are major determinants of phonetic quality, are the frequencies at which relative energy maxima will pass through the SVT filter. Neither the articulatory gestures of the tongue nor their acoustic consequences can be fractionated into oral and pharyngeal cavity components. Moreover, the acoustic cues that specify individual consonants and vowels are “encoded”, i.e., melded together. Formant frequency encoding makes human speech a vehicle for rapid vocal communication. Non-human primates lack the anatomy that enables modern humans to produce sounds that enhance this process, as well as the neural mechanisms necessary for the voluntary control of speech articulation. The specific claims of Duchin (1990) are discussed.
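The source-filter picture sketched here (the SVT acting as an acoustic filter whose resonances are the formants) underlies standard linear-prediction formant estimation: model the tract as an all-pole filter and read formant frequencies off the pole angles. A textbook sketch, not the article's method:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_formants(frame, fs, order=12):
    """Estimate formants of one speech frame by fitting an all-pole (LPC)
    model and converting pole angles to frequencies."""
    frame = frame * np.hamming(len(frame))
    # Autocorrelation method: solve the LPC normal equations R a = r
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = solve_toeplitz(ac[:order], ac[1:order + 1])
    poles = np.roots(np.concatenate(([1.0], -a)))
    poles = poles[np.imag(poles) > 0]        # one pole of each conjugate pair
    freqs = sorted(np.angle(poles) * fs / (2 * np.pi))
    return [f for f in freqs if f > 90.0]    # discard near-DC poles
```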

20.
Speakers modulate their voice when talking to infants, but we know little about subtle variation in acoustic parameters during speech in adult social interactions. Because tests of perception of such variation are hampered by listeners' understanding of semantic content, studies often confine speech to the enunciation of standard sentences, restricting ecological validity. Furthermore, apparent paralinguistic modulation in one language may be underpinned by specific parameters of that language. Here we circumvent these problems by recording speech directed to attractive or unattractive potential partners or competitors, and testing responses to these recordings by naive listeners, across both a Germanic (English) and a Slavic (Czech) language. Analysis of acoustic parameters indicates that men's voices varied F0 most in speech towards attractive versus unattractive potential mates, while modulation of women's F0 variability was more sensitive to competitors, with higher variability when those competitors were relatively attractive. There was striking similarity in patterns of social context-dependent F0 variation across the two model languages, with both men's and women's voices varying most when responding to attractive individuals. Men's minimum pitch was lower when responding to attractive than to unattractive women. For vocal modulation to be effective, however, it must be sufficiently detectable to promote proceptivity towards the speaker. We showed that speech directed towards attractive individuals was preferred by naive listeners of either language over speech by the same speaker to unattractive individuals, even when voices were stripped of several acoustic properties by low-pass filtering, which renders speech unintelligible. Our results suggest that modulating F0 may be a critical parameter in human courtship, independently of semantic content.
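The low-pass filtering used to render speech unintelligible while keeping prosody audible, and the F0-variability summaries analyzed here, can both be sketched simply; the 400 Hz cutoff and the function names below are illustrative assumptions, not the study's reported parameters:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def delexicalize(speech, fs, cutoff=400.0):
    """Low-pass filter a recording so F0 movement and gross prosody survive
    but segmental (semantic) content becomes unintelligible."""
    sos = butter(8, cutoff / (fs / 2), btype="low", output="sos")
    return sosfiltfilt(sos, speech)

def f0_variability(f0_track):
    """Simple F0-variability summary: standard deviation over voiced frames
    (unvoiced frames marked as NaN are ignored)."""
    voiced = f0_track[~np.isnan(f0_track)]
    return float(np.std(voiced))
```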
