Similar Literature
20 similar records found
1.
Reaction time and recognition accuracy for emotional speech intonations were studied in 49 adults aged 20-79 years, using short meaningless words that differed in only one phoneme, presented with and without background noise. The results were compared with the same measures for emotional intonations in meaningful speech utterances under similar conditions. Perception of emotional intonations at the two linguistic levels (phonological and lexico-semantic) showed both common features and certain peculiarities. The characteristics of emotion recognition that depended on the gender and age of the listeners were invariant with respect to the linguistic level of the speech stimuli. The phonemic composition of the pseudowords influenced emotional perception, especially against background noise. Under both experimental conditions, i.e., with and without background noise, the acoustic characteristic of the stimuli most responsible for the perception of emotional speech prosody in short meaningless words was the variation of the fundamental frequency.

2.
Pell MD, Kotz SA. PLoS ONE. 2011;6(11):e27256.
How quickly do listeners recognize emotions from a speaker's voice, and does the time course for recognition vary by emotion type? To address these questions, we adapted the auditory gating paradigm to estimate how much vocal information is needed for listeners to categorize five basic emotions (anger, disgust, fear, sadness, happiness) and neutral utterances produced by male and female speakers of English. Semantically-anomalous pseudo-utterances (e.g., The rivix jolled the silling) conveying each emotion were divided into seven gate intervals according to the number of syllables that listeners heard from sentence onset. Participants (n = 48) judged the emotional meaning of stimuli presented at each gate duration interval, in a successive, blocked presentation format. Analyses looked at how recognition of each emotion evolves as an utterance unfolds and estimated the "identification point" for each emotion. Results showed that anger, sadness, fear, and neutral expressions are recognized more accurately at short gate intervals than happiness, and particularly disgust; however, as speech unfolds, recognition of happiness improves significantly towards the end of the utterance (and fear is recognized more accurately than other emotions). When the gate associated with the emotion identification point of each stimulus was calculated, data indicated that fear (M = 517 ms), sadness (M = 576 ms), and neutral (M = 510 ms) expressions were identified from shorter acoustic events than the other emotions. These data reveal differences in the underlying time course for conscious recognition of basic emotions from vocal expressions, which should be accounted for in studies of emotional speech processing.
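As a side note on the method, the "emotion identification point" in gating studies is commonly taken as the earliest gate from which the listener's response is correct and remains correct at every later gate; the exact criterion used by Pell and Kotz is not restated here, so the sketch below is only an illustration of that common definition, with hypothetical gate durations and responses.

```python
def identification_point(gate_durations_ms, responses, target):
    """Return the duration of the earliest gate at which the listener's
    response equals the target emotion and stays correct at every later
    gate; None if the item is never consistently identified."""
    for i in range(len(responses)):
        if all(r == target for r in responses[i:]):
            return gate_durations_ms[i]
    return None

# Hypothetical trial: a 'fear' pseudo-utterance judged at seven gates
gates = [200, 400, 600, 800, 1000, 1200, 1400]   # ms, illustrative values only
resp = ["neutral", "fear", "sadness", "fear", "fear", "fear", "fear"]
print(identification_point(gates, resp, "fear"))  # -> 800
```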

3.

Background

Time-compressed speech, a form of rapidly presented speech, is harder to comprehend than natural speech, especially for non-native speakers. Although it is possible to adapt to time-compressed speech after a brief exposure, it is not known whether additional perceptual learning occurs with further practice. Here, we ask whether multiday training on time-compressed speech yields more learning than that observed during the initial adaptation phase and whether the pattern of generalization following successful learning is different than that observed with initial adaptation only.

Methodology/Principal Findings

Two groups of non-native Hebrew speakers were tested on five different conditions of time-compressed speech identification in two assessments conducted 10–14 days apart. Between those assessments, one group of listeners received five practice sessions on one of the time-compressed conditions. Between the two assessments, trained listeners improved significantly more than untrained listeners on the trained condition. Furthermore, the trained group generalized its learning to two untrained conditions in which different talkers presented the trained speech materials. In addition, when the performance of the non-native speakers was compared to that of a group of naïve native Hebrew speakers, performance of the trained group was equivalent to that of the native speakers on all conditions on which learning occurred, whereas performance of the untrained non-native listeners was substantially poorer.

Conclusions/Significance

Multiday training on time-compressed speech results in significantly more perceptual learning than brief adaptation. Compared to previous studies of adaptation, the training-induced learning is more stimulus-specific. Taken together, the perceptual learning of time-compressed speech appears to progress from an initial, rapid adaptation phase to a subsequent prolonged and more stimulus-specific phase. These findings are consistent with the predictions of the Reverse Hierarchy Theory of perceptual learning and suggest constraints on the use of perceptual-learning regimens during second-language acquisition.

4.
Cerebral mechanisms of musical abilities were explored in musically gifted children. For this purpose, psychophysiological characteristics of the perception of emotional speech information were studied experimentally in samples of gifted and ordinary children. Forty-six schoolchildren and forty-eight musicians of three age groups (7-10, 11-13 and 14-17 years old) participated in the study. In the experimental session, a test sentence was presented to a subject through headphones with two emotional intonations (joy and anger) and without emotional expression; the subject had to recognize the type of emotion, and his/her answers were recorded. Analysis of variance revealed age- and gender-related features of emotion recognition: boy musicians were 4-6 years ahead of schoolchildren of the same age in the development of the mechanisms of emotion recognition, whereas girl musicians were 1-3 years ahead. In girls, musical education induced a shift of the activity predominant in emotional perception toward the left hemisphere; in boys, on the contrary, the initially distinct dominance of the left hemisphere was not retained in the course of further education.

5.
Relations between the brain hemispheres were studied during human perception of various types of Russian intonations. Fifty healthy subjects with normal hearing took part in tests based on monaural presentation of stimuli: sentences representing the main kinds of Russian emotional and linguistic intonations. The linguistic intonations expressed various communicative types of sentences; completeness or incompleteness of a statement; various types of syntagmatic segmentation of the statements; and various logical stresses. Sentences that required identification of the emotion quality were used to study the perception of emotional intonations. Statistical analysis of the latencies and of the errors made by the subjects demonstrated a significant preference of the right hemisphere in perceiving emotional intonations and complete/incomplete sentences, whereas sentences with different logical stress were perceived mainly by the left hemisphere. No significant differences were found in the perception of the various communicative types of sentences or of statements with different syntagmatic segmentation. The data also indicate a difference between males and females in the degree to which the hemispheres are involved in the perception and analysis of the prosodic characteristics of speech.

6.
Vélez A, Bee MA. Animal Behaviour. 2011;(6):1319-1327.
Dip listening refers to our ability to catch brief "acoustic glimpses" of speech and other sounds when fluctuating background noise levels momentarily decrease. Exploiting dips in natural fluctuations of noise contributes to our ability to overcome the "cocktail party problem" of understanding speech in multi-talker social environments. We presently know little about how nonhuman animals solve analogous communication problems. Here, we asked whether female grey treefrogs (Hyla chrysoscelis) might benefit from dip listening in selecting a mate in the noisy social setting of a breeding chorus. Consistent with a dip listening hypothesis, subjects recognized conspecific calls at lower thresholds when the dips in a chorus-like noise masker were long enough to allow glimpses of nine or more consecutive pulses. No benefits of dip listening were observed when dips were shorter and included five or fewer pulses. Recognition thresholds were higher when the noise fluctuated at a rate similar to the pulse rate of the call. In a second experiment, advertisement calls comprising six to nine pulses were necessary to elicit responses under quiet conditions. Together, these results suggest that in frogs, the benefits of dip listening are constrained by neural mechanisms underlying temporal pattern recognition. These constraints have important implications for the evolution of male signalling strategies in noisy social environments.

7.
We explored how experimentally induced psychological stress affects the production and recognition of vocal emotions. In Study 1a, we demonstrate that sentences spoken by stressed speakers are judged by naïve listeners as sounding more stressed than sentences uttered by non-stressed speakers. In Study 1b, negative emotions produced by stressed speakers are generally less well recognized than the same emotions produced by non-stressed speakers. Multiple mediation analyses suggest this poorer recognition of negative stimuli was due to a mismatch between the variation of volume voiced by speakers and the range of volume expected by listeners. Together, this suggests that the stress level of the speaker affects judgments made by the receiver. In Study 2, we demonstrate that participants who were induced with a feeling of stress before carrying out an emotional prosody recognition task performed worse than non-stressed participants. Overall, findings suggest detrimental effects of induced stress on interpersonal sensitivity.

8.
Extensive research shows that inter-talker variability (i.e., changing the talker) affects recognition memory for speech signals. However, relatively little is known about the consequences of intra-talker variability (i.e., changes in speaking style within a talker) on the encoding of speech signals in memory. It is well established that speakers can modulate the characteristics of their own speech and produce a listener-oriented, intelligibility-enhancing speaking style in response to communication demands (e.g., when speaking to listeners with hearing impairment or non-native speakers of the language). Here we conducted two experiments to examine the role of speaking style variation in spoken language processing. First, we examined the extent to which clear speech provided benefits in challenging listening environments (i.e., speech-in-noise). Second, we compared recognition memory for sentences produced in conversational and clear speaking styles. In both experiments, semantically normal and anomalous sentences were included to investigate the role of higher-level linguistic information in the processing of speaking style variability. The results show that acoustic-phonetic modifications implemented in listener-oriented speech lead to improved speech recognition in challenging listening conditions and, crucially, to a substantial enhancement in recognition memory for sentences.

9.
This study investigated how speech recognition in noise is affected by language proficiency in individual non-native speakers. The recognition of English and Chinese sentences was measured as a function of the signal-to-noise ratio (SNR) in sixty native Chinese speakers who had never lived in an English-speaking environment. The recognition score for speech in quiet (which varied from 15% to 92%) was found to be uncorrelated with the speech recognition threshold (SRT_Q/2), i.e., the SNR at which the recognition score drops to 50% of the score in quiet. This result demonstrates separable contributions of language proficiency and auditory processing to speech recognition in noise.
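To make the threshold definition concrete, the sketch below interpolates SRT_Q/2 (the SNR at which the score falls to half the quiet-condition score) from a score-versus-SNR curve. The data points are hypothetical and not taken from the study.

```python
import numpy as np

def srt_q2(snr_db, scores, quiet_score):
    """Interpolate SRT_Q/2: the SNR (dB) at which the recognition score
    drops to 50% of the score measured in quiet.

    snr_db      -- tested SNRs in ascending order (dB)
    scores      -- recognition scores (%) at those SNRs (monotonically rising)
    quiet_score -- recognition score (%) for speech in quiet
    """
    target = quiet_score / 2.0
    # Linear interpolation of SNR as a function of score.
    return float(np.interp(target, scores, snr_db))

# Hypothetical listener (values are illustrative only):
snr_db = [-12, -8, -4, 0, 4]      # dB
scores = [5, 20, 45, 70, 82]      # percent correct
print(srt_q2(snr_db, scores, quiet_score=90.0))   # -> -4.0 dB
```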

10.
Vocalizations are among the diverse cues that animals use to recognize individual conspecifics. For some calls, such as noisy screams, there is debate over whether such recognition occurs. To test recognition of rhesus macaque noisy screams, recorded calls were played back to unrelated and related conspecific group members as either single calls or short bouts. Higher-ranking, but not lower-ranking, monkeys looked longer toward the playback speaker in trials containing screams from kin than in those composed of screams from nonkin. In a second study, human listeners performed a "same/different" discrimination task between presentations of rhesus screams from either the same or two different monkeys. Listeners discriminated between "same" and "different" callers above an established empirical threshold, whether screams were presented singly or in short bouts. Together, these results suggest that rhesus monkeys can distinguish noisy screams between kin and nonkin, and humans are able to discriminate different individuals' noisy screams, even when the duration of the bout is short. Whether noisy screams are ideally designed signals for individual recognition is discussed with respect to possible evolutionary origins of the calls.

11.
In order to explore how children adapt to the school environment, psychophysiological characteristics of the perception of emotional speech information and school progress were studied experimentally. Forty-six schoolchildren of three age groups (7-10, 11-13, and 14-17 years old) participated in the study. In the experimental session, a test sentence was presented to a subject through headphones with two emotional intonations (joy and anger) and without emotional expression; the subject had to recognize the type of emotion, and his/her answers were recorded. School progress was measured by the year grades in Russian, a foreign language, and mathematics. Analysis of variance and linear regression analysis showed that the ontogenetic features of the correlation between the psychophysiological mechanisms of emotion recognition and school progress were gender- and subject-dependent. The correlation was stronger in 7-13-year-old children than in older children, and this age boundary was passed by girls earlier than by boys.

12.
The effect of the temporal structure of a vocal stimulus on the perception of the emotional component of the signal was studied in several age groups (7–10, 11–13, and 14–17 years). The experiments were performed at different durations of the stimulus (0.5, 1, 1.5, 2, and 3 s). ANOVA of the recognition efficiency and response time showed that the stimulus duration and the interactions of this factor with two others (stimulus duration × age and stimulus duration × emotion type) were highly significant for the recognition of emotions. The effects of the temporal structure of the signal on the recognition efficiency and response time were strongest for neutral and negative emotional intonations and in the transition from the youngest to the middle age group. The minimal stimulus duration at which threshold recognition of the emotion type occurred changed with age (from 2 s in the youngest age group to 0.5 s in the oldest). The capacity of the sensory acoustic memory was evaluated in children and adolescents of different ages.

13.
The frequency-amplitude characteristics of the brain electrical activity were studied in two groups of subjects: (1) with high and (2) with low indexes of the "emotional ear" (the ability to successfully recognize emotions in speech). Comparison of the EEG power characteristics between the two groups led to the conclusion that the persons with lower "emotional ear" indexes had a much higher EEG activation level than the persons with higher "emotional ear" indexes. The dynamics of cortical activation during emotion recognition, as reflected in the alpha-rhythm amplitude, also differed between the groups: the persons with higher recognition indexes had a higher alpha-rhythm amplitude, whereas in the persons who were less successful in recognizing speech emotions the amplitude in the alpha band decreased over the course of the experiment.

14.

Background

Improvement of the cochlear implant (CI) front-end signal acquisition is needed to increase speech recognition in noisy environments. To suppress the directional noise, we introduce a speech-enhancement algorithm based on microphone array beamforming and spectral estimation. The experimental results indicate that this method is robust to directional mobile noise and strongly enhances the desired speech, thereby improving the performance of CI devices in a noisy environment.

Methods

The spectrum estimation and array beamforming methods were combined to suppress the ambient noise. The directivity coefficient was estimated in the noise-only intervals and updated to adapt to the moving noise.

Results

The proposed algorithm was implemented within the CI speech-processing strategy. In the actual implementation, a maxflat filter was used to obtain fractional sampling points, and a cepstral method was used to distinguish speech frames from noise frames. Broadband adjustment coefficients were added to compensate for the energy loss in the low-frequency band.

Discussion

The approximation of the directivity coefficient is tested and the errors are discussed. We also analyze the algorithm's constraints on noise estimation and distortion in CI processing. The performance of the proposed algorithm is analyzed and compared with other prevalent methods.

Conclusions

A hardware platform was constructed for the experiments. The speech-enhancement results showed that the algorithm suppresses non-stationary noise while achieving a high SNR. Excellent performance of the proposed algorithm was obtained in the speech-enhancement experiments and mobile testing, and the signal-distortion results indicate that the algorithm is robust, with high SNR improvement and low speech distortion.
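As a rough, generic illustration of the kind of front end sketched in this abstract, the code below combines a two-microphone delay-and-sum beamformer with spectral subtraction using a noise spectrum estimated from an assumed noise-only interval. It is a simplified stand-in, not the authors' algorithm (which additionally uses a maxflat fractional-delay filter, cepstral speech/noise frame classification and broadband compensation); integer steering delays and a fixed noise estimate are assumptions made here for brevity.

```python
import numpy as np
from scipy.signal import stft, istft

def delay_and_sum(mics, delays):
    """Align microphone signals toward a known direction and average them.
    mics   -- (n_mics, n_samples) array
    delays -- per-microphone steering delays in whole samples (the paper
              instead uses a maxflat filter for fractional delays)."""
    aligned = [np.roll(m, -d) for m, d in zip(mics, delays)]
    return np.mean(aligned, axis=0)

def spectral_subtraction(x, fs, noise_seconds=0.5, floor=0.05):
    """Subtract a noise magnitude spectrum estimated from an assumed
    noise-only interval at the start of the signal."""
    f, t, X = stft(x, fs=fs, nperseg=512)
    n_noise = np.searchsorted(t, noise_seconds)
    noise_mag = np.abs(X[:, :n_noise]).mean(axis=1, keepdims=True)
    mag = np.maximum(np.abs(X) - noise_mag, floor * np.abs(X))
    _, y = istft(mag * np.exp(1j * np.angle(X)), fs=fs, nperseg=512)
    return y

# Synthetic two-microphone example (16 kHz, first 0.5 s is noise only):
fs = 16000
rng = np.random.default_rng(0)
tone = np.concatenate([np.zeros(fs // 2),
                       np.sin(2 * np.pi * 440 * np.arange(fs) / fs)])
mics = np.stack([tone + 0.3 * rng.standard_normal(tone.size) for _ in range(2)])
enhanced = spectral_subtraction(delay_and_sum(mics, [0, 0]), fs)
```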

15.
We systematically determined which spectrotemporal modulations in speech are necessary for comprehension by human listeners. Speech comprehension has been shown to be robust to spectral and temporal degradations, but the specific relevance of particular degradations is arguable due to the complexity of the joint spectral and temporal information in the speech signal. We applied a novel modulation filtering technique to recorded sentences to restrict acoustic information quantitatively and to obtain a joint spectrotemporal modulation transfer function for speech comprehension, the speech MTF. For American English, the speech MTF showed the criticality of low modulation frequencies in both time and frequency. Comprehension was significantly impaired when temporal modulations <12 Hz or spectral modulations <4 cycles/kHz were removed. More specifically, the MTF was bandpass in temporal modulations and low-pass in spectral modulations: temporal modulations from 1 to 7 Hz and spectral modulations <1 cycles/kHz were the most important. We evaluated the importance of spectrotemporal modulations for vocal gender identification and found a different region of interest: removing spectral modulations between 3 and 7 cycles/kHz significantly increases gender misidentifications of female speakers. The determination of the speech MTF furnishes an additional method for producing speech signals with reduced bandwidth but high intelligibility. Such compression could be used for audio applications such as file compression or noise removal and for clinical applications such as signal processing for cochlear implants.
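For readers who want to see what removing spectrotemporal modulations can look like in practice, the sketch below masks the 2-D Fourier transform of a log-magnitude spectrogram, keeping only temporal modulations below 12 Hz and spectral modulations below 4 cycles/kHz (the cut-offs quoted in the abstract). It is a crude stand-in for the authors' modulation-filtering technique, not a reproduction of it.

```python
import numpy as np
from scipy.signal import stft, istft

def modulation_lowpass(x, fs, t_cut_hz=12.0, s_cut_cyc_per_khz=4.0):
    """Low-pass filter the temporal and spectral modulations of a signal
    by masking the 2-D FFT of its log-magnitude spectrogram, then
    resynthesize with the original phase."""
    nperseg, hop = 512, 128
    f, t, X = stft(x, fs=fs, nperseg=nperseg, noverlap=nperseg - hop)
    logmag = np.log(np.abs(X) + 1e-9)

    # Modulation axes: temporal modulations in Hz, spectral in cycles/kHz.
    frame_rate = fs / hop                                          # frames per second
    wt = np.fft.fftfreq(logmag.shape[1], d=1.0 / frame_rate)       # Hz
    ws = np.fft.fftfreq(logmag.shape[0], d=(f[1] - f[0]) / 1000.0) # cycles/kHz

    mask = (np.abs(ws)[:, None] <= s_cut_cyc_per_khz) & \
           (np.abs(wt)[None, :] <= t_cut_hz)
    filtered = np.real(np.fft.ifft2(np.fft.fft2(logmag) * mask))

    Y = np.exp(filtered) * np.exp(1j * np.angle(X))
    _, y = istft(Y, fs=fs, nperseg=nperseg, noverlap=nperseg - hop)
    return y
```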

16.
The nonverbal component of human speech contains some information about the speaker himself, which, for example, enables listeners to recognize speakers by their voice. Here it was examined to what extent the speaker's body size and shape are betrayed by his speech signal and thus can be recognized by listeners. Contrary to earlier constitutional studies, only size, and not shape, correlates with acoustic parameters of speech; comparing the listening experiments with the acoustic analysis gives some evidence that listeners use the average sound spectrum to judge the speaker's body size.

17.

Background and objective

There has been growing interest in the objective assessment of speech in dysphonic patients for classifying the type and severity of voice pathologies using automatic speech recognition (ASR). The aim of this work was to study the accuracy of a conventional ASR system (with a front end based on Mel-frequency cepstral coefficients (MFCCs) and a back end based on hidden Markov models (HMMs)) in recognizing the speech characteristics of people with pathological voice.

Materials and methods

Speech samples from 62 dysphonic patients with six different types of voice disorders and from 50 normal subjects were analyzed. Spoken Arabic digits were used as the input. The distribution of the first four formants of the vowel /a/ was extracted to examine the deviation of the formants from normal.

Results

Recognition accuracy was 100% for Arabic digits spoken by normal speakers. However, there was a significant loss of accuracy for digits spoken by the voice-disordered subjects. Moreover, no significant improvement in ASR performance was achieved when a subset of the individuals with disordered voices was assessed after treatment.

Conclusion

The results of this study revealed that the current ASR technique is not a reliable tool for recognizing the speech of dysphonic patients.
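The front end mentioned above (MFCC features feeding an HMM back end) is the standard pipeline in this kind of work. As an illustration only, the sketch below extracts MFCCs with delta and delta-delta coefficients using the librosa library; the library choice, parameter values and file name are our assumptions, not the toolchain reported in the study.

```python
import numpy as np
import librosa

def mfcc_features(wav_path, n_mfcc=13):
    """Extract MFCCs plus delta and delta-delta coefficients, the typical
    front end of an HMM-based recognizer (25 ms windows, 10 ms hop)."""
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=400, hop_length=160)
    delta = librosa.feature.delta(mfcc)
    delta2 = librosa.feature.delta(mfcc, order=2)
    return np.vstack([mfcc, delta, delta2]).T   # shape: (frames, 3 * n_mfcc)

# feats = mfcc_features("arabic_digit_utterance.wav")  # hypothetical file name
```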

18.
Experimental and theoretical work on the perception of emotions in speech is reviewed. The main approaches to experimental study and the different types of stimulation are considered. Clinical research and experiments on healthy subjects investigate the brain organization of emotional speech recognition. In the works by Rusalova and Kislova, integral psychophysiological preconditions for successful recognition of emotional expression in speech were studied. As a result of the investigation, extreme groups of persons were identified: those with high indexes of "emotional hearing" and those with a low level of emotion recognition. The EEG analysis included a comparison of different EEG parameters between the two groups: EEG power values, the dominating frequencies, the percentage of different EEG bands in the total EEG power, coherence, and values of inter- and intra-hemispheric EEG asymmetry. The subjects with low identification rates showed higher brain activation and reactivity both during the emotion identification task and at rest compared to the subjects with high identification rates. The data obtained reveal specific activation within the left frontal regions, as well as the right posterior temporal cortex, during nonverbal recognition of emotions.

19.
Probability of detection and accuracy of distance estimates in aural avian surveys may be affected by the presence of anthropogenic noise, and this may lead to inaccurate evaluations of the effects of noisy infrastructure on wildlife. We used arrays of speakers broadcasting recordings of grassland bird songs and pure tones to assess the probability of detection, and localization accuracy, by observers at sites with and without noisy oil and gas infrastructure in south-central Alberta from 2012 to 2014. Probability of detection varied with species and with speaker distance from transect line, but there were few effects of noisy infrastructure. Accuracy of distance estimates for songs and tones decreased as distance to observer increased, and distance estimation error was higher for tones at sites with infrastructure noise. Our results suggest that quiet to moderately loud anthropogenic noise may not mask detection of bird songs; however, errors in distance estimates during aural surveys may lead to inaccurate estimates of avian densities calculated using distance sampling. We recommend caution when applying distance sampling if most birds are unseen, and where ambient noise varies among treatments.

20.
Highly spontaneous, conversational, and potentially emotional and noisy speech is known to be a challenge for today's automatic speech recognition (ASR) systems, which highlights the need for advanced algorithms that improve speech features and models. Histogram Equalization is an efficient method to reduce the mismatch between clean and noisy conditions by normalizing all moments of the probability distribution of the feature vector components. In this article, we propose to combine histogram equalization and multi-condition training for robust keyword detection in noisy speech. To better cope with conversational speaking styles, we show how contextual information can be effectively exploited in a multi-stream ASR framework that dynamically models context-sensitive phoneme estimates generated by a long short-term memory neural network. The proposed techniques are evaluated on the SEMAINE database, a corpus containing emotionally colored conversations with a cognitive system for "Sensitive Artificial Listening".
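Histogram equalization of feature vectors, as used above, maps each feature dimension through its empirical CDF onto a reference distribution, commonly a standard Gaussian. The sketch below is a generic rank-based version of that idea under the Gaussian-reference assumption, not the implementation used in the article.

```python
import numpy as np
from scipy.stats import norm

def histogram_equalize(features):
    """Map every feature dimension onto a standard normal reference
    distribution via its empirical CDF (rank-based equalization).

    features -- (n_frames, n_dims) array, e.g. MFCC vectors
    returns  -- equalized array of the same shape"""
    n = features.shape[0]
    ranks = features.argsort(axis=0).argsort(axis=0)   # 0..n-1 within each column
    empirical_cdf = (ranks + 0.5) / n                  # keep strictly inside (0, 1)
    return norm.ppf(empirical_cdf)                     # inverse Gaussian CDF

# Illustrative use on skewed random "noisy" features:
rng = np.random.default_rng(1)
noisy = rng.gamma(shape=2.0, scale=1.0, size=(1000, 13))
equalized = histogram_equalize(noisy)                  # ~zero mean, unit variance
```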
