Similar Articles
 20 similar articles found (search time: 15 ms)
1.
Aim: The aim of this contribution is to present the formant chart of the Czech vowels a, e, i, o, u and to show that this can be achieved by means of digital sound-processing methods. Method: A group of 35 Czech students of the Pedagogical Faculty of Palacky University was tested, and a recording of whispered vowels was taken from each of them. The recording was digitized and processed with the Discrete Fourier Transform. The result is the power spectrum of each vowel; the graphic output is a plot of the relative power of the individual frequencies in the original sound. The values of the first two maxima, which represent the first and second formants, were determined from the graph and plotted on a formant chart. Results: Altogether, 175 spectral analyses of individual vowels were performed. In the resulting power spectra, the first and second formant frequencies were identified. The first formant was plotted against the second, and pure vowel formant regions were identified. Conclusion: The frequency bands for the Czech vowel "a" were 850–1150 Hz for the first formant (F1) and 1200–2000 Hz for the second formant (F2). Similarly, the bands for vowel "e" were 700–950 Hz (F1) and 1700–3000 Hz (F2); for vowel "i", 300–450 Hz (F1) and 2000–3600 Hz (F2); for vowel "o", 600–800 Hz (F1) and 600–1400 Hz (F2); and for vowel "u", 100–400 Hz (F1) and 400–1200 Hz (F2). Discussion: At low frequencies it is feasible to invoke the source-filter model of voice production and associate vowel identity with the frequencies of the first two formants in the voice spectrum. On the other hand, under intonation, singing, or other forms of exposed voice (such as emotional or focused speech), the formant regions tend to spread: other frequencies dominate the spectral analysis, so specific formant frequency bands are not easily recognizable.
Although the resulting formant map does not differ much from Peterson's formant map, it carries basic information about the specific Czech vowels. The results may be used in further research and in education.
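The peak-picking step described above can be sketched in a few lines: compute the DFT power spectrum of a windowed recording and take the two strongest low-frequency spectral maxima as F1 and F2. This is a simplified illustration of the idea, not the authors' exact processing chain (a real analysis would use whispered recordings and likely smooth the spectrum first); the synthetic two-resonance signal is purely hypothetical.

```python
import numpy as np

def formants_from_whisper(signal, fs, max_formant_hz=4000):
    """Estimate F1 and F2 as the two strongest local maxima of the DFT
    power spectrum below max_formant_hz (simplified sketch)."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal)))) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    keep = freqs <= max_formant_hz
    spectrum, freqs = spectrum[keep], freqs[keep]
    # local maxima of the power spectrum
    peaks = [i for i in range(1, len(spectrum) - 1)
             if spectrum[i] > spectrum[i - 1] and spectrum[i] >= spectrum[i + 1]]
    # take the two strongest peaks, report them in ascending frequency order
    top2 = sorted(sorted(peaks, key=lambda i: spectrum[i], reverse=True)[:2])
    return freqs[top2[0]], freqs[top2[1]]

# synthetic stand-in for a whispered vowel: two resonances at 800 and 1500 Hz
fs = 16000
t = np.arange(0, 0.2, 1.0 / fs)
sig = np.sin(2 * np.pi * 800 * t) + 0.8 * np.sin(2 * np.pi * 1500 * t)
f1, f2 = formants_from_whisper(sig, fs)
```

With a 0.2 s window at 16 kHz the bin spacing is 5 Hz, so the recovered peaks land on 800 and 1500 Hz exactly.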

2.
In previous research, acoustic characteristics of the male voice have been shown to signal various aspects of mate quality and threat potential. But the human voice is also a medium of linguistic communication. The present study explores whether physical and vocal indicators of male mate quality and threat potential are linked to effective communicative behaviors such as vowel differentiation and use of more salient phonetic variants of consonants. We show that physical and vocal indicators of male threat potential, height and formant position, are negatively linked to vowel space size, and that height and levels of circulating testosterone are negatively linked to the use of the aspirated variant of the alveolar stop consonant /t/. Thus, taller, more masculine men display less clarity in their speech and prefer phonetic variants that may be associated with masculine attributes such as toughness. These findings suggest that vocal signals of men’s mate quality and/or dominance are not confined to the realm of voice acoustics but extend to other aspects of communicative behavior, even if this means a trade-off with speech patterns that are considered communicatively advantageous, such as clarity and indexical cues to higher social class.
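The "vowel space size" measure invoked above is commonly operationalized as the area of the polygon spanned by a speaker's mean formant values. A minimal sketch using the shoelace formula, with hypothetical (F2, F1) values for the corner vowels /i/, /a/, /u/ (the study's exact metric may differ):

```python
def vowel_space_area(points):
    """Shoelace formula for the area of the polygon whose vertices are a
    speaker's mean (F2, F1) measurements, in Hz^2."""
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

# hypothetical mean (F2, F1) values in Hz for /i/, /a/, /u/
corner_vowels = [(2300, 300), (1300, 800), (900, 350)]
area = vowel_space_area(corner_vowels)  # → 325000.0 Hz^2
```

A smaller area corresponds to less differentiated vowels, i.e., the reduced clarity reported for taller, more masculine speakers.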

3.
4.
Temporal summation was estimated by measuring detection thresholds for pulses 1–50 ms in duration presented against noise maskers. The purpose of the study was to examine the effects of the spectral profiles and intensities of the noise maskers on temporal summation, to look for signs of peripheral processing of pulses with various frequency-time structures in auditory responses, and to test whether temporal summation can be used to assess speech recognition. The central frequencies of the pulses and maskers coincided. The maskers had rippled amplitude spectra of two types: in some maskers the central frequency coincided with a spectral hump, whereas in others it coincided with a spectral dip (the so-called on- and off-maskers). When the auditory system resolved the maskers' spectral humps, the difference between the detection thresholds for stimuli presented with the two masker types was nonzero. Assessing temporal summation together with the threshold difference between the on- and off-masker conditions allowed us to draw conclusions about auditory sensitivity and about the resolution of the maskers' spectral structure (frequency selectivity) for pulses of various durations in local frequency regions. To estimate the effect of the dynamic properties of hearing on sensitivity and frequency selectivity, we varied the masker intensity. We measured temporal summation with on- and off-maskers of various intensities in two frequency ranges (2 and 4 kHz) in four subjects with normal hearing and one person with age-related hearing impairment who complained of reduced speech recognition in noise.
Pulses shorter than 10 ms were treated as simple models of consonant sounds, whereas tone pulses longer than 10 ms were treated as simple models of vowel sounds. In subjects with normal hearing, at moderate masker intensities we observed enhanced temporal summation for the short pulses (consonant models) and improved resolution of the rippled masker spectra for both the short and the tone pulses (consonant and vowel models). We suppose that the enhanced summation is related to the refractoriness of auditory nerve fibers. In the 4 kHz range, the subject with age-related hearing impairment did not resolve the ripple structure of the maskers in the presence of the short pulses (consonant models). We suppose that this impairment was caused by abnormal synchronization of the auditory nerve fiber responses evoked by the pulses, resulting in decreased speech recognition.
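The on/off masker pair described here can be approximated by imposing a sinusoidal ripple on the amplitude spectrum of broadband noise: flipping the ripple phase by π turns a spectral hump at the centre frequency into a dip. This is a generic sketch of the stimulus type, not the authors' recipe; the ripple density, depth, and log-frequency axis are all assumptions.

```python
import numpy as np

def rippled_noise(fs, dur, ripple_density, center_hz, phase, seed=0):
    """Noise whose amplitude spectrum ripples sinusoidally on a log-frequency
    axis. phase=0 puts a spectral hump at center_hz (on-masker);
    phase=pi puts a dip there (off-masker)."""
    rng = np.random.default_rng(seed)
    n = int(fs * dur)
    # random complex spectrum, then shape its magnitude with the ripple
    spec = rng.standard_normal(n // 2 + 1) + 1j * rng.standard_normal(n // 2 + 1)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    env = np.zeros_like(freqs)
    pos = freqs > 0
    env[pos] = 1 + 0.9 * np.cos(
        2 * np.pi * ripple_density * np.log2(freqs[pos] / center_hz) + phase)
    return np.fft.irfft(spec * env, n)

fs = 16000
on = rippled_noise(fs, 0.5, 2.0, 2000.0, 0.0)       # hump at 2 kHz
off = rippled_noise(fs, 0.5, 2.0, 2000.0, np.pi)    # dip at 2 kHz
```

Because both maskers share the same underlying noise, their spectra differ only by the ripple envelope: at the 2 kHz bin the on-masker magnitude exceeds the off-masker's by the factor 1.9/0.1 = 19.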

5.
The present report presents an attempt to define the physiological parameter used to describe “voice tremor” in psychological stress evaluation machines, and to find its sources. This parameter was found to be a low-frequency (5–20 Hz) random process which frequency-modulates the vocal cord waveform and (independently) affects the frequency range of the third speech formant. The frequency variations in unstressed speakers were found to be the result of forced muscular undulations driven by central nervous signals, not of a passive resonant phenomenon. In this paper the physiological and clinical experiments that led to these conclusions are discussed. a) It is shown that induced muscular activity in the vocal tract and vocal cord regions can generate tremor in the voice. b) It is shown that relaxed subjects exhibit significant tremor correlation between spontaneously generated speech and EMG, with the EMG leading the speech tremor. c) Tremor in the electrical activity recorded from muscles overlying the vocal tract area was correlated with the demodulated third-formant signal, and demodulated vocal cord pitch tremor was correlated with demodulated first-formant tremor. d) Enhanced tremor was found in Parkinson patients and diminished tremor in patients with some traumatic brain injuries.

6.
A central challenge for articulatory speech synthesis is the simulation of realistic articulatory movements, which is critical for the generation of highly natural and intelligible speech. This includes modeling coarticulation, i.e., the context-dependent variation of the articulatory and acoustic realization of phonemes, especially of consonants. Here we propose a method to simulate the context-sensitive articulation of consonants in consonant-vowel syllables. To this end, the vocal tract target shape of a consonant in the context of a given vowel is derived as the weighted average of three measured and acoustically optimized reference vocal tract shapes for that consonant in the context of the corner vowels /a/, /i/, and /u/. The weights are determined by mapping the target shape of the given context vowel into the vowel subspace spanned by the corner vowels. The model was applied to the synthesis of consonant-vowel syllables with the consonants /b/, /d/, /g/, /l/, /r/, /m/, /n/ in all combinations with the eight long German vowels. In a perception test, the mean recognition rate for the consonants in the isolated syllables was 82.4%, demonstrating the potential of the approach for highly intelligible articulatory speech synthesis.
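The weight computation can be illustrated as a constrained least-squares problem: express the context vowel's shape vector as a sum-to-one combination of the three corner-vowel shapes, then apply the same weights to the consonant's three reference shapes. This sketch assumes shapes are plain parameter vectors; the paper's actual vocal tract parameterization differs, and the toy shape values below are invented.

```python
import numpy as np

def corner_weights(v, C, penalty=1e6):
    """Weights w minimizing ||C @ w - v||, with sum(w) = 1 enforced softly
    via a heavily weighted extra row. Columns of C are the /a/, /i/, /u/
    corner shapes; v is the context vowel's shape vector."""
    A = np.vstack([C, penalty * np.ones((1, 3))])
    b = np.concatenate([v, [penalty]])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

def consonant_target(w, refs):
    """Blend the consonant's reference shapes (rows of refs, measured in
    the /a/, /i/, /u/ contexts) with the vowel-derived weights."""
    return w @ refs

# toy example: corner shapes are unit vectors, so the weights are readable
C = np.eye(3)                     # columns: /a/, /i/, /u/ shape vectors
v = np.array([0.5, 0.3, 0.2])     # context vowel's shape
w = corner_weights(v, C)          # ≈ [0.5, 0.3, 0.2]
refs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # toy consonant shapes
target = consonant_target(w, refs)
```

With these toy values the blended consonant target is 0.5·[1,0] + 0.3·[0,1] + 0.2·[1,1] = [0.7, 0.5].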

7.
In recent years, intimate partner violence (IPV) has become a significant problem for communities and social life, particularly among young people, and its correct assessment in the population is essential. Our objectives were to confirm the factor structure of the Dating Violence Questionnaire (DVQ) and to investigate its convergent and divergent validity. The DVQ, along with other personality measures, was completed by a sample of 418 university students (310 female) with an average age of 23 years (SD = 4.71). A subsample of participants (223 students) consented to a retest and also completed the Revised Eysenck Personality Questionnaire (short form) and a brief scale describing the behavior of the (past) partner after the breakup of the relationship (BRS). The 8-factor structure reported better fit indices than the two competing models and showed significant correlations with other personality measures. Both Neuroticism and Psychoticism correlated with Sexual Violence, while Detachment correlated only with Neuroticism, and Coercion, Humiliation, and Physical Violence correlated only with Psychoticism. Extraversion showed no significant relationship with any of the 8 DVQ factors. The predictive validity of the DVQ was also satisfactory: the partner's violent reaction to the breakup was predicted positively by Coercion (b = 0.22) and Humiliation (b = 0.20) and negatively by Emotional Punishment (b = -0.18). The present results indicate a good factor structure for the questionnaire and interesting correlations with personality traits, allowing identification of psychological aspects that predispose to antisocial aggressive behavior. Further studies will aim to ascertain other possible determinants of intimate partner violence and the weight of cultural factors.

8.
Variation is a ubiquitous feature of speech. Listeners must take context-induced variation into account to recover the interlocutor's intended message. When listeners fail to normalize for context-induced variation properly, deviant percepts become seeds for new perceptual and production norms. The question is how deviant percepts accumulate in a systematic fashion to give rise to sound change (i.e., new pronunciation norms) within a given speech community. The present study investigated subjects' classification of /s/ and /ʃ/ before /a/ or /u/ spoken by a male or a female voice. Building on modern cognitive theories of the autism-spectrum condition, which frame it in terms of individual differences in cognitive processing style, we established a significant correlation between individuals' normalization for phonetic context (i.e., whether the following vowel is /a/ or /u/) and talker voice variation (i.e., whether the talker is male or female) in speech and their “autistic” traits, as measured by the Autism Spectrum Quotient (AQ). In particular, our mixed-effects logistic regression models show that women with low AQ (i.e., the least “autistic”) do not normalize for phonetic coarticulation as much as men and high-AQ women. This study provides the first direct evidence that variability in humans' ability to perceptually compensate for context-induced variation in speech is governed by the individual's sex and cognitive processing style. These findings support the hypothesis that the systematic infusion of new linguistic variants (i.e., the deviant percepts) originates from a sub-segment of the speech community that consistently under-compensates for contextual variation in speech.

9.
The perception of vowels was studied in chimpanzees and humans using a reaction-time task in which reaction times for discriminating vowels were taken as an index of similarity between vowels. The vowels used were five synthetic and natural Japanese vowels and eight natural French vowels. The chimpanzees required long reaction times to discriminate synthetic [i] from [u] and [e] from [o]; that is, they needed long latencies to discriminate between vowels that differ in the frequency of the second formant. A similar tendency was observed for discrimination of natural [i] from [u]. The human subject required long reaction times to discriminate between vowels along the first-formant axis. These differences can be explained by differences in auditory sensitivity between the two species and by the motor theory of speech perception. A vowel pronounced by different speakers has different acoustic properties, yet humans perceive these speech sounds as the same vowel. This phenomenon of perceptual constancy in speech perception was studied in chimpanzees using natural vowels and a synthetic [o]-[a] continuum. The chimpanzees ignored the difference in the sex of the speakers and showed a capacity for vocal tract normalization.

10.
Personality’s link to emotional experience has been demonstrated, but specific biological responses to emotion as a function of personality have not been well established. Here, the association between personality and physiological responses (heart rate, skin conductance, and respiration) to emotional videos was assessed. One hundred sixty-nine participants self-reported their Big 5 personality traits and underwent ambulatory monitoring as they watched four brief video clips from primetime television showing scenes containing violence, fear, sadness, and tension. Generally, the negatively toned emotional scenes provoked increases in skin conductance response and declines in heart rate. We found that physiological outcomes depended on the particular emotional scene and on personality, most notably Extraversion and Neuroticism. Extraversion and, to a lesser degree, Neuroticism were associated with increases in autonomic arousal responses to the scenes. Gender also interacted with personality to predict responses, such that women who scored higher on measures of Extraversion, Neuroticism, and Conscientiousness tended to show more physiological arousal than men. Overall, the emotional scenes evoked increases in arousal and more controlled attention. The findings are discussed in the context of the limited capacity model and shed light on how personality and gender affect physiological reactions to emotional experiences in everyday life.

11.
Temporal measurements of frequency changes in the human voice and of tremor in the muscles of the vocal area suggest that these phenomena are correlated. A measure of muscular activity was obtained by applying a nonlinear filter to the EMG wave. Frequency changes were detected in the third formant of the voice spectrum. The cross-correlation results indicate that the voice vibrations are forced oscillations generated by central nervous activity, explaining the detection of stress from voice recordings.
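The cross-correlation analysis can be sketched as follows: find the lag at which the EMG tremor best aligns with the demodulated formant tremor; a positive lag means the EMG leads the voice, the direction reported above. A minimal illustration with synthetic 8 Hz tremor signals (a real pipeline would operate on demodulated recordings, and the 25 ms delay here is invented):

```python
import numpy as np

def tremor_lag(emg, voice, fs):
    """Lag (in seconds) at which the EMG tremor best aligns with the voice
    tremor; positive means the EMG leads the voice."""
    emg = emg - emg.mean()
    voice = voice - voice.mean()
    # full cross-correlation over all relative shifts
    xcorr = np.correlate(voice, emg, mode="full")
    lags = np.arange(-len(emg) + 1, len(voice))
    return lags[np.argmax(xcorr)] / fs

# synthetic 8 Hz tremor; the voice copy lags the EMG by 25 ms
fs = 1000
t = np.arange(0, 1, 1.0 / fs)
emg = np.sin(2 * np.pi * 8 * t)
voice = np.sin(2 * np.pi * 8 * (t - 0.025))
lag = tremor_lag(emg, voice, fs)   # ≈ +0.025 s: EMG leads
```

Note the lag sign convention: `np.correlate(voice, emg, "full")` peaks at positive lags when `voice` is a delayed copy of `emg`, which is why a positive result is read as "EMG leads".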

12.
An artificial neural network based on anatomical and physiological findings on the afferent pathway from the ear to the cortex is presented, and the roles of its constituent functions in the recognition of continuous speech are examined. The network processes successive spectra of speech sounds through a cascade of neural layers: a lateral excitation layer (LEL), a lateral inhibition layer (LIL), and a pile of feature detection layers (FDLs). These layers are shown to be effective for recognizing spoken words. First, the LEL reduces the distortion of the sound spectrum caused by the pitch of speech sounds. Next, the LIL emphasizes the major energy peaks of the sound spectrum, the formants. Last, the FDLs detect syllables and words in successive formants, where two functions, time delay and strong adaptation, play important roles: time delay makes it possible to retain the pattern of formant changes long enough to detect spoken words successively, while strong adaptation helps remove the time warp of formant changes. Digital computer simulations show that the network detects isolated syllables, isolated words, and connected words in continuous speech, while reproducing the fundamental responses found in the auditory system, such as ON, OFF, ON-OFF, and SUSTAINED patterns.
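The LIL's formant-emphasizing effect can be illustrated with a generic lateral-inhibition rule: each channel subtracts a fraction of its neighbours' average activity, so broadly distributed energy is suppressed while spectral peaks survive. The inhibition strength and neighbourhood width below are assumptions, not the paper's connection weights.

```python
import numpy as np

def lateral_inhibition(spectrum, inhibit=0.5, width=2):
    """Each channel's output is its input minus a fraction of the mean
    activity of its neighbouring channels, floored at zero."""
    out = np.empty_like(spectrum)
    n = len(spectrum)
    for i in range(n):
        lo, hi = max(0, i - width), min(n, i + width + 1)
        neighbours = np.concatenate([spectrum[lo:i], spectrum[i + 1:hi]])
        out[i] = max(0.0, spectrum[i] - inhibit * neighbours.mean())
    return out

# a flat spectrum with a single formant-like peak in channel 2
spec = np.array([1.0, 1.0, 4.0, 1.0, 1.0, 1.0])
sharpened = lateral_inhibition(spec)  # peak kept, flat background suppressed
```

Working through channel 2: its neighbours all have activity 1, so its output is 4 − 0.5·1 = 3.5, while the flat channels are driven to (or near) zero, leaving the formant peak dominant.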

13.
14.
For deaf individuals with residual low-frequency acoustic hearing, combined use of a cochlear implant (CI) and hearing aid (HA) typically provides better speech understanding than with either device alone. Because of coarse spectral resolution, CIs do not provide fundamental frequency (F0) information that contributes to understanding of tonal languages such as Mandarin Chinese. The HA can provide good representation of F0 and, depending on the range of aided acoustic hearing, first and second formant (F1 and F2) information. In this study, Mandarin tone, vowel, and consonant recognition in quiet and noise was measured in 12 adult Mandarin-speaking bimodal listeners with the CI-only and with the CI+HA. Tone recognition was significantly better with the CI+HA in noise, but not in quiet. Vowel recognition was significantly better with the CI+HA in quiet, but not in noise. There was no significant difference in consonant recognition between the CI-only and the CI+HA in quiet or in noise. There was a wide range in bimodal benefit, with improvements often greater than 20 percentage points in some tests and conditions. The bimodal benefit was compared to CI subjects’ HA-aided pure-tone average (PTA) thresholds between 250 and 2000 Hz; subjects were divided into two groups: “better” PTA (<50 dB HL) or “poorer” PTA (>50 dB HL). The bimodal benefit differed significantly between groups only for consonant recognition. The bimodal benefit for tone recognition in quiet was significantly correlated with CI experience, suggesting that bimodal CI users learn to better combine low-frequency spectro-temporal information from acoustic hearing with temporal envelope information from electric hearing. Given the small number of subjects in this study (n = 12), further research with Chinese bimodal listeners may provide more information regarding the contribution of acoustic and electric hearing to tonal language perception.

15.
Summary: The responses of neurons in field L of the auditory neostriatum of the mynah bird, Gracula religiosa, were recorded during presentation of intact or manipulated mimic voices. A typical mimic voice, konnichiwa, elicited responses in most of the neurons. Neurons in the input layer (L2) of field L showed many peaks in peristimulus time histograms, while those in the other layers (L1 and L3) exhibited only one or two peaks. Several neurons in L1 and L3 responded only to the affricative consonant /t/ in the intact mimic voices; they did not respond to the same consonant presented as an isolated segment or in the voice played back in reverse. Forty-five percent of the neurons (33/73) showed decreased firing rates to the affricative consonant in the isolated segment compared with the intact voice. Some of these neurons, in which neither the isolated affricative consonant nor bursts of noise alone elicited responses, exhibited clear phasic responses to /t/ when bursts of noise with particular central frequencies preceded the consonant. The responsiveness of these neurons appears to undergo temporal facilitation. These results suggest that these neurons code the temporal relationships of speech sounds. Abbreviations: HVc, hyperstriatum ventrale, pars caudale; TFN, temporally facilitated neuron; TSN, temporally suppressed neuron

16.

Background

Recent research has addressed the suppression of cortical sensory responses to altered auditory feedback that occurs at the onset of speech utterances. However, there is reason to assume that the mechanisms underlying sensorimotor processing at mid-utterance differ from those involved in sensorimotor control at utterance onset. The present study examined the dynamics of event-related potentials (ERPs) to different acoustic versions of auditory feedback at mid-utterance.

Methodology/Principal findings

Subjects produced a vowel sound while hearing, via headphones, their pitch-shifted voice (100 cents), a sum of their vocalization and pure tones, or a sum of their vocalization and white noise at mid-utterance. Subjects also passively listened to playback of what they heard during active vocalization. Cortical ERPs were recorded in response to the different acoustic versions of feedback changes during both active vocalization and passive listening. The results showed that, relative to passive listening, active vocalization yielded enhanced P2 responses to the 100-cent pitch shifts, whereas suppression of P2 responses was observed when voice auditory feedback was distorted by pure tones or white noise.

Conclusion/Significance

The present findings, for the first time, demonstrate a dynamic modulation of cortical activity as a function of the quality of acoustic feedback at mid-utterance, suggesting that auditory cortical responses can be enhanced or suppressed to distinguish self-produced speech from externally-produced sounds.

17.

Background

Limited research exists exploring the influence of personality on adherence behaviour. Since non-adherence is a major obstacle in treating prevalent chronic diseases, the aim was to determine whether personality traits are related to reported medication adherence in individuals with chronic disease.

Methodology/Principal Findings

Individuals with chronic disease (n = 749) were identified in a random population sample of 5000 inhabitants aged 30–70 in two municipalities in West Sweden. Data on five personality traits (Neuroticism, Extraversion, Openness to experience, Agreeableness, and Conscientiousness) and on medication adherence behaviour were collected by questionnaire. Statistical analyses revealed a negative relationship between Neuroticism and medication adherence (P<0.001), while both Agreeableness (P<0.001) and Conscientiousness (P<0.001) were positively related to adherence. At high levels of Conscientiousness, low adherence was related to higher scores on Neuroticism. At high levels of Agreeableness, low adherence was related to low scores on Conscientiousness and high scores on Openness to experience.

Conclusions

This study demonstrated that multiple personality traits are of significant importance for adherence behaviour in individuals with chronic disease. The findings suggest that several personality traits may interact in influencing adherence behaviour. Personality traits could putatively be used to target efforts to educate and support patients at high risk of low medication adherence.

18.
The present study investigated the effects of sequence complexity, defined in terms of phonemic similarity and phonotactic probability, on the timing and accuracy of serial ordering for speech production in healthy speakers and in speakers with either hypokinetic or ataxic dysarthria. Sequences comprised strings of consonant-vowel (CV) syllables, each syllable containing the same vowel, /a/, paired with a different consonant. High-complexity sequences contained phonemically similar consonants, and sounds and syllables with low phonotactic probabilities; low-complexity sequences contained phonemically dissimilar consonants and high-probability sounds and syllables. Sequence complexity effects were evaluated by analyzing speech error rates and within-syllable vowel and pause durations. This analysis revealed that speech error rates were significantly higher and speech durations significantly longer during production of high-complexity sequences than during production of low-complexity sequences. Although speakers with dysarthria produced longer overall speech durations than healthy speakers, the effects of sequence complexity on error rates and speech durations were comparable across all groups. These findings indicate that the duration and accuracy of the processes for selecting items in a speech sequence are influenced by their phonemic similarity and/or phonotactic probability. Moreover, this robust complexity effect is present even in speakers with damage to the subcortical circuits involved in serial control of speech.

19.
The purpose of this study was (i) to provide additional evidence for the existence of human voice parameters that could be reliable indicators of a speaker's physical characteristics and (ii) to examine the ability of listeners to judge voice pleasantness and a speaker's characteristics from speech samples. We recorded 26 men enunciating five vowels. The voices were played to 102 female judges who were asked to assess vocal attractiveness and the speakers' age, height, and weight. Statistical analyses were used to determine (i) which physical component predicted which vocal component and (ii) which vocal component predicted which judgment. We found that men with low-frequency formants and small formant dispersion tended to be older and taller and tended to have a high level of testosterone. Female listeners were consistent in their pleasantness judgments and in their height, weight, and age estimates. Pleasantness judgments were based mainly on intonation. Female listeners were able to estimate age correctly by using formant components. They were able to estimate weight, but we could not determine which acoustic parameters they used. However, female listeners were not able to estimate height, possibly because they used intonation incorrectly. Our study confirms that in all mammal species examined thus far, including humans, formant components can provide a relatively accurate indication of a vocalizing individual's characteristics. Human listeners have the necessary information at their disposal; however, they do not necessarily use it.

20.
M. Latinus, P. Belin. PLoS ONE, 2012, 7(7): e41384
Humans can identify individuals from their voices, suggesting the existence of a perceptual representation of voice identity. We used perceptual aftereffects (shifts in perceived stimulus quality after brief exposure to a repeated adaptor stimulus) to further investigate the representation of voice identity in two experiments. Healthy adult listeners were familiarized with several voices until they reached a recognition criterion. They were then tested on identification tasks that used vowel stimuli generated by morphing between the different identities, presented either in isolation (baseline) or following short exposure to different types of voice adaptors (adaptation). Experiment 1 showed that adaptation to a given voice induced categorization shifts away from that adaptor's identity even when the adaptors consisted of vowels different from the probe stimuli. Moreover, original voices and caricatures produced comparable aftereffects, ruling out an explanation of identity aftereffects in terms of adaptation to low-level features. In Experiment 2, we show that adaptors with a disrupted configuration, i.e., altered fundamental frequency or formant frequencies, failed to produce perceptual aftereffects, showing the importance of the preserved configuration of these acoustic cues in the representation of voices. These two experiments indicate a high-level, dynamic representation of voice identity based on the combination of several lower-level acoustic features into a specific voice configuration.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号