Similar Documents

20 similar documents found.
1.
The sequential organization of sound over time can interact with the concurrent organization of sounds across frequency. Previous studies using simple acoustic stimuli have suggested that sequential streaming cues can retroactively affect the perceptual organization of sounds that have already occurred. It is unknown whether such effects generalize to the perception of speech sounds. Listeners’ ability to identify two simultaneously presented vowels was measured in the following conditions: no context, a preceding context stream (precursors), and a following context stream (postcursors). The context stream consisted of brief repetitions of one of the two vowels, and the primary measure of performance was listeners’ ability to identify the other vowel. Results in the precursor condition showed a significant advantage for the identification of the second vowel compared to the no-context condition, suggesting that sequential grouping mechanisms aided the segregation of the concurrent vowels, in agreement with previous work. However, performance in the postcursor condition was significantly worse compared to the no-context condition, providing no evidence for an effect of stream segregation and suggesting a possible interference effect. Two additional experiments involving inharmonic (jittered) vowels were performed to provide additional cues to aid retroactive stream segregation; however, neither manipulation enabled listeners to improve their identification of the target vowel. Taken together with earlier studies, the results suggest that retroactive streaming may require large spectral differences between concurrent sources and thus may not provide a robust segregation cue for natural broadband sounds such as speech.

2.
Previous studies have shown that concurrent vowel identification improves with increasing temporal onset asynchrony of the vowels, even if the vowels have the same fundamental frequency. The current study investigated the possible underlying neural processing involved in concurrent vowel perception. The individual vowel stimuli from a previously published study were used as inputs for a phenomenological auditory-nerve (AN) model. Spectrotemporal representations of simulated neural excitation patterns (i.e., neurograms) were constructed and then matched quantitatively with the neurograms of the single vowels using the Neurogram Similarity Index Measure (NSIM). A novel computational decision model was used to predict concurrent vowel identification. To facilitate optimum matches between the model predictions and the behavioral human data, internal noise was added either at neurogram generation or during neurogram matching with the NSIM procedure. The best fit to the behavioral data was achieved with a signal-to-noise ratio (SNR) of 8 dB for internal noise added during neurogram generation, but with a much smaller amount of internal noise (SNR of 60 dB) when it was added at the level of the NSIM computations. The results suggest that accurate modeling of concurrent vowel data from listeners with normal hearing may partly depend on internal noise and on where internal noise is hypothesized to occur during the concurrent vowel identification process.
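The NSIM referenced above is an SSIM-style image-similarity measure applied to neurogram matrices. As a rough illustration of the matching and internal-noise steps, here is a minimal Python sketch; the single-window luminance/structure computation and the noise-injection helper are simplifying assumptions, not the published implementation, and the constants c1 and c2 are placeholders.

```python
import numpy as np

def nsim_like(neurogram_a, neurogram_b, c1=0.01, c2=0.03):
    """Simplified SSIM-style similarity between two neurograms
    (time x frequency matrices). A sketch only: the published NSIM
    uses windowed luminance and structure terms with fitted weights."""
    a = np.asarray(neurogram_a, dtype=float)
    b = np.asarray(neurogram_b, dtype=float)
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    luminance = (2 * mu_a * mu_b + c1) / (mu_a**2 + mu_b**2 + c1)
    structure = (cov + c2) / (np.sqrt(a.var() * b.var()) + c2)
    return luminance * structure

def add_internal_noise(neurogram, snr_db):
    """Gaussian 'internal noise' at a given SNR, mimicking the
    noise-at-neurogram condition (hypothetical implementation)."""
    sig = np.asarray(neurogram, dtype=float)
    noise_power = np.mean(sig**2) / (10 ** (snr_db / 10))
    return sig + np.random.randn(*sig.shape) * np.sqrt(noise_power)
```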

3.
We propose a new model for speaker-independent vowel recognition which uses the flexibility of the dynamic linking that results from the synchronization of oscillating neural units. The system consists of an input layer and three neural layers, which are referred to as the A-, B- and C-centers. The input signals are a time series of linear prediction (LPC) spectrum envelopes of auditory signals. At each time-window within the series, the A-center receives input signals and extracts local peaks of the spectrum envelope, i.e., formants, and encodes them into local groups of independent oscillations. Speaker-independent vowel characteristics are embedded as a connection matrix in the B-center according to statistical data of Japanese vowels. The associative interaction in the B-center and the reciprocal interaction between the A- and B-centers selectively activate a vowel as a global synchronized pattern over the two centers. The C-center evaluates the synchronized activities among the three formant regions to give the selective output of a category among the five Japanese vowels. A flexible ability for dynamic linking among features is thus achieved over the three centers. The capability of the present system was investigated for speaker-independent recognition of Japanese vowels. The system demonstrated a recognition ability for vowels, including misleading ones, very similar to that of human listeners. In addition, it showed stable recognition for unsteady input signals and robustness against background noise. The optimal oscillation frequency is discussed in comparison with stimulus-dependent synchronizations observed in neurophysiological experiments on the cortex. Received: 20 July 1993 / Accepted in revised form: 22 December 1993
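The synchronization mechanism the model builds on can be illustrated with a generic Kuramoto-style phase-oscillator sketch. This is not the authors' three-center architecture; the uniform coupling matrix standing in for the learned vowel statistics, the 40 Hz intrinsic frequency, and all other parameters are illustrative assumptions.

```python
import numpy as np

def kuramoto_step(phases, omega, coupling_matrix, dt=0.001):
    """One Euler step for coupled phase oscillators:
    dphi_i/dt = omega_i + sum_j K_ij * sin(phi_j - phi_i)."""
    diffs = phases[None, :] - phases[:, None]   # diffs[i, j] = phi_j - phi_i
    coupling = (coupling_matrix * np.sin(diffs)).sum(axis=1)
    return phases + dt * (omega + coupling)

rng = np.random.default_rng(1)
n = 8
phases = rng.uniform(0, 2 * np.pi, n)
omega = np.full(n, 2 * np.pi * 40.0)   # ~40 Hz intrinsic frequency (assumed)
K = np.full((n, n), 5.0)               # uniform excitatory coupling (assumed)
for _ in range(5000):
    phases = kuramoto_step(phases, omega, K)
# order parameter near 1 means the units fire as one synchronized group
print(abs(np.mean(np.exp(1j * phases))))
```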

4.
Aim: The aim of this contribution is to present the formant chart of the Czech vowels a, e, i, o, u and to show that this can be achieved by means of digital methods of sound processing. Method: A group of 35 Czech students of the Pedagogical Faculty of Palacky University was tested, and a recording of whispered vowels was taken from each of them. The recordings were digitized and processed with the Discrete Fourier Transform. The result is the power spectrum of the individual vowels; the graphic output consists of a plot of the relative power of the individual frequencies in the original sound. The values of the first two maxima, which represent the first and second formants, were determined from the graph and plotted on a formant chart. Results: Altogether, 175 spectral analyses of individual vowels were performed. In each resulting power spectrum, the first and second formant frequencies were identified. The first formant was plotted against the second, and pure vowel formant regions were identified. Conclusion: The frequency bands for the Czech vowel "a" were circumscribed between 850 and 1150 Hz for the first formant (F1) and between 1200 and 2000 Hz for the second formant (F2). Similarly, the band borders for vowel "e" were 700-950 Hz for F1 and 1700-3000 Hz for F2; for vowel "i", 300-450 Hz for F1 and 2000-3600 Hz for F2; for vowel "o", 600-800 Hz for F1 and 600-1400 Hz for F2; and for vowel "u", 100-400 Hz for F1 and 400-1200 Hz for F2. Discussion: At low frequencies it is feasible to invoke the source-filter model of voice production and associate vowel identity with the frequencies of the first two formants in the voice spectrum. On the other hand, under intonation, singing, or other forms of exposed voice (such as emotional or focused speech), the formant regions tend to spread: other frequencies dominate the spectral analysis, so the specific formant frequency bands are not easily recognizable. Although the resulting formant map does not differ much from Peterson's formant map, it carries basic information about the specific Czech vowels. The results may be used in further research and in education.
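The Method's peak-picking procedure (compute the DFT power spectrum, then take the first two spectral maxima as F1 and F2) can be sketched in a few lines of Python. The window choice, the 4 kHz search band, and the prominence threshold are illustrative assumptions, not the authors' exact settings.

```python
import numpy as np
from scipy.signal import find_peaks

def first_two_formants(samples, fs):
    """Estimate F1 and F2 of a whispered vowel as the first two
    prominent maxima of the DFT power spectrum (a sketch of the
    procedure described above)."""
    windowed = samples * np.hanning(len(samples))
    spectrum = np.abs(np.fft.rfft(windowed)) ** 2        # power spectrum
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / fs)
    band = freqs < 4000                                  # speech-relevant band
    peaks, _ = find_peaks(spectrum[band],
                          prominence=spectrum[band].max() * 0.05)
    if len(peaks) < 2:
        raise ValueError("fewer than two spectral peaks found")
    return freqs[peaks[0]], freqs[peaks[1]]              # F1, F2 in Hz
```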

5.
In Japanese, vowel duration can distinguish the meaning of words. In order for infants to learn this phonemic contrast using simple distributional analyses, there should be reliable differences in the duration of short and long vowels, and the frequency distribution of vowels must make these differences salient enough in the input. In this study, we evaluate these requirements of phonemic learning by analyzing the duration of vowels from over 11 hours of Japanese infant-directed speech. We found that long vowels are substantially longer than short vowels in the input directed to infants, for each of the five oral vowels. However, we also found that learning phonemic length from the overall distribution of vowel duration is not going to be easy for a simple distributional learner, because of the large base-rate effect (i.e., 94% of vowels are short), and because of the many factors that influence vowel duration (e.g., intonational phrase boundaries, word boundaries, and vowel height). Therefore, a successful learner would need to take into account additional factors such as prosodic and lexical cues in order to discover that duration can contrast the meaning of words in Japanese. These findings highlight the importance of taking into account the naturalistic distributions of lexicons and acoustic cues when modeling early phonemic learning.
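The base-rate problem noted above can be made concrete with a toy simulation: draw durations with the reported 94%/6% short/long split and ask a simple distributional learner (here, a two-component Gaussian mixture) to recover the contrast. The duration means and spreads below are invented for illustration, not taken from the corpus.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
short = rng.normal(60, 20, 9400)    # 94% short vowels (illustrative ms values)
long_ = rng.normal(110, 35, 600)    # 6% long vowels
durations = np.concatenate([short, long_]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(durations)
print(gmm.means_.ravel(), gmm.weights_)
# Inspect whether the recovered components align with the short/long
# categories; with heavy skew and overlap they may not.
```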

6.
Four male Long-Evans rats were trained to discriminate between synthetic vowel sounds using a GO/NOGO response choice task. The vowels were characterized by an increase in fundamental frequency correlated with an upward shift in formant frequencies. In an initial phase we trained the subjects to discriminate between two vowel categories using two exemplars from each category. In a subsequent phase the ability of the rats to generalize the discrimination between the two categories was tested. To test whether rats might exploit the fact that attributes of training stimuli covaried, we used non-standard stimuli with a reversed relation between fundamental frequency and formants. The overall results demonstrate that rats are able to generalize the discrimination to new instances of the same vowels. We present evidence that the performance of the subjects depended on the relation between fundamental and formant frequencies that they had previously been exposed to. Simple simulation results with artificial neural networks could reproduce most of the behavioral results and support the hypothesis that equivalence classes for vowels are associated with an experience-driven process based on general properties of peripheral auditory coding mixed with elementary learning mechanisms. These results suggest that rats use spectral and temporal cues similarly to humans despite differences in basic auditory capabilities.

7.
The perception of vowels was studied in chimpanzees and humans, using a reaction time task in which reaction times for discrimination of vowels were taken as an index of similarity between vowels. The vowels used were five synthetic and natural Japanese vowels and eight natural French vowels. The chimpanzees required long reaction times for discrimination of synthetic [i] from [u] and [e] from [o]; that is, they needed long latencies to discriminate between vowels based on differences in the frequency of the second formant. A similar tendency was observed for discrimination of natural [i] from [u]. The human subject required long reaction times for discrimination between vowels along the first-formant axis. These differences can be explained by differences in auditory sensitivity between the two species and by the motor theory of speech perception. A vowel pronounced by different speakers has different acoustic properties; nevertheless, humans perceive these speech sounds as the same vowel. This phenomenon of perceptual constancy in speech perception was studied in chimpanzees using natural vowels and a synthetic [o]-[a] continuum. The chimpanzees ignored the difference in the sex of the speakers and showed a capacity for vocal tract normalization.

8.
An algorithm that operates in real-time to enhance the salient features of speech is described and its efficacy is evaluated. The Contrast Enhancement (CE) algorithm implements dynamic compressive gain and lateral inhibitory sidebands across channels in a modified winner-take-all circuit, which together produce a form of suppression that sharpens the dynamic spectrum. Normal-hearing listeners identified spectrally smeared consonants (VCVs) and vowels (hVds) in quiet and in noise. Consonant and vowel identification, especially in noise, were improved by the processing. The amount of improvement did not depend on the degree of spectral smearing or talker characteristics. For consonants, when results were analyzed according to phonetic feature, the most consistent improvement was for place of articulation. This is encouraging for hearing aid applications because confusions between consonants differing in place are a persistent problem for listeners with sensorineural hearing loss.
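The two ingredients named above, compressive gain and lateral inhibitory sidebands, can be given a rough flavor in a short sketch. This is a generic across-channel sharpening toy under assumed parameter values; the CE algorithm's real-time modified winner-take-all circuit is not reproduced here.

```python
import numpy as np

def sharpen_spectrum(channel_energies, inhibition=0.4, exponent=0.3):
    """Toy across-channel contrast enhancement: compressive gain per
    channel, then subtraction of scaled neighboring-channel activity
    (lateral inhibitory sidebands). Parameter values are assumptions."""
    x = np.asarray(channel_energies, dtype=float)
    compressed = np.power(x, exponent)                 # compressive gain
    kernel = np.array([inhibition, 0.0, inhibition])   # inhibitory neighbors
    sidebands = np.convolve(compressed, kernel, mode="same")
    return np.clip(compressed - sidebands, 0.0, None)  # sharpened spectrum
```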

9.
Perception of approaching and withdrawing sound sources, and their action on auditory aftereffects, was studied in the free field. Motion of the adapting stimuli was mimicked in two ways: (1) simultaneous opposite changes in the amplitude of broadband noise impulses at two loudspeakers placed 1.1 and 4.5 m from the listener; (2) an increase or decrease in the amplitude of broadband noise impulses at only one loudspeaker, either the nearer or the more remote one. Motion of the test stimuli was mimicked in the former way. Listeners judged the direction of the test stimuli's motion either without any adaptation (control) or after adaptation to stationary, slowly moving (amplitude change of 2 dB), or rapidly moving (amplitude change of 12 dB) stimuli. Percentages of "withdrawal" reports were used to construct psychometric curves. Three phenomena of auditory perception were observed. In the absence of adaptation, a growing-louder effect was revealed: listeners more frequently reported the test sounds as approaching. Once adapted to stationary or slowly moving stimuli, listeners showed a location-dependent aftereffect: test stimuli were reported as withdrawing more often than in the control. This effect was associated with the previous one and was weaker when the distance to the loudspeaker producing the adapting stimuli was greater. After adaptation to rapidly moving stimuli, a motion aftereffect was revealed: listeners reported the direction of test-stimulus motion as opposite to that of the adapting stimuli. The motion aftereffect was more pronounced when the adapting stimuli's motion was mimicked in the former way, as this method allows estimation of their trajectory. There was no relationship between the motion aftereffect and the growing-louder effect, whichever way the adapting stimuli were produced. A tendency was observed for aftereffects of approach to weaken, and aftereffects of withdrawal to strengthen, with growing distance from the source of the adapting stimuli.

10.
The purpose of this study was: (i) to provide additional evidence regarding the existence of human voice parameters that could be reliable indicators of a speaker's physical characteristics, and (ii) to examine the ability of listeners to judge voice pleasantness and a speaker's characteristics from speech samples. We recorded 26 men enunciating five vowels. The voices were played to 102 female judges who were asked to assess vocal attractiveness and the speakers' age, height and weight. Statistical analyses were used to determine: (i) which physical component predicted which vocal component, and (ii) which vocal component predicted which judgment. We found that men with low-frequency formants and small formant dispersion tended to be older and taller and to have higher testosterone levels. Female listeners were consistent in their pleasantness judgments and in their height, weight and age estimates. Pleasantness judgments were based mainly on intonation. Female listeners were able to estimate age correctly by using formant components. They were able to estimate weight, but we could not explain which acoustic parameters they used. However, they were not able to estimate height, possibly because they used intonation incorrectly. Our study confirms that in all mammal species examined thus far, including humans, formant components can provide a relatively accurate indication of a vocalizing individual's characteristics. Human listeners have the necessary information at their disposal; however, they do not necessarily use it.
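Formant dispersion, the size-related cue discussed above, is conventionally defined (following Fitch's work on vocal tract length) as the mean spacing between adjacent formants; lower dispersion implies a longer vocal tract. A minimal sketch:

```python
import numpy as np

def formant_dispersion(formants_hz):
    """Mean spacing between adjacent formants in Hz."""
    f = np.sort(np.asarray(formants_hz, dtype=float))
    return float(np.mean(np.diff(f)))

# e.g. an evenly spaced formant pattern (illustrative values):
print(formant_dispersion([500, 1500, 2500, 3500]))  # -> 1000.0
```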

11.
M. Latinus, P. Belin. PLoS ONE, 2012, 7(7): e41384
Humans can identify individuals from their voice, suggesting the existence of a perceptual representation of voice identity. We used perceptual aftereffects (shifts in perceived stimulus quality after brief exposure to a repeated adaptor stimulus) to further investigate the representation of voice identity in two experiments. Healthy adult listeners were familiarized with several voices until they reached a recognition criterion. They were then tested on identification tasks that used vowel stimuli generated by morphing between the different identities, presented either in isolation (baseline) or following short exposure to different types of voice adaptors (adaptation). Experiment 1 showed that adaptation to a given voice induced categorization shifts away from that adaptor's identity even when the adaptors consisted of vowels different from the probe stimuli. Moreover, original voices and caricatures resulted in comparable aftereffects, ruling out an explanation of identity aftereffects in terms of adaptation to low-level features. Experiment 2 showed that adaptors with a disrupted configuration, i.e., altered fundamental frequency or formant frequencies, failed to produce perceptual aftereffects, demonstrating the importance of the preserved configuration of these acoustical cues in the representation of voices. These two experiments indicate a high-level, dynamic representation of voice identity based on the combination of several lower-level acoustical features into a specific voice configuration.

12.
Pattern-onset visual evoked potentials were elicited from humans by sinusoidal gratings of 0.5, 1, 2 and 4 cpd (cycles/degree) following adaptation to a blank field or one of the gratings. The waveforms recorded after blank field adaptation showed an early positive component, P0, which decreased in amplitude with spatial frequency, whereas the immediately succeeding negative component, N1, increased in amplitude with spatial frequency. P0 and N1 components of comparable size were recorded at 1 cpd. Stationary pattern adaptation to a grating of the same spatial frequency as the test grating significantly reduced N1 amplitude at 4, 2 and 1 cpd. The N1 component elicited at 4 cpd was attenuated in log-linear fashion as the spatial frequency of the adaptation grating increased. P0, on the other hand, was unaffected by stationary pattern adaptation at all combinations of test and adapting spatial frequencies, although P0 amplitude is known to be attenuated by adaptation to a drifting grating. Since N1, but not P0, was significantly attenuated following adaptation and testing at 1 cpd, it was concluded that the neurons generating these components are functionally distinct. The use of a common adaptation grating discounted the possibility that N1, but not P0, was affected due to a difference in the rates of retinal image modulation caused by eye movements made while viewing adaptation gratings of different spatial frequencies. The neurons generating N1 were adapted at a lower rate of retinal image modulation than that apparently required for adaptation of the neurons generating P0, which suggests a difference between these neurons in the rate of stimulus modulation necessary for activation.

13.
14.
15.
Distributional learning of speech sounds (i.e., learning from simple exposure to frequency distributions of speech sounds in the environment) has been observed in the lab repeatedly, in both infants and adults. The current study is the first attempt to examine whether the capacity for using this mechanism differs between adults and infants. To this end, a previous event-related potential study that had shown distributional learning of the English vowel contrast /æ/∼/ε/ in 2-to-3-month-old Dutch infants was repeated with Dutch adults. Specifically, the adults were exposed either to a bimodal distribution that suggested the existence of two vowels (as appropriate in English), or to a unimodal distribution that did not (as appropriate in Dutch). After exposure, the participants were tested on their discrimination of a representative [æ] and a representative [ε] in an oddball paradigm for measuring mismatch responses (MMRs). Bimodally trained adults did not have a significantly larger MMR amplitude, and hence did not show significantly better neural discrimination of the test vowels, than unimodally trained adults. A direct comparison between the normalized MMR amplitudes of the adults and those of the previously tested infants showed that, within a reasonable range of normalization parameters, the bimodal advantage is reliably smaller in adults than in infants, indicating that distributional learning is a weaker mechanism for learning speech sounds in adults (if it exists in that group at all) than in infants.

16.
Anthrozoös, 2013, 26(3): 373-380

Vowel triangle area is a phonetic measure of the clarity of vowel articulation. Compared with speech to adults, people hyperarticulate vowels in speech to infants and foreigners but not to pets, despite other similarities in infant- and pet-directed speech. This suggests that vowel hyperarticulation has a didactic function positively related to the actual, or even the expected, degree of linguistic competence of the audience. Parrots have some degree of linguistic competence, yet no studies have examined vowel hyperarticulation in speech to parrots. Here, we compared the speech of 11 adults to another adult, a dog, a parrot, and an infant. A significant linear increase in vowel triangle area was found across the four conditions, showing that the degree of vowel hyperarticulation increased from adult- and dog-directed speech to parrot-directed speech, then to infant-directed speech. This suggests that the degree of vowel hyperarticulation is related to the audience's actual or expected linguistic competence. The results are discussed in terms of the relative roles of speakers' expectations versus listeners' feedback in the production of vowel hyperarticulation, and suggestions for further studies, manipulating speaker expectation and listener feedback, are provided.
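Vowel triangle area is computed from the (F1, F2) coordinates of the three corner vowels, typically /i/, /a/ and /u/, via the shoelace formula. A minimal sketch; the coordinate values in the example are invented for illustration.

```python
def vowel_triangle_area(corners):
    """Area of the F1-F2 triangle spanned by three corner vowels.
    Each corner is an (F1, F2) pair in Hz; larger areas indicate
    more hyperarticulated vowels."""
    (x1, y1), (x2, y2), (x3, y3) = corners
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0

# e.g. /i/, /a/, /u/ in adult-directed speech (illustrative values only):
print(vowel_triangle_area([(300, 2300), (700, 1200), (350, 800)]))
```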

17.
We investigated the electrophysiological response to matched two-formant vowels and two-note musical intervals, with the goal of examining whether music is processed differently from language in early cortical responses. Using magnetoencephalography (MEG), we compared the mismatch response (MMN/MMF, an early, pre-attentive difference-detector occurring approximately 200 ms post-onset) to musical intervals and vowels composed of matched frequencies. Participants heard blocks of two stimuli in a passive oddball paradigm in one of three conditions: sine waves, piano tones, and vowels. In each condition, participants heard two-formant vowels or musical intervals whose frequencies were 11, 12, or 24 semitones apart. In music, 12 semitones and 24 semitones are perceived as highly similar intervals (one and two octaves, respectively), while in speech, formant separations of 12 semitones and 11 semitones are perceived as highly similar (both variants of the vowel in ‘cut’). Our results indicate that the MMN response mirrors the perceptual one: larger MMNs were elicited for the 12–11 pairing in the music conditions than in the language condition; conversely, larger MMNs were elicited for the 12–24 pairing in the language condition than in the music conditions. This suggests that within 250 ms of hearing complex auditory stimuli, the neural computation of similarity, just as the behavioral one, differs significantly depending on whether the context is music or speech.
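The semitone separations above translate into frequency ratios through the equal-tempered relation f = f0 * 2**(n/12), which is why 12 and 24 semitones land exactly one and two octaves up. A quick worked example; the 440 Hz base is an arbitrary illustrative choice.

```python
def interval_hz(base_hz, semitones):
    """Frequency lying a given number of equal-tempered semitones above base_hz."""
    return base_hz * 2 ** (semitones / 12)

# The three separations used in the study:
for n in (11, 12, 24):
    print(n, round(interval_hz(440.0, n), 1))
# 11 -> 830.6 Hz, 12 -> 880.0 Hz (one octave), 24 -> 1760.0 Hz (two octaves)
```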

18.
Chicks perform conspicuous begging behaviour in response to the arrival of a parent. In seabird colonies, where nests are close to each other, chicks are permanently surrounded by sound and visual stimuli produced by adult conspecifics approaching their nests. In spite of these conditions, black-headed gull chicks begin to vocalize as their parent approaches, even before they can see it. In this paper, we report field experiments testing sound-based discrimination of parents by black-headed gull chicks. Focusing on the 'long call', i.e., the signal emitted by parents when coming back to the nest, we investigate the acoustic parameters used in this recognition process. Through playback experiments using modified 'long calls', we demonstrated that signals without amplitude modulation still elicit responses in chicks; in contrast, frequency modulation appears essential. In the frequency domain, the experiments revealed that the chicks' frequency analysis is precise: chicks did not react when the frequency spectrum of the parental call was shifted 20 Hz up or down. The full set of harmonics is not necessary, as chicks require only two harmonics to discriminate between parents. Signal redundancy is of great significance, since a minimum of four successive syllables of the parental 'long call' are required to elicit a reaction in the chick.

19.
Recent work on the identification and perception of fricatives has focussed on the use by listeners of spectral moments derived from the whole spectrum, and there appears to be no work in the literature on the use of prominent spectral peaks. In this study, we map the response of a single listener to narrow bands of noise that "mimic" the spectral peaks of English voiceless fricatives. The stimuli are based on the critical-band rate scale (Zwicker and Fastl, 1990), which divides the audible frequency range up to 15,500 Hz into 24 abutting critical bands. The results suggest that listeners have knowledge that enables them to connect a narrow-band spectral peak with a particular fricative consonant. We demonstrate that such knowledge, particularly in conjunction with a normalization metric that takes account of an individual speaker's vocal tract characteristics (F0 of the vowel following the fricative), could be used to good effect, particularly in noisy conditions which impair the use of the whole spectrum.
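The critical-band rate (Bark) scale referenced above maps frequency to band number; a widely used closed-form approximation (commonly attributed to Zwicker; treat the exact constants as an assumption here) is z = 13*arctan(0.00076*f) + 3.5*arctan((f/7500)^2), with z running from roughly 0 to 24 over the audible range.

```python
import math

def bark(frequency_hz):
    """Approximate critical-band rate in Bark for a frequency in Hz."""
    f = float(frequency_hz)
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

print(round(bark(1000), 2))    # ~8.5 Bark
print(round(bark(15500), 2))   # ~24 Bark: top of the 24 abutting bands
```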

20.
The audibility of a target tone in a multitone background masker is enhanced by the presentation of a precursor sound consisting of the masker alone. There is evidence that precursor-induced neural adaptation plays a role in this perceptual enhancement. However, the precursor may also be strategically used by listeners as a spectral template of the following masker to better segregate it from the target. In the present study, we tested this hypothesis by measuring the audibility of a target tone in a multitone masker after the presentation of precursors which, in some conditions, were made dissimilar to the masker by gating their components asynchronously. The precursor and the following sound were presented either to the same ear or to opposite ears. In either case, we found no significant difference in the amount of enhancement produced by synchronous and asynchronous precursors. In a second experiment, listeners had to judge whether a synchronous multitone complex contained exactly the same tones as a preceding precursor complex or had one tone less. In this experiment, listeners performed significantly better with synchronous than with asynchronous precursors, showing that asynchronous precursors were poorer perceptual templates of the synchronous multitone complexes. Overall, our findings indicate that precursor-induced auditory enhancement cannot be fully explained by the strategic use of the precursor as a template of the following masker. Our results are consistent with an explanation of enhancement based on selective neural adaptation taking place at a central locus of the auditory system.
