Similar Articles
20 similar articles found.
1.
In Japanese, vowel duration can distinguish the meaning of words. In order for infants to learn this phonemic contrast using simple distributional analyses, there should be reliable differences in the duration of short and long vowels, and the frequency distribution of vowels must make these differences salient enough in the input. In this study, we evaluate these requirements of phonemic learning by analyzing the duration of vowels from over 11 hours of Japanese infant-directed speech. We found that long vowels are substantially longer than short vowels in the input directed to infants, for each of the five oral vowels. However, we also found that learning phonemic length from the overall distribution of vowel duration is not going to be easy for a simple distributional learner, because of the large base-rate effect (i.e., 94% of vowels are short), and because of the many factors that influence vowel duration (e.g., intonational phrase boundaries, word boundaries, and vowel height). Therefore, a successful learner would need to take into account additional factors such as prosodic and lexical cues in order to discover that duration can contrast the meaning of words in Japanese. These findings highlight the importance of taking into account the naturalistic distributions of lexicons and acoustic cues when modeling early phonemic learning.
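To make the distributional-learning difficulty concrete, here is a minimal, hypothetical sketch: a two-component Gaussian mixture (a common stand-in for a "simple distributional learner") fit to synthetic vowel durations that reproduce the 94% short-vowel base rate reported above. The duration means and spreads are illustrative assumptions, not the study's measurements.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Illustrative durations (ms): 94% short vowels, 6% long vowels,
# mirroring the base-rate effect reported in the abstract.
short = rng.normal(70, 20, size=9400)   # assumed means/SDs, not study data
long_ = rng.normal(140, 35, size=600)
durations = np.concatenate([short, long_]).reshape(-1, 1)

# A "simple distributional learner": two-component Gaussian mixture.
gmm = GaussianMixture(n_components=2, random_state=0).fit(durations)
print("component means (ms):", gmm.means_.ravel())
print("component weights:", gmm.weights_)
# With a 94/6 split and overlapping distributions, the fitted components
# often fail to align with the short/long phonemic categories,
# illustrating why the base rate makes unsupervised learning hard.
```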

2.
Aim: The aim of this contribution is to present the formant chart of the Czech vowels a, e, i, o, u and to show that this can be achieved by means of digital methods of sound processing. Method: A group of 35 Czech students of the Pedagogical Faculty of Palacky University was tested, and a recording of whispered vowels was taken from each of them. The recording was digitized and processed by the Discrete Fourier Transform. The result is the power spectrum of the individual vowels; the graphic output consists of a plot of the relative power of the individual frequencies in the original sound. The values of the first two maxima, which represent the first and second formants, were determined from the graph and plotted on a formant chart. Results: Altogether, 175 spectral analyses of individual vowels were performed. In the resulting power spectra, the first and second formant frequencies were identified. The first formant was plotted against the second, and the formant regions of the pure vowels were identified. Conclusion: The frequency bands for the Czech vowel "a" were circumscribed between 850 and 1150 Hz for the first formant (F1) and between 1200 and 2000 Hz for the second formant (F2). Similarly, the bands for vowel "e" were 700-950 Hz for F1 and 1700-3000 Hz for F2; for vowel "i", 300-450 Hz for F1 and 2000-3600 Hz for F2; for vowel "o", 600-800 Hz for F1 and 600-1400 Hz for F2; and for vowel "u", 100-400 Hz for F1 and 400-1200 Hz for F2. Discussion: At low frequencies it is feasible to invoke the source-filter model of voice production and associate vowel identity with the frequencies of the first two formants in the voice spectrum. On the other hand, under intonation, singing, or other forms of exposed voice (such as emotional or focused speech), the formant regions tend to spread: other frequencies dominate the spectral analysis, so specific formant frequency bands are not easily recognizable. Although the resulting formant map does not differ much from Peterson's formant map, it carries basic information about the specific Czech vowels. The results may be used in further research and in education.
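The described pipeline (DFT power spectrum, then the first two maxima as F1 and F2) can be sketched roughly as follows. The file name, band limits, smoothing, and peak-picking parameters are all assumptions for illustration; the study's exact processing is not specified beyond the DFT.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import find_peaks

# Hypothetical file name; assumes a mono recording of one whispered vowel.
rate, x = wavfile.read("whispered_a.wav")
x = x.astype(float) / np.max(np.abs(x))

# Power spectrum via the Discrete Fourier Transform, as in the study.
power = np.abs(np.fft.rfft(x)) ** 2
freqs = np.fft.rfftfreq(len(x), d=1.0 / rate)

# Restrict to the speech band and smooth lightly before peak picking.
band = (freqs > 100) & (freqs < 4000)
smoothed = np.convolve(power[band], np.ones(31) / 31, mode="same")

# Take the first two spectral maxima as F1 and F2 estimates
# (assumes at least two sufficiently prominent peaks exist).
min_sep_bins = max(1, int(100 * len(x) / rate))  # require ~100 Hz separation
peaks, _ = find_peaks(smoothed, distance=min_sep_bins,
                      prominence=smoothed.max() * 0.05)
f1, f2 = freqs[band][peaks[:2]]
print(f"F1 ~ {f1:.0f} Hz, F2 ~ {f2:.0f} Hz")
```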

3.
4.
Previous studies have shown that concurrent vowel identification improves with increasing temporal onset asynchrony of the vowels, even if the vowels have the same fundamental frequency. The current study investigated the possible underlying neural processing involved in concurrent vowel perception. The individual vowel stimuli from a previously published study were used as inputs for a phenomenological auditory-nerve (AN) model. Spectrotemporal representations of simulated neural excitation patterns were constructed (i.e., neurograms) and then matched quantitatively with the neurograms of the single vowels using the Neurogram Similarity Index Measure (NSIM). A novel computational decision model was used to predict concurrent vowel identification. To facilitate optimum matches between the model predictions and the behavioral human data, internal noise was added at either neurogram generation or neurogram matching using the NSIM procedure. The best fit to the behavioral data was achieved with a signal-to-noise ratio (SNR) of 8 dB for internal noise added at the neurogram but with a much smaller amount of internal noise (SNR of 60 dB) for internal noise added at the level of the NSIM computations. The results suggest that accurate modeling of concurrent vowel data from listeners with normal hearing may partly depend on internal noise and where internal noise is hypothesized to occur during the concurrent vowel identification process.
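The internal-noise manipulation can be illustrated with a short sketch: Gaussian noise scaled to a target SNR is added to a neurogram (a time-by-frequency-channel matrix of simulated firing rates). This is a generic SNR construction under stated assumptions, not the paper's exact implementation or the NSIM computation itself.

```python
import numpy as np

def add_internal_noise(neurogram, snr_db, rng=np.random.default_rng(0)):
    """Add Gaussian 'internal noise' to a neurogram at a given SNR (dB)."""
    signal_power = np.mean(neurogram ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=neurogram.shape)
    return neurogram + noise

# Toy neurogram: time x frequency-channel matrix of firing rates.
neurogram = np.abs(np.random.default_rng(1).normal(size=(100, 30)))

# The study's two noise sites: heavy noise at neurogram generation (8 dB SNR)
# versus much lighter noise at the matching stage (60 dB SNR).
noisy_at_neurogram = add_internal_noise(neurogram, snr_db=8)
noisy_at_matching = add_internal_noise(neurogram, snr_db=60)
```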

5.
A central challenge for articulatory speech synthesis is the simulation of realistic articulatory movements, which is critical for the generation of highly natural and intelligible speech. This includes modeling coarticulation, i.e., the context-dependent variation of the articulatory and acoustic realization of phonemes, especially of consonants. Here we propose a method to simulate the context-sensitive articulation of consonants in consonant-vowel syllables. To achieve this, the vocal tract target shape of a consonant in the context of a given vowel is derived as the weighted average of three measured and acoustically optimized reference vocal tract shapes for that consonant in the context of the corner vowels /a/, /i/, and /u/. The weights are determined by mapping the target shape of the given context vowel into the vowel subspace spanned by the corner vowels. The model was applied to the synthesis of consonant-vowel syllables with the consonants /b/, /d/, /g/, /l/, /r/, /m/, /n/ in all combinations with the eight long German vowels. In a perception test, the mean recognition rate for the consonants in the isolated syllables was 82.4%. This demonstrates the potential of the approach for highly intelligible articulatory speech synthesis.
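The weighting scheme reads naturally as barycentric interpolation: map the context vowel's position into the triangle spanned by the corner vowels /a/, /i/, /u/ and use the resulting coordinates to blend the three reference consonant shapes. The sketch below assumes a two-dimensional F1/F2 vowel space and uses placeholder formant values and toy shape vectors; the paper's actual vowel subspace and articulatory targets are richer.

```python
import numpy as np

# Hypothetical corner-vowel positions in (F1, F2) space, Hz.
corners = {"a": np.array([750.0, 1300.0]),
           "i": np.array([300.0, 2300.0]),
           "u": np.array([300.0, 800.0])}

def barycentric_weights(p, a, i, u):
    """Weights (wa, wi, wu) of point p in the triangle a-i-u."""
    T = np.column_stack([a - u, i - u])   # 2x2 basis of the triangle
    wa, wi = np.linalg.solve(T, p - u)
    return np.array([wa, wi, 1.0 - wa - wi])

# Context vowel /e/ (placeholder formants) -> blend weights.
w = barycentric_weights(np.array([450.0, 1900.0]),
                        corners["a"], corners["i"], corners["u"])

# Blend the three reference vocal tract shapes for the consonant
# (toy shape vectors; in reality these are full articulatory targets).
shape_a, shape_i, shape_u = (np.ones(5) * k for k in (1.0, 2.0, 3.0))
consonant_shape = w[0] * shape_a + w[1] * shape_i + w[2] * shape_u
print("weights:", np.round(w, 2))
```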

6.
The perception of vowels was studied in chimpanzees and humans, using a reaction time task in which reaction times for the discrimination of vowels were taken as an index of similarity between vowels. The vowels used were five synthetic and natural Japanese vowels and eight natural French vowels. The chimpanzees required long reaction times to discriminate synthetic [i] from [u] and [e] from [o]; that is, they needed long latencies to discriminate between vowels that differ in the frequency of the second formant. A similar tendency was observed for the discrimination of natural [i] from [u]. The human subject required long reaction times for discrimination between vowels along the first-formant axis. These differences can be explained by differences in auditory sensitivity between the two species and by the motor theory of speech perception. A vowel pronounced by different speakers has different acoustic properties, yet humans perceive these speech sounds as the same vowel. This phenomenon of perceptual constancy in speech perception was studied in chimpanzees using natural vowels and a synthetic [o]-[a] continuum. The chimpanzees ignored the difference in the sex of the speakers and showed a capacity for vocal tract normalization.

7.
An algorithm that operates in real-time to enhance the salient features of speech is described and its efficacy is evaluated. The Contrast Enhancement (CE) algorithm implements dynamic compressive gain and lateral inhibitory sidebands across channels in a modified winner-take-all circuit, which together produce a form of suppression that sharpens the dynamic spectrum. Normal-hearing listeners identified spectrally smeared consonants (VCVs) and vowels (hVds) in quiet and in noise. Consonant and vowel identification, especially in noise, were improved by the processing. The amount of improvement did not depend on the degree of spectral smearing or talker characteristics. For consonants, when results were analyzed according to phonetic feature, the most consistent improvement was for place of articulation. This is encouraging for hearing aid applications because confusions between consonants differing in place are a persistent problem for listeners with sensorineural hearing loss.
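A rough sketch of the two ingredients named above: compressive gain followed by lateral inhibitory sidebands across channels. The kernel weights and compression exponent are invented for illustration and are not the published CE parameters (the winner-take-all stage is also omitted).

```python
import numpy as np

def contrast_enhance(channels, compression=0.6):
    """Toy contrast enhancement: compressive gain followed by
    lateral inhibition (excitatory center, inhibitory sidebands)."""
    x = np.maximum(channels, 0.0)
    compressed = x ** compression              # dynamic compressive gain
    kernel = np.array([-0.5, -0.25, 1.5, -0.25, -0.5])  # assumed weights
    sharpened = np.convolve(compressed, kernel, mode="same")
    return np.maximum(sharpened, 0.0)          # half-wave rectify

# Toy channel envelope with two spectrally smeared peaks.
bins = np.arange(64)[:, None]
channels = np.exp(-0.5 * ((bins - [18, 40]) / 6) ** 2).sum(axis=1) * 40
print(np.round(contrast_enhance(channels), 1))
```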

8.
M. Latinus, P. Belin. PLoS ONE, 2012, 7(7): e41384
Humans can identify individuals from their voice, suggesting the existence of a perceptual representation of voice identity. We used perceptual aftereffects (shifts in perceived stimulus quality after brief exposure to a repeated adaptor stimulus) to further investigate the representation of voice identity in two experiments. Healthy adult listeners were familiarized with several voices until they reached a recognition criterion. They were then tested on identification tasks that used vowel stimuli generated by morphing between the different identities, presented either in isolation (baseline) or following short exposure to different types of voice adaptors (adaptation). Experiment 1 showed that adaptation to a given voice induced categorization shifts away from that adaptor's identity even when the adaptors consisted of vowels different from the probe stimuli. Moreover, original voices and caricatures resulted in comparable aftereffects, ruling out an explanation of identity aftereffects in terms of adaptation to low-level features. Experiment 2 showed that adaptors with a disrupted configuration, i.e., altered fundamental frequency or formant frequencies, failed to produce perceptual aftereffects, showing the importance of the preserved configuration of these acoustic cues in the representation of voices. Together, the two experiments indicate a high-level, dynamic representation of voice identity based on the combination of several lower-level acoustic features into a specific voice configuration.
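The morphed identity continua can be pictured with a toy linear interpolation between two voices' acoustic parameters. Real voice morphing interpolates full spectral representations; the (f0, F1, F2) triples here are placeholders, not the study's stimuli.

```python
import numpy as np

# Toy morphing: linear interpolation between two voices' parameters.
# Illustrative (f0, F1, F2) values in Hz, not measured from real voices.
voice_a = np.array([110.0, 700.0, 1200.0])
voice_b = np.array([210.0, 500.0, 1800.0])

steps = np.linspace(0.0, 1.0, 7)             # 7-step identity continuum
for m in steps:
    p = (1 - m) * voice_a + m * voice_b
    print(f"morph {m:.2f}: f0={p[0]:.0f} Hz, F1={p[1]:.0f}, F2={p[2]:.0f}")
# Caricatures would extrapolate beyond the endpoints (m < 0 or m > 1).
```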

9.
Pattern recognition was an important goal in the early work on artificial neural networks. Without indulging in dramatic speculation, the paper describes how a "natural" method of dealing with the configuration of the input layer can considerably improve the learning behaviour and classification rate of a modified multi-layer perceptron with the backpropagation-of-error learning rule. Using this method, recognition of complex patterns in electrophysiological signals can be performed more accurately, without rules or complicated heuristic procedures. The proposed technique is demonstrated using recognition of the J-point in the ECG as an example.
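As a reference point for the architecture being modified, here is a minimal multi-layer perceptron trained by backpropagation of error on a toy windowed-signal task; the window length, labels, and learning rate are arbitrary stand-ins, not the paper's ECG data or its input-layer configuration method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the paper's task: classify fixed-length signal windows
# (real inputs would be ECG segments around the J-point; these are synthetic).
X = rng.normal(size=(200, 16))                        # 200 windows, 16 samples
y = (X[:, 4] + X[:, 8] > 0).astype(float)[:, None]   # synthetic labels

# Two-layer perceptron weights.
W1 = rng.normal(scale=0.5, size=(16, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1));  b2 = np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for _ in range(2000):
    h = sigmoid(X @ W1 + b1)                 # hidden layer
    out = sigmoid(h @ W2 + b2)               # output layer
    delta_out = (out - y) * out * (1 - out)  # backpropagated output error
    delta_h = (delta_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ delta_out / len(X); b2 -= lr * delta_out.mean(0)
    W1 -= lr * X.T @ delta_h / len(X);  b1 -= lr * delta_h.mean(0)

print("training accuracy:", ((out > 0.5) == y).mean())
```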

10.
11.
Synchronized firing in neural populations has been proposed to constitute an elementary aspect of the neural code, but a complete understanding of its origins and significance has been elusive. Synchronized firing has been extensively documented in retinal ganglion cells, the output neurons of the retina. However, differences in synchronized firing across species and cell types have led to varied conclusions about its mechanisms and role in visual signaling. Recent work on two identified cell populations in the primate retina, the ON-parasol and OFF-parasol cells, permits a more unified understanding. Intracellular recordings reveal that synchronized firing in these cell types arises primarily from common synaptic input to adjacent pairs of cells. Statistical analysis indicates that local pairwise interactions can explain the pattern of synchronized firing in the entire parasol cell population. Computational analysis reveals that the aggregate impact of synchronized firing on the visual signal is substantial. Thus, in the parasol cells, the origin and impact of synchronized firing on the neural code may be understood as locally shared input which influences the visual signals transmitted from eye to brain.
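A simple way to quantify the excess coincident firing that shared synaptic input produces is to compare observed near-coincidences against the count expected from independent firing; the sketch below does this for two toy binary spike trains with an injected common input. The rates and coincidence window are assumptions for illustration.

```python
import numpy as np

def synchrony_index(a, b, window=5):
    """Ratio of observed near-coincident spikes (within +/- window bins)
    to the count expected if the two binary trains fired independently
    (small-rate approximation)."""
    observed = sum(np.any(b[max(0, t - window):t + window + 1])
                   for t in np.flatnonzero(a))
    expected = a.sum() * b.mean() * (2 * window + 1)
    return observed / max(expected, 1e-9)

rng = np.random.default_rng(0)
n_bins = 10_000
shared = rng.random(n_bins) < 0.01             # common synaptic input
a = ((rng.random(n_bins) < 0.02) | shared).astype(int)
b = ((rng.random(n_bins) < 0.02) | shared).astype(int)
print("synchrony index (>1 = excess coincidences):",
      round(synchrony_index(a, b), 2))
```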

12.
Surface electromyography (sEMG) is a one-dimensional, non-stationary bioelectric time series recorded by sensors placed on the surface of the relevant muscle groups. It not only reflects the activity of the neuromuscular system but also carries important information about the corresponding limb movements. Pattern recognition is fundamental to applications of sEMG. To help select an appropriate algorithm for sEMG-based pattern recognition, this paper reviews algorithms for human movement recognition based on sEMG, mainly covering fuzzy pattern recognition, linear discriminant analysis (LDA), artificial neural networks (ANN), and support vector machines (SVM). Fuzzy pattern recognition can adaptively extract fuzzy rules and is insensitive to rule initialization, which suits strictly non-repeating bioelectric signals such as sEMG. Linear discriminant analysis reduces the dimensionality of the data and is computationally simple, but is not well suited to large data sets. Artificial neural networks can describe both the linear and the nonlinear input-output mappings of the training samples, can solve complex classification problems, and have strong learning ability. Support vector machines have clear advantages for small-sample, nonlinear, high-dimensional data and are computationally fast. Comparing the strengths and weaknesses of these methods provides a reference for selecting pattern recognition algorithms for such problems in the future.
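For a concrete sense of how two of the reviewed classifiers are applied, here is a brief scikit-learn sketch comparing LDA and an RBF-kernel SVM on synthetic stand-ins for sEMG feature vectors; the features and class structure are fabricated placeholders, not real electromyographic data.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Toy sEMG feature vectors (e.g., RMS and mean absolute value per channel)
# for three hand gestures. All values are synthetic placeholders.
n, n_features = 90, 8
X = rng.normal(size=(n, n_features))
y = np.repeat([0, 1, 2], n // 3)
X += y[:, None] * 0.8                  # separate the gesture classes a bit

for name, clf in [("LDA", LinearDiscriminantAnalysis()),
                  ("SVM (RBF)", SVC(kernel="rbf", C=1.0))]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```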

13.
Social recognition, whereby animals identify and recognize other individual conspecifics, is a crucial prerequisite for a wide range of social behaviours. There are relationships among social odours (chemical signals), parasite recognition and avoidance that are associated with hormonal, neural and genomic mechanisms in rodents. Rodents use social odours to: (i) distinguish between infected and uninfected individuals; (ii) recognize specific infected individuals; and (iii) avoid and display aversive responses to infected individuals. There are genomic correlates of this parasite recognition and avoidance in which genes expressing the neuropeptide oxytocin have roles. In this article, we provide a framework ("micronet") by which the genetic, hormonal and neural interactions associated with social behaviours and recognition and avoidance of parasitized individuals can be explored.

14.
Do the oscillations observed in many neural assemblies have a cognitive significance? We investigate this question through mathematical modeling of the honeybee's olfactory glomeruli, a subsystem of the antennal lobe network involved in food odor recognition during foraging. Our computations reveal spontaneous oscillations. In those units where they manifest themselves, however, applied input signals modulate the autonomous activity only slightly: thus, an intense, synchronized oscillatory background tends to hinder odor discrimination. In contrast, where and when spontaneous oscillations are repressed, due to low excitability, different input signals selectively re-excite distinct subsets of spontaneous oscillatory modes. These observations agree well with experimental findings and suggest new, quantitative experiments. They further indicate a possible role for the modulation and differential activation of endogenous oscillations in odor identification, and possibly in other cognitive activities subserved, e.g., by the mammalian cortex.

15.
Four male Long-Evans rats were trained to discriminate between synthetic vowel sounds using a GO/NOGO response choice task. The vowels were characterized by an increase in fundamental frequency correlated with an upward shift in formant frequencies. In an initial phase, we trained the subjects to discriminate between two vowel categories using two exemplars from each category. In a subsequent phase, the ability of the rats to generalize the discrimination between the two categories was tested. To test whether the rats might exploit the fact that attributes of the training stimuli covaried, we used non-standard stimuli with a reversed relation between fundamental frequency and formants. The overall results demonstrate that rats are able to generalize the discrimination to new instances of the same vowels. We present evidence that the performance of the subjects depended on the relation between fundamental and formant frequencies to which they had previously been exposed. Simple simulations with artificial neural networks reproduced most of the behavioral results and support the hypothesis that equivalence classes for vowels arise from an experience-driven process based on general properties of peripheral auditory coding combined with elementary learning mechanisms. These results suggest that rats use spectral and temporal cues similarly to humans despite differences in basic auditory capabilities.

16.
Four experiments sought evidence that listeners can use coherent changes in the frequency or amplitude of harmonics to segregate concurrent vowels. Segregation was not helped by giving the harmonics of competing vowels different patterns of frequency or amplitude modulation. However, modulating the frequencies of the components of one vowel was beneficial when the other vowel was not modulated, provided that both vowels were composed of components placed randomly in frequency. In addition, staggering the onsets of the two vowels, so that the amplitude of one vowel increased abruptly while the amplitude of the other was stationary, was also beneficial. Thus, the results demonstrate that listeners can group changing harmonics and can segregate them from stationary harmonics, but cannot use coherence of change to separate two sets of changing harmonics.
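Coherent frequency modulation, in which all harmonics of a vowel move together in proportion to their harmonic number, can be synthesized as below. The f0, modulation depth, and modulation rate are illustrative choices, not the experiments' stimulus parameters.

```python
import numpy as np

def harmonic_complex(f0, duration=0.5, rate=16_000,
                     fm_depth=0.0, fm_rate=5.0, n_harmonics=20):
    """Harmonic complex with optional coherent frequency modulation:
    all harmonics move together, scaled by harmonic number."""
    t = np.arange(int(duration * rate)) / rate
    # Instantaneous f0 with sinusoidal FM (depth as a fraction of f0).
    inst_f0 = f0 * (1.0 + fm_depth * np.sin(2 * np.pi * fm_rate * t))
    phase = 2 * np.pi * np.cumsum(inst_f0) / rate
    return sum(np.sin(k * phase) / k for k in range(1, n_harmonics + 1))

# One "vowel" with coherent 5-Hz FM mixed with one stationary "vowel",
# mirroring the modulated-versus-unmodulated condition in the abstract.
mix = harmonic_complex(120, fm_depth=0.05) + harmonic_complex(120, fm_depth=0.0)
```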

17.
Auditory communication in humans and other animals frequently takes place in noisy environments with many co-occurring signallers. Receivers are thus challenged to rapidly recognize salient auditory signals and filter out irrelevant sounds. Most bird species produce a variety of complex vocalizations that function to communicate with other members of their own species, and behavioural evidence broadly supports preferences for conspecific over heterospecific sounds (auditory species recognition). However, it remains unclear whether such auditory signals are categorically recognized by the sensory and central nervous system. Here, we review 53 published studies that compare avian neural responses to conspecific versus heterospecific vocalizations. Irrespective of the techniques used to characterize neural activity, distinct nuclei of the auditory forebrain are consistently shown to be conspecific-selective across taxa, even in response to unfamiliar individuals with distinct acoustic properties. Yet species-specific neural discrimination is not a stereotyped auditory response; it is modulated according to salience, depending, for example, on ontogenetic exposure to conspecific versus heterospecific stimuli. Neuromodulators, in particular norepinephrine, may mediate species recognition by regulating the accuracy of neuronal coding for salient conspecific stimuli. Our review lends strong support to the existence of neural structures that categorically recognize conspecific signals despite the highly variable physical properties of the stimuli. The available data support a 'perceptual filter'-based mechanism for determining the saliency of the signal, in which species identity and social experience combine to influence the neural processing of species-specific auditory stimuli. Finally, we present hypotheses and their testable predictions, proposing next steps in species-recognition research within the emerging model of the neural conceptual construct in avian auditory recognition.

18.
Most current neural network architectures are not suited to recognizing a pattern at various displaced positions. This shortcoming seems due to the prevailing neuron model, which reduces a neuron's information transmission to its firing rate. With this information code, a neuronal assembly cannot distinguish between different combinations of its entities and therefore fails to represent the fine structure within a pattern. Our approach accepts the main idea of the correlation theory, namely that spatial relationships in a pattern should be coded by temporal relations in the timing of action potentials. However, we do not assume that synchronized spikes are a sign of strong synapses between the neurons concerned. Instead, the synchronization of synfire chains can be exploited to produce the relevant timing relationships between the neuronal signals. Therefore, we do not require fast synaptic plasticity to account for the precise timing of action potentials. To illustrate this claim, we propose a model for translation-invariant pattern recognition which does not depend on any changes in synaptic efficacies. Received: 14 June 1998 / Accepted in revised form: 9 January 1999

19.
This paper proposes a temporal-to-spatial dynamic mapping inspired by the neural dynamics of the olfactory cortex. In our model, the temporal structure of olfactory-bulb patterns is mapped to the spatial dynamics of the ensemble of cortical neurons. This mapping is based on the following biological mechanism: while the anterior part of the piriform cortex can be excited by the afferent input alone, the posterior areas require both afferent and association signals, which are temporally correlated in a specific way. One functional type of neuron in our model corresponds to the cortical spatial dynamics and encodes odor components; another represents the temporal activity of association-fiber signals, which, we suggest, may be relevant to the encoding of odor concentrations. The temporal-to-spatial mapping and distributed representation of the model enable simultaneous rough cluster classification and fine recognition of patterns within a cluster as parts of the same dynamic process. The model is able to extract and segment the components of complex odor patterns, which are spatiotemporal sequences of neural activity. Received: 16 October 2001 / Accepted in revised form: 7 February 2002

20.
We investigated the electrophysiological response to matched two-formant vowels and two-note musical intervals, with the goal of examining whether music is processed differently from language in early cortical responses. Using magnetoencephalography (MEG), we compared the mismatch response (MMN/MMF, an early, pre-attentive difference detector occurring approximately 200 ms post-onset) to musical intervals and vowels composed of matched frequencies. Participants heard blocks of two stimuli in a passive oddball paradigm in one of three conditions: sine waves, piano tones, and vowels. In each condition, participants heard two-formant vowels or musical intervals whose frequencies were 11, 12, or 24 semitones apart. In music, separations of 12 and 24 semitones are perceived as highly similar intervals (one and two octaves, respectively), while in speech, formant separations of 12 and 11 semitones are perceived as highly similar (both variants of the vowel in ‘cut’). Our results indicate that the MMN response mirrors the perceptual one: larger MMNs were elicited for the 12–11 pairing in the music conditions than in the language condition; conversely, larger MMNs were elicited for the 12–24 pairing in the language condition than in the music conditions. This suggests that within 250 ms of hearing complex auditory stimuli, the neural computation of similarity, just like the behavioral one, differs significantly depending on whether the context is music or speech.
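The interval arithmetic behind the stimulus design is the equal-tempered semitone: a separation of n semitones corresponds to a frequency ratio of 2^(n/12), so 12 semitones doubles the frequency and 24 quadruples it, while 11 semitones falls just short of a doubling. A quick check (the 440 Hz base is an arbitrary illustration, not the study's stimuli):

```python
# Equal-tempered interval arithmetic: n semitones -> ratio 2**(n/12).
base = 440.0  # Hz; illustrative base frequency
for semitones in (11, 12, 24):
    ratio = 2 ** (semitones / 12)
    print(f"{semitones:2d} semitones: ratio {ratio:.3f} -> {base * ratio:.1f} Hz")
# 12 semitones doubles the frequency (one octave), 24 quadruples it (two
# octaves), while 11 semitones (ratio ~1.888) falls just short of an octave.
```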
