首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.

Background

The well-established left hemisphere specialisation for language processing has long been claimed to be based on a low-level auditory specialization for specific acoustic features in speech, particularly regarding ‘rapid temporal processing’.

Methodology

A novel analysis/synthesis technique was used to construct a variety of sounds based on simple sentences which could be manipulated in spectro-temporal complexity, and whether they were intelligible or not. All sounds consisted of two noise-excited spectral prominences (based on the lower two formants in the original speech) which could be static or varying in frequency and/or amplitude independently. Dynamically varying both acoustic features based on the same sentence led to intelligible speech but when either or both acoustic features were static, the stimuli were not intelligible. Using the frequency dynamics from one sentence with the amplitude dynamics of another led to unintelligible sounds of comparable spectro-temporal complexity to the intelligible ones. Positron emission tomography (PET) was used to compare which brain regions were active when participants listened to the different sounds.

Conclusions

Neural activity to spectral and amplitude modulations sufficient to support speech intelligibility (without actually being intelligible) was seen bilaterally, with a right temporal lobe dominance. A left dominant response was seen only to intelligible sounds. It thus appears that the left hemisphere specialisation for speech is based on the linguistic properties of utterances, not on particular acoustic features.  相似文献   

2.
How the human auditory system extracts perceptually relevant acoustic features of speech is unknown. To address this question, we used intracranial recordings from nonprimary auditory cortex in the human superior temporal gyrus to determine what acoustic information in speech sounds can be reconstructed from population neural activity. We found that slow and intermediate temporal fluctuations, such as those corresponding to syllable rate, were accurately reconstructed using a linear model based on the auditory spectrogram. However, reconstruction of fast temporal fluctuations, such as syllable onsets and offsets, required a nonlinear sound representation based on temporal modulation energy. Reconstruction accuracy was highest within the range of spectro-temporal fluctuations that have been found to be critical for speech intelligibility. The decoded speech representations allowed readout and identification of individual words directly from brain activity during single trial sound presentations. These findings reveal neural encoding mechanisms of speech acoustic parameters in higher order human auditory cortex.  相似文献   

3.
Speech processing inherently relies on the perception of specific, rapidly changing spectral and temporal acoustic features. Advanced acoustic perception is also integral to musical expertise, and accordingly several studies have demonstrated a significant relationship between musical training and superior processing of various aspects of speech. Speech and music appear to overlap in spectral and temporal features; however, it remains unclear which of these acoustic features, crucial for speech processing, are most closely associated with musical training. The present study examined the perceptual acuity of musicians to the acoustic components of speech necessary for intra-phonemic discrimination of synthetic syllables. We compared musicians and non-musicians on discrimination thresholds of three synthetic speech syllable continua that varied in their spectral and temporal discrimination demands, specifically voice onset time (VOT) and amplitude envelope cues in the temporal domain. Musicians demonstrated superior discrimination only for syllables that required resolution of temporal cues. Furthermore, performance on the temporal syllable continua positively correlated with the length and intensity of musical training. These findings support one potential mechanism by which musical training may selectively enhance speech perception, namely by reinforcing temporal acuity and/or perception of amplitude rise time, and implications for the translation of musical training to long-term linguistic abilities.  相似文献   

4.
Zhang L  Xi J  Xu G  Shu H  Wang X  Li P 《PloS one》2011,6(6):e20963
In speech perception, a functional hierarchy has been proposed by recent functional neuroimaging studies: core auditory areas on the dorsal plane of superior temporal gyrus (STG) are sensitive to basic acoustic characteristics, whereas downstream regions, specifically the left superior temporal sulcus (STS) and middle temporal gyrus (MTG) ventral to Heschl's gyrus (HG) are responsive to abstract phonological features. What is unclear so far is the relationship between the dorsal and ventral processes, especially with regard to whether low-level acoustic processing is modulated by high-level phonological processing. To address the issue, we assessed sensitivity of core auditory and downstream regions to acoustic and phonological variations by using within- and across-category lexical tonal continua with equal physical intervals. We found that relative to within-category variation, across-category variation elicited stronger activation in the left middle MTG (mMTG), apparently reflecting the abstract phonological representations. At the same time, activation in the core auditory region decreased, resulting from the top-down influences of phonological processing. These results support a hierarchical organization of the ventral acoustic-phonological processing stream, which originates in the right HG/STG and projects to the left mMTG. Furthermore, our study provides direct evidence that low-level acoustic analysis is modulated by high-level phonological representations, revealing the cortical dynamics of acoustic and phonological processing in speech perception. Our findings confirm the existence of reciprocal progression projections in the auditory pathways and the roles of both feed-forward and feedback mechanisms in speech perception.  相似文献   

5.
The temporal pattern of amplitude modulations (AM) is often used to recognize acoustic objects. To identify objects reliably, intensity invariant representations have to be formed. We approached this problem within the auditory pathway of grasshoppers. We presented AM patterns modulated at different time scales and intensities. Metric space analysis of neuronal responses allowed us to determine how well, how invariantly, and at which time scales AM frequency is encoded. We find that in some neurons spike-count cues contribute substantially (20–60%) to the decoding of AM frequency at a single intensity. However, such cues are not robust when intensity varies. The general intensity invariance of the system is poor. However, there exists a range of AM frequencies around 83 Hz where intensity invariance of local interneurons is relatively high. In this range, natural communication signals exhibit much variation between species, suggesting an important behavioral role for this frequency band. We hypothesize, just as has been proposed for human speech, that the communication signals might have evolved to match the processing properties of the receivers. This contrasts with optimal coding theory, which postulates that neuronal systems are adapted to the statistics of the relevant signals.  相似文献   

6.
Speech perception at the interface of neurobiology and linguistics   总被引:2,自引:0,他引:2  
Speech perception consists of a set of computations that take continuously varying acoustic waveforms as input and generate discrete representations that make contact with the lexical representations stored in long-term memory as output. Because the perceptual objects that are recognized by the speech perception enter into subsequent linguistic computation, the format that is used for lexical representation and processing fundamentally constrains the speech perceptual processes. Consequently, theories of speech perception must, at some level, be tightly linked to theories of lexical representation. Minimally, speech perception must yield representations that smoothly and rapidly interface with stored lexical items. Adopting the perspective of Marr, we argue and provide neurobiological and psychophysical evidence for the following research programme. First, at the implementational level, speech perception is a multi-time resolution process, with perceptual analyses occurring concurrently on at least two time scales (approx. 20-80 ms, approx. 150-300 ms), commensurate with (sub)segmental and syllabic analyses, respectively. Second, at the algorithmic level, we suggest that perception proceeds on the basis of internal forward models, or uses an 'analysis-by-synthesis' approach. Third, at the computational level (in the sense of Marr), the theory of lexical representation that we adopt is principally informed by phonological research and assumes that words are represented in the mental lexicon in terms of sequences of discrete segments composed of distinctive features. One important goal of the research programme is to develop linking hypotheses between putative neurobiological primitives (e.g. temporal primitives) and those primitives derived from linguistic inquiry, to arrive ultimately at a biologically sensible and theoretically satisfying model of representation and computation in speech.  相似文献   

7.
Does the lateral temporal cortex require acoustic exposure in order to become specialized for speech processing? Six hearing participants and six congenitally deaf participants, all with spoken English as their first langugage, were scanned using functional magnetic resonance imaging while performing a simple speech-reading task. Focal activation of the left lateral temporal cortex was significantly reduced in the deaf group compared with the hearing group. Activation within this region was present in individual deaf participants, but varied in location from person to person. Early acoustic experience may be required for regions within the left temporal cortex in order to develop into a coherent network with subareas devoted to specific speech analysis functions.  相似文献   

8.
Infant-directed speech is a linguistic phenomenon in which adults adapt their language when addressing infants in order to provide them with more salient linguistic information and aid them in language acquisition. Adult-directed language differs from infant-directed language in various aspects, including speech acoustics, syntax, and semantics. The existence of a "gestural motherese" in interaction with infants, demonstrates that not only spoken language but also nonvocal modes of communication can become adapted when infants are recipients. Rhesus macaques are so far the only nonhuman primates where a similar phenomenon to "motherese" has been discovered: the acoustic spectrum of a particular vocalization of adult females may be altered when the addressees are infants. The present paper describes how gorillas adjust their communicative strategies when directing intentional, nonvocal play signals at infants in the sense of a "nonvocal motherese." Animals of ages above infancy use a higher rate of repetitions and sequences of the tactile sensory modality when negotiating play with infants. This indicates that gorillas employ a strategy of infant-specific communication.  相似文献   

9.
In this paper the analysis of the human acoustic communication channel is presented as a potentially bio-anthropological subject of interest. After a crude survey concerning some important aspects of speech and expressive behaviour a hypothesis is outlined saying that the verbal and the nonverbal content of the speech signal are not just transferred side by side; rather there exist close interconnections between the linguistic and the expressive structures, which is shown regarding the speech melody; the 'cultural' language code makes use of a predominantly 'non-cultural' code of vocalisations for the purpose of linguistic disambiguation and speeding up the communication process. The complementarity of these two codes, their principal independence of each other as well as their different cerebral representation contribute to the high efficiency of speech as a communication tool.  相似文献   

10.
Reaction time and recognition accuracy of speech emotional intonations in short meaningless words that differed only in one phoneme with background noise and without it were studied in 49 adults of 20-79 years old. The results were compared with the same parameters of emotional intonations in intelligent speech utterances under similar conditions. Perception of emotional intonations at different linguistic levels (phonological and lexico-semantic) was found to have both common features and certain peculiarities. Recognition characteristics of emotional intonations depending on gender and age of listeners appeared to be invariant with regard to linguistic levels of speech stimuli. Phonemic composition of pseudowords was found to influence the emotional perception, especially against the background noise. The most significant stimuli acoustic characteristic responsible for the perception of speech emotional prosody in short meaningless words under the two experimental conditions, i.e. with and without background noise, was the fundamental frequency variation.  相似文献   

11.
Beyond the linguistic content of their speech, speakers of both sexes convey diverse biological and psychosocial information through their voices, which are important when assessing potential mates and competitors. However, studies investigating the relationships between mating success and acoustic inter-individual differences are scarce. In this study, we investigated such relationships in both sexes in courtship and competitive interactions—as they correspond to the two different types of sexual selection—using an experimental design based on a simulated dating game. We assessed which type of sexual selection best predicted mating success, here defined as the self-reported number of sexual partners within the past year. Our results show that only acoustic inter-individual differences in the courtship context for both men and women predicted their mating success. Men displaying faster articulation rate and louder voices reported significantly more sexual partners; in contrast, men displaying higher intonation reported a greater negative effect of roughness and breathiness on their mating success. Women who displayed relatively less breathy voices and shorter speech duration reported significantly fewer sexual partners. These novel findings are discussed in light of the mate choice context and the relative contribution of both types of sexual selection shaping acoustic features of speech.  相似文献   

12.
We systematically determined which spectrotemporal modulations in speech are necessary for comprehension by human listeners. Speech comprehension has been shown to be robust to spectral and temporal degradations, but the specific relevance of particular degradations is arguable due to the complexity of the joint spectral and temporal information in the speech signal. We applied a novel modulation filtering technique to recorded sentences to restrict acoustic information quantitatively and to obtain a joint spectrotemporal modulation transfer function for speech comprehension, the speech MTF. For American English, the speech MTF showed the criticality of low modulation frequencies in both time and frequency. Comprehension was significantly impaired when temporal modulations <12 Hz or spectral modulations <4 cycles/kHz were removed. More specifically, the MTF was bandpass in temporal modulations and low-pass in spectral modulations: temporal modulations from 1 to 7 Hz and spectral modulations <1 cycles/kHz were the most important. We evaluated the importance of spectrotemporal modulations for vocal gender identification and found a different region of interest: removing spectral modulations between 3 and 7 cycles/kHz significantly increases gender misidentifications of female speakers. The determination of the speech MTF furnishes an additional method for producing speech signals with reduced bandwidth but high intelligibility. Such compression could be used for audio applications such as file compression or noise removal and for clinical applications such as signal processing for cochlear implants.  相似文献   

13.
The purpose of this study was to compare the duration and variability of speech segments of children who stutter with those of children who do not stutter and to identify changes in duration and variability of speech segments due to the effect of utterance length. Eighteen children participated (ranging from 6.3 to 7.9 years of age). The experimental task required the children to repeat a single word in isolation and the same word embedded in a sentence. Durations of speech segments and Coefficients of variation (Cv) were defined to assess temporal parameters of speech. Significant differences were found in the variability of speech segments on the sentence level, but not in duration. The findings supported the assumption that linguistic factors pose direct demands on the speech motor system and that the extra duration of speech segments observed in the speech of stuttering adults may be a kind of compensation strategy.  相似文献   

14.
The speech code is a vehicle of language: it defines a set of forms used by a community to carry information. Such a code is necessary to support the linguistic interactions that allow humans to communicate. How then may a speech code be formed prior to the existence of linguistic interactions? Moreover, the human speech code is discrete and compositional, shared by all the individuals of a community but different across communities, and phoneme inventories are characterized by statistical regularities. How can a speech code with these properties form? We try to approach these questions in the paper, using the "methodology of the artificial". We build a society of artificial agents, and detail a mechanism that shows the formation of a discrete speech code without pre-supposing the existence of linguistic capacities or of coordinated interactions. The mechanism is based on a low-level model of sensory-motor interactions. We show that the integration of certain very simple and non-language-specific neural devices leads to the formation of a speech code that has properties similar to the human speech code. This result relies on the self-organizing properties of a generic coupling between perception and production within agents, and on the interactions between agents. The artificial system helps us to develop better intuitions on how speech might have appeared, by showing how self-organization might have helped natural selection to find speech.  相似文献   

15.
Spectrotemporal modulation (STM) detection performance was examined for cochlear implant (CI) users. The test involved discriminating between an unmodulated steady noise and a modulated stimulus. The modulated stimulus presents frequency modulation patterns that change in frequency over time. In order to examine STM detection performance for different modulation conditions, two different temporal modulation rates (5 and 10 Hz) and three different spectral modulation densities (0.5, 1.0, and 2.0 cycles/octave) were employed, producing a total 6 different STM stimulus conditions. In order to explore how electric hearing constrains STM sensitivity for CI users differently from acoustic hearing, normal-hearing (NH) and hearing-impaired (HI) listeners were also tested on the same tasks. STM detection performance was best in NH subjects, followed by HI subjects. On average, CI subjects showed poorest performance, but some CI subjects showed high levels of STM detection performance that was comparable to acoustic hearing. Significant correlations were found between STM detection performance and speech identification performance in quiet and in noise. In order to understand the relative contribution of spectral and temporal modulation cues to speech perception abilities for CI users, spectral and temporal modulation detection was performed separately and related to STM detection and speech perception performance. The results suggest that that slow spectral modulation rather than slow temporal modulation may be important for determining speech perception capabilities for CI users. Lastly, test–retest reliability for STM detection was good with no learning. The present study demonstrates that STM detection may be a useful tool to evaluate the ability of CI sound processing strategies to deliver clinically pertinent acoustic modulation information.  相似文献   

16.
Neural specializations for speech and pitch: moving beyond the dichotomies   总被引:2,自引:0,他引:2  
The idea that speech processing relies on unique, encapsulated, domain-specific mechanisms has been around for some time. Another well-known idea, often espoused as being in opposition to the first proposal, is that processing of speech sounds entails general-purpose neural mechanisms sensitive to the acoustic features that are present in speech. Here, we suggest that these dichotomous views need not be mutually exclusive. Specifically, there is now extensive evidence that spectral and temporal acoustical properties predict the relative specialization of right and left auditory cortices, and that this is a parsimonious way to account not only for the processing of speech sounds, but also for non-speech sounds such as musical tones. We also point out that there is equally compelling evidence that neural responses elicited by speech sounds can differ depending on more abstract, linguistically relevant properties of a stimulus (such as whether it forms part of one's language or not). Tonal languages provide a particularly valuable window to understand the interplay between these processes. The key to reconciling these phenomena probably lies in understanding the interactions between afferent pathways that carry stimulus information, with top-down processing mechanisms that modulate these processes. Although we are still far from the point of having a complete picture, we argue that moving forward will require us to abandon the dichotomy argument in favour of a more integrated approach.  相似文献   

17.
The concept of categorical perception of speech and speech-like sounds has been central to models of speech perception for decades. Event-related potentials (ERPs) provide a neurophysiologic perspective of this important phenomenon. In the present experiment the mismatch negativity (MMN) event-related potential, which is sensitive to fine acoustic differences, was recorded in adults. Of interest was whether the MMN reflects the acoustic or categorical perception of speech.The MMN was elicited by stimulus pairs (along a continuum varying in place of articulation from /da/ to /ga/) which had been identified as the same phoneme /da/ (within category condition) and as different phonemes /da/ and /ga/ (across categories condition). The acoustic differences between these two pairs of stimuli were equivalent.The MMN was observed in all subjects both in the within and across category conditions. Furthermore, the MMN did not differ in latency, amplitude or area within and across categories. That is, the MMN indicated equal discrimination both across and within categories. These results suggest that the MMN appears to reflect the processing of acoustic aspects of the speech stimulus, but not phonetic processing into categories. The MMN appears to be an extremely sensitive electrophysiologic index of minimal acoustic differences in speech stimuli.  相似文献   

18.
Automatic speech recognition (ASR) is currently used in many assistive technologies, such as helping individuals with speech impairment in their communication ability. One challenge in ASR for speech-impaired individuals is the difficulty in obtaining a good speech database of impaired speakers for building an effective speech acoustic model. Because there are very few existing databases of impaired speech, which are also limited in size, the obvious solution to build a speech acoustic model of impaired speech is by employing adaptation techniques. However, issues that have not been addressed in existing studies in the area of adaptation for speech impairment are as follows: (1) identifying the most effective adaptation technique for impaired speech; and (2) the use of suitable source models to build an effective impaired-speech acoustic model. This research investigates the above-mentioned two issues on dysarthria, a type of speech impairment affecting millions of people. We applied both unimpaired and impaired speech as the source model with well-known adaptation techniques like the maximum likelihood linear regression (MLLR) and the constrained-MLLR(C-MLLR). The recognition accuracy of each impaired speech acoustic model is measured in terms of word error rate (WER), with further assessments, including phoneme insertion, substitution and deletion rates. Unimpaired speech when combined with limited high-quality speech-impaired data improves performance of ASR systems in recognising severely impaired dysarthric speech. The C-MLLR adaptation technique was also found to be better than MLLR in recognising mildly and moderately impaired speech based on the statistical analysis of the WER. It was found that phoneme substitution was the biggest contributing factor in WER in dysarthric speech for all levels of severity. The results show that the speech acoustic models derived from suitable adaptation techniques improve the performance of ASR systems in recognising impaired speech with limited adaptation data.  相似文献   

19.
20.
This study aimed to characterize the linguistic interference that occurs during speech-in-speech comprehension by combining offline and online measures, which included an intelligibility task (at a −5 dB Signal-to-Noise Ratio) and 2 lexical decision tasks (at a −5 dB and 0 dB SNR) that were performed with French spoken target words. In these 3 experiments we always compared the masking effects of speech backgrounds (i.e., 4-talker babble) that were produced in the same language as the target language (i.e., French) or in unknown foreign languages (i.e., Irish and Italian) to the masking effects of corresponding non-speech backgrounds (i.e., speech-derived fluctuating noise). The fluctuating noise contained similar spectro-temporal information as babble but lacked linguistic information. At −5 dB SNR, both tasks revealed significantly divergent results between the unknown languages (i.e., Irish and Italian) with Italian and French hindering French target word identification to a similar extent, whereas Irish led to significantly better performances on these tasks. By comparing the performances obtained with speech and fluctuating noise backgrounds, we were able to evaluate the effect of each language. The intelligibility task showed a significant difference between babble and fluctuating noise for French, Irish and Italian, suggesting acoustic and linguistic effects for each language. However, the lexical decision task, which reduces the effect of post-lexical interference, appeared to be more accurate, as it only revealed a linguistic effect for French. Thus, although French and Italian had equivalent masking effects on French word identification, the nature of their interference was different. This finding suggests that the differences observed between the masking effects of Italian and Irish can be explained at an acoustic level but not at a linguistic level.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号