Similar Articles
 20 similar articles found.
1.
Soundscape ecology evaluates biodiversity and environmental disturbances by investigating the interaction among soundscape components (biological, geophysical, and human-produced sounds) using data collected with autonomous recording units. Current analyses consider the acoustic properties of frequency and amplitude, yielding a variety of metrics, but rarely focus on discriminating among soundscape components. Computational musicologists analyze similar data but consider a third acoustic property, timbre. Here, we investigated the effectiveness of spectral timbral analysis for distinguishing among dominant soundscape components. This process included manually labeling each recording and extracting its spectral timbral features. We then tested classification accuracy with linear and quadratic discriminant analyses on combinations of spectral timbral features. Different spectral timbral feature groups distinguished between biological, geophysical, and human-made sounds in a single field recording. Furthermore, as we tested combinations of spectral timbral features that yielded both high and very low accuracy, we found that the combinations could be ordered to “sift” field recordings by their dominant soundscape component. By using timbre as an additional acoustic property in soundscape analyses, we could classify dominant soundscape components effectively. We propose further investigation into a sifting scheme that may allow researchers to focus on more specific research questions, such as understanding changes in biodiversity, discriminating by taxonomic class, or inspecting weather-related events.
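As a rough illustration of the classification step described above (not the authors' code; the feature names, file names, and label set are assumptions), a discriminant analysis over per-recording spectral timbral features could look like this with scikit-learn:

```python
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.model_selection import cross_val_score

# X: one row per field recording, columns = spectral timbral features
# (e.g. spectral centroid, spread, skewness, kurtosis, flatness, rolloff)
X = np.load("timbral_features.npy")                    # hypothetical file
# y: manually assigned dominant component per recording
y = np.load("dominant_labels.npy", allow_pickle=True)  # e.g. "biophony", "geophony", "anthropophony"

for name, clf in [("LDA", LinearDiscriminantAnalysis()),
                  ("QDA", QuadraticDiscriminantAnalysis())]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: mean 5-fold accuracy = {acc:.2f}")
```

Comparing cross-validated accuracies across different feature subsets mirrors the idea of ordering feature combinations to sift recordings by dominant component.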

2.
Auditory and visual signals generated by a single source tend to be temporally correlated, such as the synchronous sounds of footsteps and the limb movements of a walker. Continuous tracking and comparison of the dynamics of auditory-visual streams is thus useful for the perceptual binding of information arising from a common source. Although language-related mechanisms have been implicated in the tracking of speech-related auditory-visual signals (e.g., speech sounds and lip movements), it is not well known what sensory mechanisms generally track ongoing auditory-visual synchrony for non-speech signals in a complex auditory-visual environment. To begin to address this question, we used music and visual displays that varied in the dynamics of multiple features (e.g., auditory loudness and pitch; visual luminance, color, size, motion, and organization) across multiple time scales. Auditory activity (monitored using auditory steady-state responses, ASSR) was selectively reduced in the left hemisphere when the music and dynamic visual displays were temporally misaligned. Importantly, ASSR was not affected when attentional engagement with the music was reduced, or when visual displays presented dynamics clearly dissimilar to the music. These results appear to suggest that left-lateralized auditory mechanisms are sensitive to auditory-visual temporal alignment, but perhaps only when the dynamics of auditory and visual streams are similar. These mechanisms may contribute to correct auditory-visual binding in a busy sensory environment.

3.
Vocal acquisition in songbirds and humans shows many similarities, one of which is that both involve a combination of experience and perceptual predispositions. Among languages some speech sounds are shared, while others are not. This could reflect a predisposition in young infants for learning some speech sounds over others, which combines with exposure-based learning. Similarly, in songbirds, some sounds are common across populations, while others are more specific to populations or individuals. We examine whether this is also due to perceptual preferences for certain within-species element types in naive juvenile male birds, and how such preferences interact with exposure to guide subsequent song learning. We show that young zebra finches lacking previous song exposure perceptually prefer songs with more common zebra finch song element types over songs with less common elements. Next, we demonstrate that after subsequent tutoring, birds prefer tutor songs regardless of whether these contain more common or less common elements. In adulthood, birds tutored with more common elements showed a higher song similarity to their tutor song, indicating that the early bias influenced song learning. Our findings help to understand the maintenance of similarities and the presence of differences among birds' songs, their dialects and human languages.

4.
Crossmodal associations may arise at neurological, perceptual, cognitive, or emotional levels of brain processing. Higher-level crossmodal correspondences between musical timbre and visual colour have been investigated previously, though with limited sets of colours. We developed a novel response method that employs a tablet interface to navigate the CIE Lab colour space. The method was used in an experiment where 27 film music excerpts were presented to participants (n = 22) who continuously manipulated the colour and size of an on-screen patch to match the music. Analysis of the data replicated and extended earlier research, for example, that happy music was associated with yellow, music expressing anger with large red colour patches, and sad music with smaller patches towards dark blue. Correlation analysis suggested patterns of relationships between audio features and colour patch parameters. Using partial least squares regression, we tested models for predicting colour patch responses from audio features and ratings of perceived emotion in the music. Parsimonious models that included emotion robustly explained between 60% and 75% of the variation in each of the colour patch parameters, as measured by cross-validated R². To illuminate the quantitative findings, we performed a content analysis of structured spoken interviews with the participants. This provided further evidence of a significant emotion mediation mechanism, whereby people tended to match colour associations with the perceived emotion in the music. The mixed-method approach of our study gives strong evidence that emotion can mediate crossmodal associations between music and visual colour. The CIE Lab interface promises to be a useful tool in perceptual ratings of music and other sounds.
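A minimal sketch of the modelling step, assuming hypothetical predictor and response matrices (this is not the authors' pipeline; the number of PLS components, fold count, and file names are illustrative):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import r2_score

X = np.load("audio_and_emotion_predictors.npy")  # hypothetical: one row per excerpt
Y = np.load("colour_patch_responses.npy")        # hypothetical: columns L*, a*, b*, patch size

pls = PLSRegression(n_components=3)
Y_hat = cross_val_predict(pls, X, Y, cv=9)       # cross-validated predictions
for i, name in enumerate(["L*", "a*", "b*", "size"]):
    print(name, round(r2_score(Y[:, i], Y_hat[:, i]), 2))   # cross-validated R^2 per parameter
```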

5.
Listening to speech in the presence of other sounds
Although most research on the perception of speech has been conducted with speech presented without any competing sounds, we almost always listen to speech against a background of other sounds, which we are adept at ignoring. Nevertheless, such additional irrelevant sounds can cause severe problems for speech recognition algorithms and for the hard of hearing, as well as posing a challenge to theories of speech perception. The presence of additional sound sources creates a variety of problems: detection of features that are partially masked, allocation of detected features to the appropriate sound sources, and recognition of sounds on the basis of partial information. The separation of sounds is attracting substantial attention in psychoacoustics and in computer science. An effective solution to the problem of separating sounds would have important practical applications.

6.
Understanding foreign speech is difficult, in part because of unusual mappings between sounds and words. It is known that listeners in their native language can use lexical knowledge (about how words ought to sound) to learn how to interpret unusual speech sounds. We therefore investigated whether subtitles, which provide lexical information, support perceptual learning about foreign speech. Dutch participants, unfamiliar with Scottish and Australian regional accents of English, watched Scottish or Australian English videos with Dutch, English or no subtitles, and then repeated audio fragments of both accents. Repetition of novel fragments was worse after Dutch-subtitle exposure but better after English-subtitle exposure. Native-language subtitles appear to create lexical interference, but foreign-language subtitles assist speech learning by indicating which words (and hence sounds) are being spoken.

7.
Azadpour M, Balaban E. PLoS ONE. 2008;3(4):e1966.
Neuroimaging studies of speech processing increasingly rely on artificial speech-like sounds whose perceptual status as speech or non-speech is assigned by simple subjective judgments; brain activation patterns are interpreted according to these status assignments. The naïve perceptual status of one such stimulus, spectrally rotated speech (not consciously perceived as speech by naïve subjects), was evaluated in discrimination and forced identification experiments. Discrimination of variation in spectrally rotated syllables in one group of naïve subjects was strongly related to the pattern of similarities in phonological identification of the same stimuli provided by a second, independent group of naïve subjects, suggesting either that (1) naïve rotated-syllable perception involves phonetic-like processing, or that (2) perception is based solely on physical acoustic similarity, and similar sounds are assigned similar phonetic identities. Analysis of acoustic similarities (Euclidean distances between formant center frequencies) and phonetic similarities in the perception of the vowel portions of the rotated syllables revealed that discrimination was significantly and independently influenced by both acoustic and phonological information. We conclude that simple subjective assessments of artificial speech-like sounds can be misleading, as perception of such sounds may initially and unconsciously engage speech-like, phonological processing.

8.
Thierry G, Giraud AL, Price C. Neuron. 2003;38(3):499-506.
Patient studies suggest that speech and environmental sounds are differentially processed by the left and right hemispheres. Here, using functional imaging in normal subjects, we compared semantic processing of spoken words to equivalent processing of environmental sounds, after controlling for low-level perceptual differences. Words enhanced activation in left anterior and posterior superior temporal regions, while environmental sounds enhanced activation in a right posterior superior temporal region. This left/right dissociation was unchanged by different attentional/working memory contexts, but it was specific to tasks requiring semantic analysis. While semantic processing involves widely distributed networks in both hemispheres, our results support the hypothesis of a dual access route specific for verbal and nonverbal material, respectively.

9.
In recent years, many studies have used speech-related features for speech emotion recognition; however, recent work shows a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstral coefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary-wavelet-transform-based timbral texture features, and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and their glottal waveforms (GW). Particle swarm optimization based clustering (PSOC) and wrapper-based particle swarm optimization (WPSO) were proposed to enhance the discriminative power of the features and to select the discriminating features, respectively. Three different emotional speech databases were used to evaluate the proposed method. An extreme learning machine (ELM) was employed to classify the different types of emotions. Several experiments were conducted, and the results show that the proposed method significantly improves speech emotion recognition performance compared to previous works published in the literature.
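For illustration only, one of the listed feature sets (MFCCs) could be extracted per utterance as below; librosa stands in for whatever toolchain the authors actually used, and the file name is hypothetical:

```python
import numpy as np
import librosa

signal, sr = librosa.load("emotional_speech.wav", sr=None)   # hypothetical utterance
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)      # shape (13, n_frames)
# one fixed-length utterance descriptor: per-coefficient mean and standard deviation
features = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
print(features.shape)  # (26,) -- would be fed to the classifier (e.g. an ELM)
```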

10.

Background

Different sources of sensory information can interact, often shaping what we think we have seen or heard. This can enhance the precision of perceptual decisions relative to those made on the basis of a single source of information. From a computational perspective, there are multiple reasons why this might happen, and each predicts a different degree of enhanced precision. Relatively slight improvements can arise when perceptual decisions are made on the basis of multiple independent sensory estimates, as opposed to just one. These improvements can arise as a consequence of probability summation. Greater improvements can occur if two initially independent estimates are summated to form a single integrated code, especially if the summation is weighted in accordance with the variance associated with each independent estimate. This form of combination is often described as a Bayesian maximum likelihood estimate. Still greater improvements are possible if the two sources of information are encoded via a common physiological process.
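To make the benchmark concrete, the sketch below (toy numbers, not data from this study) computes the combined noise level predicted by variance-weighted maximum-likelihood integration of two cues; probability summation predicts a smaller gain, and sensitivity beyond the MLE bound would point to a shared encoding process:

```python
import numpy as np

sigma_a, sigma_v = 1.0, 1.5   # hypothetical unimodal noise levels (smaller = more precise)

# optimal (variance-weighted) integration: sigma_av^2 = sa^2 * sv^2 / (sa^2 + sv^2)
sigma_mle = np.sqrt(sigma_a**2 * sigma_v**2 / (sigma_a**2 + sigma_v**2))
print(sigma_mle)              # ~0.83, better than the best single cue (1.0)
```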

Principal Findings

Here we show that the provision of simultaneous audio and visual speech cues can result in substantial sensitivity improvements, relative to decisions based on a single sensory modality. The magnitude of the improvements is greater than can be predicted on the basis of either a Bayesian maximum likelihood estimate or probability summation.

Conclusion

Our data suggest that primary estimates of speech content are determined by a physiological process that takes input from both visual and auditory processing, resulting in greater sensitivity than would be possible if initially independent audio and visual estimates were formed and then subsequently combined.

11.
Wong PC, Ciocca V, Chan AH, Ha LY, Tan LH, Peretz I. PLoS ONE. 2012;7(4):e33424.
The strong association between music and speech has been supported by recent research focusing on musicians' superior abilities in second language learning and in neural encoding of foreign speech sounds. However, evidence for a double association, namely the influence of linguistic background on music pitch processing and its disorders, remains elusive. Because languages differ in their usage of elements (e.g., pitch) that are also essential for music, a unique opportunity for examining such language-to-music associations comes from a cross-cultural (linguistic) comparison of congenital amusia, a neurogenetic disorder affecting the music (pitch and rhythm) processing of about 5% of the Western population. In the present study, two populations (Hong Kong and Canada) were compared. One spoke a tone language in which differences in voice pitch correspond to differences in word meaning (in Hong Kong Cantonese, /si/ means 'teacher' and 'to try' when spoken in a high and mid pitch pattern, respectively). Using the On-line Identification Test of Congenital Amusia, we found that Cantonese speakers as a group tend to show enhanced pitch perception ability compared to speakers of Canadian French and English (non-tone languages). This enhanced ability occurs in the absence of differences in rhythmic perception and persists even after relevant factors such as musical background and age were controlled for. Following a common definition of amusia (5% of the population), we found that Hong Kong pitch amusics also show enhanced pitch abilities relative to their Canadian counterparts. These findings not only provide critical evidence for a double association of music and speech, but also argue for the reconceptualization of communicative disorders within a cultural framework. Along with recent studies documenting cultural differences in visual perception, our auditory evidence challenges the common assumption of the universality of basic mental processes and speaks to the domain generality of culture-to-perception influences.

12.

Background

Humans can easily restore a speech signal that is temporally masked by an interfering sound (e.g., a cough masking parts of a word in a conversation), and listeners have the illusion that the speech continues through the interfering sound. This perceptual restoration of human speech is affected by prior experience. Here we provide evidence for perceptual restoration of the complex vocalizations of a songbird, which are acquired by vocal learning in much the same way as humans learn their language.

Methodology/Principal Findings

European starlings were trained in a same/different paradigm to report salient differences between successive sounds. The birds' response latency for discriminating between the members of a stimulus pair is an indicator of the salience of the difference, and these latencies can be used to estimate perceptual distances using multidimensional scaling. For familiar motifs, the birds showed a large perceptual distance when discriminating between song motifs that were muted for brief periods and complete motifs. If the muted periods were filled with noise, the perceptual distance was reduced. For unfamiliar motifs no such difference was observed.
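A hedged sketch of the latency-to-distance step (the exact mapping from latency to dissimilarity below is an illustrative assumption, not the authors' procedure):

```python
import numpy as np
from sklearn.manifold import MDS

latencies = np.load("pairwise_response_latencies.npy")  # hypothetical (n_stimuli, n_stimuli) matrix
dissim = latencies.max() - latencies                     # slower responses -> smaller perceived difference
dissim = 0.5 * (dissim + dissim.T)                       # enforce symmetry
np.fill_diagonal(dissim, 0.0)

# embed the stimuli so that inter-point distances approximate the perceptual distances
coords = MDS(n_components=2, dissimilarity="precomputed").fit_transform(dissim)
```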

Conclusions/Significance

The results suggest that starlings are able to perceptually restore partly masked sounds and, similarly to humans, rely on prior experience. They may be a suitable model to study the mechanism underlying experience-dependent perceptual restoration.

13.
Events encoded in separate sensory modalities, such as audition and vision, can seem to be synchronous across a relatively broad range of physical timing differences. This may suggest that the precision of audio-visual timing judgments is inherently poor. Here we show that this is not necessarily true. We contrast timing sensitivity for isolated streams of audio and visual speech, and for streams of audio and visual speech accompanied by additional, temporally offset, visual speech streams. We find that the precision with which synchronous streams of audio and visual speech are identified is enhanced by the presence of additional streams of asynchronous visual speech. Our data suggest that timing perception is shaped by selective grouping processes, which can result in enhanced precision in temporally cluttered environments. The imprecision suggested by previous studies might therefore be a consequence of examining isolated pairs of audio and visual events. We argue that when an isolated pair of cross-modal events is presented, they tend to group perceptually and to seem synchronous as a consequence. We have revealed greater precision by providing multiple visual signals, possibly allowing a single auditory speech stream to group selectively with the most synchronous visual candidate. The grouping processes we have identified might be important in daily life, such as when we attempt to follow a conversation in a crowded room.

14.
Stilp CE, Kluender KR. PLoS ONE. 2012;7(1):e30845.
To the extent that sensorineural systems are efficient, redundancy should be extracted to optimize the transmission of information, but perceptual evidence for this has been limited. Stilp and colleagues recently reported efficient coding of a robust correlation (r = .97) among complex acoustic attributes (attack/decay, spectral shape) in novel sounds. Discrimination of sounds orthogonal to the correlation was initially inferior but later comparable to that of sounds obeying the correlation. These effects were attenuated for less-correlated stimuli (r = .54) for reasons that are unclear. Here, statistical properties of the correlation among acoustic attributes essential for perceptual organization are investigated. Overall, the simple strength of the principal correlation is inadequate to predict listener performance. The initial superiority of discrimination for statistically consistent sound pairs was relatively insensitive to a decreased physical acoustic/psychoacoustic range of evidence supporting the correlation, and to more frequent presentations of the same orthogonal test pairs. However, an increased range supporting an orthogonal dimension has substantial effects upon perceptual organization. Connectionist simulations and eigenvalues from closed-form calculations of principal components analysis (PCA) reveal that perceptual organization is near-optimally weighted toward shared versus unshared covariance in experienced sound distributions. Implications of reduced perceptual dimensionality for speech perception and plausible neural substrates are discussed.
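A toy demonstration (synthetic attribute values, not the original stimuli) of the closed-form PCA point: the eigenvalue spectrum of the attribute covariance quantifies how much variance is shared along the correlated dimension versus the orthogonal one.

```python
import numpy as np

rng = np.random.default_rng(0)
for r in (0.97, 0.54):                                    # the two correlation conditions
    cov = [[1.0, r], [r, 1.0]]                            # e.g. attack/decay vs. spectral shape
    X = rng.multivariate_normal([0.0, 0.0], cov, size=2000)
    eigvals = np.linalg.eigvalsh(np.cov(X.T))[::-1]       # descending eigenvalues
    print(r, eigvals / eigvals.sum())                     # variance share: shared vs. orthogonal dimension
```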

15.
The sequential organization of sound over time can interact with the concurrent organization of sounds across frequency. Previous studies using simple acoustic stimuli have suggested that sequential streaming cues can retroactively affect the perceptual organization of sounds that have already occurred. It is unknown whether such effects generalize to the perception of speech sounds. Listeners' ability to identify two simultaneously presented vowels was measured in the following conditions: no context, a preceding context stream (precursors), and a following context stream (postcursors). The context stream was comprised of brief repetitions of one of the two vowels, and the primary measure of performance was listeners' ability to identify the other vowel. Results in the precursor condition showed a significant advantage for the identification of the second vowel compared to the no-context condition, suggesting that sequential grouping mechanisms aided the segregation of the concurrent vowels, in agreement with previous work. However, performance in the postcursor condition was significantly worse compared to the no-context condition, providing no evidence for an effect of stream segregation, and suggesting a possible interference effect. Two additional experiments involving inharmonic (jittered) vowels were performed to provide additional cues to aid retroactive stream segregation; however, neither manipulation enabled listeners to improve their identification of the target vowel. Taken together with earlier studies, the results suggest that retroactive streaming may require large spectral differences between concurrent sources and thus may not provide a robust segregation cue for natural broadband sounds such as speech.

16.
Virtually every human faculty engages in imitation. One of the most natural and least explored objects for the study of the mimetic elements in language is the onomatopoeia, as it implies an imitation-driven transformation of a sound of nature into a word. Notably, simple sounds are transformed into complex strings of vowels and consonants, making it difficult to identify what is acoustically preserved in this operation. In this work we propose a definition of vocal imitation by which sounds are transformed into the speech elements that minimize their spectral difference within the constraints of the vocal system. In order to test this definition, we use a computational model that allows us to recover anatomical features of the vocal system from experimental sound data. We explore the vocal configurations that best reproduce non-speech sounds, like striking blows on a door or the sharp sounds generated by pressing on light switches or computer mouse buttons. From the anatomical point of view, the configurations obtained are readily associated with co-articulated consonants, and we show perceptual evidence that these consonants are positively associated with the original sounds. Moreover, the vowel-consonant pairs that compose these co-articulations correspond to the most stable syllables found in the knock and click onomatopoeias across languages, suggesting a mechanism by which vocal imitation naturally embeds single sounds into more complex speech structures. Other mimetic forces have received extensive attention from the scientific community, such as cross-modal associations between speech and visual categories. The present approach helps build a global view of the mimetic forces acting on language and opens a new avenue for the quantitative study of word formation in terms of vocal imitation.
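As a much-simplified stand-in for the articulatory model (no vocal-tract simulation; the file names and candidate syllable set are hypothetical), the core idea of picking the speech element with the smallest spectral difference can be sketched as:

```python
import numpy as np
import librosa

def mel_profile(path, sr=16000, n_mels=40):
    """Average log-mel spectrum of a recording, unit-normalized."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    profile = np.log(mel.mean(axis=1) + 1e-10)
    return profile / np.linalg.norm(profile)

target = mel_profile("door_knock.wav")                        # hypothetical non-speech sound
candidates = {s: mel_profile(f"{s}.wav")                      # hypothetical consonant-vowel recordings
              for s in ["ta", "ka", "pa", "to", "ko"]}
best = min(candidates, key=lambda s: np.linalg.norm(target - candidates[s]))
print(best)   # candidate syllable with the smallest spectral difference
```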

17.
Expectations and prior knowledge can strongly influence our perception. In vision research, such top-down modulation of perceptual processing has been extensively studied using ambiguous stimuli, such as reversible figures. Here, we propose a novel method to address this issue in the auditory modality during speech perception by means of mondegreens and soramimi, which represent song lyrics with the potential for misperception within one language or across two languages, respectively. We demonstrate that such phenomena can be induced by visual presentation of the alternative percept and occur with sufficient probability to exploit them in neuroscientific experiments. Song familiarity did not influence the occurrence of such altered perception, indicating that this tool can be employed irrespective of the participants' knowledge of music. On the other hand, previous knowledge of the alternative percept had a strong impact on the strength of altered perception, which is in line with frequent reports that these phenomena can have long-lasting effects. Finally, we demonstrate that the strength of changes in perception correlated with the extent to which they were experienced as amusing, as well as with the vocabulary of the participants as a source of potential interpretations. These findings suggest that such perceptual phenomena might be linked to the pleasant experience of resolving ambiguity, which is in line with Hermann von Helmholtz's long-standing theory that perception and problem-solving recruit similar processes.

18.
Audio plays an important role in today's growing volume of digital content, creating a need for methodologies that automatically analyze such content: audio event recognition for home automation and surveillance systems, speech recognition, music information retrieval, multimodal analysis (e.g., audio-visual analysis of online videos for content-based recommendation), and more. This paper presents pyAudioAnalysis, an open-source Python library that provides a wide range of audio analysis procedures, including feature extraction, classification of audio signals, supervised and unsupervised segmentation, and content visualization. pyAudioAnalysis is licensed under the Apache License and is available on GitHub (https://github.com/tyiannak/pyAudioAnalysis/). Here we present the theoretical background behind the implemented methodologies, along with evaluation metrics for some of the methods. pyAudioAnalysis has already been used in several audio analysis research applications: smart-home functionality through audio event detection, speech emotion recognition, depression classification based on audio-visual features, music segmentation, multimodal content-based movie recommendation, and health applications (e.g., monitoring eating habits). The feedback from these audio applications has led to practical enhancements of the library.
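A minimal usage sketch of the library's short-term feature extraction; module and function names follow the current API and have changed across releases (older versions used audioFeatureExtraction.stFeatureExtraction), and the input file is hypothetical:

```python
from pyAudioAnalysis import audioBasicIO, ShortTermFeatures

sampling_rate, signal = audioBasicIO.read_audio_file("example.wav")  # hypothetical file
signal = audioBasicIO.stereo_to_mono(signal)

# 50 ms frames with a 25 ms hop (window and step are given in samples)
features, feature_names = ShortTermFeatures.feature_extraction(
    signal, sampling_rate, 0.050 * sampling_rate, 0.025 * sampling_rate)

print(features.shape)      # (n_features, n_frames)
print(feature_names[:4])   # e.g. zcr, energy, energy_entropy, spectral_centroid
```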

19.
Pitch is one of the most important features of natural sounds, underlying the perception of melody in music and prosody in speech. However, the temporal dynamics of pitch processing are still poorly understood. Previous studies suggest that the auditory system uses a wide range of time scales to integrate pitch-related information and that the effective integration time is both task- and stimulus-dependent. None of the existing models of pitch processing can account for such task- and stimulus-dependent variations in processing time scales. This study presents an idealized neurocomputational model, which provides a unified account of the multiple time scales observed in pitch perception. The model is evaluated using a range of perceptual studies, which have not previously been accounted for by a single model, and new results from a neurophysiological experiment. In contrast to other approaches, the current model contains a hierarchy of integration stages and uses feedback to adapt the effective time scales of processing at each stage in response to changes in the input stimulus. The model has features in common with a hierarchical generative process and suggests a key role for efferent connections from central to sub-cortical areas in controlling the temporal dynamics of pitch processing.

20.
Significant scientific and translational questions remain in auditory neuroscience surrounding the neural correlates of perception. Relating perceptual and neural data collected from humans can be useful; however, human-based neural data are typically limited to evoked far-field responses, which lack anatomical and physiological specificity. Laboratory-controlled preclinical animal models offer the advantage of comparing single-unit and evoked responses from the same animals. This ability provides opportunities to develop invaluable insight into the proper interpretation of evoked responses, which benefits both basic-science studies of neural mechanisms and translational applications, e.g., diagnostic development. However, these comparisons have been limited by a disconnect between the types of spectrotemporal analyses used with single-unit spike trains and with evoked responses, a disconnect that arises because the two response types are fundamentally different (point-process versus continuous-valued signals) even though the responses themselves are related. Here, we describe a unifying framework to study temporal coding of complex sounds that allows spike-train and evoked-response data to be analyzed and compared using the same advanced signal-processing techniques. The framework uses a set of peristimulus-time histograms computed from single-unit spike trains in response to polarity-alternating stimuli to allow advanced spectral analyses of both the slow (envelope) and rapid (temporal fine structure) response components. Demonstrated benefits include: (1) novel spectrally specific temporal-coding measures that are less confounded by distortions due to hair-cell transduction, synaptic rectification, and neural stochasticity than previous metrics such as the correlogram peak height; (2) spectrally specific analyses of spike-train modulation coding (magnitude and phase), which can be directly compared to modern perceptually based models of speech intelligibility (e.g., those that depend on modulation filter banks); and (3) superior spectral resolution in analyzing the neural representation of nonstationary sounds, such as speech and music. This unifying framework significantly expands the potential of preclinical animal models to advance our understanding of the physiological correlates of perceptual deficits in real-world listening following sensorineural hearing loss.
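One common way to realize the polarity-alternating PSTH idea (a sketch of the general sum/difference approach, not necessarily the paper's exact metrics): the summed PSTH across the two stimulus polarities emphasizes the polarity-tolerant envelope response, while the difference emphasizes the polarity-sensitive temporal fine structure, and each can then be analyzed spectrally:

```python
import numpy as np

def psth(spike_trains, bin_width, duration):
    """Peristimulus-time histogram (spikes/s) from a list of per-trial spike-time arrays."""
    edges = np.arange(0.0, duration + bin_width, bin_width)
    counts = sum(np.histogram(times, edges)[0] for times in spike_trains)
    return counts / (len(spike_trains) * bin_width)

def envelope_and_tfs_spectra(psth_pos, psth_neg, bin_width):
    """Sum/difference decomposition of PSTHs to opposite stimulus polarities."""
    env = 0.5 * (psth_pos + psth_neg)      # polarity-tolerant (envelope) component
    tfs = 0.5 * (psth_pos - psth_neg)      # polarity-sensitive (fine-structure) component
    freqs = np.fft.rfftfreq(len(env), d=bin_width)
    return freqs, np.abs(np.fft.rfft(env)), np.abs(np.fft.rfft(tfs))
```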
