Similar Articles
Found 20 similar articles (search time: 31 ms)
1.
Speaker verification and speech recognition are closely related technologies. Both operate on spoken language captured by a microphone or telephone, and both employ digital signal-processing (DSP) techniques to extract acoustic data and patterns from that input. The principal distinction between speech recognition and speaker verification is functional: the two systems differ markedly in what they do with the speech data once it has been processed.
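As a sketch of the shared DSP front-end described above, the following example frames a waveform and summarises each frame's log power spectrum into a small feature vector. The function name, frame sizes and band pooling are illustrative assumptions, not a description of any particular system:

```python
import numpy as np

def extract_frame_features(signal, sample_rate=8000, frame_ms=25, hop_ms=10, n_bins=20):
    """Illustrative front-end shared by recognition and verification systems:
    slice the waveform into overlapping windowed frames and summarise each
    frame's log power spectrum into a small feature vector."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop_len):
        frame = signal[start:start + frame_len] * np.hamming(frame_len)
        power = np.abs(np.fft.rfft(frame)) ** 2
        # Pool the spectrum into n_bins bands and take the log band energy.
        bands = np.array_split(power, n_bins)
        features.append(np.log([b.sum() + 1e-10 for b in bands]))
    return np.array(features)
```

A recogniser would decode these frame vectors into words, while a verifier would compare them against a stored speaker model.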

2.
In recent legal proceedings, forensic phoneticians were called upon to analyse a tape-recorded message intended for the blackmail of a bank manager following the kidnap of his wife. The brief was to establish the likelihood that the tape recording may have been made by any one of three suspects, samples of whose speech were also made available. The comparison was greatly complicated by voice disguise employed by the speaker who recorded the kidnap tape. This disguise comprised a form of phonation described phonetically as ‘glottal fry’ or vocal ‘creak’. This form of phonation occurs naturally in normal speech, but it has received most attention in relation to voice pathologies; there are few references to its use as a form of voice disguise. This paper discusses the nature of creak and examines its effectiveness as voice disguise. In addition, a method is described for speaker identification despite the disguise. Results indicate that trained listeners, without repeated presentations or instrumentation, are able to match speakers with 65% accuracy when one voice is creaky, compared with 90% accuracy for undisguised voices. Using a Euclidean metric to compare the power spectra of the [s] sound, we find that creaky disguised voices may be correctly matched with the undisguised voice of the same speaker (9 distracters) in 5 cases out of 10. However, when the computer's task is made more similar to the perceptual task, selecting one speaker out of two, it achieves an accuracy of 81%. Implications for forensic phonetics are discussed.
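A minimal sketch of the Euclidean spectral comparison the abstract describes might look as follows; the normalisation step and function names are assumptions, not the authors' exact procedure:

```python
import numpy as np

def spectral_distance(spec_a, spec_b):
    """Euclidean distance between two power spectra, normalised so that
    overall loudness differences do not dominate the comparison."""
    a = np.asarray(spec_a, dtype=float)
    b = np.asarray(spec_b, dtype=float)
    return float(np.linalg.norm(a / np.linalg.norm(a) - b / np.linalg.norm(b)))

def match_speaker(disguised_spec, candidate_specs):
    """Return the index of the candidate [s] spectrum closest to the
    disguised one, mimicking the closed-set matching task."""
    dists = [spectral_distance(disguised_spec, c) for c in candidate_specs]
    return int(np.argmin(dists))
```

In the paper's setup the candidate set would contain the true speaker plus nine distracters; here any list of spectra works.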

3.
Men's voices contain acoustic cues to body size and hormonal status, which have been found to affect women's ratings of speaker size, masculinity and attractiveness. However, the extent to which these voice parameters mediate the relationship between speakers' fitness-related features and listeners' judgments of their masculinity has not yet been investigated.

4.
This paper presents a text-independent speaker verification system based on an online Radial Basis Function (RBF) network referred to as Minimal Resource Allocation Network (MRAN). MRAN is a sequential learning RBF network in which hidden neurons are added or removed as training progresses. LP-derived cepstral coefficients are used as feature vectors during the training and verification phases. The performance of MRAN is compared with other well-known RBF- and Elliptical Basis Function (EBF)-based speaker verification methods in terms of error rates and computational complexity on a series of speaker verification experiments. The experiments use data from 258 speakers from the phonetically balanced continuous speech corpus TIMIT. The results show that MRAN produces error rates comparable to other methods with much less computational complexity.
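The Gaussian-basis mapping that RBF speaker models such as MRAN build on can be sketched as a toy class; this omits MRAN's sequential growing and pruning of hidden neurons, and all names and parameters here are illustrative assumptions:

```python
import numpy as np

class MinimalRBF:
    """Toy fixed-size RBF network: a weighted sum of Gaussian basis
    functions over feature vectors (e.g. cepstral coefficients)."""

    def __init__(self, centres, widths, weights, bias=0.0):
        self.centres = np.asarray(centres, dtype=float)   # (n_hidden, dim)
        self.widths = np.asarray(widths, dtype=float)     # (n_hidden,)
        self.weights = np.asarray(weights, dtype=float)   # (n_hidden,)
        self.bias = bias

    def score(self, x):
        """Verification score for one feature vector x."""
        d2 = np.sum((self.centres - x) ** 2, axis=1)
        phi = np.exp(-d2 / (2.0 * self.widths ** 2))
        return float(self.weights @ phi + self.bias)
```

MRAN would decide, per training sample, whether to add a new centre or prune an inactive one; a fixed network like this only evaluates the mapping.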

5.
Behavioral studies of spoken word memory have shown that context congruency facilitates both word and source recognition, though the level at which context exerts its influence remains equivocal. We measured event-related potentials (ERPs) while participants performed both types of recognition task with words spoken in four voices. Two voice parameters (i.e., gender and accent) varied between speakers, with the possibility that none, one or two of these parameters was congruent between study and test. Results indicated that reinstating the study voice at test facilitated both word and source recognition, compared to similar or no context congruency at test. Behavioral effects were paralleled by two ERP modulations. First, in the word recognition test, the left parietal old/new effect showed a positive deflection reflective of context congruency between study and test words. Namely, the same speaker condition provided the most positive deflection of all correctly identified old words. In the source recognition test, a right frontal positivity was found for the same speaker condition compared to the different speaker conditions, regardless of response success. Taken together, the results of this study suggest that the benefit of context congruency is reflected behaviorally and in ERP modulations traditionally associated with recognition memory.

6.
The most significant developments impacting on the speaker verification sector appear to be the surge in support for speech recognition technology and the convergence of the telecom and datacom markets. In this survey, Btt looks at how the technology and markets are changing and examines the issues the sector is facing.

7.
This paper compares kernel-based probabilistic neural networks for speaker verification based on 138 speakers of the YOHO corpus. Experimental evaluations using probabilistic decision-based neural networks (PDBNNs), Gaussian mixture models (GMMs) and elliptical basis function networks (EBFNs) as speaker models were conducted. The original training algorithm of PDBNNs was also modified to make PDBNNs appropriate for speaker verification. Results show that the equal error rate obtained by PDBNNs and GMMs is less than that of EBFNs (0.33% vs. 0.48%), suggesting that GMM- and PDBNN-based speaker models outperform the EBFN ones. This work also finds that the globally supervised learning of PDBNNs is able to find decision thresholds that not only keep the false acceptance rates at a low level but also reduce their variation, whereas the ad-hoc threshold-determination approach used by the EBFNs and GMMs causes a large variation in the error rates. This property makes the performance of PDBNN-based systems more predictable.
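The equal error rates quoted above come from sweeping a decision threshold over verification scores until the false rejection and false acceptance rates coincide. A minimal sketch of that computation, assuming higher scores mean "same speaker":

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Sweep the decision threshold over all observed scores and return the
    operating point where false rejection (genuine trials rejected) and
    false acceptance (impostor trials accepted) are closest to equal."""
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    best_frr, best_far = 1.0, 0.0
    for t in thresholds:
        frr = np.mean(genuine_scores < t)    # genuine trials rejected
        far = np.mean(impostor_scores >= t)  # impostor trials accepted
        if abs(frr - far) < abs(best_frr - best_far):
            best_frr, best_far = frr, far
    return (best_frr + best_far) / 2
```

With perfectly separable score distributions the EER is 0; heavily overlapping distributions push it towards 0.5.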

8.
The most significant developments impacting on the speaker verification sector appear to be the surge in support for speech recognition technology and the convergence of the telecom and datacom markets. In this survey, Btt looks at how the technology and markets are changing and examines the issues the sector is facing.

9.

Background

It is usually possible to identify the sex of a pre-pubertal child from their voice, despite the absence of sex differences in fundamental frequency at these ages. While it has been suggested that the overall spacing between formants (formant frequency spacing, ΔF) is a key component of the expression and perception of sex in children's voices, the effect of its continuous variation on sex and gender attribution has not yet been investigated.

Methodology/Principal findings

In the present study we manipulated voice ΔF of eight-year-olds (two boys and two girls) along continua covering the observed variation of this parameter in pre-pubertal voices, and assessed the effect of this variation on adult ratings of speakers' sex and gender in two separate experiments. In the first experiment (sex identification), adults were asked to categorise each voice as either male or female. The resulting identification function exhibited a gradual slope from male to female voice categories. In the second experiment (gender rating), adults rated the voices on a continuum from “masculine boy” to “feminine girl”, gradually decreasing their masculinity ratings as ΔF increased.

Conclusions/Significance

These results indicate that the role of ΔF in voice gender perception, which has been reported in adult voices, extends to pre-pubertal children's voices: variation in ΔF not only affects the perceived sex, but also the perceived masculinity or femininity of the speaker. We discuss the implications of these observations for the expression and perception of gender in children's voices given the absence of anatomical dimorphism in overall vocal tract length before puberty.

10.
German provider of speaker authentication solutions VOICE.TRUST has formalised its partnership with US company SpeechWorks International. The company, which was launched in 2000, uses SpeechWorks’ SpeechSecure speaker verification technology at the heart of its biometric-based authentication solution VOICE.TRUST Server. This is a short news story only. Visit www.compseconline.com for the latest computer security industry news.

11.
12.
Voices can convey information about a speaker. When forming an abstract representation of a speaker, it is important to extract relevant features from acoustic signals that are invariant to the modulation of these signals. This study investigated the way in which individuals with autism spectrum disorder (ASD) recognize and memorize vocal identity. The ASD group and control group performed similarly in a task when asked to choose the name of the newly-learned speaker based on his or her voice, and the ASD group outperformed the control group in a subsequent familiarity test when asked to discriminate the previously trained voices and untrained voices. These findings suggest that individuals with ASD recognized and memorized voices as well as the neurotypical individuals did, but they categorized voices in a different way: individuals with ASD categorized voices quantitatively based on the exact acoustic features, while neurotypical individuals categorized voices qualitatively based on the acoustic patterns correlated to the speakers' physical and mental properties.

13.
As the applications of mobile and ubiquitous technologies have become more extensive, the communication security of those applications is emerging as the most important concern, and studies of various techniques and system applications for individual security elements are active. In this paper, we propose a new technique that uses voice features to generate mobile one-time passwords (OTPs), producing safe, variable passwords for one-time use from biometric voice information, which can optionally be used for strong personal authentication. We also analysed the availability of the proposed password-generation method with respect to the homomorphic variability of voice feature points, using a dendrogram and the distribution of skip-sampled voice feature points from 15 users, and we describe application cases of the proposed mobile OTP based on skip sampling of the voice signal.
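One plausible shape for such a scheme, offered only as a sketch and not as the authors' algorithm, is to quantise the extracted voice feature points and feed them, together with a moving counter, through an HOTP-style (RFC 4226) truncation; every name and parameter here is an assumption:

```python
import hashlib
import struct

def voice_otp(voice_features, counter, digits=6):
    """Hypothetical derivation of a one-time password from quantised voice
    feature points plus a per-use counter, so the same voice yields a
    different password each time."""
    # Quantise each feature point to one byte (coarse, illustrative only).
    quantised = bytes(int(round(f)) % 256 for f in voice_features)
    digest = hashlib.sha256(quantised + struct.pack(">Q", counter)).digest()
    # Dynamic truncation as in RFC 4226: pick 4 bytes at a digest-derived offset.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)
```

The counter makes each password single-use, while the feature bytes tie it to the speaker; a real system would need a far more robust quantiser to tolerate natural voice variability.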

14.
The search for protein biomarkers has been a highly pursued topic in the proteomics community in the last decade. This relentless search is due to the constant need for validated biomarkers that could facilitate disease risk stratification, disease diagnosis, prognosis, monitoring as well as drug development, which ultimately would improve our quality of life. The recent development of proteomic technologies including the advancement of mass spectrometers with high sensitivity and speed has greatly advanced the discovery of potential biomarkers. One of the bottlenecks lies in the development of well-established verification assays to screen the biomarker candidates identified in the discovery stage. Recently, absolute quantitation using multiple-reaction monitoring mass spectrometry (MRM-MS) in combination with isotope-labeled internal standards has been extensively investigated as a tool for high-throughput protein biomarker verification. In this review, we describe and discuss recent developments and applications of MRM-MS methods for biomarker verification.

15.
Cartei V, Cowles HW, Reby D. PLoS ONE 2012;7(2):e31353

Background

The frequency components of the human voice play a major role in signalling the gender of the speaker. A voice imitation study was conducted to investigate individuals' ability to make behavioural adjustments to fundamental frequency (F0) and formants (Fi) in order to manipulate their expression of voice gender.

Methodology/Principal Findings

Thirty-two native British-English adult speakers were asked to read different types of text (words, sentences, a passage) out loud using their normal voice, and then while sounding as ‘masculine’ and as ‘feminine’ as possible. Overall, the results show that both men and women raised their F0 and Fi when feminising their voice, and lowered their F0 and Fi when masculinising their voice.

Conclusions/Significance

These observations suggest that adult speakers are capable of spontaneous glottal and vocal tract length adjustments to express masculinity and femininity in their voice. These results point to a “gender code”, where speakers make a conventionalized use of the existing sex dimorphism to vary the expression of their gender and gender-related attributes.

16.
In recent years, parametric speakers have been used in various circumstances. However, nothing has yet been demonstrated about the safety of parametric speakers for the human body. Therefore, we studied their effects on physiological functions. Nine male subjects participated in this study. They completed three consecutive sessions: a 20-min quiet period as a baseline, a 45-min mental task period with a general speaker or a parametric speaker, and a 20-min recovery period. We measured electrocardiogram (ECG), photoplethysmogram (PTG), electroencephalogram (EEG), blood pressure (BP), and baroreflex sensitivity (BRS). Two experiments, one with a general speaker (the general condition) and the other with a parametric speaker (the parametric condition), were conducted at the same time of day on separate days. To examine the effects of the parametric speaker, a two-way repeated measures ANOVA (speaker factor and time factor) was conducted. We found that sympathetic nervous activity and the second derivative of the PTG in the task and recovery periods were significantly lower under the parametric condition than under the general condition. Furthermore, Δ parasympathetic nervous activity in the task and recovery periods tended to be smaller under the parametric condition than under the general condition. The results suggest that the parametric speaker imposes a lower burden on physiological functions, especially those of the cardiovascular system, than the general speaker. Furthermore, we verified that the reaction time with the parametric speaker is shorter than that with the general speaker.

17.
In recent years, parametric speakers have been used in various circumstances. In our previous studies, we verified that the physiological burden of the sound of a parametric speaker placed 2.6 m from the subjects was lower than that of a general speaker. However, nothing has yet been demonstrated about the effects of the sound of a parametric speaker at shorter distances between the speaker and the human body. Therefore, we studied this effect on physiological functions and task performance. Nine male subjects participated in this study. They completed three consecutive sessions: a 20-minute quiet period as a baseline, a 30-minute mental task period with general speakers or parametric speakers, and a 20-minute recovery period. We measured electrocardiogram (ECG), photoplethysmogram (PTG), electroencephalogram (EEG), and systolic and diastolic blood pressure. Four experiments, crossing speaker condition (general vs. parametric) with distance condition (0.3 m vs. 1.0 m), were conducted at the same time of day on separate days. To examine the effects of speaker and distance, a three-way repeated measures ANOVA (speaker factor × distance factor × time factor) was conducted. In conclusion, we found that the physiological responses did not differ significantly between the speaker conditions or between the distance conditions, while the physiological burden increased over time independently of speaker and distance conditions. In summary, the effects of the parametric speaker observed at a distance of 2.6 m were not obtained at distances of 1 m or less.

18.
Recognition of personally familiar voices benefits from the concurrent presentation of the corresponding speakers’ faces. This effect of audiovisual integration is most pronounced for voices combined with dynamic articulating faces. However, it is unclear if learning unfamiliar voices also benefits from audiovisual face-voice integration or, alternatively, is hampered by attentional capture of faces, i.e., “face-overshadowing”. In six study-test cycles we compared the recognition of newly-learned voices following unimodal voice learning vs. bimodal face-voice learning with either static (Exp. 1) or dynamic articulating faces (Exp. 2). Voice recognition accuracies significantly increased for bimodal learning across study-test cycles while remaining stable for unimodal learning, as reflected in numerical costs of bimodal relative to unimodal voice learning in the first two study-test cycles and benefits in the last two cycles. This was independent of whether faces were static images (Exp. 1) or dynamic videos (Exp. 2). In both experiments, slower reaction times to voices previously studied with faces compared to voices only may result from visual search for faces during memory retrieval. A general decrease of reaction times across study-test cycles suggests facilitated recognition with more speaker repetitions. Overall, our data suggest two simultaneous and opposing mechanisms during bimodal face-voice learning: while attentional capture of faces may initially impede voice learning, audiovisual integration may facilitate it thereafter.

19.
The nonverbal component of human speech contains information about the speaker himself, which, for example, enables listeners to recognize speakers by their voice. Here we examined to what extent the speaker's body size and shape are betrayed by the speech signal and can thus be recognized by listeners. Contrary to earlier constitutional studies, only size, and not shape, correlates with acoustical parameters of speech; comparing listening experiments with acoustical analysis gives some evidence that listeners use the average sound spectrum to judge a speaker's body size.

20.
The vocal apparatus serves phonation. It is a biocybernetic self-regulating system with a feedback network in the central nervous system, and the larynx is a self-induced vibrating system. Functioning as the phonation organ of the vocal apparatus, the larynx is the source of the human voice. In every individual its frequency range corresponds to about eight semitones in speech and about two octaves of the so-called chest register in singing, also denoted thoracic or modal voice, followed by a further octave of the so-called cranial register, or falsetto voice. We were interested in changes of larynx position during intonation in the fundamental singing registers, modal and falsetto, in professional male singers. Eleven professional male singers participated. We investigated changes in the position of the laryngeal structures with an X-ray apparatus while simultaneously registering the acoustic and mechanical signals with a B & K 4369 acceleration recorder. We found that during phonation in the modal voice the position of the laryngeal structures changes in two different ways, whereas larynx movements in falsetto remain the same. It is suggested that a complex fixation apparatus participates in the phonatory larynx movements; the examination of the entire vocal apparatus presents problems of a similarly complex character. To compile the present knowledge in the field of human voice studies, we made use of the Authorware system to produce interactive multimedia programmes on personal computers.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号