Speaker detection using acoustic event sequences
Related papers
A multiclass framework for Speaker Verification within an Acoustic Event Sequence system
2006
Building acoustic events and analyzing their sequences (the AES system) is a method that has already proved its efficiency. The methodology combines the power of the world-model GMM used in state-of-the-art speaker detection systems, for extracting speaker-independent events, with an analysis of these event sequences via tools usually used in so-called High Level Speaker Detection systems. The efficiency of this system was validated at the last NIST evaluation campaign. This paper proposes a new framework that applies an AES system to multiple classes, C-AES. The originality of this work is to consider that intraclass sequence analysis can bring more information than a global analysis of the whole speaker utterance. This paper also proposes a method to take the a priori knowledge of the classes into account within the scoring process. The results support the fact that intraclass information is discriminant for speaker verification: combining it with a state-of-the-art GMM brings a 12% relative gain in DCF.
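A minimal sketch of the general idea behind such event-sequence analysis, assuming the acoustic events are taken to be the index of the best-scoring world-model (UBM) component per frame and that sequences are summarized by bigram statistics; the paper's exact event definition, intraclass split, and scoring are not reproduced here, and all names below are illustrative.

```python
import numpy as np
from collections import Counter
from sklearn.mixture import GaussianMixture

def event_sequence(features, ubm: GaussianMixture):
    """Label each frame with the index of the most likely UBM component."""
    return ubm.predict(features)                      # shape: (n_frames,)

def bigram_profile(events, n_components):
    """Normalized bigram counts of the event sequence (a 'high-level' feature)."""
    counts = Counter(zip(events[:-1], events[1:]))
    profile = np.zeros((n_components, n_components))
    for (i, j), c in counts.items():
        profile[i, j] = c
    return profile.ravel() / max(len(events) - 1, 1)

# Scoring a test utterance against a target speaker could then amount to comparing
# bigram profiles (e.g. dot product, or an SVM trained on them), possibly per class.
```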
A GMM-based Probabilistic Sequence Kernel for Speaker Verification
This paper describes the derivation of a sequence kernel that transforms speech utterances into probabilistic vectors for classification in an expanded feature space. The sequence kernel is built upon a set of Gaussian basis functions, where half of the basis functions contain speaker-specific information while the other half captures the common characteristics of the competing background speakers. The idea is similar to that in the Gaussian mixture model-universal background model (GMM-UBM) system, except that the Gaussian densities are treated individually in our proposed sequence kernel, as opposed to two mixtures of Gaussian densities in the GMM-UBM system. The motivation is to exploit the individual Gaussian components for better speaker discrimination. Experiments on the NIST 2001 SRE corpus show convincing results for the probabilistic sequence kernel approach.
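A rough sketch of how an utterance can be mapped to a fixed-length probabilistic vector built from individual Gaussian components. It assumes the vector is the per-frame average of component posteriors from a speaker model and a background model; the paper's kernel normalizes posteriors over the combined set of Gaussians and may differ in detail, so treat the construction below as an approximation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

def probabilistic_vector(frames, speaker_gmm: GaussianMixture, ubm: GaussianMixture):
    """Map a variable-length utterance to a fixed-length vector.

    Each dimension is the average posterior of one Gaussian component: the first
    half comes from the speaker-specific model, the second half from the background.
    """
    post = np.hstack([speaker_gmm.predict_proba(frames),   # speaker-specific half
                      ubm.predict_proba(frames)])          # background half
    return post.mean(axis=0)                               # (n_spk + n_bkg,)

# An SVM in this expanded feature space then separates target from impostor vectors:
#   svm = SVC(kernel="linear").fit(train_vectors, labels)
```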
GMM based speaker recognition on readily available databases
2003
In this paper we give an overview of the most popular databases used in speaker recognition evaluation and clearly outline the means of training and testing an ASR system using them. The complete ASR system including both feature extraction and classification is explicitly explained in great detail. The performance of a GMM based system for speaker verification and identification is reported using various forms of MFCC features on the TIMIT, YOHO and ANDOSL databases.
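A minimal sketch of the MFCC-plus-GMM pipeline such a system describes, using librosa for feature extraction and scikit-learn for the Gaussian mixture models; the number of coefficients, mixture size, and thresholding are illustrative choices, not the paper's settings.

```python
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_features(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # (n_frames, n_mfcc)

# Enrollment: one GMM trained on each speaker's MFCC frames.
def enroll(train_feats, n_components=32):
    return GaussianMixture(n_components=n_components, covariance_type="diag").fit(train_feats)

# Identification: pick the enrolled speaker with the highest average log-likelihood.
def identify(test_feats, speaker_gmms):
    return max(speaker_gmms, key=lambda spk: speaker_gmms[spk].score(test_feats))

# Verification: accept if the claimed speaker's log-likelihood exceeds a threshold
# (state-of-the-art systems instead normalize against a background model).
def verify(test_feats, claimed_gmm, threshold):
    return claimed_gmm.score(test_feats) > threshold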
The 2004 MIT Lincoln Laboratory speaker recognition system
2005
The MIT Lincoln Laboratory submission for the 2004 NIST Speaker Recognition Evaluation (SRE) was built upon seven core systems using speaker information from short-term acoustics, pitch and duration prosodic behavior, and phoneme and word usage. These different levels of information were modeled and classified using Gaussian mixture models, support vector machines and N-gram language models, and were combined using a single-layer perceptron fuser. The 2004 SRE used a new multilingual, multi-channel speech corpus that provided a challenging speaker detection task for the above systems. In this paper we describe the core systems used and provide an overview of their performance on the 2004 SRE detection tasks.
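A small sketch of score-level fusion in the spirit of a single-layer perceptron fuser, here approximated by a logistic regression over the per-system scores of each trial; the actual fuser, its training data, and the example numbers below are not from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row holds the scores the core systems produced for one trial
# (e.g. acoustic GMM, cepstral SVM, prosodic, phone n-gram, word n-gram).
trial_scores = np.array([[ 1.2,  0.4, -0.3,  0.8,  0.1],    # illustrative values
                         [-0.7, -1.1,  0.2, -0.4, -0.9]])
labels = np.array([1, 0])                     # 1 = target trial, 0 = impostor trial

# A single linear layer with a sigmoid output (w . s + b), trained on held-out trials.
fuser = LogisticRegression().fit(trial_scores, labels)
fused = fuser.decision_function(trial_scores)  # fused detection scores
```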
Comparative Study of Several Novel Acoustic Features for Speaker Recognition
Proceedings of the First International Conference on Bio-inspired Systems and Signal Processing, 2008
Finding good features that represent speaker identity is an important problem in the speaker recognition area. Recently a number of novel acoustic features have been proposed for speaker recognition. Researchers use different data sets and sometimes different classifiers to evaluate the features and compare them to baselines such as MFCC or LPCC. However, due to different experimental conditions, direct comparison of those features to each other is difficult or impossible. This paper presents a study of five recently proposed acoustic features using the same data (NIST 2001 SRE) and the same UBM-GMM classifier. The results are presented as DET curves with equal error rates indicated. Also, an SVM-based combination of GMM scores produced on different features has been made to determine whether the new features carry any complementary information. The results for different features as well as for their combinations are directly comparable to each other and to those obtained with the baseline MFCC features.
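For reference, the equal error rate reported on such DET curves can be computed from target and impostor scores as the operating point where the false-accept and false-reject rates meet; a minimal sketch:

```python
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(scores, labels):
    """EER: the point where the false-accept rate equals the false-reject rate."""
    fpr, tpr, _ = roc_curve(labels, scores)   # labels: 1 = target, 0 = impostor
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2
```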
Speaker Recognition Systems in the Last Decade – A Survey
Engineering and Technology Journal
Speaker recognition is defined as the process of recognizing a person by his/her voice through specific features extracted from his/her voice signal. Automatic Speaker Recognition (ASP) is a biometric authentication system. In the last decade, many advances in the speaker recognition field have been attained, along with many techniques in the feature extraction and modeling phases. In this paper, we present an overview of the most recent works in ASP technology. The study makes an effort to discuss several ASP modeling techniques such as the Gaussian Mixture Model (GMM), Vector Quantization (VQ), and clustering algorithms. Also, several feature extraction techniques such as Linear Predictive Coding (LPC) and Mel-frequency cepstral coefficients (MFCC) are examined. Finally, as a result of this study, we found that MFCC and GMM could be considered the most successful techniques in the field of speaker recognition so far.
A Novel Methodology for Speaker Verification Using GMM
International Journal of Science and Innovation Engineering and Technology, 2018
Speaker verification is the process of verifying the identity of a claimant. It performs a one-to-one comparison between a newly input voice print and the voice print stored in the database for the claimed identity. In this paper, linear predictive coding coefficients have been used for formant detection. The peak frequencies in the frequency response of the vocal tract are the formants, which are detected and compared for verification. A database of twenty persons, male and female, with five samples per person has been created for analysis of the results. A speaker verification system is usually employed as a "gatekeeper" in order to provide access to a secure system. These systems operate with the user's knowledge and typically require the user's cooperation. The developed system is implemented in MATLAB.
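A sketch of the standard way formants can be estimated from LPC coefficients, i.e. from the angles of the poles of the all-pole vocal-tract model; the paper's system is in MATLAB and its exact analysis settings are not given, so the Python below (autocorrelation LPC, order 12, simple low-frequency pruning) is only an illustration, and the frame is assumed to be pre-emphasized and windowed beforehand.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coefficients(frame, order=12):
    """Autocorrelation-method LPC: solve the normal equations R a = r."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz(r[:order], r[1:order + 1])
    return np.concatenate(([1.0], -a))            # prediction-error polynomial A(z)

def formants(frame, fs, order=12):
    """Formant candidates from the pole angles of 1/A(z)."""
    poles = np.roots(lpc_coefficients(frame, order))
    poles = poles[np.imag(poles) > 0]             # keep one of each conjugate pair
    freqs = np.angle(poles) * fs / (2 * np.pi)    # pole angle -> frequency in Hz
    return np.sort(freqs[freqs > 90])             # drop very low spurious poles
```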
Limited Data Speaker Verification: Fusion of Features
International Journal of Electrical and Computer Engineering (IJECE), 2017
The present work demonstrates an experimental evaluation of speaker verification for different speech feature extraction techniques under the constraint of limited data (less than 15 seconds). State-of-the-art speaker verification techniques provide good performance for sufficient data (greater than 1 minute). It is a challenging task to develop techniques which perform well for speaker verification under limited data conditions. In this work, different features like Mel Frequency Cepstral Coefficients (MFCC), Linear Prediction Cepstral Coefficients (LPCC), Delta (Δ), Delta-Delta (ΔΔ), Linear Prediction Residual (LPR) and Linear Prediction Residual Phase (LPRP) are considered. The performance of individual features is studied and, for better verification performance, a combination of these features is attempted. A comparative study is made between the Gaussian mixture model (GMM) and the GMM-universal background model (GMM-UBM) through experimental evaluation. The experiments are conducted using the NIST-2003 database. The experimental results show that the combination of features provides better performance compared to the individual features. Further, GMM-UBM modeling gives a reduced equal error rate (EER) compared to GMM.

1. INTRODUCTION. Speech signals play a main role in communication media for understanding conversation between people [1]. Speaker recognition is a technique to recognize a speaker using his/her original speech voice and can be used for either speaker verification or speaker identification [2]. Over the last decade, speaker verification has been used for many commercial applications, and these applications prefer limited data conditions; here, limited data indicates speech data of a few seconds (less than 15 sec). Based on the nature of the training and test speech data, text-dependent and text-independent [3] are the two classifications of speaker verification. In text-dependent mode, the speaker training and testing data remain the same; in the text-independent case, the training and testing speech data are different. Text-independent speaker verification under limited data conditions has always been a challenging task. A speaker verification system contains four stages, namely analysis of speech data, extraction of features, modeling, and testing [4]. The analysis stage analyzes the speaker information using the vocal tract [5], the excitation source [6] and suprasegmental features like duration, accent and modulation [7].
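A small sketch of two of the ingredients described above: frame-level feature fusion (stacking MFCC with its Δ and ΔΔ) and GMM-UBM log-likelihood-ratio scoring. The paper also fuses LPCC, LPR and LPRP and presumably MAP-adapts the speaker model from the UBM; those steps are omitted here, so this is only an illustration of the scoring idea.

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def fused_features(y, sr):
    """Stack MFCC with its delta and delta-delta as one frame-level feature vector."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    delta = librosa.feature.delta(mfcc)
    delta2 = librosa.feature.delta(mfcc, order=2)
    return np.vstack([mfcc, delta, delta2]).T        # (n_frames, 39)

def llr_score(test_feats, speaker_gmm: GaussianMixture, ubm: GaussianMixture):
    """GMM-UBM verification score: speaker log-likelihood minus UBM log-likelihood."""
    return speaker_gmm.score(test_feats) - ubm.score(test_feats)
```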
Speaker and Session Variability in GMM-Based Speaker Verification
IEEE Transactions on Audio, Speech and Language Processing, 2007
We present a corpus-based approach to speaker verification in which maximum likelihood II criteria are used to train a large scale generative model of speaker and session variability which we call joint factor analysis. Enrolling a target speaker consists in calculating the posterior distribution of the hidden variables in the factor analysis model and verification tests are conducted using a new type of likelihood II ratio statistic. Using the NIST 1999 and 2000 speaker recognition evaluation data sets, we show that the effectiveness of this approach depends on the availability of a training corpus which is well matched with the evaluation set used for testing. Experiments on the NIST 1999 evaluation set using a mismatched corpus to train factor analysis models did not result in any improvement over standard methods but we found that, even with this type of mismatch, feature warping performs extremely well in conjunction with the factor analysis model and this enabled us to obtain very good results (equal error rates of about 6.2%).
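The abstract does not spell the model out, but joint factor analysis is commonly written as a decomposition of the speaker- and session-dependent GMM mean supervector; a sketch of that standard form, for orientation:

```latex
% Speaker- and session-dependent mean supervector in joint factor analysis:
%   M = m + Vy + Ux + Dz
% m : speaker-independent (UBM) mean supervector
% V : eigenvoice matrix,      y : speaker factors
% U : eigenchannel matrix,    x : channel (session) factors
% D : diagonal residual,      z : speaker-specific residual factors
M = m + Vy + Ux + Dz
```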