Feature dimensionality reduction through Genetic Algorithms for faster speaker recognition
Related papers
Feature Selection vs. Feature Transformation in Reducing Dimensionality for Speaker Recognition
2008
Mel-Frequency Cepstral Coefficients and their derivatives are commonly used as acoustic features for speaker recognition. Reducing the dimensionality of the feature set leads to more robust estimates of the model parameters, and speeds up the classification task, which is crucial for real-time speaker recognition applications running on low-resource devices. In this paper, a feature selection procedure based on genetic algorithms (GA) is compared to two well-known dimensionality reduction techniques based on linear transforms, namely Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). Evaluation is carried out for two speech databases, containing laboratory read speech and telephone spontaneous speech, and applying a state-of-the-art speaker recognition system. Results with GA-based feature selection suggest that dynamic features are less discriminant than static ones, since the small optimal subsets found by the GA did not include dynamic features. GA-based feature selection outperformed PCA and LDA when dealing with clean speech, but not for telephone speech, probably due to some noise compensation implicit in linear transforms, which cannot be accomplished just by selecting a subset of features.
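For illustration, the following is a minimal sketch of GA-based feature subset selection of the general kind described above, not the paper's system: a binary mask per chromosome, validation accuracy as fitness, and a simple scikit-learn classifier standing in for the speaker recognition back-end. The synthetic data, population size, and GA operators are illustrative assumptions.

```python
# Sketch: GA feature subset selection with a binary mask chromosome.
# All data and GA settings below are illustrative assumptions.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 38))                    # stand-in for 38-dim MFCC + deltas
y = rng.integers(0, 10, size=600)                 # 10 "speakers"
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    clf = LinearDiscriminantAnalysis().fit(X_tr[:, mask == 1], y_tr)
    return clf.score(X_va[:, mask == 1], y_va)    # validation accuracy as fitness

pop = rng.integers(0, 2, size=(20, 38))           # population of binary masks
for generation in range(30):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]       # keep the 10 fittest masks
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(0, 10, 2)]
        cut = rng.integers(1, 37)
        child = np.concatenate([a[:cut], b[cut:]])        # one-point crossover
        flip = rng.random(38) < 0.02                      # bit-flip mutation
        children.append(np.where(flip, 1 - child, child))
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected feature indices:", np.flatnonzero(best))
```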
Feature Selection Based on Genetic Algorithms for Speaker Recognition
The Mel-Frequency Cepstral Coefficients (MFCC) and their derivatives are commonly used as acoustic features for speaker recognition. The issue arises of whether some of those features are redundant or dependent on other features. Probably, not all of them are equally relevant for speaker recognition. Reduced feature sets allow more robust estimates of the model parameters. Also, fewer computational resources are required, which is crucial for real-time speaker recognition applications using low-resource devices. In this paper, we use feature weighting as an intermediate step towards feature selection. Genetic algorithms are used to find the optimal set of weights for a 38-dimensional feature set, consisting of 12 MFCC, their first and second derivatives, energy and its first derivative. To evaluate each set of weights, speaker recognition errors are counted over a validation dataset. Speaker models are based on empirical distributions of acoustic labels, obtained through vector quantization. On average, weighting acoustic features yields between 15% and 25% error reduction in speaker recognition tests. Finally, features are sorted according to their weights, and the K features with greatest average ranks are retained and evaluated. We conclude that combining feature weighting and feature selection makes it possible to reduce costs without degrading performance.
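A minimal sketch of feature weighting as a precursor to selection, in the spirit of the abstract above but with an illustrative back-end: features are scaled by a weight vector before a nearest-centroid "speaker model", a GA searches the weights, and the K highest-weighted features are retained. The data, GA settings, and classifier are assumptions, not the paper's configuration.

```python
# Sketch: GA feature weighting followed by top-K selection (illustrative only).
import numpy as np
from sklearn.neighbors import NearestCentroid
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 38))
y = rng.integers(0, 10, size=600)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=1)

def fitness(w):
    clf = NearestCentroid().fit(X_tr * w, y_tr)   # weights rescale every feature
    return clf.score(X_va * w, y_va)

pop = rng.random((20, 38))                        # real-valued weight chromosomes
for generation in range(30):
    scores = np.array([fitness(w) for w in pop])
    parents = pop[np.argsort(scores)[-10:]]
    children = parents[rng.integers(0, 10, 10)] + rng.normal(0, 0.05, (10, 38))
    pop = np.vstack([parents, np.clip(children, 0.0, 1.0)])

best_w = pop[np.argmax([fitness(w) for w in pop])]
K = 20
top_k = np.argsort(best_w)[-K:]                   # keep the K largest weights
print("retained features:", np.sort(top_k))
print("accuracy with top-K only:", fitness(np.isin(np.arange(38), top_k) * 1.0))
```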
Using Genetic Algorithms to Weight Acoustic Features for Speaker Recognition
2006
The Mel-Frequency Cepstral Coefficients (MFCC) are widely accepted as a suitable representation for speaker recognition applications. MFCC are usually augmented with dynamic features, leading to high-dimensional representations. The issue arises of whether some of those features are redundant or dependent on other features. Probably, not all of them are equally relevant for speaker recognition. In this work, we explore the potential benefit of weighting acoustic features to improve speaker recognition accuracy. Genetic algorithms (GAs) are used to find the optimal set of weights for a 38-dimensional feature set. To evaluate each set of weights, recognition error is measured over a validation dataset. Naive speaker models are used, based on empirical distributions of vector quantizer labels. Weighting acoustic features yields 24.58% and 14.68% relative error reductions in two series of speaker recognition tests. These results provide evidence that further improvements in speaker recognition performance can be attained by weighting acoustic features. They also validate the use of GAs to search for an optimal set of feature weights.
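The "naive" speaker models mentioned above can be sketched as follows: frames are vector-quantized with a shared k-means codebook, each speaker is modeled by the empirical distribution (histogram) of codeword labels, and a test utterance is scored by log-likelihood under each histogram. Codebook size, smoothing, and the synthetic data are illustrative assumptions.

```python
# Sketch: VQ-label histogram speaker models (illustrative, synthetic data).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
n_speakers, codebook_size = 5, 64
train = {s: rng.normal(loc=s, size=(400, 38)) for s in range(n_speakers)}

vq = KMeans(n_clusters=codebook_size, n_init=4, random_state=0)
vq.fit(np.vstack(list(train.values())))           # shared codebook over all frames

def label_histogram(frames):
    labels = vq.predict(frames)
    counts = np.bincount(labels, minlength=codebook_size) + 1.0   # add-one smoothing
    return counts / counts.sum()

models = {s: label_histogram(frames) for s, frames in train.items()}

def recognize(test_frames):
    labels = vq.predict(test_frames)
    scores = {s: np.log(p[labels]).sum() for s, p in models.items()}
    return max(scores, key=scores.get)            # speaker with highest log-likelihood

print(recognize(rng.normal(loc=3, size=(200, 38))))   # expected to print 3
```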
On Feature Selection for Speaker Verification
2002
This paper describes an HMM based speaker verification system, which verifies speakers in their own specific feature space. This 'individual' feature space is determined by a Dynamic Programming (DP) feature selection algorithm. A suitable criterion, correlated with Equal Error Rate (EER) was developed and is used for this feature selection algorithm. The algorithm was evaluated on a text-dependent database. A significant improvement in verification results was demonstrated with the DP selected individual feature space. An EER of 4.8% was achieved when the feature set was the "almost standard" Mel Frequency Cepstrum Coefficients (MFCC) space (12 MFCC + 12 ∆MFCC). Under the same conditions, a system based on the selected feature space yielded an EER of only 2.7%.
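The abstract above uses Equal Error Rate as its criterion; the sketch below shows one common way to estimate an EER from verification scores, by sweeping a threshold over pooled genuine and impostor scores and finding where false-acceptance and false-rejection rates cross. The score distributions here are synthetic assumptions, not the paper's data.

```python
# Sketch: estimating EER from genuine/impostor verification scores.
import numpy as np

rng = np.random.default_rng(3)
genuine = rng.normal(2.0, 1.0, 1000)              # target-speaker trial scores
impostor = rng.normal(0.0, 1.0, 1000)             # impostor trial scores

thresholds = np.sort(np.concatenate([genuine, impostor]))
far = np.array([(impostor >= t).mean() for t in thresholds])   # false acceptance rate
frr = np.array([(genuine < t).mean() for t in thresholds])     # false rejection rate
i = np.argmin(np.abs(far - frr))
print(f"EER ~ {(far[i] + frr[i]) / 2:.3f} at threshold {thresholds[i]:.2f}")
```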
Feature selection for speaker verification using genetic programming
Evolutionary Intelligence, 2017
We present a study examining feature selection from high performing models evolved using Genetic Programming (GP) on the problem of Automatic Speaker Verification (ASV). ASV is a highly unbalanced binary classification problem in which a given speaker must be verified against everyone else. We evolve classification models for 10 individual speakers using a variety of fitness functions and data sampling techniques and examine the generalisation of each model on a 1:9 unbalanced set. A significant difference between train and test performance is found, which may indicate overfitting in the models. Using only the best generalising models, we examine two methods for selecting the most important features. We compare the performance of a number of tuned machine learning classifiers using the full 275 features and a reduced set of 20 features from both feature selection methods. Results show that using only the top 20 features found in high performing GP programs led to test classifications that are as good as, or better than, those obtained using all data in the majority of experiments undertaken. The classification accuracy between speakers varies considerably across all experiments, showing that some speakers are easier to classify than others. This work was carried out as a collaboration of projects funded by Science Foundation Ireland under Grant Numbers 08/SRC/FM1389 and 13/IA/1850.
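One of the feature-selection ideas described above can be sketched as ranking features by how often they appear in the best-generalising evolved programs and retaining the top 20. In this sketch, real GP trees are replaced by lists of feature indices that a hypothetical run might have referenced; the counts and program sizes are assumptions.

```python
# Sketch: ranking features by usage frequency across high-performing GP programs.
from collections import Counter
import random

random.seed(4)
n_features, n_programs = 275, 50
# Stand-in for "features referenced by each high-performing GP program".
best_programs = [random.sample(range(n_features), k=random.randint(5, 30))
                 for _ in range(n_programs)]

usage = Counter(f for program in best_programs for f in program)
top_20 = [f for f, _ in usage.most_common(20)]    # 20 most frequently used features
print("retained feature indices:", sorted(top_20))
```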
Complementary features for speaker verification based on genetic algorithms
2007
Speech recognition systems usually need a feature extraction stage aimed at obtaining the best signal representation. State-of-the-art speaker verification systems are based on cepstral features like MFCC, LFCC or LPCC. In this article, we propose to use a genetic algorithm to provide new features able to complement the LFCCs.
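A heavily simplified sketch of the general idea of evolving a complementary feature: a GA searches the coefficients of one extra feature (here, an assumed linear combination of filter-bank energies), and fitness is classification accuracy when that feature is appended to the base cepstral vector. The feature parameterization, data, and back-end are assumptions, not the authors' method.

```python
# Sketch: evolving one complementary feature to append to base cepstral features.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
fbank = rng.normal(size=(600, 24))                # stand-in filter-bank energies
lfcc = rng.normal(size=(600, 20))                 # stand-in base LFCC features
y = rng.integers(0, 10, size=600)
idx_tr, idx_va = train_test_split(np.arange(600), random_state=5)

def fitness(coeffs):
    extra = fbank @ coeffs                        # candidate complementary feature
    X = np.column_stack([lfcc, extra])
    clf = LinearDiscriminantAnalysis().fit(X[idx_tr], y[idx_tr])
    return clf.score(X[idx_va], y[idx_va])

pop = rng.normal(size=(20, 24))                   # coefficient chromosomes
for generation in range(30):
    scores = np.array([fitness(c) for c in pop])
    parents = pop[np.argsort(scores)[-10:]]
    children = parents[rng.integers(0, 10, 10)] + rng.normal(0, 0.1, (10, 24))
    pop = np.vstack([parents, children])

print("best fitness:", max(fitness(c) for c in pop))
```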
Performance Evaluation of Feature Extraction and Modeling Methods for Speaker Recognition
Annals of Reviews & Research, 2018
In this study, the performance of prominent feature extraction and modeling methods for speaker recognition systems is evaluated on a specifically created database. The main feature of the database is that the subjects are siblings or relatives. After giving basic information about speaker recognition systems, outstanding properties of the methods are briefly mentioned. Linear Predictive Cepstral Coefficients (LPCC) and Mel-Frequency Cepstral Coefficients (MFCC) are used for feature extraction, while Gaussian Mixture Model (GMM) and I-Vector methods are employed for modeling. The best results are sought by varying the parameters of these methods: the number of features for LPCC and MFCC, and the number of mixture components for GMM. The aim of this study is to find out which parameters of the most commonly used methods contribute to their success and, at the same time, to determine the best combination of feature extraction and modeling methods for similar-sounding speakers. This study is also a good resource and guide for researchers in the area of speaker recognition.
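A minimal sketch of GMM-based speaker modeling of the kind evaluated above: one Gaussian mixture per speaker is trained on that speaker's feature vectors, and a test segment is assigned to the speaker whose model gives the highest average log-likelihood. The number of mixture components is one of the parameters such a study varies; the data here are synthetic assumptions.

```python
# Sketch: per-speaker GMMs with maximum-likelihood identification.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
n_speakers, n_mixtures = 4, 8
train = {s: rng.normal(loc=s, scale=1.5, size=(500, 13)) for s in range(n_speakers)}

models = {s: GaussianMixture(n_components=n_mixtures, covariance_type="diag",
                             random_state=0).fit(frames)
          for s, frames in train.items()}

def identify(test_frames):
    scores = {s: m.score(test_frames) for s, m in models.items()}  # mean log-likelihood
    return max(scores, key=scores.get)

print(identify(rng.normal(loc=2, scale=1.5, size=(200, 13))))      # expected to print 2
```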
Feature Selection Based on Genetic Algorithms for On-Line Signature Verification
2007 IEEE Workshop on Automatic Identification Advanced Technologies, 2007
Feature Selection Method for Speaker Recognition using Neural Network
International Journal of Computer Applications, 2014
The aim of this paper is to extract and select features from the speech signal that make it possible to achieve an acceptable speaker recognition rate in real life. A variety of combinations among formants (F1, F2, F3), Linear Predictive Coefficients (LPC), Mel-Frequency Cepstral Coefficients (MFCC) and delta-MFCC features are considered, and their effect on speaker recognition is observed. Two data sets of similar size but with different strings (words) are considered in the present study; they are prepared at two different sampling rates. The study also reveals the interesting fact that the selection of strings in the speaker enrollment process matters for accurate results: the speaker is tested for authentication with the same string with which he was enrolled earlier, at the time of his first access to the system.
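A minimal sketch of the feature-combination idea above: candidate feature blocks (formants, LPC, MFCC, delta-MFCC) are concatenated in different combinations and a small neural network is trained on each combination so that recognition accuracy can be compared. The feature values, block sizes, and network size are assumptions, not the paper's setup.

```python
# Sketch: comparing feature-block combinations with a small MLP classifier.
import numpy as np
from itertools import combinations
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n, y = 800, rng.integers(0, 6, size=800)          # 6 "speakers"
blocks = {"formants": rng.normal(size=(n, 3)),
          "lpc": rng.normal(size=(n, 12)),
          "mfcc": rng.normal(size=(n, 13)),
          "delta_mfcc": rng.normal(size=(n, 13))}

for r in (2, 3):
    for combo in combinations(blocks, r):
        X = np.hstack([blocks[name] for name in combo])
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=7)
        clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300,
                            random_state=7).fit(X_tr, y_tr)
        print(combo, round(clf.score(X_te, y_te), 3))
```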
Information Technology And Control, 2020
One extension of the feature vector for automatic speaker recognition is considered in this paper. The starting feature vector consists of 18 mel-frequency cepstral coefficients (MFCCs). The extension adds two features derived from the spectrum of the speech signal. The main idea behind this research is that the efficiency of automatic speaker recognition can be increased by constructing a feature vector that tracks the actually perceived spectrum of the observed speech. The additional features are based on the energy maxima in the appropriate frequency ranges of the observed speech frames. In experiments, accuracy and equal error rate (EER) are compared for the case when feature vectors contain only the 18 MFCCs and for the cases when the additional features are used. Recognition accuracy increased by around 3%. EER values show smaller differences, but the results show that adding the proposed features produces a lower decision threshold. These results indicate t...
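A minimal sketch of the feature-vector extension described above: for each frame, the magnitude spectrum is searched for energy maxima in two assumed frequency ranges, and those maxima are appended to the 18 MFCCs. The band edges, frame length, and placeholder MFCC front-end are assumptions, not the paper's exact configuration.

```python
# Sketch: appending per-band spectral energy maxima to an 18-MFCC feature vector.
import numpy as np

rng = np.random.default_rng(8)
fs, frame_len = 16000, 512
frames = rng.normal(size=(100, frame_len))        # stand-in speech frames
mfcc = rng.normal(size=(100, 18))                 # stand-in 18 MFCCs per frame

freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
bands = [(300.0, 1000.0), (1000.0, 3400.0)]       # assumed frequency ranges

def band_maxima(frame):
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(frame_len)))
    return [spectrum[(freqs >= lo) & (freqs < hi)].max() for lo, hi in bands]

extra = np.array([band_maxima(f) for f in frames])
extended = np.hstack([mfcc, extra])               # 18 MFCCs + 2 spectral maxima
print(extended.shape)                             # (100, 20)
```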