Speaker Verification Based on Wavelet Packets
Related papers
Wavelet packet approximation of critical bands for speaker verification
International Journal of Speech Technology, 2007
Exploiting the capabilities offered by the plethora of existing wavelets, together with the powerful set of orthonormal bases provided by wavelet packets, we construct a novel wavelet packet-based set of speech features that is optimized for the task of speaker verification. Our approach differs from previous wavelet-based work, primarily in the wavelet-packet tree design that follows the concept of critical bands, as well as in the particular wavelet basis function that has been used. In comparative experiments, we investigate several alternative speech parameterizations with respect to their usefulness for differentiating among human voices. The experimental results confirm that the proposed speech features outperform Mel-Frequency Cepstral Coefficients (MFCC) and previously used wavelet features on the task of speaker verification. A relative reduction of the equal error rate by 15%, 15% and 8% was observed for the proposed speech features, when compared to the wavelet packet features introduced by Farooq and Datta, the MFCC of Slaney, and the subband based cepstral coefficients of Sarikaya et al., respectively.
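The subband-energy front end behind wavelet-packet features can be sketched with a toy decomposition. This is a minimal illustration under stated assumptions, not the authors' design: it uses an orthonormal Haar filter pair (the paper selects a different wavelet basis) and a uniform full tree rather than the critical-band tiling, but it shows how wavelet-packet leaves yield one energy feature per subband.

```python
import math

def haar_step(x):
    # One orthonormal Haar analysis step: low-pass (scaled sums) and
    # high-pass (scaled differences), each half the input length.
    a = [(x[i] + x[i + 1]) / math.sqrt(2) for i in range(0, len(x) - 1, 2)]
    d = [(x[i] - x[i + 1]) / math.sqrt(2) for i in range(0, len(x) - 1, 2)]
    return a, d

def wavelet_packet_leaves(x, depth):
    # Full wavelet-packet tree: unlike the plain DWT, BOTH branches are
    # split at every level, giving 2**depth equal-width subbands.  A
    # critical-band design would instead split the low bands deeper.
    nodes = [x]
    for _ in range(depth):
        nodes = [half for node in nodes for half in haar_step(node)]
    return nodes

def subband_log_energies(x, depth, eps=1e-12):
    # One log-energy per leaf subband -- the raw material from which
    # cepstral-style speaker features are then derived.
    return [math.log(sum(c * c for c in leaf) + eps)
            for leaf in wavelet_packet_leaves(x, depth)]

frame = [math.sin(0.4 * n) + 0.3 * math.sin(2.9 * n) for n in range(256)]
feats = subband_log_energies(frame, 3)   # 8 subband features
```

Because the Haar pair is orthonormal, total energy is preserved across the tree, so the leaf energies partition the frame's energy among the subbands.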
Wavelet packet based speaker verification
2004
In an attempt to find a more appropriate representation of the speech signal for the task of speaker recognition, we study alternative ways to represent the individuality of speakers' voices. A novel wavelet packet based set of speech features, apposite for speaker recognition, is proposed. We exploit the capabilities offered by the plethora of existing wavelets, along with the powerful set of orthonormal bases provided by wavelet packets, which allow effective manipulation of the frequency subbands. Our scheme differs from previous wavelet-based work, primarily in the wavelet-packet tree design, which follows the concept of critical bandwidth, as well as in the particular wavelet basis function that has been used. Our baseline text-independent speaker verification system, which participated in the 2002 NIST Speaker Recognition Evaluation, was used as a platform to study the practical significance of the proposed speech parameters. Comparative experimental results confirm that the proposed speech features outperform MFCC, as well as previously used wavelet features, on the task of speaker verification.
Speaker Recognition – Wavelet Packet Based Multiresolution Feature Extraction Approach
2017
This paper proposes a novel Wavelet Packet based feature extraction approach for the task of text-independent speaker recognition. The features are extracted using a combination of Mel Frequency Cepstral Coefficients (MFCC) and the Wavelet Packet Transform (WPT). The hybrid-feature technique combines the human-ear modelling offered by MFCC with the multi-resolution property and noise robustness of WPT. To check the validity of the proposed approach for text-independent speaker identification and verification, we use the Gaussian Mixture Model (GMM) and Hidden Markov Model (HMM), respectively, as classifiers. The proposed paradigm is tested on the VoxForge speech corpus and the CSTR US KED TIMIT database. The paradigm is also evaluated after adding standard noise signals at different SNR levels to assess noise robustness. Experimental results show improved performance for both speaker identification and speaker verification.
IIUM Engineering Journal, 2022
Speaker recognition is the process of recognizing a speaker from his or her speech. It can be used in many aspects of life, such as remotely accessing a personal device, securing voice-controlled access, and forensic investigation. In speaker recognition, extracting features from the speech is the most critical process. The features represent the speech as unique characteristics that distinguish speech samples from one another. In this research, we propose a combination of Wavelet and Mel Frequency Cepstral Coefficient (MFCC) features, Wavelet-MFCC, as the feature extraction method, and a Hidden Markov Model (HMM) as the classifier. The speech signal is first decomposed with a one-level Wavelet transform, then only the sub-band detail coefficients are used as input for further extraction with MFCC. The modeled system was applied to 300 speech samples from 30 speakers uttering “HADIR” in the Indonesian language. K-fold cross-validation is implemented with fiv...
Automatic speaker identification by means of Mel cepstrum, wavelets and wavelet packets
2000
The present work consists of the use of Delta Cepstral Coefficients on the Mel scale, together with Wavelet and Wavelet Packet Transforms, to feed a neural-network-based system for automatic speaker identification. Different alternatives are tested for the neural-net classifier, achieving very good performance for closed groups of speakers in a text-independent setting. When a single neural net is used for all the speakers, the results decay abruptly as the number of speakers to identify increases. This leads to a system with one neural net per speaker, which provided excellent results compared with those reported in the bibliography using other methods. This classifier structure has other advantages; for example, adding a new speaker to the system only requires training a net for the speaker in question, in contrast with a system whose classifier is a single large net, which in general must be completely retrained.
Speaker Identification Using Discrete Wavelet Transform
Journal of Computer Science, 2014
This study presents an experimental evaluation of Discrete Wavelet Transforms for use in speaker identification. The features are tested using speech data provided by the CHAINS corpus. The system consists of two stages: a feature extraction stage and an identification stage. Parameters are extracted and used in a closed-set, text-independent speaker identification task. The signals are pre-processed and features are extracted using discrete wavelet transforms. The energy of the wavelet coefficients is used for training the Gaussian Mixture Model. Daubechies wavelets are used and the speech samples are analyzed using 8 levels of decomposition.
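The energy-of-coefficients front end described above can be sketched in a few lines. This is a hedged illustration, not the paper's implementation: it uses a Haar filter instead of the Daubechies family named in the abstract, and 3 levels instead of 8, but the structure is the same — only the approximation branch is split at each level, and the energy of each detail band (plus the final approximation) becomes one feature for the GMM.

```python
import math

def haar_dwt_step(x):
    # One orthonormal Haar analysis step: (approximation, detail),
    # each half the input length.
    a = [(x[i] + x[i + 1]) / math.sqrt(2) for i in range(0, len(x) - 1, 2)]
    d = [(x[i] - x[i + 1]) / math.sqrt(2) for i in range(0, len(x) - 1, 2)]
    return a, d

def dwt_energy_features(x, levels):
    # Plain (non-packet) DWT: only the approximation is decomposed
    # further, yielding one detail band per level plus a final
    # approximation band.  Each band contributes one energy feature.
    feats, a = [], list(x)
    for _ in range(levels):
        a, d = haar_dwt_step(a)
        feats.append(sum(c * c for c in d))   # detail-band energy
    feats.append(sum(c * c for c in a))       # final approximation energy
    return feats

frame = [math.sin(0.25 * n) for n in range(128)]
feats = dwt_energy_features(frame, 3)   # 3 detail energies + 1 approximation
```

With an orthonormal filter the band energies sum to the frame's total energy, which makes them a compact, well-behaved input for GMM training.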
Discrete wavelet transform for automatic speaker recognition
2010
This paper deals with automatic speaker recognition. We consider a context-independent speaker recognition task with a closed set of speakers. We presented in [1] a comparative study of the most frequently used parametrization/classification methods for the Czech language. The Wavelet Transform (WT) is a modern parametrization method successfully used for some signal processing tasks. WT often outperforms parametrizations based on the Fourier Transform, due to its capability to represent the signal precisely in both the frequency and time domains. The main goal of this paper is thus to use and evaluate several Wavelet Transforms in place of the conventional parametrizations previously used for automatic speaker recognition. All experiments are performed on two Czech speaker corpora that contain speech of ten and fifty Czech native speakers, respectively. Three discrete wavelet families with different numbers of coefficients have been used and evaluated: Daubechies, Symlets and Coiflets, with two classifiers: Gaussian Mixture Model (GMM) and Multi-Layer Perceptron (MLP). We show that the recognition accuracy of wavelet parametrizations is very good and sometimes outperforms the best parametrizations presented in our previous work.
Self-Organizing Map Weights and Wavelet Packet Entropy for Speaker Verification
Abstract—With the growing trend toward remote security verification systems for telephone banking, biometric security measures and other remote access applications, Automatic Speaker Verification (ASV) has attracted great attention in recent years. The complexity of an ASV system and its verification time depend on the number of feature vector elements.
Wavelet Based Noise Robust Features for Speaker Recognition
Signal Processing: An International Journal, 2011
Extraction and selection of the best parametric representation of the acoustic signal is the most important task in designing any speaker recognition system. A wide range of possibilities exists for parametrically representing the speech signal, such as Linear Prediction Coding (LPC), Mel Frequency Cepstral Coefficients (MFCC) and others. MFCC are currently the most popular choice for any speaker recognition system, though one of their shortcomings is that the signal is assumed to be stationary within the given time frame; MFCC are therefore unable to analyze non-stationary signals and are not suitable for noisy speech. To overcome this problem, several researchers have used different types of AM-FM modulation/demodulation techniques for extracting features from the speech signal. Other approaches propose the use of wavelet filterbanks for feature extraction. In this paper, a technique for extracting features by combining the above-mentioned approaches is proposed. Features are extracted from the envelope of the signal and then passed through a wavelet filterbank. It is found that the proposed method outperforms the existing feature extraction techniques.
arXiv preprint arXiv:1003.5627, 2010
To improve the performance of speaker identification systems, an effective and robust method is proposed for extracting speech features, capable of operating in noisy environments. Based on the time-frequency multi-resolution property of the wavelet transform, the input speech signal is decomposed into various frequency channels. To capture the characteristics of the signal, the Mel-Frequency Cepstral Coefficients (MFCCs) of the wavelet channels are calculated. Hidden Markov Models (HMMs) were used for the recognition stage, as they recognize speaker features better than Dynamic Time Warping (DTW). Comparison of the proposed approach with the conventional MFCC feature extraction method shows that the proposed method not only effectively reduces the influence of noise, but also improves recognition. A recognition rate of 99.3% was obtained using the proposed feature extraction technique, compared to 98.7% using MFCCs. When the test patterns were corrupted by additive white Gaussian noise at a 20 dB S/N ratio, the recognition rate was 97.3% using the proposed method compared to 93.3% using MFCCs.
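Several of the hybrid approaches above apply MFCC-style processing to wavelet subbands. A minimal MFCC computation is sketched below — naive DFT, triangular mel filterbank, then a DCT-II of the log filterbank energies. It is a simplified sketch, not any paper's front end: the filter counts and frame size are illustrative, and a real implementation would use an FFT plus pre-emphasis, windowing and liftering.

```python
import cmath, math

def power_spectrum(frame):
    # Naive DFT (an FFT would be used in practice); keep |X[k]|^2 for
    # the non-negative frequency bins only.
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2 for k in range(n // 2 + 1)]

def mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def inv_mel(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters whose edges are spaced uniformly on the mel scale.
    edges = [inv_mel(i * mel(sr / 2.0) / (n_filters + 1))
             for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * f / sr) for f in edges]
    banks = []
    for j in range(1, n_filters + 1):
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(bins[j - 1], bins[j]):          # rising slope
            filt[k] = (k - bins[j - 1]) / (bins[j] - bins[j - 1])
        for k in range(bins[j], bins[j + 1]):          # falling slope
            filt[k] = (bins[j + 1] - k) / (bins[j + 1] - bins[j])
        banks.append(filt)
    return banks

def mfcc(frame, sr, n_filters=6, n_ceps=5):
    spec = power_spectrum(frame)
    banks = mel_filterbank(n_filters, len(frame), sr)
    loge = [math.log(sum(w * s for w, s in zip(bank, spec)) + 1e-12)
            for bank in banks]
    # DCT-II decorrelates the log filterbank energies into cepstra.
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(loge)) for c in range(n_ceps)]

frame = [math.sin(0.7 * n) for n in range(64)]
ceps = mfcc(frame, sr=8000)
```

In the hybrid schemes, `frame` would be a wavelet subband (e.g. the detail coefficients of a one-level decomposition) rather than the raw signal, so the cepstra inherit the multi-resolution tiling of the wavelet stage.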