A Novel Approach in Feature Level for Robust Text-Independent Speaker Identification system

One Solution of Extension of Mel-Frequency Cepstral Coefficients Feature Vector for Automatic Speaker Recognition

Information Technology And Control, 2020

This paper considers an extension of the feature vector for automatic speaker recognition. The starting feature vector consisted of 18 mel-frequency cepstral coefficients (MFCCs). It was extended with two additional features derived from the spectrum of the speech signal. The main idea behind this research is that the efficiency of automatic speaker recognition can be increased by constructing a feature vector which tracks the real perceived spectrum of the observed speech. The additional features are based on the energy maxima in the appropriate frequency ranges of the observed speech frames. In the experiments, accuracy and equal error rate (EER) are compared for feature vectors that contain only the 18 MFCCs and for feature vectors that also include the additional features. Recognition accuracy increased by around 3%. The EER values differ less, but the results show that adding the proposed features produced a lower decision threshold. These results indicate t...
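The equal error rate mentioned in this abstract can be illustrated with a small sketch. It assumes similarity scores where a higher value means "more likely the claimed speaker", and it simply scans the observed scores as candidate thresholds; the function name and the score lists are hypothetical and are not taken from the paper.

```python
def equal_error_rate(genuine, impostor):
    """Return (eer, threshold) for similarity scores (higher = better match).

    genuine:  scores of trials where the claimed speaker is the true speaker
    impostor: scores of trials where the claim is false
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        # False rejection rate: genuine trials scoring below the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        # False acceptance rate: impostor trials scoring at or above it.
        far = sum(s >= t for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Approximate the EER by the midpoint where FAR and FRR are closest.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
```

The returned threshold is the operating point at which the two error rates are closest, which is the quantity the abstract refers to as the decision threshold.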

An Efficient Approach for MFCC Feature Extraction for Text Independant Speaker Identification System

This paper presents an efficient noise-robust feature extraction method for a remote speaker identification system. Mel frequency cepstral coefficients (MFCCs) are the most widely used front end in state-of-the-art speaker identification systems. One of the major problems with MFCCs is that their performance deteriorates in the presence of noise. To overcome this problem, we propose an efficient feature extraction technique that combines the MFCCs with the parameters of an all-pole filter (autoregressive model parameters) that characterize the human vocal tract. The system employs a robust speech feature, MFCCAR, modeled by a GMM. Because effective speech enhancement is essential for speaker recognition, we also give an overview of some recent state-of-the-art speech enhancement techniques and investigate their effect on the accuracy of our MFCCAR-based speaker identification system. The TIMIT database, with speech signals from 200 speakers, was used in Matlab simulation. For each speaker, the first four utterances form the training set and one utterance forms the test set. Experimental results show that the proposed methods achieve better performance. The MFCCAR approach provides significant improvements in identification accuracy compared with MFCC, deltaMFCC and PLP in noisy environments. However, MFCCAR requires more time to execute. Regarding the effect of the speech enhancement methods on reverberant speech, the noise-tracking algorithm shows a significant improvement.
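As a rough illustration of the autoregressive part of such a combined feature, the sketch below estimates all-pole (AR) coefficients for one frame with the autocorrelation method and the Levinson-Durbin recursion, then appends them to an MFCC vector. Plain concatenation, the function names, and the model order are assumptions made for illustration; the abstract does not specify how the two parameter sets are combined.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one speech frame."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the AR normal equations; returns (coefficients a[1..order], error).

    The coefficients predict x[n] as sum_k a[k] * x[n - k].
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:  # silent or degenerate frame: stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1 - k * k
    return a[1:], err


def mfcc_plus_ar(mfcc_frame, signal_frame, order=2):
    """Hypothetical combined vector: MFCCs followed by AR coefficients."""
    ar, _ = levinson_durbin(autocorr(signal_frame, order), order)
    return list(mfcc_frame) + ar
```

The MFCC vector itself is taken as given here; in practice it would come from whatever front end the system already uses.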

Use of Mel Frequency Cepstral Coefficients for the Implementation of a Speaker Recognition System

2019

The paper proposes a speaker recognition system which validates a user’s claimed identity using characteristics extracted from their voice. Speaker recognition is one of the most useful and popular biometric recognition techniques in the world, especially in areas where security is a major concern. Direct analysis and synthesis of the complex voice signal is difficult because of the large amount of information the signal contains. Therefore, two digital signal processing steps, feature extraction and feature matching, were introduced to represent the voice signal. Mel-Frequency Cepstral Coefficients (MFCC) extracted from the speech signal were used to represent each speaker, and recognition was carried out using weighted Euclidean distance. The MATLAB R2017b platform was used to implement the feature extraction process. Index Terms – Feature matching, Feature Extraction, MFCC, Euclidean distance.
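Feature matching with a weighted Euclidean distance can be sketched as follows. The abstract does not say how the weights are chosen, so the weights here are a free parameter (inverse per-coefficient variance is one common choice), and the function names and speaker templates are hypothetical.

```python
import math


def weighted_euclidean(x, y, weights):
    """sqrt(sum_i w_i * (x_i - y_i)^2) for two feature vectors."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def identify_by_template(test_vector, templates, weights):
    """Return the enrolled speaker whose reference vector is closest.

    templates: dict mapping speaker name -> reference feature vector
    """
    return min(templates,
               key=lambda spk: weighted_euclidean(test_vector, templates[spk], weights))


templates = {"alice": [1.0, 2.0], "bob": [4.0, 6.0]}
best = identify_by_template([1.5, 2.5], templates, weights=[1.0, 1.0])
```

For verification of a claimed identity rather than identification, the same distance would instead be compared against a threshold for the claimed speaker only.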

Comparison of features extracted using time-frequency and frequency-time analysis approach for text-independent speaker identification

2011 National Conference on Communications (NCC), 2011

This paper compares the feature sets extracted using the time-frequency analysis approach and the frequency-time analysis approach for text-independent speaker identification. The Mel-frequency cepstral coefficient (MFCC) feature set and the Inverted Mel-frequency cepstral coefficient (IMFCC) feature set are extracted using the time-frequency analysis approach. The temporal energy subband cepstral coefficient (TESBCC) feature set is extracted using the frequency-time analysis approach. The time-bandwidth products of the MFCC filter bank and the TESBCC filter bank are compared. The RV coefficient is used to calculate the correlation between the feature sets. Experimental evaluation was conducted on the POLYCOST database with 130 speakers using a Gaussian mixture speaker model. The TESBCC feature set has 9.5% higher average accuracy than the MFCC feature set. It is found that the feature set extracted using the time-frequency analysis approach is practically uncorrelated with the feature set extracted using the frequency-time analysis approach. It is also demonstrated that the IMFCC feature set has an important role in fusion.

Speaker Identification using Mel Frequency Cepstral Coefficient and BPNN

Speech processing has emerged as one of the important application areas of digital signal processing. Research fields in speech processing include speech recognition, speaker recognition, speech synthesis, speech coding, etc. The objective of automatic speaker recognition is to extract, characterize and recognize the information about speaker identity. Feature extraction is the first step in speaker recognition, and researchers have suggested and developed many algorithms for it. In this work, the Mel Frequency Cepstrum Coefficient (MFCC) feature has been used to design a text dependent speaker identification system. A BPNN is used to identify the speaker after training on the MFCC feature set. Some modifications to the existing MFCC feature extraction technique are also suggested to improve speaker recognition efficiency. Information from speech recognition can be used in various ways in state-of-the-art speaker recognition systems. This includes the obvious use of recognized words to enable text-dependent speaker modeling techniques when the words spoken are not given. Furthermore, it has been shown that the choice of words and phones itself can be a useful indicator of speaker identity. Also, recognizer output enables higher-level features, in particular those related to prosodic properties of speech.

Fourier-Bessel based Cepstral Coefficient Features for Text-Independent Speaker Identification

speech.iiit.ac.in

This paper proposes the Fourier-Bessel cepstral coefficients (FBCC) as features for robust text-independent speaker identification. Fourier-Bessel (FB) expansion is used instead of the Fourier transform to represent the signal in the frequency domain. FB expansion can be viewed as a two-dimensional Fourier transform. Changing the kernel of the transform from exponentials to decaying exponentials makes it possible to view the speech signal as a linear sum of decaying exponentials. For signals arising out of acoustic tubes, where the signal is subjected to many damping effects, delays in the different components of the signal are inevitable. Representing such signals using FB coefficients helps in identifying the different components present in the signal. Because Bessel functions have damped sinusoids as basis functions, they represent the random, non-stationary nature of the speech signal more efficiently and are a more natural choice for voiced speech and other natural signals. The vocal tract, modeled as a set of linear acoustic tubes that are cylindrical in shape, can be efficiently modeled using FB expansion because Bessel functions are solutions to cylindrical wave equations. The proposed approach to speaker identification is based on FBCC features and employs Gaussian mixtures to model the speaker characteristics. The speaker models are built from the Fourier-Bessel features derived from the speech samples, as an alternative to Mel-frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC). The Gaussian mixture model is evaluated on the TIMIT database, which consists of 630 speakers with 10 speech utterances per speaker, and on white-noise-corrupted TIMIT signals at SNRs of 50, 40, 30 and 20 dB.
A statistical model such as the Gaussian mixture model (GMM), together with features extracted from the speech signals, builds a unique identity for each person enrolled for speaker identification [1]. The Expectation-Maximization (EM) algorithm is used to find the maximum likelihood solution for a model of the features, so that later speech can be tested against the database of all enrolled speakers. Experimental results show that the FBCC can be used as an alternative feature to LPCC and MFCC, since it can improve the performance of the speaker identification task.
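The test-time side of GMM-based identification can be sketched as below: each enrolled speaker has a mixture model, and a test utterance is assigned to the speaker whose model gives the highest average log-likelihood. The sketch assumes diagonal covariances and model parameters that have already been estimated (for example by EM); the function names and the data layout are hypothetical and work with any cepstral feature, not only FBCC.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood under a mixture.

    gmm: list of (weight, mean_vector, variance_vector) components
    """
    total = 0.0
    for x in frames:
        logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(logs)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(l - peak) for l in logs))
    return total / len(frames)


def identify_by_gmm(frames, speaker_gmms):
    """Return the enrolled speaker whose GMM best explains the test frames."""
    return max(speaker_gmms,
               key=lambda spk: gmm_log_likelihood(frames, speaker_gmms[spk]))
```

Averaging over frames keeps scores comparable across utterances of different length.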

Mel Frequency Cepstral Coefficients Based Text Independent Automatic Speaker Recognition Using Matlab

Speech feature extraction is the most significant step in any automatic speaker recognition system. In the last 60 years a lot of research has gone into the parametric representation of speech features. Several techniques are currently being used for automatic speaker recognition. Yet automatic speaker recognition still remains a challenge, mainly due to variations in a speaker's vocal tract with time and health, varying environmental conditions, differences in the behavior and quality of speech recorders, etc. MFCC is an extensively used technique in automatic speaker recognition. In this paper the performance of the MFCC technique was evaluated in a quiet environment. A speaker database containing 30 male and 30 female speakers was created. Two separate experiments were conducted to evaluate the performance of the MFCC technique when applied to K-means clustering. In the first case the speech features were directly matched. In the second case a VQ codebook was created by clustering the training features of these 60 speakers. A distortion measure based on the minimum Euclidean distance was used for speaker recognition. The failure rate of speaker recognition was found to be 10% in the first case and 14% in the second case. Matlab-7.10.0 was used for this study.
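The VQ codebook experiment can be approximated by the following sketch: a codebook is trained per speaker with plain k-means, and a test utterance is assigned to the speaker whose codebook yields the lowest average distortion. The initialization (evenly spaced training vectors), the fixed iteration count, and all names are assumptions made for illustration, not details from the paper.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors, k, iterations=20):
    """Plain k-means; requires len(vectors) >= k."""
    step = len(vectors) // k
    codebook = [list(vectors[i * step]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda j: sq_dist(v, codebook[j]))
            clusters[nearest].append(v)
        for j, members in enumerate(clusters):
            if members:  # keep the old codeword if a cluster is empty
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def distortion(vectors, codebook):
    """Average squared distance from each vector to its nearest codeword."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)


def identify_by_codebook(test_vectors, codebooks):
    """Return the speaker whose codebook gives the minimum distortion."""
    return min(codebooks, key=lambda spk: distortion(test_vectors, codebooks[spk]))
```

Compared with matching every training vector directly, the codebook reduces storage and matching cost to k codewords per speaker.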

Comparison of feature extraction and normalization methods for speaker recognition using grid-audiovisual database

Indonesian Journal of Electrical Engineering and Computer Science, 2020

In this paper, different feature extraction and feature normalization methods are investigated for speaker recognition. To give a good representation of acoustic speech signals, Power Normalized Cepstral Coefficients (PNCCs) and Mel Frequency Cepstral Coefficients (MFCCs) are employed for feature extraction. Then, to mitigate the effect of the linear channel, Cepstral Mean-Variance Normalization (CMVN) and feature warping are utilized. The paper investigates a text-independent speaker identification system using 16 coefficients from both the MFCC and PNCC features. Eight speakers, two female and six male, are selected from the GRID-Audiovisual database. The speakers are modeled by coupling the Universal Background Model with Gaussian Mixture Models (GMM-UBM) in order to obtain fast scoring and better performance. The system achieves 100% speaker identification accuracy. The results illustrated that PNCCs features have...
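Cepstral mean-variance normalization can be sketched in a few lines. The version below normalizes over a whole utterance (a sliding window is another common variant); the function name and the small `eps` guard against zero variance are assumptions for illustration.

```python
import math


def cmvn(frames, eps=1e-10):
    """Normalize each cepstral coefficient to zero mean and unit variance.

    frames: list of per-frame coefficient vectors from one utterance
    """
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
            for d in range(dims)]
    return [[(f[d] - means[d]) / (stds[d] + eps) for d in range(dims)]
            for f in frames]
```

A fixed linear channel adds a constant offset to every cepstral frame, so subtracting the per-coefficient mean removes it; this is why the normalized output is unchanged when the same offset is added to all frames.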

A new approach to designing a feature extractor in speaker identification based on discriminative feature extraction

Speech Communication, 2001

This paper presents a new framework for designing a feature extractor in a speaker identification system based on the discriminative feature extraction (DFE) method. In order to find the frequency scale appropriate for accurate speaker identification, a mel-cepstral estimation technique using a second-order all-pass warping function is applied to the feature extractor; the frequency warping parameters and the text-independent speaker model parameters are jointly optimized based on a minimum classification error (MCE) criterion. Experimental results show that the frequency scale after optimization is different from traditional Linear/Mel scales and the proposed system outperforms conventional systems in which only the classifier is optimized with the MCE criterion. © 2001 Elsevier Science B.V. All rights reserved.

SPEAKER RECOGNITION WITH ARTIFICIAL NEURAL NETWORKS AND MEL-FREQUENCY CEPSTRAL COEFFICIENTS CORRELATIONS

The problem addressed in this paper is that the classical statistical approach to speaker recognition yields satisfactory results, but at the expense of long training and test utterances. Reducing the length of speaker samples is of great importance in the field of speaker recognition, since the statistical approach, due to its limitations, is usually precluded from use in real-time applications. A novel method of text-independent speaker recognition is proposed which uses only the correlations among MFCCs, computed over selected speech segments of very short length (approximately 120 ms). Three different neural networks, the Multi-Layer Perceptron (MLP), the Steinbuch's Learnmatrix (SLM) and the Self-Organizing Feature Finder (SOFF), are evaluated in a speaker recognition task. The dimensionality-reduction ability of the SOFF paradigm is also discussed.
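One plausible reading of "correlations among MFCCs" is the set of pairwise Pearson correlations between coefficient trajectories within a short segment. The sketch below computes the upper triangle of that correlation matrix as a fixed-length vector that could be fed to a neural network; this interpretation, the function name, and the input layout are assumptions, and the paper's exact definition may differ.

```python
import math


def mfcc_correlations(frames):
    """Pairwise Pearson correlations between MFCC trajectories.

    frames: list of MFCC vectors from one short segment
    Returns the upper triangle (i < j) of the correlation matrix as a flat list.
    """
    n = len(frames)
    dims = len(frames[0])
    cols = [[f[d] for f in frames] for d in range(dims)]
    centered = [[v - sum(c) / n for v in c] for c in cols]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    feats = []
    for i in range(dims):
        for j in range(i + 1, dims):
            dot = sum(a * b for a, b in zip(centered[i], centered[j]))
            denom = norms[i] * norms[j]
            # A constant trajectory has no defined correlation; use 0.0.
            feats.append(dot / denom if denom > 0 else 0.0)
    return feats
```

For D coefficients the result has D(D-1)/2 entries regardless of segment length, which suits a fixed-size network input.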