FPGA Implementation for GMM-Based Speaker Identification

Speaker identification system based on FPGA

2005 12th IEEE International Conference on Electronics, Circuits and Systems, 2005

Speaker identification is the process of identifying persons from their voice. Speaker-specific characteristics exist in speech signals because different speakers have different vocal tract resonances, and these can be exploited by extracting feature vectors such as Mel frequency cepstral coefficients (MFCCs) from the speech signal. The Gaussian Mixture Model (GMM), a well-known statistical model, then models the distribution of each speaker's MFCCs in a multidimensional acoustic space. The GMM-based speaker identification system has features that make it promising for hardware acceleration. This paper describes the hardware implementation of the classification stage of a text-independent GMM-based speaker identification system. A speed-up factor of 90 was achieved compared to a software-based implementation on a standard PC.
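As a rough illustration of the classification stage that such a system accelerates, the sketch below scores a sequence of feature frames against per-speaker GMMs and picks the best-scoring speaker. It is a minimal software sketch, not the paper's hardware design: diagonal covariances, the function names, and the toy two-dimensional "MFCC" values are all assumptions made for the example.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        total += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total

def gmm_log_likelihood(frames, weights, means, variances):
    """Sum over frames of log(sum_k w_k * N(x | mean_k, var_k))."""
    score = 0.0
    for x in frames:
        comps = [math.log(w) + log_gaussian_diag(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        top = max(comps)  # log-sum-exp trick for numerical stability
        score += top + math.log(sum(math.exp(c - top) for c in comps))
    return score

def identify(frames, speaker_models):
    """Return the name of the speaker model with the highest log-likelihood."""
    return max(speaker_models,
               key=lambda name: gmm_log_likelihood(frames, *speaker_models[name]))

# Hypothetical models: (weights, means, variances), values invented for the demo.
models = {
    "alice": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "bob":   ([0.5, 0.5], [[5.0, 5.0], [6.0, 6.0]], [[1.0, 1.0], [1.0, 1.0]]),
}
frames = [[0.2, -0.1], [0.9, 1.1], [0.4, 0.6]]
print(identify(frames, models))  # prints "alice"
```

The inner loop over frames and mixture components is independent per frame and per speaker, which is one reason this kind of scoring is a plausible candidate for parallel hardware.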

Real-Time Speaker Verification System Implemented on Reconfigurable Hardware

Journal of Signal Processing Systems, 2013

Nowadays, biometrics is considered a promising solution in the market of security and personal verification. Applications such as financial transactions, law enforcement or network management security are already benefitting from this technology. Among the different biometric modalities, speaker verification represents an accurate and efficient way of authenticating a person's identity by analyzing his/her voice. This identification method is especially suitable in real-life scenarios or when remote recognition over the phone is required. Processing a voice signal to extract the unique features that distinguish an individual, in order to confirm or deny his/her identity, usually has a high computational cost. Because of this complexity, many systems based on microprocessors clocked at hundreds of MHz are unable to process voice samples in real time. This drawback matters because, in general, the response time of a biometric system affects its acceptability by users. FPGA (Field Programmable Gate Array) based design is a suitable way to implement systems that require high computational capability and real-time execution of algorithms. Besides, these devices allow the design of complex digital systems with outstanding performance in terms of execution time. This paper presents the implementation of an MFCC (Mel-Frequency Cepstrum Coefficients)-SVM (Support Vector Machine) speaker verification system based on a low-cost FPGA. Experimental results show that our system is able to verify a person's identity as fast as a high-performance microprocessor based on a Pentium IV personal computer.
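To make the MFCC-SVM verification idea concrete, the following sketch evaluates an SVM decision function on each feature frame and accepts the claimed identity when the average score exceeds a threshold. The abstract does not state the kernel, the trained parameters, or how frame scores are combined, so the RBF kernel, the averaging rule, and every numeric value here are assumptions for illustration only.

```python
import math

def rbf_kernel(a, b, gamma):
    """Radial basis function kernel exp(-gamma * ||a - b||^2) (assumed kernel)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """Standard SVM decision value: sum_i coef_i * K(sv_i, x) + bias."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias

def verify(frames, support_vectors, dual_coefs, bias, gamma, threshold=0.0):
    """Accept the claim if the mean per-frame decision value exceeds threshold."""
    scores = [svm_decision(x, support_vectors, dual_coefs, bias, gamma)
              for x in frames]
    return sum(scores) / len(scores) > threshold

# Hypothetical trained model: one support vector per class, invented values.
support_vectors = [[0.0, 0.0], [4.0, 4.0]]
dual_coefs = [1.0, -1.0]   # positive for the claimed speaker, negative for impostors
bias, gamma = 0.0, 0.5

genuine_frames = [[0.1, 0.2], [-0.3, 0.1]]
impostor_frames = [[3.8, 4.1], [4.2, 3.9]]
print(verify(genuine_frames, support_vectors, dual_coefs, bias, gamma))   # prints True
print(verify(impostor_frames, support_vectors, dual_coefs, bias, gamma))  # prints False
```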

Use of Mel Frequency Cepstral Coefficients for the Implementation of a Speaker Recognition System

2019

The paper proposes a speaker recognition system that validates a user's claimed identity using characteristics extracted from their voice. It is one of the most useful and popular biometric recognition techniques in the world, especially in areas where security is a major concern. Direct analysis and synthesis of the complex voice signal is difficult because of the large amount of information contained in the signal. Therefore, the digital signal processing steps of feature extraction and feature matching were introduced to represent the voice signal. Mel-Frequency Cepstral Coefficients (MFCC) were extracted from the speech signal to represent each speaker, and recognition was carried out using weighted Euclidean distance. The MATLAB R2017b platform was used to implement the feature extraction process. Index Terms – Feature matching, Feature Extraction, MFCC, Euclidean distance.
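The feature-matching step can be sketched as a nearest-template search under a weighted Euclidean distance. The abstract does not say how the weights are chosen, so the per-coefficient weights, the template vectors, and the function names below are hypothetical; the paper itself used MATLAB, and Python is used here only for illustration.

```python
import math

def weighted_euclidean(x, y, weights):
    """sqrt(sum_i w_i * (x_i - y_i)^2) for equal-length vectors."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))

def closest_speaker(test_vec, templates, weights):
    """Return the enrolled speaker whose template is nearest to test_vec."""
    return min(templates,
               key=lambda name: weighted_euclidean(test_vec, templates[name], weights))

# Hypothetical enrolled templates (e.g. mean MFCC vectors) and weights.
templates = {"spk1": [1.0, 2.0, 3.0], "spk2": [4.0, 0.0, 1.0]}
weights = [1.0, 0.5, 2.0]
test_vec = [1.2, 1.5, 2.9]
print(closest_speaker(test_vec, templates, weights))  # prints "spk1"
```

For verification rather than identification, the distance to the claimed speaker's template would typically be compared against a threshold instead of against other templates.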

Text Independent Automatic Speaker Recognition System Using Mel-Frequency Cepstrum Coefficient and Gaussian Mixture Models

Journal of Information Security, 2012

The aim of this paper is to show the accuracy and time results of a text-independent automatic speaker recognition (ASR) system, based on Mel-Frequency Cepstrum Coefficients (MFCC) and Gaussian Mixture Models (GMM), in order to develop a security access-control gate. 450 speakers were randomly extracted from the Voxforge.org audio database. Their utterances were enhanced using spectral subtraction, then MFCC were extracted and these coefficients were statistically analyzed by GMM in order to build each profile. For each speaker two different speech files were used: the first one to build the profile database, the second one to test the system performance. The accuracy achieved by the proposed approach is greater than 96%, and the time spent for a single test run, implemented in MATLAB, is about 2 seconds on a common PC.
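The spectral subtraction step can be sketched as follows: a noise magnitude spectrum is estimated from frames assumed to contain no speech and then subtracted bin by bin from each frame. This is the basic magnitude-domain form with a spectral floor; the floor value, the function names, and the toy spectra are assumptions, and the abstract does not specify which variant the authors used.

```python
def estimate_noise(noise_frames):
    """Average magnitude per frequency bin over frames assumed to be speech-free."""
    count = len(noise_frames)
    return [sum(bins) / count for bins in zip(*noise_frames)]

def spectral_subtract(frame_mag, noise_mag, floor=0.01):
    """Subtract the noise estimate per bin, never going below floor * original."""
    return [max(m - n, floor * m) for m, n in zip(frame_mag, noise_mag)]

# Toy magnitude spectra with three frequency bins (invented values).
noise_frames = [[1.0, 2.0, 1.0], [3.0, 2.0, 1.0]]
noise_mag = estimate_noise(noise_frames)
cleaned = spectral_subtract([5.0, 2.5, 0.5], noise_mag)
print(noise_mag)
print(cleaned)
```

The floor prevents negative magnitudes in bins where the noise estimate exceeds the observed magnitude, as in the third bin of the toy frame.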

Speaker Recognition Using Gaussian Mixtures Models

Bio-Inspired Applications of Connectionism

Speaker recognition is one of the most popular biometric recognition techniques; it identifies and verifies a speaker from his/her speech data. A speaker recognition system recognizes the speaker by using the speaker's speech signal. It is mainly useful in applications where security is the main concern. Generally, speech is recorded through an air microphone, and the speech collected from various speakers is used as input to the speaker recognition system. Because such recordings are prone to environmental background noise, performance is enhanced by integrating an additional speech signal collected through a throat microphone along with the speech signal collected from a standard air microphone. The resulting signal is very similar to normal speech and is not affected by environmental background noise. This paper focuses on extracting Mel Frequency Cepstral Coefficient (MFCC) features from an air speech signal and a throat speech signal to build Gaussian Mixture Model (GMM) based closed-set text-independent speaker recognition systems, and on reporting the identification results.

Automatic Speaker Recognition Based on Mel-Frequency Cepstral Coefficients and Gaussian Mixture Models

Mehran University Research Journal of Engineering and Technology, 2013

This paper investigates the task of SR (Speaker Recognition) using state-of-the-art techniques. The paper initially presents the technical description of automatic SR, followed by a comparative analysis of a number of methods available for feature extraction and modeling. Based on this analysis, the NIST 2001, NIST 2002, NIST 2004 and NIST 2006 speaker recognition corpora are used to investigate the state-of-the-art feature extraction and modeling techniques. The state-of-the-art technique for feature extraction is delta MFCC (Mel Frequency Cepstral Coefficients) and for modeling is GMM (Gaussian Mixture Models) based on EM (Expectation Maximization). The details of enrollment/training and recognition/testing are also presented. The conventional methods for the different stages of SR systems are summarized.
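Delta MFCCs capture how the cepstral coefficients change over time. A common way to compute them is a regression over neighbouring frames, sketched below. The window size of 2 and the edge handling (repeating the first and last frame) are assumptions for this example; the abstract does not give the exact configuration used.

```python
def delta(features, window=2):
    """Regression-based delta features.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    with out-of-range frames replaced by the nearest edge frame.
    """
    num_frames = len(features)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(num_frames):
        row = []
        for d in range(len(features[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = features[min(t + n, num_frames - 1)][d]
                behind = features[max(t - n, 0)][d]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out

# A one-coefficient track rising by 1.0 per frame: the interior delta is 1.0.
ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]
print(delta(ramp))
```

In practice the delta vector is appended to the static MFCC vector of the same frame before the GMM is trained.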

Speaker Recognition Systems in the Last Decade – A Survey

Engineering and Technology Journal

Speaker recognition is defined as the process of recognizing a person by his/her voice through specific features extracted from his/her voice signal. An Automatic Speaker Recognition (ASP) system is a biometric authentication system. In the last decade, many advances in the speaker recognition field have been attained, along with many techniques in the feature extraction and modeling phases. In this paper, we present an overview of the most recent works in ASP technology. The study discusses several ASP modeling techniques such as the Gaussian Mixture Model (GMM), Vector Quantization (VQ), and clustering algorithms. Several feature extraction techniques, such as Linear Predictive Coding (LPC) and Mel frequency cepstral coefficients (MFCC), are also examined. Finally, as a result of this study, we found that MFCC and GMM could be considered the most successful techniques in the field of speaker recognition so far.
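Of the modeling techniques listed, Vector Quantization is perhaps the simplest to illustrate: each speaker is represented by a codebook of centroid vectors, and a test utterance is assigned to the speaker whose codebook gives the lowest average distortion. The sketch below assumes the codebooks have already been trained (for example by a clustering algorithm); the codebook values and function names are invented for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def average_distortion(frames, codebook):
    """Mean distance from each frame to its nearest codeword."""
    return sum(min(squared_distance(x, c) for c in codebook)
               for x in frames) / len(frames)

def identify_by_vq(frames, codebooks):
    """Return the speaker whose codebook yields the lowest average distortion."""
    return min(codebooks,
               key=lambda name: average_distortion(frames, codebooks[name]))

# Hypothetical pre-trained codebooks with two codewords each.
codebooks = {
    "spk_a": [[0.0, 0.0], [1.0, 1.0]],
    "spk_b": [[5.0, 5.0], [6.0, 4.0]],
}
test_frames = [[0.9, 1.2], [0.1, -0.2]]
print(identify_by_vq(test_frames, codebooks))  # prints "spk_a"
```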

An Automatic Speaker Recognition System

Lecture Notes in Computer Science, 2008

Speaker recognition is the process of identifying a speaker by analyzing the spectral shape of the voice signal. This is done by extracting and matching features of the voice signal. Mel-Frequency Cepstrum Coefficient (MFCC) is the feature extraction technique in which we will get ...

SVM speaker verification system based on a low-cost FPGA

2009 International Conference on Field Programmable Logic and Applications, 2009

Biometric systems, characterized by their high levels of security, are usually based on high-performance microprocessors implemented on personal computers. These advanced devices contain floating-point units able to carry out millions of operations per second at frequencies in the GHz range, and can therefore run the most complex algorithms in just a few hundred milliseconds. However, their main drawbacks are cost and the space required for their associated external peripherals. This disadvantage is especially significant in the low-cost consumer market, where factors such as price and size determine the viability of a product. The use of an FPGA is a suitable way to implement systems that require high computational capability at affordable prices. Besides, these devices allow the design of complex digital systems with outstanding performance in terms of execution time. This paper presents the implementation of an SVM (Support Vector Machine) speaker verification system on a low-cost FPGA. Experimental results show that our system is able to verify a person's identity as fast as a high-performance microprocessor based on a Pentium IV personal computer.

GMM Based Speaker Verification System

International Journal of Engineering Research and, 2015

Speaker verification is a popular biometric technique and is widely used in applications such as speaker authentication. This paper presents a framework for speaker verification that preserves the speaker's privacy. A small speech sample of a speaker is a private form of communication and contains information such as the message conveyed by the words, gender, the language being spoken, emotional state, etc. In this framework, the verification system does not have direct access to the voice input provided by the speaker. The system also protects the privacy of the speaker model it stores, preventing the model from being used to verify the speaker elsewhere. Using a Gaussian mixture model and features extracted from the speech signal, we build a unique identity for each speaker enrolled in the system for later verification under these privacy criteria.