Continuous speech recognition using support vector machines

Support vector machines for speech recognition

2002

Hidden Markov models (HMMs) with Gaussian mixture observation densities are the dominant approach in speech recognition. These systems typically use a purely representational acoustic model, which is prone to overfitting and does not directly translate into improved discrimination. We propose a new paradigm centered on principles of structural risk minimization, using a discriminative framework for speech recognition based on support vector machines (SVMs). SVMs have the ability to simultaneously optimize the representational and discriminative abilities of the acoustic classifiers. We have developed the first SVM-based large vocabulary speech recognition system that improves performance over traditional HMM-based systems. This hybrid system achieves a state-of-the-art word error rate of 10.6% on a continuous alphadigit task, a 10% improvement relative to an HMM system. On SWITCHBOARD, a large vocabulary task, the system improves performance over a traditional HMM system from a 41.6% word error rate to 40.6%. This dissertation discusses several practical issues that arise when SVMs are incorporated into the hybrid system.

SVMs for Automatic Speech Recognition: A Survey

Lecture Notes in Computer Science, 2007

Hidden Markov Models (HMMs) are, undoubtedly, the most widely employed core technique for Automatic Speech Recognition (ASR). Nevertheless, we are still far from achieving high-performance ASR systems. Some alternative approaches, most of them based on Artificial Neural Networks (ANNs), were proposed during the late eighties and early nineties. Some of them tackled the ASR problem using predictive ANNs, while others proposed hybrid HMM/ANN systems. However, despite some achievements, the preponderance of Markov models remains a fact. During the last decade, however, a new tool appeared in the field of machine learning that has proved able to cope with hard classification problems in several fields of application: Support Vector Machines (SVMs). SVMs are effective discriminative classifiers with several outstanding characteristics, namely: their solution is the one with maximum margin; they are capable of dealing with samples of very high dimensionality; and their convergence to the minimum of the associated cost function is guaranteed. These characteristics have made SVMs very popular and successful. In this chapter we discuss their strengths and weaknesses in the ASR context and review the current state-of-the-art techniques. We organize the contributions in two parts: isolated-word recognition and continuous speech recognition. Within the first part we review several techniques to produce the fixed-dimension vectors needed by original SVMs. Afterwards we explore more sophisticated techniques based on kernels capable of dealing with sequences of different lengths. Among them is the DTAK kernel, simple and effective, which rescues an old technique of speech recognition: Dynamic Time Warping (DTW). Within the second part, we describe some recent approaches to tackling more complex tasks like connected-digit recognition or continuous speech recognition using SVMs. Finally we draw some conclusions and outline several ongoing lines of research.
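As a rough illustration of the sequence-kernel idea this survey describes, the sketch below implements a simplified DTW-style alignment kernel in Python: a Gaussian local kernel scores frame pairs, and a DTW recursion accumulates the best alignment so that sequences of different lengths become comparable. This is a minimal sketch under simplifying assumptions, not the exact DTAK formulation (which uses specific path weights and normalizations); all names are illustrative.

```python
import numpy as np

def dtak_kernel(X, Y, gamma=1.0):
    """Score two frame sequences X (n x d) and Y (m x d) by a DTW-style
    alignment in which the local distance is replaced by a Gaussian kernel,
    so alignments of similar frames accumulate higher scores."""
    n, m = len(X), len(Y)
    # local Gaussian kernel between every pair of frames
    k = np.exp(-gamma * ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1))
    D = np.full((n + 1, m + 1), -np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # accumulate the best of the three DTW moves
            D[i, j] = k[i - 1, j - 1] + max(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)  # normalize by combined length
```

Identical sequences score higher than dissimilar ones, and the kernel is defined for sequences of different lengths, which is exactly what a vanilla SVM on fixed-size input vectors cannot handle.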

Using Support Vector Machines in a HMM based Speech Recognition System

2005

Speech recognition is usually based on Hidden Markov Models (HMMs), which represent the temporal dynamics of speech very efficiently, and Gaussian mixture models, which classify speech into single speech units (phonemes) sub-optimally. In this paper we use Support Vector Machines (SVMs) for classification by integrating this method in an HMM-based speech recognition system. SVMs are very appealing due to their association with statistical learning theory and have already shown very good classification results in other fields of pattern recognition. In our hybrid SVM/HMM system we translate the outputs of the SVM classifiers into conditional probabilities and use them as emission probabilities in an HMM-based decoder using one-state HMMs. We train and test the hybrid system on the DARPA Resource Management (RM1) corpus. Our results show better performance than an HMM-based decoder using Gaussian mixtures.
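The step of translating SVM outputs into conditional probabilities is commonly done with Platt's sigmoid fit on the decision values; a minimal sketch using scikit-learn, whose `SVC(probability=True)` performs Platt scaling internally, might look as follows. The toy 2-D "frames" and class labels are illustrative stand-ins for acoustic features and phoneme targets, not the paper's actual setup.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# toy 2-D "frames" for two "phoneme" classes (illustrative only)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

# probability=True fits Platt's sigmoid on the SVM decision values
svm = SVC(kernel="rbf", probability=True).fit(X, y)
post = svm.predict_proba(X[:5])       # P(class | frame), rows sum to 1
priors = np.bincount(y) / len(y)      # class priors P(class)
scaled_lik = post / priors            # proportional to p(frame | class)
```

Dividing the posteriors by the class priors gives scaled likelihoods, which is the usual way hybrid systems plug discriminative outputs into an HMM decoder's emission slot.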

Hybrid SVM/HMM architectures for speech recognition

2000

In this paper, we describe the use of a powerful machine learning scheme, Support Vector Machines (SVM), within the framework of hidden Markov model (HMM) based speech recognition. The hybrid SVM/HMM system has been developed based on our public domain toolkit. The hybrid system has been evaluated on the OGI Alphadigits corpus and performs at 11.6% WER, as compared to 12.7% with a triphone mixture-Gaussian HMM system, while using only a fifth of the training data used by the triphone system. Several important issues that arise out of the nature of SVM classifiers have been addressed. We are in the process of migrating this technology to large vocabulary recognition tasks like SWITCHBOARD.

Speech recognition with support vector machines in a hybrid system

2005

While the temporal dynamics of speech can be represented very efficiently by Hidden Markov Models (HMMs), the classification of speech into single speech units (phonemes) is usually done with Gaussian mixture models, which do not discriminate well. Here, we use Support Vector Machines (SVMs) for classification by integrating this method in an HMM-based speech recognition system. In this hybrid SVM/HMM system we translate the outputs of the SVM classifiers into conditional probabilities and use them as emission probabilities in an HMM-based decoder. SVMs are very appealing due to their association with statistical learning theory. They have already shown very good classification results in other fields of pattern recognition. We train and test our hybrid system on the DARPA Resource Management (RM1) corpus. Our results show better performance than an HMM-based decoder using Gaussian mixtures.
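In a hybrid decoder of this kind, the scaled SVM posteriors enter a standard Viterbi search as per-frame emission scores. A minimal log-domain Viterbi sketch follows; the shapes and names are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def viterbi(log_emit, log_trans, log_init):
    """log_emit: (T, Q) per-frame log emission scores (in the hybrid system,
    log SVM posteriors minus log class priors); log_trans: (Q, Q) log
    transition matrix; log_init: (Q,) log initial-state scores."""
    T, Q = log_emit.shape
    delta = log_init + log_emit[0]
    back = np.zeros((T, Q), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans   # scores[i, j]: best path ending i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    state = int(delta.argmax())
    path = [state]
    for t in range(T - 1, 0, -1):             # backtrack through the pointers
        state = int(back[t, state])
        path.append(state)
    return path[::-1]
```

The decoder itself is unchanged by the hybrid approach; only the source of `log_emit` differs from a Gaussian-mixture system.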

Mixture of Support Vector Machines for HMM based Speech Recognition

18th International Conference on Pattern Recognition (ICPR'06), 2006

Speech recognition is usually based on Hidden Markov Models (HMMs), which represent the temporal dynamics of speech very efficiently, and Gaussian mixture models, which classify speech into single speech units (phonemes) sub-optimally. In this paper we use parallel mixtures of Support Vector Machines (SVMs) for classification by integrating this method in an HMM-based speech recognition system. SVMs are very appealing due to their association with statistical learning theory and have already shown good results in pattern recognition and in continuous speech recognition. They suffer, however, from a training effort that scales at least quadratically with the number of training vectors. The SVM mixtures need only nearly linear training time, making it easier to deal with the large amount of speech data. In our hybrid system we use the SVM mixtures as acoustic models in an HMM-based decoder. We train and test the hybrid system on the DARPA Resource Management (RM1) corpus, showing better performance than an HMM-based decoder using Gaussian mixtures.
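One simple way to realize such a mixture is sketched below under the assumption of a hard gate (the paper's gating network may well differ; names and data are illustrative): partition the training data with k-means so that each expert SVM trains on a small subset, which keeps the total training cost roughly linear in the number of vectors.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(4, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)

# hard gate: k-means partitions the data; each expert sees one small subset
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
experts = {}
for c in range(4):
    Xc, yc = X[km.labels_ == c], y[km.labels_ == c]
    if len(set(yc)) == 1:
        experts[c] = int(yc[0])                    # pure cluster: constant predictor
    else:
        experts[c] = SVC(kernel="rbf").fit(Xc, yc)

def predict(x):
    """Route a sample to its cluster's expert and return that expert's label."""
    e = experts[int(km.predict(x.reshape(1, -1))[0])]
    return e if isinstance(e, int) else int(e.predict(x.reshape(1, -1))[0])
```

Because SVM training cost grows super-linearly in the number of vectors, training four experts on quarter-sized subsets is substantially cheaper than one SVM on the full set.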

Advances in hybrid SVM/HMM speech recognition

2003

In this paper, we describe our continuing work on the development of a hybrid SVM/HMM speech recognition system. The original hybrid system was evaluated on the OGI Alphadigits corpus and performed at 11.0% WER, compared to 11.9% for a triphone mixture-Gaussian HMM system. In a new set of experiments reported here, the hybrid system performs at 10.6% WER on the Alphadigits task using a simple score combination mechanism. On a large-vocabulary task, SWITCHBOARD, the hybrid system improves the performance over the baseline HMM-based system from 41.6% to 40.6% WER. This is the first time SVMs have been applied to a complex large-vocabulary task. Several oracle experiments are discussed which demonstrate the potential benefit of this approach over traditional HMM systems.
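A "simple score combination mechanism" can be as little as a log-linear interpolation of the two systems' per-hypothesis scores, used to rescore an N-best list. The sketch below is a hedged illustration of that idea only; the weight, the scores, and the function name are invented, not taken from the paper.

```python
import numpy as np

def combine_scores(log_hmm, log_svm, w=0.5):
    """Log-linear interpolation of two systems' per-hypothesis log scores;
    the weight w would be tuned on held-out data."""
    return w * np.asarray(log_hmm) + (1 - w) * np.asarray(log_svm)

# rescoring a toy 3-hypothesis N-best list (scores invented for illustration)
log_hmm = [-10.0, -12.0, -11.0]
log_svm = [-13.0, -9.0, -11.5]
best = int(np.argmax(combine_scores(log_hmm, log_svm, w=0.3)))
```

With the toy weight above, the SVM evidence overturns the HMM's top hypothesis, which is the behavior such a combination is meant to allow.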

Real-Time Robust Automatic Speech Recognition Using Compact Support Vector Machines

IEEE Transactions on Audio, Speech, and Language Processing, 2012

In recent years, support vector machines (SVMs) have shown excellent performance in many applications, especially in the presence of noise. In particular, SVMs offer several advantages over artificial neural networks (ANNs) that have attracted the attention of the speech processing community. Nevertheless, their high computational requirements prevent them from being used in practice in automatic speech recognition (ASR), where ANNs have proven to be successful. The high complexity of SVMs in this context arises from the use of huge speech training databases with millions of samples and highly overlapped classes. This paper suggests the use of a weighted least squares (WLS) training procedure that facilitates imposing a compact semiparametric model on the SVM, which results in a dramatic complexity reduction. Such a complexity reduction with respect to conventional SVMs, which is between two and three orders of magnitude, allows the proposed hybrid WLS-SVC/HMM system to perform real-time speech decoding on a connected-digit recognition task (SpeechDat Spanish database). The experimental evaluation of the proposed system shows encouraging performance levels in clean and noisy conditions, although further improvements are required to reach the maturity level of current context-dependent HMM-based recognizers.
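The compact semiparametric idea can be illustrated by fixing a small set of kernel centers in advance and solving for the coefficients with regularized (weighted) least squares instead of the usual QP. The sketch below is a simplification of the paper's WLS-SVC procedure (which iterates the weights), with illustrative data and names.

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])
y = np.array([-1.0] * 50 + [1.0] * 50)
centers = np.array([[0.0, 0.0], [3.0, 3.0]])   # fixed, compact center set

def fit_compact(X, y, centers, gamma=1.0, weights=None, reg=1e-3):
    """Solve f(x) = sum_k a_k K(x, c_k) + b by regularized (weighted)
    least squares on +/-1 targets; model size is len(centers), not len(X)."""
    K = np.exp(-gamma * ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1))
    Phi = np.hstack([K, np.ones((len(X), 1))])  # bias column
    W = np.ones(len(X)) if weights is None else np.asarray(weights)
    A = Phi.T @ (W[:, None] * Phi) + reg * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ (W * y))

def decide(x, centers, coef, gamma=1.0):
    k = np.exp(-gamma * ((x - centers) ** 2).sum(-1))
    return np.sign(k @ coef[:-1] + coef[-1])

coef = fit_compact(X, y, centers)
```

Evaluating `decide` costs one kernel per center rather than one per support vector, which is the source of the two-to-three orders of magnitude speed-up the paper reports.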