A MFCC-Based CELP Speech Coder for Server-Based Speech Recognition in Network Environments

Hybrid Scalar/Vector Quantization of Mel-Frequency Cepstral Coefficients for Low Bit-Rate Coding of Speech

2011 Data Compression Conference, 2011

In this paper, we propose a low bit-rate speech codec based on a hybrid scalar/vector quantization of the mel-frequency cepstral coefficients (MFCCs). We begin by showing that if a high-resolution mel-frequency cepstrum (MFC) is computed, good-quality speech can be reconstructed from the MFCCs despite the lack of explicit phase information. By evaluating the contribution that individual MFCCs make to speech quality and quantizing each appropriately, our results show that the perceptual evaluation of speech quality (PESQ) score of the MFCC-based codec matches that of the state-of-the-art MELPe codec at 600 bps and exceeds that of the CELP codec at coding rates of 2000-4000 bps. The main advantage of the proposed codec is in distributed speech recognition (DSR), since MFCC-based speech features can be obtained directly from the codewords, eliminating the additional decoding and feature-extraction stages.
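A minimal sketch of what "hybrid scalar/vector quantization" of an MFCC vector can look like, assuming the low-order coefficients are scalar-quantized and the remaining coefficients are split into subvectors that are each vector-quantized by full nearest-neighbour search. The abstract does not give the actual split, step sizes, or codebooks, so `n_scalar`, `steps`, the function names, and the random codebooks below are all hypothetical.

```python
import numpy as np


def uniform_sq(x, step):
    """Uniform mid-tread scalar quantizer: returns (index, reconstruction)."""
    idx = int(np.round(x / step))
    return idx, idx * step


def nearest_codeword(v, codebook):
    """Full-search VQ: index of the codeword with minimum squared error."""
    dist = np.sum((codebook - v) ** 2, axis=1)
    i = int(np.argmin(dist))
    return i, codebook[i]


def hybrid_quantize(mfcc, n_scalar, steps, codebooks):
    """Scalar-quantize mfcc[:n_scalar]; split the rest into subvectors and VQ each.

    steps: one (hypothetical) step size per scalar-quantized coefficient.
    codebooks: list of (size, dim) arrays whose dims cover the remaining coefficients.
    """
    assert n_scalar + sum(cb.shape[1] for cb in codebooks) == len(mfcc)
    indices, recon = [], []
    for c, step in zip(mfcc[:n_scalar], steps):
        i, r = uniform_sq(c, step)
        indices.append(i)
        recon.append(r)
    pos = n_scalar
    for cb in codebooks:
        dim = cb.shape[1]
        i, r = nearest_codeword(mfcc[pos:pos + dim], cb)
        indices.append(i)
        recon.extend(r)
        pos += dim
    return indices, np.array(recon)


# Made-up data: 13 MFCCs, c0 scalar-quantized, c1..c12 as three 4-dim subvectors.
rng = np.random.default_rng(0)
mfcc = rng.normal(size=13)
codebooks = [rng.normal(size=(64, 4)) for _ in range(3)]
indices, recon = hybrid_quantize(mfcc, 1, [0.5], codebooks)
```

With three 64-entry codebooks, the vector-quantized part costs 3 × 6 = 18 bits per frame plus whatever the scalar indices need. The per-coefficient evaluation described in the abstract is what would decide which coefficients deserve scalar precision and how large each codebook should be.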

Quantization of Cepstral Parameters for Speech Recognition over the World Wide Web

We examine alternative architectures for a client-server model of speech-enabled applications over the World Wide Web. We compare a server-only processing model, in which the client encodes the speech signal and transmits it to the server, with a model in which the recognition front end runs locally on the client, which encodes the cepstral coefficients and transmits them to the recognition server over the Internet. We follow a novel encoding paradigm that tries to maximize recognition performance rather than perceptual reproduction, and we find that by transmitting the cepstral coefficients we can achieve significantly higher recognition performance at a fraction of the bit rate required to encode the speech signal directly. The bit rate required to match the recognition performance obtained with high-quality unquantized speech is just 2000 bits per second.

1 INTRODUCTION

Motivated by the explosive growth of the Internet, speech researchers have been working on integrating speech technologies into the World Wide Web (WWW) [1]-[8]. Applications include Internet telephony, speech-enabled browsers, speech and natural language understanding systems, and speaker verification. Developers have successfully adapted existing systems, or created new ones, that can be deployed over the WWW.
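To make the 2000 bps figure concrete, the sketch below converts a channel rate into a per-frame bit budget and spends it on per-coefficient uniform scalar quantizers. The 10 ms frame shift (100 feature vectors per second) is an assumption typical of cepstral front ends, not a value stated in the abstract, and the bit allocation and quantizer ranges are hypothetical.

```python
def bits_per_frame(bit_rate_bps, frame_shift_ms):
    """Bits available per feature vector at a given rate and frame shift."""
    frames_per_second = 1000.0 / frame_shift_ms
    return bit_rate_bps / frames_per_second


def quantize_uniform(x, n_bits, lo, hi):
    """n_bits uniform scalar quantizer on [lo, hi]; returns (index, reconstruction)."""
    levels = 2 ** n_bits
    step = (hi - lo) / levels
    idx = int((min(max(x, lo), hi) - lo) / step)  # clamp, then find the cell
    idx = min(idx, levels - 1)                    # x == hi belongs to the top cell
    return idx, lo + (idx + 0.5) * step           # reconstruct at the cell midpoint


def encode_frame(cepstrum, allocation, ranges):
    """Quantize each cepstral coefficient with its own bit allocation and range."""
    return [quantize_uniform(c, b, lo, hi)
            for c, b, (lo, hi) in zip(cepstrum, allocation, ranges)]


budget = bits_per_frame(2000, 10)               # 100 frames/s -> 20.0 bits per frame
allocation = [3, 3, 2, 2, 2, 2, 2, 1, 1, 1, 1]  # hypothetical; sums to 20 bits
assert sum(allocation) == budget
```

Giving more bits to the low-order coefficients reflects the recognition-oriented goal described above: the allocation is chosen for its effect on recognition accuracy, not on perceptual reconstruction.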

QUALITY EVALUATION OF LPC BASED LOW BIT RATE SPEECH CODERS

IAEME PUBLICATION, 2021

Significant improvements in coding speech signals with high quality at low bit rates are being reported. The need for low-bit-rate speech coding algorithms continues, driven by the ever-increasing number of users in wireless communication networks. Linear prediction (LP) coders form an important class of low-bit-rate speech coders. This paper describes the software-level simulation and performance evaluation of the CELP (4.8 kbps), LD-CELP (16 kbps), MELP (2.4 kbps), and CS-ACELP (8 kbps and 6.4 kbps) speech coders. Although the outputs of all the coders match the input signal, the results of the error-resilience test and the MOS scores show that the output quality of CS-ACELP is much better than that of the other three coders. The CS-ACELP coder performed well in both quiet and noisy environments at a bit rate of 6.4 kbps.
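All four coders compared above rest on short-term linear prediction. The sketch below is the textbook autocorrelation method with the Levinson-Durbin recursion. It is not taken from any of the evaluated standards, which add windowing, lag windowing, bandwidth expansion, and conversion to line spectral pairs. The toy input frame is made up.

```python
def autocorrelation(x, order):
    """Biased autocorrelation r[0..order] of a (windowed) frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations by the Levinson-Durbin recursion.

    Convention: s_hat[n] = sum_{k=1..p} a[k] * s[n-k], i.e. A(z) = 1 - sum_k a[k] z^-k.
    Returns (a[1..p], final prediction-error energy, reflection coefficients).
    """
    a = [0.0] * (order + 1)   # a[0] unused
    err = r[0]
    reflection = []
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        reflection.append(k)
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err, reflection


frame = [0.9 ** n for n in range(64)]   # toy decaying frame, not real speech
coeffs, residual_energy, ks = levinson_durbin(autocorrelation(frame, 10), 10)
```

The recursion runs in O(p²) operations instead of the O(p³) of a general linear solve. Its reflection coefficients all satisfy |k| < 1 when the autocorrelation matrix is positive definite, which guarantees a stable synthesis filter.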

Matrix Quantization and LPC Vocoder Based Linear Predictive for Low-Resource Speech Recognition system

ACM Transactions on Asian and Low-Resource Language Information Processing

Over the last ten years, there has been significant progress in the use of low-rate speech coders in voice applications for computers, military communications, and civil communications. This advancement has been made possible by the development of new speech coders that can generate high-quality speech at low data rates. Most existing coders combine a spectral representation of speech, speech waveform matching, and "optimization" of the coder's performance for human hearing. The goal of this paper is to provide a thorough evaluation of voice coding methods for educational purposes, with particular emphasis on the algorithms used in low-rate cellular communication standards. The algorithm we developed using a voice-excited LPC vocoder produces clear, low-distortion results. Ordinary LPCs, on the other hand, fall short of vocoders because they can handle signals other than speech, such as music. To improve quality, additional bandwidth is used to reduce the bit rate. To imp...
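In an LPC vocoder, the spectral envelope is carried by an all-pole synthesis filter 1/A(z), and the inverse filter A(z) turns speech into a prediction residual. In the classical voice-excited design, the excitation is derived from a low-band version of this residual rather than from a pulse/noise model. The sketch below shows only the exact analysis/synthesis round trip. The predictor coefficients and samples are hypothetical, and the residual band-limiting and coding steps are omitted.

```python
def analysis_filter(s, a):
    """Inverse filter A(z): residual e[n] = s[n] - sum_k a[k] * s[n-1-k]."""
    e = []
    for n in range(len(s)):
        pred = sum(a[k] * s[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        e.append(s[n] - pred)
    return e


def synthesis_filter(e, a):
    """All-pole filter 1/A(z): s[n] = e[n] + sum_k a[k] * s[n-1-k] (zero initial state)."""
    s = []
    for n in range(len(e)):
        past = sum(a[k] * s[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        s.append(e[n] + past)
    return s


a = [1.2, -0.5]                        # hypothetical stable 2nd-order predictor
speech = [1.0, 0.5, -0.3, 0.8, 0.0, -1.2, 0.4, 0.9]
residual = analysis_filter(speech, a)  # a voice-excited coder derives its excitation from this
rebuilt = synthesis_filter(residual, a)
```

If the full residual is sent, the reconstruction is exact. Bit-rate savings come from transmitting only part of it, for example its low band, and regenerating the rest at the decoder.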

Low-bitrate distributed speech recognition for packet-based and wireless communication

IEEE Transactions on Speech and Audio Processing, 2002

We present a framework for developing source coding, channel coding and decoding, and erasure concealment techniques adapted for distributed (wireless or packet-based) speech recognition. It is shown that speech recognition, as opposed to speech coding, is more sensitive to channel errors than to channel erasures, and appropriate channel-coding design criteria are determined. For channel decoding, we introduce a novel technique that combines soft-decision decoding with error detection at the receiver. Frame erasure concealment techniques are used at the decoder to deal with unreliable frames. At the recognition stage, we present a technique that modifies the recognition engine itself to take into account the time-varying reliability of the decoded features after channel transmission. The resulting engine, referred to as weighted Viterbi recognition, further improves recognition accuracy. Together, source coding, channel coding, and the modified recognition engine are shown to provide good recognition accuracy over a wide range of communication channels at bit rates of 1.2 kbps or less.
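One common way to express weighted Viterbi decoding, assumed here, raises each emission probability to a per-frame reliability weight γ_t. In the log domain this scales the emission score, so that γ_t = 1 gives the standard recursion and γ_t = 0 makes an erased frame contribute nothing. The toy two-state model below is made up to show that effect and is not from the paper.

```python
import math


def weighted_viterbi(log_pi, log_A, log_B, gamma):
    """Viterbi decoding with per-frame reliability weights on the emission scores.

    delta_t(j) = max_i [delta_{t-1}(i) + log_A[i][j]] + gamma[t] * log_B[t][j]
    log_B[t][j] = log b_j(o_t); gamma[t] in [0, 1] is the reliability of frame t.
    """
    T, N = len(log_B), len(log_pi)
    delta = [log_pi[j] + gamma[0] * log_B[0][j] for j in range(N)]
    back = []
    for t in range(1, T):
        ptr, new = [], []
        for j in range(N):
            i_best = max(range(N), key=lambda i: delta[i] + log_A[i][j])
            ptr.append(i_best)
            new.append(delta[i_best] + log_A[i_best][j] + gamma[t] * log_B[t][j])
        back.append(ptr)
        delta = new
    state = max(range(N), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(back):  # backtrack from the last frame
        state = ptr[state]
        path.append(state)
    return path[::-1], max(delta)


ln = math.log
log_pi = [ln(0.5), ln(0.5)]
log_A = [[ln(0.9), ln(0.1)], [ln(0.1), ln(0.9)]]
log_B = [[ln(0.8), ln(0.2)], [ln(0.01), ln(0.99)], [ln(0.8), ln(0.2)]]
path_reliable, _ = weighted_viterbi(log_pi, log_A, log_B, [1.0, 1.0, 1.0])
path_erased, _ = weighted_viterbi(log_pi, log_A, log_B, [1.0, 0.0, 1.0])
```

When frame 1 is trusted, its strong vote for state 1 pulls the whole path to state 1. When the same frame is marked unreliable (γ = 0), the decoder follows the surrounding frames and stays in state 0.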

A New Deterministic Codebook Structure for Celp Speech Coding

Proceedings. IEEE International Symposium on Information Theory, 1993

Low-bit-rate, high-quality speech coding is a vital part of voice telecommunication systems. The introduction of CELP (Codebook Excited Linear Prediction) speech coding in 1984 provided a feasible way to compress speech to 4.8 kbps with high quality, but the formidable computational complexity required for real-time processing has prevented its wide application. Using the new deterministic codebook, we reduce the computational complexity of the codebook search, which originally accounted for 2/3 of the total, to a negligible level. Based on this reduction, we produce an algorithm with a complexity of about 5 MIPS that can be implemented even on inexpensive DSP chips while maintaining the same high quality. In addition to extremely simple encoding and decoding, this codebook also provides optimal error tolerance and requires no codebook storage. We hope that this contribution can finally make CELP speech coding a widely applicable and practical technology.
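The abstract does not describe the structure of its deterministic codebook. The sketch below uses a hypothetical storage-free codebook instead, loosely in the spirit of algebraic (track-based pulse) codebooks: each codevector is decoded from its index bits on the fly. It searches that codebook with the standard CELP analysis-by-synthesis criterion, which maximizes (xᵀy)² / (yᵀy) with y = h * c and uses the optimal gain g = xᵀy / yᵀy. The names `codevector`, `filtered`, and `search` and the impulse response `h` are illustrative only.

```python
def codevector(index, length):
    """Hypothetical storage-free codebook: one signed unit pulse on the even track
    and one on the odd track; positions and signs are decoded from the index."""
    half = length // 2
    p_even = 2 * (index % half)
    p_odd = 2 * ((index // half) % half) + 1
    signs = index // (half * half)            # 0..3 selects the two signs
    c = [0.0] * length
    c[p_even] = 1.0 if (signs & 1) == 0 else -1.0
    c[p_odd] = 1.0 if (signs & 2) == 0 else -1.0
    return c


def codebook_size(length):
    half = length // 2
    return half * half * 4


def filtered(c, h):
    """Zero-state response y = h * c, truncated to len(c)."""
    return [sum(h[k] * c[m - k] for k in range(min(m + 1, len(h))))
            for m in range(len(c))]


def search(target, h):
    """Analysis-by-synthesis search: maximize (x.y)^2 / (y.y); return (index, gain)."""
    best_score, best_index, best_gain = -1.0, None, 0.0
    for i in range(codebook_size(len(target))):
        y = filtered(codevector(i, len(target)), h)
        xy = sum(a * b for a, b in zip(target, y))
        yy = sum(b * b for b in y)
        if yy > 0 and xy * xy / yy > best_score:
            best_score, best_index, best_gain = xy * xy / yy, i, xy / yy
    return best_index, best_gain


h = [1.0, 0.6, 0.3, 0.1]  # hypothetical weighted-synthesis impulse response
target = [2.5 * v for v in filtered(codevector(37, 16), h)]
index, gain = search(target, h)
```

Because each codevector is computed from its index, nothing has to be stored. The exhaustive loop here is only for clarity. Reducing the search cost, which the abstract reports accounted for about 2/3 of the total, is exactly what a structured codebook is designed to do.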

Implementation of Low Complexity CELP Coder and Performance Evaluation in terms of Speech Quality

International Journal of Computer Applications, 2012

The critical constraints in wireless communication, particularly mobile communication, are bandwidth, storage memory, and power. Speech transmission over wireless networks requires removing redundant information from the signal in a way that preserves the quality and intelligibility of the speech. Speech compression algorithms are deployed to remove this redundancy and transmit speech with acceptable quality. For this reason, speech coding is, and will remain, one of the most important research issues. This paper addresses the implementation of a CELP coder with low computational complexity that delivers acceptable speech quality and preserves intelligibility. The coder is assessed in terms of quality for different kinds of speakers using PESQ, PSNR, frequency-weighted SNRseg, and SNRseg.
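Of the measures listed, segmental SNR (SNRseg) is simple enough to sketch directly. It is the mean of per-frame SNRs in dB rather than one global ratio. The 160-sample frame (20 ms at an assumed 8 kHz sampling rate), the [-10, 35] dB clamping range, and skipping all-zero frames are common conventions assumed here, not details from the paper. PESQ is specified in ITU-T P.862 and is not reproduced.

```python
import math


def snr_seg(clean, processed, frame_len=160, floor_db=-10.0, ceil_db=35.0):
    """Segmental SNR: mean over frames of the per-frame SNR in dB, each clamped."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        p = processed[start:start + frame_len]
        sig = sum(x * x for x in s)
        noise = sum((x - y) ** 2 for x, y in zip(s, p))
        if sig == 0.0:
            continue  # skip silent frames (a simplifying assumption)
        snr = ceil_db if noise == 0.0 else 10.0 * math.log10(sig / noise)
        values.append(min(max(snr, floor_db), ceil_db))
    return sum(values) / len(values) if values else float("nan")


clean = [1.0] * 320
degraded = [0.9] * 320
print(round(snr_seg(clean, degraded), 2))  # 20.0: every frame has a 100:1 energy ratio
```

Averaging in the dB domain keeps loud frames from dominating the score. The clamping stops silent or perfectly coded frames from producing extreme values.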

A 2.4-kbps variable-bit-rate ADP-CELP speech coder

Electronics and Communications in Japan (Part III: Fundamental Electronic Science), 2000

This paper presents a variable-bit-rate ADP-CELP (Adaptive Density Pulse Code Excited Linear Prediction) coder that selects one of four coding structures in each frame based on short-time speech characteristics. To improve speech quality and reduce the average bit rate, we have developed a speech/non-speech classification method based on spectral-envelope variation that is robust to background noise. In addition, we propose an efficient pitch-lag coding technique. The technique interpolates the pitch lags of consecutive frames and quantizes a vector of relative pitch lags, consisting of the differences between an estimated pitch lag and a target pitch lag in several subframes. The average bit rate of the proposed coder was approximately 2.4 kbps for speech sources with an activity factor of 60%. Our subjective testing indicates that the quality of the proposed coder exceeds that of the Japanese digital cellular standard at 3.45 kbps.
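A minimal sketch of the pitch-lag idea, under stated assumptions. An estimated lag for each subframe is obtained by linear interpolation between the previous and current frame lags. The vector of differences between the target and estimated lags is then vector-quantized against a small codebook. The abstract does not specify the interpolation rule, the number of subframes, or the codebook, so all three are hypothetical here.

```python
def interpolated_lags(prev_lag, curr_lag, n_sub):
    """Linearly interpolate an estimated lag for each subframe (hypothetical rule)."""
    return [prev_lag + (curr_lag - prev_lag) * (k + 1) / n_sub for k in range(n_sub)]


def quantize_relative_lags(target_lags, estimated_lags, codebook):
    """VQ the vector of (target - estimate) differences; return (index, decoded lags)."""
    diff = [t - e for t, e in zip(target_lags, estimated_lags)]
    best = min(range(len(codebook)),
               key=lambda i: sum((d - c) ** 2 for d, c in zip(diff, codebook[i])))
    decoded = [e + c for e, c in zip(estimated_lags, codebook[best])]
    return best, decoded


estimates = interpolated_lags(40, 48, 4)       # [42.0, 44.0, 46.0, 48.0]
codebook = [[0, 0, 0, 0], [1, 1, 1, 1], [-1, 0, 1, 0], [2, -2, 2, -2]]  # made up
index, decoded = quantize_relative_lags([43, 45, 47, 48], estimates, codebook)
```

Only the codebook index (2 bits here) is sent per frame for the relative lags. The decoder can rebuild the interpolated estimates itself from lags it already knows.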

Improved Packet Loss Recovery using Interleaving for CELP-type Speech Coders in Packet Networks

IAENG International Journal of Computer Science, 2008

In VoIP applications, packet loss is a major source of speech impairment. In this paper, a packet loss concealment scheme based on interleaving is presented to reduce the speech-quality degradation caused by packet losses in code-excited linear prediction (CELP) based coders. We applied the proposed scheme to the ITU-T G.729 8 kb/s speech coding standard to evaluate its performance. Perceptual evaluation of speech quality (PESQ) and enhanced modified bark spectral distortion (EMBSD) tests under various packet loss conditions confirm that the proposed algorithm is superior to the concealment algorithm embedded in G.729. Spectral distortion is also used as an objective distortion measure, and the results show that the interleaving method performs better, at the expense of extra delay.
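The reason interleaving helps can be shown with a hypothetical packetization in which packet k carries frames k, k + depth, k + 2·depth, and so on. Losing one packet then leaves isolated single-frame gaps instead of a burst. Each gap can be concealed from both neighbours, here by averaging their parameter vectors (for example, LSPs). The layout, the averaging rule, and the function names are illustrative assumptions, not the paper's G.729 scheme.

```python
def interleave(frames, depth):
    """Packet k carries frames k, k+depth, k+2*depth, ... (hypothetical layout)."""
    return [frames[k::depth] for k in range(depth)]


def deinterleave(packets, depth, n_frames):
    """Rebuild frame order; frames from lost packets (None) stay None."""
    out = [None] * n_frames
    for k, pkt in enumerate(packets):
        if pkt is None:
            continue
        for m, frame in enumerate(pkt):
            out[k + m * depth] = frame
    return out


def conceal(frames):
    """Fill gaps by averaging neighbouring parameter vectors; repeat at the edges."""
    out = list(frames)
    for i, f in enumerate(out):
        if f is None:
            prev = out[i - 1] if i > 0 else None
            nxt = frames[i + 1] if i + 1 < len(frames) else None
            if prev is not None and nxt is not None:
                out[i] = [(a + b) / 2 for a, b in zip(prev, nxt)]
            else:
                out[i] = prev if prev is not None else nxt
    return out


frames = [[float(i)] for i in range(8)]   # toy one-parameter frames
packets = interleave(frames, 2)
packets[1] = None                         # lose the packet with the odd frames
recovered = conceal(deinterleave(packets, 2, len(frames)))
```

Without interleaving, the same loss would remove four consecutive frames and leave no neighbour on one side of the gap. The price is delay, because the receiver must wait for `depth` packets before it can reorder a block, which matches the trade-off reported in the abstract.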