Speech Coding at very low rate

Design of MELPe-Based Variable-Bit-Rate Speech Coding with Mel Scale Approach Using Low-Order Linear Prediction Filter and Representing Excitation Signal Using Glottal Closure Instants

Arabian Journal for Science and Engineering, 2019

In this paper, we propose a variable-bit-rate speech codec based on mixed excitation linear prediction enhanced (MELPe) with an average bit rate of 2 kbps and a better representation of the excitation signal. The order of the prediction filter in the MELPe coding architecture is reduced from 10 to 7 without affecting the perceptual quality of the decoded speech by using the psychoacoustic Mel scale. An efficient two-split vector quantization with a weighted Euclidean distance measure is developed for Mel scale-based linear predictive coding (Mel-LPC), and it requires only 18 bits/frame. The instantaneous pitch or epoch, which is vital for many speech processing applications, is preserved in this codec by including it in the excitation signal used for reconstructing the voiced speech. The quantization scheme developed for glottal closure instants (GCIs) increases the bit requirement for voiced frames by 4-25 bits depending on the position of the GCIs. To compensate for that, the Mel-LPC order for both silence and unvoiced frames has been brought down to 4 without compromising the perceptual quality of the reconstructed speech. The lowered bit budget is 41 bits/frame for unvoiced frames and 31 bits/frame for silence frames. A further reduction of 10 bits for silence frames is obtained by reducing the number of transmitted parameters and by tuning the quantization bit requirement for each. To categorize the speech frames at the entry of the encoder, a neural network-based voiced/unvoiced/silence classification algorithm using a five-dimensional feature set is created. The experimental results show that the proposed coding scheme operates at an average bit rate of 2 kbps, which is less than the bit rate of MELPe (2.4 kbps), but with a better perceptual score. In addition, the incorporation of Mel-LPC gives better performance in the estimation of formants and GCIs.
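The two-split vector quantization with a weighted Euclidean distance mentioned in this abstract can be illustrated with a minimal sketch. The abstract gives only the filter order (7) and the total budget (18 bits/frame); the 4 + 3 split, the 9 bits per split, the random codebooks and the unit weights below are assumptions made purely for illustration, not the authors' design.

```python
import random

def weighted_sq_dist(x, y, w):
    """Weighted squared Euclidean distance between equal-length vectors."""
    return sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w))

def split_vq_encode(vec, codebooks, weights):
    """Quantize each sub-vector of `vec` against its own codebook.

    Returns one codeword index per split.
    """
    indices, start = [], 0
    for cb in codebooks:
        dim = len(cb[0])
        sub = vec[start:start + dim]
        w = weights[start:start + dim]
        best = min(range(len(cb)), key=lambda i: weighted_sq_dist(sub, cb[i], w))
        indices.append(best)
        start += dim
    return indices

def split_vq_decode(indices, codebooks):
    """Concatenate the selected codewords back into one vector."""
    out = []
    for i, cb in zip(indices, codebooks):
        out.extend(cb[i])
    return out

def total_index_bits(codebooks):
    """Bits needed to transmit one index per codebook."""
    return sum((len(cb) - 1).bit_length() for cb in codebooks)

# Hypothetical setup: 7 coefficients split 4 + 3, 512 codewords (9 bits) each.
# Real codebooks would be trained on speech data; these are random placeholders.
rng = random.Random(0)
CODEBOOKS = [
    [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(512)],
    [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(512)],
]
WEIGHTS = [1.0] * 7  # placeholder; a real coder would use perceptual weights
```

Splitting keeps the search cost and storage at two 512-entry codebooks instead of a single codebook with 2^18 entries, which is the usual motivation for split VQ.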

Low‐Bit‐Rate Speech Coding

2003

This article focuses on speech coding methods for achieving communication quality speech at bit rates of 4 kbit/s and lower. The speech coding techniques are based on an all-pole model of the vocal tract, which may be implemented in the time domain with appropriately selected excitation functions or else may be fit to a spectral analysis of the speech signal. Three main types of coders are described. Code-excited linear prediction (CELP) coders select their excitation from waveform codebooks using analysis-by-synthesis closed-loop techniques, which need to be supplemented by speech classification and open-loop parametric techniques to maintain quality at lower rates. The prototypical sinusoidal coder (SC) has a bank of oscillators for signal synthesis, driven by a model of the magnitude spectrum. However, phase regeneration is important in enhancing speech reconstruction at low rates. Waveform interpolation (WI) coders afford a wider time-frequency footprint for the representation of the excitation, showing good potential for achieving toll quality at bit rates below 4 kbit/s.

A new technique for regular pulse predictive coding of speech at low bit rates

The International Conference on Electrical Engineering

Speech coding is an important area with civilian and military applications, and it can be considered one of the important stages in speech processing. It is used to compress speech, which is possible because the speech signal is highly redundant. Speech coding is used in digital telephony, in multimedia and in the security of digital communications. In this paper, we focus on developing algorithms and methods for a waveform speech coder operating at a low bit rate with a good quality reconstructed speech signal. Moreover, a new model for linear predictive coding of speech that can be used to produce high quality speech at a low data rate is introduced. In this model, we divide the residual (excitation signal) into subframes and apply energy and voiced/unvoiced classifications to choose the pulses in the residual that give a low bit rate and good quality for the reconstructed speech. Hence, this vocoder forms an excitation sequence which consists of groups of uniformly spaced pulses. During analysis, the amplitudes of the pulses and the LP coefficients are determined. In addition, a new technique for quantizing the amplitude of each pulse as well as the linear prediction parameters is proposed.
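The idea of an excitation made of uniformly spaced pulses can be sketched with the textbook regular-pulse grid selection: for each subframe, keep every `spacing`-th residual sample and choose the grid offset that retains the most energy. This is a generic illustration under that assumption; the energy and voiced/unvoiced classification steps and the quantizers described in the abstract are not reproduced, and the function names are hypothetical.

```python
def select_regular_pulses(residual, spacing):
    """Pick the uniformly spaced pulse grid with the most energy.

    Returns (offset, pulses), where pulses == residual[offset::spacing].
    """
    best_offset, best_energy = 0, -1.0
    for offset in range(spacing):
        energy = sum(x * x for x in residual[offset::spacing])
        if energy > best_energy:
            best_offset, best_energy = offset, energy
    return best_offset, residual[best_offset::spacing]

def rebuild_excitation(offset, pulses, spacing, length):
    """Decoder side: place the pulses on their grid, zeros elsewhere."""
    excitation = [0.0] * length
    for k, amp in enumerate(pulses):
        excitation[offset + k * spacing] = amp
    return excitation

# Example subframe of an LP residual (made-up values).
subframe = [0.1, 2.0, -0.2, 0.0, -3.0, 0.1, 0.3, 1.0, 0.0]
offset, pulses = select_regular_pulses(subframe, spacing=3)
excitation = rebuild_excitation(offset, pulses, 3, len(subframe))
```

Only the grid offset and the pulse amplitudes need to be transmitted per subframe, which is where the bit-rate saving over coding every residual sample comes from.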

A 1.7 kb/s MELP coder with improved analysis and quantization

1998

This paper describes our new mixed excitation linear predictive (MELP) coder designed for very low bit rate applications. This new coder, through algorithmic improvements and enhanced quantization techniques, produces better speech quality at 1.7 kb/s than the new US Federal Standard MELP coder at 2.4 kb/s. Key features of the coder are an improved pitch estimation algorithm and a line spectral frequencies (LSF) quantization scheme that requires only 21 bits per frame.

Strategies to improve the performance of very low bit rate speech coders and application to a variable rate 1.2 kb/s codec

IEE Proceedings - Vision, Image, and Signal Processing, 2005

This paper presents several strategies to improve the performance of very low bit rate speech coders and describes a speech codec that incorporates these strategies and operates at an average bit rate of 1.2 kb/s. The encoding algorithm is based on several improvements in a mixed multiband excitation (MMBE) linear predictive coding (LPC) structure. A switched-predictive vector quantiser technique that outperforms previously reported schemes is adopted to encode the LSF parameters. Spectral and sound specific low rate models are used in order to achieve high quality speech at low rates. An MMBE approach with three sub-bands is employed to encode voiced frames, while modelling and synthesis techniques for fricatives and stops are used for unvoiced frames. This strategy is shown to provide good quality synthesised speech at a bit rate of only 0.4 kb/s for unvoiced frames. To reduce coding noise and improve decoded speech, a spectral envelope restoration combined with noise reduction (SERNR) postfilter is used. The contributions of the techniques described in this paper are separately assessed and then combined in the design of a low bit rate codec that is evaluated against the North American Mixed Excitation Linear Prediction (MELP) coder. The performance assessment is carried out in terms of the spectral distortion of LSF quantisation, mean opinion score (MOS), A/B comparison tests and the ITU-T P.862 perceptual evaluation of speech quality (PESQ) standard. Assessment results show that the improved methods for LSF quantisation, sound specific modelling and synthesis, and the new postfiltering approach can significantly outperform previously reported techniques. Further results also indicate that a system combining the proposed improvements and operating at 1.2 kb/s is comparable to, and slightly outperforms, a MELP coder operating at 2.4 kb/s. For tandem connection situations, the proposed system is clearly superior to the MELP coder.
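The spectral distortion measure used to assess LSF quantisation is commonly defined as the root-mean-square difference, in dB, between the log power spectra of the original and quantised all-pole filters. A minimal sketch of that standard definition follows. It assumes the LSFs have already been converted back to LPC polynomial coefficients, evaluates the spectra on a uniform grid over the full band, and uses hypothetical function names; the band limits and averaging used in the paper itself are not stated in the abstract.

```python
import cmath
import math

def lpc_power_spectrum_db(a, n_points=256):
    """Log power spectrum (dB) of the all-pole filter 1/A(z).

    `a` holds the coefficients of A(z) = a[0] + a[1] z^-1 + ...
    A(z) must have no zeros on the sampled frequency grid.
    """
    spectrum = []
    for n in range(n_points):
        omega = math.pi * n / n_points
        A = sum(ak * cmath.exp(-1j * omega * k) for k, ak in enumerate(a))
        # 10*log10(1/|A|^2) == -20*log10(|A|)
        spectrum.append(-20.0 * math.log10(abs(A)))
    return spectrum

def spectral_distortion_db(a_ref, a_quant, n_points=256):
    """RMS log-spectral difference (dB) between two all-pole models."""
    ref = lpc_power_spectrum_db(a_ref, n_points)
    qnt = lpc_power_spectrum_db(a_quant, n_points)
    mse = sum((r - q) ** 2 for r, q in zip(ref, qnt)) / n_points
    return math.sqrt(mse)
```

In practice the per-frame value is averaged over a test set, and the share of outlier frames is usually reported alongside the mean.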

Predictive Coding of Speech at Low Bit Rates

IEEE Transactions on Communications, 1982

Predictive coding is a promising approach for speech coding. In this paper, we review the recent work on adaptive predictive coding of speech signals, with particular emphasis on achieving high speech quality at low bit rates (less than 10 kbits/s). Efficient prediction of the redundant structure in speech signals is obviously important for proper functioning of a predictive coder. It is equally important to ensure that the distortion in the coded speech signal be perceptually small. The subjective loudness of quantization noise depends both on the short-time spectrum of the noise and its relation to the short-time spectrum of the speech signal. The noise in the formant regions is partially masked by the speech signal itself. This masking of quantization noise by speech signal allows one to use low bit rates while maintaining high speech quality. This paper will present generalizations of predictive coding for minimizing subjective distortion in the reconstructed speech signal at the receiver. The quantizer in predictive coders quantizes its input on a sample-by-sample basis. Such sample-by-sample (instantaneous) quantization creates difficulty in realizing an arbitrary noise spectrum, particularly at low bit rates. We will describe a new class of speech coders in this paper which could be considered to be a generalization of the predictive coder. These new coders not only allow one to realize the precise optimum noise spectrum which is crucial to achieving very low bit rates, but also represent the important first step in bridging the gap between waveform coders and vocoders without suffering from their limitations.
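One widely used way to exploit this masking effect in linear predictive coders is a perceptual weighting filter of the form W(z) = A(z) / A(z/gamma), where A(z) is the LP inverse filter and 0 <= gamma <= 1 controls how much noise is allowed to concentrate under the formants. The sketch below illustrates that general form only; the abstract does not give the exact filter the paper uses, and the function names and example coefficients are assumptions.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): the k-th coefficient is scaled by gamma**k."""
    return [ak * gamma ** k for k, ak in enumerate(a)]

def weighting_filter(x, a, gamma):
    """Apply W(z) = A(z) / A(z/gamma) to the sequence x.

    `a` holds A(z) = 1 + a[1] z^-1 + ...; a[0] must be 1.
    """
    num = a
    den = bandwidth_expand(a, gamma)
    y = []
    for n in range(len(x)):
        # FIR part: numerator A(z) acting on the input
        acc = sum(num[k] * x[n - k] for k in range(len(num)) if n - k >= 0)
        # IIR part: denominator A(z/gamma) acting on past outputs
        acc -= sum(den[k] * y[n - k] for k in range(1, len(den)) if n - k >= 0)
        y.append(acc)
    return y

# Made-up second-order predictor, for illustration only.
A_COEFFS = [1.0, -0.9, 0.2]
impulse = [1.0, 0.0, 0.0, 0.0]
```

With gamma = 1 the filter reduces to W(z) = 1 (flat noise spectrum), and with gamma = 0 it reduces to A(z), so intermediate values trade off between the two extremes.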

Low-Delay Speech Coding at 16 kb/s and Below

1991

Development of network quality speech coders at 16 kb/s and below is an active research area. This thesis focuses on the study of low-delay Code Excited Linear Predictive (CELP) and tree coders. A 16 kb/s stochastic tree coder based on the (M,L) search algorithm suggested by Iyengar and Kabal and a low-delay CELP coder proposed by AT&T (CCITT 16 kb/s standardization candidate) are examined. The first goal is to compare and study the performance of the two coders. The second objective is to analyze the particular characteristics which make the two coders different from one another. The final goal is the improvement of the performance of the coders, particularly with a view to bringing the bit rate below 16 kb/s. When compared under similar conditions, the two coders showed comparable performance at 16 kb/s. The analysis of the components and particular characteristics of the tree and CELP coders provides new insight for future coders. Higher performance coder components such as predi...

A New Low Bit Rate Speech Coding Scheme for Mixed Content

2006

Speech coding is a very mature research area and many coding schemes are available that provide speech qualities ranging from highly intelligible synthetic speech at about 2 kbit/s to wideband natural speech at about 16 kbit/s. However, emerging application scenarios such as information services on broadcast radio are eliciting additional concurrent challenges not easily addressed by current speech coding technology, namely the need to code mixed audio material, the need to permit flexible bitrate coding configurations, the need to scale effectively in quality in the range 2-8 kbit/s, and the need to offer pleasant natural sound. In this paper we present a new very low rate speech/audio coding technology addressing those concurrent challenges thanks to the use of innovative approaches regarding accurate reconstruction of harmonic complexes, optimal coding of the excitation, efficient side information coding, and suitable combination of new bandwidth extension techniques. The structure of the speech/audio coder is detailed and its performance in the range 2.4-12 kbit/s is illustrated and compared to that of reference coders.

RTP Payload Format for the Mixed Excitation Linear Prediction Enhanced (MELPe) Codec

This document describes the RTP payload format for the Mixed Excitation Linear Prediction Enhanced (MELPe) speech coder. MELPe's three different speech encoding rates and sample frame sizes are supported. Comfort noise procedures and packet loss concealment are described in detail. Status of This Memo: This is an Internet Standards Track document. This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 7841. Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc8130.
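The relation between the three rates and their frame sizes is simple arithmetic and can be sketched as follows. The rate and frame-duration pairs (2400 bps with 22.5 ms, 1200 bps with 67.5 ms, 600 bps with 90 ms) and the 8 kHz sampling rate are the figures commonly cited for MELPe and are not stated in the abstract above, so they should be checked against the specification before being relied on.

```python
# Frame durations are stored in tenths of a millisecond to keep the math integral.
# These pairs are assumed from common descriptions of MELPe, not from the abstract.
MELPE_MODES = {2400: 225, 1200: 675, 600: 900}  # bit rate (bps) -> frame length
SAMPLE_RATE_HZ = 8000  # assumed narrowband sampling rate

def frame_bits(rate_bps, frame_tenths_ms):
    """Coded bits produced per frame."""
    return rate_bps * frame_tenths_ms // 10000

def frame_octets(bits):
    """Whole octets needed to carry one frame (padding the last octet)."""
    return (bits + 7) // 8

def frame_samples(frame_tenths_ms):
    """Speech samples covered by one frame."""
    return SAMPLE_RATE_HZ * frame_tenths_ms // 10000

summary = {
    rate: (frame_bits(rate, dur), frame_octets(frame_bits(rate, dur)), frame_samples(dur))
    for rate, dur in MELPE_MODES.items()
}
```

Because a frame does not fill a whole number of octets at these rates, a payload format has to define how the leftover bits in the last octet are used or padded.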

Implementation of attractive Speech Quality for Mixed Excited Linear Prediction

The number of mobile subscribers is increasing all over the world, so communication systems have to improve. The Mixed Excitation Linear Prediction (MELP) algorithm was developed to reduce the bandwidth of the speech signal and to allow more data to be transmitted on a single channel. This increases channel capacity and, in turn, the number of users a channel can serve. MELP is a speech coding method that relies on a speech encoder and a speech decoder. The encoder removes redundancy from the signal and compresses it into the MELP code. The decoder includes a Linear Predictive Coding (LPC) filter that produces synthesized speech at its output in response to voiced and unvoiced excitation. MELP also reduces voice jitter. Lowering the bit rate of MELP reduces codebook storage and computational complexity. This paper describes how "the bit rates of MELP coder can be reduced to as low as 2.4kbps without apparent damage to the speech quality."