Adaptive post-filtering controlled by pitch frequency for CELP-based speech coder

Postfiltering techniques in low bit-rate speech coders

1999

Postfilters are used in speech decoders to improve speech quality by preserving formant information and reducing noise in the valley regions. In this thesis, a new adaptive least-squares LPC-based time-domain postfilter is presented to overcome problems with the conventional LPC-based time-domain postfilter. The conventional LPC-based time-domain postfilter [4] produces an unpredictable spectral tilt that is hard to control with the modified LPC synthesis, inverse, and high-pass filtering, causing unnecessary attenuation or amplification of some frequency components and muffling the speech. This effect increases when voice coders are connected in tandem. The least-squares postfilter solves these problems by eliminating the spectral tilt of the conventional time-domain postfilter. The least-squares postfilter has a flat frequency response at formant peaks of the speech spectrum. Instead of looking at the modified LPC synthesis, inverse, and high...
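
As a rough illustration of the conventional structure the abstract refers to (modified LPC synthesis, inverse, and high-pass filtering), the sketch below implements the widely used form H(z) = A(z/beta) / A(z/alpha) * (1 - mu z^-1). The sign convention A(z) = 1 + sum a_k z^-k, the function names, and the default values of alpha, beta, and mu are assumptions chosen for illustration; they are not taken from the thesis, and this is not its least-squares postfilter.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): the k-th coefficient is scaled by gamma**k."""
    return [c * gamma ** k for k, c in enumerate(a)]


def iir_filter(b, a, x):
    """Direct-form filter: a[0]*y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k]."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y


def lpc_postfilter(x, a, alpha=0.8, beta=0.5, mu=0.5):
    """Conventional short-term postfilter (illustrative parameter values).

    x: decoded speech samples
    a: LPC polynomial [1, a_1, ..., a_p] (assumed convention)
    """
    numerator = bandwidth_expand(a, beta)     # modified inverse filter A(z/beta)
    denominator = bandwidth_expand(a, alpha)  # modified synthesis filter 1/A(z/alpha)
    shaped = iir_filter(numerator, denominator, x)
    # First-order high-pass section intended to offset the low-pass tilt.
    return iir_filter([1.0, -mu], [1.0], shaped)
```

Because the tilt of A(z/beta)/A(z/alpha) changes from frame to frame while mu is a single scalar, the compensation is only approximate, which is the weakness the abstract describes.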

A 2.4-kbps variable-bit-rate ADP-CELP speech coder

Electronics and Communications in Japan (Part III: Fundamental Electronic Science), 2000

This paper presents a variable-bit-rate ADP-CELP (Adaptive Density Pulse Code Excited Linear Prediction) coder that selects one of four kinds of coding structure in each frame based on short-time speech characteristics. To improve speech quality and reduce the average bit rate, we have developed a speech/non-speech classification method using spectrum envelope variation, which is robust to background noise. In addition, we propose an efficient pitch lag coding technique. The technique interpolates consecutive frame pitch lags and quantizes a vector of relative pitch lags consisting of the variation between an estimated pitch lag and a target pitch lag in plural subframes. The average bit rate of the proposed coder was approximately 2.4 kbps for speech sources with an activity factor of 60%. Our subjective testing indicates that the quality of the proposed coder exceeds that of the Japanese digital cellular standard at a rate of 3.45 kbps.
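
The pitch lag coding idea in this abstract can be sketched as follows. Linear interpolation between frame lags, the nearest-neighbour search, and the tiny codebook in the usage note are all assumptions made for illustration; the abstract does not specify the interpolation rule or the quantizer design.

```python
def interpolate_lags(prev_lag, curr_lag, n_subframes):
    """Estimated lag per subframe, moving linearly from prev_lag to curr_lag."""
    step = (curr_lag - prev_lag) / n_subframes
    return [prev_lag + step * (i + 1) for i in range(n_subframes)]


def relative_lags(target, estimated):
    """Per-subframe difference between the target lag and the estimated lag."""
    return [t - e for t, e in zip(target, estimated)]


def nearest_codeword(vector, codebook):
    """Index of the codebook entry with the smallest squared error (plain VQ)."""
    def sq_error(entry):
        return sum((v - c) ** 2 for v, c in zip(vector, entry))
    return min(range(len(codebook)), key=lambda i: sq_error(codebook[i]))
```

For example, with frame lags 40 and 48 and four subframes the estimated lags are 42, 44, 46, 48. If the target lags are 43, 44, 45, 48, the relative vector is [1, 0, -1, 0], and only the index of the closest entry in a (hypothetical) codebook would be transmitted.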

Enhancing Speech Coder Quality: Improved Noise Estimation for Postfilters

2011

ITU-T G.711.1 is a multirate wideband extension for the well-known ITU-T G.711 pulse code modulation of voice frequencies. The extended system is fully interoperable with the legacy narrowband one. In the case where the legacy G.711 is used to code a speech signal and G.711.1 is used to decode it, quantization noise may be audible. For this situation, the standard proposes an optional postfilter. The application of postfiltering requires an estimation of the quantization noise. The more accurate the estimate of the quantization noise is, the better the performance of the postfilter can be. In this thesis, we propose an improved noise estimator for the postfilter proposed for the G.711.1 codec and assess its performance. The proposed estimator provides a more accurate estimate of the noise with the same computational complexity.

Techniques for improving the performance of CELP type speech coders

1991

Techniques for improving the performance of CELP (code excited linear prediction) type speech coders while maintaining reasonable computational complexity are explored. A harmonic noise weighting function which enhances the perceptual quality of the processed speech is introduced. The combination of harmonic noise weighting and subsample resolution pitch significantly improves the coder performance for voiced speech. A 6.9 kb/s VSELP speech coder which incorporates subsample resolution pitch and harmonic noise weighting is described. Complexity reduction techniques are discussed which allow the coder to be implemented using a single fixed-point digital signal processor.

A Speech Coder Post-Processor Controlled By Side-Information

Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005., 2005

Speech coders provide high speech quality at low rates. However, they perform poorly when encoding non-speech signals. This paper proposes a new enhancement algorithm requiring minimum side information to reduce the effect of this shortcoming. The enhancement algorithm consists of post-processing the speech decoder output in the spectral domain. Specifically, some frequency components are reduced or forced to zero when the corresponding frequency content is poorly described by the speech coder. The choice of which spectral components to modify is made at the encoder, so the decision information must be transmitted. Experiments combining the AMR-WB speech codec and the proposed audio enhancement show that the quality for music signals is improved significantly while not affecting the quality for speech inputs.
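
A minimal sketch of the decoder-side step described here: spectral coefficients of the decoded frame are kept, reduced, or zeroed according to per-band decisions received as side information. The band layout, the three decision labels, and the attenuation factor are hypothetical; the abstract does not give the actual signalling format.

```python
def apply_band_decisions(spectrum, band_edges, decisions, attenuation=0.5):
    """Scale each band of a decoded spectrum according to encoder decisions.

    spectrum:   list of spectral coefficients for one frame
    band_edges: list of (start, stop) bin ranges, stop exclusive (assumed layout)
    decisions:  one of "keep", "reduce", "zero" per band (assumed labels)
    """
    gains = {"keep": 1.0, "reduce": attenuation, "zero": 0.0}
    out = list(spectrum)
    for (start, stop), decision in zip(band_edges, decisions):
        for k in range(start, stop):
            out[k] *= gains[decision]
    return out
```

With three bands of two bins each and the decisions keep, reduce, zero, only the first band passes unchanged, so the side information costs one small symbol per band per frame.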

A Pre-processing Method to Modify Irregular Pitch Variations for Quality Enhancement of Synthesised Speech

In low bit rate speech coders, pitch and voicing level estimation play an important role in the quality of the synthesised speech. Although pitch usually evolves smoothly, it sometimes has irregular variations, and as a result the estimated pitch and voicing level differ from the real ones. This affects the performance of the speech coder. We propose to use a new modification as a preprocessor. This methodology modifies the residual speech signal such that the pitch period evolves more smoothly without distorting perceptual speech quality. Thus, the pitch and the voicing level can be determined correctly. Experimental results show that combining the proposed method with a 2.4 kb/s MELP coder provides better quality.

Enhancement of Coded Speech Using a Mask-Based Post-Filter

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020

The quality of speech codecs deteriorates at low bitrates due to high quantization noise. A post-filter is generally employed to enhance the quality of the coded speech. In this paper, a data-driven postfilter relying on masking in the time-frequency domain is proposed. A fully connected neural network (FCNN), a convolutional encoder-decoder (CED) network and a long short-term memory (LSTM) network are implemented to estimate a real-valued mask per time-frequency bin. The proposed models were tested on the five lowest operating modes (6.65 kbps-15.85 kbps) of the Adaptive Multi-Rate Wideband codec (AMR-WB). Both objective and subjective evaluations confirm the enhancement of the coded speech and also show the superiority of the mask-based neural network system over a conventional heuristic post-filter used in standards such as ITU-T G.718.

Fundamental Frequency Model for Postfiltering at Low Bitrates in a Transform-Domain Speech and Audio Codec

Interspeech 2020, 2020

Speech codecs can use postfilters to improve the quality of the decoded signal. While postfiltering is effective in reducing coding artifacts, such methods often involve processing in both the encoder and the decoder, rely on additional transmitted side information, or are highly dependent on other codec functions for optimal performance. We propose a low-complexity postfiltering method to improve the harmonic structure of the decoded signal, which models the fundamental frequency of the signal. In contrast to past approaches, the postfilter operates at the decoder as a standalone function and does not need the transmission of additional side information. It can thus be used to enhance the output of any codec. We tested the approach on a modified version of the EVS codec in TCX mode only, which is subject to more pronounced coding artefacts when used at its lowest bitrate. Listening test results show an average improvement of 7 MUSHRA points for decoded signals with the proposed harmonic postfilter.
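
To make the idea of a decoder-only, pitch-driven postfilter concrete, the sketch below uses a classic comb (long-term) postfilter with the lag estimated from the decoded signal itself by normalized autocorrelation. This is a generic stand-in, not the fundamental frequency model proposed in the paper; the function names, the lag search range, and the gain value are assumptions.

```python
import math


def estimate_pitch_lag(x, min_lag, max_lag):
    """Lag in [min_lag, max_lag] that maximizes the normalized autocorrelation."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        current = x[lag:]
        past = x[:len(x) - lag]
        num = sum(c * p for c, p in zip(current, past))
        den = math.sqrt(sum(c * c for c in current) * sum(p * p for p in past))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def comb_postfilter(x, lag, gain=0.5):
    """y[n] = (x[n] + gain * x[n - lag]) / (1 + gain); passthrough while n < lag."""
    y = []
    for n in range(len(x)):
        past = x[n - lag] if n >= lag else x[n]
        y.append((x[n] + gain * past) / (1.0 + gain))
    return y
```

Components that repeat at the estimated lag pass with unit gain, while content between the harmonics is attenuated. Because the lag is derived from the decoded signal, no side information is needed, which matches the standalone property emphasized in the abstract.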

Design of a toll-quality 4-kbit/s speech coder based on phase-adaptive PSI-CELP

1997 IEEE International Conference on Acoustics, Speech, and Signal Processing

This paper describes the design of a toll-quality 4-kbit/s speech coder based on phase-adaptive PSI-CELP. This adaptation method not only gives pitch periodicity to the random excitation but also synchronizes the basic point of the stored random vector with the pitch phase. We further improve the proposed coder by introducing a backward gain prediction scheme. In a subjective evaluation experiment, there is no significant difference between the quality of the ITU-T G.726 32-kbit/s coder and that of the proposed 4-kbit/s coder under the conditions of normal and low input levels and tandem connection for clean speech. In a noisy environment, there are also no significant differences between the G.726 and 4-kbit/s coders in the MOS results of the ACR test.

Design of a pitch synchronous innovation CELP coder for mobile communications

IEEE Journal on Selected Areas in Communications, 1995

This paper describes the design of a speech coder called pitch synchronous innovation CELP (PSI-CELP) for low bit-rate mobile communications. PSI-CELP is based on CELP, but has more adaptive excitation structures. In voiced frames, instead of conventional random excitation vectors, PSI-CELP converts even the random excitation vectors to have pitch periodicity by repeating stored random vectors as well as by using an adaptive codebook. In silent, unvoiced, and transient frames, the coder stops using the adaptive codebook and switches to fixed random codebooks. The PSI-CELP coder also implements novel structures and techniques: an FIR-type perceptual weighting filter using unquantized LPC parameters, a random codebook with a conjugate structure trained to be robust against channel errors, codebook search with delayed decision, a gain quantization with sloped amplitude, and a moving average prediction coding of LSP parameters. Our speech coder is implemented by DSP chips. Its coded speech quality at 3.6 kb/s with 2.0 kb/s redundancy is comparable to that of the Japanese full-rate VSELP coder at 6.7 kb/s with 4.5 kb/s redundancy. The basic structure of this PSI-CELP coder has been chosen as the Japanese half-rate speech codec for digital cellular telecommunications.
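
The pitch synchronous innovation step, giving a stored random vector pitch periodicity by repetition, can be sketched as below. The function name and the handling of lags longer than the subframe are assumptions for illustration; the abstract does not detail either.

```python
def pitch_synchronize(codevector, lag):
    """Make a stored random codevector periodic with the given pitch lag.

    The first `lag` samples are repeated to fill the subframe. If the lag is
    at least the subframe length, the vector is returned unchanged (assumed).
    """
    n = len(codevector)
    if lag >= n:
        return list(codevector)
    return [codevector[i % lag] for i in range(n)]
```

For instance, an 8-sample codevector [1, 2, 3, 4, 5, 6, 7, 8] with a lag of 3 becomes [1, 2, 3, 1, 2, 3, 1, 2], so the fixed-codebook contribution shares the periodicity of the adaptive-codebook contribution in voiced frames.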