A Speech Coder Post-Processor Controlled By Side-Information
Related papers
Problem statement: Speech enhancement plays an important role in speech processing systems such as speech recognition, speech coding, mobile communication, and hearing aids. Approach: In this work, the performance of the speech coding method is improved by using speech enhancement as a preprocessing technique. The purpose of the proposed method is to reduce the bit rate of the speech signal to be transmitted, so that bandwidth is used efficiently. In a noisy environment, speech coding is applied to both the desired speech and the unwanted noise signal. If the noise is reduced before the speech signal is coded, the required bit rate is also reduced. A simple adaptive speech enhancement technique, which uses an adaptive sigmoid-type function to determine the weighting factor of the TSDD algorithm, is applied in a subband framework, and voice-excited linear predictive coding (VELP) is used to code the speech signal. Results:...
A Novel Audio Post-Processing Toolkit for the Enhancement of Audio Signals Coded at Low Bit Rates
2007
Low bit rate audio coding often results in the loss of key audio attributes such as audio bandwidth and stereo separation. There is also typically a loss of detail, intelligibility, and/or warmth in the signal. Because low bit rate audio is now widespread (for example, on the Internet) and is coded with a variety of schemes and bit rates over which the listener has no control, it is increasingly attractive to build processing tools into the player to ensure a consistent listener experience. We describe a novel post-processing toolkit that incorporates tools for (i) Stereo Enhancement, (ii) Blind Bandwidth Extension, (iii) Automatic Noise Removal and Audio Enhancement, and (iv) Blind 2-to-5 channel upmixing. Algorithmic details, listening results, and audio demonstrations are presented.
Postfiltering techniques in low bit-rate speech coders
1999
Postfilters are used in speech decoders to improve speech quality by preserving formant information and reducing noise in the valley regions. In this thesis, a new adaptive least-squares LPC-based time-domain postfilter is presented to overcome problems found in the conventional LPC-based time-domain postfilter. The conventional LPC-based time-domain postfilter [4] produces an unpredictable spectral tilt that is hard to control with the modified LPC synthesis, inverse, and high-pass filtering. This causes unnecessary attenuation or amplification of some frequency components, which makes the speech sound muffled. The effect increases when voice coders are connected in tandem. The least-squares postfilter solves these problems by eliminating the spectral tilt of the conventional time-domain postfilter, and it has a flat frequency response at the formant peaks of the speech spectrum. Instead of looking at the modified LPC synthesis, inverse, and high...
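For orientation, the conventional structure that this abstract criticizes is often written as H(z) = A(z/beta) / A(z/alpha) * (1 - mu z^-1), where A(z) is the LPC inverse filter. The following is a minimal sketch of that textbook form only, not of the least-squares postfilter proposed in the thesis. The function names and the values of `alpha`, `beta`, and `mu` are illustrative assumptions, and the gain control that practical decoders add is omitted.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma), given A(z) = a[0] + a[1] z^-1 + ..."""
    return [c * gamma ** k for k, c in enumerate(a)]


def iir_filter(b, a, x):
    """Direct-form filter: a[0]*y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k]."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y


def lpc_postfilter(x, lpc, alpha=0.8, beta=0.5, mu=0.5):
    """Textbook short-term postfilter (assumed parameter values, no gain control).

    lpc holds the inverse-filter coefficients [1, a1, ..., ap] for the frame.
    """
    num = bandwidth_expand(lpc, beta)    # A(z/beta): partially flattens the envelope
    den = bandwidth_expand(lpc, alpha)   # 1/A(z/alpha): emphasizes formant peaks
    shaped = iir_filter(num, den, x)
    # First-order high-pass section meant to offset the low-pass tilt
    return iir_filter([1.0, -mu], [1.0], shaped)
```

Because the tilt of A(z/beta)/A(z/alpha) changes from frame to frame while `mu` is fixed here, the compensation is only approximate, which is the kind of hard-to-control spectral tilt the abstract describes.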
Enhancing Speech Coder Quality: Improved Noise Estimation for Postfilters
2011
ITU-T G.711.1 is a multirate wideband extension for the well-known ITU-T G.711 pulse code modulation of voice frequencies. The extended system is fully interoperable with the legacy narrowband one. In the case where the legacy G.711 is used to code a speech signal and G.711.1 is used to decode it, quantization noise may be audible. For this situation, the standard proposes an optional postfilter. The application of postfiltering requires an estimation of the quantization noise. The more accurate the estimate of the quantization noise is, the better the performance of the postfilter can be. In this thesis, we propose an improved noise estimator for the postfilter proposed for the G.711.1 codec and assess its performance. The proposed estimator provides a more accurate estimate of the noise with the same computational complexity.
Enhancement of Coded Speech Using a Mask-Based Post-Filter
ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020
The quality of speech codecs deteriorates at low bitrates due to high quantization noise. A post-filter is generally employed to enhance the quality of the coded speech. In this paper, a data-driven post-filter relying on masking in the time-frequency domain is proposed. A fully connected neural network (FCNN), a convolutional encoder-decoder (CED) network and a long short-term memory (LSTM) network are implemented to estimate a real-valued mask per time-frequency bin. The proposed models were tested on the five lowest operating modes (6.65 kbps-15.85 kbps) of the Adaptive Multi-Rate Wideband codec (AMR-WB). Both objective and subjective evaluations confirm the enhancement of the coded speech and also show the superiority of the mask-based neural network system over a conventional heuristic post-filter used in standards such as ITU-T G.718.
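The masking step itself is simple once a mask is available. The sketch below shows how a real-valued mask could be applied to the time-frequency representation of the coded speech. It is an illustration under assumptions, not the paper's networks: the spectra are plain nested lists of complex STFT coefficients, and `magnitude_ratio_mask`, `eps`, and `max_gain` are hypothetical names for one plausible way to form a reference mask from clean and coded spectra.

```python
def apply_mask(coded_spec, mask):
    """Scale every time-frequency bin by a real gain; the coded phase is kept."""
    return [
        [gain * coeff for gain, coeff in zip(mask_row, spec_row)]
        for mask_row, spec_row in zip(mask, coded_spec)
    ]


def magnitude_ratio_mask(clean_spec, coded_spec, eps=1e-12, max_gain=2.0):
    """Hypothetical reference mask: |clean| / |coded| per bin, clipped to max_gain."""
    return [
        [min(abs(s) / (abs(c) + eps), max_gain) for s, c in zip(clean_row, coded_row)]
        for clean_row, coded_row in zip(clean_spec, coded_spec)
    ]
```

In a data-driven post-filter of this kind, a network would estimate the mask from the coded signal alone, and only `apply_mask` would run at the decoder.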
2008
Audio coding based on Frequency Domain Linear Prediction (FDLP) uses autoregressive models to approximate Hilbert envelopes in frequency sub-bands. Although the basic technique achieves good coding efficiency, there is a need to improve the reconstructed signal quality for tonal signals with impulsive spectral content. For such signals, the quantization noise in the FDLP codec appears as frequency components not present in the input signal. In this paper, we propose a technique of Spectral Noise Shaping (SNS) for improving the quality of tonal signals by applying a Time Domain Linear Prediction (TDLP) filter prior to the FDLP processing. The inverse TDLP filter at the decoder shapes the quantization noise to reduce the artifacts. Application of the SNS technique to the FDLP codec improves the quality of the tonal signals without affecting the bit-rate. Performance evaluation is done with Perceptual Evaluation of Audio Quality (PEAQ) scores and with subjective listening tests.
Speech Enhancement Based on the General Transfer Function GSC and Postfiltering
IEEE Transactions on Speech and Audio Processing, 2004
In speech enhancement applications, microphone array postfiltering allows additional reduction of noise components at a beamformer output. Among microphone array structures, the recently proposed general transfer function generalized sidelobe canceller (TF-GSC) has shown impressive noise reduction abilities in a directional noise field, while still maintaining low speech distortion. However, in a diffuse noise field, less significant noise reduction is obtainable. The performance is degraded even further when the noise signal is nonstationary. In this contribution we propose three postfiltering methods for improving the performance of microphone arrays. Two of them are based on single-channel speech enhancers and make use of recently proposed algorithms concatenated to the beamformer output. The third is a multichannel speech enhancer that exploits noise-only components constructed within the TF-GSC structure. This work concentrates on the assessment of the proposed postfiltering structures. An extensive experimental study, which consists of both objective and subjective evaluation in various noise fields, demonstrates the advantage of multichannel postfiltering over the single-channel techniques.
A NEW SINUSOIDAL SPEECH CODING TECHNIQUE WITH SPEECH ENHANCER AT LOW BIT RATES
IAEME PUBLICATION, 2014
Speech coding deals with the problem of reducing the bit rate required for representing speech signals while preserving the quality of the speech reconstructed from that representation. In this paper, we propose a novel speech coding technique that not only compresses the speech signal at a low bit rate, but also maintains its quality even if the received signal is corrupted by noise. The encoder of the proposed technique is based on a speech analysis/synthesis model using a sinusoidal representation, in which the sinusoidal components are combined to closely resemble the original speech waveform. In the proposed technique, the original frame is divided into voiced or unvoiced sub-frames based on their energies. The aim of the division and classification is to choose the parameters that best reduce the total bit rate and enable the receiver to recover the speech signal with good quality. The parameters involved in the analysis stage are extracted from the short-time Fourier transform, which converts the original speech signal into the frequency domain. Using the peak-picking technique, the amplitudes of the selected peaks are extracted together with their associated frequencies and phases. In the next stage, novel parameter reduction and quantization techniques are performed to reduce the bit rate while preserving the quality of the recovered signal.
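The peak-picking analysis step described above can be sketched as follows. This is a minimal illustration under assumptions, not the paper's encoder: it uses a direct DFT of a single unwindowed frame, and the names `pick_peaks`, `max_peaks`, and `rel_threshold` are hypothetical. The paper's voiced/unvoiced classification, parameter reduction, and quantization stages are not reproduced.

```python
import cmath
import math


def dft(frame):
    """Direct DFT of a real frame, returning bins 0 .. N/2."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def pick_peaks(frame, sample_rate, max_peaks=10, rel_threshold=0.1):
    """Return (amplitude, frequency_hz, phase_rad) for local spectral maxima."""
    n = len(frame)
    spec = dft(frame)
    mags = [abs(c) for c in spec]
    floor = rel_threshold * max(mags)  # ignore peaks far below the strongest bin
    peaks = []
    for k in range(1, len(mags) - 1):
        is_local_max = mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]
        if is_local_max and mags[k] >= floor:
            amplitude = 2.0 * mags[k] / n      # sinusoid amplitude estimate
            frequency = k * sample_rate / n    # bin centre frequency in Hz
            peaks.append((amplitude, frequency, cmath.phase(spec[k])))
    peaks.sort(key=lambda p: p[0], reverse=True)
    return peaks[:max_peaks]


# Usage: a 1 kHz cosine sampled at 8 kHz falls exactly on bin 4 of a 32-point frame
frame = [math.cos(2 * math.pi * 4 * i / 32) for i in range(32)]
peaks = pick_peaks(frame, sample_rate=8000)
```

A practical analyzer would window the frame and interpolate between bins, since real speech harmonics rarely fall on bin centres. Each retained triple corresponds to one sinusoidal component that the decoder would sum to resynthesize the frame.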
IEE Proceedings - Vision, Image, and Signal Processing, 2005
This paper presents several strategies to improve the performance of very low bit rate speech coders and describes a speech codec that incorporates these strategies and operates at an average bit rate of 1.2 kb/s. The encoding algorithm is based on several improvements in a mixed multiband excitation (MMBE) linear predictive coding (LPC) structure. A switched-predictive vector quantiser technique that outperforms previously reported schemes is adopted to encode the LSF parameters. Spectral and sound-specific low rate models are used in order to achieve high quality speech at low rates. An MMBE approach with three sub-bands is employed to encode voiced frames, while modelling and synthesis techniques for fricatives and stops are used for unvoiced frames. This strategy is shown to provide good quality synthesised speech at a bit rate of only 0.4 kb/s for unvoiced frames. To reduce coding noise and improve decoded speech, a spectral envelope restoration combined with noise reduction (SERNR) postfilter is used. The contributions of the techniques described in this paper are separately assessed and then combined in the design of a low bit rate codec that is evaluated against the North American Mixed Excitation Linear Prediction (MELP) coder. The performance assessment is carried out in terms of the spectral distortion of LSF quantisation, mean opinion score (MOS), A/B comparison tests and the ITU-T P.862 perceptual evaluation of speech quality (PESQ) standard. Assessment results show that the improved methods for LSF quantisation, sound-specific modelling and synthesis, and the new postfiltering approach can significantly outperform previously reported techniques. Further results also indicate that a system combining the proposed improvements and operating at 1.2 kb/s is comparable to (and slightly outperforms) a MELP coder operating at 2.4 kb/s. For tandem connection situations, the proposed system is clearly superior to the MELP coder.