Low Bit-Rate Audio Coding With Hybrid Representations
Related papers
Hybrid representations for audiophonic signal encoding
2002
We discuss in this paper a new approach to signal models in the context of audio signal encoding. The method is based upon hybrid models featuring transient, tonal and stochastic components in the signal. Contrary to several existing approaches, our method does not rely on any prior segmentation of the signal. The three components are estimated and encoded using a strategy very much in the spirit of transform coding. While the details of the method described here are tailored to audio signals, the general strategy should also apply to other types of signals exhibiting significantly different features, for example images.
The main features of a novel approach for audio signal encoding are described. The approach combines non-linear transform coding and structured approximation techniques, together with hybrid modeling of the signal class under consideration. Essentially, several different components of the signal are estimated and transform coded using an appropriately chosen orthonormal basis. Different models and estimation procedures are discussed, and numerical results are provided.
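The tonal/transient/stochastic split lends itself to a short illustration. The sketch below is not the authors' code: a frame-wise DCT stands in for the local cosine/MDCT basis, PyWavelets supplies the wavelet basis, and the keep-fractions and wavelet choice are arbitrary assumptions.

```python
import numpy as np
import pywt
from scipy.fft import dct, idct

def hybrid_split(x, frame=1024, keep_tonal=0.05, keep_trans=0.02):
    n = len(x) - len(x) % frame
    x = x[:n]
    # tonal layer: keep the largest cosine coefficients in each frame
    frames = x.reshape(-1, frame)
    coeffs = dct(frames, norm='ortho', axis=1)
    t = np.quantile(np.abs(coeffs), 1 - keep_tonal)
    tonal = idct(np.where(np.abs(coeffs) >= t, coeffs, 0.0),
                 norm='ortho', axis=1).reshape(-1)
    # transient layer: keep the largest wavelet coefficients of the residual
    res = x - tonal
    wc = pywt.wavedec(res, 'db4')
    wt = np.quantile(np.concatenate([np.abs(c) for c in wc]), 1 - keep_trans)
    wc = [np.where(np.abs(c) >= wt, c, 0.0) for c in wc]
    transient = pywt.waverec(wc, 'db4')[:len(res)]
    # whatever remains is modelled as the stochastic residual
    return tonal, transient, res - transient
```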
1998
We propose a new approach to Temporal Decomposition (TD) of characteristic parameters of speech for very low rate coding applications. The method models the articulatory dynamics employing a hierarchical error minimization algorithm which does not use Singular Value Decomposition. It is also much faster than conventional TD and could be implemented in real time. High flexibility is achieved with the proposed method to comply with the desired coding requirements, such as compression ratio, accuracy, delay, and computational complexity. This method can be used for coding spectral parameters at rates of 1000-1200 b/s with high fidelity and an algorithmic delay of less than 150 msec.
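As a rough illustration of the TD signal model only (not the hierarchical minimization algorithm proposed in the paper), the sketch below approximates a track of spectral parameter vectors as a weighted sum of a few event target vectors; the event locations and the fixed triangular interpolation functions are assumptions made for the example.

```python
import numpy as np

def td_approximate(Y, event_frames, width=8):
    # Y: (n_frames, n_dims) spectral parameter track, e.g. LSF vectors
    n_frames = Y.shape[0]
    t = np.arange(n_frames)[:, None]
    # fixed triangular interpolation functions centred on assumed events;
    # the paper optimises events and interpolation functions jointly
    Phi = np.clip(1.0 - np.abs(t - np.asarray(event_frames)[None, :]) / width,
                  0.0, None)
    A, *_ = np.linalg.lstsq(Phi, Y, rcond=None)   # event target vectors
    return Phi @ A, A                             # approximation and targets
```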
A hybrid scheme for encoding audio signal using hidden Markov models of waveforms
Applied and Computational Harmonic Analysis, 2005
This paper reports on recent results related to audiophonic signal encoding using time-scale and time-frequency transforms. More precisely, nonlinear, structured approximations for tonal and transient components using local cosine and wavelet bases will be described, yielding expansions of audio signals in the form tonal + transient + residual. We describe a general formulation involving hidden Markov models, together with corresponding rate estimates. Estimators for the transient/tonal balance are also discussed.
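The role of the hidden Markov model can be pictured with a toy significance map: each coefficient in a frame is either "significant" or not, a persistent transition matrix favours runs of significant coefficients, and Viterbi decoding selects the map. The Gaussian emission variances and transition probabilities below are placeholders, not the estimates used in the paper.

```python
import numpy as np

def significance_map(coeffs, var0=1.0, var1=100.0, stay=0.95):
    coeffs = np.asarray(coeffs, dtype=float)
    # two zero-mean Gaussian emission models: state 0 = negligible, 1 = significant
    var = np.array([var0, var1])
    ll = -0.5 * (np.log(2 * np.pi * var) + coeffs[:, None] ** 2 / var)
    logA = np.log(np.array([[stay, 1 - stay],
                            [1 - stay, stay]]))    # persistent state transitions
    delta = ll[0] + np.log(0.5)
    back = np.zeros((len(coeffs), 2), dtype=int)
    for n in range(1, len(coeffs)):
        scores = delta[:, None] + logA             # scores[i, j]: leave state i for j
        back[n] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + ll[n]
    states = np.empty(len(coeffs), dtype=int)
    states[-1] = delta.argmax()
    for n in range(len(coeffs) - 2, -1, -1):       # backtrack
        states[n] = back[n + 1, states[n + 1]]
    return states.astype(bool)                     # True where a coefficient is kept
```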
Low power MPEG/audio encoders using simplified psychoacoustic model and fast bit allocation
… , IEEE Transactions on, 2001
In this paper, we propose novel techniques for the implementation of MPEG/Audio (Layer II and Layer III) encoders. The proposed techniques concern implementing the encoder with minimum complexity. As an effort to minimize the complexity, the ISO psychoacoustic model (PAM), which often demands significant computational power of the implementation system, is simplified. The simplification follows the statistical behavior of the PAM. A fast bit allocation algorithm is also developed, in which the quantizer step size is updated dynamically and adaptively according to input signal statistics. The performance of the developed techniques is verified via subjective tests as well as statistical analyses. Real-time implementations are carried out for MPEG/Audio Layer II and Layer III encoders employing the proposed algorithms. The implemented systems show that the developed encoders can be as simple as decoders, but still produce bitstreams of high audio quality.
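For context, the kind of bit-allocation loop such work accelerates can be sketched as the textbook greedy procedure below (this is not the paper's fast algorithm): bits go to the subband whose quantization noise most exceeds its masking threshold, each added bit buying roughly 6 dB of noise reduction.

```python
import numpy as np

def greedy_allocation(smr_db, bit_budget, max_bits=15):
    # smr_db: per-subband signal-to-mask ratios from a psychoacoustic model
    smr_db = np.asarray(smr_db, dtype=float)
    bits = np.zeros(len(smr_db), dtype=int)
    while bit_budget > 0:
        nmr = smr_db - 6.02 * bits            # noise-to-mask ratio per band
        nmr[bits >= max_bits] = -np.inf
        band = int(np.argmax(nmr))
        if nmr[band] <= 0:                    # all quantization noise is masked
            break
        bits[band] += 1
        bit_budget -= 1
    return bits
```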
A NEW CODING METHOD FOR SPEECH AND AUDIO SIGNALS
In this paper a new representation or modeling method for speech signals is introduced. The proposed method is based on the generation of the so-called Predefined Signature S = {S_R} and Envelope E = {E_K} Vector Sets (PSEVS). These vector sets are speaker and language independent. In this method, once the speech signals are divided into frames of selected lengths, each frame signal piece X_i is reconstructed in the mathematical form X_i = C_i E_K S_R. In this representation, C_i is called the frame coefficient, and S_R and E_K are the vectors appropriately assigned from the PSEVS. It is shown that the proposed method provides fast reconstruction and a substantial compression ratio with acceptable hearing quality.
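A minimal sketch of that frame model, with hypothetical codebooks E and S (the paper builds the predefined sets offline), approximating each frame as a scaled element-wise product of one envelope vector and one signature vector:

```python
import numpy as np

def encode_frame(x, E, S):
    # exhaustive search over the predefined sets; E and S are lists of vectors
    best = (0, 0, 0.0, np.inf)
    for k, e in enumerate(E):
        for r, s in enumerate(S):
            v = e * s                              # candidate shape E_K * S_R
            c = np.dot(x, v) / np.dot(v, v)        # optimal frame coefficient C_i
            err = np.sum((x - c * v) ** 2)
            if err < best[3]:
                best = (k, r, c, err)
    k, r, c, _ = best
    return k, r, c                                 # indices and coefficient to transmit

def decode_frame(k, r, c, E, S):
    return c * E[k] * S[r]                         # X_i = C_i E_K S_R
```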
A fully scalable audio coding structure with embedded psychoacoustic model
2008 IEEE International Conference on Acoustics, Speech and Signal Processing, 2008
A fully scalable audio coding structure based on a novel combination of the non-core MPEG-4 scalable lossless audio coding (SLS), the state-of-the-art psychoacoustic model, joint stereo coding and the perceptually prioritized bit-plane coding is presented in this paper. The psychoacoustic information is implicitly embedded in the scalable bitstream with a negligible amount of side information and trivial modification to the standardized SLS decoder. Results of extensive evaluation show that the subjective quality of scalable audio is improved significantly.
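The idea of perceptually prioritized bit-plane coding can be illustrated roughly as follows (only the idea; the SLS bitstream syntax and the embedded psychoacoustic side information are not reproduced): each subband's integer coefficients are virtually shifted up by a perceptual priority, so the bit-planes of more important bands are emitted earlier in the embedded stream.

```python
import numpy as np

def bitplane_stream(coeffs, priority, n_planes=16):
    # coeffs: list of integer coefficient arrays, one per subband
    # priority: per-subband plane shift from a (hypothetical) perceptual model
    symbols = []                                   # (band, index, bit) in send order
    for plane in range(n_planes - 1, -1, -1):
        for b, band in enumerate(coeffs):
            eff = plane - priority[b]              # priority lifts the band's planes
            if 0 <= eff < n_planes:
                bits = (np.abs(band) >> eff) & 1
                symbols.extend((b, i, int(bit)) for i, bit in enumerate(bits))
    return symbols
```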
Multimedia Signal Processing, 1998
This paper proposes an efficient, low complexity audio coder based on the SPIHT (set partitioning in hierarchical trees) coding algorithm, which has achieved notable success in still image coding. A wavelet packet transform is used to decompose the audio signal into 29 frequency subbands corresponding roughly to the critical subbands of the human auditory system. A psychoacoustic model, which,
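A crude stand-in for that front end, using PyWavelets: a uniform wavelet packet split at a fixed depth. The paper instead prunes the packet tree into 29 non-uniform bands approximating the critical bands; that pruning is omitted here.

```python
import pywt

def packet_subbands(x, wavelet='db8', level=5):
    wp = pywt.WaveletPacket(data=x, wavelet=wavelet, mode='periodization',
                            maxlevel=level)
    # frequency-ordered leaves at the chosen depth (2**level uniform subbands)
    return {node.path: node.data for node in wp.get_level(level, order='freq')}
```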
Speech Coding Based on Spectral Dynamics
Lecture Notes in Computer Science, 2006
In this paper we present first experimental results with a novel audio coding technique based on approximating Hilbert envelopes of relatively long segments of audio signal in critical-band-sized subbands by an autoregressive model. We exploit the generalized autocorrelation linear predictive technique, which allows for better control of fitting the peaks and troughs of the envelope in the subband. Despite introducing a longer algorithmic delay, improved coding efficiency is achieved. Since the described technique does not directly model short-term spectral envelopes of the signal, it is suitable not only for coding speech but also for coding other audio signals.
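The mechanism can be sketched as frequency-domain linear prediction: linear prediction applied to the DCT of a (sub-band) segment yields an all-pole approximation of its squared Hilbert envelope. The model order, the compression exponent and the lack of gain normalization below are assumptions of this sketch, not values from the paper.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz

def fdlp_envelope(x, order=20, power=1.0):
    c = dct(np.asarray(x, dtype=float), norm='ortho')   # DCT of the segment
    # autocorrelation of the DCT sequence; power < 1 plays the role of the
    # "generalized autocorrelation" control over fitting peaks vs. troughs
    spec = np.abs(np.fft.rfft(c, 2 * len(c))) ** 2
    r = np.fft.irfft(spec ** power)[:order + 1]
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])          # Yule-Walker / LPC
    # all-pole response evaluated along the time axis ~ squared Hilbert envelope
    n, m = np.arange(len(c))[:, None], np.arange(1, order + 1)[None, :]
    A = 1.0 - np.exp(-1j * np.pi * n * m / len(c)) @ a
    return 1.0 / np.abs(A) ** 2                          # arbitrary scale
```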
Parametric audio coding with exponentially damped sinusoids
Sinusoidal modeling is one of the most popular techniques for low bitrate audio coding. Usually, the sinusoidal parameters (amplitude, frequency and phase of each sinusoidal component) are kept constant within a time segment. An alternative model, the so-called Exponentially-Damped Sinusoidal (EDS) model, includes an additional damping parameter for each sinusoidal component to better represent the signal characteristics. However, it had never been shown that the EDS model could be efficient for perceptual audio coding. To that aim, we propose in this paper an efficient analysis/synthesis framework with dynamic time segmentation on transients and psychoacoustic modeling, together with an asymptotically optimal entropy-constrained quantization method for the four sinusoid parameters (i.e., including damping). We then apply this coding technique to real audio excerpts for a given entropy target corresponding to a low bit rate (20 kbit/s), and compare this method with a classical sinusoidal coding scheme using a constant-amplitude sinusoidal model and the perceptually weighted Matching Pursuit algorithm. Subjective listening tests show that the EDS model is more efficient on audio samples with fast transient content, and similar to the classical model for more stationary audio samples.
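A minimal synthesis of the EDS model makes the extra parameter concrete: each component has an amplitude, frequency, phase and damping factor, so a segment is a sum of decaying sinusoids. Parameter estimation and the entropy-constrained quantization of the paper are not shown; the example parameters are illustrative only.

```python
import numpy as np

def eds_synth(params, n_samples, sample_rate):
    # params: iterable of (amplitude, frequency_hz, phase_rad, damping_per_s)
    t = np.arange(n_samples) / sample_rate
    x = np.zeros(n_samples)
    for amp, freq_hz, phase, damping in params:
        x += amp * np.exp(-damping * t) * np.cos(2 * np.pi * freq_hz * t + phase)
    return x

# e.g. a plucked-string-like transient: two quickly decaying partials
frame = eds_synth([(0.8, 440.0, 0.0, 25.0), (0.3, 880.0, 1.2, 40.0)],
                  n_samples=2048, sample_rate=44100)
```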