Hybrid representations for audiophonic signal encoding

A hybrid scheme for encoding audio signal using hidden Markov models of waveforms

Applied and Computational Harmonic Analysis, 2005

This paper reports on recent results on audiophonic signal encoding using time-scale and time-frequency transforms. More precisely, nonlinear, structured approximations for tonal and transient components using local cosine and wavelet bases are described, yielding expansions of audio signals in the form tonal + transient + residual. We describe a general formulation involving hidden Markov models, together with corresponding rate estimates. Estimators for the transient/tonal balance are also discussed.
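
The tonal + transient + residual expansion can be illustrated with a minimal sketch. It is not the method of the paper: it assumes plain hard thresholding (keeping the largest coefficients) in place of the structured, hidden-Markov-based approximations, a global DCT-II basis as a stand-in for a local cosine basis, and a Haar basis as the wavelet basis. All names and parameters (`decompose`, `n_tonal`, `n_transient`) are hypothetical.

```python
import math

def dct_basis(n):
    """Rows are the orthonormal DCT-II basis vectors of length n."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n)])
    return rows

def haar_basis(n):
    """Rows are the orthonormal Haar wavelet basis vectors (n must be a power of two)."""
    rows = [[1.0 / math.sqrt(n)] * n]          # constant (scaling) vector
    width = n
    while width >= 2:
        half = width // 2
        amp = 1.0 / math.sqrt(width)
        for start in range(0, n, width):
            row = [0.0] * n
            for t in range(start, start + half):
                row[t] = amp
            for t in range(start + half, start + width):
                row[t] = -amp
            rows.append(row)
        width = half
    return rows

def keep_largest(signal, basis, count):
    """Project onto the `count` basis vectors with the largest |coefficient|."""
    coeffs = [sum(b * s for b, s in zip(row, signal)) for row in basis]
    top = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)[:count]
    approx = [0.0] * len(signal)
    for i in top:
        for t, b in enumerate(basis[i]):
            approx[t] += coeffs[i] * b
    return approx

def decompose(signal, n_tonal, n_transient):
    """Assumed two-stage scheme: cosine layer first, wavelet layer on what is left."""
    n = len(signal)
    tonal = keep_largest(signal, dct_basis(n), n_tonal)
    remainder = [s - a for s, a in zip(signal, tonal)]
    transient = keep_largest(remainder, haar_basis(n), n_transient)
    residual = [r - a for r, a in zip(remainder, transient)]
    return tonal, transient, residual

def energy(values):
    return sum(v * v for v in values)

# Toy signal: one cosine (tonal) plus one click (transient).
n = 64
signal = [math.cos(2 * math.pi * 5 * (t + 0.5) / n) for t in range(n)]
signal[40] += 3.0
tonal, transient, residual = decompose(signal, n_tonal=2, n_transient=6)
print(energy(signal), energy(tonal), energy(transient), energy(residual))
```

On this toy signal the cosine layer should capture most of the oscillation and the Haar layer most of the click, leaving a low-energy residual. Because both bases are orthonormal, each thresholding step can only reduce the energy of what remains.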

Low Bit-Rate Audio Coding With Hybrid Representations

We present a general audio coder based on a structural decomposition: the signal is expanded into three features: its harmonic part, the transients, and the remaining part, referred to as the noise. The first two of these layers can be very efficiently encoded in a well-chosen basis. The noise is by construction modeled as colored Gaussian random noise. Furthermore, this decomposition allows a good time-frequency psychoacoustic modeling, as it directly provides us with the tonal and nontonal parts of the signal.

Towards a Hybrid Audio Coder

The main features of a novel approach for audio signal encoding are described. The approach combines non-linear transform coding and structured approximation techniques, together with hybrid modeling of the signal class under consideration. Essentially, several different components of the signal are estimated and transform coded using an appropriately chosen orthonormal basis. Different models and estimation procedures are discussed, and numerical results are provided.

Audio Signal Representations for Indexing in the Transform Domain

IEEE Transactions on Audio, Speech, and Language Processing, 2000

Indexing audio signals directly in the transform domain can potentially save a significant amount of computation when working on a large database of signals stored in a lossy compression format, without having to fully decode the signals. Here, we show that the representations used in standard transform-based audio codecs (e.g. MDCT for AAC, or hybrid PQF/MDCT for MP3) have a sufficient time resolution for some rhythmic features, but a poor frequency resolution, which prevents their use in tonality-related applications. Alternatively, a recently developed audio codec based on a sparse multiscale MDCT transform has a good resolution for both time- and frequency-domain features. We show that this new audio codec allows efficient transform-domain audio indexing for three different applications, namely beat tracking, chord recognition and musical genre classification. We compare results obtained with this new audio codec and the two standard MP3 and AAC codecs, in terms of performance and computation time.
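
As background for the MDCT mentioned in this abstract, here is a minimal sketch of a single-scale MDCT with a sine window and overlap-add synthesis. It is a direct O(N²) textbook formulation, not the implementation of AAC, MP3, or the multiscale codec discussed in the paper, and the function names (`analyze`, `synthesize`) and the frame size `n_bins` are assumptions made for illustration. The frame size is what sets the trade-off: a frame of `2 * n_bins` samples yields `n_bins` frequency bins, so longer frames give finer frequency resolution and coarser time resolution.

```python
import math

def sine_window(n_bins):
    """Sine window of length 2*n_bins; satisfies w[n]^2 + w[n + n_bins]^2 = 1."""
    length = 2 * n_bins
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(frame, window):
    """Windowed MDCT: 2*n_bins input samples -> n_bins coefficients."""
    n_bins = len(frame) // 2
    shift = 0.5 + n_bins / 2
    return [
        sum(window[n] * frame[n] * math.cos(math.pi / n_bins * (n + shift) * (k + 0.5))
            for n in range(2 * n_bins))
        for k in range(n_bins)
    ]

def imdct(coeffs, window):
    """Windowed inverse MDCT: n_bins coefficients -> 2*n_bins time-aliased samples."""
    n_bins = len(coeffs)
    shift = 0.5 + n_bins / 2
    return [
        window[n] * (2.0 / n_bins) * sum(
            coeffs[k] * math.cos(math.pi / n_bins * (n + shift) * (k + 0.5))
            for k in range(n_bins))
        for n in range(2 * n_bins)
    ]

def analyze(signal, n_bins):
    """Frames of length 2*n_bins with hop n_bins (50% overlap)."""
    window = sine_window(n_bins)
    return [mdct(signal[s:s + 2 * n_bins], window)
            for s in range(0, len(signal) - 2 * n_bins + 1, n_bins)]

def synthesize(frames, n_bins):
    """Overlap-add of inverse transforms; aliasing cancels between neighbouring frames."""
    window = sine_window(n_bins)
    out = [0.0] * (n_bins * (len(frames) + 1))
    for i, coeffs in enumerate(frames):
        for n, value in enumerate(imdct(coeffs, window)):
            out[i * n_bins + n] += value
    return out

n_bins = 8
signal = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(64)]
frames = analyze(signal, n_bins)
rebuilt = synthesize(frames, n_bins)
# Samples covered by two frames are recovered; the first and last n_bins are not.
error = max(abs(signal[t] - rebuilt[t]) for t in range(n_bins, len(signal) - n_bins))
print(len(frames), error)
```

The first and last `n_bins` samples are covered by only one frame, so their aliasing is not cancelled. Practical codecs handle this by padding the signal at both ends.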

A Perceptual Model for Sinusoidal Audio Coding Based on Spectral Integration

EURASIP Journal on Advances in Signal Processing, 2005

Psychoacoustical models have been used extensively within audio coding applications over the past decades. Recently, parametric coding techniques have been applied to general audio and this has created the need for a psychoacoustical model that is specifically suited for sinusoidal modelling of audio signals. In this paper, we present a new perceptual model that predicts masked thresholds for sinusoidal distortions. The model relies on signal detection theory and incorporates more recent insights about spectral and temporal integration in auditory masking. As a consequence, the model is able to predict the distortion detectability. In fact, the distortion detectability defines a (perceptually relevant) norm on the underlying signal space which is beneficial for optimisation algorithms such as rate-distortion optimisation or linear predictive coding. We evaluate the merits of the model by combining it with a sinusoidal extraction method and compare the results with those obtained with the ISO MPEG-1 Layer I-II recommended model. Listening tests show a clear preference for the new model. More specifically, the model presented here leads to a reduction of more than 20% in terms of number of sinusoids needed to represent signals at a given quality level.

An Analysis/Synthesis System of Audio Signal with Utilization of an SN Model

2004

An SN (sinusoids plus noise) model is a spectral model in which the periodic components of the sound are represented by sinusoids with time-varying frequencies, amplitudes and phases. The remaining non-periodic components are represented by filtered noise. The sinusoidal model utilizes physical properties of musical instruments, and the noise model utilizes the human inability to perceive the exact spectral shape or the phase of stochastic signals.
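
The synthesis side of such a sinusoids-plus-noise model can be sketched as follows. This is a minimal illustration under stated assumptions, not the system described in the paper: each partial is given as per-sample frequency and amplitude tracks with the phase obtained by accumulation, and the noise component is white Gaussian noise shaped by a one-pole low-pass filter. The function names, the filter choice, and all parameter values are hypothetical.

```python
import math
import random

def synthesize_sinusoid(freqs_hz, amps, sample_rate, phase0=0.0):
    """One partial from per-sample frequency and amplitude tracks."""
    out, phase = [], phase0
    for freq, amp in zip(freqs_hz, amps):
        out.append(amp * math.cos(phase))
        phase += 2.0 * math.pi * freq / sample_rate   # accumulate instantaneous phase
    return out

def synthesize_noise(n_samples, gain, pole, seed=0):
    """White Gaussian noise through the one-pole filter y[t] = x[t] + pole * y[t-1]."""
    rng = random.Random(seed)
    out, prev = [], 0.0
    for _ in range(n_samples):
        prev = rng.gauss(0.0, 1.0) + pole * prev
        out.append(gain * prev)
    return out

sample_rate = 8000
n_samples = 800

# Partial A glides from 440 Hz to 450 Hz while decaying; partial B is steady at 880 Hz.
freqs_a = [440.0 + 10.0 * t / n_samples for t in range(n_samples)]
amps_a = [math.exp(-3.0 * t / n_samples) for t in range(n_samples)]
partial_a = synthesize_sinusoid(freqs_a, amps_a, sample_rate)
partial_b = synthesize_sinusoid([880.0] * n_samples, [0.3] * n_samples, sample_rate)

noise = synthesize_noise(n_samples, gain=0.05, pole=0.9)
signal = [a + b + e for a, b, e in zip(partial_a, partial_b, noise)]
```

Only the filter shape and gain of the noise are parameters here, with no waveform detail, which reflects the abstract's point that listeners do not perceive the exact spectral shape or phase of stochastic signals.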

A Family of Random Waveform Models for Audio Coding

We study the behavior of hybrid random waveform models for audio signals, involving sparse random series of waveforms with random coefficients. Similar approaches have been considered in recent years. However, these generally do not rely on explicit models and are of a more "algorithmic" nature. The models we propose allow us to analyze mathematical properties of such signals and of the corresponding estimators, and to derive estimation algorithms that do not rely on complex optimization techniques.
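
One plausible way to write such a hybrid model, given here only as a sketch, is shown below. The notation is an assumption and is not taken from the paper: $u_\lambda$ and $v_\delta$ stand for waveforms from two different families (for instance local cosines and wavelets), $\Lambda$ and $\Delta$ are random, sparse index sets, $\alpha_\lambda$ and $\beta_\delta$ are random coefficients, and $r$ is a residual.

```latex
x(t) \;=\; \sum_{\lambda \in \Lambda} \alpha_\lambda \, u_\lambda(t)
      \;+\; \sum_{\delta \in \Delta} \beta_\delta \, v_\delta(t)
      \;+\; r(t)
```

In this reading, sparsity means that $\Lambda$ and $\Delta$ contain few indices compared with the signal length, and an explicit probabilistic model on these sets and coefficients is what makes it possible to analyze estimators mathematically.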

Perceptual segmentation and component selection for sinusoidal representations of audio

IEEE Transactions on Speech and Audio Processing, 2000

This paper presents two fundamental enhancements in a hybrid audio signal model consisting of sinusoidal, transient, and noise (STN) components. The first enhancement involves a novel application of a perceptual metric for optimal time segmentation for the analysis of transients. In particular, Moore and Glasberg's model of partial loudness is modified for use with general signals and then integrated into a novel time segmentation scheme. The second, and perhaps more significant, STN enhancement concerns a new methodology for ranking and selecting the most perceptually relevant sinusoids. A systematic procedure is developed for the selection of a compact set of sinusoids, and comparative results are given to demonstrate the merit of this method.