MPEG Audio FAQ Version 9

INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND AUDIO

ISO/IEC JTC1/SC29/WG11 N2431
October 1998/Atlantic City

Source: Audio Subgroup
Title: MPEG Audio FAQ Version 9
Authors: D. Thom, H. Purnhagen, and the MPEG Audio Subgroup

Contents

How do MPEG-1 and MPEG-2 Audio work?

The MPEG Audio coders, aimed at generic audio, i.e. all types of speech and music signals, are perceptual audio coders rather than so-called 'waveform coders'. A perceptual audio coder does not attempt to retain the input signal exactly after encoding and decoding; rather, its goal is to ensure that the output signal sounds the same to a human listener. The primary psychoacoustic effect that the perceptual audio coder exploits is called 'auditory masking', where parts of a signal are not audible due to the function of the human auditory system. The parts of the signal that are masked are commonly called 'irrelevant', as opposed to the parts of the signal that are removed by a source coder (lossless or lossy), which are termed 'redundant'.

In order to remove this irrelevancy, the encoder contains a psychoacoustic model. This model analyses the input signal in consecutive time blocks and, for each block, determines the spectral components of the input audio signal by applying a frequency transform. It then models the masking properties of the human auditory system and estimates the just noticeable noise level, sometimes called the masking threshold.
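
For illustration only, the following Python sketch shows the kind of processing such a model performs: it transforms one block of samples to the frequency domain, groups the spectrum into bands, and derives a per-band masking threshold by subtracting an assumed signal-to-mask offset from the band energy. The equal-width bands and the fixed 10 dB offset are simplifying assumptions, not values taken from the standard's normative psychoacoustic models.

    import numpy as np

    def masking_threshold_db(block, n_bands=32, smr_offset_db=10.0):
        # Toy per-band masking threshold for one block of audio samples.
        # A real MPEG psychoacoustic model uses critical-band analysis,
        # tonality estimation and spreading functions instead.
        windowed = block * np.hanning(len(block))          # analysis window
        power = np.abs(np.fft.rfft(windowed)) ** 2         # power spectrum
        bands = np.array_split(power, n_bands)             # crude band grouping
        band_energy_db = np.array(
            [10.0 * np.log10(np.sum(b) + 1e-12) for b in bands])
        # Assume noise more than smr_offset_db below the band energy is inaudible.
        return band_energy_db - smr_offset_db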

In parallel, the input signal is fed through a time-to-frequency mapping, resulting in spectral components for subsequent coding. In its quantisation and coding stage, the encoder tries to allocate the available number of data bits in a way that meets both the bitrate and the masking requirements. The information on how the bits are distributed over the spectrum is contained in the bitstream as side information.
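
The allocation itself can be pictured as a greedy loop: keep assigning bits to the band where the quantisation noise exceeds the masking threshold by the largest margin, until the bit budget for the block is spent. The sketch below illustrates this idea under two simplifying assumptions, namely that each additional bit lowers the noise in a band by about 6 dB and that bits are handed out one at a time; the real layer-specific allocation tables and procedures differ.

    def allocate_bits(band_energy_db, threshold_db, total_bits, max_bits=15):
        # Greedy sketch: repeatedly give one bit to the band whose quantisation
        # noise sticks out furthest above its masking threshold.
        n_bands = len(band_energy_db)
        bits = [0] * n_bands
        noise_db = list(band_energy_db)        # 0 bits: noise at signal level
        remaining = total_bits
        while remaining > 0:
            candidates = [i for i in range(n_bands) if bits[i] < max_bits]
            if not candidates:
                break
            worst = max(candidates, key=lambda i: noise_db[i] - threshold_db[i])
            if noise_db[worst] <= threshold_db[worst]:
                break                          # all remaining noise is masked
            bits[worst] += 1
            noise_db[worst] -= 6.0             # assume ~6 dB gain per extra bit
            remaining -= 1
        return bits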

The decoder is much less complex, because it requires neither a psychoacoustic model nor a bit allocation procedure. Its only task is to reconstruct an audio signal from the coded spectral components and the associated side information.
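
As a rough sketch of what this reconstruction involves, the hypothetical function below turns quantised subband values back into samples using the transmitted bit allocation and scalefactors; the uniform quantiser assumed here is purely illustrative, and the layer-specific dequantisation formulas and the synthesis filterbank are defined in the standard.

    def reconstruct_subband_samples(quantised, bits, scalefactors):
        # Toy dequantisation of one set of subband samples from the coded
        # values plus the side information (bit allocation and scalefactors).
        samples = []
        for q, b, sf in zip(quantised, bits, scalefactors):
            if b == 0:
                samples.append(0.0)            # band not transmitted: masked
            else:
                step = 1.0 / (1 << (b - 1))    # uniform quantiser step (assumed)
                samples.append(q * step * sf)
        return samples                         # then fed to the synthesis filterbank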

Talking about MPEG Audio, I hear about phases, layers and levels. What does that all mean?

There are two different matters that have to be distinguished. First, MPEG works in phases. These phases are normally denoted by Arabic numerals (MPEG-1, MPEG-2, MPEG-4). The audio activities of the first phase, MPEG-1, were finalised in 1992 and resulted in the International Standard ISO/IEC 11172-3, which was published in 1993. Part of the audio activities in the second phase, MPEG-2, was finalised in 1994 and resulted in the International Standard ISO/IEC 13818-3, which was published in 1995. Further work relating to MPEG-2 was finalised in 1997 and published as International Standard ISO/IEC 13818-7. Another phase, currently under way, is called MPEG-4 and is planned to be finalised in 1998.

In both MPEG-1 and MPEG-2, three different layers are defined, sometimes incorrectly called 'levels'. These layers represent a family of coding algorithms. The layers are preferably denoted by Roman numerals, i.e. Layer I, Layer II and Layer III.

What are these different layers?

The different layers have been defined because they all have their merits. Basically, the complexity of the encoder and decoder, the encoder/decoder delay, and the coding efficiency increase when going from Layer I via Layer II to Layer III. Layer I has the lowest complexity and is specifically suited for applications where the encoder complexity also plays an important role. Layer II requires a more complex encoder and a slightly more complex decoder, and is directed towards 'one to many' applications, i.e. one encoder serving many decoders. Compared to Layer I, Layer II is able to remove more of the signal redundancy and to apply the psychoacoustic threshold more efficiently. Layer III is again more complex and is directed towards lower bit rate applications, thanks to the additional redundancy and irrelevancy extraction enabled by the enhanced frequency resolution of its filterbank.

The term 'layers' suggests that the higher layers are 'on top' of the lower layers. Is that true?

Not exactly. But it is true that the main functional modules of the lower layers are also used by the higher layers: the subband filter of Layer I is also used by Layer II and Layer III, Layer II adds a more efficient coding of the side information, and Layer III adds a frequency transform in all subbands. The three layers have been defined to be compatible in a hierarchical way, i.e. a 'full Layer N' decoder is able to decode bitstreams encoded in Layer N and all layers below N. Consequently, a 'full Layer III' decoder accepts Layer I, II and III bitstreams and a 'full Layer II' decoder accepts Layer I and II bitstreams, whereas a Layer I decoder only accepts Layer I bitstreams.

MPEG Audio decoders may also exist which do not support the full functionality of a certain layer, or which do not support the lower layers. Such decoders may, however, not be referred to as 'full Layer N' decoders.
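
In practice, a decoder can determine from the frame header which layer a bitstream uses and thus whether it can decode it. The sketch below reads the two layer bits of an MPEG Audio frame header (following the 11-bit sync pattern) and compares them with the highest layer a 'full Layer N' decoder supports; the function and parameter names are of course only illustrative.

    LAYER_FIELD = {0b11: 1, 0b10: 2, 0b01: 3}  # header layer bits -> layer number

    def can_decode(header, full_layer):
        # header: the 4-byte frame header of an MPEG Audio frame.
        if len(header) < 4:
            raise ValueError("need the 4-byte frame header")
        if header[0] != 0xFF or (header[1] & 0xE0) != 0xE0:
            raise ValueError("no frame sync found")
        layer_bits = (header[1] >> 1) & 0b11
        if layer_bits not in LAYER_FIELD:
            raise ValueError("reserved layer value")
        # Hierarchical compatibility: a full Layer N decoder accepts Layers 1..N.
        return LAYER_FIELD[layer_bits] <= full_layer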

Now I understand the layers; what about the different phases?

The first phase, MPEG-1, dealt with mono and two-channel stereo sound coding at the sampling frequencies commonly used for high quality audio (48, 44.1 and 32 kHz). This phase was finished in 1992, and MPEG-1 Audio is nowadays used in a multitude of applications.

The second phase contained two different work items. The first is the extension to lower sampling frequencies, providing better sound quality at very low bit rates (below 64 kbit/s for a mono channel). The second is the extension to multichannel sound. MPEG-2 Audio supports up to 5 full bandwidth channels plus one low frequency enhancement channel (such an ensemble of channels is referred to as '5.1'). This multichannel extension is both forward and backward compatible with MPEG-1.

Both MPEG-1 and MPEG-2, and within MPEG-2 both work items, have the three-layer structure. Also within the framework of MPEG-2, work has been carried out on Advanced Audio Coding (AAC), also known as Non-Backward-Compatible (NBC) coding; this work was finished in 1997. The AAC codec does not provide an inherent 'backward compatible' mode.

What does the compatibility in MPEG-2 multichannel mean?

The core of the MPEG-2 bitstream is an MPEG-1 bitstream, which enables fully compatible decoding with an MPEG-1 decoder. In addition, the need to transfer two separate bitstreams, called simulcast (one for the two-channel stereo programme and another one for the multichannel audio programme), is avoided, at some cost in coding efficiency for the multichannel audio signal compared to AAC, which is a Non-Backward-Compatible (NBC) coding algorithm.

How will an MPEG-1 decoder get information from all channels when receiving an MPEG-2 Audio bitstream?

An MPEG-1 decoder will be supplied with an appropriate two-channel downmix of all the channels in the multichannel ensemble, contained in the MPEG-1 core of the MPEG-2 bitstream. The left and right channels of the downmix together contain components of all the channels, according to the equations of the compatibility matrix.
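
As an illustration of what such a downmix looks like, the sketch below combines the centre and surround channels into the left and right channels with attenuation factors a and b; the value 1/sqrt(2) used here is only an example, since the actual coefficients are those defined by the compatibility matrix and signalled with the bitstream.

    import math

    def compatibility_downmix(l, r, c, ls, rs, a=1 / math.sqrt(2), b=1 / math.sqrt(2)):
        # Lo = L + a*C + b*Ls,  Ro = R + a*C + b*Rs   (illustrative coefficients)
        lo = [x + a * y + b * z for x, y, z in zip(l, c, ls)]
        ro = [x + a * y + b * z for x, y, z in zip(r, c, rs)]
        return lo, ro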

Do I have to use MPEG-1 Audio with MPEG-1 Video and MPEG-2 Audio with MPEG-2 Video?

No. Due to the compatibility, MPEG-2 Audio can be used with MPEG-1 Video, and the other way around, MPEG-1 Audio can be used with MPEG-2 Video, without any restrictions. Any combination of MPEG-1 and MPEG-2 Audio and Video can be handled by the system, as specified by the MPEG Systems standards ISO/IEC 11172-1 for MPEG-1 and ISO/IEC 13818-1 for MPEG-2.

Is Variable Bit Rate allowed in MPEG-1 and MPEG-2 Audio?

For Layer III, the answer is simply 'yes'. For Layers I and II, the standard does not make it mandatory for decoders to support Variable Bit Rate (VBR). In practice, however, the majority of decoders do support VBR, and it is perfectly in line with the standard to specify for a certain application that decoders shall support VBR.

What is the relation between MUSICAM and MPEG Audio Layer II?

MUSICAM was the name of an audio coding system submitted to MPEG, which became the basis for MPEG Audio Layers I and II. Since the finalisation of MPEG-1 Audio, the original MUSICAM algorithm is no longer used. The name MUSICAM is, however, still regularly but mistakenly used when MPEG Audio Layer II is meant. This should especially be avoided because the name MUSICAM is trademarked by different companies in different regions of the world.

What are the typical applications of MPEG Audio?

Within the professional and consumer markets, four fields of application can be identified: broadcasting, storage, multimedia and telecommunication. This variety of applications is possible because of the wide range of bitrates and the numerous configurations allowed within the MPEG Audio standard.

How many MPEG Audio decoders are already in the market-place?

Because of the widespread applications, it is rather difficult to give exact numbers, but at the end of 1996 a rough estimate put the total number of decoders in the marketplace at several million.

What are the reasons that MPEG Audio is used so widely?

Thanks to its technical merits and excellent audio quality, several standardisation bodies include the MPEG Audio standard in their recommendations. ITU-R (the Radiocommunication Sector of the International Telecommunication Union) issued in 1994 Recommendation BS.1115, recommending MPEG Audio for audio as well as television broadcasting, including contribution, distribution, commentary and emission links. In 1995, DAVIC (Digital Audio-Visual Council) specified the use of MPEG Audio for mono and stereo audio signals. ETSI (European Telecommunications Standards Institute) included in January 1995 MPEG-1 and MPEG-2 Audio in its DAB standard ETS 300 401, 'Radio Broadcasting system; Digital Audio Broadcasting (DAB) to mobile, portable and fixed receivers', and later in ETR 154, 'Digital broadcasting systems for television; Implementation guidelines for the use of MPEG-2 systems; Video and audio in satellite and cable broadcasting applications'. ITU-T recommended in 1995, in its Recommendation J.52 'Digital Transmission of High-Quality Sound-Programme Signals using one, two or three 64 kbit/s Channels per Mono Signal (and up to Six per Stereo Signal)', the use of MPEG-1 and MPEG-2 Audio as an audio coding system to provide high quality audio over telecommunication lines.

How is the performance of MPEG Audio with respect to cascading, i.e. multiple coding?

This functionality was tested by the International Telecommunication Union (ITU-R), which tested various configurations of repeated encoder/decoder chains at different bitrates, using a variety of audio coding algorithms.

MPEG Audio performed best in these tests. On the basis of these tests, ITU-R recommends the use of MPEG Audio Layer II for contribution (i.e. links between broadcasting studios with provisions for post-processing), for distribution (i.e. the link between the broadcasting studio and the transmitter station) and for emission (i.e. the final transmission between the transmitter and the receiver at home). The use of MPEG Audio Layer III is recommended for commentary links, i.e. links over which speech signals are transmitted to the broadcasting station using, for example, one B-channel of an ISDN line.

What is the Signal-To-Noise Ratio (SNR) of MPEG Audio?

For a perceptual codec, this is not really a relevant question. The SNR is a very poor measure of perceptual audio quality, even for a waveform coder. The SNR, measured in a conventional way, may vary from a few dB up to more than 100 dB, depending mostly on the signal, while no noise is audible in any of these cases. Within the International Telecommunication Union (ITU-R), a task group (TG 10/4) is working on the development of a more appropriate objective measurement system based on perceptual models. For the moment, one has to rely on the human ear as the measuring instrument, i.e. there are no reliable means other than listening tests to determine the quality of a perceptual codec. Even when a standardised, perceptually based objective measurement system becomes available, listening tests will still be advisable for the comparison of different audio codecs.

On the basis of psychoacoustics, it must be noted that within the range of 5 to 80 dB SNR it is easily possible to generate two test signals, one that is an exact reproduction in the perceptual sense and one that is not, showing the large range over which SNR is not meaningful as a quality measure.
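
For reference, the conventional SNR discussed above is simply the ratio of signal power to error power, as in the sketch below; two coded signals with very different values of this figure can nevertheless both be perceptually indistinguishable from the original, which is exactly why it is not used as a quality measure for perceptual codecs.

    import numpy as np

    def conventional_snr_db(original, decoded):
        # Conventional (non-perceptual) SNR between original and decoded signal.
        original = np.asarray(original, dtype=float)
        error = original - np.asarray(decoded, dtype=float)
        signal_power = np.mean(original ** 2)
        noise_power = np.mean(error ** 2) + 1e-20   # guard against division by zero
        return 10.0 * np.log10(signal_power / noise_power)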

Now that I'm very interested in MPEG Audio, where can I find more detailed information in the literature?

Since 1992, many articles about MPEG Audio coding have been published all over the world. The following list of standards, recommendations and articles provides more information and gives you both a better overview and more detailed knowledge, so that you will become a 'real MPEG Audio fan'.