Introduction
Audio coding, or audio compression, algorithms produce a compact digital representation of high-fidelity (wideband) audio signals for efficient transmission or storage. The central objective in audio coding is to represent the signal with the minimum number of bits while achieving transparent signal reproduction. The Moving Picture Experts Group (MPEG) audio compression algorithm is an International Organization for Standardization (ISO) standard for high-fidelity audio compression.
AAC was promoted as the successor to MP3 for audio coding at medium to high bit rates. AAC follows the same basic coding paradigm as Layer 3 (high-frequency-resolution filter bank, non-uniform quantization, Huffman coding, iteration loop structure using analysis-by-synthesis), but improves on Layer 3 in many details and adds new coding tools for improved quality at low bit rates. Its popularity is currently maintained by its being the default codec of iTunes, the media player that powers the iPod, the most popular digital audio player on the market. Furthermore, the iTunes Music Store, whose sales account for 85% of the market for legal online downloads, sells AAC-encoded songs.
Improvements of AAC over MP3:
Sample frequencies from 8 kHz to 96 kHz (official MP3: 16 kHz to 48 kHz)
Up to 48 channels
Higher efficiency and simpler filter bank (hybrid -> pure MDCT)
Higher coding efficiency for stationary signals (block size: 576 -> 1024 samples)
Higher coding efficiency for transient signals (block size: 192 -> 128 samples)
Can use a Kaiser-Bessel derived window function to eliminate spectral leakage at the expense of widening the main lobe
Much better handling of frequencies above 16 kHz
More flexible joint stereo (separate for every scale band)
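The Kaiser-Bessel derived (KBD) window mentioned above can be sketched in a few lines. This is a minimal illustration, not code from any AAC implementation: the function names `bessel_i0` and `kbd_window` are hypothetical, and the shape parameter `alpha=4.0` is only an assumed example value. The sketch also shows the property that makes the window usable with an MDCT: the squared first and second halves sum to one.

```python
import math


def bessel_i0(x, terms=60):
    """Zeroth-order modified Bessel function, evaluated by its power series."""
    total = 0.0
    term = 1.0  # holds (x/2)^m / m!, starting at m = 0
    for m in range(terms):
        total += term * term
        term *= (x / 2.0) / (m + 1)
    return total


def kbd_window(length, alpha=4.0):
    """Kaiser-Bessel derived window of even length (hypothetical helper).

    alpha is a free shape parameter; 4.0 is an assumed example value.
    """
    half = length // 2
    # Kaiser window with half + 1 points, symmetric around its centre.
    kaiser = []
    for j in range(half + 1):
        r = 2.0 * j / half - 1.0
        kaiser.append(bessel_i0(math.pi * alpha * math.sqrt(max(0.0, 1.0 - r * r))))
    total = sum(kaiser)
    # The rising half is the square root of the normalised running sum.
    rising = []
    running = 0.0
    for j in range(half):
        running += kaiser[j]
        rising.append(math.sqrt(running / total))
    # The falling half mirrors the rising half.
    return rising + rising[::-1]


if __name__ == "__main__":
    w = kbd_window(16)
    # Power-complementary check: w[n]^2 + w[n + N]^2 should be 1 for every n.
    print(max(abs(w[n] ** 2 + w[n + 8] ** 2 - 1.0) for n in range(8)))
```

Building the window from a running sum of a Kaiser window is what guarantees the power-complementary property, independent of the chosen `alpha`.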
Both mid/side coding and intensity coding are more flexible, so they can be applied more often to reduce the bit rate. An optional backward prediction, computed line by line, achieves better coding efficiency, especially for very tone-like signals. This feature is only available in the rarely used Main profile.
Improved Huffman coding: in AAC, coding by quadruples of frequency lines is applied more often. In addition, the assignment of Huffman code tables to coder partitions can be much more flexible.
AAC and HE-AAC are far better than MP3 at very low bit rates, but at medium to high bit rates the two formats are more comparable.
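As a rough illustration of the mid/side coding mentioned above, the sketch below converts a left/right pair into mid and side signals and back, and decides band by band whether to use it. The function names and the per-band decision rule are assumptions made for this example; real encoders use more elaborate criteria.

```python
def ms_encode(left, right):
    """Convert left/right samples into mid (average) and side (half difference)."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert ms_encode: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def energy(values):
    return sum(v * v for v in values)


def choose_ms_per_band(left, right, band_edges):
    """Toy per-band switch (hypothetical rule, not taken from any encoder):
    use mid/side in a band when it leaves one channel with less energy
    than the weaker of the original left/right channels."""
    flags = []
    for start, stop in zip(band_edges[:-1], band_edges[1:]):
        l, r = left[start:stop], right[start:stop]
        mid, side = ms_encode(l, r)
        flags.append(min(energy(mid), energy(side)) < min(energy(l), energy(r)))
    return flags


if __name__ == "__main__":
    left = [1.0, 2.0, 3.0, 0.0]
    right = [1.0, 2.0, 0.0, 0.0]
    print(choose_ms_per_band(left, right, [0, 2, 4]))
```

When both channels carry nearly the same content in a band, the side signal is close to zero and cheap to code; when only one channel is active, plain left/right coding is the better choice, which is why a per-band switch is useful.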
MPEG-4 HE-AAC
MPEG-4 HE-AAC v2
MPEG layers
Layer 1: DCT-type filter with equal frequency spread per band. The psychoacoustic model uses only frequency masking.
Layer 2 (MUSICAM): same filter bank as Layer 1. The psychoacoustic model also uses a small amount of temporal masking.
Layer 3 (MP3): the Layer 1 filter bank followed by an MDCT per band, to obtain a non-uniform frequency division similar to the critical bands. The psychoacoustic model includes temporal masking effects and takes stereo redundancy into account, and the coder uses Huffman coding.
At the time of MPEG-1 audio development (finalized in 1992), Layer 3 was considered too complex to be practically useful. Today, Layer 3 (known as MP3) is the most widely deployed audio coding method, because it provides good quality at an acceptable bit rate and because code for Layer 3 is freely distributed.
Masking
What is masking? Masking is the process by which one sound is rendered inaudible by the presence of another sound.
AAC
Spectral Band Replication (SBR): a bandwidth extension technology for the purpose of improved compression. Instead of transmitting the upper part of the spectrum with AAC, SBR regenerates it from the lower part with the help of some low-bit-rate guidance data. The missing high-frequency components are regenerated using a QMF (Quadrature Mirror Filter) bank analysis/synthesis system. The main tools are:
1) High Frequency Reconstruction:
Transposer (or generator): generates the upper part of the spectrum by copying and shifting the lower part of the transmitted spectrum.
Constructor: adds missing sinusoids.
2) Envelope Adjustment: the upper spectrum generated by the transposer is subsequently shaped with respect to frequency and time.
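The two SBR steps described above (copying the low band upward, then adjusting the envelope) can be sketched on a single frame of spectral magnitudes. This is a heavily simplified toy under stated assumptions: real SBR works on QMF subband samples over time, and the names `sbr_encode`, `sbr_decode`, and `band_energies`, as well as the per-band energy used as guidance data, are hypothetical choices for this example.

```python
import math


def band_energies(spectrum, edges):
    """Energy of each band delimited by consecutive entries of edges."""
    return [sum(v * v for v in spectrum[a:b]) for a, b in zip(edges[:-1], edges[1:])]


def sbr_encode(spectrum, crossover, num_bands):
    """Keep the low band; describe the high band only by a coarse energy envelope."""
    low = spectrum[:crossover]
    high = spectrum[crossover:]
    edges = [i * len(high) // num_bands for i in range(num_bands + 1)]
    return low, band_energies(high, edges), edges


def sbr_decode(low, envelope, edges):
    """Rebuild the high band from the low band plus the envelope."""
    high_len = edges[-1]
    # Transposer: copy the low band upward, repeating it if it is too short.
    patch = [low[i % len(low)] for i in range(high_len)]
    # Envelope adjustment: scale each band to the transmitted energy.
    high = []
    for a, b, target in zip(edges[:-1], edges[1:], envelope):
        segment = patch[a:b]
        current = sum(v * v for v in segment)
        gain = math.sqrt(target / current) if current > 0 else 0.0
        high.extend(v * gain for v in segment)
    return low + high


if __name__ == "__main__":
    spectrum = [8.0, 6.0, 4.0, 2.0, 1.0, 0.5, 0.25, 0.125]
    low, envelope, edges = sbr_encode(spectrum, crossover=4, num_bands=2)
    print(sbr_decode(low, envelope, edges))
```

The rebuilt high band does not match the original line by line; only its per-band energy does, which is the sense in which a small amount of guidance data replaces the transmitted upper spectrum.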
AAC
Parametric Stereo (PS): joint coding of stereo audio. Only a mono downmix is transmitted, along with a small data stream describing how to upmix it in the decoder.
Noiseless Coding: lossless packing of the quantized spectral data that exploits statistical dependencies and other properties. It achieves a further reduction in the required data rate by removing redundancy in the representation of the transmitted data.
Predictive Coding:
Forward prediction: the correlation between subsequent input samples is exploited by quantizing and coding the prediction error, with the prediction based on the unquantized input samples.
Backward prediction: the prediction is based on previously quantized values.
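Backward prediction, as described above, can be illustrated with a first-order DPCM loop in which the predictor only ever sees previously quantized (reconstructed) values. This is a generic textbook-style sketch, not the AAC prediction tool; the function names, the uniform quantizer, and the step size are assumptions made for the example.

```python
def quantize(value, step):
    """Uniform quantizer: index of the nearest multiple of step."""
    return round(value / step)


def dpcm_encode(samples, step=0.5):
    """Code each sample as a quantized prediction error.

    The prediction is the previous reconstructed sample, so it depends
    only on already quantized values.
    """
    indices = []
    prediction = 0.0
    for x in samples:
        q = quantize(x - prediction, step)
        indices.append(q)
        # Update the predictor with what the decoder will also reconstruct.
        prediction = prediction + q * step
    return indices


def dpcm_decode(indices, step=0.5):
    """Mirror of the encoder loop: rebuild samples from the error indices."""
    output = []
    prediction = 0.0
    for q in indices:
        prediction = prediction + q * step
        output.append(prediction)
    return output


if __name__ == "__main__":
    samples = [0.2, 0.9, 1.4, 1.1, 0.3]
    codes = dpcm_encode(samples)
    print(codes, dpcm_decode(codes))
```

Because the decoder runs the same predictor on the same reconstructed values, the two loops stay in step and the quantization error does not accumulate; each reconstructed sample stays within half a step of the input.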
AAC Compression
Architecture of HE-AAC
MPEG-4 HE-AAC
HE-AAC is the low-bit-rate codec in the AAC family. It combines the AAC LC (Advanced Audio Coding Low Complexity) audio coder with the SBR (Spectral Band Replication) bandwidth expansion tool. This combination achieves good stereo quality at bit rates as low as 32 to 48 kbit/s. HE-AAC is also known as AAC Plus and can be used in multichannel operation.
MPEG-4 HE-AAC v2
Combined with parametric stereo, the HE-AAC codec provides good audio quality starting at bit rates around 16 to 24 kbit/s for stereo content. HE-AAC v2 is also known as AAC Plus v2.