
Compression of Audio using Discrete Cosine Transformation
Navnit Dhyani, Durvesh Kambli, Tejas Ghadigaonkar
Department of Computer Engineering,
K. C. College Of Engineering and Management Studies and Research
durveshkambli@gmail.com
navnitdhyani@gmail.com
tejaisbest@gmail.com

Abstract
The discrete cosine transform (DCT) is a technique for
converting a signal into elementary frequency components. It
is widely used in image compression. Here we develop some
simple functions to compute the DCT and to compress
sound/audio.
The rapid growth of digital imaging applications,
including desktop publishing, multimedia, teleconferencing,
and high-definition television (HDTV) has increased the need
for effective and standardized image compression techniques.
Among the emerging standards are JPEG, for compression of
still images; MPEG, for compression of motion video; and
CCITT H.261 (also known as Px64), for compression of
video telephony and teleconferencing [1].

Computers hear sounds using a microphone instead of
an eardrum. The microphone converts pressure variations into
an electric potential with amplitude corresponding to the
intensity of the pressure. The computer then processes the
electrical signal using a technique called sampling.
Computers sample the signal by measuring its amplitude at
regular intervals, often 44,100 times per second. Each
measurement is stored as a number with fixed precision,
often 16 bits. The following diagram illustrates the sampling
process showing a simple wave sampled at regular intervals.

Figure 1: Sampling

Introduction

Compared to most digital data types, with the exception of
digital video, the data rates associated with uncompressed
digital audio are substantial. Digital audio compression
enables more efficient storage and transmission of audio
data [2].
The various audio compression techniques offer different
levels of complexity, compressed audio quality, and amount
of data compression. This paper is a survey of techniques used
to compress digital audio signals. Its intent is to provide
useful information on digital audio compression. The paper
begins with a summary of the basic audio digitization
process. The next two sections present a detailed description
of a relatively simple approach to audio compression: DCT
compression. In the following section, the paper gives an
overview of a much more sophisticated audio compression
algorithm from the Moving Picture Experts Group. The paper
concludes with a Scilab implementation of the above.
I. COMPUTERS AND SOUND

Sound is a complicated phenomenon. It is normally
caused by a moving object in air (or other medium), for
example a loudspeaker cone moving back and forth. The
motion in turn causes air pressure variations that travel
through the air like waves in a pond. Our eardrums convert
the pressure variations into the phenomenon that our brain
processes as sound.

The bit rate of a set of digital audio data is the storage in
bits required for each second of sound. If the data has fixed
sampling rate and precision (as does CD audio), the bit rate is
simply their product. For example, the bit rate of one channel
of CD audio is 44,100 samples/second × 16 bits/sample =
705,600 bits/second. The bit rate is a general measure of
storage, and is not always simply the product of sampling rate
and precision.
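As a worked example of this arithmetic, the short Python sketch below computes the bit rate of one channel of CD audio and the storage needed for one minute of sound (the one-minute duration is chosen only for illustration).

# Bit rate of one channel of CD-quality audio (illustrative arithmetic only).
sampling_rate = 44_100      # samples per second
precision = 16              # bits per sample

bit_rate = sampling_rate * precision          # 705,600 bits/second
bytes_per_minute = bit_rate * 60 / 8          # about 5.3 MB per minute

print(f"bit rate: {bit_rate} bits/second")
print(f"storage for one minute: {bytes_per_minute / 1e6:.2f} MB")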
Large storage requirements limit the amount of audio data
that can be stored on compact discs, flash memory, and other
media. Large file sizes also give rise to long download times
for retrieving songs from the internet. For these reasons (and
others), there is considerable interest in shrinking the storage
requirements of sampled sound.
II. SIGNAL PROCESSING

The digital representation of audio data offers many
advantages: high noise immunity, stability, and
reproducibility. Audio in digital form also allows the
efficient implementation of many audio processing functions
(e.g., mixing, filtering, and equalization) through the digital
computer.
The conversion from the analog to the digital domain
begins by sampling the audio input at regular, discrete
intervals of time and quantizing the sampled values into a
discrete number of evenly spaced levels.

Figure 2: Digital Audio Process

The digital audio data consists of a sequence of binary
values representing the number of quantizer levels for each
audio sample. The method of representing each sample with
an independent code word is called pulse code modulation
(PCM). Figure 2 shows the digital audio process.
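As a small illustration of sampling and quantization, the Python sketch below digitizes a synthetic tone into 16-bit PCM code words (the 440 Hz tone, the duration, and the full-scale mapping are placeholder choices for this example).

import numpy as np

# Sample a 440 Hz tone at 44,100 samples/second for a tenth of a second.
sample_rate = 44_100
t = np.arange(0, 0.1, 1.0 / sample_rate)        # regular, discrete time instants
analog = 0.8 * np.sin(2 * np.pi * 440 * t)      # "analog" signal in the range [-1, 1]

# Quantize each sample into one of 2**16 evenly spaced levels (16-bit PCM).
pcm = np.round(analog * 32767).astype(np.int16)

print(pcm[:8])   # the sequence of code words that makes up the digital audio data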
III. COMPRESSION

Compression is a method of converting an audio signal into
an encoded form in such a way that it can later be decoded to
recover the original signal. Compression basically removes
redundancy between neighbouring samples and between
adjacent cycles. The major objective of audio signal
compression is to represent the signal with a smaller number
of bits. The reduction of data should be done in such a way
that there is an acceptable loss of quality.
3.1 Types of Compression:
There are mainly two types of compression techniques:
lossless compression and lossy compression.
Lossless Compression
It is a class of data compression algorithms that allows the
exact original data to be reconstructed from the compressed
data. It is mainly used in cases where it is important that the
original signal and the decompressed signal be identical. An
example of lossless compression is Huffman coding.
Lossy Compression
It is a data encoding method that compresses data by
discarding some of it. The aim of this technique is to
minimize the amount of data that has to be transmitted. It is
mostly used for multimedia data compression.
3.2 Techniques for Compression
Waveform Coding
The signal given as input is reproduced at the output so
that it is very similar to the original signal.
Parametric Coding
In this type of coding the signal is represented by a small
set of parameters that describe the signal accurately. In the
parametric extraction method a preprocessor is used to
extract features that can later be used to reconstruct the
original signal.

Transform Coding
This is the coding technique that we have used in this
paper. In this method the signal is transformed into the
frequency domain and only the dominant features of the
signal are retained. Within the transform method we have
used the discrete wavelet transform technique and the
discrete cosine transform technique. When we use the
wavelet transform technique, the original signal can be
represented in terms of a wavelet expansion. Similarly, in the
case of the DCT, speech can be represented in terms of DCT
coefficients. Transform techniques do not themselves
compress the signal; they provide a representation of the
signal, and compression is then carried out with various
encoding techniques. Speech compression is achieved by
discarding the small, less important coefficients and then
applying quantization and encoding techniques.
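The following Python sketch illustrates this point using SciPy's DCT as a stand-in for the transform: the transform alone produces as many coefficients as there are samples, and data reduction only appears once the small coefficients are discarded (the 1% threshold is an arbitrary choice).

import numpy as np
from scipy.fft import dct

# A placeholder test signal of 1000 samples: two tones.
n = np.arange(1000)
x = np.sin(2 * np.pi * 0.01 * n) + 0.5 * np.sin(2 * np.pi * 0.03 * n)

y = dct(x, type=2, norm='ortho')   # the transform alone: still 1000 numbers
assert y.shape == x.shape          # no compression has happened yet

# Compression only comes from discarding the small, less important coefficients.
kept = np.abs(y) >= 0.01 * np.max(np.abs(y))
print(f"coefficients above 1% of the peak: {kept.sum()} of {y.size}")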
IV. DIGITAL FILTERING

The DCT algorithm can be used not only to interpolate
data, but also to compute a least-squares fit to the data by omitting
frequencies. The process of computing a least-squares fit to
digitized signals by omitting frequencies is called digital
filtering. Digital filtering can reduce the storage requirements
of digital audio by simply lopping off parts of the data that
correspond to specific frequencies. Of course, cutting out
frequencies affects the sound quality of data. However, the
human ear is not equally sensitive to all frequencies. In
particular, we generally do not perceive very high and very
low frequencies nearly as well as mid-range frequencies. In
some cases, we can filter out these frequencies without
significantly affecting the perceived quality.
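A minimal Python sketch of this kind of digital filtering is given below, using SciPy's fast DCT and IDCT; the fraction of coefficients kept is an arbitrary illustrative choice.

import numpy as np
from scipy.fft import dct, idct

def lowpass_dct(samples, keep_fraction=0.25):
    """Crude digital filter: zero the DCT coefficients of the highest
    frequencies and transform back (keep_fraction is an arbitrary choice)."""
    coeffs = dct(samples, type=2, norm='ortho')
    cutoff = int(len(coeffs) * keep_fraction)
    coeffs[cutoff:] = 0.0                       # lop off the high-frequency part
    return idct(coeffs, type=2, norm='ortho')   # fit using only the kept frequencies

# Example: a low tone plus a high-frequency component; filtering removes the latter.
n = np.arange(2048)
signal = np.sin(2 * np.pi * 0.01 * n) + 0.3 * np.sin(2 * np.pi * 0.4 * n)
filtered = lowpass_dct(signal)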
Digital filtering is an effective technique for compressing
audio data in many situations, especially telephony. Cutting
out entire frequency ranges is rather a brute-force method,
however. There are more effective ways to reduce the storage
required of digital audio data, while also maintaining a
high-quality sound.
One idea is this: rather than cutting out less-important
frequencies altogether, we could store the corresponding
model coefficients with lower precision, that is, with fewer
bits. This technique is called quantization. The less-important
frequencies are determined by the magnitude of
their DCT model coefficients.
Coefficients of small magnitude correspond to cosine
frequencies that do not contribute much to the sound sample.
A key idea of methods like the mp3 algorithm is to focus the
compression on parts of the signal that are perceptually not
very important.
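The Python sketch below illustrates this quantization idea under simple assumptions: a uniform quantizer, 8 bits for the less important coefficients, and a magnitude-based split between important and unimportant coefficients. It is only a sketch of the idea, not the scheme used by mp3.

import numpy as np
from scipy.fft import dct, idct

def quantize_coefficients(samples, coarse_bits=8, important_fraction=0.1):
    """Keep the largest-magnitude DCT coefficients at full precision and store
    the rest with only `coarse_bits` bits (a simple uniform quantizer)."""
    coeffs = dct(samples, type=2, norm='ortho')

    # The less important frequencies are those with small-magnitude coefficients.
    order = np.argsort(np.abs(coeffs))[::-1]
    important = order[: int(len(coeffs) * important_fraction)]

    # Coarsely quantize everything, then restore the important ones exactly.
    scale = np.max(np.abs(coeffs)) + 1e-12
    levels = 2 ** (coarse_bits - 1) - 1
    coarse = np.round(coeffs / scale * levels) / levels * scale
    coarse[important] = coeffs[important]

    return idct(coarse, type=2, norm='ortho')

# Example: a short synthetic signal reconstructed from mixed-precision coefficients.
n = np.arange(4096)
signal = np.sin(2 * np.pi * 0.01 * n) + 0.2 * np.sin(2 * np.pi * 0.2 * n)
approx = quantize_coefficients(signal)
print("max reconstruction error:", np.max(np.abs(approx - signal)))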
V. DISCRETE COSINE TRANSFORM
The discrete cosine transform (DCT) helps separate the
signal into parts (or spectral sub-bands) of differing
importance (with respect to the sound's audio quality). The
DCT is similar to the discrete Fourier transform: it transforms
a signal or image from the spatial domain to the frequency
domain.

An audio sample is a sequence of real numbers X = {x_1, ..., x_N}.
The DCT of this audio sample is the sequence DCT(X) = Y =
{y_1, ..., y_N} such that

y_k = w(k) Σ_{n=1}^{N} x_n cos( π(2n−1)(k−1) / (2N) ),  k = 1, ..., N

where w(1) = 1/√N and w(k) = √(2/N) for 2 ≤ k ≤ N.
We will be missing some of the signal, but one of the
properties of DCTs is that a few of the larger coefficients
account for a large amount of the power in the original signal.
Also, the coefficients we discard will usually be from quiet,
high-frequency parts of the sound, which we hear less. These
are some of the reasons why DCT is often used in
compression.
When compressing with DCTs you typically compress
small slices (windows) of the audio at once. This is partly so
that seeking through the compressed stream is easier but
mostly because we want the coefficients in our window to
represent frequencies we hear (with a large window the
majority of the coefficients would represent frequencies well
out of the human hearing range).

We need to note that the DCT represents the original
signal as a sum of cosines, and that the coefficients specify
the amplitude of these cosines.
If we have the DCT coefficients we can transform them
back to the original sequence with the inverse discrete cosine
transform (IDCT). This could be calculated with the above
expression but more efficient algorithms exist for both the
DCT and IDCT.
VI. DCT ENCODING

The coefficients of the DCT are amplitudes of cosines that
are within the original signal. Small coefficients will result
in cosines with small amplitudes, which we are less likely to
hear. So instead of storing the original sample we could take
the DCT of the sample, discard small coefficients, and keep
that. We would store fewer numbers and so compress the
audio data.
The decompression algorithm would be simple; we would
simply take the IDCT of whatever we stored and play that back.

In addition we need to consider the binary format of the
data. We could store the results of the DCT as floating point
values, but that would be 32 bits per coefficient, which seems
a little high given that .wav files store samples as 16-bit
integers. So let's instead linearly map the range of the DCT
coefficients to 16-bit integers and store those.
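Putting these steps together, the following Python sketch outlines such an encoder and decoder (the paper's own implementation is in Scilab; the 1% threshold and the per-window scale factor stored alongside the integers are choices made for this illustration). The window size of 5000 samples matches the windowSize = 5000 used in the implementation.

import numpy as np
from scipy.fft import dct, idct

WINDOW_SIZE = 5000   # window of 5000 samples, as in the implementation

def encode_window(samples, threshold=0.01):
    """DCT a window, discard small coefficients, map the rest to 16-bit integers."""
    coeffs = dct(samples, type=2, norm='ortho')
    coeffs[np.abs(coeffs) < threshold * np.max(np.abs(coeffs))] = 0.0
    scale = np.max(np.abs(coeffs)) + 1e-12
    ints = np.round(coeffs / scale * 32767).astype(np.int16)   # linear map to 16-bit
    return ints, scale

def decode_window(ints, scale):
    """Undo the linear map and take the IDCT to get the audio back."""
    coeffs = ints.astype(np.float64) / 32767 * scale
    return idct(coeffs, type=2, norm='ortho')

def compress(audio):
    """Encode the signal window by window; returns a list of (ints, scale) pairs."""
    return [encode_window(audio[i:i + WINDOW_SIZE])
            for i in range(0, len(audio), WINDOW_SIZE)]

def decompress(blocks):
    return np.concatenate([decode_window(ints, scale) for ints, scale in blocks])

The zeroed coefficients still occupy space in this sketch; in practice they would be run-length or entropy coded so that the stored stream actually shrinks.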

VII. CONCLUSION

In this paper we described the compression of audio using
the discrete cosine transform. The audio signal is processed
in small windows; each window is transformed with the DCT,
the small coefficients that contribute little to the perceived
sound are discarded, and the remaining coefficients are
quantized and linearly mapped to 16-bit integers for storage.
The original signal is approximately recovered by applying
the IDCT to the stored coefficients. The approach was
implemented in Scilab.
A natural extension of this work is to combine the DCT-based
scheme with perceptual models, such as those used in MPEG
audio coding, so that the compression is focused on parts of
the signal that are perceptually less important.
VIII. REFERENCES

[1] Andrew B. Watson, "Image Compression Using the
Discrete Cosine Transform," Mathematica Journal, 4(1), 1994.
[2] Davis Yen Pan, "Digital Audio Compression," Digital
Technical Journal, Vol. 5, No. 2, Spring 1993.
