
FINAL PROJECT: DIGITAL DESIGN WITH VHDL

Implementation of Adaptive Huffman Codec on FPGA


By
Nikhil Soni 200830014

Problem Analysis
In computer science and information theory, Huffman coding is an entropy encoding algorithm used for lossless data compression. The term refers to the use of a variable-length code table for encoding a source symbol (such as a character in a file), where the table has been derived in a particular way from the estimated probability of occurrence of each possible value of the source symbol. It was developed by David A. Huffman while he was a Ph.D. student at MIT. Huffman coding uses a specific method for choosing the representation of each symbol, resulting in a prefix code (that is, the bit string representing one symbol is never a prefix of the bit string representing any other symbol) that expresses the most common source symbols using shorter bit strings than those used for less common symbols.
Static Huffman coding has a drawback: the decompressor must know the probabilities of the characters in the compressed file. Not only can this add to the number of bits needed to encode the file, but if the probabilities are not known in advance, compressing the file requires two passes: one to find the frequency of each character and construct the Huffman tree, and a second to actually compress the file. Expanding on the Huffman algorithm, Faller and Gallager, and later Knuth and Vitter, developed a way to perform Huffman coding as a one-pass procedure.
The fact that the file is encoded dynamically has significant implications for the effectiveness of the tree as an encoder and decoder. Because the static Huffman tree is constructed from the character counts of the file as a whole, it works effectively at a universal level. Adaptive Huffman coding also works at a universal level, but it can be far more effective than static Huffman coding at a local level because the tree is constantly evolving.
Say, for example, a file starts with a run of one character that is not repeated anywhere else in the file. In static Huffman coding, that character will sit low in the tree because of its low overall count, and so will take many bits to encode. In adaptive Huffman coding, the character is inserted at the highest leaf position it can occupy while the run lasts, and is only later pushed down the tree by higher-frequency characters.
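For comparison with the adaptive scheme, the two-pass static procedure described above can be sketched in a few lines. This is a Python illustration only, not the report's VHDL; the function name static_huffman_codes and the sample string are made up for the example, and the bit assigned to each branch (0 for the first subtree popped, 1 for the second) is an arbitrary choice.

```python
import heapq
from collections import Counter

def static_huffman_codes(text):
    """Two-pass static Huffman: count the symbols, then build the code table."""
    counts = Counter(text)  # pass 1: symbol frequencies
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate input with a single distinct symbol
        return {sym: "0" for sym in heap[0][2]}
    tie = len(heap)
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)   # the two lightest subtrees
        w1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, tie, merged))
        tie += 1
    return heap[0][2]

sample = "aaaaaaaabbbcdd"
codes = static_huffman_codes(sample)
encoded = "".join(codes[ch] for ch in sample)  # pass 2: emit the codes
```

In this sample the frequent character a receives a one-bit code while the rare c and d receive three-bit codes, which is the behaviour the prefix-code description above predicts. The decoder would still need the code table (or the counts) before it could read a single bit.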

The Concept:
The basic concept behind an adaptive compression algorithm is very simple:
Initialize the model
Repeat for each character {
    Encode character
    Update the model
}

Decompression works the same way. As long as both sides use the same algorithms to initialize and update the model, they will hold the same information at every step. The problem is how to update the model. To make Huffman compression adaptive, we could simply rebuild the Huffman tree every time a character is sent, but that would make the algorithm extremely slow. The trick is to update only the part of the tree that is affected.
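The slow "rebuild the tree for every character" variant mentioned above is still useful for seeing why the two sides stay in step. The sketch below is Python written for illustration, not the report's VHDL. It assumes a small hypothetical alphabet (ALPHABET) known to both sides, starts every count at 1 instead of using an escape code, and passes the symbol count to the decoder as a plain argument.

```python
import heapq

ALPHABET = "abdr"  # hypothetical alphabet shared by encoder and decoder

def build_codes(counts):
    """Rebuild a Huffman code table from the current counts (deterministic)."""
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, tie, merged))
        tie += 1
    return heap[0][2]

def encode(text):
    counts = {s: 1 for s in ALPHABET}         # initialize the model
    bits = []
    for ch in text:
        bits.append(build_codes(counts)[ch])  # encode character
        counts[ch] += 1                       # update the model
    return "".join(bits)

def decode(bits, n_symbols):
    counts = {s: 1 for s in ALPHABET}         # same initial model as the encoder
    out, pos = [], 0
    for _ in range(n_symbols):
        inverse = {c: s for s, c in build_codes(counts).items()}
        end = pos + 1
        while bits[pos:end] not in inverse:   # prefix code: extend until a match
            end += 1
            if end > len(bits):
                raise ValueError("bit stream ended inside a code")
        ch = inverse[bits[pos:end]]
        out.append(ch)
        pos = end
        counts[ch] += 1                       # same update as the encoder
    return "".join(out)

round_trip = decode(encode("ardad"), 5)
```

Both functions rebuild the whole table before every symbol, which is exactly the cost the incremental tree update is meant to avoid.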

The Algorithm:
The Huffman tree is initialized with a single node, known as the Not-Yet-Transmitted (NYT) node or escape code. This code is sent every time a new character that is not yet in the tree is encountered, followed by the ASCII encoding of the character. This allows the decoder to distinguish between a code and a new character. The procedure also creates a new node for the character and a new NYT node from the old NYT node. Whenever a character that is already in the tree is encountered, its code is sent and its weight is increased.
For this algorithm to work, we need to add some information to the Huffman tree. In addition to a weight, each node is assigned a unique node number. All nodes that have the same weight are said to be in the same block. The node numbers are assigned in such a way that:
1. A node with a higher weight has a higher node number.
2. A parent node always has a higher node number than its children.
This is known as the sibling property, and the update algorithm simply swaps nodes to make sure that this property is upheld. The root node has the highest node number because it has the highest weight.
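The two numbering rules can be checked mechanically. The following Python sketch (an illustration, not part of the report's VHDL package) uses a hypothetical flat Node record holding a node number, a weight and the parent's node number, and a hypothetical helper numbering_is_valid.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    number: int            # unique node number
    weight: int
    parent: Optional[int]  # node number of the parent, None for the root

def numbering_is_valid(nodes):
    """Check the two numbering rules on a flat list of nodes."""
    by_number = sorted(nodes, key=lambda n: n.number)
    # Rule 1: weights never decrease as the node number increases.
    for lower, higher in zip(by_number, by_number[1:]):
        if lower.weight > higher.weight:
            return False
    # Rule 2: every parent has a higher node number than its children.
    return all(n.parent is None or n.parent > n.number for n in nodes)

# A small tree: root 5 has children 3 (internal) and 4 (leaf);
# node 3 has children 1 (NYT, weight 0) and 2 (leaf).
tree = [Node(1, 0, 3), Node(2, 1, 3), Node(3, 1, 5), Node(4, 1, 5), Node(5, 2, None)]
ok = numbering_is_valid(tree)

# Raising the weight of leaf 2 without renumbering breaks rule 1.
broken = [Node(1, 0, 3), Node(2, 2, 3), Node(3, 1, 5), Node(4, 1, 5), Node(5, 2, None)]
bad = numbering_is_valid(broken)
```

The second list shows the situation the update algorithm has to repair: a weight increase that leaves a heavier node with a lower number than a lighter one.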

After a character is coded, the update procedure starts at its leaf and moves up the tree, inspecting the node and then each of its ancestors one at a time. At each step it checks whether the node has the highest node number in its block and, if not, swaps it with the node that does (unless that node is its parent). It then increases the node's weight and goes to the parent, continuing until it has processed the root node. This ensures that the nodes with the highest weights stay closer to the top and have shorter codes. The next page shows the adaptive construction of the Huffman tree for the input string ardad.
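A compact software model of the encoder side may help when reading the update rule. The sketch below is Python, not the report's VHDL, and makes several assumptions of its own: node numbers are implicit in a list (self.order, highest number first), a left branch emits 0 and a right branch 1, and a new leaf starts at weight 0 so that the same update loop handles new and existing characters. It also skips the swap when the block's highest-numbered node is the node's own parent, as the usual descriptions of the algorithm do.

```python
class Node:
    def __init__(self, symbol=None, parent=None):
        self.weight, self.symbol, self.parent = 0, symbol, parent
        self.left = self.right = None

class AdaptiveHuffmanEncoder:
    def __init__(self):
        self.root = self.nyt = Node()  # the tree starts as a lone NYT node
        self.order = [self.root]       # nodes listed from highest number down
        self.leaves = {}               # symbol -> leaf node

    def _code(self, node):
        bits = []
        while node.parent is not None:  # walk up, collecting branch bits
            bits.append("0" if node.parent.left is node else "1")
            node = node.parent
        return "".join(reversed(bits))

    def _swap(self, a, b):
        # Exchange the two nodes' positions in the numbering and in the tree.
        ia, ib = self.order.index(a), self.order.index(b)
        self.order[ia], self.order[ib] = b, a
        pa, pb = a.parent, b.parent
        a_is_left, b_is_left = pa.left is a, pb.left is b
        if a_is_left:
            pa.left = b
        else:
            pa.right = b
        if b_is_left:
            pb.left = a
        else:
            pb.right = a
        a.parent, b.parent = pb, pa

    def _update(self, node):
        while node is not None:
            # First list entry with this weight = highest number in the block.
            leader = next(n for n in self.order if n.weight == node.weight)
            if leader is not node and leader is not node.parent:
                self._swap(node, leader)
            node.weight += 1
            node = node.parent

    def encode(self, symbol):
        """Return the bits sent for one character, then update the tree."""
        if symbol in self.leaves:
            leaf = self.leaves[symbol]
            bits = self._code(leaf)
        else:
            # NYT code followed by the 8-bit character, then split the NYT node.
            bits = self._code(self.nyt) + format(ord(symbol), "08b")
            parent = self.nyt
            self.nyt = Node(parent=parent)
            leaf = Node(symbol=symbol, parent=parent)
            parent.left, parent.right = self.nyt, leaf
            self.order += [leaf, self.nyt]
            self.leaves[symbol] = leaf
        self._update(leaf)
        return bits

encoder = AdaptiveHuffmanEncoder()
sent = [encoder.encode(ch) for ch in "ardad"]
```

With these conventions the first three characters of ardad each go out as the current NYT code followed by 8 raw bits, while the repeated a and d go out as the short codes 0 and 101. A decoder that applies the same split and update steps after every decoded character keeps an identical tree.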

Tree Update:
The Huffman tree is dynamically updated using the following flow diagram (the corresponding VHDL code snippets are also shown):

Architectural Design:
I have designed the Huffman codec using four files:
1) File one contains a package that holds all the records and procedures.
2) File two contains the encoder entity huffman_coder, which takes an 8-bit input and produces two outputs: an 8-bit vector carrying the character when it has not been transmitted before, and a variable-length vector carrying the Huffman code. It also updates the Huffman tree at the encoder end.
3) File three contains the decoder entity huffman_decoder, which takes as input an 8-bit vector (used when the character has not been transmitted before) and a variable-length vector containing the Huffman code.
4) File four contains the entity huffman_codec, which connects the encoder and decoder.

Source Code:

Source Code for entity huffman_codec

Source code for Huffman_coder

Source code for Huffman_decoder

Source code for package huffman_tree :

procedure path_to_node(

Explanation of Procedures:
PROCEDURE create
This procedure takes the NYT node pointer, the global order count and the 8-bit input character as input, and creates two children of the current NYT node. The left child becomes the new NYT node and the right child stores the value of the 8-bit character.
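The effect of create can be modelled in software as follows. This is a Python sketch, not the VHDL procedure itself. The Node fields, the returned tuple, the starting order number 512 and the choice to count order numbers downward are all assumptions made for the example; the text above fixes only that the left child becomes the new NYT node and the right child holds the character.

```python
class Node:
    def __init__(self, order, symbol=None, parent=None):
        self.order = order    # order number of this node
        self.weight = 0
        self.symbol = symbol  # 8-bit character for a leaf, None otherwise
        self.parent = parent
        self.left = self.right = None

def create(nyt, order_count, symbol):
    """Split the NYT node; return (new_nyt, new_leaf, updated_order_count)."""
    leaf = Node(order_count - 1, symbol=symbol, parent=nyt)  # right child
    new_nyt = Node(order_count - 2, parent=nyt)              # left child
    nyt.left, nyt.right = new_nyt, leaf
    return new_nyt, leaf, order_count - 2

root = Node(512)  # hypothetical starting order number for the initial NYT node
nyt, leaf, order_count = create(root, 512, ord("a"))
```

After the call, the old NYT node has become an internal node, and both children carry lower order numbers than their parent, which matches the numbering rule for parents and children.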

PROCEDURE path_to_node
This procedure takes a pointer to a leaf node as input and traverses up to the root, building the Huffman path to that leaf in a string as it goes.
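A software analogue of this upward walk is sketched below in Python (not the report's VHDL). The Node class is a stand-in for the report's record type, and the mapping of a left branch to 0 and a right branch to 1 is an assumption, since the text does not state which bit each branch emits.

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.left = self.right = None

def path_to_node(leaf):
    """Walk from a leaf up to the root and return the code, root end first."""
    bits = ""
    node = leaf
    while node.parent is not None:
        # Prepend, because the walk meets the last bit of the code first.
        bits = ("0" if node.parent.left is node else "1") + bits
        node = node.parent
    return bits

# Small tree: root -> (inner, leaf_a); inner -> (leaf_b, leaf_c)
root = Node()
inner, leaf_a = Node(root), Node(root)
root.left, root.right = inner, leaf_a
leaf_b, leaf_c = Node(inner), Node(inner)
inner.left, inner.right = leaf_b, leaf_c

code_c = path_to_node(leaf_c)
```

Because the walk starts at the leaf, each new bit is placed in front of the ones already collected; appending and reversing at the end would work equally well.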

PROCEDURE get_max_order
This procedure takes an order number as its query and searches the tree for the node with that order number.

PROCEDURE swap_nodes
This procedure swaps two nodes, as required by the algorithm, and then swaps their order numbers back so that the numbering remains consistent.

PROCEDURE order_max
This procedure is used to check whether the given node has the largest order number in its block. It returns the node with the largest order number in the block. This is done by traversing a global integer array that stores the weight of each node, indexed by order number.
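The array scan can be pictured with a short Python sketch (an illustration, not the VHDL procedure). It assumes a list named weights indexed by order number, with -1 marking unused slots, and it returns the order number rather than the node; looking the node up from that number is what get_max_order is described as doing.

```python
def order_max(weights, order):
    """Return the largest order number whose weight equals that of `order`."""
    target = weights[order]
    best = order
    for candidate in range(order + 1, len(weights)):
        if weights[candidate] == target:
            best = candidate
    return best

# Order numbers 0..4 are in use; slots 5 and 6 are unused (-1).
weights = [0, 1, 1, 1, 2, -1, -1]
leader = order_max(weights, 1)
```

If the result equals the order number passed in, the node is already the highest in its block and no swap is needed.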

PROCEDURE go_to_node
This procedure takes the Huffman code and traverses the tree accordingly to reach the required leaf node.
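On the decoder side this is the inverse of path_to_node. The Python sketch below (not the report's VHDL) again assumes that 0 selects the left child and 1 the right child, and uses a hypothetical Node class with a symbol field.

```python
class Node:
    def __init__(self, symbol=None):
        self.symbol = symbol  # set for leaves only
        self.left = self.right = None

def go_to_node(root, code):
    """Follow the bits of `code` down from the root and return the node reached."""
    node = root
    for bit in code:
        node = node.left if bit == "0" else node.right
    return node

# Small tree: root -> (inner, leaf 'a'); inner -> (leaf 'b', leaf 'c')
root, inner = Node(), Node()
root.left, root.right = inner, Node("a")
inner.left, inner.right = Node("b"), Node("c")

reached = go_to_node(root, "01")
```

An empty code leaves the walk at the root, which is the situation at start-up when the tree consists of the NYT node alone.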

PROCEDURE exists
This procedure takes the 8-bit data as input and returns a Boolean value stating whether a node for that character already exists in the tree.

RTL Schematic

Conclusion & Analytical Remarks:
The project was successfully completed. It contains four main modules: package, encoder, decoder and codec. The codec takes 8-bit data as input; the encoder updates its tree and transmits the corresponding Huffman code, and the decoder takes that code as input and updates its own tree as well. The major challenges that we faced were:
1) Working with data structures and access variables.
2) Coming up with the complete design of the codec.
3) Working with recursive procedures.

