Huffman Codes

Basic

Huffman's algorithm (or Huffman's code) is an algorithm created in the early 1950's focused on compressing files and making them a smaller size. The algorithm does this by changing the binary encoding of characters within the program. A standard file (without compression) might consist of three binary bits for every character; what Huffman's algorithm does is change the amount of bits for each character, making less frequent characters more bits, and more frequent ones less bits.

In the visual below we see how this ends up saving us space within the program:
slide1final

The question is, how do we come about finding the numbers to use for the compressed bits? Enter the Huffman-code tree.

Huffman-Code Tree

To find which numbers to use for the compressed bits we need to construct a Huffman-code tree. Doing this is simple and all you have to do is follow a few simple steps. Firstly, lay out all the characters in your program with the number of times they occur inside the program. (See Below)
slide2final

Then find the two numbers that appear least frequently. Add them together and make another node connecting the two with the addition of both of the occurrences of those letters. Make sure to always keep the smaller number on the left. (See Below)
Slide3.jpg

Repeat steps until all characters are connected. Once done, for every line going to the left add a "0" and for every line going to the right add a "1". (See Below)
Slide4.jpg

Once this is done you can find out the "Huffman bits" for each of your characters.

Prefix Codes

One would think a problem arises; normally the computer knows its reading a new character because it scans every three bits, how then here, if the bits are of different lengths will the computer know that this is a new character? The answer is simple. One character's bit-code will never start with that of another. For example, since I have my "A" being "01", there is no character that will start with a "01" in the beginning. This should be evident when looking at how the tree is constructed.

Decoding

For a computer to be able to understand this code, you need more than just to send the compressed file. You will also need to send a map of all the characters in the specific program and all their frequencies so that the computer can create a tree for itself.

Sample

slide5final
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License