EE800 Term Project Huffman Coding

Abdullah Aldahami (11074595), April 6, 2010

*
Huffman coding is a simple algorithm that generates a set of variable-length codes with the minimum average code size.
Huffman codes are part of several data formats, such as ZIP, MPEG and JPEG.
The code is generated from the estimated probabilities of occurrence of the symbols.
Huffman coding works by building an optimal binary tree of nodes, which can be stored in a regular array.

*
The method starts by building a list of all the alphabet symbols in descending order of their probabilities (frequencies of appearance).
It then constructs a tree from the bottom up.
Step by step, the two nodes with the smallest probabilities are selected and merged into a new parent node whose probability is their sum.
When the tree is complete, the codes of the symbols are assigned by labelling the branches (e.g. 0 for left, 1 for right) and reading the path from the root down to each leaf.
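The bottom-up construction described above can be sketched in a few lines of Python using a min-heap. The frequencies below are a small made-up example, not the slides' alphabet; note that Huffman trees are not unique (tie-breaking is arbitrary), but the total encoded length is always minimal.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman code for a {symbol: frequency} map.

    Returns {symbol: bit string}. Tie-breaking is arbitrary, so the
    exact codewords may differ between runs of the algorithm, but the
    total encoded length is always minimal.
    """
    tie = count()  # unique tie-breaker so heapq never compares dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # smallest frequency
        f2, _, right = heapq.heappop(heap)  # second smallest
        # Merge into a parent node: prepend 0 on one side, 1 on the other.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

codes = huffman_codes({"a": 5, "b": 2, "c": 1, "d": 1})
```

For these frequencies the code lengths come out as 1, 2, 3 and 3 bits, giving 5·1 + 2·2 + 1·3 + 1·3 = 15 bits in total.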

*Example: "circuit elements in digital computations"
The summation of the frequencies (number of events) is 40.

Character  Frequency        Character  Frequency
i          6                m          2
t          5                s          2
space      4                a          2
c          3                o          2
e          3                r          1
n          3                d          1
u          2                g          1
l          2                p          1

*Example: "circuit elements in digital computations"
[Figure: step-by-step construction of the Huffman tree for the example; internal nodes show the merged frequency sums, branches are labelled 0/1, and the root sums to 40.]

*So, the code will be generated as follows. The total is 154 bits with Huffman coding, compared to 240 bits (40 characters at a fixed 6 bits each) with no compression.

Character  Frequency  Code    Code Length  Total Length
i          6          010     3            18
t          5          000     3            15
space      4          110     3            12
c          3          0010    4            12
e          3          0110    4            12
n          3          100     3            9
u          2          00110   5            10
l          2          01110   5            10
m          2          01111   5            10
s          2          1001    4            8
a          2          1110    4            8
o          2          1111    4            8
r          1          001110  6            6
d          1          001111  6            6
g          1          10000   5            5
p          1          10001   5            5
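The totals above can be checked mechanically. The snippet below transcribes the (frequency, code) pairs from the table and recomputes the bit counts; the 6-bit fixed-length baseline is the slides' uncompressed figure (240 / 40 = 6 bits per character).

```python
# (Frequency, code) pairs transcribed from the table above.
table = {
    "i": (6, "010"),    "t": (5, "000"),    " ": (4, "110"),
    "c": (3, "0010"),   "e": (3, "0110"),   "n": (3, "100"),
    "u": (2, "00110"),  "l": (2, "01110"),  "m": (2, "01111"),
    "s": (2, "1001"),   "a": (2, "1110"),   "o": (2, "1111"),
    "r": (1, "001110"), "d": (1, "001111"), "g": (1, "10000"),
    "p": (1, "10001"),
}

total_symbols = sum(f for f, _ in table.values())                # 40 characters
huffman_bits = sum(f * len(code) for f, code in table.values())  # 154 bits
fixed_bits = total_symbols * 6                                   # 240 bits, 6-bit fixed codes
```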

* Entropy is a measure defined in information theory that quantifies the information content of an information source. It gives an impression of how successful a data compression process can be.
Input:
Symbol                            i      t      space  c      e      n      u      l
Probability P(x)                  0.15   0.125  0.1    0.075  0.075  0.075  0.05   0.05
Output:
Code                              010    000    110    0010   0110   100    00110  01110
Code length in bits (Li)          3      3      3      4      4      3      5      5
Weighted path length Li·P(x)      0.45   0.375  0.3    0.3    0.3    0.225  0.25   0.25
Optimality:
Probability budget 2^(-Li)        1/8    1/8    1/8    1/16   1/16   1/8    1/32   1/32
Information I(x) = -log2 P(x)     2.74   3.00   3.32   3.74   3.74   3.74   4.32   4.32
Entropy term -P(x)·log2 P(x)      0.411  0.375  0.332  0.280  0.280  0.280  0.216  0.216

*The sum of the probability budgets 2^(-Li) across all symbols is always less than or equal to one (the Kraft inequality). In this example the sum is exactly one; as a result, the code is termed a complete code.
Huffman coding reaches 98.36% of the optimum here: (3.787 / 3.85) × 100 = 98.36, where 3.85 bit/symbol is the average code length (154 bits / 40 symbols) and 3.787 bit/symbol is the entropy.
Input:
Symbol                            m      s      a      o      r      d      g      p      Sum
Probability P(x)                  0.05   0.05   0.05   0.05   0.025  0.025  0.025  0.025  = 1
Output:
Code                              01111  1001   1110   1111   001110 001111 10000  10001
Code length in bits (Li)          5      4      4      4      6      6      5      5
Weighted path length Li·P(x)      0.25   0.2    0.2    0.2    0.15   0.15   0.125  0.125  3.85
Optimality:
Probability budget 2^(-Li)        1/32   1/16   1/16   1/16   1/64   1/64   1/32   1/32   = 1
Information I(x) = -log2 P(x)     4.32   4.32   4.32   4.32   5.32   5.32   5.32   5.32
Entropy term -P(x)·log2 P(x)      0.216  0.216  0.216  0.216  0.133  0.133  0.133  0.133  3.787 bit/sym
(The "Sum" column totals run over all 16 symbols of both tables.)
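The entropy, average code length and Kraft sum in the tables can be recomputed directly from the 16 probabilities and code lengths:

```python
import math

# Probabilities (frequency / 40) and code lengths of the 16 symbols,
# in the tables' order: i, t, space, c, e, n, u, l, m, s, a, o, r, d, g, p.
probs   = [0.15, 0.125, 0.1] + [0.075] * 3 + [0.05] * 6 + [0.025] * 4
lengths = [3, 3, 3, 4, 4, 3, 5, 5, 5, 4, 4, 4, 6, 6, 5, 5]

entropy = sum(-p * math.log2(p) for p in probs)       # ~ 3.787 bit/symbol
avg_len = sum(p * l for p, l in zip(probs, lengths))  # 3.85 bit/symbol
kraft   = sum(2.0 ** -l for l in lengths)             # 1.0 -> complete code
efficiency = 100 * entropy / avg_len                  # ~ 98.37 % (the slides
                                                      # round the entropy first
                                                      # and quote 98.36 %)
```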

*
Static probability distribution (Static Huffman Coding)Coding procedures with static Huffman codes operate with a predefined code tree, defined in advance for a given type of data and independent of the particular contents.
The primary problem of a static, predefined code tree arises when the real probability distribution differs strongly from the assumptions; in this case the compression rate decreases drastically.
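The cost of such a mismatch can be made concrete with a small hypothetical example (the distributions below are invented for illustration): the average bits per symbol is Σ Q(x)·L(x), where L are the static code lengths designed for the assumed distribution P and Q is the distribution actually present in the data.

```python
# Hypothetical example: code lengths designed for an assumed
# distribution P, then applied to data actually distributed as Q.
assumed = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = {"a": 1, "b": 2, "c": 3, "d": 3}               # optimal for `assumed`
actual  = {"a": 0.125, "b": 0.125, "c": 0.25, "d": 0.5}  # what really occurs

# Average bits/symbol when the static code meets the real data:
static_cost = sum(actual[s] * lengths[s] for s in actual)    # 2.625

# Average bits/symbol of a code designed for the real data
# (mirror-image lengths: d=1, c=2, a=b=3):
matched_cost = sum(actual[s] * l for s, l in
                   {"a": 3, "b": 3, "c": 2, "d": 1}.items())  # 1.75
```

Here the static code spends 2.625 bits per symbol where a matched code would need only 1.75, a 50% overhead caused purely by the wrong assumptions.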

*
Adaptive probability distribution (Adaptive Huffman Coding)The adaptive coding procedure uses a code tree that is permanently adapted to the previously encoded or decoded data, starting with an empty tree or a standard distribution.
This variant is characterized by its minimal requirements for header data, but the attainable compression rate is unfavourable at the beginning of the coding or for small files.
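A naive sketch of the adaptive idea is shown below: the code is rederived from the counts of the symbols seen so far, and a first occurrence is charged a fixed 8-bit escape. Both the rebuild-from-scratch approach and the 8-bit escape are simplifying assumptions for illustration; practical adaptive Huffman coders (FGK, Vitter) update the existing tree incrementally instead of rebuilding it.

```python
import heapq
from itertools import count
from collections import defaultdict

def code_lengths(freqs):
    """Huffman code lengths for the current counts (rebuilt from scratch)."""
    tie = count()  # unique tie-breaker so heapq never compares dicts
    heap = [(f, next(tie), {s: 0}) for s, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # only one symbol seen so far: give it a 1-bit code
        return {next(iter(freqs)): 1}
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees deepens every leaf in them by one level.
        merged = {s: l + 1 for s, l in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

def adaptive_cost(message):
    """Total bits if each symbol is coded with lengths adapted to the
    counts of the symbols seen before it (new symbols cost a fixed
    8-bit escape here -- a simplifying assumption)."""
    counts, bits = defaultdict(int), 0
    for sym in message:
        if counts[sym] == 0:
            bits += 8  # escape + literal for a first occurrence
        else:
            bits += code_lengths(counts)[sym]
        counts[sym] += 1
    return bits
```

Because no code table has to be transmitted, the header overhead is minimal, but the per-symbol cost is poor until the counts reflect the real distribution, matching the remark above about the start of the coding and small files.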

*