
Information Retrieval 902333
Paper Topic: Huffman Coding
Team Members: Hana’a shdefat 0950902031, Bushra hasaien 0900902008
Dr. Saif Rababah

Data Compression
• Compressing data: data compression (of images, sound, video, text, graphics, etc.) reduces the size of the data. Its main aim is a smaller file.

Types of Data Compression
Data compression is generally divided into two types:
o The first type is lossless: nothing is lost. The file recovered from the compressed output is an exact copy of the original file as it was before compression.
o The second type is lossy: some information is lost. The file recovered from the compressed output is similar to the original file only to a certain extent, but the data that is lost is not essential to the user.
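The lossless property can be checked directly: decompressing the compressed bytes must return the original bytes exactly. The sketch below uses Python's standard `zlib` module purely as an illustration (the slides do not name a tool, and the sample data is an assumption made for this example).

```python
import zlib

# Sample data (an arbitrary assumption for this sketch): repetitive text compresses well.
original = b"ABCABCABCA" * 100

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# Lossless: the round trip gives back exactly the original bytes.
is_lossless = restored == original
print(len(original), len(compressed), is_lossless)
```

A lossy method would fail this equality check by design, which is why it suits pictures or sound better than text.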

History of Huffman codes
• In 1951, Professor Robert M. Fano of the Massachusetts Institute of Technology was teaching Shannon-Fano coding to his students. He gave them a choice: either take the final exam or find a method better and more efficient than Shannon-Fano coding. David Huffman, one of his students, tried to improve on Shannon-Fano coding by trial and error. He was about to give up and prepare for the exam when he found a way to build the tree from the bottom up, unlike Shannon-Fano coding, which builds it from the top down. The resulting code is better than the Shannon-Fano code.

David A. Huffman
• BS in Electrical Engineering at Ohio State University
• Worked as a radar maintenance officer for the US Navy
• PhD student in Electrical Engineering at MIT, 1952
• Was given the choice of writing a term paper or taking a final exam
• Paper topic: Huffman coding

Huffman Coding
• Uses the minimum average number of bits per symbol of any prefix code
• Variable-length coding – good for data transfer
  – Different symbols have codewords of different lengths
• Symbols with the highest frequency get shorter codewords
• Symbols with lower frequency get longer codewords
• “Z” will have a longer code representation than “E”, judging by how often each character occurs in English text
• No codeword is a prefix of another codeword!
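The prefix rule in the last bullet can be tested mechanically. The following is a minimal sketch; the function name `is_prefix_free` and both code tables are illustrative assumptions, not part of the slides.

```python
def is_prefix_free(codes):
    """Return True if no codeword is a prefix of a different codeword."""
    words = list(codes.values())
    for i, a in enumerate(words):
        for j, b in enumerate(words):
            if i != j and b.startswith(a):
                return False
    return True

good = {"A": "0", "B": "10", "C": "11"}
bad = {"A": "0", "B": "01", "C": "11"}   # "0" is a prefix of "01"

print(is_prefix_free(good))  # True
print(is_prefix_free(bad))   # False
```

With the `bad` table a decoder that has just read `0` cannot tell whether it has seen a complete A or the start of a B, which is exactly what the prefix rule prevents.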

Decoding
• To determine the original message, read the string of bits from left to right and use the table to determine the individual symbols
Decode the following: 01101101101101101101
Symbol Code
A 01
B 10
C 11
Answer: ABCABCABCA

Decoding
Original string: 01101101101101101101
Symbol Code
A 01
B 10
C 11
01 10 11 01 10 11 01 10 11 01
A  B  C  A  B  C  A  B  C  A
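The left-to-right table lookup described above can be sketched in a few lines of Python. The function name `decode` is an assumption made for this illustration; the table and the bit string are the ones from the slide.

```python
def decode(bits, table):
    """Decode a bit string with a prefix-code table mapping symbol -> codeword."""
    by_code = {code: sym for sym, code in table.items()}
    out = []
    current = ""
    for bit in bits:
        current += bit
        if current in by_code:          # a complete codeword has been read
            out.append(by_code[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)

table = {"A": "01", "B": "10", "C": "11"}
print(decode("01101101101101101101", table))  # ABCABCABCA
```

Emitting a symbol as soon as the collected bits match a codeword is only safe because the table is a prefix code.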

• This text contains 10 characters; at 8 bits per character that is 8 * 10 = 80 bits
• Representing each character with 2 bits instead of 8 bits gives 2 * 10 = 20 bits
• In this case nothing is lost from the text; we only reduced the space it takes
• Compression ratio = compressed size / uncompressed size = 20 / 80 = 25%

• A occurs more often than the other characters, so it is given the one-bit code 0
• Encoded text: 0101101011010110
• 4 A: 4 * 1 = 4 bits
• 3 B: 3 * 2 = 6 bits
• 3 C: 3 * 2 = 6 bits
• 4 + 6 + 6 = 16 bits
• Compression ratio = compressed size / uncompressed size = 16 / 80 = 20%
Symbol Code
A 0
B 10
C 11
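The bit counts and ratios for the text ABCABCABCA can be reproduced with a short calculation. This is a sketch only; the variable names are chosen for this illustration.

```python
from collections import Counter

text = "ABCABCABCA"
fixed_8 = 8 * len(text)                 # 8 bits per character: 80 bits
fixed_2 = 2 * len(text)                 # 2 bits per character: 20 bits

codes = {"A": "0", "B": "10", "C": "11"}
counts = Counter(text)                  # A: 4, B: 3, C: 3
variable = sum(counts[s] * len(codes[s]) for s in counts)   # 4 + 6 + 6 = 16 bits

encoded = "".join(codes[s] for s in text)

print(encoded)              # 0101101011010110
print(fixed_2 / fixed_8)    # 0.25
print(variable / fixed_8)   # 0.2
```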

Representing a Huffman Table as a Binary Tree
• Codewords are represented by a binary tree
• Each leaf stores a character
• Each internal node has two children
  – Left = 0
  – Right = 1
• The codeword is the path from the root to the leaf storing a given character
• Because characters are stored only at the leaves, the code represented by the tree is a prefix code
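Decoding with the tree instead of the table follows the same left = 0, right = 1 rule. In the sketch below the tree for the table A = 0, B = 10, C = 11 is written as nested tuples; this representation and the function name `decode_with_tree` are assumptions made for the illustration.

```python
# A leaf is a one-character string; an internal node is a (left, right) tuple.
tree = ("A", ("B", "C"))   # A = 0, B = 10, C = 11

def decode_with_tree(bits, root):
    """Walk from the root: go left on 0, right on 1, emit a symbol at each leaf."""
    out = []
    node = root
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):       # reached a leaf
            out.append(node)
            node = root                 # restart at the root for the next symbol
    if node is not root:
        raise ValueError("bit string ends in the middle of a codeword")
    return "".join(out)

print(decode_with_tree("0101101011010110", tree))  # ABCABCABCA
```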

Algorithm
• Make a leaf node for each symbol
  – Add the generation probability of each symbol to its leaf node
• Take the two nodes without a parent that have the smallest probability (pi) and connect them into a new node (which becomes the parent of those nodes)
  – Add 1 for the right edge
  – Add 0 for the left edge
  – The probability of the new node is the sum of the probabilities of the two connecting nodes
• Repeat the previous step until only one node is left; the code construction is then complete.

Algorithm Huffman
algorithm Huffman(C, f) {
  n := |C|;
  Q := C;
  for i := 1 to n-1 do {
    z := NewNode();
    x := z.left := GetMin(Q);
    y := z.right := GetMin(Q);
    f[z] := f[x] + f[y];
    insert(Q, z);
  }
  return GetMin(Q);
}
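One possible Python rendering of this pseudocode uses the standard `heapq` module as the priority queue Q, assuming GetMin removes and returns the minimum. The names `huffman_tree` and `code_lengths`, the tuple representation of nodes, and the tie-breaking counter are assumptions of this sketch; the frequencies are the A 5, B 2, C 2, D 1, E 1 example from the slides.

```python
import heapq
from itertools import count

def huffman_tree(freq):
    """Build a Huffman tree from a dict symbol -> frequency.

    A leaf is a symbol (a string); an internal node is a (left, right) tuple.
    Returns (root, frequency_of_root).
    """
    tie = count()   # tie-breaker so the heap never has to compare nodes
    q = [(f, next(tie), sym) for sym, f in freq.items()]
    heapq.heapify(q)                                      # Q := C
    for _ in range(len(freq) - 1):                        # for i := 1 to n-1
        fx, _tx, x = heapq.heappop(q)                     # x := GetMin(Q)
        fy, _ty, y = heapq.heappop(q)                     # y := GetMin(Q)
        heapq.heappush(q, (fx + fy, next(tie), (x, y)))   # f[z] := f[x] + f[y]
    f_root, _tr, root = heapq.heappop(q)                  # return GetMin(Q)
    return root, f_root

def code_lengths(node, depth=0):
    """Map each symbol to the length of its codeword (its depth in the tree)."""
    if not isinstance(node, tuple):
        return {node: depth}
    left, right = node
    return {**code_lengths(left, depth + 1), **code_lengths(right, depth + 1)}

freq = {"A": 5, "B": 2, "C": 2, "D": 1, "E": 1}
root, total = huffman_tree(freq)
lengths = code_lengths(root)
bits = sum(freq[s] * lengths[s] for s in freq)
print(total, bits)  # 11 23
```

When several nodes share the same frequency, the order in which they are merged depends on the tie-breaking rule, so individual codewords can differ from one implementation to another. The total encoded length (23 bits for this example) is the same for every valid Huffman tree.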

Example
Symbol Frequency
A 5
B 2
C 2
D 1
E 1

Example – Creating the tree
[Figure: one leaf node per symbol, each labeled with its frequency: A 5, B 2, C 2, D 1, E 1]
Symbol Frequency
A 5
B 2
C 2
D 1
E 1

Take the two leaf nodes with the smallest probability (pi) and connect them into a new node (which becomes the parent of those nodes)
• Green nodes – nodes to be evaluated
• White nodes – nodes which have already been evaluated
• Blue nodes – nodes which are added in this iteration
Symbol Frequency
A 5
B 2
C 2
D 1
E 1
After D and E are connected into new node 1:
Symbol Frequency
A 5
B 2
C 2
1 2

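Step 1
[Figure: D (1) and E (1) are connected under a new node with frequency 2; the edge to D is labeled 0 and the edge to E is labeled 1. A (5), B (2) and C (2) are still separate leaf nodes.]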

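Step 2
[Figure: the node with frequency 2 (parent of D and E) and C (2) are connected under a new node with frequency 4; the edge to the D/E node is labeled 0 and the edge to C is labeled 1.]
After the D/E node and C are connected into new node 2:
Symbol Frequency
A 5
B 2
2 4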

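Step 3
[Figure: B (2) and the node with frequency 4 are connected under a new node with frequency 6; the edge to B is labeled 0 and the edge to the frequency-4 node is labeled 1.]
After B and node 2 are connected into new node 3:
Symbol Frequency
A 5
3 6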

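Example: Completed Tree
(Each line is a node with its frequency, indented under its parent and prefixed by the label of the edge leading to it.)
11
  0: A 5
  1: 6
    0: B 2
    1: 4
      0: 2
        0: D 1
        1: E 1
      1: C 2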

Finally we convert the tree to a prefix code
Symbol Code
A 0
B 10
C 111
D 1100
E 1101
Generate the table by reading from the root node to the leaves for each symbol
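Reading the table off the tree can be sketched as a recursive walk that appends 0 for a left edge and 1 for a right edge. The nested-tuple form of the completed example tree and the name `code_table` are assumptions made for this illustration.

```python
# The completed example tree as nested (left, right) tuples; leaves are symbols.
tree = ("A", ("B", (("D", "E"), "C")))
freq = {"A": 5, "B": 2, "C": 2, "D": 1, "E": 1}

def code_table(node, prefix=""):
    """Collect the root-to-leaf path (0 = left, 1 = right) for every symbol."""
    if isinstance(node, str):
        return {node: prefix}
    left, right = node
    table = code_table(left, prefix + "0")
    table.update(code_table(right, prefix + "1"))
    return table

table = code_table(tree)
total_bits = sum(freq[s] * len(table[s]) for s in freq)

for symbol in sorted(table):
    print(symbol, table[symbol])
print(total_bits)  # 23
```

The 11 symbols of the example therefore take 23 bits with this code, compared with 33 bits for a fixed 3-bit code over five symbols.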

Questions?


Thanks All