# Paper Topic: Huffman Coding


Information Retrieval 902333 1

Paper Topic: Huffman Coding

Team Members: Hana'a Shdefat 0950902031, Bushra Hasaien 0900902008

Dr. Saif Rababah


Data Compression

• Data compression (of a picture, sound, video, text, graphics, etc.) means reducing the size of the data. The main aim of data compression is to reduce the file size.


Types of Data Compression

Data compression is generally divided into two types:

o The first type is called lossless: the file output by the compression process is an exact copy of the original file, with nothing lost.

o The second type is called lossy: the file output by the compression process is only similar to the original file to a certain extent. Some data is lost, but the lost data is not essential to the user.
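The lossless case can be demonstrated in a few lines with Python's standard `zlib` module (the example data here is my own, not from the slides): decompressing the compressed bytes gives back an exact copy of the original.

```python
import zlib

# Lossless compression: decompressing recovers the original exactly.
original = b"ABCABCABCA" * 100          # repetitive data compresses well
compressed = zlib.compress(original)

print(len(original), len(compressed))   # compressed is much smaller
assert zlib.decompress(compressed) == original  # exact copy -- nothing lost
```

A lossy codec (e.g. for images or audio) would instead return data that merely resembles the original, so no such byte-for-byte check could succeed.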


History of Huffman codes

• In 1951, Professor Robert M. Fano (a professor at the Massachusetts Institute of Technology) was teaching Shannon-Fano coding to his information theory students. He gave the students a choice: either take the final exam, or find a better and more efficient encoding than Shannon-Fano. David Huffman, one of the students, tried to improve on Shannon-Fano coding by trial and error and was about to give up the idea and start preparing for the exam, when he found a way to build the tree from the bottom up, unlike Shannon-Fano coding. The resulting encoding is better than Shannon-Fano coding.


David A. Huffman

• BS in Electrical Engineering at Ohio State University
• Worked as a radar maintenance officer for the US Navy
• PhD student in Electrical Engineering at MIT, 1952
• Was given the choice of writing a term paper or taking a final exam
• Paper topic: Huffman coding


Huffman Coding

• Uses the minimum number of bits
• Variable-length coding – good for data transfer
  – Different symbols have different lengths
• Symbols with the highest frequency get the shortest codewords
• Symbols with lower frequency get longer codewords
• “Z” will have a longer code representation than “E”, judging by the frequency of character occurrences in the alphabet
• No codeword is a prefix of another codeword!


Decoding

• To determine the original message, read the string of bits from left to right and use the table to determine the individual symbols

Decode the following: 01101101101101101101

Symbol  Code
A       01
B       10
C       11

Answer: ABCABCABCA


Decoding

Original string: 01101101101101101101

Symbol  Code
A       01
B       10
C       11

01 10 11 01 10 11 01 10 11 01
A  B  C  A  B  C  A  B  C  A
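The left-to-right, table-driven decoding described above can be sketched in Python (the function name and dictionary layout are my own). Because the code is prefix-free, the first codeword that matches the accumulated bits is always the correct one.

```python
def decode(bits, table):
    """Read bits left to right, emitting a symbol as soon as the
    accumulated bits match a codeword in the table."""
    inverse = {code: symbol for symbol, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:          # prefix-free: first match is correct
            out.append(inverse[current])
            current = ""
    return "".join(out)

table = {"A": "01", "B": "10", "C": "11"}
print(decode("01101101101101101101", table))  # ABCABCABCA
```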


• This text contains 10 characters: 8 * 10 = 80 bits
• Representing the text with 2 bits per character instead of 8 bits: 2 * 10 = 20 bits
• In this case nothing is lost from the text; on the contrary, we reduced the space
• Compression ratio = compressed size / uncompressed size = 20 / 80 = 25%


• A is repeated more than the other characters, so we give it just one bit: (0)
• Encoded string: 0101101011010110
• 4 A's: 4 * 1 = 4 bits
• 3 B's: 3 * 2 = 6 bits
• 3 C's: 3 * 2 = 6 bits
• 4 + 6 + 6 = 16 bits
• Compression ratio = compressed size / uncompressed size = 16 / 80 = 20%

Symbol  Code
A       0
B       10
C       11
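The bit counts for the fixed-length (2 bits per symbol) and variable-length tables above can be reproduced with a short calculation (the helper name is my own):

```python
def encoded_bits(text, table):
    """Total bits needed to encode `text` with the given codeword table."""
    return sum(len(table[ch]) for ch in text)

text = "ABCABCABCA"                     # 10 characters, 8 * 10 = 80 bits in ASCII
fixed    = {"A": "01", "B": "10", "C": "11"}
variable = {"A": "0",  "B": "10", "C": "11"}

print(encoded_bits(text, fixed))        # 20 bits -> 20 / 80 = 25%
print(encoded_bits(text, variable))     # 16 bits -> 16 / 80 = 20%
```

Giving the most frequent symbol the shortest codeword is exactly the idea Huffman coding automates.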


Representing a Huffman Table as a Binary Tree

• Codewords are represented by a binary tree
• Each leaf stores a character
• Each internal node has two children
  – Left = 0
  – Right = 1
• The codeword of a character is the path from the root to the leaf storing that character
• The code represented by the leaves of the tree is a prefix code


Algorithm

• Make a leaf node for each symbol
  – Attach the occurrence probability of each symbol to its leaf node
• Take the two nodes with the smallest probability (pi) and connect them into a new node (which becomes the parent of those nodes)
  – Add 1 for the right edge
  – Add 0 for the left edge
  – The probability of the new node is the sum of the probabilities of the two connected nodes
• Repeat the previous step; when there is only one node left, the code construction is completed.


Algorithm Huffman

    Huffman(C, f):
        n := |C|
        Q := C                       // priority queue of nodes, keyed by frequency f
        for i := 1 to n - 1:
            z := NewNode()
            x := z.left  := GetMin(Q)
            y := z.right := GetMin(Q)
            f[z] := f[x] + f[y]
            Insert(Q, z)
        return GetMin(Q)             // root of the Huffman tree
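The pseudocode translates almost line for line into Python, using the standard-library `heapq` module as the priority queue Q (all names here are my own; the exact codewords depend on how ties are broken, but the total encoded length is the same for any valid Huffman tree):

```python
import heapq
import itertools

def huffman(freq):
    """Build a Huffman tree from {symbol: frequency} and return a
    {symbol: codeword} table.  A tree is either a symbol (leaf) or
    a (left, right) pair (internal node)."""
    counter = itertools.count()                  # stable tie-breaking
    q = [(f, next(counter), sym) for sym, f in freq.items()]
    heapq.heapify(q)
    for _ in range(len(freq) - 1):               # n - 1 merges
        fx, _, x = heapq.heappop(q)              # two smallest nodes
        fy, _, y = heapq.heappop(q)
        heapq.heappush(q, (fx + fy, next(counter), (x, y)))
    _, _, root = q[0]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):              # internal node
            walk(node[0], prefix + "0")          # left edge  = 0
            walk(node[1], prefix + "1")          # right edge = 1
        else:
            codes[node] = prefix or "0"          # leaf: record codeword
    walk(root, "")
    return codes

codes = huffman({"A": 5, "B": 2, "C": 2, "D": 1, "E": 1})
print(codes)
```

For the slide's frequencies (A 5, B 2, C 2, D 1, E 1) the total encoded length is always 5·1 + 2·2 + 2·3 + 1·4 + 1·4 = 23 bits, whichever of the equivalent optimal trees is produced.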


Example

Symbol  Frequency
A       5
B       2
C       2
D       1
E       1


Example – Creating the tree

[Figure: the five leaf nodes D(1), C(2), A(5), B(2), E(1), not yet connected]

Symbol  Frequency
A       5
B       2
C       2
D       1
E       1


[Figure: the leaves E(1), D(1), C(2), B(2), A(5); a table tracks the remaining nodes: A 5, B 2, C 2, and the new node of weight 2]

• Green nodes – nodes to be evaluated
• White nodes – nodes which have already been evaluated
• Blue nodes – nodes which are added in this iteration

Take the two leaf nodes with the smallest probability (pi) and connect them into a new node (which becomes the parent of those nodes)


1. Step

[Figure: D(1) and E(1), the two nodes with the smallest frequencies, are connected into a new node of weight 2 (D on the 0 edge, E on the 1 edge); A(5), B(2), and C(2) are untouched]


2. Step

[Figure: the node of weight 2 (parent of D and E) and C(2) are connected into a new node of weight 4 (the D/E node on the 0 edge, C on the 1 edge); remaining nodes: A 5, B 2, 4]


3. Step

[Figure: B(2) and the node of weight 4 are connected into a new node of weight 6 (B on the 0 edge, the 4-node on the 1 edge); remaining nodes: A 5, 6]


Example: Completed Tree

[Figure: the completed tree. The root 11 has A(5) on the 0 edge and the node 6 on the 1 edge; node 6 has B(2) on 0 and node 4 on 1; node 4 has the D/E node 2 on 0 and C(2) on 1; node 2 has D(1) on 0 and E(1) on 1]


Finally we convert the tree to a prefix code

Symbol  Code
A       0
B       10
C       111
D       1100
E       1101

Generate the table by reading from the root node to the leaves for each symbol
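The table can be checked mechanically (a sketch of my own, not from the slides): no codeword is a prefix of another, and the total encoded size for the example frequencies is 5·1 + 2·2 + 2·3 + 1·4 + 1·4 = 23 bits, against 11 characters × 8 bits = 88 bits uncompressed.

```python
codes = {"A": "0", "B": "10", "C": "111", "D": "1100", "E": "1101"}
freq  = {"A": 5,   "B": 2,    "C": 2,     "D": 1,      "E": 1}

# Prefix property: no codeword is a prefix of another codeword.
words = list(codes.values())
assert all(not b.startswith(a) for a in words for b in words if a != b)

# Total encoded size: 23 bits, versus 11 * 8 = 88 bits uncompressed.
total = sum(freq[s] * len(codes[s]) for s in codes)
print(total)   # 23
```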


Questions?


References

• http://www.cstutoringcenter.com/tutorials/algorithms/huffman.php
• http://en.wikipedia.org/wiki/Huffman_coding
• http://michael.dipperstein.com/huffman/index.html
• http://en.wikipedia.org/wiki/David_A._Huffman
• http://www.binaryessence.com/dct/en000080.htm


Thanks All