Text Compression Huffman Coding

Post on 31-Dec-2015

James Adkison

02/07/2008

Assumptions / Givens

• A bit is represented by a ‘1’ or a ‘0’

• A byte is any combination of 8 bits

• All ASCII characters are stored in 1 byte, except the newline, which is stored as the two bytes ‘\r’ and ‘\n’ (the Windows CR LF convention)
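Under these assumptions, the uncompressed size of a text is one byte per character plus one extra byte per newline. The following is a minimal Python sketch. It assumes the text is a Python string that uses ‘\n’ for line breaks, and the name uncompressed_size is hypothetical:

```python
def uncompressed_size(text: str) -> int:
    """Bytes needed to store `text` under the slide assumptions.

    Every ASCII character takes 1 byte; each newline is assumed to be
    written as the two bytes CR LF, so it costs one extra byte.
    """
    if not text.isascii():
        raise ValueError("the assumptions only cover ASCII text")
    return len(text) + text.count("\n")


print(uncompressed_size("qwerty"))    # 6
print(uncompressed_size("qwe\nrty"))  # 8
```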

Notation

• Square brackets ‘[’ ‘]’ mark an inclusive range bound

• Parentheses ‘(’ ‘)’ mark an exclusive range bound

• Example: [0, 6) includes 0 and excludes 6, so the range is 0 to 5, or [0, 5]

• Traversing a Huffman tree to the left produces a ‘0’ bit, and traversing to the right produces a ‘1’ bit

Definitions

• Bit string: any sequence of two or more bits

• Text = ASCII text = Uncompressed text = Decoded text

• Encoded text = Huffman encoding = Compressed text

Definitions Continued…

• Leaf Node: has 1 parent and [0, 1) children (that is, no children)

• Non-leaf Node: has 1 parent and [1, 2] children

• Root Node: has 0 parents and [0, 2] children
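These node definitions can be expressed as a small data structure. The sketch below is in Python and assumes a hypothetical Node class in which only leaves carry a character. A node's role is decided from two things: whether it has a parent, and how many children it has.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    count: int                     # occurrences of the characters under this node
    char: Optional[str] = None     # set only on leaf nodes
    left: Optional["Node"] = None  # traversing left produces a '0' bit
    right: Optional["Node"] = None # traversing right produces a '1' bit

    def children(self) -> int:
        return (self.left is not None) + (self.right is not None)


def role(node: Node, has_parent: bool) -> str:
    """Classify a node using the definitions above."""
    if not has_parent:
        return "root"      # 0 parents, [0, 2] children
    if node.children() == 0:
        return "leaf"      # 1 parent, [0, 1) children
    return "non-leaf"      # 1 parent, [1, 2] children


t, e = Node(40, "t"), Node(43, "e")
parent = Node(83, left=t, right=e)
root = Node(300, left=parent)  # hypothetical one-child root
print(role(root, False), role(parent, True), role(t, True))  # root non-leaf leaf
```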

[Figure: Binary Tree, with its Root Node, Non-leaf Nodes, and Leaf Nodes marked]

[Figure: Huffman Tree, with its Root Node, Non-leaf Nodes, and Leaf Nodes marked. Left edges are labeled 0 and right edges 1, so the paths from the root to the leaves spell the bit strings ‘000’, ‘010’, ‘011’, ‘101’, and ‘11’.]
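Reading a code off the tree means walking from the root to a leaf, appending ‘0’ for each left step and ‘1’ for each right step. The following is a minimal Python sketch. It assumes the tree is written as nested (left, right) tuples, with None for a missing child. The leaf labels A to E are hypothetical, since the figure's leaves are unlabeled.

```python
def leaf_codes(tree, prefix=""):
    """Map each leaf label to the bit string on its path from the root."""
    if tree is None:                   # missing child: no leaves here
        return {}
    if not isinstance(tree, tuple):    # a leaf: the path so far is its code
        return {tree: prefix}
    left, right = tree
    codes = leaf_codes(left, prefix + "0")         # left produces a '0' bit
    codes.update(leaf_codes(right, prefix + "1"))  # right produces a '1' bit
    return codes


# A tree with the same leaf paths as the figure, using made-up leaf labels.
tree = ((("A", None), ("B", "C")), ((None, "D"), "E"))
print(leaf_codes(tree))
# {'A': '000', 'B': '010', 'C': '011', 'D': '101', 'E': '11'}
```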

Input File:

yqwteryqreytqywteweryqtwreqwewtryqwetytqrwqrtwtwqyqtwreqqywywytrwtqywqyqyewrqwyqwyrewqytqwtyrwyeqyrweytrtryytwrwqererrwtwqtretyytwtreryqqwewqywterqwyyqyqtweyqwreywqryqwreyqytqweytqweyqwreyqweyqwreyqwteryqwreyqwteryqwteryqwreyqwtreyqwtreyqwreyqwtreyqtwtryrereyqyqtwyweyrerrtqrwrwetqtrqywretqwwytqeyqwe

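[Figure: Huffman Tree over the characters ‘t’, ‘y’, ‘e’, ‘w’, ‘r’, and ‘q’, with left edges labeled 0 and right edges 1. ‘y’ and ‘w’ are the children of the root; ‘t’ and ‘e’ hang below ‘y’, and ‘r’ and ‘q’ hang below ‘w’.]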

‘y’ : 0

‘w’ : 1

‘t’ : 00

‘e’ : 01

‘r’ : 10

‘q’ : 11

Huffman Encoding:

Uncompressed: 300 bytes

Compressed: 61 bytes

Compression: 79.7 percent

Decode: 1110110000

1: w

11: ww

11: q

111: www

111: wq

111: qw
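The ambiguity can be checked mechanically by listing every way a bit string splits into codes from the table. The following is a brute-force Python sketch. The function all_decodings is hypothetical, and its running time grows quickly with the length of the input.

```python
BAD_CODES = {"y": "0", "w": "1", "t": "00", "e": "01", "r": "10", "q": "11"}


def all_decodings(bits, codes):
    """Return every text whose encoding under `codes` is exactly `bits`."""
    if bits == "":
        return [""]
    results = []
    for char, code in codes.items():
        if bits.startswith(code):
            # Use `code` for the first character, then decode the rest.
            for rest in all_decodings(bits[len(code):], codes):
                results.append(char + rest)
    return results


print(sorted(all_decodings("111", BAD_CODES)))             # ['qw', 'wq', 'www']
print("qwerty" in all_decodings("1110110000", BAD_CODES))  # True
```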

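Bad Huffman Tree

This tree is bad because characters sit on non-leaf nodes, so some codes are prefixes of others. For example, ‘w’ : 1 is a prefix of ‘q’ : 11. As a result, the bit string 1110110000 can be decoded in more than one way, and the 61-byte encoding cannot be reliably turned back into the original text.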

Encode: q w e r t y

Code: 11 1 01 10 00 0
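What makes the tree bad is that one code is a prefix of another. The following Python sketch checks for that property. It compares the bad code table with the table the Huffman construction produces for the same counts. The helper name is_prefix_free is hypothetical.

```python
def is_prefix_free(codes):
    """True if no code in the table is a prefix of another code."""
    words = sorted(codes.values())
    # In sorted order, a word that is a prefix of any later word is also a
    # prefix of the word immediately after it, so checking neighbours suffices.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


bad = {"y": "0", "w": "1", "t": "00", "e": "01", "r": "10", "q": "11"}
good = {"y": "00", "w": "01", "t": "100", "e": "101", "r": "110", "q": "111"}
print(is_prefix_free(bad), is_prefix_free(good))  # False True
```

Duplicate codes are also rejected, since a word counts as a prefix of itself.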

Huffman Tree Construction

Huffman Tree Construction: Process Text File & Build Array of Nodes

‘y’,1

‘y’,1 ‘q’,1

‘y’,1 ‘q’,1 ‘w’,1

‘y’,1 ‘q’,1 ‘w’,1 ‘t’,1

‘y’,1 ‘q’,1 ‘w’,1 ‘t’,1 ‘e’,1

‘y’,1 ‘q’,1 ‘w’,1 ‘t’,1 ‘e’,1 ‘r’,1

‘y’,2 ‘q’,1 ‘w’,1 ‘t’,1 ‘e’,1 ‘r’,1

‘y’,57 ‘q’,55 ‘w’,58 ‘t’,40 ‘e’,43 ‘r’,47

Each distinct character appears in the array only once, along with the # of times it occurs.
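This counting pass can be sketched in Python as follows, assuming the whole file has already been read into a string. Python dictionaries keep insertion order, so the resulting array lists the characters in the order they first appear, as on the slides. The function name count_characters is hypothetical.

```python
def count_characters(text):
    """Return [(char, count), ...] in order of first appearance."""
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1  # a first sighting adds the character
    return list(counts.items())


print(count_characters("yqwtery"))
# [('y', 2), ('q', 1), ('w', 1), ('t', 1), ('e', 1), ('r', 1)]
```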

Huffman Tree Construction: Sort the array

‘t’,40 ‘e’,43 ‘r’,47 ‘q’,55 ‘y’,57 ‘w’,58
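In Python, the sort is a single call keyed on the count. Because sorted is stable, characters with equal counts would keep their original order; the slides do not specify a tie-breaking rule, so this is an assumption. The counts below are the ones from the example.

```python
counts = [("y", 57), ("q", 55), ("w", 58), ("t", 40), ("e", 43), ("r", 47)]

# Ascending by number of occurrences, so the two rarest characters come first.
nodes = sorted(counts, key=lambda pair: pair[1])
print(nodes)
# [('t', 40), ('e', 43), ('r', 47), ('q', 55), ('y', 57), ('w', 58)]
```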

Huffman Tree Construction: Constructing the tree

Join the two nodes with the smallest counts, ‘t’,40 and ‘e’,43, as the children of a new node whose count is their sum: 40 + 43 = 83

Nodes left to combine: ‘r’,47 ‘q’,55 ‘y’,57 ‘w’,58 83

Join ‘r’,47 and ‘q’,55: 47 + 55 = 102

Nodes left to combine: ‘y’,57 ‘w’,58 83 102

Join ‘y’,57 and ‘w’,58: 57 + 58 = 115

Nodes left to combine: 83 102 115

Join 83 and 102: 83 + 102 = 185

Nodes left to combine: 115 185

Join 115 and 185: 115 + 185 = 300

Only the root node (300) is left, so the tree is complete.

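[Figure: Huffman Tree. The root (300) has a left child 115, whose children are ‘y’,57 and ‘w’,58, and a right child 185, whose children are 83 (‘t’,40 and ‘e’,43) and 102 (‘r’,47 and ‘q’,55). Left edges are labeled 0 and right edges 1.]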

‘y’ : 00

‘w’ : 01

‘t’ : 100

‘e’ : 101

‘r’ : 110

‘q’ : 111

Huffman Encoding:

Uncompressed: 300 bytes

Compressed: 99 bytes

Compression: 67 percent
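The whole construction can be sketched in Python. This version mirrors the slides by keeping a sorted list. It repeatedly removes the two smallest nodes, puts the smaller one on the left, and inserts their parent back in sorted position. The names Node, build_huffman_tree, and assign_codes are hypothetical. A priority queue, such as Python's heapq, is a common alternative to re-inserting into a list.

```python
import bisect


class Node:
    def __init__(self, count, char=None, left=None, right=None):
        self.count, self.char = count, char  # char is None for non-leaf nodes
        self.left, self.right = left, right


def build_huffman_tree(counts):
    """counts: [(char, number_of_occurrences), ...] -> root Node."""
    nodes = sorted((Node(n, ch) for ch, n in counts), key=lambda node: node.count)
    while len(nodes) > 1:
        smaller, larger = nodes.pop(0), nodes.pop(0)
        parent = Node(smaller.count + larger.count, left=smaller, right=larger)
        # Insert the parent so the list stays sorted by count.
        position = bisect.bisect_right([node.count for node in nodes], parent.count)
        nodes.insert(position, parent)
    return nodes[0]


def assign_codes(node, prefix="", codes=None):
    """Left edges add '0', right edges add '1'; each leaf receives its path."""
    if codes is None:
        codes = {}
    if node.char is not None:
        codes[node.char] = prefix or "0"  # assumption: a one-character file uses 1 bit
    else:
        assign_codes(node.left, prefix + "0", codes)
        assign_codes(node.right, prefix + "1", codes)
    return codes


counts = [("y", 57), ("q", 55), ("w", 58), ("t", 40), ("e", 43), ("r", 47)]
root = build_huffman_tree(counts)
print(root.count)  # 300
print(assign_codes(root))
# {'y': '00', 'w': '01', 't': '100', 'e': '101', 'r': '110', 'q': '111'}
```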

Huffman Tree: Compression Computation

• $N$ : # of distinct characters

• $n_i$ : # of times the character at the $i$-th index occurs

• $b_i$ : # of bits used to encode the character at the $i$-th index

• 8 : # of bits stored in a byte

$$\text{compressed bytes} = \left\lceil \frac{1}{8} \sum_{i=0}^{N-1} n_i b_i \right\rceil$$

‘y’,57 ‘q’,55 ‘w’,58 ‘t’,40 ‘e’,43 ‘r’,47

Compression Computation Example

• Applying the formula to the example: ((57 * 2) + (55 * 3) + (58 * 2) + (40 * 3) + (43 * 3) + (47 * 3)) / 8 = 785 / 8 = 98.125, rounded up to 99 bytes

‘y’ : 00

‘w’ : 01

‘t’ : 100

‘e’ : 101

‘r’ : 110

‘q’ : 111
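The formula and the example translate directly into Python. The sketch below assumes the counts and code tables from the slides, and the helper names are hypothetical. It rounds the bit total up to whole bytes using integer arithmetic.

```python
def compressed_bytes(counts, codes):
    """Sum n_i * b_i over all characters, then round up to whole bytes."""
    total_bits = sum(n * len(codes[ch]) for ch, n in counts)
    return (total_bits + 7) // 8


def compression_percent(uncompressed, compressed):
    return 100 * (uncompressed - compressed) / uncompressed


counts = [("y", 57), ("q", 55), ("w", 58), ("t", 40), ("e", 43), ("r", 47)]
good = {"y": "00", "w": "01", "t": "100", "e": "101", "r": "110", "q": "111"}
bad = {"y": "0", "w": "1", "t": "00", "e": "01", "r": "10", "q": "11"}

size = compressed_bytes(counts, good)
print(size, compression_percent(300, size))  # 99 67.0
print(compressed_bytes(counts, bad))         # 61, but not uniquely decodable
```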

Decode: 1110110111010000

Answer: q

Answer: qw

Answer: qwe

Answer: qwer

Answer: qwert

Answer: qwerty

Decoding starts at the root, and each bit selects the left (‘0’) or right (‘1’) child. On reaching a leaf, its character is written out and the walk restarts at the root: 111 = ‘q’, 01 = ‘w’, 101 = ‘e’, 110 = ‘r’, 100 = ‘t’, 00 = ‘y’. Because every character sits on a leaf, no code is a prefix of another, so this is the only possible decoding.
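Decoding walks the tree one bit at a time and restarts at the root whenever it reaches a leaf. The following is a minimal Python sketch. It assumes the example tree is written as nested (left, right) tuples with the characters at the leaves, and decode is a hypothetical name.

```python
# The example tree: left subtree 115 = ('y', 'w'); right subtree 185 = 83 and 102.
TREE = (("y", "w"), (("t", "e"), ("r", "q")))


def decode(bits, tree):
    out = []
    node = tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]  # '0' = left, '1' = right
        if not isinstance(node, tuple):           # reached a leaf
            out.append(node)
            node = tree                           # start again at the root
    if node is not tree:
        raise ValueError("bit string ends in the middle of a code")
    return "".join(out)


print(decode("1110110111010000", TREE))  # qwerty
```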

Huffman Encoding: Real World Example

• Huffman (C++)

• Text I/O Directory

Huffman Homework

1. Construct a Huffman tree for the following input file. You only need to show the final tree. These are the only characters in the file and the # of times they occur: (a, 50) (b, 60) (c, 70) (d, 80)

2. How many bytes will the compressed text file occupy?

Works Cited

• “ASCII.” Wikipedia, The Free Encyclopedia. 2008. 21 January 2008. <http://en.wikipedia.org/wiki/ASCII>

• Dewdney, A. K. The New Turing Omnibus. New York: Henry Holt, 1989. Pages 345–350.

• “Line Termination: Operating Systems Use Different Conventions.” 21 January 2008. <http://homepage.smc.edu/morgan_david/CS41/lineterminators.htm>