Topic 20: Huffman Coding

of 77 /77
Topic 20: Huffman Coding The author should gaze at Noah, and ... learn, as they did in the Ark, to crowd a great deal of matter into a very small compass. Sydney Smith, Edinburgh Review

Embed Size (px)

description

Topic 20: Huffman Coding. The author should gaze at Noah, and ... learn, as they did in the Ark, to crowd a great deal of matter into a very small compass. Sydney Smith, Edinburgh Review. Compression. Storage of digital information measured in bits and bytes. - PowerPoint PPT Presentation

Transcript of Topic 20: Huffman Coding

Growing Trees to Shrink Files: An Introduction to Huffman Coding

Topic 20: Huffman CodingThe author should gaze at Noah, and ... learn, as they did in the Ark, to crowd a great deal of matter into a very small compass.Sydney Smith, Edinburgh Review

AgendaEncodingCompressionHuffman Coding2EncodingUT CS85 84 32 67 8301010101 01010100 00100000 01000011 01010011

what is a file? how do some OS use file extensions?open a bitmap in a text editoropen a pdf in word

3ASCII - UNICODE

4Text File

5Text File???

6Bitmap File

7Bitmap File????

8JPEG File

9JPEG VS BITMAPJPEG File

10Encoding Schemes"It's all 1s and 0s"What do the 1s and 0s mean?50 12 109ASCII -> 2ymRed Blue Green ->dark teal?

11AgendaEncodingCompressionHuffman Coding12CompressionCompression: Storing the same information but in a form that takes less memorylossless and lossy compressionRecall:

13Lossy Artifacts

14Why Bother?Is compression really necessary?

2 Terabytes500 HD, 2 hour movies or 500,000 songs15Little Pipes and Big PumpsHome Internet Access40 Mbps roughly $40 per month.12 months * 3 years * $40 = $1,44010,000,000 bits /second= 1.25 * 106 bytes / sec

CPU Capability$1,500 for a laptop or desktopIntel i7 processorAssume it lasts 3 years.Memory bandwidth25.6 GB / sec= 2.6 * 1010 bytes / secon the order of 5.0 * 1010 instructions / second

16Mobile Devices?Cellular NetworkYour mileage may vary Mega bits per secondAT&T17 download, 7 uploadT-Mobile & Verizon12 download, 7 upload17,000,000 bits per second = 2.125 x 106 bytes per secondhttp://tinyurl.com/q6o7waniPhone CPUApple A6 System on a ChipCoy about IPS2 coresRough estimates:1 x 1010 instructions per second17Little Pipes and Big PumpsData In From Network

CPU18Compression - Why Bother?19

Apostolos "Toli" LeriosFacebook EngineerHeads image storage groupjpeg images already compressedlook for ways to compress even more1% less space = millions of dollars in savings

AgendaEncodingCompressionHuffman Coding2021Purpose of Huffman CodingProposed by Dr. David A. Huffman A Method for the Construction of Minimum Redundancy CodesWritten in 1952Applicable to many forms of data transmissionOur example: text filesstill used in fax machines, mp3 encoding, others

An Introduction to Huffman CodingMarch 21, 2000Mike Scott2122The Basic AlgorithmHuffman coding is a form of statistical codingNot all characters occur with the same frequency!Yet in ASCII all characters are allocated the same amount of space1 char = 1 byte, be it e or x

An Introduction to Huffman CodingMarch 21, 2000Mike Scott2223The Basic AlgorithmAny savings in tailoring codes to frequency of character?Code word lengths are no longer fixed like ASCII or UnicodeCode word lengths vary and will be shorter for the more frequently used characters

An Introduction to Huffman CodingMarch 21, 2000Mike Scott2324The Basic Algorithm1.Scan text to be compressed and tally occurrence of all characters.2.Sort or prioritize characters based on number of occurrences in text.3.Build Huffman code tree based on prioritized list.4.Perform a traversal of tree to determine all code words.5.Scan text again and create new file using the Huffman codesAn Introduction to Huffman CodingMarch 21, 2000Mike Scott2425Building a TreeScan the original textConsider the following short text

Eerie eyes seen near lake.

Count up the occurrences of all characters in the textAn Introduction to Huffman CodingMarch 21, 2000Mike Scott2526Building a TreeScan the original textEerie eyes seen near lake.What characters are present?E e r i space y s n a r l k .An Introduction to Huffman CodingMarch 21, 2000Mike Scott2627Building a TreeScan the original textEerie eyes seen near lake.What is the frequency of each character in the text?Char Freq. Char Freq. Char Freq. E 1 y1 k1 e 8 s 2 .1 r 2 n 2 i 1 a2 space 4 l1An Introduction to Huffman CodingMarch 21, 2000Mike Scott2728Building a TreePrioritize charactersCreate binary tree nodes with character and frequency of each characterPlace nodes in a priority queueThe lower the occurrence, the higher the priority in the queueAn Introduction to Huffman CodingMarch 21, 2000Mike Scott2829The queue after inserting all nodes

Null Pointers are not shownBuilding a TreeE1i1y1l1k1.1r2s2n2a2sp4e8An Introduction to Huffman CodingMarch 21, 2000Mike Scott2930Building a TreeWhile priority queue contains two or more nodesCreate new nodeDequeue node and make it left subtreeDequeue next node and make it right subtreeFrequency of new node equals sum of frequency of left and right childrenEnqueue new node back into queue

An Introduction to Huffman CodingMarch 21, 2000Mike Scott3031Building a TreeE1i1y1l1k1.1r2s2n2a2sp4e8An Introduction to Huffman CodingMarch 21, 2000Mike Scott3132Building a TreeE1i12y1l1k1.1r2s2n2a2sp4e8An Introduction to Huffman CodingMarch 21, 2000Mike Scott3233Building a TreeE1i1k1l1y1.1a2n2r2s2sp4e82An Introduction to Huffman CodingMarch 21, 2000Mike Scott3334Building a TreeE1i1y1.1a2n2r2s2sp4e82k1l12An Introduction to Huffman CodingMarch 21, 2000Mike Scott3435Building a TreeE1i1y1.1a2n2r2s2sp4e82k1l12An Introduction to Huffman CodingMarch 21, 2000Mike Scott3536Building a TreeE1i1a2n2r2s2sp4e82k1l12y1.12An Introduction to Huffman CodingMarch 21, 2000Mike Scott3637Building a TreeE1i1a2n2r2s2sp4e82k1l12y1.12An Introduction to Huffman CodingMarch 21, 2000Mike Scott3738Building a TreeE1i1r2s2sp4e82k1l12y1.12a2n24An Introduction to Huffman CodingMarch 21, 2000Mike Scott3839Building a TreeE1i1r2s2sp4e82k1l12y1.12a2n24An Introduction to Huffman CodingMarch 21, 2000Mike Scott3940Building a TreeE1i1sp4e82k1l12y1.12a2n24r2s24An Introduction to Huffman CodingMarch 21, 2000Mike Scott4041Building a TreeE1i1sp4e82k1l12y1.12a2n24r2s24An Introduction to Huffman CodingMarch 21, 2000Mike Scott4142Building a TreeE1i1sp4e82k1l12y1.12a2n24r2s244An Introduction to Huffman CodingMarch 21, 2000Mike Scott4243Building a TreeE1i1sp4e82k1l12y1.12a2n24r2s244An Introduction to Huffman CodingMarch 21, 2000Mike Scott4344Building a TreeE1i1sp4e82k1l12y1.12a2n24r2s2446An Introduction to Huffman CodingMarch 21, 2000Mike Scott4445Building a TreeE1i1sp4e82k1l12y1.12a2n24r2s2446What is happening to the characters with a low number of occurrences?An Introduction to Huffman CodingMarch 21, 2000Mike Scott4546Building a TreeE1i1sp4e82k1l12y1.12a2n24r2s24468An Introduction to Huffman CodingMarch 21, 2000Mike Scott4647Building a TreeE1i1sp4e82k1l12r1.12a2n24r2s24468An Introduction to Huffman CodingMarch 21, 2000Mike Scott4748Building a TreeE1i1sp4e82k1l12y1.12a2n24r2s2446810An Introduction to Huffman CodingMarch 21, 2000Mike Scott4849Building a TreeE1i1sp4e82k1l12y1.12a2n24r2s2446810An Introduction to Huffman CodingMarch 21, 2000Mike Scott4950Building a TreeE1i1sp4e82k1l12y1.12a2n24r2s244681016An Introduction to Huffman CodingMarch 21, 2000Mike Scott5051Building a TreeE1i1sp4e82k1l12y1.12a2n24r2s244681016An Introduction to Huffman CodingMarch 21, 2000Mike Scott5152Building a TreeE1i1sp4e82k1l12y1.12a2n24r2s24468101626An Introduction to Huffman CodingMarch 21, 2000Mike Scott5253Building a TreeE1i1sp4e82k1l12y1.12a2n24r2s24468101626After enqueueing this node there is only one node left in priority queue.An Introduction to Huffman CodingMarch 21, 2000Mike Scott5354Building a TreeDequeue the single node left in the queue.

This tree contains the new code words for each character.

Frequency of root node should equal number of characters in text.

E1i1sp4e82k1l12y1.12a2n24r2s24468101626Eerie eyes seen near lake. 4 spaces,26 characters totalAn Introduction to Huffman CodingMarch 21, 2000Mike Scott5455Encoding the FileTraverse Tree for CodesPerform a traversal of the tree to obtain new code wordsleft, append a 0 to code wordright append a 1 to code wordcode word is only completed when a leaf node is reached E1i1sp4e82k1l12y1.12a2n24r2s24468101626An Introduction to Huffman CodingMarch 21, 2000Mike Scott5556Encoding the FileTraverse Tree for CodesCharCodeE0000i0001k0010l0011y0100.0101space011e10a1100n1101r1110s1111E1i1sp4e82k1l12y1.12a2n24r2s24468101626An Introduction to Huffman CodingMarch 21, 2000Mike Scott5657Encoding the FileRescan text and encode file using new code wordsEerie eyes seen near lake.

CharCodeE0000i0001k0010l0011y0100.0101space011e10a1100n1101r1110s1111000010111000011001110010010111101111111010110101111011011001110011001111000010100101An Introduction to Huffman CodingMarch 21, 2000Mike Scott5758Encoding the FileResultsHave we made things any better?82 bits to encode the textASCII would take 8 * 26 = 208 bits000010111000011001110010010111101111111010110101111011011001110011001111000010100101If modified code used 4 bits per character are needed. Total bits 4 * 26 = 104. Savings not as great.An Introduction to Huffman CodingMarch 21, 2000Mike Scott5859Decoding the FileHow does receiver know what the codes are?Tree constructed for each text file. Considers frequency for each fileBig hit on compression, especially for smaller filesTree predeterminedbased on statistical analysis of text files or file types

An Introduction to Huffman CodingMarch 21, 2000Mike Scott5960Decoding the FileOnce receiver has tree it scans incoming bit stream0 go left1 go right101000100111100011111111011100001010

elk nay sireek a snakeeek kin slyeek snarl nileel a snarl

E1i1sp4e82k1l12y1.12a2n24r2s24468101626Assignment Hintsreading chunks not charsheader formatthe pseudo eof characterthe GUI

61Assignment Example"Eerie eyes seen near lake." will result in different codes than those shown in slides due to:adding elements in order to PriorityQueuerequired pseudo eof character (PEOF)

62Assignment Example63Char Freq. Char Freq. Char Freq. E 1 y1 k1 e 8 s 2 .1 r 2 n 2 PEOF 1 i 1 a2 space 4 l1Assignment Example64.1y1E1i1k1l1PEOF1a2n2r2s2SP4e8Assignment Example65.1y1E1i1k1l1PEOF1a2n2r2s2SP4e82Assignment Example66.1y1E1i1k1l1PEOF1a2n2r2s2SP4e82Assignment Example67.1y1E1i1k1l1PEOF1a2n2r2s2SP4e822Assignment Example68.1y1E1i1k1l1PEOF1a2n2r2s2SP4e8222Assignment Example69.1y1E1i1k1l1PEOF1a2n2r2s2SP4e82223Assignment Example70.1y1E1i1k1l1PEOF1a2n2r2s2SP4e822234Assignment Example71.1y1E1i1k1l1PEOF1a2n2r2s2SP4e822234472.1y1E1i1k1l1PEOF1a2n2r2s2SP4e82223444773y1i1k1l1PEOF1a2SP4e822347.1E1n2r2s2244874y1i1k1l1PEOF1a2SP4e822347.1E1n2r2s2244811y1i1k1l1PEOF1a2SP4e822347.1E1n2r2s22448111675An Introduction to Huffman CodingMarch 21, 2000Mike Scott75y1i1k1l1PEOF1a2SP4e822347.1E1n2r2s2244811162776An Introduction to Huffman CodingMarch 21, 2000Mike Scott76Codes77value: 32, equivalent char: , frequency: 4, new code 011value: 46, equivalent char: ., frequency: 1, new code 11110value: 69, equivalent char: E, frequency: 1, new code 11111value: 97, equivalent char: a, frequency: 2, new code 0101value: 101, equivalent char: e, frequency: 8, new code 10value: 105, equivalent char: i, frequency: 1, new code 0000value: 107, equivalent char: k, frequency: 1, new code 0001value: 108, equivalent char: l, frequency: 1, new code 0010value: 110, equivalent char: n, frequency: 2, new code 1100value: 114, equivalent char: r, frequency: 2, new code 1101value: 115, equivalent char: s, frequency: 2, new code 1110value: 121, equivalent char: y, frequency: 1, new code 0011value: 256, equivalent char: ?, frequency: 1, new code 0100