
    CSCI 6990.002: Data Compression

    03: Huffman Coding

    Vassil Roussev <vassil @ cs.uno.edu>

    University of New Orleans, Department of Computer Science

    2

    Shannon-Fano Coding

    The first code based on Shannon's theory; suboptimal (it took a graduate student to fix it!)

    Algorithm
    Start with empty codes
    Compute frequency statistics for all symbols
    Order the symbols in the set by frequency
    Split the set so as to minimize the difference in total frequency between the two subsets
    Add '0' to the codes in the first set and '1' to the rest
    Recursively assign the rest of the code bits within the two subsets, until the sets cannot be split any further
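    A minimal Python sketch of this recursive splitting (an illustration added here, not code from the slides). The frequencies are those of the worked example on the following slides, and ties between equally balanced splits are broken toward the later split point so the output matches the slides' result.

      from typing import Dict, List

      def shannon_fano(freqs: Dict[str, int]) -> Dict[str, str]:
          """Assign Shannon-Fano codes from a symbol -> frequency map."""
          symbols = sorted(freqs, key=freqs.get, reverse=True)   # most frequent first
          codes = {s: "" for s in symbols}

          def split(group: List[str]) -> None:
              if len(group) < 2:
                  return
              total = sum(freqs[s] for s in group)
              # Choose the split point that minimizes the frequency difference between halves
              best_i, best_diff, running = 1, float("inf"), 0
              for i in range(1, len(group)):
                  running += freqs[group[i - 1]]
                  diff = abs(total - 2 * running)
                  if diff <= best_diff:        # ties go to the later split, as in the slides
                      best_i, best_diff = i, diff
              first, rest = group[:best_i], group[best_i:]
              for s in first:
                  codes[s] += "0"
              for s in rest:
                  codes[s] += "1"
              split(first)
              split(rest)

          split(symbols)
          return codes

      # Frequencies from the worked example on the next slides
      print(shannon_fano({"a": 9, "b": 8, "c": 6, "d": 5, "e": 4, "f": 2}))
      # {'a': '00', 'b': '01', 'c': '100', 'd': '101', 'e': '110', 'f': '111'}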

    3-11

    Shannon-Fano Coding (2)-(10)

    [Worked example, shown in the slides as a sequence of tree diagrams: symbols a, b, c, d, e, f with
    frequencies 9, 8, 6, 5, 4, 2. The first split separates {a, b} (total 17) from {c, d, e, f} (total 17);
    {c, d, e, f} then splits into {c, d} and {e, f}. The resulting codes are
    a=00, b=01, c=100, d=101, e=110, f=111.]

    12

    Optimum Prefix Codes

    Key observations on optimal codes
    1. Symbols that occur more frequently will have shorter codewords
    2. The two least frequent symbols will have codewords of the same length

    Proofs
    1. Assume the opposite: such a code is clearly suboptimal
    2. Assume the opposite:

    Let X, Y be the two least frequent symbols with |code(X)| = k, |code(Y)| = k+1

    Then, by unique decodability (UD), code(X) cannot be a prefix of code(Y); the same holds for all
    the other codewords, which are shorter still. Dropping the last bit of code(Y) would therefore
    produce a new, shorter, uniquely decodable code.

    !!! This contradicts the optimality assumption !!!


    13

    Huffman Coding

    David Huffman (1951)
    Grad student of Robert M. Fano (MIT)
    Term paper(!)

    Explained by example. The source alphabet:

    Letter | Probability | Code
    a      | 0.2         |
    b      | 0.4         |
    c      | 0.2         |
    d      | 0.1         |
    e      | 0.1         |

    14-30

    Huffman Coding by Example

    Init: Create a set out of each letter. Then, until a single set remains:
    1. Sort sets according to probability (lowest first)
    2. Insert prefix 1 into the codes of the top set's letters
    3. Insert prefix 0 into the codes of the second set's letters
    4. Merge the top two sets

    [Slides 14-30 repeat these four steps on the example alphabet, showing the set/probability/code
    table after every step: {d} and {e} merge into {d,e} (0.2); {d,e} and {a} merge into {a,d,e} (0.4);
    {a,d,e} and {c} merge into {a,c,d,e} (0.6); finally {a,c,d,e} and {b} merge into the full set (1.0).]

    The resulting code:

    Letter | Probability | Code
    a      | 0.2         | 000
    b      | 0.4         | 1
    c      | 0.2         | 01
    d      | 0.1         | 0010
    e      | 0.1         | 0011

    The END


    31

    Example Summary

    Average code length
    l = 0.4x1 + 0.2x2 + 0.2x3 + 0.1x4 + 0.1x4 = 2.2 bits/symbol

    Entropy
    H = -sum_{s=a..e} P(s) log2 P(s) = 2.122 bits/symbol

    Redundancy
    l - H = 0.078 bits/symbol
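    A quick check of these numbers (an added illustration, assuming the codes derived in the example above):

      import math

      # The example distribution and Huffman code from the preceding slides
      probs = {"a": 0.2, "b": 0.4, "c": 0.2, "d": 0.1, "e": 0.1}
      codes = {"a": "000", "b": "1", "c": "01", "d": "0010", "e": "0011"}

      avg_len = sum(probs[s] * len(codes[s]) for s in probs)          # 2.2
      entropy = -sum(p * math.log2(p) for p in probs.values())        # ~2.122
      print(f"l = {avg_len:.3f}, H = {entropy:.3f}, redundancy = {avg_len - entropy:.3f}")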

    32

    Huffman Tree

    [Figure: the Huffman tree for the example, edges labeled 0/1. b (0.4) sits one level below the root;
    the other branch leads to c (0.2) and then down to a (0.2) and the pair d, e (0.1 each).]

    Letter | Code
    a      | 000
    b      | 1
    c      | 01
    d      | 0010
    e      | 0011
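    Because the code is prefix-free, a bit stream can be decoded greedily, one codeword at a time. A small added sketch using the table above:

      # Decoding with the code table above: read bits left to right and emit a symbol
      # as soon as the buffer matches a codeword (safe because no codeword is a prefix of another).
      codes = {"a": "000", "b": "1", "c": "01", "d": "0010", "e": "0011"}
      decode_map = {c: s for s, c in codes.items()}

      def decode(bits):
          out, buf = [], ""
          for bit in bits:
              buf += bit
              if buf in decode_map:
                  out.append(decode_map[buf])
                  buf = ""
          return "".join(out)

      print(decode("1" "000" "0010" "01" "0011"))   # -> "badce"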


    33-36

    Building a Huffman Tree

    [Figures: the same code built bottom-up as a tree. d and e (0.1 each) are joined under a node of
    weight 0.2; that node and a (0.2) are joined under a node of weight 0.4; that node and c (0.2) are
    joined under a node of weight 0.6; finally that node and b (0.4) are joined under the root (1.0).
    Labeling the two edges out of each node 0 and 1 and reading root to leaf reproduces the code table:
    a=000, b=1, c=01, d=0010, e=0011.]
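    An added sketch of the same bottom-up construction using a binary heap (not code from the slides). Ties among the three 0.2 letters mean several optimal trees exist, so a particular run can produce a different, equally good code than the one in the figures.

      import heapq

      def huffman_codes(probs):
          """Build a Huffman code by repeatedly merging the two lightest sets."""
          # Heap entries: (weight, serial, {symbol: partial code}); the serial number
          # breaks ties deterministically and keeps dicts out of the comparison.
          heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(sorted(probs.items()))]
          heapq.heapify(heap)
          serial = len(heap)
          while len(heap) > 1:
              w0, _, set0 = heapq.heappop(heap)   # lightest set: prefix its codes with 1
              w1, _, set1 = heapq.heappop(heap)   # next lightest: prefix its codes with 0
              merged = {s: "1" + c for s, c in set0.items()}
              merged.update({s: "0" + c for s, c in set1.items()})
              heapq.heappush(heap, (w0 + w1, serial, merged))
              serial += 1
          return heap[0][2]

      # This particular tie-break yields codeword lengths a:2, b:2, c:2, d:3, e:3
      # (the flat tree of slide 41) rather than the 1/2/3/4/4 tree above;
      # the average is 2.2 bits/symbol either way.
      print(huffman_codes({"a": 0.2, "b": 0.4, "c": 0.2, "d": 0.1, "e": 0.1}))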


    37-40

    An Alternative Huffman Tree

    [Figures: the tie among the three 0.2 sets is broken differently: a and c are merged with each
    other, the pair {d, e} then joins {a, c}, and b joins last. b keeps the 1-bit codeword 1, while
    a, c, d, e all receive 3-bit codewords (000, 001, 010, 011).]

    Average code length
    l = 0.4x1 + (0.2+0.2+0.1+0.1)x3 = 2.2 bits/symbol


    41

    Yet Another Tree

    [Figure: a third optimal tree: a=00, c=01, b=11, and d, e take the 3-bit codewords 101 and 100.]

    Average code length
    l = 0.4x2 + (0.2+0.2)x2 + (0.1+0.1)x3 = 2.2 bits/symbol

    42

    Min Variance Huffman Trees

    Huffman codes are not unique
    All versions yield the same average length

    Which one should we choose?
    The one with the minimum variance in codeword lengths, i.e. with the minimum-height tree

    Why? It will ensure the least amount of variability in the encoded stream

    How to achieve it?
    During sorting, break ties by placing smaller sets higher
    Alternatively, place newly merged sets as low as possible
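    An added sanity check (not from the slides) comparing the codeword-length variance of the tall tree from slides 14-32 with the flat tree of slide 41; both average 2.2 bits/symbol:

      # Codeword-length variance for the two example trees
      probs = {"a": 0.2, "b": 0.4, "c": 0.2, "d": 0.1, "e": 0.1}
      tall  = {"a": 3, "b": 1, "c": 2, "d": 4, "e": 4}   # lengths from the tree of slides 14-32
      short = {"a": 2, "b": 2, "c": 2, "d": 3, "e": 3}   # lengths from "Yet Another Tree" (slide 41)

      def variance(lengths):
          mean = sum(probs[s] * lengths[s] for s in probs)
          return sum(probs[s] * (lengths[s] - mean) ** 2 for s in probs)

      print(variance(tall), variance(short))   # the flatter tree has the smaller variance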


    43

    Extended Huffman Codes

    Consider the source:
    A = {a, b, c}, P(a) = 0.8, P(b) = 0.02, P(c) = 0.18
    H = 0.816 bits/symbol

    Huffman code: a -> 0, b -> 11, c -> 10
    l = 1.2 bits/symbol
    Redundancy = 0.384 bits/symbol (47%!)

    Q: Could we do better?

    44

    Extended Huffman Codes (2)

    Idea: consider encoding sequences of two letters, as opposed to single letters

    Letter | Probability | Code
    aa     | 0.6400      | 0
    ab     | 0.0160      | 10101
    ac     | 0.1440      | 11
    ba     | 0.0160      | 101000
    bb     | 0.0004      | 10100101
    bc     | 0.0036      | 1010011
    ca     | 0.1440      | 100
    cb     | 0.0036      | 10100100
    cc     | 0.0324      | 1011

    l = 1.7228/2 = 0.8614 bits/symbol
    Redundancy = 0.045 bits/symbol
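    An added sketch that reproduces the 0.8614 bits/symbol figure by running Huffman merging over all two-letter blocks; only codeword lengths are tracked, since that is all the average rate needs (the particular codewords may differ from the table above):

      import heapq
      from itertools import product

      p = {"a": 0.8, "b": 0.02, "c": 0.18}
      pairs = {x + y: p[x] * p[y] for x, y in product(p, repeat=2)}

      # Huffman merging on weights alone; each merge adds one bit to every symbol it covers.
      heap = [(w, i, [s]) for i, (s, w) in enumerate(sorted(pairs.items()))]
      heapq.heapify(heap)
      length = {s: 0 for s in pairs}
      serial = len(heap)
      while len(heap) > 1:
          w1, _, s1 = heapq.heappop(heap)
          w2, _, s2 = heapq.heappop(heap)
          for s in s1 + s2:
              length[s] += 1
          heapq.heappush(heap, (w1 + w2, serial, s1 + s2))
          serial += 1

      rate = sum(pairs[s] * length[s] for s in pairs) / 2
      print(rate)   # ~0.8614 bits per original symbol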


    45

    Extended Huffman Codes (3)

    The idea can be extended further: consider all possible n^m sequences (we did 3^2)

    In theory, by considering more sequences we can improve the coding
    In reality, the exponential growth of the alphabet makes this impractical
    E.g., for length-3 ASCII sequences: 256^3 = 2^24 = 16M

    Most sequences would have zero frequency
    Other methods are needed

    46

    Adaptive Huffman Coding

    Problem: Huffman requires probability estimates
    This could turn it into a two-pass procedure:
    1. Collect statistics, generate codewords
    2. Perform actual encoding

    Not practical in many situations, e.g. compressing network transmissions

    Theoretical solution
    Start with equal probabilities; based on the statistics of the first k symbols (k = 1, 2, ...),
    regenerate the codewords and encode the (k+1)-st symbol

    Too expensive in practice


    47

    Adaptive Huffman Coding (2)

    Basic idea
    Alphabet A = {a1, ..., an}
    Pick a fixed default binary code for all symbols
    Start with an empty Huffman tree
    Read symbol s from the source:
      If NYT(s)  // Not Yet Transmitted
        Send NYT, default(s)
        Update tree (and keep it Huffman)
      Else
        Send codeword for s
        Update tree
    Until done

    Notes: Codewords will change as a function of symbol frequencies
    Encoder & decoder follow the same procedure, so they stay in sync
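    A runnable illustration of this loop (an added simplification, not the slides' algorithm): per-symbol counts stand in for the tree, and the code is rebuilt from the counts after every symbol. A real adaptive coder updates the tree incrementally, as the next slides describe, and its exact codewords (and output bits) can differ from this sketch.

      import heapq

      ALPHABET = "abcdefghijklmnopqrstuvwxyz"
      DEFAULT = {s: format(i, "05b") for i, s in enumerate(ALPHABET)}  # fixed 5-bit default codes

      def huffman_from_counts(counts):
          """Ordinary Huffman codes for the symbols (plus NYT) seen so far."""
          heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(counts.items()))]
          heapq.heapify(heap)
          serial = len(heap)
          while len(heap) > 1:
              w1, _, c1 = heapq.heappop(heap)
              w2, _, c2 = heapq.heappop(heap)
              merged = {s: "1" + c for s, c in c1.items()}
              merged.update({s: "0" + c for s, c in c2.items()})
              heapq.heappush(heap, (w1 + w2, serial, merged))
              serial += 1
          return heap[0][2]

      def encode(text):
          counts = {"NYT": 0}          # NYT: escape for not-yet-transmitted symbols
          out = []
          for s in text:
              if s not in counts:      # NYT(s): send the escape, then the fixed default code
                  if len(counts) > 1:  # the very first symbol needs no escape (tree is empty)
                      out.append(huffman_from_counts(counts)["NYT"])
                  out.append(DEFAULT[s])
                  counts[s] = 1
              else:                    # known symbol: send its current adaptive codeword
                  out.append(huffman_from_counts(counts)[s])
                  counts[s] += 1
          return "".join(out)

      print(encode("aardvark"))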

    48

    Adaptive Huffman Tree

    Tree has at most 2n - 1 nodes

    Node attributes
    symbol, left, right, parent, siblings, leaf
    weight:
      If xk is a leaf, then weight(xk) = frequency of symbol(xk)
      Else weight(xk) = weight(left(xk)) + weight(right(xk))
    id, assigned as follows:
      If weight(x1) <= weight(x2) <= ... <= weight(x2n-1)
      then id(x1) <= id(x2) <= ... <= id(x2n-1)
      Also, parent(x2k-1) = parent(x2k), for 1 <= k < n

    Sibling property


    49

    Updating the Tree

    Assign id(root) = 2n-1, weight(NYT) = 0
    Start with an NYT node
    Whenever a new symbol is seen, a new node is formed by splitting the NYT node
    Maintaining the sibling property:
    Whenever node x is updated, repeat:
      If weight(x) ...
      [The rest of this slide and slide 50 are cut off in the source. The standard update rule, which
      this slide appears to describe, swaps x with the highest-id node of equal weight (excluding x's
      parent), increments weight(x), and then continues with parent(x) until the root is reached.]


    51-64

    Adaptive Huffman Encoding

    Fixed default 5-bit codes (the letter's index, in 5 bits):
    a 00000  f 00101  k 01010  p 01111  u 10100
    b 00001  g 00110  l 01011  q 10000  v 10101
    c 00010  h 00111  m 01100  r 10001  w 10110
    d 00011  i 01000  n 01101  s 10010  x 10111
    e 00100  j 01001  o 01110  t 10011  y 11000

    Input: aardvark

    [Slides 51-64 run the encoder one symbol at a time, showing the tree (node ids 41-51 with their
    weights) and the current code table after every update. A new symbol is sent as the current NYT
    codeword followed by its 5-bit default code; a repeated symbol is sent with its current adaptive
    codeword.]

    Output after 'a':    00000
    Output after 'aa':   000001
    Output after 'aar':  000001010001
    Output after 'aard': 0000010100010000011
    ...
    Final output:        000001010001000001100010101010110001010

    Final code table: NYT 11100, a 0, r 10, d 110, v 1111, k 11101

    65-79

    Adaptive Huffman Decoding

    Input: 000001010001000001100010101010110001010

    [Slides 65-79 run the same procedure in reverse: the decoder walks the current tree until it
    reaches either the NYT node (and then reads a 5-bit default code) or a leaf, outputs the symbol,
    and performs exactly the same tree update as the encoder, so the two stay in sync. The decoded
    output grows a, aa, aar, aard, aardv, aardva, aardvar, aardvark, and the final code table
    (NYT 11100, a 0, r 10, d 110, v 1111, k 11101) matches the encoder's.]

    80

    Dealing with Counter Overflow

    Over time counters can overflow
    E.g., a 32-bit counter: ~4 billion - BIG, but still finite, and it can overflow on long network connections

    Solution? Rescale all frequency counts (of leaf nodes) when the limit is reached
    E.g., divide all of them by two
    Recompute the rest of the tree (keep it Huffman!)
    Note: after rescaling, new symbols will count twice as much as old ones!

    This is mostly a feature, not a bug:
    Data tends to have strong local correlation
    I.e., what happened a long time ago is not as important as what happened more recently
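    An added sketch of the rescaling step; the limit value and the choice to keep every seen symbol at weight at least 1 are assumptions of this illustration, not the slides:

      # Halve the leaf counts when any of them hits a limit, then rebuild the tree/code
      # from the reduced counts.
      MAX_COUNT = 2**16

      def rescale(counts):
          """Halve all symbol counts, keeping every seen symbol at weight >= 1."""
          return {s: max(1, c // 2) for s, c in counts.items()}

      counts = {"a": 70000, "b": 12, "c": 3}
      if max(counts.values()) >= MAX_COUNT:
          counts = rescale(counts)
      print(counts)   # {'a': 35000, 'b': 6, 'c': 1}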


    81

    Huffman Image Compression

    Example images: 256x256 pixels, 8 bits/pixel, 65,536 bytes

    Huffman coding of pixel values:

    Image  | Size (bytes) | Bits/pixel | Compression ratio
    Sena   | 57,504       | 7.01       | 1.14
    Sensin | 61,430       | 7.49       | 1.07
    Earth  | 40,534       | 4.94       | 1.62
    Omaha  | 58,374       | 7.12       | 1.12

    82

    Huffman Image Compression (2)

    Basic observations
    The plain Huffman yields modest gains, except in the Earth case
    (lots of black skews the pixel distribution nicely)
    We are not taking into account obvious correlations of pixel values

    Huffman coding of pixel differences:

    Image  | Size (bytes) | Bits/pixel | Compression ratio
    Sena   | 32,968       | 4.02       | 1.99
    Sensin | 38,541       | 4.70       | 1.70
    Earth  | 33,880       | 4.13       | 1.93
    Omaha  | 52,643       | 6.42       | 1.24
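    An added toy illustration of why differences help: neighboring pixels are similar, so the differences cluster near zero and have lower first-order entropy (the scanline below is made up, not taken from the test images):

      import collections, math

      def entropy(values):
          """First-order entropy (bits/symbol) of a sequence."""
          counts = collections.Counter(values)
          n = len(values)
          return -sum(c / n * math.log2(c / n) for c in counts.values())

      row = [100, 101, 103, 102, 104, 107, 109, 108, 110, 111, 113, 112]
      diffs = [row[0]] + [b - a for a, b in zip(row, row[1:])]
      print(entropy(row), entropy(diffs))   # the differences need fewer bits/symbol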


    83

    Two-pass Huffman vs. Adaptive Huffman

    Two-pass (pixel differences):

    Image  | Size (bytes) | Bits/pixel | Compression ratio
    Sena   | 32,968       | 4.02       | 1.99
    Sensin | 38,541       | 4.70       | 1.70
    Earth  | 33,880       | 4.13       | 1.93
    Omaha  | 52,643       | 6.42       | 1.24

    Adaptive:

    Image  | Size (bytes) | Bits/pixel | Compression ratio
    Sena   | 32,261       | 3.93       | 2.03
    Sensin | 37,896       | 4.63       | 1.73
    Earth  | 39,504       | 4.82       | 1.66
    Omaha  | 52,321       | 6.39       | 1.25

    84

    Huffman Text Compression

    PDF (letters): US Constitution vs. Chapter 3

    [Bar chart: per-letter probabilities P(Constitution) and P(Chapter) for the letters A-Z; the
    y-axis runs from 0.00 to 0.12.]


    85

    Huffman Audio Compression

    Huffman coding: 16-bit CD audio (44,100 Hz) x 2 channels

    File Name | Original File Size (bytes) | Entropy (bits) | Est. Compressed Size (bytes) | Compression Ratio
    Mozart    | 939,862                    | 12.8           | 725,420                      | 1.30
    Cohn      | 402,442                    | 13.8           | 349,300                      | 1.15
    Mir       | 884,020                    | 13.7           | 759,540                      | 1.16

    Difference Huffman Coding:

    File Name | Original File Size (bytes) | Entropy of Diff. (bits) | Est. Compressed Size (bytes) | Compression Ratio
    Mozart    | 939,862                    | 9.7                     | 569,792                      | 1.65
    Cohn      | 402,442                    | 10.4                    | 261,590                      | 1.54
    Mir       | 884,020                    | 10.9                    | 602,240                      | 1.47

    86

    Golomb Codes

    Designed to encode integer sequences where the larger the number, the lower the probability
    E.g., the unary code: the unary representation of the number, followed by a 0

    0 -> 0
    1 -> 10
    2 -> 110
    3 -> 1110

    Identical to the Huffman code for {1, 2, 3, ...} with P(k) = 1/2^k
    Optimal for that probability model


    87

    Golomb Codes (2)

    Family of codes based on a parameter m
    To represent n, we compute
      q = floor(n/m)   (quotient)
      r = n - q*m      (remainder)
    and represent q in unary code, followed by r in binary:
      If m is a power of 2, r takes log2(m) bits
      If m is not a power of 2, we can simply use ceil(log2 m) bits
      Better yet, we can use the floor(log2 m)-bit representation of r for 0 <= r <= 2^ceil(log2 m) - m - 1,
      and the ceil(log2 m)-bit representation of r + 2^ceil(log2 m) - m for the rest
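    An added Python sketch of this encoding rule (golomb() is a name chosen here, not from the slides):

      from math import ceil, log2

      def golomb(n, m):
          """Golomb codeword for a non-negative integer n with parameter m (m >= 2)."""
          q, r = divmod(n, m)
          unary = "1" * q + "0"                      # unary code for the quotient
          b = ceil(log2(m))
          cutoff = 2 ** b - m                        # remainders that get the short form
          if r < cutoff:
              binary = format(r, f"0{b - 1}b")       # floor(log2 m) bits
          else:
              binary = format(r + cutoff, f"0{b}b")  # ceil(log2 m) bits
          return unary + binary

      # The m = 6 example from the next slides
      for n in range(12):
          print(n, golomb(n, 6))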

    88-99

    Golomb Code Example

    m = 6: floor(log2 6) = 2, ceil(log2 6) = 3
    2-bit codes for 0 <= r <= 2^ceil(log2 6) - 6 - 1, i.e. 0 <= r <= 1
    3-bit codes of r + 2^ceil(log2 6) - 6 = r + 2 for the rest

    [Slides 88-99 build the following table one row at a time:]

    n  | q | r | Codeword
    0  | 0 | 0 | 000
    1  | 0 | 1 | 001
    2  | 0 | 2 | 0100
    3  | 0 | 3 | 0101
    4  | 0 | 4 | 0110
    5  | 0 | 5 | 0111
    6  | 1 | 0 | 1000
    7  | 1 | 1 | 1001
    8  | 1 | 2 | 10100
    9  | 1 | 3 | 10101
    10 | 1 | 4 | 10110
    11 | 1 | 5 | 10111

    100

    Golomb Codes: Choosing m

    Assume a binary string (zeroes & ones)
    It can be encoded by counting the runs of identical bits (either zeroes or ones)
    A.k.a. run-length encoding (RLE)

    E.g.
    00001001100010010000000001001110000100001000100
    zero-run lengths: 4, 2, 0, 3, 2, 9, 2, 0, 0, 4, 4, 3, 2

    35 zeroes, 12 ones
    P(0) = 35/(35+12) = 0.745

    m = ceil( -log2(1+p) / log2(p) ) = ceil( -log2(1.745) / log2(0.745) ) = 2
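    An added sketch that extracts the zero-run lengths and picks m from the fraction of zeroes, reproducing m = 2 for the string above (it uses the formula as reconstructed on this slide):

      from math import ceil, log2

      def zero_runs(bits):
          """Run lengths of the zero runs terminated by each 1 (plus the tail)."""
          return [len(run) for run in bits.split("1")]

      def golomb_parameter(bits):
          """Pick the Golomb parameter m from the fraction of zeroes in the string."""
          p = bits.count("0") / len(bits)            # ~0.745 for the example string
          return ceil(-log2(1 + p) / log2(p))

      s = "00001001100010010000000001001110000100001000100"
      print(zero_runs(s))          # [4, 2, 0, 3, 2, 9, 2, 0, 0, 4, 4, 3, 2]
      print(golomb_parameter(s))   # 2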


    101

    Summary

    Early Shannon-Fano code

    Huffman code
    Original (two-pass) version
      Collect symbol statistics, assign codes
      Perform actual encoding of the source
    Extended version
      Group multiple symbols to reduce the per-symbol redundancy
    Adaptive version
      Most practical: build the Huffman tree on the fly
      Single pass
      Escape codes for NYT symbols
      Encoder & decoder are synchronized
      More sensitive to local variation, tends to forget older data

    102

    Problems & Extra

    Preparation problems (Sayood 3rd, pp. 74-76)
    Minimum: 4, 5, 6, 10
    Recommended: 1, 2, 3, 11

    Extra
    Huffman
      Optimality of Huffman Codes /3.2.2/
      Length of Huffman Codes /3.2.3/
      Extended Huffman Codes (theory) /3.2.4/
      Non-binary Huffman Codes /3.3/
    Related codes
      Golomb /3.5/
      Rice /3.6/
      Tunstall /3.7/