Huffman coding

  • Huffman coding

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 2

    Optimal codes - I

    A code is optimal if it has the shortest expected codeword length L.

    This can be seen as an optimization problem:

    $$\min L = \sum_{i=1}^{m} p_i l_i \qquad \text{subject to} \qquad \sum_{i=1}^{m} D^{-l_i} \le 1$$
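As a quick numerical illustration, the objective and the Kraft constraint can be evaluated directly; this is a minimal sketch, using the dyadic source that appears later in the Shannon-Fano example (the variable names are illustrative assumptions):

```python
# Expected codeword length L = sum(p_i * l_i) and the Kraft sum
# sum(D**-l_i), which must be <= 1 for a D-ary prefix code.
probs   = [1/2, 1/4, 1/8, 1/16, 1/32, 1/32]   # example source
lengths = [1, 2, 3, 4, 5, 5]                  # candidate codeword lengths
D = 2                                         # binary code

L = sum(p * l for p, l in zip(probs, lengths))
kraft = sum(D ** -l for l in lengths)
print(L, kraft)   # 1.9375, 1.0 -> constraint satisfied with equality
```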

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 3

    Optimal codes - II

    Let's make two simplifying assumptions:
    - no integer constraint on the codelengths
    - the Kraft inequality holds with equality

    Lagrange-multiplier problem:

    $$J = \sum_{i=1}^{m} p_i l_i + \lambda \left( \sum_{i=1}^{m} D^{-l_i} - 1 \right)$$

    $$\frac{\partial J}{\partial l_j} = 0 \;\Rightarrow\; p_j - \lambda D^{-l_j} \log D = 0 \;\Rightarrow\; D^{-l_j} = \frac{p_j}{\lambda \log D}$$

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 4

    Optimal codes - III

    Substitute into the Kraft inequality:

    $$\sum_{i=1}^{m} D^{-l_i} = \sum_{i=1}^{m} \frac{p_i}{\lambda \log D} = 1 \;\Rightarrow\; \lambda = \frac{1}{\log D}$$

    that is

    $$D^{-l_i} = p_i \;\Rightarrow\; l_i^{*} = -\log_D p_i$$

    Note that

    $$L^{*} = \sum_{i=1}^{m} p_i l_i^{*} = -\sum_{i=1}^{m} p_i \log_D p_i = H_D(X) \;!!$$

    the entropy, when we use base D for logarithms
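For a dyadic source the optimal lengths are integer-valued and the expected length equals the entropy exactly; a small illustrative check (example values assumed, D = 2):

```python
import math

# For dyadic probabilities, l_i* = -log2(p_i) is an integer and
# the expected length L* equals the entropy H(X).
probs = [1/2, 1/4, 1/8, 1/16, 1/32, 1/32]

opt_lengths = [-math.log2(p) for p in probs]       # [1, 2, 3, 4, 5, 5]
entropy = -sum(p * math.log2(p) for p in probs)    # 1.9375
L_star = sum(p * l for p, l in zip(probs, opt_lengths))
print(opt_lengths, entropy, L_star)                # L* == H(X)
```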

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 5

    Optimal codes - IV

    In practice the codeword lengths must be integer values, so the result just obtained is a lower bound.

    Theorem. The expected length of any instantaneous D-ary code for a r.v. X satisfies

    $$L \ge H_D(X)$$

    This fundamental result derives from the work of Shannon.

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 6

    Optimal codes - V

    What about the upper bound?

    Theorem. Given a source alphabet (i.e. a r.v.) of entropy H(X), it is possible to find an instantaneous binary code whose length satisfies

    $$H(X) \le L \le H(X) + 1$$

    A similar theorem can be stated if we use the wrong probabilities {q_i} instead of the true ones {p_i}; the only difference is a term which accounts for the relative entropy.

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 7

    The redundancy

    It is defined as the average codeword length minus the entropy:

    $$\text{redundancy} = L - \left( -\sum_i p_i \log p_i \right)$$

    Note that

    $$0 \le \text{redundancy} \le 1$$

    (why?)

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 8

    Compression ratio

    It is the ratio between the average number of bits per symbol in the original message and the same quantity for the coded message, i.e.

    $$C = \frac{\langle \text{average original symbol length} \rangle}{\langle \text{average compressed symbol length} \rangle} \;\ne\; L(X) \;!!$$

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 9

    Uniquely decodable codes

    The set of instantaneous codes is a small subset of the uniquely decodable codes. Is it possible to obtain a lower average code length L using a uniquely decodable code that is not instantaneous? NO.

    So we use instantaneous codes, which are easier to decode.

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 10

    Summary

    Average codeword length L for uniquely decodable codes (and for instantaneous codes):

    $$L \ge H(X)$$

    In practice, for each r.v. X with entropy H(X) we can build a code with average codeword length that satisfies

    $$H(X) \le L \le H(X) + 1$$

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 11

Shannon-Fano coding

    The main advantage of the Shannon-Fano technique is its simplicity. The algorithm (a sketch in code follows below):

    1. Source symbols are listed in order of non-increasing probability.
    2. The list is divided in such a way as to form two groups of as nearly equal probabilities as possible.
    3. Each symbol in the first group receives a 0 as the first digit of its codeword, while the others receive a 1.
    4. Each of these groups is then divided according to the same criterion, and additional code digits are appended.
    5. The process is continued until each group contains only one message.
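A minimal Python sketch of this procedure; the function name `shannon_fano` and the greedy split-point search are illustrative assumptions, not prescribed by the slides:

```python
def shannon_fano(symbols):
    """Shannon-Fano coding of [(symbol, probability), ...] pairs,
    which must already be sorted by non-increasing probability.
    Returns {symbol: codeword string}."""
    codes = {}

    def split(group, prefix):
        if len(group) == 1:               # one message left: emit its code
            codes[group[0][0]] = prefix or "0"
            return
        total = sum(p for _, p in group)
        # Choose the split point that makes the two groups'
        # probabilities as nearly equal as possible.
        acc, best_i, best_diff = 0.0, 1, float("inf")
        for i in range(1, len(group)):
            acc += group[i - 1][1]
            diff = abs(2 * acc - total)
            if diff < best_diff:
                best_i, best_diff = i, diff
        split(group[:best_i], prefix + "0")   # first group: append 0
        split(group[best_i:], prefix + "1")   # second group: append 1

    split(list(symbols), "")
    return codes

# The example on the next slide:
# shannon_fano([("a", 1/2), ("b", 1/4), ("c", 1/8),
#               ("d", 1/16), ("e", 1/32), ("f", 1/32)])
# -> {'a': '0', 'b': '10', 'c': '110', 'd': '1110',
#     'e': '11110', 'f': '11111'}
```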

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 12

    example

    Symb.  Prob.  Code
    a      1/2    0
    b      1/4    10
    c      1/8    110
    d      1/16   1110
    e      1/32   11110
    f      1/32   11111

    H = 1.9375 bits

    L = 1.9375 bits

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 13

    Shannon-Fano coding - exercise

    Encode, using the Shannon-Fano algorithm:

    Symb.  Prob.
    *      12%
    ?       5%
    !      13%
    &       2%
    $      29%
    €      13%
    §      10%
    °       6%
    @      10%

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 14

    Is Shannon-Fano coding optimal?

    Symb.  Prob.  Code 1  Code 2
    a      0.35   00      0
    b      0.17   01      100
    c      0.17   10      101
    d      0.16   110     110
    e      0.15   111     111

    H = 2.2328 bits

    L = 2.31 bits (Code 1, Shannon-Fano)

    L1 = 2.3 bits (Code 2)

    Code 2 achieves a shorter average length than the Shannon-Fano code, so Shannon-Fano coding is not optimal.

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 15

    Huffman coding - I

    There is another algorithm whose performance is slightly better than Shannon-Fano's: the famous Huffman coding. It works by constructing a tree bottom-up, with the symbols at the leaves. The two leaves with the smallest probabilities become siblings under a parent node whose probability is equal to the sum of the two children's probabilities.

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 16

    Huffman coding - II

    The operation is then repeated, considering also the new parent node and ignoring its children. The process continues until only one parent node, with probability 1, remains: it is the root of the tree. Then the two branches of every non-leaf node are labeled 0 and 1 (typically 0 on the left branch, but the order is not important). A sketch in code follows below.
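A minimal Python sketch of this bottom-up construction; the heap-based formulation and the function name `huffman_code` are illustrative choices, as the slides do not prescribe a data structure:

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Huffman coding of {symbol: probability}. Returns
    {symbol: codeword string}. The counter breaks ties so that
    heap entries never compare the trees themselves."""
    tiebreak = count()
    # Heap entries: (probability, tiebreak, tree); a tree is either
    # a symbol (leaf) or a (left, right) pair (internal node).
    heap = [(p, next(tiebreak), s) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)          # smallest probability
        p2, _, t2 = heapq.heappop(heap)          # second smallest
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (t1, t2)))
    root = heap[0][2]                            # probability 1: the root

    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):              # internal node
            walk(tree[0], prefix + "0")          # left branch labeled 0
            walk(tree[1], prefix + "1")          # right branch labeled 1
        else:                                    # leaf: a source symbol
            codes[tree] = prefix or "0"
    walk(root, "")
    return codes
```

Different tie-breaking (or branch labeling) yields different but equally optimal codes, which is exactly the point discussed in the notes later on.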

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 17

    Huffman coding - example

    [Figure: step-by-step Huffman tree construction for the source below. The two smallest probabilities are merged repeatedly, creating internal nodes with probabilities 0.1, 0.2, 0.3, 0.4, 0.6 and finally the root 1.0; each pair of branches is labeled 0/1.]

    Symbol  Prob.
    a       0.05
    b       0.05
    c       0.1
    d       0.2
    e       0.3
    f       0.2
    g       0.1

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 18

    Huffman coding - example

    Symbol  Prob.  Codeword
    a       0.05   0000
    b       0.05   0001
    c       0.1    001
    d       0.2    01
    e       0.3    10
    f       0.2    110
    g       0.1    111

    Exercise: evaluate H(X) and L(X)

    H(X)=2.5464 bits

    L(X)=2.6 bits !!
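These two figures can be checked directly; a small illustrative script assuming the table above:

```python
import math

# Check of the exercise figures for the 7-symbol example above.
probs = {"a": 0.05, "b": 0.05, "c": 0.1, "d": 0.2,
         "e": 0.3, "f": 0.2, "g": 0.1}
code  = {"a": "0000", "b": "0001", "c": "001", "d": "01",
         "e": "10", "f": "110", "g": "111"}

H = -sum(p * math.log2(p) for p in probs.values())
L = sum(probs[s] * len(w) for s, w in code.items())
print(round(H, 4), round(L, 2))   # 2.5464, 2.6
```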

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 19

    Huffman coding - exercise

    Code the sequence aeebcddegfced and calculate the compression ratio.

    Sol: 0000 10 10 0001 001 01 01 10 111 110 001 10 01

    Aver. orig. symb. length = 3 bits

    Aver. compr. symb. length = 34/13 bits

    C = .....

    Symbol  Prob.  Codeword
    a       0.05   0000
    b       0.05   0001
    c       0.1    001
    d       0.2    01
    e       0.3    10
    f       0.2    110
    g       0.1    111

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 20

    Huffman coding - exercise

    Decode the sequence 0111001001000001111110

    Sol: dfdcadgf

    Symbol  Prob.  Codeword
    a       0.05   0000
    b       0.05   0001
    c       0.1    001
    d       0.2    01
    e       0.3    10
    f       0.2    110
    g       0.1    111

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 21

    Huffman coding - exercise

    Encode with Huffman the sequence 01$cc0a02ba10 and evaluate entropy, average codeword length and compression ratio.

    Symb.  Prob.
    a      0.10
    b      0.03
    c      0.14
    0      0.40
    1      0.22
    2      0.04
    $      0.07

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 22

    Huffman coding - exercise

    Symb.  Prob.
    0      0.16
    1      0.02
    2      0.15
    3      0.29
    4      0.17
    5      0.04
    %      0.17

    Decode (if possible) the Huffman coded bit stream 01001011010011110101...

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 23

    Huffman coding - notes

    In Huffman coding, if at any time there is more than one way to choose the smallest pair of probabilities, any such pair may be chosen.

    Sometimes the list of probabilities is initialized to be non-increasing and reordered after each node creation. This detail doesn't affect the correctness of the algorithm, but it provides a more efficient implementation.

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 24

    Huffman coding - notes

    There are cases in which Huffman coding does not uniquely determine the codeword lengths, due to the arbitrary choice among equal minimum probabilities. For example, for a source with probabilities {0.4, 0.2, 0.2, 0.1, 0.1} it is possible to obtain codeword lengths of {1, 2, 3, 4, 4} and of {2, 2, 2, 3, 3}.

    It would be better to have a code whose codelengths have the minimum variance, as this solution needs the minimum buffer space in the transmitter and in the receiver (see the check below).
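A quick illustrative check of this example: both length assignments give the same average length, but very different variances:

```python
# Both length assignments for {0.4, 0.2, 0.2, 0.1, 0.1} give
# L = 2.2 bits, but their variances differ.
probs = [0.4, 0.2, 0.2, 0.1, 0.1]
for lengths in ([1, 2, 3, 4, 4], [2, 2, 2, 3, 3]):
    L = sum(p * l for p, l in zip(probs, lengths))
    var = sum(p * (l - L) ** 2 for p, l in zip(probs, lengths))
    print(round(L, 2), round(var, 2))   # same L = 2.2; variance 1.36 vs 0.16
```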

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 25

    Huffman coding - notes

    Schwarz defines a variant of the Huffman algorithm that allows one to build the code with minimum $l_{max}$.

    There are several other variants; we will explain the most important ones in a while.

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 26

    Optimality of Huffman coding - I

    It is possible to prove that, in the case of character coding (one symbol, one codeword), Huffman coding is optimal.

    In other terms, the Huffman code has minimum redundancy. An upper bound for the redundancy has been found:

    $$\text{redundancy} \le p_1 + 1 - \log_2 e + \log_2 \log_2 e \approx p_1 + 0.086$$

    where $p_1$ is the probability of the most likely symbol.
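The numeric constant in the bound can be verified in a couple of lines (an illustrative check):

```python
import math

# The constant term of the bound: 1 - log2(e) + log2(log2(e))
sigma = 1 - math.log2(math.e) + math.log2(math.log2(math.e))
print(round(sigma, 3))   # 0.086
```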

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 27

    Optimality of Huffman coding - II

    Why does the Huffman code "suffer" when there is one symbol with very high probability? Remember the notion of uncertainty:

    $$p(x) \to 1 \;\Rightarrow\; -\log p(x) \to 0$$

    The main problem is given by the integer constraint on codelengths!!

    This consideration opens the way to a more powerful coding... we will see it later.

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 28

    Huffman coding - implementation

    A Huffman code can be generated in O(n) time, where n is the number of source symbols, provided that the probabilities have been presorted (however, this sort costs O(n log n)...).

    Nevertheless, encoding is very fast.

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 29

    Huffman coding - implementation

    However, the spatial and temporal complexity of the decoding phase is far more important, because, on average, decoding will happen much more frequently.

    Consider a Huffman tree with n symbols: it has n leaves and n-1 internal nodes. Each leaf has a pointer to a symbol and the info that it is a leaf; each internal node has two pointers:

    $$2n + 2(n-1) \approx 4n \text{ words (32 bits)}$$

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 30

    Huffman coding - implementation

    1 million symbols → 16 MB of memory! Moreover, traversing a tree from root to leaf involves following a lot of pointers, with little locality of reference. This causes several page faults or cache misses.

    To solve this problem a variant of Huffman coding has been proposed: canonical Huffman coding.
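The arithmetic behind the 16 MB figure, as a back-of-the-envelope check:

```python
# A pointer-based Huffman tree for n symbols needs about 4n words
# of 32 bits (see the previous slide).
n = 1_000_000
words = 2 * n + 2 * (n - 1)    # ~4n words
bytes_ = words * 4             # 4 bytes per 32-bit word
print(bytes_)                  # ~16,000,000 bytes, i.e. ~16 MB
```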

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 31

    canonical Huffman coding - I

    Symb.  Prob.  Code 1  Code 2  Code 3
    a      0.11   000     111     000
    b      0.12   001     110     001
    c      0.13   100     011     010
    d      0.14   101     010     011
    e      0.24   01      10      10
    f      0.26   11      00      11

    [Figure: the Huffman tree for this source, with internal node probabilities 0.23, 0.27, 0.47, 0.53 and root 1.0. One labeling of the (0)/(1) branches gives Code 1; flipping every label gives Code 2.]

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 32

canonical Huffman coding - II

    This code cannot be obtained through a Huffman tree!

    We do call it a Huffman code because it is instantaneous and its codeword lengths are the same as those of a valid Huffman code.

    Numerical sequence property:
    - codewords with the same length are ordered lexicographically
    - when the codewords are sorted in lexical order they are also in order from the longest to the shortest codeword

    Symb.  Code 3
    a      000
    b      001
    c      010
    d      011
    e      10
    f      11

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 33

    canonical Huffman coding - III

    The main advantage is that it is not necessary to store a tree in order to decode.

    We need:
    - a list of the symbols ordered according to the lexical order of the codewords
    - an array with the first codeword of each distinct length

  • 34

canonical Huffman coding - IV

    Encoding. Suppose there are n distinct symbols, and that for symbol i we have calculated the Huffman codelength l_i, with l_i <= maxlength for all i.

    for k = 1 to maxlength { numl[k] = 0; }
    for i = 1 to n { numl[l_i] = numl[l_i] + 1; }

    firstcode[maxlength] = 0;
    for k = maxlength - 1 downto 1 {
        firstcode[k] = ceil( (firstcode[k+1] + numl[k+1]) / 2 );
    }

    for k = 1 to maxlength { nextcode[k] = firstcode[k]; }
    for i = 1 to n {
        codeword[i] = nextcode[l_i];
        symbol[l_i, nextcode[l_i] - firstcode[l_i]] = i;
        nextcode[l_i] = nextcode[l_i] + 1;
    }

    numl[k] = number of codewords with length k
    firstcode[k] = integer for the first code of length k
    nextcode[k] = integer for the next codeword of length k to be assigned
    symbol[-,-] is used for decoding
    codeword[i]: the rightmost l_i bits of this integer are the code for symbol i
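A Python transliteration of the pseudocode above; this is a sketch, where the function name `canonical_code`, the 0-padded lists and the dictionary used for `symbol[-,-]` are implementation choices, not part of the slides:

```python
import math

def canonical_code(lengths):
    """Assign canonical codewords given Huffman codelengths.

    `lengths` maps each symbol to its codelength l_i (symbols are
    processed in insertion order, mirroring i = 1..n on the slide).
    Returns (codeword, firstcode, symbol): codeword[s] is the integer
    whose rightmost l_s bits are the code for s; symbol[(k, n)] is
    the n-th symbol of length k, used for decoding."""
    maxlength = max(lengths.values())
    numl = [0] * (maxlength + 2)          # numl[k]: codewords of length k
    for l in lengths.values():
        numl[l] += 1

    firstcode = [0] * (maxlength + 2)     # firstcode[maxlength] = 0
    for k in range(maxlength - 1, 0, -1):
        firstcode[k] = math.ceil((firstcode[k + 1] + numl[k + 1]) / 2)

    nextcode = firstcode[:]               # next code of each length to assign
    codeword, symbol = {}, {}
    for s, l in lengths.items():
        codeword[s] = nextcode[l]
        symbol[(l, nextcode[l] - firstcode[l])] = s
        nextcode[l] += 1
    return codeword, firstcode, symbol
```

With the lengths of the example on the next slide, `{'a': 2, 'b': 5, 'c': 5, 'd': 3, 'e': 2, 'f': 5, 'g': 5, 'h': 2}`, this yields `firstcode[1..5] = [2, 1, 1, 2, 0]` and codewords such as `format(codeword['b'], '05b') == '00000'`.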

  • 35

    canonical Huffman - example

    1. Evaluate array numl

    Symb.  l_i
    a      2
    b      5
    c      5
    d      3
    e      2
    f      5
    g      5
    h      2

    numl: [0 3 1 0 4]

    2. Evaluate array firstcode

    firstcode: [2 1 1 2 0]

    3. Construct arrays codeword and symbol

    Symb.  code  bits
    a      1     01
    b      0     00000
    c      1     00001
    d      1     001
    e      2     10
    f      2     00010
    g      3     00011
    h      3     11

    symbol   0  1  2  3
    1        -  -  -  -
    2        a  e  h  -
    3        d  -  -  -
    4        -  -  -  -
    5        b  c  f  g

  • Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 36

canonical Huffman coding - V

    Decoding. We have the arrays firstcode and symbol.

    v = nextinputbit();
    k = 1;
    while v < firstcode[k] {
        v = 2*v + nextinputbit();
        k = k + 1;
    }
    return symbol[k, v - firstcode[k]];

    nextinputbit() = function that returns the next input bit
    firstcode[k] = integer for the first code of length k
    symbol[k,n] returns the n-th symbol with codelength k
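A matching decoder, again as a sketch reusing the tables produced by the `canonical_code()` helper above:

```python
def decode(bitstring, firstcode, symbol):
    """Decode a whole bit string with the canonical-Huffman tables
    produced by canonical_code(). Trailing partial codewords are
    silently dropped (acceptable for this sketch)."""
    bits = iter(int(b) for b in bitstring)
    out = []
    try:
        while True:
            v = next(bits)                 # v = nextinputbit(); k = 1
            k = 1
            while v < firstcode[k]:
                v = 2 * v + next(bits)     # shift in the next input bit
                k += 1
            out.append(symbol[(k, v - firstcode[k])])
    except StopIteration:
        return "".join(out)

# With the tables of the running example, the bit stream of the
# next slide should decode as:
# decode("00111100000001001", firstcode, symbol)  ->  "dhebad"
```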

  • 37

    canonical Huffman - example

    firstcode: [2 1 1 2 0]

    symbol   0  1  2  3
    1        -  -  -  -
    2        a  e  h  -
    3        d  -  -  -
    4        -  -  -  -
    5        b  c  f  g

    Input stream: 00111100000001001

    symbol[3,0] = d
    symbol[2,2] = h
    symbol[2,1] = e
    symbol[5,0] = b
    symbol[2,0] = a
    symbol[3,0] = d

    Decoded: dhebad
