Adaptive Huffman Coding

  • 8/11/2019 Adaptive Huffman Coding[1]

    1/26

    Adaptive Huffman Coding

  • Why Adaptive Huffman Coding?

    http://www.vidyarthiplus.com/

    Huffman coding suffers from the fact that the decompressor must have some knowledge of the probabilities of the symbols in the compressed file; transmitting that knowledge can add many bits to the encoding.

    If this information is unavailable, compressing the file requires two passes:

    first pass: find the frequency of each symbol and construct the Huffman tree

    second pass: compress the file
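The two-pass scheme can be sketched as follows (a minimal illustration in Python; the function names are our own, not part of any standard tool):

```python
import heapq
from collections import Counter

def huffman_codes(message):
    """First pass: count symbol frequencies and build the Huffman tree."""
    freq = Counter(message)
    # Heap entries are (weight, tie_breaker, tree); a tree is either a
    # symbol or a (left, right) pair of subtrees.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, counter, (t1, t2)))
        counter += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"   # degenerate one-symbol alphabet
    walk(heap[0][2], "")
    return codes

def encode(message):
    """Second pass: replace each symbol by its codeword."""
    codes = huffman_codes(message)
    return "".join(codes[s] for s in message), codes

bits, codes = encode("abracadabra")
print(len(bits))   # 23 bits for the 11-symbol message
```

Note that the decoder still needs `codes` (or the frequencies) to be transmitted alongside the bits, which is exactly the overhead the adaptive scheme avoids.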

  • The key idea

    The key idea is to build a Huffman tree that is optimal for the part of the message already seen, and to reorganize it when needed, to maintain its optimality.

  • Pro & Con - I

    Adaptive Huffman coding determines the mapping to codewords using a running estimate of the source symbol probabilities.

    Effective exploitation of locality:

    For example, suppose that a file starts out with a series of a character that is not repeated again in the file. In static Huffman coding, that character will be low down in the tree because of its low overall count, thus taking many bits to encode. In adaptive Huffman coding, the character will be inserted at the highest leaf possible, before eventually getting pushed down the tree by higher-frequency characters.

  • Pro & Con - II

    Only one pass over the data.

    Overhead:

    In static Huffman, we need to transmit somehow the model used for compression, i.e. the tree shape. This costs about 2n bits in a clever representation. As we will see, in adaptive schemes the overhead is n log n bits.

    Sometimes encoding needs somewhat more bits than static Huffman (without overhead), but adaptive schemes generally compare well with static Huffman once overhead is taken into account.

  • Some history

    Adaptive Huffman coding was first conceived independently by Faller (1973) and Gallager (1978).

    Knuth contributed improvements to the original algorithm (1985), and the resulting algorithm is referred to as algorithm FGK.

    A more recent version of adaptive Huffman coding is described by Vitter (1987) and called algorithm V.

  • An important question

    By better exploiting locality, adaptive Huffman coding is sometimes able to do better than static Huffman coding, i.e., for some messages it achieves greater compression.

    ... but we have established the optimality of static Huffman coding, in the sense of minimal redundancy.

    Is there a contradiction?

  • Algorithm FGK - I

    The basis for algorithm FGK is the Sibling Property (Gallager 1978):

    A binary code tree with nonnegative weights has the sibling property if each node (except the root) has a sibling and if the nodes can be numbered in order of nondecreasing weight with each node adjacent to its sibling. Moreover, the parent of a node is higher in the numbering.

    A binary prefix code is a Huffman code if and only if the code tree has the sibling property.
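The property can be checked mechanically. A small helper (our own illustration, not part of FGK itself) takes the weights and parent numbers of the nodes in numbering order:

```python
def has_sibling_property(weights, parent):
    """Check the sibling property for nodes numbered 1..2n-1.

    weights[i] and parent[i] describe the node numbered i+1;
    parent[i] is the 1-based number of its parent (None for the root).
    """
    m = len(weights)
    # Weights must be nondecreasing in the numbering.
    if any(weights[i] > weights[i + 1] for i in range(m - 1)):
        return False
    # Nodes 2k-1 and 2k must be siblings, with a higher-numbered parent.
    for i in range(0, m - 1, 2):
        if parent[i] != parent[i + 1]:
            return False
        if parent[i] <= i + 2:   # parent number must exceed both children
            return False
    return True

# A small Huffman tree for weights a=1, b=1, c=2:
# 1:a(1), 2:b(1), 3:internal(2), 4:c(2), 5:root(4)
print(has_sibling_property([1, 1, 2, 2, 4], [3, 3, 5, 5, None]))  # True
# Same tree shape with weights out of numbering order:
print(has_sibling_property([1, 2, 2, 1, 4], [3, 3, 5, 5, None]))  # False
```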

  • Algorithm FGK - II

    Note that the node numbering corresponds to the order in which the nodes are combined by Huffman's algorithm: first nodes 1 and 2, then nodes 3 and 4, ...

    [Figure: example Huffman tree with leaves a=2, b=3, c=5, d=5, e=6, f=11. The nodes are numbered 1..11 in nondecreasing weight order: 1:a(2), 2:b(3), 3:c(5), 4:internal(5)=a+b, 5:d(5), 6:e(6), 7:internal(10), 8:internal(11)=d+e, 9:f(11), 10:internal(21), 11:root(32).]

  • Algorithm FGK - III

    In algorithm FGK, both encoder and decoder maintain dynamically changing Huffman code trees. For each symbol, the encoder sends the codeword for that symbol in the current tree and then updates the tree.

    The problem is to quickly change the tree that is optimal after t symbols (not necessarily distinct) into the tree that is optimal for t+1 symbols.

    If we simply increment the weight of the (t+1)-th symbol and of all its ancestors, the sibling property may no longer be valid, and we must rebuild the tree.

  • Algorithm FGK - IV

    Suppose the next symbol is b. If we simply update the weights, the sibling property is violated: the nodes are no longer ordered by nondecreasing weight, so this is no longer a Huffman tree.

    [Figure: the example tree after naively incrementing leaf b (node 2) and its ancestors (nodes 4, 7, 10, 11): the weights in numbering order become 2, 4, 5, 6, 5, 6, 11, 11, 11, 22, 33, and node 4 (weight 6) now precedes node 5 (weight 5).]
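Using the node numbering of the example tree (where b's ancestors are nodes 4, 7, 10 and the root 11), the violation can be seen directly:

```python
# Weights of nodes 1..11 of the example tree, in numbering order.
weights = [2, 3, 5, 5, 5, 6, 10, 11, 11, 21, 32]
assert weights == sorted(weights)   # sibling-property ordering holds

# Naively increment leaf b (node 2) and all its ancestors.
for num in (2, 4, 7, 10, 11):
    weights[num - 1] += 1

print(weights)                      # [2, 4, 5, 6, 5, 6, 11, 11, 11, 22, 33]
print(weights == sorted(weights))   # False: node 4 (6) precedes node 5 (5)
```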

  • Algorithm FGK - V

    The solution can be described as a two-phase process:

    first phase: the original tree is transformed into another valid Huffman tree for the first t symbols, one with the property that the simple increment process can be applied successfully

    second phase: the increment process, as described previously

  • Algorithm FGK - V (cont.)

    The first phase starts at the leaf of the (t+1)-th symbol.

    We swap this node and all its subtree, but not its numbering, with the highest-numbered node of the same weight.

    The new current node is the parent of this latter node.

    The process is repeated until we reach the root.

  • Algorithm FGK - VI

    First phase:

    Node 2: nothing to be done

    Node 4: to be swapped with node 5

    Node 8: to be swapped with node 9

    Root reached: stop!

    Second phase: increment the weights on the path from b to the root.

    [Figure: the example tree after the update. The weights on b's new path to the root become 4, 6, 12, 33, and the sibling property is preserved.]
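The two-phase update can be sketched compactly with parent pointers only (a sketch with our own names, replayed on the example tree with leaves a=2, b=3, c=5, d=5, e=6, f=11 when symbol b arrives):

```python
class Node:
    def __init__(self, number, weight, symbol=None):
        self.number = number    # position in the sibling-property numbering
        self.weight = weight
        self.symbol = symbol
        self.parent = None

def swap(x, y, num2node):
    """Swap two subtrees: they exchange numbers and parents."""
    x.number, y.number = y.number, x.number
    num2node[x.number], num2node[y.number] = x, y
    x.parent, y.parent = y.parent, x.parent

def fgk_update(leaf, num2node):
    root = num2node[max(num2node)]
    # First phase: walk towards the root, swapping each visited node with
    # the highest-numbered node of the same weight (never with one's parent).
    path, node = [], leaf
    while node is not root:
        j = node.number
        while j + 1 in num2node and num2node[j + 1].weight == node.weight:
            j += 1
        cand = num2node[j]
        if cand is not node and cand is not node.parent:
            swap(node, cand, num2node)
        path.append(node)
        node = node.parent
    path.append(root)
    # Second phase: increment every weight on the leaf-to-root path.
    for n in path:
        n.weight += 1

# Build the example tree.
num2node = {}
def add(number, weight, symbol=None):
    num2node[number] = Node(number, weight, symbol)
    return num2node[number]

a, b, c = add(1, 2, 'a'), add(2, 3, 'b'), add(3, 5, 'c')
i4, d, e = add(4, 5), add(5, 5, 'd'), add(6, 6, 'e')
i7, i8, f = add(7, 10), add(8, 11), add(9, 11, 'f')
i10, root = add(10, 21), add(11, 32)
for child, par in [(a, i4), (b, i4), (c, i7), (i4, i7), (d, i8),
                   (e, i8), (i7, i10), (i8, i10), (f, root), (i10, root)]:
    child.parent = par

fgk_update(b, num2node)
print([num2node[k].weight for k in sorted(num2node)])
# [2, 4, 5, 5, 6, 6, 10, 11, 12, 21, 33] — still nondecreasing
```

Running the update reproduces the swaps on the slide (node 4 with node 5, then node 8 with node 9) and the incremented path weights 4, 6, 12, 33.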

  • Why FGK works

    The two-phase procedure builds a valid Huffman tree for t+1 symbols, as the sibling property is satisfied.

    In fact, we swap each node whose weight is to be increased with the highest-numbered node of the same weight.

    After the increment process, no node of the previous weight is higher in the numbering.

  • The Not Yet Seen problem - I

    When the algorithm starts, and sometimes during the encoding, we encounter a symbol that has not been seen before. How do we handle this?

    We use a single 0-node (with weight 0) that represents all the unseen symbols. When a new symbol appears, we send the code for the 0-node and some extra bits to identify the new symbol.

    If each time we send log n bits to identify the symbol, the total overhead is n log n bits.

    It is possible to do better, sending only the index of the symbol in the list of the currently unseen symbols. In this way we can save some bits, on average.
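The two escape-coding costs can be compared numerically (a small illustration of our own, here for an alphabet of n = 26 symbols):

```python
from math import ceil, log2

def escape_bits_fixed(n):
    """Every first occurrence is spelled out with ceil(log2 n) bits."""
    return n * ceil(log2(n))

def escape_bits_shrinking(n):
    """Send the symbol's index in the list of still-unseen symbols,
    which shrinks by one after each first occurrence (the last new
    symbol is the only candidate left and costs 0 bits)."""
    return sum(ceil(log2(r)) for r in range(n, 1, -1))

print(escape_bits_fixed(26))      # 130
print(escape_bits_shrinking(26))  # 99
```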

  • The Not Yet Seen problem - II

    The 0-node is then split into two sibling leaves: one for the new symbol, with weight 1, and a new 0-node.

    Then the tree is updated as seen before, in order to satisfy the sibling property.

  • Algorithm FGK - summary

    The algorithm starts with only one leaf node, the 0-node. As the symbols arrive, new leaves are created and each time the tree is recomputed.

    Each symbol is coded with its codeword in the current tree, and then the tree is updated.

    Unseen symbols are coded with the 0-node codeword followed by some extra bits that specify the symbol.

  • Algorithm FGK - VII

    Algorithm FGK compares favourably with static Huffman coding if we also consider overhead costs (it is used in the Unix utility compact).

    Exercise: construct the static Huffman tree and the FGK tree for the message "e eae de eabe eae dcf" and evaluate the number of bits needed for the coding with both algorithms, ignoring the overhead for static Huffman.

    SOL. FGK 60 bits, Huffman 52 bits.

    The FGK count is obtained using the minimum number of bits for the element in the list of the unseen symbols.

  • Algorithm FGK - VIII

    If T is the total number of bits transmitted by algorithm FGK for a message of length t containing n distinct symbols, then

    S − n + 1 ≤ T ≤ 2S + t − 4n + 2

    where S is the performance of static Huffman coding (Vitter 1987).

    So the performance of algorithm FGK is never much worse than twice optimal.

  • Algorithm V - I

    In his work of 1987, Vitter introduced two improvements over algorithm FGK, calling the new scheme algorithm Λ.

    As a tribute to his work, the algorithm has become famous... with the letter flipped upside-down... as algorithm V.

  • The key ideas - I

    Swapping of nodes during encoding and decoding is onerous.

    In algorithm FGK the number of swaps per update (counting a double cost for the updates that move a swapped node two levels higher) is bounded by d_t / 2, where d_t is the length of the codeword of the added symbol in the old tree (this bound requires some effort to be proved and is due to the work of Vitter).

    In algorithm V, the number of swaps per update is bounded by 1.

  • The key ideas - II

    Moreover, algorithm V minimizes not only Σ_i w_i l_i, as Huffman and FGK do, but also max_i l_i, i.e. the height of the tree, and Σ_i l_i, i.e. it is better suited to code the next symbol, given that it could be represented by any of the leaves of the tree.

    These two objectives are reached through a new numbering scheme, called implicit numbering.


  • An invariant

    The key to minimizing the other kind of interchanges is to maintain the following invariant:

    for each weight w, all leaves of weight w precede (in the implicit numbering) all internal nodes of weight w

    The interchanges in algorithm V are designed to restore the implicit numbering when a new symbol is read, and to preserve the invariant.

  • Algorithm V - II

    If T is the total number of bits transmitted by algorithm V for a message of length t containing n distinct symbols, then

    S − n + 1 ≤ T ≤ S + t − 2n + 1

    At worst, then, Vitter's adaptive method may transmit one more bit per codeword than the static Huffman method.

    Empirically, algorithm V slightly outperforms algorithm FGK.