Huffman Coding


Overview

In this chapter, we describe a very popular coding algorithm called the Huffman coding algorithm:

• Present a procedure for building Huffman codes when the probability model for the source is known
• Present a procedure for building codes when the source statistics are unknown
• Describe a new technique for code design that is in some sense similar to the Huffman coding approach

Huffman Coding Algorithm

(Figure-only slides in the original deck; the construction is worked through in the examples below.)

Minimum Variance Huffman Codes

(Figure-only slides in the original deck; see the exercise after Example 1.)

Huffman Coding (using binary tree)

Algorithm in 5 steps:
1. Find the gray-level probabilities for the image by computing its histogram
2. Order the input probabilities (histogram magnitudes) from smallest to largest
3. Combine the smallest two by addition
4. GOTO step 2, until only two probabilities are left
5. Working backward along the tree, generate the code by alternating assignment of 0 and 1
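A minimal Python sketch of these five steps (the function and variable names are ours, not from the slides; step 5's backward pass is realized by prepending a bit to every symbol under a branch at each merge):

```python
from collections import Counter

def huffman_from_image(pixels):
    """Build a Huffman code table for a sequence of pixel values."""
    # Step 1: gray-level probabilities from the histogram
    hist = Counter(pixels)
    total = len(pixels)
    # Each working node: (probability, list of gray levels under it)
    nodes = [(n / total, [g]) for g, n in hist.items()]
    codes = {g: "" for g in hist}

    while len(nodes) > 1:
        nodes.sort(key=lambda node: node[0])    # Step 2: order smallest first
        (p0, grp0), (p1, grp1) = nodes[0], nodes[1]
        # Step 5, done incrementally: every merge is a branch point,
        # so prepend 0 to one side and 1 to the other (the choice is arbitrary)
        for g in grp0:
            codes[g] = "0" + codes[g]
        for g in grp1:
            codes[g] = "1" + codes[g]
        # Steps 3-4: combine the smallest two and repeat
        nodes = nodes[2:] + [(p0 + p1, grp0 + grp1)]
    return codes

# The 10x10 image of Example 1 below: gray levels 0..3 with counts 20, 30, 10, 40
pixels = [0] * 20 + [1] * 30 + [2] * 10 + [3] * 40
print(huffman_from_image(pixels))  # g3 -> 1 bit, g1 -> 2 bits, g0/g2 -> 3 bits
```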

Coding Procedure for an N-symbol Source

Source reduction:
• List all probabilities in descending order
• Merge the two symbols with the smallest probabilities into a new compound symbol
• Repeat the above two steps N-2 times

Codeword assignment:
• Start from the smallest reduced source and work back to the original source
• Each merging point corresponds to a node in the binary codeword tree
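The same reduction is usually implemented with a priority queue. A compact sketch of this standard formulation for an arbitrary N-symbol source (our code, not from the slides):

```python
import heapq
import itertools

def huffman_code(prob):
    """prob: dict mapping symbol -> probability. Returns symbol -> codeword."""
    tiebreak = itertools.count()  # keeps the heap from comparing dicts on ties
    # Heap entries: (probability, tiebreak, {symbol: partial codeword})
    heap = [(p, next(tiebreak), {s: ""}) for s, p in prob.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two smallest probabilities into a compound symbol,
        # extending every codeword under each branch by one bit
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

print(huffman_code({"S": 0.5, "N": 0.25, "E": 0.125, "W": 0.125}))
# One valid result: {'S': '0', 'N': '10', 'E': '110', 'W': '111'} (up to 0/1 flips)
```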


Example 1

We have an image with 2 bits/pixel, giving 4 possible gray levels. The image is 10 rows by 10 columns. In step 1 we find the histogram for the image.

The counts are converted into probabilities by normalizing to the total number of pixels (100):

• Gray level 0 has 20 pixels: p = 0.2
• Gray level 1 has 30 pixels: p = 0.3
• Gray level 2 has 10 pixels: p = 0.1
• Gray level 3 has 40 pixels: p = 0.4

(Figure a, Step 1: Histogram)

In step 2, the probabilities are ordered from smallest to largest: 0.1, 0.2, 0.3, 0.4.

In step 3, the smallest two are combined by addition: 0.1 + 0.2 = 0.3.

Step 4 repeats steps 2 and 3, where we reorder (if necessary) and add the two smallest probabilities, until only two values remain: 0.3 + 0.3 = 0.6, leaving 0.6 and 0.4.

(Figure d, Step 4: Reorder and add until only two values remain.)

In step 5, the actual code assignment is made. Start on the right-hand side of the tree and assign 0s and 1s: 0 is assigned to the 0.6 branch and 1 to the 0.4 branch.

The assigned 0 and 1 are brought back along the tree, and wherever a branch occurs the code is put on both branches.

Next, assign 0 and 1 to the branches labeled 0.3, appending to the existing code.

Finally, the codes are brought back one more level, and where the branch splits another 0/1 assignment occurs (at the 0.1 and 0.2 branches).

Now we have the Huffman code for this image: two gray levels are represented with 3 bits, one with 2 bits, and one with 1 bit.

The gray level represented by 1 bit, g3, is the most likely to occur (40% of the time) and thus carries the least information in the information-theoretic sense.
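The slide does not compute it, but the resulting average code length follows directly: 0.4(1) + 0.3(2) + 0.2(3) + 0.1(3) = 1.9 bits/pixel, a saving over the 2 bits/pixel of the original fixed-length representation.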

Exercise

Using Example 1, find a Huffman code using the minimum variance procedure. (In the minimum variance procedure, a merged probability is placed as high as possible in the reordered list, which keeps the codeword lengths as uniform as possible.)


Example 2

Step 1: Source reduction. List the probabilities in descending order and, at each step, merge the two smallest into a compound symbol:

symbol x   p(x)     reduction 1   reduction 2
S          0.5      0.5           0.5
N          0.25     0.25          0.5
E          0.125    0.25
W          0.125

Compound symbols: (EW) = E + W with probability 0.25; (NEW) = N + (EW) with probability 0.5.

Step 2: Codeword assignment. Start from the final reduction and work back, assigning 0 and 1 at each merging point:

symbol x   p(x)     codeword
S          0.5      0
N          0.25     10
E          0.125    110
W          0.125    111

In the binary tree: the root assigns 0 to S and 1 to (NEW); (NEW) assigns 0 to N and 1 to (EW); (EW) assigns 0 to E and 1 to W.

The codeword assignment is not unique. In fact, at each merging point (node) we can arbitrarily assign 0 and 1 to the two branches; the average code length is the same. For instance, flipping the label at every node gives the equally valid code S = 1, N = 01, E = 001, W = 000.
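A check the slides leave implicit: because every probability here is a negative power of two, this code is perfectly efficient. The average length 0.5(1) + 0.25(2) + 0.125(3) + 0.125(3) = 1.75 bits exactly equals the source entropy H(X) = -Σ p(x) log2 p(x) = 1.75 bits, so the redundancy is zero.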

Example 2

The same procedure applied to a second source:

Step 1: Source reduction

symbol x   p(x)    reduction 1   reduction 2   reduction 3
e          0.4     0.4           0.4           0.6
a          0.2     0.2           0.4           0.4
i          0.2     0.2           0.2
o          0.1     0.2
u          0.1

Compound symbols: (ou) = o + u with probability 0.2; (iou) = i + (ou) with probability 0.4; (aiou) = a + (iou) with probability 0.6.

Step 2: Codeword assignment

symbol x   p(x)    codeword
e          0.4     1
a          0.2     01
i          0.2     000
o          0.1     0010
u          0.1     0011

Binary codeword tree representation:

             r
           0/ \1
      (aiou)   e
       0/ \1
   (iou)    a
    0/ \1
    i   (ou)
       0/ \1
       o    u

The resulting code lengths and performance:

symbol x   p(x)    codeword   length
e          0.4     1          1
a          0.2     01         2
i          0.2     000        3
o          0.1     0010       4
u          0.1     0011       4

Entropy: H(X) = -Σ p_i log2 p_i = 2.122 bps

Average code length: l = Σ p_i l_i = 0.4(1) + 0.2(2) + 0.2(3) + 0.1(4) + 0.1(4) = 2.2 bps

Redundancy: r = l - H(X) = 0.078 bps

If we use fixed-length codes, we have to spend three bits per sample, which gives a code redundancy of 3 - 2.122 = 0.878 bps.
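A quick numerical check of these three figures (a sketch; probabilities and lengths are taken from the table above):

```python
import math

p = {"e": 0.4, "a": 0.2, "i": 0.2, "o": 0.1, "u": 0.1}
length = {"e": 1, "a": 2, "i": 3, "o": 4, "u": 4}

H = -sum(px * math.log2(px) for px in p.values())   # entropy
l = sum(p[s] * length[s] for s in p)                # average code length

print(round(H, 3), round(l, 1), round(l - H, 3))    # 2.122 2.2 0.078
```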

Example 3

Step 1 (source reduction) and Step 2 (codeword assignment) for a third source proceed the same way; the worked tables were figure-only slides in the original deck.

Adaptive Huffman Coding

Update Procedure

(These were figure-only slides in the original deck; the update procedure is illustrated stage by stage in the dynamic coding example that follows.)

Dynamic Huffman Coding

We build the code adaptively for the string TENNESSEE, updating the tree after each incoming symbol.

Stage 1: T (first occurrence of t)

      r
     / \
    0   t(1)

Order: 0, t(1)

* r represents the root
* 0 represents the null node
* t(1) denotes the occurrence of t with a frequency of 1

Stage 2: TE (first occurrence of e)

        r
       / \
      1   t(1)
     / \
    0   e(1)

Order: 0, e(1), 1, t(1)

Stage 3: TEN (first occurrence of n)

          r
         / \
        2   t(1)
       / \
      1   e(1)
     / \
    0   n(1)

Order: 0, n(1), 1, e(1), 2, t(1) : Misfit

Reorder: TEN

          r
         / \
     t(1)   2
           / \
          1   e(1)
         / \
        0   n(1)

Order: 0, n(1), 1, e(1), t(1), 2

Stage 4: TENN (repetition of n)

          r
         / \
     t(1)   3
           / \
          2   e(1)
         / \
        0   n(2)

Order: 0, n(2), 2, e(1), t(1), 3 : Misfit

Reorder: TENN

          r
         / \
     n(2)   2
           / \
          1   e(1)
         / \
        0   t(1)

Order: 0, t(1), 1, e(1), n(2), 2

t(1) and n(2) are swapped.

Stage 5: TENNE (repetition of e)

          r
         / \
     n(2)   3
           / \
          1   e(2)
         / \
        0   t(1)

Order: 0, t(1), 1, e(2), n(2), 3

Stage 6: TENNES (first occurrence of s)

          r
         / \
     n(2)   4
           / \
          2   e(2)
         / \
        1   t(1)
       / \
      0   s(1)

Order: 0, s(1), 1, t(1), 2, e(2), n(2), 4

Stage 7: TENNESS (repetition of s)

          r
         / \
     n(2)   5
           / \
          3   e(2)
         / \
        2   t(1)
       / \
      0   s(2)

Order: 0, s(2), 2, t(1), 3, e(2), n(2), 5 : Misfit

Reorder: TENNESS

          r
         / \
     n(2)   5
           / \
          3   e(2)
         / \
        1   s(2)
       / \
      0   t(1)

Order: 0, t(1), 1, s(2), 3, e(2), n(2), 5

s(2) and t(1) are swapped.

Stage 8: TENNESSE (second repetition of e)

          r
         / \
     n(2)   6
           / \
          3   e(3)
         / \
        1   s(2)
       / \
      0   t(1)

Order: 0, t(1), 1, s(2), 3, e(3), n(2), 6 : Misfit

Reorder: TENNESSE

          r
         / \
     e(3)   5
           / \
          3   n(2)
         / \
        1   s(2)
       / \
      0   t(1)

Order: 0, t(1), 1, s(2), 3, n(2), e(3), 5

n(2) and e(3) are swapped.

Stage 9: TENNESSEE (third repetition of e)

          r
        0/ \1
     e(4)   5
          0/ \1
         3   n(2)
       0/ \1
      1   s(2)
    0/ \1
   0   t(1)

Order: 0, t(1), 1, s(2), 3, n(2), e(4), 5

ENCODING

Reading the final tree (0 on the left branch, 1 on the right), the letters are encoded as follows:

e : 0
n : 11
s : 101
t : 1001
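Because the code is prefix-free, a bit stream can be decoded greedily. A minimal sketch using the final code table above (a true adaptive decoder would update its tree after every symbol, exactly mirroring the encoder; this only illustrates the prefix property):

```python
code = {"e": "0", "n": "11", "s": "101", "t": "1001"}
decode = {w: s for s, w in code.items()}  # invert the code table

def decode_bits(bits):
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in decode:          # prefix-free: a match is always a full symbol
            out.append(decode[buf])
            buf = ""
    return "".join(out)

print(decode_bits("1001011"))      # 1001|0|11 -> "ten"
```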

Average Code Length

Average code length = Σ (length × frequency) / Σ frequency
= (1(4) + 2(2) + 3(2) + 4(1)) / (4 + 2 + 2 + 1)
= 18 / 9 = 2 bits/symbol

ENTROPY

Entropy = -Σ p_i log2 p_i
= -(0.44 log2 0.44 + 0.22 log2 0.22 + 0.22 log2 0.22 + 0.11 log2 0.11)
= 1.8367 bits/symbol
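A quick check of both figures (a sketch; the frequencies come from the stages above):

```python
import math

freq = {"e": 4, "n": 2, "s": 2, "t": 1}      # letter counts in TENNESSEE
dyn_len = {"e": 1, "n": 2, "s": 3, "t": 4}   # dynamic code lengths from above
total = sum(freq.values())                   # 9 symbols

avg = sum(dyn_len[s] * f for s, f in freq.items()) / total
H = -sum((f / total) * math.log2(f / total) for f in freq.values())

print(avg)           # 2.0
print(round(H, 4))   # 1.8366, i.e. the slide's 1.8367 up to rounding
```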

Ordinary Huffman Coding

For comparison, a static Huffman code for TENNESSEE:

         9
       0/ \1
      5    e(4)
    0/ \1
  s(2)   3
       0/ \1
   t(1)    n(2)

ENCODING

e : 1
s : 00
t : 010
n : 011

Average code length = (1(4) + 2(2) + 3(1) + 3(2)) / 9 = 17/9 ≈ 1.89 bits/symbol

SUMMARY

The average code length of ordinary Huffman coding seems better than that of the dynamic version in this exercise. In practice, however, dynamic coding performs better. The problem with static coding is that the tree has to be constructed at the transmitter and then sent to the receiver, and the tree may change, because the frequency distribution of English letters varies between plain text, technical papers, pieces of code, and so on.

Since in dynamic coding the tree is constructed at the receiver as well, it need not be sent. Considering this, dynamic coding is better. Also, its average code length improves as the transmitted text gets longer.

Summary of Huffman Coding Algorithm

• Achieves minimal redundancy subject to the constraint that the source symbols are coded one at a time
• Sorting the symbols in descending order of probability is the key step in source reduction
• The codeword assignment is not unique: exchanging the labels 0 and 1 at any node of the binary codeword tree produces another code that works equally well
• Only works for a source with a finite number of symbols (otherwise, it does not know where to start)