Huffman Coding - Colorado State Universityanderson/cs320/slides/Huffman.pdf · Huffman Coding How...

of 14/14
Huffman Coding How can we represent a string of characters with a minimum number of bits ? Answered by David Huffman in 1952 when he was a graduate student at MIT . Each character represented by a bit string code . Codes are of various lengths . Codes are prefix codes ( prefix - free ) . No code is prefix of another ÷¥÷i¥÷xa¥÷ Y_bz Prefix cod
  • date post

    30-Mar-2020
  • Category

    Documents

  • view

    3
  • download

    0

Embed Size (px)

Transcript of Huffman Coding - Colorado State Universityanderson/cs320/slides/Huffman.pdf · Huffman Coding How...

  • Huffman CodingHow can we represent a string ofcharacters with a minimum numberof bits ?

    Answered by David Huffman in 1952when he was a graduate student at MIT .

    • Eachcharacter represented by a bitstring code .• Codes are of various lengths .

    • Codes are prefix codes ( prefix - free ) .No code is

    prefixof another

    ÷¥÷i¥÷xa¥÷Y_bz Prefix cod

  • How would you represent a string ofCharacters 6

    ,C

    ,T

    ,and A ?

    Assign shortest codes to most frequentcharacters

    .Let's

    say frequencies areG C T A

    -- - -

    O . 4 0.3 0.2 0.1

    One prefix code is

    G C T A-

    - --

    O to 110 I I I

    The stringC G TT A CC AT AA

    is10 O 110 110 Ill to LO Ill 110 Ill Ill

    Canyou decode this ? This binary tree

    Ohelps : ToY

    ④ to%

    to to

  • How many bits are needed for tooo GST , As ?

    G C T A-

    - - -

    0.4 0.3 0.2 0.1

    CodeG : 0.4 .1000 = 400 o

    C

    :0.3 - 1000 = 300 10

    T:

    0.2 . 1000 = 200 110

    A : Oil . 1000 = COO Ill

    Number of bits = 400.1 t 300.2 t 200.3+1003

    = 4001-600 t 600 t 300

    = 1,900 bits

    IWhat if we used a fixed-length code ?Not

    Four characters only needs 2 bit each muchG C T A less

    - - - -

    00 01 LO 11

    4000 characters = 2. ooo bits)

  • what if frequencies wereG C T A

    - - - -

    O . 8 O . I 0.05 0.05

    O LO I I O C I I

    1,000 characters = 800 . I t 100.2+50-3+50.3= 800 t 200 t 150 t 150

    = 1,300

    fixed length still 2,000

    Savings increases with number of characters .

  • How to construct a Huffman code ?or

    How to construct that decoding binary tree ?Start with a leaf for each character with

    its frequency .

    G C T A-

    -- -

    O . 4 0.3 0.2 O . I

    Merge two nodes with lowest frequency .Sum their frequencies .

    G C

    074Is0.3

    I IT ATo

    Repeat G-0.4 0.6

    I \c

    OJ 0.3

    I IT A

    - -

    0.2 O . I

  • 1.0

    Repeat again G I I074 0.6

    I \C-

    0.30.3

    I IT A

    - -

    and assign O to deff0.2 0.1

    subtrees and I to right ones .

    1.0

    4

    YG

    074 0.6O

    OfYC- 0.3O . 3

    10

    01IT A

    - -

    0.2 O . I

    110Ill

  • Pseudo code from page 431.

    Assume Q is a min - priority queueof objects containing a frequency and , if aleaf

    ,the character

    .

    In the queue,

    the order

    is determined by the frequency .

    Let C . be the initial objects containingeach character and its frequency .

    Each object has attributes . freer. lefta right .

    n = ICI

    Q = C

    for i = I to n - I

    2 = new object2. left = x = pop C Q )

    2. right = y = pop CQ )Z

    . frey = X. frog t y . freqpushCQ , z )

    return pop CQ ) # Top node

  • To encode a string of characters ,a dictionary of Huffman codes ,indexed ( keyed ) by characters , wouldbe useful

    .

    How can you construct this dictionaryfrom the tree ?

    1.0

    4

    YG

    074 0.6O

    OfYC- 0.3O . 3

    10

    01IT A

    - -

    0.2 O . I

    110Ill

  • Now,

    how would you use this

    dictionary to encode a text

    String ?,

    gcc AT Gj ⇒ 010104

    !" 0010

    And how would you decode this ?

    0101.04! 110010 ⇒

    'GCC AT 6C '

    1.0

    4

    YG

    074 0.60

    OfYcIs o . 310

    01I

    '

    T A- -

    0.2 O . I

    110Ill

  • Why does the Huffman algorithm guaranteean optimal ( minimum size ) prefix - free code ?

    or

    Does the greedy choice of merging the two

    lowest frequency characters always work ?

    Lemma 16.2-

    T T is an optimal tree .

    ×a and b are at maximum

    depth .

    x andy ane charactersY with lowest frequency .

    I \Swap x and a , anda b

    yand b

    .

    T"

    a

    Cost of codes for T"

    Same as cost for T ,

    is I s Iimisaiatsoeea:I \x↳y

  • Details : Assume a. freq E b .freqx. freq E y . freq

    BCT ) is the expected cost of encodings using T .

    BCT ) = Eec c. freq . of Cc )

    set of 9 ( depth in Tof character c ,all characters or number of bits in c code

    .

    Let T'

    be the tree after swapping just x and a .

    BCT ) - BCT' )

    = Ifc c. freq . of Cc ) -€ cc-freq 'd

    ,! )

    = X. freq . of Cx ) ta . freq . of Ca ) - x. freq . df.cc ) - a. freq . Ipcc)= X. freq . dy Cx ) ta . freq . of.ca ) - X. freq - afca ) - a. freq . dyed= @freq - a. freq) dtcx ) t Ca .frey - X. frey) d , Ca)

    = ( a. freq - x. frey ) C at -

    a'eCx ) )

    e-ZO nonnegative because nonnegativex is minimum frequency because a isleaf maximum depthleaf int

  • swap Xand a

    f Then swap

    So BCT ) - BLT' ) Zo

    .

    Yana bfWe can show in the same way that BCT

    ' ) - BCT " )zo .

    So BCT ) - BCT" ) ZO or BCT ) z BCT " ) .

    But, since we assumed Tis optimal , BCT ) s BCT " I .

    So,

    BCT ) = BCT " ) !

    We just showed that a . .

    • . .

    T ",

    formed by merging lowest frequencycharacters

    , is an optimal tree !

    Of all possible mergers at each step , the

    Huffman algorithm chooses the one thatincurs the least cost .

    Now,

    let's prove that the problem of constructing

    optimal prefix codes has the optimal substructure

    property .

  • Lemma 16.3.

    as before,

    let x and ybe the

    characters with the two lowest frequencies .

    We must show that if T'

    is optimal for C - Ex , y 3

    then the step of

    replacing z with÷÷. . . .

    .

    . .ae .resits in

    1 \

    ?x y an optimal

    tree for C.

    BCT ) = ¥ qgjfneg-d.cc ) t Cx . freq ty.fregdcd.cat I )= BCT ' ) t x .freq t y . freq

    o -

    BC T' ) = BCT ) - x. for - y . far

  • T .,÷.g!* eraserB ( T ' ) = BCT ) - x. for - y . far

    Let's assume that what we want to be

    true,

    thatT is optimal for C

    i

    ,is not

    true,

    and see itthis results in a contradiction .If T is not optimal for C , then for tree T "that is optimal for C , BC T

    " ) < BCT ).

    Call T' "

    the tree that we get by merging x and y in

    T"

    .BLT ' " ) = BC T

    " ) - x. freq - y .freq< BCT ) - x. fer - y . freer

    = BCT ' )

    So B CT' ' ' ) L B C T

    ' )

    Contradicting our assumption thatT and T

    'are optimal !