of 14/14
Huffman Coding How can we represent a string of characters with a minimum number of bits ? Answered by David Huffman in 1952 when he was a graduate student at MIT . Each character represented by a bit string code . Codes are of various lengths . Codes are prefix codes ( prefix - free ) . No code is prefix of another ÷¥÷i¥÷xa¥÷ Y_bz Prefix cod
• date post

30-Mar-2020
• Category

## Documents

• view

3

0

Embed Size (px)

### Transcript of Huffman Coding - Colorado State Universityanderson/cs320/slides/Huffman.pdf · Huffman Coding How...

• Huffman CodingHow can we represent a string ofcharacters with a minimum numberof bits ?

Answered by David Huffman in 1952when he was a graduate student at MIT .

• Eachcharacter represented by a bitstring code .• Codes are of various lengths .

• Codes are prefix codes ( prefix - free ) .No code is

prefixof another

÷¥÷i¥÷xa¥÷Y_bz Prefix cod

• How would you represent a string ofCharacters 6

,C

,T

,and A ?

Assign shortest codes to most frequentcharacters

.Let's

say frequencies areG C T A

-- - -

O . 4 0.3 0.2 0.1

One prefix code is

G C T A-

- --

O to 110 I I I

The stringC G TT A CC AT AA

is10 O 110 110 Ill to LO Ill 110 Ill Ill

Canyou decode this ? This binary tree

Ohelps : ToY

④ to%

to to

• How many bits are needed for tooo GST , As ?

G C T A-

- - -

0.4 0.3 0.2 0.1

CodeG : 0.4 .1000 = 400 o

C

:0.3 - 1000 = 300 10

T:

0.2 . 1000 = 200 110

A : Oil . 1000 = COO Ill

Number of bits = 400.1 t 300.2 t 200.3+1003

= 4001-600 t 600 t 300

= 1,900 bits

IWhat if we used a fixed-length code ?Not

Four characters only needs 2 bit each muchG C T A less

- - - -

00 01 LO 11

4000 characters = 2. ooo bits)

• what if frequencies wereG C T A

- - - -

O . 8 O . I 0.05 0.05

O LO I I O C I I

1,000 characters = 800 . I t 100.2+50-3+50.3= 800 t 200 t 150 t 150

= 1,300

fixed length still 2,000

Savings increases with number of characters .

• How to construct a Huffman code ?or

How to construct that decoding binary tree ?Start with a leaf for each character with

its frequency .

G C T A-

-- -

O . 4 0.3 0.2 O . I

Merge two nodes with lowest frequency .Sum their frequencies .

G C

074Is0.3

I IT ATo

Repeat G-0.4 0.6

I \c

OJ 0.3

I IT A

- -

0.2 O . I

• 1.0

Repeat again G I I074 0.6

I \C-

0.30.3

I IT A

- -

and assign O to deff0.2 0.1

subtrees and I to right ones .

1.0

4

YG

074 0.6O

OfYC- 0.3O . 3

10

01IT A

- -

0.2 O . I

110Ill

• Pseudo code from page 431.

Assume Q is a min - priority queueof objects containing a frequency and , if aleaf

,the character

.

In the queue,

the order

is determined by the frequency .

Let C . be the initial objects containingeach character and its frequency .

Each object has attributes . freer. lefta right .

n = ICI

Q = C

for i = I to n - I

2 = new object2. left = x = pop C Q )

2. right = y = pop CQ )Z

. frey = X. frog t y . freqpushCQ , z )

return pop CQ ) # Top node

• To encode a string of characters ,a dictionary of Huffman codes ,indexed ( keyed ) by characters , wouldbe useful

.

How can you construct this dictionaryfrom the tree ?

1.0

4

YG

074 0.6O

OfYC- 0.3O . 3

10

01IT A

- -

0.2 O . I

110Ill

• Now,

how would you use this

dictionary to encode a text

String ?,

gcc AT Gj ⇒ 010104

!" 0010

And how would you decode this ?

0101.04! 110010 ⇒

'GCC AT 6C '

1.0

4

YG

074 0.60

OfYcIs o . 310

01I

'

T A- -

0.2 O . I

110Ill

• Why does the Huffman algorithm guaranteean optimal ( minimum size ) prefix - free code ?

or

Does the greedy choice of merging the two

lowest frequency characters always work ?

Lemma 16.2-

T T is an optimal tree .

×a and b are at maximum

depth .

x andy ane charactersY with lowest frequency .

I \Swap x and a , anda b

yand b

.

T"

a

Cost of codes for T"

Same as cost for T ,

is I s Iimisaiatsoeea:I \x↳y

• Details : Assume a. freq E b .freqx. freq E y . freq

BCT ) is the expected cost of encodings using T .

BCT ) = Eec c. freq . of Cc )

set of 9 ( depth in Tof character c ,all characters or number of bits in c code

.

Let T'

be the tree after swapping just x and a .

BCT ) - BCT' )

= Ifc c. freq . of Cc ) -€ cc-freq 'd

,! )

= X. freq . of Cx ) ta . freq . of Ca ) - x. freq . df.cc ) - a. freq . Ipcc)= X. freq . dy Cx ) ta . freq . of.ca ) - X. freq - afca ) - a. freq . dyed= @freq - a. freq) dtcx ) t Ca .frey - X. frey) d , Ca)

= ( a. freq - x. frey ) C at -

a'eCx ) )

e-ZO nonnegative because nonnegativex is minimum frequency because a isleaf maximum depthleaf int

• swap Xand a

f Then swap

So BCT ) - BLT' ) Zo

.

Yana bfWe can show in the same way that BCT

' ) - BCT " )zo .

So BCT ) - BCT" ) ZO or BCT ) z BCT " ) .

But, since we assumed Tis optimal , BCT ) s BCT " I .

So,

BCT ) = BCT " ) !

We just showed that a . .

• . .

T ",

formed by merging lowest frequencycharacters

, is an optimal tree !

Of all possible mergers at each step , the

Huffman algorithm chooses the one thatincurs the least cost .

Now,

let's prove that the problem of constructing

optimal prefix codes has the optimal substructure

property .

• Lemma 16.3.

as before,

let x and ybe the

characters with the two lowest frequencies .

We must show that if T'

is optimal for C - Ex , y 3

then the step of

replacing z with÷÷. . . .

.

. .ae .resits in

1 \

?x y an optimal

tree for C.

BCT ) = ¥ qgjfneg-d.cc ) t Cx . freq ty.fregdcd.cat I )= BCT ' ) t x .freq t y . freq

o -

BC T' ) = BCT ) - x. for - y . far

• T .,÷.g!* eraserB ( T ' ) = BCT ) - x. for - y . far

Let's assume that what we want to be

true,

thatT is optimal for C

i

,is not

true,

and see itthis results in a contradiction .If T is not optimal for C , then for tree T "that is optimal for C , BC T

" ) < BCT ).

Call T' "

the tree that we get by merging x and y in

T"

.BLT ' " ) = BC T

" ) - x. freq - y .freq< BCT ) - x. fer - y . freer

= BCT ' )

So B CT' ' ' ) L B C T

' )

Contradicting our assumption thatT and T

'are optimal !