Huffman CodingHow can we represent a string of
characters with a minimum number
of bits ?
Answered by David Huffman in 1952
when he was a graduate student at MIT .
• Eachcharacter represented by a bitstring code
.
• Codes areof various lengths .
• Codes are prefix codes ( prefix - free ).
No code is
prefixof another
÷¥÷i¥÷xa¥÷Y_bz Prefix cod
How would you represent a string of
Characters 6,
C,
T,
and A ?
Assign shortest codes to most frequentcharacters
.Let's
say frequencies are
G C T A-
- - -
O . 4 0.3 0.2 0.1
One prefix code is
G C T A-
- --
O to 110 I I I
The stringC G TT A CC AT AA
is10 O 110 110 Ill to LO Ill 110 Ill Ill
Canyou decode this ? This binary tree
Ohelps :
ToY
④ to%
to to
How many bits are needed for tooo GST , As ?
G C T A-
- - -
0.4 0.3 0.2 0.1
CodeG : 0.4 .1000 = 400 o
C
:0.3 - 1000 = 300 10
T:
0.2 . 1000 = 200 110
A : Oil . 1000 = COO Ill
Number of bits = 400.1 t 300.2 t 200.3+1003
= 4001-600 t 600 t 300
= 1,900 bits
IWhat if we used a fixed-length code ?Not
Four characters only needs 2 bit each much
G C T A less- - - -
00 01 LO 11
4000 characters = 2. ooo bits)
what if frequencies were
G C T A- - - -
O . 8 O. I 0.05 0.05
O LO I I O C I I
1,000 characters = 800 . I t 100.2+50-3+50.3= 800 t 200 t 150 t 150
= 1,300
fixed length still 2,000
Savings increases with number of characters.
How to construct a Huffman code ?or
How to construct that decoding binary tree ?
Start with a leaf for each character withits frequency .
G C T A-
-- -
O . 4 0.3 0.2 O. I
Merge two nodes with lowest frequency .
Sum their frequencies .
G C
074Is0.3
I IT ATo
Repeat G-0.4 0.6
I \c
OJ 0.3
I IT A
- -
0.2 O. I
1.0
Repeat again GI I
074 0.6
I \C-
0.30.3
I IT A
- -
and assign O to deff0.2 0.1
subtrees and I to right ones.
1.0
4
YG
074 0.6
OOfYC-
0.3O . 3
10
01IT A
- -
0.2 O. I
110Ill
Pseudo code from page 431.
Assume Q is a min - priority queue
of objects containing a frequency and,
if a
leaf,
the character.
In the queue,
the order
is determined by the frequency .
Let C . be the initial objects containingeach character and its frequency .
Each object has attributes . freer. lefta right .
n = ICI
Q = C
for i = I to n - I
2 = new object2. left = x = pop C Q )
2. right =
y =
pop CQ )
Z. frey = X. frog t y . freq
pushCQ,
z )return pop CQ ) # Top node
To encode a string of characters,
a dictionary of Huffman codes,
indexed ( keyed ) by characters,
would
be useful.
How can you construct this dictionaryfrom the tree ?
1.0
4
YG
074 0.6
OOfYC-
0.3O . 3
10
01IT A
- -
0.2 O. I
110Ill
Now,
how would you use this
dictionary to encode a text
String ?,
gcc AT Gj ⇒ 010104
!" 0010
And how would you decode this ?
0101.04! 110010 ⇒
'GCC AT 6C
'
1.0
4
YG
074 0.6
0OfYc
Is o . 3
10
01I
'
T A- -
0.2 O. I
110Ill
Why does the Huffman algorithm guaranteean optimal ( minimum size ) prefix - free code ?
or
Does the greedy choice of merging the two
lowest frequency characters always work ?
Lemma 16.2-
T T is an optimal tree.
×a and b are at maximum
depth .
x andy ane characters
Y with lowest frequency .
I \Swap x and a
,and
a b
yand b
.
T"
a
Cost of codes for T"
Same as cost for T ,
is I s Iimisaiatsoeea:I \x↳y
Details : Assume a. freq E b.freq
x. freq E y . freq
BCT ) is the expected cost of encodings using T .
BCT ) = Eec c. freq . of Cc )
set of 9 ( depth in Tof character c,
all characters or number of bits in c code.
Let T'
be the tree after swapping just x and a.
BCT ) - BCT' )
= Ifc c. freq . of Cc )-
€ cc -freq 'd,
! )
= X. freq . of Cx ) ta. freq . of Ca ) - x. freq . df.cc ) - a. freq . Ipcc)
= X. freq . dy Cx ) ta . freq .
of.ca ) - X. freq - afca ) - a. freq . dyed= @freq -
a. freq) dtcx ) t Ca .frey - X. frey) d,
Ca)
= ( a. freq - x. frey ) C at -
a'eCx ) )
e-ZO nonnegative because nonnegativex is minimum frequency because a is
leaf maximum depthleaf int
swap X and a
f Then swap
So BCT ) - BLT' ) Zo
.
Yana bfWe can show in the same way that BCT ' ) - BCT " )zo
.
So BCT ) - BCT " ) ZO or BCT ) z BCT " ).
But, since we assumed Tis optimal ,
BCT ) s BCT " I.
So,
BCT ) = BCT " ) !
We just showed thata . .
• . .
T "
,formed by merging lowest frequency
characters, is an optimal tree !
Of all possible mergers at each step ,the
Huffman algorithm chooses the one that
incurs the least cost.
Now,
let's prove that the problem of constructing
optimal prefix codes has the optimal substructure
property .
Lemma 16.3.
as before,
let x andy
be the
characters with the two lowest frequencies .
We must show that if T'
is optimal for C - Ex, y 3
then the step of
replacing z with÷÷. .
. ..
. .ae.
resits in1 \
?x y an optimal
tree for C.
BCT ) = ¥ qgjfneg-d.cc) t Cx
. freq ty.fregdcd.cat I )
= BCT ' ) t x .freq t y . freqo -
BC T' ) = BCT ) - x. for -
y . far
T .,÷.g!* eraser
B ( T' ) = BCT ) - x. for -
y . farLet's assume that what we want to be
true,
thatT is optimal for C
i
,is not
true,
and see itthis results in a contradiction.
If T is not optimal for C,
then for tree T"
that is optimal for C,
BC T" ) < BCT )
.
Call T' "
the tree that we get by merging x andy
in
T"
.BLT ' " ) = BC T
" ) - x. freq -
y .freq< BCT ) -
x. fer -
y . freer= BCT ' )
So B CT' ' ' ) L B C T
' )
Contradicting our assumption thatT and T
'are optimal !
Top Related