Huffman Coding


Overview

In this chapter, we describe a very popular coding algorithm called the Huffman coding algorithm:

• Present a procedure for building Huffman codes when the probability model for the source is known
• Present a procedure for building codes when the source statistics are unknown
• Describe a new technique for code design that is in some sense similar to the Huffman coding approach

Huffman Coding Algorithm

(Figure-only slides in the original deck; the construction is worked through in the examples below.)

Minimum Variance Huffman Codes

(Figure-only slides in the original deck; see the exercise after Example 1.)

Huffman Coding (using binary tree)

Algorithm in 5 steps:
1. Find the gray-level probabilities for the image by computing its histogram
2. Order the input probabilities (histogram magnitudes) from smallest to largest
3. Combine the smallest two by addition
4. GOTO step 2, until only two probabilities are left
5. Working backward along the tree, generate the code by alternating assignment of 0 and 1
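A minimal Python sketch of these five steps (the function and variable names are ours, not from the slides; step 5's backward pass is realized by prepending a bit to every symbol under a branch at each merge):

```python
from collections import Counter

def huffman_from_image(pixels):
    """Build a Huffman code table for a sequence of pixel values."""
    # Step 1: gray-level probabilities from the histogram
    hist = Counter(pixels)
    total = len(pixels)
    # Each working node: (probability, list of gray levels under it)
    nodes = [(n / total, [g]) for g, n in hist.items()]
    codes = {g: "" for g in hist}

    while len(nodes) > 1:
        nodes.sort(key=lambda node: node[0])    # Step 2: order smallest first
        (p0, grp0), (p1, grp1) = nodes[0], nodes[1]
        # Step 5, done incrementally: every merge is a branch point,
        # so prepend 0 to one side and 1 to the other (the choice is arbitrary)
        for g in grp0:
            codes[g] = "0" + codes[g]
        for g in grp1:
            codes[g] = "1" + codes[g]
        # Steps 3-4: combine the smallest two and repeat
        nodes = nodes[2:] + [(p0 + p1, grp0 + grp1)]
    return codes

# The 10x10 image of Example 1 below: gray levels 0..3 with counts 20, 30, 10, 40
pixels = [0] * 20 + [1] * 30 + [2] * 10 + [3] * 40
print(huffman_from_image(pixels))  # g3 -> 1 bit, g1 -> 2 bits, g0/g2 -> 3 bits
```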

Coding Procedure for an N-symbol Source

Source reduction:
• List all probabilities in descending order
• Merge the two symbols with the smallest probabilities into a new compound symbol
• Repeat the above two steps N-2 times

Codeword assignment:
• Start from the smallest reduced source and work back to the original source
• Each merging point corresponds to a node in the binary codeword tree
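The same reduction is usually implemented with a priority queue. A compact sketch of this standard formulation for an arbitrary N-symbol source (our code, not from the slides):

```python
import heapq
import itertools

def huffman_code(prob):
    """prob: dict mapping symbol -> probability. Returns symbol -> codeword."""
    tiebreak = itertools.count()  # keeps the heap from comparing dicts on ties
    # Heap entries: (probability, tiebreak, {symbol: partial codeword})
    heap = [(p, next(tiebreak), {s: ""}) for s, p in prob.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two smallest probabilities into a compound symbol,
        # extending every codeword under each branch by one bit
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

print(huffman_code({"S": 0.5, "N": 0.25, "E": 0.125, "W": 0.125}))
# One valid result: {'S': '0', 'N': '10', 'E': '110', 'W': '111'} (up to 0/1 flips)
```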


Example 1

We have an image with 2 bits/pixel, giving 4 possible gray levels. The image is 10 rows by 10 columns. In step 1 we find the histogram for the image.

The counts are converted into probabilities by normalizing to the total number of pixels (100):

• Gray level 0 has 20 pixels: p = 0.2
• Gray level 1 has 30 pixels: p = 0.3
• Gray level 2 has 10 pixels: p = 0.1
• Gray level 3 has 40 pixels: p = 0.4

(Figure a, Step 1: Histogram)

In step 2, the probabilities are ordered from smallest to largest: 0.1, 0.2, 0.3, 0.4.

In step 3, the smallest two are combined by addition: 0.1 + 0.2 = 0.3.

Step 4 repeats steps 2 and 3, where we reorder (if necessary) and add the two smallest probabilities, until only two values remain: 0.3 + 0.3 = 0.6, leaving 0.6 and 0.4.

(Figure d, Step 4: Reorder and add until only two values remain.)

In step 5, the actual code assignment is made. Start on the right-hand side of the tree and assign 0s and 1s: 0 is assigned to the 0.6 branch and 1 to the 0.4 branch.

The assigned 0 and 1 are brought back along the tree, and wherever a branch occurs the code is put on both branches.

Next, assign 0 and 1 to the branches labeled 0.3, appending to the existing code.

Finally, the codes are brought back one more level, and where the branch splits another 0/1 assignment occurs (at the 0.1 and 0.2 branches).

Now we have the Huffman code for this image: two gray levels are represented with 3 bits, one with 2 bits, and one with 1 bit.

The gray level represented by 1 bit, g3, is the most likely to occur (40% of the time) and thus carries the least information in the information-theoretic sense.
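The slide does not compute it, but the resulting average code length follows directly: 0.4(1) + 0.3(2) + 0.2(3) + 0.1(3) = 1.9 bits/pixel, a saving over the 2 bits/pixel of the original fixed-length representation.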

Exercise

Using Example 1, find a Huffman code using the minimum variance procedure. (In the minimum variance procedure, a merged probability is placed as high as possible in the reordered list, which keeps the codeword lengths as uniform as possible.)


Example 2

Step 1: Source reduction. List the probabilities in descending order and, at each step, merge the two smallest into a compound symbol:

symbol x   p(x)     reduction 1   reduction 2
S          0.5      0.5           0.5
N          0.25     0.25          0.5
E          0.125    0.25
W          0.125

Compound symbols: (EW) = E + W with probability 0.25; (NEW) = N + (EW) with probability 0.5.

Step 2: Codeword assignment. Start from the final reduction and work back, assigning 0 and 1 at each merging point:

symbol x   p(x)     codeword
S          0.5      0
N          0.25     10
E          0.125    110
W          0.125    111

In the binary tree: the root assigns 0 to S and 1 to (NEW); (NEW) assigns 0 to N and 1 to (EW); (EW) assigns 0 to E and 1 to W.

The codeword assignment is not unique. In fact, at each merging point (node) we can arbitrarily assign 0 and 1 to the two branches; the average code length is the same. For instance, flipping the label at every node gives the equally valid code S = 1, N = 01, E = 001, W = 000.
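A check the slides leave implicit: because every probability here is a negative power of two, this code is perfectly efficient. The average length 0.5(1) + 0.25(2) + 0.125(3) + 0.125(3) = 1.75 bits exactly equals the source entropy H(X) = -Σ p(x) log2 p(x) = 1.75 bits, so the redundancy is zero.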

Example 2

The same procedure applied to a second source:

Step 1: Source reduction

symbol x   p(x)    reduction 1   reduction 2   reduction 3
e          0.4     0.4           0.4           0.6
a          0.2     0.2           0.4           0.4
i          0.2     0.2           0.2
o          0.1     0.2
u          0.1

Compound symbols: (ou) = o + u with probability 0.2; (iou) = i + (ou) with probability 0.4; (aiou) = a + (iou) with probability 0.6.

Step 2: Codeword assignment

symbol x   p(x)    codeword
e          0.4     1
a          0.2     01
i          0.2     000
o          0.1     0010
u          0.1     0011

Binary codeword tree representation:

             r
           0/ \1
      (aiou)   e
       0/ \1
   (iou)    a
    0/ \1
    i   (ou)
       0/ \1
       o    u

The resulting code lengths and performance:

symbol x   p(x)    codeword   length
e          0.4     1          1
a          0.2     01         2
i          0.2     000        3
o          0.1     0010       4
u          0.1     0011       4

Entropy: H(X) = -Σ p_i log2 p_i = 2.122 bps

Average code length: l = Σ p_i l_i = 0.4(1) + 0.2(2) + 0.2(3) + 0.1(4) + 0.1(4) = 2.2 bps

Redundancy: r = l - H(X) = 0.078 bps

If we use fixed-length codes, we have to spend three bits per sample, which gives a code redundancy of 3 - 2.122 = 0.878 bps.
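A quick numerical check of these three figures (a sketch; probabilities and lengths are taken from the table above):

```python
import math

p = {"e": 0.4, "a": 0.2, "i": 0.2, "o": 0.1, "u": 0.1}
length = {"e": 1, "a": 2, "i": 3, "o": 4, "u": 4}

H = -sum(px * math.log2(px) for px in p.values())   # entropy
l = sum(p[s] * length[s] for s in p)                # average code length

print(round(H, 3), round(l, 1), round(l - H, 3))    # 2.122 2.2 0.078
```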

Example 3

Step 1 (source reduction) and Step 2 (codeword assignment) for a third source proceed the same way; the worked tables were figure-only slides in the original deck.

Adaptive Huffman Coding

Update Procedure

(These were figure-only slides in the original deck; the update procedure is illustrated stage by stage in the dynamic coding example that follows.)

Dynamic Huffman Coding

We build the code adaptively for the string TENNESSEE, updating the tree after each incoming symbol.

Stage 1: T (first occurrence of t)

      r
     / \
    0   t(1)

Order: 0, t(1)

* r represents the root
* 0 represents the null node
* t(1) denotes the occurrence of t with a frequency of 1

Stage 2: TE (first occurrence of e)

        r
       / \
      1   t(1)
     / \
    0   e(1)

Order: 0, e(1), 1, t(1)

Stage 3: TEN (first occurrence of n)

          r
         / \
        2   t(1)
       / \
      1   e(1)
     / \
    0   n(1)

Order: 0, n(1), 1, e(1), 2, t(1) : Misfit

Reorder: TEN

          r
         / \
     t(1)   2
           / \
          1   e(1)
         / \
        0   n(1)

Order: 0, n(1), 1, e(1), t(1), 2

Stage 4: TENN (repetition of n)

          r
         / \
     t(1)   3
           / \
          2   e(1)
         / \
        0   n(2)

Order: 0, n(2), 2, e(1), t(1), 3 : Misfit

Reorder: TENN

          r
         / \
     n(2)   2
           / \
          1   e(1)
         / \
        0   t(1)

Order: 0, t(1), 1, e(1), n(2), 2

t(1) and n(2) are swapped.

Stage 5: TENNE (repetition of e)

          r
         / \
     n(2)   3
           / \
          1   e(2)
         / \
        0   t(1)

Order: 0, t(1), 1, e(2), n(2), 3

Stage 6: TENNES (first occurrence of s)

          r
         / \
     n(2)   4
           / \
          2   e(2)
         / \
        1   t(1)
       / \
      0   s(1)

Order: 0, s(1), 1, t(1), 2, e(2), n(2), 4

Stage 7: TENNESS (repetition of s)

          r
         / \
     n(2)   5
           / \
          3   e(2)
         / \
        2   t(1)
       / \
      0   s(2)

Order: 0, s(2), 2, t(1), 3, e(2), n(2), 5 : Misfit

Reorder: TENNESS

          r
         / \
     n(2)   5
           / \
          3   e(2)
         / \
        1   s(2)
       / \
      0   t(1)

Order: 0, t(1), 1, s(2), 3, e(2), n(2), 5

s(2) and t(1) are swapped.

Stage 8: TENNESSE (second repetition of e)

          r
         / \
     n(2)   6
           / \
          3   e(3)
         / \
        1   s(2)
       / \
      0   t(1)

Order: 0, t(1), 1, s(2), 3, e(3), n(2), 6 : Misfit

Reorder: TENNESSE

          r
         / \
     e(3)   5
           / \
          3   n(2)
         / \
        1   s(2)
       / \
      0   t(1)

Order: 0, t(1), 1, s(2), 3, n(2), e(3), 5

n(2) and e(3) are swapped.

Stage 9: TENNESSEE (third repetition of e)

          r
        0/ \1
     e(4)   5
          0/ \1
         3   n(2)
       0/ \1
      1   s(2)
    0/ \1
   0   t(1)

Order: 0, t(1), 1, s(2), 3, n(2), e(4), 5

ENCODING

Reading the final tree (0 on the left branch, 1 on the right), the letters are encoded as follows:

e : 0
n : 11
s : 101
t : 1001
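Because the code is prefix-free, a bit stream can be decoded greedily. A minimal sketch using the final code table above (a true adaptive decoder would update its tree after every symbol, exactly mirroring the encoder; this only illustrates the prefix property):

```python
code = {"e": "0", "n": "11", "s": "101", "t": "1001"}
decode = {w: s for s, w in code.items()}  # invert the code table

def decode_bits(bits):
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in decode:          # prefix-free: a match is always a full symbol
            out.append(decode[buf])
            buf = ""
    return "".join(out)

print(decode_bits("1001011"))      # 1001|0|11 -> "ten"
```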

Average Code Length

Average code length = Σ (length × frequency) / Σ frequency
= (1(4) + 2(2) + 3(2) + 4(1)) / (4 + 2 + 2 + 1)
= 18 / 9 = 2 bits/symbol

ENTROPY

Entropy = -Σ p_i log2 p_i
= -(0.44 log2 0.44 + 0.22 log2 0.22 + 0.22 log2 0.22 + 0.11 log2 0.11)
= 1.8367 bits/symbol
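A quick check of both figures (a sketch; the frequencies come from the stages above):

```python
import math

freq = {"e": 4, "n": 2, "s": 2, "t": 1}      # letter counts in TENNESSEE
dyn_len = {"e": 1, "n": 2, "s": 3, "t": 4}   # dynamic code lengths from above
total = sum(freq.values())                   # 9 symbols

avg = sum(dyn_len[s] * f for s, f in freq.items()) / total
H = -sum((f / total) * math.log2(f / total) for f in freq.values())

print(avg)           # 2.0
print(round(H, 4))   # 1.8366, i.e. the slide's 1.8367 up to rounding
```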

Ordinary Huffman Coding

For comparison, a static Huffman code for TENNESSEE:

         9
       0/ \1
      5    e(4)
    0/ \1
  s(2)   3
       0/ \1
   t(1)    n(2)

ENCODING

e : 1
s : 00
t : 010
n : 011

Average code length = (1(4) + 2(2) + 3(1) + 3(2)) / 9 = 17/9 ≈ 1.89 bits/symbol

SUMMARY

The average code length of ordinary Huffman coding seems better than that of the dynamic version in this exercise. In practice, however, dynamic coding performs better. The problem with static coding is that the tree has to be constructed at the transmitter and then sent to the receiver, and the tree may change, because the frequency distribution of English letters varies between plain text, technical papers, pieces of code, and so on.

Since in dynamic coding the tree is constructed at the receiver as well, it need not be sent. Considering this, dynamic coding is better. Also, its average code length improves as the transmitted text gets longer.

Summary of Huffman Coding Algorithm

• Achieves minimal redundancy subject to the constraint that the source symbols are coded one at a time
• Sorting the symbols in descending order of probability is the key step in source reduction
• The codeword assignment is not unique: exchanging the labels 0 and 1 at any node of the binary codeword tree produces another code that works equally well
• Only works for a source with a finite number of symbols (otherwise, it does not know where to start)