HUFFMAN CODING

Universiti Putra Malaysia
csnotes.upm.edu.my/.../$FILE/Huffman_Coding.pdf

Overview

In this chapter, we describe a very popular coding algorithm called the Huffman coding algorithm. We:

Present a procedure for building Huffman codes when the probability model for the source is known

Present a procedure for building codes when the source statistics are unknown

Describe a new technique for code design that is in some sense similar to the Huffman coding approach

Discuss some applications

Huffman Coding Algorithm

Minimum Variance Huffman Codes

Huffman Coding (using binary tree)

Algorithm in 5 steps:

1. Find the gray-level probabilities for the image by finding the histogram

2. Order the input probabilities (histogram magnitudes) from smallest to largest

3. Combine the smallest two by addition

4. Go back to step 2 until only two probabilities are left

5. Working backward along the tree, generate the code by alternately assigning 0 and 1

Coding Procedures for an N-symbol source

Source reduction:

List all probabilities in descending order

Merge the two symbols with the smallest probabilities into a new compound symbol

Repeat the above two steps N-2 times

Codeword assignment:

Start from the smallest reduced source and work back to the original source

Each merging point corresponds to a node in the binary codeword tree
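As a concrete illustration of the two phases above (and of the 5-step binary-tree procedure), here is a minimal Python sketch. It is not from the slides: the function name, the dict interface, and the tie-breaking counter are our own choices. Source reduction is done with a priority queue; codeword assignment walks back from the final compound symbol.

    import heapq

    def huffman_code(weights):
        # weights: {symbol: probability or count}
        # Heap entries are (weight, tie_breaker, tree); a tree is either
        # a symbol (leaf) or a (left, right) pair (a merging point).
        heap = [(w, i, sym) for i, (sym, w) in enumerate(weights.items())]
        heapq.heapify(heap)
        next_id = len(heap)
        while len(heap) > 1:                 # source reduction
            w1, _, t1 = heapq.heappop(heap)  # smallest weight
            w2, _, t2 = heapq.heappop(heap)  # second smallest
            heapq.heappush(heap, (w1 + w2, next_id, (t1, t2)))
            next_id += 1
        codes = {}
        def walk(tree, prefix):              # codeword assignment
            if isinstance(tree, tuple):      # merging point: branch into 0 / 1
                walk(tree[0], prefix + "0")
                walk(tree[1], prefix + "1")
            else:                            # original symbol: record its code
                codes[tree] = prefix or "0"  # lone-symbol edge case
        walk(heap[0][2], "")
        return codes

    # Gray-level probabilities from Example 1 below (names g0..g3 are ours):
    print(huffman_code({"g0": 0.2, "g1": 0.3, "g2": 0.1, "g3": 0.4}))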

Huffman Coding (using binary tree)

Example 1

We have an image with 2 bits/pixel, giving 4 possible gray levels. The image is 10 rows by 10 columns. In step 1 we find the histogram for the image.

Example 1

Step 1: Histogram. The counts are converted into probabilities by normalizing to the total number of pixels (100):

Gray level 0 has 20 pixels -> p(g0) = 0.2
Gray level 1 has 30 pixels -> p(g1) = 0.3
Gray level 2 has 10 pixels -> p(g2) = 0.1
Gray level 3 has 40 pixels -> p(g3) = 0.4

Example 1

Step 2: The probabilities are ordered from smallest to largest.

Example 1

Step 3: Combine the smallest two by addition.

Example 1

Step 4: Repeat steps 2 and 3, reordering (if necessary) and adding the two smallest probabilities, until only two values remain.

Example 1

Step 5: The actual code assignment is made. Start on the right-hand side of the tree and assign 0's and 1's: 0 is assigned to the 0.6 branch and 1 to the 0.4 branch.

Example 1

The assigned 0 and 1 are brought back along the tree, and wherever a branch occurs the code is put on both branches.

Example 1

Assign 0 and 1 to the branches labeled 0.3, appending to the existing code.

Example 1

Finally, the codes are brought back one more level, and where the branch splits another assignment of 0 and 1 occurs (at the 0.1 and 0.2 branches).

Example 1

Now we have the Huffman code for this image: two gray levels are represented with 3 bits each, one gray level with 2 bits, and one gray level with 1 bit.

The gray level represented by 1 bit, g3, is the most likely to occur (40% of the time) and thus carries the least information in the information-theoretic sense.
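The codeword lengths are enough to check the payoff. A quick sketch (probabilities from the histogram, lengths from the tree; the exact bit patterns depend on how 0 and 1 are assigned at each branch, but the lengths do not):

    p      = {"g0": 0.2, "g1": 0.3, "g2": 0.1, "g3": 0.4}  # from the histogram
    length = {"g0": 3,   "g1": 2,   "g2": 3,   "g3": 1}    # bits per gray level

    avg = sum(p[g] * length[g] for g in p)
    print(avg)  # 1.9 bits/pixel, versus 2 bits/pixel for the fixed-length code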

Exercise

Using Example 1, find a Huffman code using the minimum variance procedure.


Example 2

Step 1: Source reduction

symbol x    p(x)     reduction 1    reduction 2
S           0.5      0.5            0.5
N           0.25     0.25           0.5 (NEW)
E           0.125    0.25 (EW)
W           0.125

compound symbols: (EW) = E + W, (NEW) = N + (EW)

Example 2

Step 2: Codeword assignment

symbol x    p(x)     codeword
S           0.5      0
N           0.25     10
E           0.125    110
W           0.125    111

Working back from the last reduction: the root splits into S (0) and (NEW) (1); (NEW) splits into N (10) and (EW) (11); (EW) splits into E (110) and W (111).

Example 2

The codeword assignment is not unique. In fact, at each merging point (node), we can arbitrarily assign "0" and "1" to the two branches (the average code length is the same). Flipping every label in the assignment above gives an equally valid code:

symbol x    codeword    flipped codeword
S           0           1
N           10          01
E           110         001
W           111         000
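A quick check that flipping the labels changes nothing that matters (probabilities from the table above). Note that this source is dyadic, every probability a power of 1/2, so the average code length even equals the entropy exactly:

    from math import log2

    p      = {"S": 0.5, "N": 0.25, "E": 0.125, "W": 0.125}
    code_a = {"S": "0", "N": "10", "E": "110", "W": "111"}
    code_b = {"S": "1", "N": "01", "E": "001", "W": "000"}  # every label flipped

    avg = lambda code: sum(p[s] * len(code[s]) for s in p)
    print(avg(code_a), avg(code_b))               # 1.75 1.75 - identical
    print(-sum(q * log2(q) for q in p.values()))  # 1.75 - the entropy H(X)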

Example 2

Step 1: Source reduction

symbol x    p(x)    reduction 1    reduction 2    reduction 3
e           0.4     0.4            0.4            0.6 (aiou)
a           0.2     0.2            0.4 (iou)      0.4
i           0.2     0.2            0.2
o           0.1     0.2 (ou)
u           0.1

compound symbols: (ou) = o + u, (iou) = i + (ou), (aiou) = a + (iou)

Example 2

Step 2: Codeword assignment

symbol x    p(x)    codeword
e           0.4     1
a           0.2     01
i           0.2     000
o           0.1     0010
u           0.1     0011

Example 2

Binary codeword tree representation:

r
0/ \1
(aiou) e
0/ \1
(iou) a
0/ \1
i (ou)
0/ \1
o u
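The tree also shows why decoding is unambiguous: no codeword is a prefix of another, so a decoder can consume bits until a full codeword matches. A minimal sketch (the test input is our own):

    code = {"e": "1", "a": "01", "i": "000", "o": "0010", "u": "0011"}
    inv = {v: k for k, v in code.items()}

    def decode(bits):
        out, buf = [], ""
        for b in bits:
            buf += b
            if buf in inv:          # a full codeword has been read
                out.append(inv[buf])
                buf = ""
        return "".join(out)

    bits = "".join(code[s] for s in "aeiou")
    print(bits, "->", decode(bits))   # 01100000100011 -> aeiou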

Example 2

symbol x    p(x)    codeword    length
e           0.4     1           1
a           0.2     01          2
i           0.2     000         3
o           0.1     0010        4
u           0.1     0011        4

Entropy: H(X) = -Σ_{i=1..5} p_i log2 p_i = 2.122 bps

Average code length: l = Σ_{i=1..5} l_i p_i = 1(0.4) + 2(0.2) + 3(0.2) + 4(0.1) + 4(0.1) = 2.2 bps

Redundancy: r = l - H(X) = 0.078 bps

If we use fixed-length codes, we have to spend three bits per sample, which gives a code redundancy of 3 - 2.122 = 0.878 bps.
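These three figures are easy to verify (probabilities and lengths from the table above):

    from math import log2

    p      = {"e": 0.4, "a": 0.2, "i": 0.2, "o": 0.1, "u": 0.1}
    length = {"e": 1,   "a": 2,   "i": 3,   "o": 4,   "u": 4}

    H = -sum(q * log2(q) for q in p.values())      # entropy
    L = sum(p[s] * length[s] for s in p)           # average code length
    print(round(H, 3), round(L, 2), round(L - H, 3))  # 2.122 2.2 0.078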

Example 3

Step 1: Source reduction (compound symbol)

Step 2: Codeword assignment (compound symbol)

Adaptive Huffman Coding

Update Procedure
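The update procedure is traced stage by stage in the next section. As a rough illustration of the adaptive idea itself, here is a naive sketch that simply rebuilds a static Huffman code from the counts of the symbols seen so far (every symbol of a known alphabet starts at count 1; huffman_code() is the sketch from earlier). A real adaptive coder, like the dynamic procedure traced below, restructures the tree incrementally instead of rebuilding it, but the key property is the same: the decoder can maintain the identical model from already-decoded symbols, so no tree needs to be transmitted.

    from collections import Counter

    def adaptive_encode(text, alphabet):
        counts = Counter({a: 1 for a in alphabet})   # shared starting model
        bits = []
        for ch in text:
            code = huffman_code(dict(counts))        # model before this symbol
            bits.append(code[ch])
            counts[ch] += 1                          # the decoder mirrors this update
        return "".join(bits)

    print(adaptive_encode("TENNESSEE", "ETNS"))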

Dynamic Huffman Coding

T

Stage 1 (First occurrence of t)

r

/ \

0 t(1)

Order: 0,t(1)

* r represents the root

* 0 represents the null node

* t(1) denotes the occurrence of T with a frequency of 1


TE

Stage 2 (First occurrence of e)

r

/ \

1 t(1)

/ \

0 e(1)

Order: 0,e(1),1,t(1)

TEN

Stage 3 (First occurrence of n)

r

/ \

2 t(1)

/ \

1 e(1)

/ \

0 n(1)

Order: 0,n(1),1,e(1),2,t(1) : Misfit


Reorder: TEN

r

/ \

t(1) 2

/ \

1 e(1)

/ \

0 n(1)

Order: 0,n(1),1,e(1),t(1),2

TENN

Stage 4 (Repetition of n)

r

/ \

t(1) 3

/ \

2 e(1)

/ \

0 n(2)

Order: 0,n(2),2,e(1),t(1),3 : Misfit


Reorder: TENN

r

/ \

n(2) 2

/ \

1 e(1)

/ \

0 t(1)

Order: 0,t(1),1,e(1),n(2),2

t(1),n(2) are swapped

TENNE

Stage 5 (Repetition of e)

r

/ \

n(2) 3

/ \

1 e(2)

/ \

0 t(1)

Order: 0,t(1),1,e(2),n(2),3


TENNES

Stage 6 (First occurrence of s)

r

/ \

n(2) 4

/ \

2 e(2)

/ \

1 t(1)

/ \

0 s(1)

Order: 0,s(1),1,t(1),2,e(2),n(2),4


TENNESS

Stage 7 (Repetition of s)

r

/ \

n(2) 5

/ \

3 e(2)

/ \

2 t(1)

/ \

0 s(2)

Order: 0,s(2),2,t(1),3,e(2),n(2),5 : Misfit


Reorder: TENNESS

r

/ \

n(2) 5

/ \

3 e(2)

/ \

1 s (2)

/ \

0 t(1)

Order: 0,t(1),1,s(2),3,e(2),n(2),5

s(2) and t(1) are swapped


TENNESSE

Stage 8 (Second repetition of e)

r

/ \

n(2) 6

/ \

3 e(3)

/ \

1 s(2)

/ \

0 t(1)

Order: 0,t(1),1,s(2),3,e(3),n(2),6 : Misfit


Reorder: TENNESSE

r

/ \

e(3) 5

/ \

3 n(2)

/ \

1 s(2)

/ \

0 t(1)

Order: 0,t(1),1,s(2),3,n(2),e(3),5

n(2) and e(3) are swapped


TENNESSEE

Stage 9 (Third repetition of e)

r

0/ \1

e(4) 5

0/ \1

3 n(2)

0/ \1

1 s(2)

0/ \1

0 t(1)

Order: 0,t(1),1,s(2),3,n(2),e(4),5


ENCODING

The letters can be encoded as follows:

e : 0

n : 11

s : 101

t : 1001


Average Code Length

Average code length = Σ_i (length_i × frequency_i) / Σ_i frequency_i
                    = [1(4) + 2(2) + 3(2) + 4(1)] / (4 + 2 + 2 + 1)
                    = 18 / 9 = 2 bits/symbol

ENTROPY

Entropy = -Σ_{i=1..n} p_i log2 p_i

With p(e) = 4/9 ≈ 0.44, p(n) = p(s) = 2/9 ≈ 0.22, p(t) = 1/9 ≈ 0.11:

Entropy = -(0.44 log2 0.44 + 0.22 log2 0.22 + 0.22 log2 0.22 + 0.11 log2 0.11)
        ≈ 1.8367 bits/symbol
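Both figures can be checked from the final tree's codewords and the letter counts of TENNESSEE:

    from math import log2
    from collections import Counter

    freq = Counter("TENNESSEE")                       # E:4, N:2, S:2, T:1
    code = {"E": "0", "N": "11", "S": "101", "T": "1001"}

    n = sum(freq.values())                            # 9 letters
    avg = sum(len(code[s]) * freq[s] for s in freq) / n
    H = -sum((f / n) * log2(f / n) for f in freq.values())
    print(avg, round(H, 3))                           # 2.0 1.837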


Ordinary Huffman Coding

TENNESSEE

9
0/ \1
5 e(4)
0/ \1
s(2) 3
0/ \1
t(1) n(2)

ENCODING

E : 1
S : 00
T : 010
N : 011

Average code length = (1×4 + 2×2 + 3×2 + 3×1) / 9 = 17/9 ≈ 1.89
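Running the earlier huffman_code() sketch on the raw letter counts reproduces this result; the exact bit patterns depend on tie-breaking among equal counts, but the codeword lengths, and hence the average, match:

    from collections import Counter

    freq = Counter("TENNESSEE")              # E:4, N:2, S:2, T:1
    code = huffman_code(dict(freq))          # sketch from earlier
    n = sum(freq.values())
    avg = sum(len(code[s]) * freq[s] for s in freq) / n
    print(code, round(avg, 2))               # lengths 1, 2, 3, 3 -> 17/9 = 1.89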


SUMMARY

The average code length of ordinary Huffman coding seems better than that of the dynamic version in this exercise. In practice, however, dynamic coding performs better. The problem with static coding is that the tree has to be constructed at the transmitter and sent to the receiver, and the tree may change because the frequency distribution of English letters differs between plain text, technical papers, pieces of code, etc.

Since the tree in dynamic coding is constructed at the receiver as well, it need not be sent. Considering this, dynamic coding is better. Also, the average code length improves as the transmitted text gets longer.

Summary of Huffman Coding Algorithm

Achieves minimal redundancy subject to the constraint that the source symbols are coded one at a time

Sorting symbols in descending order of probability is the key step in source reduction

The codeword assignment is not unique: exchanging the labels "0" and "1" at any node of the binary codeword tree produces another code that works equally well

Only works for a source with a finite number of symbols (otherwise, it does not know where to start)