Source Coding and Compression

Dr.-Ing. Khaled Shawky Hassan
Room: C3-222, ext: 1204

Email: khaled.shawky@guc.edu.eg

Lecture 4 (week 2)


● Proposed by Dr. David A. Huffman in 1952

➢ “A Method for the Construction of Minimum Redundancy Codes”

● Applicable to many forms of data transmission

➢ Most common example: text files

Huffman Coding


Huffman Coding Algorithm:

– The procedure to build Huffman codes

– Extended Huffman Codes (new)

Adaptive Huffman Coding (new)

– Update Procedure

– Decoding Procedure

Huffman Coding: What We Will Discuss


Shannon-Fano Coding

● The second code based on Shannon’s theory
✗ It is a suboptimal code (it took a graduate student (Huffman) to fix it!)

● Algorithm:
➢ Start with empty codes
➢ Compute frequency statistics for all symbols
➢ Order the symbols in the set by frequency (most frequent first)
➢ Split the set (almost half-half!) so as to minimize the difference between the total frequencies of the two subsets
➢ Add ‘0’ to the codes in the first set and ‘1’ to the rest
➢ Recursively assign the rest of the code bits for the two subsets, until sets cannot be split.
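The Shannon-Fano steps above can be sketched in Java (the language of the `HuffNode` class used later in this lecture). This is a minimal sketch under stated assumptions, not the lecture's reference solution: the class and method names (`ShannonFano`, `codes`, `totalBits`) are hypothetical, the weights are assumed to be already sorted in decreasing order, and when two split points give the same difference the first one is taken.

```java
import java.util.Arrays;

final class ShannonFano {
    /** Returns one code word per weight; the weights must be sorted in decreasing order. */
    static String[] codes(int[] w) {
        String[] out = new String[w.length];
        Arrays.fill(out, "");                 // start with empty codes
        split(w, 0, w.length, out);
        return out;
    }

    // Append the next bit to every symbol in [lo, hi), then recurse on both subsets.
    private static void split(int[] w, int lo, int hi, String[] out) {
        if (hi - lo < 2) return;              // a single symbol cannot be split further
        int total = 0;
        for (int i = lo; i < hi; i++) total += w[i];
        int left = 0, best = lo + 1, bestDiff = Integer.MAX_VALUE;
        for (int i = lo; i < hi - 1; i++) {   // candidate split right after index i
            left += w[i];
            int diff = Math.abs(total - 2 * left);   // |right total - left total|
            if (diff < bestDiff) { bestDiff = diff; best = i + 1; }
        }
        for (int i = lo; i < hi; i++) out[i] += (i < best) ? "0" : "1";
        split(w, lo, best, out);
        split(w, best, hi, out);
    }

    /** Total encoded length in bits: sum of weight times code word length. */
    static int totalBits(int[] w, String[] c) {
        int bits = 0;
        for (int i = 0; i < w.length; i++) bits += w[i] * c[i].length();
        return bits;
    }
}
```

On the example weights {9, 8, 6, 5, 4, 2} this sketch returns a = 00, b = 01, c = 10, d = 110, e = 1110, f = 1111; another tie-breaking choice gives different code words. For the assumed test weights {15, 7, 6, 6, 5} it needs 89 bits in total, while Huffman's algorithm needs only 87 bits, which illustrates the sub-optimality.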


Shannon-Fano Coding

● Example:
● Assume an alphabet A = {a, b, c, d, e, f} with the occurrence weights {9, 8, 6, 5, 4, 2}, respectively
● Apply Shannon-Fano Coding and discuss the sub-optimality

Shannon-Fano Coding

[Figures: step-by-step Shannon-Fano splitting for the example; the last split shown assigns 0 and 1 to e (4) and f (2)]


Shannon-Fano Coding

● Shannon-Fano does not always produce optimal prefix codes; the ordering is performed only once at the beginning!!

● Huffman coding is almost as computationally simple and produces prefix codes that always achieve the lowest expected code word length, under the constraint that each symbol is represented by its own code word of an integral number of bits

● Prefix-free codes are sometimes (loosely) called Huffman codes, even when they are not built with Huffman's algorithm

● Symbol-by-symbol Huffman coding (the easiest one) reaches the entropy only if the symbols are independent and their probabilities are powers of one half, i.e., (½)^n
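A quick numerical check of the last bullet, as a hedged sketch: the names (`HuffmanVsEntropy`, `expectedLength`, `entropy`) are hypothetical, and the sketch relies on the fact that the expected Huffman code length equals the sum of the probabilities of all merged nodes, because each merge adds one bit to every symbol below it. For dyadic probabilities the result equals the entropy; for non-dyadic ones it is larger.

```java
import java.util.PriorityQueue;

final class HuffmanVsEntropy {
    /** Expected Huffman code length in bits/symbol: sum of the probabilities of all merged nodes. */
    static double expectedLength(double[] p) {
        PriorityQueue<Double> q = new PriorityQueue<>();
        for (double x : p) q.add(x);
        double len = 0.0;
        while (q.size() > 1) {
            double merged = q.poll() + q.poll();   // combine the two least probable nodes
            len += merged;                         // every symbol below this node gains one bit
            q.add(merged);
        }
        return len;
    }

    /** Entropy H = -sum p log2 p, in bits/symbol. */
    static double entropy(double[] p) {
        double h = 0.0;
        for (double x : p) if (x > 0) h -= x * Math.log(x) / Math.log(2);
        return h;
    }
}
```

With {1/2, 1/4, 1/8, 1/8} both values are 1.75 bits/symbol. With the non-dyadic set {0.4, 0.2, 0.2, 0.1, 0.1} (the probabilities of the example summary later in this lecture) the Huffman length is 2.2 bits/symbol against an entropy of about 2.122 bits/symbol.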


● Proposed by Dr. David A. Huffman in 1952

➢ “A Method for the Construction of Minimum Redundancy Codes”

● Applicable to many forms of data transmission

➢ Our example: text files

● In general, Huffman coding is a form of statistical coding: it exploits the fact that not all characters occur with the same frequency!

Huffman Coding


● Why Huffman coding (like all entropy coding):

– Code word lengths are no longer fixed as in ASCII.

– Code word lengths vary and will be shorter for the more frequently used characters, i.e., overall shorter average code length!

Huffman Coding: The Same Idea


1. Scan the text to be compressed and compute the occurrence of all characters.

2. Sort or prioritize characters based on number of occurrences in text (from low-to-high).

3. Build Huffman code tree based on prioritized list.

4. Perform a traversal of tree to determine all code words.

5. Scan the text again and encode the characters into a new coded file using the Huffman codes.

Huffman Coding: The Algorithm


● Consider the following short text:

– Eerie eyes seen near lake.

● Count up the occurrences of all characters in the text

– E e r i e e y e s s e e n n e a r l a k e .

Huffman Coding: Building The Tree

E e r i space y s n a l k .

Example


Eerie eyes seen near lake.

What is the frequency of each character in the text?

Huffman Coding: Building The Tree

Example

Char    Freq.   Char    Freq.   Char    Freq.
E       1       y       1       k       1
e       8       s       2       .       1
r       2       n       2       i       1
a       2       space   4       l       1
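Step 1 (counting the occurrences) can be sketched as below; `CharCounter` and `count` are hypothetical names. The count is case-sensitive, so E and e are separate symbols exactly as in the table, and a `LinkedHashMap` keeps the characters in order of first appearance.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CharCounter {
    /** Counts the occurrences of each character, keeping first-appearance order. */
    static Map<Character, Integer> count(String text) {
        Map<Character, Integer> freq = new LinkedHashMap<>();
        for (char ch : text.toCharArray()) {
            freq.merge(ch, 1, Integer::sum);   // case-sensitive: 'E' and 'e' are different symbols
        }
        return freq;
    }

    public static void main(String[] args) {
        Map<Character, Integer> f = count("Eerie eyes seen near lake.");
        f.forEach((ch, n) -> System.out.println((ch == ' ' ? "space" : String.valueOf(ch)) + " " + n));
    }
}
```

For the example text the counts sum to 26, and there are 12 distinct characters.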


Huffman Coding: Building The Tree

1. Create binary tree nodes with character and frequency of each character

2. Place nodes in a priority queue “??”

The lower the occurrence, the higher the priority in the queue


Huffman Coding: Building The Tree

Uses binary tree nodes (OOP-like View; second bonus assignment: Construct the Huffman tree as follows!!)

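```java
import java.util.PriorityQueue;

public class HuffNode {
    public char myChar;              // the character (unused in internal nodes)
    public int myFrequency;          // its occurrences; for internal nodes, the sum of the children
    public HuffNode myLeft, myRight; // sub-trees (null pointers for leaves)
}

// e.g., a field of the class that builds the tree; lower frequency = higher priority
PriorityQueue<HuffNode> myQueue =
        new PriorityQueue<>((a, b) -> Integer.compare(a.myFrequency, b.myFrequency));
```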


Huffman Coding: Building The Tree

The queue after inserting all nodes (null pointers are not shown):

[Figure: E 1, i 1, y 1, l 1, k 1, . 1, r 2, s 2, n 2, a 2, sp 4, e 8]


Huffman Coding: Building The Tree

While the priority queue contains two or more nodes:
– Create a new node
– Dequeue a node and make it the left sub-tree
– Dequeue the next node and make it the right sub-tree
– The frequency of the new node equals the sum of the frequencies of the left and right children
– Enqueue the new node back into the queue in the right order!!
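The loop above maps directly onto `java.util.PriorityQueue`. This is a minimal sketch, assuming a simplified immutable `Node` class instead of the `HuffNode` fields shown earlier (all names here are hypothetical): `build` counts the characters, enqueues one leaf per character, and merges until one node is left; `codes` performs the traversal used later to read off the code words.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

final class HuffmanTree {
    // Hypothetical node type mirroring the HuffNode fields (character, frequency, left, right).
    static final class Node {
        final char ch;
        final int freq;
        final Node left, right;
        Node(char ch, int freq, Node left, Node right) {
            this.ch = ch; this.freq = freq; this.left = left; this.right = right;
        }
        boolean isLeaf() { return left == null && right == null; }
    }

    /** Builds the tree for a non-empty text by repeatedly merging the two least frequent nodes. */
    static Node build(String text) {
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : text.toCharArray()) freq.merge(c, 1, Integer::sum);
        // The lower the frequency, the higher the priority (dequeued first).
        PriorityQueue<Node> q = new PriorityQueue<>((x, y) -> Integer.compare(x.freq, y.freq));
        freq.forEach((c, n) -> q.add(new Node(c, n, null, null)));
        while (q.size() >= 2) {
            Node left = q.poll();     // first dequeued node becomes the left sub-tree
            Node right = q.poll();    // next dequeued node becomes the right sub-tree
            q.add(new Node('\0', left.freq + right.freq, left, right));
        }
        return q.poll();              // the single remaining node is the root
    }

    /** Traversal: going left appends '0', going right appends '1'; a code word ends at a leaf. */
    static void codes(Node n, String prefix, Map<Character, String> out) {
        if (n.isLeaf()) {
            out.put(n.ch, prefix.isEmpty() ? "0" : prefix);   // a one-symbol text still gets one bit
            return;
        }
        codes(n.left, prefix + "0", out);
        codes(n.right, prefix + "1", out);
    }
}
```

Ties between equal frequencies are broken arbitrarily by the queue (and by `HashMap` iteration order), so individual code words may differ from the slides. The total length for "Eerie eyes seen near lake." is still 84 bits, because every Huffman tree for the same frequencies has the same minimal weighted length.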

Huffman Coding: Building The Tree

[Figures: the priority queue after each step; each step dequeues the two lowest-frequency nodes and enqueues their parent, whose frequency is the sum of the two:
1. E (1) + i (1) → 2
2. y (1) + l (1) → 2
3. k (1) + . (1) → 2
4. r (2) + s (2) → 4
5. n (2) + a (2) → 4
6. {E, i} (2) + {y, l} (2) → 4
7. {k, .} (2) + sp (4) → 6
8. {r, s} (4) + {n, a} (4) → 8
9. {E, i, y, l} (4) + {k, ., sp} (6) → 10
10. e (8) + {r, s, n, a} (8) → 16
11. (10) + (16) → 26]

What is happening to the characters with a low number of occurrences?

After enqueueing this node there is only one node left in the priority queue.

Huffman Coding: Building The Tree

Dequeue the single node left in the queue.

This tree contains the new code words for each character.

The frequency of the root node should equal the number of characters in the text.

[Figure: the final Huffman tree; root frequency 26]

Eerie eyes seen near lake. ----> 26 characters

Encoding the File: Traverse Tree

Perform a traversal of the tree to obtain the new code words.

Going left is a 0 and going right is a 1; a code word is only complete when a leaf node is reached.

[Figure: the final Huffman tree (root frequency 26)]

Huffman Coding (contd.)

Char    Code
E       0000
i       0001
y       0010
l       0011
k       0100
.       0101
space   011
e       10
r       1100
s       1101
n       1110
a       1111

[Figure: the Huffman tree with branches labelled 0 (left) and 1 (right)]
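Step 5 (encoding) with the code table above, plus decoding by walking a tree rebuilt from the same table (0 = left, 1 = right, a code word ends at a leaf), can be sketched as follows. A sketch only: `TableCodec` is a hypothetical name, the input is assumed to contain only the 12 characters of the table, and `decode` assumes a well-formed bit string produced by `encode`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TableCodec {
    // The code table from the slide above (' ' stands for "space").
    static final Map<Character, String> CODE = new LinkedHashMap<>();
    static {
        String[][] rows = {{"E", "0000"}, {"i", "0001"}, {"y", "0010"}, {"l", "0011"},
                {"k", "0100"}, {".", "0101"}, {" ", "011"}, {"e", "10"},
                {"r", "1100"}, {"s", "1101"}, {"n", "1110"}, {"a", "1111"}};
        for (String[] r : rows) CODE.put(r[0].charAt(0), r[1]);
    }

    // Tree node rebuilt from the code words: child[0] = left (bit 0), child[1] = right (bit 1).
    static final class Node {
        final Node[] child = new Node[2];
        Character symbol;                       // non-null only at leaves
    }

    static String encode(String text) {
        StringBuilder bits = new StringBuilder();
        for (char c : text.toCharArray()) {
            String word = CODE.get(c);
            if (word == null) throw new IllegalArgumentException("no code for '" + c + "'");
            bits.append(word);
        }
        return bits.toString();
    }

    static String decode(String bits) {
        Node root = new Node();
        CODE.forEach((sym, word) -> {           // insert every code word into the tree
            Node cur = root;
            for (char b : word.toCharArray()) {
                int i = b - '0';
                if (cur.child[i] == null) cur.child[i] = new Node();
                cur = cur.child[i];
            }
            cur.symbol = sym;
        });
        StringBuilder out = new StringBuilder();
        Node n = root;
        for (char b : bits.toCharArray()) {
            n = n.child[b - '0'];               // 0 = go left, 1 = go right
            if (n.symbol != null) {             // a code word is complete only at a leaf
                out.append(n.symbol);
                n = root;
            }
        }
        return out.toString();
    }
}
```

Encoding "Eerie eyes seen near lake." gives 84 bits instead of 26 × 8 = 208 bits with 8-bit ASCII, and decoding restores the original text because no code word is a prefix of another.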

Huffman Coding: Example (2)

Init: Create a priority set out of each letter

1. Sort the sets according to probability (lowest first)

2. Insert prefix ‘1’ into the codes of the top set (lowest probability) letters

3. Insert prefix ‘0’ into the codes of the second set letters

4. Merge the top two sets

Steps 1-4 are repeated (four rounds for the five letters) until a single set remains: The END.
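The set-merging procedure of Example (2) can be sketched as follows. The letter probabilities of this example appear only in the missing figures, so this sketch assumes those of Example (3) (a: 0.2, b: 0.4, c: 0.2, d: 0.1, e: 0.1), scaled by 10 to integer weights to avoid floating-point ties. It also assumes that the merged set is placed at the front of the list before the next stable sort; `SetMergeHuffman` and `run` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SetMergeHuffman {
    // One "set": its total weight and the partial code word of each letter in it.
    static final class Group {
        final int weight;
        final Map<Character, String> codes;
        Group(int weight, Map<Character, String> codes) { this.weight = weight; this.codes = codes; }
    }

    static Map<Character, String> run(char[] letters, int[] weights) {
        List<Group> sets = new ArrayList<>();
        for (int i = 0; i < letters.length; i++) {            // Init: one set per letter
            Map<Character, String> m = new LinkedHashMap<>();
            m.put(letters[i], "");
            sets.add(new Group(weights[i], m));
        }
        while (sets.size() > 1) {
            // 1. Sort by weight, lowest first; List.sort is stable, so ties keep their order.
            sets.sort((x, y) -> Integer.compare(x.weight, y.weight));
            Group top = sets.remove(0);
            Group second = sets.remove(0);
            Map<Character, String> merged = new LinkedHashMap<>();
            top.codes.forEach((c, w) -> merged.put(c, "1" + w));     // 2. prefix '1' (top set)
            second.codes.forEach((c, w) -> merged.put(c, "0" + w));  // 3. prefix '0' (second set)
            // 4. Merge; the merged set is put at the front before the next (stable) sort.
            sets.add(0, new Group(top.weight + second.weight, merged));
        }
        return sets.get(0).codes;
    }
}
```

With these assumptions the sketch returns b = 1, c = 01, a = 000, d = 0011, e = 0010, i.e., code lengths 1, 2, 3, 4, 4 as in the summary below. Appending the merged set at the end of the list instead gives lengths 2, 2, 2, 3, 3 (a minimum variance code).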


Example Summary

• Average code length

l = 0.4x1 + 0.2x2 + 0.2x3 + 0.1x4 + 0.1x4 = 2.2 bits/symbol

• Entropy

$H = -\sum_{s=a}^{e} P(s)\log_2 P(s) \approx 2.122$ bits/symbol

• Redundancy

l – H = 0.078 bits/symbol



Example: Tree 1

Average code length

l = 0.4x1 + 0.2x2 + 0.2x3 + 0.1x4 + 0.1x4 = 2.2 bits/symbol


Huffman Coding: Example (3)

• Symbols: {a, b, c, d, e}
• Weights: {0.2, 0.4, 0.2, 0.1, 0.1}
• Required: Maximum Variance Tree!

[Figure: merge steps d (0.1) + e (0.1) → de (0.2); de + a → dea (0.4); c + dea → deac (0.6); deac + b → deacb (1.0), with 0/1 on the branches]



Huffman Coding: Example (3)

• Symbols: {a, b, c, d, e}
• Weights: {0.2, 0.4, 0.2, 0.1, 0.1}
• Required: Minimum Variance Tree!

[Figure: merge steps d (0.1) + e (0.1) → de (0.2); a (0.2) + c (0.2) → ac (0.4); de + b → deb (0.6); deb + ac → deacb (1.0), with 0/1 on the branches]


Example: Minimum Variance Tree

Average code length

l = 0.4x2 + (0.1 + 0.1)x3 + (0.2 + 0.2)x2 = 2.2 bits/symbol

[Figure: minimum variance tree; a, b and c get 2-bit codewords and d, e get 3-bit codewords (codewords 00, 10, 11, 010, 011)]


Example: Yet Another Tree

Average code length

l = 0.4x1 + (0.2 + 0.2 + 0.1 + 0.1)x3 = 2.2 bits/symbol


Min Variance Huffman Trees

• Huffman codes are not unique

• All versions yield the same average length

• Which one should we choose?

• The one with the minimum variance in codeword lengths

• i.e., with the minimum height tree

• Why?

• It will ensure the least amount of variability in the encoded stream
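Which of the equally good Huffman trees is produced comes down to how ties are broken. A sketch of one common rule, stated here as an assumption: when weights are equal, original letters are dequeued before merged nodes, i.e., the merged node is placed as high as possible in the sorted list. `MinVarianceHuffman`, `codeLengths`, and `variance` are hypothetical names; the weights are the Example (3) probabilities scaled by 10.

```java
import java.util.PriorityQueue;

final class MinVarianceHuffman {
    static final class Node {
        final int weight, order, symbol;   // symbol = -1 for merged nodes
        final Node left, right;
        Node(int weight, int order, int symbol, Node left, Node right) {
            this.weight = weight; this.order = order; this.symbol = symbol;
            this.left = left; this.right = right;
        }
        boolean merged() { return symbol < 0; }
    }

    /** Code lengths; mergedLast = true dequeues leaves before merged nodes of equal weight. */
    static int[] codeLengths(int[] w, boolean mergedLast) {
        PriorityQueue<Node> q = new PriorityQueue<>((x, y) -> {
            if (x.weight != y.weight) return Integer.compare(x.weight, y.weight);
            if (x.merged() != y.merged()) return (x.merged() == mergedLast) ? 1 : -1;
            return Integer.compare(x.order, y.order);   // otherwise, first created first
        });
        int order = 0;
        for (int i = 0; i < w.length; i++) q.add(new Node(w[i], order++, i, null, null));
        while (q.size() > 1) {
            Node a = q.poll(), b = q.poll();
            q.add(new Node(a.weight + b.weight, order++, -1, a, b));
        }
        int[] len = new int[w.length];
        assign(q.poll(), 0, len);
        return len;
    }

    // Code length of a letter = depth of its leaf.
    private static void assign(Node n, int depth, int[] len) {
        if (!n.merged()) { len[n.symbol] = depth; return; }
        assign(n.left, depth + 1, len);
        assign(n.right, depth + 1, len);
    }

    /** Probability-weighted variance of the code word lengths around the average length. */
    static double variance(int[] w, int[] len) {
        double total = 0, avg = 0, var = 0;
        for (int x : w) total += x;
        for (int i = 0; i < w.length; i++) avg += w[i] / total * len[i];
        for (int i = 0; i < w.length; i++) var += w[i] / total * (len[i] - avg) * (len[i] - avg);
        return var;
    }
}
```

For {a: 2, b: 4, c: 2, d: 1, e: 1} this rule gives lengths 2, 2, 2, 3, 3 (variance 0.16), while preferring merged nodes gives 3, 1, 2, 4, 4 (variance 1.36); both have the same average length of 2.2 bits/symbol.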



Another Example!

Consider the source:

A = {a, b, c}, P(a) = 0.8, P(b) = 0.02, P(c) = 0.18
H = 0.816 bits/symbol

Huffman code:
a → 0
b → 11
c → 10

l = 1.2 bits/symbol

Redundancy = 0.384 bits/symbol (on average), i.e., 47% of the entropy!

Q: Could we do better?



Extended Huffman Codes

Example 1: Consider encoding sequences of two letters

l = (0.64x1 + 0.144x2 + 0.144x3 + 0.0324x4 + 0.016x5 + 0.016x6 + 0.0036x7 + 0.0004x8 + 0.0036x8)/2 = 1.7228/2 = 0.8614 bits/symbol

Redundancy = l – H ≈ 0.8614 – 0.8157 ≈ 0.046 bits/symbol
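The numbers for the single-letter and the extended (pair) code can be reproduced with a short sketch, assuming independent letters so that P(xy) = P(x)P(y), as in the pair probabilities above. `ExtendedHuffman`, `huffmanLength`, and `pairs` are hypothetical names; the expected length is computed as the sum of the probabilities of all merged nodes, since each merge adds one bit to every letter below it.

```java
import java.util.PriorityQueue;

final class ExtendedHuffman {
    /** Expected Huffman code length in bits: sum of the probabilities of all merged nodes. */
    static double huffmanLength(double[] p) {
        PriorityQueue<Double> q = new PriorityQueue<>();
        for (double x : p) q.add(x);
        double len = 0;
        while (q.size() > 1) {
            double s = q.poll() + q.poll();   // merge the two least probable nodes
            len += s;
            q.add(s);
        }
        return len;
    }

    /** Probabilities of all n^2 letter pairs, assuming independent letters. */
    static double[] pairs(double[] p) {
        double[] out = new double[p.length * p.length];
        for (int i = 0; i < p.length; i++)
            for (int j = 0; j < p.length; j++)
                out[i * p.length + j] = p[i] * p[j];
        return out;
    }

    public static void main(String[] args) {
        double[] p = {0.8, 0.02, 0.18};                    // P(a), P(b), P(c)
        System.out.println(huffmanLength(p));              // bits/symbol, single letters
        System.out.println(huffmanLength(pairs(p)) / 2);   // bits/symbol, pairs of letters
    }
}
```

It reproduces l = 1.2 bits/symbol for single letters and 1.7228/2 = 0.8614 bits/symbol for pairs.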



Extended Huffman Codes (Remarks)

The idea can be extended further:
• Consider all possible n^m sequences of m letters from an n-letter alphabet (we did 3^2 = 9)

In theory, by considering longer sequences we can improve the coding!! (Is it applicable?)

In reality, the exponential growth of the alphabet makes this impractical

• E.g., for ASCII sequences of length 3: 256^3 = 2^24 ≈ 16M

Practical consideration: most sequences would have zero frequency → other methods are needed (Adaptive Huffman Coding)
