
Principle of Lossless Compression

Dr. C.-C. Jay Kuo
Integrated Media Systems Center
University of Southern California
Los Angeles, CA 90080
E-mail: [email protected]

Contents

- Background
- Huffman coding
- Arithmetic coding
- QM coder
- Lempel Ziv coding

Background (1)

- Basic concept from information theory
  - For a symbol S_k with probability P(S_k):
    - information: I(S_k) = -log2 P(S_k)
    - entropy: the average information:

      H = -Σ_{k=1}^{L} P(S_k) log2 P(S_k)

- Shannon's noiseless coding theorem:
  - For a source with entropy H bits/symbol, it is possible to find a distortionless coding scheme using an average of (H + ε) bits/message (ε > 0)
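As a quick check of these definitions, the entropy can be computed directly from the symbol probabilities; a minimal sketch, using the six-symbol source of the Shannon-Fano example later in these notes:

```python
import math

def entropy(probs):
    """Average information in bits/symbol: H = -sum p*log2(p)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Probabilities of the six-symbol source used in the examples below.
probs = [0.39, 0.18, 0.15, 0.15, 0.07, 0.06]
print(round(entropy(probs), 4))  # 2.3083
```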

Background (2)

- Modeling
  - physical model
  - statistical model
    - probability model
      - static model
      - adaptive model
    - Markov state model
    - dictionary model
  - composite model

Variable Length Coding

- Principle
  - encode symbols with a higher probability (less information) with fewer bits, and symbols with a lower probability (more information) with more bits
- Its performance is better than that of fixed-length codes
- Well-known example: the Huffman code

Contents

- Background
- Huffman coding
- Arithmetic coding
- QM coder
- Lempel Ziv coding

Shannon-Fano Code (1)

- Procedure:
  - Sort the symbols according to their probabilities
  - Repeatedly divide them into two groups with almost equal total probabilities, until there is only one symbol left in each group
  - Build the coding tree from the root to the leaves according to the sequence of dividing lines
  - Assign '0' or '1' to each branch

Shannon-Fano Code (2)

symbol:  A     B     C     D     E     F
prob:    0.39  0.18  0.15  0.15  0.07  0.06

[Tree diagram: the sorted symbols are split by successive dividing lines (1st through 5th); the coding tree is built from the root following those splits, with '0'/'1' assigned to each branch.]

Shannon-Fano Code (3)

- Code table:

  A  00
  B  01
  C  10
  D  110
  E  1110
  F  1111

- entropy: H = -Σ_{k=1}^{6} P(r_k) log2 P(r_k) = 2.3083
- bits average: B_avg = Σ_{k=1}^{6} P(r_k) L(r_k) = 2.41
- coding redundancy: CR = B_avg - H = 0.1017
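The Shannon-Fano procedure and the example above can be reproduced with a short recursive sketch (the function name is ours):

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, prob), sorted by decreasing prob.
    Returns {symbol: codeword}."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(p for _, p in symbols)
    # Find the split point that makes the two group probabilities most equal.
    best_i, best_diff, acc = 1, float("inf"), 0.0
    for i in range(1, len(symbols)):
        acc += symbols[i - 1][1]
        diff = abs(acc - (total - acc))
        if diff < best_diff:
            best_i, best_diff = i, diff
    codes = {}
    for s, c in shannon_fano(symbols[:best_i]).items():
        codes[s] = "0" + c          # left group gets prefix '0'
    for s, c in shannon_fano(symbols[best_i:]).items():
        codes[s] = "1" + c          # right group gets prefix '1'
    return codes

src = [("A", 0.39), ("B", 0.18), ("C", 0.15), ("D", 0.15), ("E", 0.07), ("F", 0.06)]
codes = shannon_fano(src)
print(codes)  # {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '1110', 'F': '1111'}
print(round(sum(p * len(codes[s]) for s, p in src), 2))  # 2.41 bits/symbol
```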

Huffman Coding (1)

- Background:
  - developed by David Huffman as part of a class assignment, in a course taught by Robert Fano at MIT (1951).
- The Huffman coder is a prefix-free coder.
  - No codeword is a prefix of any other codeword, so there is no decoding conflict.
- The Huffman coder is an optimal coder for a given probability model.

Huffman Coding (2)

- Based on two observations about an optimal prefix coder:
  - symbols that occur more frequently have shorter codewords than symbols that occur less frequently.
  - the two symbols that occur least frequently have the same codeword length.
    - because the Huffman coder is a prefix coder, nothing is gained by making the codeword of the least probable symbol longer than that of the second least probable symbol.

Huffman Coding (3)

- Recursive procedure:
  - sort the symbols by probability and add a node for each symbol to the coding list.
  - combine the two nodes with the least probability into one new node carrying the sum of their two probabilities.
  - assign '0' and '1' arbitrarily to the branches of these two least-probability nodes.
  - drop these two nodes from the coding list.
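This procedure maps naturally onto a priority queue; a minimal sketch (names are ours) that reproduces the 2.35 bits/symbol average of the example on the following slides:

```python
import heapq
from itertools import count

def huffman_codes(probs):
    """probs: {symbol: probability}. Returns {symbol: codeword}.
    Merges the two least probable nodes at every step (leaf to root)."""
    tiebreak = count()  # avoids comparing dict payloads on probability ties
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # least probable node
        p1, _, c1 = heapq.heappop(heap)  # second least probable node
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

probs = {"A": 0.39, "B": 0.18, "C": 0.15, "D": 0.15, "E": 0.07, "F": 0.06}
codes = huffman_codes(probs)
avg = sum(p * len(codes[s]) for s, p in probs.items())
print(round(avg, 2))  # 2.35 bits/symbol, matching the slide below
```

The codeword bits depend on the arbitrary '0'/'1' assignment, but the codeword lengths (1, 3, 3, 3, 4, 4) and hence the average are the same for any optimal tree.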

Huffman Coding (4)

- Unlike the Shannon-Fano coder, whose tree is built from the root to the leaves, the Huffman coder builds its tree from the leaves to the root.
- Disadvantages:
  - large computational complexity when the number of symbols is large.
  - a static (fixed) Huffman codeword table can severely degrade the coding performance.

Huffman Coding (5)

symbol:  A     B     C     D     E     F
prob:    0.39  0.18  0.15  0.15  0.07  0.06

[Tree diagram: E (0.07) and F (0.06) merge into 0.13; 0.13 and D (0.15) merge into 0.28; B (0.18) and C (0.15) merge into 0.33; 0.28 and 0.33 merge into 0.61; A (0.39) and 0.61 merge into the root (1.0), with '0'/'1' assigned to each pair of branches.]

Huffman Coding (6)

- Code table:

  A  0
  B  100
  C  101
  D  110
  E  1110
  F  1111

- entropy: H = -Σ_{k=1}^{6} P(r_k) log2 P(r_k) = 2.3083
- bits average: B_avg = Σ_{k=1}^{6} P(r_k) L(r_k) = 2.35
- coding redundancy: CR = B_avg - H = 0.0417

Modified Algorithm

- Disadvantages of Huffman coding:
  - large amounts of computation and sorting in the tree when the number of symbols is large.
  - performs poorly on large amounts of data if a fixed probability table is used.
- Modified methods:
  - forget factor method (Huffman, Arithmetic).
  - higher order model (Huffman, Arithmetic).
  - adaptive tree (Huffman).

Forget Factor Method (1)

- Separate the data into many groups and update the estimated probability table using the previous groups' statistics.
- Use the forget factor to damp the high-frequency variation between different groups.
- The encoder and decoder compute the estimated probability table synchronously.
- Can be used with any probability-model-based method.

Forget Factor Method (2)

- Procedure:
  - initialize the estimated probability table with a uniform distribution.
  - code the Kth group using the statistics of the (K-1)th group's probabilities plus the (K-2)th group's probabilities multiplied by the forget factor f:

    Pr_j(K) = [Pr_j(K-1) + f · Pr_j(K-2)] / Σ_{i=1}^{L} [Pr_i(K-1) + f · Pr_i(K-2)]
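A sketch of this update rule (variable and function names are ours): the table for group K blends the statistics of group K-1 with the f-discounted statistics of group K-2, then normalizes. The per-group counts below are a made-up illustration:

```python
def update_table(prev_counts, prev2_counts, f):
    """Blend group K-1 statistics with f-discounted group K-2
    statistics, then normalize to a probability table."""
    raw = {s: prev_counts.get(s, 0) + f * prev2_counts.get(s, 0)
           for s in set(prev_counts) | set(prev2_counts)}
    total = sum(raw.values())
    return {s: v / total for s, v in raw.items()}

# Hypothetical per-group symbol counts for a two-symbol source.
group_k1 = {"0": 6, "1": 4}   # statistics of group K-1
group_k2 = {"0": 9, "1": 1}   # statistics of group K-2
table = update_table(group_k1, group_k2, f=0.5)
print(table["0"], table["1"])  # 0.7 0.3
```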

Higher Order Model

- The previous probability tables were based on the order-0 model, i.e. each symbol's probability is estimated independently.
- An order-N model computes the conditional probability Pr(S_k | S_t1, S_t2, ..., S_tN).
- Also called a "context model".
- Example: when compressing the text of a book, Pr('e') = 0.1732 but Pr('e' | 't','h') = 0.8344.
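The context-model idea can be sketched by counting conditional frequencies; the toy text below is ours, so the estimate it prints is illustrative, not the book statistic quoted above:

```python
from collections import Counter, defaultdict

def order2_model(text):
    """Estimate Pr(symbol | two preceding symbols) by counting."""
    ctx_counts = defaultdict(Counter)
    for i in range(2, len(text)):
        ctx_counts[text[i - 2:i]][text[i]] += 1
    return ctx_counts

text = "the thin thug then thawed the thorn"   # toy corpus
counts = order2_model(text)
after_th = counts["th"]
total = sum(after_th.values())
# 3 of the 7 'th' contexts in this toy text are followed by 'e'.
print(after_th["e"] / total)
```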

Adaptive Huffman Coding (1)

- A fixed estimated probability table performs poorly on large amounts of data; an adaptive estimated probability model can achieve better results.
- Using only the causal data information, with no knowledge of future statistics, the adaptive Huffman coder updates the Huffman tree on the fly.

Adaptive Huffman Coding (2)

- Developed based on the sibling property
  - a Huffman tree is a binary tree with a weight assigned to every node.
  - each leaf node represents a symbol, with a weight proportional to its probability.
  - the weight of a node equals the sum of its two branches (if they exist).
  - the nodes can be listed in order of increasing weight, from left to right and from leaf to root.

Adaptive Huffman Coding (3)

- Principles:
  - weight increasing process:
    - increasing the weight of a node changes the weight of all of that node's ancestor nodes.
  - switching process:
    - after each weight increase, check whether the tree still satisfies the sibling property; if it violates the sibling property, perform the switching process.
    - switch the offending node with the right-most, highest node whose weight is less than its own.

Adaptive Huffman Coding (4)

- Initial: start from an 'EOF' node with weight 1
- New node creation example:
  - create a new node when a new symbol arrives; assume the input is a new symbol 'A':

[Tree diagram: the single node EOF (w=1) gains an empty sibling '?' (w=0) under a parent of w=1; when 'A' arrives, the '?' node becomes leaf A (w=1) and the parent weight becomes 2.]

- note that the number of occurrences is used as the weight

Adaptive Huffman Coding (5)

- Weight increasing example:

[Tree diagram: before, EOF (w=1) and B (w=1) share a parent (w=2), which pairs with A (w=3) under a node of w=5; that node pairs with C (w=9) at the root (w=14). After receiving one more 'B', B's weight becomes 2 and the weights along its path change to 3, 6, and 15.]

Adaptive Huffman Coding (6)

- Switching example (I):

[Tree diagram: before, A (w=9) sits at the top; EOF (w=1) and B (w=1) share a parent (w=2), which combines with C (w=2) into w=4 and with D (w=2) into w=6, for a root of w=15. After receiving two contiguous 'B's, B (now w=3) is switched with D (w=2); the weights along the path become 3, 5, 8, and the root becomes 17.]

Adaptive Huffman Coding (7)

- Switching example (II):

[Tree diagram: starting from the result of example (I) (root w=17), one more 'B' is received; B's weight becomes 4 and another switch is performed, giving internal weights 5 and 9 and a root of w=18.]

Adaptive Huffman Coding (8)

- Applications:
  - lossless image compression
  - text compression
  - audio compression
    - the compression ratio can reach up to 1.65:1 for CD-quality audio data (sampled at 44.1 kHz per stereo channel, with each sample represented by 16 bits)
    - performs well over a wide range of material, from symphonic to folk rock music.

Adaptive Huffman Coding Encoding Examples

- Encoding
  - Given symbol set {'A', 'B', 'C', 'D', 'E'}
  - Initial adaptive Huffman tree as follows

[Tree diagram: initial adaptive Huffman tree over {A, B, C, D, E} with '0'/'1' branch labels; A carries a large weight (w=20), B w=3, and C, D, E w=1. Under this tree 'D' encodes as 0110.]

Adaptive Huffman Coding Encoding Examples

- To encode 'DDDAE':
  - Consider the first 'D': encode it as '0110', then update the tree.

[Tree diagram: the initial tree, before the update.]

Adaptive Huffman Coding Encoding Examples

- To encode 'DDDAE':
  - Consider the first 'D': encode it as '0110', then update the tree.

[Tree diagram: D's weight becomes 2, violating the sibling property, so two nodes must be switched ("Switch these two").]

Adaptive Huffman Coding Encoding Examples

- To encode 'DDDAE':
  - Consider the first 'D': encode it as '0110', then update the tree.

[Tree diagram: after the switch, D (w=2) has exchanged positions with C (w=1).]

Adaptive Huffman Coding Encoding Examples

- To encode 'DDDAE':
  - Consider the second 'D': encode it as '010', then update the tree.

[Tree diagram: D's weight becomes 3; two subtrees must be switched because 2 < 3.]

Adaptive Huffman Coding Encoding Examples

- To encode 'DDDAE':
  - Consider the second 'D': encode it as '010', then update the tree.

[Tree diagram: after the switch, D (w=3) has moved up one level.]

Adaptive Huffman Coding Encoding Examples

- To encode 'DDDAE':
  - Consider the third 'D': encode it as '011', then update the tree.

[Tree diagram: D's weight becomes 4; two subtrees must be switched.]

Adaptive Huffman Coding Encoding Examples

- To encode 'DDDAE':
  - Consider the third 'D': encode it as '011', then update the tree.

[Tree diagram: after the switch, D (w=4) has exchanged positions with the subtree containing B (w=3).]

Adaptive Huffman Coding Encoding Examples

- To encode 'DDDAE':
  - Consider 'A': encode it as '1', then update the tree.

[Tree diagram: A's weight becomes 21; no switch is needed.]

Adaptive Huffman Coding Encoding Examples

- To encode 'DDDAE':
  - Consider 'E': encode it as '0101', then update the tree.

[Tree diagram: E's weight becomes 2.]

Adaptive Huffman Coding Decoding Examples

- Decoding
  - Given symbol set {'A', 'B', 'C', 'D', 'E'}
  - Initial adaptive Huffman tree as follows

[Tree diagram: the same initial tree as in the encoding example (A w=20, B w=3, C, D, E w=1).]

Adaptive Huffman Coding Decoding Examples

- To decode '01001001110100':
  - Go through the tree from its root; the bits '010' reach 'C'. Output 'C', then update the table.

[Tree diagram: C's weight becomes 2.]

Adaptive Huffman Coding Decoding Examples

- To decode 'C01001110100':
  - Go through the tree from its root; the bits '010' again reach 'C'. Output 'C', then update the table.

[Tree diagram: C's weight becomes 3.]

Adaptive Huffman Coding Decoding Examples

- To decode 'CC01110100':
  - Go through the tree from its root; the bits '011' reach 'C'. Output 'C', then update the table.

[Tree diagram: C's weight becomes 4; its position, and hence its code, has changed.]

Adaptive Huffman Coding Decoding Examples

- To decode 'CCC10100':
  - Go through the tree from its root; the bit '1' reaches 'A'. Output 'A', then update the table.

[Tree diagram: A's weight becomes 21.]

Adaptive Huffman Coding Decoding Examples

- To decode 'CCCA0100':
  - Go through the tree from its root; the bits '0100' reach 'D'. Output 'D', then update the table.

[Tree diagram: D's weight becomes 2; two nodes are switched because 2 > 1.]

Adaptive Huffman Coding Decoding Examples

- Final result:
  - Decoded symbols: 'CCCAD'
  - Final Huffman tree

[Tree diagram: the final tree after all updates.]
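The incremental tree updates above follow the FGK-style switching algorithm. As a much simpler (and slower) sketch of the same synchronization idea, encoder and decoder can rebuild an ordinary Huffman tree from identical running counts after every symbol; this is our simplification, not the switching procedure on these slides:

```python
import heapq
from itertools import count as _count

def build_codes(weights):
    """Ordinary Huffman construction from symbol weights (deterministic)."""
    tie = _count()
    heap = [(w, next(tie), {s: ""}) for s, w in sorted(weights.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, next(tie), merged))
    return heap[0][2]

def adaptive_encode(msg, alphabet):
    weights = {s: 1 for s in alphabet}   # start from uniform counts
    bits = []
    for ch in msg:
        bits.append(build_codes(weights)[ch])
        weights[ch] += 1                 # update after coding, as the decoder will
    return "".join(bits)

def adaptive_decode(bits, alphabet, n):
    weights = {s: 1 for s in alphabet}
    out = []
    while len(out) < n:
        codes = {c: s for s, c in build_codes(weights).items()}
        buf = ""
        while buf not in codes:          # consume bits until a codeword matches
            buf += bits[0]
            bits = bits[1:]
        out.append(codes[buf])
        weights[codes[buf]] += 1
    return "".join(out)

msg = "DDDAE"
enc = adaptive_encode(msg, "ABCDE")
print(adaptive_decode(enc, "ABCDE", len(msg)) == msg)  # True
```

Because both sides rebuild the same tree from the same counts, no code table is ever transmitted, which is exactly the point of the adaptive scheme.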

Contents

- Background
- Huffman coding
- Arithmetic coding
- QM coder
- Lempel Ziv coding

Arithmetic Coding (1)

- Unlike the Huffman code, which is a block code (assigning a variable, integer-length codeword to each block), the arithmetic code is a non-block code (also known as a tree code).
- Especially useful for sources with small alphabets, such as binary sources, and for alphabets with highly skewed probabilities.

Arithmetic Coding (2)

- Example of the problem with the Huffman code:

  Pr(A)=0.95, Pr(B)=0.03, Pr(C)=0.02

  Symbol  Codeword
  A       0
  B       11
  C       10

  Entropy = 0.335 bits/symbol
  B_avg = 1.05 bits/symbol
  Coding redundancy = 0.715 bits/symbol

- A higher-order model still does not work:

  Symbol  Codeword
  AA      0
  AB      111
  AC      100
  BA      1101
  BB      110011
  BC      110001
  CA      101
  CB      110010
  CC      110000

  Entropy = 0.669 bits/symbol
  B_avg = 1.221 bits/symbol
  Coding redundancy = 0.552 bits/symbol

  Better, but still not efficient, because Huffman codewords always have integer length.
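The numbers in the single-symbol example above are easy to verify with a short sketch:

```python
import math

probs = {"A": 0.95, "B": 0.03, "C": 0.02}
lengths = {"A": 1, "B": 2, "C": 2}   # Huffman codewords 0, 11, 10

H = -sum(p * math.log2(p) for p in probs.values())   # source entropy
B_avg = sum(probs[s] * lengths[s] for s in probs)    # average code length
print(round(H, 3), round(B_avg, 2), round(B_avg - H, 3))  # 0.335 1.05 0.715
```

The best any integer-length code can do here is 1 bit/symbol, more than three times the entropy, which is why arithmetic coding pays off for skewed sources.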

Arithmetic Coding (3)

- Example of arithmetic coding (I):

  Pr(A)=0.2, Pr(B)=0.4, Pr(C)=0.2, Pr(D)=0.1, Pr(E)=0.1

[Interval subdivision diagram: starting from [0.0, 1.0) with sub-intervals A [0.0, 0.2), B [0.2, 0.6), C [0.6, 0.8), D [0.8, 0.9), E [0.9, 1.0), each input symbol narrows the current interval: 'B' gives [0.2, 0.6), then 'A' gives [0.2, 0.28), then 'C' gives [0.248, 0.264), then 'D' gives [0.2608, 0.2624).]

Arithmetic Coding (4)

- Example of arithmetic coding (II):
  - to encode the sequence "BACD", any number within the range [0.2608, 0.2624) can be used to represent the sequence.
  - in general, we choose the central value of the range as the output code, e.g. 0.2616.
  - Decoding procedure:
    - find the coding interval [B_L, B_U) containing the given code T (a floating-point number) and output the corresponding symbol for that interval.

Arithmetic Coding (5)

Decoding example for the given code 0.2616:

  (1) T = 0.2616 ∈ [0.2, 0.6), ∴ 'B', and T ⇒ T = (0.2616 - 0.2)/(0.6 - 0.2) = 0.154
  (2) T = 0.154 ∈ [0.0, 0.2), ∴ 'A', and T ⇒ T = (0.154 - 0.0)/(0.2 - 0.0) = 0.77
  (3) T = 0.77 ∈ [0.6, 0.8), ∴ 'C', and T ⇒ T = (0.77 - 0.6)/(0.8 - 0.6) = 0.85

- at each step, update the value of T by:

  T = (T - B_L) / (B_U - B_L)

Arithmetic Coding (6)

- Adaptive models:
  - forget factor method.
  - higher order model (context model).
- Algorithms for binary arithmetic coding:
  - Q coder by IBM, 1988.
  - QM coder by IBM, 1990.
    - included in the CCITT JBIG standard, 1992.

Arithmetic Coding Encoding Examples

- Given the symbol set and the probability of each symbol, encode 'DBEFA'.

  Symbol   Probability  Assigned range
  A (EOF)  0.1          0.0-0.1
  B        0.1          0.1-0.2
  C        0.1          0.2-0.3
  D        0.2          0.3-0.5
  E        0.25         0.5-0.75
  F        0.25         0.75-1.0

Arithmetic Coding Encoding Examples

- Encode 'DBEFA'
  - D: [0.3, 0.5), width 0.5 - 0.3 = 0.2
  - DB: [0.3 + 0.2×0.1 = 0.32, 0.3 + 0.2×0.2 = 0.34)

  (using the symbol table above)

Arithmetic Coding Encoding Examples

- Encode 'DBEFA'
  - DBE: [0.32 + 0.02×0.5 = 0.33, 0.32 + 0.02×0.75 = 0.335)
  - DBEF: [0.33375, 0.335)
  - DBEFA: [0.33375, 0.333875)

  (using the symbol table above)
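The interval-narrowing steps above can be sketched as a tiny floating-point encoder (a toy: real coders use integer arithmetic with renormalization to avoid precision loss):

```python
ranges = {"A": (0.0, 0.1), "B": (0.1, 0.2), "C": (0.2, 0.3),
          "D": (0.3, 0.5), "E": (0.5, 0.75), "F": (0.75, 1.0)}

def arith_encode(msg):
    """Narrow [low, high) once per symbol; any value in the final
    interval identifies the whole message."""
    low, high = 0.0, 1.0
    for ch in msg:
        s_low, s_high = ranges[ch]
        width = high - low
        low, high = low + width * s_low, low + width * s_high
    return low, high

low, high = arith_encode("DBEFA")
print(round(low, 6), round(high, 6))  # 0.33375 0.333875
```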

Arithmetic Coding Decoding Examples

- Decode '0.8302'
- Result: 'FDBA'

  0.8302 ∈ [0.75, 1.0) → F;  T = (0.8302 - 0.75)/0.25 = 0.3208
  0.3208 ∈ [0.3, 0.5)  → D;  T = (0.3208 - 0.3)/0.2   = 0.104
  0.104  ∈ [0.1, 0.2)  → B;  T = (0.104 - 0.1)/0.1    = 0.04
  0.04   ∈ [0.0, 0.1)  → A (EOF)

  (using the symbol table above)
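The decoding table above, as code (same floating-point caveat; 'A' doubles as the EOF symbol, as in the table):

```python
ranges = {"A": (0.0, 0.1), "B": (0.1, 0.2), "C": (0.2, 0.3),
          "D": (0.3, 0.5), "E": (0.5, 0.75), "F": (0.75, 1.0)}

def arith_decode(T):
    """Repeatedly find the sub-interval containing T, output its
    symbol, and rescale T; stop at the EOF symbol 'A'."""
    out = ""
    while True:
        for sym, (lo, hi) in ranges.items():
            if lo <= T < hi:
                out += sym
                if sym == "A":                 # EOF reached
                    return out
                T = (T - lo) / (hi - lo)       # rescale into [0, 1)
                break

print(arith_decode(0.8302))  # FDBA
```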

Contents

- Background
- Huffman coding
- Arithmetic coding
- QM coder
- Lempel Ziv coding

QM Coder (1)

- An adaptive binary arithmetic coder
- The entropy coder of the JBIG standard, approved by CCITT in 1992
- A lineal descendant of the Q coder developed by IBM in 1988, enhanced by improvements in:
  - interval subdivision
  - probability estimation

QM Coder (2)

- Non-binary data must be converted by a binary decision tree:

  D       Binary decision
  0*      0
  0       1S0
  1       1S10
  2~3     1S110M
  4~7     1S1110MM
  8~15    1S11110MMM

  * original input data X = 0
  modified data D, where D = abs(X) - 1
  S: sign bit of X
  M: data bits of D

QM Coder (3)

- A: coding interval
- C: coding index, always pointing to the bottom of A
- Qe: probability of the LPS

- Basic QM coder algorithm
  - After MPS (More Probable Symbol):
    C = C
    A = A(1 - Qe)
  - After LPS (Less Probable Symbol):
    C = C + A(1 - Qe)
    A = A × Qe
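A floating-point sketch of the two update rules (the function name is ours), run on the '0011', Qe = 0.25, MPS = 0 example of the next slide:

```python
def qm_step(C, A, bit, mps=0, Qe=0.25):
    """One ideal (multiplication-based) QM update: the MPS keeps the
    bottom of the interval, the LPS takes the top Qe-fraction."""
    if bit == mps:                        # MPS: C unchanged, A = A(1-Qe)
        return C, A * (1 - Qe)
    return C + A * (1 - Qe), A * Qe      # LPS: C jumps over the MPS part

C, A = 0.0, 1.0
for b in "0011":
    C, A = qm_step(C, A, int(b))
print(C, A)  # 0.52734375 0.03515625
```

Any value in [C, C + A) identifies the input sequence, just as in ordinary arithmetic coding; the modified QM coder on the later slides replaces the multiplications with approximations.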

QM Coder (4)

- Example of the basic QM coder algorithm:

  Input data sequence: 0011
  Qe = 0.25, MPS = 0

[Diagram: the interval [0.0, 1.5) shrinks step by step (A0 through A4) while the coding index moves from C0 to C4; A2 = A1 × 0.75, A3 = A2 × 0.25, A4 = A3 × 0.25.]

QM Coder (5)

- Modified QM coder algorithm
  - Potentially unbounded precision for A
    - left-shift A and C by 1 bit whenever A < 0.75 (re-normalization)
    - generate a new Qe after each re-normalization
  - Conditional exchange whenever A < Qe
    - exchange MPS and LPS
  - Elimination of multiplication:

    A × Qe ≅ Qe   if 1.5 > A ≥ 0.75

MPS Re-normalization

[Diagram: A drifts downward from 1.5 toward 0.75; each time it crosses 0.75, an MPS re-normalization occurs and a smaller Qe is produced: Qe(1) > Qe(2) > Qe(3) because of the continuous arrival of MPS symbols.]

Algorithm of QM Coder

- After MPS:

    A = A - Qe;
    if (A < 0x8000) {
        if (A < Qe) {      /* conditional exchange */
            C += A;
            A = Qe;
        }
        change Qe state for MPS-renormalization;
        output MSB of C;
        A <<= 1;  C <<= 1;
    }

- After LPS:

    A = A - Qe;
    if (A >= Qe) {         /* conditional exchange */
        C += A;
        A = Qe;
    }
    change Qe state for LPS-renormalization;
    output MSB of C;
    A <<= 1;  C <<= 1;

Model of Context of QM Coder

[Diagram: (I) a causal context model formed from already-coded neighboring samples; (II) a parent-children context model across bit planes (K, K-1, K-2), as used in EZW/LZC.]

Contents

- Background
- Huffman coding
- Arithmetic coding
- QM coder
- Lempel Ziv coding

Lempel Ziv Coding (1)

- Dictionary techniques (I):
  - keep a list, or dictionary, of frequently occurring patterns.
  - use a more efficient method to code the patterns in the dictionary.
  - there are two types of dictionary: static and adaptive. In general, adaptive dictionaries perform better than static ones. The well-known adaptive cases are LZ77, LZ78 and LZW.

Lempel Ziv Coding (2)

- Example of the dictionary technique:
  - Suppose we have text data consisting of four-character words drawn from 32 different characters. For the fixed-length code case, we need 20 bits (32^4 = 2^20) to represent all of the four-character patterns.
  - If we put the 256 most likely four-character patterns in a dictionary, then we need only 8 bits to encode those patterns, and 20 bits to encode the patterns that are not in the dictionary.

Lempel Ziv Coding (3)

- Example of a static dictionary:

  assume the text data contains only 'a', 'b', 'c', 'd' and 'r'; build a static code table as below:

  Symbol  Code    Symbol  Code
  a       000     r       100
  b       001     ab      101
  c       010     ac      110
  d       011     ad      111

  coding procedure:
  (1) read two characters from the input text data.
  (2) if these two characters form a pattern in the dictionary, output its code and go to step (1).
  (3) otherwise, output the code of the first character, keep the second character as the first character of the next iteration, read only one more character from the text, and go to step (2).
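The coding procedure above, as a sketch (the input string 'abracadabra' is our example):

```python
CODES = {"a": "000", "b": "001", "c": "010", "d": "011",
         "r": "100", "ab": "101", "ac": "110", "ad": "111"}

def static_dict_encode(text):
    out, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in CODES:                # step (2): the digram is in the dictionary
            out.append(CODES[pair])
            i += 2
        else:                            # step (3): emit the single character
            out.append(CODES[text[i]])
            i += 1
    return "".join(out)

print(static_dict_encode("abracadabra"))  # 101100110111101100000 (21 bits vs 33)
```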

Lempel Ziv Coding (4)

- Most adaptive dictionary-based techniques have their roots in two landmark papers by Jacob Ziv and Abraham Lempel in 1977 and 1978 (LZ77/LZ78).
- LZW, developed by Terry Welch in "A Technique for High-Performance Data Compression" (IEEE Computer, 1984), made a significant improvement on LZ78.

Lempel Ziv Coding (5)

- Algorithm of LZW:
  - Initialize the dictionary with an index for each single symbol. Set the buffer to empty.
  - A new pattern is the combination of the old pattern in the buffer and the input symbol. If the new pattern is not in the dictionary, output the index of the old pattern in the buffer, add the new pattern to the dictionary, and reset the buffer to the input symbol.
  - Otherwise, set the buffer to the new pattern; no output is necessary.
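The algorithm above in code; running it on the sequence of the following slides reproduces the walk-through's outputs and dictionary entries:

```python
def lzw_encode(text, alphabet):
    dictionary = {s: i + 1 for i, s in enumerate(alphabet)}  # A=1, B=2, ...
    buf, out = "", []
    for ch in text:
        if buf + ch in dictionary:           # grow the buffered pattern
            buf += ch
        else:                                # emit old pattern, learn new one
            out.append(dictionary[buf])
            dictionary[buf + ch] = len(dictionary) + 1
            buf = ch
    out.append(dictionary[buf])              # flush the last pattern
    return out, dictionary

codes, d = lzw_encode("ABABBAABBCD", "ABCD")
print(codes)  # [1, 2, 5, 6, 7, 3, 4]
print(d["AB"], d["BA"], d["ABB"], d["BAA"], d["ABBC"])  # 5 6 7 8 9
```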

Lempel Ziv Coding (6)

- Example of LZW (I):
  (0) assume the text data consists of 'A', 'B', 'C' and 'D' only. Initialize the dictionary with these four symbols:

      Symbol  Index   Symbol  Index
      A       1       C       3
      B       2       D       4

      We want to encode the text sequence 'ABABBAABBCD'.

  (1) read 'A'; it is already in the dictionary (index '1'), so set the buffer to 'A'.

Lempel Ziv Coding (7)

- Example of LZW (II):
  (2) read 'B'; 'AB' is not in the dictionary, so output the buffer, which is now 'A', and add 'AB' to the dictionary with index '5'. Clear the buffer and put 'B' into it.
  (3) read 'A'; 'BA' is not in the dictionary, so output the buffer ('B') and add 'BA' to the dictionary with index '6'. Reset the buffer and put 'A' in it.
  (4) read 'B'; 'AB' is in the dictionary, so no output is necessary, but change the buffer to 'AB'.
  (5) read 'B'; 'ABB' is not in the dictionary, so output the buffer 'AB' (index '5'), add 'ABB' to the dictionary with index '7', and change the buffer to 'B'.

Lempel Ziv Coding (8)

- Example of LZW (III):
  (6) read 'A'; 'BA' is in the dictionary, so no output is necessary, but change the buffer to 'BA'.
  (7) read 'A'; 'BAA' is not in the dictionary, so output the buffer ('BA', index '6') and add 'BAA' to the dictionary with index '8'. Reset the buffer and put 'A' in it.
  (8) read 'B'; 'AB' is in the dictionary, so no output is necessary, but change the buffer to 'AB'.
  (9) read 'B'; 'ABB' is in the dictionary, so no output is necessary, but change the buffer to 'ABB'.
  (10) read 'C'; 'ABBC' is not in the dictionary, so output the buffer ('ABB', index '7') and add 'ABBC' to the dictionary with index '9'.
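A matching LZW decoder completes the picture; it must handle the classic corner case where a received index refers to the pattern the encoder has only just created:

```python
def lzw_decode(codes, alphabet):
    dictionary = {i + 1: s for i, s in enumerate(alphabet)}  # 1='A', 2='B', ...
    prev = dictionary[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:                        # pattern created and used immediately
            entry = prev + prev[0]
        # Learn the same entry the encoder learned: old pattern + first new char.
        dictionary[len(dictionary) + 1] = prev + entry[0]
        out.append(entry)
        prev = entry
    return "".join(out)

print(lzw_decode([1, 2, 5, 6, 7, 3, 4], "ABCD"))  # ABABBAABBCD
```

Note that the decoder rebuilds the dictionary from the output itself, so, as with adaptive Huffman coding, no dictionary is ever transmitted.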

Lempel Ziv Coding (9)

- Applications:
  - the "compress" command in UNIX systems
  - "ARJ", "PKZIP" and "LHA" on PC systems
  - the telephone-network modem communication standard V.42 bis, by CCITT
  - the image compression standard Graphics Interchange Format (GIF), by the CompuServe Information Service

Reference

- Khalid Sayood, "Introduction to Data Compression", Morgan Kaufmann, 1996.
- Mark Nelson, "The Data Compression Book", M&T Books, 1992.
- M. Rabbani and P. W. Jones, "Digital Image Compression Techniques", SPIE Optical Engineering Press, 1991.
- K. R. Rao and J. J. Hwang, "Techniques & Standards for Image, Video & Audio Coding", Prentice Hall PTR, 1996.
- P. K. Andleigh and K. Thakrar, "Multimedia Systems Design", Prentice Hall PTR, 1996.
- W. B. Pennebaker and J. L. Mitchell, "JPEG: Still Image Data Compression Standard", Van Nostrand Reinhold, 1993.