C.-C. Jay Kuo 1
Principle of Lossless Compression
Dr. C.-C. Jay Kuo
Integrated Media Systems Center
University of Southern California
Los Angeles, CA 90080
E-mail: [email protected]
Contents
- Background
- Huffman coding
- Arithmetic coding
- QM coder
- Lempel Ziv coding
Background (1)
- Basic concepts from information theory
  - For a symbol Sk with probability P(Sk):
    - information: I(Sk) = -log2 P(Sk)
    - entropy: the average information,
      H = -Σ_{k=1..L} P(Sk) log2 P(Sk)
- Shannon's noiseless coding theorem:
  - For a source with entropy H bits/symbol, it is possible to find a distortionless coding scheme using an average of (H+e) bits/symbol (e > 0)
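As a quick check of the numbers used later in the deck, the entropy formula can be evaluated directly. A minimal sketch; the probabilities are those of the six-symbol source used in the Shannon-Fano and Huffman examples:

```python
import math

def entropy(probs):
    # H = -sum_k P(S_k) * log2 P(S_k), in bits/symbol
    return -sum(p * math.log2(p) for p in probs if p > 0)

# six-symbol source from the Shannon-Fano / Huffman slides
probs = [0.39, 0.18, 0.15, 0.15, 0.07, 0.06]
print(round(entropy(probs), 4))  # 2.3083 bits/symbol
```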
Background (2)
- Modeling
  - physical model
  - statistical model
    - probability model
      - static model
      - adaptive model
    - Markov state model
    - dictionary model
  - composite model
Variable Length Coding
- Principle
  - encode symbols with higher probability (less information) with fewer bits, and symbols with lower probability (more information) with more bits
- Its performance is better than that of fixed-length codes
- Well-known example: the Huffman code
Shannon-Fano Code (1)
- Procedure:
  - Sort the symbols according to their probabilities
  - Repeatedly divide them into two groups with almost equal total probabilities until each group contains only one symbol
  - Build the coding tree from the root to the leaves according to the sequence of dividing lines
  - Assign '0' or '1' to each branch
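The dividing procedure can be sketched as a recursion. This is only an illustration: the tie-break rule for "almost equal" halves is my assumption, chosen so that it reproduces the code table derived on the following slides:

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, probability) pairs, sorted by
    descending probability. Returns {symbol: code string}."""
    if len(symbols) == 1:
        return {symbols[0][0]: ''}
    total = sum(p for _, p in symbols)
    # choose the split point that best balances the two groups
    acc, best_diff, split = 0.0, None, 1
    for i in range(1, len(symbols)):
        acc += symbols[i - 1][1]
        diff = abs(2 * acc - total)
        if best_diff is None or diff < best_diff:
            best_diff, split = diff, i
    codes = {s: '0' + c for s, c in shannon_fano(symbols[:split]).items()}
    codes.update({s: '1' + c for s, c in shannon_fano(symbols[split:]).items()})
    return codes

src = [('A', 0.39), ('B', 0.18), ('C', 0.15),
       ('D', 0.15), ('E', 0.07), ('F', 0.06)]
print(shannon_fano(src))
```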
Shannon-Fano Code (2)
symbol: A    B    C    D    E    F
prob:   0.39 0.18 0.15 0.15 0.07 0.06

[Figure: Shannon-Fano coding tree. Five successive dividing lines split the sorted symbols into groups; each branch of the resulting tree is labeled '0' or '1' from the root down]
Shannon-Fano Code (3)
- Code table:
  A  00
  B  01
  C  10
  D  110
  E  1110
  F  1111
- entropy: H = -Σ_{k=1..6} Pr(k) log2 Pr(k) = 2.3083 bits/symbol
- bits average: Bavg = Σ_{k=1..6} Pr(k) L(k) = 2.41 bits/symbol
- coding redundancy: CR = Bavg - H = 0.1017 bits/symbol
Huffman Coding (1)
- Background:
  - developed by David Huffman as part of a class assignment in a course taught by Robert Fano at MIT (1951)
- The Huffman coder is a prefix-free coder.
  - No codeword is a prefix of any other codeword (no decoding conflict)
- The Huffman coder is an optimal coder for a given probability model.
Huffman Coding (2)
- Based on two observations about an optimal prefix coder:
  - symbols that occur more frequently have shorter codewords than symbols that occur less frequently.
  - the two symbols that occur least frequently have codewords of the same length.
    - because the Huffman coder is a prefix coder, there is no need to make the codeword of the least probable symbol longer than that of the second least probable symbol.
Huffman Coding (3)
- Recursive procedure:
  - sort the symbols by probability and add a node for each symbol to the coding list.
  - combine the two nodes with the smallest probabilities into one new node carrying the sum of their probabilities.
  - assign '0' and '1' arbitrarily to the branches of these two least probable nodes.
  - drop these two nodes from the coding list.
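The procedure above can be sketched with a priority queue. A minimal illustration; since the '0'/'1' branch labels are assigned arbitrarily, the exact bits may differ from the table derived on the later slides, but the codeword lengths (and thus the average) agree:

```python
import heapq
import itertools

def huffman(probs):
    """probs: {symbol: probability}. Returns {symbol: code string}."""
    tiebreak = itertools.count()          # keeps heapq from comparing dicts
    heap = [(p, next(tiebreak), {s: ''}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # least probable node
        p1, _, c1 = heapq.heappop(heap)   # second least probable node
        merged = {s: '0' + c for s, c in c0.items()}
        merged.update({s: '1' + c for s, c in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

probs = {'A': 0.39, 'B': 0.18, 'C': 0.15, 'D': 0.15, 'E': 0.07, 'F': 0.06}
codes = huffman(probs)
avg = sum(p * len(codes[s]) for s, p in probs.items())
print(codes, round(avg, 2))  # average length 2.35 bits/symbol
```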
Huffman Coding (4)
- Unlike the Shannon-Fano coder, which builds codes from the root to the leaves, the Huffman coder builds them from the leaves to the root.
- Disadvantages:
  - large computational complexity when the number of symbols is large.
  - a static (fixed) Huffman codeword table can severely degrade coding performance.
Huffman Coding (5)
symbol: A    B    C    D    E    F
prob:   0.39 0.18 0.15 0.15 0.07 0.06

[Figure: Huffman coding tree. The two least probable nodes are merged repeatedly, creating internal nodes with weights 0.13, 0.28, 0.33, 0.61 and 1.0; each pair of branches is labeled '0' and '1']
Huffman Coding (6)
- Code table:
  A  0
  B  100
  C  101
  D  110
  E  1110
  F  1111
- entropy: H = -Σ_{k=1..6} Pr(k) log2 Pr(k) = 2.3083 bits/symbol
- bits average: Bavg = Σ_{k=1..6} Pr(k) L(k) = 2.35 bits/symbol
- coding redundancy: CR = Bavg - H = 0.0417 bits/symbol
Modified Algorithm
- Disadvantages of Huffman coding:
  - heavy computation and sorting in the tree when the number of symbols is large.
  - poor performance on large amounts of data when a fixed probability table is used.
- Modified methods:
  - forget factor method (Huffman, arithmetic).
  - higher order model (Huffman, arithmetic).
  - adaptive tree (Huffman).
Forget Factor Method (1)
- Separate the data into many groups and update the estimated probability table with the previous groups' statistics.
- Use the forget factor to discount older groups' statistics, smoothing out abrupt changes between groups.
- The encoder and decoder compute the estimated probability table synchronously.
- Can be used with any probability-model-based method.
Forget Factor Method (2)
- Procedure:
  - initialize the estimated probability table with a uniform distribution.
  - code the Kth group using the (K-1)th group's statistics plus the (K-2)th group's statistics multiplied by the forget factor f:

    Pr_j(K) = [Pr_j(K-1) + f × Pr_j(K-2)] / Σ_{i=1..L} [Pr_i(K-1) + f × Pr_i(K-2)]
Higher Order Model
- The probability tables above are based on the order-0 model, i.e. each symbol's probability is estimated independently.
- An order-N model computes the conditional probability Pr(Sk | St1, St2, ..., StN).
- Also called a "context model".
- Example: when compressing the text of a book, Pr('e') = 0.1732 but Pr('e' | 't','h') = 0.8344.
Adaptive Huffman Coding (1)
- A fixed estimated probability table performs poorly on a large amount of data; an adaptive probability model can achieve better results.
- Using only causal data, with no knowledge of future statistics, the adaptive Huffman coder updates the Huffman tree on the fly.
Adaptive Huffman Coding (2)
- Developed based on the sibling property:
  - a Huffman tree is a binary tree with a weight assigned to every node.
  - each leaf node represents a symbol, with weight proportional to its probability.
  - the weight of a node equals the sum of its two branches (if they exist).
  - every node can be listed in order of increasing weight from left to right and from leaf to root.
Adaptive Huffman Coding (3)
- Principles:
  - weight increasing process:
    - increasing the weight of a node changes the weights of all of that node's ancestor nodes.
  - switching process:
    - after each weight increase, check whether the tree still satisfies the sibling property; if it is violated, perform the switching process.
    - switch the violating node with the rightmost and highest node whose weight is smaller.
Adaptive Huffman Coding (4)
- Initial: start from an 'EOF' node with weight 1
- New node creation example:
  - create a new node when a new symbol arrives; assume the new symbol is 'A':

[Figure: the single EOF node (w=1) is split into an internal node with children EOF (w=1) and a placeholder (w=0); the placeholder becomes the new leaf 'A' (w=1) and the root weight grows to 2]

Please note that the number of occurrences is used as the weight.
Adaptive Huffman Coding (5)
- Weight increasing example:

[Figure: a tree with leaves EOF (w=1), B (w=1), A (w=3), C (w=9) and internal weights 2, 5 and 14. After receiving one more 'B', B's weight becomes 2 and the weights along its path change to 3, 6 and 15]
Adaptive Huffman Coding (6)
- Switching example (I):

[Figure: leaves A (w=9), EOF (w=1), B (w=1), C (w=2), D (w=2). After receiving two consecutive 'B's, B's weight grows to 3, violating the sibling property, so B and D (w=2) are switched; the root weight grows from 15 to 17]
Adaptive Huffman Coding (7)
- Switching example (II):

[Figure: after receiving one more 'B', B's weight grows from 3 to 4, violating the sibling property again, so B is switched upward once more; the root weight grows from 17 to 18]
Adaptive Huffman Coding (8)
- Applications:
  - lossless image compression
  - text compression
  - audio compression
    - the compression ratio can reach up to 1.65:1 for CD-quality audio data (sampled at 44.1 kHz per stereo channel, 16 bits per sample)
    - performs well on a wide range of material, from symphonic to folk rock music
Adaptive Huffman Coding Encoding Examples
- Encoding
  - Given the symbol set {'A', 'B', 'C', 'D', 'E'}
  - Initial adaptive Huffman table as follows:

[Figure: initial Huffman tree with '0'/'1' branch labels; leaf weights are A (W=20), B (W=3) and C, D, E (W=1 each)]
Adaptive Huffman Coding Encoding Examples
- To encode 'DDDAE':
  - Consider the first 'D': encode it as '0110', then update the tree.

[Figure: the initial tree again, before D's weight is increased]
Adaptive Huffman Coding Encoding Examples
- To encode 'DDDAE':
  - Consider the first 'D': encode it as '0110', then update the tree.

[Figure: D's weight increases to W=2, violating the sibling property, so D and C must be switched]
Adaptive Huffman Coding Encoding Examples
- To encode 'DDDAE':
  - Consider the first 'D': encode it as '0110', then update the tree.

[Figure: the tree after switching C and D; D now has W=2 and sits where C used to be]
Adaptive Huffman Coding Encoding Examples
- To encode 'DDDAE':
  - Consider the second 'D': encode it as '010', then update the tree.

[Figure: D's weight increases to W=3; two subtrees with weights 2 and 3 are switched because 2 < 3]
Adaptive Huffman Coding Encoding Examples
- To encode 'DDDAE':
  - Consider the second 'D': encode it as '010', then update the tree.

[Figure: the tree after the switch; D (W=3) has moved up, above the subtree holding C and E (W=1 each)]
Adaptive Huffman Coding Encoding Examples
- To encode 'DDDAE':
  - Consider the third 'D': encode it as '011', then update the tree.

[Figure: D's weight increases to W=4; the subtrees holding D and B (W=3) are switched]
Adaptive Huffman Coding Encoding Examples
- To encode 'DDDAE':
  - Consider the third 'D': encode it as '011', then update the tree.

[Figure: the tree after the switch; D (W=4) and B (W=3) have exchanged positions]
Adaptive Huffman Coding Encoding Examples
- To encode 'DDDAE':
  - Consider 'A': encode it as '1', then update the tree.

[Figure: A's weight increases from W=20 to W=21; no switch is needed]
Adaptive Huffman Coding Encoding Examples
- To encode 'DDDAE':
  - Consider 'E': encode it as '0101', then update the tree.

[Figure: E's weight increases to W=2]
Adaptive Huffman Coding Decoding Examples
- Decoding
  - Given the symbol set {'A', 'B', 'C', 'D', 'E'}
  - Initial adaptive Huffman table as follows:

[Figure: the same initial Huffman tree as in the encoding examples]
Adaptive Huffman Coding Decoding Examples
- To decode '01001001110100':
  - Walk the table from its root until '010' is decoded as 'C', then update the table.

[Figure: C's weight increases to W=2]
Adaptive Huffman Coding Decoding Examples
- To decode 'C01001110100':
  - Walk the table from its root until '010' is decoded as 'C', then update the table.

[Figure: C's weight increases to W=3]
Adaptive Huffman Coding Decoding Examples
- To decode 'CC01110100':
  - Walk the table from its root until '011' is decoded as 'C', then update the table.

[Figure: C's weight increases to W=4; C has been switched upward in the tree]
Adaptive Huffman Coding Decoding Examples
- To decode 'CCC10100':
  - Walk the table from its root until '1' is decoded as 'A', then update the table.

[Figure: A's weight increases from W=20 to W=21]
Adaptive Huffman Coding Decoding Examples
- To decode 'CCCA0100':
  - Walk the table from its root until '0100' is decoded as 'D', then update the table.

[Figure: D's weight increases to W=2; D and E are switched because 2 > 1]
Adaptive Huffman Coding Decoding Examples
- Final result:
  - Decoded symbols: 'CCCAD'
  - Final Huffman table:

[Figure: final tree with leaves A (W=21), B (W=3), C (W=4), D (W=2), E (W=1)]
Arithmetic Coding (1)
- Unlike the Huffman code, which is a block code (it assigns a variable, integer-length code to each block), the arithmetic code is a non-block code (also known as a tree code).
- Especially useful for sources with small alphabets, such as binary sources, and for alphabets with highly skewed probabilities.
Arithmetic Coding (2)
- Example of the problem with the Huffman code:

  Pr(A) = 0.95, Pr(B) = 0.03, Pr(C) = 0.02

  Symbol  Codeword
  A       0
  B       11
  C       10

  Entropy = 0.335 bits/symbol
  Bavg = 1.05 bits/symbol
  Coding redundancy = 0.715 bits/symbol

  A Huffman code always uses an integer number of bits per symbol.

- A higher-order model still does not work well:

  Symbol  Codeword
  AA      0
  AB      111
  AC      100
  BA      1101
  BB      110011
  BC      110001
  CA      101
  CB      110010
  CC      110000

  Entropy = 0.669 bits/symbol
  Bavg = 1.221 bits/symbol
  Coding redundancy = 0.552 bits/symbol

  Better, but still not efficient!
Arithmetic Coding (3)
- Example of arithmetic coding (I):

  Pr(A) = 0.2, Pr(B) = 0.4, Pr(C) = 0.2, Pr(D) = 0.1, Pr(E) = 0.1

[Figure: encoding 'BACD' by successive interval subdivision. Starting from [0, 1), each step zooms into the subinterval of the coded symbol: 'B' gives [0.2, 0.6), then 'A' gives [0.2, 0.28), then 'C' gives [0.248, 0.264), then 'D' gives [0.2608, 0.2624)]
Arithmetic Coding (4)
- Example of arithmetic coding (II):
  - to encode the sequence 'BACD', any number within the range [0.2608, 0.2624) can be used to represent the sequence.
  - in general, we choose the central value of the range as the output code, e.g. 0.2616.
  - Decoding procedure:
    - find the coding interval [BL, BU) containing the given (floating-point) code T and output the symbol corresponding to that interval.
Arithmetic Coding (5)
    - change the value of T by: T = (T - BL) / (BU - BL)

Decoding example for the given code 0.2616:

(1) T = 0.2616 ∈ [0.2, 0.6) → 'B', and T = (0.2616 - 0.2) / (0.6 - 0.2) = 0.154
(2) T = 0.154 ∈ [0.0, 0.2) → 'A', and T = (0.154 - 0.0) / (0.2 - 0.0) = 0.77
(3) T = 0.77 ∈ [0.6, 0.8) → 'C', and T = (0.77 - 0.6) / (0.8 - 0.6) = 0.85
Arithmetic Coding (6)
- Adaptive models:
  - forget factor method.
  - higher order model (context model).
- Algorithms for binary arithmetic coding:
  - Q coder, by IBM, 1988.
  - QM coder, by IBM, 1990.
    - included in the CCITT JBIG standard, 1992.
Arithmetic Coding Encoding Examples
- Given the symbol set and the probability of each symbol, encode 'DBEFA':

  Symbol   Probability  Assigned range
  A (EOF)  0.1          0.0-0.1
  B        0.1          0.1-0.2
  C        0.1          0.2-0.3
  D        0.2          0.3-0.5
  E        0.25         0.5-0.75
  F        0.25         0.75-1.0
Arithmetic Coding Encoding Examples
- Encode 'DBEFA':
  - D: [0.3, 0.5), interval width 0.5 - 0.3 = 0.2
  - DB: [0.3 + 0.2×0.1 = 0.32, 0.3 + 0.2×0.2 = 0.34)
Arithmetic Coding Encoding Examples
- Encode 'DBEFA':
  - DBE: [0.32 + 0.02×0.5 = 0.33, 0.32 + 0.02×0.75 = 0.335)
  - DBEF: [0.33375, 0.335)
  - DBEFA: [0.33375, 0.333875)
Arithmetic Coding Decoding Examples
- Decode '0.8302'
- Result: 'FDBA'

  0.8302 ∈ [0.75, 1.0) → F;      T = (0.8302 - 0.75) / 0.25 = 0.3208
  0.3208 ∈ [0.3, 0.5)  → D;      T = (0.3208 - 0.3) / 0.2 = 0.104
  0.104  ∈ [0.1, 0.2)  → B;      T = (0.104 - 0.1) / 0.1 = 0.04
  0.04   ∈ [0.0, 0.1)  → A (EOF)
QM Coder (1)
- An adaptive binary arithmetic coder
- The entropy coder of the JBIG standard, approved by CCITT in 1992
- A lineal descendant of the Q coder developed by IBM in 1988, enhanced by improvements in:
  - interval subdivision
  - probability estimation
QM Coder (2)
- Non-binary data must first be mapped to binary decisions by a binary decision tree:

  DATA   Binary decision tree
  0*     0
  0      1S0
  1      1S10
  2~3    1S110M
  4~7    1S1110MM
  8~15   1S11110MMM

  * original input data X = 0
  modified data D, where D = abs(X) - 1
  S: sign bit of X
  M: data bits of D
QM Coder (3)
A: coding interval
C: coding index, always pointing to the bottom of A
Qe: probability of the LPS
- Basic QM coder algorithm
  - After an MPS (More Probable Symbol):
    C = C
    A = A(1 - Qe)
  - After an LPS (Less Probable Symbol):
    C = C + A(1 - Qe)
    A = A × Qe
QM Coder (4)
- Example of the basic QM coder algorithm:

  Input data sequence: 0011; Qe = 0.25, MPS = 0; initial interval A0 = 1.5

[Figure: the interval [0, 1.5) shrinks step by step: A1 = A0×0.75, A2 = A1×0.75, A3 = A2×0.25, A4 = A3×0.25; the coding index C stays put on each MPS and jumps to the bottom of the LPS subinterval on each '1']
QM Coder (5)
- Modified QM coder algorithm
  - Potentially unbounded precision for A:
    - left-shift A and C by 1 bit whenever A < 0.75 (renormalization)
    - generate a new Qe after each renormalization
  - Conditional exchange whenever A < Qe:
    - exchange the roles of MPS and LPS
  - Elimination of multiplication:
    A × Qe ≈ Qe  if  1.5 > A ≥ 0.75
MPS Re-normalization

[Figure: interval axis from 0 to 1.5 with the renormalization threshold at 0.75; Qe(1) > Qe(2) > Qe(3) because of the continuous receiving of MPS symbols; the marks show the points where MPS renormalization occurs]
Algorithm of QM Coder
- After MPS:

  C = C;
  A = A - Qe;
  if (A < 0x8000) {
      if (A < Qe) {     /* conditional exchange */
          C += A;
          A = Qe;
      }
      change Qe state for MPS renormalization;
      output MSB of C;
      A <<= 1;  C <<= 1;
  }

- After LPS:

  C = C;
  A = A - Qe;
  if (A >= Qe) {        /* no conditional exchange */
      C += A;
      A = Qe;
  }
  change Qe state for LPS renormalization;
  output MSB of C;
  A <<= 1;  C <<= 1;
Model of Context of QM Coder
[Figure: two context models for the QM coder: (I) a causal context on the current K bit plane; (II) the parent-children context across the K, K-1 and K-2 bit planes used in EZW/LZC]
Lempel Ziv Coding (1)
- Dictionary techniques (I):
  - keep a list, or dictionary, of frequently occurring patterns.
  - use a more efficient method to code the patterns in the dictionary.
  - there are two types of dictionary: static and adaptive. In general, adaptive dictionaries perform better than static ones. The well-known adaptive schemes are LZ77, LZ78 and LZW.
Lempel Ziv Coding (2)
- Example of a dictionary technique:
  - Suppose we have text data consisting of four-character words drawn from 32 different characters. With a fixed-length code, we need 20 bits (32^4 = 2^20) to represent all four-character patterns.
  - If we put the 256 most likely four-character patterns in a dictionary, we need only 8 bits to encode those patterns, and 20 bits for the patterns not in the dictionary.
Lempel Ziv Coding (3)
- Example of a static dictionary:

  Assume the text data contains only 'a', 'b', 'c', 'd' and 'r'; build a static code table as below:

  Symbol  Code    Symbol  Code
  a       000     r       100
  b       001     ab      101
  c       010     ac      110
  d       011     ad      111

  Coding procedure:
  (1) read two characters from the input text.
  (2) if these two characters form a pattern in the dictionary, output its code and go to step (1).
  (3) otherwise, output the code of the first character, keep the second character as the first character of the next iteration, read one more character from the text, and go to step (2).
Lempel Ziv Coding (4)
- Most adaptive-dictionary-based techniques have their roots in two landmark papers by Jacob Ziv and Abraham Lempel in 1977 and 1978 (LZ77/LZ78).
- LZW, developed by Terry Welch in "A Technique for High-Performance Data Compression" (IEEE Computer, 1984), makes a significant improvement on LZ78.
Lempel Ziv Coding (5)
- Algorithm of LZW:
  - initialize the dictionary with an index for each single symbol; set the buffer to empty.
  - a new pattern is the combination of the old pattern in the buffer and the input symbol. If the new pattern is not in the dictionary, output the index of the old pattern in the buffer, add the new pattern to the dictionary, and reset the buffer to the input symbol.
  - otherwise, set the buffer to the new pattern; no output is necessary.
Lempel Ziv Coding (6)
- Example of LZW (I):
  (0) Assume the text data consists of 'A', 'B', 'C' and 'D' only. Initialize the dictionary with these four symbols:

      Symbol  Index   Symbol  Index
      A       1       C       3
      B       2       D       4

      We want to encode the text sequence 'ABABBAABBCD'.
  (1) Read 'A'; it is already in the dictionary (index 1), so set the buffer to 'A'.
Lempel Ziv Coding (7)
- Example of LZW (II):
  (2) Read 'B': 'AB' is not in the dictionary, so output the buffer ('A', index 1), add 'AB' to the dictionary with index 5, and reset the buffer to 'B'.
  (3) Read 'A': 'BA' is not in the dictionary, so output the buffer ('B', index 2), add 'BA' with index 6, and reset the buffer to 'A'.
  (4) Read 'B': 'AB' is in the dictionary; no output is necessary, the buffer becomes 'AB'.
  (5) Read 'B': 'ABB' is not in the dictionary, so output the buffer ('AB', index 5), add 'ABB' with index 7, and reset the buffer to 'B'.
Lempel Ziv Coding (8)
- Example of LZW (III):
  (6) Read 'A': 'BA' is in the dictionary; no output is necessary, the buffer becomes 'BA'.
  (7) Read 'A': 'BAA' is not in the dictionary, so output the buffer ('BA', index 6), add 'BAA' with index 8, and reset the buffer to 'A'.
  (8) Read 'B': 'AB' is in the dictionary; no output is necessary, the buffer becomes 'AB'.
  (9) Read 'B': 'ABB' is in the dictionary; no output is necessary, the buffer becomes 'ABB'.
  (10) Read 'C': 'ABBC' is not in the dictionary, so output the buffer ('ABB', index 7) and add 'ABBC' to the dictionary with index 9.
Lempel Ziv Coding (9)
- Applications:
  - the "compress" command on UNIX systems
  - "ARJ", "PKZIP" and "LHA" on PC systems
  - the telephone-network communication standard used in modems, V.42 bis, by CCITT
  - the image compression standard Graphics Interchange Format (GIF), by the CompuServe Information Service
Reference
- Khalid Sayood, "Introduction to Data Compression", Morgan Kaufmann, 1996.
- Mark Nelson, "The Data Compression Book", M&T Books, 1992.
- M. Rabbani and P. W. Jones, "Digital Image Compression Techniques", SPIE Optical Engineering Press, 1991.
- K. R. Rao and J. J. Hwang, "Techniques & Standards for Image, Video & Audio Coding", Prentice Hall PTR, 1996.
- P. K. Andleigh and K. Thakrar, "Multimedia Systems Design", Prentice Hall PTR, 1996.
- W. B. Pennebaker and J. L. Mitchell, "JPEG: Still Image Data Compression Standard", Van Nostrand Reinhold, 1993.