Text Compression
Spring 2007, CSE, POSTECH
Data Compression
Deals with reducing the size of data
- Reduce storage space and hence storage cost
- Reduce time to retrieve and transmit data
Compression ratio = original data size / compressed data size
File coding is done by a compressor and decoding by a decompressor
Lossless and Lossy Compression
compressedData = compress(originalData)
decompressedData = decompress(compressedData)
When originalData = decompressedData, the compression is lossless.
When originalData != decompressedData, the compression is lossy.
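The lossless condition can be checked directly with a round trip. The sketch below uses Python's standard zlib module purely as a stand-in compressor (it is not the LZW method covered in these slides), and the sample data is an assumption chosen for illustration.

```python
import zlib

# Illustrative input (an assumption, not data from the slides).
original_data = b"abababbabaabbabbaabba" * 10

compressed_data = zlib.compress(original_data)
decompressed_data = zlib.decompress(compressed_data)

# Lossless: the round trip reproduces the input exactly.
is_lossless = (original_data == decompressed_data)

# Sizes before and after compression, in bytes.
original_size = len(original_data)
compressed_size = len(compressed_data)
```

A lossy compressor would fail the equality check by design, which is why it is unsuitable for text.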
Lossless and Lossy Compression
Lossless compression is essential in applications such as text file compression.
- e.g., ZIP
Lossy compressors generally obtain much higher compression ratios than do lossless compressors.
- e.g., JPG, MPEG
Lossy compression is acceptable in many imaging applications.
- In video transmissions, a slight loss in the transmitted video is not noticed by the human eye.
Text Compression
Lossless compression is essential in text compression
Popular text compressors such as zip and compress are based on Lempel-Ziv methods; compress uses the LZW (Lempel-Ziv-Welch) method
- The method is simple and employs hashing for storing the code table
LZW Compression
Character strings in the original text are replaced by codes that are mapped dynamically
The mapping between character strings and their codes is stored in a dictionary
Each dictionary entry has two fields: key and code
The code table is not stored in the compressed data because it can be reconstructed from the compressed text during decompression
LZW Compression Algorithm
Scan the text from left to right
Find the longest prefix p for which there is a code in the code table
Represent p by its code pCode
Assign the next available code number to pc, where c is the next character in the text that is to be compressed
See Programs 7.16, 7.17, 7.18, 7.19
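The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's Programs 7.16 to 7.19: the function name lzw_compress and the alphabet parameter are assumptions, codes are returned as a list of integers rather than packed into bits, and the text is assumed to contain only characters from the alphabet.

```python
def lzw_compress(text, alphabet):
    """Return the list of LZW codes for text (illustrative sketch)."""
    # Initial code table: alphabet characters get codes 0, 1, 2, ...
    code_table = {ch: code for code, ch in enumerate(alphabet)}
    next_code = len(code_table)
    codes = []
    p = ""  # longest prefix found so far that has a code
    for c in text:
        if p + c in code_table:
            # pc already has a code, so keep extending the prefix.
            p += c
        else:
            # Represent p by its code and assign the next code to pc.
            codes.append(code_table[p])
            code_table[p + c] = next_code
            next_code += 1
            p = c
    if p:
        # Emit the code for the final prefix (c is null here).
        codes.append(code_table[p])
    return codes
```

For the two-letter example text used in these slides, lzw_compress("abababbabaabbabbaabba", "ab") returns [0, 1, 2, 2, 3, 3, 5, 8, 8].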
LZW Compression Example
Compress abababbabaabbabbaabba
Assume the letters in the text are limited to {a, b}.
- In practice, the alphabet may be the 256-character ASCII set.
The characters in the alphabet are assigned code numbers beginning at 0.
The initial code table is:
key   code
a     0
b     1
LZW Compression Example
Original text = abababbabaabbabbaabba
p = a, pCode = 0, c = b
Represent “a” by 0 and enter “ab” into code table
Compressed text = 0
LZW Compression
Original text = abababbabaabbabbaabba
Compressed text = 0
p = b, pCode = 1, c = a
Represent “b” by 1 and enter “ba” into code table
Compressed text = 01
LZW Compression
Original text = abababbabaabbabbaabba
Compressed text = 01
p = ab, pCode = 2, c = a
Represent “ab” by 2 and enter “aba” into code table.
Compressed text = 012
LZW Compression
Original text = abababbabaabbabbaabba
Compressed text = 012
p = ab, pCode = 2, c = b
Represent “ab” by 2 and enter “abb” into code table.
Compressed text = 0122
LZW Compression
Original text = abababbabaabbabbaabba
Compressed text = 0122
p = ba, pCode = 3, c = b
Represent “ba” by 3 and enter “bab” into code table.
Compressed text = 01223
LZW Compression
Original text = abababbabaabbabbaabba
Compressed text = 01223
p = ba, pCode = 3, c = a
Represent “ba” by 3 and enter “baa” into code table.
Compressed text = 012233
LZW Compression
Original text = abababbabaabbabbaabba
Compressed text = 012233
p = abb, pCode = 5, c = a
Represent “abb” by 5 and enter “abba” into code table.
Compressed text = 0122335
LZW Compression
Original text = abababbabaabbabbaabba
Compressed text = 0122335
p = abba, pCode = 8, c = a
Represent “abba” by 8 and enter “abbaa” into code table
Compressed text = 01223358
LZW Compression
Original text = abababbabaabbabbaabba
Compressed text = 01223358
p = abba, pCode = 8, c = null
Represent “abba” by 8
Compressed text = 012233588
Code Table Representation
Dictionary
- Pairs are (key, element) = (key, code).
- Operations are: get(key) and put(key, code).
Use a hash table
- But, key has a variable size
- Takes time to generate a hash key and compare the actual key
Can we have fixed length keys? If so, how?
Code Table Representation
Use a hash table
- Convert variable length keys into fixed length keys
- Each key has the form pc, where the string p is a key that is already in the table
- Replace the key pc with (pCode)c
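The replacement of pc by (pCode)c can be sketched as follows. This is a hedged illustration in Python rather than the course's own code: the function name lzw_compress_fixed_keys and the char_code helper table are assumptions, and the text is assumed to use only characters from the given alphabet. Every hash-table key is the fixed-size pair (pCode, c), so no key grows with the length of the string it stands for.

```python
def lzw_compress_fixed_keys(text, alphabet):
    """LZW compression with fixed-length (pCode, c) keys (illustrative sketch)."""
    # Single characters map directly to their codes 0, 1, 2, ...
    char_code = {ch: code for code, ch in enumerate(alphabet)}
    next_code = len(char_code)
    # Hash table keyed by the pair (pCode, c) instead of the string pc.
    code_table = {}
    codes = []
    chars = iter(text)
    first = next(chars, None)
    if first is None:
        return codes
    p_code = char_code[first]
    for c in chars:
        key = (p_code, c)
        if key in code_table:
            # pc is in the table: its code becomes the new pCode.
            p_code = code_table[key]
        else:
            # Emit pCode, then assign the next code to (pCode)c.
            codes.append(p_code)
            code_table[key] = next_code
            next_code += 1
            p_code = char_code[c]
    codes.append(p_code)
    return codes
```

With a bounded code width and an 8-bit alphabet, the pair could also be packed into one integer such as p_code * 256 + ord(c); the tuple form is used here only for readability.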
LZW Decompression
Compressed text = 012233588
Convert codes to text from left to right
0 represents a
Decompressed text = a
pCode = 0 and p = a
p = a followed by next text character (c) is entered into the code table
LZW Decompression
Compressed text = 012233588
1 represents b
Decompressed text = ab
pCode = 1 and p = b
lastP = a followed by first character of p is entered into the code table.
LZW Decompression
Compressed text = 012233588
2 represents ab
Decompressed text = abab
pCode = 2 and p = ab
lastP = b followed by first character of p is entered into the code table.
LZW Decompression
Compressed text = 012233588
2 represents ab
Decompressed text = ababab
pCode = 2 and p = ab
lastP = ab followed by first character of p is entered into the code table.
LZW Decompression
Compressed text = 012233588
3 represents ba
Decompressed text = abababba
pCode = 3 and p = ba
lastP = ab followed by first character of p is entered into the code table.
LZW Decompression
Compressed text = 012233588
3 represents ba
Decompressed text = abababbaba
pCode = 3 and p = ba
lastP = ba followed by first character of p is entered into the code table.
LZW Decompression
Compressed text = 012233588
5 represents abb
Decompressed text = abababbabaabb
pCode = 5 and p = abb
lastP = ba followed by first character of p is entered into the code table.
LZW Decompression
Compressed text = 012233588
8 represents ???
When a code is not in the table, its key is lastP followed by first character of lastP.
lastP = abb
So 8 represents abba.
LZW Decompression
Compressed text = 012233588
8 represents abba.
Decompressed text = abababbabaabbabbaabba
pCode = 8 and p = abba
lastP = abba followed by first character of p is entered into the code table
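The decompression walk-through can be condensed into a short Python sketch. The names lzw_decompress and alphabet are assumptions made for illustration, and codes are taken as a list of integers. The branch for a code that is not yet in the table implements the rule from the slides: its key is lastP followed by the first character of lastP.

```python
def lzw_decompress(codes, alphabet):
    """Rebuild the text from a list of LZW codes (illustrative sketch)."""
    # Initial code table: codes 0, 1, 2, ... map to the alphabet characters.
    code_table = {code: ch for code, ch in enumerate(alphabet)}
    next_code = len(code_table)
    if not codes:
        return ""
    last_p = code_table[codes[0]]
    pieces = [last_p]
    for p_code in codes[1:]:
        if p_code in code_table:
            p = code_table[p_code]
        elif p_code == next_code:
            # Code not in the table yet: lastP + first character of lastP.
            p = last_p + last_p[0]
        else:
            raise ValueError("invalid LZW code: %d" % p_code)
        pieces.append(p)
        # Enter lastP followed by the first character of p.
        code_table[next_code] = last_p + p[0]
        next_code += 1
        last_p = p
    return "".join(pieces)
```

Calling lzw_decompress([0, 1, 2, 2, 3, 3, 5, 8, 8], "ab") returns "abababbabaabbabbaabba", the original text of the example.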
Code Table Representation
Dictionary
- Pairs are (key, element) = (code, what the code represents) = (code, codeKey)
- Operations are: get(key) and put(key, code)
Keys are integers 0, 1, 2, …
Use a 1D array codeTable.
- codeTable[code] = codeKey
- Each code key has the form pc, where the string p is a code key that is already in the table.
- Replace pc with (pCode)c.
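One way to picture the array representation is sketched below. It is an assumption-laden illustration, not the textbook's program: the names lzw_decompress_array and expand are hypothetical, and the prefix code -1 is a marker chosen here for single-character entries. Each array slot stores the pair (pCode, c), and the string for a code is recovered by following prefix codes back to a single character.

```python
def lzw_decompress_array(codes, alphabet):
    """LZW decompression with an array of (pCode, c) entries (illustrative sketch)."""
    # code_table[code] = (prefix code, last character); -1 marks "no prefix".
    code_table = [(-1, ch) for ch in alphabet]

    def expand(code):
        # Follow prefix codes back to a single character, then reverse.
        chars = []
        while code != -1:
            code, ch = code_table[code]
            chars.append(ch)
        chars.reverse()
        return "".join(chars)

    if not codes:
        return ""
    last_code = codes[0]
    last_p = expand(last_code)
    pieces = [last_p]
    for p_code in codes[1:]:
        if p_code < len(code_table):
            p = expand(p_code)
        else:
            # Code not in the table yet: lastP + first character of lastP.
            p = last_p + last_p[0]
        pieces.append(p)
        # New entry: (code of lastP, first character of p).
        code_table.append((last_code, p[0]))
        last_code = p_code
        last_p = p
    return "".join(pieces)
```

Each entry has a fixed size regardless of how long the string it represents is, and expanding a code costs time proportional to the length of that string.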
Time Complexity
Compression
- O(n) expected time, where n is the length of the text that is being compressed.
Decompression
- O(n) time, where n is the length of the decompressed text.
READING
See Programs 7.20, 7.21, 7.22, 7.23, 7.24
Read Section 7.5
Useful site - http://datacompression.info/