The LZ family
LZ77: LZR, LZSS, LZB, LZH (used by zip and unzip)
LZ78: LZW (Unix compress), LZC (Unix compress), LZT, LZMW, LZJ, LZFG
Overview of LZ family
To demonstrate: take a simple alphabet containing only two letters, a and b, and create a sample stream of text.
LZ family overview
Rule: separate this stream of characters into pieces of text so that each piece is the shortest string of characters that we have not seen so far.
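The splitting rule can be sketched in a few lines of Python. The function name `split_into_pieces` and the sample stream are assumptions made for illustration; the stream was picked so that its pieces agree with the codes for a, aa and aab quoted in these slides.

```python
def split_into_pieces(stream):
    """Cut the stream into pieces, each the shortest string not seen so far."""
    seen = set()
    pieces = []
    piece = ""
    for ch in stream:
        piece += ch
        if piece not in seen:
            # First time this string appears: it becomes a new piece.
            seen.add(piece)
            pieces.append(piece)
            piece = ""
    if piece:
        # The input ended inside a string that was already seen.
        pieces.append(piece)
    return pieces


# Assumed sample stream over the alphabet {a, b}.
print(split_into_pieces("aaababbbaaabaaaaaaabaabb"))
# ['a', 'aa', 'b', 'ab', 'bb', 'aaa', 'ba', 'aaaa', 'aab', 'aabb']
```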
Sender : The Compressor
Before compression, the pieces of text from the breaking-down process are indexed from 1 to n:
LZ
Indices are used to number the pieces of data. The empty string (start of text) has index 0. The piece indexed by 1 is a. Thus a, together with the initial empty string, must be numbered 0a. String 2, aa, will be numbered 1a, because it contains a, whose index is 1, and the new character a.
LZ
The process of renaming pieces of text starts to pay off. Small integers replace what were once long strings of characters. We can now throw away our old stream of text and send the encoded information to the receiver.
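A minimal sketch of the sender's side, assuming the encoded information is a list of (index, character) pairs. The name `lz78_encode` and the sample stream are illustrative assumptions, not taken from the slides.

```python
def lz78_encode(text):
    """Encode text as (index of longest known prefix, new character) pairs."""
    dictionary = {"": 0}  # the empty string (start of text) has index 0
    pairs = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            # Still a known string: keep extending it.
            phrase += ch
        else:
            # New piece = known prefix + one new character.
            pairs.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:
        # Input ended inside a known string: emit it as prefix + last character.
        pairs.append((dictionary[phrase[:-1]], phrase[-1]))
    return pairs


# Assumed sample stream over the alphabet {a, b}.
pairs = lz78_encode("aaababbbaaabaaaaaaabaabb")
print(" ".join(f"{index}{ch}" for index, ch in pairs))
# 0a 1a 0b 1b 3b 2a 3a 6a 2b 9b
```

The ninth pair, 2b, stands for the piece aab: piece 2 (aa) followed by the new character b.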
Bit Representation of Coded Information
Now we want to calculate the number of bits needed.
Each chunk is an integer and a letter.
The number of bits for the integer depends on the size of the table permitted in the dictionary.
Every character will occupy 8 bits because it will be represented in US ASCII format.
Compression good?
in a long string of text, the number of bits needed to transmit the coded information is small compared to the actual length of the text.
example: 12 bits to transmit the code 2b instead of 24 bits (8 + 8 + 8) needed for the actual text aab.
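The arithmetic can be checked with a short sketch. The 12-bit figure is consistent with a 4-bit index plus one 8-bit character; the 16-entry table size used below is an assumption, since the slides do not state it, and the helper names are hypothetical.

```python
def index_bits(table_size):
    """Bits needed to address indices 0 .. table_size - 1."""
    return max(1, (table_size - 1).bit_length())


def chunk_bits(table_size, char_bits=8):
    """Bits for one chunk: a dictionary index plus one character."""
    return index_bits(table_size) + char_bits


# Assumed dictionary of 16 entries, so each index fits in 4 bits.
print(chunk_bits(16))   # 12 bits for a chunk such as 2b
print(len("aab") * 8)   # 24 bits for the plain text aab
```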
Receiver: The Decompressor (Implementation)
The receiver knows exactly where the boundaries are, so there is no problem in reconstructing the stream of text.
It is preferable to decompress the file in one pass; otherwise, we will encounter a problem with temporary storage.
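A one-pass receiver might look like the sketch below: it rebuilds the dictionary while reading, so nothing has to be stored between passes. The name `lz78_decode` and the input pairs are assumptions for illustration.

```python
def lz78_decode(pairs):
    """Rebuild the text from (index, character) pairs in a single pass."""
    phrases = [""]  # index 0 is the empty string
    out = []
    for index, ch in pairs:
        # Each pair names an earlier piece plus one new character.
        phrase = phrases[index] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)


# Assumed encoded input: ten chunks over the alphabet {a, b}.
pairs = [(0, "a"), (1, "a"), (0, "b"), (1, "b"), (3, "b"),
         (2, "a"), (3, "a"), (6, "a"), (2, "b"), (9, "b")]
print(lz78_decode(pairs))
```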
Lempel-Ziv applet
See http://www.cs.mcgill.ca/~cs251/OldCourses/1997/topic23/#JavaApplet
Lempel-Ziv-Welch (LZW)
previous methods worked only on characters
LZW works by encoding strings
some strings are replaced by a single codeword
for now assume the codeword length is fixed (12 bits)
for 8-bit characters, the first 256 (or fewer) entries in the table are reserved for the characters
the rest of the table (codes 256-4095) represents strings
LZW compression
trick is that the string-to-codeword mapping is created dynamically by the encoder
also recreated dynamically by the decoder
need not pass the code table between the two
is a lossless compression algorithm
degree of compression hard to predict
depends on data, but gets better as the codeword table contains more strings
LZW encoder
Initialize table with single character strings
STRING = first input character
WHILE not end of input stream
    CHARACTER = next input character
    IF STRING + CHARACTER is in the string table
        STRING = STRING + CHARACTER
    ELSE
        Output the code for STRING
        Add STRING + CHARACTER to the string table
        STRING = CHARACTER
END WHILE
Output the code for STRING
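The pseudocode translates almost line for line into Python. This is a minimal sketch, assuming 8-bit characters and an unbounded table (the 12-bit limit is ignored here); the function name `lzw_encode` is illustrative.

```python
def lzw_encode(text):
    """Return the list of LZW codes for text (8-bit characters assumed)."""
    # Codes 0-255 are reserved for the single characters.
    table = {chr(i): i for i in range(256)}
    if not text:
        return []
    codes = []
    string = text[0]
    for character in text[1:]:
        if string + character in table:
            string = string + character
        else:
            codes.append(table[string])
            # The next free code is the current table size (256, 257, ...).
            table[string + character] = len(table)
            string = character
    codes.append(table[string])
    return codes


print(lzw_encode("BABAABAAA"))
# [66, 65, 256, 257, 65, 260]
```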
Demonstrations
Another animated LZ algorithm … http://www.data-compression.com/lempelziv.html
LZW encoder example
compress the string BABAABAAA
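The example can be traced step by step with a small script that records, for every output code, the string it stands for and the table entry added at that moment. This is a sketch under the same assumptions as the pseudocode (codes 0-255 preloaded with the single characters, non-empty input); `lzw_trace` is a hypothetical helper name.

```python
def lzw_trace(text):
    """Return rows of (string, output code, new code, new string)."""
    table = {chr(i): i for i in range(256)}
    rows = []
    string = text[0]  # assumes non-empty input
    for character in text[1:]:
        candidate = string + character
        if candidate in table:
            string = candidate
        else:
            new_code = len(table)
            table[candidate] = new_code
            rows.append((string, table[string], new_code, candidate))
            string = character
    # The final output adds no table entry.
    rows.append((string, table[string], None, None))
    return rows


for string, code, new_code, new_string in lzw_trace("BABAABAAA"):
    added = f"{new_code} = {new_string}" if new_code is not None else "-"
    print(f"output {code:>3} for {string!r:<5}  add {added}")
```

The output codes come out as 66, 65, 256, 257, 65, 260, and the table gains the entries 256 = BA, 257 = AB, 258 = BAA, 259 = ABA and 260 = AA.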
LZW decoder
Lempel-Ziv compression
a lossless compression algorithm
All encodings have the same length
But may represent more than one character
Uses a “dictionary” approach – keeps track of characters and character strings already encountered
LZW decoder example
decompress the string <66><65><256><257><65><260>
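A minimal decoder sketch for this example, assuming codes 0-255 stand for the single 8-bit characters and the table is unbounded. The one subtle case is a code that the decoder has not added yet (here 260): it can only mean the previous string followed by that string's own first character. The name `lzw_decode` is illustrative.

```python
def lzw_decode(codes):
    """Rebuild the text from LZW codes, recreating the table on the fly."""
    table = {i: chr(i) for i in range(256)}
    if not codes:
        return ""
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Code not in the table yet: previous string + its first character.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        # Mirror the encoder: add previous string + first character of entry.
        table[len(table)] = previous + entry[0]
        previous = entry
    return "".join(out)


print(lzw_decode([66, 65, 256, 257, 65, 260]))
# BABAABAAA
```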
LZW Issues
compression gets better as the code table grows
what happens when all 4096 locations in the string table are used?
A number of options, but encoder and decoder must agree to do the same thing:
    do not add any more entries to the table (use it as is)
    clear the codeword table and start again
    clear the codeword table and start again with a larger table/longer codewords (GIF format)
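The first option (stop adding entries once the table is full) is the easiest to sketch. Encoder and decoder below apply the same `max_codes` limit, so their tables stay in step. The function names and the `max_codes` parameter are assumptions for illustration, and 8-bit characters are assumed.

```python
def lzw_encode_frozen(text, max_codes=4096):
    """LZW encoder that stops growing its table at max_codes entries."""
    table = {chr(i): i for i in range(256)}
    codes = []
    string = ""
    for character in text:
        if string + character in table:
            string += character
        else:
            codes.append(table[string])
            if len(table) < max_codes:  # table full: use it as is
                table[string + character] = len(table)
            string = character
    if string:
        codes.append(table[string])
    return codes


def lzw_decode_frozen(codes, max_codes=4096):
    """Matching decoder: must apply the same limit as the encoder."""
    table = {i: chr(i) for i in range(256)}
    if not codes:
        return ""
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        # A code not yet in the table can only be previous + previous[0].
        entry = table[code] if code in table else previous + previous[0]
        out.append(entry)
        if len(table) < max_codes:
            table[len(table)] = previous + entry[0]
        previous = entry
    return "".join(out)


# A deliberately tiny table (260 entries) to force the "full" case.
text = "BABAABAAA" * 20
codes = lzw_encode_frozen(text, max_codes=260)
print(max(codes) < 260, lzw_decode_frozen(codes, max_codes=260) == text)
```

If the two sides used different limits, the decoder would assign codes the encoder never defined, which is why both must agree on the policy.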
LZW advantages/disadvantages
advantages
    simple, fast and good compression
    can do compression in one pass
    dynamic codeword table built for each file
    decompression recreates the codeword table so it does not need to be passed
disadvantages
    not the optimum compression ratio
    actual compression hard to predict
Entropy methods
all previous methods are lossless and entropy based
lossless methods are essential for computer data (zip, gzip, etc.)
a combination of run-length encoding and Huffman coding is a standard tool
they are often used as a subroutine by lossy methods (JPEG, MPEG)