Data Compression
• File Compression
• Huffman Tries
[Figure: encoding trie with leaves A, B, C, D, R and edges labeled 0 and 1]
ABRACADABRA: 01011011010000101001011011010
File Compression
• Text files are usually stored by representing each character with an 8-bit ASCII code (type man ascii in a Unix shell to see the ASCII encoding)
• The ASCII encoding is an example of fixed-length encoding, where each character is represented with the same number of bits
• In order to reduce the space required to store a text file, we can exploit the fact that some characters are more likely to occur than others
• Variable-length encoding uses binary codes of different lengths for different characters; thus, we can assign fewer bits to frequently used characters and more bits to rarely used characters.
File Compression: Example
• An encoding example
– text: java
– encoding: a = “0”, j = “11”, v = “10”
– encoded text: 110100 (6 bits)
• How to decode (problems in ambiguity)?
– encoding: a = “0”, j = “01”, v = “00”
– encoded text: 010000 (6 bits)
– could be "java", or "jvv", or "jaaaa"
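The ambiguity can be checked mechanically. The following is a minimal sketch, not part of the original slides: the helper names `encode` and `all_decodings` are made up for illustration. It encodes "java" with the first code and then lists every string that the second code could decode 010000 into.

```python
def encode(text, codes):
    """Concatenate the code of every character in text."""
    return "".join(codes[ch] for ch in text)


def all_decodings(bits, codes):
    """Return every string whose encoding under codes equals bits."""
    if not bits:
        return [""]
    results = []
    for ch, code in codes.items():
        if bits.startswith(code):
            # Consume this code and decode the rest in every possible way.
            for rest in all_decodings(bits[len(code):], codes):
                results.append(ch + rest)
    return results


good = {"a": "0", "j": "11", "v": "10"}  # first encoding from the slide
bad = {"a": "0", "j": "01", "v": "00"}   # second (ambiguous) encoding

print(encode("java", good))                  # the 6-bit encoding of "java"
print(sorted(all_decodings("110100", good)))  # a single candidate
print(sorted(all_decodings("010000", bad)))   # several candidates
```

With the first code there is exactly one way to read the bits back; with the second code the list contains the three readings named on the slide along with others.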
Encoding Trie
• To prevent ambiguities in decoding, we require that the encoding satisfies the prefix rule: no code is a prefix of another.
– a = “0”, j = “11”, v = “10” satisfies the prefix rule
– a = “0”, j = “01”, v = “00” does not satisfy the prefix rule (the code of 'a' is a prefix of the codes of 'j' and 'v')
• We use an encoding trie to satisfy this prefix rule.
– the characters are stored at the external nodes
– a left child (edge) means 0
– a right child (edge) means 1
A = 010
B = 11
C = 00
D = 10
R = 011
[Figure: encoding trie for the codes above, with leaves A, B, C, D, R and edges labeled 0 and 1]
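Both ideas on this slide fit in a few lines of code. The sketch below is illustrative only: the function names and the nested-dictionary trie (keys "0" and "1", characters at the leaves) are assumptions, not something the slides prescribe. It tests the prefix rule and builds a trie from the code table A = 010, B = 11, C = 00, D = 10, R = 011.

```python
def satisfies_prefix_rule(codes):
    """True if no code is a prefix of (or equal to) another code."""
    words = sorted(codes.values())
    # In sorted order, a code that prefixes any other code is always
    # immediately followed by a code that starts with it.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def build_trie(codes):
    """Build a nested dict: internal nodes are dicts, leaves are characters."""
    root = {}
    for ch, code in codes.items():
        node = root
        for bit in code[:-1]:
            node = node.setdefault(bit, {})  # walk or create internal nodes
        node[code[-1]] = ch                  # the last bit leads to the leaf
    return root


codes = {"A": "010", "B": "11", "C": "00", "D": "10", "R": "011"}
print(satisfies_prefix_rule(codes))
print(satisfies_prefix_rule({"a": "0", "j": "01", "v": "00"}))
print(build_trie(codes))
```

`build_trie` assumes its input already satisfies the prefix rule; it does not detect a code that would overwrite an internal node.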
Example of Decoding
• trie: A = 010, B = 11, C = 00, D = 10, R = 011
[Figure: the corresponding encoding trie, with leaves A, B, C, D, R]
• encoded text: 01011011010000101001011011010
• text: ABRACADABRA
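Decoding walks the trie from the root, follows one edge per bit, and emits a character each time it reaches an external node. The sketch below assumes a nested-dictionary representation of the trie (an illustrative choice, written out by hand from the code table A = 010, B = 11, C = 00, D = 10, R = 011).

```python
# Trie for A = 010, B = 11, C = 00, D = 10, R = 011.
# Internal nodes are dicts keyed by "0"/"1"; leaves are characters.
TRIE = {
    "0": {"0": "C", "1": {"0": "A", "1": "R"}},
    "1": {"0": "D", "1": "B"},
}


def decode(bits, trie):
    """Decode a bit string by repeatedly walking from the root to a leaf."""
    out = []
    node = trie
    for bit in bits:
        node = node[bit]           # follow the edge labeled with this bit
        if isinstance(node, str):  # reached an external node
            out.append(node)
            node = trie            # restart at the root
    if node is not trie:
        raise ValueError("bit string ends in the middle of a code")
    return "".join(out)


decoded = decode("01011011010000101001011011010", TRIE)
print(decoded)
```

Because the code satisfies the prefix rule, the walk never has to backtrack: each bit determines exactly one edge.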
Trie this!
• encoded text: 10000111110010011000111011110001010100110100
[Figure: encoding trie with leaves E, N, K, C, S, B, T, W, R, O and edges labeled 0 and 1]
Optimal Compression
• An issue with encoding tries is to ensure that the encoded text is as short as possible:
[Figure: first encoding trie, with leaves A, B, C, D, R]
ABRACADABRA: 01011011010000101001011011010 (29 bits)
[Figure: second encoding trie, with leaves A, B, C, D, R arranged differently]
ABRACADABRA: 001011000100001100101100 (24 bits)
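The two bit counts can be reproduced by summing, for each character, its frequency times its code length. In the sketch below the first code table is the one given earlier in the slides; the second table (A = 00, B = 10, R = 11, C = 010, D = 011) is an assumption inferred from the 24-bit string, since the second trie survives only as a figure.

```python
from collections import Counter


def encode(text, codes):
    """Concatenate the code of every character in text."""
    return "".join(codes[ch] for ch in text)


def encoded_length(text, codes):
    """Number of bits needed: sum of frequency times code length."""
    freq = Counter(text)
    return sum(freq[ch] * len(codes[ch]) for ch in freq)


first = {"A": "010", "B": "11", "C": "00", "D": "10", "R": "011"}
# Assumption: inferred from the 24-bit string, not stated in the slides.
second = {"A": "00", "B": "10", "R": "11", "C": "010", "D": "011"}

text = "ABRACADABRA"
print(encoded_length(text, first), encoded_length(text, second))
print(encode(text, second))
```

The second code is shorter because the most frequent character, A, gets a 2-bit code instead of a 3-bit one.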
Huffman Encoding Trie
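• text: ABRACADABRA
character: A B R C D
frequency: 5 2 2 1 1
[Figure: first construction steps. C (1) and D (1) merge into a subtree of weight 2; B (2) and R (2) merge into a subtree of weight 4; these two subtrees merge into a subtree of weight 6, leaving A (5) and the weight-6 subtree]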
Huffman Encoding Trie (contd.)
[Figure: last construction step. A (5) and the weight-6 subtree merge into the root of weight 11; edges are then labeled 0 and 1]
Final Huffman Encoding Trie
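text: A B R A C A D A B R A
encoded text: 0 100 101 0 110 0 111 0 100 101 0 (23 bits)
[Figure: final Huffman encoding trie with root weight 11; A (5) is a child of the root, and B, R, C, D sit three levels down under the weight-6 subtree]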
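As a check on the 23-bit figure, the encoded length is the sum over characters of frequency times code length. The worked sum below uses f(c) for the frequency, as in the slides, and introduces ℓ(c) for the code length (a symbol assumed here for illustration). The frequencies are those of ABRACADABRA (A 5, B 2, R 2, C 1, D 1); the code lengths are read off the encoding above (1 bit for A, 3 bits for each of B, R, C, D).

```latex
\[
\sum_{c} f(c)\,\ell(c)
  = \underbrace{5 \cdot 1}_{A}
  + \underbrace{2 \cdot 3}_{B}
  + \underbrace{2 \cdot 3}_{R}
  + \underbrace{1 \cdot 3}_{C}
  + \underbrace{1 \cdot 3}_{D}
  = 5 + 6 + 6 + 3 + 3 = 23 \text{ bits}
\]
```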
Another Huffman Encoding Trie
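• text: ABRACADABRA
character: A B R C D
frequency: 5 2 2 1 1
[Figure: alternative construction. C (1) and D (1) merge into a subtree of weight 2, which then merges with R (2) into a subtree of weight 4]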
Another Huffman Encoding Trie (contd.)
[Figure: the weight-4 subtree merges with B (2) into a subtree of weight 6, leaving A (5) and the weight-6 subtree]
Another Huffman Encoding Trie (contd.)
[Figure: A (5) and the weight-6 subtree merge into the root of weight 11]
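Another Huffman Encoding Trie (contd.)
text: A B R A C A D A B R A
encoded text: 0 10 110 0 1110 0 1111 0 10 110 0 (23 bits)
[Figure: final trie of the alternative construction, with root weight 11 and edges labeled 0 and 1; A = 0, B = 10, R = 110, and C and D are siblings below 111]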
Construction Algorithm
Algorithm Huffman(X):
  Input: String X of length n
  Output: Encoding trie for X
  Compute the frequency f(c) of each character c of X.
  Initialize a priority queue Q.
  for each distinct character c in X do
    Create a single-node tree T storing c
    Q.insertItem(f(c), T)
  while Q.size() > 1 do
    f1 ← Q.minKey()
    T1 ← Q.removeMinElement()
    f2 ← Q.minKey()
    T2 ← Q.removeMinElement()
    Create a new tree T with left subtree T1 and right subtree T2.
    Q.insertItem(f1 + f2, T)
  return tree Q.removeMinElement()
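The pseudocode maps almost line for line onto Python's `heapq` module. The following is a minimal sketch under a few assumptions that are not in the slides: the trie is represented as nested `(left, right)` tuples with characters at the leaves, a running counter breaks ties between equal frequencies, the input is non-empty, and the helper names are made up for illustration.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_trie(text):
    """Build a Huffman trie for a non-empty text as nested (left, right) tuples."""
    freq = Counter(text)  # f(c) for each distinct character c
    tie = count()         # tie-breaker so the heap never compares trees
    heap = [(f, next(tie), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)  # the two lowest-frequency trees
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (t1, t2)))
    return heap[0][2]


def codes_from_trie(trie, prefix=""):
    """Read the codes off the trie: left edge = 0, right edge = 1."""
    if isinstance(trie, str):
        return {trie: prefix or "0"}  # a one-character text still needs one bit
    left, right = trie
    codes = codes_from_trie(left, prefix + "0")
    codes.update(codes_from_trie(right, prefix + "1"))
    return codes


text = "ABRACADABRA"
codes = codes_from_trie(huffman_trie(text))
total_bits = sum(len(codes[ch]) for ch in text)
print(codes, total_bits)
```

Different tie-breaking rules give different tries, which is why the slides can show two distinct Huffman tries for the same text; the total encoded length is the same in each case.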
Construction Algorithm (contd)
• Running time for a text of length n with k distinct characters: O(n + k log k)
• Typically, k is O(1) (e.g., ASCII characters) and the algorithm runs in O(n) time.
• With a Huffman encoding trie, the encoded text has minimal length among all prefix codes that encode one character at a time.
Image Compression
• We can also use Huffman encoding for binary files (bitmaps, executables, etc.)
• Common groups of bits are stored at the leaves
• Example of an encoding suitable for b/w bitmaps
[Figure: encoding trie whose leaves are the 3-bit groups 000, 001, 010, 011, 100, 101, 110, 111, with edges labeled 0 and 1]
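To illustrate the idea of storing groups of bits at the leaves, the sketch below splits a bit string into 3-bit groups, counts how often each group occurs, and computes Huffman code lengths for the groups. Everything in it is an assumption made for illustration: the sample bit string is made up, the helper names are hypothetical, and the slide's own trie may assign different codes.

```python
import heapq
from collections import Counter


def groups_of(bits, size=3):
    """Split a bit string into consecutive groups of `size` bits."""
    if len(bits) % size:
        raise ValueError("length must be a multiple of the group size")
    return [bits[i:i + size] for i in range(0, len(bits), size)]


def huffman_code_lengths(freq):
    """Return {symbol: code length} for a Huffman code over freq."""
    heap = [(f, i, [sym]) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    tie = len(heap)  # unique tie-breaker for merged entries
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:
            lengths[sym] += 1  # each merge puts these leaves one level deeper
        heapq.heappush(heap, (f1 + f2, tie, s1 + s2))
        tie += 1
    return lengths


# Hypothetical bitmap data: mostly white (000), some black (111), one mixed group.
bits = "000" * 6 + "111" * 3 + "010"
freq = Counter(groups_of(bits))
lengths = huffman_code_lengths(freq)
compressed_bits = sum(freq[g] * lengths[g] for g in freq)
print(lengths, len(bits), compressed_bits)
```

The saving depends entirely on how skewed the group frequencies are; a bitmap in which all eight groups are equally common would not shrink at all.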