Tutorial - Intercon 2014
Principles of Data Compression: Theory and Applications
Dr. Daniel Leon-Salas
IEEE Intercon 2014, Arequipa

Motivation: The Information Revolution
• Consider a 3-minute song: assuming two channels, a 16-bit resolution, and a sampling rate of 48 kHz, it takes about 33 MB of disk space to store the song.
• Consider a 5-megapixel camera: assuming an 8-bit resolution per pixel, it takes 5 MB of disk space to store one picture.
• One second of video using the CCIR 601 format (720×485) needs more than 30 MB of storage space.
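As a quick check of the first figure: 180 s × 48,000 samples/s × 2 channels × 2 bytes/sample = 34,560,000 bytes ≈ 33 MB (taking 1 MB = 2²⁰ bytes).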
Introduction
• If data generation is growing at an explosive rate, why not focus on improving transmission and storage technologies?
• Transmission and storage technologies are improving, but not at the rate at which data is being generated.
• This is especially true for wireless communications where the radio spectrum is limited.
• Data compression is the art or science of representing information in a compact form.
• Data compression is performed by identifying and exploiting structure and redundancies in the data.
• Data can be audio samples, images, or text files; it can be generated by sensors, scientific instruments, social networks, markets, etc.
• Consider Morse code, developed in the 19th century, in which letters are encoded with dots and dashes.
• Some letters (e and a) occur more often than others (q and j).
• Letters that occur more frequently are encoded using shorter sequences: e → .   a → .-
• Letters that occur less frequently are encoded using longer sequences: q → --.-   j → .---
• In this case the statistical structure of the data was exploited.
• There are many other types of structure in data that can be exploited to achieve compression.
• In speech, the physical structure of our vocal tract determines the kinds of sounds that we can produce. Instead of sending speech samples, we can send information about the vocal tract to the receiver.
• We can also exploit characteristics of the end user of the data.
• In many cases, when transmitting images or audio, the end user is a human.
• Humans have limited hearing and vision abilities.
• We can exploit the limitations of human perception to discard irrelevant information and obtain higher compression.
Compression and Reconstruction

[Diagram: the original data is passed through a compression algorithm to produce a compressed representation; a reconstruction (decompression) algorithm later produces the reconstructed data.]
Lossless Compression

• Lossless compression involves no loss of information.
• The recovered data is an exact copy of the original.
• Useful in applications that cannot tolerate any difference:
  – medical images
  – scientific data
  – financial records
  – computer programs
Lossy Compression

• In lossy compression some loss of information is tolerated.
• The original data cannot be recovered exactly, but much higher compression ratios can be achieved.
• Useful in applications where some loss of information is not critical:
  – speech coding
  – telephone communications
  – video coding
  – digital photography
Compression Performance

• Compression ratio (CR):

  $\mathrm{CR} = \dfrac{\text{\# bits required to represent the data without compression}}{\text{\# bits required to represent the data with compression}}$

• Distortion (for lossy compression):

  $\mathrm{MSE} = \dfrac{1}{N}\,\lVert X - \hat{X} \rVert_2^2$

  $\mathrm{PSNR\,(dB)} = 10 \log_{10}\dfrac{X_{\max}^2}{\mathrm{MSE}}$

• Rate: average number of bits per sample or symbol.
Example 1

Let's consider the following input sequence:

$X = [9, 11, 11, 11, 14, 13, 15, 17, 16, 17, 20, 21]$

To encode this sequence using a plain binary code, we would need 5 bits per number, for a total of 60 bits.

K. Sayood, Introduction to Data Compression, 2nd edition, Morgan Kaufmann
If we use the model

$\hat{X}[n] = n + 8$

and compute the residual

$e = X - \hat{X} = [0, 1, 0, -1, 1, -1, 0, 1, -1, -1, 1, 1]$

then the residual consists of only three values {−1, 0, 1}, which can be encoded using 2 bits per number, for a total of 24 bits (a saving of 36 bits).
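A minimal Python sketch of this residual computation (variable names are mine, not from the slides):

```python
X = [9, 11, 11, 11, 14, 13, 15, 17, 16, 17, 20, 21]
X_hat = [n + 8 for n in range(1, len(X) + 1)]   # model: X̂[n] = n + 8
e = [x - p for x, p in zip(X, X_hat)]           # prediction residual

print(e)           # [0, 1, 0, -1, 1, -1, 0, 1, -1, -1, 1, 1]
print(5 * len(X))  # 60 bits with plain 5-bit binary
print(2 * len(e))  # 24 bits at 2 bits per residual value
```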
Example 2

• Input sequence: a_barayaran_array_ran_far_faar_faaar_away
• The sequence is made of eight different characters (symbols): a, b, f, n, r, w, y, _
• Hence, we can use three bits per symbol to encode the sequence, resulting in a total of 41 × 3 = 123 bits for the entire sequence.
• However, we can use fewer bits if we realize that some symbols occur more frequently than others.
• We can use fewer bits to encode the more frequent symbols.

K. Sayood, Introduction to Data Compression, 2nd edition, Morgan Kaufmann
Using variable-length codes we can encode the sequence using only 103 bits:

  Input character | Frequency | Variable-length code | Fixed-length code
  a               | 16        | 1                    | 000
  _               | 7         | 001                  | 001
  b               | 1         | 01100                | 010
  f               | 3         | 0100                 | 011
  n               | 2         | 0111                 | 100
  r               | 8         | 000                  | 101
  w               | 1         | 01101                | 110
  y               | 3         | 0101                 | 111

With these frequencies the variable-length total is 16×1 + 7×3 + 1×5 + 3×4 + 2×4 + 8×3 + 1×5 + 3×4 = 103 bits, versus 123 bits for the fixed-length code.
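The counts and totals can be verified with a few lines of Python (a sketch; the code table is transcribed from above):

```python
from collections import Counter

seq = "a_barayaran_array_ran_far_faar_faaar_away"
code = {"a": "1", "_": "001", "b": "01100", "f": "0100",
        "n": "0111", "r": "000", "w": "01101", "y": "0101"}

print(Counter(seq))                    # symbol frequencies
print(3 * len(seq))                    # fixed-length total: 123 bits
print(sum(len(code[c]) for c in seq))  # variable-length total: 103 bits
```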
Statistical Redundancy

• Statistical redundancy was exploited in Example 2 to build a code for the input sequence.
• When compressing text, statistical redundancy can be extended from individual characters to whole words: the dictionary technique.
• Examples of compression solutions that use the dictionary technique include the Lempel-Ziv (LZ) algorithms (LZ77), gzip, Zip, PNG, and PKZip.
Information and Entropy

• Information can be defined as a message that helps to resolve uncertainty.
• In information theory, information is taken as a sequence of symbols from an alphabet.
• Entropy is a measure of information.

[Diagram: a source with alphabet A = {a1, a2, …, an} emits a message, i.e., a sequence of symbols such as a1 a2 a3 a6 a8 a5 a3 a4.]

First-order entropy of the source:

$H(A) = -\sum_{i=1}^{n} P(a_i) \log P(a_i)$
Entropy

• If the base of the logarithm is 2, the units of entropy are bits; if the base is 10, hartleys; if the base is e, nats.
• The first-order entropy assumes that the symbols occur independently of each other.
• The entropy is a measure of the average number of bits needed to encode the output of the source.
• Claude Shannon showed that the best rate a lossless compression algorithm can achieve is equal to the entropy of the source.
• Example: consider a source with an alphabet consisting of four symbols a1, a2, a3, a4 with
  P(a1) = 1/2, P(a2) = 1/4, P(a3) = 1/8, P(a4) = 1/8
  H = −(1/2 log₂(1/2) + 1/4 log₂(1/4) + 1/8 log₂(1/8) + 1/8 log₂(1/8)) = 1.75 bits/symbol.
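A small sketch of the entropy formula in Python (the function name is mine):

```python
import math

def entropy(probs, base=2):
    """First-order entropy: H = -sum(P(a_i) * log(P(a_i)))."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

print(entropy([1/2, 1/4, 1/8, 1/8]))  # 1.75 bits/symbol
```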
Coding

• Coding is the process of assigning binary sequences to the symbols of an alphabet.
• Example: consider a source with a four-symbol alphabet such that P(a1) = 1/2, P(a2) = 1/4, P(a3) = 1/8, P(a4) = 1/8, so H = 1.75 bits/symbol.

  Symbol | Probability | Code 1 | Code 2 | Code 3 | Code 4
  a1     | 0.5         | 0      | 0      | 0      | 0
  a2     | 0.25        | 0      | 1      | 10     | 01
  a3     | 0.125       | 1      | 00     | 110    | 011
  a4     | 0.125       | 10     | 11     | 111    | 0111
  Average length:        1.125    1.25     1.75     1.875 bits

Only Codes 3 and 4 are uniquely decodable.
Prefix Codes

Consider two codewords, C1 of k bits and C2 of n bits with n > k. If the first k bits of C2 are identical to C1, then C1 is a prefix of C2, and the remaining n − k bits of C2 are called the dangling suffix.

• If the dangling suffix is itself a codeword, the code is not uniquely decodable.
• A prefix code is a code in which no codeword is a prefix of another codeword.
• Prefix codes are uniquely decodable.
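The prefix property is easy to test mechanically; a sketch (the function name is mine, and note it tests only the prefix property, not unique decodability in general):

```python
def is_prefix_code(codewords):
    """True if no codeword is a prefix of another codeword."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

print(is_prefix_code(["0", "10", "110", "111"]))  # True  (Code 3)
print(is_prefix_code(["0", "1", "00", "11"]))     # False (Code 2)
print(is_prefix_code(["0", "01", "011", "0111"])) # False (Code 4 is not a prefix
                                                  # code, yet is uniquely decodable)
```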
Huffman Coding

• Huffman coding is an algorithm for building optimum prefix codes.
• It was developed as a class assignment in the first course on information theory, taught by Robert Fano at MIT in 1950.
• Huffman coding assumes that the probabilities of the source symbols are known.
• Huffman coding is based on the following observations about optimum prefix codes:
  – Symbols with higher probability have shorter codewords than less probable symbols.
  – The codewords of the two symbols with the lowest probabilities have the same length (proof by contradiction).
  – In a Huffman code, the codewords corresponding to the two symbols with the lowest probabilities differ only in the last bit.
Example: let's build a Huffman code for a source with a four-symbol alphabet such that P(a1) = 0.5, P(a2) = 0.25, P(a3) = 0.125, P(a4) = 0.125.

[Tree construction: start from the list a1 (0.5), a2 (0.25), a3 (0.125), a4 (0.125). Step 1: merge the two least probable symbols, a3 and a4, into a node of probability 0.25, labeling its two branches 0 and 1. Step 2: merge that node with a2 into a node of probability 0.5. Step 3: merge the result with a1 into the root of probability 1.0. Reading the branch labels from the root down to each leaf gives the codewords.]

  Symbol | Probability | Codeword
  a1     | 0.5         | 0
  a2     | 0.25        | 10
  a3     | 0.125       | 110
  a4     | 0.125       | 111

Average codeword length: lavg = 0.5×1 + 0.25×2 + 0.125×3 + 0.125×3 = 1.75 bits.

It can be shown that for Huffman codes:

$H(S) \le l_{avg} < H(S) + 1$
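The merge procedure translates into a short Python sketch using a priority queue (names are mine; ties may swap 0/1 labels, so the codewords can differ from the table while the lengths agree):

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code from a {symbol: probability} dict."""
    tick = count()  # tie-breaker so heapq never compares the dicts
    heap = [(p, next(tick), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # two least probable subtrees
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tick), merged))
    return heap[0][2]

print(huffman_code({"a1": 0.5, "a2": 0.25, "a3": 0.125, "a4": 0.125}))
# e.g. {'a1': '0', 'a2': '10', 'a3': '110', 'a4': '111'}
```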
Decoding Huffman Codes

Example: decode the message 0110101110 using the Huffman code from the previous example.

Reading the bits from left to right and walking down the code tree (restarting at the root after each decoded symbol):

  0   → a1
  110 → a3
  10  → a2
  111 → a4
  0   → a1

Decoded message: a1 a3 a2 a4 a1
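Because the code is a prefix code, decoding can be sketched by accumulating bits until they match a codeword:

```python
def huffman_decode(bits, code):
    """Decode a bit string using a prefix code given as {symbol: codeword}."""
    inverse = {w: s for s, w in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:      # prefix property: the first match is the symbol
            out.append(inverse[buf])
            buf = ""
    return out

code = {"a1": "0", "a2": "10", "a3": "110", "a4": "111"}
print(huffman_decode("0110101110", code))  # ['a1', 'a3', 'a2', 'a4', 'a1']
```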
Adaptive Huffman Codes
• Huffman coding requires knowledge of the probabilities of the source.
• If this knowledge is not available, Huffman coding becomes a two-pass procedure: a first pass to compute the probabilities and a second pass to encode the output of the source.
• The adaptive Huffman coding algorithm converts this two-pass procedure into a single-pass procedure.
• In adaptive Huffman coding, the transmitter and the receiver start with a code tree that has a single node corresponding to all the symbols not yet transmitted (NYT).
• As transmission progresses, nodes corresponding to transmitted symbols are added to the tree.
• The first time a symbol is transmitted, the code for NYT is transmitted first, followed by a non-adaptive code agreed upon by the transmitter and the receiver before transmission starts.
Golomb-Rice Codes

• The Golomb-Rice codes are a family of codes commonly used in data compression applications due to their low complexity and good compression performance.
• The JPEG committee and the Consultative Committee for Space Data Systems (CCSDS), for instance, have adopted Golomb-Rice codes as part of their standards.
• Golomb-Rice codes are also used in the H.264 video coding standard and in many commercial lossless audio compression programs.
• The Golomb-Rice codes have their origin in the pioneering work of Golomb, who proposed a method to encode run lengths of events from a binary source when $p_0^m = 1/2$, where $p_0$ is the probability of an event and $m$ is an integer.
[Figure: a binary source with alphabet A = {0, 1} emits a stream such as 1 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 …; the run lengths between 1s (4, 3, 4, 7, 3, 2, …) are non-negative integers. Here $p_0$ is the probability of a 1 (with $p_0^m = 1/2$ for an integer m), and the run lengths n follow a geometric distribution P(n).]
The Golomb-Rice codes consider the special case m = 2^k (k ≥ 0).

Encoding procedure: to encode a non-negative integer n, split it into the quotient $q = \lfloor n / 2^k \rfloor$ and the remainder $r = n \bmod 2^k$. The quotient is sent in unary (q ones followed by a zero) and the remainder is sent as the k least significant bits of n in natural binary.

Example: n = 17 (00010001)

  k = 0: codeword = 111111111111111110
  k = 1: codeword = 1111111101
  k = 2: codeword = 1111001
  k = 3: codeword = 110001
  k = 4: codeword = 100001
  k = 5: codeword = 010001
  k = 6: codeword = 0010001
  k = 7: codeword = 00010001

For k = 4, for instance, the unary part is 10 (q = 1) followed by the four least significant bits b3 b2 b1 b0 = 0001 of n = b7 b6 b5 b4 b3 b2 b1 b0.
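A sketch of this encoding procedure in Python (the helper name is mine):

```python
def golomb_rice_encode(n, k):
    """Golomb-Rice codeword for a non-negative integer n with parameter k (m = 2**k)."""
    q, r = n >> k, n & ((1 << k) - 1)              # quotient and remainder
    unary = "1" * q + "0"                          # q ones terminated by a zero
    binary = format(r, f"0{k}b") if k else ""      # remainder in k bits
    return unary + binary

for k in range(4):
    print(k, golomb_rice_encode(17, k))
# 0 111111111111111110
# 1 1111111101
# 2 1111001
# 3 110001
```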
Practical sources produce positive and negative numbers (a double-sided distribution).

[Figure: a double-sided distribution P(n) over n = …, −3, −2, −1, 0, 1, 2, 3, …]

Use the following mapping:

$M(n) = \begin{cases} 2n & \text{if } n \ge 0 \\ 2|n| - 1 & \text{if } n < 0 \end{cases}$

The mapping sends positive input numbers to even integers and negative input numbers to odd integers, producing the non-negative inputs the Golomb-Rice coder expects.
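This mapping is a one-liner in Python (the function name is mine; the mapping itself is the one defined above):

```python
def signed_map(n):
    """Map a signed integer to a non-negative one: 2n if n >= 0, 2|n| - 1 if n < 0."""
    return 2 * n if n >= 0 else 2 * (-n) - 1

print([signed_map(n) for n in (-3, -2, -1, 0, 1, 2, 3)])  # [5, 3, 1, 0, 2, 4, 6]
```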
Adaptive Golomb-Rice Codes

[Diagram: source → mapping M → Golomb-Rice coder → codeword, with an adaptive algorithm observing the coder output and adjusting the parameter k.]
1) Initialize k to k_ini;
2) Reset the counter;
3) Read input n and encode it using parameter k;
4) If the unary part of the codeword is ≥ 1 (q ≥ 1), increment the counter;
5) If the unary part is 0 (q = 0), decrement the counter;
6) If the counter value ≥ M, increment k and go to 2;
7) If the counter value ≤ −M, decrement k and go to 2.
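A sketch of this adaptation loop, reusing the `golomb_rice_encode` helper from above; the values of `k_ini` and the threshold `M` here are illustrative assumptions, as the slides do not fix them:

```python
def adaptive_golomb_rice(values, k_ini=2, M=4):
    """Grow k when quotients are large, shrink it when they are zero (sketch)."""
    k, counter = k_ini, 0
    codewords = []
    for n in values:
        codewords.append(golomb_rice_encode(n, k))
        q = n >> k                       # length of the unary part
        counter += 1 if q >= 1 else -1
        if counter >= M:
            k, counter = k + 1, 0
        elif counter <= -M:
            k, counter = max(k - 1, 0), 0
    return codewords
```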
Entropy Coding

If the source has a narrow distribution P(n), an entropy encoder (Huffman, Golomb-Rice, arithmetic) can be used directly:

[Diagram: source → entropy encoder → compressed output.]

Otherwise, a decorrelation step might be necessary:

[Diagram: source → decorrelation → entropy encoder → compressed output. Decorrelation can be predictive coding, transform coding, or subband coding.]
Predictive Coding Decorrelation

In an image, a pixel generally has a value close to one of its neighbors.

[Figure: a block of pixel values X (values such as 61, 63, 58, 69, 64, 60, 57, 59, 55, 63), a pixel prediction X̂ = 64, and the prediction residual e = X − X̂, whose entries are small values such as 3, 2, 0, 6, 4, −1, 2, 2, −2, 4.]
[Figure: histograms of the original image and of the prediction residual. The residual histogram is much narrower and concentrated around zero.]
Context Adaptive Lossless Image Compression (CALIC)

[Figure: pixel neighborhood. The pixel X being predicted is surrounded by its causal neighbors W and WW to the left, N and NN above, and NW, NE, and NNE on the diagonals.]

1. The neighboring pixels N, W, NE, NW, NN, WW, and NNE are available to both the encoder and the decoder (assuming a raster scan).

2. To get an idea of the boundaries present in the neighborhood, compute:

$d_h = |W - WW| + |N - NW| + |NE - N|$
$d_v = |W - NW| + |N - NN| + |NNE - NE|$

Initial pixel prediction X̂:

  if (d_h − d_v > 80)        X̂ ← N
  else if (d_v − d_h > 80)   X̂ ← W
  else {
      X̂ ← (N + W)/2 + (NE − NW)/4
      if (d_h − d_v > 32)       X̂ ← (X̂ + N)/2
      else if (d_v − d_h > 32)  X̂ ← (X̂ + W)/2
      else if (d_h − d_v > 8)   X̂ ← (3X̂ + N)/4
      else if (d_v − d_h > 8)   X̂ ← (3X̂ + W)/4
  }

3. The initial prediction is refined based on the relationships of the pixels in the neighborhood (contexts). For each context we keep track of how much prediction error is generated and use it to refine the initial prediction.
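The initial (gradient-adjusted) prediction translates almost directly into Python (a sketch; the context-based refinement of step 3 is not shown):

```python
def calic_initial_prediction(W, WW, N, NN, NW, NE, NNE):
    """Initial CALIC prediction of pixel X from its causal neighbors."""
    dh = abs(W - WW) + abs(N - NW) + abs(NE - N)    # horizontal activity
    dv = abs(W - NW) + abs(N - NN) + abs(NNE - NE)  # vertical activity
    if dh - dv > 80:
        return N
    if dv - dh > 80:
        return W
    x = (N + W) / 2 + (NE - NW) / 4
    if dh - dv > 32:
        x = (x + N) / 2
    elif dv - dh > 32:
        x = (x + W) / 2
    elif dh - dv > 8:
        x = (3 * x + N) / 4
    elif dv - dh > 8:
        x = (3 * x + W) / 4
    return x
```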
Transform Coding

• In transform coding the input sequence is transformed into another sequence in which most of the information is contained in only a few elements.
• For a 1D signal $\mathbf{x}$ such as audio or speech, the forward transform is defined as $\theta = \mathbf{A}\mathbf{x}$ and the inverse transform as $\mathbf{x} = \mathbf{B}\theta$. The transforms are orthonormal: $\mathbf{B} = \mathbf{A}^{-1} = \mathbf{A}^{T}$.
• For 2D signals such as images, a two-dimensional separable transform is used: we can take a 1D transform in one dimension and another 1D transform in the other dimension. In matrix notation, the forward transform is $\mathbf{\Theta} = \mathbf{A}\mathbf{X}\mathbf{A}^{T}$ and the inverse transform is $\mathbf{X} = \mathbf{B}\mathbf{\Theta}\mathbf{B}^{T}$.
Transform Coding - DCT

• In the JPEG standard, the forward transform is the Discrete Cosine Transform (DCT) and the inverse transform is the Inverse Discrete Cosine Transform (IDCT).
• The DCT transform matrix $\mathbf{A}$ is defined as:

$\mathbf{A}_{i,j} = \begin{cases} \sqrt{\dfrac{1}{N}}\cos\dfrac{(2j+1)i\pi}{2N} & i = 0,\; j = 0,1,\dots,N-1 \\[2mm] \sqrt{\dfrac{2}{N}}\cos\dfrac{(2j+1)i\pi}{2N} & i = 1,2,\dots,N-1,\; j = 0,1,\dots,N-1 \end{cases}$

[Diagram: JPEG encoder pipeline. The input image is transformed block by block with the DCT; the coefficients are quantized using a quantization table; the DC coefficients are coded with DPCM and the AC coefficients with run-length coding (RLC); an entropy encoder produces the compressed image.]
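A numpy sketch of the DCT matrix and the 2D separable transform (illustrative, not the JPEG reference implementation):

```python
import numpy as np

def dct_matrix(N=8):
    """DCT transform matrix A as defined above."""
    A = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            scale = np.sqrt(1 / N) if i == 0 else np.sqrt(2 / N)
            A[i, j] = scale * np.cos((2 * j + 1) * i * np.pi / (2 * N))
    return A

A = dct_matrix(8)
X = np.random.randint(0, 256, (8, 8)).astype(float)  # an arbitrary 8x8 block
Theta = A @ X @ A.T          # forward 2D separable transform
X_rec = A.T @ Theta @ A      # inverse: B = A^{-1} = A^T
assert np.allclose(X, X_rec)
```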
Example: the DCT of an 8×8 block of pixels. The top-left coefficient is the DC coefficient; the remaining 63 are the AC coefficients.

Input block:

  183 177 147  79  41  34  35  43
  189 153  63  39  38  37  39  44
  187  99  37  38  42  41  46  46
  101  42  36  39  61  63  59  44
   41  41  38  45  57  73  52  47
   44  49  49  50  54  60  58  54
   51  58  55  50  55  57  58  54
   44  50  52  54  55  59  67  63

DCT coefficients:

  502.0 119.5  83.8  48.3   6.0   0.0  -0.1  -0.3
   88.6 173.4  90.9  22.5  11.5  -1.8  -0.2  -0.8
   62.0  78.7  22.2 -44.9 -19.8  -9.4  -7.3  -1.1
   12.2   4.7 -37.1 -44.6 -30.2 -12.2   5.0  -3.0
    3.5 -22.5 -36.9 -20.3 -13.0   4.1  11.5   5.1
   12.1   9.7  -7.0  -6.6   2.6  11.3   8.5  11.5
    9.2   7.9   3.7  -6.4   6.3  10.1   3.8   1.8
    2.6   9.8   1.4  -2.0   0.3  -1.2   2.3  -5.1
Quantization of DCT Coefficients

The DCT coefficients above are quantized using a quantization table $\mathbf{Q}$:

$\hat{\mathbf{\Theta}} = \mathbf{Q} \cdot \mathrm{round}\!\left(\dfrac{\mathbf{\Theta}}{\mathbf{Q}}\right)$ (element-wise)

Quantization table (Q):

  16  11  10  16  24  40  51  61
  12  12  14  19  26  58  60  55
  14  13  16  24  40  57  69  56
  14  17  22  29  51  87  80  62
  18  22  37  56  68 109 103  77
  24  35  55  64  81 104 113  92
  49  64  78  87 103 121 120 101
  72  92  95  98 112 100 103  99

Quantized coefficients:

  496 121  80  48   0   0   0   0
   84 168  84  19   0   0   0   0
   56  78  16 -48   0   0   0   0
   14   0 -44 -58 -51   0   0   0
    0 -22 -37   0   0   0   0   0
   24   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0

After quantization the DCT coefficients are transmitted following a zig-zag pattern. The coefficients are encoded using a Huffman code.
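Continuing the numpy sketch (reusing `dct_matrix` from above), quantization reduces to element-wise operations:

```python
import numpy as np

# 8x8 block and JPEG quantization table from the example above.
X = np.array([
    [183, 177, 147,  79,  41,  34,  35,  43],
    [189, 153,  63,  39,  38,  37,  39,  44],
    [187,  99,  37,  38,  42,  41,  46,  46],
    [101,  42,  36,  39,  61,  63,  59,  44],
    [ 41,  41,  38,  45,  57,  73,  52,  47],
    [ 44,  49,  49,  50,  54,  60,  58,  54],
    [ 51,  58,  55,  50,  55,  57,  58,  54],
    [ 44,  50,  52,  54,  55,  59,  67,  63]], dtype=float)
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99]], dtype=float)

A = dct_matrix(8)                    # from the earlier sketch
Theta = A @ X @ A.T                  # DCT coefficients
Theta_hat = Q * np.round(Theta / Q)  # quantize: Θ̂ = Q · round(Θ/Q)
print(Theta_hat[:2, :4])             # first rows match the quantized table above
```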
[Figure: side-by-side comparison of the original image and the image coded using the DCT.]
Sub-band Coding

• In sub-band coding the input signal is decomposed into several sub-bands using an analysis filter bank.
• Depending on the signal, different sub-bands will contain different amounts of information.
• Sub-bands with lots of information are encoded using more bits, while sub-bands with little information are encoded using fewer bits.
• At the decoder side, the signal is reconstructed using a bank of synthesis filters.

[Figure: the signal spectrum divided into M frequency bands f1, f2, f3, …, fM.]
[Diagram: M-channel sub-band coder. The input is fed to analysis filters 1 through M, each followed by M-fold downsampling and an entropy encoder. On the decoder side, each channel passes through an entropy decoder, M-fold upsampling, and a synthesis filter; the channel outputs are combined to form the output signal.]
Further Reading
• Khalid Sayood, Introduction to Data Compression, 4th edition, Morgan Kaufmann, San Francisco, 2012.
• G. Held and T. R. Marshall, Data Compression, 3rd edition, John Wiley and Sons, New York, 1991.
• N. S. Jayant and P. Noll, Digital Coding of Waveforms, Prentice Hall, Englewood Cliffs, 1984.
• B. E. Usevitch, “A tutorial on modern lossy wavelet image compression: foundations of JPEG 2000,” IEEE Signal Processing Magazine, vol. 18, no. 5, 2001.
• D. Pan, “Digital audio compression,” Digital Technical Journal, vol. 5, no. 2, 1993.
• M. Hans and R. W. Schafer, “Lossless compression of digital audio,” IEEE Signal Processing Magazine, vol. 18, no. 4, 2001.
• G. E. Blelloch, Introduction to Data Compression, course notes, Computer Science Department, Carnegie Mellon University.