ECEC 453 Image Processing Architecture
Transcript of ECEC 453 Image Processing Architecture
ECEC 453 Image Processing Architecture
Lecture 5, 1/22/2004: Rate-Distortion Theory, Quantizers and DCT
Oleh Tretiak, Drexel University
Quality - Rate Tradeoff
• Given: 512x512 picture, 8 bits per pixel
- Bit reduction
  o Fewer bits per pixel
  o Fewer pixels
  o Both
• Issues:
- How do we measure compression?
  o Bits/pixel — does not work when we change the number of pixels
  o Total bits — valid, but hard to interpret
- How do we measure quality? (see the sketch below)
  o RMS noise
  o Peak signal-to-noise ratio (PSNR) in dB
  o Subjective quality
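A minimal sketch (not part of the original slides, assuming NumPy and two same-size 8-bit images) of the two objective quality measures listed above, RMS noise and PSNR:

```python
import numpy as np

def rms_and_psnr(original, reconstructed, peak=255.0):
    """RMS error and peak signal-to-noise ratio between two images."""
    err = original.astype(float) - reconstructed.astype(float)
    rms = np.sqrt(np.mean(err ** 2))
    psnr = 20 * np.log10(peak / rms)
    return rms, psnr

# toy usage with a synthetic 512x512 image and a noisy copy of it
rng = np.random.default_rng(0)
img = rng.integers(0, 256, (512, 512)).astype(float)
noisy = img + rng.normal(0, 2.0, img.shape)
print(rms_and_psnr(img, noisy))
```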
Comparison, Bit and Pixel Reduction
[Figure: quality (0 to 60, presumably PSNR in dB) plotted against total bits in the image (0 to 2,000,000) for two strategies: "Subsample" (fewer pixels) and "Drop bits" (fewer bits per pixel).]
Quantizer Performance
• Questions:
- How much error does the quantizer introduce (Distortion)?
- How many bits are required for the quantized values (Rate)?
• Rate:
- 1. No compression. If there are N possible quantizer output values, then it takes ceiling(log2 N) bits per sample.
- 2(a). Compression. Compute the histogram of the quantizer output. Design a Huffman code for the histogram. Find the average length.
- 2(b). Find the entropy of the quantizer output distribution (see the sketch below).
- 2(c). Preprocess the quantizer output, ....
• Distortion: Let x be the input to the quantizer and x* the de-quantized value. The quantization noise is n = x* − x, and the quantization noise power is D = Average(n²).
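A small sketch (my addition, assuming NumPy; the Laplacian test source and step size are arbitrary choices) that estimates the rate by option 2(b), the entropy of the quantizer output, together with the distortion D:

```python
import numpy as np

def quantizer_rate_and_distortion(x, S):
    """Rate estimate 2(b) (entropy of the quantizer output distribution)
    and distortion D = mean((x* - x)^2) for a uniform quantizer of step S."""
    q = np.round(x / S).astype(int)
    x_star = S * q
    D = np.mean((x_star - x) ** 2)

    # entropy of the quantizer output, in bits per sample
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    H = -np.sum(p * np.log2(p))
    return H, D

x = np.random.default_rng(0).laplace(0, 10, 100_000)   # peaked test source
print(quantizer_rate_and_distortion(x, S=8.0))
```

For a peaked source the entropy is noticeably smaller than the ceiling(log2 N) bound of option 1, which is the point of entropy coding the quantizer output.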
Quantizer: practical lossy encoding
• Quantizer
- Symbols: x — input to quantizer, q — output of quantizer, S — quantizer step
- Quantizer: q = round(x/S)
- Dequantizer characteristic: x* = Sq
- Typical noise power added by the quantizer-dequantizer combination: D = S²/12; noise standard deviation = sqrt(D) ≈ 0.289·S
• Example: S = 8, D = 8²/12 ≈ 5.3, rms quantization noise = sqrt(D) ≈ 2.3. If the input is 8 bits, the maximum input is 255, and there are 255/8 ≈ 32 quantizer output values. PSNR = 20·log10(255/2.3) ≈ 40.8 dB (checked numerically in the sketch after the figure below).
[Figure: quantizer characteristic (staircase mapping from x to q with step size S) and dequantizer characteristic (mapping from q to x* = Sq).]
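A quick numerical check of the example above (my addition, assuming NumPy; the uniformly distributed test input is an arbitrary stand-in for 8-bit pixel data):

```python
import numpy as np

rng = np.random.default_rng(0)
S = 8.0                                  # quantizer step
x = rng.uniform(0, 255, size=1_000_000)  # stand-in for 8-bit pixel values

q = np.round(x / S)          # quantizer:   q  = round(x/S)
x_star = S * q               # dequantizer: x* = S*q

n = x_star - x               # quantization noise
D = np.mean(n ** 2)          # noise power, close to S**2/12 ~ 5.3
psnr = 20 * np.log10(255 / np.sqrt(D))

print(f"D = {D:.2f} (theory {S**2/12:.2f}), PSNR = {psnr:.1f} dB")  # ~40.8 dB
```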
Rate-Distortion Theorem
• When long sequences (blocks) are encoded, it is possible to construct a coder-decoder pair that achieves the specified distortion whenever the bits per sample are R(D) + ε, for any ε > 0.
• Formula: X ~ Gaussian random variable, Q = E[X²] ~ signal power
• D = E[(X − Y)²] ~ noise power
• Gaussian density: p(x) = (1/sqrt(2πQ)) exp(−x²/2Q)
• Rate-distortion function:
  R(D) = (1/2) log2(Q/D)   if D ≤ Q
  R(D) = 0                 if D > Q
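The R(D) formula as a one-line helper (my addition; the numbers in the usage line are arbitrary):

```python
import numpy as np

def rd_rate(Q, D):
    """Shannon rate-distortion function for a Gaussian source:
    R(D) = max(0.5*log2(Q/D), 0) bits per sample."""
    return max(0.5 * np.log2(Q / D), 0.0)

# e.g. signal power Q equal to 100 times the allowed noise power D
print(rd_rate(Q=100.0, D=1.0))   # ~3.32 bits/sample
```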
This Lecture
• Decorrelation and Bit Allocation
• Discrete Cosine Transform
• Video Coding
Coding Correlated Samples
• How to code correlated samples
- Decorrelate
- Code
• Methods for decorrelation
- Prediction
- Transformation
  o Block transform
  o Wavelet transform
Prediction Rules
• Simplest: previous value
- Prediction: p_i = x̂_{i−1}
- Reconstruction: x̂_i = p_i + q_i = x̂_{i−1} + q_i
[Diagram: DPCM loop: x_i minus the prediction p_i enters the quantizer, producing q_i; adding q_i back to p_i gives the reconstruction x̂_i, which passes through a delay to become the next prediction.]
• Two-dimensional prediction from the three causal neighbors x̂_{i−1,j}, x̂_{i,j−1}, x̂_{i−1,j−1}:
  p_{i,j} = w1·x̂_{i−1,j} + w2·x̂_{i,j−1} + w3·x̂_{i−1,j−1}
General Predictive Coding
• General system
[Diagram: the encoder forms the prediction error e_i = x_i − p_i and quantizes it to e_i*; the decoder reconstructs x̂_i = p_i + e_i*. Both sides run the same predictor on the reconstructed values.]
• Example of a linear predictive image coder, predicting from the three causal neighbors x̂_{i−1,j}, x̂_{i,j−1}, x̂_{i−1,j−1}:
  p_{i,j} = w1·x̂_{i−1,j} + w2·x̂_{i,j−1} + w3·x̂_{i−1,j−1}
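A sketch of this predictive (DPCM) loop in NumPy (my addition; the weights w, the step size S, and the zero boundary handling are illustrative choices, not values from the slides):

```python
import numpy as np

def dpcm_codec(img, S=8.0, w=(0.5, 0.5, 0.0)):
    """Lossy DPCM sketch: predict each pixel from already reconstructed
    causal neighbors, quantize the prediction error with step S, and
    reconstruct.  Returns (quantizer indices, reconstructed image)."""
    img = img.astype(float)
    h, wdt = img.shape
    rec = np.zeros_like(img)          # reconstructed image x̂
    q = np.zeros_like(img)            # quantizer indices
    w1, w2, w3 = w
    for i in range(h):
        for j in range(wdt):
            # prediction from causal neighbors (treated as 0 outside the image)
            up = rec[i - 1, j] if i > 0 else 0.0
            left = rec[i, j - 1] if j > 0 else 0.0
            diag = rec[i - 1, j - 1] if (i > 0 and j > 0) else 0.0
            p = w1 * up + w2 * left + w3 * diag
            e = img[i, j] - p                 # prediction error
            q[i, j] = np.round(e / S)         # quantize the error
            rec[i, j] = p + S * q[i, j]       # decoder-side reconstruction
    return q, rec

# toy usage on a smooth synthetic "image"
x = np.add.outer(np.arange(32), np.arange(32)).astype(float)
q, rec = dpcm_codec(x)
print("rms error:", np.sqrt(np.mean((rec - x) ** 2)))
```

Because the predictor uses reconstructed values rather than the original pixels, the encoder and decoder stay synchronized despite the quantization.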
Rate-distortion theory — correlated samples
• Given: x = (x1, x2, ... xn), a sequence of Gaussian correlated samples
• Preprocess: convert to y = (y1, y2, ... yn), y = Ax, A ~ an orthogonal matrix (A⁻¹ = Aᵀ) that de-correlates the samples. This is called a Karhunen-Loeve transformation.
• Perform lossy encoding of (y1, y2, ... yn) - get y* = (y1*, y2*, ... yn*) after decoding
• Reconstruct: x* = A⁻¹y*
[Diagram: Signal → Pre-processor → Encoder → bits/codes → Decoder → Reconstructor → Signal (transmit/receive chain).]
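A minimal Karhunen-Loeve decorrelation sketch (my addition, assuming NumPy; the AR(1)-style covariance is an arbitrary test case). The rows of A are eigenvectors of the sample covariance, so y = Ax has an approximately diagonal covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 50_000
C = np.fromfunction(lambda i, j: 0.9 ** np.abs(i - j), (n, n))  # AR(1)-like covariance
x = rng.multivariate_normal(np.zeros(n), C, size=m).T           # shape (n, m)

# Karhunen-Loeve transform: rows of A are eigenvectors of the covariance
evals, evecs = np.linalg.eigh(np.cov(x))
A = evecs.T                      # orthogonal: A @ A.T = I
y = A @ x                        # decorrelated coefficients

print(np.round(np.cov(y), 2))    # ~diagonal; the diagonal entries are the Q_i
x_rec = A.T @ y                  # reconstruction with A^-1 = A^T
print(np.allclose(x_rec, x))     # True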
Block-Based Coding
• Discrete Cosine Transform (DCT) is used instead of the K-L transform
• Full image DCT - one set of decorrelated coefficients for the whole image
• Block-based coding:
- Image divided into 'small' blocks
- Each block is decorrelated separately
• Block decorrelation performs almost as well (better?) than full-image decorrelation
• Current standards (JPEG, MPEG) use 8x8 DCT blocks
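A sketch of block-based decorrelation (my addition, assuming NumPy and SciPy's scipy.fft.dctn; the 64x64 random image and block size B=8 are illustrative):

```python
import numpy as np
from scipy.fft import dctn

def block_dct(img, B=8):
    """Apply an orthonormal 2D DCT independently to each BxB block
    (the image size is assumed to be a multiple of B)."""
    out = np.zeros_like(img, dtype=float)
    for i in range(0, img.shape[0], B):
        for j in range(0, img.shape[1], B):
            out[i:i+B, j:j+B] = dctn(img[i:i+B, j:j+B], norm='ortho')
    return out

rng = np.random.default_rng(0)
img = rng.uniform(0, 255, (64, 64))
coeffs = block_dct(img)
# energy is preserved because each block transform is orthonormal
print(np.allclose((img ** 2).sum(), (coeffs ** 2).sum()))
```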
Rate-distortion theory: non-uniform random variables
• Given (x1, x2, ... xn), use an orthogonal transform to obtain (y1, y2, ... yn).
• Sequence of independent Gaussian variables (y1, y2, ... yn), Var[yi] = Qi.
• Distortion allocation: allocate distortion Di to the variable with variance Qi
• Rate (bits) for the i-th variable: Ri = max[0.5 log2(Qi/Di), 0]
• Total distortion: D = Σ_{i=1..n} Di
• Total rate (bits): R = Σ_{i=1..n} Ri
• We specify R. What values of Di give the minimum total distortion D?
Bit allocation solution
[Figure: bar chart of the variances Qi for i = 1 to 16 with a horizontal "water level" Q; the allocated distortions Di are the portions of the bars at or below the level.]
• Implicit solution (water-filling construction)
• Choose Q (parameter)
• Di = min(Qi, Q)
- If Qi > Q then Di = Q, else Di = Qi
• Ri = max[0.5 log2(Qi/Di), 0]
- If Qi > Q then Ri = 0.5 log2(Qi/Q), else Ri = 0
• Find the value of Q that gives the specified R (see the sketch below)
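A sketch of the water-filling construction (my addition, assuming NumPy; it finds the level Q by bisection, and the example variances and target rate are arbitrary):

```python
import numpy as np

def water_fill(Q_vars, R_target, iters=60):
    """Water-filling bit allocation: find the level Q such that
    sum(max(0.5*log2(Qi/Q), 0)) equals the target rate, then return
    the per-variable distortions Di = min(Qi, Q) and rates Ri."""
    Q_vars = np.asarray(Q_vars, dtype=float)

    def total_rate(level):
        return np.sum(np.maximum(0.5 * np.log2(Q_vars / level), 0.0))

    lo, hi = 1e-12, Q_vars.max()          # rate decreases as the level rises
    for _ in range(iters):                # bisection on the water level
        mid = 0.5 * (lo + hi)
        if total_rate(mid) > R_target:
            lo = mid                      # need a higher level (fewer bits)
        else:
            hi = mid
    level = 0.5 * (lo + hi)
    D = np.minimum(Q_vars, level)
    R = np.maximum(0.5 * np.log2(Q_vars / D), 0.0)
    return level, D, R

# arbitrary example variances, 8 bits total
level, D, R = water_fill([16, 8, 4, 2, 1, 0.5, 0.25, 0.1], R_target=8.0)
print(level, D.sum(), R.sum())   # R.sum() ~ 8 bits
```

Bisection works here because the total rate is a monotonically decreasing function of the water level.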
Wavelet Transform
• Filterbank and wavelets
• 2D wavelets
• Wavelet pyramid
Filterbank and Wavelets
• Put the signal (sequence) through two filters
- Low frequencies
- High frequencies
• Downsample both by a factor of 2
• Do it in such a way that the original signal can be reconstructed!
[Diagram: x(i) (100 samples) passes through analysis filters l(k) and h(k) and is downsampled into L and H bands (50 samples each); reconstruction filters and a summation recover x(i) (100 samples).]
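A minimal perfect-reconstruction example using the Haar filter pair (my addition, assuming NumPy and an even-length signal; the slides do not specify which filters are used):

```python
import numpy as np

def haar_analysis(x):
    """One level of a two-channel Haar filterbank: lowpass (scaled sums)
    and highpass (scaled differences), each downsampled by 2."""
    x = np.asarray(x, dtype=float)
    lo = (x[0::2] + x[1::2]) / np.sqrt(2)
    hi = (x[0::2] - x[1::2]) / np.sqrt(2)
    return lo, hi

def haar_synthesis(lo, hi):
    """Invert haar_analysis: upsample and combine the two bands."""
    x = np.empty(2 * len(lo))
    x[0::2] = (lo + hi) / np.sqrt(2)
    x[1::2] = (lo - hi) / np.sqrt(2)
    return x

x = np.random.default_rng(0).normal(size=100)    # 100 samples -> 50 + 50
lo, hi = haar_analysis(x)
print(len(lo), len(hi))                           # 50 50
print(np.allclose(haar_synthesis(lo, hi), x))     # perfect reconstruction: True
```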
Filterbank Pyramid
[Diagram: the two-channel filterbank is applied repeatedly to the lowpass output. A 1000-sample signal x(i) splits into 500 + 500; the lowpass half splits into 250 + 250, then 125 + 125, leaving bands of 500, 250, 125, and 125 samples.]
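A sketch of the same pyramid built by repeating the Haar split on the lowpass band (my addition, assuming NumPy; the band lengths match the 1000 / 500 / 250 / 125 / 125 example in the diagram):

```python
import numpy as np

def haar_pyramid(x, levels=3):
    """Apply the two-channel split repeatedly to the lowpass band,
    returning [H1, H2, ..., L_final] (coarsest lowpass last)."""
    bands = []
    lo = np.asarray(x, dtype=float)
    for _ in range(levels):
        lo, hi = (lo[0::2] + lo[1::2]) / np.sqrt(2), (lo[0::2] - lo[1::2]) / np.sqrt(2)
        bands.append(hi)
    bands.append(lo)
    return bands

x = np.random.default_rng(0).normal(size=1000)
print([len(b) for b in haar_pyramid(x)])   # [500, 250, 125, 125]
```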
2D Wavelets
• Apply wavelet processing along the rows of the picture
• Apply wavelet processing along the columns of the picture
• Pyramid processing
[Diagram: the resulting subbands, labeled by low/high filtering along the horizontal and vertical directions (LHLV, LHHV, HHLV, HHHV).]
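A one-level separable 2D transform sketch (my addition, assuming NumPy and Haar filters; the subband names ll/lh/hl/hh correspond to low/high filtering along the two directions):

```python
import numpy as np

def haar_split_1d(a, axis):
    """One Haar split along the given axis (length must be even)."""
    a = np.asarray(a, dtype=float)
    s0 = np.take(a, np.arange(0, a.shape[axis], 2), axis=axis)
    s1 = np.take(a, np.arange(1, a.shape[axis], 2), axis=axis)
    return (s0 + s1) / np.sqrt(2), (s0 - s1) / np.sqrt(2)

def haar_2d(img):
    """One level of a separable 2D wavelet transform: rows, then columns.
    Returns the four subbands (low/low, low/high, high/low, high/high)."""
    lo_r, hi_r = haar_split_1d(img, axis=1)    # along rows (horizontal)
    ll, lh = haar_split_1d(lo_r, axis=0)       # along columns (vertical)
    hl, hh = haar_split_1d(hi_r, axis=0)
    return ll, lh, hl, hh

img = np.random.default_rng(0).normal(size=(8, 8))
print([b.shape for b in haar_2d(img)])   # four 4x4 subbands
```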
Lena: Top Level, next level
[Figure: two-level wavelet decomposition of the Lena image; each subband is annotated with a numeric value (apparently its energy or variance): 48.81, 15.45, 9.23, 6.48, 2.52, 1.01, 0.37.]
Lena, more levels
Decorrelation of Images
• x = (x1, x2, ... xn), a sequence of image gray values
• Preprocess: convert to y = (y1, y2, ... yn), y = Ax, A ~ an orthogonal matrix (A⁻¹ = Aᵀ)
• Theoretical best (for a Gaussian process): A is the Karhunen-Loeve transformation matrix
- Images are not Gaussian processes
- The Karhunen-Loeve matrix is image-dependent, computationally expensive to find
- Evaluating y = Ax with the K-L transformation is computationally expensive
• In practice, we use the DCT (discrete cosine transform) for decorrelation
- Computationally efficient
- Almost as good as the K-L transformation
DPCM
• Simple to implement (low complexity)
- Prediction: 3 multiplications and 2 additions
- Estimation: 1 addition
- Encoding: 1 addition + quantization
• Performance for 2-D coding is not as good as block quantization
- In theory, for a large past history the performance (rate-distortion) should be as good as other linear methods, but in that case there is no computational advantage
• Bottom line: useful when complexity is limited
• Important idea: lossy predictive encoding
Review: Image Decorrelation
• x = (x1, x2, ... xn), a sequence of image gray values
• Preprocess: convert to y = (y1, y2, ... yn), y = Ax, A ~ an orthogonal matrix (A⁻¹ = Aᵀ)
• Theoretical best (for a Gaussian process): A is the Karhunen-Loeve transformation matrix
- Images are not Gaussian processes
- The Karhunen-Loeve matrix is image-dependent, computationally expensive to find
- Evaluating y = Ax with the K-L transformation is computationally expensive
• In practice, we use the DCT (discrete cosine transform) for decorrelation
- Computationally efficient
- Almost as good as the K-L transformation
Rate-Distortion: 1D vs. 2D coding
• Theory on the tradeoff between distortion and the least number of bits
• Interesting tradeoff only if samples are correlated
• "Water-filling" construction to compute R(D)
Review: Block-Based Coding
• Full image DCT - one set of decorrelated coefficients for the whole image
• Block-based coding:
- Image divided into 'small' blocks
- Each block is decorrelated separately
• Block decorrelation performs almost as well (better?) than full-image decorrelation
• Current standards (JPEG, MPEG) use 8x8 DCT blocks
What is the DCT?
• One-dimensional 8-point DCT. Input x0, ..., x7; output y0, ..., y7:
  y_k = (c(k)/2) · Σ_{i=0..7} x_i cos((2i+1)kπ/16),  k = 0, 1, ..., 7
  where c(k) = 1/√2 for k = 0, and c(k) = 1 otherwise
• One-dimensional inverse DCT. Input y0, ..., y7; output x0, ..., x7:
  x_k = Σ_{i=0..7} (c(i)/2) · y_i cos((2k+1)iπ/16),  k = 0, 1, ..., 7
• Matrix form of the equations (x, y are column vectors):
  y = Tx,  x = Tᵀy,  where t_{ki} = (c(k)/2) cos((2i+1)kπ/16)
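The same 8-point transform written as a matrix (my addition, assuming NumPy), confirming that T is orthogonal, so the inverse transform is just the transpose:

```python
import numpy as np

# 8-point DCT matrix T with t[k, i] = (c(k)/2) * cos((2i+1)*k*pi/16),
# where c(0) = 1/sqrt(2) and c(k) = 1 otherwise
k, i = np.meshgrid(np.arange(8), np.arange(8), indexing='ij')
c = np.where(k == 0, 1 / np.sqrt(2), 1.0)
T = (c / 2) * np.cos((2 * i + 1) * k * np.pi / 16)

print(np.allclose(T @ T.T, np.eye(8)))   # orthogonal: T T^T = I

x = np.arange(8, dtype=float)            # sample input x0..x7
y = T @ x                                # forward DCT
print(np.allclose(T.T @ y, x))           # inverse DCT recovers x
```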
Two-Dimensional DCT
• Forward 2D DCT. Input x_ij, i = 0, ..., 7, j = 0, ..., 7. Output y_kl, k = 0, ..., 7, l = 0, ..., 7:
  y_kl = (c(k)c(l)/4) · Σ_{i=0..7} Σ_{j=0..7} x_ij cos((2i+1)kπ/16) cos((2j+1)lπ/16)
  where c(k) = 1/√2 for k = 0, and c(k) = 1 otherwise
• Matrix form, X, Y ~ 8x8 matrices with coefficients x_ij, y_kl:
  Y = TXTᵀ,  X = TᵀYT,  with t_{ki} = (c(k)/2) cos((2i+1)kπ/16)
• The 2D DCT is separable!
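A check of the matrix form and the separability claim (my addition, assuming NumPy; the 8x8 input block is random):

```python
import numpy as np

# the 8-point DCT matrix T from the previous sketch
k, i = np.meshgrid(np.arange(8), np.arange(8), indexing='ij')
c = np.where(k == 0, 1 / np.sqrt(2), 1.0)
T = (c / 2) * np.cos((2 * i + 1) * k * np.pi / 16)

X = np.random.default_rng(0).uniform(0, 255, (8, 8))

Y = T @ X @ T.T                # 2D DCT in matrix form
X_rec = T.T @ Y @ T            # inverse 2D DCT
print(np.allclose(X_rec, X))   # True

# separability: a 1D DCT on every row, then a 1D DCT on every column
rows_done = X @ T.T
Y_sep = T @ rows_done
print(np.allclose(Y_sep, Y))   # True
```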
General DCT
• One dimension:
  y(k) = Σ_{i=0..N−1} t(k,i) x(i),  k = 0, 1, ..., N−1
  t(k,i) = 1/√N                           for k = 0
  t(k,i) = √(2/N) · cos((2i+1)kπ/(2N))    for k ≠ 0
• Two dimensions:
  y(k,l) = Σ_{i=0..N−1} Σ_{j=0..N−1} x(i,j) t(k,i) t(l,j)
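A generator for the N-point DCT matrix defined above (my addition, assuming NumPy), with an orthogonality check and a 2D usage example for an arbitrary block size:

```python
import numpy as np

def dct_matrix(N):
    """N-point DCT matrix: t[0, i] = 1/sqrt(N),
    t[k, i] = sqrt(2/N) * cos((2i+1)*k*pi/(2N)) for k != 0."""
    k, i = np.meshgrid(np.arange(N), np.arange(N), indexing='ij')
    t = np.sqrt(2.0 / N) * np.cos((2 * i + 1) * k * np.pi / (2 * N))
    t[0, :] = 1.0 / np.sqrt(N)
    return t

T8 = dct_matrix(8)
print(np.allclose(T8 @ T8.T, np.eye(8)))   # orthogonal for any N

# 2D transform of an NxN block: Y = T X T^T, just as in the 8x8 case
x = np.random.default_rng(0).normal(size=(16, 16))
T16 = dct_matrix(16)
y = T16 @ x @ T16.T
print(np.allclose(T16.T @ y @ T16, x))     # the inverse recovers x
```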