ENEE631 Digital Image Processing (Spring'04)
Basic Video Coding and MPEG
Spring ’04 Instructor: Min Wu
ECE Department, Univ. of Maryland, College Park
www.ajconline.umd.edu (select ENEE631 S’04) [email protected]
UMCP ENEE631 Slides (created by M.Wu © 2004)
Based on ENEE631 Spring’04 Section 12
ENEE631 Digital Image Processing (Spring'04) Lec18 – Video Coding (1) [3]
Motion Pictures
Bring in Motion: Video (Motion Pictures)
Capturing video
– Frame by frame => image sequence
– Image sequence: a 3-D signal
   2 spatial dimensions & 1 time dimension
   continuous I( x, y, t ) => discrete I( m, n, t_k )
Encode digital video
– Simplest way ~ compress each frame image individually
   e.g., “motion-JPEG”
   only spatial redundancy is exploited and reduced
– How about temporal redundancy? Is differential coding good?
   Pixel-by-pixel difference could still be large due to motion
   => Need better prediction
UMCP ENEE408G Slides (created by M.Wu & R.Liu © 2002)
[Figure: “Horse ride” sequence, showing motion estimation, pixel-wise difference w/o motion compensation, and residue after motion compensation. (From Princeton EE330 S’01 by B.Liu)]
Motion Estimation
Help understand the content of an image sequence
– For surveillance
Help reduce temporal redundancy of video
– For compression
Stabilize video by detecting and removing small, noisy global motions
– For building stabilizer in camcorder
Block-Matching by Exhaustive Search
Assume block-based translation motion model
Search every possibility over a specified range for the best matching block
– MAD (mean absolute difference) often used for simplicity
Flash Demo (by Dr. Ken Lam @ Hong Kong PolyTech Univ.)
(From Wang’s Preprint Fig.6.6)
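The exhaustive search described on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: frames are lists of rows of gray values, the function names (`mad`, `full_search`, `make_frame`) are hypothetical, and the motion vector is taken to point from the current block to its match in the reference frame.

```python
def mad(cur, ref, bx, by, dx, dy, n):
    """Mean absolute difference between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total / (n * n)


def full_search(cur, ref, bx, by, n, r):
    """Try every integer displacement in [-r, r] x [-r, r]; return (dx, dy, best_mad)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if bx + dx < 0 or by + dy < 0 or bx + dx + n > w or by + dy + n > h:
                continue
            cost = mad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


def make_frame(x0, y0, size=16, n=4):
    """Toy frame (assumption): black background with a bright n x n square at (x0, y0)."""
    return [[255 if (x0 <= x < x0 + n and y0 <= y < y0 + n) else 0
             for x in range(size)] for y in range(size)]


ref = make_frame(6, 5)   # square at x=6, y=5 in the reference frame
cur = make_frame(4, 4)   # square at x=4, y=4 in the current frame
mv = full_search(cur, ref, bx=4, by=4, n=4, r=3)
print(mv)  # (2, 1, 0.0)
```

With these toy frames the only zero-MAD candidate is the displacement (2, 1), so the search recovers it exactly; real frames rarely give a zero residue.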
Motion Compensation
– Help reduce temporal redundancy of video
[Figure: four panels showing previous frame, current frame, predicted frame, and prediction error frame]
Revised from R.Liu Seminar Course ’00 @ UMD
Complexity of Exhaustive Block-Matching
Assumptions
– Block size NxN and image size S = M1 x M2
– Search step size is 1 pixel ~ “integer-pel accuracy”
– Search range +/–R pixels both horizontally and vertically
Computation complexity
– # Candidate matching blocks = (2R+1)^2
– # Operations for computing MAD for one block ~ O(N^2)
– # Operations for MV estimation per block ~ O((2R+1)^2 N^2)
– # Blocks = S / N^2
– Total # operations for entire frame ~ O((2R+1)^2 S)
   i.e., overall computation load is independent of block size!
E.g., M1 = M2 = 512, N=16, R=16, 30fps
   => On the order of 8.56 x 10^9 operations per second!
– Was difficult for real time estimation, but possible with parallel hardware
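The operation count can be checked with a few lines of arithmetic. This sketch assumes one "operation" per pixel comparison and a 512 x 512 frame; the function name is hypothetical.

```python
def full_search_ops_per_second(m1, m2, r, fps):
    """Pixel comparisons per second for exhaustive integer-pel block matching."""
    candidates = (2 * r + 1) ** 2       # candidate displacements per block
    # (S / N^2 blocks) x (candidates) x (N^2 comparisons per candidate):
    # the block size N cancels, leaving candidates x S per frame.
    per_frame = candidates * m1 * m2
    return per_frame * fps


ops = full_search_ops_per_second(m1=512, m2=512, r=16, fps=30)
print(f"{ops:.3e}")  # 8.564e+09
```

The block size N does not appear in the function at all, which is the slide's point that the total load is independent of block size.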
Exhaustive Search: Cons and Pros
Pros
– Guaranteed optimality within search range and motion model
Cons
– Can only search among finitely many candidates
   What if the motion is “fractional”?
– High computation complexity
   On the order of [search-range-size * image-size] for 1-pixel step size
How to improve accuracy?
– Include blocks at fractional translation as candidates => require interpolation
How to improve speed?
– Try to exclude unlikely candidates
Fractional Accuracy Search for Block Matching
For motion accuracy of 1/K pixel
– Upsample (interpolate) reference frame by a factor of K
– Search for the best matching block in the upsampled reference frame
Half-pel accuracy ~ K=2
– Significant accuracy improvement over integer-pel (esp. for low-resolution)
– Complexity increase
(From Wang’s Preprint Fig.6.7)
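A minimal sketch of the 1/K-pel idea, assuming bilinear interpolation for the upsampling (the slide does not say which interpolator is used) and hypothetical function names. The toy frames are a linear ramp, chosen because bilinear interpolation reproduces a half-pixel shift of a ramp exactly.

```python
def upsample_bilinear(img, k):
    """Upsample a 2-D list by factor k with bilinear interpolation (an assumed choice)."""
    h, w = len(img), len(img[0])
    out = []
    for yy in range(k * (h - 1) + 1):
        y0, ry = divmod(yy, k)
        y1, fy = min(y0 + 1, h - 1), ry / k
        row = []
        for xx in range(k * (w - 1) + 1):
            x0, rx = divmod(xx, k)
            x1, fx = min(x0 + 1, w - 1), rx / k
            top = (1 - fx) * img[y0][x0] + fx * img[y0][x1]
            bot = (1 - fx) * img[y1][x0] + fx * img[y1][x1]
            row.append((1 - fy) * top + fy * bot)
        out.append(row)
    return out


def fractional_search(cur, ref, bx, by, n, r, k):
    """Full search in steps of 1/k pel over +/- r pels; returns (dx, dy, mad) in pels."""
    up = upsample_bilinear(ref, k)
    uh, uw = len(up), len(up[0])
    best = None
    for dy in range(-r * k, r * k + 1):
        for dx in range(-r * k, r * k + 1):
            x_lo, y_lo = k * bx + dx, k * by + dy
            # Skip candidates that leave the upsampled reference frame.
            if x_lo < 0 or y_lo < 0 or x_lo + k * (n - 1) >= uw or y_lo + k * (n - 1) >= uh:
                continue
            total = 0.0
            for j in range(n):
                for i in range(n):
                    total += abs(cur[by + j][bx + i] - up[y_lo + k * j][x_lo + k * i])
            cost = total / (n * n)
            if best is None or cost < best[2]:
                best = (dx / k, dy / k, cost)
    return best


# Toy data (assumption): a ramp, and the same ramp shifted by half a pixel in x.
ref = [[10 * x + 3 * y for x in range(8)] for y in range(8)]
cur = [[10 * x + 3 * y + 5 for x in range(8)] for y in range(8)]
best = fractional_search(cur, ref, bx=2, by=2, n=4, r=1, k=2)
print(best)  # (0.5, 0.0, 0.0)
```

An integer-pel search on the same data could do no better than a MAD of 5, which illustrates the accuracy gain at the price of (2RK+1)^2 candidates instead of (2R+1)^2.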
Fast Algorithms for Block Matching
Basic ideas
– Matching errors near the best match are generally smaller than far away
– Skip candidates that are unlikely to give good match
(From Wang’s Preprint Fig.6.6)
Fast Algorithm: 3-Step Search
Search candidates at 8 neighbor positions
Step size cut down by 2 after each iteration
– Start with step size approx. half of max. search range
[Figure: search points M1–M24 on a dx/dy grid spanning –6 to +6; resulting motion vector {dx, dy} = {1, 6}]
Total number of computations:
   9 + 8 x 2 = 25 (3-step)
   (2R+1)^2 = 169 (full search)
(Fig. from Ken Lam – HK Poly Univ. short course in summer’2001)
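The search pattern can be sketched as follows. This is an illustration, not the slide author's code: `cost` stands for any matching error as a function of the displacement (e.g. MAD), the step schedule 3, 2, 1 for R = 6 is inferred from the figure, and `toy_cost` is a made-up error surface whose minimum sits at the figure's motion vector.

```python
def three_step_search(cost, r):
    """Coarse-to-fine search: test the 8 neighbours of the current centre at the
    current step size, move to the best point, shrink the step, and repeat.
    Returns ((dx, dy), number_of_cost_evaluations)."""
    step = max(1, (r + 1) // 2)          # start at about half the search range
    cx, cy = 0, 0
    best = cost(cx, cy)
    evaluations = 1
    while True:
        nx, ny = cx, cy
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                x, y = cx + dx, cy + dy
                if (dx == 0 and dy == 0) or abs(x) > r or abs(y) > r:
                    continue             # centre already evaluated, or out of range
                c = cost(x, y)
                evaluations += 1
                if c < best:
                    best, nx, ny = c, x, y
        cx, cy = nx, ny
        if step == 1:
            break
        step = (step + 1) // 2           # roughly halve: 3 -> 2 -> 1 for R = 6
    return (cx, cy), evaluations


def toy_cost(dx, dy):
    """Hypothetical smooth error surface with its minimum at (1, 6)."""
    return abs(dx - 1) + abs(dy - 6)


mv, evaluations = three_step_search(toy_cost, r=6)
print(mv, evaluations)  # (1, 6) 25
```

The method relies on the error surface decreasing toward the best match; on a surface with several local minima it can settle on a suboptimal vector, which full search would not.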
=> See Flash demo by Jane Kim (UMD)
Hierarchical Block Matching
Problem with fast search at full resolution
– Small mis-alignment may give high displacement error (EDFD), esp. for texture and edge blocks
Hierarchical (multi-resolution) block matching
– Match with coarse resolution to narrow down search range
– Match with high resolution to refine motion estimation
[Figure: image pyramid with lowest-resolution, lower-resolution, and original-resolution levels. (From Wang’s Preprint Fig.6.19)]
Hybrid Coding for Video
DCT-M.E. Hybrid Video Coding
“Hybrid” ~ combined transform coding & predictive coding
Spatial redundancy removal
– Use DCT-based transform coding for reference frame
Temporal redundancy removal
– Use motion-based predictive coding for next frames
   estimate motion and use reference frame to predict
   only encode MV & prediction residue (“motion compensation residue”)
(From Princeton EE330 S’01 by B.Liu)
Review: Predictive Coding with Quantization
Consider: high correlation between successive samples
Predictive coding
– Basic principle: Remove redundancy between successive pixels and only encode residual between actual and predicted
– Residue usually has much smaller dynamic range
   Allows fewer quantization levels for the same MSE => get compression
– Compression efficiency depends on intersample redundancy
First try:
[Diagram: Encoder: e(n) = u(n) – u’P(n), with u’P(n) = f[u(n-1)]; e(n) -> Quantizer -> eQ(n). Decoder: uQ(n) = eQ(n) + uP(n), with uP(n) = f[uQ(n-1)]]
Any problem with this codec?
Predictive Coding (cont’d)
Problem with 1st try
– Inputs to the predictor are different at encoder and decoder
   decoder doesn’t know u(n)!
– Mismatch error could propagate to future reconstructed samples
Solution: Differential PCM (DPCM)
– Use quantized sequence uQ(n) for prediction at both encoder and decoder
– Simple predictor f[ x ] = x
– Prediction error e(n)
– Quantized prediction error eQ(n)
– Distortion d(n) = e(n) – eQ(n)
[Diagram: Encoder: e(n) = u(n) – uP(n); e(n) -> Quantizer -> eQ(n); uQ(n) = eQ(n) + uP(n) is fed back through the Predictor, uP(n) = f[uQ(n-1)]. Decoder: uQ(n) = eQ(n) + uP(n), with uP(n) = f[uQ(n-1)]]
Note: “Predictor” contains one-step buffer as input to the prediction
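The difference between the "first try" and DPCM can be seen on a short ramp. The sketch below assumes the simple predictor f[x] = x, a uniform quantizer with a hypothetical step size of 4, and an initial prediction of 0; the function names are illustrative only.

```python
import math


def quantize(e, step):
    """Uniform quantizer: round e to the nearest multiple of `step` (ties round up)."""
    return step * math.floor(e / step + 0.5)


def encode_first_try(samples, step):
    """Open loop: the encoder predicts from the ORIGINAL previous sample u(n-1)."""
    residues, prev = [], 0
    for u in samples:
        residues.append(quantize(u - prev, step))
        prev = u
    return residues


def encode_dpcm(samples, step):
    """DPCM: the encoder predicts from the RECONSTRUCTED previous sample uQ(n-1)."""
    residues, prev_q = [], 0
    for u in samples:
        eq = quantize(u - prev_q, step)
        residues.append(eq)
        prev_q += eq                 # same reconstruction the decoder will form
    return residues


def decode(residues):
    """Decoder for both schemes: uQ(n) = uQ(n-1) + eQ(n)."""
    out, prev_q = [], 0
    for eq in residues:
        prev_q += eq
        out.append(prev_q)
    return out


samples = [3, 6, 9, 12, 15, 18]
open_loop = decode(encode_first_try(samples, step=4))
closed_loop = decode(encode_dpcm(samples, step=4))
print(open_loop)    # [4, 8, 12, 16, 20, 24]  error grows by 1 per sample
print(closed_loop)  # [4, 8, 8, 12, 16, 20]   error stays within step/2
```

In the open-loop codec each quantization error is added to all later reconstructions, so the error accumulates; in DPCM the encoder tracks the decoder's state, so the reconstruction error equals the current quantization error only.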
Hybrid MC-DCT Video Encoder
(From R.Liu’s Handbook Fig.2.18)
• Intra-frame: encoded without prediction
• Inter-frame: predictively encoded => use quantized frames as ref for residue
Hybrid MC-DCT Video Decoder
(From R.Liu’s Handbook Fig.2.18)
Hybrid Video Coding: Problems to Be Solved
Not all regions are easily inferable from previous frame
– Occlusion ~ solvable by backward prediction using future frames as ref.
– Adaptively decide whether to use prediction or not
Drifting and error propagation
– Solution: Encode reference regions or frames from time to time
Random access
– Solution: Encode frame without prediction from time to time
How to allocate bits?
– Based on visual model and statistics: JPEG-like quant. steps; entropy coding
– Consider constant or variable bit-rate requirement
   Constant-bit-rate (CBR) vs. Variable-bit-rate (VBR)
Wrap up all solutions ~ MPEG-like codec
MPEG Video Coding
About MPEG
MPEG – Moving Picture Experts Group
– Coding of moving pictures and associated audio
Basic compression idea on the picture part
– Can achieve compression ratio of about 50:1 through storing only the differences between successive frames
– Some claim higher compression ratio
   Depends on how we calculate
   Notice color is often downsampled, and odd/even fields are interleaved
Audio part
– Compression of audio data at ratios ranging from 5:1 to 10:1
– MP3 ~ “MPEG-1 audio Layer-3”
Progressive vs. Interlaced scan
(From B.Liu EE330 S’01 Princeton)
UMCP ENEE631 Slides (created by M.Wu © 2001)
Compression Ratio
Raw video
– 24 bits/pixel x (720 x 480 pixels) x 30 fps = 249 Mbps
Potential “cheating” points => contributing ~4:1 inflation
– Color components are actually downsampled
– 30 fps may refer to field rate in MPEG-2 ~ equiv. to 15 fps
– ( 8 x 720 x 480 + 16 x 720 x 480 / 4 ) x 15 fps = 62 Mbps
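The two bit-rate figures and the ~4:1 factor can be reproduced directly. The variable names are illustrative; the second figure follows the slide's accounting of 8 bits of luma per pixel plus 16 bits of chroma shared by every 4 pixels, at 15 full frames per second.

```python
WIDTH, HEIGHT = 720, 480

# Nominal raw rate: 24 bits per pixel at 30 frames per second.
raw_bps = 24 * WIDTH * HEIGHT * 30

# Adjusted rate: full-resolution luma (8 bits/pixel), two chroma components
# (16 bits) kept for one pixel in four, and 15 frames per second.
luma_bits = 8 * WIDTH * HEIGHT
chroma_bits = 16 * WIDTH * HEIGHT // 4
adjusted_bps = (luma_bits + chroma_bits) * 15

print(raw_bps / 1e6)           # 248.832
print(adjusted_bps / 1e6)      # 62.208
print(raw_bps / adjusted_bps)  # 4.0
```

A quoted compression ratio computed against the 249 Mbps figure is therefore about four times larger than one computed against the 62 Mbps figure for the same coded stream.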
MPEG Generations
MPEG-1 ~ 1-1.5Mbps (early 90s)
– For compression of 320x240 full-motion video at rates around 1.15Mb/s
– Applications: video storage (VCD)
MPEG-2 ~ 2-80Mbps (mid 90s)
– For higher resolutions
– Support interlaced video formats and a number of features for HDTV
– Address scalable video coding
– Also used in DVD
MPEG-4 ~ 9-40kbps (later 90s)
– For very low bit rate video and audio coding
– Applications: interactive multimedia and video telephony
MPEG-21 ~ ongoing
MPEG-1 Video Coding Standard
Standard only specifies decoders’ capabilities
– Prefer simple decoding and not limit encoder’s complexity
– Leave flexibility and competition in implementing encoder
Block-based hybrid coding (DCT + M.C.)
– 8x8 block size as basic coding unit
– 16x16 “macroblock” size for motion estimation/compensation
Group-of-Picture (GOP) structure with 3 types of frames
– Intra coded
– Forward-predictively coded
– Bidirectional-predictively coded
MPEG-1 Picture Types and Group-of-Pictures
A Group-of-Picture (GOP) contains 3 types of frames (I/P/B)
Frame order
I1 BBB P1 BBB P2 BBB I2 …
Coding order
I1 P1 BBB P2 BBB I2 BBB …
(From R.Liu Handbook Fig.3.13)
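The reordering from frame (display) order to coding order follows one rule: a B-frame cannot be coded until the later I/P frame it refers to has been coded. A minimal sketch, with hypothetical frame labels and function name:

```python
def display_to_coding_order(frames):
    """Reorder frame labels so each I/P anchor precedes the B-frames that
    sit before it in display order."""
    coded, pending_b = [], []
    for frame in frames:
        if frame.startswith("B"):
            pending_b.append(frame)    # B-frames wait for their future reference
        else:
            coded.append(frame)        # the I/P anchor is coded first...
            coded.extend(pending_b)    # ...then the B-frames that depend on it
            pending_b = []
    coded.extend(pending_b)            # trailing B-frames with no later anchor
    return coded


display = ["I1", "B1", "B2", "B3", "P1", "B4", "B5", "B6",
           "P2", "B7", "B8", "B9", "I2"]
coding = display_to_coding_order(display)
print(" ".join(coding))
# I1 P1 B1 B2 B3 P2 B4 B5 B6 I2 B7 B8 B9
```

The decoder performs the inverse reordering, which is why B-frames add both buffering and delay.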
“Adaptive” Predictive Coding in MPEG-1
Half-pel M.V. search within +/-64 pel range
– Use spatial differential coding on M.V. to remove M.V. spatial redundancy
Coding each block in P-frame
– Predictive block using previous I/P frame as reference
– Intra-block ~ encode without prediction
   use this if prediction costs more bits than non-prediction
   good for occluded area
   can also avoid error propagation
Coding each block in B-frame
– Intra-block ~ encode without prediction
– Predictive block
   Use previous I/P frame as reference (forward prediction),
   Or use future I/P frame as reference (backward prediction),
   Or use both for prediction and take average
Coding of B-frame (cont’d)
[Figure: block A in the previous frame, block B in the current frame, block C in the future frame]
B = A    forward prediction (one motion vector)
B = C    backward prediction (one motion vector)
or B = (A+C)/2    interpolation (two motion vectors)
(Fig. from Ken Lam – HK Poly Univ. short course in summer’2001)
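Choosing among the three B-block predictions can be sketched as picking the candidate with the smallest residue. This is a simplification under stated assumptions: blocks are flat lists of pixel values already motion-compensated, the selection criterion is the sum of absolute differences rather than actual bit cost, the average is not rounded, and the function names are hypothetical.

```python
def sad(p, q):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for a, b in zip(p, q))


def choose_b_prediction(cur, fwd, bwd):
    """Pick forward (A), backward (C), or interpolated ((A+C)/2) prediction
    for the current block B, whichever leaves the smallest residue."""
    candidates = {
        "forward": fwd,
        "backward": bwd,
        "interpolated": [(a + c) / 2 for a, c in zip(fwd, bwd)],
    }
    mode = min(candidates, key=lambda m: sad(cur, candidates[m]))
    return mode, sad(cur, candidates[mode])


# Toy blocks (assumption): brightness fades linearly from A to C through B.
block_a = [10, 10, 10, 10]
block_b = [15, 15, 15, 15]
block_c = [20, 20, 20, 20]
mode, residue = choose_b_prediction(block_b, block_a, block_c)
print(mode, residue)  # interpolated 0.0
```

An actual encoder would also weigh the cost of sending two motion vectors for the interpolated mode and would compare against intra coding, as the preceding slide notes.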
Quantization for I-frame (I-block) & M.C. Residues
Quantizer for I-frame (I-block)
– Different step size for different freq. band (similar to JPEG)
– Default quantization table
– Scale the table for different compression-quality
Quantizer for residues in predictive block
– Noise-like residue
– Similar variance in different freq. band
– Assign same quantization step size for each freq. band
Revised from R.Liu Seminar Course ’00 @ UMD
Adjusting Quantizer
For smoothing out bit rate
– Some applications prefer approx. constant bit rate video stream (CBR)
   e.g., prescribe # bits per second
   very-short-term bit-rate variations can be smoothed by a buffer
   variations can’t be too large on longer term ~ o.w. buffer overflow
– Need to assign large step size for complex and high-motion frames
For reducing bit rate by exploiting HVS temporal properties
– Noise/distortion in a video frame would not be very visible when there is a sharp temporal transition (scene change)
   can compress a few frames right after scene change with fewer bits
Alternative bit-rate adjustment tool ~ frame type
– I I I I I I … lowest compression ratio (like motion-JPEG)
– I P P … P I P P … moderate compression ratio
– I B B P B B P B B I … highest compression ratio
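Color Transformation
RGB => YUV color coordinates
\[
\begin{bmatrix} Y \\ U \\ V \end{bmatrix} =
\begin{bmatrix}
 0.2990 &  0.5870 &  0.1140 \\
-0.1687 & -0.3313 &  0.5000 \\
 0.5000 & -0.4187 & -0.0813
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\]
U/V chrominance components are downsampled in coding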
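A small sketch of the color transform followed by chroma downsampling. The coefficients are the slide's values with the minus signs of the standard JPEG-style matrix restored, which is an assumption about the extraction-damaged original; the 2x2 averaging is one simple downsampling choice, not something the slide specifies, and the function names are hypothetical.

```python
# Rows give Y, U, V as weighted sums of (R, G, B); signs are assumed as in the
# standard JPEG-style RGB -> YCbCr matrix (without the +128 chroma offset).
RGB_TO_YUV = (
    (0.2990, 0.5870, 0.1140),
    (-0.1687, -0.3313, 0.5000),
    (0.5000, -0.4187, -0.0813),
)


def rgb_to_yuv(r, g, b):
    """Apply the 3x3 matrix to one pixel."""
    return tuple(cr * r + cg * g + cb * b for cr, cg, cb in RGB_TO_YUV)


def downsample_2x2(plane):
    """Average each 2x2 neighbourhood: one value kept for every 4 pixels."""
    return [[(plane[y][x] + plane[y][x + 1]
              + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
             for x in range(0, len(plane[0]) - 1, 2)]
            for y in range(0, len(plane) - 1, 2)]


y, u, v = rgb_to_yuv(255, 255, 255)     # white: full luma, zero chroma
small = downsample_2x2([[1, 3], [5, 7]])
print(round(y, 6), round(abs(u), 6), round(abs(v), 6))
print(small)  # [[4.0]]
```

Each chroma row sums to zero, so any gray pixel (R = G = B) maps to U = V = 0; that is why the chroma planes carry little energy and tolerate downsampling.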
Video Coding Summary: Performance Tradeoff
From R.Liu’s Handbook Fig.1.2:
“mos” ~ 5-pt mean opinion scale of bad, poor, fair, good, excellent