M. Wu: ENEE631 Digital Image Processing (Spring'09)

MPEG Video Coding and Beyond

Spring ’09 Instructor: Min Wu

Electrical and Computer Engineering Department,
University of Maryland, College Park

bb.eng.umd.edu (select ENEE631 S’09) [email protected]

ENEE631 Spring’09, Lecture 17 (4/6/2009)


Overview and Logistics

Last Time:
– Block-matching and application to hybrid video coding
    Exploit spatial redundancy via transform coding: e.g. block DCT coding
    Exploit temporal redundancy via predictive coding: ME/MC
– MPEG-1 video coding standard

Today:
– Finish MPEG-1 discussion
– Other coding considerations/standards: H.26x, MPEG-2, MPEG-4, etc.
– Geometric transform of images

Assignment #4 on video and motion estimation: posted online

UMCP ENEE631 Slides (created by M. Wu © 2004)


Review: DCT + ME/MC for Hybrid Video Coding

“Hybrid” ~ combined transform coding & predictive coding

Spatial redundancy removal
– Use DCT-based transform coding for reference frame

Temporal redundancy removal
– Use motion-based predictive coding for next frames
    estimate motion and use reference frame to predict
    only encode MV & prediction residue (“motion compensation residue”)

(From Princeton EE330 S’01 by B.Liu)
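The ME/MC step above can be illustrated with a minimal exhaustive block-matching sketch. The block size, search range, SAD criterion, and the toy frames are assumptions chosen for illustration; they are not taken from the slides or from any standard.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))

def block_match(cur, ref, bx, by, n=4, search=2):
    """Exhaustive search over [-search, search]^2 for the displacement with minimum SAD."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if bx + dx < 0 or by + dy < 0 or bx + dx + n > w or by + dy + n > h:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]

def residue(cur, ref, bx, by, mv, n=4):
    """Motion-compensation residue: current block minus the displaced reference block."""
    dx, dy = mv
    return [[cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i] for i in range(n)]
            for j in range(n)]

# Toy data (hypothetical): the current frame is the reference shifted right by one pixel.
ref = [[(3 * x + 7 * y * y) % 31 for x in range(12)] for y in range(12)]
cur = [[ref[y][x - 1] if x >= 1 else 0 for x in range(12)] for y in range(12)]

mv_x, mv_y, cost = block_match(cur, ref, bx=4, by=4)
res = residue(cur, ref, 4, 4, (mv_x, mv_y))
print("motion vector:", (mv_x, mv_y), "SAD:", cost)
```

Only the motion vector and the (here all-zero) residue would need to be coded for this block, which is the point of the predictive part of the hybrid coder.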

UMCP ENEE408G Slides (created by M. Wu & R. Liu © 2002)


Review: Hybrid MC-DCT Video Encoder & Decoder
(From R.Liu’s Handbook Fig.2.18)

• Intra-frame: encoded without prediction

• Inter-frame: predictively encoded => use quantized (reconstructed) frames as reference when computing the residue, so that the encoder predicts from the same data the decoder has



Review: Additional Issues in Hybrid Video Coding

Not all regions are easily inferable from previous frame
– Occlusion ~ solvable by backward prediction using future frames as ref.
– Adaptively decide using prediction or not

Drifting and error propagation
– Solution: Encode reference regions or frames from time to time (“intra coding”)

Random access: e.g. want to get 95th frame
– Solution: Encode frame without prediction from time to time

How to allocate bits?
– Based on visual model and statistics: JPEG-like quantiz. steps; entropy coding
– Consider constant or variable bit-rate requirement
    Constant-bit-rate (CBR) vs. Variable-bit-rate (VBR)

Wrap up all solutions ~ MPEG-like codec



Review: MPEG-1 Video Coding Standard

Standard only specifies decoders’ capabilities
– Prefer simple decoding and not limit encoder’s complexity
– Leave flexibility and competition in implementing encoder

Block-based hybrid coding (DCT + M.C.)
– 8x8 block size as basic coding unit
– 16x16 “macroblock” size for motion estimation/compensation

Group-of-Picture (GOP) structure with 3 types of frames
– Intra coded
– Forward-predictively coded
– Bidirectional-predictively coded



MPEG-1 Picture Types and Group-of-Pictures

A Group-of-Picture (GOP) contains 3 types of frames (I/P/B)

Frame order

I1 BBB P1 BBB P2 BBB I2 …

Coding order

I1 P1 BBB P2 BBB I2 BBB …

(From R.Liu Handbook Fig.3.13)
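The reordering from frame (display) order to coding order can be sketched as follows. The frame labels and the function name `display_to_coding_order` are hypothetical; the rule implemented is simply that each B-frame is sent after the next I/P anchor it depends on.

```python
def display_to_coding_order(frames):
    """Reorder frame labels from display order to coding order.

    A B-frame needs the following I/P frame as a (backward) reference,
    so that anchor is emitted first and the waiting B-frames follow it.
    """
    coded, pending_b = [], []
    for f in frames:
        if f.startswith("B"):
            pending_b.append(f)        # B-frames wait for their future reference
        else:
            coded.append(f)            # anchor frame (I or P) goes out first
            coded.extend(pending_b)    # then the B-frames displayed before it
            pending_b = []
    coded.extend(pending_b)            # trailing B-frames with no later anchor in this list
    return coded

display = ["I1", "B1", "B2", "B3", "P1", "B4", "B5", "B6", "P2", "B7", "B8", "B9", "I2"]
coding = display_to_coding_order(display)
print(" ".join(coding))
```

With the pattern used on the slide this gives I1 P1 BBB P2 BBB I2 BBB, matching the coding order shown above.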



“Adaptive” Predictive Coding in MPEG-1

Half-pel M.V. search within +/-64 pel range
– Use spatial differential coding on M.V. to remove M.V. spatial redundancy

Coding each block in P-frame
– Predictive block using previous I/P frame as reference
– Intra-block ~ encode without prediction
    use this if prediction costs more bits than non-prediction
    good for occluded area
    can also avoid error propagation

Coding each block in B-frame
– Intra-block ~ encode without prediction
– Predictive block
    Use previous I/P frame as reference (forward prediction),
    or use future I/P frame as reference (backward prediction),
    or use both for prediction and take average



Coding of B-frame (cont’d)

[Figure: block A in the previous frame, block B in the current frame, block C in the future frame]

B = A  forward prediction (one motion vector)
B = C  backward prediction (one motion vector)
or B = (A+C)/2  interpolation (two motion vectors)


(Fig. from Ken Lam – HK Poly Univ. short course in summer’2001)
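A minimal sketch of the three B-block prediction choices follows. It assumes the blocks A and C have already been motion compensated and flattened into lists, and it picks the mode by squared prediction error; a real encoder would more likely compare bit costs, so the selection rule here is an assumption.

```python
def sse(block, pred):
    """Sum of squared errors between a block and its prediction."""
    return sum((b - p) ** 2 for b, p in zip(block, pred))

def choose_b_mode(cur, fwd, bwd):
    """Return the best prediction mode for block `cur` and the resulting residue.

    fwd: motion-compensated block from the previous I/P frame (A)
    bwd: motion-compensated block from the future I/P frame (C)
    """
    candidates = {
        "forward": list(fwd),
        "backward": list(bwd),
        "interpolated": [(a + c) / 2 for a, c in zip(fwd, bwd)],
    }
    mode = min(candidates, key=lambda m: sse(cur, candidates[m]))
    pred = candidates[mode]
    return mode, [b - p for b, p in zip(cur, pred)]

# Toy blocks (hypothetical values): B lies halfway between A and C.
A = [10, 20, 30, 40]
C = [14, 24, 34, 44]
B = [12, 22, 32, 42]

mode, res = choose_b_mode(B, A, C)
print(mode, res)
```

An intra mode would be added as a fourth candidate in the same way when no prediction is good enough.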


Quantization for I-frame (I-block) & M.C. Residues

Quantizer for I-frame (I-block)
– Different step size for different freq. band (similar to JPEG)
– Default quantization table
– Scale the table for different compression-quality

Quantizer for residues in predictive block
– Noise-like residue
– Similar variance in different frequency band
=> Assign same quantization step size for each frequency band


Revised from R.Liu Seminar Course ’00 @ UMD
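The two quantizer styles can be compared with a small sketch. The intra table below is a made-up table whose step grows with frequency; it is NOT the MPEG-1 default table, and the flat step of 16 and the test coefficients are likewise assumptions.

```python
N = 8

# Hypothetical intra table: larger steps at higher frequency indices (u + v).
INTRA_TABLE = [[8 + 2 * (u + v) for v in range(N)] for u in range(N)]

# Flat table for prediction residues: same step in every frequency band.
FLAT_TABLE = [[16] * N for _ in range(N)]

def quantize(coeffs, steps, scale=1.0):
    """Uniform quantization of an N x N coefficient block; `scale` multiplies every step."""
    return [[round(coeffs[u][v] / (steps[u][v] * scale)) for v in range(N)]
            for u in range(N)]

def dequantize(levels, steps, scale=1.0):
    """Map quantization levels back to reconstructed coefficient values."""
    return [[levels[u][v] * steps[u][v] * scale for v in range(N)] for u in range(N)]

def nonzero(levels):
    """Count nonzero levels, a rough proxy for the bits needed."""
    return sum(1 for row in levels for x in row if x != 0)

# Toy DCT block (hypothetical): energy decays with frequency.
coeffs = [[100.0 / (1 + u + v) for v in range(N)] for u in range(N)]

q_fine = quantize(coeffs, INTRA_TABLE)               # table as is
q_coarse = quantize(coeffs, INTRA_TABLE, scale=4.0)  # scaled table: stronger compression
q_flat = quantize(coeffs, FLAT_TABLE)                # residue-style flat quantizer
print(nonzero(q_fine), nonzero(q_coarse), nonzero(q_flat))
```

Scaling the table trades quality for bit rate: fewer nonzero levels survive, and the reconstruction error per coefficient is bounded by half of the scaled step.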


Adjusting Quantizer

For smoothing out bit rate
– Some applications prefer approx. constant bit rate video stream (CBR)
    e.g., prescribe # bits per second
    very-short-term bit-rate variations can be smoothed by a buffer
    variations can’t be too large on longer term ~ otherwise buffer overflow, delay and jitter in playback
– Need to assign large step size for complex and high-motion frames

For reducing bit rate by exploiting HVS temporal properties
– Noise/distortion in a video frame would not be very much visible when there is a sharp temporal transition (scene change)
    can compress a few frames right after scene change with fewer bits

Alternative bit-rate adjustment tool ~ frame type
– I I I I I I … lowest compression ratio (like motion-JPEG)
– I P P … P I P P … moderate compression ratio
– I B B P B B P B B I … highest compression ratio
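The buffer-based smoothing idea can be sketched with a toy feedback loop. Everything here is an assumption made for illustration: the model "bits = complexity / step", the linear mapping from buffer fullness to step size, and all numeric values. It is not the rate control of any particular encoder.

```python
def simulate(complexities, target_bits=1000.0, buffer_size=4000.0):
    """Toy CBR loop: the fuller the output buffer, the larger the quantizer step.

    Returns a list of (step, bits_produced, buffer_fullness) per frame.
    """
    fullness = buffer_size / 2
    log = []
    for c in complexities:
        # Assumed control law: step grows linearly with buffer fullness (range 1..31).
        step = max(1.0, 31.0 * fullness / buffer_size)
        bits = c / step                      # assumed rate model
        fullness = fullness + bits - target_bits
        # Clamped for the sketch; a real system would have to drop data or stall on overflow.
        fullness = min(buffer_size, max(0.0, fullness))
        log.append((step, bits, fullness))
    return log

steady = simulate([15500.0] * 5)              # frames that exactly match the channel rate
burst = simulate([15500.0, 62000.0, 15500.0])  # one complex, high-motion frame in the middle
for step, bits, fullness in burst:
    print(f"step={step:5.1f}  bits={bits:7.1f}  buffer={fullness:7.1f}")
```

In the second run the complex frame fills the buffer, and the following frame is quantized with a larger step, which is the behavior the slide describes.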

UMCP ENEE631 Slides (created by M. Wu © 2001)


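Color Transformation

RGB => YUV color coordinates

U/V chrominance components are downsampled in coding

\[
\begin{bmatrix} Y \\ U \\ V \end{bmatrix}
=
\begin{bmatrix}
 0.2990 &  0.5870 &  0.1140 \\
-0.1687 & -0.3313 &  0.5000 \\
 0.5000 & -0.4187 & -0.0813
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\]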
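As a rough illustration of the color transform and chroma downsampling, the sketch below applies the matrix above to one pixel and averages 2x2 neighbourhoods of a chroma plane. The 2x2 averaging filter and the function names are assumptions; actual codecs specify their own sampling positions and filters.

```python
# Rows give Y, U, V as weighted sums of R, G, B (same matrix as on the slide).
M = [[ 0.2990,  0.5870,  0.1140],
     [-0.1687, -0.3313,  0.5000],
     [ 0.5000, -0.4187, -0.0813]]

def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to (Y, U, V)."""
    return tuple(m[0] * r + m[1] * g + m[2] * b for m in M)

def downsample_2x2(plane):
    """Halve a plane in both directions by averaging 2x2 blocks (even dimensions assumed)."""
    return [[(plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
             for x in range(0, len(plane[0]), 2)]
            for y in range(0, len(plane), 2)]

y, u, v = rgb_to_yuv(255, 255, 255)   # white has full luminance and no chrominance
print(round(y, 3), round(u, 3), round(v, 3))

u_plane = [[1, 3, 0, 0],
           [5, 7, 0, 0],
           [2, 2, 4, 4],
           [2, 2, 4, 4]]
print(downsample_2x2(u_plane))
```

After downsampling, each chroma plane holds a quarter of the samples of the luminance plane.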



Video Coding Summary: Performance Tradeoff

From R.Liu’s Handbook Fig.1.2:

“mos” ~ 5-pt mean opinion scale of bad, poor, fair, good, excellent



About Compression Ratio

Raw video
– 24 bits/pixel x (720 x 480 pixels) x 30 fps = 249 Mbps

Potential “cheating” points => contributing ~ 4:1 inflation
– Color components are actually downsampled
– 30 fps may refer to field rate in MPEG-2 ~ equiv. to 15 fps
– ( 8 x 720 x 480 + 16 x 720 x 480 / 4 ) x 15 fps = 62 Mbps
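The arithmetic above can be checked directly. The variable names are illustrative only.

```python
W, H = 720, 480

# Nominal raw rate: 24 bits per pixel at 30 frames per second.
raw_bps = 24 * W * H * 30

# Effective rate: 8-bit luma at full resolution, two 8-bit chroma planes
# (16 bits) at quarter resolution, and 15 full frames per second.
effective_bps = (8 * W * H + 16 * W * H / 4) * 15

inflation = raw_bps / effective_bps

print(f"raw       = {raw_bps / 1e6:.1f} Mbps")
print(f"effective = {effective_bps / 1e6:.1f} Mbps")
print(f"inflation = {inflation:.1f} : 1")
```

Chroma downsampling and the field-rate interpretation each contribute a factor of 2, which together give the 4:1 inflation quoted on the slide.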



Other Standards and Considerations for Digital Video Coding



H.26x for Video Telephony

Remote face-to-face communication: A dream for years
– H.26x series – video coding targeted low bit rate
    through ISDN or regular analog telephone line ~ on the order of 64kbps
    need roughly symmetric complexity on encoder and decoder

H.261 (early 1990s)
– Similar to simplified MPEG-1 ~ block-based DCT/MC hybrid coder
– Integer-pel motion compensation with I/P frame only ~ no B frames
– Restricted picture size/fps format and M.V. range

H.263 (mid 1990s) and H.263+/H.263++ (late 1990s)
– Support half-pel motion compensation & many options for improvement

H.264 (latest, 2001-): also known as H.26L / JVT / MPEG4 part10
– Hybrid coding framework with many advanced techniques
– Focusing on greatly improving compression ratio at a cost of complexity
    allow smaller block size; more choices on ref; advanced entropy coding, etc.



From Gonzalez-Woods 3/e Table 8.11


MPEG-2

Extend from MPEG-1

Target at high-resolution high-bit-rate applications
– Digital video broadcasting, HDTV, …; also used for DVD

Support interlaced video
– Frame pictures vs. Field pictures
– New prediction modes for motion compensation for interlaced video
    Use previously encoded fields to do M.E.-M.C.

Support scalability



From Wang’s book preprint Fig. 13.17


Scalability in Video Codecs

Scalability: provide different quality in a single stream
– Stack up more bits on base layer to provide improved quality

Possible ways for achieving scalabilities
– SNR Scalability ~ Multiple-quality video services
    Basic vs. premium quality
– Spatial Scalability ~ Multiple-dimension displays
    Display on PDA vs. PC vs. Super-resolution display
– Temporal Scalability ~ Multiple frame rates
– Frequency Scalability ~ Blurred version to sharp, detailed version

Layered coding concept facilitates:
– Unequal error protection
– Efficient use of resources
– Different needs from customers
– Multiple services



SNR Scalability

Two layers with same spatio-temporal resolution but different qualities

[Block diagram, labels: Video in; base-layer encoder; base-layer decoder; difference node (+ -); enhancement-layer encoder; multiplexer; base-layer bitstream; enhancement-layer bitstream; output bitstream]

From R.Liu Seminar Course @ UMCP
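A two-layer SNR-scalable coder can be sketched on a 1-D list of samples. The step sizes (16 for the base layer, 4 for the enhancement layer) and the use of plain uniform quantizers in place of full layer encoders are assumptions made to keep the example short.

```python
def quantize(x, step):
    """Uniform quantizer: sample values to integer levels."""
    return [round(v / step) for v in x]

def dequantize(levels, step):
    """Integer levels back to reconstructed values."""
    return [l * step for l in levels]

def encode_two_layers(x, base_step=16, enh_step=4):
    """Coarse base layer plus an enhancement layer that codes the base-layer error."""
    base = quantize(x, base_step)
    base_rec = dequantize(base, base_step)        # base-layer decoder inside the encoder
    err = [v - r for v, r in zip(x, base_rec)]    # difference node: input minus base reconstruction
    enh = quantize(err, enh_step)
    return base, enh

def decode(base, enh=None, base_step=16, enh_step=4):
    """Decode the base layer alone, or base plus enhancement when it is available."""
    rec = dequantize(base, base_step)
    if enh is not None:
        rec = [r + e for r, e in zip(rec, dequantize(enh, enh_step))]
    return rec

def max_err(x, rec):
    return max(abs(a - b) for a, b in zip(x, rec))

x = [3, 37, 90, 121, 200, 255]
base, enh = encode_two_layers(x)
print("base only   :", decode(base), "max error", max_err(x, decode(base)))
print("base + enh. :", decode(base, enh), "max error", max_err(x, decode(base, enh)))
```

A receiver that gets only the base layer still decodes a usable signal; adding the enhancement layer tightens the error bound from half the base step to half the enhancement step.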


Spatial Scalability

Two layers with different spatial resolution

[Block diagram, labels: Video in; down-sampler; base-layer encoder; base-layer decoder; up-sampler; difference node (+ -); enhancement-layer encoder; multiplexer; base-layer bitstream; enhancement-layer bitstream; output bitstream]

From R.Liu Seminar Course @ UMCP
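The down-sample / up-sample structure can be sketched on a 1-D signal. Pair averaging for down-sampling and sample repetition for up-sampling are assumed filters, and quantization of both layers is left out so that the layering itself is easy to see.

```python
def down2(x):
    """Down-sample by 2 by averaging neighbouring pairs (even length assumed)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]

def up2(x):
    """Up-sample by 2 by repeating each sample."""
    return [v for v in x for _ in range(2)]

def encode(x):
    """Base layer at half resolution; enhancement layer is the full-resolution difference."""
    base = down2(x)
    enh = [a - b for a, b in zip(x, up2(base))]
    return base, enh

def decode(base, enh=None):
    """Low-resolution-derived output from the base alone, or exact output with enhancement."""
    up = up2(base)
    return up if enh is None else [a + b for a, b in zip(up, enh)]

x = [10, 12, 20, 24, 30, 30, 8, 0]
base, enh = encode(x)
print("base layer        :", base)
print("enhancement layer :", enh)
print("base only         :", decode(base))
print("base + enhancement:", decode(base, enh))
```

A small-screen device could display the base layer directly at half resolution, while a full-resolution display adds the enhancement layer on top of the up-sampled base.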


MPEG-4

Many functionalities targeting a variety of applications

Introduced object-based coding strategy
– For better support of interactive applications & graphics/animation video
– Require encoder to perform object segmentation
    difficult for general applications

Introduced error resilient coding techniques
– “Streaming video profile” for wireless multimedia applications

Part-10 is converged into H.264/AVC (Advanced Video Coding)
– Focused on improving compression ratio and error resilience
– Stick with Hybrid Coding framework



Object-based Coding in MPEG-4

Interactive functionalities

Higher compression efficiency by separately handling
– Moving objects
– Unchanged background
– New regions
– M.C.-failure regions
=> “Sprite” encoding

Object segmentation needed (not easy)
– Based on color, motion, edge, texture, etc.
– Possible for targeted applications

Revised from R.Liu Seminar Course @ UMCP


From Wang’s Preprint Table 1.3



MPEG-7

“Multimedia Content Description Interface”
– Not a video coding/compression standard like previous MPEG
– Emphasizes how to describe the video content for efficient indexing, search, and retrieval

Standardize the description mechanism of content
– Descriptor, Description Scheme, Description Definition Languages
    Employ XML type of description language
– Example of MPEG-7 visual descriptors: Color, Texture, Shape, …

Figure from MPEG-7 Document N4031 (March 2001)



Summary of Today’s Lecture

MPEG-1 video coding standard

Other coding considerations and standards
– H.26x, MPEG-2, MPEG-4, MPEG-7, etc.

Geometric transform of images ~ more in next lecture

Readings:
– Gonzalez’s 3/e book 8.2.9, 8.1.7; 2.6.5 (geometric transform)
– Liu’s book on video coding (see course website)
    Chapter 2 “Motion-Compensated DCT Video Coding”
    Chapter 3 “Video Coding Standards”
– Other reference: Wang’s textbook Chapter 13 (video standards); Chapter 1 (video basics)



Geometric Relations and Manipulations of Images

Useful to characterize:
- global camera motion in video;
- relate two images of similar scenes taken from different time or slightly different view point => “image registration”



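Rotation, Translation, and Scaling

R.S.T. of an image object
– Original pixel location (x,y) => New location (x’,y’)

Translation by (t_x, t_y):
\[
\begin{bmatrix} x' \\ y' \end{bmatrix}
=
\begin{bmatrix} x \\ y \end{bmatrix}
+
\begin{bmatrix} t_x \\ t_y \end{bmatrix}
\]

Rotation around the origin by \theta counter-clockwise:
\[
\begin{bmatrix} x' \\ y' \end{bmatrix}
=
\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
\]

Scaling by s_x and s_y:
\[
\begin{bmatrix} x' \\ y' \end{bmatrix}
=
\begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
\]

Translation and rotation preserve length & angle

Uniform scaling s_x = s_y (preserve angle and shape)

Differential scaling s_x ≠ s_y

[Figure: point (x, y) mapped to (x’, y’)]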



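Rotation, Translation, and Scaling (cont’d)

Rotation and translation of image coordinates
– Note the relations with rotation and translation of image objects

Translate origin to (t_x, t_y):
\[
\begin{bmatrix} x' \\ y' \end{bmatrix}
=
\begin{bmatrix} x \\ y \end{bmatrix}
-
\begin{bmatrix} t_x \\ t_y \end{bmatrix}
\]

Rotate axis by \theta counter-clockwise:
\[
\begin{bmatrix} x' \\ y' \end{bmatrix}
=
\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
\]

[Figure: original axes x, y and rotated axes x’, y’]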



Implementation Issues of Geometric Transform

Forward transform
– Index mapping from input to output image
    What if most values obtained for an output image are at fractional coordinate indices?

Reverse transform
– Map integer indices of output image to input image
    Get values of input image at fractional indices through interpolation

[Figure: point (p’,q’) inside the pixel cell with corners (p,q), (p,q+1), (p+1,q), (p+1,q+1), at fractional offsets a and b]
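Reverse mapping with bilinear interpolation can be sketched as below. Treating a and b as the horizontal and vertical fractional offsets of (p’,q’) inside the pixel cell is an assumption about the figure, as are the border clamping and the function names.

```python
import math

def bilinear(img, x, y):
    """Sample `img` (a list of rows) at fractional position (x, y); x is the column index."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)     # clamp to the image border (assumed policy)
    y = min(max(y, 0.0), h - 1.0)
    p, q = int(math.floor(x)), int(math.floor(y))
    p1, q1 = min(p + 1, w - 1), min(q + 1, h - 1)
    a, b = x - p, y - q               # fractional offsets inside the pixel cell
    top = (1 - a) * img[q][p] + a * img[q][p1]
    bottom = (1 - a) * img[q1][p] + a * img[q1][p1]
    return (1 - b) * top + b * bottom

def warp_reverse(img, inverse_map, out_w, out_h):
    """For every integer output pixel, look up the input position and interpolate there."""
    out = []
    for yo in range(out_h):
        row = []
        for xo in range(out_w):
            xi, yi = inverse_map(xo, yo)
            row.append(bilinear(img, xi, yi))
        out.append(row)
    return out

img = [[0, 10],
       [20, 30]]

# 2x magnification: output pixel (xo, yo) comes from input position (xo / 2, yo / 2).
out = warp_reverse(img, lambda xo, yo: (xo / 2, yo / 2), 3, 3)
for row in out:
    print(row)
```

Because the loop runs over output pixels, every output location receives exactly one value, which avoids the holes and collisions a forward mapping can produce.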



2-D Homogeneous Coordinate

Describe R.S.T. transform by P’ = M P + T
– Need calculating intermediate coordinate values for successive transf.

Homogeneous coordinate
– Allow R.S.T. represented by matrix multiplication operations
    successive transf. can be calculated by combining transf. matrices
– Cartesian point (x,y) => Homogeneous representation ( s x, s y, s )
    represent same pixel location for all nonzero parameter s; often use s=1

The name: an equation f(x,y) = 0 becomes a homogeneous equation in (s x, s y, s), i.e. a common factor in the 3 parameters can be factored out of the equation.

\[
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
\sim
\begin{bmatrix} s x' \\ s y' \\ s \end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & 1
\end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\]

Exercise: express R.S.T. and reflection in homogeneous coordinates


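R.S.T. in Homogeneous Coordinates

Successive R.S.T.
– Left multiply the basic transform matrices

Translation:
\[
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\]

Scaling:
\[
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
=
\begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\]

Rotation:
\[
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
=
\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\]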
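Composing successive R.S.T. steps by left multiplication can be sketched as follows. The helper names (`translate`, `scale`, `rotate`, `apply`) and the chosen sequence of operations are illustrative assumptions.

```python
import math

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def translate(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]

def scale(sx, sy):
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]

def rotate(theta):
    """Counter-clockwise rotation of an object around the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def apply(M, x, y):
    """Apply a 3x3 homogeneous transform to the Cartesian point (x, y)."""
    xh, yh, s = (M[i][0] * x + M[i][1] * y + M[i][2] for i in range(3))
    return xh / s, yh / s

# Rotate by 90 degrees, then scale by 2, then translate by (1, 1).
# Each later operation multiplies on the left of the earlier ones.
M = matmul(translate(1, 1), matmul(scale(2, 2), rotate(math.pi / 2)))
p = apply(M, 1, 0)
print(p)
```

The point (1, 0) goes to (0, 1) after rotation, (0, 2) after scaling, and (1, 3) after translation; the single combined matrix M produces the same result without computing the intermediate coordinates.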



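Reflection

Reflect about x-axis, y-axis, and origin
\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix},
\quad
\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},
\quad
\begin{bmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

Reflect about y=x and y=-x
\[
\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix},
\quad
\begin{bmatrix} 0 & -1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

Reflect about a general line y=ax+b
– Combination of translate-rotate => reflect => inverse rotate-translate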
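The translate-rotate, reflect, inverse rotate-translate recipe for a general line y = ax + b can be sketched like this. The choice to move the line onto the x-axis (rather than the y-axis) and the function names are assumptions.

```python
import math

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def apply(M, x, y):
    """Apply a 3x3 homogeneous transform to the Cartesian point (x, y)."""
    xh, yh, s = (M[i][0] * x + M[i][1] * y + M[i][2] for i in range(3))
    return xh / s, yh / s

def reflect_about_line(a, b):
    """Homogeneous matrix that reflects points about the line y = a x + b."""
    phi = math.atan(a)                                  # angle of the line to the x-axis
    c, s = math.cos(phi), math.sin(phi)
    T = [[1, 0, 0], [0, 1, -b], [0, 0, 1]]              # move the line through the origin
    R_inv = [[c, s, 0], [-s, c, 0], [0, 0, 1]]          # rotate by -phi: line lies on x-axis
    F = [[1, 0, 0], [0, -1, 0], [0, 0, 1]]              # reflect about the x-axis
    R = [[c, -s, 0], [s, c, 0], [0, 0, 1]]              # rotate back by +phi
    T_inv = [[1, 0, 0], [0, 1, b], [0, 0, 1]]           # translate back
    M = T_inv
    for X in (R, F, R_inv, T):                          # M = T_inv R F R_inv T
        M = matmul(M, X)
    return M

M1 = reflect_about_line(1.0, 0.0)    # the line y = x
M2 = reflect_about_line(1.0, 1.0)    # the line y = x + 1
print(apply(M1, 2, 0))
print(apply(M2, 0, 0))
```

For y = x the result agrees with the fixed matrix given above, which swaps the two coordinates; points lying on the line are left unchanged.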



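Shear

Shear ~ a transformation that distorts the shape
– Cause opposite layers of the object slide over each other

Shear relative to x-axis
\[
\begin{bmatrix} 1 & sh_x & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

Extend to shears relative to other reference lines

[Figure: with sh_x = 2, the unit square in the (x, y) plane with labeled corners (1, 0) and (1, 1) becomes a parallelogram in the (x’, y’) plane with labeled corners (2, 1) and (3, 1)]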
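As a worked check of the figure's numbers, applying the x-axis shear with sh_x = 2 to the corner (1, 1) of the unit square gives

\[
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
=
\begin{bmatrix} 1 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
=
\begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix}
\]

In the same way (0, 1) maps to (2, 1), while (0, 0) and (1, 0) stay where they are because y = 0 there. Each horizontal layer slides by sh_x times its height, which is why the top edge moves and the bottom edge does not.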



General Composite Transforms

Combined R.S.T.
– {a_ij} is determined by R.S.T. parameters
\[
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\]

Rigid-body transform
– Only involve translations and rotations
– 2x2 rotation submatrix is orthogonal
    row vectors are orthonormal

Extension to 3-D homogeneous coordinate
– ( sX, sY, sZ, s ) with 4x4 transformation matrices



General Composite Transforms (cont’d)

Affine transforms ~ 6 parameters
– Can be expressed as composition of RST, reflection and shear
– Parallel lines are transformed as parallel lines
\[
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\]

Projective transforms ~ 8 parameters
– Cover more general geometric transformations between 2 planes
    Widely used in computer vision (e.g. image mosaicing, synthesized views)
– Two unique phenomena:
    Chirping: increase in perceived spatial freq as distance to camera increases
    Converging/Keystone effects: parallel lines appear closer & merging in distance
\[
\begin{bmatrix} s x' \\ s y' \\ s \end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & b_1 \\
a_{21} & a_{22} & b_2 \\
c_1 & c_2 & 1
\end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix},
\qquad
w_{new} = [x', y']^T = \frac{A w + b}{c^T w + 1},
\quad w = [x, y]^T
\]
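The converging (keystone) effect of the projective model can be seen with a small numeric sketch. The parameter values, in particular c = (0, 0.5), are hypothetical and chosen only to make the effect visible.

```python
def project(A, b, c, w):
    """Projective mapping w_new = (A w + b) / (c^T w + 1) for a 2-D point w = (x, y)."""
    x, y = w
    denom = c[0] * x + c[1] * y + 1.0
    if denom == 0:
        raise ZeroDivisionError("point maps to infinity")
    return ((A[0][0] * x + A[0][1] * y + b[0]) / denom,
            (A[1][0] * x + A[1][1] * y + b[1]) / denom)

A = [[1.0, 0.0], [0.0, 1.0]]   # identity linear part
b = [0.0, 0.0]                 # no translation
c = [0.0, 0.5]                 # hypothetical perspective term along y

# Two parallel vertical lines, x = -1 and x = +1, sampled at increasing y.
ys = (0.0, 2.0, 8.0)
left = [project(A, b, c, (-1.0, y)) for y in ys]
right = [project(A, b, c, (1.0, y)) for y in ys]
gaps = [r[0] - l[0] for l, r in zip(left, right)]
print(gaps)

# With c = 0 the mapping reduces to an affine transform.
print(project(A, b, [0.0, 0.0], (3.0, 4.0)))
```

The horizontal gap between the two lines shrinks as y grows, so lines that are parallel in the input appear to converge in the output, which an affine transform (c = 0) can never produce.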



Effects of Various Geometric Mappings


From Wang’s Book Preprint Fig.5.18


Higher-order Nonlinear Spatial Warping

Analogous to “rubber sheet stretching”
– Forward and reverse mapping of pixels’ coordinate indices
    (x, y) <=> (x’, y’)

Polynomial warping
– Extend affine transform to higher-order polynomial mapping
– 2nd-order warping
\[
x' = a_0 + a_1 x + a_2 y + a_3 x^2 + a_4 x y + a_5 y^2
\]
\[
y' = b_0 + b_1 x + b_2 y + b_3 x^2 + b_4 x y + b_5 y^2
\]

Spatial distortion in imaging system (lens)
– Pincushion and Barrel distortion



Example of 2nd-order Polynomial Spatial Warping


From P. Ramadge’s PU EE488 F’00


Illustration of Geometric Distortion


From P. Ramadge’s PU EE488 F’00


Compensating Spatial Distortion in Imaging

Control points – establishing correspondence
– Coordinates before and after distortion are known
    Fit into polynomial warping model: (x, y) => (x’, y’)
\[
x' = a_0 + a_1 x + a_2 y + a_3 x^2 + a_4 x y + a_5 y^2
\]
\[
y' = b_0 + b_1 x + b_2 y + b_3 x^2 + b_4 x y + b_5 y^2
\]
– Minimize the sum of squared error between a set of warped control points and the polynomial estimates
\[
\mathbf{x}' = [x'_1, x'_2, \ldots, x'_M]^T, \qquad
Z = \begin{bmatrix}
1 & x_1 & y_1 & x_1^2 & x_1 y_1 & y_1^2 \\
1 & x_2 & y_2 & x_2^2 & x_2 y_2 & y_2^2 \\
\vdots & & & & & \vdots
\end{bmatrix}
\]
\[
E = (\mathbf{x}' - Z a)^T (\mathbf{x}' - Z a) + (\mathbf{y}' - Z b)^T (\mathbf{y}' - Z b)
\]
\[
\partial E / \partial a = 0 \;\Rightarrow\; Z^T Z\, a = Z^T \mathbf{x}'
\]
– Least square estimates: solution expressed by generalized inverse of Z
\[
a = Z^{+} \mathbf{x}' = (Z^T Z)^{-1} Z^T \mathbf{x}', \qquad b = Z^{+} \mathbf{y}'
\]

Higher-order approximation
– 2nd order polynomial usually suffices for many applications
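The least-squares fit of the 2nd-order warping model can be sketched by solving the normal equations (Z^T Z) a = Z^T x’ directly. The control points and the "true" coefficients below are synthetic, made up for the example, and the small Gauss-Jordan solver stands in for whatever linear algebra routine would be used in practice.

```python
def basis(x, y):
    """One row of Z: the six 2nd-order monomials at control point (x, y)."""
    return [1.0, x, y, x * x, x * y, y * y]

def solve(A, rhs):
    """Gauss-Jordan elimination with partial pivoting for a small square system."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [u - f * v for u, v in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit(points, targets):
    """Least-squares coefficients from the normal equations (Z^T Z) a = Z^T t."""
    Z = [basis(x, y) for x, y in points]
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(6)] for i in range(6)]
    Ztt = [sum(z[i] * t for z, t in zip(Z, targets)) for i in range(6)]
    return solve(ZtZ, Ztt)

# Synthetic control points on a 4 x 4 grid, warped by known (made-up) coefficients.
true_a = [0.5, 1.0, 0.0, 0.01, 0.0, -0.02]
true_b = [-1.0, 0.0, 1.0, 0.0, 0.03, 0.0]
pts = [(float(x), float(y)) for x in range(4) for y in range(4)]
xp = [sum(c * z for c, z in zip(true_a, basis(x, y))) for x, y in pts]
yp = [sum(c * z for c, z in zip(true_b, basis(x, y))) for x, y in pts]

a = fit(pts, xp)
b = fit(pts, yp)
print([round(v, 6) for v in a])
print([round(v, 6) for v in b])
```

At least six control points in general position are needed for the six unknowns per coordinate; using more, as here, averages out measurement noise in real data.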



Example of Image Registration

Figure from Gonzalez-Woods 3/e online book resource


Generations of Video Coding


From R.Liu Seminar Course ’00 @ UMCP
