Motion Estimation for Video Coding - Stanford University · PDF fileT. Wiegand / B. Girod:...

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 1

Motion Estimation for Video Coding

Motion-Compensated Prediction

Bit Allocation

Motion Models

Motion Estimation

Efficiency of Motion Compensation Techniques


Intra-Frame

Decoder

Motion-

Compensated

Predictor

Control

Data

DCT

Coefficients

Motion

Data

0

Intra/Inter

Coder

Control

Decoder

Motion

Estimator

Intra-Frame

DCT Coder-

],,[ tyxs ],,[ tyxu

],,[' tyxs

],,[' tyxu

],,[̂ tyxs

Hybrid Video Encoder


Motion-Compensated Prediction

Prediction for the luminance signal s[x,y,t] within the moving object:

Moving

object

Displaced

object

time t

x

y

Previous

frame

Current

frame

Stationary

backgroundt

y

x

d

d

),,(],,[̂ ttdydxstyxs yx


Motion-Compensated Prediction: Example

Referenced blocks in frame 1 Difference between motion-

compensated prediction and

current frame u[x,y,t]

Frame 1 s[x,y,t-1] (previous)

Frame 2 with

displacement vectors

Accuracy of Motion Vectors

Frame 2 s[x,y,t] (current) Partition of frame 2 into blocks

(schematic)

Size of Blocks


yx

yyxx

dd

yx

yx

yxfyydyxfxxd

,

,

,

,

),,( ),,,(

ba

ba

Motion in 3-D space corresponds to displacements in the image plane

Motion compensation in the image plane is conducted to provide a

prediction signal for efficient video compression

Efficient motion-compensated prediction often uses side information to

transmit the displacements

Displacements must be efficiently represented for video compression

Motion models relate 3-D motion to displacements assuming

reasonable restrictions of the motion and objects in the 3-D world

Motion Models

Motion Model

: location in previous image

: location in current image

: vector of motion coefficients

: displacements


Decoded video signal is given as

Representation of Video Signal

],,[],,[̂],,[ tyxutyxstyxs

Motion-compensated

prediction signal

),,(],,[̂ ttdydxstyxs yx

),(),,(1

0

1

0

yxbdyxad i

N

i

iyi

N

i

ix

Prediction residual signal

),(),(1

0

yxcyxu j

M

j

j

,...)(,...),(),,( 00 bafRm baba

Transmitted motion parameters

Transmitted residual

parameters

,...)(),( 0cfRu cc


Optimum trade-off:

Displacement error variance can be influenced via

• Block-size, quantization of motion parameters

• Choice of motion model

Bit-rate

Displacement

error variance

D

Prediction

error rate

Ru

Motion

vector rate

Rm

Rate-Constrained Motion Estimation

um

um

RRRdR

dD

dR

dD ,


Lagrangian Optimization in Video Coding

Distortion after

motion compensationLagrange

parameter

Number of bits

for motion vector

Rate-Constrained Motion Estimation [Sullivan, Baker 1991]:

Rate-Constrained Mode Decision [Wiegand, et al. 1996]:

A number of interactions are often neglected

• Temporal dependency due to DPCM loop

• Spatial dependency of coding decisions

• Conditional entropy coding

Distortion after

reconstructionLagrange

parameter

Number of bits

for coding mode

mmm RD min

RD min


Translational motion model

4-Parameter motion model: translation, zoom (isotropic Scaling),

rotation in image plane

Affine motion model:

Parabolic motion model

Motion Models

00 , bdad yx

yaxabd

yaxaad

y

x

120

210

ybxbbd

yaxaad

y

x

210

210

xybybxbybxbbd

xyayaxayaxaad

x

x

5

2

6

2

2310

5

2

6

2

2310

),(),,(1

0

1

0

yxbdyxad i

N

i

iyi

N

i

ix


Impact of the Affine Parameters

-150

-100

-50

0

50

100

150

-150 -100 -50 0 50 100 150

x

y

-150

-100

-50

0

50

100

150

-150 -100 -50 0 50 100 150

x

y

-150

-100

-50

0

50

100

150

-150 -100 -50 0 50 100 150

x

y

-150

-100

-50

0

50

100

150

-150 -100 -50 0 50 100 150

x

y

translation

scaling

sheering

0adx

xadx 1

yadx 3


-150

-100

-50

0

50

100

150

-150 -100 -50 0 50 100 150

x

y

-150

-100

-50

0

50

100

150

-150 -100 -50 0 50 100 150

x

y

-150

-100

-50

0

50

100

150

-150 -100 -50 0 50 100 150

x

y

-150

-100

-50

0

50

100

150

-150 -100 -50 0 50 100 150

x

y

Impact of the Parabolic Parameters

2

2xadx

2

6 yadx

xyadx 5


Differential Motion Estimation

Assume small displacements dx,dy:

Aperture problem: several observations required

Inaccurate for displacements > 0.5 pel

multigrid methods, iteration

Minimize

Displace frame difference Horizontal and vertical

gradient of image signal S

yx

yx

dy

ttyxsd

x

ttyxsttyxstyxs

ddtyxstyxstyxu

),,(),,(),,(),,(

),,,,(ˆ),,(),,(

),,(min1 1

2 tyxuBy

y

Bx

x


Gradient-Based Affine Refinement

Displacement vector field is represented as

Combination

yields a system of linear equations:

System can be solved using, e.g., pseudo-inverse,

by minimizing

ybxbby

yaxaax

321

321

)()(),,(),,(),,( 321321 ybxbby

syaxaa

x

sttyxstyxstyxu

By

y

Bx

xbbbaaa

tyxu1 1

2

,,,,,

),,(minarg321321

3

2

1

3

2

1

,,,,,,

b

b

b

a

a

a

yy

sx

y

s

y

sy

x

sx

x

s

x

sssu


• Subdivide current frame

into blocks.

• Find one displacement

vector for each block.

• Within a search range, find

a “best match” that

minimizes an error

measure.

• Intelligent search strategies

can reduce computation.

search range in

previous frame

block of current

frame

Sk1

Sk

Block-matching Algorithm


Previous Frame Current Frame

Block of pixels is selected

as a measurement window

Measurement window is

compared with a shifted block

of pixels in the other image,

to determine the best match




. . . process repeated for another block.

Previous Frame Current Frame


Error Measures for Block-matching

Mean squared error (sum of squared errors)

Sum of absolute differences

Approximately same performance

SAD less complex for some architectures

2

1 1

)],,(),,([),( ttdydxstyxsddSSD y

By

y

Bx

x

xyx

|),,(),,(|),(1 1

ttdydxstyxsddSAD y

By

y

Bx

x

xyx


Full search

All possible displacements within the search range

are compared.

Computationally expensive

Highly regular, parallelizable

Block-matching: Search Strategies

dx

dy


Complexity of block-matching:

evaluation of complex error measure for many candidates

Speedup of Block-matching

Combine both approaches:

Choose starting point and search order that maximizes

likelihood for efficient approximations, early terminations

and excluding of candidates

Reduce complexity

• Approximations

• Early terminations

• Exclude candidates

Reduce number

• Cover likely search areas

• Unequal steps between

searched candidates


Approximations

• Stop search, if match is “good enough”

(SSD, SAD < threshold or J=D+R is small enough)

• Practical method in video conferencing for static

background: test zero-vector first and stop search if

match is good enough

static background


Early Termination

Partial distortion measure:

Previously computed minimum:

Early termination without loss:

Early termination with possible loss, but higher speedup:

p

y

K

y

B

x

xyxK ttdydxstyxsddDx

)],,(),,([),(1 1

yyxK BNKddD ...),,(

Jmin

min),(),( if Stop, JddRddD yxmyxK

min)(),(),( if Stop, JKddRddD yxmyxK

2/1)/()( e.g., yBKK


Block Comparison Speed-Ups

Triangle inequality for samples in block B (here SAD)

Strategy:1. Compute partial sums for blocks in current and previous frame

2. Compare blocks based on partial sums

3. Omit full block comparison, if partial sums indicate worse error

measure than previous best result

Block partitioning

Choose blocks to be nested (4x4, 8x8, 16x16, …) – efficient

for modern variable block size video codecs

n B

kk

B

kk

n

SSSS |||| 11

nn

n BBB ,

B

kk

B

kk SSSS |||||| 11

nB


Blockmatching: Search Order I

• Iterative comparison of error

measure values at 5

neighboring points

• Logarithmic refinement of the

search pattern if

– best match is in the center of

the 5-point pattern

– center of search pattern

touches the border of the

search range

2D logarithmic search [Jain + Jain, 1981]


Diamond search [Li, Zeng, Liou, 1994] [Zhu, Ma, 1997]

d x

d y

Start with

large diamond

pattern at

(0,0)

If best match

lies in the center

of large diamond,

proceed with

small diamond

If best match does not lie

in the center of large

diamond, center large

diamond pattern at new

best match

Blockmatching: Search Order II


Hierarchical blockmatching

current frame

previous frame

Block matching

Block matching

Block matching

Filtering and subsampling

Displacement vector field

Filtering and subsampling


Sub-pel Translational Motion

Motion vectors are often not restricted to only point into

the integer-pel grid of the reference frame

Typical sub-pel accuracies: half-pel and quarter-pel

Sub-pel positions are often estimated by „refinement“

•1 •1 •1

•1

•1

d

d

x

y

•1•1

•1•1

•2•2•2 •2

•2•2•2•2


Sub-pel Motion Compensation

Sub-pel positions are obtained via interpolation

Example: half-pixel accurate displacements

• • • • • • • • • • • • •• • • • • • • • • • • • •• • • • • • • • • • • • •

• • • • • • • • • • • • •

• • • • • • • • • • • • •

• • • • • • • • • • • • •

• • • • • • • • • • • • •• • • • • • • • •

• • • • • • • • •

• • • • • • • • •• • • • • •

• • • •• • • •• • • •• • • • • • •

• • • • • • • • • • • • •

• • • • • • • • • • • • •

• • • •• • • •

• • • •

• • • •

4.5

4.5

x

y

d

d


Bi-linear Interpolation

Interpolated

Pixel Value

brightness


History of Motion Compensation

• Intraframe coding: only spatial correlation exploited

DCT [Ahmed, Natarajan, Rao 1974], JPEG [1992]

• Conditional replenishment

H.120 [1984] (DPCM, scalar quantization)

• Frame difference coding

H.120 Version 2 [1988]

• Motion compensation: integer-pel accurate displacements

H.261 [1991]

• Half-pel accurate motion compensation

MPEG-1 [1993], MPEG-2/H.262 [1994]

• Variable block-size (16x16 & 8x8) motion compensation

H.263 [1996], MPEG-4 [1999]

• Variable block-size (16x16 – 4x4) and multi-frame

motion compensation

H.264/MPEG-4 AVC [2003]

Complexity

increase


Milestones in Video Coding

0 100 200 300

28

30

32

34

36

38

40

Rate [kbit/s]

PSNR

[dB]

Variable block size

(16x16 – 8x8)

(H.263, 1996) +

quarter-pel

motion compensation

(MPEG-4, 1998)

Variable block size

(16x16 – 4x4) +

quarter-pel +

multi-frame

motion compensation

(H.264/AVC, 2003)

Intraframe

DCT coding

(JPEG, 1990)

Foreman

10 Hz, QCIF

100 frames

Frame

Difference coding

(H.120 1988)

Conditional

Replenishment

(H.120)

Half-pel

motion compensation

(MPEG-1 1993

MPEG-2 1994)

Integer-pel

motion

compensation

(H.261, 1991)

Bit-rate Reduction: 75%35


Milestones in Video Coding

0 100 200 300

28

30

32

34

36

38

40

Rate [kbit/s]

PSNR

[dB]

Integer-pel

motion

compensation

(H.261, 1991)

Variable block size

(16x16 – 8x8)

(H.263, 1996) +

quarter-pel

motion compensation

(MPEG-4, 1998)

Variable block size

(16x16 – 4x4) +

quarter-pel +

multi-frame

motion compensation

(H.264/AVC, 2003)

Intraframe

DCT coding

(JPEG, 1990)

Foreman

10 Hz, QCIF

100 frames

Half-pel

motion compensation

(MPEG-1 1993

MPEG-2 1994)

Frame

Difference coding

(H.120 1988)

Conditional

Replenishment

(H.120)

Visual Comparison


Visual Comparison

Foreman, QCIF, 10 Hz, 100 kbit/s

JPEG H.264/AVC


Summary

Video coding as a hybrid of motion compensation and prediction

residual coding

Motion models can represent various kinds of motions

Lagrangian bit-allocation rules specify constant slope allocation to

motion coefficients and prediction error

In practice: affine or 8-parameter model for camera motion,

translational model for small blocks

Differential methods calculate displacement from spatial and

temporal differences in the image signal -

Block matching computes error measure for candidate

displacements and finds best match

Speed up block matching by fast search methods, approximations,

early terminations and clever application of triangle inequality

Hybrid video coding has been drastically improved by enhanced

motion compensation capabilities

Motion Estimation for Video Coding - Stanford University · PDF fileT. Wiegand / B. Girod:...

Documents

Transcript of Motion Estimation for Video Coding - Stanford University · PDF fileT. Wiegand / B. Girod:...