Motion Estimation for Video Coding - Stanford University · PDF fileT. Wiegand / B. Girod:...

33
T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 1 Motion Estimation for Video Coding Motion-Compensated Prediction Bit Allocation Motion Models Motion Estimation Efficiency of Motion Compensation Techniques

Transcript of Motion Estimation for Video Coding - Stanford University · PDF fileT. Wiegand / B. Girod:...

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 1

Motion Estimation for Video Coding

Motion-Compensated Prediction

Bit Allocation

Motion Models

Motion Estimation

Efficiency of Motion Compensation Techniques

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 2

Intra-Frame

Decoder

Motion-

Compensated

Predictor

Control

Data

DCT

Coefficients

Motion

Data

0

Intra/Inter

Coder

Control

Decoder

Motion

Estimator

Intra-Frame

DCT Coder-

],,[ tyxs ],,[ tyxu

],,[' tyxs

],,[' tyxu

],,[̂ tyxs

Hybrid Video Encoder

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 3

Motion-Compensated Prediction

Prediction for the luminance signal s[x,y,t] within the moving object:

Moving

object

Displaced

object

time t

x

y

Previous

frame

Current

frame

Stationary

backgroundt

y

x

d

d

),,(],,[̂ ttdydxstyxs yx

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 4

Motion-Compensated Prediction: Example

Referenced blocks in frame 1 Difference between motion-

compensated prediction and

current frame u[x,y,t]

Frame 1 s[x,y,t-1] (previous)

Frame 2 with

displacement vectors

Accuracy of Motion Vectors

Frame 2 s[x,y,t] (current) Partition of frame 2 into blocks

(schematic)

Size of Blocks

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 5

yx

yyxx

dd

yx

yx

yxfyydyxfxxd

,

,

,

,

),,( ),,,(

ba

ba

Motion in 3-D space corresponds to displacements in the image plane

Motion compensation in the image plane is conducted to provide a

prediction signal for efficient video compression

Efficient motion-compensated prediction often uses side information to

transmit the displacements

Displacements must be efficiently represented for video compression

Motion models relate 3-D motion to displacements assuming

reasonable restrictions of the motion and objects in the 3-D world

Motion Models

Motion Model

: location in previous image

: location in current image

: vector of motion coefficients

: displacements

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 6

Decoded video signal is given as

Representation of Video Signal

],,[],,[̂],,[ tyxutyxstyxs

Motion-compensated

prediction signal

),,(],,[̂ ttdydxstyxs yx

),(),,(1

0

1

0

yxbdyxad i

N

i

iyi

N

i

ix

Prediction residual signal

),(),(1

0

yxcyxu j

M

j

j

,...)(,...),(),,( 00 bafRm baba

Transmitted motion parameters

Transmitted residual

parameters

,...)(),( 0cfRu cc

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 7

Optimum trade-off:

Displacement error variance can be influenced via

• Block-size, quantization of motion parameters

• Choice of motion model

Bit-rate

Displacement

error variance

D

Prediction

error rate

Ru

Motion

vector rate

Rm

Rate-Constrained Motion Estimation

um

um

RRRdR

dD

dR

dD ,

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 8

Lagrangian Optimization in Video Coding

Distortion after

motion compensationLagrange

parameter

Number of bits

for motion vector

Rate-Constrained Motion Estimation [Sullivan, Baker 1991]:

Rate-Constrained Mode Decision [Wiegand, et al. 1996]:

A number of interactions are often neglected

• Temporal dependency due to DPCM loop

• Spatial dependency of coding decisions

• Conditional entropy coding

Distortion after

reconstructionLagrange

parameter

Number of bits

for coding mode

mmm RD min

RD min

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 9

Translational motion model

4-Parameter motion model: translation, zoom (isotropic Scaling),

rotation in image plane

Affine motion model:

Parabolic motion model

Motion Models

00 , bdad yx

yaxabd

yaxaad

y

x

120

210

ybxbbd

yaxaad

y

x

210

210

xybybxbybxbbd

xyayaxayaxaad

x

x

5

2

6

2

2310

5

2

6

2

2310

),(),,(1

0

1

0

yxbdyxad i

N

i

iyi

N

i

ix

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 10

Impact of the Affine Parameters

-150

-100

-50

0

50

100

150

-150 -100 -50 0 50 100 150

x

y

-150

-100

-50

0

50

100

150

-150 -100 -50 0 50 100 150

x

y

-150

-100

-50

0

50

100

150

-150 -100 -50 0 50 100 150

x

y

-150

-100

-50

0

50

100

150

-150 -100 -50 0 50 100 150

x

y

translation

scaling

sheering

0adx

xadx 1

yadx 3

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 11

-150

-100

-50

0

50

100

150

-150 -100 -50 0 50 100 150

x

y

-150

-100

-50

0

50

100

150

-150 -100 -50 0 50 100 150

x

y

-150

-100

-50

0

50

100

150

-150 -100 -50 0 50 100 150

x

y

-150

-100

-50

0

50

100

150

-150 -100 -50 0 50 100 150

x

y

Impact of the Parabolic Parameters

2

2xadx

2

6 yadx

xyadx 5

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 12

Differential Motion Estimation

Assume small displacements dx,dy:

Aperture problem: several observations required

Inaccurate for displacements > 0.5 pel

multigrid methods, iteration

Minimize

Displace frame difference Horizontal and vertical

gradient of image signal S

yx

yx

dy

ttyxsd

x

ttyxsttyxstyxs

ddtyxstyxstyxu

),,(),,(),,(),,(

),,,,(ˆ),,(),,(

),,(min1 1

2 tyxuBy

y

Bx

x

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 13

Gradient-Based Affine Refinement

Displacement vector field is represented as

Combination

yields a system of linear equations:

System can be solved using, e.g., pseudo-inverse,

by minimizing

ybxbby

yaxaax

321

321

)()(),,(),,(),,( 321321 ybxbby

syaxaa

x

sttyxstyxstyxu

By

y

Bx

xbbbaaa

tyxu1 1

2

,,,,,

),,(minarg321321

3

2

1

3

2

1

,,,,,,

b

b

b

a

a

a

yy

sx

y

s

y

sy

x

sx

x

s

x

sssu

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 14

• Subdivide current frame

into blocks.

• Find one displacement

vector for each block.

• Within a search range, find

a “best match” that

minimizes an error

measure.

• Intelligent search strategies

can reduce computation.

search range in

previous frame

block of current

frame

Sk1

Sk

Block-matching Algorithm

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 15

Previous Frame Current Frame

Block of pixels is selected

as a measurement window

Measurement window is

compared with a shifted block

of pixels in the other image,

to determine the best match

Block-matching Algorithm

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 16

Block-matching Algorithm

. . . process repeated for another block.

Previous Frame Current Frame

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 17

Error Measures for Block-matching

Mean squared error (sum of squared errors)

Sum of absolute differences

Approximately same performance

SAD less complex for some architectures

2

1 1

)],,(),,([),( ttdydxstyxsddSSD y

By

y

Bx

x

xyx

|),,(),,(|),(1 1

ttdydxstyxsddSAD y

By

y

Bx

x

xyx

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 18

Full search

All possible displacements within the search range

are compared.

Computationally expensive

Highly regular, parallelizable

Block-matching: Search Strategies

dx

dy

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 19

Complexity of block-matching:

evaluation of complex error measure for many candidates

Speedup of Block-matching

Combine both approaches:

Choose starting point and search order that maximizes

likelihood for efficient approximations, early terminations

and excluding of candidates

Reduce complexity

• Approximations

• Early terminations

• Exclude candidates

Reduce number

• Cover likely search areas

• Unequal steps between

searched candidates

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 20

Approximations

• Stop search, if match is “good enough”

(SSD, SAD < threshold or J=D+R is small enough)

• Practical method in video conferencing for static

background: test zero-vector first and stop search if

match is good enough

static background

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 21

Early Termination

Partial distortion measure:

Previously computed minimum:

Early termination without loss:

Early termination with possible loss, but higher speedup:

p

y

K

y

B

x

xyxK ttdydxstyxsddDx

)],,(),,([),(1 1

yyxK BNKddD ...),,(

Jmin

min),(),( if Stop, JddRddD yxmyxK

min)(),(),( if Stop, JKddRddD yxmyxK

2/1)/()( e.g., yBKK

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 22

Block Comparison Speed-Ups

Triangle inequality for samples in block B (here SAD)

Strategy:1. Compute partial sums for blocks in current and previous frame

2. Compare blocks based on partial sums

3. Omit full block comparison, if partial sums indicate worse error

measure than previous best result

Block partitioning

Choose blocks to be nested (4x4, 8x8, 16x16, …) – efficient

for modern variable block size video codecs

n B

kk

B

kk

n

SSSS |||| 11

nn

n BBB ,

B

kk

B

kk SSSS |||||| 11

nB

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 23

Blockmatching: Search Order I

• Iterative comparison of error

measure values at 5

neighboring points

• Logarithmic refinement of the

search pattern if

– best match is in the center of

the 5-point pattern

– center of search pattern

touches the border of the

search range

2D logarithmic search [Jain + Jain, 1981]

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 24

Diamond search [Li, Zeng, Liou, 1994] [Zhu, Ma, 1997]

d x

d y

Start with

large diamond

pattern at

(0,0)

If best match

lies in the center

of large diamond,

proceed with

small diamond

If best match does not lie

in the center of large

diamond, center large

diamond pattern at new

best match

Blockmatching: Search Order II

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 25

Hierarchical blockmatching

current frame

previous frame

Block matching

Block matching

Block matching

Filtering and subsampling

Displacement vector field

Filtering and subsampling

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 26

Sub-pel Translational Motion

Motion vectors are often not restricted to only point into

the integer-pel grid of the reference frame

Typical sub-pel accuracies: half-pel and quarter-pel

Sub-pel positions are often estimated by „refinement“

•1 •1 •1

•1

•1

d

d

x

y

•1•1

•1•1

•2•2•2 •2

•2•2•2•2

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 27

Sub-pel Motion Compensation

Sub-pel positions are obtained via interpolation

Example: half-pixel accurate displacements

• • • • • • • • • • • • •• • • • • • • • • • • • •• • • • • • • • • • • • •

• • • • • • • • • • • • •

• • • • • • • • • • • • •

• • • • • • • • • • • • •

• • • • • • • • • • • • •• • • • • • • • •

• • • • • • • • •

• • • • • • • • •• • • • • •

• • • •• • • •• • • •• • • • • • •

• • • • • • • • • • • • •

• • • • • • • • • • • • •

• • • •• • • •

• • • •

• • • •

4.5

4.5

x

y

d

d

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 28

Bi-linear Interpolation

Interpolated

Pixel Value

brightness

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 29

History of Motion Compensation

• Intraframe coding: only spatial correlation exploited

DCT [Ahmed, Natarajan, Rao 1974], JPEG [1992]

• Conditional replenishment

H.120 [1984] (DPCM, scalar quantization)

• Frame difference coding

H.120 Version 2 [1988]

• Motion compensation: integer-pel accurate displacements

H.261 [1991]

• Half-pel accurate motion compensation

MPEG-1 [1993], MPEG-2/H.262 [1994]

• Variable block-size (16x16 & 8x8) motion compensation

H.263 [1996], MPEG-4 [1999]

• Variable block-size (16x16 – 4x4) and multi-frame

motion compensation

H.264/MPEG-4 AVC [2003]

Complexity

increase

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 30

Milestones in Video Coding

0 100 200 300

28

30

32

34

36

38

40

Rate [kbit/s]

PSNR

[dB]

Variable block size

(16x16 – 8x8)

(H.263, 1996) +

quarter-pel

motion compensation

(MPEG-4, 1998)

Variable block size

(16x16 – 4x4) +

quarter-pel +

multi-frame

motion compensation

(H.264/AVC, 2003)

Intraframe

DCT coding

(JPEG, 1990)

Foreman

10 Hz, QCIF

100 frames

Frame

Difference coding

(H.120 1988)

Conditional

Replenishment

(H.120)

Half-pel

motion compensation

(MPEG-1 1993

MPEG-2 1994)

Integer-pel

motion

compensation

(H.261, 1991)

Bit-rate Reduction: 75%35

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 31

Milestones in Video Coding

0 100 200 300

28

30

32

34

36

38

40

Rate [kbit/s]

PSNR

[dB]

Integer-pel

motion

compensation

(H.261, 1991)

Variable block size

(16x16 – 8x8)

(H.263, 1996) +

quarter-pel

motion compensation

(MPEG-4, 1998)

Variable block size

(16x16 – 4x4) +

quarter-pel +

multi-frame

motion compensation

(H.264/AVC, 2003)

Intraframe

DCT coding

(JPEG, 1990)

Foreman

10 Hz, QCIF

100 frames

Half-pel

motion compensation

(MPEG-1 1993

MPEG-2 1994)

Frame

Difference coding

(H.120 1988)

Conditional

Replenishment

(H.120)

Visual Comparison

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 32

Visual Comparison

Foreman, QCIF, 10 Hz, 100 kbit/s

JPEG H.264/AVC

T. Wiegand / B. Girod: EE398A Image and Video Compression Motion estimation no. 33

Summary

Video coding as a hybrid of motion compensation and prediction

residual coding

Motion models can represent various kinds of motions

Lagrangian bit-allocation rules specify constant slope allocation to

motion coefficients and prediction error

In practice: affine or 8-parameter model for camera motion,

translational model for small blocks

Differential methods calculate displacement from spatial and

temporal differences in the image signal -

Block matching computes error measure for candidate

displacements and finds best match

Speed up block matching by fast search methods, approximations,

early terminations and clever application of triangle inequality

Hybrid video coding has been drastically improved by enhanced

motion compensation capabilities