Spatial Sparsity Induced Temporal Prediction for Hybrid Video Compression
Gang Hua and Onur G. Guleryuz
Rice University, Houston, TX
DoCoMo USA Labs, Palo Alto, CA
(Please view in full screen presentation mode to see the animations)
Outline
• Problem Statement
– Quick intro to hybrid video compression.
– Example difficult video.
– Problems in temporal prediction.
– Quick results showing what the proposed work can do.
• Our Solution: Spatial Sparsity Induced Temporal Prediction for Hybrid Video Compression
– Model.
– What we do, how we do it, and why it works.
• Simulation results showing prediction examples & discussion.
• Compression results.
• Conclusion & future work.
I will show results on video, but I will also use the classical images peppers/barbara to make intuitive points.
Quick Digression: The Set of “Natural” Images
The set of natural/interesting images:
• Non-convex, star-shaped set.
• A blend like 0.5(barbara + peppers) is far from both barbara and peppers, but still very useful in compressing barbara or peppers.
Setup: Hybrid Video Compression
[Block diagram: reference video frame and current frame enter a Predict stage; the prediction error is then Compressed.]
• We will propose a new prediction technique:
Spatial Sparsity Induced Temporal Prediction (SIP)
Current State of Prediction
[Diagram: reference frame → Predict → subtracted from current frame → Transform Coder.]
• Current prediction techniques only work well when current-frame blocks are simple translations of past-frame blocks (sufficient for most simple video).
• In this work, we will assume translations are accounted for.
Example Difficult Video
(please note differences with traditional sequences like foreman)
(Commercial) (Trailer)
Some example problematic temporal evolutions for current techniques
[Figure: reference-frame / current-frame pairs illustrating each case below.]
• Temporally decorrelated noise
• Simple fade from a blend of two scenes
• Special effects
• Result for current techniques: INTRA (non-differential) encoding (many bits)
SIP inside a generic hybrid coder
[Block diagram: the frame to be coded, minus a prediction, gives the coded differential (Transform Coder); the Transform Decoder output is added back and delayed (z⁻¹) to form the previously decoded frame, to be used as reference; Motion Compensation plus Sparsity-Induced Prediction (using causal information) produce the prediction.]
• Objective is to generate better motion compensated predictors.
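As a minimal sketch (not the authors' actual coder), the feedback loop in this diagram can be written as a toy encode loop: each frame is predicted from the previously decoded frame, the predictor is optionally refined (this is where SIP would sit), and the quantized differential is fed back. `toy_hybrid_encode`, `refine`, and the scalar quantizer are all illustrative stand-ins:

```python
import numpy as np

def toy_hybrid_encode(frames, refine=None, q=8.0):
    """Sketch of a generic hybrid coder loop: predict each frame from the
    previously *decoded* frame, optionally refine the predictor (where a
    SIP-style stage would sit), quantize the differential, and feed the
    reconstruction back through the z^-1 delay."""
    decoded_prev = np.zeros_like(frames[0])
    residuals, decoded = [], []
    for frame in frames:
        pred = decoded_prev              # stand-in for motion compensation
        if refine is not None:
            pred = refine(pred)          # refinement hook on the MC predictor
        resid = frame - pred
        resid_q = q * np.round(resid / q)  # toy uniform quantizer ("transform coder")
        rec = pred + resid_q             # decoder-side reconstruction
        residuals.append(resid_q)
        decoded.append(rec)
        decoded_prev = rec               # z^-1 feedback: reference for next frame
    return residuals, decoded
```

A better `refine` shrinks the residual energy, which is exactly what reduces the bits the transform coder must spend.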
99
MC reference frame current frameMC reference after SIP
noise
(denoised)
lightning
(removed!)
cross-fade
(fading scenes reduced and amplified as needed!)
clutter
(removed)
Loose Model (after translations are accounted for)
(i is the pixel index; frames are N × 1.)
• Plain noise:
– current: x_n(i) = r_{n-1}(i) + w_n(i)
– reference: x_{n-1}(i) = r_{n-1}(i) + w_{n-1}(i)
(r = relevant component, w = noise)
– Straightforward with today's know-how: use an overcomplete set of transforms, threshold coefficients, …
• Structured noise (e.g., a blend such as 0.5(peppers + barbara) + noise):
– current: x_n(i) = r_{n-1}(i) + w_n(i)
– reference: x_{n-1}(i) = (r_{n-1}(i) + s_{n-1}(i)) + w_{n-1}(i)
(s = structured noise)
– The denoising recipe will not work: can we somehow optimize transforms? use index sets of coefficients? …
• Brightness change:
– current: x_n(i) = r_{n-1}(i) + w_n(i)
– reference: x_{n-1}(i) = λ(i)(r_{n-1}(i) + s_{n-1}(i)) + w_{n-1}(i)
(λ = smooth light-map)
• We must find a common formulation for both of these cases (turns out to be very easy!)
I will show more complicated variations as well.
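The three cases above can be instantiated with stand-in signals. Everything here (the shapes, the unit-variance stand-ins for peppers/barbara, the linear lightmap) is hypothetical, chosen only to mirror the equations:

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (16, 16)

r = rng.normal(size=shape)             # relevant component (stand-in for peppers)
s = rng.normal(size=shape)             # structured noise (stand-in for barbara)
w_cur = 5.0 * rng.normal(size=shape)   # additive noise, sigma = 5 as in the later examples
w_ref = 5.0 * rng.normal(size=shape)

# Current frame shares the relevant component of the reference:
x_cur = r + w_cur                      # x_n(i) = r_{n-1}(i) + w_n(i)

# Case 1, plain noise:
x_ref_noise = r + w_ref                # x_{n-1}(i) = r_{n-1}(i) + w_{n-1}(i)

# Case 2, structured noise (a blend such as 0.5*(peppers + barbara)):
x_ref_blend = 0.5 * (r + s) + w_ref

# Case 3, brightness change through a smooth light-map:
lam = np.linspace(0.5, 1.5, shape[0])[:, None] * np.ones(shape)
x_ref_light = lam * (r + s) + w_ref    # x_{n-1}(i) = lambda(i)(r+s)(i) + w_{n-1}(i)
```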
How We Do it
• Look at all images in terms of their “frame coefficients”: translation-invariant decompositions generated with a 4x4 block DCT (a poor person's frame), M times expansive with M = 16.
– G = [H_1; …; H_M] (MN × N), each H_k an N × N transform at one of the M shifts.
– d = G x_n: the (MN × 1) frame coefficients of the (N × 1) frame x_n.
– c = G y, where y = MC(x̃_{n-1}) is the motion-compensated reference.
• d̂_i(j) = α_i(j) c_i(j) (causal, least-squares, per-coefficient estimate of the frame coefficients of x_n).
• x̂_n = G⁺ d̂ (invert the overcomplete decomposition).
Mini FAQ:
• Why a frame? Separating r and s becomes straightforward, i.e., easy rejection of s.
• Inverting overcomplete decompositions? Easy.
• Why DCT 4x4? Because it is fast.
• Can one use *lets, <my favorite basis>? Subject to some caveats, yes.
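A minimal numpy sketch of such a translation-invariant 4x4 block DCT decomposition: one orthonormal block DCT per circular shift gives M = 16-times-expansive coefficients, and averaging the shifted inverses implements the tight-frame pseudo-inverse G⁺ = (1/M)Gᵀ. This is an illustrative reconstruction of the idea, not the authors' code:

```python
import numpy as np

N, M = 4, 16                     # 4x4 block DCT at all 16 shifts -> 16-times expansive

# Orthonormal 4-point DCT-II matrix (C @ C.T = I)
u = np.arange(N)[:, None]
xg = np.arange(N)[None, :]
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * xg + 1) * u / (2 * N))
C[0] /= np.sqrt(2.0)

def block_dct(img):
    """Non-overlapping 4x4 block DCT of an image with sides divisible by 4."""
    a, b = img.shape[0] // N, img.shape[1] // N
    blocks = img.reshape(a, N, b, N).transpose(0, 2, 1, 3)
    return (C @ blocks @ C.T).transpose(0, 2, 1, 3).reshape(img.shape)

def block_idct(coef):
    """Inverse of block_dct (each block transform is orthonormal)."""
    a, b = coef.shape[0] // N, coef.shape[1] // N
    blocks = coef.reshape(a, N, b, N).transpose(0, 2, 1, 3)
    return (C.T @ blocks @ C).transpose(0, 2, 1, 3).reshape(coef.shape)

shifts = [(di, dj) for di in range(N) for dj in range(N)]

def frame_analysis(img):
    """d = G x: one block DCT per circular shift (M-times expansive)."""
    return np.stack([block_dct(np.roll(img, (-di, -dj), axis=(0, 1)))
                     for di, dj in shifts])

def frame_synthesis(coefs):
    """x_hat = G+ d: invert each shifted transform and average (tight frame)."""
    recs = [np.roll(block_idct(c), (di, dj), axis=(0, 1))
            for c, (di, dj) in zip(coefs, shifts)]
    return np.mean(recs, axis=0)
```

Because each shifted block DCT is orthonormal on its own, the average of the 16 shifted inverses reconstructs the image exactly, which is why inverting the overcomplete decomposition is "easy".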
Views of a DCT(4x4) frame
[Figure: x_n and its frame coefficients dct_{k,l}(x_n), next to the blend y = 0.5(x_n + another image) and its coefficients dct_{k,l}(y). I rearranged the d to make nice pictures out of the coefficients.]
Automatic separation of relevant and irrelevant
[Figure: dct_{1,2}(x_n) vs. dct_{1,2}(y) for y = 0.5(r_{n-1} + s_{n-1}); gray = 0. In d̂_i(j) = α_i(j) c_i(j), the needed weight is α_i(j) = 2 where r_{n-1} is significant and α_i(j) = 0 where s_{n-1} is significant.]
• r_{n-1}, s_{n-1} are conveniently separated in the frame/overcomplete domain!
• They are separated except for overlaps (but usually there are few overlaps, and for fast processing in this version we will ignore them).
• One can predict s in other ways and improve its rejection.
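A toy illustration of this separation, with r and s built from disjoint DCT basis functions (hypothetical stand-ins for the two components): zeroing the coefficients where s is significant and doubling those where r is significant recovers r exactly from the blend:

```python
import numpy as np

N = 4
u = np.arange(N)[:, None]
xg = np.arange(N)[None, :]
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * xg + 1) * u / (2 * N))  # orthonormal DCT-II
C[0] /= np.sqrt(2.0)

def basis(k, l):
    """Inverse block DCT of a unit coefficient at (k, l)."""
    e = np.zeros((N, N))
    e[k, l] = 1.0
    return C.T @ e @ C

# Two components living on disjoint coefficients -- illustrative stand-ins:
r, s = basis(0, 1), basis(2, 3)
blend = 0.5 * (r + s)            # reference-style mixture y = 0.5*(r + s)

D = C @ blend @ C.T              # block-DCT coefficients of the blend
D_pred = np.where(np.abs(D) > 1e-9, D, 0.0)
D_pred[0, 1] *= 2.0              # alpha = 2 where r is significant
D_pred[2, 3] = 0.0               # alpha = 0 where s is significant
r_hat = C.T @ D_pred @ C         # exact recovery of r (no overlaps in this toy case)
```

Real textures overlap on some coefficients, which is exactly the "few overlaps" caveat above; this toy case has none.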
Real Example
[Figure: frame coefficients G r_{n-1}, G s_{n-1}, and G y, mostly zero; blue = r is significant, red = s is significant.]
• Overlaps are few.
• However, it is clear that the prediction must suppress/amplify the same frequencies in a spatially adaptive fashion.
• Approaches that use filter dictionaries (i.e., Wiener interpolation filters, etc.) require very big dictionaries.
Causal Prediction of Frame Coefficients
• d̂_i(j) = α_i(j) c_i(j), with α_i(j) = argmin_α Σ |d̃_i(j) − α c_i(j)|², the sum taken over a causal neighborhood of previously encoded coefficients.
[Figure: previously encoded region and block to be encoded in x̃_n; frame coefficients dct_{k,l}(x̃_n), showing available coefficients, coefficients associated with the block to be coded, and the causal neighborhood; the MC reference y and dct_{k,l}(y).]
• Fill all frame coefficients of the orange block and invert (encoder/decoder).
• Send/receive residual for red block, …
• (Less accurate prediction at singularity overlaps.)
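The per-coefficient least-squares fit has the closed form α = ⟨d̃, c⟩ / ⟨c, c⟩ over the causal neighborhood. A sketch with synthetic data (not the paper's experiment) showing the fitted weight approaching 2 when the reference carries the relevant signal at half strength:

```python
import numpy as np

def causal_ls_weight(d_causal, c_causal, eps=1e-12):
    """alpha minimizing sum |d - alpha*c|^2 over causally available
    coefficients: the closed-form least-squares solution <d, c> / <c, c>."""
    return float(np.dot(d_causal, c_causal) / (np.dot(c_causal, c_causal) + eps))

# Toy blend scenario: the MC reference carries the relevant signal at half
# strength (y = 0.5*(x_n + other)), so the fitted per-coefficient weight
# should approach alpha = 2 on coefficients where r dominates.
rng = np.random.default_rng(1)
d = rng.normal(size=32)                  # causally decoded current-frame coefficients
c = 0.5 * d + 0.01 * rng.normal(size=32)  # matching reference coefficients
alpha = causal_ls_weight(d, c)
pred = alpha * c                          # d_hat(j) = alpha * c(j)
```

Both encoder and decoder can compute the same α from causally available data, which is why no side information needs to be sent.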
Some Prediction Examples
• Simulation results that show the efficacy of causal predictions (compression results are later).
• Showcase of the proposed work using standard test images, to give an idea of the temporal evolutions it can deal with.
• Evolutions are frame-wide for ease of demonstration. Otherwise, the proposed algorithm is local and can easily take advantage of localized evolutions in an adaptive fashion.
• All frames have additive Gaussian noise (σ = 5) for added challenge and demonstration of noise robustness.
• (The algorithm exploits the underlying non-convexity of the set of natural images.)
[Table; columns: Problem | Past frame | Current frame | Required processing for each predicted block (without looking at the predicted block!) | Prediction | Prediction Accuracy (PSNR)]
• Problem: Noisy video.
• Past/current frames: peppers + noise.
• Required processing: Denoise.
• Prediction accuracy: 36.42 dB (completely causal, no side information sent; BLS-GSM: 37.12 dB).
• Problem: Scene transition from a blend of two scenes.
• Past frame: (peppers + barbara)/2 + noise (SNR = 0 dB! Must catch the red fish).
• Required processing: Denoise, find peppers (!) out of the blend of peppers & barbara, amplify peppers.
• Prediction: peppers + noise ("de-Barbara-d").
• Prediction accuracy: 28.954 dB (completely causal, no side information sent).
• Problem: Scene transition from a blend of three scenes.
• Past frame: (peppers + barbara + boat)/3 + noise (must catch the red fish).
• Required processing: Denoise, find peppers out of the blend of peppers, barbara & boat, amplify peppers.
• Prediction accuracy: 26.874 dB.
• Problem: Scene transition with a cross fade (one scene fades out, the other fades in).
• Past frame: 0.3·peppers + 0.7·barbara + noise; current frame: 0.7·peppers + 0.3·barbara + noise.
• Required processing: Denoise, find barbara, reduce barbara, find peppers, amplify peppers.
• Prediction accuracy: 34.952 dB.
• Problem: Scene transition from a blend with a brightness change.
• Required processing: Denoise, find lightmap, invert lightmap, find peppers out of the blend of peppers & barbara, amplify peppers.
• Prediction accuracy: 27.274 dB.
Q: Does it work in practice? A: Yes. JM 10.2, IPP…, (MB-level switch, no other overhead). QCIF video. ¼-pixel motion. Adaptive rounding on.
[Rate-distortion plots (a)–(d), "better" toward lower rate: roughly 20%, 10%, 25%, and 10% gains in rate.]
[Rate vs. PSNR plots (sleepy CIF sequence; movie trailer): SIP vs. JM, with a "20% improvement over JM" line; roughly 18% gains.]
• Our gains are reduced at lower bitrates, because the compression process tends to remove the effect of some of the problems we can deal with.
Properties
• Decoder complexity:
– translation-invariant decomposition
– per-pixel: 3·4·4 multiplies, 4·4 divides, 4·4·4 additions (to compute the per-coefficient estimates)
– reduce complexity by reducing the causal neighborhood, using less expansive decompositions, running only on high-error blocks, etc.
• Encoder complexity = decoder complexity + motion search (fast search, run only on high-error blocks, etc.)
• Other work:
– Brightness-compensation methods: work only for brightness changes.
– Wiener-filter-based sub-pixel interpolation: filters have low-pass characteristics only; need many filters in the dictionary (too much overhead).
– Weighted prediction: scene-wide; only works on blends if the blending frames are in the reference-frame buffer.
• Our work is more of an “all-purpose cleaner” compared to early work.
Conclusion
• Images depicted in video are sparse, and this can be taken advantage of to generate very interesting prediction results.
• The proposed work goes beyond early prediction solutions and adds new capabilities to the prediction.
• Many types of temporal evolutions in video can be easily managed: denoising accomplished, lightning removed, complicated fades handled, focus changes deblurred, …
• A showcase of the power of sparse decompositions and how the underlying non-convexity can be utilized.
• Future Work:
– Manage overhead better.
– Improve performance.
– Reduce complexity.