Spatial Sparsity Induced Temporal Prediction for Hybrid Video Compression
Gang Hua and Onur G. Guleryuz
Rice University, Houston, TX
DoCoMo USA Labs, Palo Alto, CA
(Please view in full screen presentation mode to see the animations)
Outline
• Problem Statement
– Quick intro to hybrid video compression.
– Example difficult video.
– Problems in temporal prediction.
– Quick results showing what the proposed work can do.
• Our Solution: Spatial Sparsity Induced Temporal Prediction for Hybrid Video Compression
– Model.
– What we do, how we do it, and why it works.
• Simulation results showing prediction examples & discussion.
• Compression results.
• Conclusion & future work.
I will show results on video, but I will also use the classical images peppers/barbara to make intuitive points.
Quick Digression: The Set of “Natural” Images
The set of natural/interesting images:
• Non-convex, star-shaped set.
• A blend like 0.5(barbara + peppers) is far from both barbara and peppers, but still very useful in compressing barbara or peppers.
Setup: Hybrid Video Compression
[Block diagram: reference video frame and current frame enter a Predict stage; the prediction error is then Compressed.]
• We will propose a new prediction technique:
Spatial Sparsity Induced Temporal Prediction (SIP)
Current State of Prediction
[Diagram: reference frame → Predict → subtracted from current frame → Transform Coder.]
• Current prediction techniques only work well when current-frame blocks are simple translations of past-frame blocks (sufficient for most simple video).
• In this work, we will assume translations are accounted for.
Example Difficult Video
(please note differences with traditional sequences like foreman)
(Commercial) (Trailer)
Some example problematic temporal evolutions for current techniques
[Figure: reference-frame / current-frame pairs illustrating each case below.]
• Temporally decorrelated noise
• Simple fade from a blend of two scenes
• Special effects
• Result for current techniques: INTRA (non-differential) encoding (many bits)
SIP inside a generic hybrid coder
[Block diagram: the frame to be coded, minus a prediction, gives the coded differential (Transform Coder); the Transform Decoder output is added back and delayed (z⁻¹) to form the previously decoded frame, to be used as reference; Motion Compensation plus Sparsity-Induced Prediction (using causal information) produce the prediction.]
• Objective is to generate better motion compensated predictors.
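As a minimal sketch (not the authors' actual coder), the feedback loop in this diagram can be written as a toy encode loop: each frame is predicted from the previously decoded frame, the predictor is optionally refined (this is where SIP would sit), and the quantized differential is fed back. `toy_hybrid_encode`, `refine`, and the scalar quantizer are all illustrative stand-ins:

```python
import numpy as np

def toy_hybrid_encode(frames, refine=None, q=8.0):
    """Sketch of a generic hybrid coder loop: predict each frame from the
    previously *decoded* frame, optionally refine the predictor (where a
    SIP-style stage would sit), quantize the differential, and feed the
    reconstruction back through the z^-1 delay."""
    decoded_prev = np.zeros_like(frames[0])
    residuals, decoded = [], []
    for frame in frames:
        pred = decoded_prev              # stand-in for motion compensation
        if refine is not None:
            pred = refine(pred)          # refinement hook on the MC predictor
        resid = frame - pred
        resid_q = q * np.round(resid / q)  # toy uniform quantizer ("transform coder")
        rec = pred + resid_q             # decoder-side reconstruction
        residuals.append(resid_q)
        decoded.append(rec)
        decoded_prev = rec               # z^-1 feedback: reference for next frame
    return residuals, decoded
```

A better `refine` shrinks the residual energy, which is exactly what reduces the bits the transform coder must spend.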
99
MC reference frame current frameMC reference after SIP
noise
(denoised)
lightning
(removed!)
cross-fade
(fading scenes reduced and amplified as needed!)
clutter
(removed)
Loose Model (after translations are accounted for)
(i is the pixel index; frames are N × 1.)
• Plain noise:
– current: x_n(i) = r_{n-1}(i) + w_n(i)
– reference: x_{n-1}(i) = r_{n-1}(i) + w_{n-1}(i)
(r = relevant component, w = noise)
– Straightforward with today's know-how: use an overcomplete set of transforms, threshold coefficients, …
• Structured noise (e.g., a blend such as 0.5(peppers + barbara) + noise):
– current: x_n(i) = r_{n-1}(i) + w_n(i)
– reference: x_{n-1}(i) = (r_{n-1}(i) + s_{n-1}(i)) + w_{n-1}(i)
(s = structured noise)
– The denoising recipe will not work: can we somehow optimize transforms? use index sets of coefficients? …
• Brightness change:
– current: x_n(i) = r_{n-1}(i) + w_n(i)
– reference: x_{n-1}(i) = λ(i)(r_{n-1}(i) + s_{n-1}(i)) + w_{n-1}(i)
(λ = smooth light-map)
• We must find a common formulation for both of these cases (turns out to be very easy!)
I will show more complicated variations as well.
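The three cases above can be instantiated with stand-in signals. Everything here (the shapes, the unit-variance stand-ins for peppers/barbara, the linear lightmap) is hypothetical, chosen only to mirror the equations:

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (16, 16)

r = rng.normal(size=shape)             # relevant component (stand-in for peppers)
s = rng.normal(size=shape)             # structured noise (stand-in for barbara)
w_cur = 5.0 * rng.normal(size=shape)   # additive noise, sigma = 5 as in the later examples
w_ref = 5.0 * rng.normal(size=shape)

# Current frame shares the relevant component of the reference:
x_cur = r + w_cur                      # x_n(i) = r_{n-1}(i) + w_n(i)

# Case 1, plain noise:
x_ref_noise = r + w_ref                # x_{n-1}(i) = r_{n-1}(i) + w_{n-1}(i)

# Case 2, structured noise (a blend such as 0.5*(peppers + barbara)):
x_ref_blend = 0.5 * (r + s) + w_ref

# Case 3, brightness change through a smooth light-map:
lam = np.linspace(0.5, 1.5, shape[0])[:, None] * np.ones(shape)
x_ref_light = lam * (r + s) + w_ref    # x_{n-1}(i) = lambda(i)(r+s)(i) + w_{n-1}(i)
```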
How We Do it
• Look at all images in terms of their “frame coefficients”: translation-invariant decompositions generated with a 4x4 block DCT (a poor person's frame), M times expansive with M = 16.
– G = [H_1; …; H_M] (MN × N), each H_k an N × N transform at one of the M shifts.
– d = G x_n: the (MN × 1) frame coefficients of the (N × 1) frame x_n.
– c = G y, where y = MC(x̃_{n-1}) is the motion-compensated reference.
• d̂_i(j) = α_i(j) c_i(j) (causal, least-squares, per-coefficient estimate of the frame coefficients of x_n).
• x̂_n = G⁺ d̂ (invert the overcomplete decomposition).
Mini FAQ:
• Why a frame? Separating r and s becomes straightforward, i.e., easy rejection of s.
• Inverting overcomplete decompositions? Easy.
• Why DCT 4x4? Because it is fast.
• Can one use *lets, <my favorite basis>? Subject to some caveats, yes.
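A minimal numpy sketch of such a translation-invariant 4x4 block DCT decomposition: one orthonormal block DCT per circular shift gives M = 16-times-expansive coefficients, and averaging the shifted inverses implements the tight-frame pseudo-inverse G⁺ = (1/M)Gᵀ. This is an illustrative reconstruction of the idea, not the authors' code:

```python
import numpy as np

N, M = 4, 16                     # 4x4 block DCT at all 16 shifts -> 16-times expansive

# Orthonormal 4-point DCT-II matrix (C @ C.T = I)
u = np.arange(N)[:, None]
xg = np.arange(N)[None, :]
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * xg + 1) * u / (2 * N))
C[0] /= np.sqrt(2.0)

def block_dct(img):
    """Non-overlapping 4x4 block DCT of an image with sides divisible by 4."""
    a, b = img.shape[0] // N, img.shape[1] // N
    blocks = img.reshape(a, N, b, N).transpose(0, 2, 1, 3)
    return (C @ blocks @ C.T).transpose(0, 2, 1, 3).reshape(img.shape)

def block_idct(coef):
    """Inverse of block_dct (each block transform is orthonormal)."""
    a, b = coef.shape[0] // N, coef.shape[1] // N
    blocks = coef.reshape(a, N, b, N).transpose(0, 2, 1, 3)
    return (C.T @ blocks @ C).transpose(0, 2, 1, 3).reshape(coef.shape)

shifts = [(di, dj) for di in range(N) for dj in range(N)]

def frame_analysis(img):
    """d = G x: one block DCT per circular shift (M-times expansive)."""
    return np.stack([block_dct(np.roll(img, (-di, -dj), axis=(0, 1)))
                     for di, dj in shifts])

def frame_synthesis(coefs):
    """x_hat = G+ d: invert each shifted transform and average (tight frame)."""
    recs = [np.roll(block_idct(c), (di, dj), axis=(0, 1))
            for c, (di, dj) in zip(coefs, shifts)]
    return np.mean(recs, axis=0)
```

Because each shifted block DCT is orthonormal on its own, the average of the 16 shifted inverses reconstructs the image exactly, which is why inverting the overcomplete decomposition is "easy".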
Views of a DCT(4x4) frame
[Figure: x_n and its frame coefficients dct_{k,l}(x_n), next to the blend y = 0.5(x_n + another image) and its coefficients dct_{k,l}(y). I rearranged the d to make nice pictures out of the coefficients.]
Automatic separation of relevant and irrelevant
[Figure: dct_{1,2}(x_n) vs. dct_{1,2}(y) for y = 0.5(r_{n-1} + s_{n-1}); gray = 0. In d̂_i(j) = α_i(j) c_i(j), the needed weight is α_i(j) = 2 where r_{n-1} is significant and α_i(j) = 0 where s_{n-1} is significant.]
• r_{n-1}, s_{n-1} are conveniently separated in the frame/overcomplete domain!
• They are separated except for overlaps (but usually there are few overlaps, and for fast processing in this version we will ignore them).
• One can predict s in other ways and improve its rejection.
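A toy illustration of this separation, with r and s built from disjoint DCT basis functions (hypothetical stand-ins for the two components): zeroing the coefficients where s is significant and doubling those where r is significant recovers r exactly from the blend:

```python
import numpy as np

N = 4
u = np.arange(N)[:, None]
xg = np.arange(N)[None, :]
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * xg + 1) * u / (2 * N))  # orthonormal DCT-II
C[0] /= np.sqrt(2.0)

def basis(k, l):
    """Inverse block DCT of a unit coefficient at (k, l)."""
    e = np.zeros((N, N))
    e[k, l] = 1.0
    return C.T @ e @ C

# Two components living on disjoint coefficients -- illustrative stand-ins:
r, s = basis(0, 1), basis(2, 3)
blend = 0.5 * (r + s)            # reference-style mixture y = 0.5*(r + s)

D = C @ blend @ C.T              # block-DCT coefficients of the blend
D_pred = np.where(np.abs(D) > 1e-9, D, 0.0)
D_pred[0, 1] *= 2.0              # alpha = 2 where r is significant
D_pred[2, 3] = 0.0               # alpha = 0 where s is significant
r_hat = C.T @ D_pred @ C         # exact recovery of r (no overlaps in this toy case)
```

Real textures overlap on some coefficients, which is exactly the "few overlaps" caveat above; this toy case has none.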
Real Example
[Figure: frame coefficients G r_{n-1}, G s_{n-1}, and G y, mostly zero; blue = r is significant, red = s is significant.]
• Overlaps are few.
• However, it is clear that the prediction must suppress/amplify the same frequencies in a spatially adaptive fashion.
• Approaches that use filter dictionaries (i.e., Wiener interpolation filters, etc.) require very big dictionaries.
Causal Prediction of Frame Coefficients
• d̂_i(j) = α_i(j) c_i(j), with α_i(j) = argmin_α Σ |d̃_i(j) − α c_i(j)|², the sum taken over a causal neighborhood of previously encoded coefficients.
[Figure: previously encoded region and block to be encoded in x̃_n; frame coefficients dct_{k,l}(x̃_n), showing available coefficients, coefficients associated with the block to be coded, and the causal neighborhood; the MC reference y and dct_{k,l}(y).]
• Fill all frame coefficients of the orange block and invert (encoder/decoder).
• Send/receive residual for red block, …
• (Less accurate prediction at singularity overlaps.)
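The per-coefficient least-squares fit has the closed form α = ⟨d̃, c⟩ / ⟨c, c⟩ over the causal neighborhood. A sketch with synthetic data (not the paper's experiment) showing the fitted weight approaching 2 when the reference carries the relevant signal at half strength:

```python
import numpy as np

def causal_ls_weight(d_causal, c_causal, eps=1e-12):
    """alpha minimizing sum |d - alpha*c|^2 over causally available
    coefficients: the closed-form least-squares solution <d, c> / <c, c>."""
    return float(np.dot(d_causal, c_causal) / (np.dot(c_causal, c_causal) + eps))

# Toy blend scenario: the MC reference carries the relevant signal at half
# strength (y = 0.5*(x_n + other)), so the fitted per-coefficient weight
# should approach alpha = 2 on coefficients where r dominates.
rng = np.random.default_rng(1)
d = rng.normal(size=32)                  # causally decoded current-frame coefficients
c = 0.5 * d + 0.01 * rng.normal(size=32)  # matching reference coefficients
alpha = causal_ls_weight(d, c)
pred = alpha * c                          # d_hat(j) = alpha * c(j)
```

Both encoder and decoder can compute the same α from causally available data, which is why no side information needs to be sent.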
Some Prediction Examples
• Simulation results that show the efficacy of causal predictions (compression results are later).
• Showcase of the proposed work using standard test images, to give an idea of the temporal evolutions it can deal with.
• Evolutions are frame-wide for ease of demonstration. Otherwise, the proposed algorithm is local and can easily take advantage of localized evolutions in an adaptive fashion.
• All frames have additive Gaussian noise (σ = 5) for added challenge and demonstration of noise robustness.
• (The algorithm exploits the underlying non-convexity of the set of natural images.)
[Table; columns: Problem | Past frame | Current frame | Required processing for each predicted block (without looking at the predicted block!) | Prediction | Prediction Accuracy (PSNR)]
• Problem: Noisy video.
• Past/current frames: peppers + noise.
• Required processing: Denoise.
• Prediction accuracy: 36.42 dB (completely causal, no side information sent; BLS-GSM: 37.12 dB).
• Problem: Scene transition from a blend of two scenes.
• Past frame: (peppers + barbara)/2 + noise (SNR = 0 dB! Must catch the red fish).
• Required processing: Denoise, find peppers (!) out of the blend of peppers & barbara, amplify peppers.
• Prediction: peppers + noise ("de-Barbara-d").
• Prediction accuracy: 28.954 dB (completely causal, no side information sent).
• Problem: Scene transition from a blend of three scenes.
• Past frame: (peppers + barbara + boat)/3 + noise (must catch the red fish).
• Required processing: Denoise, find peppers out of the blend of peppers, barbara & boat, amplify peppers.
• Prediction accuracy: 26.874 dB.
• Problem: Scene transition with a cross fade (one scene fades out, the other fades in).
• Past frame: 0.3·peppers + 0.7·barbara + noise; current frame: 0.7·peppers + 0.3·barbara + noise.
• Required processing: Denoise, find barbara, reduce barbara, find peppers, amplify peppers.
• Prediction accuracy: 34.952 dB.
• Problem: Scene transition from a blend with a brightness change.
• Required processing: Denoise, find lightmap, invert lightmap, find peppers out of the blend of peppers & barbara, amplify peppers.
• Prediction accuracy: 27.274 dB.
Q: Does it work in practice? A: Yes. JM 10.2, IPP…, (MB-level switch, no other overhead). QCIF video. ¼-pixel motion. Adaptive rounding on.
[Rate-distortion plots (a)–(d), "better" toward lower rate: roughly 20%, 10%, 25%, and 10% gains in rate.]
[Rate vs. PSNR plots (sleepy CIF sequence; movie trailer): SIP vs. JM, with a "20% improvement over JM" line; roughly 18% gains.]
• Our gains are reduced at lower bitrates, because the compression process tends to remove the effect of some of the problems we can deal with.
Properties
• Decoder complexity:
– translation-invariant decomposition
– per-pixel: 3·4·4 multiplies, 4·4 divides, 4·4·4 additions (to compute the per-coefficient estimates)
– reduce complexity by reducing the causal neighborhood, using less expansive decompositions, running only on high-error blocks, etc.
• Encoder complexity = decoder complexity + motion search (fast search, run only on high-error blocks, etc.)
• Other work:
– Brightness-compensation methods: work only for brightness changes.
– Wiener-filter-based sub-pixel interpolation: filters have low-pass characteristics only; need many filters in the dictionary (too much overhead).
– Weighted prediction: scene-wide; only works on blends if the blending frames are in the reference-frame buffer.
• Our work is more of an “all-purpose cleaner” compared to early work.
Conclusion
• Images depicted in video are sparse, and this can be taken advantage of to generate very interesting prediction results.
• The proposed work goes beyond early prediction solutions and adds new capabilities to the prediction.
• Many types of temporal evolutions in video can be easily managed: denoising accomplished, lightning removed, complicated fades handled, focus changes deblurred, …
• A showcase of the power of sparse decompositions and how the underlying non-convexity can be utilized.
• Future Work:
– Manage overhead better.
– Improve performance.
– Reduce complexity.