New Direction in Wyner-Ziv Video Coding: On the Importance of Modeling Virtual Correlation Channel...

New Direction in Wyner-Ziv Video Coding:

On the Importance of Modeling Virtual Correlation Channel (VCC)

Xin Li

LDCSEE, WVU

Email: [email protected]

“If you can’t solve a problem, then there is an easier problem you can solve: find it.” - George Pólya

Formulation of a Simpler Problem

[Figure: frame structure of the two schemes. Conventional video coding (source coding): I frames $x_{2t-2}$, $x_{2t}$ and B frame $x_{2t-1}$. Wyner-Ziv video coding (joint source-channel coding): key frames $x_{2t-2}$, $x_{2t}$ and WZ frame $x_{2t-1}$.]

Assuming I or key frames are coded by the same intra-frame encoder, can we achieve coding efficiency on WZ frames comparable to that of H.264 (the state-of-the-art technique for coding B frames)?

Outline of Our Attack

Motivating observations
– Characterizing the nonstationary virtual correlation channel (VCC) by a mixture model

Theoretical derivation
– Classification gain (dual to that in conventional source coding)

Classification-based DVC algorithm
– Approximate solution to the simplified problem

Experimental results
– Comparable R-D performance to H.264 JM11.0 (for certain types of video sequences: slow motion)

Discussions and perspectives
– Dualities between conventional and distributed video coding
– DVC = video modeling + DSC (Rate) + Estimation (Distortion)

Motivations

Learn from the conventional wisdom: What is the major factor contributing to the success of existing image/video coding standards such as JPEG2000 and H.264?
– It is the source classification principle and its subtle implications rooted in the earlier pioneering works such as EZW/SPIHT and multi-hypothesis MCP

Therefore, by following the duality, it is natural to consider the idea of classifying the virtual correlation channel in distributed source coding
– Unlike conventional video coding, motion estimation (ME) is done at the decoder instead of encoder side in WZ video coding (we have addressed this issue separately under a different context¹)

¹ X. Li, “Video processing via implicit and mixture motion model,” IEEE Trans. on Cir. Sys. for Video Tech., vol. 17, no. 8, pp. 953-963, Aug. 2007.

Modeling Non-stationary VCC

Why is the virtual correlation channel non-stationary?
– Misaligned edges, deformable motion, and illumination variations are all spatio-temporally varying phenomena

Mixture modeling of virtual correlation channel

$y_t = x_t + z_t$

– $y_t$: WT of interpolated WZ frames (side information)
– $x_t$: WT of original WZ frames (e.g., significant vs. insignificant wavelet coefficients)
– $z_t$: additive errors (e.g., significant vs. insignificant temporal interpolation errors)
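To make the mixture idea concrete, here is a minimal sketch that draws the additive error $z$ from a two-component Gaussian mixture (significant vs. insignificant). The mixing probability and the two standard deviations are illustrative assumptions, not values from the talk, and the helper name `simulate_mixture_vcc` is hypothetical.

```python
import random
import statistics

def simulate_mixture_vcc(n, p_sig=0.2, sigma_insig=1.0, sigma_sig=10.0, seed=0):
    """Draw additive errors z from a two-component zero-mean Gaussian mixture.

    Each sample is 'significant' with probability p_sig (large variance) and
    'insignificant' otherwise (small variance). All parameter values are
    illustrative assumptions.
    """
    rng = random.Random(seed)
    labels, z = [], []
    for _ in range(n):
        sig = rng.random() < p_sig
        labels.append(sig)
        z.append(rng.gauss(0.0, sigma_sig if sig else sigma_insig))
    return labels, z

labels, z = simulate_mixture_vcc(20000)
var_all = statistics.pvariance(z)
var_sig = statistics.pvariance([v for v, s in zip(z, labels) if s])
var_insig = statistics.pvariance([v for v, s in zip(z, labels) if not s])
print(f"overall {var_all:.1f}, significant {var_sig:.1f}, insignificant {var_insig:.2f}")
```

A single-variance model would describe neither class well: the overall variance sits far from both class variances, which is the motivation for classifying the channel.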

Summary of Theoretical Results

Rate-Distortion optimization problem formulation

$\min \; D = \sum_{k=1}^{4} a_k D_k \quad \text{s.t.} \quad \sum_{k=1}^{4} a_k R_k = R$
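For Gaussian classes with $R_k = \frac{1}{2}\log_2(\sigma_k^2 / D_k)$, a constrained problem of this form has the textbook reverse water-filling solution. The sketch below applies that standard technique; it is not necessarily the allocation rule used in the talk. It assumes $a_k$ are class weights summing to one, and the function name and example variances are hypothetical.

```python
import math

def reverse_water_filling(weights, variances, rate_budget, iters=100):
    """Allocate rate across Gaussian classes by bisection on the water level.

    weights[k]   : a_k, assumed fraction of samples in class k (sums to 1)
    variances[k] : class variance (sigma_x^2 in conventional coding, or the
                   conditional variance in the WZ setting)
    rate_budget  : R, average bits per sample
    Returns (rates, distortions), one entry per class.
    """
    def rates_at(theta):
        return [max(0.0, 0.5 * math.log2(v / theta)) for v in variances]

    lo, hi = 1e-12, max(variances)  # at hi the total rate is zero
    for _ in range(iters):
        theta = math.sqrt(lo * hi)  # geometric midpoint: rate is logarithmic in theta
        total = sum(a * r for a, r in zip(weights, rates_at(theta)))
        if total > rate_budget:
            lo = theta  # spending too much rate: raise the water level
        else:
            hi = theta
    theta = hi
    return rates_at(theta), [min(theta, v) for v in variances]

# Hypothetical four-class example.
weights = [0.25, 0.25, 0.25, 0.25]
variances = [100.0, 25.0, 4.0, 0.5]
rates, dists = reverse_water_filling(weights, variances, rate_budget=1.0)
avg_rate = sum(a * r for a, r in zip(weights, rates))
print("rates:", [round(r, 3) for r in rates], "average:", round(avg_rate, 3))
```

Classes whose variance falls below the water level receive zero rate, and the remaining classes receive rate in order of variance.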

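Conventional source coding vs. distributed source coding

R-D function
– Conventional: $R(D) = \frac{1}{2}\log_2\left[\frac{\sigma_x^2}{D}\right]$
– Distributed: $R(D) = \frac{1}{2}\log_2\left[\frac{\sigma_{xz}^2}{D}\right], \quad \sigma_{xz}^2 = \frac{\sigma_x^2 \sigma_z^2}{\sigma_x^2 + \sigma_z^2}$

Rate allocation
– [rate-allocation formula lost in extraction; surviving fragments: max{0, log(·)}, [log ·]]

Classification gain
– Conventional: $G_c = 10\log_{10}\left(\frac{\sigma_x^2}{\sigma_{gm}^2}\right)$ (dB)
– Distributed: $G_c^{WZ} = 10\log_{10}\left(\frac{\sigma_{xz}^2}{\sigma_{gm}^2}\right)$ (dB)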
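As a numeric illustration of classification gain, the sketch below evaluates the ratio of the arithmetic mean to the geometric mean of class variances, in dB. Reading $\sigma_{gm}^2$ as the weighted geometric mean of the class variances and the numerator as their weighted arithmetic mean is an assumption, as are the weights and variances in the example.

```python
import math

def conditional_variance(var_x, var_z):
    """sigma_xz^2 = sigma_x^2 * sigma_z^2 / (sigma_x^2 + sigma_z^2)."""
    return var_x * var_z / (var_x + var_z)

def classification_gain_db(weights, variances):
    """10*log10(weighted arithmetic mean / weighted geometric mean)."""
    mean_var = sum(a * v for a, v in zip(weights, variances))
    log_gm = sum(a * math.log(v) for a, v in zip(weights, variances))
    return 10.0 * math.log10(mean_var / math.exp(log_gm))

# Hypothetical four classes: (sigma_x^2, sigma_z^2) with assumed class weights.
weights = [0.1, 0.2, 0.2, 0.5]
pairs = [(100.0, 25.0), (100.0, 0.25), (1.0, 25.0), (1.0, 0.25)]

gain_conventional = classification_gain_db(weights, [vx for vx, _ in pairs])
gain_wz = classification_gain_db(
    weights, [conditional_variance(vx, vz) for vx, vz in pairs])
print(f"conventional: {gain_conventional:.2f} dB, WZ: {gain_wz:.2f} dB")
```

By the AM-GM inequality the gain is never negative, and it vanishes when all classes share the same variance.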

Implications for WZ Video Coding

In conventional source coding, classification gain implies that a subsource of larger variance should be assigned a higher priority in rate allocation

In distributed source coding, a similar conclusion can be made, except that the variance of the “subsource” is now determined by the virtual correlation channel

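$\sigma_{xz}^2 = \frac{\sigma_x^2 \sigma_z^2}{\sigma_x^2 + \sigma_z^2}$

$\sigma_{xz}^2 \to 0$ when $\sigma_x^2 \to 0$ OR $\sigma_z^2 \to 0$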

Conclusion: the class of significant coefficients that are poorly motion compensated has the largest R-D slope (these coefficients should be coded first: where are they? and what are they?)
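The expression for $\sigma_{xz}^2$ can be checked with a short derivation. This sketch reads $\sigma_{xz}^2$ as the conditional variance of $x$ given the side information $y$, and assumes $x$ and $z$ are independent zero-mean Gaussians; both readings are assumptions consistent with the additive model, not statements from the slides.

```latex
\begin{aligned}
y &= x + z, \qquad x \sim N(0,\sigma_x^2),\quad z \sim N(0,\sigma_z^2),\quad x \perp z \\
\operatorname{Var}(y) &= \sigma_x^2 + \sigma_z^2, \qquad \operatorname{Cov}(x,y) = \sigma_x^2 \\
\sigma_{xz}^2 = \operatorname{Var}(x \mid y)
  &= \sigma_x^2 - \frac{\operatorname{Cov}(x,y)^2}{\operatorname{Var}(y)}
   = \sigma_x^2 - \frac{\sigma_x^4}{\sigma_x^2+\sigma_z^2}
   = \frac{\sigma_x^2\,\sigma_z^2}{\sigma_x^2+\sigma_z^2} \\
\text{e.g. } (\sigma_x^2,\sigma_z^2) = (100, 25) &\;\Rightarrow\; \sigma_{xz}^2 = 20, \qquad
(\sigma_x^2,\sigma_z^2) = (100, 0.25) \;\Rightarrow\; \sigma_{xz}^2 \approx 0.249
\end{aligned}
```

The two numeric cases show why only coefficients with both variances large remain expensive: a significant coefficient that is well predicted (small $\sigma_z^2$) costs almost nothing.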

Rate Control Dilemma

How can we estimate the second-order statistics of the VCC, $\sigma_z^2$ (the accuracy of the side information $y_t$ generated by temporal interpolation)?
– At the encoder, we have access to $x_t$ (original WZ frames) but not $y_t$ (side information)¹
– At the decoder, we have access to $y_t$ (side information) but not $x_t$ (original WZ frames)²
– We have adopted a decoder-based approach based on a feedback channel and a scale invariance assumption about $z_t$ (an approximate but tractable solution)

¹ Berkeley’s PRISM scheme allows simple temporal dependency estimation at the encoder.

² Stanford’s researchers suggested the use of a feedback channel for rate control.

Feedback via Scale Invariance of Interpolation Errors

[Figure: key frames $x_{2t-2}$, $x_{2t}$, $x_{2t+2}$ and WZ frame $x_{2t-1}$; an interpolated key frame is shown next to the key frame.]

Fine-resolution (oracle S.I.): $z_{2t-1} = f(x_{2t-2}, x_{2t}) - x_{2t-1}$

Coarse-resolution: $z_{2t} = f(x_{2t-2}, x_{2t+2}) - x_{2t}$

[Figure: block-based significance map of $z_t$, oracle vs. actual, for hall and foreman.]
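The coarse-resolution error involves only key frames, so a decoder can compute it without the original WZ frame and use its significance map as a proxy for the fine-resolution one. A minimal sketch of that computation follows. Plain frame averaging stands in for the interpolator $f$, and the block size, threshold, and function names are all illustrative assumptions.

```python
def interpolate(prev_frame, next_frame):
    """Stand-in for f(.,.): plain averaging of two frames (an assumption)."""
    return [[(a + b) / 2.0 for a, b in zip(r0, r1)]
            for r0, r1 in zip(prev_frame, next_frame)]

def block_significance_map(error, block, threshold):
    """Mark a block significant when its mean squared error exceeds threshold."""
    rows, cols = len(error), len(error[0])
    sig_map = []
    for r in range(0, rows, block):
        flags = []
        for c in range(0, cols, block):
            vals = [error[i][j]
                    for i in range(r, min(r + block, rows))
                    for j in range(c, min(c + block, cols))]
            flags.append(sum(v * v for v in vals) / len(vals) > threshold)
        sig_map.append(flags)
    return sig_map

def coarse_significance_map(x_prev_key, x_key, x_next_key, block=16, threshold=4.0):
    """Significance map of z_2t = f(x_{2t-2}, x_{2t+2}) - x_2t from key frames only."""
    y = interpolate(x_prev_key, x_next_key)
    z = [[yi - xi for yi, xi in zip(ry, rx)] for ry, rx in zip(y, x_key)]
    return block_significance_map(z, block, threshold)

# Toy 4x4 frames: an object appears in the top-left 2x2 block after the first key frame.
prev_key = [[10.0] * 4 for _ in range(4)]
key = [row[:] for row in prev_key]
next_key = [row[:] for row in prev_key]
for i in range(2):
    for j in range(2):
        key[i][j] = 30.0
        next_key[i][j] = 30.0

sig_map = coarse_significance_map(prev_key, key, next_key, block=2, threshold=4.0)
print(sig_map)
```

Only the block where interpolation fails is flagged; static blocks are predicted exactly and stay insignificant.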

Classification-based WZ Video Coding System

[Block diagram. Encoder: WZ frames, wavelet transform (WT), block-based classification, SW lossless coding of the significance map, SW lossless coding of the significant coefficients. Decoder: decoded I frames, advanced temporal interpolation, WT, joint exploitation of side information (SI) and coded information (CI), decoded WZ frames. A feedback channel links the decoder to the encoder.]

In a nutshell, we only allocate bits to the class of poorly motion-compensated significant coefficients: both $\sigma_x^2$ and $\sigma_z^2$ are large
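The allocation rule can be sketched as a two-threshold classification of blocks. The code below is a simplified illustration and not the system's actual classifier: it assumes each block comes with a variance estimate for $x$ (available at the encoder) and for $z$ (presumably supplied through the feedback channel), and the threshold values and function name are hypothetical.

```python
def classify_blocks(block_stats, th_x, th_z):
    """Split blocks into four classes by thresholding both variances.

    block_stats : list of (var_x, var_z) pairs, one per block
    th_x, th_z  : significance thresholds for x and z
    Returns a dict mapping (x_significant, z_significant) to block indices.
    """
    classes = {(sx, sz): [] for sx in (False, True) for sz in (False, True)}
    for idx, (var_x, var_z) in enumerate(block_stats):
        classes[(var_x > th_x, var_z > th_z)].append(idx)
    return classes

# Hypothetical per-block statistics.
stats = [(120.0, 40.0), (150.0, 0.5), (2.0, 35.0), (1.0, 0.2), (90.0, 60.0)]
classes = classify_blocks(stats, th_x=50.0, th_z=10.0)
coded = classes[(True, True)]  # only these blocks receive bits
print("blocks that receive bits:", coded)
```

Blocks in the other three classes are reconstructed from side information alone, which is why the fraction of coded blocks can stay small.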

Joint Exploitation of Side and Coded Information at the Decoder

$y = x + z$, $z \sim N(0, \sigma_z^2)$

SI: $y$; CI $= Q(x)$

Target of estimation: $E[x \mid y, Q(x)]$

Latent variable: $z$ (we don’t know $\sigma_z^2$)

[Diagram: starting from an initial guess, alternate between updating the estimate of $x$ and updating the estimate of $\sigma_z^2$.]
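One way to realize this alternation is an EM-style loop: given the current $\sigma_z^2$, the posterior of $x$ is a Gaussian truncated to the quantization bin reported by the CI; given the updated $x$ estimates, $\sigma_z^2$ is re-estimated. The sketch below is one plausible reading and not the author's algorithm. It assumes a zero-mean Gaussian prior on $x$ with known variance; all function names and numeric values are hypothetical.

```python
import math
import random

def _pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def _cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def truncated_moments(mu, s, lo, hi):
    """Mean and variance of N(mu, s^2) restricted to the bin [lo, hi]."""
    a, b = (lo - mu) / s, (hi - mu) / s
    mass = _cdf(b) - _cdf(a)
    if mass < 1e-12:
        # Bin lies far in a tail: fall back to the bin edge nearest to mu.
        return min(max(mu, lo), hi), 0.0
    ratio = (_pdf(a) - _pdf(b)) / mass
    mean = min(max(mu + s * ratio, lo), hi)  # clamp against rounding error
    var = s * s * (1.0 + (a * _pdf(a) - b * _pdf(b)) / mass - ratio * ratio)
    return mean, max(var, 0.0)

def joint_estimate(ys, bins, var_x, var_z_init, iters=30):
    """Alternate between estimating x = E[x | y, Q(x)] and updating sigma_z^2."""
    var_z, x_hat = var_z_init, []
    for _ in range(iters):
        gain = var_x / (var_x + var_z)                   # posterior mean is gain * y
        s = math.sqrt(var_x * var_z / (var_x + var_z))   # posterior std before truncation
        x_hat, acc = [], 0.0
        for y, (lo, hi) in zip(ys, bins):
            m, v = truncated_moments(gain * y, s, lo, hi)
            x_hat.append(m)
            acc += (y - m) ** 2 + v                      # E[(y - x)^2 | y, bin]
        var_z = acc / len(ys)
    return x_hat, var_z

# Synthetic check with a uniform quantizer of step 8; variances are assumed values.
rng = random.Random(1)
delta, var_x, true_var_z = 8.0, 400.0, 100.0
xs = [rng.gauss(0.0, math.sqrt(var_x)) for _ in range(4000)]
ys = [x + rng.gauss(0.0, math.sqrt(true_var_z)) for x in xs]
bins = [(math.floor(x / delta) * delta, math.floor(x / delta) * delta + delta) for x in xs]
x_hat, var_z_hat = joint_estimate(ys, bins, var_x, var_z_init=25.0)
mse_si = sum((y - x) ** 2 for x, y in zip(xs, ys)) / len(xs)
mse_joint = sum((h - x) ** 2 for x, h in zip(xs, x_hat)) / len(xs)
print(f"sigma_z^2 estimate {var_z_hat:.1f}, MSE SI alone {mse_si:.1f}, SI+CI {mse_joint:.1f}")
```

In this toy setting every sample carries CI; in a classification-based coder only the coded blocks would, and the rest would fall back to the side information.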

Justification of Distortion Reduction

[Figure: foreman-qcif, block size 16×16, 18.3% of blocks are coded. Panels: SI alone vs. SI+CI.]

Coding Experiments Setup

Parameter setting
– Block size: 16×16, WT: Daubechies’ 9-7, Slepian-Wolf lossless encoder: LDPC-based¹, uniform quantizer (∆=8)

– Rate control: thx, thz - significance thresholds for x and z respectively

– SI generation: Implicit MC vs. Explicit MC

Benchmark: H.264 JM11.0 implementation (QP of I frames is small and fixed)

¹ A. D. Liveris, Z. Xiong, and C. N. Georghiades, “Compression of binary sources with side information at the decoder using LDPC codes,” IEEE Communications Letters, vol. 6, no. 10, pp. 440-442, Oct. 2002.

Comparison of Temporal Interpolation

[Figure: Foreman-qcif, ad-hoc fusion by simple averaging. Panels: Implicit MC¹ vs. Explicit MC².]

¹ X. Li, “Video processing via implicit and mixture motion model,” IEEE Trans. on Cir. Sys. for Video Tech., vol. 17, no. 8, pp. 953-963, Aug. 2007.

² A. M. Tourapis, H.-Y. Cheong, M. L. Liou, and O. C. Au, “Temporal interpolation of video sequences using zonal based algorithms,” Proc. of ICIP, vol. 3, pp. 895-898, 2001.

R-D Performance Comparison (I)

[Figure panels: Foreman-qcif, 30 frames; Hall-qcif, 30 frames.]

R-D Performance Comparison (II)

[Figure panels: Container-qcif, 30 frames; Football-qcif, 30 frames.]

Dualities between Conventional and WZ Video Coding

Exploitation of motion-related temporal dependency
– In traditional video coding, prediction is based on original frames (overhead is involved)
– In WZ video coding, interpolation is based on reconstructed key frames (no overhead)
– Importance of SI generation¹

R-D optimization shifted from encoder to decoder
– In traditional video coding, decoder is often fixed but encoder enjoys considerable flexibility
– In WZ video coding, rate control through the feedback channel offers great flexibility to the decoder without touching encoder²
– Importance of matching SW lossless encoder with the statistics of virtual correlation channel (UEP is desirable)

¹ L. Lu, D. He, and A. Jagmohan, “Robust Multi-Frame Side Information Generation For Distributed Video Coding,” Proc. of ICIP, 2007.

² B. Girod, A. M. Aaron, S. Rane, and D. Rebollo-Monedero, “Distributed Video Coding,” Proceedings of the IEEE, vol. 93, no. 1, pp. 71-83, Jan. 2005.

Acknowledgement

Ligang Lu and Dake He for inviting me to participate in this special session

Zixiang Xiong for sharing with me his students’ implementation of the LDPC-based Slepian-Wolf coding algorithm

E. Simoncelli for stimulating discussions on distributed motion representations
