New Direction in Wyner-Ziv Video Coding:
On the Importance of Modeling Virtual Correlation Channel (VCC)
Xin Li
LDCSEE, WVU
Email: [email protected]
“If you can’t solve a problem, then there is an easier problem you can solve: find it.” - George Pólya
Formulation of a Simpler Problem
[Figure: two frame structures. Conventional video coding (source coding): I frames $x_{2t-2}$, $x_{2t}$ and B frame $x_{2t-1}$. Wyner-Ziv video coding (joint source-channel coding): key frames $x_{2t-2}$, $x_{2t}$ and WZ frame $x_{2t-1}$.]
Assuming I or key frames are coded by the same intra-frame encoder, can we achieve coding efficiency on WZ frames comparable to that of H.264 (the state-of-the-art technique for coding B frames)?
Outline of Our Attack
Motivating observations
– Characterizing the nonstationary virtual correlation channel (VCC) by a mixture model
Theoretical derivation
– Classification gain (dual to that in conventional source coding)
Classification-based DVC algorithm
– Approximate solution to the simplified problem
Experimental results
– Comparable R-D performance to H.264 JM11.0 (for certain types of video sequences: slow motion)
Discussions and perspectives
– Dualities between conventional and distributed video coding
– DVC = video modeling + DSC (Rate) + Estimation (Distortion)
Motivations
Learn from the conventional wisdom: What is the major factor contributing to the success of existing image/video coding standards such as JPEG2000 and H.264?
– It is the source classification principle and its subtle implications rooted in the earlier pioneering works such as EZW/SPIHT and multi-hypothesis MCP
Therefore, by following the duality, it is natural to consider the idea of classifying the virtual correlation channel in distributed source coding
– Unlike conventional video coding, motion estimation (ME) is done at the decoder instead of encoder side in WZ video coding (we have addressed this issue separately under a different context¹)
¹ X. Li, “Video processing via implicit and mixture motion model,” IEEE Trans. on Cir. Sys. for Video Tech., vol. 17, no. 8, pp. 953-963, Aug. 2007.
Modeling Non-stationary VCC
Why is the virtual correlation channel non-stationary?
– Misaligned edges, deformable motion, and illumination variations are all spatio-temporally varying phenomena
Mixture modeling of virtual correlation channel
$$y_t = x_t + z_t$$
– $y_t$: WT of interpolated WZ frames (side information)
– $x_t$: WT of original WZ frames (e.g., significant vs. insignificant wavelet coefficients)
– $z_t$: additive errors (e.g., significant vs. insignificant temporal interpolation errors)
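The mixture model above pairs a two-way split of $x_t$ with a two-way split of $z_t$, which gives four classes per block. The following is a minimal sketch of that labeling, assuming significance is decided by comparing mean block energy against a threshold; the function names, the energy criterion, and the sample numbers are illustrative assumptions, not taken from the slides.

```python
# Hypothetical sketch: four-way block classification for y = x + z.
# Assumption: a block is "significant" when its mean squared value exceeds a threshold.

def block_energy(block):
    """Mean squared value of a block given as a list of rows."""
    values = [v for row in block for v in row]
    return sum(v * v for v in values) / len(values)


def classify_block(x_block, z_block, th_x, th_z):
    """Return (x_is_significant, z_is_significant) for one block."""
    return (block_energy(x_block) > th_x, block_energy(z_block) > th_z)


# Human-readable names for the four mixture components (labels are illustrative).
CLASS_NAMES = {
    (True, True): "significant x, poorly compensated",
    (True, False): "significant x, well compensated",
    (False, True): "insignificant x, poorly compensated",
    (False, False): "insignificant x, well compensated",
}

# Toy 2x2 blocks of wavelet coefficients (x) and interpolation errors (z).
x_block = [[12.0, -9.0], [7.0, 10.0]]
z_block = [[6.0, -5.0], [4.0, 5.0]]
label = CLASS_NAMES[classify_block(x_block, z_block, th_x=64.0, th_z=16.0)]
```

In a real Wyner-Ziv system the encoder does not see $z_t$, so a labeling like this can only be computed where both quantities are available or estimated.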
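Summary of Theoretical Results
Rate-Distortion optimization problem formulation
$$\min D = \sum_{k=1}^{4} a_k D_k \quad \text{s.t.} \quad \sum_{k=1}^{4} a_k R_k = R$$
Conventional source coding
– R-D function: $R(D) = \frac{1}{2}\log_2\left[\frac{\sigma_x^2}{D}\right]$
– Classification gain: $G_c = 10\log_{10}\left(\frac{\sigma_x^2}{\sigma_{gm}^2}\right)$ (dB)
Distributed source coding
– R-D function: $R(D) = \frac{1}{2}\log_2\left[\frac{\sigma_{xz}^2}{D}\right]$, where $\sigma_{xz}^2 = \frac{\sigma_x^2\sigma_z^2}{\sigma_x^2+\sigma_z^2}$
– Classification gain: $G_c^{WZ} = 10\log_{10}\left(\frac{\sigma_{xz}^2}{\sigma_{gm}^2}\right)$ (dB)
Rate allocation: [expression garbled in the transcript; the surviving fragment has the form $\max\{0, \log(\cdot)\}$]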
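The distributed-coding quantities above can be evaluated numerically. This is a rough sketch under stated assumptions: $\sigma_{gm}^2$ is taken to be the weighted geometric mean of the per-class variances, with the class fractions $a_k$ as weights (the slides only show the subscript "gm"), and all function names are hypothetical.

```python
import math


def conditional_variance(var_x, var_z):
    """sigma_xz^2 = var_x * var_z / (var_x + var_z)."""
    if var_x + var_z == 0.0:
        return 0.0
    return var_x * var_z / (var_x + var_z)


def wz_rate(var_x, var_z, distortion):
    """R(D) = 0.5 * log2(sigma_xz^2 / D) in bits per sample, clipped at zero."""
    var_xz = conditional_variance(var_x, var_z)
    if var_xz <= distortion:
        return 0.0
    return 0.5 * math.log2(var_xz / distortion)


def classification_gain_db(total_var, class_vars, weights):
    """10*log10(total_var / weighted geometric mean of class_vars).

    Assumption: sigma_gm^2 is the weighted geometric mean of the class
    variances, which must all be strictly positive.
    """
    log_gm = sum(w * math.log(v) for w, v in zip(weights, class_vars))
    return 10.0 * math.log10(total_var / math.exp(log_gm))


# Toy example: four equally likely classes with (var_x, var_z) pairs.
pairs = [(100.0, 100.0), (100.0, 1.0), (1.0, 100.0), (1.0, 1.0)]
weights = [0.25] * 4
class_vars = [conditional_variance(vx, vz) for vx, vz in pairs]
pooled_x = sum(w * vx for w, (vx, _) in zip(weights, pairs))
pooled_z = sum(w * vz for w, (_, vz) in zip(weights, pairs))
gain_wz = classification_gain_db(
    conditional_variance(pooled_x, pooled_z), class_vars, weights
)
```

In this toy setting the gain is positive because the class variances are far from equal; when all classes share the same variance the gain collapses to 0 dB.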
Implications for WZ Video Coding
In conventional source coding, classification gain implies that a subsource of larger variance should be assigned a higher priority in rate allocation
In distributed source coding, a similar conclusion can be made except that the variance of the “subsource” is now determined by the virtual correlation channel
$$\sigma_{xz}^2 = \frac{\sigma_x^2\sigma_z^2}{\sigma_x^2+\sigma_z^2}$$
$\sigma_{xz}^2 \to 0$ if $\sigma_x^2 \to 0$ OR $\sigma_z^2 \to 0$
Conclusion: the class of significant coefficients that are poorly motion compensated has the largest R-D slope (they should be coded first: where are they? and what are they?)
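One way to see why that class has the steepest R-D slope is to invert the Gaussian form $R(D) = \frac{1}{2}\log_2[\sigma_{xz}^2/D]$ for each class. The per-class subscript $k$ below is notation introduced here for illustration, and the derivation is a sketch under that Gaussian assumption rather than the author's own argument.

```latex
% Distortion as a function of rate for class k (Gaussian assumption)
D_k(R_k) = \sigma_{xz,k}^2 \, 2^{-2R_k}

% Slope of the distortion-rate curve
\frac{dD_k}{dR_k} = -2\ln 2 \;\sigma_{xz,k}^2 \, 2^{-2R_k}
\quad\Longrightarrow\quad
\left|\frac{dD_k}{dR_k}\right|_{R_k=0} = 2\ln 2 \;\sigma_{xz,k}^2

% sigma_xz^2 grows with either variance and is capped by the smaller one
\frac{\partial \sigma_{xz}^2}{\partial \sigma_x^2}
  = \frac{\sigma_z^4}{(\sigma_x^2+\sigma_z^2)^2} \ge 0,
\qquad
\sigma_{xz}^2 \le \min(\sigma_x^2,\ \sigma_z^2)
```

Since the initial slope is proportional to $\sigma_{xz,k}^2$, and that quantity is large only when both $\sigma_x^2$ and $\sigma_z^2$ are large, the first bits spent on that class buy the largest distortion reduction.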
Rate Control Dilemma
How can we estimate the second-order statistics of the VCC: $\sigma_z^2$ (the accuracy of the side information $y_t$ generated by temporal interpolation)?
– At the encoder, we have access to $x_t$ (original WZ frames) but not $y_t$ (side information)¹
– At the decoder, we have access to $y_t$ (side information) but not $x_t$ (original WZ frames)²
– We have adopted a decoder-based approach based on a feedback channel and a scale invariance assumption about $z_t$ (an approximate but tractable solution)
¹ Berkeley’s PRISM scheme allows simple temporal dependency estimation at the encoder.
² Stanford’s researchers suggested the use of a feedback channel for rate control.
Feedback via Scale Invariance of Interpolation Errors
[Figure: key frames $x_{2t-2}$, $x_{2t}$, $x_{2t+2}$ and WZ frame $x_{2t-1}$; block-based significance maps of $z_t$, oracle vs. actual, for hall and foreman.]
Fine-resolution (oracle S.I.):
$$z_{2t-1} = f(x_{2t-2}, x_{2t}) - x_{2t-1}$$
Coarse-resolution (interpolated key frame vs. key frame):
$$z_{2t} = f(x_{2t-2}, x_{2t+2}) - x_{2t}$$
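The coarse-resolution error needs only key frames, so the decoder can compute it without access to the WZ frame. Below is a minimal sketch of that idea, assuming the coarse map is then used as a stand-in for the fine-resolution one (my reading of the scale invariance assumption). Simple averaging replaces the interpolator $f$, and the function names, block size, and threshold are hypothetical.

```python
def interpolate(prev_frame, next_frame):
    """Stand-in for f(.,.): plain averaging of two frames (an assumption;
    the slides refer to advanced temporal interpolation)."""
    return [
        [0.5 * (p + n) for p, n in zip(prev_row, next_row)]
        for prev_row, next_row in zip(prev_frame, next_frame)
    ]


def block_significance_map(error, block, threshold):
    """Flag each block-by-block tile whose mean squared error exceeds threshold."""
    rows, cols = len(error), len(error[0])
    sig_map = []
    for r0 in range(0, rows, block):
        sig_row = []
        for c0 in range(0, cols, block):
            tile = [
                error[r][c]
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            ]
            sig_row.append(sum(v * v for v in tile) / len(tile) > threshold)
        sig_map.append(sig_row)
    return sig_map


def coarse_significance_map(key_prev, key_mid, key_next, block, threshold):
    """Decoder-side proxy: z_2t = f(x_{2t-2}, x_{2t+2}) - x_2t from key frames only."""
    guess = interpolate(key_prev, key_next)
    error = [
        [g - k for g, k in zip(guess_row, key_row)]
        for guess_row, key_row in zip(guess, key_mid)
    ]
    return block_significance_map(error, block, threshold)


# Toy frames: an object appears in the right half of the middle key frame only.
flat = [[0.0] * 4 for _ in range(2)]
mid = [[0.0, 0.0, 10.0, 10.0], [0.0, 0.0, 10.0, 10.0]]
sig = coarse_significance_map(flat, mid, flat, block=2, threshold=25.0)
```

Blocks flagged here are the ones where interpolation across key frames failed, which makes them candidates for requesting bits over the feedback channel.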
Classification-based WZ Video Coding System
[Figure: block diagram. Encoder blocks: WZ frames, wavelet transform (WT), block-based classification, SW lossless coding of significance map, SW lossless coding of significant coefficients. Decoder blocks: decoded I frames, advanced temporal interpolation (SI), joint exploitation of SI and CI, decoded WZ frames. A feedback channel links decoder and encoder.]
In a nutshell, we only allocate bits to the class of poorly motion-compensated significant coefficients: both $\sigma_x^2$ and $\sigma_z^2$ are large
Joint Exploitation of Side and Coded Information at the Decoder
$$y = x + z, \quad z \sim N(0, \sigma_z^2)$$
– SI: $y$
– CI: $Q(x)$
Target of estimation: $E[x \mid y, Q(x)]$
Latent variable: $z$ (we don’t know $\sigma_z^2$)
[Figure: iterative loop. Starting from an initial guess, alternate between updating the estimate of $x$ and updating the estimate of $\sigma_z^2$.]
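The alternating loop above can be sketched in a few lines. This is an illustrative approximation, not the author's estimator: it assumes a flat prior on $x$ inside the quantization bin (so $E[x \mid y, Q(x)]$ becomes the mean of a Gaussian centered at $y$ and truncated to the bin) and a plug-in variance update. All names and the floor on the variance are assumptions.

```python
import math


def _pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def _cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def quantizer_bin(x, step=8.0):
    """Interval [lo, hi) of a uniform quantizer with the given step size."""
    lo = math.floor(x / step) * step
    return (lo, lo + step)


def estimate_x(y, lo, hi, var_z):
    """E[x | y, lo <= x < hi] for y = x + z, z ~ N(0, var_z), flat prior on x."""
    sigma = math.sqrt(var_z)
    a, b = (lo - y) / sigma, (hi - y) / sigma
    mass = _cdf(b) - _cdf(a)
    if mass < 1e-12:
        # y lies far outside the bin: fall back to the nearest bin edge.
        return min(max(y, lo), hi)
    return y + sigma * (_pdf(a) - _pdf(b)) / mass


def joint_estimate(ys, bins, var_z_init=1.0, iterations=10):
    """Alternate between updating x estimates and the noise variance."""
    var_z = var_z_init
    xs = list(ys)
    for _ in range(iterations):
        xs = [estimate_x(y, lo, hi, var_z) for y, (lo, hi) in zip(ys, bins)]
        # Plug-in update (heuristic): residual energy between SI and estimate.
        residual = sum((y - x) ** 2 for y, x in zip(ys, xs)) / len(ys)
        var_z = max(residual, 1e-6)
    return xs, var_z


# Uncoded coefficients carry no CI: use an unbounded bin, which returns y itself.
NO_CI = (-math.inf, math.inf)
ys = [10.0, 3.0, -5.0]
bins = [quantizer_bin(9.0), NO_CI, quantizer_bin(-6.0)]
xs, var_z = joint_estimate(ys, bins)
```

With an unbounded bin the estimate reduces to the side information alone, which mirrors the "SI alone" versus "SI+CI" comparison on the next slide.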
Justification of Distortion Reduction
[Figure: foreman-qcif, block size 16×16, 18.3% of blocks are coded; SI alone vs. SI+CI.]
Coding Experiments Setup
Parameter setting
– Block size: 16×16, WT: Daubechies’ 9-7, Slepian-Wolf lossless encoder: LDPC-based¹, uniform quantizer (∆=8)
– Rate control: $th_x$, $th_z$ - significance thresholds for x and z respectively
– SI generation: Implicit MC vs. Explicit MC
Benchmark: H.264 JM11.0 implementation (QP of I frames is small and fixed)
¹ Liveris, A.D.; Zixiang Xiong; Georghiades, C.N., "Compression of binary sources with side information at the decoder using LDPC codes," IEEE Communications Letters, vol. 6, no. 10, pp. 440-442, Oct. 2002.
Comparison of Temporal Interpolation
[Figure: Foreman-qcif, ad-hoc fusion by simple averaging; Implicit MC¹ vs. Explicit MC².]
¹ X. Li, “Video processing via implicit and mixture motion model,” IEEE Trans. on Cir. Sys. for Video Tech., vol. 17, no. 8, pp. 953-963, Aug. 2007.
² Tourapis, A.M.; Hye-Yeon Cheong; Liou, M.L.; Au, O.C., "Temporal interpolation of video sequences using zonal based algorithms," Proc. of ICIP, vol. 3, pp. 895-898, 2001.
R-D Performance Comparison (I)
[Figure panels: Foreman-qcif, 30 frames; Hall-qcif, 30 frames.]
R-D Performance Comparison (II)
[Figure panels: Container-qcif, 30 frames; Football-qcif, 30 frames.]
Dualities between Conventional and WZ Video Coding
Exploitation of motion-related temporal dependency
– In traditional video coding, prediction is based on original frames (overhead is involved)
– In WZ video coding, interpolation is based on reconstructed key frames (no overhead)
– Importance of SI generation¹
R-D optimization shifted from encoder to decoder
– In traditional video coding, decoder is often fixed but encoder enjoys considerable flexibility
– In WZ video coding, rate control through the feedback channel offers great flexibility to the decoder without touching encoder²
– Importance of matching SW lossless encoder with the statistics of virtual correlation channel (UEP is desirable)
¹ L. Lu, D. He, A. Jagmohan, “Robust Multi-Frame Side Information Generation for Distributed Video Coding,” Proc. of ICIP, 2007.
² Girod, B.; Aaron, A.M.; Rane, S.; Rebollo-Monedero, D., "Distributed Video Coding," Proceedings of the IEEE, vol. 93, no. 1, pp. 71-83, Jan. 2005.
Acknowledgement
Ligang Lu and Dake He for inviting me to participate in this special session
Zixiang Xiong for sharing with me his students’ implementation of the LDPC-based Slepian-Wolf coding algorithm
E. Simoncelli for stimulating discussions on distributed motion representations