A WYNER-ZIV TO H.264 VIDEO TRANSCODER
José Luis Martínez, Pedro Cuenca, Gerardo Fernández-Escribano, Francisco José Quiles and Hari Kalva
February 27, 2009
Barcelona, Spain
RTCTCM 2009 2
OVERVIEW
1. Introduction/Motivation
2. H.264 and WZ Video Coding Paradigms
3. Proposed WZ/H.264 Video Transcoder
4. Results
5. Conclusion
Introduction/Motivation (I)
[Figure: communication tower (Comm. Tower) with a transcoder]
• Requirements:
  • Low complexity devices: low complexity encoding / decoding algorithm
  • Battery consumption
  • Low cost
• Solution adopted:
  • Traditional video coding with low complexity tools
  • Low complexity profiles (baseline, etc.)
  • Rate-Distortion loss
Introduction/Motivation (II)
• What is video transcoding?
  • The process of converting an encoded video sequence from format A to format B. The conversion can affect one or more of the coding parameters, such as frame rate, bitrate, resolution, quality, error resilience, etc.
• When is it necessary?
  • When video capabilities and resources are mismatched between the sender and the receiver
    – Resource mismatch, e.g. resolution (the sender has CIF and the receiver requires QCIF), bandwidth, computing resources, battery life, etc.
    – Format mismatch, e.g. the sender has MPEG-2 video and the receiver requires H.264 video
  • Then we need transcoding
• How to transcode?
  – Decode video A and then encode to video B
• How to transcode efficiently?
  – Complexity reduction is the key problem
  – Manage quality vs. complexity tradeoffs
[Figure: Video Transcoder]
H.264 Video Coding Paradigm
• Why is H.264 becoming popular?
• The H.264/AVC standard achieves much higher coding efficiency than the H.263, MPEG-2 and MPEG-4 standards, thanks to its improved inter and intra prediction modes, at the expense of higher computational complexity. Therefore, H.264/AVC is a strong candidate for a wide range of applications in the near future.
• Reducing the complexity of motion estimation (ME) and motion compensation (MC) in H.264-based transcoders is the key to developing fast real-time systems.
H.264 features:
  Profiles: Baseline, Main, Extended and FRExt
  Motion compensation block size: from 16x16 down to 4x4 (frame/field)
  Motion vector accuracy: quarter-pixel
  Transform: 4x4 integer approximation of the DCT
  Reference frames: up to 16
  Built-in deblocking filter: yes
  Intra prediction: yes
WZ Video Coding Paradigm
• Low-complexity encoding, at the cost of more complex decoding
• Side information: an estimate of X′i available at the decoder.
• What do we use to correct errors in communications?
  – Error correction codes
  – Coset codes, Turbo codes, LDPC, …
  – Turbo Trellis Coded Modulation (TTCM)
Emerging Challenges
• Applications (from down-link to up-link)
  • Wireless digital video cameras
  • Multimedia mobile phones and PDAs
  • Low-power video sensors and surveillance cameras
  • Wireless video teleconferencing systems
• Requirements
  • Light and flexible distribution of codec complexity
  • Robustness to packet/frame losses
  • High compression efficiency
  • Low latency
• Target
  • Inter coding efficiency
  • Intra coding complexity (encoder)
  • Intra coding robustness
Proposed WZ/H.264 Transcoder
[Figure: block diagram of the transcoder, a Wyner-Ziv decoder followed by an H.264 encoder. Wyner-Ziv decoder labels: key frames bitstream (H.264 I frames, regular coded bitstream), regular intra-frame decoder (X′2i-1, X′2i+1), side information generation (Y2i, MVs), DCT, TTCM decoder fed by the parity bit stream (q′2i), reconstruction, IDCT (X′2i), buffer. H.264 encoder labels: Fn (current), Fn-1 (reference), ME, MC, intra prediction, inter/intra selection (P), residual Dn, T, Q, X, reorder, entropy encode, NAL, and the reconstruction path Q⁻¹, T⁻¹, D′n, uF′n, deblocking filter, F′n (reconstructed).]
1st Step: Reducing the Motion Estimation
Exhaustive Search
• Block size = M x N
• Search area = (M + 2dmax) x (N + 2dmax)
• Number of search points = (2dmax + 1)²
DMW Search
• Side information motion vector: (MVx, MVy)
• Search area: x² + y² ≤ MVx² + MVy², where x and y are the coordinates of the candidate points
• Number of search points: the points inside the circumference
[Figure: an M x N block inside its (M + 2dmax) x (N + 2dmax) exhaustive search window, next to the circular DMW search area defined by the side information motion vector]
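The difference between the two search strategies can be made concrete by counting candidate points. The following is a minimal sketch, not the authors' implementation: it assumes integer-pel displacements and a circular area centred on the zero displacement, and the function names are hypothetical.

```python
import math


def exhaustive_search_points(d_max):
    """Candidate displacements of a full search: a (2*d_max + 1)^2 square."""
    return [(x, y)
            for y in range(-d_max, d_max + 1)
            for x in range(-d_max, d_max + 1)]


def dmw_search_points(mv_x, mv_y):
    """Integer displacements with x^2 + y^2 <= mv_x^2 + mv_y^2.

    (mv_x, mv_y) is the side information motion vector; centring the
    circle on the zero displacement is an assumption of this sketch.
    """
    radius_sq = mv_x * mv_x + mv_y * mv_y
    r = math.isqrt(radius_sq)  # largest integer offset that can qualify
    return [(x, y)
            for y in range(-r, r + 1)
            for x in range(-r, r + 1)
            if x * x + y * y <= radius_sq]


print(len(exhaustive_search_points(16)))  # 1089 points for d_max = 16
print(len(dmw_search_points(3, 4)))       # 81 points for MV = (3, 4)
```

With a side information vector of (3, 4) the reduced window holds 81 candidates instead of 1089, and a zero vector leaves a single candidate.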
2nd Step: Reducing the Inter Prediction
• Inter-frame coding in H.264 supports seven block partition modes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4)
• H.264 adopts spatial-domain intra prediction for block sizes 16x16 and 4x4, with four and nine prediction modes respectively.
• Motion estimation is carried out for each of these partitions
Background
• What is data mining?
  • Algorithms and techniques that allow computers to learn
• A decision tree is built by mapping the observations about a set of data onto a tree made of arcs and nodes
• The goal is to reduce the macroblock (MB) mode search in H.264 to a classification problem
• The decision trees use information from the incoming WZ video
  • Sum of Absolute Differences (SAD) values, motion vector (MV) lengths and the amount of pixels that have to be reconstructed
• The coding mode of the corresponding MBs in H.264 is also saved for training purposes
[Figure: training setup. Labels: Video Sequence, WZ Decoder, H.264 Encoder, Pixel data, "SAD, MVs length and the amount of pixels", H.264 MB Coding Modes, Data Mining Tools, Decision Trees]
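As an illustration of the features listed on this slide, the sketch below computes a SAD value and a motion vector length for one block. It is a sketch under assumptions rather than the transcoder's code: the slides do not say which two blocks the SAD compares (here, a block and its motion-compensated match), the pixel count is taken as an input because its exact definition is not given, and all names are hypothetical.

```python
import math


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def mv_length(mv_x, mv_y):
    """Euclidean length of a motion vector."""
    return math.hypot(mv_x, mv_y)


def mb_features(block, matched_block, mv, reconstructed_pixels):
    """Feature record for one macroblock (field names are hypothetical)."""
    return {
        "sad": sad(block, matched_block),
        "mv_length": mv_length(*mv),
        "reconstructed_pixels": reconstructed_pixels,
    }


# Toy 2x2 blocks keep the arithmetic easy to follow.
features = mb_features([[10, 20], [30, 40]], [[12, 18], [30, 45]], (3, 4), 2)
print(features)  # {'sad': 9, 'mv_length': 5.0, 'reconstructed_pixels': 2}
```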
Background
• The decision tree for MB mode classification was built using WEKA
• WEKA is a collection of machine learning algorithms for data mining tasks. The algorithms can be applied directly to a dataset
• It is open source software
• ARFF files are used to prepare the data sets for training
  @RELATION 4x4_mean_variance
  @ATTRIBUTE mean0 NUMERIC
  @ATTRIBUTE variance0 NUMERIC
  @ATTRIBUTE mean1 NUMERIC
  @ATTRIBUTE variance1 NUMERIC
  ..............................
  @ATTRIBUTE mean15 NUMERIC
  @ATTRIBUTE variance15 NUMERIC
  @ATTRIBUTE h263_CBPC_0 {0,1}
  @ATTRIBUTE h263_CBPC_1 {0,1}
  @ATTRIBUTE h263_CBPC_2 {0,1}
  @ATTRIBUTE h263_CBPC_3 {0,1}
  @ATTRIBUTE h263_mode {0,3}
  @ATTRIBUTE class {1,8,9}
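The ARFF example above lists mean and variance attributes. A training file for the WZ features named earlier in the talk (SAD, MV length, amount of pixels) could be produced along the following lines. This is a hedged sketch: the relation name, attribute names and class labels are hypothetical, and only the ARFF directives themselves (@RELATION, @ATTRIBUTE, @DATA) are part of the format.

```python
def build_arff(relation, rows):
    """Return ARFF text for rows of (sad, mv_length, pixels, label).

    Attribute names and the {low,high} class labels are illustrative
    assumptions, not taken from the slides.
    """
    lines = [
        f"@RELATION {relation}",
        "@ATTRIBUTE sad NUMERIC",
        "@ATTRIBUTE mv_length NUMERIC",
        "@ATTRIBUTE reconstructed_pixels NUMERIC",
        "@ATTRIBUTE class {low,high}",
        "@DATA",
    ]
    for sad_value, mv_len, pixels, label in rows:
        # One comma-separated instance per line, in attribute order.
        lines.append(f"{sad_value},{mv_len},{pixels},{label}")
    return "\n".join(lines) + "\n"


arff_text = build_arff("wz_mb_modes", [(120, 2.5, 64, "low"),
                                       (5300, 7.0, 256, "high")])
print(arff_text)
```

Each training instance pairs the WZ features of one macroblock with the class derived from the mode that the H.264 encoder selected for it.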
Background
[Figure: hierarchy of WEKA decision trees (nodes 1 to 7), built up in four steps. Labels: WZ Information, WEKA Tree, H.264 freedom.]
1. The modes are split into low complexity modes {SKIP, 16x16, 16x8, 8x16} and high complexity modes {INTRA, 8x8, 8x4, 4x8, 4x4}
2. The low complexity modes are split into {SKIP, 16x16} and {16x8, 8x16}
3. The high complexity modes are split into {8x8, 8x4, 4x8} and {INTRA, 4x4}
4. {8x8, 8x4, 4x8} is split into {8x8} and {8x4, 4x8}
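The hierarchy on this slide can be read as a cascade of binary classifiers that narrows the set of modes the H.264 encoder still has to evaluate, which is presumably what the "H.264 freedom" label refers to. The sketch below follows the mode groups shown on the slide; the key names are hypothetical, and the thresholds are invented stand-ins for the trained WEKA trees, which are not given in the slides.

```python
def candidate_modes(features, trees):
    """Walk the classifier hierarchy and return the remaining mode set.

    `trees` maps a hypothetical node name to a callable that takes the
    WZ features of a macroblock and returns True or False.
    """
    if trees["low_complexity"](features):
        if trees["skip_or_16x16"](features):
            return ("SKIP", "16x16")
        return ("16x8", "8x16")
    if trees["sub_8x8"](features):
        if trees["is_8x8"](features):
            return ("8x8",)
        return ("8x4", "4x8")
    return ("INTRA", "4x4")


# Toy threshold rules standing in for trained trees (values are invented).
TOY_TREES = {
    "low_complexity": lambda f: f["sad"] < 2000,
    "skip_or_16x16": lambda f: f["mv_length"] < 1.0,
    "sub_8x8": lambda f: f["sad"] < 6000,
    "is_8x8": lambda f: f["mv_length"] < 4.0,
}

print(candidate_modes({"sad": 500, "mv_length": 0.0}, TOY_TREES))
# ('SKIP', '16x16')
print(candidate_modes({"sad": 9000, "mv_length": 6.0}, TOY_TREES))
# ('INTRA', '4x4')
```

The encoder would then run its usual mode decision over the returned tuple only, instead of over every mode.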
Proposed Transcoder
1. Reduced motion estimation based on dynamic search windows
2. Reduced inter prediction based on the decision tree algorithm
RESULTS
• Training sequence: 10 QCIF frames of Flower Garden
• Tested sequences: Akiyo, Coastguard, Container, Hall Monitor and Mother
• Simulation conditions
  • The transcoder was implemented using the H.264/AVC reference software (JM 13.2)
  • QP values 28, 32, 36 and 40 were used for testing
  • The IWZ GOP format was transcoded as IP
  • The approach is compared with a cascade WZ to H.264 transcoder
RESULTS
• RD differences
Sequence      Format  ∆PSNR (dB)  ∆Bitrate (%)  ∆Time (%)
Akiyo         QCIF    -0.004      0.10          95.23
Coastguard    QCIF    -0.036      0.95          94.99
Container     QCIF    -0.001      0.03          96.00
Hall Monitor  QCIF    -0.006      0.14          95.43
Mother        QCIF    -0.005      0.14          95.32
Mean          QCIF    -0.011      0.27          95.39
G. Bjontegaard, "Calculation of average PSNR differences between RD-curves", document VCEG-M33, 13th VCEG Meeting, Austin, TX, April 2001
Conclusions
• We propose a WZ to H.264 video transcoder as a solution for mobile-to-mobile video communications.
• The transcoder relies on data mining techniques to exploit the correlation between the WZ coding information and the H.264 mode decisions.
• The motion estimation done in H.264 can be reduced by using the incoming side information motion vectors.
• The proposed solution removes the need for complex WZ decoders and H.264 encoders in the end-user devices.
• These first results show RD performance extremely close to that of the cascade transcoder, with a complexity reduction of about 95%.
Collaborations
• International
  • Florida Atlantic University, Florida, USA
    • Dr. Hari Kalva
  • University of Surrey, Centre for Communication Systems Research (CCSR), United Kingdom
    • Dr. W.A.C. Fernando
• National
  • Universidad Miguel Hernández, Elche
    • Dr. Manuel Pérez Malumbres