Analysis, Fast Algorithm, and VLSI Architecture Design for H.264/AVC Intra Frame Coder
Yu-Wen Huang, Bing-Yu Hsieh, Tung-Chien Chen, and Liang-Gee Chen
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 2005
Introduction
[Figure: H.264/AVC encoder block diagram. Blocks: Input Video Signal, Split into Macroblocks (16×16 pixels), Coder Control, Transform/Scaling/Quantization, Scaling & Inverse Transform, De-blocking Filter, Intra-frame Prediction, Motion Compensation, Motion Estimation, Entropy Coding, Decoder, Output Video Signal. Signals: Control Data, Quantized Transform Coefficients, Motion Data, Intra/Inter switch. Annotation: Multiple Reference Frames & Variable Block Sizes.]
Introduction
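Source → Prediction → Transform → Quantization → Entropy Coding → Compressed Data
  Prediction: 4×4 / 16×16 luma, 8×8 chroma
  Transform: 4×4 DCT
  Quantization: scalar, nonuniform
  Entropy coding: CAVLC, CABAC
[Figure residue: axis label "(Bit per pixel)"; region labels "lossy" and "lossless"]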
Introduction
H.264/AVC I-Frame Coder (CAVLC) vs. JPEG 2000 (DWT 5/3)
  Computational complexity
  Block-based coding vs. frame-based coding (DWT 5/3)
  Hardware-friendly vs. memory-wasting
Introduction
Comparison between different image coding standards
[Figure: decoded images at 0.225 bpp for JPEG, JPEG 2000 (DWT 5/3), and H.264 I-Frame (CAVLC)]
Introduction
Two solutions for platform-based design of an H.264/AVC intra frame coder
  Fast algorithm for software implementation
    Reduces complexity by 45%
    PSNR drop: 0.3 dB
  Hardware accelerator
    Max. clock rate: 55 MHz
    31 fps for 4:2:0 SDTV (all intra frames)
H.264/AVC Intra Coding
Intra Prediction
  I4MB (4×4): directional modes 0, 1, 3, 4, 5, 6, 7, 8 + DC
  I16MB (16×16): directional modes 0, 1 + DC + Plane
[Figure: prediction directions around the current block]
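To make the mode list above concrete, here is a minimal sketch of three of the nine I4MB predictors. It assumes the standard H.264 numbering (0 = vertical, 1 = horizontal, 2 = DC) and that all eight neighboring pixels are available; the function name `predict_4x4` and the list-based block layout are illustrative choices, not taken from the slides.

```python
def predict_4x4(top, left, mode):
    """Return a 4x4 predictor block (list of rows) for I4MB modes 0, 1, 2.

    top  : the 4 reconstructed pixels above the block (assumed available)
    left : the 4 reconstructed pixels to the left of the block (assumed available)
    """
    if mode == 0:   # vertical: copy the row above down every column
        return [list(top) for _ in range(4)]
    if mode == 1:   # horizontal: copy the left column across every row
        return [[left[y]] * 4 for y in range(4)]
    if mode == 2:   # DC: rounded mean of the eight neighbors
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0-2 are sketched here")


if __name__ == "__main__":
    for row in predict_4x4([10, 20, 30, 40], [1, 2, 3, 4], 2):
        print(row)
```

The six diagonal modes follow the same pattern but filter the neighbors along the indicated direction, so they are omitted from this sketch.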
H.264/AVC Intra Coding
Mode Decision
  Low complexity mode
    Distortion: SATD (original pels − predictors)
    Rate: bits of mode information
  High complexity mode
    Distortion: MSE (original pels − reconstructed pels)
    Rate: mode information + residual
$$J_{MODE}(MB_k, Mode_k \mid QP, \lambda_{MODE}) = Distortion(MB_k, Mode_k \mid QP) + \lambda_{MODE} \cdot Rate(MB_k, Mode_k \mid QP)$$
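The mode decision above amounts to picking the candidate with the smallest Lagrangian cost, distortion plus lambda times rate. The sketch below shows only that selection step; the function name `best_mode`, the dictionary layout, and the sample numbers are hypothetical, and how distortion and rate are measured (SATD or MSE, mode bits with or without residual bits) is left to the caller.

```python
def best_mode(candidates, lam):
    """Pick the mode with the smallest cost J = distortion + lam * rate.

    candidates : dict mapping mode -> (distortion, rate_in_bits)
    lam        : Lagrange multiplier (lambda_MODE); its dependence on QP is not modeled here
    Returns (mode, cost).
    """
    costs = {mode: d + lam * r for mode, (d, r) in candidates.items()}
    mode = min(costs, key=costs.get)
    return mode, costs[mode]


if __name__ == "__main__":
    # Hypothetical measurements for three candidate modes.
    candidates = {0: (120, 1), 1: (100, 4), 2: (95, 4)}
    print(best_mode(candidates, lam=10.0))   # rate matters: the cheap-to-signal mode wins
    print(best_mode(candidates, lam=0.0))    # distortion only
```

A larger lambda favors modes that are cheap to signal, which is why the same candidate set can yield different winners at different QP values.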
H.264/AVC Intra Coding
Transform and Quantization
  4×4 integer transform
    DCT-based integer transform
    Hadamard transform
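The slides name the 4×4 DCT-based integer transform without showing it. As a minimal sketch, the forward core transform of H.264/AVC can be written as Y = Cf · X · Cfᵀ with the integer matrix below; the per-coefficient scaling that the standard folds into quantization is deliberately left out, and the helper names are illustrative.

```python
# Core matrix of the H.264/AVC 4x4 forward integer transform.
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(a, b):
    """Multiply two 4x4 integer matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_core_transform(x):
    """Return CF * x * CF^T for a 4x4 residual block x (no scaling applied)."""
    return matmul(matmul(CF, x), transpose(CF))


if __name__ == "__main__":
    flat = [[1] * 4 for _ in range(4)]
    for row in forward_core_transform(flat):
        print(row)   # a flat block puts all its energy in the DC coefficient
```

Because every entry is ±1 or ±2, the transform needs only additions and shifts, which is part of what makes it attractive for hardware.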
H.264/AVC Intra Coding
Entropy Coding
  Context-Based Adaptive Binary Arithmetic Coding (CABAC)
  Context-Based Adaptive Variable Length Coding (CAVLC)
Computation Reduction
Fast Intra Prediction
  The smaller the mode number, the more likely the mode is to occur.
  Global statistics cannot reflect the correlation of local modes.
  Local statistics of neighboring blocks are applied instead.
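One way to picture "local statistics of neighboring blocks" is a short candidate list seeded by the modes of the left and upper 4×4 blocks. The ordering rule below (neighbor modes first, then DC, then remaining modes by ascending number) is a hypothetical illustration and is not the selection table used by the authors; `candidate_modes` and `DC_MODE` are names made up for this sketch.

```python
DC_MODE = 2  # standard H.264 numbering for the I4MB DC mode


def candidate_modes(left_mode, up_mode, n):
    """Return up to n I4MB modes to search for the current block.

    left_mode, up_mode : best modes of the neighboring blocks, or None if unavailable
    n                  : number of modes to search (9 means full search)
    The priority order is an assumption made for illustration only.
    """
    ordered = []
    for m in (left_mode, up_mode, DC_MODE, *range(9)):
        if m is not None and m not in ordered:
            ordered.append(m)
    return ordered[:n]


if __name__ == "__main__":
    print(candidate_modes(5, 5, 4))
    print(candidate_modes(None, 7, 2))
```

Searching fewer candidates saves the predictor generation and cost evaluation of the skipped modes, at the risk of missing the true best mode.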
Computation Reduction
[Figure: rate-distortion curves under different numbers of local-searched I4MB modes, without insertion of full-search blocks. Legend: 1, 2, 4, 6, All, DC modes]
Computation Reduction
Fast Intra Prediction
  Prevention of error propagation
    Periodic insertion of full-search 4×4 blocks
    Adaptive threshold on the distortion for an MB
      If min SATD of P > THMinSATD, then search all modes.
      THMinSATD = (min SATD of F) × 2.0
[Figure: layout of full-search (F) and partial-search (P) 4×4 blocks]
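The adaptive threshold can be sketched as a single comparison. The sketch assumes the threshold is the minimum SATD of the full-search block F scaled by 2.0, which is one reading of the slide; the function name and the sample values are hypothetical.

```python
def needs_full_search(min_satd_partial, min_satd_full, factor=2.0):
    """Decide whether a partial-search block P should fall back to full search.

    min_satd_partial : best SATD found for P among its reduced candidate set
    min_satd_full    : best SATD of the reference full-search block F
    factor           : scaling of the threshold (2.0 assumed from the slide)
    """
    th_min_satd = min_satd_full * factor
    return min_satd_partial > th_min_satd


if __name__ == "__main__":
    print(needs_full_search(min_satd_partial=90, min_satd_full=50))    # within threshold
    print(needs_full_search(min_satd_partial=130, min_satd_full=50))   # exceeds threshold
```

The comparison costs almost nothing, yet it catches the blocks where the reduced candidate set has clearly missed a good predictor.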
Computation Reduction
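Subsampling Patterns
Full pattern (all 16 residues):
$$
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\begin{bmatrix} r_{00} & r_{01} & r_{02} & r_{03} \\ r_{10} & r_{11} & r_{12} & r_{13} \\ r_{20} & r_{21} & r_{22} & r_{23} \\ r_{30} & r_{31} & r_{32} & r_{33} \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
$$
Subsampled pattern (8 residues, each used twice):
$$
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\begin{bmatrix} r_{00} & r_{00} & r_{02} & r_{02} \\ r_{11} & r_{11} & r_{13} & r_{13} \\ r_{20} & r_{20} & r_{22} & r_{22} \\ r_{31} & r_{31} & r_{33} & r_{33} \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
$$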
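A minimal sketch of SATD on a subsampled residual follows. It assumes the 4×4 Hadamard matrix commonly used for SATD and a pattern that keeps columns 0 and 2 on even rows and columns 1 and 3 on odd rows, duplicating each kept residue into its right-hand neighbor; the normalization some encoders apply to SATD is omitted, and the helper names are illustrative.

```python
# 4x4 Hadamard matrix (symmetric), assumed here as the SATD transform.
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]


def matmul(a, b):
    """Multiply two 4x4 integer matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd(r):
    """Sum of absolute values of H4 * r * H4 (no normalization)."""
    t = matmul(matmul(H4, r), H4)
    return sum(abs(v) for row in t for v in row)


def subsample(r):
    """Keep 8 of the 16 residues and replicate each into the skipped position."""
    out = []
    for y in range(4):
        first = y % 2                      # even rows keep columns 0, 2; odd rows keep 1, 3
        a, b = r[y][first], r[y][first + 2]
        out.append([a, a, b, b])
    return out


if __name__ == "__main__":
    residual = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
    print(subsample(residual))
    print(satd(residual), satd(subsample(residual)))
```

Only the eight kept positions need a predictor and a subtraction, which is where the computation is saved; the transform size itself is unchanged.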
Computation Reduction
Saved Computation and PSNR Drop
  Global: subsampling + partial search using global statistics
  Local: subsampling + partial search
  Proposed: subsampling + partial search + periodic insertion of full search + adaptive SATD threshold
  PSNR drop < 0.3 dB
Hardware Architecture
Assumptions
  A RISC can execute one instruction per cycle, except multiplication, which requires two.
  A processing element (PE) can generate the predictors of one pixel per cycle.
Hardware Architecture
Solutions
[Table: requirements for 30 fps. Surviving labels: # of modes, avg. cycles per predictor, luma, chroma, "produce all modes per cycle", "produce one mode per cycle". The numeric entries were not recovered.]
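The cycle budget behind this comparison can be estimated with a few lines of arithmetic. The sketch assumes a 720×480 SDTV frame (the slides do not state the resolution), the standard mode counts (9 I4MB, 4 I16MB, 4 chroma), one predictor pixel per PE per cycle as stated in the assumptions, and it ignores every other pipeline stage; all function names are illustrative.

```python
def cycles_per_mb(clock_hz, fps, width, height):
    """Clock cycles available per 16x16 macroblock at the given frame rate."""
    mbs_per_frame = (width // 16) * (height // 16)
    return clock_hz // (fps * mbs_per_frame)


def predictor_pixels_per_mb():
    """Predictor pixels needed per macroblock when every mode is generated."""
    luma_i4mb = 9 * 256        # nine 4x4 modes over 256 luma pixels
    luma_i16mb = 4 * 256       # four 16x16 modes over 256 luma pixels
    chroma = 4 * (64 + 64)     # four modes over two 8x8 chroma blocks (4:2:0)
    return luma_i4mb + luma_i16mb + chroma


def generation_cycles(num_pe):
    """Cycles to generate all predictors with num_pe PEs, one pixel per PE per cycle."""
    pixels = predictor_pixels_per_mb()
    return -(-pixels // num_pe)   # ceiling division


if __name__ == "__main__":
    budget = cycles_per_mb(55_000_000, 30, 720, 480)   # resolution is an assumption
    print("budget per MB:", budget)
    for n in (1, 2, 4):
        print(n, "PE(s):", generation_cycles(n), "cycles")
```

Under these assumptions a single PE needs more cycles than the per-macroblock budget, while four PEs fit, which is consistent with the four-parallel generator described in the slides.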
Hardware Architecture
Four-Parallel Reconfigurable Intra Prediction Generator
[Figure: PE datapath built from 8-bit and 9-bit adders]
Hardware Architecture
I16MB DC Prediction Mode (T = top neighbors, L = left neighbors)
           PE0             PE1             PE2              PE3
Cycle 1:   T0+T4+T8+T12    T1+T5+T9+T13    T2+T6+T10+T14    T3+T7+T11+T15
Cycle 2:   +L0+L4+L8       +L1+L5+L9       +L2+L6+L10       +L3+L7+L11
Cycle 3:   +L12            +L13            +L14             +L15
Cycle 4:   PE0 + PE1 + PE2 + PE3
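The four-cycle schedule can be checked with a small simulation in which each PE accumulates every fourth top and left neighbor and the partial sums are merged in the last cycle. The final rounding, (sum + 16) >> 5, comes from the H.264 definition of I16MB DC prediction with both neighbors available and is not shown on the slide; the function names are illustrative.

```python
def i16mb_dc_four_pe(top, left):
    """Simulate the four-PE accumulation of the I16MB DC predictor.

    top, left : 16 reconstructed neighbor pixels each (both assumed available)
    """
    acc = [0, 0, 0, 0]
    for pe in range(4):
        acc[pe] += top[pe] + top[pe + 4] + top[pe + 8] + top[pe + 12]   # cycle 1
        acc[pe] += left[pe] + left[pe + 4] + left[pe + 8]               # cycle 2
        acc[pe] += left[pe + 12]                                        # cycle 3
    total = acc[0] + acc[1] + acc[2] + acc[3]                           # cycle 4
    return (total + 16) >> 5   # rounding per the standard, not from the slide


def i16mb_dc_reference(top, left):
    """Direct computation: rounded mean of all 32 neighbors."""
    return (sum(top) + sum(left) + 16) >> 5


if __name__ == "__main__":
    top = list(range(16))
    left = list(range(16, 32))
    print(i16mb_dc_four_pe(top, left), i16mb_dc_reference(top, left))
```

Splitting the 32 neighbors by index modulo 4 keeps every PE equally loaded, so the whole DC value is ready after four cycles.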
Hardware Architecture
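I16MB Plane Prediction Mode
$$\mathrm{Pred}[y, x] = \mathrm{Clip1}\big((a + b\,(x - 7) + c\,(y - 7) + 16) \gg 5\big)$$
$$a = 16\,(p[-1, 15] + p[15, -1])$$
$$b = (5H + 32) \gg 6, \qquad c = (5V + 32) \gg 6$$
$$H = \sum_{x'=0}^{7} (x' + 1)\,\big(p[-1, 8 + x'] - p[-1, 6 - x']\big)$$
$$V = \sum_{y'=0}^{7} (y' + 1)\,\big(p[8 + y', -1] - p[6 - y', -1]\big)$$
[Figure: A0, A1, A2, A3 correspond to Pred[0,0], Pred[0,4], Pred[0,8], Pred[0,12]]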
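The plane mode formulas can be exercised with a direct, unoptimized sketch. It follows the H.264 definition, including the +16 rounding term before the shift, and uses the p[row, column] indexing of the slide; the argument layout (`top`, `left`, `corner`) and the function names are choices made for this example, and 8-bit samples are assumed for the clipping range.

```python
def clip1(v):
    """Clip to the 8-bit sample range (8-bit video assumed)."""
    return max(0, min(255, v))


def i16mb_plane(top, left, corner):
    """Return the 16x16 plane predictor as a list of rows.

    top[x]  = p[-1, x] for x in 0..15
    left[y] = p[y, -1] for y in 0..15
    corner  = p[-1, -1]
    """
    def t(x):
        return corner if x < 0 else top[x]

    def l(y):
        return corner if y < 0 else left[y]

    h = sum((i + 1) * (t(8 + i) - t(6 - i)) for i in range(8))
    v = sum((i + 1) * (l(8 + i) - l(6 - i)) for i in range(8))
    a = 16 * (top[15] + left[15])
    b = (5 * h + 32) >> 6
    c = (5 * v + 32) >> 6
    return [[clip1((a + b * (x - 7) + c * (y - 7) + 16) >> 5) for x in range(16)]
            for y in range(16)]


if __name__ == "__main__":
    ramp_top = [100 + 2 * x for x in range(16)]      # horizontal gradient above the MB
    pred = i16mb_plane(ramp_top, [98] * 16, 98)
    print(pred[0])
```

In hardware the multiplications by (x − 7) and (y − 7) are typically replaced by incremental additions of b and c from a few seed values, so each PE needs only adders to walk along its part of the row.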