
Video Coding Concept

2014 Summer School on MPEG/VCEG Video

Outline

Introduction
Capture and representation of digital video
Fundamentals of video coding
Summary


Why do we need video compression?

Raw data rate of HDTV video
– 1920x1080p pels/frame, 3 colors/pel, 8 bits/color, 60 frames/s
  => 1920 × 1080 × 3 × 8 × 60 ≈ 3.0 Gbits/s
– 20 Mbits/s for HDTV channel bandwidth

The capabilities of transmission & storage are limited
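The arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the function name `raw_bitrate` is made up for illustration, and the derived compression ratio is not stated on the slide, it simply follows from dividing the two figures given there.

```python
def raw_bitrate(width, height, components, bits_per_component, fps):
    """Uncompressed video bit rate in bits per second."""
    return width * height * components * bits_per_component * fps

raw = raw_bitrate(1920, 1080, 3, 8, 60)  # figures from the slide
channel = 20e6                            # 20 Mbits/s HDTV channel (from the slide)
ratio = raw / channel                     # compression ratio needed to fit the channel

print(f"raw rate: {raw / 1e9:.2f} Gbits/s")
print(f"required compression ratio: roughly {ratio:.0f}:1")
```

Under these numbers the encoder has to shrink the raw stream by a factor of about 150 to fit the channel.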

Video/Image Coding Everywhere

1992: MPEG-1, block-based video coding (VCD)
1994: MPEG-2, block-based video coding with interlaced tools (DVD)
2003: MPEG-4 AVC/H.264
2007: Scalable Video Coding
2008: Multi-View Video Coding for 3DTV
2013: MPEG-H HEVC/H.265

Moore’s Law of Video Coding

>50%, 5 years

[Figure: comparison of MPEG-2, H.263, MPEG-4, and MPEG-4 AVC/H.264]

From: T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, G. Sullivan, “Rate-Constrained Coder Control and Comparison of Video Coding Standards”, IEEE Trans. CSVT, 2003


Human Vision

Any color can be created by combining Red, Green, and Blue in varying proportions
However, our visual system is less sensitive to color

Cone: perceives color tone
Rod: extracts only luminance

Spatial and Temporal Sampling

Progressive Sampling

CCD/CMOS Mosaic

Translation of color space

Represent a color image more efficiently by separating the luminance from the color
– Representing chrominance with a lower resolution

[Figure: R, G, B components converted to Y, Cb, Cr components]
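As a rough illustration of separating luminance from color, the sketch below converts one RGB pixel to Y, Cb, Cr. The slide does not say which conversion matrix is meant; the coefficients here are the full-range BT.601 ones used by JPEG/JFIF, chosen as an assumption, and `rgb_to_ycbcr` is a hypothetical helper name.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 conversion (an assumed choice; the slide names no matrix)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b                 # luminance
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b      # blue-difference chroma
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b      # red-difference chroma
    return y, cb, cr

for name, rgb in [("white", (255, 255, 255)), ("gray", (128, 128, 128)), ("red", (255, 0, 0))]:
    y, cb, cr = rgb_to_ycbcr(*rgb)
    print(f"{name:5s} Y={y:6.1f} Cb={cb:6.1f} Cr={cr:6.1f}")
```

Neutral colors (white, gray) land on Cb = Cr = 128, so all of their information sits in Y; only colored pixels use the chroma channels.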

Chroma Down-sampling

[Figure: chroma sampling patterns for 4:4:4, 4:2:2, 4:1:1, and 4:2:0]
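The effect of these formats on data volume can be sketched as follows. The table lists the average number of samples per pixel (one luma sample plus the shares of the two chroma planes). The 2x2 averaging in `subsample_420` is an assumption made for illustration; actual systems may use other filters and chroma sample positions.

```python
# Average samples per pixel: luma (1) plus two chroma planes at reduced resolution
SAMPLES_PER_PIXEL = {
    "4:4:4": 1 + 2 * 1.0,    # chroma at full resolution
    "4:2:2": 1 + 2 * 0.5,    # chroma halved horizontally
    "4:1:1": 1 + 2 * 0.25,   # chroma quartered horizontally
    "4:2:0": 1 + 2 * 0.25,   # chroma halved horizontally and vertically
}

def subsample_420(plane):
    """Average each 2x2 block of a chroma plane (even dimensions assumed)."""
    h, w = len(plane), len(plane[0])
    return [[(plane[r][c] + plane[r][c + 1] + plane[r + 1][c] + plane[r + 1][c + 1]) / 4
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]

cb = [[100, 102, 50, 50],
      [104, 106, 50, 50]]
small_cb = subsample_420(cb)
print(small_cb)
for fmt, spp in SAMPLES_PER_PIXEL.items():
    print(fmt, f"{8 * spp:.0f} bits/pixel at 8 bits per sample")
```

With 8-bit samples, 4:2:0 needs 12 bits per pixel instead of 24, a 2:1 saving before any actual coding takes place.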

Video Formats for Different Applications

Objective Quality Metric

Peak Signal-to-Noise Ratio
For objective video quality measurement, average per-frame PSNR is computed as a function of bitrates

$$\mathrm{PSNR} = 10 \log_{10} \frac{(2^{\mathrm{BitDepth}} - 1)^2}{\mathrm{MSE}}$$

[Figure: two example images, PSNR = 33.8 and PSNR = 34.9]
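A minimal sketch of the PSNR formula, applied to two flat lists of samples. The function name `psnr` and the sample values are illustrative assumptions; a real measurement would run over every pixel of every frame and average the per-frame results.

```python
import math

def psnr(reference, degraded, bit_depth=8):
    """PSNR in dB between two equally long sample sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, degraded)) / len(reference)
    if mse == 0:
        return math.inf                      # identical signals
    peak = (1 << bit_depth) - 1              # 2^BitDepth - 1, e.g. 255 for 8 bits
    return 10 * math.log10(peak ** 2 / mse)

ref = [52, 55, 61, 66]
deg = [54, 55, 60, 66]                       # made-up decoded values
print(f"PSNR = {psnr(ref, deg):.2f} dB")
```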

Subjective Measure

Mean Opinion Score
– Viewer's judgment of quality

Methods
– Double Stimulus Continuous Quality Scale (DSCQS)
– Double Stimulus Impairment Scale (DSIS)
– Single Stimulus Continuous Quality Evaluation (SSCQE)


How does compression work?

Compression is achieved by removing redundancy
– Temporal and spatial redundancy: temporally adjacent frames are often very similar, and so are neighboring pixels
– Statistical redundancy: some values occur more frequently than others

[Figure: Frame (t-1) and Frame (t), showing temporal correlation between frames and spatial correlation within a frame]

Encoder and Decoder (CODEC)

Video compression (video coding) is the process of compacting or condensing a digital video sequence into a small number of bits
Compression involves a complementary pair of systems, a compressor (encoder) and a de-compressor (decoder)

[Figure: Encoder → transmit or store → Decoder]

Video Coding System

Encoder: Analysis → Quantization (lossy) → Binary Encoding (lossless)
Decoder: Binary Decoding → De-quantization → Synthesis

Motion Estimation and Compensation

Goal: to remove temporal redundancy
– Divide the current picture into macroblocks (MB)
– Find for each MB the best match in the reference frame
– Transmit the offset (motion vector) and the residual block

Assumptions: 1) translational motion, 2) constant intensity

[Figure: an MB in the current frame matched to a displaced block in the reference frame]
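The three steps above can be sketched as an exhaustive block-matching search. The sum of absolute differences (SAD) as matching cost, the search range, and all function names are assumptions for illustration; the slide does not prescribe a particular cost or search strategy.

```python
def sad(cur, ref, cy, cx, ry, rx, n):
    """Sum of absolute differences between an n x n block in cur and one in ref."""
    return sum(abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
               for i in range(n) for j in range(n))

def full_search(cur, ref, cy, cx, n, search_range):
    """Return the motion vector (dy, dx) and residual block for the block at (cy, cx)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = cy + dy, cx + dx
            if 0 <= ry <= h - n and 0 <= rx <= w - n:   # candidate must lie inside the frame
                cost = sad(cur, ref, cy, cx, ry, rx, n)
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    _, dy, dx = best
    residual = [[cur[cy + i][cx + j] - ref[cy + dy + i][cx + dx + j] for j in range(n)]
                for i in range(n)]
    return (dy, dx), residual

def frame_with_object(top, left, size=8):
    """Black frame with a bright 2x2 object whose top-left corner is (top, left)."""
    frame = [[0] * size for _ in range(size)]
    for y in range(top, top + 2):
        for x in range(left, left + 2):
            frame[y][x] = 200
    return frame

ref = frame_with_object(2, 3)   # reference frame
cur = frame_with_object(3, 5)   # object moved 1 row down and 2 columns right
mv, residual = full_search(cur, ref, cy=2, cx=4, n=4, search_range=2)
print("motion vector:", mv)
```

Because the object only translates and keeps its intensity, both assumptions hold and the residual block is all zeros; only the motion vector needs to be sent.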

Motion Compensation Block Size (1/2)

Motion Compensation Block Size (2/2)

Remarks

The smaller the block size, the lower the residual energy

Theoretically, it is possible to estimate a motion vector for each pixel

However, the extra overhead for vectors may outweigh the benefits of temporal prediction

For a compromise between overhead and prediction efficiency, variable block-size motion compensation is often used

Variable Block-Size Motion Compensation

To adapt block size to picture characteristics

Iain Richardson, H.264 and MPEG-4 Video Compression, Wiley, 2003

Backward Prediction & Bi-Prediction

Some objects in Frame (t) may not appear in Frame (t-1), but can be found in Frame (t+1), due to occlusion
– Backward prediction from a future reference frame
The others may be found in both frames
– Bi-prediction from two reference frames

[Figure: Frame (t-1), Frame (t), Frame (t+1)]

Coding Order vs. Display Order

Coding order: the order in which pictures are processed
Display order: the order in which pictures are displayed

[Figure: six pictures at display positions 0 (A), 1 (B), 2 (C), 3 (D), 4 (E), 5 (F)]

Coding order: A, C, B, F, D, E
Display order: A, B, C, D, E, F (positions 0, 1, 2, 3, 4, 5)
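One way to picture the difference is a decoder-side buffer that holds decoded pictures until their display turn comes up. The sketch below uses the picture names and positions from the slide; the buffering logic and the name `reorder_for_display` are illustrative assumptions, not a description of any particular standard's output process.

```python
def reorder_for_display(coding_order, display_index):
    """Release pictures in display order as they arrive in coding order."""
    pending = {}      # decoded pictures waiting for their display turn
    next_out = 0      # display position expected next
    output = []
    for pic in coding_order:
        pending[display_index[pic]] = pic
        while next_out in pending:          # flush everything that is now displayable
            output.append(pending.pop(next_out))
            next_out += 1
    return output

display_index = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4, "F": 5}
coding_order = ["A", "C", "B", "F", "D", "E"]
displayed = reorder_for_display(coding_order, display_index)
print(displayed)
```

Picture C is decoded before B so that B can be predicted from it, but it has to wait in the buffer until B has been shown.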

Sub-Pixel Motion Compensation

In some cases, a better prediction signal may come from interpolated samples in the reference frame

[Figure: sampling positions of Frame (t-1) and Frame (t), offset by fractions of a pixel (labels 3/4 and 1/4)]
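A one-dimensional sketch of the idea: the reference row is interpolated at quarter-sample positions, and the search compares integer-only offsets with quarter-sample offsets. Linear interpolation is used here purely as an assumption to keep the example short; standardized codecs generally use longer interpolation filters. All names and sample values are made up.

```python
def sample_at(row, pos_quarters):
    """Value of the reference row at position pos_quarters / 4 (linear interpolation)."""
    i, frac = divmod(pos_quarters, 4)
    if frac == 0:
        return float(row[i])                              # integer sample position
    return ((4 - frac) * row[i] + frac * row[i + 1]) / 4

def sad_at(cur, ref_row, offset_quarters):
    """SAD between cur and the reference row shifted by offset_quarters / 4 samples."""
    return sum(abs(c - sample_at(ref_row, 4 * k + offset_quarters))
               for k, c in enumerate(cur))

ref_row = [40, 80, 120, 160, 200]
cur = [70, 110, 150]                                      # content shifted by 3/4 sample

best_int = min(range(0, 9, 4), key=lambda o: sad_at(cur, ref_row, o))
best_quarter = min(range(0, 9), key=lambda o: sad_at(cur, ref_row, o))
print("integer search: offset", best_int / 4, "SAD", sad_at(cur, ref_row, best_int))
print("quarter search: offset", best_quarter / 4, "SAD", sad_at(cur, ref_row, best_quarter))
```

The integer-only search leaves a residual, while the quarter-sample search finds an exact match at an offset of 3/4 of a sample.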

Spatial Predictive Coding

Goal: to remove spatial redundancy
Create a prediction signal from previously coded samples in the same frame, e.g., Intra Coding in H.264/AVC

From: Iain Richardson, H.264 and MPEG-4 Video Compression, Wiley, 2003
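A simplified sketch of spatial prediction for a 4x4 block, using three of the prediction directions found in H.264/AVC intra coding (vertical, horizontal, DC). Mode selection by smallest SAD and all names and sample values are assumptions for illustration; the standard defines more modes and encoders choose among them in various ways.

```python
def predict(mode, above, left):
    """Build an n x n prediction from the row above and the column to the left."""
    n = len(above)
    if mode == "vertical":                     # copy the row above downwards
        return [list(above) for _ in range(n)]
    if mode == "horizontal":                   # copy the left column across
        return [[left[r]] * n for r in range(n)]
    dc = (sum(above) + sum(left) + n) // (2 * n)   # rounded mean of the neighbors
    return [[dc] * n for _ in range(n)]

def best_mode(block, above, left):
    """Pick the mode whose prediction has the smallest SAD against the block."""
    def cost(mode):
        pred = predict(mode, above, left)
        return sum(abs(b - p) for brow, prow in zip(block, pred) for b, p in zip(brow, prow))
    return min(("vertical", "horizontal", "dc"), key=cost)

above = [10, 50, 90, 130]                      # previously coded samples above the block
left = [12, 12, 12, 12]                        # previously coded samples to the left
block = [[11, 50, 91, 130] for _ in range(4)]  # block with vertical structure

mode = best_mode(block, above, left)
pred = predict(mode, above, left)
residual = [[b - p for b, p in zip(brow, prow)] for brow, prow in zip(block, pred)]
print(mode, residual[0])
```

Only the chosen mode and the small residual need to be coded, rather than the original sample values.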

Transform

Goal:
– To remove spatial redundancy further
– To facilitate a scalar quantization scheme

Idea: projection onto selected basis functions

From: Y. Wang et al., Video Processing and Communications, Prentice Hall, 2002

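Choice of Basis Functions

Criteria:
– Minimize inter-dependency between coefficients
– Compact the energy of the input into few coefficients

Optimal choice depends on the signal covariance

$$\begin{bmatrix} x \\ y \end{bmatrix} = x \begin{bmatrix} 1 \\ 0 \end{bmatrix} + y \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

$$\begin{bmatrix} x \\ y \end{bmatrix} = x' \begin{bmatrix} 1 \\ 1 \end{bmatrix} + y' \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

[Figure: sample points plotted on the original axes (x, y) and on the rotated axes (x', y')]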
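The second decomposition can be tried numerically. Solving (x, y) = x'(1, 1) + y'(1, -1) gives x' = (x + y)/2 and y' = (x - y)/2. The sample points below are made up to mimic strongly correlated neighbors, and the function names are illustrative assumptions.

```python
def to_rotated(x, y):
    """Coefficients (x', y') such that (x, y) = x'*(1, 1) + y'*(1, -1)."""
    return (x + y) / 2, (x - y) / 2

def from_rotated(xp, yp):
    """Inverse mapping back to (x, y)."""
    return xp + yp, xp - yp

def second_coeff_share(pairs):
    """Fraction of the total energy carried by the second coefficient."""
    first = sum(a * a for a, _ in pairs)
    second = sum(b * b for _, b in pairs)
    return second / (first + second)

samples = [(10, 11), (20, 19), (-15, -14), (-8, -9), (30, 31)]   # x and y nearly equal
rotated = [to_rotated(x, y) for x, y in samples]

print(f"energy share of y : {second_coeff_share(samples):.3f}")
print(f"energy share of y': {second_coeff_share(rotated):.4f}")
```

In the original basis both coefficients carry about half the energy; in the rotated basis almost everything sits in x', so y' can be coded with very few bits.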

2-D Discrete Cosine Transform (DCT)

DCT is widely used for video/image coding

Zigzag Order
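A compact sketch of a separable 2-D DCT and a zigzag scan. The orthonormal DCT-II scaling is an assumed normalization (standards use their own integer approximations), and the function names are made up for illustration.

```python
import math

def dct_1d(v):
    """Orthonormal DCT-II of a sequence."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n)))
    return out

def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]          # transpose back to [v_freq][h_freq]

def zigzag_order(n):
    """(row, col) positions along anti-diagonals, alternating direction."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

flat = [[10] * 4 for _ in range(4)]               # a perfectly smooth block
coeffs = dct_2d(flat)
scan = [coeffs[r][c] for r, c in zigzag_order(4)]
print([round(v, 6) for v in scan])
```

For the smooth block all energy ends up in the first (DC) coefficient, and the zigzag scan places the low-frequency coefficients at the front of the list.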

Encoding DCT Coefficients

Observations:
– Non-zero coefficients are mostly low-frequency ones
– High-frequency coefficients tend to be zero after quantization

Example: Run-Length Coding
– Run: the number of zeros since the last non-zero coefficient
– Length: the non-zero value
– DCT Coefficients: [5, 0, 0, 2, 3, 0, 0, 4, 0, 0, 0, 1, 0, 0, …, 0]
– Coding Symbols: [5, (2, 2), (0, 3), (2, 4), (3, 1), EOB]
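The run-length example can be reproduced with a short sketch. Following the slide's symbol list, the first coefficient is emitted as-is and every later non-zero value is paired with the count of zeros preceding it. The function name and the number of trailing zeros (elided on the slide) are assumptions.

```python
def run_length_encode(coeffs):
    """Encode coefficients as [first, (run, value), ..., 'EOB']."""
    symbols = [coeffs[0]]          # first coefficient sent directly, as in the example
    run = 0
    for value in coeffs[1:]:
        if value == 0:
            run += 1               # extend the current run of zeros
        else:
            symbols.append((run, value))
            run = 0
    symbols.append("EOB")          # trailing zeros collapse into one end-of-block symbol
    return symbols

coeffs = [5, 0, 0, 2, 3, 0, 0, 4, 0, 0, 0, 1, 0, 0, 0, 0]
symbols = run_length_encode(coeffs)
print(symbols)
```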

Quantization

Goal: to reduce the range of a source signal so that it can be represented by fewer bits
– E.g., rounding a fractional number to the nearest integer

Specification
– Number of reconstructed values
– Boundary values (b0, b1…)
– Reconstructed values (g0, g1…)
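A minimal sketch of a scalar quantizer described by the three items above. The boundary and reconstruction values are made-up examples, only the interior decision boundaries are listed (the outermost intervals are treated as unbounded), and the function names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical 4-level quantizer: 3 interior boundaries, 4 reconstructed values
boundaries = [-2.0, 0.0, 2.0]        # b: decision boundaries between intervals
levels = [-3.0, -1.0, 1.0, 3.0]      # g: one reconstructed value per interval

def quantize(x):
    """Index of the interval containing x (2 bits suffice for 4 levels)."""
    return bisect_right(boundaries, x)

def dequantize(index):
    """Reconstructed value for a quantization index."""
    return levels[index]

for x in [-2.5, 0.7, 1.9, 5.0]:
    i = quantize(x)
    print(f"x={x:5.1f} -> index {i} -> {dequantize(i):4.1f} (error {x - dequantize(i):+.1f})")
```

Only the index is transmitted; the difference between x and its reconstructed value is lost, which is what makes this step lossy.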

Quantizer Design

Criterion: to minimize distortion for a fixed number of reconstructed levels
Optimal design depends on (1) signal distribution, (2) distortion measure

[Figure: quantizer staircase Q(x) for x from -4 to 4, and signal density f(x) with reconstructed values marked by *]

Scalar vs. Vector Quantization

One sample (scalar) or one group of samples (vector)?
Example: using 2 bits to represent (x, y)

[Figure: reconstructed points (*) in the (x, y) plane for each scheme]

Scalar: 1 bit for x, 1 bit for y
Vector: 2 bits for (x, y)

Transform + Scalar Quant. vs. Vector Quant.

In some cases, Transform + Scalar Quant. can achieve a similar effect to Vector Quant., but with less complexity

[Figure: reconstructed points (*) in the (x, y) plane, on the original axes and on the rotated axes (x', y')]

Vector: 2 bits for (x, y)
Scalar: 2 bits for x’, 0 bits for y’

Huffman Coding

20 Questions: one player thinks of an object and the other asks questions to determine what it is
Goal: to ask as few questions as possible on average

[Figure: two code trees for symbol probabilities 0.6, 0.2, 0.1, 0.1]

Average codeword length (fixed-length code): (0.6 + 0.1 + 0.1 + 0.2) × 2 = 2
Average codeword length (Huffman code): 0.2 × 2 + 0.1 × 3 + 0.1 × 3 + 0.6 × 1 = 1.6
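The 1.6-bit average can be reproduced by building the Huffman tree for the four probabilities. This sketch tracks only codeword lengths, not the bit patterns; the symbol names A to D and the function name are assumptions added for illustration.

```python
import heapq
from itertools import count

def huffman_lengths(probs):
    """Codeword length of each symbol in a Huffman code for the given probabilities."""
    tiebreak = count()                               # keeps heap entries comparable
    heap = [(p, next(tiebreak), [sym]) for sym, p in probs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(probs, 0)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)           # two least probable subtrees
        p2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1                        # merging adds one bit to each member
        heapq.heappush(heap, (p1 + p2, next(tiebreak), syms1 + syms2))
    return lengths

probs = {"A": 0.6, "B": 0.2, "C": 0.1, "D": 0.1}
lengths = huffman_lengths(probs)
average = sum(probs[s] * lengths[s] for s in probs)
print(lengths, f"average = {average:.1f} bits/symbol")
```

The most probable symbol gets a 1-bit codeword, which is what pulls the average below the 2 bits of the fixed-length code.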

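Arithmetic Coding

| Symbol | Probability | Sub-range |
|---|---|---|
| A | 0.6 | 0.0 – 0.6 |
| B | 0.2 | 0.6 – 0.8 |
| C | 0.1 | 0.8 – 0.9 |
| D | 0.1 | 0.9 – 1.0 |

Example: encoding “ACD”

[Figure: the interval from 0 to 1 divided into sub-ranges A, B, C, D, and subdivided again after each coded symbol]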
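The interval narrowing for the example message can be traced with exact fractions. This sketch covers only the interval computation with the sub-ranges from the table; turning the final interval into bits, and the function name `encode_interval`, are outside what the slide shows and are left as assumptions.

```python
from fractions import Fraction as F

# Sub-ranges from the table, as exact fractions
SUBRANGE = {
    "A": (F(0), F(6, 10)),
    "B": (F(6, 10), F(8, 10)),
    "C": (F(8, 10), F(9, 10)),
    "D": (F(9, 10), F(1)),
}

def encode_interval(message):
    """Narrow [0, 1) symbol by symbol; any number in the result identifies the message."""
    low, high = F(0), F(1)
    for sym in message:
        width = high - low
        s_low, s_high = SUBRANGE[sym]
        low, high = low + width * s_low, low + width * s_high
        print(f"after {sym}: [{float(low):.3f}, {float(high):.3f})")
    return low, high

low, high = encode_interval("ACD")
```

The message “ACD” maps to the interval from 0.534 to 0.54; likely symbols shrink the interval only slightly, so they cost few bits.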

Comparison of Huffman & Arithmetic Coding

| | Huffman Coding | Arithmetic Coding |
|---|---|---|
| Optimal entropy | Yes | No |
| Codeword length | Must be an integer | Not constrained to be an integer |
| Extra storage | Yes | No |
| Extra transmission time | Yes | No |
| Encoding delay | Yes | No |

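Block-based Hybrid Coding Architecture

Hybrid: Predictive Coding + Transform Coding

[Figure: encoder block diagram. The input video minus the motion-compensated prediction (M.C.) passes through DCT, Q, and the Entropy Coder to form the bit-stream. A reconstruction loop (Q-1, IDCT, adder, Frame Buffer) feeds motion estimation (M.E.) and M.C., and the motion vectors {M.V.} are also sent to the Entropy Coder. The stages are labeled Temporal Prediction, Spatial Transform, Quantization, and Entropy Compression.]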

Summary

The key to video compression is exploiting data redundancy
– Spatial, temporal, and statistical redundancies

Block-based hybrid video coding
– Block-based motion compensated prediction (temporal)
– Spatial intra prediction (spatial)
– Residual transform coding (spatial + simple quantization)
– Quantization (lossy)
– Huffman coding (statistical)