Outline
Introduction
Capture and representation of digital video
Fundamentals of video coding
Summary
Why do we need video compression?
Raw data rate of HDTV video
– 1920×1080 pels/frame (1080p), 3 colors/pel, 8 bits/color, 60 frames/s
=> 1920 × 1080 × 3 × 8 × 60 ≈ 3.0 Gbits/s
– HDTV channel bandwidth: 20 Mbits/s
The capabilities of transmission & storage are limited
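The raw-rate arithmetic above can be checked in a few lines. This sketch also divides the raw rate by the slide's 20 Mbits/s channel figure, which gives the compression ratio an HDTV channel would require. The function name is illustrative.

```python
def raw_bit_rate(width, height, colors_per_pel, bits_per_color, frames_per_s):
    """Uncompressed video bit rate in bits per second."""
    return width * height * colors_per_pel * bits_per_color * frames_per_s

raw = raw_bit_rate(1920, 1080, 3, 8, 60)
channel = 20e6  # HDTV channel bandwidth from the slide, in bits/s

print(f"raw rate: {raw / 1e9:.2f} Gbits/s")                 # raw rate: 2.99 Gbits/s
print(f"required compression ratio: {raw / channel:.0f}:1")  # required compression ratio: 149:1
```

In other words, the encoder must shrink the data by roughly 150:1 to fit the channel.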
Video/Image Coding Everywhere
1992: MPEG-1, block-based video coding (VCD)
1994: MPEG-2, block-based video coding with interlaced tools (DVD)
2003: MPEG-4 AVC/H.264
2007: Scalable Video Coding
2008: Multi-View Video Coding for 3DTV
2013: MPEG-H HEVC/H.265
Moore’s Law of Video Coding
More than 50% bit-rate reduction roughly every 5 years
Figure: comparison of MPEG-2, H.263, MPEG-4, and MPEG-4 AVC/H.264
From: T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, G. Sullivan, “Rate-Constrained coder control and comparison of video coding standards”, IEEE Trans. CSVT, 2003
Human Vision
Any color can be created by combining red, green, and blue in varying proportions
However, our visual system is less sensitive to color than to luminance
Cones: perceive color tone
Rods: extract only luminance
Translation of color space
Represent a color image more efficiently by separating the luminance from the color
– Representing chrominance with a lower resolution
Figure: R, G, B converted to Y, Cb, Cr
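The color-space translation can be sketched in code. This example makes two assumptions. First, it uses the full-range BT.601-derived coefficients from JPEG/JFIF; video systems often use a limited-range variant instead. Second, it represents "lower resolution" chrominance as 4:2:0 subsampling, which averages each 2×2 block of a chroma plane. Function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr (JFIF-style BT.601 coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr  # unrounded floats; a codec would round and clip to 8 bits

def subsample_420(plane):
    """Average each 2x2 block of a chroma plane (4:2:0); assumes even width and height."""
    h, w = len(plane), len(plane[0])
    return [[(plane[i][j] + plane[i][j + 1] + plane[i + 1][j] + plane[i + 1][j + 1]) / 4
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]
```

For a 1920×1080 frame, each 4:2:0 chroma plane is 960×540. The frame then holds 1.5 samples per pel instead of 3, which halves the raw data.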
Objective Quality Metric
Peak Signal-to-Noise Ratio (PSNR)
For objective video quality measurement, the average per-frame PSNR is computed as a function of bit rate
$$\mathrm{PSNR} = 10\log_{10}\frac{(2^{\mathrm{BitDepth}}-1)^2}{\mathrm{MSE}}$$
Figure: example images with PSNR = 33.8 dB and PSNR = 34.9 dB
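The PSNR formula can be sketched directly, together with the per-frame averaging the slide describes. This sketch assumes frames are 2-D lists of integer samples and returns infinity for identical frames. Function names are hypothetical.

```python
import math

def mse(ref, test):
    """Mean squared error between two equally sized 2-D sample arrays."""
    total, n = 0.0, 0
    for row_r, row_t in zip(ref, test):
        for a, b in zip(row_r, row_t):
            total += (a - b) ** 2
            n += 1
    return total / n

def psnr(ref, test, bit_depth=8):
    """PSNR in dB with peak value 2**bit_depth - 1; infinite if the frames match."""
    e = mse(ref, test)
    if e == 0:
        return math.inf
    return 10 * math.log10((2 ** bit_depth - 1) ** 2 / e)

def average_psnr(ref_frames, test_frames, bit_depth=8):
    """Average of the per-frame PSNR values over a sequence."""
    values = [psnr(r, t, bit_depth) for r, t in zip(ref_frames, test_frames)]
    return sum(values) / len(values)
```

For 8-bit video, an MSE of 1 corresponds to about 48.13 dB.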
Subjective Measure
Mean Opinion Score (MOS)
– Viewers' judgment of quality
Methods
– Double Stimulus Continuous Quality Scale (DSCQS)
– Double Stimulus Impairment Scale (DSIS)
– Single Stimulus Continuous Quality Evaluation (SSCQE)
How does compression work?
Compression is achieved by removing redundancy
– Temporal and spatial redundancy: temporally adjacent frames are often very similar, and so are neighboring pixels
– Statistical redundancy: some values occur more frequently than others
Figure: Frame (t-1) and Frame (t), illustrating temporal correlation (between frames) and spatial correlation (within a frame)
Encoder and Decoder (CODEC)
Video compression (video coding) is the process of compacting or condensing a digital video sequence into a smaller number of bits
Compression involves a complementary pair of systems: a compressor (encoder) and a de-compressor (decoder)
Figure: Encoder → transmit or store → Decoder
Video Coding System
Encoder: Analysis → Quantization → Binary Encoding
Decoder: Binary Decoding → De-quantization → Synthesis
Quantization is the lossy step; binary encoding is lossless
Motion Estimation and Compensation
Goal: to remove temporal redundancy
– Divide the current picture into macroblocks (MBs)
– Find the best match for each MB in the reference frame
– Transmit the offset (motion vector) and the residual block
Assumptions: 1) translational motion, 2) constant intensity
Figure: a macroblock (MB) in the current frame and its match in the reference frame
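The three steps above can be sketched as full-search block matching. This sketch makes several assumptions: the sum of absolute differences (SAD) is the matching criterion, the search is exhaustive within a ±search_range window, and frames are 2-D lists of luma samples. SAD and full search are common textbook choices, but practical encoders typically use faster searches. All names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in cur and the block displaced by (dx, dy) in ref."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
    return total

def full_search(cur, ref, bx, by, n, search_range):
    """Exhaustive block matching; returns (best motion vector (dx, dy), its SAD).

    Only candidate blocks lying entirely inside the reference frame are tested.
    """
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + n > w or y + n > h:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost

def residual(cur, ref, bx, by, mv, n):
    """Prediction error block that is transmitted along with the motion vector."""
    dx, dy = mv
    return [[cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j] for j in range(n)]
            for i in range(n)]
```

When the content has purely translated, as the slide assumes, the best match leaves an all-zero residual.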
Remarks
The smaller the block size, the lower the residual energy tends to be
Theoretically, it is possible to estimate a motion vector for each pixel
However, the extra overhead for the vectors may outweigh the benefits of temporal prediction
As a compromise between overhead and prediction efficiency, variable block-size motion compensation is often used
Variable Block-Size Motion Compensation
To adapt the block size to picture characteristics
From: Iain Richardson, H.264 and MPEG-4 Video Compression, Wiley, 2003
Backward Prediction & Bi-Prediction
Due to occlusion, some objects in Frame (t) may not appear in Frame (t-1) but can be found in Frame (t+1)
=> Backward prediction from a future reference frame
Other objects may be found in both frames
=> Bi-prediction from two reference frames
Figure: Frame (t-1), Frame (t), Frame (t+1)
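Choosing among these prediction types can be sketched for a single block. This sketch makes two assumptions. Bi-prediction is modeled as an equal-weight average with integer rounding, (a + b + 1) >> 1. The mode decision uses SAD alone, whereas real encoders also account for the bits spent on motion vectors. Names are hypothetical.

```python
def bi_predict(fwd_block, bwd_block):
    """Average two motion-compensated predictions sample by sample (integers, rounding half up)."""
    return [[(a + b + 1) >> 1 for a, b in zip(row_f, row_b)]
            for row_f, row_b in zip(fwd_block, bwd_block)]

def block_sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def choose_prediction(cur_block, fwd_block, bwd_block):
    """Pick forward, backward, or bi-prediction, whichever gives the smallest SAD."""
    candidates = {
        "forward": fwd_block,                       # from Frame (t-1)
        "backward": bwd_block,                      # from Frame (t+1)
        "bi": bi_predict(fwd_block, bwd_block),     # from both
    }
    mode = min(candidates, key=lambda m: block_sad(cur_block, candidates[m]))
    return mode, candidates[mode]
```

A block that is occluded in Frame (t-1) but visible in Frame (t+1) will tend to select "backward". A block visible in both frames often benefits from the averaged "bi" prediction.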
Coding Order vs. Display Order
Coding order: the order in which pictures are processed
Display order: the order in which pictures are displayed
Pictures in display order: 0 (A), 1 (B), 2 (C), 3 (D), 4 (E), 5 (F)
Coding order: A, C, B, F, D, E
Display order: 0, 1, 2, 3, 4, 5
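The example's reordering can be sketched in code. The slide does not state picture types, so this sketch assumes one assignment consistent with the example. A, C, and F (positions 0, 2, 5) are anchor pictures that serve as references. B, D, and E are bi-predicted pictures that must wait until the following anchor has been coded. Function names are hypothetical.

```python
def coding_order(display_frames, anchor_positions):
    """Reorder frames from display order to coding order.

    Each anchor picture is coded before the pictures lying between it and the
    previous anchor, since those pictures use it as a future reference.
    Assumes the last frame is an anchor.
    """
    order, prev = [], -1
    for a in sorted(anchor_positions):
        order.append(display_frames[a])            # future reference first
        order.extend(display_frames[prev + 1:a])   # then the pictures that precede it
        prev = a
    return order

def display_order(coded):
    """Decoder-side reordering; coded is a list of (display_index, frame) pairs."""
    return [frame for _, frame in sorted(coded)]
```

With anchors at positions 0, 2, and 5, `coding_order(list("ABCDEF"), [0, 2, 5])` reproduces the slide's A, C, B, F, D, E.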
Sub-Pixel Motion Compensation
In some cases, a better prediction signal may come from interpolated samples in the reference frame
Figure: sampling positions in Frame (t-1) and Frame (t)
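Interpolation at fractional positions can be sketched as follows. H.264/AVC and HEVC use longer interpolation filters for fractional sample positions. This sketch swaps in plain bilinear interpolation as a simpler stand-in and assumes a motion vector given in fractional pels. Names are hypothetical.

```python
import math

def sample_at(frame, x, y):
    """Bilinearly interpolate frame at fractional position (x, y).

    Assumes 0 <= x <= width - 1 and 0 <= y <= height - 1.
    """
    h, w = len(frame), len(frame[0])
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * frame[y0][x0] + fx * frame[y0][x1]
    bottom = (1 - fx) * frame[y1][x0] + fx * frame[y1][x1]
    return (1 - fy) * top + fy * bottom

def predict_block(ref, bx, by, mvx, mvy, n):
    """n x n prediction for the block at (bx, by) using a fractional motion vector (mvx, mvy)."""
    return [[sample_at(ref, bx + j + mvx, by + i + mvy) for j in range(n)] for i in range(n)]
```

For example, a half-pel vector (0.5, 0.5) predicts each sample as the mean of four neighboring reference samples.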
Spatial Predictive Coding
Goal: to remove spatial redundancy
Create a prediction signal from previously coded samples in the same frame, e.g., intra coding in H.264/AVC
From: Iain Richardson, H.264 and MPEG-4 Video Compression, Wiley, 2003
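Intra prediction can be sketched with a reduced set of modes. H.264/AVC defines several intra prediction modes (nine for 4×4 luma blocks). This sketch implements only three of them (vertical, horizontal, and DC), ignores unavailable-neighbor cases, and picks the mode by SAD. Those simplifications, and the names, are assumptions.

```python
def intra_predict(top, left, mode):
    """Predict an N x N block from the reconstructed row above (top) and column to the left (left)."""
    n = len(top)
    if mode == "vertical":      # copy the row above downwards
        return [list(top) for _ in range(n)]
    if mode == "horizontal":    # copy the left column rightwards
        return [[left[i]] * n for i in range(n)]
    if mode == "dc":            # flat block at the rounded mean of the 2N neighbors
        dc = (sum(top) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError(f"unknown mode: {mode}")

def best_intra_mode(block, top, left):
    """Choose the mode whose prediction has the smallest SAD against block."""
    def cost(mode):
        pred = intra_predict(top, left, mode)
        return sum(abs(a - b) for ra, rb in zip(block, pred) for a, b in zip(ra, rb))
    return min(("vertical", "horizontal", "dc"), key=cost)
```

Only the residual between the block and the chosen prediction, plus the mode, needs to be coded.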
Transform
Goal:
– To remove spatial redundancy further
– To facilitate scalar quantization
Idea: projection onto selected basis functions
From: Y. Wang et al., Video Processing and Communications, Prentice Hall, 2002
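Projection onto a set of basis functions can be illustrated with the DCT named in the hybrid-coder diagram. This sketch implements the floating-point orthonormal 2-D DCT-II; H.264/AVC and HEVC actually use integer approximations of the DCT. It is written in pure Python for clarity, and the names are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th basis function."""
    c = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        c.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return c

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct2(block):
    """2-D DCT of a square block: C X C^T, i.e. projection onto the 2-D basis images."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def idct2(coeffs):
    """Inverse 2-D DCT: C^T Y C (the basis is orthonormal, so C^-1 = C^T)."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)
```

A flat 4×4 block of value 10 maps to a single DC coefficient of 40, with all other coefficients zero. Smooth image regions therefore concentrate their energy in very few coefficients.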
Choice of Basis Functions
Criteria:
– Minimize inter-dependency between coefficients
– Compact the energy of the input into a few coefficients
Optimal choice depends on the signal covariance
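$$\begin{bmatrix} x \\ y \end{bmatrix} = x\begin{bmatrix} 1 \\ 0 \end{bmatrix} + y\begin{bmatrix} 0 \\ 1 \end{bmatrix} = x'\begin{bmatrix} 1 \\ 1 \end{bmatrix} + y'\begin{bmatrix} 1 \\ -1 \end{bmatrix}$$
Figure: scatter plots of (x, y) samples in the original (x, y) axes and in the rotated (x', y') axes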
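The effect of this change of basis on energy compaction can be checked numerically. This sketch scales the basis vectors (1, 1) and (1, −1) by 1/√2 so that the basis is orthonormal and total energy is preserved. For samples whose two components have equal variance and are positively correlated, with covariance [[s, r], [r, s]], these vectors are the eigenvectors of the covariance matrix. In that case the rotation is the covariance-optimal (KLT) basis. The sample pairs below are hypothetical.

```python
import math

def rotate(x, y):
    """Express (x, y) in the orthonormal basis (1, 1)/sqrt(2), (1, -1)/sqrt(2)."""
    s = 1 / math.sqrt(2)
    return s * (x + y), s * (x - y)

def energy_share(pairs):
    """Fraction of total energy in the first coordinate, before and after rotation."""
    total = sum(x * x + y * y for x, y in pairs)          # unchanged by an orthonormal rotation
    before = sum(x * x for x, _ in pairs) / total
    after = sum(rotate(x, y)[0] ** 2 for x, y in pairs) / total
    return before, after

# Hypothetical, strongly correlated samples (x is close to y).
pairs = [(10, 10), (20, 21), (5, 4)]
before, after = energy_share(pairs)
```

Before rotation, the energy is split roughly evenly between x and y. After rotation, almost all of it sits in x', so y' can be quantized coarsely or dropped.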
Encoding DCT Coefficients
Observations:
– Non-zero coefficients are mostly low-frequency ones
– High-frequency coefficients tend to become zero after quantization
Example: Run-Length Coding
– Run: the number of zeros since the last non-zero coefficient
– Level: the value of the next non-zero coefficient
DCT coefficients: [5, 0, 0, 2, 3, 0, 0, 4, 0, 0, 0, 1, 0, 0, …, 0]
Coding symbols: [5, (2, 2), (0, 3), (2, 4), (3, 1), EOB]
– EOB (end of block) signals that all remaining coefficients are zero
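The run-length example can be reproduced with a small encoder and decoder. Following the example, this sketch sends the first (DC) coefficient as-is, followed by (run, level) pairs and a final EOB. It assumes the coefficients have already been scanned into a 1-D list. Names are hypothetical.

```python
def run_level_encode(coeffs):
    """Code the first (DC) coefficient as-is, then (run, level) pairs, then EOB."""
    symbols = [coeffs[0]]
    run = 0
    for c in coeffs[1:]:
        if c == 0:
            run += 1
        else:
            symbols.append((run, c))
            run = 0
    symbols.append("EOB")  # trailing zeros are implied by the end-of-block marker
    return symbols

def run_level_decode(symbols, n):
    """Rebuild n coefficients from the symbol list produced by run_level_encode."""
    coeffs = [symbols[0]]
    for sym in symbols[1:]:
        if sym == "EOB":
            break
        run, level = sym
        coeffs.extend([0] * run + [level])
    return coeffs + [0] * (n - len(coeffs))
```

Applied to the slide's coefficients, padded to 16 entries, the encoder returns `[5, (2, 2), (0, 3), (2, 4), (3, 1), 'EOB']`.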
Quantization
Goal: to reduce the range of a source signal so that it can be represented by fewer bits
– E.g., rounding a fractional number to the nearest integer
Specification
– Number of reconstructed values
– Boundary values (b0, b1…)
– Reconstructed values (g0, g1…)
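A uniform scalar quantizer makes these three parameters concrete. In this sketch, the reconstructed values are g_k = k·step and the boundaries lie halfway between them, at (k + ½)·step. With step = 1 it reduces to the slide's example of rounding to the nearest integer. The parameter name step and the function names are hypothetical, and ties are rounded away from zero.

```python
import math

def quantize(x, step):
    """Uniform quantizer: index of the nearest reconstructed value (ties away from zero)."""
    q = math.floor(abs(x) / step + 0.5)
    return q if x >= 0 else -q

def dequantize(index, step):
    """Reconstructed value g_k = k * step for index k."""
    return index * step
```

Only the integer index is transmitted. A larger step gives fewer distinct indices, and therefore fewer bits, at the cost of a maximum error of step / 2.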
Quantizer Design
Criterion: to minimize distortion for a fixed number of reconstructed levels
Optimal design depends on (1) the signal distribution and (2) the distortion measure
Figure: quantizer characteristic Q(x) versus x, and input probability density f(x) with reconstruction levels marked (*)
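One classical way to meet this criterion is the Lloyd-Max iteration. It alternates between two necessary conditions for minimum mean squared error. First, each boundary lies midway between neighboring reconstructed values. Second, each reconstructed value is the centroid of its cell. This sketch assumes MSE as the distortion measure and represents the signal distribution by training samples. It converges to a local optimum. Names are hypothetical.

```python
def lloyd_max(samples, levels, iterations=50):
    """Design an MSE-minimizing scalar quantizer on training samples.

    levels: initial reconstructed values g_0 < g_1 < ...
    Returns (boundaries b, reconstructed values g).
    """
    g = sorted(levels)
    for _ in range(iterations):
        # Nearest-neighbor condition: boundaries halfway between reconstructed values.
        b = [(g[k] + g[k + 1]) / 2 for k in range(len(g) - 1)]
        # Centroid condition: each reconstructed value becomes the mean of its cell.
        cells = [[] for _ in g]
        for s in samples:
            k = sum(1 for bound in b if s > bound)
            cells[k].append(s)
        g = [sum(c) / len(c) if c else g[k] for k, c in enumerate(cells)]
    b = [(g[k] + g[k + 1]) / 2 for k in range(len(g) - 1)]
    return b, g
```

Because the levels move toward where the samples actually cluster, the optimal design depends on the signal distribution, as the slide states.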
Scalar vs. Vector Quantization
One sample (scalar) or one group of samples (vector)?
Example: using 2 bits to represent (x, y)
Figure: reconstruction points (*) in the (x, y) plane for each case
– Scalar: 1 bit for x, 1 bit for y
– Vector: 2 bits for (x, y)
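Vector quantization with a 2-bit budget can be sketched as a nearest-codeword search. The four codewords can be placed anywhere in the plane. A 1-bit + 1-bit scalar quantizer, by contrast, can only produce a 2×2 grid of points. The codebook below is a hypothetical one whose points lie along the diagonal, a layout that no such grid can reproduce.

```python
def vq_encode(point, codebook):
    """Return the index of the nearest codeword (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, codebook[i])))

def vq_decode(index, codebook):
    """Look up the reconstructed vector for a transmitted index."""
    return codebook[index]

# Hypothetical 2-bit codebook: four codewords where correlated (x, y) samples cluster.
CODEBOOK = [(-3, -3), (-1, -1), (1, 1), (3, 3)]
```

Each (x, y) pair is sent as a single 2-bit index into the codebook.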
Transform + Scalar Quant. vs. Vector Quant.
In some cases, transform + scalar quantization can achieve an effect similar to vector quantization, but with less complexity
– Vector: 2 bits for (x, y)
– Transform + scalar: 2 bits for x', 0 bits for y'
Figure: the same reconstruction points (*) shown in the (x, y) axes and in the rotated (x', y') axes
Huffman Coding
20 Questions: one player thinks of an object and the other asks questions to determine what it is
Goal: to ask as few questions as possible on average
Example: four symbols with probabilities 0.6, 0.2, 0.1, 0.1
– Fixed-length code: average codeword length = (0.6 + 0.2 + 0.1 + 0.1) × 2 = 2 bits
– Huffman code: average codeword length = 0.6 × 1 + 0.2 × 2 + 0.1 × 3 + 0.1 × 3 = 1.6 bits
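Huffman code construction can be sketched with a priority queue that repeatedly merges the two least probable subtrees. This sketch assumes at least two symbols. Assigning the letters A–D to the example's probabilities is an assumption borrowed from the arithmetic-coding slide. For comparison, the sketch also computes the source entropy.

```python
import heapq
import itertools
import math

def huffman_code(probs):
    """Build a binary Huffman code for {symbol: probability}; returns {symbol: codeword}."""
    counter = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(p, next(counter), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # the two least probable subtrees
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(counter), merged))
    return heap[0][2]

def average_length(probs, code):
    return sum(p * len(code[s]) for s, p in probs.items())

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

probs = {"A": 0.6, "B": 0.2, "C": 0.1, "D": 0.1}
code = huffman_code(probs)
```

The resulting codeword lengths are 1, 2, 3, and 3 bits, which gives the slide's average of 1.6 bits/symbol. That is just above the entropy of about 1.571 bits/symbol.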
Arithmetic Coding
Example: encoding “ACD”
Symbol   Probability   Sub-range
A        0.6           0.0 – 0.6
B        0.2           0.6 – 0.8
C        0.1           0.8 – 0.9
D        0.1           0.9 – 1.0
Figure: the current interval is subdivided into the A, B, C, D sub-ranges once per encoded symbol
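Interval narrowing for “ACD” can be sketched with the table's sub-ranges. Practical arithmetic coders use finite-precision integer arithmetic with renormalization. This sketch instead uses exact fractions so that each step of the subdivision stays visible. Names are hypothetical.

```python
from fractions import Fraction

# Symbol sub-ranges from the table, kept as exact fractions.
RANGES = {
    "A": (Fraction(0), Fraction(6, 10)),
    "B": (Fraction(6, 10), Fraction(8, 10)),
    "C": (Fraction(8, 10), Fraction(9, 10)),
    "D": (Fraction(9, 10), Fraction(1)),
}

def encode_interval(message, ranges=RANGES):
    """Narrow [low, high) once per symbol; any number inside the final interval identifies the message."""
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        width = high - low
        sym_low, sym_high = ranges[sym]
        low, high = low + width * sym_low, low + width * sym_high
    return low, high

def decode(value, length, ranges=RANGES):
    """Recover `length` symbols from a number inside the final interval."""
    out = []
    low, high = Fraction(0), Fraction(1)
    for _ in range(length):
        width = high - low
        for sym, (s_low, s_high) in ranges.items():
            if low + width * s_low <= value < low + width * s_high:
                out.append(sym)
                low, high = low + width * s_low, low + width * s_high
                break
    return "".join(out)
```

The interval shrinks from [0, 0.6) after A, to [0.48, 0.54) after C, to [0.534, 0.54) after D. Its final width, 0.006, equals the product of the three symbol probabilities.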
Comparison of Huffman & Arithmetic Coding
                          Huffman Coding      Arithmetic Coding
Optimal entropy           Yes                 No
Codeword length           Must be integer     Not constrained to be an integer
Extra storage             Yes                 No
Extra transmission time   Yes                 No
Encoding delay            Yes                 No
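Block-based Hybrid Coding Architecture
Hybrid: Predictive Coding + Transform Coding
Figure: encoder block diagram. The motion-compensated (M.C.) prediction is subtracted from the input video, and the residual passes through the DCT, quantization (Q), and the entropy coder to form the bit-stream. Q-1 and the IDCT reconstruct the residual, which is added back to the prediction and stored in the frame buffer. Motion estimation (M.E.) uses the frame buffer to find the motion vectors {M.V.} that drive M.C.
Stages: temporal prediction, spatial transform, quantization, entropy compression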
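The feedback path in the diagram, through Q-1 and IDCT back to the frame buffer, can be sketched as a toy closed loop. The encoder predicts from its own reconstruction rather than from the original frames, so encoder and decoder share identical references and quantization errors do not accumulate. This sketch simplifies heavily: frames are 1-D lists, prediction uses zero motion, the transform and entropy coder are omitted, and step is a hypothetical quantizer step size.

```python
def encode_sequence(frames, step):
    """Toy closed-loop hybrid encoder on 1-D frames (zero motion, no transform).

    Returns the quantized residual indices per frame and the encoder's reconstruction.
    """
    reference = [0] * len(frames[0])     # frame buffer starts as all zeros
    indices, recon = [], []
    for frame in frames:
        residual = [x - p for x, p in zip(frame, reference)]       # subtract prediction
        q = [round(r / step) for r in residual]                      # Q
        rebuilt = [p + qi * step for p, qi in zip(reference, q)]     # Q-1, add prediction back
        indices.append(q)
        recon.append(rebuilt)
        reference = rebuilt              # store the reconstruction, not the original
    return indices, recon

def decode_sequence(indices, step):
    """Decoder repeats the encoder's reconstruction loop exactly."""
    reference = [0] * len(indices[0])
    out = []
    for q in indices:
        reference = [p + qi * step for p, qi in zip(reference, q)]
        out.append(reference)
    return out
```

Because both sides rebuild from the same indices, the decoder's output matches the encoder's reconstruction exactly. Each sample stays within step / 2 of the original.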
Summary
The key to video compression is removing data redundancy
– Spatial, temporal, and statistical redundancy
Block-based hybrid video coding
– Block-based motion-compensated prediction (temporal)
– Spatial intra prediction (spatial)
– Residual transform coding (spatial + simple quantization)
– Quantization (lossy)
– Huffman coding (statistics)