OPTIMIZATION OF THE DEBLOCKING FILTER IN H.264 CODEC FOR REAL TIME
IMPLEMENTATION
by
Hitesh Yadav
Presented to the Faculty of the Graduate School of
The University of Texas at Arlington in Partial Fulfillment
of the Requirements
for the Degree of
MASTER OF SCIENCE IN ELECTRICAL ENGINEERING
THE UNIVERSITY OF TEXAS AT ARLINGTON
MAY 2006
ACKNOWLEDGEMENTS
I would like to thank Dr. K. R. Rao for his constant encouragement and patience that
made this thesis possible. I also thank Dr. M. Manry and Dr. Z. Wang for being on my
committee. I also thank people at Analog Devices Inc. for their advice. I would also like
to thank my roommates and friends for their support. Last but not least, I would like
to thank my parents and family for their constant support and patience.
April 19, 2006
ABSTRACT
OPTIMIZATION OF THE DEBLOCKING FILTER IN H.264 CODEC FOR REAL TIME
IMPLEMENTATION
Publication No. ______
Hitesh Yadav, MS
The University of Texas at Arlington, 2006
Supervising Professor: K. R. Rao
The H.264 standard supports a broad range of low bit rate and high bit rate
multimedia applications with high reliability and efficient coding performance when
transporting compressed video through existing and future networks. In low bit rate
applications, artifacts are visible in the H.264 decoded video. Prominent among them
is the blocking artifact, caused mainly by the use of block-based transform coding
and coarse quantization of the transform coefficients. The H.264 standard includes an
in-loop deblocking filter as a normative part of the standard to reduce blocking
artifacts. The main drawback of the deblocking filter is its implementation
complexity: it takes approximately one-third of the computational resources of the
decoder. Code optimization does not help much, as some of the conditional branches are
inherent in the algorithm itself. This gives rise to the need for a simple deblocking
algorithm with reduced implementation complexity. The objective of this research is to
develop an algorithm that reduces the implementation complexity while maintaining the
perceptual quality of the existing deblocking algorithm.
In the proposed method, the maximum and minimum values among the six
pixels across an edge are computed to decide whether the pixels of the block should be
filtered. If the difference between the maximum and minimum values is less than QP,
the pixels of that block are filtered; if this difference is greater than QP, the
boundary is more likely to represent a true edge and is not filtered. The main
advantage of the proposed method is its simple logic: it has fewer conditional
branches than the existing deblocking filter in the H.264 standard. This helps reduce
execution time in real time systems, as most video coding standards today are
implemented on processors with several pipeline stages. The proposed method also has a
smaller code size than JM 9.2 (the H.264 reference software), so less memory is
required to implement it. It can be concluded from this research that the proposed
method reduces the implementation complexity compared to the existing JM 9.2 method
while maintaining the same visual quality.
TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
LIST OF ILLUSTRATIONS
LIST OF TABLES
LIST OF ACRONYMS

CHAPTER

1. INTRODUCTION
   1.1 Background
   1.2 Objective of the thesis
   1.3 Outline of the thesis
2. BASIC VIDEO CODING THEORY
   2.1 Video compression
      2.1.1 RGB and YUV color spaces
      2.1.2 Video sampling
      2.1.3 Redundancy reduction
      2.1.4 Video codec
      2.1.5 Motion Estimation
      2.1.6 Motion vectors
      2.1.7 Block size effect
      2.1.8 Sub-pixel interpolation
      2.1.9 Discrete cosine transform
      2.1.10 Quantization
      2.1.11 Zigzag scan
      2.1.12 Run length encoding
      2.1.13 Entropy coding
   2.2 MPEG and H.26x
      2.2.1 ISO/IEC, ITU-T and JVT
      2.2.2 H.261
      2.2.3 MPEG-1
      2.2.4 H.262 and MPEG-2
      2.2.5 H.263/H.263+/H.263++
      2.2.6 MPEG-4
      2.2.7 MPEG-4 part-10/H.264
3. OVERVIEW OF H.264/AVC STANDARD
   3.1 Network abstraction layer
   3.2 Video coding layer
      3.2.1 Motion estimation and compensation for inter frames
      3.2.2 Multiple reference pictures selection
      3.2.3 Intra prediction
      3.2.4 Transform and quantization
      3.2.5 Deblocking filter
      3.2.6 Entropy coding
   3.3 Conclusions
4. REVIEW OF POSTPROCESSING TECHNIQUES
   4.1 Need for postprocessing
      4.1.1 Preprocessing
      4.1.2 Postprocessing
   4.2 Causes for blocking and ringing artifacts
      4.2.1 Blocking artifacts
      4.2.2 Ringing artifacts
   4.3 Deblocking filters
      4.3.1 POCS based iterative algorithms
      4.3.2 Weighted sum of symmetrically aligned pixels
      4.3.3 Adaptive deblocking filter
      4.3.4 Comparison of algorithms
   4.4 Comparison of postprocessing and loop filtering
   4.5 Desired loop filter
5. DEBLOCKING FILTER IN H.264/AVC
   5.1 Deblocking filter operation
      5.1.1 Characteristics of deblocking filter
      5.1.2 Principle of deblocking filter
      5.1.3 Algorithm of deblocking filter
   5.2 Deblocking filter complexity
6. INTRA AND INTER FRAMES
   6.1 Intra frames
      6.1.1 Proposed method for intra frames
      6.1.2 Results for intra frames
   6.2 Inter frames
      6.2.1 Proposed method for inter frames
      6.2.2 Results for inter frames
7. RESULTS AND CONCLUSIONS
   7.1 Results
   7.2 Conclusions
   7.3 Future research

REFERENCES
BIOGRAPHICAL INFORMATION
LIST OF ILLUSTRATIONS

Figure
2.1 Components of an image: (a) R, G, B components, (b) Cb, Cr, Cg components
2.2 Color format: (a) 4:2:0, (b) 4:2:2, (c) 4:4:4
2.3 Spatial and temporal redundancies
2.4 Common video coding flow
2.5 Motion estimation procedure
2.6 Macroblock representation in 4:2:0 format
2.7 Motion vector representation
2.8 Macroblock partitions for motion estimation and compensation
2.9 Block size effects on motion estimation: (a) frame 1, (b) frame 2, (c) no motion estimation, (d) 16x16 block, (e) 8x8 block, (f) 4x4 block
2.10 Sub-pixel interpolation
2.11 Zigzag scan
2.12 Progression of the ITU-T recommendations and MPEG standards
3.1 H.264/AVC layer structure
3.2 Hierarchical syntax
3.3 Progressive and interlaced frames
3.4 Subdivision of a picture into slices
3.5 The specific coding parts of profile in H.264
3.6 H.264 encoder
3.7 H.264 decoder
3.8 Motion compensation accuracy
3.9 Quarter sample luma interpolation
3.10 Multiple reference frames and generalized bi-predictive frames
3.11 Intra prediction in H.264
3.12 16x16 intra prediction directions
3.13 4x4 intra prediction directions
3.14 Transform coding
3.15 CABAC overview
4.1 Blocking artifacts at low bit rate coding
5.1 One-dimensional view of a 4x4 block edge
5.2 Boundaries in a macroblock to be filtered
5.3 Decision flow of filter tap selection
5.4 Decision flow of boundary strength, where P and Q denote two adjacent 4x4 blocks
6.1 4x4 block edge
6.2 Decision flow of filtering of pixels
6.3 Reconstructed I frame with QP=37 (foreman)
6.4 Reconstructed I frame with QP=45 (foreman)
6.5 Reconstructed I frame with QP=37 (bridge)
6.6 Reconstructed I frame with QP=37 (car phone)
6.7 Reconstructed I frame with QP=39 (news)
6.8 Reconstructed I frame with QP=39 (silent)
6.9 Reconstructed I frame with QP=45 (container)
6.10 Reconstructed P frame with QP=39 (bridge)
6.11 Reconstructed B frame with QP=39 (foreman)
6.12 Reconstructed P frame with QP=39 (car phone)
6.13 Reconstructed P frame with QP=39 (foreman)
6.14 Reconstructed B frame with QP=37 (car phone)
LIST OF TABLES

Table
4.1 A summary of the computation and implementation complexity of the deblocking algorithms
6.1 Comparison of PSNR values for different test sequences
6.2 Comparison of PSNR values and the total number of bits used for encoding a P frame or B frame in the H.264 compressed stream for different test sequences
6.3 Comparison of PSNR values and the total number of bits used for encoding a P frame or B frame in the H.264 compressed stream for a GOP
LIST OF ACRONYMS
AC: Alternating Current
AVC: Advanced Video Coding
bS: Boundary strength
CABAC: Context-based Adaptive Binary Arithmetic Coding
CAVLC: Context-based Adaptive Variable length Coding
dB: Decibel
DC: Direct Current
DCT: Discrete Cosine Transform
DSP: Digital Signal Processor
DVD: Digital Video Disc
GOP: Group of Pictures
HDTV: High Definition Television
HVS: Human Visual System
IEC: International Electrotechnical Commission
ISO: International Organization for Standardization
ITU: International Telecommunication Union
JVT: Joint Video Team
MB: Macroblock
MC: Motion Compensation
MCPE: Motion Compensated Prediction Error
ME: Motion Estimation
MPEG: Moving Picture Experts Group
MSE: Mean Square Error
NALU: Network Abstraction Layer Unit
POCS: Projection onto Convex Sets
PSNR: Peak Signal to Noise Ratio
QP: Quantization Parameter
SI: Switching Intra
SIMD: Single Instruction Multiple Data
SP: Switching Prediction
VCL: Video Coding Layer
VLIW: Very Long Instruction Word
Chapter 1
INTRODUCTION
1.1 Background
H.264/AVC (advanced video coding) is a new international standard
published jointly by ITU-T VCEG (Video Coding Experts Group) and ISO/IEC
MPEG (Moving Picture Experts Group) [1]. The main purpose of this
standard is to provide a broad range of multimedia applications with
higher reliability and more efficient coding performance than former
standards when transporting compressed video through various networks [2].
At very low bit rate coding, artifacts are visible in the reconstructed
frames [3]. Prominent among them are blocking and ringing artifacts. A post
filter or loop filter is a viable option for reducing these artifacts, as
video quality is gained without any increase in bit rate. The main advantage
of a loop filter is that it improves both the objective and subjective
quality of video streams with a significant reduction in decoder complexity
compared to post filtering [4], [5]: in inter coding, the reference frames
are the filtered ones, thereby reducing the complexity. The H.264 standard
includes a loop filter as a normative part of the standard to reduce
blocking artifacts at very low bit rate coding. H.264 uses a 4x4 integer
DCT (discrete cosine transform), which helps reduce ringing artifacts;
in FRExt (fidelity range extension), an 8x8 integer DCT is also used.
Though this filter gives good results at low bit rate coding, its major
drawback is its implementation complexity.
1.2 Objective of the thesis
The purpose of this thesis is to optimize the deblocking algorithm to
reduce its complexity in the H.264 codec for real time implementation.
Analysis of the run-time profiles of the decoder sub-functions indicates
that the deblocking filter is the most computationally intensive part of
the H.264 decoding process [6].
1.3 Outline of the thesis
In chapter 2, basic video coding theory and the evolution of
mainstream video coding standards are discussed. Chapter 3 gives an overview
of the H.264 standard. Chapter 4 describes the need for post processing,
reviews post processing techniques for compression artifact removal, and
discusses the pros and cons of each technique. Chapter 5 describes the
existing deblocking filter in H.264/AVC in detail and includes its
complexity analysis. Chapter 6 describes the proposed method as applied to
intra and inter frames and the results obtained. Chapter 7 concludes with a
comparison of the obtained results and directions for future research.
Chapter 2
BASIC VIDEO CODING THEORY
2.1 Video compression
Digital video compression technology has been gaining popularity
for many years. Today, when people enjoy HDTV (high definition
television), movie broadcasting over the Internet, or digital music such as
MP3, the convenience that the digital video industry brings should not be
taken for granted: all of it is attributable to advances in compression
technology, mass storage media, and streaming video/audio services. As the
main contributor to the above, video compression technology is the focus of
this chapter. Some basic video compression concepts are introduced here as
the basis for chapter 3.
2.1.1 RGB and YUV color spaces
RGB (red-green-blue) color space is well suited to the capture and
display of color images. The image consists of three grayscale components
(sometimes referred to as channels) [2]. The combination of red, green and
blue with different weights can produce any visible color; a numerical value
is used to indicate the proportion of each color. The drawback of the RGB
representation of a color image is that all three colors are equally
important and must be stored with the same number of bits. The HVS
(human visual system), however, is less sensitive to color than to
brightness. To take advantage of this finding, a color space called YUV
(luminance plus blue and red chrominance) was proposed. Instead of using the
color of the light directly, YUV represents a color image by the luminance
(Y) and chrominance (UV) of the light. YUV is derived from RGB: a black and
white image (luma) is created from the full color image, and the primary
colors are then subtracted from it, resulting in color difference signals
(chroma: Cb, Cr) that describe the color. Combining the signals back
together results in a full color image [2]. The luminance information Y can
be calculated from R, G and B according to the following equation:
Y = kr R + kg G + kb B (2.1)

where kr, kg and kb are weighting factors with kr + kg + kb = 1.

The color difference information (chroma) can be derived as:

Cb = B – Y (2.2)
Cr = R – Y (2.3)
Cg = G – Y (2.4)
In practice, only three components (Y, Cb and Cr) need to be transmitted for
video coding, because Cg can be derived from Y, Cb and Cr. With the weights
recommended by ITU-R [], kb = 0.114 and kr = 0.299, equations (2.1) through
(2.3) can be rewritten (with scaling factors applied to the chroma
components) as:

Y = 0.299R + 0.587G + 0.114B (2.5)
Cb = 0.564(B – Y) (2.6)
Cr = 0.713(R – Y) (2.7)
R = Y + 1.402Cr (2.8)
G = Y – 0.344Cb – 0.714Cr (2.9)
B = Y + 1.772Cb (2.10)
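As a concrete illustration, a minimal sketch of the forward conversion in C
using equations (2.5) through (2.7); values are kept in floating point here,
whereas real codecs use fixed-point arithmetic and clamp the results to the
legal sample range:

```c
#include <stdio.h>

/* Convert one RGB sample to YCbCr using equations (2.5)-(2.7). */
static void rgb_to_ycbcr(double r, double g, double b,
                         double *y, double *cb, double *cr)
{
    *y  = 0.299 * r + 0.587 * g + 0.114 * b;   /* eq. (2.5) */
    *cb = 0.564 * (b - *y);                    /* eq. (2.6) */
    *cr = 0.713 * (r - *y);                    /* eq. (2.7) */
}

int main(void)
{
    double y, cb, cr;
    rgb_to_ycbcr(255.0, 0.0, 0.0, &y, &cb, &cr);  /* pure red */
    printf("Y=%.1f Cb=%.1f Cr=%.1f\n", y, cb, cr);
    return 0;
}
```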
In practice, images are treated as 2D arrays. Fig. 2.1a shows the red, green
and blue components of a color image in comparison to the chroma components
Cb, Cr and Cg of Fig. 2.1b.
Figure 2.1 Components of an image: (a) R, G, B components, (b) Cb, Cr, Cg
components [2]
2.1.2 Video sampling
The video source is normally a bit stream consisting of a series of frames
or fields in decoding order [1]. Three YCbCr sampling modes are supported
by MPEG-4 and H.264 (Fig. 2.2).
Figure 2.2 Color format: (a) 4:2:0, (b) 4:2:2, (c) 4:4:4
4:2:0 is the most commonly used sampling pattern. The sampling interval of
the luminance samples (Y) is the same as that of the video source, while Cb
and Cr have twice the sampling interval of luminance in both the vertical
and horizontal directions (Fig. 2.2a). In this case, every four luma samples
share one Cb and one Cr sample. As the HVS is less sensitive to color than
to brightness, it is possible to reduce the resolution of the chrominance
components without apparent degradation of image quality. This makes 4:2:0
very popular in current video compression standards [2]. This mode is widely
used in consumer applications such as video conferencing, digital television
and DVD (digital versatile disc) storage. In 4:2:2 mode, Cb and Cr have the
same vertical resolution as luma but half the horizontal resolution
(Fig. 2.2b). This mode is used for high quality color representation. 4:4:4
mode has the same resolution for Y, Cb and Cr in both directions (Fig. 2.2c).
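To make the savings concrete, a small sketch in C that computes the raw
frame size of each sampling mode at 8 bits per sample; the CIF frame
dimensions used in main() are an illustrative assumption:

```c
#include <stdio.h>

/* Bytes per raw frame at 8 bits per sample for the three sampling
   modes, given luma dimensions w x h. */
static long frame_bytes(int w, int h, int mode)
{
    long luma = (long)w * h;
    switch (mode) {
    case 420: return luma + 2 * (luma / 4); /* chroma halved both ways */
    case 422: return luma + 2 * (luma / 2); /* chroma halved horizontally */
    default:  return 3 * luma;              /* 4:4:4 full resolution */
    }
}

int main(void)
{
    /* CIF resolution (352x288) as an example */
    printf("4:2:0 %ld bytes\n", frame_bytes(352, 288, 420)); /* 152064 */
    printf("4:2:2 %ld bytes\n", frame_bytes(352, 288, 422)); /* 202752 */
    printf("4:4:4 %ld bytes\n", frame_bytes(352, 288, 444)); /* 304128 */
    return 0;
}
```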
2.1.3 Redundancy reduction
The basic idea of video compression is to compress an original video
sequence (raw video) into a smaller one using fewer bits. Compression is
achieved by removing redundant information from the raw video sequence.
There are three types of redundancy: temporal, spatial and frequency
domain redundancy.

Spatial and temporal redundancies: Pixel values are not independent, but
are correlated with their neighbors both within the same frame and across
frames [2]. Spatial redundancy refers to the small variation of image
content within a frame (Fig. 2.3); exploiting it, the value of a pixel can
be predicted from the known values of neighboring pixels. In the time
domain, there is little variation in frame content between consecutive
frames, except when the object or content of the video is changing quickly.
This is known as temporal redundancy (Fig. 2.3).
Figure 2.3 Spatial and temporal redundancies [2]
Frequency domain redundancy: The HVS is more sensitive to lower
frequencies [19] than to higher frequencies.
2.1.4 Video codec
The redundancies mentioned above can be removed by different methods.
Temporal and spatial redundancy is usually reduced by motion estimation
(and compensation), while frequency domain redundancy is commonly reduced
by the DCT and quantization. After these operations, entropy coding is
applied to the data to achieve further compression.
Figure 2.4 Common video coding flow [2]
Each functional block of the common video coding flow (Fig. 2.4) is
addressed below in the order in which it occurs in the video coding process.
2.1.5 Motion Estimation
The input to the coding system is an uncompressed video sequence. In
motion estimation, the best match for the current block is found by
selecting an area in a reference frame (a past or future frame) that
minimizes the residual energy (Fig. 2.5). In the motion compensation
process, the chosen candidate region is subtracted from the current block
to form a residual block.
Figure 2.5 Motion estimation procedure
In practice, motion estimation and compensation are usually based on
rectangular blocks (MxN or NxN). The most common block size is 16x16 for
the luminance component and 8x8 for the chrominance components (4:2:0
format). A 16x16 pixel region called a macroblock is the basic data unit
for motion compensation in current video coding standards (the MPEG and
ITU-T series). It consists of one 16x16 luminance sample block, one 8x8 Cb
sample block and one 8x8 Cr sample block (Fig. 2.6).
Figure 2.6 Macroblock representation in 4:2:0 format
Theoretically, the smaller the block size, the better the motion
estimation performance.
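A minimal sketch of this block-matching idea in C: an exhaustive full
search over a small window that minimizes the SAD (sum of absolute
differences) as the measure of residual energy. The frame layout, the
16x16 block size and the search range are illustrative assumptions;
practical encoders use much faster search strategies:

```c
#include <stdlib.h>

/* SAD between a 16x16 block of the current frame at (cx,cy) and a
   candidate block of the reference frame at (rx,ry). Frames are
   stored row-major with the given stride. */
static long sad16(const unsigned char *cur, const unsigned char *ref,
                  int stride, int cx, int cy, int rx, int ry)
{
    long sad = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sad += abs(cur[(cy + y) * stride + cx + x] -
                       ref[(ry + y) * stride + rx + x]);
    return sad;
}

/* Full search in a +/-range window; writes the motion vector that
   minimizes the SAD into (*mvx, *mvy). Assumes the search window
   stays inside the reference frame. */
static void full_search(const unsigned char *cur, const unsigned char *ref,
                        int stride, int cx, int cy, int range,
                        int *mvx, int *mvy)
{
    long best = -1;
    for (int dy = -range; dy <= range; dy++)
        for (int dx = -range; dx <= range; dx++) {
            long sad = sad16(cur, ref, stride, cx, cy, cx + dx, cy + dy);
            if (best < 0 || sad < best) {
                best = sad;
                *mvx = dx;
                *mvy = dy;
            }
        }
}
```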
2.1.6 Motion vectors
A motion vector is a two-component pair (∆x, ∆y) that indicates the
position offset of the current macroblock relative to its best matching
region in the vertical and horizontal directions (Fig. 2.7). The motion
vector is encoded and transmitted together with the residual.
Figure 2.7 Motion vector representation
During decoding, the residual is added to the matching region to recover
the current frame. With the help of the motion vectors, the matching
region can be located in the reference frame.
2.1.7 Block size effect
Fig. 2.9 shows the residual of two successive frames for different block
sizes. Figs. 2.9a and 2.9b are the original frames. Fig. 2.9c is the
residual without motion estimation. Figs. 2.9d, 2.9e and 2.9f are the
MCPE (motion compensated prediction error) based on 16x16, 8x8 and 4x4
block (Fig. 2.8) motion estimation respectively. The residual is the
difference between frame 1 and frame 2: mid-gray in the residual indicates
that the difference is zero, while lighter or darker areas indicate
positive or negative differences. The larger the mid-gray area, the more
redundant information has been removed. To achieve higher compression
efficiency, H.264 adopts smaller block sizes for motion estimation.
However, as the redundant information within the residual is reduced, more
motion vectors must be encoded and transmitted. Therefore H.264 supports
changing the block size dynamically according to the content of the frame.
Figure 2.8 Macroblock partitions for motion estimation and compensation [46].
Figure 2.9 Block size effects on motion estimation, (a) Frame 1, (b) Frame 2, (c) No motion estimation (Inter-frame difference), (d) 16x16 block
(MCPE), (e) 8x8 block (MCPE), (f) 4x4 block (MCPE) [2].
2.1.8 Sub-pixel interpolation
The accuracy of motion compensation is in units of the distance between
pixels. If the motion vector points to an integer-sample position, the
prediction signal consists of the corresponding samples of the reference
picture; otherwise, the corresponding sample is obtained by interpolating
the reference picture at non-integer positions [2]. Non-integer position
interpolation (Fig. 2.10) gives the encoder more choices when searching
for the best matching region compared to integer motion estimation; as a
result, the redundancy in the residual can be reduced further.
Figure 2.10 Sub-pixel interpolation [2]
2.1.9 Discrete cosine transform
After motion estimation, the residual data can be converted into
another domain (the transform domain) to reduce the frequency domain
redundancy. The choice of transform depends on a number of criteria:
(a) data in the transform domain should be decorrelated and compact,
(b) the transform should be reversible, and (c) the transform should be
computationally tractable. The most popular transforms fall into two
categories: block based and image based. Examples of block-based transforms
include the KLT (Karhunen-Loève transform), SVD (singular value
decomposition) and the DCT; examples of image-based transforms include the
DWT (discrete wavelet transform). The DCT is the most popular of these and
is currently employed in most video coding standards.

H.264/AVC employs a smaller transform size than earlier standards. There is
a tradeoff associated with the transform size: a large transform provides
better energy compaction and better preservation of detailed features in a
quantized signal than a small one, but it also introduces more
quantization-induced ringing artifacts.
2.1.10 Quantization
After the DCT, quantization is employed to truncate the magnitudes of the
DCT coefficients in order to reduce the number of bits needed to represent
them. Quantization can be performed on each individual coefficient, which
is known as scalar quantization (SQ), or on a group of coefficients
together, which is known as vector quantization (VQ) [7].
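A minimal sketch of uniform scalar quantization and reconstruction in C;
the step size and the round-to-nearest convention are illustrative
assumptions, not the exact H.264 quantizer:

```c
/* Uniform scalar quantization: map a coefficient to an integer level,
   then reconstruct it. Information is lost in the rounding, which is
   what creates the compression (and the artifacts). */
static int quantize(int coeff, int step)
{
    /* round to nearest for positive and negative values */
    return (coeff >= 0) ? (coeff + step / 2) / step
                        : -((-coeff + step / 2) / step);
}

static int dequantize(int level, int step)
{
    return level * step;
}
```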
2.1.11 Zigzag scan
After quantization, most of the non-zero DCT coefficients are located
close to the upper left corner of the matrix. The zigzag scan (Fig. 2.11)
rearranges the coefficients so that most of the zeros are grouped together
in the output data stream. In the following run length coding stage, these
strings of zeros can be encoded with very few bits.
Figure 2.11 Zigzag scan [2].
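A minimal sketch of the scan in C, using the zigzag order defined for 4x4
blocks in H.264 frame coding; larger block sizes follow the same principle:

```c
/* Zigzag order for a 4x4 block of coefficients, listed as raster-scan
   indices into the block. */
static const int zigzag4x4[16] = {
    0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15
};

/* Reorder a 4x4 coefficient block (row-major) into zigzag order so
   that trailing zeros cluster at the end of the output array. */
static void zigzag_scan(const int block[16], int out[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = block[zigzag4x4[i]];
}
```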
2.1.12 Run length encoding
Run length coding uses a series of (run, level) pairs to represent a
string of data. For example, for the input data array {2, 0, 0, 0, 5,
0, 3, 7, 0, 0, 0, 1, …} the output (run, level) pairs are (0, 2), (3, 5),
(1, 3), (0, 7), (3, 1), … Here run is the number of zeros before the next
non-zero value, and level is the value of that non-zero data.
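A minimal sketch of this encoding in C, reproducing the example above;
end-of-block handling and the coefficient alphabet of an actual standard
are omitted:

```c
#include <stdio.h>

/* Emit (run, level) pairs: run is the number of zeros before each
   non-zero value, level is the value itself. Trailing zeros after the
   last non-zero value are not emitted here. */
static void run_length_encode(const int *data, int n)
{
    int run = 0;
    for (int i = 0; i < n; i++) {
        if (data[i] == 0) {
            run++;
        } else {
            printf("(%d, %d) ", run, data[i]);
            run = 0;
        }
    }
    printf("\n");
}

int main(void)
{
    int data[] = {2, 0, 0, 0, 5, 0, 3, 7, 0, 0, 0, 1};
    run_length_encode(data, 12); /* prints (0, 2) (3, 5) (1, 3) (0, 7) (3, 1) */
    return 0;
}
```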
2.1.13 Entropy coding
The last stage in Fig. 2.4 is entropy coding. The entropy encoder
compresses the quantized data into a smaller number of bits for
transmission. This is achieved by giving each value a unique code word
based on its probability in the data stream: the higher the probability of
a value, the fewer bits are assigned to its code word. The most commonly
used entropy coders are the Huffman encoder and the arithmetic encoder,
though for applications requiring fast execution, simple run length
encoding (RLE) has proven very effective [7].
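The link between probability and code length can be made concrete with the
Shannon entropy, which lower-bounds the average number of bits per symbol
that any entropy coder can achieve. A minimal sketch in C with an assumed,
purely illustrative symbol distribution:

```c
#include <math.h>
#include <stdio.h>

/* Shannon entropy H = -sum(p * log2(p)) in bits per symbol: the
   theoretical lower bound on the average code word length. */
static double entropy_bits(const double *p, int n)
{
    double h = 0.0;
    for (int i = 0; i < n; i++)
        if (p[i] > 0.0)
            h -= p[i] * log2(p[i]);
    return h;
}

int main(void)
{
    /* A skewed (illustrative) distribution: frequent symbols deserve
       short code words, rare ones long code words. */
    double p[] = {0.5, 0.25, 0.125, 0.125};
    printf("entropy = %.3f bits/symbol\n", entropy_bits(p, 4)); /* 1.750 */
    return 0;
}
```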
Two advanced entropy coding methods, known as CAVLC (context-based
adaptive variable length coding) and CABAC (context-based adaptive binary
arithmetic coding), are adopted by H.264/AVC. These two methods improve
coding efficiency compared to the methods applied in previous standards.
2.2 MPEG and H.26x
2.2.1 ISO/IEC, ITU-T and JVT
ISO/IEC and ITU-T (International Telecommunication Union Telecommunication
Standardization Sector) are the two main international organizations
recommending coding standards for video, audio and their combination. The
H.26x family of standards is designed by ITU-T. As the ITU
Telecommunication Standardization Sector, ITU-T is a permanent organ of the
ITU responsible for studying technical, operating and tariff questions and
issuing Recommendations on them with a view to standardizing
telecommunications on a worldwide basis [1]. H.261 is the first version of
the H.26x series, whose development started in 1984. In the following
years, H.262, H.263, H.263+, H.263++ and H.264 were released by ITU-T [1].
The MPEG (Moving Picture Experts Group) family of standards includes
MPEG-1, MPEG-2 and MPEG-4 [8], formally known as ISO/IEC 11172,
ISO/IEC 13818 and ISO/IEC 14496. MPEG was originally the name given to the
group of experts that developed these standards. The MPEG working group
(formally ISO/IEC JTC1/SC29/WG11) is part of JTC1, the joint ISO/IEC
technical committee on information technology. The Joint Video Team (JVT)
consists of members from ISO/IEC JTC1/SC29/WG11 (MPEG) and ITU-T SG16 Q.6
(VCEG); it published the H.264 Recommendation / MPEG-4 Part 10 standard [1].
2.2.2 H.261
H.261, first developed by ITU-T in 1990, is a video compression standard
targeting low bit rate real time applications (down to 64 kbit/s) such as
visual telephone service. Its coding is based on the DCT, VLC entropy
coding and a simple motion estimation technique for reducing the
redundancy of the video information.
2.2.3 MPEG-1
The MPEG-1 standard, published in 1992, was designed to produce
reasonable quality images and audio at low bit rates. MPEG-1 provides a
resolution of 352x240 (SIF) for NTSC or 352x288 for PAL at 1.5 Mbit/s. The
target applications are CD-ROM, video CD and streaming media applications
such as video over digital telephone networks and video on demand (VOD).
The picture quality is roughly equal to that of VHS tape. MPEG-1 can also
be encoded at bit rates as high as 4-5 Mbit/s. MPEG-1 also specified the
compression of audio signals, in layers simply called layer-1, -2 and -3;
layer-3 is now very popular for digital music distribution over the
Internet, known as MP3.
2.2.4 H.262 and MPEG-2
The MPEG-2 standard was established by ISO/IEC in 1994. Its purpose is to
support higher data rates and better video quality than MPEG-1. The coding
technique of MPEG-2 is the same as that of MPEG-1 but with a higher picture
resolution of 720x486. The unique feature of MPEG-2 is its layered
structure, which supports a scalable video system: a video stream can be
decoded to videos of different quality according to the condition of the
network and the customer requirements. Field and frame picture structures
make the standard compatible with interlaced video. For consistency among
the standards, MPEG-2 is also backward compatible with MPEG-1, meaning an
MPEG-2 player can play back MPEG-1 video without any modification. This
standard was also adopted by ITU-T as H.262.
2.2.5 H.263/H.263+/H.263++
H.263 (1995) [9] is an improvement of H.261. Compared to the former
standards, H.263 achieves better picture quality and a higher compression
rate by using half pixel interpolation and more efficient VLC coding.
H.263 version 2 (H.263+) and H.263 version 3 (H.263++) add more options to
the standard on the basis of H.263, achieving higher coding efficiency,
more flexibility, scalability support and error resilience support.
2.2.6 MPEG-4
MPEG-4 (ISO/IEC 14496) became an international standard in 1999 [8]. Its
basic coding theory remains the same as that of previous MPEG standards,
but it is more network oriented and better suited to broadcast, interactive
and conversational environments. MPEG-4 introduced the concept of
'objects': a video object in a scene is an entity that a user is allowed to
access (seek, browse) and manipulate (cut and paste). It serves bit rates
from 2 kbit/s for speech and 5 kbit/s for video up to 5 Mbit/s for
transparent quality video and 64 kbit/s per channel for CD quality
audio [8].
2.2.7 MPEG-4 part-10/H.264
The ITU-T Video Coding Experts Group (VCEG) and ISO/IEC MPEG jointly
developed the newest standard, H.264/AVC (also known as MPEG-4 Part 10).
The motivation for this standard came from growing multimedia services and
the popularity of HDTV, which need a more efficient coding method. At the
same time, various transmission media, especially low speed media (cable
modem, xDSL or UMTS), also called for a significant enhancement of coding
efficiency. By introducing several new techniques, H.264/AVC aims to
increase the compression rate significantly (saving up to 50% bit rate at
the same picture quality compared to MPEG-2) while transmitting high
quality video at both high and low bit rates. The standard increases
resilience to errors by supporting flexibility in the coding and
organization of coded data, and its network abstraction layer allows the
H.264 bit stream to be transported over different networks. The increase in
coding efficiency and coding flexibility comes at the expense of increased
complexity compared to the other standards. These features are discussed in
chapter 3.
Figure 2.12 Progression of the ITU-T recommendations and MPEG standards
(timeline 1984-2006: ITU-T standards H.261, H.263, H.263+, H.263++; MPEG
standards MPEG-1, MPEG-4; joint ITU-T/MPEG standards H.262/MPEG-2 and
H.264/MPEG-4 AVC)
Chapter 3
OVERVIEW OF H.264/AVC STANDARD
As broadband wired and wireless communication booms around the world,
streaming video has become one of the most important applications in both
the internet and telecom industries. 3G wireless service has been launched
throughout the world, and enhanced data services such as HSDPA (high-speed
downlink packet access) are being introduced with bandwidths of more than
384 kbps. Multimedia streams including video and audio are thus expected to
be delivered to end users. However, the total bandwidth is still limited,
and the cost to the end user is proportional to the reserved bit rate or
the number of bits transmitted on the data link. At the same time, since
the harsh transmission environment of wireless communications (distance
attenuation, shadow fading and multi-path fading) can introduce
unpredictable packet loss and errors during transmission, compression
efficiency and error resilience are the main requirements for a video
coding standard to succeed in the future.
Several image and video coding standards are currently in wide use, such
as JPEG, JPEG2000, MPEG-2 and MPEG-4 [2]. In 2003, H.264/AVC was introduced
with significant enhancements in both compression efficiency and error
resilience. Compared with former video coding standards such as MPEG-2 and
MPEG-4 Part 2, it saves approximately 50% in bit rate [10] and provides
important characteristics such as error resilience, stream switching, and
fast forward/backward playback. It is believed to be the most competitive
video coding standard of this new era. However, the improvement in
performance comes at the expense of an increase in computational
complexity, which requires higher speed in both hardware and software.
H.264/AVC targets applications such as video conferencing (full duplex) and
video storage or broadcasting (half duplex), with enhanced compression
efficiency as well as network friendliness. The scope of H.264/AVC covers
two layers: the network abstraction layer (NAL) and the video coding layer
(VCL) (Fig. 3.1). While the NAL provides better support for video
transmission through a wide range of network environments, the VCL focuses
on enhancing the coding efficiency.
Figure 3.1 H.264/AVC layer structure [11]
This chapter investigates the features that allow H.264/AVC to achieve its
performance improvement over former standards, with the improvements to the
video coding layer addressed in more detail. Before discussing the
technical features of H.264/AVC, some important terminology should be
introduced.
Video coding standards commonly use a hierarchical syntax. A video
sequence is divided into groups of pictures; a picture is divided into
slices; a slice is divided into macroblocks; and a macroblock is divided
into blocks (Fig. 3.2). In H.264/AVC, a block can additionally be further
divided into sub-blocks.
Figure 3.2 Hierarchical syntax
Coded picture: A coded picture in this standard refers to a field (of
interlaced video) or a frame (of progressive or interlaced video)
(Fig. 3.3). Each coded frame has a unique frame number, which is signaled
in the bit stream; the frame number is not necessarily the same as the
decoding order of the frame. For interlaced frames, each field has an
associated picture order count, used to indicate the decoding order
between the two fields.
Figure 3.3 Progressive and interlaced frames [2].
Each previously coded picture can be used as a reference picture for
future decoded pictures. One notable feature is that the reference
pictures are managed in one or two lists (list 0 and list 1). The
macroblock is the basic data unit for video coding operations. A set of
macroblocks is grouped into a slice in raster scan order, and a frame may
be split into one or more slices (Fig. 3.4).
Figure 3.4 Subdivision of a picture into slices [10].
For each slice, the macroblocks within it are coded independently of
those within other slices. Five types of slices are defined in H.264/AVC:

I slice: All macroblocks in this slice are I macroblocks. They are coded
without reference to previously coded pictures, but may use the decoded
samples within the same slice (the current picture) as a reference (intra
prediction).

P slice: Macroblocks in this type of slice can be P macroblocks or I
macroblocks. A P macroblock is predicted from one previously decoded
picture in list 0.

B slice: In addition to the coding types available in a P slice,
macroblocks of a B slice can also be coded using inter prediction with two
reference pictures per predicted block (one from list 0 and/or one from
list 1) that are combined using a weighted average.

SP slice: A so-called switching P slice, coded such that efficient
switching between different video streams becomes possible without the
large number of bits needed for an I slice [12].

SI slice: A so-called switching I slice that allows an exact match of a
macroblock in an SP slice for random access or error recovery purposes.
The last two slice types are new in H.264/AVC; the first three are
similar to those used in earlier standards.

Profile: A profile defines the set of coding algorithms or functions that
a coding standard may use. H.264 defines the baseline profile (lower
capability plus error resilience), main profile (high compression
quality), extended profile (added features for efficient streaming) and
high profiles (Fig. 3.5).

Level: The performance limits of codecs are defined as a collection of
levels, each of which places restrictions on the configuration of the
coding process, such as decoding speed, sample rate and number of blocks
per second.
Figure 3.5 The specific coding parts of profile in H.264 [42].
3.1 Network abstraction layer
The NAL (network abstraction layer) is designed to provide friendly
transmission of video data through different network environments. The
coded video data is packetized into NAL units in order to support most
existing packet-switched network environments. Each NAL unit is a packet
containing an integer number of bytes. The first byte of each NAL unit is
a header indicating the type of data in the unit, and the remaining bytes
contain the payload data indicated by the header [11]. For more detailed
information about the NAL, refer to [11].
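As an illustration of the one-byte header, a minimal sketch in C that
unpacks its three fields (forbidden_zero_bit, nal_ref_idc and
nal_unit_type) as defined by the H.264 syntax:

```c
/* Unpack the one-byte NAL unit header defined by H.264:
   bit 7     : forbidden_zero_bit (must be 0 in a valid stream)
   bits 6..5 : nal_ref_idc (non-zero if the unit is used as a reference)
   bits 4..0 : nal_unit_type (e.g. 1 = non-IDR slice, 5 = IDR slice,
               7 = sequence parameter set, 8 = picture parameter set) */
struct nal_header {
    int forbidden_zero_bit;
    int nal_ref_idc;
    int nal_unit_type;
};

static struct nal_header parse_nal_header(unsigned char b)
{
    struct nal_header h;
    h.forbidden_zero_bit = (b >> 7) & 0x1;
    h.nal_ref_idc        = (b >> 5) & 0x3;
    h.nal_unit_type      =  b       & 0x1f;
    return h;
}
```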
3.2 Video coding layer
Modifications have been made in the video coding layer of H.264 in order
to achieve significantly better compression efficiency than previous
standards. The basic encoding structure of H.264/AVC is shown in Fig. 3.6.
First the video source is divided into blocks of luma and chroma samples.
Then motion estimation and prediction are employed to exploit the temporal
and spatial redundancies. Finally, transform coding, quantization and
entropy coding are applied in series to generate the output bit stream,
which can be transmitted through networks or stored on optical or magnetic
storage devices.
Figure 3.6 H.264 encoder [45]
The decoding flow consists of a series of operations that reverse the
encoding process (Fig. 3.7). The only operation added to the decoding flow
is the loop filter (deblocking filter), whose purpose is to minimize the
block distortions introduced by block-based transforms and motion
estimation. As in existing standards, H.264/AVC defines only the video
decoding procedure: by imposing a collection of restrictions on the
decoding process (such as restrictions on the bit stream and syntax), any
encoding process that produces a bit stream decodable by the standard
decoding process is an acceptable encoder. In this way, developers have
great flexibility in designing encoders to suit different applications
with various requirements (such as compression quality, implementation
cost, time to market, etc.) [11].
Figure 3.7 H.264 decoder [46].
3.2.1 Motion estimation and compensation for inter frames
The key features added to the motion estimation and compensation part of
H.264/AVC are (1) variable block-size motion compensation with small block
sizes, (2) quarter-pixel motion estimation accuracy, and (3) multiple
reference picture selection.

1. Variable block-size motion compensation with small block sizes:

In previous standards, motion estimation is based on a 16x16 macroblock
for the luma component and an 8x8 block for the chroma components in 4:2:0
format. In H.264/AVC, several block sizes are supported for motion
compensation. The luminance component (Y) of each macroblock can be
partitioned in four ways: one 16x16 macroblock, two 16x8 rectangular
blocks, two 8x16 rectangular blocks or four 8x8 blocks (Fig. 3.8). If the
8x8 mode is chosen, each 8x8 block may be further divided in four ways:
one 8x8 block, two 8x4 sub-blocks, two 4x8 sub-blocks or four 4x4
sub-blocks (Fig. 3.8).
Figure 3.8 Motion compensation accuracy [45]
The block size is chosen by transmitting one additional syntax element
for each 8x8 partition, which specifies whether the corresponding 8x8
partition is divided further. The partition strategy can be viewed as a
tree structure. The partitions for the chroma components (Cb, Cr) in 4:2:0
format are done in the same manner, except that the size of each partition
is half of the luma partition in both the horizontal and vertical
directions (16x8 in luma corresponds to 8x4 in chroma, and 8x4 in luma
corresponds to 4x2 in chroma) [2]. The smaller the block into which a
region is split, the less energy is left within the residual. By using the
combination of seven different block sizes, bit rate savings of up to 12%
can be achieved compared to using only a 16x16 block size [13].
2. Quarter-pixel motion estimation accuracy:

Most existing standards support motion estimation accuracy up to half a
pixel. In H.264/AVC, the maximum accuracy is enhanced to a quarter pixel.
Each partition or sub-macroblock partition in an inter-coded macroblock is
predicted from an area of the same size in a reference picture. The offset
between the two areas (the motion vector) has quarter-sample resolution
for the luma component and one-eighth-sample resolution for the chroma
components. Since luma and chroma samples at sub-sample positions do not
exist in the reference picture, they must be created by interpolation from
nearby coded samples.
Figure 3.9 shows the quarter-pixel interpolation of a 4x4 luma block. The
gray dots labeled with upper case letters indicate the integer-position
samples; the white dots labeled with lower case letters indicate the half-
and quarter-pixel samples. First the half-sample positions are obtained by
applying a 6-tap filter with tap values (1, -5, 20, 20, -5, 1)/32.
Quarter-sample positions are then obtained by averaging samples at integer
and half-sample positions. In practice, the motion vectors (MV) of a block
use one or two extra bits to indicate whether the motion estimation is
integer, half-pixel or quarter-pixel. Quarter-pixel accuracy gives about
20% bit rate savings [14] as well as more accurate motion representation
compared to integer-pixel spatial accuracy.
Figure 3.9 Quarter sample luma interpolation [2]
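As an illustration, a minimal sketch of the luma interpolation in C: the
6-tap half-sample filter applied to six consecutive integer samples, and
the quarter-sample average. The (v + 16) >> 5 rounding and the clip to the
8-bit range follow the standard; boundary handling and the
two-dimensional filtering order are omitted for brevity:

```c
/* Clip a value to the 8-bit sample range. */
static int clip255(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* Half-sample value between p[2] and p[3], where p points at six
   consecutive integer samples. Implements the 6-tap filter
   (1, -5, 20, 20, -5, 1) with rounding and normalization by 32. */
static int half_sample(const unsigned char p[6])
{
    int v = p[0] - 5 * p[1] + 20 * p[2] + 20 * p[3] - 5 * p[4] + p[5];
    return clip255((v + 16) >> 5);
}

/* Quarter-sample value: average of two neighboring samples (integer
   or half positions) with upward rounding. */
static int quarter_sample(int a, int b)
{
    return (a + b + 1) >> 1;
}
```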
3.2.2 Multiple reference pictures selection
The H.264/AVC standard gives the encoder the flexibility to select from a
large number of decoded reference pictures. This flexibility increases the
memory requirements of both the encoder and the decoder, but enhances
compression efficiency at the same time. For a P macroblock, the reference
picture can be chosen from multiple previously decoded pictures
(Fig. 3.10). Therefore, in addition to the motion vector, a reference
index parameter ∆ (which indicates which picture is referenced) is
transmitted. The reference index parameter is transmitted for each
motion-compensated 16x16, 16x8, 8x16 or 8x8 luma block; motion
compensation for regions smaller than 8x8 uses the same reference index
for the prediction of all blocks within the 8x8 region.
Figure 3.10 Multiple reference frames and generalized bi-predictive frames [10].
Because of the increased complexity of motion prediction, the H.264/AVC
standard employs two distinct lists of reference pictures (list 0 and
list 1). For a P slice, only list 0 is used to store the reference
pictures, whereas a B slice needs both list 0 and list 1. For detailed
reference picture management, refer to [1] and [2].

Motion compensated prediction for a B slice works in the same manner
except that it is bi-directional. In B slices, four types of inter-picture
prediction are supported: list 0, list 1, bi-predictive and direct
prediction. In the bi-predictive mode, the prediction signal is formed as
a weighted average of motion-compensated list 0 and list 1 predictions.
The direct prediction mode is inferred from previously transmitted syntax
elements and can be list 0, list 1 or bi-predictive.

Multiple reference pictures yield about 5-20% higher coding efficiency in
the new standard [14] compared to former standards that use only one
reference frame.
3.2.3 Intra prediction
Intra prediction allows the current macroblock to be predicted from
previously decoded samples within the same slice. The encoder can switch
between intra and inter prediction dynamically according to the content of
the frame. Directional spatial prediction for intra coding improves the
quality of the prediction signal (Fig. 3.11).
Figure 3.11 Intra prediction in H.264 [45]
Luma intra prediction uses either a single prediction for the entire
16x16 macroblock or 16 individual predictions of 4x4 blocks; in the high
profiles there is also 8x8 intra prediction. There are 9 intra 4x4
prediction modes (DC plus 8 directional) and 4 intra 16x16 prediction
modes (vertical, horizontal, DC, planar) for the luma component. For the
chroma components, four 8x8 intra prediction modes (vertical, horizontal,
DC, planar) are supported, and both chroma components of the same
macroblock (Cb and Cr) use the same prediction mode. In addition to the
intra prediction modes above, another intra coding mode (I_PCM) is used
for some special cases: I_PCM sends the image samples directly, without
prediction or transformation, which guarantees a limit on data expansion
when coding noise-like content. The 16x16 and 4x4 intra prediction
direction modes are shown in Figs. 3.12 and 3.13 respectively.
Figure 3.12 16x16 intra prediction directions
For regions with less spatial detail (flat regions), H.264 supports 16x16
intra coding. The prediction mode for each block is coded efficiently by
assigning shorter symbols to more likely modes, where the probability of
each mode is determined from the modes used to code the surrounding
blocks.
Figure 3.13 4x4 intra prediction directions
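A minimal sketch of three of the nine 4x4 luma prediction modes in C
(vertical, horizontal and DC), predicting a block from the reconstructed
samples above and to its left; neighbor-availability checks and the
remaining directional modes are omitted:

```c
/* Predict a 4x4 block from its reconstructed neighbors: top[0..3] are
   the samples above the block, left[0..3] the samples to its left.
   Three of the nine H.264 4x4 luma modes are shown. */
enum { MODE_VERTICAL, MODE_HORIZONTAL, MODE_DC };

static void intra_predict_4x4(unsigned char pred[4][4],
                              const unsigned char top[4],
                              const unsigned char left[4],
                              int mode)
{
    int x, y, sum;
    switch (mode) {
    case MODE_VERTICAL:             /* copy the row above downwards */
        for (y = 0; y < 4; y++)
            for (x = 0; x < 4; x++)
                pred[y][x] = top[x];
        break;
    case MODE_HORIZONTAL:           /* copy the left column rightwards */
        for (y = 0; y < 4; y++)
            for (x = 0; x < 4; x++)
                pred[y][x] = left[y];
        break;
    default:                        /* DC: mean of the eight neighbors */
        sum = 0;
        for (x = 0; x < 4; x++)
            sum += top[x] + left[x];
        for (y = 0; y < 4; y++)
            for (x = 0; x < 4; x++)
                pred[y][x] = (unsigned char)((sum + 4) >> 3);
        break;
    }
}
```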
3.2.4 Transform and quantization
H.264/AVC employs an integer spatial transform, primarily 4x4 in shape
(Fig. 3.14), as opposed to the usual floating point 8x8 DCT specified with
rounding error tolerances used in earlier standards. H.264 uses three
transforms depending on the type of residual data to be coded: a transform
for the 4x4 array of luma DC coefficients in intra macroblocks predicted
in 16x16 mode, a transform for the 2x2 array of chroma DC coefficients,
and an integer DCT for all other 4x4 blocks of residual data.
Figure 3.14 Transform coding [45]
The characteristics of the transform used in H.264/AVC are as follows:
1. For 16x16 intra prediction, the transform is applied in two stages: a
4x4 integer DCT in the first stage, followed by a Hadamard transform
applied to the DC components of the first stage.
2. The 4x4 block transform is separable.
3. Integer transform: accuracy mismatch between encoder and decoder is
eliminated.
4. It consists of only additions and shifts, so it is easy to implement
(see the sketch after the quantization characteristics below).
5. The even and odd rows of the transform matrix have different norms.
6. The small transform size reduces ringing artifacts in a frame [15].
The characteristics of quantization used in H.264/AVC are as follows:
1. The scaling part of the transform is integrated into the quantizer.
2. Logarithmic step size control.
3. Extended range of step sizes: QP is in the range 0-51.
4. Smaller step sizes for chroma than for luma.
5. Reconstruction requires just one multiply, one add and one shift.
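As an illustration of transform characteristic 4 above, a minimal sketch
of the 4x4 forward core transform in C, using the integer matrix with rows
(1, 1, 1, 1), (2, 1, -1, -2), (1, -1, -1, 1), (1, -2, 2, -1); the
norm-correcting scaling that the standard folds into the quantizer is
deliberately omitted here:

```c
/* Forward 4x4 integer core transform of H.264, using only additions
   and shifts (multiplication by 2 is a left shift). The post-scaling
   that compensates for the different row norms is folded into the
   quantization step and is not shown. */
static void forward_transform_4x4(const int in[4][4], int out[4][4])
{
    int tmp[4][4];
    int i;

    /* Transform rows. */
    for (i = 0; i < 4; i++) {
        int s03 = in[i][0] + in[i][3], d03 = in[i][0] - in[i][3];
        int s12 = in[i][1] + in[i][2], d12 = in[i][1] - in[i][2];
        tmp[i][0] = s03 + s12;
        tmp[i][1] = 2 * d03 + d12;
        tmp[i][2] = s03 - s12;
        tmp[i][3] = d03 - 2 * d12;
    }

    /* Transform columns. */
    for (i = 0; i < 4; i++) {
        int s03 = tmp[0][i] + tmp[3][i], d03 = tmp[0][i] - tmp[3][i];
        int s12 = tmp[1][i] + tmp[2][i], d12 = tmp[1][i] - tmp[2][i];
        out[0][i] = s03 + s12;
        out[1][i] = 2 * d03 + d12;
        out[2][i] = s03 - s12;
        out[3][i] = d03 - 2 * d12;
    }
}
```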
3.2.5 Deblocking filter
The deblocking filter is introduced in the H.264/AVC standard to minimize
the block distortion caused by the compression techniques described above.
By controlling the filtering strength with several parameters, block edges
can be smoothed and the appearance of the decoded frames enhanced. The
deblocking filter is an in-loop filter, not a form of post processing: for
a post filter, the input is a completely reconstructed frame, but for the
in-loop filter, the input is the current MB, and the boundaries of each
decoded macroblock are filtered immediately. In the encoder, the
deblocking filter is placed after the inverse transform and before the
frame store; in the decoder, the filter is located after the
reconstruction of the frame, before display. The deblocking filter is
explained in detail in chapter 5.
3.2.6 Entropy coding
An Exp-Golomb code is used universally for all syntax elements except
transform coefficients. The following two entropy coding algorithms are
used in the H.264 standard:

1. Context Adaptive Variable Length Coding (CAVLC)
   a. There is no end-of-block symbol; instead, the number of coefficients
      is decoded.
   b. Coefficients are scanned backwards, and contexts are built depending
      on the transform coefficients.
   c. Transform coefficients are coded with the following elements: the
      number of non-zero coefficients, the levels and signs of all non-zero
      coefficients, the total number of zeros before the last non-zero
      coefficient, and the run of zeros before each non-zero coefficient.
   d. The VLC table to use is chosen adaptively based on the number of
      coefficients in the neighboring blocks.

2. Context Adaptive Binary Arithmetic Coding (CABAC)
   a. An overview of CABAC is shown in Fig. 3.15.
   b. Adaptive probability models are used for most symbols.
   c. Symbol correlations are exploited by using contexts.
   d. Binary decisions are discriminated by their positions in the binary
      sequence.
   e. Probability estimation is realized via a lookup table.
Figure 3.15 CABAC overview [45]
3.3 Conclusions
H.264/AVC provides significant enhancement in both compression efficiency
and error resilience. Compared with earlier video coding standards such as
MPEG-2 and MPEG-4 Part 2, it saves more than 40% in bit rate [14] and
provides important characteristics such as error resilience, stream
switching, and fast forward/backward playback. It is believed to be the
most competitive video coding standard of this new era. However, the
improvement in performance also brings an increase in computational
complexity, which requires higher speed in both hardware and software.
Chapter 4
REVIEW OF POSTPROCESSING TECHNIQUES
4.1 Need for postprocessing
At very low bit rate coding, artifacts are visible in the decoded frames
of most video coding standards [3]. As these artifacts are visually
unpleasant for the viewer, a method is needed to remove them. Some of
today's real time and even offline applications demand higher bandwidth
than the channel can accommodate; examples are video telephony,
videoconferencing, and video streaming over the internet. The signal
source, compression method and coding bit rate normally influence the
perceptual quality of compressed images and video [16]. In general, the
lower the bit rate, the more severe the coding artifacts that manifest in
the reconstructed video. The compression method determines the type of
artifacts that occur: BDCT (block discrete cosine transform) based coding
schemes introduce blocking artifacts in flat regions and ringing artifacts
along object edges [17] at low bit rate coding, while wavelet based coding
schemes introduce blurring and ringing artifacts [18]. If the coding
artifacts are removed from the reconstructed video, its visual quality
improves significantly.
4.1.1 Preprocessing
Two techniques are widely used to remove the coding artifacts at low
bit rate coding. One way is to use pre-processing techniques at the encoder
side. Pre-processing techniques have been widely used in modern speech
and audio coding [19]. Many pre-processing techniques have been proposed
[20] which remove unnoticeable details from the source signal so that less
information has to be coded. Recently, different image transforms like
interleaved block transform [21], [22], the lapped transform [23], and the
combined transform [24] have been proposed to avoid blocking artifacts. But
each of them requires its own coding scheme, which restricts its application
in commercial coding products compliant with the existing standards.
4.1.2 Postprocessing
An efficient and widely used way to remove the artifacts is post-
processing. Post-processing gives better video quality without an increase
in bit rate or any modification of the encoding procedure. This is probably
why it has been an active area of research in recent years. Post-
processing techniques are employed at the decoder side. These techniques
can be broadly classified into two categories: image enhancement and
image restoration.
Algorithms based on image enhancement improve the perceived
quality subjectively. The artifact structure and the HVS are taken into
account in the design of image enhancement methods. The image
enhancement approach aims at smoothing visible artifacts instead of
restoring each pixel to its original value. One typical example is the
application of filtering along block boundaries to reduce blockiness [24].
Algorithms based on image restoration formulate post-processing as
an image recovery problem. Reconstruction is performed based on a priori
knowledge of the distortion model and the observed data at the decoder.
Several classical image restoration techniques, including CLS (constrained
least squares), POCS (projection onto convex sets), and MAP (maximum a
posteriori) have been used to alleviate compression artifacts. The
computational complexity of these methods is another issue for applications
that require real time processing.
4.2 Causes for blocking and ringing artifacts
As seen in section 4.1, at very low bit rate coding artifacts appear in
decoded video or images. BDCT based coding schemes are used in most
recent video coding standards like MPEG-2, MPEG-4, and H.264/AVC. At
low bit rate coding, BDCT based coding scheme introduces artifacts,
prominent among them are blocking and ringing artifacts [17].
4.2.1 Blocking artifacts
Blocking artifacts are the grid noise [25] along the block boundaries in
relatively homogeneous areas (Fig. 4.1). The total error introduced into an
individual pixel is a sum of functions of the quantization errors of the
DCT coefficients. Due to the complexity of the HVS, the perceived
distortion is not directly proportional to the absolute quantization error; it
also depends on the local, global, spatial, and temporal characteristics of
the video sequence. Unfortunately, coding a block as an independent unit
ignores the possibility that the correlation of pixels may extend beyond the
borders of a block into adjacent blocks, thereby leading to border
discontinuities. Coarse quantization of transform coefficients is the major
contributor to blocking artifacts. The use of different quantization step
sizes for adjacent blocks is another contributing factor. However, because
block based transform coding is used, ensuring similar quantization step
sizes in adjacent blocks does not necessarily eliminate the blocking effect [3].
Figure 4.1 Blocking Artifacts at low bit rate coding
4.2.2 Ringing artifacts
Ringing effects appear as spurious oscillations along major object
edges at low bit rates. The ringing effect is most evident along high contrast
edges in areas of generally smooth texture in the reconstruction. The DCT
basis images are not attuned to the representation of diagonal edges and
features [26]. Consequently, more of the higher frequency basis images are
required to satisfactorily represent diagonal edges or significantly diagonally
oriented features. At low bit rate coding, due to coarse quantization, most of
the contributions made by the higher frequency basis images are reduced to
zero. As a result, the ringing effect occurs along edges. The poor energy compacting
properties of the DCT for blocks containing diagonal edges also extend to
the vertical and horizontal edges.
4.3 Deblocking filters
As the proposed work is to reduce the deblocking filter complexity,
only the deblocking filter will be discussed in detail. Although the
deblocking filters perform well in improving the objective and subjective
quality of output video frames, they are usually computationally intensive.
There are a number of deblocking algorithms proposed for reducing the blocking
artifacts in BDCT based compressed images with minimal smoothing of true
edges. They can be classified into regression-based algorithm [27], wavelet
based algorithms [28], anisotropic diffusion based algorithm [29], weighted
sum of pixels across block boundaries based algorithm [30][31][32],
iterative algorithms based on POCS [33][34][35], and adaptive algorithms
[36]. An interesting approach of recasting the deblocking problem into a
denoising problem by injecting uniform random noise was also proposed in
[37]. While these algorithms operate in the spatial domain, algorithms that
work in DCT transformed domain were proposed in [38] [39] and [40].
Among the algorithms proposed, three key classes are discussed in
the following sections: those based on POCS, on weighted sums of pixels
across block boundaries, and on adaptive filters.
4.3.1 POCS based iterative algorithms
This class of algorithms originates from the image restoration
algorithm proposed in [34]. A number of constraints, two in most cases, are
imposed on an image to restore it from its corrupted version. After defining
the transformations between the constraints, the algorithm starts at an
arbitrary point in one of the sets and projects iteratively among them until
convergence. In [33], MSE (mean square error) is used as a metric of
closeness between two consecutive projections. Convergence can be
regarded as achieving an MSE below a certain threshold value. If the
constraints are convex sets, it is claimed in [34] that convergence is
guaranteed if the intersection of the sets is non-empty. This approach was
first applied to the deblocking of images in [33]. In this
paper, the constraint sets chosen are the frequency band limit in both vertical
and horizontal directions (known as filtering constraint) and the quantization
intervals of the transform coefficients (referred to as quantization
constraint). In the first step, an image is band limited by applying a LPF
(low pass filter) on it. After that, the image is transformed to obtain
transform coefficients, which are then subjected to the quantization
constraint. The coefficients lying outside of the quantization interval are
mapped back into the interval. For example, a coefficient can be clipped to
the interval's minimum or maximum value if it lies outside the interval. This two-
step process will be done iteratively until a convergence is reached. The
authors claimed that the algorithm converges after about 20 iterations in
their experiments.
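To fix ideas, here is a minimal, hypothetical 1-D sketch of the two projections of [33]: band limiting with a low pass filter, followed by clipping the transform coefficients back into their quantization intervals. A toy 8-point DCT and invented data stand in for the 2-D image case; this illustrates only the iteration structure, not the algorithm of [33] itself.

```c
#include <stdio.h>
#include <math.h>

#define N 8
#define PI 3.14159265358979323846

/* Orthonormal 8-point DCT-II and its inverse. */
static void dct(const double *x, double *X) {
    for (int k = 0; k < N; k++) {
        double s = 0.0;
        for (int n = 0; n < N; n++)
            s += x[n] * cos(PI * (n + 0.5) * k / N);
        X[k] = s * (k == 0 ? sqrt(1.0 / N) : sqrt(2.0 / N));
    }
}

static void idct(const double *X, double *x) {
    for (int n = 0; n < N; n++) {
        double s = 0.0;
        for (int k = 0; k < N; k++)
            s += X[k] * (k == 0 ? sqrt(1.0 / N) : sqrt(2.0 / N))
                      * cos(PI * (n + 0.5) * k / N);
        x[n] = s;
    }
}

int main(void) {
    double x[N] = {52, 54, 55, 56, 90, 91, 92, 94}; /* invented blocky signal */
    double q = 8.0;  /* hypothetical quantizer step size */
    double X[N], c[N];

    dct(x, c);  /* pretend these dequantized coefficients were received */

    for (int iter = 0; iter < 20; iter++) {
        /* Projection 1: band limit with a simple 3-tap low pass filter. */
        double y[N];
        y[0] = x[0]; y[N - 1] = x[N - 1];
        for (int n = 1; n < N - 1; n++)
            y[n] = 0.25 * x[n - 1] + 0.5 * x[n] + 0.25 * x[n + 1];

        /* Projection 2: clip coefficients back into the quantization
           intervals [c - q/2, c + q/2] around the received values. */
        dct(y, X);
        for (int k = 0; k < N; k++) {
            if (X[k] > c[k] + q / 2) X[k] = c[k] + q / 2;
            if (X[k] < c[k] - q / 2) X[k] = c[k] - q / 2;
        }
        idct(X, x);
    }
    for (int n = 0; n < N; n++) printf("%6.2f ", x[n]);
    printf("\n");
    return 0;
}
```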
Due to the iterative nature of this class of algorithms, the time to
convergence is in fact unknown. This precludes its use in real time
systems, in which the run time of each module must be bounded. If the number
of iterations is bounded regardless of convergence, then the quality of video
becomes an unknown. The projections also involve filtering of the picture
and transformation to frequency domain and require unacceptably large
amounts of extra resources.
4.3.2 Weighted sum of symmetrically aligned pixels
In the second class of algorithms proposed, the value of each pixel in
the picture is recomputed with a weighted sum of itself and the other pixel
values, which are symmetrically aligned with respect to block boundaries. In
[30], the authors proposed a scheme that includes three other pixels, taken
from the block above, the block to the left, and the block diagonally above-left.
The weights are determined empirically and can either be linear or quadratic.
The combined effect of these weighted sums on the pixels can be regarded
as an interpolation across the block boundaries. However, there is a problem
in this approach when a weighted sum of a pixel in a smooth block takes the
pixels in adjacent high-detail blocks into account. The texture details leak
into the smooth region, and a faint copy of the high-detail content can be seen.
This new artifact is referred to as ghosting [30]. A scheme of grading each
block according to the level of details with a grading matrix was proposed to
minimize this new artifact. The weights on each of the four pixels are then
increased or reduced according to the grades.
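A minimal sketch of the symmetric alignment idea may help; the blending weights below are invented (the actual weights in [30] are determined empirically and adjusted by the grading step, which is omitted here), and the image data is a toy pattern.

```c
#include <stdio.h>

#define B 8     /* block size, hypothetical */
#define W 32    /* toy image width and height */

static unsigned char img[W][W];

/* Recompute one pixel as a weighted sum of itself and the three pixels
   symmetrically aligned with it across the block boundaries: in the
   block above, the block to the left, and the block above-left. */
static unsigned char deblock_pixel(int y, int x) {
    int by = (y / B) * B, bx = (x / B) * B;  /* block origin */
    int r = y - by, c = x - bx;              /* offsets inside the block */
    int my = by - 1 - r, mx = bx - 1 - c;    /* mirrored rows/columns */
    if (my < 0 || mx < 0) return img[y][x];  /* skip the picture border */
    /* Invented fixed weights; [30] grades blocks and adapts these. */
    return (unsigned char)((10 * img[y][x] + 2 * img[my][x]
                          + 2 * img[y][mx] + 2 * img[my][mx]) / 16);
}

int main(void) {
    for (int y = 0; y < W; y++)              /* blocky test pattern */
        for (int x = 0; x < W; x++)
            img[y][x] = (unsigned char)(100 + 40 * ((x / B + y / B) % 2));
    printf("%u -> %u\n", img[B][B], deblock_pixel(B, B));
    return 0;
}
```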
The execution time is bounded, as the operations are well defined.
Since the picture must be graded before the filter is applied to the pixels, a
4-pass scheme for processing a picture was proposed. This algorithm
essentially performs a filtering operation on every pixel in a picture, with
three passes of matrix operations in the grading process. A very high
performance platform is required to implement this algorithm in a real time
application.
4.3.3 Adaptive deblocking filter
In this class of algorithms, the deblocking process is separated into
two stages. In the first stage, the edge is classified into different boundary
strengths with the pixels along the normal to an edge. In the second stage,
different filtering schemes are applied according to the strengths obtained in
the first stage. In [36], the edges are classified into three types, to which no
filter, a weak 3-tap filter, or a strong 5-tap filter is applied. The algorithm is
adaptive because the thresholds for edge classification are based on the
quantization parameters included in the relevant blocks. An edge will only
be filtered if the difference between the pixel values along the normal to the
edge, but not across the edge, is smaller than the threshold. For high detail
blocks on the side of edges, the differences are usually larger and so strong
filtering is seldom applied to preserve the details. As the threshold increases
with the QP, the edges across the high detail blocks will be filtered
eventually because a high coding error is assumed for large QP.
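As a rough sketch of this two-stage idea, the classification could look as follows. The thresholds here are invented stand-ins for the QP-derived thresholds of [36] (larger QP gives larger thresholds, hence more filtering), and the activity measure is illustrative only.

```c
#include <stdio.h>
#include <stdlib.h>

enum mode { NO_FILTER, WEAK_3TAP, STRONG_5TAP };

/* Classify one edge from the samples along its normal.
   s[0..5] holds three samples on each side; the edge lies between
   s[2] and s[3]. t_flat and t_edge are hypothetical QP-derived
   thresholds. */
static enum mode classify(const int *s, int t_flat, int t_edge) {
    int activity = 0;
    for (int i = 0; i < 5; i++)
        activity += abs(s[i + 1] - s[i]);   /* variation along the normal */
    if (abs(s[3] - s[2]) >= t_edge)
        return NO_FILTER;                   /* large step: likely a real edge */
    if (activity < t_flat)
        return STRONG_5TAP;                 /* very flat: strong filtering */
    return WEAK_3TAP;                       /* detailed: gentle filtering */
}

int main(void) {
    int line[6] = {60, 61, 61, 70, 70, 71}; /* invented samples across an edge */
    printf("mode = %d\n", classify(line, 8, 16));
    return 0;
}
```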
Since the edges are classified before processing, strong filtering can
be replaced by weak filtering or even skipped. Also the filtering is not
applied to every pixel but only to those across the edges. A significant
amount of computation can be saved through the classification. A
disadvantage of this algorithm is the higher complexity of its control
flow.
4.3.4 Comparison of algorithms
The relative computation and implementation complexity of the three
classes of algorithms discussed above is summarized qualitatively in Table
4.1. The POCS based algorithms are considered the most complex because
their flow and major operations are more intensive than those of the other
two methods. The major operations of the weighted sum based algorithm
and the adaptive algorithm seem similar. For the case of 4x4 pixel blocks,
however, the major operations performed by the adaptive algorithms are
only about half of those of the weighted sum based algorithms. The
adaptive algorithm is considered more complex in implementation because
it classifies edges and applies different filters adaptively.
Table 4.1 A summary of the computation and implementation complexity of the deblocking algorithms.

                            POCS-based            Weighted sum based     Adaptive
                            algorithms [33]       algorithm [30]         algorithms [36]
Algorithm flow              Iteratively project   Grading of blocks      Iteratively classify
                            back and forth        with a grading         and apply a filter
                            between two sets      matrix; iterate on     on every block
                            on the entire         every pixel.           edge.
                            picture.
Major operations            LPF, DCT              Weighted sum of 4      3-tap or 5-tap
                                                  pixels for each        filter on pixels
                                                  pixel.                 across edges.
Relative computation        High                  Medium                 Low
complexity
Relative implementation     High                  Low                    Medium
complexity
Quality                     Best                  Good                   Good
4.4 Comparison of postprocessing and loop filtering
Deblocking filters can be incorporated as post filters or loop filters.
Post filters operate on the display buffer outside of the coding loop, whereas
loop filters operate within the coding loop. Before going into the advantages
of loop filters over post filters, consider their disadvantage: post filters are
not a normative part of the standard and hence offer maximum freedom for
decoder implementations, since their use is optional; a normative loop filter
takes this freedom away.
There are several advantages of loop filters over post filters:
1. With a loop filter in the codec design, content providers can safely
assume that proper deblocking filters process their material, guaranteeing
the quality level expected by the producer.
2. In the loop filtering approach, filtering is carried out on a macroblock
basis during the decoding process, and as the filtered output is directly
stored as reference frames, there is no need for an extra frame buffer in
the decoder. In the post filtering approach, however, the frame is typically
decoded into a reference frame buffer. An additional frame buffer may be
needed to store the filtered frames to be passed onto the display device.
3. Empirical tests have shown that loop filtering typically improves both
objective and subjective quality of video streams with significant
reduction in decoder complexity compared to post filtering [4][5]. As the
filtered frames offer higher quality prediction for motion compensation,
quality improvements can be seen. Computational complexity is reduced
as the image area in the reference frames is already filtered.
4.5 Desired loop filter
Image blocking artifact reduction involves the following problems:
1. Smoothing artificial discontinuities (due to quantization noise or QP)
between blocks.
2. Differentiating between image edges and artificial edges.
3. Image edges should not be smoothed as it degrades image quality.
4. If needed, filters can be applied specific to image edges.
A loop filter should satisfy the above properties. In addition to this the
following properties are desired:
1. It should remove the blocking artifacts without blurring the image.
2. Its computational complexity should be low.
3. Its implementation complexity should be low so that it can be used in
real time systems.
Chapter 5
DEBLOCKING FILTER IN H.264/AVC
H.264/AVC uses an adaptive in-loop deblocking filter to remove the
blocking artifacts visible in decoded frames at low bit rate coding. H.264
uses a 4x4 transform, which reduces the ringing effect in the decoded
frames. The most significant contributor to blocking artifacts in H.264 is the
block-based integer DCT. The coarse quantization of the transform
coefficients can cause visually disturbing discontinuities at the block
boundaries [4] [41]. The other source for blocking artifacts in H.264/AVC is
motion compensated prediction. Motion compensated blocks are generated
by copying interpolated pixel data from different locations of possibly
different reference frames. Since there is almost never a perfect fit for this
data, discontinuities on the edges of the copied blocks typically arise.
Additionally, in the copying process, existing edge discontinuities in the
reference frames are carried into the interior of the block to be compensated.
5.1 Deblocking filter operation
Each video frame is divided into 16x16 pixel blocks called
macroblocks. The deblocking filter is applied to all edges of the 4x4 pixel
blocks in each macroblock except the edges on the boundary of a frame or a
slice. For each macroblock, the vertical edges are filtered from left to right
first, and then the horizontal edges are filtered from top to bottom, as
sketched below. This process is repeated for all macroblocks in a frame.
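A minimal sketch of this filtering order, with a hypothetical filter_edge() routine and with the frame/slice-boundary exception omitted, is:

```c
#include <stdio.h>

/* Stub standing in for the actual edge filter: a real implementation
   would filter the 16 lines of samples crossing the 4x4-block edge
   whose top-left corner is at (x, y). */
static void filter_edge(int x, int y, int vertical) {
    printf("%s edge at (%d, %d)\n", vertical ? "vertical" : "horizontal", x, y);
}

/* Deblock one 16x16 macroblock: the four vertical edge columns left to
   right, then the four horizontal edge rows top to bottom. Edges on a
   frame or slice boundary would be skipped in the real filter. */
static void deblock_macroblock(int mb_x, int mb_y) {
    for (int e = 0; e < 4; e++)
        filter_edge(mb_x * 16 + e * 4, mb_y * 16, 1);
    for (int e = 0; e < 4; e++)
        filter_edge(mb_x * 16, mb_y * 16 + e * 4, 0);
}

int main(void) {
    deblock_macroblock(0, 0);
    return 0;
}
```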
5.1.1 Characteristics of deblocking filter
The main characteristics of a deblocking filter used in H.264/AVC are
as follows:
1. Improvement in subjective visual and objective quality of decoded
picture.
2. Significantly superior to post filtering.
3. Highly content adaptive filtering procedure mainly removes blocking
artifacts and does not necessarily blur the visual content [36].
On slice level, the global filtering strength can be adjusted to
the individual characteristics of the video sequence.
On edge level, the filtering strength is dependent on inter/intra,
motion, and coded residuals.
On sample level, quantizer dependent threshold can turn off
filtering for every individual sample.
Strong filters for macro blocks with very flat characteristics
almost remove blocking effects.
5.1.2 Principle of deblocking filter
The one-dimensional visualization of an edge position is shown in
Fig. 5.1. One line of sample values inside two neighboring 4x4 blocks is
denoted by a3, a2, a1, a0, b0, b1, b2, and b3 in Fig. 5.1. The actual boundary
is between a0 and b0. Filtering of a0 and b0 only takes place if:
1. |a0 - b0| < α(QP)
2. |a1 - a0| < β(QP)
3. |b1 - b0| < β(QP)
where β(QP) is considerably smaller than α(QP).
Filtering of a1 or b1 takes place if, additionally, |a2 - a0| < β(QP) or
|b2 - b0| < β(QP). Filtering can be done on a macroblock basis, that is, immediately
after a macroblock is decoded. First, the vertical edges are filtered then the
horizontal edges.
Figure 5.1 One-dimensional view of a 4x4 block edge
When interpreting the edges in Fig. 5.2 as luma edges, depending on
the transform used (4x4 transform or 8x8 transform), the
following applies [1]:
If 4x4 transform is selected, both the solid bold and dashed bold luma
edges (Fig. 5.2) are filtered.
Otherwise (8x8 transform), only the solid bold luma edges are filtered.
When interpreting the edges in Figure 5.2 as chroma edges, depending
on the format used the following applies:
If 4:2:0 format is selected, only the solid bold chroma edges (Fig. 5.2)
are filtered.
Otherwise, if 4:2:2 format is selected, the solid bold vertical chroma
edges are filtered and both types, the solid bold and dashed bold
horizontal chroma edges are filtered.
Otherwise, if 4:4:4 format is selected, both types, the solid bold and
dashed bold chroma edges are filtered.
Figure 5.2 Boundaries in a macroblock to be filtered [1].
5.1.3 Algorithm of deblocking filter
The decision of filter tap for each pixel is based on the following
factors:
1. Boundary strength (bS).
2. Thresholds of α and β.
3. The content of sample pixels.
Figure 5.3 explains how the above factors are used to decide the filter
tap for each pixel (A0-A3, B0-B3). The first step decides (eq. 5.1) whether
filtering is required or not. Then, according to the bS level, the thresholds (α
and β), and the absolute differences of adjacent reconstructed pixels,
different filters are applied to different pixels [43]. The level of bS
determines how many input pixels are updated with the filtered result: the
higher the level of bS, the more input pixels are updated with the filtered
results. In H.264, bS has five levels, from 0 (lowest) to 4
(highest). If the level of bS is zero, no input pixels are updated with the
filtered result. The updated (B0-B3) pixels could be used for the filtering of
next adjacent block when the filtering window slides one block to the right.
The bS level mainly decides the necessity of filtering and filter type. The bS
level is determined by the MB type, edge position, reference frame type, and
motion vectors of two adjacent blocks as depicted in Fig. 5.4. The bS has the
strongest level when two adjacent blocks are intra coded and are located at
the MB boundary. The highest bS level invokes strong low pass filtering, as
blocking artifacts are most visible in the above case. The lowest bS level
indicates no filtering of the input pixels.
In addition to bS levels, the parameters α and β are used for
preservation of a real edge. The parameters α and β also control the
necessity of filtering (eq. 5.1). α and β are assigned higher values at higher
QP to increase the likelihood of filtering, as a higher QP causes more
noticeable blocking artifacts [43]. In contrast, smaller α and β values are
used for lower QP.
bS ≠ 0 AND |A0 − B0| < α AND |A1 − A0| < β AND |B1 − B0| < β (5.1)
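As a sketch, eq. 5.1 translates directly into a per-line predicate. The bS value (0..4) is assumed to have been derived already from the MB type, edge position, reference frames, and motion vectors per Fig. 5.4, and α and β to have been looked up from QP; the sample values in main() are invented.

```c
#include <stdio.h>
#include <stdlib.h>

/* Eq. 5.1 for one line of samples across an edge: A1, A0 | B0, B1,
   with A0 and B0 adjoining the boundary. alpha and beta are the
   QP-dependent thresholds; bS is the boundary strength (0..4). */
static int filter_this_line(int bS, int A1, int A0, int B0, int B1,
                            int alpha, int beta) {
    return bS != 0
        && abs(A0 - B0) < alpha
        && abs(A1 - A0) < beta
        && abs(B1 - B0) < beta;
}

int main(void) {
    printf("filter? %d\n", filter_this_line(2, 102, 100, 90, 89, 15, 7));
    return 0;
}
```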
5.2 Deblocking filter complexity
Analysis of run time profiles of decoder sub-functions reported that
the deblocking filter process in H.264 is the most computationally intensive
part [6]. The deblocking filter took as much as one-third of the
computational resources of the decoder.
The deblocking technique adopted in this standard requires an
extensive decision matrix to determine whether to filter on block edges and
which filter to employ. The complexity in H.264 is mainly based on high
adaptivity of the filter, which requires conditional processing at the block
edge and sample levels. As a result, conditional branches almost inevitably
appear in the innermost loops of the algorithm. These are very time
consuming and quite a challenge for parallel processing in DSP (digital
signal processor) hardware or SIMD (single instruction multiple data) code
[36] on general-purpose processors. Another reason for high complexity is
the small block size employed for residual coding in H.264 coding
algorithm.
Figure 5.3 Decision flow of filter tap selection [43].
Figure 5.4 Decision flow of boundary strength (bS), where P and Q denote the identifications of two adjacent 4x4 blocks [43].
With the 4x4 blocks and a typical filter length of 2 samples in each
direction, almost every sample in a picture must be loaded from
memory, either to be modified or to determine if neighboring samples will
be modified. This was not the case with earlier standards. Some of the
branches are inherent to the algorithm itself and are hard to eliminate
at the programming level. These branches are data dependent and inherently
difficult to predict.
The standards group has published reference program code that
implements this deblocking. The code includes extensive conditional
branching [44], which makes it unsuitable for deeply pipelined processors
and ASIC (application specific integrated circuit) implementations. In
addition, the code exposes little parallelism, making it unsuitable for
VLIW (very long instruction word) processors and parallel hardware
implementations. This is particularly unfortunate in the case of VLIW
processors, which are otherwise well suited to video encoding/decoding
applications.
Chapter 6
INTRA AND INTER FRAMES
Blocking artifacts are visible in both intra and inter frames. Because
H.264/AVC uses an in-loop deblocking filter, the reference frames used for
motion compensation are already filtered, so motion compensation does not
propagate the blocking artifacts. For this reason, inter frames have less
severe blocking artifacts than intra frames. The major disadvantage of the
in-loop deblocking filter adopted in H.264/AVC is its implementation
complexity, as described in section 5.2.
6.1 Intra frames
Intra coding exploits the spatial redundancies within a video frame.
The resulting frame is referred to as an I-frame. H.264 uses adaptive spatial
prediction to increase the efficiency of the intra coding process. H.264 uses
nine modes of prediction for 4x4 luminance blocks and four modes of
prediction for 16x16 luminance blocks. There is an additional 8x8 block
intra prediction in FRExt (fidelity range extension) [46]. For chroma, four
different modes are defined, similar to the four modes of the 16x16
luminance block.
The blocking artifacts are more severe in intra frames as compared to
inter frames. In intra frames, the blocking effect occurs either in spatially
active areas or in very bright or very dark areas. The blocking effect is,
however, most visible in the smoothly textured sections of a picture. The
DCT of a smoothly textured section concentrates its energy in the lower
order DCT coefficients [3]. Therefore, the lower order DCT coefficients
play a significant role in determining the visibility of the blocking effect.
The blocking effect may also occur in spatially active areas as a result of
coarse quantization [3]. In low bit rate coding, with higher QP the medium
to higher order DCT coefficients are quantized to zero. As a result, an
originally spatially active block will have a smoothly textured
reconstruction, which raises the same visibility concerns as an originally
smooth block.
6.1.1 Proposed method for intra frames
As seen in section 5.2, the deblocking filter used in H.264 has high
implementation complexity. A new method is now proposed to reduce the
implementation complexity of the H.264 deblocking filter while maintaining
comparable quality and computational complexity. The proposed method
as applied to intra frames is shown in Fig. 6.2. The first three blocks in Fig.
6.2 check the conditions at the slice boundaries, which are set by the user.
These three blocks are the same as those used by the existing deblocking
filter in H.264. The edge can be visualized as shown in Fig. 6.1.
Here q0, q1, q2, q3 represent the pixel values loaded from the current 4x4
block and p0, p1, p2, p3 represent the pixel values of the 4x4 block adjacent
to the current block.
Figure 6.1 4x4 block edge (vertical or horizontal)
The next step in Fig. 6.2 is to compute the maximum and minimum
values among the six pixels across an edge (p2, p1, p0, q0, q1, q2) and then
calculate the difference between the maximum and the minimum value. If
this difference is greater than the QP of the current block, then it is more likely
to represent an edge and therefore should not be filtered. On the other hand,
if the difference between the maximum and minimum values is less than the
QP of the current block, filtering should be applied to that block to remove
the blocking artifacts. The block in this case most likely represents a
smoothly or mildly textured area. The next step in Fig. 6.2 is to find the
difference between adjacent pixels of a 4x4 block edge. For example, the
absolute difference between p3 and p2 is calculated and if that difference is
less than a fixed threshold (in this case the threshold has been set to two)
then one is assigned to a variable diff. The above process is repeated for all
the adjacent pixels across an edge (Fig. 6.1) in a 4x4 block. The variable
strength denotes the sum of the variable diff across all pixels of a 4x4 block
edge (Fig. 6.1). If the variable strength is greater than a fixed threshold (in
this case the threshold has been set to four), the block is most likely a
smoothly textured section and strong filtering is applied to that block. On the
other hand, if the variable strength is less than the above specified threshold,
it is most likely to represent mildly textured areas or high activity region and
weak filtering is applied to that block. Here, strong filtering means applying
a low pass filter to the three adjacent pixels on either side of the boundary
of a block, and weak filtering means applying a low pass filter to the single
pixel on either side of the boundary.
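A compact sketch of the decision flow just described (and of Fig. 6.2) follows. The text does not specify the low pass filter taps themselves, so the 3-tap smoothing in weak() and strong() is illustrative only; the max/min test against QP, the threshold of two on the pixel-pair offsets, and the threshold of four on strength are taken from the description above, and the sample values in main() are invented.

```c
#include <stdio.h>
#include <stdlib.h>

/* p[0..3] = p0..p3 and q[0..3] = q0..q3, with p0 and q0 adjoining the
   4x4 block edge (Fig. 6.1). */

/* Illustrative weak filtering: smooth one pixel on each side. */
static void weak(int *p, int *q) {
    int p0 = (p[1] + 2 * p[0] + q[0] + 2) / 4;
    int q0 = (p[0] + 2 * q[0] + q[1] + 2) / 4;
    p[0] = p0; q[0] = q0;
}

/* Illustrative strong filtering: smooth three pixels on each side
   (sequential in-place smoothing, for illustration only). */
static void strong(int *p, int *q) {
    for (int i = 2; i >= 0; i--) {
        p[i] = (p[i + 1] + 2 * p[i] + (i ? p[i - 1] : q[0]) + 2) / 4;
        q[i] = ((i ? q[i - 1] : p[0]) + 2 * q[i] + q[i + 1] + 2) / 4;
    }
}

static void filter_line(int *p, int *q, int qp, int mb_is_intra) {
    /* Step 1: max/min over the six pixels across the edge. */
    int six[6] = {p[2], p[1], p[0], q[0], q[1], q[2]};
    int mx = six[0], mn = six[0];
    for (int i = 1; i < 6; i++) {
        if (six[i] > mx) mx = six[i];
        if (six[i] < mn) mn = six[i];
    }
    if (mx - mn >= qp) return;      /* likely a real edge: do not filter */

    if (!mb_is_intra) {             /* inter MB: weak filtering only */
        weak(p, q);
        return;
    }
    /* Step 2: offsets of the 7 adjacent pixel pairs across the edge. */
    int line[8] = {p[3], p[2], p[1], p[0], q[0], q[1], q[2], q[3]};
    int strength = 0;
    for (int i = 0; i < 7; i++)
        if (abs(line[i + 1] - line[i]) < 2) strength++;
    if (strength > 4) strong(p, q); /* smooth section: filter hard */
    else              weak(p, q);   /* mildly textured: filter gently */
}

int main(void) {
    int p[4] = {62, 61, 60, 60};    /* p0..p3 */
    int q[4] = {80, 81, 82, 82};    /* q0..q3 */
    filter_line(p, q, 37, 1);
    printf("p0 = %d, q0 = %d after filtering\n", p[0], q[0]);
    return 0;
}
```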
6.1.2 Results for intra frames
The PSNR values for different test sequences [47] using the proposed
method, JM 9.2 (H.264 software) [44] and reconstruction without loop filter
are described in Table 6.1. The reconstruction of I-frame with proposed
method gives better PSNR (peak signal to noise ratio) values than the
reconstruction without loop filter as shown in the Table 6.1. Also, it gives
similar PSNR values compared to the reconstruction with loop filter (Table
6.1). Figures 6.3-6.9 visually show the removal of blocking artifacts
achieved by the proposed method.
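For reference, the PSNR figures in Tables 6.1 and 6.2 follow the usual definition for 8-bit video, PSNR = 10 log10(255^2 / MSE). A minimal sketch with invented sample data:

```c
#include <math.h>
#include <stdio.h>

/* PSNR between an original and a reconstructed 8-bit frame. */
static double psnr(const unsigned char *org, const unsigned char *rec, int n) {
    double mse = 0.0;
    for (int i = 0; i < n; i++) {
        double d = (double)org[i] - (double)rec[i];
        mse += d * d;
    }
    mse /= n;
    return mse > 0.0 ? 10.0 * log10(255.0 * 255.0 / mse) : INFINITY;
}

int main(void) {
    unsigned char org[4] = {100, 120, 140, 160};  /* invented samples */
    unsigned char rec[4] = {101, 118, 141, 158};
    printf("PSNR = %.2f dB\n", psnr(org, rec, 4));
    return 0;
}
```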
[Figure 6.2 is a flowchart: it checks whether deblocking is disabled for all
edges of the slice or at the slice boundaries, computes diff = abs(max - min)
over the six pixels across the vertical or horizontal edge, compares diff
with QP (no filtering if diff is not smaller), checks whether the MB is intra
(weak filtering across the edge if not), compares the offsets of the 7 pixel
pairs against the threshold 2, accumulates the variable strength, and finally
applies strong filtering if strength > 4 and weak filtering otherwise.]
Figure 6.2 Decision flow of filtering of pixels
Table 6.1 Comparison of PSNR values for different test sequences.

Test clip       QP   PSNR (dB),   PSNR (dB),         PSNR (dB),
(QCIF)               Proposed     without Loop       with Loop Filter,
                     Method       Filter, JM 9.2     JM 9.2 (H.264
                                  (H.264 reference   reference
                                  software)          software)
Foreman         37   31.615       31.363             31.637
Car phone       37   31.616       31.318             31.620
Car phone       45   26.671       26.291             26.612
News            35   32.204       32.040             32.043
News            39   29.515       29.313             29.55
Silent          39   29.489       29.269             29.330
Container       39   29.539       29.383             29.493
Container       45   25.718       25.553             25.710
Bridge-close    37   29.886       29.738             29.828
Bridge-close    45   25.816       25.635             25.805
Figure 6.3 A reconstructed I-frame from the H.264 decoder: (a) QP = 37 without using a loop filter; (b) QP = 37 with the proposed method
Figure 6.4 A reconstructed I-frame from the H.264 decoder: (a) QP = 45 without using a loop filter; (b) QP = 45 with the proposed method
Figure 6.5 A reconstructed I-frame from the H.264 decoder: (a) QP = 37 without using a loop filter; (b) QP = 37 with the proposed method
Figure 6.6 A reconstructed I-frame from the H.264 decoder: (a) QP = 37 without using a loop filter; (b) QP = 37 with the proposed method
Figure 6.7 A reconstructed I-frame from the H.264 decoder: (a) QP = 39 without using a loop filter; (b) QP = 39 with the proposed method
Figure 6.8 A reconstructed I-frame from the H.264 decoder: (a) QP = 39 without using a loop filter; (b) QP = 39 with the proposed method
Figure 6.9 A reconstructed I-frame from the H.264 decoder: (a) QP = 45 without using a loop filter; (b) QP = 45 with the proposed method
6.2 Inter frames
Inter prediction and coding is based on using motion estimation and
compensation to take advantage of the temporal redundancies that exist
between successive frames. When a selected reference frame for motion
estimation is a previously encoded frame, the frame to be encoded is referred
to as a P (predictive)-picture. When both a previously encoded frame and a
future frame are chosen as reference frames, then the frame to be encoded is
referred to as a B (bipredictive)-picture.
The exploitation of interframe redundancies relies on the transfer of
previously coded information from motion compensated reference frames to
the current predictive coded picture. The transferred information also
includes the coding artifacts formed in the reconstruction of the motion
compensated reference. This type of artifact is generally referred to as false
edges [3]. False edges are mainly visible in smooth areas of predictive coded
pictures. Since the motion compensated prediction provides a good
prediction of the lower frequency information of a macroblock, the
prediction error in smooth areas is minimal or quantized to zero, and
therefore the false edges would not be masked. The blocking effect mainly
occurs in mildly textured areas where the prediction error is sufficiently
large such that it is not quantized to zero. This blocking effect is reduced in
H.264 by having filtered reference frames available for motion
compensation, but some artifacts remain because the copied reference data
may not fit at the block boundaries.
6.2.1 Proposed method for inter frames
The proposed method is applied to inter frames as shown in Fig. 6.2.
Here q0, q1, q2, q3 represent the pixel values loaded from the current 4x4
block and the p0, p1, p2, p3 represent the 4x4 block adjacent to the current 4x4
block. The next step in Fig. 6.2 is to compute the maximum and minimum
values among the six pixels across an edge (p2, p1, p0, q0, q1, q2) and then
calculate the difference between the maximum and the minimum value. If
this difference is less than QP, the current block most likely represents a
smooth or mildly textured area. If the MB is an inter one, a low pass filter
is applied to the adjacent pixels on either side of the block or MB boundary.
Otherwise, if the difference between the minimum and maximum value is
greater than QP, the current block most likely represents an edge and
therefore is not filtered.
6.2.2 Results for inter frames
The PSNR values and the total number of bits used for encoding a P-
frame or B-frame in an H.264 compressed stream with different test
sequences using the proposed method, JM 9.2 (H.264 software), and
reconstruction without loop filter are given in Table 6.2. The reconstruction
of P-frame and B-frame with proposed method gives better PSNR (peak
signal to noise ratio) values than the reconstruction without loop filter as
shown in the Table 6.2. Also, it gives similar PSNR values compared to the
reconstruction with loop filter (Table 6.2). Figures 6.10-6.14 show visually
the removal of blocking artifacts in P-frames and B-frames using the
proposed method.
Table 6.2 Comparison of PSNR values and the total number of bits used for encoding a P-frame or B-frame in an H.264 compressed stream for different test sequences.

                         PSNR (dB)                                       Total number of bits used
Test clip (QCIF)   QP    Proposed   Without        With Loop       Proposed   Without        With Loop
- Type of frame          Method     Loop Filter,   Filter, JM 9.2  Method     Loop Filter,   Filter, JM 9.2
                                    JM 9.2                                    JM 9.2
Foreman-P          39    29.883     29.834         29.692          2085       2131           2107
News-P             39    27.846     27.528         27.771          4119       4074           4235
Car phone-P        39    30.271     30.024         30.171          817        897            720
Bridge close-P     39    28.484     28.495         28.480          105        52             126
Foreman-B          39    28.925     28.879         29.054          489        515            499
News-B             39    28.085     28.307         27.982          1424       1802           1687
Car phone-B        39    29.637     29.438         29.491          197        203            193
Bridge close-B     39    28.532     28.558         28.480          53         53             53
Figure 6.10 A reconstructed P-frame from the H.264 decoder: (a) QP = 39 without using a loop filter; (b) QP = 39 with the proposed method
Figure 6.11 A reconstructed B-frame from the H.264 decoder: (a) QP = 39 without using a loop filter; (b) QP = 39 with the proposed method
Figure 6.12 A reconstructed P-frame from the H.264 decoder: (a) QP = 39 without using a loop filter; (b) QP = 39 with the proposed method
Figure 6.13 A reconstructed P-frame from the H.264 decoder: (a) QP = 39 without using a loop filter; (b) QP = 39 with the proposed method
Figure 6.14 A reconstructed B-frame from the H.264 decoder: (a) QP = 39 without using a loop filter; (b) QP = 39 with the proposed method
Chapter 7
RESULTS AND CONCLUSIONS
7.1 Results
The results presented so far show that the proposed method removes
blocking artifacts significantly and gives better visual quality and PSNR
values than reconstruction without the loop filter in the H.264 codec. In
comparison with the H.264 in-loop deblocking filter, the proposed method
gives equal visual quality and almost comparable PSNR values. For intra
frames, the proposed method gives 0.1 to 0.2 dB lower PSNR values than
the in-loop deblocking filter for some test sequences, and 0.1 to 0.2 dB
higher PSNR values for others. For inter frames, it gives slightly better
PSNR values than the in-loop deblocking filter for most of the test
sequences.
Table 7.1 gives results for a GOP (group of pictures) size of 10 with a
test clip of QCIF (176x144) resolution. The proposed method gives better
PSNR values for P-frames and B-frames than the other two methods
(Table 7.1).
As described in section 5.2, the deblocking filter in H.264 [6] has high
implementation complexity, mainly because of the conditional branches in
the innermost loop of the algorithm. The proposed method reduces the
implementation complexity by reducing the occurrence of such conditional
branches in the innermost loop.
Table 7.1 Comparison of PSNR values for the frames of an H.264 compressed stream (Foreman, QCIF).

Frame type -      QP   PSNR (dB),   PSNR (dB),       PSNR (dB),
frame number           Proposed     without Loop     with Loop Filter,
(Foreman_qcif)         Method       Filter, JM 9.2   JM 9.2 (H.264
                                    (H.264           software)
                                    software)
Intra - 0         45   26.058       25.851           26.096
B - 3             39   28.1341      28.066           28.044
P - 8             39   28.668       28.556           28.411
B - 9             39   28.269       28.439           28.157
P - 10            39   28.731       28.727           28.534
The main advantages of the proposed method over the JM method are:
1. The JM 9.2 (H.264 software) [44] loop filter code size is 21 KB,
whereas the proposed method's loop filter code size is 11 KB. In a real
time implementation, this means the proposed method saves memory.
2. The JM 9.2 (H.264 software) loop filter uses two tables of size 52 bytes
and one table of size 260 bytes to check whether a pixel should be
filtered or not. The values from these tables have to be accessed each
time a pixel is checked, which consumes processor time in a real time
system. No such tables are used in the proposed method.
3. Conditional branches in the innermost loop of the algorithm are reduced
in the proposed method as compared to JM 9.2 (H.264 software). This
leads to a reduction in execution time in real time systems.
7.2 Conclusions
The proposed method is able to reduce the blocking artifacts in the
reconstructed video. It gives almost similar visual quality of the
reconstructed video as compared to the one obtained from JM 9.2 (H.264
software) loop filter. The proposed method requires less implementation
complexity compared to the JM 9.2 (H.264 software) loop filter. That is
because of the simple flow algorithm of the proposed method as compared
to the JM 9.2 (H.264 software) loop filter.
7.3 Future research
The proposed deblocking filter can be implemented in a real time
system. By doing so, its exact reduction in implementation complexity as
compared to the H.264 deblocking filter can be measured. A deringing
filter could also be incorporated in the in-loop filter to assess the visual
improvement of the reconstructed video. The proposed method uses image
enhancement techniques to reduce the artifacts in the reconstructed video.
Image recovery techniques can also be explored to reduce the artifacts in
H.264 decoded video. Transforms that avoid blocking artifacts while
providing the benefits of the integer DCT can also be explored.
REFERENCES
1. Draft ITU-T Recommendation and final draft international standard of joint video specification (ITU-T Rec. H.264/ISO/IEC 14496-10 AVC), Mar. 2003.
2. I. E. G. Richardson, "H.264 and MPEG-4 Video Compression: Video coding for next-generation multimedia", Hoboken, NJ: Wiley, 2003.
3. H. R. Wu and K. R. Rao, "Digital Video Image Quality and Perceptual Coding", Taylor and Francis, Dec. 2005.
4. Y.-L. Lee and H. W. Park, "Loop filtering and post-filtering for low-bit-rates moving picture coding", Signal Processing: Image Communication, vol. 1, pp. 94-98, Dec. 1999.
5. J. Lainema and M. Karczewicz, "TML 8.4 loop filter analysis", ITU-T SG16 Doc. VCEG-N29, 2001.
6. V. Lappalainen, A. Hallapuro, and T. D. Hamalainen, "Complexity of optimized H.26L video decoder implementation", IEEE Trans. CSVT, vol. 13, pp. 717-725, July 2003.
7. M. Ghanbari, "Video Coding: An Introduction to Standard Codecs", London, U.K.: Institution of Electrical Engineers, 1999.
8. ISO/IEC JTC1/SC29, Generic coding of moving pictures and associated audio, ISO/IEC 13818-2, Draft International Standard, Nov. 1994.
9. International Telecommunication Union, "Recommendation ITU-T H.263: Video coding for low bit rate communication", ITU-T, 1998.
10. G. Sullivan and T. Wiegand, "Video compression - from concepts to the H.264/AVC standard", Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
11. T. Wiegand, et al, "Overview of the H.264/AVC video coding standard", IEEE Trans. CSVT, vol. 13, pp. 560-576, July 2003.
12. M. Karczewicz and R. Kurceren, "The SP- and SI-frames design for H.264/AVC", IEEE Trans. CSVT, vol. 13, pp. 637-644, July 2003.
13. M. Wien, "Variable block-size transforms for H.264/AVC", IEEE Trans. CSVT, vol. 13, pp. 604-613, July 2003.
14. J. Ostermann, et al, "Video coding with H.264/AVC: Tools, performance and complexity", IEEE CAS Magazine, vol. 4, pp. 7-34, first quarter, 2004.
15. H. S. Malvar, et al, "Low-complexity transform and quantization in H.264/AVC", IEEE Trans. CSVT, vol. 13, pp. 598-603, July 2003.
16. M.-Y. Shen and C.-C. Jay Kuo, "Review of postprocessing techniques for compression artifact removal", Journal of Visual Communication and Image Representation, vol. 9, pp. 2-14, Mar. 1998.
17. S. A. Karunasekra and N. K. Kingsbury, "A distortion measure for blocking artifacts on images based on human visual sensitivity", IEEE Trans. Image Processing, vol. 4, pp. 713-724, June 1995.
18. G. Fan and W. K. Chan, "Model-based edge reconstruction for low-bit-rate wavelet compressed images", IEEE Trans. CSVT, vol. 10, pp. 120-132, Jan. 2000.
19. N. Jayant, J. Johnson, and R. Safranek, "Signal compression based on models of human perception", Proc. IEEE, vol. 81, pp. 1385-1422, Oct. 1993.
20. A. Kundu, "Enhancement of JPEG coded images by adaptive spatial filtering", Proc. IEEE ICIP, pp. 187-190, Oct. 1995.
21. P. Farrelle and A. K. Jain, "Recursive block-coding - a new approach to transform coding", IEEE Trans. Commun., vol. COM-34, pp. 161-179, Feb. 1986.
22. D. Pearson and M. Whybray, "Transform coding of images using interleaved blocks", Proc. IEE, vol. 131, pp. 466-472, Aug. 1984.
23. H. S. Malvar, "Signal Processing with Lapped Transforms", Boston, MA: Artech House, 1992.
24. Y. Zhang, R. Pickholtz, and M. Loew, "A new approach to reduce the blocking effect of transform coding", IEEE Trans. Commun., vol. 41, pp. 299-302, Feb. 1993.
25. B. Ramamurthi and A. Gersho, "Nonlinear space-variant postprocessing of block coded images", IEEE Trans. ASSP, vol. ASSP-34, pp. 1258-1267, Oct. 1986.
26. K. R. Rao and P. Yip, "Discrete Cosine Transform: Algorithms, Advantages, Applications", Boston, MA: Academic Press, 1990.
27. K. Lee, D. S. Kim and T. Kim, "Regression-based prediction for blocking artifact reduction in JPEG-compressed images", IEEE Trans. Image Processing, vol. 14, pp. 36-48, Jan. 2005.
28. F. Gao, X. Li, and W. G. Wee, "A new wavelet based deblocking algorithm for compressed images", Proc. IEEE Asilomar Conf. on Signals, Systems and Computers, vol. 2, pp. 1745-1748, Nov. 2002.
29. E. Choi and M. G. Kang, "Deblocking algorithm for DCT-based compressed images using anisotropic diffusion", IEEE Trans. ASSP, vol. 3, pp. 717-720, Apr. 2003.
30. A. Z. Averbuch, A. Schclar, and D. L. Donoho, "Deblocking of block-transform compressed images using weighted sums of symmetrically aligned pixels", IEEE Trans. Image Processing, vol. 14, pp. 200-212, Feb. 2005.
31. O. Radovsky and M. Israeli, "Adaptive deblocking of block-transform compressed images using blending-functions approximation", Proc. IEEE Int'l Conf. on Image Processing, vol. 3, pp. 227-230, Sept. 2003.
32. A. Schclar, A. Averbuch and D. L. Donoho, "Deblocking of block-DCT compressed images using deblocking frames of variable size", IEEE Trans. ASSP, vol. 4, pp. 3285-3288, Apr. 2002.
33. A. Zakhor, "Iterative procedures for reduction of blocking effects in transform image coding", IEEE Trans. CSVT, vol. 2, pp. 91-95, Mar. 1992.
34. D. C. Youla and H. Webb, "Image restoration by the method of convex projections: Part 1 - Theory", IEEE Trans. Medical Imaging, vol. 1, pp. 81-94, Oct. 1982.
35. J. J. Zou and H. Yan, "A POCS-based method for reducing artifacts in BDCT compressed images", Proc. 16th Int'l Conf. on Pattern Recognition, vol. 2, pp. 11-15, Aug. 2002.
36. P. List, A. Joch, J. Lainema, G. Bjontegaard, and M. Karczewicz, "Adaptive deblocking filter", IEEE Trans. CSVT, vol. 13, pp. 614-619, July 2003.
37. C. A. Graves, "Deblocking of DCT-compressed images using noise injection followed by image denoising", Proc. IEEE Int'l Conf. on Information Technology: Computers and Communications, pp. 472-475, Apr. 2003.
38. C. Wang, W.-J. Zhang, and X.-Z. Fang, "Adaptive reduction of blocking artifacts in DCT domain for highly compressed images", IEEE Trans. Consumer Electronics, vol. 50, pp. 647-654, May 2004.
39. S. Liu and A. C. Bovik, "Efficient DCT-domain blind measurement and reduction of blocking artifacts", IEEE Trans. CSVT, vol. 12, pp. 1139-1149, Dec. 2002.
40. Y. Zhao, G. Cheng and S. Yu, "Postprocessing technique for blocking artifacts reduction in DCT domain", Electronics Letters, vol. 40, issue 19, pp. 1175-1176, Sept. 2004.
41. S. D. Kim, et al, "A deblocking filter with two separate modes in block-based video coding", IEEE Trans. CSVT, vol. 9, pp. 156-160, Feb. 1999.
42. S.-K. Kwon, A. Tamhankar, K. R. Rao, "Overview of H.264 / MPEG-4 Part 10", Journal of Visual Communication and Image Representation, vol. 17, pp. 186-216, April 2006.
43. S. C. Chang, et al, "A platform based bus-interleaved architecture for deblocking filter in H.264/MPEG-4 AVC", IEEE Int'l Conf. on Consumer Electronics, vol. 51, pp. 249-255, 2005.
44. H.264 software (JM 9.8/FRExt) from http://iphome.hhi.de/suehring/tml/download/jm98.zip
45. http://www.stanford.edu/class/ee398b/handouts/05 standardsH264JVT.pdf
46. A. Puri, H. Chen and A. Luthra, "Video coding using the H.264/MPEG-4 AVC compression standard", Signal Processing: Image Communication, vol. 19, pp. 793-849, Oct. 2004.
47. Test sequences are obtained from: http://trace.eas.asu.edu/yuv/qcif.html
BIOGRAPHICAL INFORMATION
The author, Hitesh Yadav, received his Bachelor's degree from Mangalore
University, India, in Aug. 2000 and his Master's degree in Electrical
Engineering from The University of Texas at Arlington in May 2006.
Before coming for his Masters, he worked at Accord Software Systems in
Bangalore as a Systems Engineer in the field of GPS receivers. During his