OPTIMIZATION OF THE DEBLOCKING FILTER IN H.264 CODEC FOR REAL TIME
IMPLEMENTATION
by
Hitesh Yadav
Presented to the Faculty of the Graduate School of
The University of Texas at Arlington in Partial Fulfillment
of the Requirements
for the Degree of
MASTER OF SCIENCE IN ELECTRICAL ENGINEERING
THE UNIVERSITY OF TEXAS AT ARLINGTON
MAY 2006
ACKNOWLEDGEMENTS
I would like to thank Dr. K. R. Rao for his constant encouragement and patience that
made this thesis possible. I also thank Dr. M. Manry and Dr. Z. Wang for being on my
committee. I also thank people at Analog Devices Inc. for their advice. I would also like
to thank my roommates and friends for their support. Last but not least, I would like
to thank my parents and family for their constant support and patience.
April 19, 2006
ABSTRACT
OPTIMIZATION OF THE DEBLOCKING FILTER IN H.264 CODEC FOR REAL TIME
IMPLEMENTATION
Publication No. ______
Hitesh Yadav, MS
The University of Texas at Arlington, 2006
Supervising Professor: K. R. Rao
The H.264 standard supports a broad range of low bit rate and high bit rate
multimedia applications with high reliability and efficient coding performance when
transporting compressed video through existing and future networks. In low bit rate
applications, artifacts are visible in the H.264 decoded video. Prominent among them
is the blocking artifact, caused mainly by the use of block-based transform coding
and coarse quantization of the transform coefficients. The H.264 standard includes an
in-loop deblocking filter as a normative part of the standard to reduce blocking
artifacts. The main drawback of the deblocking filter is its implementation
complexity: it takes approximately one-third of the computational resources of the
decoder. Code optimization does not help much, as some of the conditional branches are
inherent in the algorithm itself. This gives rise to the need for a simple deblocking
algorithm with reduced implementation complexity. The objective of this research is to
develop an algorithm that reduces the implementation complexity while maintaining the
perceptual quality of the existing deblocking algorithm.
In the proposed method, the maximum and minimum values among the six
pixels across an edge are computed to decide whether the pixels of the block should be
filtered. If the difference between the maximum and minimum values is less than QP,
the pixels of that block are filtered; if this difference is greater than QP, the
boundary is more likely to represent a true edge and is not filtered. The main
advantage of the proposed method is its simple logic: it has fewer conditional
branches than the existing deblocking filter in the H.264 standard. This helps reduce
execution time in real time systems, as most video coding standards today are
implemented on processors with several pipeline stages. The proposed method also has a
smaller code size than JM 9.2 (the H.264 reference software), so less memory is
required to implement it. It can be concluded from this research that the proposed
method reduces the implementation complexity compared to the existing JM 9.2 method
while maintaining the same visual quality.
TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
LIST OF ILLUSTRATIONS
LIST OF TABLES
LIST OF ACRONYMS

CHAPTER

1. INTRODUCTION
   1.1 Background
   1.2 Objective of the thesis
   1.3 Outline of the thesis
2. BASIC VIDEO CODING THEORY
   2.1 Video compression
      2.1.1 RGB and YUV color spaces
      2.1.2 Video sampling
      2.1.3 Redundancy reduction
      2.1.4 Video codec
      2.1.5 Motion Estimation
      2.1.6 Motion vectors
      2.1.7 Block size effect
      2.1.8 Sub-pixel interpolation
      2.1.9 Discrete cosine transform
      2.1.10 Quantization
      2.1.11 Zigzag scan
      2.1.12 Run length encoding
      2.1.13 Entropy coding
   2.2 MPEG and H.26x
      2.2.1 ISO/IEC, ITU-T and JVT
      2.2.2 H.261
      2.2.3 MPEG-1
      2.2.4 H.262 and MPEG-2
      2.2.5 H.263/H.263+/H.263++
      2.2.6 MPEG-4
      2.2.7 MPEG-4 part-10/H.264
3. OVERVIEW OF H.264/AVC STANDARD
   3.1 Network abstraction layer
   3.2 Video coding layer
      3.2.1 Motion estimation and compensation for inter frames
      3.2.2 Multiple reference pictures selection
      3.2.3 Intra prediction
      3.2.4 Transform and quantization
      3.2.5 Deblocking filter
      3.2.6 Entropy coding
   3.3 Conclusions
4. REVIEW OF POSTPROCESSING TECHNIQUES
   4.1 Need for postprocessing
      4.1.1 Preprocessing
      4.1.2 Postprocessing
   4.2 Causes for blocking and ringing artifacts
      4.2.1 Blocking artifacts
      4.2.2 Ringing artifacts
   4.3 Deblocking filters
      4.3.1 POCS based iterative algorithms
      4.3.2 Weighted sum of symmetrically aligned pixels
      4.3.3 Adaptive deblocking filter
      4.3.4 Comparison of algorithms
   4.4 Comparison of postprocessing and loop filtering
   4.5 Desired loop filter
5. DEBLOCKING FILTER IN H.264/AVC
   5.1 Deblocking filter operation
      5.1.1 Characteristics of deblocking filter
      5.1.2 Principle of deblocking filter
      5.1.3 Algorithm of deblocking filter
   5.2 Deblocking filter complexity
6. INTRA AND INTER FRAMES
   6.1 Intra frames
      6.1.1 Proposed method for intra frames
      6.1.2 Results for intra frames
   6.2 Inter frames
      6.2.1 Proposed method for inter frames
      6.2.2 Results for inter frames
7. RESULTS AND CONCLUSIONS
   7.1 Results
   7.2 Conclusions
   7.3 Future research

REFERENCES
BIOGRAPHICAL INFORMATION
LIST OF ILLUSTRATIONS

Figure
2.1 Components of an image: (a) R, G, B components, (b) Cb, Cr, Cg components
2.2 Color format: (a) 4:2:0, (b) 4:2:2, (c) 4:4:4
2.3 Spatial and temporal redundancies
2.4 Common video coding flow
2.5 Motion estimation procedure
2.6 Macroblock representation in 4:2:0 format
2.7 Motion vector representation
2.8 Macroblock partitions for motion estimation and compensation
2.9 Block size effects on motion estimation: (a) frame 1, (b) frame 2, (c) no motion estimation, (d) 16x16 block, (e) 8x8 block, (f) 4x4 block
2.10 Sub-pixel interpolation
2.11 Zigzag scan
2.12 Progression of the ITU-T recommendations and MPEG standards
3.1 H.264/AVC layer structure
3.2 Hierarchical syntax
3.3 Progressive and interlaced frames
3.4 Subdivision of a picture into slices
3.5 The specific coding parts of profile in H.264
3.6 H.264 encoder
3.7 H.264 decoder
3.8 Motion compensation accuracy
3.9 Quarter sample luma interpolation
3.10 Multiple reference frames and generalized bi-predictive frames
3.11 Intra prediction in H.264
3.12 16x16 intra prediction directions
3.13 4x4 intra prediction directions
3.14 Transform coding
3.15 CABAC overview
4.1 Blocking artifacts at low bit rate coding
5.1 One-dimensional view of a 4x4 block edge
5.2 Boundaries in a macroblock to be filtered
5.3 Decision flow of filter tap selection
5.4 Decision flow of boundary strength, where P and Q denote two adjacent 4x4 blocks
6.1 4x4 block edge
6.2 Decision flow of filtering of pixels
6.3 Reconstructed I frame with QP=37 (foreman)
6.4 Reconstructed I frame with QP=45 (foreman)
6.5 Reconstructed I frame with QP=37 (bridge)
6.6 Reconstructed I frame with QP=37 (car phone)
6.7 Reconstructed I frame with QP=39 (news)
6.8 Reconstructed I frame with QP=39 (silent)
6.9 Reconstructed I frame with QP=45 (container)
6.10 Reconstructed P frame with QP=39 (bridge)
6.11 Reconstructed B frame with QP=39 (foreman)
6.12 Reconstructed P frame with QP=39 (car phone)
6.13 Reconstructed P frame with QP=39 (foreman)
6.14 Reconstructed B frame with QP=37 (car phone)
LIST OF TABLES

Table
4.1 A summary of the computation and implementation complexity of the deblocking algorithms
6.1 Comparison of PSNR values for different test sequences
6.2 Comparison of PSNR values and the total number of bits used for encoding a P frame or B frame in the H.264 compressed stream for different test sequences
6.3 Comparison of PSNR values and the total number of bits used for encoding a P frame or B frame in the H.264 compressed stream for a GOP
LIST OF ACRONYMS
AC: Alternating Current
AVC: Advanced Video Coding
bS: Boundary strength
CABAC: Context-based Adaptive Binary Arithmetic Coding
CAVLC: Context-based Adaptive Variable length Coding
dB: Decibel
DC: Direct Current
DCT: Discrete Cosine Transform
DSP: Digital Signal Processor
DVD: Digital Video Disc
GOP: Group of Pictures
HDTV: High Definition Television
HVS: Human Visual System
IEC: International Electrotechnical Commission
ISO: International Organization for Standardization
ITU: International Telecommunication Union
JVT: Joint Video Team
MB: Macroblock
MC: Motion Compensation
MCPE: Motion Compensated Prediction Error
ME: Motion Estimation
MPEG: Moving Picture Experts Group
MSE: Mean Square Error
NALU: Network Abstraction Layer Unit
POCS: Projection onto Convex Sets
PSNR: Peak Signal to Noise Ratio
QP: Quantization Parameter
SI: Switching Intra
SIMD: Single Instruction Multiple Data
SP: Switching Prediction
VCL: Video Coding Layer
VLIW: Very Long Instruction Word
Chapter 1
INTRODUCTION
1.1 Background
H.264/AVC (advanced video coding) is a new international standard
published jointly by ITU-T VCEG (Video Coding Experts Group) and ISO/IEC
MPEG (Moving Picture Experts Group) [1]. The main purpose of this
standard is to provide a broad range of multimedia applications with
higher reliability and more efficient coding performance than former
standards when transporting compressed video through various networks [2].
At very low bit rate coding, artifacts are visible in the reconstructed
frames [3]. Prominent among them are blocking and ringing artifacts. A post
filter or loop filter is a viable option for reducing these artifacts, as
video quality is gained without any increase in bit rate. The main advantage
of a loop filter is that it improves both the objective and subjective
quality of video streams with a significant reduction in decoder complexity
compared to post filtering [4], [5]: in inter coding, the reference frames
are the filtered ones, thereby reducing the complexity. The H.264 standard
includes a loop filter as a normative part of the standard to reduce
blocking artifacts at very low bit rate coding. H.264 uses a 4x4 integer
DCT (discrete cosine transform), which helps reduce ringing artifacts;
in FRExt (fidelity range extension), an 8x8 integer DCT is also used.
Though this filter gives good results at low bit rate coding, its major
drawback is its implementation complexity.
1.2 Objective of the thesis
The purpose of this thesis is to optimize the deblocking algorithm to
reduce its complexity in the H.264 codec for real time implementation.
Analysis of the run-time profiles of the decoder sub-functions indicates
that the deblocking filter is the most computationally intensive part of
the H.264 decoding process [6].
1.3 Outline of the thesis
In chapter 2, basic video coding theory and the evolution of
mainstream video coding standards are discussed. Chapter 3 gives an overview
of the H.264 standard. Chapter 4 describes the need for post processing,
reviews post processing techniques for compression artifact removal, and
discusses the pros and cons of each technique. Chapter 5 describes the
existing deblocking filter in H.264/AVC in detail and includes its
complexity analysis. Chapter 6 describes the proposed method as applied to
intra and inter frames and the results obtained. Chapter 7 concludes with a
comparison of the obtained results and directions for future research.
Chapter 2
BASIC VIDEO CODING THEORY
2.1 Video compression
Digital video compression technology has been gaining popularity
for many years. Today, when people enjoy HDTV (high definition
television), movie broadcasting over the Internet, or digital music such as
MP3, the convenience that the digital video industry brings should not be
taken for granted: all of it is attributable to advances in compression
technology, mass storage media, and streaming video/audio services. As the
main contributor to the above, video compression technology is the focus of
this chapter. Some basic video compression concepts are introduced here as
the basis for chapter 3.
2.1.1 RGB and YUV color spaces
RGB (red-green-blue) color space is well suited to the capture and
display of color images. The image consists of three grayscale components
(sometimes referred to as channels) [2]. The combination of red, green and
blue with different weights can produce any visible color; a numerical value
is used to indicate the proportion of each color. The drawback of the RGB
representation of a color image is that all three colors are equally
important and must be stored with the same number of bits. The HVS
(human visual system), however, is less sensitive to color than to
brightness. To take advantage of this finding, a color space called YUV
(luminance plus blue and red chrominance) was proposed. Instead of using the
color of the light directly, YUV represents a color image by the luminance
(Y) and chrominance (UV) of the light. YUV is derived from RGB: a black and
white image (luma) is created from the full color image, and the primary
colors are then subtracted from it, resulting in color difference signals
(chroma: Cb, Cr) that describe the color. Combining the signals back
together results in a full color image [2]. The luminance information Y can
be calculated from R, G and B according to the following equation:
Y = kr R + kg G + kb B (2.1)

where kr, kg and kb are weighting factors with kr + kg + kb = 1.

The color difference information (chroma) can be derived as:

Cb = B – Y (2.2)
Cr = R – Y (2.3)
Cg = G – Y (2.4)
In practice, only three components (Y, Cb and Cr) need to be transmitted for
video coding, because Cg can be derived from Y, Cb and Cr. With the weights
recommended by ITU-R [], kb = 0.114 and kr = 0.299, equations (2.1) through
(2.3) can be rewritten (with scaling factors applied to the chroma
components) as:

Y = 0.299R + 0.587G + 0.114B (2.5)
Cb = 0.564(B – Y) (2.6)
Cr = 0.713(R – Y) (2.7)
R = Y + 1.402Cr (2.8)
G = Y – 0.344Cb – 0.714Cr (2.9)
B = Y + 1.772Cb (2.10)
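As a concrete illustration, a minimal sketch of the forward conversion in C
using equations (2.5) through (2.7); values are kept in floating point here,
whereas real codecs use fixed-point arithmetic and clamp the results to the
legal sample range:

```c
#include <stdio.h>

/* Convert one RGB sample to YCbCr using equations (2.5)-(2.7). */
static void rgb_to_ycbcr(double r, double g, double b,
                         double *y, double *cb, double *cr)
{
    *y  = 0.299 * r + 0.587 * g + 0.114 * b;   /* eq. (2.5) */
    *cb = 0.564 * (b - *y);                    /* eq. (2.6) */
    *cr = 0.713 * (r - *y);                    /* eq. (2.7) */
}

int main(void)
{
    double y, cb, cr;
    rgb_to_ycbcr(255.0, 0.0, 0.0, &y, &cb, &cr);  /* pure red */
    printf("Y=%.1f Cb=%.1f Cr=%.1f\n", y, cb, cr);
    return 0;
}
```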
In practice, images are treated as 2D arrays. Fig. 2.1a shows the red, green
and blue components of a color image in comparison to the chroma components
Cb, Cr and Cg of Fig. 2.1b.
Figure 2.1 Components of an image: (a) R, G, B components, (b) Cb, Cr, Cg
components [2]
2.1.2 Video sampling
The video source is normally a bit stream consisting of a series of frames
or fields in decoding order [1]. Three YCbCr sampling modes are supported
by MPEG-4 and H.264 (Fig. 2.2).
Figure 2.2 Color format: (a) 4:2:0, (b) 4:2:2, (c) 4:4:4
4:2:0 is the most commonly used sampling pattern. The sampling interval of
the luminance samples (Y) is the same as that of the video source, while Cb
and Cr have twice the sampling interval of luminance in both the vertical
and horizontal directions (Fig. 2.2a). In this case, every four luma samples
share one Cb and one Cr sample. As the HVS is less sensitive to color than
to brightness, it is possible to reduce the resolution of the chrominance
components without apparent degradation of image quality. This makes 4:2:0
very popular in current video compression standards [2]. This mode is widely
used in consumer applications such as video conferencing, digital television
and DVD (digital versatile disc) storage. In 4:2:2 mode, Cb and Cr have the
same vertical resolution as luma but half the horizontal resolution
(Fig. 2.2b). This mode is used for high quality color representation. 4:4:4
mode has the same resolution for Y, Cb and Cr in both directions (Fig. 2.2c).
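To make the savings concrete, a small sketch in C that computes the raw
frame size of each sampling mode at 8 bits per sample; the CIF frame
dimensions used in main() are an illustrative assumption:

```c
#include <stdio.h>

/* Bytes per raw frame at 8 bits per sample for the three sampling
   modes, given luma dimensions w x h. */
static long frame_bytes(int w, int h, int mode)
{
    long luma = (long)w * h;
    switch (mode) {
    case 420: return luma + 2 * (luma / 4); /* chroma halved both ways */
    case 422: return luma + 2 * (luma / 2); /* chroma halved horizontally */
    default:  return 3 * luma;              /* 4:4:4 full resolution */
    }
}

int main(void)
{
    /* CIF resolution (352x288) as an example */
    printf("4:2:0 %ld bytes\n", frame_bytes(352, 288, 420)); /* 152064 */
    printf("4:2:2 %ld bytes\n", frame_bytes(352, 288, 422)); /* 202752 */
    printf("4:4:4 %ld bytes\n", frame_bytes(352, 288, 444)); /* 304128 */
    return 0;
}
```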
2.1.3 Redundancy reduction
The basic idea of video compression is to compress an original video
sequence (raw video) into a smaller one using fewer bits. Compression is
achieved by removing redundant information from the raw video sequence.
There are three types of redundancy: temporal, spatial and frequency
domain redundancy.

Spatial and temporal redundancies: Pixel values are not independent, but
are correlated with their neighbors both within the same frame and across
frames [2]. Spatial redundancy refers to the small variation of image
content within a frame (Fig. 2.3); exploiting it, the value of a pixel can
be predicted from the known values of neighboring pixels. In the time
domain, there is little variation in frame content between consecutive
frames, except when the object or content of the video is changing quickly.
This is known as temporal redundancy (Fig. 2.3).
Figure 2.3 Spatial and temporal redundancies [2]
Frequency domain redundancy: The HVS is more sensitive to lower
frequencies [19] than to higher frequencies.
2.1.4 Video codec
The redundancies mentioned above can be removed by different methods.
Temporal and spatial redundancy is usually reduced by motion estimation
(and compensation), while frequency domain redundancy is commonly reduced
by the DCT and quantization. After these operations, entropy coding is
applied to the data to achieve further compression.
Figure 2.4 Common video coding flow [2]
Each functional block of the common video coding flow (Fig. 2.4) is
addressed below in the order in which it occurs in the video coding process.
2.1.5 Motion Estimation
The input to the coding system is an uncompressed video sequence. In
motion estimation, the best match for the current block is found by
selecting an area in a reference frame (a past or future frame) that
minimizes the residual energy (Fig. 2.5). In the motion compensation
process, the chosen candidate region is subtracted from the current block
to form a residual block.
Figure 2.5 Motion estimation procedure
In practice, motion estimation and compensation are usually based on
rectangular blocks (MxN or NxN). The most common block size is 16x16 for
the luminance component and 8x8 for the chrominance components (4:2:0
format). A 16x16 pixel region called a macroblock is the basic data unit
for motion compensation in current video coding standards (the MPEG and
ITU-T series). It consists of one 16x16 luminance sample block, one 8x8 Cb
sample block and one 8x8 Cr sample block (Fig. 2.6).
Figure 2.6 Macroblock representation in 4:2:0 format
Theoretically, the smaller the block size, the better the motion
estimation performance.
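A minimal sketch of this block-matching idea in C: an exhaustive full
search over a small window that minimizes the SAD (sum of absolute
differences) as the measure of residual energy. The frame layout, the
16x16 block size and the search range are illustrative assumptions;
practical encoders use much faster search strategies:

```c
#include <stdlib.h>

/* SAD between a 16x16 block of the current frame at (cx,cy) and a
   candidate block of the reference frame at (rx,ry). Frames are
   stored row-major with the given stride. */
static long sad16(const unsigned char *cur, const unsigned char *ref,
                  int stride, int cx, int cy, int rx, int ry)
{
    long sad = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sad += abs(cur[(cy + y) * stride + cx + x] -
                       ref[(ry + y) * stride + rx + x]);
    return sad;
}

/* Full search in a +/-range window; writes the motion vector that
   minimizes the SAD into (*mvx, *mvy). Assumes the search window
   stays inside the reference frame. */
static void full_search(const unsigned char *cur, const unsigned char *ref,
                        int stride, int cx, int cy, int range,
                        int *mvx, int *mvy)
{
    long best = -1;
    for (int dy = -range; dy <= range; dy++)
        for (int dx = -range; dx <= range; dx++) {
            long sad = sad16(cur, ref, stride, cx, cy, cx + dx, cy + dy);
            if (best < 0 || sad < best) {
                best = sad;
                *mvx = dx;
                *mvy = dy;
            }
        }
}
```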
2.1.6 Motion vectors
A motion vector is a two-component pair (∆x, ∆y) that indicates the
position offset of the current macroblock relative to its best matching
region in the vertical and horizontal directions (Fig. 2.7). The motion
vector is encoded and transmitted together with the residual.
Figure 2.7 Motion vector representation
During decoding, the residual is added to the matching region to recover
the current frame. With the help of the motion vectors, the matching
region can be located in the reference frame.
2.1.7 Block size effect
Fig. 2.9 shows the residual of two successive frames for different block
sizes. Figs. 2.9a and 2.9b are the original frames. Fig. 2.9c is the
residual without motion estimation. Figs. 2.9d, 2.9e and 2.9f are the
MCPE (motion compensated prediction error) based on 16x16, 8x8 and 4x4
block (Fig. 2.8) motion estimation respectively. The residual is the
difference between frame 1 and frame 2: mid-gray in the residual indicates
that the difference is zero, while lighter or darker areas indicate
positive or negative differences. The larger the mid-gray area, the more
redundant information has been removed. To achieve higher compression
efficiency, H.264 adopts smaller block sizes for motion estimation.
However, as the redundant information within the residual is reduced, more
motion vectors must be encoded and transmitted. Therefore H.264 supports
changing the block size dynamically according to the content of the frame.
Figure 2.8 Macroblock partitions for motion estimation and compensation [46].
Figure 2.9 Block size effects on motion estimation, (a) Frame 1, (b) Frame 2, (c) No motion estimation (Inter-frame difference), (d) 16x16 block
(MCPE), (e) 8x8 block (MCPE), (f) 4x4 block (MCPE) [2].
2.1.8 Sub-pixel interpolation
The accuracy of motion compensation is in units of the distance between
pixels. If the motion vector points to an integer-sample position, the
prediction signal consists of the corresponding samples of the reference
picture; otherwise, the corresponding sample is obtained by interpolating
the reference picture at non-integer positions [2]. Non-integer position
interpolation (Fig. 2.10) gives the encoder more choices when searching
for the best matching region compared to integer motion estimation; as a
result, the redundancy in the residual can be reduced further.
Figure 2.10 Sub-pixel interpolation [2]
2.1.9 Discrete cosine transform
After motion estimation, the residual data can be converted into
another domain (the transform domain) to reduce the frequency domain
redundancy. The choice of transform depends on a number of criteria:
(a) data in the transform domain should be decorrelated and compact,
(b) the transform should be reversible, and (c) the transform should be
computationally tractable. The most popular transforms fall into two
categories: block based and image based. Examples of block-based transforms
include the KLT (Karhunen-Loève transform), SVD (singular value
decomposition) and the DCT; examples of image-based transforms include the
DWT (discrete wavelet transform). The DCT is the most popular of these and
is currently employed in most video coding standards.

H.264/AVC employs a smaller transform size than earlier standards. There is
a tradeoff associated with the transform size: a large transform provides
better energy compaction and better preservation of detailed features in a
quantized signal than a small one, but it also introduces more
quantization-induced ringing artifacts.
2.1.10 Quantization
After the DCT, quantization is employed to truncate the magnitudes of the
DCT coefficients in order to reduce the number of bits needed to represent
them. Quantization can be performed on each individual coefficient, which
is known as scalar quantization (SQ), or on a group of coefficients
together, which is known as vector quantization (VQ) [7].
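A minimal sketch of uniform scalar quantization and reconstruction in C;
the step size and the round-to-nearest convention are illustrative
assumptions, not the exact H.264 quantizer:

```c
/* Uniform scalar quantization: map a coefficient to an integer level,
   then reconstruct it. Information is lost in the rounding, which is
   what creates the compression (and the artifacts). */
static int quantize(int coeff, int step)
{
    /* round to nearest for positive and negative values */
    return (coeff >= 0) ? (coeff + step / 2) / step
                        : -((-coeff + step / 2) / step);
}

static int dequantize(int level, int step)
{
    return level * step;
}
```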
2.1.11 Zigzag scan
After quantization, most of the non-zero DCT coefficients are located
close to the upper left corner of the matrix. The zigzag scan (Fig. 2.11)
rearranges the coefficients so that most of the zeros are grouped together
in the output data stream. In the following run length coding stage, these
strings of zeros can be encoded with very few bits.
Figure 2.11 Zigzag scan [2].
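A minimal sketch of the scan in C, using the zigzag order defined for 4x4
blocks in H.264 frame coding; larger block sizes follow the same principle:

```c
/* Zigzag order for a 4x4 block of coefficients, listed as raster-scan
   indices into the block. */
static const int zigzag4x4[16] = {
    0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15
};

/* Reorder a 4x4 coefficient block (row-major) into zigzag order so
   that trailing zeros cluster at the end of the output array. */
static void zigzag_scan(const int block[16], int out[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = block[zigzag4x4[i]];
}
```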
2.1.12 Run length encoding
Run length coding uses a series of (run, level) pairs to represent a
string of data. For example, for the input data array {2, 0, 0, 0, 5,
0, 3, 7, 0, 0, 0, 1, …} the output (run, level) pairs are (0, 2), (3, 5),
(1, 3), (0, 7), (3, 1), … Here run is the number of zeros before the next
non-zero value, and level is the value of that non-zero data.
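A minimal sketch of this encoding in C, reproducing the example above;
end-of-block handling and the coefficient alphabet of an actual standard
are omitted:

```c
#include <stdio.h>

/* Emit (run, level) pairs: run is the number of zeros before each
   non-zero value, level is the value itself. Trailing zeros after the
   last non-zero value are not emitted here. */
static void run_length_encode(const int *data, int n)
{
    int run = 0;
    for (int i = 0; i < n; i++) {
        if (data[i] == 0) {
            run++;
        } else {
            printf("(%d, %d) ", run, data[i]);
            run = 0;
        }
    }
    printf("\n");
}

int main(void)
{
    int data[] = {2, 0, 0, 0, 5, 0, 3, 7, 0, 0, 0, 1};
    run_length_encode(data, 12); /* prints (0, 2) (3, 5) (1, 3) (0, 7) (3, 1) */
    return 0;
}
```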
2.1.13 Entropy coding
The last stage in Fig. 2.4 is entropy coding. The entropy encoder
compresses the quantized data into a smaller number of bits for
transmission. This is achieved by giving each value a unique code word
based on its probability in the data stream: the higher the probability of
a value, the fewer bits are assigned to its code word. The most commonly
used entropy coders are the Huffman encoder and the arithmetic encoder,
though for applications requiring fast execution, simple run length
encoding (RLE) has proven very effective [7].
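The link between probability and code length can be made concrete with the
Shannon entropy, which lower-bounds the average number of bits per symbol
that any entropy coder can achieve. A minimal sketch in C with an assumed,
purely illustrative symbol distribution:

```c
#include <math.h>
#include <stdio.h>

/* Shannon entropy H = -sum(p * log2(p)) in bits per symbol: the
   theoretical lower bound on the average code word length. */
static double entropy_bits(const double *p, int n)
{
    double h = 0.0;
    for (int i = 0; i < n; i++)
        if (p[i] > 0.0)
            h -= p[i] * log2(p[i]);
    return h;
}

int main(void)
{
    /* A skewed (illustrative) distribution: frequent symbols deserve
       short code words, rare ones long code words. */
    double p[] = {0.5, 0.25, 0.125, 0.125};
    printf("entropy = %.3f bits/symbol\n", entropy_bits(p, 4)); /* 1.750 */
    return 0;
}
```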
Two advanced entropy coding methods, known as CAVLC (context-based
adaptive variable length coding) and CABAC (context-based adaptive binary
arithmetic coding), are adopted by H.264/AVC. These two methods improve
coding efficiency compared to the methods applied in previous standards.
2.2 MPEG and H.26x
2.2.1 ISO/IEC, ITU-T and JVT
ISO/IEC and ITU-T (International Telecommunication Union Telecommunication
Standardization Sector) are the two main international organizations
recommending coding standards for video, audio and their combination. The
H.26x family of standards is designed by ITU-T. As the ITU
Telecommunication Standardization Sector, ITU-T is a permanent organ of the
ITU responsible for studying technical, operating and tariff questions and
issuing Recommendations on them with a view to standardizing
telecommunications on a worldwide basis [1]. H.261 is the first version of
the H.26x series, whose development started in 1984. In the following
years, H.262, H.263, H.263+, H.263++ and H.264 were released by ITU-T [1].
The MPEG (Moving Picture Experts Group) family of standards includes
MPEG-1, MPEG-2 and MPEG-4 [8], formally known as ISO/IEC 11172,
ISO/IEC 13818 and ISO/IEC 14496. MPEG was originally the name given to the
group of experts that developed these standards. The MPEG working group
(formally ISO/IEC JTC1/SC29/WG11) is part of JTC1, the joint ISO/IEC
technical committee on information technology. The Joint Video Team (JVT)
consists of members from ISO/IEC JTC1/SC29/WG11 (MPEG) and ITU-T SG16 Q.6
(VCEG); it published the H.264 Recommendation / MPEG-4 Part 10 standard [1].
2.2.2 H.261
H.261, first developed by ITU-T in 1990, is a video compression standard
targeting low bit rate real time applications (down to 64 kbit/s) such as
visual telephone service. Its coding is based on the DCT, VLC entropy
coding and a simple motion estimation technique for reducing the
redundancy of the video information.
2.2.3 MPEG-1
The MPEG-1 standard, published in 1992, was designed to produce
reasonable quality images and audio at low bit rates. MPEG-1 provides a
resolution of 352x240 (SIF) for NTSC or 352x288 for PAL at 1.5 Mbit/s. The
target applications are CD-ROM, video CD and streaming media applications
such as video over digital telephone networks and video on demand (VOD).
The picture quality is roughly equal to that of VHS tape. MPEG-1 can also
be encoded at bit rates as high as 4-5 Mbit/s. MPEG-1 also specified the
compression of audio signals, in layers simply called layer-1, -2 and -3;
layer-3 is now very popular for digital music distribution over the
Internet, known as MP3.
2.2.4 H.262 and MPEG-2
The MPEG-2 standard was established by ISO/IEC in 1994. Its purpose is to
support higher data rates and better video quality than MPEG-1. The coding
technique of MPEG-2 is the same as that of MPEG-1 but with a higher picture
resolution of 720x486. The unique feature of MPEG-2 is its layered
structure, which supports a scalable video system: a video stream can be
decoded to videos of different quality according to the condition of the
network and the customer requirements. Field and frame picture structures
make the standard compatible with interlaced video. For consistency among
the standards, MPEG-2 is also backward compatible with MPEG-1, meaning an
MPEG-2 player can play back MPEG-1 video without any modification. This
standard was also adopted by ITU-T as H.262.
2.2.5 H.263/H.263+/H.263++
H.263 (1995) [9] is an improvement of H.261. Compared to the former
standards, H.263 achieves better picture quality and a higher compression
rate by using half pixel interpolation and more efficient VLC coding.
H.263 version 2 (H.263+) and H.263 version 3 (H.263++) add more options to
the standard on the basis of H.263, achieving higher coding efficiency,
more flexibility, scalability support and error resilience support.
2.2.6 MPEG-4
MPEG-4 (ISO/IEC 14496) became an international standard in 1999 [8]. Its
basic coding theory remains the same as that of previous MPEG standards,
but it is more network oriented and better suited to broadcast, interactive
and conversational environments. MPEG-4 introduced the concept of
'objects': a video object in a scene is an entity that a user is allowed to
access (seek, browse) and manipulate (cut and paste). It serves bit rates
from 2 kbit/s for speech and 5 kbit/s for video up to 5 Mbit/s for
transparent quality video and 64 kbit/s per channel for CD quality
audio [8].
2.2.7 MPEG-4 part-10/H.264
The ITU-T Video Coding Experts Group (VCEG) and ISO/IEC MPEG jointly
developed the newest standard, H.264/AVC (also known as MPEG-4 Part 10).
The motivation for this standard came from growing multimedia services and
the popularity of HDTV, which need a more efficient coding method. At the
same time, various transmission media, especially low speed media (cable
modem, xDSL or UMTS), also called for a significant enhancement of coding
efficiency. By introducing several new techniques, H.264/AVC aims to
increase the compression rate significantly (saving up to 50% bit rate at
the same picture quality compared to MPEG-2) while transmitting high
quality video at both high and low bit rates. The standard increases
resilience to errors by supporting flexibility in the coding and
organization of coded data, and its network abstraction layer allows the
H.264 bit stream to be transported over different networks. The increase in
coding efficiency and coding flexibility comes at the expense of increased
complexity compared to the other standards. These features are discussed in
chapter 3.
Figure 2.12 Progression of the ITU-T recommendations and MPEG standards
(timeline 1984-2006: ITU-T standards H.261, H.263, H.263+, H.263++; MPEG
standards MPEG-1, MPEG-4; joint ITU-T/MPEG standards H.262/MPEG-2 and
H.264/MPEG-4 AVC)
Chapter 3
OVERVIEW OF H.264/AVC STANDARD
As broadband wired and wireless communication booms around the world,
streaming video has become one of the most important applications in both
the internet and telecom industries. 3G wireless service has been launched
throughout the world, and enhanced data services such as HSDPA (high-speed
downlink packet access) are being introduced with bandwidths of more than
384 kbps. Multimedia streams including video and audio are thus expected to
be delivered to end users. However, the total bandwidth is still limited,
and the cost to the end user is proportional to the reserved bit rate or
the number of bits transmitted on the data link. At the same time, since
the harsh transmission environment of wireless communications (distance
attenuation, shadow fading and multi-path fading) can introduce
unpredictable packet loss and errors during transmission, compression
efficiency and error resilience are the main requirements for a video
coding standard to succeed in the future.
Several image and video coding standards are currently in wide use, such
as JPEG, JPEG2000, MPEG-2 and MPEG-4 [2]. In 2003, H.264/AVC was introduced
with significant enhancements in both compression efficiency and error
resilience. Compared with former video coding standards such as MPEG-2 and
MPEG-4 Part 2, it saves approximately 50% in bit rate [10] and provides
important characteristics such as error resilience, stream switching, and
fast forward/backward playback. It is believed to be the most competitive
video coding standard of this new era. However, the improvement in
performance comes at the expense of an increase in computational
complexity, which requires higher speed in both hardware and software.
H.264/AVC targets applications such as video conferencing (full duplex) and
video storage or broadcasting (half duplex), with enhanced compression
efficiency as well as network friendliness. The scope of H.264/AVC covers
two layers: the network abstraction layer (NAL) and the video coding layer
(VCL) (Fig. 3.1). While the NAL provides better support for video
transmission through a wide range of network environments, the VCL focuses
on enhancing the coding efficiency.
Figure 3.1 H.264/AVC layer structure [11]
This chapter investigates the features that allow H.264/AVC to achieve its
performance improvement over former standards, with the improvements to the
video coding layer addressed in more detail. Before discussing the
technical features of H.264/AVC, some important terminology should be
introduced.
Video coding standards commonly use a hierarchical syntax. A video
sequence is divided into groups of pictures; a picture is divided into
slices; a slice is divided into macroblocks; and a macroblock is divided
into blocks (Fig. 3.2). In H.264/AVC, a block can additionally be further
divided into sub-blocks.
Figure 3.2 Hierarchical syntax
Coded picture: A coded picture in this standard refers to a field (of
interlaced video) or a frame (of progressive or interlaced video)
(Fig. 3.3). Each coded frame has a unique frame number, which is signaled
in the bit stream; the frame number is not necessarily the same as the
decoding order of the frame. For interlaced frames, each field has an
associated picture order count, used to indicate the decoding order
between the two fields.
Figure 3.3 Progressive and interlaced frames [2].
Each previously coded picture can be used as a reference picture for
future decoded pictures. One notable feature is that the reference
pictures are managed in one or two lists (list 0 and list 1). The
macroblock is the basic data unit for video coding operations. A set of
macroblocks is grouped into a slice in raster scan order, and a frame may
be split into one or more slices (Fig. 3.4).
Figure 3.4 Subdivision of a picture into slices [10].
For each slice, the macroblocks within it are coded independently of
those within other slices. Five types of slices are defined in H.264/AVC:

I slice: All macroblocks in this slice are I macroblocks. They are coded
without reference to previously coded pictures, but may use the decoded
samples within the same slice (the current picture) as a reference (intra
prediction).

P slice: Macroblocks in this type of slice can be P macroblocks or I
macroblocks. A P macroblock is predicted from one previously decoded
picture in list 0.

B slice: In addition to the coding types available in a P slice,
macroblocks of a B slice can also be coded using inter prediction with two
reference pictures per predicted block (one from list 0 and/or one from
list 1) that are combined using a weighted average.

SP slice: A so-called switching P slice, coded such that efficient
switching between different video streams becomes possible without the
large number of bits needed for an I slice [12].

SI slice: A so-called switching I slice that allows an exact match of a
macroblock in an SP slice for random access or error recovery purposes.
The last two slice types are new in H.264/AVC; the first three are
similar to those used in earlier standards.

Profile: A profile defines the set of coding algorithms or functions that
a coding standard may use. H.264 defines the baseline profile (lower
capability plus error resilience), main profile (high compression
quality), extended profile (added features for efficient streaming) and
high profiles (Fig. 3.5).

Level: The performance limits of codecs are defined as a collection of
levels, each of which places restrictions on the configuration of the
coding process, such as decoding speed, sample rate and number of blocks
per second.
Figure 3.5 The specific coding parts of profile in H.264 [42].
3.1 Network abstraction layer
The NAL (network abstraction layer) is designed to provide friendly
transmission of video data through different network environments. The
coded video data is packetized into NAL units in order to support most
existing packet-switched network environments. Each NAL unit is a packet
containing an integer number of bytes. The first byte of each NAL unit is
a header indicating the type of data in the unit, and the remaining bytes
contain the payload data indicated by the header [11]. For more detailed
information about the NAL, refer to [11].
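As an illustration of the one-byte header, a minimal sketch in C that
unpacks its three fields (forbidden_zero_bit, nal_ref_idc and
nal_unit_type) as defined by the H.264 syntax:

```c
/* Unpack the one-byte NAL unit header defined by H.264:
   bit 7     : forbidden_zero_bit (must be 0 in a valid stream)
   bits 6..5 : nal_ref_idc (non-zero if the unit is used as a reference)
   bits 4..0 : nal_unit_type (e.g. 1 = non-IDR slice, 5 = IDR slice,
               7 = sequence parameter set, 8 = picture parameter set) */
struct nal_header {
    int forbidden_zero_bit;
    int nal_ref_idc;
    int nal_unit_type;
};

static struct nal_header parse_nal_header(unsigned char b)
{
    struct nal_header h;
    h.forbidden_zero_bit = (b >> 7) & 0x1;
    h.nal_ref_idc        = (b >> 5) & 0x3;
    h.nal_unit_type      =  b       & 0x1f;
    return h;
}
```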
3.2 Video coding layer
Modifications have been made in the video coding layer of H.264 in order
to achieve significantly better compression efficiency than previous
standards. The basic encoding structure of H.264/AVC is shown in Fig. 3.6.
First the video source is divided into blocks of luma and chroma samples.
Then motion estimation and prediction are employed to exploit the temporal
and spatial redundancies. Finally, transform coding, quantization and
entropy coding are applied in series to generate the output bit stream,
which can be transmitted through networks or stored on optical or magnetic
storage devices.
Figure 3.6 H.264 encoder [45]
The decoding flow consists of a series of operations that reverse the
encoding process (Fig. 3.7). The only operation added to the decoding flow
is the loop filter (deblocking filter), whose purpose is to minimize the
block distortions introduced by block-based transforms and motion
estimation. As in existing standards, H.264/AVC defines only the video
decoding procedure: by imposing a collection of restrictions on the
decoding process (such as restrictions on the bit stream and syntax), any
encoding process that produces a bit stream decodable by the standard
decoding process is an acceptable encoder. In this way, developers have
great flexibility in designing encoders to suit different applications
with various requirements (such as compression quality, implementation
cost, time to market, etc.) [11].
Figure 3.7 H.264 decoder [46].
3.2.1 Motion estimation and compensation for inter frames
The key features added to the motion estimation and compensation part of
H.264/AVC are (1) variable block-size motion compensation with small block
sizes, (2) quarter-pixel motion estimation accuracy, and (3) multiple
reference picture selection.

1. Variable block-size motion compensation with small block sizes:

In previous standards, motion estimation is based on a 16x16 macroblock
for the luma component and an 8x8 block for the chroma components in 4:2:0
format. In H.264/AVC, several block sizes are supported for motion
compensation. The luminance component (Y) of each macroblock can be
partitioned in four ways: one 16x16 macroblock, two 16x8 rectangular
blocks, two 8x16 rectangular blocks or four 8x8 blocks (Fig. 3.8). If the
8x8 mode is chosen, each 8x8 block may be further divided in four ways:
one 8x8 block, two 8x4 sub-blocks, two 4x8 sub-blocks or four 4x4
sub-blocks (Fig. 3.8).
Figure 3.8 Motion compensation accuracy [45]
The block size is chosen by transmitting one additional syntax element
for each 8x8 partition, which specifies whether the corresponding 8x8
partition is divided further. The partition strategy can be viewed as a
tree structure. The partitions for the chroma components (Cb, Cr) in 4:2:0
format are done in the same manner, except that the size of each partition
is half of the luma partition in both the horizontal and vertical
directions (16x8 in luma corresponds to 8x4 in chroma, and 8x4 in luma
corresponds to 4x2 in chroma) [2]. The smaller the block into which a
region is split, the less energy is left within the residual. By using the
combination of seven different block sizes, bit rate savings of up to 12%
can be achieved compared to using only a 16x16 block size [13].
2. Quarter-pixel motion estimation accuracy:

Most existing standards support motion estimation accuracy up to half a
pixel. In H.264/AVC, the maximum accuracy is enhanced to a quarter pixel.
Each partition or sub-macroblock partition in an inter-coded macroblock is
predicted from an area of the same size in a reference picture. The offset
between the two areas (the motion vector) has quarter-sample resolution
for the luma component and one-eighth-sample resolution for the chroma
components. Since luma and chroma samples at sub-sample positions do not
exist in the reference picture, they must be created by interpolation from
nearby coded samples.
Figure 3.9 shows the quarter-pixel interpolation of a 4x4 luma block. The
gray dots labeled with upper case letters indicate the integer-position
samples; the white dots labeled with lower case letters indicate the half-
and quarter-pixel samples. First the half-sample positions are obtained by
applying a 6-tap filter with tap values (1, -5, 20, 20, -5, 1)/32.
Quarter-sample positions are then obtained by averaging samples at integer
and half-sample positions. In practice, the motion vectors (MV) of a block
use one or two extra bits to indicate whether the motion estimation is
integer, half-pixel or quarter-pixel. Quarter-pixel accuracy gives about
20% bit rate savings [14] as well as more accurate motion representation
compared to integer-pixel spatial accuracy.
Figure 3.9 Quarter sample luma interpolation [2]
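As an illustration, a minimal sketch of the luma interpolation in C: the
6-tap half-sample filter applied to six consecutive integer samples, and
the quarter-sample average. The (v + 16) >> 5 rounding and the clip to the
8-bit range follow the standard; boundary handling and the
two-dimensional filtering order are omitted for brevity:

```c
/* Clip a value to the 8-bit sample range. */
static int clip255(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* Half-sample value between p[2] and p[3], where p points at six
   consecutive integer samples. Implements the 6-tap filter
   (1, -5, 20, 20, -5, 1) with rounding and normalization by 32. */
static int half_sample(const unsigned char p[6])
{
    int v = p[0] - 5 * p[1] + 20 * p[2] + 20 * p[3] - 5 * p[4] + p[5];
    return clip255((v + 16) >> 5);
}

/* Quarter-sample value: average of two neighboring samples (integer
   or half positions) with upward rounding. */
static int quarter_sample(int a, int b)
{
    return (a + b + 1) >> 1;
}
```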
3.2.2 Multiple reference pictures selection
The H.264/AVC standard gives the encoder the flexibility to select from a
large number of decoded reference pictures. This flexibility increases the
memory requirements of both the encoder and the decoder, but enhances
compression efficiency at the same time. For a P macroblock, the reference
picture can be chosen from multiple previously decoded pictures
(Fig. 3.10). Therefore, in addition to the motion vector, a reference
index parameter ∆ (which indicates which picture is referenced) is
transmitted. The reference index parameter is transmitted for each
motion-compensated 16x16, 16x8, 8x16 or 8x8 luma block; motion
compensation for regions smaller than 8x8 uses the same reference index
for the prediction of all blocks within the 8x8 region.
Figure 3.10 Multiple reference frames and generalized bi-predictive frames [10].
Because of the increased complexity of motion prediction, the H.264/AVC
standard employs two distinct lists of reference pictures (list 0 and
list 1). For a P slice, only list 0 is used to store the reference
pictures, whereas a B slice needs both list 0 and list 1. For detailed
reference picture management, refer to [1] and [2].

Motion compensated prediction for a B slice works in the same manner
except that it is bi-directional. In B slices, four types of inter-picture
prediction are supported: list 0, list 1, bi-predictive and direct
prediction. In the bi-predictive mode, the prediction signal is formed as
a weighted average of motion-compensated list 0 and list 1 predictions.
The direct prediction mode is inferred from previously transmitted syntax
elements and can be list 0, list 1 or bi-predictive.

Multiple reference pictures yield about 5-20% higher coding efficiency in
the new standard [14] compared to former standards that use only one
reference frame.
3.2.3 Intra prediction
Intra prediction allows the current macroblock to be predicted from
previously decoded samples within the same slice. The encoder can switch
between intra and inter prediction dynamically according to the content of
the frame. Directional spatial prediction for intra coding improves the
quality of the prediction signal (Fig. 3.11).
Figure 3.11 Intra prediction in H.264 [45]
Luma intra prediction uses either a single prediction for the entire
16x16 macroblock or 16 individual predictions of 4x4 blocks; in the high
profiles there is also 8x8 intra prediction. There are 9 intra 4x4
prediction modes (DC plus 8 directional) and 4 intra 16x16 prediction
modes (vertical, horizontal, DC, planar) for the luma component. For the
chroma components, four 8x8 intra prediction modes (vertical, horizontal,
DC, planar) are supported, and both chroma components of the same
macroblock (Cb and Cr) use the same prediction mode. In addition to the
intra prediction modes above, another intra coding mode (I_PCM) is used
for some special cases: I_PCM sends the image samples directly, without
prediction or transformation, which guarantees a limit on data expansion
when coding noise-like content. The 16x16 and 4x4 intra prediction
direction modes are shown in Figs. 3.12 and 3.13 respectively.
Figure 3.12 16x16 intra prediction directions
For regions with less spatial detail (flat regions), H.264 supports 16x16
intra coding. The prediction mode for each block is coded efficiently by
assigning shorter symbols to more likely modes, where the probability of
each mode is determined from the modes used to code the surrounding
blocks.
Figure 3.13 4x4 intra prediction directions
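A minimal sketch of three of the nine 4x4 luma prediction modes in C
(vertical, horizontal and DC), predicting a block from the reconstructed
samples above and to its left; neighbor-availability checks and the
remaining directional modes are omitted:

```c
/* Predict a 4x4 block from its reconstructed neighbors: top[0..3] are
   the samples above the block, left[0..3] the samples to its left.
   Three of the nine H.264 4x4 luma modes are shown. */
enum { MODE_VERTICAL, MODE_HORIZONTAL, MODE_DC };

static void intra_predict_4x4(unsigned char pred[4][4],
                              const unsigned char top[4],
                              const unsigned char left[4],
                              int mode)
{
    int x, y, sum;
    switch (mode) {
    case MODE_VERTICAL:             /* copy the row above downwards */
        for (y = 0; y < 4; y++)
            for (x = 0; x < 4; x++)
                pred[y][x] = top[x];
        break;
    case MODE_HORIZONTAL:           /* copy the left column rightwards */
        for (y = 0; y < 4; y++)
            for (x = 0; x < 4; x++)
                pred[y][x] = left[y];
        break;
    default:                        /* DC: mean of the eight neighbors */
        sum = 0;
        for (x = 0; x < 4; x++)
            sum += top[x] + left[x];
        for (y = 0; y < 4; y++)
            for (x = 0; x < 4; x++)
                pred[y][x] = (unsigned char)((sum + 4) >> 3);
        break;
    }
}
```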
3.2.4 Transform and quantization
H.264/AVC employs an integer spatial transform, primarily 4x4 in shape
(Fig. 3.14), as opposed to the usual floating point 8x8 DCT specified with
rounding error tolerances used in earlier standards. H.264 uses three
transforms depending on the type of residual data to be coded: a transform
for the 4x4 array of luma DC coefficients in intra macroblocks predicted
in 16x16 mode, a transform for the 2x2 array of chroma DC coefficients,
and an integer DCT for all other 4x4 blocks of residual data.
Figure 3.14 Transform coding [45]
The characteristics of the transform used in H.264/AVC are as follows:
1. For 16x16 intra prediction, the transform is applied in two stages: a
4x4 integer DCT in the first stage, followed by a Hadamard transform
applied to the DC components of the first stage.
2. The 4x4 block transform is separable.
3. Integer transform: accuracy mismatch between encoder and decoder is
eliminated.
4. It consists of only additions and shifts, so it is easy to implement
(see the sketch after the quantization characteristics below).
5. The even and odd rows of the transform matrix have different norms.
6. The small transform size reduces ringing artifacts in a frame [15].
The characteristics of quantization used in H.264/AVC are as follows:
1. The scaling part of the transform is integrated into the quantizer.
2. Logarithmic step size control.
3. Extended range of step sizes: QP is in the range 0-51.
4. Smaller step sizes for chroma than for luma.
5. Reconstruction requires just one multiply, one add and one shift.
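As an illustration of transform characteristic 4 above, a minimal sketch
of the 4x4 forward core transform in C, using the integer matrix with rows
(1, 1, 1, 1), (2, 1, -1, -2), (1, -1, -1, 1), (1, -2, 2, -1); the
norm-correcting scaling that the standard folds into the quantizer is
deliberately omitted here:

```c
/* Forward 4x4 integer core transform of H.264, using only additions
   and shifts (multiplication by 2 is a left shift). The post-scaling
   that compensates for the different row norms is folded into the
   quantization step and is not shown. */
static void forward_transform_4x4(const int in[4][4], int out[4][4])
{
    int tmp[4][4];
    int i;

    /* Transform rows. */
    for (i = 0; i < 4; i++) {
        int s03 = in[i][0] + in[i][3], d03 = in[i][0] - in[i][3];
        int s12 = in[i][1] + in[i][2], d12 = in[i][1] - in[i][2];
        tmp[i][0] = s03 + s12;
        tmp[i][1] = 2 * d03 + d12;
        tmp[i][2] = s03 - s12;
        tmp[i][3] = d03 - 2 * d12;
    }

    /* Transform columns. */
    for (i = 0; i < 4; i++) {
        int s03 = tmp[0][i] + tmp[3][i], d03 = tmp[0][i] - tmp[3][i];
        int s12 = tmp[1][i] + tmp[2][i], d12 = tmp[1][i] - tmp[2][i];
        out[0][i] = s03 + s12;
        out[1][i] = 2 * d03 + d12;
        out[2][i] = s03 - s12;
        out[3][i] = d03 - 2 * d12;
    }
}
```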
3.2.5 Deblocking filter
The deblocking filter is introduced in the H.264/AVC standard to minimize
the block distortion caused by the compression techniques described above.
By controlling the filtering strength with several parameters, block edges
can be smoothed and the appearance of the decoded frames enhanced. The
deblocking filter is an in-loop filter, not a form of post processing: for
a post filter, the input is a completely reconstructed frame, but for the
in-loop filter, the input is the current MB, and the boundaries of each
decoded macroblock are filtered immediately. In the encoder, the
deblocking filter is placed after the inverse transform and before the
frame store; in the decoder, the filter is located after the
reconstruction of the frame, before display. The deblocking filter is
explained in detail in chapter 5.
3.2.6 Entropy coding
An Exp-Golomb code is used universally for all syntax elements except
transform coefficients. The following two entropy coding algorithms are
used in the H.264 standard:

1. Context Adaptive Variable Length Coding (CAVLC)
   a. There is no end-of-block symbol; instead, the number of coefficients
      is decoded.
   b. Coefficients are scanned backwards, and contexts are built depending
      on the transform coefficients.
   c. Transform coefficients are coded with the following elements: the
      number of non-zero coefficients, the levels and signs of all non-zero
      coefficients, the total number of zeros before the last non-zero
      coefficient, and the run of zeros before each non-zero coefficient.
   d. The VLC table to use is chosen adaptively based on the number of
      coefficients in the neighboring blocks.

2. Context Adaptive Binary Arithmetic Coding (CABAC)
   a. An overview of CABAC is shown in Fig. 3.15.
   b. Adaptive probability models are used for most symbols.
   c. Symbol correlations are exploited by using contexts.
   d. Binary decisions are discriminated by their positions in the binary
      sequence.
   e. Probability estimation is realized via a lookup table.
Figure 3.15 CABAC overview [45]
3.3 Conclusions
H.264/AVC provides significant enhancement in both compression efficiency
and error resilience. Compared with earlier video coding standards such as
MPEG-2 and MPEG-4 Part 2, it saves more than 40% in bit rate [14] and
provides important characteristics such as error resilience, stream
switching, and fast forward/backward playback. It is believed to be the
most competitive video coding standard of this new era. However, the
improvement in performance also brings an increase in computational
complexity, which requires higher speed in both hardware and software.
Chapter 4
REVIEW OF POSTPROCESSING TECHNIQUES
4.1 Need for postprocessing
At very low bit rate coding, artifacts are visible in the decoded frames
of most video coding standards [3]. As these artifacts are visually
unpleasant for the viewer, a method is needed to remove them. Some of
today's real time and even offline applications demand higher bandwidth
than the channel can accommodate; examples are video telephony,
videoconferencing, and video streaming over the internet. The signal
source, compression method and coding bit rate normally influence the
perceptual quality of compressed images and video [16]. In general, the
lower the bit rate, the more severe the coding artifacts that manifest in
the reconstructed video. The compression method determines the type of
artifacts that occur: BDCT (block discrete cosine transform) based coding
schemes introduce blocking artifacts in flat regions and ringing artifacts
along object edges [17] at low bit rate coding, while wavelet based coding
schemes introduce blurring and ringing artifacts [18]. If the coding
artifacts are removed from the reconstructed video, its visual quality
improves significantly.
4.1.1 Preprocessing
Two techniques are widely used to remove the coding artifacts at low
bit rate coding. One way is to use pre-processing techniques at the encoder
side. Pre-processing techniques have been widely used in modern speech
and audio coding [19]. Many pre-processing techniques have been proposed
[20] which remove unnoticeable details from the source signal so that less
information has to be coded. Recently, different image transforms like
interleaved block transform [21], [22], the lapped transform [23], and the
combined transform [24] have been proposed to avoid blocking artifacts. But
each of them requires its own coding scheme, which restricts its application
in commercial coding products compliant with the existing standards.
4.1.2 Postprocessing
An efficient and widely used way to remove the artifacts is post-
processing. Post-processing gives better video quality without an increase
in bit rate or any modification of the encoding procedure. This is probably
why it has been an active area of research in recent years. Post-
processing techniques are employed at the decoder side. These techniques
can be broadly classified into two categories: image enhancement and
image restoration.
Algorithms based on image enhancement improve the perceived
quality subjectively. The artifact structure and the HVS are taken into
account in the design of image enhancement methods. The image
enhancement approach aims at smoothing visible artifacts instead of
restoring each pixel to its original value. One typical example is the
application of filtering along block boundaries to reduce blockiness [24].
Algorithms based on image restoration formulate post-processing as
an image recovery problem. Reconstruction is performed based on a priori
knowledge of the distortion model and the observed data at the decoder.
Several classical image restoration techniques, including CLS (constrained
least squares), POCS (projection onto convex sets), and MAP (maximum a
posteriori) have been used to alleviate compression artifacts. The
computational complexity of these methods is another issue for applications
that require real time processing.
4.2 Causes for blocking and ringing artifacts
As seen in section 4.1, at very low bit rate coding artifacts appear in
decoded video or images. BDCT based coding schemes are used in most
recent video coding standards like MPEG-2, MPEG-4, and H.264/AVC. At
low bit rate coding, BDCT based coding scheme introduces artifacts,
prominent among them are blocking and ringing artifacts [17].
4.2.1 Blocking artifacts
Blocking artifacts are the grid noise [25] along the block boundaries in
relatively homogeneous areas (Fig. 4.1). The total error introduced into an
individual pixel is a sum of functions of the quantization errors of the
DCT coefficients. Due to the complexity of the HVS, the perceived
distortion is not directly proportional to the absolute quantization error; it
also depends on the local, global, spatial, and temporal characteristics of
the video sequence. Unfortunately, coding a block as an independent unit
ignores the possibility that the correlation of pixels may extend beyond the
borders of a block into adjacent blocks, thereby leading to border
discontinuities. Coarse quantization of transform coefficients is the major
contributor to blocking artifacts. The use of different quantization step
sizes for adjacent blocks is another contributing factor. However, because
block based transform coding is used, ensuring similar quantization step
sizes in adjacent blocks does not necessarily eliminate the blocking effect [3].
Figure 4.1 Blocking Artifacts at low bit rate coding
4.2.2 Ringing artifacts
Ringing effects appear as spurious oscillations along major object
edges at low bit rates. The ringing effect is most evident along high contrast
edges in areas of generally smooth texture in the reconstruction. The DCT
basis images are not attuned to the representation of diagonal edges and
features [26]. Consequently, more of the higher frequency basis images are
required to satisfactorily represent diagonal edges or significantly diagonally
oriented features. At low bit rate coding, due to coarse quantization, most of
the contributions made by the higher frequency basis images are reduced to
zero. As a result, the ringing effect occurs along edges. The poor energy compacting
properties of the DCT for blocks containing diagonal edges also extend to
the vertical and horizontal edges.
4.3 Deblocking filters
As the proposed work is to reduce the deblocking filter complexity,
only the deblocking filter will be discussed in detail. Although the
deblocking filters perform well in improving the objective and subjective
quality of output video frames, they are usually computationally intensive.
There are a number of deblocking algorithms proposed for reducing the blocking
artifacts in BDCT based compressed images with minimal smoothing of true
edges. They can be classified into regression-based algorithm [27], wavelet
based algorithms [28], anisotropic diffusion based algorithm [29], weighted
sum of pixels across block boundaries based algorithm [30][31][32],
iterative algorithms based on POCS [33][34][35], and adaptive algorithms
[36]. An interesting approach of recasting the deblocking problem into a
denoising problem by injecting uniform random noise was also proposed in
[37]. While these algorithms operate in the spatial domain, algorithms that
work in DCT transformed domain were proposed in [38] [39] and [40].
Among the algorithms proposed, three key classes are discussed in
the following sections: those based on POCS, on weighted sums of pixels
across block boundaries, and on adaptive filters.
4.3.1 POCS based iterative algorithms
This class of algorithms originates from the image restoration
algorithm proposed in [34]. A number of constraints, two in most cases, are
imposed on an image to restore it from its corrupted version. After defining
the transformations between the constraints, the algorithm starts at an
arbitrary point in one of the sets and projects iteratively among them until
convergence. In [33], MSE (mean square error) is used as a metric of
closeness between two consecutive projections. Convergence can be
regarded as achieving an MSE below a certain threshold value. If the
constraints are convex sets, it is claimed in [34] that convergence is
guaranteed if the intersection of the sets is non-empty. This approach was
first applied to the deblocking of images in [33]. In this
paper, the constraint sets chosen are the frequency band limit in both vertical
and horizontal directions (known as filtering constraint) and the quantization
intervals of the transform coefficients (referred to as quantization
constraint). In the first step, an image is band limited by applying a LPF
(low pass filter) on it. After that, the image is transformed to obtain
transform coefficients, which are then subjected to the quantization
constraint. The coefficients lying outside of the quantization interval are
mapped back into the interval. For example, a coefficient can be clipped to
the interval's minimum or maximum value if it lies outside the interval. This two-
step process will be done iteratively until a convergence is reached. The
authors claimed that the algorithm converges after about 20 iterations in
their experiments.
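To fix ideas, here is a minimal, hypothetical 1-D sketch of the two projections of [33]: band limiting with a low pass filter, followed by clipping the transform coefficients back into their quantization intervals. A toy 8-point DCT and invented data stand in for the 2-D image case; this illustrates only the iteration structure, not the algorithm of [33] itself.

```c
#include <stdio.h>
#include <math.h>

#define N 8
#define PI 3.14159265358979323846

/* Orthonormal 8-point DCT-II and its inverse. */
static void dct(const double *x, double *X) {
    for (int k = 0; k < N; k++) {
        double s = 0.0;
        for (int n = 0; n < N; n++)
            s += x[n] * cos(PI * (n + 0.5) * k / N);
        X[k] = s * (k == 0 ? sqrt(1.0 / N) : sqrt(2.0 / N));
    }
}

static void idct(const double *X, double *x) {
    for (int n = 0; n < N; n++) {
        double s = 0.0;
        for (int k = 0; k < N; k++)
            s += X[k] * (k == 0 ? sqrt(1.0 / N) : sqrt(2.0 / N))
                      * cos(PI * (n + 0.5) * k / N);
        x[n] = s;
    }
}

int main(void) {
    double x[N] = {52, 54, 55, 56, 90, 91, 92, 94}; /* invented blocky signal */
    double q = 8.0;  /* hypothetical quantizer step size */
    double X[N], c[N];

    dct(x, c);  /* pretend these dequantized coefficients were received */

    for (int iter = 0; iter < 20; iter++) {
        /* Projection 1: band limit with a simple 3-tap low pass filter. */
        double y[N];
        y[0] = x[0]; y[N - 1] = x[N - 1];
        for (int n = 1; n < N - 1; n++)
            y[n] = 0.25 * x[n - 1] + 0.5 * x[n] + 0.25 * x[n + 1];

        /* Projection 2: clip coefficients back into the quantization
           intervals [c - q/2, c + q/2] around the received values. */
        dct(y, X);
        for (int k = 0; k < N; k++) {
            if (X[k] > c[k] + q / 2) X[k] = c[k] + q / 2;
            if (X[k] < c[k] - q / 2) X[k] = c[k] - q / 2;
        }
        idct(X, x);
    }
    for (int n = 0; n < N; n++) printf("%6.2f ", x[n]);
    printf("\n");
    return 0;
}
```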
Due to the iterative nature of this class of algorithms, the time to
convergence is in fact unknown. This precludes its use in real time
systems, in which the run time of each module must be bounded. If the number
of iterations is bounded regardless of convergence, then the quality of video
becomes an unknown. The projections also involve filtering of the picture
and transformation to frequency domain and require unacceptably large
amounts of extra resources.
4.3.2 Weighted sum of symmetrically aligned pixels
In the second class of algorithms proposed, the value of each pixel in
the picture is recomputed with a weighted sum of itself and the other pixel
values, which are symmetrically aligned with respect to block boundaries. In
[30], the authors proposed a scheme that includes three other pixels, taken
from the block above, the block to the left, and the block diagonally above-left.
The weights are determined empirically and can either be linear or quadratic.
The combined effect of these weighted sums on the pixels can be regarded
as an interpolation across the block boundaries. However, there is a problem
in this approach when a weighted sum of a pixel in a smooth block takes the
pixels in adjacent high-detail blocks into account. The texture details leak
into the smooth region, and a faint copy of the high-detail content can be seen.
This new artifact is referred to as ghosting [30]. A scheme of grading each
block according to the level of details with a grading matrix was proposed to
minimize this new artifact. The weights on each of the four pixels are then
increased or reduced according to the grades.
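A minimal sketch of the symmetric alignment idea may help; the blending weights below are invented (the actual weights in [30] are determined empirically and adjusted by the grading step, which is omitted here), and the image data is a toy pattern.

```c
#include <stdio.h>

#define B 8     /* block size, hypothetical */
#define W 32    /* toy image width and height */

static unsigned char img[W][W];

/* Recompute one pixel as a weighted sum of itself and the three pixels
   symmetrically aligned with it across the block boundaries: in the
   block above, the block to the left, and the block above-left. */
static unsigned char deblock_pixel(int y, int x) {
    int by = (y / B) * B, bx = (x / B) * B;  /* block origin */
    int r = y - by, c = x - bx;              /* offsets inside the block */
    int my = by - 1 - r, mx = bx - 1 - c;    /* mirrored rows/columns */
    if (my < 0 || mx < 0) return img[y][x];  /* skip the picture border */
    /* Invented fixed weights; [30] grades blocks and adapts these. */
    return (unsigned char)((10 * img[y][x] + 2 * img[my][x]
                          + 2 * img[y][mx] + 2 * img[my][mx]) / 16);
}

int main(void) {
    for (int y = 0; y < W; y++)              /* blocky test pattern */
        for (int x = 0; x < W; x++)
            img[y][x] = (unsigned char)(100 + 40 * ((x / B + y / B) % 2));
    printf("%u -> %u\n", img[B][B], deblock_pixel(B, B));
    return 0;
}
```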
The execution time is bounded, as the operations are well defined.
Since the picture must be graded before the filter is applied to the pixels, a
4-pass scheme for processing a picture was proposed. This algorithm
essentially performs a filtering operation on every pixel in a picture, with
three passes of matrix operations in the grading process. A very high
performance platform is required to implement this algorithm in a real time
application.
4.3.3 Adaptive deblocking filter
In this class of algorithms, the deblocking process is separated into
two stages. In the first stage, the edge is classified into different boundary
strengths with the pixels along the normal to an edge. In the second stage,
different filtering schemes are applied according to the strengths obtained in
the first stage. In [36], the edges are classified into three types, to which no
filter, a weak 3-tap filter, or a strong 5-tap filter is applied. The algorithm is
adaptive because the thresholds for edge classification are based on the
quantization parameters included in the relevant blocks. An edge will only
be filtered if the difference between the pixel values along the normal to the
edge, but not across the edge, is smaller than the threshold. For high detail
blocks on the side of edges, the differences are usually larger and so strong
filtering is seldom applied to preserve the details. As the threshold increases
with the QP, the edges across the high detail blocks will be filtered
eventually because a high coding error is assumed for large QP.
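As a rough sketch of this two-stage idea, the classification could look as follows. The thresholds here are invented stand-ins for the QP-derived thresholds of [36] (larger QP gives larger thresholds, hence more filtering), and the activity measure is illustrative only.

```c
#include <stdio.h>
#include <stdlib.h>

enum mode { NO_FILTER, WEAK_3TAP, STRONG_5TAP };

/* Classify one edge from the samples along its normal.
   s[0..5] holds three samples on each side; the edge lies between
   s[2] and s[3]. t_flat and t_edge are hypothetical QP-derived
   thresholds. */
static enum mode classify(const int *s, int t_flat, int t_edge) {
    int activity = 0;
    for (int i = 0; i < 5; i++)
        activity += abs(s[i + 1] - s[i]);   /* variation along the normal */
    if (abs(s[3] - s[2]) >= t_edge)
        return NO_FILTER;                   /* large step: likely a real edge */
    if (activity < t_flat)
        return STRONG_5TAP;                 /* very flat: strong filtering */
    return WEAK_3TAP;                       /* detailed: gentle filtering */
}

int main(void) {
    int line[6] = {60, 61, 61, 70, 70, 71}; /* invented samples across an edge */
    printf("mode = %d\n", classify(line, 8, 16));
    return 0;
}
```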
Since the edges are classified before processing, strong filtering can
be replaced by weak filtering or even skipped. Also the filtering is not
applied to every pixel but only to those across the edges. A significant
amount of computation can be saved through the classification. A
disadvantage of this algorithm is the higher complexity of its control
flow.
4.3.4 Comparison of algorithms
The relative computation and implementation complexity of the three
classes of algorithms discussed above is summarized qualitatively in Table
4.1. The POCS based algorithms are considered the most complex because
their flow and major operations are more intensive than those of the other
two methods. The major operations of the weighted sum based algorithm
and the adaptive algorithm seem similar. For the case of 4x4 pixel blocks,
however, the major operations performed by the adaptive algorithms are
only about half of those of the weighted sum based algorithms. The
adaptive algorithm is considered more complex in implementation because
it classifies edges and applies different filters adaptively.
Table 4.1 A summary of the computation and implementation complexity of the deblocking algorithms.

                            POCS-based            Weighted sum based     Adaptive
                            algorithms [33]       algorithm [30]         algorithms [36]
Algorithm flow              Iteratively project   Grading of blocks      Iteratively classify
                            back and forth        with a grading         and apply a filter
                            between two sets      matrix; iterate on     on every block
                            on the entire         every pixel.           edge.
                            picture.
Major operations            LPF, DCT              Weighted sum of 4      3-tap or 5-tap
                                                  pixels for each        filter on pixels
                                                  pixel.                 across edges.
Relative computation        High                  Medium                 Low
complexity
Relative implementation     High                  Low                    Medium
complexity
Quality                     Best                  Good                   Good
4.4 Comparison of postprocessing and loop filtering
Deblocking filters can be incorporated as post filters or loop filters.
Post filters operate on the display buffer outside of the coding loop, whereas
loop filters operate within the coding loop. Before going into the advantages
of loop filters over post filters, consider their disadvantage: post filters are
not a normative part of the standard and hence offer maximum freedom for
decoder implementations, since their use is optional; a normative loop filter
takes this freedom away.
There are several advantages of loop filters over post filters:
1. With a loop filter in the codec design, content providers can safely
assume that proper deblocking filters process their material, guaranteeing
the quality level expected by the producer.
2. In the loop filtering approach, filtering is carried out on a macroblock
basis during the decoding process, and as the filtered output is directly
stored as reference frames, there is no need for an extra frame buffer in
the decoder. In the post filtering approach, however, the frame is typically
decoded into a reference frame buffer. An additional frame buffer may be
needed to store the filtered frames to be passed onto the display device.
3. Empirical tests have shown that loop filtering typically improves both
objective and subjective quality of video streams with significant
reduction in decoder complexity compared to post filtering [4][5]. As the
filtered frames offer higher quality prediction for motion compensation,
quality improvements can be seen. Computational complexity is reduced
as the image area in the reference frames is already filtered.
4.5 Desired loop filter
Image blocking artifact reduction involves the following problems:
1. Smoothing artificial discontinuities (due to quantization noise or QP)
between blocks.
2. Differentiating between image edges and artificial edges.
3. Image edges should not be smoothed as it degrades image quality.
4. If needed, filters can be applied specific to image edges.
A loop filter should satisfy the above properties. In addition to this the
following properties are desired:
1. It should remove the blocking artifacts without blurring the image.
2. Its computational complexity should be low.
3. Its implementation complexity should be low so that it can be used in
real time systems.
Chapter 5
DEBLOCKING FILTER IN H.264/AVC
H.264/AVC uses an adaptive in-loop deblocking filter to remove the
blocking artifacts visible in decoded frames at low bit rate coding. H.264
uses a 4x4 transform, which reduces the ringing effect in the decoded
frames. The most significant contributor to blocking artifacts in H.264 is the
block-based integer DCT. The coarse quantization of the transform
coefficients can cause visually disturbing discontinuities at the block
boundaries [4] [41]. The other source for blocking artifacts in H.264/AVC is
motion compensated prediction. Motion compensated blocks are generated
by copying interpolated pixel data from different locations of possibly
different reference frames. Since there is almost never a perfect fit for this
data, discontinuities on the edges of the copied blocks typically arise.
Additionally, in the copying process, existing edge discontinuities in the
reference frames are carried into the interior of the block to be compensated.
5.1 Deblocking filter operation
Each video frame is divided into 16x16 pixel blocks called
macroblocks. The deblocking filter is applied to all edges of the 4x4 pixel
blocks in each macroblock except the edges on the boundary of a frame or a
slice. For each macroblock, the vertical edges are filtered from left to right
first, and then the horizontal edges are filtered from top to bottom, as
sketched below. This process is repeated for all macroblocks in a frame.
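A minimal sketch of this filtering order, with a hypothetical filter_edge() routine and with the frame/slice-boundary exception omitted, is:

```c
#include <stdio.h>

/* Stub standing in for the actual edge filter: a real implementation
   would filter the 16 lines of samples crossing the 4x4-block edge
   whose top-left corner is at (x, y). */
static void filter_edge(int x, int y, int vertical) {
    printf("%s edge at (%d, %d)\n", vertical ? "vertical" : "horizontal", x, y);
}

/* Deblock one 16x16 macroblock: the four vertical edge columns left to
   right, then the four horizontal edge rows top to bottom. Edges on a
   frame or slice boundary would be skipped in the real filter. */
static void deblock_macroblock(int mb_x, int mb_y) {
    for (int e = 0; e < 4; e++)
        filter_edge(mb_x * 16 + e * 4, mb_y * 16, 1);
    for (int e = 0; e < 4; e++)
        filter_edge(mb_x * 16, mb_y * 16 + e * 4, 0);
}

int main(void) {
    deblock_macroblock(0, 0);
    return 0;
}
```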
5.1.1 Characteristics of deblocking filter
The main characteristics of a deblocking filter used in H.264/AVC are
as follows:
1. Improvement in subjective visual and objective quality of decoded
picture.
2. Significantly superior to post filtering.
3. Highly content adaptive filtering procedure mainly removes blocking
artifacts and does not necessarily blur the visual content [36].
On slice level, the global filtering strength can be adjusted to
the individual characteristics of the video sequence.
On edge level, the filtering strength is dependent on inter/intra,
motion, and coded residuals.
On sample level, quantizer dependent threshold can turn off
filtering for every individual sample.
Strong filters for macro blocks with very flat characteristics
almost remove blocking effects.
5.1.2 Principle of deblocking filter
The one-dimensional visualization of an edge position is shown in
Fig. 5.1. One line of sample values inside two neighboring 4x4 blocks is
denoted by a3, a2, a1, a0, b0, b1, b2, and b3 in Fig. 5.1. The actual boundary
is between a0 and b0. Filtering of a0 and b0 only takes place if:
1. |a0 - b0| < α(QP)
2. |a1 - a0| < β(QP)
3. |b1 - b0| < β(QP)
where β(QP) is considerably smaller than α(QP).
Filtering of a1 or b1 takes place if, additionally, |a2 - a0| < β(QP) or
|b2 - b0| < β(QP). Filtering can be done on a macroblock basis, that is, immediately
after a macroblock is decoded. First, the vertical edges are filtered then the
horizontal edges.
Figure 5.1 One-dimensional view of a 4x4 block edge
When interpreting the edges in Fig. 5.2 as luma edges, depending on
the transform used (4x4 transform or 8x8 transform), the
following applies [1]:
If 4x4 transform is selected, both the solid bold and dashed bold luma
edges (Fig. 5.2) are filtered.
Otherwise (8x8 transform), only the solid bold luma edges are filtered.
When interpreting the edges in Figure 5.2 as chroma edges, depending
on the format used the following applies:
If 4:2:0 format is selected, only the solid bold chroma edges (Fig. 5.2)
are filtered.
Otherwise, if 4:2:2 format is selected, the solid bold vertical chroma
edges are filtered and both types, the solid bold and dashed bold
horizontal chroma edges are filtered.
Otherwise, if 4:4:4 format is selected, both types, the solid bold and
dashed bold chroma edges are filtered.
Figure 5.2 Boundaries in a macroblock to be filtered [1].
5.1.3 Algorithm of deblocking filter
The decision of filter tap for each pixel is based on the following
factors:
1. Boundary strength (bS).
2. Thresholds of α and β.
3. The content of sample pixels.
Figure 5.3 explains how the above factors are used to decide the filter
tap for each pixel (A0-A3, B0-B3). The first step decides (eq. 5.1) whether
filtering is required or not. Then, according to the bS level, the thresholds (α
and β), and the absolute differences of adjacent reconstructed pixels,
different filters are applied to different pixels [43]. The level of bS
determines how many input pixels are updated with the filtered result: the
higher the level of bS, the more input pixels are updated with the filtered
results. In H.264, bS has five levels, from 0 (lowest) to 4
(highest). If the level of bS is zero, no input pixels are updated with the
filtered result. The updated (B0-B3) pixels could be used for the filtering of
next adjacent block when the filtering window slides one block to the right.
The bS level mainly decides the necessity of filtering and filter type. The bS
level is determined by the MB type, edge position, reference frame type, and
motion vectors of two adjacent blocks as depicted in Fig. 5.4. The bS has the
strongest level when two adjacent blocks are intra coded and are located at
the MB boundary. The highest bS level invokes strong low pass filtering, as
blocking artifacts are most visible in the above case. The lowest bS level
indicates no filtering of the input pixels.
In addition to bS levels, the parameters α and β are used for
preservation of a real edge. The parameters α and β also control the
necessity of filtering (eq. 5.1). α and β are assigned higher values at higher
QP to increase the likelihood of filtering, as a higher QP causes more
noticeable blocking artifacts [43]. In contrast, smaller α and β values are
used for lower QP.
bS ≠ 0 AND |A0 − B0| < α AND |A1 − A0| < β AND |B1 − B0| < β (5.1)
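As a sketch, eq. 5.1 translates directly into a per-line predicate. The bS value (0..4) is assumed to have been derived already from the MB type, edge position, reference frames, and motion vectors per Fig. 5.4, and α and β to have been looked up from QP; the sample values in main() are invented.

```c
#include <stdio.h>
#include <stdlib.h>

/* Eq. 5.1 for one line of samples across an edge: A1, A0 | B0, B1,
   with A0 and B0 adjoining the boundary. alpha and beta are the
   QP-dependent thresholds; bS is the boundary strength (0..4). */
static int filter_this_line(int bS, int A1, int A0, int B0, int B1,
                            int alpha, int beta) {
    return bS != 0
        && abs(A0 - B0) < alpha
        && abs(A1 - A0) < beta
        && abs(B1 - B0) < beta;
}

int main(void) {
    printf("filter? %d\n", filter_this_line(2, 102, 100, 90, 89, 15, 7));
    return 0;
}
```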
5.2 Deblocking filter complexity
Analysis of run time profiles of decoder sub-functions reported that
the deblocking filter process in H.264 is the most computationally intensive
part [6]. The deblocking filter took as much as one-third of the
computational resources of the decoder.
The deblocking technique adopted in this standard requires an
extensive decision matrix to determine whether to filter on block edges and
which filter to employ. The complexity in H.264 is mainly based on high
adaptivity of the filter, which requires conditional processing at the block
edge and sample levels. As a result, conditional branches almost inevitably
appear in the innermost loops of the algorithm. These are very time
consuming and quite a challenge for parallel processing in DSP (digital
signal processor) hardware or SIMD (single instruction multiple data) code
[36] on general-purpose processors. Another reason for high complexity is
the small block size employed for residual coding in H.264 coding
algorithm.
Figure 5.3 Decision flow of filter tap selection [43].
Figure 5.4 Decision flow of boundary strength (bS), where P and Q denote the identifications of two adjacent 4x4 blocks [43].
With the 4x4 blocks and a typical filter length of 2 samples in each
direction, almost every sample in a picture must be loaded from
memory, either to be modified or to determine if neighboring samples will
be modified. This was not the case with earlier standards. Some of the
branches are inherent to the algorithm itself and are hard to eliminate
at the programming level. These branches are data dependent and inherently
difficult to predict.
The standards group has published reference program code that
implements this deblocking. The code includes extensive conditional
branching [44], which makes it unsuitable for deeply pipelined processors
and ASIC (application specific integrated circuit) implementations. In
addition, the code exposes little parallelism, making it unsuitable for
VLIW (very long instruction word) processors and parallel hardware
implementations. This is particularly unfortunate in the case of VLIW
processors, which are otherwise well suited to video encoding/decoding
applications.
Chapter 6
INTRA AND INTER FRAMES
Blocking artifacts are visible in both intra and inter frames. Because
H.264/AVC uses an in-loop deblocking filter, the reference frames used for
motion compensation are already filtered, so motion compensation does not
propagate the blocking artifacts. For this reason, inter frames have less
severe blocking artifacts than intra frames. The major disadvantage of the
in-loop deblocking filter adopted in H.264/AVC is its implementation
complexity, as described in section 5.2.
6.1 Intra frames
Intra coding exploits the spatial redundancies within a video frame.
The resulting frame is referred to as an I-frame. H.264 uses adaptive spatial
prediction to increase the efficiency of the intra coding process. H.264 uses
nine modes of prediction for 4x4 luminance blocks and four modes of
prediction for 16x16 luminance blocks. There is an additional 8x8 block
intra prediction in FRExt (fidelity range extension) [46]. For chroma, four
different modes are defined, similar to the four modes of the 16x16
luminance block.
The blocking artifacts are more severe in intra frames as compared to
inter frames. In intra frames, the blocking effect occurs either in spatially
active areas or in very bright or very dark areas. The blocking effect is,
however, most visible in the smoothly textured sections of a picture. The
DCT of a smoothly textured section concentrates its energy in the lower
order DCT coefficients [3]. Therefore, the lower order DCT coefficients
play a significant role in determining the visibility of the blocking effect.
The blocking effect may also occur in spatially active areas as a result of
coarse quantization [3]. In low bit rate coding, with higher QP the medium
to higher order DCT coefficients are quantized to zero. As a result, an
originally spatially active block will have a smoothly textured
reconstruction, which raises the same visibility concerns as an originally
smooth block.
6.1.1 Proposed method for intra frames
As seen in section 5.2, the deblocking filter used in H.264 has high
implementation complexity. A new method is now proposed to reduce the
implementation complexity of the H.264 deblocking filter while maintaining
comparable quality and computational complexity. The proposed method
as applied to intra frames is shown in Fig. 6.2. The first three blocks in Fig.
6.2 check the conditions at the slice boundaries, which are set by the user.
These three blocks are the same as those used by the existing deblocking
filter in H.264. The edge can be visualized as shown in Fig. 6.1.
Here q0, q1, q2, q3 represent the pixel values loaded from the current 4x4
block and p0, p1, p2, p3 represent the pixel values of the 4x4 block adjacent
to the current block.
Figure 6.1 4x4 block edge (vertical or horizontal)
The next step in Fig. 6.2 is to compute the maximum and minimum
values among the six pixels across an edge (p2, p1, p0, q0, q1, q2) and then
calculate the difference between the maximum and the minimum value. If
this difference is greater than the QP of the current block, then it is more likely
to represent an edge and therefore should not be filtered. On the other hand,
if the difference between the maximum and minimum values is less than the
QP of the current block, filtering should be applied to that block to remove
the blocking artifacts. The block in this case most likely represents a
smoothly or mildly textured area. The next step in Fig. 6.2 is to find the
difference between adjacent pixels of a 4x4 block edge. For example, the
absolute difference between p3 and p2 is calculated and if that difference is
less than a fixed threshold (in this case the threshold has been set to two)
then one is assigned to a variable diff. The above process is repeated for all
the adjacent pixels across an edge (Fig. 6.1) in a 4x4 block. The variable
strength denotes the sum of the variable diff across all pixels of a 4x4 block
edge (Fig. 6.1). If the variable strength is greater than a fixed threshold (in
this case the threshold has been set to four), the block is most likely a
smoothly textured section and strong filtering is applied to that block. On the
other hand, if the variable strength is less than the above specified threshold,
it is most likely to represent mildly textured areas or high activity region and
weak filtering is applied to that block. Here, strong filtering means applying
a low pass filter to the three adjacent pixels on either side of the boundary
of a block, and weak filtering means applying a low pass filter to the single
pixel on either side of the boundary.
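A compact sketch of the decision flow just described (and of Fig. 6.2) follows. The text does not specify the low pass filter taps themselves, so the 3-tap smoothing in weak() and strong() is illustrative only; the max/min test against QP, the threshold of two on the pixel-pair offsets, and the threshold of four on strength are taken from the description above, and the sample values in main() are invented.

```c
#include <stdio.h>
#include <stdlib.h>

/* p[0..3] = p0..p3 and q[0..3] = q0..q3, with p0 and q0 adjoining the
   4x4 block edge (Fig. 6.1). */

/* Illustrative weak filtering: smooth one pixel on each side. */
static void weak(int *p, int *q) {
    int p0 = (p[1] + 2 * p[0] + q[0] + 2) / 4;
    int q0 = (p[0] + 2 * q[0] + q[1] + 2) / 4;
    p[0] = p0; q[0] = q0;
}

/* Illustrative strong filtering: smooth three pixels on each side
   (sequential in-place smoothing, for illustration only). */
static void strong(int *p, int *q) {
    for (int i = 2; i >= 0; i--) {
        p[i] = (p[i + 1] + 2 * p[i] + (i ? p[i - 1] : q[0]) + 2) / 4;
        q[i] = ((i ? q[i - 1] : p[0]) + 2 * q[i] + q[i + 1] + 2) / 4;
    }
}

static void filter_line(int *p, int *q, int qp, int mb_is_intra) {
    /* Step 1: max/min over the six pixels across the edge. */
    int six[6] = {p[2], p[1], p[0], q[0], q[1], q[2]};
    int mx = six[0], mn = six[0];
    for (int i = 1; i < 6; i++) {
        if (six[i] > mx) mx = six[i];
        if (six[i] < mn) mn = six[i];
    }
    if (mx - mn >= qp) return;      /* likely a real edge: do not filter */

    if (!mb_is_intra) {             /* inter MB: weak filtering only */
        weak(p, q);
        return;
    }
    /* Step 2: offsets of the 7 adjacent pixel pairs across the edge. */
    int line[8] = {p[3], p[2], p[1], p[0], q[0], q[1], q[2], q[3]};
    int strength = 0;
    for (int i = 0; i < 7; i++)
        if (abs(line[i + 1] - line[i]) < 2) strength++;
    if (strength > 4) strong(p, q); /* smooth section: filter hard */
    else              weak(p, q);   /* mildly textured: filter gently */
}

int main(void) {
    int p[4] = {62, 61, 60, 60};    /* p0..p3 */
    int q[4] = {80, 81, 82, 82};    /* q0..q3 */
    filter_line(p, q, 37, 1);
    printf("p0 = %d, q0 = %d after filtering\n", p[0], q[0]);
    return 0;
}
```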
6.1.2 Results for intra frames
The PSNR values for different test sequences [47] using the proposed
method, JM 9.2 (H.264 software) [44] and reconstruction without loop filter
are described in Table 6.1. The reconstruction of I-frame with proposed
method gives better PSNR (peak signal to noise ratio) values than the
reconstruction without loop filter as shown in the Table 6.1. Also, it gives
similar PSNR values compared to the reconstruction with loop filter (Table
6.1). Figures 6.3-6.9 visually show the removal of blocking artifacts
achieved by the proposed method.
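For reference, the PSNR figures in Tables 6.1 and 6.2 follow the usual definition for 8-bit video, PSNR = 10 log10(255^2 / MSE). A minimal sketch with invented sample data:

```c
#include <math.h>
#include <stdio.h>

/* PSNR between an original and a reconstructed 8-bit frame. */
static double psnr(const unsigned char *org, const unsigned char *rec, int n) {
    double mse = 0.0;
    for (int i = 0; i < n; i++) {
        double d = (double)org[i] - (double)rec[i];
        mse += d * d;
    }
    mse /= n;
    return mse > 0.0 ? 10.0 * log10(255.0 * 255.0 / mse) : INFINITY;
}

int main(void) {
    unsigned char org[4] = {100, 120, 140, 160};  /* invented samples */
    unsigned char rec[4] = {101, 118, 141, 158};
    printf("PSNR = %.2f dB\n", psnr(org, rec, 4));
    return 0;
}
```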
[Figure 6.2 is a flowchart: it checks whether deblocking is disabled for all
edges of the slice or at the slice boundaries, computes diff = abs(max - min)
over the six pixels across the vertical or horizontal edge, compares diff
with QP (no filtering if diff is not smaller), checks whether the MB is intra
(weak filtering across the edge if not), compares the offsets of the 7 pixel
pairs against the threshold 2, accumulates the variable strength, and finally
applies strong filtering if strength > 4 and weak filtering otherwise.]
Figure 6.2 Decision flow of filtering of pixels
Table 6.1 Comparison of PSNR values for different test sequences.

Test clip       QP   PSNR (dB),   PSNR (dB),         PSNR (dB),
(QCIF)               Proposed     without Loop       with Loop Filter,
                     Method       Filter, JM 9.2     JM 9.2 (H.264
                                  (H.264 reference   reference
                                  software)          software)
Foreman         37   31.615       31.363             31.637
Car phone       37   31.616       31.318             31.620
Car phone       45   26.671       26.291             26.612
News            35   32.204       32.040             32.043
News            39   29.515       29.313             29.55
Silent          39   29.489       29.269             29.330
Container       39   29.539       29.383             29.493
Container       45   25.718       25.553             25.710
Bridge-close    37   29.886       29.738             29.828
Bridge-close    45   25.816       25.635             25.805
Figure 6.3 A reconstructed I-frame from the H.264 decoder: (a) QP = 37 without using a loop filter; (b) QP = 37 with the proposed method
Figure 6.4 A reconstructed I-frame from the H.264 decoder: (a) QP = 45 without using a loop filter; (b) QP = 45 with the proposed method
Figure 6.5 A reconstructed I-frame from the H.264 decoder: (a) QP = 37 without using a loop filter; (b) QP = 37 with the proposed method
Figure 6.6 A reconstructed I-frame from the H.264 decoder: (a) QP = 37 without using a loop filter; (b) QP = 37 with the proposed method
Figure 6.7 A reconstructed I-frame from the H.264 decoder: (a) QP = 39 without using a loop filter; (b) QP = 39 with the proposed method
Figure 6.8 A reconstructed I-frame from the H.264 decoder: (a) QP = 39 without using a loop filter; (b) QP = 39 with the proposed method
Figure 6.9 A reconstructed I-frame from the H.264 decoder: (a) QP = 45 without using a loop filter; (b) QP = 45 with the proposed method
6.2 Inter frames
Inter prediction and coding is based on using motion estimation and
compensation to take advantage of the temporal redundancies that exist
between successive frames. When a selected reference frame for motion
estimation is a previously encoded frame, the frame to be encoded is referred
to as a P (predictive)-picture. When both a previously encoded frame and a
future frame are chosen as reference frames, then the frame to be encoded is
referred to as a B (bipredictive)-picture.
The exploitation of interframe redundancies relies on the transfer of
previously coded information from motion compensated reference frames to
the current predictive coded picture. The transferred information also
includes the coding artifacts formed in the reconstruction of the motion
compensated reference. This type of artifact is generally referred to as false
edges [3]. False edges are mainly visible in smooth areas of predictive coded
pictures. Since the motion compensated prediction provides a good
prediction of the lower frequency information of a macroblock, the
prediction error in smooth areas is minimal or quantized to zero, and
therefore the false edges would not be masked. The blocking effect mainly
occurs in mildly textured areas where the prediction error is sufficiently
large such that it is not quantized to zero. This blocking effect is reduced in
H.264 by having filtered reference frames available for motion
compensation, but some artifacts remain because the copied reference data
may not fit at the block boundaries.
6.2.1 Proposed method for inter frames
The proposed method is applied to inter frames as shown in Fig. 6.2.
Here q0, q1, q2, q3 represent the pixel values loaded from the current 4x4
block and the p0, p1, p2, p3 represent the 4x4 block adjacent to the current 4x4
block. The next step in Fig. 6.2 is to compute the maximum and minimum
values among the six pixels across an edge (p2, p1, p0, q0, q1, q2) and then
calculate the difference between the maximum and the minimum value. If
this difference is less than QP, the current block most likely represents a
smooth or mildly textured area. If the MB is an inter one, a low pass filter
is applied to the adjacent pixels on either side of the block or MB boundary.
Otherwise, if the difference between the minimum and maximum value is
greater than QP, the current block most likely represents an edge and
therefore is not filtered.
6.2.2 Results for inter frames
The PSNR values and the total number of bits used for encoding a P-
frame or B-frame in an H.264 compressed stream with different test
sequences using the proposed method, JM 9.2 (H.264 software), and
reconstruction without loop filter are given in Table 6.2. The reconstruction
of P-frame and B-frame with proposed method gives better PSNR (peak
signal to noise ratio) values than the reconstruction without loop filter as
shown in the Table 6.2. Also, it gives similar PSNR values compared to the
reconstruction with loop filter (Table 6.2). Figures 6.10-6.14 show visually
the removal of blocking artifacts in P-frames and B-frames using the
proposed method.
Table 6.2 Comparison of PSNR values and the total number of bits used for encoding a P-frame or B-frame in an H.264 compressed stream for different test sequences.

                         PSNR (dB)                                       Total number of bits used
Test clip (QCIF)   QP    Proposed   Without        With Loop       Proposed   Without        With Loop
- Type of frame          Method     Loop Filter,   Filter, JM 9.2  Method     Loop Filter,   Filter, JM 9.2
                                    JM 9.2                                    JM 9.2
Foreman-P          39    29.883     29.834         29.692          2085       2131           2107
News-P             39    27.846     27.528         27.771          4119       4074           4235
Car phone-P        39    30.271     30.024         30.171          817        897            720
Bridge close-P     39    28.484     28.495         28.480          105        52             126
Foreman-B          39    28.925     28.879         29.054          489        515            499
News-B             39    28.085     28.307         27.982          1424       1802           1687
Car phone-B        39    29.637     29.438         29.491          197        203            193
Bridge close-B     39    28.532     28.558         28.480          53         53             53
Figure 6.10 A reconstructed P-frame from the H.264 decoder: (a) QP = 39 without using a loop filter; (b) QP = 39 with the proposed method
Figure 6.11 A reconstructed B-frame from the H.264 decoder: (a) QP = 39 without using a loop filter; (b) QP = 39 with the proposed method
Figure 6.12 A reconstructed P-frame from the H.264 decoder: (a) QP = 39 without using a loop filter; (b) QP = 39 with the proposed method
Figure 6.13 A reconstructed P-frame from the H.264 decoder: (a) QP = 39 without using a loop filter; (b) QP = 39 with the proposed method
Figure 6.14 A reconstructed B-frame from the H.264 decoder: (a) QP = 39 without using a loop filter; (b) QP = 39 with the proposed method
Chapter 7
RESULTS AND CONCLUSIONS
7.1 Results
The results presented so far show that the proposed method removes
blocking artifacts significantly and gives better visual quality and PSNR
values than reconstruction without the loop filter in the H.264 codec. In
comparison with the H.264 in-loop deblocking filter, the proposed method
gives equal visual quality and almost comparable PSNR values. For intra
frames, the proposed method gives 0.1 to 0.2 dB lower PSNR values than
the in-loop deblocking filter for some test sequences, and 0.1 to 0.2 dB
higher PSNR values for others. For inter frames, it gives slightly better
PSNR values than the in-loop deblocking filter for most of the test
sequences.
Table 7.1 gives results for a GOP (group of pictures) size of 10 with a
test clip of QCIF (176x144) resolution. The proposed method gives better
PSNR values for P-frames and B-frames than the other two methods
(Table 7.1).
As described in section 5.2, the deblocking filter in H.264 [6] has high
implementation complexity, mainly because of the conditional branches in
the innermost loop of the algorithm. The proposed method reduces the
implementation complexity by reducing the occurrence of such conditional
branches in the innermost loop.
Table 7.1 Comparison of PSNR values for the frames of an H.264 compressed stream (Foreman, QCIF).

Frame type -      QP   PSNR (dB),   PSNR (dB),       PSNR (dB),
frame number           Proposed     without Loop     with Loop Filter,
(Foreman_qcif)         Method       Filter, JM 9.2   JM 9.2 (H.264
                                    (H.264           software)
                                    software)
Intra - 0         45   26.058       25.851           26.096
B - 3             39   28.1341      28.066           28.044
P - 8             39   28.668       28.556           28.411
B - 9             39   28.269       28.439           28.157
P - 10            39   28.731       28.727           28.534
The main advantages of the proposed method over the JM method are:
1. The JM 9.2 (H.264 software) [44] loop filter code size is 21 KB,
whereas the proposed method's loop filter code size is 11 KB. In a real
time implementation, this means the proposed method saves memory.
2. The JM 9.2 (H.264 software) loop filter uses two tables of size 52 bytes
and one table of size 260 bytes to check whether a pixel should be
filtered or not. The values from these tables have to be accessed each
time a pixel is checked, which consumes processor time in a real time
system. No such tables are used in the proposed method.
3. Conditional branches in the innermost loop of the algorithm are reduced
in the proposed method as compared to JM 9.2 (H.264 software). This
leads to a reduction in execution time in real time systems.
7.2 Conclusions
The proposed method is able to reduce the blocking artifacts in the
reconstructed video. It gives almost similar visual quality of the
reconstructed video as compared to the one obtained from JM 9.2 (H.264
software) loop filter. The proposed method requires less implementation
complexity compared to the JM 9.2 (H.264 software) loop filter. That is
because of the simple flow algorithm of the proposed method as compared
to the JM 9.2 (H.264 software) loop filter.
7.3 Future research
The proposed deblocking filter can be implemented in a real time
system. By doing so, its exact reduction in implementation complexity as
compared to the H.264 deblocking filter can be measured. A deringing
filter could also be incorporated in the in-loop filter to assess the visual
improvement of the reconstructed video. The proposed method uses image
enhancement techniques to reduce the artifacts in the reconstructed video.
Image recovery techniques can also be explored to reduce the artifacts in
H.264 decoded video. Transforms that avoid blocking artifacts while
providing the benefits of the integer DCT can also be explored.
REFERENCES
1. Draft ITU-T Recommendation and final draft international standard of joint video specification (ITU-T Rec. H.264/ISO/IEC 14496-10 AVC), Mar. 2003.
2. I. E. G. Richardson, "H.264 and MPEG-4 Video Compression: Video coding for next-generation multimedia", Hoboken, NJ: Wiley, 2003.
3. H. R. Wu and K. R. Rao, "Digital Video Image Quality and Perceptual Coding", Taylor and Francis, Dec. 2005.
4. Y.-L. Lee and H. W. Park, "Loop filtering and post-filtering for low-bit-rates moving picture coding", Signal Processing: Image Communication, vol. 1, pp. 94-98, Dec. 1999.
5. J. Lainema and M. Karczewicz, "TML 8.4 loop filter analysis", ITU-T SG16 Doc. VCEG-N29, 2001.
6. V. Lappalainen, A. Hallapuro, and T. D. Hamalainen, "Complexity of optimized H.26L video decoder implementation", IEEE Trans. CSVT, vol. 13, pp. 717-725, July 2003.
7. M. Ghanbari, "Video Coding: An Introduction to Standard Codecs", London, U.K.: Institution of Electrical Engineers, 1999.
8. ISO/IEC JTC1/SC29, Generic coding of moving pictures and associated audio, ISO/IEC 13818-2, Draft International Standard, Nov. 1994.
9. International Telecommunication Union, "Recommendation ITU-T H.263: Video coding for low bit rate communication", ITU-T, 1998.
10. G. Sullivan and T. Wiegand, "Video compression - from concepts to the H.264/AVC standard", Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
11. T. Wiegand, et al, "Overview of the H.264/AVC video coding standard", IEEE Trans. CSVT, vol. 13, pp. 560-576, July 2003.
12. M. Karczewicz and R. Kurceren, "The SP- and SI-frames design for H.264/AVC", IEEE Trans. CSVT, vol. 13, pp. 637-644, July 2003.
13. M. Wien, "Variable block-size transforms for H.264/AVC", IEEE Trans. CSVT, vol. 13, pp. 604-613, July 2003.
14. J. Ostermann, et al, "Video coding with H.264/AVC: Tools, performance and complexity", IEEE CAS Magazine, vol. 4, pp. 7-34, first quarter, 2004.
15. H. S. Malvar, et al, "Low-complexity transform and quantization in H.264/AVC", IEEE Trans. CSVT, vol. 13, pp. 598-603, July 2003.
16. M.-Y. Shen and C.-C. Jay Kuo, "Review of postprocessing techniques for compression artifact removal", Journal of Visual Communication and Image Representation, vol. 9, pp. 2-14, Mar. 1998.
17. S. A. Karunasekra and N. K. Kingsbury, "A distortion measure for blocking artifacts on images based on human visual sensitivity", IEEE Trans. Image Processing, vol. 4, pp. 713-724, June 1995.
18. G. Fan and W. K. Chan, "Model-based edge reconstruction for low-bit-rate wavelet compressed images", IEEE Trans. CSVT, vol. 10, pp. 120-132, Jan. 2000.
19. N. Jayant, J. Johnson, and R. Safranek, "Signal compression based on models of human perception", Proc. IEEE, vol. 81, pp. 1385-1422, Oct. 1993.
20. A. Kundu, "Enhancement of JPEG coded images by adaptive spatial filtering", Proc. IEEE ICIP, pp. 187-190, Oct. 1995.
21. P. Farrelle and A. K. Jain, "Recursive block-coding - a new approach to transform coding", IEEE Trans. Commun., vol. COM-34, pp. 161-179, Feb. 1986.
22. D. Pearson and M. Whybray, "Transform coding of images using interleaved blocks", Proc. IEE, vol. 131, pp. 466-472, Aug. 1984.
23. H. S. Malvar, "Signal Processing with Lapped Transforms", Boston, MA: Artech House, 1992.
24. Y. Zhang, R. Pickholtz, and M. Loew, "A new approach to reduce the blocking effect of transform coding", IEEE Trans. Commun., vol. 41, pp. 299-302, Feb. 1993.
25. B. Ramamurthi and A. Gersho, "Nonlinear space-variant postprocessing of block coded images", IEEE Trans. ASSP, vol. ASSP-34, pp. 1258-1267, Oct. 1986.
26. K. R. Rao and P. Yip, "Discrete Cosine Transform: Algorithms, Advantages, Applications", Boston, MA: Academic Press, 1990.
27. K. Lee, D. S. Kim and T. Kim, "Regression-based prediction for blocking artifact reduction in JPEG-compressed images", IEEE Trans. Image Processing, vol. 14, pp. 36-48, Jan. 2005.
28. F. Gao, X. Li, and W. G. Wee, "A new wavelet based deblocking algorithm for compressed images", Proc. IEEE Asilomar Conf. on Signals, Systems and Computers, vol. 2, pp. 1745-1748, Nov. 2002.
29. E. Choi and M. G. Kang, "Deblocking algorithm for DCT-based compressed images using anisotropic diffusion", IEEE Trans. ASSP, vol. 3, pp. 717-720, Apr. 2003.
30. A. Z. Averbuch, A. Schclar, and D. L. Donoho, "Deblocking of block-transform compressed images using weighted sums of symmetrically aligned pixels", IEEE Trans. Image Processing, vol. 14, pp. 200-212, Feb. 2005.
31. O. Radovsky and M. Israeli, "Adaptive deblocking of block-transform compressed images using blending-functions approximation", Proc. IEEE Int'l Conf. on Image Processing, vol. 3, pp. 227-230, Sept. 2003.
32. A. Schclar, A. Averbuch and D. L. Donoho, "Deblocking of block-DCT compressed images using deblocking frames of variable size", IEEE Trans. ASSP, vol. 4, pp. 3285-3288, Apr. 2002.
33. A. Zakhor, "Iterative procedures for reduction of blocking effects in transform image coding", IEEE Trans. CSVT, vol. 2, pp. 91-95, Mar. 1992.
34. D. C. Youla and H. Webb, "Image restoration by the method of convex projections: Part 1 - Theory", IEEE Trans. Medical Imaging, vol. 1, pp. 81-94, Oct. 1982.
35. J. J. Zou and H. Yan, "A POCS-based method for reducing artifacts in BDCT compressed images", Proc. 16th Int'l Conf. on Pattern Recognition, vol. 2, pp. 11-15, Aug. 2002.
36. P. List, A. Joch, J. Lainema, G. Bjontegaard, and M. Karczewicz, "Adaptive deblocking filter", IEEE Trans. CSVT, vol. 13, pp. 614-619, July 2003.
37. C. A. Graves, "Deblocking of DCT-compressed images using noise injection followed by image denoising", Proc. IEEE Int'l Conf. on Information Technology: Computers and Communications, pp. 472-475, Apr. 2003.
38. C. Wang, W.-J. Zhang, and X.-Z. Fang, "Adaptive reduction of blocking artifacts in DCT domain for highly compressed images", IEEE Trans. Consumer Electronics, vol. 50, pp. 647-654, May 2004.
39. S. Liu and A. C. Bovik, "Efficient DCT-domain blind measurement and reduction of blocking artifacts", IEEE Trans. CSVT, vol. 12, pp. 1139-1149, Dec. 2002.
40. Y. Zhao, G. Cheng and S. Yu, "Postprocessing technique for blocking artifacts reduction in DCT domain", Electronics Letters, vol. 40, issue 19, pp. 1175-1176, Sept. 2004.
41. S. D. Kim, et al, "A deblocking filter with two separate modes in block-based video coding", IEEE Trans. CSVT, vol. 9, pp. 156-160, Feb. 1999.
42. S.-K. Kwon, A. Tamhankar, K. R. Rao, "Overview of H.264 / MPEG-4 Part 10", Journal of Visual Communication and Image Representation, vol. 17, pp. 186-216, April 2006.
43. S. C. Chang, et al, "A platform based bus-interleaved architecture for deblocking filter in H.264/MPEG-4 AVC", IEEE Int'l Conf. on Consumer Electronics, vol. 51, pp. 249-255, 2005.
44. H.264 software (JM 9.8/FRExt) from http://iphome.hhi.de/suehring/tml/download/jm98.zip
45. http://www.stanford.edu/class/ee398b/handouts/05 standardsH264JVT.pdf
46. A. Puri, H. Chen and A. Luthra, "Video coding using the H.264/MPEG-4 AVC compression standard", Signal Processing: Image Communication, vol. 19, pp. 793-849, Oct. 2004.
47. Test sequences are obtained from: http://trace.eas.asu.edu/yuv/qcif.html
BIOGRAPHICAL INFORMATION
The author, Hitesh Yadav, received his Bachelor's degree from Mangalore
University, India, in Aug. 2000 and his Master's degree in Electrical
Engineering from The University of Texas at Arlington in May 2006.
Before coming for his Masters, he worked at Accord Software Systems in
Bangalore as a Systems Engineer in the field of GPS receivers. During his