Digital Video Image Quality and Perceptual Coding
Digital Video Image Quality and Perceptual Coding
© 2006 by Taylor & Francis Group, LLC
Signal Processing and Communications
Editorial Board
Maurice G. Bellanger, Conservatoire National des Arts et Métiers (CNAM), Paris
Ezio Biglieri, Politecnico di Torino, Italy
Sadaoki Furui, Tokyo Institute of Technology
Yih-Fang Huang, University of Notre Dame
Nikil Jayant, Georgia Institute of Technology
Aggelos K. Katsaggelos, Northwestern University
Mos Kaveh, University of Minnesota
P. K. Raja Rajasekaran, Texas Instruments
John Aasted Sorenson, IT University of Copenhagen
1. Digital Signal Processing for Multimedia Systems, edited by Keshab K. Parhi and Takao Nishitani
2. Multimedia Systems, Standards, and Networks, edited by Atul Puri and Tsuhan Chen
3. Embedded Multiprocessors: Scheduling and Synchronization, Sundararajan Sriram and Shuvra S. Bhattacharyya
4. Signal Processing for Intelligent Sensor Systems, David C. Swanson
5. Compressed Video over Networks, edited by Ming-Ting Sun and Amy R. Reibman
6. Modulated Coding for Intersymbol Interference Channels, Xiang-Gen Xia
7. Digital Speech Processing, Synthesis, and Recognition: Second Edition, Revised and Expanded, Sadaoki Furui
8. Modern Digital Halftoning, Daniel L. Lau and Gonzalo R. Arce
9. Blind Equalization and Identification, Zhi Ding and Ye (Geoffrey) Li
10. Video Coding for Wireless Communication Systems, King N. Ngan, Chi W. Yap, and Keng T. Tan
11. Adaptive Digital Filters: Second Edition, Revised and Expanded, Maurice G. Bellanger
12. Design of Digital Video Coding Systems, Jie Chen, Ut-Va Koc, and K. J. Ray Liu
13. Programmable Digital Signal Processors: Architecture, Programming, and Applications, edited by Yu Hen Hu
14. Pattern Recognition and Image Preprocessing: Second Edition, Revised and Expanded, Sing-Tze Bow
15. Signal Processing for Magnetic Resonance Imaging and Spectroscopy, edited by Hong Yan
16. Satellite Communication Engineering, Michael O. Kolawole
17. Speech Processing: A Dynamic and Optimization-Oriented Approach, Li Deng
18. Multidimensional Discrete Unitary Transforms: Representation, Partitioning and Algorithms, Artyom M. Grigoryan and Sos S. Agaian
19. High-Resolution and Robust Signal Processing, Yingbo Hua, Alex B. Gershman and Qi Cheng
20. Domain-Specific Processors: Systems, Architectures, Modeling, and Simulation, Shuvra Bhattacharyya, Ed Deprettere and Jurgen Teich
21. Watermarking Systems Engineering: Enabling Digital Assets Security and Other Applications, Mauro Barni and Franco Bartolini
22. Biosignal and Biomedical Image Processing: MATLAB-Based Applications, John L. Semmlow
23. Broadband Last Mile Technologies: Access Technologies for Multimedia Communications, edited by Nikil Jayant
24. Image Processing Technologies: Algorithms, Sensors, and Applications, edited by Kiyoharu Aizawa, Katsuhiko Sakaue and Yasuhito Suenaga
25. Medical Image Processing, Reconstruction and Restoration: Concepts and Methods, Jiri Jan
26. Multi-Sensor Image Fusion and Its Applications, edited by Rick Blum and Zheng Liu
27. Advanced Image Processing in Magnetic Resonance Imaging, edited by Luigi Landini, Vincenzo Positano and Maria Santarelli
28. Digital Video Image Quality and Perceptual Coding, edited by H.R. Wu and K.R. Rao
Digital Video Image Quality and Perceptual Coding
edited by
H.R. Wu and K.R. Rao
A CRC title, part of the Taylor & Francis imprint, a member of the Taylor & Francis Group, the academic division of T&F Informa plc.
Boca Raton London New York
Published in 2006 by CRC Press, Taylor & Francis Group, 6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742

© 2006 by Taylor & Francis Group, LLC. CRC Press is an imprint of Taylor & Francis Group.

No claim to original U.S. Government works. Printed in the United States of America on acid-free paper. 10 9 8 7 6 5 4 3 2 1

International Standard Book Number-10: 0-8247-2777-0 (Hardcover)
International Standard Book Number-13: 978-0-8247-2777-2 (Hardcover)
Library of Congress Card Number 2005051404

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Digital video image quality and perceptual coding / edited by Henry R. Wu, K.R. Rao.
p. cm. -- (Signal processing and communications)
Includes bibliographical references and index.
ISBN 0-8247-2777-0
1. Digital video. 2. Imaging systems--Image quality. 3. Perception. 4. Coding theory. 5. Computer vision. I. Wu, Henry R. II. Rao, K. Ramamohan (Kamisetty Ramamohan) III. Series.
TK6680.5.D55 2006
006.6'96--dc22 2005051404

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Taylor & Francis Group is the Academic Division of Informa plc.
To those who have pioneered, inspired and persevered.
Copyright release for ISO/IEC: All the figures and tables obtained from ISO/IEC used in this book are subject to the following: The terms and definitions taken from the Figures and Tables ref. ISO/IEC IS 11172, ISO/IEC 11172-2, ISO/IEC 11172-3, ISO/IEC IS 13818-2, ISO/IEC IS 13818-6, ISO/IEC JTC1/SC29 WG11 Doc. N2196, ISO/IEC JTC1/SC29/WG11 N3536, ISO/MPEG N2502, ISO/IEC JTC1/SC29/WG1 Doc. N1595, ISO/IEC JTC SC29/WG11 Doc. 2460, ISO/IEC JTC SC29/WG11 Doc. 3751, ISO/MPEG N2501, ISO/IEC JTC-1 SC29/WG11 M5804, ISO/IEC JTC SC29/WG11, Recomm. H.262 ISO/IEC 13818-2, ISO/IEC IS 13818-1, ISO/MPEG N3746, ISO/IEC Doc. N2502, ISO/IEC JTC1/SC29/WG11, Doc. N2424 are reproduced with permission of the International Organization for Standardization, ISO. These standards can be obtained from any ISO member and from the Web site of the ISO Central Secretariat at the following address: http://www.iso.org. Non-exclusive copyright remains with ISO.

The terms and definitions taken from the Figure ref. ISO/IEC 14496-1:1999, ISO/IEC 14496-2:1999, ISO/IEC 14496-3:1999, ISO/IEC 14496-4:2000, ISO/IEC 14496-5:2000, ISO/IEC 14496-10 AVC:2003 are reproduced with the permission of the International Organization for Standardization, ISO. These standards can be obtained from any ISO member and from the Web site of the ISO Central Secretariat at the following address: www.iso.org. Non-exclusive copyright remains with ISO. The editors, authors and Taylor and Francis are grateful to ISO/IEC for giving the permission.

About the Shannon image on the front cover: The original image of Claude E. Shannon, the father of information theory, was provided by Bell Laboratories of Lucent Technologies. It had been compressed using the JPEG coder, which is information lossy. Its resolution was 4181×5685 pixels with white margins around all sides. This digital image had scanning dust.

The original image was cropped down to 4176×5680 for compression, and the scanning dust was removed by copying the surrounding pixels over the dust. The resultant image was used as the original image to produce the compressed Shannon image using the perceptual lossless image coder (PLIC) as described in Chapter 13, with an implementation intended for medical image compression. The PLIC was benchmarked against the JPEG-LS and the JPEG-NLS (d=2).

The coding error images were produced between the cropped original Shannon image and the compressed images. In the error images, red represents a positive error, blue a negative error and white a zero error. The black color is not actually black; it is small-valued red or blue. Comparing the two error images, it can be appreciated how the PLIC uses the human vision model to achieve perceptually lossless coding with a higher compression ratio than the benchmarks, whilst maintaining the visual fidelity of the original picture.

The top image is the original image, which can be compressed to 3.179 bpp using the JPEG-LS. The mid-right image is the JPEG-NLS (i.e., JPEG near lossless) compressed at d=2 with Bitrate = 1.424 bpp, Compression Ratio = 5.6180:1, MSE = 1.9689 and PSNR = 45.1885 dB. The mid-left image is the difference image between the original and that compressed by the JPEG-NLS (d=2). The bottom-right is the PLIC compressed with Bitrate = 1.370 bpp, Compression Ratio = 5.8394:1, MSE = 2.0420 and PSNR = 45.0303 dB. The bottom-left image is a difference image between the original and that compressed by the PLIC.
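As a sanity check, the PSNR values quoted for the cover images follow directly from the stated MSEs of an 8-bit image, and the compression ratios from the stated bitrates. A minimal sketch (the function names are ours, for illustration only):

```python
import math

def psnr_db(mse, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    return 10.0 * math.log10(peak * peak / mse)

def compression_ratio(bitrate_bpp, original_bpp=8.0):
    """Ratio of the original bit depth to the coded bits per pixel."""
    return original_bpp / bitrate_bpp

# JPEG-NLS (d=2): MSE = 1.9689, bitrate = 1.424 bpp
print(psnr_db(1.9689))             # close to the quoted 45.1885 dB
print(compression_ratio(1.424))    # close to the quoted 5.6180:1

# PLIC: MSE = 2.0420, bitrate = 1.370 bpp
print(psnr_db(2.0420))             # close to the quoted 45.0303 dB
print(compression_ratio(1.370))    # close to the quoted 5.8394:1
```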
Contributors
Alan C. Bovik, University of Texas at Austin, Austin, Texas, U.S.A.
Jorge E. Caviedes, Intel Corporation, Chandler, Arizona, U.S.A.
Tao Chen, Panasonic Hollywood Laboratory, Universal City, California, U.S.A.
François-Xavier Coudoux, Université de Valenciennes, Valenciennes Cedex, France.
Philip J. Corriveau, Intel Media and Acoustics Perception Lab, Hillsboro, Oregon, U.S.A.
Mark D. Fairchild, Rochester Institute of Technology, Rochester, New York, U.S.A.
Marc G. Gazalet, Université de Valenciennes, Valenciennes Cedex, France.
Jae Jeong Hwang, Kunsan National University, Republic of Korea.
Michael Isnardi, Sarnoff Corporation Inc., Princeton, New Jersey, U.S.A.
Ryoichi Kawada, KDDI R&D Laboratories Inc., Japan.
Weisi Lin, Institute for Infocomm Research, Singapore.
Jeffrey Lubin, Sarnoff Corporation Inc., Princeton, New Jersey, U.S.A.
Makoto Miyahara, Japan Advanced Institute of Science and Technology, Japan.
Ethan D. Montag, Rochester Institute of Technology, Rochester, New York, U.S.A.
Franco Oberti, Philips Research, The Netherlands.
Albert Pica, Sarnoff Corporation Inc., Princeton, New Jersey, U.S.A.
K. R. Rao, University of Texas at Arlington, Arlington, Texas, U.S.A.
Hamid Sheikh, Texas Instruments, Inc., Dallas, Texas, U.S.A.
Damian Marcellinus Tan, Royal Melbourne Institute of Technology, Melbourne, Victoria, Australia.
Zhou Wang, University of Texas at Arlington, Arlington, Texas, U.S.A.
Stefan Winkler, Genista Corporation, Montreux, Switzerland.
Hong Ren Wu, Royal Melbourne Institute of Technology, Melbourne, Victoria, Australia.
Zhenghua Yu, National Information Communication Technology Australia (NICTA).
Michael Yuen, ESS Technology, Inc., Beijing, China.
Jian Zhang, National Information Communication Technology Australia (NICTA).
Acknowledgments
The editors, H. R. Wu and K. R. Rao, would like to thank all authors of this handbook for their contributions, efforts and dedication, without which this book would not have been possible.

The editors and the contributors have received assistance and support from many of our colleagues that has made this handbook, Digital Video Image Quality and Perceptual Coding, possible. The generous assistance and support includes valuable information and materials used in and related to the book, discussions, feedback, comments on and proof reading of various parts of the book, and recommendations and suggestions that shaped the book as it is. Special thanks are due to the following persons:
M. Akgun, Communications Research Center, Canada
J. F. Arnold, Australian Defence Force Academy
B. Baxter, Intel Corporation
J. Cai, Nanyang Technological University
N. Corriveau, spouse of P. Corriveau
S. Daly, Sharp Laboratories of America
M. Frater, Australian Defence Force Academy
N. G. Kingsbury, University of Cambridge
L. Lu, IBM T. J. Watson Research Center
Z. Man, Nanyang Technological University
S. K. Mitra, University of California, Santa Barbara
K. N. Ngan, The Chinese University of Hong Kong
E. P. Simoncelli, New York University
C.-S. Tan, Royal Melbourne Institute of Technology
A. Vincent, Communications Research Center, Canada
M. Wada, KDDI R&D Laboratories
B. A. Wandell, Stanford University
S. Wolf, Institute for Telecommunication Sciences
D. Wu, Royal Melbourne Institute of Technology
C. Zhang, Nanyang Technological University
Z. Zhe, Monash University
All members, VQEG
A. C. Bovik acknowledges the support by the National Science Foundation under grant CCR-0310973.
H. R. Wu and Z. Yu acknowledge the support by the Australian Research Council under grant A49927209.
Assistance and support to this book project which H. R. Wu received from Monash University, where he lectured from 1990 to 2005, and from Nanyang Technological University, where he spent his sabbatical from 2002 to 2003, are gratefully acknowledged.
Special thanks go to David Wu of Royal Melbourne Institute of Technology for his assistance in producing the final LaTeX version of this handbook and the compressed Shannon images shown on the front cover of the book.
H. R. Wu and K. R. Rao would like to express their sincere gratitude to B. J. Clark, publishing consultant at CRC Press LLC, who initiated this book project and whose professional advice and appreciation of the efforts involved in this undertaking made the completion of this book possible. Sincere thanks also go to Nora Konopka, our Publisher, and Jessica Vakili, our Project Coordinator, at CRC Press LLC for their patience, understanding and unfailing support that helped see this project through. We are most grateful to our Project Editor, Susan Horwitz, whose professional assistance has made significant improvements to the book's presentation. The work by Nicholas Yancer on the back cover of the book and the brilliant cover design by Jonathan Pennell are greatly appreciated.
Last but not least, without the patience and forbearance of our families, the preparation of this book would have been impossible. We greatly appreciate their constant and continuing support and understanding.
Preface
The outset of digital video image coding research is commonly acknowledged [Cla95] to be around 1950, marked by Goodall's paper on television by pulse code modulation (or PCM) [Goo51, Hua65], Cutler's patent on differential quantization of communication signals (commonly known as differential pulse code modulation, or DPCM for short) [Cut52], Harrison's paper on experiments with linear prediction in television [Har52], and Huffman's paper on a method for the construction of minimum redundancy codes (commonly known as Huffman coding) [Huf52]; notwithstanding that some of the pioneering work on fundamental theories, techniques and concepts in digital image and video coding for visual communications can be traced back to Shannon's monumental work on the mathematical theory of communication in 1948 [Sha48], Gabor's 1946 paper on the theory of communication [Gab46] and even as early as the late 1920s, when Kell proposed the principle of frame difference signal transmission in a British patent [Kel29, SB65, Sey63, Gir03]. While international standardization of digital image and video coding [RH96] might be considered by many as the end of an era or, simply, of research in the area, for others it presents new challenges and signals the beginning of a new era; or, more precisely, it is high time that we addressed and, perhaps, solved a number of long-standing open problems in the field.
A brief review of the history and the state of the art of research in the field will reveal the fundamental concepts, principles and techniques used in image data compression for storage and visual communications. An important goal that was set fairly early by forerunners in image data compression is to minimize statistical (including source coding, spatio-temporal and inter-scale) and psychovisual (or perceptual) redundancies of the image data, either to comply with certain storage or communications bandwidth restrictions or limitations with the best possible picture quality, or to provide a certain picture quality of service with the lowest possible amount of data or bit rate [Sey62]. It helped to set the course and to raise a series of widely researched issues, which have inspired and, in many ways, frustrated generations of researchers in the field. Some of these issues and associated problems are better researched, understood and solved than others.
Using information theory and optimization techniques, we understand reasonably well the definition of statistical redundancy and the theoretical lower bound set by Shannon's entropy in lossless image and video coding [Sha48, JN84]. We have statistically modelled natural image data fairly well, which has led to various optimal or sub-optimal compression techniques in the least mean square sense [Sey62, Cla85, NH95]. We routinely apply rate-distortion theory with the mean squared error (MSE) as a distortion measure in the design of constant bit rate coders. We have pushed the performance of a number of traditional compression techniques, such as predictive and transform coding, close to their limit in terms of decorrelation and energy packing efficiencies. Motion compensated prediction has been thoroughly investigated for inter-frame coding of video and image sequences, leading to a number of effective and efficient algorithms used in practical systems.
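The entropy bound mentioned above is easy to state concretely: for a memoryless source, no lossless code can use fewer bits per symbol on average than the Shannon entropy. A small illustrative sketch (the sample pixel stream is hypothetical):

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Shannon entropy H = -sum(p * log2(p)): the lower bound, in bits
    per symbol, for lossless coding of a memoryless source [Sha48]."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical 8-pixel stream: half zeros, a quarter each of two values.
pixels = [0, 0, 0, 0, 128, 128, 255, 255]
print(entropy_bits(pixels))  # 1.5 bits/symbol, vs. 8 bits/pixel uncoded
```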
In model-, object- or segmentation-based coding, we have been trying to balance bit allocations between coding of model parameters and coding of the residual image, but we have yet to get it right. Different from classical compression algorithms, techniques based on matching pursuit, fractal transforms and projection onto convex sets are recursive, and encode transform or projection parameters instead of either pixel or transform coefficient values. Nevertheless, they have so far failed to live up to the great expectations in terms of rate-distortion performance in practical coding systems and applications. We have long since realized that much higher compression ratios can be achieved than what is achievable by the best lossless coding techniques or the theoretical lower bound set by information theory, without noticeable distortion when viewed by human subjects. Various adaptive quantization and bit allocation techniques and algorithms have been investigated to incorporate some of the aspects of the human visual system (HVS) [CS77, CP84, Cla85], most of which focus on spatial contrast sensitivity and masking effects. Various visually weighted distortion measures have also been explored in either performance evaluation [JN84] or rate-distortion optimization of image or video coders [Tau00].
Limited investigations have been conducted in constant quality coder design, impeded by the lack of a commonly accepted quality metric which correlates well with subjective or perceived quality indices, such as the mean opinion score (MOS) [ITU98]. Long has the question been asked, "What's wrong with mean-squared error?" [Gir84], as well as with its derivatives such as the peak signal-to-noise ratio (PSNR), as the quality or distortion measure. Nonetheless, obtaining credible and widely acceptable alternative perceptually based quantitative quality and/or impairment metrics had eluded us until most recently [LB82, VQE00]. Consequently, attempts and claims of providing users with guaranteed or constant quality visual services have been by and large unattainable or unsubstantiated. Lacking HVS-based quantitative quality or impairment metrics, more often than not we opt for a much higher bit rate for quality-critical visual service applications than what is necessary, resulting in users carrying extra costs; and just as likely a coding strategy may reduce a particular type of coding distortion or artifact at the expense of manifesting or enhancing other types of distortions. One of the most challenging questions begging for an answer is how to define psychovisual redundancy for lossy image and video coding, if it can ever be defined quantitatively in a similar way to the statistical redundancy defined for lossless coding. It would help to set the theoretical lower bound for lossy image data coding at a just noticeable level compared with the original.
This book attempts to address two of the above raised issues which may form a critical part of theoretical research and practical system development in the field, i.e., HVS based perceptual quantitative quality/impairment metrics for digitally coded pictures (i.e., images and videos), and perceptual picture coding. The book consists of three parts, i.e., Part I, Fundamentals; Part II, Testing and Quality Assessment of Digital Pictures; and Part III, Perceptual Coding and Postprocessing.

Part I comprises the first three chapters, covering a number of fundamental concepts, theory, principles and techniques underpinning issues and topics addressed by this book.

Chapter 1, Digital Picture Compression and Coding Structure, by Hwang, Wu and Rao provides an introduction to digital picture compression, covering basic issues and techniques along with popular coding structures, systems and international standards for compression of images and videos.

Fundamentals of Human Vision and Vision Modeling are presented by Montag and Fairchild in Chapter 2, which forms the foundation of materials and discussions related to the HVS and its applications presented in Parts II and III on perceptual quality/impairment metrics, image/video coding and visual communications. The most recent achievements and findings in vision research are included, which are relevant to digital picture coding engineering practice.

Various digital image and video coding/compression algorithms and systems introduce highly structured coding artifacts or distortions, which are different from those in their counterpart analog systems. It is important to analyze and understand these coding artifacts in either subjective or objective quality assessment of digitally encoded images or video sequences. In Chapter 3, a comprehensive classification and analysis of various coding artifacts in digital pictures coded using well known techniques is presented by Yuen.

Part II of this book consists of eight chapters dealing with a range of topics regarding picture quality assessment criteria, subjective and objective methods and metrics, testing procedures, and the development of international standards activities in the field.

Chapter 4, Video Quality Testing by Corriveau, provides an in-depth discussion of subjective assessment methods and techniques, experimental design, and international standard test methods for digital video images in contrast to objective assessment methods, highlighting a number of critical issues and findings. Commonly used test video sequences are presented. The chapter also covers test criteria, test procedures and related issues for various applications in digital video coding and communications. Although subjective assessment methods have been well documented in the literature and standardized by the international standards bodies [ITU98], there has been a renewed interest in, and research publications on, various issues with subjective test methods and new methods, approaches or procedures which may further improve the reliability of subjective test data.
A comprehensive and up-to-date review is provided by Winkler on Perceptual Video Quality Metrics in Chapter 5, including both traditional measures, such as the mean square error (MSE) and the PSNR, and HVS based metrics as reported in the literature [YW00, YWWC02] as well as by international standards bodies such as VQEG [VQE00]. It discusses factors which affect human viewers' assessment of picture quality, classification of objective quality metrics, and various approaches and models used for metrics design.

In Chapter 6, Miyahara and Kawada discuss the Philosophy of Picture Quality Scale. It provides insights into the idea and concept behind the PQS, which was introduced by Miyahara, Kotani and Algazi in [MKA98], an extension of the method pioneered by Miyahara in 1988 [Miy88]. It examines applications of PQS to various digital picture services, including super HDTV, extra high quality images, and cellular video phones, in the context of international standards and activities.

Wang, Bovik and Sheikh present a detailed account of Structural Similarity Based Image Quality Assessment in Chapter 7. The structural similarity based quality metric is devised to complement the traditional error sensitive picture assessment methods by targeting perceived structural information variation, an approach which mimics high level functionality of the HVS. Quality prediction accuracy of the metric is evaluated, with significantly lower computational complexity than vision model based quality metrics.

Vision Model Based Digital Video Impairment Metrics introduced recently are described by Yu and Wu in Chapter 8 for blocking and ringing impairment assessments. In contrast with the traditional vision modeling and parameterization method used in vision research, the vision models used in the impairment metrics are parameterized and optimized using subjective test data provided by the VQEG, where original and distorted video sequences were used instead of simple test patterns. Detailed descriptions of impairment metric implementations are provided, with performance evaluations which have shown good agreement with the MOS obtained via subjective evaluations.

Computational Models for Just-Noticeable Difference are reviewed and closely examined by Lin in Chapter 9. It provides a systematic introduction to the field to date as well as a practical user's guide for related techniques. JND estimation techniques in both the DCT subband domain and the image pixel domain are discussed, along with issues regarding conversions between the two domains.

In Chapter 10, Caviedes and Oberti investigate issues with No-Reference Quality Metric for Degraded and Enhanced Video. The concept of virtual reference is introduced and defined. It highlights the importance of assessing picture quality enhancement as well as degradation in visual communications services and applications in the absence of original pictures. A framework for the development of a no-reference quality metric is described. An extensive description is provided of the no-reference overall quality metric (NROQM) which the authors have developed for digital video quality assessment.
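The structural similarity approach described for Chapter 7 compares luminance, contrast and structure rather than pixel-wise error. As a rough illustration only (the published metric is computed over local sliding windows and averaged; this single-window version follows the commonly published form and constants, not necessarily the chapter's exact implementation):

```python
def ssim_single_window(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM over two equal-length pixel sequences:
    combines mean (luminance), variance (contrast) and covariance
    (structure) comparisons; 1.0 means identical signals."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    c1 = (k1 * data_range) ** 2  # small stabilizers avoid division by ~0
    c2 = (k2 * data_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

original = [52, 55, 61, 66, 70, 61, 64, 73]   # hypothetical pixel values
distorted = [v + 5 for v in original]          # uniform brightness shift
print(ssim_single_window(original, original))   # 1.0
print(ssim_single_window(original, distorted))  # slightly below 1.0
```

Unlike the MSE, which would penalize the brightness shift heavily, the structural terms here remain unchanged and only the luminance comparison drops, mirroring the perceptual intuition the chapter builds on.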
In Chapter 11, Corriveau presents an overview of Video Quality Experts Group activities, highlighting its goals, test plans, major findings, and future work and directions.

The next six chapters form Part III of this book, focusing on digital image and video coder designs based on the HVS, and on post-filtering, restoration, error correction and concealment techniques, which play an increasing role in the improvement of perceptual picture quality by reduction of perceived coding artifacts and transmission errors. A number of new perceptual coders introduced in recent years are presented in Chapters 12, 13 and 14, including rate-distortion optimization using perceptual distortion metrics and foveated perceptual coding. A noticeable feature of these new perceptual coders is that they use much more sophisticated vision models, resulting in significant visual performance improvement. Discussions are included in these chapters on possible new coding architectures based on vision models, as compared with the existing statistically based coding algorithms and architectures predominant in current software and hardware products and systems.

Chapter 12 by Pica, Isnardi and Lubin examines critical issues associated with HVS Based Perceptual Video Encoders. It covers an overview of perceptually based approaches, possible architectures and applications, and future directions. Architectures which support perceptually based video encoding are discussed for an MPEG-2 compliant encoder.

Tan and Wu present Perceptual Image Coding in Chapter 13, which provides a comprehensive review of HVS based image coding techniques to date. The review covers traditional techniques where various HVS aspects or simple vision models are used for coder designs. Until most recently, this traditional approach has dominated research on the topic with numerous publications, and it forms one of, at least, four approaches to perceptual coding design. The chapter describes a perceptual distortion metric based image coder and a vision model based perceptually lossless coder, along with detailed discussions on model calibration and coder performance evaluation results.

Chapter 14 by Wang and Bovik investigates novel Foveated Image and Video Coding techniques, which they introduced most recently. It provides an introduction to the foveation feature of the HVS, a review of various foveation techniques that have been used to construct image and video coding systems, and detailed descriptions of example foveated picture coding systems.

Chapter 15 by Chen and Wu discusses the topic of Artifact Reduction by Post-Processing in Image Compression. Various image restoration and processing techniques have been reported in recent years to eliminate or to reduce picture coding artifacts introduced in the encoding or transmission process, so as to improve perceptual image or video picture quality. It has become widely accepted that these post-filtering algorithms are an integral part of a compression package or system from a rate-distortion optimization standpoint. This chapter focuses on the reduction of blocking and ringing artifacts in order to improve the visual quality of reconstructed pictures. A DCT domain deblocking technique is described, with a fast implementation algorithm, after a review of coding artifact reduction techniques to date.
Color bleeding is a prominent distortion associated with color images encoded by block DCT based picture coding systems. Coudoux and Gazalet present in Chapter 16 a novel approach to Reduction of Color Bleeding in DCT Block-Coded Video, which they introduced recently. This post-processing technique is devised after a thorough analysis of the cause of color bleeding. The performance evaluation results have demonstrated marked improvement in the perceptual quality of reconstructed pictures.

Issues associated with Error Resilience for Video Coding Service are investigated by Zhang in Chapter 17. It provides an introduction to error resilient coding techniques and concealment methods. Significant improvement in terms of visual picture quality has been demonstrated by using a number of the techniques presented.

Chapter 18, the final chapter of the book, highlights a number of critical issues and challenges of the field which may be beneficial to readers for future research.

Performance measures used to evaluate objective quality/impairment metrics against subjective test data are discussed in Appendix A.

We hope that readers will enjoy reading this book as much as we have enjoyed writing it and find the materials provided in it useful and relevant to their work and studies in the field.

H. R. Wu, Royal Melbourne Institute of Technology, Australia
K. R. Rao, University of Texas at Arlington, U.S.A.
References
[Cla85] R. J. Clarke. Transform Coding of Images. London: Academic Press, 1985.

[Cla95] R. J. Clarke. Digital Compression of Still Images and Video. London: Academic Press, 1995.

[CP84] W.-H. Chen and W. K. Pratt. Scene adaptive coder. IEEE Trans. Commun., COM-32:225–232, March 1984.

[CS77] W.-H. Chen and C. H. Smith. Adaptive coding of monochrome and color images. IEEE Trans. Commun., COM-25:1285–1292, November 1977.

[Cut52] C. C. Cutler. Differential Quantization of Communication Signals, U.S. Patent No. 2,605,361, July 1952.

[Gab46] D. Gabor. Theory of communication. Journal of the IEE, 93:429–457, 1946.

[Gir84] B. Girod. What's wrong with mean-squared error? In A. B. Watson, Ed., Digital Images and Human Vision, 207–220. Cambridge, MA: MIT Press, 1993.

[Gir03] B. Girod. Video coding for compression and beyond, keynote. In Proceedings of IEEE International Conference on Image Processing, Barcelona, Spain, September 2003.

[Goo51] W. M. Goodall. Television by pulse code modulation. Bell Systems Technical Journal, 28:33–49, January 1951.

[Har52] C. W. Harrison. Experiments with linear prediction in television. Bell Systems Technical Journal, 29:764–783, 1952.

[Hua65] T. S. Huang. PCM picture transmission. IEEE Spectrum, 2:57–63, December 1965.

[Huf52] D. A. Huffman. A method for the construction of minimum redundancy codes. IRE Proc., 40:1098–1101, 1952.

[ITU98] ITU. ITU-R BT.500-9, Methodology for the subjective assessment of the quality of television pictures. ITU-R, 1998.

[JN84] N. S. Jayant and P. Noll. Digital Coding of Waveforms: Principles and Applications to Speech and Video. Upper Saddle River, NJ: Prentice Hall, 1984.

[Kel29] R. D. Kell. Improvements Relating to Electric Picture Transmission Systems, British Patent No. 341,811, 1929.

[LB82] F. J. Lukas and Z. L. Budrikis. Picture quality prediction based on a visual model. IEEE Transactions on Communications, COM-30:1679–1692, July 1982.

[Miy88] M. Miyahara. Quality assessments for visual service. IEEE Communications Magazine, 26(10):51–60, October 1988.

[MKA98] M. Miyahara, K. Kotani, and V. R. Algazi. Objective picture quality scale (PQS) for image coding. IEEE Transactions on Communications, 46(9):1215–1226, September 1998.

[NH95] A. N. Netravali and B. G. Haskell. Digital Pictures: Representation, Compression and Standards. New York: Plenum Press, 2nd ed., 1995.
[RH96] K. R. Rao and J. J. Hwang. Techniques and Standards for Image, Video and Audio Coding. Upper Saddle River, NJ: Prentice Hall, 1996.

[SB65] A. J. Seyler and Z. L. Budrikis. Detail perception after scene changes in television image presentations. IEEE Trans. on Information Theory, IT-11(1):31–43, January 1965.

[Sey62] A. J. Seyler. The coding of visual signals to reduce channel-capacity requirements. Proc. IEE, pt. C, 109(1):676–684, 1962.

[Sey63] A. J. Seyler. Real-time recording of television frame difference areas. Proc. IEEE, 51(1):478–480, 1963.

[Sha48] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–623, 1948.

[Tau00] D. Taubman. High performance scalable image compression with EBCOT. IEEE Trans. Image Proc., 9:1158–1170, July 2000.

[VQE00] VQEG. Final Report from the Video Quality Experts Group on the Validation of Objective Models of Video Quality Assessment. VQEG, March 2000. Available from ftp://ftp.its.bldrdoc.gov.

[YW00] Z. Yu and H. R. Wu. Human visual systems based objective digital video quality metrics. In Proceedings of International Conference on Signal Processing 2000 of 16th IFIP World Computer Congress, 2:1088–1095, Beijing, China, August 2000.

[YWWC02] Z. Yu, H. R. Wu, S. Winkler, and T. Chen. Vision model based impairment metric to evaluate blocking artifacts in digital video. Proc. IEEE, 90(1):154–169, January 2002.
Contents
List of Contributors ix
Acknowledgments xi
Preface xiii
I Picture Coding and Human Visual System Fundamentals 1
1 Digital Picture Compression and Coding Structure 3
1.1 Introduction to Digital Picture Coding . . . . . . . . . . . . . . . . . . 3
1.2 Characteristics of Picture Data . . . . . . . . . . . . . . . . . . . . . . 6
1.2.1 Digital Image Data . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.2 Digital Video Data . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.3 Statistical Analysis . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 Compression and Coding Techniques . . . . . . . . . . . . . . . . . . 12
1.3.1 Entropy Coding . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.2 Predictive Coding . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3.3 Transform Coding . . . . . . . . . . . . . . . . . . . . . . . . 14
1.3.3.1 Discrete cosine transform (DCT) . . . . . . . . . . . 14
1.3.3.2 Discrete wavelet transform (DWT) . . . . . . . . . . 18
1.4 Picture Quantization . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4.1 Uniform/Nonuniform Quantizer . . . . . . . . . . . . . . . . . 21
1.4.2 Optimal Quantizer Design . . . . . . . . . . . . . . . . . . . . 21
1.4.3 Vector Quantization . . . . . . . . . . . . . . . . . . . . . . . 24
1.5 Rate-Distortion Theory . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.6 Human Visual Systems . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.6.1 Contrast Ratio . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.6.2 Spatial Frequency . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.6.3 Masking Effect . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.6.4 Mach Bands . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.7 Digital Picture Coding Standards and Systems . . . . . . . . . . . . . . 31
1.7.1 JPEG-Still Image Coding Standard . . . . . . . . . . . . . . . 31
1.7.2 MPEG-Video Coding Standards . . . . . . . . . . . . . . . . . 36
1.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2 Fundamentals of Human Vision and Vision Modeling 45
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.2 A Brief Overview of the Visual System . . . . . . . . . . . . . . . . . 45
2.3 Color Vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.3.1 Colorimetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.3.2 Color Appearance, Color Order Systems and Color Difference . . . . . 51
2.4 Luminance and the Perception of Light Intensity . . . . . . . . . . . . . 55
2.4.1 Luminance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.4.2 Perceived Intensity . . . . . . . . . . . . . . . . . . . . . . . . 57
2.5 Spatial Vision and Contrast Sensitivity . . . . . . . . . . . . . . . . . . 59
2.5.1 Acuity and Sampling . . . . . . . . . . . . . . . . . . . . . . . 60
2.5.2 Contrast Sensitivity . . . . . . . . . . . . . . . . . . . . . . . . 62
2.5.3 Multiple Spatial Frequency Channels . . . . . . . . . . . . . . 64
2.5.3.1 Pattern adaptation . . . . . . . . . . . . . . . . . . . 65
2.5.3.2 Pattern detection . . . . . . . . . . . . . . . . . . . . 65
2.5.3.3 Masking and facilitation . . . . . . . . . . . . . . . . 66
2.5.3.4 Nonindependence in spatial frequency and orientation 68
2.5.3.5 Chromatic contrast sensitivity . . . . . . . . . . . . . 70
2.5.3.6 Suprathreshold contrast sensitivity . . . . . . . . . . 71
2.5.3.7 Image compression and image difference . . . . . . . 74
2.6 Temporal Vision and Motion . . . . . . . . . . . . . . . . . . . . . . . 75
2.6.1 Temporal CSF . . . . . . . . . . . . . . . . . . . . . . . . . . 75
2.6.2 Apparent Motion . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.7 Visual Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
2.7.1 Image and Video Quality Research . . . . . . . . . . . . . . . . 80
2.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3 Coding Artifacts and Visual Distortions 87
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.2 Blocking Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.2.1 Intraframe Coded Macroblocks . . . . . . . . . . . . . . . . . 90
3.2.2 Predictive Coded Macroblocks . . . . . . . . . . . . . . . . . . 90
3.3 Basis Image Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.3.1 Visual Significance of Each Basis Image . . . . . . . . . . . . . 92
3.3.2 Predictive Coded Macroblocks . . . . . . . . . . . . . . . . . . 92
3.3.3 Aggregation of Major Basis Images . . . . . . . . . . . . . . . 93
3.4 Blurring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.5 Color Bleeding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.6 Staircase Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
3.7 Ringing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
3.8 Mosaic Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
3.8.1 Intraframe Coded Macroblocks . . . . . . . . . . . . . . . . . 100
3.8.2 Predictive-Coded Macroblocks . . . . . . . . . . . . . . . . . . 101
3.9 False Contouring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
3.10 False Edges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
3.11 MC Mismatch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
3.12 Mosquito Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
3.12.1 Ringing-Related Mosquito Effect . . . . . . . . . . . . . . . . 108
3.12.2 Mismatch-Related Mosquito Effect . . . . . . . . . . . . . . . 109
3.13 Stationary Area Fluctuations . . . . . . . . . . . . . . . . . . . . . . . 110
3.14 Chrominance Mismatch . . . . . . . . . . . . . . . . . . . . . . . . . . 112
3.15 Video Scaling and Field Rate Conversion . . . . . . . . . . . . . . . . 113
3.15.1 Video Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
3.15.2 Field Rate Conversion . . . . . . . . . . . . . . . . . . . . . . 115
3.16 Deinterlacing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
3.16.1 Line Repetition and Averaging . . . . . . . . . . . . . . . . . . 117
3.16.2 Field Repetition . . . . . . . . . . . . . . . . . . . . . . . . . . 117
3.16.3 Motion Adaptivity . . . . . . . . . . . . . . . . . . . . . . . . 118
3.16.3.1 Luminance difference . . . . . . . . . . . . . . . . . 118
3.16.3.2 Median filters . . . . . . . . . . . . . . . . . . . . . 118
3.16.3.3 Motion compensation . . . . . . . . . . . . . . . . . 119
3.17 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
II Picture Quality Assessment and Metrics 123
4 Video Quality Testing 125
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
4.2 Subjective Assessment Methodologies . . . . . . . . . . . . . . . . . . 126
4.3 Selection of Test Materials . . . . . . . . . . . . . . . . . . . . . . . . 126
4.4 Selection of Participants - Subjects . . . . . . . . . . . . . . . . . . 128
4.4.1 Experts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.4.2 Non-Experts . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
4.4.3 Screening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
4.5 Experimental Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
4.5.1 Test Chamber . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
4.5.2 Common Experimental Mistakes . . . . . . . . . . . . . . . . . 131
4.6 International Test Methods . . . . . . . . . . . . . . . . . . . . . . . . 132
4.6.1 Double Stimulus Impairment Scale Method . . . . . . . . . . . 132
4.6.2 Double Stimulus Quality Scale Method . . . . . . . . . . . . . 137
4.6.3 Comparison Scale Method . . . . . . . . . . . . . . . . . . . . 141
4.6.4 Single Stimulus Methods . . . . . . . . . . . . . . . . . . . . . 142
4.6.5 Continuous Quality Evaluations . . . . . . . . . . . . . . . . . 143
4.6.6 Discussion of SSCQE and DSCQS . . . . . . . . . . . . . . . 145
4.6.7 Pitfalls of Different Methods . . . . . . . . . . . . . . . . . . . 147
4.7 Objective Assessment Methods . . . . . . . . . . . . . . . . . . . . . . 150
4.7.1 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
4.7.2 Requirement for Standards . . . . . . . . . . . . . . . . . . . . 151
4.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
5 Perceptual Video Quality Metrics - A Review 155
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
5.2 Quality Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
5.3 Metric Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
5.4 Pixel-Based Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
5.5 The Psychophysical Approach . . . . . . . . . . . . . . . . . . . . . . 160
5.5.1 HVS Modeling Fundamentals . . . . . . . . . . . . . . . . . . 160
5.5.2 Single-Channel Models . . . . . . . . . . . . . . . . . . . . . . 163
5.5.3 Multi-Channel Models . . . . . . . . . . . . . . . . . . . . . . 164
5.6 The Engineering Approach . . . . . . . . . . . . . . . . . . . . . . . . 165
5.6.1 Full-Reference Metrics . . . . . . . . . . . . . . . . . . . . . . 166
5.6.2 Reduced-Reference Metrics . . . . . . . . . . . . . . . . . . . 167
5.6.3 No-Reference Metrics . . . . . . . . . . . . . . . . . . . . . . 167
5.7 Metric Comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
5.7.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
5.7.2 Video Quality Experts Group . . . . . . . . . . . . . . . . . . . 170
5.7.3 Limits of Prediction Performance . . . . . . . . . . . . . . . . 171
5.8 Conclusions and Perspectives . . . . . . . . . . . . . . . . . . . . . . . 172
6 Philosophy of Picture Quality Scale 181
6.1 Objective Picture Quality Scale for Image Coding . . . . . . . . . . . . 181
6.1.1 PQS and Evaluation of Displayed Image . . . . . . . . . . . . . 181
6.1.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
6.1.3 Construction of a Picture Quality Scale . . . . . . . . . . . . . 182
6.1.3.1 Luminance coding error . . . . . . . . . . . . . . . . 183
6.1.3.2 Spatial frequency weighting of errors . . . . . . . . . 183
6.1.3.3 Random errors and disturbances . . . . . . . . . . . 185
6.1.3.4 Structured and localized errors and disturbances . . . 186
6.1.3.5 Principal component analysis . . . . . . . . . . . . . 188
6.1.3.6 Computation of PQS . . . . . . . . . . . . . . . . . . 189
6.1.4 Visual Assessment Tests . . . . . . . . . . . . . . . . . . . . . 189
6.1.4.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . 191
6.1.4.2 Test pictures . . . . . . . . . . . . . . . . . . . . . . 192
6.1.4.3 Coders . . . . . . . . . . . . . . . . . . . . . . . . . 193
6.1.4.4 Determination of MOS . . . . . . . . . . . . . . . . 193
6.1.5 Results of Experiments . . . . . . . . . . . . . . . . . . . . . . 193
6.1.5.1 Results of principal component analysis . . . . . . . 193
6.1.5.2 Multiple regression analysis . . . . . . . . . . . . . . 195
6.1.5.3 Evaluation of PQS . . . . . . . . . . . . . . . . . . . 195
6.1.5.4 Generality and robustness of PQS . . . . . . . . . . . 196
6.1.6 Key Distortion Factors . . . . . . . . . . . . . . . . . . . . . . 197
6.1.6.1 Characteristics of the principal components . . . . . . 197
6.1.6.2 Contribution of the distortion factors . . . . . . . . . 197
6.1.6.3 Other distortion factors . . . . . . . . . . . . . . . . 198
6.1.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
6.1.7.1 Limitations in applications . . . . . . . . . . . . . . 198
6.1.7.2 Visual assessment scales and methods . . . . . . . . 200
6.1.7.3 Human vision models and image quality metrics . . . 200
6.1.7.4 Specializing PQS for a specific coding method . . . . 201
6.1.7.5 PQS in color picture coding . . . . . . . . . . . . . . 201
6.1.8 Applications of PQS . . . . . . . . . . . . . . . . . . . . . . . 201
6.1.9 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
6.2 Application of PQS to a Variety of Electronic Images . . . . . . . . . . 202
6.2.1 Categories of Image Evaluation . . . . . . . . . . . . . . . . . 203
6.2.1.1 Picture spatial resolution and viewing distance . . . . 203
6.2.1.2 Constancy of viewing distance . . . . . . . . . . . . 205
6.2.1.3 Viewing angle between adjacent pixels . . . . . . . . 206
6.2.2 Linearization of the Scale . . . . . . . . . . . . . . . . . . . . 206
6.2.3 Importance of Center Area of Image in Quality Evaluation . . . 208
6.2.4 Other Conditions . . . . . . . . . . . . . . . . . . . . . . . . . 209
6.3 Various Categories of Image Systems . . . . . . . . . . . . . . . . . . 209
6.3.1 Standard TV Images with Frame Size of about 500 × 640 Pixels . . . . 209
6.3.2 HDTV and Super HDTV . . . . . . . . . . . . . . . . . . . . . 209
6.3.3 Extra High Quality Images . . . . . . . . . . . . . . . . . . . . 211
6.3.4 Cellular Phone Type . . . . . . . . . . . . . . . . . . . . . . . 212
6.3.5 Personal Computer and Display for CG . . . . . . . . . . . . . 213
6.4 Study at ITU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
6.4.1 SG9 Recommendations for Quality Assessment . . . . . . . . . 213
6.4.2 J.143 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
6.4.2.1 FR scheme . . . . . . . . . . . . . . . . . . . . . . . 215
6.4.2.2 NR scheme . . . . . . . . . . . . . . . . . . . . . . . 215
6.4.2.3 RR scheme . . . . . . . . . . . . . . . . . . . . . . . 215
6.4.3 J.144 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
6.4.4 J.133 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
6.4.5 J.146 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
6.4.6 J.147 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
6.4.7 J.148 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
6.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
7 Structural Similarity Based Image Quality Assessment 225
7.1 Structural Similarity and Image Quality . . . . . . . . . . . . . . . . . 225
7.2 The Structural SIMilarity (SSIM) Index . . . . . . . . . . . . . . . . . 228
7.3 Image Quality Assessment Based on the SSIM Index . . . . . . . . . . 233
7.4 Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
8 Vision Model Based Digital Video Impairment Metrics 243
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
8.2 Vision Modeling for Impairment Measurement . . . . . . . . . . . . . 247
8.2.1 Color Space Conversion . . . . . . . . . . . . . . . . . . . . . 248
8.2.2 Temporal Filtering . . . . . . . . . . . . . . . . . . . . . . . . 249
8.2.3 Spatial Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . 250
8.2.4 Contrast Gain Control . . . . . . . . . . . . . . . . . . . . . . 251
8.2.5 Detection and Pooling . . . . . . . . . . . . . . . . . . . . . . 253
8.2.6 Model Parameterization . . . . . . . . . . . . . . . . . . . . . 254
8.2.6.1 Parameterization by vision research experiments . . . 254
8.2.6.2 Parameterization by video quality experiments . . . . 255
8.3 Perceptual Blocking Distortion Metric . . . . . . . . . . . . . . . . . . 258
8.3.1 Blocking Dominant Region Segmentation . . . . . . . . . . . . 259
8.3.1.1 Vertical and horizontal block edge detection . . . . . 261
8.3.1.2 Removal of edges coexisting in original and processed sequences . . 263
8.3.1.3 Removal of short isolated edges in processed sequence 263
8.3.1.4 Adjacent edge removal . . . . . . . . . . . . . . . . 263
8.3.1.5 Generation of blocking region map . . . . . . . . . . 263
8.3.1.6 Ringing region detection . . . . . . . . . . . . . . . 265
8.3.1.7 Exclusion of ringing regions from blocking region map 265
8.3.2 Summation of Distortions in Blocking Dominant Regions . . . 265
8.3.3 Performance Evaluation of the PBDM . . . . . . . . . . . . . . 266
8.4 Perceptual Ringing Distortion Measure . . . . . . . . . . . . . . . . . . 269
8.4.1 Ringing Region Segmentation . . . . . . . . . . . . . . . . . . 271
8.4.1.1 Modified variance computation . . . . . . . . . . . . 272
8.4.1.2 Smooth and complex region detection . . . . . . . . 272
8.4.1.3 Boundary labeling and distortion calculation . . . . . 273
8.4.2 Detection and Pooling . . . . . . . . . . . . . . . . . . . . . . 274
8.4.3 Performance Evaluation of the PRDM . . . . . . . . . . . . . . 274
8.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
9 Computational Models for Just-Noticeable Difference 281
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
9.1.1 Single-Stimulus JND Tests . . . . . . . . . . . . . . . . . . . . . . 282
9.1.2 JND Tests with Real-World Images . . . . . . . . . . . . . . . 283
9.1.3 Applications of JND Models . . . . . . . . . . . . . . . . . . . 283
9.1.4 Objectives and Organization of the Following Sections . . . . . 284
9.2 JND with DCT Subbands . . . . . . . . . . . . . . . . . . . . . . . . . 285
9.2.1 Formulation for Base Threshold . . . . . . . . . . . . . . . . . 286
9.2.1.1 Spatial CSF equations . . . . . . . . . . . . . . . . . 286
9.2.1.2 Base threshold . . . . . . . . . . . . . . . . . . . . . 287
9.2.2 Luminance Adaptation Considerations . . . . . . . . . . . . . 289
9.2.3 Contrast Masking . . . . . . . . . . . . . . . . . . . . . . . . . 291
9.2.3.1 Intra-band masking . . . . . . . . . . . . . . . . . . 291
9.2.3.2 Inter-band masking . . . . . . . . . . . . . . . . . . 291
9.2.4 Other Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
9.3 JND with Pixels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
9.3.1 JND Estimation from Pixel Domain . . . . . . . . . . . . . . . 294
9.3.1.1 Spatial JNDs . . . . . . . . . . . . . . . . . . . . . . 294
9.3.1.2 Simplified estimators . . . . . . . . . . . . . . . . . 295
9.3.1.3 Temporal masking effect . . . . . . . . . . . . . . . 296
9.3.2 Conversion between Subband- and Pixel-Based JNDs . . . . . . 297
9.3.2.1 Subband summation to pixel domain . . . . . . . . . 297
9.3.2.2 Pixel domain decomposition into subbands . . . . . . 298
9.4 JND Model Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 298
9.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
10 No-Reference Quality Metric for Degraded and Enhanced Video 305
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
10.2 State-of-the-Art for No-Reference Metrics . . . . . . . . . . . . . . . . 306
10.3 Quality Metric Components and Design . . . . . . . . . . . . . . . . . 307
10.3.1 Blocking Artifacts . . . . . . . . . . . . . . . . . . . . . . . . 309
10.3.2 Ringing Artifacts . . . . . . . . . . . . . . . . . . . . . . . . . 310
10.3.3 Clipping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
10.3.4 Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
10.3.5 Contrast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
10.3.6 Sharpness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
10.4 No-Reference Overall Quality Metric . . . . . . . . . . . . . . . . . . 313
10.4.1 Building and Training the NROQM . . . . . . . . . . . . . . . 314
10.5 Performance of the Quality Metric . . . . . . . . . . . . . . . . . . . . 317
10.5.1 Testing NROQM . . . . . . . . . . . . . . . . . . . . . . . . . 318
10.5.2 Test with Expert Viewers . . . . . . . . . . . . . . . . . . . . . 320
10.6 Conclusions and Future Research . . . . . . . . . . . . . . . . . . . . . 321
11 Video Quality Experts Group 325
11.1 Formation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
11.2 Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
11.3 Phase I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
11.3.1 The Subjective Test Plan . . . . . . . . . . . . . . . . . . . . . 328
11.3.2 The Objective Test Plan . . . . . . . . . . . . . . . . . . . . . 328
11.3.3 Comparison Metrics . . . . . . . . . . . . . . . . . . . . . . . 329
11.3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
11.4 Phase II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
11.4.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
11.5 Continuing Work and Directions . . . . . . . . . . . . . . . . . . . . . 332
11.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
III Perceptual Coding and Processing of Digital Pictures 335
12 HVS Based Perceptual Video Encoders 337
12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
12.1.1 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
12.2 Noise Visibility and Visual Masking . . . . . . . . . . . . . . . . . . . 338
12.3 Architectures for Perceptual Based Coding . . . . . . . . . . . . . . . 340
12.3.1 Masking Calculations . . . . . . . . . . . . . . . . . . . . . . 343
12.3.2 Perceptual Based Rate Control . . . . . . . . . . . . . . . . . . 345
12.3.2.1 Macroblock level control . . . . . . . . . . . . . . . 345
12.3.2.2 Picture level control . . . . . . . . . . . . . . . . . . 346
12.3.2.3 GOP level control . . . . . . . . . . . . . . . . . . . 348
12.3.3 Look Ahead . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
12.4 Standards-Specific Features . . . . . . . . . . . . . . . . . . . . . . . . 352
12.4.1 Exploitation of Smaller Block Sizes in Advanced Coding Standards . . 352
12.4.1.1 The origin of blockiness . . . . . . . . . . . . . . . . 352
12.4.1.2 Parameters that affect blockiness visibility . . . . . . 352
12.4.2 In-Loop Filtering . . . . . . . . . . . . . . . . . . . . . . . . . 356
12.4.3 Perceptual-Based Scalable Coding Schemes . . . . . . . . . . . 356
12.5 Salience/Maskability Pre-Processing . . . . . . . . . . . . . . . . . . . 357
12.6 Application to Multi-Channel Encoding . . . . . . . . . . . . . . . . . 358
13 Perceptual Image Coding 361
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
13.1.1 Watsons DCTune . . . . . . . . . . . . . . . . . . . . . . . . 362
13.1.2 Safranek and Johnstons Subband Image Coder . . . . . . . . . 363
13.1.3 Hontsch and Karams APIC . . . . . . . . . . . . . . . . . . . 363
13.1.4 Chou and Lis Perceptually Tuned Subband Image Coder . . . . 365
13.1.5 Taubmans EBCOT-CVIS . . . . . . . . . . . . . . . . . . . . 366
13.1.6 Zeng et al.s Point-Wise Extended Visual Masking . . . . . . . 366
13.2 A Perceptual Distortion Metric Based Image Coder . . . . . . . . . . . 368
13.2.1 Coder Structure . . . . . . . . . . . . . . . . . . . . . . . . . . 368
13.2.2 Perceptual Image Distortion Metric . . . . . . . . . . . . . . . 369
13.2.2.1 Frequency transform . . . . . . . . . . . . . . . . . . 369
13.2.2.2 CSF . . . . . . . . . . . . . . . . . . . . . . . . . . 371
13.2.2.3 Masking response . . . . . . . . . . . . . . . . . . . 372
13.2.2.4 Detection . . . . . . . . . . . . . . . . . . . . . . . . 373
13.2.2.5 Overall model . . . . . . . . . . . . . . . . . . . . . 373
13.2.3 EBCOT Adaptation . . . . . . . . . . . . . . . . . . . . . . . . 375
13.3 Model Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
13.3.1 Test Material . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
13.3.2 Generation of Distorted Images . . . . . . . . . . . . . . . . . 378
13.3.3 Subjective Assessment . . . . . . . . . . . . . . . . . . . . . . 379
13.3.4 Arrangements and Apparatus . . . . . . . . . . . . . . . . . . . 380
13.3.5 Presentation of Material . . . . . . . . . . . . . . . . . . . . . 381
13.3.6 Grading Scale . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
13.3.7 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
13.3.8 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
13.3.9 Model Optimization . . . . . . . . . . . . . . . . . . . . . . . 386
13.3.9.1 Full parametric optimization . . . . . . . . . . . . . 389
13.3.9.2 Algorithmic optimization . . . . . . . . . . . . . . . 390
13.3.9.3 Coder optimization . . . . . . . . . . . . . . . . . . 391
13.3.9.4 Remarks . . . . . . . . . . . . . . . . . . . . . . . . 392
13.4 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 394
13.4.1 Assessment Material . . . . . . . . . . . . . . . . . . . . . . . 395
13.4.2 Objective Evaluation . . . . . . . . . . . . . . . . . . . . . . . 395
13.4.3 Objective Results . . . . . . . . . . . . . . . . . . . . . . . . . 397
13.4.4 Subjective Evaluation . . . . . . . . . . . . . . . . . . . . . . . 400
13.4.4.1 Dichotomous FCM . . . . . . . . . . . . . . . . . . 400
13.4.4.2 Trichotomous FCM . . . . . . . . . . . . . . . . . . 400
13.4.4.3 Assessment arrangements . . . . . . . . . . . . . . . 401
13.4.5 Performance Analysis . . . . . . . . . . . . . . . . . . . . . . 402
13.4.5.1 PC versus EBCOT-MSE . . . . . . . . . . . . . . . . 402
13.4.5.2 PC versus EBCOT-CVIS . . . . . . . . . . . . . . . 406
13.4.5.3 PC versus EBCOT-XMASK . . . . . . . . . . . . . . 406
13.4.6 Analysis and Discussion . . . . . . . . . . . . . . . . . . . . . 406
13.5 Perceptual Lossless Coder . . . . . . . . . . . . . . . . . . . . . . . . 412
13.5.1 Coding Structure . . . . . . . . . . . . . . . . . . . . . . . . . 412
13.5.2 Performance Analysis . . . . . . . . . . . . . . . . . . . . . . 414
13.5.2.1 Subjective evaluation . . . . . . . . . . . . . . . . . 415
13.5.2.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . 416
13.5.2.3 Discussions . . . . . . . . . . . . . . . . . . . . . . 416
13.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
14 Foveated Image and Video Coding 431
14.1 Foveated Human Vision and Foveated Image Processing . . . . . . . . 431
14.2 Foveation Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 434
14.2.1 Geometric Methods . . . . . . . . . . . . . . . . . . . . . . . . 434
14.2.2 Filtering Based Methods . . . . . . . . . . . . . . . . . . . . . 436
14.2.3 Multiresolution Methods . . . . . . . . . . . . . . . . . . . . . 438
14.3 Scalable Foveated Image and Video Coding . . . . . . . . . . . . . . . 440
14.3.1 Foveated Perceptual Weighting Model . . . . . . . . . . . . . . 440
14.3.2 Embedded Foveation Image Coding . . . . . . . . . . . . . . . 445
14.3.3 Foveation Scalable Video Coding . . . . . . . . . . . . . . . . 447
14.4 Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 452
15 Artifact Reduction by Post-Processing in Image Compression 459
15.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
15.2 Image Compression and Coding Artifacts . . . . . . . . . . . . . . . . 461
15.2.1 Blocking Artifacts . . . . . . . . . . . . . . . . . . . . . . . . 462
15.2.2 Ringing Artifacts . . . . . . . . . . . . . . . . . . . . . . . . . 464
15.3 Reduction of Blocking Artifacts . . . . . . . . . . . . . . . . . . . . . 465
15.3.1 Adaptive Postfiltering of Transform Coefficients . . . . . . . . 469
15.3.1.1 Consideration of masking effect . . . . . . . . . . . . 471
15.3.1.2 Block activity . . . . . . . . . . . . . . . . . . . . . 473
15.3.1.3 Adaptive filtering . . . . . . . . . . . . . . . . . . . 473
15.3.1.4 Quantization constraint . . . . . . . . . . . . . . . . 474
15.3.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 475
15.3.3 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . 477
15.3.3.1 Results of block classification . . . . . . . . . . . . . 478
15.3.3.2 Performance evaluation . . . . . . . . . . . . . . . . 478
15.4 Reduction of Ringing Artifacts . . . . . . . . . . . . . . . . . . . . . . 482
15.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484
16 Reduction of Color Bleeding in DCT Block-Coded Video 489
16.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
16.2 Analysis of the Color Bleeding Phenomenon . . . . . . . . . . . . . . . 490
16.2.1 Digital Color Video Formats . . . . . . . . . . . . . . . . . . . 490
16.2.2 Color Quantization . . . . . . . . . . . . . . . . . . . . . . . . 491
16.2.3 Analysis of Color Bleeding Distortion . . . . . . . . . . . . . . 492
16.3 Description of the Post-Processor . . . . . . . . . . . . . . . . . . . . . 495
16.4 Experimental Results and Concluding Remarks . . . . . . . . . . . . 499
17 Error Resilience for Video Coding Service 503
17.1 Introduction to Error Resilient Coding Techniques . . . . . . . . . . . . 503
17.2 Error Resilient Coding Methods Compatible with MPEG-2 . . . . . . . 504
17.2.1 Temporal Localization . . . . . . . . . . . . . . . . . . . . . . 504
17.2.2 Spatial Localization . . . . . . . . . . . . . . . . . . . . . . . 506
17.2.3 Concealment . . . . . . . . . . . . . . . . . . . . . . . . . . . 506
17.2.4 Scalability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509
17.3 Methods for Concealment of Cell Loss . . . . . . . . . . . . . . . . . . 513
17.3.1 Spatial Concealment . . . . . . . . . . . . . . . . . . . . . . . 513
17.3.2 Temporal Concealment . . . . . . . . . . . . . . . . . . . . . . 513
17.3.3 The Boundary Matching Algorithm (BMA) . . . . . . . . . . . 517
17.3.4 Decoder Motion Vector Estimation (DMVE) . . . . . . . . . . 520
17.3.5 Extension of DMVE algorithm . . . . . . . . . . . . . . . . . . 522
17.4 Experimental Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . 523
17.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 524
17.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527
18 Critical Issues and Challenges 543
18.1 Picture Coding Structures . . . . . . . . . . . . . . . . . . . . . . . . . 543
18.1.1 Performance Criteria . . . . . . . . . . . . . . . . . . . . . . . 545
18.1.2 Complete vs. Over-Complete Transforms . . . . . . . . . . . . 549
18.1.3 Decisions Decisions . . . . . . . . . . . . . . . . . . . . . . . 551
18.2 Vision Modeling Issues . . . . . . . . . . . . . . . . . . . . . . . . . . 554
18.3 Spatio-Temporal Masking in Video Coding . . . . . . . . . . . . . . . 558
18.4 Picture Quality Assessment . . . . . . . . . . . . . . . . . . . . . . . . 559
18.4.1 Picture Quality Metrics Design Approaches . . . . . . . . . . . 559
18.4.2 Alternative Assessment Methods and Issues . . . . . . . . . . . 560
18.4.3 More Challenges in Picture Quality Assessment . . . . . . . . . 561
18.5 Challenges in Perceptual Coder Design . . . . . . . . . . . . . . . . . 562
18.5.1 Incorporating HVS in Existing Coders . . . . . . . . . . . . . . 562
18.5.2 HVS Inspired Coders . . . . . . . . . . . . . . . . . . . . . . . 563
18.5.3 Perceptually Lossless Coding . . . . . . . . . . . . . . . . . . 565
18.6 Codec System Design Optimization . . . . . . . . . . . . . . . . . . . 566
18.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566
A VQM Performance Metrics 575
A.1 Metrics Relating to Model Prediction Accuracy . . . . . . . . . . . . 576
A.2 Metrics Relating to Prediction Monotonicity of a Model . . . . . . . . 580
A.3 Metrics Relating to Prediction Consistency . . . . . . . . . . . . . . . 581
A.4 MATLAB Source Code . . . . . . . . . . . . . . . . . . . . . . . . . 583
A.5 Supplementary Analyses . . . . . . . . . . . . . . . . . . . . . . . . . 591
Part I
Picture Coding and Human Visual System Fundamentals
Chapter 1
Digital Picture Compression and Coding Structure
Jae Jeong Hwang, Hong Ren Wu and K.R. Rao
Kunsan National University, Republic of Korea; Royal Melbourne Institute of Technology, Australia; University of Texas at Arlington, U.S.A.
1.1 Introduction to Digital Picture Coding
Digital video service has become an integral part of entertainment, education, broadcasting, communication, and business [Say00, Bov05, PE02, PC00, GW02, Ric03, Gha03, Gib97]. Digital camcorders are preferred over analog ones in the consumer market for their convenience and high quality. Still images or moving video taken by digital cameras can be stored, displayed, edited, printed or transmitted via the Internet. Digital television holds strong appeal for the TV audience and is displacing analog television receivers from the market. Digital video and images are simply an alternative means of carrying the same information as their analog counterparts. An ideal analog recorder should exactly record natural phenomena in the form of video, images or audio. An ideal digital recorder has to do the same work, with a number of additional advantages such as interactivity, flexibility, and compressibility. Although ideal conditions seldom prevail in practice for either analog or digital techniques, digital compression is one of the techniques used to lower the cost of a video system while maintaining the same quality of service. Data compression is a process that yields a compact representation of a signal in digital format. For delivery or transmission of information, the key issue is to minimize the bit rate, i.e., the number of bits per second in a real-time delivery system such as a video stream, or the number of bits per picture element (pixel or pel) in a static image. Digital data contains huge amounts of information. Full motion video, e.g., in NTSC format at 30 frames per second (fps) and at 720 x 480 pixel resolution, generates data for
the luminance component alone at 10.4 Mbytes/sec, assuming 8 bits per sample. If we include the color components in 4:2:2 format, a data rate of 20.8 Mbytes/sec is needed, allowing only 31 seconds of video storage on a 650 Mbyte CD-ROM. A storage capacity of up to 74 minutes is possible only by means of compression technology. How, then, can the signal be compressed? There is considerable statistical redundancy in the signal:
Spatial correlation: Within a single two-dimensional image plane, there usually exists significant correlation among neighboring samples.

Temporal correlation: For temporal data, such as moving video, there usually exists significant correlation among samples in adjacent frames.

Spectral correlation: For multispectral images, such as satellite images, there usually exists significant correlation among different frequency bands.
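The raw-rate and storage figures quoted above are easy to verify directly. The following is a small Python sketch, not from the book; the helper name is illustrative:

```python
# Raw data rate of uncompressed digital video (figures from the text above).
def raw_rate_bytes_per_sec(width, height, fps, bytes_per_sample, samples_per_pixel):
    """Uncompressed data rate in bytes per second."""
    return width * height * fps * bytes_per_sample * samples_per_pixel

# Luminance only: 720 x 480 at 30 fps, 8 bits (1 byte) per sample
luma = raw_rate_bytes_per_sec(720, 480, 30, 1, 1)    # 10,368,000 bytes/s ~ 10.4 MB/s
# 4:2:2 adds Cb and Cr at half horizontal resolution -> 2 samples per pixel total
yuv422 = raw_rate_bytes_per_sec(720, 480, 30, 1, 2)  # ~ 20.7 MB/s

cd_rom_bytes = 650e6  # nominal 650 Mbyte CD-ROM
print(f"luma:  {luma / 1e6:.1f} MB/s")
print(f"4:2:2: {yuv422 / 1e6:.1f} MB/s")
print(f"CD-ROM holds {cd_rom_bytes / yuv422:.0f} s of uncompressed 4:2:2 video")
```

The last line reproduces the roughly 31 seconds of storage cited in the text.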
Original video/image data containing any kind of correlation or redundancy can be compressed by appropriate techniques, such as predictive or transform based coding, which inherently reduce that correlation. Image compression aims at reducing the number of bits needed to represent an image by removing the spatial and spectral redundancies as much as possible, while video compression removes temporal redundancy as well. This is called redundancy reduction, the principle behind compression. Another important principle behind compression is irrelevancy reduction: discarding information that will not be noticed by the signal receiver, namely the Human Visual System (HVS). In terms of reproduction quality at the decoder, compression techniques are classified as lossless or lossy. In lossless compression schemes, the reconstructed image after compression is numerically identical to the original image; this is also referred to as a reversible process. However, lossless compression can only achieve a modest amount of compression, depending on the amount of data correlation. An image reconstructed following lossy compression contains degradation relative to the original, often because the compression scheme completely discards redundant information. In return, lossy schemes are capable of achieving much higher compression. Visually lossless coding is achieved if no visible loss is perceived by human viewers under normal viewing conditions. Different classes of compression techniques with respect to statistical redundancy and irrelevancy (or psychovisual redundancy) reductions are illustrated in Figure 1.1.

Figure 1.1: Illustration of digital picture compression fundamental concepts.

Another classification, in terms of coding techniques, is based on prediction or transformation. In predictive coding, information already sent or available is used to predict future values, and the difference is coded and transmitted. Prediction can be performed in any domain, but is usually done in the image or spatial domain. It is relatively simple to implement and is readily adapted to local image characteristics. Differential Pulse Code Modulation (DPCM) is one particular example of predictive coding
in the spatial or time domain. Transform coding, on the other hand, first transforms the image from its spatial domain representation to a different type of representation using some well-known transforms such as the DCT and DWT (see details in Section 1.3), and then encodes the transformed values (coefficients). This method provides greater data compression than predictive methods, although at the expense of higher computational complexity.

As a result of the quantization process, inevitable errors or distortions appear in the decoded picture. Distortion measures can be divided into two categories: subjective and objective measures. A measure is said to be subjective if the quality is evaluated by humans. The use of human analysts, however, is quite impractical and may not guarantee objectivity, since the assessment is not stationary and depends on the viewers' disposition. Moreover, the definition of distortion depends strongly on the application, i.e., the best quality evaluation is not always made by people at all.

In the objective measures, the distortion is calculated as the difference between the original image, x_o, and the reconstructed image, x_r, by a predefined function. It is assumed that the original image is perfect; all changes are considered occurrences of distortion, no matter how they appear to a human observer. The quantitative distortion of the reconstructed image is commonly measured by the mean square error (MSE), the mean absolute error (MAE), and the peak-to-peak signal to noise ratio (PSNR):

    MSE = (1/(M·N)) Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} (x_o[m, n] − x_r[m, n])²        (1.1)

    MAE = (1/(M·N)) Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} |x_o[m, n] − x_r[m, n]|        (1.2)
    PSNR = 10 log₁₀ (255² / MSE)        (1.3)
where M and N are the height and the width of the image, respectively, and (1.3) is defined for an 8 bits/pixel monochrome image representation.
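The three measures translate directly into code. The following NumPy sketch (function names are illustrative, not from the book) implements (1.1)–(1.3) for 8-bit images:

```python
import numpy as np

def mse(orig, recon):
    """Mean square error, Eq. (1.1)."""
    d = orig.astype(np.float64) - recon.astype(np.float64)
    return np.mean(d ** 2)

def mae(orig, recon):
    """Mean absolute error, Eq. (1.2)."""
    d = orig.astype(np.float64) - recon.astype(np.float64)
    return np.mean(np.abs(d))

def psnr(orig, recon, peak=255.0):
    """Peak signal-to-noise ratio in dB, Eq. (1.3)."""
    e = mse(orig, recon)
    return float("inf") if e == 0 else 10.0 * np.log10(peak ** 2 / e)

# Demo on a synthetic image with small random distortion
rng = np.random.default_rng(0)
xo = rng.integers(0, 256, size=(480, 720), dtype=np.uint8)
xr = np.clip(xo.astype(np.int16) + rng.integers(-2, 3, size=xo.shape),
             0, 255).astype(np.uint8)
print(f"MSE={mse(xo, xr):.2f}  MAE={mae(xo, xr):.2f}  PSNR={psnr(xo, xr):.1f} dB")
```

Note that identical images give infinite PSNR, which is why PSNR is meaningful only for lossy reconstructions.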
These measures are widely used in the literature. Unfortunately, they do not always coincide with the evaluations of a human expert. The human eye, for example, does not register small changes of intensity between individual pixels, but is sensitive to changes in the average value and contrast over larger regions. Thus, one approach would be to calculate local properties, such as mean values and variances of small regions in the image, and then compare them between the original and the reconstructed images. Another deficiency of these distortion functions is that they measure only local, pixel-by-pixel differences and do not consider global artifacts, such as blockiness, blurring, jaggedness of edges, ringing or any other type of structural degradation.
1.2 Characteristics of Picture Data
1.2.1 Digital Image Data
A digital image is visual information represented in a discrete form suitable for digital electronic storage and transmission. It is obtained by image sampling, whereby a discrete array x[m, n] is extracted from the continuous image field at some time instant over some rectangular area M × N. The digitized brightness value is called the grey level value. Each image sample is a picture element, called a pixel or a pel. Thus, a two-dimensional (2-D) digital image is defined as:
    x[m, n] = [ x[0, 0]      x[0, 1]      …   x[0, N−1]
                x[1, 0]      x[1, 1]      …   x[1, N−1]
                  ⋮             ⋮          ⋱      ⋮
                x[M−1, 0]    x[M−1, 1]    …   x[M−1, N−1] ]        (1.4)
where its array of image samples is defined on the two-dimensional Cartesian coordinate system as illustrated in Figure 1.2. The number of bits, b, needed to store an image of size M × N with 2^q different grey levels is b = M × N × q. That is, to store a typical image of size 512 × 512 with 256 grey levels (q = 8), we need 2,097,152 bits or 262,144 bytes. We might try to reduce M, N or q to save storage capacity or transmission bits, but this cannot be called compression, since it entails significant loss of picture quality.
Figure 1.2: Geometric relationship between the Cartesian coordinate system and its array of image samples.
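The storage count b = M × N × q above can be checked in a line or two of Python (the function name is illustrative):

```python
# Bits needed to store an uncompressed grey-scale image: b = M * N * q
def image_bits(M, N, q):
    """Storage in bits for an M x N image with 2**q grey levels."""
    return M * N * q

bits = image_bits(512, 512, 8)
print(bits, "bits =", bits // 8, "bytes")  # 2097152 bits = 262144 bytes
```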
1.2.2 Digital Video Data
A natural video stream is continuous in both the spatial and temporal domains. In order to represent and process a video stream digitally, it is necessary to sample both spatially and temporally, as shown in Figure 1.3. An image sampled in the spatial domain is typically represented on a rectangular grid, and a video stream is a series of still images sampled at regular intervals in time. In this case, each still image is usually called a frame. For video in a television format, two fields are interlaced to construct a frame; for non-interlaced (frame-based) video, it is called a picture. Each spatio-temporal sample, or pixel, is represented as a positive digital number describing the brightness (luminance) and color components.
Figure 1.3: Three dimensional (spatial and temporal) domain in a video stream.
A natural video scene is captured, typically with a camera, and converted to a sampled digital representation as shown in Figure 1.4. Digital video is represented in a digital color-difference format YC1C2 rather than in the original RGB natural color format. It may then be handled in the digital domain in a number of ways, including processing, storage and transmission. At the final output of the system, it is displayed to a viewer by reproducing it on a video monitor.

Figure 1.4: Digital representation and color format conversion of natural video stream.

The RGB (red, green, and blue) color space is the basic choice for computer graphics and image frame buffers because color CRTs use red, green, and blue phosphors, the three primary additive colors, to create the desired color. Individual components are added together to form a color, and an equal addition of all components produces white. However, RGB is not very efficient for representing real-world images, since equal bandwidths are required to describe all three color components. Equal bandwidths result in the same pixel depth and display resolution for each color component: using 8 bits per component requires 24 bits of information per pixel, three times the capacity of the luminance component alone. Moreover, the human eye is less sensitive to the color components than to the luminance component. For these reasons, many image coding standards and broadcast systems use luminance and color difference signals: for example, YUV and YIQ for the analog television standards and YCbCr for their digital version.
The YCbCr format, recommended in ITU-R BT.601 [ITU82] as a worldwide video component standard, is obtained from digital gamma-corrected RGB signals as follows:
    Y  =  0.299R + 0.587G + 0.114B
    Cb = −0.169R − 0.331G + 0.500B
    Cr =  0.500R − 0.419G − 0.081B        (1.5)
The color-difference signals are given by:
    (B − Y) = −0.299R − 0.587G + 0.886B
    (R − Y) =  0.701R − 0.587G − 0.114B        (1.6)
where the values of (B − Y) span a range of ±0.886 and those of (R − Y) a range of ±0.701, while Y ranges from 0 to 1.
To restore the signal excursion of the color-difference signals to unity (−0.5 to +0.5), (B − Y) is multiplied by a factor of 0.564 (0.5 divided by 0.886) and (R − Y) by a factor of 0.713 (0.5 divided by 0.701). Thus Cb and Cr are the re-normalized blue and red color difference signals, respectively.
Given that the luminance signal is to occupy 220 levels (16 to 235), it has to be scaled to obtain the digital value Yd. Similarly, the color difference signals are to occupy 224 levels, with the zero level at 128. The digital representation of the three components is expressed as [NH95]:
    Yd = 219Y + 16
    Cb = 224[0.564(B − Y)] + 128 = 126(B − Y) + 128
    Cr = 224[0.713(R − Y)] + 128 = 160(R − Y) + 128        (1.7)
or in its vector form:

    [Yd]   [  65.481   128.553    24.966 ] [R]   [ 16]
    [Cb] = [ −37.797   −74.203   112.000 ] [G] + [128]        (1.8)
    [Cr]   [ 112.000   −93.786   −18.214 ] [B]   [128]
where the corresponding level number after quantization is the nearest integer.
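Equation (1.8) maps directly to code. Below is a minimal NumPy sketch (the function name is illustrative) assuming gamma-corrected RGB inputs normalized to [0, 1]:

```python
import numpy as np

# BT.601 8-bit conversion matrix and offsets from Eq. (1.8)
M = np.array([[ 65.481, 128.553,  24.966],
              [-37.797, -74.203, 112.000],
              [112.000, -93.786, -18.214]])
OFFSET = np.array([16.0, 128.0, 128.0])

def rgb_to_ycbcr(rgb):
    """Map normalized gamma-corrected RGB to 8-bit Yd, Cb, Cr (nearest integer)."""
    return np.rint(M @ np.asarray(rgb, dtype=np.float64) + OFFSET).astype(int)

print(rgb_to_ycbcr([1.0, 1.0, 1.0]))  # white -> [235 128 128]
print(rgb_to_ycbcr([0.0, 0.0, 0.0]))  # black -> [ 16 128 128]
```

White maps to the nominal luminance maximum of 235 with both chroma components at the zero level 128, consistent with the 16–235 / 16–240 ranges described above.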
The video transmission bit rate is decreased by adopting lower sampling rates for the color components while preserving acceptable video quality. Given an image resolution of 720 × 576 pixels represented with 8 bits each, the required bit rate is calculated as:
4:4:4 resolution: 720 × 576 × 8 × 3 ≈ 10 Mbits/frame; 10 Mbits/frame × 29.97 frames/sec ≈ 300 Mbits/sec

4:2:0 resolution: (720 × 576 × 8) + (360 × 288 × 8) × 2 ≈ 5 Mbits/frame; 5 Mbits/frame × 29.97 frames/sec ≈ 150 Mbits/sec
The 4:2:0 version requires half as many bits as the 4:4:4 version, but compression is still necessary for transmission and storage.
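The per-frame figures above follow directly from the sampling geometry and can be confirmed quickly (a Python sketch; helper names are illustrative):

```python
def bits_per_frame_444(w, h, bits=8):
    """Three full-resolution planes (4:4:4 sampling)."""
    return 3 * w * h * bits

def bits_per_frame_420(w, h, bits=8):
    """Full-resolution luma plus two chroma planes at half resolution
    in each dimension (4:2:0 sampling)."""
    return w * h * bits + 2 * (w // 2) * (h // 2) * bits

f444 = bits_per_frame_444(720, 576)  # ~10 Mbits/frame
f420 = bits_per_frame_420(720, 576)  # ~5 Mbits/frame
fps = 29.97
print(f"4:4:4: {f444 * fps / 1e6:.0f} Mbit/s, 4:2:0: {f420 * fps / 1e6:.0f} Mbit/s")
```

The exact ratio is 2:1, which is why 4:2:0 is the dominant consumer-video sampling format.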
1.2.3 Statistical Analysis
The mean value of the discrete image array x, as defined in (1.4), conveniently expressed in vector-space form, is given by

    x̄ = E{x} = (1/(M·N)) Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} x[m, n] = Σ_{k=0}^{2^b−1} x_k p(x_k)        (1.9)
where x_k denotes the k-th grey level, which varies from 0 to the maximum level 2^b − 1 determined by the number of quantization bits b, and p(x_k) = n_k/(M·N) is the probability of x_k, n_k being the number of samples with grey level x_k.
The variance of the image array x is defined as

    σ²_x = (1/(M·N)) Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} (x[m, n] − x̄)² = Σ_{k=0}^{2^b−1} (x_k − x̄)² p(x_k)        (1.10)
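The spatial-domain and histogram (probability) forms of the mean and variance must agree. A NumPy sketch with a synthetic test image (names are illustrative) verifies this:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)  # synthetic 8-bit image
b = 8

# Direct spatial-domain estimates over the M x N array
mean_direct = x.mean()
var_direct = np.mean((x - mean_direct) ** 2)

# Histogram form: p(x_k) = n_k / (M*N), as in Eq. (1.9)
levels = np.arange(2 ** b)
counts = np.bincount(x.ravel(), minlength=2 ** b)  # n_k for each grey level
p = counts / x.size
mean_hist = np.sum(levels * p)
var_hist = np.sum((levels - mean_hist) ** 2 * p)

assert np.isclose(mean_direct, mean_hist) and np.isclose(var_direct, var_hist)
print(f"mean = {mean_direct:.3f}, variance = {var_direct:.3f}")
```

The histogram form is often preferred in practice, since the grey-level histogram is also needed for entropy estimates and quantizer design.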