Digital Video Image Quality and Perceptual Coding
Digital Video Image Quality and Perceptual Coding
© 2006 by Taylor & Francis Group, LLC
Signal Processing and Communications
Editorial Board
Maurice G. Bellanger, Conservatoire National des Arts et Métiers (CNAM), Paris
Ezio Biglieri, Politecnico di Torino, Italy
Sadaoki Furui, Tokyo Institute of Technology
Yih-Fang Huang, University of Notre Dame
Nikil Jayant, Georgia Institute of Technology
Aggelos K. Katsaggelos, Northwestern University
Mos Kaveh, University of Minnesota
P. K. Raja Rajasekaran, Texas Instruments
John Aasted Sorenson, IT University of Copenhagen
1. Digital Signal Processing for Multimedia Systems, edited by Keshab K. Parhi and Takao Nishitani
2. Multimedia Systems, Standards, and Networks, edited by Atul Puri and Tsuhan Chen
3. Embedded Multiprocessors: Scheduling and Synchronization, Sundararajan Sriram and Shuvra S. Bhattacharyya
4. Signal Processing for Intelligent Sensor Systems, David C. Swanson
5. Compressed Video over Networks, edited by Ming-Ting Sun and Amy R. Reibman
6. Modulated Coding for Intersymbol Interference Channels, Xiang-Gen Xia
7. Digital Speech Processing, Synthesis, and Recognition: Second Edition, Revised and Expanded, Sadaoki Furui
8. Modern Digital Halftoning, Daniel L. Lau and Gonzalo R. Arce
9. Blind Equalization and Identification, Zhi Ding and Ye (Geoffrey) Li
10. Video Coding for Wireless Communication Systems, King N. Ngan, Chi W. Yap, and Keng T. Tan
11. Adaptive Digital Filters: Second Edition, Revised and Expanded, Maurice G. Bellanger
12. Design of Digital Video Coding Systems, Jie Chen, Ut-Va Koc, and K. J. Ray Liu
13. Programmable Digital Signal Processors: Architecture, Programming, and Applications, edited by Yu Hen Hu
14. Pattern Recognition and Image Preprocessing: Second Edition, Revised and Expanded, Sing-Tze Bow
15. Signal Processing for Magnetic Resonance Imaging and Spectroscopy, edited by Hong Yan
16. Satellite Communication Engineering, Michael O. Kolawole
17. Speech Processing: A Dynamic and Optimization-Oriented Approach, Li Deng
18. Multidimensional Discrete Unitary Transforms: Representation, Partitioning and Algorithms, Artyom M. Grigoryan and Sos S. Agaian
19. High-Resolution and Robust Signal Processing, Yingbo Hua, Alex B. Gershman and Qi Cheng
20. Domain-Specific Processors: Systems, Architectures, Modeling, and Simulation, Shuvra Bhattacharyya, Ed Deprettere and Jurgen Teich
21. Watermarking Systems Engineering: Enabling Digital Assets Security and Other Applications, Mauro Barni and Franco Bartolini
22. Biosignal and Biomedical Image Processing: MATLAB-Based Applications, John L. Semmlow
23. Broadband Last Mile Technologies: Access Technologies for Multimedia Communications, edited by Nikil Jayant
24. Image Processing Technologies: Algorithms, Sensors, and Applications, edited by Kiyoharu Aizawa, Katsuhiko Sakaue and Yasuhito Suenaga
25. Medical Image Processing, Reconstruction and Restoration: Concepts and Methods, Jiri Jan
26. Multi-Sensor Image Fusion and Its Applications, edited by Rick Blum and Zheng Liu
27. Advanced Image Processing in Magnetic Resonance Imaging, edited by Luigi Landini, Vincenzo Positano and Maria Santarelli
28. Digital Video Image Quality and Perceptual Coding, edited by H.R. Wu and K.R. Rao
Digital Video Image Quality and Perceptual Coding
edited by
H.R. Wu and K.R. Rao
A CRC title, part of the Taylor & Francis imprint, a member of the Taylor & Francis Group, the academic division of T&F Informa plc.
Boca Raton London New York
Published in 2006 by CRC Press, Taylor & Francis Group, 6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742

© 2006 by Taylor & Francis Group, LLC. CRC Press is an imprint of Taylor & Francis Group.

No claim to original U.S. Government works. Printed in the United States of America on acid-free paper. 10 9 8 7 6 5 4 3 2 1

International Standard Book Number-10: 0-8247-2777-0 (Hardcover)
International Standard Book Number-13: 978-0-8247-2777-2 (Hardcover)
Library of Congress Card Number 2005051404

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Digital video image quality and perceptual coding / edited by Henry R. Wu, K.R. Rao.
p. cm. -- (Signal processing and communications)
Includes bibliographical references and index.
ISBN 0-8247-2777-0
1. Digital video. 2. Imaging systems--Image quality. 3. Perception. 4. Coding theory. 5. Computer vision. I. Wu, Henry R. II. Rao, K. Ramamohan (Kamisetty Ramamohan) III. Series.
TK6680.5.D55 2006
006.6'96--dc22 2005051404

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Taylor & Francis Group is the Academic Division of Informa plc.
To those who have pioneered, inspired and persevered.
Copyright release for ISO/IEC: All the figures and tables obtained from ISO/IEC used in this book are subject to the following: The terms and definitions taken from the Figures and Tables ref. ISO/IEC IS 11172, ISO/IEC 11172-2, ISO/IEC 11172-3, ISO/IEC IS 13818-2, ISO/IEC IS 13818-6, ISO/IEC JTC1/SC29 WG11 Doc. N2196, ISO/IEC JTC1/SC29/WG11 N3536, ISO/MPEG N2502, ISO/IEC JTC1/SC29/WG1 Doc. N1595, ISO/IEC JTC SC29/WG11 Doc. 2460, ISO/IEC JTC SC29/WG11 Doc. 3751, ISO/MPEG N2501, ISO/IEC JTC-1 SC29/WG11 M5804, ISO/IEC JTC SC29/WG11, Recomm. H.262 ISO/IEC 13818-2, ISO/IEC IS 13818-1, ISO/MPEG N3746, ISO/IEC Doc. N2502, ISO/IEC JTC1/SC29/WG11, Doc. N2424 are reproduced with permission of the International Organization for Standardization, ISO. These standards can be obtained from any ISO member and from the Web site of the ISO Central Secretariat at the following address: http://www.iso.org. Non-exclusive copyright remains with ISO.

The terms and definitions taken from the Figure ref. ISO/IEC 14496-1:1999, ISO/IEC 14496-2:1999, ISO/IEC 14496-3:1999, ISO/IEC 14496-4:2000, ISO/IEC 14496-5:2000, ISO/IEC 14496-10 AVC:2003 are reproduced with the permission of the International Organization for Standardization, ISO. These standards can be obtained from any ISO member and from the Web site of the ISO Central Secretariat at the following address: www.iso.org. Non-exclusive copyright remains with ISO. The editors, authors and Taylor and Francis are grateful to ISO/IEC for giving the permission.

About the Shannon image on the front cover: The original image of Claude E. Shannon, the father of information theory, was provided by Bell Laboratories of Lucent Technologies. It had been compressed using the JPEG coder, which is information lossy. Its resolution was 4181×5685 pixels with white margins around all sides. This digital image had scanning dust.

The original image was cropped down to 4176×5680 for compression, and the scanning dust was removed by copying the surrounding pixels over the dust. The resultant image was used as the original image to produce the compressed Shannon image using the perceptual lossless image coder (PLIC) as described in Chapter 13, with an implementation intended for medical image compression. The PLIC was benchmarked against the JPEG-LS and the JPEG-NLS (d=2).

The coding error images were produced between the cropped original Shannon image and the compressed images. In the error images, red represents a positive error, blue a negative error and white a zero error. The black color is not actually black; it is small-valued red or blue. Comparing the two error images, it can be appreciated how the PLIC uses the human vision model to achieve perceptually lossless coding with a higher compression ratio than the benchmarks, whilst maintaining the visual fidelity of the original picture.

The top image is the original image, which can be compressed to 3.179 bpp using the JPEG-LS. The mid-right image is the JPEG-NLS (i.e., JPEG near lossless) compressed at d=2 with Bitrate = 1.424 bpp, Compression Ratio = 5.6180:1, MSE = 1.9689 and PSNR = 45.1885 dB. The mid-left image is the difference image between the original and that compressed by the JPEG-NLS (d=2). The bottom-right is the PLIC compressed with Bitrate = 1.370 bpp, Compression Ratio = 5.8394:1, MSE = 2.0420 and PSNR = 45.0303 dB. The bottom-left image is a difference image between the original and that compressed by the PLIC.
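As a sanity check, the PSNR values quoted for the cover images follow directly from the stated MSEs of an 8-bit image, and the compression ratios from the stated bitrates. A minimal sketch (the function names are ours, for illustration only):

```python
import math

def psnr_db(mse, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    return 10.0 * math.log10(peak * peak / mse)

def compression_ratio(bitrate_bpp, original_bpp=8.0):
    """Ratio of the original bit depth to the coded bits per pixel."""
    return original_bpp / bitrate_bpp

# JPEG-NLS (d=2): MSE = 1.9689, bitrate = 1.424 bpp
print(psnr_db(1.9689))             # close to the quoted 45.1885 dB
print(compression_ratio(1.424))    # close to the quoted 5.6180:1

# PLIC: MSE = 2.0420, bitrate = 1.370 bpp
print(psnr_db(2.0420))             # close to the quoted 45.0303 dB
print(compression_ratio(1.370))    # close to the quoted 5.8394:1
```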
Contributors
Alan C. Bovik, University of Texas at Austin, Austin, Texas, U.S.A.
Jorge E. Caviedes, Intel Corporation, Chandler, Arizona, U.S.A.
Tao Chen, Panasonic Hollywood Laboratory, Universal City, California, U.S.A.
François-Xavier Coudoux, Université de Valenciennes, Valenciennes Cedex, France.
Philip J. Corriveau, Intel Media and Acoustics Perception Lab, Hillsboro, Oregon, U.S.A.
Mark D. Fairchild, Rochester Institute of Technology, Rochester, New York, U.S.A.
Marc G. Gazalet, Université de Valenciennes, Valenciennes Cedex, France.
Jae Jeong Hwang, Kunsan National University, Republic of Korea.
Michael Isnardi, Sarnoff Corporation Inc., Princeton, New Jersey, U.S.A.
Ryoichi Kawada, KDDI R&D Laboratories Inc., Japan.
Weisi Lin, Institute for Infocomm Research, Singapore.
Jeffrey Lubin, Sarnoff Corporation Inc., Princeton, New Jersey, U.S.A.
Makoto Miyahara, Japan Advanced Institute of Science and Technology, Japan.
Ethan D. Montag, Rochester Institute of Technology, Rochester, New York, U.S.A.
Franco Oberti, Philips Research, The Netherlands.
Albert Pica, Sarnoff Corporation Inc., Princeton, New Jersey, U.S.A.
K. R. Rao, University of Texas at Arlington, Arlington, Texas, U.S.A.
Hamid Sheikh, Texas Instruments, Inc., Dallas, Texas, U.S.A.
Damian Marcellinus Tan, Royal Melbourne Institute of Technology, Melbourne, Victoria, Australia.
Zhou Wang, University of Texas at Arlington, Arlington, Texas, U.S.A.
Stefan Winkler, Genista Corporation, Montreux, Switzerland.
Hong Ren Wu, Royal Melbourne Institute of Technology, Melbourne, Victoria, Australia.
Zhenghua Yu, National Information Communication Technology Australia (NICTA).
Michael Yuen, ESS Technology, Inc., Beijing, China.
Jian Zhang, National Information Communication Technology Australia (NICTA).
Acknowledgments
The editors, H. R. Wu and K. R. Rao, would like to thank all authors of this handbook for their contributions, efforts and dedication, without which this book would not have been possible.

The editors and the contributors have received assistance and support from many of our colleagues that has made this handbook, Digital Video Image Quality and Perceptual Coding, possible. The generous assistance and support includes valuable information and materials used in and related to the book, discussions, feedback, comments on and proof reading of various parts of the book, and recommendations and suggestions that shaped the book as it is. Special thanks are due to the following persons:
M. Akgun, Communications Research Center, Canada
J. F. Arnold, Australian Defence Force Academy
B. Baxter, Intel Corporation
J. Cai, Nanyang Technological University
N. Corriveau, spouse of P. Corriveau
S. Daly, Sharp Laboratories of America
M. Frater, Australian Defence Force Academy
N. G. Kingsbury, University of Cambridge
L. Lu, IBM T. J. Watson Research Center
Z. Man, Nanyang Technological University
S. K. Mitra, University of California, Santa Barbara
K. N. Ngan, The Chinese University of Hong Kong
E. P. Simoncelli, New York University
C.-S. Tan, Royal Melbourne Institute of Technology
A. Vincent, Communications Research Center, Canada
M. Wada, KDDI R&D Laboratories
B. A. Wandell, Stanford University
S. Wolf, Institute for Telecommunication Sciences
D. Wu, Royal Melbourne Institute of Technology
C. Zhang, Nanyang Technological University
Z. Zhe, Monash University
All members, VQEG
A. C. Bovik acknowledges the support by the National Science Foundation under grant CCR-0310973.
H. R. Wu and Z. Yu acknowledge the support by the Australian Research Council under grant A49927209.
Assistance and support to this book project which H. R. Wu received from Monash University, where he lectured from 1990 to 2005, and from Nanyang Technological University, where he spent his sabbatical from 2002 to 2003, are gratefully acknowledged.
Special thanks go to David Wu of Royal Melbourne Institute of Technology for his assistance in producing the final LaTeX version of this handbook and the compressed Shannon images shown on the front cover of the book.
H. R. Wu and K. R. Rao would like to express their sincere gratitude to B. J. Clark, publishing consultant at CRC Press LLC, who initiated this book project and whose professional advice and appreciation of the efforts involved in this undertaking made the completion of this book possible. Sincere thanks also go to Nora Konopka, our Publisher, and Jessica Vakili, our Project Coordinator, at CRC Press LLC for their patience, understanding and unfailing support that helped see this project through. We are most grateful to our Project Editor, Susan Horwitz, whose professional assistance has made significant improvements to the book's presentation. The work by Nicholas Yancer on the back cover of the book and the brilliant cover design by Jonathan Pennell are greatly appreciated.
Last but not least, without the patience and forbearance of our families, the preparation of this book would have been impossible. We greatly appreciate their constant and continuing support and understanding.
Preface
The outset of digital video image coding research is commonly acknowledged [Cla95] to be around 1950, marked by Goodall's paper on television by pulse code modulation (or PCM) [Goo51, Hua65], Cutler's patent on differential quantization of communication signals (commonly known as differential pulse code modulation, or DPCM for short) [Cut52], Harrison's paper on experiments with linear prediction in television [Har52], and Huffman's paper on a method for the construction of minimum redundancy codes (commonly known as Huffman coding) [Huf52]; notwithstanding that some of the pioneering work on fundamental theories, techniques and concepts in digital image and video coding for visual communications can be traced back to Shannon's monumental work on the mathematical theory of communication in 1948 [Sha48], Gabor's 1946 paper on the theory of communication [Gab46] and even as early as the late 1920s, when Kell proposed the principle of frame difference signal transmission in a British patent [Kel29, SB65, Sey63, Gir03]. While international standardization of digital image and video coding [RH96] might be considered by many as the end of an era or, simply, of research in the area, for others it presents new challenges and signals the beginning of a new era; or, more precisely, it is high time that we addressed and, perhaps, solved a number of long-standing open problems in the field.
A brief review of the history and the state of the art of research in the field will reveal the fundamental concepts, principles and techniques used in image data compression for storage and visual communications. An important goal that was set fairly early by forerunners in image data compression is to minimize statistical (including source coding, spatio-temporal and inter-scale) and psychovisual (or perceptual) redundancies of the image data, either to comply with certain storage or communications bandwidth restrictions or limitations with the best possible picture quality, or to provide a certain picture quality of service with the lowest possible amount of data or bit rate [Sey62]. It helped to set the course and to raise a series of widely researched issues, which have inspired and, in many ways, frustrated generations of researchers in the field. Some of these issues and associated problems are better researched, understood and solved than others.
Using information theory and optimization techniques, we understand reasonably well the definition of statistical redundancy and the theoretical lower bound set by Shannon's entropy in lossless image and video coding [Sha48, JN84]. We have statistically modelled natural image data fairly well, which has led to various optimal or sub-optimal compression techniques in the least mean square sense [Sey62, Cla85, NH95]. We routinely apply rate-distortion theory with the mean squared error (MSE) as a distortion measure in the design of constant bit rate coders. We have pushed the performance of a number of traditional compression techniques, such as predictive and transform coding, close to their limit in terms of decorrelation and energy packing efficiencies. Motion compensated prediction has been thoroughly investigated for inter-frame coding of video and image sequences, leading to a number of effective and efficient algorithms used in practical systems.
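The entropy bound mentioned above is easy to state concretely: for a memoryless source, no lossless code can use fewer bits per symbol on average than the Shannon entropy. A small illustrative sketch (the sample pixel stream is hypothetical):

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Shannon entropy H = -sum(p * log2(p)): the lower bound, in bits
    per symbol, for lossless coding of a memoryless source [Sha48]."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical 8-pixel stream: half zeros, a quarter each of two values.
pixels = [0, 0, 0, 0, 128, 128, 255, 255]
print(entropy_bits(pixels))  # 1.5 bits/symbol, vs. 8 bits/pixel uncoded
```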
In model-, object- or segmentation-based coding, we have been trying to balance bit allocations between coding of model parameters and coding of the residual image, but we have yet to get it right. Different from classical compression algorithms, techniques based on matching pursuit, fractal transforms and projection onto convex sets are recursive, and encode transform or projection parameters instead of either pixel or transform coefficient values. Nevertheless, they have so far failed to live up to the great expectations in terms of rate-distortion performance in practical coding systems and applications. We have long since realized that much higher compression ratios can be achieved than what is achievable by the best lossless coding techniques or the theoretical lower bound set by information theory, without noticeable distortion when viewed by human subjects. Various adaptive quantization and bit allocation techniques and algorithms have been investigated to incorporate some of the aspects of the human visual system (HVS) [CS77, CP84, Cla85], most of which focus on spatial contrast sensitivity and masking effects. Various visually weighted distortion measures have also been explored in either performance evaluation [JN84] or rate-distortion optimization of image or video coders [Tau00].
Limited investigations have been conducted in constant quality coder design, impeded by the lack of a commonly accepted quality metric which correlates well with subjective or perceived quality indices, such as the mean opinion score (MOS) [ITU98]. Long has the question been asked, "What's wrong with mean-squared error?" [Gir84], as well as with its derivatives such as the peak signal-to-noise ratio (PSNR), as the quality or distortion measure. Nonetheless, obtaining credible and widely acceptable alternative perceptually based quantitative quality and/or impairment metrics had eluded us until most recently [LB82, VQE00]. Consequently, attempts and claims of providing users with guaranteed or constant quality visual services have been by and large unattainable or unsubstantiated. Lacking HVS-based quantitative quality or impairment metrics, more often than not we opt for a much higher bit rate for quality-critical visual service applications than what is necessary, resulting in users carrying extra costs; and just as likely a coding strategy may reduce a particular type of coding distortion or artifact at the expense of manifesting or enhancing other types of distortions. One of the most challenging questions begging for an answer is how to define psychovisual redundancy for lossy image and video coding, if it can ever be defined quantitatively in a similar way to the statistical redundancy defined for lossless coding. It would help to set the theoretical lower bound for lossy image data coding at a just noticeable level compared with the original.
This book attempts to address two of the above raised issues which may form a critical part of theoretical research and practical system development in the field, i.e., HVS based perceptual quantitative quality/impairment metrics for digitally coded pictures (i.e., images and videos), and perceptual picture coding. The book consists of three parts, i.e., Part I, Fundamentals; Part II, Testing and Quality Assessment of Digital Pictures; and Part III, Perceptual Coding and Postprocessing.

Part I comprises the first three chapters, covering a number of fundamental concepts, theory, principles and techniques underpinning issues and topics addressed by this book.

Chapter 1, Digital Picture Compression and Coding Structure, by Hwang, Wu and Rao provides an introduction to digital picture compression, covering basic issues and techniques along with popular coding structures, systems and international standards for compression of images and videos.

Fundamentals of Human Vision and Vision Modeling are presented by Montag and Fairchild in Chapter 2, which forms the foundation of materials and discussions related to the HVS and its applications presented in Parts II and III on perceptual quality/impairment metrics, image/video coding and visual communications. The most recent achievements and findings in vision research are included, which are relevant to digital picture coding engineering practice.

Various digital image and video coding/compression algorithms and systems introduce highly structured coding artifacts or distortions, which are different from those in their counterpart analog systems. It is important to analyze and understand these coding artifacts in either subjective or objective quality assessment of digitally encoded images or video sequences. In Chapter 3, a comprehensive classification and analysis of various coding artifacts in digital pictures coded using well known techniques is presented by Yuen.

Part II of this book consists of eight chapters dealing with a range of topics regarding picture quality assessment criteria, subjective and objective methods and metrics, testing procedures, and the development of international standards activities in the field.

Chapter 4, Video Quality Testing by Corriveau, provides an in-depth discussion of subjective assessment methods and techniques, experimental design, and international standard test methods for digital video images in contrast to objective assessment methods, highlighting a number of critical issues and findings. Commonly used test video sequences are presented. The chapter also covers test criteria, test procedures and related issues for various applications in digital video coding and communications. Although subjective assessment methods have been well documented in the literature and standardized by the international standards bodies [ITU98], there has been a renewed interest in, and research publications on, various issues with subjective test methods and new methods, approaches or procedures which may further improve the reliability of subjective test data.
A comprehensive and up-to-date review is provided by Winkler on Perceptual Video Quality Metrics in Chapter 5, including both traditional measures, such as the mean square error (MSE) and the PSNR, and HVS based metrics as reported in the literature [YW00, YWWC02] as well as by international standards bodies such as VQEG [VQE00]. It discusses factors which affect human viewers' assessment of picture quality, classification of objective quality metrics, and various approaches and models used for metrics design.

In Chapter 6, Miyahara and Kawada discuss the Philosophy of Picture Quality Scale. It provides insights into the idea and concept behind the PQS, which was introduced by Miyahara, Kotani and Algazi in [MKA98], an extension of the method pioneered by Miyahara in 1988 [Miy88]. It examines applications of PQS to various digital picture services, including super HDTV, extra high quality images, and cellular video phones, in the context of international standards and activities.

Wang, Bovik and Sheikh present a detailed account of Structural Similarity Based Image Quality Assessment in Chapter 7. The structural similarity based quality metric is devised to complement the traditional error sensitive picture assessment methods by targeting perceived structural information variation, an approach which mimics high level functionality of the HVS. Quality prediction accuracy of the metric is evaluated, with significantly lower computational complexity than vision model based quality metrics.

Vision Model Based Digital Video Impairment Metrics introduced recently are described by Yu and Wu in Chapter 8 for blocking and ringing impairment assessments. In contrast with the traditional vision modeling and parameterization method used in vision research, the vision models used in the impairment metrics are parameterized and optimized using subjective test data provided by the VQEG, where original and distorted video sequences were used instead of simple test patterns. Detailed descriptions of impairment metric implementations are provided, with performance evaluations which have shown good agreement with the MOS obtained via subjective evaluations.

Computational Models for Just-Noticeable Difference are reviewed and closely examined by Lin in Chapter 9. It provides a systematic introduction to the field to date as well as a practical user's guide for related techniques. JND estimation techniques in both the DCT subband domain and the image pixel domain are discussed, along with issues regarding conversions between the two domains.

In Chapter 10, Caviedes and Oberti investigate issues with No-Reference Quality Metric for Degraded and Enhanced Video. The concept of virtual reference is introduced and defined. It highlights the importance of assessing picture quality enhancement as well as degradation in visual communications services and applications in the absence of original pictures. A framework for the development of a no-reference quality metric is described. An extensive description is provided of the no-reference overall quality metric (NROQM) which the authors have developed for digital video quality assessment.
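The structural similarity approach described for Chapter 7 compares luminance, contrast and structure rather than pixel-wise error. As a rough illustration only (the published metric is computed over local sliding windows and averaged; this single-window version follows the commonly published form and constants, not necessarily the chapter's exact implementation):

```python
def ssim_single_window(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM over two equal-length pixel sequences:
    combines mean (luminance), variance (contrast) and covariance
    (structure) comparisons; 1.0 means identical signals."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    c1 = (k1 * data_range) ** 2  # small stabilizers avoid division by ~0
    c2 = (k2 * data_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

original = [52, 55, 61, 66, 70, 61, 64, 73]   # hypothetical pixel values
distorted = [v + 5 for v in original]          # uniform brightness shift
print(ssim_single_window(original, original))   # 1.0
print(ssim_single_window(original, distorted))  # slightly below 1.0
```

Unlike the MSE, which would penalize the brightness shift heavily, the structural terms here remain unchanged and only the luminance comparison drops, mirroring the perceptual intuition the chapter builds on.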
In Chapter 11, Corriveau presents an overview of Video Quality Experts Group activities, highlighting its goals, test plans, major findings, and future work and directions.

The next six chapters form Part III of this book, focusing on digital image and video coder designs based on the HVS, and on post-filtering, restoration, error correction and concealment techniques, which play an increasing role in the improvement of perceptual picture quality by reduction of perceived coding artifacts and transmission errors. A number of new perceptual coders introduced in recent years are presented in Chapters 12, 13 and 14, including rate-distortion optimization using perceptual distortion metrics and foveated perceptual coding. A noticeable feature of these new perceptual coders is that they use much more sophisticated vision models, resulting in significant visual performance improvement. Discussions are included in these chapters on possible new coding architectures based on vision models, as compared with the existing statistically based coding algorithms and architectures predominant in current software and hardware products and systems.

Chapter 12 by Pica, Isnardi and Lubin examines critical issues associated with HVS Based Perceptual Video Encoders. It covers an overview of perceptually based approaches, possible architectures and applications, and future directions. Architectures which support perceptually based video encoding are discussed for an MPEG-2 compliant encoder.

Tan and Wu present Perceptual Image Coding in Chapter 13, which provides a comprehensive review of HVS based image coding techniques to date. The review covers traditional techniques where various HVS aspects or simple vision models are used for coder designs. Until most recently, this traditional approach has dominated research on the topic with numerous publications, and it forms one of, at least, four approaches to perceptual coding design. The chapter describes a perceptual distortion metric based image coder and a vision model based perceptually lossless coder, along with detailed discussions on model calibration and coder performance evaluation results.

Chapter 14 by Wang and Bovik investigates novel Foveated Image and Video Coding techniques, which they introduced most recently. It provides an introduction to the foveation feature of the HVS, a review of various foveation techniques that have been used to construct image and video coding systems, and detailed descriptions of example foveated picture coding systems.

Chapter 15 by Chen and Wu discusses the topic of Artifact Reduction by Post-Processing in Image Compression. Various image restoration and processing techniques have been reported in recent years to eliminate or to reduce picture coding artifacts introduced in the encoding or transmission process, so as to improve perceptual image or video picture quality. It has become widely accepted that these post-filtering algorithms are an integral part of a compression package or system from a rate-distortion optimization standpoint. This chapter focuses on the reduction of blocking and ringing artifacts in order to improve the visual quality of reconstructed pictures. A DCT domain deblocking technique is described, with a fast implementation algorithm, after a review of coding artifact reduction techniques to date.
Color bleeding is a prominent distortion associated with color images encoded by block DCT based picture coding systems. Coudoux and Gazalet present in Chapter 16 a novel approach to Reduction of Color Bleeding in DCT Block-Coded Video, which they introduced recently. This post-processing technique is devised after a thorough analysis of the cause of color bleeding. The performance evaluation results have demonstrated marked improvement in the perceptual quality of reconstructed pictures.

Issues associated with Error Resilience for Video Coding Service are investigated by Zhang in Chapter 17. It provides an introduction to error resilient coding techniques and concealment methods. Significant improvement in terms of visual picture quality has been demonstrated by using a number of the techniques presented.

Chapter 18, the final chapter of the book, highlights a number of critical issues and challenges of the field which may be beneficial to readers for future research.

Performance measures used to evaluate objective quality/impairment metrics against subjective test data are discussed in Appendix A.

We hope that readers will enjoy reading this book as much as we have enjoyed writing it and find the materials provided in it useful and relevant to their work and studies in the field.

H. R. Wu, Royal Melbourne Institute of Technology, Australia
K. R. Rao, University of Texas at Arlington, U.S.A.
References
[Cla85] R. J. Clarke. Transform Coding of Images. London: Academic Press, 1985.

[Cla95] R. J. Clarke. Digital Compression of Still Images and Video. London: Academic Press, 1995.

[CP84] W.-H. Chen and W. K. Pratt. Scene adaptive coder. IEEE Trans. Commun., COM-32:225–232, March 1984.

[CS77] W.-H. Chen and C. H. Smith. Adaptive coding of monochrome and color images. IEEE Trans. Commun., COM-25:1285–1292, November 1977.

[Cut52] C. C. Cutler. Differential Quantization of Communication Signals, U.S. Patent No. 2,605,361, July 1952.

[Gab46] D. Gabor. Theory of communication. Journal of the IEE, 93:429–457, 1946.

[Gir84] B. Girod. What's wrong with mean-squared error? In A. B. Watson, Ed., Digital Images and Human Vision, 207–220. Cambridge, MA: MIT Press, 1993.

[Gir03] B. Girod. Video coding for compression and beyond, keynote. In Proceedings of IEEE International Conference on Image Processing, Barcelona, Spain, September 2003.

[Goo51] W. M. Goodall. Television by pulse code modulation. Bell Systems Technical Journal, 28:33–49, January 1951.

[Har52] C. W. Harrison. Experiments with linear prediction in television. Bell Systems Technical Journal, 29:764–783, 1952.

[Hua65] T. S. Huang. PCM picture transmission. IEEE Spectrum, 2:57–63, December 1965.

[Huf52] D. A. Huffman. A method for the construction of minimum redundancy codes. IRE Proc., 40:1098–1101, 1952.

[ITU98] ITU. ITU-R BT.500-9, Methodology for the subjective assessment of the quality of television pictures. ITU-R, 1998.

[JN84] N. S. Jayant and P. Noll. Digital Coding of Waveforms: Principles and Applications to Speech and Video. Upper Saddle River, NJ: Prentice Hall, 1984.

[Kel29] R. D. Kell. Improvements Relating to Electric Picture Transmission Systems, British Patent No. 341,811, 1929.

[LB82] F. J. Lukas and Z. L. Budrikis. Picture quality prediction based on a visual model. IEEE Transactions on Communications, COM-30:1679–1692, July 1982.

[Miy88] M. Miyahara. Quality assessments for visual service. IEEE Communications Magazine, 26(10):51–60, October 1988.

[MKA98] M. Miyahara, K. Kotani, and V. R. Algazi. Objective picture quality scale (PQS) for image coding. IEEE Transactions on Communications, 46(9):1215–1226, September 1998.

[NH95] A. N. Netravali and B. G. Haskell. Digital Pictures: Representation, Compression and Standards. New York: Plenum Press, 2nd ed., 1995.
[RH96] K. R. Rao and J. J. Hwang. Techniques and Standards for Image, Video and Audio Coding. Upper Saddle River, NJ: Prentice Hall, 1996.

[SB65] A. J. Seyler and Z. L. Budrikis. Detail perception after scene changes in television image presentations. IEEE Trans. on Information Theory, IT-11(1):31–43, January 1965.

[Sey62] A. J. Seyler. The coding of visual signals to reduce channel-capacity requirements. Proc. IEE, pt. C, 109(1):676–684, 1962.

[Sey63] A. J. Seyler. Real-time recording of television frame difference areas. Proc. IEEE, 51(1):478–480, 1963.

[Sha48] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–623, 1948.

[Tau00] D. Taubman. High performance scalable image compression with EBCOT. IEEE Trans. Image Proc., 9:1158–1170, July 2000.

[VQE00] VQEG. Final Report from the Video Quality Experts Group on the Validation of Objective Models of Video Quality Assessment. VQEG, March 2000. Available from ftp://ftp.its.bldrdoc.gov.

[YW00] Z. Yu and H. R. Wu. Human visual systems based objective digital video quality metrics. In Proceedings of International Conference on Signal Processing 2000 of 16th IFIP World Computer Congress, 2:1088–1095, Beijing, China, August 2000.

[YWWC02] Z. Yu, H. R. Wu, S. Winkler, and T. Chen. Vision model based impairment metric to evaluate blocking artifacts in digital video. Proc. IEEE, 90(1):154–169, January 2002.
Contents
List of Contributors ix
Acknowledgments xi
Preface xiii
I Picture Coding and Human Visual System Fundamentals 1
1 Digital Picture Compression and Coding Structure 3
1.1 Introduction to Digital Picture Coding . . . . . . . . . . . . . . . . . . 3
1.2 Characteristics of Picture Data . . . . . . . . . . . . . . . . . . . . . . 6
1.2.1 Digital Image Data . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.2 Digital Video Data . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.3 Statistical Analysis . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 Compression and Coding Techniques . . . . . . . . . . . . . . . . . . 12
1.3.1 Entropy Coding . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.2 Predictive Coding . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3.3 Transform Coding . . . . . . . . . . . . . . . . . . . . . . . . 14
1.3.3.1 Discrete cosine transform (DCT) . . . . . . . . . . . 14
1.3.3.2 Discrete wavelet transform (DWT) . . . . . . . . . . 18
1.4 Picture Quantization . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4.1 Uniform/Nonuniform Quantizer . . . . . . . . . . . . . . . . . 21
1.4.2 Optimal Quantizer Design . . . . . . . . . . . . . . . . . . . . 21
1.4.3 Vector Quantization . . . . . . . . . . . . . . . . . . . . . . . 24
1.5 Rate-Distortion Theory . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.6 Human Visual Systems . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.6.1 Contrast Ratio . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.6.2 Spatial Frequency . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.6.3 Masking Effect . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.6.4 Mach Bands . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.7 Digital Picture Coding Standards and Systems . . . . . . . . . . . . . . 31
1.7.1 JPEG-Still Image Coding Standard . . . . . . . . . . . . . . . 31
1.7.2 MPEG-Video Coding Standards . . . . . . . . . . . . . . . . . 36
1.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2 Fundamentals of Human Vision and Vision Modeling 45
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.2 A Brief Overview of the Visual System . . . . . . . . . . . . . . . . . 45
2.3 Color Vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.3.1 Colorimetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.3.2 Color Appearance, Color Order Systems and Color Difference . . . . . 51
2.4 Luminance and the Perception of Light Intensity . . . . . . . . . . . . . 55
2.4.1 Luminance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.4.2 Perceived Intensity . . . . . . . . . . . . . . . . . . . . . . . . 57
2.5 Spatial Vision and Contrast Sensitivity . . . . . . . . . . . . . . . . . . 59
2.5.1 Acuity and Sampling . . . . . . . . . . . . . . . . . . . . . . . 60
2.5.2 Contrast Sensitivity . . . . . . . . . . . . . . . . . . . . . . . . 62
2.5.3 Multiple Spatial Frequency Channels . . . . . . . . . . . . . . 64
2.5.3.1 Pattern adaptation . . . . . . . . . . . . . . . . . . . 65
2.5.3.2 Pattern detection . . . . . . . . . . . . . . . . . . . . 65
2.5.3.3 Masking and facilitation . . . . . . . . . . . . . . . . 66
2.5.3.4 Nonindependence in spatial frequency and orientation 68
2.5.3.5 Chromatic contrast sensitivity . . . . . . . . . . . . . 70
2.5.3.6 Suprathreshold contrast sensitivity . . . . . . . . . . 71
2.5.3.7 Image compression and image difference . . . . . . . 74
2.6 Temporal Vision and Motion . . . . . . . . . . . . . . . . . . . . . . . 75
2.6.1 Temporal CSF . . . . . . . . . . . . . . . . . . . . . . . . . . 75
2.6.2 Apparent Motion . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.7 Visual Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
2.7.1 Image and Video Quality Research . . . . . . . . . . . . . . . . 80
2.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3 Coding Artifacts and Visual Distortions 87
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.2 Blocking Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.2.1 Intraframe Coded Macroblocks . . . . . . . . . . . . . . . . . 90
3.2.2 Predictive Coded Macroblocks . . . . . . . . . . . . . . . . . . 90
3.3 Basis Image Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.3.1 Visual Significance of Each Basis Image . . . . . . . . . . . . . 92
3.3.2 Predictive Coded Macroblocks . . . . . . . . . . . . . . . . . . 92
3.3.3 Aggregation of Major Basis Images . . . . . . . . . . . . . . . 93
3.4 Blurring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.5 Color Bleeding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.6 Staircase Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
3.7 Ringing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
3.8 Mosaic Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
3.8.1 Intraframe Coded Macroblocks . . . . . . . . . . . . . . . . . 100
3.8.2 Predictive-Coded Macroblocks . . . . . . . . . . . . . . . . . . 101
3.9 False Contouring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
3.10 False Edges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
3.11 MC Mismatch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
3.12 Mosquito Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
3.12.1 Ringing-Related Mosquito Effect . . . . . . . . . . . . . . . . 108
3.12.2 Mismatch-Related Mosquito Effect . . . . . . . . . . . . . . . 109
3.13 Stationary Area Fluctuations . . . . . . . . . . . . . . . . . . . . . . . 110
3.14 Chrominance Mismatch . . . . . . . . . . . . . . . . . . . . . . . . . . 112
3.15 Video Scaling and Field Rate Conversion . . . . . . . . . . . . . . . . 113
3.15.1 Video Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
3.15.2 Field Rate Conversion . . . . . . . . . . . . . . . . . . . . . . 115
3.16 Deinterlacing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
3.16.1 Line Repetition and Averaging . . . . . . . . . . . . . . . . . . 117
3.16.2 Field Repetition . . . . . . . . . . . . . . . . . . . . . . . . . . 117
3.16.3 Motion Adaptivity . . . . . . . . . . . . . . . . . . . . . . . . 118
3.16.3.1 Luminance difference . . . . . . . . . . . . . . . . . 118
3.16.3.2 Median filters . . . . . . . . . . . . . . . . . . . . . 118
3.16.3.3 Motion compensation . . . . . . . . . . . . . . . . . 119
3.17 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
II Picture Quality Assessment and Metrics 123
4 Video Quality Testing 125
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
4.2 Subjective Assessment Methodologies . . . . . . . . . . . . . . . . . . 126
4.3 Selection of Test Materials . . . . . . . . . . . . . . . . . . . . . . . . 126
4.4 Selection of Participants - Subjects . . . . . . . . . . . . . . . . . . 128
4.4.1 Experts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.4.2 Non-Experts . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
4.4.3 Screening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
4.5 Experimental Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
4.5.1 Test Chamber . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
4.5.2 Common Experimental Mistakes . . . . . . . . . . . . . . . . . 131
4.6 International Test Methods . . . . . . . . . . . . . . . . . . . . . . . . 132
4.6.1 Double Stimulus Impairment Scale Method . . . . . . . . . . . 132
4.6.2 Double Stimulus Quality Scale Method . . . . . . . . . . . . . 137
4.6.3 Comparison Scale Method . . . . . . . . . . . . . . . . . . . . 141
4.6.4 Single Stimulus Methods . . . . . . . . . . . . . . . . . . . . . 142
4.6.5 Continuous Quality Evaluations . . . . . . . . . . . . . . . . . 143
4.6.6 Discussion of SSCQE and DSCQS . . . . . . . . . . . . . . . 145
4.6.7 Pitfalls of Different Methods . . . . . . . . . . . . . . . . . . . 147
4.7 Objective Assessment Methods . . . . . . . . . . . . . . . . . . . . . . 150
4.7.1 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
4.7.2 Requirement for Standards . . . . . . . . . . . . . . . . . . . . 151
4.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
5 Perceptual Video Quality Metrics - A Review 155
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
5.2 Quality Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
5.3 Metric Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
5.4 Pixel-Based Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
5.5 The Psychophysical Approach . . . . . . . . . . . . . . . . . . . . . . 160
5.5.1 HVS Modeling Fundamentals . . . . . . . . . . . . . . . . . . 160
5.5.2 Single-Channel Models . . . . . . . . . . . . . . . . . . . . . . 163
5.5.3 Multi-Channel Models . . . . . . . . . . . . . . . . . . . . . . 164
5.6 The Engineering Approach . . . . . . . . . . . . . . . . . . . . . . . . 165
5.6.1 Full-Reference Metrics . . . . . . . . . . . . . . . . . . . . . . 166
5.6.2 Reduced-Reference Metrics . . . . . . . . . . . . . . . . . . . 167
5.6.3 No-Reference Metrics . . . . . . . . . . . . . . . . . . . . . . 167
5.7 Metric Comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
5.7.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
5.7.2 Video Quality Experts Group . . . . . . . . . . . . . . . . . . . 170
5.7.3 Limits of Prediction Performance . . . . . . . . . . . . . . . . 171
5.8 Conclusions and Perspectives . . . . . . . . . . . . . . . . . . . . . . . 172
6 Philosophy of Picture Quality Scale 181
6.1 Objective Picture Quality Scale for Image Coding . . . . . . . . . . . . 181
6.1.1 PQS and Evaluation of Displayed Image . . . . . . . . . . . . . 181
6.1.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
6.1.3 Construction of a Picture Quality Scale . . . . . . . . . . . . . 182
6.1.3.1 Luminance coding error . . . . . . . . . . . . . . . . 183
6.1.3.2 Spatial frequency weighting of errors . . . . . . . . . 183
6.1.3.3 Random errors and disturbances . . . . . . . . . . . 185
6.1.3.4 Structured and localized errors and disturbances . . . 186
6.1.3.5 Principal component analysis . . . . . . . . . . . . . 188
6.1.3.6 Computation of PQS . . . . . . . . . . . . . . . . . . 189
6.1.4 Visual Assessment Tests . . . . . . . . . . . . . . . . . . . . . 189
6.1.4.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . 191
6.1.4.2 Test pictures . . . . . . . . . . . . . . . . . . . . . . 192
6.1.4.3 Coders . . . . . . . . . . . . . . . . . . . . . . . . . 193
6.1.4.4 Determination of MOS . . . . . . . . . . . . . . . . 193
6.1.5 Results of Experiments . . . . . . . . . . . . . . . . . . . . . . 193
6.1.5.1 Results of principal component analysis . . . . . . . 193
6.1.5.2 Multiple regression analysis . . . . . . . . . . . . . . 195
6.1.5.3 Evaluation of PQS . . . . . . . . . . . . . . . . . . . 195
6.1.5.4 Generality and robustness of PQS . . . . . . . . . . . 196
6.1.6 Key Distortion Factors . . . . . . . . . . . . . . . . . . . . . . 197
6.1.6.1 Characteristics of the principal components . . . . . . 197
6.1.6.2 Contribution of the distortion factors . . . . . . . . . 197
6.1.6.3 Other distortion factors . . . . . . . . . . . . . . . . 198
6.1.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
6.1.7.1 Limitations in applications . . . . . . . . . . . . . . 198
6.1.7.2 Visual assessment scales and methods . . . . . . . . 200
6.1.7.3 Human vision models and image quality metrics . . . 200
6.1.7.4 Specializing PQS for a specific coding method . . . . 201
6.1.7.5 PQS in color picture coding . . . . . . . . . . . . . . 201
6.1.8 Applications of PQS . . . . . . . . . . . . . . . . . . . . . . . 201
6.1.9 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
6.2 Application of PQS to a Variety of Electronic Images . . . . . . . . . . 202
6.2.1 Categories of Image Evaluation . . . . . . . . . . . . . . . . . 203
6.2.1.1 Picture spatial resolution and viewing distance . . . . 203
6.2.1.2 Constancy of viewing distance . . . . . . . . . . . . 205
6.2.1.3 Viewing angle between adjacent pixels . . . . . . . . 206
6.2.2 Linearization of the Scale . . . . . . . . . . . . . . . . . . . . 206
6.2.3 Importance of Center Area of Image in Quality Evaluation . . . 208
6.2.4 Other Conditions . . . . . . . . . . . . . . . . . . . . . . . . . 209
6.3 Various Categories of Image Systems . . . . . . . . . . . . . . . . . . 209
6.3.1 Standard TV Images with Frame Size of about 500 × 640 Pixels . . . . 209
6.3.2 HDTV and Super HDTV . . . . . . . . . . . . . . . . . . . . . 209
6.3.3 Extra High Quality Images . . . . . . . . . . . . . . . . . . . . 211
6.3.4 Cellular Phone Type . . . . . . . . . . . . . . . . . . . . . . . 212
6.3.5 Personal Computer and Display for CG . . . . . . . . . . . . . 213
6.4 Study at ITU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
6.4.1 SG9 Recommendations for Quality Assessment . . . . . . . . . 213
6.4.2 J.143 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
6.4.2.1 FR scheme . . . . . . . . . . . . . . . . . . . . . . . 215
6.4.2.2 NR scheme . . . . . . . . . . . . . . . . . . . . . . . 215
6.4.2.3 RR scheme . . . . . . . . . . . . . . . . . . . . . . . 215
6.4.3 J.144 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
6.4.4 J.133 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
6.4.5 J.146 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
6.4.6 J.147 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
6.4.7 J.148 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
6.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
7 Structural Similarity Based Image Quality Assessment 225
7.1 Structural Similarity and Image Quality . . . . . . . . . . . . . . . . . 225
7.2 The Structural SIMilarity (SSIM) Index . . . . . . . . . . . . . . . . . 228
7.3 Image Quality Assessment Based on the SSIM Index . . . . . . . . . . 233
7.4 Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
8 Vision Model Based Digital Video Impairment Metrics 243
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
8.2 Vision Modeling for Impairment Measurement . . . . . . . . . . . . . 247
8.2.1 Color Space Conversion . . . . . . . . . . . . . . . . . . . . . 248
8.2.2 Temporal Filtering . . . . . . . . . . . . . . . . . . . . . . . . 249
8.2.3 Spatial Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . 250
8.2.4 Contrast Gain Control . . . . . . . . . . . . . . . . . . . . . . 251
8.2.5 Detection and Pooling . . . . . . . . . . . . . . . . . . . . . . 253
8.2.6 Model Parameterization . . . . . . . . . . . . . . . . . . . . . 254
8.2.6.1 Parameterization by vision research experiments . . . 254
8.2.6.2 Parameterization by video quality experiments . . . . 255
8.3 Perceptual Blocking Distortion Metric . . . . . . . . . . . . . . . . . . 258
8.3.1 Blocking Dominant Region Segmentation . . . . . . . . . . . . 259
8.3.1.1 Vertical and horizontal block edge detection . . . . . 261
8.3.1.2 Removal of edges coexisting in original and processed sequences . . 263
8.3.1.3 Removal of short isolated edges in processed sequence 263
8.3.1.4 Adjacent edge removal . . . . . . . . . . . . . . . . 263
8.3.1.5 Generation of blocking region map . . . . . . . . . . 263
8.3.1.6 Ringing region detection . . . . . . . . . . . . . . . 265
8.3.1.7 Exclusion of ringing regions from blocking region map 265
8.3.2 Summation of Distortions in Blocking Dominant Regions . . . 265
8.3.3 Performance Evaluation of the PBDM . . . . . . . . . . . . . . 266
8.4 Perceptual Ringing Distortion Measure . . . . . . . . . . . . . . . . . . 269
8.4.1 Ringing Region Segmentation . . . . . . . . . . . . . . . . . . 271
8.4.1.1 Modified variance computation . . . . . . . . . . . . 272
8.4.1.2 Smooth and complex region detection . . . . . . . . 272
8.4.1.3 Boundary labeling and distortion calculation . . . . . 273
8.4.2 Detection and Pooling . . . . . . . . . . . . . . . . . . . . . . 274
8.4.3 Performance Evaluation of the PRDM . . . . . . . . . . . . . . 274
8.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
9 Computational Models for Just-Noticeable Difference 281
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
9.1.1 Single-Stimulus JND Tests . . . . . . . . . . . . . . . . . . . . . . 282
9.1.2 JND Tests with Real-World Images . . . . . . . . . . . . . . . 283
9.1.3 Applications of JND Models . . . . . . . . . . . . . . . . . . . 283
9.1.4 Objectives and Organization of the Following Sections . . . . . 284
9.2 JND with DCT Subbands . . . . . . . . . . . . . . . . . . . . . . . . . 285
9.2.1 Formulation for Base Threshold . . . . . . . . . . . . . . . . . 286
9.2.1.1 Spatial CSF equations . . . . . . . . . . . . . . . . . 286
9.2.1.2 Base threshold . . . . . . . . . . . . . . . . . . . . . 287
9.2.2 Luminance Adaptation Considerations . . . . . . . . . . . . . 289
9.2.3 Contrast Masking . . . . . . . . . . . . . . . . . . . . . . . . . 291
9.2.3.1 Intra-band masking . . . . . . . . . . . . . . . . . . 291
9.2.3.2 Inter-band masking . . . . . . . . . . . . . . . . . . 291
9.2.4 Other Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
9.3 JND with Pixels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
9.3.1 JND Estimation from Pixel Domain . . . . . . . . . . . . . . . 294
9.3.1.1 Spatial JNDs . . . . . . . . . . . . . . . . . . . . . . 294
9.3.1.2 Simplified estimators . . . . . . . . . . . . . . . . . 295
9.3.1.3 Temporal masking effect . . . . . . . . . . . . . . . 296
9.3.2 Conversion between Subband- and Pixel-Based JNDs . . . . . . 297
9.3.2.1 Subband summation to pixel domain . . . . . . . . . 297
9.3.2.2 Pixel domain decomposition into subbands . . . . . . 298
9.4 JND Model Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 298
9.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
10 No-Reference Quality Metric for Degraded and Enhanced Video 305
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
10.2 State-of-the-Art for No-Reference Metrics . . . . . . . . . . . . . . . . 306
10.3 Quality Metric Components and Design . . . . . . . . . . . . . . . . . 307
10.3.1 Blocking Artifacts . . . . . . . . . . . . . . . . . . . . . . . . 309
10.3.2 Ringing Artifacts . . . . . . . . . . . . . . . . . . . . . . . . . 310
10.3.3 Clipping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
10.3.4 Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
10.3.5 Contrast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
10.3.6 Sharpness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
10.4 No-Reference Overall Quality Metric . . . . . . . . . . . . . . . . . . 313
10.4.1 Building and Training the NROQM . . . . . . . . . . . . . . . 314
10.5 Performance of the Quality Metric . . . . . . . . . . . . . . . . . . . . 317
10.5.1 Testing NROQM . . . . . . . . . . . . . . . . . . . . . . . . . 318
10.5.2 Test with Expert Viewers . . . . . . . . . . . . . . . . . . . . . 320
10.6 Conclusions and Future Research . . . . . . . . . . . . . . . . . . . . . 321
11 Video Quality Experts Group 325
11.1 Formation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
11.2 Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
11.3 Phase I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
11.3.1 The Subjective Test Plan . . . . . . . . . . . . . . . . . . . . . 328
11.3.2 The Objective Test Plan . . . . . . . . . . . . . . . . . . . . . 328
11.3.3 Comparison Metrics . . . . . . . . . . . . . . . . . . . . . . . 329
11.3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
11.4 Phase II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
11.4.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
11.5 Continuing Work and Directions . . . . . . . . . . . . . . . . . . . . . 332
11.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
III Perceptual Coding and Processing of Digital Pictures 335
12 HVS Based Perceptual Video Encoders 337
12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
12.1.1 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
12.2 Noise Visibility and Visual Masking . . . . . . . . . . . . . . . . . . . 338
12.3 Architectures for Perceptual Based Coding . . . . . . . . . . . . . . . 340
12.3.1 Masking Calculations . . . . . . . . . . . . . . . . . . . . . . 343
12.3.2 Perceptual Based Rate Control . . . . . . . . . . . . . . . . . . 345
12.3.2.1 Macroblock level control . . . . . . . . . . . . . . . 345
12.3.2.2 Picture level control . . . . . . . . . . . . . . . . . . 346
12.3.2.3 GOP level control . . . . . . . . . . . . . . . . . . . 348
12.3.3 Look Ahead . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
12.4 Standards-Specific Features . . . . . . . . . . . . . . . . . . . . . . . . 352
12.4.1 Exploitation of Smaller Block Sizes in Advanced Coding Standards . . 352
12.4.1.1 The origin of blockiness . . . . . . . . . . . . . . . . 352
12.4.1.2 Parameters that affect blockiness visibility . . . . . . 352
12.4.2 In-Loop Filtering . . . . . . . . . . . . . . . . . . . . . . . . . 356
12.4.3 Perceptual-Based Scalable Coding Schemes . . . . . . . . . . . 356
12.5 Salience/Maskability Pre-Processing . . . . . . . . . . . . . . . . . . . 357
12.6 Application to Multi-Channel Encoding . . . . . . . . . . . . . . . . . 358
13 Perceptual Image Coding 361
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
13.1.1 Watsons DCTune . . . . . . . . . . . . . . . . . . . . . . . . 362
13.1.2 Safranek and Johnstons Subband Image Coder . . . . . . . . . 363
13.1.3 Hontsch and Karams APIC . . . . . . . . . . . . . . . . . . . 363
13.1.4 Chou and Lis Perceptually Tuned Subband Image Coder . . . . 365
13.1.5 Taubmans EBCOT-CVIS . . . . . . . . . . . . . . . . . . . . 366
13.1.6 Zeng et al.s Point-Wise Extended Visual Masking . . . . . . . 366
13.2 A Perceptual Distortion Metric Based Image Coder . . . . . . . . . . . 368
13.2.1 Coder Structure . . . . . . . . . . . . . . . . . . . . . . . . . . 368
13.2.2 Perceptual Image Distortion Metric . . . . . . . . . . . . . . . 369
13.2.2.1 Frequency transform . . . . . . . . . . . . . . . . . . 369
13.2.2.2 CSF . . . . . . . . . . . . . . . . . . . . . . . . . . 371
13.2.2.3 Masking response . . . . . . . . . . . . . . . . . . . 372
13.2.2.4 Detection . . . . . . . . . . . . . . . . . . . . . . . . 373
13.2.2.5 Overall model . . . . . . . . . . . . . . . . . . . . . 373
13.2.3 EBCOT Adaptation . . . . . . . . . . . . . . . . . . . . . . . . 375
13.3 Model Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
13.3.1 Test Material . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
13.3.2 Generation of Distorted Images . . . . . . . . . . . . . . . . . 378
13.3.3 Subjective Assessment . . . . . . . . . . . . . . . . . . . . . . 379
13.3.4 Arrangements and Apparatus . . . . . . . . . . . . . . . . . . . 380
13.3.5 Presentation of Material . . . . . . . . . . . . . . . . . . . . . 381
13.3.6 Grading Scale . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
13.3.7 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
13.3.8 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
13.3.9 Model Optimization . . . . . . . . . . . . . . . . . . . . . . . 386
13.3.9.1 Full parametric optimization . . . . . . . . . . . . . 389
13.3.9.2 Algorithmic optimization . . . . . . . . . . . . . . . 390
13.3.9.3 Coder optimization . . . . . . . . . . . . . . . . . . 391
13.3.9.4 Remarks . . . . . . . . . . . . . . . . . . . . . . . . 392
13.4 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 394
13.4.1 Assessment Material . . . . . . . . . . . . . . . . . . . . . . . 395
13.4.2 Objective Evaluation . . . . . . . . . . . . . . . . . . . . . . . 395
13.4.3 Objective Results . . . . . . . . . . . . . . . . . . . . . . . . . 397
13.4.4 Subjective Evaluation . . . . . . . . . . . . . . . . . . . . . . . 400
13.4.4.1 Dichotomous FCM . . . . . . . . . . . . . . . . . . 400
13.4.4.2 Trichotomous FCM . . . . . . . . . . . . . . . . . . 400
13.4.4.3 Assessment arrangements . . . . . . . . . . . . . . . 401
13.4.5 Performance Analysis . . . . . . . . . . . . . . . . . . . . . . 402
13.4.5.1 PC versus EBCOT-MSE . . . . . . . . . . . . . . . . 402
13.4.5.2 PC versus EBCOT-CVIS . . . . . . . . . . . . . . . 406
13.4.5.3 PC versus EBCOT-XMASK . . . . . . . . . . . . . . 406
13.4.6 Analysis and Discussion . . . . . . . . . . . . . . . . . . . . . 406
13.5 Perceptual Lossless Coder . . . . . . . . . . . . . . . . . . . . . . . . 412
13.5.1 Coding Structure . . . . . . . . . . . . . . . . . . . . . . . . . 412
13.5.2 Performance Analysis . . . . . . . . . . . . . . . . . . . . . . 414
13.5.2.1 Subjective evaluation . . . . . . . . . . . . . . . . . 415
13.5.2.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . 416
13.5.2.3 Discussions . . . . . . . . . . . . . . . . . . . . . . 416
13.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
14 Foveated Image and Video Coding 431
14.1 Foveated Human Vision and Foveated Image Processing . . . . . . . . 431
14.2 Foveation Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 434
14.2.1 Geometric Methods . . . . . . . . . . . . . . . . . . . . . . . . 434
14.2.2 Filtering Based Methods . . . . . . . . . . . . . . . . . . . . . 436
14.2.3 Multiresolution Methods . . . . . . . . . . . . . . . . . . . . . 438
14.3 Scalable Foveated Image and Video Coding . . . . . . . . . . . . . . . 440
14.3.1 Foveated Perceptual Weighting Model . . . . . . . . . . . . . . 440
14.3.2 Embedded Foveation Image Coding . . . . . . . . . . . . . . . 445
14.3.3 Foveation Scalable Video Coding . . . . . . . . . . . . . . . . 447
14.4 Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 452
15 Artifact Reduction by Post-Processing in Image Compression 459
15.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
15.2 Image Compression and Coding Artifacts . . . . . . . . . . . . . . . . 461
15.2.1 Blocking Artifacts . . . . . . . . . . . . . . . . . . . . . . . . 462
15.2.2 Ringing Artifacts . . . . . . . . . . . . . . . . . . . . . . . . . 464
15.3 Reduction of Blocking Artifacts . . . . . . . . . . . . . . . . . . . . . 465
15.3.1 Adaptive Postfiltering of Transform Coefficients . . . . . . . . 469
15.3.1.1 Consideration of masking effect . . . . . . . . . . . . 471
15.3.1.2 Block activity . . . . . . . . . . . . . . . . . . . . . 473
15.3.1.3 Adaptive filtering . . . . . . . . . . . . . . . . . . . 473
15.3.1.4 Quantization constraint . . . . . . . . . . . . . . . . 474
15.3.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 475
15.3.3 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . 477
15.3.3.1 Results of block classification . . . . . . . . . . . . . 478
15.3.3.2 Performance evaluation . . . . . . . . . . . . . . . . 478
15.4 Reduction of Ringing Artifacts . . . . . . . . . . . . . . . . . . . . . . 482
15.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484
16 Reduction of Color Bleeding in DCT Block-Coded Video 489
16.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
16.2 Analysis of the Color Bleeding Phenomenon . . . . . . . . . . . . . . . 490
16.2.1 Digital Color Video Formats . . . . . . . . . . . . . . . . . . . 490
16.2.2 Color Quantization . . . . . . . . . . . . . . . . . . . . . . . . 491
16.2.3 Analysis of Color Bleeding Distortion . . . . . . . . . . . . . . 492
16.3 Description of the Post-Processor . . . . . . . . . . . . . . . . . . . . . 495
16.4 Experimental Results and Concluding Remarks . . . . . . . . . . . . 499
17 Error Resilience for Video Coding Service 503
17.1 Introduction to Error Resilient Coding Techniques . . . . . . . . . . . . 503
17.2 Error Resilient Coding Methods Compatible with MPEG-2 . . . . . . . 504
17.2.1 Temporal Localization . . . . . . . . . . . . . . . . . . . . . . 504
17.2.2 Spatial Localization . . . . . . . . . . . . . . . . . . . . . . . 506
17.2.3 Concealment . . . . . . . . . . . . . . . . . . . . . . . . . . . 506
17.2.4 Scalability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509
17.3 Methods for Concealment of Cell Loss . . . . . . . . . . . . . . . . . . 513
17.3.1 Spatial Concealment . . . . . . . . . . . . . . . . . . . . . . . 513
17.3.2 Temporal Concealment . . . . . . . . . . . . . . . . . . . . . . 513
17.3.3 The Boundary Matching Algorithm (BMA) . . . . . . . . . . . 517
17.3.4 Decoder Motion Vector Estimation (DMVE) . . . . . . . . . . 520
17.3.5 Extension of DMVE algorithm . . . . . . . . . . . . . . . . . . 522
17.4 Experimental Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . 523
17.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 524
17.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527
18 Critical Issues and Challenges 543
18.1 Picture Coding Structures . . . . . . . . . . . . . . . . . . . . . . . . . 543
18.1.1 Performance Criteria . . . . . . . . . . . . . . . . . . . . . . . 545
18.1.2 Complete vs. Over-Complete Transforms . . . . . . . . . . . . 549
18.1.3 Decisions Decisions . . . . . . . . . . . . . . . . . . . . . . . 551
18.2 Vision Modeling Issues . . . . . . . . . . . . . . . . . . . . . . . . . . 554
18.3 Spatio-Temporal Masking in Video Coding . . . . . . . . . . . . . . . 558
18.4 Picture Quality Assessment . . . . . . . . . . . . . . . . . . . . . . . . 559
18.4.1 Picture Quality Metrics Design Approaches . . . . . . . . . . . 559
18.4.2 Alternative Assessment Methods and Issues . . . . . . . . . . . 560
18.4.3 More Challenges in Picture Quality Assessment . . . . . . . . . 561
18.5 Challenges in Perceptual Coder Design . . . . . . . . . . . . . . . . . 562
18.5.1 Incorporating HVS in Existing Coders . . . . . . . . . . . . . . 562
18.5.2 HVS Inspired Coders . . . . . . . . . . . . . . . . . . . . . . . 563
18.5.3 Perceptually Lossless Coding . . . . . . . . . . . . . . . . . . 565
18.6 Codec System Design Optimization . . . . . . . . . . . . . . . . . . . 566
18.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566
A VQM Performance Metrics 575
A.1 Metrics Relating to Model Prediction Accuracy . . . . . . . . . . . . 576
A.2 Metrics Relating to Prediction Monotonicity of a Model . . . . . . . . 580
A.3 Metrics Relating to Prediction Consistency . . . . . . . . . . . . . . . 581
A.4 MATLAB Source Code . . . . . . . . . . . . . . . . . . . . . . . . . 583
A.5 Supplementary Analyses . . . . . . . . . . . . . . . . . . . . . . . . . 591
Part I
Picture Coding and Human Visual System Fundamentals
Chapter 1
Digital Picture Compression and Coding Structure
Jae Jeong Hwang, Hong Ren Wu and K.R. Rao
Kunsan National University, Republic of Korea; Royal Melbourne Institute of Technology, Australia; University of Texas at Arlington, U.S.A.
1.1 Introduction to Digital Picture Coding
Digital video service has become an integral part of entertainment, education, broadcasting, communication, and business [Say00, Bov05, PE02, PC00, GW02, Ric03, Gha03, Gib97]. Digital camcorders are preferred over analog ones in the consumer market for their convenience and high quality. Still images or moving video taken by digital cameras can be stored, displayed, edited, printed or transmitted via the Internet. Digital television holds strong appeal for the TV audience and is displacing analog television receivers from the market. Digital video and images are simply an alternative means of carrying the same information as their analog counterparts. An ideal analog recorder should exactly record natural phenomena in the form of video, images or audio. An ideal digital recorder has to do the same work, with a number of additional advantages such as interactivity, flexibility, and compressibility. Although ideal conditions seldom prevail in practice for either analog or digital techniques, digital compression is one of the techniques used to lower the cost of a video system while maintaining the same quality of service. Data compression is a process that yields a compact representation of a signal in digital format. For delivery or transmission of information, the key issue is to minimize the bit rate, i.e., the number of bits per second in a real-time delivery system such as a video stream, or the number of bits per picture element (pixel or pel) in a static image. Digital data contains huge amounts of information. Full motion video, e.g., in NTSC format at 30 frames per second (fps) and at 720 x 480 pixel resolution, generates data for
the luminance component alone at 10.4 Mbytes/sec, assuming 8 bits per sample. If we include the color components in 4:2:2 format, a data rate of 20.8 Mbytes/sec is needed, allowing only 31 seconds of video storage on a 650 Mbyte CD-ROM. A storage capacity of up to 74 minutes is possible only by means of compression technology. How, then, can the signal be compressed? There is considerable statistical redundancy in the signal:
Spatial correlation: Within a single two-dimensional image plane, there usually exists significant correlation among neighboring samples.

Temporal correlation: For temporal data, such as moving video, there usually exists significant correlation among samples in adjacent frames.

Spectral correlation: For multispectral images, such as satellite images, there usually exists significant correlation among different frequency bands.
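The raw-rate and storage figures quoted above are easy to verify directly. The following is a small Python sketch, not from the book; the helper name is illustrative:

```python
# Raw data rate of uncompressed digital video (figures from the text above).
def raw_rate_bytes_per_sec(width, height, fps, bytes_per_sample, samples_per_pixel):
    """Uncompressed data rate in bytes per second."""
    return width * height * fps * bytes_per_sample * samples_per_pixel

# Luminance only: 720 x 480 at 30 fps, 8 bits (1 byte) per sample
luma = raw_rate_bytes_per_sec(720, 480, 30, 1, 1)    # 10,368,000 bytes/s ~ 10.4 MB/s
# 4:2:2 adds Cb and Cr at half horizontal resolution -> 2 samples per pixel total
yuv422 = raw_rate_bytes_per_sec(720, 480, 30, 1, 2)  # ~ 20.7 MB/s

cd_rom_bytes = 650e6  # nominal 650 Mbyte CD-ROM
print(f"luma:  {luma / 1e6:.1f} MB/s")
print(f"4:2:2: {yuv422 / 1e6:.1f} MB/s")
print(f"CD-ROM holds {cd_rom_bytes / yuv422:.0f} s of uncompressed 4:2:2 video")
```

The last line reproduces the roughly 31 seconds of storage cited in the text.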
Original video/image data containing any kind of correlation or redundancy can be compressed by appropriate techniques, such as predictive or transform based coding, which inherently reduce that correlation. Image compression aims at reducing the number of bits needed to represent an image by removing the spatial and spectral redundancies as much as possible, while video compression removes temporal redundancy as well. This is called redundancy reduction, the principle behind compression. Another important principle behind compression is irrelevancy reduction: discarding information that will not be noticed by the signal receiver, namely the Human Visual System (HVS). In terms of reproduction quality at the decoder, compression techniques are classified as lossless or lossy. In lossless compression schemes, the reconstructed image after compression is numerically identical to the original image; this is also referred to as a reversible process. However, lossless compression can only achieve a modest amount of compression, depending on the amount of data correlation. An image reconstructed following lossy compression contains degradation relative to the original, often because the compression scheme completely discards redundant information. In return, lossy schemes are capable of achieving much higher compression. Visually lossless coding is achieved if no visible loss is perceived by human viewers under normal viewing conditions. Different classes of compression techniques with respect to statistical redundancy and irrelevancy (or psychovisual redundancy) reductions are illustrated in Figure 1.1.

Figure 1.1: Illustration of digital picture compression fundamental concepts.

Another classification, in terms of coding techniques, is based on prediction or transformation. In predictive coding, information already sent or available is used to predict future values, and the difference is coded and transmitted. Prediction can be performed in any domain, but is usually done in the image or spatial domain. It is relatively simple to implement and is readily adapted to local image characteristics. Differential Pulse Code Modulation (DPCM) is one particular example of predictive coding
in the spatial or time domain. Transform coding, on the other hand, first transforms the image from its spatial domain representation to a different type of representation using some well-known transforms such as the DCT and DWT (see details in Section 1.3), and then encodes the transformed values (coefficients). This method provides greater data compression than predictive methods, although at the expense of higher computational complexity.

As a result of the quantization process, inevitable errors or distortions appear in the decoded picture. Distortion measures can be divided into two categories: subjective and objective measures. A measure is said to be subjective if the quality is evaluated by humans. The use of human analysts, however, is quite impractical and may not guarantee objectivity, since the assessment is not stationary and depends on the viewers' disposition. Moreover, the definition of distortion depends strongly on the application, i.e., the best quality evaluation is not always made by people at all.

In the objective measures, the distortion is calculated as the difference between the original image, x_o, and the reconstructed image, x_r, by a predefined function. It is assumed that the original image is perfect; all changes are considered occurrences of distortion, no matter how they appear to a human observer. The quantitative distortion of the reconstructed image is commonly measured by the mean square error (MSE), the mean absolute error (MAE), and the peak-to-peak signal to noise ratio (PSNR):

    MSE = (1/(M·N)) Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} (x_o[m, n] − x_r[m, n])²        (1.1)

    MAE = (1/(M·N)) Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} |x_o[m, n] − x_r[m, n]|        (1.2)
    PSNR = 10 log₁₀ (255² / MSE)        (1.3)
where M and N are the height and the width of the image, respectively, and (1.3) is defined for an 8 bits/pixel monochrome image representation.
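The three measures translate directly into code. The following NumPy sketch (function names are illustrative, not from the book) implements (1.1)–(1.3) for 8-bit images:

```python
import numpy as np

def mse(orig, recon):
    """Mean square error, Eq. (1.1)."""
    d = orig.astype(np.float64) - recon.astype(np.float64)
    return np.mean(d ** 2)

def mae(orig, recon):
    """Mean absolute error, Eq. (1.2)."""
    d = orig.astype(np.float64) - recon.astype(np.float64)
    return np.mean(np.abs(d))

def psnr(orig, recon, peak=255.0):
    """Peak signal-to-noise ratio in dB, Eq. (1.3)."""
    e = mse(orig, recon)
    return float("inf") if e == 0 else 10.0 * np.log10(peak ** 2 / e)

# Demo on a synthetic image with small random distortion
rng = np.random.default_rng(0)
xo = rng.integers(0, 256, size=(480, 720), dtype=np.uint8)
xr = np.clip(xo.astype(np.int16) + rng.integers(-2, 3, size=xo.shape),
             0, 255).astype(np.uint8)
print(f"MSE={mse(xo, xr):.2f}  MAE={mae(xo, xr):.2f}  PSNR={psnr(xo, xr):.1f} dB")
```

Note that identical images give infinite PSNR, which is why PSNR is meaningful only for lossy reconstructions.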
These measures are widely used in the literature. Unfortunately, they do not always coincide with the evaluations of a human expert. The human eye, for example, does not register small changes of intensity between individual pixels, but is sensitive to changes in the average value and contrast over larger regions. Thus, one approach would be to calculate local properties, such as mean values and variances of small regions in the image, and then compare them between the original and the reconstructed images. Another deficiency of these distortion functions is that they measure only local, pixel-by-pixel differences and do not consider global artifacts, such as blockiness, blurring, jaggedness of edges, ringing or any other type of structural degradation.
1.2 Characteristics of Picture Data
1.2.1 Digital Image Data
A digital image is visual information represented in a discrete form suitable for digital electronic storage and transmission. It is obtained by image sampling, whereby a discrete array x[m, n] is extracted from the continuous image field at some time instant over some rectangular area M × N. The digitized brightness value is called the grey level value. Each image sample is a picture element, called a pixel or a pel. Thus, a two-dimensional (2-D) digital image is defined as:
    x[m, n] = [ x[0, 0]      x[0, 1]      …   x[0, N−1]
                x[1, 0]      x[1, 1]      …   x[1, N−1]
                  ⋮             ⋮          ⋱      ⋮
                x[M−1, 0]    x[M−1, 1]    …   x[M−1, N−1] ]        (1.4)
where its array of image samples is defined on the two-dimensional Cartesian coordinate system as illustrated in Figure 1.2. The number of bits, b, needed to store an image of size M × N with 2^q different grey levels is b = M × N × q. That is, to store a typical image of size 512 × 512 with 256 grey levels (q = 8), we need 2,097,152 bits or 262,144 bytes. We might try to reduce M, N or q to save storage capacity or transmission bits, but this cannot be called compression, since it entails significant loss of picture quality.
Figure 1.2: Geometric relationship between the Cartesian coordinate system and its array of image samples.
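The storage count b = M × N × q above can be checked in a line or two of Python (the function name is illustrative):

```python
# Bits needed to store an uncompressed grey-scale image: b = M * N * q
def image_bits(M, N, q):
    """Storage in bits for an M x N image with 2**q grey levels."""
    return M * N * q

bits = image_bits(512, 512, 8)
print(bits, "bits =", bits // 8, "bytes")  # 2097152 bits = 262144 bytes
```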
1.2.2 Digital Video Data
A natural video stream is continuous in both the spatial and temporal domains. In order to represent and process a video stream digitally, it is necessary to sample both spatially and temporally, as shown in Figure 1.3. An image sampled in the spatial domain is typically represented on a rectangular grid, and a video stream is a series of still images sampled at regular intervals in time. In this case, each still image is usually called a frame. For video in a television format, two fields are interlaced to construct a frame; for non-interlaced (frame-based) video, it is called a picture. Each spatio-temporal sample, or pixel, is represented as a positive digital number describing the brightness (luminance) and color components.
Figure 1.3: Three dimensional (spatial and temporal) domain in a video stream.
A natural video scene is captured, typically with a camera, and converted to a sampled digital representation as shown in Figure 1.4. Digital video is represented in a digital color-difference format YC1C2 rather than in the original RGB natural color format. It may then be handled in the digital domain in a number of ways, including processing, storage and transmission. At the final output of the system, it is displayed to a viewer by reproducing it on a video monitor.

Figure 1.4: Digital representation and color format conversion of natural video stream.

The RGB (red, green, and blue) color space is the basic choice for computer graphics and image frame buffers because color CRTs use red, green, and blue phosphors, the three primary additive colors, to create the desired color. Individual components are added together to form a color, and an equal addition of all components produces white. However, RGB is not very efficient for representing real-world images, since equal bandwidths are required to describe all three color components. Equal bandwidths result in the same pixel depth and display resolution for each color component: using 8 bits per component requires 24 bits of information per pixel, three times the capacity of the luminance component alone. Moreover, the human eye is less sensitive to the color components than to the luminance component. For these reasons, many image coding standards and broadcast systems use luminance and color difference signals: for example, YUV and YIQ for the analog television standards and YCbCr for their digital version.
The YCbCr format, recommended in ITU-R BT.601 [ITU82] as a worldwide video component standard, is obtained from digital gamma-corrected RGB signals as follows:
    Y  =  0.299R + 0.587G + 0.114B
    Cb = −0.169R − 0.331G + 0.500B
    Cr =  0.500R − 0.419G − 0.081B        (1.5)
The color-difference signals are given by:
    (B − Y) = −0.299R − 0.587G + 0.886B
    (R − Y) =  0.701R − 0.587G − 0.114B        (1.6)
where the values of (B − Y) span a range of ±0.886 and those of (R − Y) a range of ±0.701, while Y ranges from 0 to 1.
To restore the signal excursion of the color-difference signals to unity (−0.5 to +0.5), (B − Y) is multiplied by a factor of 0.564 (0.5 divided by 0.886) and (R − Y) by a factor of 0.713 (0.5 divided by 0.701). Thus Cb and Cr are the re-normalized blue and red color difference signals, respectively.
Given that the luminance signal is to occupy 220 levels (16 to 235), it has to be scaled to obtain the digital value Yd. Similarly, the color difference signals are to occupy 224 levels, with the zero level at 128. The digital representation of the three components is expressed as [NH95]:
    Yd = 219Y + 16
    Cb = 224[0.564(B − Y)] + 128 = 126(B − Y) + 128
    Cr = 224[0.713(R − Y)] + 128 = 160(R − Y) + 128        (1.7)
or in its vector form:

    [Yd]   [  65.481   128.553    24.966 ] [R]   [ 16]
    [Cb] = [ −37.797   −74.203   112.000 ] [G] + [128]        (1.8)
    [Cr]   [ 112.000   −93.786   −18.214 ] [B]   [128]
where the corresponding level number after quantization is the nearest integer.
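Equation (1.8) maps directly to code. Below is a minimal NumPy sketch (the function name is illustrative) assuming gamma-corrected RGB inputs normalized to [0, 1]:

```python
import numpy as np

# BT.601 8-bit conversion matrix and offsets from Eq. (1.8)
M = np.array([[ 65.481, 128.553,  24.966],
              [-37.797, -74.203, 112.000],
              [112.000, -93.786, -18.214]])
OFFSET = np.array([16.0, 128.0, 128.0])

def rgb_to_ycbcr(rgb):
    """Map normalized gamma-corrected RGB to 8-bit Yd, Cb, Cr (nearest integer)."""
    return np.rint(M @ np.asarray(rgb, dtype=np.float64) + OFFSET).astype(int)

print(rgb_to_ycbcr([1.0, 1.0, 1.0]))  # white -> [235 128 128]
print(rgb_to_ycbcr([0.0, 0.0, 0.0]))  # black -> [ 16 128 128]
```

White maps to the nominal luminance maximum of 235 with both chroma components at the zero level 128, consistent with the 16–235 / 16–240 ranges described above.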
The video transmission bit rate is decreased by adopting lower sampling rates for the color components while preserving acceptable video quality. Given an image resolution of 720 × 576 pixels represented with 8 bits each, the required bit rate is calculated as:
4:4:4 resolution: 720 × 576 × 8 × 3 ≈ 10 Mbits/frame; 10 Mbits/frame × 29.97 frames/sec ≈ 300 Mbits/sec

4:2:0 resolution: (720 × 576 × 8) + (360 × 288 × 8) × 2 ≈ 5 Mbits/frame; 5 Mbits/frame × 29.97 frames/sec ≈ 150 Mbits/sec
The 4:2:0 version requires half as many bits as the 4:4:4 version, but compression is still necessary for transmission and storage.
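The per-frame figures above follow directly from the sampling geometry and can be confirmed quickly (a Python sketch; helper names are illustrative):

```python
def bits_per_frame_444(w, h, bits=8):
    """Three full-resolution planes (4:4:4 sampling)."""
    return 3 * w * h * bits

def bits_per_frame_420(w, h, bits=8):
    """Full-resolution luma plus two chroma planes at half resolution
    in each dimension (4:2:0 sampling)."""
    return w * h * bits + 2 * (w // 2) * (h // 2) * bits

f444 = bits_per_frame_444(720, 576)  # ~10 Mbits/frame
f420 = bits_per_frame_420(720, 576)  # ~5 Mbits/frame
fps = 29.97
print(f"4:4:4: {f444 * fps / 1e6:.0f} Mbit/s, 4:2:0: {f420 * fps / 1e6:.0f} Mbit/s")
```

The exact ratio is 2:1, which is why 4:2:0 is the dominant consumer-video sampling format.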
1.2.3 Statistical Analysis
The mean value of the discrete image array x, as defined in (1.4), conveniently expressed in vector-space form, is given by

    x̄ = E{x} = (1/(M·N)) Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} x[m, n] = Σ_{k=0}^{2^b−1} x_k p(x_k)        (1.9)
where x_k denotes the k-th grey level, which varies from 0 to the maximum level 2^b − 1 determined by the number of quantization bits b, and p(x_k) = n_k/(M·N) is the probability of x_k, n_k being the number of samples with grey level x_k.
The variance of the image array x is defined as

    σ²_x = (1/(M·N)) Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} (x[m, n] − x̄)² = Σ_{k=0}^{2^b−1} (x_k − x̄)² p(x_k)        (1.10)
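The spatial-domain and histogram (probability) forms of the mean and variance must agree. A NumPy sketch with a synthetic test image (names are illustrative) verifies this:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)  # synthetic 8-bit image
b = 8

# Direct spatial-domain estimates over the M x N array
mean_direct = x.mean()
var_direct = np.mean((x - mean_direct) ** 2)

# Histogram form: p(x_k) = n_k / (M*N), as in Eq. (1.9)
levels = np.arange(2 ** b)
counts = np.bincount(x.ravel(), minlength=2 ** b)  # n_k for each grey level
p = counts / x.size
mean_hist = np.sum(levels * p)
var_hist = np.sum((levels - mean_hist) ** 2 * p)

assert np.isclose(mean_direct, mean_hist) and np.isclose(var_direct, var_hist)
print(f"mean = {mean_direct:.3f}, variance = {var_direct:.3f}")
```

The histogram form is often preferred in practice, since the grey-level histogram is also needed for entropy estimates and quantizer design.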