
JPEG2000 Standard for Image Compression: Concepts, Algorithms and VLSI Architectures

Tinku Acharya
Avisere, Inc., Tucson, Arizona & Department of Engineering, Arizona State University, Tempe, Arizona

Ping-Sing Tsai
Department of Computer Science, The University of Texas-Pan American, Edinburg, Texas

WILEY-INTERSCIENCE

A JOHN WILEY & SONS, INC., PUBLICATION


Copyright © 2005 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993, or fax 317-572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however, may not be available in electronic format.

Library of Congress Cataloging-in-Publication Data:

Acharya, Tinku.
  JPEG2000 standard for image compression : concepts, algorithms and VLSI architectures / Tinku Acharya, Ping-Sing Tsai.
    p. cm.
  "A Wiley-Interscience publication."
  Includes bibliographical references and index.
  ISBN 0-471-48422-9 (cloth)
  1. JPEG (Image coding standard) 2. Image compression. I. Tsai, Ping-Sing, 1962- II. Title.

TK6680.5.A25 2004
006.6-dc22
2004042256

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1


To my mother Mrs. Mrittika Acharya, my wife Lisa, my daughter Arita,

and my son Arani

- Tinku Acharya

To my family Meiling, Amy, and Tiffany - Ping-Sing Tsai


Contents

Preface

1 Introduction to Data Compression
  1.1 Introduction
  1.2 Why Compression?
    1.2.1 Advantages of Data Compression
    1.2.2 Disadvantages of Data Compression
  1.3 Information Theory Concepts
    1.3.1 Discrete Memoryless Model and Entropy
    1.3.2 Noiseless Source Coding Theorem
    1.3.3 Unique Decipherability
  1.4 Classification of Compression Algorithms
  1.5 A Data Compression Model
  1.6 Compression Performance
    1.6.1 Compression Ratio and Bits per Sample
    1.6.2 Quality Metrics
    1.6.3 Coding Delay
    1.6.4 Coding Complexity
  1.7 Overview of Image Compression
  1.8 Multimedia Data Compression Standards
    1.8.1 Still Image Coding Standard
    1.8.2 Video Coding Standards
    1.8.3 Audio Coding Standard
    1.8.4 Text Compression
  1.9 Summary
  References

2 Source Coding Algorithms
  2.1 Run-Length Coding
  2.2 Huffman Coding
    2.2.1 Limitations of Huffman Coding
    2.2.2 Modified Huffman Coding
  2.3 Arithmetic Coding
    2.3.1 Encoding Algorithm
    2.3.2 Decoding Algorithm
  2.4 Binary Arithmetic Coding
    2.4.1 Implementation with Integer Mathematics
    2.4.2 The QM-Coder
  2.5 Ziv-Lempel Coding
    2.5.1 The LZ77 Algorithm
    2.5.2 The LZ78 Algorithm
    2.5.3 The LZW Algorithm
  2.6 Summary
  References

3 JPEG: Still Image Compression Standard
  3.1 Introduction
  3.2 The JPEG Lossless Coding Algorithm
  3.3 Baseline JPEG Compression
    3.3.1 Color Space Conversion
    3.3.2 Source Image Data Arrangement
    3.3.3 The Baseline Compression Algorithm
    3.3.4 Discrete Cosine Transform
    3.3.5 Coding the DCT Coefficients
    3.3.6 Decompression Process in Baseline JPEG
  3.4 Progressive DCT-based Mode
  3.5 Hierarchical Mode
  3.6 Summary
  References

4 Introduction to Discrete Wavelet Transform
  4.1 Introduction
  4.2 Wavelet Transforms
    4.2.1 Discrete Wavelet Transforms
    4.2.2 Concept of Multiresolution Analysis
    4.2.3 Implementation by Filters and the Pyramid Algorithm
  4.3 Extension to Two-Dimensional Signals
  4.4 Lifting Implementation of the Discrete Wavelet Transform
    4.4.1 Finite Impulse Response Filter and Z-transform
    4.4.2 Euclidean Algorithm for Laurent Polynomials
    4.4.3 Perfect Reconstruction and Polyphase Representation of Filters
    4.4.4 Lifting
    4.4.5 Data Dependency Diagram for Lifting Computation
  4.5 Why Do We Care About Lifting?
  4.6 Summary
  References

5 VLSI Architectures for Discrete Wavelet Transforms
  5.1 Introduction
  5.2 A VLSI Architecture for the Convolution Approach
    5.2.1 Mapping the DWT in a Semi-Systolic Architecture
    5.2.2 Mapping the Inverse DWT in a Semi-Systolic Architecture
    5.2.3 Unified Architecture for DWT and Inverse DWT
  5.3 VLSI Architectures for Lifting-based DWT
    5.3.1 Mapping the Data Dependency Diagram in Pipeline Architectures
    5.3.2 Enhanced Pipeline Architecture by Folding
    5.3.3 Flipping Architecture
    5.3.4 A Register Allocation Scheme for Lifting
    5.3.5 A Recursive Architecture for Lifting
    5.3.6 A DSP-Type Architecture for Lifting
    5.3.7 A Generalized and Highly Programmable Architecture for Lifting
    5.3.8 A Generalized Two-Dimensional Architecture
  5.4 Summary
  References

6 JPEG2000 Standard
  6.1 Introduction
  6.2 Why JPEG2000?
  6.3 Parts of the JPEG2000 Standard
  6.4 Overview of the JPEG2000 Part 1 Encoding System
  6.5 Image Preprocessing
    6.5.1 Tiling
    6.5.2 DC Level Shifting
    6.5.3 Multicomponent Transformations
  6.6 Compression
    6.6.1 Discrete Wavelet Transformation
    6.6.2 Quantization
    6.6.3 Region of Interest Coding
    6.6.4 Rate Control
    6.6.5 Entropy Encoding
  6.7 Tier-2 Coding and Bitstream Formation
  6.8 Summary
  References

7 Coding Algorithms in JPEG2000
  7.1 Introduction
  7.2 Partitioning Data for Coding
  7.3 Tier-1 Coding in JPEG2000
    7.3.1 Fractional Bit-Plane Coding
    7.3.2 Examples of BPC Encoder
    7.3.3 Binary Arithmetic Coding: MQ-Coder
  7.4 Tier-2 Coding in JPEG2000
    7.4.1 Basic Tag Tree Coding
    7.4.2 Bitstream Formation
    7.4.3 Packet Header Information Coding
  7.5 Summary
  References

8 Code-Stream Organization and File Format
  8.1 Introduction
  8.2 Syntax and Code-Stream Rules
    8.2.1 Basic Rules
    8.2.2 Markers and Marker Segments Definitions
    8.2.3 Headers Definition
  8.3 File Format for JPEG2000 Part 1: JP2 Format
    8.3.1 File Format Organization
    8.3.2 JP2 Required Boxes
  8.4 Example
  8.5 Summary
  References

9 VLSI Architectures for JPEG2000
  9.1 Introduction
  9.2 A JPEG2000 Architecture for VLSI Implementation
  9.3 VLSI Architectures for EBCOT
    9.3.1 Combinational Logic Blocks
    9.3.2 Functionality of the Registers
    9.3.3 Control Mechanism for the EBCOT Architecture
  9.4 VLSI Architecture for Binary Arithmetic Coding: MQ-Coder
  9.5 Decoder Architecture for JPEG2000
  9.6 Summary of Other Architectures for JPEG2000
    9.6.1 Pass-Parallel Architecture for EBCOT
    9.6.2 Memory-Saving Architecture for EBCOT
    9.6.3 Computationally Efficient EBCOT Architecture by Skipping
  9.7 Summary
  References

10 Beyond Part 1 of JPEG2000 Standard
  10.1 Introduction
  10.2 Part 2: Extensions
    10.2.1 Variable DC Offset
    10.2.2 Variable Scalar Quantization Offsets
    10.2.3 Trellis-Coded Quantization
    10.2.4 Visual Masking
    10.2.5 Arbitrary Wavelet Decomposition
    10.2.6 Arbitrary Wavelet Transformation
    10.2.7 Single Sample Overlap Discrete Wavelet Transformation
    10.2.8 Multiple Component Transforms
    10.2.9 Nonlinear Transformations
    10.2.10 Region of Interest Extension
    10.2.11 File Format Extension and Metadata Definitions
  10.3 Part 3: Motion JPEG2000
  10.4 Part 4: Conformance Testing
  10.5 Part 5: Reference Software
  10.6 Part 6: Compound Image File Format
  10.7 Other Parts (7-12)
  10.8 Summary
  References

Index

About the Authors


Preface

The growing demand for interactive multimedia technologies, in various application domains in this era of wireless and Internet communication, has necessitated a number of desirable properties in image and video compression algorithms. Accordingly, current and future generation image compression algorithms should not only demonstrate state-of-the-art performance, they should also provide desirable functionalities such as progressive transmission in terms of image fidelity as well as resolution, scalability, region-of-interest coding, random access, error resilience, handling of large images of different types, etc. Many of these desired functionalities are not easily achievable by the current JPEG standard. The algorithms that implement the different modes of the current JPEG standard are independent of each other. The lossless compression algorithm in the current JPEG standard is completely different from the lossy compression mode, and also from the progressive and hierarchical modes. JPEG2000 is the new still image compression standard that has been developed under the auspices of the International Organization for Standardization (ISO). The systems architecture of this new standard has been defined in such a unified manner that it offers a single unified algorithmic framework and a single syntax definition of the code-stream organization, so that different modes of operation can be handled by the same algorithm, and the same syntax definition offers the aforementioned desirable functionalities. Moreover, the JPEG standard was defined in the 1980s, before the emergence of the Internet age. Many developments since then have changed the nature of research



and development in multimedia applications and the communications arena. The JPEG2000 standard takes these new developments into consideration.

The JPEG2000 algorithm has been developed based on the discrete wavelet transform (DWT) technique, as opposed to the discrete cosine transform (DCT) on which the current JPEG is based. The nature of the DWT helps to integrate both the lossless and lossy operations into the same algorithmic platform, and it also allows one to perform different kinds of progressive coding and decoding in the same algorithmic platform. Also, the bit-plane coding of the transformed coefficients and the underlying structure of the bitstream syntax are very well suited to achieving different progressive operations during both encoding and decoding.

In this book, we present the basic background in multimedia compression techniques and prepare the reader for a detailed understanding of the JPEG2000 standard. We present both the underlying theory and the principles behind the algorithms of the JPEG2000 standard for scalable image compression. We have presented some of the open issues that are not explicitly defined in the standard. We have shown how the results achieved in different areas of information technology can be applied to enhance the performance of the JPEG2000 standard for image compression. We also introduce the VLSI architectures and algorithms for implementation of the JPEG2000 standard in hardware. The VLSI implementation of JPEG2000 will be an important factor in the near future for a number of image processing applications and devices such as digital cameras, color fax machines, printers, scanners, etc. We also compile the latest publications and results in this book. Throughout the book we have provided sufficient examples for easy understanding by the readers.

This book consists of 10 chapters. The first two chapters provide an overview of the principles and theory of data and image compression with numerous examples. In Chapter 3, we review the current JPEG standard for still image compression, discuss the advantages and disadvantages of the current JPEG, and motivate the need for the new JPEG2000 standard for still image compression. We discuss the principles of discrete wavelet transformation and its implementation using both the convolution approach and the lifting approach in Chapter 4. In this chapter, we discuss the theory of multiresolution analysis and also the principles of lifting factorization for efficient implementation of the discrete wavelet transform. In Chapter 5, we discuss VLSI algorithms and architectures for implementation of the discrete wavelet transform and review different architectures for lifting-based implementation. In Chapters 6 to 8, we concentrate on descriptions of the JPEG2000 building blocks, details of the coding algorithms with examples, code-stream organization using the JPEG2000 syntax, and formation of the compressed file of the JPEG2000 standard. Chapter 9 is devoted to the VLSI architectures of the standard in great detail, which cannot be found in current books in the marketplace. In Chapter 9, we also summarize the latest results and developments in this area. Chapter 10 provides a discussion of the JPEG2000 extensions and the other parts of the standard as of the writing of this book. Every chapter includes sufficient references relevant to the discussion.


The book may be used either in a graduate-level course as a part of the subject of data compression, image compression, and multimedia processing, or as a reference book for professionals and researchers. This book is particularly useful for engineers and professionals in industry for easy understanding of the subject matter and as an aid in both software and hardware development of their products.

We would like to express our sincere thanks to many friends and colleagues who directly or indirectly helped us in different ways. We sincerely thank Dr. Val Moliere of John Wiley & Sons, Inc., for her assistance all through this project. We extend our gratitude to Professor Chaitali Chakrabarti of Arizona State University for her guidance and assistance in compiling some of the chapters in this book. We also thank Dr. Kishore Andra for supplying some useful materials to enrich this book. We thank Dr. Andrew J. Griffis for reviewing and making suggestions to better explain some of the materials. Mr. Roger Undhagen and many other friends deserve special thanks for their continuous encouragement and support toward the compilation of this treatise. We would also like to thank the anonymous reviewers of our book proposal for their very constructive review and suggestions.

Finally, we are indebted to each member of our families for their active support and understanding throughout this project. Especially, Mrs. Baishali Acharya and Mrs. Meiling Dang stood strongly behind us with their love and support, which helped us to attempt this journey, and were cooperative with our erratic schedules during the compilation of this book. We would also like to express our sincere appreciation to our children, who were always excited about this work and made us proud.

Tinku Acharya Ping-Sing Tsai


1 Introduction to Data Compression

1.1 INTRODUCTION

We have seen the revolution in computer and communication technologies in the twentieth century. The telecommunications industry has gone through sea changes from analog to digital networking that enabled today's very powerful Internet technology. The transition from the analog to the digital world offered many opportunities in every walk of life. Telecommunications, the Internet, digital entertainment, and computing in general are becoming part of our daily lives. Today we are talking about digital networks, digital representation of images, movies, video, TV, voice, and digital libraries, all because digital representation of the signal is more robust than the analog counterpart for processing, manipulation, storage, recovery, and transmission over long distances, even across the globe through communication networks. In recent years, there have been significant advancements in the processing of still image, video, graphics, speech, and audio signals through digital computers in order to accomplish different application challenges. As a result, multimedia information comprising image, video, audio, speech, text, and other data types has the potential to become just another data type. Telecommunication is no longer a platform for peer-to-peer voice communication between two people. Demand for communication of multimedia data through the telecommunications network and for accessing the multimedia data through the Internet is growing explosively. In order to handle this pervasive multimedia data usage, it is essential that the data representation and encoding of multimedia data be standard across different platforms and applications. Still image and video


data comprise a significant portion of the multimedia data and they occupy the lion’s share of the communication bandwidth for multimedia communication. As a result, development of efficient image compression techniques continues to be an important challenge to us, both in academia and in industry.

1.2 WHY COMPRESSION?

Despite the many advantages of digital representation of signals compared to their analog counterparts, they need a very large number of bits for storage and transmission. For example, a high-quality audio signal requires approximately 1.5 megabits per second for digital representation and storage. A television-quality low-resolution color video of 30 frames per second, with each frame containing 640 × 480 pixels (24 bits per color pixel), needs more than 210 megabits per second of storage. As a result, a digitized one-hour color movie would require approximately 95 gigabytes of storage. The storage requirement for upcoming high-definition television (HDTV) of resolution 1280 × 720 at 60 frames per second is far greater. A digitized one-hour color movie of HDTV-quality video will require approximately 560 gigabytes of storage. A digitized 14 × 17 square inch radiograph scanned at 70 μm occupies nearly 45 megabytes of storage. Transmission of these digital signals through limited-bandwidth communication channels is an even greater challenge, and sometimes impossible in raw form. Although the cost of storage has decreased drastically over the past decade due to significant advancements in microelectronics and storage technology, the requirements of data storage and data processing applications are growing explosively to outpace this achievement.
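As a quick check of these figures, the arithmetic can be reproduced in a few lines of code. The following is a minimal sketch, using the frame sizes and rates quoted above and counting a gigabyte as 2^30 bytes; the small differences from the quoted figures come from rounding.

```python
# Back-of-the-envelope storage estimates for raw (uncompressed) video,
# using the frame sizes and rates quoted above.

def raw_bitrate_mbps(width, height, bits_per_pixel, fps):
    """Raw bit rate in megabits per second (1 Mbit = 10**6 bits)."""
    return width * height * bits_per_pixel * fps / 1e6

def one_hour_gigabytes(mbps):
    """Storage for one hour at this rate, in binary gigabytes (2**30 bytes)."""
    return mbps * 1e6 * 3600 / 8 / 2**30

sdtv = raw_bitrate_mbps(640, 480, 24, 30)    # ~221 Mbits/s
hdtv = raw_bitrate_mbps(1280, 720, 24, 60)   # ~1327 Mbits/s

print(f"SDTV: {sdtv:.0f} Mbits/s, {one_hour_gigabytes(sdtv):.0f} GB/hour")  # ~93 GB
print(f"HDTV: {hdtv:.0f} Mbits/s, {one_hour_gigabytes(hdtv):.0f} GB/hour")  # ~556 GB
```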

Interestingly enough, most of the sensory signals such as still image, video, and voice generally contain significant amounts of superfluous and redundant information in their canonical representation as far as the human perceptual system is concerned. By the human perceptual system, we mean our eyes and ears. For example, the neighboring pixels in the smooth region of a natural image are very similar, and small variations in the values of the neighboring pixels are not noticeable to the human eye. The consecutive frames in a stationary or slowly changing scene in a video are very similar and redundant. Some audio data beyond the human audible frequency range are useless for all practical purposes. This fact tells us that there are data in audio-visual signals that cannot be perceived by the human perceptual system. We call this perceptual redundancy. In English text files, common words (e.g., "the") or similar patterns of character strings (e.g., "ze", "th") are usually used repeatedly. It is also observed that the characters in a text file occur in a well-documented distribution, with the letter e and "space" being the most popular. In numeric data files, we often observe runs of similar numbers or predictable interdependency among the numbers. We have mentioned only a few examples here. There are many such examples of redundancy in digital representation in all sorts of data.


Data compression is the technique of reducing the redundancies in data representation in order to decrease data storage requirements and hence communication costs. Reducing the storage requirement is equivalent to increasing the capacity of the storage medium and hence the communication bandwidth. Thus the development of efficient compression techniques will continue to be a design challenge for future communication systems and advanced multimedia applications.

1.2.1 Advantages of Data Compression

The main advantage of compression is that it reduces data storage requirements. It also offers an attractive approach to reducing the communication cost of transmitting high volumes of data over long-haul links via higher effective utilization of the available bandwidth in the data links. This significantly aids in reducing the cost of communication due to the data rate reduction. Because of the data rate reduction, data compression also increases the quality of multimedia presentation through limited-bandwidth communication channels. Hence the audience can experience rich-quality signals for audio-visual data representation. For example, because of the sophisticated compression technologies, we can receive toll-quality audio at the other side of the globe through the good old telecommunications channels at a much better price compared to a decade ago. Because of the significant progress in image compression techniques, a single 6 MHz broadcast television channel can carry HDTV signals to provide better quality audio and video at much higher rates and enhanced resolution without additional bandwidth requirements. Because of the reduced data rate offered by compression techniques, computer network and Internet usage is becoming more and more image and graphic friendly, rather than being just a data- and text-centric phenomenon. In short, high-performance compression has created new opportunities for creative applications such as digital libraries, digital archiving, video teleconferencing, telemedicine, and digital entertainment, to name a few.

There are many other secondary advantages of data compression. For example, it has great implications for database access. Data compression may enhance the database performance because more compressed records can be packed in a given buffer space in a traditional computer implementation. This potentially increases the probability that a record being searched for will be found in main memory. Data security can also be greatly enhanced by encrypting the decoding parameters and transmitting them separately from the compressed database files to restrict access to proprietary information. An extra level of security can be achieved by making the compression and decompression processes totally transparent to unauthorized users.

The rate of input-output operations in a computing device can be greatly increased due to the shorter representation of data. In systems with levels of storage hierarchy, data compression in principle makes it possible to store data at a higher and faster storage level (usually with smaller capacity), thereby


reducing the load on the input-output channels. Data compression obviously reduces the cost of backup and recovery of data in computer systems by storing the backup of large database files in compressed form.

The advantages of data compression will enable more multimedia applications with reduced cost and hence aid their usage by a larger population with newer applications in the near future.

1.2.2 Disadvantages of Data Compression

Although data compression offers numerous advantages and is the most sought-after technology in most data application areas, it has some disadvantages too, depending on the application area and the sensitivity of the data. For example, the extra overhead incurred by the encoding and decoding processes is one of the most serious drawbacks of data compression, which discourages its use in some areas (e.g., in many large database applications). This extra overhead is usually required in order to uniquely identify or interpret the compressed data. For example, the encoding/decoding tree in a Huffman coding [7] type compression scheme is stored in the output file in addition to the encoded bitstream. These overheads run counter to the essence of data compression, that of reducing storage requirements. In large statistical or scientific databases where changes in the database are not very frequent, the decoding process has a greater impact on the performance of the system than the encoding process. Even if we want to access and manipulate a single record in a large database, it may be necessary to decompress the whole database before we can access the desired record. After access and possibly modification of the data, the database is compressed again for storage. The delay incurred by these compression and decompression processes could be prohibitive for many real-time interactive database access requirements, unless extra care and complexity are added to the data arrangement in the database.

Data compression generally reduces the reliability of data records. For example, a single bit error in a compressed code will cause the decoder to misinterpret all subsequent bits, producing incorrect data. Transmission of very sensitive compressed data (e.g., medical information) through a noisy communication channel (such as wireless media) is risky because the burst errors introduced by the noisy channel can destroy the transmitted data. Another problem with data compression is the disruption of data properties, since the compressed data is different from the original data. For example, sorting and searching schemes on the compressed data may be inapplicable, as the lexical ordering of the original data is no longer preserved in the compressed data.

In many hardware and systems implementations, the extra complexity added by data compression can increase the system’s cost and reduce the system’s efficiency, especially in the areas of applications that require very low-power VLSI implementation.


1.3 INFORMATION THEORY CONCEPTS

The Mathematical Theory of Communication, which we also call Information Theory here, pioneered by Claude E. Shannon in 1948 [1, 2, 3, 4], is considered to be the theoretical foundation of data compression research. Since then many data compression techniques have been proposed and applied in practice.

Representation of data is a combination of information and redundancy [1]. Information is the portion of data that must be preserved permanently in its original form in order to correctly interpret the meaning or purpose of the data. Redundancy, on the other hand, is that portion of data that can be removed when it is not needed and can be reinserted to interpret the data when needed. Most often, the redundancy is reinserted in order to regenerate the original data in its original form. Data compression is essentially a redundancy reduction technique. The redundancy in data representation is reduced in such a way that it can be subsequently reinserted to recover the original data; this process is called decompression of the data. In the literature, data compression is sometimes referred to as coding, and similarly decompression is referred to as decoding.

Usually the development of a data compression scheme can be broadly divided into two phases: modeling and coding. In the modeling phase, information about the redundancy that exists in the data is extracted and described in a model. Once we have the description of the model, we can determine how the actual data differs from the model and encode the difference in the coding phase. Obviously, a data compression algorithm becomes more effective if the model is closer to the characteristics of the data-generating process, which we often call the source. The model can be obtained by empirical observation of the statistics of the data generated by the process or the source. In an empirical sense, any information-generating process can be described as a source that emits a sequence of symbols chosen from a finite alphabet. An alphabet is the set of all possible symbols generated by the source. For example, we can think of this text as being generated by a source with an alphabet containing all the ASCII characters.

1.3.1 Discrete Memoryless Model and Entropy

If the symbols produced by the information source are statistically independent of each other, the source is called a discrete memoryless source. A discrete memoryless source is described by its source alphabet $A = \{a_1, a_2, \ldots, a_N\}$ and the associated probabilities of occurrence $P = \{p(a_1), p(a_2), \ldots, p(a_N)\}$ of the symbols $a_1, a_2, \ldots, a_N$ in the alphabet $A$.

The definition of the discrete memoryless source model provides us with a very powerful concept of quantification of the average information content per symbol of the source, or entropy of the data. The concept of "entropy" was first used by physicists as a thermodynamic parameter to measure the degree of


“disorder” or “chaos” in a thermodynamic or molecular system. In a statistical sense, we can view this as a measure of degree of “surprise” or “uncertainty.” In an intuitive sense, it is reasonable to assume that the appearance of a less probable event (symbol) gives us more surprise, and hence we expect that it might carry more information. On the contrary, the more probable event (symbol) will carry less information because it was more expected.

With the above intuitive explanation, we can comprehend Shannon's definition of the relation between the source symbol probabilities and the corresponding codes. The amount of information content, $I(a_i)$, for a source symbol $a_i$, in terms of its associated probability of occurrence $p(a_i)$, is

$$I(a_i) = -\log_2 p(a_i)$$

The base 2 in the logarithm indicates that the information is expressed in binary form, or bits. In terms of binary representation of the codes, a symbol $a_i$ that is expected to occur with probability $p(a_i)$ is best represented in approximately $-\log_2 p(a_i)$ bits. As a result, a symbol with a higher probability of occurrence in a message is coded using fewer bits.

If we average the amount of information content over all the possible symbols of the discrete memoryless source, we find the average amount of information content per source symbol from the discrete memoryless source. This is expressed as

$$E = \sum_{i=1}^{N} p(a_i) I(a_i) = -\sum_{i=1}^{N} p(a_i) \log_2 p(a_i)$$

This is popularly known as entropy in information theory. Hence entropy is the expected length of a binary code over all possible symbols in a discrete memoryless source.

The concept of entropy is very powerful. In "stationary" systems, where the probabilities of occurrence of the source symbols are fixed, it provides a bound on the compression that can be achieved. This is a very convenient measure of the performance of a coding system. Without any knowledge of the physical source of the data, it is not possible to know the entropy exactly; instead, the entropy is estimated by observing the structure of the data emitted by the source. Hence estimation of the entropy depends on observation and on assumptions about the structure of the source data sequence. These assumptions are called the model of the sequence.

1.3.2 Noiseless Source Coding Theorem

The Noiseless Source Coding Theorem by Shannon [1] establishes the minimum average codeword length per source symbol that can be achieved, which in turn provides the upper bound on the compression achievable losslessly. The Noiseless Source Coding Theorem is also known as Shannon's first


theorem. This is one of the major source coding results in information theory [1, 2, 3].

If the data generated from a discrete memoryless source $A$ are grouped together in blocks of $n$ symbols to form an $n$-extended source, then the new source $A^n$ has $N^n$ possible symbols $\{\alpha_i\}$, with probabilities $P(\alpha_i) = p(a_{i_1}) p(a_{i_2}) \cdots p(a_{i_n})$, $i = 1, 2, \ldots, N^n$. By deriving the entropy of the new $n$-extended source, it can be proven that $E(A^n) = nE(A)$, where $E(A)$ is the entropy of the original source $A$. Let us now consider encoding blocks of $n$ source symbols at a time into binary codewords. For any $\epsilon > 0$, it is possible to construct a codeword for the block in such a way that the average number of bits per original source symbol, $L$, satisfies

$$E(A) \le L < E(A) + \epsilon$$

The left-hand inequality must be satisfied for any uniquely decodable code for the block of n source symbols.

The Noiseless Source Coding Theorem states that any source can be losslessly encoded with a code whose average number of bits per source symbol is arbitrarily close to, but not less than, the source entropy $E$ in bits, by coding infinitely long extensions of the source. Hence, the Noiseless Source Coding Theorem provides us with an intuitive (statistical) yardstick to measure the information emerging from a source.
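The relation $E(A^n) = nE(A)$ is easy to check numerically. The sketch below (a simple illustration, using an arbitrary four-symbol source of our own choosing) enumerates the block symbols of the $n$-extended source, assigns each the product of its block's symbol probabilities, and compares the two entropies:

```python
import itertools
import math

def entropy(probs):
    """Entropy in bits/symbol of a discrete memoryless source."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# An arbitrary four-symbol discrete memoryless source.
P = [0.65, 0.20, 0.10, 0.05]

for n in (1, 2, 3):
    # Each symbol of the n-extended source is a block of n source symbols;
    # its probability is the product of the block's symbol probabilities.
    extended = [math.prod(block) for block in itertools.product(P, repeat=n)]
    print(n, round(entropy(extended), 4), round(n * entropy(P), 4))  # equal
```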

1.3.2.1 Example: We consider a discrete memoryless source with alphabet $A_1 = \{\alpha, \beta, \gamma, \delta\}$ and associated probabilities $p(\alpha) = 0.65$, $p(\beta) = 0.20$, $p(\gamma) = 0.10$, and $p(\delta) = 0.05$ respectively. The entropy of this source is $E = -(0.65\log_2 0.65 + 0.20\log_2 0.20 + 0.10\log_2 0.10 + 0.05\log_2 0.05)$, which is approximately 1.42 bits/symbol. As a result, a data sequence of length 2000 symbols can be represented using approximately 2820 bits.

Knowing something about the structure of the data sequence often helps to reduce the entropy estimate of the source. As an example, let us consider the numeric data sequence generated by a source with alphabet $A_2 = \{0, 1, 2, 3\}$:

D = 0 1 1 2 3 3 3 3 3 3 3 3 3 2 2 2 3 3 3 3

The probabilities of appearance of the symbols in alphabet $A_2$ are $p(0) = 0.05$, $p(1) = 0.10$, $p(2) = 0.20$, and $p(3) = 0.65$ respectively. Hence the estimated entropy of the sequence D is $E = 1.42$ bits per symbol. If we assume that correlation exists between two consecutive samples in this data sequence, we can reduce this correlation by simply subtracting each sample from the one that follows it to generate the residual values $r_i = s_i - s_{i-1}$ for each sample $s_i$. Based on this assumption of the model, the sequence of residuals of the original data sequence is

R = 0 1 0 1 1 0 0 0 0 0 0 0 0 -1 0 0 1 0 0 0

consisting of three symbols in a modified alphabet $A_2' = \{-1, 0, 1\}$. The probabilities of occurrence of the symbols in the new alphabet $A_2'$ are $p(-1) = 0.05$, $p(1) = 0.20$, and $p(0) = 0.75$ respectively, as computed from the number of occurrences in the residual sequence. The estimated entropy of the transformed sequence is $E = -(0.05\log_2 0.05 + 0.2\log_2 0.2 + 0.75\log_2 0.75) = 0.992$ (i.e., 0.992 bits/symbol).

The above is a simple example to demonstrate that a data sequence can be represented with fewer bits if it is encoded with a suitable entropy encoding technique, hence resulting in data compression.
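Both entropy estimates in this example can be reproduced mechanically. The following is a minimal sketch that computes the frequency-based entropy of the sequence D above and of its residual sequence:

```python
import math
from collections import Counter

def estimated_entropy(seq):
    """Entropy estimate in bits/symbol from observed symbol frequencies."""
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in Counter(seq).values())

D = [0, 1, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 3, 3, 3, 3]

# Residual transform: keep the first sample, then r_i = s_i - s_(i-1).
R = [D[0]] + [b - a for a, b in zip(D, D[1:])]

print(round(estimated_entropy(D), 3))  # ~1.417 bits/symbol (the 1.42 above)
print(round(estimated_entropy(R), 3))  # ~0.992 bits/symbol
```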

1.3.3 Unique Decipherability

Digital representation of data in binary code form allows us to store it in computer memories and to transmit it through communication networks. In terms of the length of the binary codes, they can be fixed-length, as shown in column A of Table 1.1 with alphabet $\{\alpha, \beta, \gamma, \delta\}$, as an example, where all the symbols have been coded using the same number of bits. The binary codes could also be variable-length codes, as shown in columns B and C of Table 1.1, in which the symbols have different code lengths.

Table 1.1 Examples of Variable-Length Codes

Symbol   A    B     C
α        00   0     0
β        01   10    1
γ        10   110   00
δ        11   111   01

Consider the string S = ααγαβαδ. The binary construction of the string S using the codes A, B, and C is as follows:

C_A(S) = 00001000010011
C_B(S) = 001100100111
C_C(S) = 000001001

Given the binary code C_A(S) = 00001000010011, it is easy to recognize or uniquely decode the string S = ααγαβαδ because we can divide the binary string into nonoverlapping blocks of 2 bits each, and we know that two consecutive bits form a symbol as shown in column A. Hence the first two bits "00" form the binary code for the symbol α, the next two bits "00" similarly map to the symbol α, the following two bits "10" map to the symbol γ, and so on. We can also uniquely decipher or decode the binary code C_B(S) = 001100100111 because the first bit (0) represents the symbol α; similarly, the next bit (0) also represents the symbol α according to the code in column B. The following three consecutive bits "110" uniquely represent the symbol γ. Following this procedure, we can uniquely reconstruct the string S = ααγαβαδ without any ambiguity.


But deciphering the binary code C_C(S) = 000001001 is ambiguous because it has many possible interpretations: αγγβγβ, αγαδγβ, or αααααβγβ, to name a few. Hence the code C_C(S) = 000001001 is not uniquely decipherable using the code in column C of Table 1.1.

It is obvious that fixed-length codes are always uniquely decipherable. But not all variable-length codes are uniquely decipherable. Uniquely decipherable codes maintain a particular property called the prefix property: no codeword in the code-set forms the prefix of another distinct codeword [5]. A codeword $C = c_0 c_1 \cdots c_{k-1}$ of length $k$ is said to be a prefix of another codeword $D = d_0 d_1 \cdots d_{m-1}$ of length $m$ if $c_i = d_i$ for all $i = 0, 1, \ldots, k-1$ and $k \le m$.

Note that none of the codes in column A or in column B is a prefix of any other code in the corresponding column, so the codes formed using either column A or column B are uniquely decipherable. On the other hand, the binary code of α in column C is a prefix of both the binary codes of γ and δ.
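The prefix property can be tested mechanically, and any prefix code can be decoded in a single left-to-right scan, which is what makes it uniquely decipherable in practice. Below is a small sketch using the three codes of Table 1.1 (the function names are ours, purely illustrative):

```python
def is_prefix_free(codewords):
    """True if no codeword is a prefix of another distinct codeword."""
    return not any(c != d and d.startswith(c)
                   for c in codewords for d in codewords)

def decode(bits, code):
    """Decode a bit string with a prefix code by scanning left to right."""
    inverse = {w: s for s, w in code.items()}
    out, word = [], ""
    for b in bits:
        word += b
        if word in inverse:          # a codeword boundary has been reached
            out.append(inverse[word])
            word = ""
    assert word == "", "bit string did not end on a codeword boundary"
    return "".join(out)

A = {"α": "00", "β": "01", "γ": "10", "δ": "11"}
B = {"α": "0", "β": "10", "γ": "110", "δ": "111"}
C = {"α": "0", "β": "1", "γ": "00", "δ": "01"}

print(is_prefix_free(A.values()), is_prefix_free(B.values()),
      is_prefix_free(C.values()))          # True True False
print(decode("001100100111", B))           # ααγαβαδ
```

Note that decode() presumes a prefix-free code; applied to code C it would greedily pick the first matching codeword and could return a parse different from the one that was encoded, which is exactly the ambiguity discussed above.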

Some of the popular variable-length coding techniques are Shannon-Fano Coding [6], Huffman Coding [7], Elias Coding [8], Arithmetic Coding [9], etc. It should be noted that fixed-length codes can be treated as a special case of uniquely decipherable variable-length codes.

1.4 CLASSIFICATION OF COMPRESSION ALGORITHMS

In an abstract sense, we can describe data compression as a method that takes an input data D and generates a shorter representation of the data c(D) with fewer bits compared to that of D. The reverse process is called decompression; it takes the compressed data c(D) and generates or reconstructs the data D' as shown in Figure 1.1. Sometimes the compression (coding) and decompression (decoding) systems together are called a "CODEC," as shown in the broken box in Figure 1.1.

Fig. 1.1 CODEC: the compression (coding) and decompression (decoding) system.


The reconstructed data D' could be identical to the original data D, or it could be an approximation of the original data D, depending on the reconstruction requirements. If the reconstructed data D' is an exact replica of the original data D, we call the algorithms applied to compress D and decompress c(D) lossless. On the other hand, we say the algorithms are lossy when D' is not an exact replica of D. Hence, as far as the reversibility of the original data is concerned, data compression algorithms can be broadly classified into two categories: lossless and lossy. Usually we need to apply lossless data compression techniques to text data or scientific data. For example, we cannot afford to compress the electronic copy of this textbook using a lossy compression technique. It is expected that we shall reconstruct the same text after the decompression process. A small error in the reconstructed text can have a completely different meaning. We do not expect the sentence "You should not delete this file" in a text to change to "You should now delete this file" as a result of an error introduced by a lossy compression or decompression algorithm. Similarly, if we compress a huge ASCII file containing a program written in the C language, for example, we expect to get back the same C code after decompression, for obvious reasons. The lossy compression techniques are usually applicable to data where high fidelity of the reconstructed data is not required for perception by the human perceptual system. Examples of such types of data are image, video, graphics, speech, audio, etc. Some image compression applications may require the compression scheme to be lossless (i.e., each pixel of the decompressed image should be exactly identical to the original one). Medical imaging is an example of such an application, where compressing digital radiographs with a lossy scheme could be a disaster if it has to make any compromises with the diagnostic accuracy. Similar observations are true for astronomical images of galaxies and stars.

Sometimes we talk about perceptually lossless compression schemes, where we can compromise by introducing some amount of loss into the reconstructed image as long as there is no perceptual difference between the reconstructed data and the original data, the human perceptual system being the ultimate judge of the fidelity of the reconstructed data. For example, it is hardly noticeable to human eyes if there is a small relative change among the neighboring pixel values in a smooth non-edge region of a natural image.

In this context, we need to mention that data compression is sometimes referred to as coding in the literature. The terms noiseless and noisy coding, in the literature, usually refer to lossless and lossy compression techniques respectively. The term "noise" here is the "error of reconstruction" in the lossy compression techniques, because the reconstructed data item is not identical to the original one. Throughout this book we shall use lossless and lossy compression in place of noiseless and noisy coding respectively.

Data compression schemes could be static or dynamic. In static methods, the mapping from a set of messages (data or signal) to the corresponding set of compressed codes is always fixed. In dynamic methods, the mapping from the set of messages to the set of compressed codes changes over time. A


dynamic method is called adaptive if the codes adapt to changes in ensemble characteristics over time. For example, if the probabilities of occurrences of the symbols from the source are not fixed over time, we can adaptively formulate the binary codewords of the symbols, so that the compressed file size can adaptively change for better compression efficiency.
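As a concrete illustration of an adaptive method, the following sketch maintains running symbol counts (a simple frequency-counting model of our own devising, not a scheme from any standard) and shows how the ideal code length of a symbol shrinks as the symbol becomes more frequent:

```python
import math
from collections import Counter

class AdaptiveModel:
    """Adaptive source model: probabilities follow the observed counts,
    so the ideal code length -log2 p(symbol) changes over time."""

    def __init__(self, alphabet):
        # Start with a count of 1 per symbol so no probability is zero.
        self.counts = Counter({s: 1 for s in alphabet})
        self.total = len(alphabet)

    def code_length(self, symbol):
        return -math.log2(self.counts[symbol] / self.total)

    def update(self, symbol):
        self.counts[symbol] += 1
        self.total += 1

model = AdaptiveModel("ab")
for s in "aaaaab":
    print(s, round(model.code_length(s), 2))  # 'a' gets cheaper as it recurs
    model.update(s)
```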

1.5 A DATA COMPRESSION MODEL

A model of a typical data compression system can be described using the block diagram shown in Figure 1.2. A data compression system mainly consists of three major steps: removal or reduction of data redundancy, reduction of entropy, and entropy encoding.

Fig. 1.2 A data compression model: the input data passes through reduction of data redundancy, then reduction of entropy, then entropy encoding, producing the compressed data.

The redundancy in data may appear in different forms. For example, the neighboring pixels in a typical image are very much spatially correlated with each other. By correlation we mean that the pixel values are very similar in the non-edge smooth regions [10] of the image. In the case of moving pictures, consecutive frames can be almost identical, with or without minor displacement, if the motion is slow. The composition of words or sentences in natural text follows some context model based on the grammar being used. Similarly, the records in a typical numeric database may have some sort of relationship among the atomic entities that comprise each record in the database. There are rhythms and pauses at regular intervals in any


natural audio or speech data. These redundancies in data representation can be reduced in order to achieve potential compression.

Removal or reduction of data redundancy is typically achieved by transforming the original data from one form or representation to another. The popular techniques used in the redundancy reduction step are prediction of the data samples using some model, transformation of the original data from the spatial domain to the frequency domain such as the Discrete Cosine Transform (DCT), decomposition of the original data set into different subbands such as the Discrete Wavelet Transformation (DWT), etc. In principle, this step potentially yields a more compact representation of the information in the original data set, in terms of fewer coefficients or equivalent. In the case of lossless data compression, this step is completely reversible. Transformation of data usually reduces the entropy of the original data by removing the redundancies that appear in the known structure of the data sequence.

The next major step in a lossy data compression system is to further reduce the entropy of the transformed data significantly in order to allocate fewer bits for transmission or storage. The reduction in entropy is achieved by dropping nonsignificant information in the transformed data based on the application criteria. This is a nonreversible process because it is not possible to exactly recover the lost data or information using the inverse process. This step is applied in lossy data compression schemes, and it is usually accomplished by some version of a quantization technique. The nature and amount of quantization dictate the quality of the reconstructed data. The quantized coefficients are then losslessly encoded using some entropy encoding scheme to compactly represent the quantized data for storage or transmission. Since the entropy of the quantized data is less compared to the original one, it can be represented by fewer bits compared to the original data set, and hence we achieve compression.

The decompression system is just the inverse process. The compressed code is first decoded to generate the quantized coefficients. The inverse quantization step is applied to these quantized coefficients to generate an approximation of the transformed coefficients, which are then inverse transformed in order to create an approximate version of the original data. If the quantization and inverse quantization steps are absent from the codec and the transformation step for redundancy removal is reversible, the decompression system produces an exact replica of the original data, and hence the compression system can be called a lossless compression system.
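To make the three steps of Figure 1.2 concrete, here is a toy illustration of the model (a hypothetical sketch of our own, not an algorithm from the JPEG2000 standard): prediction residuals for redundancy reduction, uniform quantization for entropy reduction, and a frequency-based entropy estimate standing in for the final entropy encoder.

```python
import math
from collections import Counter

def entropy(seq):
    """Bits/symbol an ideal entropy encoder would need for this sequence."""
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in Counter(seq).values())

def reduce_redundancy(samples):
    """Step 1: predict each sample by its predecessor and keep the
    prediction residuals. This step is fully reversible."""
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

def reduce_entropy(residuals, step):
    """Step 2: uniform quantization. This step is NOT reversible;
    it is what makes the overall scheme lossy."""
    return [round(r / step) for r in residuals]

signal = [10, 11, 11, 13, 14, 14, 15, 15, 16, 18, 18, 19, 20, 20, 21, 21]

residuals = reduce_redundancy(signal)
indices = reduce_entropy(residuals, step=2)

# Step 3 (entropy encoding) would now code `indices` compactly; the
# estimates below decrease at each stage, which is where the gain comes from.
print(round(entropy(signal), 2), round(entropy(residuals), 2),
      round(entropy(indices), 2))
```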

1.6 COMPRESSION PERFORMANCE

Like any other system, the metrics of performance of a data compression algorithm are important criteria for the selection of the algorithm. The performance measures of data compression algorithms can be looked at from different perspectives depending on the application requirements: amount of compression