
Chapter 6 LOSSY COMPRESSION METHODS


Trade-offs

• A design trade-off is necessary when there is a contradiction in requirements.

• Examples:

– Cost versus quality

– Cost versus performance

– Usability for occasional or inexperienced users versus power or flexibility for expert users

– Size or download time versus resolution of an image

• Decisions are made using criteria based on sufficiency for purpose.

What is “lossy compression”

In lossy compression, as opposed to lossless compression, data is compressed and then decompressed so that the retrieved data, although different from the original, is still "close enough" to be useful in some way.

Depending on the design of the format, lossy data compression often suffers from generation loss: compressing and decompressing multiple times does more damage to the data than doing it once (a small measurement sketch follows).
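One rough way to observe generation loss is to re-encode the same image as JPEG repeatedly and measure how far it drifts from the original. This is a minimal sketch assuming the Pillow library is installed; "photo.png" is a placeholder for any test image.

```python
# Sketch: measuring generation loss by repeatedly re-encoding an image as JPEG.
# Assumes Pillow is installed; "photo.png" is a placeholder input file.
from PIL import Image

img = Image.open("photo.png").convert("RGB")
original = list(img.getdata())

for generation in range(1, 11):
    img.save("temp.jpg", quality=75)             # lossy encode
    img = Image.open("temp.jpg").convert("RGB")  # decode again
    decoded = list(img.getdata())
    # mean absolute error per channel, measured against the very first image
    diff = sum(abs(a - b) for p, q in zip(original, decoded) for a, b in zip(p, q))
    print(f"generation {generation}: mean abs error = {diff / (len(original) * 3):.2f}")
```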

Lossy compression methods

• Loss of information is acceptable in a picture or video, because our eyes and ears cannot distinguish subtle changes.

• Loss of information is not acceptable in a text file or a program file.

• Examples:

– Joint Photographic Experts Group (JPEG)

– Moving Picture Experts Group (MPEG)

Lossy compression methods

There are two basic lossy compression schemes:

1- Transform

• In lossy transform codecs, samples of picture or sound are taken, chopped into small segments, transformed into a new basis space, and quantized (see the sketch after this list).

• The resulting quantized values are then given variable-length codes, depending on their frequency of occurrence (entropy coding).
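The following is a minimal sketch of that idea on a one-dimensional signal. The 8-sample segment, the unnormalized type-II DCT, and the quantization step of 10 are all illustrative choices, not taken from any particular standard.

```python
import math

def dct_1d(segment):
    """Unnormalized type-II DCT of a short 1D segment (for illustration only)."""
    N = len(segment)
    return [sum(x * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for n, x in enumerate(segment))
            for k in range(N)]

segment = [100, 102, 101, 99, 98, 100, 103, 101]  # one small "chopped" segment
coefficients = dct_1d(segment)                    # transform into a new basis
step = 10                                         # illustrative quantization step
quantized = [round(c / step) for c in coefficients]
print(quantized)  # the high-frequency coefficients mostly quantize to 0
```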

Lossy compression methods

The second type of scheme is:

2- Predictive

• In lossy predictive codecs, previous and/or subsequent decoded data is used to predict the current sound sample or image frame (see the sketch after this list).

• The error between the predicted data and the real data, together with any extra information needed to reproduce the prediction, is then quantized and coded.

• In some systems, transform and predictive techniques are combined, with transform codecs being used to compress the error signals generated by the predictive stage.
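A minimal DPCM-style sketch of predictive coding on a stream of sound samples, using the previous decoded sample as the prediction; the sample values and the quantization step of 4 are illustrative.

```python
# Sketch: lossy predictive coding (DPCM) of a 1D sample stream.
samples = [100, 104, 109, 111, 110, 108, 105, 101]
step = 4          # quantization step for the prediction error
prediction = 0    # initial predictor (the decoder starts with the same value)
encoded = []
for s in samples:
    error = s - prediction               # prediction error
    q = round(error / step)              # quantize the error (the lossy step)
    encoded.append(q)                    # only q would be entropy-coded and sent
    prediction = prediction + q * step   # reconstruct exactly as the decoder will

print(encoded)  # mostly small integers, cheaper to code than the raw samples
```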

How it works: human perception perspective

JPEG exploits the characteristics of human vision, eliminating or reducing data to which the eye is less sensitive.

JPEG works well on grayscale and color images, especially on photographs, but it is not intended for two-tone images.

Original Lena image (12 KB); compressed Lena image (85% less information, 1.8 KB); highly compressed Lena image (96% less information, 0.56 KB).

JPEG (Joint Photographic Experts Group)

JPEG (pronounced jay-peg) is the most commonly used standard method of lossy compression for photographic images.

JPEG itself specifies only how an image is transformed into a stream of bytes, not how those bytes are encapsulated in any particular storage medium.

A further standard, created by the Independent JPEG Group and called JFIF (JPEG File Interchange Format), specifies how to produce a file suitable for computer storage and transmission from a JPEG stream.

In common usage, when one speaks of a "JPEG file" one generally means a JFIF file, or sometimes an Exif JPEG file.

JPEG/JFIF is the format most used for storing and transmitting photographs on the web. It is not as well suited for line drawings and other textual or iconic graphics, because its compression method performs badly on these types of images.

JPEG grayscale example, 640 x 480 pixels

Image compression: JPEG

JPEG process

• DCT: discrete cosine transform

• Quantization

• Compression

Discrete cosine transform

• T(0, 0): the DC value (direct-current value)

• T(m, n), m or n ≠ 0: the AC values (they represent changes in the pixel values)

DCT discussion

• The DCT transformation creates table T from table P.

• The DC value gives the average value of the pixels.

• The AC values give the changes.

• Lack of change in neighboring pixels creates 0s.

• The DCT transformation is reversible.

Definition of DCT:

•Given an input function f(i, j) over two integer variables i and j (a piece of an image), the 2D DCT transforms it into a new function F(u, v), with integer u and v running over the same range as i and j. The general definition of the transform is:

• F(u, v) = (2 C(u) C(v) / √(MN)) · Σ_{i=0..M−1} Σ_{j=0..N−1} cos[(2i + 1)·uπ / 2M] · cos[(2j + 1)·vπ / 2N] · f(i, j)    (6.1)

• where i, u = 0, 1, . . . , M − 1; j, v = 0, 1, . . . , N − 1; and the constants C(u) and C(v) are determined by

• C(ξ) = √2 / 2 if ξ = 0, and C(ξ) = 1 otherwise.    (6.2)
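A direct, unoptimized transcription of Eq. (6.1) is sketched below; it is meant only for checking small blocks by hand (real encoders use fast DCT algorithms). Note that for an 8 x 8 block the DC term F(0, 0) equals 8 times the mean of the block values, consistent with the "DC value gives the average" remark above.

```python
import math

def dct_2d(f):
    """2D DCT of an M x N block f (list of lists), computed directly from Eq. (6.1)."""
    M, N = len(f), len(f[0])
    def C(x):
        return math.sqrt(2) / 2 if x == 0 else 1.0
    F = [[0.0] * N for _ in range(M)]
    for u in range(M):
        for v in range(N):
            s = 0.0
            for i in range(M):
                for j in range(N):
                    s += (math.cos((2 * i + 1) * u * math.pi / (2 * M)) *
                          math.cos((2 * j + 1) * v * math.pi / (2 * N)) *
                          f[i][j])
            F[u][v] = 2 * C(u) * C(v) / math.sqrt(M * N) * s
    return F

# For a flat 8x8 block of value 10, the DC coefficient is 8 * 10 = 80.
print(dct_2d([[10] * 8 for _ in range(8)])[0][0])   # -> 80.0 (up to rounding)
```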

Quantization

• After the T table is created, the values are quantized to reduce the number of bits needed for encoding.

• Quantization:

– Divide the number by a constant and then drop the fraction.

– The quantizing phase is not reversible.

– Some information will be lost.

Compression

• After quantization, the values are read from the table, and redundant 0s are removed.

• The reason is that if the picture does not have fine changes, the bottom right corner of the T table is all 0s.

Reading the table

Baseline JPEG compression

YCbCr colour space is based on the YUV colour space.

YUV signals are created from an original RGB (red, green and blue) source. The weighted values of R, G and B are added together to produce a single Y (luma) signal, representing the overall brightness, or luminance, of that spot. The U signal is then created by subtracting Y from the blue signal of the original RGB and then scaling; V is created by subtracting Y from the red signal and then scaling by a different factor. This can be accomplished easily with analog circuitry.

The following equations can be used to derive Y, U and V from R, G and B (see the sketch below):

Y = 0.299R + 0.587G + 0.114B

U = 0.492(B − Y) = −0.147R − 0.289G + 0.436B

V = 0.877(R − Y) = 0.615R − 0.515G − 0.100B

Y = luminance

Cb, Cr = chrominance
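A small sketch of those equations in code, with R, G and B taken as per-pixel values in the 0-255 range:

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel (0-255 per channel) to Y, U, V using the weights above."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance
    u = 0.492 * (b - y)                     # scaled blue difference
    v = 0.877 * (r - y)                     # scaled red difference
    return y, u, v

print(rgb_to_yuv(255, 0, 0))   # pure red: moderate Y, negative U, large positive V
```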

Discrete cosine transform

The DCT transforms the image from the spatial domain into the frequency domain.

Next, each component (Y, Cb, Cr) of the image is "tiled" into sections of eight by eight pixels each; then each tile is converted to frequency space using a two-dimensional forward discrete cosine transform (DCT, type II), as sketched below.
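A sketch of that tiling step, assuming NumPy and SciPy are available; scipy.fft.dctn with norm="ortho" computes the same orthonormal 2D DCT-II as Eq. (6.1).

```python
import numpy as np
from scipy.fft import dctn

def block_dct(channel):
    """Tile one image component (an H x W array with H, W multiples of 8) into
    8x8 blocks and return the 2D DCT-II coefficients of every block."""
    h, w = channel.shape
    out = np.empty((h, w), dtype=float)
    for y in range(0, h, 8):
        for x in range(0, w, 8):
            tile = channel[y:y + 8, x:x + 8].astype(float) - 128  # level shift, as in baseline JPEG
            out[y:y + 8, x:x + 8] = dctn(tile, norm="ortho")
    return out
```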

The 64 DCT basis functions

For example, the coefficient in the output DCT matrix at (2,1) corresponds to the strength of the correlation between the basis function at (2,1) and the entire 8x8 input image block.

The coefficients corresponding to high-frequency details are located to the right and bottom of the DCT block, and it is precisely these weights which we try to nullify -- the more zeroes in the 8x8 DCT block, the higher the compression that is achieved.

In the Quantization step below, we'll discuss how to maximize the number of zeroes in the matrix.

Quantization

This is the main lossy operation in the whole process. After the DCT has been performed on the 8x8 image block, the results are quantized in order to achieve large gains in compression ratio.

Quantization refers to the process of representing the actual coefficient values as one of a set of predetermined allowable values, so that the overall data can be encoded in fewer bits (because the allowable values are a small fraction of all possible values).

Example of a quantizing matrix

The aim is to greatly reduce the amount of information in the high-frequency components. This is done by simply dividing each component in the frequency domain by a constant for that component, and then rounding to the nearest integer (see the sketch below).

As a result of this, it is typically the case that many of the higher-frequency components are rounded to zero, and many of the rest become small positive or negative numbers.
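A sketch of that divide-and-round step. The 8x8 table here is purely illustrative (its entries simply grow toward the high-frequency corner); real encoders use perceptually tuned tables scaled by the quality setting.

```python
# Illustrative 8x8 quantization table: larger divisors toward the bottom-right
# (high-frequency) corner. Real JPEG tables are tuned to visual perception.
Q = [[8 + 4 * (u + v) for v in range(8)] for u in range(8)]

def quantize(F, Q):
    """Divide each DCT coefficient by its table entry and round to an integer."""
    return [[round(F[u][v] / Q[u][v]) for v in range(8)] for u in range(8)]
```

Dividing a small high-frequency coefficient by a large table entry and rounding yields 0, which is exactly the effect described above.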

Quantization

Quantization is the key irreversible step in the JPEG process.

JPEG Quality Settings

Typically the only thing that the user can control in JPEG compression is the quality setting (and, rarely, the chroma sub-sampling). The value chosen is used in the quantization stage, where perceptually less important values are discarded by using tables tuned to visual perception. This reduces the amount of information while preserving the perceived quality.

Zig-zag sorting

The quantized data needs to be in an efficient format for encoding.

The quantized coefficients have a greater chance of being zero as the horizontal and vertical frequency values increase.

To exploit this behavior, we can rearrange the coefficients into a one-dimensional array sorted from the DC value to the highest-order spatial frequency coefficient (see the sketch below).
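A sketch of that rearrangement: the zig-zag visiting order is generated by walking the anti-diagonals of the 8x8 block, alternating direction on each diagonal.

```python
def zigzag_indices(n=8):
    """Return the (row, col) visiting order of the JPEG zig-zag scan."""
    order = []
    for s in range(2 * n - 1):                  # s = row + col (anti-diagonal index)
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else reversed(diag))
    return order

def zigzag(block):
    """Flatten an 8x8 block into its 64-element zig-zag sequence."""
    return [block[i][j] for i, j in zigzag_indices(len(block))]
```

Applied to the normalized DCT table of the worked example later in this chapter, this ordering reproduces the coefficient sequence listed there before the EOB marker.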

The first element in each 64x1 array represents the DC coefficient from the DCT matrix, and the remaining 63 coefficients represent the AC components.

These two types of information are different enough to warrant separating them and applying different methods of coding to achieve optimal compression efficiency.

All of the DC coefficients (one from each DCT output block) must be grouped together in a separate list. At this point, the DC coefficients will be encoded as a group and each set of AC values will be encoded separately.

Coding the DC Coefficients

The DC components represent the intensity of each 8x8 pixel block. Because of this, significant correlation exists between adjacent blocks. So, while the DC coefficient of any given input array is fairly unpredictable by itself, real images usually do not vary widely in a localized area. As such, the previous DC value can be used to predict the current DC coefficient value. By using a differential prediction model (DPCM), we can increase the probability that the value we encode will be small, and thus reduce the number of bits in the compressed image.

To obtain the coding value we simply subtract the DC coefficient of the previously processed 8x8 pixel block from the DC coefficient of the current block (see the sketch below). This value is called the "DPCM difference".

Once this value is calculated, it is compared to a table to identify the symbol group to which it belongs (based on its magnitude), and it is then encoded appropriately using an entropy encoding scheme such as Huffman coding.
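A sketch of the DPCM differencing across blocks; the magnitude-category table and the Huffman coding that follow are omitted, and the DC values are illustrative.

```python
def dc_differences(dc_coefficients):
    """DPCM: encode each DC coefficient as its difference from the previous one.
    The first block is differenced against a predictor of 0."""
    previous = 0
    diffs = []
    for dc in dc_coefficients:
        diffs.append(dc - previous)
        previous = dc
    return diffs

print(dc_differences([-26, -24, -20, -21]))   # -> [-26, 2, 4, -1]: mostly small values
```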

Coding the AC Coefficients (Run-Length Coding)

Because the values of the AC coefficients tend towards zero after the quantization step, these coefficients are run-length encoded.

The concept of run-length encoding is a straightforward principle. In real image sequences, pixels of the same value could each be represented as individual bytes, but it does not make sense to send the same value over and over again. For example, we have seen that the quantized output of the DCT blocks produces many zero-value bytes. The zig-zag ordering helps produce these zeros in groups at the end of each sequence. Instead of coding each zero individually, we simply encode the number of zeros in a given "run" (see the sketch below).

This run-length coded information is then variable-length coded (VLC), usually using Huffman codes.
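A simplified sketch of the idea for one zig-zag ordered AC sequence. Real JPEG encodes (run, size) symbol pairs followed by the coefficient bits; this sketch just records (zero_run, value) pairs and a final "EOB" marker.

```python
def run_length_encode(ac):
    """Encode a zig-zag ordered AC sequence as (zero_run, value) pairs.
    Trailing zeros are replaced by a single end-of-block marker."""
    while ac and ac[-1] == 0:     # drop trailing zeros; EOB will stand for them
        ac = ac[:-1]
    pairs, run = [], 0
    for value in ac:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")
    return pairs

# AC part of the worked example later in this chapter (DC value omitted):
print(run_length_encode([-3, 1, -3, -2, -6, 2, -4, 1, -4, 1, 1, 5, 0, 2,
                         0, 0, -1, 2, 0, 0, 0, 0, 0, -1, -1] + [0] * 38))
```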

Entropy encoder

Fixed-length codes are most often applied in systems where each of the symbols occurs with equal probability.

Example of a fixed-length code

In reality, most symbols do not occur with equal probability. In these cases, we can take advantage of this fact and reduce the average number of bits used to compress the sequence.

This is a final lossless compression performed on the quantized DCT coefficients to increase the overall compression ratio achieved.

Example of entropy encoding with weighted symbol probabilities

Entropy encoding is a compression technique that uses a series of bit codes to represent a set of possible symbols. The length of the code that is used for each symbol can be varied based on the probability of the symbol's occurrence. By encoding the most common symbols with shorter bit sequences and the less frequently used symbols with longer bit sequences, we can easily improve on the average number of bits used to encode a sequence (see the sketch below).
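A compact sketch of building such variable-length codes with Huffman's algorithm, using only the standard-library heapq module; the symbol probabilities are illustrative.

```python
import heapq

def huffman_codes(probabilities):
    """Build prefix codes in which frequent symbols get shorter bit strings."""
    # Each heap entry: (probability, tie_breaker, {symbol: code_so_far}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)           # two least probable subtrees
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

probs = {"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}
codes = huffman_codes(probs)
average_bits = sum(probs[s] * len(codes[s]) for s in probs)
print(codes, average_bits)   # about 1.75 bits/symbol versus 2 for a fixed-length code
```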

JPEG File Interchange Format (JFIF)

The encoded data is written into the JPEG File Interchange Format (JFIF), which, as the name suggests, is a simplified format allowing JPEG-compressed images to be shared across multiple platforms and applications.

JFIF includes embedded image and coding parameters, framed by appropriate header information. Specifically, aside from the encoded data, a JFIF file must store all coding and quantization tables that are necessary for the JPEG decoder to do its job properly.

EXAMPLE: JPEG

• Joint Photographic Experts Group

• Defines three different coding systems

– Lossy baseline coding system

– Extended coding system for greater compression

– Lossless independent coding system for reversible applications

• Baseline performed in three sequential steps

– DCT (Discrete Cosine Transform) computation

– Quantization

– Variable-length code assignment

• To start, the image is divided into pixel blocks of size 8x8 (left to right, top to bottom)

JPEG (cont.)

• Gray levels are shifted to the center: –L/2 <= gl <= L/2.

• The DCT is run on the block, producing frequency components concentrated in the upper-left corner.

• The array is quantized; a selected number of coefficients is retained, depending on the “quality” desired.

• The quantized 2D block is converted into a 1D string by using a zig-zag pattern.

• These values are then encoded using a variable-length coding system.

• The binary codes are transmitted.

JPEG Coding Example

52 55 61 66 70 61 64 73

63 59 66 90 109 85 69 72

62 59 68 113 144 104 66 73

63 58 71 122 154 106 70 69

67 61 68 104 126 88 68 70

79 65 60 70 77 68 58 75

85 71 64 59 55 61 65 83

87 79 69 68 65 76 78 94

8x8 block of the original image

The first step is to divide the image into blocks, each having dimensions of 8x8. For the record, let us say that this 8x8 block contains the values shown above.

JPEG Coding Example

-76 -73 -67 -62 -58 -67 -64 -55

-65 -69 -62 -38 -19 -43 -59 -56

-66 -69 -60 -15 16 -24 -62 -55

-65 -70 -57 -6 26 -22 -58 -59

-61 -67 -60 -24 -2 -40 -60 -58

-49 -63 -68 -58 -51 -65 -70 -53

-43 -57 -64 -69 -73 -67 -63 -45

-41 -49 -59 -60 -63 -52 -50 -34

Level Shifted

The range of the pixel intensities is 0 to 255. We will change the range to -128 to 127 by subtracting 128 from each pixel value. After subtracting 128 from each of the pixel values, we get the results shown above.

JPEG Coding Example

-415 -29 -62 25 55 -20 -1 3

7 -21 -62 9 11 -7 -6 6

-46 8 77 -25 -30 10 7 -5

-50 13 35 -15 -9 6 0 3

11 -8 -13 -2 -1 1 -4 1

-10 1 3 -3 -1 0 2 -1

-4 -1 2 -1 2 -3 1 -2

-1 -1 -1 -2 -1 -1 0 -1

DCT

JPEG Coding Example

-26 -3 -6 2 2 0 0 0

1 -2 -4 0 0 0 0 0

-3 1 5 -1 -1 0 0 0

-4 1 2 -1 0 0 0 0

1 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

DCT Normalized

JPEG Coding Example


Reading the normalized coefficients (shown above) in zig-zag order gives:

-26 -3 1 -3 -2 -6 2 -4 1 -4 1 1 5 0 2 0 0 -1 2 0 0 0 0 0 -1 -1 EOB

EOB indicates that the remaining coefficients are all zeros.

JPEG Coding Example

• Now, using a version of Huffman coding with symbol categories and DC differences, we obtain the sequence

• 1010110 0100 001 0100 0101 100001 0110 100011 001 100011 001 001 100101 11100110 110110 0110 11110100 000 1010

• The total number of bits in the completely coded reordered array (and thus the number of bits necessary to represent the entire 8x8, 8-bit example sub-image) is 92.

– Since the original block requires 8 x 8 pixels x 8 bits = 512 bits, the resulting compression ratio is 512/92, or about 5.6 : 1.