Roadmap - Kau


Transcript of Roadmap - Kau


Image, video and audio coding concepts

Stefan Alfredsson

(based on material by Johan Garcia)

Roadmap

• XML – Data structuring

• Loss-less compression (huffman, LZ77, ...)

• Lossy compression

Rationale

• Compression is about removing redundant information; decompression is about restoring the "original" state

• For lossless compression, the decompressed data is identical to the original data

• But higher compression can be achieved by allowing a quality tradeoff!
– For example: removing all vowels from a text
– Fr xmpl: rmvng ll vwls frm txt

• Obviously not suitable for all kinds of data
– But image/video/audio are good candidates
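The vowel-stripping idea can be sketched in a few lines of Python (a toy illustration of lossy compression, not a real codec):

```python
def drop_vowels(text):
    """Toy "lossy compression": remove every vowel from the text."""
    return "".join(c for c in text if c.lower() not in "aeiou")

msg = "For example: removing all vowels from a text"
print(drop_vowels(msg))                   # Fr xmpl: rmvng ll vwls frm  txt
print(len(drop_vowels(msg)) / len(msg))   # ~0.7 -> about 30% smaller, but lossy
```

The original text cannot be restored exactly, which is precisely what makes this lossy rather than lossless.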


The case for image compression

• The human is ”impatient and half blind”

• Impatient
– Frustrated by waiting in front of the screen
– (i.e. need compression for faster transfer)

• Half blind
– Human vision does not catch all information

The seeing sense

(image from www.glaucoma.org/learn/eye-anatomy.gif )

• Color vision worse than BW

• Limited resolution

• Different sensitivity to different patterns

• Sharp contrast "hides" weaker details

Compression example

Original Compressed

73 kbyte 11 kbyte

> 6 times faster download!


Compression

• What's the difference?

Common Image Formats

• GIF (Graphics Interchange Format)
– "Lossless", but only 256 colors

– Uses LZW for compression (patent problems)

• PNG (Portable Network Graphics)
– More flexible replacement for GIF

• JPEG (Joint Photographic Experts Group)
– Complex standard with many modes

– Typically lossy, but has a lossless mode as well

JPEG

• The best compression for photo-like natural images

• Color or grayscale images

• Lossy or lossless compression

• Sequential or progressive display

• Exploits human visual system characteristics


JPEG functional overview

• Color conversion and downsample

• Split in 8x8 pixel blocks

• Do DCT (Discrete Cosine Transformation)

• Quantize

• Difference encode DC and runlength encode AC

• Huffman encode

• Packetize

Color conversion, color split

Split into three components (Y Cb Cr color format)

RGB pixel image =>JPEG

• Image normally represented in RGB format (8 bits per color channel per pixel)
• Color split: RGB is converted into Y Cb Cr color format

– Y – Luminance
– Cb – Chrominance (blue)
– Cr – Chrominance (red)
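The color split can be shown directly; the coefficients below are the standard RGB-to-Y'CbCr conversion used by JPEG/JFIF:

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to the Y'CbCr format used by JPEG (JFIF)."""
    y  =  0.299    * r + 0.587    * g + 0.114    * b        # luminance
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128  # blue chrominance
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128  # red chrominance
    return round(y), round(cb), round(cr)

print(rgb_to_ycbcr(255, 255, 255))  # white -> (255, 128, 128): all gray
print(rgb_to_ycbcr(0, 0, 0))        # black -> (0, 128, 128)
```

Note that for any gray pixel both chroma components sit at the neutral value 128; only Y carries information, which is what makes the chroma channels cheap to downsample.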

Color downsampling

Downsample chrominance components

RGB pixel image =>JPEG

• Color split and downsample

• The eye is less sensitive to chrominance (color details) than to luminance (brightness details)
• ==> Downsample the color components
• Downsampling the chroma components saves 33% or 50% of the space taken by the image
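The 33% and 50% figures follow from counting samples per 2x2 pixel block for the common subsampling patterns:

```python
# Samples needed for a 2x2 pixel block: 4:4:4 keeps 4 Y + 4 Cb + 4 Cr = 12,
# 4:2:2 keeps 4 Y + 2 Cb + 2 Cr, and 4:2:0 keeps 4 Y + 1 Cb + 1 Cr.
def savings(y, cb, cr):
    full = 3 * 4                 # 4:4:4 reference: 3 samples per pixel
    return 1 - (y + cb + cr) / full

print(savings(4, 2, 2))  # 4:2:2 -> 0.333... (saves ~33%)
print(savings(4, 1, 1))  # 4:2:0 -> 0.5     (saves 50%)
```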


Splitting into blocks

Split into 8x8 pixel blocks

RGB pixel image =>JPEG

• Color split and downsample

• Split in blocks

DCT transform

Perform DCT

RGB pixel image =>JPEG

• Color split and downsample

• Split in blocks and do DCT

• DCT indicates the amount of intensity variation (spatial frequency)

Why DCT? Spatial locality!


Image Frequencies

• An 8x8 pixel block is transformed into 64 spatial frequencies (coefficients)

• Sort of a "2-dimensional Fourier transform"
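A naive reference implementation of the 2-D DCT-II (the transform JPEG uses, here with orthonormal scaling and without JPEG's level shift) makes the "64 coefficients" concrete:

```python
import math

def dct2_8x8(block):
    """Naive 2-D DCT-II of an 8x8 block of pixel values."""
    N = 8
    def c(k):  # orthonormal scaling factor
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical spatial frequency
        for v in range(N):      # horizontal spatial frequency
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out

# A flat (constant) block has all its energy in the DC coefficient [0][0]:
flat = [[100] * 8 for _ in range(8)]
coeffs = dct2_8x8(flat)
print(round(coeffs[0][0]))  # 800 ; every AC coefficient is ~0
```

For smooth natural-image blocks the energy concentrates in the low-frequency corner in just this way, which is what the later quantization step exploits.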

Summing of coefficients

Original image

• By using all 64 coefficients a perfect reconstruction is possible

• Using fewer coefficients gives a good reconstruction and data reduction

Quantization

/ =

• The quantization reduces the resolution (i.e. the amount of information) of the coefficients

• The number of zero coefficients is increased
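The "/ =" picture above is literally element-wise division and rounding. A sketch using the well-known example luminance table from the JPEG standard (Annex K):

```python
# Quantization: divide each DCT coefficient by its table entry and round.
# QTABLE is the example luminance quantization table from the JPEG standard.
QTABLE = [
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
]

def quantize(coeffs):
    return [[round(coeffs[i][j] / QTABLE[i][j]) for j in range(8)]
            for i in range(8)]

coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0] = 236.0   # large low-frequency (DC) coefficient
coeffs[7][7] = 40.0    # small high-frequency coefficient
q = quantize(coeffs)
print(q[0][0], q[7][7])  # 15 0 -> the high-frequency coefficient became zero
```

The table entries grow toward the high-frequency corner, so exactly the coefficients the eye is least sensitive to are the ones rounded to zero.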


Encoding DCT - DC

Differential encode DC coefficient

RGB pixel image =>JPEG

• Color split and downsample

• Split in blocks and do DCT

• Quantize

• Diff code DC
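Because the DC (average brightness) values of neighbouring blocks are similar, JPEG codes each DC coefficient as the difference from the previous block's DC value. A minimal sketch:

```python
def diff_encode(dc_values):
    """Code each DC coefficient as the difference from the previous one."""
    prev, out = 0, []
    for dc in dc_values:
        out.append(dc - prev)
        prev = dc
    return out

def diff_decode(diffs):
    """Invert diff_encode by accumulating the differences."""
    prev, out = 0, []
    for d in diffs:
        prev += d
        out.append(prev)
    return out

dcs = [120, 122, 121, 125]
print(diff_encode(dcs))  # [120, 2, -1, 4]
```

The differences are small numbers clustered around zero, which the later Huffman stage codes much more compactly than the raw DC values.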

Encoding DCT - AC

RGB pixel image =>JPEG

• Color split and downsample

• Split in blocks and do DCT

• Quantize

• Diff code DC and RLE AC

Run Length Encode AC coefficients

Zig-zag order

• The quantized coefficients are coded in a zig-zag manner

• This leads to longer zero run lengths for the last coefficients.
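The zig-zag scan and the run-length step might be sketched as follows (simplified: real JPEG packs the runs and value categories into Huffman symbols):

```python
def zigzag(block):
    """Read an 8x8 block in JPEG zig-zag order (by anti-diagonals,
    alternating direction)."""
    order = sorted(((i, j) for i in range(8) for j in range(8)),
                   key=lambda p: (p[0] + p[1],
                                  p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [block[i][j] for i, j in order]

def rle_ac(scan):
    """Code the AC coefficients (scan[1:]) as (zero_run, value) pairs."""
    out, run = [], 0
    for v in scan[1:]:
        if v == 0:
            run += 1
        else:
            out.append((run, v))
            run = 0
    out.append("EOB")  # end of block: all remaining coefficients are zero
    return out

b = [[0] * 8 for _ in range(8)]
b[0][0], b[0][1], b[1][0] = 50, -3, 2   # typical post-quantization block
print(rle_ac(zigzag(b)))                 # [(0, -3), (0, 2), 'EOB']
```

Since quantization zeroes the high-frequency corner, the zig-zag scan pushes the zeros to the end and a single end-of-block symbol covers all of them.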


Packetization

RGB pixel image =>JPEG

• Color split and downsample

• Split in blocks and do DCT

• Quantize

• Diff code DC and RLE AC

• Huffman encode

• Packetize

Headers

One Minimum Coded Unit

Coder - overview

Decoder - overview


Image format recommendations

• Photo-like images – JPEG

• Photo or drawn images – PNG or GIF
– These use lossless compression

• Text – PNG or GIF
– High contrast, will be blurred by JPEG

• Trim JPEG quality

• Avoid browser scaling
– Adjust the file size beforehand to reduce download time
– Easy to upload a 5 Mpix image, but the image may be displayed at 800x600 (0.48 Mpix) in the album!

Size/quality comparisons

(From: http://www.psychology.nottingham.ac.uk/staff/cr1/graphics.html)

Video coding


Video Coding

• Video coding
– MPEG1, originally for 1x CD players ("VCD")
– MPEG2, digital satellite and cable TV
• Used for example in DVB-T for Boxer terrestrial digital TV

• Video coding for conferencing
– H.261, older standard, used with e.g. ISDN
– H.263, newer standard. Works better at low transmission speeds.
• Netmeeting
• Used for most Flash videos (YouTube, Google Video, Myspace, and others)
• Basis of RealVideo / RealPlayer until RealVideo 8

• New technology
– MPEG 4
– (a subset of MPEG4 is used in DivX, XviD, H.264, and other codecs)

Overview video coding

Video coding concepts

• Spatial Redundancy– Nearby pixels typically have similar values

– Removed by DCT transform / quantization / Huffman encoding (like in JPEG)

• Temporal Redundancy– Video frames in sequence typically have similar content

– Removed by macro block coding with motion vectors and difference frames


Macro block coding

The motion vector specifies how far and in which direction the macro block has moved. The difference frame specifies the differences within the macro block between frames.
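A sketch of how such a motion vector can be found for one block (grayscale values, exhaustive search over a small range; real coders use 16x16 macro blocks and much smarter search strategies):

```python
def sad(a, b):
    """Sum of absolute differences between two equal-size blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def get_block(frame, top, left, size):
    return [row[left:left + size] for row in frame[top:top + size]]

def find_motion_vector(prev, cur_block, top, left, size, search=2):
    """Best (dy, dx) displacement of cur_block within the previous frame."""
    best, best_cost = (0, 0), sad(get_block(prev, top, left, size), cur_block)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            if 0 <= t and 0 <= l and t + size <= len(prev) and l + size <= len(prev[0]):
                cost = sad(get_block(prev, t, l, size), cur_block)
                if cost < best_cost:
                    best, best_cost = (dy, dx), cost
    return best, best_cost

# Previous frame: a bright 2x2 patch; current block: the same patch one
# pixel to the right, so the best vector points one pixel back (dx = -1).
prev = [[0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (2, 3):
        prev[r][c] = 9
mv, cost = find_motion_vector(prev, [[9, 9], [9, 9]], top=2, left=3, size=2)
print(mv, cost)  # (0, -1) 0
```

With a perfect match the residual cost is zero, so only the vector needs to be sent; otherwise the remaining difference block is coded as well.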

Macro block coding 2

I, P and B frames


Transmission order
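Since a B frame is predicted from both the previous and the next I/P frame, the decoder must receive that next I/P frame before the B frames that depend on it. The reordering from display order to transmission order can be sketched like this:

```python
def transmission_order(display):
    """Reorder a display-order frame sequence for transmission: each I/P
    "anchor" frame is sent before the B frames that precede it on screen."""
    out, pending_b = [], []
    for f in display:
        if f.startswith("B"):
            pending_b.append(f)   # hold B frames until the next anchor arrives
        else:                     # I or P frame
            out.append(f)
            out.extend(pending_b)
            pending_b = []
    return out + pending_b

gop = ["I1", "B2", "B3", "P4", "B5", "B6", "P7"]
print(transmission_order(gop))  # ['I1', 'P4', 'B2', 'B3', 'P7', 'B5', 'B6']
```

The frame labels (`I1`, `B2`, ...) are just illustrative names; the point is that anchors jump ahead of the B frames that reference them.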

MPEG 4

MPEG 4 is object based: both image and audio objects may be placed in a 3D coordinate system.

These objects can then be coded and manipulated independently of each other.

The viewer may also interact with the scene, change viewing angles, and so on.

Mesh coding

An area in the picture can be modeled with a mesh. The mesh parameters are then changed when the area changes. The texture is rescaled according to the mesh changes.

Wireframe models of faces and animation of "wireframe faces" with mapped textures are specified in MPEG 4 version 1. Later versions have included models of the human body and corresponding motion patterns.


Sprite coding

A picture can be composed of a fixed background and a moving sprite. The background can be larger than what is shown on screen, to facilitate camera panning without transmitting a new background.

(compare to weather forecasts on TV, where the studio background is really a blue screen and the weather map is added afterwards)

Audio coding

Audio coding

• Use redundancy in data– Similarities in channels in stereo sound

• Use the weaknesses of the hearing sense (psycho acoustics)– Hearing thresholds

– Frequency bound masking

– Time bound masking


Hearing threshold

• The hearing sensitivity varies for different frequencies

• What cannot be detected does not need transmission
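The idea "don't transmit what cannot be heard" can be sketched using a common approximation of the threshold in quiet (Terhardt's formula; the example tones are made up for illustration):

```python
import math

def threshold_in_quiet(f_hz):
    """Approximate absolute hearing threshold in dB SPL (Terhardt's formula)."""
    f = f_hz / 1000.0
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def audible(tones):
    """Keep only the (freq_hz, level_db) tones above the threshold in quiet."""
    return [(f, l) for f, l in tones if l > threshold_in_quiet(f)]

tones = [(50, 20), (1000, 5), (4000, 0), (16000, 40)]
print(audible(tones))  # [(1000, 5), (4000, 0)] - the 50 Hz and 16 kHz tones
                       # fall below the curve and need no transmission
```

The curve is lowest (the ear is most sensitive) around 2-5 kHz and rises steeply at both ends, which is why the quiet low- and high-frequency tones drop out.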

Frequency masking

• A strong tone masks (conceals/hides) nearby tones

• The mask width varies with frequency: higher frequency = broader mask

Time masking

• Strong tones also mask in time

• We cannot sense nearby tones right after a strong tone has finished


Masking

• Frequency and time masking can be modeled as a surface

• Tones below the surface can not be sensed, and their information does not need coding / transmission

MPEG 2 - Audio Layer 3 (MP3)

Algorithm:

1. Separate the audio into 32 subbands with a filter

2. Compute the mask that each band causes

3. If the audio strength in one subband is masked, do not code that subband

4. Determine quantization with regard to masking and bit rate

5. Huffman encode

6. Format output stream
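A heavily simplified version of steps 2-3 can be sketched as follows. The per-band levels and the 12 dB-per-band masking slope are illustrative assumptions, not values from the standard:

```python
def mask_level(levels, i, slope=12):
    """Mask on band i: strongest neighbouring band, attenuated by an
    (assumed, illustrative) slope of 12 dB per band of distance."""
    return max(l - slope * abs(i - j)
               for j, l in enumerate(levels) if j != i)

def bands_to_code(levels):
    """Indices of subbands whose signal rises above the mask (step 3:
    fully masked subbands are not coded at all)."""
    return [i for i, l in enumerate(levels) if l > mask_level(levels, i)]

levels = [60, 30, 55, 10, 40]    # signal level in dB for 5 subbands
print(bands_to_code(levels))     # [0, 2, 4] - bands 1 and 3 are masked
```

In the real codec the masked bands are skipped and the remaining bands are quantized just coarsely enough that the quantization noise itself stays below the mask (step 4).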

Q?