Introduction to coding of audivisual contents

8/10/2019 Introduction to coding of audivisual contents

1/28

CAV: Fall 2014 1. Introduction

ETSETB TelecomBCN Universitat Politcnica de Catalunya UPC-BarcelonaTech 1

Coding of Audiovisual Contents

1. Introduction

Veu Processing GroupImage and Video Processing Group

Unit structure

Motivating multimedia compression

The ITU601 standard

ow oes compress on wor

Redundancy and irrelevance

Lossless and lossy compression Compression efficiency

Quality measures

2

Coding systems

Importance of standards

1. IntroductionCoding of Audiovisual Contents


2/28



Motivating multimedia compression (I)

Uncompressed multimedia (graphics, audio and video) data requires considerable stora e ca acit and transmission bandwidth .

Why do we need to compress multimedia content?

Despite rapid progress in mass storage density, processor speeds , and digital communication system performance, demand for data storage capacity and data transmission bandwidth continues to outstrip the capabilities of available technologies.

The continuous growth of data intensive multimedia based web applications have not only sustained the need for more efficient ways to encode signals and images but have made compression of such signals central to storage and communication technology .

3 1. IntroductionCoding of Audiovisual Contents

Motivating multimedia compression (II)

More than 500 thousand million digital photos are taken each year .

3 days of content are uploaded

Some data (2013)

every minute to YouTube. 600 years of Youtube content are

shared through Facebook links every

day. Every second , on average, 58

photos are uploaded to Instagram, 15 photos are uploaded to Flickr, 26 photos/videos are shared on Twitter and more than 2000 photos are uploaded to Facebook (though the most part of them are not public).


http://instagram.com/p/W2FCksR9-e/#

What a difference 8 years makes. Saint Peters square in 2005 vs 2013 (#NBCPope)


3/28



An example: ITU601

In 1982, the CCIR defined a standard for encoding interlaced analogue video signals in digital form mainly for studio applications.

The current name of the standard is ITUR BT.601 .

e co or space s B R: It enables compatibility between color and black&white . It allows a reduction of the chroma resolution thanks to the lower

sensitivity of the HVS to the color information. The RB signals are selected to create the chroma signals (CBCR)

since G presents the largest correlation with Y.

0.299 0.587 0.1140.500 0.419 0.0810.169 0.331 0.500

R

B

Y R

C G

BC

E E

E E

E E


Digital video: Sampling (I)

The previous signals EY ECB ECR present a bandwidth of

approximately 6 MHz: Antialiasing filters have a bandpass up to 5.75 MHz (ITU601). The sampling frequency is 13.5 MHz (ITU601) which is a

multiple of the line frequency in the most common standards: 625 (PAL) or 525 (NTSC) lines.

Both standards have 720 samples by active line.

Video signal sampling : An orthogonal sampling pattern is used. Chroma signals can be sampled at lower resolution than the

luminance signal.



4/28



Digital video: Sampling (II)

The most common sampling formats are (Chroma sampling changes):

4:4:4 Format 4:2:2 Format 4:2:0 Format

4:4:4 Format: Also used for the RGB format.

4:2:2 Format: Useful for most studio applications.

4:2:0 Format: Not the scope of the ITU601 standard.

LuminanceChroma


Digital video: Quantization (I) The EY ECB ECR components

do not have the same dynamics : Signal EY [0, 1] Volts Signal ECB [0.5, 0.5] Volts Signal ECR [0.5, 0.5] Volts.

Every signal is quantized using 8 (10) bits ; that is 256 (1024) levels, so that: Values 0 and 255 are preserved for synchronization data

A protection interval (to allow for overshoot and undershoot) is left in the upper and lower levels of the dynamic range.

219 16Y Y E

224 128 160 128 R R C R Y C E E E 224 128 126 128

B B C B Y C E E E

The corresponding level number is the nearest integer value and the digital

equivalents are termed Y , C B and C R .



5/28


6/28



Bit rates (II)

The most

common

version

is

4:2:2

10

bits,

leading

to

270

Mbits/s :

Standardized as SMPTE 259M

Lower bit rates can be achieved removing all irrelevant information (without blanking intervals: Buffered video ):

eo s samp e a , z Stored in a memory Only actives lines are read at minimum speed.

For instance, at 4:2:2 format (625 lines, 50 Hz), there are 720 samples per line, 576 active lines and 25 frames per second. As every sample is quantized with 8 bits, the final bit rate is:

=

(720 x 480 x 30 x 10 x (1 + + ) = 207.36 Mbit/s)

Although the bit rate is reduced, figures are still very high: Only for studio applications . Broadcasting requires compression systems .


Unit structure


The ITU601 standard




Quality measures

12

Coding systems




7/28



How does compression work?

Two fundamental components of information compression are

redundancy and irrelevance reduction.

Redundancy reduction aims at removing duplication from the signal source (audio / image/ video/ text). In general, three types of redundancy can be identified:

Temporal Redundancy or correlation between adjacent samples in audio or adjacent frames in a sequence of images (in video applications).

. Spectral Redundancy or correlation between different spectral

bands or color components.


Redundancy / Relevancy

Two fundamental components of information compression are

redundancy and irrelevance reduction.

Irrelevance reduction omits parts of the signal that will not be noticed by the signal receiver:

The human

auditory

system

(HAS)

or

The human visual system (HVS).

Current coding systems exploit both types of characteristics

redundancy and irrelevance reduction, introducing losses in

the decoded signal.



8/28



Temporal Redundancy: Video

In video signals, there is usually a high temporal correlation (similarity) between frames that are captured at around the same time, especially if the temporal sampling rate (the frame rate) is high.

Table tennis sequenceframe #40




Temporal Redundancy: Speech

In speech signals, there is usually a high temporal correlation (similarity) between consecutive (or close) samples that can be appreciated in the signal itself, its (estimated) autocorrelation or its (estimated) spectral density.

Autocorrelation of

Speech signal: Voiced and Unvoiced parts

the voiced signal



Spectral density of the voiced signal


9/28



Temporal Redundancy: Audio

This temporal redundancy can be clearly observed as well when comparing consecutive samples in an audio signal.

Strong correlation found in voiced sounds of speech and many sounds from musical instruments


Spatial Redundancy: Image

In still images, there is usually a high spatial correlation between pixels (samples) that are close to each other, i.e. the values of neighboring samples are often very similar.

Amplitude of adjacent pixel

Gray level image, 8bbpJoint histogram of two horizontal adjacent pixels

Amplitude of current pixel



10/28



Inter channel redundancy: Color Image

In images, there is usually a high correlation in the spectral domain between color image components

Statistical dependence between R,G,B is stronger


Inter channel redundancy: Stereo audio signals

For some stereo audio signals , there is a strong correlation between channels

Example of recorded sounds coming from the front (as in live concerts)



11/28



Frequency redundancy: Image

In images, there is usually a high correlation between the various frequency bands of the signal

First level Second level Third level


Irrelevance

For audio signals as well as image and video data to be consumed by humans, there is no need to represent more than the information that can be perceived in

, , , loudness, brightness,

Required resolution might depend on audio / image content (masking )

Irrelevancy can be application dependent : In ima e onl a s ecific area mi ht be relevant for some task e. . in

medical imaging.



12/28



Exploiting limitations of HAS (I)

Human ear can only hear sound louder than a frequency dependant threshold (20Hz20kHz)

Masking : some signal components are irrelevant to human perception Depends on spectral characteristics, temporal characteristics and intensity

It can occur simultaneously with the masking signal (frequency masking)

Loud sounds mask soft ones, louder sound happens at the same time


Exploiting limitations of HAS (II) Human ear can only hear sound louder than a frequency dependant threshold

(20Hz20kHz) Masking : some signal components are irrelevant to human perception

Depends on intensity and spectral as well as temporal characteristics

It can occur before or after the masking signal (temporal masking) The auditory system needs an integration time higher for lower sounds

Signals which are not audible do not need to be transmitted

Perceptual lossless : assign number of bits so that the quantization noise is not audible



13/28



Exploiting limitations of HVS

Human visual system has much lower acuity for color hue and saturationthan for brightness

Use color transform to facilitate exploiting that property

Cb and Cr often sub sampled 2:1 relative to Y

0.299 0.587 0.114

0.169 0.331 0.5 .

0.5 0.419 0.813

Y R

Cb G

Cr B

x x

x x

x x

RGBcomponents

Luminance component

Chrominance components {

Y

Cb

Cr


Unit structure


The ITU601 standard




Quality measures

26

Coding systems




14/28



Lossless and lossy compression

Lossless compression

The reconstructed data at the output of the decoder is a perfect copy of the original data.

n y s a s ca re un ancy s exp o e . Lossless compression of audio, image, video gives only a moderate amount of

compression. Useful for file compression (rar, zip, arj), medical or satellite images

Lossy compression

The decompressed data is not identical to the source data. Both statistical and visual/psychoacoustic redundancy are exploited. Much higher compression ratios can be achieved at the expense of a loss of

visual/audio quality.


Performance

How to evaluate the performance of a compression algorithm?

Performance can be based on several factors: Algorithm complexity Memory requirements Run time Robustness Compression efficiency Quality measures

Objective measures Subjective measures



15/28



Efficiency (Measures of compression)

If b and b are the number of binary digits (or information carrying units) used in two representations of the same information,

Compression ratio : '

bC

b Example : grayscale image of size 256x256 at 1 byte per pixel. If the compressed image

occupies 16384 bytes, the compression ratio is 4

Compressed bit rate : number of binary digits (bits) used to represent each sample for ima es, in b or number of bits used er second for audio/video, in b/s

256 256 8 655364

16384 8 16384

x x

x

C

Example : for the previous image, the compressed bit rate is 2 bpp (bits per pixel)

16384 8 1310722

256 256 65536

x

x


Typical compressed bit rates (I)

Audio (MP3) 32 Kb/s MW (AM) quality 96 Kb/s FM quality 128160 Kb/s Standard Bitrate quality 192 Kb/s DAB (Digital Audio Broadcasting) quality 224320 Kb/s Near CD quality

Other audio formats 800 b/s Minimum necessary for recognizable speech 8 Kb/s Telephone quality 0.5:1 Mb/s Lossless audio 1.411 Mb/s PCM sound format of Compact Disc Digital Audio



16/28



Typical compressed bit rates (II)

Color Image Lossless: 3 bpp Lossy quality: Usable: 0.25 bpp, Moderate: 0.5 bpp, High: 1 bpp

Video (MPEG2) 16 Kb/s Videophone quality (minimum necessary for a

consumer acceptable "talking head" picture) 128 : 384 Kb/s Business oriented videoconferencing system quality 1.25 Mb/s VCD quality

5 Mb/s DVD quality 15 Mb/s HDTV quality 36 Mb/s HD DVD quality 54 Mb/s Bluray Disc quality


Quality measures: Speech (I)

Speech coders are evaluated in terms of bit rate, complexity, delay, robustness and output quality:

Objective measures exist for computing bit rate, complexity and delay performance.

For quality or robustness only subjective measures can be applied.

Mean Opinion

Score

(MOS):

The MOS is generated by averaging the results of a set of standard, subjective tests where a number of listeners rate the heard audio quality of test sentences read aloud by both male and female speakers over the communications medium bein tested.

A listener is required to give each sentence a rating. The whole evaluation process is very expensive



17/28



Quality measures: Speech (II)

Mean Opinion Score (MOS): The rating scheme is as follows:

Mean Opinion Score (MOS)MOS Quality Impairment

5 Excellent Imperceptible

4 Good Perceptible but not

annoying

3 Fair Slightly annoying2 Poor Annoying


1 Ba Very annoying

A common value for usual standards is around 4.0 See http://en.wikipedia.org/wiki/Mean_opinion_score for typical values

Quality measures: Audio (I)

Audio coders are evaluated in terms of bit rate, complexity, delay, robustness and output quality:

Objective measures exist for computing bit rate, complexity and delay performance.

For quality or robustness only subjective measures can be applied.

In current

perceptual

audio

coding,

uncompressed

waveforms

can

be very different form the originals, but still they can sound quite similar.

Objective measures Classical objective measures, such as Signal to Noise Ratio (SNR) or

Total Harmonic Distortion (THD) are inadequate. Segmental SNR may be used for some classes of coders.



18/28



Quality measures: Audio (II)

Subjective measures Subjective listening tests are the most reliable tool for assessing

quality. . . . . ,

ITUR BS.1116, ITUR BS1387, )

Listening tests are influenced by Playback level and background noise Loudspeakers and headphones distortions Listening room acoustics

Listeners expertise


Quality measures: Image and Video (I)

Objective measures

Mean Square Error (MSE)1 2

0

1 N

i i

i

MSE X X N

Mean Absolute Difference (MAD)

Max Error

1

0

1 N

i i

i

MAD X X N

max i ii MaxError X X

Peak Signalto Noise Ratio (PSNR)

PSNR is useful to compare images with different dynamic ranges :

M = Maximum peak to peak value of the representation

2

1010log ; M

PSNR MSE



19/28



Quality measures: Image and Video (II)

Subjective measures

Group of observers ITUR Standard BT 500 Time consuming, expensive, off line

Model the fundamental properties of Human Visual System Several attempts, no universal solution yet Lower levels of processing (eye imaging system, retina sampling,

spatial masking, etc.) can be modeled, but higher level in the brain

Commercial systems, like PQA500 (Tektronix)


Quality measures: Image and Video (III)

Example of subjective quality measure : Absolute rating scale of the Television Allocation Study Organization

R. Gonzlez and R. Woods, Digital Image Processing , Prentice-Hall, 2008.



20/28



Quality measures: Subjective vs. Objective (I)

Original Compressed and reconstructed,

very good quality

Compressed and reconstructed,

noticeable degradation

Missing visual information and with

artifacts

MSE= 5.17 MSE = 15.67 MSE = 14.17


Quality measures: Subjective vs. Objective (II)

Best SSIM Researchers are

continuously looking for

objective measures that

ReferenceImage

Hypersphere

behavior of the HVS.

For instance, the Structural

Similarity Index (SSIM).

Z. Wang and A.C. Bovik, Mean Square Error: Love it or Leave it? IEEE Signal Processing Magazine , pp. 98 - 117, January 2009.

Worst SSIM



21/28



Unit structure


The ITU601 standard



Lossless and lossy compression

Compression efficiency

Quality measures

41

Coding systems



A typical coding system (I)

Source Encoder

Channel Encoder

Encoder

x

Channel Decoder

Source Decoder

Channel

Decoder

x

Two different pairs of blocks: source encoder/decoder and channel encoder/decoder.

The source encoder converts the source output to a binary sequence and the channel encoder processes the binary sequence for transmission over the channel. The channel decoder recreates the incoming binary sequence (hopefully reliable) and the source decoder recreates the source input.



22/28



A typical coding system (II)

Source Encoder

Channel Encoder

Encoder

x

The main objective of the source encoder is to reduce the amount of data

Channel Decoder

Source Decoder

Channel

Decoder

x

(average number of bits) necessary to represent the information in the signal. This data compression is mainly carried out by exploiting the redundancy and irrelevancy in the signal information.

The main objective of the channel encoder is to add redundancy to the output of the source encoder to enhance the reliability on the transmission.


Typical source codec

Source Encoder

PerceptualAnalysis

Transform orPrediction Quantization

Channel

x EntropyEncoder

ChannelEncoder

QuantizationTable

CodingTable

Transform orPrediction Model

Inverse Quantization

InverseTransform or

Prediction x

Source Decoder

EntropyDecoder

ChannelDecoder



23/28



Source encoder (I)

Perceptual Analysis Transform or Prediction Quantization

Source Encoder

EntropyEncoder

Quant. Table Coding TableSelection T/P

PerceptualAnalysis

This block applies the knowledge about human perception on the signal and systems to be used.

Non perceptual parts of the signal should not be considered

Depending on the signal to be coded, better models can be used: In the audio case, the impact of the perceptual analysis in the final coding system is

paramount

In some cases, the knowledge about the human perception cannot be fully exploited in the coding process.

Typically, the result of the perceptual analysis affects the kind of signal model as well as the quantization of the parameters of such a model.


Source encoder (II)

Transform or Prediction Transform or Prediction Quantization

Source Encoder

EntropyEncoder


PerceptualAnalysis

correlated set of samples describing the signal; modifies the signal representation into a format designed to reduce its redundancy.

Data in the new format has to present a correct structure in order to be useful for coding

purposes

The operation is generally reversible , that is the transform step does not introduce losses in the description.

represent the signal.

Coding algorithms can be grouped into two main classes: techniques that modified the original samples relying on a prediction model (e.g. Block Matching) and techniques that represent the signal in a different domain and code the transformed coefficients (e.g. DCT)



24/28



Source encoder (III)

Quantization Transform or Prediction Quantization

Source Encoder

EntropyEncoder


PerceptualAnalysis

This block reduces the accuracy of the transform or prediction output in accordance with a pre established fidelity criterion.

The goal is to keep irrelevant information out of the compressed representation

The quantization represents a range of values of a given transformed sample by a single value in the range. The set of output values are the quantization indices. This step is responsible of the losses in the system and, therefore, determines the achieved com ression here we are assumin scalar uantization rou in a set of

transformed samples into a vector and performing vector quantization is possible as well).

This operation is irreversible , so it must be omitted when (lossless compression) error free compression is desired.


Source encoder (IV)

Entropy encoder Transform or Prediction Quantization

Source Encoder

EntropyEncoder


PerceptualAnalysis

This block produces the final bit stream .

The entropy encoder generates a fixed or variable length code to represent the quantization output and maps the output in accordance with the code.

In many cases a variable length code is used: the shortest code words are assigned to the most frequently occurring quantization output values, thus minimizing coding redundancy.

This operation is reversible



25/28



Complete system assessment

Rate/Distortion Analysis (I)

Transform or Prediction Quantization

Source Encoder

EntropyEncoder


PerceptualAnalysis

The way to analysis the performance of a coding system is by evaluating the quality of the decoded signal when varying the encoding rate .

In the case of audio signals, since good models for the HAS exist, the concept of quality refers to perceived audio quality .

D

, the concept of quality commonly refers toa measure of distortion in the decodedsignal (for instance, MSE or PSNR) R


Complete system assessment

Rate/Distortion Analysis (II)

Transform or Prediction Quantization

Source Encoder

EntropyEncoder


PerceptualAnalysis

A Rate/Distortion analysis requires the implementation of the complete encoding system , which sometimes is not available

The various parts of the system are commonly assessed using other (local) measures:

Transform / Prediction : Decrease in correlation of the output samples. Quantization : Decrease in entropy of the output symbols

Those are suboptimal measures and usually the global analysis is necessary since the different blocks can be very related:

For instance, entropy coders which are very adapted to a given source



26/28



Source decoder

Source Decoder

The decoder contains three blocks that erform, in reverse order, the inverse operations of the encoder: entropy decoder , inverse quantization and inverse transform block.

InverseQuantization

InverseTransform or

EntropyDecoder

Prediction

Source Decoder


Channel encoder and decoder The channel encoder and decoder play an important role when the

channel is noisy or prone to error.

They are designed to reduce the impact of channel noise by inserting a controlled form of redundancy into the source encoded data.

As the output of the source encoder contains little redundancy, it would be highly sensitive to transmission noise without the addition of this controlled redundancy.

Source Channel

Encoder

x

ChannelDecoder

SourceDecoder

Channel

Decoder

x



27/28



Normalization and Audiovisual Coding Standards

What do standards standardize? To correctly decode the bitstream produced by the coding system, the encoder

has to share some information with the decoder: How the different codewords have been stored in the bitstream ( syntax definition )

and Which actions are associated to their different decoded values (decoder definition ).

What is left for commercial competition? Manufacturers still have a margin for innovation and competition

Codec performance: parameters, coding tables, actual implementation, Added functionalities: interface, adaptation, complexity, quality, price,

Im ortance of Normalization Normalization allows interconnection of equipment from different

manufacturers Standards should arrive early in the market, to avoid de facto standards


Standards ISO/MPEG

ISO/IEC MPEG has published audio visual standards in the last 10 years to foster the development of the digital environment.

MPEG1 1992, ISO/IEC 11172Coding of video & associated audio for Digital Storage Media (DSM) up to 1,5 Mbit/s in

MPEG2 1994, ISO/IEC 13818 1996 Emmy for technical excellence Generic video and audio coding for TV Broadcasting of Standard Definition signals (SDTV) & High Definition (HDTV)

MPEG4 1998, ISO/IEC 14496 2003 JVT (ISO/ITU-T) AVC Coding of Audio Visual Objects (AVO) oriented to manipulation (e.g. use of acompositor in the decoder)

MPEG7 2001, ISO/IEC 15938Interface for the description of multimedia content, in order to facilitate search, retrieval and access of audio visual contents

MPEG21 2003, ISO/IEC 21000Structure for the development of the multimedia, management of adaptation of content to different media and devices (repurposing) & digital rights management for content creation (applications: collections, multimedia libraries...)



28/28


Course contents

Transform orPrediction Quantization

x

Source Encoder

EntropyEncoder

ChannelEncoder

PerceptualAnalysis

Inverse Quantization

InverseTransform or

Prediction

Channel

x EntropyDecoderChannelDecoder

QuantizationTable

CodingTable

Transform orPrediction Model

Source Decoder

Unit 2 Entropy Coding CommunicationCourses

Source Decoder

Unit 3 Speech Coding, Unit 4 Audio Coding, Unit 5 Image Coding, Unit 6 Video Coding


Introduction to coding of audivisual contents

Documents

Transcript of Introduction to coding of audivisual contents

Coding in Python Introduction to

Introduction to advance design & coding

X-Media Audio, Image and Video Coding · Contents 3 •Introduction to X-Media coding generic principles •Audio data recording/sampling, physiology •Data, audio coding LZ*, Mpeg-Layer

Medical Billing / Coding - Santa Monica College Billing / Coding Regional Program Demand Report Santa Monica College, LA MSA Economic Modeling Specialists Inc. Introduction and Contents

Line Coding Table of Contents - e-PGPathshala

An Introduction to Vex Coding

Table of Contents Introduction Terms - Glossary Heuristic Review Information Color Coding used to detect severity of a bugColor Coding used to detect severity.

Creative Coding 1 - 1 Introduction

An Introduction to (Network) Coding Theoryuser.math.uzh.ch/trautmann/Coding_Introduction.pdf · An Introduction to (Network) Coding Theory Coding Theory Introduction Data representation

Introduction to Video Coding Part 1: Transform Coding - Xiph.org

Introduction to video coding

Channel coding - Introduction & linear codes

· Contents 1 Gr¨obner bases for codes 5 1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.2 Basic facts from coding theory

Introduction to Multimedia1 Introduction yCoding Requirements yEntropy Encoding xContent Dependent Coding Run-length Coding Diatomic Coding xStatistical.

Introduction to Network Coding

Chapter 1: Introduction Transform Coding

Video Coding Introduction

Introduction to Error Control Coding and Channel Coding ...

PVK-HT04bella@cs.umu.se1 Contents Introduction Requirements Engineering Project Management Software Design Detailed Design and Coding Quality Assurance.

Introduction to Coding Theoryconrado/docencia/codigos.pdf · 2009-06-11 · Introduction to Channel Coding Gadiel Seroussi/Coding Theory/September 8, 2008 1. Communication System