T325: Technologies for digital media


Page 1: T325: Technologies for digital media

T325: Technologies for digital media. Second semester – 2011/2012. Tutorial 5 – Video and Audio Coding (1-2)

Arab Open University – Spring 2012

Page 2: T325: Technologies for digital media

Outline

• Introduction
• Video coding in MPEG-2
• MPEG audio coding

Page 3: T325: Technologies for digital media


INTRODUCTION

Page 4: T325: Technologies for digital media


Digital vs. Analog – At the beginning

• Digital video coding techniques have been used since the 1970s in television studios, where equipment costs and the large bandwidths required at the time were not major considerations.

• Digital vs. Analog:
• Digital techniques allow much greater processing flexibility than analogue.
• Digital material can be re-recorded many times over without loss of quality.

• BUT the large bandwidth and higher cost of receivers meant that digital video coding was not appropriate for domestic broadcast systems at that time.

Page 5: T325: Technologies for digital media

Introduction

• Digital coding has become a practicable possibility for the domestic market due to:
• The rapid reduction in the cost of digital processing hardware, which reduces equipment cost.
• The development of highly efficient digital video compression techniques, which minimize bandwidth requirements.

Page 6: T325: Technologies for digital media

Question

What are the advantages of Digital techniques over analogue techniques for broadcast TV?


Page 7: T325: Technologies for digital media


Digital vs. Analog for Broadcast TV

• Effect of transmission impairments on picture quality is far less than in the analogue case

• Eliminates ‘ghost’ pictures due to the presence of multiple signal transmission paths and reflections.

• Digital television allows more channels to be accommodated in a given bandwidth

• Different types of program material, such as teletext or sub-titles in several languages, can be accommodated much more flexibly with digital coding.

Page 8: T325: Technologies for digital media

Questions

• What do you know about the following standards: JPEG, MPEG?
• What are the MPEG standards you have used?


Page 9: T325: Technologies for digital media

Introduction

• JPEG stands for Joint Photographic Experts Group.
• Set up to develop standards for the digital coding of still pictures.
• ‘Joint’: done jointly by the CCITT (now ITU-T) and the ISO.
• ‘Experts’ were drawn from industry, universities, broadcasting authorities, etc.

Page 10: T325: Technologies for digital media


MPEG Standards

• MPEG stands for Moving Picture Experts Group.
• Set up by the ISO to develop coding standards for moving pictures.
• Defined a number of standards for the compression of moving pictures:
• MPEG-1
• MPEG-2
• MPEG-4
• MPEG-7
• MPEG-21

Page 11: T325: Technologies for digital media

MPEG standards

• MPEG-1 was designed mainly for the efficient storage of moving pictures on CD-ROM, but in a format not suitable for television.

• MPEG-2 is effectively a ‘tool box’ of compression techniques which can cater for a wide range of systems, present and future, including low, standard and high definition systems.

• MPEG-1 and -2 also include various audio standards, one of which -- the so-called Audio Layer III -- is the basis of MP3 coding. This standard is still widely used in digital television.

Page 12: T325: Technologies for digital media

MPEG standards

• MPEG-4
• Initially intended to provide very high compression rates, allowing for transmission of moving images at rates of 64 kbps or less.

• Aims extended to the provision of flexible standards for a wide range of audiovisual material.

• It is proposed for the new HDTV planned for many countries over the next few years.

• It is already (in 2008) used in many commercial devices, such as domestic video cameras, personal digital assistants (PDAs) and web-based video such as the BBC iPlayer.


Page 13: T325: Technologies for digital media

MPEG standards

• MPEG-7
• Specifies the way multimedia content can be indexed, and thus searched for, in a variety of ways relating to the specific medium.
• It also has intellectual property aspects.
• Involves the idea of ‘metadata’: data that describes the nature of the multimedia object to ease searching.

• MPEG-21
• Includes additional digital rights management.
• Will be considered further in Block 2.

Page 14: T325: Technologies for digital media


VIDEO CODING IN MPEG-2

Page 15: T325: Technologies for digital media

Introduction

• Both in films and television, moving scenes are shown as a series of fixed pictures:
• Generated at a rate of about 25 per second.
• The effect of motion is produced by changes from one picture to the next.
• There is often very little change between consecutive pictures.
• MPEG-2 coding takes advantage of this to achieve high degrees of compression (inter-frame compression).
• Even in the case of single pictures there can be a good deal of redundancy: it is possible to remove some fine detail without our perceiving any significant loss of quality (intra-frame compression).

Page 16: T325: Technologies for digital media

Introduction

• Digital audio and video systems are based on the principle of sampling the original sound or image, and processing the samples in order to achieve the desired result, whether transmission, storage or processing of the sound or vision.

• The sampling rate ultimately depends on the quantity of ‘information’ in the original signal.

• Useful information in an audio or video signal is dependent on the way human beings perceive sound, light intensity and color.

Page 17: T325: Technologies for digital media

Introduction

• What will be covered in this part?
• How the luminance and the two chrominance signals are sampled before any compression is applied.
• Compressed coding of still pictures, which involves the use of JPEG techniques.
• The way correlation between successive pictures is used by MPEG.
• The various levels of compression available with MPEG-2.
• The forms of audio coding used with MPEG-2.

Page 18: T325: Technologies for digital media

Sampling formats

• The human eye is less sensitive to color than to brightness

• The chrominance signal therefore does not have to be sampled at such a high rate as the luminance signal.

Page 19: T325: Technologies for digital media

Sampling formats

• Figure (a) represents the luminance sampling.
• The figure represents part of a camera scanning raster.
• Circles show the times when the camera output luminance signal is sampled.
• The samples are taken consecutively along each line at the sampling rate of 13.5 MHz: a sample is taken every 1/(13.5 × 10⁶) s ≈ 0.074 μs.

Page 20: T325: Technologies for digital media

Sampling formats

• The Cb and Cr chrominance signals are sampled at half the luminance rate

• 4:2:2 sampling takes chrominance samples which coincide with alternate luminance ones

Page 21: T325: Technologies for digital media


Sampling format

• 4:2:0 sampling:
• The chrominance sample values are obtained by averaging the values for corresponding points on two consecutive scan lines (see the sketch below).
• They represent the chrominance values half-way between these lines.
• This averaging avoids the more abrupt changes in color that would result from simply omitting half the chrominance samples.
• This is one of the main formats used for MPEG-2 coding.
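
The following minimal Python/NumPy sketch illustrates the averaging step described above: the Cb (or Cr) values from pairs of consecutive lines are averaged to give the 4:2:0 chrominance values half-way between those lines. The function name and the sample values are purely illustrative, and the real MPEG-2 sampling structure involves specific sample positions and filtering that are not modelled here.

    import numpy as np

    def chroma_420_from_422(chroma):
        # Illustrative sketch only: form 4:2:0 chrominance from 4:2:2
        # chrominance by averaging corresponding samples on each pair of
        # consecutive scan lines (assumes an even number of lines).
        top = chroma[0::2, :].astype(float)      # lines 0, 2, 4, ...
        bottom = chroma[1::2, :].astype(float)   # lines 1, 3, 5, ...
        # The averages represent chrominance values half-way between the lines.
        return (top + bottom) / 2.0

    # Tiny worked example: 4 scan lines x 4 chrominance samples (made-up values)
    cb = np.array([[100, 104, 108, 112],
                   [102, 106, 110, 114],
                   [ 90,  94,  98, 102],
                   [ 92,  96, 100, 104]])
    print(chroma_420_from_422(cb))   # 2 lines x 4 samples of averaged values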

Page 22: T325: Technologies for digital media

Sampling formats: comparison

Page 23: T325: Technologies for digital media

Sampling formats

• When even lower resolution is acceptable, source intermediate format (SIF) may be used.
• Used for MPEG-1 coding.
• The quality is comparable with that of a VHS video recorder.

Page 24: T325: Technologies for digital media

Sampling format

• 4:2:0 sampling vs. SIF sampling:
• Sixteen luminance samples are replaced by four.
• Four chrominance samples are replaced by just one.
• The net effect is that both the luminance and chrominance resolutions are halved in both the vertical and horizontal directions (see the sketch below).
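
As a rough illustration of the "sixteen samples become four" idea, the sketch below simply averages each 2 × 2 group of samples into one, halving the resolution horizontally and vertically. This is a deliberate simplification: the actual SIF derivation involves filtering and decimation of the 4:2:0 source, which is not reproduced here.

    import numpy as np

    def halve_resolution(samples):
        # Illustration only: halve the resolution in both directions by
        # averaging each 2x2 group of samples into a single value.
        s = samples.astype(float)
        return (s[0::2, 0::2] + s[0::2, 1::2] +
                s[1::2, 0::2] + s[1::2, 1::2]) / 4.0

    luma = np.arange(64).reshape(8, 8)     # an 8x8 block of luminance samples
    print(halve_resolution(luma).shape)    # (4, 4): each 4x4 group of 16 samples is now 4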

Page 25: T325: Technologies for digital media

The coding of still pictures

• MPEG is designed to squeeze out as much redundancy as possible in order to achieve high levels of compression. This is done in two stages:
• Spatial compression: uses the fact that, in most pictures, there is considerable correlation between neighboring areas, to compress each picture in a video sequence separately.
• Temporal compression: uses the fact that, in most picture sequences, there is normally very little change during the 1/25 s interval between one picture and the next.
• The resulting high degree of correlation between consecutive pictures allows a considerable amount of further compression.

Page 26: T325: Technologies for digital media

Discrete Cosine Transform (DCT)

• The first stage of spatial compression uses a variety of Fourier transform known as a discrete cosine transform (DCT) on 8×8 blocks of data.

• The luminance information for a row of “n” consecutive pixels will consist of “n” numbers.

• Example of a transform: doubling each number, a form of amplification which would double the picture brightness.

• A reversible transform is one in which a set of “n” original data values is converted into a new set of “n” values in such a way that the original set can be recovered by applying what is called the inverse transform to the new set.

• Because the process is applied to digital data, that is to a set of discrete numbers, the transform is called a discrete transform.

Page 27: T325: Technologies for digital media

Discrete Cosine Transform

• The row of pixels is a digital version of the original analogue signal, which consists of a time-varying luminance signal.
• The value of each consecutive pixel represents the signal luminance at each consecutive sampling interval.

• If the picture is a meaningful one, abrupt changes in sample values will be relatively rare.

• There are a number of transforms that can be applied to digital data samples so as to yield a set of numbers which correspond, in effect, to the amplitudes of the frequency components of the spectrum of the original analogue signal.

Page 28: T325: Technologies for digital media

Activity 5.1

• What do you think this implies about the frequency spectrum of the luminance signal?

• Abrupt changes in a signal correspond to high frequency components in its spectrum.

• If there are not many abrupt changes in the luminance values, then the amplitude of high frequency components will, in general, be small compared with that of low frequency components.


Page 29: T325: Technologies for digital media

Discrete Cosine Transform

• The DCT is used for JPEG and MPEG coding.
• The DCT is a reversible transform:
• Applied to “n” original samples, it yields “n” amplitude values, and applying the reverse transform to these “n” amplitudes enables one to recover the original sample values.
• But converting “n” original values into “n” new ones does not achieve anything in terms of compression.

Page 30: T325: Technologies for digital media

Discrete Cosine Transform

• If the high-frequency components are sufficiently small, then setting them to zero before carrying out the reverse transform will produce a picture which, to a human observer, is effectively the same as the original one.

This is the essence of the compression process! (A small sketch follows below.)
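
A minimal sketch of this idea, assuming SciPy is available, is shown below for a single row of eight luminance samples: the DCT is exactly reversible, and zeroing the small high-frequency amplitudes before the inverse transform still gives values very close to the original. The sample values and the threshold are invented for illustration.

    import numpy as np
    from scipy.fft import dct, idct   # assumes SciPy is available

    # A row of 8 luminance samples with no abrupt changes (made-up values)
    samples = np.array([140.0, 144.0, 147.0, 150.0, 152.0, 153.0, 153.0, 152.0])

    # Forward DCT: 8 samples -> 8 amplitudes (dc term first, then components
    # at 1, 2, ..., 7 cycles per unit length)
    amps = dct(samples, norm='ortho')

    # Reversible: the inverse DCT recovers the original samples exactly
    print(np.allclose(idct(amps, norm='ortho'), samples))   # True

    # Compression idea: set the small (here, mostly high-frequency)
    # amplitudes to zero before the inverse transform ...
    truncated = np.where(np.abs(amps) < 1.0, 0.0, amps)
    # ... and the recovered row is still effectively the same as the original
    print(np.max(np.abs(idct(truncated, norm='ortho') - samples)))   # small error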

Page 31: T325: Technologies for digital media


Discrete Cosine Transform

• At the transmitter:
• A DCT is applied to sets of “n” picture samples to yield “n” amplitude values.
• In most cases, the majority of the amplitudes are negligible.
• This results in a data set containing many zero values, and such a data set can be compressed and transmitted using far fewer bits than the original samples.
• Only the relatively few values that make a significant contribution to the perceived picture are transmitted directly.

Page 32: T325: Technologies for digital media

The discrete cosine transform

• At the receiving end:
• The reverse DCT is applied to a set of “n” samples consisting of the received samples, together with the appropriate number of zero-amplitude samples.

• Ignoring the low amplitude, high frequency components means that the overall transform process is no longer reversible in a mathematical sense

• BUT this does not matter, so long as the recovered picture is sufficiently like the original one to meet the reproduction quality requirements of the system.

Page 33: T325: Technologies for digital media

The discrete cosine transform

• Figure (a) shows the variation of luminance along part of a picture line of length w.

• The variation with distance, x, along the line obeys a cosine law with one complete ‘period’ of the cosine taking place over distance w.

• The variation of luminance with distance, L1 say, can be represented by the equation L1 = A1 cos(2πx/w), where A1 is the amplitude of the variation.

Page 34: T325: Technologies for digital media

The discrete cosine transform

• The resulting picture is shown as thick line in figure (b).

• Peak white at either end of the line and the darkest region in the middle.

Page 35: T325: Technologies for digital media

The discrete cosine transform

• Figure (c) shows the case when two complete cycles of luminance just fit into length w of the line. The luminance can be expressed as L2 = A2 cos(4πx/w).

• The resulting pattern is shown in Figure (d).

Page 36: T325: Technologies for digital media

The discrete cosine transform

• Taking “w” as our unit of length, we can think of the pattern for L1 as having a spatial frequency of one cycle per unit length and of L2 as having a spatial frequency of two cycles per unit length.

• This idea can be extended to higher frequencies, with luminance components of the form Lr = Ar cos(2πrx/w)
• where r = 1, 2, 3, ... and Lr has spatial frequency “r” cycles per unit length.

Page 37: T325: Technologies for digital media

The discrete cosine transform

• Figure shows examples of five cosine patterns.

• The spatial frequencies for (a) to (d) are 1, 2, 3 and 4 cycles per unit length respectively.

• Figure (e) shows a zero frequency, that is constant luminance, pattern which corresponds to a DC component in terms of the usual frequency spectra.


Page 38: T325: Technologies for digital media

The discrete cosine transform

• By combining spatial luminance patterns in appropriate amounts, that is, by adding components with appropriately chosen amplitudes (so that the rth component is of the form Ar cos(2πrx/w), with amplitude Ar), one can reproduce any luminance pattern.

• In general, abrupt changes in amplitude values for a set of adjacent pixels will be unlikely.

• Because of this, the higher spatial frequency components of the pattern may be negligible and do not need to be transmitted.

• Applying the reverse transform at the receiving end will recover a satisfactory picture despite the absence of the higher components.

Page 39: T325: Technologies for digital media

The discrete cosine transform

• The DCT yields a finite set of discrete frequency components, equal in number to the original number of samples in the segment being analyzed.
• The frequencies are 1, 2, 3, 4, ... times the lowest frequency, together with a zero-frequency (dc) term.

• If a line segment consists of eight samples, the DCT will yield eight amplitudes for components with spatial frequencies of 0, 1, 2, ..., 7 cycles per unit length.

Page 40: T325: Technologies for digital media


The discrete cosine transform – 2D

• A much higher degree of compression can be achieved by applying the DCT simultaneously horizontally (along lines) and vertically (along columns); a small sketch follows after this list.

• This is done by applying a two-dimensional DCT to rectangular 8 × 8 blocks of pixels.

• The two-dimensional DCT applied to the 64 luminance values of an 8 × 8 block yields 64 amplitudes of two-dimensional spatial cosine functions.

• The spatial frequencies range from 0 (dc term) to 7 in both directions.

• Each of these patterns varies as a cosine function in both the horizontal and vertical directions.
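
The sketch below, again assuming SciPy is available, applies a two-dimensional DCT to an 8 × 8 block of invented luminance samples. The 64 output amplitudes are indexed by vertical frequency (rows, 0 to 7) and horizontal frequency (columns, 0 to 7), with the dc term at the top left; the inverse transform recovers the block.

    import numpy as np
    from scipy.fft import dctn, idctn   # assumes SciPy is available

    # An 8x8 block of luminance samples varying smoothly across the block
    x = np.arange(8)
    block = (120.0
             + 10.0 * np.cos(np.pi * x / 8)[None, :]    # horizontal variation
             + 5.0 * np.cos(np.pi * x / 8)[:, None])    # vertical variation

    # 2-D DCT: 64 sample values -> 64 amplitudes of 2-D cosine patterns.
    # amps[v, h]: v = vertical spatial frequency, h = horizontal spatial
    # frequency; amps[0, 0] is the dc term (the block average, scaled).
    amps = dctn(block, norm='ortho')
    print(np.round(amps, 1))

    # The inverse 2-D DCT recovers the original block
    print(np.allclose(idctn(amps, norm='ortho'), block))   # True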

Page 41: T325: Technologies for digital media

The discrete cosine transform

The 64 two dimensional cosine functions.

Page 42: T325: Technologies for digital media

Question

In general, DCTs can be carried out on arrays of n x n amplitude values.

But why is n = 8 chosen?


Page 43: T325: Technologies for digital media

The discrete cosine transform

1. Computation turns out to be more efficient if “n” is a power of 2. So “n” could be chosen to be 2 or 4 or 8 or 16 and so on.

2. The bigger the value of “n”, the more computation is involved and the more time is taken by the transform process.

3. Also, the smaller the value of “n”, the greater the inherent errors in the process.
• These errors show up as differences between the original amplitudes and the amplitudes obtained by using the reverse DCT on the result of carrying out a DCT on the original amplitudes.

• Tests on typical data indicate errors of the order of 5% for a 4 × 4 transform and 1% for an 8 × 8 transform. Beyond this point, the errors drop very slowly with increasing “n”, being of the order of 0.5% for a 256 × 256 transform.
• A 1% luminance error is not really perceptible, whereas a 5% error is. So the 8 × 8 transform is often the optimum choice.

Page 44: T325: Technologies for digital media

The discrete cosine transform

• The DCT computation is much more efficient, and hence faster, if the original block is symmetrical in both the horizontal and vertical directions.

• Thus, if the DCT is to be applied to the block shown in (a) below, the transform which is used is applied to the extended block of (b).


Page 45: T325: Technologies for digital media

The discrete cosine transform

• The extended block to which the DCT is applied has twice the height and twice the width of the original block.

• The original block lies in the top left quarter of the extended block and can be reconstructed by combining the top left quarters of the full two-dimensional cosine functions whose amplitudes have been determined by carrying out the DCT.

Page 46: T325: Technologies for digital media


Fig.: Example of an 8 × 8 DCT. In the figure above, each amplitude applies to a different component, and the way the amplitudes are ordered in the right-hand transform output block is shown to the right.

Page 47: T325: Technologies for digital media


• The output block is organized so that the horizontal frequencies increase from left to right and the vertical frequencies increase from top to bottom.

• The top-left component, with zero vertical and horizontal frequencies, is the dc term which represents the average luminance of the block.

• The minus signs in the DCT output represent phase differences.

Page 48: T325: Technologies for digital media

The discrete cosine transform

• Looking at the DCT output block in the figure above, the dc term A00 = 826, the A20 term = 15 and the A14 term = −2.

• It turns out that each of the components can either be in phase, or 180° out of phase, with any of the others. This is a consequence of applying the transform to a symmetrical block such as (b) above.


Page 49: T325: Technologies for digital media


Thresholding and requantization

• Humans are not very sensitive to fine detail at low luminance levels.

• This allows higher spatial frequency components below a certain magnitude to be eliminated.

• This is known as thresholding.
• The values of components below a certain threshold are each replaced by a zero value.
• Threshold tables are stored in the encoder.

Page 50: T325: Technologies for digital media


Thresholding and requantization

• Also, in general, humans are less sensitive to the contribution of high-frequency components compared with lower ones.

• This is taken into account by using requantization: fewer bits are used for the higher-frequency components which remain after thresholding than for the low-frequency ones.

Page 51: T325: Technologies for digital media


Thresholding and requantization

• A requantization table which is stored in the encoder is used.

• Each amplitude value in the DCT output table is divided by the corresponding number in the quantization table and the result, rounded to the nearest integer, is used to replace the original amplitude.

Page 52: T325: Technologies for digital media


Thresholding and requantization

• Qij is used for the requantization table entries.
• Aij is used for the DCT component amplitudes. Example: Q31 = 24.
• The new quantized values are given by taking the nearest integer to Aij/Qij for each table entry. Example: A12 = 16 and Q12 = 22.

Page 53: T325: Technologies for digital media


Thresholding and requantization

• A12/Q12 = 16/22 = 0.727 and the nearest integer is 1; this is the value of the (1,2) entry in the requantized values table.
• A13 = 9 and Q13 = 22, giving 9/22 = 0.409. The nearest integer is 0, hence the entry at position (1,3). (A small sketch of this step follows below.)
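
A minimal sketch of the requantization step, using only the two table entries quoted above (the rest of the A and Q tables are not shown on the slides, so they are not invented here):

    import numpy as np

    def requantize(A, Q):
        # Divide each DCT amplitude by the corresponding requantization
        # table entry and round to the nearest integer.
        return np.rint(np.asarray(A, dtype=float) / np.asarray(Q, dtype=float)).astype(int)

    # Values quoted on the slides: A12 = 16, Q12 = 22 and A13 = 9, Q13 = 22
    print(requantize([16, 9], [22, 22]))   # [1 0], matching the worked example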

Page 54: T325: Technologies for digital media


Page 55: T325: Technologies for digital media

Zig-zag scan and run-length encoding

• The higher the frequencies, the more zeros there are in the requantized block.

• In order to take advantage of this, the requantized values are rearranged for further processing in the order shown in the figure, which places them in order of ascending frequency for the horizontal and vertical directions combined.

• The result of this is that there are relatively long sequences consisting entirely of zeros.

• In that case, run-length encoding leads to useful compression.


Page 56: T325: Technologies for digital media

Zig-zag scan and run-length encoding

• The dc term is coded separately using differential coding.
• This just involves sending the difference between the value of the dc term and the value of the dc term of the contiguous block that was encoded immediately before.

Page 57: T325: Technologies for digital media

Zig-zag scan and run-length encoding

• For the requantized table values, zig-zag scanning gives 103, followed by two zeros, −2, 1, 1, two zeros, 1, 42 zeros, 1, and finally 12 zeros. Assuming that the difference between the current and previous dc terms was 4, the data after zig-zag scanning and run-length coding would be sent as:
• 4, 2, −2, 0, 1, 0, 1, 2, 1, 42, 1, 12
• Note that run lengths and non-zero amplitude values must alternate in the coded sequence.
• Zero run lengths between contiguous non-zero amplitudes have to be indicated: there are two zeros immediately after the dc term, followed by consecutive non-zero amplitudes of −2, 1 and 1, which are separated by runs of zero length. Hence the initial coding of 4, 2, −2, 0, 1, 0, 1, ... (A sketch of this coding follows below.)
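
The sketch below implements zig-zag scanning and the run-length coding format described above (dc difference first, then alternating zero-run lengths and non-zero amplitudes, with a final bare run of trailing zeros). The 4 × 4 block and the previous dc value are made-up, and details such as end-of-block markers are deliberately left out.

    def zigzag_order(n):
        # Indices of an n x n block in zig-zag order: ascending combined
        # horizontal and vertical frequency, dc term first.
        order = []
        for s in range(2 * n - 1):
            diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
            order.extend(diag if s % 2 else diag[::-1])
        return order

    def run_length_code(block, prev_dc):
        # Sketch of the coding described above: send the dc difference,
        # then alternate (run of zeros, non-zero amplitude); a trailing
        # run of zeros is sent as a bare run length.
        seq = [block[i][j] for i, j in zigzag_order(len(block))]
        out = [seq[0] - prev_dc]          # differential coding of the dc term
        run = 0
        for v in seq[1:]:
            if v == 0:
                run += 1
            else:
                out.extend([run, v])      # zero-run length, then the amplitude
                run = 0
        if run:
            out.append(run)               # run of trailing zeros
        return out

    # Hypothetical 4x4 requantized block (not taken from the slides)
    block = [[5, 2, 0, 0],
             [1, 0, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 0]]
    print(run_length_code(block, prev_dc=3))   # [2, 0, 2, 0, 1, 13]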

Page 58: T325: Technologies for digital media

Activity 5.3

• Assuming that the requantized amplitude of the dc term in the contiguous block coded immediately before was 100, what would be the sequence of numbers transmitted after the requantized amplitude data of Table 5.4 had been run-length encoded?

Page 59: T325: Technologies for digital media

Exercise

• Table 3 below shows a 4 × 4 DCT output block as part of MPEG-2 luminance coding. What would be the sequence of numbers transmitted after requantization using Table 4, zig-zag scanning and run-length encoding? Assume that the amplitude of the dc term in the contiguous block coded immediately before was 770.

Page 60: T325: Technologies for digital media

Huffman coding

• The final step in the coding of single pictures is to use a technique known as Huffman coding.

• Huffman coding uses short code words for the most commonly occurring patterns and longer code words for those patterns that occur less frequently.
• This results in significant compression of the data without any loss of information. (A minimal sketch follows below.)

• The coding for the chrominance is essentially the same as for the luminance, but different quantization tables and Huffman code tables are used, based on the statistics of typical sets of chrominance data and on relevant features of our perception of color.
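
A minimal sketch of the Huffman idea is shown below: symbols that occur more often end up with shorter code words. The symbol names and frequencies are invented; in JPEG and MPEG the Huffman tables are standardized, fixed tables derived from the statistics of typical data rather than being built per picture like this.

    import heapq

    def huffman_code(freqs):
        # Build a Huffman code from {symbol: frequency}: repeatedly merge
        # the two least frequent groups, prefixing '0' and '1' to their codes.
        heap = [(w, i, {sym: ''}) for i, (sym, w) in enumerate(freqs.items())]
        heapq.heapify(heap)
        count = len(heap)   # tie-breaker so tuples never compare the dicts
        while len(heap) > 1:
            w1, _, c1 = heapq.heappop(heap)
            w2, _, c2 = heapq.heappop(heap)
            merged = {s: '0' + c for s, c in c1.items()}
            merged.update({s: '1' + c for s, c in c2.items()})
            heapq.heappush(heap, (w1 + w2, count, merged))
            count += 1
        return heap[0][2]

    # Hypothetical frequencies of coded (run, amplitude) patterns
    freqs = {'(0,1)': 40, '(0,-1)': 35, '(1,1)': 15, '(0,2)': 7, 'EOB': 3}
    print(huffman_code(freqs))   # the most common patterns get the shortest codes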

Page 61: T325: Technologies for digital media


Page 62: T325: Technologies for digital media

Summary of spatial coding

• The coding techniques can be divided into two categories:
• Reversible or lossless coding:
• The exact data can be recovered after decoding.
• Examples: Huffman and run-length encoding.
• The DCT is also effectively reversible, although some errors are, in fact, introduced through rounding and other effects.
• Reversible coding preserves all the information contained in the signal.
• Non-reversible or lossy coding:
• Causes some information to be lost irrecoverably.
• Example: requantization (reduces the number of bits per sample).