Multimedia 1 Overview Chapter 1 2 3-Doc Den Trang 4-133
8/3/2019 Multimedia 1 Overview Chapter 1 2 3-Doc Den Trang 4-133
1/21
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 1
Multimedia Technology
- Overview
  - Introduction
  - Chapter 1: Background of compression techniques
  - Chapter 2: Multimedia technologies
    - JPEG
    - MPEG-1/MPEG-2 Audio & Video
    - MPEG-4
    - MPEG-7 (brief introduction)
    - HDTV (brief introduction)
    - H.261/H.263 (brief introduction)
    - Model-based coding (MBC) (brief introduction)
  - Chapter 3: Some real-world systems
    - CATV systems
    - DVB systems
  - Chapter 4: Multimedia Network
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 2
Introduction
- The importance of multimedia technologies: multimedia is everywhere!
  - On PCs:
    - Real Player, QuickTime, Windows Media.
    - Music and video are freely available on the Internet (mp2, mp3, mp4, asf, mpeg, mov, ra, ram, mid, DivX, etc.)
    - Video/audio conferencing.
    - Webcast / streaming applications.
    - Distance learning (or tele-education).
    - Tele-medicine, tele-xxx (let's imagine!)
  - On TVs and other home electronic devices:
    - DVB-T/DVB-C/DVB-S (Digital Video Broadcasting - Terrestrial/Cable/Satellite) shows MPEG-2's superior quality over traditional analog TV!
    - Interactive TV - Internet applications (mail, Web, e-commerce) on a TV! No need to wait for a PC to start up and shut down!
    - CD/VCD/DVD/MP3 players.
  - Also appearing in handheld devices (3G mobile phones, wireless PDAs)!
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 3
Introduction (2)
- Multimedia networks
  - The Internet was designed in the 1960s for low-speed inter-networks with plain textual applications -> high delay, high jitter.
  - Multimedia applications require drastic modifications of the Internet infrastructure.
  - Many frameworks have been investigated and deployed to support the next-generation multimedia Internet (e.g. IntServ, DiffServ).
  - In the future, all TVs (and PCs) will be connected to the Internet and freely tuned to any of millions of broadcast stations all over the world.
  - At present, multimedia networks run over ATM (almost obsolete) and IPv4; in the future, IPv6 should guarantee QoS (Quality of Service)!
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 4
Chapter 1: Background of compression techniques
- Why compression?
  - For communication: reduce bandwidth in multimedia network applications such as streaming media, Video on Demand (VOD), and Internet phone.
  - For digital storage (VCD, DVD, tape, etc.): reduce size and cost, increase media capacity and quality.
- Compression factor or compression ratio
  - The ratio between the size of the source data and the compressed data (e.g. 10:1).
- Two types of compression:
  - Lossless compression
  - Lossy compression
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 5
Information content and redundancy
- Information rate
  - Entropy is the measure of information content, expressed in bits per source output unit (such as bits/pixel).
  - The more information in the signal, the higher the entropy.
  - Lossy compression reduces entropy, while lossless compression does not.
- Redundancy
  - The difference between the information rate and the bit rate.
  - Usually the information rate is much less than the bit rate.
  - Compression aims to eliminate this redundancy.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 6
Lossless Compression
- The data from the decoder is identical to the source data.
  - Example: archives produced by utilities such as PKZIP or gzip.
  - The compression factor is around 2:1.
- Cannot guarantee a fixed compression ratio -> the output data rate is variable, which causes problems for recording mechanisms or the communication channel.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 7
Lossy Compression
- The data from the expander is not identical to the source data, but the difference cannot be distinguished auditorily or visually.
  - Suitable for audio and video compression.
  - The compression factor is much higher than that of lossless compression (up to 100:1).
- Based on an understanding of psychoacoustic and psychovisual perception.
- Can be forced to operate at a fixed compression factor.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 8
Process of Compression
- Communication (reduces the cost of the data link):
  - Data -> Compressor (coder) -> transmission channel -> Expander (decoder) -> Data'
- Recording (extends playing time in proportion to the compression factor):
  - Data -> Compressor (coder) -> storage device (tape, disk, RAM, etc.) -> Expander (decoder) -> Data
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 9
Sampling and quantization
- Why sampling?
  - Computers cannot process analog signals directly.
- PCM
  - Sample the analog signal at a constant rate and use a fixed number of bits (usually 8 or 16) to represent each sample.
  - bit rate = sampling rate * number of bits per sample
- Quantization
  - Map the sampled analog signal (generally of infinite precision) to discrete levels (finite precision).
  - Represent each discrete level with a number.
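The two formulas above can be sketched in a few lines. This is an illustrative toy, not a production codec; the function names and the full-scale range are my own choices.

```python
import math

def pcm_bit_rate(sampling_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """bit rate = sampling rate * number of bits per sample (* channels)."""
    return sampling_rate_hz * bits_per_sample * channels

def quantize(sample: float, bits: int, full_scale: float = 1.0) -> int:
    """Map an analog value in [-full_scale, +full_scale] to one of 2**bits levels."""
    levels = 2 ** bits
    step = 2 * full_scale / levels          # quantization step size
    level = math.floor((sample + full_scale) / step)
    return min(max(level, 0), levels - 1)   # clamp to the valid level range

# CD audio: 44.1 kHz, 16 bits, stereo -> 1,411,200 bit/s
print(pcm_bit_rate(44_100, 16, 2))
```

Note that quantization is where precision is lost: every analog value inside one step maps to the same level number.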
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 10
Predictive coding
- Prediction
  - Use previous sample(s) to estimate the current sample.
  - For most signals, the difference between the predicted and actual values is small -> we can use a smaller number of bits to code the difference while maintaining the same accuracy!
  - Noise is completely unpredictable.
    - Most codecs require the data to be preprocessed; otherwise they may perform badly when the data contains noise.
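A minimal sketch of the idea, using the simplest possible predictor (the previous sample); real codecs use more elaborate predictors, and the residuals would then be entropy-coded.

```python
from typing import List

def encode_differences(samples: List[int]) -> List[int]:
    """First sample sent as-is; the rest as prediction errors (current - previous)."""
    diffs = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        diffs.append(cur - prev)      # prediction = previous sample
    return diffs

def decode_differences(diffs: List[int]) -> List[int]:
    """Rebuild the signal by accumulating the prediction errors."""
    samples = [diffs[0]]
    for d in diffs[1:]:
        samples.append(samples[-1] + d)
    return samples

signal = [100, 102, 104, 103, 101, 101]
residual = encode_differences(signal)       # [100, 2, 2, -1, -2, 0] -- small values
assert decode_differences(residual) == signal
```

The residuals cluster near zero, so they need fewer bits per value than the raw samples.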
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 11
Statistical coding: the Huffman code
- Assign short codes to the most probable data patterns and long codes to the less frequent data patterns.
- Bit assignment is based on the statistics of the source data.
- The statistics of the data must therefore be known prior to the bit assignment.
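A compact sketch of the Huffman construction described above: repeatedly merge the two least probable entries, prefixing "0"/"1" to the codes inside each. The tie-breaking counter and dictionary representation are implementation choices of this sketch, not part of the algorithm itself.

```python
import heapq
from collections import Counter

def huffman_code(freqs: dict) -> dict:
    """Return a prefix-free code {symbol: bitstring} from symbol frequencies."""
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tick = len(heap)                  # tie-breaker so dicts are never compared
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least probable subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (f1 + f2, tick, merged))
        tick += 1
    return heap[0][2]

freqs = Counter("aaaabbbccd")         # 'a' most probable -> shortest code
code = huffman_code(freqs)
assert len(code["a"]) < len(code["d"])
```

Note the prerequisite from the slide: the frequency table must be known (or estimated) before any bits are assigned.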
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 12
Drawbacks of compression
- Sensitive to data errors
  - Compression eliminates the redundancy that is essential to making data resistant to errors.
- Concealment required for real-time applications
  - An error correction code is required, which adds redundancy back to the compressed data.
- Artifacts
  - Artifacts appear when the coder eliminates part of the entropy.
  - The higher the compression factor, the more visible the artifacts.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 13
A coding example: clustering color pixels
- In an image, pixel values are clustered in several peaks.
- Each cluster represents the color range of one object in the image (e.g. blue sky).
- Coding process:
  1. Separate the pixel values into a limited number of data clusters (e.g. clustered pixels of sky blue or grass green).
  2. Send the average color of each cluster and an identifying number for each cluster as side information.
  3. Transmit, for each pixel:
     - The number of the cluster whose average color it is close to.
     - Its difference from that average cluster color (this can be further coded to reduce redundancy, since the differences are often similar -> prediction!).
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 14
Frame-Differential Coding
- Frame-Differential Coding = prediction from a previous video frame.
- A video frame is stored in the encoder for comparison with the present frame -> causes an encoding latency of one frame time.
- For still images:
  - Data needs to be sent only for the first instance of a frame.
  - All subsequent prediction error values are zero.
  - The frame is retransmitted occasionally to give receivers that have just been turned on a starting point.
- FDC reduces the information for still images, but leaves significant data for moving images (e.g. a movement of the camera).
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 15
Motion Compensated Predictionn More data in Frame-Differential Coding can
be eliminated by comparing the presentpixel to the location of the same objectinthepreviousframe. ( not to thesame spatial location in the previous frame)
n The encoder estimates the motion in theimage to find the corresponding area in aprevious frame.
n The encoder searches for a portion of aprevious frame which is similar to the part
of the new frame to be transmitted.n It then sends (as side information) a
motion vectortelling the decoder whatportion of the previous frame it will use topredict the new frame.
n It also sends the prediction errorso thatthe exact new frame may be reconstituted
n See top figure without motioncompensation Bottom figure Withmotion compensation
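The search described above can be sketched as exhaustive block matching over a small window, scored by the sum of absolute differences (SAD). Plain lists of lists stand in for real frames here; the names and the tiny search range are illustrative.

```python
from typing import List, Tuple

def sad(ref: List[List[int]], cur: List[List[int]],
        ry: int, rx: int, cy: int, cx: int, n: int) -> int:
    """Sum of absolute differences between the n*n block of cur at (cy, cx)
    and the n*n block of ref at (ry, rx)."""
    return sum(abs(ref[ry + i][rx + j] - cur[cy + i][cx + j])
               for i in range(n) for j in range(n))

def best_motion_vector(ref, cur, cy, cx, n, search=2) -> Tuple[int, int]:
    """Search +/-search pixels around (cy, cx) in the reference frame and
    return the (dy, dx) displacement with the lowest SAD."""
    h, w = len(ref), len(ref[0])
    best, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = cy + dy, cx + dx
            if 0 <= ry <= h - n and 0 <= rx <= w - n:
                cost = sad(ref, cur, ry, rx, cy, cx, n)
                if best is None or cost < best:
                    best, best_mv = cost, (dy, dx)
    return best_mv

ref = [[10 * r + c for c in range(6)] for r in range(6)]
cur = [[ref[r][c - 1] if c else 0 for c in range(6)] for r in range(6)]  # shifted right 1 px
assert best_motion_vector(ref, cur, 2, 2, n=2) == (0, -1)
```

The winning (dy, dx) is the motion vector sent as side information; the remaining SAD corresponds to the prediction error that must also be coded.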
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 16
Unpredictable Information
- Information that cannot be predicted from the previous frame:
  1. Scene changes (e.g. the background landscape changes).
  2. Newly uncovered information due to object motion across a background, or at the edges of a panned scene (e.g. a soccer player's face uncovered by a flying ball).
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 17
Dealing with unpredictable information
- Scene changes
  - An intra-coded picture (MPEG I picture) must be sent as a starting point -> requires more data than a predicted picture (P picture).
  - I pictures are sent about twice per second; their timing and sending frequency may be adjusted to accommodate scene changes.
- Uncovered information
  - Handled by the bidirectionally coded type of picture, the B picture.
  - There must be enough frame storage in the system to wait for the later picture that has the desired information.
  - To limit the amount of decoder memory, the encoder stores pictures and sends the required reference pictures before sending the B picture.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 18
Transform Coding
- Converts spatial image pixel values to transform coefficient values.
- The number of coefficients produced is equal to the number of pixels transformed.
- A few coefficients contain most of the energy in a picture -> the coefficients may be further coded by lossless entropy coding.
- The transform process concentrates the energy into particular coefficients (generally the low-frequency coefficients).
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 19
Types of picture transform coding
- Types of picture transform:
  - Discrete Fourier (DFT)
  - Karhunen-Loeve
  - Walsh-Hadamard
  - Lapped orthogonal
  - Discrete Cosine (DCT) - used in MPEG-2!
  - Wavelets - new!
- The differences between transform coding methods:
  - The degree of concentration of energy in a few coefficients.
  - The region of influence of each coefficient in the reconstructed picture.
  - The appearance and visibility of coding noise due to coarse quantization of the coefficients.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 20
DCT Lossy Coding
- Lossless coding cannot achieve high compression ratios (4:1 or less).
- Lossy coding = discarding selected information so that the reproduction is visually or aurally indistinguishable from the source, or has the fewest artifacts.
- Lossy coding can be achieved by:
  - Eliminating some DCT coefficients.
  - Adjusting the quantizing coarseness of the coefficients - better!
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 21
Masking
- Masking makes certain types of coding noise invisible or inaudible due to psychovisual/psychoacoustic effects.
  - In audio, a pure tone masks energy at higher frequencies, and also at lower frequencies (with a weaker effect).
  - In video, high-contrast edges mask random noise.
- Noise introduced at low bit rates is placed in the frequency, spatial, or temporal regions where it is masked.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 22
Variable quantization
- Variable quantization is the main technique of lossy coding -> greatly reduces the bit rate.
- Coarsely quantizes the less significant coefficients in a transform (less noticeable / low energy / less visible or audible).
- Can be applied to a complete signal or to individual frequency components of a transformed signal.
- Variable quantization also controls the instantaneous bit rate in order to:
  - Match the average bit rate to a constant channel bit rate.
  - Prevent buffer overflow or underflow.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 23
Run-Level coding
- "Run-level" coding = coding a run length of zeros followed by a nonzero level.
  - Instead of sending all the zero values individually, the length of the run is sent.
  - Useful for any data with long runs of zeros.
  - Run lengths are easily encoded with a Huffman code.
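A toy (run, level) coder for the scheme just described. The EOB marker and the pair representation are simplifications; real MPEG/JPEG coders map each (run, level) pair to a variable-length codeword.

```python
from typing import List

EOB = ("EOB",)   # illustrative end-of-block marker

def run_level_encode(values: List[int]) -> list:
    """Emit (zeros-before, nonzero-level) pairs; trailing zeros become EOB."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append(EOB)            # the remaining run of zeros is implied
    return pairs

def run_level_decode(pairs: list, length: int) -> List[int]:
    out = []
    for p in pairs:
        if p == EOB:
            break
        run, level = p
        out.extend([0] * run + [level])
    return out + [0] * (length - len(out))   # restore the implied zeros

data = [7, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0]
coded = run_level_encode(data)               # [(0, 7), (3, 3), ('EOB',)]
assert run_level_decode(coded, len(data)) == data
```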
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 24
Key points:
- Compression process
- Quantization & sampling
- Coding:
  - Lossless & lossy coding
  - Frame-Differential Coding
  - Motion Compensated Prediction
  - Variable quantization
  - Run-level coding
- Masking
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 25
Chapter 2: Multimedia technologies
- Roadmap
  - JPEG
  - MPEG-1/MPEG-2 Video
  - MPEG-1 Layer 3 Audio (mp3)
  - MPEG-4
  - MPEG-7 (brief introduction)
  - HDTV (brief introduction)
  - H.261/H.263 (brief introduction)
  - Model-based coding (MBC) (brief introduction)
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 26
JPEG (Joint Photographic Experts Group)
- JPEG encoder
  - Partitions the image into blocks of 8 x 8 pixels.
  - Calculates the Discrete Cosine Transform (DCT) of each block.
  - A quantizer rounds off the DCT coefficients according to the quantization matrix - lossy, but allows for large compression ratios.
  - Produces a series of DCT coefficients using zig-zag scanning.
  - Applies a variable-length code (VLC) to these DCT coefficients.
  - Writes the compressed data stream to an output file (*.jpg or *.jpeg).
- JPEG decoder
  - File input data stream -> variable-length decoder -> IDCT (inverse DCT) -> image
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 27
JPEG Zig-zag scanning
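The zig-zag figure for this slide did not survive the transcript; the traversal it shows can be generated as follows. Coefficients are read along anti-diagonals with alternating direction, so the low frequencies come first and the trailing zeros cluster at the end of the scan.

```python
from typing import List, Tuple

def zigzag_order(n: int = 8) -> List[Tuple[int, int]]:
    """(row, col) visiting order of an n*n block, JPEG zig-zag style."""
    order = []
    for s in range(2 * n - 1):                       # s = row + col (anti-diagonal)
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])  # alternate the direction
    return order

# first few positions of the standard 8x8 scan
assert zigzag_order()[:6] == [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```

Applying this order to a quantized block turns its 2-D layout into the 1-D sequence that run-level coding expects.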
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 28
JPEG - DCT
- The DCT is similar to the Discrete Fourier Transform: it transforms a signal or image from the spatial domain to the frequency domain.
- The DCT requires fewer multiplications than the DFT.
- Input image A:
  - The input image A is N2 pixels wide by N1 pixels high.
  - A(i,j) is the intensity of the pixel in row i and column j.
- Output image B:
  - B(k1,k2) is the DCT coefficient in row k1 and column k2 of the DCT matrix.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 29
JPEG - Quantization Matrix
- The quantization matrix is the 8 x 8 matrix of step sizes (sometimes called quantums) - one element for each DCT coefficient.
- Usually symmetric.
- Step sizes are:
  - Small in the upper left (low frequencies),
  - Large in the lower right (high frequencies).
  - A step size of 1 is the most precise.
- The quantizer divides each DCT coefficient by its corresponding quantum, then rounds to the nearest integer.
- Large quantums drive small coefficients down to zero.
- The result:
  - Many high-frequency coefficients become zero -> easily removed.
  - The low-frequency coefficients undergo only minor adjustment.
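The divide-and-round step is one line per coefficient. The step sizes below are not a standard JPEG table; they are illustrative values chosen so that the low-frequency corner reproduces the quantization result shown on the next slide (1255 -> 78, etc.).

```python
def quantize_block(coeffs, quants):
    """Element-wise coefficient / quantum, rounded to the nearest integer."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, quants)]

coeffs = [[1255, -15], [11, -65]]      # low-frequency corner of a DCT block
quants = [[16, 11], [12, 13]]          # small step sizes for low frequencies
assert quantize_block(coeffs, quants) == [[78, -1], [1, -5]]
```

Dividing by a large step and rounding is exactly what drives small high-frequency coefficients to zero.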
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 30
JPEG - Coding process illustrated

DCT coefficients:
  1255  -15   43   58  -12    1   -4   -6
    11  -65   80  -73  -27   -1   -5    1
   -49   37  -87    8   12    6   10    8
    27  -50   29   13    3   13   -6    5
   -16   21  -11  -10   10  -21    9   -6
     3  -14    0   14  -14   16   -8    4
    -4   -1    8  -13   12   -9    5   -1
    -4    2   -2    6   -7    6   -1    3

Quantization result (after dividing by the quantization matrix Q):
    78   -1    4    4   -1    0    0    0
     1   -5    6   -4   -1    0    0    0
    -4    3   -5    0    0    0    0    0
     2   -3    1    0    0    0    0    0
    -1    1    0    0    0    0    0    0
     0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0

Zig-zag scan result: 78 -1 1 -4 -5 4 4 6 3 2 -1 -3 -5 -4 -1 0 -1 0 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 EOB
-> easily coded by run-length Huffman coding
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 31
MPEG (Moving Picture Experts Group)
- MPEG is the heart of:
  - Digital television set-top boxes
  - HDTV decoders
  - DVD players
  - Video conferencing
  - Internet video, etc.
- MPEG standards:
  - MPEG-1, MPEG-2, MPEG-4, MPEG-7
  - (The MPEG-3 standard was abandoned and became an extension of MPEG-2.)
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 32
MPEG standards
- MPEG-1 (obsolete)
  - A standard for storage and retrieval of moving pictures and audio on storage media.
  - Application: VCD (Video Compact Disc).
- MPEG-2 (widely implemented)
  - A standard for digital television.
  - Applications: DVD (Digital Versatile Disc), HDTV (high-definition TV), DVB (European Digital Video Broadcasting group), etc.
- MPEG-4 (newly implemented, still being researched)
  - A standard for multimedia applications.
  - Applications: Internet, cable TV, virtual studio, etc.
- MPEG-7 (future work, ongoing research)
  - A content representation standard for information search (Multimedia Content Description Interface).
  - Applications: Internet, video search engines, digital libraries.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 33
MPEG-2 formal standards
- The international standard ISO/IEC 13818-2, "Generic Coding of Moving Pictures and Associated Audio Information".
- ATSC (Advanced Television Systems Committee) document A/54, "Guide to the Use of the ATSC Digital Television Standard".
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 34
MPEG video data structure
- The MPEG-2 video data stream is constructed in layers, from lowest to highest:
  - PIXEL is the fundamental unit.
  - BLOCK is an 8 x 8 array of pixels.
  - MACROBLOCK consists of 4 luma blocks and 2 chroma blocks (field DCT coding and frame DCT coding).
  - SLICE consists of a variable number of macroblocks.
  - PICTURE consists of a frame (or field) of slices.
  - GROUP OF PICTURES (GOP) consists of a variable number of pictures.
  - SEQUENCE consists of a variable number of GOPs.
  - PACKETIZED ELEMENTARY STREAM (optional).
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 35
Pixel & Block
- Pixel = "picture element".
  - A discrete spatial point sample of an image.
  - A color pixel may be represented digitally as a number of bits for each of three primary color values.
- Block
  - An 8 x 8 array of pixels.
  - The block is the fundamental unit for DCT (discrete cosine transform) coding.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 36
Macroblock
- A macroblock is a 16 x 16 array of luma (Y) pixels (= 4 blocks in a 2 x 2 block array).
- The number of chroma pixels (Cr, Cb) varies depending on the chroma pixel structure indicated in the sequence header (e.g. 4:2:0).
- The macroblock is the fundamental unit for motion compensation and has motion vector(s) associated with it if it is predictively coded.
- A macroblock is classified as:
  - Field coded (an interlaced frame consists of 2 fields), or
  - Frame coded,
  depending on how the four blocks are extracted from the macroblock.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 37
Slice
- Pictures are divided into slices.
- A slice consists of an arbitrary number of successive macroblocks (going left to right), but is typically an entire row of macroblocks. A slice does not extend beyond one row.
- The slice header carries address information that allows the Huffman decoder to resynchronize at slice boundaries.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 38
Picture
- A source picture is a contiguous rectangular array of pixels.
- A picture may be a complete frame of video ("frame picture") or one of the interlaced fields from an interlaced source ("field picture").
- A field picture does not have any blank lines between its active lines of pixels.
- A coded picture (also called a video access unit) begins with a start code and a header. The header contains:
  - the picture type (I, P, B)
  - temporal reference information
  - the motion vector search range
  - optional user data
- A frame picture consists of:
  - a frame of a progressive source, or
  - a frame (2 spatially interlaced fields) of an interlaced source.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 39
I, P, B Pictures
Encoded pictures are classified into 3 types: I, P, and B.
- I pictures = intra-coded pictures
  - All macroblocks are coded without prediction.
  - Needed to give the receiver a "starting point" for prediction after a channel change, and to recover from errors.
- P pictures = predicted pictures
  - Macroblocks may be coded with forward prediction from references made from previous I and P pictures, or may be intra coded.
- B pictures = bidirectionally predicted pictures
  - Macroblocks may be coded with forward prediction from a previous I or P reference.
  - Macroblocks may be coded with backward prediction from the next I or P reference.
  - Macroblocks may be coded with interpolated prediction from past and future I or P references.
  - Macroblocks may be intra coded (no prediction).
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 40
Group of pictures (GOP)
- The group of pictures layer is optional in MPEG-2.
- A GOP begins with a start code and a header.
- The header carries:
  - time code information
  - editing information
  - optional user data
- The first encoded picture in a GOP is always an I picture.
- A typical length is 15 pictures, with the following structure (in display order):
  - I B B P B B P B B P B B P B B -> provides an I picture frequently enough to allow a decoder to start decoding correctly.

[Figure: a GOP timeline showing forward and bidirectional motion compensation between the I, P, and B pictures.]
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 41
Sequence
- A sequence begins with a unique 32-bit start code followed by a header.
- The header carries:
  - picture size
  - aspect ratio
  - frame rate and bit rate
  - optional quantizer matrices
  - required decoder buffer size
  - chroma pixel structure
  - optional user data
- The sequence information is needed for channel changing.
- The sequence length depends on the acceptable channel-change delay.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 42
Packetized Elementary Stream (PES)
- A video elementary stream (video ES) consists of all the video data for a sequence, including the sequence header and all the subparts of a sequence.
- An ES carries only one type of data (video or audio) from a single video or audio encoder.
- A PES consists of a single ES which has been split into packets, each starting with an added packet header.
- A PES stream contains only one type of data from one source, e.g. from one video or audio encoder.
- PES packets have variable length, not corresponding to the fixed length of transport packets, and may be much longer than a transport packet.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 43
Transport stream
- Transport packets (of fixed length) are formed from a PES stream:
  - The PES header is placed immediately after the transport packet header.
  - Successive transport packet payloads are filled with the remaining PES packet content until the PES packet is all used.
  - The final transport packet is filled to a fixed length by stuffing with 0xFF bytes (all ones).
- Each PES packet header includes:
  - An 8-bit stream ID identifying the source of the payload.
  - Timing references: the PTS (presentation time stamp), the time at which a decoded audio or video access unit is to be presented by the decoder.
  - The DTS (decoding time stamp), the time at which an access unit is decoded by the decoder.
  - The ESCR (elementary stream clock reference).
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 44
Intra Frame Coding
- Intra coding is concerned only with information within the current frame (not relative to any other frame in the video sequence).
- The MPEG intra-frame coding block diagram (see bottom figure) is similar to JPEG (let's review the JPEG coding mechanism!).
- Basic blocks of the intra-frame coder:
  - Video filter
  - Discrete cosine transform (DCT)
  - DCT coefficient quantizer
  - Run-length/amplitude variable-length coder (VLC)
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 45
Video Filter
- The Human Visual System (HVS) is:
  - Most sensitive to changes in luminance,
  - Less sensitive to variations in chrominance.
- MPEG uses the YCbCr color space instead of RGB to represent the data values, where:
  - Y is the luminance signal,
  - Cb is the blue color-difference signal,
  - Cr is the red color-difference signal.
- What are the 4:4:4, 4:2:0, etc., video formats?
  - 4:4:4 is full-bandwidth YCbCr video: each macroblock consists of 4 Y blocks, 4 Cb blocks, and 4 Cr blocks -> a waste of bandwidth!
  - 4:2:0 is most commonly used in MPEG-2.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 46
Applications of chroma formats

chroma_format     | Multiplex order (time) within macroblock | Application
4:4:4 (12 blocks) | Y Y Y Y Cb Cr Cb Cr Cb Cr Cb Cr          | Computer graphics
4:2:2 (8 blocks)  | Y Y Y Y Cb Cr Cb Cr                      | Studio production environments, professional editing equipment
4:2:0 (6 blocks)  | Y Y Y Y Cb Cr                            | Mainstream television, consumer entertainment
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 47
MPEG Profiles & levels
- MPEG-2 is classified into several profiles.
- Main Profile features:
  - 4:2:0 chroma sampling format
  - I, P, and B pictures
  - Non-scalable
- Main Profile is subdivided into levels:
  - MP@ML (Main Profile @ Main Level):
    - Designed around the CCIR 601 standard for interlaced standard digital video.
    - 720 x 576 (PAL) or 720 x 483 (NTSC)
    - 30 Hz progressive, 60 Hz interlaced
    - Maximum bit rate is 15 Mbit/s
  - MP@HL (Main Profile @ High Level), upper bounds:
    - 1152 x 1920, 60 Hz progressive
    - 80 Mbit/s
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 48
MPEG encoder/ decoder
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 49
Prediction
- Backward prediction is done by storing pictures until the desired anchor picture is available before encoding the current stored frames.
- The encoder can decide to use:
  - forward prediction from a previous picture,
  - backward prediction from a following picture,
  - or interpolated prediction,
  to minimize the prediction error.
- The encoder must transmit pictures in an order different from that of the source pictures, so that the decoder has the anchor pictures before decoding predicted pictures. (See next slide.)
- The decoder must have two frames stored.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 50
I P B Picture Reordering
- Pictures are coded and decoded in a different order than they are displayed, due to the bidirectional prediction of B pictures.
- For example, with a 12-picture GOP:
- Source order and encoder input order:
  - I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)
- Encoding order and order in the coded bitstream:
  - I(1) P(4) B(2) B(3) P(7) B(5) B(6) P(10) B(8) B(9) I(13) B(11) B(12)
- Decoder output order and display order (same as input):
  - I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)
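The reordering rule above can be sketched directly: hold each B picture back until the anchor (I or P) it predicts from has been sent. String labels stand in for real pictures; this ignores open-GOP subtleties.

```python
def coded_order(display: list) -> list:
    """Reorder display-order picture labels into coded/bitstream order."""
    out, pending_b = [], []
    for pic in display:
        if pic.startswith("B"):
            pending_b.append(pic)      # B pictures wait for their next anchor
        else:
            out.append(pic)            # send the anchor (I or P) first...
            out.extend(pending_b)      # ...then the Bs that reference it
            pending_b = []
    return out + pending_b

display = ["I1", "B2", "B3", "P4", "B5", "B6", "P7"]
assert coded_order(display) == ["I1", "P4", "B2", "B3", "P7", "B5", "B6"]
```

This reproduces the slide's bitstream order: each anchor jumps ahead of the B pictures that are displayed before it.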
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 51
DCT and IDCT formulas
- DCT: Eq. 1 (normal form), Eq. 2 (matrix form).
- IDCT: Eq. 3 (normal form), Eq. 4 (matrix form).
- Where:
  - F(u,v) = the two-dimensional N x N DCT.
  - u, v, x, y = 0, 1, 2, ..., N-1.
  - x, y are spatial coordinates in the sample domain.
  - u, v are frequency coordinates in the transform domain.
  - C(u), C(v) = 1/sqrt(2) for u, v = 0.
  - C(u), C(v) = 1 otherwise.
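The referenced equations were in a figure that did not survive the transcript. A standard reconstruction of the normal-form pair, consistent with the definitions listed (with f(x,y) denoting the spatial-domain samples, a symbol the slide does not name explicitly), is:

```latex
F(u,v) = \frac{2}{N}\,C(u)\,C(v)
  \sum_{x=0}^{N-1}\sum_{y=0}^{N-1} f(x,y)\,
  \cos\frac{(2x+1)u\pi}{2N}\,
  \cos\frac{(2y+1)v\pi}{2N}

f(x,y) = \frac{2}{N}
  \sum_{u=0}^{N-1}\sum_{v=0}^{N-1} C(u)\,C(v)\,F(u,v)\,
  \cos\frac{(2x+1)u\pi}{2N}\,
  \cos\frac{(2y+1)v\pi}{2N}
```

The IDCT is the same sum with the roles of f and F exchanged and C(u)C(v) moved inside the summation, which is why the transform is its own (scaled) inverse.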
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 52
DCT versus DFT
- The DCT is conceptually similar to the DFT, except:
  - The DCT concentrates energy into low-order coefficients better than the DFT.
  - The DCT is purely real; the DFT is complex (magnitude and phase).
  - A DCT operation on a block of pixels produces coefficients that are similar to the frequency-domain coefficients produced by a DFT operation.
- An N-point DCT has the same frequency resolution as a 2N-point DFT.
- The N frequencies of a 2N-point DFT correspond to N points on the upper half of the unit circle in the complex frequency plane.
  - Assuming a periodic input, the magnitude of the DFT coefficients is spatially invariant (the phase of the input does not matter). This is not true for the DCT.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 53
Quantization matrix
- Note that the quantization step sizes are:
  - Small in the upper left (low frequencies),
  - Large in the lower right (high frequencies).
  Recall the JPEG mechanism!
- Why?
  - The HVS is less sensitive to errors in high-frequency coefficients than it is for lower frequencies.
  - -> Higher frequencies should be more coarsely quantized!
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 54
Result DCT matrix (example)
- After adaptive quantization, the result is a matrix containing many zeros.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 55
MPEG scanning
- Left: zig-zag scanning (like JPEG).
- Right: alternate scanning - better for interlaced frames!
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 56
Huffman / Run-Level Coding
- Huffman coding, in combination with run-level coding and zig-zag scanning, is applied to the quantized DCT coefficients.
- "Run-level" = a run length of zeros followed by a nonzero level.
- Huffman coding is also applied to various types of side information.
- A Huffman code is an entropy code that achieves the shortest possible average code word length for a source.
- This average code word length is >= the entropy of the source.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 57
Huffman / Run-Level coding illustrated
- Using the DCT output matrix from the previous slide, the zig-zag scanned output is the sequence: 4, 4, 2, 2, 2, 1, 1, 1, 1, 0 (12 zeros), 1, 0 (41 zeros).
- These values are looked up in a fixed table of variable-length codes:
  - The most probable occurrence is given a relatively short code,
  - The least probable occurrence is given a relatively long code.

Zero Run-Length | Amplitude    | MPEG Code Value
N/A             | 8 (DC value) | 110 1000
0               | 4            | 0000 1100
0               | 4            | 0000 1100
0               | 2            | 0100 0
0               | 2            | 0100 0
0               | 2            | 0100 0
0               | 1            | 110
0               | 1            | 110
0               | 1            | 110
0               | 1            | 110
12              | 1            | 0010 0010 0
EOB             | EOB          | 10
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 58
Huffman / Run-Level coding illustrated (2)
- The first run of 12 zeros has been efficiently coded in only 9 bits.
- The last run of 41 zeros has been entirely eliminated, represented only by a 2-bit End Of Block (EOB) indicator.
- The quantized DCT coefficients are now represented by a sequence of 61 bits (see the table).
- Considering that the original 8 x 8 block of 8-bit pixels required 512 bits for full representation, the compression ratio is approx. 8.4:1.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 59
MPEG Data Transport
- MPEG packages all data into fixed-size 188-byte packets for transport.
- Video or audio payload data is first placed in PES packets, then broken up into fixed-length transport packet payloads.
- A PES packet may be much longer than a transport packet -> requires segmentation:
  - The PES header is placed immediately following a transport header.
  - Successive portions of the PES packet are then placed in the payloads of transport packets.
  - Remaining space in the final transport packet payload is filled with stuffing bytes = 0xFF (all ones).
  - Each transport packet starts with a sync byte = 0x47.
  - In the ATSC US terrestrial DTV VSB transmission system, the sync byte is not processed, but is replaced by a different sync symbol especially suited to RF transmission.
  - The transport packet header contains a 13-bit PID (packet ID), which corresponds to a particular elementary stream of video, audio, or another program element.
  - PID 0x0000 is reserved for transport packets carrying a Program Association Table (PAT).
  - The PAT points to a Program Map Table (PMT), which points to the particular elements of a program.
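The segmentation described above can be sketched as follows. The 4-byte header here is deliberately simplified (sync byte plus PID only); a real transport header also carries continuity counters, flags, and uses an adaptation field rather than payload stuffing in some cases.

```python
SYNC, PACKET_SIZE, HEADER_SIZE = 0x47, 188, 4

def packetize(pes: bytes, pid: int) -> list:
    """Break one PES packet into fixed 188-byte transport packets,
    stuffing the final payload with 0xFF bytes."""
    payload_size = PACKET_SIZE - HEADER_SIZE
    packets = []
    for i in range(0, len(pes), payload_size):
        chunk = pes[i:i + payload_size]
        chunk += b"\xff" * (payload_size - len(chunk))       # stuffing bytes
        header = bytes([SYNC, (pid >> 8) & 0x1F, pid & 0xFF, 0])  # simplified header
        packets.append(header + chunk)
    return packets

pkts = packetize(b"\x00" * 400, pid=0x31)
assert all(len(p) == 188 and p[0] == 0x47 for p in pkts)
assert len(pkts) == 3 and pkts[-1].endswith(b"\xff")         # last packet is stuffed
```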
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 60
MPEG Transport packet
n Adaptation Field:
q 8 bits specifying the length of the adaptation field.
q The first group of flags consists of eight 1-bit flags:
q discontinuity_indicator
q random_access_indicator
q elementary_stream_priority_indicator
q PCR_flag
q OPCR_flag
q splicing_point_flag
q transport_private_data_flag
q adaptation_field_extension_flag
q The optional fields are present if indicated by one of the preceding flags.
q The remainder of the adaptation field is filled with stuffing bytes (0xFF, all ones).
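The flags byte can be decoded mechanically; a sketch (the flag names follow the list above, MSB first; the helper name is my own):

```python
# Flag bit positions in the adaptation-field flags byte (MSB first),
# following the order listed above.
ADAPTATION_FLAGS = [
    "discontinuity_indicator",               # bit 7
    "random_access_indicator",               # bit 6
    "elementary_stream_priority_indicator",  # bit 5
    "PCR_flag",                              # bit 4
    "OPCR_flag",                             # bit 3
    "splicing_point_flag",                   # bit 2
    "transport_private_data_flag",           # bit 1
    "adaptation_field_extension_flag",       # bit 0
]

def parse_adaptation_flags(flags_byte: int) -> dict:
    """Map each named flag to its bit value in the flags byte."""
    return {name: bool(flags_byte & (1 << (7 - i)))
            for i, name in enumerate(ADAPTATION_FLAGS)}

# 0x10 sets only bit 4, i.e. only the PCR_flag.
assert parse_adaptation_flags(0x10)["PCR_flag"]
```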
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 61
Demultiplexing a Transport Stream (TS)
n Demultiplexing a transport stream involves:
1. Finding the PAT by selecting packets with PID = 0x0000
2. Reading the PIDs for the PMTs
3. Reading the PIDs for the elements of a desired program from its PMT (for example, a basic program will have a PID for audio and a PID for video)
4. Detecting packets with the desired PIDs and routing them to the decoders
n An MPEG-2 transport stream can carry:
q Video streams
q Audio streams
q Any type of data
n MPEG-2 TS is the packet format for CATV downstream data communication.
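The routing step above can be sketched as a simple PID filter (a minimal sketch; it assumes packet-aligned input and ignores the adaptation field, and a real demultiplexer would also parse the PAT/PMT to discover the PIDs):

```python
def demux(ts: bytes, wanted_pids: set) -> dict:
    """Route transport-packet payloads to per-PID buffers (a sketch)."""
    streams = {pid: bytearray() for pid in wanted_pids}
    for i in range(0, len(ts) - 187, 188):
        packet = ts[i:i + 188]
        if packet[0] != 0x47:
            continue  # out of sync; a real demux would resynchronize
        pid = ((packet[1] & 0x1F) << 8) | packet[2]
        if pid in streams:
            streams[pid] += packet[4:]  # payload after the 4-byte header
    return streams
```

For example, feeding it packets on PIDs 0x100 and 0x101 while asking only for 0x100 collects only the 184-byte payloads of the 0x100 packets.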
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 62
Timing & buffer control
n Point A: Encoder input - constant/specified rate
n Point B: Encoder output - variable rate
n Point C: Encoder buffer output - constant rate
n Point D: Communication channel + decoder buffer - constant rate
n Point E: Decoder input - variable rate
n Point F: Decoder output - constant/specified rate
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 63
Timing - Synchronization
n The decoder is synchronized with the encoder by time stamps.
n The encoder contains a master oscillator and counter, called the System Time Clock (STC). (See previous block diagram.)
q The STC belongs to a particular program and is the master clock of the video and audio encoders for that program.
q Multiple programs, each with its own STC, can also be multiplexed into a single stream.
n A program component can even have no time stamps, but then it cannot be synchronized with other components.
n At encoder input (Point A), the time of occurrence of an input video picture or audio block is noted by sampling the STC.
n A total delay of encoder and decoder buffer (constant) is added to the STC, creating a Presentation Time Stamp (PTS).
q The PTS is then inserted in the first of the packet(s) representing that picture or audio block, at Point B.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 64
Timing - Synchronization (2)
n A Decode Time Stamp (DTS) can optionally be inserted into the bit stream; it represents the time at which the data should be taken instantaneously from the decoder buffer and decoded.
q DTS and PTS are identical except in the case of picture reordering for B pictures.
q The DTS is only used where it is needed because of reordering. Whenever DTS is used, PTS is also coded.
q Maximum PTS (or DTS) insertion interval = 700 ms.
q In ATSC, a PTS (or DTS) must be inserted at the beginning of each coded picture (access unit).
n In addition, the output of the encoder buffer (Point C) is time stamped with System Time Clock (STC) values, called:
q System Clock Reference (SCR) in a Program Stream.
q Program Clock Reference (PCR) in a Transport Stream.
n Maximum PCR time stamp interval = 100 ms.
n Maximum SCR time stamp interval = 700 ms.
n PCR and/or SCR are used to synchronize the decoder STC with the encoder STC.
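The clock relationship behind these time stamps can be illustrated with a small conversion helper (a sketch; the 90 kHz PTS/DTS tick rate and the 27 MHz PCR resolution are those defined by MPEG-2 Systems, while the function names are my own):

```python
PTS_CLOCK_HZ = 90_000           # PTS/DTS ticks run at 90 kHz
PCR_CLOCK_HZ = 27_000_000       # full PCR resolution is 27 MHz

def pts_to_seconds(pts_ticks: int) -> float:
    """Convert a PTS/DTS value (90 kHz ticks) to seconds."""
    return pts_ticks / PTS_CLOCK_HZ

def pcr_to_seconds(pcr_base: int, pcr_ext: int) -> float:
    """Convert a PCR (90 kHz base plus 27 MHz extension) to seconds."""
    return (pcr_base * 300 + pcr_ext) / PCR_CLOCK_HZ

print(pts_to_seconds(90_000))     # 1.0
print(pcr_to_seconds(90_000, 0))  # 1.0
```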
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 65
Timing - Synchronization (3)
n All video and audio streams included in a program must get their time stamps from a common STC so that synchronization of the video and audio decoders with each other can be accomplished.
n The data rate and packet rate on the channel (at the multiplexer output) can be completely asynchronous with the System Time Clock (STC).
n PCR time stamps allow synchronization of different multiplexed programs having different STCs, while allowing STC recovery for each program.
n If there is no buffer underflow or overflow, delays in the buffers and transmission channel for both video and audio are constant.
n The encoder input and decoder output run at equal and constant rates.
n Fixed end-to-end delay from encoder input to decoder output.
n If exact synchronization is not required, the decoder clock can be free running; video frames can be repeated or skipped as necessary to prevent buffer underflow or overflow, respectively.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 66
HDTV (High definition television)
n High definition television (HDTV) first came to public attention in 1981, when NHK, the Japanese broadcasting authority, first demonstrated it in the United States.
n HDTV is defined by the ITU-R as:
q 'A system designed to allow viewing at about three times the picture height, such that the system is virtually, or nearly, transparent to the quality or portrayal that would have been perceived in the original scene ... by a discerning viewer with normal visual acuity.'
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 67
HDTV (2)
n HDTV proposals are for a screen which is wider than the conventional TV image by about 33%. It is generally agreed that the HDTV aspect ratio will be 16:9, as opposed to the 4:3 ratio of conventional TV systems. This ratio has been chosen because psychological tests have shown that it best matches the human visual field.
n It also enables use of existing cinema film formats as additional source material, since this is the same aspect ratio used in normal 35 mm film. Figure 16.6(a) shows how the aspect ratio of HDTV compares with that of conventional television, using the same resolution, or the same surface area, as the comparison metric.
n To achieve the improved resolution the video image used in HDTV must contain over 1000 lines, as opposed to the 525 and 625 provided by the existing NTSC and PAL systems. This gives a much improved vertical resolution. The exact value is chosen to be a simple multiple of one or both of the vertical resolutions used in conventional TV.
n However, due to the higher scan rates the bandwidth requirement for analogue HDTV is approximately 12 MHz, compared to the nominal 6 MHz of conventional TV.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 68
HDTV (3)
n The introduction of a non-compatible TV transmission format for HDTV would require the viewer either to buy a new receiver, or to buy a converter to receive the picture on their old set.
n The initial thrust in Japan was towards an HDTV format which is compatible with conventional TV standards, and which can be received by conventional receivers, with conventional quality. However, to get the full benefit of HDTV, a new wide screen, high resolution receiver has to be purchased.
n One of the principal reasons that HDTV is not already common is that a general standard has not yet been agreed. The 26th CCIR plenary assembly recommended the adoption of a single, worldwide standard for high definition television.
n Unfortunately, Japan, Europe and North America are all investing significant time and money in their own systems based on their own, current, conventional TV standards and other national considerations.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 69
H261- H263
n The H.261 algorithm was developed for the purpose of image transmission rather than image storage.
n It is designed to produce a constant output of p x 64 kbit/s, where p is an integer in the range 1 to 30.
q This allows transmission over a digital network or data link of varying capacity.
q It also allows transmission over a single 64 kbit/s digital telephone channel for low quality video-telephony, or at higher bit rates for improved picture quality.
n The basic coding algorithm is similar to that of MPEG in that it is a hybrid of motion compensation, DCT and straightforward DPCM (intra-frame coding mode), without the MPEG I, P, B frames.
n The DCT operation is performed at a low level on 8 x 8 blocks of error samples from the predicted luminance pixel values, with sub-sampled blocks of chrominance data.
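The p x 64 kbit/s rule is trivial to express in code (a sketch; the function name is my own):

```python
def h261_rate_kbit(p: int) -> int:
    """Output rate of H.261 at multiplier p (1 <= p <= 30), in kbit/s."""
    if not 1 <= p <= 30:
        raise ValueError("p must be in the range 1 to 30")
    return p * 64

# p = 1 gives 64 kbit/s (a single digital telephone channel);
# p = 30 gives 1920 kbit/s, close to the 2 Mbit/s conferencing figure.
print(h261_rate_kbit(1))   # 64
print(h261_rate_kbit(30))  # 1920
```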
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 70
H261-H263 (2)
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 71
H261-H263 (3)
n H.261 is widely used on 176 x 144 pixel images.
n The ability to select a range of output rates for the algorithm allows it to be used in different applications.
n Low output rates (p = 1 or 2) are only suitable for face-to-face (videophone) communication. H.261 is thus the standard used in many commercial videophone systems such as the UK BT/Marconi Relate 2000 and the US AT&T 2500 products.
n Video-conferencing would require a greater output data rate (p > 6) and might go as high as 2 Mbit/s for high quality transmission with larger image sizes.
n A further development of H.261 is H.263, for lower fixed transmission rates.
n This deploys arithmetic coding in place of the variable length coding (see the H.261 diagram); with other modifications, the data rate is reduced to only 20 kbit/s.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 72
Model Based Coding (MBC)
n At the very low bit rates (20 kbit/s or less) associated with video telephony, the requirements for image transmission stretch the compression techniques described earlier to their limits.
n In order to achieve the necessary degree of compression they often require reduction in spatial resolution or even the elimination of frames from the sequence.
n Model based coding (MBC) attempts to exploit a greater degree of redundancy in images than current techniques, in order to achieve significant image compression but without adversely degrading the image content information.
n It relies upon the fact that image quality is largely subjective. Providing that the appearance of scenes within an observed image is kept at a visually acceptable level, it may not matter that the observed image is not a precise reproduction of reality.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 73
Model Based Coding (2)
n One MBC method for producing an artificial image of a head sequence utilizes a feature codebook, where a range of facial expressions, sufficient to create an animation, are generated from sub-images or templates which are joined together to form a complete face.
n The most important areas of a face, for conveying an expression, are the eyes and mouth; hence the objective is to create an image in which the movement of the eyes and mouth is a convincing approximation to the movements of the original subject.
n When forming the synthetic image, the feature template vectors which form the closest match to those of the original moving sequence are selected from the codebook and then transmitted as low bit rate coded addresses.
n By using only 10 eye and 10 mouth templates, for instance, a total of 100 combinations exists, implying that only a 7-bit codebook address need be transmitted.
n It has been found that there are only 13 visually distinct mouth shapes for vowel and consonant formation during speech.
n However, the number of mouth sub-images is usually increased, to include intermediate expressions and hence avoid step changes in the image.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 74
Model Based Coding (3)
n Another common way of representing objects in three-dimensional computer graphics is by a net of interconnecting polygons.
n A model is stored as a set of linked arrays which specify the coordinates of each polygon vertex, with the lines connecting the vertices together forming each side of a polygon.
n To make realistic models, the polygon net can be shaded to reflect the presence of light sources.
n The wire-frame model [Welch 1991] can be modified to fit the shape of a person's head and shoulders. The wire-frame, composed of over 100 interconnecting triangles, can produce subjectively acceptable synthetic images, providing that the frame is not rotated by more than 30° from the full-face position.
n The model (see the Figure) uses smaller triangles in areas associated with high degrees of curvature where significant movement is required.
n Large flat areas, such as the forehead, contain fewer triangles.
n A second wire-frame is used to model the mouth interior.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 75
Model based coding (4)
n A synthetic image is created by texture mapping detail from an initial full-face source image over the wire-frame. Facial movement can be achieved by manipulation of the vertices of the wire-frame.
n Head rotation requires the use of simple matrix operations upon the coordinate array. Facial expression requires the manipulation of the features controlling the vertices.
n This model based feature codebook approach suffers from the drawback of codebook formation.
n This has to be done off-line and, consequently, the image is required to be prerecorded, with a consequent delay.
n However, the actual image sequence can be sent at a very low data rate. For a codebook with 128 entries, where 7 bits are required to code each mouth, a 25 frame/s sequence requires less than 200 bit/s to code the mouth movements.
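The codebook arithmetic can be checked directly (a sketch; note that 100 combinations need ceil(log2 100) = 7 address bits):

```python
import math

def codebook_address_bits(entries: int) -> int:
    """Bits needed to address a codebook with the given number of entries."""
    return math.ceil(math.log2(entries))

# 10 eye x 10 mouth templates -> 100 combined expressions -> 7-bit address
assert codebook_address_bits(10 * 10) == 7

# A 128-entry mouth codebook needs 7 bits per frame; at 25 frames/s the
# mouth movements cost 25 * 7 = 175 bit/s, i.e. under 200 bit/s.
mouth_bitrate = 25 * codebook_address_bits(128)
print(mouth_bitrate)  # 175
```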
n When it is finally implemented, rates as low as 1 kbit/s are confidently expected from MBC systems, but they can only transmit image sequences which match the stored model, e.g. head and shoulders displays.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 76
Key points:
n JPEG coding mechanism: DCT / Zigzag Scanning / Adaptive Quantization / VLC
n MPEG layered structure:
q Pixel, Block, Macroblock, Field DCT Coding / Frame DCT Coding, Slice, Picture, Group of Pictures (GOP), Sequence, Packetized Elementary Stream (PES)
n MPEG compression mechanism:q Predictionq Motion compensationq Scanningq YCbCr formats (4:4:4, 4:2:0, etc)
q Profiles @ Levels
q I,P,B pictures & reordering
q Encoder/ Decoder process & Block diagram
n MPEG Data transport
n MPEG Timing & Buffer control
q STC/SCR/DTS
q PCR/PTS
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 77
Technical terms
n Macroblocks
n HVS = Human Visual System
n GOP = Group of Pictures
n VLC = Variable Length Coding/Coder
n IDCT/DCT = (Inverse) Discrete Cosine Transform
n PES = Packetized Elementary Stream
n MP@ML = Main Profile @ Main Level
n PCR = Program Clock Reference
n SCR = System Clock Reference
n STC = System Time Clock
n PTS = Presentation Time Stamp
n DTS = Decode Time Stamp
n PAT = Program Association Table
n PMT = Program Map Table
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 78
Chapter 3. CATV systems
n Overview:
q A brief history
q Modern CATV networks
q CATV systems and equipment
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 79
A Brief History:
q CATV appeared in the 60s in the US, where high buildings are great obstacles to the propagation of TV signals.
q Old CATV networks
n Coaxial only
n Tree-and-Branch only
n TV only
n No return path (high-pass filters are installed in customers' houses to block low-frequency return-path noise)
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 80
Modern CATV networks
n Key elements:
q CO or Master Headend
q Headends/Hubs
q Server complex
q CMTS
q TV content provider
q Optical Nodes
q Taps
q Amplifiers (GNA/TNA/LE)
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 81
Modern CATV networks (2)
n Based on Hybrid Fiber-Coaxial architecture, also referred to as HFC networks.
n The optical section is based on modern optical communication technologies:
q Star/ring/mesh, etc. topologies
q SDH/SONET for digital fibers
q Various architectures: digital, analog or mixed fiber cabling systems.
n Part of the forward path spectrum is used for high-speed Internet access.
n The return path is exploited for digital data communication, which is the root of new problems !!
q 5-60 MHz band for upstream
q 88-860 MHz band for downstream
n 88-450 MHz for analog/digital TV channels
n 450-860 MHz for Internet access
q FDM
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 82
Spectrum allocation of CATV networks
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 83
CATV systems and equipment
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 84
Vocabulary
n Perception = Su nhan thuc (awareness, the act of perceiving)
n Lap = Phu len (to cover over, to overlap)