Multimedia 1 Overview Chapter 1 2 3-Doc Den Trang 4-133


  • 8/3/2019 Multimedia 1 Overview Chapter 1 2 3-Doc Den Trang 4-133


    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 1

    Multimedia Technology

    - Overview
      - Introduction
      - Chapter 1: Background of compression techniques
      - Chapter 2: Multimedia technologies
        - JPEG
        - MPEG-1/MPEG-2 Audio & Video
        - MPEG-4
        - MPEG-7 (brief introduction)
        - HDTV (brief introduction)
        - H.261/H.263 (brief introduction)
        - Model-based coding (MBC) (brief introduction)
      - Chapter 3: Some real-world systems
        - CATV systems
        - DVB systems
      - Chapter 4: Multimedia Network

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 2

    Introduction

    - The importance of multimedia technologies: multimedia is everywhere!
      - On PCs:
        - RealPlayer, QuickTime, Windows Media.
        - Music and video are freely available on the Internet (mp2, mp3, mp4, asf, mpeg, mov, ra, ram, mid, DivX, etc.).
        - Video/audio conferencing.
        - Webcast / streaming applications.
        - Distance learning (or tele-education).
        - Tele-medicine.
        - Tele-xxx (let's imagine!)
      - On TVs and other home electronic devices:
        - DVB-T/DVB-C/DVB-S (Digital Video Broadcasting, Terrestrial/Cable/Satellite) shows the superior quality of MPEG-2 over traditional analog TV.
        - Interactive TV: Internet applications (mail, Web, e-commerce) on a TV, with no need to wait for a PC to start up and shut down.
        - CD/VCD/DVD/MP3 players.
      - Also appearing in handheld devices (3G mobile phones, wireless PDAs).

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 3

    Introduction (2)

    - Multimedia network
      - The Internet was designed in the 1960s for low-speed inter-networks carrying plain textual applications, so it has high delay and high jitter.
      - Multimedia applications require drastic modifications of the Internet infrastructure.
      - Many frameworks have been investigated and deployed to support the next-generation multimedia Internet (e.g. IntServ, DiffServ).
      - In the future, all TVs (and PCs) will be connected to the Internet and freely tuned to any of millions of broadcast stations all over the world.
      - At present, multimedia networks run over ATM (almost obsolete) and IPv4; in the future, IPv6 should guarantee QoS (Quality of Service).

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 4

    Chapter 1: Background of compression


    - Why compression?
      - For communication: reduce bandwidth in multimedia network applications such as streaming media, Video-on-Demand (VOD), and Internet phone.
      - For digital storage (VCD, DVD, tape, etc.): reduce size and cost, increase media capacity and quality.
    - Compression factor, or compression ratio
      - The ratio between the size of the source data and the size of the compressed data (e.g. 10:1).
    - Two types of compression:
      - Lossless compression
      - Lossy compression



    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 5

    Information content and redundancy

    - Information rate
      - Entropy is the measure of information content.
        - Expressed in bits per source output unit (such as bits/pixel).
      - The more information in the signal, the higher the entropy.
      - Lossy compression reduces entropy, while lossless compression does not.
    - Redundancy
      - The difference between the bit rate and the information rate.
      - Usually the information rate is much less than the bit rate.
      - Compression aims to eliminate the redundancy.
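The relation between entropy, bit rate, and redundancy can be sketched numerically. This is only an illustration: it estimates a zeroth-order entropy from symbol frequencies (ignoring any correlation between neighbouring samples), and the pixel values and function names are hypothetical.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(symbols):
    """Shannon entropy of the empirical symbol distribution, in bits/symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def redundancy(symbols, bits_per_symbol):
    """Redundancy = bits actually spent per symbol minus the information rate."""
    return bits_per_symbol - entropy_bits_per_symbol(symbols)

# Hypothetical 8-bit pixels: only two distinct values occur.
pixels = [0, 0, 0, 0, 255, 255, 0, 0]
h = entropy_bits_per_symbol(pixels)   # well under 1 bit/pixel
r = redundancy(pixels, 8)             # most of the 8 bits/pixel is redundant
```

Here the stored bit rate is 8 bits/pixel while the estimated information rate is below 1 bit/pixel, which is the gap a compressor tries to remove.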

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 6

    Lossless Compression

    - The data from the decoder is identical to the source data.
      - Example: archives produced by utilities such as PKZIP or gzip.
      - Compression factor is around 2:1.
    - Cannot guarantee a fixed compression ratio: the output data rate is variable, which causes problems for recording mechanisms or communication channels.


    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 7

    Lossy Compression

    - The data from the expander is not identical to the source data, but the difference cannot be distinguished by ear or by eye.
      - Suitable for audio and video compression.
      - Compression factor is much higher than that of lossless compression (up to 100:1).
    - Based on an understanding of psychoacoustic and psychovisual perception.
    - Can be forced to operate at a fixed compression factor.

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 8

    Process of Compression

    - Communication (reduces the cost of the data link)
      - Data -> Compressor (coder) -> Transmission channel -> Expander (decoder) -> Data'
    - Recording (extends playing time in proportion to the compression factor)
      - Data -> Compressor (coder) -> Storage device (tape, disk, RAM, etc.) -> Expander (decoder) -> Data



    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 9

    Sampling and quantization

    - Why sampling?
      - A computer cannot process an analog signal directly.
    - PCM
      - Sample the analog signal at a constant rate and use a fixed number of bits (usually 8 or 16) to represent each sample.
      - bit rate = sampling rate * number of bits per sample
    - Quantization
      - Map the sampled analog signal (generally infinite precision) to discrete levels (finite precision).
      - Represent each discrete level with a number.
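The PCM bit-rate formula and the quantization step can be sketched as follows. This is a minimal illustration, not a reference implementation: the `channels` parameter, the [-1.0, 1.0] input range, and the uniform quantizer are assumptions that the slide does not specify.

```python
def pcm_bit_rate(sampling_rate_hz, bits_per_sample, channels=1):
    """bit rate = sampling rate * bits per sample (times channels, an added assumption)."""
    return sampling_rate_hz * bits_per_sample * channels

def quantize(sample, bits):
    """Map a sample in [-1.0, 1.0] to one of 2**bits integer level numbers."""
    levels = 2 ** bits
    clipped = max(-1.0, min(1.0, sample))
    return int((clipped + 1.0) / 2.0 * (levels - 1) + 0.5)

def dequantize(index, bits):
    """Map a level number back to the amplitude it represents."""
    levels = 2 ** bits
    return index / (levels - 1) * 2.0 - 1.0

telephone = pcm_bit_rate(8000, 8)        # 64 000 bit/s
cd_audio = pcm_bit_rate(44100, 16, 2)    # 1 411 200 bit/s
level = quantize(0.3, 8)                 # one of 256 level numbers
approx = dequantize(level, 8)            # close to 0.3, but not exact
```

The small difference between `approx` and the original sample is the quantization error; using more bits per sample shrinks it at the cost of a higher bit rate.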

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 10

    Predictive coding

    - Prediction
      - Use previous sample(s) to estimate the current sample.
      - For most signals, the difference between the predicted and actual values is small, so fewer bits can be used to code the difference while maintaining the same accuracy.
      - Noise is completely unpredictable.
        - Most codecs require the data to be preprocessed; otherwise they may perform badly when the data contains noise.
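A minimal sketch of predictive coding, assuming the simplest possible predictor (the previous sample, starting from 0). Real codecs use more elaborate predictors; the sample values and function names here are hypothetical.

```python
def dpcm_encode(samples):
    """Replace each sample by its difference from the previous sample."""
    prev = 0
    residuals = []
    for s in samples:
        residuals.append(s - prev)   # prediction error
        prev = s                     # the prediction for the next sample
    return residuals

def dpcm_decode(residuals):
    """Rebuild the samples by accumulating the prediction errors."""
    prev = 0
    out = []
    for r in residuals:
        prev += r
        out.append(prev)
    return out

samples = [100, 102, 105, 104, 104, 107]
residuals = dpcm_encode(samples)
restored = dpcm_decode(residuals)
```

After the first value, every residual fits in a few bits even though the samples themselves need seven or more, and decoding restores the samples exactly.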


    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 11

    Statistical coding: the Huffman code

    - Assign short codes to the most probable data patterns and long codes to the less frequent data patterns.
    - Bit assignment is based on the statistics of the source data.
    - The statistics of the data should be known prior to the bit assignment.
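The idea can be sketched with a small Huffman code builder. This is an illustrative sketch: it gathers the statistics from the data itself, and the input string and function name are hypothetical.

```python
import heapq
from collections import Counter

def huffman_code(data):
    """Build a prefix code from symbol frequencies; returns {symbol: bit string}."""
    counts = Counter(data)
    if len(counts) == 1:                      # degenerate single-symbol source
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code built so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)       # the two least probable groups
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

data = "aaaaaaaabbbbccd"                      # 'a' is most frequent, 'd' least
codes = huffman_code(data)
encoded = "".join(codes[s] for s in data)
```

With these frequencies the most common symbol gets a 1-bit code and the rarest get 3-bit codes, so the 15 symbols take 25 bits instead of the 30 needed by a fixed 2-bit code.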

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 12

    Drawbacks of compression

    - Sensitive to data errors
      - Compression eliminates the redundancy that is essential to making data resistant to errors.
    - Concealment required for real-time applications
      - Error-correction coding is required, which adds redundancy back to the compressed data.
    - Artifacts
      - Artifacts appear when the coder eliminates part of the entropy.
      - The higher the compression factor, the more artifacts.



    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 13

    A coding example: Clustering color pixels

    - In an image, pixel values are clustered around several peaks.
    - Each cluster represents the color range of one object in the image (e.g. blue sky).
    - Coding process:
      1. Separate the pixel values into a limited number of data clusters (e.g. clustered pixels of sky blue or grass green).
      2. Send the average color of each cluster and an identifying number for each cluster as side information.
      3. Transmit, for each pixel:
         - The number of the average cluster color that it is closest to.
         - Its difference from that average cluster color. This difference can be coded to reduce redundancy, since the differences are often similar (a form of prediction).

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 14

    Frame-Differential Coding

    - Frame-differential coding (FDC) = prediction from a previous video frame.
    - A video frame is stored in the encoder for comparison with the present frame, which causes an encoding latency of one frame time.
    - For still images:
      - Data needs to be sent only for the first instance of a frame.
      - All subsequent prediction error values are zero.
      - The frame is retransmitted occasionally so that receivers that have just been turned on have a starting point.
    - FDC reduces the information for still images, but leaves significant data for moving images (e.g. a movement of the camera).

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 15

    Motion Compensated Prediction

    - More data in frame-differential coding can be eliminated by comparing the present pixel to the location of the same object in the previous frame (not to the same spatial location in the previous frame).
    - The encoder estimates the motion in the image to find the corresponding area in a previous frame.
    - The encoder searches for a portion of a previous frame that is similar to the part of the new frame to be transmitted.
    - It then sends (as side information) a motion vector telling the decoder what portion of the previous frame to use to predict the new frame.
    - It also sends the prediction error so that the exact new frame can be reconstituted.
    - Figure: top, without motion compensation; bottom, with motion compensation.
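The search step can be sketched as exhaustive block matching. This is an illustration under assumptions: the matching criterion (sum of absolute differences), the search range, the tiny frames, and all names are hypothetical choices, not something the slide prescribes.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block of cur and a displaced block of ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total

def find_motion_vector(cur, ref, bx, by, size, search):
    """Try every displacement within +/-search and keep the best match."""
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy and by + dy + size <= h
                      and 0 <= bx + dx and bx + dx + size <= w)
            if not inside:
                continue                      # candidate block would leave the frame
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost

def make_frame(obj_x, obj_y, w=8, h=8):
    """Black frame with a bright 2 x 2 object at (obj_x, obj_y)."""
    frame = [[0] * w for _ in range(h)]
    for y in range(obj_y, obj_y + 2):
        for x in range(obj_x, obj_x + 2):
            frame[y][x] = 200
    return frame

previous = make_frame(2, 3)
current = make_frame(4, 3)                    # the object moved 2 pixels to the right
vector, error = find_motion_vector(current, previous, bx=4, by=3, size=2, search=3)
no_mc_error = sad(current, previous, 4, 3, 0, 0, 2)
```

With motion compensation the block is predicted perfectly from the displaced position, whereas plain frame differencing at the same spatial location leaves a large prediction error.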

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 16

    Unpredictable Information

    - Information that cannot be predicted from the previous frame:
      1. Scene change (e.g. a change of background landscape).
      2. Newly uncovered information due to object motion across a background, or at the edges of a panned scene (e.g. a soccer player's face uncovered by a flying ball).



    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 17

    Dealing with unpredictable Information

    - Scene change
      - An intra-coded picture (MPEG I picture) must be sent as a starting point; it requires more data than a predicted picture (P picture).
      - I pictures are sent about twice per second. Their timing and sending frequency may be adjusted to accommodate scene changes.
    - Uncovered information
      - Handled by the bi-directionally coded type of picture, or B picture.
      - There must be enough frame storage in the system to wait for the later picture that has the desired information.
      - To limit the amount of decoder memory, the encoder stores pictures and sends the required reference pictures before sending the B picture.

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 18

    Transform Coding

    - Converts spatial image pixel values to transform coefficient values.
    - The number of coefficients produced is equal to the number of pixels transformed.
    - A few coefficients contain most of the energy in a picture, so the coefficients may be further coded by lossless entropy coding.
    - The transform process concentrates the energy into particular coefficients (generally the low-frequency coefficients).

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 19

    Types of picture transform coding

    - Types of picture transform:
      - Discrete Fourier (DFT)
      - Karhunen-Loève
      - Walsh-Hadamard
      - Lapped orthogonal
      - Discrete Cosine (DCT): used in MPEG-2
      - Wavelets: new
    - The differences between transform coding methods:
      - The degree of concentration of energy in a few coefficients.
      - The region of influence of each coefficient in the reconstructed picture.
      - The appearance and visibility of coding noise due to coarse quantization of the coefficients.

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 20

    DCT Lossy Coding

    - Lossless coding cannot obtain a high compression ratio (4:1 or less).
    - Lossy coding = selectively discarding information so that the reproduction is visually or aurally indistinguishable from the source, or has the fewest artifacts.
    - Lossy coding can be achieved by:
      - Eliminating some DCT coefficients.
      - Adjusting the quantizing coarseness of the coefficients (the better approach).



    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 21


    - Masking makes certain types of coding noise invisible or inaudible due to psycho-visual or psycho-acoustic effects.
      - In audio, a pure tone masks energy at higher frequencies and also, with a weaker effect, at lower frequencies.
      - In video, high-contrast edges mask random noise.
    - Noise introduced at low bit rates falls in the frequency, spatial, or temporal regions

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 22

    Variable quantization

    - Variable quantization (VQ) is the main technique of lossy coding; it greatly reduces the bit rate.
    - It coarsely quantizes the less significant coefficients in a transform (less noticeable, low energy, less visible or audible).
    - It can be applied to a complete signal or to individual frequency components of a transformed signal.
    - VQ also controls the instantaneous bit rate in order to:
      - Match the average bit rate to a constant channel bit rate.
      - Prevent buffer overflow or underflow.

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 23

    Run-Level coding

    - "Run-level" coding = coding a run-length of zeros followed by a nonzero level.
      - Instead of sending all the zero values individually, the length of the run is sent.
      - Useful for any data with long runs of zeros.
      - Run lengths are easily encoded with a Huffman code.
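A minimal sketch of run-level coding, using the coefficient sequence that appears later in this deck's Huffman/run-level example. The (run, level) tuple representation and the "EOB" marker string are illustrative assumptions, not a bitstream format.

```python
def run_level_encode(coeffs):
    """Turn a coefficient list into (zero_run, level) pairs plus an end-of-block marker."""
    pairs = []
    run = 0
    for c in coeffs:
        if c == 0:
            run += 1                 # extend the current run of zeros
        else:
            pairs.append((run, c))   # run of zeros followed by a nonzero level
            run = 0
    return pairs + ["EOB"]           # trailing zeros are implied by the marker

def run_level_decode(pairs, length):
    """Expand (zero_run, level) pairs back to a coefficient list of the given length."""
    out = []
    for p in pairs:
        if p == "EOB":
            break
        run, level = p
        out.extend([0] * run)
        out.append(level)
    out.extend([0] * (length - len(out)))
    return out

coeffs = [4, 4, 2, 2, 2, 1, 1, 1, 1] + [0] * 12 + [1] + [0] * 41
pairs = run_level_encode(coeffs)
```

The 63 coefficients collapse to ten pairs and one end-of-block marker; the run of 12 zeros becomes the single pair `(12, 1)` and the final 41 zeros cost nothing beyond the marker.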

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 24

    Key points:

    - Compression process
    - Quantization & sampling
    - Coding:
      - Lossless & lossy coding
      - Frame-differential coding
      - Motion compensated prediction
      - Variable quantization
      - Run-level coding
    - Masking



    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 25

    Chapter 2: Multimedia technologies

    - Roadmap
      - JPEG
      - MPEG-1/MPEG-2 Video
      - MPEG-1 Layer 3 Audio (mp3)
      - MPEG-4
      - MPEG-7 (brief introduction)
      - HDTV (brief introduction)
      - H.261/H.263 (brief introduction)
      - Model-based coding (MBC) (brief introduction)

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 26

    JPEG (Joint Photographic Experts Group)

    - JPEG encoder
      - Partitions the image into blocks of 8 x 8 pixels.
      - Calculates the Discrete Cosine Transform (DCT) of each block.
      - A quantizer rounds off the DCT coefficients according to the quantization matrix. This step is lossy but allows for large compression ratios.
      - Produces a series of DCT coefficients using zig-zag scanning.
      - Applies a variable-length code (VLC) to these DCT coefficients.
      - Writes the compressed data stream to an output file (*.jpg or *.jpeg).
    - JPEG decoder
      - File -> input data stream -> variable-length decoder -> IDCT (inverse DCT) -> image

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 27

    JPEG Zig-zag scanning

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 28

    JPEG - DCT

    - The DCT is similar to the Discrete Fourier Transform: it transforms a signal or image from the spatial domain to the frequency domain.
    - The DCT requires fewer multiplications than the DFT.
    - Input image A:
      - The input image A is N2 pixels wide by N1 pixels high.
      - A(i,j) is the intensity of the pixel in row i and column j.
    - Output image B:
      - B(k1,k2) is the DCT coefficient in row k1 and column k2 of the DCT matrix.
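The transform from A to B can be sketched directly from its definition. This is a slow, illustrative version that assumes the orthonormal DCT-II normalisation (the slide does not reproduce the formula or its scaling); practical encoders use fast algorithms.

```python
import math

def dct2(a):
    """2-D DCT-II of an N1 x N2 block, orthonormal scaling (assumed)."""
    n1, n2 = len(a), len(a[0])

    def scale(k, n):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    b = [[0.0] * n2 for _ in range(n1)]
    for k1 in range(n1):
        for k2 in range(n2):
            s = 0.0
            for i in range(n1):
                for j in range(n2):
                    s += (a[i][j]
                          * math.cos(math.pi * (2 * i + 1) * k1 / (2 * n1))
                          * math.cos(math.pi * (2 * j + 1) * k2 / (2 * n2)))
            b[k1][k2] = scale(k1, n1) * scale(k2, n2) * s
    return b

flat = [[100] * 8 for _ in range(8)]          # a uniform grey block
B = dct2(flat)                                # only B[0][0] (the DC term) is nonzero

ramp = [[i + j for j in range(8)] for i in range(8)]
R = dct2(ramp)
```

For the uniform block all of the energy lands in the single DC coefficient, which is the extreme case of the energy concentration that makes the DCT useful for compression.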



    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 29

    JPEG - Quantization Matrix

    - The quantization matrix is the 8 x 8 matrix of step sizes (sometimes called quantums), one element for each DCT coefficient.
    - Usually symmetric.
    - Step sizes will be:
      - Small in the upper left (low frequencies).
      - Large in the lower right (high frequencies).
      - A step size of 1 is the most precise.
    - The quantizer divides each DCT coefficient by its corresponding quantum, then rounds to the nearest integer.
    - Large quantums drive small coefficients down to zero.
    - The result:
      - Many high-frequency coefficients become zero and are therefore easy to remove.
      - The low-frequency coefficients undergo only minor adjustment.
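The divide-and-round step can be sketched on a single row. The coefficients are the first row of the DCT matrix in this deck's worked JPEG example; the step sizes are an assumption (the first row of the example luminance table from the JPEG standard), since the slides do not say which quantization matrix was used. Rounding halves away from zero is also an assumption, chosen because Python's built-in `round` rounds halves to even.

```python
import math

def round_half_away(x):
    """Round to the nearest integer, with halves rounded away from zero."""
    sign = 1 if x >= 0 else -1
    return sign * int(math.floor(abs(x) + 0.5))

def quantize_row(coeffs, quantums):
    """Divide each coefficient by its step size and round."""
    return [round_half_away(c / q) for c, q in zip(coeffs, quantums)]

def dequantize_row(levels, quantums):
    """Decoder side: multiply back by the step size (the rounding loss stays)."""
    return [l * q for l, q in zip(levels, quantums)]

dct_row = [1255, -15, 43, 58, -12, 1, -4, -6]     # from the worked example
step_row = [16, 11, 10, 16, 24, 40, 51, 61]       # assumed step sizes
quantized = quantize_row(dct_row, step_row)
restored = dequantize_row(quantized, step_row)
```

Under these assumptions the result matches the first row of the quantized matrix in the worked example: the three high-frequency coefficients become zero, and the restored DC value differs only slightly from the original.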

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 30

    JPEG Coding process illustrated

    DCT coefficients:

      1255  -15   43   58  -12    1   -4   -6
        11  -65   80  -73  -27   -1   -5    1
       -49   37  -87    8   12    6   10    8
        27  -50   29   13    3   13   -6    5
       -16   21  -11  -10   10  -21    9   -6
         3  -14    0   14  -14   16   -8    4
        -4   -1    8  -13   12   -9    5   -1
        -4    2   -2    6   -7    6   -1    3

    Quantization result:

        78   -1    4    4   -1    0    0    0
         1   -5    6   -4   -1    0    0    0
        -4    3   -5    0    0    0    0    0
         2   -3    1    0    0    0    0    0
        -1    1    0    0    0    0    0    0
         0    0    0    0    0    0    0    0
         0    0    0    0    0    0    0    0
         0    0    0    0    0    0    0    0

    Zigzag scan result: 78 -1 1 -4 -5 4 4 6 3 2 -1 -3 -5 -4 -1 0 -1 0 1 1, followed by 44 zeros, then EOB.

    Easily coded by run-length and Huffman coding.
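The zig-zag scan of the quantized matrix can be reproduced with a short sketch. The function names are hypothetical; the scan walks the anti-diagonals of the block, alternating direction, starting from the top-left (DC) coefficient.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs in zig-zag order for an n x n block."""
    order = []
    for s in range(2 * n - 1):                    # s = row + col: one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)                 # even diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order

def zigzag_scan(block):
    """Read a square block in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

quantized = [
    [78, -1,  4,  4, -1, 0, 0, 0],
    [ 1, -5,  6, -4, -1, 0, 0, 0],
    [-4,  3, -5,  0,  0, 0, 0, 0],
    [ 2, -3,  1,  0,  0, 0, 0, 0],
    [-1,  1,  0,  0,  0, 0, 0, 0],
    [ 0,  0,  0,  0,  0, 0, 0, 0],
    [ 0,  0,  0,  0,  0, 0, 0, 0],
    [ 0,  0,  0,  0,  0, 0, 0, 0],
]
scan = zigzag_scan(quantized)
```

The scan groups the nonzero low-frequency values at the front and leaves one long run of zeros at the end, which is what makes the subsequent run-length coding effective.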

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 31

    MPEG (Moving Picture Experts Group)

    - MPEG is the heart of:
      - Digital television set-top boxes
      - HDTV decoders
      - DVD players
      - Video conferencing
      - Internet video, etc.
    - MPEG standards:
      - MPEG-1, MPEG-2, MPEG-4, MPEG-7
      - (The MPEG-3 standard was abandoned and became an extension of MPEG-2.)

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 32

    MPEG standards

    - MPEG-1 (obsolete)
      - A standard for storage and retrieval of moving pictures and audio on storage media.
      - Application: VCD (Video Compact Disc).
    - MPEG-2 (widely implemented)
      - A standard for digital television.
      - Applications: DVD (Digital Versatile Disc), HDTV (high-definition TV), DVB (European Digital Video Broadcasting Group), etc.
    - MPEG-4 (newly implemented, still being researched)
      - A standard for multimedia applications.
      - Applications: Internet, cable TV, virtual studio, etc.
    - MPEG-7 (future work, ongoing research)
      - Content representation standard for information search ("Multimedia Content Description Interface").
      - Applications: Internet, video search engine, digital library.



    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 33

    MPEG-2 formal standards

    - The international standard ISO/IEC 13818-2, "Generic Coding of Moving Pictures and Associated Audio Information".
    - ATSC (Advanced Television Systems Committee) document A/54, "Guide to the Use of the ATSC Digital Television Standard".

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 34

    MPEG video data structure

    - The MPEG-2 video data stream is constructed in layers, from lowest to highest, as follows:
      - PIXEL is the fundamental unit.
      - BLOCK is an 8 x 8 array of pixels.
      - MACROBLOCK consists of 4 luma blocks and 2 chroma blocks.
      - Field DCT coding and frame DCT coding.
      - SLICE consists of a variable number of macroblocks.
      - PICTURE consists of a frame (or field) of slices.
      - GROUP OF PICTURES (GOP) consists of a variable number of pictures.
      - SEQUENCE consists of a variable number of GOPs.


    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 35

    Pixel & Block

    - Pixel = "picture element".
      - A discrete spatial point sample of an image.
      - A color pixel may be represented digitally as a number of bits for each of three primary color values.
    - Block
      - An 8 x 8 array of pixels.
      - A block is the fundamental unit for DCT (discrete cosine transform) coding.

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 36


    - A macroblock = 16 x 16 array of luma (Y) pixels (= 4 blocks = 2 x 2 block array).
    - The number of chroma pixels (Cr, Cb) varies depending on the chroma pixel structure indicated in the sequence header (e.g. 4:2:0).
    - The macroblock is the fundamental unit for motion compensation and has motion vector(s) associated with it if it is predictively coded.
    - A macroblock is classified as
      - field coded (an interlaced frame consists of 2 fields), or
      - frame coded,
      depending on how the four blocks are extracted from the macroblock.



    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 37


    - Pictures are divided into slices.
    - A slice consists of an arbitrary number of successive macroblocks (going left to right), but is typically an entire row of macroblocks.
    - A slice does not extend beyond one row.
    - The slice header carries address information that allows the Huffman decoder to resynchronize at slice boundaries.

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 38


    - A source picture is a contiguous rectangular array of pixels.
    - A picture may be a complete frame of video ("frame picture") or one of the interlaced fields from an interlaced source ("field picture").
    - A field picture does not have any blank lines between its active lines of pixels.
    - A coded picture (also called a video access unit) begins with a start code and a header. The header consists of:
      - picture type (I, B, P)
      - temporal reference information
      - motion vector search range
      - optional user data
    - A frame picture consists of:
      - a frame of a progressive source, or
      - a frame (2 spatially interlaced fields) of an interlaced source.

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 39

    I, P, B Pictures

    Encoded pictures are classified into 3 types: I, P, and B.

    - I pictures = intra-coded pictures
      - All macroblocks are coded without prediction.
      - Needed to give the receiver a "starting point" for prediction after a channel change and to recover from errors.
    - P pictures = predicted pictures
      - Macroblocks may be coded with forward prediction from references made from previous I and P pictures, or may be intra coded.
    - B pictures = bi-directionally predicted pictures
      - Macroblocks may be coded with forward prediction from previous I or P references.
      - Macroblocks may be coded with backward prediction from the next I or P reference.
      - Macroblocks may be coded with interpolated prediction from past and future I or P references.
      - Macroblocks may be intra coded (no prediction).

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 40

    Group of pictures (GOP)

    - The group of pictures layer is optional in MPEG-2.
    - A GOP begins with a start code and a header.
    - The header carries:
      - time code information
      - editing information
      - optional user data
    - The first encoded picture in a GOP is always an I picture.
    - Typical length is 15 pictures, with the following structure (in display order):
      - I B B P B B P B B P B B P B B
      - This provides an I picture with sufficient frequency to allow a decoder to decode correctly.
    - Figure: a sequence of I, B, and P pictures along the time axis, showing forward motion compensation and bidirectional motion compensation.



    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 41


    - A sequence begins with a unique 32-bit start code followed by a header.
    - The header carries:
      - picture size
      - aspect ratio
      - frame rate and bit rate
      - optional quantizer matrices
      - required decoder buffer size
      - chroma pixel structure
      - optional user data
    - The sequence information is needed for channel changing.
    - The sequence length depends on the acceptable channel-change delay.

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 42

    Packetized Elementary Stream (PES)

    - A video elementary stream (video ES) consists of all the video data for a sequence, including the sequence header and all the subparts of a sequence.
    - An ES carries only one type of data (video or audio) from a single video or audio encoder.
    - A PES consists of a single ES that has been split into packets, each starting with an added packet header.
    - A PES stream contains only one type of data from one source, e.g. from one video or audio encoder.
    - PES packets have variable length, which does not correspond to the fixed packet length of transport packets, and may be much longer than a transport packet.

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 43

    Transport stream

    - Transport packets (fixed length) are formed from a PES stream, including:
      - The PES header.
      - The transport packet header.
      - Successive transport packets' payloads are filled with the remaining PES packet content until the PES packet is all used.
      - The final transport packet is filled to a fixed length by stuffing with 0xFF bytes (all ones).
    - Each PES packet header includes:
      - An 8-bit stream ID identifying the source of the payload.
      - Timing references:
        - PTS (presentation time stamp): the time at which a decoded audio or video access unit is to be presented by the decoder.
        - DTS (decoding time stamp): the time at which an access unit is decoded by the decoder.
        - ESCR (elementary stream clock reference).

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 44

    Intra Frame Coding

    - Intra coding is concerned only with information within the current frame (not relative to any other frame in the video sequence).
    - MPEG intra-frame coding block diagram (see bottom figure): similar to JPEG (let's review the JPEG coding mechanism!).
    - Basic blocks of an intra-frame coder:
      - Video filter
      - Discrete cosine transform (DCT)
      - DCT coefficient quantizer
      - Run-length amplitude/variable-length coder (VLC)



    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 45

    Video Filter

    - The Human Visual System (HVS) is:
      - Most sensitive to changes in luminance.
      - Less sensitive to variations in chrominance.
    - MPEG uses the YCbCr color space to represent the data values instead of RGB, where:
      - Y is the luminance signal,
      - Cb is the blue color-difference signal,
      - Cr is the red color-difference signal.
    - What are the 4:4:4, 4:2:0, etc. video formats?
      - 4:4:4 is full-bandwidth YCbCr video: each macroblock consists of 4 Y blocks, 4 Cb blocks, and 4 Cr blocks, which is a waste of bandwidth.
      - 4:2:0 is most commonly used in MPEG-2.

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 46

    Applications of chroma formats

    Chroma format       Multiplex order (time) within macroblock   Application
    4:4:4 (12 blocks)   YYYYCbCrCbCrCbCrCbCr                       Computer graphics
    4:2:2 (8 blocks)    YYYYCbCrCbCr                               Studio production, professional editing
    4:2:0 (6 blocks)    YYYYCbCr                                   Main stream television, consumer entertainment



    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 47

    MPEG Profiles & Levels

    - MPEG-2 is classified into several profiles.
    - Main Profile features:
      - 4:2:0 chroma sampling format
      - I, P, and B pictures
      - Non-scalable
    - Main Profile is subdivided into levels.
      - MP@ML (Main Profile at Main Level):
        - Designed around the CCIR 601 standard for interlaced standard digital video.
        - 720 x 576 (PAL) or 720 x 483 (NTSC)
        - 30 Hz progressive, 60 Hz interlaced
        - Maximum bit rate is 15 Mbit/s
      - MP@HL (Main Profile at High Level):
        - Upper bounds:
          - 1152 x 1920, 60 Hz progressive
          - 80 Mbit/s

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 48

    MPEG encoder/ decoder



    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 49

    Prediction

    - Backward prediction is done by storing pictures until the desired anchor picture is available before encoding the current stored frames.
    - The encoder can decide to use:
      - forward prediction from a previous picture,
      - backward prediction from a following picture, or
      - interpolated prediction
      to minimize prediction error.
    - The encoder must transmit pictures in an order different from that of the source pictures, so that the decoder has the anchor pictures before decoding predicted pictures. (See next slide.)
    - The decoder must have two frames stored.

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 50

    I P B Picture Reordering

    - Pictures are coded and decoded in a different order than they are displayed.
    - This is due to bidirectional prediction for B pictures.
    - For example, take a GOP that is 12 pictures long.
    - Source order and encoder input order:
      - I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)
    - Encoding order and order in the coded bitstream:
      - I(1) P(4) B(2) B(3) P(7) B(5) B(6) P(10) B(8) B(9) I(13) B(11) B(12)
    - Decoder output order and display order (same as input):
      - I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)
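The reordering rule can be sketched in a few lines: each anchor picture (I or P) is moved ahead of the B pictures that precede it in display order, because those B pictures need it as a backward reference. The tuple representation and function name are hypothetical.

```python
def display_to_coded_order(pictures):
    """pictures: list of (type, number) tuples in display order."""
    coded = []
    pending_b = []
    for pic in pictures:
        if pic[0] == "B":
            pending_b.append(pic)      # must wait for the next anchor picture
        else:
            coded.append(pic)          # the anchor (I or P) is sent first
            coded.extend(pending_b)    # then the B pictures that depend on it
            pending_b = []
    coded.extend(pending_b)            # any B pictures left without a later anchor
    return coded

types = "IBBPBBPBBPBBI"
display = [(t, n) for n, t in enumerate(types, start=1)]
coded = display_to_coded_order(display)
labels = " ".join(f"{t}({n})" for t, n in coded)
```

Applied to the 13-picture example above, this produces the same coded order as listed on the slide.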

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 51

    DCT and IDCT formulas

    - DCT:
      - Eq. 1: normal form
      - Eq. 2: matrix form
    - IDCT:
      - Eq. 3: normal form
      - Eq. 4: matrix form
    - Where:
      - F(u,v) = two-dimensional N x N DCT.
      - u, v, x, y = 0, 1, 2, ..., N-1
      - x, y are spatial coordinates in the sample domain.
      - u, v are frequency coordinates in the transform domain.
      - C(u), C(v) = 1/sqrt(2) for u, v = 0.
      - C(u), C(v) = 1 otherwise.
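The equations themselves are not preserved in this transcript. As a hedged reconstruction, the standard N x N DCT pair that is consistent with the definitions above is given below; the symbol f(x,y) for the sample block and the matrix name T are assumptions introduced here.

```latex
% Eq. 1 (DCT, normal form)
F(u,v) = \frac{2}{N}\, C(u)\, C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1}
         f(x,y)\, \cos\frac{(2x+1)u\pi}{2N}\, \cos\frac{(2y+1)v\pi}{2N}

% Eq. 2 (DCT, matrix form), with T_{u,x} = \sqrt{2/N}\; C(u) \cos\frac{(2x+1)u\pi}{2N}
F = T\, f\, T^{\mathsf{T}}

% Eq. 3 (IDCT, normal form)
f(x,y) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1}
         C(u)\, C(v)\, F(u,v)\, \cos\frac{(2x+1)u\pi}{2N}\, \cos\frac{(2y+1)v\pi}{2N}

% Eq. 4 (IDCT, matrix form); T is orthogonal, so its inverse is its transpose
f = T^{\mathsf{T}}\, F\, T
```

With C(0) = 1/sqrt(2), a uniform 8 x 8 block of value c gives F(0,0) = 8c and zero for every other coefficient.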

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 52

    DCT versus DFT

    - The DCT is conceptually similar to the DFT, except:
      - The DCT concentrates energy into lower-order coefficients better than the DFT.
      - The DCT is purely real; the DFT is complex (magnitude and phase).
      - A DCT operation on a block of pixels produces coefficients that are similar to the frequency-domain coefficients produced by a DFT operation.
        - An N-point DCT has the same frequency resolution as a 2N-point DFT.
        - The N frequencies of a 2N-point DFT correspond to N points on the upper half of the unit circle in the complex frequency plane.
      - Assuming a periodic input, the magnitude of the DFT coefficients is spatially invariant (the phase of the input does not matter). This is not true for the DCT.



    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 53

    Quantization matrix

    - Note that the quantization step sizes for the DCT coefficients are:
      - Small in the upper left (low frequencies).
      - Large in the lower right (high frequencies).
      - Recall the JPEG mechanism!
    - Why?
      - The HVS is less sensitive to errors in high-frequency coefficients than to errors in lower frequencies.
      - Higher frequencies should therefore be more coarsely quantized.

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 54

    Result DCT matrix (example)

    - After adaptive quantization, the result is a matrix containing many zeros.


    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 55

    MPEG scanning

    - Left: zigzag scanning (like JPEG).
    - Right: alternate scanning, which is better for interlaced frames.

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 56

    Huffman/ Run-Level Coding

    - Huffman coding, in combination with run-level coding and zig-zag scanning, is applied to the quantized DCT coefficients.
    - "Run-level" = a run-length of zeros followed by a non-zero level.
    - Huffman coding is also applied to various types of side information.
    - A Huffman code is an entropy code that achieves the shortest possible average code word length for a source.
    - This average code word length is >= the entropy of the source.



    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 57

    Huffman/ Run-Level coding illustrated

    n Using the DCT output matrix in the previous slide, after zigzag scanning the output is a sequence of numbers: 4, 4, 2, 2, 2, 1, 1, 1, 1, 0 (12 zeros), 1, 0 (41 zeros).

    n These values are looked up in a fixed table of variable-length codes:

    q The most probable occurrence is given a relatively short code,

    q The least probable occurrence is given a relatively long code.


    Zero run-length   Value            MPEG code
    N/A               8 (DC value)     110 1000
    ...
    12                1                0010 0010 0
    ...



    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 58

    Huffman/ Run-Level coding illustrated (2)

    n The first run of 12 zeroes has been efficiently coded by only 9 bits.

    n The last run of 41 zeroes has been entirely eliminated, represented only by a 2-bit End Of Block (EOB) indicator.

    n The quantized DCT coefficients are now represented by a sequence of 61 binary bits (see the table).

    n Considering that the original 8x8 block of 8-bit pixels required 512 bits for full representation, the compression ratio is approximately 8.4:1.
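    The run-level step and the compression arithmetic can be reproduced with a few lines. The sketch below only forms the (run, level) pairs and the ratio; it does not look up the actual MPEG code words, and the 61-bit total is taken from the slide rather than computed.

```python
def run_level_pairs(ac_coefficients):
    """Convert zigzag-ordered AC coefficients to (zero_run, level) pairs.

    Trailing zeros are dropped and replaced by a single "EOB" marker.
    """
    pairs, run = [], 0
    for value in ac_coefficients:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")
    return pairs

# The 63 AC values quoted on the slide (the DC value is coded separately).
ac = [4, 4, 2, 2, 2, 1, 1, 1, 1] + [0] * 12 + [1] + [0] * 41
pairs = run_level_pairs(ac)

original_bits = 8 * 8 * 8      # 8x8 block of 8-bit pixels
coded_bits = 61                # total quoted on the slide
ratio = original_bits / coded_bits
```

    The run of 12 zeros is absorbed into the single pair (12, 1), and the final 41 zeros vanish into the end-of-block marker.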

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 59

    MPEG Data Transport

    n MPEG packages all data into fixed-size 188-byte packets for transport.

    n Video or audio payload data is placed in PES packets before being broken up into fixed-length transport packet payloads.

    n A PES packet may be much longer than a transport packet, so segmentation is required:

    q The PES header is placed immediately following a transport header.

    q Successive portions of the PES packet are then placed in the payloads of transport packets.

    q Remaining space in the final transport packet payload is filled with stuffing bytes = 0xFF (all ones).

    q Each transport packet starts with a sync byte = 0x47.

    q In the ATSC US terrestrial DTV VSB transmission system, the sync byte is not processed, but is replaced by a different sync symbol especially suited to RF transmission.

    q The transport packet header contains a 13-bit PID (packet ID), which corresponds to a particular elementary stream of video, audio, or other program element.

    q PID 0x0000 is reserved for transport packets carrying a program association table (PAT).

    q The PAT points to a Program Map Table (PMT), which in turn points to the particular elements of a program.
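    The header fields named above can be pulled out of a packet with simple bit masks. This is a sketch: the sync byte and 13-bit PID come from the slide, while the other field positions follow the MPEG-2 Systems header layout and `make_packet` is a hypothetical helper that pads with 0xFF as the slide describes.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
PAT_PID = 0x0000

def parse_ts_header(packet):
    """Decode the 4-byte header of one 188-byte transport packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("transport packets are exactly 188 bytes long")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing 0x47 sync byte")
    return {
        # Set when a new PES packet (or table section) starts in this packet.
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],   # 13-bit packet ID
        "adaptation_field_control": (packet[3] >> 4) & 0x03,
        "continuity_counter": packet[3] & 0x0F,
    }

def make_packet(pid, payload=b""):
    """Hypothetical helper: build a payload-only packet padded with 0xFF."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    body = payload[:TS_PACKET_SIZE - 4]
    return header + body + b"\xFF" * (TS_PACKET_SIZE - 4 - len(body))
```

    A demultiplexer would call `parse_ts_header` on every packet and compare the returned `pid` with the PIDs it is looking for, starting with `PAT_PID`.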

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 60

    MPEG Transport packet

    n Adaptation Field:

    q 8 bits specifying the length of the adaptation field.

    q The first group of flags consists of eight 1-bit flags:

    q discontinuity_indicator

    q random_access_indicator

    q elementary_stream_priority_in

    q PCR_flag

    q OPCR_flag

    q splicing_point_flag

    q transport_private_data_flag

    q adaptation_field_extension_flag

    q The optional fields are present if indicated by one of the preceding flags.

    q The remainder of the adaptation field is filled with stuffing bytes (0xFF, all ones).
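    Reading the length byte and the eight flags might look like the sketch below. The flag order (most significant bit first) and the full name `elementary_stream_priority_indicator` follow the MPEG-2 Systems syntax; the function name and the example bytes are illustrative, and the optional fields themselves are not decoded.

```python
ADAPTATION_FLAGS = [           # most significant bit first
    "discontinuity_indicator",
    "random_access_indicator",
    "elementary_stream_priority_indicator",
    "PCR_flag",
    "OPCR_flag",
    "splicing_point_flag",
    "transport_private_data_flag",
    "adaptation_field_extension_flag",
]

def parse_adaptation_field(field):
    """Decode the length byte and the flag byte of an adaptation field."""
    length = field[0]
    flags = {}
    if length > 0:             # a zero-length field carries no flag byte
        flag_byte = field[1]
        for bit, name in enumerate(ADAPTATION_FLAGS):
            flags[name] = bool(flag_byte & (0x80 >> bit))
    return length, flags

# Example: length 7, random_access_indicator and PCR_flag set (0x40 | 0x10),
# followed by six placeholder bytes where the PCR would be carried.
example = bytes([7, 0x50]) + bytes(6)
length, flags = parse_adaptation_field(example)
```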



    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 61

    Demultiplexing a Transport Stream (TS)

    n Demultiplexing a transport stream involves:

    1. Finding the PAT by selecting packets with PID = 0x0000

    2. Reading the PIDs for the PMTs

    3. Reading the PIDs for the elements of a desired program from its PMT (for example, a basic program will have a PID for audio and a PID for video)

    4. Detecting packets with the desired PIDs and routing them to the decoders

    n An MPEG-2 transport stream can carry:

    q Video streams

    q Audio streams

    q Any type of data

    n MPEG-2 TS is the packet format for CATV downstream data communication.
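    The four demultiplexing steps can be sketched on a simplified model in which the PAT and PMTs are already decoded into dictionaries and packets are plain (PID, payload) tuples. Real tables must be parsed from section data; every name and PID value below is hypothetical.

```python
def demultiplex(packets, program_number, pat, pmts):
    """Route payloads of one program to per-stream lists.

    packets: iterable of (pid, payload) tuples
    pat:     {program_number: pmt_pid}            (carried on PID 0x0000)
    pmts:    {pmt_pid: {stream_name: element_pid}}
    """
    pmt_pid = pat[program_number]                 # steps 1-2: PAT gives the PMT PID
    element_pids = pmts[pmt_pid]                  # step 3: PMT gives element PIDs
    by_pid = {pid: name for name, pid in element_pids.items()}
    outputs = {name: [] for name in element_pids}
    for pid, payload in packets:                  # step 4: filter and route
        if pid in by_pid:
            outputs[by_pid[pid]].append(payload)
    return outputs

pat = {1: 0x0100, 2: 0x0200}
pmts = {0x0100: {"video": 0x0101, "audio": 0x0102},
        0x0200: {"video": 0x0201, "audio": 0x0202}}
stream = [(0x0101, b"v0"), (0x0201, b"x"), (0x0102, b"a0"), (0x0101, b"v1")]
result = demultiplex(stream, 1, pat, pmts)
```

    Packets belonging to program 2 are simply ignored, which is how several programs can share one multiplex.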

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 62

    Timing & buffer control

    n Point A: Encoder input

    n Point B: Encoder output (variable rate)

    n Point C: Encoder buffer output (constant rate)

    n Point D: Communication channel + decoder buffer (constant rate)

    n Point E: Decoder input (variable rate)

    n Point F: Decoder output (constant/specified rate)

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 63

    Timing - Synchronization

    n The decoder is synchronized with the encoder by time stamps

    n The encoder contains a master oscillator and counter, called the System Time Clock (STC). (See previous block diagram.)

    q The STC belongs to a particular program and is the master clock of the video and audio encoders for that program.

    q Multiple programs, each with its own STC, can also be multiplexed into a single stream.

    n A program component may even have no time stamps, but then it cannot be synchronized with other components.

    n At the encoder input (Point A), the time of occurrence of an input video picture or audio block is noted by sampling the STC.

    n The total delay of the encoder and decoder buffers (a constant) is added to the STC sample, creating a Presentation Time Stamp (PTS).

    q The PTS is then inserted in the first of the packet(s) representing that picture or audio block, at Point B.
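    The time-stamp rule (STC sample plus a constant delay) is easy to express in code. The sketch below assumes the 90 kHz, 33-bit time-stamp format used by MPEG-2 Systems, which the slide does not state; the 0.5 s delay and the 40 ms picture period are made-up values.

```python
CLOCK_HZ = 90_000          # assumed time-stamp resolution (not stated on the slide)
WRAP = 1 << 33             # assumed 33-bit counter

def make_pts(stc_sample, total_delay_ticks):
    """PTS = STC value at encoder input + constant end-to-end buffer delay."""
    return (stc_sample + total_delay_ticks) % WRAP

def ticks(seconds):
    """Convert seconds to clock ticks."""
    return round(seconds * CLOCK_HZ)

# Three pictures captured 40 ms apart, with a hypothetical 0.5 s total delay.
delay = ticks(0.5)
pts_values = [make_pts(ticks(0.040 * n), delay) for n in range(3)]
```

    Because the added delay is constant, consecutive PTS values keep the same spacing as the capture times, which is what lets the decoder reproduce the original timing.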

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 64

    Timing - Synchronization (2)

    n A Decode Time Stamp (DTS) can optionally be inserted into the bit stream. It represents the time at which the data should be taken instantaneously from the decoder buffer and decoded.

    q DTS and PTS are identical except in the case of picture reordering for B pictures.

    q The DTS is only used where it is needed because of reordering. Whenever DTS is used, PTS is also coded.

    q PTS (or DTS) insertion interval = 700 ms.

    q In ATSC, PTS (or DTS) must be inserted at the beginning of each coded picture (access unit).

    n In addition, the output of the encoder buffer (Point C) is time stamped with System Time Clock (STC) values, called:

    q System Clock Reference (SCR) in a Program Stream.

    q Program Clock Reference (PCR) in a Transport Stream.

    n PCR time stamp interval = 100 ms.

    n SCR time stamp interval = 700 ms.

    n The PCR and/or the SCR are used to synchronize the decoder STC with the encoder STC.



    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 65

    Timing - Synchronization (3)

    n All video and audio streams included in a program must get their time stamps from a common STC so that the video and audio decoders can be synchronized with each other.

    n The data rate and packet rate on the channel (at the multiplexer output) can be completely asynchronous with the System Time Clock (STC).

    n PCR time stamps allow synchronization of different multiplexed programs having different STCs, while allowing STC recovery for each program.

    n If there is no buffer underflow or overflow, the delays in the buffers and the transmission channel for both video and audio are constant.

    n The encoder input and decoder output run at equal and constant rates.

    n The result is a fixed end-to-end delay from encoder input to decoder output.

    n If exact synchronization is not required, the decoder clock can be free running: video frames can be repeated / skipped as necessary to prevent buffer underflow / overflow, respectively.

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 66

    HDTV (High definition television)

    n High definition television (HDTV) first came to public attention in 1981, when NHK, the Japanese broadcasting authority, first demonstrated it in the United States.

    n HDTV is defined by the ITU-R as:

    q 'A system designed to allow viewing at about three times the picture height, such that the system is virtually, or nearly, transparent to the quality or portrayal that would have been perceived in the original scene ... by a discerning viewer with normal visual acuity.'

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 67

    HDTV (2)

    n HDTV proposals are for a screen which is wider than the conventional TV image by about 33%. It is generally agreed that the HDTV aspect ratio will be 16:9, as opposed to the 4:3 ratio of conventional TV systems. This ratio has been chosen because psychological tests have shown that it best matches the human visual field.

    n It also enables use of existing cinema film formats as additional source material, since this is the same aspect ratio used in normal 35 mm film. Figure 16.6(a) shows how the aspect ratio of HDTV compares with that of conventional television, using the same resolution, or the same surface area, as the comparison metric.

    n To achieve the improved resolution, the video image used in HDTV must contain over 1000 lines, as opposed to the 525 and 625 provided by the existing NTSC and PAL systems. This gives a much improved vertical resolution. The exact value is chosen to be a simple multiple of one or both of the vertical resolutions used in conventional TV.

    n However, due to the higher scan rates, the bandwidth requirement for analogue HDTV is approximately 12 MHz, compared to the nominal 6 MHz of conventional TV.

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 68

    HDTV (3)

    n The introduction of a non-compatible TV transmission format for HDTV would require the viewer either to buy a new receiver, or to buy a converter to receive the picture on their old set.

    n The initial thrust in Japan was towards an HDTV format which is compatible with conventional TV standards, and which can be received by conventional receivers, with conventional quality. However, to get the full benefit of HDTV, a new wide screen, high resolution receiver has to be purchased.

    n One of the principal reasons that HDTV is not already common is that a general standard has not yet been agreed. The 26th CCIR plenary assembly recommended the adoption of a single, worldwide standard for high definition television.

    n Unfortunately, Japan, Europe and North America are all investing significant time and money in their own systems based on their own, current, conventional TV standards and other national considerations.



    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 69

    H261-H263

    n The H.261 algorithm was developed for the purpose of image transmission rather than image storage.

    n It is designed to produce a constant output of p x 64 kbit/s, where p is an integer in the range 1 to 30.

    q This allows transmission over a digital network or data link of varying capacity.

    q It also allows transmission over a single 64 kbit/s digital telephone channel for low quality video-telephony, or at higher bit rates for improved picture quality.

    n The basic coding algorithm is similar to that of MPEG in that it is a hybrid of motion compensation, DCT and straightforward DPCM (intra-frame coding mode), without the MPEG I, P, B frames.

    n The DCT operation is performed at a low level on 8 x 8 blocks of error samples from the predicted luminance pixel values, with sub-sampled blocks of chrominance data.
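    The set of output rates that H.261 can produce follows directly from the p x 64 kbit/s rule. A minimal sketch (the function name is illustrative):

```python
def h261_bit_rate_kbits(p):
    """Output rate of an H.261 coder for multiplier p (1..30), in kbit/s."""
    if not 1 <= p <= 30:
        raise ValueError("p must be an integer from 1 to 30")
    return p * 64

rates = {p: h261_bit_rate_kbits(p) for p in (1, 2, 6, 30)}
```

    The largest value, p = 30, gives 1920 kbit/s, which is just under 2 Mbit/s.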

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 70

    H261-H263 (2)

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 71

    H261-H263 (3)

    n H.261 is widely used on 176 x 144 pixel images.

    n The ability to select a range of output rates for the algorithm allows it to be used in different applications.

    n Low output rates (p = 1 or 2) are only suitable for face-to-face (videophone) communication. H.261 is thus the standard used in many commercial videophone systems such as the UK BT/Marconi Relate 2000 and the US AT&T 2500 products.

    n Video-conferencing would require a greater output data rate (p > 6) and might go as high as 2 Mbit/s for high quality transmission with larger image sizes.

    n A further development of H.261 is H.263, for lower fixed transmission rates.

    n This deploys arithmetic coding in place of the variable-length coding (see the H.261 diagram); with other modifications, the data rate is reduced to only 20 kbit/s.

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 72

    Model Based Coding (MBC)

    n At the very low bit rates (20 kbit/s or less) associated with video telephony, the requirements for image transmission stretch the compression techniques described earlier to their limits.

    n In order to achieve the necessary degree of compression, they often require a reduction in spatial resolution or even the elimination of frames from the sequence.

    n Model based coding (MBC) attempts to exploit a greater degree of redundancy in images than current techniques, in order to achieve significant image compression but without adversely degrading the image content information.

    n It relies upon the fact that image quality is largely subjective. Provided that the appearance of scenes within an observed image is kept at a visually acceptable level, it may not matter that the observed image is not a precise reproduction of reality.



    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 73

    Model Based Coding (2)

    n One MBC method for producing an artificial image of a head sequence utilizes a feature codebook where a range of facial expressions, sufficient to create an animation, are generated from sub-images or templates which are joined together to form a complete face.

    n The most important areas of a face, for conveying an expression, are the eyes and mouth, hence the objective is to create an image in which the movement of the eyes and mouth is a convincing approximation to the movements of the original subject.

    n When forming the synthetic image, the feature template vectors which form the closest match to those of the original moving sequence are selected from the codebook and then transmitted as low bit rate coded addresses.

    n By using only 10 eye and 10 mouth templates, for instance, a total of 100 combinations exists, implying that only a 7-bit codebook address need be transmitted.

    n It has been found that there are only 13 visually distinct mouth shapes for vowel and consonant formation during speech.

    n However, the number of mouth sub-images is usually increased, to include intermediate expressions and hence avoid step changes in the image.
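    The size of a codebook address follows from the number of entries it must distinguish. The sketch below computes the smallest whole number of bits for a given codebook size; the function and variable names are illustrative.

```python
def address_bits(entries):
    """Smallest number of bits that can index `entries` codebook entries."""
    if entries < 1:
        raise ValueError("codebook must have at least one entry")
    return (entries - 1).bit_length()

eye_templates, mouth_templates = 10, 10
combinations = eye_templates * mouth_templates
joint_bits = address_bits(combinations)
separate_bits = address_bits(eye_templates) + address_bits(mouth_templates)
```

    Addressing the 100 eye/mouth combinations jointly takes one bit less than sending a separate eye address and mouth address.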

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 74

    Model Based Coding (3)

    n Another common way of representing objects in three-dimensional computer graphics is by a net of interconnecting polygons.

    n A model is stored as a set of linked arrays which specify the coordinates of each polygon vertex, with the lines connecting the vertices together forming each side of a polygon.

    n To make realistic models, the polygon net can be shaded to reflect the presence of light sources.

    n The wire-frame model [Welch 1991] can be modified to fit the shape of a person's head and shoulders. The wire-frame, composed of over 100 interconnecting triangles, can produce subjectively acceptable synthetic images, provided that the frame is not rotated by more than 30° from the full-face position.

    n The model (see the Figure) uses smaller triangles in areas associated with high degrees of curvature where significant movement is required.

    n Large flat areas, such as the forehead, contain fewer triangles.

    n A second wire-frame is used to model the mouth interior.

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 75

    Model based coding (4)

    n A synthetic image is created by texture mapping detail from an initial full-face source image over the wire-frame. Facial movement can be achieved by manipulation of the vertices of the wire-frame.

    n Head rotation requires the use of simple matrix operations upon the coordinate array. Facial expression requires the manipulation of the features controlling the vertices.

    n This model based feature codebook approach suffers from the drawback of codebook formation.

    n This has to be done off-line and, consequently, the image is required to be prerecorded, with a consequent delay.

    n However, the actual image sequence can be sent at a very low data rate. For a codebook with 128 entries, where 7 bits are required to code each mouth, a 25 frame/s sequence requires less than 200 bit/s to code the mouth movements.

    n When it is finally implemented, rates as low as 1 kbit/s are confidently expected from MBC systems, but they can only transmit image sequences which match the stored model, e.g. head and shoulders displays.
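    The bit-rate figure for the mouth codebook is a one-line calculation, sketched here with an illustrative function name. It assumes exactly one codebook address is sent per frame and ignores any framing overhead.

```python
def codebook_bit_rate(entries, frames_per_second):
    """Bits per second needed to send one codebook address per frame."""
    bits_per_address = (entries - 1).bit_length()
    return bits_per_address * frames_per_second

mouth_rate = codebook_bit_rate(128, 25)   # 7 bits x 25 frame/s
```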

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 76

    Key points:

    n JPEG coding mechanism: DCT / Zigzag Scanning / Adaptive Quantization / VLC

    n MPEG layered structure:

    q Pixel, Block, Macroblock, Field DCT Coding / Frame DCT Coding, Slice, Picture, Group of Pictures (GOP), Sequence, Packetized Elementary Stream (PES)

    n MPEG compression mechanism:

    q Prediction

    q Motion compensation

    q Scanning

    q YCbCr formats (4:4:4, 4:2:0, etc.)

    q Profiles @ Level

    q I, P, B pictures & reordering

    q Encoder / Decoder process & block diagram

    n MPEG Data transport

    n MPEG Timing & Buffer control

    q STC/SCR/DTS

    q PCR/PTS



    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 77

    Technical terms

    n Macro blocks

    n HVS = Human Visual System

    n GOP = Group of Pictures

    n VLC = Variable Length Coding/Coder

    n IDCT/DCT = (Inverse) Discrete Cosine Transform

    n PES = Packetized Elementary Stream

    n MP@ML = Main Profile @ Main Level

    n PCR = Program Clock Reference

    n SCR = System Clock Reference

    n STC = System Time Clock

    n PTS = Presentation Time Stamp

    n DTS = Decode Time Stamp

    n PAT = Program Association Table

    n PMT = Program Map Table

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 78

    Chapter 3. CATV systems

    n Overview:

    q A brief history

    q Modern CATV networks

    q CATV systems and equipment

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 79

    A Brief History:

    q CATV appeared in the 60s in the US, where tall buildings are major obstacles to the propagation of TV signals.

    q Old CATV networks

    n Coaxial only

    n Tree-and-Branch only

    n TV only

    n No return path (high-pass filters are installed in customers' houses to block low-frequency return noise)

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 80

    Modern CATV networks

    n Key elements:

    q CO or Master Headend

    q Headends/Hub

    q Server complex

    q CMTS

    q TV content provider

    q Optical Nodes

    q Taps

    q Amplifiers (GNA/TNA/LE)



    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 81

    Modern CATV networks (2)

    n Based on Hybrid Fiber-Coaxial architecture, also referred to as HFC networks

    n The optical section is based on modern optical communication technologies:

    q Star/ring/mesh, etc. topologies

    q SDH/SONET for digital fibers

    q Various architectures: digital, analog or mixed fiber cabling systems.

    n Part of the forward path spectrum is used for high-speed Internet access

    n The return path is exploited for digital data communication: the root of new problems!

    q 5-60 MHz band for upstream

    q 88-860 MHz band for downstream

    n 88-450 MHz for analog/digital TV channels

    n 450-860 MHz for Internet access

    q FDM
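    The frequency-division plan can be written down as a small lookup table. The band edges below are the figures quoted on the slide; treating each band as half-open (lower edge included, upper edge excluded) is an assumption made here so that 450 MHz falls in exactly one band.

```python
BANDS = [  # (low MHz, high MHz, use), taken from the figures on the slide
    (5, 60, "upstream (return path)"),
    (88, 450, "downstream: analog/digital TV"),
    (450, 860, "downstream: Internet access"),
]

def classify(frequency_mhz):
    """Return the use of the band containing the frequency, or None."""
    for low, high, use in BANDS:
        if low <= frequency_mhz < high:
            return use
    return None
```

    Frequencies between 60 and 88 MHz return `None`, reflecting the gap that separates the upstream and downstream directions.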

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 82

    Spectrum allocation of CATV networks

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 83

    CATV systems and equipment

    4/2/2003 Nguyen Chan Hung Hanoi University of Technology 84


    n Perception = Su nhan thuc

    n Lap = Phu len