Multimedia 1 Overview Chapter 1 2 3-Doc Den Trang 4-133
8/3/2019 Multimedia 1 Overview Chapter 1 2 3-Doc Den Trang 4-133
1/21
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 1
Multimedia Technology
- Overview
  - Introduction
  - Chapter 1: Background of compression techniques
  - Chapter 2: Multimedia technologies
    - JPEG
    - MPEG-1/MPEG-2 Audio & Video
    - MPEG-4
    - MPEG-7 (brief introduction)
    - HDTV (brief introduction)
    - H.261/H.263 (brief introduction)
    - Model-based coding (MBC) (brief introduction)
  - Chapter 3: Some real-world systems
    - CATV systems
    - DVB systems
  - Chapter 4: Multimedia Network
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 2
Introduction
- The importance of multimedia technologies: multimedia is everywhere!
  - On PCs:
    - Real Player, QuickTime, Windows Media.
    - Music and video are freely available on the Internet (mp2, mp3, mp4, asf, mpeg, mov, ra, ram, mid, DivX, etc.)
    - Video/audio conferencing.
    - Webcast / streaming applications.
    - Distance learning (or tele-education).
    - Tele-medicine, tele-xxx (let's imagine!)
  - On TVs and other home electronic devices:
    - DVB-T/DVB-C/DVB-S (Digital Video Broadcasting - Terrestrial/Cable/Satellite) shows MPEG-2's superior quality over traditional analog TV!
    - Interactive TV - Internet applications (mail, Web, e-commerce) on a TV! No need to wait for a PC to start up and shut down!
    - CD/VCD/DVD/MP3 players.
  - Also appearing in handheld devices (3G mobile phones, wireless PDAs)!
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 3
Introduction (2)
- Multimedia networks
  - The Internet was designed in the 1960s for low-speed inter-networks with plain textual applications -> high delay, high jitter.
  - Multimedia applications require drastic modifications of the Internet infrastructure.
  - Many frameworks have been investigated and deployed to support the next-generation multimedia Internet (e.g. IntServ, DiffServ).
  - In the future, all TVs (and PCs) will be connected to the Internet and freely tuned to any of millions of broadcast stations all over the world.
  - At present, multimedia networks run over ATM (almost obsolete) and IPv4; in the future, IPv6 should guarantee QoS (Quality of Service)!
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 4
Chapter 1: Background of compression techniques
- Why compression?
  - For communication: reduce bandwidth in multimedia network applications such as streaming media, Video on Demand (VOD), and Internet phone.
  - For digital storage (VCD, DVD, tape, etc.): reduce size and cost, increase media capacity and quality.
- Compression factor or compression ratio
  - The ratio between the size of the source data and the compressed data (e.g. 10:1).
- Two types of compression:
  - Lossless compression
  - Lossy compression
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 5
Information content and redundancy
- Information rate
  - Entropy is the measure of information content, expressed in bits per source output unit (such as bits/pixel).
  - The more information in the signal, the higher the entropy.
  - Lossy compression reduces entropy, while lossless compression does not.
- Redundancy
  - The difference between the information rate and the bit rate.
  - Usually the information rate is much less than the bit rate.
  - Compression aims to eliminate this redundancy.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 6
Lossless Compression
- The data from the decoder is identical to the source data.
  - Example: archives produced by utilities such as PKZIP or gzip.
  - The compression factor is around 2:1.
- Cannot guarantee a fixed compression ratio -> the output data rate is variable, which causes problems for recording mechanisms or the communication channel.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 7
Lossy Compression
- The data from the expander is not identical to the source data, but the difference cannot be distinguished auditorily or visually.
  - Suitable for audio and video compression.
  - The compression factor is much higher than that of lossless compression (up to 100:1).
- Based on an understanding of psychoacoustic and psychovisual perception.
- Can be forced to operate at a fixed compression factor.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 8
Process of Compression
- Communication (reduces the cost of the data link):
  - Data -> Compressor (coder) -> transmission channel -> Expander (decoder) -> Data'
- Recording (extends playing time in proportion to the compression factor):
  - Data -> Compressor (coder) -> storage device (tape, disk, RAM, etc.) -> Expander (decoder) -> Data
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 9
Sampling and quantization
- Why sampling?
  - Computers cannot process analog signals directly.
- PCM
  - Sample the analog signal at a constant rate and use a fixed number of bits (usually 8 or 16) to represent each sample.
  - bit rate = sampling rate * number of bits per sample
- Quantization
  - Map the sampled analog signal (generally of infinite precision) to discrete levels (finite precision).
  - Represent each discrete level with a number.
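The two formulas above can be sketched in a few lines. This is an illustrative toy, not a production codec; the function names and the full-scale range are my own choices.

```python
import math

def pcm_bit_rate(sampling_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """bit rate = sampling rate * number of bits per sample (* channels)."""
    return sampling_rate_hz * bits_per_sample * channels

def quantize(sample: float, bits: int, full_scale: float = 1.0) -> int:
    """Map an analog value in [-full_scale, +full_scale] to one of 2**bits levels."""
    levels = 2 ** bits
    step = 2 * full_scale / levels          # quantization step size
    level = math.floor((sample + full_scale) / step)
    return min(max(level, 0), levels - 1)   # clamp to the valid level range

# CD audio: 44.1 kHz, 16 bits, stereo -> 1,411,200 bit/s
print(pcm_bit_rate(44_100, 16, 2))
```

Note that quantization is where precision is lost: every analog value inside one step maps to the same level number.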
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 10
Predictive coding
- Prediction
  - Use previous sample(s) to estimate the current sample.
  - For most signals, the difference between the predicted and actual values is small -> we can use a smaller number of bits to code the difference while maintaining the same accuracy!
  - Noise is completely unpredictable.
    - Most codecs require the data to be preprocessed; otherwise they may perform badly when the data contains noise.
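A minimal sketch of the idea, using the simplest possible predictor (the previous sample); real codecs use more elaborate predictors, and the residuals would then be entropy-coded.

```python
from typing import List

def encode_differences(samples: List[int]) -> List[int]:
    """First sample sent as-is; the rest as prediction errors (current - previous)."""
    diffs = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        diffs.append(cur - prev)      # prediction = previous sample
    return diffs

def decode_differences(diffs: List[int]) -> List[int]:
    """Rebuild the signal by accumulating the prediction errors."""
    samples = [diffs[0]]
    for d in diffs[1:]:
        samples.append(samples[-1] + d)
    return samples

signal = [100, 102, 104, 103, 101, 101]
residual = encode_differences(signal)       # [100, 2, 2, -1, -2, 0] -- small values
assert decode_differences(residual) == signal
```

The residuals cluster near zero, so they need fewer bits per value than the raw samples.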
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 11
Statistical coding: the Huffman code
- Assign short codes to the most probable data patterns and long codes to the less frequent data patterns.
- Bit assignment is based on the statistics of the source data.
- The statistics of the data must therefore be known prior to the bit assignment.
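A compact sketch of the Huffman construction described above: repeatedly merge the two least probable entries, prefixing "0"/"1" to the codes inside each. The tie-breaking counter and dictionary representation are implementation choices of this sketch, not part of the algorithm itself.

```python
import heapq
from collections import Counter

def huffman_code(freqs: dict) -> dict:
    """Return a prefix-free code {symbol: bitstring} from symbol frequencies."""
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tick = len(heap)                  # tie-breaker so dicts are never compared
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least probable subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (f1 + f2, tick, merged))
        tick += 1
    return heap[0][2]

freqs = Counter("aaaabbbccd")         # 'a' most probable -> shortest code
code = huffman_code(freqs)
assert len(code["a"]) < len(code["d"])
```

Note the prerequisite from the slide: the frequency table must be known (or estimated) before any bits are assigned.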
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 12
Drawbacks of compression
- Sensitive to data errors
  - Compression eliminates the redundancy that is essential to making data resistant to errors.
- Concealment required for real-time applications
  - An error correction code is required, which adds redundancy back to the compressed data.
- Artifacts
  - Artifacts appear when the coder eliminates part of the entropy.
  - The higher the compression factor, the more visible the artifacts.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 13
A coding example: clustering color pixels
- In an image, pixel values are clustered in several peaks.
- Each cluster represents the color range of one object in the image (e.g. blue sky).
- Coding process:
  1. Separate the pixel values into a limited number of data clusters (e.g. clustered pixels of sky blue or grass green).
  2. Send the average color of each cluster and an identifying number for each cluster as side information.
  3. Transmit, for each pixel:
     - The number of the cluster whose average color it is close to.
     - Its difference from that average cluster color (this can be further coded to reduce redundancy, since the differences are often similar -> prediction!).
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 14
Frame-Differential Coding
- Frame-Differential Coding = prediction from a previous video frame.
- A video frame is stored in the encoder for comparison with the present frame -> causes an encoding latency of one frame time.
- For still images:
  - Data needs to be sent only for the first instance of a frame.
  - All subsequent prediction error values are zero.
  - The frame is retransmitted occasionally to give receivers that have just been turned on a starting point.
- FDC reduces the information for still images, but leaves significant data for moving images (e.g. a movement of the camera).
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 15
Motion Compensated Predictionn More data in Frame-Differential Coding can
be eliminated by comparing the presentpixel to the location of the same objectinthepreviousframe. ( not to thesame spatial location in the previous frame)
n The encoder estimates the motion in theimage to find the corresponding area in aprevious frame.
n The encoder searches for a portion of aprevious frame which is similar to the part
of the new frame to be transmitted.n It then sends (as side information) a
motion vectortelling the decoder whatportion of the previous frame it will use topredict the new frame.
n It also sends the prediction errorso thatthe exact new frame may be reconstituted
n See top figure without motioncompensation Bottom figure Withmotion compensation
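The search described above can be sketched as exhaustive block matching over a small window, scored by the sum of absolute differences (SAD). Plain lists of lists stand in for real frames here; the names and the tiny search range are illustrative.

```python
from typing import List, Tuple

def sad(ref: List[List[int]], cur: List[List[int]],
        ry: int, rx: int, cy: int, cx: int, n: int) -> int:
    """Sum of absolute differences between the n*n block of cur at (cy, cx)
    and the n*n block of ref at (ry, rx)."""
    return sum(abs(ref[ry + i][rx + j] - cur[cy + i][cx + j])
               for i in range(n) for j in range(n))

def best_motion_vector(ref, cur, cy, cx, n, search=2) -> Tuple[int, int]:
    """Search +/-search pixels around (cy, cx) in the reference frame and
    return the (dy, dx) displacement with the lowest SAD."""
    h, w = len(ref), len(ref[0])
    best, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = cy + dy, cx + dx
            if 0 <= ry <= h - n and 0 <= rx <= w - n:
                cost = sad(ref, cur, ry, rx, cy, cx, n)
                if best is None or cost < best:
                    best, best_mv = cost, (dy, dx)
    return best_mv

ref = [[10 * r + c for c in range(6)] for r in range(6)]
cur = [[ref[r][c - 1] if c else 0 for c in range(6)] for r in range(6)]  # shifted right 1 px
assert best_motion_vector(ref, cur, 2, 2, n=2) == (0, -1)
```

The winning (dy, dx) is the motion vector sent as side information; the remaining SAD corresponds to the prediction error that must also be coded.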
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 16
Unpredictable Information
- Information that cannot be predicted from the previous frame:
  1. Scene changes (e.g. the background landscape changes).
  2. Newly uncovered information due to object motion across a background, or at the edges of a panned scene (e.g. a soccer player's face uncovered by a flying ball).
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 17
Dealing with unpredictable information
- Scene changes
  - An intra-coded picture (MPEG I picture) must be sent as a starting point -> requires more data than a predicted picture (P picture).
  - I pictures are sent about twice per second; their timing and sending frequency may be adjusted to accommodate scene changes.
- Uncovered information
  - Handled by the bidirectionally coded type of picture, the B picture.
  - There must be enough frame storage in the system to wait for the later picture that has the desired information.
  - To limit the amount of decoder memory, the encoder stores pictures and sends the required reference pictures before sending the B picture.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 18
Transform Coding
- Converts spatial image pixel values to transform coefficient values.
- The number of coefficients produced is equal to the number of pixels transformed.
- A few coefficients contain most of the energy in a picture -> the coefficients may be further coded by lossless entropy coding.
- The transform process concentrates the energy into particular coefficients (generally the low-frequency coefficients).
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 19
Types of picture transform coding
- Types of picture transform:
  - Discrete Fourier (DFT)
  - Karhunen-Loeve
  - Walsh-Hadamard
  - Lapped orthogonal
  - Discrete Cosine (DCT) - used in MPEG-2!
  - Wavelets - new!
- The differences between transform coding methods:
  - The degree of concentration of energy in a few coefficients.
  - The region of influence of each coefficient in the reconstructed picture.
  - The appearance and visibility of coding noise due to coarse quantization of the coefficients.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 20
DCT Lossy Coding
- Lossless coding cannot achieve high compression ratios (4:1 or less).
- Lossy coding = discarding selected information so that the reproduction is visually or aurally indistinguishable from the source, or has the fewest artifacts.
- Lossy coding can be achieved by:
  - Eliminating some DCT coefficients.
  - Adjusting the quantizing coarseness of the coefficients - better!
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 21
Masking
- Masking makes certain types of coding noise invisible or inaudible due to psychovisual/psychoacoustic effects.
  - In audio, a pure tone masks energy at higher frequencies, and also at lower frequencies (with a weaker effect).
  - In video, high-contrast edges mask random noise.
- Noise introduced at low bit rates is placed in the frequency, spatial, or temporal regions where it is masked.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 22
Variable quantization
- Variable quantization is the main technique of lossy coding -> greatly reduces the bit rate.
- Coarsely quantizes the less significant coefficients in a transform (less noticeable / low energy / less visible or audible).
- Can be applied to a complete signal or to individual frequency components of a transformed signal.
- Variable quantization also controls the instantaneous bit rate in order to:
  - Match the average bit rate to a constant channel bit rate.
  - Prevent buffer overflow or underflow.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 23
Run-Level coding
- "Run-level" coding = coding a run length of zeros followed by a nonzero level.
  - Instead of sending all the zero values individually, the length of the run is sent.
  - Useful for any data with long runs of zeros.
  - Run lengths are easily encoded with a Huffman code.
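A toy (run, level) coder for the scheme just described. The EOB marker and the pair representation are simplifications; real MPEG/JPEG coders map each (run, level) pair to a variable-length codeword.

```python
from typing import List

EOB = ("EOB",)   # illustrative end-of-block marker

def run_level_encode(values: List[int]) -> list:
    """Emit (zeros-before, nonzero-level) pairs; trailing zeros become EOB."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append(EOB)            # the remaining run of zeros is implied
    return pairs

def run_level_decode(pairs: list, length: int) -> List[int]:
    out = []
    for p in pairs:
        if p == EOB:
            break
        run, level = p
        out.extend([0] * run + [level])
    return out + [0] * (length - len(out))   # restore the implied zeros

data = [7, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0]
coded = run_level_encode(data)               # [(0, 7), (3, 3), ('EOB',)]
assert run_level_decode(coded, len(data)) == data
```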
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 24
Key points:
- Compression process
- Quantization & sampling
- Coding:
  - Lossless & lossy coding
  - Frame-Differential Coding
  - Motion Compensated Prediction
  - Variable quantization
  - Run-level coding
- Masking
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 25
Chapter 2: Multimedia technologies
- Roadmap
  - JPEG
  - MPEG-1/MPEG-2 Video
  - MPEG-1 Layer 3 Audio (mp3)
  - MPEG-4
  - MPEG-7 (brief introduction)
  - HDTV (brief introduction)
  - H.261/H.263 (brief introduction)
  - Model-based coding (MBC) (brief introduction)
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 26
JPEG (Joint Photographic Experts Group)
- JPEG encoder
  - Partitions the image into blocks of 8 x 8 pixels.
  - Calculates the Discrete Cosine Transform (DCT) of each block.
  - A quantizer rounds off the DCT coefficients according to the quantization matrix - lossy, but allows for large compression ratios.
  - Produces a series of DCT coefficients using zig-zag scanning.
  - Applies a variable-length code (VLC) to these DCT coefficients.
  - Writes the compressed data stream to an output file (*.jpg or *.jpeg).
- JPEG decoder
  - File input data stream -> variable-length decoder -> IDCT (inverse DCT) -> image
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 27
JPEG Zig-zag scanning
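The zig-zag figure for this slide did not survive the transcript; the traversal it shows can be generated as follows. Coefficients are read along anti-diagonals with alternating direction, so the low frequencies come first and the trailing zeros cluster at the end of the scan.

```python
from typing import List, Tuple

def zigzag_order(n: int = 8) -> List[Tuple[int, int]]:
    """(row, col) visiting order of an n*n block, JPEG zig-zag style."""
    order = []
    for s in range(2 * n - 1):                       # s = row + col (anti-diagonal)
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])  # alternate the direction
    return order

# first few positions of the standard 8x8 scan
assert zigzag_order()[:6] == [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```

Applying this order to a quantized block turns its 2-D layout into the 1-D sequence that run-level coding expects.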
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 28
JPEG - DCT
- The DCT is similar to the Discrete Fourier Transform: it transforms a signal or image from the spatial domain to the frequency domain.
- The DCT requires fewer multiplications than the DFT.
- Input image A:
  - The input image A is N2 pixels wide by N1 pixels high.
  - A(i,j) is the intensity of the pixel in row i and column j.
- Output image B:
  - B(k1,k2) is the DCT coefficient in row k1 and column k2 of the DCT matrix.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 29
JPEG - Quantization Matrix
- The quantization matrix is the 8 x 8 matrix of step sizes (sometimes called quantums) - one element for each DCT coefficient.
- Usually symmetric.
- Step sizes are:
  - Small in the upper left (low frequencies),
  - Large in the lower right (high frequencies).
  - A step size of 1 is the most precise.
- The quantizer divides each DCT coefficient by its corresponding quantum, then rounds to the nearest integer.
- Large quantums drive small coefficients down to zero.
- The result:
  - Many high-frequency coefficients become zero -> easily removed.
  - The low-frequency coefficients undergo only minor adjustment.
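The divide-and-round step is one line per coefficient. The step sizes below are not a standard JPEG table; they are illustrative values chosen so that the low-frequency corner reproduces the quantization result shown on the next slide (1255 -> 78, etc.).

```python
def quantize_block(coeffs, quants):
    """Element-wise coefficient / quantum, rounded to the nearest integer."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, quants)]

coeffs = [[1255, -15], [11, -65]]      # low-frequency corner of a DCT block
quants = [[16, 11], [12, 13]]          # small step sizes for low frequencies
assert quantize_block(coeffs, quants) == [[78, -1], [1, -5]]
```

Dividing by a large step and rounding is exactly what drives small high-frequency coefficients to zero.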
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 30
JPEG - Coding process illustrated

DCT coefficients:
  1255  -15   43   58  -12    1   -4   -6
    11  -65   80  -73  -27   -1   -5    1
   -49   37  -87    8   12    6   10    8
    27  -50   29   13    3   13   -6    5
   -16   21  -11  -10   10  -21    9   -6
     3  -14    0   14  -14   16   -8    4
    -4   -1    8  -13   12   -9    5   -1
    -4    2   -2    6   -7    6   -1    3

Quantization result (after dividing by the quantization matrix Q):
    78   -1    4    4   -1    0    0    0
     1   -5    6   -4   -1    0    0    0
    -4    3   -5    0    0    0    0    0
     2   -3    1    0    0    0    0    0
    -1    1    0    0    0    0    0    0
     0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0

Zig-zag scan result: 78 -1 1 -4 -5 4 4 6 3 2 -1 -3 -5 -4 -1 0 -1 0 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 EOB
-> easily coded by run-length Huffman coding
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 31
MPEG (Moving Picture Experts Group)
- MPEG is the heart of:
  - Digital television set-top boxes
  - HDTV decoders
  - DVD players
  - Video conferencing
  - Internet video, etc.
- MPEG standards:
  - MPEG-1, MPEG-2, MPEG-4, MPEG-7
  - (The MPEG-3 standard was abandoned and became an extension of MPEG-2.)
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 32
MPEG standards
- MPEG-1 (obsolete)
  - A standard for storage and retrieval of moving pictures and audio on storage media.
  - Application: VCD (Video Compact Disc).
- MPEG-2 (widely implemented)
  - A standard for digital television.
  - Applications: DVD (Digital Versatile Disc), HDTV (high-definition TV), DVB (European Digital Video Broadcasting group), etc.
- MPEG-4 (newly implemented, still being researched)
  - A standard for multimedia applications.
  - Applications: Internet, cable TV, virtual studio, etc.
- MPEG-7 (future work, ongoing research)
  - A content representation standard for information search (Multimedia Content Description Interface).
  - Applications: Internet, video search engines, digital libraries.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 33
MPEG-2 formal standards
- The international standard ISO/IEC 13818-2, "Generic Coding of Moving Pictures and Associated Audio Information".
- ATSC (Advanced Television Systems Committee) document A/54, "Guide to the Use of the ATSC Digital Television Standard".
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 34
MPEG video data structure
- The MPEG-2 video data stream is constructed in layers, from lowest to highest:
  - PIXEL is the fundamental unit.
  - BLOCK is an 8 x 8 array of pixels.
  - MACROBLOCK consists of 4 luma blocks and 2 chroma blocks (field DCT coding and frame DCT coding).
  - SLICE consists of a variable number of macroblocks.
  - PICTURE consists of a frame (or field) of slices.
  - GROUP OF PICTURES (GOP) consists of a variable number of pictures.
  - SEQUENCE consists of a variable number of GOPs.
  - PACKETIZED ELEMENTARY STREAM (optional).
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 35
Pixel & Block
- Pixel = "picture element".
  - A discrete spatial point sample of an image.
  - A color pixel may be represented digitally as a number of bits for each of three primary color values.
- Block
  - An 8 x 8 array of pixels.
  - The block is the fundamental unit for DCT (discrete cosine transform) coding.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 36
Macroblock
- A macroblock is a 16 x 16 array of luma (Y) pixels (= 4 blocks in a 2 x 2 block array).
- The number of chroma pixels (Cr, Cb) varies depending on the chroma pixel structure indicated in the sequence header (e.g. 4:2:0).
- The macroblock is the fundamental unit for motion compensation and has motion vector(s) associated with it if it is predictively coded.
- A macroblock is classified as:
  - Field coded (an interlaced frame consists of 2 fields), or
  - Frame coded,
  depending on how the four blocks are extracted from the macroblock.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 37
Slice
- Pictures are divided into slices.
- A slice consists of an arbitrary number of successive macroblocks (going left to right), but is typically an entire row of macroblocks. A slice does not extend beyond one row.
- The slice header carries address information that allows the Huffman decoder to resynchronize at slice boundaries.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 38
Picture
- A source picture is a contiguous rectangular array of pixels.
- A picture may be a complete frame of video ("frame picture") or one of the interlaced fields from an interlaced source ("field picture").
- A field picture does not have any blank lines between its active lines of pixels.
- A coded picture (also called a video access unit) begins with a start code and a header. The header contains:
  - the picture type (I, P, B)
  - temporal reference information
  - the motion vector search range
  - optional user data
- A frame picture consists of:
  - a frame of a progressive source, or
  - a frame (2 spatially interlaced fields) of an interlaced source.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 39
I, P, B Pictures
Encoded pictures are classified into 3 types: I, P, and B.
- I pictures = intra-coded pictures
  - All macroblocks are coded without prediction.
  - Needed to give the receiver a "starting point" for prediction after a channel change, and to recover from errors.
- P pictures = predicted pictures
  - Macroblocks may be coded with forward prediction from references made from previous I and P pictures, or may be intra coded.
- B pictures = bidirectionally predicted pictures
  - Macroblocks may be coded with forward prediction from a previous I or P reference.
  - Macroblocks may be coded with backward prediction from the next I or P reference.
  - Macroblocks may be coded with interpolated prediction from past and future I or P references.
  - Macroblocks may be intra coded (no prediction).
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 40
Group of pictures (GOP)
- The group of pictures layer is optional in MPEG-2.
- A GOP begins with a start code and a header.
- The header carries:
  - time code information
  - editing information
  - optional user data
- The first encoded picture in a GOP is always an I picture.
- A typical length is 15 pictures, with the following structure (in display order):
  - I B B P B B P B B P B B P B B -> provides an I picture frequently enough to allow a decoder to start decoding correctly.

[Figure: a GOP timeline showing forward and bidirectional motion compensation between the I, P, and B pictures.]
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 41
Sequence
- A sequence begins with a unique 32-bit start code followed by a header.
- The header carries:
  - picture size
  - aspect ratio
  - frame rate and bit rate
  - optional quantizer matrices
  - required decoder buffer size
  - chroma pixel structure
  - optional user data
- The sequence information is needed for channel changing.
- The sequence length depends on the acceptable channel-change delay.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 42
Packetized Elementary Stream (PES)
- A video elementary stream (video ES) consists of all the video data for a sequence, including the sequence header and all the subparts of a sequence.
- An ES carries only one type of data (video or audio) from a single video or audio encoder.
- A PES consists of a single ES which has been split into packets, each starting with an added packet header.
- A PES stream contains only one type of data from one source, e.g. from one video or audio encoder.
- PES packets have variable length, not corresponding to the fixed length of transport packets, and may be much longer than a transport packet.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 43
Transport stream
- Transport packets (of fixed length) are formed from a PES stream:
  - The PES header is placed immediately after the transport packet header.
  - Successive transport packet payloads are filled with the remaining PES packet content until the PES packet is all used.
  - The final transport packet is filled to a fixed length by stuffing with 0xFF bytes (all ones).
- Each PES packet header includes:
  - An 8-bit stream ID identifying the source of the payload.
  - Timing references: the PTS (presentation time stamp), the time at which a decoded audio or video access unit is to be presented by the decoder.
  - The DTS (decoding time stamp), the time at which an access unit is decoded by the decoder.
  - The ESCR (elementary stream clock reference).
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 44
Intra Frame Coding
- Intra coding is concerned only with information within the current frame (not relative to any other frame in the video sequence).
- The MPEG intra-frame coding block diagram (see bottom figure) is similar to JPEG (let's review the JPEG coding mechanism!).
- Basic blocks of the intra-frame coder:
  - Video filter
  - Discrete cosine transform (DCT)
  - DCT coefficient quantizer
  - Run-length/amplitude variable-length coder (VLC)
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 45
Video Filter
- The Human Visual System (HVS) is:
  - Most sensitive to changes in luminance,
  - Less sensitive to variations in chrominance.
- MPEG uses the YCbCr color space instead of RGB to represent the data values, where:
  - Y is the luminance signal,
  - Cb is the blue color-difference signal,
  - Cr is the red color-difference signal.
- What are the 4:4:4, 4:2:0, etc., video formats?
  - 4:4:4 is full-bandwidth YCbCr video: each macroblock consists of 4 Y blocks, 4 Cb blocks, and 4 Cr blocks -> a waste of bandwidth!
  - 4:2:0 is most commonly used in MPEG-2.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 46
Applications of chroma formats

chroma_format     | Multiplex order (time) within macroblock | Application
4:4:4 (12 blocks) | Y Y Y Y Cb Cr Cb Cr Cb Cr Cb Cr          | Computer graphics
4:2:2 (8 blocks)  | Y Y Y Y Cb Cr Cb Cr                      | Studio production environments, professional editing equipment
4:2:0 (6 blocks)  | Y Y Y Y Cb Cr                            | Mainstream television, consumer entertainment
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 47
MPEG Profiles & levels
- MPEG-2 is classified into several profiles.
- Main Profile features:
  - 4:2:0 chroma sampling format
  - I, P, and B pictures
  - Non-scalable
- Main Profile is subdivided into levels:
  - MP@ML (Main Profile @ Main Level):
    - Designed around the CCIR 601 standard for interlaced standard digital video.
    - 720 x 576 (PAL) or 720 x 483 (NTSC)
    - 30 Hz progressive, 60 Hz interlaced
    - Maximum bit rate is 15 Mbit/s
  - MP@HL (Main Profile @ High Level), upper bounds:
    - 1152 x 1920, 60 Hz progressive
    - 80 Mbit/s
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 48
MPEG encoder/ decoder
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 49
Prediction
- Backward prediction is done by storing pictures until the desired anchor picture is available before encoding the current stored frames.
- The encoder can decide to use:
  - forward prediction from a previous picture,
  - backward prediction from a following picture,
  - or interpolated prediction,
  to minimize the prediction error.
- The encoder must transmit pictures in an order different from that of the source pictures, so that the decoder has the anchor pictures before decoding predicted pictures. (See next slide.)
- The decoder must have two frames stored.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 50
I P B Picture Reordering
- Pictures are coded and decoded in a different order than they are displayed, due to the bidirectional prediction of B pictures.
- For example, with a 12-picture GOP:
- Source order and encoder input order:
  - I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)
- Encoding order and order in the coded bitstream:
  - I(1) P(4) B(2) B(3) P(7) B(5) B(6) P(10) B(8) B(9) I(13) B(11) B(12)
- Decoder output order and display order (same as input):
  - I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)
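The reordering rule above can be sketched directly: hold each B picture back until the anchor (I or P) it predicts from has been sent. String labels stand in for real pictures; this ignores open-GOP subtleties.

```python
def coded_order(display: list) -> list:
    """Reorder display-order picture labels into coded/bitstream order."""
    out, pending_b = [], []
    for pic in display:
        if pic.startswith("B"):
            pending_b.append(pic)      # B pictures wait for their next anchor
        else:
            out.append(pic)            # send the anchor (I or P) first...
            out.extend(pending_b)      # ...then the Bs that reference it
            pending_b = []
    return out + pending_b

display = ["I1", "B2", "B3", "P4", "B5", "B6", "P7"]
assert coded_order(display) == ["I1", "P4", "B2", "B3", "P7", "B5", "B6"]
```

This reproduces the slide's bitstream order: each anchor jumps ahead of the B pictures that are displayed before it.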
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 51
DCT and IDCT formulas
- DCT: Eq. 1 (normal form), Eq. 2 (matrix form).
- IDCT: Eq. 3 (normal form), Eq. 4 (matrix form).
- Where:
  - F(u,v) = the two-dimensional N x N DCT.
  - u, v, x, y = 0, 1, 2, ..., N-1.
  - x, y are spatial coordinates in the sample domain.
  - u, v are frequency coordinates in the transform domain.
  - C(u), C(v) = 1/sqrt(2) for u, v = 0.
  - C(u), C(v) = 1 otherwise.
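The referenced equations were in a figure that did not survive the transcript. A standard reconstruction of the normal-form pair, consistent with the definitions listed (with f(x,y) denoting the spatial-domain samples, a symbol the slide does not name explicitly), is:

```latex
F(u,v) = \frac{2}{N}\,C(u)\,C(v)
  \sum_{x=0}^{N-1}\sum_{y=0}^{N-1} f(x,y)\,
  \cos\frac{(2x+1)u\pi}{2N}\,
  \cos\frac{(2y+1)v\pi}{2N}

f(x,y) = \frac{2}{N}
  \sum_{u=0}^{N-1}\sum_{v=0}^{N-1} C(u)\,C(v)\,F(u,v)\,
  \cos\frac{(2x+1)u\pi}{2N}\,
  \cos\frac{(2y+1)v\pi}{2N}
```

The IDCT is the same sum with the roles of f and F exchanged and C(u)C(v) moved inside the summation, which is why the transform is its own (scaled) inverse.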
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 52
DCT versus DFT
- The DCT is conceptually similar to the DFT, except:
  - The DCT concentrates energy into low-order coefficients better than the DFT.
  - The DCT is purely real; the DFT is complex (magnitude and phase).
  - A DCT operation on a block of pixels produces coefficients that are similar to the frequency-domain coefficients produced by a DFT operation.
- An N-point DCT has the same frequency resolution as a 2N-point DFT.
- The N frequencies of a 2N-point DFT correspond to N points on the upper half of the unit circle in the complex frequency plane.
  - Assuming a periodic input, the magnitude of the DFT coefficients is spatially invariant (the phase of the input does not matter). This is not true for the DCT.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 53
Quantization matrix
- Note that the quantization step sizes are:
  - Small in the upper left (low frequencies),
  - Large in the lower right (high frequencies).
  Recall the JPEG mechanism!
- Why?
  - The HVS is less sensitive to errors in high-frequency coefficients than it is for lower frequencies.
  - -> Higher frequencies should be more coarsely quantized!
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 54
Result DCT matrix (example)
- After adaptive quantization, the result is a matrix containing many zeros.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 55
MPEG scanning
- Left: zig-zag scanning (like JPEG).
- Right: alternate scanning - better for interlaced frames!
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 56
Huffman / Run-Level Coding
- Huffman coding, in combination with run-level coding and zig-zag scanning, is applied to the quantized DCT coefficients.
- "Run-level" = a run length of zeros followed by a nonzero level.
- Huffman coding is also applied to various types of side information.
- A Huffman code is an entropy code that achieves the shortest possible average code word length for a source.
- This average code word length is >= the entropy of the source.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 57
Huffman / Run-Level coding illustrated
- Using the DCT output matrix from the previous slide, the zig-zag scanned output is the sequence: 4, 4, 2, 2, 2, 1, 1, 1, 1, 0 (12 zeros), 1, 0 (41 zeros).
- These values are looked up in a fixed table of variable-length codes:
  - The most probable occurrence is given a relatively short code,
  - The least probable occurrence is given a relatively long code.

Zero Run-Length | Amplitude    | MPEG Code Value
N/A             | 8 (DC value) | 110 1000
0               | 4            | 0000 1100
0               | 4            | 0000 1100
0               | 2            | 0100 0
0               | 2            | 0100 0
0               | 2            | 0100 0
0               | 1            | 110
0               | 1            | 110
0               | 1            | 110
0               | 1            | 110
12              | 1            | 0010 0010 0
EOB             | EOB          | 10
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 58
Huffman / Run-Level coding illustrated (2)
- The first run of 12 zeros has been efficiently coded in only 9 bits.
- The last run of 41 zeros has been entirely eliminated, represented only by a 2-bit End Of Block (EOB) indicator.
- The quantized DCT coefficients are now represented by a sequence of 61 bits (see the table).
- Considering that the original 8 x 8 block of 8-bit pixels required 512 bits for full representation, the compression ratio is approx. 8.4:1.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 59
MPEG Data Transport
- MPEG packages all data into fixed-size 188-byte packets for transport.
- Video or audio payload data is first placed in PES packets, then broken up into fixed-length transport packet payloads.
- A PES packet may be much longer than a transport packet -> requires segmentation:
  - The PES header is placed immediately following a transport header.
  - Successive portions of the PES packet are then placed in the payloads of transport packets.
  - Remaining space in the final transport packet payload is filled with stuffing bytes = 0xFF (all ones).
  - Each transport packet starts with a sync byte = 0x47.
  - In the ATSC US terrestrial DTV VSB transmission system, the sync byte is not processed, but is replaced by a different sync symbol especially suited to RF transmission.
  - The transport packet header contains a 13-bit PID (packet ID), which corresponds to a particular elementary stream of video, audio, or another program element.
  - PID 0x0000 is reserved for transport packets carrying a Program Association Table (PAT).
  - The PAT points to a Program Map Table (PMT), which points to the particular elements of a program.
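The segmentation described above can be sketched as follows. The 4-byte header here is deliberately simplified (sync byte plus PID only); a real transport header also carries continuity counters, flags, and uses an adaptation field rather than payload stuffing in some cases.

```python
SYNC, PACKET_SIZE, HEADER_SIZE = 0x47, 188, 4

def packetize(pes: bytes, pid: int) -> list:
    """Break one PES packet into fixed 188-byte transport packets,
    stuffing the final payload with 0xFF bytes."""
    payload_size = PACKET_SIZE - HEADER_SIZE
    packets = []
    for i in range(0, len(pes), payload_size):
        chunk = pes[i:i + payload_size]
        chunk += b"\xff" * (payload_size - len(chunk))       # stuffing bytes
        header = bytes([SYNC, (pid >> 8) & 0x1F, pid & 0xFF, 0])  # simplified header
        packets.append(header + chunk)
    return packets

pkts = packetize(b"\x00" * 400, pid=0x31)
assert all(len(p) == 188 and p[0] == 0x47 for p in pkts)
assert len(pkts) == 3 and pkts[-1].endswith(b"\xff")         # last packet is stuffed
```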
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 60
MPEG Transport packet
n Adaptation Field:
q 8 bits specifying the length of the adaptation field.
q The first group of flags consists of eight 1-bit flags:
q discontinuity_indicator
q random_access_indicator
q elementary_stream_priority_indicator
q PCR_flag
q OPCR_flag
q splicing_point_flag
q transport_private_data_flag
q adaptation_field_extension_flag
q The optional fields are present if indicated by one of the preceding flags.
q The remainder of the adaptation field is filled with stuffing bytes (0xFF, all ones).
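The flags byte can be decoded mechanically; a sketch (the flag names follow the list above, MSB first; the helper name is my own):

```python
# Flag bit positions in the adaptation-field flags byte (MSB first),
# following the order listed above.
ADAPTATION_FLAGS = [
    "discontinuity_indicator",               # bit 7
    "random_access_indicator",               # bit 6
    "elementary_stream_priority_indicator",  # bit 5
    "PCR_flag",                              # bit 4
    "OPCR_flag",                             # bit 3
    "splicing_point_flag",                   # bit 2
    "transport_private_data_flag",           # bit 1
    "adaptation_field_extension_flag",       # bit 0
]

def parse_adaptation_flags(flags_byte: int) -> dict:
    """Map each named flag to its bit value in the flags byte."""
    return {name: bool(flags_byte & (1 << (7 - i)))
            for i, name in enumerate(ADAPTATION_FLAGS)}

# 0x10 sets only bit 4, i.e. only the PCR_flag.
assert parse_adaptation_flags(0x10)["PCR_flag"]
```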
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 61
Demultiplexing a Transport Stream (TS)
n Demultiplexing a transport stream involves:
1. Finding the PAT by selecting packets with PID = 0x0000
2. Reading the PIDs for the PMTs
3. Reading the PIDs for the elements of a desired program from its PMT (for example, a basic program will have a PID for audio and a PID for video)
4. Detecting packets with the desired PIDs and routing them to the decoders
n An MPEG-2 transport stream can carry:
q Video streams
q Audio streams
q Any type of data
n MPEG-2 TS is the packet format for CATV downstream data communication.
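The routing step above can be sketched as a simple PID filter (a minimal sketch; it assumes packet-aligned input and ignores the adaptation field, and a real demultiplexer would also parse the PAT/PMT to discover the PIDs):

```python
def demux(ts: bytes, wanted_pids: set) -> dict:
    """Route transport-packet payloads to per-PID buffers (a sketch)."""
    streams = {pid: bytearray() for pid in wanted_pids}
    for i in range(0, len(ts) - 187, 188):
        packet = ts[i:i + 188]
        if packet[0] != 0x47:
            continue  # out of sync; a real demux would resynchronize
        pid = ((packet[1] & 0x1F) << 8) | packet[2]
        if pid in streams:
            streams[pid] += packet[4:]  # payload after the 4-byte header
    return streams
```

For example, feeding it packets on PIDs 0x100 and 0x101 while asking only for 0x100 collects only the 184-byte payloads of the 0x100 packets.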
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 62
Timing & buffer control
n Point A: Encoder input - constant/specified rate
n Point B: Encoder output - variable rate
n Point C: Encoder buffer output - constant rate
n Point D: Communication channel + decoder buffer - constant rate
n Point E: Decoder input - variable rate
n Point F: Decoder output - constant/specified rate
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 63
Timing - Synchronization
n The decoder is synchronized with the encoder by time stamps.
n The encoder contains a master oscillator and counter, called the System Time Clock (STC). (See previous block diagram.)
q The STC belongs to a particular program and is the master clock of the video and audio encoders for that program.
q Multiple programs, each with its own STC, can also be multiplexed into a single stream.
n A program component can even have no time stamps, but then it cannot be synchronized with other components.
n At encoder input (Point A), the time of occurrence of an input video picture or audio block is noted by sampling the STC.
n A total delay of encoder and decoder buffer (constant) is added to the STC, creating a Presentation Time Stamp (PTS).
q The PTS is then inserted in the first of the packet(s) representing that picture or audio block, at Point B.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 64
Timing - Synchronization (2)
n A Decode Time Stamp (DTS) can optionally be inserted into the bit stream; it represents the time at which the data should be taken instantaneously from the decoder buffer and decoded.
q DTS and PTS are identical except in the case of picture reordering for B pictures.
q The DTS is only used where it is needed because of reordering. Whenever DTS is used, PTS is also coded.
q Maximum PTS (or DTS) insertion interval = 700 ms.
q In ATSC, a PTS (or DTS) must be inserted at the beginning of each coded picture (access unit).
n In addition, the output of the encoder buffer (Point C) is time stamped with System Time Clock (STC) values, called:
q System Clock Reference (SCR) in a Program Stream.
q Program Clock Reference (PCR) in a Transport Stream.
n Maximum PCR time stamp interval = 100 ms.
n Maximum SCR time stamp interval = 700 ms.
n PCR and/or SCR are used to synchronize the decoder STC with the encoder STC.
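The clock relationship behind these time stamps can be illustrated with a small conversion helper (a sketch; the 90 kHz PTS/DTS tick rate and the 27 MHz PCR resolution are those defined by MPEG-2 Systems, while the function names are my own):

```python
PTS_CLOCK_HZ = 90_000           # PTS/DTS ticks run at 90 kHz
PCR_CLOCK_HZ = 27_000_000       # full PCR resolution is 27 MHz

def pts_to_seconds(pts_ticks: int) -> float:
    """Convert a PTS/DTS value (90 kHz ticks) to seconds."""
    return pts_ticks / PTS_CLOCK_HZ

def pcr_to_seconds(pcr_base: int, pcr_ext: int) -> float:
    """Convert a PCR (90 kHz base plus 27 MHz extension) to seconds."""
    return (pcr_base * 300 + pcr_ext) / PCR_CLOCK_HZ

print(pts_to_seconds(90_000))     # 1.0
print(pcr_to_seconds(90_000, 0))  # 1.0
```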
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 65
Timing - Synchronization (3)
n All video and audio streams included in a program must get their time stamps from a common STC so that synchronization of the video and audio decoders with each other can be accomplished.
n The data rate and packet rate on the channel (at the multiplexer output) can be completely asynchronous with the System Time Clock (STC).
n PCR time stamps allow synchronization of different multiplexed programs having different STCs, while allowing STC recovery for each program.
n If there is no buffer underflow or overflow, delays in the buffers and transmission channel for both video and audio are constant.
n The encoder input and decoder output run at equal and constant rates.
n Fixed end-to-end delay from encoder input to decoder output.
n If exact synchronization is not required, the decoder clock can be free running; video frames can be repeated or skipped as necessary to prevent buffer underflow or overflow, respectively.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 66
HDTV (High definition television)
n High definition television (HDTV) first came to public attention in 1981, when NHK, the Japanese broadcasting authority, first demonstrated it in the United States.
n HDTV is defined by the ITU-R as:
q 'A system designed to allow viewing at about three times the picture height, such that the system is virtually, or nearly, transparent to the quality or portrayal that would have been perceived in the original scene ... by a discerning viewer with normal visual acuity.'
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 67
HDTV (2)
n HDTV proposals are for a screen which is wider than the conventional TV image by about 33%. It is generally agreed that the HDTV aspect ratio will be 16:9, as opposed to the 4:3 ratio of conventional TV systems. This ratio has been chosen because psychological tests have shown that it best matches the human visual field.
n It also enables use of existing cinema film formats as additional source material, since this is the same aspect ratio used in normal 35 mm film. Figure 16.6(a) shows how the aspect ratio of HDTV compares with that of conventional television, using the same resolution, or the same surface area, as the comparison metric.
n To achieve the improved resolution the video image used in HDTV must contain over 1000 lines, as opposed to the 525 and 625 provided by the existing NTSC and PAL systems. This gives a much improved vertical resolution. The exact value is chosen to be a simple multiple of one or both of the vertical resolutions used in conventional TV.
n However, due to the higher scan rates the bandwidth requirement for analogue HDTV is approximately 12 MHz, compared to the nominal 6 MHz of conventional TV.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 68
HDTV (3)
n The introduction of a non-compatible TV transmission format for HDTV would require the viewer either to buy a new receiver, or to buy a converter to receive the picture on their old set.
n The initial thrust in Japan was towards an HDTV format which is compatible with conventional TV standards, and which can be received by conventional receivers, with conventional quality. However, to get the full benefit of HDTV, a new wide screen, high resolution receiver has to be purchased.
n One of the principal reasons that HDTV is not already common is that a general standard has not yet been agreed. The 26th CCIR plenary assembly recommended the adoption of a single, worldwide standard for high definition television.
n Unfortunately, Japan, Europe and North America are all investing significant time and money in their own systems based on their own, current, conventional TV standards and other national considerations.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 69
H261- H263
n The H.261 algorithm was developed for the purpose of image transmission rather than image storage.
n It is designed to produce a constant output of p x 64 kbit/s, where p is an integer in the range 1 to 30.
q This allows transmission over a digital network or data link of varying capacity.
q It also allows transmission over a single 64 kbit/s digital telephone channel for low quality video-telephony, or at higher bit rates for improved picture quality.
n The basic coding algorithm is similar to that of MPEG in that it is a hybrid of motion compensation, DCT and straightforward DPCM (intra-frame coding mode), without the MPEG I, P, B frames.
n The DCT operation is performed at a low level on 8 x 8 blocks of error samples from the predicted luminance pixel values, with sub-sampled blocks of chrominance data.
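The p x 64 kbit/s rule is trivial to express in code (a sketch; the function name is my own):

```python
def h261_rate_kbit(p: int) -> int:
    """Output rate of H.261 at multiplier p (1 <= p <= 30), in kbit/s."""
    if not 1 <= p <= 30:
        raise ValueError("p must be in the range 1 to 30")
    return p * 64

# p = 1 gives 64 kbit/s (a single digital telephone channel);
# p = 30 gives 1920 kbit/s, close to the 2 Mbit/s conferencing figure.
print(h261_rate_kbit(1))   # 64
print(h261_rate_kbit(30))  # 1920
```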
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 70
H261-H263 (2)
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 71
H261-H263 (3)
n H.261 is widely used on 176 x 144 pixel images.
n The ability to select a range of output rates for the algorithm allows it to be used in different applications.
n Low output rates (p = 1 or 2) are only suitable for face-to-face (videophone) communication. H.261 is thus the standard used in many commercial videophone systems such as the UK BT/Marconi Relate 2000 and the US AT&T 2500 products.
n Video-conferencing would require a greater output data rate (p > 6) and might go as high as 2 Mbit/s for high quality transmission with larger image sizes.
n A further development of H.261 is H.263, for lower fixed transmission rates.
n This deploys arithmetic coding in place of the variable length coding (see the H.261 diagram); with other modifications, the data rate is reduced to only 20 kbit/s.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 72
Model Based Coding (MBC)
n At the very low bit rates (20 kbit/s or less) associated with video telephony, the requirements for image transmission stretch the compression techniques described earlier to their limits.
n In order to achieve the necessary degree of compression they often require reduction in spatial resolution or even the elimination of frames from the sequence.
n Model based coding (MBC) attempts to exploit a greater degree of redundancy in images than current techniques, in order to achieve significant image compression but without adversely degrading the image content information.
n It relies upon the fact that image quality is largely subjective. Providing that the appearance of scenes within an observed image is kept at a visually acceptable level, it may not matter that the observed image is not a precise reproduction of reality.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 73
Model Based Coding (2)
n One MBC method for producing an artificial image of a head sequence utilizes a feature codebook, where a range of facial expressions, sufficient to create an animation, are generated from sub-images or templates which are joined together to form a complete face.
n The most important areas of a face, for conveying an expression, are the eyes and mouth; hence the objective is to create an image in which the movement of the eyes and mouth is a convincing approximation to the movements of the original subject.
n When forming the synthetic image, the feature template vectors which form the closest match to those of the original moving sequence are selected from the codebook and then transmitted as low bit rate coded addresses.
n By using only 10 eye and 10 mouth templates, for instance, a total of 100 combinations exists, implying that only a 7-bit codebook address need be transmitted.
n It has been found that there are only 13 visually distinct mouth shapes for vowel and consonant formation during speech.
n However, the number of mouth sub-images is usually increased, to include intermediate expressions and hence avoid step changes in the image.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 74
Model Based Coding (3)
n Another common way of representing objects in three-dimensional computer graphics is by a net of interconnecting polygons.
n A model is stored as a set of linked arrays which specify the coordinates of each polygon vertex, with the lines connecting the vertices together forming each side of a polygon.
n To make realistic models, the polygon net can be shaded to reflect the presence of light sources.
n The wire-frame model [Welch 1991] can be modified to fit the shape of a person's head and shoulders. The wire-frame, composed of over 100 interconnecting triangles, can produce subjectively acceptable synthetic images, providing that the frame is not rotated by more than 30° from the full-face position.
n The model (see the Figure) uses smaller triangles in areas associated with high degrees of curvature where significant movement is required.
n Large flat areas, such as the forehead, contain fewer triangles.
n A second wire-frame is used to model the mouth interior.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 75
Model based coding (4)
n A synthetic image is created by texture mapping detail from an initial full-face source image over the wire-frame. Facial movement can be achieved by manipulation of the vertices of the wire-frame.
n Head rotation requires the use of simple matrix operations upon the coordinate array. Facial expression requires the manipulation of the features controlling the vertices.
n This model based feature codebook approach suffers from the drawback of codebook formation.
n This has to be done off-line and, consequently, the image is required to be prerecorded, with a consequent delay.
n However, the actual image sequence can be sent at a very low data rate. For a codebook with 128 entries, where 7 bits are required to code each mouth, a 25 frame/s sequence requires less than 200 bit/s to code the mouth movements.
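The codebook arithmetic can be checked directly (a sketch; note that 100 combinations need ceil(log2 100) = 7 address bits):

```python
import math

def codebook_address_bits(entries: int) -> int:
    """Bits needed to address a codebook with the given number of entries."""
    return math.ceil(math.log2(entries))

# 10 eye x 10 mouth templates -> 100 combined expressions -> 7-bit address
assert codebook_address_bits(10 * 10) == 7

# A 128-entry mouth codebook needs 7 bits per frame; at 25 frames/s the
# mouth movements cost 25 * 7 = 175 bit/s, i.e. under 200 bit/s.
mouth_bitrate = 25 * codebook_address_bits(128)
print(mouth_bitrate)  # 175
```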
n When it is finally implemented, rates as low as 1 kbit/s are confidently expected from MBC systems, but they can only transmit image sequences which match the stored model, e.g. head and shoulders displays.
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 76
Key points:
n JPEG coding mechanism: DCT / Zigzag Scanning / Adaptive Quantization / VLC
n MPEG layered structure:
q Pixel, Block, Macroblock, Field DCT Coding / Frame DCT Coding, Slice, Picture, Group of Pictures (GOP), Sequence, Packetized Elementary Stream (PES)
n MPEG compression mechanism:q Predictionq Motion compensationq Scanningq YCbCr formats (4:4:4, 4:2:0, etc)
q Profiles @ Levels
q I,P,B pictures & reordering
q Encoder/ Decoder process & Block diagram
n MPEG Data transport
n MPEG Timing & Buffer control
q STC/SCR/DTS
q PCR/PTS
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 77
Technical terms
n Macroblocks
n HVS = Human Visual System
n GOP = Group of Pictures
n VLC = Variable Length Coding/Coder
n IDCT/DCT = (Inverse) Discrete Cosine Transform
n PES = Packetized Elementary Stream
n MP@ML = Main Profile @ Main Level
n PCR = Program Clock Reference
n SCR = System Clock Reference
n STC = System Time Clock
n PTS = Presentation Time Stamp
n DTS = Decode Time Stamp
n PAT = Program Association Table
n PMT = Program Map Table
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 78
Chapter 3. CATV systems
n Overview:
q A brief history
q Modern CATV networks
q CATV systems and equipment
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 79
A Brief History:
q CATV appeared in the 60s in the US, where high buildings are great obstacles to the propagation of TV signals.
q Old CATV networks
n Coaxial only
n Tree-and-Branch only
n TV only
n No return path (high-pass filters are installed in customers' houses to block low-frequency return-path noise)
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 80
Modern CATV networks
n Key elements:
q CO or Master Headend
q Headends/Hubs
q Server complex
q CMTS
q TV content provider
q Optical Nodes
q Taps
q Amplifiers (GNA/TNA/LE)
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 81
Modern CATV networks (2)
n Based on Hybrid Fiber-Coaxial architecture, also referred to as HFC networks.
n The optical section is based on modern optical communication technologies:
q Star/ring/mesh, etc. topologies
q SDH/SONET for digital fibers
q Various architectures: digital, analog or mixed fiber cabling systems.
n Part of the forward path spectrum is used for high-speed Internet access.
n The return path is exploited for digital data communication, which is the root of new problems !!
q 5-60 MHz band for upstream
q 88-860 MHz band for downstream
n 88-450 MHz for analog/digital TV channels
n 450-860 MHz for Internet access
q FDM
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 82
Spectrum allocation of CATV networks
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 83
CATV systems and equipment
4/2/2003 Nguyen Chan Hung Hanoi University of Technology 84
Vocabulary
n Perception = Su nhan thuc (awareness, the act of perceiving)
n Lap = Phu len (to cover over, to overlap)