Topic for lecture 2


Page 1: Topic for lecture 2

Topic for lecture 2

• Topic: video compression
• The ultimate compression task?
• Color image (300 x 300 x 24 bit):
– 2.16 Mbit/image x 30 images/s = 64.8 Mbps
• Motion picture: 90 min = 64.8 Mbps x 60 x 90 = 349.92 Gbit
• 56.6K modem => raw download time (excl. sound and overhead) ~ 1717 hours or ~ 72 days!
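The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the variable names are made up for illustration, and the "56.6K" modem is assumed to mean 56,600 bit/s.

```python
# Raw (uncompressed) video size and download time, using the slide's numbers.
WIDTH, HEIGHT, BITS_PER_PIXEL = 300, 300, 24
FRAME_RATE = 30           # images per second
DURATION_S = 90 * 60      # a 90-minute motion picture, in seconds
MODEM_BPS = 56_600        # assumption: "56.6K" means 56,600 bit/s

bits_per_image = WIDTH * HEIGHT * BITS_PER_PIXEL   # bits in one raw image
bit_rate = bits_per_image * FRAME_RATE             # bits per second of raw video
total_bits = bit_rate * DURATION_S                 # bits in the whole film
download_hours = total_bits / MODEM_BPS / 3600
download_days = download_hours / 24

print(f"{bits_per_image / 1e6:.2f} Mbit per image")
print(f"{bit_rate / 1e6:.1f} Mbps raw bit rate")
print(f"{total_bits / 1e9:.2f} Gbit in total")
print(f"{download_hours:.0f} hours, or about {download_days:.0f} days")
```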

Page 2: Topic for lecture 2

Agenda for lecture 2

• What makes video compression possible?

• Implementations of motion compensation
– Block matching

• The YCbCr color representation

• MPEG

Page 3: Topic for lecture 2

Video compression
• A sequence of images that needs to be compressed for storage and/or transmission
• Ignore audio, as images >> audio
• Straightforward methods:
– Motion JPEG
– 3D DCT

Page 4: Topic for lecture 2

Temporal redundancy
• Less than 10% of the pixels change by more than 1% between frames
• Temporal redundancy or interframe correlation
• Temporal redundancy > spatial redundancy
• Origin: slow camera and object movements

Page 5: Topic for lecture 2

Motion compensated coding

• Second generation of temporal compression methods
• More efficient (especially with rapid changes) but also more complex:
– OK, since the cost of computing power is decreasing faster than the cost of bandwidth
• Basic idea: the only difference between two images is the moving objects (draw)
• Estimate the motion and simply code this information
• From the prediction and the initial frame we can encode/decode all other frames

Page 6: Topic for lecture 2

Practical issues
• Due to noise, camera movements, light changes etc., both the object and the background change =>
– Calculate the prediction error (difference) and code this
• It is very hard to track and describe a general object (contour and texture); instead a block of pixels is used as the ’object’
• The estimated motion is represented as pure translation: no rotation and scaling
– This is justified since we have high frame rates and ’slow’ changes
– Denoted the displacement vector or motion vector

Page 7: Topic for lecture 2

Procedure for motion compensated coding
• Image sequence => image => blocks of pixels
• Step 1: Motion analysis:
– Estimate the motion vector of the current block, i.e. the position of the block in the previous image(s)
• Step 2: Prediction and differencing
– Predict what the block found in the previous image(s) will look like in the current image
– Subtract the predicted block from the current block => difference
• Step 3: Entropy encoding of the difference and motion vector
• Encoded difference and motion vector << raw image => video compression
• Step 3 we already know
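Step 2 and its decoder counterpart can be sketched for a single block as follows. This is a minimal illustration and not the MPEG procedure itself: the function names are hypothetical, the motion vector is taken as given (step 1 is skipped), and the sign convention is an assumption chosen so that a block at (x, y) in the current image came from (x - dx, y - dy) in the previous image.

```python
import numpy as np

def predict_block(prev, x, y, mv, size):
    """Copy the block the motion vector points to in the previous image."""
    dx, dy = mv
    return prev[y - dy : y - dy + size, x - dx : x - dx + size]

def encode_block(cur, prev, x, y, mv, size):
    """Step 2: difference between the current block and its prediction."""
    block = cur[y : y + size, x : x + size].astype(np.int16)
    return block - predict_block(prev, x, y, mv, size).astype(np.int16)

def decode_block(prev, x, y, mv, diff, size):
    """Decoder side: prediction plus the transmitted difference."""
    pred = predict_block(prev, x, y, mv, size).astype(np.int16)
    return (pred + diff).astype(np.uint8)

# Toy example: the whole image content moves 2 pixels to the right.
rng = np.random.default_rng(0)
prev = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
cur = np.roll(prev, shift=2, axis=1)

x, y, size = 8, 8, 8
mv = (2, 0)  # assumed known here; normally found by motion analysis
diff = encode_block(cur, prev, x, y, mv, size)
recon = decode_block(prev, x, y, mv, diff, size)
```

With a perfect motion vector the difference block is all zeros, which is exactly what makes the subsequent entropy coding cheap.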

Page 8: Topic for lecture 2

Motion analysis and prediction
• In general we seek the trajectory of a block so we can predict its current position, e.g. using weights
• In practice this is too complicated and instead a 0th order predictor is applied:
– Predicted block(x,y,t) = block(a,b,t-1)
– MPEG uses two 0th order predictors
• The only unknown issue is step 1: how do we find the block in the previous frame that best matches the block in the current frame?
• Three methods:
– Block matching (by far the most applied method)
– Pel-recursion (block = 1 pixel)
– Optical flow (block = 1 pixel)

Page 9: Topic for lecture 2

Block matching (1)
• Principle
• All pixels in a block are assumed to have the same motion vector
• Search window
– Maximum displacement determined from frame rate and context
– Usually a square region
• Block of p x q pixels; usually p = q => square block
• The smaller the block size => the better the prediction, but more overhead (motion vectors)
• Usually block size = 16 x 16

Page 10: Topic for lecture 2

Block matching (2)
• Overlapping blocks improve reconstructed image quality but increase the bit rate
– Usually non-overlapping blocks are applied
• Block matching via a similarity measure between the current block u and a candidate block v, accumulated over all pixels in the block:
– Sum of squared differences (SSD): S(u,v) = Σ (u-v)^2
– Mean absolute differences (MAD): S(u,v) = (1/(p·q)) Σ |u-v|
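As an illustration, block matching with these two measures and an exhaustive (full) search might look like the sketch below. The function names, the search range of ±7 pixels and the sign convention (the block at (x, y) in the current image is assumed to come from (x - dx, y - dy) in the previous image) are assumptions made for the example.

```python
import numpy as np

def ssd(u, v):
    """Sum of squared differences between two equally sized blocks."""
    d = u.astype(np.float64) - v.astype(np.float64)
    return float(np.sum(d * d))

def mad(u, v):
    """Mean absolute difference between two equally sized blocks."""
    return float(np.mean(np.abs(u.astype(np.float64) - v.astype(np.float64))))

def full_search(cur, prev, x, y, size=16, search=7, measure=mad):
    """Test every displacement in the search window; return (best_mv, best_cost)."""
    block = cur[y : y + size, x : x + size]
    h, w = prev.shape
    best_mv, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            px, py = x - dx, y - dy
            # Skip candidates that fall outside the previous image.
            if px < 0 or py < 0 or px + size > w or py + size > h:
                continue
            cost = measure(block, prev[py : py + size, px : px + size])
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

With a 16 x 16 block and a ±7 window this evaluates up to 225 candidate positions per block, which is the heavy processing that the faster search strategies try to avoid.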

Page 11: Topic for lecture 2

Searching strategies
• Full search:
– Finds the global minimum but requires heavy processing!
• Assume only one minimum in the search region => a less computationally demanding search strategy can be used
• Accept a local minimum =>
– Larger difference but less processing
• Searching strategies assuming one (local) minimum:
– Coarse-fine three-step search
– 2D logarithmic search
– Conjugate direction search
– Etc.

Page 12: Topic for lecture 2

Coarse-fine three-step search
• Step 1) Test 9 points within a fixed pattern
• Step 2+3) Centre the pattern around the best match and reduce the distance within the pattern
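A possible sketch of the three-step search is given below. The step sizes 4, 2, 1 (covering displacements up to ±7), the MAD measure, the function names and the sign convention (block at (x, y) comes from (x - dx, y - dy) in the previous image) are all assumptions for the example, not something the slides specify.

```python
import numpy as np

def mad(u, v):
    """Mean absolute difference between two equally sized blocks."""
    return float(np.mean(np.abs(u.astype(np.float64) - v.astype(np.float64))))

def three_step_search(cur, prev, x, y, size=16, step=4):
    """Coarse-to-fine search: 9 points per step, halving the spacing each time."""
    block = cur[y : y + size, x : x + size]
    h, w = prev.shape
    cx, cy = 0, 0  # centre of the pattern, as a displacement (dx, dy)
    while step >= 1:
        best = (float("inf"), cx, cy)
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                dx, dy = cx + ox, cy + oy
                px, py = x - dx, y - dy
                # Skip candidates that fall outside the previous image.
                if px < 0 or py < 0 or px + size > w or py + size > h:
                    continue
                cost = mad(block, prev[py : py + size, px : px + size])
                if cost < best[0]:
                    best = (cost, dx, dy)
        _, cx, cy = best   # re-centre the pattern on the best match
        step //= 2         # reduce the distance within the pattern
    return (cx, cy)
```

This evaluates at most 27 positions instead of the 225 of a full search over the same ±7 window, at the price of possibly ending in a local minimum when the error surface has more than one.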

Page 13: Topic for lecture 2

YCbCr color representation

Page 14: Topic for lecture 2

YCbCr color representation

• A camera captures color in RGB format (show)
• We would like a representation where the intensity and color are separated:
– So we can transmit and decode both a color and a gray-scale signal
– [R,G,B]: [50,50,50] is the same color as [100,100,100]
– HSI (hue-saturation-intensity)
– HSI is complex to calculate so we seek a simpler representation
• YUV-representation is a simple approximation:
– Y = Luminance (intensity) = 0.299 R + 0.587 G + 0.114 B
– The non-uniform weighting comes from the HVS (human visual system)
– U = B – intensity = ”pure” blue color = 0.492 (B - Y)
– V = R – intensity = ”pure” red color = 0.877 (R - Y)
– Rough approximation but very simple to compute
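The YUV formulas translate directly into code. The sketch below uses a hypothetical function name and shows that two gray levels differ only in Y, while U and V stay at (approximately) zero.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to YUV using the weights from the slide."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance (intensity)
    u = 0.492 * (b - y)                      # blue minus intensity
    v = 0.877 * (r - y)                      # red minus intensity
    return y, u, v

dark_gray = rgb_to_yuv(50, 50, 50)
light_gray = rgb_to_yuv(100, 100, 100)
pure_blue = rgb_to_yuv(0, 0, 255)

print("dark gray :", dark_gray)
print("light gray:", light_gray)
print("pure blue :", pure_blue)
```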

Page 15: Topic for lecture 2

YCbCr color representation (3)
• The HVS is more sensitive to intensity (Y) than to color (Cb and Cr), so more bits can be used to represent the intensity

• Formats (average bits per pixel):
– 4:4:4 (24 bits)
– 4:2:2 (16 bits)
– 4:2:0 (12 bits)
[Figure: sample grids for the three formats, marking the Y samples and the Cb and Cr samples]
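The bits-per-pixel figures follow from counting samples. The sketch below assumes the usual reading of the J:a:b notation (a region J pixels wide and 2 rows high, with a chroma samples in the first row and b additional ones in the second row) and 8 bits per sample; the function name is made up for the example.

```python
def bits_per_pixel(j, a, b, bits_per_sample=8):
    """Average bits per pixel for a J:a:b sampling format (J x 2 pixel region)."""
    luma_samples = 2 * j            # one Y sample per pixel
    chroma_samples = 2 * (a + b)    # Cb and Cr each have a + b samples
    return bits_per_sample * (luma_samples + chroma_samples) / (2 * j)

for fmt in [(4, 4, 4), (4, 2, 2), (4, 2, 0)]:
    print(fmt, bits_per_pixel(*fmt))
```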

Page 16: Topic for lecture 2

MPEG
• MPEG = Moving Picture Experts Group
• International standard for compression of video (image, sound, and system info.), due to growth in the digital media (e.g. CD-ROM, DVD) market. Both transmission and storage
• MPEG-1: 1991
• MPEG-2: 1994
– MPEG-2 is MPEG-1 compatible, hence only MPEG-2 is used today
• MPEG is NOT an algorithm but rather a framework with several algorithms and MANY user settings.
– Fixed protocol, hence fixed decoders (encoder not specified!)
– Asymmetrical codec ~ 100:1 (JPEG ~ 1:1), i.e. encoding is far more demanding than decoding
• MPEG compression is lossy

Page 17: Topic for lecture 2

MPEG-1
• MPEG-2 is an ”add-on” to MPEG-1
• Typical bit rate for MPEG-1 = 1.5 Mbps
– Meaning that an MPEG-1 decoder can decode and show real-time video that has been compressed to 1.5 Mbps. MPEG: trade-off between video quality and bandwidth
• Allows resolutions up to 4095 x 4095 at 60 Hz
– Most used is the CPB (constrained parameters bitstream)
  • Fixed resolutions and frame rates => HW implementations
  • Max. resolution = 768 x 576 at 30 Hz
  • Max. bit rate = 1.856 Mbps

Page 18: Topic for lecture 2

MPEG-1 compression rate
• BT.601 (digital TV signal):
• 704 x 576 x 24 bit x 25 Hz = 243 Mbps
• Compression factor: 243 Mbps / 1.5 Mbps = 162
• JPEG = 10-20
• YCbCr 4:2:0 format: 12 bits per pixel
• Basic operation: down-scale to SIF (source input format)
– Fixed resolution => HW solutions
– 352 x 288 (ignore lines and/or interpolate)
• 352 x 288 x 12 bit x 25 Hz = 30.4 Mbps => comp. factor = 20
• But can be higher or lower
• In general: less input data => better image quality (for a fixed bit rate)
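The two compression factors can be reproduced as follows. The helper name is hypothetical, and the SIF resolution is assumed to be 352 x 288, which is the value that reproduces the 30.4 Mbps figure.

```python
def raw_bit_rate(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed bit rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second

TARGET_BPS = 1.5e6  # typical MPEG-1 bit rate

bt601 = raw_bit_rate(704, 576, 24, 25)   # full-resolution RGB input
sif = raw_bit_rate(352, 288, 12, 25)     # assumed SIF size, YCbCr 4:2:0

print(f"BT.601: {bt601 / 1e6:.1f} Mbps, factor {bt601 / TARGET_BPS:.0f}")
print(f"SIF   : {sif / 1e6:.1f} Mbps, factor {sif / TARGET_BPS:.0f}")
```

Most of the factor 162 thus comes from down-scaling and chroma subsampling (a factor of 8); the codec itself only has to deliver the remaining factor of about 20.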

Page 19: Topic for lecture 2

MPEG-1 principle (1)

• Full-motion-compensated DCT and difference coding

• Frames: 1,2,3,4,5,6,7,8,9, …

• 1: (DCT-JPEG)

• 2,3,4,5,6,7,8,9, … : difference coding
– The difference is DCT coded and quantized => lossy compression
– Problems?
  • Error propagation
  • No random access

Page 20: Topic for lecture 2

MPEG-1 principle (2)

• I-picture: intra-coded
– Similar to JPEG
• P-picture: predictive coded via forward prediction
• B-picture: predictive coded via:
– forward, backward, or bi-directional prediction
• Errors in I and P are limited to max. one GOP (group of pictures)
• Errors in B are limited to one picture
• High N and M => good coding but error propagation (N = number of pictures in a GOP, M = distance between consecutive I/P pictures).
– Usually: 13<N<16 and 0<M<4
– Recommended: I each ½ sec. and whenever the scene changes
• Coding order vs. visualisation order
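The difference between coding order and visualisation order comes from the B-pictures: a B-picture can only be decoded once both of its reference pictures (the I or P before and after it) are available. The sketch below, with a made-up function name, reorders a display-order sequence accordingly; it is a simplified illustration and ignores GOP boundaries.

```python
def coding_order(display_types):
    """Return picture indices in coding order for a display-order type string."""
    order, pending_b = [], []
    for idx, picture_type in enumerate(display_types):
        if picture_type == "B":
            pending_b.append(idx)      # must wait for the next I or P picture
        else:
            order.append(idx)          # I or P: send the reference first
            order.extend(pending_b)    # then the B-pictures that depend on it
            pending_b = []
    order.extend(pending_b)            # trailing B-pictures, if any
    return order

display = "IBBPBBPBBI"
order = coding_order(display)
print("display order:", list(range(len(display))))
print("coding order :", order)
print("coded types  :", "".join(display[i] for i in order))
```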

Page 21: Topic for lecture 2

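[Figure: MPEG data hierarchy. Entire sequence => pictures (type: I, P, B) => macroblocks (MB = Macro Block). In 4:2:0 format a macroblock consists of 6 blocks: a 16 x 16 Y area (four 8 x 8 blocks), one 8 x 8 Cb block and one 8 x 8 Cr block]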

Page 22: Topic for lecture 2

Coding one Block (8x8)

• Similar to JPEG except for adaptive quantization
– DCT, quantization, zig-zag scan, entropy coding
– Adaptive quantization controls the quality/amount of data
– Intra vs. inter coding:
  • I-blocks: intra
  • P,B-blocks: depending on DIFF: 0, motion vectors, inter, intra.
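The first three stages for one 8 x 8 block (DCT, quantization, zig-zag scan) can be sketched as below. This is a simplification under stated assumptions: a single uniform quantizer step `qstep` stands in for MPEG's quantization matrix and adaptive scale factor, the level shift by 128 follows JPEG practice for intra blocks, entropy coding is left out, and all function names are hypothetical.

```python
import numpy as np

N = 8

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix C, so that coeffs = C @ block @ C.T."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos((2 * i + 1) * k * np.pi / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def zigzag_indices(n=N):
    """(row, col) pairs in zig-zag order, low frequencies first."""
    cells = ((r, c) for r in range(n) for c in range(n))
    # Odd anti-diagonals run top to bottom, even ones bottom to top.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def encode_block(block, qstep):
    """DCT, uniform quantization and zig-zag scan of one 8 x 8 block."""
    c = dct_matrix()
    coeffs = c @ (block.astype(np.float64) - 128.0) @ c.T
    levels = np.round(coeffs / qstep).astype(int)
    return [int(levels[r, col]) for r, col in zigzag_indices()]

def decode_block(levels, qstep):
    """Inverse zig-zag scan, dequantization and inverse DCT."""
    c = dct_matrix()
    q = np.zeros((N, N))
    for value, (r, col) in zip(levels, zigzag_indices()):
        q[r, col] = value
    pixels = c.T @ (q * qstep) @ c + 128.0
    return np.clip(np.round(pixels), 0, 255).astype(np.uint8)
```

A larger `qstep` sets more coefficients to zero, so the zig-zag sequence ends in a longer run of zeros and costs fewer bits after entropy coding, at the expense of image quality.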

Page 23: Topic for lecture 2

Coding one Block (8x8)

• Encoding

• Decoding

Page 24: Topic for lecture 2

What to remember

• Video compression is done by removing the temporal redundancy
• Principle: (at block level)
– Step 1: Motion analysis => motion vector
– Step 2: Calculate the error/difference (subtraction)
– Step 3: Entropy encoding of motion vector and difference
• Motion analysis:
– Pel-recursion
– Optical flow
– Block matching (the currently applied method)
• Block matching
– Block of pixels (16 x 16)
– Similarity measure
– Search region
– Different search strategies to avoid the full search

Page 25: Topic for lecture 2

What to remember
• Video compression is done by removing the temporal redundancy
• Principle: (at (macro)block level)
– Step 1: Motion analysis (block matching) => motion vector
– Step 2: Calculate the error/difference (subtraction)
– Step 3: ’JPEG’-coding (DCT, quantization and entropy encoding)
• MPEG-1:
– Bit rate ~ 1.5 Mbps
– Asymmetrical codec ~ 100:1 (JPEG ~ 1:1)
– Compression rate < 400 (down-scaling + YCbCr 4:2:0 => ~20)
– Coding style: I B B P B B P B B I
• Questions?
• Presentations: email me [email protected]
• The end

Page 26: Topic for lecture 2

Xtras

Page 27: Topic for lecture 2

Pel-recursion (1)
• The block consists of only one pixel (= pel)
• Problem formulation:
– Displaced frame difference function:
– DFD(x,y,dx,dy) = i(x,y,t) – i(x-dx,y-dy,t-1)
– Find (dx,dy) which minimises DFD^2 => most similar pixel => best displacement vector
• Solution:
– Setting the partial derivatives = 0
– Non-linear programming problem:
  • Iterative algorithm
  • Steepest descent method
  • Newton-Raphson’s method
  • Others
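For the steepest descent variant, one iteration could be written as follows. This is a sketch derived from the DFD definition above; the step size \varepsilon and the iteration index k are symbols introduced here for illustration and do not appear in the slides.

```latex
% With d = (dx, dy) and DFD(x,y,d) = i(x,y,t) - i(x-dx, y-dy, t-1):
%   \nabla_d DFD = \nabla i(x-dx, y-dy, t-1)   (spatial gradient of the previous image)
\[
  d^{(k+1)} = d^{(k)} - \frac{\varepsilon}{2}\,\nabla_d\, DFD^2\bigl(x,y,d^{(k)}\bigr)
            = d^{(k)} - \varepsilon\, DFD\bigl(x,y,d^{(k)}\bigr)\,
              \nabla i\bigl(x-dx^{(k)},\, y-dy^{(k)},\, t-1\bigr)
\]
```

Each iteration thus moves the displacement estimate against the image gradient, scaled by the current prediction error.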

Page 28: Topic for lecture 2

Pel-recursion (2)
• Algorithm:
• Find the motion vector (dx,dy) for the first pixel
• The motion vectors are correlated =>
– Use the ’old’ (dx,dy) as initial guess for the iterative algorithm => recursion

Page 29: Topic for lecture 2

Optical flow

• The block consists of only one pixel

• Similar to pel-recursion but calculated in a different manner

Page 30: Topic for lecture 2

Comparing the 3 types of motion analysis

• The three: pel-recursion, optical flow and block matching
• Optical flow and pel-recursion calculate one motion vector for each pixel =>
– More precise => predicted block and current block are more similar => smaller difference => more compact coding of the difference.
– More overhead as more motion vectors are to be coded
– More complex to calculate
– Pixel methods avoid the block artefacts of block matching
• Block matching is (at present) more suitable
– Used in all coding standards

Page 31: Topic for lecture 2

Temporal methods

• Two methods which exploit both the spatial and temporal redundancies
– Frame replenishment
– Motion compensation
• Both utilise prediction => short summary

Page 32: Topic for lecture 2

Frame replenishment (1)

• Exploit the temporal redundancy
• First generation of temporal compression methods
• If the value changed significantly: | i(x,y,t) – i(x,y,t-1) | > TH
• Then: code value and position: i(x,y,t), x, y
• Else: code nothing => re-use i(x,y,t-1)
• Enhancements:
– Send differences instead of values
– Remove noise from the images prior to processing
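The if/then/else rule maps onto a few lines of array code. The following is a minimal sketch with hypothetical function names; it codes values and positions (the basic scheme, not the enhancement with differences) and skips entropy coding.

```python
import numpy as np

def replenish_encode(cur, prev, th):
    """Return positions and values of pixels that changed by more than TH."""
    change = np.abs(cur.astype(np.int16) - prev.astype(np.int16))
    ys, xs = np.nonzero(change > th)
    return ys, xs, cur[ys, xs]

def replenish_decode(prev, ys, xs, values):
    """Re-use the previous image and overwrite only the transmitted pixels."""
    out = prev.copy()
    out[ys, xs] = values
    return out

prev = np.full((4, 4), 100, dtype=np.uint8)
cur = prev.copy()
cur[1, 2] = 150   # large change: will be coded
cur[3, 0] = 103   # small change: below TH, will be ignored

ys, xs, values = replenish_encode(cur, prev, th=10)
recon = replenish_decode(prev, ys, xs, values)
```

Here only one of the 16 pixels is transmitted; the small change at (3, 0) is lost, which is the source of the reconstruction error when TH is set high.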

Page 33: Topic for lecture 2

Frame replenishment (2)

• A fixed bit rate of 1 Mbps means that the decoder can only decode and play back real-time video compressed to 1 Mbps
• Many changes between two images => many pixels to be coded.
• To keep the same bit rate => TH must be raised => only large changes are coded => poorer reconstruction, aka the dirty window effect

Page 34: Topic for lecture 2

2D logarithmic search
• Test 5 points within a fixed pattern

• Centre the pattern around the best match

• When best match is in the centre or on the border: reduce distance in pattern

Page 35: Topic for lecture 2

Conjugate direction search
• Step 1: Test 3 horizontal points next to each other
• Step 2: Move to the minimum point
• Repeat steps 1 and 2 until a minimum is found. Then repeat the process in the vertical direction

Page 36: Topic for lecture 2

YCbCr color representation (2)

• YUV-representation can have negative values, so YUV-representation is scaled and shifted to avoid this => YCbCr-representation

• Cb and Cr are denoted the chrominances

• YCbCr is the representation utilised in image/video compression

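\[
\begin{pmatrix} Y \\ U \\ V \end{pmatrix} =
\begin{pmatrix}
 0.299 &  0.587 &  0.114 \\
-0.147 & -0.289 &  0.436 \\
 0.615 & -0.515 & -0.100
\end{pmatrix}
\begin{pmatrix} R \\ G \\ B \end{pmatrix}
\]

\[
\begin{pmatrix} Y \\ C_b \\ C_r \end{pmatrix} =
\begin{pmatrix}
 0.257 &  0.504 &  0.098 \\
-0.148 & -0.291 &  0.439 \\
 0.439 & -0.368 & -0.071
\end{pmatrix}
\begin{pmatrix} R \\ G \\ B \end{pmatrix}
+
\begin{pmatrix} 16 \\ 128 \\ 128 \end{pmatrix}
\]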
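The scaled and shifted YCbCr transform can be tried out numerically. The sketch below uses the matrix and offset from this slide; the names `M`, `OFFSET` and `rgb_to_ycbcr` are chosen for the example, and 8-bit RGB input (0 to 255) is assumed.

```python
import numpy as np

# Matrix and offset of the RGB -> YCbCr transform from the slide.
M = np.array([[ 0.257,  0.504,  0.098],
              [-0.148, -0.291,  0.439],
              [ 0.439, -0.368, -0.071]])
OFFSET = np.array([16.0, 128.0, 128.0])

def rgb_to_ycbcr(rgb):
    """Convert one 8-bit RGB triple to (Y, Cb, Cr) as floating point values."""
    return M @ np.asarray(rgb, dtype=np.float64) + OFFSET

for name, rgb in [("black", (0, 0, 0)), ("white", (255, 255, 255)),
                  ("red", (255, 0, 0)), ("blue", (0, 0, 255))]:
    print(name, np.round(rgb_to_ycbcr(rgb), 1))
```

Thanks to the offset, none of the components becomes negative for 8-bit input, and gray pixels end up with Cb = Cr = 128.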

Page 37: Topic for lecture 2

Audio in MPEG-1
• 16 bit sampled at: 16, 22.05, 24, 32, 44.1 and 48 kHz
• Stereo at 44.1 kHz = 1.4 Mbps
• Compression based on psycho-acoustic redundancy
• Three methods:
– Layer 1: Target rate = 384 Kbps
– Layer 2: Target rate = 256 Kbps
– Layer 3: Target rate = 128 Kbps
• Layer 3 is the most advanced and often applied
– It has a nickname, which?

[Figure: two plots of level (dB) versus frequency (Hz)]

Page 38: Topic for lecture 2

MPEG-2
• Defined in 1994
• Developed for DTV but has lots of other applications
• Based on MPEG-1 (backward compatible)
• Bit rates: 1.5 Mbps – 60 Mbps. Target: 2-15 Mbps (best: 4)
• Lots of new features including:
– Support for fields, support for 4:4:4 and 4:2:2
– Alternative zig-zag scan, better motion vectors
– Scalability to allow any subset of a stream to be decoded and visualised, etc.
• MPEG-3: Purpose: HDTV
– Merged with MPEG-2 => no MPEG-3 standard

Page 39: Topic for lecture 2

MPEG-4
• Both for real video and synthetic video
• Very low bit rates < 64 Kbps => efficient coding
• Content based coding: code the objects
– Shape, texture and sprite (background objects)
• Interactivity
• Popular coding standards: