
ELE 488 F06

ELE 488 Fall 2006: Image Processing and Transmission

(11-28-06)

Digital Video

•Motion Pictures

•Broadcast Television

•Digital Video


Motion Picture Television Digital Video

• Broadcast Television (analog)

– why invent new technology?

– movie at home, mass market

– influence of movies on development

• Key Steps

– convert pictures to electric signal

– send electric signal

– convert electric signal to picture

• Comparison with motion pictures

• High Definition Television - analog to digital, compression

• Video telephone - analog predecessor

• Video conference - travel cost, people cost

• Cable (narrowcast), satellite, interactive, ...


NTSC (National Television Systems Committee)

• 525 lines

– Two dots closer together than 1/2000 of the viewing distance cannot be separated by the eye (they merge into one)

– Assume viewing at a distance of 4 times the screen height. Then there is no need for more than 500 lines

– NTSC set 525 lines (475 active)

• Movies in 1940 had a 4:3 aspect ratio (width to height)

• 25 or more pictures per second to see continuous motion

• 50 or more pictures per second to avoid flicker

– movies use 24 frames/sec, each shown twice

• 30 frames/sec with 2:1 interlace (60 even-odd fields/sec)
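The 500-line figure follows directly from the two viewing assumptions in the list above. As a rough worked check, writing $H$ for the screen height, $D$ for the viewing distance and $\Delta$ for the smallest separation the eye can resolve (symbols introduced here only for illustration):

```latex
\[
D = 4H, \qquad
\Delta = \frac{D}{2000} = \frac{4H}{2000} = \frac{H}{500}
\quad\Longrightarrow\quad
\frac{H}{\Delta} = 500 \ \text{resolvable lines per screen height}
\]
```

Likewise, showing each of the 24 movie frames twice gives 48 flashes per second, close to the 50 per second needed to avoid flicker.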


Bandwidth of Broadcast Television

• Without interlace (progressive scan), 60 frames/sec

– 500 lines alternating black and white give 250 full cycles

– each horizontal line has 250 x 4/3 ~ 350 full cycles

– 60 (frames/sec) x 500 (lines) x 350 (cycles/line) ~ 10 MHz (video ONLY)

• With 2:1 interlace, 5 MHz for video

• FCC assigns 6 MHz per broadcast channel

– real usable bandwidth is less, MUCH less

– actual resolvable lines per vertical height ~250

• Color insertion - must be compatible with B/W receivers

– Change R-G-B to Y-Cb-Cr

– Y is luminance (brightness), Cb and Cr are chrominances

– B/W sets convert Y to a picture; color sets convert Y-Cb-Cr to R-G-B, then display
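The bandwidth estimate on this slide can be reproduced as a back-of-the-envelope calculation. The sketch below is illustrative only: the function name `video_bandwidth_hz` is hypothetical, and it uses the unrounded 250 x 4/3 (about 333) cycles per line instead of the slide's rounded 350, so it lands on 10 MHz directly.

```python
def video_bandwidth_hz(frames_per_sec, lines, aspect_ratio=4 / 3):
    """Rough analog video bandwidth in Hz.

    One full cycle covers one black/white pair, so `lines` alternating
    lines correspond to lines/2 cycles vertically; the same detail
    horizontally needs (lines/2) * aspect_ratio cycles per scan line.
    """
    cycles_per_line = (lines / 2) * aspect_ratio
    return frames_per_sec * lines * cycles_per_line


progressive = video_bandwidth_hz(60, 500)  # 60 full frames per second
interlaced = video_bandwidth_hz(30, 500)   # 2:1 interlace: 30 full frames per second

print(f"progressive: {progressive / 1e6:.1f} MHz")
print(f"interlaced:  {interlaced / 1e6:.1f} MHz")
```

Halving the number of full frames per second through interlacing halves the estimate, which is how the video signal fits under the 6 MHz channel allocation.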


Digital Video

• What drives digital video?

– Information technology: electronics, communication, storage, new functionality, …

– HDTV

• R-G-B component video

– 640 x 480 (pixels) x 3 (colors) x 8 (bits/color) x 30 (frames/sec) = 221 Mb/sec

• Y-Cb-Cr with subsampled Cb and Cr

– 640 x 480 (pixels) x 1.5 (colors) x 8 (bits/color) x 30 (frames/sec) = 110 Mb/sec

• Compression - MPEG (Moving Picture Experts Group)

– MPEG-1: CD-ROM, 1.5 Mb/sec, 1.2 Mb/sec for video, 352x240 (SIF), progressive scan, motion compensation

– MPEG-2: extension of MPEG-1, interlace, HD

– MPEG-4: object/region based

– H.2xx
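The raw bit rates quoted above are plain products of the sampling parameters. The following sketch recomputes them; the helper name `raw_bit_rate` is hypothetical. The last line is an extra illustration that is not on the slide: it assumes the 352x240 MPEG-1 picture also uses 1.5 samples per pixel at 30 frames/sec and compares its raw rate with the 1.2 Mb/sec video budget.

```python
def raw_bit_rate(width, height, samples_per_pixel, bits_per_sample, fps):
    """Uncompressed video bit rate in bits per second."""
    return width * height * samples_per_pixel * bits_per_sample * fps


rgb = raw_bit_rate(640, 480, 3, 8, 30)      # full R-G-B
ycbcr = raw_bit_rate(640, 480, 1.5, 8, 30)  # Y plus subsampled Cb and Cr

# Assumed for illustration: 352x240, 1.5 samples/pixel, 30 frames/sec.
sif_raw = raw_bit_rate(352, 240, 1.5, 8, 30)
mpeg1_video = 1.2e6  # bits per second available for video in MPEG-1

print(f"R-G-B:    {rgb / 1e6:.1f} Mb/sec")
print(f"Y-Cb-Cr:  {ycbcr / 1e6:.1f} Mb/sec")
print(f"MPEG-1 compression ratio (under the stated assumptions): {sif_raw / mpeg1_video:.1f}:1")
```

Under these assumptions the compression ratio comes out at roughly 25:1, which indicates how much work the motion compensated coder has to do.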


Video Coding

• Video consists of frames $I_n(i,j)$

– Code each frame as a still picture – motion JPEG

• Each frame is close to the previous frame

– Code the difference $FD_n(i,j) = I_n(i,j) - I_{n-1}(i,j)$

– Differential coding (DPCM, predictive coding)

( $I_{n-1}(i,j)$ is the predicted value of $I_n(i,j)$ )

– Need to code the first frame
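A minimal sketch of the frame-difference idea, using small nested lists as stand-ins for frames. The function names and the toy 3x4 frames are made up for illustration; a real coder would also quantize and entropy-code the difference, which is omitted here.

```python
def frame_difference(current, previous):
    """FD_n(i,j) = I_n(i,j) - I_{n-1}(i,j), computed pixel by pixel."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, previous)]


def reconstruct(previous, difference):
    """Decoder side: I_n = I_{n-1} + FD_n."""
    return [[p + d for p, d in zip(prow, drow)]
            for prow, drow in zip(previous, difference)]


# Toy frames: a bright two-pixel object moves one pixel to the right.
previous = [[10, 10, 10, 10],
            [10, 50, 50, 10],
            [10, 10, 10, 10]]
current = [[10, 10, 10, 10],
           [10, 10, 50, 50],
           [10, 10, 10, 10]]

fd = frame_difference(current, previous)
nonzero = sum(v != 0 for row in fd for v in row)

print(fd[1])    # only the pixels touched by the motion are nonzero
print(nonzero)  # number of nonzero difference values
assert reconstruct(previous, fd) == current
```

Most of the difference frame is zero, which is what makes it cheaper to code than the frame itself. The first frame has no predecessor, so it still has to be coded on its own.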


Encoding Three Frame Types

Differential encoding of video

I – Intra Frame, coded by itself

P – Prediction Frame, coded by referring to the previous I or P frame

B – Bi-directional Frame, coded by referring to forward AND backward I or P frames

[Figure: a sequence of I, P and B frames running from one I frame to the next]


Coding of I-frame – same as still image


I Frames

• I frames are intra-coded using the JPEG coder

• I frames can be decoded without reference to other frames of the video.

• Sometimes called anchor frames

I frame: JPEG

Frame 31

A group of pictures (GOP) begins with an I-frame and ends before the next I-frame

A typical GOP length is 15 frames, with only 1 I-frame per GOP (the first frame)


Coding P Frames

• Each frame is close to the previous frame

– Code frame difference (differential coding – DPCM)

• Occlusion – parts of the current frame are blocked in the previous frame

– need a future frame to “predict”: $FD_n(i,j) = I_n(i,j) - I_{n+1}(i,j)$

[Figure: current frame $I_n$ and frame difference $I_n - I_{n-1}$]


Coding of P Frames

• Video consists of frames $I_n(i,j)$

– Code each frame as a still picture – motion JPEG

• Each frame is close to the previous frame

– Code the difference $FD_n(i,j) = I_n(i,j) - I_{n-1}(i,j)$

– Differential coding (predictive coding)

– $I_{n-1}(i,j)$ is the predicted value of $I_n(i,j)$

• Observe:

– Most of the frame is unchanged

– Except for moving objects

– Motion Compensated Coding (MPEG)


Motion Compensated Video Coding

Observe: Most of picture remains unchanged

But some objects have moved.

So code Displaced Frame Difference

Motion Compensated Coding

previous frame current frame


Displaced Frame Difference


P Frames

I frame: JPEG. P frame: motion compensated. Macro-blocks and macro-block motion vectors are indicated.

Frame 31 Frame 34

P frames are coded using two methods:

- block motion compensation + error coding

- JPEG (intra-coded), without referring to previous frames

P frames are also anchor frames

Divide the P-frame into macro-blocks (MB ~16x16)


Finding Motion Vectors

• Matching a block from the current frame with a displaced block in the reference frame using:

(a) sum of squared differences (SSD), or

(b) sum of absolute differences (SAD) (almost always used)

• The displacement giving the best match is the motion vector of the block

• Search methods:

– Global search over the entire anchor frame

– Restricted search over a local neighborhood

– Fast search – over a selected neighborhood

anchor frame current frame


Illustration: P-frame Macro-Blocks

Frame 34 P-frame


MPEG: I and P frames (anchor frames)


Block Matching Motion Estimation

current frame

Block for which motion vector to be determined

a position for comparison

previous frame

another position

Mean Absolute Difference:

$$\mathrm{MAD}(k,l,x,y) = \frac{1}{MN} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \left| I_n(k+i,\, l+j) - I_{n-1}(k+x+i,\, l+y+j) \right|$$

Motion vector for the (k,l)th block:

$$v(k,l) = \arg\min_{(x,y)} \mathrm{MAD}(k,l,x,y)$$

Blocks of size MxN
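The MAD criterion and the arg-min over displacements translate almost line by line into code. The sketch below is a plain exhaustive search under some assumptions of its own: frames are nested lists indexed as `frame[row][col]`, the search range parameter `R` and the boundary check are added for the demo, and the ramp test image is synthetic.

```python
def mad(cur, ref, k, l, x, y, M, N):
    """Mean absolute difference between the MxN block of `cur` at (k, l)
    and the block of `ref` displaced by (x, y)."""
    total = 0
    for i in range(M):
        for j in range(N):
            total += abs(cur[k + i][l + j] - ref[k + x + i][l + y + j])
    return total / (M * N)


def full_search(cur, ref, k, l, M, N, R):
    """Test every displacement in [-R, R] x [-R, R]; return (vector, MAD)."""
    rows, cols = len(ref), len(ref[0])
    best_v, best_cost = None, float("inf")
    for x in range(-R, R + 1):
        for y in range(-R, R + 1):
            # Skip candidates that fall outside the reference frame.
            if k + x < 0 or l + y < 0 or k + x + M > rows or l + y + N > cols:
                continue
            cost = mad(cur, ref, k, l, x, y, M, N)
            if cost < best_cost:
                best_v, best_cost = (x, y), cost
    return best_v, best_cost


# Synthetic test pattern (an assumption for the demo): a ramp image, and a
# current frame whose content is the reference shifted by (2, -1).
ref = [[16 * r + c for c in range(16)] for r in range(16)]
cur = [[ref[r + 2][c - 1] if r + 2 < 16 and c >= 1 else 0
        for c in range(16)] for r in range(16)]

vector, cost = full_search(cur, ref, k=6, l=6, M=4, N=4, R=3)
print(vector, cost)  # the block at (6, 6) matches at displacement (2, -1) with zero error
```

With `R = 3` this evaluates up to 49 candidate positions for a single block, which is why the cost of exhaustive search becomes a concern for full frames.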


Motion Compensated Encoding of P Frame

[Figure: motion compensated encoder block diagram, with inputs labeled current frame and previous frame]


Coding of P frame

reconstructed previous frame

Encoder contains decoder


More Detail


Need for Bi-directional Encoding

[Figure: a sequence of I, P and B frames running from one I frame to the next]


Bidirectional Encoding


Frame Transmit Order vs Viewing Order

View order

Decode order = transmit order
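Because a B frame needs both its preceding and its following anchor frame before it can be decoded, the anchors have to be sent first. The sketch below shows one simple way to derive a transmit order from a viewing order. The function name `transmit_order` and the example pattern `IBBPBBP` are assumptions for illustration; the figure on this slide did not survive extraction.

```python
def transmit_order(view_types):
    """Map a viewing-order string such as 'IBBPBBP' to the list of
    viewing-order indices in the order they would be transmitted."""
    order, pending_b = [], []
    for idx, frame_type in enumerate(view_types):
        if frame_type == "B":
            # B frames wait until their following anchor has been sent.
            pending_b.append(idx)
        else:
            # I or P anchor: send it, then the B frames that depend on it.
            order.append(idx)
            order.extend(pending_b)
            pending_b = []
    # Trailing B frames would really need the next GOP's anchor first.
    order.extend(pending_b)
    return order


view = "IBBPBBP"
sent = transmit_order(view)
print(sent)                              # indices into the viewing order
print("".join(view[i] for i in sent))    # frame types in transmit order
```

The decoder therefore has to buffer anchor frames and re-sort the decoded pictures back into viewing order before display.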


B-frames

B-frames are coded in the same way as P-frames, except that for each macro-block the encoder searches for the best matching block in both the preceding and the succeeding anchor frames, and uses the encoding that requires the fewest bits. This is called bidirectional encoding.


Complexity of Exhaustive Block-Matching

• Assumptions

– Block size NxN and image size S = M1 x M2

– Search step size is 1 pixel ~ “integer-pixel accuracy”

– Search range +/–R pixels both horizontally and vertically

• Computation complexity

– # Candidate matching blocks = $(2R+1)^2$

– # Operations for computing MAD for one block ~ $O(N^2)$

– # Operations for MV estimation per block ~ $O((2R+1)^2 N^2)$

– # Blocks = $S / N^2$

– Total # operations for entire frame ~ $O((2R+1)^2 S)$

• i.e., overall computation load is independent of block size!

• E.g., M=512, N=16, R=16, 30fps => On the order of $8.55 \times 10^9$ operations per second!

– Was difficult for real time estimation, but possible with parallel hardware
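The operation count can be checked numerically. This sketch assumes that "M=512" means a 512x512 image and counts one operation per pixel comparison; the function name `full_search_ops` is hypothetical.

```python
def full_search_ops(width, height, N, R, fps):
    """Approximate operations per second for exhaustive block matching."""
    blocks = (width // N) * (height // N)  # S / N^2 blocks per frame
    candidates = (2 * R + 1) ** 2          # displacements tested per block
    ops_per_block = candidates * N * N     # one operation per pixel per candidate
    return blocks * ops_per_block * fps


ops_16 = full_search_ops(512, 512, N=16, R=16, fps=30)
ops_8 = full_search_ops(512, 512, N=8, R=16, fps=30)

print(f"{ops_16:.3e} operations per second")
print("same load for 8x8 blocks:", ops_16 == ops_8)
```

The N-squared factors cancel between the per-block cost and the number of blocks, so the total depends only on the search range and the image size. The result is close to the 8.55 x 10^9 figure quoted above.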

UMCP ENEE408G Slides (created by M.Wu & R.Liu © 2002)


Exhaustive Search: Cons and Pros

• Pros

– Guaranteed optimality within search range and motion model

• Cons

– Motion vectors are integer valued

– High computation complexity

• On the order of [search-range-size * image-size] for 1-pixel step size

• How to improve accuracy?

– Half pixel – significant improvement

– Quarter pixel – some improvement

– Requires interpolation

• How to improve speed?

– Fast search

– Try to exclude unlikely candidates


Half pixel resolution in matching

[Figure: block B with pixel positions labeled a, b, c, d and p]
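Half-pixel matching needs reference samples between the integer pixel positions, which have to be interpolated. The sketch below builds a half-pixel grid by bilinear averaging of the neighbouring integer pixels. Bilinear averaging is an assumption made here for illustration (the figure on this slide did not survive extraction), and the function name `half_pixel_grid` is hypothetical.

```python
def half_pixel_grid(frame):
    """Return a (2*rows - 1) x (2*cols - 1) grid in which even indices hold
    the original pixels and odd indices hold interpolated half-pixel values."""
    rows, cols = len(frame), len(frame[0])
    grid = []
    for r in range(2 * rows - 1):
        r0 = r // 2
        r1 = r0 + r % 2          # same row for even r, next row for odd r
        out_row = []
        for c in range(2 * cols - 1):
            c0 = c // 2
            c1 = c0 + c % 2      # same column for even c, next column for odd c
            # Average of the (up to four distinct) surrounding integer pixels.
            out_row.append((frame[r0][c0] + frame[r0][c1] +
                            frame[r1][c0] + frame[r1][c1]) / 4)
        grid.append(out_row)
    return grid


frame = [[0, 4],
         [8, 12]]
grid = half_pixel_grid(frame)
for row in grid:
    print(row)
```

A block matcher can then step through this grid one position at a time, which corresponds to half-pixel steps in the original frame. The price is the interpolation itself and four times as many candidate positions for the same search range.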


Fast Algorithm: 3-Step Search

• Search candidates at 9 positions

• Reduce step-size after each iteration

– Start with step size approx. half of max. search range

[Figure: search grid spanning -6 to +6 in dx and dy, with candidate positions labeled M1–M9, M11–M19 and M24; motion vector {dx, dy} = {1, 6}]

Total number of computations:

– 3-step search: 9 + 8 x 2 = 25

– full search: $(2R+1)^2$ = 169

(Fig. from Ken Lam – HK Poly Univ. short course in summer 2001)
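The candidate count for the three-step search can be verified with a small sketch. Several things here are assumptions rather than part of the slide: the step sizes (3, 2, 1) for a search range of R = 6, the function name `three_step_search`, and the synthetic bowl-shaped cost function, which stands in for a real block-matching error with its minimum at {dx, dy} = {1, 6}.

```python
def three_step_search(cost, steps=(3, 2, 1)):
    """Return (best_vector, number_of_cost_evaluations).

    At each step the 8 neighbours at distance `s` around the current best
    position are tested, then the step size shrinks."""
    center = (0, 0)
    best_cost = cost(*center)
    evaluations = 1
    for s in steps:
        cx, cy = center                  # centre is fixed during one step
        for dx in (-s, 0, s):
            for dy in (-s, 0, s):
                if dx == 0 and dy == 0:
                    continue             # centre was already evaluated
                c = cost(cx + dx, cy + dy)
                evaluations += 1
                if c < best_cost:
                    best_cost, center = c, (cx + dx, cy + dy)
    return center, evaluations


def synthetic_cost(dx, dy):
    """Stand-in matching error with a single minimum at (1, 6)."""
    return (dx - 1) ** 2 + (dy - 6) ** 2


vector, evaluations = three_step_search(synthetic_cost)
R = 6
print(vector, evaluations, (2 * R + 1) ** 2)
```

On this smooth cost surface the search reaches the true minimum with 25 evaluations instead of 169. On real images the matching error can have several local minima, so the fast search may settle on a suboptimal vector.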


Hierarchical Block Matching

• Problem with fast search at full resolution

– Small mis-alignment may give large displacement error, esp. for texture and edge blocks

• Hierarchical (multi-resolution) block matching

– Match with coarse resolution to narrow down search range

– Match with high resolution to refine motion estimation

[Figure: image pyramid with lowest resolution, lower resolution and original resolution levels]

(From Wang’s Preprint Fig.6.19)
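A two-level version of the coarse-to-fine idea can be sketched as follows. This is a simplified illustration under assumptions of its own: only two resolution levels, 2x2 averaging for the coarse level, SAD as the matching criterion, a refinement radius of 1 pixel, a synthetic ramp image, and hypothetical function names throughout.

```python
def downsample(frame):
    """Halve the resolution by averaging each 2x2 neighbourhood."""
    return [[(frame[r][c] + frame[r][c + 1] +
              frame[r + 1][c] + frame[r + 1][c + 1]) / 4
             for c in range(0, len(frame[0]) - 1, 2)]
            for r in range(0, len(frame) - 1, 2)]


def sad(cur, ref, k, l, x, y, size):
    """Sum of absolute differences for a size x size block."""
    return sum(abs(cur[k + i][l + j] - ref[k + x + i][l + y + j])
               for i in range(size) for j in range(size))


def search(cur, ref, k, l, size, center, radius):
    """Best displacement within +/-radius of `center`, inside the frame."""
    best, best_cost = None, float("inf")
    for x in range(center[0] - radius, center[0] + radius + 1):
        for y in range(center[1] - radius, center[1] + radius + 1):
            if not (0 <= k + x <= len(ref) - size and
                    0 <= l + y <= len(ref[0]) - size):
                continue
            cost = sad(cur, ref, k, l, x, y, size)
            if cost < best_cost:
                best, best_cost = (x, y), cost
    return best


def hierarchical_match(cur, ref, k, l, size, coarse_radius):
    # Step 1: search at half resolution to narrow down the range.
    coarse = search(downsample(cur), downsample(ref),
                    k // 2, l // 2, size // 2, (0, 0), coarse_radius)
    # Step 2: scale the vector up and refine by +/-1 pixel at full resolution.
    return search(cur, ref, k, l, size, (2 * coarse[0], 2 * coarse[1]), 1)


# Synthetic demo: current frame is the reference shifted by (4, -2).
ref = [[32 * r + c for c in range(32)] for r in range(32)]
cur = [[ref[r + 4][c - 2] if r + 4 < 32 and c >= 2 else 0
        for c in range(32)] for r in range(32)]

vector = hierarchical_match(cur, ref, k=8, l=8, size=8, coarse_radius=2)
print(vector)
```

In this demo the coarse level tests 25 candidates and the refinement 9, compared with 121 for a full-resolution search over +/-5 pixels. The coarse comparisons are also cheaper because the blocks are smaller.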


Pixel Decimation

IEEE Trans. on Video Technology, April 1993, pp. 148- 157.

a block in current frame

part of a block in reference frame


Pixel Decimation


Subsampled Motion Field


What else can you do with MPEG video?

• The MPEG encoder-decoder is asymmetric.

– Encoder is much more complex than the decoder. Determining motion vectors is a major task

– Decoding is easy and fast.

– The encoding only has to be done once; the decoding will be done many times or at many locations.

• Symmetric application?

• Compression loses information. But

– compressed video has information not readily available in original video