H_264 encoder

8/8/2019 H_264 encoder

H.264 Encoder

(MPEG-4 Part 10)

Definitions

MOTION VECTOR: A two-dimensional vector used for inter prediction that provides an offset from the coordinates in the decoded picture to the coordinates in a reference picture.

MOTION COMPENSATION (MC): The candidate region chosen by motion estimation becomes the predictor for the current M x N block and is subtracted from the current block to form a residual M x N block.

MOTION ESTIMATION (ME): The process of finding the best-matching region in a reference picture for the current block.

NAL: Network Abstraction Layer.

RESIDUAL: The difference block produced by subtracting the prediction PRED from the current block.

RTP: Real-time Transport Protocol.

VCL: Video Coding Layer.

Blocks

Discrete cosine transform (DCT)

Quantization

Inverse quantization

Inverse DCT

Prediction Mode Select

Prediction Calculation

Of these blocks, the most challenging will probably be the prediction mode select block, which determines the optimal prediction mode by minimizing the sum of absolute transformed differences (SATD) between the predicted block and the actual image block.
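As a rough illustration of the cost the mode select block minimizes, the sketch below computes both the plain sum of absolute differences (SAD) and the SATD for a 4x4 block. It assumes an unnormalised 4x4 Hadamard transform applied to the difference block; the function names are illustrative, and real encoders may scale the result differently.

```python
# Illustrative 4x4 cost functions; names and scaling are assumptions.
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]


def matmul(a, b):
    """Multiply two 4x4 matrices given as lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def difference(cur, pred):
    """Element-wise difference between the actual and the predicted block."""
    return [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(cur, pred)]


def sad(cur, pred):
    """Sum of absolute differences in the pixel domain."""
    return sum(abs(v) for row in difference(cur, pred) for v in row)


def satd(cur, pred):
    """Sum of absolute values of the Hadamard-transformed difference block."""
    transformed = matmul(matmul(H4, difference(cur, pred)), H4)
    return sum(abs(v) for row in transformed for v in row)
```

A mode decision would evaluate one of these costs for every candidate prediction and keep the mode with the smallest value.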

H.264 uses more sophisticated tools, such as Hadamard and discrete cosine transforms followed by quantization, to achieve top-quality compression.

H.264 was designed to cope with lossy networks. It includes its own Network Abstraction Layer to facilitate streaming video (with lost packets) and to minimize the amount of data that needs to be transferred.

A coded picture consists of a number of macroblocks, each containing 16x16 luma samples and associated chroma samples (8x8 Cb and 8x8 Cr samples in the current standard).

Encoder

The encoder includes two dataflow paths, a forward path (left to right) and a reconstruction path (right to left).

Encoder (Forward Path):

Each macroblock is encoded in intra or inter mode and, for each block in the macroblock, a prediction PRED (marked P in Figure 6.1) is formed based on reconstructed picture samples.

In Intra mode, PRED is formed from samples in the current slice that have previously been encoded, decoded and reconstructed.

In Inter mode, PRED is formed by motion-compensated prediction from one or two reference picture(s) selected from the set of list 0 and/or list 1 reference pictures.

The prediction PRED is subtracted from the current block to produce a residual (difference) block Dn that is transformed (using a block transform) and quantised to give X, a set of quantised transform coefficients which are reordered and entropy encoded.

The entropy-encoded coefficients, together with side information required to decode each block within the macroblock (prediction modes, quantiser parameter, motion vector information, etc.) form the compressed bitstream which is passed to a Network Abstraction Layer (NAL) for transmission or storage.

Encoder (Reconstruction Path):

As well as encoding and transmitting each block in a macroblock, the encoder decodes (reconstructs) it to provide a reference for further predictions.

The quantised coefficients X are rescaled and inverse transformed to give a difference block, and PRED is added to it to form a reconstructed block. A filter is applied to reduce the effects of blocking distortion and the reconstructed reference picture is created from a series of blocks Fn.
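The two paths can be sketched for a single 4x4 block as below. This is only a minimal illustration under stated assumptions: a 4x4 Hadamard matrix stands in for the H.264 block transform, the quantiser is a plain uniform divider with step `qstep`, and entropy coding, clipping and the deblocking filter are left out. All names are hypothetical.

```python
# Stand-in transform (assumption): symmetric 4x4 Hadamard, with H4 * H4 = 4 * I.
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]


def matmul(a, b):
    """Multiply two 4x4 matrices given as lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def div_round(a, b):
    """Divide a by a positive b, rounding to nearest (ties away from zero)."""
    q = (abs(a) + b // 2) // b
    return q if a >= 0 else -q


def forward_path(cur, pred, qstep):
    """Residual Dn = cur - PRED, then transform and quantise to give X."""
    dn = [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(cur, pred)]
    coeffs = matmul(matmul(H4, dn), H4)
    return [[div_round(v, qstep) for v in row] for row in coeffs]


def reconstruction_path(x, pred, qstep):
    """Rescale X, inverse transform, and add PRED back."""
    rescaled = [[v * qstep for v in row] for row in x]
    d = matmul(matmul(H4, rescaled), H4)  # equals 16 * residual before rounding
    return [[p + div_round(v, 16) for p, v in zip(rp, rd)]
            for rp, rd in zip(pred, d)]
```

Because the encoder runs the same reconstruction as the decoder, both predict from identical reference samples even when quantisation discards information.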

Intra prediction

In intra mode a prediction block P is formed based on previously encoded and reconstructed blocks and is subtracted from the current block prior to encoding.

For the luma samples, P is formed for each 4x4 block or for a 16x16 macroblock.

There are a total of nine optional prediction modes for each 4x4 luma block, four modes for a 16x16 luma block and four modes for the chroma components.

The encoder typically selects the prediction mode for each block that minimises the difference between P and the block to be encoded.

4x4 Luma Prediction Modes

The samples above and to the left have previously been encoded and reconstructed and are therefore available in the encoder and decoder to form a prediction reference.

The samples a, b, c, ..., p of the prediction block P (Figure 6.23) are calculated based on the samples A-M.

For modes 3-8, the predicted samples are formed from a weighted average of the prediction samples A-Q.

16x16 Luma Prediction Modes

As an alternative to the 4x4 luma modes described above, the entire 16x16 luma component of a macroblock may be predicted.

Four modes are available:

Mode 0 (vertical): extrapolation from upper samples (H).

Mode 1 (horizontal): extrapolation from left samples (V).

Mode 2 (DC): mean of upper and left-hand samples (H+V).

Mode 3 (Plane): a linear plane function is fitted to the upper and left-hand samples H and V. This works well in areas of smoothly-varying luminance.
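The first three modes are simple enough to sketch directly. The example below is an illustration, not the reference implementation: it builds the vertical, horizontal and DC predictions for an N x N block from the upper samples (`top`, H) and left samples (`left`, V) and picks the mode with the lowest SAD. Plane mode is omitted for brevity, both neighbour sets are assumed to be available, and the function names are hypothetical.

```python
def predict(mode, top, left):
    """Build an N x N prediction from the upper (H) and left (V) samples."""
    n = len(top)
    if mode == "vertical":      # mode 0: copy the upper samples downwards
        return [list(top) for _ in range(n)]
    if mode == "horizontal":    # mode 1: copy the left samples across
        return [[left[r]] * n for r in range(n)]
    if mode == "dc":            # mode 2: rounded mean of upper and left samples
        dc = (sum(top) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("unsupported mode: " + mode)


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def best_mode(block, top, left):
    """Return (mode, cost) for the candidate with the smallest SAD."""
    costs = {m: sad(block, predict(m, top, left))
             for m in ("vertical", "horizontal", "dc")}
    mode = min(costs, key=costs.get)
    return mode, costs[mode]
```

For a block whose rows are all equal to the row of samples above it, vertical prediction gives a zero residual and is selected.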

IN ADDITION TO THESE TWO TYPES OF LUMA PREDICTION, A SEPARATE CHROMA PREDICTION IS CONDUCTED.

As an alternative to Intra_4x4 and Intra_16x16, the I_PCM coding type allows the encoder to simply bypass the prediction and transform coding processes and instead directly send the values of the encoded samples.

[Figure: block diagram of the intra-prediction encoder. Labelled blocks: input from ITU 656; 4:4:4 to 4:2:0 scaler (16*16 Y, 16*16 Cb, 16*16 Cr in; 16*16 Y, 8*8 Cb, 8*8 Cr out); frame memory; Y_MB, Cb_MB and Cr_MB fetchers; A_MB memory; P_MB memory; subtractor; DCT & Q; IDCT & Q-1; reconstructed frame memory; Y, Cb and Cr reference-pixel fetchers; P calculator with mode select input; MUXes driven by the control circuit; output to NAL.]

Prediction of Inter Macroblocks in P-slices

Inter prediction creates a prediction model from one or more previously encoded video frames.

The model is formed by shifting samples in the reference frame(s) (motion compensated prediction).

Important differences from earlier standards include the support for a range of block sizes (down to 4x4) and fine sub-pixel motion vectors (1/4 pixel in the luma component).

Tree structured motion compensation:

AVC supports motion compensation block sizes ranging from 16x16 to 4x4 luminance samples with many options between the two.

The luminance component of each macroblock (16x16 samples) may be split up in 4 ways as shown in Figure 2-1: 16x16, 16x8, 8x16 or 8x8.

If the 8x8 mode is chosen, each of the four 8x8 macroblock partitions within the macroblock may be split in a further 4 ways as shown in Figure 2-2: 8x8, 8x4, 4x8 or 4x4 (known as macroblock sub-partitions). THIS IS KNOWN AS TREE STRUCTURED MOTION COMPENSATION.

A separate motion vector is required for each partition or sub-partition. Each motion vector must be coded and transmitted; in addition, the choice of partition(s) must be encoded in the compressed bitstream.
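To make the signalling cost concrete, the sketch below counts how many motion vectors a macroblock needs for a given partition choice. It assumes one motion vector per partition or sub-partition, as stated above; the dictionaries and function name are illustrative only.

```python
# Number of partitions produced by each macroblock-level split.
MB_PARTITIONS = {"16x16": 1, "16x8": 2, "8x16": 2, "8x8": 4}

# Number of sub-partitions produced by each split of one 8x8 partition.
SUB_PARTITIONS = {"8x8": 1, "8x4": 2, "4x8": 2, "4x4": 4}


def motion_vector_count(mb_mode, sub_modes=None):
    """Count motion vectors for one macroblock (one per partition)."""
    if mb_mode != "8x8":
        return MB_PARTITIONS[mb_mode]
    # In 8x8 mode each of the four partitions chooses its own sub-split.
    if sub_modes is None or len(sub_modes) != 4:
        raise ValueError("8x8 mode needs four sub-partition choices")
    return sum(SUB_PARTITIONS[s] for s in sub_modes)
```

A macroblock coded as a single 16x16 partition carries one vector, while one split entirely into 4x4 sub-partitions carries sixteen.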

Choosing a large partition size (e.g. 16x16, 16x8, 8x16) means that a small number of bits are required to signal the choice of motion vector(s) and the type of partition; however, the motion compensated residual may contain a significant amount of energy in areas of the frame with high detail. Conversely, choosing a small partition size (e.g. 8x4, 4x4) may give a lower-energy residual but requires more bits to signal the motion vectors and the choice of partition(s).

The choice of partition size therefore has a significant impact on compression.

In general, a large partition size is appropriate for homogeneous areas of the frame and a small partition size may be beneficial for detailed areas.

Each chroma block is partitioned in the same way as the luma component, except that the partition sizes have exactly half the horizontal and vertical resolution (an 8x16 partition in luma corresponds to a 4x8 partition in chroma; an 8x4 partition in luma corresponds to 4x2 in chroma; and so on).

The figure shows a residual frame. The AVC reference encoder selects the best partition size for each part of the frame, i.e. the partition size that minimizes the coded residual and motion vectors. The macroblock partitions chosen for each area are shown superimposed on the residual frame. In areas where there is little change between the frames (residual appears grey), a 16x16 partition is chosen; in areas of detailed motion (residual appears black or white), smaller partitions are more efficient.

Instead of directly encoding the raw pixel values for each block, the encoder tries to find a block similar to the one it is encoding in a previously encoded frame, referred to as the reference frame.

If the search succeeds, the block can be encoded by a vector, known as the MOTION VECTOR, which points to the position of the matching block in the reference frame.

The encoder compares the block found in the reference frame with the block it is encoding and obtains the differences between them. Those differences are known as the PREDICTION ERROR and need to be transformed and sent to the decoder.

If the block matching algorithm fails to find a suitable match, the prediction error will be considerable, and the overall size of motion vector plus prediction error may be greater than that of a raw encoding. In this case the encoder makes an exception and sends a raw encoding for that specific block.
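The search described above can be illustrated with an exhaustive (full-search) block matcher. This is a minimal sketch assuming integer-pixel displacements, SAD as the matching cost and a square search window; practical encoders usually use faster search patterns and sub-pixel refinement. Frames are plain lists of rows and all names are hypothetical.

```python
def block_sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the current block at (bx, by) and the reference block
    displaced by the candidate motion vector (dx, dy)."""
    return sum(abs(cur[by + r][bx + c] - ref[by + dy + r][bx + dx + c])
               for r in range(n) for c in range(n))


def full_search(cur, ref, bx, by, n, search_range):
    """Return (motion_vector, cost, prediction_error) for one n x n block."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = bx + dx, by + dy
            # Skip candidates that fall outside the reference frame.
            if x < 0 or y < 0 or x + n > width or y + n > height:
                continue
            cost = block_sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    cost, dx, dy = best
    # Prediction error: current block minus the matched reference block.
    error = [[cur[by + r][bx + c] - ref[by + dy + r][bx + dx + c]
              for c in range(n)] for r in range(n)]
    return (dx, dy), cost, error
```

When the current block is an exact displaced copy of a reference region inside the search window, the returned cost is zero and the prediction error is an all-zero block, so only the motion vector needs to be sent.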

If the matched block in the reference frame has itself been encoded using inter frame prediction, the errors made in its encoding will be propagated to the next block.

These drawbacks underline the need for reliable reference frames inserted at regular intervals for this technique to be efficient and useful (I, B, P).

[Figure: block diagram of the inter-prediction encoder. Labelled blocks: input from ITU 656; 4:4:4 to 4:2:0 scaler; frame memory; Y_MB, Cb_MB and Cr_MB fetchers for the current frame and for the reconstructed frame memories (list 0 and list 1); SAD calculator and block-matching algorithms; result memory; A_MB memory; P_MB memory; subtractor; DCT & Q; IDCT & Q-1; MUXes driven by the control circuit; outputs PREDICTION ERROR and MOTION VECTOR.]

Slice Modes

I (Intra): Contains only I macroblocks (each block or macroblock is predicted from previously coded data within the same slice). Profiles: All.

P (Predicted): Contains P macroblocks (each macroblock or macroblock partition is predicted from one list 0 reference picture) and/or I macroblocks. Profiles: All.

B (Bi-predictive): Contains B macroblocks (each macroblock or macroblock partition is predicted from a list 0 and/or a list 1 reference picture) and/or I macroblocks. Profiles: Extended and Main.

SP (Switching P): Facilitates switching between coded streams; contains P and/or I macroblocks. Profiles: Extended.

SI (Switching I): Facilitates switching between coded streams; contains SI macroblocks (a special type of intra coded macroblock). Profiles: Extended.

Most broadcast-quality applications, however, have tended to use 2 consecutive B frames (I, B, B, P, B, B, P, ...) as the ideal trade-off between compression efficiency and video quality.

The main advantage of B frames is coding efficiency. Backward prediction in this case allows the encoder to make more intelligent decisions on how to encode the video within these areas. Also, since B frames are not used to predict future frames, errors generated will not be propagated further within the sequence.

One disadvantage of B frames is that the frame reconstruction memory buffers within the encoder and decoder must be doubled in size to accommodate the 2 anchor frames.

I B B B P B B B P B B B P B B B P .

I frame

An intra frame is essentially the first frame to be encoded, and it achieves less compression than the other frame types.

This frame is also known as a key frame because the following frames are encoded using the information available from this frame.

Intra prediction uses spatial correlation within each frame to reduce the amount of data needed to represent the picture.

Intra-frame coding is more or less similar to still-image compression like JPEG or GIF.

It is coded without any dependency on other frames.

Intra prediction

H.264 performs intra-prediction on two different sized blocks:

16x16 (the entire macroblock) and 4x4.

16x16 prediction is generally chosen for areas of the picture that are

smooth. 4x4 prediction, on the other hand, is useful for predicting

more detailed sections of the frame.

The general idea is to predict a block, whether it be a 4x4 or 16x16

block, based on surrounding pixels using a mode that results in a

prediction that most closely resembles the actual pixels in

that block.

P-frame

P-frames are predicted from the previous P- or I-frame.

This frame type accounts for most of the size reduction of the video stream.

Motion estimation is the process of selecting an offset to a suitable reference area in a previously coded frame.

Motion estimation is carried out in the video encoder (not in the decoder).

A good choice of prediction reference minimises the energy in the motion-compensated residual which in

turn maximises compression performance.

However, finding the best offset can be a very computationally intensive procedure.

The goal of a practical motion estimation algorithm is to find a vector that minimises the residual energy

after motion compensation while keeping the computational complexity within acceptable limits.

B-frame

B-frames are bidirectionally predicted frames, i.e. B-frames rely on the frames preceding and following them.

B-frames contain only the data that have changed from the preceding frame or are different from the data in the very next frame.

B-frames are interesting for two reasons:

1. They have slightly better prediction.

2. More importantly, they do not affect the quality of following frames, so they can be coded at lower quality without degrading the whole sequence.

Since B-frames depend on both past and future pictures, the decoder has to be fed the future I- or P-frames before it can decode them.
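The reordering this implies can be sketched as follows. The example assumes the classic arrangement in which each run of B-frames is predicted from the nearest I- or P-frame on either side and B-frames are never used as references; H.264 permits more flexible structures, so this is an illustration only, with a hypothetical function name.

```python
def coding_order(display_types):
    """Reorder frames from display order to transmission (coding) order.

    display_types is a sequence such as "IBBPBBP"; the result is a list of
    (display_index, frame_type) tuples in the order they must be sent.
    """
    out, pending_b = [], []
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            # Hold B-frames back until their future anchor has been sent.
            pending_b.append((index, frame_type))
        else:
            out.append((index, frame_type))
            out.extend(pending_b)
            pending_b = []
    # Trailing B-frames with no following anchor are sent last.
    out.extend(pending_b)
    return out
```

For the display sequence I B B P B B P, the frames would be sent as I P B B P B B, so each pair of B-frames arrives after both of its anchor frames.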

[Figure: size of I, P and B frames]

The most important improvements of this technique in H.264 with regard to previous standards are:

More flexible block partition,

Resolution of up to 1/4 pixel motion compensation,

Multiple references,

Enhanced Direct/Skip Macroblock.

Macroblocks

The human eye senses color and brightness with different sets of sensors.

Compression algorithms first transform the image from RGB to the luminance/chrominance (Y-Cb-Cr) color space.

Here Y, called luma, represents the brightness (grayscale), and Cb and Cr are the two color components, which represent the extent to which the color deviates from gray toward blue and red, respectively.

Since the human visual system is more sensitive to luma than to chroma, each chroma component is given one fourth of the number of samples of the luma component.

This is done by downsampling the chroma to half the number of samples in both the horizontal and vertical dimensions.

This is called 4:2:0 sampling, here with 8 bits of precision per sample.
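The arithmetic can be checked with a short sketch. It assumes chroma is downsampled by averaging each 2x2 neighbourhood, which is only one possible filter (real systems may use other filters and sample positions), and that the plane has even width and height. The function names are illustrative.

```python
def subsample_420(plane):
    """Halve a chroma plane in both dimensions by averaging 2x2 groups."""
    height, width = len(plane), len(plane[0])
    return [[(plane[y][x] + plane[y][x + 1]
              + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
             for x in range(0, width, 2)]
            for y in range(0, height, 2)]


def samples_per_frame(width, height):
    """Return (luma, per-chroma-component, total) sample counts for 4:2:0."""
    luma = width * height
    chroma = (width // 2) * (height // 2)
    return luma, chroma, luma + 2 * chroma
```

For one 16x16 macroblock this gives 256 luma samples and 64 samples for each of Cb and Cr, matching the 8x8 chroma blocks mentioned earlier, for 384 samples in total instead of the 768 that 4:4:4 would need.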

[Figure: 4:4:4, 4:2:2 and 4:2:0 sampling patterns]

Picture structure

Processing is done at the macroblock (MB) level.

At the top level, an H.264 sequence consists of a series of packets or Network Abstraction Layer Units (NAL Units or NALUs).

These can include parameter sets (containing key parameters that are used by the decoder to correctly

decode the video data) and slices (coded video frames or parts of video frames).

At the next level, a slice represents all or part of a coded video frame and consists of a number of coded

macroblocks, each containing compressed data corresponding to a 16x16 block of displayed pixels in a

video frame.

At the lowest level of Figure 16, a macroblock contains type information (describing the particular choice

of methods used to code the macroblock), prediction information (coded motion vectors or intra

prediction mode information) and coded residual data.
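As a small illustration of the top level of this hierarchy, the sketch below splits the first byte of a NAL unit into its three header fields, which is how a decoder tells parameter sets from slices. The field layout (1 + 2 + 5 bits) and the listed type codes follow the H.264 NAL unit header; the table covers only a few common types and the function names are hypothetical.

```python
# A few common nal_unit_type values; the table is deliberately incomplete.
NAL_TYPES = {
    1: "coded slice (non-IDR)",
    5: "coded slice (IDR)",
    6: "supplemental enhancement information",
    7: "sequence parameter set",
    8: "picture parameter set",
}


def parse_nal_header(first_byte):
    """Split the one-byte NAL unit header into its three fields."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x1,
        "nal_ref_idc": (first_byte >> 5) & 0x3,
        "nal_unit_type": first_byte & 0x1F,
    }


def describe_nal(first_byte):
    """Return a readable name for the NAL unit type, or 'other'."""
    return NAL_TYPES.get(parse_nal_header(first_byte)["nal_unit_type"], "other")
```

For example, a NAL unit whose first byte is 0x67 is a sequence parameter set, and one starting with 0x65 carries a slice of an IDR picture.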
