H_264 encoder


  • 8/8/2019 H_264 encoder


    H-264 Encoder

    (MPEG-4 Part 10)


    Definitions

    MOTION VECTOR: A two-dimensional vector used for inter prediction that provides an offset from the

    coordinates in the decoded picture to the coordinates in a reference picture.

    MOTION COMPENSATION (MC): The chosen candidate region becomes the predictor for the current M x N block and is subtracted from the current block to form a residual M x N block.

    MOTION ESTIMATION (ME): The process of finding the region in a reference picture that best matches the current block.

    NAL : Network Abstraction Layer.

    The prediction PRED is subtracted from the current block to produce a RESIDUAL.

    RTP : Real-time Transport Protocol

    VCL : Video Coding Layer


    Blocks

    Discrete cosine transform (DCT)

    Quantization

    Inverse quantization

    Inverse DCT

    Prediction Mode Select

    Prediction Calculation

    Of these blocks, the most challenging will probably be the prediction mode select block, which determines the optimal prediction mode by minimizing the sum of absolute transformed differences (SATD) between the predicted block and the actual image block.
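    The difference between a plain sum of absolute differences (SAD) and SATD can be sketched for a 4x4 block as follows. This is a minimal illustration, not the deck's implementation: the function names are hypothetical, the 4x4 Hadamard matrix is one common choice of transform for SATD, and the result is left unnormalised (some encoders scale it).

```python
# 4x4 Hadamard matrix (entries +1/-1). It is symmetric, so it equals its transpose.
H4 = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
]

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def residual(cur, pred):
    """Element-wise difference between the current block and its prediction."""
    return [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur, pred)]

def sad(cur, pred):
    """Sum of absolute differences in the pixel domain."""
    return sum(abs(v) for row in residual(cur, pred) for v in row)

def satd(cur, pred):
    """Sum of absolute values of the Hadamard-transformed residual (unnormalised)."""
    t = matmul(matmul(H4, residual(cur, pred)), H4)
    return sum(abs(v) for row in t for v in row)
```

    A mode selector would evaluate one of these costs for every candidate prediction and keep the mode with the smallest value.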

    H.264 uses more sophisticated tools, such as Hadamard and discrete cosine transforms followed by quantization, to achieve top-quality compression.

    H.264 was designed to cope with lossy networks. It includes its own Network Abstraction Layer (NAL) to facilitate streaming video (with lost packets) and to minimize the amount of data that needs to be transferred.

    A coded picture consists of a number of macroblocks, each containing 16x16 luma samples and associated chroma samples (8x8 Cb and 8x8 Cr samples in the current standard).


    Encoder


    The Encoder includes two dataflow paths, a forward path (left to right) and a reconstruction path (right

    to left).

    Encoder (forward Path) :

    Each macroblock is encoded in intra or inter mode and, for each block in the macroblock, a prediction

    PRED (marked P in Figure 6.1) is formed based on reconstructed picture samples.

    In Intra mode, PRED is formed from samples in the current slice that have previously been encoded, decoded and reconstructed.

    In Inter mode, PRED is formed by motion-compensated prediction from one or two reference picture(s) selected from the set of list 0 and/or list 1 reference pictures.

    The prediction PRED is subtracted from the current block to produce a residual (difference) block Dn that is transformed (using a block transform) and quantised to give X, a set of quantised transform coefficients that are reordered and entropy encoded.

    The entropy-encoded coefficients, together with side information required to decode each block within

    the macroblock (prediction modes, quantiser parameter, motion vector information, etc.) form the

    compressed bitstream which is passed to a Network Abstraction Layer (NAL) for transmission or storage.

    Encoder (Reconstruction Path) :

    As well as encoding and transmitting each block in a macroblock, the encoder decodes (reconstructs) it to provide a reference for further predictions.

    A filter is applied to reduce the effects of blocking distortion and the reconstructed reference picture is

    created from a series of blocks Fn.
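    The two paths can be sketched for a single 4x4 block as below. This is a simplified illustration under stated assumptions: a floating-point orthonormal DCT stands in for the H.264 integer block transform, quantisation is a plain division by a hypothetical step size `qstep`, and entropy coding and the deblocking filter are left out. The function names are illustrative.

```python
import math

N = 4
# Orthonormal DCT-II basis; a floating-point stand-in for the integer transform.
C = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos((2 * n + 1) * k * math.pi / (2 * N))
      for n in range(N)] for k in range(N)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def encode_block(cur, pred, qstep):
    """Forward path: residual Dn -> transform -> quantise, giving X."""
    dn = [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur, pred)]
    coeff = matmul(matmul(C, dn), transpose(C))
    return [[round(v / qstep) for v in row] for row in coeff]

def reconstruct_block(x, pred, qstep):
    """Reconstruction path: rescale X -> inverse transform -> add PRED -> clip."""
    coeff = [[v * qstep for v in row] for row in x]
    dn_rec = matmul(matmul(transpose(C), coeff), C)
    return [[min(255, max(0, round(p + d))) for p, d in zip(pr, dr)]
            for pr, dr in zip(pred, dn_rec)]
```

    Because `reconstruct_block` works only from X and PRED, the encoder's reference samples match what a decoder would produce, which keeps the two in step even though quantisation is lossy.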


    Intra prediction

    In intra mode a prediction block P is formed based on previously encoded and reconstructed blocks and is

    subtracted from the current block prior to encoding.

    For the luma samples, P is formed for each 4x4 block or for a 16x16 macroblock.

    There are a total of nine optional prediction modes for each 4x4 luma block, four modes for a 16x16 luma block and four modes for the chroma components.

    The encoder typically selects the prediction mode for each block that minimises the difference between P

    and the block to be encoded.

    4x4 Luma Prediction Modes

    The samples above and to the left have previously been encoded

    and reconstructed and are therefore available in the encoder and

    decoder to form a prediction reference.

    The samples a, b, c, . . . , p of the prediction block P (Figure 6.23) are calculated based on the samples A-M.

    For modes 3-8, the predicted samples are formed from a weighted average of the prediction samples A-M.
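    As an illustration of such a weighted average, the sketch below computes mode 3 (diagonal down-left) for a 4x4 block, using the (1, 2, 1)/4 weights with rounding as I understand them from the standard. The function name and the list layout are assumptions: `top` holds the eight reconstructed samples A..H above and above-right of the block.

```python
def intra4x4_diag_down_left(top):
    """Predict a 4x4 block from the samples A..H (top[0]..top[7])."""
    if len(top) != 8:
        raise ValueError("expected the 8 samples A..H")
    pred = [[0] * 4 for _ in range(4)]
    for y in range(4):
        for x in range(4):
            i = x + y  # samples on the same anti-diagonal share one value
            if i == 6:
                # Bottom-right sample: there is no sample beyond H,
                # so H takes the weight of the missing neighbour.
                pred[y][x] = (top[6] + 3 * top[7] + 2) >> 2
            else:
                pred[y][x] = (top[i] + 2 * top[i + 1] + top[i + 2] + 2) >> 2
    return pred
```

    The other directional modes follow the same pattern with different sample positions and directions.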

    16x16 luma prediction modes

    As an alternative to the 4x4 luma modes described above, the entire 16x16 luma component of a macroblock may be predicted.


    Four modes are available :

    Mode 0 (vertical): extrapolation from upper samples (H).

    Mode 1 (horizontal): extrapolation from left samples (V).

    Mode 2 (DC): mean of upper and left-hand samples (H+V).

    Mode 3 (Plane): a linear plane function is fitted to the

    upper and left-hand samples H and V. This works well in

    areas of smoothly-varying luminance.
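    Modes 0 to 2 and a simple mode decision can be sketched as follows. This is an illustrative sketch only: the plane mode is omitted, the cost is a plain SAD, both neighbour rows are assumed to be available, and all names are hypothetical. `top` is the row of upper samples (H) and `left` the column of left-hand samples (V).

```python
def predict_vertical(top, left):
    """Mode 0: copy the upper samples down every row."""
    return [list(top) for _ in range(len(top))]

def predict_horizontal(top, left):
    """Mode 1: copy each left-hand sample across its row."""
    n = len(left)
    return [[left[y]] * n for y in range(n)]

def predict_dc(top, left):
    """Mode 2: rounded mean of the upper and left-hand samples."""
    n = len(top)
    dc = (sum(top) + sum(left) + n) // (2 * n)
    return [[dc] * n for _ in range(n)]

MODES = {0: predict_vertical, 1: predict_horizontal, 2: predict_dc}

def sad(a, b):
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def select_mode(cur, top, left):
    """Return the mode number with the smallest SAD, plus all costs."""
    costs = {mode: sad(cur, fn(top, left)) for mode, fn in MODES.items()}
    return min(costs, key=costs.get), costs
```

    For a macroblock made of vertical stripes that continue the row above, `select_mode` returns mode 0 with zero cost.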


    In addition to these two types of luma prediction, a separate chroma prediction is conducted.

    As an alternative to Intra_4x4 and Intra_16x16, the I_PCM coding type allows the encoder to simply bypass the prediction and transform coding processes and instead directly send the values of the encoded samples.


    [Block diagram: intra-prediction encoder datapath. Labelled blocks: I/P from ITU 656; scaler (4:4:4 with 16x16 Y, 16x16 Cb and 16x16 Cr to 4:2:0 with 16x16 Y, 8x8 Cb and 8x8 Cr); frame memory; Y_MB, Cb_MB and Cr_MB fetchers; Y, Cb and Cr reference-pixel fetchers; MUXes driven by the control circuit; mode select input; P calculator; P_MB memory; A_MB memory; subtractor; DCT & Q; IDCT & Q^-1; reconstructed frame memory; output to NAL.]


    Prediction of Inter Macroblocks in P-slices

    Inter prediction creates a prediction model from one or more previously encoded video frames.

    The model is formed by shifting samples in the reference frame(s) (motion compensated prediction).

    Important differences from earlier standards include the support for a range of block sizes (down to 4x4) and fine sub-pixel motion vectors (1/4 pixel in the luma component).
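    To make the 1/4 pixel resolution concrete, the sketch below assumes that a luma motion vector component is stored as an integer in quarter-sample units, and that half-sample positions are interpolated with the 6-tap filter (1, -5, 20, 20, -5, 1)/32, as I understand the standard. The function names are hypothetical and the slides do not give these details.

```python
def split_quarter_pel(mv):
    """Split a motion vector component given in quarter-sample units.

    Returns (integer sample offset, fractional part in quarters, 0..3).
    """
    return mv >> 2, mv & 3

def half_sample(e, f, g, h, i, j):
    """Interpolate the luma half-sample between g and h.

    e..j are six consecutive integer-position samples on one row or column.
    """
    value = (e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5
    return min(255, max(0, value))  # clip to the 8-bit sample range
```

    For example, a component of -5 means an offset of -2 samples plus 3/4, that is -1.25 samples.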

    Tree structured motion compensation :

    AVC supports motion compensation block sizes ranging from 16x16 to 4x4 luminance samples with many options between the two.

    The luminance component of each macroblock (16x16 samples) may be split up in 4 ways as shown in Figure 2-1: 16x16, 16x8, 8x16 or 8x8.

    If the 8x8 mode is chosen, each of the four 8x8 macroblock partitions within the macroblock may be split in a further 4 ways as shown in Figure 2-2: 8x8, 8x4, 4x8 or 4x4 (known as macroblock sub-partitions). This is known as tree structured motion compensation.

    A separate motion vector is required for each partition or sub-partition. Each motion vector must be coded and transmitted; in addition, the choice of partition(s) must be encoded in the compressed bitstream.
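    The tree of partitions, and hence the number of motion vectors to transmit, can be enumerated with a short sketch. The mode names are written as width x height strings and the function names are illustrative; each returned tuple is (x, y, width, height) in luma samples inside the macroblock.

```python
MB_PARTITIONS = {"16x16": (16, 16), "16x8": (16, 8), "8x16": (8, 16), "8x8": (8, 8)}
SUB_PARTITIONS = {"8x8": (8, 8), "8x4": (8, 4), "4x8": (4, 8), "4x4": (4, 4)}

def tile(x0, y0, size, w, h):
    """Cover a size x size square at (x0, y0) with w x h rectangles."""
    return [(x, y, w, h)
            for y in range(y0, y0 + size, h)
            for x in range(x0, x0 + size, w)]

def macroblock_partitions(mb_mode, sub_modes=None):
    """List every partition or sub-partition; each needs one motion vector."""
    w, h = MB_PARTITIONS[mb_mode]
    if mb_mode != "8x8":
        return tile(0, 0, 16, w, h)
    sub_modes = sub_modes or ["8x8"] * 4
    blocks = []
    for (x0, y0, _, _), sub in zip(tile(0, 0, 16, 8, 8), sub_modes):
        sw, sh = SUB_PARTITIONS[sub]
        blocks.extend(tile(x0, y0, 8, sw, sh))
    return blocks
```

    A macroblock coded as 16x16 carries a single motion vector, while an 8x8 macroblock whose four partitions are all split into 4x4 sub-partitions carries sixteen.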


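    Choosing a large partition size (e.g. 16x16, 16x8, 8x16) means that a small number of bits are required to signal the choice of motion vector(s) and the type of partition; however, the motion compensated residual may contain a significant amount of energy in areas of the frame with high detail. Choosing a small partition size (e.g. 8x4, 4x4) has the opposite effect: the residual may have lower energy, but more bits are required to signal the motion vectors and the choice of partition(s).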

    The choice of partition size therefore has a significant impact on compression.

    In general, a large partition size is appropriate for homogeneous areas of the frame and a small partition size may be beneficial for detailed areas.

    Each chroma block is partitioned in the same way as the luma component, except that the partition sizes

    have exactly half the horizontal and vertical resolution (an 8x16 partition in luma corresponds to a 4x8

    partition in chroma; an 8x4 partition in luma corresponds to 4x2 in chroma; and so on).

    The figure shows a residual frame. The AVC reference encoder selects the best partition size for each part of the

    frame, i.e. the partition size that minimizes the coded residual and motion vectors. The macroblock

    partitions chosen for each area are shown superimposed on the residual frame. In areas where there is

    little change between the frames (residual appears grey), a 16x16 partition is chosen; in areas of detailed

    motion (residual appears black or white), smaller partitions are more efficient.


    Instead of directly encoding the raw pixel values for each block, the encoder tries to find a block similar to the one it is encoding in a previously encoded frame, referred to as the reference frame.

    If the search succeeds, the block can be encoded by a vector, known as the MOTION VECTOR, which points to the position of the matching block in the reference frame.

    The encoder compares the block found in the reference frame with the block it is encoding and obtains the differences between them. Those differences are known as the PREDICTION ERROR and need to be transformed and sent to the decoder.
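    A minimal sketch of this search is shown below, assuming an exhaustive (full) search over integer offsets with SAD as the matching cost. The slides do not name a particular block-matching algorithm, so the choice and the function names are illustrative. Frames are lists of rows of luma samples, and (bx, by) is the top-left corner of the current block.

```python
def sad_at(cur, ref, bx, by, dx, dy, n):
    """SAD between the current block and the reference block offset by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))

def full_search(cur, ref, bx, by, n=4, search=2):
    """Return the best motion vector (dx, dy) and the prediction error block."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= bx + dx and bx + dx + n <= width
                    and 0 <= by + dy and by + dy + n <= height):
                continue
            cost = sad_at(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    _, dx, dy = best
    error = [[cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x]
              for x in range(n)] for y in range(n)]
    return (dx, dy), error
```

    When the current block is an exact copy of a displaced region of the reference frame, the returned prediction error is all zeros and only the motion vector needs to be sent.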

    If the block matching algorithm fails to find a suitable match, the prediction error will be considerable, and the overall size of the motion vector plus prediction error may be greater than that of a raw encoding. In this case the encoder makes an exception and sends a raw encoding for that specific block.

    If the matched block in the reference frame has itself been encoded using inter frame prediction, the errors made in its encoding are propagated to the next block.

    These drawbacks highlight the need for a reliable, periodically occurring reference frame for this technique to be efficient and useful (I, B, P).


    [Block diagram: inter-prediction encoder datapath. Labelled blocks: I/P from ITU 656; scaler (4:4:4 to 4:2:0); frame memory; Y_MB, Cb_MB and Cr_MB fetchers (two sets); three reconstructed frame memories (list 0 & list 1); MUXes driven by the control circuit; SAD calculator & block-matching algorithms; result memory; P_MB memory; A_MB memory; subtractor; DCT & Q; IDCT & Q^-1. Labelled outputs: MOTION VECTOR and PREDICTION ERROR.]


    Slice Modes

    I (Intra): Contains only I macroblocks (each block or macroblock is predicted from previously coded data within the same slice). Profiles: All.

    P (Predicted): Contains P macroblocks (each macroblock or macroblock partition is predicted from one list 0 reference picture) and/or I macroblocks. Profiles: All.

    B (Bi-predictive): Contains B macroblocks (each macroblock or macroblock partition is predicted from a list 0 and/or a list 1 reference picture) and/or I macroblocks. Profiles: Extended and Main.

    SP (Switching P): Facilitates switching between coded streams; contains P and/or I macroblocks. Profiles: Extended.

    SI (Switching I): Facilitates switching between coded streams; contains SI macroblocks (a special type of intra coded macroblock). Profiles: Extended.

    Most broadcast-quality applications, however, have tended to use 2 consecutive B frames (I, B, B, P, B, B, P, ...) as the ideal trade-off between compression efficiency and video quality.

    The main advantage of using B frames is coding efficiency. Backward prediction in this case allows the encoder to make more intelligent decisions on how to encode the video within these areas. Also, since B frames are not used to predict future frames, errors generated will not be propagated further within the sequence.

    One disadvantage of B frame is that the frame reconstruction memory buffers within the encoder and

    decoder must be doubled in size to accommodate the 2 anchor frames.

    I B B B P B B B P B B B P B B B P .


    I frame

    An intra frame is essentially the first frame to be encoded, and it is compressed less than the other frame types.

    This frame is also known as a key frame because the following frames are encoded using the information available from this frame.

    Intra-prediction utilizes spatial correlation in each frame to reduce the amount of transmission data necessary to represent the picture.

    An intra frame is more or less similar to still-image compression like JPEG or GIF.

    It is coded without any dependency on other frames.

    Intra prediction

    H.264 performs intra-prediction on two different sized blocks:

    16x16 (the entire macroblock) and 4x4.

    16x16 prediction is generally chosen for areas of the picture that are

    smooth. 4x4 prediction, on the other hand, is useful for predicting

    more detailed sections of the frame.

    The general idea is to predict a block, whether it be a 4x4 or 16x16

    block, based on surrounding pixels using a mode that results in a

    prediction that most closely resembles the actual pixels in

    that block.


    P-frame

    P-frames are predicted by using the previous P or I-frame.

    This type of frame is responsible for most of the size reduction of the video stream.

    Motion estimation is the process of selecting an offset to

    a suitable reference area in a previously coded frame.

    Motion estimation is carried out in the video encoder (not in the decoder).

    A good choice of prediction reference minimises the energy in the motion-compensated residual which in

    turn maximises compression performance.

    However, finding the best offset can be a very computationally intensive procedure.

    The goal of a practical motion estimation algorithm is to find a vector that minimises the residual energy

    after motion compensation while keeping the computational complexity within acceptable limits.
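    A rough count shows why an exhaustive search is expensive. The sketch below assumes one SAD evaluation per integer offset in a square window of plus or minus `search_range` samples and ignores sub-pixel refinement and multiple partition sizes; the function name and the CIF example are illustrative, not taken from the slides.

```python
def full_search_cost(width, height, search_range, mb_size=16):
    """Absolute-difference operations per frame for a full integer search."""
    macroblocks = (width // mb_size) * (height // mb_size)
    candidates = (2 * search_range + 1) ** 2   # offsets tested per macroblock
    ops_per_candidate = mb_size * mb_size      # one difference per luma sample
    return macroblocks * candidates * ops_per_candidate

# CIF frame (352x288) with a +/-16 sample search window.
print(full_search_cost(352, 288, 16))
```

    For this example the count is about 110 million absolute differences per frame, which is why practical encoders use faster search strategies that test only a subset of the candidates.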


    B-frame

    B-frames are bidirectionally predicted frames, i.e. B-frames rely on the frames preceding and following them.

    B-frames contain only the data that have changed

    from the preceding frame or are different from the

    data in the very next frame.

    B frames are interesting for two reasons:

    1. They have a slightly better prediction.

    2. More importantly, they do not impact the quality of following frames, so they can be coded with lower quality without degrading the whole sequence.

    Since B-frames depend on both past and future pictures, the decoder has to be fed with the future I or P frames before being able to decode them.
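    This reordering can be sketched as a conversion from display order to coding order. The sketch assumes the classic arrangement in which each B-frame depends only on the nearest I or P frame on either side; H.264 allows more flexible referencing, and the function name is hypothetical.

```python
def coding_order(display_types):
    """Return display indices in the order the frames must be coded and decoded.

    display_types is a sequence of 'I', 'P' and 'B' in display order.
    """
    order, pending_b = [], []
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(index)      # must wait for the next I or P frame
        else:
            order.append(index)          # the anchor frame goes first
            order.extend(pending_b)      # then the B-frames displayed before it
            pending_b = []
    return order + pending_b             # trailing B-frames with no later anchor

print(coding_order("IBBPBBP"))
```

    For the sequence I B B P B B P, each P frame is moved ahead of the two B-frames that precede it in display order.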

    Size of I, P, B frames =>


    The most important improvements of this technique in H.264 with regard to previous standards are:

    More flexible block partition,

    Resolution of up to 1/4 pixel motion compensation,

    Multiple references,

    Enhanced Direct/Skip Macroblock,


    Macro Blocks

    Human eyes sense color and brightness with different sets of sensors.

    The compression algorithm first transforms the image from RGB to the luminance/chrominance (Y-Cb-Cr) color space.

    Here Y, called luma, represents the brightness (grayscale), and Cb and Cr are the two color components, representing the extent to which the color deviates from gray toward blue and red, respectively.

    Since the human visual system is more sensitive to luma than to chroma, each chroma component is given one fourth of the number of samples of the luma component.

    This is done by downsampling the chroma to half the number of samples in both the horizontal and vertical dimensions.

    This is called 4:2:0 sampling, here with 8 bits of precision per sample.

    4:4:4

    4:2:2

    4:2:0
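    The effect of the three sampling formats on the amount of data can be sketched as follows. The 2x2 averaging used for downsampling is only one simple choice and is an assumption, as are the function names; plane dimensions are assumed to be even.

```python
def subsample_420(plane):
    """Halve a chroma plane in both dimensions by averaging each 2x2 group."""
    height, width = len(plane), len(plane[0])
    return [[(plane[y][x] + plane[y][x + 1]
              + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
             for x in range(0, width, 2)]
            for y in range(0, height, 2)]

def samples_per_frame(width, height, fmt):
    """Total number of Y, Cb and Cr samples for one frame."""
    chroma_fraction = {"4:4:4": 1.0, "4:2:2": 0.5, "4:2:0": 0.25}[fmt]
    luma = width * height
    return luma + 2 * int(luma * chroma_fraction)
```

    For one 16x16 macroblock this gives 768 samples in 4:4:4, 512 in 4:2:2 and 384 in 4:2:0, so 4:2:0 halves the raw data before any compression takes place.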


    Picture structure

    Processing is done at the macroblock (MB) level.


    At the top level, an H.264 sequence consists of a series of packets or Network Abstraction Layer Units (NAL Units or NALUs).

    These can include parameter sets (containing key parameters that are used by the decoder to correctly

    decode the video data) and slices (coded video frames or parts of video frames).
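    A decoder tells parameter sets and slices apart from the first byte of each NAL unit. The sketch below follows the one-byte header layout as I understand it from the standard (1 forbidden bit, 2 bits of nal_ref_idc, 5 bits of nal_unit_type); the table lists only a few common types and the function name is hypothetical.

```python
# A few common nal_unit_type values; the standard defines more.
NAL_UNIT_TYPES = {
    1: "coded slice (non-IDR)",
    5: "coded slice (IDR)",
    7: "sequence parameter set",
    8: "picture parameter set",
}

def parse_nal_header(first_byte):
    """Split the first byte of a NAL unit into its three header fields."""
    forbidden_zero_bit = (first_byte >> 7) & 0x1
    nal_ref_idc = (first_byte >> 5) & 0x3
    nal_unit_type = first_byte & 0x1F
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type

_, ref_idc, unit_type = parse_nal_header(0x67)
print(ref_idc, NAL_UNIT_TYPES.get(unit_type, "other"))
```

    A byte of 0x67 therefore identifies a sequence parameter set, and 0x65 an IDR slice.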

    At the next level, a slice represents all or part of a coded video frame and consists of a number of coded

    macroblocks, each containing compressed data corresponding to a 16x16 block of displayed pixels in a

    video frame.

    At the lowest level of Figure 16, a macroblock contains type information (describing the particular choice

    of methods used to code the macroblock), prediction information (coded motion vectors or intra

    prediction mode information) and coded residual data.
