Motion Vectors

Extraction of Motion Vectors from an MPEG Stream
Technical report, 1999
JOSEPH GILVARRY
School of Electronic Engineering, Dublin City University



Abstract

In 1997, a project was started to capture, compress, store, and index up to 24 hours of digital TV broadcasts. The work in this report helps implement this. The first chapter introduces the overall project and the motivation behind this particular focus of work. The second chapter deals with the theory behind digital video compression. The third chapter reports on how the program to extract the motion vectors from the MPEG stream was developed, and on its further development so the motion from frame to frame can be calculated. Chapter four explains why knowledge of the motion vectors alone is not sufficient to calculate the motion from frame to frame; it gives the extra information that is needed and how all the information is used to calculate the motion from frame to frame.


Table of Contents

Abstract
Table of Contents
Table of Figures
Chapter 1
1. Introduction
Chapter 2
2. Digital Video Compression
   2.1 The MPEG-1 bit stream
      2.1.1 Description of a frame
      2.1.2 Bit stream order and display order of frames
      2.1.3 Description of a macroblock
   2.2 Types of macroblock present in a frame
      2.2.1 Types of macroblock in an I frame
      2.2.2 Types of macroblock in a P frame
      2.2.3 Types of macroblock in a B frame
   2.3 Motion estimation and compensation
      2.3.1 Encoding the motion vectors
   Summary
Chapter 3
3 Extraction of the motion vectors
   3.1 Choosing a decoder
      3.1.1 The Berkeley Decoder
      3.1.2 The Java Decoder
   3.2 Description of the source code
   3.3 Storage of the Motion Vectors
      3.3.1 Reordering the bit stream order to the display order
      3.3.2 Storing the motion vectors
      3.3.3 Operation of program
      3.3.4 Alterations made to the decoder
   Summary
Chapter 4
4 Finding the motion from frame to frame
   4.1 Considerations that have to be taken into account - frame level
   4.2 Considerations that have to be taken into account - macroblock level
   Summary
Conclusion
References
Appendix A


Table of Figures

Figure 2.1 The layered structure of the MPEG bit stream
Figure 2.2 P frames use only forward prediction
Figure 2.3 B frames use both forward and backward prediction
Figure 2.4 A single frame divided up into slices
Figure 2.5 Only one set of chrominance components is needed for every four luminance components
Figure 2.6 Structure of a macroblock, and the blocks' numbering convention
Figure 2.7 A forward predicted motion vector
Figure 3.1 Converting from bit stream order to display order
Figure 3.2 Diagram of where the motion vectors for the different frames are stored
Figure 3.3 Flow chart of the operational program
Figure 4.1 Motion vectors associated with a moving picture
Figure 4.2 Realistic version of vectors associated with a moving picture


Chapter 1

1. Introduction

With the recent arrival of digital TV in America and Great Britain, it is only a short time before its use becomes standard. Recent years have also brought huge advances in:

• Networking - High bandwidth networks not only in the workplace, but reaching many homes also;
• Data storage - Today we talk only in gigabytes;
• Video compression - Modern techniques allow compression rates of up to one in fifty (this topic is discussed in detail in Chapter 2).

The combination of these developments will bring the widespread usage of digital video over the next few years.

Following the launch of this new technology will be the launch of many new services. We could see the introduction of the local video 'server' instead of the local video store, where connected residents can select a video from a huge multimedia server. A recording of all television broadcasts for the past week may be stored, allowing subscribers to catch up on any missed viewing. Searching through such large archives will require a navigation tool. There is an ongoing project in DCU at the moment to develop such a tool, of which this project is only a part [1].

When complete, the tool will allow the user to pick a category to search through (sport, drama, action, soap). Clicking on a category will display a list of key frames, each frame representing a program. Clicking on one of these frames will display another list of key frames, and using this hierarchical approach, the user can narrow the search down to a single shot of video.

One of the challenges of the project is to choose a frame to best represent a clip of film. It has been found that the frame after a sequence of frames with a lot of action is sometimes a good representation for that shot. This is one area where the motion vectors may come in useful.

To allow navigation, the material first has to be broken up into elements. For video these elements are shots and scenes. A shot is defined as the continuous recording of a single camera; a scene is made up of multiple shots, while a television broadcast consists of a collection of scenes. For studio broadcasts (take, for example, the news) it is fairly easy to break the program up, as the boundaries between shots are "hard". However, most television programs and films use special techniques to "soften" the boundaries, which makes them less detectable.

There are four different types of boundaries between shots:

• A cut. This is a "hard" boundary and occurs when there is a complete change of picture over two consecutive frames.

• A fade. There are two types of fade: a fade out and a fade in. A fade out occurs when the picture gradually fades to a dot or black screen, while a fade in occurs when the picture is gradually displayed from a black screen. Both these effects occur over a few frames.

• A dissolve. This is the simultaneous occurrence of a fade out and a fade in; the two pictures are superimposed on each other.

• A wipe. This effect is like a virtual line going across the screen, clearing one picture as it brings in another; again this occurs over a few frames.

There are many techniques (the pixel-based difference method, the colour histogram method, detection of macroblocks, and edge detection [5]) which can reliably detect a cut. However, only edge detection is in any way effective in detecting fades, dissolves and wipes.

There is another ongoing project in DCU at the moment that uses edge detection to find shot boundaries. The program takes two consecutive frames, uses special techniques to leave just a black & white outline of any objects in the frames, and then compares the two outlines. If there are a lot of differences in them, it concludes a shot cut has occurred. One of the flaws of this method is that it only allows for relatively small movements of the objects from frame to frame. If something large suddenly moves across the screen, it may interpret this as a cut.

To illustrate where this may happen, take the example where a journalist is giving a TV report from outside some building, and suddenly a bus goes by in the background. The inclusion of the bus in the frame could confuse the program into thinking a cut has occurred. This is another case where motion vectors could come in useful, as associated with a lot of movement in a frame are a lot of motion vectors. These motion vectors can be used to compensate for the movement of the bus.

Here is a history of the events that led up to the creation of this project.

• Develop a system to capture, compress, store and index up to 24 hours of TV broadcasts in digital format.

• Eight hours of television broadcasts were recorded in MPEG-1 format.
• These eight hours were broken into twenty-minute segments for easier handling.
• A baseline was created by manually going through the entire recording and labelling where a cut, fade, dissolve or wipe occurred. A note of the frame number and the time the boundary occurred was taken. The results of any program written to find these boundaries can be easily compared to the baseline in order to determine its accuracy.
• A program was written using edge detection to find the shot boundaries, but it was found that a lot of motion in a frame caused the program to falsely detect a cut. The use of motion vectors to compensate for the motion should rectify the result. It is hoped that the motion vectors can also be used to enhance the program's performance in detecting fades, dissolves and wipes.

• Another area where the motion vectors may be used is in the choice of key frame for a shot [choose a frame after a lot of action?].


Chapter 2

2. Digital Video Compression

In this chapter the techniques used to compress digital video are discussed, with a special emphasis on the factors that need to be considered when finding the motion from one frame to another.

Digital video has the advantages of high quality sound and pictures, but its disadvantage is that it can't be easily transmitted or stored; it needs to be transmitted at a minimum of 100 Mbps, which is impractical for today's infrastructure. To combat this problem, a lot of work was put into video compression. In 1988 the International Standards Organisation (ISO) set up the Moving Picture Experts Group (MPEG) to standardise this compression. Its first standard, IS 11172 (known as MPEG-1), came in five parts:

1. Systems (11172-1). This part was concerned with the multiplexing and synchronisation of the multiple audio and video streams.

2. Video (11172-2). This part dealt with the encoding of the video stream.
3. Audio (11172-3). This part dealt with the encoding of the audio stream.
4. Compliance testing (11172-4).
5. Software for MPEG-1 coding (11172-5).

Parts 1, 2 and 3 were approved in November 1992, and parts 4 and 5 in November 1994. This project is only concerned with the second part, the video encoding. A summary of the standard is given in Table 2.1.

Table 2.1 Summary of the constrained parameters of MPEG-1 [2]

Horizontal picture size   less than or equal to 768 pels
Vertical picture size     less than or equal to 576 lines
Picture area              less than or equal to 396 macroblocks
Pel rate                  less than or equal to 396x25 macroblocks per second
Picture rate              less than or equal to 30 Hz
Motion vector range       -64 to 63.5 pels (half pel precision)
Input buffer size         less than or equal to 327.68 Kb
Bitrate                   less than or equal to 1.856 Mbps (constant bitrate)
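To show how the limits in Table 2.1 interact, here is a small sketch (the class and method names are my own, not from the report's code) that checks a hypothetical stream's parameters against the constrained-parameter bounds:

```java
public class ConstrainedParams {
    // Illustrative check of a stream's parameters against the MPEG-1
    // constrained-parameter limits summarised in Table 2.1.
    static boolean isConstrained(int widthPels, int heightLines,
                                 double pictureRate, double bitrateMbps) {
        int mbWide = (widthPels + 15) / 16;    // macroblocks per row
        int mbHigh = (heightLines + 15) / 16;  // macroblock rows
        int pictureArea = mbWide * mbHigh;     // picture area in macroblocks
        return widthPels <= 768
            && heightLines <= 576
            && pictureArea <= 396
            && pictureArea * pictureRate <= 396 * 25 // pel rate limit
            && pictureRate <= 30
            && bitrateMbps <= 1.856;
    }

    public static void main(String[] args) {
        System.out.println(isConstrained(352, 288, 25, 1.15)); // SIF at 25 Hz fits
        System.out.println(isConstrained(768, 576, 30, 1.5));  // picture area too large
    }
}
```

Note how a 352x288 picture at 25 Hz sits exactly at the 396-macroblock limit, while the maximum picture dimensions cannot be used together with a high picture rate.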

The aim of MPEG-1 was to achieve coding of full motion video at a rate of around 1.5 Mbps. This rate was chosen as it would be suitable for transmission over any modern network, and it is also nearly the same rate as a CD (1.412 Mbps).

To allow for greater flexibility and ingenuity in compression techniques, MPEG-1 does not specify a standard for the encoding of video. What it does specify is a standard for the decoding process and the video bit stream.

2.1 The MPEG-1 bit stream

The bit stream is in a layered format as shown in Figure 2.1; a brief description of the function of each layer is given in Table 2.2.


Figure 2.1 The layered structure of the MPEG bit stream

Table 2.2 Function of each layer of the bit stream [2]

Layer                     Function
Sequence layer            One or more groups of pictures
Group of pictures (GOP)   Random access into the sequence
Picture                   Primary coding unit
Slice                     Resynchronisation unit
Macroblock                Motion compensation unit
Block                     DCT unit

[Figure 2.1 shows the hierarchy: the sequence layer is made up of GOPs; each GOP contains frames; each frame contains slices; each slice contains macroblocks; and each macroblock contains six blocks (Y0, Y1, Y2, Y3, Cb and Cr).]


Firstly, each layer is briefly described, and then a more thorough description of the units in the layers is given.

1. The sequence layer contains general information about the video: the vertical and horizontal size of the frames, the height/width ratio, the picture rate, the VBV buffer size, and the intra and non-intra quantizer default tables.

2. Group of pictures (GOP) layer: pictures are grouped together to support greater flexibility and efficiency in the encoder/decoder [2].

3. The frame layer (picture layer) is the primary coding unit. It contains information regarding the picture's position in the display order (pictures do not come in the same order as they are displayed), what type of picture it is (intra, predicted or bi-directionally predicted), and the precision and range of any motion vectors present in the frame.

4. The slice layer is important in the handling of errors. If the decoder comes across a corrupted slice, it skips it and goes straight to the start of the next slice.

5. The macroblock layer is the basic coding unit. It is within this unit that the motion vectors are stored. Each macroblock may have one or two motion vectors associated with it.

6. The block layer is the smallest coding unit, and it contains information on the coefficients of the pixels.

2.1.1 Description of a frame

As mentioned above there are three types of picture/frame:

• Intra (I-type). These frames are encoded using only information from the frame itself.
• Predicted (P-type). These frames are encoded using a past I or P frame as a reference, as illustrated in Figure 2.2. This is known as forward prediction.

I B B P B B B P

Figure 2.2 P frames use only forward prediction


• Bi-directionally predicted (B-type). These frames are encoded using a past (forward predicted) and a future (backward predicted) I or P frame as a reference, as illustrated in Figure 2.3 (a B-type frame is never used as a reference).

I B B P B B B P

Figure 2.3 B frames use both forward and backward prediction

Each frame is divided up into arbitrarily sized slices. A slice may contain just one macroblock or all the macroblocks in the frame. As shown in Figure 2.4, a slice is not confined to a single row.

Figure 2.4 A single frame divided up into slices

2.1.2 Bit stream order and display order of frames

A typical sequence of frames in display order is shown below.

I  B  B  B  P  B  B  B  B  P   I   B   B   B   I
1  2  3  4  5  6  7  8  9  10  11  12  13  14  15

However, this is not the order in which they are transmitted! The P frame numbered five is needed for the decoding of B frames two, three and four. Therefore five has to be decoded before two, three and four, and hence transmitted before them. Similarly, P10 is transmitted before B6, B7, B8 and B9, and I15 is transmitted before B12, B13 and B14. The bit stream order is shown below.

I  P  B  B  B  P   B  B  B  B  I   I   B   B   B
1  5  2  3  4  10  6  7  8  9  11  15  12  13  14



2.1.3 Description of a macroblock

The macroblock is the basic unit in the MPEG stream. It is an area of 16 pixels by 16 pixels, and it is at this stage that the first compression takes place. Each pixel has a luminance (Y) component and two chrominance (Cb and Cr) components associated with it. The human eye is much more sensitive to luminance than it is to chrominance. Therefore the luminance components must be encoded at full resolution, while the chrominance components can be encoded at quarter resolution without any noticeable loss. This already gives a compression of one in two. Figure 2.5 shows this compression.


Figure 2.5 Only one set of chrominance components is needed for every four luminance components.

A block is an 8 by 8 pixel area and is the smallest unit in the MPEG stream. It contains the Discrete Cosine Transform (DCT) coefficients of the luminance and chrominance components [3]. Six blocks are needed to make up a macroblock (16 pixels by 16 pixels): four for the luminance components, but only one for each of the two chrominance components due to their compression. Figure 2.6 shows the blocks of a macroblock and their numbering convention.

Y:  0 1      Cb:  4      Cr:  5
    2 3

Figure 2.6 Structure of a macroblock, and the blocks' numbering convention
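The block count and the one-in-two compression figure above can be checked with a few lines of arithmetic (an illustrative sketch, not part of the decoder):

```java
public class MacroblockArithmetic {
    // Samples per 16x16 macroblock before and after chrominance subsampling,
    // as described in section 2.1.3.
    public static void main(String[] args) {
        int luminance = 16 * 16;       // one Y sample per pixel
        int chromaFull = 2 * 16 * 16;  // Cb and Cr at full resolution
        int chromaSub = 2 * 8 * 8;     // Cb and Cr at quarter resolution (one 8x8 block each)

        int blocks = (luminance + chromaSub) / (8 * 8);
        System.out.println(blocks); // 6 blocks per macroblock (4 Y + Cb + Cr)

        int ratio = (luminance + chromaFull) / (luminance + chromaSub);
        System.out.println(ratio);  // compression of one in two
    }
}
```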


2.2 Types of macroblock present in a frame

In a single frame there may be many different types of macroblock (MB). Tables 2.3, 2.4 and 2.5 show the different types of macroblock that can be present in I, P and B frames respectively.

2.2.1 Types of macroblock in an I frame

In an I frame there are only two types of macroblock: Intra-d uses the default quantizer scale, while Intra-q uses a scale defined by the buffer status [2].

2.2.2 Types of macroblock in a P frame

A P frame uses motion estimation and compensation to reduce the amount of information needed to play the video; this process is described later in the chapter. There are eight different types of macroblock in a P frame, but for the purpose of this project they can be divided into three categories.

1. Intra. There are no motion vectors present. These macroblocks don't use any reference frame, and are encoded using only information from the macroblock itself.
2. Predicted. These macroblocks have motion vectors present.
3. Skipped. These macroblocks are exactly the same as the macroblock in the previous frame.

Table 2.3 Macroblock types in an I frame [2]

Type     VLC code   MB quant
Intra-d  1          0
Intra-q  01         1

Table 2.4 Macroblock types in a P frame [2]

Type      VLC       Intra   M F   Coded pattern   Quant
pred-mc   1                 1     1
pred-c    01                      1
pred-m    001               1
intra-d   0001 1    1
pred-mcq  0001 0            1     1               1
pred-cq   0000 1                  1               1
intra-q   0000 01   1                             1
skipped

Page 13: Motion Vectors

9

Table 2.5 Macroblock types in a B frame [2]

Type      VLC       Intra   M F   M B   Coded pattern   Quant
pred-i    10                1     1
pred-ic   11                1     1     1
pred-b    010                     1
pred-bc   011                     1     1
pred-f    0010              1
pred-fc   0011              1           1
intra-d   0001 1    1
pred-icq  0001 0            1     1     1               1
pred-fcq  0000 11           1           1               1
pred-bcq  0000 10                 1     1               1
intra-q   0000 01   1                                   1
skipped

Here is the meaning of the abbreviations used in the tables above:

• VLC - variable length code
• M F - motion forward
• M B - motion backward
• pred - predictive
• m - motion compensated
• c - at least one block in the macroblock is coded and transmitted
• d - default quantizer is used
• q - quantizer scale is changed
• i - interpolated (a combination of forward prediction and backward prediction)
• b - backward prediction
• f - forward prediction

2.2.3 Types of macroblock in a B frame

A B frame uses two reference frames for prediction and can have twelve different types of macroblock. This makes it the most complex frame type, but it gives the highest compression rate. For the purpose of this project, they can be categorised into five groups:

1. Forward predicted. The macroblock is encoded using only a past I or P frame.
2. Backward predicted. The macroblock is encoded using only a future I or P frame.
3. Forward and backward predicted (interpolated). The macroblock is encoded using both a past and a future frame as a reference. The two reference macroblocks are interpolated to form the predicted macroblock.
4. Intra. No reference frame is used. The macroblock is encoded using information from itself.
5. Skipped. The macroblock is the same as the one in the previous frame.

2.3 Motion estimation and compensation

MPEG achieves its high compression rate by the use of motion estimation and compensation. MPEG takes advantage of the fact that from frame to frame there is very little change in the picture (usually only small movements). For this reason, macroblock-sized areas can be compared between frames, and instead of encoding the whole macroblock again, the difference between the two macroblocks is encoded and transmitted.

Figure 2.7 demonstrates how forward motion compensation is achieved (backward compensation is done in the same way, except a future frame in the display order is used as the reference frame).


Figure 2.7 A forward predicted motion vector

Macroblock "x" is the macroblock we wish to encode; macroblock "y" is its counterpart in the reference frame. A search is done around "y" to find the best match for "x". This search is limited to a finite area, and even if there is a perfectly matching macroblock outside the search area, it will not be used. The displacement between the two macroblocks gives the motion vector associated with "x".

There are many search algorithms for finding the best matching macroblock. A full search gives the best match but is computationally expensive. Alternatives to this are the logarithmic search, one-at-a-time search, three-step search and hierarchical search [3]. The choice of search is decided by the encoder, with the usual trade-off between time and accuracy.
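A full search can be sketched as follows. This is an illustrative example and not the encoder's actual code: the frame data is synthetic, the names are my own, and the sum of absolute differences (SAD) is one common matching cost among several.

```java
public class FullSearchDemo {
    static final int MB = 16; // macroblock size in pixels

    // Sum of absolute differences between a candidate macroblock in the
    // reference frame (top-left at rx, ry) and the current macroblock
    // (top-left at cx, cy).
    static int sad(int[][] ref, int[][] cur, int rx, int ry, int cx, int cy) {
        int s = 0;
        for (int j = 0; j < MB; j++)
            for (int i = 0; i < MB; i++)
                s += Math.abs(ref[ry + j][rx + i] - cur[cy + j][cx + i]);
        return s;
    }

    // Exhaustively test every displacement in a +/-range window around the
    // macroblock's own position, keeping the lowest-cost match.
    static int[] search(int[][] ref, int[][] cur, int cx, int cy, int range) {
        int best = Integer.MAX_VALUE;
        int[] mv = {0, 0};
        for (int dy = -range; dy <= range; dy++)
            for (int dx = -range; dx <= range; dx++) {
                int rx = cx + dx, ry = cy + dy;
                if (rx < 0 || ry < 0 || rx + MB > ref[0].length || ry + MB > ref.length)
                    continue; // the search area stays inside the frame
                int s = sad(ref, cur, rx, ry, cx, cy);
                if (s < best) { best = s; mv[0] = dx; mv[1] = dy; }
            }
        return mv;
    }

    public static void main(String[] args) {
        int[][] ref = new int[48][48], cur = new int[48][48];
        // A bright square at (20, 20) in the reference frame appears at
        // (16, 18) in the current frame: the best match lies at (+4, +2).
        for (int j = 0; j < MB; j++)
            for (int i = 0; i < MB; i++) {
                ref[20 + j][20 + i] = 200;
                cur[18 + j][16 + i] = 200;
            }
        int[] mv = search(ref, cur, 16, 18, 8);
        System.out.println(mv[0] + " " + mv[1]); // prints "4 2"
    }
}
```

The quadratic number of candidate positions per macroblock is exactly why the faster alternatives listed above exist.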


2.3.1 Encoding the motion vectors

Once the motion vector is found, it has to be encoded for transmission. The first step in the encoding process is to find the differential motion vectors (DMV). In a lot of situations (e.g. a pan) all motion vectors will be nearly the same, so subtracting the motion vector for a macroblock from the previous motion vector in the slice will reduce a lot of the vectors to zero. Note this differential vector is reset to zero if an I-type macroblock is encountered, and also at the end of a slice.

The second step is to make sure all differential vectors are within a permitted range. This range is defined by forward_f_code/backward_f_code and is given in Table 2.6. If the vectors are outside this range, a modulus is added or subtracted. Finally, the differential vectors are variable length coded and transmitted. The variable length codes are given in Table 2.7. To illustrate with an example, suppose the vectors (full pel precision) in a slice are:

3 10 30 30 -14 -16 27 24

All vectors lie in the range -32 to 31; therefore a forward_f_code of 2 is used. The differential vectors are:

3 7 20 0 -44 -2 43 -3

Adding/subtracting the modulus (64 in this case) to any values outside the range gives:

3 7 20 0 20 -2 -21 -3

The variable length codes for these values are [2]:

Value  VLC              Value  VLC
3      0010 0           20     0000 0100 101
7      0000 1100        -2     0111
20     0000 0100 101    -21    0000 0100 0110
0      1                -3     0011 0

The codes needed to decode these VLC values are given in the MPEG standard.
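The differencing and modulus steps of the worked example can be reproduced in a short sketch (illustrative only; the class and method names are my own, not from the report's code):

```java
public class DmvDemo {
    // Wrap a differential motion vector into the permitted range
    // [-(16 << (fCode - 1)), (16 << (fCode - 1)) - 1] by adding or
    // subtracting the modulus 32 << (fCode - 1), per Table 2.6.
    static int wrap(int dmv, int fCode) {
        int range = 16 << (fCode - 1);  // fCode 2 -> range 32 (-32 to 31)
        int modulus = 2 * range;        // fCode 2 -> modulus 64
        if (dmv < -range) dmv += modulus;
        else if (dmv >= range) dmv -= modulus;
        return dmv;
    }

    public static void main(String[] args) {
        int[] mv = {3, 10, 30, 30, -14, -16, 27, 24}; // vectors in the slice
        int prev = 0; // the predictor is zero at the start of the slice
        StringBuilder out = new StringBuilder();
        for (int v : mv) {
            int dmv = wrap(v - prev, 2); // forward_f_code = 2
            out.append(dmv).append(' ');
            prev = v;
        }
        System.out.println(out.toString().trim()); // prints "3 7 20 0 20 -2 -21 -3"
    }
}
```

The output matches the wrapped differential vectors of the example: -44 becomes 20 and 43 becomes -21 once the modulus of 64 is applied.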

Table 2.6 Range of motion vectors and their modulus [2]

Forward_f_code or
backward_f_code   Half pel precision   Full pel precision   Modulus
1                 -8 to 7.5            -16 to 15            32
2                 -16 to 15.5          -32 to 31            64
3                 -32 to 31.5          -64 to 63            128
4                 -64 to 63.5          -128 to 127          256
5                 -128 to 127.5        -256 to 255          512
6                 -256 to 255.5        -512 to 511          1024
7                 -512 to 511.5        -1024 to 1023        2048


Table 2.7 VLC for the differential motion vectors (DMV) [2]

VLC code        DMV    VLC code        DMV
0000 0011 001   -16    010             1
0000 0011 011   -15    0010            2
0000 0011 101   -14    0001 0          3
0000 0011 111   -13    0000 110        4
0000 0100 001   -12    0000 1010       5
0000 0100 011   -11    0000 1000       6
0000 0100 11    -10    0000 0110       7
0000 0101 01    -9     0000 0101 10    8
0000 0101 11    -8     0000 0101 00    9
0000 0111       -7     0000 0100 10    10
0000 1001       -6     0000 0100 010   11
0000 1011       -5     0000 0100 000   12
0000 111        -4     0000 0011 110   13
0001 1          -3     0000 0011 100   14
0011            -2     0000 0011 010   15
011             -1     0000 0011 000   16
1               0

Summary

In this chapter the MPEG standard was introduced and described, the layered structure of the bit stream was explained, and the concept of a motion vector was illustrated. The difference between the bit stream order and the display order of the frames was explained and illustrated. The different types of macroblock present in I, P and B frames were given. To find the motion from frame to frame, all of these factors have to be considered.


Chapter 3

3 Extraction of the motion vectors

This chapter discusses the steps taken to extract the motion vectors from the MPEG stream. It also describes the alterations made to the source code to allow the calculation of the motion from frame to frame.

3.1 Choosing a decoder

The first step was to choose an MPEG-1 decoder. The decoder is used to extract and decode the motion vectors. A search for suitable decoders was undertaken, and this resulted in two candidates: the Berkeley decoder and a Java decoder.

3.1.1 The Berkeley Decoder

The Berkeley decoder can be found at:
http://www.bmrc.berkeley.edu:80/frame/research/mpeg/mpeg_play.html
Initially it was thought that this would be the best decoder to use, as it was written in C. Speed is an important factor in this project due to the size of the files that have to be processed, and C has a superior processing time to Java. However, the source code proved impossible to read: it was not commented, and there are pointers pointing to pointers pointing to ???

3.1.2 The Java Decoder

The Java decoder can be found at:
http://rnvs.informatik.tu-chemnitz.de/~ja/MPEG/MPEG_Play.html
The Java program's speed disadvantage relative to C was compensated for by its well structured and documented style. There are two versions of the decoder available. The default version stores all the frames as it decodes them. This version is impractical to use, as all the memory is used up after only a few frames are decoded; it has to be able to decode thirty thousand frames! By making a small alteration to the source code, we get the just-in-time version. This version only stores seven or eight frames at a time, which makes it suitable for our purpose.

3.2 Description of the source code

The motion vectors are decoded by the two classes MPEG_video and motion_data. MPEG_video is the main class in the program. It takes in the bit stream and decodes it. A skeleton of the program is given below.


public class MPEG_video implements Runnable {

    MPEG_video() { }

    public void run() {
        mpeg_stream.next_start_code();
        do {
            Parse_sequence_header();
            do {
                Parse_group_of_pictures();
            } while (/* more groups of pictures */);
        } while (/* more sequences */);
    }

    private void Parse_sequence_header() { /* ... */ }

    private void Parse_group_of_pictures() {
        do {
            Parse_picture();
        } while (/* more pictures */);
    }

    private void Parse_picture() {
        do {
            Parse_slice();
        } while (/* more slices */);
    }

    private void Parse_slice() {
        do {
            Parse_macroblock();
        } while (/* more macroblocks */);
    }

    private void Parse_Block() { /* ... */ }
}

It is clear how the program first takes in the highest level layer and parses it. The program then extracts the information in a section of that layer, and moves down to the next level. This process is repeated for all the layers.

The motion vector information is contained in the macroblock layer. Once this information is known, it is passed to a method in motion_data called compute_motion_vector. To decode the motion vectors, compute_motion_vector uses another method in motion_data called motion_displacement. The code in these methods is given in Appendix A.


The two components of the vector are right_x and down_x. The conventional directions used for the components are right and down; negative components represent left and up respectively.

For this project it was decided to use half pixel precision for the vectors (recon_right_x and recon_down_x). The vector may not point to a particular pixel, but it is still the true vector for that macroblock.

The fact that the vector is not pointing at a pixel should not be an issue. If the motion vectors are used for the selection of the key frame in a shot, there is no need for the vector to be pointing at a pixel.

If the vectors are used to compensate for movement in a frame, edge detection (the process that will be using the vectors) blows up an area around each pixel when comparing the two frames [5]. By simply halving the extracted (half pixel precision) vector and using it for any motion compensation, the need for the extra calculations to get the vector pointing to a pixel is eliminated. This will enhance the speed of the program. Any inaccuracies in the motion vector will be compensated for by the edge detection's "explosion". Besides, edge detection is not an exact science.
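The halving step can be shown in a couple of lines (an illustrative sketch with my own names; the decoder stores the components as integers counting half pels):

```java
public class HalfPelDemo {
    // A half-pel-precision component stores twice the displacement in pels,
    // so integer-halving it gives an approximate whole-pel vector, which is
    // good enough for the edge detection comparison described above.
    static int toApproxPels(int reconComponent) {
        return reconComponent / 2; // truncates toward zero in Java
    }

    public static void main(String[] args) {
        System.out.println(toApproxPels(7));  // 7 half pels (3.5 pels) -> 3
        System.out.println(toApproxPels(-7)); // -7 half pels -> -3
    }
}
```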

3.3 Storage of the Motion Vectors

The motion vectors have to be stored in an order that will allow the motion from frame to frame to be calculated. First, the process of reordering the bit stream order to the display order is discussed. This is followed by a description of how selective vector storage allows this reordering.

3.3.1 Reordering the bit stream order to the display order
As described in Chapter 2, the frames do not come into the decoder in the same order as they are displayed. To reorder the frames to the display order, the following procedure is used (see Figure 3.1):

• If an I or P frame (let's call it "1") comes in, it is put in temporary storage, future. I and P frames always come into the decoder before the B frames that reference them.

• "1" is left in future until another I or P frame ("5") comes in. The arrival of "5" indicates it is "1"'s turn in the display order. "1" is taken out of future and put in the display order. "5" is put in future until another I or P frame arrives.

• All B frames are immediately put in the display order.
• At the end, whatever frame is left in future is taken out and put in the display order.

A typical bit stream is shown in Figure 3.1; the display order number of each frame is also given. Note this process doesn't use the display order number; it is given only to clarify what is happening.
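The reordering procedure above can be sketched in a few lines of Java. This is a simplified illustration with frames represented as strings, not the decoder's actual code:

```java
import java.util.ArrayList;
import java.util.List;

public class ReorderDemo {
    // I and P frames wait in "future" until the next I or P frame arrives;
    // B frames go straight through to the display order.
    static List<String> toDisplayOrder(List<String> bitstream) {
        List<String> display = new ArrayList<>();
        String future = null; // holds the pending I or P frame
        for (String frame : bitstream) {
            if (frame.endsWith("B")) {
                display.add(frame);                      // B frames pass straight through
            } else {
                if (future != null) display.add(future); // the waiting reference frame's turn
                future = frame;                          // new reference frame waits in future
            }
        }
        if (future != null) display.add(future);         // flush the last reference frame
        return display;
    }

    public static void main(String[] args) {
        List<String> bits = List.of("1I", "5P", "2B", "3B", "4B", "10P", "6B", "7B",
                                    "8B", "9B", "11I", "15P", "12B", "13B", "14B");
        System.out.println(toDisplayOrder(bits));
    }
}
```

Run on the bit stream of Figure 3.1, this reproduces the display order 1I, 2B, 3B, 4B, 5P, ... , 15P shown in the figure.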


Bit stream order    Display order    future

1I                                   1I
5P                  1I               5P
2B                  2B               5P
3B                  3B               5P
4B                  4B               5P
10P                 5P               10P
6B                  6B               10P
7B                  7B               10P
8B                  8B               10P
9B                  9B               10P
11I                 10P              11I
15P                 11I              15P
12B                 12B              15P
13B                 13B              15P
14B                 14B              15P
                    15P

Figure 3.1 Converting from bit stream order to display order

3.3.2 Storing the motion vectors
For ease of handling, it was decided that the motion vectors should be stored in two-dimensional arrays. The size of the array corresponds to the frame size (in macroblocks). The position of an entry in the array corresponds to the macroblock's position in the frame. There is a separate array for each of the two components of the vector: one for the right component and one for the down component. To allow the storage of all the vectors that may be present in a frame, four arrays have to be created: two for the storage of the forward predicted vectors, and two for the backward predicted vectors.

To find the motion from one frame to another, a record of the motion vectors in the previous frame has to be kept. This means four more arrays have to be created. Finally, the motion vectors in a P frame have to be stored until it is the P frame's turn in the display order. As a P frame can only have forward predicted vectors, only two arrays need to be created for this. The names of all the arrays used in this project are given below:


Array name                                   Function of array

futureRight, futureDown                      Store the motion vectors in a P frame until
                                             it is the P frame's turn in the display order.

presentForwardRight, presentForwardDown,     Store the motion vectors of the present
presentBackwardRight, presentBackwardDown    frame in the display order.

pastForwardRight, pastForwardDown,           Store the motion vectors of the previous
pastBackwardRight, pastBackwardDown          frame in the display order.

3.3.3 Operation of the program
If an I frame comes into the decoder, all the vectors in future are reset to zero (after the values that were in it are taken out and put in present), as an I frame has no motion vectors. If a P frame comes in, all its vectors have to be stored in future (after the values that were in it are taken out and put in present). The problem is that compute_motion_vector (the method that decodes the motion vector) doesn't know what type of frame is in the decoder, or even what type of predicted vector it has to decode. It could be a forward predicted vector in a P or B frame, or a backward predicted vector in a B frame. To overcome this problem, an extra variable, Pic_Type, is also passed. Pic_Type determines what type of frame is present in the decoder; Pic_Type = 2 means it is a P frame, and the vectors are put in future.

If a B frame comes in, all its vectors have to be stored in present. However, present has two types of vector: presentForward and presentBackward. If it is forward predicted vectors that are to be calculated, compute_motion_vector is called from the same place as it was for the P frame. This time Pic_Type = 3 (for a B frame), and the vectors are stored in presentForward. If backward predicted vectors are to be calculated, compute_motion_vector is called from a different place. The arbitrary value four is passed, to indicate the vectors are to be put in presentBackward. A diagram of where the vectors are stored is given in Figure 3.2.
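The dispatch just described can be summarised in a small sketch. The numeric codes follow the report's convention (2 = P frame, 3 = B frame forward, 4 = backward); chooseArray is a hypothetical helper, not the decoder's code:

```java
public class DispatchDemo {
    // Map the code passed to compute_motion_vector onto the array family
    // the vectors are stored in, per the report's convention.
    static String chooseArray(int picTypeCode) {
        switch (picTypeCode) {
            case 2:  return "future";          // P frame: held until its display turn
            case 3:  return "presentForward";  // B frame, forward predicted vectors
            case 4:  return "presentBackward"; // B frame, backward predicted vectors
            default: return "none";            // I frames carry no motion vectors
        }
    }

    public static void main(String[] args) {
        System.out.println(chooseArray(2)); // future
        System.out.println(chooseArray(4)); // presentBackward
    }
}
```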


Figure 3.2 Diagram of where the motion vectors for the different frames are stored

3.3.4 Alterations made to the decoder
The processes of inputting the motion vectors into the correct arrays and reordering the frames into the display order were incorporated into the decoder. The end result is that the motion vectors for the present frame in display order are in presentForward and presentBackward, while the motion vectors for the previous frame in the display order are in pastForward and pastBackward. A flow chart of the program is given in Figure 3.3.

A skeleton of the two files, MPEG_video and motion_data (after the changes were made to them), is given in Appendix A. Also in Appendix A is the new class, Array, that had to be created.

[Figure 3.2 flow chart: a frame comes in; if it is an I frame, future is reset; if a P frame, all vectors are put in future; if a B frame, forward predicted vectors are put in presentForward and backward predicted vectors in presentBackward.]


Figure 3.3 Flow chart of the operational program

[Figure 3.3 flow chart: a frame comes in; all the vectors in present are put into past and present is reset; if the frame is I or P type, the vectors are taken out of future and put in present, and future is reset; then, by frame type: I frame, no vectors, all vectors in future remain zero; P frame, all vectors put in future; B frame, all vectors put in present.]


To bring in these changes it was decided it would be best to create a new class. This was done for a few reasons:

1. MPEG_video.java is a large file. It seemed unsuitable to make it any bigger.
2. Even though MPEG_video is very large, there is a logical flow to it. The bit stream is decoded from "top to bottom"; introducing new code would only disturb this natural flow and leave the program difficult to read.
3. At some time in the future the MPEG2 standard may be used instead of the MPEG1 standard that is being used at the moment. However, most of the code developed for this project may still be relevant. Having the code developed in a single class will make it easier to make the transition from MPEG1 to MPEG2.

Summary
A program has been developed which extracts the motion vectors from the bit stream. These vectors are stored in a fashion that allows the motion from frame to frame to be easily calculated. However, additional information is needed to calculate this motion. The reasons we need this additional information are explained in the next chapter. Note the source code for the decoder has not been minimised: the code used to calculate the Inverse Discrete Cosine Transform, and also the code used to display the picture, can be deleted.


Chapter 4

4 Finding the motion from frame to frame
To find the motion from frame to frame, the motion vectors in the present frame are subtracted from the vectors in the previous frame. However, depending on what type of frame (I, P, or B) is in present and past, not all of the arrays can be used. An explanation of this is given below.

A vector defines a distance and a direction; it does not define a position. We have to know the vector's initial position (reference point) to find all the motion from frame to frame. Only vectors with the same reference point can be subtracted from each other. To illustrate, let's take the simple example of "x" moving across a portion of the screen, as shown in Figure 4.1.

[Figure 4.1: "x" moving across five frames (numbered 1 to 5, coded I B B B P), with the forward and backward motion vectors drawn between them.]

Figure 4.1 Motion vectors associated with a moving picture

In the figure, one arrow style represents a forward vector and the other a backward vector. [Note a forward vector doesn't have to point forward, nor a backward vector backward; the names are just the convention for whether the reference frame is in the past (forward) or the future (backward).] The values for the vectors are given below:

In the first frame there are no motion vectors.

Frame 2:
forwardRight = 2; forwardDown = -3; (2, -3)
backwardRight = -7; backwardDown = 4; (-7, 4)

Frame 3:
forward = (4, -6)
backward = (-5, 1)

Frame 4:
forward = (7, -7)
backward = (-2, 0)

Frame 5:
forward = (9, -7)

Transition 1
To find the motion in the transition from frame one to frame two, we can only use the forward vector. The backward vector has no reference in the I frame. The motion is just (2, -3).

Transition 2
Here, both the forward and backward vectors can be used, as both forward vectors have the same reference point and both backward vectors have the same reference point.


presentForward - pastForward = forward motion
(4, -6) - (2, -3) = (2, -3)

presentBackward - pastBackward = backward motion
(-5, 1) - (-7, 4) = (2, -3)

To find the total motion, average the two results:
motionRight = (2 + 2)/2 = 2
motionDown = (-3 + -3)/2 = -3
Total motion = (2, -3)
Note in this example the forward motion will always equal the backward motion, but this is not usually the case in real video.

Transition 3
forward: (7, -7) - (4, -6) = (3, -1)
backward: (-2, 0) - (-5, 1) = (3, -1)
Total motion: (3, -1)

Transition 4
Both the forward and backward vectors can be used here. Both forward vectors are referenced to the same point, and the B frame's backward vector is referenced to the P frame. The P frame is said to have a zero backward vector.
forward: (9, -7) - (7, -7) = (2, 0)
backward: (0, 0) - (-2, 0) = (2, 0)
Total motion: (2, 0)

The motion for the sequence is: (2, -3), (2, -3), (3, -1), (2, 0)
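Transition 2 of the worked example can be checked with a small sketch. transitionMotion is a hypothetical helper, not the report's code; vectors are (right, down) pairs:

```java
public class MotionDemo {
    // Subtract the past frame's vectors from the present frame's, then
    // average the forward and backward results component-wise, as in
    // Transition 2 of the worked example.
    static int[] transitionMotion(int[] presFwd, int[] pastFwd,
                                  int[] presBwd, int[] pastBwd) {
        int fwdRight = presFwd[0] - pastFwd[0], fwdDown = presFwd[1] - pastFwd[1];
        int bwdRight = presBwd[0] - pastBwd[0], bwdDown = presBwd[1] - pastBwd[1];
        return new int[] { (fwdRight + bwdRight) / 2, (fwdDown + bwdDown) / 2 };
    }

    public static void main(String[] args) {
        // Transition 2: forward (4,-6)-(2,-3), backward (-5,1)-(-7,4)
        int[] m = transitionMotion(new int[]{4, -6}, new int[]{2, -3},
                                   new int[]{-5, 1}, new int[]{-7, 4});
        System.out.println(m[0] + ", " + m[1]); // 2, -3
    }
}
```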

4.1 Considerations that have to be taken into account - Frame level

Table 4.1 shows which types of vector can be subtracted, depending on what type of frame is in past and present.

Table 4.1 Vector types that can be used in the transition from frame to frame

past    present    Vector types that can be subtracted
I       B or P     forward only
I       I          none
P       B or P     forward only
P       I          none
B       B or P     forward and backward
B       I          backward only

I frame to B or P frame:
When going from an I frame to a B or P frame, only the forward motion vectors can be used. The P frame will only have forward vectors; the B frame's backward vectors can't be used as they have no reference in the I frame.


I frame to I frame:
There are no vectors present in either frame.

P frame to P or B frame:
None of the backward vectors in the B frame have a reference in the P frame. Therefore only forward vectors can be used.

P frame to I frame:
The forward vectors in the P frame do not have a reference in the I frame. No motion can be found.

B frame to B or P frame:
Both forward and backward vectors can be used, as both have the same reference point from frame to frame.

B frame to I frame:
Only the backward vectors are referenced in the I frame.

4.2 Considerations that have to be taken into account - macroblock level
In Chapter 2, all the different types of macroblock that can be present in a frame were described. Not every macroblock in a B frame has both forward and backward vectors. Some macroblocks will only have either a forward or a backward vector. Other macroblocks will have no vector at all, either because it is an intra macroblock or because it is a skipped macroblock. This complicates the process of finding the motion from frame to frame even further.

It is not a simple matter of subtracting all the values in one array from all the values in its corresponding past array. A more accurate representation of "x" moving across a portion of the screen may be as shown in Figure 4.2.

[Figure 4.2: "x" moving across five frames (numbered 1 to 5, coded I B B B P), where some macroblocks carry only a forward vector and some only a backward vector.]

Figure 4.2 Realistic version of vectors associated with a moving picture

In this example the transition from frame 1 to frame 2 can be calculated as before. If the second transition is calculated as before, we get:

forward motion: (4, -6) - (2, -3) = (2, -3)
backward motion: (-5, 1) - (0, 0) = (-5, 1)
Total motion = (-1.5, -1)

This result is incorrect. To get the correct result, only the forward motion can be used. Similarly, only the backward motion is used for the third transition. The motion for the final transition cannot be found because there is only a backward vector in frame 4 and only a forward vector in frame 5. Only similar types of vector can be subtracted from each other.

Below are further rules to complement the rules established in Table 4.1:

• Only if a similar type of vector (forward, backward or both) is present in both frames can the motion be found.
• A reference frame is said to have all vectors equal to (0, 0).
• If there is a skipped macroblock in the present frame, there is zero motion for that transition.
• If there is a skipped macroblock in the previous frame, the motion for that transition can't be calculated. An exception is if there is also a skipped macroblock in the present frame, in which case the motion will be zero.
• If there is an intra macroblock in either the present or previous frame, the motion for that transition can't be calculated.
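These rules can be sketched as a single decision function. This is a hypothetical illustration, not the report's code; macroblock types are simplified to an enum:

```java
public class MacroblockRules {
    // Simplified macroblock descriptor for illustrating the rules above.
    enum MBType { FORWARD, BACKWARD, BOTH, INTRA, SKIPPED }

    // Returns "zero" (skipped in present), "unknown" (motion can't be
    // calculated), or "computable" (a similar vector type exists in both
    // frames), following the bullet rules above.
    static String motionFor(MBType past, MBType present) {
        if (present == MBType.SKIPPED) return "zero";    // zero motion for this transition
        if (past == MBType.SKIPPED)    return "unknown"; // nothing to subtract from
        if (past == MBType.INTRA || present == MBType.INTRA)
            return "unknown";                            // intra blocks carry no vector
        // Otherwise a similar vector type must be present in both frames.
        boolean similar = past == present || past == MBType.BOTH || present == MBType.BOTH;
        return similar ? "computable" : "unknown";
    }

    public static void main(String[] args) {
        System.out.println(motionFor(MBType.FORWARD, MBType.SKIPPED)); // zero
        System.out.println(motionFor(MBType.INTRA, MBType.FORWARD));   // unknown
        System.out.println(motionFor(MBType.BOTH, MBType.BACKWARD));   // computable
    }
}
```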

Summary
In this chapter the extra information needed to find the motion from frame to frame is described, and a set of rules is established on how to find the motion. Note this set of rules is not rigid; by keeping track of other information, more vectors can be found. For example, if a record is kept of the vector for the macroblock before a skipped macroblock, the motion in the transition between that skipped macroblock (or the final skipped macroblock in a series of skipped macroblocks) and a non-intra macroblock can also be calculated. However, this would only further complicate the program. As a starting point, the rules created in this chapter should be sufficient. If the program does not perform satisfactorily, this extra motion can be calculated.


Conclusion
This project set out to extract the motion vectors from an MPEG stream. This information was to be used to calculate the motion of all objects from one frame to another.

The first step of the project was to choose an MPEG1 decoder to extract and decode the motion vectors. The choice came down to a Java decoder and a C decoder. Two issues had to be taken into account when choosing the decoder: how fast the decoder could run, and how easily it could be modified. The Java decoder was chosen because, although the MPEG bit stream is quite complicated, it is very well structured. Java's superior ability to deal with the complexity of the bit stream in an easy-to-follow manner outweighed the C decoder's superior processing time.

Using the decoder, the motion vectors were extracted and decoded. The decoder was modified to allow the subtraction of all the motion vectors in the present frame (display order) from all the motion vectors in the previous frame (display order). All the modifications were put in a separate class, meaning minimal alterations to the decoder's well-structured code. The creation of a separate class with all the new code is important because, at some time in the future, the MPEG2 standard may be used instead of the MPEG1 standard (the standard we are using at the moment). All the relevant code developed for the MPEG1 standard can then easily be taken and used for the MPEG2 standard.

On completion of the program, it was realised that finding the motion from one frame to another is not a simple matter of subtracting all the vectors in the present frame from all the vectors in the previous frame. A set of rules has to be followed.

The rules were developed in two stages. First, a general set of rules was written that only takes into account what type of frame (I, P or B) the vectors are in. Then, at a lower level, the macroblock types present in the frames were taken into consideration and a comprehensive set of rules was written. These rules give the true motion from frame to frame.

The next step in this project is to incorporate the rules into the program. Finally, to enhance the program's performance, some of the decoder's source code can be deleted: the code which deals with decoding the pixel coefficients is irrelevant, and the code used to display the video can also be omitted.

To conclude, on accomplishing the task presented in this project (extracting the motion vectors from the MPEG stream), it was discovered that more information is needed in order to achieve the ultimate goal of finding the motion of objects from one frame to another. This extra information has been identified, and a description given of how to use it to find the motion from frame to frame.

References
[1] http://www.compapp.dcu.ie/~asmeaton/Video-Proj-summary.html
[2] ISO/IEC 11172-2, Genève, 1993.
[3] K.R. Rao and J.J. Hwang, "Techniques & Standards For Image, Video & Audio Coding", Prentice Hall PTR, New Jersey, 1996.
[4] http://rnvs.informatik.tu-chemnitz.de/~ja/MPEG/MPEG_Play.html
[5] Aidan Totterdell, "An Algorithm for detecting and classifying scene breaks in an MPEG1 video bit stream", Dublin City University, 1998.


Appendix A

Code for the two methods, compute_motion_vector and motion_displacement [4]

private int motion_displacement(int motion_code, int PMD, int motion_r) {
    int dMD, MD;

    if (x_ward_f == 1 || motion_code == 0) {
        dMD = motion_code;
    }
    else {
        dMD = 1 + x_ward_f * (Math.abs(motion_code) - 1);
        dMD += motion_r;
        if (motion_code < 0) dMD = -dMD;
    }
    MD = PMD + dMD;
    if (MD > max) MD -= range;
    else if (MD < min) MD += range;
    return MD;
}

public void compute_motion_vector(int motion_horiz_x_code, int motion_verti_x_code,
                                  int motion_horiz_x_r, int motion_verti_x_r) {

    recon_right_x_prev = recon_right_x =
        motion_displacement(motion_horiz_x_code, recon_right_x_prev, motion_horiz_x_r);
    if (Full_pel_x_vector) recon_right_x <<= 1;

    recon_down_x_prev = recon_down_x =
        motion_displacement(motion_verti_x_code, recon_down_x_prev, motion_verti_x_r);
    if (Full_pel_x_vector) recon_down_x <<= 1;

    right_x = recon_right_x >> 1;
    down_x = recon_down_x >> 1;
    right_half_x = (recon_right_x & 0x1) != 0;
    down_half_x = (recon_down_x & 0x1) != 0;

    right_x_col = recon_right_x >> 2;
    down_x_col = recon_down_x >> 2;
    right_half_x_col = (recon_right_x & 0x2) != 0;
    down_half_x_col = (recon_down_x & 0x2) != 0;
}


MPEG_video

/* This is a skeleton structure of MPEG_video, just to document some of the     */
/* things that have been added in. Once the resolution of the video is known,   */
/* Array is called and the size of all the arrays can be set.                   */
/* If the frame is I or P type, the future vectors become the present vectors   */
/* in display order; if the frame is P type, any vectors present in the frame   */
/* are stored in future until its turn in the display order comes (when another */
/* I or P frame comes in).                                                      */
/* When compute_motion_vector is called some added information is passed to it: */
/* the macroblock's address (row and column), and the frame type if it is to    */
/* calculate forward motion vectors. If it is to calculate backward motion      */
/* vectors, the arbitrary value 4 (don't confuse this 4 with a D type frame) is */
/* passed, just to indicate the vectors are backward.                           */

import java.io.InputStream;
import java.applet.Applet;

public class MPEG_video implements Runnable {

    private Array VideoArray = new Array();

    MPEG_video() {}

    public void run() {
        mpeg_stream.next_start_code();
        do {
            Parse_sequence_header();
        }
        do {
            Parse_group_of_pictures();
        }
    }

    private void Parse_sequence_header() {
        Width = mpeg_stream.get_bits(12);
        Height = mpeg_stream.get_bits(12);
        mb_width = (Width + 15) / 16;
        mb_height = (Height + 15) / 16;
        VideoArray.setDimensions(mb_height, mb_width);
    }

    private void Parse_group_of_pictures() {
        do {
            VideoArray.pastEqualsPresent(); // Store vectors of previous frame
            VideoArray.resetPresent();      // All vectors are reset for the new frame
            Parse_picture();
            VideoArray.printArray(1);       // Optional
        }
    }

    private void Parse_picture() {
        if (Pic_Type == P_TYPE || Pic_Type == I_TYPE) {
            VideoArray.futureEqualsPresent(); // Take what is in future and put in present
            VideoArray.resetFuture();         // Reset future for new values
        }
        do {
            Parse_slice();
        }
    }

    private void Parse_slice() {
        do {
            Parse_macroblock();
        }
    }

    private void Parse_macroblock() {
        if (macro_block_motion_forward) {
            Forward.compute_motion_vector(motion_horiz_forw_code, motion_verti_forw_code,
                motion_horiz_forw_r, motion_verti_forw_r, mb_row, mb_column, Pic_Type);
        }
        if (macro_block_motion_backward) { // motion vector for backward prediction exists
            b = 4;
            Backward.compute_motion_vector(motion_horiz_back_code, motion_verti_back_code,
                motion_horiz_back_r, motion_verti_back_r, mb_row, mb_column, b);
        }
    }
}

motion_data

/* This is a skeleton of the class motion_data; there is very little added to  */
/* it. In the method compute_motion_vector some extra information is passed,   */
/* as was documented in MPEG_video. All this extra information is passed       */
/* straight to Array along with the values of the motion vectors               */
/* (in half pixels).                                                           */

public class motion_data {

    private Array MotionArray = new Array(); // Create instance of the class Array

    public void init() {}

    public void set_pic_data() {}

    public void reset_prev() {}

    /* The internal method "motion_displacement" computes the difference of */
    /* the actual motion vector with respect to the last motion vector.     */
    /* Refer to ISO 11172-2 to understand the coding of the motion          */
    /* displacement.                                                        */
    private int motion_displacement(int motion_code, int PMD, int motion_r) {
        int dMD, MD;

        if (x_ward_f == 1 || motion_code == 0) {
            dMD = motion_code;
        }
        else {
            dMD = 1 + x_ward_f * (Math.abs(motion_code) - 1);
            dMD += motion_r;
            if (motion_code < 0) dMD = -dMD;
        }
        MD = PMD + dMD;
        if (MD > max) MD -= range;
        else if (MD < min) MD += range;
        return MD;
    }

    /* The method "compute_motion_vector" computes the motion vector        */
    /* according to the values supplied by the "ScanThread". It uses the    */
    /* method "motion_displacement". The result is the motion vectors for   */
    /* the luminance and the chrominance blocks.                            */
    public void compute_motion_vector(int motion_horiz_x_code, int motion_verti_x_code,
            int motion_horiz_x_r, int motion_verti_x_r, int mr, int mc,
            int chooseArray) {

        recon_right_x_prev = recon_right_x =
            motion_displacement(motion_horiz_x_code, recon_right_x_prev, motion_horiz_x_r);
        if (Full_pel_x_vector) recon_right_x <<= 1;
        recon_down_x_prev = recon_down_x =
            motion_displacement(motion_verti_x_code, recon_down_x_prev, motion_verti_x_r);
        if (Full_pel_x_vector) recon_down_x <<= 1;

        /* The motion vectors (in half pixels) are sent to Array, along with */
        /* information on which array they are to go into.                   */
        MotionArray.fillArray(mr, mc, recon_right_x, recon_down_x, chooseArray);
    }

    public void get_area() {}

    public void copy_area() {}

    public void copy_unchanged() {}

    public void put_area() {}
}

Array

/* The class Array is used for the storage of the motion vectors.            */
/* Two instances of the class Array will be created. One, in the class       */
/* MPEG_video, is called VideoArray. This instance is used first to set the  */
/* size of the arrays, depending on the resolution of the video clip. It     */
/* also passes information regarding which arrays the motion vectors should  */
/* be in (past or present).                                                  */
/* The second instance, created in the class motion_data, is called          */
/* MotionArray. This instance passes the values of the motion vectors to the */
/* arrays, along with information regarding which array they go into         */
/* (futureRight, futureDown, presentForwardRight, presentForwardDown,        */
/* presentBackwardRight or presentBackwardDown).                             */

class Array {

    public Array() {}

    /* All arrays are declared static because we want both instances of the */
    /* class Array to be able to see them.                                  */
    static public int[][] futureRight;
    static public int[][] futureDown;
    static public int[][] presentForwardRight;
    static public int[][] presentForwardDown;
    static public int[][] presentBackwardRight;
    static public int[][] presentBackwardDown;
    static public int[][] pastForwardRight;
    static public int[][] pastForwardDown;
    static public int[][] pastBackwardRight;
    static public int[][] pastBackwardDown;

    /* Sets the dimensions of all the arrays. */
    public void setDimensions(int mb_h, int mb_w) {
        futureRight = new int[mb_h][mb_w];
        futureDown = new int[mb_h][mb_w];
        presentForwardRight = new int[mb_h][mb_w];
        presentForwardDown = new int[mb_h][mb_w];
        presentBackwardRight = new int[mb_h][mb_w];
        presentBackwardDown = new int[mb_h][mb_w];
        pastForwardRight = new int[mb_h][mb_w];
        pastForwardDown = new int[mb_h][mb_w];
        pastBackwardRight = new int[mb_h][mb_w];
        pastBackwardDown = new int[mb_h][mb_w];
    }

    /* fillArray takes the values from the method compute_motion_vector in  */
    /* the class motion_data and puts them in the appropriate array.        */
    /* Note all values are in half pixels.                                  */
    public void fillArray(int mr, int mc, int right, int down, int chooseArray) {
        if (chooseArray == 2) {
            futureRight[mr][mc] = right;
            futureDown[mr][mc] = down;
        }
        if (chooseArray == 3) {
            presentForwardRight[mr][mc] = right;
            presentForwardDown[mr][mc] = down;
        }
        if (chooseArray == 4) {
            presentBackwardRight[mr][mc] = right;
            presentBackwardDown[mr][mc] = down;
        }
    }

    /* This method is only used to print out the values. */
    public void printArray(int printWhich) {
        if (printWhich == 1) {
            for (int j = 0; j < futureDown.length; j++) {
                for (int i = 0; i < futureDown[j].length; i++) {
                    System.out.print("" + pastBackwardRight[j][i] + "\t");
                }
                System.out.print("\n");
            }
            System.out.print("\n");
        }
    }

    /* As each new frame comes in, all values have first to be set to zero. */
    public void resetPresent() {
        for (int j = 0; j < futureDown.length; j++) {
            for (int i = 0; i < futureDown[j].length; i++) {
                presentForwardRight[j][i] = 0;
                presentForwardDown[j][i] = 0;
                presentBackwardRight[j][i] = 0;
                presentBackwardDown[j][i] = 0;
            }
        }
    }

    /* When an I or P picture comes in we have to take all the motion vectors */
    /* that are in future and put them in present.                            */
    public void futureEqualsPresent() {
        for (int j = 0; j < futureDown.length; j++) {
            for (int i = 0; i < futureDown[j].length; i++) {
                presentForwardRight[j][i] = futureRight[j][i];
                presentForwardDown[j][i] = futureDown[j][i];
            }
        }
    }

    /* After all the values are taken out of future, future has to be reset */
    /* before any more values can be put in.                                */
    public void resetFuture() {
        for (int j = 0; j < futureDown.length; j++) {
            for (int i = 0; i < futureDown[j].length; i++) {
                futureRight[j][i] = 0;
                futureDown[j][i] = 0;
            }
        }
    }

    /* We need to store all the motion vectors from the previous frame so the */
    /* net movement from frame to frame can be calculated.                    */
    public void pastEqualsPresent() {
        for (int j = 0; j < futureDown.length; j++) {
            for (int i = 0; i < futureDown[j].length; i++) {
                pastForwardRight[j][i] = presentForwardRight[j][i];
                pastForwardDown[j][i] = presentForwardDown[j][i];
                pastBackwardRight[j][i] = presentBackwardRight[j][i];
                pastBackwardDown[j][i] = presentBackwardDown[j][i];
            }
        }
    }
}