
2004 IEEE International Conference on Multimedia and Expo (ICME)

A Statistical Approach for Object Motion Estimation with MPEG Motion Vectors

Xiaodong Yu, Ping Xue and Qi Tian

Nanyang Technological University, School of Electrical and Electronic Engineering, Singapore

Institute for Infocomm Research, Agency for Science, Technology and Research, Singapore

{exdyu, epxue}@ntu.edu.sg, [email protected] tar. du.sg

Abstract

In this paper we propose a statistical approach to estimate the object motion with MPEG motion vectors. A model with two normal distribution terms is applied to represent the simplified object motion. One term models the noises embedded in the motion vector field produced in the encoding stage and the other term models the randomness of the true object motion. Experiments with vehicle motion estimation from MPEG traffic video are used to evaluate the proposed algorithm. The influence of time window, frame size and reference frame distance is investigated. The vehicle speeds can be estimated with a high accuracy of 85%-92%.

1. Introduction

Object motion estimation is a classic problem in the computer vision field. In recent years, with the popularity of MPEG video, much research effort has been devoted to estimating object motion with MPEG motion vectors. Although the MPEG motion vector was originally designed to minimize the motion prediction error in coding, it also embeds rich motion information among frames [1]. Since motion vectors are readily available in MPEG streams, we need neither fully decode the compressed video stream nor calculate the optical flow, so a great deal of computation can be saved.

Motion-vector-based object motion estimation is composed of two components: motion segmentation and object tracking. It is assumed that objects are rigid or their parts are rigidly connected to one another and that objects have continuous motion [1]. Thus an object can be segmented from the background by clustering motion vectors according to their similarities in direction or amplitude [2,3,10]. In the next step, motion parameters are derived from the motion vectors associated with this object for tracking. Such algorithms are analogues of those in the optical flow field and they all rely on the success of moving object segmentation. However, the granularity of the motion vector field limits the performance of motion-vector-based object segmentation. To solve this problem, researchers have proposed several approaches. Eng and Ma [5] used unbiased fuzzy clustering to replace the well-known fuzzy c-means clustering. They found that this algorithm was sensitive to the existence of small motion vector clusters and resulted in accurate identification of small objects. Babu and Ramakrishnan [6] accumulated and interpolated motion vectors over a few frames to enrich the motion information. Nevertheless, these approaches are inefficient if the object is too small. For example, wherever two or more objects smaller than a macroblock contribute distinct motions within a macroblock, the encoded motion vector cannot represent the motion correctly [4], hence motion segmentation is infeasible. Furthermore, if an object is similar in size to one or two macroblocks, one or two motion vectors alone cannot provide sufficient information to distinguish object motion from noisy vectors, so it is still difficult to segment this object from the background. These problems motivate us to seek another way to estimate object motion with motion vectors. We argue that it is possible to extract some useful motion information at the macro level even when the objects are too small for each individual object's motion to be estimated, provided these objects follow some kind of common motion pattern.

In this paper we propose a statistical model to estimate the mean object motion with MPEG motion vectors under the stationary assumption. Two normal distribution terms are used to model the randomness of the object motion and the noises embedded in the motion vector field respectively. By applying the statistical analysis within a time window, we alleviate the granularity of the motion vector field at the cost of instantaneous motion information.

The rest of this paper is organized as follows. In Section 2, we formulate our research question and propose a statistical model. Then we present the test bed for the proposed model in Section 3. In Section 4 we present experimental results of the model and the influential factors presented in Section 2 with the test bed. Finally, a conclusion and a discussion of future work are given in Section 5.

2. Theoretical analysis

In this paper, we assume that the object motions are homogeneous in both the spatial and temporal domains, and we call this the stationary assumption. This assumption requires that object motions are similar to one another in terms of amplitude or direction and that their motions change slowly. In light of the difficulties of object segmentation with motion vectors discussed in Section 1, we aim to describe the objects' motion with a few statistical parameters within a short period rather than identifying the instantaneous motion of every single object. The period within which the stationary assumption is satisfied is defined as the time window. We define the displacement that is associated with object motion as the true object motion, the one that is not as noise, and the motion derived directly from the raw motion vector field, i.e., the true object motion plus noise, as the observed object motion:

$$X_i' = X_i + n_i, \tag{1}$$

where $X_i'$, $X_i$ and $n_i$ are variables of the observed object motion, the true object motion and noise respectively, and $i$ denotes the i-th sample, i.e., the i-th motion vector within a time window. The variables in the statistical model can be either amplitudes or directions of motion vectors. They can be extracted either directly from the motion vector field or from the motion vector field after some transformations, e.g., camera calibration, as long as the stationary assumption can be satisfied after such transformations.
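The notion of a time window can be illustrated with a minimal sketch that groups timestamped motion-vector samples into consecutive windows and summarizes each window by its mean. The function name `window_means`, the `(timestamp, value)` sample layout and the numeric values are assumptions made for illustration; they are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean


def window_means(samples, window_s):
    """Group (timestamp_s, value) samples into consecutive time windows
    of length window_s and return {window_index: mean value}."""
    buckets = defaultdict(list)
    for t, value in samples:
        # Samples with timestamps in [k * window_s, (k + 1) * window_s) share window k.
        buckets[int(t // window_s)].append(value)
    return {k: mean(v) for k, v in sorted(buckets.items())}


# Hypothetical observed amplitudes with timestamps in seconds.
samples = [(0.5, 4.0), (1.2, 6.0), (2.1, 5.0), (3.9, 7.0)]
result = window_means(samples, window_s=2.0)
print(result)
```

A longer window collects more samples per estimate, but the stationary assumption has to hold over the whole window.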

The statistical model for the true object motion is application-tailored. In this paper, we are interested in the amplitude of object motion. Under the stationary assumption, we can expect that within a time window, most true object motions should concentrate around a center value, i.e., the mean, and they should be symmetric about the center in a bell shape. This rational deduction coincides with the experimental observations (see Figure 2). Combining the rational deduction and the experimental observations, we propose to model the true object motion with a normal distribution in this paper. For the model of noise, we employ an additive, zero-mean, constant-variance normal distribution, which has been used to model the noise in optical flow [7]. Thus we have

$$X_i \sim N(\mu, \sigma_x^2), \qquad n_i \sim N(0, \sigma_n^2), \qquad X_i' \sim N(\mu, \sigma_x^2 + \sigma_n^2), \tag{2}$$

where $\mu$ and $\sigma_x^2$ are the mean and variance of the true object motion, and $\sigma_n^2$ is the variance of noise. The variance of $X_i'$ is the sum of the two when the noise is independent of the true motion. We approximate $\mu$ with the sample mean of the observed motion $\bar{X}'$ from $N$ samples in a time window $T$:

$$\mu \approx \bar{X}' = \frac{1}{N} \sum_{i=1}^{N} X_i'. \tag{3}$$
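The two-term model can be exercised numerically. The sketch below draws synthetic observed motions as the sum of a normally distributed true motion and zero-mean normal noise, then estimates the mean with the sample mean. All parameter values and function names are illustrative assumptions, and the noise is assumed independent of the true motion.

```python
import random


def simulate_observed(mu, sigma_x, sigma_n, n, seed=0):
    """Draw n observed motions X_i' = X_i + n_i with X_i ~ N(mu, sigma_x^2)
    and independent noise n_i ~ N(0, sigma_n^2)."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma_x) + rng.gauss(0.0, sigma_n) for _ in range(n)]


def sample_mean(values):
    """Sample mean of the observed motions within one time window."""
    return sum(values) / len(values)


# Hypothetical parameters: true mean 10, true-motion std 1, noise std 2.
observed = simulate_observed(mu=10.0, sigma_x=1.0, sigma_n=2.0, n=5000)
mu_hat = sample_mean(observed)
print(f"estimated mean: {mu_hat:.3f}")
```

With a few thousand samples the estimate should fall close to the assumed true mean even though each individual sample is dominated by noise.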

The mean of the true object motion is a parameter of interest to users in applications because it represents the dominant motion characteristics. It is desirable to improve its estimation accuracy. This can be achieved by either reducing the variance of the estimation error or improving the signal-to-noise ratio (SNR). The estimation error follows the normal distribution

$$(\bar{X}' - \mu) \sim N\!\left(0, \frac{\sigma_x^2 + \sigma_n^2}{N}\right), \tag{4}$$

and the SNR is given by

[Equation (5) is illegible in the source.]
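The variance of the estimation error in equation (4) can be checked with a small Monte Carlo experiment. The sketch below compares the empirical variance of the sample-mean error with $(\sigma_x^2 + \sigma_n^2)/N$ for a given window population $N$. The helper names, the trial count and the parameter values are assumptions for illustration only.

```python
import random
from statistics import pvariance


def predicted_error_variance(sigma_x, sigma_n, n):
    """Variance of the sample-mean error for n independent samples."""
    return (sigma_x ** 2 + sigma_n ** 2) / n


def empirical_error_variance(mu, sigma_x, sigma_n, n, trials=2000, seed=1):
    """Repeat the n-sample mean estimate many times and measure the
    variance of (sample mean - mu) across the trials."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        total = sum(rng.gauss(mu, sigma_x) + rng.gauss(0.0, sigma_n)
                    for _ in range(n))
        errors.append(total / n - mu)
    return pvariance(errors)


for n in (10, 40):
    pred = predicted_error_variance(1.0, 2.0, n)
    emp = empirical_error_variance(10.0, 1.0, 2.0, n)
    print(f"N={n}: predicted {pred:.4f}, empirical {emp:.4f}")
```

Quadrupling the number of samples in a window should cut the error variance to roughly a quarter, which is why a longer time window or a denser motion vector field helps.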

Now let us characteristically analyze the influential factors in



We present a case study for the traffic monitoring application in this section and use it to test the proposed model and the influential factors discussed above. It is extended from our previous work [8]. In this application, we estimate vehicle speed with motion vectors in MPEG traffic video. The traffic video is collected from a Skycam, i.e., a camera mounted at a high position with a much wider view. Figure 1 shows a sample image of such video with the motion vectors. Most of the vehicles in the traffic video are smaller than a single macroblock and there are only one or two motion vectors associated with each vehicle. Hence, this is a good example to show the advantage of the proposed method over conventional clustering-based counterparts. Within a short period of time, the amplitudes and the directions of vehicle speeds should be similar and they will not change significantly. Thus, the stationary assumption is satisfied easily in this application.

Due to the perspective effect, motion vectors from different vehicles with similar speed may present different amplitudes. Hence camera calibration is employed to obtain the displacement of the vehicles in the ground plane from MPEG motion vectors. In this way, the variable of the object motion in the statistical model and equations (1)-(6) is the mapped motion vector, or equivalently

Figure 1. Sample image from Skycam and motion vectors (scaled by 10).
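To make the perspective effect concrete, the following sketch maps both ends of an image-plane motion vector to the ground plane and measures the resulting displacement. It assumes the calibration can be expressed as a 3x3 planar homography `H`. The paper's actual calibration procedure is not described in the surviving text, so the matrix, the function names and the numbers are purely hypothetical.

```python
import math


def apply_homography(H, x, y):
    """Map image point (x, y) to ground-plane coordinates with a 3x3 homography H."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


def ground_displacement(H, x, y, dx, dy):
    """Ground-plane length of a motion vector (dx, dy) anchored at image point (x, y)."""
    gx0, gy0 = apply_homography(H, x, y)
    gx1, gy1 = apply_homography(H, x + dx, y + dy)
    return math.hypot(gx1 - gx0, gy1 - gy0)


# Hypothetical homography with a perspective term in the last row.
H = [[0.1, 0.0, 0.0],
     [0.0, 0.1, 0.0],
     [0.0, -0.002, 1.0]]

# The same 4-pixel horizontal vector at two different image rows.
at_row_100 = ground_displacement(H, 100, 100, 4, 0)
at_row_300 = ground_displacement(H, 100, 300, 4, 0)
print(at_row_100, at_row_300)
```

Under this assumed homography the identical image vector corresponds to different ground displacements depending on where it lies in the frame, which is why the mapped vector, not the raw one, is the variable fed to the statistical model.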

4. Experimental results

We test the proposed model and the impact of the influential factors with the test bed described in Section 3. The test videos are two MPEG videos collected from two Skycams respectively. Each of them is 5 minutes long and includes 6 lanes, representing various traffic conditions at a certain place. They are digitized by an MPEG card in MPEG-1 format at resolution 352x288, frame rate 25 fps, reference frame distance $D_f = 3$ and constant bitrate 1150 kbps. The variable of object motion is the speed of the vehicle in this case study. The mean speed of each lane is calculated and compared with the ground truth independently. Ground truth is obtained manually at 2-second intervals.
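The conversion from a calibrated motion vector to a speed can be sketched as follows. The sketch assumes that each motion vector describes the displacement over $D_f$ frame intervals (as for a forward-predicted frame whose reference lies $D_f$ frames earlier) and that the ground-plane displacement is already in metres. The function names and sample values are illustrative and not from the paper.

```python
from statistics import mean


def speed_kmh(displacement_m, ref_frame_distance=3, fps=25.0):
    """Convert a ground-plane displacement spanning ref_frame_distance
    frames into a speed in km/h."""
    elapsed_s = ref_frame_distance / fps  # time covered by one motion vector
    return displacement_m / elapsed_s * 3.6


def lane_mean_speed(displacements_m, ref_frame_distance=3, fps=25.0):
    """Mean speed of all motion-vector samples collected for one lane
    within one time window."""
    return mean(speed_kmh(d, ref_frame_distance, fps) for d in displacements_m)


# Hypothetical displacements (metres) for one lane in one time window.
print(lane_mean_speed([2.3, 2.4, 2.5]))
```

With the settings reported above (25 fps, $D_f = 3$), one motion vector covers 0.12 s, so a 2.4 m displacement corresponds to about 72 km/h.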

First of all, we test the normal approximation of object motion. Figure 2.a shows the distribution of the estimated speed within a lane for a 30-second test sequence. It is bell-shaped and symmetric about its mean. The normal fits demonstrate that the normal distribution properly approximates the speed distributions. To test our assumption objectively, we conduct the Ryan-Joiner (similar to Shapiro-Wilk) normality test. The plots and the test results are presented in Figure 2.b. As the plot shows, the ordered observed values and the respective cumulative frequencies lie almost along a straight line, so it is safe to assume that the vehicle speed is normally distributed.
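A Ryan-Joiner style check can be sketched with the standard library alone. The statistic is the correlation between the ordered sample and approximate normal scores. The plotting positions $(i - 3/8)/(n + 1/4)$ used here are the commonly cited choice and are an assumption about the exact variant the authors used. Turning the statistic into a formal accept/reject decision requires tabulated critical values, which this sketch does not include.

```python
import math
import random
from statistics import NormalDist


def ryan_joiner_statistic(values):
    """Correlation between the ordered sample and approximate normal scores.
    Values close to 1 indicate a nearly straight normal probability plot."""
    n = len(values)
    ordered = sorted(values)
    std_normal = NormalDist()
    scores = [std_normal.inv_cdf((i - 0.375) / (n + 0.25))
              for i in range(1, n + 1)]
    mean_x = sum(ordered) / n
    mean_b = sum(scores) / n
    cov = sum((x - mean_x) * (b - mean_b) for x, b in zip(ordered, scores))
    var_x = sum((x - mean_x) ** 2 for x in ordered)
    var_b = sum((b - mean_b) ** 2 for b in scores)
    return cov / math.sqrt(var_x * var_b)


rng = random.Random(0)
normal_like = [rng.gauss(60.0, 8.0) for _ in range(500)]  # synthetic speeds
skewed = [rng.expovariate(1.0) for _ in range(500)]       # clearly non-normal
r_normal = ryan_joiner_statistic(normal_like)
r_skewed = ryan_joiner_statistic(skewed)
print(r_normal, r_skewed)
```

On synthetic data the statistic stays very close to 1 for the normal sample and drops noticeably for the skewed one, mirroring the straight-line behaviour described for Figure 2.b.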


Figure 3. The impact of time window on speed estimation. (a) and (b) show the standard deviation of the speed estimation error and the mean accuracy of speed estimation at different time windows, respectively.

Finally, we test the influence of frame size and reference frame distance on speed estimation. We re-encode the two test videos at two frame sizes, CIF and QCIF, and three reference frame distances, $D_f = 3$ (normal), $D_f = 6$ and $D_f = 12$. Then we evaluate the test bed with the re-encoded videos. Figure 4.a illustrates the mean amplitude of motion vectors in the test videos. We find that the mean motion vectors increase approximately linearly with the reference frame distance and the square root of frame size. With a larger reference frame distance or a larger frame size, motion vectors become longer. Consequently, the influence of half-pixel error is suppressed and the accuracy of speed estimation is improved. Note that the mean motion vector with $D_f = 12$ in CIF size is slightly smaller than double the mean motion vector with $D_f = 12$ in QCIF size and the one with $D_f = 6$ in CIF size. Meanwhile, the mean accuracy of speed estimation with $D_f = 12$ in CIF size deteriorates as compared with the others in CIF size. A similar observation is also reported in Gonzales, Yeo and Kuo's experiments [9]. The reason is that with a larger reference frame distance, a larger motion vector is selected, thus more bits are needed to code this motion vector. When these additional bits are not sufficiently compensated by the corresponding savings in coding smaller macroblock residual errors, a sub-optimal, shorter motion vector is used instead of the optimal, longer one. This is the inherent limitation of the MPEG motion vector based approach.
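The effect of half-pixel precision on vectors of different lengths can be illustrated with simple arithmetic. The sketch assumes that each motion vector component is rounded to the nearest half pixel, so the worst-case rounding error per component is a quarter pixel. The function name and the example lengths are illustrative.

```python
def max_relative_rounding_error(component_px, precision_px=0.5):
    """Worst-case relative error of one motion vector component when the
    true displacement is rounded to the nearest multiple of precision_px."""
    return (precision_px / 2.0) / component_px


# Hypothetical component lengths in pixels, e.g. a short and a long vector.
for length in (2.0, 4.0, 8.0):
    print(length, max_relative_rounding_error(length))
```

Doubling the vector length halves the worst-case relative error, which is consistent with the improvement observed for larger frame sizes and reference frame distances as long as the encoder still selects the longer vector.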

Figure 4. The mean motion vectors (a) and the mean accuracies of speed estimation (b) for test videos in different frame sizes and reference frame distances. T = 60 s.

5. Conclusion and future work

In this paper, an algorithm that estimates object motion from MPEG compressed video with a statistical model was presented. This algorithm complements the existing clustering-based approaches in small-object scenarios where the latter are inefficient. Theoretical analysis and experimental evaluation were conducted to investigate the influential factors of the proposed model. Unlike clustering-based object motion estimation techniques, we did not attempt to segment and track every object. Instead, we tried to estimate a few statistical motion parameters for all objects within a time window. In this way, motion estimation accuracy can be substantially improved with proper spatial and temporal processing (85%-92% in our test bed) and the granularity of the motion vector field is alleviated.

Although the test bed in this paper is a typical traffic monitoring application, the proposed method is applicable in other scenarios where the objects are small while moving in a common pattern. It can also be extended to a moving camera by compensating for the camera motion. These extensions are included in our future work.

References

[1] Nevenka Dimitrova and Forouzan Golshani, "Motion Recovery for Video Content Classification", ACM Transactions on Information Systems, Vol. 13, No. 4, October 1995, pp. 408-439.

[2] F. Bartolini, V. Cappellini, and C. Giani, "Motion Estimation and Tracking for Urban Traffic Monitoring", Proceedings of IEEE International Conference on Image Processing, 1996, pp. 87-90.

[3] Heitou Zen, Tameharu Hasegawa, Shinji Ozawa, "Moving Object Detection from MPEG Coded Picture", Proceedings of IEEE International Conference on Image Processing, vol. IV, pp. 25-29, Oct. 1999.

[4] Kyongil Yoon, Daniel DeMenthon, David Doermann, "Event Detection from MPEG Video in the Compressed Domain", International Conference on Pattern Recognition, pp. 1819-1825, Volume 1, Barcelona, Spain, September 03-08, 2000.

[5] How-Lung Eng, Kai-Kuang Ma, "Motion Trajectory Extraction Based on Macroblock Motion Vectors for Video Indexing", International Conference on Image Processing, pp. 284-288, 1999.

[6] R.V. Babu, K.R. Ramakrishnan, "Compressed Domain Motion Segmentation for Video Object Extraction", 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 4, 2002, pp. 3788-3791.

[7] Christophe Garcia, Georgios Tziritas, "Optimal Projection of 2-D Displacements for 3-D Translational Motion Estimation", Image and Vision Computing, Vol. 20, pp. 793-804, 2002.

[8] Xiaodong Yu, Lingyu Duan, Qi Tian, "Highway Traffic Information Extraction from Skycam MPEG Video", Proceedings of IEEE 5th Intelligent Transportation System Conference, pp. 37-42, Sep. 3-6, 2002.

[9] C.A. Gonzales, H. Yeo and C.J. Kuo, "Requirements for Motion Estimation Search Range in MPEG-2 Coded Video", IBM Journal of Research and Development, Vol. 43, No. 4, July 1999.

[10] Jim Wang and Ze-Nian Li, "Kernel-based Multiple Cue Algorithm for Object Segmentation", IS&T/SPIE Symp. on Electronic Image and Video Communications and Processing, 2000.
