
DOI: 10.1109/ICME.2004.1394243


    2004 IEEE International Conference on Multimedia and Expo (ICME)

    A Statistical Approach for Object Motion Estimation with MPEG Motion Vectors

    Xiaodong Yu, Ping Xue and Qi Tian

    Nanyang Technological University, School of Electrical and Electronic Engineering, Singapore

    Institute for Infocomm Research, Agency for Science, Technology and Research, Singapore

    {exdyu, epxue}@ntu.edu.sg, [email protected] tar. du.sg

    Abstract

    In this paper we propose a statistical approach to estimate object motion
    with MPEG motion vectors. A model with two normal distribution terms is
    applied to represent the simplified object motion. One term models the
    noise embedded in the motion vector field produced in the encoding stage,
    and the other term models the randomness of the true object motion.
    Experiments with vehicle motion estimation from MPEG traffic video are
    used to evaluate the proposed algorithm. The influence of time window,
    frame size and reference frame distance is investigated. The vehicle
    speeds can be estimated with a high accuracy of 85%-92%.

    1. Introduction

    Object motion estimation is a classic problem in the computer vision
    field. In recent years, with the popularity of MPEG videos, much research
    effort has been devoted to estimating object motion with MPEG motion
    vectors. Although the MPEG motion vector was originally designed to
    minimize the motion prediction error in coding, it also embeds rich
    motion information among frames [1]. Since motion vectors are readily
    available in MPEG streams, we need neither fully decode the compressed
    video stream nor calculate the optical flow, so considerable computation
    can be saved.

    Motion-vector-based object motion estimation is composed of two
    components: motion segmentation and object tracking. It is assumed that
    objects are rigid, or that their parts are rigidly connected to one
    another, and that objects have continuous motion [1]. Thus an object can
    be segmented from the background by clustering motion vectors according
    to their similarities in direction or amplitude [2,3,10]. In the next
    step, motion parameters are derived from the motion vectors associated
    with this object for tracking. Such algorithms are analogues of those
    used in the optical flow field, and they all rely on successful
    segmentation of the moving object. However, the granularity of the motion
    vector field limits the performance of motion-vector-based object
    segmentation. Several approaches have been proposed to solve this
    problem. Eng and Ma [5] used unbiased fuzzy clustering in place of the
    well-known fuzzy c-means clustering. They found that this algorithm was
    sensitive to the existence of small motion vector clusters and resulted
    in accurate identification of small objects. Babu and Ramakrishnan [6]
    accumulated and interpolated motion vectors over a few frames to enrich
    the motion information. Nevertheless, these approaches are ineffective if
    the object is too small. For example, wherever two or more objects
    smaller than a macroblock contribute distinct motions within a
    macroblock, the encoded motion vector cannot represent the motion
    correctly [4], hence motion segmentation is infeasible. Furthermore, if
    an object is similar in size to one or two macroblocks, only one or two
    motion vectors are available. These cannot provide sufficient information
    to distinguish object motion from noisy vectors, so it is still difficult
    to segment this object from the background. These problems motivate us to
    seek another way to estimate object motion with motion vectors. We argue
    that it is possible to extract useful motion information at the macro
    level even when the objects are too small for their individual motions to
    be estimated, provided that these objects follow some common motion
    pattern.

    In this paper we propose a statistical model to estimate the mean object
    motion with MPEG motion vectors under the stationary assumption. Two
    normal distribution terms are used to model the randomness of the object
    motion and the noise embedded in the motion vector field respectively.
    By applying the statistical analysis within a time window, we alleviate
    the granularity of the motion vector field at the cost of instantaneous
    motion information.

    The rest of this paper is organized as follows. In Section 2, we
    formulate our research question and propose a statistical model. Then we
    present the test bed for the proposed model in Section 3. In Section 4 we
    present experimental results for the model and the influential factors
    presented in Section 2, obtained with the test bed. Finally, a conclusion
    and a discussion of future work are given in Section 5.

    2. Theoretical analysis

    In this paper, we assume that the object motions are homogeneous in both
    the spatial and temporal domains; we call this the stationary assumption.
    This assumption requires that object motions are similar to one another
    in terms of

    0-7803-8603-5/04/$20.00 ©2004 IEEE


    amplitude or direction and their motions change slowly. In light of the
    difficulties of object segmentation with motion vectors discussed in
    Section 1, we aim to describe the objects' motion with a few statistical
    parameters within a short period rather than identifying the
    instantaneous motion of every single object. The period within which the
    stationary assumption is satisfied is defined as the time window. We
    define the displacement that is associated with object motion as the true
    object motion, the displacement that is not as noise, and the motion
    derived directly from the raw motion vector field, i.e., the true object
    motion plus noise, as the observed object motion:

    $$X_i' = X_i + n_i, \tag{1}$$

    where $X_i'$, $X_i$ and $n_i$ are the variables of the observed object
    motion, the true object motion and the noise respectively, and $i$
    denotes the i-th sample, i.e., the i-th motion vector within a time
    window. The variables in the statistical model can be either amplitudes
    or directions of motion vectors. They can be extracted either directly
    from the motion vector field or from the motion vector field after some
    transformation, e.g., camera calibration, as long as the stationary
    assumption is still satisfied after such a transformation.

    The statistical model for the true object motion is application-tailored.
    In this paper, we are interested in the amplitude of object motion. Under
    the stationary assumption, we can expect that within a time window most
    true object motions concentrate around a center value, i.e., the mean,
    and that they are symmetric about the center in a bell shape. This
    deduction coincides with the experimental observations (see Figure 2).
    Combining the deduction and the experimental observations, we propose to
    model the true object motion with a normal distribution in this paper.
    For the noise model, we employ an additive zero-mean, constant-variance
    normal distribution, which has been used to model the noise in optical
    flow [7]. Thus we have

    $$X_i \sim N(\mu, \sigma_x^2), \quad n_i \sim N(0, \sigma_n^2), \quad X_i' \sim N(\mu, \sigma_x^2 + \sigma_n^2), \tag{2}$$

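    where $\mu$ and $\sigma_x^2$ are the mean and variance of the true object
    motion, and $\sigma_n^2$ is the variance of the noise. We approximate
    $\mu$ with the sample mean $\bar{X'}$ of the observed motion from $N$
    samples in a time window $T$:

    $$\bar{X'} = \frac{1}{N} \sum_{i=1}^{N} X_i'. \tag{3}$$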
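The two-term model and the sample-mean estimate can be illustrated with a small simulation. This is only a sketch under assumed values: the function name `simulate_observed` and the parameter values (a mean of 60, standard deviations of 5 and 3, 2000 samples) are hypothetical and are not taken from the paper.

```python
import random
import statistics


def simulate_observed(mu, sigma_x, sigma_n, n, seed=0):
    """Draw n observed motions X'_i = X_i + n_i under the two-term normal model.

    All parameter values passed in are illustrative assumptions.
    """
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        true_motion = rng.gauss(mu, sigma_x)  # X_i ~ N(mu, sigma_x^2)
        noise = rng.gauss(0.0, sigma_n)       # n_i ~ N(0, sigma_n^2)
        observed.append(true_motion + noise)  # X'_i = X_i + n_i
    return observed


# Hypothetical values standing in for the motion vectors of one time window.
samples = simulate_observed(mu=60.0, sigma_x=5.0, sigma_n=3.0, n=2000)

# The sample mean approximates mu; the sample variance approximates
# sigma_x^2 + sigma_n^2 (here 25 + 9 = 34).
mu_hat = statistics.fmean(samples)
var_hat = statistics.variance(samples)
print(f"estimated mean: {mu_hat:.2f}, estimated variance: {var_hat:.2f}")
```

With more samples in the window, the estimated mean tends to settle closer to the assumed true mean, which is the behaviour the model relies on.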

    The mean of the true object motion is a parameter of interest to users in
    applications because it represents the dominant motion characteristics,
    so it is desirable to improve its estimation accuracy. This can be
    achieved by either reducing the variance of the estimation error or
    improving the signal-to-noise ratio (SNR). The estimation error follows
    the normal distribution

    $$(\bar{X'} - \mu) \sim N\left(0, \frac{\sigma_x^2 + \sigma_n^2}{N}\right), \tag{4}$$
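The variance of the estimation error of the sample mean can be checked with a short derivation. It assumes, as an added assumption that the text does not state explicitly, that the $N$ observed samples are mutually independent and that the true motion $X_i$ and the noise $n_i$ are independent of each other:

```latex
\begin{align*}
\operatorname{Var}\left(\bar{X'} - \mu\right)
  &= \operatorname{Var}\left(\frac{1}{N}\sum_{i=1}^{N} X_i'\right)
   = \frac{1}{N^2}\sum_{i=1}^{N}\operatorname{Var}\left(X_i + n_i\right) \\
  &= \frac{1}{N^2}\cdot N\left(\sigma_x^2 + \sigma_n^2\right)
   = \frac{\sigma_x^2 + \sigma_n^2}{N}.
\end{align*}
```

Under these assumptions, doubling the number of motion vectors in the time window halves the error variance.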

    and the SNR is given by

    (5)

    Now let us characteristically analyze the influential factors in


    We present a case study of a traffic monitoring application in this
    section and use it to test the proposed model and the influential factors
    discussed above. It is extended from our previous work [8]. In this
    application, we estimate vehicle speed from motion vectors in MPEG
    traffic video. The traffic video is collected from a Skycam, i.e., a
    camera mounted high up with a very wide view. Figure 1 shows a sample
    image from such a video with its motion vectors. Most of the vehicles in
    the traffic video are smaller than a single macroblock, and only one or
    two motion vectors are associated with each vehicle. Hence, this is a
    good example to show the advantage of the proposed method over
    conventional clustering-based counterparts. Within a short period of
    time, the amplitudes and directions of the vehicles' speeds should be
    similar and will not change significantly. Thus, the stationary
    assumption is easily satisfied in this application.

    Due to the perspective effect, motion vectors from different vehicles
    with similar speeds may have different amplitudes. Hence camera
    calibration is employed to obtain the displacement of the vehicles in the
    ground plane from the MPEG motion vectors. In this way, the variable of
    the object motion in the statistical model and equations (1)-(6) is the
    mapped motion vector, or equivalently

    Figure 1. Sample image from Skycam and motion vectors (scaled by 10).
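The mapping from an image-plane motion vector to a ground-plane speed can be sketched as follows. This is a minimal illustration, not the calibration procedure used in the paper: it assumes the calibration is available as a 3x3 image-to-ground homography, that ground coordinates are in metres, and that the motion vector spans exactly the reference frame distance. The matrix `H_EXAMPLE` and the function names are hypothetical.

```python
import math


def to_ground(h, x, y):
    """Map an image point (pixels) to ground-plane coordinates via homography h."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return u / w, v / w


def vehicle_speed_kmh(h, block_xy, motion_vector, fps=25.0, ref_distance=3):
    """Speed implied by one motion vector, assuming it spans ref_distance frames."""
    x, y = block_xy
    dx, dy = motion_vector
    gx0, gy0 = to_ground(h, x, y)
    gx1, gy1 = to_ground(h, x + dx, y + dy)
    metres = math.hypot(gx1 - gx0, gy1 - gy0)  # ground-plane displacement
    seconds = ref_distance / fps               # time covered by the vector
    return metres / seconds * 3.6              # m/s to km/h


# Hypothetical calibration: a pure scale of 0.5 m per pixel, no perspective.
H_EXAMPLE = [[0.5, 0.0, 0.0],
             [0.0, 0.5, 0.0],
             [0.0, 0.0, 1.0]]

speed = vehicle_speed_kmh(H_EXAMPLE, block_xy=(100, 100), motion_vector=(6, 8))
print(f"{speed:.1f} km/h")
```

A real Skycam homography would have non-trivial perspective terms in its third row, which is what makes equal ground speeds appear as different vector amplitudes across the image.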

    4. Experimental results

    We test the proposed model and the impact of the influential factors with
    the test bed described in Section 3. The test videos are two MPEG videos
    collected from two Skycams respectively. Each of them is 5 minutes long
    and covers 6 lanes, representing various traffic conditions at a certain
    place. They are digitized by an MPEG card in MPEG-1 format at a
    resolution of 352x288, a frame rate of 25 fps, a reference frame distance
    $D_f = 3$ and a constant bitrate of 1150 kbps. The variable of object
    motion in this case study is the speed of the vehicle. The mean speed of
    each lane is calculated and compared with the ground truth independently.
    The ground truth is obtained manually at 2-second intervals.

    First of all, we test the normal approximation of object motion. Figure
    2.a shows the distributions of the estimated speed within a lane for a
    30-second test sequence. They are bell-shaped and symmetric about their
    means. The normal fits demonstrate that the normal distribution properly
    approximates the speed distributions. To test our assumption objectively,
    we conduct the Ryan-Joiner (similar to Shapiro-Wilk) normality test. The
    plots and the test results are presented in Figure 2.b. As the plot
    shows, the ordered observed values and the respective cumulative
    frequencies lie almost along a straight line, so it is safe to assume
    that the vehicle speed is normally distributed.
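The straight-line check behind the Ryan-Joiner test can be sketched as a correlation between the ordered samples and their expected normal scores. This is a minimal sketch, assuming the commonly described form of the statistic with plotting positions (i - 3/8)/(n + 1/4); it computes only the correlation coefficient and does not include the critical values needed for a formal accept or reject decision. The function name and the sample values are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def ryan_joiner_r(samples):
    """Correlation between ordered samples and approximate normal scores.

    Values close to 1 indicate that the normal probability plot is nearly
    a straight line.
    """
    n = len(samples)
    if n < 3:
        raise ValueError("need at least 3 samples")
    ordered = sorted(samples)
    std_normal = NormalDist()
    # Approximate expected normal order statistics (assumed plotting positions).
    scores = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mean_x, mean_s = fmean(ordered), fmean(scores)
    cov = sum((x - mean_x) * (s - mean_s) for x, s in zip(ordered, scores))
    var_x = sum((x - mean_x) ** 2 for x in ordered)
    var_s = sum((s - mean_s) ** 2 for s in scores)
    if var_x == 0:
        raise ValueError("samples are all identical")
    return cov / math.sqrt(var_x * var_s)


# Hypothetical speed samples (km/h) from one lane within one time window.
speeds = [52.1, 55.4, 57.0, 58.3, 59.2, 60.0, 60.9, 61.8, 63.1, 64.7, 67.9]
print(f"Ryan-Joiner r = {ryan_joiner_r(speeds):.3f}")
```

A strongly skewed sample yields a visibly smaller coefficient than a symmetric, bell-shaped one, which is why the statistic is a convenient numerical companion to the probability plot.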


    Figure 3. The impact of time window on speed estimation. (a) and (b) show
    the standard deviation of the speed estimation error and the mean
    accuracy of speed estimation at different time windows, respectively.

    Finally, we test the influence of frame size and reference frame distance
    on speed estimation. We re-encode the two test videos at two frame sizes,
    CIF and QCIF, and three reference frame distances, $D_f = 3$ (normal),
    $D_f = 6$ and $D_f = 12$. Then we evaluate the test bed with the
    re-encoded videos. Figure 4.a illustrates the mean amplitude of motion
    vectors in the test videos. We find that the mean motion vector amplitude
    increases approximately linearly with the reference frame distance and
    with the square root of the frame size. With a larger reference frame
    distance or a larger frame size, motion vectors become longer.
    Consequently, the influence of half-pixel error is suppressed and the
    accuracy of speed estimation improves. Note that the mean motion vector
    with $D_f = 12$ in CIF size is slightly smaller than double the mean
    motion vector with $D_f = 12$ in QCIF size and double the one with
    $D_f = 6$ in CIF size. Meanwhile, the mean accuracy of speed estimation
    with $D_f = 12$ in CIF size deteriorates compared with the others in CIF
    size. A similar observation is reported in the experiments of Gonzales,
    Yeo and Kuo [9]. The reason is that with a larger reference frame
    distance, a larger motion vector is selected and thus more bits are
    needed to code it. When these additional bits are not sufficiently
    compensated by the corresponding savings in coding smaller macroblock
    residual errors, a sub-optimal, shorter motion vector is used instead of
    the optimal, longer one. This is an inherent limitation of the MPEG
    motion-vector-based approach.
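The suppression of half-pixel error by longer motion vectors can be illustrated with a small calculation. This sketch rests on assumptions that are not spelled out in the text: motion vectors are stored on a half-pixel grid, so the worst-case rounding error is a quarter pixel, and the vector length scales linearly with the reference frame distance and with the linear frame scale (the square root of the relative frame area). The function name and numbers are hypothetical.

```python
def relative_rounding_error(pixels_per_frame, ref_distance, frame_scale=1.0, step=0.5):
    """Worst-case rounding error relative to the motion vector length.

    pixels_per_frame: assumed true displacement per frame at frame_scale 1.0
    ref_distance:     reference frame distance D_f (frames spanned by the vector)
    frame_scale:      linear frame scale, e.g. 1.0 for CIF and 0.5 for QCIF
    step:             assumed quantization step of the motion vector, in pixels
    """
    length = pixels_per_frame * ref_distance * frame_scale
    if length <= 0:
        raise ValueError("motion vector length must be positive")
    return (step / 2.0) / length


# Hypothetical vehicle moving 1 pixel per frame in the CIF video.
for d_f in (3, 6, 12):
    cif = relative_rounding_error(1.0, d_f, frame_scale=1.0)
    qcif = relative_rounding_error(1.0, d_f, frame_scale=0.5)
    print(f"D_f={d_f:2d}  CIF: {cif:.3f}  QCIF: {qcif:.3f}")
```

The calculation covers only the rounding effect. It does not model the encoder's rate-distortion trade-off, which is the reason given above for the weaker result at the largest reference frame distance.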

    Figure 4. The mean motion vectors (a) and the mean accuracies of speed
    estimation (b) for test videos in different frame sizes and reference
    frame distances. T = 60 s.

    5. Conclusion and future work

    In this paper, an algorithm that estimates object motion from MPEG
    compressed video with a statistical model was presented. This algorithm
    complements the existing clustering-based approaches in small-object
    scenarios where the latter are ineffective. Theoretical analysis and
    experimental evaluation were conducted to investigate the influential
    factors of the proposed model. Unlike clustering-based object motion
    estimation techniques, we did not attempt to segment and track every
    object. Instead, we estimated a few statistical motion parameters for all
    objects within a time window. In this way, motion estimation accuracy can
    be substantially improved with proper spatial and temporal processing
    (85%-92% in our test bed) and the granularity of the motion vector field
    is alleviated.

    Although the test bed in this paper is a typical traffic monitoring
    application, the approach is applicable in other scenarios where the
    objects are small and move in a common pattern. It can also be extended
    to a moving camera by compensating for the camera motion. These
    extensions are included in our future work.

    References:

    [1] Nevenka Dimitrova and Forouzan Golshani, Motion Recovery for Video
    Content Classification, ACM Transactions on Information Systems, Vol. 13,
    No. 4, October 1995, pp. 408-439.

    [2] F. Bartolini, V. Cappellini, and C. Giani, Motion Estimation and
    Tracking for Urban Traffic Monitoring, Proceedings of IEEE International
    Conference on Image Processing, 1996, pp. 87-90.

    [3] Heitou Zen, Tameharu Hasegawa, Shinji Ozawa, Moving Object Detection
    from MPEG Coded Picture, Proceedings of IEEE International Conference on
    Image Processing, Vol. IV, pp. 25-29, Oct. 1999.

    [4] Kyongil Yoon, Daniel DeMenthon, David Doermann, Event Detection from
    MPEG Video in the Compressed Domain, International Conference on Pattern
    Recognition, Volume 1, pp. 1819-1825, Barcelona, Spain, September 3-8,
    2000.

    [5] How-Lung Eng, Kai-Kuang Ma, Motion Trajectory Extraction Based on
    Macroblock Motion Vectors for Video Indexing, International Conference on
    Image Processing, pp. 284-288, 1999.

    [6] R.V. Babu, K.R. Ramakrishnan, Compressed Domain Motion Segmentation
    for Video Object Extraction, 2002 IEEE International Conference on
    Acoustics, Speech, and Signal Processing, Volume 4, 2002, pp. 3788-3791.

    [7] Christophe Garcia, Georgios Tziritas, Optimal Projection of 2-D
    Displacements for 3-D Translational Motion Estimation, Image and Vision
    Computing, Vol. 20, pp. 793-804, 2002.

    [8] Xiaodong Yu, Lingyu Duan, Qi Tian, Highway Traffic Information
    Extraction from Skycam MPEG Video, Proceedings of IEEE 5th Intelligent
    Transportation System Conference, pp. 37-42, Sep. 3-6, 2002.

    [9] C.A. Gonzales, H. Yeo and C.J. Kuo, Requirements for Motion
    Estimation Search Range in MPEG-2 Coded Video, IBM Journal of Research
    and Development, Vol. 43, No. 4, July 1999.

    [10] J. Wang and Ze-Nian Li, Kernel-based Multiple Cue Algorithm for
    Object Segmentation, IS&T/SPIE Symp. on Electronic Image and Video
    Communications and Processing, 2000.
