
Pergamon

Pattern Recognition, Vol. 30, No. 4, pp. 607-625, 1997
© 1997 Pattern Recognition Society. Published by Elsevier Science Ltd
Printed in Great Britain. All rights reserved
0031-3203/97 $17.00+.00

PII: S0031-3203(96)00107-0

AUTOMATIC VIDEO INDEXING VIA OBJECT MOTION ANALYSIS

JONATHAN D. COURTNEY*

Texas Instruments Incorporated, 8330 LBJ Freeway, MS 8374, Dallas, Texas 75243, U.S.A.

(Received 12 June 1996; received for publication 30 July 1996)

*E-mail: [email protected]

Abstract - To assist human analysis of video data, a technique has been developed to perform automatic, content-based video indexing from object motion. Moving objects are detected in the video sequence using motion segmentation methods. By tracking individual objects through the segmented data, a symbolic representation of the video is generated in the form of a directed graph describing the objects and their movement. This graph is then annotated using a rule-based classification scheme to identify events of interest, e.g. appearance/disappearance, deposit/removal, entrance/exit, and motion/rest of objects. One may then use an index into the motion graph instead of the raw data to analyse the semantic content of the video. Application of this technique to surveillance video analysis is discussed. © 1997 Pattern Recognition Society. Published by Elsevier Science Ltd.

Video indexing    Object tracking    Motion analysis    Content-based retrieval

1. INTRODUCTION

Advances in multimedia technology, including commercial prospects for video-on-demand and digital library systems, have generated recent interest in content-based video analysis. Video data offers users of multimedia systems a wealth of information; however, it is not as readily manipulated as other data such as text. Raw video data has no immediate handles by which the multimedia system user may analyse its contents. By annotating video data with symbolic information describing the semantic content, one may facilitate analysis beyond simple serial playback.

To assist human analysis of video data, a technique has been developed to perform automatic, content-based video indexing from object motion. Moving objects are detected in the video sequence using motion segmentation methods. By tracking individual objects through the segmented data, a symbolic representation of the video is generated in the form of a directed graph describing the objects and their movement. This graph is then annotated using a rule-based classification scheme to identify events of interest, e.g. appearance/disappearance, deposit/removal, entrance/exit, and motion/rest of objects. One may then use an index into the motion graph instead of the raw data to analyse the semantic content of the video.

We have developed a system that demonstrates this indexing technique in assisted analysis of surveillance video data. The Automatic Video Indexing (AVI) system allows the user to select a video sequence of interest, play it forward or backward and stop at individual frames. Furthermore, the user may specify queries on video sequences and jump to events of interest to avoid tedious serial playback. For example, the user may select a person in a video sequence and specify the query "show me all objects that this person removed from the scene". In response, the AVI system assembles a set of video clips highlighting the query results. The user may select a clip of interest and proceed with further video analysis using queries or playback as before.

The remainder of this paper is organized as follows: Section 2 discusses content-based video analysis. Section 3 presents a video indexing technique based on object motion analysis. Section 4 describes a system which implements this video indexing technique for scene monitoring applications. Section 5 presents experimental results using the system. Section 6 concludes the paper.

    2. CONTENT-BASED VIDEO ANALYSIS

Video data poses unique problems for multimedia information systems that text does not. Textual data is a symbolic abstraction of the spoken word that is usually generated and structured by humans. Video, on the other hand, is a direct recording of visual information. In its raw and most common form, video data is subject to little human-imposed structure, and thus has no immediate handles by which the multimedia system user may analyse its contents.

For example, consider an online movie screenplay (textual data) and a digitized movie (video and audio data). If one were analysing the screenplay and interested in searching for instances of the word "horse" in the text, various text searching algorithms could be employed to locate every instance of this symbol as desired. Such analysis is common in online text databases. If, however, one were interested in searching for every scene in the digitized movie where a horse appeared, the task is much more difficult. Unless a human performs some sort of



If the region detected by the segmentation of image In is due to the motion of an object present in the reference image (i.e. due to "exposed background"), a high probability exists that the boundary of the segmented region will coincide with intensity edges detected in I0. If the region is due to the presence of a foreground object in the current image, a high probability exists that the region boundary will coincide with intensity edges in In. The test is implemented by applying an edge detection operator to the current and reference images and checking for coincident boundary pixels in the segmented region of Cn.(9) Figure 3 shows this process. If the test supports the hypothesis that the region in question is due to exposed background, the reference image is modified by replacing the object with its exposed background region (see Fig. 4).
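As a concrete illustration of this edge-coincidence test, the sketch below (an assumption-laden outline, not the AVI system's implementation) computes Sobel edge maps for the reference and current images and counts boundary pixels of the tested region that coincide with each; the threshold value and names such as exposed_background_score are hypothetical.

# Minimal sketch of the exposed-background test, assuming grayscale images as
# 2-D NumPy float arrays and a binary (boolean) mask for the segmented region.
# The edge threshold and the function names are illustrative, not from the paper.
import numpy as np
from scipy import ndimage

def edge_map(image, threshold=50.0):
    """Binary edge image from the Sobel gradient magnitude."""
    gx = ndimage.sobel(image, axis=1)
    gy = ndimage.sobel(image, axis=0)
    return np.hypot(gx, gy) > threshold

def region_boundary(mask):
    """Boundary pixels of a binary region mask."""
    eroded = ndimage.binary_erosion(mask)
    return mask & ~eroded

def exposed_background_score(reference, current, mask):
    """Count boundary pixels coincident with edges of the reference (I0)
    and current (In) images; a larger reference count supports the
    exposed-background hypothesis, a larger current count a foreground object."""
    boundary = region_boundary(mask)
    ref_hits = np.count_nonzero(boundary & edge_map(reference))
    cur_hits = np.count_nonzero(boundary & edge_map(current))
    return ref_hits, cur_hits

Comparing the two counts, as in Fig. 3(g) and (h), gives the evidence for or against the exposed-background hypothesis.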

No motion segmentation technique is perfect. The following are errors typical of many motion segmentation techniques:

1. True objects will disappear temporarily from the motion segmentation record. This occurs when there is insufficient contrast between an object and an occluded background region, or if an object is partially occluded by a background structure (for instance, a tree or pillar present in the scene).

2. False objects will appear temporarily in the motion segmentation record. This is caused by light fluctuations or shadows cast by moving objects.

3. Separate objects will temporarily join together. This typically occurs when two or more objects are in close proximity or when one object occludes another object.

4. Single objects will split into multiple regions. This occurs when a portion of an object has insufficient contrast with the background it occludes.

Instead of applying incremental improvements to relieve the shortcomings of motion segmentation, the AVI technique addresses these problems at a higher level, where information about the semantic content of the video data is more readily available. The object tracking and motion analysis stages described in Sections 3.3 and 3.4 employ object trajectory estimates and knowledge concerning object motion and typical motion segmentation errors to construct a more accurate representation of the video content.

3.3. Object tracking

The motion segmentation output is processed by the object tracking stage. Given a segmented image $C_n$ with $P$ uniquely-labeled regions corresponding to foreground objects in the video, the system generates a set of features to represent each region. This set of features is named a V-object (video-object), denoted $V_n^p$, $p = 1, \ldots, P$. A V-object contains the label, centroid, bounding box, and shape mask of its corresponding region, as well as object velocity and trajectory information generated by the tracking process.
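The feature set carried by a V-object can be pictured as a small record; the sketch below is only illustrative, and the field names are assumptions rather than identifiers from the AVI system.

# Hypothetical container for the per-region features that make up a V-object.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class VObject:
    label: int                      # region label p in segmented image Cn
    centroid: np.ndarray            # region centroid (x, y)
    bbox: tuple                     # bounding box (xmin, ymin, xmax, ymax)
    mask: np.ndarray                # binary shape mask of the region
    velocity: np.ndarray = field(default_factory=lambda: np.zeros(2))
    # trajectory links are filled in later by the tracking stage
    primary_links: list = field(default_factory=list)
    secondary_links: list = field(default_factory=list)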

V-objects are then tracked through the segmented video sequence. Given segmented images $C_n$ and $C_{n+1}$ with V-objects $V_n = \{V_n^p;\ p = 1, \ldots, P\}$ and $V_{n+1} = \{V_{n+1}^q;\ q = 1, \ldots, Q\}$, respectively, the motion tracking process links V-objects $V_n^p$ and $V_{n+1}^q$ if their position and estimated velocity indicate that they correspond to the same real-world object appearing in frames $F_n$ and $F_{n+1}$. This is determined using linear prediction of V-object positions and a mutual nearest neighbor criterion via the following procedure:

1. For each V-object $V_n^p \in V_n$, predict its position in the next frame using $\hat{\mu}_n^p = \mu_n^p + v_n^p\,(t_{n+1} - t_n)$, where $\hat{\mu}_n^p$ is the predicted centroid of $V_n^p$ in $C_{n+1}$, $\mu_n^p$ the centroid of $V_n^p$ measured in $C_n$, $v_n^p$ the estimated (forward) velocity of $V_n^p$, and $t_{n+1}$ and $t_n$ are the timestamps of frames $F_{n+1}$ and $F_n$, respectively. Initially, the velocity estimate is set to $v_n^p = (0, 0)$.

2. For each $V_n^p \in V_n$, determine the V-object in the next frame with centroid nearest $\hat{\mu}_n^p$. This nearest neighbor is denoted $\mathcal{N}V_n^p$. Thus, $\mathcal{N}V_n^p = V_{n+1}^q$ such that $\|\hat{\mu}_n^p - \mu_{n+1}^q\| \le \|\hat{\mu}_n^p - \mu_{n+1}^r\|$ for all $r$.

3. For every pair $(V_n^p, \mathcal{N}V_n^p = V_{n+1}^q)$ for which no other V-objects in $V_n$ have $V_{n+1}^q$ as a nearest neighbor, estimate the (forward) velocity of $V_{n+1}^q$ as

$$v_{n+1}^q = \frac{\mu_{n+1}^q - \mu_n^p}{t_{n+1} - t_n}, \qquad (1)$$

otherwise, set $v_{n+1}^q = (0, 0)$.

These steps are performed for each $C_n$, $n = 0, 1, \ldots, N - 2$. Steps 1 and 2 find nearest neighbors in the subsequent frame for each V-object. Step 3 generates velocity estimates for V-objects that can be unambiguously tracked; this information is used in step 1 to predict V-object positions for the next frame.

Next, steps 1-3 are repeated for the reverse sequence, i.e. $C_n$, $n = N-1, N-2, \ldots, 1$. This results in a new set of predicted centroids, velocity estimates, and nearest neighbors for each V-object in the reverse direction. Thus, the V-objects are tracked both forward and backward through the sequence. The remaining steps are then performed:

4. V-objects $V_n^p$ and $V_{n+1}^q$ are mutual nearest neighbors if $\mathcal{N}V_n^p = V_{n+1}^q$ and $\mathcal{N}V_{n+1}^q = V_n^p$. (Here, $\mathcal{N}V_n^p$ is the nearest neighbor of $V_n^p$ in the forward direction, and $\mathcal{N}V_{n+1}^q$ is the nearest neighbor of $V_{n+1}^q$ in the reverse direction.) For each pair of mutual nearest neighbors $(V_n^p, V_{n+1}^q)$, create a primary link from $V_n^p$ to $V_{n+1}^q$.

5. For each $V_n^p \in V_n$ without a mutual nearest neighbor, create a secondary link from $V_n^p$ to $\mathcal{N}V_n^p$ if the predicted centroid $\hat{\mu}_n^p$ is within $\epsilon$ of $\mathcal{N}V_n^p$, where $\epsilon$ is some small distance.

6. For each $V_{n+1}^q$ in $V_{n+1}$ without a mutual nearest neighbor, create a secondary link from $\mathcal{N}V_{n+1}^q$ to $V_{n+1}^q$ if the predicted centroid $\hat{\mu}_{n+1}^q$ is within $\epsilon$ of $\mathcal{N}V_{n+1}^q$.
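A minimal sketch of this procedure is given below, assuming each frame is reduced to an array of V-object centroids and a timestamp. It covers the forward pass (steps 1-3) and the primary-link test of step 4; the reverse pass and the secondary links of steps 5 and 6 would follow the same pattern. All names are illustrative, not the AVI system's interface.

# Sketch of the forward tracking pass (steps 1-3) and mutual-nearest-neighbor
# linking (step 4), assuming per-frame float centroid arrays and timestamps.
import numpy as np

def forward_pass(centroids, times):
    """centroids[n]: (P_n, 2) float array of V-object centroids in frame n;
    times[n]: timestamp t_n of frame n."""
    N = len(centroids)
    velocities = [np.zeros_like(c) for c in centroids]
    nearest = []
    for n in range(N - 1):
        dt = times[n + 1] - times[n]
        predicted = centroids[n] + velocities[n] * dt            # step 1
        dists = np.linalg.norm(predicted[:, None, :] -
                               centroids[n + 1][None, :, :], axis=2)
        nn = dists.argmin(axis=1)                                # step 2
        nearest.append(nn)
        for q in range(len(centroids[n + 1])):                   # step 3
            preds = np.flatnonzero(nn == q)
            if len(preds) == 1:                                  # unambiguous match
                velocities[n + 1][q] = (centroids[n + 1][q] -
                                        centroids[n][preds[0]]) / dt
    return nearest, velocities

def primary_links(forward_nn, reverse_nn):
    """Step 4: link object p of frame n to object q of frame n+1 when they are
    mutual nearest neighbors. reverse_nn[n][q] is the nearest neighbor (in
    frame n) of object q of frame n+1, found by the same pass run in reverse."""
    links = []
    for n, nn in enumerate(forward_nn):
        for p, q in enumerate(nn):
            if reverse_nn[n][q] == p:
                links.append((n, p, q))
    return links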

Fig. 3. Exposed background detection. (a) Reference image I0. (b) Image In. (c) Region to be tested. (d) Edge image of (a), found using the Sobel operator. (e) Edge image of (b). (f) Edge image of (c), showing boundary pixels. (g) Pixels coincident in (d) and (f). (h) Pixels coincident in (e) and (f). The greater number of coincident pixels in (g) versus (h) supports the hypothesis that the region in question is due to exposed background.

Fig. 4. Reference image modified to account for the exposed background region detected in Fig. 3.

The object tracking procedure uses the mutual nearest neighbor criterion (step 4) to estimate frame-to-frame V-object trajectories with a high degree of confidence. Pairs of mutual nearest neighbors are connected using a primary link to indicate that they are highly likely to represent the same real-world object in successive video frames. Steps 5 and 6 associate V-objects that are tracked with less confidence but display evidence that they might result from the same real-world object; these objects are joined by secondary links. These steps are necessary to account for the "split" and "join" type motion segmentation errors described in Section 3.2.

The object tracking process results in a list of V-objects and connecting links that form a directed graph (digraph) representing the position and trajectory of foreground objects in the video sequence. Thus, the V-objects are the nodes of the graph and the connecting links are the arcs. This motion graph is the output of the object tracking stage.
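One straightforward way to hold such a motion graph is an adjacency structure whose arcs carry a primary or secondary tag, as in the sketch below; the class and method names are assumptions, not the AVI system's data structures.

# Minimal sketch of the motion graph: nodes are (frame, object) pairs and each
# arc carries a "primary" or "secondary" tag.
from collections import defaultdict

class MotionGraph:
    def __init__(self):
        self.succ = defaultdict(list)   # node -> [(successor, link_type), ...]
        self.pred = defaultdict(list)   # node -> [(predecessor, link_type), ...]

    def add_link(self, src, dst, link_type="primary"):
        self.succ[src].append((dst, link_type))
        self.pred[dst].append((src, link_type))

    def outdegree(self, node):
        return len(self.succ[node])

    def indegree(self, node):
        return len(self.pred[node])

# Example: a primary link from object 0 in frame 3 to object 1 in frame 4.
g = MotionGraph()
g.add_link((3, 0), (4, 1), "primary")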


Figure 5 shows a motion graph for a hypothetical sequence of one-dimensional frames. Here, the system detects the appearance of an object at A and tracks it to the V-object at B. Due to an error in motion segmentation, the object splits at D and E, and joins at F. At G, the object joins with the object tracked from C due to occlusion. These objects split at H and I. Note that primary links connect the V-objects that were most reliably tracked.

3.4. Motion analysis

The motion analysis stage analyses the results of object tracking and annotates the motion graph with tags describing several events of interest. This process proceeds in two parts: V-object grouping and V-object indexing. Figure 6 shows an example motion graph for a hypothetical sequence of 1-D frames discussed in the following sections.

Fig. 5. The output of the object tracking stage for a hypothetical sequence of 1-D frames. The vertical lines labeled Fn represent frame number n. Primary links are shown as solid arcs; secondary links are shown as dashed arcs.

Fig. 6. An example motion graph for a sequence of 1-D frames.

3.4.1. V-object grouping. First, the motion analysis stage hierarchically groups V-objects into structures representing the paths of objects through the video data. Using graph theory terminology,(15) five groupings are defined for this purpose:

A stem $M = \{V_i : i = 1, 2, \ldots, N_M\}$ is a maximal-size, directed path (dipath) of two or more V-objects containing no secondary links, meeting all of the following conditions:

• $\mathrm{outdegree}(V_i) = 1$ for $1 \le i < N_M$,
• $\mathrm{indegree}(V_i) = 1$ for $1 < i \le N_M$, and
• either equation (2) or equation (3) holds,

where $\mu_i$ is the centroid of V-object $V_i \in M$.

Thus, a stem represents a simple trajectory of an object through two or more frames. Figure 7 labels V-objects from Fig. 6 belonging to separate stems with the letters A through K. Stems are used to determine the motion state of real-world objects, i.e. whether they are moving or stationary. If equation (2) is true, then the stem is classified as stationary; if equation (3) is true, then the stem is classified as moving. Figure 7 highlights stationary stems B, C, F and H; the remainder are moving.

A branch $B = \{V_i : i = 1, 2, \ldots, N_B\}$ is a maximal-size dipath of two or more V-objects containing no secondary links, for which $\mathrm{outdegree}(V_i) = 1$ for $1 \le i < N_B$ and $\mathrm{indegree}(V_i) = 1$ for $1 < i \le N_B$. Figure 8 labels V-objects belonging to branches with the letters L through T. A branch represents a highly reliable trajectory estimate of an object through a series of frames. If a branch consists entirely of a single stationary stem, then it is classified as stationary; otherwise, it is classified as moving. Branches N and Q in Fig. 8 (highlighted) are stationary; the remainder are moving.

A trail is a maximal-size dipath of two or more V-objects that contains no secondary links. This grouping represents the object tracking stage's best estimate of an object trajectory using the mutual nearest neighbor criterion. Figure 9 labels V-objects belonging to trails with the letters U through Z. A trail and the V-objects it contains are classified as stationary if all the branches it contains are stationary, and moving if all the branches it contains are moving. Otherwise, the trail is classified as unknown. Trail W in Fig. 9 is stationary; the remainder are moving.

    F8

    F9

    FlO

    Fll

    Fl2 Fl3 Fl4

    Fig. 7. Stems. Stationary stems are highlighted.

Fig. 8. Branches. Stationary branches are highlighted.


Fig. 9. Trails.


A track $K = \{L_1, G_1, \ldots, L_{N_K-1}, G_{N_K-1}, L_{N_K}\}$ is a dipath of maximal size containing trails $\{L_i : 1 \le i \le N_K\}$ and connecting dipaths $\{G_i : 1 \le i < N_K\}$. For each $G_i \in K$ there must exist a dipath $H = \{V_i^l, G_i, V_{i+1}^f\}$ (where $V_i^l$ is the last V-object in $L_i$, and $V_{i+1}^f$ is the first V-object in $L_{i+1}$), such that every $V_j \in H$ meets the requirement of equation (4), where $\mu_i^l$ is the centroid of $V_i^l$, $v_i^l$ the forward velocity of $V_i^l$, $\Delta t_{ij}$ the time difference between the frames containing $V_j$ and $V_i^l$, and $\mu_j$ is the centroid of $V_j$. Thus, equation (4) specifies that the object must maintain a constant velocity through path $H$.

A track represents the trajectory estimate of an object that may cause or undergo occlusion one or more times in a sequence. The motion analysis stage uses equation (4) to attempt to follow an object through frames where an occlusion occurs. Figure 10 labels V-objects belonging to tracks with the letters $\alpha$, $\beta$, $\chi$, $\delta$ and $\epsilon$. Note that track $\delta$ joins trails X and Y.
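Since the body of equation (4) is not reproduced above, the following sketch only illustrates the idea of the constant-velocity test used when joining trails into a track: each V-object on a candidate connecting dipath is compared against the position predicted from the last V-object of the preceding trail, with an assumed tolerance eps standing in for the exact criterion.

# Hedged sketch of the constant-velocity test along a connecting dipath H.
# `eps` is an assumed tolerance, not the exact form of equation (4).
import numpy as np

def follows_constant_velocity(mu_last, v_last, t_last, path, eps):
    """path: list of (centroid, timestamp) pairs for the V-objects on H."""
    for mu_j, t_j in path:
        predicted = mu_last + v_last * (t_j - t_last)   # constant-velocity prediction
        if np.linalg.norm(predicted - mu_j) > eps:      # deviation too large
            return False
    return True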

A track and the V-objects it contains are classified as stationary if all the trails it contains are stationary, and moving if all the trails it contains are moving. Otherwise, the track is classified as unknown. Track $\chi$ in Fig. 10 is stationary; the remaining tracks are moving.

A trace is a maximal-size, connected digraph of V-objects. A trace represents the complete trajectory of an object and all the objects with which it intersects. Thus, the motion graph in Fig. 6 contains two traces: one trace begins at frame F2; the remaining V-objects form a second trace. Figure 11 labels V-objects on these traces with the numbers 1 and 2.

Note that the preceding groupings are hierarchical, i.e. for every trace $E$, there exists at least one track $K$, trail $L$, branch $B$, and stem $M$ such that $E \supseteq K \supseteq L \supseteq B \supseteq M$. Furthermore, every V-object is a member of exactly one trace.

The motion analysis stage scans the motion graph generated by the object tracking stage and groups V-objects into stems, branches, trails, tracks, and traces.
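As an illustration of this grouping step, the sketch below extracts traces as connected components of the undirected view of the motion graph; stems, branches, trails, and tracks would be found by walking the directed arcs under the degree and link-type conditions given above. The function name and graph representation are assumptions.

# Sketch of trace extraction: a trace is a maximal connected subgraph of the
# motion graph, so traces fall out of a connected-components pass over the
# undirected version of the graph.
from collections import defaultdict

def traces(nodes, succ):
    """nodes: iterable of graph nodes; succ: node -> [(successor, link_type), ...]."""
    nodes = list(nodes)
    neighbors = defaultdict(set)
    for u in nodes:
        for v, _ in succ.get(u, ()):
            neighbors[u].add(v)
            neighbors[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:                       # depth-first search over the component
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(neighbors[u] - comp)
        seen |= comp
        components.append(comp)
    return components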

Fig. 10. Tracks. The dipath connecting trails X and Y from Fig. 9 is highlighted.

Fig. 11. Traces.


Table 1. Conditions for annotating V-objects with each of the object-motion events; which condition applies depends on the V-object motion state (moving, stationary, or unknown)

Appearance:      1. head of track;  2. indegree(V) > 0 (or indegree(V) = 0, depending on motion state)
Disappearance:   1. tail of track;  2. outdegree(V) > 0 (or outdegree(V) = 0, depending on motion state)
Entrance:        1. head of track;  2. indegree(V) = 0
Exit:            1. tail of track;  2. outdegree(V) = 0
Deposit:         1. head of track;  2. indegree(V) = 1
Removal:         1. tail of track;  2. outdegree(V) = 1
(Depositor):     adjacent to a V-object with a "deposit" tag
(Remover):       adjacent from a V-object with a "removal" tag
Motion:          1. tail of a stationary stem;  2. head of a moving stem
Rest:            1. tail of a moving stem;  2. head of a stationary stem
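For illustration, the sketch below encodes only the structural conditions for the deposit and removal events that are visible in Table 1 (head of track with indegree 1, tail of track with outdegree 1); the dependence of each rule on the V-object motion state is omitted here, and the names are assumptions rather than the AVI system's interface.

# Partial, illustrative sketch of the rule-based annotation for two events.
def annotate_deposit_removal(track, indegree, outdegree):
    """track.head / track.tail: first and last V-objects of a track;
    indegree/outdegree: functions giving a V-object's arc counts in the motion graph."""
    tags = []
    if indegree(track.head) == 1:       # head of track with exactly one incoming arc
        tags.append(("deposit", track.head))
    if outdegree(track.tail) == 1:      # tail of track with exactly one outgoing arc
        tags.append(("removal", track.tail))
    return tags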

Fig. 12. Annotation rules applied to Fig. 6.

Fig. 13. A high-level diagram of the AVI system.

it forward or backward and stop on individual frames. The system also provides a content-based retrieval mechanism by which the AVI system user may specify queries on a video sequence using spatial, temporal, event-, and object-based parameters. Thus, the user can jump to important points in the video sequence based on the query specification.

Figure 14 shows a picture of the playback portion of the AVI GUI. It provides familiar VCR-like controls (i.e. forward, reverse, stop, step-forward, step-back), as well as a system clipboard for recording intermediate video analysis results (i.e. video clips). For example, the clipboard shown in Fig. 14 contains three clips, the result of a previous query by the user.


Fig. 14. The AVI system playback interface.

Fig. 15. The AVI system query interface.

The user may select one of these clips, play it forward and back, and pose a new query using it. The clip(s) resulting from the new query are then pushed onto the top of the clipboard stack. The user may also peruse the clipboard stack using the button-commands "up", "down", and "pop".

Figure 15 shows the query interface to the AVI system. Using the "Type" field, the user may specify any com-



Fig. 21. Frames from an example video sequence. Frame numbers are shown below each image.

    ro

    om  

    at

    th a

    t po i

    nt, th

    e  pe

    rso n

     is d

    efine

    as

     

    a di

    ffere

    nt

    ob ject.) 

    T

    he

    user

    retu

    rns t

    o the

     or i

    gina l

      clip

     of F

    ig.

    23(a

    ) by

    pop p

    ing

    th e c

    lipb

    oa rd

     stac

    k twi

    ce. T

    hen

     the

    us er

    app l

    ies

    the

    que r

    y  f

    in d

    remo

    val

    even

    ts of

    this

     ob

    je ct

    to

    the

    br iefcase. The sys tem   responds with   a sing le clip 

    of

    the

      pe

    rson

      rem

    ov i

    ng  

    the

    brief

    case

    ,

    as

     

    sh

    own

      in

    F

    ig. 2

    3(c).

     


Fig. 22. Clips from the video sequence of Fig. 21 satisfying the query "find all deposit events". Boxes highlight the objects contributing to the event.

Fig. 23. Advanced video analysis example. Clips show: (a) the briefcase being deposited, (b) the entrance of the person who deposits the briefcase, (c) the briefcase being removed, (d) the exit of the person who removes the briefcase.


Finally, the user specifies the query "find exit events of this object" to the person removing the briefcase. The system then responds with a single clip of the person as he leaves the room (with the briefcase), as shown in Fig. 23(d).

5. EXPERIMENTAL RESULTS

The video indexing technique described in this paper was tested using the AVI system on three video sequences containing a total of 900 frames, 18 objects, and 44


Fig. 24. Frames from Test Sequence 2.

which multimedia system users may navigate through video sequences. The video indexing technique described in this paper abstracts raw video information using motion segmentation, object tracking, and a hierarchical path construction method which enables annotation using several motion-based event tags.



Fig. 25. Frames from Test Sequence 3.

Efficient retrieval of video clips is facilitated by an event index into the abstracted video. Furthermore, a system employing this indexing technique for assisted analysis of surveillance video allows users to jump to points of interest in a video sequence via intuitive spatial, temporal, event-, and object-based queries.



Fig. 26. Appearance and exit of an individual pedestrian from Test Sequence 3. (a) Frame 217 shows the pedestrian emerging from a car; (b) frame 248 shows the pedestrian walking out of the field of view.

Acknowledgements - Thanks go to Dinesh Nair and Stephen Perkins for assisting in the design and implementation of the AVI system.

    REFERENCES

1. HongJiang Zhang, Atreyi Kankanhalli and Stephen W. Smoliar, Automatic partitioning of full-motion video, Multimedia Systems 1(1), 10-28 (1993).
2. Akihito Akutsu, Yoshinobu Tonomura, Hideo Hashimoto and Yuji Ohba, Video indexing using motion vectors, in Visual Communications and Image Processing, Proc. SPIE 1818, Petros Maragos, ed., pp. 1522-1530, Boston, Massachusetts (November 1992).
3. Mikihiro Ioka and Masato Kurokawa, A method for retrieving sequences of images on the basis of motion analysis, in Image Storage and Retrieval Systems, Proc. SPIE 1662, pp. 35-46 (1992).
4. Suh-Yin Lee and Huan-Ming Kao, Video indexing: an approach based on moving object and track, in Storage and Retrieval for Image and Video Databases, Proc. SPIE 1908, Wayne Niblack, ed., pp. 25-36, San Jose, California (February 1993).
5. Glorianna Davenport, Thomas Aguierre Smith and Natalio Pincever, Cinematic primitives for multimedia, IEEE Comput. Graphics Appl., 67-74 (July 1991).
6. Masahiro Shibata, A temporal segmentation method for video sequences, in Visual Communications and Image Processing, Proc. SPIE 1818, Petros Maragos, ed., pp. 1194-1205, Boston, Massachusetts (November 1992).
7. Deborah Swanberg, Chiao-Fe Shu and Ramesh Jain, Knowledge guided parsing in video databases, in Storage and Retrieval for Image and Video Databases, Proc. SPIE 1908, Wayne Niblack, ed., pp. 13-24, San Jose, California (February 1993).
8. F. Arman, R. Depommier, A. Hsu and M.-Y. Chiu, Content-based browsing of video sequences, in Proc. ACM Int. Conf. on Multimedia, San Francisco, California (October 1994).
9. Ramesh Jain, W. N. Martin and J. K. Aggarwal, Segmentation through the detection of changes due to motion, Comput. Graphics Image Process. 11, 13-34 (1979).
10. S. Yalamanchili, W. N. Martin and J. K. Aggarwal, Extraction of moving object descriptions via differencing, Comput. Graphics Image Process. 18, 188-201 (1982).
11. Dana H. Ballard and Christopher M. Brown, Computer Vision. Prentice-Hall, Englewood Cliffs, New Jersey (1982).
12. Robert M. Haralick and Linda G. Shapiro, Computer and Robot Vision, Vol. 2. Addison-Wesley, Reading, Massachusetts (1993).
13. Akio Shio and Jack Sklansky, Segmentation of people in motion, in IEEE Workshop on Visual Motion, pp. 325-332, Princeton, New Jersey (October 1991).
14. M. Irani and P. Anandan, A unified approach to moving object detection in 2D and 3D scenes, in Proc. Image Understanding Workshop, pp. 707-718, Palm Springs, California (February 1996).
15. Gary Chartrand and Ortrud R. Oellermann, Applied and Algorithmic Graph Theory. McGraw-Hill, New York (1993).
16. Stephen S. Intille and Aaron F. Bobick, Closed-world tracking, in Proc. Fifth Int. Conf. on Computer Vision, pp. 672-678, Cambridge, Massachusetts (June 1995).

About the Author - JONATHAN D. COURTNEY received the M.S. degree in Computer Science and the B.S. degree in Computer Engineering and Computer Science from Michigan State University. Mr Courtney is a Member of the Technical Staff in the Multimedia Systems Branch of Corporate Research and Development at Texas Instruments. His Master's thesis research, under the direction of Professor Anil K. Jain, concerned mobile robot localization using multisensor maps. His current research interests include multimedia information systems and virtual environments for cooperative work. Mr Courtney is a member of the IEEE.