
Pergamon

Pattern Recognition, Vol. 30, No. 4, pp. 607-625, 1997
© 1997 Pattern Recognition Society. Published by Elsevier Science Ltd
Printed in Great Britain. All rights reserved
0031-3203/97 $17.00+.00

PII: S0031-3203(96)00107-0

AUTOMATIC VIDEO INDEXING VIA OBJECT MOTION ANALYSIS

JONATHAN D. COURTNEY*

Texas Instruments Incorporated, 8330 LBJ Freeway, MS 8374, Dallas, Texas 75243, U.S.A.

(Received 12 June 1996; received for publication 30 July 1996)

*E-mail: [email protected]

Abstract - To assist human analysis of video data, a technique has been developed to perform automatic, content-based video indexing from object motion. Moving objects are detected in the video sequence using motion segmentation methods. By tracking individual objects through the segmented data, a symbolic representation of the video is generated in the form of a directed graph describing the objects and their movement. This graph is then annotated using a rule-based classification scheme to identify events of interest, e.g. appearance/disappearance, deposit/removal, entrance/exit, and motion/rest of objects. One may then use an index into the motion graph instead of the raw data to analyse the semantic content of the video. Application of this technique to surveillance video analysis is discussed. © 1997 Pattern Recognition Society. Published by Elsevier Science Ltd.

Video indexing    Object tracking    Motion analysis    Content-based retrieval

1. INTRODUCTION

Advances in multimedia technology, including commercial prospects for video-on-demand and digital library systems, have generated recent interest in content-based video analysis. Video data offers users of multimedia systems a wealth of information; however, it is not as readily manipulated as other data such as text. Raw video data has no immediate handles by which the multimedia system user may analyse its contents. By annotating video data with symbolic information describing the semantic content, one may facilitate analysis beyond simple serial playback.

To assist human analysis of video data, a technique has been developed to perform automatic, content-based video indexing from object motion. Moving objects are detected in the video sequence using motion segmentation methods. By tracking individual objects through the segmented data, a symbolic representation of the video is generated in the form of a directed graph describing the objects and their movement. This graph is then annotated using a rule-based classification scheme to identify events of interest, e.g. appearance/disappearance, deposit/removal, entrance/exit, and motion/rest of objects. One may then use an index into the motion graph instead of the raw data to analyse the semantic content of the video.

We have developed a system that demonstrates this indexing technique in assisted analysis of surveillance video data. The Automatic Video Indexing (AVI) system allows the user to select a video sequence of interest, play it forward or backward and stop at individual frames. Furthermore, the user may specify queries on video sequences and jump to events of interest to avoid tedious serial playback. For example, the user may select a person in a video sequence and specify the query "show me all objects that this person removed from the scene". In response, the AVI system assembles a set of video clips highlighting the query results. The user may select a clip of interest and proceed with further video analysis using queries or playback as before.

The remainder of this paper is organized as follows: Section 2 discusses content-based video analysis. Section 3 presents a video indexing technique based on object motion analysis. Section 4 describes a system which implements this video indexing technique for scene monitoring applications. Section 5 presents experimental results using the system. Section 6 concludes the paper.

    2. CONTENT-BASED VIDEO ANALYSIS

Video data poses unique problems for multimedia information systems that text does not. Textual data is a symbolic abstraction of the spoken word that is usually generated and structured by humans. Video, on the other hand, is a direct recording of visual information. In its raw and most common form, video data is subject to little human-imposed structure, and thus has no immediate handles by which the multimedia system user may analyse its contents.

For example, consider an online movie screenplay (textual data) and a digitized movie (video and audio data). If one were analysing the screenplay and interested in searching for instances of the word "horse" in the text, various text searching algorithms could be employed to locate every instance of this symbol as desired. Such analysis is common in online text databases. If, however, one were interested in searching for every scene in the digitized movie where a horse appeared, the task is much more difficult. Unless a human performs some sort of



If the region detected by the segmentation of image In is due to the motion of an object present in the reference image (i.e. due to "exposed background"), a high probability exists that the boundary of the segmented region will coincide with intensity edges detected in I0. If the region is due to the presence of a foreground object in the current image, a high probability exists that the region boundary will coincide with intensity edges in In. The test is implemented by applying an edge detection operator to the current and reference images and checking for coincident boundary pixels in the segmented region of Cn.(9) Figure 3 shows this process. If the test supports the hypothesis that the region in question is due to exposed background, the reference image is modified by replacing the object with its exposed background region (see Fig. 4).
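As a concrete illustration of this edge-coincidence test, the sketch below (an assumption-laden outline, not the AVI system's implementation) computes Sobel edge maps for the reference and current images and counts boundary pixels of the tested region that coincide with each; the threshold value and names such as exposed_background_score are hypothetical.

# Minimal sketch of the exposed-background test, assuming grayscale images as
# 2-D NumPy float arrays and a binary (boolean) mask for the segmented region.
# The edge threshold and the function names are illustrative, not from the paper.
import numpy as np
from scipy import ndimage

def edge_map(image, threshold=50.0):
    """Binary edge image from the Sobel gradient magnitude."""
    gx = ndimage.sobel(image, axis=1)
    gy = ndimage.sobel(image, axis=0)
    return np.hypot(gx, gy) > threshold

def region_boundary(mask):
    """Boundary pixels of a binary region mask."""
    eroded = ndimage.binary_erosion(mask)
    return mask & ~eroded

def exposed_background_score(reference, current, mask):
    """Count boundary pixels coincident with edges of the reference (I0)
    and current (In) images; a larger reference count supports the
    exposed-background hypothesis, a larger current count a foreground object."""
    boundary = region_boundary(mask)
    ref_hits = np.count_nonzero(boundary & edge_map(reference))
    cur_hits = np.count_nonzero(boundary & edge_map(current))
    return ref_hits, cur_hits

Comparing the two counts, as in Fig. 3(g) and (h), gives the evidence for or against the exposed-background hypothesis.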

No motion segmentation technique is perfect. The following are errors typical of many motion segmentation techniques:

1. True objects will disappear temporarily from the motion segmentation record. This occurs when there is insufficient contrast between an object and an occluded background region, or if an object is partially occluded by a background structure (for instance, a tree or pillar present in the scene).

2. False objects will appear temporarily in the motion segmentation record. This is caused by light fluctuations or shadows cast by moving objects.

3. Separate objects will temporarily join together. This typically occurs when two or more objects are in close proximity or when one object occludes another object.

4. Single objects will split into multiple regions. This occurs when a portion of an object has insufficient contrast with the background it occludes.

Instead of applying incremental improvements to relieve the shortcomings of motion segmentation, the AVI technique addresses these problems at a higher level, where information about the semantic content of the video data is more readily available. The object tracking and motion analysis stages described in Sections 3.3 and 3.4 employ object trajectory estimates and knowledge concerning object motion and typical motion segmentation errors to construct a more accurate representation of the video content.

3.3. Object tracking

The motion segmentation output is processed by the object tracking stage. Given a segmented image $C_n$ with $P$ uniquely-labeled regions corresponding to foreground objects in the video, the system generates a set of features to represent each region. This set of features is named a V-object (video-object), denoted $V_n^p$, $p = 1, \ldots, P$. A V-object contains the label, centroid, bounding box, and shape mask of its corresponding region, as well as object velocity and trajectory information generated by the tracking process.
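The feature set carried by a V-object can be pictured as a small record; the sketch below is only illustrative, and the field names are assumptions rather than identifiers from the AVI system.

# Hypothetical container for the per-region features that make up a V-object.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class VObject:
    label: int                      # region label p in segmented image Cn
    centroid: np.ndarray            # region centroid (x, y)
    bbox: tuple                     # bounding box (xmin, ymin, xmax, ymax)
    mask: np.ndarray                # binary shape mask of the region
    velocity: np.ndarray = field(default_factory=lambda: np.zeros(2))
    # trajectory links are filled in later by the tracking stage
    primary_links: list = field(default_factory=list)
    secondary_links: list = field(default_factory=list)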

V-objects are then tracked through the segmented video sequence. Given segmented images $C_n$ and $C_{n+1}$ with V-objects $V_n = \{V_n^p;\ p = 1, \ldots, P\}$ and $V_{n+1} = \{V_{n+1}^q;\ q = 1, \ldots, Q\}$, respectively, the motion tracking process links V-objects $V_n^p$ and $V_{n+1}^q$ if their position and estimated velocity indicate that they correspond to the same real-world object appearing in frames $F_n$ and $F_{n+1}$. This is determined using linear prediction of V-object positions and a mutual nearest neighbor criterion via the following procedure:

1. For each V-object $V_n^p \in V_n$, predict its position in the next frame using $\hat{\mu}_n^p = \mu_n^p + v_n^p\,(t_{n+1} - t_n)$, where $\hat{\mu}_n^p$ is the predicted centroid of $V_n^p$ in $C_{n+1}$, $\mu_n^p$ the centroid of $V_n^p$ measured in $C_n$, $v_n^p$ the estimated (forward) velocity of $V_n^p$, and $t_{n+1}$ and $t_n$ are the timestamps of frames $F_{n+1}$ and $F_n$, respectively. Initially, the velocity estimate is set to $v_n^p = (0, 0)$.

2. For each $V_n^p \in V_n$, determine the V-object in the next frame with centroid nearest $\hat{\mu}_n^p$. This nearest neighbor is denoted $\mathcal{N}V_n^p$. Thus, $\mathcal{N}V_n^p = V_{n+1}^q$ such that $\|\hat{\mu}_n^p - \mu_{n+1}^q\| \le \|\hat{\mu}_n^p - \mu_{n+1}^r\|$ for all $r$.

3. For every pair $(V_n^p, \mathcal{N}V_n^p = V_{n+1}^q)$ for which no other V-objects in $V_n$ have $V_{n+1}^q$ as a nearest neighbor, estimate the (forward) velocity of $V_{n+1}^q$ as

$$v_{n+1}^q = \frac{\mu_{n+1}^q - \mu_n^p}{t_{n+1} - t_n}, \qquad (1)$$

otherwise, set $v_{n+1}^q = (0, 0)$.

These steps are performed for each $C_n$, $n = 0, 1, \ldots, N - 2$. Steps 1 and 2 find nearest neighbors in the subsequent frame for each V-object. Step 3 generates velocity estimates for V-objects that can be unambiguously tracked; this information is used in step 1 to predict V-object positions for the next frame.

Next, steps 1-3 are repeated for the reverse sequence, i.e. $C_n$, $n = N-1, N-2, \ldots, 1$. This results in a new set of predicted centroids, velocity estimates, and nearest neighbors for each V-object in the reverse direction. Thus, the V-objects are tracked both forward and backward through the sequence. The remaining steps are then performed:

4. V-objects $V_n^p$ and $V_{n+1}^q$ are mutual nearest neighbors if $\mathcal{N}V_n^p = V_{n+1}^q$ and $\mathcal{N}V_{n+1}^q = V_n^p$. (Here, $\mathcal{N}V_n^p$ is the nearest neighbor of $V_n^p$ in the forward direction, and $\mathcal{N}V_{n+1}^q$ is the nearest neighbor of $V_{n+1}^q$ in the reverse direction.) For each pair of mutual nearest neighbors $(V_n^p, V_{n+1}^q)$, create a primary link from $V_n^p$ to $V_{n+1}^q$.

5. For each $V_n^p \in V_n$ without a mutual nearest neighbor, create a secondary link from $V_n^p$ to $\mathcal{N}V_n^p$ if the predicted centroid $\hat{\mu}_n^p$ is within $\epsilon$ of $\mathcal{N}V_n^p$, where $\epsilon$ is some small distance.

6. For each $V_{n+1}^q$ in $V_{n+1}$ without a mutual nearest neighbor, create a secondary link from $\mathcal{N}V_{n+1}^q$ to $V_{n+1}^q$ if the predicted centroid $\hat{\mu}_{n+1}^q$ is within $\epsilon$ of $\mathcal{N}V_{n+1}^q$.
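A minimal sketch of this procedure is given below, assuming each frame is reduced to an array of V-object centroids and a timestamp. It covers the forward pass (steps 1-3) and the primary-link test of step 4; the reverse pass and the secondary links of steps 5 and 6 would follow the same pattern. All names are illustrative, not the AVI system's interface.

# Sketch of the forward tracking pass (steps 1-3) and mutual-nearest-neighbor
# linking (step 4), assuming per-frame float centroid arrays and timestamps.
import numpy as np

def forward_pass(centroids, times):
    """centroids[n]: (P_n, 2) float array of V-object centroids in frame n;
    times[n]: timestamp t_n of frame n."""
    N = len(centroids)
    velocities = [np.zeros_like(c) for c in centroids]
    nearest = []
    for n in range(N - 1):
        dt = times[n + 1] - times[n]
        predicted = centroids[n] + velocities[n] * dt            # step 1
        dists = np.linalg.norm(predicted[:, None, :] -
                               centroids[n + 1][None, :, :], axis=2)
        nn = dists.argmin(axis=1)                                # step 2
        nearest.append(nn)
        for q in range(len(centroids[n + 1])):                   # step 3
            preds = np.flatnonzero(nn == q)
            if len(preds) == 1:                                  # unambiguous match
                velocities[n + 1][q] = (centroids[n + 1][q] -
                                        centroids[n][preds[0]]) / dt
    return nearest, velocities

def primary_links(forward_nn, reverse_nn):
    """Step 4: link object p of frame n to object q of frame n+1 when they are
    mutual nearest neighbors. reverse_nn[n][q] is the nearest neighbor (in
    frame n) of object q of frame n+1, found by the same pass run in reverse."""
    links = []
    for n, nn in enumerate(forward_nn):
        for p, q in enumerate(nn):
            if reverse_nn[n][q] == p:
                links.append((n, p, q))
    return links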

Fig. 3. Exposed background detection. (a) Reference image I0. (b) Image In. (c) Region to be tested. (d) Edge image of (a), found using the Sobel operator. (e) Edge image of (b). (f) Edge image of (c), showing boundary pixels. (g) Pixels coincident in (d) and (f). (h) Pixels coincident in (e) and (f). The greater number of coincident pixels in (g) versus (h) supports the hypothesis that the region in question is due to exposed background.

Fig. 4. Reference image modified to account for the exposed background region detected in Fig. 3.

The object tracking procedure uses the mutual nearest neighbor criterion (step 4) to estimate frame-to-frame V-object trajectories with a high degree of confidence. Pairs of mutual nearest neighbors are connected using a primary link to indicate that they are highly likely to represent the same real-world object in successive video frames. Steps 5 and 6 associate V-objects that are tracked with less confidence but display evidence that they might result from the same real-world object; these objects are joined by secondary links. These steps are necessary to account for the "split" and "join" type motion segmentation errors described in Section 3.2.

The object tracking process results in a list of V-objects and connecting links that form a directed graph (digraph) representing the position and trajectory of foreground objects in the video sequence. Thus, the V-objects are the nodes of the graph and the connecting links are the arcs. This motion graph is the output of the object tracking stage.
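One straightforward way to hold such a motion graph is an adjacency structure whose arcs carry a primary or secondary tag, as in the sketch below; the class and method names are assumptions, not the AVI system's data structures.

# Minimal sketch of the motion graph: nodes are (frame, object) pairs and each
# arc carries a "primary" or "secondary" tag.
from collections import defaultdict

class MotionGraph:
    def __init__(self):
        self.succ = defaultdict(list)   # node -> [(successor, link_type), ...]
        self.pred = defaultdict(list)   # node -> [(predecessor, link_type), ...]

    def add_link(self, src, dst, link_type="primary"):
        self.succ[src].append((dst, link_type))
        self.pred[dst].append((src, link_type))

    def outdegree(self, node):
        return len(self.succ[node])

    def indegree(self, node):
        return len(self.pred[node])

# Example: a primary link from object 0 in frame 3 to object 1 in frame 4.
g = MotionGraph()
g.add_link((3, 0), (4, 1), "primary")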


Figure 5 shows a motion graph for a hypothetical sequence of one-dimensional frames. Here, the system detects the appearance of an object at A and tracks it to the V-object at B. Due to an error in motion segmentation, the object splits at D and E, and joins at F. At G, the object joins with the object tracked from C due to occlusion. These objects split at H and I. Note that primary links connect the V-objects that were most reliably tracked.

3.4. Motion analysis

The motion analysis stage analyses the results of object tracking and annotates the motion graph with tags describing several events of interest. This process proceeds in two parts: V-object grouping and V-object indexing. Figure 6 shows an example motion graph for a hypothetical sequence of 1-D frames discussed in the following sections.

Fig. 5. The output of the object tracking stage for a hypothetical sequence of 1-D frames. The vertical lines labeled Fn represent frame number n. Primary links are shown as solid arcs; secondary links are shown as dashed arcs.

Fig. 6. An example motion graph for a sequence of 1-D frames.

3.4.1. V-object grouping. First, the motion analysis stage hierarchically groups V-objects into structures representing the paths of objects through the video data. Using graph theory terminology,(15) five groupings are defined for this purpose:

A stem $M = \{V_i : i = 1, 2, \ldots, N_M\}$ is a maximal-size, directed path (dipath) of two or more V-objects containing no secondary links, meeting all of the following conditions:

• $\mathrm{outdegree}(V_i) = 1$ for $1 \le i < N_M$,
• $\mathrm{indegree}(V_i) = 1$ for $1 < i \le N_M$, and
• either equation (2) or equation (3) holds,

where $\mu_i$ is the centroid of V-object $V_i \in M$.

Thus, a stem represents a simple trajectory of an object through two or more frames. Figure 7 labels V-objects from Fig. 6 belonging to separate stems with the letters A through K. Stems are used to determine the motion state of real-world objects, i.e. whether they are moving or stationary. If equation (2) is true, then the stem is classified as stationary; if equation (3) is true, then the stem is classified as moving. Figure 7 highlights stationary stems B, C, F and H; the remainder are moving.

A branch $B = \{V_i : i = 1, 2, \ldots, N_B\}$ is a maximal-size dipath of two or more V-objects containing no secondary links, for which $\mathrm{outdegree}(V_i) = 1$ for $1 \le i < N_B$ and $\mathrm{indegree}(V_i) = 1$ for $1 < i \le N_B$. Figure 8 labels V-objects belonging to branches with the letters L through T. A branch represents a highly reliable trajectory estimate of an object through a series of frames. If a branch consists entirely of a single stationary stem, then it is classified as stationary; otherwise, it is classified as moving. Branches N and Q in Fig. 8 (highlighted) are stationary; the remainder are moving.

A trail is a maximal-size dipath of two or more V-objects that contains no secondary links. This grouping represents the object tracking stage's best estimate of an object trajectory using the mutual nearest neighbor criterion. Figure 9 labels V-objects belonging to trails with the letters U through Z. A trail and the V-objects it contains are classified as stationary if all the branches it contains are stationary, and moving if all the branches it contains are moving. Otherwise, the trail is classified as unknown. Trail W in Fig. 9 is stationary; the remainder are moving.

    F8

    F9

    FlO

    Fll

    Fl2 Fl3 Fl4

    Fig. 7. Stems. Stationary stems are highlighted.

Fig. 8. Branches. Stationary branches are highlighted.


Fig. 9. Trails.


A track $K = \{L_1, G_1, \ldots, L_{N_K-1}, G_{N_K-1}, L_{N_K}\}$ is a dipath of maximal size containing trails $\{L_i : 1 \le i \le N_K\}$ and connecting dipaths $\{G_i : 1 \le i < N_K\}$. For each $G_i \in K$ there must exist a dipath $H = \{V_i^l, G_i, V_{i+1}^f\}$ (where $V_i^l$ is the last V-object in $L_i$, and $V_{i+1}^f$ is the first V-object in $L_{i+1}$), such that every $V_j \in H$ meets the requirement of equation (4), where $\mu_i^l$ is the centroid of $V_i^l$, $v_i^l$ the forward velocity of $V_i^l$, $\Delta t_{ij}$ the time difference between the frames containing $V_j$ and $V_i^l$, and $\mu_j$ is the centroid of $V_j$. Thus, equation (4) specifies that the object must maintain a constant velocity through path $H$.

A track represents the trajectory estimate of an object that may cause or undergo occlusion one or more times in a sequence. The motion analysis stage uses equation (4) to attempt to follow an object through frames where an occlusion occurs. Figure 10 labels V-objects belonging to tracks with the letters $\alpha$, $\beta$, $\chi$, $\delta$ and $\epsilon$. Note that track $\delta$ joins trails X and Y.
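Since the body of equation (4) is not reproduced above, the following sketch only illustrates the idea of the constant-velocity test used when joining trails into a track: each V-object on a candidate connecting dipath is compared against the position predicted from the last V-object of the preceding trail, with an assumed tolerance eps standing in for the exact criterion.

# Hedged sketch of the constant-velocity test along a connecting dipath H.
# `eps` is an assumed tolerance, not the exact form of equation (4).
import numpy as np

def follows_constant_velocity(mu_last, v_last, t_last, path, eps):
    """path: list of (centroid, timestamp) pairs for the V-objects on H."""
    for mu_j, t_j in path:
        predicted = mu_last + v_last * (t_j - t_last)   # constant-velocity prediction
        if np.linalg.norm(predicted - mu_j) > eps:      # deviation too large
            return False
    return True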

A track and the V-objects it contains are classified as stationary if all the trails it contains are stationary, and moving if all the trails it contains are moving. Otherwise, the track is classified as unknown. Track $\chi$ in Fig. 10 is stationary; the remaining tracks are moving.

A trace is a maximal-size, connected digraph of V-objects. A trace represents the complete trajectory of an object and all the objects with which it intersects. Thus, the motion graph in Fig. 6 contains two traces: one trace begins at frame F2; the remaining V-objects form a second trace. Figure 11 labels V-objects on these traces with the numbers 1 and 2.

Note that the preceding groupings are hierarchical, i.e. for every trace $E$, there exists at least one track $K$, trail $L$, branch $B$, and stem $M$ such that $E \supseteq K \supseteq L \supseteq B \supseteq M$. Furthermore, every V-object is a member of exactly one trace.

The motion analysis stage scans the motion graph generated by the object tracking stage and groups V-objects into stems, branches, trails, tracks, and traces.
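As an illustration of this grouping step, the sketch below extracts traces as connected components of the undirected view of the motion graph; stems, branches, trails, and tracks would be found by walking the directed arcs under the degree and link-type conditions given above. The function name and graph representation are assumptions.

# Sketch of trace extraction: a trace is a maximal connected subgraph of the
# motion graph, so traces fall out of a connected-components pass over the
# undirected version of the graph.
from collections import defaultdict

def traces(nodes, succ):
    """nodes: iterable of graph nodes; succ: node -> [(successor, link_type), ...]."""
    nodes = list(nodes)
    neighbors = defaultdict(set)
    for u in nodes:
        for v, _ in succ.get(u, ()):
            neighbors[u].add(v)
            neighbors[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:                       # depth-first search over the component
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(neighbors[u] - comp)
        seen |= comp
        components.append(comp)
    return components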

Fig. 10. Tracks. The dipath connecting trails X and Y from Fig. 9 is highlighted.

Fig. 11. Traces.


Table 1. Conditions for annotating V-objects with each of the object-motion events; which condition applies depends on the V-object motion state (moving, stationary, or unknown)

Appearance:      1. head of track;  2. indegree(V) > 0 (or indegree(V) = 0, depending on motion state)
Disappearance:   1. tail of track;  2. outdegree(V) > 0 (or outdegree(V) = 0, depending on motion state)
Entrance:        1. head of track;  2. indegree(V) = 0
Exit:            1. tail of track;  2. outdegree(V) = 0
Deposit:         1. head of track;  2. indegree(V) = 1
Removal:         1. tail of track;  2. outdegree(V) = 1
(Depositor):     adjacent to a V-object with a "deposit" tag
(Remover):       adjacent from a V-object with a "removal" tag
Motion:          1. tail of a stationary stem;  2. head of a moving stem
Rest:            1. tail of a moving stem;  2. head of a stationary stem
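For illustration, the sketch below encodes only the structural conditions for the deposit and removal events that are visible in Table 1 (head of track with indegree 1, tail of track with outdegree 1); the dependence of each rule on the V-object motion state is omitted here, and the names are assumptions rather than the AVI system's interface.

# Partial, illustrative sketch of the rule-based annotation for two events.
def annotate_deposit_removal(track, indegree, outdegree):
    """track.head / track.tail: first and last V-objects of a track;
    indegree/outdegree: functions giving a V-object's arc counts in the motion graph."""
    tags = []
    if indegree(track.head) == 1:       # head of track with exactly one incoming arc
        tags.append(("deposit", track.head))
    if outdegree(track.tail) == 1:      # tail of track with exactly one outgoing arc
        tags.append(("removal", track.tail))
    return tags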

Fig. 12. Annotation rules applied to Fig. 6.

Fig. 13. A high-level diagram of the AVI system.

it forward or backward and stop on individual frames. The system also provides a content-based retrieval mechanism by which the AVI system user may specify queries on a video sequence using spatial, temporal, event-, and object-based parameters. Thus, the user can jump to important points in the video sequence based on the query specification.

Figure 14 shows a picture of the playback portion of the AVI GUI. It provides familiar VCR-like controls (i.e. forward, reverse, stop, step-forward, step-back), as well as a system clipboard for recording intermediate video analysis results (i.e. video clips). For example, the clipboard shown in Fig. 14 contains three clips, the result of a previous query by the user.


Fig. 14. The AVI system playback interface.

Fig. 15. The AVI system query interface.

The user may select one of these clips, play it forward and back, and pose a new query using it. The clip(s) resulting from the new query are then pushed onto the top of the clipboard stack. The user may also peruse the clipboard stack using the button-commands "up", "down", and "pop".

Figure 15 shows the query interface to the AVI system. Using the "Type" field, the user may specify any com-



Fig. 21. Frames from an example video sequence. Frame numbers are shown below each image.

    ro

    om  

    at

    th a

    t po i

    nt, th

    e  pe

    rso n

     is d

    efine

    as

     

    a di

    ffere

    nt

    ob ject.) 

    T

    he

    user

    retu

    rns t

    o the

     or i

    gina l

      clip

     of F

    ig.

    23(a

    ) by

    pop p

    ing

    th e c

    lipb

    oa rd

     stac

    k twi

    ce. T

    hen

     the

    us er

    app l

    ies

    the

    que r

    y  f

    in d

    remo

    val

    even

    ts of

    this

     ob

    je ct

    to

    the

    br iefcase. The sys tem   responds with   a sing le clip 

    of

    the

      pe

    rson

      rem

    ov i

    ng  

    the

    brief

    case

    ,

    as

     

    sh

    own

      in

    F

    ig. 2

    3(c).

     


Fig. 22. Clips from the video sequence of Fig. 21 satisfying the query "find all deposit events". Boxes highlight the objects contributing to the event.

Fig. 23. Advanced video analysis example. Clips show: (a) the briefcase being deposited, (b) the entrance of the person who deposits the briefcase, (c) the briefcase being removed, (d) the exit of the person who removes the briefcase.


Finally, the user specifies the query "find exit events of this object" to the person removing the briefcase. The system then responds with a single clip of the person as he leaves the room (with the briefcase), as shown in Fig. 23(d).

5. EXPERIMENTAL RESULTS

The video indexing technique described in this paper was tested using the AVI system on three video sequences containing a total of 900 frames, 18 objects, and 44


Fig. 24. Frames from Test Sequence 2.

which multimedia system users may navigate through video sequences. The video indexing technique described in this paper abstracts raw video information using motion segmentation, object tracking, and a hierarchical path construction method which enables annotation using several motion-based event tags.



Fig. 25. Frames from Test Sequence 3.

Efficient retrieval of video clips is facilitated by an event index into the abstracted video. Furthermore, a system employing this indexing technique for assisted analysis of surveillance video allows users to jump to points of interest in a video sequence via intuitive spatial, temporal, event-, and object-based queries.



Fig. 26. Appearance and exit of an individual pedestrian from Test Sequence 3. (a) Frame 217 shows the pedestrian emerging from a car; (b) frame 248 shows the pedestrian walking out of the field of view.

Acknowledgements - Thanks go to Dinesh Nair and Stephen Perkins for assisting in the design and implementation of the AVI system.

    REFERENCES

1. HongJiang Zhang, Atreyi Kankanhalli and Stephen W. Smoliar, Automatic partitioning of full-motion video, Multimedia Systems 1(1), 10-28 (1993).
2. Akihito Akutsu, Yoshinobu Tonomura, Hideo Hashimoto and Yuji Ohba, Video indexing using motion vectors, in Visual Communications and Image Processing, Proc. SPIE 1818, Petros Maragos, ed., pp. 1522-1530, Boston, Massachusetts (November 1992).
3. Mikihiro Ioka and Masato Kurokawa, A method for retrieving sequences of images on the basis of motion analysis, in Image Storage and Retrieval Systems, Proc. SPIE 1662, pp. 35-46 (1992).
4. Suh-Yin Lee and Huan-Ming Kao, Video indexing: an approach based on moving object and track, in Storage and Retrieval for Image and Video Databases, Proc. SPIE 1908, Wayne Niblack, ed., pp. 25-36, San Jose, California (February 1993).
5. Glorianna Davenport, Thomas Aguierre Smith and Natalio Pincever, Cinematic primitives for multimedia, IEEE Comput. Graphics Appl., 67-74 (July 1991).
6. Masahiro Shibata, A temporal segmentation method for video sequences, in Visual Communications and Image Processing, Proc. SPIE 1818, Petros Maragos, ed., pp. 1194-1205, Boston, Massachusetts (November 1992).
7. Deborah Swanberg, Chiao-Fe Shu and Ramesh Jain, Knowledge guided parsing in video databases, in Storage and Retrieval for Image and Video Databases, Proc. SPIE 1908, Wayne Niblack, ed., pp. 13-24, San Jose, California (February 1993).
8. F. Arman, R. Depommier, A. Hsu and M.-Y. Chiu, Content-based browsing of video sequences, in Proc. ACM Int. Conf. on Multimedia, San Francisco, California (October 1994).
9. Ramesh Jain, W. N. Martin and J. K. Aggarwal, Segmentation through the detection of changes due to motion, Comput. Graphics Image Process. 11, 13-34 (1979).
10. S. Yalamanchili, W. N. Martin and J. K. Aggarwal, Extraction of moving object descriptions via differencing, Comput. Graphics Image Process. 18, 188-201 (1982).
11. Dana H. Ballard and Christopher M. Brown, Computer Vision. Prentice-Hall, Englewood Cliffs, New Jersey (1982).
12. Robert M. Haralick and Linda G. Shapiro, Computer and Robot Vision, Vol. 2. Addison-Wesley, Reading, Massachusetts (1993).
13. Akio Shio and Jack Sklansky, Segmentation of people in motion, in IEEE Workshop on Visual Motion, pp. 325-332, Princeton, New Jersey (October 1991).
14. M. Irani and P. Anandan, A unified approach to moving object detection in 2D and 3D scenes, in Proc. Image Understanding Workshop, pp. 707-718, Palm Springs, California (February 1996).
15. Gary Chartrand and Ortrud R. Oellermann, Applied and Algorithmic Graph Theory. McGraw-Hill, New York (1993).
16. Stephen S. Intille and Aaron F. Bobick, Closed-world tracking, in Proc. Fifth Int. Conf. on Computer Vision, pp. 672-678, Cambridge, Massachusetts (June 1995).

About the Author - JONATHAN D. COURTNEY received the M.S. degree in Computer Science and the B.S. degree in Computer Engineering and Computer Science from Michigan State University. Mr Courtney is a Member of the Technical Staff in the Multimedia Systems Branch of Corporate Research and Development at Texas Instruments. His Master's thesis research, under the direction of Professor Anil K. Jain, concerned mobile robot localization using multisensor maps. His current research interests include multimedia information systems and virtual environments for cooperative work. Mr Courtney is a member of the IEEE.