Graph-based Object Detection and Tracking in H.264/AVC Bitstreams for Surveillance Video


    GRAPH-BASED OBJECT DETECTION AND TRACKING IN H.264/AVC BITSTREAMS FOR

    SURVEILLANCE VIDEO

    Houari Sabirin, Jaeil Kim and Munchurl Kim

    Department of Information and Communications Engineering,

    Korea Advanced Institute of Science and Technology, Daejeon, Korea

    [email protected], [email protected], [email protected]

    ABSTRACT

    In this paper we present a novel method to detect and track

    moving objects in H.264/AVC bitstreams by processing

    motion vector and residue information. The encoded blocks

    with nonzero motion vectors and residues are first detected

    as moving object candidates. A spatio-temporal graph in

    video sequences is then constructed to represent groups of

    blocks in each frame and their associations to the other

    groups of blocks in subsequent frames. Identification and

    refinement of ROIs for moving objects being tracked are

    done by graph matching and adaptive ROI-size adjustment.

The experimental results show that the proposed method can correctly identify real moving objects from frame to frame, can effectively detect small-sized objects and objects with small motion vectors and residues, and can recognize moving objects even under occlusion.

Index Terms: object detection and tracking, graph theory, H.264/AVC, surveillance video

    1. INTRODUCTION

Object detection and tracking in the compressed bitstream domain has been an interesting and challenging topic in surveillance video analysis because moving objects are detected not directly from the visible pixel data but from the encoded data that represent the motion and pixel differences caused by the moving objects. The problem is how to precisely locate and identify the moving object regions and their resulting trajectories, usually relying on the limited information available in the compressed bitstreams.

Especially in the H.264/AVC bitstream domain, some research has been conducted to automatically detect and track moving objects of interest. Techniques based on partial decoding proposed in [1] and [2] utilize additional information such as object colors to distinguish an object of interest from other objects, but these methods incur additional computational complexity. Another method using partial decoding, proposed in [3], detects moving vehicles in traffic recordings, which may not be suitable for general applications. A method using the bit sizes of block partitions has shown good precision in detecting moving objects [4]. While the shapes of the detected objects approximate real object boundaries well at the precision of 4×4 block units, the method does not identify different detected objects. Similar results are given in [5], where the moving objects are detected via motion vector processing, but no identifications are made on the detected objects. The method proposed in [6] provides the labels and trajectories of the detected objects. However, it assumes that there is no noise due to illumination changes or improper encoding, which is not usually the case in real applications.

On the other hand, graph theory has long been one of the effective tools for object segmentation in computer vision. Graph cut algorithms have been popularly utilized for image segmentation in video sequences, observing the similarities and dissimilarities, in terms of energy, between pixels represented as vertices, and have shown effective performance in segmenting objects from background [7]. Graph-based object tracking has also been utilized to correctly identify two corresponding sets of graphs between two consecutive frames in video sequences [8]. Graph-based object detection in the pixel domain has also been studied for sports video in [9] to determine the trajectories of moving objects. These observations suggest that graph-based object detection and tracking may also be applicable in the compressed domain.

In this paper we propose a novel method of object detection and tracking in the compressed domain using a graph-based approach. Firstly, the blocks that have non-zero motion vectors and residues are detected as moving object candidates. Secondly, groups of the detected blocks are represented as spatial graphs in each frame. The groups of detected blocks in each frame are then temporally connected to the groups of detected blocks in the next frame, which constitutes a spatio-temporal graph over the whole set of block groups. Thirdly, the temporal connections of the spatial graphs are checked to remove the block groups that are not part of real moving objects and to track the segmented block groups as moving objects by their attribute similarities.

    ___________________________

    This work was supported by the R&D program of MKE/IITA

    [A1100-0801-3015, Development of Open-IPTV Technologies for

    Wired and Wireless Networks]


This paper is organized as follows. We first define a spatio-temporal graph with graph attributes in Section 2. Section 3 describes a method of removing noisy objects in the proposed spatio-temporal graph, and a method of tracking moving objects is described in Section 4. Region refinement for the detected block groups is discussed in Section 5, and the experimental results are presented in Section 6. Finally, Section 7 concludes our work.

    2. SPATIO-TEMPORAL GRAPH

In H.264/AVC, each MB is encoded in a block partition mode among the 16×16 ~ 4×4 block partitions for Inter prediction coding or among the 4×4, 8×8 and 16×16 block modes for Intra prediction coding. Since the object regions are represented by groups of 4×4 blocks which may include non-zero motion vectors and/or non-zero residues, the blocks having non-zero motion vectors or non-zero residues are detected in 4×4 units and clustered into groups. Note that the motion vectors of the detected 4×4 blocks are copied from their respective block partitions in the MBs.

A block group in a frame is defined as a set of detected blocks whose block boundaries touch. Each block group is considered a moving object candidate and forms one subgraph in which a vertex represents a 4×4 block and an edge connects a pair of touching blocks. Thus one frame may contain several block groups that represent the moving object candidates (i.e. the groups can be real moving objects or noise).
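As an illustration of this clustering step, the following sketch groups detected 4×4 blocks into block groups by 4-connected flood fill. The grid layout and the function name `cluster_blocks` are assumptions of this sketch, not part of the original method description.

```python
from collections import deque

def cluster_blocks(detected):
    """Group detected 4x4 blocks into block groups (subgraphs).

    `detected` is a 2D boolean grid, one entry per 4x4 block, True where
    the block has a non-zero motion vector or non-zero residue.
    Blocks whose boundaries touch (4-connectivity) fall into one group.
    """
    rows, cols = len(detected), len(detected[0])
    label = [[-1] * cols for _ in range(rows)]
    groups = []
    for i in range(rows):
        for j in range(cols):
            if detected[i][j] and label[i][j] < 0:
                # BFS flood fill: collect every touching detected block.
                queue, group = deque([(i, j)]), []
                label[i][j] = len(groups)
                while queue:
                    y, x = queue.popleft()
                    group.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and detected[ny][nx] and label[ny][nx] < 0):
                            label[ny][nx] = len(groups)
                            queue.append((ny, nx))
                groups.append(group)
    return groups  # each group is one moving-object candidate
```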

Now, we define a spatial graph, which simply represents the whole set of subgraphs in a frame. Notice that each subgraph can be regarded as a super-vertex and that there is no connection between super-vertices in the spatial graph. In general, the super-vertices in a frame have corresponding super-vertices in the following frame. Therefore, a spatio-temporal graph is defined by temporally connecting the super-vertices to their corresponding super-vertices between pairs of consecutive frames in a video sequence. Note that the spatio-temporal graph does not grow in time; instead, it slides forward from frame to frame. Next, this graph-based representation for defining moving object candidates is explained in detail.

Let $G = \{g_1, \ldots, g_N; N \geq 0\}$ be the set of spatial graphs in a frame, where each spatial graph $g_n = (V, E, a)$ is an undirected attributed graph that represents a moving object candidate. Here $N$ is the number of detected moving objects. The vertex set $V = \{v_1, v_2, \ldots, v_{|g_n|}\}$ denotes the blocks in a block group, and an edge indicator $E_{u,v} \in \{0, 1\}$ between two vertices $u$ and $v$ denotes whether the two adjacent blocks are connected. The order $|g_n|$ is the number of blocks in the group. The attribute of a vertex is defined as $a_{g_n}(v) = \{c(v), D(v), M(v), e(v)\}$, where the elements denote the location, direction, magnitude and energy of the block, respectively, which characterize the corresponding object. By this definition, each detected object is represented as a subgraph $g_n$ in the spatial graph of a frame. These attributes will be used to track the objects of interest by correctly identifying them in video sequences.

The location $c(v) = (i, j)$, $i \leq I$, $j \leq J$, indicates the x and y coordinates of the block relative to the top-left edge of the frame. The direction is a real number ranging from $-\pi$ to $\pi$, calculated from the motion vector of the block as $D(v) = \tan^{-1}\!\left(mv_{ij}^{y} / mv_{ij}^{x}\right)$, where $mv_{ij}^{x}$ and $mv_{ij}^{y}$ are the x and y components of that motion vector. The magnitude, which indicates how far the block is moving, is given by $M(v) = \|mv_{ij}\|$. The energy of the block is a nonnegative real number calculated from the average of the squared residues in the block, given by $e(v) = \frac{1}{K}\sum_{k} r_{ijk}^{2}$. Here $r_{ijk}$ is the residue of the $k$-th pixel of the block at $(i, j)$ and $K$ is the number of pixels in which the residue is not zero.
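A minimal sketch of how these four vertex attributes could be computed for one block, assuming the motion vector and the 16 residue values have already been extracted from the bitstream; `vertex_attributes` and its argument layout are hypothetical, and `atan2` stands in for the $\tan^{-1}$ above so the full $(-\pi, \pi]$ range is covered.

```python
import math

def vertex_attributes(i, j, mv, residues):
    """Attributes a(v) = (c(v), D(v), M(v), e(v)) of one 4x4 block.

    i, j     : block coordinates relative to the top-left of the frame
    mv       : (mv_x, mv_y) motion vector copied from the block partition
    residues : iterable of the 16 residue values of the block
    """
    c = (i, j)                        # location
    d = math.atan2(mv[1], mv[0])      # direction in (-pi, pi]
    m = math.hypot(mv[0], mv[1])      # magnitude |mv|
    nonzero = [r for r in residues if r != 0]
    # energy: mean of squared residues over the K non-zero pixels
    e = sum(r * r for r in nonzero) / len(nonzero) if nonzero else 0.0
    return c, d, m, e
```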

Between consecutive frames, a spatio-temporal graph is constructed by defining a weighted graph whose vertices are composed of the subgraphs of the spatial graphs from five consecutive frames. We define a weighted spatio-temporal graph $\bar{G} = (\bar{V}, \bar{E}, w)$ where the vertices are defined as

$$\bar{V} = \left\{ v^{f}_{n_0}, \ldots, v^{f}_{N_0},\; v^{f-1}_{n_{-1}}, \ldots, v^{f-1}_{N_{-1}},\; v^{f-2}_{n_{-2}}, \ldots, v^{f-2}_{N_{-2}},\; v^{f-3}_{n_{-3}}, \ldots, v^{f-3}_{N_{-3}},\; v^{f-4}_{n_{-4}}, \ldots, v^{f-4}_{N_{-4}} \right\} \quad (1)$$

and the edges, i.e. the relations between two vertices in two consecutive frames, are defined as

$$\bar{E} = \left\{ \left(v^{f}_{n_0}, v^{f-1}_{n_{-1}}\right), \left(v^{f-1}_{n_{-1}}, v^{f-2}_{n_{-2}}\right), \left(v^{f-2}_{n_{-2}}, v^{f-3}_{n_{-3}}\right), \left(v^{f-3}_{n_{-3}}, v^{f-4}_{n_{-4}}\right) \right\} \quad (2)$$

where $n_0$, $n_{-1}$, $n_{-2}$, $n_{-3}$ and $n_{-4}$ are the indices of the vertices in frame $f$ to frame $f-4$, respectively, and $N_0$, $N_{-1}$, $N_{-2}$, $N_{-3}$ and $N_{-4}$ are the total numbers of vertices in the corresponding frames. Thus vertex $v^{f}_{n}$ denotes the subgraph $g_n$ in frame $f$. The weight $w$ of an edge is determined by calculating the similarity in distance between two vertices, given by

$$w\!\left(v^{f-N}_{n_{-N}}, v^{f-(N+1)}_{n_{-(N+1)}}\right) = \left\| c\!\left(v^{f-N}_{n_{-N}}\right) - c\!\left(v^{f-(N+1)}_{n_{-(N+1)}}\right) \right\| \quad (3)$$

where $f-N$ and $f-(N+1)$ denote the indices of two adjacent frames, and $N = 0, 1, 2, 3$. The centroid $c(v)$ is the mean of the locations of all subvertices in $v$ (the vertices of subgraph $g$). Fig. 1 illustrates an example of a spatio-temporal graph $\bar{G}$.
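The following sketch builds the weighted spatio-temporal graph over a five-frame window, assuming each subgraph is given as a list of block locations. The Euclidean centroid distance is used as the weight of (3), and the names `centroid` and `build_st_graph` are illustrative only.

```python
def centroid(subgraph):
    """Mean location of all subvertices (blocks) in a subgraph."""
    xs = [p[0] for p in subgraph]
    ys = [p[1] for p in subgraph]
    return sum(xs) / len(xs), sum(ys) / len(ys)

def build_st_graph(frames):
    """Spatio-temporal graph over a sliding window of five frames.

    `frames` lists the five frames newest first (f, f-1, ..., f-4); each
    frame is a list of subgraphs, each subgraph a list of block
    locations. Returns weighted edges ((t, n), (t+1, m), w), where w is
    the centroid distance of equation (3) between candidate pairs.
    """
    edges = []
    for t in range(len(frames) - 1):
        for n, g_cur in enumerate(frames[t]):
            for m, g_prev in enumerate(frames[t + 1]):
                (x1, y1), (x2, y2) = centroid(g_cur), centroid(g_prev)
                w = ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
                edges.append(((t, n), (t + 1, m), w))
    return edges
```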

    3. GRAPH PRUNING AND PROJECTION


Object detection and tracking in the compressed domain often suffers from falsely detected blocks that are not part of moving objects, due to intensity changes and the movement of background clutter such as shaking trees, as well as due to fine quantization during encoding. To prevent such false blocks from being detected as parts of moving objects (and, furthermore, from being tracked), noise filtering is applied by pruning the spatio-temporal graph $\bar{G}$.

By assuming that the position of a moving object in a frame is very close to that of the corresponding moving object in the next frame (within 1 block, or 4 pixels, away), we can remove the subgraphs $g$ resulting from noisy blocks by pruning the vertices and edges of the spatio-temporal graph $\bar{G}$ whose edge weights are larger than 4 pixels. Fig. 2 illustrates an example of the edge weights of a spatio-temporal graph for two consecutive frames.

Fig. 3 shows an example of graph pruning to remove noisy subgraphs for the Speedway sequence. In Fig. 3(a), the subgraphs are produced by moving objects as well as background clutter (shaking trees as noise). The spatio-temporal graph $\bar{G}$ constructed from five consecutive frames reveals that some subgraphs are isolated while the others are clustered into groups, namely G1, G2 and G3, as shown in Fig. 3(b). Further observation shows that only groups G2 and G3 have edges across the five consecutive frames. Therefore, by graph pruning, we can prune all vertices except those in groups G2 and G3, which are determined to be the real moving objects. Fig. 3(c) shows the result of graph pruning, where only the subgraphs of the real objects survive.
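A minimal sketch of the pruning rule, assuming the edge list produced above: edges heavier than 4 pixels are discarded, and vertices left without any temporal correspondence are treated as noise. The 4-pixel default mirrors the one-block assumption; the function name is hypothetical.

```python
def prune_st_graph(edges, max_weight=4.0):
    """Prune the spatio-temporal graph (a minimal sketch).

    An edge heavier than `max_weight` (one 4x4 block, i.e. 4 pixels of
    centroid displacement) is treated as a false temporal
    correspondence and removed. A vertex that loses all of its edges is
    isolated noise (e.g. background clutter) and is pruned as well.
    """
    light = [(u, v, w) for (u, v, w) in edges if w <= max_weight]
    survivors = set()
    for u, v, _ in light:
        survivors.add(u)   # vertex keeps at least one light edge
        survivors.add(v)
    return light, survivors
```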

In other cases, improper motion compensation or insignificant frame differences may cause blocks that are supposed to represent moving objects to contain zero motion vectors or no residue data. In this situation, graph pruning may remove vertices that actually represent moving objects. To handle this problem, a graph projection is performed after graph pruning to recover the missing vertices.

To avoid improper projection of noisy block groups (subgraphs), the graph projection is performed after graph pruning and only when the number of subgraphs decreases or becomes zero in two consecutive frames. We first label the vertices of the spatio-temporal graph $\bar{G}$ in two consecutive frames. Let the vertex in frame $f-1$ be $v^{f-1}_{m}$, $m \leq N_{-1}$, and the corresponding missing vertex to be found by projection in frame $f$ be $v^{f}_{n}$, $n \leq N_{0}$, where $N_{0}$ and $N_{-1}$ are the numbers of vertices in the current and previous frames, respectively. The missing vertex in the current frame is projected from the previous frame. Therefore $v^{f}_{n} = v^{f-1}_{m}$, where its position is calculated as

$$c\!\left(v^{f}_{n}\right) = c\!\left(v^{f-1}_{m}\right) + \alpha \cdot mv\!\left(v^{f-1}_{m}\right) \quad (4)$$

where $mv(v)$ is the motion vector of $v$ and $\alpha = 0.5$ is a regulator constant that prevents the projected vertex from shifting too far from the actual object position.
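Equation (4) amounts to a one-line projection, sketched below under the assumption that the centroid and mean motion vector of the matched vertex in frame $f-1$ are available.

```python
ALPHA = 0.5  # regulator constant alpha from equation (4)

def project_vertex(c_prev, mv_prev, alpha=ALPHA):
    """Project a missing vertex into the current frame, per eq. (4).

    c_prev  : centroid of the matched vertex in frame f-1
    mv_prev : mean motion vector of that vertex
    The projected position is c + alpha * mv; alpha = 0.5 keeps the
    projection close to the previous object position.
    """
    return (c_prev[0] + alpha * mv_prev[0],
            c_prev[1] + alpha * mv_prev[1])
```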

    4. GRAPH-BASED OBJECT TRACKING

The attributes of the vertices in a spatio-temporal graph $\bar{G}$ are used to track the detected objects by correctly identifying them in video sequences. Object tracking is performed by vertex matching between the current frame and a past reference frame based on attribute similarity. For vertex matching, the attributes of the vertices are compared. The reference frame for vertex matching can be selected from the frames preceding the current frame, depending on the change in the order of the spatial graph.

    4.1. Adjacent vertex matching

Vertex matching is performed by simply matching two vertices with similar location attribute values in two consecutive frames, using the previous frame $f-1$ as the reference frame for the current frame $f$.

Fig. 2. (a) An example graph in which red circles represent vertices from the current frame and blue circles represent vertices from the previous frame. (b) The edge weights of the same graph.

Fig. 3. (a) A frame from the Speedway sequence with the superimposed spatio-temporal graph $\bar{G}$. (b) The vertices and edges from five consecutive frames. (c) The result of graph pruning.

Fig. 1. An example of a spatio-temporal graph $\bar{G}$ constructed over five consecutive frames ($f$, $f-1$, $f-2$, $f-3$, $f-4$). The edges show the correspondences between two vertices in two consecutive frames.

  • 8/2/2019 Graph-based Object Detection and Tracking in H.264/AVC Bitstreams for Surveillance Video

    4/7

The matching between the vertices in frames $f-1$ and $f$ is determined by finding two similar vertices whose edge weight is smaller than an adaptive threshold. The threshold is determined by the block-unit size and the magnitude of the motion vectors, in order to handle the significant position changes caused by fast object motion.
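A sketch of adjacent vertex matching under these definitions; since the exact form of the adaptive threshold is not spelled out in the text, this sketch assumes block-unit size plus the mean motion-vector magnitude, and the data layout is hypothetical.

```python
def adjacent_match(curr, prev, block=4.0):
    """Greedy adjacent vertex matching between frames f and f-1.

    curr/prev map vertex id -> (centroid, mean |mv|).  The threshold
    adapts to the block-unit size plus the motion-vector magnitude, so
    fast objects with large displacement can still be matched.
    """
    matches = {}
    for n, (c_n, mag_n) in curr.items():
        best, best_w = None, float("inf")
        for m, (c_m, _) in prev.items():
            w = ((c_n[0] - c_m[0]) ** 2 + (c_n[1] - c_m[1]) ** 2) ** 0.5
            if w < best_w:
                best, best_w = m, w
        if best is not None and best_w <= block + mag_n:  # adaptive threshold
            matches[n] = best
    return matches
```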

    4.2. Conditional vertex matching

Under certain conditions where vertex attributes cannot be obtained, for example in the case of occlusion, vertex matching is performed by taking into account the change in the order of the spatial graph and the selection of a different reference frame.

The change in the orders of the spatial graphs between two frames is defined as $\delta \in \{-1, 0, 1\}$, where $-1$ denotes a decrement of the number of vertices, $0$ denotes no change in the number of vertices, and $1$ denotes an increment of the number of vertices. Since $\delta$ does not explicitly determine the number of detected objects, we need to know whether the number of objects in a frame has really changed due to occlusion or not.

Let $S(v) = (s_0, s_1)$, $s \in \{0, 1\}$, denote the status given to a vertex to indicate whether occlusion has occurred in a frame. Here $S_0$ is the default status, indicating that neither occlusion-just-happened (OJH) nor occlusion-just-finished (OJF) occurred in the frame, and $S_1$ is the status indicating that either OJH or OJF occurred in the frame. One vertex is restricted to have only one status per frame. The statuses are determined as follows (a small sketch of this update rule follows the list):

- Default status $S_0 = 1$ is initially set in the start frame for all vertices.
- OJH status $S_1 = 1$ is set when the distance between two vertices in the frame prior to the occlusion is smaller than a block-unit size and $\delta = -1$.
- OJF status $S_1 = 0$ is set when the distance between two objects in the frame after the occlusion has ended is smaller than a block-unit size and $\delta = +1$.
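A minimal sketch of this status update, assuming the inter-vertex distance and the order change $\delta$ have already been computed; the tuple encoding of $(S_0, S_1)$ is an assumption of the sketch.

```python
def update_status(dist, delta, status, block=4.0):
    """Update the occlusion status (S0, S1) of a vertex, Section 4.2.

    dist   : distance between the two candidate vertices (pixels)
    delta  : change in the order of the spatial graph (-1, 0, +1)
    status : current (s0, s1); s0=1 is the default status, s1=1 means
             occlusion-just-happened (OJH) is in effect.
    """
    s0, s1 = status
    if dist < block and delta == -1:
        return (0, 1)   # OJH: two nearby vertices merged into one
    if dist < block and delta == +1:
        return (1, 0)   # OJF: occlusion ended, back to default
    return (s0, s1)     # otherwise keep the current status
```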

The reference frame for conditional vertex matching is selected based on the last frame in which the objects were occluded. Based on this condition, we perform vertex matching by selecting one preceding frame $f-\tau$ as the reference frame. The weight between vertices in frames $f$ and $f-\tau$ is defined as

$$w\!\left(v^{f}_{n}, v^{f-\tau}_{n}\right) = S_0\!\left(v^{f}_{n}\right)\left\| c\!\left(v^{f}_{n}\right) - c\!\left(v^{f-\tau}_{n}\right) \right\| + S_1\!\left(v^{f}_{n}\right)\left( \left| D'\!\left(v^{f}_{n}\right) - D'\!\left(v^{f-\tau}_{n}\right) \right| + \left| e'\!\left(v^{f}_{n}\right) - e'\!\left(v^{f-\tau}_{n}\right) \right| \right) \quad (5)$$

where $S_0(v^{f}_{n})$ is the status of $v^{f}_{n}$ in the default status and $S_1(v^{f}_{n})$ is the status of $v^{f}_{n}$ in the OJH status. Here, the weight of an edge now takes into account the direction and the energy of a vertex as similarity features. The direction and energy of $v$ are calculated as the means of the directions and energies of all subvertices in $v$. Since the ranges of the direction values and the energy values are significantly different, we rescale the values to balance the difference and to make a fair comparison between the two attributes. Therefore, in (5), the direction $D'$ and energy $e'$ are defined as the base-10 logarithms of their original values.
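Equation (5) could be evaluated as sketched below, assuming each vertex carries its centroid and its mean direction and energy; the epsilon guarding the base-10 logarithms is an addition of this sketch, not part of the paper.

```python
import math

def conditional_weight(v_f, v_ref, s0, s1):
    """Edge weight of equation (5) between frames f and f-tau.

    Each vertex is (centroid, mean_direction, mean_energy).  Direction
    and energy are compared on a base-10 log scale to balance their
    ranges; eps guards against log of zero (a sketch assumption).
    """
    (c1, d1, e1), (c2, d2, e2) = v_f, v_ref
    eps = 1e-9
    dist = math.hypot(c1[0] - c2[0], c1[1] - c2[1])
    d_term = abs(math.log10(abs(d1) + eps) - math.log10(abs(d2) + eps))
    e_term = abs(math.log10(e1 + eps) - math.log10(e2 + eps))
    return s0 * dist + s1 * (d_term + e_term)
```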

Detecting a new object is relatively simple. When a new vertex of the spatio-temporal graph is detected in subsequent frames and neither adjacent vertex matching nor conditional vertex matching can find a similar vertex in the reference frame, the vertex is identified as a new object.

    5. ROI REFINEMENT

In this stage, we define a region of interest (ROI) for a moving object as the rectangle that encloses its block group (subgraph), and refine the ROI size by controlling the width and height of the ROI so that the refined ROI fits the real object size, adapting to the changes in the order of each subgraph in the spatial graphs.

Recalling that the subgraph $g_n$ represents the $n$-th object in a frame, we define the ROI of the $n$-th object in frame $f$ as $O^{f}_{n} = (c(g_n), \omega, \eta)$, where $\omega$ and $\eta$ are the width and the height of the ROI, respectively, and $c(g_n)$ is the centroid of the subgraph. The width is determined from the number of vertices along the horizontal direction of $g_n$ and the height from the vertical direction. The centroid is calculated as the mean of the locations of the vertices in the subgraph.

The refinement is performed by observing the size and centroid of the ROI every five frames. That is, every five frames the size of the ROI is computed and compared to the refined ROI of the preceding check. The refinement of the ROI size is then performed according to the following condition:

$$\hat{O}^{f}_{n} = \begin{cases} \frac{3}{4}\,\hat{O}^{f-1}_{n}, & \text{if } O^{f}_{n} : \hat{O}^{f-1}_{n} < 3:4 \\[2pt] \frac{4}{3}\,\hat{O}^{f-1}_{n}, & \text{if } O^{f}_{n} : \hat{O}^{f-1}_{n} > 4:3 \\[2pt] \frac{1}{2}\left(O^{f}_{n} + \hat{O}^{f-1}_{n}\right), & \text{otherwise} \end{cases} \quad (6)$$

where $\hat{O}^{f}_{n}$ is the refined ROI and $\hat{O}^{f-1}_{n}$ is the previously refined ROI from the preceding five frames.
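A sketch of the size-refinement rule of (6), applied independently to width and height; the branch ratios follow the reconstruction above and should be read as an approximation, and `refine_roi_size` is a hypothetical helper.

```python
def refine_roi_size(size_f, size_prev):
    """ROI size refinement per equation (6), checked every five frames.

    size_f    : (width, height) measured in the current frame
    size_prev : previously refined (width, height)
    If the size jumps past the 3:4 (or 4:3) ratio it is pulled back
    toward the previous refined size; otherwise the two sizes are
    averaged to smooth out small fluctuations.
    """
    refined = []
    for s, p in zip(size_f, size_prev):
        if s < 0.75 * p:            # shrank past the 3:4 ratio
            refined.append(0.75 * p)
        elif s > (4.0 / 3.0) * p:   # grew past the 4:3 ratio
            refined.append((4.0 / 3.0) * p)
        else:                       # otherwise: average the two sizes
            refined.append(0.5 * (s + p))
    return tuple(refined)
```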

In many cases, the ROIs may have different centroids from frame to frame due to changes in the number and position of the vertices in the subgraphs. As a result, the positions of the ROIs may fluctuate. To reduce large fluctuations in the ROI positions, the centroids of the ROIs are controlled by restricting their movement relative to those of the corresponding ROIs in the previous frame. The changes in the centroids of the ROIs are restricted to within a 4-pixel distance. If the centroid of an ROI moves beyond the 4-pixel distance, the ROI displacement is retracted to within the 4-pixel distance. By doing so, a reliable position for the ROI can be ensured within the real moving object area.
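The 4-pixel centroid restriction can be implemented as a simple clamp, sketched below; `clamp_centroid` is a hypothetical helper.

```python
def clamp_centroid(c_new, c_prev, limit=4.0):
    """Restrict ROI centroid movement to `limit` pixels per frame.

    If the centroid moved farther than the limit, its displacement is
    retracted onto the 4-pixel radius around the previous centroid.
    """
    dx, dy = c_new[0] - c_prev[0], c_new[1] - c_prev[1]
    dist = (dx * dx + dy * dy) ** 0.5
    if dist <= limit:
        return c_new
    scale = limit / dist
    return (c_prev[0] + dx * scale, c_prev[1] + dy * scale)
```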


In the case of occlusion, when two subgraphs in a frame are merged, they are represented by only one ROI. To keep tracking the detected objects even under occlusion, we observe the attributes of the vertices in the occluded subgraph (block group) and cluster the vertices with similar attributes that represent each occluded object. We can therefore reconstruct the ROIs of both subgraphs during the occlusion.

Fig. 4 illustrates ROI reconstruction during occlusion. At one frame prior to the occlusion, the ROI sizes of the objects about to be occluded are stored in a so-called ROI memory. During the occlusion, the vertices can be clustered according to their attribute similarities based on the attribute values of both subgraphs prior to the occlusion. Therefore, the reconstruction of the ROIs during the occlusion can be done by simply assigning the ROIs of both objects prior to occlusion to the locations of the clusters of vertices, as shown in Fig. 4. After the occlusion is finished, the ROIs of both objects are again determined normally as the rectangles that enclose the block groups of each detected object.
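A minimal sketch of the attribute-based clustering used to split a merged subgraph into its two occluded objects, assuming the pre-occlusion mean direction and energy of each object were stored in the ROI memory; the distance measure is an assumption of the sketch.

```python
def split_occluded(vertices, ref_a, ref_b):
    """Assign the vertices of a merged subgraph to two occluded objects.

    vertices : list of (direction, energy) attributes in the merged group
    ref_a/b  : mean (direction, energy) of each object before occlusion
    Each vertex joins the object whose pre-occlusion attributes it
    resembles most; the stored ROI sizes are then placed at the two
    resulting cluster locations.
    """
    cluster_a, cluster_b = [], []
    for idx, (d, e) in enumerate(vertices):
        da = abs(d - ref_a[0]) + abs(e - ref_a[1])
        db = abs(d - ref_b[0]) + abs(e - ref_b[1])
        (cluster_a if da <= db else cluster_b).append(idx)
    return cluster_a, cluster_b
```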

Fig. 4. Illustration of reconstructing the ROIs of two objects ($O^{f}_{1}$, $O^{f}_{2}$) before, at the start of, and during occlusion: dashed rectangles are the ROIs of the encapsulated subgraphs, and the ROI memory stores the ROI sizes and attributes prior to occlusion.

6. EXPERIMENTAL RESULTS

We use three test video sequences for the experiments: the Speedway, PETS2001, and Shinji sequences of 352×288, 384×288, and 320×240 pixel resolution, respectively. All the test sequences are encoded by the H.264/AVC reference software Joint Model 15.1 [10] with quantization parameter 32 in the Baseline profile. The simulation platform for the experiments is a PC with a 2.4 GHz CPU and 2 GB RAM.

Fig. 5 shows the tracking results of our proposed method with a superimposed snapshot of five Speedway sequence frames taken every ten frames. The detected object regions, as ROIs, are shown in the superimposed snapshot as rectangular boxes. For better visibility, simple brightness and contrast adjustments were made on the superimposed snapshot. From the superimposed ROIs, we can observe the speeds of the two detected objects and the size changes of their respective ROIs. When an object moves fast, as for Object 1, the motion displacement becomes large; the ROI approximation is therefore less accurate and includes non-object area. On the other hand, when an object moves slowly, as for Object 2, the ROI approximation becomes quite accurate, tightly encompassing the object region. In general, the proposed method can detect and localize objects of small size such as Object 1, as shown in Fig. 5.

Fig. 5. Snapshot of five superimposed frames from the Speedway sequence.

Fig. 6 shows a superimposed snapshot of five PETS2001 sequence frames taken every five frames, for which the proposed method also works well regardless of object size.

Fig. 6. Snapshot of five superimposed frames from the PETS2001 sequence.

Fig. 7 shows a series of snapshots of PETS2001 sequence frames taken every ten frames, highlighting the detection and tracking performance of the proposed method under occlusion.

Fig. 7. A series of snapshots of PETS2001 sequence frames during the occlusion of two objects.



The rectangular boxes in the snapshots indicate the ROI regions of the moving objects detected and identified by the proposed method. Object 1 (a person, marked as 1 in the red rectangular box) and Object 2 (a car, marked as 2 in the green rectangular box) are separate in the first snapshot. As shown in the subsequent snapshots, Object 1 is occluded by Object 2 in the third and fourth snapshots, and the two are then successfully detected and identified as separate moving objects in the fifth and sixth snapshots. Although there are several frames where the ROI of Object 2 is larger than the real object size, the rectangular boxes as ROIs are visually acceptable for distinguishing the different moving objects.

Fig. 8 shows a superimposed snapshot of five Shinji sequence frames taken every forty frames. Most of the ROIs are obtained by projection of the vertices of the spatio-temporal graph, since blocks are missing due to zero motion vectors and/or no residues. As the object moves closer to the camera, the real object and the ROI grow in size. The proposed method successfully detects and locates the moving object under this change in size, as shown in Fig. 8.

Fig. 8. Snapshot of five superimposed frames from the Shinji sequence.

    7. CONCLUSIONS

We have presented a graph-based method for detecting and tracking moving objects in the H.264/AVC bitstream domain by constructing a spatio-temporal graph from the detected blocks with non-zero motion vectors and/or non-zero residues. The detected blocks are clustered into block groups, and the block groups are represented as subgraphs which constitute a spatial graph in each frame. The temporal connections between the spatial graphs of two frames create a spatio-temporal graph in which the edge between two super-vertices represents the correspondence of the same object in the two frames. The spatial graph enables the representation of moving objects in each frame, even for objects of small size, and the identification of ROIs for the detected objects during occlusion. The spatio-temporal graph can be used to recognize whether the detected blocks are real or false moving objects, based on the edge weights between super-vertices, by graph pruning. The spatio-temporal graph also enables objects of interest to be accurately identified from frame to frame, even when the detected objects are occluded, and to be detected and tracked under changes in size.

    8. REFERENCES

[1] W. You, M. S. H. Sabirin, and M. Kim, "Moving Object Tracking in H.264/AVC Bitstream," in Multimedia Content Analysis and Mining 2007, N. Sebe, Y. Liu, Y. Zhuang, and T. Huang (Eds.), Springer-Verlag, Berlin, Heidelberg, pp. 483-492.

[2] W. You, M. S. H. Sabirin, and M. Kim, "Real-time detection and tracking of multiple objects with partial decoding in H.264/AVC bitstream domain," in Real-Time Image and Video Processing 2009, vol. 7244, no. 1, 2009, 72440D.

[3] C. Käs, M. Brulin, H. Nicolas, and C. Maillet, "Compressed domain aided analysis of traffic surveillance videos," in Distributed Smart Cameras (ICDSC 2009), Third ACM/IEEE International Conference on, pp. 1-8, Aug. 30-Sept. 2, 2009.

[4] C. Poppe, S. De Bruyne, T. Paridaens, P. Lambert, and R. Van de Walle, "Moving object detection in the H.264/AVC compressed domain for video surveillance applications," J. Vis. Commun. Image Represent., vol. 20, no. 6, pp. 428-437, Aug. 2009.

[5] S. K. Kapotas and A. N. Skodras, "Moving object detection in the H.264 compressed domain," in Imaging Systems and Techniques (IST), 2010 IEEE International Conference on, pp. 325-328, July 1-2, 2010.

[6] C. Käs and H. Nicolas, "An Approach to Trajectory Estimation of Moving Objects in the H.264 Compressed Domain," in Proceedings of the 3rd Pacific Rim Symposium on Advances in Image and Video Technology (PSIVT '09), T. Wada, F. Huang, and S. Lin (Eds.), Springer-Verlag, Berlin, Heidelberg, pp. 318-329.

[7] J. Mooser, S. You, and U. Neumann, "Real-Time Object Tracking for Augmented Reality Combining Graph Cuts and Optical Flow," in Mixed and Augmented Reality (ISMAR 2007), 6th IEEE and ACM International Symposium on, pp. 145-152, Nov. 13-16, 2007.

[8] Z. Guanling, W. Yuping, and D. Nanping, "Graph based visual object tracking," in Computing, Communication, Control, and Management (CCCM 2009), ISECS International Colloquium on, vol. 1, pp. 99-102, Aug. 8-9, 2009.

[9] V. Pallavi, J. Mukherjee, A. K. Majumdar, and S. Sural, "Graph-Based Multiplayer Detection and Tracking in Broadcast Soccer Videos," IEEE Transactions on Multimedia, vol. 10, no. 5, pp. 794-805, Aug. 2008.


[10] Dolby Laboratories Inc., Fraunhofer-Institute HHI, and Microsoft Corporation, "H.264/14496-10 AVC Reference Software," http://iphome.hhi.de/suehring/tml/.