Graph-based Object Detection and Tracking in H.264/AVC Bitstreams for Surveillance Video
GRAPH-BASED OBJECT DETECTION AND TRACKING IN H.264/AVC BITSTREAMS FOR
SURVEILLANCE VIDEO
Houari Sabirin, Jaeil Kim and Munchurl Kim
Department of Information and Communications Engineering,
Korea Advanced Institute of Science and Technology, Daejeon, Korea
[email protected], [email protected], [email protected]
ABSTRACT
In this paper we present a novel method to detect and track
moving objects in H.264/AVC bitstreams by processing
motion vector and residue information. The encoded blocks
with nonzero motion vectors and residues are first detected
as moving object candidates. A spatio-temporal graph in
video sequences is then constructed to represent groups of
blocks in each frame and their associations to the other
groups of blocks in subsequent frames. Identification and
refinement of ROIs for moving objects being tracked are
done by graph matching and adaptive ROI-size adjustment.
The experimental results show that the proposed method
can correctly identify real moving objects from frame to
frame, can effectively detect small-sized objects and
objects with small motion vectors and residues, and can
recognize moving objects even under occlusion.
Index Terms: object detection and tracking, graph
theory, H.264/AVC, surveillance video
1. INTRODUCTION
Object detection and tracking in the compressed bitstream
domain has been an interesting and challenging topic in
surveillance video analysis because the moving objects are
detected not directly from the visible object data but from the
encoded data that represent the motion and pixel differences
caused by the moving objects. The task is to precisely
locate and identify the moving object regions and the
resulting trajectories, relying only on the limited
information available in the compressed bitstreams.
In the H.264/AVC bitstream domain in particular, some
research has been conducted to automatically detect and
track moving objects of interest. Some techniques based on
partial decoding, proposed in [1] and [2], utilize additional
information such as object colors to distinguish an object of
interest from other objects, but these methods may incur
high computational complexity. Another method
using partial decoding, proposed in [3], detects moving
vehicles in traffic recordings and may not be suitable for
general applications. A method using the bit size of block
partitions has shown good precision in detecting moving
objects [4]. While the shapes of the detected objects closely
approximate the real object boundaries at a precision of 4×4
block units, the method does not identify the different detected
objects. Similar results are given in [5], where the moving
objects are detected via motion vector processing but the
detected objects are not identified. The method proposed in
[6] presents the labels and trajectories of the detected
objects. However, it assumes that there is no noise due to
illumination changes or an improper encoding process, which
is not usually the case in real applications.
On the other hand, graph theory has long been used as
one of the effective tools for object segmentation in
computer vision. A graph cut algorithm has been popularly
utilized for image segmentation in video sequence, which
observes the similarities and dissimilarities in terms of
energy between pixels which are represented as vertices. It
has shown effective performance in segmenting objects
from background [7]. Graph-based object tracking also has
been utilized to correctly identify two corresponding sets of
graphs between two consecutive frames in video sequences
[8]. Graph-based object detection in pixel domain has also
been studied for sports video in [9] to determine the
trajectories of moving objects. From these observations,
graph-based object detection and tracking might also be
applicable in compressed domain.
In this paper we propose a novel method of object
detection and tracking in compressed domain using a graph-
based approach. Firstly, the blocks that have non-zero
motion vectors and residues are detected as moving object
candidates. Secondly, groups of the detected blocks are
represented as spatial graphs in each frame. Then the groups
of detected blocks as spatial graphs in each frame are
temporally connected to the groups of detected blocks in its
next frame, which constitutes a spatio-temporal graph for
the whole block groups. Thirdly, the temporal connections
of spatial graphs are checked to remove the block groups
that are not part of the real moving objects and to track the
segmented block groups as moving objects by their attribute
similarities.
___________________________
This work was supported by the R&D program of MKE/IITA
[A1100-0801-3015, Development of Open-IPTV Technologies for
Wired and Wireless Networks]
This paper is organized as follows. Section 2 defines a
spatio-temporal graph with graph attributes. Section 3
describes a method of removing noisy objects in
the proposed spatio-temporal graph. Section 4 describes a
method of tracking moving objects. Section 5 discusses region
refinement for the detected block groups. Section 6 presents
the experimental results, and Section 7 concludes the paper.
2. SPATIO-TEMPORAL GRAPH
In H.264/AVC, each macroblock (MB) is encoded in a block
partition mode ranging from 16×16 down to 4×4 for Inter
prediction coding, or in one of the 4×4, 8×8 and 16×16 block
modes for Intra prediction coding. Since the object regions
are represented by groups of 4×4 blocks that may include
non-zero motion vectors and/or non-zero residues, the
blocks having non-zero motion vectors or non-zero residues
are detected in 4×4 units and clustered into groups. Note that
the motion vectors of the detected 4×4 blocks are copied
from their respective block partitions in the MBs.

A block group in a frame is defined as a set of detected
blocks whose block boundaries touch one another.
Each block group is considered a moving object candidate
and forms one single subgraph in which a vertex represents a
4×4 block and an edge connects a pair of blocks that touch.
Thus one frame may contain several block groups
that represent the moving object candidates (i.e. the groups
can be the real moving objects or noise).
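
As an illustration of the grouping step, the following minimal sketch clusters detected 4×4 blocks into block groups with a breadth-first search. The function name `group_blocks`, the input format (a set of block coordinates in 4×4-block units), and the choice of 4-connectivity for "touching" blocks are assumptions of this sketch, not details taken from the method itself.

```python
from collections import deque

def group_blocks(detected):
    """Cluster detected 4x4 blocks into block groups (one subgraph each).

    `detected` is a set of (i, j) block coordinates in 4x4-block units.
    Two blocks are treated as touching when they share a side
    (4-connectivity); this is an assumption of the sketch.
    """
    remaining = set(detected)
    groups = []
    while remaining:
        seed = remaining.pop()
        group, queue = {seed}, deque([seed])
        while queue:
            i, j = queue.popleft()
            for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    group.add(nb)
                    queue.append(nb)
        groups.append(group)
    return groups

# Three touching blocks and one isolated block give two groups.
groups = group_blocks({(0, 0), (0, 1), (1, 1), (5, 5)})
```

Each returned set corresponds to one moving object candidate; whether it is a real object or noise is decided later from its temporal connections.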
Now, we define a spatial graph, which simply represents
the whole set of subgraphs in a frame. Notice that each
subgraph can be regarded as a super-vertex and that there is no
connection between super-vertices in the spatial graph. In
general, the super-vertices in a frame have their
corresponding super-vertices in the following frame.
Therefore, a spatio-temporal graph is defined by temporally
connecting the super-vertices to their corresponding
super-vertices between pairs of consecutive frames in a video
sequence. Note that the spatio-temporal graph does not
grow over time in a video sequence. Instead, it slides
forward from frame to frame. Next, this graph-based
representation for defining moving object candidates is
explained in detail.
Let $G = \{g_1, \ldots, g_N;\ N \ge 0\}$ be a set of spatial graphs in
a frame, where each spatial graph $g_n = (V, E, a)$ is an
undirected attributed graph that represents a moving object
candidate. Here $N$ is the number of the detected moving
objects. The vertex set $V = \{v_1, v_2, \ldots, v_{|g_n|}\}$ denotes the blocks
in a block group, and the edge $E(u, v) \in \{0, 1\}$ between two
vertices $u$ and $v$ denotes the connection between two
adjacent blocks. The order $|g_n|$ is the number of blocks in
the group. An attribute for a vertex is defined as
$a_{g_n}(v) = \{c(v), D(v), M(v), e(v)\}$, $v \in g_n$,
where the elements of
the attribute denote the location, direction, magnitude and
energy of the block, respectively, which characterize the
corresponding object. By this definition, each detected
object is represented as a subgraph $g_n$ in the spatial graph
of a frame. These attributes will be used to track the objects
of interest by correctly identifying them in video sequences.

The location $c(v) = \{(i, j);\ i \in I, j \in J\}$ indicates the x and y
coordinates of the block relative to the top-left corner of the
frame. The direction is a real number (an angle)
calculated from the motion vector of the block as
$D(v) = \tan^{-1}(mv_{y_{ij}} / mv_{x_{ij}})$, where $mv_{x_{ij}}$ and $mv_{y_{ij}}$ are the x and y
components of that motion vector. The magnitude, which
indicates how far the block is moving, is given by
$M(v) = \lVert mv_{ij} \rVert$. The energy of the block is a nonnegative real
number calculated from the average of the residues in the block,
which is given by $e(v) = \frac{1}{K} \sum_k r_{ijk}^2$. Here $r_{ijk}$ is the
residue of the $k$-th pixel of the block at $(i, j)$ and $K$ is the number
of pixels in which the residue is not zero.
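
The four vertex attributes can be computed per block as in the sketch below. It is illustrative only: the function name `block_attributes` and the dictionary keys are hypothetical, `math.atan2` is used in place of a plain arctangent of the quotient so that a zero x-component does not cause a division by zero, and the energy is taken as the mean of the squared nonzero residues, which is this sketch's reading of the energy definition.

```python
import math

def block_attributes(i, j, mv, residues):
    """Return location, direction, magnitude and energy of one 4x4 block.

    i, j     : block position relative to the top-left of the frame
    mv       : (mv_x, mv_y) motion vector copied from the block partition
    residues : iterable of per-pixel residue values of the block
    """
    mv_x, mv_y = mv
    direction = math.atan2(mv_y, mv_x)   # assumption: atan2 instead of atan(y/x)
    magnitude = math.hypot(mv_x, mv_y)   # length of the motion vector
    nonzero = [r for r in residues if r != 0]
    # Mean of squared residues over the K pixels with nonzero residue.
    energy = sum(r * r for r in nonzero) / len(nonzero) if nonzero else 0.0
    return {"c": (i, j), "D": direction, "M": magnitude, "e": energy}

attrs = block_attributes(2, 3, (3, 4), [0, 2, -2, 0])
```

For the example call, the magnitude is 5.0 and the energy is 4.0, since only the two nonzero residues enter the average.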
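Between consecutive frames, a spatio-temporal graph is
constructed by defining a weighted graph whose vertices
are the subgraphs of the spatial graphs from five
consecutive frames. We define a weighted spatio-temporal
graph $\mathbf{G} = (\mathbf{V}, \mathbf{E}, w)$ where the vertices are defined as

$$\mathbf{V} = \{\mathbf{v}_1^{f}, \ldots, \mathbf{v}_{N_0}^{f},\ \mathbf{v}_1^{f-1}, \ldots, \mathbf{v}_{N_{-1}}^{f-1},\ \mathbf{v}_1^{f-2}, \ldots, \mathbf{v}_{N_{-2}}^{f-2},\ \mathbf{v}_1^{f-3}, \ldots, \mathbf{v}_{N_{-3}}^{f-3},\ \mathbf{v}_1^{f-4}, \ldots, \mathbf{v}_{N_{-4}}^{f-4}\} \tag{1}$$

and the edges, the relations between two vertices in two
consecutive frames, are defined as

$$\mathbf{E} = \{(\mathbf{v}_{n_0}^{f}, \mathbf{v}_{n_{-1}}^{f-1}),\ (\mathbf{v}_{n_{-1}}^{f-1}, \mathbf{v}_{n_{-2}}^{f-2}),\ (\mathbf{v}_{n_{-2}}^{f-2}, \mathbf{v}_{n_{-3}}^{f-3}),\ (\mathbf{v}_{n_{-3}}^{f-3}, \mathbf{v}_{n_{-4}}^{f-4})\} \tag{2}$$

where $n_0$, $n_{-1}$, $n_{-2}$, $n_{-3}$ and $n_{-4}$ are the indices of the vertices
in frame $f$ to frame $f-4$, respectively, and $N_0$, $N_{-1}$, $N_{-2}$, $N_{-3}$
and $N_{-4}$ are the total numbers of vertices in the
corresponding frames. Thus vertex $\mathbf{v}_n^f$ denotes the
subgraph $g_n$ in frame $f$. The weight $w$ of an edge is
determined by calculating the similarity in distance between
two vertices, given by

$$w(\mathbf{v}_{n_{-N}}^{f-N}, \mathbf{v}_{n_{-(N+1)}}^{f-(N+1)}) = \lVert c_{\mathbf{v}_{n_{-N}}^{f-N}} - c_{\mathbf{v}_{n_{-(N+1)}}^{f-(N+1)}} \rVert \tag{3}$$

where $f-N$ and $f-(N+1)$ denote the indices of two adjacent
frames, and $N \in \{0, 1, 2, 3, 4\}$. The centroid $c_{\mathbf{v}}$ is the mean
of the locations of all subvertices in $\mathbf{v}$ (the vertices of subgraph
$g$). Fig. 1 illustrates an example of a spatio-temporal graph
$\mathbf{G}$.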
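
The edge weight between two super-vertices in adjacent frames can be sketched as the distance between the centroids of their block groups. The helper names `centroid` and `edge_weight` are hypothetical, and the conversion from 4×4-block units to pixels via `block_size` and the use of the Euclidean norm are assumptions of this sketch.

```python
import math

def centroid(group):
    """Mean location of all blocks (subvertices) in a block group."""
    n = len(group)
    return (sum(i for i, _ in group) / n, sum(j for _, j in group) / n)

def edge_weight(group_a, group_b, block_size=4):
    """Centroid distance, in pixels, between two block groups taken from
    two adjacent frames. Block coordinates are in 4x4-block units
    (assumption), so the distance is scaled by the block size."""
    (ia, ja), (ib, jb) = centroid(group_a), centroid(group_b)
    return block_size * math.hypot(ia - ib, ja - jb)

# Centroids (0, 1) and (1, 1) are one block apart, i.e. 4 pixels.
w = edge_weight({(0, 0), (0, 2)}, {(1, 1)})
```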
3. GRAPH PRUNING AND PROJECTION
Object detection and tracking in the compressed
domain often suffers from falsely detected blocks that
are not part of moving objects, caused by intensity changes and
movement of background clutter such as shaking trees, as
well as by fine quantization during encoding. To prevent
such false blocks from being detected as parts of moving objects
(and, furthermore, from being tracked), noise filtering is applied
by pruning the spatio-temporal graph G.
By assuming that the position of a moving object in a
frame is very close to that of the corresponding moving
object in the next frame (within 1 block, or 4 pixels), we
can remove the subgraphs g that result from noisy blocks
by pruning the vertices and edges of the
spatio-temporal graph G for which the edge
weights are larger than 4 pixels. Fig. 2 illustrates an
example of the edge weights of a spatio-temporal graph for two
consecutive frames.
Fig. 3 shows an example of graph pruning to remove
noisy subgraphs for the Speedway sequence. In Fig. 3(a),
the subgraphs are produced by moving objects as well as by
background clutter (shaking trees as noise). The
spatio-temporal graph G constructed from five consecutive frames
reveals that some subgraphs are isolated while the others
are clustered into the groups G1, G2 and G3, as shown in
Fig. 3(b).

Further observation shows that only groups G2 and G3
have edges across all five consecutive frames. Therefore,
by graph pruning, we can remove all vertices except those in
groups G2 and G3, which are determined to be the real moving
objects. Fig. 3(c) shows the result of graph pruning, where
only the subgraphs of the real objects remain.
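
A minimal sketch of the pruning idea follows. It keeps only the super-vertices that lie on a chain of edges spanning every frame in the window, where each edge joins centroids at most 4 pixels apart. The function name `prune`, the input format (one list of pixel centroids per frame, oldest frame first) and the forward/backward reachability formulation are assumptions made for illustration.

```python
import math

def prune(frames, max_dist=4.0):
    """Return, per frame, the indices of vertices that survive pruning.

    frames   : list of lists of (x, y) centroids in pixels, oldest first
    max_dist : largest edge weight (pixels) accepted between two frames
    """
    def close(a, b):
        return math.dist(a, b) <= max_dist

    n = len(frames)
    # fwd[t]: vertices reachable from the first frame through close hops.
    fwd = [set(range(len(frames[0])))]
    for t in range(1, n):
        fwd.append({k for k, c in enumerate(frames[t])
                    if any(close(frames[t - 1][p], c) for p in fwd[t - 1])})
    # bwd[t]: vertices from which the last frame can still be reached.
    bwd = [set() for _ in range(n)]
    bwd[-1] = set(range(len(frames[-1])))
    for t in range(n - 2, -1, -1):
        bwd[t] = {k for k, c in enumerate(frames[t])
                  if any(close(c, frames[t + 1][q]) for q in bwd[t + 1])}
    # A vertex survives only if it lies on a chain across the whole window.
    return [fwd[t] & bwd[t] for t in range(n)]

frames = [[(10, 10), (50, 50)], [(12, 10)], [(14, 11), (90, 90)],
          [(16, 12)], [(18, 12)]]
kept = prune(frames)
```

In the example, the slowly moving object at index 0 of every frame survives, while the isolated detections at (50, 50) and (90, 90) are pruned.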
In other cases, improper motion compensation or
insignificant frame differences may cause the blocks that are
supposed to represent moving objects to contain zero
motion vectors or no residue data. In this situation, the
graph pruning may remove vertices that actually
represent moving objects. To handle this problem, a graph
projection is performed after graph pruning to recover the
missing vertices.
To avoid improper projection of noisy block groups
(subgraphs), the graph projection is performed after graph
pruning, and only when the number of
subgraphs decreases or becomes zero in two consecutive
frames. We first label the vertices of the spatio-temporal
graph G in two consecutive frames. Let the vertex in frame
$f-1$ be $\mathbf{v}_m^{f-1}$, $m \le N_{-1}$, and the corresponding missing vertex
to be found by projection in frame $f$ be $\mathbf{v}_n^{f}$, $n \le N_0$, where
$N_0$ and $N_{-1}$ are the numbers of vertices in the current and
previous frames, respectively. The missing vertex in the
current frame is projected from the previous frame.
Therefore $\mathbf{v}_n^{f} = \mathbf{v}_m^{f-1}$, where its position is calculated as

$$c_{\mathbf{v}_n^{f}} = c_{\mathbf{v}_m^{f-1}} + \alpha \, mv_{\mathbf{v}_m^{f-1}} \tag{4}$$

where $mv_{\mathbf{v}}$ is the motion vector of $\mathbf{v}$ and $\alpha = 0.5$ is the
regulator constant that keeps the projected vertex from shifting too
far from the actual object position.
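
The projection of a missing vertex can be sketched in a few lines. The function name `project_vertex` is hypothetical, and the sketch assumes that the stored motion vector points in the direction of object motion, so that the scaled vector is added to the previous centroid; if the motion vector is stored with the opposite convention, the sign would have to be flipped.

```python
def project_vertex(prev_centroid, prev_mv, regulator=0.5):
    """Project a vertex missing in the current frame from the previous frame.

    prev_centroid : (x, y) centroid of the vertex in frame f-1
    prev_mv       : (mv_x, mv_y) motion vector of that vertex
    regulator     : constant (0.5) that limits how far the vertex is shifted
    """
    cx, cy = prev_centroid
    mv_x, mv_y = prev_mv
    # Assumption: the motion vector is added to the previous position.
    return (cx + regulator * mv_x, cy + regulator * mv_y)

projected = project_vertex((10.0, 20.0), (4.0, -2.0))
```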
4. GRAPH-BASED OBJECT TRACKING
The attributes of vertices in a spatio-temporal graph G are
used to track the detected objects by correctly identifying
them in video sequences. Object tracking is performed by
vertex matching between the current frame and a past
reference frame based on the attribute similarity. For vertex
matching, the attributes of vertices are compared. A
reference frame for vertex matching can be selected from
the preceding frames of the current frame, depending on the
change in the order of a spatial graph.
4.1. Adjacent vertex matching
Vertex matching is performed by simply matching two
vertices with similar location attribute values in two
consecutive frames (the previous frame f-1 as the reference
frame and the current frame f).
Fig. 1. An example of the spatio-temporal graph G constructed for
five consecutive frames (f, f-1, f-2, f-3 and f-4). The edges show the
correspondences between two vertices in two consecutive frames.
Fig. 2. (a) An example of a graph where red circles represent vertices
from the current frame and blue circles represent vertices from the
previous frame. (b) Edge weights of the same graph.
Fig. 3. (a) A frame from the Speedway sequence with the superimposed
spatio-temporal graph G. (b) The vertices and edges from five
consecutive frames. (c) The result of graph pruning.

The matching between the vertices in frames f-1 and f
can be determined by finding two similar vertices for which the
edge weight is smaller than an adaptive threshold. The threshold is
determined by the block size unit and the magnitude of the
motion vectors, in order to handle significant position changes due to
fast object motion.
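
Adjacent vertex matching can be illustrated as a nearest-centroid search gated by an adaptive threshold. The exact form of the threshold is not given in the text, so the sum of the block size and the motion-vector magnitude used below is purely an assumption, as are the function name `match_adjacent` and the input format.

```python
import math

def match_adjacent(prev, curr, block_size=4.0):
    """Match current-frame vertices to labeled previous-frame vertices.

    prev : dict mapping an object label to its (x, y) centroid in frame f-1
    curr : list of ((x, y) centroid, motion-vector magnitude) in frame f
    Returns a dict: index in `curr` -> matched label, or None if unmatched.
    """
    matches = {}
    for k, (c, magnitude) in enumerate(curr):
        # Assumed adaptive threshold: one block plus the motion magnitude.
        threshold = block_size + magnitude
        best = min(prev, key=lambda lab: math.dist(prev[lab], c), default=None)
        if best is not None and math.dist(prev[best], c) < threshold:
            matches[k] = best
        else:
            matches[k] = None
    return matches

prev = {"A": (10, 10), "B": (40, 40)}
curr = [((12, 10), 1.0), ((47, 40), 6.0), ((80, 80), 0.0)]
result = match_adjacent(prev, curr)
```

This sketch does not enforce a one-to-one assignment; a fuller version would resolve conflicts when two current vertices claim the same label.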
4.2. Conditional vertex matching
Under certain conditions in which vertex attributes cannot be
obtained, for example in the case of occlusion, the vertex
matching is performed by taking into account the
change in the order of a spatial graph and by selecting a
different reference frame.

The change in the orders of the spatial graphs between
two frames is defined as $\delta \in \{-1, 0, 1\}$, where -1 denotes a
decrement in the number of vertices, 0 denotes no change
in the number of vertices, and 1 denotes an increment in
the number of vertices. Since $\delta$ does not explicitly determine
the number of detected objects, we need to know whether
the number of objects in a frame has really changed due
to occlusion or not.

Let $S(\mathbf{v}) = \{s_0, s_1;\ s \in \{0, 1\}\}$ denote the status given to a
vertex to indicate whether or not occlusion has occurred in a
frame. Here $S_0$ is the default status, which indicates that neither
occlusion-just-happened (OJH) nor occlusion-just-finished (OJF)
occurred in a frame, and $S_1$ is the status that
indicates that either OJH or OJF occurred in a frame. One vertex
is restricted to have only one status per frame. The statuses
are determined as follows:
- Default status $S_0 = 1$ is initially set in the start frame for all
  vertices.
- OJH status $S_1 = 1$ is set when the distance between two
  vertices in the frame just before the occlusion is smaller
  than a block-unit size and $\delta = -1$.
- OJF status $S_1 = 0$ is set when the distance between two
  objects in the frame just after the occlusion has ended is smaller
  than a block-unit size and $\delta = +1$.
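
The status rules above can be sketched as two small helpers. The names `order_change` and `occlusion_event`, the string labels, and the 4-pixel block-unit size are hypothetical choices for illustration; the logic only restates the three rules listed above.

```python
def order_change(n_prev, n_curr):
    """Change in the order of the spatial graph between two frames:
    -1 if the number of vertices decreased, 0 if unchanged, +1 if it grew."""
    return (n_curr > n_prev) - (n_curr < n_prev)

def occlusion_event(distance, delta, block_size=4.0):
    """Classify the occlusion event for a pair of vertices.

    distance : distance (pixels) between the two vertices or objects
    delta    : change in the order of the spatial graph (-1, 0 or +1)
    Returns "OJH" (occlusion just happened), "OJF" (occlusion just
    finished), or None when the default status applies.
    """
    if distance < block_size and delta == -1:
        return "OJH"
    if distance < block_size and delta == +1:
        return "OJF"
    return None

event = occlusion_event(3.0, order_change(2, 1))
```

For the example call, two nearby vertices merge into one, so the order change is -1 and the event is classified as OJH.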
The selection of a reference frame in conditional vertex
matching is determined from the last frame in which the objects
were occluded. Based on this condition, we perform vertex
matching by selecting one preceding frame $f-\tau$ as the
reference frame. The weight between vertices in frames $f$ and
$f-\tau$ is defined as

$$w(\mathbf{v}_n^{f}, \mathbf{v}'^{\,f-\tau}_{n}) = S_0(\mathbf{v}_n^{f}) \, \lVert c_{\mathbf{v}_n^{f}} - c_{\mathbf{v}'^{\,f-\tau}_{n}} \rVert + S_1(\mathbf{v}_n^{f}) \left( \lvert D_{\mathbf{v}_n^{f}} - D_{\mathbf{v}'^{\,f-\tau}_{n}} \rvert + \lvert e_{\mathbf{v}_n^{f}} - e_{\mathbf{v}'^{\,f-\tau}_{n}} \rvert \right) \tag{5}$$

where $S_0(\mathbf{v}_n^{f})$ is the status of $\mathbf{v}_n^{f}$ in default status and
$S_1(\mathbf{v}_n^{f})$ is the status of $\mathbf{v}_n^{f}$ in OJH status. Here, the
weight of an edge now takes into account the direction and
the energy of a vertex as similarity features. The direction
and energy of $\mathbf{v}$ are calculated as the means of the directions
and energies of all subvertices in $\mathbf{v}$. Since the ranges of
the direction values and energy values are significantly different,
we need to rescale the values to balance the difference and
to make a fair comparison between the two attributes. Therefore,
in (5), the direction and energy are defined as the base-10
logarithms of their original values.
Detecting a new object is relatively simple. When a new
vertex of a spatio-temporal graph is detected in subsequent
frames and neither adjacent vertex matching nor conditional
vertex matching can find a similar vertex in the reference
frame, the vertex is identified as a new object.
5. ROI REFINEMENT
In this stage, we define a region of interest (ROI) for each
moving object as the rectangle that encloses its block
group (subgraph). We then refine the ROI by controlling
its width and height so that the refined ROI fits
the real object size and adapts to the
changes in the order of each subgraph in the spatial graphs.

Recalling that subgraph $g_n$ is the graph representing the
$n$-th object in a frame, we define the ROI of the $n$-th object in
frame $f$ as $O_n^f = \{c_{g_n}, \omega, \eta\}$, where $\omega$ and $\eta$ are the width
and the height of the ROI, respectively, and $c_{g_n}$ is the
centroid of the subgraph. The width is determined from the
number of vertices along the horizontal direction of $g_n$ and
the height is determined from the number along the vertical
direction. The centroid is calculated as the mean of the
locations of the vertices in the subgraph.
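
The ROI of a block group can be derived as in the following sketch. The function name `roi_of` and the return layout are hypothetical; block coordinates are assumed to be given as (x, y) pairs in 4×4-block units, and the centroid is reported in pixels at the blocks' top-left corners.

```python
def roi_of(group, block_size=4):
    """Return (centroid, width, height) of the ROI enclosing a block group.

    group : set of (x, y) block coordinates in 4x4-block units (assumption)
    Width and height are the horizontal and vertical extents of the
    group in pixels; the centroid is the mean block location in pixels.
    """
    xs = [x for x, _ in group]
    ys = [y for _, y in group]
    width = (max(xs) - min(xs) + 1) * block_size
    height = (max(ys) - min(ys) + 1) * block_size
    centroid = (block_size * sum(xs) / len(xs),
                block_size * sum(ys) / len(ys))
    return centroid, width, height

# An L-shaped group spanning 3 blocks horizontally and 2 vertically.
roi = roi_of({(0, 0), (1, 0), (2, 0), (2, 1)})
```

For the example group, the ROI is 12 pixels wide and 8 pixels high, with its centroid at (5.0, 1.0).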
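The refinement is performed by observing the size and
centroid of the ROI every five frames. That is, every five frames
the size of the ROI is computed and compared to that of the
refined ROI from the previous observation. The refinement of the
size of the ROI is then performed according to the following
condition:

$$\tilde{O}_n^{f} = \begin{cases} \frac{4}{3}\,\tilde{O}_n^{f-1}, & \text{if } O_n^{f} : \tilde{O}_n^{f-1} \ge 4:3 \\ \frac{3}{4}\,\tilde{O}_n^{f-1}, & \text{if } O_n^{f} : \tilde{O}_n^{f-1} \le 3:4 \\ \frac{1}{2}\left(O_n^{f} + \tilde{O}_n^{f-1}\right), & \text{otherwise} \end{cases} \tag{6}$$

where $\tilde{O}_n^{f}$ is the refined ROI and $\tilde{O}_n^{f-1}$ is the previously
refined ROI from the preceding five frames.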
In many cases, the ROIs may have different centroids
due to the changes in the number and position of the
vertices in the subgraphs. As a result, the positions of the ROIs may
fluctuate. To reduce large fluctuations in the ROI
positions, the centroids of the ROIs are controlled by
restricting their movement relative to those of the
corresponding ROIs in the previous frame. The changes in
the centroids of the ROIs are restricted to a 4-pixel
distance. If the centroid of an ROI moves beyond the
4-pixel distance, the ROI displacement is retracted to within the
4-pixel distance. By doing so, a reliable position for the ROI
can be ensured within the real moving object area.
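
The centroid restriction amounts to clamping the displacement between the previous and the current ROI centroid. In the sketch below, the function name `restrict_centroid`, the Euclidean distance, and the choice of pulling the centroid back along the direction of its displacement are assumptions made for illustration.

```python
import math

def restrict_centroid(prev, curr, max_shift=4.0):
    """Limit the movement of an ROI centroid to `max_shift` pixels.

    prev, curr : (x, y) centroids of the ROI in the previous and
                 current frame
    Returns `curr` unchanged when it lies within the limit; otherwise
    the point at distance `max_shift` from `prev` toward `curr`.
    """
    dx, dy = curr[0] - prev[0], curr[1] - prev[1]
    dist = math.hypot(dx, dy)
    if dist <= max_shift:
        return curr
    scale = max_shift / dist
    return (prev[0] + dx * scale, prev[1] + dy * scale)

# A 10-pixel jump is retracted to 4 pixels along the same direction.
clamped = restrict_centroid((0.0, 0.0), (6.0, 8.0))
```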
In the case of occlusion, when two subgraphs in a frame
are merged, they are represented as only one ROI. To track
the detected objects even during occlusion, we observe the
attributes of the vertices in the occluded subgraph (block group)
and cluster the vertices with similar attributes that represent
each occluded object. In this way we can reconstruct the ROIs
of both subgraphs during the occlusion.

Fig. 4 illustrates an ROI reconstruction during
occlusion. At one frame prior to occlusion, the ROI sizes of
the occluded objects are stored in the so-called ROI memory.
During the occlusion, the vertices can be clustered
according to their attribute similarities, based on the attribute
values of both subgraphs prior to occlusion. Therefore, the
reconstruction of the ROIs during the occlusion can be done
by simply assigning the ROIs of both objects prior to
occlusion to the locations of the clusters of vertices, as
shown in Fig. 4. After the occlusion has finished, the ROIs of both
objects are determined normally as the rectangles that
enclose the block groups of each detected object.
Fig. 4. Illustration of reconstructing the ROIs of two objects during
occlusion (panels: before occlusion, occlusion started, during
occlusion; the ROI attributes of both objects are held in the ROI
memory). Dashed rectangles are the ROIs of the encapsulated
subgraphs.
6. EXPERIMENTAL RESULTS
We use three test video sequences for the experiments:
Speedway, PETS2001 and Shinji, with resolutions of 352×288,
384×288 and 320×240 pixels, respectively. All the
test sequences are encoded by the H.264/AVC reference
software Joint Model 15.1 [10] with a quantization parameter
value of 32 in the Baseline profile. The simulation platform for the
experiments is a PC with a 2.4 GHz CPU and 2 GB of RAM.
Fig. 5 shows the tracking results of our proposed
method with a superimposed snapshot of five Speedway
sequence frames taken every ten frames. The
detected object regions (ROIs) in the superimposed snapshot
are shown as rectangular boxes. For better visibility, simple
brightness and contrast adjustments are made to the
superimposed snapshot. From the superimposed ROIs, we can
observe the speeds of the two detected objects through the size
changes in their respective ROIs. When an object is moving
fast, as for Object 1, the motion displacement becomes large.
The ROI approximation is then less accurate because it
includes non-object area. On the other hand, when an
object moves slowly, as for Object 2, the ROI approximation
becomes quite accurate and tightly encompasses the object
region. In general, the proposed method can detect and
localize objects of a small size such as Object 1, as shown in
Fig. 5.
Fig. 5. Snapshots of superimposed five frames from Speedway
sequence.
Fig. 6 shows a superimposed snapshot of five
PETS2001 sequence frames taken every five frames for
which the proposed method also works well regardless of
object sizes.
Fig. 6. Snapshots of superimposed five frames from PETS2001
sequence.
Fig. 7 shows a series of snapshots of fragmented
PETS2001 sequence frames that are taken every ten frames
in order to highlight the performance of detection and
tracking by the proposed method under occlusion.
Fig. 7. A series of snapshots of PETS2001 sequence frames during
the occlusion of two objects.
The rectangular boxes in the snapshots indicate the ROIs
of the moving objects, which are detected and identified
by the proposed method. Object 1 (a person marked as "1"
in the red rectangular box) and Object 2 (a car marked as "2"
in the green rectangular box) are separate in the first snapshot.
As shown in the subsequent snapshots, Object 1 is
occluded by Object 2 in the third and fourth snapshots, and
then they are successfully detected and identified as two
separate moving objects in the fifth and sixth snapshots by
the proposed method. Although there are several frames
in which the ROI size of Object 2 is larger than the
real object size, the ROI boxes are
visually acceptable for distinguishing different moving objects.
Fig. 8 shows a superimposed snapshot of five Shinji
sequence frames that have been taken every forty frames.
Most of the ROIs are obtained from the projection of the
vertices of the spatio-temporal graph when the blocks are
missing due to zero motion vectors and/or no residues. As
the object is moving forward closer to the camera, the real
object and ROI size are getting larger. The proposed method
successfully detects and locates the moving object under the
change in size, as shown in Fig. 8.
Fig. 8. Snapshots of superimposed five frames from Shinji
sequence.
7. CONCLUSIONS
We have presented a graph-based method for detecting and
tracking moving objects in the H.264/AVC bitstream domain by
constructing a spatio-temporal graph from the detected blocks
with non-zero motion vectors and/or non-zero residues.
The detected blocks are clustered into groups of blocks,
and the block groups are represented as subgraphs that
constitute a spatial graph in each frame. The temporal
connections between spatial graphs in two frames create a
spatio-temporal graph in which the edge between two
super-vertices represents the correspondence for the same object
in two frames. The spatial graph enables the representation of
moving objects in each frame, even for objects of small
sizes, and the ROI identification of the detected objects
during occlusion. The spatio-temporal graph can be utilized
to recognize, by graph pruning, whether the detected blocks are
real or false moving objects based on the edge weights between
super-vertices. The spatio-temporal graph also
makes it possible to accurately identify objects of interest from frame
to frame, even when the detected objects are occluded, as
well as to detect and track objects whose sizes change.
8. REFERENCES
[1] W. You, M. S. H. Sabirin, and M. Kim, "Moving Object
Tracking in H.264/AVC Bitstream," in Multimedia Content
Analysis and Mining 2007, Nicu Sebe, Yuncai Liu, Yueting
Zhuang, and Thomas Huang (Eds.). Springer-Verlag, Berlin,
Heidelberg, 483-492.
[2] W. You, M. S. H. Sabirin, and M. Kim, "Real-time
detection and tracking of multiple objects with partial
decoding in H.264/AVC bitstream domain," Real-Time
Image and Video Processing 2009, Vol. 7244, No. 1
(2009), 72440D.
[3] C. Käs, M. Brulin, H. Nicolas, and C. Maillet,
"Compressed domain aided analysis of traffic
surveillance videos," Distributed Smart Cameras, 2009.
ICDSC 2009. Third ACM/IEEE International
Conference on, pp.1-8, Aug. 30 2009-Sept. 2 2009.
[4] C. Poppe, S. De Bruyne, T. Paridaens, P. Lambert, and
R. Van de Walle, "Moving object detection in the
H.264/AVC compressed domain for video surveillance
applications," J. Vis. Comun. Image Represent. 20, 6
(August 2009), 428-437.
[5] S. K. Kapotas and A. N. Skodras, "Moving object
detection in the H.264 compressed domain," Imaging
Systems and Techniques (IST), 2010 IEEE
International Conference on, pp.325-328, 1-2 July
2010.
[6] C. Käs and H. Nicolas, "An Approach to Trajectory
Estimation of Moving Objects in the H.264
Compressed Domain," in Proceedings of the 3rd
Pacific Rim Symposium on Advances in Image and
Video Technology (PSIVT '09), Toshikazu Wada, Fay
Huang, and Stephen Lin (Eds.). Springer-Verlag, Berlin,
Heidelberg, 318-329.
[7] J. Mooser, S. You, and U. Neumann, "Real-Time
Object Tracking for Augmented Reality Combining
Graph Cuts and Optical Flow," Mixed and Augmented
Reality, 2007. ISMAR 2007. 6th IEEE and ACM
International Symposium on, pp.145-152, 13-16 Nov.
2007.
[8] Z. Guanling, W. Yuping, and D. Nanping, "Graph
based visual object tracking," Computing,
Communication, Control, and Management, 2009.
CCCM 2009. ISECS International Colloquium on,
vol.1, pp.99-102, 8-9 Aug. 2009.
[9] V. Pallavi, J. Mukherjee, A. K. Majumdar, and S. Sural,
"Graph-Based Multiplayer Detection and Tracking in
Broadcast Soccer Videos," Multimedia, IEEE
Transactions on, vol.10, no.5, pp.794-805, Aug. 2008.
[10] Dolby Laboratories Inc., Fraunhofer-Institute HHI, and
Microsoft Corporation, "H.264/14496-10 AVC
Reference Software," http://iphome.hhi.de/suehring/tml/.