
A piece-wise Kalman-filter based body joint tracking scheme for low resolution interlaced imagery

Binu M. Nair,a Kimberly D. Kendricks,b Vijayan K. Asari,a Ronald F. Tuttlec

aUniversity of Dayton Vision Lab, 300 College Park, Dayton, OH 45469

bCentral State University, 1400 Brush Row Road, Wilberforce, OH 45384

cAir Force Institute of Technology, 2950 Hobson Way, OH 45433

Abstract. We propose an efficient scheme to track joint locations in low resolution interlaced imagery based on primitive low-level image features. The proposed scheme consists of piece-wise linear trackers in which we model the sinusoidal joint trajectory as a series of linear sub-trajectories, with an LBP-based region matching applied at the boundaries. Each sub-trajectory is modeled by a Kalman filter which tracks the optical flow at the joint region in successive frames. When an optical flow mismatch occurs, signaling the end of a sub-trajectory, the tracking scheme switches to an LBP-based region matching. This region-based match, along with a coarse joint location estimate from either a human pose estimation or a point light algorithm, provides a search region on which a second pass of region matching is performed. The corresponding Kalman filter is reinitialized with this finer joint location estimate and switches to tracking the optical flow along the next sub-trajectory. The tracking scheme is tested on a dataset containing 24 sequences of individuals walking across the face of a building. Three statistical measures are computed to describe its efficiency in terms of spatio-temporal closeness and multiple object tracking accuracy: trajectory covariance, MOTA/MOTP, and precision/recall scores.

Keywords: Joint Tracking, Local Binary Patterns, Kalman Filter, Lucas-Kanade Optical Flow, Region-based Matching, Multiple Object Tracking, Sinusoidal Trajectory Modeling.

    Address all correspondence to: Binu M Nair, University of Dayton Vision Lab, E-mail: [email protected]

    1 Introduction

Detection, tracking, and localization of objects of interest in a scene are important aspects of computer vision, with applications in surveillance scenarios ranging from target identification and tracking in aerial imagery to object detection and location estimation in videos from CCTV cameras. In recent years, research in object tracking from surveillance videos has largely been restricted to detecting and tracking large objects in the scene, such as people in shopping malls, players on a soccer/basketball court,1, 2 and cars. Some research on tracking points in a scene has also been developed, where interest points detected on the human body can be tracked to obtain trajectories that differentiate between different actions.3-5 Research on joint tracking, which requires consistent estimation of a certain body


part has been tackled in scenarios such as indoor gaming applications, where depth sensors give accurate depth information and higher resolution video data captured at higher frame rates are available. But the problem of body joint detection and tracking from surveillance videos captured by low resolution CCTV cameras, without any depth information and at low frame rates, has not received much emphasis in today's research.

In this manuscript, we propose a novel tracking scheme which tracks some of the relevant body joints, such as the shoulder, elbow, wrist, waist, knee and ankle, across the scene given a coarse estimate of the joint location obtained from a human pose estimation algorithm or from a point light software. An illustration of the specific body joints used for tracking is shown in Figure 1a. The proposed scheme approximates the non-linear trajectory of a joint as a combination of successive linear sub-trajectories, where each one is modeled by a Kalman filter designed to track small non-linear variations between frames. The novel aspect of this scheme is that the boundaries of these sub-trajectories are not pre-defined and can in fact be determined on-line by detecting apparent mismatches of joint regions between frames along the linear sub-trajectory. In low resolution imagery, optical flow provides the best coarse measure to track regions or blobs moving in a linear (or slightly non-linear) fashion. By designing a Kalman filter to model the state transitions of the optical flow matches, we can detect mismatches when the optical flow match in the current frame does not fall into the Kalman filter predicted search region. By using this scheme of the Kalman filter to track optical flow matches on a linear sub-trajectory, its boundaries can be determined. This brings us to the next issue: finding a suitable joint location to reinitialize the Kalman tracker. This finer estimate of the joint location can be determined by using a 2-level region matching scheme, where at each level a coarse measure of the joint location is used to define a search region for the next level and, at the finer level, a minimum distance measure between LBP6


joint region descriptors is used. Figure 1b illustrates the conceptual modeling of the non-linear trajectory by piece-wise Kalman filters.

Fig 1: Joint illustration and its theoretical trajectory. (a) Illustration of specific joints on the human body to be tracked. (b) Illustration of the piecewise Kalman filter concept for joint tracking.

This paper is organized as follows: Section 2 reviews work related to various kinds of tracking research and briefly details the problems and issues being tackled. Section 3 explains the theoretical background which forms the foundation of the proposed tracking framework described in Section 4. Finally, we provide results of tracking the specific body joints in Section 5 and conclude the paper in Section 6 with some future proposals and ideas for further improvement of the algorithm.

    2 Related Work

Joint tracking research, or in other words the human body pose estimation problem, has been tackled by the research community in two different scenarios: one which uses depth information and the other which uses only the images. The former uses depth information generated either by a depth sensor such as the Kinect or by a 3D reconstruction algorithm from multiple video sources. This is suitable for applications in indoor scenarios such as gaming consoles or for


human interactive systems where high resolution imagery is available. The latter is used in surveillance scenarios which do not have any source of depth information, such as the video feed from CCTV cameras monitoring a parking lot or a shopping mall. One of the most recent and popular works was done by Shotton et al.7 for locating 3D joint positions in the human body from a depth image acquired by a Kinect sensor. They used a part-based recognition paradigm in which a difficult pose estimation problem is converted to an easier per-pixel classification problem, from which the 3D joint locations are subsequently estimated irrespective of pose, body shape or clothing. In a more recent approach by Huang et al.,8 human body pose is estimated and tracked across the scene using information acquired by a multi-camera system. Here, both the skeletal joint positions as well as the surface deformations (body shape changes) are estimated by fitting a reference surface model to the 3D point reconstructions from the multi-camera system. This also makes use of a learning scheme which divides the point reconstructions into rigid body parts for accurate estimation. But the above cases require the use of high resolution imagery under controlled lighting or environmental conditions to work with high accuracy. Another restriction of these methods is the dependence on depth information, either for direct usage or for point cloud reconstruction.

One of the earlier and popular works which does not use depth information and which uses only a single video camera to track human motion was done by Markus Kohler.9 He designed a Kalman filter to track human motion in such a way that non-linearity in motion can be considered as motion with constant velocity and changing acceleration, where the changing acceleration is modeled as white noise. The process noise co-variance of the Kalman filter is designed so as to incorporate this changing acceleration. In our proposed algorithm, we use a modification of this Kalman filter and the design of the process noise co-variance to track the body joints along a sub-trajectory across the video sequence. Kaaniche et al.4 used the extended version of the Kalman filter to track specific points or corners detected at every frame of the video sequence for the purpose of gesture recognition. Each point is described by a region descriptor such as the Histogram of Oriented Gradients (HOG), and the Kalman filter tracks the position of the corner by using a HOG-based region matching. For tracking specific joints, however, this methodology does not suffice, as any corner point which does not get matched with the previous frame gets discarded. Bilinski et al.5 extended this methodology by incorporating the object speed and orientation to track multiple objects under occlusion.

In recent years, the problem of human body pose estimation has not just been limited to tracking points or corners or using depth information. One of the state-of-the-art methods for human pose estimation on static images is the flexible mixture of parts model proposed by Yang and Ramanan.10 Instead of explicitly using a variety of oriented body part model templates (parameterized by pixel location and orientation) in a search-based template matching scheme, a family of affinely-warped templates is modeled, each template containing a mixture of non-oriented pictorial structures. This eliminates the need to estimate multiple degrees of freedom of a limb. Ramakrishna et al.11 proposed an occlusion-aware algorithm which tracks human body pose in a sequence, where the human body is modeled as a combination of single parts such as the head and neck and symmetric part pairs such as the shoulders, knees and feet. An important aspect of this algorithm is that it can differentiate between similar looking parts such as the left and right leg/arm, thereby giving a suitable estimate of the human pose. Although these methods show an increased accuracy on datasets such as the Buffy dataset12 and the Image Parse dataset,13 their performance on very low-resolution imagery with interlacing has not yet been evaluated. But one of the main advantages of these kinds of human pose estimation algorithms is that, in the post-processing stage, the various body part detections can provide coarse estimates of a joint location which can


then be used to initialize tracking schemes and track joint locations in subsequent frames of a video sequence. One such work was done by Burgos-Artizzu et al.,14 who propose a generalization of non-maximum suppression post-processing schemes to merge multiple pose estimates, either in a single frame or in multiple consecutive frames of a video sequence. This merging of estimates is done by a robust and constrained K-means clustering15 along the spatial domain for a single frame and along the spatio-temporal domain for a video sequence. Again, one of the main concerns is its dependence on multiple pose estimates, which relies on the ability of state-of-the-art pose estimation algorithms to operate on low-resolution interlaced imagery.

In our proposed joint tracking framework, we follow a track-by-detection scheme where we use optical flow matches and a Kalman filter to track joint locations lying on an approximately linear sub-trajectory, with suitable re-initialization of the joint tracker using an LBP-based region matching criterion. The coarse joint location estimates are used to re-initialize the tracker, and this happens only in certain frames or intervals which indicate the beginning/end of a sub-trajectory of a joint. In the next section, we describe the theory involved in the various modules of the tracking framework.

    3 Theory

This section describes the necessary theoretical background required for a deeper understanding of the proposed model for joint estimation and tracking. The main topics which will be covered are: a) Lucas-Kanade optical flow estimation, b) LBP-based region matching, and c) the Kalman filter. Our proposed methodology is a combination of these techniques designed to estimate and track joints in a low-resolution video, given coarse estimates of the joint locations.


Fig 2: Framework for computing optical flow and illustration. (a) Block schematic of optical flow computation to compute global velocity. (b) Optical flow illustration.

3.1 Lucas-Kanade Optical Flow

Optical flow between two frames of a video sequence estimates the velocity of a point in the real world scene by finding a relationship between the projections of that point in the corresponding frames. In other words, optical flow measures the velocity or movement of a pixel or region between two time instances. In our case, the point of interest is the corresponding joint of a human body, and we need to estimate the velocity of that joint in the current frame given its location in the previous frame. There exist two main methods for computing this velocity: one is the Horn-Schunck method, which imposes a global constraint (i.e., the entire image is used in the determination of the velocity of a single pixel), while the other is the Lucas-Kanade16 method, which is more localized (i.e., it considers only a neighborhood region around the point of interest, thereby setting a local constraint). Both methods are based on a single equation given by $I(x, y, t) = I(x + \Delta x, y + \Delta y, t + \Delta t)$. Here, let us consider that a pixel $p = (x, y)$ at time $t$ has moved to a position $p' = (x + \Delta x, y + \Delta y)$ at time $t + \Delta t$. The equation then assumes that the brightness of the pixel remains constant through its movement. This is one of the major assumptions of optical flow. The other assumptions are spatial coherence, where the point describing an object region does not change shape with time, and temporal persistence, where the motion in a pixel or region is purely due to the motion of the object and not


due to the camera movement. For tracking joint regions of the human body, the localized regions remain rigid or do not change shape, and thus the spatial coherence assumption is not violated. Since in our testing scenarios we use video sequences captured from a stationary camera with a constant background, the temporal persistence assumption is not violated either. So, for our purposes, we employ the Lucas-Kanade (L-K) optical flow estimation technique, which uses a local constraint. The optical flow constraint equation can be derived by a Taylor series expansion of the basic equation and is given by

$$\frac{\partial I}{\partial x} v_x + \frac{\partial I}{\partial y} v_y + \frac{\partial I}{\partial t} = 0 \quad \text{or} \quad \nabla I \cdot \mathbf{v} + I_t = 0 \tag{1}$$

where $(v_x, v_y)$ is the optical flow velocity of a pixel $p = (x, y)$. As mentioned earlier, the L-K method uses a local constraint. A small window region (local neighborhood) around the point $p = (x, y)$ is considered, and within this neighborhood a weighted least squares error

$$\sum_{x, y} W^2(x, y) \left[ \nabla I(x, y, t) \cdot \mathbf{v} + I_t(x, y, t) \right]^2 \tag{2}$$

is minimized. Using the above equation and the optical flow constraint equation (Equation 1), we can uniquely compute the solution $\mathbf{v}$. The assumption here is that the optical flow within that local region is constant. But there are some issues when computing the Lucas-Kanade optical flow. One issue is that the motion in the scene may not be small enough, in which case we would need the higher order terms in the optical flow constraint equation. The alternative approach is to use a pyramidal iterative Lucas-Kanade approach, where the image at a particular instant is downsampled to form a Gaussian pyramid and optical flow is computed at each level. The other issue is that a point in a local region


may not move like its neighbors. This brings us back to our earlier assumption of spatial coherence, where the objects or points to be tracked should be rigid. So, one of the important design criteria is to determine the ideal window size (local region size) for computing the optical flow at a certain point. For the joint tracking problem, this window size depends on the resolution of the video, and thus for poor resolutions we use a window size of 7×7. An illustration of optical flow estimation on the points of a human body silhouette is shown in Figure 2. For the proposed algorithm, we use optical flow estimation in two scenarios: one to compute the global velocity of the motion of the human body, and the other to find an estimate of the location of a particular body joint in the next instant. For the latter purpose, we compute the optical flow for every point surrounding the joint region using the L-K method and then compute the median flow.
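As a concrete illustration, the following minimal Python sketch (using OpenCV and NumPy, both assumed available) computes the median L-K flow of the pixels surrounding a joint; the region radius and pyramid depth are illustrative choices, not the tuned values used in our experiments.

import cv2
import numpy as np

def median_joint_flow(prev_gray, curr_gray, joint_xy, radius=3, win=(7, 7)):
    # Track every pixel in a small region around the joint with pyramidal L-K
    # and return the median displacement as the joint velocity estimate.
    x, y = joint_xy
    xs, ys = np.meshgrid(np.arange(x - radius, x + radius + 1),
                         np.arange(y - radius, y + radius + 1))
    pts = np.stack([xs.ravel(), ys.ravel()], axis=-1).astype(np.float32)
    pts = pts.reshape(-1, 1, 2)
    nxt, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, pts, None, winSize=win, maxLevel=3)
    good = status.ravel() == 1
    if not good.any():
        return None  # flow failed everywhere: treat as a mismatch
    flow = (nxt[good] - pts[good]).reshape(-1, 2)
    return np.median(flow, axis=0)  # (vx, vy) for the joint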

    3.2 Region Descriptors

Region descriptors such as the local binary pattern (LBP)6 are used to describe the edge information and the textural content in a local region, and can be very effective descriptors for region-based image matching. Many efficient image descriptors exist, such as SIFT, ORB and HOG, but one of their assumptions is that the images are of high resolution. The LBP is very effective in describing an image region in spite of low resolution and interlacing. The local binary pattern is an image coding scheme which brings out the textural features in a region. For representing a joint region and associating a joint in successive frames, the texture of the region plays a vital part in addition to the edge information. The LBP considers an 8×8 local neighborhood in a joint region, and labels the neighborhood pixels by either a 1 or 0 based on the center pixel value. The coded value representing this local region is then the decimal representation of the neighborhood labels taken in clockwise manner. Thus, for every pixel within the joint region,


a coded value is generated which represents the underlying texture. The LBP operator is defined as

$$LBP_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c) 2^p, \qquad s(z) = \begin{cases} 1 & z \geq 0 \\ 0 & z < 0 \end{cases} \tag{3}$$

where $g_c$ is the gray value of the center pixel and $g_p$ are the gray values of its $P$ neighbors at radius $R$.
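A minimal sketch of the LBP-based region description and matching used later in the framework, assuming Python with scikit-image; the uniform-LBP histogram follows the framework's region descriptor, and the chi-square distance is the histogram measure assumed in the matching steps.

import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(patch, P=8, R=1):
    # Uniform LBP codes of a grayscale patch, pooled into a normalized histogram.
    codes = local_binary_pattern(patch, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

def chi_square(f_j, f_p, eps=1e-10):
    # Chi-square distance between two LBP histograms; smaller means a better match.
    return 0.5 * np.sum((f_j - f_p) ** 2 / (f_j + f_p + eps))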

3.3 Kalman Filter

The Kalman filter17, 18 estimates the state $x_k \in \mathbb{R}^n$ of a discrete-time process governed by the linear stochastic difference equation

$$x_k = A x_{k-1} + q_{k-1} \tag{4}$$

with a measurement $z_k \in \mathbb{R}^m$ given by

$$z_k = H x_k + r_k \tag{5}$$

where $q_k$ and $r_k$ denote the process and measurement noise respectively,

with co-variances given by $Q = E[q_k q_k^T]$ and $R = E[r_k r_k^T]$. Here, we can define $P_k = E[e_k e_k^T]$ as the error co-variance matrix at time instant $k$, where we consider a prior estimate of the state $\hat{x}_k^-$ from the knowledge of the system and a posterior estimate of the state $\hat{x}_k$ after knowing the current measurement $z_k$. The error $e_k$ is then defined as the difference between the true state and the posterior state, $(x_k - \hat{x}_k)$. For obtaining a true value of a response (or state) generated by a process or system, an iterative procedure is used: first obtain a prior estimate $\hat{x}_k^-$ at instant $k$ from the posterior estimate $\hat{x}_{k-1}$ at instant $k-1$. Then, using the measured value of the response $z_k$, we compute the innovation or measurement residual $z_k - H\hat{x}_k^-$ and use this to obtain a posterior estimate $\hat{x}_k = \hat{x}_k^- + K_k(z_k - H\hat{x}_k^-)$. The Kalman gain $K_k$ at instant $k$ is given by

$$K_k = P_k^- H^T (H P_k^- H^T + R)^{-1} \tag{6}$$

where $P_k^- = E[e_k^- (e_k^-)^T]$, $e_k^- = (x_k - \hat{x}_k^-)$, and $P_k = (I - K_k H) P_k^-$. This iterative procedure can be divided into two stages: the time update (prediction stage) and the measurement update (correction stage). Thus, the Kalman filter can be thought of as a process which estimates the state at one instant and then obtains feedback in the form of noisy measurements of the response of the system.
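A minimal NumPy sketch of this two-stage recursion follows; A, H, Q and R stand for the transition, measurement and noise co-variance matrices defined in this section, and the function illustrates the standard equations rather than the exact implementation used in our experiments.

import numpy as np

def kalman_step(x_post, P_post, z, A, H, Q, R):
    # Time update (prediction stage)
    x_prior = A @ x_post                       # prior state estimate
    P_prior = A @ P_post @ A.T + Q             # prior error co-variance
    # Measurement update (correction stage)
    K = P_prior @ H.T @ np.linalg.inv(H @ P_prior @ H.T + R)  # gain, Eq. (6)
    x_post = x_prior + K @ (z - H @ x_prior)   # correct with the innovation
    P_post = (np.eye(len(x_post)) - K @ H) @ P_prior
    return x_post, P_post, x_prior, P_prior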

The recursive version of the Kalman filter can also be used for tracking purposes, and in the literature it has been widely applied for tracking points in video sequences. In the proposed algorithm, we use the Kalman filter to track a specific body joint across the scene. This is done by setting the state of the process (which in this case is the human body movement) as the $(x, y)$ coordinates of the joint along with its velocity $(v_x, v_y)$, to get a state vector $x_k \in \mathbb{R}^4$. The measurement vector $z_k = [x_o, y_o] \in \mathbb{R}^2$ is provided by the coarse estimates obtained by using a human body pose


    Fig 3: Joint tracking algorithm using Kalman filter.

estimation algorithm or point light software. By approximating the motion of a joint in a small time interval by a linear function, we can design the transition matrix $A$ so that the next state is a linear function of the previous states. As done by Kohler,9 to account for the non-constant velocity often associated with accelerating image structures, we use the process noise co-variance matrix $Q$ defined as

$$Q = \frac{a^2 \Delta t}{6} \begin{bmatrix} 2(\Delta t)^2 & 0 & 3\Delta t & 0 \\ 0 & 2(\Delta t)^2 & 0 & 3\Delta t \\ 3\Delta t & 0 & 6 & 0 \\ 0 & 3\Delta t & 0 & 6 \end{bmatrix} \tag{7}$$

where $a$ is the acceleration and $\Delta t$ is the time step determined by the frame rate of the camera.
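For reference, the following sketch builds this matrix for the state $[x, y, v_x, v_y]$, with the acceleration and time step set to illustrative values consistent with Section 5 (a = 0.1 pixels/frame², Δt = 1 frame).

import numpy as np

def kohler_process_noise(a=0.1, dt=1.0):
    # Process noise co-variance of Eq. (7) for the state [x, y, vx, vy].
    return (a ** 2 * dt / 6.0) * np.array([
        [2 * dt ** 2, 0.0,         3 * dt, 0.0   ],
        [0.0,         2 * dt ** 2, 0.0,    3 * dt],
        [3 * dt,      0.0,         6.0,    0.0   ],
        [0.0,         3 * dt,      0.0,    6.0   ],
    ])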

This design of the Kalman filter suits our scheme well, as any small non-linearity in the sub-trajectory can be accounted for as a non-constant velocity of the joint region. The modification of the Kalman filter recursive algorithm used for joint tracking is shown in Figure 3. As the figure shows, the measurement is obtained from the optical flow estimate. There are a couple of


    Fig 4: Block schematic of tracking.

scenarios which need to be tackled in order to use the optical flow as a reliable measurement vector. In the first, the optical flow estimate falls in the elliptical search region computed during the prediction phase; this confirms the correctness of the optical flow, making the optical flow estimate a suitable measurement vector. The elliptical search region is computed by using the posterior state and the predicted state as the two foci of an ellipse and computing the major and minor axes using the possible error values from the prior error co-variance matrix.4 The second scenario is when the optical flow estimate does not fall in the search region, thereby confirming that the optical flow estimate is noisy and is not suitable as a measurement. This signals the end of the current linear sub-trajectory and the beginning of the next linear sub-trajectory, where the associated Kalman filter must be re-initialized to track the optical flow matches in the next sub-trajectory.
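A sketch of this elliptical gating test is given below; the posterior and predicted positions act as the two foci, and the mapping from the prior error co-variance to the semi-major axis (a fixed number of standard deviations here) is an illustrative assumption rather than the exact design of the filter.

import numpy as np

def in_search_region(p, x_post, x_prior, P_prior, n_std=3.0):
    # A point lies inside an ellipse iff the sum of its distances to the two
    # foci does not exceed the major axis length 2a.
    f1, f2 = x_post[:2], x_prior[:2]            # positional parts of the states
    c = 0.5 * np.linalg.norm(f2 - f1)           # half the focal distance
    sigma = np.sqrt(np.trace(P_prior[:2, :2]))  # positional uncertainty
    a = c + n_std * sigma                       # semi-major axis (exceeds c)
    return np.linalg.norm(p - f1) + np.linalg.norm(p - f2) <= 2 * a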

    4 Proposed Framework

The proposed tracking scheme consists of two main stages: a) Kalman tracking of the optical flow matches on a sub-trajectory, and b) reinitialization of the Kalman tracker using a region-based match. In the overall schematic shown in Figure 4, the first step is to compute the foreground/background model. The foreground mask provides us with an estimate of the global velocity to initialize/re-


Fig 5: Illustration of elliptical search regions before tracking and joint location estimates after tracking. (a) Coarse joint location estimated in frame 1. (b) Elliptical search region in frame 2. (c) Fine estimates of joint location in frame 2 after tracking. (d) Elliptical search region in frame 4 (wrist joint tracker is reinitialized). (e) Finer estimates of joint locations in frame 4 after tracking. The coarse pose estimates are represented by purple in each frame. The search regions and the finer joint estimates are given as shoulder (blue), elbow (green), wrist (red), waist (cyan), knee (yellow) and ankle (pink). In frame 4, region-based matching is initiated and the corresponding tracker is re-initialized.


Fig 6: Illustration of elliptical search regions and fine joint estimates in certain frames when the tracker is only corrected. (a) Elliptical search region for frame 9; here, the ankle joint undergoes region matching, and since the constraint is satisfied, the corresponding tracker is re-initialized. (b) Finer joint location estimates after the tracking scheme. (c) Elliptical search regions $S_{op}(t)$ and $S_{reg}(t)$ in frame 11; here, for the knee joint, the constraint is not satisfied and the tracker is only corrected by the coarse joint location estimate given by the purple point. (d) Finer joint location estimates after the tracking scheme. (e) Elliptical search regions $S_{op}(t)$ and $S_{reg}(t)$ for both the knee and ankle joints in frame 13. (f) Finer joint location estimates where the knee and ankle trackers are corrected by coarse joint location estimates.


initialize the Kalman tracker associated with a joint. As we traverse each time step along the sub-trajectory, each joint region is described by a uniform LBP histogram. The coarse estimate of the joint location is provided by the estimates given by the point light software. To demonstrate the tracking ability of the framework, we use the coarse estimated points at sub-trajectory boundaries to get a finer region-based estimate of the joint location. The algorithm is given below (a condensed code sketch follows the steps):

1. Extract the first frame (time instant t = 1) of the sub-trajectory. Compute dense optical flow within the foreground region to get the global velocity estimate (median flow). Initialize/re-initialize the Kalman filter with the coarse joint location $(x_{cos}, y_{cos})$ or the finer region-based estimate $(x_{reg2}, y_{reg2})$, and the global velocity. The state of the tracker for each body joint is then $x_t = [x, y, v_x, v_y]$, where $(v_x, v_y)$ is the joint velocity, which is set to the global flow velocity estimate. This is considered as the corrected state $\hat{x}_{t-1}$ at time t = 1. Update $t \leftarrow t + 1$ and predict the state (get the prior state $\hat{x}_t^-$) of the Kalman filter. Using the predicted state $\hat{x}_t^-$, the posterior state $\hat{x}_{t-1}$ and the a priori error co-variance $P_t^-$, estimate the elliptical region $S_{op}(t)$ where the joint location is likely to fall.

2. Extract the next frame of the sub-trajectory. Find the optical flow match $(x_{op}, y_{op})$ of each joint between instants $t$ and $t-1$. Also compute the dense optical flow and the global velocity of the foreground region. Check whether the optical flow joint location estimate falls in the predicted elliptical search region. If yes, go to step 3; else go to step 4.

3. Using the joint location estimate provided by the optical flow as the measurement vector $z = [z_x, z_y]$, perform the correction phase of the filter to get the posterior state $\hat{x}_t$. Update $t \leftarrow t + 1$. Set the joint velocity to the global velocity and predict the state (get the prior state $\hat{x}_t^-$)


and the elliptical search region. Repeat steps 2 and 3 until the end boundary of the sub-trajectory, denoted by an optical flow mismatch.

4. Compute the joint location estimate $(x_{reg}, y_{reg})$ within the Kalman filter predicted search region using LBP-based region matching. This estimate is given by

$$\arg\min_{p \in S_{op}(t)} \chi^2(f_j, f_p)$$

where $f_j$ is the joint descriptor updated in the previous time instant and $f_p$ is the region descriptor computed at the pixel $p = (x_{reg}, y_{reg})$ within the elliptical search region $S_{op}(t)$. Using this estimate and the coarse joint location estimate, predict the new elliptical search region $S_{reg}(t)$. If the new elliptical search region is very large, a constraint $S_{reg}(t) \subseteq S_{op}(t)$ is used. Re-initialization occurs only if this constraint is satisfied. If it is satisfied, go to step 5; else go to step 6.

5. Compute the LBP-based region matching estimate given by $\arg\min_{p \in S_{reg}(t)} \chi^2(f_j, f_p)$, where $p = (x_{reg2}, y_{reg2})$. Use this finer estimate of the joint location to re-initialize the Kalman tracker associated with that particular joint. Update the joint velocity to the global velocity and predict the state (get the prior state $\hat{x}_t^-$) and the elliptical search region $S_{op}(t)$. Go to step 2.

6. Use the coarse joint location estimate $(x_{cos}, y_{cos})$ as the measurement vector $z = [z_x, z_y]$ to correct the corresponding tracker.

7. Continue until all the frames of the sequence have been processed.
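The following condensed sketch ties the steps together for a single joint. It reuses the helpers sketched in Section 3; frames, coarse, global_velocity, region_match, search_region_from and the kf object with its methods are hypothetical interfaces introduced only for illustration, not the exact implementation.

def track_joint(frames, coarse, kf):
    kf.initialize(coarse[0], global_velocity(frames[0]))               # step 1
    track = [coarse[0]]
    for t in range(1, len(frames)):
        kf.predict()                                       # prior state, S_op(t)
        flow = median_joint_flow(frames[t - 1], frames[t], track[-1])  # step 2
        if flow is not None and kf.in_search_region(track[-1] + flow):
            kf.correct(track[-1] + flow)                               # step 3
        else:                                         # sub-trajectory boundary
            p_reg = region_match(frames[t], kf.search_region())        # step 4
            S_reg = search_region_from(p_reg, coarse[t])
            if S_reg.inside(kf.search_region()):           # constraint holds
                p_reg2 = region_match(frames[t], S_reg)                # step 5
                kf.reinitialize(p_reg2, global_velocity(frames[t]))
            else:
                kf.correct(coarse[t])                                  # step 6
        track.append(kf.position())
    return track                                                       # step 7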

    We provide sample illustrations of the tracking scheme in Figures 5 and 6. In Figure 5, for frame

    2, the optical flow matches of all the joints fall in their respective predicted elliptical search region


$S_{op}(t)$, and these matches correct the corresponding joint tracker. In frame 4, all of the joints except the wrist are still in the sub-trajectory, as there are no optical flow mismatches. For the wrist, however, the optical flow match does not fall into its respective predicted elliptical search region $S_{op}(t)$. Thus, within $S_{op}(t)$, an LBP-based region match is found. Using this match and the coarse estimated joint location, another elliptical region $S_{reg}(t)$ is obtained. Again, on $S_{reg}(t)$, the region-based match is obtained. This match re-initializes the Kalman tracker, as the constraint is satisfied, and signals the beginning of another sub-trajectory. This is not the case with the knee and ankle joints in frames 11 and 13, where in fact the elliptical region $S_{reg}(t)$ is much larger than $S_{op}(t)$. The constraint is violated either when the coarse joint location estimates are noisy (sometimes not on the body but on the background) or when the region-based LBP match fails and catches onto an edge in the background. In the proposed technique, we tackle this issue by using the coarse joint location estimate to correct the existing tracker and not re-initialize it. This makes sure that the tracker does not get caught on background edges and only keeps track of the corresponding body joint.

    5 Results and Experiments

The proposed tracking scheme has been tested on a private dataset provided by the Air Force Institute of Technology, Dayton, OH. It consists of 12 subjects walking along an outdoor track across the face of a building with a staircase in front; the walk is performed twice by each subject to give a total of 24 video sequences. Each subject not only wears different colored clothing but also wears a coat vest on their second try during data capture. These video sequences are captured simultaneously using two cameras focused on the same area. So, when each sequence is divided into 5 phases A-E, a sequence of each phase is selected from either the left camera or the right


Fig 7: Illustration of the scene and division of the video sequence into five phases. (a) Background image. (b) Phase A. (c) Phase B. (d) Phase C. (e) Phase D. (f) Phase E.

camera depending on what area is being focused on for analysis. Thus, we do not consider which camera a sequence has been shot from. The description of each phase, along with the illustration in Figure 7, is as follows.

Phase A: The subject is walking clockwise around the track. The frames of interest are of the subject walking on the cross-over platform.

Phase B: The subject is walking clockwise around the track. The frames of interest are of the subject walking on the grass after the ramp.

Phase C: The subject is walking clockwise around the track. The frames of interest are of the subject walking on the grass after the ramp, on the side of the track away from the building.

Phase D: The subject is walking counter-clockwise around the track. The frames of interest are of the subject walking on the grass before the ramp.


Phase E: The subject is walking counter-clockwise around the track. The frames of interest are of the subject walking on the grass along the ramp.

    5.1 Challenges of the dataset and evaluation strategies

In this manuscript, we provide test results obtained by testing the proposed tracking scheme on all sequences in all phases. Although the dataset was captured to analyze differences in the gait of an individual when wearing or not wearing a coat vest, it provides a good number of challenges for testing the precision of the proposed tracking scheme. The dataset comes with human pose estimates at every frame, obtained by the point light software. These pose estimates give us coarse joint locations which are noisy yet adequate with regard to the gait tracking application at hand. The proposed tracking scheme makes use of these coarse joint location estimates to give finer estimates of the joint location. The effect of the tracking scheme on gait analysis algorithms is beyond the scope of this paper, and we focus mainly on the joint tracking aspect and the smoothness of the trajectory. One main challenge in this dataset is the very low resolution imagery, where a 17×17 neighborhood around a single joint, say a shoulder joint, captures the entire upper body of the individual. This is illustrated in Figure 1a. The other challenge is the interlacing effect present in the video, which can render edge-based region descriptors ineffective and affect the matching process. Apart from these global challenges, there are certain characteristics associated with each phase which sometimes introduce a challenging scenario for tracking. Some of these characteristics are:

Phase A: There can be partial/complete occlusion of the lower-body joints, such as the knee and ankle, by the platform railings and staircase. The lowest resolution of the person is captured in this phase, as the person is at the farthest distance from the camera.


Phase B: There is complete occlusion of the ankle by the tall grass, and joint region descriptors cannot be computed. Moreover, the coarse estimates provided by the point light software are also very noisy and do not give robust estimates of the joint.

Phase C: The image region containing the person is of slightly higher resolution, as the subject is closer to the camera. No occlusion of the ankle joint by the grass was noticed, which gives much cleaner data for the tracking scheme.

Phase D: There is complete occlusion of the ankle due to the tall grass, and the same problems as in phase B exist; the only difference is that the person is walking in the opposite direction.

Phase E: Same challenges as phase A, with the difference that the person walks in the opposite direction.

We set equal neighborhood sizes of 17×17 for each joint region and set a constant acceleration a = 0.1 pixels/frame² in the process noise co-variance design of the corresponding Kalman filter. To illustrate the effectiveness of the tracking scheme, we provide three different types of measures and graphs which explain the different aspects of tracking efficiency.

1. Co-variance-based trajectory measure: A statistical measure which gives how close the tracked joint locations are to the coarse estimates of the joint location for each sequence associated with a particular subject (see the sketch after this list). This statistical metric19 is given by

$$d(K, K_m) = \sqrt{\sum_{i=1}^{n} \left( \log \lambda_i(K, K_m) \right)^2} \tag{8}$$


where $K \in \mathbb{R}^{3 \times 3}$ is the co-variance matrix of the tracked points, $K_m \in \mathbb{R}^{3 \times 3}$ is the co-variance matrix of the coarse joint locations, $\lambda_i$ is the $i$th generalized eigenvalue satisfying $|\lambda K - K_m| = 0$, and $n$ is the number of eigenvalues. The lower the value, the closer the tracked points are to the coarse joint location estimates. Although this measure does not provide us with the precision of the tracking scheme, it gives an indication of whether the tracked joint trajectory is located within the spatio-temporal neighborhood of the coarse joint trajectory.

2. Multiple object tracking precision/accuracy (MOTP/MOTA): The MOTP/MOTA20 metric is a widely used efficiency measure for multiple-object tracking mechanisms, giving the precision and accuracy of the tracker by considering all the detected and tracked objects. We use an implementation of CLEAR-MOT provided by the authors21 to obtain statistics such as the false positive rate, false negative rate, and MOTA and MOTP scores. These statistics are computed as follows.

(a) Multiple object tracking precision (MOTP): This refers to the closeness of a tracked point location to its true location (given as ground truth). Here, we measure the closeness by the overlap between the neighborhood region occupied by the tracked point location and that of the ground truth; the higher the overlap, the more precise the estimated location of the point. This is given by

$$MOTP = \frac{\sum_{i,t} o_t^i}{\sum_t c_t} \tag{9}$$

where $o_t^i$ is the amount of overlap for joint $i$ at frame $t$ of a sequence and $c_t$ is the number of correct correspondences at frame $t$. Only those joints which satisfy the


criterion $o_t^i > T$ are included in the above equation.

(b) Multiple object tracking accuracy (MOTA): This gives the accumulated accuracy in terms of the fraction of tracked joints matched correctly without any misses or mismatches. It is given by

$$MOTA = 1 - \frac{\sum_t (m_t + fp_t + mme_t)}{\sum_t g_t} \tag{10}$$

where $m_t$, $fp_t$ and $mme_t$ are the number of misses, false positives and mismatches respectively, and $g_t$ is the number of points present at frame $t$. Thus the false negative rate, false positive rate and rate of mismatches can be computed as $\sum_t m_t / \sum_t g_t$, $\sum_t fp_t / \sum_t g_t$ and $\sum_t mme_t / \sum_t g_t$.

These statistics evaluate the tracking algorithm in terms of the overall accuracy and precision achieved by accumulating the measures of all the joints of interest per video sequence.

3. Precision/recall: The precision and recall for multiple object tracking are computed as overall MOTP and MOTA scores. The precision, in contrast to the theoretical definition, is computed by accumulating the overlaps $o_t^i$ and correct correspondences $c_t$ over all frames of all sequences and taking the ratio between them. The recall is computed by accumulating the total number of misses, false positives and mismatches from all frames of all sequences and using the formula for MOTA. The precision and recall are computed for every possible parameter set of the tracking scheme so that the best combination can be found for each phase.
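A brief sketch of how these measures can be computed is given below, assuming NumPy and SciPy are available; K and Km are the 3×3 trajectory co-variance matrices of item 1, and the per-frame counts are assumed to come from the CLEAR-MOT correspondence step.

import numpy as np
from scipy.linalg import eigvals

def covariance_trajectory_measure(K, Km):
    # Eq. (8): the generalized eigenvalues solve |lambda*K - Km| = 0.
    lam = np.real(eigvals(Km, K))
    return np.sqrt(np.sum(np.log(lam) ** 2))

def motp(overlaps, num_correct):
    # Eq. (9): accumulated overlap over the number of correct correspondences.
    return np.sum(overlaps) / np.sum(num_correct)

def mota(misses, false_pos, mismatches, num_gt):
    # Eq. (10): 1 - (misses + false positives + mismatches) / ground-truth points.
    return 1.0 - (np.sum(misses) + np.sum(false_pos)
                  + np.sum(mismatches)) / np.sum(num_gt)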


Fig 8: Statistical measures (left) and percentage of sequences in different ranges (right) obtained for phase A. (a) Kalman filtered coarse joint location estimates. (b) Corresponding range percentages for the Kalman tracking scheme. (c) Proposed tracking scheme. (d) Corresponding range percentages for the proposed tracking scheme.

    5.2 Covariance-Based Trajectory Analysis

The co-variance based trajectory measure is computed between the tracked points and the coarse estimated points for each phase. We provide two variations of the tracking scheme: a) one which simply uses the Kalman filter on the coarse estimates directly; and b) the proposed tracking


Fig 9: Statistical measures (left) and percentage of sequences in different ranges (right) obtained for phase B. (a) Kalman filtered manual point annotation. (b) Corresponding range percentages for the Kalman tracking scheme. (c) Proposed tracking scheme. (d) Corresponding range percentages for the proposed tracking scheme.


Fig 10: Statistical measures (left) and percentage of sequences in different ranges (right) obtained for phase C. (a) Kalman filtered coarse joint location estimates. (b) Corresponding range percentages for the Kalman tracking scheme. (c) Proposed tracking scheme. (d) Corresponding range percentages for the proposed tracking scheme.


Fig 11: Statistical measures (left) and percentage of sequences in different ranges (right) obtained for phase D. (a) Kalman filtered coarse joint location estimates. (b) Corresponding range percentages for the Kalman tracking scheme. (c) Proposed tracking scheme. (d) Corresponding range percentages for the proposed tracking scheme.


Fig 12: Statistical measures (left) and percentage of sequences in different ranges (right) obtained for phase E. (a) Kalman filtered coarse joint location estimates. (b) Corresponding range percentages for the Kalman tracking scheme. (c) Proposed tracking scheme. (d) Corresponding range percentages for the proposed tracking scheme.


scheme, which uses the image information in determining the fine joint location estimates. We compute the trajectory measure of the two tracking schemes for each video sequence and for each joint. Figures 8, 9, 10, 11 and 12 provide the tables containing the trajectory measures for each phase and for each tracking scheme.

We can empirically determine ranges of the trajectory discrepancy measure over which the finer estimates obtained by the tracking scheme can be judged acceptable or not; this is possible by a visual inspection of the trajectory plots for each joint of each sequence. A sample of the trajectory plots computed for subject 11 in phase A is shown in Figure 13. The ranges and the possible acceptance levels, with explanations, are given below.

Trajectory measure $d \in [0, 1)$: This denotes that the finer estimates obtained by a tracking scheme are much closer to the coarse joint location estimates than required. In this scenario, the finer estimates lean more towards the noisy, discrete coarse estimates. Although the tracking scheme gives better estimates than the coarse pose estimates, the finer estimates are slightly noisy in nature and are not very smooth.

Trajectory measure $d \in [1, 3)$: This range of values is considered highly acceptable, even though these estimates seem farther from the noisy coarse pose estimates. By observation, we see that the finer estimates of the joint trajectory are smoother than the coarse joint estimates and in fact resemble the actual sinusoidal trajectory of the joint.

Trajectory measure $d \in [3, 5)$: This range of values can be considered semi-acceptable, where the finer joint trajectory estimates obtained from the tracking are smooth but slightly far apart from the noisy coarse estimates. This happens either because the coarse pose estimates are noisy or because the tracker locks onto a different point on the same body joint region and


maintains the wrong track. Sometimes the estimated fine trajectory might miss or track some other point in a sub-trajectory, and the corresponding trajectory measure falls in this range as well.

Trajectory measure $d \in [5, \infty)$: This corresponds to wayward tracking by the tracking scheme. It happens mainly because the coarse joint location estimates contain a large error due to failure of the human pose estimation algorithm. In this case, the finer joint location estimates and the coarse estimates are drastically different.

Using these pre-defined ranges, we compute the percentage of sequences whose trajectory discrepancy measure falls within the specified ranges for the two schemes mentioned earlier. For phase A, we see a large percentage of sequences, around 65-75%, falling within the first measure range $d \in [0, 1)$ for the Kalman filtered tracking scheme. As mentioned before, although this measure is small, this tracking scheme gives more precedence to the coarse points and is under the assumption that these coarse points are noise free. Thus, it is an acceptable estimated trajectory but not as smooth as required for gait analysis. For the proposed scheme, however, around 65-85% of the sequences lie in the most acceptable region $d \in [1, 3)$, with the exception of the ankle joint, where only 20% fall in it while the rest fall in the region $d \in [0, 1)$. This latter region is still acceptable as far as tracking is concerned. 5-10% of sequences for the shoulder, elbow, wrist and hip joints fall in the semi-acceptable region $d \in [3, 5)$.

For phase B, the Kalman filtering scheme performs better, where most of the sequences fall in the acceptable region with equal division between the ranges $d \in [0, 1)$ and $d \in [1, 3)$. Using the proposed scheme, we get improved performance for the shoulder, knee and ankle joints and comparable performance for the hip joint. The elbow, however, has many sequences


in the semi-acceptable range $d \in [3, 5)$, with a couple of sequences falling in the bad range $d \in [5, \infty)$ for the elbow, wrist and hip joints. This is mainly because the LBP descriptor of the wrist joint was unable to capture the information, as there were not enough pixels for representation in this low-resolution imagery. The bad region matching is also due to similar appearances of the clothing and the background in this phase. Interestingly, although the ankle was occluded by the grass for some sequences, the tracking scheme was able to pick up the ankle joint from one of the coarse pose estimates and track it to a certain degree; thus, the corresponding sequences fall in the acceptable regions.

All of the sequences in phase C fall in the acceptable regions, where around 65-100% fall in the region $d \in [0, 1)$ for the Kalman filtered scheme. For the proposed scheme, these sequences are distributed between the two acceptable regions, with the majority falling in the highly acceptable region $d \in [1, 3)$. A similar distribution of the sequences is seen for phases D and E, with the proposed scheme showing a larger number of sequences falling in the highly acceptable region for all the joints.

Thus, we see that for all the phases, most of the sequences are distributed in the highly acceptable region $d \in [1, 3)$, where gait analysis can be useful. This is also the region where the estimated trajectories follow a smooth sinusoidal path. Some sequences, however, are distributed in the region $d \in [0, 1)$ even with the proposed scheme, and these will require post-processing of the fine joint location estimates for gait analysis. This is because of the constraint of having very low resolution with interlacing effects, which makes region-based descriptor matching ineffective. When the region matching fails, the proposed scheme becomes equivalent to the Kalman filtered tracking scheme, thereby at least maintaining the joint track. This is useful because, in a region where the region matching does become effective, a portion of the track can be used to analyze the gait of an


    individual.

    5.3 MOTP/MOTA Analysis

We computed the MOTP, MOTA, false positive rate and false negative rate for each individual sequence in each phase by setting the threshold T = 0.5, with the same acceleration parameter a = 0.1 and a neighborhood size of 17×17 for each body joint. The corresponding distributions in the MOTP-MOTA space are shown in Figure 14, where the red stars are the sequences, labeled appropriately. The Gaussian contours approximate the distribution of the sequences in the MOTP-MOTA space. The more concentrated the distribution is towards the upper right corner, the better the precision and accuracy of the tracking scheme. In Figure 14a, we see that all of the sequences in phase A have moderately high precision and accuracy, with some achieving a high accuracy of 90% with the corresponding precision above 80%. However, two sequences belonging to subject 26 show a low accuracy of 60% or less with a precision of around 75%. This is mainly because the hip and ankle joint tracks follow a different path compared to the ground truth data. Another important factor contributing to the drop in accuracy for some sequences is the noise in the ground truth data annotation provided by the point light software.

For phase B, as shown in Figure 14b, most of the sequences have only a moderate precision of around 70-75% and moderate accuracy ranging from 50-75%. Some sequences belonging to subjects 3, 22 and 26 exhibit a low accuracy of 50% or less. However, some sequences belonging to subjects 18 and 27 exhibit a very high accuracy of 90% or more with a good precision of 80-85%. The wide distribution of the sequences in the MOTP-MOTA space is mainly due to noisy ground truth annotations by the point light software and the many occlusion-based challenges present in this phase. Even in such a challenging scenario (with occlusions and


Fig 13: Estimated fine joint trajectories by different schemes for subject 11 wearing a coat in phase A. Panels shown: (c) wrist joint, (d) hip joint.


Fig 13 (continued): Estimated fine joint trajectories by different schemes for subject 11 wearing a coat in phase A. Panels shown: (e) knee joint, (f) ankle joint.


Fig 14: Scatter plots showing where the sequences of each phase are distributed in the MOTP/MOTA space. (a) Phase A. (b) Phase B.


Fig 14 (continued): Scatter plots showing where the sequences of each phase are distributed in the MOTP/MOTA space. (c) Phase C. (d) Phase D.


Fig 14 (continued): Scatter plots showing where the sequences of each phase are distributed in the MOTP/MOTA space. (e) Phase E.

background similarities with interlacing), the tracking scheme performs moderately well.

For phase C, although there are a few sequences which show low accuracy, the majority of the sequences have a moderate accuracy of around 60% or more. Although this phase has a slightly better resolution of the person, some challenging scenarios similar to phase B exist in this phase as well; due to the better resolution, however, the tracking scheme performs much better in phase C than in phase B.

Phases D and E, however, show many sequences with good accuracies of 75% or more. Similar challenging scenarios exist, with the difference that the person moves in an anti-clockwise manner around the track. Overall, for each phase, we notice that a considerable number of sequences show good accuracies of 75% or more, with a minor portion exhibiting low


accuracies of 50% or less. Again, this is due to the large amount of noise in the coarse joint location estimates provided by the point light software, which drops the accuracy for some sequences. This noise in fact contributes to the number of false positives, which may be incorrectly interpreted, thereby reducing some portion of the accuracy during evaluation. However, for all the phases, a good precision of 75% or more is achieved, and the tracking scheme is precise in locating or providing finer estimates of the joint location.

    5.4 Precision/Recall for each body joint

The precision and recall are computed for each phase for a particular value of the acceleration parameter a in the Kalman filter and are illustrated in Figure 15. For phases A, C and D, we see that the precision and recall achieve their highest values of around 80% and 85% for an acceleration value a = 0.1. However, for phases B and E, an acceleration of a = 0.2 gives higher values of precision and recall. This is due to the difference in the speed of the joints from individual to individual, and an optimal value of the acceleration for each person is required.

    6 Conclusions and Future Work

We have proposed a body joint tracking algorithm for use in low-resolution imagery of outdoor sequences. The algorithm is a combination of primitive but effective point tracking techniques using optical flow and region-based matching using LBP, coupled with the learning ability of the Kalman filter. Some joints, such as the shoulder, elbow and hip, are successfully tracked in most of the sequences, along with the wrist joint. However, the knee and ankle joints have multiple occurrences of re-initialization due to mismatching of the optical flow caused by low-resolution artifacts and interlacing effects. An important addition which we plan to make in future work is


Fig 15: Variation of precision and recall of the tracking scheme with change in the acceleration parameter. (a) Phase A. (b) Phase B. (c) Phase C. (d) Phase D. (e) Phase E.


to use the contextual relationship between the body joints. This crucial aspect is missing in the proposed algorithm, as it assumes that each joint's movement is independent of the other joints, which corresponds to the use of an individual piece-wise tracking scheme for each joint.

    References

1. H. Ben Shitrit, J. Berclaz, F. Fleuret, and P. Fua, "Tracking multiple people under global appearance constraints," in Computer Vision (ICCV), 2011 IEEE International Conference on, pp. 137-144, 2011.

2. J. Shao, S. Zhou, and R. Chellappa, "Tracking algorithm using background-foreground motion models and multiple cues," in Acoustics, Speech, and Signal Processing, 2005. Proceedings (ICASSP '05). IEEE International Conference on, 2, pp. 233-236, 2005.

3. W.-L. Lu and J. Little, "Simultaneous tracking and action recognition using the pca-hog descriptor," in Computer and Robot Vision, 2006. The 3rd Canadian Conference on, pp. 6-6, 2006.

4. M. Kaaniche and F. Bremond, "Tracking hog descriptors for gesture recognition," in Advanced Video and Signal Based Surveillance, 2009. AVSS '09. Sixth IEEE International Conference on, pp. 140-145, 2009.

5. P. Bilinski, F. Bremond, and M. B. Kaaniche, "Multiple object tracking with occlusions using hog descriptors and multi resolution images," in Crime Detection and Prevention (ICDP 2009), 3rd International Conference on, pp. 1-6, 2009.

6. T. Ojala, M. Pietikainen, and T. Maenpaa, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," Pattern Analysis and Machine Intelligence, IEEE Transactions on 24(7), pp. 971-987, 2002.


7. J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake, "Real-time human pose recognition in parts from single depth images," in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pp. 1297-1304, 2011.

8. C.-H. Huang, E. Boyer, and S. Ilic, "Robust human body shape and pose tracking," in 3D Vision (3DV), 2013 International Conference on, pp. 287-294, 2013.

9. M. Kohler, Using the Kalman Filter to Track Human Interactive Motion: Modelling and Initialization of the Kalman Filter for Translational Motion, Forschungsberichte des Fachbereichs Informatik der Universität Dortmund, Dekanat Informatik, Univ., 1997.

10. Y. Yang and D. Ramanan, "Articulated pose estimation with flexible mixtures-of-parts," in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pp. 1385-1392, June 2011.

11. V. Ramakrishna, T. Kanade, and Y. Sheikh, "Tracking human pose by tracking symmetric parts," in Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pp. 3728-3735, 2013.

12. V. Ferrari, M. Marin-Jimenez, and A. Zisserman, "Progressive search space reduction for human pose estimation," in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pp. 1-8, June 2008.

13. D. Ramanan, "Learning to parse images of articulated bodies," in Advances in Neural Information Processing Systems 19, B. Scholkopf, J. Platt, and T. Hoffman, eds., pp. 1129-1136, MIT Press, 2007.


14. X. Burgos-Artizzu, D. Hall, P. Perona, and P. Dollar, "Merging pose estimates across space and time," in Proceedings of the British Machine Vision Conference, BMVA Press, 2013.

15. C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.

16. B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Proceedings of the 7th International Joint Conference on Artificial Intelligence - Volume 2, IJCAI'81, pp. 674-679, 1981.

17. T. Lacey, "Tutorial: The Kalman filter," Georgia Institute of Technology.

18. G. Welch and G. Bishop, "An introduction to the Kalman filter," 1995.

19. W. Forstner and B. Moonen, "A metric for covariance matrices," 1999.

20. K. Bernardin and R. Stiefelhagen, "Evaluating multiple object tracking performance: The CLEAR MOT metrics," J. Image Video Process. 2008, pp. 1:1-1:10, Jan. 2008.

21. A. D. Bagdanov, A. Del Bimbo, F. Dini, G. Lisanti, and I. Masi, "Compact and efficient posterity logging of face imagery for video surveillance," 2012.