
Camera-less Articulated Trajectory Reconstruction

Yingying Zhu1,2, Jack Valmadre2,3, Simon Lucey1,2,3

1University of Queensland, 2Queensland University of Technology, 3CSIRO
{yingying.zhu, jack.valmadre, simon.lucey}@csiro.au

Abstract

In this paper, we address the problem of reconstructing 3D trajectories given only 2D point projection trajectories of an articulated structure. Hitherto, most applications of articulated trajectory reconstruction require: (i) relative lengths, and (ii) camera motion. We propose a novel extension that allows us, in many circumstances, to circumvent these limitations. Of particular note is our theoretical characterisation, and empirical demonstration, of how agnostic one can be to camera motion when using a DCT basis. Practical reconstructions of non-human articulated structures are depicted (e.g. koala, sea snake, Tasmanian tiger) where articulated lengths and camera motion are unknown.

1. Introduction

Trajectory basis non-rigid structure from motion (NRSFM) refers to the problem of reconstructing a point trajectory from 2D projections using a low-dimensional trajectory basis. Unfortunately, trajectory basis NRSFM suffers from poor reconstruction for realistic/slow camera motion [5]. Recently, Park and Sheikh [4] proposed an articulated trajectory approach which was able to successfully reconstruct 3D structure for slow camera motion, but requires, a priori, known relative camera motion and relative articulated lengths.

Bundle adjustment approaches for estimating relative camera motion from a sequence of images of the same moving scene are now a well understood science in computer vision. The approach, however, relies heavily upon textured areas of the image relating to the rigid scene. These textured areas are employed to estimate point correspondences between adjacent frames, from which camera motion estimates can be obtained. This becomes problematic when we employ video sequences containing non-rigid objects, as the rigid background texture is often either: (a) out of focus, making the background unusable for estimating correspondences, or (b) nonexistent (e.g. white building, blue sky, dusty ground, etc.). Further, the relative articulated lengths of the non-rigid object being

[Figure 1 panels: Known Camera & Length (Park et al.) vs. Camera-less & Unknown Length (Our Approach)]

Figure 1: Park et al. [4] recently proposed an approach to reconstruct smooth articulated trajectories given that the cameras and relative lengths are known (e.g. a person). In comparison, our method can reconstruct such trajectories where lengths and cameras are unknown (e.g. Tasmanian tiger), provided the camera motion is smooth.

analysed are often not known, particularly for non-human articulated objects (e.g. animals). These two dilemmas (see Figure 1) are at the heart of this paper.

In this paper we propose an extension to the articulated trajectory reconstruction approach of Park and Sheikh [4] for when the camera motion and the articulated relative lengths are unknown. Specifically, we: i) Extend Park and Sheikh's approach [4] to include both perspective and affine cameras; the affine camera extension enables the estimation of relative articulated lengths. For the affine camera case, we theorise that for slow moving cameras good reconstruction performance can still be attained without knowing the camera motion, making the algorithm practically "camera-less". ii) Depict practical 3D reconstructions from a range of 2D video sequences containing articulated structures (e.g. koalas, Tasmanian tigers, sea-snakes, and humans) with unknown relative lengths as well as backgrounds unsuitable for bundle adjustment camera estimation.

2. Problem

Each point $\mathbf{x}_{ti} \in \mathbb{R}^3$ is imaged as $\mathbf{w}_{ti} \in \mathbb{R}^2$ by a pin-hole camera for all $t \in \{1, \ldots, F\}$ and $i \in \{1, \ldots, N\}$; we drop the subscripts for convenience,

$$\begin{bmatrix} \mathbf{w} \\ 1 \end{bmatrix} \simeq \begin{bmatrix} \mathbf{R} & \mathbf{d} \\ \mathbf{c}^T & b \end{bmatrix} \begin{bmatrix} \mathbf{x} \\ 1 \end{bmatrix} \quad (1)$$

The projective equality in Equation 1 yields the under-determined $2 \times 3$ system of linear equations

$$\mathbf{Q}\mathbf{x} = \mathbf{u}, \quad (2)$$

International Conference on Pattern Recognition (ICPR), 2012

where $\mathbf{Q} = \mathbf{R} - \mathbf{w}\mathbf{c}^T$ and $\mathbf{u} = b\mathbf{w} - \mathbf{d}$. When $\mathbf{P}$ represents an affine camera,

$$\mathbf{P} = \begin{bmatrix} \mathbf{R} & \mathbf{d} \\ \mathbf{0} & 1 \end{bmatrix} \;\Rightarrow\; \mathbf{R}\mathbf{x} = \mathbf{w} - \mathbf{d}. \quad (3)$$
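To make the rearrangement from the projective equality (Equation 1) into the linear system $\mathbf{Q}\mathbf{x} = \mathbf{u}$ (Equation 2) concrete, the following is a minimal numerical sketch. All camera parameters here are randomly generated placeholders, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pin-hole camera from Equation 1: top block [R d] (2x4),
# bottom row [c^T b] (1x4).
R = rng.standard_normal((2, 3))
d = rng.standard_normal(2)
c = rng.standard_normal(3)
b = 5.0  # chosen so the projective depth stays well away from zero here

x = rng.standard_normal(3)       # 3D point
depth = c @ x + b                # projective depth
w = (R @ x + d) / depth          # 2D projection of x

# Rearranging (c^T x + b) w = R x + d gives Q x = u (Equation 2).
Q = R - np.outer(w, c)           # 2x3: under-determined in x
u = b * w - d

assert np.allclose(Q @ x, u)
```

Being a $2 \times 3$ system, $\mathbf{Q}\mathbf{x} = \mathbf{u}$ constrains $\mathbf{x}$ only up to the one-dimensional nullspace of $\mathbf{Q}$, which the articulated length constraint of Section 3 resolves.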

3. Articulated Trajectory Reconstruction

Parent $i$ and child $j$ points are connected in an articulated structure at time $t$ if

$$\|\Delta\mathbf{x}_{tj}\|_2^2 = \ell_{ij}^2 \quad \forall t, \quad (4)$$

where $\Delta\mathbf{x}_{tj} = \mathbf{x}_{tj} - \mathbf{x}_{ti}$ and $\ell_{ij}$ is a constant length between these two points. When $\mathbf{x}_{ti}$ is known, the projection of $\mathbf{x}_{tj}$ to $\mathbf{u}_{tj}$ defines an under-determined $2 \times 3$ system of equations in $\Delta\mathbf{x}_{tj}$,

$$\mathbf{Q}_t \Delta\mathbf{x}_{tj} = \Delta\mathbf{u}_{tj}, \quad (5)$$

where $\Delta\mathbf{u}_{tj} = \mathbf{u}_{tj} - \mathbf{u}_{ti}$. Since the nullspace of $\mathbf{Q}$ is one dimensional, it can be represented as a single vector $\mathbf{v} \in \mathbb{R}^3$, and one can decompose

$$\Delta\mathbf{x}_{tj} = \Delta\mathbf{x}'_{tj} + \alpha_{tj}\mathbf{v}, \quad (6)$$

where $\Delta\mathbf{x}'_{tj}$ is the portion of $\Delta\mathbf{x}_{tj}$ lying solely in the subspace of $\mathbf{Q}$ and $\alpha_{tj}$ is a scalar. Substituting this decomposition into Equation 4, we get

$$\alpha_{tj} = \bar{\alpha}_{tj} \pm \delta\alpha_{tj}. \quad (7)$$
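Substituting Equation 6 into the length constraint of Equation 4 yields a scalar quadratic in $\alpha$, whose two roots give the two candidate solutions of Equation 7. The sketch below illustrates this with synthetic data; it uses the minimum-norm particular solution (for which $\bar{\alpha}$ happens to vanish, since that solution is orthogonal to the nullspace), but retains the general quadratic formula:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 2x3 system Q (Equation 5) and a known edge length.
Q = rng.standard_normal((2, 3))
length = 1.0

# Ground-truth relative position with the correct length, and its projection.
dx_true = rng.standard_normal(3)
dx_true *= length / np.linalg.norm(dx_true)
du = Q @ dx_true

# Particular (minimum-norm) solution and a unit null-space vector v.
dx0, *_ = np.linalg.lstsq(Q, du, rcond=None)
_, _, Vt = np.linalg.svd(Q)
v = Vt[-1]                           # null(Q) is one-dimensional

# ||dx0 + alpha v||^2 = length^2 is quadratic in alpha (Equations 4, 6, 7).
a = v @ v
b = 2.0 * (v @ dx0)
c = dx0 @ dx0 - length ** 2
abar = -b / (2.0 * a)
dalpha = np.sqrt(b * b - 4.0 * a * c) / (2.0 * a)

# Two candidates: alpha = abar + dalpha and alpha = abar - dalpha.
candidates = [dx0 + (abar + s * dalpha) * v for s in (+1.0, -1.0)]
assert any(np.allclose(dx, dx_true) for dx in candidates)
assert all(np.isclose(np.linalg.norm(dx), length) for dx in candidates)
```

The per-frame sign ambiguity between these two candidates is exactly what Equations 8-12 resolve across the sequence.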

We follow Park et al.'s [4] idea of resolving the sign for an articulated structure by employing a trajectory basis: the solutions for the articulated trajectory of point $j$ relative to its parent point $i$ are parametrized by

$$\Delta\mathbf{x}_j = \Delta\mathbf{x}'_j + \mathbf{V}_j(\bar{\boldsymbol{\alpha}}_j + \delta\boldsymbol{\alpha}_j \circ \mathbf{s}_j) \quad (8)$$

where $\mathbf{s}_j \in \{-1, 1\}^F$ is the sign in each frame,

$$\Delta\mathbf{x}_j = \begin{bmatrix} \Delta\mathbf{x}_{1j} \\ \vdots \\ \Delta\mathbf{x}_{Fj} \end{bmatrix}, \quad \boldsymbol{\alpha}_j = \begin{bmatrix} \alpha_{1j} \\ \vdots \\ \alpha_{Fj} \end{bmatrix}, \quad (9)$$

$$\mathbf{V}_j = \begin{bmatrix} \mathbf{v}_{1j} \otimes \mathbf{e}_1 & \cdots & \mathbf{v}_{Fj} \otimes \mathbf{e}_F \end{bmatrix}, \quad (10)$$

where $\mathbf{e}_k$ is the $k$-th column of an identity matrix. Dropping the subscripts again for convenience, the orthogonal component of the relative trajectory is

$$\|\Delta\mathbf{x}\|^2_{\mathbf{M}} = \|\Delta\mathbf{x}'\|^2_{\mathbf{M}} + \|\bar{\boldsymbol{\alpha}} + \delta\boldsymbol{\alpha} \circ \mathbf{s}\|^2_{\mathbf{V}^T\mathbf{M}\mathbf{V}}. \quad (11)$$

The vector $\mathbf{s}^*$ that gives the "smoothest" trajectory in terms of the basis $\boldsymbol{\Theta}$ is thus determined,

$$\mathbf{s}^* = \arg\min_{\mathbf{s} \in \mathbb{R}^F} \|\bar{\boldsymbol{\alpha}} + \delta\boldsymbol{\alpha} \circ \mathbf{s}\|^2_{\mathbf{V}^T\mathbf{M}\mathbf{V}} \quad \text{subject to} \quad \mathbf{s} \circ \mathbf{s} = \mathbf{1}, \quad (12)$$

where $\mathbf{M} = \mathbf{I} - \boldsymbol{\Theta}\boldsymbol{\Theta}^T$ is the orthogonal projector onto the complement of the column-space of $\boldsymbol{\Theta}$, and $\|\Delta\mathbf{x}'\|^2_{\mathbf{M}} = \Delta\mathbf{x}'^T \mathbf{M} \Delta\mathbf{x}'$. We employ Goemans and Williamson's approach [2] to optimize Equation 12. Note that as $\bar{\boldsymbol{\alpha}} \to \mathbf{0}$, the case where an affine camera is a good assumption, the objective becomes invariant to a global sign ambiguity in $\mathbf{s}$.
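Equation 12 is a Boolean quadratic program; the paper optimizes it with the Goemans-Williamson semidefinite relaxation [2]. As a self-contained illustration only, the sketch below brute-forces all $2^F$ sign vectors for a small synthetic $F$ (the matrix `W` merely stands in for $\mathbf{V}^T\mathbf{M}\mathbf{V}$ and is not derived from real data):

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
F = 8                                  # number of frames (small, for brute force)

# Hypothetical per-frame quantities from Equations 7 and 11.
abar = 0.1 * rng.standard_normal(F)    # small: near-affine regime
dalpha = np.abs(rng.standard_normal(F))
A = rng.standard_normal((F, F))
W = A.T @ A                            # positive semi-definite stand-in for V^T M V

def objective(s):
    r = abar + dalpha * s              # elementwise product dalpha ∘ s
    return r @ W @ r

# Exhaustive search over all sign vectors s ∈ {-1, 1}^F (Equation 12).
best = min((np.array(s) for s in itertools.product((-1.0, 1.0), repeat=F)),
           key=objective)

assert all(abs(si) == 1.0 for si in best)
assert objective(best) <= objective(np.ones(F))
```

When $\bar{\boldsymbol{\alpha}} \to \mathbf{0}$, `objective(best)` and `objective(-best)` coincide, matching the global sign ambiguity noted above.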

4. Camera-less Reconstruction

The central focus of this work is how to reconstruct a 3D articulated trajectory when the camera matrix $\mathbf{Q}$ is unknown. We make two simplifying assumptions to aid us in this task: (i) an affine rather than perspective camera, and (ii) the scale of the projection is constant, or can be inferred. In the affine camera case, we are able to use an existing approach for estimating articulated lengths [6, 7], where the length of an edge is approximated by the longest observed projection, after normalization for scale.
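The length-estimation idea from [6, 7] can be sketched as follows: under a scaled orthographic camera with known per-frame scale, the scale-normalised projected edge length never exceeds the true 3D length, so its maximum over frames is a (lower-bound) estimate. All quantities below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(3)
F, length = 50, 2.0   # frames and the (unknown, to-be-estimated) 3D edge length

# Simulate an edge of fixed 3D length under a scaled orthographic camera:
# a random unit orientation each frame, and a known per-frame scale lam_t.
directions = rng.standard_normal((F, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
lam = 0.5 + rng.random(F)                          # per-frame projection scale
dw = lam[:, None] * (length * directions[:, :2])   # projected relative positions

# Estimate: the longest scale-normalised projection. It underestimates the
# true length unless the edge is fronto-parallel in at least one frame.
est = np.max(np.linalg.norm(dw, axis=1) / lam)
assert 0.0 < est <= length + 1e-9
```

Only relative lengths matter for the reconstruction, so a consistent underestimate across edges is tolerable provided each edge extends close to fronto-parallel at some point in the sequence.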

In the affine camera model, a point $\mathbf{x}_{ti}$ in the world coordinate frame is transformed into the camera coordinate frame $\mathbf{x}^{(c)}_{ti}$ such that

$$\mathbf{x}^{(c)}_{ti} = \mathbf{R}^*_t \mathbf{x}_{ti} + \mathbf{d}^*_t. \quad (13)$$

Interpreting the point from the camera coordinate frame enables us to entertain a projection matrix which is constant up to scale,

$$\mathbf{w}_{ti} = \mathbf{R}^{(c)}_t \mathbf{x}^{(c)}_{ti}, \quad (14)$$

where

$$\mathbf{R}^{(c)}_t = \lambda_t \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \quad (15)$$

and the scale $\lambda_t$ can be estimated from a reference object near the non-rigid structure or from some rigid sub-structure using Tomasi-Kanade factorization [6].

Further, if the trajectory of a point in camera coordinates lies on the basis,

$$\mathbf{x}^{(c)} \approx \boldsymbol{\Theta}^* \boldsymbol{\beta}^{(c)}, \quad (16)$$

then one can reconstruct the 3D structure in the camera's coordinate frame without knowing the camera parameters, provided that no trajectories with flipped signs also lie on the basis. A drawback to this approach is that the reconstructed 3D trajectory $\mathbf{x}^{(c)}$ encompasses both the non-rigid motion of the articulated structure and the rigid motion of the camera, but it is still extremely useful for obtaining a 3D reconstruction of the articulated structure.

4.1. Bound for Trajectory Basis

The remaining critical question is: how do we find the size of the basis $\boldsymbol{\Theta}^*$ needed to represent trajectories in camera coordinates in Equation 16, given that $\mathbf{x} \approx \boldsymbol{\Theta}\boldsymbol{\beta}$ in world coordinates? It is possible to establish a bound on the size of


[Figure 2 panels: Stationary Camera; Camera Motion ($K_R$); Object Motion ($K_{NR}$); Object Motion ($K_{NR} + K_R$).]

Figure 2: The camera motion is represented by a basis of $K_R$ vectors and the object motion by a basis of $K_{NR}$ vectors. If we consider the camera to be stationary, this is equivalent to adding the camera motion to the object motion. Our method assumes a stationary camera and reconstructs the point motion, including the camera motion, with a basis of size $(K_R + K_{NR})$.

basis which must be used to represent a smooth trajectory in the reference frame of a smooth camera, for the specific case of a Discrete Cosine Transform (DCT) basis. The DCT basis has been used nearly universally in previous work on trajectory basis NRSFM [1, 3, 4, 5] for its ability to represent smooth trajectories.

Let the points of a trajectory $\{\mathbf{x}_t\}_{t=1}^F$ and the parameters of the cameras $\{\mathbf{R}^*_t, \mathbf{d}^*_t\}_{t=1}^F$ be formed

$$\mathbf{x}_t = \begin{bmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{bmatrix}, \quad \mathbf{d}^*_t = \begin{bmatrix} d_1(t) \\ d_2(t) \\ d_3(t) \end{bmatrix}, \quad \mathbf{R}^*_t = \begin{bmatrix} r_{11}(t) & r_{12}(t) & r_{13}(t) \\ r_{21}(t) & r_{22}(t) & r_{23}(t) \\ r_{31}(t) & r_{32}(t) & r_{33}(t) \end{bmatrix}. \quad (17)$$

If the trajectories $\mathbf{x}$ and $\mathbf{d}^*$ lie on the basis $\boldsymbol{\Theta}$ then they can be expressed

$$x_m(t) = \sum_{k=1}^{K_{NR}} \theta_k(t)\,\beta_{mk}, \quad d_m(t) = \sum_{k=1}^{K_{NR}} \theta_k(t)\,\gamma_{mk}, \quad (18)$$

where $\theta_k(t)$ gives the $t$-th entry of column $k$ in $\boldsymbol{\Theta}$ and $K_{NR}$ is the number of DCT basis vectors representing the point trajectory. Following the work of Gotardo and Martinez [3], we say that a rotation matrix $\mathbf{R}^*$ is smooth with respect to the basis $\boldsymbol{\Theta}$ if it can be expressed

$$r_{ij}(t) = \sum_{k=1}^{K_R} \theta_k(t)\,\zeta_{ijk}, \quad (19)$$

where $K_R$ is the number of DCT basis vectors representing the camera rotation. Intuitively, each component of $\mathbf{x}^{(c)}_t$ is a linear combination of products of two cosines, and by the trigonometric identity

$$\cos(x)\cos(y) = \tfrac{1}{2}\left[\cos(x - y) + \cos(x + y)\right], \quad (20)$$

the resulting trajectory can be represented by $(K_R + K_{NR})$ DCT basis vectors, as shown in Figure 2.
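This bound can be checked numerically: the product of a $K_R$-band and a $K_{NR}$-band DCT signal stays within the first $K_R + K_{NR}$ DCT coefficients, by the product-to-sum identity of Equation 20. A sketch with an explicitly constructed orthonormal DCT-II basis and synthetic coefficients (no claim about the paper's implementation):

```python
import numpy as np

F = 128
t = np.arange(F)
# Orthonormal DCT-II basis: column k is cos(pi*(t+0.5)*k/F), suitably scaled.
Theta = np.cos(np.pi * (t[:, None] + 0.5) * np.arange(F)[None, :] / F)
Theta[:, 0] *= np.sqrt(1.0 / F)
Theta[:, 1:] *= np.sqrt(2.0 / F)

rng = np.random.default_rng(4)
K_NR, K_R = 10, 5
point = Theta[:, :K_NR] @ rng.standard_normal(K_NR)  # x_m(t), as in Equation 18
rot = Theta[:, :K_R] @ rng.standard_normal(K_R)      # r_ij(t), as in Equation 19

product = rot * point            # one term of the camera-frame trajectory
coeffs = Theta.T @ product       # exact DCT coefficients (Theta is orthonormal)

# By Equation 20, each cosine pair with frequencies j < K_R and k < K_NR
# produces frequencies |j - k| and j + k <= K_R + K_NR - 2, so every
# coefficient from index K_R + K_NR onward vanishes.
assert np.allclose(coeffs[K_R + K_NR:], 0.0, atol=1e-9)
```

The camera-frame trajectory of Equations 13-15 is a sum of such products (plus a $K_{NR}$-band translation term), so it lies within the first $K_R + K_{NR}$ DCT basis vectors, which is the predicted basis size plotted in Figure 3 (c).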

5. Experiments

Our method has been evaluated quantitatively using synthetic projections of the CMU Motion Capture dataset1 and qualitatively with real-world examples for

1http://mocap.cs.cmu.edu/

different types of articulated structures.

5.1. CMU Mocap

Figure 3 (a) presents the normalized RMS error of the canonical trajectory method, the articulated trajectory method, and the camera-less approach for different camera speeds. While the articulated trajectory approach was provided with the ground-truth lengths, our method estimated the lengths using the maximally extended projection. For slow cameras, our method performs close to the articulated reconstruction method, which has knowledge of the cameras and articulated lengths. Figure 3 (b) presents the basis size $K_R$ for camera rotation versus varying camera speed, and Figure 3 (c) shows the actual optimal basis size, consistent with the basis size predicted in Section 4. The optimal basis size is obtained through exhaustive search.

5.2. Real Sequences

Figure 4 demonstrates practical reconstruction results on real-world video sequences of different articulated structures where the relative articulated lengths are unknown and camera estimation would be difficult or impossible, as there is no textured rigid background available (the koala, Tasmanian tiger and sea snake) or only a limited view (Lola). For the purpose of demonstration, 2D point correspondences were manually labeled.

6. Discussion and Conclusions

In this work, we proposed a camera-less approach for reconstructing articulated trajectories observed by a realistically slow-moving camera. We characterized the conditions under which trajectories can be reconstructed without known camera motion. Practical results were demonstrated on several real-world videos.

References

[1] I. Akhter, Y. Sheikh, S. Khan, and T. Kanade. Nonrigid structure from motion in trajectory space. In NIPS, 2008.

[2] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42, 1995.

[3] P. Gotardo and A. Martinez. Computing smooth time trajectories for camera and deformable shape in structure from motion with occlusion. PAMI, 33(10), 2011.

[4] H. S. Park and Y. Sheikh. 3D reconstruction of a smooth articulated trajectory from a monocular image sequence. In ICCV, 2011.

[5] H. S. Park, T. Shiratori, I. Matthews, and Y. Sheikh. 3D reconstruction of a moving point from a series of 2D projections. In ECCV, 2010.

[6] J. Valmadre and S. Lucey. Deterministic 3D human pose estimation using rigid structure. In ECCV, 2010.

[7] X. K. Wei and J. Chai. Modeling 3D human poses from uncalibrated monocular images. In ICCV, 2009.


[Figure 3 panels: (a) RMS error versus camera speed, comparing the canonical trajectory, articulated trajectory (known camera), and camera-less articulated trajectory methods; (b) number of basis vectors versus camera speed; (c) basis size versus camera basis size $K_R$, comparing predicted size ($K_R + K_{NR}$) against actual size.]

Figure 3: (a) Normalised mean reconstruction error versus camera speed. The canonical trajectory basis approach [5] relies on known, fast camera motion. The articulated trajectory approach [4] requires known articulated relative lengths and camera motion. The proposed camera-less articulated trajectory method depends on slow camera motion and avoids camera estimation, at the small cost of increased reconstruction error for fast camera motion. Fortunately, in most practical applications, camera motion is slow and smooth. (b) The basis size ($K_R$) for representing camera motion versus camera speed. (c) The actual basis size and the predicted basis size ($K_{NR} + K_R$) for representing point trajectories in camera coordinates, versus camera motion basis size $K_R$.

Figure 4: The reconstruction of several sequences from two novel views, for real-world video where camera estimation would be difficult, as significant perspective effects are only observed for a handful of frames or no textured rigid background is available.