Affordable 3D Face Tracking Using Projective Vision

D.O. Gorodnichy*¹, S. Malik², G. Roth¹

¹ Computational Video Group, National Research Council, Ottawa, Canada K1A 0R6
² School of Computer Science, Carleton University, Ottawa, Canada K1S 5B6
http://www.cv.iit.nrc.ca/research/Stereotracker

* The author to whom correspondence should be sent.

Abstract

For humans, to view a scene with two eyes is clearly more advantageous than to do so with one eye. In computer vision, however, most high-level vision tasks, an example of which is face tracking, are still done with one camera only. This is due to the fact that, unlike in human brains, the relationship between the images observed by two arbitrary video cameras is, in many cases, not known. Recent advances in projective vision theory, however, have produced the methodology which allows one to compute this relationship. The relationship is naturally obtained while observing the same scene with both cameras, and knowing it not only makes it possible to track features in 3D, but also makes tracking much more robust and precise. In this paper, we establish a framework based on projective vision for tracking faces in 3D using two arbitrary cameras, and describe a stereo tracking system which uses the proposed framework to track faces in 3D with the aid of two USB cameras. While being very affordable, our stereo tracker exhibits pixel-size precision and is robust to the head's rotation about all three axes of rotation.

1 Introduction

We consider the problem of tracking faces using a video camera and focus our attention on the design of vision-based perceptual user interface systems [28]. The main applications of these systems are seen in HCI, teleconferencing, entertainment, security and industry for the disabled [27, 30].

Being a high-level vision problem, the face tracking problem poses four major challenges: robustness, precision, speed and affordability. While the last two have become much less critical over the last few years, due to the significant increase of computer power and decrease of camera cost, the first two remain unresolved.

The approaches to face tracking can be divided into two classes: global (image-based) and local (feature-based) approaches [10, 31]. Global approaches use global cues like skin colour, head geometry and motion; they are more robust, but cannot be used for pixel-size precision tracking. On the other hand, local approaches, which are based on tracking facial features, can theoretically track very precisely. In practice, however, they do not, because they are very unrobust, due to the variety of motions and expressions a face may exhibit.

While for humans it is definitely easier to track objects with two eyes than with one eye, in computer vision face tracking is usually done with one camera only. A few authors do use stereo for face tracking [14, 15, 20, 29]. They, however, use the second camera mainly for the purpose of acquiring the third dimension rather than making tracking more robust, precise or affordable. In fact, conventional stereo setups are usually precalibrated and quite expensive.

The problem is that the human brain knows and makes use of the relationship between the images while processing them [2, 3], while computers do not. Recent advances in computer vision, though, have provided a methodology, based on projective vision, that allows one to compute this relationship for any two cameras [9, 23]. This relationship is naturally obtained while observing the same scene with both cameras and is represented by the fundamental matrix, which relates the two images to one another.

This paper describes how to use projective vision techniques for tracking faces with two arbitrary cameras. The most significant result is that computing the fundamental matrix not only allows one to recover the 3D position of the object with low-cost cameras, but also makes tracking much more robust. Combining the proposed projective-vision-based tracking approach with the robust convex-shape nose tracking technique described in [5] allowed us to build a stereo tracking system which is able to track faces in 3D using two generic USB cameras. The robustness of the system, the binary code of which can be downloaded from our website, is such that rotations of the head of up to 40 degrees about all three axes of rotation can be tracked.

The paper is organized as follows. After presenting the outline of our framework for affordable stereo tracking (Section 2), we recap the projective vision properties which make the framework possible (Section 3). This is followed by the description of the stereo self-calibration procedure (Section 4). Then we describe the tracking procedure of the framework and present the experimental results (Section 5).

Figure 1: Stereo tracking with two web-cameras. The key issue is finding the relationship between the cameras.

2 Stereo system setup

In order to present a framework for doing stereo tracking with two arbitrary uncalibrated cameras, we will describe the StereoTracker developed by our group, in which this framework is implemented.

The StereoTracker allows a user to do high-level vision tasks with off-the-shelf vision equipment. It tracks the face of a user in 3D with the aid of two ordinary USB web-cameras and consists of three major modules: 1) stereo self-calibration, 2) facial feature learning, and 3) feature tracking. The setup of the system is the following.

A user mounts two USB cameras on top of the computer monitor so that his face is seen by both cameras (see Figure 1), after which the user runs the self-calibration module of the StereoTracker to acquire the relationship between the cameras. For this, the user captures his head at the same time with both cameras at the distance to be used in tracking (Figure 2). When the stereo calibration information has been calculated, the StereoTracker makes use of it to learn robust facial features (Figure 3), and then makes use of it again while tracking the features in 3D (Figure 4).

Before proceeding to the description of the calibration procedure, which is the basis of our stereo tracking framework, we need to describe the notation and properties to be used throughout the paper.

3 Projective vision interlude

3.1 Points, lines and calibration matrix

According to the projective paradigm, every 2D pixel $p = (x, y)$ of an image is associated with the 3D vector $v$ which starts at the camera origin and goes through all the 3D space points that are projected to the same pixel on the image plane.¹ A line in an image is represented by a 3D vector $l$ which is perpendicular to all points $v$ belonging to the line: $l^\top v = 0$.

¹ Some authors [11, 8] prefer using the vector $(x, y, f)^\top$ instead of $(x, y, 1)^\top$ in order to avoid unbalancing the values.

The relationship between the vector $v$ and the pixel position $p$ of a point in the image is expressed as

    $v = K^{-1}\tilde{p}$,    (1)

where $\tilde{p}$ denotes the 3D vector obtained from the 2D vector $p$ by adding one as the last element. Matrix $K$ in this equation, the simplified form of which is written below, is termed the calibration matrix of the camera. It describes the intrinsic parameters of the camera, the most important of which are the center of the image $(c_x, c_y)$, measured in pixel coordinates, and the focal length of the camera $f$, defined as the distance from the camera origin to the camera image plane, measured in pixels:

    $K = \begin{pmatrix} f & 0 & c_x \\ 0 & f & c_y \\ 0 & 0 & 1 \end{pmatrix}$.    (2)

Computing the calibration matrix of the camera constitutes the calibration process of the camera. The goal of self-calibration is to compute this matrix directly from an image sequence without resorting to a formal calibration procedure. One of the ways to do this is by using the concept of the fundamental matrix.
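To make Eqs. 1-2 concrete, here is a minimal Python sketch (our illustration, not the paper's code; the focal length and image-center values are assumptions for a 320x240 image):

```python
import numpy as np

def calibration_matrix(f, cx, cy):
    """Simplified calibration matrix K of Eq. 2."""
    return np.array([[f, 0.0, cx],
                     [0.0, f, cy],
                     [0.0, 0.0, 1.0]])

def pixel_to_ray(p, K):
    """Eq. 1: v = K^-1 p~, with p~ the pixel extended by a trailing 1."""
    return np.linalg.inv(K) @ np.array([p[0], p[1], 1.0])

# Illustrative values; f is a guess, not a result of self-calibration.
K = calibration_matrix(f=400.0, cx=160.0, cy=120.0)
v = pixel_to_ray((200.0, 90.0), K)
```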

3.2 Stereo and fundamental matrix

When a 3D point $P$ in space is observed by two cameras, it is projected onto the image plane of each camera. This generates two vectors $v$ and $v'$ starting from the origin of each camera. These vectors are related to each other through the equation

    $v'^\top (t \times R v) = 0$,    (3)

where $t$ is the translation vector between the camera positions and $R$ is the rotation matrix. This equation simply states the fact that the vectors $v'$, $t$ and $Rv$ are coplanar. In computer vision, this equation is known as the epipolar constraint and is usually written as

    $v'^\top E v = 0$,    (4)

where

    $E \equiv [t]_\times R$    (5)

is termed the essential matrix ($[t]_\times$ denotes the matrix form of the cross product with $t$).

The epipolar constraint holds for any two-camera setup and is very important for 3D applications, as it defines the relationship between the corresponding points in two camera images. For a handmade stereo setup with off-the-shelf cameras, $t$ and $R$ are not known. If the cameras are uncalibrated, then matrix $K$ is not known either. In this case, the epipolar constraint is rewritten, using Eq. 1, as

    $\tilde{p}'^\top F \tilde{p} = 0$,    (6)

where $\tilde{p} = (x, y, 1)^\top$ and $\tilde{p}' = (x', y', 1)^\top$ define the raw pixel coordinates of the calibrated vectors $v$ and $v'$, and the matrix $F$, defined as

    $F \equiv K^{-\top} E K^{-1}$,    (7)

is termed the fundamental matrix of the stereo setup.

Figure 2: Images captured by two cameras to be used in self-calibration. They should have enough visual features.

Computing the fundamental matrix constitutes the calibration process of the stereo setup. The next section describes this process in detail for the case of low-quality cameras; below we describe the properties of the fundamental matrix to be used in stereo tracking.
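The following sketch ties Eqs. 3-7 together numerically. It is an illustration under an assumed camera geometry, not the paper's implementation, and it assumes the same $K$ for both cameras (as Section 4.2 does):

```python
import numpy as np

def skew(t):
    """Matrix [t]x such that skew(t) @ a == np.cross(t, a)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def essential(t, R):
    """Eq. 5: E = [t]x R."""
    return skew(t) @ R

def fundamental(E, K):
    """Eq. 7: F = K^-T E K^-1, assuming the same K for both cameras."""
    K_inv = np.linalg.inv(K)
    return K_inv.T @ E @ K_inv

# For a true correspondence the residuals of Eqs. 4 and 6 vanish:
#   v_prime @ E @ v ~= 0   and   p_prime @ F @ p ~= 0
```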

3.3 Epipolar lines

Given a point $p$ in one image, the fundamental matrix can be used to compute a line in the other image on which the matching point must lie. This line is called the epipolar line and can be computed as

    $l' = F\tilde{p}$.    (8)

It is this piece of extra information that makes tracking with two cameras much more robust than tracking with one camera. In order to filter out bad matches, the epipolar error, defined as the sum of squares of the distances of points to their epipolar lines, can be used [9]. Using Eq. 8, the relationship between the epipolar error $\epsilon$ and the fundamental matrix $F$ can be derived as

    $\epsilon \equiv d^2(\tilde{p}', l') + d^2(\tilde{p}, l) = (\tilde{p}'^\top F \tilde{p})^2 \left[ \frac{1}{(F\tilde{p})_1^2 + (F\tilde{p})_2^2} + \frac{1}{(F^\top\tilde{p}')_1^2 + (F^\top\tilde{p}')_2^2} \right]$    (9)

and the following proposition can be used to make tracking more robust.

Proposition: Provided that the fundamental matrix $F$ of the stereo setup is known, the best pair of matches $p$ and $p'$ corresponding to a 3D feature is the one that minimizes the epipolar error defined by Eq. 9.

4 Stereo system self-calibration

The calibration procedure described below is implemented using the public domain Projective Vision Toolkit (PVT) [18] and is based on finding the correspondences in two images captured with both cameras observing the same static scene. Because off-the-shelf USB cameras are usually of low quality and resolution, extra care is taken to deal with bad matches by applying a set of filters and using robust statistics. The description of each step follows.

Finding interest points. After two images of the same scene are taken, the first step is to find a set of corners or interest points in each image. These are the points where there is a significant change in intensity gradient in both the x and y directions. A local interest point operator [26] is used and a fixed number of corners is returned. The final results are not particularly sensitive to the number of corners. Typically, on the order of 200 corners are found in each image.

Matching corners and symmetry filter. The next step is to match corners between the images. A local window around each corner is correlated against all other corner windows in the adjacent image that are within a certain pixel distance [32]. This distance represents an upper bound on the maximum disparity and is set to 1/3 of the image size. All corner pairs that pass a minimum correlation threshold are then filtered using a symmetry test, which requires the correlation to be maximal in both directions. This filters out half of the matches and forces the remaining matches to be one-to-one.
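A simplified sketch of this matching step, including the symmetry test, is given below; the window handling and the minimum-correlation value are assumptions, not the paper's exact settings:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def match_corners(patches, corners, patches_p, corners_p,
                  max_disp, min_corr=0.8):
    """Correlate each corner patch against the other image's corner
    patches within max_disp pixels, then keep only mutual best matches
    (the symmetry test). corners are Nx2 numpy arrays."""
    def best_match(i, src_p, dst_p, src_c, dst_c):
        best_j, best_s = -1, -1.0
        for j in range(len(dst_p)):
            if np.linalg.norm(src_c[i] - dst_c[j]) > max_disp:
                continue  # beyond the maximum-disparity bound
            s = ncc(src_p[i], dst_p[j])
            if s > best_s:
                best_j, best_s = j, s
        return best_j, best_s

    matches = []
    for i in range(len(patches)):
        j, s = best_match(i, patches, patches_p, corners, corners_p)
        if j < 0 or s < min_corr:
            continue
        i_back, _ = best_match(j, patches_p, patches, corners_p, corners)
        if i_back == i:    # correlation is maximal in both directions
            matches.append((i, j))
    return matches
```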

Disparity gradient filter. The next step is to perform local filtering of these matches. We use a relaxation-like process based on the concept of the disparity gradient [12], which measures the compatibility of two correspondences between an image pair. It is defined as the ratio

    $d_g = |d_1 - d_2| \, / \, |d_{mid}|$,    (10)

where $d_1$ and $d_2$ are the disparity vectors of two corners, $d_{mid}$ is the vector that joins the midpoints of these disparity vectors, and $|\cdot|$ designates the magnitude of a vector. The smaller the disparity gradient, the more the two correspondences are in agreement with each other. This filter is very efficient: at a very low computational cost, it removes a significant number of incorrect matches.
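A possible transcription of this filter is sketched below; the pairwise support counting is a simplified stand-in for the relaxation-like process of [12], and the thresholds are illustrative:

```python
import numpy as np

def disparity_gradient(m1, m2):
    """Eq. 10 for two correspondences m = (p, p'), each point a 2D
    numpy array: difference of the disparity vectors divided by the
    magnitude of the vector joining the correspondence midpoints."""
    (p1, q1), (p2, q2) = m1, m2
    d1, d2 = q1 - p1, q2 - p2
    d_mid = (p2 + q2) / 2.0 - (p1 + q1) / 2.0
    n = np.linalg.norm(d_mid)
    return np.linalg.norm(d1 - d2) / n if n > 0 else np.inf

def disparity_gradient_filter(matches, max_dg=1.0, min_neighbours=3):
    """Keep a match only if enough other matches agree with it
    (i.e. have a small disparity gradient relative to it)."""
    kept = []
    for i, m in enumerate(matches):
        agree = sum(1 for j, n in enumerate(matches)
                    if j != i and disparity_gradient(m, n) <= max_dg)
        if agree >= min_neighbours:
            kept.append(m)
    return kept
```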

Using random sampling. The final step is to use the filtered matches to compute the fundamental matrix. This process must be robust, since it still cannot be assumed that all filtered correspondences are correct. Robustness is achieved by using a random sampling algorithm. This is a "generate and test" process in which a minimal set of correspondences, the smallest number necessary to define a unique fundamental matrix (7 points), is randomly chosen [25, 22]. A fundamental matrix is then computed from this minimal set using Eq. 6. The set of all corners that satisfy this fundamental matrix, in terms of Eq. 9, is called the support set. The fundamental matrix with the largest support set is used in stereo tracking. Before it can be used, however, it needs to be evaluated, because robust and precise tracking is possible only under the assumption that the computed fundamental matrix correctly represents the current stereo setup.
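The random sampling loop might be sketched as follows; fit_seven_point() is a placeholder for a 7-point solver (which the paper does not spell out), and epipolar_error() is the Eq. 9 sketch from Section 3.3:

```python
import random

def ransac_fundamental(pairs, n_iter=500, inlier_err=5.0):
    """'Generate and test' estimation of F (a sketch). `pairs` holds
    homogeneous pixel correspondences (p, p'). fit_seven_point() is a
    hypothetical helper returning one or three real candidate F's."""
    best_F, best_support = None, []
    for _ in range(n_iter):
        sample = random.sample(pairs, 7)        # minimal set for F
        for F in fit_seven_point(sample):
            # Support set: matches consistent with F in terms of Eq. 9
            support = [(p, q) for p, q in pairs
                       if epipolar_error(p, q, F) < inlier_err]
            if len(support) > len(best_support):
                best_F, best_support = F, support
    return best_F, best_support
```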

4.1 Evaluating the quality of calibration

The evaluation is done in two ways: analytically, by examining the size of the support set, and visually, by examining the epipolar lines.

It has been empirically found that for the fundamental matrix to be correct it should have at least 35 matches in the support set. If their number is smaller, it means that either i) there were not enough visual features present in the images, or ii) the cameras are located too far from each other. In this case, the cameras have to be repositioned or some extra objects, in addition to the head, should be added to the cameras' field of view, and the calibration procedure should be repeated.

Figure 3 shows the epipolar line (on the right side) corresponding to the tip of the nose (on the left side), as obtained for the stereo setup consisting of the two Intel USB webcams shown in Figure 1. As can be seen, it passes correctly through the nose, thus verifying the correctness of the fundamental matrix and presenting a useful constraint for locating the nose in the tracking stage.

4.2 Finding the calibration matrix

Knowing matrix $F$ allows one to compute matrix $K$. Under the assumption that the intrinsic parameters of both cameras are the same, the approach we use is that of [17]. It is based on the fact that matrix $E$, defined by Eq. 5 and related to matrix $F$ through Eq. 7, has exactly two non-zero and equal singular values. The idea is to find a calibration matrix that makes the two singular values of $E$ as close as possible. Given the two non-zero singular values of $E$, $\sigma_1$ and $\sigma_2$ ($\sigma_1 \geq \sigma_2$), consider the difference $(\sigma_1 - \sigma_2)/\sigma_1$.

Given a fundamental matrix $F$, self-calibration proceeds by finding the calibration matrix that minimizes this difference. This is an optimization problem, which is solved by the dynamic hill climbing gradient descent approach [16]. In [24] it is shown that this procedure, while quite simple, is not inferior to more complicated approaches to self-calibration, such as those using the Kruppa equations [13].
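The cost being minimized can be written compactly as below (a sketch; searching only over the focal length with a fixed image center is an assumption, not necessarily the paper's exact parameterization):

```python
import numpy as np

def selfcalib_cost(f, F, cx, cy):
    """Cost of the self-calibration of [17]: with K(f) from Eq. 2,
    E = K^T F K (inverting Eq. 7) should have two equal non-zero
    singular values, so (s1 - s2)/s1 should vanish."""
    K = np.array([[f, 0.0, cx],
                  [0.0, f, cy],
                  [0.0, 0.0, 1.0]])
    E = K.T @ F @ K
    s = np.linalg.svd(E, compute_uv=False)    # s[0] >= s[1] >= s[2]
    return (s[0] - s[1]) / s[0]

# The paper minimizes this with dynamic hill climbing [16]; a coarse
# 1D scan over plausible focal lengths is enough for a sketch:
# f_best = min(np.linspace(200, 2000, 181),
#              key=lambda f: selfcalib_cost(f, F, cx=160, cy=120))
```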

Knowing matrix $K$ and the focal length allows one to reconstruct the spatial relationship of the observed features. For the tracking process, however, this information is not required.

Figure 3: Learning the features using the epipolar constraints. The nose feature must lie on the epipolar line.

5 Stereo feature tracking

We use the pattern recognition paradigm to represent a feature as a multi-dimensional vector made of feature attributes. In the case of image-based feature tracking, the feature attributes are the image intensities obtained by centering a specially designed peephole mask on the position of the feature (as in [6]).

The major advantage of stereo tracking, i.e. tracking with two cameras, over tracking with one camera is that of having an additional epipolar constraint, which ensures that the observed features belong to a rigid body (see Section 3.3). Using this constraint allows us to relax the matching constraint used to enforce the visual similarity between the observed and the template feature vectors. The matching constraint is the main reason why feature-based tracking with one camera is not robust, and relaxing this constraint makes tracking much more robust.

5.1 Feature learning

In this work, we are not concerned with automatic detection of facial features. All features are manually chosen at the learning stage. During this stage, by clicking with a mouse, a user selects a feature in one of the pair images (see Figure 3). This generates an epipolar line in the other image, and the template vectors of both the selected feature and the best match of the feature in the other image lying on the epipolar line are stored. Examples of learnt templates can be seen in Figure 4 underneath the face images. By visually examining the similarity of these templates, a user can ensure that the stereo setup is well calibrated and that the epipolar constraint indeed provides a useful constraint.

Several features can be selected for tracking. However, one of these features must be the tip of the nose. The tip of the nose is a very unique and important feature. As shown in [5], due to its convex shape, this feature is invariant to both the rotation and the scale of the head. If detection of the face orientation is not required, then robust face tracking in 3D can be achieved using this feature only.

Other features may include conventional edge-based features such as the inner corners of the brows and the corners of the mouth. At least two of these features are required in order to track the orientation of the head. These features are not invariant to the 3D motion of the head. Therefore, in order to make the tracking of these features robust, the rigidity constraint, which relates their positions to the nose position, is imposed. This constraint is computed while selecting the features. Another constraint used in stereo tracking is the disparity constraint, which defines the allowable disparity values for features during the tracking, thus limiting the area of search. Both the rigidity and the disparity constraints are computed during the selection of features in the learning stage and can be engaged or disengaged during the tracking.

5.2 Tracking procedure

Each selected facial feature is tracked using the following four-step procedure.

Step 1. The area for the local search is obtained using the rigidity constraint and the knowledge of the nose tip position. When the rigidity constraint is not engaged or when the feature is the nose, the local search area is set around the previous position of the feature. If this knowledge is not available (as in the case of mistracking), then the search area is set to the entire image.

Step 2. A set of the $N$ best candidate matches is generated in both images by using the peephole mask and correlating the local search area with the template vector learnt in the training. In order to be considered valid, all candidate matches are required to have a correlation with the template feature larger than a certain threshold. In our experiments, the allowable minimum correlation is set to 0.9.

Step 3. Out of the $N^2$ possible match pairs between the two images, the best $M$ pairs are selected using the cross-correlation between the images. In our experiments, $N$ is set to 10 and $M$ is set to 5. Increasing $N$ or $M$ increases the processing time, but makes tracking more robust.

Step 4. Finally, the proposition of Section 3 is used, and the match pair that minimizes the epipolar error $\epsilon$ defined by Eq. 9 is returned by the stereo tracking system. If the returned match has an epipolar error less than a certain threshold, then the feature is considered successfully tracked. Otherwise, the feature is considered mistracked. The value of the maximum allowable epipolar error $\epsilon_{max}$ depends on the quality of the stereo self-calibration and should be determined during the learning stage by observing the epipolar lines at different parts of the image. In the experiments presented in this paper it is set equal to 5 pixels.
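Step 4 might be transcribed as follows (a sketch that omits Step 3's cross-correlation pruning; epipolar_error() is the Eq. 9 sketch from Section 3.3):

```python
def best_stereo_match(cands, cands_prime, F, max_err=5.0):
    """Among candidate matches (homogeneous pixels (x, y, 1)), return
    the pair minimizing the epipolar error of Eq. 9, or None if even
    the best pair exceeds the mistracking threshold."""
    best_pair, best_err = None, float("inf")
    for p in cands:
        for q in cands_prime:
            err = epipolar_error(p, q, F)
            if err < best_err:
                best_pair, best_err = (p, q), err
    return best_pair if best_err < max_err else None
```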

When a feature is successfully tracked, its 3D position can be calculated using the essential matrix $E$, as in [9]. The distance between the cameras can be either measured by hand or set equal to one. In the latter case, the reconstruction will be known up to a scale factor, which is sufficient for most applications.

Figure 4: The StereoTracker at work. The orientation and scale of the virtual man (at the bottom right) are controlled by the position of the observed face.
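As an illustration of such an up-to-scale reconstruction, the following midpoint triangulation sketch assumes $R$ and $t$ have already been recovered from $E$ (as in [9]) with the convention $X_{cam2} = R X_{cam1} + t$; it is our sketch, not the StereoTracker's code:

```python
import numpy as np

def triangulate_midpoint(v, v_prime, R, t):
    """Midpoint triangulation of a tracked feature, up to the scale
    of |t|. Rays: X = s*v from camera 1 (at the origin) and
    X = C2 + s'*d2 from camera 2, where C2 = -R^T t and d2 = R^T v'
    in the first camera's frame."""
    C2 = -R.T @ t
    d2 = R.T @ v_prime
    # Least-squares depths (s, s') minimizing |s*v - (C2 + s'*d2)|,
    # i.e. the closest points of approach of the two viewing rays.
    A = np.stack([v, -d2], axis=1)
    (s, s_prime), *_ = np.linalg.lstsq(A, C2, rcond=None)
    X1 = s * v                    # closest point on ray 1
    X2 = C2 + s_prime * d2        # closest point on ray 2
    return (X1 + X2) / 2.0        # midpoint, in camera-1 coordinates
```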

5.3 Experiments

The tracked facial features provide information about the position and the orientation of the head with respect to the cameras. This information can be used to control a 2D object on the screen, such as a cursor or a turret in an aim-n-shoot game (as in [7]), or it can be used to control a 3D object, examples of which can be found in computer games and avatar-like computer-generated communication programs (as in [19]). The StereoTracker system, which we have developed, allows one to test the applicability of the proposed stereo tracking technique to either of these applications.

Figure 4 shows two images which are controlled by the motion of the head. The image on the bottom left is the re-projection of the detected features onto the 2D image plane. Our experiments show that, by thinking of the nose as a pointing device, one can easily pinpoint any pixel on the perspective-view screen. The system is robust to the rotation of the head. This makes it possible to draw a straight line with the nose, represented by the lowest vertex of the triangle in the figure, by simply rotating the head.

The user interface of the StereoTracker also allows one to change the perspective view of the features to a top-down view. This is used to evaluate the potential of using the third dimension of the tracked features for controlling the size of a cursor or another virtual object on the screen. The image on the bottom right of Figure 4 shows a virtual man, the position of which in a virtual 3D space is controlled by the motion of the user's head; the scale of the man is proportional to the distance between the features and the camera, and the roll rotation of the man coincides with the roll rotation of the head.

Due to the robustness of the stereo tracking, there is quite a wide range of head motion which can be tracked. For example, for the setup shown in Figure 1, the experiments show that within a range from 20 cm to 60 cm the head is tracked successfully. The tracking, however, becomes less robust to the rotation of the head as the user moves further from the original position.

Another observation is that stereo tracking with low-quality cameras does not allow one to retrieve the pan and tilt rotation of the head. This, however, does not come as a surprise if we recall that the depth calculation error due to the low resolution of the image can be as high as 10% of the measured depth distance, depending on the position of the feature [4, 21].

The user interface of the StereoTracker allows one to evaluate the robustness, speed and precision of stereo tracking with respect to the internal parameters and constraints of tracking. These parameters can be changed during the tracking stage and include the already mentioned minimum allowable correlation, the number of feature candidates ($N$), the number of refined matches after cross-correlation ($M$), the maximum epipolar error, and also the size of the area for the subpixel refinement, which can be used for features possessing the continuity property (see [5]), and the maximum colour difference between the stored feature pixel and a pixel being scanned. The last constraint is due to the fact that template matching in our system is done with black-and-white images; adding this constraint allows us to use the colour information to reduce the number of feature candidates.

In addition, the facial features can be checked in and out for tracking using the menu of the program. The disparity, rigidity and epipolar constraints can also be engaged and disengaged using the menu. While tracking is performed, the StereoTracker outputs statistics such as the minimum and maximum values of correlation, cross-correlation and epipolar errors of the detected features, as well as the detected 3D position and roll orientation of the head and the position of the cursor controlled by the head.

While the paper presents only snapshots of our experiments, full MPEG videos of the StereoTracker at work are available at our website. The binary code of the program can also be downloaded from the website. In order to run it, a user only has to have two USB webcams connected to a computer. In our experiments, we used two Intel USB web-cameras. Other USB webcams, such as Logitech, Creative and 3Com, were also tried. Accessing the images is done using the DirectX interface. The resolution of the images is 320 by 240. A lower resolution of 160 by 120 was also tried and found to be sufficient for precise and robust tracking. Image smoothing is done using the Intel Open Source CV library [1]. On a Pentium III 800 MHz processor, all processing takes 0.10 ms per frame on average. This allows one to do stereo face tracking in real time.

Figure 5 shows a typical range of head motion which is tracked using our framework, along with the position of 2D and 3D objects controlled by the head motion: tilt motion (a), pan motion (b), roll motion (c), and motion further from the cameras (d). Yellow boxes around the features show the areas for local search of the facial features. Using three features is found most optimal for recovering the orientation of the head, the inner corners of the brows being preferable to track over the corners of the mouth. Using five features slowed down the processing and often resulted in breaking the rigidity constraint. The figure shows well the precision and the robustness of stereo tracking. By switching the epipolar constraint on and off, we were able to observe that using one camera does not allow one to detect features in images like those shown in Figure 5(c),(d), whereas using two cameras does. In a similar fashion, by switching the rigidity constraint on and off, we could see how stereo tracking allows one to track robustly even such non-robust features as the corners of the brows.

We have experimented with different baselines between the cameras, and a distance of about 10-20 cm appears to be the most optimal. Larger baselines result in too few matches needed to compute the fundamental matrix, while smaller baselines cause large epipolar errors. It was also observed that the alignment of the cameras is not very critical for our framework, which is attributed to using the rotation-invariant convex-shape nose feature and the combination of the epipolar constraint with the rigidity constraint.

6 Conclusions

In this paper we defined a framework for tracking faces in 3D using two generic web-cameras. The advantages of using two cameras (two "eyes") for face tracking, as exhibited by our StereoTracker, are summarized below. One can track features:

1. In low-quality images. Low-cost plug-n-play USB cameras are used.

2. More precisely, with sub-pixel accuracy. One can move the screen cursor with his/her head one pixel at a time on a 320x240 grid.

3. More robustly, with respect to scale and rotations. Features are tracked for up to almost 40 degrees of rotation of the head in all three directions: "yes" (up-down), "no" (left-right) and "don't know" (clockwise).

4. In real time. After the two cameras have been calibrated, the work of tracking becomes simpler. Calibration can be done automatically, as soon as the head is seen in both cameras.

5. In 3D. The 3D coordinates and the roll angle of the head are recovered.

Projective vision and robust statistics enable us to deal with uncalibrated cameras (i.e. cameras for which the intrinsic parameters, such as focal length and optical center, are not known), which are the most common cameras on the market. The gain in accuracy and robustness is achieved by using two cameras where only one camera has been used in the past. It is thus believed that the proposed technology brings high-level computer vision solutions closer to the market.

References

[1] G. Bradski. OpenCV: Examples of use and new applications in stereo, recognition and tracking. In Proc. Intern. Conf. on Vision Interface (VI'2002), Calgary, May 2002.

[2] D. Burr, M. Morrone, and L. Vaina. Large receptive fields for optic flow detectors in humans. Vision Research, 38(12):1731-1743, 1998.

[3] D. Gorodnichy. Investigation and design of high performance neural networks. PhD dissertation, Glushkov Cybernetics Center of Ac. Sc. of Ukraine, Kiev (http://www.cs.ualberta.ca/~ai/alumni/dmitri/PINN), 1997.

[4] D. Gorodnichy. Vision-based world modeling using a piecewise linear representation of the occupancy function. PhD dissertation, Computing Science Dept., University of Alberta (http://www.cs.ualberta.ca/~ai/alumni/dmitri/WorldModeling), 2000.

[5] D. Gorodnichy. On importance of nose for face tracking. In Proc. IEEE Intern. Conf. on Automatic Face and Gesture Recognition (FG'2002), Washington, D.C., May 2002.

[6] D. Gorodnichy, W. Armstrong, and X. Li. Adaptive logic networks for facial feature detection. In Proc. Intern. Conf. on Image Analysis and Processing (ICIAP'97), Vol. II (LNCS Vol. 1311), pp. 332-339, Springer, 1997.

[7] D. Gorodnichy, S. Malik, and G. Roth. Nouse 'Use your nose as a mouse' - a new technology for hands-free games and interfaces. In Proc. Intern. Conf. on Vision Interface (VI'2002), Calgary, May 2002.

[8] R. Hartley. In defence of the 8-point algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(6), 1997.

[9] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2000.

[10] E. Hjelmas and B. K. Low. Face detection: A survey. Computer Vision and Image Understanding, 83:236-274, 2001.

[11] K. Kanatani. Geometric Computation for Machine Vision. Oxford University Press, 1993.

[12] R. Klette, K. Schluns, and A. Koschan. Computer Vision: Three-Dimensional Data from Images. Springer, 1996.

[13] M. Lourakis and R. Deriche. Camera self-calibration using the SVD of the fundamental matrix. Technical Report 3748, INRIA, August 1999.

[14] G. Loy, E. Holden, and R. Owens. 3D head tracker for an automatic lipreading system. In Proc. Australian Conf. on Robotics and Automation (ACRA2000), Melbourne, Australia, August 2000.

[15] Y. Matsumoto and A. Zelinsky. An algorithm for real-time stereo vision implementation of head pose and gaze direction measurement. In Proc. IEEE Intern. Conf. on Automatic Face and Gesture Recognition (FG2000), 2000.

[16] M. Maza and D. Yuret. Dynamic hill climbing. AI Expert, pages 26-31, 1994.

[17] P. Mendonca and R. Cipolla. A simple technique for self-calibration. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR'99), pages 112-116, Fort Collins, Colorado, June 1999.

[18] NRC. PVT (Projective Vision Toolkit) website. http://www.cv.iit.nrc.ca/research/PVT, 2001.

[19] M. Pantic and L. Rothkrantz. Automatic analysis of facial expressions: The state of the art. IEEE Transactions on PAMI, 22(12):1424-1445, 2000.

[20] R. Newman, Y. Matsumoto, S. Rougeaux, and A. Zelinsky. Real-time stereo tracking for head pose and gaze estimation. In Proc. IEEE Intern. Conf. on Automatic Face and Gesture Recognition (FG2000), 2000.

[21] J. Rodriguez and J. Aggarwal. Stochastic analysis of stereo quantization error. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(5):467-470, 1990.

[22] G. Roth and M. D. Levine. Extracting geometric primitives. Computer Vision, Graphics and Image Processing: Image Understanding, 58(1):1-22, July 1993.

[23] G. Roth and A. Whitehead. Using projective vision to find camera positions in an image sequence. In Proc. Intern. Conf. on Vision Interface (VI'2000), pages 87-94, Montreal, May 2000.

[24] G. Roth and A. Whitehead. Some improvements on two autocalibration algorithms based on the fundamental matrix. In Proc. Intern. Conf. on Pattern Recognition (ICPR2002), Quebec City, Canada, 2002.

[25] P. J. Rousseeuw. Least median of squares regression. Journal of the American Statistical Association, 79(388):871-880, December 1984.

[26] S. Smith and J. Brady. SUSAN - a new approach to low level image processing. International Journal of Computer Vision, pages 45-78, May 1997.

[27] K. Toyama. Prolegomena for robust face tracking. Technical Report MSR-TR-98-65, Microsoft Research, November 1998.

[28] M. Turk, C. Hu, R. Feris, F. Lashkari, and A. Beall. TLA based face tracking. In Proc. Intern. Conf. on Vision Interface (VI'2002), Calgary, May 2002.

[29] M. Xu and T. Akatsuka. Detecting head pose from stereo image sequence for active face recognition. In Proc. IEEE Intern. Conf. on Automatic Face and Gesture Recognition (FG'98), 1998.

[30] J. Yang, R. Stiefelhagen, U. Meier, and A. Waibel. Real-time face and facial feature tracking and applications. In Proc. AVSP'98, pages 79-84, Terrigal, Australia, 1998.

[31] M. Yang, N. Ahuja, and D. Kriegman. Detecting faces in images: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(1):34-58, 2002.

[32] Z. Zhang, R. Deriche, O. Faugeras, and Q.-T. Luong. A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry. Artificial Intelligence Journal, 78:87-119, October 1995.

Figure 5: Snapshots of experiments (a)-(d) showing the robustness and precision of the stereo tracking accomplished with two webcams.