Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities


  • date post

    23-Feb-2016


Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities

Bangpeng Yao and Li Fei-Fei
Computer Science Department, Stanford University
{bangpeng,feifeili}@cs.stanford.edu

Human-Object Interaction
  • Robots interact with objects
  • Automatic sports commentary: "Kobe is dunking the ball."
  • Medical care

Human-Object Interaction
Image labels: Playing saxophone; Playing bassoon; Playing saxophone (contrasted with "vs.").
(Previous talk: Grouplet) Grouplet is a generic feature for structured objects, or interactions of groups of objects.

Human-Object Interaction
Caltech101: Berg & Malik, 2005: 48%; Grauman & Darrell, 2005: 59%; Gehler & Nowozin, 2009: 77%; OURS: 62%.
HOI activity: Tennis Forehand.
Holistic image-based classification; detailed understanding and reasoning.

Human-Object Interaction
Holistic image-based classification; detailed understanding and reasoning.
Human pose estimation: torso, right arm, left arm, right leg, left leg, head.

Human-Object Interaction
Holistic image-based classification; detailed understanding and reasoning.
Human pose estimation: torso, right arm, left arm, right leg, left leg, head.
Object detection: tennis racket.
HOI activity: Tennis Forehand.

Outline
  • Background and Intuition
  • Mutual Context of Object and Human Pose
  • Model Representation
  • Model Learning
  • Model Inference
  • Experiments
  • Conclusion

Human pose estimation & Object detection
Human pose estimation is challenging:
  • Difficult part appearance
  • Self-occlusion
  • Image region looks like a body part
(Felzenszwalb & Huttenlocher, 2005; Ren et al., 2005; Ramanan, 2006; Ferrari et al., 2008; Yang & Mori, 2008; Andriluka et al., 2009; Eichner & Ferrari, 2009)

Human pose estimation & Object detection
Facilitate: human pose estimation becomes easier given that the object is detected.

Human pose estimation & Object detection
Object detection is challenging:
  • Small, low-resolution, partially occluded
  • Image region similar to detection target
(Viola & Jones, 2001; Lampert et al., 2008; Divvala et al., 2009; Vedaldi et al., 2009)

Human pose estimation & Object detection
Facilitate: object detection becomes easier given that the pose is estimated.

Human pose estimation & Object detection
Mutual Context

Context in Computer Vision
Previous work: use context cues to facilitate object detection (Hoiem et al., 2006; Rabinovich et al., 2007; Oliva & Torralba, 2007; Heitz & Koller, 2008; Desai et al., 2009; Divvala et al., 2009; Murphy et al., 2003; Shotton et al., 2006; Harzallah et al., 2009; Li, Socher & Fei-Fei, 2009; Marszalek et al., 2009; Bao & Savarese, 2010).
Context is helpful, but detection with context outperforms detection without context only moderately (~3-4%).
(Viola & Jones, 2001; Lampert et al., 2008)

Context in Computer Vision
Our approach: two challenging tasks serve as mutual context for each other.
Figure placeholder: results with mutual context vs. without context.

Mutual Context Model Representation
Model variables:
  • A: Activity (e.g., croquet shot, volleyball smash, tennis forehand).
  • O: Object (e.g., croquet mallet, volleyball, tennis racket).
  • H: Human pose. There is more than one H for each A, reflecting intra-class variations; H is unobserved during training.
  • P: Body parts (P1, P2, ..., PN). Each part is described by its location (l_P), orientation, and scale (s_P).
  • f: Image evidence (fO, f1, f2, ..., fN), described with shape context [Belongie et al., 2002].
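The image evidence f is described with shape context [Belongie et al., 2002]. As a rough illustration, a shape context descriptor can be sketched as a log-polar histogram of where the other points lie relative to a reference point. The function name `shape_context`, the bin counts, and the radius range below are illustrative assumptions, not the settings used in the talk.

```python
import numpy as np

def shape_context(points, index, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    """Log-polar histogram of the other points relative to points[index].

    Illustrative sketch: bin counts and radius range are assumed values.
    """
    points = np.asarray(points, dtype=float)
    diffs = points - points[:, None, :]      # diffs[i, j] = points[j] - points[i]
    dists = np.linalg.norm(diffs, axis=2)
    n = len(points)
    mean_dist = dists.sum() / (n * (n - 1))  # mean over distinct pairs (scale normalization)

    rel = np.delete(diffs[index], index, axis=0)
    r = np.linalg.norm(rel, axis=1) / mean_dist
    theta = np.arctan2(rel[:, 1], rel[:, 0]) % (2 * np.pi)

    r_edges = np.logspace(np.log10(r_min), np.log10(r_max), n_r + 1)
    r_bin = np.digitize(r, r_edges) - 1      # -1 or n_r means outside the radius range
    t_bin = np.minimum((theta / (2 * np.pi) * n_theta).astype(int), n_theta - 1)

    hist = np.zeros((n_r, n_theta), dtype=int)
    inside = (r_bin >= 0) & (r_bin < n_r)
    np.add.at(hist, (r_bin[inside], t_bin[inside]), 1)
    return hist
```

Calling `shape_context(points, 0)` on a small point set returns an `n_r` by `n_theta` array of counts. Points that fall outside the assumed radius range are simply not counted.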

Mutual Context Model Representation
The model is a Markov Random Field over the activity A, object O, human pose H, body parts P1, P2, ..., PN, and image evidence fO, f1, f2, ..., fN. Each clique has a clique potential and a clique weight.
  • Three potentials among A, O, and H: frequency of co-occurrence between A, O, and H.
  • Three potentials for the spatial relationship among object and body parts: location, orientation, size.
  • Structure obtained by structure learning: learn structural connectivity among the body parts and the object.
  • Two potentials for discriminative part detection scores: shape context + AdaBoost [Andriluka et al., 2009] [Belongie et al., 2002] [Viola & Jones, 2001].
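The representation slides describe a Markov Random Field built from clique potentials and clique weights. A minimal sketch of how such a model might score one candidate configuration follows, assuming the common form of a weighted sum of potentials. The `model_score` helper, the potential names, and all numeric values are hypothetical and only illustrate the form.

```python
def model_score(potentials, weights):
    """Weighted sum of clique potentials: sum over cliques e of weights[e] * potentials[e]."""
    missing = set(potentials) - set(weights)
    if missing:
        raise KeyError(f"no weight for cliques: {sorted(missing)}")
    return sum(weights[e] * value for e, value in potentials.items())

# Toy potential values for one candidate configuration (all numbers made up).
potentials = {
    "cooccurrence(A,O)": 0.8,  # frequency of co-occurrence
    "cooccurrence(A,H)": 0.6,
    "cooccurrence(O,H)": 0.7,
    "spatial(O,P1)": 0.5,      # location / orientation / size compatibility
    "detection(O)": 1.2,       # discriminative detection score
    "detection(P1)": 0.9,
}
weights = {name: 1.0 for name in potentials}  # uniform weights, for illustration only
score = model_score(potentials, weights)
```

In this sketch the weights are what a learning step would tune, while the potential values come from co-occurrence statistics, spatial relationships, and detector outputs.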

Model Learning
Input: training images with activity labels (e.g., cricket shot, cricket bowling).
Goals:
  • Hidden human poses (hidden variables)
  • Structural connectivity (structure learning)
  • Potential parameters (parameter estimation)
  • Potential weights (parameter estimation)

Model Learning
Approach for hidden human poses: illustrated on croquet shot.

Model Learning
Approach for structural connectivity: hill-climbing that adds an edge or removes an edge at each step, scored by the joint density of the model and a Gaussian prior on the number of edges.
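The hill-climbing step (add an edge, remove an edge) can be sketched as a greedy search over edge sets. This is a minimal illustration under stated assumptions: the `edge_gain` table is a made-up stand-in for the joint density of the model, and the function names and prior parameters are hypothetical.

```python
from itertools import combinations

def log_gaussian_prior(n_edges, mean, std):
    """Unnormalized log of a Gaussian prior on the number of edges."""
    return -0.5 * ((n_edges - mean) / std) ** 2

def hill_climb(nodes, edge_gain, prior_mean, prior_std, start=frozenset()):
    """Greedy structure search: add or remove one edge while the score improves.

    edge_gain maps an edge (a, b) to a stand-in log-likelihood gain; in a real
    system this term would come from the joint density of the model.
    """
    def score(edges):
        fit = sum(edge_gain.get(e, 0.0) for e in edges)
        return fit + log_gaussian_prior(len(edges), prior_mean, prior_std)

    candidates = [tuple(sorted(pair)) for pair in combinations(nodes, 2)]
    current = frozenset(start)
    best = score(current)
    while True:
        # Toggling an edge covers both moves: add if absent, remove if present.
        moves = [current ^ {e} for e in candidates]
        top = max(moves, key=score)
        if score(top) <= best:
            return current, best  # no single-edge change improves the score
        current, best = top, score(top)
```

Because each accepted move strictly increases the score and the number of edge sets is finite, the loop terminates at a local optimum, which need not be the global one.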

Model Learning
Approach for potential parameters: maximum likelihood; standard AdaBoost.

Model Learning
Approach for potential weights: max-margin learning.
Notations:
  • x_i: potential values of the i-th image.
  • w_r: potential weights of the r-th pose.
  • y(r): activity of the r-th pose.
  • A slack variable for the i-th image.
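The objective on the max-margin slide did not survive in the transcript, so the following is only a sketch that assumes a standard multi-class max-margin form using the slide's notation (x_i, w_r, y(r), and one slack variable per image). The function name, the `pose_of_image` input, and the constant `C` are assumptions introduced for illustration.

```python
import numpy as np

def max_margin_objective(W, X, pose_of_image, activity_of_pose, C=1.0):
    """Evaluate 0.5 * sum_r ||w_r||^2 + C * sum_i slack_i.

    W[r] is w_r, X[i] is x_i, activity_of_pose[r] is y(r). pose_of_image[i] is
    the pose assigned to image i (a hypothetical input for this sketch).
    slack_i is the smallest non-negative value satisfying, for every pose r
    whose activity differs from that of image i:
        w_c . x_i - w_r . x_i >= 1 - slack_i
    """
    W, X = np.asarray(W, float), np.asarray(X, float)
    y = np.asarray(activity_of_pose)
    scores = X @ W.T                       # scores[i, r] = w_r . x_i
    slacks = np.zeros(len(X))
    for i, c in enumerate(pose_of_image):
        wrong = y != y[c]                  # poses belonging to other activities
        if wrong.any():
            margin = scores[i, c] - scores[i, wrong].max()
            slacks[i] = max(0.0, 1.0 - margin)
    return 0.5 * np.sum(W ** 2) + C * slacks.sum(), slacks
```

The function only evaluates the objective for given weights; an actual learner would minimize it over W, for example with a quadratic programming or subgradient solver.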

Learning Results
Figures shown for: cricket defensive shot, cricket bowling, croquet shot.

Learning Results
Figures shown for: tennis serve, volleyball smash, tennis forehand.


Model Inference
  • Start from the learned models.
  • Detections: head detection, torso detection, tennis racket detection.
  • Layout of the object and body parts: compositional inference [Chen et al., 2007].
  • Output.

Dataset and Experiment Setup
Sport data set [Gupta et al., 2009]: 6 classes
  • Cricket defensive shot
  • Cricket bowling
  • Croquet shot
  • Tennis forehand
  • Tennis serve
  • Volleyball smash
180 training images (supervised with object and part locations) and 120 testing images.
Tasks: object detection; pose estimation; activity classification.

Object Detection Results
Objects: cricket bat, cricket ball, croquet mallet, tennis racket, volleyball.
Valid region
Legend: Our Method; Sliding window; Pedestrian context [Andriluka et