RECOGNIZING HUMAN-OBJECT INTERACTIONS IN STILL IMAGES BY MODELING THE MUTUAL CONTEXT OF OBJECTS AND HUMAN POSES
Date: 2013/05/27
Instructor: Prof. Wang, Sheng-Jyh
Student: Hung, Fei-Fan
Yao, B., and Fei-Fei, L. IEEE Transactions on PAMI (2012)


Outline
• Introduction
  • Intuition and goal
• Model Representation
• Model Learning
  • Obtaining Atomic Poses
  • Training Detectors and Classifiers
  • Estimating Model Parameters
• Model Inference
• Experimental Results
• Conclusion


Why use context in computer vision?
• Simple images vs. human activities
[Figure: results with context vs. without context, annotated "~3-4%"; examples with mutual context vs. without context]


Challenges in Human Pose Estimation
• Human pose estimation is challenging
  • Difficult part appearance
  • Self-occlusion
  • Image regions that look like a body part
• Object detection facilitates human pose estimation


Challenges in Object Detection
• Object detection is challenging
  • Objects are small, low-resolution, or partially occluded
  • Image regions that look similar to the detection target
• Human pose estimation facilitates object detection


The Goal
• To build a mutual context model for Human-Object Interaction (HOI) activities


Model representation
• Modeling the mutual context of objects and human poses
• A: activity (e.g., croquet shot, volleyball smash, tennis forehand)
• O: objects (e.g., tennis ball, croquet mallet, volleyball, tennis racket), O = {O1, ..., OM}, M: number of bounding boxes
• H: atomic pose; an activity A can contain more than one atomic pose H
• P: body parts, P = {P1, ..., PL}


Model representation
• 𝝓1: co-occurrence compatibility between A, O, H
• 𝝓2: spatial relationship between O, H
• 𝝓3, 𝝓4, 𝝓5: modeling the image evidence with detectors or classifiers
[Figure: graphical model linking the activity A, the human pose H with body parts P1, P2, ..., PL, and the objects O1, O2]
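One way to read the potentials listed on this slide is as a single score that adds all five terms. The expression below is a sketch under assumptions: the symbol $\Psi$ and the argument list of each potential are inferred from the slide titles (co-occurrence over A, O, H; spatial context over O, H; image evidence for O, H, and A) and are not written out on the slide itself.

```latex
\Psi(A, O, H, I) = \phi_1(A, O, H) + \phi_2(O, H) + \phi_3(O, I) + \phi_4(H, I) + \phi_5(A, I)
```

Under this reading, inference looks for the labels A, O, H that give the highest score for a given image I.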


𝝓1: Co-occurrence context
• Co-occurrence between all A, O, H
• Weight: strength of the co-occurrence interaction between an atomic pose, an object, and an activity class
• Defined with indicator functions over the total number of atomic poses, the total number of objects, and the total number of activity classes
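A minimal sketch of how a co-occurrence term of this kind could be evaluated, assuming the weights are stored in a three-way table indexed by atomic pose, object class, and activity class. The names `zeta` and `cooccurrence_score` and all numbers are hypothetical and are not taken from the slides.

```python
from typing import Sequence


def cooccurrence_score(zeta, pose: int, objects: Sequence[int], activity: int) -> float:
    """Add up the co-occurrence weight of each detected object with the pose and activity.

    The indicator functions reduce to a table lookup: for each object box,
    only the entry matching the current (pose, object, activity) labels counts.
    """
    return sum(zeta[pose][obj][activity] for obj in objects)


# Hypothetical weights: 2 atomic poses x 3 object classes x 2 activity classes.
zeta = [
    [[0.9, -0.2], [0.1, 0.0], [-0.5, 0.4]],
    [[-0.3, 0.8], [0.0, 0.2], [0.6, -0.1]],
]

# Pose 0, two object boxes labeled with classes 0 and 2, activity 0.
score = cooccurrence_score(zeta, pose=0, objects=[0, 2], activity=0)
```

A large positive entry rewards a labeling in which that pose, object, and activity appear together; a negative entry penalizes it.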


𝝓2: Spatial context
• Spatial relationship between all O and different H
• Feature: a sparse binary vector that shows the relative location of an object w.r.t. a human body part
• A weight is applied to this sparse binary vector
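The sparse binary vector in the spatial term can be illustrated with a simple binning scheme. This is a sketch under assumptions: the slide does not say how relative location is discretized, so the angle and distance bins, the function name `relative_location_bins`, and the default parameters are all hypothetical.

```python
import math


def relative_location_bins(part_xy, obj_xy, n_angle=8, dist_edges=(0.5, 1.5)):
    """Encode the location of an object relative to a body part as a one-hot vector.

    Assumed scheme: the displacement is binned by direction (n_angle sectors)
    and by distance (thresholds in dist_edges); exactly one entry is set to 1.
    """
    dx = obj_xy[0] - part_xy[0]
    dy = obj_xy[1] - part_xy[1]
    angle = math.atan2(dy, dx) % (2 * math.pi)
    # min() guards against the rare case where rounding puts the angle at 2*pi.
    angle_bin = min(int(angle / (2 * math.pi / n_angle)), n_angle - 1)
    dist = math.hypot(dx, dy)
    dist_bin = sum(dist >= edge for edge in dist_edges)
    vec = [0] * (n_angle * (len(dist_edges) + 1))
    vec[dist_bin * n_angle + angle_bin] = 1
    return vec


# Object one unit to the right of the part: middle distance ring, sector 0.
right = relative_location_bins((0.0, 0.0), (1.0, 0.0))
# Object two units along the positive y axis: outer ring, sector 2.
along_y = relative_location_bins((0.0, 0.0), (0.0, 2.0))
```

Because only one entry is non-zero, a dot product with a weight vector simply selects the weight learned for that relative position.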


𝝓3: Modeling objects
• Model O in the image I using object detection scores
• For all objects O
  • Feature: vector of detection scores
  • A weight is applied to the detection score vector
• Between Om and Om’
  • Feature: binary feature vector
  • A weight is applied for the pair of objects


𝝓4: Modeling human pose
• Model the atomic pose that H belongs to and its likelihood
• A Gaussian likelihood function
• A vector of scores of detecting each body part in the image I
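A small sketch of the Gaussian likelihood named on this slide, evaluated for one body part location. It assumes a 2D Gaussian with a diagonal covariance centered on the part's typical location in the atomic pose; the slide does not give the covariance structure, and the function name and numbers are hypothetical.

```python
import math


def gaussian_likelihood(xy, mean, sigma):
    """Density of a 2D Gaussian with diagonal covariance (assumed form)."""
    (x, y), (mx, my), (sx, sy) = xy, mean, sigma
    z = ((x - mx) / sx) ** 2 + ((y - my) / sy) ** 2
    return math.exp(-0.5 * z) / (2 * math.pi * sx * sy)


# A part found exactly at its expected location scores higher than one far away.
at_mean = gaussian_likelihood((0.0, 0.0), mean=(0.0, 0.0), sigma=(1.0, 1.0))
far = gaussian_likelihood((3.0, 0.0), mean=(0.0, 0.0), sigma=(1.0, 1.0))
```

The likelihood is high when the observed part layout matches the atomic pose and falls off smoothly as parts move away from their expected positions.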


𝝓5: Modeling activity
• Model the HOI activity by training an activity classifier
• Feature: output of a one-versus-all (OVA) discriminative classifier that takes the image as features, with one dimension per activity class
• A feature weight is applied to the classifier output


Model Properties
• Spatial context between O and H
  • Object detection and human pose estimation facilitate each other
  • Ignore the objects and body parts that are unreliable
• Flexible to extend to large-scale datasets and other activities
  • The joint model can share all objects and atomic poses


Model Learning
• Assign human poses to atomic poses
• Train detectors and classifiers
• Estimate parameters by maximum likelihood


Obtaining Atomic Poses
• Use clustering to obtain atomic poses
• Normalize the annotations
• Find missing parts
  • Use the nearest visible neighbor
• Obtain a set of atomic poses
  • Hierarchical clustering with the maximum linkage measure
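The clustering step can be sketched as agglomerative clustering with maximum (complete) linkage, where the distance between two clusters is the largest distance between any pair of their members. This is a minimal illustration, not the authors' code: it assumes each normalized pose annotation is a flat coordinate vector, uses Euclidean distance, and stops at a fixed number of clusters; the function names and toy data are hypothetical.

```python
import math
from itertools import combinations


def complete_linkage(a, b, points):
    """Maximum linkage: largest pairwise distance between members of two clusters."""
    return max(math.dist(points[i], points[j]) for i in a for j in b)


def cluster_poses(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: complete_linkage(clusters[ab[0]], clusters[ab[1]], points),
        )
        clusters[a].extend(clusters[b])
        del clusters[b]  # safe because combinations() guarantees a < b
    return clusters


# Toy "pose vectors" (2D here for brevity): two tight pairs and one outlier.
poses = [(0, 0), (0, 1), (10, 10), (10, 11), (5, 50)]
atomic_pose_groups = cluster_poses(poses, n_clusters=3)
```

Each resulting group would play the role of one atomic pose, and every training pose is assigned to the group it ends up in.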


Training Detectors and Classifiers
• Object detector in 𝝓3: deformable part model
• Human body part detector in 𝝓4: deformable part model
• Overall activity classifier in 𝝓5: spatial pyramid matching (SPM) with SIFT features and a 3-level image pyramid


Estimating Model Parameters
• Estimate the model parameters using a maximum likelihood (ML) approach with a zero-mean Gaussian prior
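With a zero-mean Gaussian prior on the weights, the log prior adds a quadratic penalty to the log-likelihood. The objective below is a sketch under assumptions: $\theta$ (all model weights), $N$ (number of training images), $\sigma$ (prior standard deviation), and the conditional form of the likelihood are notation introduced here for illustration and do not appear on the slide.

```latex
\hat{\theta} = \arg\max_{\theta} \; \sum_{n=1}^{N} \log p(A_n, O_n, H_n \mid I_n; \theta) \; - \; \frac{\lVert \theta \rVert^2}{2\sigma^2}
```

A smaller $\sigma$ pulls the weights more strongly toward zero, which acts as regularization.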


Learning result


Model Inference
• New image
• Initialize with learned results
• Update human body parts
• Update object detection results
• Update A and H labels


Initialization
• A: activity classification with SPM
• O: object detection
• H: human pose estimation with the pictorial structure model


Update model inference: human body parts
• Compute the marginal distribution of the human pose
• Use a mixture of Gaussians to refine the prior of the body parts


Update model inference: object detection results
• Greedy forward search method:
  • Initialize with no object in any bounding box
  • Select the bounding box and object label with the best gain
  • Label the box with the selected object
  • Update the score
• Stop when the gain < 0
• Potentials that depend on O: those over (O, H), (O, A, H), and (O, I)
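The greedy forward search can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: `score` stands in for the sum of the potentials that depend on O, and the toy scoring function, labels, and numbers are hypothetical.

```python
def greedy_forward_search(n_boxes, labels, score):
    """Label one box at a time, always taking the move with the largest gain."""
    assignment = {}  # box index -> object label; boxes not listed hold no object
    current = score(assignment)
    while len(assignment) < n_boxes:
        candidates = [
            (score({**assignment, box: label}) - current, box, label)
            for box in range(n_boxes)
            if box not in assignment
            for label in labels
        ]
        gain, box, label = max(candidates)
        if gain < 0:  # stop when the best remaining move lowers the score
            break
        assignment[box] = label
        current += gain
    return assignment


# Hypothetical per-box detection scores for two object classes.
unary = {
    (0, "racket"): 1.2, (0, "ball"): -0.4,
    (1, "racket"): -0.3, (1, "ball"): 0.7,
    (2, "racket"): -0.8, (2, "ball"): -0.1,
}


def toy_score(assignment):
    """Detection scores minus a penalty for repeating the same object label."""
    total = sum(unary[(box, label)] for box, label in assignment.items())
    used = list(assignment.values())
    return total - 1.0 * (len(used) - len(set(used)))


result = greedy_forward_search(3, ["racket", "ball"], toy_score)
```

In this toy run the search labels box 0 and box 1 and leaves box 2 empty, because every remaining move would reduce the score.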


Update model inference: A and H labels
• Enumerate the possible A and H labels
• Optimize the model score over the enumerated labels
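Since the number of activity classes and atomic poses is small, this update can be done by exhaustive enumeration. The sketch below assumes a `score` callable that returns the model score for a given activity and atomic pose with the objects and body parts held fixed; the function name, labels, and scores are hypothetical.

```python
from itertools import product


def best_activity_and_pose(activities, poses, score):
    """Try every (activity, atomic pose) pair and keep the highest-scoring one."""
    return max(product(activities, poses), key=lambda pair: score(*pair))


# Hypothetical scores for two activities and two atomic poses.
toy_scores = {
    ("tennis forehand", 0): 0.2, ("tennis forehand", 1): 1.5,
    ("volleyball smash", 0): 0.9, ("volleyball smash", 1): -0.3,
}

best = best_activity_and_pose(
    ["tennis forehand", "volleyball smash"],
    [0, 1],
    lambda activity, pose: toy_scores[(activity, pose)],
)
```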


Experimental Results (Sports Dataset)
• Activity classification


Experimental Results (PPMI Dataset)


Conclusion
• Mutual context can significantly improve the performance in difficult visual recognition problems
• The joint model can share all the information
• Requires annotating all the human body parts and objects in the training images


Reference
• B. Yao and L. Fei-Fei, “Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012.
• B. Yao and L. Fei-Fei, “Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010.
• B. Sapp, A. Toshev, and B. Taskar, “Cascade Models for Articulated Pose Estimation,” Proc. European Conf. Computer Vision, 2010.
• S. Lazebnik, C. Schmid, and J. Ponce, “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories,” Proc. IEEE CS Conf. Computer Vision and Pattern Recognition, 2006.
• http://en.wikipedia.org/wiki/Hierarchical_clustering