Recognizing Human-Object Interactions inStill Images by Modeling the Mutual Contextof Objects and...

Recognizing Human-Object Interactions inStill Images by Modeling the Mutual

Contextof Objects and Human Poses

Presented By

Arwa Chittalwala

Irfan Shaikh

Heena Patel

Robots interact with objects

Automatic sports commentary

“Kobe is dunking the ball.”

Human-Object Interaction

Medical care

Playing saxophone

Playing bassoon

Playing saxophone

Grouplet is a generic feature for structured objects, or interactions of groups of objects.

(Previous talk: Grouplet)

Caltech101

HOI activity: Tennis Forehand

Holistic image based classification

Detailed understanding and reasoning

Berg & Malik, 2005 Grauman & Darrell, 2005 Gehler & Nowozin, 2009 OURS

48% 59% 77% 62%

TorsoRight-armLeft-a

Left-leg

• Human pose estimation

Tennis racket

• Object detection

TorsoRight-armLeft-a

Left-leg

Tennis racket

HOI activity: Tennis Forehand

• Background and Intuition

• Mutual Context of Object and Human Pose Model Representation

Model Learning

Model Inference

• Experiments

• Conclusion

Outline

Model Learning

Model Inference

• Experiments

• Conclusion

Outline

• Felzenszwalb & Huttenlocher, 2005

• Ren et al, 2005• Ramanan, 2006• Ferrari et al, 2008• Yang & Mori, 2008• Andriluka et al, 2009• Eichner & Ferrari, 2009

Difficult part appearance

Self-occlusion

Image region looks like a body part

Human pose estimation & Object detection

Human pose estimation is challenging.

• Felzenszwalb & Huttenlocher, 2005

• Ren et al, 2005• Ramanan, 2006• Ferrari et al, 2008• Yang & Mori, 2008• Andriluka et al, 2009• Eichner & Ferrari, 2009

Facilitate

Given the object is detected.

• Viola & Jones, 2001

• Lampert et al, 2008

• Divvala et al, 2009• Vedaldi et al, 2009

Small, low-resolution, partially occluded

Image region similar to detection target

Object detection is challenging

• Viola & Jones, 2001

• Lampert et al, 2008

• Divvala et al, 2009• Vedaldi et al, 2009

Facilitate

Given the pose is estimated.

Mutual Context

• Hoiem et al, 2006• Rabinovich et al, 2007• Oliva & Torralba, 2007• Heitz & Koller, 2008• Desai et al, 2009• Divvala et al, 2009

• Murphy et al, 2003• Shotton et al, 2006• Harzallah et al, 2009• Li, Socher & Fei-Fei, 2009• Marszalek et al, 2009• Bao & Savarese, 2010

Context in Computer Vision

with context

without context

Helpful, but only moderately outperform better

Previous work – Use context cues to facilitate object detection:

• Viola & Jones, 2001• Lampert et al, 2008

Context in Computer Vision

Our approach – Two challenging tasks serve as mutual context of each other:

With mutual context:

Without context:

with context

without context

Helpful, but only moderately outperform better

Previous work – Use context cues to facilitate object detection:

• Hoiem et al, 2006• Rabinovich et al, 2007• Oliva & Torralba, 2007• Heitz & Koller, 2008• Desai et al, 2009• Divvala et al, 2009

• Murphy et al, 2003• Shotton et al, 2006• Harzallah et al, 2009• Li, Socher & Fei-Fei, 2009• Marszalek et al, 2009• Bao & Savarese, 2010

Model Learning

Model Inference

• Experiments

• Conclusion

Outline

Mutual Context Model Representation

• More than one H for each A;

• Unobserved during training.

Croquet shot

Volleyball smash

Tennis forehand

Intra-class variations

Activity

Object

Human pose

Body parts

lP: location; θP: orientation; sP: scale.

Croquet mallet

Volleyball

Tennis racket

f: Shape context. [Belongie et al, 2002]

Image evidence

f1 f2 fN

( , )e O H

( , )e A O( , )e A H

e ee E

Markov Random Field

Clique potential

Clique weight

f1 f2 fN

( , )e A O ( , )e A H ( , )e O H• , , : Frequency of co-occurrence between A, O, and H.

f1 f2 fN

( , )e nO P

( , )e m nP P

P1 PNP2

H• , , : Spatial

relationship among object and body parts.

( , )e nO P ( , )e m nP P( , )e nH P

bin binn n nO P O P O Pl l s s

location orientation size

( , )e nH P

e ee E

Markov Random Field

Clique potential

Clique weight

f1 f2 fN

Obtained by structure learning

PNP1 P2

• Learn structural connectivity among the body parts and the object.

• , , : Spatial relationship among object and body parts.

( , )e nO P ( , )e m nP P( , )e nH P

location orientation size ( , )e nO P

( , )e m nP P

( , )e nH P

e ee E

Markov Random Field

Clique potential

Clique weight

f1 f2 fN

P1 P2 PN

• and : Discriminative part detection scores.( , )e OO f ( , )

ne n PP f

[Andriluka et al, 2009]

Shape context + AdaBoost

• Learn structural connectivity among the body parts and the object.

[Belongie et al, 2002][Viola & Jones, 2001]

( , )e OO f

( , )ne n PP f

• , , : Spatial relationship among object and body parts.

( , )e nO P ( , )e m nP P( , )e nH P

location orientation size

e ee E

Markov Random Field

Clique potential

Clique weight

Model Learning

Model Inference

• Experiments

• Conclusion

Outline

Model Learning

f1 f2 fN

P1 P2 PN

e ee E

cricket shot

cricket bowling

Input:

Goals:

Hidden human poses

Model Learning

f1 f2 fN

P1 P2 PN

Input:

Goals:

Hidden human poses

Structural connectivity

e ee E

cricket shot

cricket bowling

e ee E

Model Learning

Goals:

Hidden human poses

Potential parameters

Potential weights

f1 f2 fN

P1 P2 PN

Input:

cricket shot

cricket bowling

Model Learning

Goals:

Parameter estimation

Hidden variables

Structure learning

f1 f2 fN

P1 P2 PN

Input:e e

cricket shot

cricket bowling

Hidden human poses

Potential weights

Model Learning

Goals:

f1 f2 fN

P1 P2 PN

Approach:

croquet shot

e ee E

Hidden human poses

Potential weights

Model Learning

Goals:

f1 f2 fN

P1 P2 PN

Approach:

max2e eeE e

Joint density of the model

Gaussian priori of the edge number

Remove

an edge

Remove

an edge

Hill-climbing

e ee E

Hidden human poses

Potential weights

Model Learning

Goals:

f1 f2 fN

P1 P2 PN

Approach:

( , )e O H( , )e A O ( , )e A H( , )e nO P ( , )e m nP P( , )e nH P

( , )e OO f ( , )ne n PP f

• Maximum likelihood

• Standard AdaBoost

e ee E

Hidden human poses

Potential weights

Model Learning

Goals:

f1 f2 fN

P1 P2 PN

Approach:

Max-margin learning

2 r ir i

• xi: Potential values of the i-th image.

• wr: Potential weights of the r-th pose.

• y(r): Activity of the r-th pose.• ξi: A slack variable for the i-th

image.

Notations

s.t. , where ,

c i r i i

i r y r y c

w x w x

e ee E

Hidden human poses

Potential weights

Learning Results

Cricket defensive

Cricket bowling

Croquet shot

Learning Results

Tennis serve

Volleyball smash

Tennis forehand

Model Learning

Model Inference

• Experiments

• Conclusion

Outline

Model Inference

The learned models

Model Inference

The learned models

Head detection

Torso detection

Tennis racket detection

Layout of the object and body parts.

Compositional Inference

[Chen et al, 2007]

* *1 1 1 1,, , , n nA H O P

Model Inference

The learned models

* *1 1 1 1,, , , n nA H O P * *

,, , ,K K K K n nA H O P

Output

Model Learning

Model Inference

• Experiments

• Conclusion

Outline

Dataset and Experiment Setup

• Object detection;• Pose estimation;• Activity classification.

Tasks:

[Gupta et al, 2009]

Cricket defensive shot

Cricket bowling

Croquet shot

Tennis forehand

Tennis serve

Volleyball smash

Sport data set: 6 classes180 training (supervised with object and part locations) & 120 testing images

[Gupta et al, 2009]

Cricket bowling

Croquet shot

Tennis forehand

Tennis serve

Volleyball smash

Sport data set: 6 classes

Tasks:

180 training (supervised with object and part locations) & 120 testing images

0 0.2 0.4 0.6 0.8 10

Recall

Object Detection Results

Cricket bat

Valid region

Croquet mallet Tennis racket Volleyball

0 0.2 0.4 0.6 0.8 10

Recall

Cricket ball

Our Method

Sliding window

Pedestrian context

[Andriluka et al, 2009]

[Dalal & Triggs, 2006]

Object Detection Results

430 0.2 0.4 0.6 0.8 10

Recall

Volleyball

0 0.2 0.4 0.6 0.8 10

Recall

Cricket ball

0 0.2 0.4 0.6 0.8 10

RecallP

Our MethodPedestrian as contextScanning window detector

0 0.2 0.4 0.6 0.8 10

Recall

0 0.2 0.4 0.6 0.8 10

Recall

Sliding window Pedestrian context Our method

Tasks:

[Gupta et al, 2009]

Cricket bowling

Croquet shot

Tennis forehand

Tennis serve

Volleyball smash

Sport data set: 6 classes180 training & 120 testing images

Human Pose Estimation Results

Method Torso Upper Leg Lower Leg Upper Arm Lower Arm Head

Ramanan, 2006 .52 .22 .22 .21 .28 .24 .28 .17 .14 .42

Andriluka et al, 2009 .50 .31 .30 .31 .27 .18 .19 .11 .11 .45

Our full model .66 .43 .39 .44 .34 .44 .40 .27 .29 .58

Ramanan, 2006 .52 .22 .22 .21 .28 .24 .28 .17 .14 .42

Andriluka et al, 2009 .50 .31 .30 .31 .27 .18 .19 .11 .11 .45

Our full model .66 .43 .39 .44 .34 .44 .40 .27 .29 .58

Andriluka et al, 2009

Our estimation result

Tennis serve model

Andriluka et al, 2009

Our estimation result

Volleyball smash model

Ramanan, 2006 .52 .22 .22 .21 .28 .24 .28 .17 .14 .42

Andriluka et al, 2009 .50 .31 .30 .31 .27 .18 .19 .11 .11 .45

Our full model .66 .43 .39 .44 .34 .44 .40 .27 .29 .58

One pose per class .63 .40 .36 .41 .31 .38 .35 .21 .23 .52

Estimation result

Tasks:

[Gupta et al, 2009]

Cricket bowling

Croquet shot

Tennis forehand

Tennis serve

Volleyball smash

Sport data set: 6 classes180 training & 120 testing images

Activity Classification Results

Gupta et al, 2009

Our model

Bag-of-Words

racy 78.9%

No scene information Scene is

critical!! Cricket shot

Tennis forehand

Bag-of-wordsSIFT+SVM

Gupta et al, 2009

Our model

ConclusionHuman-Object Interaction

Next Steps

• Pose estimation & Object detection on PPMI images.

• Modeling multiple objects and humans.

Grouplet representation

Mutual context model

Recognizing Human-Object Interactions inStill Images by Modeling the Mutual Contextof Objects and...

Education

Transcript of Recognizing Human-Object Interactions inStill Images by Modeling the Mutual Contextof Objects and...

The issue of dosimetry and uncertaintiesin the contextof ... · anatomical structures in the body when calculating the energy deposition limitations in the physical data (e.g. energy

Indigenous!Policy!Lens!...2 Agenda • Contextof!MCFD!Policy!and!Policy!Discussions!! "• Aboriginal!Equity!and!Inclusion!Lens" • Applying!the!Indigenous!Lens!–Policy!Processes

Civil Law Actions in the Contextof Competition Restricting Practices under Polish Law

APPENDIX A DISCUSSION OF DOW INSTILL VACUUM …

No Right Brain Left Behind - A global movement to re-instill creativity in education.

Essential Question: How can we avoid "readicide" and instill a love for reading?

Ear-Eye-Nose Injuries - ø The #1 Human Anatomy and ... · Human Anatomy Course.com Ear-Eye-Nose Injuries . TABLE OF CONTENTS Lesson Paragraphs INTRODUCTION 1. IRRIGATE, INSTILL EYE

ComparativeLiterary+Studies+Program+ CourseDescriptions+ ... Descriptions_2014-15.pdfCourseDescription This!course!studies!selected!works!of!modern!Jewish!literature!in!the!contextof!their!historical!background.!

Instill Work

Combined Shareholders’ Meeting 2020 - Alstom · 2020. 10. 29. · Contextof the Meeting: acquisition projectof Bombardier Transportation 6 Henri POUPART-LAFARGE, Chairman and CEO

DGRZETICH - Adv Vulnerability Management DRAFTto!examine!patch!management!in!the!contextof!vulnerability!management throughoutthis!paper.!

HOW DUA IS MEANT TO INSTILL HOPE IN US DURING · PDF fileHOW DUA IS MEANT TO INSTILL HOPE IN US DURING RAMADAAN Written by ... conscious of Almighty Allah at all ... MEANT TO INSTILL

How to Instill Ethics in Commercial Lending: Understanding Due Diligence

THAILAND!VISIT!2016! “Beyond‘Tolerance’:!! - MMN · exclusion!of!migrants!in!the!contextof!origin!counties!Cambodiaand!Myanmar!as!well! as!destination!countries!Thailand!and!Japan.!!

ACCELERATING PRO POOR GROWTH IN THE CONTEXTOF KILIMO KWANZA · ACCELERATING PROPOOR GROWTH IN THE CONTEXTOF KILIMO KWANZA Paper Presented to the Annual National Policy Dialogue On

Records - Joni Mitchell · The centralthemeof Mark Knopfler’smusic istheloner, a modernShaneagainsttheodds. But in the contextof DireStraits’breezyrock-ing this myth-makingseems

PIA, methodology February 2018 edition - cnil.fr · 1. define and describe the contextof the ... Objective: build the system that ensures compliance with privacy protection ... the

AINT EDWARD THE CONFESSOR IDDLE CHOOL CIENCE · collect and analyze data in Google Drive apps. Engineering Challenges instill in students the applications of science to solve human

Nazi Propaganda. Nazi Propaganda was used to: Instill in Germans a fear and hatred of Jews Instill in Germans a fear and hatred of Jews Instill in people.

Gepan Instill Productmonograph