Real-Time Human Pose Recognition in Parts from Single Depth Images

Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, Andrew Blake

CVPR 2011

PRESENTER: AHSAN ABDULLAH


PROBLEM

APPROACH

• Partitioning the body into parts helps localize the joints

[Figure labels: right elbow, right hand, left shoulder, neck]

Shotton et al. CVPR 2011

PIPELINE

1. capture depth image & remove bg
2. infer body parts per pixel
3. cluster pixels to hypothesize body joint positions
4. fit model & track skeleton

Design Goals

• Efficiency

• Robustness

BODY PART CLASSIFICATION

Compute $P(c_i \mid w_i)$
• pixels $i = (x, y)$
• body part $c_i$
• image window $w_i$

Discriminative approach
• learn classifier $P(c_i \mid w_i)$ from training data
• image windows move with classifier

LEARNING DATA

synthetic (train & test)

real (test)

LEARNING – DATA SYNTHESIS

Record MoCap: 500k frames distilled to 100k poses

Retarget to several models

Render (depth, body parts) pairs
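
The slide does not say how the 500k MoCap frames are distilled to 100k poses. One plausible reading is a greedy subsampling that drops near-duplicate poses. The sketch below assumes that reading; the function name `distill_poses`, the max-over-joints distance, and the `min_distance` threshold are all hypothetical choices for illustration.

```python
import numpy as np


def distill_poses(poses, min_distance):
    """Greedily keep poses that differ enough from every pose kept so far.

    poses has shape (num_frames, num_joints, 3). The distance between two
    poses is taken as the largest per-joint Euclidean distance, which is an
    assumed metric and not something stated on the slide.
    """
    poses = np.asarray(poses, dtype=float)
    kept = []
    for index, pose in enumerate(poses):
        # all() over an empty list is True, so the first frame is always kept.
        if all(
            np.linalg.norm(pose - poses[k], axis=1).max() >= min_distance
            for k in kept
        ):
            kept.append(index)
    return kept


# Three frames with two joints each: frame 1 nearly repeats frame 0.
frames = np.stack([
    np.zeros((2, 3)),
    np.full((2, 3), 0.01),
    np.full((2, 3), 1.0),
])
kept_indices = distill_poses(frames, min_distance=0.05)
```

The loop compares each frame against every kept pose, so it is quadratic in the worst case; a real 500k-frame run would need a faster neighbour search.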

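FEATURE SET

• Depth comparisons
- very fast to compute

$$ f(I, \mathbf{x}) = d_I(\mathbf{x}) - d_I(\mathbf{x} + \Delta) $$

Here $f$ is the feature response, $I$ the image, $d_I$ the depth, $\mathbf{x}$ the image coordinate, and $\Delta$ the offset.

The offset scales inversely with depth:

$$ \Delta = \frac{\mathbf{v}}{d_I(\mathbf{x})} $$

Background pixels: $d$ = large constant

[Figure: input depth image with example pixels $\mathbf{x}$ and their offsets $\Delta$]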
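
A minimal sketch of the depth comparison feature in its single-offset form, f(I, x) = d_I(x) - d_I(x + v / d_I(x)). The names `depth_at` and `feature_response`, the value of `BACKGROUND_DEPTH`, the (row, col) pixel convention, and the rounding of the probe position are assumptions made for illustration.

```python
import numpy as np

# Assumed value for the "large constant" given to background pixels.
BACKGROUND_DEPTH = 1e6


def depth_at(depth, pixel):
    """Return d_I(pixel); probes that fall outside the image count as background."""
    row, col = pixel
    height, width = depth.shape
    if 0 <= row < height and 0 <= col < width:
        return float(depth[row, col])
    return BACKGROUND_DEPTH


def feature_response(depth, x, v):
    """Compute f(I, x) = d_I(x) - d_I(x + v / d_I(x)) for one pixel and one offset.

    Assumes background pixels in `depth` already hold BACKGROUND_DEPTH,
    so d_I(x) is never zero.
    """
    d_x = depth_at(depth, x)
    # Dividing by the depth at x shrinks the offset for bodies far from the camera.
    delta = np.asarray(v, dtype=float) / d_x
    probe = (int(round(x[0] + delta[0])), int(round(x[1] + delta[1])))
    return d_x - depth_at(depth, probe)


# Toy 5x5 depth image: everything at 2.0 except one pixel at 3.0.
image = np.full((5, 5), 2.0)
image[2, 4] = 3.0
inside = feature_response(image, x=(2, 2), v=(0.0, 4.0))     # probe lands on (2, 4)
outside = feature_response(image, x=(2, 2), v=(0.0, 100.0))  # probe leaves the image
```

Each response costs two image reads and a handful of arithmetic operations, which is what makes the feature "very fast to compute".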

DECISION FORESTS

Aggregation of decision trees

TRAINING DECISION TREES

• $Q_n = (I, \mathbf{x})$ for all pixels
• Split test at node $n$: $f(I, \mathbf{x}; \Delta_n) > \theta_n$ (no → $l$, yes → $r$)
• Distributions over body part $c$: $P_n(c)$, $P_l(c)$, $P_r(c)$
• Take $(\Delta, \theta)$ that maximises information gain (reduce entropy) [Breiman et al. 84]
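
The split selection at a node can be sketched as an exhaustive search over candidate (Δ, θ) pairs, scoring each by the entropy reduction it produces. This is an illustrative sketch: the helper names, the use of Shannon entropy in bits, and the precomputed `candidate_responses` array are assumptions.

```python
import numpy as np


def entropy(labels, num_classes):
    """Shannon entropy (in bits) of the body-part label histogram."""
    if len(labels) == 0:
        return 0.0
    counts = np.bincount(labels, minlength=num_classes)
    probs = counts[counts > 0] / len(labels)
    return float(-np.sum(probs * np.log2(probs)))


def information_gain(responses, labels, theta, num_classes):
    """Entropy at the node minus the size-weighted entropy of its two children."""
    go_right = responses > theta  # the "yes" branch
    left, right = labels[~go_right], labels[go_right]
    n = len(labels)
    children = (len(left) / n) * entropy(left, num_classes) + (
        len(right) / n
    ) * entropy(right, num_classes)
    return entropy(labels, num_classes) - children


def best_split(candidate_responses, labels, thresholds, num_classes):
    """Return (offset index, theta, gain) for the candidate with the highest gain.

    candidate_responses[k] holds f(I, x; delta_k) for every training pixel.
    """
    best = (None, None, -np.inf)
    for k, responses in enumerate(candidate_responses):
        for theta in thresholds:
            gain = information_gain(responses, labels, theta, num_classes)
            if gain > best[2]:
                best = (k, theta, gain)
    return best


# Four training pixels, two body parts, two candidate offsets.
labels = np.array([0, 0, 1, 1])
candidate_responses = np.array([
    [0.1, 0.9, 0.2, 0.8],    # offset 0 does not separate the classes
    [-1.0, -0.5, 0.5, 1.0],  # offset 1 separates them at theta = 0
])
chosen = best_split(candidate_responses, labels, thresholds=[0.0, 0.5], num_classes=2)
```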

DECISION TREE CLASSIFICATION

Toy example: distinguish left (L) and right (R) sides of the body

[Figure: image window centred at $\mathbf{x}$; a tree with tests $f(I, \mathbf{x}; \Delta_1) > \theta_1$ and $f(I, \mathbf{x}; \Delta_2) > \theta_2$, each with "no" and "yes" branches, and three histograms $P(c)$ over L and R]
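
Classifying one pixel amounts to walking a tree from the root to a leaf, evaluating one depth comparison per node. The sketch below uses a toy left/right tree; the `Node` layout, the tree shape, and every offset, threshold, and histogram value are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Node:
    # Split nodes carry (delta, theta) and two children; leaves carry P(c).
    delta: Optional[Tuple[float, float]] = None
    theta: float = 0.0
    no: Optional["Node"] = None    # followed when f(I, x; delta) <= theta
    yes: Optional["Node"] = None   # followed when f(I, x; delta) > theta
    posterior: Optional[Dict[str, float]] = None


def classify(node, feature):
    """Walk from the root to a leaf; feature(delta) returns f(I, x; delta)."""
    while node.posterior is None:
        node = node.yes if feature(node.delta) > node.theta else node.no
    return node.posterior


# Hypothetical two-test tree with three leaves.
toy_tree = Node(
    delta=(0.0, 40.0),
    theta=0.1,
    no=Node(posterior={"L": 0.5, "R": 0.5}),
    yes=Node(
        delta=(0.0, -40.0),
        theta=0.1,
        no=Node(posterior={"L": 0.9, "R": 0.1}),
        yes=Node(posterior={"L": 0.2, "R": 0.8}),
    ),
)

# Made-up feature responses for one pixel, keyed by offset.
responses = {(0.0, 40.0): 0.5, (0.0, -40.0): 0.0}
leaf = classify(toy_tree, responses.__getitem__)
```

Passing the feature as a callable keeps the traversal independent of how the depth image is stored.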

DECISION FOREST CLASSIFIER

• Trained on different random subset of images
• "bagging" helps avoid over-fitting
• Average tree posteriors

$$ P(c \mid I, \mathbf{x}) = \frac{1}{T} \sum_{t=1}^{T} P_t(c \mid I, \mathbf{x}) $$

[Figure: $(I, \mathbf{x})$ is passed to tree 1 … tree $T$, which output $P_1(c)$ … $P_T(c)$]

[Amit & Geman 97] [Breiman 01] [Geurts et al. 06]
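
A short sketch of the two forest ingredients named on this slide: drawing a different random subset of training images per tree, and averaging the per-tree posteriors for a pixel. The function names, the sampling without replacement, and the array shapes are assumptions for illustration.

```python
import numpy as np


def random_image_subsets(num_images, num_trees, subset_size, seed=0):
    """Draw one random subset of image indices per tree (no repeats within a subset)."""
    rng = np.random.default_rng(seed)
    return [
        rng.choice(num_images, size=subset_size, replace=False)
        for _ in range(num_trees)
    ]


def forest_posterior(tree_posteriors):
    """P(c | I, x) = (1/T) * sum over t of P_t(c | I, x).

    tree_posteriors has shape (T, num_classes): one row per tree.
    """
    return np.asarray(tree_posteriors, dtype=float).mean(axis=0)


def most_likely_part(tree_posteriors):
    """Index of the body part with the highest averaged posterior."""
    return int(np.argmax(forest_posterior(tree_posteriors)))


# Three trees voting over two body parts for a single pixel.
per_tree = [[0.6, 0.4], [0.2, 0.8], [0.1, 0.9]]
averaged = forest_posterior(per_tree)
winner = most_likely_part(per_tree)
subsets = random_image_subsets(num_images=10, num_trees=3, subset_size=4)
```

Averaging lets one tree's mistake (here the first tree prefers part 0) be outvoted by the others.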

NUMBER OF TREES

[Figure: ground truth compared with inferred body parts (most likely) for 1 tree, 3 trees, and 6 trees]

[Plot: average per-class accuracy (40% to 55%) vs. number of trees (1 to 6)]

TREE DEPTH

[Plots: average per-class accuracy (30% to 65%) vs. depth of trees, for synthetic test data and real test data]

Body parts to joint hypotheses

3. hypothesize body joints

• Define 3D world space density
• Mean shift for mode detection

[Equation annotations: 3D coord, pixel index i, bandwidth, 3D coord of i-th pixel, pixel weight, inferred probability, depth at i-th pixel]
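
The density equation itself did not survive on this slide, only its annotations. The sketch below assumes a Gaussian-kernel density over the 3D coordinates of the pixels, with each pixel weighted by its inferred probability times its squared depth. That form is consistent with the annotations (bandwidth, pixel weight, inferred probability, depth at i-th pixel) but should be read as an assumption, as should the function names and the stopping rule.

```python
import numpy as np


def pixel_weights(probabilities, depths):
    """Assumed weighting: w_i = P(c | I, x_i) * d_I(x_i)^2."""
    return np.asarray(probabilities, dtype=float) * np.asarray(depths, dtype=float) ** 2


def mean_shift_mode(points, weights, start, bandwidth, iterations=50, tol=1e-6):
    """Climb the weighted Gaussian density from `start` to a nearby mode.

    points has shape (num_pixels, 3): the 3D coordinate of each pixel.
    """
    points = np.asarray(points, dtype=float)
    estimate = np.asarray(start, dtype=float)
    for _ in range(iterations):
        sq_dist = np.sum(((points - estimate) / bandwidth) ** 2, axis=1)
        kernel = weights * np.exp(-sq_dist)
        total = kernel.sum()
        if total == 0.0:
            # Every pixel is too far away to contribute; stay where we are.
            return estimate
        shifted = kernel @ points / total
        if np.linalg.norm(shifted - estimate) < tol:
            return shifted
        estimate = shifted
    return estimate


# Three pixels clustered near (0, 0, 2) plus one distant low-probability pixel.
coords = np.array([
    [0.00, 0.0, 2.0],
    [0.02, 0.0, 2.0],
    [-0.02, 0.0, 2.0],
    [1.00, 1.0, 3.0],
])
weights = pixel_weights([0.9, 0.9, 0.9, 0.1], [2.0, 2.0, 2.0, 3.0])
joint = mean_shift_mode(coords, weights, start=[0.02, 0.0, 2.0], bandwidth=0.1)
```

Starting the climb from several pixels and merging the modes they reach would give multiple hypotheses per joint; this sketch follows a single start point.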

[Figure: input depth, inferred body parts, and inferred joint positions (front view, top view, side view)]

No tracking or smoothing


JOINT PREDICTION ACCURACY

[Plot: average precision (0.0 to 1.0) per joint. Joints: Center Head, Center Neck, Left Shoulder, Right…, Left Elbow, Right Elbow, Left Wrist, Right Wrist, Left Hand, Right Hand, Left Knee, Right Knee, Left Ankle, Right Ankle, Left Foot, Right Foot, Mean AP]

JOINT PREDICTION ACCURACY

[Plot: average precision (0.0 to 1.0) per joint, comparing joint prediction from ground truth body parts with joint prediction from inferred body parts. Joints: Center Head, Center Neck, Left Shoulder, Right Shoulder, Left Elbow, Right Elbow, Left Wrist, Right Wrist, Left Hand, Right Hand, Left Knee, Right Knee, Left Ankle, Right Ankle, Left Foot, Right Foot, Mean AP]

ANALYSIS

• No temporal information
- frame-by-frame

• Very fast
- simple depth image feature
- parallel decision forest classifier

KINECT SYSTEM

4. track skeleton

Uses…
• 3D joint hypotheses
• kinematic constraints
• temporal coherence

… to give
• full skeleton
• higher accuracy
• invisible joints
• multi-player

SUMMARY

• Frame-by-frame gives robustness
• Body parts representation for efficiency
• Fast, simple machine learning
• Significant engineering to scale to a massive, varied training data set

QUESTIONS