An Ef˜cient Branch-and-Bound Algorithm for Optimal Human ...

1
An Efficient Branch-and-Bound Algorithm for Optimal Human Pose Estimation Dept. of Electrical Engineering and Computer Science, University of Michigan at Ann Arbor, USA Silvio Savarese Murali Telaprolu VISION LAB Min Sun Honglak Lee 1. Overview GOAL: Propose an efficient and exact inference algorithm based on branch-and-bound (BB) to solve the human pose estimation problem on loopy graphical models Motivation: - Cast human pose estimation problem as MAP-MRFs inference problem - Solving MAP inference on general MRFs is challenging Pros: efficient inference by dynamic programming ( ). Cons: approximated model; common misclassification errors. Pros: more interactions between parts. Cons: exact inference NP-hard; require approximated inference [1] or reduced state space. Contributions: - Solve loopy models with large # of part hypotheses exactly. - Efficiency is provably guaranteed when # of nodes is small. - Up to 74 times faster than competing techniques [2] ! - Enable more accurate results than non-loopy models [3] ! Key Intuitions: - Flexible bound by relaxing the loopy model into a mixture of star models. - Novel data structure (BMT) and an efficient search routine (OBMS) to significantly reduce the time complexity for calculating the bound in each branch of the BB search Given the model , + = ε ij j i p ij N i i u i h h h f ) , ( ) ( θ) (h; θ θ search for the best body parts locations . Tree Model [3,8,11] Models: Inferred State Variables Tree Edges Loopy Edges Loopy Model [6,7,9,10] θ p θ u θ) (h; max h h MAP f = ) O(H 2 - Source code available online (http://www.eecs.umich.edu/vision/BBproj.html) . from to (Q<<H). ) O(H 2 H) O(Qlog 2 2. Branch-and-Bound Basic - Bound: of - Branching: φ = = Η 2 1 2 1 H H ; H H N N N N N ( ) ) Η ( LB ), Η ( UB N N - Stopping criteria: ) Η ( LB ) Η ( UB N N = N Η 1 H N 2 H N 11 H N 12 H N ) θ ; h ( MAP f 3. Flexible Bound Torso θ u β Head Upper Arm ) ) , ˆ ( max ) ( ( β) , θ (h; ˆ u Η + = i j j N j i j ji h N i i u i R h h h f β θ ) , ( ) , ( ) , ( s.t. j i p ij j i ij i j ji h h h h h h θ β β = + Mixture of Star Models: (Time complexity ) ) O(H 2 β) , θ (h; max ) β , θ ; ( UB u h u N f R N θ) ; (h MAP f = Η Η Bounds: (Constant time) θ) ; (h θ) ); ( (h ) β , θ ; ( LB MAP * u N N f f Η = Η Tightness of the Bound: - controls the tightness of the bound. β ) β , θ ; ( LB ) β , θ ; ( UB u u N N Η Η = Primal-Dual-Gap (PDG) - PDG reduced by solving using Message Passing [1]. β Η Η + = Η N i N j i j ji h i u i h N i j j i i h h h ) ) , ˆ ( max ) ( ( max β) , θ ; UB( ˆ u β θ 10 8 -1 5 10 5 10 6 7 -1 5 7 5 7 4. Efficient Bound Naive Bound: time complexity ; H is #states. ) O(H 2 Branch-Max-Tree (BMT): Similar to [5] ) , ( max ] [ A max j i j j i j ji h h h h h h j j β Η Η = - 1D array: - Efficient data structure: given A =[10 8 -1 5]. Building time: O(H) Query time: O(1) Opportunity Branch-Max-Search (OBMS) Max(A[0..3]) = 10 Max(A[0,1]) = 10 Max(A[2,3]) = 5 10 8 -1 5 10 5 10 BMT - Time complexity: O(H). Η Η = Η N i N i i N h ); ; ( max ) β , θ ; ( UB N h u λ - Re-define - 1D array: ) ; ( max ] A[ max i i N i i h h h h i i i Η = Η Η λ 2)bound for N N Η Η ˆ 1) only a few states competing for max ) ˆ ; ( ] [ A ˆ ] A[ i N i i i h h h Η = λ 6 8 -1 5 8 5 8 Given, A[0] was the max. - Opportunity Search by updating BMT : ] [ A ˆ max i ˆ i h i h Η Given A =[10 8 -1 5] and A=[6 7 -2 4]. ˆ Update BMT by A[0] ˆ Is A[0] the max? No ˆ Update BMT by A[1] ˆ Is A[1] the max? Yes ˆ - Time complexity: , Q<<H. H) O(Qlog 2 6. Experiment Results Human Pose Estimation (HPE): extending a tree-model (CPS) [3] to a fully-connected model. Video Pose Estimation: Solving Stretchable Model (SM) [4] ) O(H q Torso Upper Arms Lower Arms Head Ours Sapp et al. Ours Sapp et al. Ours Sapp et al. Buffy Stickmen VideoPose2 10^-1 10^0 10^1 10^2 10^3 10^4 10^-1 10^0 10^1 10^2 10^3 10^4 our time(sec) CP time(sec) Hard Problems 15 20 25 30 35 40 20 40 60 80 Pixel Error Threshold Accuracy % Elbow, Ours Elbow, [4] Wrist, Ours Wrist, [4] 20 40 60 80 Pixel Error Threshold 20 25 30 35 40 Accuracy % 15 20 40 60 80 15 Pixel Error Threshold 20 25 30 35 40 / Baseline Methods : CP [2] (q is max cluster size). - WithOBMS 74 times faster than CP [2] - Solve 13 out of 18 sequences within 20 minutes. References [1] A. Globerson and T. Jaakkola.Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. In NIP’08. [2] D. Sontag, T. Meltzer, A. Globerson, Y. Weiss, and T. Jaakkola. Tightening LP relaxations for MAP using message-passing. In UAI’08. [3] B. Sapp, A. Toshev, and B. Taskar. Cascaded models for articulated pose estimation. In ECCV’10. [5] D. Harel and R. E. Tarjan. Fast algorithms for finding nearest common ancestors. SIAM Journal on Computing, 1984. [4] B. Sapp, D. Weiss, and B. Taskar. Parsing human motion with stretchable models. In CVPR’11. [6] M. Sun, M. Telaprolu, H. Lee, and S. Savarese. Efficient and Exact MAP-MRF Inference using Branch and Bound. In AISTATS’12. [7] M. Sun, M. Telaprolu, H. Lee, and S. Savarese. An Efficient Branch-and-Bound Algorithm for Optimal Human Pose Estimation. In CVPR’12. [8] P. Felzenszwalb and D. P. Huttenlocher. Pictorial Structures for Object Recognition. In IJCV’05. [9] D. Tran and D. Forsyth. Improved human parsing with a full relational model. In ECCV’10. [10] T. P. Tian and S. Sclaroff. Fast globally optimal 2D human detection with loopy graph models. In CVPR’10. [11] Y. Yang and D. Ramanan. Articulated pose estimation using flexible mixtures of parts. In CVPR’11. Acknowledgement [12] W. Choi, K. Shahid, and S. Savarese. Learning Context for Collective Activity Recognition. In CVPR’11. We acknowledge the support of the ONR grant N000141110389, ARO grant W911NF-09-1-0310. Correspondence [email protected] [email protected] [email protected] [email protected] 7. Conclusion - Flexible bound for MRFs with different structures (HPE and SM). - Faster than state-of-the-art exact Inference (CP) [2]. - Loopy model achieves superior accuracy than tree model [3]. - BMT and OBMS are used to speed up from to . ) O(H 2 H) O(Qlog 2 Guided Variable Selection (GVS): Split the selected variable, e.g., human pose: select a specific body part. - Re-define Upper Bound - Re-define Lower Bound - Define Node-wise-Local-Primal-Dual-Gap (NLPDG) - Properties of NLPDG Variable State Ordering (VSO) - Split by domain knowledge. E.g.: Human Pose: location of body part. -Select the Largest NLPDG 5. Branching Strategy Primal-Dual-Gap (PDG) = sum of NLPDG PDG = 0 implies the BB search is completed. ) , ˆ ( max ) ( ) ( ); ( β) , θ (h; max ) β , θ ; ( UB ˆ * u h u N Η Η + = = = Η i j j N j i j ji h i u i i i N i i i R N h h h h h f β θ λ λ + = = = Η i N j i j ji i u i i N i i N h h h f ) , ( ) ( ) h ( ˆ ); h ( ˆ θ) ; (h ) β , θ ; ( LB * * u β θ λ λ ) h ( ˆ ) ( ) h ( * * * i i i i h λ λ δ = 0 ) , ( ) , ( max ) h ( i N * * * * = j i j ji i j ji h i h h h h j β β δ PDG ) β , θ ; ( LB ) β , θ ; ( UB ) h ( u u * = Η Η = N N N i i δ Non-Negative

Transcript of An Ef˜cient Branch-and-Bound Algorithm for Optimal Human ...

Page 1: An Ef˜cient Branch-and-Bound Algorithm for Optimal Human ...

An Ef�cient Branch-and-Bound Algorithm for Optimal Human Pose Estimation

Dept. of Electrical Engineering and Computer Science, University of Michigan at Ann Arbor, USASilvio SavareseMurali Telaprolu

VISION LAB

Min Sun Honglak Lee

1. OverviewGOAL: Propose an e�cient and exact inference algorithm based on branch-and-bound (BB) to solve the human pose estimation problem onloopy graphical models

Motivation:- Cast human pose estimation problem as MAP-MRFs inference problem- Solving MAP inference on general MRFs is challenging

Pros: e�cient inference by dynamicprogramming ( ).Cons: approximated model; common misclassi�cation errors.

Pros: more interactions between parts.Cons: exact inference NP-hard; require approximated inference [1] or reduced state space.

Contributions:- Solve loopy models with large # of part hypotheses exactly.- E�ciency is provably guaranteed when # of nodes is small.- Up to 74 times faster than competing techniques [2] !- Enable more accurate results than non-loopy models [3] !

Key Intuitions:- Flexible bound by relaxing the loopy model into a mixture of star models.- Novel data structure (BMT) and an e�cient search routine (OBMS) to signi�cantly reduce the time complexity for calculating the bound in each branch of the BB search

Given the model ,∑∑∈∈

+=εij

jip

ijNi

iui hhhf ),()(θ)(h; θθ

search for the best body parts locations .

Tree Model [3,8,11]Models:Inferred State

VariablesTree EdgesLoopy Edges

Loopy Model [6,7,9,10]

θp

θu

θ)(h;maxh hMAP f=

)O(H2

- Source code available online (http://www.eecs.umich.edu/vision/BBproj.html).

from to (Q<<H).)O(H2 H)O(Qlog2

2. Branch-and-Bound Basic- Bound: of

- Branching: φ=∩∪=Η 2121 HH;HH NNNNN

( ))Η(LB),Η(UB NN

- Stopping criteria: )Η(LB)Η(UB NN =

1HN2HN

11HN12HN

)θ;h( MAPf

3. Flexible Bound

Torso θu

βHead Upper

Arm

)),ˆ(max)((β),θ(h; ˆu ∑∑

∈Η∈

+=i

jjNj

ijjihNi

iui

R hhhf βθ

),(),(),( s.t. jip

ijjiijijji hhhhhh θββ =+

Mixture of Star Models:(Time complexity ) )O(H2

β),θ(h;max)β,θ;(UB uh

uN

f RN θ);(hMAPf≥=Η Η∈

Bounds:

(Constant time) θ);(hθ));((h)β,θ;(LB MAP*uNN ff ≤Η=Η

Tightness of the Bound:

- controls the tightness of the bound.β)β,θ;(LB)β,θ;(UB uu

NN Η−Η=Primal-Dual-Gap (PDG)

- PDG reduced by solving using Message Passing [1].β

∑ ∑∈ ∈

Η∈Η∈ +=ΗNi Nj

ijjihiuihN

ijjii

hhh )),ˆ(max)((maxβ),θ;UB( ˆu βθ

10 8 -1 5

10 5

10

6 7 -1 5

7 5

7

4. E�cient BoundNaive Bound: time complexity ; H is #states.)O(H2

Branch-Max-Tree (BMT): Similar to [5]

),(max ][Amax ji

jj ijjihh

h hhhjjβΗ∈Η∈ =- 1D array:

- E�cient data structure: given A =[10 8 -1 5].

Building time: O(H) Query time: O(1)

Opportunity Branch-Max-Search (OBMS)

Max(A[0..3]) = 10

Max(A[0,1]) = 10

Max(A[2,3]) = 510 8 -1 5

10 5

10

BMT

- Time complexity: O(H).

∑∈

Η∈ Η=ΗNi

NiiN h );;(max)β,θ;(UBNh

u λ- Re-de�ne

- 1D array: );(max ]A[max ii Niihh hhiii

Η= Η∈Η∈ λ

2)bound for NN Η⊂Η1) only a few statescompeting for max )ˆ;(][A ]A[ i Niii hhh Η=≥ λ

6 8 -1 5

8 5

8

Given, A[0] was the max.

- Opportunity Search by updating BMT : ][Amax iˆi

hih Η∈

Given A =[10 8 -1 5] and A=[6 7 -2 4].ˆ

Update BMT by A[0]ˆIs A[0] the max? Noˆ

Update BMT by A[1]ˆIs A[1] the max? Yesˆ

- Time complexity: , Q<<H.H)O(Qlog2

6. Experiment ResultsHuman Pose Estimation (HPE): extending a tree-model (CPS) [3] to a fully-connected model.

Video Pose Estimation: Solving Stretchable Model (SM) [4]

)O(Hq

TorsoUpper ArmsLower ArmsHead

Ours Sapp et al. Ours Sapp et al. Ours Sapp et al.

Bu�y

Stic

kmen

Vide

oPos

e2

10^−1 10^0 10^1 10^2 10^3 10^410^−1

10^0

10^1

10^2

10^3

10^4

our time(sec)

CP ti

me(

sec)

Hard Problems

15 20 25 30 35 40

20

40

60

80

Pixel Error Threshold

Accu

racy

%

Elbow, OursElbow, [4]Wrist, OursWrist, [4]

20

40

60

80

Pixel Error Threshold20 25 30 35 40

Accu

racy

%

15

20

40

60

80

15Pixel Error Threshold

20 25 30 35 40

/

Baseline Methods : CP [2] (q is max cluster size).

- WithOBMS74 times fasterthan CP [2]

- Solve 13 out of 18 sequences within 20 minutes.

References[1] A. Globerson and T. Jaakkola.Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. In NIP’08.

[2] D. Sontag, T. Meltzer, A. Globerson, Y. Weiss, and T. Jaakkola. Tightening LP relaxations for MAP using message-passing. In UAI’08.

[3] B. Sapp, A. Toshev, and B. Taskar. Cascaded models for articulated pose estimation. In ECCV’10.

[5] D. Harel and R. E. Tarjan. Fast algorithms for �nding nearest common ancestors. SIAM Journal on Computing, 1984.

[4] B. Sapp, D. Weiss, and B. Taskar. Parsing human motion with stretchable models. In CVPR’11.

[6] M. Sun, M. Telaprolu, H. Lee, and S. Savarese. E�cient and Exact MAP-MRF Inference using Branch and Bound. In AISTATS’12.

[7] M. Sun, M. Telaprolu, H. Lee, and S. Savarese. An E�cient Branch-and-Bound Algorithm for Optimal Human Pose Estimation. In CVPR’12.

[8] P. Felzenszwalb and D. P. Huttenlocher. Pictorial Structures for Object Recognition. In IJCV’05.

[9] D. Tran and D. Forsyth. Improved human parsing with a full relational model. In ECCV’10.

[10] T. P. Tian and S. Sclaro�. Fast globally optimal 2D human detection with loopy graph models. In CVPR’10.

[11] Y. Yang and D. Ramanan. Articulated pose estimation using �exible mixtures of parts. In CVPR’11.

Acknowledgement

[12] W. Choi, K. Shahid, and S. Savarese. Learning Context for Collective Activity Recognition. In CVPR’11.

We acknowledge the supportof the ONR grant N000141110389, ARO grant W911NF-09-1-0310.

[email protected]

[email protected]@eecs.umich.edu

[email protected]

7. Conclusion- Flexible bound for MRFs with di�erent structures (HPE and SM).

- Faster than state-of-the-art exact Inference (CP) [2].- Loopy model achieves superior accuracy than tree model [3].

- BMT and OBMS are used to speed up from to .)O(H2 H)O(Qlog2

Guided Variable Selection (GVS): Split the selected variable,e.g., human pose: select a speci�c body part.

- Re-de�ne Upper Bound

- Re-de�ne Lower Bound

- De�ne Node-wise-Local-Primal-Dual-Gap (NLPDG)

- Properties of NLPDG

Variable State Ordering (VSO)- Split by domain knowledge. E.g.: Human Pose: location of body part.

-Select the Largest NLPDG

5. Branching Strategy

Primal-Dual-Gap (PDG) = sum of NLPDG

PDG = 0 implies the BB search is completed.

),ˆ(max)()(

);(β),θ(h;max)β,θ;(UB

ˆ

*uh

uN

∈Η∈

∈Η∈

+=

==Η

ijj

Njijjihi

uiii

Niii

RN

hhhh

hf

βθλ

λ

+=

==Η

iNjijjii

uii

NiiN

hhh

f

),()()h(ˆ

);h(ˆθ);(h)β,θ;(LB **u

βθλ

λ

)h(ˆ)()h( ***iiii h λλδ −=

0),(),(max)h(iN

**** ≥−= ∑∈j

ijjiijjihi hhhhj

ββδ

PDG)β,θ;(LB)β,θ;(UB)h( uu* =Η−Η=∑∈

NNNi

Non-Negative