
New Convex Relaxations for MRF Inference with Unknown Graphs (Supplementary Material)

Zhenhua Wang†∗, Tong Liu†, Qinfeng Shi‡, M. Pawan Kumar], Jianhua Zhang†
†Zhejiang University of Technology, ‡The University of Adelaide, ]The University of Oxford

This supplementary provides:

• Motivations for estimating graph structures within the inference problem.

• The maximum likelihood interpretation of inferring MRF graphs and labels simultaneously.

• Detailed exposition of the matrix Θ.

• The proofs of Propositions 1-3.

• The pseudocode of our QP+CCCP algorithm.

• Additional experimental results.

1. Motivations for estimating graph structures within the inference problem

In some problems it is possible to learn the graph structure instead of inferring it. This approach is well suited to problems where all the instances share the same structure (that is, the structure is “homogeneous”). However, as noted by Lan et al. 2010, who introduced the problem of simultaneous estimation of MRF labels and structures, some problems are characterized by the lack of a common basic structure across their instances. We refer to these as “heterogeneous” problems. In other words, each instance of the problem has a different graph structure. One such problem, group activity recognition, is studied in our paper, where we do not even know the number of persons (that is, the number of random variables of the MRF) in a scene. Furthermore, persons can enter and exit the scene at different time frames, thereby changing the number of variables. Treating the structure as an unknown and inferring it for each instance is an intuitive way to address this problem.

2. Maximum Likelihood Interpretation

Because the underlying structure of heterogeneous graphs changes from instance to instance, and there are no (or not sufficiently many) observations for a fixed structure to support meaningful statistical estimation, learning graph structures offline is infeasible in this case. Thus we would like to find the best graph structure and labelling jointly for each instance x. Let θi(yi) denote the unary potential when node i takes a label yi, and let θs,t(ys, yt) denote the pairwise potential when nodes s, t take ys, yt simultaneously. We can define the joint likelihood P(y, G|θ) as

$$P(y, G \mid \theta) = \frac{1}{Z(\theta)} \exp\Big( -\sum_{i \in V} \theta_i(y_i) - \sum_{(s,t) \in E} \theta_{st}(y_s, y_t) \Big). \tag{1}$$

Note that in G = (V, E), the node set V is given and the edge set E is the unknown to be estimated. The normalisation constant Z(θ), also called the partition function, is

$$Z(\theta) = \sum_{\hat{y} \in \mathcal{Y}} \sum_{\hat{G} \in \mathcal{G}} \exp\Big( -\sum_{i \in \hat{V}} \theta_i(\hat{y}_i) - \sum_{(s,t) \in \hat{E}} \theta_{st}(\hat{y}_s, \hat{y}_t) \Big). \tag{2}$$



Here $y = (y_i)_{i=1,2,\ldots,n}$ and $G = (V, E)$. The following equivalence holds:

$$\operatorname*{argmax}_{y, G} P(y, G \mid \theta) = \operatorname*{argmax}_{y, G} \ln P(y, G \mid \theta) = \operatorname*{argmin}_{y, G} \Big\{ \sum_{i \in V} \theta_i(y_i) + \sum_{(s,t) \in E} \theta_{st}(y_s, y_t) \Big\}. \tag{3}$$

This means that inference with unknown graphs can be done by minimising the energy (equivalently, maximising the likelihood) jointly over both the graph and the labels.
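To make the joint minimisation in (3) concrete, the following is a minimal brute-force sketch (in Python, with toy sizes and random potentials that are illustrative only, not from the paper) that enumerates every labelling and every edge subset and keeps the joint minimiser:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, c = 3, 2  # toy sizes: 3 nodes, 2 labels (illustrative, not from the paper)
theta_unary = rng.normal(size=(n, c))       # theta_i(y_i)
theta_pair = rng.normal(size=(n, n, c, c))  # theta_st(y_s, y_t), used for s < t

def energy(y, edges):
    """The joint objective of (3) for a labelling y and an edge set."""
    e = sum(theta_unary[i, y[i]] for i in range(n))
    e += sum(theta_pair[s, t, y[s], y[t]] for (s, t) in edges)
    return e

all_pairs = [(s, t) for s in range(n) for t in range(s + 1, n)]
# Enumerate every labelling and every edge subset; keep the joint minimiser.
best = min(((y, edges)
            for y in itertools.product(range(c), repeat=n)
            for r in range(len(all_pairs) + 1)
            for edges in itertools.combinations(all_pairs, r)),
           key=lambda ye: energy(*ye))
print("best labelling:", best[0], "best edge set:", best[1])
```

Note that without any constraint on the graph, the inner minimisation over edges decouples: given a labelling y, an edge (s, t) is selected exactly when θst(ys, yt) < 0. Constraints on the estimated graph, such as the maximum-degree parameter h used later, are what make the joint problem non-trivial.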

3. The Detailed Structure of the Θ Matrix

Θ is a block diagonal matrix of the following form:

$$\Theta = \operatorname{diag}\big( \theta_{11}(1,1), \theta_{11}(1,2), \ldots, \theta_{11}(c,c),\; \theta_{12}(1,1), \theta_{12}(1,2), \ldots, \theta_{12}(c,c),\; \ldots,\; \theta_{n-1\,n}(1,1), \ldots, \theta_{n-1\,n}(c,c),\; \theta_{nn}(1,1), \ldots, \theta_{nn}(c,c) \big),$$

that is, a diagonal matrix whose entries enumerate $\theta_{ij}(a, b)$ for every node pair $i \le j$ in turn and, within each pair, every label pair $(a, b)$ in turn.

Here we let θij(a, b) = 0 for all i = j and a, b ∈ Y. As in the paper, n and c denote the number of nodes and the cardinality of the label set respectively.
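As a concrete illustration, the following Python sketch (a hypothetical helper with toy sizes, not code from the paper) assembles Θ exactly as the diagonal enumeration above describes:

```python
import numpy as np

def build_theta_matrix(theta_pair, n, c):
    """Stack theta_ij(a, b) for all node pairs i <= j and all label pairs
    (a, b) into one diagonal matrix, following the enumeration above.
    `theta_pair[(i, j)]` is a (c, c) array for i < j; theta_ii is all zeros."""
    entries = []
    for i in range(n):
        for j in range(i, n):
            block = np.zeros((c, c)) if i == j else theta_pair[(i, j)]
            entries.extend(block.reshape(-1))  # label pairs in row-major order
    return np.diag(entries)

n, c = 3, 2
rng = np.random.default_rng(1)
theta_pair = {(i, j): rng.normal(size=(c, c))
              for i in range(n) for j in range(i + 1, n)}
Theta = build_theta_matrix(theta_pair, n, c)
print(Theta.shape)  # n(n+1)/2 pairs x c^2 label pairs -> (24, 24) here
```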

4. Proofs

4.1. Proof of Proposition 1

Proof. According to the first two groups of constraints in problem (6), $\lambda_{ij}(y_i, y_j) \le \min\{z_{ij}, \mu_{ij}(y_i, y_j)\}$. We define $\delta(\cdot, \cdot) = 1$ if both its arguments are true, and 0 otherwise. Then

$$\begin{aligned}
&\sum_{a,b} \lambda_{ij}(a, b) = z_{ij}, \\
\implies & \sum_{a,b} \big(1 - \delta(a = y_i, b = y_j)\big) \lambda_{ij}(a, b) + \lambda_{ij}(y_i, y_j) = z_{ij}, \\
\implies & \sum_{a,b} \big(1 - \delta(a = y_i, b = y_j)\big) \mu_{ij}(a, b) + \lambda_{ij}(y_i, y_j) \ge z_{ij}, \\
\implies & 1 - \mu_{ij}(y_i, y_j) + \lambda_{ij}(y_i, y_j) \ge z_{ij}, \\
\implies & \lambda_{ij}(y_i, y_j) \ge z_{ij} + \mu_{ij}(y_i, y_j) - 1.
\end{aligned}$$

Page 3: New Convex Relaxations for MRF Inference with Unknown ...

Since λij(yi, yj) ≥ 0, we have λij(yi, yj) ≥ max{0, zij + μij(yi, yj) − 1}. That is to say, the feasible set of LP-M is a subset of that of LP-W. Moreover, it is easy to find a feasible point of LP-W that is infeasible for LP-M. The proof is complete.
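This containment can be sanity-checked numerically. The sketch below tests only one particular LP-M-feasible family (λ = z·μ, which satisfies ∑λ = z, λ ≤ μ and λ ≤ z), not the whole feasible set:

```python
import numpy as np

rng = np.random.default_rng(2)
c = 3
for _ in range(1000):
    mu = rng.dirichlet(np.ones(c * c)).reshape(c, c)  # sum_{a,b} mu_ij(a,b) = 1
    z = rng.uniform()
    lam = z * mu  # one LP-M-feasible choice: sums to z, lam <= mu, lam <= z
    # Proposition 1: every LP-M-feasible point satisfies the LP-W bounds.
    assert np.all(lam >= np.maximum(0.0, z + mu - 1.0) - 1e-12)
    assert np.all(lam <= np.minimum(z, mu) + 1e-12)
print("LP-W bounds hold on all sampled LP-M points")
```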

4.2. Proof of Proposition 2

Proof. First, d∗ is a feasible point of problem (12), because Θ + diag(d∗) is a Hermitian diagonally dominant matrix with non-negative diagonal entries, and hence positive semidefinite. Second, we show that d∗ is optimal by contradiction. Suppose we can find a d ≠ d∗ such that ‖d‖₁ < ‖d∗‖₁ and Θ + diag(d) ⪰ 0. The vector d has the following structure:

$$d = \big[ [\alpha_{ij}]_{i,j \in V, i \le j},\; [\gamma_{ij}]_{i,j \in V, i \le j} \big],$$

where $[\alpha_{ij}]_{i,j \in V, i \le j}$ is a concatenation of the vectors $\alpha_{ij}$, in the order obtained by enumerating all possible index pairs in turn; $\alpha_{ij} = [\alpha_{ij}(y_i, y_j)]_{y_i, y_j}$ is a vector formed by enumerating all possible label pairs in turn; similarly, $[\gamma_{ij}]_{i,j \in V, i \le j}$ is another vector formed by enumerating all possible index pairs. It follows from $\|d\|_1 < \|d^*\|_1$ that

$$\sum_{i,j \in V, i \le j} \sum_{y_i, y_j} |\alpha_{ij}(y_i, y_j)| + \sum_{i,j \in V, i \le j} |\gamma_{ij}| < \sum_{i,j \in V, i \le j} \sum_{y_i, y_j} |\theta_{ij}(y_i, y_j)|.$$

Let $\alpha_{ij}(y_i, y_j) = \frac{1}{2} |\theta_{ij}(y_i, y_j)| + \rho_{ij}(y_i, y_j)$ and $\gamma_{ij} = \sum_{y_i, y_j} \big( \frac{1}{2} |\theta_{ij}(y_i, y_j)| + \sigma_{ij}(y_i, y_j) \big)$, with $\rho_{ij}(y_i, y_j), \sigma_{ij}(y_i, y_j) \in \mathbb{R}$. The following inequality holds:

$$\begin{aligned}
& \Big| \sum_{i,j \in V, i \le j} \sum_{y_i, y_j} \alpha_{ij}(y_i, y_j) + \sum_{i,j \in V, i \le j} \gamma_{ij} \Big| < \sum_{i,j \in V, i \le j} \sum_{y_i, y_j} |\theta_{ij}(y_i, y_j)|, \\
\implies & \Big| \sum_{i \le j} \sum_{y_i, y_j} |\theta_{ij}(y_i, y_j)| + \sum_{i \le j} \sum_{y_i, y_j} \big( \rho_{ij}(y_i, y_j) + \sigma_{ij}(y_i, y_j) \big) \Big| < \sum_{i \le j} \sum_{y_i, y_j} |\theta_{ij}(y_i, y_j)|, \\
\implies & \sum_{i,j \in V, i \le j} \sum_{y_i, y_j} \big( \rho_{ij}(y_i, y_j) + \sigma_{ij}(y_i, y_j) \big) < 0.
\end{aligned}$$

Let $Q' = \Theta + \operatorname{diag}(d)$. Define $v$ and $u$ as

$$v = [v_{ij}]_{i,j \in V, i \le j}, \quad \text{where } v_{ij} = [v_{ij}(y_i, y_j)]_{y_i, y_j}, \qquad u = [u_{ij}]_{i,j \in V, i \le j}.$$

Here $v$ and $u$ share the same structures as $[\alpha_{ij}]_{i,j \in V, i \le j}$ and $[\gamma_{ij}]_{i,j \in V, i \le j}$ respectively. Since $\Theta + \operatorname{diag}(d) \succeq 0$, $[v, u]^\top Q' [v, u] \ge 0$ must always hold. According to (5), we can see that

$$[v, u]^\top Q' [v, u] = \frac{1}{2} \sum_{i \le j} \sum_{y_i, y_j} \pi_{ij}(y_i, y_j) + \sum_{i \le j} \sum_{y_i, y_j} \big( v_{ij}^2(y_i, y_j)\, \rho_{ij}(y_i, y_j) + u_{ij}^2\, \sigma_{ij}(y_i, y_j) \big).$$

Here

$$\pi_{ij}(y_i, y_j) = \begin{cases} |\theta_{ij}(y_i, y_j)| \big( v_{ij}(y_i, y_j) + u_{ij} \big)^2 & \text{if } \theta_{ij}(y_i, y_j) > 0, \\ |\theta_{ij}(y_i, y_j)| \big( v_{ij}(y_i, y_j) - u_{ij} \big)^2 & \text{otherwise.} \end{cases}$$

Let $u_{ij} = 1$, and let $v_{ij}(y_i, y_j) = -u_{ij}$ if $\theta_{ij}(y_i, y_j) > 0$ and $v_{ij}(y_i, y_j) = u_{ij}$ otherwise. Then $[v, u]^\top Q' [v, u] < 0$, which contradicts $Q' \succeq 0$. The proof is complete.
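The feasibility half of this argument can be checked numerically. The sketch below is our reconstruction, not code from the paper: it assumes the coupling structure implied by the expansion of π above, namely that each λ-coordinate is coupled to its pair's z-coordinate through an off-diagonal entry θij(yi, yj)/2, and that d∗ places ½|θij(yi, yj)| on each λ-coordinate and ∑ ½|θij(yi, yj)| on each z-coordinate:

```python
import numpy as np

rng = np.random.default_rng(3)
n, c = 3, 2
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
theta = {p: rng.normal(size=(c, c)) for p in pairs}

# One coordinate per (pair, label pair) for lambda, plus one per pair for z.
m = len(pairs) * c * c + len(pairs)
Q = np.zeros((m, m))
for k, p in enumerate(pairs):
    zk = len(pairs) * c * c + k         # index of the z-coordinate of pair p
    for a in range(c):
        for b in range(c):
            vk = k * c * c + a * c + b  # index of lambda_p(a, b)
            Q[vk, zk] = Q[zk, vk] = 0.5 * theta[p][a, b]

# d*: 1/2|theta| on each lambda-coordinate, the summed halves on each z-coordinate.
d = np.zeros(m)
for k, p in enumerate(pairs):
    for a in range(c):
        for b in range(c):
            d[k * c * c + a * c + b] = 0.5 * abs(theta[p][a, b])
    d[len(pairs) * c * c + k] = 0.5 * np.abs(theta[p]).sum()

Qp = Q + np.diag(d)
# Weakly diagonally dominant with non-negative diagonal => PSD (Gershgorin circles).
off_diag = np.sum(np.abs(Qp), axis=1) - np.diag(Qp)
assert np.all(np.diag(Qp) + 1e-12 >= off_diag)
print("min eigenvalue:", np.linalg.eigvalsh(Qp).min())  # non-negative up to round-off
```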


4.3. Proof of Proposition 3

Lemma 1. The convex QP problem (14) is equivalent to the following problem:

$$\begin{aligned}
\min \; & \sum_{i \in V} \sum_{y_i} \mu_i(y_i)\, \theta_i(y_i) + \sum_{i,j \in V, i < j} \sum_{y_i, y_j} \theta_{ij}(y_i, y_j)\, \lambda_{ij}(y_i, y_j) \\
\text{s.t.} \; & \lambda_{ij}(y_i, y_j) \le \frac{\big( z_{ij} + \mu_{ij}(y_i, y_j) \big) - \big( z_{ij} - \mu_{ij}(y_i, y_j) \big)^2}{2} \quad \text{if } \theta_{ij}(y_i, y_j) < 0, \\
& \lambda_{ij}(y_i, y_j) \ge \frac{\big( z_{ij} + \mu_{ij}(y_i, y_j) \big)^2 - \big( z_{ij} + \mu_{ij}(y_i, y_j) \big)}{2} \quad \text{if } \theta_{ij}(y_i, y_j) \ge 0, \\
& \mu, z \in \mathcal{O}.
\end{aligned}$$

Proof. Expanding the objective function in (14):

$$\chi^\top q + \chi^\top Q \chi = \chi^\top (\theta - d^*) + \chi^\top \big( \Theta + \operatorname{diag}(d^*) \big) \chi = \sum_{i \in V} \sum_{y_i} \mu_i(y_i)\, \theta_i(y_i) + \sum_{i < j} \sum_{y_i, y_j} \theta_{ij}(y_i, y_j)\, \beta_{ij}(y_i, y_j),$$

where

$$\beta_{ij}(y_i, y_j) = \begin{cases} \frac{1}{2} \big( z_{ij} + \mu_{ij}(y_i, y_j) \big) - \frac{1}{2} \big( z_{ij} - \mu_{ij}(y_i, y_j) \big)^2 & \text{if } \theta_{ij}(y_i, y_j) < 0, \\ \frac{1}{2} \big( z_{ij} + \mu_{ij}(y_i, y_j) \big)^2 - \frac{1}{2} \big( z_{ij} + \mu_{ij}(y_i, y_j) \big) & \text{otherwise.} \end{cases}$$

Here βij(yi, yj) has two cases because of the absolute value used in the definition of d∗ in (12). Now we can see that the convex QP problem (14) is equivalent to the following optimization problem:

$$\begin{aligned}
\min \; & \sum_{i \in V} \sum_{y_i} \mu_i(y_i)\, \theta_i(y_i) + \sum_{i,j \in V, i < j} \sum_{y_i, y_j} \theta_{ij}(y_i, y_j)\, \lambda_{ij}(y_i, y_j), \\
\text{s.t.} \; & \theta_{ij}(y_i, y_j)\, \lambda_{ij}(y_i, y_j) \ge \theta_{ij}(y_i, y_j)\, \beta_{ij}(y_i, y_j), \quad \mu, z \in \mathcal{O}.
\end{aligned}$$

Clearly $\theta_{ij}(y_i, y_j)\, \lambda_{ij}(y_i, y_j) \ge \theta_{ij}(y_i, y_j)\, \beta_{ij}(y_i, y_j)$ is equivalent to the following:

$$\begin{aligned}
\lambda_{ij}(y_i, y_j) &\le \frac{\big( z_{ij} + \mu_{ij}(y_i, y_j) \big) - \big( z_{ij} - \mu_{ij}(y_i, y_j) \big)^2}{2} \quad \text{if } \theta_{ij}(y_i, y_j) < 0, \\
\lambda_{ij}(y_i, y_j) &\ge \frac{\big( z_{ij} + \mu_{ij}(y_i, y_j) \big)^2 - \big( z_{ij} + \mu_{ij}(y_i, y_j) \big)}{2} \quad \text{if } \theta_{ij}(y_i, y_j) \ge 0.
\end{aligned}$$

The proof is complete.

With Lemma 1, we can now prove Proposition 3.

Proof. According to the constraint $\lambda_{ij}(y_i, y_j) \ge \max\{0, z_{ij} + \mu_{ij}(y_i, y_j) - 1\}$ in problem (3), we get

$$\lambda_{ij}(y_i, y_j) \ge \frac{\big( z_{ij} + \mu_{ij}(y_i, y_j) \big)^2 - \big( z_{ij} + \mu_{ij}(y_i, y_j) \big)}{2}.$$

According to the constraint $\lambda_{ij}(y_i, y_j) \le \min\{\mu_{ij}(y_i, y_j), z_{ij}\}$, we have

$$\begin{aligned}
& |z_{ij} - \mu_{ij}(y_i, y_j)| = \big| (z_{ij} - \lambda_{ij}(y_i, y_j)) - (\mu_{ij}(y_i, y_j) - \lambda_{ij}(y_i, y_j)) \big|, \\
\implies & |z_{ij} - \mu_{ij}(y_i, y_j)| \le (z_{ij} - \lambda_{ij}(y_i, y_j)) + (\mu_{ij}(y_i, y_j) - \lambda_{ij}(y_i, y_j)), \\
\implies & \lambda_{ij}(y_i, y_j) \le \frac{z_{ij} + \mu_{ij}(y_i, y_j) - |z_{ij} - \mu_{ij}(y_i, y_j)|}{2}.
\end{aligned}$$

Since $|z_{ij} - \mu_{ij}(y_i, y_j)| \le 1$, we have $(z_{ij} - \mu_{ij}(y_i, y_j))^2 \le |z_{ij} - \mu_{ij}(y_i, y_j)|$, and hence

$$\frac{z_{ij} + \mu_{ij}(y_i, y_j) - |z_{ij} - \mu_{ij}(y_i, y_j)|}{2} \le \frac{\big( z_{ij} + \mu_{ij}(y_i, y_j) \big) - \big( z_{ij} - \mu_{ij}(y_i, y_j) \big)^2}{2}.$$

It follows that $\lambda_{ij}(y_i, y_j) \le \frac{( z_{ij} + \mu_{ij}(y_i, y_j) ) - ( z_{ij} - \mu_{ij}(y_i, y_j) )^2}{2}$. In summary, from the constraints of problem (3) we get

$$\frac{\big( z_{ij} + \mu_{ij}(y_i, y_j) \big) - \big( z_{ij} - \mu_{ij}(y_i, y_j) \big)^2}{2} \ge \lambda_{ij}(y_i, y_j) \ge \frac{\big( z_{ij} + \mu_{ij}(y_i, y_j) \big)^2 - \big( z_{ij} + \mu_{ij}(y_i, y_j) \big)}{2},$$


Figure 1. Estimated energy (the objective in (1)) as a function of the coupling strength parameter η (left diagram) and the degree parameter h (right diagram), for LP-W, LP-M, convex QP, LP-W+B&B and QP+CCCP. Lower is better.

and consequently, by Lemma 1, the constraints of the convex QP relaxation (14) hold. Hence LP-W is tighter than the convex QP relaxation. By Proposition 1, LP-M is tighter than LP-W, and thus the proof is complete.
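The two bound comparisons used in this proof can also be checked numerically. This sketch samples (z, μ) pairs from [0, 1]² and verifies that the LP-W interval [max{0, z+μ−1}, min{z, μ}] sits inside the QP interval of Lemma 1:

```python
import numpy as np

rng = np.random.default_rng(4)
z = rng.uniform(size=100000)
mu = rng.uniform(size=100000)
upper_qp = ((z + mu) - (z - mu) ** 2) / 2  # QP upper bound on lambda
lower_qp = ((z + mu) ** 2 - (z + mu)) / 2  # QP lower bound on lambda
assert np.all(np.minimum(z, mu) <= upper_qp + 1e-12)              # LP-W upper bound is tighter
assert np.all(np.maximum(0.0, z + mu - 1.0) >= lower_qp - 1e-12)  # LP-W lower bound is tighter
print("LP-W interval is contained in the QP interval on all samples")
```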

5. The QP+CCCP Algorithm

Algorithm 1 QP+CCCP

Require: node set V; potentials θi(yi) ∀i ∈ V and θij(yi, yj) ∀i < j, yi, yj; tolerance ε; iteration limit tmax.
Output: estimated labels Y∗ and graph structure Z∗.
1: Solve the convex QP (14); let (μ(0), z(0)) be the solution, and set χ(0) ← [μ(0), z(0)].
2: Repeat solving the convex QP (14) until F(χ(t−1)) − F(χ(t)) < ε or t ≥ tmax.
3: Decode Y∗ ← {y∗i}: y∗i ← argmin_{yi} μ(t)_{ii}(yi, yi).
4: Decode Z∗ ← {z∗ij}: z∗ij ← 1 if z(t)_{ij} ≥ 0.5, otherwise z∗ij ← 0.
5: Return Y∗, Z∗.

6. Additional Experimental Results and Interpretations

To further show the benefits of the proposed LP-W and QP+CCCP algorithms, we provide more experimental results here.

6.1. Estimated Energy as η and h Change

Note that η is the coupling strength parameter used to create the synthetic data (see Section 5 in the main text), and h controls the maximum degree of the estimated graph. For each η or h, we create synthetic data (20 examples) using the Potts distance. We report the average energy over these 20 examples in Figure 1. We can see that our QP+CCCP always performs best among all methods for the various h and η parameters. Lan performs second best among the methods that estimate the graph repeatedly (QP+CCCP, LP-W+B&B and Lan). Among the methods that estimate the graph only once (QP, LP-W and LP-M), the proposed LP-M performs best. LP-M even outperforms LP-W+B&B in all cases, and outperforms Lan for small η, despite the latter two estimating the graph repeatedly.

6.2. Visualization of The Learned Parameters for Human Interaction Recognition

In order to model human interactions, within our MRF model we include the relative spatial position of one anchored person to another, which is typically utilized as a high-level contextual cue in human interaction recognition tasks.

Figure 2. The discretized relative spatial positions (overlap, near left, adjacent left, adjacent right, near right, far) of one anchored person (the shaded rectangle) to another (each cell with white background).

Figure 3. Visualization of the learned parameters for human interaction recognition on TVHI, one panel per spatial relation (far, adjacent, near, overlap). Here OT, HS, HF, HG, KS mean no-interaction, handshake, highfive, hug and kiss respectively. See text below for interpretation.

To illustrate the effectiveness of the learned model for human interaction recognition, we visualize the parameters that encode the confidence (brighter indicates more confident) of assigning a particular pair of action labels to two individuals when they interact with each other; see Figure 3. We can conclude that when persons nearly overlap in terms of their relative distance, our model confidently recognizes their interaction as hugging or kissing (the overlap panel). When persons are adjacent or near to each other, our model is likely to recognize their interaction as handshake or highfive (the adjacent and near panels). When people are far away from each other, our model favors the highfive class (the far panel), because the training set contains highfive examples performed by distant persons.


6.3. Visualization of Group Activity Recognition Results on CAD

The results are shown in Figure 4.

Figure 4. Visualisation of recognition results on CAD using our QP+CCCP algorithm. The predicted action and pose labels are shown in cyan and green boxes. The red edges represent the learned graph structures within the action layer. For action names, CR, WK, QU, WT indicate cross, walk, queue and wait. For poses, B, L, R, F, BL, BR, FR, FL denote back, left, right, front, back-left, back-right, front-right and front-left respectively.

6.4. Visualization of Human Interaction Recognition Results on TVHI

The results are shown in Figure 5.

Figure 5. Visualisation of recognition results on TVHI using our QP+CCCP algorithm. The predicted action and pose labels are shown in cyan and green boxes. The red edges represent the learned graph structures. For interaction names, OT, HS, HF, HG, KS indicate no-interaction, handshake, highfive, hug and kiss. For poses, B, L, R, FL, FR denote backwards, left, right, frontal-left and frontal-right respectively.