Variable Elimination (VE) - TAUhaimk/pgm-seminar/part_2_barak.pdfย ยท The Induced Graph And Maximal...
Transcript of Variable Elimination (VE) - TAUhaimk/pgm-seminar/part_2_barak.pdfย ยท The Induced Graph And Maximal...
Variable Elimination (VE)
Barak Sternberg
Basic Ideas in VE
โช Example 1:
โช Let G be a Chain Bayesian Graph:
โช How would one compute ๐ ๐๐ = ๐ ?โ Using the CPDs:
โช ๐ ๐2 = ๐ฅ = ๐ฅโ๐๐๐ ๐1 ๐ ๐1 = ๐ฅ P X2 = l X1 = ๐ฅ)
โช Saving each distribution and reusing it for next computation (Dynamic Programming).
๐1 ๐2 ๐๐โ1 ๐๐
Basic Ideas in VE
โช Complexity of Example 1:โ Assume value space of ๐๐s is of size ๐
โ For Each distribution ๐(๐๐): ๐(๐2)
โ There are ๐ nodes: ๐(๐๐2)
โช Naive Computation cost is ๐(๐๐โ1):โ computing ๐1,โฆ๐๐โ1 ๐(๐1, โฆ๐๐)
Basic Ideas in VE
โช Example 2
โช Assume The equation:
โช ๐ ๐ด, ๐ต, ๐ถ, ๐ท = ๐1(๐ด)๐2(B,A) ๐3(C,B) ๐4 ๐ท, ๐ถ
โช We want to compute:โ ๐ ๐ท := ๐ดโฒ,๐ตโฒ,๐ถโฒ๐ ๐ดโฒ, ๐ตโฒ, ๐ถโฒ, ๐ท
โ This will be โelimination of A,B,C from ๐โ.
Basic Ideas in VE
โช We can directly compute:โ ๐ดโฒ,๐ตโฒ,๐ถโฒ๐1(๐ด
โฒ)๐2(Bโฒ,Aโฒ)๐3(Cโฒ,Bโฒ)๐4 ๐ท, ๐ถโฒ
โ Will cost ๐(๐พ3)
โช Or we can use the following observation:
๐ดโฒ,๐ตโฒ,๐ถโฒ
๐1(๐ดโฒ)๐2(Bโฒ,Aโฒ)๐3(Cโฒ,Bโฒ)๐4 ๐ท, ๐ถโฒ
=
๐ถโฒ
๐4 ๐ท, ๐ถโฒ
๐ตโฒ
๐3(Cโฒ,Bโฒ)
๐ดโฒ
๐2(Bโฒ,Aโฒ)๐1(๐ดโฒ)
Basic Ideas in VE
โช We can do it in stages:
โช Stage 1:โ ๐๐๐๐๐๐ ๐ฟ0 ๐ด
โฒ, ๐ตโฒ = ๐1 ๐ดโฒ ๐2(Bโฒ,Aโฒ)
โ ๐ถ๐๐๐๐ข๐ก๐ ๐ฟ1 ๐ตโฒ = ๐ดโฒ ๐ฟ0 ๐ด
โฒ, ๐ตโฒ
โช We have Left to compute:โ ๐ตโฒ,๐ถโฒ ฮด 1(๐ตโฒ)๐3(Cโฒ,Bโฒ) ๐4 ๐ท, ๐ถโฒ
โช Continue the same way.
โช Complexity:โ costs: ๐ 3๐พ2 = ๐(๐พ2)
Conclusion
โช Computing inner expressions to outer, caching inner ones prevents recomputation! Exactly like in dynamic programming.
โช โPushing the sigmasโ insideโ may help us compute the total function in stages.
Factor Marginalization
โช ๐ท๐๐๐๐๐๐ก๐๐๐: ๐ ๐ ๐๐ ๐ ๐๐๐๐ก๐๐ ๐๐๐ ๐ โ ๐๐๐ ๐ โ โWโ๐๐๐ ๐ ๐๐ ๐ ๐ ๐๐ก ๐๐ ๐๐ฃโฒ๐ , ๐๐๐๐๐ก๐ ๐๐๐๐๐ ๐ โ ๐
โช ๐ท๐๐๐๐๐๐ก๐๐๐: Let ๐ be a set of rvโs, ๐ โ ๐ is a rv, and a factor ๐(๐, ๐).
Then โThe factor marginalization of Y in ๐โ or ๐๐ is:
๐๐ โ ๐ ๐ = ๐โฒโ๐๐๐(๐)๐(๐, ๐โฒ)
โช Trivial properties:
โ ๐ ๐๐ = ๐ ๐๐
โ ๐ โ ๐๐๐๐๐ ๐1 โ ๐๐1๐2 = ๐1 ๐๐2
Algorithm Result
โช Given set of factors ฮฆ and variables โto eliminateโ, ๐, the algorithm returns the factor (function):
๐
๐โฮฆ
๐
3SAT Run Example On Board
Complexity Analysis
โช Assume we have ๐ random variables and ๐ factors. We run the algorithm for total variable elimination.
โช e.g: in Bayesian model with ๐ nodes the number of factors is ๐ = ๐,Markov can have ๐ โฅ ๐.
โช ๐ + ๐ factors at most were in the set of factors.
โช Denote the number of entries in ๐๐ as ๐๐.
โช Each factor ๐ is multiplied at most ๐๐ times when generating its corresponding ๐๐, and then replaced with new factor.
โช Thus, max multiplication number: ๐( ๐ +๐ maxi(๐๐))
Complexity Analysis
โช Additions Complexity:
โช each entry in ๐๐ is added once to generate a new factor ๐๐, there are at most ๐ factors ๐๐ generated in the internal algorithm.
โช Thus, addition complexity: ๐(๐maxi(๐๐))
โช Total Complexity: ๐((๐ + ๐)maxi(๐๐))
Complexity Analysis
โช If each factor ๐๐ has ๐พ๐ variables, and we have no more than ๐ฃ values, ๐๐ โค ๐ฃ
๐๐
โช And when each variable has exactly ๐ฃ possible values, then ๐๐ = ๐ฃ
๐๐ .
Thus, we depend on all factors generated, exponentially in their variables number.
Variable Elimination For Conditional Proability
Algorithm Result
โช Given a set of factors ฮฆ and query variables ๐, E=e as evidence, the algorithm returns:
โช ๐โ ๐ = ๐ ๐โฮฆ๐[๐ธ = ๐]
โช ๐ผ = ๐,๐ ๐โฮฆ๐[๐ธ = ๐]
Elimination as Graph Transformation
โช ๐ท๐๐๐๐๐๐ก๐๐๐: let ฮฆ be a set of factors, then define:
๐๐๐๐๐ ฮฆ โ
๐โฮฆ
๐๐๐๐๐ ๐
โช ๐ท๐๐๐๐๐๐ก๐๐๐: Let ฮฆ be a set of factors, ๐ปฮฆ is the undirected graph over the nodes ๐๐๐๐๐(ฮฆ)where ๐๐ , Xj have an edge iff there is ๐ such that ๐๐ , ๐๐ โ ๐๐๐๐๐ ๐ .
Elimination as Graph Transformation
โช For given set of factors in the algorithm ฮฆ, construct ๐ปฮฆ.
โช Eliminate variables (nodes), remove edges from removed node, add fill edges.
The Induced Graph
๐ท๐๐๐๐๐๐ก๐๐๐: Let ๐ท be a set of factors over ๐= {๐1, โฆ๐๐} and โบ be elimination ordering over some ๐โฒ โ ๐ .
โช The induced graph ๐ฟ๐ท,โบ nodes are ๐.
โช ๐๐ and ๐๐ are connected by an edge if they both appear in some intermediate factor generated by the VE algorithm using โบ as an elimination ordering.
The Induced Graph And Maximal Cliques
๐โ๐๐๐๐๐: Let ๐ฟฮฆ,โบ be the induced graph for a set of factorsฮฆ and some elimination ordering โบ. Then:
1. The scope of every factor generated during the variable elimination process is a clique in ๐ฟฮฆ,โบ.
2. Every maximal clique in ๐ฟฮฆ,โบ is the scope of some intermediate factor in the computation.
The Induced Graph And Maximal Cliques
โช ๐๐๐๐๐:
1. Assume factor ๐(๐1, . . , ๐๐) generated by our algorithm, by the construction of the induced graph, each ๐๐ , ๐๐ will have edge between them in the induced graph.
2. For a Maximal clique: Y = {๐1, . . , ๐๐}, assume wlogthat ๐1 was the first eliminated by the algorithm:
โ ๐ is a clique: so ๐1 has edges to each other ๐๐ (its now eliminated), therefore its removal adding all edges between the others. Therefore generating a factor between all of them.
โ The factor generated contains only them, otherwise by (1) there is a larger clique contains ๐ in contradiction to ๐โฒsmaximallity.
Conclusions
Elimination Ordering
the maximal factors
(generated by our algorithm )
Maximal Cliques in
the induced graph
Maximal Cliques in
the induced graph
Switching Lecturers
Elimination Ordering
โช ๐ท๐๐๐๐๐๐ก๐๐๐:width of the induced graph is the size of the largest clique minus one.
โช ๐ท๐๐๐๐๐๐ก๐๐๐: induced width by ordering ๐ is the width of the induced graph after applying the VE algorithm with ๐ as ordering.
โช ๐๐ โ ๐ถ๐๐๐๐๐๐ก๐ ๐๐๐๐๐๐๐:Given a graph H and some bound K, determine whether there exists an elimination ordering achieving an induced width K
Chordal Graphs
โช ๐ท๐๐๐๐๐๐ก๐๐๐: a Chordal graph is an undirected graph such that any minimal loop is of size 3 .
โช ๐โ๐๐๐๐๐: Every induced graph is chordal.
โช ๐๐๐๐๐: assume ๐1 โ ๐2 โโฏ๐๐ โ ๐1 is a cycle where k>3, then:โ wlog assume ๐1 was first eliminated.
โ Then, because no edge is added after ๐1 eliminated, both ๐1 โ ๐๐ and ๐1 โ ๐2 exists at this stage.
โ Therefore, we will add the edge: ๐2 โ ๐๐ , thus ๐1 โ ๐2โ Xk is a minimal loop in contradiction that k>3.
Chordal Graphs
โช ๐โ๐๐๐๐๐: Every Chordal graph ๐ป is the induced graph with some elimination ordering without adding new fill edges.
โช ๐๐๐๐๐: using induction on the number of nodes in the tree. Let H be a chordal graph with n nodes.โ From Chapter 4.12: โEvery Chordal Graph have an Clique
Treeโ.
โ Take a leaf in the clique tree, choose one node which is only in that clique (property about clique trees we wonโt prove), Eliminate it.
โ Its neighbors are only from the clique, therefore no new edges introduced.
โ Remaining graph is also chordal, with n-1 nodes, continue the same.
Elimination Ordering Heuristic 1
Elimination Ordering Heuristic 1
โช ๐โ๐๐๐๐๐: Let H be a chordal graph. Let ๐ be the ranking obtained by running Max-Cardinality on H. Then Sum-Product-VE eliminating variables in order of increasing ๐, does not introduce any fill edges.
โช Proof Intuition: Look at the Clique-Tree for that chordal graph, once weโve selected node from any clique, we will choose first the separators before choosing the nodes in the clique connected to it. This will ensure that in the ordering obtained, for each node, we only should have add edges from it to its neighbors.
Elimination Ordering Heuristic 1
โช ๐บ๐๐๐:make any H given graph model into a chordal graph with a minimum width.
โช Itโs also NP-Hard problem
Elimination Ordering Heuristic 2
โช Greedy algorithm with heuristic cost function:
Heuristic cost function examples
โข Min-neighbors: The cost of a vertex is the number of neighbors it has in the current graph.
โข Min-weight: The cost of a vertex is the product of weights โ domain cardinality โ of its neighbors.
โข Min-fill: The cost of a vertex is the number of edges that need to be added to the graph due to its elimination.
โข Weighted-min-fill: The cost of a vertex is the sum of weights of the edges that need to be added to the graph due to its elimination, where a weight of an edge is the product of weights of its constituent vertices.
Heuristics testing results
โOverall, Min-Fill and Weighted-Min-Fill achieve the best performance, but they are not universally better. The best answer obtained by the greedy algorithms is generally very good; it is often significantly better than the answer obtained by a deterministic state-of-the-art scheme, and it is usually quite close to the best-known ordering, even when the latter is obtained using much more expensive techniques. Because the computational cost of the heuristic ordering-selection algorithms is usually negligible relative to the running time of the inference itself, we conclude that for large networks it is worthwhile to run several heuristic algorithms in order to find the best ordering obtained by any of them.
Remainder: Clique Tree Definition
Remainder: Clique Tree Definition