Learning Bayes Nets Based on Conditional Dependencies
Oliver Schulte, Department of Philosophy and School of Computing Science, Simon Fraser University, Vancouver. [email protected]
with Wei Luo (Simon Fraser) and Russ Greiner (U of Alberta)
Outline
• Brief Intro to Bayes Nets
• Combining Dependency Information with Model Selection
• Learning from Dependency Data Only: Learning-Theoretic Analysis
Bayes Nets: Overview
• Bayes net structure = directed acyclic graph (DAG).
• Nodes = variables of interest.
• Arcs = direct "influence" or "association".
• Parameters = CP tables = probability of child given parents.
• Structure represents (in)dependencies.
• Structure + parameters represent a joint probability distribution over the variables.
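As a concrete illustration (not from the slides), a tiny Bayes net can be coded directly as a parent map plus CP tables; the node names and probabilities below are invented for the example:

```python
# Minimal sketch: a Bayes net as a DAG plus CP tables.
# The joint probability factorizes as the product of P(child | parents).

# Structure: Cloudy -> Rain, Cloudy -> Sprinkler (parents listed per node).
parents = {"Cloudy": [], "Rain": ["Cloudy"], "Sprinkler": ["Cloudy"]}

# CP tables: map (value, tuple-of-parent-values) -> probability.
cpt = {
    "Cloudy":    {(True, ()): 0.5, (False, ()): 0.5},
    "Rain":      {(True, (True,)): 0.8, (False, (True,)): 0.2,
                  (True, (False,)): 0.1, (False, (False,)): 0.9},
    "Sprinkler": {(True, (True,)): 0.1, (False, (True,)): 0.9,
                  (True, (False,)): 0.5, (False, (False,)): 0.5},
}

def joint(assignment):
    """P(assignment) = product over nodes of P(node | its parents)."""
    p = 1.0
    for node, pa in parents.items():
        pa_vals = tuple(assignment[q] for q in pa)
        p *= cpt[node][(assignment[node], pa_vals)]
    return p

# 0.5 (Cloudy) * 0.8 (Rain | Cloudy) * 0.9 (no Sprinkler | Cloudy) = 0.36
print(joint({"Cloudy": True, "Rain": True, "Sprinkler": False}))
```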
Examples from CIspace (UBC)
Graphs entail Dependencies
[Figure: three DAGs over nodes A, B, C, each shown with the conditional dependencies it entails, e.g. Dep(A,B), Dep(A,B|C) for one graph, and Dep(A,B), Dep(A,B|C), Dep(B,C), Dep(B,C|A), Dep(A,C|B) for another.]
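Such entailments can be checked numerically. The sketch below (hypothetical parameters, not from the slides) builds the joint distribution for the chain A → B → C and verifies that it exhibits Dep(A,C) but Indep(A,C|B):

```python
# Chain A -> B -> C: the graph entails Dep(A,C) but Indep(A,C | B).
from itertools import product

pA = {0: 0.3, 1: 0.7}
pB_given_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # pB_given_A[a][b]
pC_given_B = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}

def joint(a, b, c):
    return pA[a] * pB_given_A[a][b] * pC_given_B[b][c]

def marg(**fixed):
    """Sum the joint over all variables not fixed by keyword."""
    total = 0.0
    for a, b, c in product((0, 1), repeat=3):
        v = {"a": a, "b": b, "c": c}
        if all(v[k] == x for k, x in fixed.items()):
            total += joint(a, b, c)
    return total

# Marginal dependence: P(A=1, C=1) != P(A=1) * P(C=1).
assert abs(marg(a=1, c=1) - marg(a=1) * marg(c=1)) > 1e-6

# Conditional independence given B: P(A,C | B=b) = P(A | B=b) * P(C | B=b).
for b in (0, 1):
    lhs = marg(a=1, b=b, c=1) / marg(b=b)
    rhs = (marg(a=1, b=b) / marg(b=b)) * (marg(b=b, c=1) / marg(b=b))
    assert abs(lhs - rhs) < 1e-9
```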
I-maps and Probability Distributions
• Defn. Graph G is an I-map of probability distribution P ⟺ if Dependent(X,Y|S) in P, then X is d-connected to Y given S in G.
• Example: if Dependent(Father Eye Color, Mother Eye Color | Child Eye Color) in P, then Father EC is d-connected to Mother EC given Child EC in G.
• Informally, G is an I-map of P ⟺ G entails all conditional dependencies in P.
• Theorem. Fix G, P. There is a parameter setting θ for G such that (G, θ) represents P ⟺ G is an I-map of P.
Two Approaches to Learning Bayes Net Structure
• "Search and score": select graph G as a "model" with parameters to be estimated.
• "Test and cover": find G that represents the dependencies in P.
Aim: find G that represents P with suitable parameters.
Our Hybrid Approach
Sample → Set of Dependencies → Final Output Graph
The final selected graph maximizes a model selection score and covers all observed dependencies.
Definition of Hybrid Criterion
• Let d be a sample and let S(G,d) be a score function.
• Let Dep be a set of conditional dependencies extracted from sample d.
• Graph G optimizes score S given Dep and sample d ⟺
1. G entails the dependencies Dep, and
2. if any other graph G' entails Dep, then S(G,d) ≥ S(G',d).
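A minimal sketch of the criterion, under strong simplifications not in the talk: only marginal dependencies are used, "entails Dep(X,Y)" is approximated by graph connectivity, and the scores are illustrative numbers rather than real values of S:

```python
# Hybrid criterion sketch: among graphs entailing the observed dependencies,
# pick the one with the highest score. Graphs are edge sets; "entails
# Dep(X,Y)" is simplified to "X and Y are connected"; scores are made up.

def connected(edges, x, y):
    """Is there an undirected path from x to y over the edge set?"""
    frontier, seen = {x}, {x}
    while frontier:
        nxt = set()
        for u, v in edges:
            for a, b in ((u, v), (v, u)):
                if a in frontier and b not in seen:
                    nxt.add(b)
                    seen.add(b)
        frontier = nxt
    return y in seen

def best_covering_graph(candidates, deps):
    """Among graphs entailing all deps, return the one with the top score."""
    covering = [(g, s) for g, s in candidates
                if all(connected(g, x, y) for x, y in deps)]
    return max(covering, key=lambda gs: gs[1])[0] if covering else None

candidates = [
    (frozenset(), 10.0),                         # empty graph: best raw score
    (frozenset({("A", "B")}), 8.0),
    (frozenset({("A", "B"), ("B", "C")}), 7.5),
]
# The empty graph scores highest, but only graphs covering Dep(A,B) qualify,
# so the single-edge graph A-B wins.
print(best_covering_graph(candidates, [("A", "B")]))
```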
Local Search Heuristics for Constrained Search
• There is a general method for adapting any local search heuristic to accommodate observed dependencies.
• We present an adaptation of GES search, called IGES.
GES Search (Meek, Chickering)
[Figure: GES on a three-node graph A, B, C. Growth phase (add edges): scores 5, 7, 8.5. Shrink phase (delete edges): scores 9, 8.]
IGES Search
Step 1: Extract dependencies from the sample (testing procedure → dependencies).
Step 2: Run GES with two modifications:
1. Continue the growth phase until all dependencies are covered.
2. During the shrink phase, delete an edge only if the dependencies are still covered.
[Figure: two candidate graphs over A, B, C with scores 7 and 5, given Dep(A,B).]
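The two modifications can be sketched as follows. This is a toy stand-in, not the real algorithm: the score function mimics a penalized score with invented edge gains, and covering a dependency is simplified to adjacency:

```python
# Toy IGES sketch: growth keeps adding the best edge until the score stops
# improving AND all observed dependencies are covered; shrink deletes an edge
# only when it improves the score and the dependencies remain covered.

GAIN = {frozenset(p): g for p, g in [
    (("A", "B"), 0.4), (("B", "C"), 1.5), (("A", "C"), 1.2)]}
PENALTY = 1.0  # per-edge complexity penalty (illustrative)

def score(edges):
    return sum(GAIN[e] for e in edges) - PENALTY * len(edges)

def covers(edges, deps):
    # Simplification: Dep(X,Y) is covered iff X-Y is an edge.
    return all(frozenset(d) in edges for d in deps)

def iges(deps):
    edges = set()
    all_edges = set(GAIN)
    # Growth phase: best single-edge addition, forced on until deps covered.
    while True:
        adds = sorted(all_edges - edges,
                      key=lambda e: score(edges | {e}), reverse=True)
        if adds and (score(edges | {adds[0]}) > score(edges)
                     or not covers(edges, deps)):
            edges.add(adds[0])
        else:
            break
    # Shrink phase: delete score-improving edges while deps stay covered.
    changed = True
    while changed:
        changed = False
        for e in list(edges):
            if score(edges - {e}) > score(edges) and covers(edges - {e}, deps):
                edges.remove(e)
                changed = True
    return edges

# Plain GES (no dependency constraint) drops the weak A-B edge;
# IGES given Dep(A,B) keeps it.
print(iges([]))
print(iges([("A", "B")]))
```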
Asymptotic Equivalence of GES and IGES
Theorem. Assume that the score function S is consistent and that the joint probability distribution P satisfies the composition principle. Let Dep be a set of dependencies true of P. Then with P-probability 1, GES and IGES+Dep converge to the same output in the sample-size limit.
• So IGES inherits the convergence properties of GES.
Extracting Dependencies
• We use the χ² test (with a cell coverage condition).
• Exhaustive testing of all triples Indep(X,Y|S) for cardinality(S) < k, with k chosen by the user.
• A more sophisticated testing strategy is coming soon.
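A minimal sketch of the testing step. It covers only the marginal case Indep(X,Y) on a 2×2 table, and hard-codes the df = 1, α = 0.05 critical value 3.841 instead of computing the χ² CDF; the cell coverage condition and conditioning sets are omitted:

```python
# Pearson chi-squared test of independence on a 2x2 contingency table.
# A full version would loop over all pairs X,Y and conditioning sets S
# with |S| < k, running one such test per cell of S's value space.

def chi2_statistic(table):
    """Pearson chi-squared statistic for [[n00, n01], [n10, n11]]."""
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

CRITICAL_DF1 = 3.841  # chi-squared critical value, df = 1, alpha = 0.05

def dependent(table):
    """Reject independence iff the statistic exceeds the critical value."""
    return chi2_statistic(table) > CRITICAL_DF1

print(dependent([[50, 10], [12, 48]]))  # strong association -> True
print(dependent([[30, 30], [29, 31]]))  # near-uniform counts -> False
```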
Simulation Setup: Methods
• The hybrid approach is a general schema. Our setup:
• Statistical test: χ²
• Score S: BDeu (with Tetrad default settings)
• Search method: GES, adapted
Simulation Setup: Graphs and Data
• Random DAGs with binary variables.
• #Nodes: 4, 6, 8, 10.
• Sample sizes: 100, 200, 400, 800, 1600, 3200, 6400, 12800, 25600.
• 10 random samples per graph per sample size; results averaged.
• Graphs generated with Tetrad's random DAG utility.
Result Graphs
Conclusion for I-map learning: The Underfitting Zone
Although not explicitly designed to cover statistically significant correlations, GES+BDeu does so fairly well, but not perfectly; IGES helps add in the missing edges (on the order of 5 for 10-node graphs).
[Figure: divergence from the true graph vs. sample size, for standard search-and-score and for constrained search-and-score. Three regimes: small samples (little significance), medium samples (underfitting of correlations), large samples (convergence zone).]
Part II: Learning-Theoretic Model (COLT 2007)
• Learning model: the learner receives an increasing enumeration (list) of conditional dependency statements. Data repetition is possible.
• The learner outputs a graph (pattern); it may output ?.
• [Figure: example data stream Dep(A,B), Dep(B,C), Dep(A,C|B), ... with the learner's conjectures: graphs over A, B, C, or ?.]
Criteria for Optimal Learning
1. Convergence: the learner must eventually settle on the true graph.
2. The learner must minimize mind changes.
3. Given 1 and 2, the learner is not dominated in convergence time.
The Optimal Learning Procedure
Theorem. There is a unique optimal learner, defined as follows:
1. If there is a unique graph G covering the observed dependencies with a minimum number of adjacencies, output G.
2. Otherwise, output ?.
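A brute-force sketch of this learner on a 3-variable domain, with two simplifications not in the theorem: graphs are reduced to their adjacencies (skeletons), and "covers Dep(X,Y)" is approximated by connectivity:

```python
# Optimal-learner sketch: find the graphs covering the observed dependencies
# with the fewest adjacencies; output the graph if it is unique, else "?".
from itertools import combinations

NODES = ["A", "B", "C"]
ALL_EDGES = [frozenset(p) for p in combinations(NODES, 2)]

def connected(edges, x, y):
    frontier, seen = {x}, {x}
    while frontier:
        nxt = {n for e in edges for n in e
               if e & frontier and n not in seen}
        seen |= nxt
        frontier = nxt
    return y in seen

def optimal_learner(deps):
    for k in range(len(ALL_EDGES) + 1):  # try smallest edge counts first
        graphs = [set(edges) for edges in combinations(ALL_EDGES, k)
                  if all(connected(edges, x, y) for x, y in deps)]
        if graphs:  # k is the minimum number of adjacencies
            return graphs[0] if len(graphs) == 1 else "?"
    return "?"

print(optimal_learner([("A", "B")]))   # unique minimal cover: the edge A-B
# All three pairwise dependencies: any two edges suffice, so no unique
# minimal graph exists and the learner outputs "?".
print(optimal_learner([("A", "B"), ("B", "C"), ("A", "C")]))
```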
Computational Complexity of the Unique Optimal Learner
Theorem. The following problem is NP-hard:
1. Decide if there is a unique edge-minimal I-map for a set of dependencies D.
2. If yes, output the graph.
Proof: reduction from Unique Exact 3-Set Cover.
Instance: sets {x1,x2,x3}, {x3,x4,x5}, {x4,x5,x7}, {x2,x4,x5}, {x3,x6,x9}, {x6,x8,x9} over elements x1, ..., x9.
Unique exact cover: {x1,x2,x3}, {x4,x5,x7}, {x6,x8,x9}.
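The set-cover instance above is small enough to check by brute force; the sketch below confirms that exactly one choice of three sets partitions {x1, ..., x9}:

```python
# Brute-force check of the Unique Exact 3-Set Cover instance from the slide:
# a trio is an exact cover iff its union is the universe and the sizes sum
# to 9 (which forces the three sets to be disjoint).
from itertools import combinations

universe = frozenset(f"x{i}" for i in range(1, 10))
sets = [frozenset(s) for s in [
    {"x1", "x2", "x3"}, {"x3", "x4", "x5"}, {"x4", "x5", "x7"},
    {"x2", "x4", "x5"}, {"x3", "x6", "x9"}, {"x6", "x8", "x9"},
]]

covers = [trio for trio in combinations(sets, 3)
          if frozenset().union(*trio) == universe
          and sum(len(s) for s in trio) == len(universe)]

print(len(covers))  # 1: the exact cover is unique
print(sorted(sorted(s) for s in covers[0]))
```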
Hybrid Method and Optimal Learner
• Score-based methods tend to underfit (with discrete variables): they place edges correctly but too few ⟹ mind-change optimal but not convergence-time optimal.
• The hybrid method speeds up convergence.
A New Testing Strategy
• Say that a graph G satisfies the Markov condition w.r.t. sample d ⟺ for all X, Y: if Y is a non-parental non-descendant of X, then we do not find Dep(X, Y | parents(X)).
• Given sample d, look for a graph G that satisfies the MC w.r.t. d with a minimum number of adjacencies.
Future Work
• Use the Markov condition to develop a local search algorithm for score optimization requiring only (#Var)² tests.
• Apply the idea of Markov condition + edge minimization to continuous-variable models.
Summary: Hybrid Criterion - test, search and score.
• Basic idea: base Bayes net learning on dependencies that can be reliably obtained even on small to medium sample sizes.
• Hybrid criterion: find the graph that maximizes the model selection score under the constraint of entailing the statistically significant dependencies or correlations.
• Theory and simulation evidence suggest that this:
• speeds up convergence to the correct graph;
• addresses underfitting on small-to-medium samples.
Summary: Learning-Theoretic Analysis
• Learning model: learn the graph from dependencies alone.
• Optimal method: look for the graph that covers the observed dependencies with a minimum number of adjacencies.
• Implementing this method is NP-hard.
References
O. Schulte, W. Luo and R. Greiner (2007). "Mind Change Optimal Learning of Bayes Net Structure". Conference on Learning Theory (COLT).
THE END