Learning Probabilistic Graphical Models: BN Structure Learning
Daphne Koller
Structure Learning
Probabilistic Graphical Models: BN Structure Learning
Why Structure Learning
• To learn a model for new queries, when domain expertise is not perfect
• For structure discovery, when inferring the network structure is a goal in itself
Importance of Accurate Structure
• Missing an arc:
  – Incorrect independencies; the correct distribution P* cannot be learned
  – But could generalize better
• Adding an arc:
  – Spurious dependencies; can still correctly learn P*
  – Increases the number of parameters
  – Worse generalization
[Figure: a network over A, B, C, D shown correctly, with a missing arc, and with an added arc]
Score-Based Learning
Define a scoring function that evaluates how well a structure matches the data, then search for the structure that maximizes the score.
[Figure: a table of samples over A, B, C and several candidate network structures over A, B, C]
Likelihood Structure Score
Probabilistic Graphical Models: BN Structure Learning
Likelihood Score
• Find (G, θ) that maximize the likelihood
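The formula on this slide did not survive the transcript; a standard statement of the likelihood score, in the course's notation, where θ̂_G denotes the MLE parameters for structure G:

```latex
\operatorname{score}_L(G : \mathcal{D})
  \;=\; \ell(\hat{\theta}_G : \mathcal{D})
  \;=\; \log P(\mathcal{D} \mid \hat{\theta}_G, G)
```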
Example
[Figure: comparing the network X → Y against the empty network over X and Y]
General Decomposition
• The likelihood score decomposes as:
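The decomposition itself is missing from the transcript; the standard form, where P̂ is the empirical distribution and M is the number of samples:

```latex
\operatorname{score}_L(G : \mathcal{D})
  \;=\; M \sum_i I_{\hat{P}}\!\left(X_i ;\, \mathrm{Pa}_{X_i}^{G}\right)
  \;-\; M \sum_i H_{\hat{P}}(X_i)
```

The entropy term does not depend on G, so comparing structures amounts to comparing the total mutual information between each variable and its parents.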
Limitations of Likelihood Score
• Mutual information is always ≥ 0
  – Equals 0 iff X, Y are independent in the empirical distribution
• Adding edges can't hurt, and almost always helps
• The score is maximized by the fully connected network
[Figure: the network X → Y versus the empty network over X and Y]
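A quick way to see the "almost always helps" point: even when two variables are truly independent, their empirical mutual information in a finite sample is almost never exactly 0, so the likelihood score still rewards the extra edge. A minimal sketch (the helper name `empirical_mi` is mine):

```python
import math
import random
from collections import Counter

def empirical_mi(xs, ys):
    """Mutual information of two discrete samples under the empirical distribution."""
    m = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum((c / m) * math.log((c / m) / ((px[x] / m) * (py[y] / m)))
               for (x, y), c in pxy.items())

random.seed(0)
# Two genuinely independent fair coins:
xs = [random.randint(0, 1) for _ in range(1000)]
ys = [random.randint(0, 1) for _ in range(1000)]
mi = empirical_mi(xs, ys)
print(mi)  # small, but (almost surely) not exactly zero
```

Empirical MI is a KL divergence, so it is always ≥ 0; it only vanishes if the sample counts factorize exactly.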
Avoiding Overfitting
• Restrict the hypothesis space
  – e.g., restrict the number of parents or the number of parameters
• Use scores that penalize complexity:
  – Explicitly
  – The Bayesian score averages over all possible parameter values
Summary
• The likelihood score computes the log-likelihood of D relative to G, using MLE parameters
  – Parameters optimized for D
• Nice information-theoretic interpretation in terms of (in)dependencies in G
• Guaranteed to overfit the training data (if we don't impose constraints)
BIC Score and Asymptotic Consistency
Probabilistic Graphical Models: BN Structure Learning
Penalizing Complexity
• Tradeoff between fit to data and model complexity
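The score itself is missing from the transcript; the standard BIC score, with M samples and Dim[G] the number of independent parameters in G:

```latex
\operatorname{score}_{BIC}(G : \mathcal{D})
  \;=\; \ell(\hat{\theta}_G : \mathcal{D})
  \;-\; \frac{\log M}{2}\,\mathrm{Dim}[G]
```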
Asymptotic Behavior
• The mutual-information (fit) term grows linearly with M, while the complexity penalty grows only logarithmically with M
  – As M grows, more emphasis is given to the fit to the data
Consistency
• As M → ∞, the true structure G* (or any I-equivalent structure) maximizes the score
  – Asymptotically, spurious edges will not contribute to the likelihood and will be penalized
  – Required edges will be added, due to the linear growth of the likelihood term compared to the logarithmic growth of the model-complexity penalty
Summary
• The BIC score explicitly penalizes model complexity (the number of independent parameters)
  – Its negation is often called MDL
• BIC is asymptotically consistent:
  – If the data are generated by G*, networks I-equivalent to G* will have the highest score as M grows to ∞
Bayesian Score
Probabilistic Graphical Models: BN Structure Learning
Bayesian Score
P(G | D) = P(D | G) · P(G) / P(D)
• P(D | G): marginal likelihood
• P(G): prior over structures
• P(D): marginal probability of the data
Marginal Likelihood of Data Given G
P(D | G) = ∫ P(D | G, θ_G) · P(θ_G | G) dθ_G
• P(D | G, θ_G): likelihood
• P(θ_G | G): prior over parameters
Marginal Likelihood Intuition
Marginal Likelihood: Bayesian Networks
The Gamma function, used in the closed-form marginal likelihood:
Γ(x) = ∫0^∞ t^(x−1) e^(−t) dt
Γ(x+1) = x · Γ(x)
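The slide's main formula did not survive extraction; for multinomial CPDs with Dirichlet priors, the standard closed form (with M[·] the sufficient-statistic counts and α_{u_i} = Σ_{x_i} α_{x_i, u_i}) is:

```latex
P(\mathcal{D} \mid G)
 = \prod_i \prod_{u_i \in \mathrm{Val}(\mathrm{Pa}_{X_i}^{G})}
   \frac{\Gamma(\alpha_{u_i})}{\Gamma(\alpha_{u_i} + M[u_i])}
   \prod_{x_i \in \mathrm{Val}(X_i)}
   \frac{\Gamma(\alpha_{x_i, u_i} + M[x_i, u_i])}{\Gamma(\alpha_{x_i, u_i})}
```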
Marginal Likelihood Decomposition
Structure Priors
• Structure prior P(G):
  – Uniform prior: P(G) ∝ constant
  – Prior penalizing the number of edges: P(G) ∝ c^|G| (0 < c < 1)
  – Prior penalizing the number of parameters
• The normalizing constant is similar across networks and can thus be ignored
Parameter Priors
• The parameter prior P(θ | G) is usually the BDe prior:
  – α: equivalent sample size
  – B0: a network representing the prior probability of events
  – Set α(x_i, pa_i^G) = α · P(x_i, pa_i^G | B0)
• Note: pa_i^G are not necessarily the same as the parents of X_i in B0
• A single network provides priors for all candidate networks
• BDe is the unique prior with the property that I-equivalent networks have the same Bayesian score
BDe and BIC
• As M → ∞, a network G with Dirichlet priors satisfies:
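The asymptotic expansion itself is missing from the transcript; the standard statement, which matches the BIC score up to a constant:

```latex
\log P(\mathcal{D} \mid G)
  \;=\; \ell(\hat{\theta}_G : \mathcal{D})
  \;-\; \frac{\log M}{2}\,\mathrm{Dim}[G] \;+\; O(1)
```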
Summary
• The Bayesian score averages over parameters to avoid overfitting
• Most often instantiated as BDe
  – BDe requires assessing a prior network
  – Can naturally incorporate prior knowledge
  – I-equivalent networks have the same score
• The Bayesian score is:
  – Asymptotically equivalent to BIC
  – Asymptotically consistent
  – But for small M, BIC tends to underfit
Structure Learning in Trees
Probabilistic Graphical Models: BN Structure Learning
Score-Based Learning (recap)
Define a scoring function that evaluates how well a structure matches the data, then search for the structure that maximizes the score.
Optimization Problem
Input:
– Training data
– Scoring function (including priors, if needed)
– Set of possible structures
Output: a network that maximizes the score
Key property: decomposability
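Decomposability, in the standard notation, means the score is a sum of per-family terms, so a local change to the graph only affects a few terms:

```latex
\operatorname{score}(G : \mathcal{D})
  \;=\; \sum_i \operatorname{score}\!\left(X_i \mid \mathrm{Pa}_{X_i}^{G} : \mathcal{D}\right)
```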
Learning Trees/Forests
• Forests: at most one parent per variable
• Why trees?
  – Elegant math
  – Efficient optimization
  – Sparse parameterization
Learning Forests
• p(i) = parent of X_i, or 0 if X_i has no parent
• Score = sum of edge scores + a constant, where the constant is the score of the "empty" network and the edge scores measure the improvement over the "empty" network
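The formula behind these labels is garbled in the transcript; written out in the standard form, the first sum is the improvement over the "empty" network and the second is the score of the "empty" network:

```latex
\operatorname{score}(G : \mathcal{D})
  \;=\; \sum_{i\,:\,p(i) > 0}
        \bigl[\operatorname{score}(X_i \mid X_{p(i)}) - \operatorname{score}(X_i)\bigr]
  \;+\; \sum_i \operatorname{score}(X_i)
```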
Learning Forests I
• Set w(i→j) = score(X_j | X_i) − score(X_j)
• For the likelihood score, w(i→j) = M · I_P̂(X_i; X_j), and all edge weights are nonnegative
  – The optimal structure is always a tree
• For BIC or BDe, weights can be negative
  – The optimal structure might be a forest
Learning Forests II
• A score satisfies score equivalence if I-equivalent structures have the same score
  – Such scores include likelihood, BIC, and BDe
• For such a score, we can show w(i→j) = w(j→i), and use an undirected graph
Learning Forests III (for score-equivalent scores)
• Define an undirected graph with nodes {1, …, n}
• Set w(i, j) = max[score(X_j | X_i) − score(X_j), 0]
• Find the forest with maximal weight
  – Standard algorithms for max-weight spanning trees (e.g., Prim's or Kruskal's) run in O(n²) time
  – Remove all edges of weight 0 to produce a forest
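The steps above can be sketched directly. A minimal Chow-Liu-style implementation for the likelihood score, where the edge weight is M · I(Xi; Xj) under the empirical distribution (the function names are mine; the spanning tree is built with Kruskal's algorithm over a union-find structure):

```python
import math
from collections import Counter
from itertools import combinations

def empirical_mi(data, i, j):
    """I(Xi; Xj) under the empirical distribution of the samples in `data`."""
    M = len(data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    pij = Counter((row[i], row[j]) for row in data)
    return sum((c / M) * math.log((c / M) / ((pi[x] / M) * (pj[y] / M)))
               for (x, y), c in pij.items())

def chow_liu_tree(data, n):
    """Max-weight spanning tree over variables 0..n-1 with weights M * I(Xi; Xj).

    For the likelihood score all weights are >= 0, so the result is a full
    tree; for BIC/BDe one would clamp the score differences at 0 and drop
    zero-weight edges, yielding a forest.
    """
    M = len(data)
    edges = sorted(((M * empirical_mi(data, i, j), i, j)
                    for i, j in combinations(range(n), 2)), reverse=True)
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    tree = []
    for w, i, j in edges:           # Kruskal: take heaviest edges that join components
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

For example, on samples where X0 and X1 are perfectly correlated and X2 is unrelated, the learned tree contains the edge (0, 1).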
Learning Forests: Example
[Figure: tree learned from data of the ICU-Alarm network, with correct edges and spurious edges marked]
• Not every edge in the tree is in the original network
• Inferred edges are undirected – we can't determine their direction
Summary
• Structure learning is an optimization over the combinatorial space of graph structures
• Decomposability: the network score is a sum of terms for the different families
• The optimal tree-structured network can be found using standard MST algorithms
• The computation takes quadratic time
General Graphs: Search
Probabilistic Graphical Models: BN Structure Learning
Optimization Problem
Input:
– Training data
– Scoring function
– Set of possible structures
Output: a network that maximizes the score
Beyond Trees
• The problem is far less tractable for general networks
  – Example: even when each variable is allowed only two parents, the greedy algorithm is no longer guaranteed to find the optimal network
• Theorem: finding the maximal-scoring network structure with at most k parents per variable is NP-hard for k > 1
Heuristic Search
[Figure: a network over A, B, C, D and the neighboring networks produced by local moves]
Heuristic Search
• Search operators:
  – Local steps: edge addition, deletion, reversal
  – Global steps
• Search techniques:
  – Greedy hill climbing
  – Best-first search
  – Simulated annealing
  – …
Search: Greedy Hill Climbing
• Start with a given network:
  – the empty network
  – the best tree
  – a random network
  – prior knowledge
• At each iteration:
  – Consider the score for all possible changes
  – Apply the change that most improves the score
• Stop when no modification improves the score
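The loop above can be sketched compactly. A minimal version under stated simplifications: only edge additions and deletions (no reversal), a cap on the number of parents, and a caller-supplied `family_score(i, parents)` so that, by decomposability, each move is evaluated with a single delta-score. All names here are mine, not the course's:

```python
from itertools import permutations

def is_acyclic(parents, n):
    """DFS-based cycle check; parents[i] is the set of parents of variable i."""
    state = [0] * n  # 0 = unvisited, 1 = on stack, 2 = done
    def visit(v):
        if state[v] == 1:
            return False
        if state[v] == 2:
            return True
        state[v] = 1
        ok = all(visit(p) for p in parents[v])
        state[v] = 2
        return ok
    return all(visit(v) for v in range(n))

def greedy_hill_climb(n, family_score, max_parents=2):
    """Greedy hill climbing over DAGs with single-edge additions and deletions."""
    parents = [set() for _ in range(n)]
    while True:
        best_delta, best_move = 0.0, None
        for i, j in permutations(range(n), 2):   # i = child, j = parent
            if j in parents[i]:                  # candidate: delete edge j -> i
                new = parents[i] - {j}
            elif len(parents[i]) < max_parents:  # candidate: add edge j -> i
                new = parents[i] | {j}
                trial = [set(p) for p in parents]
                trial[i] = new
                if not is_acyclic(trial, n):     # additions may create cycles
                    continue
            else:
                continue
            # Decomposability: only family i's score changes under this move.
            delta = family_score(i, frozenset(new)) - family_score(i, frozenset(parents[i]))
            if delta > best_delta + 1e-12:
                best_delta, best_move = delta, (i, new)
        if best_move is None:                    # no move improves the score
            return parents
        i, new = best_move
        parents[i] = new
```

With a toy score table that rewards X1 having X0 as a parent, the search adds exactly that edge and then stops, since the reverse edge would create a cycle.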
Greedy Hill Climbing Pitfalls
• Greedy hill climbing can get stuck in:
  – Local maxima
  – Plateaux
• Plateaux typically arise because equivalent networks are often neighbors in the search space
Why Edge Reversal
[Figure: two networks over A, B, C that differ in the direction of one edge, illustrating why reversal is a useful single search step]
A Pretty Good, Simple Algorithm
Greedy hill climbing, augmented with:
• Random restarts:
  – When we get stuck, take some number of random steps and then start climbing again
• Tabu list:
  – Keep a list of the K most recently taken steps
  – The search cannot reverse any of these steps
Example: ICU-Alarm
[Figure: KL divergence from the true distribution as a function of the number of samples M (0–5000), comparing "True Structure / BDe α = 10" against "Unknown Structure / BDe α = 10"]
JamBayes
Horvitz, Apacible, Sarin, & Liao, UAI 2005
Predicting Surprises
Horvitz, Apacible, Sarin, & Liao, UAI 2005
Learned Model
Horvitz, Apacible, Sarin, & Liao, UAI 2005
Influences in Learned Model
Horvitz, Apacible, Sarin, & Liao, UAI 2005
Biological Network Reconstruction
[Figure: learned signaling network over phospho-proteins (PKC, PKA, Raf, Mek, Erk, Akt, Jnk, P38, Plcγ) and phospho-lipids (PIP2, PIP3), with the nodes perturbed in the data marked]
• Edges known: 15/17; supported: 2/17; reversed: 1; missed: 3
• Subsequently validated in the wet lab
From "Causal protein-signaling networks derived from multiparameter single-cell data", Sachs et al., Science 308:523, 2005. Reprinted with permission from AAAS. This figure may be used for non-commercial and classroom purposes only; any other uses require prior written permission from AAAS.
Summary
• Structure learning is useful for building better predictive models:
  – when domain experts don't know the structure
  – for knowledge discovery
• Finding the highest-scoring structure is NP-hard
• Typically solved using simple heuristic search:
  – local steps: edge addition, deletion, reversal
  – hill climbing with tabu lists and random restarts
• But there are better algorithms
General Graphs: Decomposability
Probabilistic Graphical Models: BN Structure Learning
Heuristic Search
[Figure: a network over A, B, C, D and the neighboring networks produced by local moves]
Naïve Computational Analysis
• Operators per search step: O(n²)
• Cost per network evaluation:
  – O(n) components in the score
  – Computing the sufficient statistics: O(M) per component
  – Acyclicity check: O(m), for m edges
• Total: O(n²(Mn + m)) per search step
Exploiting Decomposability
[Figure: a network over A, B, C, D before and after adding the edge B → D]
Before: score = score(A | {}) + score(B | {}) + score(C | {A,B}) + score(D | {C})
After: score = score(A | {}) + score(B | {}) + score(C | {A,B}) + score(D | {B,C})
Only the term for D changes: Δscore(D) = score(D | {B,C}) − score(D | {C})
Exploiting Decomposability
[Figure: the same network under three local moves]
• Adding B → D: Δscore(D) = score(D | {B,C}) − score(D | {C})
• Deleting B → C: Δscore(C) = score(C | {A}) − score(C | {A,B})
• Reversing B → C: Δscore(C) + Δscore(B) = score(C | {A}) − score(C | {A,B}) + score(B | {C}) − score(B | {})
Exploiting Decomposability
[Figure: the same local moves]
To recompute scores, we only need to re-score the families that changed in the last move, e.g. Δscore(C) = score(C | {A}) − score(C | {A,B})
Computational Cost
• Cost per move:
  – Compute the O(n) delta-scores "damaged" by the move
  – Each one takes O(M) time
• Keep a priority queue of operators sorted by delta-score
  – O(n log n)
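One way to realize the priority-queue idea (a sketch, not the course's implementation): keep a max-heap of (delta-score, operator) entries, push a fresh entry whenever a move's delta-score is recomputed, and discard stale entries lazily on pop. This is a standard `heapq` pattern, since `heapq` has no decrease-key operation:

```python
import heapq

class MoveQueue:
    """Priority queue of search operators keyed by delta-score (largest first).

    Updating a move pushes a new heap entry; outdated entries are skipped
    lazily on pop by checking them against the latest recorded delta.
    """
    def __init__(self):
        self.heap = []      # entries: (-delta, move); heapq is a min-heap
        self.current = {}   # move -> latest delta-score

    def update(self, move, delta):
        self.current[move] = delta
        heapq.heappush(self.heap, (-delta, move))

    def pop_best(self):
        while self.heap:
            neg, move = heapq.heappop(self.heap)
            if self.current.get(move) == -neg:  # entry is still current
                del self.current[move]
                return move, -neg
        return None                             # queue exhausted
```

After a move is applied, only the damaged operators are re-scored and re-pushed; everything else in the queue remains valid.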
More Computational Efficiency
• Reuse and adapt previously computed sufficient statistics
• Restrict in advance the set of operators considered in the search
Summary
• Even heuristic structure search can get expensive for large n
• We can exploit decomposability to get orders-of-magnitude reductions in cost
• Other tricks are also used for scaling