Structure Estimation in Graphical Models

Steffen Lauritzen, University of Oxford
Wald Lecture, World Meeting on Probability and Statistics
Istanbul 2012

Outline:
- Introduction
- Score-based methods
- Bayesian analysis
- Constraint-based methods
- Summary and challenges
- Things I did not even get near
- References
Advances in computing have brought estimation of structure into focus:
- Model selection (e.g. subset selection in regression)
- System identification (engineering)
- Structural learning (AI or machine learning)
Graphical models describe conditional independence structures, so they are a good case for formal analysis.
Methods must scale well with data size, as many structures and huge collections of data are to be considered.
Why estimation of structure?
- Parallel to e.g. density estimation
- Obtain a quick overview of the relations between variables in complex systems
- Data mining
- Gene regulatory networks
- Reconstructing family trees from DNA information
- General interest in sparsity.
Markov mesh model
PC algorithm
Crudest algorithm (HUGIN), 10000 simulated cases
Bayesian GES
Crudest algorithm (WinMine), 10000 simulated cases
Tree model
PC algorithm, 10000 cases, correct reconstruction
Bayesian GES on tree
Chest clinic
PC algorithm
10000 simulated cases
Bayesian GES
SNPs and gene expressions
[Figure: min BIC forest over 401 variables]
Methods for structure identification in graphical models can be classified into three types:
- score-based methods: for example, optimizing a penalized likelihood by convex programming, e.g. glasso;
- Bayesian methods: identifying posterior distributions over graphs; the posterior probability can also be used as a score;
- constraint-based methods: querying conditional independences and identifying compatible independence structures, for example PC, PC*, NPC, IC, CI, FCI, SIN, QP, . . .
Estimating trees and forests
Assume P factorizes w.r.t. an unknown tree τ. Based on a sample x^{(n)} = (x^1, . . . , x^n) the likelihood function becomes

\[
\ell(\tau, p) = \log L(\tau) = \log p(x^{(n)} \mid \tau)
= \sum_{e \in E(\tau)} \log p(x^{(n)}_e) - \sum_{v \in V} \{\deg(v) - 1\} \log p(x^{(n)}_v).
\]

Maximizing this over p for a fixed tree τ yields the profile likelihood

\[
l(\tau) = l(\tau, \hat p) = \sum_{e \in E(\tau)} H_n(e) + \sum_{v \in V} H_n(v).
\]
Here H_n(e) is the empirical cross-entropy or mutual information between the endpoint variables of the edge e = {u, v}:

\[
H_n(e) = \sum \frac{n(x_u, x_v)}{n} \log \frac{n(x_u, x_v)/n}{n(x_u)\, n(x_v)/n^2},
\]

and similarly H_n(v) is the empirical entropy of X_v.

Based on this fact, Chow and Liu (1968) showed that the MLE \(\hat\tau\) of T has maximal weight, where the weight of τ is

\[
w(\tau) = \sum_{e \in E(\tau)} H_n(e) = l(\tau) - l(\phi_0).
\]

Here φ_0 is the graph with all vertices isolated.
Fast algorithms (Kruskal Jr., 1956) compute a maximal weight spanning tree from the weights W = (w_uv, u, v ∈ V).

Chow and Wagner (1978) show a.s. consistency in total variation of \(\hat P\): if P factorises w.r.t. τ, then

\[
\sup_x |\hat p(x) - p(x)| \to 0 \quad \text{for } n \to \infty,
\]

so if τ is unique for P, \(\hat\tau = \tau\) for all n > N for some N.

If P does not factorize w.r.t. a tree, \(\hat P\) converges to the closest tree approximation \(\tilde P\) to P (in Kullback-Leibler distance).

Note that if P is Markov w.r.t. an undirected graph G and we define weights on edges as w(e) = H(e) by cross-entropy, P is Markov w.r.t. any maximal weight spanning tree of G.
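The Chow-Liu procedure can be sketched directly: compute the pairwise mutual informations H_n(e) as edge weights, then run Kruskal's algorithm for the maximal weight spanning tree. A minimal illustration for discrete data, not the implementation of any particular package; function names are my own.

```python
import numpy as np
from itertools import combinations

def mutual_information(x, y):
    """Empirical mutual information H_n(e) between two discrete columns."""
    n = len(x)
    mi = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            n_ab = np.sum((x == a) & (y == b))
            if n_ab > 0:
                n_a, n_b = np.sum(x == a), np.sum(y == b)
                mi += (n_ab / n) * np.log(n_ab * n / (n_a * n_b))
    return mi

def chow_liu_tree(data):
    """Maximal weight spanning tree (Kruskal with union-find) on
    mutual-information weights; data is an (n, p) discrete array."""
    p = data.shape[1]
    edges = sorted(
        ((mutual_information(data[:, u], data[:, v]), u, v)
         for u, v in combinations(range(p), 2)),
        reverse=True)
    parent = list(range(p))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i
    tree = []
    for w, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:               # adding the edge keeps the graph acyclic
            parent[ru] = rv
            tree.append((u, v))
    return tree
```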
Forests

Note that if we consider forests instead of trees, i.e. allow missing branches, it is still true that

\[
l(\phi) = l(\phi, \hat p) = \sum_{e \in E(\phi)} H_n(e) + \sum_{v \in V} H_n(v).
\]

Thus if we add a penalty for each edge, e.g. proportional to the number q_e of additional parameters introduced by the edge e, we can find the maximum penalised forest by Kruskal's algorithm using weights

\[
w(\phi) = \sum_{e \in E(\phi)} \{H_n(e) - \lambda q_e\}.
\]

This has been exploited in the package gRapHD (Edwards et al., 2010); see also Panayidou (2011) and Højsgaard et al. (2012).
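The penalised variant only changes the weights: the Kruskal scan stops as soon as the penalised weight H_n(e) - λq_e becomes non-positive, which automatically yields a forest. A small sketch with illustrative weights supplied directly rather than computed from data; the names are my own.

```python
def max_penalized_forest(p, weights, lam=0.0):
    """Kruskal over penalized weights H_n(e) - lam * q_e. An edge enters
    the forest only if it joins two components AND its penalized weight
    is positive. weights: dict {(u, v): (H_n, q_e)} with u < v."""
    scored = sorted(((h - lam * q, u, v) for (u, v), (h, q) in weights.items()),
                    reverse=True)
    parent = list(range(p))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    forest = []
    for w, u, v in scored:
        if w <= 0:            # penalty outweighs the gain: stop adding edges
            break
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((u, v))
    return forest
```

With λ = 0 this reduces to the spanning-tree case; a larger λ prunes weak edges and leaves a forest.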
SNPs and gene expressions
[Figure: min BIC forest over 401 variables]
Gaussian Trees

If X = (X_v, v ∈ V) is regular multivariate Gaussian, it factorizes w.r.t. an undirected graph if and only if its concentration matrix K = Σ^{-1} satisfies

\[
k_{uv} = 0 \iff u \not\sim v.
\]

The results of Chow et al. are easily extended to Gaussian trees, with the weight of a tree determined as

\[
w(\tau) = \sum_{e \in E(\tau)} -\log(1 - r_e^2),
\]

with r_e^2 being the squared empirical correlation coefficient between X_u and X_v.
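In the Gaussian case the weights can be read off the empirical correlation matrix; a sketch using scipy's spanning-tree routine. Negating the weights turns the minimum spanning tree into a maximum one, and the constant offset keeps every edge present, since scipy treats zero entries as absent edges. Function names are my own.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def gaussian_chow_liu(X):
    """Maximal weight spanning tree with weights -log(1 - r_uv^2),
    X an (n, p) data matrix. Returns a list of (u, v) edges, u < v."""
    R = np.corrcoef(X, rowvar=False)
    W = -np.log(np.clip(1.0 - R ** 2, 1e-12, None))  # edge weights
    np.fill_diagonal(W, 0.0)
    # scipy computes a MINIMUM spanning tree, so negate; the constant
    # shift (uniform over edges) leaves no entry exactly zero
    A = -(W + 1.0)
    np.fill_diagonal(A, 0.0)
    T = minimum_spanning_tree(A).toarray()
    return [tuple(sorted((int(u), int(v)))) for u, v in zip(*np.nonzero(T))]
```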
- Direct likelihood methods (ignoring penalty terms) lead to sensible results.
- The (bootstrap) sampling distribution of the tree MLE can be simulated.
- Penalty terms are additive along branches, so the highest AIC or BIC scoring tree (forest) is also available using a MWST algorithm.
- Tree methods scale extremely well with both sample size and number of variables.
- Pairwise marginal counts are sufficient statistics for the tree problem (the empirical covariance matrix in the Gaussian case).
Note that sufficiency holds although the parameter space is very different from an open subset of R^k.
Graphical lasso
Consider an undirected Gaussian graphical model and the ℓ1-penalized log-likelihood function

\[
2\ell_{\mathrm{pen}}(K) = \log \det K - \mathrm{tr}(KW) - \lambda \|K\|_1^*.
\]

The penalty \(\|K\|_1^* = \sum_{i \neq j} |k_{ij}|\) is essentially a convex relaxation of the number of edges in the graph, and optimization of the penalized likelihood will typically lead to several k_{ij} = 0 and thus in effect estimate a particular graph.

This penalized likelihood can be maximized efficiently (Banerjee et al., 2008) as implemented in the graphical lasso (Friedman et al., 2008).

Beware: not scale-invariant!
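An estimate of this kind can be computed with scikit-learn's GraphicalLasso; a minimal sketch in which the regularization level alpha and the simulated data are arbitrary choices of mine.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
# p = 5 variables; only X0 and X1 are strongly dependent
n, p = 400, 5
X = rng.normal(size=(n, p))
X[:, 1] += X[:, 0]

model = GraphicalLasso(alpha=0.2).fit(X)
K = model.precision_          # estimated concentration matrix
# the l1 penalty drives many off-diagonal k_ij to exactly zero,
# which is read as "no edge" between i and j
edges = [(i, j) for i in range(p) for j in range(i + 1, p)
         if abs(K[i, j]) > 1e-8]
```

Because the penalty acts on the entries of K directly, the estimated graph changes if variables are rescaled, which is the scale-invariance caveat above.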
glasso for bodyfat

[Figure: estimated graph over Weight, Knee, Thigh, Hip, Abdomen, Ankle, Biceps, Forearm, Height, Wrist, Neck, Chest, Age, BodyFat]
Score-based methods for DAGs
Markov equivalence

DAGs D and D′ are equivalent if and only if:

1. D and D′ have the same skeleton (ignoring directions)
2. D and D′ have the same unmarried parents

[Diagrams of an equivalent and a non-equivalent pair of DAGs omitted]
Equivalence class searches

These search directly in equivalence classes of DAGs.

Define a score function σ(D) with the property that

\[
D \equiv D' \Rightarrow \sigma(D) = \sigma(D').
\]

This holds e.g. if the score function is AIC or BIC, or a full Bayesian posterior with a strong hyper Markov prior (based upon Dirichlet or inverse Wishart distributions).

The equivalence class with maximal score is sought.

dlasso? Problems with invariance over equivalence classes!
Greedy equivalence class search

1. Initialize with the empty DAG.
2. Repeatedly search among equivalence classes with a single additional edge and go to the class with the highest score, until no improvement.
3. Repeatedly search among equivalence classes with a single edge less and move to the one with the highest score, until no improvement.

For BIC, for example, this algorithm yields a consistent estimate of the equivalence class of P (Chickering, 2002).
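A simplified sketch of the greedy idea, assuming Gaussian data and a BIC score. For brevity it searches over individual DAGs rather than equivalence classes and omits the backward deletion phase, so it illustrates greedy score-based search rather than GES itself; all names are my own.

```python
import numpy as np

def local_bic(X, v, parents):
    """Gaussian BIC contribution of node v given a candidate parent set:
    -n*log(RSS/n) minus log(n) per fitted parameter."""
    n = X.shape[0]
    Z = np.column_stack([np.ones(n)] + [X[:, u] for u in parents])
    beta, *_ = np.linalg.lstsq(Z, X[:, v], rcond=None)
    rss = float(np.sum((X[:, v] - Z @ beta) ** 2))
    return -n * np.log(rss / n) - (len(parents) + 1) * np.log(n)

def creates_cycle(edges, u, v):
    """Would adding u -> v close a directed cycle, i.e. can v reach u?"""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    stack, seen = [v], set()
    while stack:
        w = stack.pop()
        if w == u:
            return True
        if w not in seen:
            seen.add(w)
            stack.extend(adj.get(w, []))
    return False

def greedy_dag_search(X):
    """Forward phase only: repeatedly add the single edge improving the
    total BIC score the most, until no addition improves it."""
    p = X.shape[1]
    edges, parents = [], {v: [] for v in range(p)}
    score = {v: local_bic(X, v, []) for v in range(p)}
    while True:
        best = None
        for u in range(p):
            for v in range(p):
                if u == v or (u, v) in edges or creates_cycle(edges, u, v):
                    continue
                gain = local_bic(X, v, parents[v] + [u]) - score[v]
                if gain > 1e-9 and (best is None or gain > best[0]):
                    best = (gain, u, v)
        if best is None:
            return edges
        gain, u, v = best
        edges.append((u, v))
        parents[v].append(u)
        score[v] += gain
```

Since BIC is score-equivalent, either orientation of a single dependent pair scores identically; the search keeps whichever it finds first.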
Basic Bayesian model estimation
For g in a specified set of graphs, Θ_g is the associated parameter space, so that P factorizes w.r.t. g if and only if P = P_θ for some θ ∈ Θ_g.

π_g is a prior on Θ_g. The prior p(g) is uniform for simplicity.

Based on x^{(n)}, the posterior distribution of G is

\[
p^*(g) = p(g \mid x^{(n)}) \propto p(x^{(n)} \mid g) = \int_{\Theta_g} p(x^{(n)} \mid g, \theta)\, \pi_g(d\theta).
\]

Bayesian analysis looks for the MAP estimate g^* maximizing p^*(g), or attempts to sample from the posterior, using e.g. MCMC.
Decomposable graphical models
For connected decomposable graphs and strong hyper Markov priors, Dawid and Lauritzen (1993) show

\[
p(x^{(n)} \mid g) = \frac{\prod_{C \in \mathcal{C}} p(x^{(n)}_C)}{\prod_{S \in \mathcal{S}} p(x^{(n)}_S)},
\]

where each factor has explicit form. \(\mathcal{C}\) are the cliques of g and \(\mathcal{S}\) the separators (minimal cutsets).

Hence, if the prior distribution over a class of graphs is uniform, the posterior distribution has the form

\[
p(g \mid x^{(n)}) \propto \frac{\prod_{C \in \mathcal{C}} p(x^{(n)}_C)}{\prod_{S \in \mathcal{S}} p(x^{(n)}_S)} = \frac{\prod_{C \in \mathcal{C}} w(C)}{\prod_{S \in \mathcal{S}} w(S)}.
\]
Byrne (2011) shows that a distribution over decomposable graphs having the form

\[
p(g) \propto \frac{\prod_{C \in \mathcal{C}} w(C)}{\prod_{S \in \mathcal{S}} w(S)}
\]

satisfies a structural Markov property, so that for subsets A and B with V = A ∪ B it holds that

\[
\mathcal{G}_A \perp\!\!\!\perp \mathcal{G}_B \mid \{(A, B) \text{ is a decomposition of } \mathcal{G}\}.
\]

In words, the finer structures of the graph in the two components of any decomposition are independent.
Trees are decomposable, so for trees we get

\[
p(\tau \mid x^{(n)}) \propto \prod_{e \in E(\tau)} p(x^{(n)}_e),
\]

which can be expressed more illuminatingly as

\[
p(\tau \mid x^{(n)}) \propto \prod_{e \in E(\tau)} B_e,
\]

where

\[
B_{uv} = \frac{p(x^{(n)}_{uv})}{p(x^{(n)}_u)\, p(x^{(n)}_v)}
\]

is the Bayes factor for independence along the edge uv.
MAP estimates of trees can be computed (also in the Gaussian case).

Good direct algorithms exist for generating random spanning trees (Guenoche, 1983; Broder, 1989; Aldous, 1990; Propp and Wilson, 1998), so full posterior analysis is possible for trees.

MCMC methods for exploring posteriors of undirected graphs have been developed.

Even for forests, it seems complicated to sample from the posterior distribution (Dai, 2008)

\[
p(\phi \mid x^{(n)}) \propto \prod_{e \in E(\phi)} B_e.
\]
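One of the cited constructions, Wilson's loop-erased random walk, can be sketched for the complete graph: it samples a spanning tree with probability proportional to the product of its edge weights, so supplying the Bayes factors B_uv as weights draws from the tree posterior. A minimal sketch; names are my own.

```python
import random

def wilson_spanning_tree(p, weight, seed=0):
    """Sample a spanning tree of the complete graph on p nodes with
    probability proportional to the product of its edge weights, using
    Wilson's loop-erased random walk algorithm.
    weight[u][v]: positive symmetric edge weights (e.g. Bayes factors)."""
    rng = random.Random(seed)
    root = 0
    in_tree = {root}
    succ = {}                       # current successor of each walked node
    for start in range(p):
        u = start
        while u not in in_tree:     # random walk until the tree is hit
            others = [v for v in range(p) if v != u]
            u_next = rng.choices(others,
                                 weights=[weight[u][v] for v in others])[0]
            succ[u] = u_next        # overwriting successors erases loops
            u = u_next
        u = start                   # retrace the loop-erased path
        while u not in in_tree:
            in_tree.add(u)
            u = succ[u]
    return [(u, succ[u]) for u in range(p) if u != root]
```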
Some challenges for undirected graphs

- Find a feasible algorithm for (perfect) simulation from a distribution over decomposable graphs of the form

\[
p(g) \propto \frac{\prod_{C \in \mathcal{C}} w(C)}{\prod_{S \in \mathcal{S}} w(S)},
\]

where w(A), A ⊆ V, are a prescribed set of positive weights.

- Find a feasible algorithm for obtaining the MAP in the decomposable case. This may not be universally possible, as the problem is NP-complete, even for bounded maximum clique size.
Bayesian methods for DAGs
Posterior distribution for DAG

For strong directed hyper Markov priors it holds that

\[
p(x^{(n)} \mid d) = \prod_{v \in V} p(x^{(n)}_v \mid x^{(n)}_{\mathrm{pa}(v)}),
\]

so

\[
p(d \mid x^{(n)}) \propto \prod_{v \in V} p(x^{(n)}_v \mid x^{(n)}_{\mathrm{pa}(v)}),
\]

see e.g. Spiegelhalter and Lauritzen (1990); Cooper and Herskovits (1992); Heckerman et al. (1995).

Challenge: find a good algorithm for sampling from this full posterior.
PC algorithm
The first step of constraint-based methods (e.g. the PC algorithm) is to identify the skeleton of G, which is the undirected graph with α ≁ β if and only if there exists S ⊆ V \ {α, β} with α ⊥_G β | S.

The skeleton stage of any such algorithm is thus correct for any type of graph which represents the independence structure!

In particular, the PC algorithm can easily be adapted to UGs and CHGs (Meek, 1996).

Challenge: the properties of these algorithms when translated to sampling situations are not sufficiently well understood.
Step 1: Identify the skeleton using that, for P faithful,

u ≁ v ⟺ ∃ S ⊆ V \ {u, v} : X_u ⊥⊥ X_v | X_S.

Begin with the complete graph, check for S = ∅, and remove edges when independence holds. Then continue for increasing |S|. The PC algorithm (Spirtes et al., 1993) exploits that only S with S ⊆ ne(u) or S ⊆ ne(v) needs checking, where ne refers to the current skeleton.

Step 2: Identify directions to be consistent with the independence relations found in Step 1.
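The skeleton phase described above can be sketched with an abstract independence oracle standing in for the statistical tests; on sampled data the oracle would be replaced by a test with some significance level. All names here are hypothetical.

```python
from itertools import combinations

def pc_skeleton(vertices, indep):
    """Skeleton phase of the PC algorithm.

    `indep(u, v, S)` should return True when X_u _||_ X_v | X_S.
    Starting from the complete graph, edges are removed as soon as a
    separating set is found; only sets S inside the current
    neighbourhood of u are checked, for increasing |S|.
    """
    adj = {v: set(vertices) - {v} for v in vertices}
    size = 0
    while any(len(adj[u]) - 1 >= size for u in vertices):
        for u, v in combinations(vertices, 2):
            if v not in adj[u]:
                continue
            for S in combinations(adj[u] - {v}, size):
                if indep(u, v, set(S)):
                    adj[u].discard(v)
                    adj[v].discard(u)
                    break
        size += 1
    return adj

# Oracle for a chain a -> b -> c: a and c are independent given {b}
chain = lambda u, v, S: {u, v} == {'a', 'c'} and 'b' in S
print(pc_skeleton(['a', 'b', 'c'], chain))
```

With a perfect oracle and P faithful, the returned skeleton is exact; with finite-sample tests, errors in the oracle propagate, which is exactly the insufficiently understood regime noted above.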
pcalg for bodyfat
[Figure: graph estimated by pcalg over BodyFat, Age, Weight, Height, Neck, Chest, Abdomen, Hip, Thigh, Knee, Ankle, Biceps, Forearm, Wrist]
pcalg for carcass
[Figure: graph estimated by pcalg over Fat11, Meat11, Fat12, Meat12, Fat13, Meat13, LeanMeat]
Faithfulness
A given distribution P is in general compatible with a variety of structures, e.g. if P corresponds to complete independence. To identify a structure G, something like the following must hold.

P is said to be faithful to G if

X_A ⊥⊥ X_B | X_S ⟺ A ⊥_G B | S.

Most distributions are faithful. More precisely, for DAGs it holds that the non-faithful distributions form a Lebesgue null set in the parameter space associated with a DAG.
Exact properties of PC-algorithm
If P is faithful to a DAG D, the PC algorithm finds a D′ equivalent to D. It uses N independence checks, where N is at most

N ≤ 2 (|V| choose 2) ∑_{i=0}^{d} (|V|−1 choose i) ≤ |V|^{d+1} / (d−1)!,

where d is the maximal degree of any vertex in D. So the worst-case complexity is exponential, but the algorithm is fast for sparse graphs. Sampling properties are less well understood, although consistency results exist.
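The bound is easy to evaluate numerically; a small sketch (function names are illustrative) makes the contrast between sparse and dense graphs concrete.

```python
from math import comb

def pc_bound(p, d):
    """Upper bound on the number of independence tests used by the PC
    algorithm on p vertices with maximal degree d:
        2 * C(p, 2) * sum_{i=0}^{d} C(p-1, i)
    """
    return 2 * comb(p, 2) * sum(comb(p - 1, i) for i in range(d + 1))

print(pc_bound(10, 2))   # sparse: 4140 tests at most
print(pc_bound(10, 8))   # dense: 45990, approaching the exponential regime
```

For fixed d the bound grows only polynomially in p, which is why the algorithm scales to sparse high-dimensional problems.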
Constraint-based methods fundamentally establish two lists:

1. An independence list I of triplets (α, β, S) with α ⊥⊥ β | S; this identifies the skeleton;
2. A dependence list D of triplets (α, β, S) with ¬(α ⊥⊥ β | S).

They are established recursively, for increasing size of S, and the list I is the most likely to have errors. The lists may or may not be consistent with a DAG model, but methods exist for checking this.

Question: Assume a current pair of input lists is consistent with the compositional graphoid axioms and a new triplet (α, β, S) is considered. Can it be verified whether it can be consistently added to either of the two lists? Graph representable? Compositional?
Summary and challenges
• It seems out of reach to extend Bayesian and score-based methods to more general graphical models;
• Fully Bayesian methods typically do not scale well with the number of variables;
• Constraint-based methods have a less clear formal inference basis; it is a challenge to improve this.
• Constraint-based methods have been developed to work in cases where P is faithful and conditional independence queries can be resolved without error.
Things I did not even get near
• Factorisation properties for Markov distributions;
• Local computation algorithms for speeding up any relevant computation;
• Causality and causal interpretations;
• Non-parametric graphical models for large-scale data analysis;
• Probabilistic expert systems, for example in forensic identification problems;
• Markov theory for infinite graphs (huge graphs).

THANK YOU FOR LISTENING!
References
Aldous, D. (1990). A random walk construction of uniform spanning trees and uniform labelled trees. SIAM Journal on Discrete Mathematics, 3(4):450–465.
Banerjee, O., El Ghaoui, L., and d'Aspremont, A. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research, 9:485–516.
Broder, A. (1989). Generating random spanning trees. In Thirtieth Annual Symposium on Foundations of Computer Science, pages 442–447.
Byrne, S. (2011). Hyper and Structural Markov Laws for Graphical Models. PhD thesis, Statistical Laboratory, University of Cambridge.
Chickering, D. M. (2002). Optimal structure identification with greedy search. Journal of Machine Learning Research, 3:507–554.
Chow, C. K. and Liu, C. N. (1968). Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14:462–467.
Chow, C. K. and Wagner, T. J. (1973). Consistency of an estimate of tree-dependent probability distributions. IEEE Transactions on Information Theory, 19:369–371.
Cooper, G. F. and Herskovits, E. (1992). A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9:309–347.
Dai, H. (2008). Perfect sampling methods for random forests. Advances in Applied Probability, 40:897–917.
Dawid, A. P. and Lauritzen, S. L. (1993). Hyper Markov laws in the statistical analysis of decomposable graphical models. The Annals of Statistics, 21:1272–1317.
Edwards, D., de Abreu, G., and Labouriau, R. (2010). Selecting high-dimensional mixed graphical models using minimal AIC or BIC forests. BMC Bioinformatics, 11(1):18.
Friedman, J., Hastie, T., and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441.
Guenoche, A. (1983). Random spanning tree. Journal of Algorithms, 4(3):214–220.
Heckerman, D., Geiger, D., and Chickering, D. M. (1995). Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20:197–243.
Højsgaard, S., Edwards, D., and Lauritzen, S. (2012). Graphical Models with R. Springer-Verlag, New York.
Kruskal Jr., J. B. (1956). On the shortest spanning subtree of a graph and the travelling salesman problem. Proceedings of the American Mathematical Society, 7:48–50.
Meek, C. (1996). Selecting Graphical Models: Causal and Statistical Reasoning. PhD thesis, Carnegie Mellon University.
Panayidou, K. (2011). Estimation of Tree Structure for Variable Selection. PhD thesis, Department of Statistics, University of Oxford.
Propp, J. G. and Wilson, D. B. (1998). How to get a perfectly random sample from a generic Markov chain and generate a random spanning tree of a directed graph. Journal of Algorithms, 27(2):170–217.
Spiegelhalter, D. J. and Lauritzen, S. L. (1990). Sequential updating of conditional probabilities on directed graphical structures. Networks, 20:579–605.
Spirtes, P., Glymour, C., and Scheines, R. (1993). Causation, Prediction and Search. Springer-Verlag, New York. Reprinted by MIT Press.