Learning the Structure of Related Tasks
Presented by Lihan He
Machine Learning Reading Group
Duke University
02/03/2006
Based on the paper by A. Niculescu-Mizil and R. Caruana
Outline
Introduction
Learning single Bayes networks from data
Learning from related tasks
Experimental results
Conclusions
Introduction
Graphical model:
Nodes represent random variables; edges represent dependencies.
Undirected graphical model: Markov network.
Directed graphical model: Bayesian network.
[Figure: example Bayesian network with edges x1→x2, x1→x3, x2→x4, x3→x4]
Edges encode causal relationships between nodes.
Directed acyclic graph (DAG): no directed cycles allowed.
B = {G, θ}
P(X1, X2, X3, X4) = P(X1) P(X2 | X1) P(X3 | X1) P(X4 | X2, X3)
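The factorization for the example DAG can be made concrete with a small sketch. The CPT values below are illustrative assumptions, not from the slides:

```python
from itertools import product

# CPTs for the example DAG x1 -> x2, x1 -> x3, x2 -> x4, x3 -> x4.
# All variables are binary; the probability values are made up for illustration.
p1 = {0: 0.4, 1: 0.6}                                      # P(X1)
p2 = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}            # P(X2 | X1)
p3 = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}}            # P(X3 | X1)
p4 = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.6, 1: 0.4},
      (1, 0): {0: 0.4, 1: 0.6}, (1, 1): {0: 0.2, 1: 0.8}}  # P(X4 | X2, X3)

def joint(x1, x2, x3, x4):
    """P(X1,X2,X3,X4) = P(X1) P(X2|X1) P(X3|X1) P(X4|X2,X3)."""
    return p1[x1] * p2[x1][x2] * p3[x1][x3] * p4[(x2, x3)][x4]

# The factorization defines a valid distribution: probabilities sum to 1.
total = sum(joint(*xs) for xs in product([0, 1], repeat=4))
```

Because each CPT row sums to one, the joint sums to one over all 2^4 assignments.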
Introduction
Goal: simultaneously learn Bayes Net structures for multiple tasks.
Different tasks are related;
Structures might be similar, but not identical.
Example: gene expression data.
1) Learn a single Bayes net structure from data;
2) Generalize to multi-task learning by placing a joint prior over the structures.
Single Bayesian network learning from data
Bayes network B = {G, θ} over a set of n random variables X = {X1, X2, …, Xn}.
The joint probability P(X) factorizes as
P(X) = ∏_{i=1}^{n} P(Xi | Pa(Xi)),
where Pa(Xi) denotes the set of parents of Xi in G.
Given a dataset D = {x1, x2, …, xm}, where each sample xi = (xi1, xi2, …, xin) is one observation of the n variables, we can learn the structure G and the parameters θ from D.
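Given a fixed structure, the parameters θ are the conditional probability tables, which can be estimated from D by counting. A minimal sketch; the data layout (tuples of discrete values) is an assumption:

```python
from collections import Counter

def fit_cpts(data, parents):
    """Maximum-likelihood CPTs for a fixed structure: for each variable i,
    estimate P(Xi = v | parent values) from empirical counts in the data.
    `parents[i]` lists the column indices of Xi's parents in G."""
    n = len(parents)
    cpts = []
    for i in range(n):
        joint = Counter()   # counts of (parent values, value of Xi)
        marg = Counter()    # counts of parent values alone
        for row in data:
            pa = tuple(row[j] for j in parents[i])
            joint[(pa, row[i])] += 1
            marg[pa] += 1
        cpts.append({key: cnt / marg[key[0]] for key, cnt in joint.items()})
    return cpts
```

Each CPT entry is indexed by (parent-value tuple, variable value).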
Single Bayesian network learning from data
Model selection: find the G with the highest P(G | D) among all possible structures.
Exhaustive search over all possible G is infeasible:
n = 4: 543 possible DAGs;
n = 10: O(10^18) possible DAGs.
Question: how do we search for the best structure among this huge number of possible DAGs?
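The DAG counts quoted above can be reproduced with Robinson's recurrence for the number of labeled DAGs on n nodes; a quick sketch:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n):
    """Robinson's recurrence: count labeled DAGs on n nodes by choosing the
    k nodes with no incoming edges, with inclusion-exclusion over k."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))
```

num_dags(4) gives 543 and num_dags(10) is on the order of 10^18, matching the slide.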
Single Bayesian network learning from data

Algorithm (greedy hill climbing with random restarts):
1) Randomly generate an initial DAG and evaluate its score;
2) Evaluate the scores of all the neighbors of the current DAG;
3) While some neighbor has a higher score than the current DAG:
   move to the neighbor with the highest score;
   evaluate the scores of all the neighbors of the new DAG;
4) Repeat (1)-(3) a number of times, starting from a different DAG each time.
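The procedure above is ordinary greedy hill climbing with restarts. A generic sketch, with the scoring function and neighbor generator abstracted as parameters (in the slides the states are DAGs and the score is P(G | D)):

```python
def hill_climb(initial_states, neighbors, score):
    """Greedy hill climbing with restarts: from each starting state, repeatedly
    move to the best-scoring neighbor until no neighbor improves the score,
    and keep the best local optimum found across all restarts."""
    best_state, best_score = None, float("-inf")
    for state in initial_states:            # restart from different states
        current, current_score = state, score(state)
        while True:
            nbrs = neighbors(current)       # score all neighbors
            if not nbrs:
                break
            cand = max(nbrs, key=score)
            if score(cand) <= current_score:
                break                       # local optimum reached
            current, current_score = cand, score(cand)
        if current_score > best_score:
            best_state, best_score = current, current_score
    return best_state, best_score
```

Restarts from different initial states reduce the chance of being stuck in a poor local optimum.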
Single Bayesian network learning from data

Neighbors of a structure G: the set of all DAGs that can be obtained by adding, removing, or reversing a single edge in G.
Every neighbor must satisfy the acyclicity constraint.
[Figure: an example DAG and several of its neighbors, each obtained by adding, removing, or reversing one edge]
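A sketch of neighbor generation under the acyclicity constraint, representing a structure as a set of (parent, child) pairs; this representation is an assumption, not from the slides:

```python
from itertools import permutations

def has_cycle(nodes, edges):
    """Depth-first search for a directed cycle."""
    adj = {u: [] for u in nodes}
    for u, v in edges:
        adj[u].append(v)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {u: WHITE for u in nodes}
    def visit(u):
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == GRAY or (color[v] == WHITE and visit(v)):
                return True
        color[u] = BLACK
        return False
    return any(color[u] == WHITE and visit(u) for u in nodes)

def neighbors(nodes, edges):
    """All DAGs reachable by adding, removing, or reversing one edge."""
    edges = set(edges)
    result = []
    for u, v in permutations(nodes, 2):        # candidate edge additions
        if (u, v) not in edges and (v, u) not in edges:
            cand = edges | {(u, v)}
            if not has_cycle(nodes, cand):
                result.append(cand)
    for u, v in edges:
        result.append(edges - {(u, v)})        # removal never creates a cycle
        cand = (edges - {(u, v)}) | {(v, u)}   # reversal must be re-checked
        if not has_cycle(nodes, cand):
            result.append(cand)
    return result
```

For the 4-node example DAG of the earlier slide, this yields 11 neighbors: 3 legal additions, 4 removals, and 4 legal reversals.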
Learning from related tasks

Given iid datasets D1, D2, …, Dk, simultaneously learn the networks B1 = {G1, θ1}, B2 = {G2, θ2}, …, Bk = {Gk, θk}.
The structures (G1, G2, …, Gk) are similar, but not identical.
Learning from related tasks

One more assumption: the parameters of the different networks are independent given the structures,
P(θ1, …, θk | G1, …, Gk) = ∏_{i=1}^{k} P(θi | Gi).
This is not strictly true, but it makes structure learning more efficient; since the focus is structure learning rather than parameter learning, this is acceptable.
Learning from related tasks

Prior:
If the structures are not related, G1, …, Gk are independent a priori:
the structures are learned independently for each task.
If the structures are identical: P(G1, …, Gk) = c · 1(G1 = G2 = … = Gk).
Learning the same structure: augment the variable set with a task-indicator node,
(X1, X2, …, Xn) → (X1, X2, …, Xn, TSK), TSK ∈ {1, 2, …, k},
then learn a single structure under the restriction that TSK is always a parent of all the other nodes.
Common structure: remove node TSK and all the edges connected to it.
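Pooling the task datasets with an explicit TSK column might look like this; the data layout (tuples of discrete values, tasks labeled 1..k) is an assumption:

```python
def pool_with_task_node(datasets):
    """Pool k task datasets into one, appending a TSK indicator variable.
    Each dataset is a list of tuples (x1, ..., xn); task labels are 1..k."""
    pooled = []
    for tsk, data in enumerate(datasets, start=1):
        for row in data:
            pooled.append(row + (tsk,))   # TSK becomes variable n+1
    return pooled
```

A single structure learned over the pooled data, with TSK forced to be a parent of every other node, lets each task have its own CPTs while sharing one graph.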
Learning from related tasks

Prior:
Between independent and identical: penalize each edge (Xi, Xj) that differs between two DAGs.
δ = 0: independent
δ = 1: identical
0 < δ < 1: in between
For the k-task prior, take the product of the pairwise priors over all pairs of DAGs (up to normalization).
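One concrete pairwise prior consistent with the δ limits on this slide is P(Ga, Gb) ∝ (1 − δ)^d, where d counts the edges on which the two DAGs disagree. A sketch in log space; the exact normalization and the treatment of reversed edges are assumptions:

```python
import math

def edge_diff(g_a, g_b):
    """Number of edges present in one DAG but not the other, with edge sets
    given as sets of (parent, child) tuples (a reversed edge counts twice)."""
    return len(g_a ^ g_b)

def pairwise_log_prior(g_a, g_b, delta):
    """Unnormalized log prior P(Ga, Gb) proportional to (1 - delta)^edge_diff:
    delta = 0 gives a flat prior (independent structures); delta = 1 gives
    probability zero to any pair of non-identical structures."""
    if delta == 1.0:
        return 0.0 if not (g_a ^ g_b) else float("-inf")
    return edge_diff(g_a, g_b) * math.log(1.0 - delta)
```

Working in log space keeps the k-task prior (a sum of pairwise log terms) numerically stable.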
Learning from related tasks

Model selection: find the (G1, …, Gk) with the highest P(G1, …, Gk | D1, …, Dk).
Same idea as single-task structure learning.
Question: what is a neighbor of (G1, …, Gk)?
Def 1: change one edge in one of the DAGs:
neighbor(G1, …, Gk) = neighbor(G1) ∪ neighbor(G2) ∪ … ∪ neighbor(Gk)
Size of the neighborhood: O(n²k)
Def 2: allow simultaneous edge changes in all the DAGs, constrained so that all the changes happen between the same two nodes in every DAG of (G1, …, Gk).
Size of the neighborhood: O(n² · 3^k)
Learning from related tasks

Acceleration:
At each iteration the algorithm must find the best score over a set of neighbors; it is not necessary to search all the elements of the neighborhood.
Define C(G1, G2, …, Gi), in which the first i tasks are specified and the remaining k − i tasks are not:

C(G1, …, Gi) = ∏_{p=1}^{i} ∏_{q=p+1}^{i} P(Gp, Gq) · ∏_{r=i+1}^{k} ∏_{s=r+1}^{k} P(Ĝr, Ĝs) · ∏_{p=1}^{i} P(Dp | Gp) · ∏_{r=i+1}^{k} P(Dr | Ĝr)

where each Ĝ denotes the structure that maximizes the corresponding factor. C(G1, …, Gi) is an upper bound on the score of every neighbor in the subset (G1, G2, …, Gi, Ĝ_{i+1}, …, Ĝk), so any subset whose bound is below the best score found so far can be skipped without enumeration.
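The bound can drive a branch-and-bound style search: tasks are specified one at a time, and any partial configuration whose bound cannot beat the best complete score is pruned. A generic sketch that abstracts the procedure; `score` and `upper_bound` are supplied as parameters, not the paper's exact functions:

```python
def search_with_bound(choices_per_task, score, upper_bound):
    """Specify tasks one at a time; prune any partial configuration whose
    upper bound C(G1..Gi) cannot beat the best complete score found so far."""
    k = len(choices_per_task)
    best = {"config": None, "score": float("-inf")}
    def expand(prefix):
        i = len(prefix)
        if i == k:                              # all tasks specified: full score
            s = score(prefix)
            if s > best["score"]:
                best["config"], best["score"] = tuple(prefix), s
            return
        if upper_bound(prefix) <= best["score"]:
            return                              # bound says no completion can win
        for g in choices_per_task[i]:
            expand(prefix + [g])
    expand([])
    return best["config"], best["score"]
```

As long as `upper_bound` never underestimates the best completion of a prefix, pruning cannot discard the optimum.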
Results

Start from an original network; delete edges with probability Pdel to create 5 related tasks.
1000 data points.
10 trials.
Compute the KL-divergence and the editing distance between the learned structure and the true structure.
[Plots: KL-divergence (left) and editing distance (right)]
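The editing distance between structures is not defined on the slide; one common convention counts a reversed edge as a single edit and any other edge mismatch as an addition or deletion. A sketch under that assumption:

```python
def edit_distance(g_true, g_learned):
    """Structural editing distance between two DAGs (edge sets of
    (parent, child) tuples): reversed edges count once; edges present in
    only one graph (and not reversed) count as an addition or a deletion."""
    reversed_edges = {(u, v) for (u, v) in g_true if (v, u) in g_learned}
    only_true = {e for e in g_true
                 if e not in g_learned and (e[1], e[0]) not in g_learned}
    only_learned = {e for e in g_learned
                    if e not in g_true and (e[1], e[0]) not in g_true}
    return len(reversed_edges) + len(only_true) + len(only_learned)
```

A distance of zero means the learned structure matches the true structure exactly.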