An Algorithm for Bayesian Network Construction from Data
An Algorithm for Bayesian Network Construction from Data
by: Jie Cheng, David A. Bell, Weiru Liu (University of Ulster, UK)
Presented by: Jian Xu
04/10/23 Machine Learning 2
Outline
• Introduction
• Some basic concepts
• The proposed algorithm for BN construction
• Experiment results
• Discussions & comments
What is a Bayesian Network?
Cancer BN Example

Structure: Metastatic Cancer (M) → Serum Calcium (S); M → Brain Tumor (B); S, B → Coma (C); B → Headaches (H)

CPDs:
• P(M=+) = .20
• P(S=+ | M): M=+ .80, M=- .20
• P(B=+ | M): M=+ .20, M=- .05
• P(C=+ | S,B): (+,+) .80; (+,-) .80; (-,+) .80; (-,-) .05
• P(H=+ | B): B=+ .80, B=- .60
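To make the factorization concrete, the joint probability of a full assignment in this network is P(M) P(S|M) P(B|M) P(C|S,B) P(H|B). A minimal sketch using the CPD values on the slide (the function name and the Boolean encoding of +/- are illustrative, not from the slides):

```python
def cancer_joint(m, s, b, c, h):
    """Joint probability of one full assignment in the cancer BN,
    factored along the DAG: P(M) P(S|M) P(B|M) P(C|S,B) P(H|B).
    Arguments are booleans: True for '+', False for '-'.
    """
    p_m = 0.20 if m else 0.80
    p_s_true = 0.80 if m else 0.20          # P(S=+ | M)
    p_b_true = 0.20 if m else 0.05          # P(B=+ | M)
    p_c_true = 0.80 if (s or b) else 0.05   # P(C=+ | S, B)
    p_h_true = 0.80 if b else 0.60          # P(H=+ | B)
    p_s = p_s_true if s else 1 - p_s_true
    p_b = p_b_true if b else 1 - p_b_true
    p_c = p_c_true if c else 1 - p_c_true
    p_h = p_h_true if h else 1 - p_h_true
    return p_m * p_s * p_b * p_c * p_h
```

Summing this over all 2^5 assignments gives 1, which is a quick sanity check that the CPT entries were transcribed consistently.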
Bayesian Network (BN)
• A Bayesian network is a compact graphical representation of a probability distribution over a set of domain random variables X = {X1, X2, …, Xn}
• Two components
– Structure: a directed acyclic graph (DAG) over the nodes, which captures causal relations in the domain
– CPD: each node has a conditional probability distribution associated with it
BN Learning
• Structure learning
– To identify the topology of the network
– Score-based methods
– Dependency analysis methods
• Parameter learning
– To learn the conditional probabilities for a given network topology
– MLE, Bayesian approaches, etc.
BN Structure Learning
• Search & scoring methods:
– Search for the structure most likely to have generated the data
– Use a heuristic search method to construct a model and evaluate it with a scoring metric, such as MDL or a Bayesian score
– May not find the best solution
– Random restarts: to avoid getting stuck in a local maximum
– Lower time complexity in the worst case, i.e., when the underlying DAG is fully connected
BN Learning Algorithms (Cont’d)
• Dependency analysis methods:
– Use conditional independence (CI) tests to analyze dependency relationships among nodes
– Usually asymptotically correct when the data is DAG-faithful
– Work efficiently when the underlying network is sparse
– CI tests with large condition sets may be unreliable unless the volume of data is enormous
– Used in the proposed algorithm
Basic Concepts
• D-separation: two nodes X and Y are d-separated given C if and only if there exists no adjacency path P between X and Y such that:
– every collider on P is in C or has a descendant in C
– no other node on P is in C
– C is called the condition-set
• Open path: a path between X and Y is open if every node on the path is active
• Closed path: a path is closed if any node on it is inactive
• Collider node: a node on a path with both adjacent arcs on the path pointing into it
• Non-collider node: any other node on the path
Basic Concepts (Cont’d)
• DAG-faithful: a distribution is DAG-faithful when there exists a DAG that can represent all of its conditional independence relations.
• D-map: a graph G is a dependency map (D-map) of M if every independence relationship in M is true in G. (A BN with no edges is a trivial D-map.)
• I-map: a graph G is an independency map (I-map) of M if every independence relationship in G is true in M. (A fully connected BN is a trivial I-map.)
• Minimum I-map: a graph G that is an I-map of M, but the removal of any arc from G yields a graph that is not an I-map of M.
• P-map: a graph G is a perfect map (P-map) of M if it is both a D-map and an I-map of M.
Mutual Information
• The mutual information of two nodes X_i, X_j is defined as:

I(X_i, X_j) = \sum_{x_i, x_j} P(x_i, x_j) \log \frac{P(x_i, x_j)}{P(x_i) P(x_j)}

• The conditional mutual information is defined as:

I(X_i, X_j \mid C) = \sum_{x_i, x_j, c} P(x_i, x_j, c) \log \frac{P(x_i, x_j \mid c)}{P(x_i \mid c) P(x_j \mid c)}
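As a sketch, the (unconditional) mutual information can be estimated by plugging empirical frequencies into the definition; the helper below and its argument conventions are illustrative, not from the paper:

```python
import math
from collections import Counter

def mutual_information(samples, i, j):
    """Estimate I(X_i; X_j) from a list of discrete records.

    `samples` is a list of tuples of attribute values; `i` and `j`
    are attribute indices. Empirical frequencies replace the true
    probabilities in the mutual information formula.
    """
    n = len(samples)
    joint = Counter((s[i], s[j]) for s in samples)
    ci = Counter(s[i] for s in samples)
    cj = Counter(s[j] for s in samples)
    mi = 0.0
    for (xi, xj), c in joint.items():
        # P(xi, xj) * log( P(xi, xj) / (P(xi) P(xj)) ), in counts:
        # (c/n) * log( (c/n) / ((ci/n)(cj/n)) ) = (c/n) * log(c*n / (ci*cj))
        mi += (c / n) * math.log((c * n) / (ci[xi] * cj[xj]))
    return mi
```

For perfectly correlated binary attributes this returns log 2; for independent ones it returns 0, which matches the intuition that I(X_i, X_j) measures how far the joint is from the product of the marginals.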
Assumptions
• All attributes are discrete
• No missing values in any record
• All records are drawn independently from a single probability model
• The size of the dataset is big enough for reliable CI tests
• The ordering of the attributes is available before network construction
An Algorithm for BN Construction
• Drafting
– Compute the mutual information of each pair of nodes and create a draft of the model
• Thickening
– Add arcs when pairs of nodes cannot be d-separated, yielding an I-map of the model
• Thinning
– Each arc of the I-map is examined using CI tests and is removed if the two nodes of the arc are conditionally independent
Drafting Phase
1. Initiate a graph G(V, E) where V = {all nodes}, E = { }; initiate two empty lists S and R.
2. For each pair of nodes (vi, vj), i ≠ j, compute I(vi, vj). Sort all pairs with I(vi, vj) ≥ ε from large to small and put them into an ordered set S.
3. Take the first two pairs of nodes in S and remove them from S; add the corresponding arcs to E. (The direction of the arcs is determined by the available node ordering.)
4. Take the first pair of nodes remaining in S and remove it from S. If there is no open path between the two nodes (i.e., they are d-separated given the empty set), add the corresponding arc to E; otherwise add the pair to the end of an ordered set R.
5. Repeat step 4 until S is empty.
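Steps 1–5 can be sketched as follows. Given the empty condition set, a path is open iff it contains no collider, i.e., it has the "trek" shape u ←…← w →…→ v, which exists exactly when the two endpoints share an ancestor; the sketch below uses that fact. The `mi` dictionary interface and helper names are illustrative, not from the paper:

```python
from itertools import combinations

def ancestors(E, u):
    """All ancestors of u (including u itself) under directed arcs E."""
    anc, stack = {u}, [u]
    while stack:
        n = stack.pop()
        for a, b in E:
            if b == n and a not in anc:
                anc.add(a)
                stack.append(a)
    return anc

def d_connected_empty(E, u, v):
    """Given the empty condition set, an open path has no collider
    (a trek u <-..- w -..-> v), which exists iff u and v share an
    ancestor."""
    return bool(ancestors(E, u) & ancestors(E, v))

def draft(nodes, mi, eps):
    """Sketch of the drafting phase (steps 1-5). `mi` maps a
    frozenset {u, v} to the estimated mutual information I(u, v);
    arcs are oriented by the given node ordering. Returns (E, R)."""
    order = {n: k for k, n in enumerate(nodes)}
    S = sorted((p for p in map(frozenset, combinations(nodes, 2)) if mi[p] >= eps),
               key=lambda p: -mi[p])
    E, R = [], []
    for k, pair in enumerate(S):
        u, v = sorted(pair, key=order.get)
        if k < 2 or not d_connected_empty(E, u, v):
            E.append((u, v))   # steps 3-4: arc joins the draft
        else:
            R.append((u, v))   # deferred to the thickening phase
    return E, R
```

Feeding in the mutual-information ordering of the drafting example that follows reproduces draft (b): arcs for (B,D), (C,E), (B,E), (A,B), (B,C), with the remaining pairs deferred to R.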
Drafting Example
• Figure (a) is the underlying BN structure
• I(B,D) ≥ I(C,E) ≥ I(B,E) ≥ I(A,B) ≥ I(B,C) ≥ I(C,D) ≥ I(D,E) ≥ I(A,D) ≥ I(A,E) ≥ I(A,C) ≥ ε
• Figure (b) is the draft graph
Thickening Phase
6. Take the first pair of nodes in R and remove it from R.
7. Find a block set that blocks every open path between these two nodes using a minimum number of nodes. Conduct a CI test; if the two nodes are still dependent on each other given the block set, connect them by an arc.
8. Go to step 6 until R is empty.
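The loop in steps 6–8 can be sketched as below. The paper's find_block_set procedure is not reproduced here; it and the CI-test oracle are passed in as assumed helpers:

```python
def thicken(E, R, find_block_set, ci_independent):
    """Sketch of the thickening phase (steps 6-8). `E` is the draft
    arc list, `R` the pairs deferred during drafting.
    `find_block_set(E, u, v)` should return a minimal node set
    blocking every open path between u and v; `ci_independent(u, v,
    block)` should return True iff a CI test on the data finds u and
    v conditionally independent given `block`.
    """
    while R:
        u, v = R.pop(0)                       # step 6
        block = find_block_set(E, u, v)       # step 7: minimal block set
        if not ci_independent(u, v, block):   # still dependent: add arc
            E.append((u, v))
        # step 8: repeat until R is empty
    return E
```

With stub helpers encoding the thickening example that follows (block set {B}; D and E dependent given {B}, all other deferred pairs independent), the sketch adds exactly the (D,E) arc.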
Thickening Example
• Figure (b) is the draft graph
• Examine the pair (D,E): the minimum set that blocks all open paths between D and E is {B}
• The CI test reveals that D and E are dependent given {B}, so arc (D,E) is added
• Arc (A,C) is not added because A and C are independent given {B}
Thinning Phase
9. For each arc in E, if there are open paths between the two nodes besides this arc, remove the arc from E temporarily and call the procedure find_block_set(current graph, node1, node2). Conduct a CI test conditioned on the block set. If the two nodes are still dependent, add the arc back to E; otherwise remove it permanently.
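Step 9 can be sketched in the same style; `has_other_open_path`, `find_block_set`, and `ci_independent` are assumed helpers standing in for the graph routines the paper defines:

```python
def thin(E, has_other_open_path, find_block_set, ci_independent):
    """Sketch of the thinning phase (step 9). Each arc is removed
    temporarily; if a CI test given the block set still finds the
    two endpoints dependent, the arc is restored, otherwise it stays
    out permanently.
    """
    for arc in list(E):
        u, v = arc
        if not has_other_open_path(E, u, v):
            continue                     # the arc is the only path: keep it
        E.remove(arc)                    # remove temporarily
        block = find_block_set(E, u, v)
        if not ci_independent(u, v, block):
            E.append(arc)                # still dependent: add it back
    return E
```

With stubs encoding the thinning example that follows (only (B,E) has alternative open paths, and B and E test independent given the block set {C,D}), the sketch removes exactly the (B,E) arc.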
Thinning Example
• Figure (c) is an I-map of the underlying BN
• Arc (B,E) is removed because B and E are independent of each other given {C,D}
• Figure (d) is a perfect map of the underlying dependency model (a)
Finding Minimum Block Set
Complexity Analysis
• For a dataset with N attributes, each with at most r possible values and at most k parents per node:
– Phase I: O(N^2) mutual information computations, each requiring O(r^2) basic operations: O(N^2 r^2)
– Phase II: at most O(N^2) CI tests, each with at most O(r^(k+2)) basic operations: O(N^2 r^(k+2)); worst case O(N^2 r^N)
– Phase III: same as Phase II
ALARM Network Structure
Experiment setup
• ALARM BN (A Logical Alarm Reduction Mechanism): a medical diagnosis system for patient monitoring
– 37 nodes, 46 arcs
– 3 versions: same structure, different CPDs
• 10,000 cases for each dataset
• Modified the conditional mutual information calculation by taking the variables' degrees of freedom into consideration, to make CI tests more reliable
• ε = 0.003
Result on ALARM BN
Discussions & Comments
• About the assumptions
– All attributes are discrete
– No missing values in any record
– The size of the dataset is big enough for reliable CI tests
– The ordering of the attributes is available before network construction
Discussions & Comments
• Threshold ε
– ε = 0.003
– How do we pick an appropriate ε?
– How does the choice of ε affect accuracy and running time?
• Modification in the experiment part
– Used a modified conditional mutual information calculation that takes the variables' degrees of freedom into consideration, to make CI tests more reliable
– Does this modification affect the result in any way other than increasing accuracy?