Bayesian Networks - Carnegie Mellon University
Bayesian Networks
Machine Learning 10-601B
Seyoung Kim
Many of these slides are derived from William Cohen. Thanks!
Bayesian Networks
[Network diagram: B and E point to A; A points to J and M]

B – Did a burglary occur?
E – Did an earthquake occur?
A – Did the alarm sound off?
M – Mary calls
J – John calls
Bayesian network: Inference
• Once the network is constructed, we can use algorithms for inferring the values of unobserved variables.
• For example, in our previous network the only observed variables are the phone calls. However, what we are really interested in is whether there was a burglary or not.
• How can we determine that?
Inference
• Let’s start with a simpler question:
- How can we compute a joint distribution from the network?
- For example, P(B,¬E,A,J,¬M)?
• Answer:
- That’s easy, let’s use the network
Computing: P(B,¬E,A,J,¬M)

[Network: B, E → A → J, M]
P(B) = .05
P(E) = .1
P(A|B,E) = .95, P(A|B,¬E) = .85, P(A|¬B,E) = .5, P(A|¬B,¬E) = .05
P(J|A) = .7, P(J|¬A) = .05
P(M|A) = .8, P(M|¬A) = .15

P(B,¬E,A,J,¬M) = P(B) P(¬E) P(A|B,¬E) P(J|A) P(¬M|A)
= 0.05 * 0.9 * 0.85 * 0.7 * 0.2
= 0.005355

We can easily compute a complete joint distribution. What about partial distributions? Conditional distributions?
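To make the chain-rule computation concrete, here is a minimal Python sketch. The dictionary-based CPT encoding and the helper names (`cpt`, `bernoulli`, `joint`) are illustrative choices for this transcript, not notation from the slides:

```python
# CPTs from the slides, encoded as dictionaries keyed by parent values.
cpt = {
    "B": 0.05,                                                     # P(B=1)
    "E": 0.1,                                                      # P(E=1)
    "A": {(1, 1): 0.95, (1, 0): 0.85, (0, 1): 0.5, (0, 0): 0.05},  # P(A=1 | b, e)
    "J": {1: 0.7, 0: 0.05},                                        # P(J=1 | a)
    "M": {1: 0.8, 0: 0.15},                                        # P(M=1 | a)
}

def bernoulli(p_one, value):
    """Probability that a binary variable with P(X=1) = p_one takes `value`."""
    return p_one if value == 1 else 1.0 - p_one

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m): one CPT entry per node, multiplied together."""
    return (bernoulli(cpt["B"], b)
            * bernoulli(cpt["E"], e)
            * bernoulli(cpt["A"][(b, e)], a)
            * bernoulli(cpt["J"][a], j)
            * bernoulli(cpt["M"][a], m))

print(joint(1, 0, 1, 1, 0))  # P(B,¬E,A,J,¬M) = 0.005355
```

Each full assignment costs one multiplication per node, which is why complete joints are easy.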
Inference
• We are interested in queries of the form: P(B | J,¬M)
• This can also be written as: P(B | J,¬M) = P(B,J,¬M) / P(J,¬M)
• How do we compute the new joint?

[Network: B, E → A → J, M]
Inference in Bayesian networks
• We will discuss three methods:
1. Enumeration
2. Variable elimination
3. Stochastic inference
Computing partial joints

P(B,J,¬M) = Σ_E Σ_A P(B,J,¬M,E,A)

Sum all instances with these settings (the sum is over the possible assignments to the other two variables, E and A).
Computing: P(B,J,¬M)

[Same network and CPTs as before]

P(B,J,¬M) = P(B,J,¬M,A,E)
+ P(B,J,¬M,¬A,E)
+ P(B,J,¬M,A,¬E)
+ P(B,J,¬M,¬A,¬E)
= 0.0007 + 0.00001 + 0.005 + 0.0003 = 0.00601
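Continuing the sketch above (reusing the hypothetical `joint` helper), the partial joint is just the full joint summed over every assignment of the hidden variables E and A. Note that summing the exact terms gives ≈ 0.00632; the slide's 0.00601 comes from adding the individually rounded terms:

```python
from itertools import product

def partial_joint(b, j, m):
    """P(B=b, J=j, M=m) by enumeration: sum the joint over hidden E and A."""
    return sum(joint(b, e, a, j, m) for e, a in product((0, 1), repeat=2))

print(partial_joint(1, 1, 0))  # ≈ 0.00632
```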
Computing partial joints

• This method can be improved by re-using calculations (similar to dynamic programming).
• Still, the number of possible assignments is exponential in the number of unobserved variables.
• That is, unfortunately, the best we can do. General querying of Bayesian networks is NP-complete.
Inference in Bayesian networks is NP-complete (sketch)

• Reduction from 3SAT
• Recall 3SAT: find a satisfying assignment to a formula such as (a ∨ b ∨ c) ∧ (d ∨ ¬b ∨ ¬c) ∧ …
• Build a network with a root node for each variable, with P(xi=1) = 0.5; a node for each clause, e.g. P(C1=1) = P((x1 ∨ x2 ∨ x3)=1); and a final node Y that is the conjunction of the clause nodes, P(Y=1) = P((C1 ∧ C2 ∧ C3 ∧ C4)=1).
• What is P(Y=1)? It is nonzero exactly when the formula is satisfiable, so answering this query solves 3SAT.
Inference in Bayesian networks
• We will discuss three methods:
1. Enumeration
2. Variable elimination
3. Stochastic inference
Variable elimination

Reuse computations rather than recompute probabilities.

[Same network and CPTs as before]

P(B,J,¬M) = P(B,J,¬M,A,E)
+ P(B,J,¬M,¬A,E)
+ P(B,J,¬M,A,¬E)
+ P(B,J,¬M,¬A,¬E)
= 0.0007 + 0.00001 + 0.005 + 0.0003 = 0.00601
Computing: P(B,J,¬M)

P(B,J,¬M) = P(B,J,¬M,A,E) + P(B,J,¬M,¬A,E) + P(B,J,¬M,A,¬E) + P(B,J,¬M,¬A,¬E)
= P(B) Σ_e P(e) Σ_a P(a|B,e) P(J|a) P(¬M|a)

Store P(J|a) P(¬M|a) as a function of a and use it whenever necessary (no need to recompute it each time).
Variable elimination

Set:
f_J(a) = P(J|a)
f_M(a) = P(¬M|a)
Variable elimination

Let's continue with these functions. We can now define the following function:
f_JM(a) = f_J(a) f_M(a)

And so we can write:
P(B,J,¬M) = P(B) Σ_e P(e) Σ_a P(a|B,e) f_JM(a)
Variable elimination

Let's continue with another function:
f_AJM(b,e) = Σ_a P(a|b,e) f_JM(a)

And finally we can write:
P(B,J,¬M) = P(B) Σ_e P(e) f_AJM(B,e)
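As a concrete check, plugging the CPT values from the earlier slides into these definitions: f_JM(a=1) = 0.7 * 0.2 = 0.14 and f_JM(a=0) = 0.05 * 0.85 = 0.0425; then f_AJM(B,E) = 0.95 * 0.14 + 0.05 * 0.0425 = 0.135125 and f_AJM(B,¬E) = 0.85 * 0.14 + 0.15 * 0.0425 = 0.125375, so P(B,J,¬M) = 0.05 * (0.1 * 0.135125 + 0.9 * 0.125375) ≈ 0.00632, matching the enumeration result.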
Example

[Same network and CPTs as before]

Calling the same function multiple times: once computed and stored, f_JM(a) is reused for every (b,e) pair in f_AJM(b,e) rather than recomputed.
Final computation (normalization)

P(B | J,¬M) = P(B,J,¬M) / P(J,¬M), where P(J,¬M) = P(B,J,¬M) + P(¬B,J,¬M)
Algorithm

• e – evidence (the variables that are known)
• vars – the conditional probabilities derived from the network, in reverse order (bottom up)
• For each var in vars:
- factors <- make_factor(var, e)
- if var is a hidden variable, create a new factor by summing out var
• Compute the product of all factors
• Normalize
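A minimal sketch of this procedure for the query P(B | J,¬M), again reusing the hypothetical `cpt` and `bernoulli` helpers from the earlier sketch; the function names mirror the factors defined above, not any official implementation:

```python
def f_AJM(b, e):
    """Sum out A with evidence folded in: sum_a P(a|b,e) P(J=1|a) P(M=0|a)."""
    return sum(bernoulli(cpt["A"][(b, e)], a)
               * bernoulli(cpt["J"][a], 1)   # evidence: J = 1
               * bernoulli(cpt["M"][a], 0)   # evidence: M = 0
               for a in (0, 1))

def f_E(b):
    """Sum out E: sum_e P(e) f_AJM(b, e)."""
    return sum(bernoulli(cpt["E"], e) * f_AJM(b, e) for e in (0, 1))

# Unnormalized P(B=b, J=1, M=0) for both values of b, then normalize.
unnormalized = {b: bernoulli(cpt["B"], b) * f_E(b) for b in (0, 1)}
total = sum(unnormalized.values())
posterior = {b: p / total for b, p in unnormalized.items()}
print(posterior[1])  # P(B=1 | J=1, M=0) ≈ 0.114
```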
Computational complexity

• We are reusing computations, so we are reducing the running time.
• However, there are still cases in which this algorithm will lead to exponential running time.
• Consider the case of a factor fx(y1 … yn). When summing x out, we need to account for all possible combinations of values of the y's, of which there are exponentially many in n.

Variable elimination can lead to significant cost savings, but its efficiency depends on the network structure.
Inference in Bayesian networks
• We will discuss three methods:
1. Enumeration
2. Variable elimination
3. Stochastic inference
Stochastic inference

• We can easily sample the joint distribution to obtain possible instances:
1. Sample the free variables (the variables with no parents).
2. For all other variables: once all of a variable's parents have been sampled, sample it from its conditional distribution.

We end up with a new set of assignments for B, E, A, J and M, which is a random sample from the joint.

[Same network and CPTs as before]

It is always possible to carry out this sampling procedure, why?
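A sketch of this ancestral sampling procedure, reusing the hypothetical `cpt` dictionary from earlier:

```python
import random

def sample_joint():
    """Draw one sample of (b, e, a, j, m), sampling each node after its parents."""
    b = int(random.random() < cpt["B"])
    e = int(random.random() < cpt["E"])
    a = int(random.random() < cpt["A"][(b, e)])
    j = int(random.random() < cpt["J"][a])
    m = int(random.random() < cpt["M"][a])
    return b, e, a, j, m
```

As for the question on the slide: because the graph is acyclic, its nodes can always be put in a topological order, so every node's parents are sampled before the node itself.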
Using sampling for inference

• Let's revisit our problem: compute P(B | J,¬M)
• Looking at the samples we can count:
- N: total number of samples
- Nc: total number of samples in which the condition (J,¬M) holds
- NB: total number of samples where the joint event (B,J,¬M) holds
• For a large enough N:
- Nc / N ≈ P(J,¬M)
- NB / N ≈ P(B,J,¬M)
• And so, we can set:
P(B | J,¬M) = P(B,J,¬M) / P(J,¬M) ≈ NB / Nc

Problem: What if the condition rarely happens? We would need lots and lots of samples, and most would be wasted.
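A sketch of this counting estimator (rejection sampling), built on the hypothetical `sample_joint` above:

```python
def estimate_by_counting(n=100_000):
    """Estimate P(B | J=1, M=0) as NB / Nc over n samples from the joint."""
    n_c = n_b = 0
    for _ in range(n):
        b, e, a, j, m = sample_joint()
        if j == 1 and m == 0:   # the condition (J, ¬M) holds
            n_c += 1
            n_b += b            # the joint event (B, J, ¬M) holds
    return n_b / n_c if n_c else float("nan")
```

Samples that fail the condition are discarded, which is exactly the waste the slide warns about when the evidence is rare.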
Weighted sampling

• Compute P(B | J,¬M)
• We can manually set the value of J to 1 and M to 0.
• This way, all samples will contain the correct values for the conditioning variables.
• Problems? (The samples are no longer drawn from the true joint, so simple counting would give a biased estimate.)
Weighted sampling

• Compute P(B | J,¬M)
• Given an assignment to their parents, we assign a value of 1 to J and 0 to M.
• We record the probability of this forced assignment, w = p1 * p2 (here p1 = P(J=1|a) and p2 = P(M=0|a)), and we weight the new joint sample by w.
Weighted sampling algorithm for computing P(B | J,¬M)

• Set NB, Nc = 0
• Sample the joint, setting the values for J and M; compute the weight, w, of this sample
• Nc = Nc + w
• If B = 1, NB = NB + w
• After many iterations, set:
P(B | J,¬M) = NB / Nc
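A sketch of this weighted (likelihood weighting) estimator under the same hypothetical encoding; here w = p1 * p2 is the probability of the forced evidence values given the sampled parent:

```python
def estimate_by_weighting(n=100_000):
    """Likelihood weighting for P(B | J=1, M=0): evidence fixed, samples weighted."""
    n_c = n_b = 0.0
    for _ in range(n):
        b = int(random.random() < cpt["B"])
        e = int(random.random() < cpt["E"])
        a = int(random.random() < cpt["A"][(b, e)])
        w = bernoulli(cpt["J"][a], 1) * bernoulli(cpt["M"][a], 0)  # w = p1 * p2
        n_c += w                 # Nc = Nc + w
        n_b += w * b             # if B = 1, NB = NB + w
    return n_b / n_c
```

Every sample now contributes, so none are wasted on mismatched evidence.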
Other inference methods

• Convert the network to a polytree.
- In a polytree no two nodes have more than one path between them.
- We can convert arbitrary networks to a polytree by clustering (grouping) nodes. For such a graph there is an algorithm which is linear in the number of nodes.
- However, converting into a polytree can result in an exponential increase in the size of the CPTs.

[Diagrams: the original network, and the same network with nodes clustered to form a polytree]
Important points

• Bayes rule
• Joint distribution, independence, conditional independence
• Attributes of Bayesian networks
• Constructing a Bayesian network
• Inference in Bayesian networks