Lecture 29: Conditional Independence, Bayesian Networks Intro (Ch 6.3, 6.3.1, 6.5, 6.5.1, 6.5.2)
Lecture 29
Conditional Independence, Bayesian Networks Intro
Ch 6.3, 6.3.1, 6.5, 6.5.1, 6.5.2
1
Announcement
• Assignment 4 will be out on Wed.
• Due Wed. April 8
• Final is on April 17
• Similar format as the midterm (a mix of short conceptual questions from the posted list and problem-solving questions)
• Remember that you have to pass the final in order to pass the course
2
Lecture Overview
• Recap lecture 28
• More on Conditional Independence
• Bayesian Networks Introduction
3
Chain Rule
4
• Allows representing a Joint Probability Distribution (JPD) as the product of conditional probability distributions:
P(X1, …, Xn) = P(X1) P(X2 | X1) ⋯ P(Xn | X1, …, Xn-1)
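As a sanity check, the chain rule can be verified numerically on a small made-up joint distribution (the random weights below are for illustration only, not from the lecture):

```python
import itertools
import random

random.seed(0)
# A made-up joint distribution over three Boolean variables X1, X2, X3.
worlds = list(itertools.product([0, 1], repeat=3))
weights = [random.random() for _ in worlds]
z = sum(weights)
joint = {w: wt / z for w, wt in zip(worlds, weights)}

def marginal(assign):
    """Probability of a partial assignment, e.g. {0: 1} means X1=1."""
    return sum(p for w, p in joint.items()
               if all(w[i] == v for i, v in assign.items()))

# Chain rule: P(x1,x2,x3) = P(x1) * P(x2 | x1) * P(x3 | x1,x2)
for (x1, x2, x3), p in joint.items():
    chain = (marginal({0: x1})
             * marginal({0: x1, 1: x2}) / marginal({0: x1})
             * p / marginal({0: x1, 1: x2}))
    assert abs(chain - p) < 1e-12
print("chain rule verified")
```

The identity holds for any ordering of the variables, which is what the Bayesian network construction later in the lecture exploits.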
Marginal Independence
• Intuitively: if X ╨ Y, then
• learning that Y=y does not change your belief in X
• and this is true for all values y that Y could take
• For example, the weather is marginally independent of the result of a coin toss
5
Exploiting marginal independence
• Recall the product rule
p(X=x ˄ Y=y) = p(X=x | Y=y) × p(Y=y)
• If X and Y are marginally independent,
p(X=x | Y=y) = p(X=x)
• Thus we have
p(X=x ˄ Y=y) = p(X=x) × p(Y=y)
• In distribution form
p(X,Y) = p(X) × p(Y)
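In code, marginal independence means the whole joint table can be rebuilt from the two marginals alone; a minimal sketch of the weather/coin example (the numbers are made up for illustration):

```python
# Marginal distributions (made-up numbers for illustration).
p_weather = {"sun": 0.7, "rain": 0.3}
p_coin = {"heads": 0.5, "tails": 0.5}

# Marginal independence: p(W=w, C=c) = p(W=w) * p(C=c)
joint = {(w, c): pw * pc
         for w, pw in p_weather.items()
         for c, pc in p_coin.items()}

# The product is a valid joint (sums to 1), and conditioning on the coin
# leaves the weather marginal unchanged: p(W=w | C=c) = p(W=w).
total = sum(joint.values())
p_sun_given_heads = joint[("sun", "heads")] / p_coin["heads"]
print(total)               # ≈ 1.0
print(p_sun_given_heads)   # ≈ 0.7
```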
6
Exploiting marginal independence
Exponentially fewer parameters than the JPD! (n marginally independent Boolean variables need only n numbers, instead of the 2^n − 1 needed for a full JPD)
7
Conditional Independence
• Intuitively: if X and Y are conditionally independent given Z, then
• learning that Y=y does not change your belief in X when we already know Z=z
• and this is true for all values y that Y could take and all values z that Z could take
8
Example for Conditional Independence
• Whether light l1 is lit (Lit-l1) and the position of switch s2 (Up-s2) are not marginally independent
• The position of the switch determines whether there is power in the wire w0 connected to the light
• However, whether light l1 is lit (Lit-l1) is conditionally independent from the position of switch s2 (Up-s2) given whether there is power in wire w0
• Once we know Power-w0, learning values for Up-s2 does not change our beliefs about Lit-l1
• I.e., Lit-l1 is conditionally independent of Up-s2 given Power-w0
[Diagram: Up-s2 → Power-w0 → Lit-l1]
9
Lecture Overview
• Recap lecture 28
• More on Conditional Independence
• Bayesian Networks Introduction
10
Another example of conditionally but not marginally independent variables
• ExamGrade and AssignmentGrade are not marginally independent
• Students who do well on one typically do well on the other, and vice versa
• But conditional on UnderstoodMaterial, they are independent
• Variable UnderstoodMaterial is a common cause of variables ExamGrade and AssignmentGrade
• Knowing UnderstoodMaterial shields any information we could get from AssignmentGrade toward ExamGrade (and vice versa)
[Diagram: AssignmentGrade ← UnderstoodMaterial → ExamGrade]
11
Example: marginally but not conditionally independent
• Two variables can be marginally but not conditionally independent
• “Smoking At Sensor” S: resident smokes cigarette next to fire sensor
• “Fire” F: there is a fire somewhere in the building
• “Alarm” A: the fire alarm rings
• S and F are marginally independent
• Learning S=true or S=false does not change your belief in F
• But they are not conditionally independent given Alarm
• E.g., if the alarm rings and you learn S=true, your belief in F decreases
[Diagram: Smoking At Sensor → Alarm ← Fire]
12
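This “explaining away” effect can be checked numerically. The sketch below builds a joint over S, F, A in which S and F are independent a priori and the alarm depends on both; all CPT numbers are made up for illustration:

```python
import itertools

# Priors for Smoking-at-sensor (S) and Fire (F), and P(A=true | S, F).
# All numbers are invented for this illustration.
p_S, p_F = 0.1, 0.01
p_A = {(True, True): 0.99, (True, False): 0.9,
       (False, True): 0.95, (False, False): 0.01}

def joint(s, f, a):
    """Joint built so that S and F are marginally independent."""
    pa = p_A[(s, f)] if a else 1 - p_A[(s, f)]
    return (p_S if s else 1 - p_S) * (p_F if f else 1 - p_F) * pa

def prob(**fixed):
    """Probability of a partial assignment, e.g. prob(s=True, a=True)."""
    return sum(joint(s, f, a)
               for s, f, a in itertools.product([False, True], repeat=3)
               if all({'s': s, 'f': f, 'a': a}[k] == v
                      for k, v in fixed.items()))

# Marginally independent: P(F | S=true) equals P(F).
print(prob(f=True, s=True) / prob(s=True))                  # ≈ 0.01 = P(F)
# But not conditionally independent given A: once the alarm rings,
# learning S=true lowers the belief in fire ("explaining away").
print(prob(f=True, a=True) / prob(a=True))                  # ≈ 0.089
print(prob(f=True, s=True, a=True) / prob(s=True, a=True))  # ≈ 0.011, lower!
```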
Conditional vs. Marginal Independence
Two variables can be:
• Conditionally but not marginally independent
• ExamGrade and AssignmentGrade
• ExamGrade and AssignmentGrade given UnderstoodMaterial
• Lit(l1) and Up(s2)
• Lit(l1) and Up(s2) given Power(w0)
• Marginally but not conditionally independent
• SmokingAtSensor and Fire
• SmokingAtSensor and Fire given Alarm
• Both marginally and conditionally independent
• CanucksWinStanleyCup and Lit(l1)
• CanucksWinStanleyCup and Lit(l1) given Power(w0)
• Neither marginally nor conditionally independent
• Temperature and Cloudiness
• Temperature and Cloudiness given Wind
13
Exploiting Conditional Independence
• Example 1: Boolean variables A, B, C
• C is conditionally independent of A given B
• We can then rewrite P(C | A,B) as P(C | B)
14
Exploiting Conditional Independence
• Example 2: Boolean variables A, B, C, D
• D is conditionally independent of A given C
• D is conditionally independent of B given C
• We can then rewrite P(D | A,B,C) as P(D | B,C)
• And can further rewrite P(D | B,C) as P(D | C)
15
Exploiting Conditional Independence
• E.g., by the chain rule:
P(A,B,C,D) = P(A) P(B | A) P(C | A,B) P(D | A,B,C)
• If, for instance, D is conditionally independent of A and B given C, we can rewrite the above as
P(A,B,C,D) = P(A) P(B | A) P(C | A,B) P(D | C)
• Under independence we gain compactness
• The chain rule allows us to write the JPD as a product of conditional distributions
• Conditional independence allows us to write them compactly
16
Bayesian (or Belief) Networks
• Bayesian networks and their extensions are Representation & Reasoning systems explicitly defined to exploit independence in probabilistic reasoning
17
Lecture Overview
• Recap lecture 28
• More on Conditional Independence
• Bayesian Networks Introduction
18
Bayesian Network Motivation
• We want a representation and reasoning system that is based on conditional (and marginal) independence
• Compact yet expressive representation
• Efficient reasoning procedures
• Bayesian (Belief) Networks are such a representation
• Named after Thomas Bayes (ca. 1702–1761)
• Term coined in 1985 by Judea Pearl (1936–)
• Their invention changed the primary focus of AI from logic to probability!
[Portraits: Thomas Bayes, Judea Pearl]
In 2012 Pearl received the very prestigious ACM Turing Award for his contributions to Artificial Intelligence!
19
Bayesian Networks: Intuition
• A graphical representation for a joint probability distribution
• Nodes are random variables
• Directed edges between nodes reflect dependence
• Some informal examples:
[Diagrams: AssignmentGrade ← UnderstoodMaterial → ExamGrade; Smoking At Sensor → Alarm ← Fire; Up-s2 → Power-w0 → Lit-l1]
20
Belief (or Bayesian) networks
Def. A Belief network consists of
• a directed, acyclic graph (DAG) where each node is associated with a random variable Xi
• A domain for each variable Xi
• a set of conditional probability distributions for each node Xi given its parents Pa(Xi) in the graph
P (Xi | Pa(Xi))
• The parents Pa(Xi) of a variable Xi are the variables Xi directly depends on
• A Bayesian network is a compact representation of the JPD for a set of variables (X1, …, Xn)
P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Pa(Xi))
21
Bayesian Networks: Definition
• Discrete Bayesian networks:
• Domain of each variable is finite
• Conditional probability distribution is a conditional probability table
• We will assume this discrete case
But everything we say about independence (marginal & conditional) carries over to the continuous case
• Def. A Belief network consists of
– a directed, acyclic graph (DAG) where each node is associated with a random variable Xi
– A domain for each variable Xi
– a set of conditional probability distributions for each node Xi given its parents Pa(Xi) in the graph
P (Xi | Pa(Xi))
22
How to build a Bayesian network
1. Define a total order over the random variables: (X1, …, Xn)
2. Apply the chain rule
P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | X1, …, Xi-1)
3. For each Xi, select the smallest set Pa(Xi) of its predecessors in the total order such that
P(Xi | X1, …, Xi-1) = P(Xi | Pa(Xi))
i.e., Xi is conditionally independent of all its other predecessors given Pa(Xi)
4. Then we can rewrite
P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Pa(Xi))
• This is a compact representation of the initial JPD
• A factorization of the JPD based on existing conditional independencies among the variables
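Steps 1–4 can be sketched directly in code: given a full JPD and a total order, brute-force search for the smallest predecessor set that preserves each conditional. The three-variable joint and all names below are mine, invented for illustration:

```python
import itertools

def prob(joint, fixed):
    """Probability of a partial assignment, e.g. {0: 1} means X0=1."""
    return sum(p for w, p in joint.items()
               if all(w[i] == v for i, v in fixed.items()))

def cond(joint, i, given):
    """P(Xi=1 | given); enough for Booleans since P(Xi=0) = 1 - P(Xi=1)."""
    denom = prob(joint, given)
    return prob(joint, {**given, i: 1}) / denom if denom else 0.0

def smallest_parents(joint, i):
    """Step 3: smallest subset Pa of predecessors {0..i-1} with
    P(Xi | X0..Xi-1) = P(Xi | Pa) for every predecessor assignment."""
    preds = range(i)
    for size in range(i + 1):          # try smaller parent sets first
        for pa in itertools.combinations(preds, size):
            if all(abs(cond(joint, i, dict(zip(preds, vals)))
                       - cond(joint, i, {j: vals[j] for j in pa})) < 1e-9
                   for vals in itertools.product([0, 1], repeat=i)):
                return set(pa)

# Made-up joint over (X0, X1, X2): X1 is independent of everything,
# X2 depends only on X0.
p0, p1 = 0.3, 0.6
p2 = {1: 0.9, 0: 0.2}                  # P(X2=1 | X0)
joint = {(a, b, c): (p0 if a else 1 - p0)
                    * (p1 if b else 1 - p1)
                    * (p2[a] if c else 1 - p2[a])
         for a, b, c in itertools.product([0, 1], repeat=3)}

print([smallest_parents(joint, i) for i in range(3)])  # [set(), set(), {0}]
```

This is exponential in the number of variables, so it is only a conceptual illustration of the procedure; in practice the parent sets come from causal/domain knowledge, as in the fire diagnosis example that follows.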
23
How to build a Bayesian network (cont’d)
5. Construct the Bayesian Net (BN)
• Nodes are the random variables
• Draw a directed arc from each variable in Pa(Xi) to Xi
• Define a conditional probability table (CPT) for each variable Xi:
• P(Xi | Pa(Xi))
24
Example for BN construction: Fire Diagnosis
You want to diagnose whether there is a fire in a building
• You can receive reports (possibly noisy) about whether everyone is leaving the building
• If everyone is leaving, this may have been caused by a fire alarm
• If there is a fire alarm, it may have been caused by a fire or by tampering
• If there is a fire, there may be smoke
Start by choosing the random variables for this domain, here all are Boolean:
• Tampering (T) is true when the alarm has been tampered with
• Fire (F) is true when there is a fire
• Alarm (A) is true when there is an alarm
• Smoke (S) is true when there is smoke
• Leaving (L) is true if there are lots of people leaving the building
• Report (R) is true if the sensor reports that lots of people are leaving the building
Next, apply the procedure described earlier
25
Example for BN construction: Fire Diagnosis
1. Define a total ordering of variables:
- Let's choose an order that follows the causal sequence of events
- Fire (F), Tampering (T), Alarm (A), Smoke (S), Leaving (L), Report (R)
2. Apply the chain rule
P(F,T,A,S,L,R) = P(F) P(T | F) P(A | F,T) P(S | F,T,A) P(L | F,T,A,S) P(R | F,T,A,S,L)
We will do steps 3, 4 and 5 together, for each element P(Xi | X1, … ,Xi-1) of the factorization
3. For each variable (Xi), choose the parents Parents(Xi) by evaluating conditional independencies, so that
P(Xi | X1, … ,Xi-1) = P (Xi | Parents (Xi))
4. Rewrite
P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Parents(Xi))
5. Construct the Bayesian network
26
P(F) P(T | F) P(A | F,T) P(S | F,T,A) P(L | F,T,A,S) P(R | F,T,A,S,L)
Fire (F) is the first variable in the ordering, X1. It does not have parents.
Fire Diagnosis Example
Fire
27
P(F) P(T | F) P(A | F,T) P(S | F,T,A) P(L | F,T,A,S) P(R | F,T,A,S,L)
• Tampering (T) is independent of fire (learning that one is true would not change your beliefs about the probability of the other)
Example
Tampering Fire
28
P(F) P(T) P(A | F,T) P(S | F,T,A) P(L | F,T,A,S) P(R | F,T,A,S,L)
• Alarm (A) depends on both Fire and Tampering: it could be caused by either or both
Fire Diagnosis Example
Tampering Fire
Alarm
29
P(F) P(T) P(A | F,T) P(S | F,T,A) P(L | F,T,A,S) P(R | F,T,A,S,L)
• Smoke (S) is caused by Fire, and so is independent of Tampering and Alarm given whether there is a Fire
Fire Diagnosis Example
Tampering Fire
Alarm Smoke
30
P(F) P(T) P(A | F,T) P(S | F) P(L | F,T,A,S) P(R | F,T,A,S,L)
• Leaving (L) is caused by Alarm, and thus is independent of the other variables given Alarm
Example
Tampering Fire
Alarm Smoke
Leaving
31
P(F) P(T) P(A | F,T) P(S | F) P(L | A) P(R | F,T,A,S,L)
• Report (R) is caused by Leaving, and thus is independent of the other variables given Leaving
Fire Diagnosis Example
Tampering Fire
Alarm Smoke
Leaving
Report
32
P(F) P(T) P(A | F,T) P(S | F) P(L | A) P(R | L)
The result is the Bayesian network above, and its corresponding, very compact factorization of the original JPD:
P(F,T,A,S,L,R) = P(F) P(T) P(A | F,T) P(S | F) P(L | A) P(R | L)
Fire Diagnosis Example
Tampering Fire
Alarm Smoke
Leaving
Report
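The factorization can be turned into a small program that reconstructs the full JPD from six local distributions; the CPT numbers below are illustrative, not given in the lecture:

```python
import itertools

# Joint P(F,T,A,S,L,R) = P(F) P(T) P(A|F,T) P(S|F) P(L|A) P(R|L).
# Each entry gives the probability that the child variable is true;
# all numbers are made up for illustration.
p_F = 0.01
p_T = 0.02
p_A = {(True, True): 0.5, (True, False): 0.99,
       (False, True): 0.85, (False, False): 0.0001}   # P(A=true | F, T)
p_S = {True: 0.9, False: 0.01}                        # P(S=true | F)
p_L = {True: 0.88, False: 0.001}                      # P(L=true | A)
p_R = {True: 0.75, False: 0.01}                       # P(R=true | L)

def bern(p, value):
    """Probability of a Boolean value given P(true)."""
    return p if value else 1.0 - p

def joint(f, t, a, s, l, r):
    """Product of the six local factors, per the factorization above."""
    return (bern(p_F, f) * bern(p_T, t) * bern(p_A[(f, t)], a)
            * bern(p_S[f], s) * bern(p_L[a], l) * bern(p_R[l], r))

# The factorization defines a full JPD: it sums to 1 over all 2^6 worlds,
# yet it only needed 12 numbers instead of 2^6 - 1 = 63.
total = sum(joint(*w) for w in itertools.product([False, True], repeat=6))
print(round(total, 10))  # 1.0
```

Any query, e.g. P(Fire | Report=true), can then be answered by summing this joint over the unobserved variables, exactly as in inference by enumeration.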
33
Learning Goals For Probability so far
• Define and give examples of random variables, their domains and probability distributions
• Calculate the probability of a proposition f given µ(w) for the set of possible worlds
• Define a joint probability distribution (JPD)
• Given a JPD, marginalize over specific variables and compute distributions over any subset of the variables
• Prove the formula to compute conditional probability P(h|e)
• Use inference by enumeration to compute joint posterior probability distributions over any subset of variables given evidence
• Derive and use Bayes Rule
• Derive the Chain Rule
• Define and use marginal independence
• Define and use conditional independence
• Build a Bayesian Network for a given domain
34