Transcript of Lecture 29: Conditional Independence, Bayesian networks intro (Ch 6.3, 6.3.1, 6.5, 6.5.1, 6.5.2)

Page 1:

Lecture 29

Conditional Independence, Bayesian networks intro

Ch 6.3, 6.3.1, 6.5, 6.5.1, 6.5.2


Page 2:

Announcement

• Assignment 4 will be out on Wed.
  • Due Wed. April 8
• Final is on April 17
  • Similar format as the midterm (mix of short conceptual questions from the posted list and of problem-solving questions)
• Remember that you have to pass the final in order to pass the course


Page 3:

Lecture Overview

• Recap lecture 28
• More on Conditional Independence
• Bayesian Networks Introduction


Page 4:

Chain Rule

P(X1, …, Xn) = P(X1) P(X2 | X1) P(X3 | X1,X2) … P(Xn | X1, …, Xn−1) = ∏_{i=1}^{n} P(Xi | X1, …, Xi−1)

• Allows representing a Joint Probability Distribution (JPD) as the product of conditional probability distributions
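To make the identity concrete, here is a minimal Python sketch (the toy JPD numbers are invented, not from the lecture) checking that the chain-rule product reproduces every entry of a small joint distribution:

```python
# Toy JPD over three Boolean variables (X, Y, Z); the numbers are invented.
joint = {
    (True, True, True): 0.06,  (True, True, False): 0.24,
    (True, False, True): 0.02, (True, False, False): 0.08,
    (False, True, True): 0.12, (False, True, False): 0.18,
    (False, False, True): 0.12, (False, False, False): 0.18,
}

def marginal(*fixed):
    """P(X1..Xk = fixed): sum the JPD over the remaining variables."""
    k = len(fixed)
    return sum(p for world, p in joint.items() if world[:k] == fixed)

# Chain rule: P(x,y,z) = P(x) * P(y|x) * P(z|x,y)
for (x, y, z), p_xyz in joint.items():
    chain = (marginal(x)
             * marginal(x, y) / marginal(x)
             * marginal(x, y, z) / marginal(x, y))
    assert abs(chain - p_xyz) < 1e-12
print("Chain rule reproduces every JPD entry")
```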

Page 5:

Marginal Independence

• Intuitively: if X ╨ Y, then
  • learning that Y=y does not change your belief in X
  • and this is true for all values y that Y could take
• For example, weather is marginally independent of the result of a coin toss

Formally: X ╨ Y iff P(X=x | Y=y) = P(X=x) for all values x and y

Page 6:

Exploiting marginal independence

• Recall the product rule:
  p(X=x ˄ Y=y) = p(X=x | Y=y) × p(Y=y)
• If X and Y are marginally independent:
  p(X=x | Y=y) = p(X=x)
• Thus we have:
  p(X=x ˄ Y=y) = p(X=x) × p(Y=y)
• In distribution form:
  p(X,Y) = p(X) × p(Y)
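As a quick numeric sketch of the distribution form (invented numbers for the coin/weather example above): under marginal independence the whole joint table can be rebuilt from just the two marginals.

```python
# Marginal independence: P(X,Y) = P(X) * P(Y).
p_heads = 0.5   # P(Coin = heads), invented
p_rain = 0.3    # P(Weather = rain), invented

joint = {(h, r): (p_heads if h else 1 - p_heads) * (p_rain if r else 1 - p_rain)
         for h in (True, False) for r in (True, False)}

print(joint[(True, True)])  # P(heads AND rain) = 0.5 * 0.3 = 0.15
print(sum(joint.values()))  # 1.0: a valid JPD built from only two numbers
```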


Page 7:

Exploiting marginal independence

If X1, …, Xn are all marginally independent, then
P(X1, …, Xn) = P(X1) × P(X2) × … × P(Xn)
so we can specify the full JPD with only n numbers (for Boolean variables) instead of 2^n − 1. Exponentially fewer than the JPD! For n = 30, that is 30 numbers instead of about a billion.

Page 8:

Conditional Independence 

• Intuitively: if X and Y are conditionally independent given Z, then
  • learning that Y=y does not change your belief in X when we already know Z=z
  • and this is true for all values y that Y could take and all values z that Z could take


Page 9:

Example for Conditional Independence

• Whether light l1 is lit (Lit-l1) and the position of switch s2 (Up-s2) are not marginally independent
  • The position of the switch determines whether there is power in the wire w0 connected to the light
• However, Lit-l1 is conditionally independent of Up-s2 given whether there is power in wire w0 (Power-w0)
  • Once we know Power-w0, learning values for Up-s2 does not change our beliefs about Lit-l1

(Diagram: Up-s2 → Power-w0 → Lit-l1)

Page 10:

Lecture Overview

• Recap lecture 28
• More on Conditional Independence
• Bayesian Networks Introduction


Page 11:

Another example of conditionally but not marginally independent variables

• ExamGrade and AssignmentGrade are not marginally independent
  • Students who do well on one typically do well on the other, and vice versa
• But conditional on UnderstoodMaterial, they are independent
  • Variable UnderstoodMaterial is a common cause of variables ExamGrade and AssignmentGrade
  • Knowing UnderstoodMaterial blocks any information AssignmentGrade could provide about ExamGrade (and vice versa)

(Diagram: AssignmentGrade ← UnderstoodMaterial → ExamGrade)

Page 12:

Example: marginally but not conditionally independent

• Two variables can be marginally but not conditionally independent
  • "Smoking At Sensor" S: resident smokes cigarette next to fire sensor
  • "Fire" F: there is a fire somewhere in the building
  • "Alarm" A: the fire alarm rings
• S and F are marginally independent
  • Learning S=true or S=false does not change your belief in F
• But they are not conditionally independent given Alarm
  • E.g., if the alarm rings and you learn S=true, your belief in F decreases

(Diagram: Smoking At Sensor → Alarm ← Fire)
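This pattern (often called "explaining away") can be checked numerically. The sketch below builds a joint from the structure S → A ← F with invented CPT numbers (any alarm that either cause can trigger shows the same effect): S and F are independent by construction, yet once A=true is observed, learning S=true lowers the posterior probability of F.

```python
from itertools import product

# Invented CPT numbers for Smoking-At-Sensor (S), Fire (F), Alarm (A).
p_s, p_f = 0.1, 0.01
p_a = {(True, True): 0.99, (True, False): 0.90,    # P(A=true | S, F)
       (False, True): 0.95, (False, False): 0.001}

joint = {(s, f, a): (p_s if s else 1 - p_s) * (p_f if f else 1 - p_f)
                    * (p_a[(s, f)] if a else 1 - p_a[(s, f)])
         for s, f, a in product((True, False), repeat=3)}

def p(event):
    """Probability that the predicate holds, summed over all worlds."""
    return sum(pr for w, pr in joint.items() if event(*w))

p_f_given_a = p(lambda s, f, a: f and a) / p(lambda s, f, a: a)
p_f_given_as = p(lambda s, f, a: f and a and s) / p(lambda s, f, a: a and s)
print(p_f_given_a, p_f_given_as)  # ~0.096 vs ~0.011: S=true "explains away" F
```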

Page 13:

Conditional vs. Marginal Independence

Two variables can be:

• Conditionally but not marginally independent
  • ExamGrade and AssignmentGrade
  • ExamGrade and AssignmentGrade given UnderstoodMaterial
  • Lit(l1) and Up(s2)
  • Lit(l1) and Up(s2) given Power(w0)

• Marginally but not conditionally independent
  • SmokingAtSensor and Fire
  • SmokingAtSensor and Fire given Alarm

• Both marginally and conditionally independent
  • CanucksWinStanleyCup and Lit(l1)
  • CanucksWinStanleyCup and Lit(l1) given Power(w0)

• Neither marginally nor conditionally independent
  • Temperature and Cloudiness
  • Temperature and Cloudiness given Wind

Page 14:

Exploiting Conditional Independence

• Example 1: Boolean variables A, B, C
  • C is conditionally independent of A given B
  • We can then rewrite P(C | A,B) as P(C | B)

P(A,B,C) = P(A) P(B|A) P(C|A,B) = P(A) P(B|A) P(C|B)

Page 15:

Exploiting Conditional Independence

• Example 2: Boolean variables A, B, C, D
  • D is conditionally independent of A given C
  • D is conditionally independent of B given C
  • We can then rewrite P(D | A,B,C) as P(D | B,C)
  • And can further rewrite P(D | B,C) as P(D | C)

P(A,B,C,D) = P(A) P(B|A) P(C|A,B) P(D|A,B,C) = P(A) P(B|A) P(C|A,B) P(D|C)

Page 16:

Exploiting Conditional Independence

• E.g., by the chain rule:
  P(A,B,C,D) = P(A) P(B|A) P(C|A,B) P(D|A,B,C)
• If, for instance, D is conditionally independent of A and B given C, we can rewrite the above as
  P(A,B,C,D) = P(A) P(B|A) P(C|A,B) P(D|C)
• Under independence we gain compactness
  • The chain rule allows us to write the JPD as a product of conditional distributions
  • Conditional independence allows us to write them compactly
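Counting table entries makes the gain concrete. A back-of-the-envelope sketch, with one free parameter P(X=true | parent row) per row of each Boolean conditional table:

```python
def rows(n_parents):
    """Free parameters in a Boolean conditional table: one per parent row."""
    return 2 ** n_parents

# Plain chain rule P(A) P(B|A) P(C|A,B) P(D|A,B,C):
print(rows(0) + rows(1) + rows(2) + rows(3))  # 15, same as the raw JPD (2**4 - 1)
# With D independent of A and B given C: P(A) P(B|A) P(C|A,B) P(D|C):
print(rows(0) + rows(1) + rows(2) + rows(1))  # 9
```

Note that the chain rule alone buys nothing (15 numbers either way); the savings come entirely from the conditional independence.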

Page 17:

Bayesian (or Belief) Networks

• Bayesian networks and their extensions are Representation & Reasoning systems explicitly designed to exploit independence in probabilistic reasoning


Page 18:

Lecture Overview

• Recap lecture 28
• More on Conditional Independence
• Bayesian Networks Introduction


Page 19:

Bayesian Network Motivation

• We want a representation and reasoning system that is based on conditional (and marginal) independence
  • Compact yet expressive representation
  • Efficient reasoning procedures
• Bayesian (Belief) Networks are such a representation
  • Named after Thomas Bayes (ca. 1702–1761)
  • Term coined in 1985 by Judea Pearl (1936–)
  • Their invention changed the primary focus of AI from logic to probability!

(Photos: Thomas Bayes, Judea Pearl)

In 2012 Pearl received the very prestigious ACM Turing Award for his contributions to Artificial Intelligence!

Page 20:

Bayesian Networks: Intuition

• A graphical representation for a joint probability distribution
  • Nodes are random variables
  • Directed edges between nodes reflect dependence
• Some informal examples:

(Diagrams: AssignmentGrade ← UnderstoodMaterial → ExamGrade; Smoking At Sensor → Alarm ← Fire; Up-s2 → Power-w0 → Lit-l1)

Page 21:

Belief (or Bayesian) networks

Def. A Belief network consists of

• a directed, acyclic graph (DAG) where each node is associated with a random variable Xi
• a domain for each variable Xi
• a set of conditional probability distributions for each node Xi given its parents Pa(Xi) in the graph:
  P(Xi | Pa(Xi))

• The parents Pa(Xi) of a variable Xi are the variables that Xi directly depends on
• A Bayesian network is a compact representation of the JPD for a set of variables (X1, …, Xn):
  P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Pa(Xi))
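A minimal sketch of this defining product (an illustrative data layout, not a library API): each node stores its parent tuple and a function returning P(Xi = value | parent values).

```python
# P(X1, ..., Xn) = product over i of P(Xi | Pa(Xi)).
def joint_prob(network, assignment):
    """network: var -> (parents, p) with p(parent_values, value) a probability."""
    prob = 1.0
    for var, (parents, p) in network.items():
        pa_vals = tuple(assignment[q] for q in parents)
        prob *= p(pa_vals, assignment[var])
    return prob

# Two-node example A -> B, with invented numbers:
# P(A=true)=0.2, P(B=true|A=true)=0.9, P(B=true|A=false)=0.1.
net = {
    "A": ((), lambda pa, v: 0.2 if v else 0.8),
    "B": (("A",), lambda pa, v: (0.9 if pa[0] else 0.1) if v
                                else (0.1 if pa[0] else 0.9)),
}
print(joint_prob(net, {"A": True, "B": True}))  # 0.2 * 0.9 = 0.18
```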


Page 22:

Bayesian Networks: Definition

• Discrete Bayesian networks:
  • Domain of each variable is finite
  • Conditional probability distribution is a conditional probability table (CPT)
  • We will assume this discrete case, but everything we say about independence (marginal & conditional) carries over to the continuous case

• Def. A Belief network consists of
  – a directed, acyclic graph (DAG) where each node is associated with a random variable Xi
  – a domain for each variable Xi
  – a set of conditional probability distributions for each node Xi given its parents Pa(Xi) in the graph:
    P(Xi | Pa(Xi))


Page 23:

How to build a Bayesian network

1. Define a total order over the random variables: (X1, …, Xn)

2. Apply the chain rule:
   P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | X1, …, Xi−1)

3. For each Xi, select the smallest set Pa(Xi) of its predecessors in the total order such that
   P(Xi | X1, …, Xi−1) = P(Xi | Pa(Xi))
   (i.e., Xi is conditionally independent of all its other predecessors given Pa(Xi))

4. Then we can rewrite
   P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Pa(Xi))

• This is a compact representation of the initial JPD
• It is a factorization of the JPD based on existing conditional independencies among the variables

(A brute-force illustration of step 3 is sketched below.)
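For intuition about step 3, here is an assumed brute-force sketch (exponential in the number of variables, toy problems only): it searches for the smallest predecessor subset whose conditional matches the full one, then checks it on the earlier light-circuit example, assuming the chain structure Up-s2 → Power-w0 → Lit-l1 with invented CPT numbers.

```python
from itertools import combinations, product

def marginal(joint, idxs):
    """Sum a JPD (tuple of Boolean values -> prob) down to the variables at idxs."""
    out = {}
    for world, p in joint.items():
        key = tuple(world[i] for i in idxs)
        out[key] = out.get(key, 0.0) + p
    return out

def parents(joint, i, tol=1e-9):
    """Smallest subset S of predecessors 0..i-1 with P(X_i | X_0..X_{i-1}) = P(X_i | S)."""
    preds = list(range(i))
    pred_m = marginal(joint, preds)          # P(all predecessors)
    full_m = marginal(joint, preds + [i])    # P(all predecessors, X_i)
    for k in range(len(preds) + 1):          # try the smallest sets first
        for subset in combinations(preds, k):
            sub_m = marginal(joint, list(subset))
            subx_m = marginal(joint, list(subset) + [i])
            ok = True
            for world in product((True, False), repeat=i):
                if pred_m.get(world, 0.0) < tol:
                    continue                 # this condition never occurs
                key = tuple(world[j] for j in subset)
                for v in (True, False):
                    lhs = full_m.get(world + (v,), 0.0) / pred_m[world]
                    rhs = subx_m.get(key + (v,), 0.0) / sub_m[key]
                    if abs(lhs - rhs) > tol:
                        ok = False
                        break
                if not ok:
                    break
            if ok:
                return subset
    return tuple(preds)

# Order: 0 = Up-s2, 1 = Power-w0, 2 = Lit-l1; CPT numbers are invented.
joint = {}
for u, w, l in product((True, False), repeat=3):
    joint[(u, w, l)] = 0.5 * (0.9 if w == u else 0.1) * (0.95 if l == w else 0.05)

print(parents(joint, 2))  # (1,): Lit-l1 needs only Power-w0 as parent
```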

Page 24:

How to build a Bayesian network (cont’d)

5. Construct the Bayesian Net (BN):
   • Nodes are the random variables
   • Draw a directed arc from each variable in Pa(Xi) to Xi
   • Define a conditional probability table (CPT) for each variable Xi: P(Xi | Pa(Xi))

Page 25:

Example for BN construction: Fire Diagnosis

You want to diagnose whether there is a fire in a building:
• You can receive reports (possibly noisy) about whether everyone is leaving the building
• If everyone is leaving, this may have been caused by a fire alarm
• If there is a fire alarm, it may have been caused by a fire or by tampering
• If there is a fire, there may be smoke

Start by choosing the random variables for this domain; here all are Boolean:
• Tampering (T) is true when the alarm has been tampered with
• Fire (F) is true when there is a fire
• Alarm (A) is true when there is an alarm
• Smoke (S) is true when there is smoke
• Leaving (L) is true if there are lots of people leaving the building
• Report (R) is true if the sensor reports that lots of people are leaving the building

Next, apply the procedure described earlier.

Page 26:

Example for BN construction: Fire Diagnosis

1. Define a total ordering of variables:
   • Let's choose an order that follows the causal sequence of events:
     Fire (F), Tampering (T), Alarm (A), Smoke (S), Leaving (L), Report (R)

2. Apply the chain rule:
   P(F,T,A,S,L,R) = P(F) P(T | F) P(A | F,T) P(S | F,T,A) P(L | F,T,A,S) P(R | F,T,A,S,L)

We will do steps 3, 4 and 5 together, for each element P(Xi | X1, …, Xi−1) of the factorization:

3. For each variable Xi, choose the parents Pa(Xi) by evaluating conditional independencies, so that
   P(Xi | X1, …, Xi−1) = P(Xi | Pa(Xi))

4. Rewrite
   P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Pa(Xi))

5. Construct the Bayesian network

Page 27:

Fire Diagnosis Example

P(F) P(T | F) P(A | F,T) P(S | F,T,A) P(L | F,T,A,S) P(R | F,T,A,S,L)

• Fire (F) is the first variable in the ordering, X1. It does not have parents.

(Network so far: Fire)

Page 28:

Example

P(F) P(T | F) P(A | F,T) P(S | F,T,A) P(L | F,T,A,S) P(R | F,T,A,S,L)

• Tampering (T) is independent of Fire (learning that one is true would not change your beliefs about the probability of the other), so P(T | F) simplifies to P(T)

(Network so far: Tampering, Fire)

Page 29:

Fire Diagnosis Example

P(F) P(T) P(A | F,T) P(S | F,T,A) P(L | F,T,A,S) P(R | F,T,A,S,L)

• Alarm (A) depends on both Fire and Tampering: it could be caused by either or both

(Network so far: Tampering → Alarm ← Fire)

Page 30:

Fire Diagnosis Example

P(F) P(T) P(A | F,T) P(S | F,T,A) P(L | F,T,A,S) P(R | F,T,A,S,L)

• Smoke (S) is caused by Fire, and so is independent of Tampering and Alarm given whether there is a Fire: P(S | F,T,A) simplifies to P(S | F)

(Network so far: Tampering → Alarm ← Fire; Fire → Smoke)

Page 31:

Example

P(F) P(T) P(A | F,T) P(S | F) P(L | F,T,A,S) P(R | F,T,A,S,L)

• Leaving (L) is caused by Alarm, and thus is independent of the other variables given Alarm: P(L | F,T,A,S) simplifies to P(L | A)

(Network so far: Tampering → Alarm ← Fire; Fire → Smoke; Alarm → Leaving)

Page 32:

Fire Diagnosis Example

P(F) P(T) P(A | F,T) P(S | F) P(L | A) P(R | F,T,A,S,L)

• Report (R) is caused by Leaving, and thus is independent of the other variables given Leaving: P(R | F,T,A,S,L) simplifies to P(R | L)

(Network so far: Tampering → Alarm ← Fire; Fire → Smoke; Alarm → Leaving → Report)

Page 33:

Fire Diagnosis Example

P(F) P(T) P(A | F,T) P(S | F) P(L | A) P(R | L)

The result is the Bayesian network above, and its corresponding, very compact factorization of the original JPD:

P(F,T,A,S,L,R) = P(F) P(T) P(A | F,T) P(S | F) P(L | A) P(R | L)

(Network: Tampering → Alarm ← Fire; Fire → Smoke; Alarm → Leaving → Report)
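To see the compactness numerically, here is a hedged sketch that evaluates the factored JPD for one full assignment. Every CPT number below is an invented placeholder (the lecture gives none), but the structure is exactly the factorization above: it needs 1+1+4+2+2+2 = 12 numbers, versus 2^6 − 1 = 63 for the raw JPD.

```python
def bern(p_true, value):
    """P(V = value) for a Boolean V with P(V = true) = p_true."""
    return p_true if value else 1.0 - p_true

# Invented placeholder CPT parameters:
p_fire, p_tampering = 0.01, 0.02
p_alarm = {(True, True): 0.50, (True, False): 0.99,
           (False, True): 0.85, (False, False): 0.0001}  # P(A=true | F, T)
p_smoke = {True: 0.90, False: 0.01}                      # P(S=true | F)
p_leaving = {True: 0.88, False: 0.001}                   # P(L=true | A)
p_report = {True: 0.75, False: 0.01}                     # P(R=true | L)

def joint(f, t, a, s, l, r):
    """P(F,T,A,S,L,R) = P(F) P(T) P(A|F,T) P(S|F) P(L|A) P(R|L)"""
    return (bern(p_fire, f) * bern(p_tampering, t)
            * bern(p_alarm[(f, t)], a) * bern(p_smoke[f], s)
            * bern(p_leaving[a], l) * bern(p_report[l], r))

# Probability of: fire, no tampering, alarm, smoke, people leaving, a report.
print(joint(True, False, True, True, True, True))
```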

Page 34:

Learning Goals For Probability so far

• Define and give examples of random variables, their domains and probability distributions
• Calculate the probability of a proposition f given µ(w) for the set of possible worlds
• Define a joint probability distribution (JPD)
• Given a JPD:
  • Marginalize over specific variables
  • Compute distributions over any subset of the variables
• Prove the formula to compute conditional probability P(h|e)
• Use inference by enumeration to compute joint posterior probability distributions over any subset of variables given evidence
• Derive and use Bayes Rule
• Derive the Chain Rule
• Define and use marginal independence
• Define and use conditional independence
• Build a Bayesian Network for a given domain