Transcript of Lecture 29: Conditional Independence, Bayesian networks intro (Ch 6.3, 6.3.1, 6.5, 6.5.1, 6.5.2)

Page 1:

Lecture 29

Conditional Independence, Bayesian networks intro

Ch 6.3, 6.3.1, 6.5, 6.5.1, 6.5.2


Page 2:

Announcement

• Assignment 4 will be out on Wed.
  • Due Wed. April 8
• Final is on April 17
  • Similar format as the midterm (mix of short conceptual questions from the posted list and of problem-solving questions)
• Remember that you have to pass the final in order to pass the course


Page 3:

Lecture Overview

• Recap lecture 28
• More on Conditional Independence
• Bayesian Networks Introduction


Page 4:

Chain Rule

P(X1, …, Xn) = P(X1) P(X2 | X1) P(X3 | X1,X2) … P(Xn | X1, …, Xn−1) = ∏_{i=1}^{n} P(Xi | X1, …, Xi−1)

• Allows representing a Joint Probability Distribution (JPD) as the product of conditional probability distributions
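To make the identity concrete, here is a minimal Python sketch (the toy JPD numbers are invented, not from the lecture) checking that the chain-rule product reproduces every entry of a small joint distribution:

```python
# Toy JPD over three Boolean variables (X, Y, Z); the numbers are invented.
joint = {
    (True, True, True): 0.06,  (True, True, False): 0.24,
    (True, False, True): 0.02, (True, False, False): 0.08,
    (False, True, True): 0.12, (False, True, False): 0.18,
    (False, False, True): 0.12, (False, False, False): 0.18,
}

def marginal(*fixed):
    """P(X1..Xk = fixed): sum the JPD over the remaining variables."""
    k = len(fixed)
    return sum(p for world, p in joint.items() if world[:k] == fixed)

# Chain rule: P(x,y,z) = P(x) * P(y|x) * P(z|x,y)
for (x, y, z), p_xyz in joint.items():
    chain = (marginal(x)
             * marginal(x, y) / marginal(x)
             * marginal(x, y, z) / marginal(x, y))
    assert abs(chain - p_xyz) < 1e-12
print("Chain rule reproduces every JPD entry")
```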

Page 5:

Marginal Independence

• Intuitively: if X ╨ Y, then
  • learning that Y=y does not change your belief in X
  • and this is true for all values y that Y could take
• For example, weather is marginally independent of the result of a coin toss

Formally: X ╨ Y iff P(X=x | Y=y) = P(X=x) for all values x and y

Page 6:

Exploiting marginal independence

• Recall the product rule:
  p(X=x ˄ Y=y) = p(X=x | Y=y) × p(Y=y)
• If X and Y are marginally independent:
  p(X=x | Y=y) = p(X=x)
• Thus we have:
  p(X=x ˄ Y=y) = p(X=x) × p(Y=y)
• In distribution form:
  p(X,Y) = p(X) × p(Y)
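As a quick numeric sketch of the distribution form (invented numbers for the coin/weather example above): under marginal independence the whole joint table can be rebuilt from just the two marginals.

```python
# Marginal independence: P(X,Y) = P(X) * P(Y).
p_heads = 0.5   # P(Coin = heads), invented
p_rain = 0.3    # P(Weather = rain), invented

joint = {(h, r): (p_heads if h else 1 - p_heads) * (p_rain if r else 1 - p_rain)
         for h in (True, False) for r in (True, False)}

print(joint[(True, True)])  # P(heads AND rain) = 0.5 * 0.3 = 0.15
print(sum(joint.values()))  # 1.0: a valid JPD built from only two numbers
```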


Page 7:

Exploiting marginal independence

If X1, …, Xn are all marginally independent, then
P(X1, …, Xn) = P(X1) × P(X2) × … × P(Xn)
so we can specify the full JPD with only n numbers (for Boolean variables) instead of 2^n − 1. Exponentially fewer than the JPD! For n = 30, that is 30 numbers instead of about a billion.

Page 8:

Conditional Independence 

• Intuitively: if X and Y are conditionally independent given Z, then
  • learning that Y=y does not change your belief in X when we already know Z=z
  • and this is true for all values y that Y could take and all values z that Z could take


Page 9:

Example for Conditional Independence

• Whether light l1 is lit (Lit-l1) and the position of switch s2 (Up-s2) are not marginally independent
  • The position of the switch determines whether there is power in the wire w0 connected to the light
• However, Lit-l1 is conditionally independent of Up-s2 given whether there is power in wire w0 (Power-w0)
  • Once we know Power-w0, learning values for Up-s2 does not change our beliefs about Lit-l1

(Diagram: Up-s2 → Power-w0 → Lit-l1)

Page 10:

Lecture Overview

• Recap lecture 28
• More on Conditional Independence
• Bayesian Networks Introduction


Page 11:

Another example of conditionally but not marginally independent variables

• ExamGrade and AssignmentGrade are not marginally independent
  • Students who do well on one typically do well on the other, and vice versa
• But conditional on UnderstoodMaterial, they are independent
  • Variable UnderstoodMaterial is a common cause of variables ExamGrade and AssignmentGrade
  • Knowing UnderstoodMaterial blocks any information AssignmentGrade could provide about ExamGrade (and vice versa)

(Diagram: AssignmentGrade ← UnderstoodMaterial → ExamGrade)

Page 12:

Example: marginally but not conditionally independent

• Two variables can be marginally but not conditionally independent
  • "Smoking At Sensor" S: resident smokes cigarette next to fire sensor
  • "Fire" F: there is a fire somewhere in the building
  • "Alarm" A: the fire alarm rings
• S and F are marginally independent
  • Learning S=true or S=false does not change your belief in F
• But they are not conditionally independent given Alarm
  • E.g., if the alarm rings and you learn S=true, your belief in F decreases

(Diagram: Smoking At Sensor → Alarm ← Fire)
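This pattern (often called "explaining away") can be checked numerically. The sketch below builds a joint from the structure S → A ← F with invented CPT numbers (any alarm that either cause can trigger shows the same effect): S and F are independent by construction, yet once A=true is observed, learning S=true lowers the posterior probability of F.

```python
from itertools import product

# Invented CPT numbers for Smoking-At-Sensor (S), Fire (F), Alarm (A).
p_s, p_f = 0.1, 0.01
p_a = {(True, True): 0.99, (True, False): 0.90,    # P(A=true | S, F)
       (False, True): 0.95, (False, False): 0.001}

joint = {(s, f, a): (p_s if s else 1 - p_s) * (p_f if f else 1 - p_f)
                    * (p_a[(s, f)] if a else 1 - p_a[(s, f)])
         for s, f, a in product((True, False), repeat=3)}

def p(event):
    """Probability that the predicate holds, summed over all worlds."""
    return sum(pr for w, pr in joint.items() if event(*w))

p_f_given_a = p(lambda s, f, a: f and a) / p(lambda s, f, a: a)
p_f_given_as = p(lambda s, f, a: f and a and s) / p(lambda s, f, a: a and s)
print(p_f_given_a, p_f_given_as)  # ~0.096 vs ~0.011: S=true "explains away" F
```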

Page 13:

Conditional vs. Marginal Independence

Two variables can be:

• Conditionally but not marginally independent
  • ExamGrade and AssignmentGrade
  • ExamGrade and AssignmentGrade given UnderstoodMaterial
  • Lit(l1) and Up(s2)
  • Lit(l1) and Up(s2) given Power(w0)

• Marginally but not conditionally independent
  • SmokingAtSensor and Fire
  • SmokingAtSensor and Fire given Alarm

• Both marginally and conditionally independent
  • CanucksWinStanleyCup and Lit(l1)
  • CanucksWinStanleyCup and Lit(l1) given Power(w0)

• Neither marginally nor conditionally independent
  • Temperature and Cloudiness
  • Temperature and Cloudiness given Wind

Page 14:

Exploiting Conditional Independence

• Example 1: Boolean variables A, B, C
  • C is conditionally independent of A given B
  • We can then rewrite P(C | A,B) as P(C | B)

P(A,B,C) = P(A) P(B|A) P(C|A,B) = P(A) P(B|A) P(C|B)

Page 15:

Exploiting Conditional Independence

• Example 2: Boolean variables A, B, C, D
  • D is conditionally independent of A given C
  • D is conditionally independent of B given C
  • We can then rewrite P(D | A,B,C) as P(D | B,C)
  • And can further rewrite P(D | B,C) as P(D | C)

P(A,B,C,D) = P(A) P(B|A) P(C|A,B) P(D|A,B,C) = P(A) P(B|A) P(C|A,B) P(D|C)

Page 16:

Exploiting Conditional Independence

• E.g., by the chain rule:
  P(A,B,C,D) = P(A) P(B|A) P(C|A,B) P(D|A,B,C)
• If, for instance, D is conditionally independent of A and B given C, we can rewrite the above as
  P(A,B,C,D) = P(A) P(B|A) P(C|A,B) P(D|C)
• Under independence we gain compactness
  • The chain rule allows us to write the JPD as a product of conditional distributions
  • Conditional independence allows us to write them compactly
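Counting table entries makes the gain concrete. A back-of-the-envelope sketch, with one free parameter P(X=true | parent row) per row of each Boolean conditional table:

```python
def rows(n_parents):
    """Free parameters in a Boolean conditional table: one per parent row."""
    return 2 ** n_parents

# Plain chain rule P(A) P(B|A) P(C|A,B) P(D|A,B,C):
print(rows(0) + rows(1) + rows(2) + rows(3))  # 15, same as the raw JPD (2**4 - 1)
# With D independent of A and B given C: P(A) P(B|A) P(C|A,B) P(D|C):
print(rows(0) + rows(1) + rows(2) + rows(1))  # 9
```

Note that the chain rule alone buys nothing (15 numbers either way); the savings come entirely from the conditional independence.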

Page 17:

Bayesian (or Belief) Networks

• Bayesian networks and their extensions are Representation & Reasoning systems explicitly designed to exploit independence in probabilistic reasoning


Page 18:

Lecture Overview

• Recap lecture 28
• More on Conditional Independence
• Bayesian Networks Introduction


Page 19:

Bayesian Network Motivation

• We want a representation and reasoning system that is based on conditional (and marginal) independence
  • Compact yet expressive representation
  • Efficient reasoning procedures
• Bayesian (Belief) Networks are such a representation
  • Named after Thomas Bayes (ca. 1702–1761)
  • Term coined in 1985 by Judea Pearl (1936–)
  • Their invention changed the primary focus of AI from logic to probability!

(Photos: Thomas Bayes, Judea Pearl)

In 2012 Pearl received the very prestigious ACM Turing Award for his contributions to Artificial Intelligence!

Page 20:

Bayesian Networks: Intuition

• A graphical representation for a joint probability distribution
  • Nodes are random variables
  • Directed edges between nodes reflect dependence
• Some informal examples:

(Diagrams: AssignmentGrade ← UnderstoodMaterial → ExamGrade; Smoking At Sensor → Alarm ← Fire; Up-s2 → Power-w0 → Lit-l1)

Page 21:

Belief (or Bayesian) networks

Def. A Belief network consists of

• a directed, acyclic graph (DAG) where each node is associated with a random variable Xi
• a domain for each variable Xi
• a set of conditional probability distributions for each node Xi given its parents Pa(Xi) in the graph:
  P(Xi | Pa(Xi))

• The parents Pa(Xi) of a variable Xi are the variables that Xi directly depends on
• A Bayesian network is a compact representation of the JPD for a set of variables (X1, …, Xn):
  P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Pa(Xi))
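A minimal sketch of this defining product (an illustrative data layout, not a library API): each node stores its parent tuple and a function returning P(Xi = value | parent values).

```python
# P(X1, ..., Xn) = product over i of P(Xi | Pa(Xi)).
def joint_prob(network, assignment):
    """network: var -> (parents, p) with p(parent_values, value) a probability."""
    prob = 1.0
    for var, (parents, p) in network.items():
        pa_vals = tuple(assignment[q] for q in parents)
        prob *= p(pa_vals, assignment[var])
    return prob

# Two-node example A -> B, with invented numbers:
# P(A=true)=0.2, P(B=true|A=true)=0.9, P(B=true|A=false)=0.1.
net = {
    "A": ((), lambda pa, v: 0.2 if v else 0.8),
    "B": (("A",), lambda pa, v: (0.9 if pa[0] else 0.1) if v
                                else (0.1 if pa[0] else 0.9)),
}
print(joint_prob(net, {"A": True, "B": True}))  # 0.2 * 0.9 = 0.18
```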


Page 22:

Bayesian Networks: Definition

• Discrete Bayesian networks:
  • Domain of each variable is finite
  • Conditional probability distribution is a conditional probability table (CPT)
  • We will assume this discrete case, but everything we say about independence (marginal & conditional) carries over to the continuous case

• Def. A Belief network consists of
  – a directed, acyclic graph (DAG) where each node is associated with a random variable Xi
  – a domain for each variable Xi
  – a set of conditional probability distributions for each node Xi given its parents Pa(Xi) in the graph:
    P(Xi | Pa(Xi))


Page 23:

How to build a Bayesian network

1. Define a total order over the random variables: (X1, …, Xn)

2. Apply the chain rule:
   P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | X1, …, Xi−1)

3. For each Xi, select the smallest set Pa(Xi) of its predecessors in the total order such that
   P(Xi | X1, …, Xi−1) = P(Xi | Pa(Xi))
   (i.e., Xi is conditionally independent of all its other predecessors given Pa(Xi))

4. Then we can rewrite
   P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Pa(Xi))

• This is a compact representation of the initial JPD
• It is a factorization of the JPD based on existing conditional independencies among the variables

(A brute-force illustration of step 3 is sketched below.)
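For intuition about step 3, here is an assumed brute-force sketch (exponential in the number of variables, toy problems only): it searches for the smallest predecessor subset whose conditional matches the full one, then checks it on the earlier light-circuit example, assuming the chain structure Up-s2 → Power-w0 → Lit-l1 with invented CPT numbers.

```python
from itertools import combinations, product

def marginal(joint, idxs):
    """Sum a JPD (tuple of Boolean values -> prob) down to the variables at idxs."""
    out = {}
    for world, p in joint.items():
        key = tuple(world[i] for i in idxs)
        out[key] = out.get(key, 0.0) + p
    return out

def parents(joint, i, tol=1e-9):
    """Smallest subset S of predecessors 0..i-1 with P(X_i | X_0..X_{i-1}) = P(X_i | S)."""
    preds = list(range(i))
    pred_m = marginal(joint, preds)          # P(all predecessors)
    full_m = marginal(joint, preds + [i])    # P(all predecessors, X_i)
    for k in range(len(preds) + 1):          # try the smallest sets first
        for subset in combinations(preds, k):
            sub_m = marginal(joint, list(subset))
            subx_m = marginal(joint, list(subset) + [i])
            ok = True
            for world in product((True, False), repeat=i):
                if pred_m.get(world, 0.0) < tol:
                    continue                 # this condition never occurs
                key = tuple(world[j] for j in subset)
                for v in (True, False):
                    lhs = full_m.get(world + (v,), 0.0) / pred_m[world]
                    rhs = subx_m.get(key + (v,), 0.0) / sub_m[key]
                    if abs(lhs - rhs) > tol:
                        ok = False
                        break
                if not ok:
                    break
            if ok:
                return subset
    return tuple(preds)

# Order: 0 = Up-s2, 1 = Power-w0, 2 = Lit-l1; CPT numbers are invented.
joint = {}
for u, w, l in product((True, False), repeat=3):
    joint[(u, w, l)] = 0.5 * (0.9 if w == u else 0.1) * (0.95 if l == w else 0.05)

print(parents(joint, 2))  # (1,): Lit-l1 needs only Power-w0 as parent
```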

Page 24:

How to build a Bayesian network (cont’d)

5. Construct the Bayesian Net (BN):
   • Nodes are the random variables
   • Draw a directed arc from each variable in Pa(Xi) to Xi
   • Define a conditional probability table (CPT) for each variable Xi: P(Xi | Pa(Xi))

Page 25:

Example for BN construction: Fire Diagnosis

You want to diagnose whether there is a fire in a building:
• You can receive reports (possibly noisy) about whether everyone is leaving the building
• If everyone is leaving, this may have been caused by a fire alarm
• If there is a fire alarm, it may have been caused by a fire or by tampering
• If there is a fire, there may be smoke

Start by choosing the random variables for this domain; here all are Boolean:
• Tampering (T) is true when the alarm has been tampered with
• Fire (F) is true when there is a fire
• Alarm (A) is true when there is an alarm
• Smoke (S) is true when there is smoke
• Leaving (L) is true if there are lots of people leaving the building
• Report (R) is true if the sensor reports that lots of people are leaving the building

Next, apply the procedure described earlier.

Page 26:

Example for BN construction: Fire Diagnosis

1. Define a total ordering of variables:
   • Let's choose an order that follows the causal sequence of events:
     Fire (F), Tampering (T), Alarm (A), Smoke (S), Leaving (L), Report (R)

2. Apply the chain rule:
   P(F,T,A,S,L,R) = P(F) P(T | F) P(A | F,T) P(S | F,T,A) P(L | F,T,A,S) P(R | F,T,A,S,L)

We will do steps 3, 4 and 5 together, for each element P(Xi | X1, …, Xi−1) of the factorization:

3. For each variable Xi, choose the parents Pa(Xi) by evaluating conditional independencies, so that
   P(Xi | X1, …, Xi−1) = P(Xi | Pa(Xi))

4. Rewrite
   P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Pa(Xi))

5. Construct the Bayesian network

Page 27:

Fire Diagnosis Example

P(F) P(T | F) P(A | F,T) P(S | F,T,A) P(L | F,T,A,S) P(R | F,T,A,S,L)

• Fire (F) is the first variable in the ordering, X1. It does not have parents.

(Network so far: Fire)

Page 28:

Example

P(F) P(T | F) P(A | F,T) P(S | F,T,A) P(L | F,T,A,S) P(R | F,T,A,S,L)

• Tampering (T) is independent of Fire (learning that one is true would not change your beliefs about the probability of the other), so P(T | F) simplifies to P(T)

(Network so far: Tampering, Fire)

Page 29:

Fire Diagnosis Example

P(F) P(T) P(A | F,T) P(S | F,T,A) P(L | F,T,A,S) P(R | F,T,A,S,L)

• Alarm (A) depends on both Fire and Tampering: it could be caused by either or both

(Network so far: Tampering → Alarm ← Fire)

Page 30:

Fire Diagnosis Example

P(F) P(T) P(A | F,T) P(S | F,T,A) P(L | F,T,A,S) P(R | F,T,A,S,L)

• Smoke (S) is caused by Fire, and so is independent of Tampering and Alarm given whether there is a Fire: P(S | F,T,A) simplifies to P(S | F)

(Network so far: Tampering → Alarm ← Fire; Fire → Smoke)

Page 31:

Example

P(F) P(T) P(A | F,T) P(S | F) P(L | F,T,A,S) P(R | F,T,A,S,L)

• Leaving (L) is caused by Alarm, and thus is independent of the other variables given Alarm: P(L | F,T,A,S) simplifies to P(L | A)

(Network so far: Tampering → Alarm ← Fire; Fire → Smoke; Alarm → Leaving)

Page 32:

Fire Diagnosis Example

P(F) P(T) P(A | F,T) P(S | F) P(L | A) P(R | F,T,A,S,L)

• Report (R) is caused by Leaving, and thus is independent of the other variables given Leaving: P(R | F,T,A,S,L) simplifies to P(R | L)

(Network so far: Tampering → Alarm ← Fire; Fire → Smoke; Alarm → Leaving → Report)

Page 33:

Fire Diagnosis Example

P(F) P(T) P(A | F,T) P(S | F) P(L | A) P(R | L)

The result is the Bayesian network above, and its corresponding, very compact factorization of the original JPD:

P(F,T,A,S,L,R) = P(F) P(T) P(A | F,T) P(S | F) P(L | A) P(R | L)

(Network: Tampering → Alarm ← Fire; Fire → Smoke; Alarm → Leaving → Report)
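To see the compactness numerically, here is a hedged sketch that evaluates the factored JPD for one full assignment. Every CPT number below is an invented placeholder (the lecture gives none), but the structure is exactly the factorization above: it needs 1+1+4+2+2+2 = 12 numbers, versus 2^6 − 1 = 63 for the raw JPD.

```python
def bern(p_true, value):
    """P(V = value) for a Boolean V with P(V = true) = p_true."""
    return p_true if value else 1.0 - p_true

# Invented placeholder CPT parameters:
p_fire, p_tampering = 0.01, 0.02
p_alarm = {(True, True): 0.50, (True, False): 0.99,
           (False, True): 0.85, (False, False): 0.0001}  # P(A=true | F, T)
p_smoke = {True: 0.90, False: 0.01}                      # P(S=true | F)
p_leaving = {True: 0.88, False: 0.001}                   # P(L=true | A)
p_report = {True: 0.75, False: 0.01}                     # P(R=true | L)

def joint(f, t, a, s, l, r):
    """P(F,T,A,S,L,R) = P(F) P(T) P(A|F,T) P(S|F) P(L|A) P(R|L)"""
    return (bern(p_fire, f) * bern(p_tampering, t)
            * bern(p_alarm[(f, t)], a) * bern(p_smoke[f], s)
            * bern(p_leaving[a], l) * bern(p_report[l], r))

# Probability of: fire, no tampering, alarm, smoke, people leaving, a report.
print(joint(True, False, True, True, True, True))
```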

Page 34:

Learning Goals For Probability so far

• Define and give examples of random variables, their domains and probability distributions
• Calculate the probability of a proposition f given µ(w) for the set of possible worlds
• Define a joint probability distribution (JPD)
• Given a JPD:
  • Marginalize over specific variables
  • Compute distributions over any subset of the variables
• Prove the formula to compute conditional probability P(h|e)
• Use inference by enumeration to compute joint posterior probability distributions over any subset of variables given evidence
• Derive and use Bayes Rule
• Derive the Chain Rule
• Define and use marginal independence
• Define and use conditional independence
• Build a Bayesian Network for a given domain