06 Bayesian Networks

download 06 Bayesian Networks

of 36

Transcript of 06 Bayesian Networks

  • 8/8/2019 06 Bayesian Networks

    1/36

    Bayesian

    Networks

    Chapter 14

    Section 1 2

  • 8/8/2019 06 Bayesian Networks

    2/36

    Updating the Belief State

    0.5760.1440.0640.016cavity

    0.0080.0720.0120.108cavity

    pcatchpcatchpcatchpcatch

    toothache toothache

    Let D now observe toothache with probability0.8 (e.g., the patient says so)

    How should D update its belief state?

  • 8/8/2019 06 Bayesian Networks

    3/36

    Updating the Belief State

    0.5760.1440.0640.016cavity

    0.0080.0720.0120.108cavity

    pcatchpcatchpcatchpcatch

    toothache toothache

    Let E be the evidence such that P(toothache|E) = 0.8

    We want to compute P(ctpc|E) = P(cpc|t,E) P(t|E)

    Since E is not directly related to the cavity or theprobe catch, we consider that c and pc are independentof E given t, hence: P(cpc|t,E) = P(cpc|t)

  • 8/8/2019 06 Bayesian Networks

    4/36

    Let E be the evidence such that P(Toothache|E) = 0.8

    We want to compute P(ctpc|E) = P(cpc|t,E) P(t|E)

    Since E is not directly related to the cavity or theprobe catch, we consider that c and pc are independentof E given t, hence: P(cpc|t,E) = P(cpc|t)

    To get these 4 probabilitieswe normalize their sum to 0.2

    To get these 4 probabilitieswe normalize their sum to 0.8

    Updating the Belief State

    0.5760.1440.0640.016Cavity

    0.0080.0720.0120.108Cavity

    PCatchPCatchPCatchPCatch

    Toothache Toothache

    0.432

    0.064 0.256

    0.048 0.018

    0.036 0.144

    0.002

  • 8/8/2019 06 Bayesian Networks

    5/36

    Issues

    If a state is described by n propositions,then a belief space contains 2n states(possibly, some have probability 0)

    Modeling difficulty: many numbersmust be entered in the first place

    Computational issue: memory size andtime

  • 8/8/2019 06 Bayesian Networks

    6/36

  • 8/8/2019 06 Bayesian Networks

    7/36

    Bayesian networks

    A simple, graphical notation for conditionalindependence assertions and hence for compactspecification of full joint distributions

    Syntax:

    a set of nodes, one per variable a directed, acyclic graph (link "directly influences")

    a conditional distribution for each node given its parents:P (Xi | Parents (Xi))

    In the simplest case, conditional distribution represented

    as a conditional probability table (CPT) giving thedistribution overXi for each combination of parent values

    Following slides by Hwee Tou Ng (Singapore)

  • 8/8/2019 06 Bayesian Networks

    8/36

    Example (1)

    Topology of network encodes conditional independenceassertions:

    Weatheris independent of the other variables

    Toothache and Catch are conditionally independentgiven Cavity

  • 8/8/2019 06 Bayesian Networks

    9/36

    Bayesian Network

    Notice that Cavity is the cause of both Toothacheand PCatch, and represent the causality links explicitly

    Give the prior probability distribution of Cavity

    Give the conditional probability tables of Toothacheand PCatch

    Cavity

    Toothache

    0.2

    P(Cavity)

    0.6

    0.1

    CavityCavity

    P(Toothache|c)

    PCatch

    0.9

    0.02

    CavityCavity

    P(PCatch|c)

    5 probabilities, instead of 7

    P(ctpc) = P(tpc|c) P(c)= P(t|c) P(pc|c) P(c)

  • 8/8/2019 06 Bayesian Networks

    10/36

    Example (2)

    I'm at work, neighbor John calls to say my alarm is ringing, but neighborMary doesn't call. Sometimes it's set off by minor earthquakes. Is there aburglar?

    Variables: Burglary, Earthquake,Alarm, JohnCalls, MaryCalls

    Network topology reflects "causal" knowledge: A burglar can set the alarm off

    An earthquake can set the alarm off

    The alarm can cause Mary to call

    The alarm can cause John to call

  • 8/8/2019 06 Bayesian Networks

    11/36

    A More Complex BN

    Burglary Earthquake

    Alarm

    MaryCallsJohnCalls

    causes

    effects

    Directedacyclic graph

    Intuitive meaningof arc from x to y:

    x has directinfluence on y

  • 8/8/2019 06 Bayesian Networks

    12/36

    0.950.940.290.001

    TFTF

    TTFF

    P(A|)EB

    Burglary Earthquake

    Alarm

    MaryCallsJohnCalls

    0.001

    P(B)

    0.002

    P(E)

    0.900.05

    TF

    P(J|)A0.700.01

    TF

    P(M|)A

    Size of theCPT for anode with kparents: 2k

    A More Complex BN

    10 probabilities, instead of 31

  • 8/8/2019 06 Bayesian Networks

    13/36

    What does the BN encode?

    Each of the beliefs JohnCallsand MaryCalls is independent ofBurglary and Earthquake givenAlarm or Alarm

    Burglary Earthquake

    Alarm

    MaryCallsJohnCalls

    For example, John doesnot observe any burglariesdirectly

    P(bj) P(b) P(j)P(bj|a) = P(b|a) P(j|a)

  • 8/8/2019 06 Bayesian Networks

    14/36

    The beliefs JohnCallsand MaryCalls are

    independent givenAlarm or AlarmFor instance, the reasons whyJohn and Mary may not call ifthere is an alarm are unrelated

    Burglary Earthquake

    Alarm

    MaryCallsJohnCalls

    What does the BN encode?

    A node is independent ofits non-descendantsgiven its parents

    P(bj|a) = P(b|a) P(j|a)P(jm|a) = P(j|a) P(m|a)

  • 8/8/2019 06 Bayesian Networks

    15/36

    The beliefs JohnCallsand MaryCalls are

    independent givenAlarm or AlarmFor instance, the reasons whyJohn and Mary may not call ifthere is an alarm are unrelated

    Burglary Earthquake

    Alarm

    MaryCallsJohnCalls

    What does the BN encode?

    A node is independent ofits non-descendantsgiven its parents

    Burglary andEarthquake areindependent

  • 8/8/2019 06 Bayesian Networks

    16/36

    Locally Structured World

    A world is locally structured (or sparse) if each of its componentsinteracts directly with relatively few other components

    In a sparse world, the CPTs are small and the BN contains muchfewer probabilities than the full joint distribution

    If the # of entries in each CPT is bounded by a constant, i.e., O(1),

    then the # of probabilities in a BN is linear in n the # ofpropositions instead of 2n for the joint distribution

  • 8/8/2019 06 Bayesian Networks

    17/36

    Semantics

    The full joint distribution is defined as the product of the local

    conditional distributions:

    P(X1, ,Xn) = i = 1P(Xi| Parents(Xi))

    e.g., P(j m a b e)

    = P(j | a) P(m | a) P(a | b, e) P(b) P(e)

    n

  • 8/8/2019 06 Bayesian Networks

    18/36

    Calculation of Joint Probability

    0.950.940.290.001

    TFTF

    TTFF

    P(A|)EB

    Burglary Earthquake

    Alarm

    MaryCallsJohnCalls

    0.001

    P(B)

    0.002

    P(E)

    0.900.05

    TF

    P(J|)A0.700.01

    TF

    P(M|)A

    P(J

    M

    A

    B

    E) = ??

  • 8/8/2019 06 Bayesian Networks

    19/36

    P(JMABE)= P(JM|A,B,E) P(ABE)= P(J|A,B,E) P(M|A,B,E) P(ABE)(J and M are independent given A)

    P(J|A,B,E) = P(J|A)

    (J and

    B

    E are independent given A) P(M|A,B,E) = P(M|A)

    P(ABE) = P(A|B,E) P(B|E) P(E)= P(A|B,E) P(B) P(E)

    (B and E are independent)

    P(JMABE) = P(J|A)P(M|A)P(A|B,E)P(B)P(E)

    Burglary Earthquake

    Alarm

    MaryCallsJohnCalls

  • 8/8/2019 06 Bayesian Networks

    20/36

    Calculation of Joint Probability

    0.950.940.290.001

    TFTF

    TTFF

    P(A|)EB

    Burglary Earthquake

    Alarm

    MaryCallsJohnCalls

    0.001

    P(B)

    0.002

    P(E)

    0.900.05

    TF

    P(J|)A0.700.01

    TF

    P(M|)A

    P(J

    M

    A

    B

    E)= P(J|A)P(M|A)P(A|B,E)P(B)P(E)= 0.9 x 0.7 x 0.001 x 0.999 x 0.998= 0.00062

  • 8/8/2019 06 Bayesian Networks

    21/36

    Calculation of Joint Probability

    0.950.940.290.001

    TFTF

    TTFF

    P(A|)EB

    Burglary Earthquake

    Alarm

    MaryCallsJohnCalls

    0.001

    P(B)

    0.002

    P(E)

    0.900.05

    TF

    P(J|)A0.700.01

    TF

    P(M|)A

    P(J

    M

    A

    B

    E)= P(J|A)P(M|A)P(A|B,E)P(B)P(E)= 0.9 x 0.7 x 0.001 x 0.999 x 0.998= 0.00062

    P(x1x2xn) = i=1,,nP(xi|parents(Xi)) full joint distribution table

  • 8/8/2019 06 Bayesian Networks

    22/36

    Calculation of Joint Probability

    0.950.940.290.001

    TFTF

    TTFF

    P(A|)EB

    Burglary Earthquake

    Alarm

    MaryCallsJohnCalls

    0.001

    P(B)

    0.002

    P(E)

    0.900.05

    TF

    P(J|)A0.700.01

    TF

    P(M|)A

    P(JMABE)= P(J|A)P(M|A)P(A|B,E)P(B)P(E)= 0.9 x 0.7 x 0.001 x 0.999 x 0.998= 0.00062

    Since a BN defines thefull joint distribution of aset of propositions, it

    represents a belief space

    P(x1x2xn) = i=1,,nP(xi|parents(Xi)) full joint distribution table

  • 8/8/2019 06 Bayesian Networks

    23/36

    Constructing Bayesian networks

    1. Choose an ordering of variablesX1, ,Xn 2. For i= 1 to n

    addXi to the network

    select parents fromX1, ,Xi-1 such that

    P(Xi | Parents(Xi)) = P(Xi | X1, ... Xi-1)

    This choice of parents guarantees:

    P(X1, ,Xn) = i =1P(Xi | X1, , Xi-1) (chain rule)

    = i =1P(Xi| Parents(Xi)) (by construction)

    n

    n

  • 8/8/2019 06 Bayesian Networks

    24/36

    Suppose we choose the ordering M, J, A, B, E

    P(J | M) = P(J)?

    Example

  • 8/8/2019 06 Bayesian Networks

    25/36

  • 8/8/2019 06 Bayesian Networks

    26/36

    Suppose we choose the ordering M, J, A, B, E

    P(J | M) = P(J)?No

    P(A | J, M) = P(A | J)?P(A | J, M) = P(A)? No

    P(B | A, J, M) = P(B | A)?

    P(B | A, J, M) = P(B)?

    Example

  • 8/8/2019 06 Bayesian Networks

    27/36

    Suppose we choose the ordering M, J, A, B, E

    P(J | M) = P(J)?No

    P(A | J, M) = P(A | J)?P(A | J, M) = P(A)? No

    P(B | A, J, M) = P(B | A)?Yes

    P(B | A, J, M) = P(B)? NoP(E | B, A ,J, M) = P(E | A)?

    P(E | B, A, J, M) = P(E | A, B)?

    Example

  • 8/8/2019 06 Bayesian Networks

    28/36

  • 8/8/2019 06 Bayesian Networks

    29/36

    Example contd.

    Deciding conditional independence is hard in noncausal directions

    (Causal models and conditional independence seem hardwired for

    humans!)

    Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed

  • 8/8/2019 06 Bayesian Networks

    30/36

    The BN gives P(t|c) What about P(c|t)? P(Cavity|t)

    = P(Cavity t)/P(t)= P(t|Cavity) P(Cavity) / P(t)

    [Bayes rule]

    P(c|t) = P(t|c) P(c)

    Querying a BN is just applying thetrivial Bayes rule on a larger scale

    Querying the BN

    Cavity

    Toothache

    0.1

    P(C)

    0.40.01111

    TF

    P(T|c)C

    Slides:Jean Claude Latombe

  • 8/8/2019 06 Bayesian Networks

    31/36

    New evidence E indicates that JohnCalls withsome probability p

    We would like to know the posterior probabilityof the other beliefs, e.g. P(Burglary|E)

    P(B|E) = P(BJ|E) + P(B J|E)= P(B|J,E) P(J|E) + P(B |J,E) P(J|E)= P(B|J) P(J|E) + P(B|J) P(J|E)= p P(B|J) + (1-p) P(B|J)

    We need to compute P(B|J) and P(B|J)

    Querying the BN

  • 8/8/2019 06 Bayesian Networks

    32/36

    P(b|J) = P(bJ)= maeP(bJmae) [marginalization]

    = maeP(b)P(e)P(a|b,e)P(J|a)P(m|a) [BN]

    = P(b)eP(e)aP(a|b,e)P(J|a)mP(m|a) [re-ordering]

    Querying the BN

  • 8/8/2019 06 Bayesian Networks

    33/36

    Depth-first evaluation of P(b|J) leads to computing

    each of the 4 following products twice:P(J|A) P(M|A), P(J|A) P(M|A), P(J|A) P(M|A), P(J|A) P(M|A)

    Bottom-up (right-to-left) computation + caching e.g.,variable elimination algorithm (see R&N) avoids suchrepetition

    For singly connected BN, the computation takes timelinear in the total number of CPT entries ( time linearin the # propositions if CPTs size is bounded)

    Querying the BN

  • 8/8/2019 06 Bayesian Networks

    34/36

    Singly Connected BN

    A BN is singly connected if there is at most oneundirected path between any two nodes

    Burglary Earthquake

    Alarm

    MaryCallsJohnCalls

    is singly connected

  • 8/8/2019 06 Bayesian Networks

    35/36

    Comparison to Classical Logic

    Burglary Alarm

    Earthquake Alarm

    Alarm JohnCalls

    Alarm MaryCalls

    If the agent observesJohnCalls,it infers Alarm,Burglary, and Earthquake

    If it observes JohnCalls, then itinfers nothing

    Burglary Earthquake

    Alarm

    MaryCallsJohnCalls

  • 8/8/2019 06 Bayesian Networks

    36/36

    Summary

    Bayesian networks provide a natural

    representation for (causally induced)

    conditional independence

    Topology + CPTs = compactrepresentation of joint distribution

    Generally easy for domain experts to

    construct