Bayesian Networks


Page 1: Bayesian Networks

Bayesian Networks

• A causal probabilistic network, or Bayesian network, is a directed acyclic graph (DAG) where nodes represent variables and links represent dependency relations, e.g. of the type cause-effect, between variables, quantified by (conditional) probabilities

• Qualitative component + quantitative component

[Figure: example DAG with nodes A, B, C, D, E, F, G, H]

Page 2: Bayesian Networks

Bayesian Networks

• Qualitative component : relations of conditional dependence / independence

I(A, B | C): A and B are independent given C. I(A, B) = I(A, B | Ø): A and B are a priori independent.

• Formal study of the properties of the ternary relation I

• A Bayesian network may encode three fundamental types of relations among neighbouring variables.

Page 3: Bayesian Networks

Qualitative Relations : type I

F → G → H

Ex: F: smoke, G: bronchitis, H: respiratory problems (dyspnea)

Relations: ¬I(F, H)

I(F, H | G)

Page 4: Bayesian Networks

Qualitative Relations : type II

E ← F → G

Ex: F: smoke, G: bronchitis,

E: lung cancer

Relations: ¬I(E, G)

I(E, G | F)

Page 5: Bayesian Networks

Qualitative Relations : type III

B → C ← E

Ex: C: alarm, B: movement detection,

E: rain

Relations: I(B, E)

¬I(B, E | C)
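These three patterns can be checked numerically. Below is a minimal Python sketch for the type III (converging) case, with illustrative CPT values that are not from the slides: B and E are independent a priori, but become dependent once the alarm C is observed.

from itertools import product

P_B = {True: 0.3, False: 0.7}   # P(movement detection) -- illustrative value
P_E = {True: 0.1, False: 0.9}   # P(rain)               -- illustrative value
P_C = {                          # P(alarm | B, E)       -- illustrative values
    (True, True): 0.95, (True, False): 0.9,
    (False, True): 0.2, (False, False): 0.01,
}

def joint(b, e, c):
    pc = P_C[(b, e)] if c else 1 - P_C[(b, e)]
    return P_B[b] * P_E[e] * pc

# A priori, P(B, E) factorizes by construction: I(B, E) holds.
p_be = sum(joint(True, True, c) for c in (True, False))
print(p_be, P_B[True] * P_E[True])                 # equal -> independent

# Given C = true: P(B, E | c) != P(B | c) P(E | c), so not I(B, E | C).
pc = sum(joint(b, e, True) for b, e in product((True, False), repeat=2))
p_be_c = joint(True, True, True) / pc
p_b_c = sum(joint(True, e, True) for e in (True, False)) / pc
p_e_c = sum(joint(b, True, True) for b in (True, False)) / pc
print(p_be_c, p_b_c * p_e_c)                       # differ -> dependent given C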

Page 6: Bayesian Networks

Probabilistic component

• Qualitative knowledge: a directed acyclic graph G (DAG):
Nodes(G) = V = {X1, …, Xn} (discrete variables)
Edges(G) ⊆ V × V
Parents(Xi) = {Xj : (Xj, Xi) ∈ Edges(G)}

• Probabilistic knowledge: P(Xi | parents(Xi))

These probabilities determine a joint probability distribution P over V = {X1, …, Xn}:

P(X1, …, Xn) = P(X1 | parents(X1)) · · · P(Xn | parents(Xn))

Bayesian Network = (G, P)

Page 7: Bayesian Networks

Joint Distribution

• P(X1, X2, ..., Xn) = P(Xn | Xn−1, ..., X1) · ... · P(X3 | X2, X1) · P(X2 | X1) · P(X1)

• Independence of each variable Xi from its predecessors Y1, ..., Yk that are not parents of Xi, given parents(Xi):

P(Xi | parents(Xi), Y1, ..., Yk) = P(Xi | parents(Xi))

P(X1, X2, ..., Xn) = ∏_{i=1..n} P(Xi | parents(Xi))

• Having in each node Xi the conditional probability distribution P(Xi | parents(Xi)) is therefore enough to determine the full joint probability distribution P(X1, X2, ..., Xn)

Page 8: Bayesian Networks

Example

[Figure: the example DAG: A → B; B → C ← E; F → E; F → G; C → D; C → H ← G]

P(A): P(a) = 0.01
P(B | A): P(b | a) = 0.05, P(b | ¬a) = 0.01
P(C | B,E): P(c | b, e) = 1, P(c | b, ¬e) = 1, P(c | ¬b, e) = 1, P(c | ¬b, ¬e) = 0
P(F): P(f) = 0.5
P(D | C): P(d | c) = 0.98, P(d | ¬c) = 0.05
P(E | F): P(e | f) = 0.1, P(e | ¬f) = 0.01
P(G | F): P(g | f) = 0.6, P(g | ¬f) = 0.3
P(H | C,G): P(h | c,g) = 0.9, P(h | c,¬g) = 0.7, P(h | ¬c,g) = 0.8, P(h | ¬c,¬g) = 0.1

P(A,B,C,D,E,F,G,H) = P(D | C) P(H | C, G) P(C | B, E) P(G | F) P(E | F) P(F) P(B | A) P(A)

P(a,¬b,c,¬d,e,f,g,¬h) = P(¬d | c) P(¬h | c,g) P(c | ¬b,e) P(g | f) P(e | f) P(f) P(¬b | a) P(a) = (1 − 0.98) · (1 − 0.9) · 1 · 0.6 · 0.1 · 0.5 · (1 − 0.05) · 0.01 = 5.7 · 10⁻⁷

A: visit to Asia, B: tuberculosis, F: smoke, E: lung cancer, G: bronchitis, C: B or E, D: X-ray, H: dyspnea
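The factorization of Page 7 makes this computation mechanical: the probability of a full assignment is a product of one CPT entry per variable. A minimal Python sketch with the CPT values above (1/0 stand for x/¬x):

def P_A(a):       return 0.01 if a else 0.99
def P_B(b, a):    p = 0.05 if a else 0.01; return p if b else 1 - p
def P_C(c, b, e): p = 1.0 if (b or e) else 0.0; return p if c else 1 - p
def P_D(d, c):    p = 0.98 if c else 0.05; return p if d else 1 - p
def P_E(e, f):    p = 0.1 if f else 0.01; return p if e else 1 - p
def P_F(f):       return 0.5
def P_G(g, f):    p = 0.6 if f else 0.3; return p if g else 1 - p
def P_H(h, c, g):
    p = {(1, 1): 0.9, (1, 0): 0.7, (0, 1): 0.8, (0, 0): 0.1}[(c, g)]
    return p if h else 1 - p

def joint(a, b, c, d, e, f, g, h):
    return (P_A(a) * P_B(b, a) * P_C(c, b, e) * P_D(d, c) *
            P_E(e, f) * P_F(f) * P_G(g, f) * P_H(h, c, g))

print(f"{joint(1, 0, 1, 0, 1, 1, 1, 0):.2e}")  # P(a,¬b,c,¬d,e,f,g,¬h) = 5.70e-07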

Page 9: Bayesian Networks

D-separation relations and probabilistic independence

Goal: to determine precisely which independence relations are (graphically) encoded by a DAG.

Previous definitions:

• A path is a sequence of connected nodes in the graph.
• A non-directed path is a path that does not take into account the directions of the arrows.
• A "head-to-head" link at a node is a (non-directed) path of the form x → y ← w; the node y is called a "head-to-head" node.

Page 10: Bayesian Networks

D-separation

• A path c is said to be activated by a set of nodes Z if the following two conditions are satisfied:

1) Every head-to-head node in c is in Z or has a descendant in Z.

2) No other node in c belongs to Z.

Otherwise, the path c is said to be blocked by Z.

Definition. If X, Y and Z are three disjoint subsets of nodes in a DAG G, then Z d-separates X from Y, or equivalently X and Y are graphically independent given Z, when all the paths between any node of X and any node of Y are blocked by Z.

Page 11: Bayesian Networks

D-separation

[Figure: example DAG with nodes A, B, C, D, E, G]

Theorem. Let G be a DAG and let X, Y and Z be subsets of nodes such that X and Y are d-separated by Z. Then X and Y are conditionally independent given Z for any probability P such that (G, P) is a causal network over G, that is, P(X | Y, Z) = P(X | Z) and P(Y | X, Z) = P(Y | Z).

{B} and {C} are d-separated by {A}:

Path B-E-C: the head-to-head node E and its descendant G are not in {A}, so {A} blocks the path B-E-C.

Path B-A-C: A ∈ {A} and A is not head-to-head, so {A} blocks the path B-A-C.
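For small DAGs, the d-separation test of Page 10 can be implemented directly: enumerate the non-directed paths between the two nodes and apply the two activation conditions. A sketch, using one plausible reading of the figure's edges (an assumption; node D plays no role in this query):

def descendants(node, edges):
    out, stack = set(), [node]
    while stack:
        n = stack.pop()
        for p, c in edges:
            if p == n and c not in out:
                out.add(c); stack.append(c)
    return out

def paths(x, y, edges):
    nbrs = {}
    for p, c in edges:
        nbrs.setdefault(p, set()).add(c); nbrs.setdefault(c, set()).add(p)
    def walk(path):
        if path[-1] == y:
            yield path; return
        for n in nbrs.get(path[-1], ()):
            if n not in path:
                yield from walk(path + [n])
    yield from walk([x])

def blocked(path, z, edges):
    for i in range(1, len(path) - 1):
        a, b, c = path[i - 1], path[i], path[i + 1]
        if (a, b) in edges and (c, b) in edges:        # b is head-to-head
            if b not in z and not (descendants(b, edges) & z):
                return True    # condition 1) fails: inactive collider blocks
        elif b in z:
            return True        # condition 2) fails: non-collider in Z blocks
    return False

def d_separated(x, y, z, edges):
    return all(blocked(p, z, edges) for p in paths(x, y, edges))

edges = {("A", "B"), ("A", "C"), ("B", "E"), ("C", "E"), ("E", "G")}
print(d_separated("B", "C", {"A"}, edges))             # True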

Page 12: Bayesian Networks

Inference in Bayesian Networks

Knowledge about a domain is encoded by a Bayesian network BN = (G, P).

Inference = updating probabilities: evidence E on the values taken by some variables modifies the probabilities of the remaining variables

P(X) ---> P'(X) = P(X | E)

Direct Method:

BN = <G = {A,B,C,D,E}, P(A,B,C,D,E)>

Evidence: A = ai, B = bj

P(C = ck | A = ai, B = bj) = Σ_{m,p} P(ai, bj, ck, dm, ep) / Σ_{k,m,p} P(ai, bj, ck, dm, ep)
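The direct method is exact but exponential: it sums the joint over every combination of values of the unobserved variables. A brute-force sketch for binary variables, where joint is a hypothetical function returning P(x1, ..., xn):

from itertools import product

def direct_query(joint, evid_idx, evid_val, query_idx, query_val, n=5):
    num = den = 0.0
    for xs in product((0, 1), repeat=n):
        if any(xs[i] != v for i, v in zip(evid_idx, evid_val)):
            continue                    # skip assignments inconsistent with E
        p = joint(*xs)
        den += p                        # Σ_{k,m,p} P(ai, bj, ck, dm, ep)
        if xs[query_idx] == query_val:
            num += p                    # Σ_{m,p} P(ai, bj, ck, dm, ep)
    return num / den

# e.g. P(C = 1 | A = 1, B = 0) = direct_query(joint, (0, 1), (1, 0), 2, 1)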

Page 13: Bayesian Networks

Inference in Bayesian Networks

• Bayesian networks allow local computations, which exploit the independence relations among variables explicitly induced by the network's DAG.

• They allow updating the probability of a variable using only the probabilities of its immediate predecessor nodes (parents), and in this way, step by step, updating the probabilities of all non-instantiated variables in the network ---> propagation methods

• Two main propagation methods:

• Pearl method: message passing over the DAG

• Lauritzen & Spiegelhalter method: prior transformation of the DAG into a tree of cliques

Page 14: Bayesian Networks

Propagation method in trees of cliques

1) transformation of the initial network into another graphical structure, a tree of cliques (subsets of nodes), carrying equivalent probabilistic information:

BN = (G, P) ----> [Tree, P]

2) propagation algorithm over the new structure

Page 15: Bayesian Networks

Graphical Transformation

Definition: a "clique" in an undirected graph is a complete and maximal subgraph.

To transform a DAG G into a tree of cliques (a sketch of steps 1-2 follows the list):

1) Delete the directions of the edges of G: G'

2) Moralization of G': add edges between nodes with common children in the original DAG G: G''

3) Triangulation of G'': G*

4) Identification of the cliques of G*

5) Suitable enumeration of the cliques (Running Intersection Property)

6) Construction of the tree according to the enumeration
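A minimal sketch of steps 1-2 above, assuming the DAG is given as a set of (parent, child) pairs (triangulation and clique extraction are treated on the next pages; libraries such as networkx also offer these transformations):

from itertools import combinations

def moralize(dag_edges):
    undirected = {frozenset(e) for e in dag_edges}    # step 1: drop directions
    parents_of = {}
    for p, c in dag_edges:
        parents_of.setdefault(c, set()).add(p)
    for parents in parents_of.values():               # step 2: marry parents
        for u, v in combinations(parents, 2):
            undirected.add(frozenset((u, v)))
    return undirected

# Page 8's example: moralization adds B-E (parents of C) and C-G (parents of H)
dag = {("A", "B"), ("B", "C"), ("E", "C"), ("F", "E"),
       ("F", "G"), ("C", "D"), ("C", "H"), ("G", "H")}
moral = moralize(dag)
print(frozenset(("B", "E")) in moral, frozenset(("C", "G")) in moral)  # True True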

Page 16: Bayesian Networks

Example (1)

[Figure: 1) the example DAG with edge directions deleted (G'); 2) the moralized graph G'' with added edges between common parents (B–E, parents of C; C–G, parents of H)]

Page 17: Bayesian Networks

Example (2): triangulation

[Figure: 3) triangulation of the moral graph G'', giving G*]

Page 18: Bayesian Networks

Example (3): cliques

[Figure: 4) the triangulated graph G* with its cliques identified]

Cliques: {A,B}, {B,C,E}, {E,F,G}, {C,E,G}, {C,G,H}, {C,D}

Page 19: Bayesian Networks

Ordering of cliques

Enumeration of the cliques Clq1, Clq2, …, Clqn such that the following property holds:

Running Intersection Property: for all i = 1, …, n there exists j < i such that Si ⊆ Clqj, where Si = Clqi ∩ (Clq1 ∪ Clq2 ∪ ... ∪ Clqi−1).

This property is guaranteed if: (i) the nodes of the graph are enumerated following the criterion of "maximum cardinality search" (a sketch follows); (ii) the cliques are ordered according to the node of the clique with the highest rank in the former enumeration.
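A sketch of the maximum cardinality search of (i): number an arbitrary initial node 1, then repeatedly number the node with most already-numbered neighbours (ties broken arbitrarily here):

def max_cardinality_search(nodes, und_edges, start):
    nbrs = {n: set() for n in nodes}
    for e in und_edges:
        u, v = tuple(e)
        nbrs[u].add(v); nbrs[v].add(u)
    rank, numbered = {start: 1}, [start]
    while len(numbered) < len(nodes):
        best = max((n for n in nodes if n not in rank),
                   key=lambda n: len(nbrs[n] & set(numbered)))
        rank[best] = len(numbered) + 1
        numbered.append(best)
    return rank

# e.g., on the example's triangulated graph (edges as frozensets; hypothetical
# variable): max_cardinality_search("ABCDEFGH", edges_of_G_star, "A")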

Page 20: Bayesian Networks

Example (4): ordering cliques

[Figure: the triangulated graph with each node's maximum-cardinality-search rank, 1-8]

Clq1 = {A,B}, Clq2 = {B,E,C}, Clq3 = {E,C,G}, Clq4 = {E,G,F}, Clq5 = {C,G,H}, Clq6 = {C,D}

Page 21: Bayesian Networks

Tree Construction

Let [Clq1, Clq2, …, Clqn] be an ordering satisfying the R.I.P.

For each clique Clqi, define

Si = Clqi ∩ (Clq1 ∪ Clq2 ∪ ... ∪ Clqi−1)
Ri = Clqi − Si

Tree of cliques (a sketch follows):
- (hyper)nodes: the cliques
- root: Clq1
- for each clique Clqi, its "father" candidates are the cliques Clqk with k < i such that Si ⊆ Clqk
(if more than one candidate, random selection)
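The construction can be sketched directly from these definitions; this version deterministically picks the first father candidate instead of a random one:

def build_tree(cliques):
    fathers, S, R = {}, {}, {}
    for i, clq in enumerate(cliques):
        earlier = set().union(*cliques[:i]) if i else set()
        S[i] = clq & earlier          # Si = Clqi ∩ (Clq1 ∪ ... ∪ Clqi−1)
        R[i] = clq - S[i]             # Ri = Clqi − Si
        candidates = [j for j in range(i) if S[i] <= cliques[j]]
        fathers[i] = candidates[0] if candidates else None   # Clq1 is the root
    return fathers, S, R

cliques = [{"A","B"}, {"B","E","C"}, {"E","C","G"},
           {"E","G","F"}, {"C","G","H"}, {"C","D"}]
print(build_tree(cliques)[0])   # {0: None, 1: 0, 2: 1, 3: 2, 4: 2, 5: 1}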

Page 22: Bayesian Networks

Example (5): trees

S2 = Clq2 ∩ Clq1 = {B}; father candidates: Clq1
S3 = Clq3 ∩ (Clq1 ∪ Clq2) = {E,C}; candidates: Clq2
S4 = Clq4 ∩ (Clq1 ∪ Clq2 ∪ Clq3) = {E,G}; candidates: Clq3
S5 = Clq5 ∩ (Clq1 ∪ Clq2 ∪ Clq3 ∪ Clq4) = {C,G}; candidates: Clq3
S6 = Clq6 ∩ (Clq1 ∪ Clq2 ∪ Clq3 ∪ Clq4 ∪ Clq5) = {C}; candidates: Clq2, Clq3, Clq5

[Figure: the three possible trees of cliques, one for each father choice for Clq6: Clq2, Clq3 or Clq5]

Page 23: Bayesian Networks

Propagation Algorithm

• Potential representation of the distribution P(X1, …, Xn): ([W1, ..., Wp], ψ) is a potential representation of P, where the Wi are subsets of V = {X1, …, Xn}, if

P(V) = ∏_{i=1..p} ψ(Wi)

• In a Bayesian network (G, P): P(X1, ..., Xn) = P(Xn | parents(Xn)) · ... · P(X1 | parents(X1)) admits a potential representation

P(X1, ..., Xn) = ψ(Clq1) · ψ(Clq2) · ... · ψ(Clqm)

with ψ(Clqi) = ∏ {P(Xj | parents(Xj)) : Xj ∈ Clqi, parents(Xj) ⊆ Clqi}, each factor assigned to exactly one such clique (a sketch of this assignment follows).
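A sketch of this assignment of CPT factors to cliques (each family {Xj} ∪ parents(Xj) is sent to the first clique that contains it); on the running example it reproduces the potentials of Example (6) below:

def assign_factors(cliques, families):
    # families: {Xj: {Xj} ∪ parents(Xj)}; raises StopIteration if no clique fits
    assignment = {i: [] for i in range(len(cliques))}
    for xj, family in families.items():
        i = next(i for i, clq in enumerate(cliques) if family <= clq)
        assignment[i].append(xj)   # ψ(Clqi) multiplies in P(Xj | parents(Xj))
    return assignment

cliques = [{"A","B"}, {"B","E","C"}, {"E","C","G"},
           {"E","G","F"}, {"C","G","H"}, {"C","D"}]
families = {"A": {"A"}, "B": {"A","B"}, "C": {"B","E","C"}, "D": {"C","D"},
            "E": {"E","F"}, "F": {"F"}, "G": {"F","G"}, "H": {"C","G","H"}}
print(assign_factors(cliques, families))
# {0: ['A', 'B'], 1: ['C'], 2: [], 3: ['E', 'F', 'G'], 4: ['H'], 5: ['D']}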

Page 24: Bayesian Networks

Propagation Algorithm (2)

Fundamental property of the potential representations:

• Let ([W1, ..., Wm], ψ) be a potential representation for P. Evidence: X3 = a and X5 = b.

• Problem: how to update the probability P'(X1, ..., Xn) = P(X1, ..., Xn | X3 = a, X5 = b)?

Define: W^i = Wi − {X3, X5}, ψ^(W^i) = ψ(Wi) with X3 = a, X5 = b fixed.

([W^1, ..., W^m], ψ^) is a potential representation for P'.

Page 25: Bayesian Networks

Example (6): potentials

[Figure: the tree of cliques and the example DAG]

Clq1 = {A,B}, Clq2 = {B,E,C}, Clq3 = {E,C,G}, Clq4 = {E,G,F}, Clq5 = {C,G,H}, Clq6 = {C,D}

ψ(Clq1) = P(A) · P(B | A)
ψ(Clq2) = P(C | B,E)
ψ(Clq3) = 1
ψ(Clq4) = P(F) · P(E | F) · P(G | F)
ψ(Clq5) = P(H | C,G)
ψ(Clq6) = P(D | C)

P(A,B,C,D,E,F,G,H) = P(D | C) P(H | C, G) P(C | B, E) P(G | F) P(E | F) P(F) P(B | A) P(A)

P(A,B,C,D,E,F,G,H) = ψ(Clq1) · ... · ψ(Clq6)

Page 26: Bayesian Networks

Example (6): potentials

ψ(Clq1) = P(A) · P(B | A):
ψ(a, b) = P(a) · P(b | a) = 0.01 · 0.05 = 0.0005
ψ(¬a, b) = P(¬a) · P(b | ¬a) = 0.99 · 0.01 = 0.0099
ψ(a, ¬b) = P(a) · P(¬b | a) = 0.01 · 0.95 = 0.0095
ψ(¬a, ¬b) = P(¬a) · P(¬b | ¬a) = 0.99 · 0.99 = 0.9801

ψ(Clq5) = P(H | C, G):
ψ(c, g, h) = P(h | c,g) = 0.9    ψ(c, g, ¬h) = P(¬h | c,g) = 0.1
ψ(c, ¬g, h) = P(h | c,¬g) = 0.7    ψ(c, ¬g, ¬h) = P(¬h | c,¬g) = 0.3
ψ(¬c, g, h) = P(h | ¬c,g) = 0.8    ψ(¬c, g, ¬h) = P(¬h | ¬c,g) = 0.2
ψ(¬c, ¬g, h) = P(h | ¬c,¬g) = 0.1    ψ(¬c, ¬g, ¬h) = P(¬h | ¬c,¬g) = 0.9

Page 27: Bayesian Networks
Page 28: Bayesian Networks

Propagation algorithm: theoretical results

Causal network (G, P); ([Clq1, ..., Clqp], ψ) is a potential representation for P.

1) P(Clqi) = P(Ri | Si) · P(Si)

2) P(Rp | Sp) = ψ(Clqp) / Σ_{Rp} ψ(Clqp), where Σ_{Rp} ψ(Clqp) is the marginal of ψ with respect to the variables of Rp.

3) If father(Clqp) = Clqj, then ([Clq1, ..., Clqp−1], ψ') is a potential representation for the marginal distribution P(V − Rp), where:

ψ'(Clqi) = ψ(Clqi) for all i ≠ j, i < p
ψ'(Clqj) = ψ(Clqj) · Σ_{Rp} ψ(Clqp)

Page 29: Bayesian Networks

Propagation algorithm: step by step (2)

Goal: to compute P(Clqi) for all the cliques.

Two traversals of the tree: one bottom-up and one top-down.

BU) Start with clique Clqp. Combining properties 2) and 3) we obtain an iterative way of computing the conditional distributions P(Ri | Si) in each clique until reaching the root clique Clq1.

Root: P(Clq1) = P(R1 | S1).

TD) P(S2) = Σ_{Clq1 − S2} P(Clq1), and from there P(Si) = Σ_{Clqj − Si} P(Clqj):

we can always compute in a clique Clqi the distribution P(Si) once we have computed the distribution of its father clique Clqj (both sums are sketched below).
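Both traversals reduce to one primitive: summing a table over the variables not kept. A sketch, for potentials stored as dicts keyed by tuples of (variable, value) pairs (a representation chosen here for illustration):

def marginalize(table, keep):
    # λi(Si) = Σ_{Ri} ψ(Clqi) with keep = Si; P(Si) = Σ_{Clqj−Si} P(Clqj) likewise
    out = {}
    for assignment, p in table.items():
        key = tuple((v, x) for v, x in assignment if v in keep)
        out[key] = out.get(key, 0.0) + p
    return out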

Page 30: Bayesian Networks

[Figure: the tree of cliques traversed twice: bottom-up passing messages P(Ri | Si), top-down passing messages P(Si)]

P(Clqi) = P(Ri, Si) = P(Ri | Si) · P(Si)

Page 31: Bayesian Networks

Computation at a clique Clqi:

Case 1) Clqi is a leaf: P(Ri | Si) = ψ(Clqi) / λi(Si), where λi(Si) = Σ_{Ri} ψ(Clqi).

Case 2) Clqi has children Clqj and Clqk, already processed: first ψ'(Clqi) = ψ(Clqi) · λj(Sj) · λk(Sk), then P(Ri | Si) = ψ'(Clqi) / λi(Si), where λi(Si) = Σ_{Ri} ψ'(Clqi).

Page 32: Bayesian Networks

[Figure: bottom-up traversal; each clique sends its message to its father: λ2(S2), λ3(S3), λ4(S4), λ5(S5), λ6(S6)]

Page 33: Bayesian Networks

Example (7)

A) Bottom-up traversal: each clique passes λk(Sk) = Σ_{Rk} ψ(Clqk) to its father.

Clique Clq6 = {C, D} (R6 = {D}, S6 = {C}).

λ6(c) = ψ(c, d) + ψ(c, ¬d) = 0.98 + 0.02 = 1
λ6(¬c) = ψ(¬c, d) + ψ(¬c, ¬d) = 0.05 + 0.95 = 1

P(R6 | S6) = P(D | C) = ψ(R6, S6) / λ6(S6):

P(d | c) = ψ(c, d) / λ6(c) = 0.98 / 1 = 0.98, P(¬d | c) = 0.02
P(d | ¬c) = ψ(¬c, d) / λ6(¬c) = 0.05 / 1 = 0.05, P(¬d | ¬c) = 0.95
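A numeric check of this Clq6 step: since ψ(Clq6) = P(D | C), the message λ6(C) sums to 1 and dividing recovers P(D | C):

psi6 = {("c", "d"): 0.98, ("c", "nd"): 0.02,
        ("nc", "d"): 0.05, ("nc", "nd"): 0.95}     # ψ(Clq6) = P(D | C)

lam6 = {}
for (c, d), p in psi6.items():                     # λ6(c) = Σ_D ψ(c, D)
    lam6[c] = lam6.get(c, 0.0) + p
print({k: round(v, 6) for k, v in lam6.items()})   # {'c': 1.0, 'nc': 1.0}

p_d_given_c = {k: p / lam6[k[0]] for k, p in psi6.items()}
print(p_d_given_c[("c", "d")], p_d_given_c[("nc", "d")])   # 0.98 0.05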

Page 34: Bayesian Networks

Example (7)

Clique Clq5 = {C, G, H} (R5 = {H}, S5 = {C, G}).

This node is the father of clique Clq6. According to point 3), we modify the potential function of the clique Clq5:

ψ'(Clq5) = ψ(Clq5) · λ6(S6)

P(R5 | S5) = P(H | C, G) = ψ'(R5, S5) / λ5(S5), where λ5(C, G) = Σ_H ψ'(C, G, H):

λ5(c, g) = ψ'(c, g, h) + ψ'(c, g, ¬h) = 0.9 + 0.1 = 1
λ5(c, ¬g) = ψ'(c, ¬g, h) + ψ'(c, ¬g, ¬h) = 0.7 + 0.3 = 1
λ5(¬c, g) = … = 1
λ5(¬c, ¬g) = … = 1

Page 35: Bayesian Networks

Example (7)

Clique Clq3 = {E, C, G} (R3 = {G}, S3 = {E, C}).

[Figure: the tree of cliques, with Clq4 and Clq5 hanging from Clq3]

Clq3 is the father of two cliques, Clq4 and Clq5, both already processed:

ψ'(Clq3) = ψ(Clq3) · λ4(S4) · λ5(S5)
ψ'(E, C, G) = ψ(E, C, G) · λ4(E, G) · λ5(C, G)

P(R3 | S3) = P(G | E, C) = ψ'(R3, S3) / λ3(S3), where λ3(E, C) = Σ_G ψ'(E, C, G).

Page 36: Bayesian Networks

Example (7)

Root: Clique Clq1 = {A, B} (R1 = {A, B}, S1 = ∅).

ψ'(A, B) = ψ(A, B) · λ2(B)

P(R1) = P(R1 | S1) = ψ'(R1) / λ1(∅) = ψ'(A, B) / λ1,

where λ1 = ψ'(a, b) + ψ'(a, ¬b) + ψ'(¬a, b) + ψ'(¬a, ¬b) = 1.

P(A, B) = ψ'(A, B): P(a, b) = 0.0005, P(a, ¬b) = 0.0095, P(¬a, b) = 0.0099, P(¬a, ¬b) = 0.9801

Page 37: Bayesian Networks

[Figure: clique Clqi with children Clqj and Clqk]

P(Clqi) = P(Ri | Si) · P(Si)

P(Sk) = Σ_{Clqi − Sk} P(Clqi) = πi(Sk)
P(Sj) = Σ_{Clqi − Sj} P(Clqi) = πi(Sj)

Page 38: Bayesian Networks

[Figure: top-down traversal; each father sends its message down the tree: π1(S2), π2(S3), π3(S4), π3(S5), π5(S6)]

Page 39: Bayesian Networks

Example (7)

B) Top-down traversal:

Clique Clq2 = {B, E, C} (R2 = {E, C}, S2 = {B}).

P(B) = P(S2) = Σ_{Clq1 − S2} P(Clq1) = Σ_A P(A, B):

P(b) = P(a, b) + P(¬a, b) = 0.0005 + 0.0099 = 0.0104
P(¬b) = P(a, ¬b) + P(¬a, ¬b) = 1 − 0.0104 = 0.9896

*** P(Clq2) = P(R2 | S2) · P(S2)
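Numerically, with the ψ'(A, B) values corrected above to match Page 8's CPTs:

p_clq1 = {("a", "b"): 0.0005, ("a", "nb"): 0.0095,   # P(Clq1) = P(A, B)
          ("na", "b"): 0.0099, ("na", "nb"): 0.9801}
p_b = sum(p for (a, b), p in p_clq1.items() if b == "b")   # sum out A
print(round(p_b, 4), round(1 - p_b, 4))                    # 0.0104 0.9896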

Page 40: Bayesian Networks

Example (7)

Clique Clq3 = {E, C, G} (R3 = {G}, S3 = {E, C}):
we have to compute P(S3) and P(Clq3)

Clique Clq4 = {E, G, F} (R4 = {F}, S4 = {E, G}):
we have to compute P(S4) and P(Clq4)

Clique Clq5 = {C, G, H} (R5 = {H}, S5 = {C, G}):
we have to compute P(S5) and P(Clq5)

Clique Clq6 = {C, D} (R6 = {D}, S6 = {C}):
we have to compute P(S6) and P(Clq6)

Page 41: Bayesian Networks
Page 42: Bayesian Networks

Summary

Given a Bayesian network BN = (G, P), we have seen how

1) To transform G into a tree of cliques and factorize P as

P(X1, ..., Xn) = ψ(Clq1) · ψ(Clq2) · ... · ψ(Clqm)

where ψ(Clqi) = ∏ {P(Xj | parents(Xj)) : Xj ∈ Clqi, parents(Xj) ⊆ Clqi}

2) To compute the probability distributions P(Clqi) with a propagation algorithm, and from there, to compute the probabilities P(Xj) for Xj ∈ Clqi, by marginalization.

Page 43: Bayesian Networks

Probability updating

It remains to see how to perform inference,

i.e. how to update probabilities P(Xj) when some information (evidence E) is available about some variables:

P(Xj) ---> P*(Xj) = P(Xj | E)

The updating mechanism is based on a fundamental property of the potential representations, applied to P(X1, ..., Xn) and its potential representation in terms of cliques:

P(X1, ..., Xn) = ψ(Clq1) · ψ(Clq2) · ... · ψ(Clqm)

Page 44: Bayesian Networks

Updating mechanism

Recall:

• Let ([Clq1, ..., Clqm], ψ) be a potential representation for P(X1, …, Xn).

• We observe: X3 = a and X5 = b.

• Updating of the probability: P*(X1, X2, X4, X6, ..., Xn) = P(X1, ..., Xn | X3 = a, X5 = b)

Define: Clq^i = Clqi − {X3, X5}, ψ^(Clq^i) = ψ(Clqi) with X3 = a, X5 = b fixed.

([Clq^1, ..., Clq^m], ψ^) is a potential representation for P*.

Page 45: Bayesian Networks

Updating mechanism

Based on three steps (the instantiation of steps A-B is sketched after the list):

A) build the new tree of cliques obtained by deleting from the original tree the instantiated variables,

B) re-compute the new potential functions ^ corresponding to the new cliques and, finally,

C) apply the propagation algorithm over the new tree of cliques and potential functions.
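A sketch of the instantiation behind steps A) and B), with the same dict-of-assignments representation as in the earlier sketches: rows inconsistent with the evidence are dropped, and the instantiated variables disappear from the clique:

def instantiate(table, evidence):
    out = {}
    for assignment, p in table.items():
        d = dict(assignment)
        if all(d.get(v, x) == x for v, x in evidence.items()):
            key = tuple((v, x) for v, x in assignment if v not in evidence)
            out[key] = p              # ψ^ over Clq^i = Clqi minus evidence vars
    return out

psi5 = {(("C", "c"), ("G", "g"), ("H", "h")): 0.9,
        (("C", "c"), ("G", "g"), ("H", "nh")): 0.1}
print(instantiate(psi5, {"H": "h"}))  # only the H = h row survives, without H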

Page 46: Bayesian Networks

[Figure: left, the original tree of cliques Clq1 = {A,B}, Clq2 = {B,E,C}, Clq3 = {E,C,G}, Clq4 = {E,G,F}, Clq5 = {C,G,H}, Clq6 = {C,D}; right, the reduced tree after instantiating A and H: Clq'1 = {B}, Clq'2 = {B,E,C}, Clq'3 = {E,C,G}, Clq'4 = {E,G,F}, Clq'5 = {C,G}, Clq'6 = {C,D}]

A = a, H = h

P(Xj) ---> P*(Xj) = P(Xj | A = a, H = h)

Page 47: Bayesian Networks

A = a, H = h

Page 48: Bayesian Networks

A = a, H = h

Page 49: Bayesian Networks
Page 50: Bayesian Networks

P(D = d | A = a, H = h) ?
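The deck leaves this query open. As a check on what propagation over the reduced tree should return, it can be answered by brute force, reusing the joint function from the sketch after Page 8 (assumed in scope):

from itertools import product

num = den = 0.0
for b, c, d, e, f, g in product((0, 1), repeat=6):
    p = joint(1, b, c, d, e, f, g, 1)   # clamp the evidence A = a, H = h
    den += p
    if d:
        num += p
print(round(num / den, 4))              # P(D = d | A = a, H = h)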