Transcript of: Bayesian Networks – Representation

Bayesian Networks – Representation

Machine Learning – CSE546 · Carlos Guestrin · University of Washington

November 20, 2014 ©Carlos Guestrin 2005-2014

Factored joint distribution - Preview

[Network diagram: Flu → Sinus ← Allergy; Sinus → Headache; Sinus → Nose]
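Written out, the factored joint previewed here is

```latex
P(F, A, S, H, N) \;=\; P(F)\,P(A)\,P(S \mid F, A)\,P(H \mid S)\,P(N \mid S)
```

With binary variables this needs only 1 + 1 + 4 + 2 + 2 = 10 parameters, versus 2^5 − 1 = 31 for an unrestricted joint over the same five variables.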

What about probabilities? Conditional probability tables (CPTs)

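A minimal sketch of the CPTs for this network, with hypothetical probability values: each variable stores one distribution per assignment of its parents, and the product of the local factors defines the joint.

```python
from itertools import product

# Hypothetical CPT entries, P(X=1 | parents), for the
# Flu -> Sinus <- Allergy, Sinus -> {Headache, Nose} network.
P_FLU = 0.1
P_ALLERGY = 0.2
P_SINUS = {(1, 1): 0.9, (1, 0): 0.8, (0, 1): 0.7, (0, 0): 0.05}  # keyed by (Flu, Allergy)
P_HEADACHE = {1: 0.8, 0: 0.1}  # keyed by Sinus
P_NOSE = {1: 0.7, 0: 0.05}     # keyed by Sinus

def bern(p1, x):
    """P(X = x) for a binary variable with P(X = 1) = p1."""
    return p1 if x == 1 else 1.0 - p1

def joint(f, a, s, h, n):
    """Product of one local factor per variable given its parents."""
    return (bern(P_FLU, f) * bern(P_ALLERGY, a) * bern(P_SINUS[(f, a)], s)
            * bern(P_HEADACHE[s], h) * bern(P_NOSE[s], n))

# The CPTs hold 1 + 1 + 4 + 2 + 2 = 10 numbers, yet they determine all
# 2^5 joint probabilities; a valid distribution sums to 1.
total = sum(joint(*x) for x in product([0, 1], repeat=5))
```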
Key: Independence assumptions

Knowing Sinus separates the remaining variables from each other: given Sinus, each of Headache and Nose is independent of Flu, Allergy, and the other symptom.

The independence assumption

Local Markov Assumption: A variable X is independent of its non-descendants given its parents
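Read off this graph, the assumption gives, for example:

```latex
\begin{gathered}
\text{Flu} \perp \text{Allergy} \quad \text{(Flu has no parents; Allergy is a non-descendant)}\\
\text{Headache} \perp \{\text{Flu}, \text{Allergy}, \text{Nose}\} \mid \text{Sinus} \quad \text{(Headache's only parent is Sinus)}\\
\text{Nose} \perp \{\text{Flu}, \text{Allergy}, \text{Headache}\} \mid \text{Sinus} \quad \text{(Nose's only parent is Sinus)}
\end{gathered}
```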

Explaining away

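Explaining away can be checked numerically. A minimal sketch with hypothetical CPTs for the Flu → Sinus ← Allergy fragment, computing the posteriors by brute-force enumeration:

```python
from itertools import product

# Hypothetical CPTs for the Flu -> Sinus <- Allergy fragment.
P_F, P_A = 0.1, 0.2                      # P(Flu=1), P(Allergy=1)
P_S = {(1, 1): 0.9, (1, 0): 0.8,
       (0, 1): 0.7, (0, 0): 0.05}        # P(Sinus=1 | Flu, Allergy)

def joint(f, a, s):
    pf = P_F if f else 1 - P_F
    pa = P_A if a else 1 - P_A
    ps = P_S[(f, a)] if s else 1 - P_S[(f, a)]
    return pf * pa * ps

def cond_flu(evidence):
    """P(Flu=1 | evidence) by brute-force enumeration."""
    num = den = 0.0
    for f, a, s in product([0, 1], repeat=3):
        row = {"f": f, "a": a, "s": s}
        if all(row[k] == v for k, v in evidence.items()):
            p = joint(f, a, s)
            den += p
            if f == 1:
                num += p
    return num / den

p_prior = cond_flu({})                  # P(Flu=1)              = 0.10
p_sinus = cond_flu({"s": 1})            # P(Flu=1 | S=1)        ~ 0.34
p_both = cond_flu({"s": 1, "a": 1})     # P(Flu=1 | S=1, A=1)   = 0.125
```

Observing Sinus=1 raises belief in Flu from 0.10 to about 0.34; additionally observing Allergy=1 lowers it to 0.125, because the allergy already explains the sinus infection. That is explaining away: the two parents become dependent once their common child is observed.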
Naïve Bayes revisited

Local Markov Assumption: A variable X is independent of its non-descendants given its parents
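Naïve Bayes is the special case in which the class Y is the sole parent of every feature Xi, so the local Markov assumption makes the features conditionally independent given the class:

```latex
P(Y, X_1, \dots, X_n) \;=\; P(Y) \prod_{i=1}^{n} P(X_i \mid Y)
```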

Joint distribution

Why can we decompose? Markov Assumption!

The chain rule of probabilities

- P(A,B) = P(A) P(B|A)
- More generally: P(X1,…,Xn) = P(X1) P(X2|X1) ⋯ P(Xn|X1,…,Xn−1)

Chain rule & Joint distribution

Local Markov Assumption: A variable X is independent of its non-descendants given its parents
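Applying the chain rule in the order F, A, S, H, N and then dropping non-parents via the local Markov assumption (A ⊥ F; H ⊥ {F, A} | S; N ⊥ {F, A, H} | S) yields the factored joint:

```latex
\begin{aligned}
P(F,A,S,H,N)
  &= P(F)\,P(A \mid F)\,P(S \mid F,A)\,P(H \mid F,A,S)\,P(N \mid F,A,S,H) \\
  &= P(F)\,P(A)\,P(S \mid F,A)\,P(H \mid S)\,P(N \mid S)
\end{aligned}
```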

The Representation Theorem – Joint Distribution to BN

From a joint probability distribution P, obtain a BN that encodes independence assumptions. The construction is valid if the conditional independencies encoded by the BN are a subset of the conditional independencies in P; then P factorizes as the product of the BN's local CPTs.

Two (trivial) special cases

- Edgeless graph: every variable is independent, so P(X1,…,Xn) = ∏i P(Xi)
- Fully-connected graph: encodes no independence assumptions; the factorization is just the full chain rule

Bayesian Networks – (Structure) Learning

Machine Learning – CSE546 · Carlos Guestrin · University of Washington

November 20, 2014 ©Carlos Guestrin 2005-2014

Review
- Bayesian Networks
  - Compact representation for probability distributions
  - Exponential reduction in number of parameters
- Fast probabilistic inference
  - As shown in demo examples
  - Compute P(X | e)
- Today
  - Learn BN structure

Learning Bayes nets

Data x(1), …, x(m) → learn both the structure (the graph) and the parameters (the CPTs, P(Xi | PaXi)).

Learning the CPTs

Given data x(1), …, x(m): for each discrete variable Xi, estimate P(Xi | PaXi) by maximum likelihood from the observed counts.

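A sketch of the counting estimator, assuming complete data given as dicts mapping variable names to values; the sample values are hypothetical:

```python
from collections import Counter, defaultdict

def mle_cpt(data, child, parents):
    """Maximum-likelihood estimate of P(child | parents) from complete data.

    data: list of dicts, one per sample x(j), mapping variable name -> value.
    Returns {parent_assignment_tuple: {child_value: probability}}.
    """
    counts = defaultdict(Counter)
    for row in data:
        pa = tuple(row[p] for p in parents)
        counts[pa][row[child]] += 1
    return {pa: {v: c / sum(ctr.values()) for v, c in ctr.items()}
            for pa, ctr in counts.items()}

# Hypothetical samples over a fragment of the Sinus network.
samples = [
    {"Flu": 1, "Sinus": 1}, {"Flu": 1, "Sinus": 1},
    {"Flu": 1, "Sinus": 0}, {"Flu": 0, "Sinus": 0},
]
cpt = mle_cpt(samples, "Sinus", ["Flu"])
# cpt[(1,)] is {1: 2/3, 0: 1/3}; cpt[(0,)] is {0: 1.0}
```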
Information-theoretic interpretation of maximum likelihood

- Given structure, log likelihood of data:

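The derivation itself is handwritten in the original slides; the standard result it arrives at is that, at the maximum-likelihood parameters,

```latex
\frac{1}{m}\,\log \hat{P}(\mathcal{D} \mid \hat{\theta}, G)
  \;=\; \sum_{i} \hat{I}\big(X_i;\, \mathrm{Pa}^{G}_{X_i}\big) \;-\; \sum_{i} \hat{H}(X_i)
```

where Î is empirical mutual information and Ĥ is empirical entropy. Only the first sum depends on the graph G, so maximum-likelihood structure learning chooses the graph whose families have maximal mutual information.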
Decomposable score

- Log data likelihood
- Decomposable score:
  - Decomposes over families in the BN (a node and its parents)
  - Will lead to significant computational efficiency
  - Score(G : D) = Σi FamScore(Xi | PaXi : D)

How many trees are there? By Cayley's formula, there are n^(n−2) labeled trees on n nodes: exponentially many. Nonetheless, an efficient algorithm finds the optimal tree.

Scoring a tree 1: equivalent trees. Trees that share the same undirected skeleton receive the same likelihood score, regardless of which node is chosen as root.

Scoring a tree 2: similar trees. Two trees that differ in a single edge differ in score only through the terms for that edge, which is what lets each candidate edge be weighted independently.

Chow-Liu tree learning algorithm 1

- For each pair of variables Xi, Xj:
  - Compute the empirical distribution: P̂(xi, xj) = Count(xi, xj) / m
  - Compute the mutual information: Î(Xi; Xj) = Σxi,xj P̂(xi, xj) log [ P̂(xi, xj) / (P̂(xi) P̂(xj)) ]
- Define a graph:
  - Nodes X1, …, Xn
  - Edge (i, j) gets weight Î(Xi; Xj)
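The per-pair computation can be sketched as follows, assuming complete discrete data given as rows (the sample values are hypothetical):

```python
import math
from collections import Counter

def empirical_mi(rows, i, j):
    """Empirical mutual information I(Xi; Xj), in nats, from complete data.

    rows: list of tuples; rows[k][i] is the value of Xi in sample x(k).
    """
    m = len(rows)
    joint = Counter((r[i], r[j]) for r in rows)
    margin_i = Counter(r[i] for r in rows)
    margin_j = Counter(r[j] for r in rows)
    mi = 0.0
    for (xi, xj), c in joint.items():
        p_ij = c / m
        p_i, p_j = margin_i[xi] / m, margin_j[xj] / m
        mi += p_ij * math.log(p_ij / (p_i * p_j))
    return mi

# Hypothetical binary samples: columns 0 and 1 move together,
# column 2 is independent of both.
rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
mi_01 = empirical_mi(rows, 0, 1)   # maximal for a fair binary pair: log 2
mi_02 = empirical_mi(rows, 0, 2)   # independent columns: 0
```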

Chow-Liu tree learning algorithm 2

- Optimal tree BN:
  - Compute the maximum-weight spanning tree
  - Directions in the BN: pick any node as root; breadth-first search defines the directions
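This step can be sketched as follows, assuming the pairwise mutual informations are given as a dict keyed by sorted index pairs: Prim's algorithm finds the maximum-weight spanning tree, and a BFS from an arbitrary root orients the edges.

```python
from collections import deque

def chow_liu_tree(n, weight):
    """Maximum-weight spanning tree plus BFS orientation.

    weight[(i, j)] with i < j: mutual information of the pair.
    Returns directed edges (parent, child) with node 0 as the root.
    """
    # Prim's algorithm on the complete graph, maximizing edge weight.
    in_tree = {0}
    undirected = []
    while len(in_tree) < n:
        best = max(
            ((i, j) for i in in_tree for j in range(n) if j not in in_tree),
            key=lambda e: weight[tuple(sorted(e))],
        )
        undirected.append(best)
        in_tree.add(best[1])
    # BFS from the root turns undirected tree edges into parent -> child arcs.
    adj = {i: [] for i in range(n)}
    for i, j in undirected:
        adj[i].append(j)
        adj[j].append(i)
    directed, seen, queue = [], {0}, deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                directed.append((u, v))
                queue.append(v)
    return directed

# Toy weights: X0 and X1 strongly dependent, X1 and X2 moderately, X0 and X2 weakly.
edges = chow_liu_tree(3, {(0, 1): 1.0, (0, 2): 0.1, (1, 2): 0.5})
```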

Structure learning for general graphs

- In a tree, each node has at most one parent
- Theorem: learning a BN structure with at most d parents is NP-hard for any (fixed) d > 1
- Most structure learning approaches therefore use heuristics; we (quickly) describe the two simplest

Learn BN structure using local search

- Starting point: the Chow-Liu tree
- Local search, possible moves:
  - Add edge
  - Delete edge
  - Invert edge
- Score candidate structures using BIC
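A sketch of the search loop under a generic scoring function (a stand-in for BIC; the slide does not show the score itself): generate all single-edge moves that keep the graph acyclic, and greedily take improving moves until none remains.

```python
from itertools import permutations

def is_acyclic(edges, nodes):
    """Kahn's algorithm: True iff the directed graph has no cycle."""
    indeg = {x: 0 for x in nodes}
    for _, v in edges:
        indeg[v] += 1
    queue = [x for x in nodes if indeg[x] == 0]
    seen = 0
    while queue:
        u = queue.pop()
        seen += 1
        for a, b in edges:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    queue.append(b)
    return seen == len(nodes)

def neighbors(edges, nodes):
    """All edge sets one move away (add / delete / invert), kept acyclic."""
    edge_set = set(edges)
    for u, v in permutations(nodes, 2):
        if (u, v) in edge_set:
            yield edge_set - {(u, v)}                     # delete edge
            inverted = (edge_set - {(u, v)}) | {(v, u)}   # invert edge
            if is_acyclic(inverted, nodes):
                yield inverted
        elif (v, u) not in edge_set:
            added = edge_set | {(u, v)}                   # add edge
            if is_acyclic(added, nodes):
                yield added

def hill_climb(nodes, score, start=frozenset()):
    """Greedy local search: take improving moves until none improves `score`."""
    current, cur_score = set(start), score(start)
    improved = True
    while improved:
        improved = False
        for cand in neighbors(current, nodes):
            s = score(cand)
            if s > cur_score:
                current, cur_score, improved = set(cand), s, True
    return current

# Toy score (stand-in for BIC): reward edge (0, 1), penalize complexity.
toy_score = lambda E: (1.0 if (0, 1) in E else 0.0) - 0.1 * len(E)
learned = hill_climb([0, 1], toy_score)
```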

Learn Graphical Model Structure using LASSO

- Graph structure is about selecting parents for each variable
- With no independence assumptions, each CPT depends on all other variables
- With independence assumptions, each CPT depends only on a few key variables (the parents)
- One approach to structure learning: sparse logistic regression, where L1 regularization drives most candidate-parent weights to zero and the remaining nonzero weights select the parents
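A minimal pure-Python sketch of the idea for one target variable: fit an L1-penalized logistic regression of the variable on all candidate parents, then read the parents off the nonzero weights. Here the solver is proximal gradient (ISTA) rather than a library routine, and the data, penalty strength, and absence of an intercept are all illustrative assumptions.

```python
import math
import random

def soft_threshold(v, t):
    """Proximal operator of t * |w|: shrink toward zero."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def sparse_logreg(X, y, lam, lr=0.1, iters=1000):
    """L1-regularized logistic regression fit by proximal gradient (ISTA).

    Columns of X are the candidate parents; after fitting, the columns with
    nonzero weight are the selected parents. No intercept, for brevity.
    """
    m, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        for xi, yi in zip(X, y):       # average logistic-loss gradient
            p = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
            for j in range(d):
                grad[j] += (p - yi) * xi[j] / m
        # gradient step, then the L1 proximal step (soft-thresholding)
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad)]
    return w

# Hypothetical data: the target is driven by column 0; column 1 is noise.
random.seed(0)
X = [[random.randint(0, 1), random.randint(0, 1)] for _ in range(200)]
y = [1 if random.random() < (0.9 if x[0] else 0.1) else 0 for x in X]
w = sparse_logreg(X, y, lam=0.1)
parents = [j for j, wj in enumerate(w) if abs(wj) > 1e-6]
```

With the L1 penalty, the weight on the noise column is driven to (or near) zero while the true parent keeps a clearly positive weight, so thresholding the weights recovers the parent set.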

What you need to know about learning BN structures

- Decomposable scores
  - Maximum likelihood
  - Information-theoretic interpretation
- Best tree (Chow-Liu)
- Beyond tree-like models: NP-hard
- Use heuristics, such as:
  - Local search
  - LASSO
