Transcript of: Expectation Maximization, Introduction to Artificial Intelligence, COS302, Michael L. Littman, Fall 2001.

Page 1:

Expectation Maximization

Introduction to Artificial Intelligence

COS302

Michael L. Littman

Fall 2001

Page 2:

Administration

Exams halfway graded. They assure me they will be working over Thanksgiving break.

Project groups.

Next week, synonyms via web.

Week after, synonyms via wordnet. (See web site.)

Page 3:

Plan

Connection between learning from data and finding a maximum likelihood (ML) model

ML from complete data

EM: ML with missing data

EM for HMMs

QED, PDQ, MOUSE

Page 4:

Learning from Data

We want to learn a model with a set of parameter values M.

We are given a set of data D.

An approach: argmax_M Pr(M|D)

This is the maximum likelihood model (ML).

How relate to Pr(D|M)?

Page 5:

Super Simple Example

Coin I and Coin II. (Weighted.)

Pick a coin at random (uniform).

Flip it 4 times.

Repeat.

What are the parameters of the model?

Page 6:

Data

Coin I    Coin II
HHHT      TTTH
HTHH      THTT
HTTH      TTHT
THHH      HTHT
HHHH      HTTT

Page 7:

Probability of D Given M

p: Probability of H from Coin I

q: Probability of H from Coin II

Let's say h heads and t tails for Coin I, h' and t' for Coin II.

Pr(D|M) = p^h (1-p)^t q^h' (1-q)^t'

How maximize this quantity?
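As a rough illustration, this quantity can be evaluated directly from the head/tail counts. A minimal Python sketch (not from the slides), with the counts tallied from the Data slide (h = 15, t = 5 for Coin I; h' = 6, t' = 14 for Coin II) and p, q set to 3/4 and 3/10, the values that turn out to maximize it on the next slide:

# Evaluate Pr(D|M) = p^h (1-p)^t q^h' (1-q)^t'.
# Counts tallied from the Data slide: Coin I has 15 heads, 5 tails;
# Coin II has 6 heads, 14 tails.
def likelihood(p, q, h, t, h2, t2):
    return p**h * (1 - p)**t * q**h2 * (1 - q)**t2

print(likelihood(3/4, 3/10, 15, 5, 6, 14))
# Any other choice of p, q gives a smaller value, which is what the next slide shows.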

Page 8:

Maximizing p

D_p(p^h (1-p)^t q^h' (1-q)^t') = 0

(The q^h' (1-q)^t' factor is constant in p, so it drops out.)

D_p(p^h) (1-p)^t + p^h D_p((1-p)^t) = 0

h p^(h-1) (1-p)^t = p^h t (1-p)^(t-1)

h (1-p) = p t

h = p t + h p

h/(t+h) = p

Duh…
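A minimal sketch of this closed-form estimate, with the labeled sequences copied from the Data slide; counting heads and tails and dividing reproduces p = 3/4 and q = 3/10:

# ML estimates from fully labeled flips: p = h/(t+h), and likewise for q.
coin1 = ["HHHT", "HTHH", "HTTH", "THHH", "HHHH"]   # Coin I column of the Data slide
coin2 = ["TTTH", "THTT", "TTHT", "HTHT", "HTTT"]   # Coin II column

def ml_estimate(seqs):
    h = sum(s.count("H") for s in seqs)
    t = sum(s.count("T") for s in seqs)
    return h / (t + h)

print(ml_estimate(coin1), ml_estimate(coin2))   # 0.75 0.3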

Page 9:

Missing Data

HHHT    HTTH
TTTH    HTHH
THTT    HTTT
TTHT    HHHH
THHH    HTHT

Page 10:

Oh Boy, Now What!

If we knew the labels (which flips from which coin), we could find ML values for p and q.

What could we use to label?

p and q!

Page 11:

Computing Labels

p = 3/4, q = 3/10

Pr(Coin I | HHTH)
= Pr(HHTH | Coin I) Pr(Coin I) / c
= (3/4)^3 (1/4) (1/2) / c = .052734375/c

Pr(Coin II | HHTH)
= Pr(HHTH | Coin II) Pr(Coin II) / c
= (3/10)^3 (7/10) (1/2) / c = .00945/c
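A minimal sketch of this computation; the normalizer c is just the sum of the two unnormalized terms, and the 1/2 factor is the uniform prior over coins:

# Posterior over coins for one 4-flip sequence, with p = 3/4, q = 3/10 as on the slide.
p, q = 3/4, 3/10

def label(seq):
    h, t = seq.count("H"), seq.count("T")
    a = p**h * (1 - p)**t * 0.5      # Pr(seq | Coin I) Pr(Coin I)
    b = q**h * (1 - q)**t * 0.5      # Pr(seq | Coin II) Pr(Coin II)
    c = a + b                        # normalizer
    return a / c, b / c

print(label("HHTH"))                 # about (.85, .15)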

Page 12:

Expected Labels

        I    II            I    II
HHHT   .85  .15    HTTH   .44  .56
TTTH   .10  .90    HTHH   .85  .15
THTT   .10  .90    HTTT   .10  .90
TTHT   .10  .90    HHHH   .98  .02
THHH   .85  .15    HTHT   .44  .56

Page 13:

Wait, I Have an Idea

Pick some model M0

Expectation
• Compute expected labels via Mi

Maximization
• Compute ML model Mi+1

Repeat
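A minimal sketch of this loop for the two-coin problem, with the sequences from the Missing Data slide. Here a model Mi is just the pair (p, q): the expectation step computes soft labels exactly as on the Computing Labels slide, and the maximization step re-estimates p and q from the resulting fractional head/tail counts. The starting values below are arbitrary.

# EM for two weighted coins with unlabeled 4-flip sequences (Missing Data slide).
seqs = ["HHHT", "TTTH", "THTT", "TTHT", "THHH",
        "HTTH", "HTHH", "HTTT", "HHHH", "HTHT"]

def em(p, q, iters=25):
    for _ in range(iters):
        # Expectation: w = Pr(Coin I | sequence) under the current (p, q).
        h1 = t1 = h2 = t2 = 0.0
        for s in seqs:
            h, t = s.count("H"), s.count("T")
            a = p**h * (1 - p)**t * 0.5
            b = q**h * (1 - q)**t * 0.5
            w = a / (a + b)
            h1 += w * h                # fractional counts for Coin I
            t1 += w * t
            h2 += (1 - w) * h          # fractional counts for Coin II
            t2 += (1 - w) * t
        # Maximization: closed-form ML estimates from the fractional counts.
        p, q = h1 / (h1 + t1), h2 / (h2 + t2)
    return p, q

print(em(0.6, 0.5))   # compare with the fixed points listed on the Coin Example slide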

Page 14:

Could This Work?

Expectation-Maximization (EM)

Pr(D|Mi) will not decrease.

Sound familiar? Type of search.

Page 15:

Coin Example

Compute expected labels.

Compute counts of heads and tails (fractions).

Divide to get new probabilities.

p=.63  q=.42    Pr(D|M) = 9.95 x 10^-13
p=.42  q=.63    Pr(D|M) = 9.95 x 10^-13
p=.52  q=.52    Pr(D|M) = 9.56 x 10^-13

Page 16:

More General EM

Need to be able to compute probabilities: generative model

Need to tabulate counts to estimate ML model

Let's think this through with HMMs.

Page 17:

Recall HMM Model

N states, M observations

π(s): prob. starting state is s

p(s,s'): prob. of s to s' transition

b(s, k): probability of obs k from s

k_0 k_1 … k_l: observation sequence

argmax_{π,p,b} Pr(π, p, b | k_0 k_1 … k_l)

Page 18:

ML in HMM

How estimate π, p, b?

What's the missing information?

k_0 k_1 … k_l
s_0 s_1 … s_l

Page 19:

Pr(s_t = s | N N N)

[Slide diagram: a two-state HMM with states UP and DOWN; observation probabilities R: 0.7, N: 0.3 and R: 0.0, N: 1.0 for the two states; transition probabilities 0.1, 0.2, 0.8, and 0.9; observation sequence N N N.]

Page 20:

Forward Procedure

α(s,t): probability of seeing the first t observations and ending up in state s: Pr(k_0 … k_t, s_t = s)

α(s,0) = π(s) b(k_0, s)

α(s,t) = sum_{s'} b(k_t, s) p(s', s) α(s', t-1)
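A minimal sketch of the forward procedure on a small hypothetical two-state HMM. The numbers below are loosely based on those in the diagram slide, but their exact assignment to states and edges is a guess; π, p, and b follow the notation of the Recall HMM Model slide, with b[s][k] the probability of observation k from state s.

# Forward procedure: alpha[t][s] = Pr(k_0 ... k_t, s_t = s).
pi = [0.5, 0.5]                      # pi(s): prob. the starting state is s
p  = [[0.9, 0.1],                    # p(s, s'): prob. of an s -> s' transition
      [0.2, 0.8]]
b  = [[0.7, 0.3],                    # b(s, k): prob. of observation k from state s
      [0.0, 1.0]]

def forward(obs):
    n = len(pi)
    alpha = [[pi[s] * b[s][obs[0]] for s in range(n)]]            # alpha(s, 0)
    for t in range(1, len(obs)):
        alpha.append([b[s][obs[t]] *
                      sum(p[s2][s] * alpha[t - 1][s2] for s2 in range(n))
                      for s in range(n)])                          # alpha(s, t)
    return alpha

print(forward([1, 1, 1]))   # e.g. an observation sequence N N N, with N coded as 1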

Page 21:

Backward Procedure

β(s,t): probability of seeing the observations after time t given that we are in state s at time t: Pr(k_{t+1} … k_l | s_t = s)

β(s,l) = 1

β(s,t) = sum_{s'} p(s,s') b(k_{t+1}, s') β(s', t+1)
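A matching sketch of the backward procedure, using the same hypothetical two-state parameters as the forward sketch above (repeated here so the snippet runs on its own):

# Backward procedure: beta[t][s] = Pr(k_{t+1} ... k_l | s_t = s), with beta(s, l) = 1.
p = [[0.9, 0.1], [0.2, 0.8]]         # p(s, s'): s -> s' transition prob. (as above)
b = [[0.7, 0.3], [0.0, 1.0]]         # b(s, k): prob. of observation k from state s

def backward(obs):
    n, l = len(p), len(obs) - 1
    beta = [[1.0] * n]                                             # beta(s, l)
    for t in range(l - 1, -1, -1):
        beta.insert(0, [sum(p[s][s2] * b[s2][obs[t + 1]] * beta[0][s2]
                            for s2 in range(n))
                        for s in range(n)])                        # beta(s, t)
    return beta

print(backward([1, 1, 1]))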

Page 22:

Combining α and β

Want to know Pr(s_t = s | k_0 … k_l)

= Pr(k_0 … k_l, s_t = s) / c

= Pr(k_0 … k_t k_{t+1} … k_l, s_t = s) / c

= Pr(k_0 … k_t, s_t = s) Pr(k_{t+1} … k_l | k_0 … k_t, s_t = s) / c

= Pr(k_0 … k_t, s_t = s) Pr(k_{t+1} … k_l | s_t = s) / c

= α(s,t) β(s,t) / c
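Putting the two together gives the state posterior at each time step. A minimal sketch that assumes the forward and backward functions (and the hypothetical parameters) from the two sketches above are already defined in the same session:

# Pr(s_t = s | k_0 ... k_l) = alpha(s, t) * beta(s, t) / c, normalized over states.
def state_posteriors(obs):
    alpha, beta = forward(obs), backward(obs)
    n = len(alpha[0])
    posteriors = []
    for t in range(len(obs)):
        unnorm = [alpha[t][s] * beta[t][s] for s in range(n)]
        c = sum(unnorm)                          # the constant c on this slide
        posteriors.append([u / c for u in unnorm])
    return posteriors

print(state_posteriors([1, 1, 1]))   # one distribution over the two states per time step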

Page 23:

EM For HMM

Expectation: Forward-backward (Baum-Welch)

Maximization: Use counts to reestimate parameters

Repeat.

Gets stuck, but still works well.

Page 24:

What to Learn

Maximum Likelihood (counts)

Expectation (expected counts)

EM

Forward-backward for HMMs

Page 25:

Homework 8 (due 11/28)

1. Write a program that decides if a pair of words are synonyms using the web. I'll send you the list, you send me the answers.

2. Recall the naïve Bayes model in which a class is chosen at random, then features are generated from the class. Consider a simple example with 2 classes with 3 binary features. Let's use EM to learn a naïve Bayes

Page 26:

(continued)

model. (a) What are the parameters of the model? (b) Imagine we are given data consisting of the two feature values for each sample from the model. We are not given the class label. Describe an "expectation" procedure to compute class labels for the data given a model. (c) How do you use this procedure to learn a maximum likelihood model for the data?

Page 27:

Homework 9 (due 12/5)

1. Write a program that decides if a pair of words are synonyms using wordnet. I'll send you the list, you send me the answers.

2. more soon