
Transcript of 10601 Machine Learning

Page 1: 10601 Machine Learning


10601 Machine Learning

Recitation 2
Öznur Taştan

September 2, 2009

Page 2: 10601 Machine Learning

Logistics

Homework 2 will be out tomorrow. It is due on Wednesday, Sep 16.

There is no class on Monday, Sep 7 (Labor Day).

Those who have not returned Homework 1 yet: for details of the homework submission policy, please check http://www.cs.cmu.edu/~ggordon/10601/hws.html

Page 3: 10601 Machine Learning

Outline

We will review:

Some probability and statistics
Some graphical models

We will not go over Homework 1, since the grace period has not ended yet. Solutions will be up next week on the web page.

Page 4: 10601 Machine Learning

We’ll play a game: Catch the goof! I’ll be the sloppy TA… I will make ‘intentional’ mistakes.

You’ll catch those mistakes and correct me!

Slides with mistakes are marked with one icon; correct slides are marked with another. [Icons not shown in this transcript.]

Page 5: 10601 Machine Learning

Catch the goof!!

Page 6: 10601 Machine Learning

Given two discrete random variables X and Y, where X takes values in {x_1, …, x_m} and Y takes values in {y_1, …, y_n}.

Law of total probability:

P(Y = y_j) = Σ_j P(X = x_i, Y = y_j)

P(Y = y_j) = Σ_j P(X = x_i | Y = y_j) P(Y = y_j)

Page 7: 10601 Machine Learning

Given two discrete random variables X and Y, where X takes values in {x_1, …, x_m} and Y takes values in {y_1, …, y_n}.

Law of total probability:

P(Y = y_j) = Σ_j P(X = x_i, Y = y_j)

P(Y = y_j) = Σ_j P(X = x_i | Y = y_j) P(Y = y_j)

Page 8: 10601 Machine Learning

Given two discrete random variables X and Y, where X takes values in {x_1, …, x_m} and Y takes values in {y_1, …, y_n}.

Law of total probability:

P(Y = y_j) = Σ_j P(X = x_i, Y = y_j)

P(Y = y_j) = Σ_j P(X = x_i | Y = y_j) P(Y = y_j)

The left-hand sides should be P(X = x_i).

Page 9: 10601 Machine Learning

Given two discrete random variables X and Y, where X takes values in {x_1, …, x_m} and Y takes values in {y_1, …, y_n}.

Law of total probability:

P(X = x_i) = Σ_j P(X = x_i, Y = y_j)

P(X = x_i) = Σ_j P(X = x_i | Y = y_j) P(Y = y_j)

Page 10: 10601 Machine Learning

Given two discrete random variables X and Y.

Law of total probability:

P(X = x_i) = Σ_j P(X = x_i, Y = y_j)

P(X = x_i) = Σ_j P(X = x_i | Y = y_j) P(Y = y_j)

Labels on the slide: P(X = x_i, Y = y_j) is the joint probability; P(X = x_i) is the marginal probability; P(X = x_i | Y = y_j) is the conditional probability of X conditioned on Y.

Page 11: 10601 Machine Learning

Given two discrete random variables X and Y.

Law of total probability:

P(X = x_i) = Σ_j P(X = x_i, Y = y_j)

P(X = x_i) = Σ_j P(X = x_i | Y = y_j) P(Y = y_j)

Labels on the slide: P(X = x_i, Y = y_j) is the joint probability; P(X = x_i) is the marginal probability; P(X = x_i | Y = y_j) is the conditional probability of X conditioned on Y.

Formulas are fine. Anything wrong with the names?

Page 12: 10601 Machine Learning

Given two discrete random variables X and Y.

Law of total probability:

P(X = x_i) = Σ_j P(X = x_i, Y = y_j)

P(X = x_i) = Σ_j P(X = x_i | Y = y_j) P(Y = y_j)

Corrected labels: P(X = x_i, Y = y_j) is the joint probability of X and Y; P(X = x_i) is a marginal probability; P(X = x_i | Y = y_j) is the conditional probability of X conditioned on Y; P(Y = y_j) is also a marginal probability.

Page 13: 10601 Machine Learning

In a strange world: two discrete random variables X and Y take binary values.

Joint probabilities:

P(X=0, Y=0) = 0.2
P(X=0, Y=1) = 0.2
P(X=1, Y=0) = 0.5
P(X=1, Y=1) = 0.5

Page 14: 10601 Machine Learning

In a strange world: two discrete random variables X and Y take binary values.

Joint probabilities:

P(X=0, Y=0) = 0.2
P(X=0, Y=1) = 0.2
P(X=1, Y=0) = 0.5
P(X=1, Y=1) = 0.5

These should sum up to 1 (here they sum to 1.4).

Page 15: 10601 Machine Learning

The world seems fine: two discrete random variables X and Y take binary values.

Joint probabilities:

P(X=0, Y=0) = 0.2
P(X=0, Y=1) = 0.2
P(X=1, Y=0) = 0.3
P(X=1, Y=1) = 0.3

Page 16: 10601 Machine Learning

What about the marginals?

Joint probabilities:

P(X=0, Y=0) = 0.2
P(X=0, Y=1) = 0.2
P(X=1, Y=0) = 0.3
P(X=1, Y=1) = 0.3

Marginal probabilities:

P(X=0) = 0.2, P(X=1) = 0.8
P(Y=0) = 0.5, P(Y=1) = 0.5

Page 17: 10601 Machine Learning

This is a strange world.

Joint probabilities:

P(X=0, Y=0) = 0.2
P(X=0, Y=1) = 0.2
P(X=1, Y=0) = 0.3
P(X=1, Y=1) = 0.3

Marginal probabilities:

P(X=0) = 0.2, P(X=1) = 0.8
P(Y=0) = 0.5, P(Y=1) = 0.5

Page 18: 10601 Machine Learning

In a strange world.

Joint probabilities:

P(X=0, Y=0) = 0.2
P(X=0, Y=1) = 0.2
P(X=1, Y=0) = 0.3
P(X=1, Y=1) = 0.3

Marginal probabilities:

P(X=0) = 0.2, P(X=1) = 0.8
P(Y=0) = 0.5, P(Y=1) = 0.5

Page 19: 10601 Machine Learning

This is a strange world.

Joint probabilities:

P(X=0, Y=0) = 0.2
P(X=0, Y=1) = 0.2
P(X=1, Y=0) = 0.3
P(X=1, Y=1) = 0.3

Marginal probabilities:

P(X=0) = 0.2, P(X=1) = 0.8
P(Y=0) = 0.5, P(Y=1) = 0.5

Page 20: 10601 Machine Learning

Let’s have a simple problem.

Joint probabilities:

P(X=0, Y=0) = 0.2
P(X=0, Y=1) = 0.2
P(X=1, Y=0) = 0.3
P(X=1, Y=1) = 0.3

Marginal probabilities:

P(X=0) = 0.4, P(X=1) = 0.6
P(Y=0) = 0.5, P(Y=1) = 0.5

P(X=0) = P(X=0, Y=0) + P(X=0, Y=1) = 0.4
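To double-check arithmetic like this, it helps to compute the marginals mechanically. Here is a minimal Python sketch (the snippet is added for this transcript, not from the slides), using the joint table above:

```python
# Joint table P(X=x, Y=y) from the slide.
joint = {(0, 0): 0.2, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.3}

# Marginalize: P(X=x) = sum over y of P(X=x, Y=y), and similarly for Y.
p_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in (0, 1)}

print(p_x)  # {0: 0.4, 1: 0.6}
print(p_y)  # {0: 0.5, 1: 0.5}
assert abs(sum(joint.values()) - 1.0) < 1e-9  # the joint must sum to 1
```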

Page 21: 10601 Machine Learning

Conditional probabilities. What is the complementary event of P(X=0 | Y=1)?

P(X=1 | Y=1) or P(X=0 | Y=0)?

Page 22: 10601 Machine Learning

Conditional probabilities. What is the complementary event of P(X=0 | Y=1)?

P(X=1 | Y=1): the conditioning event Y=1 stays fixed; only the outcome of X is complemented.

Page 23: 10601 Machine Learning

The game ends here.

Page 24: 10601 Machine Learning

Number of independent parameters. Assume X and Y take Boolean values {0,1}.

How many independent parameters do you need to fully specify:

the marginal probability of X?
the joint probability P(X,Y)?
the conditional probability P(X|Y)?

Page 25: 10601 Machine Learning

Number of independent parameters. Assume X and Y take Boolean values {0,1}.

How many independent parameters do you need to fully specify:

the marginal probability of X? P(X=0): 1 parameter only [because P(X=1) + P(X=0) = 1]

the joint probability P(X,Y)? P(X=0, Y=0), P(X=0, Y=1), P(X=1, Y=0): 3 parameters

the conditional probability P(X|Y)?

Page 26: 10601 Machine Learning

Number of parameters. Assume X and Y take Boolean values {0,1}.

How many independent parameters do you need to fully specify the marginal probability of X? P(X=0): 1 parameter only [P(X=1) = 1 − P(X=0)]

How many independent parameters do you need to fully specify the joint probability P(X,Y)? P(X=0, Y=0), P(X=0, Y=1), P(X=1, Y=0): 3 parameters

How many independent parameters do you need to fully specify the conditional probability P(X|Y)? P(X=0|Y=0), P(X=0|Y=1): 2 parameters

Page 27: 10601 Machine Learning

Number of parameters. What about P(X | Y, Z)? How many independent parameters do you need to be able to fully specify the probabilities?

Assume X takes m values, Y takes n values, and Z takes q values.

Page 28: 10601 Machine Learning

Number of parameters. What about P(X | Y, Z)? How many independent parameters do you need to be able to fully specify the probabilities?

Assume X takes m values, Y takes n values, and Z takes q values.

Number of independent parameters: (m−1)·n·q
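As a sanity check on this count: for each of the n·q configurations of (Y, Z) we need a full distribution over X, which takes m−1 numbers since the last one is fixed by normalization. With all variables binary (m = n = q = 2), that is (2−1)·2·2 = 4 parameters; dropping Z (q = 1) recovers the 2-parameter count for P(X|Y) from the previous slide.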

Page 29: 10601 Machine Learning

Graphical models

A graphical model is a way of representing probabilistic relationships between random variables.

Variables are represented by nodes; edges indicate probabilistic relationships. Example: You miss the bus → Arrive to class late.

The joint distribution factorizes over the graph:

P(X_1, X_2, …, X_n) = Π_{i=1}^{n} P(X_i | Pa(X_i))
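To see the factorization concretely, here is a minimal Python sketch of the two-node network above. All CPT numbers are made up for illustration; only the factorization itself comes from the slide:

```python
# Two-node network: Bus -> Late. Probability values are invented.
p_bus = {1: 0.1, 0: 0.9}                    # P(miss the bus)
p_late_given_bus = {
    1: {1: 0.8, 0: 0.2},                    # P(late | missed the bus)
    0: {1: 0.05, 0: 0.95},                  # P(late | did not miss the bus)
}

def joint(bus, late):
    # P(bus, late) = P(bus) * P(late | bus): each node given its parents.
    return p_bus[bus] * p_late_given_bus[bus][late]

# The factorized joint is a valid distribution: it sums to 1.
assert abs(sum(joint(b, l) for b in (0, 1) for l in (0, 1)) - 1.0) < 1e-9
```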

Page 30: 10601 Machine Learning

Serial connection

X → Y → Z

Are X and Z independent?

X ⊥ Z ?

Page 31: 10601 Machine Learning

Serial connection

X → Y → Z

Are X and Z independent?

No: X and Z are not independent.

Page 32: 10601 Machine Learning

Serial connection

X → Y → Z

Is X conditionally independent of Z given Y?

X ⊥ Z | Y ?

Page 33: 10601 Machine Learning

Serial connection

X → Y → Z

Is X conditionally independent of Z given Y?

X ⊥ Z | Y

Yes, they are conditionally independent given Y.

Page 34: 10601 Machine Learning

How can we show it?

X → Y → Z

The chain factorizes as:

P(X, Y, Z) = P(X) P(Y|X) P(Z|Y)

P(Z | X, Y) = P(X, Y, Z) / P(X, Y)
            = P(X) P(Y|X) P(Z|Y) / [P(X) P(Y|X)]
            = P(Z | Y)

Is X conditionally independent of Z given Y? Yes: X ⊥ Z | Y.
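The same algebra can be checked numerically. Below is a small sketch with arbitrary, made-up CPTs (not from the slides), verifying that P(Z | X, Y) = P(Z | Y) for every assignment in a binary chain X → Y → Z:

```python
# Chain X -> Y -> Z with invented binary CPTs.
pX = {0: 0.3, 1: 0.7}
pY_X = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.2, 1: 0.8}}   # P(Y | X)
pZ_Y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}   # P(Z | Y)

def p_xyz(x, y, z):
    # Factorization of the chain: P(X) P(Y|X) P(Z|Y).
    return pX[x] * pY_X[x][y] * pZ_Y[y][z]

for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            # P(Z=z | X=x, Y=y) = P(x, y, z) / P(x, y)
            p_z_given_xy = p_xyz(x, y, z) / sum(p_xyz(x, y, t) for t in (0, 1))
            assert abs(p_z_given_xy - pZ_Y[y][z]) < 1e-12  # equals P(Z=z | Y=y)
```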

Page 35: 10601 Machine Learning

An example case (a serial connection):

Studied late last night → Wake up late → Arrive to class late

Page 36: 10601 Machine Learning

Common cause

X ← Z → Y

Example: Age (Z) is a common cause of Shoe Size (X) and Gray Hair (Y).

X and Y are not marginally independent.
X and Y are conditionally independent given Z.

Page 37: 10601 Machine Learning

Explaining away

X → Y ← Z

Example: Flu (X) and Allergy (Z) can each cause Sneeze (Y).

X and Z are marginally independent.
X and Z are conditionally dependent given Y.
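Explaining away can also be seen with numbers. The sketch below (all probabilities invented for illustration) shows that once the sneeze is observed, learning the patient has an allergy lowers the probability of flu:

```python
# Collider: Flu (X) -> Sneeze (Y) <- Allergy (Z), with made-up numbers.
pX = {1: 0.1, 0: 0.9}            # P(flu)
pZ = {1: 0.2, 0: 0.8}            # P(allergy)
pY1 = {(0, 0): 0.01, (0, 1): 0.7, (1, 0): 0.7, (1, 1): 0.9}  # P(sneeze=1 | flu, allergy)

def joint(x, z, y):
    py = pY1[(x, z)] if y == 1 else 1 - pY1[(x, z)]
    return pX[x] * pZ[z] * py    # X and Z are independent a priori

p_sneeze = sum(joint(x, z, 1) for x in (0, 1) for z in (0, 1))
p_flu_given_sneeze = sum(joint(1, z, 1) for z in (0, 1)) / p_sneeze
p_flu_given_sneeze_allergy = joint(1, 1, 1) / sum(joint(x, 1, 1) for x in (0, 1))

print(round(p_flu_given_sneeze, 3))          # ~0.357
print(round(p_flu_given_sneeze_allergy, 3))  # 0.125: allergy 'explains away' the sneeze
```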

Page 38: 10601 Machine Learning

D-separation: X and Z are conditionally independent given Y if Y d-separates X and Z, i.e., if every path between X and Z is blocked by Y.

[Figure: the canonical three-node structures. In the serial connections X → Y → Z and X ← Y ← Z, and in the common cause X ← Y → Z, the path between X and Z is blocked when Y is observed. In the collider X → Y ← Z, the path is blocked only when neither Y nor its descendants are observed.]

Page 39: 10601 Machine Learning

D-separation example

Are B and C independent given A?

Page 40: 10601 Machine Learning

D-separation example

Are B and C independent given A?

Yes

Page 41: 10601 Machine Learning

D-separation example

Are B and C independent given A?

Yes: A is observed and blocks the path.

Page 42: 10601 Machine Learning

Are B and C independent given A?

Yes: A is observed and blocks the path, and the collider node on the other path is not observed, nor are its descendants.

Page 43: 10601 Machine Learning

D-separation example

Are A and F independent given E?

Page 44: 10601 Machine Learning

Are A and F independent given E? Yes.

Page 45: 10601 Machine Learning

Are A and F independent given E? Yes.

Page 46: 10601 Machine Learning

Are C and D independent given F?

Page 47: 10601 Machine Learning

Are C and D independent given F?

No

Page 48: 10601 Machine Learning

Are A and G independent given B and F?

Page 49: 10601 Machine Learning

Are A and G independent given B and F? Yes.

Page 50: 10601 Machine Learning

Naïve Bayes Model

[Structure: J is the parent of D, C, and R.]

J: The person is a junior
D: The person knows calculus
C: The person lives on campus
R: The person saw “Return of the King” more than once

Page 51: 10601 Machine Learning

Naïve Bayes Model

[Structure: J is the parent of D, C, and R.]

J: The person is a junior
D: The person knows calculus
C: The person lives on campus
R: The person saw “Return of the King” more than once

What parameters are stored?

Page 52: 10601 Machine Learning

Naïve Bayes Model

[Structure: J is the parent of D, C, and R.]

J: The person is a junior
D: The person knows calculus
C: The person lives on campus
R: The person saw “Return of the King” more than once

P(J) =

P(D|J=1) = , P(D|J=0) =
P(C|J=1) = , P(C|J=0) =
P(R|J=1) = , P(R|J=0) =

Page 53: 10601 Machine Learning

Naïve Bayes Model

[Structure: J is the parent of D, C, and R.]

J: The person is a junior
D: The person knows calculus
C: The person lives on campus
R: The person saw “Return of the King” more than once

P(J) =

P(D|J=1) = , P(D|J=0) =
P(C|J=1) = , P(C|J=0) =
P(R|J=1) = , P(R|J=0) =

Page 54: 10601 Machine Learning

 

Columns: J = Are you a junior?  D = Do you know calculus?  C = Do you live on campus?  R = Have you seen ‘Return of the King’ more than once?

            J  D  C  R
Student 1   1  0  1  1
Student 2   1  1  1  0
Student 3   1  0  1  1
Student 4   1  0  1  1
Student 5   1  1  1  0
Student 6   1  0  1  1
Student 7   1  1  1  1
Student 8   1  1  1  1
Student 9   0  1  0  1
Student 10  1  1  1  1
Student 11  1  0  1  0
Student 12  1  0  1  1
Student 13  0  1  1  1
Student 14  1  1  1  1
Student 15  1  1  1  1
Student 16  1  1  1  1
Student 17  0  0  0  1
Student 18  1  0  1  0
Student 19  0  1  1  1
Student 20  0  0  1  1

We have the structure; how do we get the CPTs? Estimate them from observed data.
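The slides do not include code; as one way to do the estimation, here is a minimal Python sketch that computes the maximum-likelihood CPTs, i.e., simple counts, from the table above:

```python
# Rows from the table: (J, D, C, R) per student.
data = [
    (1, 0, 1, 1), (1, 1, 1, 0), (1, 0, 1, 1), (1, 0, 1, 1), (1, 1, 1, 0),
    (1, 0, 1, 1), (1, 1, 1, 1), (1, 1, 1, 1), (0, 1, 0, 1), (1, 1, 1, 1),
    (1, 0, 1, 0), (1, 0, 1, 1), (0, 1, 1, 1), (1, 1, 1, 1), (1, 1, 1, 1),
    (1, 1, 1, 1), (0, 0, 0, 1), (1, 0, 1, 0), (0, 1, 1, 1), (0, 0, 1, 1),
]

p_j1 = sum(row[0] for row in data) / len(data)   # P(J=1) = 15/20 = 0.75

def p_given_j(col, j):
    # MLE of P(variable in this column = 1 | J = j): count within the J = j rows.
    rows = [r for r in data if r[0] == j]
    return sum(r[col] for r in rows) / len(rows)

for name, col in (("D", 1), ("C", 2), ("R", 3)):
    print(f"P({name}=1|J=1) = {p_given_j(col, 1):.2f}   "
          f"P({name}=1|J=0) = {p_given_j(col, 0):.2f}")
```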

Page 55: 10601 Machine Learning

Naïve Bayes Model

[Structure: J is the parent of D, C, and R.]

J: The person is a junior
D: The person knows calculus
C: The person lives on campus
R: The person saw “Return of the King” more than once

P(J) =

P(D|J) = , P(D|~J) =
P(C|J) = , P(C|~J) =
P(R|J) = , P(R|~J) =

Suppose a new person comes and says:
I don’t know calculus.
I live on campus.
I have seen ‘The Return of the King’ five times.

What is the probability that he is a junior?

Page 56: 10601 Machine Learning

Naïve Bayes Model

[Structure: J is the parent of D, C, and R.]

Suppose a person says:
I don’t know calculus → D = 0
I live on campus → C = 1
I have seen ‘The Return of the King’ five times → R = 1

What is the probability that he is a junior?

P(J=1 | D=0, C=1, R=1)

Page 57: 10601 Machine Learning

What is the probability that he is a junior?

P(J=1 | D=0, C=1, R=1) = P(J=1, D=0, C=1, R=1) / P(D=0, C=1, R=1)
                       = P(J=1) P(C=1|J=1) P(R=1|J=1) P(D=0|J=1) / P(D=0, C=1, R=1)

[Structure: J is the parent of D, C, and R.]

To calculate the denominator, marginalize over J:

P(D=0, C=1, R=1) = Σ_j P(J=j) P(C=1|J=j) P(R=1|J=j) P(D=0|J=j)

Page 58: 10601 Machine Learning

Naïve Bayes Model

P(J=1 | D=0, C=1, R=1) = P(J=1) P(C=1|J=1) P(R=1|J=1) P(D=0|J=1) / P(D=0, C=1, R=1)
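Putting the pieces together, a minimal sketch of the full computation. The slides leave the parameter values blank; the numbers below are the maximum-likelihood estimates from the table on Page 54, filled in here as an assumption:

```python
# MLE estimates from the 20-student table (see Page 54).
p_j = {1: 0.75, 0: 0.25}          # P(J)
p_d1 = {1: 8 / 15, 0: 3 / 5}      # P(D=1 | J)
p_c1 = {1: 15 / 15, 0: 3 / 5}     # P(C=1 | J)
p_r1 = {1: 11 / 15, 0: 5 / 5}     # P(R=1 | J)

def joint(j):
    # Naive Bayes factorization: P(J=j, D=0, C=1, R=1).
    return p_j[j] * (1 - p_d1[j]) * p_c1[j] * p_r1[j]

# Marginalize over J in the denominator.
posterior = joint(1) / (joint(0) + joint(1))
print(round(posterior, 3))   # ~0.811 with these estimates
```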