
10601 Machine Learning
Recitation 2
Öznur Taştan
September 2, 2009

Logistics

Homework 2 is going to be out tomorrow. It is due on Sep 16, Wed.

There is no class on Monday Sep 7th (Labor Day).

Those who have not returned Homework 1 yet: for details of the homework submission policy, please check http://www.cs.cmu.edu/~ggordon/10601/hws.html

Outline

We will review:
- Some probability and statistics
- Some graphical models

We will not go over Homework 1, since the grace period has not ended yet. Solutions will be up next week on the web page.

We’ll play a game: Catch the goof!

I’ll be the sloppy TA… I will make ‘intentional’ mistakes. You’ll catch those mistakes and correct me!

Slides with mistakes are marked with [goof icon]; correct slides are marked with [correct icon].

Catch the goof!!

Given two discrete random variables X and Y, where X takes values in {x_1, …, x_m} and Y takes values in {y_1, …, y_n}.

Law of total probability:

P(Y=y_j) = Σ_j P(X=x_i, Y=y_j)

P(Y=y_j) = Σ_j P(X=x_i | Y=y_j) P(Y=y_j)


The goof: the left-hand side of each formula should be the marginal P(X=x_i), not P(Y=y_j):

P(X=x_i) = Σ_j P(X=x_i, Y=y_j)

P(X=x_i) = Σ_j P(X=x_i | Y=y_j) P(Y=y_j)

Given two discrete random variables X and Y, where X takes values in {x_1, …, x_m} and Y takes values in {y_1, …, y_n}.

Law of total probability:

P(X=x_i) = Σ_j P(X=x_i, Y=y_j)

P(X=x_i) = Σ_j P(X=x_i | Y=y_j) P(Y=y_j)

Given two discrete random variables X and Y.

Law of total probability:

P(X=x_i) = Σ_j P(X=x_i, Y=y_j) = Σ_j P(X=x_i | Y=y_j) P(Y=y_j)

- Marginal probability: P(X=x_i)
- Joint probability: P(X=x_i, Y=y_j)
- Conditional probability of X conditioned on Y: P(X=x_i | Y=y_j)

Formulas are fine. Anything wrong with the names?

Given two discrete random variables X and Y.

Law of total probability:

P(X=x_i) = Σ_j P(X=x_i, Y=y_j) = Σ_j P(X=x_i | Y=y_j) P(Y=y_j)

- Marginal probability: P(X=x_i)
- Joint probability of X, Y: P(X=x_i, Y=y_j)
- Conditional probability of X conditioned on Y: P(X=x_i | Y=y_j)
- Marginal probability: P(Y=y_j)
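The law of total probability is easy to sanity-check numerically. Here is a minimal Python sketch, not from the slides and with an invented joint table, that computes P(X=x) both ways and confirms they agree:

```python
# A minimal sketch: checking the law of total probability on a small
# joint table. The table values are invented for illustration.
joint = {  # joint[(x, y)] = P(X=x, Y=y)
    (0, 0): 0.1, (0, 1): 0.3,
    (1, 0): 0.4, (1, 1): 0.2,
}

def marginal_x(x):
    # P(X=x) = sum_j P(X=x, Y=y_j)
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_x_via_conditionals(x):
    # P(X=x) = sum_j P(X=x | Y=y_j) P(Y=y_j)
    total = 0.0
    for y in (0, 1):
        p_y = sum(p for (_, yi), p in joint.items() if yi == y)
        total += (joint[(x, y)] / p_y) * p_y
    return total

assert abs(marginal_x(0) - marginal_x_via_conditionals(0)) < 1e-12
print(marginal_x(0))  # 0.4
```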

In a strange world

Two discrete random variables X and Y take binary values.

Joint probabilities:

P(X=0, Y=1) = 0.2
P(X=0, Y=0) = 0.2
P(X=1, Y=0) = 0.5
P(X=1, Y=1) = 0.5

Joint probabilities should sum up to 1, but these sum to 1.4!

The world seems fine

Two discrete random variables X and Y take binary values.

Joint probabilities:

P(X=0, Y=1) = 0.2
P(X=0, Y=0) = 0.2
P(X=1, Y=0) = 0.3
P(X=1, Y=1) = 0.3

What about the marginals?

Joint probabilities:

P(X=0, Y=1) = 0.2
P(X=0, Y=0) = 0.2
P(X=1, Y=0) = 0.3
P(X=1, Y=1) = 0.3

Marginal probabilities:

P(X=0) = 0.2, P(X=1) = 0.8
P(Y=0) = 0.5, P(Y=1) = 0.5

This is a strange world!

Let’s have a simple problem

Joint probabilities:

P(X=0, Y=1) = 0.2
P(X=0, Y=0) = 0.2
P(X=1, Y=0) = 0.3
P(X=1, Y=1) = 0.3

Marginal probabilities:

P(X=0) = 0.4, P(X=1) = 0.6
P(Y=0) = 0.5, P(Y=1) = 0.5

P(X=0) = P(X=0, Y=0) + P(X=0, Y=1) = 0.2 + 0.2 = 0.4
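As a quick check, a small Python sketch (values taken from the slide's joint table) recomputes both marginals:

```python
# Recomputing the marginals from the joint table on the slide.
joint = {(0, 0): 0.2, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.3}  # P(X=x, Y=y)

p_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in (0, 1)}

print(p_x)  # {0: 0.4, 1: 0.6}, not the 0.2 / 0.8 of the strange world
print(p_y)  # {0: 0.5, 1: 0.5}
```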

Conditional probabilities

What is the complementary event of P(X=0|Y=1)?

P(X=1|Y=1) OR P(X=0|Y=0)?

Answer: P(X=1|Y=1). Conditioned on Y=1, the events X=0 and X=1 partition the outcome space, so P(X=0|Y=1) + P(X=1|Y=1) = 1.
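A tiny sketch with the joint table above makes the point concrete: the two conditionals given Y=1 sum to one.

```python
# Why P(X=1|Y=1) is the complement: conditionals given Y=1 sum to 1.
joint = {(0, 0): 0.2, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.3}  # P(X=x, Y=y)

p_y1 = joint[(0, 1)] + joint[(1, 1)]    # P(Y=1) = 0.5
p_x0_given_y1 = joint[(0, 1)] / p_y1    # P(X=0|Y=1) = 0.4
p_x1_given_y1 = joint[(1, 1)] / p_y1    # P(X=1|Y=1) = 0.6
print(p_x0_given_y1 + p_x1_given_y1)    # 1.0
```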

The game ends here.

Number of independent parameters

Assume X and Y take Boolean values {0,1}. How many independent parameters do you need to fully specify:

- the marginal probability of X?
- the joint probability P(X,Y)?
- the conditional probability P(X|Y)?

Number of independent parameters

Assume X and Y take Boolean values {0,1}.

- Marginal probability of X: 1 parameter only, P(X=0) [because P(X=1) + P(X=0) = 1]
- Joint probability P(X,Y): 3 parameters, e.g. P(X=0, Y=0), P(X=0, Y=1), P(X=1, Y=0)
- Conditional probability P(X|Y): 2 parameters, P(X=0|Y=0) and P(X=0|Y=1)

Number of parameters

What about P(X|Y,Z)? How many independent parameters do you need to be able to fully specify the probabilities?

Assume the random variables take: X, m values; Y, n values; Z, q values.

Number of independent parameters: (m-1)·n·q
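A small helper makes the counting rule concrete. This is a sketch (the function name is made up): each of the n·q parent configurations needs its own distribution over X's m values, and each distribution has m-1 free entries.

```python
# Independent parameters for P(X | parents): one distribution over X's
# m values per parent configuration, each with m - 1 free entries
# (the last entry is forced by normalization).
def num_params(m, *parent_cardinalities):
    count = m - 1
    for c in parent_cardinalities:
        count *= c
    return count

print(num_params(2))        # marginal of a binary X: 1
print(num_params(2, 2))     # P(X|Y), both binary: 2
print(num_params(2, 2, 2))  # P(X|Y,Z), all binary: 4
print(num_params(3, 4, 5))  # m=3, n=4, q=5: (3-1)*4*5 = 40
```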

Graphical models

A graphical model is a way of representing probabilistic relationships between random variables.

Variables are represented by nodes; edges indicate probabilistic relationships:

You miss the bus → Arrive class late

The joint distribution factorizes over the graph:

P(X_1, X_2, …, X_n) = Π_{i=1}^{n} P(X_i | Pa(X_i))
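A sketch of this factorization on the two-node bus example; the CPT numbers below are invented for illustration:

```python
# Joint from the factorization P(X1,...,Xn) = prod_i P(Xi | Pa(Xi)).
# Here: P(miss, late) = P(miss) * P(late | miss). Numbers are made up.
p_miss = {1: 0.1, 0: 0.9}          # P(miss the bus)
p_late = {1: {1: 0.8, 0: 0.2},     # P(arrive late | miss=1)
          0: {1: 0.05, 0: 0.95}}   # P(arrive late | miss=0)

def joint(miss, late):
    return p_miss[miss] * p_late[miss][late]

print(joint(1, 1))  # P(miss=1, late=1) = 0.1 * 0.8 = 0.08
# Sanity check: the four joint entries sum to 1 (up to float rounding).
print(sum(joint(m, l) for m in (0, 1) for l in (0, 1)))
```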

Serial connection

X → Y → Z

Are X and Z independent?

X ⊥ Z ?

Serial connection

X → Y → Z

Are X and Z independent?

No: X and Z are not independent.

Serial connection

X → Y → Z

Is X conditionally independent of Z given Y?

X ⊥ Z | Y ?

Serial connection

X → Y → Z

Is X conditionally independent of Z given Y?

X ⊥ Z | Y

Yes, they are independent. How can we show it?

X → Y → Z

P(X,Y,Z) = P(X) P(Y|X) P(Z|Y)

P(Z|X,Y) = P(X,Y,Z) / P(X,Y)
         = P(X) P(Y|X) P(Z|Y) / (P(X) P(Y|X))
         = P(Z|Y)

So X is conditionally independent of Z given Y: X ⊥ Z | Y.
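The same derivation can be checked numerically. A sketch for the chain X → Y → Z with invented CPTs; for every (x, y), P(Z|X=x, Y=y) should match P(Z|Y=y):

```python
# Numeric check that X is independent of Z given Y in a chain X -> Y -> Z.
p_x = {0: 0.3, 1: 0.7}                                    # P(X), invented
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}  # P(Y|X)
p_z_given_y = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.7, 1: 0.3}}  # P(Z|Y)

def p_xyz(x, y, z):
    # P(X,Y,Z) = P(X) P(Y|X) P(Z|Y)
    return p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]

for x in (0, 1):
    for y in (0, 1):
        p_xy = sum(p_xyz(x, y, z) for z in (0, 1))  # P(X=x, Y=y)
        for z in (0, 1):
            p_z_given_xy = p_xyz(x, y, z) / p_xy    # P(Z=z | X=x, Y=y)
            assert abs(p_z_given_xy - p_z_given_y[y][z]) < 1e-12
print("P(Z|X,Y) equals P(Z|Y) for all values")
```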

An example case

Studied late last night → Wake up late → Arrive class late

Common cause

X ← Z → Y

Z: Age; X: Shoe Size; Y: Gray Hair

X and Y are not marginally independent.
X and Y are conditionally independent given Z.

Explaining away

X → Y ← Z

X: Flu; Z: Allergy; Y: Sneeze

X and Z are marginally independent.
X and Z are conditionally dependent given Y.
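Explaining away can also be seen numerically. A sketch with invented probabilities for Flu → Sneeze ← Allergy: observing the sneeze raises the probability of flu, and additionally observing the allergy brings it back down.

```python
from itertools import product

# Invented numbers for the Flu -> Sneeze <- Allergy example.
p_flu = {1: 0.1, 0: 0.9}
p_allergy = {1: 0.2, 0: 0.8}
p_sneeze1 = {(0, 0): 0.01, (0, 1): 0.8,   # P(Sneeze=1 | Flu=f, Allergy=a)
             (1, 0): 0.8, (1, 1): 0.95}

def joint(f, a, s):
    ps = p_sneeze1[(f, a)]
    return p_flu[f] * p_allergy[a] * (ps if s == 1 else 1 - ps)

def p_flu_given(sneeze, allergy=None):
    # P(Flu=1 | evidence) by brute-force summation over the joint.
    num = den = 0.0
    for f, a in product((0, 1), repeat=2):
        if allergy is not None and a != allergy:
            continue
        p = joint(f, a, sneeze)
        den += p
        if f == 1:
            num += p
    return num / den

print(p_flu_given(1))             # ~0.35: sneezing raises belief in flu
print(p_flu_given(1, allergy=1))  # ~0.12: allergy explains the sneeze away
```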

D-separation

X and Z are conditionally independent given Y if Y d-separates X and Z: every path between X and Z is blocked by Y.

- Serial connection, X → Y → Z or X ← Y ← Z: the path is blocked when Y is observed.
- Diverging connection, X ← Y → Z: the path is blocked when Y is observed.
- Converging connection, X → Y ← Z: the path is blocked when neither Y nor its descendants are observed.

D-separation example

[Figure: an example DAG with nodes A, B, C, D, E, F, G.]

Are B and C independent given A? Yes: A is observed and blocks the path, and the converging node on the other path is not observed, neither are its descendants.

D-separation example

Are A and F independent given E? Yes.

Are C and D independent given F? No.

Are A and G independent given B and F? Yes.

Naïve Bayes Model

J → D, C, R (J is the parent of D, C, and R)

J: The person is a junior
D: The person knows calculus
C: The person lives on campus
R: Saw “Return of the King” more than once

What parameters are stored?

Naïve Bayes Model

J → D, C, R

J: The person is a junior
D: The person knows calculus
C: The person lives on campus
R: Saw “Return of the King” more than once

P(J) =
P(R|J=1) =    P(R|J=0) =
P(D|J=1) =    P(D|J=0) =
P(C|J=1) =    P(C|J=0) =

            Are you a   Do you know   Do you live   Have you seen ‘Return of
            junior?     calculus?     on campus?    the King’ more than once?
Student 1       1           0             1             1
Student 2       1           1             1             0
Student 3       1           0             1             1
Student 4       1           0             1             1
Student 5       1           1             1             0
Student 6       1           0             1             1
Student 7       1           1             1             1
Student 8       1           1             1             1
Student 9       0           1             0             1
Student 10      1           1             1             1
Student 11      1           0             1             0
Student 12      1           0             1             1
Student 13      0           1             1             1
Student 14      1           1             1             1
Student 15      1           1             1             1
Student 16      1           1             1             1
Student 17      0           0             0             1
Student 18      1           0             1             0
Student 19      0           1             1             1
Student 20      0           0             1             1

We have the structure; how do we get the CPTs? Estimate them from the observed data.
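A sketch of the estimation step: relative-frequency (maximum-likelihood) counts over the 20 rows above, with no smoothing. Rows are encoded as (J, D, C, R):

```python
# Counting estimates of the naive Bayes CPTs from the student table.
data = [  # (J, D, C, R) per student
    (1,0,1,1), (1,1,1,0), (1,0,1,1), (1,0,1,1), (1,1,1,0),
    (1,0,1,1), (1,1,1,1), (1,1,1,1), (0,1,0,1), (1,1,1,1),
    (1,0,1,0), (1,0,1,1), (0,1,1,1), (1,1,1,1), (1,1,1,1),
    (1,1,1,1), (0,0,0,1), (1,0,1,0), (0,1,1,1), (0,0,1,1),
]

p_j1 = sum(row[0] for row in data) / len(data)  # P(J=1) = 15/20

def p_given_j(col, j):
    # P(column = 1 | J = j), estimated by counting.
    rows = [r for r in data if r[0] == j]
    return sum(r[col] for r in rows) / len(rows)

print("P(J=1)     =", p_j1)             # 0.75
print("P(D=1|J=1) =", p_given_j(1, 1))  # 8/15
print("P(C=1|J=1) =", p_given_j(2, 1))  # 15/15 = 1.0
print("P(R=1|J=1) =", p_given_j(3, 1))  # 11/15
```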

Naïve Bayes Model

J → D, C, R

J: The person is a junior
D: The person knows calculus
C: The person lives on campus
R: Saw “Return of the King” more than once

P(J) =
P(R|J) =    P(R|~J) =
P(C|J) =    P(C|~J) =
P(D|J) =    P(D|~J) =

Suppose a new person comes and says:
- I don’t know calculus.
- I live on campus.
- I have seen ‘The Return of the King’ five times.

What is the probability that he is a junior?

Naïve Bayes Model

J → D, C, R

Suppose a person says:
- I don’t know calculus: D=0
- I live on campus: C=1
- I have seen ‘The Return of the King’ five times: R=1

What is the probability that he is a junior?

P(J=1|D=0,C=1,R=1) = P(J=1,D=0,C=1,R=1) / P(D=0,C=1,R=1)
                   = P(J=1) P(C=1|J=1) P(R=1|J=1) P(D=0|J=1) / P(D=0,C=1,R=1)

To calculate the denominator, marginalize over J:

P(D=0,C=1,R=1) = Σ_j P(J=j) P(C=1|J=j) P(R=1|J=j) P(D=0|J=j)

so

P(J=1|D=0,C=1,R=1) = P(J=1) P(C=1|J=1) P(R=1|J=1) P(D=0|J=1) / P(D=0,C=1,R=1)
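Putting it together, a sketch that plugs the counting estimates from the student table into this formula:

```python
# P(J=1 | D=0, C=1, R=1) with counting estimates from the 20-student table.
p_j = {1: 15/20, 0: 5/20}
p_d1 = {1: 8/15, 0: 3/5}    # P(D=1|J=j)
p_c1 = {1: 15/15, 0: 3/5}   # P(C=1|J=j)
p_r1 = {1: 11/15, 0: 5/5}   # P(R=1|J=j)

def score(j):
    # P(J=j) P(D=0|J=j) P(C=1|J=j) P(R=1|J=j)
    return p_j[j] * (1 - p_d1[j]) * p_c1[j] * p_r1[j]

# The denominator P(D=0, C=1, R=1) marginalizes over J.
posterior = score(1) / (score(1) + score(0))
print(posterior)  # ~0.81
```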