INC 551 Artificial Intelligence
Lecture 11
Machine Learning (Continued)
Bayes Classifier
Bayes Rule
Play Tennis Example
John wants to play tennis every day.
However, on some days the conditions are not good, so he decides not to play.
The following table is the record for the last 14 days.
Outlook Temperature Humidity Wind PlayTennis
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No
Question:
Today’s condition is
<Sunny, Mild Temperature, Normal Humidity, Strong Wind>
Do you think John will play tennis?
Find P(PlayTennis | condition)
We need to use the naïve Bayes assumption: assume that all events are independent.
P(sunny, mild, normal, strong | PlayTennis)
  = P(sunny | PlayTennis) × P(mild | PlayTennis)
  × P(normal | PlayTennis) × P(strong | PlayTennis)
Now, let’s look at each property
P(sunny | PlayTennis = yes) = 2/9 = 0.22
P(sunny | PlayTennis = no) = 3/5 = 0.6
P(mildTemp | PlayTennis = yes) = 4/9 = 0.44
P(mildTemp | PlayTennis = no) = 2/5 = 0.4
P(normalHumid | PlayTennis = yes) = 6/9 = 0.66
P(normalHumid | PlayTennis = no) = 1/5 = 0.2
P(strongWind | PlayTennis = yes) = 3/9 = 0.33
P(strongWind | PlayTennis = no) = 3/5 = 0.6
P(sunny, mild, normal, strong | PlayTennis = yes) = 0.22 × 0.44 × 0.66 × 0.33 = 0.022
P(sunny, mild, normal, strong | PlayTennis = no) = 0.6 × 0.4 × 0.2 × 0.6 = 0.0288
Using Bayes rule
P(PlayTennis | condition) = P(condition | PlayTennis) × P(PlayTennis) / P(condition)
From the table, P(PlayTennis = yes) = 9/14 = 0.643 and P(PlayTennis = no) = 5/14 = 0.357.

P(PlayTennis = yes | condition) = 0.022 × 0.643 / P(condition) = 0.01415 / P(condition)
P(PlayTennis = no | condition) = 0.0288 × 0.357 / P(condition) = 0.01028 / P(condition)
Since P(condition) is the same in both expressions, we can conclude that John is more likely to play tennis today.
Note that we do not need to compute P(condition) to get the answer. However, if we want the actual numbers, we can calculate P(condition) by normalizing the probabilities.
P(condition) = P(condition | yes) × P(yes) + P(condition | no) × P(no)
             = 0.01415 + 0.01028 = 0.02443
P(PlayTennis = yes | condition) = 0.01415 / 0.02443 = 0.58
P(PlayTennis = no | condition) = 0.01028 / 0.02443 = 0.42
Therefore, John is more likely to play tennis today, with a 58% chance.
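The whole tennis calculation can be sketched in Python. This is a minimal sketch, not part of the lecture; the record encoding and function name are my own.

```python
# The 14-day record as (outlook, temperature, humidity, wind, play) tuples.
records = [
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]

def naive_bayes_score(condition, label):
    """P(condition | label) * P(label), using conditional independence."""
    rows = [r for r in records if r[-1] == label]
    prior = len(rows) / len(records)
    likelihood = 1.0
    for i, value in enumerate(condition):
        # Count how often this feature value occurs within the class.
        likelihood *= sum(1 for r in rows if r[i] == value) / len(rows)
    return prior * likelihood

condition = ("sunny", "mild", "normal", "strong")
score_yes = naive_bayes_score(condition, "yes")  # ≈ 0.643 * 0.022
score_no = naive_bayes_score(condition, "no")    # ≈ 0.357 * 0.0288
p_yes = score_yes / (score_yes + score_no)       # ≈ 0.58
```

Dividing by the sum of the two scores is exactly the normalization step above, so P(condition) never has to be computed separately.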
Learning and Bayes Classifier
Learning is the adjustment of probability values to compute a posterior probability when new data is added.
Classifying Object Example
Suppose we want to classify objects into two classes, A and B. There are two features that we can measure from each object, f1 and f2. We sample four objects randomly to form a database and classify them by hand.
Sample f1 f2 Class
1 5.2 1.2 B
2 2.3 5.4 A
3 1.5 4.4 A
4 4.5 2.1 B
Now we have another sample with f1 = 3.2 and f2 = 4.2, and we want to know what class it is.
We want to find P(Class | feature).
Using Bayes rule
P(Class | feature) = P(feature | Class) × P(Class) / P(feature)
From the table, we will count the number of events.
P(Class = A) = 2/4 = 0.5
P(Class = B) = 2/4 = 0.5
Find P(feature | Class)
Again, we use the naïve Bayes assumption: assume that all events are independent.
P(f1, f2 | Class) = P(f1 | Class) × P(f2 | Class)
To find P(f1 | Class), we need to assume a probability distribution, because the features are continuous values.
The most common choice is the Gaussian (normal) distribution.
Gaussian distribution

P(x) = (1 / √(2πσ²)) exp( −(x − µ)² / (2σ²) )

There are two parameters: the mean µ and the variance σ².
Using the maximum likelihood principle, the mean and the variance can be estimated from the samples in the database.
Class A
  f1: Mean = (2.3 + 1.5)/2 = 1.9, SD = 0.4
  f2: Mean = (5.4 + 4.4)/2 = 4.9, SD = 0.5
Class B
  f1: Mean = (5.2 + 4.5)/2 = 4.85, SD = 0.35
  f2: Mean = (1.2 + 2.1)/2 = 1.65, SD = 0.45
P(f1 = x | A) = (1 / √(2π(0.4)²)) exp( −(x − 1.9)² / (2(0.4)²) )

P(f1 = 3.2 | A) = (1 / √(2π(0.4)²)) exp( −(3.2 − 1.9)² / (2(0.4)²) ) = 0.0051
The object that we want to classify has f1 = 3.2 and f2 = 4.2.
P(f2 = 4.2 | A) = (1 / √(2π(0.5)²)) exp( −(4.2 − 4.9)² / (2(0.5)²) ) = 0.2995
P(f1 = 3.2 | B) = (1 / √(2π(0.35)²)) exp( −(3.2 − 4.85)² / (2(0.35)²) ) = 1.7016e-05
P(f2 = 4.2 | B) = (1 / √(2π(0.45)²)) exp( −(4.2 − 1.65)² / (2(0.45)²) ) = 9.4375e-08
Therefore,
P(f1, f2 | Class = A) = 0.0051 × 0.2995 = 0.0015
P(f1, f2 | Class = B) = 1.7016e-05 × 9.4375e-08 = 1.6059e-12
P(Class | feature) = P(feature | Class) × P(Class) / P(feature)
From Bayes
P(A | feature) = 0.0015 × 0.5 / P(feature)
P(B | feature) = 1.6059e-12 × 0.5 / P(feature)
Therefore, we should classify the sample as Class A.
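The Gaussian naïve Bayes computation above can be sketched as follows. The per-class means and SDs are the estimates from the four-sample database (population standard deviation, matching the slide's numbers); the dictionary layout and names are my own.

```python
import math

# Per-class Gaussian parameters (mean, SD) estimated from the database.
params = {
    "A": {"f1": (1.9, 0.4), "f2": (4.9, 0.5)},
    "B": {"f1": (4.85, 0.35), "f2": (1.65, 0.45)},
}

def gaussian(x, mu, sigma):
    """Gaussian density P(x) = (1/sqrt(2*pi*sigma^2)) exp(-(x-mu)^2 / (2*sigma^2))."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def class_score(f1, f2, label):
    """P(f1, f2 | Class) * P(Class), with P(Class) = 0.5 for both classes."""
    mu1, s1 = params[label]["f1"]
    mu2, s2 = params[label]["f2"]
    return gaussian(f1, mu1, s1) * gaussian(f2, mu2, s2) * 0.5

score_a = class_score(3.2, 4.2, "A")  # ≈ 0.0051 * 0.2995 * 0.5
score_b = class_score(3.2, 4.2, "B")  # ≈ 1.7e-05 * 9.4e-08 * 0.5
# score_a is larger by many orders of magnitude, so the sample is Class A.
```

Since P(feature) is the same for both classes, comparing the two scores is enough to pick the class.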
Nearest Neighbor Classification
NN is considered a model-free classification method.
Nearest Neighbor's Principle
The unknown sample is classified to the same class as the sample with the closest distance.
(Figure: samples plotted in the feature 1 / feature 2 plane; the unknown sample's closest neighbor is a circle.)
We classify the sample as a circle.
Distance between Samples
D(x, y) = ( |x1 − y1|^k + |x2 − y2|^k + … + |xN − yN|^k )^(1/k)

Samples X and Y have multi-dimensional feature values.
X = (0, 1, 3, 2)
Y = (3, 5, 1, 2)

The distance between samples X and Y can be calculated by this formula:
D(x, y) = ( |x1 − y1|^k + |x2 − y2|^k + … + |xN − yN|^k )^(1/k)
If k = 1, the distance is called the Manhattan distance.
If k = 2, the distance is called the Euclidean distance.
If k = ∞, the distance is the maximum feature difference.
Euclidean distance is the best known and the preferred one.
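The distance formula above can be sketched directly; the function name is my own. For the X, Y example, k = 1 gives the Manhattan distance and k = 2 the Euclidean distance.

```python
def minkowski(x, y, k):
    """Minkowski distance: (sum_i |x_i - y_i|^k)^(1/k)."""
    return sum(abs(a - b) ** k for a, b in zip(x, y)) ** (1 / k)

X = (0, 1, 3, 2)
Y = (3, 5, 1, 2)

manhattan = minkowski(X, Y, 1)  # 3 + 4 + 2 + 0 = 9
euclidean = minkowski(X, Y, 2)  # sqrt(9 + 16 + 4 + 0) = sqrt(29) ≈ 5.385
```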
Sample f1 f2 Class
1 5.2 1.2 B
2 2.3 5.4 A
3 1.5 4.4 A
4 4.5 2.1 B
Classifying Object with NN
Now we have another sample with f1 = 3.2 and f2 = 4.2, and we want to know its class.
Compute the Euclidean distance from it to all other samples:
D(x, s1) = √( (3.2 − 5.2)² + (4.2 − 1.2)² ) = 3.6056
D(x, s2) = √( (3.2 − 2.3)² + (4.2 − 5.4)² ) = 1.5
D(x, s3) = √( (3.2 − 1.5)² + (4.2 − 4.4)² ) = 1.7117
D(x, s4) = √( (3.2 − 4.5)² + (4.2 − 2.1)² ) = 2.4698
The unknown sample has the closest distance to the second sample. Therefore, we classify it to be the same class as the second sample, which is Class A.
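The nearest-neighbor step above can be sketched as follows, with the four-sample database encoded as (feature tuple, class) pairs; the encoding and function names are my own.

```python
# Database of (features, class) pairs from the table.
samples = [
    ((5.2, 1.2), "B"),
    ((2.3, 5.4), "A"),
    ((1.5, 4.4), "A"),
    ((4.5, 2.1), "B"),
]

def euclidean(p, q):
    """Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def nearest_neighbor(point):
    """Return the class of the database sample closest to `point`."""
    return min(samples, key=lambda s: euclidean(point, s[0]))[1]

print(nearest_neighbor((3.2, 4.2)))  # sample 2 is closest (distance 1.5) -> "A"
```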
K-Nearest Neighbor (KNN)
Instead of using only the single closest sample to decide the class, we take a majority vote among the closest k samples.
(Figure: example with k = 3; the three nearest neighbors are mostly circles.)
The data is classified as a circle.
(Figure: example with k = 5; the five nearest neighbors are mostly stars.)
The data is classified as a star.
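The KNN majority vote can be sketched by extending the nearest-neighbor code to rank all samples and count the labels of the top k; the database is the four-sample table from before, and the function names are my own.

```python
from collections import Counter

# Database of (features, class) pairs from the table.
samples = [
    ((5.2, 1.2), "B"),
    ((2.3, 5.4), "A"),
    ((1.5, 4.4), "A"),
    ((4.5, 2.1), "B"),
]

def euclidean(p, q):
    """Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def knn_classify(point, k):
    """Majority vote among the k database samples closest to `point`."""
    ranked = sorted(samples, key=lambda s: euclidean(point, s[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

print(knn_classify((3.2, 4.2), 3))  # neighbors: A (1.5), A (1.71), B (2.47) -> "A"
```

Odd values of k are usually chosen for two-class problems so the vote cannot tie.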