INC 551 Artificial Intelligence
Lecture 11
Machine Learning (Continued)
Bayes Classifier
Bayes Rule
Play Tennis Example
John wants to play tennis every day.
However, on some days the conditions are not good, so he decides not to play.
The following table is the record for the last 14 days.
Outlook Temperature Humidity Wind PlayTennis
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No
Question:
Today’s condition is
<Sunny, Mild Temperature, Normal Humidity, Strong Wind>
Do you think John will play tennis?
Find P(PlayTennis | condition)
We need to use the naïve Bayes assumption: assume that all events are independent.
P(sunny, mild, normal, strong | PlayTennis)
  = P(sunny | PlayTennis) × P(mild | PlayTennis)
  × P(normal | PlayTennis) × P(strong | PlayTennis)
Now, let’s look at each property
P(sunny | PlayTennis = yes) = 2/9 = 0.22
P(sunny | PlayTennis = no) = 3/5 = 0.6
P(mildTemp | PlayTennis = yes) = 4/9 = 0.44
P(mildTemp | PlayTennis = no) = 2/5 = 0.4
P(normalHumid | PlayTennis = yes) = 6/9 = 0.66
P(normalHumid | PlayTennis = no) = 1/5 = 0.2
P(strongWind | PlayTennis = yes) = 3/9 = 0.33
P(strongWind | PlayTennis = no) = 3/5 = 0.6
P(sunny, mild, normal, strong | PlayTennis = yes) = 0.22 × 0.44 × 0.66 × 0.33 = 0.022
P(sunny, mild, normal, strong | PlayTennis = no) = 0.6 × 0.4 × 0.2 × 0.6 = 0.0288
Using Bayes rule
P(PlayTennis | condition) = P(condition | PlayTennis) × P(PlayTennis) / P(condition)
From the table, P(PlayTennis = yes) = 9/14 = 0.643 and P(PlayTennis = no) = 5/14 = 0.357.

P(PlayTennis = yes | condition) = 0.022 × 0.643 / P(condition) = 0.01415 / P(condition)
P(PlayTennis = no | condition) = 0.0288 × 0.357 / P(condition) = 0.01028 / P(condition)
Since P(condition) is the same in both expressions, we can conclude that John is more likely to play tennis today.
Note that we do not need to compute P(condition) to get the answer. However, if we want the actual numbers, we can calculate P(condition) by normalizing the probabilities.
P(condition) = P(condition | yes) × P(yes) + P(condition | no) × P(no)
             = 0.01415 + 0.01028 = 0.02443
P(PlayTennis = yes | condition) = 0.01415 / 0.02443 = 0.58
P(PlayTennis = no | condition) = 0.01028 / 0.02443 = 0.42
Therefore, John is more likely to play tennis today, with a 58% chance.
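The whole tennis calculation can be sketched in Python. This is a minimal sketch, not part of the lecture; the record encoding and function name are my own.

```python
# The 14-day record as (outlook, temperature, humidity, wind, play) tuples.
records = [
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]

def naive_bayes_score(condition, label):
    """P(condition | label) * P(label), using conditional independence."""
    rows = [r for r in records if r[-1] == label]
    prior = len(rows) / len(records)
    likelihood = 1.0
    for i, value in enumerate(condition):
        # Count how often this feature value occurs within the class.
        likelihood *= sum(1 for r in rows if r[i] == value) / len(rows)
    return prior * likelihood

condition = ("sunny", "mild", "normal", "strong")
score_yes = naive_bayes_score(condition, "yes")  # ≈ 0.643 * 0.022
score_no = naive_bayes_score(condition, "no")    # ≈ 0.357 * 0.0288
p_yes = score_yes / (score_yes + score_no)       # ≈ 0.58
```

Dividing by the sum of the two scores is exactly the normalization step above, so P(condition) never has to be computed separately.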
Learning and Bayes Classifier
Learning is the adjustment of probability values to compute a posterior probability when new data is added.
Classifying Object Example
Suppose we want to classify objects into two classes, A and B. There are two features that we can measure from each object, f1 and f2. We sample four objects randomly to form a database and classify them by hand.
Sample f1 f2 Class
1 5.2 1.2 B
2 2.3 5.4 A
3 1.5 4.4 A
4 4.5 2.1 B
Now we have another sample with f1 = 3.2 and f2 = 4.2, and we want to know what class it is.
We want to find P(Class | feature).
Using Bayes rule
P(Class | feature) = P(feature | Class) × P(Class) / P(feature)
From the table, we will count the number of events.
P(Class = A) = 2/4 = 0.5
P(Class = B) = 2/4 = 0.5
Find P(feature | Class)
Again, we use the naïve Bayes assumption: assume that all events are independent.
P(f1, f2 | Class) = P(f1 | Class) × P(f2 | Class)
To find P(f1 | Class), we need to assume a probability distribution, because the features are continuous values.
The most common choice is the Gaussian (normal) distribution.
Gaussian distribution

P(x) = (1 / √(2πσ²)) exp( −(x − µ)² / (2σ²) )

There are two parameters: the mean µ and the variance σ².
Using the maximum likelihood principle, the mean and the variance can be estimated from the samples in the database.
Class A
  f1: Mean = (2.3 + 1.5)/2 = 1.9, SD = 0.4
  f2: Mean = (5.4 + 4.4)/2 = 4.9, SD = 0.5
Class B
  f1: Mean = (5.2 + 4.5)/2 = 4.85, SD = 0.35
  f2: Mean = (1.2 + 2.1)/2 = 1.65, SD = 0.45
P(f1 = x | A) = (1 / √(2π(0.4)²)) exp( −(x − 1.9)² / (2(0.4)²) )

P(f1 = 3.2 | A) = (1 / √(2π(0.4)²)) exp( −(3.2 − 1.9)² / (2(0.4)²) ) = 0.0051
The object that we want to classify has f1 = 3.2 and f2 = 4.2.
P(f2 = 4.2 | A) = (1 / √(2π(0.5)²)) exp( −(4.2 − 4.9)² / (2(0.5)²) ) = 0.2995
P(f1 = 3.2 | B) = (1 / √(2π(0.35)²)) exp( −(3.2 − 4.85)² / (2(0.35)²) ) = 1.7016e-05
P(f2 = 4.2 | B) = (1 / √(2π(0.45)²)) exp( −(4.2 − 1.65)² / (2(0.45)²) ) = 9.4375e-08
Therefore,
P(f1, f2 | Class = A) = 0.0051 × 0.2995 = 0.0015
P(f1, f2 | Class = B) = 1.7016e-05 × 9.4375e-08 = 1.6059e-12
P(Class | feature) = P(feature | Class) × P(Class) / P(feature)
From Bayes
P(A | feature) = 0.0015 × 0.5 / P(feature)
P(B | feature) = 1.6059e-12 × 0.5 / P(feature)
Therefore, we should classify the sample as Class A.
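The Gaussian naïve Bayes computation above can be sketched as follows. The per-class means and SDs are the estimates from the four-sample database (population standard deviation, matching the slide's numbers); the dictionary layout and names are my own.

```python
import math

# Per-class Gaussian parameters (mean, SD) estimated from the database.
params = {
    "A": {"f1": (1.9, 0.4), "f2": (4.9, 0.5)},
    "B": {"f1": (4.85, 0.35), "f2": (1.65, 0.45)},
}

def gaussian(x, mu, sigma):
    """Gaussian density P(x) = (1/sqrt(2*pi*sigma^2)) exp(-(x-mu)^2 / (2*sigma^2))."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def class_score(f1, f2, label):
    """P(f1, f2 | Class) * P(Class), with P(Class) = 0.5 for both classes."""
    mu1, s1 = params[label]["f1"]
    mu2, s2 = params[label]["f2"]
    return gaussian(f1, mu1, s1) * gaussian(f2, mu2, s2) * 0.5

score_a = class_score(3.2, 4.2, "A")  # ≈ 0.0051 * 0.2995 * 0.5
score_b = class_score(3.2, 4.2, "B")  # ≈ 1.7e-05 * 9.4e-08 * 0.5
# score_a is larger by many orders of magnitude, so the sample is Class A.
```

Since P(feature) is the same for both classes, comparing the two scores is enough to pick the class.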
Nearest Neighbor Classification
NN is considered a model-free classification method.
Nearest Neighbor's Principle
The unknown sample is classified to the same class as the sample with the closest distance.
(Figure: samples plotted in the feature 1 / feature 2 plane; the unknown sample's closest neighbor is a circle.)
We classify the sample as a circle.
Distance between Samples
D(x, y) = ( |x1 − y1|^k + |x2 − y2|^k + … + |xN − yN|^k )^(1/k)

Samples X and Y have multi-dimensional feature values.
X = (0, 1, 3, 2)
Y = (3, 5, 1, 2)

The distance between samples X and Y can be calculated by this formula:
D(x, y) = ( |x1 − y1|^k + |x2 − y2|^k + … + |xN − yN|^k )^(1/k)
If k = 1, the distance is called the Manhattan distance.
If k = 2, the distance is called the Euclidean distance.
If k = ∞, the distance is the maximum feature difference.
Euclidean distance is the best known and the preferred one.
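The distance formula above can be sketched directly; the function name is my own. For the X, Y example, k = 1 gives the Manhattan distance and k = 2 the Euclidean distance.

```python
def minkowski(x, y, k):
    """Minkowski distance: (sum_i |x_i - y_i|^k)^(1/k)."""
    return sum(abs(a - b) ** k for a, b in zip(x, y)) ** (1 / k)

X = (0, 1, 3, 2)
Y = (3, 5, 1, 2)

manhattan = minkowski(X, Y, 1)  # 3 + 4 + 2 + 0 = 9
euclidean = minkowski(X, Y, 2)  # sqrt(9 + 16 + 4 + 0) = sqrt(29) ≈ 5.385
```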
Sample f1 f2 Class
1 5.2 1.2 B
2 2.3 5.4 A
3 1.5 4.4 A
4 4.5 2.1 B
Classifying Object with NN
Now we have another sample with f1 = 3.2 and f2 = 4.2, and we want to know its class.
Compute the Euclidean distance from it to all other samples:
D(x, s1) = √( (3.2 − 5.2)² + (4.2 − 1.2)² ) = 3.6056
D(x, s2) = √( (3.2 − 2.3)² + (4.2 − 5.4)² ) = 1.5
D(x, s3) = √( (3.2 − 1.5)² + (4.2 − 4.4)² ) = 1.7117
D(x, s4) = √( (3.2 − 4.5)² + (4.2 − 2.1)² ) = 2.4698
The unknown sample has the closest distance to the second sample. Therefore, we classify it to be the same class as the second sample, which is Class A.
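The nearest-neighbor step above can be sketched as follows, with the four-sample database encoded as (feature tuple, class) pairs; the encoding and function names are my own.

```python
# Database of (features, class) pairs from the table.
samples = [
    ((5.2, 1.2), "B"),
    ((2.3, 5.4), "A"),
    ((1.5, 4.4), "A"),
    ((4.5, 2.1), "B"),
]

def euclidean(p, q):
    """Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def nearest_neighbor(point):
    """Return the class of the database sample closest to `point`."""
    return min(samples, key=lambda s: euclidean(point, s[0]))[1]

print(nearest_neighbor((3.2, 4.2)))  # sample 2 is closest (distance 1.5) -> "A"
```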
K-Nearest Neighbor (KNN)
Instead of using only the single closest sample to decide the class, we take a majority vote among the closest k samples.
(Figure: example with k = 3; the three nearest neighbors are mostly circles.)
The data is classified as a circle.
(Figure: example with k = 5; the five nearest neighbors are mostly stars.)
The data is classified as a star.
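The KNN majority vote can be sketched by extending the nearest-neighbor code to rank all samples and count the labels of the top k; the database is the four-sample table from before, and the function names are my own.

```python
from collections import Counter

# Database of (features, class) pairs from the table.
samples = [
    ((5.2, 1.2), "B"),
    ((2.3, 5.4), "A"),
    ((1.5, 4.4), "A"),
    ((4.5, 2.1), "B"),
]

def euclidean(p, q):
    """Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def knn_classify(point, k):
    """Majority vote among the k database samples closest to `point`."""
    ranked = sorted(samples, key=lambda s: euclidean(point, s[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

print(knn_classify((3.2, 4.2), 3))  # neighbors: A (1.5), A (1.71), B (2.47) -> "A"
```

Odd values of k are usually chosen for two-class problems so the vote cannot tie.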