
Transcript of SEEM4630 2013-2014 Tutorial 2 Classification: Decision tree, Naïve Bayes & k-NN

Page 1: SEEM4630  2013-2014 Tutorial  2  Classification : Decision tree, Naïve Bayes &  k-NN

SEEM4630 2013-2014 Tutorial 2
Classification: Decision tree, Naïve Bayes & k-NN
Wentao TIAN, [email protected]

Page 2: SEEM4630  2013-2014 Tutorial  2  Classification : Decision tree, Naïve Bayes &  k-NN

Classification: Definition

Given a collection of records (the training set), each record contains a set of attributes; one of the attributes is the class.

Find a model for the class attribute as a function of the values of the other attributes: decision tree, Naïve Bayes, k-NN.

Goal: previously unseen records should be assigned a class as accurately as possible.

Page 3: SEEM4630  2013-2014 Tutorial  2  Classification : Decision tree, Naïve Bayes &  k-NN

Decision Tree

Goal: construct a tree so that instances belonging to different classes are separated.

Basic algorithm (a greedy algorithm):
• Tree is constructed in a top-down recursive manner
• At start, all the training examples are at the root
• Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
• Examples are partitioned recursively based on the selected attributes

Page 4: SEEM4630  2013-2014 Tutorial  2  Classification : Decision tree, Naïve Bayes &  k-NN


Attribute Selection Measure 1: Information Gain

Let pi be the probability that a tuple belongs to class Ci, estimated by |Ci,D|/|D|

Expected information (entropy) needed to classify a tuple in D:

Info(D) = - Σ_{i=1}^{m} p_i log2(p_i)

Information needed (after using A to split D into v partitions) to classify D:

Info_A(D) = Σ_{j=1}^{v} (|D_j| / |D|) × Info(D_j)

Information gained by branching on attribute A:

Gain(A) = Info(D) - Info_A(D)
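To tie the greedy algorithm from the previous slide to these three formulas, here is a minimal Python sketch (not part of the original tutorial; the function names and the (attribute-dict, label) row format are illustrative assumptions). It computes Info(D), Info_A(D) and Gain(A) for categorical attributes and uses them to grow a tree top-down:

```python
from math import log2
from collections import Counter

def info(labels):
    """Info(D): entropy of a list of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())

def info_a(rows, attr):
    """Info_A(D): weighted entropy after splitting rows (attribute-dict, label pairs) on attr."""
    total = len(rows)
    values = set(f[attr] for f, _ in rows)
    parts = [[l for f, l in rows if f[attr] == v] for v in values]
    return sum(len(p) / total * info(p) for p in parts)

def gain(rows, attr):
    """Gain(A) = Info(D) - Info_A(D)."""
    return info([l for _, l in rows]) - info_a(rows, attr)

def build_tree(rows, attributes):
    """Greedy top-down induction: pick the highest-gain attribute, partition, recurse."""
    labels = [l for _, l in rows]
    if len(set(labels)) == 1 or not attributes:          # pure node, or no attribute left to test
        return Counter(labels).most_common(1)[0][0]      # return the majority class
    best = max(attributes, key=lambda a: gain(rows, a))
    rest = [a for a in attributes if a != best]
    return {best: {v: build_tree([(f, l) for f, l in rows if f[best] == v], rest)
                   for v in set(f[best] for f, _ in rows)}}
```

Run on the Play Tennis table from the following slides, build_tree should select Outlook at the root, then Humidity under Sunny and Wind under Rain, matching the tree derived step by step below.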

Page 5: SEEM4630  2013-2014 Tutorial  2  Classification : Decision tree, Naïve Bayes &  k-NN


Attribute Selection Measure 2: Gain Ratio

Information gain measure is biased towards attributes with a large number of values.

C4.5 (a successor of ID3) uses gain ratio to overcome the problem (normalization to information gain):

GainRatio(A) = Gain(A) / SplitInfo(A)

SplitInfo_A(D) = - Σ_{j=1}^{v} (|D_j| / |D|) × log2(|D_j| / |D|)
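As a quick sketch (assumed helper names, not from the slides), SplitInfo and GainRatio can be computed directly from the sizes of the partitions an attribute produces; the example numbers reuse the Outlook split (partitions of size 5, 4 and 5, Gain = 0.25) from the worked example later in the tutorial:

```python
from math import log2

def split_info(partition_sizes):
    """SplitInfo_A(D) for a split that produces partitions of the given sizes."""
    total = sum(partition_sizes)
    return -sum(n / total * log2(n / total) for n in partition_sizes if n > 0)

def gain_ratio(gain_a, partition_sizes):
    """GainRatio(A) = Gain(A) / SplitInfo_A(D)."""
    return gain_a / split_info(partition_sizes)

# Outlook splits the 14 Play Tennis records into partitions of size 5, 4 and 5,
# and Gain(Outlook) = 0.25 (computed in the tree-induction example below).
print(round(split_info([5, 4, 5]), 3))        # about 1.577
print(round(gain_ratio(0.25, [5, 4, 5]), 2))  # about 0.16
```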

Page 6: SEEM4630  2013-2014 Tutorial  2  Classification : Decision tree, Naïve Bayes &  k-NN


Attribute Selection Measure 3: Gini Index

If a data set D contains examples from n classes, the gini index gini(D) is defined as

gini(D) = 1 - Σ_{j=1}^{n} p_j^2

where p_j is the relative frequency of class j in D.

If a data set D is split on A into two subsets D1 and D2, the gini index gini_A(D) is defined as

gini_A(D) = (|D1| / |D|) × gini(D1) + (|D2| / |D|) × gini(D2)

Reduction in impurity:

Δgini(A) = gini(D) - gini_A(D)
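A minimal sketch of the Gini computations (illustrative code, not from the slides); the example reuses the Wind split counts [6+,2-] and [3+,3-] that appear in the tree-induction example below:

```python
def gini(counts):
    """gini(D) = 1 - sum_j p_j^2, where counts are the class counts in D."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def gini_split(counts1, counts2):
    """gini_A(D) for a binary split of D into D1 and D2 (class counts for each subset)."""
    n1, n2 = sum(counts1), sum(counts2)
    n = n1 + n2
    return n1 / n * gini(counts1) + n2 / n * gini(counts2)

# Wind splits the 14 Play Tennis records into Weak [6+,2-] and Strong [3+,3-].
reduction = gini([9, 5]) - gini_split([6, 2], [3, 3])
print(round(reduction, 3))   # delta gini(Wind), about 0.031
```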

Page 7: SEEM4630  2013-2014 Tutorial  2  Classification : Decision tree, Naïve Bayes &  k-NN

Example (Play Tennis)

Outlook   Temperature  Humidity  Wind    Play Tennis
Sunny     >25          High      Weak    No
Sunny     >25          High      Strong  No
Overcast  >25          High      Weak    Yes
Rain      15-25        High      Weak    Yes
Rain      <15          Normal    Weak    Yes
Rain      <15          Normal    Strong  No
Overcast  <15          Normal    Strong  Yes
Sunny     15-25        High      Weak    No
Sunny     <15          Normal    Weak    Yes
Rain      15-25        Normal    Weak    Yes
Sunny     15-25        Normal    Strong  Yes
Overcast  15-25        High      Strong  Yes
Overcast  >25          Normal    Weak    Yes
Rain      15-25        High      Strong  No


Page 8: SEEM4630  2013-2014 Tutorial  2  Classification : Decision tree, Naïve Bayes &  k-NN


Tree induction example

Entropy of data S:
Info(S) = -9/14(log2(9/14)) - 5/14(log2(5/14)) = 0.94

Split data by attribute Outlook:
S[9+, 5-] → Sunny [2+,3-], Overcast [4+,0-], Rain [3+,2-]

Gain(Outlook) = 0.94 – 5/14[-2/5(log2(2/5))-3/5(log2(3/5))] – 4/14[-4/4(log2(4/4))-0/4(log2(0/4))] – 5/14[-3/5(log2(3/5))-2/5(log2(2/5))] = 0.94 – 0.69 = 0.25
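The numbers on this slide can be checked with a few lines of Python (a quick verification sketch, not part of the tutorial):

```python
from math import log2

def entropy(pos, neg):
    """Entropy of a node with pos positive and neg negative examples."""
    total = pos + neg
    return sum(-c / total * log2(c / total) for c in (pos, neg) if c > 0)

info_s = entropy(9, 5)                                    # 0.940
info_outlook = 5/14 * entropy(2, 3) + 4/14 * entropy(4, 0) + 5/14 * entropy(3, 2)
print(round(info_s, 2), round(info_s - info_outlook, 2))  # 0.94 0.25
```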

Page 9: SEEM4630  2013-2014 Tutorial  2  Classification : Decision tree, Naïve Bayes &  k-NN


Tree induction example

Split data by attribute Temperature:
S[9+, 5-] → <15 [3+,1-], 15-25 [4+,2-], >25 [2+,2-]

Gain(Temperature) = 0.94 – 4/14[-3/4(log2(3/4))-1/4(log2(1/4))] – 6/14[-4/6(log2(4/6))-2/6(log2(2/6))] – 4/14[-2/4(log2(2/4))-2/4(log2(2/4))] = 0.94 – 0.91 = 0.03

Page 10: SEEM4630  2013-2014 Tutorial  2  Classification : Decision tree, Naïve Bayes &  k-NN

Tree induction example

Split data by attribute Humidity:
S[9+, 5-] → High [3+,4-], Normal [6+,1-]
Gain(Humidity) = 0.94 – 7/14[-3/7(log2(3/7))-4/7(log2(4/7))] – 7/14[-6/7(log2(6/7))-1/7(log2(1/7))] = 0.94 – 0.79 = 0.15

Split data by attribute Wind:
S[9+, 5-] → Weak [6+,2-], Strong [3+,3-]
Gain(Wind) = 0.94 – 8/14[-6/8(log2(6/8))-2/8(log2(2/8))] – 6/14[-3/6(log2(3/6))-3/6(log2(3/6))] = 0.94 – 0.89 = 0.05

Page 11: SEEM4630  2013-2014 Tutorial  2  Classification : Decision tree, Naïve Bayes &  k-NN

Tree induction example

Gain(Outlook) = 0.25
Gain(Temperature) = 0.03
Gain(Humidity) = 0.15
Gain(Wind) = 0.05

Outlook gives the largest information gain, so it is chosen as the root:

Outlook = Overcast → Yes
Outlook = Sunny → ?? (to be split further)
Outlook = Rain → ?? (to be split further)

Page 12: SEEM4630  2013-2014 Tutorial  2  Classification : Decision tree, Naïve Bayes &  k-NN

Entropy of branch Sunny:
Info(Sunny) = -2/5(log2(2/5)) - 3/5(log2(3/5)) = 0.97

Split Sunny branch by attribute Temperature:
Sunny[2+,3-] → <15 [1+,0-], 15-25 [1+,1-], >25 [0+,2-]
Gain(Temperature) = 0.97 – 1/5[-1/1(log2(1/1))-0/1(log2(0/1))] – 2/5[-1/2(log2(1/2))-1/2(log2(1/2))] – 2/5[-0/2(log2(0/2))-2/2(log2(2/2))] = 0.97 – 0.4 = 0.57

Split Sunny branch by attribute Humidity:
Sunny[2+,3-] → High [0+,3-], Normal [2+,0-]
Gain(Humidity) = 0.97 – 3/5[-0/3(log2(0/3))-3/3(log2(3/3))] – 2/5[-2/2(log2(2/2))-0/2(log2(0/2))] = 0.97 – 0 = 0.97

Split Sunny branch by attribute Wind:
Sunny[2+,3-] → Weak [1+,2-], Strong [1+,1-]
Gain(Wind) = 0.97 – 3/5[-1/3(log2(1/3))-2/3(log2(2/3))] – 2/5[-1/2(log2(1/2))-1/2(log2(1/2))] = 0.97 – 0.95 = 0.02

Page 13: SEEM4630  2013-2014 Tutorial  2  Classification : Decision tree, Naïve Bayes &  k-NN

Tree induction example

Humidity gives the largest gain on the Sunny branch, so the tree so far is:

Outlook = Overcast → Yes
Outlook = Sunny → Humidity (High → No, Normal → Yes)
Outlook = Rain → ?? (to be split further)

Page 14: SEEM4630  2013-2014 Tutorial  2  Classification : Decision tree, Naïve Bayes &  k-NN

Entropy of branch Rain:
Info(Rain) = -3/5(log2(3/5)) - 2/5(log2(2/5)) = 0.97

Split Rain branch by attribute Temperature:
Rain[3+,2-] → <15 [1+,1-], 15-25 [2+,1-], >25 [0+,0-]
Gain(Temperature) = 0.97 – 2/5[-1/2(log2(1/2))-1/2(log2(1/2))] – 3/5[-2/3(log2(2/3))-1/3(log2(1/3))] – 0 = 0.97 – 0.95 = 0.02 (the >25 partition is empty and contributes nothing)

Split Rain branch by attribute Humidity:
Rain[3+,2-] → High [1+,1-], Normal [2+,1-]
Gain(Humidity) = 0.97 – 2/5[-1/2(log2(1/2))-1/2(log2(1/2))] – 3/5[-2/3(log2(2/3))-1/3(log2(1/3))] = 0.97 – 0.95 = 0.02

Split Rain branch by attribute Wind:
Rain[3+,2-] → Weak [3+,0-], Strong [0+,2-]
Gain(Wind) = 0.97 – 3/5[-3/3(log2(3/3))-0/3(log2(0/3))] – 2/5[-0/2(log2(0/2))-2/2(log2(2/2))] = 0.97 – 0 = 0.97

Page 15: SEEM4630  2013-2014 Tutorial  2  Classification : Decision tree, Naïve Bayes &  k-NN

Wind gives the largest gain on the Rain branch. The final decision tree:

Outlook = Overcast → Yes
Outlook = Sunny → Humidity (High → No, Normal → Yes)
Outlook = Rain → Wind (Strong → No, Weak → Yes)
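For reference, the finished tree can be written down as a small nested dictionary and used to classify new records; this encoding and the predict helper are an illustrative sketch, not code from the tutorial:

```python
# The final tree from the slide, written as a nested dict, plus a tiny classifier that walks it.
tree = {"Outlook": {
    "Overcast": "Yes",
    "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}},
}}

def predict(node, record):
    """Follow attribute tests until a leaf (class label) is reached."""
    while isinstance(node, dict):
        attr = next(iter(node))
        node = node[attr][record[attr]]
    return node

print(predict(tree, {"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak"}))  # No
```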

Page 16: SEEM4630  2013-2014 Tutorial  2  Classification : Decision tree, Naïve Bayes &  k-NN


Bayesian Classification

A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities P(Ci | x1, x2, ..., xn), where xi is the value of attribute Ai.

Choose the class label that has the highest posterior probability.

Foundation: Bayes' theorem.

P(Ci | x1, x2, ..., xn) = P(x1, x2, ..., xn | Ci) × P(Ci) / P(x1, x2, ..., xn)

posterior probability = likelihood × prior probability / evidence

Model: compute P(x1, x2, ..., xn | Ci) and P(Ci) from data.

Page 17: SEEM4630  2013-2014 Tutorial  2  Classification : Decision tree, Naïve Bayes &  k-NN

Naïve Bayes Classifier

Problem: the joint probability P(x1, x2, ..., xn | Ci) is difficult to estimate.

Naïve Bayes assumption: attributes are conditionally independent given the class:

P(x1, x2, ..., xn | Ci) = P(x1 | Ci) × P(x2 | Ci) × ... × P(xn | Ci)

Therefore:

P(Ci | x1, x2, ..., xn) = [ Π_{j=1}^{n} P(xj | Ci) ] × P(Ci) / P(x1, x2, ..., xn)

Page 18: SEEM4630  2013-2014 Tutorial  2  Classification : Decision tree, Naïve Bayes &  k-NN

Example: Naïve Bayes Classifier

A  B  C
m  b  t
m  s  t
g  q  t
h  s  t
g  q  t
g  q  f
g  s  f
h  b  f
h  q  f
m  b  f

P(C=t) = 1/2, P(C=f) = 1/2
P(A=m|C=t) = 2/5, P(A=m|C=f) = 1/5
P(B=q|C=t) = 2/5, P(B=q|C=f) = 2/5

Test Record: A=m, B=q, C=?

Page 19: SEEM4630  2013-2014 Tutorial  2  Classification : Decision tree, Naïve Bayes &  k-NN

Example: Naïve Bayes Classifier

For C = t:
P(A=m|C=t) × P(B=q|C=t) × P(C=t) = 2/5 × 2/5 × 1/2 = 2/25
P(C=t|A=m, B=q) = (2/25) / P(A=m, B=q)

For C = f:
P(A=m|C=f) × P(B=q|C=f) × P(C=f) = 1/5 × 2/5 × 1/2 = 1/25
P(C=f|A=m, B=q) = (1/25) / P(A=m, B=q)

The posterior for C = t is higher, so the conclusion is: A=m, B=q → C = t.
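The same calculation as a short Python sketch (illustrative, not from the slides): it estimates the priors and conditional probabilities from the table above and scores the test record A=m, B=q under the independence assumption:

```python
from collections import Counter

# Training records (A, B, C) from the table above.
data = [("m", "b", "t"), ("m", "s", "t"), ("g", "q", "t"), ("h", "s", "t"), ("g", "q", "t"),
        ("g", "q", "f"), ("g", "s", "f"), ("h", "b", "f"), ("h", "q", "f"), ("m", "b", "f")]

priors = Counter(row[2] for row in data)       # class counts: {'t': 5, 'f': 5}

def score(a, b, cls):
    """Unnormalised posterior P(A=a|C=cls) * P(B=b|C=cls) * P(C=cls)."""
    rows = [r for r in data if r[2] == cls]
    p_a = sum(r[0] == a for r in rows) / len(rows)
    p_b = sum(r[1] == b for r in rows) / len(rows)
    return p_a * p_b * priors[cls] / len(data)

for cls in ("t", "f"):
    print(cls, score("m", "q", cls))           # t: 0.08 (= 2/25), f: 0.04 (= 1/25) -> predict C = t
```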

Page 20: SEEM4630  2013-2014 Tutorial  2  Classification : Decision tree, Naïve Bayes &  k-NN

Nearest Neighbor Classification

Input:
• A set of stored records
• k: the number of nearest neighbors

Procedure:
• Compute the distance: d(p, q) = sqrt( Σ_i (p_i - q_i)^2 )
• Identify the k nearest neighbors
• Determine the class label of the unknown record based on the class labels of its nearest neighbors (i.e. by taking a majority vote)

Page 21: SEEM4630  2013-2014 Tutorial  2  Classification : Decision tree, Naïve Bayes &  k-NN

Nearest Neighbor Classification: A Discrete Example

Input: 8 training instances (k = 1 and k = 3)
P1 (4, 2) Orange
P2 (0.5, 2.5) Orange
P3 (2.5, 2.5) Orange
P4 (3, 3.5) Orange
P5 (5.5, 3.5) Orange
P6 (2, 4) Black
P7 (4, 5) Black
P8 (2.5, 5.5) Black

New instance: Pn (4, 4) = ?

Calculate the distances:
d(P1, Pn) = sqrt((4-4)^2 + (4-2)^2) = 2
d(P2, Pn) = 3.80
d(P3, Pn) = 2.12
d(P4, Pn) = 1.12
d(P5, Pn) = 1.58
d(P6, Pn) = 2
d(P7, Pn) = 1
d(P8, Pn) = 2.12
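A short Python sketch (illustrative, not from the slides) that reproduces these distances and the k = 1 / k = 3 majority votes shown on the next slide:

```python
from collections import Counter
from math import dist   # Euclidean distance, Python 3.8+

points = {"P1": ((4, 2), "Orange"), "P2": ((0.5, 2.5), "Orange"), "P3": ((2.5, 2.5), "Orange"),
          "P4": ((3, 3.5), "Orange"), "P5": ((5.5, 3.5), "Orange"), "P6": ((2, 4), "Black"),
          "P7": ((4, 5), "Black"), "P8": ((2.5, 5.5), "Black")}
pn = (4, 4)

# Sort the training points by distance to the new instance Pn.
ranked = sorted(points.items(), key=lambda kv: dist(kv[1][0], pn))

for k in (1, 3):
    neighbours = ranked[:k]
    vote = Counter(label for _, (_, label) in neighbours).most_common(1)[0][0]
    print(k, [name for name, _ in neighbours], vote)
# k = 1 -> ['P7'] Black;  k = 3 -> ['P7', 'P4', 'P5'] Orange
```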

Page 22: SEEM4630  2013-2014 Tutorial  2  Classification : Decision tree, Naïve Bayes &  k-NN

Nearest Neighbor Classification

k = 1: the nearest neighbor of Pn is P7 (Black), so Pn is classified as Black.
k = 3: the three nearest neighbors of Pn are P7 (Black), P4 (Orange) and P5 (Orange), so by majority vote Pn is classified as Orange.

[Figure: scatter plots of P1-P8 and Pn showing the k = 1 and k = 3 neighborhoods.]

Page 23: SEEM4630  2013-2014 Tutorial  2  Classification : Decision tree, Naïve Bayes &  k-NN

Nearest Neighbor Classification…

Scaling issues: attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes.
• Each attribute should fall in the same range
• Min-Max normalization

Example:
• Two data records: a = (1, 1000), b = (0.5, 1)
• dis(a, b) = ?
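A small sketch of the effect (illustrative, not from the slides; here min-max normalization is computed over just these two records, which is only meant to show how scaling changes the distance):

```python
from math import dist

a, b = (1, 1000), (0.5, 1)
print(round(dist(a, b), 2))               # 999.0 -- dominated by the second attribute

# Min-max normalize each attribute to [0, 1] over the (tiny) data set {a, b}.
lo = [min(x, y) for x, y in zip(a, b)]
hi = [max(x, y) for x, y in zip(a, b)]

def normalize(p):
    return tuple((v - l) / (h - l) for v, l, h in zip(p, lo, hi))

print(normalize(a), normalize(b))                    # (1.0, 1.0) (0.0, 0.0)
print(round(dist(normalize(a), normalize(b)), 2))    # 1.41 -- attributes now contribute equally
```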

Page 24: SEEM4630  2013-2014 Tutorial  2  Classification : Decision tree, Naïve Bayes &  k-NN

Classification: Lazy & Eager Learning

Two types of learning methodologies:

Lazy Learning
• Instance-based learning (k-NN)

Eager Learning
• Decision tree and Bayesian classification
• ANN & SVM

[Figure: the example points P1-P8 and the new instance Pn.]

Page 25: SEEM4630  2013-2014 Tutorial  2  Classification : Decision tree, Naïve Bayes &  k-NN

Differences Between Lazy & Eager Learning

Lazy Learning
a. Does not require model building
b. Less time training but more time predicting
c. Effectively uses a richer hypothesis space, since it uses many local linear functions to form an implicit global approximation to the target function

Eager Learning
a. Requires model building
b. More time training but less time predicting
c. Must commit to a single hypothesis that covers the entire instance space

Page 26: SEEM4630  2013-2014 Tutorial  2  Classification : Decision tree, Naïve Bayes &  k-NN

Thank you & Questions?
