Data Mining (and machine learning)


ROC curves

Rule Induction

CW3

Two classes is a common and special case


Medical applications: cancer, or not?
Computer Vision applications: landmine, or not?
Security applications: terrorist, or not?
Biotech applications: gene, or not?
… …



              Predicted Y       Predicted N
Actually Y    True Positive     False Negative
Actually N    False Positive    True Negative
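As a minimal sketch of how the four cells are counted, assuming labels are given as the strings "Y" and "N" (the function name `confusion_counts` and the toy data are illustrative, not from the slides):

```python
def confusion_counts(actual, predicted, positive="Y"):
    """Count the four cells of the two-class table above."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1   # actually Y, predicted Y
        elif a == positive:
            fn += 1   # actually Y, predicted N
        elif p == positive:
            fp += 1   # actually N, predicted Y
        else:
            tn += 1   # actually N, predicted N
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}

# Made-up labels, purely for illustration.
actual    = ["Y", "Y", "Y", "N", "N", "N", "N", "Y"]
predicted = ["Y", "N", "Y", "N", "Y", "N", "N", "Y"]
counts = confusion_counts(actual, predicted)
print(counts)
```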

True Positive: these are ideal. E.g. we correctly detect cancer





False Positive: to be minimised. These cause false alarms; it can be better to be safe than sorry, but false alarms can be very costly.






False Negative: also to be minimised. Missing a landmine or a cancer is very bad in many applications.







True Negative?:




Sensitivity and Specificity: common measures of accuracy in this kind of 2-class task





Sensitivity = TP/(TP+FN) - how many of the real ‘Yes’ cases are detected? How well can it detect the condition?

Specificity = TN/(FP+TN) - how many of the real ‘No’ cases are correctly classified? How well can it rule out the condition?
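A small sketch of the two formulas. The counts below are an assumption: the original figures are not available, and 16 actual YES cases with 12 actual NO cases is simply one set of counts that reproduces, up to rounding, the percentages quoted on the following slides (100% / 25%, 93.8% / 50%, 81.3% / 83.3%).

```python
def sensitivity(tp, fn):
    """TP / (TP + FN): fraction of real 'Yes' cases that are detected."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """TN / (FP + TN): fraction of real 'No' cases correctly classified."""
    return tn / (fp + tn)

# Hypothetical counts (16 actual YES, 12 actual NO), one row per
# position of the decision line. These are NOT taken from the slides.
settings = [
    {"TP": 16, "FN": 0, "FP": 9, "TN": 3},
    {"TP": 15, "FN": 1, "FP": 6, "TN": 6},
    {"TP": 13, "FN": 3, "FP": 2, "TN": 10},
]

results = [(sensitivity(c["TP"], c["FN"]), specificity(c["TN"], c["FP"]))
           for c in settings]
for sens, spec in results:
    print(f"Sensitivity: {sens:.2%}  Specificity: {spec:.2%}")
```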




[Figure: YES/NO points separated by a decision line, shown with the line in three different positions]

Sensitivity: 100%, Specificity: 25%

Sensitivity: 93.8%, Specificity: 50%

Sensitivity: 81.3%, Specificity: 83.3%

100% Sensitivity means: detects all cancer cases (or whatever) but possibly with many false positives


100% Specificity means: may miss some cancer cases (or whatever) but gives no false positives

Sensitivity and Specificity: common measures of accuracy in this kind of 2-class task

Sensitivity = TP/(TP+FN) - how many of the real TRUE cases are detected? How sensitive is the classifier to TRUE cases?
A highly sensitive test for cancer: if “NO” then you can be sure it’s “NO”

Specificity = TN/(TN+FP) - how sensitive is the classifier to the negative cases?
A highly specific test for cancer: if “Y” then you can be sure it’s “Y”.

With many trained classifiers, you can ‘move the line’ in this way. E.g. with Naive Bayes (NB), we could use a threshold indicating how much higher the log likelihood for Y should be than for N.
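A rough sketch of ‘moving the line’ with such a threshold. The log likelihoods below are made up for illustration and the helper names `classify` and `rates` are hypothetical; a real Naive Bayes model would supply the two log likelihoods for each example.

```python
def classify(log_lik_yes, log_lik_no, threshold=0.0):
    """Predict "Y" only if Y's log likelihood beats N's by more than threshold."""
    return "Y" if (log_lik_yes - log_lik_no) > threshold else "N"

def rates(examples, threshold):
    """Return (sensitivity, specificity) at the given threshold.

    examples: list of (actual_label, log_lik_yes, log_lik_no).
    """
    tp = fn = fp = tn = 0
    for actual, ll_yes, ll_no in examples:
        predicted = classify(ll_yes, ll_no, threshold)
        if actual == "Y":
            if predicted == "Y":
                tp += 1
            else:
                fn += 1
        else:
            if predicted == "Y":
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Made-up scores: (actual, log likelihood for Y, log likelihood for N).
examples = [
    ("Y", -2.0, -5.0), ("Y", -3.0, -4.0), ("Y", -4.0, -3.5),
    ("N", -3.0, -3.5), ("N", -5.0, -3.0), ("N", -6.0, -2.0),
]

for t in (-3.0, 0.0, 2.0):
    sens, spec = rates(examples, t)
    print(f"threshold {t:+.1f}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Raising the threshold trades sensitivity for specificity. Sweeping it over a range of values and plotting sensitivity against 1 - specificity gives the points of a ROC curve.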

ROC curves

David Corne and Nick Taylor, Heriot-Watt University
Slides and related resources: http://www.macs.hw.ac.uk/~dwcorne/Teaching/dmml.html

Rule Induction

• Rules are useful when you want to learn a clear / interpretable classifier, and are less worried about squeezing out as much accuracy as possible

• There are a number of different ways to ‘learn’ rules or rulesets.

• Before we go there, what is a rule / ruleset?

Rules

IF Condition … Then Class Value is …

[Figure: scatter plot of YES and NO points; X axis 0 to 12, Y axis 0 to 5]

Rules are Rectangular

IF (X>0)&(X<5)&(Y>0.5)&(Y<5) THEN YES


IF (X>5)&(X<11)&(Y>4.5)&(Y<5.1) THEN NO
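The two example rules can be written as simple predicates over a point (X, Y), which makes the ‘rectangular’ shape explicit. This is only a sketch; `make_rule` is a hypothetical helper, and returning None for ‘rule does not fire’ is a design choice made here.

```python
def make_rule(x_min, x_max, y_min, y_max, label):
    """Build a rule: IF (X>x_min)&(X<x_max)&(Y>y_min)&(Y<y_max) THEN label."""
    def rule(x, y):
        if x_min < x < x_max and y_min < y < y_max:
            return label
        return None  # the rule does not fire for this point
    return rule

# The two rules shown on the slides.
rule_yes = make_rule(0, 5, 0.5, 5, "YES")
rule_no = make_rule(5, 11, 4.5, 5.1, "NO")

print(rule_yes(2.0, 3.0))  # inside the first rectangle
print(rule_no(2.0, 3.0))   # outside the second rectangle
```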

A Ruleset

IF Condition1 … Then Class = A

IF Condition2 … Then Class = A

IF Condition3 … Then Class = B

IF Condition4 … Then Class = C

…

[Figure: a ruleset shown on the YES/NO scatter plot; X axis 0 to 12, Y axis 0 to 5]

What’s wrong with this ruleset? (two things)

[Figure: a ruleset shown on the YES/NO scatter plot; X axis 0 to 12, Y axis 0 to 5]

What about this ruleset?

Two ways to interpret a ruleset:


As a Decision List

IF Condition1 … Then Class = A

ELSE IF Condition2 … Then Class = A

ELSE IF Condition3 … Then Class = B

ELSE IF Condition4 … Then Class = C

…

ELSE … predict Background Majority Class
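A sketch of the decision-list interpretation: rules are tried in order and the first one that fires decides the class. The conditions and the default class "C" below are hypothetical stand-ins for Condition1, Condition2, … and the background majority class.

```python
def predict_decision_list(rules, default, x, y):
    """Return the class of the first rule whose condition holds, else default."""
    for condition, label in rules:
        if condition(x, y):
            return label
    return default  # stands in for the background majority class

# Hypothetical conditions; the order matters because they overlap.
rules = [
    (lambda x, y: x < 5, "A"),
    (lambda x, y: y > 2, "B"),
]

print(predict_decision_list(rules, "C", 3, 4))  # both rules match; the first wins
print(predict_decision_list(rules, "C", 7, 4))  # only the second rule matches
print(predict_decision_list(rules, "C", 7, 1))  # no rule matches
```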


As an unordered set

IF Condition1 … Then Class = A

IF Condition2 … Then Class = A

IF Condition3 … Then Class = B

IF Condition4 … Then Class = C

Check each rule and gather votes for each class

If no winner, predict background majority class
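A sketch of the unordered interpretation: every rule is checked and each rule that fires casts one vote. Treating both ‘no rule fired’ and ‘tied vote’ as ‘no winner’ is an assumption made here, as are the example conditions and the default class "C".

```python
from collections import Counter

def predict_by_vote(rules, default, x, y):
    """Check every rule, count votes per class, fall back to default if no winner."""
    votes = Counter(label for condition, label in rules if condition(x, y))
    ranked = votes.most_common(2)
    if not ranked:
        return default  # no rule fired
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return default  # tie between the top two classes
    return ranked[0][0]

# Hypothetical conditions.
rules = [
    (lambda x, y: x < 5, "A"),
    (lambda x, y: y > 2, "B"),
    (lambda x, y: x < 8, "A"),
]

print(predict_by_vote(rules, "C", 3, 4))  # A gets 2 votes, B gets 1
print(predict_by_vote(rules, "C", 7, 4))  # A and B tie with 1 vote each
print(predict_by_vote(rules, "C", 9, 1))  # no rule fires
```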

Three broad ways to learn rulesets


1. Just build a decision tree with ID3 (or something else) and you can translate the tree into rules!


2. Use any good search/optimisation algorithm. Evolutionary (genetic) algorithms are the most common. You will do this in coursework 3. This means simply guessing a ruleset at random, and then trying mutations and variants, gradually improving them over time.


3. A number of ‘old’ AI algorithms exist that still work well, and/or can be engineered to work with an evolutionary algorithm. The basic idea is: iterated coverage
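Approach 2 above (guess a ruleset at random, then keep mutations that do not make it worse) can be sketched as follows. This is a deliberately simple mutation-only hill climber, not the coursework program and not a full genetic algorithm; the rectangle encoding, the mutation step, the toy data and all function names are assumptions made for illustration.

```python
import random

def predict(ruleset, x, y):
    """A point is YES if any rectangle (x_min, x_max, y_min, y_max) contains it."""
    for x_min, x_max, y_min, y_max in ruleset:
        if x_min < x < x_max and y_min < y < y_max:
            return "YES"
    return "NO"

def accuracy(ruleset, data):
    """Fraction of (x, y, label) examples classified correctly."""
    return sum(predict(ruleset, x, y) == label for x, y, label in data) / len(data)

def random_rule(rng):
    """Guess a rectangle at random inside the plotted area (X 0..12, Y 0..5)."""
    x1, x2 = sorted(rng.uniform(0, 12) for _ in range(2))
    y1, y2 = sorted(rng.uniform(0, 5) for _ in range(2))
    return (x1, x2, y1, y2)

def mutate(ruleset, rng):
    """Copy the ruleset and nudge one boundary of one randomly chosen rule."""
    child = [list(rule) for rule in ruleset]
    rule = rng.choice(child)
    rule[rng.randrange(4)] += rng.gauss(0, 1.0)
    return [tuple(rule) for rule in child]

def evolve(data, n_rules=2, steps=2000, seed=0):
    """Start from a random ruleset; accept a mutant if it is at least as accurate."""
    rng = random.Random(seed)
    best = [random_rule(rng) for _ in range(n_rules)]
    best_score = accuracy(best, data)
    for _ in range(steps):
        child = mutate(best, rng)
        score = accuracy(child, data)
        if score >= best_score:
            best, best_score = child, score
    return best, best_score

# Made-up training data: (x, y, label).
data = [(1, 1, "YES"), (2, 3, "YES"), (3, 2, "YES"), (4, 4, "YES"),
        (7, 1, "NO"), (8, 3, "NO"), (9, 2, "NO"), (10, 4, "NO")]

ruleset, score = evolve(data)
print(f"training accuracy: {score:.2f}")
```

The score reported here is accuracy on the training data only, so it says nothing about how well the ruleset generalises to unseen examples.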

[Figure: iterated coverage illustrated step by step on the YES/NO scatter plot; X axis 0 to 12, Y axis 0 to 5]

Take each class in turn ..


Pick a random member of that class in the training set


Extend it as much as possible without including another class


Next class


And so on…
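The steps above can be sketched roughly as follows. The slides give only the outline, so the details are assumptions: rectangles grow in fixed steps of 0.25, growth stops at the plot bounds or when a point of another class would be included, and the process repeats per class until every training point of that class is covered. All names and the toy data are hypothetical.

```python
import random

BOUNDS = (0.0, 12.0, 0.0, 5.0)  # x_min, x_max, y_min, y_max of the plotted area

def inside(rect, x, y):
    x_min, x_max, y_min, y_max = rect
    return x_min <= x <= x_max and y_min <= y <= y_max

def grow(seed_point, others, step=0.25):
    """Extend a rectangle around seed_point until every side is blocked."""
    rect = [seed_point[0], seed_point[0], seed_point[1], seed_point[1]]
    moves = [(0, -step), (1, step), (2, -step), (3, step)]  # left, right, down, up
    progress = True
    while progress:
        progress = False
        for side, delta in moves:
            trial = list(rect)
            trial[side] += delta
            in_bounds = (BOUNDS[0] <= trial[0] and trial[1] <= BOUNDS[1]
                         and BOUNDS[2] <= trial[2] and trial[3] <= BOUNDS[3])
            # Accept the extension only if no point of another class is included.
            if in_bounds and not any(inside(trial, x, y) for x, y in others):
                rect = trial
                progress = True
    return tuple(rect)

def iterated_coverage(data, seed=0):
    """Take each class in turn and cover its points with grown rectangles."""
    rng = random.Random(seed)
    rules = []
    for cls in sorted({label for _, _, label in data}):
        others = [(x, y) for x, y, label in data if label != cls]
        uncovered = [(x, y) for x, y, label in data if label == cls]
        while uncovered:
            rect = grow(rng.choice(uncovered), others)  # random member of the class
            rules.append((rect, cls))
            uncovered = [p for p in uncovered if not inside(rect, *p)]
    return rules

# Made-up training data: (x, y, label).
data = [(1, 1, "YES"), (2, 3, "YES"), (3, 2, "YES"), (4, 4, "YES"),
        (7, 1, "NO"), (8, 3, "NO"), (9, 2, "NO"), (10, 4, "NO")]

rules = iterated_coverage(data)
for rect, cls in rules:
    print(cls, rect)
```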

CW3

• Run experiments with a program that evolves a ruleset

• Try different sizes of training and test set

• Observe ‘overfitting’ and report
