Data Mining (and machine learning)


Page 1: Data Mining (and machine learning)

Data Mining(and machine learning)

ROC curves

Rule InductionCW3

Page 2: Data Mining (and machine learning)

Two classes is a common and special case

Page 3: Data Mining (and machine learning)

Two classes is a common and special case

Medical applications: cancer, or not?
Computer Vision applications: landmine, or not?
Security applications: terrorist, or not?
Biotech applications: gene, or not?
… …

Page 4: Data Mining (and machine learning)

Two classes is a common and special case

Medical applications: cancer, or not?
Computer Vision applications: landmine, or not?
Security applications: terrorist, or not?
Biotech applications: gene, or not?
… …

            Predicted Y      Predicted N
Actually Y  True Positive    False Negative
Actually N  False Positive   True Negative

Page 5: Data Mining (and machine learning)

Two classes is a common and special case

True Positive: these are ideal. E.g. we correctly detect cancer

            Predicted Y      Predicted N
Actually Y  True Positive    False Negative
Actually N  False Positive   True Negative

Page 6: Data Mining (and machine learning)

Two classes is a common and special case

True Positive: these are ideal. E.g. we correctly detect cancer

False Positive: to be minimised – causes a false alarm – it can be better to be safe than sorry, but false alarms can be very costly.

            Predicted Y      Predicted N
Actually Y  True Positive    False Negative
Actually N  False Positive   True Negative

Page 7: Data Mining (and machine learning)

Two classes is a common and special case

True Positive: these are ideal. E.g. we correctly detect cancer

False Positive: to be minimised – causes a false alarm – it can be better to be safe than sorry, but false alarms can be very costly.

False Negative: also to be minimised – missing a landmine / cancer is very bad in many applications

            Predicted Y      Predicted N
Actually Y  True Positive    False Negative
Actually N  False Positive   True Negative

Page 8: Data Mining (and machine learning)

Two classes is a common and special case

True Positive: these are ideal. E.g. we correctly detect cancer

False Positive: to be minimised – causes a false alarm – it can be better to be safe than sorry, but false alarms can be very costly.

False Negative: also to be minimised – missing a landmine / cancer is very bad in many applications

True Negative?:

            Predicted Y      Predicted N
Actually Y  True Positive    False Negative
Actually N  False Positive   True Negative

Page 9: Data Mining (and machine learning)

Sensitivity and Specificity: common measures of accuracy in this kind of 2-class task

            Predicted Y      Predicted N
Actually Y  True Positive    False Negative
Actually N  False Positive   True Negative

Page 10: Data Mining (and machine learning)

Sensitivity and Specificity: common measures of accuracy in this kind of 2-class task

Sensitivity = TP/(TP+FN) - how many of the real ‘Yes’ cases are detected? How well can it detect the condition?

Specificity = TN/(FP+TN) - how many of the real ‘No’ cases are correctly classified? How well can it rule out the condition?

            Predicted Y      Predicted N
Actually Y  True Positive    False Negative
Actually N  False Positive   True Negative
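The two formulas can be written directly in code; a minimal sketch, where the confusion-matrix counts are made up for illustration:

```python
# A minimal sketch of sensitivity and specificity from confusion-matrix counts.
def sensitivity(tp, fn):
    # fraction of the real 'Yes' cases that are detected
    return tp / (tp + fn)

def specificity(tn, fp):
    # fraction of the real 'No' cases that are correctly classified
    return tn / (tn + fp)

tp, fn, fp, tn = 15, 1, 3, 9   # hypothetical counts, for illustration only
print(round(sensitivity(tp, fn), 3))  # 0.938
print(round(specificity(tn, fp), 2))  # 0.75
```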

Page 11: Data Mining (and machine learning)

[Figure: YES and NO cases separated by a decision line]

Page 12: Data Mining (and machine learning)

[Figure: YES and NO cases separated by a decision line]

Page 13: Data Mining (and machine learning)

[Figure: YES and NO cases separated by a decision line]

Sensitivity: 100%
Specificity: 25%

Page 14: Data Mining (and machine learning)

[Figure: YES and NO cases separated by a decision line]

Sensitivity: 93.8%
Specificity: 50%

Page 15: Data Mining (and machine learning)

[Figure: YES and NO cases separated by a decision line]

Sensitivity: 81.3%
Specificity: 83.3%

Page 16: Data Mining (and machine learning)

[Figure: YES and NO cases separated by a decision line]

Sensitivity: 56.3%
Specificity: 100%

Page 17: Data Mining (and machine learning)

[Figure: YES and NO cases separated by a decision line]

Sensitivity: 100%
Specificity: 25%

100% Sensitivity means: detects all cancer cases (or whatever) but possibly with many false positives

Page 18: Data Mining (and machine learning)

[Figure: YES and NO cases separated by a decision line]

Sensitivity: 56.3%
Specificity: 100%

100% Specificity means: misses some cancer cases (or whatever) but no false positives

Page 19: Data Mining (and machine learning)

Sensitivity and Specificity: common measures of accuracy in this kind of 2-class task

Sensitivity = TP/(TP+FN) - how many of the real TRUE cases are detected? How sensitive is the classifier to TRUE cases? A highly sensitive test for cancer: if it says “NO”, you can be sure it is “NO”.

Specificity = TN/(TN+FP) - how sensitive is the classifier to the negative cases? A highly specific test for cancer: if it says “Y”, you can be sure it is “Y”.

With many trained classifiers, you can ‘move the line’ in this way. E.g. with NB, we could use a threshold indicating how much higher the log likelihood for Y should be than for N.
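A minimal sketch of this thresholding idea; the log-likelihood values below are illustrative, not from any real classifier:

```python
# Sketch of 'moving the line' with a threshold t on the log-likelihood
# difference of a two-class classifier such as Naive Bayes.
def predict(loglik_yes, loglik_no, t=0.0):
    # predict 'Y' only if the log likelihood for Y beats that for N by more than t
    return 'Y' if loglik_yes - loglik_no > t else 'N'

# raising t trades sensitivity for specificity
print(predict(-2.0, -3.0, t=0.0))  # Y
print(predict(-2.0, -3.0, t=2.0))  # N
```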

Page 20: Data Mining (and machine learning)

ROC curves

David Corne and Nick Taylor, Heriot-Watt University - [email protected]
Slides and related resources: http://www.macs.hw.ac.uk/~dwcorne/Teaching/dmml.html
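An ROC curve is traced by sweeping the decision threshold and recording one (false positive rate, true positive rate) point per setting. A small sketch, assuming scores where higher means more likely ‘Y’, boolean labels, and both classes present:

```python
# Sketch: compute the ROC points for a list of scores and true labels.
def roc_points(scores, labels):
    # one (FPR, TPR) point per candidate threshold, swept over the scores
    points = []
    for t in sorted(set(scores)):
        tp = sum(s >= t and l for s, l in zip(scores, labels))
        fn = sum(s < t and l for s, l in zip(scores, labels))
        fp = sum(s >= t and not l for s, l in zip(scores, labels))
        tn = sum(s < t and not l for s, l in zip(scores, labels))
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points

scores = [0.9, 0.8, 0.3, 0.1]          # e.g. log-likelihood differences
labels = [True, True, False, False]
print(roc_points(scores, labels))  # [(1.0, 1.0), (0.5, 1.0), (0.0, 1.0), (0.0, 0.5)]
```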

Page 21: Data Mining (and machine learning)

Rule Induction

• Rules are useful when you want to learn a clear / interpretable classifier, and are less worried about squeezing out as much accuracy as possible
• There are a number of different ways to ‘learn’ rules or rulesets.
• Before we go there, what is a rule / ruleset?

Page 22: Data Mining (and machine learning)

Rules

IF Condition … Then Class Value is …

Page 23: Data Mining (and machine learning)

[Figure: YES and NO points plotted with X from 0 to 12 and Y from 0 to 5]

Rules are Rectangular

IF (X>0)&(X<5)&(Y>0.5)&(Y<5) THEN YES
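The rule on this slide can be checked directly as a predicate; a minimal sketch:

```python
# The slide's rule written as a predicate function.
def rule_yes(x, y):
    # IF (X>0)&(X<5)&(Y>0.5)&(Y<5) THEN YES
    return 0 < x < 5 and 0.5 < y < 5

print(rule_yes(2, 3))  # True  -> predict YES
print(rule_yes(7, 3))  # False -> this rule makes no prediction
```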

Page 24: Data Mining (and machine learning)

[Figure: YES and NO points plotted with X from 0 to 12 and Y from 0 to 5]

Rules are Rectangular

IF (X>5)&(X<11)&(Y>4.5)&(Y<5.1) THEN NO

Page 25: Data Mining (and machine learning)

A Ruleset

IF Condition1 … Then Class = A

IF Condition2 … Then Class = A

IF Condition3 … Then Class = B

IF Condition4 … Then Class = C

Page 26: Data Mining (and machine learning)

[Figure: YES and NO points plotted with X from 0 to 12 and Y from 0 to 5]

What’s wrong with this ruleset? (two things)

Page 27: Data Mining (and machine learning)

[Figure: YES and NO points plotted with X from 0 to 12 and Y from 0 to 5]

What about this ruleset?

Page 28: Data Mining (and machine learning)

Two ways to interpret a ruleset:

Page 29: Data Mining (and machine learning)

Two ways to interpret a ruleset:

As a Decision List

IF Condition1 … Then Class = A

ELSE IF Condition2 … Then Class = A

ELSE IF Condition3 … Then Class = B

ELSE IF Condition4 … Then Class = C

ELSE … predict Background Majority Class
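A sketch of the decision-list reading: rules are tried in order and the first matching condition wins. The conditions, classes, and default below are illustrative, not the slides' rules:

```python
# Evaluate a ruleset as a decision list (first matching rule wins).
def decision_list(rules, example, default='A'):
    for condition, cls in rules:
        if condition(example):
            return cls
    return default  # background majority class

rules = [
    (lambda e: e['x'] < 3, 'A'),
    (lambda e: e['x'] < 5, 'A'),
    (lambda e: e['y'] > 4, 'B'),
    (lambda e: e['y'] > 2, 'C'),
]
print(decision_list(rules, {'x': 4, 'y': 0}))  # A (second rule fires)
print(decision_list(rules, {'x': 9, 'y': 5}))  # B
print(decision_list(rules, {'x': 9, 'y': 0}))  # A (no rule fires, default)
```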

Page 30: Data Mining (and machine learning)

Two ways to interpret a ruleset:

As an unordered set

IF Condition1 … Then Class = A

IF Condition2 … Then Class = A

IF Condition3 … Then Class = B

IF Condition4 … Then Class = C

Check each rule and gather votes for each class

If no winner, predict background majority class
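A sketch of the unordered-set reading: every matching rule casts a vote for its class. Again the rules and default class are illustrative:

```python
from collections import Counter

# Evaluate a ruleset as an unordered set: gather votes, ties fall back
# to the background majority class.
def vote(rules, example, default='A'):
    votes = Counter(cls for condition, cls in rules if condition(example))
    if not votes:
        return default            # no rule fired
    top = votes.most_common()
    if len(top) > 1 and top[0][1] == top[1][1]:
        return default            # no winner: background majority class
    return top[0][0]

rules = [
    (lambda e: e['x'] < 5, 'A'),
    (lambda e: e['x'] < 8, 'A'),
    (lambda e: e['y'] > 2, 'B'),
]
print(vote(rules, {'x': 2, 'y': 0}))  # A (two votes for A)
print(vote(rules, {'x': 9, 'y': 9}))  # B (only the third rule fires)
```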

Page 31: Data Mining (and machine learning)

Three broad ways to learn rulesets

Page 32: Data Mining (and machine learning)

Three broad ways to learn rulesets

1. Just build a decision tree with ID3 (or something else) and you can translate the tree into rules!
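One way to see this: every root-to-leaf path of the tree becomes one rule. A sketch, assuming a simple nested-tuple tree representation (not any particular library's):

```python
# Turn a decision tree into rules: each root-to-leaf path is one rule.
# A node is either a class label (string) or (feature, threshold, left, right).
def tree_to_rules(node, conditions=()):
    if isinstance(node, str):          # leaf: a class label
        return [(conditions, node)]
    feature, threshold, left, right = node
    rules = []
    rules += tree_to_rules(left, conditions + ((feature, '<=', threshold),))
    rules += tree_to_rules(right, conditions + ((feature, '>', threshold),))
    return rules

tree = ('X', 5, ('Y', 2, 'NO', 'YES'), 'NO')
for conds, cls in tree_to_rules(tree):
    print('IF', ' & '.join(f'{f}{op}{t}' for f, op, t in conds), 'THEN', cls)
# IF X<=5 & Y<=2 THEN NO
# IF X<=5 & Y>2 THEN YES
# IF X>5 THEN NO
```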

Page 33: Data Mining (and machine learning)

Three broad ways to learn rulesets

2. Use any good search/optimisation algorithm. Evolutionary (genetic) algorithms are the most common. You will do this in coursework 3. This means simply guessing a ruleset at random, and then trying mutations and variants, gradually improving them over time.
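A toy sketch of this guess-then-mutate loop for a single interval rule on one feature; the fitness function and data are illustrative assumptions, not the CW3 program:

```python
import random

# Toy sketch: learn an interval rule lo < x < hi by random guess plus mutation.
def fitness(rule, data):
    lo, hi = rule
    # number of training examples the rule classifies correctly
    return sum((lo < x < hi) == label for x, label in data)

def evolve(data, steps=200, seed=0):
    rng = random.Random(seed)
    best = (rng.uniform(0, 6), rng.uniform(6, 12))  # random initial guess
    for _ in range(steps):
        lo, hi = best
        # mutate both interval ends a little; keep non-worsening variants
        cand = (lo + rng.gauss(0, 0.5), hi + rng.gauss(0, 0.5))
        if fitness(cand, data) >= fitness(best, data):
            best = cand
    return best

# illustrative data: x is the feature, label is True for 2 < x < 8
data = [(x, 2 < x < 8) for x in range(12)]
best = evolve(data)
```

With a real ruleset, the candidate would be a whole list of rules and mutation would also add, drop, or re-bound individual rules.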

Page 34: Data Mining (and machine learning)

Three broad ways to learn rulesets

3. A number of ‘old’ AI algorithms exist that still work well, and/or can be engineered to work with an evolutionary algorithm. The basic idea is: iterated coverage
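The 'extend as much as possible' step of iterated coverage can be sketched in one dimension: widen an interval around a seed example while it still contains no negative example. All names and the step size here are illustrative:

```python
# Sketch of one coverage step: grow a 1-D rule around a positive seed
# until widening it further would swallow a negative example.
def grow_rule(seed_x, negatives, step=0.5, limit=50):
    lo = hi = seed_x
    for _ in range(limit):
        grew = False
        if all(not (lo - step <= n <= hi) for n in negatives):
            lo -= step            # safe to extend the lower bound
            grew = True
        if all(not (lo <= n <= hi + step) for n in negatives):
            hi += step            # safe to extend the upper bound
            grew = True
        if not grew:
            break                 # blocked on both sides
    return lo, hi

print(grow_rule(4, [0, 9]))  # (0.5, 8.5)
```

The covered examples are then removed and the process repeats with a new seed, class by class, as the next slides illustrate.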

Page 35: Data Mining (and machine learning)

[Figure: YES and NO points plotted with X from 0 to 12 and Y from 0 to 5]

Take each class in turn ..

Page 36: Data Mining (and machine learning)

[Figure: YES and NO points plotted with X from 0 to 12 and Y from 0 to 5]

Pick a random member of that class in the training set

Page 37: Data Mining (and machine learning)

[Figure: YES and NO points plotted with X from 0 to 12 and Y from 0 to 5]

Extend it as much as possible without including another class

Page 38: Data Mining (and machine learning)

[Figure: YES and NO points plotted with X from 0 to 12 and Y from 0 to 5]

Extend it as much as possible without including another class

Page 39: Data Mining (and machine learning)

[Figure: YES and NO points plotted with X from 0 to 12 and Y from 0 to 5]

Extend it as much as possible without including another class

Page 40: Data Mining (and machine learning)

[Figure: YES and NO points plotted with X from 0 to 12 and Y from 0 to 5]

Extend it as much as possible without including another class

Page 41: Data Mining (and machine learning)

[Figure: YES and NO points plotted with X from 0 to 12 and Y from 0 to 5]

Next class

Page 42: Data Mining (and machine learning)

[Figure: YES and NO points plotted with X from 0 to 12 and Y from 0 to 5]

Next class

Page 43: Data Mining (and machine learning)

[Figure: YES and NO points plotted with X from 0 to 12 and Y from 0 to 5]

And so on…

Page 44: Data Mining (and machine learning)

CW3

• Run experiments with a program that evolves a ruleset
• Try different sizes of training and test set
• Observe ‘overfitting’ and report