Data Mining (and machine learning)
ROC curves
Rule Induction
CW3
Two classes is a common and special case
Medical applications: cancer, or not?
Computer Vision applications: landmine, or not?
Security applications: terrorist, or not?
Biotech applications: gene, or not?
… …
              Predicted Y      Predicted N
Actually Y    True Positive    False Negative
Actually N    False Positive   True Negative
True Positive: these are ideal, e.g. we correctly detect cancer.
False Positive: to be minimised, because it causes a false alarm. It can be better to be safe than sorry, but false alarms can be very costly.
False Negative: also to be minimised; missing a landmine or a cancer is very bad in many applications.
True Negative?
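As a small illustration (a sketch, not taken from the slides), the four cells of the table can be counted from paired lists of actual and predicted labels. The function name `confusion_counts` and the example labels are hypothetical; any label other than the positive one is treated as the negative class.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive="Y"):
    """Count TP, FN, FP, TN for a two-class problem.

    Any label other than `positive` is treated as the negative class.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter()
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            counts["TP"] += 1   # correctly detected
        elif a == positive:
            counts["FN"] += 1   # missed, e.g. an undetected cancer
        elif p == positive:
            counts["FP"] += 1   # false alarm
        else:
            counts["TN"] += 1   # correctly ruled out
    return {k: counts[k] for k in ("TP", "FN", "FP", "TN")}

# Hypothetical labels for seven cases
actual    = ["Y", "Y", "Y", "N", "N", "N", "N"]
predicted = ["Y", "Y", "N", "Y", "N", "N", "N"]
print(confusion_counts(actual, predicted))
# {'TP': 2, 'FN': 1, 'FP': 1, 'TN': 3}
```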
Sensitivity and Specificity: common measures of accuracy in this kind of 2-class task
Sensitivity = TP/(TP+FN): what fraction of the real ‘Yes’ cases are detected? How well can it detect the condition?
Specificity = TN/(FP+TN): what fraction of the real ‘No’ cases are correctly classified? How well can it rule out the condition?
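A quick way to check the two formulas is to plug in counts. The counts below are hypothetical: they assume 16 real YES and 12 real NO examples, which is consistent with every percentage on the following slides (for example 15/16 = 93.75% and 10/12 ≈ 83.3%), but the actual counts in the figures cannot be recovered from this transcript.

```python
def sensitivity(tp, fn):
    """Fraction of real 'Yes' cases detected: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of real 'No' cases correctly classified: TN / (TN + FP)."""
    return tn / (tn + fp)

# Hypothetical (TP, FN, TN, FP) counts, assuming 16 real YES and 12 real NO cases
settings = [(16, 0, 3, 9), (15, 1, 6, 6), (13, 3, 10, 2), (9, 7, 12, 0)]
for tp, fn, tn, fp in settings:
    print(f"Sensitivity: {sensitivity(tp, fn):.2%}  Specificity: {specificity(tn, fp):.2%}")
```

This prints 100.00%/25.00%, 93.75%/50.00%, 81.25%/83.33% and 56.25%/100.00%, which round to the one-decimal figures on the slides.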
[Figure: YES and NO examples separated by a decision line] Sensitivity: 100%, Specificity: 25%
[Figure: the decision line moved] Sensitivity: 93.8%, Specificity: 50%
[Figure: the decision line moved] Sensitivity: 81.3%, Specificity: 83.3%
[Figure: the decision line moved] Sensitivity: 56.3%, Specificity: 100%
[Figure repeated: Sensitivity: 100%, Specificity: 25%]
100% Sensitivity means: detects all cancer cases (or whatever), but possibly with many false positives.
[Figure repeated: Sensitivity: 56.3%, Specificity: 100%]
100% Specificity means: no false positives, but possibly misses some cancer cases (or whatever).
Sensitivity = TP/(TP+FN): what fraction of the real positive cases are detected? How sensitive is the classifier to positive cases? A highly sensitive test for cancer produces few false negatives, so if it says “NO”, you can be sure it’s “NO”.
Specificity = TN/(TN+FP): what fraction of the real negative cases are correctly rejected? How sensitive is the classifier to negative cases? A highly specific test for cancer produces few false positives, so if it says “Y”, you can be sure it’s “Y”.
With many trained classifiers, you can ‘move the line’ in this way. E.g. with Naive Bayes (NB), we could use a threshold indicating how much higher the log likelihood for Y should be than for N before we predict Y.
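A minimal sketch of ‘moving the line’ with a threshold, assuming we already have, for each test example, the difference log L(Y) - log L(N) from a trained NB model. The scores, labels and function names below are made up for illustration. Each threshold t gives one (sensitivity, specificity) pair; plotting sensitivity against 1 - specificity over all thresholds gives the ROC curve.

```python
def predict_with_threshold(score_diffs, t):
    """Predict 'Y' only when log L(Y) exceeds log L(N) by more than t."""
    return ["Y" if d > t else "N" for d in score_diffs]

def sens_spec(actual, predicted):
    """Return (sensitivity, specificity); assumes both classes occur in `actual`."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == "Y" and p == "Y" for a, p in pairs)
    fn = sum(a == "Y" and p == "N" for a, p in pairs)
    tn = sum(a == "N" and p == "N" for a, p in pairs)
    fp = sum(a == "N" and p == "Y" for a, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical log L(Y) - log L(N) values for ten test cases
score_diffs = [3.2, 2.5, 1.1, 0.4, -0.2, 0.9, -0.5, -1.3, -2.0, -2.8]
actual      = ["Y", "Y", "Y", "Y", "Y", "N", "N", "N", "N", "N"]

for t in [-3.0, -1.0, 0.0, 1.0, 3.0]:
    sens, spec = sens_spec(actual, predict_with_threshold(score_diffs, t))
    # (1 - spec, sens) is one point on the ROC curve
    print(f"t={t:+.1f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

A low threshold behaves like the 100% sensitivity / 25% specificity setting above; raising it trades sensitivity for specificity.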
ROC curves
David Corne and Nick Taylor, Heriot-Watt University. These slides and related resources: http://www.macs.hw.ac.uk/~dwcorne/Teaching/dmml.html
Rule Induction
• Rules are useful when you want to learn a clear / interpretable classifier, and are less worried about squeezing out as much accuracy as possible.
• There are a number of different ways to ‘learn’ rules or rulesets.
• Before we go there, what is a rule / ruleset?
Rules
IF Condition … Then Class Value is …
[Figure: YES and NO examples plotted against X (0 to 12) and Y (0 to 5)]
Rules are Rectangular
IF (X>0)&(X<5)&(Y>0.5)&(Y<5) THEN YES
IF (X>5)&(X<11)&(Y>4.5)&(Y<5.1) THEN NO
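Because each condition is a conjunction of interval tests on single attributes, a rule over two attributes is an axis-aligned rectangle. A minimal sketch, using the two rules above; the `Rule` class and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    """IF (X>x_min)&(X<x_max)&(Y>y_min)&(Y<y_max) THEN label."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    label: str

    def covers(self, x, y):
        # Strict inequalities, matching conditions like (X>0)&(X<5)
        return self.x_min < x < self.x_max and self.y_min < y < self.y_max

rule_yes = Rule(0, 5, 0.5, 5, "YES")     # IF (X>0)&(X<5)&(Y>0.5)&(Y<5) THEN YES
rule_no = Rule(5, 11, 4.5, 5.1, "NO")    # IF (X>5)&(X<11)&(Y>4.5)&(Y<5.1) THEN NO

print(rule_yes.covers(2, 3))   # True
print(rule_no.covers(2, 3))    # False
```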
A Ruleset
IF Condition1 … Then Class = A
IF Condition2 … Then Class = A
IF Condition3 … Then Class = B
IF Condition4 … Then Class = C
…
[Figure: a ruleset drawn over the YES and NO examples]
What’s wrong with this ruleset? (two things)
[Figure: another ruleset drawn over the YES and NO examples]
What about this ruleset?
Two ways to interpret a ruleset:
As a Decision List
IF Condition1 … Then Class = A
ELSE IF Condition2 … Then Class = A
ELSE IF Condition3 … Then Class = B
ELSE IF Condition4 … Then Class = C
…
ELSE … predict Background Majority Class
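A decision list can be sketched as "first matching rule wins". Here each rule is a (condition, class) pair, the conditions reuse the two rectangular rules from the earlier slides, and the training labels used to find the background majority class are hypothetical.

```python
from collections import Counter

def classify_decision_list(rules, example, default):
    """Return the class of the FIRST rule whose condition holds (order matters);
    if no rule fires, return the background majority class."""
    for condition, label in rules:
        if condition(example):
            return label
    return default

rules = [
    (lambda e: 0 < e["X"] < 5 and 0.5 < e["Y"] < 5, "YES"),
    (lambda e: 5 < e["X"] < 11 and 4.5 < e["Y"] < 5.1, "NO"),
]
training_labels = ["YES", "NO", "NO", "YES", "NO"]          # hypothetical
default = Counter(training_labels).most_common(1)[0][0]      # majority class: "NO"

print(classify_decision_list(rules, {"X": 2, "Y": 3}, default))    # YES (first rule fires)
print(classify_decision_list(rules, {"X": 12, "Y": 1}, default))   # NO (no rule fires)
```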
As an unordered set
IF Condition1 … Then Class = A
IF Condition2 … Then Class = A
IF Condition3 … Then Class = B
IF Condition4 … Then Class = C
Check each rule and gather votes for each class
If no winner, predict background majority class
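The unordered interpretation can be sketched as voting. How "no winner" is decided is not spelled out on the slide; this sketch assumes it covers both "no rule fires" and "a tie for first place". The rules and the default class "C" are hypothetical.

```python
from collections import Counter

def classify_by_votes(rules, example, default):
    """Every rule that fires casts one vote for its class.
    With no votes, or a tie for first place, fall back to `default`."""
    votes = Counter(label for condition, label in rules if condition(example))
    ranked = votes.most_common()
    if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return default
    return ranked[0][0]

# Hypothetical, deliberately overlapping rules
rules = [
    (lambda e: e["X"] < 6, "A"),
    (lambda e: e["Y"] > 2, "A"),
    (lambda e: e["X"] > 4, "B"),
]
print(classify_by_votes(rules, {"X": 5, "Y": 3}, "C"))   # A (two votes to one)
print(classify_by_votes(rules, {"X": 5, "Y": 1}, "C"))   # C (A and B tie)
print(classify_by_votes(rules, {"X": 8, "Y": 1}, "C"))   # B (only the third rule fires)
```

Unlike the decision list, the order of the rules makes no difference here.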
Three broad ways to learn rulesets
1. Just build a decision tree with ID3 (or something else) and you can translate the tree into rules!
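The translation works because every root-to-leaf path in the tree is a conjunction of attribute tests ending in a class, i.e. one IF ... THEN rule. A minimal sketch for an ID3-style tree over categorical attributes; the nested-tuple tree format and the example tree are hypothetical, not the output of a real ID3 run.

```python
def tree_to_rules(tree, conditions=()):
    """Turn each root-to-leaf path into one (conditions, class) rule.

    A tree is either a class label (a leaf) or a pair
    (attribute, {value: subtree, ...}) for a split on that attribute.
    """
    if isinstance(tree, str):                 # leaf: the path so far is one rule
        return [(list(conditions), tree)]
    attribute, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules += tree_to_rules(subtree, conditions + ((attribute, value),))
    return rules

# A small hypothetical tree
tree = ("Outlook", {
    "sunny": ("Humidity", {"high": "NO", "normal": "YES"}),
    "overcast": "YES",
    "rain": ("Windy", {"true": "NO", "false": "YES"}),
})

for conds, label in tree_to_rules(tree):
    body = " AND ".join(f"{a} = {v}" for a, v in conds)
    print(f"IF {body} THEN {label}")
```

The resulting rules never overlap, so they give the same answers whether read as a decision list or as an unordered set.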
2. Use any good search/optimisation algorithm. Evolutionary (genetic) algorithms are the most common; you will do this in Coursework 3. This means simply guessing a ruleset at random, then trying mutations and variants, gradually improving them over time.
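A minimal sketch of "guess at random, then mutate and keep improvements", written as a simple (1+1)-style evolutionary loop. This is not the CW3 program: the representation (each rule is a rectangle predicting YES, default NO), the Gaussian mutation of one boundary, the fitness (training accuracy) and the generated data are all assumptions made for illustration.

```python
import random

def covers(rect, point):
    """True if point (x, y) lies strictly inside rect = [x_min, x_max, y_min, y_max]."""
    x_min, x_max, y_min, y_max = rect
    x, y = point
    return x_min < x < x_max and y_min < y < y_max

def predict(ruleset, point):
    # Every rule predicts YES; if no rule fires, predict the default class NO
    return "YES" if any(covers(r, point) for r in ruleset) else "NO"

def accuracy(ruleset, data):
    return sum(predict(ruleset, p) == label for p, label in data) / len(data)

def random_rect(rng):
    # A random rectangle inside the 0..12 by 0..5 region used on the slides
    x_min, x_max = sorted(rng.uniform(0, 12) for _ in range(2))
    y_min, y_max = sorted(rng.uniform(0, 5) for _ in range(2))
    return [x_min, x_max, y_min, y_max]

def mutate(ruleset, rng):
    child = [list(r) for r in ruleset]            # copy so the parent is unchanged
    rect = rng.choice(child)
    rect[rng.randrange(4)] += rng.gauss(0, 0.5)   # nudge one boundary
    rect[0], rect[1] = sorted(rect[:2])           # keep min <= max
    rect[2], rect[3] = sorted(rect[2:])
    return child

def evolve(data, n_rules=2, generations=2000, seed=0):
    rng = random.Random(seed)
    best = [random_rect(rng) for _ in range(n_rules)]   # random initial guess
    best_fit = accuracy(best, data)
    for _ in range(generations):
        child = mutate(best, rng)
        child_fit = accuracy(child, data)
        if child_fit >= best_fit:                 # keep variants that are no worse
            best, best_fit = child, child_fit
    return best, best_fit

# Hypothetical training data: YES inside one box, NO elsewhere
data_rng = random.Random(1)
data = []
for _ in range(200):
    p = (data_rng.uniform(0, 12), data_rng.uniform(0, 5))
    data.append((p, "YES" if 2 < p[0] < 6 and 1 < p[1] < 4 else "NO"))

ruleset, fit = evolve(data)
print(f"training accuracy: {fit:.3f}")
```

Fitness here is measured only on the training data, which is exactly the setting in which overfitting can show up when the ruleset is later evaluated on a separate test set.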
3. A number of ‘old’ AI algorithms exist that still work well and/or can be engineered to work with an evolutionary algorithm. The basic idea is iterated coverage:
[Figure sequence: YES and NO examples plotted against X (0 to 12) and Y (0 to 5), illustrating each step]
Take each class in turn…
Pick a random member of that class in the training set
Extend it as much as possible without including another class
Next class
And so on…
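The steps above can be sketched for two numeric attributes as follows. Several details are assumptions rather than anything stated on the slides: boxes use closed intervals, each edge is pushed outward one data coordinate at a time and kept only if no example of another class falls inside, and "and so on" is read as repeating until every member of the class is covered. All function names and the data are hypothetical.

```python
import random

def inside(box, point):
    """Closed-interval test: box = [x_min, x_max, y_min, y_max]."""
    x_min, x_max, y_min, y_max = box
    x, y = point
    return x_min <= x <= x_max and y_min <= y <= y_max

def next_value(values, current, downward):
    """Next distinct coordinate beyond `current` in sorted `values`, or None."""
    if downward:
        smaller = [v for v in values if v < current]
        return smaller[-1] if smaller else None
    larger = [v for v in values if v > current]
    return larger[0] if larger else None

def grow_box(start, data, target):
    """Extend a box from `start` as far as possible without including another class."""
    xs = sorted({p[0] for p, _ in data})
    ys = sorted({p[1] for p, _ in data})
    others = [p for p, label in data if label != target]
    box = [start[0], start[0], start[1], start[1]]    # begin as a single point
    # (box index, candidate coordinates, move this edge downward?)
    sides = [(0, xs, True), (1, xs, False), (2, ys, True), (3, ys, False)]
    changed = True
    while changed:
        changed = False
        for i, values, downward in sides:
            v = next_value(values, box[i], downward)
            if v is None:
                continue
            trial = list(box)
            trial[i] = v
            if not any(inside(trial, q) for q in others):
                box = trial
                changed = True
    return box

def iterated_coverage(data, seed=0):
    rng = random.Random(seed)
    rules = []
    for target in sorted({label for _, label in data}):   # take each class in turn
        uncovered = [p for p, label in data if label == target]
        while uncovered:
            start = rng.choice(uncovered)                  # random member of the class
            box = grow_box(start, data, target)
            rules.append((box, target))
            uncovered = [p for p in uncovered if not inside(box, p)]
    return rules

# Hypothetical training set on the same X/Y ranges as the slides
data = [
    ((1, 1), "YES"), ((2, 3), "YES"), ((3, 2), "YES"), ((9, 4), "YES"),
    ((6, 1), "NO"), ((7, 3), "NO"), ((8, 2), "NO"), ((4, 4.5), "NO"),
]
for box, label in iterated_coverage(data):
    print(f"IF {box[0]} <= X <= {box[1]} AND {box[2]} <= Y <= {box[3]} THEN {label}")
```

Each new box always contains its starting example, so every pass covers at least one more example and the loop terminates.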
CW3
• Run experiments with a program that evolves a ruleset
• Try different sizes of training and test set
• Observe ‘overfitting’ and report