
Pattern Recognition and Machine Learning

Lucy Kuncheva

School of Computer Science, Bangor University
mas00a@bangor.ac.uk

Part 2


Pattern Recognition – DIY using WEKA


The weka (also known as Maori hen or woodhen) (Gallirallus australis) is a flightless bird species of the rail family. It is endemic to New Zealand, where four subspecies are recognized. Weka are sturdy brown birds, about the size of a chicken. As omnivores, they feed mainly on invertebrates and fruit.

http://en.wikipedia.org/wiki/Weka


WEKA: http://www.cs.waikato.ac.nz/ml/weka/

“WEKA is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. WEKA contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.”


WEKA: and we will be using only the hammer...


Data set:

[Figure: the data set as a matrix. The rows are OBJECTS, numbered 1, 2, 3, ..., N; the columns are FEATURES (attributes, variables, covariates, ...), numbered 1, 2, 3, ..., n. Highlighted in the figure are object #3 (a row) and feature #2 (a column).]

PROBLEM: your data sets are of the WIDE type, with a small number of objects and a large number of features.


WEKA


Prepare the .arff file:

1. Open the data file in an ASCII editor.
2. Add rows:
   • @RELATION one_word
   • @ATTRIBUTE name NUMERIC ... for all features
   • @ATTRIBUTE class {1,2} ... for the class variable
   • @DATA
3. Paste the data underneath.
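For illustration, a minimal .arff file in this format might look as follows (the relation name, feature names, and values here are made up):

@RELATION talent_data

@ATTRIBUTE height_standing_cm NUMERIC
@ATTRIBUTE grip_test_right NUMERIC
@ATTRIBUTE class {1,2}

@DATA
162.4,31.0,1
150.2,24.5,2
171.0,33.2,1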


Feature selection

(b) Feature subsets

2 questions:
1. How do we select the subsets?
2. How do we evaluate the worth of a subset?


Feature selection

(b) Feature subsets

2 questions:
1. How do we select the subsets? (Not our problem now.)
2. How do we evaluate the worth of a subset?
   • Wrapper: the classification accuracy itself.
   • Filter: some easier-to-calculate proxy for the classification accuracy.
   • Embedded: the selection is built into the classifier (e.g., a decision tree classifier or SVM).


Feature selection

(b) Feature subsets

2 questions:
1. How do we select the subsets?
   • Ranker
   • Greedy: Sequential Forward Selection (SFS)
   • Random
   • Heuristic search: Genetic Algorithms (GA), Swarm optimisation
   • Bespoke
2. How do we evaluate the worth of a subset? (Wrapper, Filter, or Embedded, as above.)



Feature selection methods

FCBF (Fast Correlation-Based Filter), originally proposed for microarray data analysis (Yu and Liu, 2003). The idea of FCBF is that the features worth keeping should be correlated with the class variable but not correlated among themselves.

WEKA: CfsSubsetEval

1. L. Yu and H. Liu (2003), Feature selection for high-dimensional data: A fast correlation-based filter solution, Proceedings of the 20th International Conference on Machine Learning (ICML-03).
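To make the relevance-versus-redundancy idea concrete, here is a small MATLAB sketch (using corr from the Statistics Toolbox) that scores each feature by its correlation with the class minus its average correlation with the other features. This only illustrates the idea on toy data; the actual FCBF algorithm uses symmetrical uncertainty, an entropy-based measure, together with a specific redundancy-elimination procedure.

% Illustration of the FCBF idea on toy data (not the real algorithm).
% X is an N-by-n data matrix, L an N-by-1 numeric class label.
X = randn(20, 4);                        % 20 objects, 4 features
L = [ones(10,1); 2*ones(10,1)];          % two classes

n = size(X, 2);
relevance = abs(corr(X, L));             % correlation of each feature with the class
R = abs(corr(X));                        % pairwise feature correlations
redundancy = (sum(R, 2) - 1) / (n - 1);  % mean correlation with the other features

score = relevance - redundancy;          % high = relevant but not redundant
[~, ranking] = sort(score, 'descend');
disp(ranking')                           % feature indices, best first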


Feature selection methods

Relief-F (Kira and Rendell, 1992; Kononenko et al., 1997).

For each object in the data set, find the nearest neighbour from the same class (NearHit) and the nearest neighbour from the opposite class (NearMiss), using all features. The relevance score of a feature increases if the feature value of the current object is closer to that of the NearHit than to that of the NearMiss; otherwise, the relevance score of the feature decreases.

WEKA: ReliefFAttributeEval

1. K. Kira and L. Rendell (1992), The feature selection problem: Traditional methods and a new algorithm, AAAI-92 Proceedings.

2. I. Kononenko et al. (1997), Overcoming the myopia of inductive learning algorithms with RELIEFF, Applied Intelligence, 7(1), pp. 39-55.
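A minimal MATLAB sketch of the basic two-class Relief scoring loop described above, on toy data (assumes MATLAB R2016b or later for implicit expansion). The full Relief-F algorithm additionally averages over k nearest neighbours and handles multiple classes and missing values.

% Basic Relief relevance scores (two classes, one NearHit/NearMiss each).
% X: N-by-n data matrix; L: N-by-1 labels with values 1 and 2.
X = randn(20, 3); L = [ones(10,1); 2*ones(10,1)];   % toy data

[N, n] = size(X);
w = zeros(1, n);                         % relevance scores
for i = 1:N
    d = sum((X - X(i,:)).^2, 2);         % squared distances to object i
    d(i) = inf;                          % exclude the object itself
    same = (L == L(i));
    dHit = d;  dHit(~same) = inf;        % candidates for NearHit
    dMiss = d; dMiss(same) = inf;        % candidates for NearMiss
    [~, hit] = min(dHit);
    [~, miss] = min(dMiss);
    % increase the score of features closer to the NearHit than to the NearMiss
    w = w + abs(X(i,:) - X(miss,:)) - abs(X(i,:) - X(hit,:));
end
w = w / N;                               % average over all objects
disp(w)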


Feature selection methods

Relief-F

[Figure: a scatterplot on two features, x and y, showing the current object, its NearHit, and its NearMiss. Along x the current object is closer to the NearHit, so the relevance score for x increases; along y it is closer to the NearMiss, so the relevance score for y decreases.]


Feature selection methods

SVM. This classifier builds a linear function that separates the classes. The hyperplane is calculated so as to maximise the distance to the nearest points. The absolute values of the coefficients in front of the features can be interpreted as “importance”.

SVM-RFE. RFE stands for “Recursive Feature Elimination” (Guyon et al., 2006). Starting with an SVM on the entire feature set, a fraction of the features with the lowest weights is dropped. A new SVM is trained with the remaining features, and the set is subsequently reduced in the same way. The procedure stops when a set of the desired cardinality is reached. While SVM-RFE has been found to be extremely useful for wide data such as functional magnetic resonance imaging (fMRI) data (De Martino et al., 2008), it was discovered that the RFE step is not always needed (Abeel et al., 2010; Geurts et al., 2005).

WEKA: SVMAttributeEval
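A rough MATLAB sketch of the RFE loop, assuming the Statistics and Machine Learning Toolbox function fitcsvm for the linear SVM. This illustrates the procedure on toy data; it is not WEKA's implementation.

% SVM-RFE sketch: repeatedly train a linear SVM and drop the feature
% with the smallest absolute weight until `target` features remain.
X = randn(20, 6); L = [ones(10,1); 2*ones(10,1)];   % toy data
target = 3;

remaining = 1:size(X, 2);                % indices of surviving features
while numel(remaining) > target
    mdl = fitcsvm(X(:, remaining), L, 'KernelFunction', 'linear');
    w = abs(mdl.Beta);                   % hyperplane coefficients = "importance"
    [~, worst] = min(w);
    remaining(worst) = [];               % eliminate the weakest feature
end
disp(remaining)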


Feature selection methods

[Screenshot: the options of SVMAttributeEval in WEKA. SVM-RFE: eliminate one feature at each iteration (the default). Plain SVM ranking: set this value to 0.]


Feature selection methods

SVM (for this example, both SVM and SVM-RFE give the same result):

Ranked attributes:
 6  2 GRIP_TEST_Right
 5  5 HEIGHT_Standing_cm
 4  1 GRIP_TEST_Left
 3  4 HEIGHT_Seated_cm
 2  3 WEIGHT_Kg
 1  6 ARM_SPAN_cm

FCBF:

Selected attributes: 1,2,5 : 3
 GRIP_TEST_Left
 GRIP_TEST_Right
 HEIGHT_Standing_cm

Relief-F:

Ranked attributes:
 0.07863  2 GRIP_TEST_Right
 0.07549  5 HEIGHT_Standing_cm
 0.05528  4 HEIGHT_Seated_cm
 0.05414  1 GRIP_TEST_Left
 0.03172  3 WEIGHT_Kg
 0.00797  6 ARM_SPAN_cm


Feature selection methods


PROBLEM: while these results (above) are (probably) curious, there is no statistical significance we can attach to them...


Time for a coffee-break


Feature selection methods

Permutation test

Feature of interest: X

Class label variable: Y (say, G/N)

Let XG be the sample from class G, and XN the sample from class N.

A two-sample t-test can be used to test the hypothesis of equal means when XG and XN come from approximately normal distributions.

If we cannot ascertain this condition, use a PERMUTATION test.

Quantity of interest:

V = | mXG - mXN | (the absolute difference between the two means)

Observed value for our data: V*.

Question: what is the probability of observing V* (or a larger value) if there were no relationship between X and the class label Y?

X     Y
4.3   G
2.1   N
1.8   G
2.3   G
3.2   N
...   ...
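To make this concrete, here is a small MATLAB illustration using just the five objects shown above. It computes the observed value V* and one permuted value V; in practice thousands of permutations are drawn, as in the algorithm later in the deck.

X = [4.3 2.1 1.8 2.3 3.2]';              % the five feature values above
Y = {'G';'N';'G';'G';'N'};               % their class labels
g = strcmp(Y,'G');
V_star = abs(mean(X(g)) - mean(X(~g)))   % observed: |2.8 - 2.65| = 0.15
gp = g(randperm(numel(g)));              % one random permutation of the labels
V = abs(mean(X(gp)) - mean(X(~gp)))      % one value for the null histogram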


Feature selection methods

Permutation test

[Figure: histogram of V = abs(mean1 - mean2) over the permuted labels for the feature “1. ANTHRO - HEIGHT - Standing (cm)” (x-axis: abs(mean1 - mean2); y-axis: # occurrences). The observed value V* lies far in the right tail of the histogram. p-value = 0.0046.]

Very small chance to obtain the observed V* or larger.


Feature selection methods

Permutation test

p-value   feature
0.0046    1. ANTHRO - HEIGHT - Standing (cm)
0.0058    1. ANTHRO - HEIGHT - Seated (cm)
0.0077    1. ANTHRO - GRIP TEST Right
0.0100    2.1 DT PACE BOWL - Average MPH
0.0123    1. ANTHRO - WEIGHT (Kg)
0.0159    1. ANTHRO - GRIP TEST - Left
0.0193    1. ANTHRO - ARM SPAN (cm)
0.0266    2.1 DT PACE BOWL - max MPH
0.0319    8.1 FT - SPRINT (40m)
0.0489    8.1 FT - SPRINT (30m)


The Dead Salmon

Neuroscientist Craig Bennett purchased a whole Atlantic salmon, took it to a lab at Dartmouth, and put it into an fMRI machine used to study the brain. The beautiful fish was to be the lab’s test object as they worked out some new methods.

So, as the fish sat in the scanner, they showed it “a series of photographs depicting human individuals in social situations.” To maintain the rigor of the protocol (and perhaps because it was hilarious), the salmon, just like a human test subject, “was asked to determine what emotion the individual in the photo must have been experiencing.”

Lo and behold! Brain activity responding to the stimuli!


The Bonferroni correction for multiple comparisons is the simplest and most conservative method to control the familywise error rate.

If we increase the number of hypotheses in a test, we also increase the likelihood of witnessing a rare event, and therefore of declaring a difference when there is none. (With n = 50 independent tests at significance level 0.05, the chance of at least one false positive is 1 - 0.95^50 ≈ 0.92.)

So, if the desired significance level for the whole family of n tests should be (at most) α, then the Bonferroni correction would test each individual hypothesis at a significance level of α/n.

In our case, we have n = 50 features, so each test uses significance level 0.05/50 = 0.001.
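As a quick check, a few lines of MATLAB comparing the permutation-test p-values from the table above with the corrected threshold:

% Bonferroni check: compare each p-value with alpha/n.
p = [0.0046 0.0058 0.0077 0.0100 0.0123 ...
     0.0159 0.0193 0.0266 0.0319 0.0489];
alpha = 0.05; n = 50;
any(p < alpha/n)    % returns 0 (false): no feature survives the correction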


Feature selection methods

Permutation test: the p-value table above, revisited.

PROBLEM: none of the features survives the Bonferroni correction (each would need p < 0.001 for a familywise significance level of 0.05).


Feature selection methods

Permutation test

More PROBLEMs

1. If there are permutation tests in WEKA, they are hidden very well...

2. If there is a Bonferroni correction in WEKA, it is hidden very well too...

Solution?

DIY...


Feature selection methods

Permutation test

Here is an algorithm for those of you with some programming experience. (The null hypothesis is “no difference”, hence V = 0; assume that the greater V is, the larger the difference.)

1. Calculate the observed value V*. Choose the number of iterations, e.g., T = 10,000.
2. for i = 1:T
   a) Permute the labels randomly.
   b) Calculate and store V(i) with the permuted labels.
3. end (for)
4. Calculate the p-value as the proportion of V greater than or equal to V*.
5. If you do this experiment for n features, compare p with alpha/n, where alpha is your chosen significance level (typically alpha = 0.05).


Feature selection methods

Permutation test

And here is a MATLAB script

% Permutation test (assume that there are no missing values)
clear, close, clc

X = xlsread('ECB U13 2010 talent testing data.xlsx',...
    'U13 Talent Test Raw Data','G2:L27');
[~,Y] = xlsread('ECB U13 2010 talent testing data.xlsx',...
    'U13 Talent Test Raw Data','F2:F27');   % symbolic label
[~,Names] = xlsread('ECB U13 2010 talent testing data.xlsx',...
    'U13 Talent Test Raw Data','G1:L1');    % feature names

% Convert Y to numbers (1 selected, 2 not selected)
u = unique(Y);
L = ones(size(Y));
L(strcmp(u(1),Y)) = 2;

T = 20000;   % number of permutations

continues on the next slide


Feature selection methods

Permutation test

And here is a MATLAB script continued from previous slide...

for i = 1:T
    la = L(randperm(numel(L)));   % randomly permuted labels
    for j = 1:size(X,2)           % for each feature
        fe = X(:,j);
        V(i,j) = abs(mean(fe(la == 1)) - mean(fe(la == 2)));
    end
end

% p-values for the features
for j = 1:size(X,2)
    V_star(j) = abs(mean(X(L == 1,j)) - mean(X(L == 2,j)));
    p(j) = mean(V(:,j) >= V_star(j));   % proportion of V >= V* (step 4)
    fprintf('%35s %.4f\n', Names{j}, p(j))
end


Feature selection methods

Permutation test

MATLAB output

1. ANTHRO - GRIP TEST - Left 0.0176

1. ANTHRO - GRIP TEST Right 0.0087

1. ANTHRO-WEIGHT (Kg) 0.0124

1. ANTHRO- HEIGHT - Seated (cm) 0.0059

1. ANTHRO- HEIGHT - Standing (cm) 0.0055

1. ANTHRO - ARM SPAN (cm) 0.0202

The numbers may vary slightly from one run to the next because of the random number generator; the larger the number of iterations T, the smaller this variation.

These p-values are not Bonferroni-corrected. The correction should be applied if necessary.


Time for a coffee-break


Time for our classifiers!!!

The Classification tab

[Screenshot: WEKA's Classify tab.]

• Choose a classifier (SVM).
• Choose a training/testing protocol.
• When ready (all chosen), click Start.


Where to find the results

[Screenshot: the classifier output, highlighting the confusion matrix.]


Where to find the results

[Screenshot: the classifier output, highlighting the classification accuracy (and classification error).]
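For a two-class problem, the classification accuracy is the sum of the diagonal of the confusion matrix divided by the total number of objects, and the classification error is its complement. A hypothetical example in MATLAB:

% Hypothetical 2-class confusion matrix (rows: true class, columns: predicted class)
C = [14 3; 2 7];
accuracy = sum(diag(C)) / sum(C(:))   % (14 + 7) / 26 ≈ 0.8077
error = 1 - accuracy                  % ≈ 0.1923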


And a lot, lot more ...


Thank you!