
Testing Predictive Performance of Ecological Niche Models

A. Townsend Peterson, STOLEN FROM Richard Pearson

Niche Model Validation

• Diverse challenges
  – Not a single loss function or optimality criterion
  – Different uses demand different criteria
  – In particular, relative weights applied to omission and commission errors in evaluating models
• Nakamura: "which way is relevant to adopt is not a mathematical question, but rather a question for the user"
  – Asymmetric loss functions

Where do I get testing data????

Model calibration and evaluation strategies: resubstitution

[Figure: 100% of all available data is used for both calibration and evaluation; the calibrated model may be projected to the same region, a different region, a different time, or a different resolution.]

(after Araújo et al. 2005 Gl. Ch. Biol.)

Model calibration and evaluation strategies: independent validation

[Figure: 100% of all available data is used for calibration; evaluation uses independent data from the same region, a different region, a different time, or a different resolution, and the calibrated model is projected accordingly.]

(after Araújo et al. 2005 Gl. Ch. Biol.)

Model calibration and evaluation strategies: data splitting

[Figure: the available data are split into calibration data and test data (a 70%/30% split); the model is calibrated on one portion, evaluated on the other, and may be projected to the same region, a different region, a different time, or a different resolution.]

(after Araújo et al. 2005 Gl. Ch. Biol.)
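A minimal sketch of the data-splitting strategy with hypothetical occurrence records (a random 70% calibration / 30% evaluation partition, the usual convention, is assumed here; the slide itself only shows a 70/30 split):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical occurrence records: one (longitude, latitude) pair per record.
records = rng.uniform(low=[-10.0, 35.0], high=[5.0, 45.0], size=(200, 2))

# Shuffle indices and split: 70% for calibration, 30% for evaluation.
idx = rng.permutation(len(records))
n_cal = int(0.7 * len(records))
calibration = records[idx[:n_cal]]
evaluation = records[idx[n_cal:]]

print(f"calibration: {len(calibration)} records, evaluation: {len(evaluation)} records")
```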

Types of Error

The four types of results that are possible when testing a distribution model (see Pearson NCEP module 2007):

Presence-absence confusion matrix

                      Recorded present       Recorded (or assumed) absent
Predicted present     a (true positive)      b (false positive)
Predicted absent      c (false negative)     d (true negative)
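To make the table concrete, here is a minimal sketch (function and variable names are mine, not from the slides) of tallying the four counts from continuous model output and presence/absence records at a chosen decision threshold:

```python
import numpy as np

def confusion_counts(scores, observed, threshold):
    """Return (a, b, c, d): true positives, false positives,
    false negatives, true negatives at a given decision threshold."""
    predicted = np.asarray(scores) >= threshold
    observed = np.asarray(observed).astype(bool)
    a = int(np.sum(predicted & observed))    # predicted present, recorded present
    b = int(np.sum(predicted & ~observed))   # predicted present, recorded/assumed absent
    c = int(np.sum(~predicted & observed))   # predicted absent, recorded present
    d = int(np.sum(~predicted & ~observed))  # predicted absent, recorded/assumed absent
    return a, b, c, d

# Example with toy data:
scores = np.array([0.9, 0.8, 0.4, 0.2, 0.7, 0.1])
observed = np.array([1, 1, 1, 0, 0, 0])
print(confusion_counts(scores, observed, threshold=0.5))  # (2, 1, 1, 2)
```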

Thresholding

Selecting a decision threshold (p/a data)

(Liu et al. 2005 Ecography 28:385-393)

[Figure: Kappa (y-axis, 0-0.7) plotted against decision threshold (x-axis, 0-1).]
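As an illustration of picking the threshold that maximizes Kappa from such a curve, a minimal sketch with hypothetical inputs (Kappa is computed from the confusion-matrix counts using the formula given later in this deck):

```python
import numpy as np

def kappa(a, b, c, d):
    """Cohen's Kappa from the confusion-matrix counts."""
    n = a + b + c + d
    chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n
    if n == chance:  # degenerate case: only one class observed/predicted
        return 0.0
    return ((a + d) - chance) / (n - chance)

def best_kappa_threshold(scores, observed, thresholds=np.linspace(0.0, 1.0, 101)):
    """Scan candidate thresholds and return the (threshold, kappa) maximizing Kappa."""
    scores = np.asarray(scores, dtype=float)
    observed = np.asarray(observed).astype(bool)
    best_t, best_k = None, -np.inf
    for t in thresholds:
        predicted = scores >= t
        a = np.sum(predicted & observed)
        b = np.sum(predicted & ~observed)
        c = np.sum(~predicted & observed)
        d = np.sum(~predicted & ~observed)
        k = kappa(a, b, c, d)
        if k > best_k:
            best_t, best_k = t, k
    return best_t, best_k

# Example with hypothetical predictions and presence/absence observations:
scores = np.array([0.95, 0.85, 0.70, 0.60, 0.40, 0.35, 0.20, 0.10])
observed = np.array([1, 1, 1, 0, 1, 0, 0, 0])
print(best_kappa_threshold(scores, observed))
```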

Omission (proportion of presences predicted absent): c/(a + c)

Commission (proportion of absences predicted present): b/(b + d)

[Figure: omission rate (y-axis, 0-1) plotted against threshold (x-axis, 0-100), with candidate thresholds labelled LPT and T10.]

Selecting a decision threshold (p-o data)
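If the "LPT" and "T10" labels in the figure above denote the lowest presence threshold and the threshold omitting the lowest 10% of training presences (my reading, not stated on the slide), they can be computed from the training-presence predictions alone, which suits presence-only data:

```python
import numpy as np

def lpt_threshold(presence_scores):
    """Lowest presence threshold: the smallest predicted value at any
    training presence, so no training presence is omitted."""
    return float(np.min(presence_scores))

def t10_threshold(presence_scores):
    """Threshold at the 10th percentile of training-presence predictions,
    omitting roughly the lowest 10% of training presences."""
    return float(np.percentile(presence_scores, 10))

# Example with hypothetical training-presence predictions:
presence_scores = np.array([0.92, 0.81, 0.77, 0.65, 0.58, 0.44, 0.31, 0.12])
print(lpt_threshold(presence_scores), t10_threshold(presence_scores))
```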

Threshold-dependent Tests (= loss functions)

Presence-absence test statistics

(using the confusion matrix above)

Proportion (%) correctly predicted (or ‘accuracy’, or ‘correct classification rate’):

(a + d)/(a + b + c + d)

Cohen’s Kappa:

κ = [(a + d) - ((a + b)(a + c) + (c + d)(b + d))/n] / [n - ((a + b)(a + c) + (c + d)(b + d))/n], where n = a + b + c + d
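As a worked example with hypothetical counts: for a = 18, b = 4, c = 2, d = 76 (n = 100), accuracy = (18 + 76)/100 = 0.94; the chance-agreement term is ((18 + 4)(18 + 2) + (2 + 76)(4 + 76))/100 = (440 + 6240)/100 = 66.8, so κ = (94 - 66.8)/(100 - 66.8) ≈ 0.82.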


Proportion of observed presences correctly predicted (or ‘sensitivity’, or ‘true positive fraction’):

a/(a + c)

Presence-only test statistics

Proportion of observed presences correctly predicted (or ‘sensitivity’, or ‘true positive fraction’):

a/(a + c)

Proportion of observed presences incorrectly predicted (or ‘omission rate’, or ‘false negative fraction’):

c/(a + c)


Presence-only test statistics: testing for statistical significance

Example: leaf-tailed gecko (Uroplatus), U. sikorae

[Figure: two predicted distributions for U. sikorae]

Prediction 1: success rate 4 from 7; proportion predicted present: 0.231; binomial p = 0.0546
Prediction 2: success rate 6 from 7; proportion predicted present: 0.339; binomial p = 0.008
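These p-values come from a one-tailed binomial test: with n test presences, of which k fall in the area predicted present, and proportion p of the region predicted present, the p-value is the probability of k or more successes by chance. A minimal sketch in plain Python (small differences from the slide's values reflect the rounded proportions shown above):

```python
from math import comb

def binomial_p(k, n, p):
    """One-tailed binomial test: probability of k or more of n test presences
    falling in the predicted-present area if points landed there at random
    with probability p (the proportion of the region predicted present)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(binomial_p(4, 7, 0.231))  # ~0.054  (slide: p = 0.0546)
print(binomial_p(6, 7, 0.339))  # ~0.0075 (slide: p = 0.008)
```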

Absence-only test statistics

Proportion of observed (or assumed) absences correctly predicted (or ‘specificity’, or ‘true negative fraction’):

d/(b + d)

Proportion of observed (or assumed) absences incorrectly predicted (or ‘commission rate’, or ‘false positive fraction’):

b/(b + d)


AUC: a threshold-independent test statistic


sensitivity = a/(a + c) = 1 - omission rate

specificity = d/(b + d); 1 - specificity = fraction of absences predicted present

[Figure: frequency histograms of predicted probability of occurrence for the set of 'absences' and the set of 'presences', and the corresponding ROC curve plotting sensitivity (y-axis) against 1 - specificity (x-axis).]

Threshold-independent assessment: The Receiver Operating Characteristic (ROC) Curve


(check out: http://www.anaesthetist.com/mnm/stats/roc/Findex.htm)
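To close the loop on the ROC material, here is a minimal sketch of computing AUC directly from the two sets of predicted values (the pairwise/rank formulation, which equals the area under the ROC curve; the data below are hypothetical):

```python
import numpy as np

def auc(presence_scores, absence_scores):
    """AUC as the probability that a randomly chosen presence receives a higher
    predicted value than a randomly chosen absence (ties count 0.5)."""
    pres = np.asarray(presence_scores, dtype=float)
    absn = np.asarray(absence_scores, dtype=float)
    # Compare every presence score with every absence score.
    greater = (pres[:, None] > absn[None, :]).sum()
    ties = (pres[:, None] == absn[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pres) * len(absn))

# Example with hypothetical predicted probabilities of occurrence:
presences = [0.9, 0.8, 0.75, 0.6, 0.55]
absences = [0.7, 0.5, 0.4, 0.3, 0.2]
print(auc(presences, absences))  # 0.92
```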