Emilie Henderson, Janet Ohmann, Matthew Gregory, Heather Roberts and Harold Zald

Post on 07-Jan-2016


All for one or One for All? Mapping many species individually vs. simultaneously with random forest. Emilie Henderson, Janet Ohmann, Matthew Gregory, Heather Roberts and Harold Zald. August 10, 2012. Ecological Society of America Annual Meeting, Portland, Oregon.


All for one or One for All?

Mapping many species individually vs. simultaneously with random forest.

Emilie Henderson, Janet Ohmann, Matthew Gregory, Heather Roberts and Harold Zald

August 10, 2012

Ecological Society of America Annual Meeting

Portland, Oregon

Species Distribution Modeling

• Been around for a long time, and has exploded over the last decade.

“With the rise of new powerful statistical techniques and GIS tools, the development of predictive habitat distribution models has rapidly increased in ecology.”

– Guisan and Zimmermann 2000

• Generalized Linear/Additive Models
• Neural networks
• Bayesian models
• Ordination
• Classification methods

• Web of Knowledge: ‘species distribution’
– 2000 – 2001: 556 articles
– 2011 – 2012: 1,389 articles

SDM Uses

From Guisan and Thuiller 2005

Strategies for community-level modeling

• ‘assemble first, predict later’

• ‘predict first, assemble later’

• ‘assemble and predict together’

--Ferrier & Guisan 2006

Objective: Compare two strategies for community-level predictive mapping.

You Are Here

Pacific silver fir (Abies amabilis)
Grand fir / White fir (Abies grandis / concolor)
Subalpine fir (Abies lasiocarpa)
Noble fir / Shasta red fir (Abies procera / shastensis)
Bigleaf maple (Acer macrophyllum)
Red alder (Alnus rubra)
Madrone (Arbutus menziesii)
Incense cedar (Calocedrus decurrens)
Mountain mahogany (Cercocarpus ledifolius)
Giant chinkapin (Chrysolepis chrysophylla)
Pacific dogwood (Cornus nuttallii)
Oregon ash (Fraxinus latifolia)
Western juniper (Juniperus occidentalis)
No Trees Present
Lodgepole pine (Pinus contorta)
Engelmann spruce (Picea engelmannii)
Jeffrey pine (Pinus jeffreyi)
Sugar pine (Pinus lambertiana)
Western white pine (Pinus monticola)
Ponderosa pine (Pinus ponderosa)
Black cottonwood (Populus balsamifera ssp. trichocarpa)
Bitter cherry (Prunus emarginata)
Douglas-fir (Pseudotsuga menziesii)
Oregon white oak (Quercus garryana)
California black oak (Quercus kelloggii)
Pacific yew (Taxus brevifolia)
Western red cedar (Thuja plicata)
Western hemlock (Tsuga heterophylla)
Mountain hemlock (Tsuga mertensiana)

Plot Data

Forest Inventory and Analysis Annual Plots: 1948 plots

Techniques – Random Forest Based (Breiman 2001, Cutler et al. 2007)

Binary prediction (R package: randomForest, Liaw & Wiener 2002)

Continuous prediction

Nearest Neighbor Imputation (R package: yaImpute, Crookston & Finley 2008)

Spatial Data Layers

Climate (from PRISM climate data)

Soil Parent Material (from SSURGO/Soil Resources Inventory)

Topography (from National Elevation Dataset)

Spectral reflectance (LANDSAT)

[Figure: six example classification trees from a random forest. Splits use the variables SMRTP, SMRTMP, ANNTMP, TC1, TC3 and DEM; terminal nodes are labeled TRUE or FALSE.]

# True / # Trees = 4/6 ≈ 0.67

For RF regression, the predicted value for a pixel is the average of the predictions from all trees.
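The two aggregation rules above can be illustrated with a minimal Python sketch. The function names, the vote list, and the per-tree basal-area values are hypothetical and chosen only for illustration; the talk itself used the R package randomForest.

```python
from statistics import mean

def rf_vote_fraction(tree_votes):
    """Fraction of trees that vote TRUE (species present) for one pixel."""
    return sum(1 for vote in tree_votes if vote) / len(tree_votes)

def rf_regression_prediction(tree_predictions):
    """Regression forest output for one pixel: the mean of the per-tree predictions."""
    return mean(tree_predictions)

# Hypothetical votes from six trees for one pixel: four TRUE, two FALSE.
votes = [True, True, False, True, True, False]
fraction = rf_vote_fraction(votes)

# Hypothetical per-tree basal-area predictions for the same pixel.
basal_area_by_tree = [12.0, 15.0, 9.0, 14.0, 10.0, 12.0]
predicted_basal_area = rf_regression_prediction(basal_area_by_tree)

print(round(fraction, 2), predicted_basal_area)
```

With four of six trees voting TRUE, the vote fraction rounds to 0.67, which matches the slide's 4/6 example.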

Random forest -- Nearest-Neighbor imputation

Imputation = Filling in missing values from existing values.

Methods: k-NN

[Figure: feature space (axes: Elevation, Rainfall) shown beside geographic space (study area).]

(1) Place plots within feature space

(2) Place new pixel within feature space

(3) Find nearest-neighbor plot within feature space

(4) Impute nearest neighbor’s Plot ID # to pixel
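The four k-NN steps can be sketched in Python for the simplest case, k = 1 with Euclidean distance. The plot IDs, coordinates, and the function name `impute_plot_id` are hypothetical, and real applications would normally rescale the feature axes first; this is not the yaImpute implementation.

```python
import math

def impute_plot_id(pixel, plots):
    """Return the ID of the plot nearest to the pixel in feature space (k = 1)."""
    return min(plots, key=lambda plot_id: math.dist(pixel, plots[plot_id]))

# Step 1: plots placed in feature space as (elevation in m, rainfall in mm); made-up values.
plots = {
    1: (200.0, 2500.0),
    2: (900.0, 1200.0),
    3: (1500.0, 600.0),
}

# Step 2: a new pixel placed in the same feature space.
pixel = (1000.0, 1100.0)

# Steps 3 and 4: find the nearest plot and assign its ID to the pixel.
nearest_plot_id = impute_plot_id(pixel, plots)
print(nearest_plot_id)
```

Because the pixel receives a plot ID rather than a single value, every attribute measured on that plot (all species at once) can be mapped from the one lookup.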

“Assemble and Predict Together”

Methods: GNN (Ohmann and Gregory 2002)

[Figure: gradient space shown beside geographic space (study area). CCA Axis 1 (e.g., Rainfall, local topography); CCA Axis 2 (e.g., Temperature, Elevation).]

(1) Conduct gradient analysis of plot data

(2) Calculate axis scores of pixel from mapped data layers

(3) Find nearest-neighbor plot in gradient space

(4) Impute nearest neighbor’s Plot ID # to pixel

Methods: Random Forest Nearest Neighbor Imputation

[Figure: Random Forest space (the six example trees, with plot ID numbers shown in their terminal nodes) beside geographic space (study area).]

Nearest Neighbor Plot: #3

Second Nearest Neighbor: #5
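One way to read "Random Forest space" is sketched below in Python. It assumes the distance used by yaImpute's randomForest method: one minus the proportion of trees in which the target pixel and a reference plot land in the same terminal node. The leaf indices are invented so that the ranking mirrors the slide (plot 3 nearest, plot 5 second); they are not taken from the talk's data.

```python
def rf_distance(target_leaves, reference_leaves):
    """1 minus the share of trees where target and reference fall in the same terminal node."""
    shared = sum(t == r for t, r in zip(target_leaves, reference_leaves))
    return 1.0 - shared / len(target_leaves)

def rank_neighbors(target_leaves, plot_leaves):
    """Plot IDs ordered from nearest to farthest in random forest space."""
    return sorted(plot_leaves, key=lambda pid: rf_distance(target_leaves, plot_leaves[pid]))

# Hypothetical terminal-node index reached in each of six trees.
pixel_leaves = [2, 1, 3, 2, 1, 2]
plot_leaves = {
    3: [2, 1, 3, 2, 4, 2],  # shares a node with the pixel in 5 of 6 trees
    5: [2, 1, 1, 2, 4, 3],  # shares a node in 3 of 6 trees
    7: [4, 3, 1, 1, 2, 3],  # shares a node in 0 of 6 trees
}

ranking = rank_neighbors(pixel_leaves, plot_leaves)
print(ranking)
```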

Strategies for community-level modeling

• ‘assemble first, predict later’

• ‘predict first, assemble later’
– Random forest – classification (binary prediction)
– Random forest – regression (continuous prediction)

• ‘assemble and predict together’
– Random forest – imputation (continuous prediction)

--Ferrier & Guisan 2006

Dimensions of Map Accuracy

• Single-species metrics
– Range – presence/absence
– Abundance – How much basal area?
– Is the distribution of values predicted realistic?

• Community-level metrics
– Diversity
– Composition

[Figure: three panels, each showing Sensitivity, Specificity and TSS on a 0.0 to 1.0 axis.]

Sensitivity: True positives/(True Positives + False Negatives)

Specificity: True Negatives/(True Negatives + False Positives)

True Skill Statistic (TSS): Sensitivity + Specificity - 1
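The three definitions above translate directly into code. The following Python sketch uses made-up confusion-matrix counts; the function names are illustrative only.

```python
def sensitivity(true_pos, false_neg):
    """True positives / (true positives + false negatives)."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """True negatives / (true negatives + false positives)."""
    return true_neg / (true_neg + false_pos)

def true_skill_statistic(true_pos, false_neg, true_neg, false_pos):
    """TSS = sensitivity + specificity - 1."""
    return sensitivity(true_pos, false_neg) + specificity(true_neg, false_pos) - 1

# Hypothetical counts for one species over 100 plots.
sens = sensitivity(true_pos=40, false_neg=10)
spec = specificity(true_neg=45, false_pos=5)
tss = true_skill_statistic(true_pos=40, false_neg=10, true_neg=45, false_pos=5)
print(sens, spec, round(tss, 2))
```

TSS ranges from -1 to 1: a value of 1 means perfect agreement, and a value near 0 means the map does no better than chance.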

[Figure: cumulative distribution of values for RF_C, RFNN and Plot Data. x-axis: Value (0 to 150); y-axis: Cumulative % of dataset (0.4 to 1.0). Root Mean Square Difference values shown: 17.72 and 18.46.]

[Figure: three panels, each showing Sensitivity, Specificity and TSS on a 0.0 to 1.0 axis.]

[Figure: cumulative distribution of values for RF_C, RFNN and Plot Data. x-axis: Value (0 to 200); y-axis: Cumulative % of dataset (0.4 to 1.0). Root Mean Square Difference values shown: 21.34 and 18.73.]

Single Species Models

• Range
– Random Forest – Binary: best
– Random Forest – Nearest Neighbor: acceptable
– Random Forest – Continuous: fail

• Abundance (Basal Area)
– RMSD
  • Random Forest – Continuous: best
  • Random Forest – Nearest Neighbor: acceptable
  • Random Forest – Binary: NA
– Empirical Cumulative Distribution Functions (predicted value distributions)
  • Random Forest – Nearest Neighbor: best
  • Random Forest – Continuous: fail
  • Random Forest – Binary: NA
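The two abundance metrics named above measure different things, which a small Python sketch can make concrete. The observed and predicted basal-area values are invented for illustration, and the function names are hypothetical.

```python
import math

def rmsd(observed, predicted):
    """Root mean square difference between paired observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def ecdf(values, x):
    """Empirical cumulative distribution: fraction of values less than or equal to x."""
    return sum(1 for v in values if v <= x) / len(values)

# Hypothetical basal areas on four plots and the matching predictions.
observed = [0.0, 0.0, 10.0, 40.0]
predicted = [2.0, 4.0, 12.0, 36.0]

error = rmsd(observed, predicted)
share_zero_observed = ecdf(observed, 0.0)
share_zero_predicted = ecdf(predicted, 0.0)
print(round(error, 3), share_zero_observed, share_zero_predicted)
```

In this made-up data the RMSD is small, yet half of the observed values are zero while none of the predicted values are, so the two cumulative curves already disagree at zero. A method can therefore rank well on RMSD and still predict an unrealistic distribution of values.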

Diversity: Species Richness and Evenness

[Figure: Alpha diversity (axis 0 to 20) for Observations, RF_B, RF_C and RFNN_C.]

[Figure: Shannon diversity (axis 0.0 to 1.5) for Observations, RF_C and RFNN_C.]
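As a minimal Python sketch of the two diversity measures on this slide, assume that alpha diversity means species richness (the count of species present) and that Shannon diversity uses the natural logarithm. Both are assumptions, since the slide does not state them, and the abundance vectors are invented.

```python
import math

def species_richness(abundances):
    """Number of species with abundance greater than zero."""
    return sum(1 for a in abundances if a > 0)

def shannon_diversity(abundances):
    """Shannon index H = -sum(p * ln p) over species present; 0.0 for an empty plot."""
    total = sum(abundances)
    if total == 0:
        return 0.0
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)

# Hypothetical basal areas for four species on two plots.
even_plot = [10.0, 10.0, 0.0, 0.0]    # two species, equally abundant
uneven_plot = [19.0, 1.0, 0.0, 0.0]   # two species, one dominant

print(species_richness(even_plot), round(shannon_diversity(even_plot), 3))
print(species_richness(uneven_plot), round(shannon_diversity(uneven_plot), 3))
```

Both plots have the same richness, but the evenly shared plot has the higher Shannon value. Shannon diversity needs abundances, which is consistent with RF_B (binary prediction) being absent from the Shannon panel.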

Beta Diversity

[Figure: Beta diversity (axis 0 to 12) for Observations, RF_B, RF_C and RFNN_C.]

1 1 3 5 6

1 1 3 5 4

2 2 3 5 4

2 2 3 4 4

2 2 3 4 4

Average Alpha Diversity for Blue Pixel: 3.04
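The stated average can be checked against the 5 × 5 grid with a short Python sketch. It assumes each grid cell holds the alpha diversity of one pixel in a window centered on the blue pixel; the slide does not say this explicitly, but the arithmetic mean of the 25 values reproduces the stated 3.04.

```python
# The 5 x 5 window of per-pixel values copied from the slide.
window = [
    [1, 1, 3, 5, 6],
    [1, 1, 3, 5, 4],
    [2, 2, 3, 5, 4],
    [2, 2, 3, 4, 4],
    [2, 2, 3, 4, 4],
]

# Flatten the window and take the arithmetic mean of all 25 cells.
cell_values = [value for row in window for value in row]
average_alpha = sum(cell_values) / len(cell_values)
print(average_alpha)
```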


Results – Composition

[Figure: Bray-Curtis, Binary (axis 0.0 to 0.6) for RF_B, RF_C and RFNN_B.]

[Figure: Bray-Curtis, Continuous (axis 0.0 to 0.4) for RF_C and RFNN_C.]

What is the Bray-Curtis distance between our observed and predicted communities?
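For reference, the Bray-Curtis distance between an observed and a predicted community can be sketched in Python as follows. The abundance vectors are invented, and the function name is illustrative; the binary case simply uses 1 for presence and 0 for absence.

```python
def bray_curtis(observed, predicted):
    """Bray-Curtis distance: sum of |o - p| divided by sum of (o + p); 0.0 if both are empty."""
    numerator = sum(abs(o - p) for o, p in zip(observed, predicted))
    denominator = sum(o + p for o, p in zip(observed, predicted))
    return numerator / denominator if denominator else 0.0

# Continuous case: hypothetical basal areas for three species on one plot.
continuous_distance = bray_curtis([10.0, 0.0, 5.0], [6.0, 4.0, 5.0])

# Binary case: the same communities reduced to presence (1) and absence (0).
binary_distance = bray_curtis([1, 0, 1], [1, 1, 1])

print(round(continuous_distance, 3), binary_distance)
```

A distance of 0 means the predicted community matches the observed one exactly, and 1 means the two share no species.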

Discussion

• Species absences are an important dimension of composition
– Disturbance?
– Succession?
– Competition/Facilitation?
– Dispersal limitations?

• Community assembly rules can be used to help refine mapped species lists (e.g., Guisan and Rahbek, 2011).

• But… imputation avoids the pitfalls & complications of re-assembling communities after mapping because they are never taken apart.

Conclusions

• Practical Considerations:
– Models of individual species may be
  • Strongest in one dimension
  • Useful for understanding species’ ecology
  • The best option for some types of available data (e.g., presence-only data from museum specimens)
– Nearest Neighbor mapping is a useful tool for building multipurpose maps.
  • Ranges and abundances
  • Composition
  • Diversity

Acknowledgements

• Nationwide Forest Imputation Study

• Landscape Ecology, Modeling, Mapping and Analysis team in Corvallis.