Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Automated Predictive Mapping: Some thoughts based on my experiences R. A. (Bob) MacMillan LandMapper Environmental Solutions Inc. Presented to ABMI Workshop: Feb 28, 2014


Page 1: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Automated Predictive Mapping:

Some thoughts based on my experiences

R. A. (Bob) MacMillan

LandMapper Environmental Solutions Inc.

Presented to ABMI Workshop: Feb 28, 2014

Page 2: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Outline

Theoretical and Conceptual Considerations

Practical and Operational Considerations

Page 3: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Theoretical & Conceptual Considerations

• The Importance of having a Theoretical Framework for Predictive Mapping

– Universal Model of Variation

– Z(x) = Z*(x) + ε’(x) + ε’

• What is Optimization and Why is it Important?

– Why all the fuss about optimization?

– Implications of optimization

• Conventional vs. Pedometric Mapping and Modelling

– Why do I now favour pedometric mapping?

• What is Truth?

– What do we want and need to map?

– How can we measure how well we mapped what we wanted?

– How “close” is what we predicted to what we observe?

Page 4: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Practical & Operational Considerations

• Variable Selection – finding the most effective predictors

– Exploratory data analysis

• Discovering structure in the predictor data layers

• Multi-scale analysis to find the most effective resolutions

• Harmonization of disparate site observations

• Model Selection – finding the most effective model(s)

– Ensemble models and optimizing multiple models

• Polygon Disaggregation

– Making the most of available conventional polygon maps

• Rigour and Structure in Predictive Mapping

– Take time to do things right

– Reproducibility and automated work flows

• Open data, open platforms, open software, collaboration

Page 5: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

The Importance of Having a Strong

Theoretical Foundation

Existing Empirical Models of Variation

Universal Model of Spatial Variation

A Unifying Framework for all Forms of

Spatial Prediction

Page 6: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Theoretical Basis for Current Manual or “Conventional” Soil or Ecological Mapping

– Jenny (1941)

• CLORPT (Note no N=space)

– Simonson (1959)

• Process Model of additions, removals, translocations, transformations

– Ruhe (1975)

• Erosional-Depositional surfaces, open/closed basins

– Dalrymple et al. (1968)

• Nine unit hill slope model

– Milne (1936a, 1936b)

• Catena concept, toposequences

Page 7: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

The Soil-Landscape Model as the

Fundamental Basis for Soil Mapping

Page 8: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Ecological Theory

• J.S. Rowe (1996)

– Landforms, with their vegetation, modify and

shape their coincident climates over all scales

• Earth surface energy-moisture regimes at all scales /sizes are

the dynamic driving variables of functional ecosystems at all

scales/sizes

• Climatic regimes are primarily interpreted from visible terrain

features known to be linked to the regimes of radiation and

moisture (viz. landform and vegetation)

Page 9: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

• J.S. Rowe (1996)

– All fundamental variations in landscape ecosystems

can initially (in primary succession) be attributed to

variations in landforms as they modify climate

• Boundaries between potential ecosystems can be mapped

to coincide with changes in those landform characteristics

known to regulate the reception and retention of energy and

water

Ecological Theory

Page 10: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Universal Model of Spatial Variation: A Unifying Scientific Theory

– Waldo Tobler (1970)

• First law of geography

– Everything is related to everything else, but near things are more related than distant things

– Matheron (1971)

• Theory of regionalized variables

– Webster and Cuanalo (1975)

• Clay, silt, pH, CaCO3, colour value, and stoniness on transect

– Burgess and Webster (1980a, b)

• Soil property maps by kriging

• Universal kriging (drift) of EC

Page 11: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Universal Model of Spatial Variation: A Unifying Scientific Theory

– Applies to ALL spatial prediction

• Matheron (1971)

• Adapted to predict soil variation

– Burgess and Webster (1980a, b)

– Webster and Burrough (1980)

– Burrough (1986)

– Webster and McBratney (1987)

– Oliver (1989)

Source: Oliver, 1989

Page 12: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Universal Model of Soil Variation

• A Unifying Framework for Predictive Mapping

Z(s) = Z*(s) + ε’(s) + ε’

– Z(s): predicted soil type or soil property value. The predicted spatial pattern of some soil property or class, including the uncertainty of the estimate.

– Z*(s): deterministic part of the predictive model. The part of the variation that is predictable by means of some statistical or heuristic soil-landscape model.

– ε’(s): stochastic part of the predictive model. The part of the variation that shows spatial structure and can be modelled with a variogram.

– ε’: pure noise part of the predictive model. The part of the variation that can’t be predicted at the current scale with the available data and models.

Source: Burrough, 1986 eq. 8.14

Page 13: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Deterministic Part of Prediction Model:

Z*(s)

• Conceptual Models

– Conceptual or mental soil-landscape models

– Produce area-class maps

• Statistical Models

– Scorpan – relate soils/soil properties to covariates

– Explain spatial distribution of soils in terms of known soil forming factors as represented by covariates

[Figure: soil-landscape cross-section showing the EOR, DYD, KLM, FMN and COR soil series, with depth marks at 15, 40 and 60]

[Figure: weighted overlay on a 100 x 100 m grid. Individual salinity hazard ratings for each layer (landscape curvature, vegetation, rainfall, geology, soils; land surface) are multiplied by layer weightings of 1x to 3x and summed into a total salinity hazard rating, giving a salinity hazard map]

Page 14: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Stochastic Part of Prediction Model: ε’(s)

Optimal Interpolation by Kriging

– Collect point sample observations: irregular spatial distribution (of observed point values)

– Compute semi-variance at different lag distances

– Fit semi-variogram to lag data

– Estimate values and error at fixed grid locations

[Figure: irregularly spaced point observations (values 5 to 8) plotted on x and y axes, and the resulting 4 x 4 grid of estimated values (5.2 to 7.6)]
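The second step of the kriging workflow on this slide (compute semi-variance at different lag distances) can be illustrated with a short sketch. It is a minimal, standard-library version of the classical estimator, half the mean squared difference between pairs of points in each distance bin. The function name, the binning scheme and the sample coordinates are assumptions made for illustration, not taken from the presentation.

```python
import math
from collections import defaultdict


def empirical_semivariogram(points, lag_width, n_lags):
    """Return [(mean pair distance, semi-variance, pair count)] per lag bin.

    points    : list of (x, y, z) observations
    lag_width : width of each distance bin (same units as x and y)
    n_lags    : number of bins; pairs farther apart are ignored
    """
    sq_diff = defaultdict(float)   # sum of squared value differences per bin
    dist = defaultdict(float)      # sum of pair distances per bin
    count = defaultdict(int)       # number of pairs per bin
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            h = math.hypot(xi - xj, yi - yj)
            k = int(h // lag_width)
            if k >= n_lags:
                continue
            sq_diff[k] += (zi - zj) ** 2
            dist[k] += h
            count[k] += 1
    # Semi-variance is half the mean squared difference within each bin.
    return [(dist[k] / count[k], sq_diff[k] / (2 * count[k]), count[k])
            for k in sorted(count)]


# Hypothetical observations (x, y, value); the coordinates are invented.
obs = [(0, 0, 6.1), (1, 0, 5.7), (2, 0, 5.3),
       (0, 1, 7.0), (1, 1, 6.5), (2, 1, 6.0)]
for lag, gamma, n in empirical_semivariogram(obs, lag_width=1.0, n_lags=3):
    print(f"lag {lag:.2f}: semi-variance {gamma:.3f} from {n} pairs")
```

A model semi-variogram would then be fitted to these lag values before estimating values and errors on the grid.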

Page 15: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Stochastic Part of Prediction Model: ε’(s)

• Geostatistical Estimation

– Predict soil properties

• Point or block kriging

– Predict soil classes

• Indicator kriging (0/1)

• Soil class likelihood values

– Predict error of estimate

• Correct Deterministic Part

– Error in deterministic part is computed (residuals)

– If structure exists in error then krige error & subtract
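The idea of correcting the deterministic part with its spatially structured error can be sketched as follows. This is only an illustration under stated assumptions: the deterministic part is a one-covariate least-squares line, and inverse distance weighting (IDW) is swapped in for kriging of the residuals to keep the example short. Residuals are defined here as observed minus predicted, so the interpolated residual is added back; that is equivalent to subtracting an error defined as predicted minus observed. All function names and data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def idw(points, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y); stand-in for kriging."""
    num = den = 0.0
    for px, py, value in points:
        d = math.hypot(px - x, py - y)
        if d == 0:
            return value          # exact at sample locations
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def predict_with_residual_correction(samples, x, y, covariate):
    """samples: list of (x, y, covariate, observed value)."""
    a, b = fit_line([s[2] for s in samples], [s[3] for s in samples])
    # Residual = observed minus deterministic prediction at each sample.
    residuals = [(sx, sy, obs - (a + b * cov)) for sx, sy, cov, obs in samples]
    # Deterministic trend plus interpolated residual.
    return a + b * covariate + idw(residuals, x, y)


# Hypothetical samples: (x, y, covariate such as slope, observed property)
samples = [(0, 0, 1.0, 2.5), (1, 0, 2.0, 3.0), (0, 1, 3.0, 6.5)]
print(predict_with_residual_correction(samples, 0.5, 0.5, covariate=2.0))
```

In a real workflow the residuals would first be checked for spatial structure with a semi-variogram, and kriged only if structure exists.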

Page 16: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Pure Noise Part of Prediction Model: ε’

• Some Variation not Predictable

– Have to be honest about this

• Should quantify and report it

• Deterministic Prediction

– Mental and Statistical Models

• Not perfect – often lack suitable covariates to predict target variable

• Lack covariates at finer resolution

• Geostatistical Prediction

– Insufficient point input data

• Can’t predict at less than the smallest spacing of input point data

[Figure: semi-variogram plotting semi-variance against lag (distance) at lags d1 to d4, with the nugget, sill and range marked]
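The nugget, sill and range labelled on the semi-variogram can be made concrete with a model function. The sketch below uses the spherical model, one common choice; the slide does not say which model was used, so the model and the parameter values are assumptions. The nugget is the semi-variance at vanishingly small lags (the unpredictable pure noise), the sill is the plateau, and the range is the lag at which the plateau is reached.

```python
def spherical_variogram(h, nugget, sill, range_):
    """Spherical semi-variogram model.

    h      : lag distance
    nugget : semi-variance as h approaches zero (pure noise)
    sill   : total semi-variance reached at and beyond the range
    range_ : lag distance beyond which observations are uncorrelated
    """
    if h <= 0:
        return 0.0
    if h >= range_:
        return sill
    r = h / range_
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


# Hypothetical parameters: nugget 0.1, sill 1.0, range 50 m
for lag in (0, 10, 25, 50, 100):
    print(lag, round(spherical_variogram(lag, 0.1, 1.0, 50.0), 4))
```

A large nugget relative to the sill signals that much of the variation cannot be predicted from the available point data at their current spacing.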

Page 17: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

What is Optimization and Why is it

Important?

Why all the fuss about optimization?

Page 18: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

The Importance of Optimization in the Universal Model of Variation

• Minimize Variance

– Between observed and predicted values or classes

• Maximize Accuracy

– Discrete polygon surfaces

– Continuous value surfaces

Source: Zhu et al., 1997
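Both goals on this slide reduce to an objective score computed from observed and predicted values. A minimal sketch follows: root mean squared error for continuous surfaces and overall accuracy for classes, used to pick the best of several candidate models. The function names, the candidate models and the numbers are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def overall_accuracy(observed_classes, predicted_classes):
    """Fraction of locations where the predicted class equals the observed."""
    hits = sum(1 for o, p in zip(observed_classes, predicted_classes) if o == p)
    return hits / len(observed_classes)


# Hypothetical validation data and two candidate models
observed = [12.0, 15.0, 9.0, 20.0]
candidates = {
    "model_a": [11.0, 16.0, 10.0, 18.0],
    "model_b": [14.0, 12.0, 9.0, 25.0],
}
best = min(candidates, key=lambda name: rmse(observed, candidates[name]))
print("best fit:", best)
```

Optimization in this sense is only possible when the mapping procedure can be re-run and scored, which is the argument developed in the following slides.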

Page 19: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Conventional vs. Pedometric Mapping

Implications relative to optimization

Implications relative to other sciences

Page 20: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Conventional vs. Pedometric Mapping

• Conventional Mapping

– Subjective and empirical

• Only tries to implement the deterministic part of the UMSV (Universal Model of Spatial Variation)

– Impossible to replicate

• Too many possible outcomes

– Impossible to automate

• Too many manual decisions

– Impossible to optimize

• Unlimited number of

possible realizations

• Can’t maximize fit to data

Page 21: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Conventional vs. Pedometric Mapping

• Pedometric Mapping

– Objective and systematic

• Implements both parts

Deterministic & Stochastic

– Completely reproducible

• Models can be re-run

– Can be automated

• Fully automated workflows

– Can be optimized

• Multiple models can be run

to identify best fit to data

• Objective ID of best fit

Page 22: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Rationale Against Conventional Mapping

• Essentially a Religion

– Based on belief and faith

• Subjective assessments

– Can only map static features

• Can’t handle time, management

– Can’t objectively optimize

• Impossible to guarantee fit of

predictions to observed values

– Very difficult to update

• Takes years to re-map an area

• Can’t reuse/recycle existing data

EOR Series DYD Series KLM Series FMN Series

15

40

60

COR Series

Page 23: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Rationale for Pedometric Mapping

• Time to Become a Science

– Use objective, evidence-based

input data and models

• Like meteorology, hydrology

– Use multiple-model runs

• To reduce & quantify error

– Quantify fit and error

• Best possible fit of predictions

to observed values (optimized)

– Allow for real-time update

• Continuous improvement

• Model dynamic properties

Page 24: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

What is Truth?

What do we want/need to map?

How can we know if we have mapped it?

How “close” is what we predict to what

we observe?

Page 25: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

What Do We Want/Need to Map?

• Conceptual Classes of Objects

– Ecological Class Objects

• Defined by attributes that describe

their environmental settings

• Often presented on maps as map unit

collections of classes within polygons

– Soil Class Objects (Series, MUs)

• Defined by observable profile

attributes which are taken to reflect

the controlling environment

• Often presented on maps as map unit

collections of classes within polygons

EOR Series DYD Series KLM Series FMN Series

15

40

60

COR Series

Page 26: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Environmental Gradients Are Used to

Define Ecological Classes

• Ecological Field Guides

– Record and present

knowledge using

• Text descriptions

• Ecological Keys

• Edatopic Grids

• Landscape Profiles

• Summary Tables

Source: Steen and Coupé (1997)

Page 27: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Implications of Mapping Classed Objects

• Key Assumptions

– Classes can be recognized

consistently and correctly

• Most sites are not proto-

typical of a single class,

show partial membership

– Class attributes and

definitions do not overlap

(mutually exclusive)

– All key properties of

interest co-vary spatially

with all classes

• Problems and Challenges

– Often impossible to map a

single uniform class

• Usually have map units with a

collection of 2 or more classes

– Classes do not change

abruptly at boundaries

• Often it is just the proportions

of classes that change

– Ambiguity and error

• Same class in different settings

• Different classes in same setting

Page 28: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

How Do We Use Area-Class Maps?

• General Planning

– General planning and

decision making often at a

regional level

• Need information on

extent or area of given

classes and their attributes

• Estimates of proportions

of classes within defined

areas is usually “good

enough”

• Not adversely affected by

error if classes are “close”

• Operational Planning

– Usually applies to some

minimum sized area under

intensive management

• Ideally desire knowledge of

exact classes at exact locations

• But often can’t manage to

exact site level so proportions

of classes within a minimum

size management area is often

sufficient

• Not adversely affected by

error if classes are “close”

Page 29: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

What Do We Really Need from Area-

Class Maps?

• Exact Class Match?

– Exact match of classes at

exact locations

• Observed class exactly

matches prototypical class

definition at all locations

• In a perfect world this is

the ideal

– Assumes classes change

abruptly at boundaries

• In real world we tend to

map assemblages of classes

• Approximate Match?

– Proportions of named

classes within a defined area

• May be “good enough” to

support decisions we need to

make for management

– Similarity of observed site

to definition of class

• If observed site conditions

are “close” to those defined

for prototypical class that

may also be “good enough”

Page 30: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

What is Truth in Area-Class Maps?

• Exact Measurement Error

– Four ecologists traversed

the same 4 transects

– Exact agreement at exact

locations (congruency)

• Any 2 agreed with each other

– Average agreement 42%

– Minimum agreement 23%

– Maximum agreement 73%

• All 4 agreed with each other

– Average agreement 21%

– Minimum agreement 11%

– Maximum agreement 30%

Source: Moon, 2008

A cautionary tale

Congruency = Exact Class Match at

Exact Spatial Locations

Page 31: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

What is Truth in Area-Class Maps?

• Proportions in a Small Area

– Traverse area approximates

minimum size map polygon

– Compute proportions of each

class along total traverse

• Primary call agreement all 4

– Average agreement 64%

– Minimum agreement 37%

– Maximum agreement 86%

• Primary & alternate agreement

– Average agreement 71%

– Minimum agreement 43%

– Maximum agreement 95%

Source: Moon, 2008

A cautionary tale

Overlap = Degree to which proportion

estimates along total traverse agree
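The two agreement measures defined on these slides, congruency and overlap, can be compared on a toy traverse. In the sketch below, congruency is the fraction of locations with an identical class call. Overlap is computed as the sum over classes of the smaller of the two proportions, which is an assumption: the slides do not give the formula used by Moon (2008). The class calls are invented.

```python
from collections import Counter


def congruency(calls_a, calls_b):
    """Fraction of locations where two observers made the same class call."""
    same = sum(1 for a, b in zip(calls_a, calls_b) if a == b)
    return same / len(calls_a)


def proportion_overlap(calls_a, calls_b):
    """Agreement between class proportions along the whole traverse."""
    pa, pb = Counter(calls_a), Counter(calls_b)
    na, nb = len(calls_a), len(calls_b)
    return sum(min(pa[c] / na, pb[c] / nb) for c in set(pa) | set(pb))


# Two hypothetical ecologists calling classes at the same four stops
ecologist_1 = ["A", "A", "B", "C"]
ecologist_2 = ["A", "B", "A", "C"]
print(congruency(ecologist_1, ecologist_2))          # exact-location match
print(proportion_overlap(ecologist_1, ecologist_2))  # proportions match
```

In this toy case the observers agree at only half of the stops but report identical class proportions, the same pattern as the gap between the congruency and overlap figures above.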

Page 32: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

How “Close” is What we Predict to

What we Observe?

• In Geographic Space?

– Displacement of field

observations from map

predictions

• Can easily be +/- 25 m

even with GPS locations

• So, if class is found

anywhere within 25 m is

that a correct prediction?

– If we know extent of class

within minimum area

• Is that “close” enough?

• In Parameter Space?

– Point of predicting classes

is to predict attributes

• If attributes of predicted

class at predicted location are

similar to observed attributes

– Is that “close” enough?

• Can assess similarity of

observed to predicted

attributes at each location

– From this can assess how

“close” predicted attributes

are to observed attributes

Page 33: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

How “Close” is What we Predict to

What we Observe?

• In Parameter Space?

• Can assess similarity of

observed to predicted

attributes at each location

– From this can assess how

“close” predicted class is to

observed class

Page 34: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

How “Close” is What we Predict to

What we Observe?

• Measures of Class Similarity

– Taxonomic distance

• Difference in taxonomic space

between observed profile and

prototype reference profile

– Carré and Jacobson, 2009

– Minasny and McBratney, 2007

– Site attribute similarity

• Difference in attribute space

between observed site attributes

and attributes of reference site

– Zhu et al., Shi, SoLIM
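A similarity score in attribute space can be sketched very simply. The example below scales each attribute difference by an assumed plausible range and averages the result, giving 1.0 for an exact match with the reference site and values near 0 for very different sites. This generic range-scaled measure is an assumption for illustration; it is not the taxonomic distance of the cited authors, nor the fuzzy membership functions used in SoLIM. Attribute names and values are hypothetical.

```python
def attribute_similarity(observed, reference, ranges):
    """Similarity in [0, 1] between an observed site and a reference site.

    observed, reference : dicts of attribute name -> numeric value
    ranges              : dict of attribute name -> plausible value range,
                          used to put all attributes on a common scale
    """
    diffs = []
    for name, ref_value in reference.items():
        scaled = abs(observed[name] - ref_value) / ranges[name]
        diffs.append(min(scaled, 1.0))   # cap each difference at 1
    return 1.0 - sum(diffs) / len(diffs)


# Hypothetical prototype for a class and a site observed in the field
prototype = {"clay_pct": 20.0, "ph": 6.0}
site = {"clay_pct": 30.0, "ph": 6.0}
value_ranges = {"clay_pct": 50.0, "ph": 4.0}
print(attribute_similarity(site, prototype, value_ranges))
```

A threshold on such a score is one way to decide whether a predicted class is “close enough” to what was observed.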

Page 35: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Practical & Operational Considerations

Variable Selection

Model Selection

Polygon Disaggregation

Rigour and Structure in Predictive Mapping

Open data, open platforms, open software, collaboration

Page 36: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Variable Selection

Finding the most useful predictor variables

Discovering structure in the input variables

Multi-scale analysis to find the most

effective resolutions

Harmonization of disparate observations

Page 37: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Discovering Structure in Input Data Sets

• Inherent Spatial Structure

– Systematic analysis of

environmental covariates

• Detect distances and scales over which each covariate exhibits a strong relationship with itself or with a soil class or property to be predicted

– Analyse range of variation

inherent to each covariate

– Vary window sizes and grid

resolutions and compute

regressions on derivatives

» Functional relationships

are dependent on scale

Source: Park, 2004

Page 38: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Discovering Structure in Input Data Sets

• Relate correlations to scale

– Exploratory data analysis

• Systematic examination of

relationships of properties,

classes or processes to scale

• Multi-scale analysis

– varying window size and

grid resolution

• Assess strength of correlation

relative to resolution & scale

– New geomorphic measures

• Primarily contextual and

related to landform position

Source: Smith et al., 2006

Source: Schmidt and Andrew, 2005

Page 39: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Discovering Structure in Input Data Sets

• Terrain Attributes

– Multi-scale analysis

• Varying window size and

grid resolution

• Identifies that some

variables are more useful

when computed over larger

windows or coarser grids

– Finer resolution grids not

always needed or better

– Drop off in predictive

power of DEMs after

about 30-50 m grid

resolution

Source: Deng et al., 2007
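Multi-scale analysis of this kind can be sketched as a loop over grid resolutions: coarsen the covariate grid, read the coarsened value at each observation site, and record the strength of the correlation with the observed property. The sketch below uses block averaging and Pearson correlation on a tiny invented grid. It is an illustration under those assumptions, not the procedure of Deng et al. (2007), and the function names are hypothetical.

```python
import math


def block_mean(grid, factor):
    """Coarsen a 2-D grid (list of rows) by averaging factor x factor blocks."""
    rows, cols = len(grid) // factor, len(grid[0]) // factor
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            cells = [grid[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out


def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_by_resolution(covariate, observations, factors):
    """observations: list of (row, col, value) indexed on the original grid."""
    results = {}
    for f in factors:
        coarse = block_mean(covariate, f)
        xs = [coarse[r // f][c // f] for r, c, _ in observations]
        ys = [value for _, _, value in observations]
        results[f] = pearson(xs, ys)
    return results


# Hypothetical 4 x 4 covariate grid and four observation sites
covariate = [[0, 1, 2, 3] for _ in range(4)]
sites = [(0, 0, 0.0), (0, 1, 1.0), (0, 2, 2.0), (0, 3, 3.0)]
print(correlation_by_resolution(covariate, sites, factors=[1, 2]))
```

With real data the same loop would be run for each terrain derivative and each window size, and the resolution with the strongest relationship retained.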

Page 40: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Discovering Structure in Input Data Sets:

Smoothing vs. Coarsening of DEM Data

Source: Li et al., 2011

Page 41: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Multi-scale DEM Analysis: MrVBF

Smooth and subsample

[Figure: flatness, bottomness and valley bottom flatness computed at three scales: original 25 m, generalised 75 m and generalised 675 m]

Source: Gallant, 2012

Page 42: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Multiple Resolution Landform Position

MrVBF Example Outputs

Source: Gallant, 2012

Broader Scale 9” DEM

MRVBF for 25 m DEM

Page 43: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Hyper-scale Analyses of Terrain Context

• ConMAP: Hyper-scale Contextual Analysis of Topographic Parameters

Source: Behrens et al., in press

– Neighborhood example

• Diameter

– 21 km

• Predictors

– 775

Page 44: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Hyper-scale Analyses of Terrain Context

• ConSTAT: Hyper-scale Contextual Analysis of Topographic Parameters

Source: Behrens et al., in press

ConStat (ConMap)- neighborhood reduction

a) Full neighborhood

b) Reduction of radii

c) Reduction on radii

d) Combination of b and c

Page 45: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Hyper-scale Analyses of Terrain Context

• Hyper-scale Terrain

Analysis in ConSTAT

– Systematic analysis of relative

importance of terrain

measures at different scales

• Compute statistics of terrain

measures at different scales

– Use data mining (Random

Forests) to identify

importance of different

statistics at different scales

and at each different location

Source: Behrens et al., in press

Page 46: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Hyper-scale Analyses of Terrain Context

• ConSTAT: Hyper-scale Contextual Analysis of Topographic Parameters

Source: Behrens et al., in press

Page 47: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Improved Measures of Landform Position

• SAGA-RHSP: relative

hydrologic slope position

• SAGA-ABC: altitude

above channel

Source: C. Bulmer, unpublished

Calculation based on: MacMillan, 2005

Page 48: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Improved Measures of Landform Position

• TOPHAT – Schmidt

and Hewitt (2004)

• Slope Position – Hatfield

(1996)

Source: Hatfield (1996)

Source: Schmidt & Hewitt (2004)

Page 49: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Improved Measures of Landform Position -

Scilands

Source: Rüdiger Köthe, 2012

Page 50: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Improved Classification of Landform Types

– Scilands/LandMapR

Source: MacMillan/Rüdiger Köthe, 2012

Page 51: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Measures of Relative Slope Length (L) Computed by LandMapR

• Percent L Pit to Peak

• Percent L Channel to Divide

MEASURE OF LOCAL CONTEXT

MEASURE OF REGIONAL CONTEXT

Image Data Copyright the Province of British Columbia, 2003

Source: MacMillan, 2005

Page 52: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Image Data Copyright the Province of British Columbia, 2003

Measures of Relative Slope Position Computed by LandMapR

• Percent Diffuse Upslope Area

• Percent Z Channel to Divide

RELATIVE TO MAIN STREAM CHANNELS

SENSITIVE TO HOLLOWS & DRAWS

Source: MacMillan, 2005

Page 53: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Harmonization of Input Data Sets

• Inconsistent class names

– Need to identify different

class codes/names that are

essentially the same entity

• Maybe assess each class in

terms of similarity to all others

• Harmonize several similar

class names to the same name

– Need to address how to use

maps of compound units as

evidence for training data

• Sample units by proportion

Source: Sun et al., 2010

Page 54: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Harmonization of Soil Profiles by Depth

Source: David Jacquier, 2010

Harmonization of soil profile depth data through spline fitting

Page 55: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Harmonization of Soil Profiles by Depth

From discrete soil classes to continuous soil properties

– ‘Modal’ profile

– Fit mass-preserving spline (fitted spline)

– Spline averages at specified depth ranges: estimate averages for spline at standardised depth ranges, e.g., globalsoilmap depth ranges

Example: Clearfield soil series, Wapello County, Iowa (Mukey: 411784, Musym: 230C)

Source: Sun et al. (2010)

Harmonization of soil profile data through spline fitting
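The final step on this slide, averaging a profile over standardised depth ranges, can be illustrated without a spline. The sketch below treats each horizon as a constant step and takes thickness-weighted means over the six GlobalSoilMap depth intervals. It is a deliberate simplification and not the mass-preserving spline shown on the slide, which would smooth the steps between horizons. The function name and horizon data are hypothetical.

```python
# GlobalSoilMap standard depth intervals in cm
GSM_DEPTHS = [(0, 5), (5, 15), (15, 30), (30, 60), (60, 100), (100, 200)]


def harmonize_profile(horizons, intervals=GSM_DEPTHS):
    """Thickness-weighted mean of a property for each standard depth interval.

    horizons : list of (top_cm, bottom_cm, value) for one soil profile
    Returns one value per interval, or None where the profile has no data.
    """
    harmonized = []
    for top, bottom in intervals:
        total = 0.0
        covered = 0.0
        for h_top, h_bottom, value in horizons:
            # Thickness of this horizon that falls inside the interval
            overlap = min(bottom, h_bottom) - max(top, h_top)
            if overlap > 0:
                total += value * overlap
                covered += overlap
        harmonized.append(total / covered if covered > 0 else None)
    return harmonized


# Hypothetical profile: clay content (%) in two horizons
profile = [(0, 20, 30.0), (20, 50, 40.0)]
print(harmonize_profile(profile))
```

Once every profile is expressed on the same depth intervals, observations described with different horizon boundaries can be pooled as training data.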

Page 56: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Model Selection

Finding the most useful model or models

Main types of modeling options

Arguments for adopting objective models

amenable to optimization

Arguments for adopting ensemble models

Page 57: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Approaches to Producing Predictive Area-

Class Maps

Page 58: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Knowledge-Based Classification In SoLIM

Source: Zhu, SoLIM Handbook

Page 59: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Knowledge-Based Classification In LandMapR

Source: Steen and Coupé, 1997

Source: MacMillan, 2005

Page 60: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Knowledge-Based Classification In LandMapR

Source: Global Forest Watch Canada, 2012

Page 61: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Approaches to Producing Predictive Area-

Class Maps

Page 62: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Supervised Prediction Models Based on Analysis of Evidence

• Regression Trees, Random Forests

– McKenzie & Ryan, 1998, Odeh et al., 1994

• Fuzzy Logic-Neural Networks

– Zhu, 1997

• Bayesian Expert Knowledge

– Skidmore et al., 1996

– Cook et al., 1996, Corner et al., 1997

• GLMs – Generalized Linear Models

– McKenzie & Austin, 1993

– Gessler et al., 1995

Source: McKenzie and Ryan, 1998

Page 63: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Supervised Classification Using Regression Trees

Note similarity of supervised rules and classes to typical soil-landform conceptual classes

Note numeric estimate of likelihood of occurrence of classes

Source: Zhou et al., 2004, JZUS

Page 64: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Uncertainty of prediction

Bui and Moran (2003)

Geoderma 111:21-44

Extrapolation

Source: Bui and Moran, 2003

Predicting Area-Class Soil Maps Using

Regression Trees/Random Forests

Page 65: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Approaches to Producing Predictive Area-

Class Maps

Page 66: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Unsupervised Fuzzy K-means Clustering

Credit: J. Balkovič & G. Čemanová

Source: Sobocká et al., 2003

Page 67: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Example of Application of Fuzzy K-means

Unsupervised Classification

From: Burrough et al., 2001, Landscape Ecology

Note similarity of unsupervised

classes to conceptual classes

Page 68: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

The Importance of Selecting Objective Models

Amenable to Optimization

Source: Heuvelink et al., 2004

Deterministic part of

the predictive model

Stochastic part of the

predictive model

Lots of things qualify

as regression!

Regression just

means minimizing

variance

Why is optimization

so important?

Z(s) = Z*(s) + ε’(s) + ε’

Page 69: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

The Usefulness of Running Multiple Models or Ensemble Models

• All Models are wrong

– But some models are useful

• Some are more correct than others

• Can assess error of each model objectively

• Then weight each model to average predictions

– Operates on the principle of the wisdom of crowds

• Average out errors across different models

Source: Hengl et al., in prep
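Weighting each model by its assessed error and averaging the predictions can be sketched in a few lines. The example below weights each model by the inverse of its squared validation error, so models with smaller error count for more. The weighting scheme is an assumption: the slide says only that models are weighted to average predictions. Model names and numbers are hypothetical.

```python
def ensemble_predict(predictions, errors):
    """Weighted average of model predictions at one location.

    predictions : dict of model name -> predicted value
    errors      : dict of model name -> validation error (e.g. RMSE), > 0
    """
    # Inverse squared error: a model with half the error gets 4x the weight.
    weights = {name: 1.0 / errors[name] ** 2 for name in predictions}
    total = sum(weights.values())
    return sum(weights[name] * predictions[name] for name in predictions) / total


# Hypothetical predictions of a soil property from two models
preds = {"regression_tree": 10.0, "kriging": 20.0}
val_rmse = {"regression_tree": 1.0, "kriging": 2.0}
print(ensemble_predict(preds, val_rmse))
```

Because the weights come from an objective error assessment, the ensemble itself remains reproducible and open to optimization.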

Page 70: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Source: Sun et al., 2010

Predicting Area-Class Soil Maps Using

Multiple Regression Trees (100 x)

Prepare a database and tables of mapping units & soil series, and covariates

Select 1/n of the points systematically (n=100)

Sample soil series randomly from the multinomial distribution of mapping unit composites

Construct decision tree

Predict soil series at all pixels

Calculate the soil series statistics based on the n predictions for each pixel

Calculate the probability for each soil series

Generate soil series maps

Repeat n times

Used See5 (RuleQuest Research, 2009)
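The workflow listed above can be sketched end to end on toy data. In each of n runs the sketch takes a systematic 1/n subsample of pixels, draws a soil series for each training pixel from the composition of its map unit, fits a classifier, predicts every pixel, and finally converts the n predictions per pixel into probabilities. The decision tree is replaced by a trivial lookup classifier (`fit_lookup`) purely to keep the example self-contained; it is not See5, and all names and data are hypothetical.

```python
import random
from collections import Counter


def sample_series(composition, rng):
    """Draw one soil series from a map unit's component percentages."""
    names = list(composition)
    return rng.choices(names, weights=[composition[n] for n in names], k=1)[0]


def fit_lookup(train):
    """Stand-in for the decision tree: most frequent series per covariate."""
    by_cov = {}
    for cov, series in train:
        by_cov.setdefault(cov, Counter())[series] += 1
    overall = Counter(series for _, series in train).most_common(1)[0][0]
    return lambda cov: (by_cov[cov].most_common(1)[0][0]
                        if cov in by_cov else overall)


def series_probabilities(pixels, map_units, fit, n_runs=100, seed=0):
    """pixels: list of (map_unit_id, covariate); needs len(pixels) >= n_runs."""
    rng = random.Random(seed)
    votes = [Counter() for _ in pixels]
    for run in range(n_runs):
        train = []
        # Systematic 1/n subsample, shifted by one pixel on each run
        for idx in range(run, len(pixels), n_runs):
            unit, cov = pixels[idx]
            train.append((cov, sample_series(map_units[unit], rng)))
        predict = fit(train)
        for i, (_, cov) in enumerate(pixels):
            votes[i][predict(cov)] += 1
    # Probability of each series = share of runs that predicted it
    return [{s: c / n_runs for s, c in v.items()} for v in votes]


# Hypothetical map unit with two component series and a landform covariate
units = {"MU1": {"Gilpin": 70, "Dekalb": 30}}
pixels = [("MU1", "ridge"), ("MU1", "valley")] * 10
probs = series_probabilities(pixels, units, fit_lookup, n_runs=4)
most_likely = [max(p, key=p.get) for p in probs]
```

The most likely series per pixel gives the disaggregated map, and its probability gives a per-pixel measure of uncertainty.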

Page 71: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Polygon Disaggregation

Making the most of existing polygon class maps

Conventional manual maps are useful

Many manual maps have compound map

units with multiple component classes

Many ways to disaggregate compound maps

Page 72: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Knowledge-Based Polygon Disaggregation

Using Simple Expert Knowledge in USA

Gilpin

Pineville

Laidig

Guyandotte

Dekalb

Component Soils

Craigsville

Meckesville

Cateache

Shouns

Source: Thompson et al., 2010 WCSS

Page 73: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Predicting Area-Class Soil Maps

Source: Grinand et al., 2008

Clovis Grinand, Dominique Arrouays,

Bertrand Laroche, and Manuel Pascal Martin.

Extrapolating regional soil landscapes from an

existing soil map: Sampling intensity,

validation procedures, and integration of

spatial context. Geoderma 143, 180-190

Page 74: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Example of Application of Disaggregation of

a Soil Map by Clustering into Components

Source: Faine, 2001

Page 75: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Source: Sun et al., 2010

Polygon Disaggregation of Area-Class

Soil Map Using 100x Regression Trees

Prepare a database and tables of mapping units & soil series, and covariates

Select 1/n of the points systematically (n=100)

Sample soil series randomly from the multinomial distribution of mapping unit composites

Construct decision tree

Predict soil series at all pixels

Calculate the soil series statistics based on the n predictions for each pixel

Calculate the probability for each soil series

Generate soil series maps

Repeat n times

Used See5 (RuleQuest Research, 2009)

Page 76: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Source: Sun et al., 2010

Predicting Area-Class Soil Maps Using

Multiple Regression Trees (100 x)

A closer look at the junction point in the middle of 4 combined maps,

(a) the original map units, and

(b) the most likely soil series map and its associated probability.

The length of the image is approximately 14 km.

Legend

monr_comppct

Value

High : 100

Low : 7

(a)

(b)

Page 77: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Rigour and Structure in Predictive

Mapping

Take time to do things right

Strive for reproducibility through automated work flows

Page 78: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Rigour and Structure in Predictive

Mapping

• Take Time to Get it Right

– Exploratory Data Analysis

• Find the best predictors at

the best resolution(s)

• Assess multiple inputs at

different resolutions & scales

• Find surrogates for parent

material type & texture

– Evaluate Multiple Models

• Apply and assess multiple

prediction models

• Assess model performance

quantitatively

• Strive for Reproducibility

– It is not Science if Everyone

Else Cannot Reproduce it

• All evidence, input layers and

models must be available

– Automated Workflows are

Highly Desirable

• Let you rerun the process if

any new data are obtained

• Let others rerun the process to

verify your results

– Grow and Reuse Input Data

Page 79: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Open Framework

Open data, Open platforms, Open software

Facilitate collective and collaborative

action through open systems

Continuously add to and reuse data

Reinvent conventional terrestrial mapping

Page 80: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

An Example of an Open Platform:

Global Soil Information Facility

Source: Hengl et al., 2011

Page 81: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

A Conceptual Framework for a Global Soil

Information Facility (GSIF)

Source: Hengl et al., 2011

Collaborative and

open modelling on

an inter-active, web-

based server-side

platform

Collaborative and

open production,

assembly and sharing

of covariate data

(World Grids)

Collaborative and

open collection,

input and sharing of

geo-registered field

evidence

(Open Soil Profiles)

Maps we can all contribute to, access, use, modify and

update, continuously and transparently

Everything is

accessible,

transparent and

repeatable

Page 82: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

A Conceptual Framework for a Global Soil

Information Facility (GSIF)

Source: Hengl et al., 2011

Possibility of making

use of existing

legacy soil maps

(even new soil maps)

needed for soil

prediction anywhere

Possibility of

rescuing, sharing,

harmonizing and

archiving soil

profile point data

needed for soil

prediction anywhere

Possibility to

develop and use

global models (even

for local mapping)

Possibility to

develop and use

multi-scale and

multi-resolution

hierarchical models

Possibility to assess

error and correct for

it everywhere

Visit: www.worldsoilprofiles.org, www.worldgrids.org

Page 83: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

The Future: Collaborative Global to Local,

Multi-Scale Mapping through Open Platforms

Source: Hengl et al., 2011

Possibility for combining

Top-Down and Bottom-up

mapping through weighted

averaging of 2 or more sets

of predictions


Possibility to

develop and use

global models (even

for local mapping)

Page 84: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

The Future: Global to Local, Multi-Scale

Modeling of Soil Properties or Classes

Source: Hengl et al., 2011

Possibility to

develop and use

global models (even

for local mapping)

Possibility to

develop and use

multi-scale and

multi-resolution

hierarchical models

Page 85: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

The Future: From Static Mapping to

Continuously Updated Modelling

Possibility to move from single snapshot

mapping of static soil properties or classes

to continuous update and improvement of

maps of both static and dynamic

properties within a structured and

consistent framework.

Page 86: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

The Future: Collaborative Global to Local,

Multi-Scale On-line Mapping

Source: Hengl et al., 2011

A Global

Collaboratory!

Working together

we can map the

world one block at a

time!

The next generation

of soil surveyors is

everyone!

Page 87: Aitf 2014 pem_introduction_presentation_feb28_ram_version2

Thank You

Hope this was helpful