Spatial Modeling and Analysis Deana D. Pennington, PhD University of New Mexico.

Post on 13-Jan-2016


Spatial Modeling and Analysis

Deana D. Pennington, PhD
University of New Mexico

What is spatial analysis?

Analyses where the data are spatially located and explicit consideration is given to the possible importance of their spatial arrangement in the analysis

Statistical Issues

Valid statistics depend on:
• Temporal stability and causal transience
• Unit homogeneity
• Independence
• Constant effects

BUT Ecology & Earth Science violate all of these!

We study:
• Change with time (no temporal stability)
• Legacies, persistence, recovery (no causal transience)
• Heterogeneity through space and time (no unit homogeneity)
• Spatial structure (no independence)
• Differences in response through space/time (non-constant effects)
• Attributes rather than causal factors, which must be inferred

Issues in Spatial Analysis

• Error
• Small sample sizes compared with size of environmental data sets
• Spatial dependency
• Spatial heterogeneity
• Boundary effects
• Modifiable Areal Unit Problem

Spatial Dependency

Tobler’s Law: All things are related, but nearby things are more related than distant things

Non-independent observations: spatially dependent observations partly duplicate one another in the sample set, so they carry less information than the same number of independent observations. This affects the mean, variance, confidence intervals and significance tests.

***Field samples tend to be taken from nearby locations, and are almost always spatially autocorrelated***

Spatial Heterogeneity

• Stratification of the landscape (regions, classes, etc.) is problematic due to its gradational nature
• Intra-strata variability, mixtures
• Differences in numbers of observations within strata

Heterogeneity in spatial data

300 x 300 pixels, 192 training pixels out of 90,000 total pixels, 7 mislabeled

• Low % samples
• Errors in samples

Hyperspectral Example

[Figure: true-color and false-color images of the 6 km² study area]

Training pixels per class (192 total):
• Roads 33
• Clouds 23
• River 23
• Riparian 28
• Arid upland 25
• Barren 22
• Agriculture 38

Hyperspectral Results

[Figure: K-means unsupervised classification, 10 classes. Cluster labels: Riparian (4 clusters), Arid upland (2), Semi-arid upland (2), Clouds/barren, River/agriculture]

• Confusion between river & agriculture
• Confusion between clouds and barren
• Unsampled semi-arid upland
• Mislabeled arid upland
• Unsampled variability in riparian
• Road variability

[Figure legend: Clouds, Agriculture, River, Riparian, Arid upland, Barren, Roads, Unclassified]

[Figure: supervised classification results shown alongside the K-means unsupervised result]

Classifier                 Result
Maximum Likelihood         89.44%
Naïve Bayesian             83.33%
Parallelepiped             82.78%
Minimum Distance           69.44%
Support Vector Machine     77.22%

• Confusion between river & agriculture
• Confusion between clouds and barren
• Unsampled semi-arid upland
• Mislabeled arid upland (4.4%)
• Unsampled variability in riparian
• Road variability

Boundary Effects

• Loss of neighbors in analyses that depend on neighborhood values
• Solution: collect data along a border outside of the analysis area

Modifiable Areal Unit Problem (MAUP)

• Results sensitive to cell size, location, orientation

Components of Spatial Analysis

• Exploratory Spatial Data Analysis (ESDA): finding interesting patterns
• Visualization: showing interesting patterns
• Spatial Modeling: explaining interesting patterns

Spatial Analyses

Things to consider:
• Objective: describe, map, causation
• Data type: binary (Y/N), categorical, continuous
• Expected pattern: gradient, periodic, clustered
• Scale of pattern
• Univariate/multivariate

Spatial Analyses

Biological survey where each point denotes the observation of an endangered species. If a pattern exists, as in this diagram, we may be able to analyze behavior in terms of environmental characteristics.

1. Quantify pattern
   • Attraction or repulsion
   • Directionality
2. Make inferences about process based on observed pattern

Choices

Point pattern analyses
• Single scale of pattern: quadrat analysis, nearest neighbor
• Multiscale pattern: refined nearest neighbor, 2nd order analysis, Ripley’s K

Make maps from points
• Distance interpolation
• Kriging
• Trend surface analysis
• Spline

Test models with space as causal factor
• Mantel test
• Mantel correlogram
• Multivariate analysis

Describe spatial structure
• Gradient, periodic
  • Single scale of pattern: semivariogram, correlogram
  • Multiscale pattern: spectral analysis
• Edge: wavelet analysis
• Context: adjacency measures, cross variogram, cross correlogram
• Self-similarity: fractal dimension

Network analysis
• Path analysis
• Allocation
• Connectivity

Point Pattern Analysis

[Figure: example point patterns: clustered (attraction) and uniform (repulsion)]

Point Pattern Analysis

Statistical tests for significant patterns in data, compared with the null hypothesis of random spatial pattern

The standard against which spatial point patterns are compared is a Completely Spatially Random (CSR) point process: a Poisson probability distribution (mean = variance) is used to generate spatially random points.
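As a rough illustration of the CSR standard, the sketch below scatters a fixed number of points uniformly over a rectangle and counts them in quadrats. The function names, the 100 x 100 extent and the 10 x 10 quadrat grid are assumptions made for the example, not values from the slides. Fixing the number of points is a simplification of a true Poisson process, but the quadrat counts should still show a variance close to the mean.

```python
import random

def csr_points(n, width, height, seed=None):
    """Place n points independently and uniformly in a width x height rectangle."""
    rng = random.Random(seed)
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

def quadrat_counts(points, width, height, nx, ny):
    """Count the points falling in each cell of an nx-by-ny grid of quadrats."""
    counts = [0] * (nx * ny)
    for x, y in points:
        col = min(int(x / width * nx), nx - 1)   # clamp points on the far edge
        row = min(int(y / height * ny), ny - 1)
        counts[row * nx + col] += 1
    return counts

# Hypothetical study area: 1000 points, 100 x 100 units, 10 x 10 quadrats.
points = csr_points(1000, 100.0, 100.0, seed=1)
counts = quadrat_counts(points, 100.0, 100.0, 10, 10)
mean = sum(counts) / len(counts)
variance = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
print(mean, variance)  # under CSR the two values should be of similar size
```

A variance well below the mean would suggest a uniform pattern, and a variance well above it a clustered one.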

Quadrat Analysis

1. Divide the area up into quadrats
2. Count the number of points in each quadrat
3. Compare counts with expected counts in random distribution

[Figure: histogram of # of cells against # of pts/cell, showing the expected CSR curve (null hypothesis) with the clustered and uniform cases]

Expected mean #/cell in CSR: $\lambda = N / \text{\# of quads}$

For Poisson distribution:

$p(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$

Chi square:

$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$

#     O_i    P(x)      E_i
0     2      0.0156    0.39
1     2      0.0649    1.62
2     5      0.1350    3.38
3     1      0.1873    4.68
…

Classes 0 to 2 pooled: E = 5.39, $\chi^2$ contribution = 2.42

Check Chi square table. If H0 rejected:
• Mean ≠ variance
• Mean > variance (uniform)
• Mean < variance (clustered)
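The arithmetic in the table can be checked with a few lines of code. This is only a sketch: the slide does not state the mean or the number of quadrats, so λ = 4.16 and 25 quadrats are back-calculated here from P(0) = 0.0156 and E = 0.39 and should be treated as assumptions, as should the function names.

```python
import math

def poisson_pmf(x, lam):
    """Poisson probability of observing x points in a cell with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over the supplied classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

lam = 4.16        # assumed mean points per cell (inferred from P(0) = 0.0156)
n_quadrats = 25   # assumed number of quadrats (inferred from E = 0.39)

# Expected number of quadrats holding 0, 1, 2 and 3 points.
expected = [n_quadrats * poisson_pmf(x, lam) for x in range(4)]

# Classes 0-2 have small expected counts, so they are pooled into one class.
pooled_observed = 2 + 2 + 5
pooled_expected = sum(expected[:3])
contribution = chi_square([pooled_observed], [pooled_expected])
print(round(pooled_expected, 2), round(contribution, 2))
```

The full statistic would add the contributions of the remaining classes, which the slide elides.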

Nearest Neighbor Distance

1. Calculate the distance to the nearest neighbor for every point
2. Calculate the mean nearest-neighbor (nn) distance
3. Calculate the expected mean for a CSR distribution: $E(d_i) = 0.5\sqrt{A/N}$
4. Compare the expected mean to the observed mean with a Z statistic:

$Z = \frac{\bar{d} - E(d_i)}{\sqrt{0.0683\,A/N^2}}$

where A = area and N = number of points.

Look up the significance in a z-statistic table. If H0 is rejected:
• observed mean < expected and Z < 0 => clustered
• observed mean > expected and Z > 0 => uniform
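The four steps can be sketched as follows. The function name and the example coordinates are hypothetical, the search is brute force, and no edge correction is applied, so this is an illustration of the calculation rather than a production routine.

```python
import math

def nearest_neighbor_z(points, area):
    """Return (observed mean nn distance, expected mean under CSR, Z)."""
    n = len(points)
    nn_distances = []
    for i, (xi, yi) in enumerate(points):
        # Step 1: distance from point i to its nearest neighbor.
        nn_distances.append(min(math.hypot(xi - xj, yi - yj)
                                for j, (xj, yj) in enumerate(points) if j != i))
    observed = sum(nn_distances) / n                 # step 2
    expected = 0.5 * math.sqrt(area / n)             # step 3
    std_err = math.sqrt(0.0683 * area / n ** 2)
    return observed, expected, (observed - expected) / std_err  # step 4

# Hypothetical uniform pattern: a 5 x 5 lattice with spacing 2 in a 10 x 10 area.
lattice = [(1 + 2 * i, 1 + 2 * j) for i in range(5) for j in range(5)]
# Hypothetical clustered pattern: the same 25 points packed around one location.
cluster = [(5 + 0.01 * i, 5 + 0.01 * j) for i in range(5) for j in range(5)]

print(nearest_neighbor_z(lattice, 100.0))  # Z > 0, uniform
print(nearest_neighbor_z(cluster, 100.0))  # Z < 0, clustered
```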

Ripley’s K

1. Expand a circle of increasing radius around each point
2. Count the number of points within each circle
3. Calculate L(d), a measure of the expected number of points within distance d:

$L(d) = \left[\frac{A \sum k_{ij}}{N(N-1)}\right]^{0.5}$

where A = area and $\sum k_{ij}$ = number of points j within distance d of all i points

4. Monte Carlo simulations or t-test

[Figure: L(d) plotted against radius, showing the expected CSR mean with the clustered and uniform cases]

***Note added information: mean clustering distance
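A minimal sketch of step 3, following the formula exactly as written on the slide. Many texts also divide by π inside the root so that L(d) = d under CSR; that variant is not used here. The function name and test coordinates are assumptions, and no edge correction is applied.

```python
import math

def ripley_l(points, area, d):
    """L(d) = sqrt(A * sum(k_ij) / (N * (N - 1))), as given on the slide."""
    n = len(points)
    pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            # k_ij = 1 when point j lies within distance d of point i.
            if i != j and math.hypot(xi - xj, yi - yj) <= d:
                pairs += 1
    return math.sqrt(area * pairs / (n * (n - 1)))

# Hypothetical pattern: four points on the corners of a unit square, area 4.
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
for radius in (0.5, 1.0, 2.0):
    print(radius, ripley_l(corners, 4.0, radius))
```

Repeating the calculation over many simulated CSR patterns gives the envelope against which the observed curve is compared in step 4.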

Lab #12A

Point pattern analysis

Analysis of Continuous Data

1. Variation in mean values

2. Describe local variability & spatial dependence

Mean trends

[Figure: input and output of focal, zonal and global operations. Output label: single value (surface analysis) or table]

Grid Analysis: Focal Analysis

Spatial filters: output value for each cell is calculated from neighboring cells (moving windows)

[Figure: neighborhood shapes]

Statistics: Majority, Maximum, Mean, Median, Minimum, Range, Standard deviation, Sum, Variety

Example: Species A habitat and Species B habitat. Range of Species A = 4 cells; Species A depends on B.

• Low pass: smoothing, removing noise
• High pass: emphasize local variation
• Edge enhancement
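A moving-window mean is the simplest low-pass filter, and subtracting it from the input gives a basic high-pass result. The sketch below assumes a square window on a small list-of-lists grid and shrinks the window at the grid edge; the function names and the sample grid are hypothetical.

```python
def focal_mean(grid, radius=1):
    """Low-pass filter: mean of the (2*radius+1)^2 window around each cell.
    Windows are clipped at the grid edge, so border cells use fewer neighbors."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [grid[rr][cc]
                      for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                      for cc in range(max(0, c - radius), min(cols, c + radius + 1))]
            out[r][c] = sum(window) / len(window)
    return out

def high_pass(grid, radius=1):
    """High-pass filter: original value minus the local (low-pass) mean."""
    smooth = focal_mean(grid, radius)
    return [[grid[r][c] - smooth[r][c] for c in range(len(grid[0]))]
            for r in range(len(grid))]

spike = [[0, 0, 0],
         [0, 9, 0],
         [0, 0, 0]]
print(focal_mean(spike))  # the spike is spread over its neighbors
print(high_pass(spike))   # the spike stands out against the local mean
```

The clipped border windows are one instance of the boundary effect noted earlier: edge cells are computed from fewer neighbors than interior cells.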

Grid Analysis: Zonal Analysis

[Figure: zones: vegetation class A or land use A; vegetation class B or land use B; vegetation class C or land use C]

Geometry: Area, Centroid, Perimeter

Statistics: Majority, Maximum, Mean, Median, Minimum, Range, Standard deviation, Sum, Thickness, Variety

Output is:
a) grid with same value in each cell for a given zone
b) table with values by zone
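Both output forms can be produced from one pass over a zone grid and a value grid. This is a sketch for the zonal mean only; the function name, zone labels and values are made up for the example.

```python
def zonal_mean(zones, values):
    """Return (table, grid): the mean value per zone, and a grid in which
    every cell carries the mean of the zone it belongs to."""
    totals, counts = {}, {}
    for zone_row, value_row in zip(zones, values):
        for zone, value in zip(zone_row, value_row):
            totals[zone] = totals.get(zone, 0.0) + value
            counts[zone] = counts.get(zone, 0) + 1
    table = {zone: totals[zone] / counts[zone] for zone in totals}   # output b)
    grid = [[table[zone] for zone in zone_row] for zone_row in zones]  # output a)
    return table, grid

# Hypothetical zone grid (e.g. vegetation classes) and value grid.
zones = [["A", "A", "B"],
         ["A", "B", "B"]]
values = [[1, 2, 10],
          [3, 20, 30]]
table, grid = zonal_mean(zones, values)
print(table)
print(grid)
```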

Lab 12B

Neighborhood and Zone Analysis

Geostatistics Basics

Parametric stats
• Univariate (x): mean, variance
• Multivariate (x, y): correlation, covariance

Spatial stats
• Univariate (x, h): semi-variance, lag correlation, lag covariance
• Multivariate (x, y, h): cross-semivariance (variogram), cross correlation, cross covariance (correlogram); the variogram is the inverse of the correlogram

h = lag (time or space)

Semi-variance $\gamma_h$

Variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N}$

Semi-variance: $\gamma_h = \frac{\sum_{i=1}^{N_h} (x_i - x_{i+h})^2}{2N_h}$

[Figure labels: $x_i$, $x_{i+h}$; local mean w.r.t. study extent]

1. Slide x through space to get $\gamma_h$
2. Vary h

Semi-variance $\gamma_h$

Semi-variance: $\gamma_h = \frac{\sum_{i=1}^{N_h} (x_i - x_{i+h})^2}{2N_h}$

[Figure labels: $x_i$, $x_{i+h}$, local mean, next x]

Number of cells N = 10
Number of windows $N_h$ = # cells - h
• h = 1: $N_h$ = 9
• h = 5: $N_h$ = 5

Limit h to 1/3 of study extent
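For a one-dimensional transect the calculation is short enough to write out. The sketch below uses a hypothetical transect of N = 10 cells with a steady gradient; the function name and the values are assumptions for illustration.

```python
def semivariance(x, h):
    """gamma_h = sum((x_i - x_{i+h})^2) / (2 * N_h), with N_h = len(x) - h."""
    n_h = len(x) - h
    if n_h < 1:
        raise ValueError("lag h must be smaller than the number of cells")
    return sum((x[i] - x[i + h]) ** 2 for i in range(n_h)) / (2 * n_h)

# Hypothetical transect of N = 10 cells with a steady gradient.
transect = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Limit h to about 1/3 of the study extent.
max_lag = len(transect) // 3
for h in range(1, max_lag + 1):
    print(h, len(transect) - h, semivariance(transect, h))
```

Because the values keep diverging with distance, the semi-variance keeps rising with h, which is the gradient case with no sill or range.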

Semi-variogram

If $x_i$ is similar to $x_{i+h}$, $\gamma_h$ is small, and they are spatially correlated.
If $x_i$ is not similar to $x_{i+h}$, $\gamma_h$ is large, and they are not spatially correlated.

=> $\gamma_h$ measures heterogeneity

[Figure: semi-variogram of $\gamma_h$ against h, marking the nugget, sill and range; spatial dependence below the range, independence above it]

• Nugget: value of $\gamma_h$ at distance 0 (not in data); a measure of unexplained variability
• Range: distance h at which $\gamma_h$ levels off. Below the range, heterogeneity increases in a predictable manner; above the range, heterogeneity is constant. A measure of independence
• Sill: measure of maximum heterogeneity in the data ($\gamma_{max}$)

Semi-variograms

[Figure: two semi-variograms of $\gamma_h$ against h]

• Periodic, cyclic. Examples: timber harvest, forest age (range: harvest area; sill: rotation)
• Gradient: no sill or range

Lag Covariance: Geary’s C

Centered around mean values of x, $\bar{x}$

Lag covariance: $C_h = \frac{\sum_{i=1}^{N_h} (x_i - x_{i-h})(x_i - x_{i+h})}{N_h}$

[Figure labels: $x_{i-h}$, $x_i$, $x_{i+h}$; local mean]

Correlograms have the inverse shape of semi-variograms.

• If $x_i$, $x_{i+h}$ and $x_{i-h}$ are all the same, $C_h = 0$
• If values are increasing or decreasing through space ($x_{i-h} < x_i < x_{i+h}$, or $x_{i-h} > x_i > x_{i+h}$), one term is negative and $C_h$ is negative: things are not similar
• Otherwise $C_h$ is positive: things are similar
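The sign behavior described above can be reproduced by coding the slide's formula literally. This sketch assumes that $N_h$ counts the positions having a neighbor at both i - h and i + h (that is, N - 2h on a transect); the function name and the test transects are hypothetical.

```python
def lag_covariance(x, h):
    """C_h = sum((x_i - x_{i-h}) * (x_i - x_{i+h})) / N_h, as written on the slide.
    N_h is taken to be the number of cells with neighbors on both sides."""
    centers = range(h, len(x) - h)
    n_h = len(centers)
    if n_h < 1:
        raise ValueError("lag h is too large for this transect")
    return sum((x[i] - x[i - h]) * (x[i] - x[i + h]) for i in centers) / n_h

print(lag_covariance([3, 3, 3, 3], 1))     # all values the same: zero
print(lag_covariance([1, 2, 3, 4, 5], 1))  # steadily increasing: negative
print(lag_covariance([0, 1, 0], 1))        # local peak: positive
```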

Lag Correlation: Moran’s I

Centered around mean values of x, $\bar{x}$
Standardized against sample variation

Lag covariance: $C_h = \frac{\sum_{i=1}^{N_h} (x_i - x_{i-h})(x_i - x_{i+h})}{N_h}$

Lag correlation: $\rho_h = \frac{C_h}{S_{x-h}\,S_{x+h}}$

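Comparison

Measure            Statistic    Symbol        Limits
Semi-variance                   $\gamma_h$    $0 < \gamma_h < \infty$
Lag covariance     Geary’s C    $C_h$         $-\infty < C_h < \infty$
Lag correlation    Moran’s I    $\rho_h$      $-1 < \rho_h < +1$

[Figure: $\gamma_h$, $C_h$ and $\rho_h$ each plotted against h. Labels: range (similar h in each plot), zero, correlated, independent]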

Lab 12C Correlograms

Surface Analysis

Spatial distribution of surface information in terms of a three-dimensional structure

Surfaces do not have to be elevation, but could be population density, species richness, or any other measured attribute

Surface Analysis

Given geolocated point data, calculate values at regular intervals between points

Inverse distance weighting
• Can’t create extremes (ridges, valleys)
• Isotropic influence (not ridge preserving)
• Best with dense samples

Kriging
• Uses semi-variogram to determine relative importance (weighting) of data at different distances
• Uses global variation; only works well if the semi-variogram captures variation across the entire map

Trend analysis
• Calculates a best-fit polynomial equation using linear regression
• Recalculates all positions using the equation (lose original data)
• Smoothing depends on polynomial order

Spline
• Calculates a 2-D minimum curvature surface that passes through every input point
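Of the four methods, inverse distance weighting is the easiest to sketch. The version below assumes a power of 2 and uses every sample for every estimate; the function name, the power and the sample values are assumptions for the example.

```python
import math

def idw(samples, x, y, power=2.0):
    """Estimate the value at (x, y) as a distance-weighted mean of the samples.
    samples is a list of (sample_x, sample_y, value) tuples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        dist = math.hypot(x - sx, y - sy)
        if dist == 0.0:
            return value  # exactly on a sample point: return its value
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical samples: (x, y, measured value).
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (0.0, 2.0, 30.0)]
print(idw(samples, 1.0, 0.0))
print(idw(samples, 0.0, 0.0))
```

Because each estimate is a weighted mean, it always falls between the smallest and largest sample values, which is why the method cannot create ridges or valleys beyond the sampled extremes.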

Surface Analysis: Streams

Network Analysis

Designed specifically for line features organized in connected networks; typically applies to transportation problems and location analysis

• Streams
• Dispersal vectors
• Community interactions

Network Analysis

• Pathfinding: shortest or least cost
• Allocation of network areas to a center based on supply, demand and impedance
• Connectivity
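Least-cost pathfinding is commonly done with Dijkstra's algorithm, sketched below on a small directed network. The slides do not name an algorithm, so this choice, the function name, the node labels and the impedance values are all assumptions.

```python
import heapq

def least_cost_path(graph, start, goal):
    """Dijkstra's algorithm. graph maps node -> {neighbor: impedance}.
    Returns (total cost, list of nodes), or (inf, []) if goal is unreachable."""
    queue = [(0.0, start, [start])]
    settled = {}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled and settled[node] <= cost:
            continue  # already reached this node at equal or lower cost
        settled[node] = cost
        for neighbor, impedance in graph.get(node, {}).items():
            heapq.heappush(queue, (cost + impedance, neighbor, path + [neighbor]))
    return float("inf"), []

# Hypothetical network, e.g. stream segments with travel costs.
network = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 1},
    "C": {"B": 2, "D": 7},
    "D": {},
}
print(least_cost_path(network, "A", "D"))
```

Connectivity can be checked with the same routine: an infinite cost means the goal cannot be reached from the start.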

Integrated Analysis

[Figure: workflow diagram. Labeled elements: DEM, hydro model, watershed, land cover, soil, grid process, statistics, modeling (regression, et al.), gauge points, samples, field data (vector)]

Lab 12D Correlation

Sampling

• Spatial dependency must be considered in sample design
  • Non-independent observations
  • Fewer degrees of freedom
  • Differences within groups will appear small => overestimate significance of between-group variation
• Spatial structure & heterogeneity can affect experimental results: response due to treatments or due to inherent spatial structure?

Solutions:
• Include space as an explanatory variable (Mantel test)
• Sample at greater distance than the variogram range

[Figure: integrating field samples (Access file, Excel file) with image layers: elevation (m), vegetation cover type, mean annual temperature (C)]

Field samples:
• Sample 1, lat, long, species, presence
• Sample 2, lat, long, species, presence
• Sample 3, lat, long, species, absence

Integrated data:
• P, juniper, 2200m, 16C
• P, pinyon, 2320m, 14C
• A, creosote, 1535m, 22C

Example: Integrating Species Occurrence Points and Images

1. Semantics
2. Compatible scales
3. Reproject
4. Resample grain
5. Clip extent
6. Sample occurrence points

ENM Results

Geographic patterns of species richness of 17 native rodent species.

Sanchez-Cordero and Martinez-Meyer, 2000

Model building and testing. a) training data; b) predictive model.

Peterson, Ball and Cohoon, 2002