www.csiro.au
Predicting Water Quality Impaired Stream Segments using Landscape-scale Data and a Regional Geostatistical Model
Erin E. Peterson
Postdoctoral Research Fellow
CSIRO Mathematical and Information Sciences Division
March 3, 2006
The work reported here was developed under STAR Research Assistance Agreement CR-829095 awarded by the U.S.
Environmental Protection Agency (EPA) to Colorado State University. This presentation has not been formally reviewed by
EPA. EPA does not endorse any products or commercial services mentioned in this presentation.
Space-Time Aquatic Resources Modeling and Analysis Program
This research is funded by the U.S. EPA Science To Achieve Results (STAR) Program, Cooperative Agreement # CR-829095
Collaborators
Dr. David M. Theobald, Natural Resource Ecology Lab, Department of Recreation & Tourism, Colorado State University, USA
Dr. N. Scott Urquhart, Department of Statistics, Colorado State University, USA
Dr. Jay M. Ver Hoef, National Marine Mammal Laboratory, Seattle, USA
Andrew A. Merton, Department of Statistics, Colorado State University, USA
Overview
Introduction
Background
Patterns of spatial autocorrelation in stream water chemistry
Visualizing model predictions
Current and future research in SEQ
Water Quality Monitoring Goals
Create a regional water quality assessment
Identify water quality impaired stream segments
Purpose
Demonstrate a geostatistical methodology based on
Coarse-scale GIS data
Field surveys
Predict water quality characteristics about stream segments throughout a region
Purpose of Our Research
How are geostatistical models different from traditional statistical models?
Traditional statistical models (non-spatial)
Residual error (ε) is assumed to be uncorrelated
ε = unexplained variability in the data
Geostatistical models
Residual errors are correlated through space
Spatial patterns in residual error resulting from unidentified process(es)
Model spatial structure in the residual error
Explain additional variability in the data
Generate predictions at unobserved sites
Traditional statistical model: $Y = X\beta + \varepsilon$
Geostatistical model: $Y(s) = X(s)\beta + \varepsilon(s)$
Geostatistical Modelling
Fit an autocovariance function to data
  Describes relationship between observations based on separation distance
[Figure: semivariogram plotting semivariance against separation distance, with the nugget, sill, and range marked]
3 Autocovariance Parameters
1) Nugget: variation between sites as separation distance approaches zero
2) Sill: delineated where semivariance asymptotes
3) Range: distance within which spatial autocorrelation occurs
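As a rough illustration of how the three parameters shape the curve, the sketch below evaluates an exponential semivariogram model. The exponential form, the function name `exponential_semivariogram`, and the convention that `sill` includes the nugget are assumptions made for illustration, not details taken from the slides.

```python
import math


def exponential_semivariogram(h, nugget, sill, range_):
    """Semivariance at separation distance h under an exponential model.

    Assumptions (hypothetical, for illustration only):
      - sill is the total sill, so the partial sill is sill - nugget
      - range_ acts as a distance-decay parameter; the curve approaches
        the sill asymptotically rather than reaching it exactly
    """
    if h < 0:
        raise ValueError("separation distance must be non-negative")
    if h == 0:
        return 0.0  # a site has zero semivariance with itself
    partial_sill = sill - nugget
    return nugget + partial_sill * (1.0 - math.exp(-h / range_))


# Semivariance starts near the nugget and levels off at the sill.
for h in (0.001, 10.0, 50.0, 500.0):
    print(h, round(exponential_semivariogram(h, nugget=1.0, sill=5.0, range_=10.0), 3))
```

Just above zero distance the value sits close to the nugget, and at large distances it flattens out at the sill, which is the behaviour the three definitions above describe.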
Distance Measures and Spatial Relationships
Straight Line Distance (SLD)
  As the crow flies
Symmetric Hydrologic Distance (SHD)
  As the fish swims
Weighted asymmetric hydrologic distance (WAHD)
  As the water flows
  Incorporate flow direction & flow volume
[Figure: sites A, B, and C on a stream network, illustrating each distance measure]
Challenge: Spatial autocovariance models developed for SLD may not be valid for hydrologic distances
  Covariance matrix is not positive definite
Ver Hoef, J.M., Peterson, E.E., and Theobald, D.M. (2006) Spatial Statistical Models that Use Flow and Stream Distance, Environmental and Ecological Statistics, to appear.
Asymmetric Autocovariance Models for Stream Networks
Weighted asymmetric hydrologic distance (WAHD)
Developed by Jay Ver Hoef, National Marine Mammal Laboratory, Seattle, WA, USA
Moving average models
Incorporate flow volume, flow direction, and use hydrologic distance
Positive definite covariance matrices
Objectives
Evaluate 8 chemical response variables
1. pH measured in the lab (PHLAB)
2. Conductivity (COND) measured in the lab, μmho/cm
3. Dissolved oxygen (DO), mg/l
4. Dissolved organic carbon (DOC), mg/l
5. Nitrate-nitrogen (NO3), mg/l
6. Sulfate (SO4), mg/l
7. Acid neutralizing capacity (ANC), μeq/l
8. Temperature (TEMP), °C
Determine which distance measure is most appropriate
  SLD, SHD, WAHD? More than one?
Find the range of spatial autocorrelation
Maryland Biological Stream Survey (MBSS) Data
Maryland Department of Natural Resources
Maryland, USA
1995, 1996, 1997
Stratified probability-based random survey design
1st, 2nd, and 3rd order non-tidal streams
955 sites
881 sites after pre-processing
17 interbasins
[Map: study area in Maryland, USA, showing Baltimore, Annapolis, Washington D.C., and Chesapeake Bay]
Study Area
Spatial Distribution of MBSS Data
Create data for geostatistical modelling
1. Calculate watershed covariates for each stream segment
2. Calculate separation distances between sites: SLD, SHD, asymmetric hydrologic distance (AHD)
3. Calculate the spatial weights for the WAHD
4. Convert GIS data to a format compatible with statistics software
FLoWS website: http://www.nrel.colostate.edu/projects/starmap
[Figure: three example sites (1, 2, 3) illustrating SLD, SHD, and AHD]
Functional Linkage of Watersheds and Streams (FLoWS)
Spatial Weights for WAHD
Proportional influence (PI): influence of each neighboring survey site on a downstream survey site
  Weighted by catchment area: surrogate for flow volume
1. Calculate the PI of each upstream segment on the segment directly downstream
2. Calculate the PI of one survey site on another site
  Flow-connected sites
  Multiply the segment PIs
[Figure: watershed segments A and B draining into segment C]
Segment PI of A = Watershed Area A / (Watershed Area A + B)
[Figure: stream network with segments labelled A to H, showing survey sites and stream segments]
Site PI = B * D * F * G
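The two PI steps can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names `segment_pi` and `site_pi` and the example areas are hypothetical, and each confluence is assumed to join exactly two upstream segments, as in the A and B example.

```python
import math


def segment_pi(area_upstream, area_other_upstream):
    """PI of one upstream segment on the segment directly downstream.

    Mirrors 'Segment PI of A = Watershed Area A / (Watershed Area A + B)':
    catchment area stands in for flow volume.
    """
    return area_upstream / (area_upstream + area_other_upstream)


def site_pi(segment_pis):
    """PI of one survey site on a flow-connected downstream site.

    Multiplies the segment PIs along the flow path between the two sites.
    """
    return math.prod(segment_pis)


# Hypothetical watershed areas (any consistent unit).
pi_a = segment_pi(area_upstream=30.0, area_other_upstream=10.0)  # 0.75
# Hypothetical segment PIs along a path such as B, D, F, G.
print(pi_a, site_pi([0.5, 0.5, 0.8, 1.0]))
```

Because every segment PI lies between 0 and 1, the site PI can only shrink as more confluences separate two sites.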
Data for Geostatistical Modelling
Distance matrices
SLD, SHD, AHD
Spatial weights matrix
Contains flow dependent weights for WAHD
Watershed covariates
Lumped watershed covariates
Mean elevation, % Urban
Observations
MBSS survey sites
Validation set: unique for each chemical response variable
Initial covariate selection: 5 covariates
Model development: restricted model space to all possible linear models; 4 model sets

Response          Significant Covariates
ANC (μeq/l)       PASTUR, LOWURB, WOODYWET, YR96, YR97
COND (μmho/cm)    HIGHURB, LOWURB, COALMINE, YR96, NORTHING
DOC (mg/l)        WOODYWET, CONIFER, MIXEDFOR, LOWURB, NORTHING
DO (mg/l)         DECIDFOR, HIGHURB, WOODYWET, YR96, YR97
NO3 (mg/l)        PASTUR, PROBCROP, ROWCROP, LOWURB, WATER
pH Lab            PROBCROP, DECIDFOR, WOODYWET, ACREAGE, CONIFER
SO4 (mg/l)        LOWURB, COALMINE, NORTHING, ER67, ER69
TEMP (°C)         PROBCROP, LOWURB, WATER, YR96, YR97
Geostatistical Modeling Methods
Geostatistical model parameter estimation
Maximize the profile log-likelihood function
Geostatistical Modelling Methods
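The log-likelihood function of the parameters $(\theta, \beta, \sigma^2)$ given the observed data $Z$ is:

$$\ell(\theta, \beta, \sigma^2; Z) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2}\log|\Sigma_\theta| - \frac{1}{2\sigma^2}(Z - X\beta)'\Sigma_\theta^{-1}(Z - X\beta)$$

Maximizing the log-likelihood with respect to $\beta$ and $\sigma^2$ yields:

$$\hat\beta = (X'\Sigma_\theta^{-1}X)^{-1}X'\Sigma_\theta^{-1}Z \quad \text{and} \quad \hat\sigma^2 = \frac{(Z - X\hat\beta)'\Sigma_\theta^{-1}(Z - X\hat\beta)}{n}$$

Both maximum likelihood estimators can be written as functions of $\theta$ alone.

Derive the profile log-likelihood function by substituting the MLEs $(\hat\beta, \hat\sigma^2)$ back into the log-likelihood function:

$$\ell_{profile}(\theta; \hat\beta, \hat\sigma^2, Z) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\hat\sigma^2) - \frac{1}{2}\log|\Sigma_\theta| - \frac{n}{2}$$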
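The profile log-likelihood can be evaluated numerically for any candidate correlation matrix. The sketch below is one possible NumPy implementation, not the software used in the study; the function name `profile_log_likelihood` and the toy data are assumptions, and `Sigma` is taken to be the correlation matrix implied by a given set of autocorrelation parameters.

```python
import numpy as np


def profile_log_likelihood(Z, X, Sigma):
    """Return (profile log-likelihood, beta_hat, sigma2_hat).

    Z: (n,) observations; X: (n, p) design matrix;
    Sigma: (n, n) correlation matrix for fixed autocorrelation parameters.
    """
    n = len(Z)
    Sigma_inv_X = np.linalg.solve(Sigma, X)
    Sigma_inv_Z = np.linalg.solve(Sigma, Z)
    # Generalized least squares estimate of the regression coefficients.
    beta_hat = np.linalg.solve(X.T @ Sigma_inv_X, X.T @ Sigma_inv_Z)
    resid = Z - X @ beta_hat
    # Maximum likelihood estimate of the variance (divides by n).
    sigma2_hat = resid @ np.linalg.solve(Sigma, resid) / n
    _, logdet = np.linalg.slogdet(Sigma)
    loglik = (-(n / 2) * np.log(2 * np.pi)
              - (n / 2) * np.log(sigma2_hat)
              - 0.5 * logdet
              - n / 2)
    return loglik, beta_hat, sigma2_hat


# Toy data: with an identity correlation matrix the estimates reduce to
# ordinary least squares.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
Z = np.array([1.0, 3.0, 5.0, 8.0])
print(profile_log_likelihood(Z, X, np.eye(4)))
```

In practice this function would be handed to a numerical optimizer that searches over the autocorrelation parameters, rebuilding `Sigma` at each step.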
Correlation matrix for SLD and SHD models
  Fit exponential autocorrelation function

$$C_1(h; \theta_1, \theta_2) = \begin{cases} 1 & \text{if } h = 0 \\ (1 - \theta_1)\exp(-h/\theta_2) & \text{if } h > 0 \end{cases}$$

where C1 is the correlation based on the distance between two sites, h, given the autocorrelation parameter estimates: nugget (θ0), sill (θ1), and range (θ2).
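A correlation matrix of this kind can be built directly from a matrix of separation distances. The sketch below assumes the exponential form 1 at h = 0 and (1 - theta1) * exp(-h / theta2) otherwise; the names `exponential_correlation`, `theta1`, and `theta2` and the example distances are illustrative assumptions.

```python
import numpy as np


def exponential_correlation(h, theta1, theta2):
    """Exponential autocorrelation for a distance or array of distances.

    Assumed form: 1 where h == 0, (1 - theta1) * exp(-h / theta2) elsewhere.
    """
    h = np.asarray(h, dtype=float)
    return np.where(h == 0, 1.0, (1.0 - theta1) * np.exp(-h / theta2))


# Hypothetical pairwise distances (km) between three sites.
distances = np.array([[0.0, 10.0, 40.0],
                      [10.0, 0.0, 30.0],
                      [40.0, 30.0, 0.0]])
print(exponential_correlation(distances, theta1=0.2, theta2=10.0))
```

The same function applies to SLD and SHD; only the distance matrix passed in changes.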
Geostatistical Modeling Methods
Correlation matrix for WAHD model
  Fit exponential autocorrelation function (C1)
  Hadamard (element-wise) product of C1 and the square root of the spatial weights matrix forced into symmetry, $\prod_{j \in B_D} \sqrt{w_j}$

$$C(s_i, s_j \mid \theta) = \begin{cases} 0 & \text{if locations are not flow connected,} \\ C_1(0) & \text{if location 1 = location 2,} \\ \prod_{j \in B_D} \sqrt{w_j}\; C_1(h) & \text{otherwise.} \end{cases}$$
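The Hadamard-product step can be sketched as follows. Several details here are assumptions rather than the authors' implementation: the function name `wahd_correlation`, the convention that `W[i, j]` holds the site PI when site i is upstream of and flow-connected to site j (0 otherwise), and the use of an element-wise maximum with the transpose to force symmetry.

```python
import numpy as np


def wahd_correlation(C1, W):
    """Element-wise product of C1 and the square root of symmetrized weights.

    C1: (n, n) exponential correlation matrix.
    W:  (n, n) spatial weights, nonzero only for flow-connected pairs
        (assumed stored in the upstream-to-downstream direction).
    """
    C1 = np.asarray(C1, dtype=float)
    W = np.asarray(W, dtype=float)
    W_sym = np.maximum(W, W.T)      # assumption: symmetrize by copying across
    C = C1 * np.sqrt(W_sym)         # zero weight gives zero correlation
    np.fill_diagonal(C, np.diag(C1))  # a site with itself: C1(0)
    return C


C1 = np.array([[1.0, 0.5, 0.5],
               [0.5, 1.0, 0.5],
               [0.5, 0.5, 1.0]])
W = np.zeros((3, 3))
W[0, 2] = 0.25  # hypothetical: site 0 flows to site 2
W[1, 2] = 0.64  # hypothetical: site 1 flows to site 2
print(wahd_correlation(C1, W))
```

Sites 0 and 1 sit on different branches, so their weight is zero and the resulting correlation is zero, matching the "not flow connected" case.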
Geostatistical Modeling Methods
Model selection between model types
  100 predictions: universal kriging algorithm
  Mean square prediction error (MSPE)
  Cannot use AICC to compare models based on different distance measures
Model comparison
  r2 for observed vs. predicted values
Model selection within model set
  GLM: corrected Akaike Information Criterion (AICC)
  Geostatistical models: spatial AICC (Hoeting et al., in press)
$$AICC = -2\,\ell_{profile}(\theta; \hat\beta, \hat\sigma^2, Z) + 2n\,\frac{p + k + 1}{n - p - k - 2}$$
where n is the number of observations, p-1 is the number of covariates, and k is the number of autocorrelation parameters.
http://www.stat.colostate.edu/~jah/papers/spavarsel.pdf
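For reference, the spatial AICC is a one-line calculation once the profile log-likelihood is known. The sketch below assumes the form AICC = -2 * profile log-likelihood + 2n(p + k + 1)/(n - p - k - 2); the function name `spatial_aicc` and the example numbers are hypothetical.

```python
def spatial_aicc(profile_loglik, n, p, k):
    """Spatial AICC from a maximized profile log-likelihood.

    n: number of observations
    p: number of regression coefficients (p - 1 covariates plus intercept)
    k: number of autocorrelation parameters
    """
    denom = n - p - k - 2
    if denom <= 0:
        raise ValueError("too few observations for this many parameters")
    return -2.0 * profile_loglik + 2.0 * n * (p + k + 1) / denom


# Hypothetical comparison of two models fitted with the same distance measure.
print(spatial_aicc(-100.0, n=50, p=6, k=2))  # 222.5
print(spatial_aicc(-101.0, n=50, p=4, k=2))
```

Lower values indicate the preferred model within a set that shares one distance measure.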
Results
Summary statistics for distance measures
  Spatial neighborhood differs
  Affects number of neighboring sites
  Affects median, mean, and maximum separation distance

Summary statistics for distance measures in kilometers using DO (n=826).

Distance Measure                        N Pairs   Min    Median   Mean     Max
Straight Line Distance                  340725    0.05   101.02   118.16   385.53
Symmetric Hydrologic Distance           62625     0.05   156.29   187.10   611.74
Pure Asymmetric Hydrologic Distance *   1117      0.05   4.49     5.83     27.44

* Asymmetric hydrologic distance is not weighted here
Results
[Figure: range of spatial autocorrelation (km) for the SLD, SHD, and WAHD models, by response variable (ANC, COND, DOC, DO, NO3, PHLAB, SO4, TEMP); the axis runs from 0 to 100 km and the values 180.79 and 301.76 are annotated on the plot]
Range of spatial autocorrelation differs
  Shortest for SLD
  TEMP = shortest range values
  DO = largest range values
Mean range values
  SLD = 28.2 km
  SHD = 88.03 km
  WAHD = 57.8 km
[Figure: mean square prediction error (MSPE) of the GLM, SLD, SHD, and WAHD models; one panel per response variable: ANC, COND, DOC, DO, NO3, SO4, TEMP, PHLAB]
Distance measures
  GLM always has less predictive ability
  More than one distance measure usually performed well
    SLD, SHD, WAHD: PHLAB & DOC
    SLD and SHD: ANC, DO, NO3
    WAHD & SHD: COND, TEMP
  SLD distance: SO4
Results
[Figure: r2 for observed vs. predicted values, by response variable (ANC, COND, DOC, DO, NO3, PHLAB, SO4, TEMP), for the GLM, SLD, SHD, and WAHD models]
Predictive ability of models (r2)
  Strong: ANC, COND, DOC, NO3, PHLAB
  Weak: DO, TEMP, SO4
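Both comparison statistics are simple to compute from a validation set of observed and predicted values. The sketch below is illustrative only: the function names `mspe` and `r_squared` are hypothetical, and r2 is assumed to be the squared Pearson correlation between observed and predicted values.

```python
import numpy as np


def mspe(observed, predicted):
    """Mean square prediction error over the validation sites."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((observed - predicted) ** 2))


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values
    (an assumed definition of the r2 reported for model comparison)."""
    r = np.corrcoef(observed, predicted)[0, 1]
    return float(r ** 2)


# Hypothetical validation values for one response variable.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.5, 3.5, 4.5]
print(mspe(obs, pred), r_squared(obs, pred))
```

The toy example shows why both are reported: predictions that are uniformly biased can still have a perfect r2 while the MSPE is nonzero.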
Discussion
Distance measure influences how spatial relationships are represented in a stream network
  Site’s relative influence on other sites
  Dictates form and size of spatial neighborhood
Important because…
  Impacts accuracy of the geostatistical model predictions
[Figure: spatial neighborhoods under SLD, SHD, and WAHD]
Geostatistical models describe more variability than GLM
Patterns of spatial autocorrelation found at relatively coarse scale
More than one distance measure performed well
  SLD never substantially inferior
  Do not represent movement through network
Different range of spatial autocorrelation?
  Larger SHD and WAHD range values
  Separation distance larger when restricted to network
SLD, SHD, and WAHD represent spatial autocorrelation in continuous coarse-scale variables
Discussion
Probability-based random survey design (-) affected WAHD
  Maximize spatial independence of sites
  Does not represent spatial relationships in networks
  Validation sites randomly selected
[Figure: histogram of the number of neighboring sites per survey site (x-axis: number of neighboring sites, 0 to 17; y-axis: frequency)]
244 sites did not have neighbors
Sample size = 881
Number of sites with ≤1 neighbor: 393
Mean number of neighbors per site: 2.81
Discussion
[Figure: difference (O – E) against number of neighboring sites (0 to 17) for the WAHD and GLM models]
WAHD models explained more variability as neighboring sites increased
  Not when neighbors had:
    Similar watershed conditions
    Significantly different chemical response values
Discussion
GLM predictions improved as number of neighbors increased
  Clusters of sites in space have similar watershed conditions
    Statistical regression pulled towards the cluster
  GLM contained hidden spatial information
    Explained additional variability in data with more neighbors
Predictive Ability of Geostatistical Models
[Figure: response variables (PH, ANC, NO3, COND, DOC, SO4, DO, TEMP) plotted by r2 (x-axis, 0 to 1.0) and by the scale of unknown influential processes (y-axis, coarse to fine)]
Conclusions
1) Spatial autocorrelation exists in stream chemistry data at a relatively coarse scale
2) Geostatistical models improve the accuracy of water chemistry predictions
3) Patterns of spatial autocorrelation differ between chemical response variables
  Ecological processes acting at different spatial scales affect conditions at the survey site
4) SLD is the most suitable distance measure in Maryland for these chemical response variables at this time
  Unsuitable survey designs
  SHD: GIS processing time is prohibitive
Conclusions
5) Results are scale specific
  Spatial patterns change with survey scale
  Other patterns may emerge at shorter separation distances
6) Further research is needed at finer scales
  Watershed or small stream network
Demonstrate how a geostatistical methodology can be used to complement regional water quality monitoring efforts
1) Predict regional water quality conditions
2) Identify the spatial location of potentially impaired stream segments
Visualization of Model Predictions
MBSS 1996 DOC
[Map: MBSS 1996 DOC observations, with north arrow and 0 to 20 km scale bar]
n     Min   1st Qu.   Median   Mean   3rd Qu.   Max    σ²
312   0.6   1.2       1.7      1.9    2.7       15.9   1.8
Spatial Patterns in Model Fit
Squared Prediction Error (SPE)
Generate Model Predictions
Prediction sites
  Study area
    1st, 2nd, and 3rd order non-tidal streams
    3083 segments = 5973 stream km
  ID downstream node of each segment
    Create prediction site
  More than one site at each confluence
Generate predictions and prediction variances
  SLD Mariah model
  Universal kriging algorithm
  Assigned predictions and prediction variances back to stream segments in GIS
DOC Predictions (mg/l)
Weak Model Fit
Strong Model Fit
Water Quality Attainment by Stream Kilometres
Threshold values for DOC
  Set by Maryland Department of Natural Resources
  High DOC values may indicate biological or ecological stress

Threshold   DOC (mg/l)   Stream Kilometers   Percent
Low         < 5.0        5387.67             90.2
Medium      5.0 - 8.0    400.19              6.7
High        > 8.0        185.16              3.1
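A summary like the attainment table can be produced by classifying each segment's predicted DOC and summing segment lengths per class. The sketch below is a hypothetical illustration: the function names, the input layout of (predicted DOC, segment length) pairs, and the example segments are assumptions; only the threshold values come from the table.

```python
def doc_class(doc_mg_l):
    """Classify a predicted DOC value (mg/l) using the table's thresholds."""
    if doc_mg_l < 5.0:
        return "Low"
    if doc_mg_l <= 8.0:
        return "Medium"
    return "High"


def attainment_by_length(segments):
    """Sum stream kilometers per DOC class and express them as percentages.

    segments: iterable of (predicted DOC in mg/l, segment length in km).
    Returns {class: (kilometers, percent of total length)}.
    """
    km = {"Low": 0.0, "Medium": 0.0, "High": 0.0}
    for doc, length_km in segments:
        km[doc_class(doc)] += length_km
    total = sum(km.values())
    if total == 0:
        raise ValueError("no stream length supplied")
    return {name: (length, 100.0 * length / total) for name, length in km.items()}


# Hypothetical segments: (predicted DOC, length in km).
print(attainment_by_length([(2.0, 10.0), (6.0, 5.0), (9.0, 5.0)]))
```

Weighting by length rather than counting segments matters because stream segments differ widely in length.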
Different ways to capture spatial information
1) Geostatistical models
Attempt to explain spatial relationship between response variables
May represent another ecological process that is affecting them
2) Spatial location of covariates
Does the spatial location of landuse within the watershed affect the response?
Does the spatial configuration of landuse affect the response?
3) Stream network configuration and connectivity
How does the configuration of the network affect the response?
Are stream segments within one network really connected?
Current and Future Research in SEQ
$$Y(s) = \mu(s) + \int_r K(|u - s|)\,\frac{\omega(u)}{\omega(s)}\,x(u)\,du$$
where
  μ(s): mean, constant here but might incorporate other covariates
  ω(u)/ω(s): weight function for relative stream orders or watershed areas
  x(u): independent Gaussian process
  K: kernel function, governs spatial dependence
  |u - s| = river distance d
Covariance Matched Constrained Kriging (CMCK)
Geostatistical Models
Cressie, N., Frey, J., Harch, B., and Smith, M.: 2006, ‘Spatial Prediction on a River Network’, Journal of Agricultural, Biological, and Environmental Statistics, to appear.
Covariance Matched Constrained Kriging (CMCK)
Combination of distance measures
A
B
C
Geostatistical Models
Fish
Invertebrates
Develop geostatistical models
Individual indices and multivariate indicators
Physical/Chemical
Nutrients
Ecosystem Processes
Determine which distance measure(s) to use
One distance measure: SLD, SHD, WAHD
More than one distance measure: CMCK (covariance matched constrained kriging)
Based on statistical evidence, ecological expertise, and survey design
Make model predictions
Geostatistical Models and the EHMP
Spatial Location of Watershed Attributes
Lumped non-spatial watershed attributes
Covariate   Description                                                               Resolution or scale
AREA        Catchment area (ha)                                                       30 meter
URBAN       % Urban                                                                   30 meter
BARREN      % Barren                                                                  30 meter
WATER       % Open Water                                                              30 meter
CONIFER     % Conifer or evergreen forest type                                        30 meter
DECIDFOR    % Deciduous forest type                                                   30 meter
MIXEDFOR    % Mixed forest type                                                       30 meter
EMERGWET    % Emergent Herbaceous Wetlands                                            30 meter
WOODYWET    % Woody or shrubby wetlands                                               30 meter
COALMINE    % Coalmine                                                                30 meter
EASTING     Easting - Albers Equal Area Conic                                         1 foot
NORTHING    Northing - Albers Equal Area Conic                                        1 foot
ER63-ER69   Omernik's Level III Ecoregion                                             1:7,500,000
MEANELEV    Mean elevation in the watershed                                           30 meter
SLOPE       Mean slope in the watershed                                               30 meter
ARGPERC     % Argillaceous rock type in watershed                                     1:250,000
CARPERC     % Carbonic rock type in watershed                                         1:250,000
FELPERC     % Felsic rock type in watershed                                           1:250,000
MAFPERC     % Mafic rock type in watershed                                            1:250,000
SILPERC     % Siliceous rock type in watershed                                        1:250,000
MEANK       Mean soil erodibility factor in watershed (adjusted for rock fragments)   1 kilometer
MAXTEMP     Mean annual maximum temperature (°C)                                      4 kilometer
MINTEMP     Mean minimum temperature for January - April (°C)                         4 kilometer
PRECIP      Mean precipitation for January - April (mm)                               4 kilometer
ANPRECIP    Mean annual precipitation                                                 4 kilometer
Buffer streams using straight-line distance
Straight-line distance from stream outlet
Overland hydrologic distance + instream distance to stream outlet
Overland hydrologic distance to stream
Spatial Location of Watershed Attributes
How large or small are patches of landuse?
How complex is the shape?
Is landuse clumped or dissected?
Is landuse adjacent to stream?
Spatial Configuration of Watershed Attributes
Network Configuration
Network Connectivity
[Figure: stream network showing survey sites and barriers]
Represent connectivity on a regional scale
Network Connectivity
Define individual networks
Network Connectivity
Measure network size and complexity
Network Configuration and Connectivity
www.csiro.au
Questions? Comments?
Erin E. Peterson
Phone: +61 7 3214 2914
Email: Erin.Peterson@csiro.au