Post on 25-Feb-2016
Predicting Water Quality Impaired Stream Segments using Landscape-scale Data and a Regional Geostatistical Model
Erin Peterson
Geosciences Department
Colorado State University
Fort Collins, Colorado
The work reported here was developed under STAR Research Assistance Agreement CR-829095 awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This presentation has not been formally reviewed by EPA. EPA does not endorse any products or commercial services mentioned in this presentation.
Space-Time Aquatic Resources Modeling and Analysis Program
Overview
Introduction
Background
Patterns of spatial autocorrelation in stream water chemistry
Predicting water quality impaired stream segments using landscape-scale data and a regional geostatistical model: A case study in Maryland
The Clean Water Act (CWA) 1972
Section 303(d)
Requires states and tribes to ID water quality impaired stream segments
Section 305(b)
Create a biennial water quality inventory
Characterizes regional water quality
Based on attainment of designated-use standards assigned to individual stream segments
Probability-based Random Survey Designs
Used to meet Section 305(b) requirements
Derive a regional estimate of stream condition
Assign a weight based on stream order
Provides representative sample of streams by order
Statistical inference about population of streams, within stream order, over large area
Reported in stream miles based on inference of attainment
Disadvantages
Does not take watershed influence into account
Does not ID spatial location of impaired stream segments
Fails to meet requirements of CWA Section 303(d)
Purpose
Develop a geostatistical methodology based on coarse-scale GIS data and field surveys that can be used to predict water quality characteristics of stream segments found throughout a large geographic area (e.g., state)
Geostatistical Modeling
a.k.a. Kriging
Interpolation method
Allows spatial autocorrelation in error term
More accurate predictions
Fit an autocovariance function to data
Describes relationship between observations based on separation distance
3 Autocovariance Parameters
Nugget: variation between sites as separation distance approaches zero
Sill: value at which the semivariance levels off (reaches its asymptote)
Range: distance within which spatial autocorrelation occurs
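The three parameters above can be illustrated with a small sketch of an exponential autocovariance model. This is only one common parameterization, chosen here as an assumption; the function names, the use of a partial sill (sill minus nugget), and the example values are illustrative and do not come from the presentation.

```python
import math

def exponential_covariance(h, nugget, partial_sill, range_param):
    """Covariance between two sites separated by distance h.

    Assumed parameterization: sill = nugget + partial_sill, and the
    correlation decays as exp(-h / range_param).
    """
    if h == 0:
        return nugget + partial_sill
    return partial_sill * math.exp(-h / range_param)

def exponential_semivariance(h, nugget, partial_sill, range_param):
    """Semivariance = sill minus covariance at the same distance."""
    sill = nugget + partial_sill
    return sill - exponential_covariance(h, nugget, partial_sill, range_param)

# Hypothetical parameter values, for illustration only.
for h in (0.0, 0.001, 10.0, 1000.0):
    print(h, round(exponential_semivariance(h, 0.2, 0.8, 10.0), 4))
```

In this sketch the semivariance is zero at h = 0, jumps to roughly the nugget just above zero, and approaches the sill at large separation distances.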
Distance Measures & Spatial Relationships
Straight-line Distance (SLD)
Geostatistical models typically based on SLD
Distances and relationships are represented differently depending on the distance measure
Symmetric Hydrologic Distance (SHD)
Hydrologic connectivity: fish movement
Asymmetric Hydrologic Distance (AHD)
Longitudinal transport of material
Challenge: Spatial autocovariance models developed for SLD may not be valid for hydrologic distances
The resulting covariance matrix may not be positive definite
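To make the three distance measures concrete, here is a minimal sketch on a toy stream network. The network, coordinates, and function names are hypothetical, and the definition of AHD used here (defined only when water flows from the first site to the second) is an assumption based on the slide text.

```python
import math

# Hypothetical toy network: node -> (next node downstream, stream km to it).
DOWNSTREAM = {
    "A": ("C", 3.0),
    "B": ("C", 4.0),
    "C": ("D", 2.0),
    "D": (None, 0.0),  # outlet
}
# Hypothetical planar coordinates (km) for each node.
COORDS = {"A": (0.0, 4.0), "B": (3.0, 4.0), "C": (1.0, 1.5), "D": (1.0, 0.0)}

def path_to_outlet(node):
    """Map each node on the downstream path to its stream distance from `node`."""
    dist, out = 0.0, {}
    while node is not None:
        out[node] = dist
        nxt, length = DOWNSTREAM[node]
        dist += length
        node = nxt
    return out

def sld(a, b):
    """Straight-line distance, ignoring the network."""
    return math.dist(COORDS[a], COORDS[b])

def shd(a, b):
    """Symmetric hydrologic distance: shortest path along the network."""
    pa, pb = path_to_outlet(a), path_to_outlet(b)
    return min(pa[n] + pb[n] for n in pa if n in pb)

def ahd(upstream, downstream):
    """Asymmetric hydrologic distance: None unless flow connects the pair."""
    return path_to_outlet(upstream).get(downstream)

print(sld("A", "B"), shd("A", "B"), ahd("A", "B"))  # A and B are not flow-connected
print(sld("A", "D"), shd("A", "D"), ahd("A", "D"))
```

Sites A and B are close in straight-line terms, farther apart along the network, and have no asymmetric hydrologic distance at all because neither drains to the other.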
Asymmetric Autocovariance Models for Stream Networks
Weighted asymmetric hydrologic distance (WAHD)
Developed by Jay Ver Hoef, National Marine Mammal Laboratory, Seattle
Moving average models
Incorporate flow volume, flow direction, and use hydrologic distance
Positive definite covariance matrices
Ver Hoef, J.M., Peterson, E.E., and Theobald, D.M., Spatial Statistical Models that Use Flow and Stream Distance, Environmental and Ecological Statistics. In Press.
Patterns of Spatial Autocorrelation in Stream Water Chemistry
Objectives
Evaluate 8 chemical response variables
pH measured in the lab (PHLAB)
Conductivity (COND) measured in the lab, μmho/cm
Dissolved oxygen (DO), mg/l
Dissolved organic carbon (DOC), mg/l
Nitrate-nitrogen (NO3), mg/l
Sulfate (SO4), mg/l
Acid neutralizing capacity (ANC), μeq/l
Temperature (TEMP), °C
Determine which distance measure is most appropriate
SLD, SHD, WAHD, or more than one?
Find the range of spatial autocorrelation
Dataset
Maryland Biological Stream Survey (MBSS) Data
Maryland Department of Natural Resources 1995, 1996, 1997
Stratified probability-based random survey design
881 sites in 17 interbasins
Spatial Distribution of MBSS Data
GIS Tools
Automated tools needed to extract data about hydrologic relationships between survey sites did not exist!
Wrote Visual Basic for Applications (VBA) programs to:
Calculate watershed covariates for each stream segment
Functional Linkage of Watersheds and Streams (FLoWS)
Calculate separation distances between sites
SLD, SHD, Asymmetric hydrologic distance (AHD)
Calculate the spatial weights for the WAHD
Convert GIS data to a format compatible with statistics software
FLoWS tools will be available on the STARMAP website: http://nrel.colostate.edu/projects/starmap
Spatial Weights for WAHD
Proportional influence (PI): influence of each neighboring survey site on a downstream survey site
Weighted by catchment area: surrogate for flow volume
[Figure: example stream network showing stream segments and survey sites labeled A through H]
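A rough sketch of how proportional influence might be computed from catchment areas follows. The details are assumptions rather than the FLoWS implementation: each segment's PI is taken as its share of the catchment area entering its downstream confluence, the PI between two flow-connected segments is the product of segment PIs along the path, and the spatial weight is the square root of that product. The network and area values are hypothetical.

```python
import math

# Hypothetical network: segment -> (downstream segment, catchment area in km^2).
SEGMENTS = {
    "s1": ("s3", 30.0),
    "s2": ("s3", 10.0),
    "s3": ("s5", 45.0),
    "s4": ("s5", 15.0),
    "s5": (None, 65.0),  # outlet segment
}

def segment_pi(seg):
    """Share of catchment area this segment contributes at its downstream confluence."""
    down, area = SEGMENTS[seg]
    if down is None:
        return 1.0
    total = sum(a for d, a in SEGMENTS.values() if d == down)
    return area / total

def proportional_influence(upstream_seg, downstream_seg):
    """Product of segment PIs along the flow path; 0.0 if not flow-connected."""
    pi, seg = 1.0, upstream_seg
    while seg is not None and seg != downstream_seg:
        pi *= segment_pi(seg)
        seg = SEGMENTS[seg][0]
    return pi if seg == downstream_seg else 0.0

def spatial_weight(upstream_seg, downstream_seg):
    """Assumed WAHD weight: square root of the proportional influence."""
    return math.sqrt(proportional_influence(upstream_seg, downstream_seg))

print(proportional_influence("s1", "s5"), spatial_weight("s1", "s5"))
print(proportional_influence("s1", "s2"))  # not flow-connected
```

With this construction the PIs of all headwater segments draining to the outlet segment sum to one, which is the property that lets catchment area act as a surrogate for flow volume.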
Data for Geostatistical Modeling
Distance matrices
SLD, SHD, AHD
Spatial weights matrix
Contains flow-dependent weights for WAHD
Watershed covariates
Lumped watershed covariates
Mean elevation, % Urban
Observations
MBSS survey sites
Geostatistical Modeling Methods
Validation Set
Unique for each chemical response variable
100 sites
Initial Covariate Selection
Reduce covariates to 5
Model Development
Restricted model space to all possible linear models
Model set = 32 models (2^5 models)
One model set for:
General linear model (GLM), SLD, SHD, and WAHD models
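The size of the model set follows from treating every subset of the covariates as a candidate linear model. A minimal sketch, using placeholder covariate names that are not from the presentation:

```python
from itertools import combinations

def all_linear_models(covariates):
    """Every subset of covariates, including the intercept-only model."""
    return [
        subset
        for size in range(len(covariates) + 1)
        for subset in combinations(covariates, size)
    ]

# Placeholder covariate names, for illustration only.
five = ["x1", "x2", "x3", "x4", "x5"]
models = all_linear_models(five)
print(len(models))   # number of candidate models for 5 covariates
print(models[:3])    # intercept-only model first, then single-covariate models
```

Five covariates give 2^5 = 32 candidate models; the same enumeration over ten covariates gives 2^10 = 1024.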
Geostatistical Modeling Methods
Geostatistical model parameter estimation
Maximize the profile log-likelihood function
Fit exponential autocorrelation function
Model selection within model set
GLM: corrected Akaike Information Criterion (AICC)
Geostatistical models: Spatial AICC (Hoeting et al., in press)
Geostatistical Modeling Methods
[Equation: spatial AICC]
where n is the number of observations, p-1 is the number of covariates, and k is the number of autocorrelation parameters.
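The equation itself is not present in this text. As a hedged sketch, the function below uses the form commonly attributed to Hoeting et al., AICC = -2 log L + 2n(p + k + 1)/(n - p - k - 2), with n, p, and k as defined above. Treat the exact formula as an assumption and check it against the paper linked below; the function name and example numbers are illustrative.

```python
def spatial_aicc(log_likelihood, n, p, k):
    """Assumed spatial AICC: -2 log L + 2n(p + k + 1) / (n - p - k - 2).

    n: number of observations
    p: number of regression coefficients (p - 1 covariates plus an intercept)
    k: number of autocorrelation parameters
    """
    denom = n - p - k - 2
    if denom <= 0:
        raise ValueError("too few observations for this many parameters")
    return -2.0 * log_likelihood + 2.0 * n * (p + k + 1) / denom

# Hypothetical values: two models fit to the same 100 observations.
print(spatial_aicc(-100.0, n=100, p=3, k=3))
print(spatial_aicc(-99.5, n=100, p=6, k=3))
```

Lower values are preferred, so the second model's small gain in log-likelihood does not offset its extra covariates in this made-up comparison.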
http://www.stat.colostate.edu/~jah/papers/spavarsel.pdf
Geostatistical Modeling Methods
Model selection between model types
100 predictions: universal kriging algorithm
Mean square prediction error (MSPE)
Cannot use AICC to compare models based on different distance measures
Model comparison: r² for observed vs. predicted values
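Both comparison statistics are simple to compute from the validation-set predictions. The sketch below assumes r² means the squared Pearson correlation between observed and predicted values, which the slides do not state explicitly; the data are made up.

```python
def mspe(observed, predicted):
    """Mean square prediction error over the validation sites."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    sxy = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    sxx = sum((o - mean_o) ** 2 for o in observed)
    syy = sum((p - mean_p) ** 2 for p in predicted)
    return sxy ** 2 / (sxx * syy)

# Made-up validation data, for illustration only.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
print(mspe(obs, pred), r_squared(obs, pred))
```

Unlike AICC, both statistics depend only on the predictions, so they can be compared across models built on different distance measures.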
Results
Summary statistics for distance measures
Spatial neighborhood differs
Affects number of neighboring sites
Affects median, mean, and maximum separation distance
Results
Range of spatial autocorrelation differs:
Shortest for SLD
TEMP = shortest range values
DO = largest range values
Mean Range Values
SLD = 28.2 km
SHD = 88.03 km
WAHD = 57.8 km
Results
Distance Measures:
GLM always has less predictive ability
More than one distance measure usually performed well
SLD, SHD, WAHD: PHLAB & DOC
SLD and SHD: ANC, DO, NO3
WAHD & SHD: COND, TEMP
SLD: SO4
Results
Predictive ability of models (r²):
Strong: ANC, COND, DOC, NO3, PHLAB
Weak: DO, TEMP, SO4
Discussion
Distance measure influences how spatial relationships are represented in a stream network
Site's relative influence on other sites
Dictates form and size of spatial neighborhood
Important because
Impacts accuracy of the geostatistical model predictions
Discussion
Probability-based random survey design negatively affected WAHD
Maximize spatial independence of sites
Does not represent spatial relationships in networks
Validation sites randomly selected
Discussion
WAHD models explained more variability as neighboring sites increased
Not when neighbors had:
Similar watershed conditions
Significantly different chemical response values
GLM predictions improved as number of neighbors increased
Clusters of sites in space have similar watershed conditions
Statistical regression pulled towards the cluster
GLM contained hidden spatial information
Explained additional variability in data with more neighbors
Predictive Ability of Geostatistical Models (r²)
Conclusions
Spatial autocorrelation exists in stream chemistry data at a relatively coarse scale
Geostatistical models improve the accuracy of water chemistry predictions
Patterns of spatial autocorrelation differ between chemical response variables
Ecological processes acting at different spatial scales
SLD is the most suitable distance measure at regional scale at this time
Unsuitable survey designs
SHD: GIS processing time is prohibitive
Conclusions
Results are scale specific
Spatial patterns change with survey scale
Other patterns may emerge at shorter separation distances
Further research is needed at finer scales
Watershed or small stream network
Need new survey designs for stream networks
Capture both coarse and fine scale variation
Ensure that hydrologic neighborhoods are represented
Predicting Water Quality Impaired Stream Segments using Landscape-scale Data and a Regional Geostatistical Model: A Case Study In Maryland
Objective
Demonstrate how a geostatistical methodology can be used to meet the requirements of the Clean Water Act
Predict regional water quality conditions
ID the spatial location of potentially impaired stream segments
Methods
Potential covariates
Methods
Potential covariates after initial model selection (10)
Methods
Fit geostatistical models
Two distance measures: SLD and WAHD
Restricted model space to all possible linear models
1024 models per set (2^10 models)
Parameter Estimation
Maximized the profile log-likelihood function
Results
SLD models performed better than WAHD
Exception: Spherical model
Best models:
SLD Exponential, Mariah, and Rational Quadratic models
r² for SLD model predictions
Almost identical
Further analysis restricted to SLD Mariah model
Results
Covariates for SLD Mariah model:
WATER, EMERGWET, WOODYWET, FELPERC, & MINTEMP
Positive relationship with DOC:
WATER, EMERGWET, WOODYWET, MINTEMP
Negative relationship with DOC:
FELPERC
Cross-validation intervals for Mariah model regression coefficients
Cross-validation interval: 95% of regression coefficients produced by leave-one-out cross-validation procedure
Narrow intervals
Few extreme regression coefficient values
Not produced by common sites
Covariate values for the site are represented in observed data
Not clustered in space
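The cross-validation interval idea can be sketched as follows: drop one site at a time, refit, and take the central 95% of the resulting coefficient estimates. To stay short, this sketch refits an ordinary least squares slope for a single covariate rather than the geostatistical model used in the study; the data and function names are hypothetical.

```python
import statistics

def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def loo_interval(xs, ys):
    """Leave-one-out slopes and their central 95% interval."""
    slopes = [
        ols_slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        for i in range(len(xs))
    ]
    cuts = statistics.quantiles(slopes, n=40)  # cut points every 2.5%
    return slopes, (cuts[0], cuts[-1])

# Hypothetical data with one unusual site at the end.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.0, 25.0]
slopes, (low, high) = loo_interval(xs, ys)
print(low, high)
print(min(slopes), max(slopes))
```

A slope that falls outside the interval points to the site whose removal produced it, which is how influential sites can be traced back and inspected.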
Model Fit
r² Observed vs. Predicted Values
n = 312 sites
r² = 0.72
1 influential site
r² without site = 0.66
Discussion
SLD models more accurate than WAHD models
Landscape-scale covariates were not restricted to watershed boundaries
Geology type
Temperature
Wetlands & water
Discussion
Regression Coefficients
Narrow cross-validation intervals
Spatial location of the sites not as important as watershed characteristics
Extreme regression coefficient values
Not produced by common sites
Not clustered in space
Local-scale factor may have affected stream DOC
Point source of organic waste
Spatial Patterns in Model Fit
North and east of Chesapeake Bay: large square prediction error (SPE) values
Naturally acidic blackwater streams with elevated DOC
Not well represented in observed dataset
2 blackwater sites
Geostatistical model unable to account for natural variability
Large square prediction errors
Large prediction variances
Spatial Patterns in Model Fit
West of Chesapeake Bay: low SPE values
Due to statistical and spatial distribution of observed data
Regression equation fit to the mean in the data
Most observed sites = low DOC values
Less variation in western and central Maryland
Neighboring sites tend to be similar
Separation distances shorter in the west
Short separation distances = stronger covariances
Model Performance
Unable to account for abrupt differences in DOC values between neighboring sites with similar watershed conditions
What caused abrupt differences?
Point sources of organic pollution
Not represented in the model
Non-point sources of pollution
Lumped watershed attributes are non-spatial
Differences due to spatial location of land use are not represented
Challenging to represent ecological processes using coarse-scale lumped attributes
e.g., flow path of water
Generate Model Predictions
Prediction sites
Study area
1st, 2nd, and 3rd order non-tidal streams
3083 segments = 5973 stream km
ID downstream node of each segment
Create prediction site
More than one site at each confluence
Generate predictions and prediction variances
SLD Mariah model
Universal kriging algorithm
Assigned predictions and prediction variances back to stream segments in GIS
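A compact sketch of universal kriging at one prediction site is given below, written with the standard library only. It is not the study's implementation: it assumes an exponential covariance on straight-line distance instead of the Mariah model, takes the covariance parameters as known rather than estimated, and uses made-up sites and covariate values.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def exp_cov(h, sill, range_param):
    """Assumed exponential covariance with no nugget."""
    return sill * math.exp(-h / range_param)

def universal_krige(coords, X, z, coord0, x0, sill, range_param):
    """Return (prediction, prediction variance) at one unsampled location."""
    n, p = len(z), len(x0)
    cov = lambda a, b: exp_cov(math.dist(a, b), sill, range_param)
    # Augmented kriging system: [[Sigma, X], [X^T, 0]] [lambda; m] = [c; x0]
    A = [[cov(coords[i], coords[j]) for j in range(n)] + list(X[i]) for i in range(n)]
    A += [[X[i][j] for i in range(n)] + [0.0] * p for j in range(p)]
    c = [cov(coords[i], coord0) for i in range(n)]
    sol = solve(A, c + list(x0))
    lam, mult = sol[:n], sol[n:]
    pred = sum(l * zi for l, zi in zip(lam, z))
    var = sill - sum(l * ci for l, ci in zip(lam, c)) - sum(mi * xi for mi, xi in zip(mult, x0))
    return pred, var

# Made-up observed sites: coordinates (km), design rows [intercept, covariate], DOC values.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 3.0]]
z = [2.0, 2.5, 3.1, 4.0]
print(universal_krige(coords, X, z, (1.0, 1.0), [1.0, 1.2], sill=1.0, range_param=2.0))
```

The prediction variance returned alongside each prediction is the quantity that would be mapped back to stream segments to show where the model is least certain.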
Weak Model Fit
Strong Model Fit
Water Quality Attainment by Stream Kilometers
Threshold values for DOC
Set by Maryland Department of Natural Resources
High DOC values may indicate biological or ecological stress
Implications for Water Quality Monitoring
Tradeoff between cost-efficiency and model accuracy
Western Maryland
Can be described using a single geostatistical model
Eastern and northeastern Maryland
Accept poor model fit
Collect additional survey data for regional geostatistical model
Develop a separate geostatistical model for eastern Maryland
Implications for Water Quality Monitoring
Apply this methodology to other regulated constituents
Technical and Regulatory Services Administration within the MDE is modifying the NHD
Include water quality standards & stream-use designations by NHD segment
Use water quality standards instead of thresholds
Categorize predictions into potentially impaired or unimpaired status
Report on attainment in stream miles/kilometers
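The last two steps, categorizing predictions and reporting attainment by stream length, could look like the sketch below. The threshold value and segment data are hypothetical; the actual DOC thresholds set by the Maryland Department of Natural Resources are not given in this text.

```python
def attainment_by_km(segments, threshold):
    """Sum stream kilometers by status.

    segments: iterable of (length_km, predicted_value) pairs.
    A segment counts as potentially impaired when its prediction exceeds the threshold.
    """
    summary = {"unimpaired": 0.0, "potentially impaired": 0.0}
    for length_km, predicted in segments:
        status = "potentially impaired" if predicted > threshold else "unimpaired"
        summary[status] += length_km
    return summary

# Hypothetical segments: (length in km, predicted DOC in mg/l).
segments = [(1.5, 2.0), (2.0, 9.5), (0.5, 8.0)]
print(attainment_by_km(segments, threshold=8.0))
```

Swapping the threshold for a segment-specific water quality standard would only require passing a per-segment limit instead of a single number.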
Conclusions
Geostatistical models generated more accurate DOC predictions than previous non-spatial models based on coarse-scale landscape data
SLD is more appropriate than WAHD for regional geostatistical modeling of DOC at this time
Adds value to existing water quality monitoring efforts
Used to comply with the CWA more easily
Additional field sampling is not necessary
Inferences about regional stream condition can be generated
It can be used to identify the spatial location of potentially impaired stream segments
Conclusions
Model predictions and prediction variances
Allow additional field efforts to be concentrated in
Areas with large amounts of uncertainty
Areas with a greater potential for water quality impairment
Model results can be displayed visually
Allows professionals to communicate results to a wide variety of audiences
Thank You!
Advisors: Dave Theobald and Melinda Laituri
Committee Members: Will Clements and Brian Bledsoe
Collaborators: N. Scott Urquhart, Jay M. Ver Hoef, and Andrew A. Merton
Team Theobald: Grant Wilcox, John Norman, Nate Peterson, and Melissa Sherburne
Dennis Ojima and Keith Paustian
Family and friends
My husband Nate
Questions?
405 of the 898 sites had upstream neighbors
1396 neighboring pairs
10.06 km, the minimum was 0.05 km, and the maximum was 97.19 km
431 sites had hydrologic distance < 3 km