Predicting Water Quality Impaired Stream Segments using Landscape-scale Data and a Regional...

Predicting Water Quality Impaired Stream Segments using Landscape-scale Data and a Regional Geostatistical Model
Erin Peterson
Geosciences Department, Colorado State University, Fort Collins, Colorado

The work reported here was developed under STAR Research Assistance Agreement CR-829095 awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This presentation has not been formally reviewed by EPA. EPA does not endorse any products or commercial services mentioned in this presentation.

Space-Time Aquatic Resources Modeling and Analysis Program

Overview
Introduction
Background
Patterns of spatial autocorrelation in stream water chemistry
Predicting water quality impaired stream segments using landscape-scale data and a regional geostatistical model: A case study in Maryland

The Clean Water Act (CWA) 1972

Section 303(d): requires states and tribes to identify water quality impaired stream segments

Section 305(b): requires a biennial water quality inventory that characterizes regional water quality, based on attainment of designated-use standards assigned to individual stream segments

Probability-based Random Survey Designs
Used to meet Section 305(b) requirements
Derive a regional estimate of stream condition
Assign a weight based on stream order
Provide a representative sample of streams by order
Support statistical inference about the population of streams, within stream order, over a large area
Reported in stream miles based on inference of attainment

Disadvantages
Does not take watershed influence into account
Does not identify the spatial location of impaired stream segments
Fails to meet the requirements of CWA Section 303(d)

Purpose
Develop a geostatistical methodology, based on coarse-scale GIS data and field surveys, that can be used to predict water quality characteristics of stream segments throughout a large geographic area (e.g., a state)

Geostatistical Modeling
a.k.a. kriging
Interpolation method
Allows spatial autocorrelation in the error term
More accurate predictions

Fit an autocovariance function to the data
Describes the relationship between observations based on separation distance

3 Autocovariance Parameters

Nugget: variation between sites as separation distance approaches zero

Sill: delineated where semivariance asymptotes

Range: distance within which spatial autocorrelation occurs
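The three parameters can be made concrete with a small sketch. It assumes an exponential autocovariance function (the form fitted later in this talk) and assumes that the range parameter enters as exp(-h / range), which is one of several common conventions. The function and argument names are illustrative only.

```python
import numpy as np

def exponential_covariance(h, nugget, partial_sill, range_):
    """Exponential autocovariance at separation distance h.

    Assumed parameterisation: C(h) = partial_sill * exp(-h / range_) for h > 0,
    and C(0) = nugget + partial_sill (the sill).
    """
    h = np.asarray(h, dtype=float)
    cov = partial_sill * np.exp(-h / range_)
    return np.where(h == 0.0, nugget + partial_sill, cov)

def exponential_semivariance(h, nugget, partial_sill, range_):
    """Semivariance implied by the covariance: gamma(h) = C(0) - C(h)."""
    sill = nugget + partial_sill
    return sill - exponential_covariance(h, nugget, partial_sill, range_)
```

Under these assumptions the semivariance is zero at h = 0, jumps to the nugget for very small positive h, and levels off at the sill (nugget plus partial sill) at large h.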

Distance Measures & Spatial Relationships
Distances and relationships are represented differently depending on the distance measure

Straight-line distance (SLD): geostatistical models are typically based on SLD

Symmetric hydrologic distance (SHD): hydrologic connectivity, e.g., fish movement

Asymmetric hydrologic distance (AHD): longitudinal transport of material

Challenge: spatial autocovariance models developed for SLD may not be valid for hydrologic distances, because the resulting covariance matrix is not guaranteed to be positive definite
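A toy illustration of the positive-definiteness problem, not taken from the talk: a hypothetical star-shaped network with one hub site and ten sites, each one distance unit from the hub along a separate arm, so any two arm sites are two units apart in network distance. The Gaussian autocovariance exp(-d^2), which is valid for straight-line distance, gives a matrix with a negative eigenvalue on these distances, while the exponential model exp(-d) stays positive definite here.

```python
import numpy as np

def star_network_distances(n_arms, arm_length=1.0):
    """Network distances for a hub site (index 0) plus one site on each arm."""
    n = n_arms + 1
    d = np.full((n, n), 2.0 * arm_length)  # arm site to arm site, via the hub
    d[0, :] = arm_length                   # hub to each arm site
    d[:, 0] = arm_length
    np.fill_diagonal(d, 0.0)
    return d

def min_eigenvalue(cov):
    """Smallest eigenvalue of a symmetric matrix; negative means not positive definite."""
    return float(np.linalg.eigvalsh(cov).min())

d = star_network_distances(10)
gaussian_cov = np.exp(-d ** 2)   # valid for SLD, fails on this network
exponential_cov = np.exp(-d)     # remains positive definite on this network

print("Gaussian min eigenvalue:", min_eigenvalue(gaussian_cov))
print("Exponential min eigenvalue:", min_eigenvalue(exponential_cov))
```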

Asymmetric Autocovariance Models for Stream Networks
Weighted asymmetric hydrologic distance (WAHD)

Developed by Jay Ver Hoef, National Marine Mammal Laboratory, Seattle

Moving average models

Incorporate flow volume, flow direction, and use hydrologic distance

Positive definite covariance matrices

Ver Hoef, J.M., Peterson, E.E., and Theobald, D.M. Spatial Statistical Models that Use Flow and Stream Distance. Environmental and Ecological Statistics. In press.

Patterns of Spatial Autocorrelation in Stream Water Chemistry

Objectives

Evaluate 8 chemical response variables
pH measured in the lab (PHLAB)
Conductivity (COND) measured in the lab, µmho/cm
Dissolved oxygen (DO), mg/l
Dissolved organic carbon (DOC), mg/l
Nitrate-nitrogen (NO3), mg/l
Sulfate (SO4), mg/l
Acid neutralizing capacity (ANC), µeq/l
Temperature (TEMP), °C

Determine which distance measure is most appropriate: SLD, SHD, WAHD, or more than one?

Find the range of spatial autocorrelation

Dataset
Maryland Biological Stream Survey (MBSS) Data

Maryland Department of Natural Resources 1995, 1996, 1997

Stratified probability-based random survey design

881 sites in 17 interbasins

Spatial Distribution of MBSS Data

GIS Tools
Automated tools needed to extract data about hydrologic relationships between survey sites did not exist!

Wrote Visual Basic for Applications (VBA) programs to:
Calculate watershed covariates for each stream segment: Functional Linkage of Watersheds and Streams (FLoWS)
Calculate separation distances between sites: SLD, SHD, and asymmetric hydrologic distance (AHD)
Calculate the spatial weights for the WAHD
Convert GIS data to a format compatible with statistics software

FLoWS tools will be available on the STARMAP website: http://nrel.colostate.edu/projects/starmap

Spatial Weights for WAHD
Proportional influence (PI): influence of each neighboring survey site on a downstream survey site
Weighted by catchment area: surrogate for flow volume
[Figure: stream network with survey sites A to H and stream segments]
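One way the area-based weighting might be computed is sketched below. The slide only says that PI is weighted by catchment area. The specific rule used here is an assumption for illustration: split influence at each confluence in proportion to catchment area, then multiply the segment values along the flow path between two sites. The segment names and areas are hypothetical.

```python
import math

def segment_pi(areas_at_confluence):
    """Share of influence for each segment meeting at one confluence.

    areas_at_confluence maps a segment name to its catchment area.
    Assumption: influence is proportional to catchment area.
    """
    total = sum(areas_at_confluence.values())
    return {seg: area / total for seg, area in areas_at_confluence.items()}

def site_pi(segment_pis_on_path):
    """PI of an upstream site on a downstream site.

    Assumption: the product of the segment PIs along the flow path.
    """
    return math.prod(segment_pis_on_path)

# Hypothetical network: trib_1 (30 km2) joins trib_2 (10 km2); the combined
# stream (40 km2) then joins trib_3 (60 km2) above the downstream site.
first = segment_pi({"trib_1": 30.0, "trib_2": 10.0})
second = segment_pi({"combined": 40.0, "trib_3": 60.0})
pi_trib_1 = site_pi([first["trib_1"], second["combined"]])
print(pi_trib_1)
```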

Data for Geostatistical Modeling

Distance matrices: SLD, SHD, AHD

Spatial weights matrix: contains flow-dependent weights for WAHD

Watershed covariates: lumped watershed covariates, e.g., mean elevation, % urban

Observations: MBSS survey sites

Validation set: unique for each chemical response variable, 100 sites

Geostatistical Modeling Methods

Initial covariate selection: reduce covariates to 5

Model development
Restricted model space to all possible linear models
Model set = 32 models (2^5 models)
One model set each for the general linear model (GLM), SLD, SHD, and WAHD models
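The count of 32 follows from including or excluding each of the 5 covariates independently. A minimal sketch of enumerating such a model set, with placeholder covariate names (the actual covariates differ by response variable):

```python
from itertools import combinations

def all_linear_models(covariates):
    """Every subset of the covariates, from intercept-only to the full model."""
    models = []
    for size in range(len(covariates) + 1):
        models.extend(combinations(covariates, size))
    return models

# Placeholder names, not the covariates used in the study.
covariates = ["cov_1", "cov_2", "cov_3", "cov_4", "cov_5"]
model_set = all_linear_models(covariates)
print(len(model_set))
```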

Geostatistical model parameter estimation
Maximize the profile log-likelihood function
Fit exponential autocorrelation function

Model selection within model set
GLM: Akaike Information Corrected Criterion (AICC)
Geostatistical models: Spatial AICC (Hoeting et al., in press)

In the spatial AICC, n is the number of observations, p-1 is the number of covariates, and k is the number of autocorrelation parameters.
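The formula these symbols belong to is not in the transcript. Assuming the slide showed the spatial AICC in the form given by Hoeting et al., it would read as follows, with L the likelihood evaluated at the estimated parameters:

```latex
\mathrm{AICC} = -2 \log L(\hat{\theta}) + 2n \, \frac{p + k + 1}{n - p - k - 2}
```

This should be checked against the paper linked below before it is relied on.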

http://www.stat.colostate.edu/~jah/papers/spavarsel.pdf

Model selection between model types
100 predictions: universal kriging algorithm
Mean square prediction error (MSPE)
Cannot use AICC to compare models based on different distance measures

Model comparison: r2 for observed vs. predicted values
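Both comparison statistics are simple to compute from the validation-set predictions. The sketch below assumes that r2 means the squared Pearson correlation between observed and predicted values. The talk does not say which definition was used.

```python
import numpy as np

def mspe(observed, predicted):
    """Mean square prediction error over the validation sites."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((observed - predicted) ** 2))

def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values (assumed definition)."""
    r = np.corrcoef(observed, predicted)[0, 1]
    return float(r ** 2)
```

A model whose predictions are all offset by 0.5 from the observations has an MSPE of 0.25 but an r2 of 1, so the two statistics answer different questions.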

Results
Summary statistics for distance measures
Spatial neighborhood differs
Affects number of neighboring sites
Affects median, mean, and maximum separation distance

Range of spatial autocorrelation differs:
Shortest for SLD
TEMP = shortest range values
DO = largest range values
Mean range values: SLD = 28.2 km, SHD = 88.03 km, WAHD = 57.8 km

Results
Distance measures: the GLM always has less predictive ability than the geostatistical models

More than one distance measure usually performed well
SLD, SHD, WAHD: PHLAB & DOC
SLD and SHD: ANC, DO, NO3
WAHD & SHD: COND, TEMP
SLD only: SO4

Results
Predictive ability of models (r2):
Strong: ANC, COND, DOC, NO3, PHLAB
Weak: DO, TEMP, SO4

Discussion
Distance measure influences how spatial relationships are represented in a stream network
Sites' relative influence on other sites
Dictates form and size of spatial neighborhood

Important because it impacts the accuracy of the geostatistical model predictions

Discussion
Probability-based random survey design negatively affected WAHD

Maximize spatial independence of sites

Does not represent spatial relationships in networks

Validation sites randomly selected

Discussion
WAHD models explained more variability as the number of neighboring sites increased
Not when neighbors had:
Similar watershed conditions
Significantly different chemical response values

GLM predictions improved as number of neighbors increased

Clusters of sites in space have similar watershed conditions
Statistical regression pulled towards the cluster

GLM contained hidden spatial information
Explained additional variability in data with more neighbors

Predictive Ability of Geostatistical Models (r2)

Conclusions
Spatial autocorrelation exists in stream chemistry data at a relatively coarse scale

Geostatistical models improve the accuracy of water chemistry predictions

Patterns of spatial autocorrelation differ between chemical response variables
Ecological processes acting at different spatial scales

SLD is the most suitable distance measure at regional scale at this time
Unsuitable survey designs
SHD: GIS processing time is prohibitive

Conclusions
Results are scale specific
Spatial patterns change with survey scale
Other patterns may emerge at shorter separation distances

Further research is needed at finer scales
Watershed or small stream network

Need new survey designs for stream networks
Capture both coarse and fine scale variation
Ensure that hydrologic neighborhoods are represented

Predicting Water Quality Impaired Stream Segments using Landscape-scale Data and a Regional Geostatistical Model: A Case Study In Maryland

Objective
Demonstrate how a geostatistical methodology can be used to meet the requirements of the Clean Water Act
Predict regional water quality conditions
ID the spatial location of potentially impaired stream segments

Methods
[Table: potential covariates]
[Table: potential covariates after initial model selection (10)]

Fit geostatistical models

Two distance measures: SLD and WAHD

Restricted model space to all possible linear models

1024 models per set (2^10 models)

Parameter Estimation

Maximized the profile log-likelihood function


Results
SLD models performed better than WAHD
Exception: Spherical model

Best models: SLD Exponential, Mariah, and Rational Quadratic models
r2 for SLD model predictions almost identical
Further analysis restricted to SLD Mariah model

Results
Covariates for SLD Mariah model: WATER, EMERGWET, WOODYWET, FELPERC, & MINTEMP

Positive relationship with DOC: WATER, EMERGWET, WOODYWET, MINTEMP

Negative relationship with DOC: FELPERC

Cross-validation interval: 95% of regression coefficients produced by leave-one-out cross validation procedure
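The leave-one-out idea can be sketched as follows. For brevity this uses ordinary least squares in place of the geostatistical model, which is a substitution and not the method used in the study. It also assumes that the interval is the central 95% of the leave-one-out coefficient values.

```python
import numpy as np

def loo_coefficients(X, y):
    """Refit the regression n times, leaving out one site each time.

    X is an (n, p-1) covariate matrix; an intercept column is added here.
    Ordinary least squares stands in for the geostatistical fit.
    """
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])
    coefs = []
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(design[keep], y[keep], rcond=None)
        coefs.append(beta)
    return np.array(coefs)

def cross_validation_interval(coefs, coverage=0.95):
    """Central interval holding `coverage` of the leave-one-out coefficients."""
    tail = (1.0 - coverage) / 2.0 * 100.0
    lower = np.percentile(coefs, tail, axis=0)
    upper = np.percentile(coefs, 100.0 - tail, axis=0)
    return lower, upper
```

Sites whose removal pushes a coefficient outside the interval are the ones worth inspecting as potentially influential.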

Cross-validation intervals for Mariah model regression coefficients
Narrow intervals
Few extreme regression coefficient values
Not produced by common sites
Covariate values for the site are represented in observed data
Not clustered in space

r2 Observed vs. Predicted Values
n = 312 sites
r2 = 0.72
1 influential site
r2 without site = 0.66

Discussion
Model Fit
SLD models more accurate than WAHD models
Landscape-scale covariates were not restricted to watershed boundaries: geology type, temperature, wetlands & water

Discussion
Regression Coefficients

Narrow cross-validation intervals
Spatial location of the sites not as important as watershed characteristics

Extreme regression coefficient values
Not produced by common sites
Not clustered in space

Local-scale factor may have affected stream DOC, e.g., a point source of organic waste

Spatial Patterns in Model Fit
North and east of Chesapeake Bay: large squared prediction error (SPE) values
Naturally acidic blackwater streams with elevated DOC

Not well represented in observed dataset: 2 blackwater sites

Geostatistical model unable to account for natural variability
Large squared prediction errors
Large prediction variances

West of Chesapeake Bay: low SPE values

Due to statistical and spatial distribution of observed data
Regression equation fit to the mean in the data
Most observed sites = low DOC values

Less variation in western and central Maryland
Neighboring sites tend to be similar

Separation distances shorter in the west
Short separation distances = stronger covariances

Model Performance
Unable to account for abrupt differences in DOC values between neighboring sites with similar watershed conditions

What caused abrupt differences?
Point sources of organic pollution: not represented in the model
Non-point sources of pollution
Lumped watershed attributes are non-spatial
Differences due to spatial location of land use are not represented
Challenging to represent ecological processes using coarse-scale lumped attributes, e.g., flow path of water

Generate Model Predictions

Prediction sites
Study area: 1st, 2nd, and 3rd order non-tidal streams
3083 segments = 5973 stream km
ID downstream node of each segment and create a prediction site
More than one site at each confluence

Generate predictions and prediction variances
SLD Mariah model
Universal kriging algorithm
Assigned predictions and prediction variances back to stream segments in GIS
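A compact sketch of universal kriging with straight-line distance follows. It assumes that the covariance parameters have already been estimated, and it uses an exponential covariance where the talk uses the Mariah model (a substitution made to keep the example short). The design matrices X and X0 are assumed to include an intercept column. All names are illustrative.

```python
import numpy as np

def exp_cov(d, nugget, partial_sill, range_):
    """Exponential covariance; the nugget is added only at zero distance."""
    return partial_sill * np.exp(-d / range_) + nugget * (d == 0)

def universal_kriging(coords, X, y, coords0, X0, nugget, partial_sill, range_):
    """Predictions and prediction variances at new sites.

    coords, X, y: observed site coordinates, design matrix, and response.
    coords0, X0:  prediction site coordinates and design matrix.
    """
    # Straight-line distances among observed sites and to prediction sites.
    D = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    d0 = np.linalg.norm(coords[:, None, :] - coords0[None, :, :], axis=2)
    Sigma = exp_cov(D, nugget, partial_sill, range_)
    c0 = partial_sill * np.exp(-d0 / range_)  # observed-to-prediction covariances

    # Generalized least squares estimate of the regression coefficients.
    Si_X = np.linalg.solve(Sigma, X)
    XtSiX = X.T @ Si_X
    beta = np.linalg.solve(XtSiX, Si_X.T @ y)

    # Trend plus kriged residual.
    Si_c0 = np.linalg.solve(Sigma, c0)
    pred = X0 @ beta + Si_c0.T @ (y - X @ beta)

    # Prediction variance, including the cost of estimating beta.
    R = X0.T - X.T @ Si_c0
    var = (nugget + partial_sill) - np.sum(c0 * Si_c0, axis=0) \
        + np.sum(R * np.linalg.solve(XtSiX, R), axis=0)
    return pred, var
```

With a zero nugget, predicting at an observed site returns the observed value with a prediction variance of essentially zero, which is a quick sanity check on an implementation.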

Weak Model Fit

Strong Model Fit

Water Quality Attainment by Stream Kilometers
Threshold values for DOC
Set by Maryland Department of Natural Resources
High DOC values may indicate biological or ecological stress
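Reporting attainment in stream kilometers amounts to comparing each segment's prediction with a threshold and summing segment lengths. A minimal sketch follows. The threshold in the example is a placeholder, not the Maryland Department of Natural Resources value, and the segment data are made up.

```python
def attainment_by_stream_km(segments, threshold):
    """Sum stream length by predicted status.

    segments: iterable of (predicted_doc_mg_per_l, length_km) pairs.
    threshold: DOC value above which a segment counts as potentially impaired.
    """
    summary = {"attaining_km": 0.0, "impaired_km": 0.0}
    for predicted, length_km in segments:
        key = "impaired_km" if predicted > threshold else "attaining_km"
        summary[key] += length_km
    return summary

# Made-up segments and a placeholder threshold of 8.0 mg/l.
example = attainment_by_stream_km([(2.0, 1.5), (9.0, 2.0), (5.0, 0.5)], threshold=8.0)
print(example)
```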

Implications for Water Quality Monitoring
Tradeoff between cost-efficiency and model accuracy

Western Maryland: can be described using a single geostatistical model

Eastern and northeastern Maryland, options:
Accept poor model fit
Collect additional survey data for regional geostatistical model
Develop a separate geostatistical model for eastern Maryland

Implications for Water Quality Monitoring
Apply this methodology to other regulated constituents
Technical and Regulatory Services Administration within the MDE is modifying the NHD
Include water quality standards & stream-use designations by NHD segment
Use water quality standards instead of thresholds
Categorize predictions into potentially impaired or unimpaired status
Report on attainment in stream miles/kilometers

Conclusions
Geostatistical models generated more accurate DOC predictions than previous non-spatial models based on coarse-scale landscape data

SLD is more appropriate than WAHD for regional geostatistical modeling of DOC at this time

Adds value to existing water quality monitoring efforts
Used to comply with the CWA more easily
Additional field sampling is not necessary
Inferences about regional stream condition can be generated
It can be used to identify the spatial location of potentially impaired stream segments

Conclusions
Model predictions and prediction variances allow additional field efforts to be concentrated in:
Areas with large amounts of uncertainty
Areas with a greater potential for water quality impairment

Model results can be displayed visually
Allows professionals to communicate results to a wide variety of audiences

Thank You!

Advisors: Dave Theobald and Melinda Laituri

Committee Members: Will Clements and Brian Bledsoe

Collaborators: N. Scott Urquhart, Jay M. Ver Hoef, and Andrew A. Merton

Team Theobald: Grant Wilcox, John Norman, Nate Peterson, and Melissa Sherburne

Dennis Ojima and Keith Paustian

Family and friends

My husband Nate

Questions?

405 of the 898 sites had upstream neighbors
1396 neighboring pairs
10.06 km, the minimum was 0.05 km, and the maximum was 97.19 km
431 sites had hydrologic distance < 3 km
