Evaluation of Five GIS based Interpolation Techniques for Estimating the Radon Concentration for...

Post on 29-Mar-2015


Evaluation of Five GIS based Interpolation Techniques for Estimating the Radon

Concentration for Unmeasured Zip Codes in the State of Ohio

By

Suman Maroju

Department of Civil Engineering

The University of Toledo

Advisor: Ashok Kumar PhD

Introduction

Radon is a naturally occurring radioactive gas produced by the breakdown of uranium in soil, rock and water.

Radon is the second most common cause of lung cancer after cigarette smoking, accounting for 15,000 to 22,000 cancer deaths per year in the US alone, according to the National Cancer Institute (USA).

Radon gas is believed to cause about 14% of lung cancer deaths (1000+ deaths) in Ohio annually.

45% of homes in Ohio exceed the USEPA action level.

62.5% of schools in Ohio have at least one room in excess of the USEPA action level.

Data Collection

Data were collected from various county health departments, commercial testing services and university researchers.

Original database – Kumar et al. (1990)

1996 and 1997 – 82,000

New data are constantly being added

A total of 130,826 observations were used in this study

Objectives

To evaluate the best interpolation technique for the radon data set.

To perform this interpolation technique on the whole radon data set, obtain a prediction map and estimate concentrations for unmeasured zip codes.

To present the impact of the results obtained from this study.

ArcGIS Geostatistical Analyst

Geostatistical Analyst provides a wide variety of tools for spatial data exploration, identification of data anomalies, evaluation of error in prediction surface models, statistical estimation and optimal surface creation.

Exploratory Spatial Data Analysis (ESDA) Tools

The ESDA tools are designed to explore the distribution of the data, look for global trends in the data, examine spatial autocorrelation and understand the correlation between multiple data sets.

Tools include Histogram, Normal QQ Plot, Trend Analysis and Semivariogram/Covariance Cloud.

Histogram

The Histogram tool in ESDA provides a univariate (one-variable) description of the data.

The plot shows the frequency distribution for the radon data set.

Normal QQ Plot

The QQ plot compares the distribution of the data to a standard normal distribution.

Trend Analysis

The Trend Analysis tool can help identify global trends in the input data set.

[Figure: trend analysis plot showing the North-South and East-West trend lines along the North-South and East-West axes]

Semivariogram/Covariance Cloud

Semivariogram points represent pairs of locations.

Approach

The geometric mean of the radon concentration values is entered for each zip code, and zero values are assigned to the zip codes that are not measured.

The polygon features of the Ohio zip code shape file are converted into point features, which serve as the point data source for the interpolation techniques.

The point-feature shape file is then divided into two shape files: one with 1066 zip codes that have radon concentration data and the other with 796 zip codes that have no measured radon concentration data.
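The per-zip-code geometric mean described above can be sketched in a few lines. This is only an illustration: the function name, the input layout and the sample readings are hypothetical, and skipping non-positive readings is an assumption, since the slides do not say how such readings were handled.

```python
import math
from collections import defaultdict

def geometric_mean_by_zip(readings):
    """Return {zip_code: geometric mean} for (zip_code, concentration) pairs.

    Non-positive readings are skipped because their logarithm is undefined;
    this handling is an assumption, not something stated in the study.
    """
    logs = defaultdict(list)
    for zip_code, value in readings:
        if value > 0:
            logs[zip_code].append(math.log(value))
    # The geometric mean is the exponential of the mean of the logs.
    return {z: math.exp(sum(v) / len(v)) for z, v in logs.items()}

# Hypothetical readings in pCi/l, not taken from the study's database.
sample = [("43606", 2.0), ("43606", 8.0), ("43551", 3.0)]
gm = geometric_mean_by_zip(sample)
```

Working in log space avoids overflow when a zip code has many readings, which matters for a database of this size.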

Approach

The first step is to evaluate the best interpolation technique.

The point-feature shape file is divided into 80% training data points and 20% test data points.

A sensitivity analysis is performed for the division of the data set.

The different interpolation techniques are then executed using the training data points, which creates a layer of spatial variation, and the predictions are evaluated for the test data points.
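A minimal sketch of the 80/20 division follows. The slides do not say how points were assigned to each subset, so the random shuffle, the fixed seed and the function name are assumptions made for illustration.

```python
import random

def split_points(points, train_fraction=0.8, seed=0):
    """Shuffle a copy of the points and cut it into training and test subsets."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Stand-ins for the 1066 measured zip code points.
zip_points = list(range(1066))
train, test = split_points(zip_points)
```

Changing `train_fraction` to 0.7 or 0.6 gives the other divisions used in the sensitivity analysis.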

Approach

Second part

– The best interpolation technique is chosen based on the values of the statistical parameters.

– Modeling is done for the whole radon data set, which creates a surface of spatial variation, and the predictions for unmeasured zip codes (where no data is collected) are evaluated from the surface created.

Interpolation Methods

Five Interpolation Techniques

Ordinary Kriging

Inverse Distance Weighting (IDW)

Radial Basis Function (RBF)

Local Polynomial Interpolation

Global Polynomial Interpolation

Ordinary Kriging

Kriging is divided into two distinct tasks: quantifying the spatial structure of the data and producing a prediction.

Quantifying the spatial structure, known as variography, means fitting a spatial dependence model to the data.

To make a prediction for the unknown value at a specific location, kriging uses the fitted model from the variography, the spatial data configuration and the values of the measured sample points around the prediction location.

Ordinary Kriging

The equation used in Ordinary Kriging is:

$$Z^*(u) = \sum_{\alpha=1}^{n(u)} \lambda_\alpha(u)\, Z(u_\alpha) + \left[ 1 - \sum_{\alpha=1}^{n(u)} \lambda_\alpha(u) \right] m$$

where

Z*(u) is the Ordinary Kriging estimate at spatial location u,

n(u) is the number of data used at the known locations within a given neighborhood,

Z(u_α) are the n measured data at locations u_α located close to u,

m is the mean of the distribution.
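One way to read this estimator, assuming the usual ordinary kriging unbiasedness constraint (the slide itself does not state it): the weights are required to sum to one, so the term containing the mean m drops out and m never has to be known.

```latex
Z^*(u) = \sum_{\alpha=1}^{n(u)} \lambda_\alpha(u)\, Z(u_\alpha)
       + \Bigl[ 1 - \sum_{\alpha=1}^{n(u)} \lambda_\alpha(u) \Bigr] m ,
\qquad
\sum_{\alpha=1}^{n(u)} \lambda_\alpha(u) = 1
\;\Longrightarrow\;
Z^*(u) = \sum_{\alpha=1}^{n(u)} \lambda_\alpha(u)\, Z(u_\alpha)
```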

Ordinary Kriging

λ_α(u) are the weights for locations u_α, computed from the spatial covariance matrix based on the spatial continuity (variogram) model, which is given by:

$$\gamma(h) = \frac{1}{2n} \sum_{i=1}^{n} \left[ z(u_i) - z(u_i + h) \right]^2$$

where

n is the number of data pairs separated by distance h,

z(u_i) and z(u_i + h) are the data values at locations separated by distance h.
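The semivariogram formula can be illustrated with a short sketch that averages squared differences over all pairs whose separation falls near a chosen lag. The lag tolerance, the function name and the sample points are assumptions for illustration; the slides do not describe how Geostatistical Analyst bins the pairs.

```python
import math

def empirical_semivariogram(points, lag, tolerance):
    """gamma(h) for pairs whose separation is within `tolerance` of `lag`.

    points: list of (x, y, z) tuples. Returns (gamma, n_pairs), or
    (None, 0) when no pair falls in the lag bin.
    """
    total, n = 0.0, 0
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            if abs(math.hypot(xi - xj, yi - yj) - lag) <= tolerance:
                total += (zi - zj) ** 2
                n += 1
    return (total / (2 * n), n) if n else (None, 0)

# Hypothetical points on a line, one unit apart.
pts = [(0, 0, 1.0), (1, 0, 2.0), (2, 0, 4.0)]
gamma, pairs = empirical_semivariogram(pts, lag=1.0, tolerance=0.1)
```

Here two pairs are one unit apart, with squared differences 1 and 4, so gamma is (1 + 4) / (2 × 2) = 1.25.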

Ordinary Kriging

Three primary parameters describe the autocorrelation of radon concentrations: range, nugget and sill.

– The range is the distance at which the best-fit line starts to level off (46.55). Within the range, all data are correlated.

– The sill is the maximum semivariogram value (0.2869).

– The nugget is the data variation due to measurement errors (0.20487).

[Figure: semivariogram fitted with a spherical model, with the range, sill and nugget marked]
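A sketch of the spherical model with the three reported parameters is given below. It uses the textbook spherical form and assumes that the reported sill (0.2869) is the total sill, nugget included, because the slide describes it as the maximum semivariogram value. The units of the range are not given on the slide.

```python
def spherical_semivariogram(h, nugget, sill, range_):
    """Spherical model: rises from the nugget and levels off at the sill."""
    if h <= 0:
        return 0.0          # by definition gamma(0) = 0
    if h >= range_:
        return sill         # beyond the range the curve stays at the sill
    ratio = h / range_
    return nugget + (sill - nugget) * (1.5 * ratio - 0.5 * ratio ** 3)

# Parameter values as reported on the slide.
NUGGET, SILL, RANGE = 0.20487, 0.2869, 46.55
values = [spherical_semivariogram(h, NUGGET, SILL, RANGE)
          for h in (0, 10, 46.55, 100)]
```

With these values the nugget makes up most of the sill, which suggests that much of the variation is not spatially structured.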

Inverse Distance Weighting (IDW)

IDW interpolation assumes that things close to one another are more alike than those farther apart.

To predict a value for any unmeasured location, IDW uses the measured values surrounding the prediction location.

Measured values closest to the prediction location have more influence on the predicted value than those farther away.

IDW assumes that each measured point has a local influence that diminishes with distance.

Inverse Distance Weighting

A simple IDW weighting function, as defined by Shepard, is:

$$w(d) = \frac{1}{d^{p}}$$

where

w(d) is the weighting factor applied to a known value,

d is the distance between the known and unknown values,

p is the power parameter (most common value is 2).

A general form of interpolating a value using IDW is:

$$\hat{z} = \frac{\sum_{i=1}^{n} w(d_i)\, z_i}{\sum_{i=1}^{n} w(d_i)}$$

where z_i are the n measured values and d_i is the distance from the i-th measured point to the prediction location.
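A minimal sketch of IDW prediction, assuming Shepard's weight w(d) = 1/d^p and a weighted average normalized by the sum of the weights. It uses every known point, whereas a GIS tool would normally restrict the calculation to a search neighborhood. The function name and sample points are hypothetical.

```python
import math

def idw_predict(known, x, y, power=2):
    """Predict a value at (x, y) from (xi, yi, zi) tuples by inverse distance weighting."""
    num = den = 0.0
    for xi, yi, zi in known:
        d = math.hypot(x - xi, y - yi)
        if d == 0:
            return zi              # exactly on a measured point: return its value
        w = 1.0 / d ** power       # Shepard's weight
        num += w * zi
        den += w
    return num / den

# Two hypothetical measured points, two units apart.
known = [(0.0, 0.0, 2.0), (2.0, 0.0, 4.0)]
mid = idw_predict(known, 1.0, 0.0)    # halfway between the two points
near = idw_predict(known, 0.5, 0.0)   # closer to the first point
```

Halfway between the two points the weights are equal and the prediction is their average; closer to the first point the prediction moves toward that point's value.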

Radial Basis Function (RBF)

RBF is an exact interpolation technique in the sense that the surface created must go through each measured sample value.

It is similar to IDW, except that it can predict values above the maximum and below the minimum measured values.

Global Polynomial Interpolation

The global polynomial interpolation technique fits a single polynomial surface to the measured data points; a flat plane is the first-order case.
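The first-order case can be sketched as a least-squares plane fit. This is an illustration only: the slides do not state the polynomial order used in the study, and the function name and sample points are hypothetical. The fit solves the 3x3 normal equations directly and will fail if all points lie on one line.

```python
def fit_plane(points):
    """Least-squares fit of z = a + b*x + c*y to (x, y, z) tuples."""
    # Build the augmented 3x3 normal equations, using design rows [1, x, y].
    m = [[0.0] * 4 for _ in range(3)]
    for x, y, z in points:
        row = (1.0, x, y)
        for i in range(3):
            for j in range(3):
                m[i][j] += row[i] * row[j]
            m[i][3] += row[i] * z
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))

# Hypothetical points lying exactly on z = 1 + 2x - y.
pts = [(0, 0, 1.0), (1, 0, 3.0), (0, 1, 0.0), (1, 1, 2.0)]
a, b, c = fit_plane(pts)
```

Because one surface is fitted to all points at once, the result captures only the broad trend and does not generally pass through the measured values.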

Local Polynomial Interpolation

While Global Polynomial interpolation fits a polynomial to the entire surface, Local Polynomial interpolation fits many polynomials, each within specified overlapping neighborhoods.

Evaluation Criteria

Several statistical indicators (Root Mean Square Error (RMSE), Mean Error (ME), Mean Absolute Error (MAE) and Mean Square Error (MSE)) are computed on the observed and predicted radon concentrations.

Confidence limits on the statistics for Normalized Mean Square Error (NMSE), Fractional Bias (FB) and Coefficient of Correlation (r) are calculated using the bootstrap method to identify the most suitable interpolation technique.
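The indicators above can be sketched as follows. The slides do not give the formulas, so the definitions here are assumptions: errors are taken as predicted minus observed, and NMSE and FB follow the forms commonly used in air quality model evaluation. The interval is a plain percentile bootstrap, not the robust and seductive procedure summarized in the results, and the correlation coefficient is left out for brevity.

```python
import math
import random

def error_metrics(observed, predicted):
    """ME, MAE, MSE, RMSE, NMSE and FB for paired observed and predicted values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]  # assumed sign convention
    mse = sum(e * e for e in errors) / n
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    return {
        "ME": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "NMSE": mse / (mean_o * mean_p),
        "FB": 2 * (mean_o - mean_p) / (mean_o + mean_p),
    }

def bootstrap_interval(observed, predicted, name, n_resamples=1000, seed=0):
    """95% percentile bootstrap interval for one metric, resampling pairs."""
    rng = random.Random(seed)
    n = len(observed)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(error_metrics([observed[i] for i in idx],
                                   [predicted[i] for i in idx])[name])
    stats.sort()
    return stats[int(0.025 * n_resamples)], stats[int(0.975 * n_resamples) - 1]

# Hypothetical values, not from the study.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.0, 2.5, 5.0]
m = error_metrics(obs, pred)
low, high = bootstrap_interval(obs, pred, "FB")
```

If the interval for a statistic excludes zero, that statistic would be read as significantly different from zero at the 95% level.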

Results

Measured vs. predicted radon concentration values for the test datasets

[Figure: scatter plots of measured values against predicted values, each with a linear trend line and both axes running from 0.00 to 10.00. Panels: Ordinary Kriging Estimates for Test Dataset; IDW Estimates for Test Dataset; RBF Estimates for Test Dataset]

Results

Measured vs. predicted radon concentration values for the test datasets

[Figure: scatter plots of measured values against predicted values, each with a linear trend line and both axes running from 0.00 to 10.00. Panels: LPI Estimates for Test Dataset; GPI Estimates for the Test Dataset]

Results

ME, MAE, MSE and RMSE values of the different interpolation techniques for the geometric mean of radon concentration test predictions

Statistic  Ordinary Kriging  IDW   RBF   Global Polynomial Interpolation  Local Polynomial Interpolation
ME         0.09              0.17  0.19  0.1                              0.14
MAE        1.33              1.45  1.44  1.46                             1.4
MSE        4.99              5.77  5.57  5.15                             5.21
RMSE       2.23              2.4   2.36  2.27                             2.28

Results

NMSE, FB and Corr. values from the bootstrap method

Statistic  Ordinary Kriging  IDW     RBF     Global Polynomial Interpolation  Local Polynomial Interpolation
NMSE       0.41              0.46    0.44    0.42                             0.42
FB         -0.026            -0.047  -0.055  -0.027                           -0.041
Corr. (r)  0.5               0.42    0.45    0.48                             0.47

Results

Summary of Robust and Seductive 95% Confidence Limits Analyses on Each Technique

Statistic  Ordinary Kriging  IDW  RBF  Global Polynomial  Local Polynomial
NMSE       X                 X    X    X                  X
FB
Corr. (r)  X                 X    X    X                  X

Note: X indicates significantly different from zero. Blank indicates not significantly different from zero.

Results

Summary of Robust and Seductive 95% Confidence Limits Analyses among Each Technique

Interpolation Technique  NMSE       FB         Corr. (r)
                         Yes  No    Yes  No    Yes  No
Ordinary Kriging - IDW
Ordinary Kriging - RBF              X
Ordinary Kriging - GPI
Ordinary Kriging - LPI
IDW - RBF
IDW - GPI
IDW - LPI
RBF - GPI
RBF - LPI
GPI - LPI

Note: Yes indicates significantly different from zero. No indicates not significantly different from zero.

Comparison of the behavior of the prediction maps with the soil uranium concentrations map

[Figure: prediction maps alongside the soil uranium concentrations map]

Results

Predicted Geometric Mean of Radon Concentrations Using the Ordinary Kriging Technique for Lucas County

ZIP CODE  COUNTY  PREDICTED GM
43402     LUCAS   1.88
43445     LUCAS   2.96
43449     LUCAS   2.89
43460     LUCAS   2.35
43522     LUCAS   1.80
43551     LUCAS   2.28
43558     LUCAS   1.92

Conclusion

Prediction maps were created using the training data set for all five interpolation techniques, and projected values were estimated for the test data set.

Statistical parameters (error values) were evaluated, and the prediction maps generated from these techniques were compared to the soil uranium concentration map.

It was inferred that any of the four (Ordinary Kriging, IDW, RBF and Local Polynomial) interpolation techniques can be used for predicting the radon concentrations for unmeasured zip codes.

The Ordinary Kriging technique was chosen, and the geometric means of radon concentrations were evaluated for unmeasured zip codes.

Conclusion

From the data sets available prior to the study, the number of zip codes having a geometric mean of radon concentration over 4.0 pCi/l is 390.

After using the Ordinary Kriging interpolation technique to calculate the predictions for unmeasured zip codes, the number of zip codes having a radon concentration over 4.0 pCi/l is 688.

The predicted radon concentrations for unmeasured zip codes were found to be below 8 pCi/l.

Therefore, the number of zip codes where the geometric mean of radon concentration exceeds 8 pCi/l and 20 pCi/l is the same after interpolation as in the existing data (85 and 9, respectively).

Thank you

Sensitivity Analysis for Division of Data Set

RMSE by training-test division (%)

Interpolation Technique  80-20  70-30  60-40
Ordinary Kriging         2.23   3.33   2.86
IDW                      2.4    3.31   2.29
RBF                      2.36   3.31   2.93
Global Polynomial        2.27   3.57   3.06
Local Polynomial         2.28   3.3    2.91