Statistical Analysis of Left-Censored Geochemical Data

Statistical Analysis of Left-Censored Geochemical Data Michael S. Tomlinson & Eric H. De Carlo

description

Geochemical datasets frequently contain left-censored data, i.e., results for which the actual concentration falls somewhere between 0 and the detection limit (DL). These data are referred to as nondetects (NDs). An ND does not necessarily mean the analyte was absent; it means that, if the analyte was present, its concentration was below the DL. In addition to NDs, contract labs often report estimated values (often flagged with a “J”), which lie between the DL and the reporting limit (RL). The RL is the level at or above which the lab will state that the result is quantitative.

A common approach to statistically analyzing left-censored data is substitution (e.g., replacing each ND with ½DL). Although still common practice, substitution can bias statistical analyses. Fortunately, a number of statistical techniques are specifically designed to handle left-censored data and so avoid substitution and the bias it can introduce. All of these techniques work with NDs and some work with estimated data.

Techniques for calculating summary statistics for left-censored data include nonparametric Kaplan-Meier survival statistics, regression on order statistics (ROS), and the Turnbull interval-censored method. As the name implies, the Turnbull method works with interval-censored data (i.e., quantitative data ≥RL, DL-RL [estimated], and 0-DL). In the latter two cases an interval is used: the true value lies somewhere within the interval, and picking a single value such as ½DL is not required. Interval-censored data can also be used with multivariate ordination techniques such as nonmetric multidimensional scaling (NMDS) and with the interval-censored score test, an analog of the generalized Wilcoxon test.

Kendall’s tau (τ) is a nonparametric correlation measure that can be applied to left-censored data. For this test, the estimated (J-flagged) values are used. Kendall’s τ is analogous to the familiar parametric Pearson’s r and, like Pearson’s r, the test for Kendall’s τ also provides a measure of the significance of the correlation. The case study for this presentation will include the geochemical data and statistical results from the Hawaiʻi Ordnance Reef Follow-Up investigation of the U.S. Army’s Remotely Operated Underwater Munitions Recovery System.

Transcript of Statistical Analysis of Left-Censored Geochemical Data

Page 1: Statistical Analysis of Left-Censored Geochemical Data

Statistical Analysis of Left-Censored Geochemical Data

Michael S. Tomlinson & Eric H. De Carlo

Page 2: Statistical Analysis of Left-Censored Geochemical Data

Case Study – Ordnance Reef, Oʻahu, Hawaiʻi

[Map: the Hawaiian Islands (Niʻihau, Kauaʻi, Oʻahu, Molokaʻi, Lānaʻi, Kahoʻolawe, Maui, Hawaiʻi) and Oʻahu, showing the Ordnance Reef Pre- & Post-ROUMRS Investigations area. Oʻahu labels: Waiʻanae, Kaʻena Point, Kahuku Point, Kāneʻohe, Kailua, Makapuʻu Point, Diamond Head, Honolulu, Māmala Bay, Barbers Point]

Page 3: Statistical Analysis of Left-Censored Geochemical Data

What is the problem?

Disposed Military Munitions or DMM (conventional)

Page 4: Statistical Analysis of Left-Censored Geochemical Data

How extensive is the problem?

Page 5: Statistical Analysis of Left-Censored Geochemical Data

What is the U.S. Army doing about it?

ROUMRS – Remotely Operated Underwater Munitions Recovery System

Page 6: Statistical Analysis of Left-Censored Geochemical Data

Did DMM recovery improve conditions and how was this determined?

• Sediments & biota were sampled and analyzed for energetics & elements in 2009 (Pre-ROUMRS)

• Sediments & biota were again sampled and analyzed for energetics & elements in 2011-2013 (Post-ROUMRS)

• Statistical analyses were conducted to characterize & compare pre- & post-ROUMRS data and identify possible analyte sources

Page 7: Statistical Analysis of Left-Censored Geochemical Data

This is what we are talking about today

Page 8: Statistical Analysis of Left-Censored Geochemical Data

Lab sends data – now what?

Note: It is highly unlikely a contract lab would send data in this format

No Information!

Page 9: Statistical Analysis of Left-Censored Geochemical Data

Nondetects (NDs) are real data!

(the partial table below is a better format for geochemical data)

The “U” data qualifier inserted by the data validator is redundant and unnecessary alongside “ND”, and ND provides NO information without the detection limit (DL)

Page 10: Statistical Analysis of Left-Censored Geochemical Data

So what do you do with nondetects (NDs)?

Ignore

0

½DL

DL

RL
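
To see why the choice among these options matters, the short sketch below computes the mean of one small dataset under each option. The concentrations, DL, and RL are made-up illustration values (not Ordnance Reef data), and the function names are hypothetical.

```python
# Hypothetical concentrations (made-up numbers, not Ordnance Reef data).
# None marks a nondetect (ND); DL and RL are assumed values for illustration.
DL = 1.0
RL = 2.0
results = [None, None, None, 1.2, 1.5, 2.0, 3.1, 4.4]


def mean_after_substitution(data, fill):
    """Arithmetic mean after replacing each ND (None) with `fill`."""
    filled = [fill if v is None else v for v in data]
    return sum(filled) / len(filled)


def mean_ignoring_nds(data):
    """Arithmetic mean of the detected values only (NDs dropped)."""
    detected = [v for v in data if v is not None]
    return sum(detected) / len(detected)


# One mean per option listed on the slide.
means = {
    "Ignore": mean_ignoring_nds(results),
    "0": mean_after_substitution(results, 0.0),
    "1/2 DL": mean_after_substitution(results, DL / 2),
    "DL": mean_after_substitution(results, DL),
    "RL": mean_after_substitution(results, RL),
}

for choice, value in means.items():
    print(f"{choice:>7}: mean = {value:.3f}")
```

The same eight lab results give five different means, and the spread grows with the share of NDs, so the answer depends on an arbitrary choice rather than on the data.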

Page 11: Statistical Analysis of Left-Censored Geochemical Data

Read countless articles on statistics or…

buy this book which has an excellent compilation of these methods and an accompanying website: www.practicalstats.com

Page 12: Statistical Analysis of Left-Censored Geochemical Data

Format your data for these methods

• There are several methods but we will talk about two:

– Interval Censored
  • 0 – DL, DL – RL, & quantitative result (i.e., ≥ RL)

– Indicator Variable
  • < DL = 1
  • ≥ DL = 0

Don’t worry – examples on next slide

Page 13: Statistical Analysis of Left-Censored Geochemical Data

Data Input Formats

(2 examples)

IC

IV
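
As a rough illustration of the two input formats, the sketch below converts lab results into interval-censored (IC) and indicator-variable (IV) form. The `LabResult` record and both function names are hypothetical, and storing the DL in the value column for an ND in the IV format is an assumption made here, not something the slides specify.

```python
from typing import NamedTuple, Optional, Tuple


class LabResult(NamedTuple):
    """Hypothetical record for one lab result."""
    value: Optional[float]  # reported concentration; None for a nondetect
    dl: float               # detection limit
    rl: float               # reporting limit


def to_interval_censored(r: LabResult) -> Tuple[float, float]:
    """IC format: (lower, upper) bounds on the true concentration."""
    if r.value is None or r.value < r.dl:
        return (0.0, r.dl)        # nondetect: somewhere in 0 - DL
    if r.value < r.rl:
        return (r.dl, r.rl)       # estimated (J-flagged): somewhere in DL - RL
    return (r.value, r.value)     # quantitative result (>= RL): a point value


def to_indicator_variable(r: LabResult) -> Tuple[float, int]:
    """IV format: (value, indicator) with indicator 1 for < DL and 0 for >= DL."""
    if r.value is None or r.value < r.dl:
        return (r.dl, 1)          # assumption: the DL is stored as the value
    return (r.value, 0)


# Made-up results: a nondetect, an estimated value, a quantitative value.
samples = [
    LabResult(None, dl=0.5, rl=2.0),
    LabResult(1.1, dl=0.5, rl=2.0),
    LabResult(3.7, dl=0.5, rl=2.0),
]
ic_rows = [to_interval_censored(s) for s in samples]
iv_rows = [to_indicator_variable(s) for s in samples]
print("IC:", ic_rows)
print("IV:", iv_rows)
```

Note that the IV format keeps the estimated value itself (1.1 above), while the IC format replaces it with the DL – RL interval.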

Page 14: Statistical Analysis of Left-Censored Geochemical Data

Summary Statistics

• No NDs: “Standard” statistics

• < 50% NDs: Kaplan-Meier (K-M) statistics (IV) or Turnbull interval-censored method (IC)

• ≥ 50% & < 80% NDs: Regression on order statistics (ROS, IV)

• ≥ 80% NDs: Maximum and # & %NDs
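
The decision rule on this slide can be written down directly. The sketch below is one possible encoding; the function name, its arguments, and the returned labels are hypothetical, and it only picks a method, it does not compute the statistics.

```python
def choose_summary_method(nd_flags, interval_format=False):
    """Pick a summary-statistics method from the percentage of NDs.

    nd_flags: one entry per sample, truthy for a nondetect (< DL).
    interval_format: True if the data are in interval-censored (IC) form,
                     False for indicator-variable (IV) form.
    """
    flags = list(nd_flags)
    if not flags:
        raise ValueError("need at least one sample")
    pct_nd = 100.0 * sum(1 for f in flags if f) / len(flags)

    if pct_nd == 0:
        return "standard statistics"
    if pct_nd < 50:
        # Both options on the slide apply below 50% NDs; the data format decides.
        return "Turnbull (IC)" if interval_format else "Kaplan-Meier (IV)"
    if pct_nd < 80:
        return "ROS (IV)"
    return "maximum and # & % NDs only"


# Made-up example: 3 NDs out of 10 samples (30% NDs).
flags = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
print(choose_summary_method(flags))
print(choose_summary_method(flags, interval_format=True))
```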

Page 15: Statistical Analysis of Left-Censored Geochemical Data

Summary Statistics Table (partial)

Statistical method used
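
For readers who want to see what a Kaplan-Meier entry in such a table involves, here is a minimal pure-Python sketch of a K-M mean for left-censored data in indicator-variable form. It assumes the usual "flipping" approach (subtract every value from a constant so that left-censored values become right-censored) and integrates the survival curve only up to the largest flipped value; it is not necessarily the routine used to produce the table, and the function name and example numbers are hypothetical.

```python
def km_mean_left_censored(values, censored):
    """Kaplan-Meier mean for left-censored data (indicator-variable format).

    values:   detected concentrations, or the DL for nondetects
    censored: 1/True where the result is < DL, 0/False where it is detected
    """
    if len(values) != len(censored) or not values:
        raise ValueError("values and censored must be non-empty and equal length")

    # Flip so that "< DL" (left-censored) becomes "> flip - DL" (right-censored).
    flip = max(values) + 1.0
    flipped = [(flip - v, bool(c)) for v, c in zip(values, censored)]

    # Distinct flipped times at which a detected value (an "event") occurs.
    event_times = sorted({t for t, c in flipped if not c})

    survival = 1.0   # K-M survival estimate just before the current time
    previous = 0.0
    area = 0.0       # area under the survival curve = mean of the flipped data
    for t in event_times:
        area += survival * (t - previous)
        at_risk = sum(1 for u, _ in flipped if u >= t)
        events = sum(1 for u, c in flipped if u == t and not c)
        survival *= 1.0 - events / at_risk
        previous = t

    # If the largest flipped value is censored (i.e., the smallest original
    # value is an ND), the curve never reaches zero; integrate up to it.
    area += survival * (max(u for u, _ in flipped) - previous)

    return flip - area   # flip back to the concentration scale


# Made-up data with two different detection limits (0.5 and 1.0).
conc = [0.5, 0.8, 1.0, 2.0, 3.0]
is_nd = [1, 0, 1, 0, 0]
km_mean = km_mean_left_censored(conc, is_nd)
print(f"K-M mean: {km_mean:.3f}")
```

With no NDs the function reduces to the ordinary arithmetic mean, which is a useful sanity check.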

Page 16: Statistical Analysis of Left-Censored Geochemical Data

Censored boxplots – visualizing data distribution & comparing data

No peeking below red line!

CENSORED

Censored boxplots use a variation of the indicator variable format

Analog of nonparametric Wilcoxon test (different data format)

Page 17: Statistical Analysis of Left-Censored Geochemical Data

Possible sources of analytes? Try nonmetric multidimensional scaling

Notice how DMM analytes cluster with DMM samples

And notice how terrestrial elements cluster with control samples

Page 18: Statistical Analysis of Left-Censored Geochemical Data

How strong is the relationship between the various post-ROUMRS analytes?

Correlation matrix (partial) using nonparametric Kendall’s τ; bold green = sig. + correlation & bold red = sig. - corr. at α = 0.05
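
A minimal sketch of how Kendall’s τ can accommodate nondetects is shown below. It assumes indicator-variable input and treats any pair whose order cannot be determined (for example, two NDs, or an ND and a detected value below that ND's DL) as a tie, then computes τ-b. The function names and example numbers are hypothetical, the significance test is not included, and this is not necessarily the routine used for the matrix above.

```python
from itertools import combinations
from math import sqrt


def censored_sign(a, a_nd, b, b_nd):
    """+1 if a > b for certain, -1 if a < b for certain, 0 if tied or unknown.

    For a nondetect, the stored value is its DL and the true value is < DL.
    """
    if not a_nd and not b_nd:
        return (a > b) - (a < b)
    if a_nd and b_nd:
        return 0                      # two NDs cannot be ordered
    if a_nd:
        return -1 if b >= a else 0    # a < DL_a <= b, so a < b
    return 1 if a >= b else 0         # b < DL_b <= a, so a > b


def kendall_tau_b_censored(x, x_nd, y, y_nd):
    """Kendall's tau-b, counting indeterminate comparisons as ties."""
    n = len(x)
    if not (n == len(x_nd) == len(y) == len(y_nd)) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    s = tied_x = tied_y = 0
    for i, j in combinations(range(n), 2):
        sx = censored_sign(x[i], x_nd[i], x[j], x_nd[j])
        sy = censored_sign(y[i], y_nd[i], y[j], y_nd[j])
        s += sx * sy                  # concordant pairs add 1, discordant subtract 1
        tied_x += sx == 0
        tied_y += sy == 0
    n_pairs = n * (n - 1) // 2
    denom = sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
    return s / denom if denom else float("nan")


# Made-up analyte pair: the first two x results are NDs at DL = 1.0.
x = [1.0, 1.0, 2.0, 3.0]
x_nd = [1, 1, 0, 0]
y = [1.0, 2.0, 3.0, 4.0]
y_nd = [0, 0, 0, 0]
tau = kendall_tau_b_censored(x, x_nd, y, y_nd)
print(f"tau-b = {tau:.3f}")
```

No value is ever substituted for an ND here; each ND only contributes the information that it is below its DL.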

Page 19: Statistical Analysis of Left-Censored Geochemical Data

Conclusions

• There are a number of statistical routines that can work with left-censored data

• Substitution (e.g., ½DL) is neither necessary nor recommended

• Even with left-censored data you can:
– Calculate summary statistics
– Visualize data distributions with boxplots
– Compare datasets
– Use exploratory methods to look for patterns
– Calculate the strength of correlations

• There were some significant changes but they could not be attributed to ROUMRS

Page 20: Statistical Analysis of Left-Censored Geochemical Data

What’s next?

Hawaii Undersea Military Munitions Assessment

• South Oʻahu - chemical munitions (16,000 100-lb mustard bombs) dumped in >500-m deep water

• Arsenic-containing chemical agent Lewisite dumped in deeper water west of Oʻahu

• Biological effects using multivariate statistics

• Geostatistics to determine possible sources of arsenic

Page 21: Statistical Analysis of Left-Censored Geochemical Data

Mahalo nui loa! Questions?

Michael Tomlinson – [email protected]