Data Analysis and Interpretation Data-Driven Analysis


Transcript of Data Analysis and Interpretation Data-Driven Analysis

Session 5: Data Analysis and Interpretation

Data Analysis and Interpretation

• Spatial variability (e.g., exposure or environmental justice)

• Source characterization

• Supporting health effects assessments

• Methods evaluation

• Trends characterization

• Evaluating and improving air quality models

[Figure: project process diagram. Steps shown: Establish a stakeholder group; Assess issues of concern; Set project goals; Collect and QC data; Interpret data & recommend actions; Take action; Design full project. Unlabeled values in the figure: 5, 54, 78, 32, 21, 33, 238, 99, 61.]

Data-Driven Analysis

• Are data of sufficient quantity and quality to meet project objectives with statistical certainty?

– Uncertainty – sampling, analytical, representativeness

– N^0.5 – are there enough samples? (Uncertainty in a mean shrinks in proportion to the square root of the number of samples, N.)

– Data quality – contamination and other data issues

• Using the data

– Evidence-based comparison to initial hypothesis

– Accept, reject, or find inconclusive results regarding initial hypothesis
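
The "enough samples" question above can be sketched with the standard error of the mean, which falls as 1/sqrt(N). This is a minimal illustration, not a project requirement: the function names, the target uncertainty, and the concentration values are hypothetical.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


def samples_needed(sample_sd, target_se):
    """Smallest N for which sample_sd / sqrt(N) is at or below target_se."""
    return math.ceil((sample_sd / target_se) ** 2)


# Hypothetical 24-hour concentrations (ug/m3), for illustration only.
concentrations = [1.2, 0.8, 1.5, 1.1, 0.9, 1.4, 1.0, 1.3]
se_now = standard_error(concentrations)
n_for_target = samples_needed(statistics.stdev(concentrations), target_se=0.05)
```

Halving the standard error requires roughly four times as many samples, which is why the sample count is worth checking before drawing conclusions.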


Project Objectives

• Community-scale monitoring

– Spatial gradients near sources and environmental justice issues

– Emissions source characterization

– Support health effects assessments

– Baseline concentrations for exposure assessment

– Evaluate and improve air quality models

• Methods evaluation

– Assess new methods for analysis of priority air toxics

– Evaluate methods that may be operable on a routine basis to measure air toxics

• Analyze existing data

– Same as community-scale monitoring and trends analysis

Spatial Comparisons

US 95 MSAT Project Examples

[Figure: means with 95% confidence intervals (y-axis 0 to 2) by site, for wind speed bins 0-1, 1-2, 2-3, and >=3 m/s. X-axis order: Hancock, Western, Adcock, Fyfe, Fyfe, Adcock, Western, Hancock, covering the upwind and downwind sides, annotated "Increasing Distance from US 95".]

[Figure: box plots of BC concentration (ug/m3, y-axis 0 to 7) by wind speed bin (m/s): 0-1, 1-2, 2-3, >=3.]


Spatial Comparisons (cont.)

Maps of average concentrations – Not quantitative

[Figure: map of average concentrations, labeled "Benzene".]

Cumulative distribution functions

[Figure: Nez Perce example. Cumulative distribution of county population (x-axis: population, 100 to 10,000,000; y-axis: fraction of counties, 0 to 1) for US counties, counties with metals measurements, and counties with VOC measurements, with Nez Perce County marked. Annotations: 37786, 6th percentile, 26th percentile.]

Median county population: The subsets of counties with metals or VOC measurements have median populations that are at the upper end of the distribution compared to all US counties.

Spatial Comparisons (cont.)

Nez Perce Example

[Figure: pollution rose.] Concentrations are highest when wind blows from south and southwest.

[Figure: comparison to national concentration distribution.]
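
A pollution rose like the one described for this example can be approximated by averaging concentrations within wind-direction sectors. The sketch below assumes paired records of wind direction (degrees the wind blows from) and concentration. The eight-sector split, the function names, and the sample values are illustrative assumptions, not the method used in the Nez Perce study.

```python
from collections import defaultdict

SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def sector_of(wind_dir_deg):
    """Map a wind direction in degrees to one of eight 45-degree sectors."""
    return SECTORS[int(((wind_dir_deg % 360) + 22.5) // 45) % 8]


def pollution_rose(records):
    """Mean concentration per sector from (wind_dir_deg, concentration) pairs."""
    by_sector = defaultdict(list)
    for wind_dir_deg, conc in records:
        by_sector[sector_of(wind_dir_deg)].append(conc)
    return {sector: sum(v) / len(v) for sector, v in by_sector.items()}


# Hypothetical hourly records: (wind direction in degrees, concentration).
rose = pollution_rose([(180, 2.0), (190, 4.0), (225, 3.5), (0, 1.0)])
```

Sectors with few observations give unstable means, so it helps to report the sample count per sector alongside the average.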


Spatial Comparisons (cont.)

• The site of interest has concentrations that

– Are statistically significantly higher than at other sites (mean, median, other metric)

– Are higher when the wind is from a certain direction

– Are higher than concentrations at other sites in the community, state, and/or nation

– Are higher than expected given local population and emissions sources

Note: "Higher" could be "lower" if you want to focus on clean sites.
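
One way to check whether a site is "statistically significantly higher" is a permutation test on the difference in means. The slides do not name a specific test, so this choice, the function name, and the sample values are assumptions for illustration.

```python
import random
import statistics


def permutation_p_value(site, other, n_permutations=5000, seed=0):
    """One-sided p-value for mean(site) > mean(other) under random relabeling."""
    rng = random.Random(seed)
    observed = statistics.mean(site) - statistics.mean(other)
    pooled = list(site) + list(other)
    n_site = len(site)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_site]) - statistics.mean(pooled[n_site:])
        if diff >= observed:
            count += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (count + 1) / (n_permutations + 1)


# Hypothetical concentrations at the site of interest and a comparison site.
site_of_interest = [5.0, 5.5, 6.0, 5.2, 5.8, 6.1, 5.4, 5.9]
comparison_site = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8, 1.0, 1.2]
p = permutation_p_value(site_of_interest, comparison_site)
```

To test for a cleaner site instead, swap the two arguments.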


Emissions Source Characterization

Chemical source profiles

Pollution rose

Concentrations are high when winds blow from the south

Emissions Source Characterization (cont.)

MDEQ example

[Figure: wind roses for high Zn/Mn factor/source days.] Wind roses for high Zn/Mn factor/source days at Detroit's Allen Park site point to large point sources in the industrial area of Detroit.

Emissions Source Characterization (cont.)

[Figure: diurnal profiles. Normalized concentrations by hour (0 to 23), with profile types labeled rush hour peak, nighttime peak, photochemical peak, midday peak, morning peak, and invariant.]

[Figure: benzene day-of-week profiles. Concentration (µg/m3, y-axis 0 to 6), Monday through Sunday.]

Diurnal and day-of-week profiles provide information on emissions temporal activity.
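
Diurnal and day-of-week profiles can be built by grouping timestamped concentrations by hour or weekday and averaging. The sketch below assumes records are (datetime, concentration) pairs. All names and values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def mean_by(records, key):
    """Average concentrations grouped by key(timestamp)."""
    groups = defaultdict(list)
    for timestamp, conc in records:
        groups[key(timestamp)].append(conc)
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}


def diurnal_profile(records):
    """Mean concentration by hour of day (0 to 23)."""
    return mean_by(records, lambda t: t.hour)


def day_of_week_profile(records):
    """Mean concentration by weekday (0 = Monday, 6 = Sunday)."""
    return mean_by(records, lambda t: t.weekday())


def normalize(profile):
    """Divide each value by the profile mean so shapes can be compared."""
    overall = sum(profile.values()) / len(profile)
    return {k: v / overall for k, v in profile.items()}


# Hypothetical hourly records, for illustration only.
records = [
    (datetime(2008, 1, 7, 8), 4.0),
    (datetime(2008, 1, 8, 8), 2.0),
    (datetime(2008, 1, 7, 20), 1.0),
]
hourly = diurnal_profile(records)
weekly = day_of_week_profile(records)
```

Normalizing each profile makes it easier to compare the shapes of pollutants with very different absolute concentrations.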


Diurnal Patterns: Conceptual Model

[Figure: normalized traffic activity, mixing height, or solar radiation by hour (0 to 23).]

Concentrations = (Sources – Sinks + Transport)/Dispersion

Source = Traffic Activity

Dispersion = Inverse Mixing Height

Sinks = OH radical
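
A minimal sketch of the conceptual model above, assuming hourly, normalized inputs. It reads the slide's dispersion term as meaning that concentrations scale inversely with mixing height, so the code divides by mixing height. That reading, the function name, and all input values are assumptions for illustration.

```python
def conceptual_concentration(sources, sinks, transport, mixing_height):
    """Hourly (sources - sinks + transport) / mixing_height, floored at zero."""
    return [
        max(0.0, (src - snk + trn) / height)
        for src, snk, trn, height in zip(sources, sinks, transport, mixing_height)
    ]


# Hypothetical normalized values for four hours: night, morning rush, midday, evening.
traffic = [0.2, 1.0, 0.6, 0.8]        # source term (traffic activity)
oh_sink = [0.0, 0.1, 0.4, 0.1]        # sink term (OH radical, tracking sunlight)
transport = [0.0, 0.0, 0.0, 0.0]      # transport ignored in this sketch
mixing_height = [0.3, 0.4, 1.0, 0.5]  # shallow at night, deep at midday

profile = conceptual_concentration(traffic, oh_sink, transport, mixing_height)
```

With these inputs the morning hour produces the highest value, because strong emissions coincide with a shallow mixed layer.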

Time Series Analysis

Nez Perce Example

[Figure: time series of formaldehyde concentration (ppb, y-axis 0 to 25), 01-May-06 to 02-May-07. Series: HATWAI 1, ITD, LAPWAI, LSOB, SUNSET, and a 5-site average.]

Time series of formaldehyde concentrations (ppb) at each Lewiston area monitoring site and a five-site average. Error bars indicate the standard deviation in the five-site average.
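
A multi-site average with standard-deviation error bars, as described in the caption above, could be computed along these lines. The data layout (a dict of site name to a dict of date to concentration), the site labels, and the values are hypothetical.

```python
import statistics


def multi_site_average(site_series):
    """Per date sampled at two or more sites: (mean, sample std dev) across sites."""
    dates = sorted({d for series in site_series.values() for d in series})
    result = {}
    for d in dates:
        values = [series[d] for series in site_series.values() if d in series]
        if len(values) >= 2:  # a standard deviation needs at least two sites
            result[d] = (statistics.mean(values), statistics.stdev(values))
    return result


# Hypothetical concentrations (ppb) at two sites, for illustration only.
data = {
    "site_a": {"2006-05-01": 2.0, "2006-05-07": 4.0},
    "site_b": {"2006-05-01": 4.0},
}
averages = multi_site_average(data)
```

Dates sampled at only one site are skipped here rather than reported with a zero-width error bar.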


Emissions Source Characterization

Receptor Modeling

[Figure: pie chart of source contributions: Motor Vehicle 45%, Natural Gas 21%, Industrial Process Losses 11%, Liquid Gas 10%, Evaporative Emissions 7%, Biogenic 6%.]

Figure captions: Apportionment of benzene (in total VOC) at a Los Angeles site; Apportionment of VOCs in Edmonton AB


Emissions Source Characterization (cont.)

Analyses can demonstrate that

• Chemical fingerprint profiles are consistent with emissions source

• Concentrations of certain pollutants are higher when winds are from source direction

• Temporal variability is consistent with emissions activity

• Concentrations at nearby receptors are higher than at other sites

• Receptor modeling identifies and quantifies emissions source


Health Effects Assessments

Risk screening

[Figure: risk screening diagram with the cases: upper limit of risk < 1×10^-x; upper limit of risk > 1×10^-x; risk > 1×10^-x; risk < 1×10^-x.]

Where 10^-x is user defined
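
The screening cases above can be sketched as a comparison of a central risk estimate and its upper limit against the user-defined level. The three-outcome logic, the function name, and the unit risk value are assumptions for illustration. Real unit risk estimates should come from the relevant health agency.

```python
def screen_risk(mean_conc, upper_conc, unit_risk, threshold=1e-6):
    """Classify risk (concentration x unit risk) against a user-defined threshold."""
    risk = mean_conc * unit_risk
    upper_risk = upper_conc * unit_risk
    if upper_risk < threshold:
        return "below level of concern"
    if risk > threshold:
        return "above level of concern"
    # Central estimate is below the threshold, but its upper limit is not.
    return "inconclusive"


# Hypothetical unit risk of 1e-6 per ug/m3 and concentrations in ug/m3.
result = screen_risk(mean_conc=0.5, upper_conc=2.0, unit_risk=1e-6)
```

The inconclusive case is where more samples or a refined assessment is most useful.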


Health Effects Assessments (cont.)

Model to monitor comparison


Health Effects Assessments

A. Pollutants with a majority of sites with risk estimates > 1-in-a-million risk level

B. Pollutants with most of the data < MDL, but detection limits above the 1-in-a-million risk level

C. Pollutants with the majority of monitoring sites reporting concentrations < the 1-in-a-million risk level including those usually above and below MDL

Risk-weighted concentrations
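
The A, B, and C groupings above could be assigned roughly as follows, using risk-weighted concentrations (concentration multiplied by a unit risk). The 50% cutoffs used for "majority" and "most", the function name, and all numeric inputs are assumptions for illustration.

```python
def classify_pollutant(site_means, mdl, unit_risk, fraction_below_mdl, risk_level=1e-6):
    """Return 'A', 'B', or 'C' following the groupings listed above."""
    # Concentration that corresponds to the chosen risk level.
    risk_level_conc = risk_level / unit_risk
    if fraction_below_mdl > 0.5 and mdl > risk_level_conc:
        return "B"  # mostly non-detects, and the MDL cannot resolve the risk level
    n_above = sum(1 for conc in site_means if conc * unit_risk > risk_level)
    return "A" if n_above > len(site_means) / 2 else "C"


# Hypothetical site means (ug/m3), MDL (ug/m3), and unit risk (per ug/m3).
group = classify_pollutant(
    [2.0, 3.0, 0.5], mdl=0.1, unit_risk=1e-6, fraction_below_mdl=0.1
)
```

Group B deserves separate treatment because a non-detect there does not show that risk is below the level of concern.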

Health Effects Assessments (cont.)

[Figure: bar chart of chronic cancer risk (per million, y-axis 0 to 36) for formaldehyde, chromium, benzene, 1,3-butadiene, carbon tetrachloride (labeled "Carbon tet"), acetaldehyde, chloroform, 1,4-dichlorobenzene, and tetrachloroethene.]

Chronic cancer risk (per million people) comparison for the highest annual mean concentration monitored in the Lewiston study.


Health Effects Assessments (cont.)

Ambient data can be used to

• Perform simple risk screening against levels of concern

• Calculate risk-weighted concentrations to estimate risk levels

• Validate and evaluate modeling efforts

• Identify temporal variability in concentrations for use in exposure modeling efforts


Methods Evaluation

Shelow et al., National Air Monitoring Conference, 2009


Methods Evaluation (cont.)

[Figure: time series of BC and Sunset EC (µg/m3, y-axis 0 to 8), 1/7/2008 to 1/27/2008.]

Methods Evaluation (cont.)

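[Figure: scatter plot of Aeth BC (y-axis, 0 to 8) against Sunset EC (x-axis, 1 to 5) with a fitted line. Coefficient values ± one standard deviation: a = 0.11438 ± 0.0241, b = 1.6522 ± 0.0176. A further axis is labeled OM concentration (µg/m3), 0 to 10.]

Average BC/EC ratio = 1.83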


Methods Evaluation (cont.)

• Evaluation against standards can test accuracy and precision

• Evaluation against other existing methods can identify biases and real-world performance under ambient conditions

• Novel methods often provide surprising data that lead to better understanding of local emissions sources (e.g., local chrome facilities, solvent releases)
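
Evaluating one method against another often comes down to a regression slope, an intercept, and an average ratio between paired measurements. The sketch below uses ordinary least squares. The function name and sample values are hypothetical, and the slides do not state which fitting procedure was used in their examples.

```python
def compare_methods(reference, candidate):
    """Least-squares slope and intercept of candidate vs. reference, plus mean ratio."""
    n = len(reference)
    mean_x = sum(reference) / n
    mean_y = sum(candidate) / n
    sxx = sum((x - mean_x) ** 2 for x in reference)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reference, candidate))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Mean of paired ratios; assumes no reference value is zero.
    mean_ratio = sum(y / x for x, y in zip(reference, candidate)) / n
    return slope, intercept, mean_ratio


# Hypothetical paired measurements (ug/m3) from two instruments.
slope, intercept, mean_ratio = compare_methods([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

A slope far from 1 suggests a proportional bias between methods, while a nonzero intercept suggests a constant offset.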


Trend Analyses

• Trends are useful for demonstrating progress (or lack thereof) in mitigating emissions of air toxics.

• Trend analysis can be complicated by data below MDL, changing methods, and step-changes in ambient concentrations.


Effect of Changes in MDL on Trends Assessment

[Figure: manganese PM2.5 average concentration and average MDL (µg/m3, y-axis 0 to 0.004) by year, 1990 to 2003.]

In the national-level investigation of manganese (Mn) trends, MDL trends were similar to concentration trends, making us suspicious of the reliability of the overall ambient trend. This example shows average Mn PM2.5 concentrations and MDLs from 1990 to 2003. For this data set, Hyslop and White (2007) showed that reported MDLs are much lower than actual detection limits. Current recommendations are to be cautious with data within a factor of 6 to 10 of the reported MDL.

The trend shown here may not be a real trend.
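
The caution about data within a factor of 6 to 10 of the reported MDL can be applied as a simple flag before trend analysis. The function names, the default factor of 6, and the values below are illustrative assumptions.

```python
def flag_near_mdl(concentrations, mdl, factor=6.0):
    """True for each value that falls below factor x MDL and needs caution."""
    return [conc < factor * mdl for conc in concentrations]


def fraction_near_mdl(concentrations, mdl, factor=6.0):
    """Share of values flagged as close to the reported MDL."""
    flags = flag_near_mdl(concentrations, mdl, factor)
    return sum(flags) / len(flags)


# Hypothetical annual averages (ug/m3) and a hypothetical reported MDL.
flags = flag_near_mdl([0.001, 0.02], mdl=0.002)
share = fraction_near_mdl([0.001, 0.02], mdl=0.002)
```

A high flagged share suggests that an apparent trend may reflect changes in detection limits rather than in ambient concentrations.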

Effect of Changes in MDL on Trends Assessment (cont.)

[Figure: Benzene 1997-2006 trend. Concentration (µg/m3, y-axis 0 to 0.45) by year, 1996 to 2006; fitted line y = -0.02x + 48.63, R² = 0.88.]

In contrast to the previous Mn PM2.5 trend, this benzene trend does not show influence from a change in MDL (i.e., the trends in concentration and MDL show different patterns).


How to Quantify Trends

• Initial investigation of trends

– Inspect first and last year of the trend period or two multi-year averages for change.

– Use simple linear regression to determine the magnitude of a trend over the trend period.

• Quantifying trends

– The percent difference between the first and last year of the trend period provides a rough sense of the change.

– The difference between two multi-year averages provides another measure of change and helps smooth out possible influences of meteorology.

– The percent change per year is provided by the slope of the regression line. This “normalized” value allows the analyst to compare changes across varying lengths of time (i.e., sites with different trend periods).

• Test for significance (F-test or others)
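
The quantities listed above can be sketched with a hand-rolled least-squares fit on annual means. Normalizing the slope by the period mean to get percent per year is one possible convention and is an assumption here, as are the function name and the data. The F statistic is returned without a p-value, which would need an F distribution from a statistics library.

```python
def quantify_trend(years, annual_means):
    """Slope, percent changes, R-squared, and F statistic for annual means."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(annual_means) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, annual_means))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((y - mean_y) ** 2 for y in annual_means)
    ss_resid = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(years, annual_means)
    )
    # F statistic for simple regression: 1 and n - 2 degrees of freedom.
    f_stat = (
        (ss_total - ss_resid) / (ss_resid / (n - 2)) if ss_resid > 0 else float("inf")
    )
    return {
        "slope": slope,
        "percent_first_to_last": 100 * (annual_means[-1] - annual_means[0]) / annual_means[0],
        "percent_per_year": 100 * slope / mean_y,
        "r_squared": 1 - ss_resid / ss_total,
        "f_statistic": f_stat,
    }


# Hypothetical annual averages (ug/m3), years in ascending order.
trend = quantify_trend([2000, 2001, 2002, 2003], [4.0, 3.0, 2.0, 1.0])
```

Comparing the first-to-last change with the regression-based change is a quick check on whether a single unusual year is driving the result.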

Visualizing Trends

[Figure: Benzene annual averages in two panels. Concentration (µg/m3, y-axis 0 to 4) by year, 1992 to 2000. Fitted lines shown: y = 0.0851x - 166.54 (R² = 0.369); y = 0.1253x - 248.65 (R² = 0.5038); y = -0.2248x + 450.95 (R² = 0.5492). Annotated change: %Δ = -55%.]

Visualizing Trends (cont.)

[Figure: Formaldehyde annual averages. Concentration (ug/m3, y-axis 0 to 35) by year, 1998 to 2006.]

[Figure: Formaldehyde annual averages. Concentration (µg/m3, y-axis 0 to 25) by year, 1999 to 2005. Series: 95th percentile of annual concentration, annual average concentration, annual median concentration, 5th percentile of annual concentration.]

Summarizing Trends

Comparing Trends

[Figure: a statistically significant decreasing benzene trend. Annual average and average MDL, concentration (µg/m3, y-axis 0 to 3) by year, 1999 to 2005; fitted line y = -0.16x + 314.62, R² = 0.90.]

[Figure: a statistically insignificant decreasing benzene trend. Annual average and average MDL, concentration (µg/m3, y-axis 0 to 1.8) by year, 2000 to 2006; fitted line y = -0.01x + 26.18, R² = 0.03.]

• Site-level trends for benzene from two U.S. sites.

• Confidence in these results is high. The data are mostly above detection, MDLs are consistent for the whole trend period, and no outliers appear to influence the trend.


Summarizing Trends (cont.)

Example from MDEQ trends report


Interpretation of Results

• Were data of sufficient quantity and quality to meet project objectives with statistical certainty?

– Uncertainty – sampling, analytical, representativeness

– N^0.5 – are there enough samples? (Uncertainty in a mean shrinks in proportion to the square root of the number of samples, N.)

– Data quality – contamination and other data issues

• Using the data to test project hypotheses

– Emissions characterization

• Chemical source profile comparison (ambient vs. emissions)

• Emissions activity matches expected temporal patterns after adjusting for meteorology

• Wind analysis to corroborate impact of emitter on monitor

• Source apportionment


Interpretation of Results (cont.)

• Using the data to test project hypotheses

– Health effects assessment

• Comparison of concentrations to health benchmarks

• Comparison of concentrations to other sites/cities/states/nation

• Identifying pollutants above health benchmarks

– Community baseline

• Characterizing annual averages, seasonal variability

• Quantifying toxics concentrations likely to be targeted by emissions reductions measures

• Characterizing spatial variability

– Methods evaluation

• Is method more accurate, precise, sensitive?

• Does it have better time resolution?

• How much does it cost versus routine method?


Lessons Learned from Data Analysis

• Plan for data analysis in your study.

• Reserve adequate funds and time (schedule) to conduct data analysis.

• Start looking at your data early in the project as it is first collected – don’t wait until the end.

• Isolating a particular source impact on pollutant concentrations is tricky (and local met data are vital).

• Many studies noted issues with differing MDLs across labs, too much data below MDL, etc.


Lessons Learned from Data Analysis (cont.)

• Analyses do not always lead to the answer you anticipated.

• Getting a similar result using different analysis approaches gives you more confidence in your results.

• Visualization of data is key (…a picture tells a thousand words).

• Show uncertainty in results to demonstrate statistical significance of findings.