Data Analysis and Interpretation Data-Driven Analysis
-
Upload
dinhkhuong -
Category
Documents
-
view
265 -
download
13
Transcript of Data Analysis and Interpretation Data-Driven Analysis
1
1Session 5: Data Analysis and Interpretation
Data Analysis and Interpretation
• Spatial variability (e.g., exposure or environmental justice)
• Source characterization
• Supporting health effects assessments
• Methods evaluation
• Trends characterization
• Evaluating and improving air quality models
55478
32
213323899
61
Establish astakeholder
group
Assess issues ofconcern
Set projectgoals
Collect and QC data
Interpret data& recommend
actionsTake action
Design full project
2Session 5: Data Analysis and Interpretation
Data-Driven Analysis
• Are data of sufficient quantity and quality to meet project objectives with statistical certainty?– Uncertainty – sampling, analytical, representativeness
– N0.5 – are there enough samples?
– Data quality – contamination and other data issues
• Using the data – Evidence-based comparison to initial hypothesis
– Accept, reject, or find inconclusive results regarding initial hypothesis
2
3Session 5: Data Analysis and Interpretation
Project Objectives
• Community-scale monitoring– Spatial gradients near sources and environmental justice issues
– Emissions source characterization
– Support health effects assessments
– Baseline concentrations for exposure assessment
– Evaluate and improve air quality models
• Methods evaluation– Assess new methods for analysis of priority air toxics
– Evaluate methods that may be operable on a routine basis to measure air toxics
• Analyze existing data– Same as community-scale monitoring and trends analysis
4Session 5: Data Analysis and Interpretation
Spatial Comparisons
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
Hancock Western Adcock Fyfe Fyfe Adcock Western Hancock
Increasing Distance from US 95
0-1 m/s
1-2 m/s
2-3 m/s>=3 m/s
upwind downwind
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
Hancock Western Adcock Fyfe Fyfe Adcock Western Hancock
Increasing Distance from US 95
0-1 m/s
1-2 m/s
2-3 m/s>=3 m/s
upwind downwind
Means with 95% confidence intervals
0-1 1-2 2-3 >=3Wind Speed Bin (m/s)
0
1
2
3
4
5
6
7
BC
Con
cen
tra
tion
(ug
/m3)
Box plots
US 95 MSAT Project Examples
3
5Session 5: Data Analysis and Interpretation
Maps of average concentrations – Not quantitative
Cumulative distribution functions
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
100 1,000 10,000 100,000 1,000,000 10,000,000
Population
Fra
cti
on
of
co
un
ties
US countiesCounties with metals measurementsCounties with VOC measurementsNez Perce County
Median county population The subsets of counties with metals or VOC measurements have median populations that are at the upper end of the distribution compared to all US counties.
37786
6th percentile
26th percentileNez Perce Example
Spatial Comparisons (cont.)
Benzene
6Session 5: Data Analysis and Interpretation
Pollution rose
Nez Perce Example
Concentrations are highest when wind blows from south and southwest.
Comparison to national concentration distribution
Spatial Comparisons (cont.)
4
7Session 5: Data Analysis and Interpretation
• The site of interest has concentrations that – Are statistically significantly higher than other
sites (mean, median, other metric)
– Are higher when the wind is from a certain direction
– Are higher than concentrations at other sites in the community, state, and/or nation
– Are higher than expected given local population and emissions sources
Spatial Comparisons (cont.)
Note: Higher could be lower if you want to focus on clean sites.
8Session 5: Data Analysis and Interpretation
Emissions Source Characterization
Chemical source profiles
Pollution rose
Concentrations are high when winds blow from the south
5
Wind roses for high Zn/Mn factor/source days at Detroit’s Allen Park site point to large point sources in the industrial area of Detroit
Emissions Source Characterization(cont.)
Session 5: Data Analysis and Interpretation
MDEQ example
10Session 5: Data Analysis and Interpretation
Emissions Source Characterization (cont.)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23Hour
No
rma
lize
d C
on
ce
ntr
ati
on
s
Rush hour peak
Nighttime peak
Photo-chemical peakMidday Peak
Invariant
Nighttime Peak
Morning Peak
Diurnal profiles
Monday
Tuesday
Wednesd
ay
Thursday
Friday
Saturday
Sunday0
1
2
3
4
5
6
Co
nce
ntr
atio
n (µ
g/m
3)
Benzene Day-of-week profiles
Diurnal and day-of-week profiles provide information on emissions temporal activity
6
11Session 5: Data Analysis and Interpretation
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Hour
No
rma
lize
d tr
affic
act
ivity
, m
ixin
g
he
igh
t, o
r so
lar
rad
iatio
n Solar Radiation
Diurnal Patterns: Conceptual Model
Concentrations = (Sources – Sinks + Transport)/Dispersion
Source = Traffic Activity
Dispersion = Inverse Mixing Height
Sinks = OH radical
12Session 5: Data Analysis and Interpretation
Time Series Analysis
Nez Perce Example
0
5
10
15
20
25
01-May-06 01-Jul-06 31-Aug-06 31-Oct-06 31-Dec-06 02-Mar-07 02-May-07
conc
en
trat
ion
(pp
b)
HATWAI 1 ITD average LAPWAI LSOB SUNSET 5-site average
Time series of formaldehyde concentrations (ppb) at each Lewiston area monitoring site and a five-site average. Error bars indicate the standard deviation in the five-site average.
7
13Session 5: Data Analysis and Interpretation
Emissions Source Characterization
Receptor Modeling
Biogenic6% Liquid Gas
10%
Evaporative Emissions
7%
Motor Vehicle45%
Natural Gas21%
Industrial Process Losses
11%
Apportionment of benzene (in total VOC) at a Los
Angeles site
Apportionment of VOCs in Edmonton AB
14Session 5: Data Analysis and Interpretation
Emissions Source Characterization (cont.)
Analyses can demonstrate that• Chemical fingerprint profiles are consistent
with emissions source
• Concentrations of certain pollutants are higher when winds are from source direction
• Temporal variability is consistent with emissions activity
• Concentrations at nearby receptors are higher than at other sites
• Receptor modeling identifies and quantifies emissions source
8
15Session 5: Data Analysis and Interpretation
Health Effects Assessments
Upper limit of risk
<1x10-x
Upper limit of risk
>1x10-x Risk>1x10-x
Risk<1x10-x
Risk screening
Where 10-x is user defined
16Session 5: Data Analysis and Interpretation
Health Effects Assessments (cont.)
Model to monitor comparison
9
17Session 5: Data Analysis and Interpretation
Health Effects Assessments
A. Pollutants with a majority of sites with risk estimates > 1-in-a-million risk level
B. Pollutants with most of the data < MDL, but detection limits above the 1-in-a-million risk level
C. Pollutants with the majority of monitoring sites reporting concentrations < the 1-in-a-million risk level including those usually above and below MDL
Risk-weighted concentrations
18Session 5: Data Analysis and Interpretation
Health Effects Assessments (cont.)
0
3
6
9
12
15
18
21
24
27
30
33
36
Form
aldeh
yde
Chrom
ium
Benze
ne
1,3-
Butadie
ne
Carbo
n tet
Aceta
ldehy
de
Chloro
form
1,4-
dich
lorob
enzen
e
Tetra
chlor
oeth
ene
Ch
ron
ic c
ance
r ri
sk (
per
mill
ion)
Chronic cancer risk (per million people) comparison for the highest annual mean concentration monitored in the Lewiston study.
10
19Session 5: Data Analysis and Interpretation
Health Effects Assessments (cont.)
Ambient data can be used to • Perform simple risk screening against levels of
concern
• Calculate risk-weighted concentrations to estimate risk levels
• Validate and evaluate modeling efforts
• Identify temporal variability in concentrations for use in exposure modeling efforts
20Session 5: Data Analysis and Interpretation
Methods Evaluation
Shelow et al., National Air Monitoring Conference, 2009
11
21Session 5: Data Analysis and Interpretation
Methods Evaluation (cont.)
8
6
4
2
0
12:00 PM1/7/2008
12:00 AM1/10/2008
12:00 PM1/12/2008
12:00 AM1/15/2008
12:00 PM1/17/2008
12:00 AM1/20/2008
12:00 PM1/22/2008
12:00 AM1/25/2008
12:00 PM1/27/2008
dat
BC Sunset EC (g/m3)
(g/m3)
22Session 5: Data Analysis and Interpretation
Methods Evaluation (cont.)
8
6
4
2
0
Aet
h B
C
54321Sunset EC
Coefficient values ± one standard deviation a =0.11438 ± 0.0241 b =1.6522 ± 0.0176
1086420
OM concentration µg/m3
Average BC/EC ratio=1.83
12
23Session 5: Data Analysis and Interpretation
Methods Evaluation (cont.)
• Evaluation against standards can test accuracy and precision
• Evaluation against other existing methods can identify biases and real-world performance under ambient conditions
• Novel methods often provide surprising data that lead to better understanding of local emissions sources (e.g., local chrome facilities, solvent releases)
24Session 5: Data Analysis and Interpretation
Trend Analyses
• Trends are useful for demonstrating progress (or lack thereof) in mitigating emissions of air toxics.
• Trend analysis can be complicated by data below MDL, changing methods, and step-changes in ambient concentrations.
13
25Session 5: Data Analysis and Interpretation
Effect of Changes in MDL on Trends Assessment
0
0.0005
0.001
0.0015
0.002
0.0025
0.003
0.0035
0.004
1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003
Co
nce
ntr
atio
n (
µg
/m3 )
Average ConcentrationAverage MDL
Year
Man
gan
ese
PM
2.5
In the national-level investigation of manganese (Mn) trends, MDL trends were similar to concentration trends, making us suspicious of the reliability of the overall ambient trend. This example shows average Mn PM2.5 concentrations and MDLs from 1990 to 2003. For this data set, Hyslop and White (2007) showed that reported MDLs are much lower than actual detection limits. Current recommendations are to be cautious with data within a factor of 6 to 10 of the reported MDL.
The trend shown here may not be a real trend.
26Session 5: Data Analysis and Interpretation
y = - 0 . 0 2 x + 4 8 . 6 3
R 2 = 0 . 8 8
0
0 . 0 5
0 . 1
0 . 1 5
0 . 2
0 . 2 5
0 . 3
0 . 3 5
0 . 4
0 . 4 5
1 9 9 6 1 9 9 8 2 0 0 0 2 0 0 2 2 0 0 4 2 0 0 6
Y e a r
Co
nc
en
tra
tio
n (
µg
/m3
)
Effect of Changes in MDL on Trends Assessment (cont.)
In contrast to the previous Mn PM2.5 trend, this benzene trend does not show influence from a change in MDL (i.e., the trends in concentration and MDL show different patterns).
Benzene 1997-2006 Trend
14
27Session 5: Data Analysis and Interpretation
How to Quantify Trends
• Initial investigation of trends– Inspect first and last year of the trend period or two multi-year
averages for change.– Use simple linear regression to determine the magnitude of a
trend over the trend period.
• Quantifying trends– The percent difference between the first and last year of the
trend period provides a rough sense of the change. – The difference between two multi-year averages provides
another measure of change and helps smooth out possible influences of meteorology.
– The percent change per year is provided by the slope of the regression line. This “normalized” value allows the analyst to compare changes across varying lengths of time (i.e., sites withdifferent trend periods).
• Test for significance (F-test or others)
28Session 5: Data Analysis and Interpretation
Visualizing Trends
y = 0.0851x - 166.54R2 = 0.369
y = 0.1253x - 248.65R2 = 0.5038
0
1
2
3
4
1992 1993 1994 1995 1996 1997 1998 1999 2000
Year
Co
nce
ntr
atio
n (
µg
/m3 )
y = -0.2248x + 450.95R2 = 0.5492
0
1
2
3
4
1992 1993 1994 1995 1996 1997 1998 1999 2000
Co
nce
ntr
atio
n (
µg
/m3 )
Benzene Annual Averages
%Δ = -55%
15
29Session 5: Data Analysis and Interpretation
Formaldehyde Annual Averages
1998 1999 2000 2001 2002 2003 2004 2005 2006Year
0
5
10
15
20
25
30
35
Co
nce
ntr
atio
n (
ug
/m3 )
Formaldehyde Annual Averages
0
5
10
15
20
25
1999 2000 2001 2002 2003 2004 2005
Year
Co
nce
ntr
atio
n (
µg
/m3 )
95th Percentile of Annual ConcentrationAnnual Average ConcentrationAnnual Median Concentration5th Percentile of Annual Concentration
Visualizing Trends (cont.)
30Session 5: Data Analysis and Interpretation
Summarizing Trends
y = -0.16x + 314.62R
2= 0.90
0
0.5
1
1.5
2
2.5
3
1999 2001 2003 2005Year
Co
nce
ntr
atio
n (
µg
/m3 )
Annual AverageAverage MDL
A statistically significant decreasing benzene trend
y = -0.01x + 26.18R2= 0.03
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2000 2002 2004 2006Year
Co
nce
ntr
atio
n (
µg
/m3 )
Annual AverageAverage MDL
A statistically insignificant decreasing benzene trend
• Site-level trends for benzene from two U.S. sites.
• Confidence in these results is high. The data are mostly above detection, MDLs are consistent for the whole trend period, and no outliers appear to influence the trend.
Comparing Trends
16
31Session 5: Data Analysis and Interpretation
Summarizing Trends (cont.)
Example from MDEQ trends report
32Session 5: Data Analysis and Interpretation
Interpretation of Results
• Were data of sufficient quantity and quality to meet project objectives with statistical certainty?– Uncertainty – sampling, analytical, representativeness
– N0.5 – are there enough samples?
– Data quality – contamination and other data issues
• Using the data to test project hypotheses – Emissions characterization
• Chemical source profile comparison (ambient vs. emissions)
• Emissions activity matches expected temporal patterns after adjusting for meteorology
• Wind analysis to corroborate impact of emitter on monitor
• Source apportionment
17
33Session 5: Data Analysis and Interpretation
Interpretation of Results (cont.)
• Using the data to test project hypotheses – Health effects assessment
• Comparison of concentrations to health benchmarks
• Comparison of concentrations to other sites/cities/states/nation
• Identifying pollutants above health benchmarks
– Community baseline• Characterizing annual averages, seasonal variability
• Quantifying toxics concentrations likely to be targeted by emissions reductions measures
• Characterizing spatial variability
– Methods evaluation• Is method more accurate, precise, sensitive?
• Does it have better time resolution?
• How much does it cost versus routine method?
34Session 5: Data Analysis and Interpretation
Lessons Learned from Data Analysis
• Plan for data analysis in your study.• Reserve adequate funds and time (schedule) to
conduct data analysis.• Start looking at your data early in the project as
it is first collected – don’t wait until the end.• Isolating a particular source impact on pollutant
concentrations is tricky (and local met data are vital).
• Many studies noted issues with differing MDLs across labs, too much data below MDL, etc.
18
35Session 5: Data Analysis and Interpretation
Lessons Learned from Data Analysis (cont.)
• Analyses do not always lead to the answer you anticipated.
• Getting a similar result using different analysis approaches gives you more confidence in your results.
• Visualization of data is key (…a picture tells a thousand words).
• Show uncertainty in results to demonstrate statistical significance of findings.