Regulatory Perspectives on Data Quality
Nick Mangus
US EPA / OAR / Air Quality System (AQS) Team
for the Earth Science Information Partners
July 13, 2011
04/20/23 1U.S. Environmental Protection Agency
Nightmare Scenario
Quality Data is Foundational for Policy
• First, know the quality
• Data Quality Objectives
– Each purpose has associated DQOs
– Prepare to respond to challenges
• Methods, Operation, Audits, etc.
• Quality Assurance Project Plans
– SLT QAPPs
– EPA Regional Review
– EPA HQ QAPP
• Precision and Accuracy tests and reporting (“quality” metadata)
• Independent performance audits
There is nothing either good or bad, but thinking makes it so.
- Hamlet
The Regulatory Data-Cycle
Monitor the Air
Handle (QA, Flag) Data
Acquire Data
Report (Load) Data
Analyze
Regulate
Store
Disseminate
FED
SLT
NGO
Data Management Issues
• I’m not a quality person
• I’m in data management: data comes first
– Data Quality vs. Quality Data
• Think food quality vs. quality food
• Particular issues down the Value Chain
– Custody
– Movement
– Changes
– Calculations
Custody and Movement
• Who can change the data?
– AQS business model: data is forever owned by the submitter (e.g., not the feds)
– When a question / complaint comes in, all we can do is pass it up the chain
– We have sufficient metadata to know who’s touched it and (usually) why data is an outlier
• Movement: recent ETL example of rounding
AQS Data Mart ETL
We spend a lot of time comparing values: random spot checks
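
A random spot check of this kind can be sketched roughly as follows. This is a minimal illustration only, not the AQS or Data Mart implementation: the function name `spot_check`, the dictionary layout, and the sample values are all hypothetical. Values are compared as exact decimals so that a value rounded somewhere in the ETL step shows up as a mismatch.

```python
import random
from decimal import Decimal


def spot_check(source, target, sample_size, seed=None):
    """Compare a random sample of records between two stores.

    `source` and `target` are hypothetical dicts mapping a record key to
    the stored value as a string. Returns a list of
    (key, source_value, target_value) tuples for every sampled record
    whose values differ; target_value is None if the record is missing.
    """
    rng = random.Random(seed)
    keys = sorted(source)
    sample = rng.sample(keys, min(sample_size, len(keys)))

    mismatches = []
    for key in sample:
        if key not in target:
            mismatches.append((key, source[key], None))
            continue
        # Decimal comparison is numeric: "0.0620" equals "0.062",
        # but "0.0755" does not equal the rounded "0.076".
        if Decimal(source[key]) != Decimal(target[key]):
            mismatches.append((key, source[key], target[key]))
    return mismatches


# Illustrative data: the first record was rounded during the move.
source = {"site1-hour00": "0.0755", "site1-hour01": "0.0620"}
target = {"site1-hour00": "0.076", "site1-hour01": "0.062"}
print(spot_check(source, target, sample_size=2, seed=1))
```

Comparing decimals rather than floats or raw strings is a design choice for the sketch: it ignores harmless trailing zeros while still catching a changed digit.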
Calculations
• We get hourly ozone data from a monitor
• We calculate:
– Submitted value in standard units of measure
– 8-hour aggregates
– Daily aggregates (of 1-hour and 8-hour values [2], for each standard [3 x ?], in/excluding flagged data [3])
– Quarterly aggregates (ditto)
– Annual aggregates (ditto)
– 3-year aggregates (ditto); substitutions for missing data
• A lot can go wrong – software QA is essential
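
To give a feel for how the first two aggregation levels multiply, here is a minimal sketch of 8-hour averages and a daily maximum computed from hourly values, with flagged data either included or excluded. It is an illustration under stated assumptions, not the AQS calculation: the function names, the forward-looking 8-hour window, and the `min_hours=6` completeness threshold are all assumptions chosen for the example.

```python
def usable_values(records, include_flagged):
    """Turn (value, flagged) pairs into values, with None for unusable hours.

    `records` is a hypothetical list of (value, flagged) tuples, one per
    hour in time order; value is None when the hour is missing.
    """
    return [
        None if value is None or (flagged and not include_flagged) else value
        for value, flagged in records
    ]


def eight_hour_averages(hourly, min_hours=6):
    """Mean of each run of 8 consecutive hours, or None if too few are valid.

    The window convention and the min_hours threshold are assumptions
    for illustration only.
    """
    averages = []
    for start in range(len(hourly) - 7):
        window = [v for v in hourly[start:start + 8] if v is not None]
        if len(window) >= min_hours:
            averages.append(sum(window) / len(window))
        else:
            averages.append(None)
    return averages


def daily_max(values):
    """Largest valid value, or None if nothing is valid."""
    valid = [v for v in values if v is not None]
    return max(valid) if valid else None


# Ten illustrative hours; the last one carries a flag.
records = [(float(h), False) for h in range(1, 10)] + [(10.0, True)]
for include in (True, False):
    averages = eight_hour_averages(usable_values(records, include))
    print(include, averages, daily_max(averages))
```

Even this small sketch yields a different daily value depending on whether the flagged hour is included, which is one reason every downstream aggregate has to be produced and checked in each variant.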
Changes
• Regulatory changes drive data changes
– Recent examples:
• Change the number of significant figures carried through calculations
• Change the substitution routines based on data completeness
– Apply retroactively throughout history
• Analysis artifacts (speciation carbon)
• Old submittals
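
As an illustration of why a change in the digits carried through a calculation matters, the sketch below averages the same two values while keeping either 3 or 4 decimal places on the inputs. Everything here is hypothetical: the function names, the use of truncation, and the sample values are assumptions for the example and do not represent the actual regulatory rules.

```python
from decimal import Decimal, ROUND_DOWN


def truncate(value, places):
    """Drop all digits beyond `places` decimal places (no rounding up)."""
    return value.quantize(Decimal(1).scaleb(-places), rounding=ROUND_DOWN)


def mean_with_precision(values, input_places, output_places):
    """Average string-valued inputs, carrying a fixed number of decimals.

    Inputs are truncated to `input_places` before averaging and the mean
    is truncated to `output_places`. Both conventions are assumptions.
    """
    inputs = [truncate(Decimal(v), input_places) for v in values]
    mean = sum(inputs) / len(inputs)
    return truncate(mean, output_places)


values = ["0.0749", "0.0759"]  # illustrative values only
print(mean_with_precision(values, input_places=3, output_places=3))  # 0.074
print(mean_with_precision(values, input_places=4, output_places=3))  # 0.075
```

The reported result moves by one unit in the last place purely because of the digits carried, so applying such a change retroactively means recomputing every stored aggregate that depends on it.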
Summary
• Quality is expensive / time consuming
• Pushing issues / metadata back up the chain is an unresolved issue (?)
• One mistake can tarnish a reputation that took 1,000 correct actions to create
• Your system is optional
• We have to work together to keep each other’s systems meaningful and viable
You want it bad, you get it bad.
- Lillis
Note for reviewers: the following slides are not part of the presentation. They are “hip pocket” slides, to be used only if particular questions are asked.
AQ Data Chain – One View
Disseminate Decide Evaluate Calculate Store Validate Verify Collect
Design Purchase Deploy Operate Collect Analyze QA (flag) Report
Data Owner (SLT)
Data Custodian (EPA)