1
Automated Data Quality Assurance for Marine Observations
James V. Koziana
Science Applications International Corporation (SAIC)
Hampton, VA 23666 USA
Third Meeting of GCOOS DMAC
Renaissance Orlando Hotel, Forbes Place, Orlando, FL
23-24 February 2009
2
Outline
• Introduction
  – IOOS
  – State of Oceans (time and space)
• Data Quality Assurance System
  – Quality Assurance Pyramid
  – Science Data Challenges
  – Quality Assurance System: block diagram and discussions
• Results
• Conclusions
  – Present and Future Applications
  – Wrap-up
3
Vice Admiral Lautenbacher, Jr., U.S. Navy (Ret.), Under Secretary of Commerce for Oceans & Atmosphere; November 21, 2005
11 groups funded by the NOAA Coastal Services Center to establish Regional Associations (RAs): ACOOS, NANOOS, SCCOOS, CenCOOS, PacIOOS, GCOOS, CaRA, SECOORA, MACOORA, NERA, GLOS
U.S. Integrated Ocean Observing System (IOOS)
Vision: Lead the way in the provision of products and services based on ocean observations for a wide range of societal benefits.
Goal: Achieve unprecedented levels of resolution, quality, and distribution of all global and coastal ocean observations to improve predictions of ecosystem, weather and water, and climate events.
IOOS Requirements
• The observing-system vision will bring streams of real-time data from a distributed sensor system.
• Each data provider will prepare its data using Data Management and Communications (DMAC) standards and protocols.
4
State of Oceans & Coasts Varies Across Time and Space
IOOS Core Variables
Geophysical
1. Sea surface meteorological variables
2. Land–sea stream flows
3. Sea level
4. Surface waves, currents
5. Ice distribution
6. Temperature, salinity
7. Bathymetry
Biophysical
1. Optical properties
2. Benthic habitats
Chemical
1. pCO2
2. Dissolved inorganic nutrients
3. Contaminants
4. Dissolved oxygen
Biological
1. Fish species, abundance
2. Zooplankton species, abundance
3. Phytoplankton species, biomass (ocean color)
4. Waterborne pathogens
5
Data Quality
Data Quality refers to the quality of data. Data are of high quality "if they are fit for their intended uses in operations, decision making and planning" (J.M. Juran). Alternatively, the data are deemed of high quality if they correctly represent the real-world construct to which they refer. These two views can often be in disagreement, even about the same set of data used for the same purpose.
Quality Assurance (QA) and Quality Control (QC)
Quality assurance (QA): an integrated system of activities involving planning, quality control, quality assessment, reporting and quality improvement to ensure that a product or service meets defined standards of quality with a stated level of confidence.
Quality control (QC): the overall system of technical activities whose purpose is to measure and control the quality of a product or service so that it meets the needs of users. The aim is to provide quality that is satisfactory, adequate, dependable, and economical.
Quality Assurance Division, National Center for Environmental Research and Quality Assurance, Office of Research and Development, U.S. Environmental Protection Agency, December 10, 1997
6
Quality Checks
• Static: single-station, single-time checks
  – Locate outliers in the observations
  – Unaware of the previous or current meteorological or hydrological situation described by other observations and grids
  – Examples: validity, internal consistency, vertical consistency
• Dynamic: checks that refine the QC information by taking advantage of other available hydrological information
  – Position consistency
  – Temporal consistency
  – Spatial consistency
• Single-character "data descriptor" for each observation
  – Provides an overall opinion of quality by combining the information from the various quality checks
  – The algorithm used to compute the data descriptor depends on the types of QC checks applied to the observation and on their sophistication:
    • Level 1: least sophisticated
    • Level 2: medium
    • Level 3: most sophisticated
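A minimal sketch of how a single-character data descriptor could be derived from individual check results. The level assignments and codes here are assumptions: "L" (limits) and "V" (time continuity) follow the flag letters used later in this presentation, "0" marks an observation that failed no check, and the rule that the most sophisticated failed check decides the descriptor is a hypothetical precedence, not the documented algorithm.

```python
from typing import Dict, Tuple

# Hypothetical check catalogue: check name -> (QC level, descriptor code).
# A higher level means a more sophisticated check.
CHECK_LEVELS: Dict[str, Tuple[int, str]] = {
    "range_limit": (1, "L"),
    "time_continuity": (2, "V"),
}

PASSED = "0"  # descriptor for an observation that failed no check


def data_descriptor(results: Dict[str, bool]) -> str:
    """Combine per-check pass/fail results into one character.

    `results` maps a check name to True (passed) or False (failed).
    The code of the highest-level failed check wins; if nothing failed,
    the observation gets PASSED.
    """
    failed = [CHECK_LEVELS[name] for name, ok in results.items() if not ok]
    if not failed:
        return PASSED
    # Tuples compare by level first, so max() picks the most sophisticated failure.
    return max(failed)[1]
```

Under this precedence, an observation that fails both the range and the time-continuity checks is labeled "V", which is consistent with how the 15:00 and 16:00 values are labeled in the buoy 42007 example.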
7
Quality Assurance Pyramid (begin at the bottom and work upward)
Levels, from the base of the pyramid upward:
1. Range limit checks (dynamic, seasonal and regional)
2. Rate checks (time continuity)
3. Inter-comparisons (same sensor, same platform)
4. Inter-comparisons (nearby similar sensor/platform)
5. Inter-comparisons (dissimilar sensor/platform)
6. Comparison with statistical trends
7. Comparison with remotely sensed data
8. Comparison with model
Toward the base: algorithms are more concrete and work on a wider variety of data.
Toward the top: algorithms are more conceptual, work on a smaller variety of data, and give increased accuracy.
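As one illustration of the "inter-comparisons (same sensor, same platform)" level, the sketch below compares time-aligned readings from two redundant sensors on one platform (for example a primary ATMP1 sensor and a hypothetical second air temperature sensor). The tolerance is a caller-supplied, hypothetical parameter, and treating a missing reading as "not flagged" is an assumed design choice.

```python
from typing import List, Optional, Sequence


def intercompare(primary: Sequence[Optional[float]],
                 secondary: Sequence[Optional[float]],
                 tolerance: float) -> List[bool]:
    """Return True where two co-located sensors agree within `tolerance`.

    Both series must be time-aligned. None marks a missing reading; such
    points cannot be compared and are therefore not flagged (True).
    """
    if len(primary) != len(secondary):
        raise ValueError("series must be time-aligned and of equal length")
    agree = []
    for a, b in zip(primary, secondary):
        if a is None or b is None:
            agree.append(True)  # nothing to compare against
        else:
            agree.append(abs(a - b) <= tolerance)
    return agree
```

A disagreement alone does not say which sensor is wrong, which is one reason such checks sit above the simpler single-sensor range and rate checks.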
Science Data Lifecycle
Phases: System Development → Deployment/O&M → Exploitation
• System Development: sensor development, platform development, algorithm development, algorithm implementation, model development, model implementation, data center development
• Deployment/O&M: platform operations, platform maintenance, data acquisition, processing, QA, dissemination, archiving, data center ops
• Exploitation: basic research, model runs, applications, decision support
• Outcomes: enhanced understanding, forecasts
9
Science Data Management Challenges
• Data management systems are an important component of Earth Science missions
  – They support the primary mission goal of timely delivery of high-quality data products to the science community
• They are expensive and time-consuming to develop and maintain
  – Relative size of data management code vs. science code
  – Problems with the traditional stovepipe development approach
  – Continuous change associated with science algorithms
• The science team needs to be able to allocate its time and resources to science, not data management
  – Data management functions represent 60–80% of the code for typical science applications
• Continuous change is inherent in research environments
  – The highly iterative nature of algorithm development drives numerous code changes during development and after launch
  – Data management code that is tightly coupled to algorithms and data products can be time-consuming to modify, as changes ripple throughout the code
  – This leads to a stovepipe approach that results in significant duplication of code and effort
10
Data Quality Assurance System for Earth Science Data and Information
• Scalable, modular system that can be used to address various methods of characterizing the quality of data products
• This approach facilitates science software development:
  – Reduces the level of effort required and program risk (cost-effective)
  – Allows the data management team to be more responsive to science algorithm developers (flexibility)
• The system is designed to:
  – Include substantial core functionality that is common to any science application
  – Be easily configurable to work with many different data sets (observations and model output)
  – Readily accommodate algorithm and data product additions and modifications with minimal code changes
  – Balance flexibility and performance
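One way to achieve "additions and modifications with minimal code changes" is a registry: each QC algorithm registers itself under a name, and a run-time configuration lists which names to apply and with what parameters. The sketch below assumes that design; the `register` decorator, the algorithm names, and the configuration format are hypothetical, not the system's actual interface.

```python
import math
from typing import Callable, Dict, List, Sequence

# Every QC algorithm shares one signature: values + parameters -> pass/fail list.
QCAlgorithm = Callable[[Sequence[float], dict], List[bool]]

# The algorithm library: algorithm name -> implementation.
ALGORITHM_LIBRARY: Dict[str, QCAlgorithm] = {}


def register(name: str) -> Callable[[QCAlgorithm], QCAlgorithm]:
    """Decorator that adds a QC algorithm to the library under `name`."""
    def wrap(func: QCAlgorithm) -> QCAlgorithm:
        ALGORITHM_LIBRARY[name] = func
        return func
    return wrap


@register("range_limit")
def range_limit(values: Sequence[float], params: dict) -> List[bool]:
    # NaN fails both comparisons, so missing-as-NaN values fail this check too.
    return [params["min"] <= v <= params["max"] for v in values]


@register("finite")  # e.g. a user-supplied check that rejects NaN/inf
def finite(values: Sequence[float], params: dict) -> List[bool]:
    return [math.isfinite(v) for v in values]


def run_configured(values: Sequence[float], config: List[dict]) -> Dict[str, List[bool]]:
    """Apply each algorithm named in `config`; the entry's other keys are its parameters."""
    return {step["algorithm"]: ALGORITHM_LIBRARY[step["algorithm"]](values, step)
            for step in config}
```

A new check then needs only a registered function and a new configuration entry such as `{"algorithm": "range_limit", "min": 24.3, "max": 31.3}`; the control loop itself does not change.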
11
Data Quality Assurance System Block Diagram
[Figure: Input data (NetCDF, HDF, SensorML, XML, ASCII) pass through the Input Subsystem into the framework's common data structure components. The Control Subsystem, driven by "run time" defined configuration files, applies QC algorithms (Algorithm QC 1, QC 2, QC 3, plus user-supplied QC algorithms) from the Algorithm Library. The Output Subsystem writes the output file (NetCDF, HDF, SensorML, XML, ASCII). A data store (configuration files) and databases are also part of the framework. Legend: input, data flow, output.]
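The ASCII path through the block diagram can be sketched as: the input subsystem parses each record into a common data structure, QC algorithms set a flag on it, and the output subsystem writes the record back with the flag appended. The whitespace-separated layout (station, date, time, parameter, value) mirrors the NDBC example that follows; the `Observation` class and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, List


@dataclass
class Observation:
    """Common data structure shared by the input, QC and output stages."""
    station: str
    time: datetime
    parameter: str
    value: float
    flag: str = "0"  # stays "0" unless a QC algorithm assigns another code


def read_ascii(lines: Iterable[str]) -> Iterator[Observation]:
    """Input subsystem: parse 'station date time parameter value' records."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        station, date, clock, parameter, value = line.split()
        # strptime accepts the unpadded month/day/hour used in the NDBC sample.
        when = datetime.strptime(f"{date} {clock}", "%m/%d/%Y %H:%M:%S")
        yield Observation(station, when, parameter, float(value))


def write_ascii(observations: Iterable[Observation]) -> List[str]:
    """Output subsystem: echo each record with its QC flag appended."""
    out = []
    for ob in observations:
        # Reproduce the input's unpadded M/D/YYYY H:MM:SS style.
        date = f"{ob.time.month}/{ob.time.day}/{ob.time.year}"
        clock = f"{ob.time.hour}:{ob.time.minute:02d}:{ob.time.second:02d}"
        out.append(f"{ob.station} {date} {clock} {ob.parameter} {ob.value:g} {ob.flag}")
    return out
```

Keeping the parsed records in one structure is what lets NetCDF, HDF, or XML readers be swapped in later without touching the QC algorithms.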
18
Dqa_app input_data input_data_cfg input_limits input_limits_cfg output_data output_data_cfg
NDBC-provided air temperature data for Buoy 42007, 6/25/08 (input: first five columns) and quality-checked data for Buoy 42007, 6/25/08 (output: the same records with a QC flag appended)

Station  Date       Time      Parameter  Value  QC flag
42007    6/25/2008  0:00:00   ATMP1      29     0
42007    6/25/2008  1:00:00   ATMP1      29     0
42007    6/25/2008  2:00:00   ATMP1      28.9   0
42007    6/25/2008  3:00:00   ATMP1      29     0
42007    6/25/2008  4:00:00   ATMP1      29.1   0
42007    6/25/2008  5:00:00   ATMP1      28.7   0
42007    6/25/2008  6:00:00   ATMP1      28.8   0
42007    6/25/2008  7:00:00   ATMP1      28.9   0
42007    6/25/2008  8:00:00   ATMP1      28.5   0
42007    6/25/2008  9:00:00   ATMP1      28.5   0
42007    6/25/2008  10:00:00  ATMP1      28.4   0
42007    6/25/2008  11:00:00  ATMP1      28.2   0
42007    6/25/2008  12:00:00  ATMP1      28.3   0
42007    6/25/2008  13:00:00  ATMP1      28.2   0
42007    6/25/2008  14:00:00  ATMP1      26.6   0
42007    6/25/2008  15:00:00  ATMP1      64.5   V
42007    6/25/2008  16:00:00  ATMP1      37.5   V
42007    6/25/2008  17:00:00  ATMP1      35.5   L
42007    6/25/2008  18:00:00  ATMP1      28.2   0
42007    6/25/2008  19:00:00  ATMP1      28.2   0
42007    6/25/2008  20:00:00  ATMP1      28.3   0
42007    6/25/2008  21:00:00  ATMP1      28.4   0
42007    6/25/2008  22:00:00  ATMP1      28.6   0
42007    6/25/2008  23:00:00  ATMP1      28.4   0
QA procedures applied:
• Range checks (hard and soft flags)
• Time-continuity checks (hard and soft flags)
Input, output, and control configuration files:
• Input/output configuration files: file type, data parameters, data set dimensions
• Control subsystem (i.e., data flow): quality assurance procedures
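The two QA procedures above can be sketched as follows. The limits are the June ATMP values from the Central Gulf of Mexico table on the next slide (24.3 to 31.3 °C); applying that regional table to buoy 42007 is an assumption. The 10 °C-per-hour time-continuity threshold is hypothetical, the precedence rule (a time-continuity failure outranks a limits failure) is assumed, and the hard/soft flag distinction is not modeled. With those choices, the sketch reproduces the three flags in the quality-checked output above.

```python
from typing import List, Sequence


def range_check(values: Sequence[float], lo: float, hi: float) -> List[bool]:
    """Pass where lo <= value <= hi (seasonal/regional limits)."""
    return [lo <= v <= hi for v in values]


def time_continuity_check(values: Sequence[float], max_step: float) -> List[bool]:
    """Pass where the change from the previous hourly value is <= max_step.

    The first value has no predecessor and passes by default.
    """
    ok = [True]
    for prev, cur in zip(values, values[1:]):
        ok.append(abs(cur - prev) <= max_step)
    return ok


def qa_flags(values: Sequence[float], lo: float, hi: float, max_step: float) -> List[str]:
    """'V' = failed time continuity, 'L' = failed limits only, '0' = passed both."""
    in_range = range_check(values, lo, hi)
    continuous = time_continuity_check(values, max_step)
    return ["V" if not c else "L" if not r else "0"
            for r, c in zip(in_range, continuous)]


# Hourly ATMP1 values for buoy 42007 on 6/25/2008, 0:00 through 23:00.
atmp = [29, 29, 28.9, 29, 29.1, 28.7, 28.8, 28.9, 28.5, 28.5, 28.4, 28.2,
        28.3, 28.2, 26.6, 64.5, 37.5, 35.5, 28.2, 28.2, 28.3, 28.4, 28.6, 28.4]

flags = qa_flags(atmp, lo=24.3, hi=31.3, max_step=10.0)
print([(hour, flag) for hour, flag in enumerate(flags) if flag != "0"])
# [(15, 'V'), (16, 'V'), (17, 'L')]
```

The 17:00 value (35.5) changes only 2.0 °C from 16:00, so it passes the time-continuity check but still exceeds the June maximum, which is why it receives "L" rather than "V".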
19
Source of Data
• Western Gulf of Mexico Recent Marine Data: http://ndbc.noaa.gov/maps/WestGulf.shtml
• Louisiana/Mississippi Coastal Region Recent Marine Data: http://ndbc.noaa.gov/maps/WestGulf_inset.shtml
20
Regional and Seasonal Limits, Central Gulf of Mexico

Parameter       JAN     FEB     MAR     APR     MAY     JUN     JUL     AUG     SEP     OCT     NOV     DEC
BARO MAX (hPa)  1029.3  1028.4  1026.7  1025.3  1022.4  1021.4  1022.2  1021.1  1020.1  1022.7  1026.2  1028.1
BARO MIN (hPa)  1010.3  1007.7  1007.9  1008.2  1009.8  1010.1  1012.5  1011.1  1008.5  1008.3  1010.2  1011.0
ATMP MAX (°C)   27.9    27.9    28.5    28.9    30.0    31.3    32.1    32.1    31.5    30.6    29.3    28.1
ATMP MIN (°C)   12.7    13.2    14.6    18.1    21.6    24.3    25.5    25.5    24.6    21.2    16.9    14.4
WTMP MAX (°C)   28.2    28.1    28.3    28.5    29.4    30.6    31.5    31.8    32.7    30.5    29.5    28.7
WTMP MIN (°C)   18.5    17.8    18.3    18.0    23.1    25.8    27.2    27.6    25.5    24.9    22.2    20.3
WDIR MAX (deg)  360.0   360.0   360.0   360.0   360.0   360.0   360.0   360.0   360.0   360.0   360.0   360.0
WDIR MIN (deg)  0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
WSPD MAX (m/s)  14.5    14.2    13.9    12.6    11.3    10.3    9.4     9.8     12.8    13.4    13.6    13.8
WSPD MIN (m/s)  0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
WVHGT MX (m)    3.0     3.0     2.9     2.5     2.2     2.0     1.7     1.8     2.7     2.7     2.7     2.8
WVHGT MN (m)    0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
[Figure: Seasonal/regional air temperature ranges, air temperature (max) and air temperature (min) by month of year]
[Figure: Seasonal/regional wind speed, wind speed (max) and wind speed (min) by month of year]
NDBC Technical Document 03-02
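A limits table like the one above can be kept as a plain ASCII file and looked up by parameter and month at run time. The sketch parses four rows copied from the table; the row layout (parameter, bound, then twelve monthly values) and the mapping of the table's "MX"/"MN" abbreviations to MAX/MIN are assumptions about how such a configuration file could be organized.

```python
from typing import Dict, List, Tuple

# Rows copied from the Central Gulf of Mexico limits table; the table
# itself abbreviates the wave height bounds as MX/MN.
LIMITS_TEXT = """\
ATMP MAX 27.9 27.9 28.5 28.9 30.0 31.3 32.1 32.1 31.5 30.6 29.3 28.1
ATMP MIN 12.7 13.2 14.6 18.1 21.6 24.3 25.5 25.5 24.6 21.2 16.9 14.4
WVHGT MX 3.0 3.0 2.9 2.5 2.2 2.0 1.7 1.8 2.7 2.7 2.7 2.8
WVHGT MN 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
"""

BOUND_ALIASES = {"MX": "MAX", "MN": "MIN"}

Limits = Dict[Tuple[str, str], List[float]]


def parse_limits(text: str) -> Limits:
    """Map (parameter, 'MAX' or 'MIN') to twelve monthly values, JAN..DEC."""
    table: Limits = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        parameter, bound, *monthly = fields
        if len(monthly) != 12:
            raise ValueError(f"expected 12 monthly values for {parameter} {bound}")
        table[(parameter, BOUND_ALIASES.get(bound, bound))] = [float(x) for x in monthly]
    return table


def seasonal_limits(table: Limits, parameter: str, month: int) -> Tuple[float, float]:
    """Return (min, max) for `parameter` in calendar month 1..12."""
    return (table[(parameter, "MIN")][month - 1],
            table[(parameter, "MAX")][month - 1])


limits = parse_limits(LIMITS_TEXT)
print(seasonal_limits(limits, "ATMP", 6))  # (24.3, 31.3)
```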
21
[Figure: Air Temperature for Station 42001 (June 25, 08). Atmospheric temperature (C) vs. time (hours), with the max and min air temperature limits; flagged observations are marked V and L.]
Note: L = failed limits check; V = failed time-continuity test
22
[Figure: Wind Speed for Station 42001 (August 2005). Wind speed (meters/second) vs. time, showing maximum and minimum wind speed, wind direction, storm flag, and the seasonal/regional max and min limits. Annotation: "a" soft flags mark observations above the seasonal/regional max limit. Hurricane Katrina, August 23 to 29, 2005.]
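In the air temperature example, values outside the seasonal limits received hard flags ("L"), whereas in the wind speed example observations above the seasonal maximum received soft "a" flags. One way to support both, assumed here, is to make the severity of a limit check a configuration setting. The function and its parameters are hypothetical; the August WSPD maximum of 9.8 m/s comes from the Central Gulf of Mexico table, and the wind speeds in the usage lines are made-up illustrations, not NDBC data.

```python
from typing import List, Sequence


def limit_flags(values: Sequence[float], lo: float, hi: float, soft: bool,
                soft_code: str = "a", hard_code: str = "L") -> List[str]:
    """Flag values outside [lo, hi]; '0' means within limits.

    soft=True marks out-of-range values with soft_code (questionable,
    kept for review); soft=False marks them with hard_code (failed).
    """
    bad = soft_code if soft else hard_code
    return ["0" if lo <= v <= hi else bad for v in values]


# Hypothetical wind speeds (m/s) checked against the August WSPD limits (0.0 to 9.8).
print(limit_flags([5.2, 12.0, 31.0], lo=0.0, hi=9.8, soft=True))   # ['0', 'a', 'a']
print(limit_flags([5.2, 12.0, 31.0], lo=0.0, hi=9.8, soft=False))  # ['0', 'L', 'L']
```

Counting soft flags (for example `sum(f == "a" for f in flags)`) gives the kind of per-station tally reported in the wrap-up.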
23
Present and Future Applications
• The QA system is well suited to a diverse science community employing many different methods to characterize the quality of its data products
  – It allows data providers, from single-data providers to large-scale providers (i.e., observational and model), to perform automated quality assurance on their data products
• The QA system processes a real-time data stream, producing:
  – A high-quality data product
  – Associated metadata
  – Aggregated quality flags
• Expanding the QA system:
  – To address additional input/output data types
  – To enhance the algorithm library by developing additional quality control algorithms:
    o Simple quality tests (e.g., storm limits, time continuity, internal consistency, and others)
    o Higher-order algorithms that exploit the relationships between sensors and parameters
  – To explore more sophisticated algorithms that provide a higher level of accuracy for specific sensor configurations
  – To explore applying data mining to quality control
  – To define an interface to the QA system for data providers
• Use of state-of-the-art visualization and analysis tools to monitor the real-time data streams
  – How users analyze data to determine the root causes of problems, and how they edit the data
24
Wrap-Up
• Scalable, modular Data Quality Assurance System
  – The QA algorithm library is extendable to other data parameters by changing ASCII configuration files
  – Reduces the level of effort required (cost-effective)
  – Easily configurable to work with many different data sets (observations and model output)
  – Readily accommodates algorithm and data product additions and modifications with minimal code changes
• QA was performed at the same confidence level (range limit and time-continuity checks)
• Initial validation testing with NDBC products
  – Limited set of daily air temperature (24 hours, station 42007) and wind speed (3 days, station 42001) data
  – Air temperature: 3 hard flags
  – Wind speed: 48 soft flags