
Processed data file: "M89A_AREA"
Record size: 200 bytes; Records per day: 8; Samples per record: 3; Minutes per sample: 60

RECORD: 1 | CAL1: 18857 CAL2: 62635
M89001.000 C2 | 116709 106833 81992 129625 458268 89422 53084 0
M89001.030 A1 | 118587 108187 87758 120024 32041 48073 45640 0
M89001.060 C1 | 117745 108023 104585 155078 58808 72077 48562 0

RECORD: 2 | CAL1: 18857 CAL2: 62635
M89001.090 C2 | 117333 107242 83220 130655 454178 89622 52106 0
M89001.120 A1 | 119568 107565 82080 118471 31417 45542 44253 0
M89001.150 C1 | 119525 107279 98577 154252 63241 72239 46884 0

*******************************************************************************

Processed data file: "M89_AREA"
Record size: 200 bytes; Records per day: 12; Samples per record: 2; Minutes per sample: 60

RECORD: 1 | CAL1: 62635
M89005.030 A1 | 133639 115526 48404 120229 32122 45119 46083 213023
M89005.060 C1 | 134752 115629 45905 112618 483211 83517 54996 212240

RECORD: 2 | CAL1: 62635
M89005.090 A1 | 131066 116164 47814 120268 32215 46229 46174 213993
M89005.120 C1 | 130249 114536 43369 112737 457665 81288 57024 214091

*******************************************************************************

Processed data file: "M90_HGHT"
Record size: 264 bytes; Records per day: 12; Samples per record: 4; Minutes per sample: 30

RECORD: 1009 | CAL1: 18566 CAL2: 68285
M90200.030 C1 | 6695 6054 2210 10861 6072 2888 3485 13789
M90200.045 A1 | 6716 6092 2255 11331 3167 2779 3493 13912
M90200.060 C2 | 6673 6224 2296 11300 4129 3076 3860 13878
M90200.075 A2 | 6680 6096 2281 11317 3221 2724 3501 13670

RECORD: 1010 | CAL1: 18566 CAL2: 68285
M90200.090 C1 | 6692 6072 2235 10846 6145 2895 3494 13758
M90200.105 A1 | 6667 6090 2289 11210 3156 2747 3508 13656
M90200.120 C2 | 6678 6212 2244 11287 4110 3089 3878 13882
M90200.135 A2 | 6710 6077 2235 11237 3254 2764 3498 13831
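The following Python is a minimal sketch of how the per-peak values in these listings can be separated. It parses one sample line and assumes the text layout shown in the listings: a timestamp, a sample-type label, then eight peak values. It further assumes, as the channel annotations accompanying this figure suggest, that the eight values follow the channel order of the channel summary table: channel A (N2O, CFC-12, CFC-11), channel B (CFC-11, CFC-113, CH3CCl3, CCl4), and channel C (N2O). The names `PEAK_COLUMNS`, `Injection`, and `parse_listing_line` are hypothetical. The actual database files were binary or text with varying layouts.

```python
from dataclasses import dataclass

# Assumed order of the eight peak values: channel A, then B, then C.
# Hypothetical mapping inferred from the channel annotations for this figure.
PEAK_COLUMNS = [
    ("A", "N2O"), ("A", "CFC-12"), ("A", "CFC-11"),
    ("B", "CFC-11"), ("B", "CFC-113"), ("B", "CH3CCl3"), ("B", "CCl4"),
    ("C", "N2O"),
]


@dataclass
class Injection:
    timestamp: str    # e.g. "M89001.030", kept as an opaque string
    sample_type: str  # e.g. "A1" or "C2" as printed in the listing
    peaks: dict       # (channel, compound) -> peak area or height


def parse_listing_line(line: str) -> Injection:
    """Parse one sample line such as 'M89001.030 A1 | 118587 ... 45640 0'."""
    head, values = line.split("|")
    timestamp, sample_type = head.split()
    numbers = [int(v) for v in values.split()]
    if len(numbers) != len(PEAK_COLUMNS):
        raise ValueError(f"expected {len(PEAK_COLUMNS)} values, got {len(numbers)}")
    return Injection(timestamp, sample_type, dict(zip(PEAK_COLUMNS, numbers)))


line = "M89001.000 C2 | 116709 106833 81992 129625 458268 89422 53084 0"
print(parse_listing_line(line).peaks[("B", "CH3CCl3")])  # prints 89422
```

Parsing the first M89A_AREA line this way yields a CH3CCl3 value of 89422. This is the same area that appears in record 35089 of the restructured MLO/MC file.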

The Radiatively Important Trace Species (RITS) Data Recovery Project¹

J.D. Nance³, T.M. Thompson², J.H. Butler², J.W. Elkins²

¹Funded by NOAA’s Environmental Services Data and Information Management Program (ESDIM)

²NOAA Climate Monitoring and Diagnostics Laboratory, 325 Broadway, Boulder, CO 80305

³Cooperative Institute for Research in Environmental Sciences, University of Colorado, Boulder, CO 80309

Table I: RITS System Channel Summary

RITS Channel | Gas Chromatograph | Carrier Gas | Column Packing Material | Detector | Eluted Compounds
A | Hewlett-Packard 5890 | P5 | Porasil B | Electron Capture | N2O, CFC-12, CFC-11
B | Hewlett-Packard 5890 | N2 | OV-101 | Electron Capture | CFC-11, CFC-113, CH3CCl3, CCl4
C | Shimadzu | P5 | Porapak Q | Electron Capture | N2O, SF6

Over the 16-year history of the RITS program, numerous modifications to system hardware and software, and to sampling conventions, gave the basic structure and storage format of the RITS database an evolutionary character. Early chromatogram analysis and quality control were significantly constrained by limited processing power. Atmospheric concentrations had largely been computed from the processed chromatograms piecewise, on an annual basis.

Since the RITS program ended, an enhanced set of quality control methods and graphical analysis techniques has been implemented to re-examine the RITS data in its entirety. This poster focuses on the effort to assemble all of the RITS data into a standardized, finalized form for inclusion in the NOAA data center archives.

Background

The RITS program was launched in 1985 to provide ground-based, in situ atmospheric monitoring of several ozone-depleting and greenhouse gases measured by NOAA/CMDL (Table I). Three-channel gas chromatographs (shown at left) with electron capture detectors were installed at five sites over a five-year period (1986-1990). An additional ship-based deployment spanning the tropical and mid-latitude Pacific Ocean was carried out in the winter and spring of 1989. Secondary calibration standards, referenced to primary gravimetric standards, were prepared in the laboratory and shipped to the ground stations. There they were sampled alternately with outside air. By the end of 1991, the RITS systems at all sites were injecting samples every 30 minutes, producing a total of up to 4700 chromatograms per week.

Between March 1999 and August 2001, the RITS systems were replaced with newer, more capable CATS (Chromatograph for Atmospheric Trace Species) systems.

MLO/MC areas and heights
First year of measurements: 1987; Last year of measurements: 2000; Record size: 20 bytes; Records per day: 48; Minutes per record: 30

RECORD: 35089 M89 001.002 62635 0 000 89422 3368
RECORD: 35091 M89 001.032 1 0 000 48073 2509
RECORD: 35093 M89 001.062 18857 0 000 72077 3670
RECORD: 35095 M89 001.092 62635 0 000 89622 3385
RECORD: 35097 M89 001.122 1 0 000 45542 2384
RECORD: 35099 M89 001.152 18857 0 000 72239 3606
***
RECORD: 35283 M89 005.032 1 0 000 45119 2377
RECORD: 35285 M89 005.062 62635 0 000 83517 3096
RECORD: 35287 M89 005.092 1 0 000 46229 2379
RECORD: 35289 M89 005.122 62635 0 000 81288 3116
***
RECORD: 62163 M90 200.032 18566 0 000 52915 2888
RECORD: 62164 M90 200.047 1 0 000 54394 2779
RECORD: 62165 M90 200.062 68285 0 000 56676 3076
RECORD: 62166 M90 200.077 2 0 000 49393 2724
RECORD: 62167 M90 200.092 18566 0 000 54117 2895
RECORD: 62168 M90 200.107 1 0 000 51115 2747
RECORD: 62169 M90 200.122 68285 0 000 56543 3089
RECORD: 62170 M90 200.137 2 0 000 51095 2764
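The record numbers in this file are consistent with one fixed-length record per 30-minute slot, counted from 1 January 1987. The years 1987 and 1988 together contain 731 days, and 731 × 48 = 35088, so the first slot of day 001 of 1989 is record 35089.

The sketch below shows that indexing. It assumes that record 1 is the first slot of the first year and that the file has no header. It does not attempt to decode the timestamp field itself (e.g. "001.002"). The function names are hypothetical.

```python
from datetime import date, timedelta

FIRST_YEAR = 1987         # "First year of measurements" in the file header
RECORDS_PER_DAY = 48      # one record per 30-minute slot
MINUTES_PER_RECORD = 30
RECORD_SIZE = 20          # bytes per fixed-length record


def record_number(year: int, day_of_year: int, minute_of_day: int) -> int:
    """1-based record number of the 30-minute slot containing the given time.

    Hypothetical helper. Assumes record 1 is the first slot of 1 January
    FIRST_YEAR and that every slot has a record, whether or not an injection
    was made in it.
    """
    day = date(year, 1, 1) + timedelta(days=day_of_year - 1)
    days_elapsed = (day - date(FIRST_YEAR, 1, 1)).days
    slot = minute_of_day // MINUTES_PER_RECORD
    return days_elapsed * RECORDS_PER_DAY + slot + 1


def record_offset(record: int) -> int:
    """Byte offset of a record, assuming the file has no header block."""
    return (record - 1) * RECORD_SIZE


print(record_number(1989, 1, 0))   # prints 35089
```

Under these assumptions, `record_offset` gives the byte position, (record - 1) × 20, at which a reader would seek to fetch a given record directly.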

Data Collection

Chromatograms were normally transported to Boulder on floppy disk by US mail or, in later years, over the internet. In Boulder, the chromatograms were copied to a total of 48 DC600 tape cartridges before routine quality control. They were also copied to hard disk for quality control and processing, and were subsequently stored on a total of 17 magneto-optical disks. The original chromatogram storage formats included both binary and text file types, and the binary types differed in byte order. The entire store of RITS raw data consists of ~2.5 million chromatograms from the five field sites combined.
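Because the binary chromatogram formats differed in byte order, standardizing them involves re-encoding multi-byte values. The sketch below uses Python's standard `struct` module to convert a buffer of 32-bit signed integers from one byte order to another. Treating a whole buffer as int32 values is a simplifying assumption, because the actual RITS file layouts (header fields, value widths) are not described here. The function name `convert_byte_order` is hypothetical.

```python
import struct


def convert_byte_order(raw: bytes, source_order: str, target_order: str = "<") -> bytes:
    """Re-encode a buffer of 32-bit signed integers in a different byte order.

    Hypothetical helper. source_order and target_order are struct prefixes:
    ">" for big-endian, "<" for little-endian. Treating the whole buffer as
    int32 values is a simplifying assumption made for this sketch.
    """
    if len(raw) % 4:
        raise ValueError("buffer length is not a multiple of 4 bytes")
    count = len(raw) // 4
    values = struct.unpack(f"{source_order}{count}i", raw)
    return struct.pack(f"{target_order}{count}i", *values)


big_endian = struct.pack(">2i", 116709, 106833)
little_endian = convert_byte_order(big_endian, ">")
print(struct.unpack("<2i", little_endian))   # prints (116709, 106833)
```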

Chromatogram Standardization, Inventory, and Storage Renewal

Chromatograms were converted to a standard format and run through a series of consistency checks before their storage was renewed on CD-ROM. The format-standardizing program checked for “time folds” (regions of overlapping data caused by system clock changes). It also checked for other inconsistencies between the internal (file header) and external (filename) descriptors. Sample-type labeling errors were detected by plotting ratios of processed peak measures for nearby environmental and calibration sample injections. Cross-channel inconsistencies were detected with an inventory program. This program recorded the station, timestamp, sample type, and channel of each chromatogram found within each 30-minute time slot (30 minutes being the shortest interval between sample injections in the RITS data). Inconsistencies were found in ~1% of the chromatograms rechecked; these were corrected and reanalyzed to recover the lost data.
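The two structural checks described above can be sketched as follows:

- `find_time_folds` flags any chromatogram whose timestamp fails to advance past its predecessor. This is how a backwards clock change shows up in file order.
- `cross_channel_problems` groups inventory entries by station and 30-minute slot. It then reports slots with a missing or duplicated channel, or with channels that disagree on sample type.

Both function names are hypothetical, and the inventory tuple layout is an assumption. The sketch also assumes that each sample injection produces one chromatogram on each of channels A, B, and C.

```python
from collections import defaultdict
from datetime import datetime

CHANNELS = {"A", "B", "C"}  # assumption: every injection is seen on all three channels


def find_time_folds(timestamps):
    """Indices where a timestamp does not advance past the one before it.

    Hypothetical helper. A system clock set backwards produces data that
    overlap data already recorded (a "time fold"). In file order this appears
    as a step that is zero or negative.
    """
    return [i for i in range(1, len(timestamps)) if timestamps[i] <= timestamps[i - 1]]


def slot_of(t: datetime) -> datetime:
    """Start of the 30-minute slot containing t."""
    return t.replace(minute=t.minute - t.minute % 30, second=0, microsecond=0)


def cross_channel_problems(inventory):
    """Report 30-minute slots whose chromatograms are inconsistent across channels.

    Hypothetical helper. inventory: iterable of
    (station, timestamp, sample_type, channel) tuples.
    Returns {(station, slot_start): [issue, ...]} for problem slots only.
    """
    slots = defaultdict(list)
    for station, t, sample_type, channel in inventory:
        slots[(station, slot_of(t))].append((channel, sample_type))
    problems = {}
    for key, entries in sorted(slots.items()):
        channels = [c for c, _ in entries]
        types = {s for _, s in entries}
        issues = []
        missing = CHANNELS - set(channels)
        if missing:
            issues.append("missing channel(s): " + ",".join(sorted(missing)))
        if len(channels) != len(set(channels)):
            issues.append("duplicate channel")
        if len(types) > 1:
            issues.append("sample types disagree: " + ",".join(sorted(types)))
        if issues:
            problems[key] = issues
    return problems
```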

Database Restructuring

Another type of data loss was traced to the coarse time resolution of the original database files. Each entire sampling cycle was grouped into a single data record with a single timestamp. Whenever the normal sampling cycle was interrupted, this grouping led to inadvertent and inappropriate timestamp modification, and to data loss by overwriting. The problem was addressed by restructuring the database of analyzed peaks so that every sample injection has its own timestamp.

The restructured database was first initialized with timestamps and sample types from the chromatogram inventory. An algorithm then matched the peak analysis outputs stored in the original database to the appropriate inventoried chromatograms and transferred the data into the new database. Although this form of data loss was relatively minor, restructuring the database offered several important additional advantages:

1. The restructured database is compatible with all of the varied types of original database files. Thus, all of the data associated with a given analysis peak could be collected into a single file, regardless of the details of the sampling cycle.

2. The new database was scanned for overwritten samples, i.e., initialized records to which no peak analysis outputs were transferred from the original database. These typically numbered on the order of a thousand per station. The scan also revealed tens of thousands of additional good-quality samples that had been overlooked during prior analyses. All overwritten and overlooked chromatograms were retrieved and analyzed to fill in the gaps.

3. A flag byte was added to each data record so that individual injections could be flagged for equipment problems. A single poor-quality calibration sample can adversely affect several individual computations of a compound’s atmospheric concentration. Flagging such samples before final reduction is therefore a powerful way to improve the overall quality of the final dataset.

4. Isolating each chromatographic peak in its own file facilitates potential analyses of additional peaks (e.g. SF6 in channel C).
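The restructuring step can be sketched under two simplifying assumptions:

- Each original record is expanded into per-injection timestamps using a fixed sample interval and the position of the injection that carries the record's timestamp (the AIR1 sample, per the database annotations).
- Values are transferred only where an expanded injection exactly matches an inventoried (time, sample type) pair.

Inventory entries left empty are the candidates for overwritten or overlooked samples described in item 2. The function names and parameters are hypothetical. The real matching had to cope with several sampling cycles and timestamp conventions.

```python
from datetime import datetime, timedelta


def expand_cycle(record_time, cycle, values, minutes_per_sample, anchor_index):
    """Give each injection in a one-timestamp cycle record its own timestamp.

    Hypothetical helper.
    record_time        the single timestamp stored with the original record
    cycle              sample types in injection order, e.g. ["C2", "A1", "C1"]
    values             peak values in the same order as cycle
    minutes_per_sample spacing between injections
    anchor_index       position in cycle whose injection time is record_time
    """
    step = timedelta(minutes=minutes_per_sample)
    return [
        (record_time + (i - anchor_index) * step, sample_type, value)
        for i, (sample_type, value) in enumerate(zip(cycle, values))
    ]


def restructure(inventory, expanded_records):
    """Build a per-injection database initialized from the inventory.

    Hypothetical helper.
    inventory: list of (time, sample_type) pairs from the chromatogram inventory.
    expanded_records: iterable of (time, sample_type, value) from expand_cycle.
    Returns (database, gaps); gaps are inventoried injections that received no
    value, i.e. candidates for overwritten or overlooked samples.
    """
    database = {(t, s): None for t, s in inventory}
    for t, s, value in expanded_records:
        if (t, s) in database:  # exact match only; a simplification
            database[(t, s)] = value
    gaps = sorted(key for key, v in database.items() if v is None)
    return database, gaps
```

Consider the cycle C2, A1, C1 at 60-minute spacing, with the record time attached to A1. Expanding it this way places the values 89422, 48073, and 72077 from the first M89A_AREA record one hour apart. They land in the same order as restructured records 35089, 35091, and 35093.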

Summary: Primary Reasons for Data Loss

1. Raw data (i.e., chromatogram) recording errors
• Timestamp, sample-type/channel identifiers

2. Problems with the original chromatogram analysis
• Misidentified peaks
• Excessively/insufficiently constrained analysis
• Limitations of analysis software

3. Problems with the original analyzed-peak database
• Variant structure dependent on details of sampling cycle
• Several injections grouped under a single timestamp
• No facility for flagging samples of poor quality
• Analyses of additional peaks very inconvenient

Data Reduction

Chromatogram analysis was most often performed in Boulder, using modified BASIC software acquired during the very early stages of the RITS program. The sole exception was 1988-1993, when logistical constraints required South Pole chromatograms to be analyzed on site. The analysis outputs (i.e., peak areas and heights) were assembled into record-oriented database files, in binary or text format, for later retrieval when atmospheric concentrations were computed. Each database file was structured according to one of several multiple-injection sampling cycles. Each data record held a full cycle of injections and was assigned a single timestamp. Both the details of the sampling cycle and the form of the timestamp changed over time.

Chromatogram Analysis Issues

Newly developed graphical displays of the database revealed further problems, apart from the non-uniform data storage formats and the data lost to chromatogram recording errors. Substantial data loss had occurred during chromatogram analysis because of limitations of the analysis software:

• Misidentified peaks
• Missed peaks (excessively constrained analysis method)
• Temporal instability of analysis (insufficiently constrained analysis method)

Much of this data loss ultimately resulted from the software's inability to devote all of its limited resources to one peak at a time. The software was modified to give it this ability, and the affected chromatograms were reanalyzed.
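To illustrate what analyzing one peak at a time can mean, the sketch below integrates a single peak inside its own retention-time window. It draws a straight baseline between the window endpoints and applies the trapezoidal rule. This is a generic textbook approach chosen for illustration, not the modified RITS BASIC software. The window indices and the function name `integrate_peak` are hypothetical.

```python
def integrate_peak(signal, dt, start, end):
    """Area and height of one peak in signal[start:end+1] above a straight baseline.

    Hypothetical helper.
    signal      detector samples taken at a fixed interval dt
    start, end  indices bounding this peak's own retention-time window
    The baseline runs between the two window endpoints; the area uses the
    trapezoidal rule, and the height is the largest baseline-corrected sample.
    """
    if not 0 <= start < end < len(signal):
        raise ValueError("window must satisfy 0 <= start < end < len(signal)")
    n = end - start
    y0, y1 = signal[start], signal[end]
    corrected = [signal[start + k] - (y0 + (y1 - y0) * k / n) for k in range(n + 1)]
    area = dt * (sum(corrected) - 0.5 * (corrected[0] + corrected[-1]))
    height = max(corrected)
    return area, height


# A triangular peak (0, 2, 4, 2, 0) sitting on a baseline that rises by 1 per sample.
area, height = integrate_peak([10, 13, 16, 15, 14], 0.5, 0, 4)
print(area, height)   # prints 4.0 4.0
```

Because each call sees only its own window, the settings for one peak cannot be compromised by those of its neighbors.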

[Figure: Chromatography Problems: Flagging Example. Before and After panels for Channel A (N2O, CFC-12, CFC-11), Channel B (CFC-11, CFC-113, CH3CCl3, CCl4), and Channel C (N2O).]

[Figure: Chromatogram Reanalysis: Three Examples (Chromatogram Examples). Panels: Misidentified Peaks, Missed Peaks, and Temporal Instability, each shown as Original Analysis and Reanalysis.]

[Figure panels: Niwot Ridge, Colorado; Barrow, Alaska; Mauna Loa, Hawaii; Cape Matatula, American Samoa; South Pole, Antarctica; Ocean Cruise]

The Original Database: Examples From 3 Files

Channel A: N2O, CFC-12, CFC-11

Channel B: CFC-11, CFC-113, CH3CCl3, CCl4

Channel C: N2O

Details of sampling cycle

Areas and heights kept in separate files. Timestamps associated with AIR1 sample.

The Restructured Database

All Mauna Loa CH3CCl3 peak areas and heights are contained in a single file.

Every sample injection is initialized with a timestamp and sample type from the chromatogram inventory.

A flag byte is used to mark individual injections for chromatography problems. These injections can be passed over during final reduction (i.e. the computation of atmospheric concentrations). One of several possible computational algorithms can also be set using the flag byte.
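The flag byte can be sketched as a small bit field. In the layout below, bit 0 marks an injection to be passed over and the upper four bits hold an algorithm code. This layout is purely hypothetical, because the poster does not give the actual bit assignments. The constant and function names are likewise illustrative.

```python
# Hypothetical layout of the one-byte flag (not the documented RITS layout):
EXCLUDE = 0x01            # bit 0: chromatography problem, skip in final reduction
ALGORITHM_SHIFT = 4       # bits 4-7: code selecting a computational algorithm
ALGORITHM_MASK = 0xF0


def make_flag(exclude: bool = False, algorithm: int = 0) -> int:
    """Pack the exclusion bit and a 4-bit algorithm code into one byte."""
    if not 0 <= algorithm <= 0x0F:
        raise ValueError("algorithm code must fit in four bits")
    return (EXCLUDE if exclude else 0) | (algorithm << ALGORITHM_SHIFT)


def is_excluded(flag: int) -> bool:
    """True if the injection should be passed over during final reduction."""
    return bool(flag & EXCLUDE)


def algorithm_of(flag: int) -> int:
    """Extract the algorithm code from the upper four bits."""
    return (flag & ALGORITHM_MASK) >> ALGORITHM_SHIFT


flag = make_flag(exclude=True, algorithm=3)
print(hex(flag), is_excluded(flag), algorithm_of(flag))   # prints 0x31 True 3
```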

[Figure: Database Restructuring Example: Mauna Loa CH3CCl3. All line-connected data points are used to compute atmospheric concentrations; off-line data points are ignored.]
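The effect of flagging on final reduction can be sketched with a simple bracketing calibration. Each unflagged air injection is referenced to a calibration response interpolated linearly in time between the nearest unflagged calibration injections on either side. The result is then scaled by the standard's assigned concentration.

This sketch assumes a linear detector response and a single calibration standard. Both are simplifications, since some RITS sampling cycles included two calibration standards (CAL1 and CAL2). It is an illustration, not the program's reduction algorithm, and `reduce_air` and its "air"/"cal" labels are hypothetical. Flagged calibration runs drop out of the interpolation, so a single bad one cannot distort the air samples on either side of it.

```python
from bisect import bisect_left


def reduce_air(injections, cal_concentration):
    """Air concentrations referenced to bracketing calibration injections.

    Hypothetical helper.
    injections: (time, kind, area, excluded) tuples sorted by time, where kind
    is "air" or "cal" and time is numeric, e.g. hours.
    cal_concentration: assigned concentration of the calibration standard.
    Assumes a linear detector response and a single calibration standard.
    """
    cals = [(t, a) for t, kind, a, excluded in injections if kind == "cal" and not excluded]
    cal_times = [t for t, _ in cals]
    results = []
    for t, kind, area, excluded in injections:
        if kind != "air" or excluded:
            continue  # flagged air samples are passed over, like flagged cals
        i = bisect_left(cal_times, t)
        if i == 0 or i == len(cals):
            continue  # no unflagged calibration on one side of this sample
        (t0, a0), (t1, a1) = cals[i - 1], cals[i]
        cal_area = a0 + (a1 - a0) * (t - t0) / (t1 - t0)
        results.append((t, cal_concentration * area / cal_area))
    return results


# The calibration run at t=2 is flagged, so both air samples use the cals at t=0 and t=4.
runs = [(0, "cal", 100.0, False), (1, "air", 110.0, False), (2, "cal", 120.0, True),
        (3, "air", 90.0, False), (4, "cal", 100.0, False)]
print(reduce_air(runs, 500.0))   # prints [(1, 550.0), (3, 450.0)]
```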