Data Challenges from the NASA Perspective
Science Mission Directorate
Space Science Archives at NASA
Jeffrey Hayes, Science Mission Directorate, NASA HQ
July 15, 2004
The funding agencies and muses
How we really work!
(Note where Urania is…)
Space Science Archives
The idea of archiving data is old in the NASA science community. Data are the only legacy a mission has once it has ended.
Because data were being archived in a haphazard manner early on, it was decided in the late 1960s to form a central archive to capture all Space Science data: the National Space Science Data Center (NSSDC) at Goddard Space Flight Center (GSFC). This is still the “deep archive” for all NASA Space Science missions.
Space Science Archives
Over the years the NSSDC evolved to include not only Space Science data, but also large ground-based astronomical catalogues needed by both the NASA and the general science communities.
All data were available on request.
All data were maintained on (by today’s standards) very primitive media (e.g. cards, 7-track magnetic tape).
This leads to questions of accessibility and usefulness.
Standards
• For data to be useful, they must conform to a format that most of the community agrees on and can understand.
– Catalogue data were standardized on the 80-character line of data (a hold-over from punch cards). This was limiting, because some records were longer than 80 characters. Image data were essentially photographic prints. Spectroscopy was in lists of lines and wavelengths.
– All of these formats had severe limitations for transport and further analysis.
Standards
• In 1977 the Flexible Image Transport System (FITS) standard was published (Wells). This was a self-describing data structure that allowed for the storage of data on a computer as a file with an embedded header.
• Quickly became the standard in the astronomical and parts of the Solar physics communities, and with substantial modifications, is still the standard.
• All FITS data can be read by all FITS readers (with provisos).
• Became a NASA standard for data in 1999.
• New standards are coming on-line (XML, VOTable).
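To illustrate what "self-describing" means for a FITS file, here is a minimal Python sketch that reads an embedded header. It relies on the FITS convention that a header is a sequence of 80-character "cards" (keyword in columns 1-8, "= " in columns 9-10, then a value and an optional "/ comment"), terminated by an END card. The function names and the sample cards are hypothetical, and the sketch covers only simple logical, numeric, and string values; it is not a full FITS reader (an established library such as astropy.io.fits would be the practical choice).

```python
def _convert(text):
    """Turn the value field of a card into a Python bool, int, float, or str."""
    text = text.strip()
    if text == "T":
        return True
    if text == "F":
        return False
    try:
        return int(text)
    except ValueError:
        pass
    try:
        # FITS permits a Fortran-style 'D' exponent in floating-point values.
        return float(text.replace("D", "E"))
    except ValueError:
        return text


def parse_card(card):
    """Split one 80-character header card into (keyword, value, comment)."""
    if len(card) != 80:
        raise ValueError("a FITS header card is exactly 80 characters")
    keyword = card[:8].strip()
    # Columns 9-10 hold "= " only when the card carries a value.
    if card[8:10] != "= ":
        return keyword, None, card[8:].strip()
    body = card[10:]
    if body.lstrip().startswith("'"):
        # Quoted string: read up to the closing quote; '' is an escaped quote.
        i = body.index("'") + 1
        chars = []
        while i < len(body):
            if body[i] == "'":
                if i + 1 < len(body) and body[i + 1] == "'":
                    chars.append("'")
                    i += 2
                    continue
                break
            chars.append(body[i])
            i += 1
        value = "".join(chars).rstrip()
        comment = body[i + 1:].partition("/")[2].strip()
    else:
        raw, _, comment = body.partition("/")
        value = _convert(raw)
        comment = comment.strip()
    return keyword, value, comment


def parse_header(block):
    """Read consecutive 80-character cards into a dict, stopping at END."""
    header = {}
    for start in range(0, len(block), 80):
        keyword, value, _ = parse_card(block[start:start + 80])
        if keyword == "END":
            break
        if value is not None:
            header[keyword] = value
    return header


# Hypothetical sample header, padded to 80 characters per card.
cards = [
    "SIMPLE  =                    T / conforms to FITS standard",
    "BITPIX  =                   16 / bits per pixel",
    "NAXIS   =                    2",
    "OBJECT  = 'M31     '           / target name",
    "END",
]
header = parse_header("".join(c.ljust(80) for c in cards))
print(header)
```

Because the header travels inside the file, any reader that understands the card layout can discover the data type and dimensions without outside documentation.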
Archival Evolution
The NSSDC was a great idea, but the advances in computer science made it out of date. Astronomers wanted access to digital data that was straightforward and did not require translation from one data format to another. There was also a desire for rapid dissemination of that data other than by US Mail.
The growth in the Internet and the decline of computer hardware prices made this possible.
Archival Evolution
• In the mid-1980s NASA tried to develop the Astrophysics Data System (ADS), which would combine all astronomical data. It was a failure because there was not enough compute power or network capacity for such a system. (A sideline to the ADS work was digitizing all the astronomical literature, and the ADS is now the premier site for such data in the world. All scanned papers are in PDF or JPEG formats.)
• In 1990 NASA started the HEASARC with the more limited goal of systematically archiving all High Energy Astrophysics data. This was very successful and became the model for other space science archives.
Active Archives
• HEASARC was the first of a series of active archives that let the community interact with the data by downloading post-Level 1 data sets. The archive model now consists of scientists who maintain the integrity of the data, and also develop new and better tools for the manipulation and analysis of these data.
• The model has been generalized across wavelength regimes: HEA -- HEASARC; UV/Optical -- MAST; IR/sub-mm -- IRSA; CMB -- LAMBDA; etc.
• The planetary sciences have the Planetary Data System (PDS), which parallels the Astronomy archives, but with differences (it includes FITS as well as other standards, and its nodes are based on science discipline type).
Philosophy
• The last decade has seen phenomenal growth in the power and capability of computers. This has allowed archives to evolve rapidly and to respond better to the community’s needs.
• We have moved from a mainframe, static archive philosophy to one that is more mobile and dynamic, and that evolves through feedback to offer more sophisticated data products.
• We have moved away from simple curation to managing the data. Data must now be migrated from one medium to another with a reasonable plan on how to do it.
Philosophy
• It is NASA policy that all science data come into the public domain as soon as possible. The researcher signs an agreement that all data taken by a NASA mission will be archived and, after a suitable length of time (within 6 to 12 months from the date of observation/data acquisition), the data become publicly accessible. There are very few exceptions to this rule.
Hardware
• Hardware is now cheap. Both memory and disks have reached the point where it is logical and practical to keep as much data as possible on spinning disk and make it accessible via the Web to all users.
• Less than 10 years ago, we were still using 9-track tapes, some Exabytes, some DAT, and M/O disks. Little data was “spinning”. Now with the advent of RAID disks on the TB scale, flash drives on the GB scale, and laptops with G-flop compute power, the only problem is bandwidth. That too is getting cheaper and more accessible.
• The NASA archives are trying to keep abreast of all these developments.
Software
• The archiving tools used are usually COTS products. It is not cost effective to develop entire stand-alone SQL systems for archives. The customization is in adapting their use in an astronomical context. However as the evolution of database management continues, we are seeing a tremendous flexibility in how these data can be managed.
• The compute power also allows for the development of very sophisticated data imaging, manipulation, and analysis tools. This is now considered to be within the purview of the active science archives.
Management
• Day-to-day operations are managed at the active archive level by a scientist responsible to NASA HQ for the assigned activities.
• There are biannual meetings of a coordinating committee (ADEC) with NASA HQ having ex-officio membership.
• The active Astronomy archives are peer-reviewed every 4 years in the Astronomy Senior Review process. On-going and new activities are proposed and judged. Funding can be reallocated as needed.
Operational Archives
• The suite of NASA Space Science archives currently consists of:
– 1 “deep” archive: NSSDC
– 7 active archives: HEASARC, MAST, IRSA, LAMBDA, MSC, PDS, SSDC
– 2 data services: ADS, NED
– 3 on-going Great Observatory missions have stand-alone archives which are associated with the above active archives: HST, CXO, SIRTF
Interoperability
• There is a movement in the astronomical community to have access to data from multiple wavelength regimes in order to cross-correlate them. The Space Science archives are now working on a plan to implement such interoperability (the NVO by another name). In addition, we want to incorporate both theory and modeling data in the infrastructure.
• This promises a huge leap in science by using the Internet, Grid technologies, and very fast computing techniques.
• NASA is working on a response to the white paper produced by the archives.
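As a rough illustration of what cross-correlating data from different wavelength regimes involves, the Python sketch below matches sources from two catalogues by sky position. It is only a sketch under stated assumptions: the catalogue entries, source names, and the 2 arcsecond match radius are invented for the example, and the brute-force search would not scale to real archive holdings, where a spatial index would be needed.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two (RA, Dec) positions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    d_ra = ra2 - ra1
    d_dec = dec2 - dec1
    # Haversine formula, which stays accurate for small separations.
    a = (math.sin(d_dec / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin(d_ra / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cross_match(cat_a, cat_b, radius_arcsec=2.0):
    """For each source in cat_a, find the nearest cat_b source within the radius.

    Each catalogue is a list of (name, ra_deg, dec_deg) tuples.
    Returns a list of (name_a, name_b, separation_arcsec).
    """
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for name_a, ra_a, dec_a in cat_a:
        best = None
        for name_b, ra_b, dec_b in cat_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_deg and (best is None or sep < best[1]):
                best = (name_b, sep)
        if best is not None:
            matches.append((name_a, best[0], best[1] * 3600.0))
    return matches


# Hypothetical X-ray and infrared source lists (name, RA, Dec in degrees).
xray = [("X1", 10.0, 20.0), ("X2", 150.0, -30.0)]
infrared = [("IR1", 10.0002, 20.0001), ("IR2", 200.0, 5.0)]

matches = cross_match(xray, infrared)
for name_a, name_b, sep in matches:
    print(f"{name_a} <-> {name_b}: {sep:.2f} arcsec")
```

In this made-up example only X1 and IR1 lie within the match radius of each other; X2 has no infrared counterpart and is left unmatched.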
Historical Data
• One last point: What to do with old, pre-digital data?
– Photographic plates exist in huge numbers and are both fragile and of finite lifetime. Harvard and Caltech are digitizing their plate collections, but most other institutions cannot because of the cost.
– Do we accept the loss of such historical data, or do we collect only those collections at large or national observatories which have a uniform pedigree?
Should data have expiration dates? (Like milk.)
NASA Astronomy Archives
Backup Material
Data Archive Centers: LAMBDA
Legacy Archive for Microwave Background Data Analysis (LAMBDA)
• http://lambda.gsfc.nasa.gov/
• “One Stop Shopping for CMB Researchers”
• Contains Cosmic Microwave Background data and data products from WMAP, COBE, IRAS, SWAS missions; related software (CMBFAST, HEALPix etc); and archives of news and science papers.
Data Archive Centers: IRSA
NASA/IPAC InfraRed Science Archive (IRSA)
• http://irsa.ipac.caltech.edu/
• “Archive node for scientific data sets from NASA’s infrared and sub-millimeter astronomy projects and missions”
• Contains data from 2MASS, IRAS, MSX, SWAS, ISO, Spitzer, and related inventory, software, and data exploration services.
Data Archive Centers: MAST
Multimission Archive at Space Telescope (MAST)
• http://archive.stsci.edu/
• “Supports a variety of astronomical data archives, with the primary focus on scientifically related data sets in the optical, ultraviolet, and near-infrared parts of the spectrum”
• Contains data and data products from HST, FUSE, IUE, EUVE, ASTRO, HUT, UIT, WUPPE, and others;
• Also, catalogues and surveys from GALEX, SDSS, GSC, DSS, VLA-FIRST, relevant software (STSDAS), etc.
Data Archive Centers: HEASARC
High Energy Astrophysics Science Archive Research Center (HEASARC)
• http://heasarc.gsfc.nasa.gov/
• “An archive of astronomy data from extreme ultraviolet, X-ray, and gamma-ray observatories”
• Contains data from ASCA, BeppoSAX, CGRO, Chandra, EUVE, HETE-2, Integral, ROSAT, RXTE, XMM-Newton, and others. In the future will serve data from Astro-E2 and Swift. Also multi-mission software and analysis tools, and information for educators and the public.
Data Archive Centers: CXC
Chandra X-ray Center (CXC)
• http://chandra.harvard.edu/
• Center for Chandra science and calibration data, proposer information, data analysis software assistance, public information and education resources.
Services: ADS and NED
NASA Astrophysics Data System (ADS)
• http://adswww.harvard.edu/
• The main body of data in the ADS consists of bibliographic records searchable through database queries, and full-text scans of much of the astronomical literature.
NASA/IPAC Extragalactic Database (NED)
• http://nedwww.ipac.caltech.edu/
• Database built around a master list of extragalactic objects; bibliographic references, photometry, position and redshift data, etc.
Data Archive Centers: MSC
Michelson Science Center (MSC)
• http://msc.caltech.edu/
• “Science operations and analysis service organization for selected NASA Origins Theme projects” - software infrastructure, science ops and consulting to Navigator Program projects and their user communities;
• Up-and-coming archive for data from the Palomar Testbed Interferometer, Keck Interferometer, SIM, and TPF.
Data Archive Centers: NSSDC
National Space Science Data Center (NSSDC)
• http://nssdc.gsfc.nasa.gov/
• “The NSSDC is responsible for the long term archiving and preservation of all space science data” - provides a permanent archive for OSS data (for space physics, solar physics and planetary/lunar, as well as astrophysics)
• Relatively recent data is held on CD-ROMs; older astrophysics datasets available on offline media.
Data Archive Centers: SSDC
Solar Science Data Center (SSDC)
• http://ssdc.gsfc.nasa.gov/
• Provides a permanent archive for Solar data (for space physics, solar physics as well as upper atmospheric physics)
• Relatively recent data is held on CD-ROMs; older astrophysics datasets available on offline media.
• Data is in mixed formats: mainly FITS for imaging, but HDF used for spectra.
• Co-located with NSSDC at GSFC.
Data Archive Centers: PDS
Planetary Data System (PDS)
• http://pds.jpl.nasa.gov/
• Provides active archive for various aspects of planetary mission data. Unlike other centers, it is discipline specific, not wavelength specific (i.e. rings, small bodies, satellites, etc.)
• Relatively recent data is held on CD-ROMs; older datasets available on RAID disks or on offline media.
• Data is in mixed formats: some FITS for imaging, but mainly HDF of various flavors for other data.
• Various discipline nodes across the country with central coordination at JPL Central Node.
Data Archives: Funding Levels
NASA archive funding levels in FY04 (in $M)
LAMBDA        0.9
IRSA          1.2
MAST/HST      0.85 (HST ~4M)
HEASARC       2.8
CXC           (archive costs hard to deconvolve from overall Center costs)
NSSDC/SSDC    ~7 (complicated because of shared costs from Solar Science)
MSC           7.5
ADS           0.8
NED           1.3
PDS           9 (for all nodes)