Working with Data Providers in a Distributed Data Environment
![Page 1: Working with Data Providers in a Distributed Data Environment](https://reader036.fdocuments.in/reader036/viewer/2022081517/568157a7550346895dc536d5/html5/thumbnails/1.jpg)
Raymond J. Walker, Todd A. King, Steven P. Joy, Lee F. Bargatze, Peter Chi, James Weygand, Robert L. McPherron
Presented at Virtual Observatories in Geoscience, Denver, Colorado
June 12, 2007
![Page 2: Working with Data Providers in a Distributed Data Environment](https://reader036.fdocuments.in/reader036/viewer/2022081517/568157a7550346895dc536d5/html5/thumbnails/2.jpg)
New Challenges
Background
• Most Heliophysics data are available through independent repositories.
– Found around the world
– Use different metadata standards
– Are organized differently
• The Heliophysics Virtual Observatories have been tasked with connecting these disparate repositories into a logical whole that enables scientists to locate and access the data and services they need.
![Page 3: Working with Data Providers in a Distributed Data Environment](https://reader036.fdocuments.in/reader036/viewer/2022081517/568157a7550346895dc536d5/html5/thumbnails/3.jpg)
The VMO Approach: Part I
• Selected the Space Physics Archive Search and Extract (SPASE) metadata standard and its XML representation to describe resources.
• Enables interoperability in a federated environment.
– Acts as an “interlingua” or intermediate language through which the VMO communicates with data repositories.
– Common metadata allows the repositories to be interconnected.
• Current state of SPASE
– Version 1.2.0 has been released and VMO has baselined to that version.
– Defined a standard data model for all of Heliophysics.
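As a rough illustration of what a resource description in SPASE XML might look like, the sketch below builds a tiny document with Python's standard library. The element names (Spase, Version, NumericalData, ResourceID, ResourceHeader, ResourceName, Description) are modeled on SPASE vocabulary but have not been checked against the 1.2.0 schema, and the resource identifier and text values are invented for the example.

```python
import xml.etree.ElementTree as ET


def build_description(resource_id, name, description, version="1.2.0"):
    """Build a minimal SPASE-style description (element names are assumptions)."""
    root = ET.Element("Spase")
    ET.SubElement(root, "Version").text = version
    data = ET.SubElement(root, "NumericalData")
    ET.SubElement(data, "ResourceID").text = resource_id
    header = ET.SubElement(data, "ResourceHeader")
    ET.SubElement(header, "ResourceName").text = name
    ET.SubElement(header, "Description").text = description
    return root


# Hypothetical resource; the identifier is made up for illustration.
root = build_description(
    "spase://Example/NumericalData/Demo/PT1M",
    "Demo magnetometer data",
    "Hypothetical 1-minute magnetic field values.",
)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Because every repository would describe its holdings with the same element names, a registry can index descriptions from different sites without site-specific parsing.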
![Page 4: Working with Data Providers in a Distributed Data Environment](https://reader036.fdocuments.in/reader036/viewer/2022081517/568157a7550346895dc536d5/html5/thumbnails/4.jpg)
The Elements
• Resource descriptions are stored in registries.
• The VMO provides services:
– Query registries
– Aggregate and organize the responses
– Direct users to the resource
– Provide data services (reformat, manipulate, display, and analyze)
[Diagram labels: Resource, Repository, Registry, Access point, Model and Methods]
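The first three services on this slide (query registries, aggregate and organize the responses, direct users to the resource) can be sketched as below. This is only an illustration under stated assumptions: the registries are in-memory lists, and the record fields, registry names, and URLs are all hypothetical rather than taken from the VMO software.

```python
from collections import defaultdict

# Hypothetical registries: each maps to a list of resource descriptions.
REGISTRIES = {
    "registry_a": [
        {"resource_id": "spase://Example/A/Mag1",
         "measurement": "MagneticField",
         "access_url": "http://example.org/a/mag1"},
        {"resource_id": "spase://Example/A/Plasma1",
         "measurement": "ThermalPlasma",
         "access_url": "http://example.org/a/plasma1"},
    ],
    "registry_b": [
        {"resource_id": "spase://Example/B/Mag2",
         "measurement": "MagneticField",
         "access_url": "http://example.org/b/mag2"},
    ],
}


def query_registry(records, measurement):
    """Return the records in one registry that match the requested measurement."""
    return [r for r in records if r["measurement"] == measurement]


def query_all(registries, measurement):
    """Query every registry and group the matching access URLs by registry."""
    grouped = defaultdict(list)
    for name, records in registries.items():
        for record in query_registry(records, measurement):
            grouped[name].append(record["access_url"])
    return dict(grouped)


matches = query_all(REGISTRIES, "MagneticField")
for registry, urls in matches.items():
    print(registry, urls)
```

The returned access URLs stand in for the step of directing users to the resource; a real service would also need to handle unreachable registries and differing response formats.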
![Page 5: Working with Data Providers in a Distributed Data Environment](https://reader036.fdocuments.in/reader036/viewer/2022081517/568157a7550346895dc536d5/html5/thumbnails/5.jpg)
Organized in a Self-declared Network
[Diagram labels: VMO, VxO, Resident Archive, Individual Researcher]
![Page 6: Working with Data Providers in a Distributed Data Environment](https://reader036.fdocuments.in/reader036/viewer/2022081517/568157a7550346895dc536d5/html5/thumbnails/6.jpg)
The Approach: Part II
• Generate resource descriptions in SPASE XML.
• The SPASE data dictionary is scientifically very rich.
• SPASE is so rich that the learning curve is steep.
• At best, populating the registries is a formidable task.
• Most data providers do not have the resources to create the SPASE metadata and populate the registries.
• Develop a system for creating and populating the metadata with minimum effort.
![Page 7: Working with Data Providers in a Distributed Data Environment](https://reader036.fdocuments.in/reader036/viewer/2022081517/568157a7550346895dc536d5/html5/thumbnails/7.jpg)
Creating SPASE Metadata
• Built tools to edit and verify the SPASE metadata.
• Built tools to populate the registries.
• Enlisted a group of domain experts (X-men) to work with data providers.
![Page 8: Working with Data Providers in a Distributed Data Environment](https://reader036.fdocuments.in/reader036/viewer/2022081517/568157a7550346895dc536d5/html5/thumbnails/8.jpg)
Qualifications of the Magnetospheric X-men
• Research scientists who are actively engaged in the analysis of magnetospheric data.
– Must understand space plasma physics.
– Must understand space particles and fields instruments or have sufficient background that they can quickly learn about them.
– Must be expert in time series data analysis techniques.
• X-men must augment their scientific background with training in the principles of data management.
• Must understand the details of the SPASE data model.
• Must be expert in tools used for creating the metadata and populating the registries.
![Page 9: Working with Data Providers in a Distributed Data Environment](https://reader036.fdocuments.in/reader036/viewer/2022081517/568157a7550346895dc536d5/html5/thumbnails/9.jpg)
What X-men do
• Develop a plan to make all of the data useful for magnetospheric research available to the community.
– We are working to make the list exhaustive.
– The list includes correlative data which we plan to access through the other VxOs.
• Prioritize the ingestion tasks and work out an ingestion schedule.
• Contact data providers and jointly work out a plan to include their data in the VMO.
![Page 10: Working with Data Providers in a Distributed Data Environment](https://reader036.fdocuments.in/reader036/viewer/2022081517/568157a7550346895dc536d5/html5/thumbnails/10.jpg)
• The SPASE data model is complex.
• The X-men have identified structure in the model that can be used to build tools to aid in writing the high level metadata.
![Page 11: Working with Data Providers in a Distributed Data Environment](https://reader036.fdocuments.in/reader036/viewer/2022081517/568157a7550346895dc536d5/html5/thumbnails/11.jpg)
SPASE Editors Developed by VMO (Web Based)
![Page 12: Working with Data Providers in a Distributed Data Environment](https://reader036.fdocuments.in/reader036/viewer/2022081517/568157a7550346895dc536d5/html5/thumbnails/12.jpg)
SPASE Editors Developed by VMO (Excel and Matlab)
(Input by VMO members or data providers)
Programmed by VMO
![Page 13: Working with Data Providers in a Distributed Data Environment](https://reader036.fdocuments.in/reader036/viewer/2022081517/568157a7550346895dc536d5/html5/thumbnails/13.jpg)
SPASE Editors Developed by VMO (IDL)
[Diagram labels]
SPASE Model (spase_model): 1) Ontology Tree, 2) Enumeration Lists, 3) Custom Settings
WDC Geomagnetic Master Catalog (wdc_1_min): 1) Acknowledgement File, 2) Data Granule Existence Map, 3) Granule Path, Name, Specifics
create_spase_structure, populate_structure, write_structure
XML Files, Version 1.2
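The three routine names on this slide (create_spase_structure, populate_structure, write_structure) suggest a create, populate, write pipeline. The slide does not show their signatures or behavior, so the Python functions below are only loose analogs written for illustration. The MODEL dictionary is a hypothetical slice of a data model, and the values are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical slice of a data model: nested dicts are containers, None marks a leaf.
MODEL = {
    "Spase": {
        "Version": None,
        "NumericalData": {"ResourceID": None, "Acknowledgement": None},
    }
}


def create_structure(model):
    """Return a nested dict mirroring the model, with every leaf set to ''."""
    return {
        key: create_structure(child) if isinstance(child, dict) else ""
        for key, child in model.items()
    }


def populate_structure(structure, values, path=()):
    """Fill leaves in place from `values`, which is keyed by tuple paths."""
    for key, child in structure.items():
        here = path + (key,)
        if isinstance(child, dict):
            populate_structure(child, values, here)
        elif here in values:
            structure[key] = values[here]
    return structure


def _append(parent, body):
    """Recursively add child elements for each key in `body`."""
    for key, child in body.items():
        element = ET.SubElement(parent, key)
        if isinstance(child, dict):
            _append(element, child)
        else:
            element.text = child


def write_structure(structure):
    """Serialize a single-rooted structure to an XML string."""
    (root_name, body), = structure.items()
    root = ET.Element(root_name)
    _append(root, body)
    return ET.tostring(root, encoding="unicode")


# Invented values standing in for entries from a catalog.
values = {
    ("Spase", "Version"): "1.2.0",
    ("Spase", "NumericalData", "ResourceID"): "spase://Example/NumericalData/Station/PT1M",
    ("Spase", "NumericalData", "Acknowledgement"): "Hypothetical acknowledgement text.",
}
xml_text = write_structure(populate_structure(create_structure(MODEL), values))
print(xml_text)
```

Separating the empty structure from the values that fill it means the same model can be reused for every station or data set in a catalog.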
![Page 14: Working with Data Providers in a Distributed Data Environment](https://reader036.fdocuments.in/reader036/viewer/2022081517/568157a7550346895dc536d5/html5/thumbnails/14.jpg)
Why Not Just One Editor?
• Each of the three X-men uses a different SPASE editing scheme.
• The SPASE learning curve is sufficiently steep that they didn’t want to learn a new software system.
• The three tools use approaches with which they are comfortable.
• The existence of these three approaches, plus others developed by the SPASE consortium, will hopefully allow data providers to select software with which they are comfortable.
• For a firsthand discussion, see Bargatze et al. (this meeting).
![Page 15: Working with Data Providers in a Distributed Data Environment](https://reader036.fdocuments.in/reader036/viewer/2022081517/568157a7550346895dc536d5/html5/thumbnails/15.jpg)
Automating the Generation of Detailed SPASE XML
![Page 16: Working with Data Providers in a Distributed Data Environment](https://reader036.fdocuments.in/reader036/viewer/2022081517/568157a7550346895dc536d5/html5/thumbnails/16.jpg)
Validation Tool
![Page 17: Working with Data Providers in a Distributed Data Environment](https://reader036.fdocuments.in/reader036/viewer/2022081517/568157a7550346895dc536d5/html5/thumbnails/17.jpg)
SPASE Data Dictionary Tool
![Page 18: Working with Data Providers in a Distributed Data Environment](https://reader036.fdocuments.in/reader036/viewer/2022081517/568157a7550346895dc536d5/html5/thumbnails/18.jpg)
Registry Tools
![Page 19: Working with Data Providers in a Distributed Data Environment](https://reader036.fdocuments.in/reader036/viewer/2022081517/568157a7550346895dc536d5/html5/thumbnails/19.jpg)
Working with Data Providers to Make Data Available Through VMO
The X-men assist the data providers to:
• Use a SPASE editor to write high level SPASE XML.
• Verify the XML.
• Create Rule Sets (or other software) to populate the detailed level SPASE XML.
• Establish the registry at the remote site, if desired.
• Load the high level SPASE XML into the registry.
• Run the Rule Sets (or other software) to populate the registry.
Most importantly, an expert is available to data providers at each step of the process.
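Two of the steps above, verifying the XML and running a rule set to populate the detailed level descriptions, can be sketched as follows. The slides do not describe the format of a VMO Rule Set, so the rule here (a file-name pattern plus a parent identifier template) is purely an assumption, as are the file-naming scheme, the identifier, and the choice of required elements. The element names (Granule, ParentID, StartDate, StopDate) follow SPASE vocabulary but are not checked against the schema.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

# Hypothetical rule: files named like "stn_20070612.dat" each hold one day of data.
RULE = {
    "pattern": re.compile(r"^(?P<station>[a-z]+)_(?P<date>\d{8})\.dat$"),
    "parent_id": "spase://Example/NumericalData/{station}/PT1M",
}

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def apply_rule(rule, filename):
    """Build a granule-style element from a file name, or None if it does not match."""
    match = rule["pattern"].match(filename)
    if match is None:
        return None
    start = datetime.strptime(match.group("date"), "%Y%m%d")
    stop = start + timedelta(days=1)
    granule = ET.Element("Granule")
    parent = rule["parent_id"].format(station=match.group("station"))
    ET.SubElement(granule, "ParentID").text = parent
    ET.SubElement(granule, "StartDate").text = start.strftime(TIME_FORMAT)
    ET.SubElement(granule, "StopDate").text = stop.strftime(TIME_FORMAT)
    return granule


def verify(xml_text, required=("ParentID", "StartDate", "StopDate")):
    """Return a list of problems: parse errors or missing required child elements."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        return [f"not well-formed: {error}"]
    return [f"missing element: {name}" for name in required if root.find(name) is None]


granule = apply_rule(RULE, "stn_20070612.dat")
granule_xml = ET.tostring(granule, encoding="unicode")
print(granule_xml)
print(verify(granule_xml))
```

This check only covers well-formedness and the presence of a few elements; validating against the published SPASE schema would need a schema-aware tool.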