Applying Semantics in Dataset Summarization for Solar Data Ingest Pipelines



James Michaelis, Deborah L. McGuinness (dlm@cs.rpi.edu), Stephan Zednik, Patrick West (westp@rpi.edu), Peter Arthur Fox

Rensselaer Polytechnic Institute, 110 8th St., Troy, NY 12180, United States

Poster: MT15A-08

Glossary:
RPI – Rensselaer Polytechnic Institute
TWC – Tetherless World Constellation at Rensselaer Polytechnic Institute
VSTO – Virtual Solar Terrestrial Observatory
FITS – Flexible Image Transport System

Acknowledgments:
Sapan Shah and Naveen Sridhar from the Tetherless World Constellation at RPI
Joan Burkepile, Steve Tomczyk and Leonard Sitongia at the High Altitude Observatory

Sponsors:
National Science Foundation

Motivations and Challenges:
• Analysis of solar data is necessary for space weather modeling and forecasting, which have broad implications for terrestrial activity (e.g., communication grid reliability).
• Time series visualizations of solar activity, created by the High Altitude Observatory [1], enable the needed analyses.
• This work focuses on two challenges:
    • Only small sections of the data typically contain content of interest to scientists.
    • Subsets of time-series data may correspond to an event of interest at a particular time (e.g., a solar event).
• Based on these challenges, one goal of this work was to enable scientists to retrieve the data sets corresponding to desired data products, to facilitate further analysis.

Data Management Strategies:
• Provenance records for individual visualizations.
• Ontological classification of visualizations, using DQ and STOM.
• Encoding records in RDF Datacube [2] (proposed).

IN51D-1713

Get the poster at http://bit.ly/V8NkBo

[Figure: data flow diagram. Labels recovered from the poster:]
- Mauna Loa Solar Observatory (MLSO), Hawaii: Raw Image Data Captured
- National Center for Atmospheric Research (NCAR) Data Center, Boulder, CO: Follow-up Processing on Raw Data
- Publishes: Intensity Visualizations
- Time-stamped Observation Logs, maintained by MLSO staff. Comments on: weather and instrument conditions

Case Study: Coronal Multi-channel Polarimeter (CoMP):

Semantic Visualization Provenance Records:

Datacube Basics:

Properties attached to datasets/slices/observations:
- Dimensions: Year, Metric
- Attributes: GBU Metric
- Measures: 146 (the value)
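To make the dimension/attribute/measure split concrete, the following is a minimal sketch of how one such observation might be written out against the RDF Data Cube vocabulary. Only `qb:Observation`, `qb:dataSet` and the `qb:` namespace come from the vocabulary [2]; the `ex:` namespace, the property names, the dataset name `gbu-summary`, the year 2012 and the category label are hypothetical placeholders, not taken from the poster.

```python
from dataclasses import dataclass

QB = "http://purl.org/linked-data/cube#"  # RDF Data Cube vocabulary namespace
EX = "http://example.org/hao/"            # hypothetical namespace, not from the poster


@dataclass(frozen=True)
class Observation:
    """One cube cell: dimensions locate it, the attribute qualifies the
    measure, and the measure carries the observed value."""
    obs_id: str
    year: int       # dimension (hypothetical value in the usage below)
    metric: str     # dimension (hypothetical category label)
    attribute: str  # attribute, e.g. "GBU Metric"
    value: int      # measure, e.g. 146


def to_turtle(obs: Observation, dataset: str = "gbu-summary") -> str:
    """Render the observation as a Turtle snippet typed as qb:Observation."""
    return "\n".join([
        f"@prefix qb: <{QB}> .",
        f"@prefix ex: <{EX}> .",
        "",
        f"ex:{obs.obs_id} a qb:Observation ;",
        f"    qb:dataSet ex:{dataset} ;",
        f"    ex:year {obs.year} ;",
        f'    ex:metric "{obs.metric}" ;',
        f'    ex:attribute "{obs.attribute}" ;',
        f"    ex:value {obs.value} .",
    ])


if __name__ == "__main__":
    print(to_turtle(Observation("obs1", 2012, "category-a", "GBU Metric", 146)))
```

In a full cube, the `ex:` properties would themselves be declared as `qb:DimensionProperty`, `qb:AttributeProperty` and `qb:MeasureProperty` in a data structure definition; that step is omitted here for brevity.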

Datacube Usage:
For HAO visualization records, Datacube can be used in two ways:
- Returning aggregations of statistics for images (e.g., GBU results).
- Returning sets of visualizations (data points) for further exploration, based on constraints (e.g., temporal range).
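The two usage modes can be sketched over plain in-memory records, as below. This is an illustration only: the record fields, the category labels and the sample data are made up, and a deployed system would more likely issue queries against an RDF store than filter Python lists.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class VisualizationRecord:
    uri: str               # hypothetical identifier for one visualization
    observed_at: datetime  # observation time of the source image
    gbu: str               # hypothetical GBU category label


def aggregate_by_year(records):
    """Usage 1: count records per (year, GBU category) pair."""
    return Counter((r.observed_at.year, r.gbu) for r in records)


def select_in_range(records, start, end):
    """Usage 2: return records with start <= observed_at < end."""
    return [r for r in records if start <= r.observed_at < end]


# Made-up sample records, for illustration only.
RECORDS = [
    VisualizationRecord("vis/1", datetime(2012, 6, 1, 18, 0), "category-a"),
    VisualizationRecord("vis/2", datetime(2012, 6, 1, 19, 30), "category-b"),
    VisualizationRecord("vis/3", datetime(2012, 6, 2, 18, 0), "category-a"),
]

if __name__ == "__main__":
    print(aggregate_by_year(RECORDS))
    print(select_in_range(RECORDS, datetime(2012, 6, 1), datetime(2012, 6, 2)))
```

The aggregation corresponds to a cube whose dimensions are year and metric and whose measure is the count; the range selection returns the underlying data points instead of a summary.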

Use Cases:
- Activity Log Usage: Return images corresponding to a specific solar event record.
- Provenance (utilized data product): Return the set of images that utilized a given flat field configuration file.
- Provenance (utilized process): Return the set of images produced by running version 2.0 of the process “Extract Intensity”.
- Observer Log Usage: For a given observer log comment, return visualizations within 2 hours of the comment timestamp.
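The provenance and observer log use cases reduce to simple filters over provenance records. The sketch below assumes a flat record per image; the field names, file names and sample data are hypothetical, and "within 2 hours" is interpreted here as a symmetric window around the comment timestamp, which is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ProvenanceRecord:
    image: str             # hypothetical image identifier
    observed_at: datetime  # observation time of the image
    process: str           # name of the process that produced the image
    process_version: str   # version of that process
    flat_field: str        # hypothetical flat field configuration file name


def near_comment(records, comment_time, window=timedelta(hours=2)):
    """Observer log usage: images within `window` of a log comment."""
    return [r for r in records if abs(r.observed_at - comment_time) <= window]


def produced_by(records, process, version):
    """Provenance (utilized process): images from a given process version."""
    return [r for r in records
            if r.process == process and r.process_version == version]


def using_flat_field(records, flat_field):
    """Provenance (utilized data product): images that used a flat field file."""
    return [r for r in records if r.flat_field == flat_field]


# Made-up sample records, for illustration only.
RECORDS = [
    ProvenanceRecord("img/1", datetime(2012, 6, 1, 17, 30),
                     "Extract Intensity", "2.0", "flat_a.fts"),
    ProvenanceRecord("img/2", datetime(2012, 6, 1, 20, 30),
                     "Extract Intensity", "2.0", "flat_b.fts"),
    ProvenanceRecord("img/3", datetime(2012, 6, 1, 23, 0),
                     "Extract Intensity", "1.0", "flat_a.fts"),
]

if __name__ == "__main__":
    comment_time = datetime(2012, 6, 1, 19, 0)  # hypothetical log timestamp
    print([r.image for r in near_comment(RECORDS, comment_time)])
    print([r.image for r in produced_by(RECORDS, "Extract Intensity", "2.0")])
    print([r.image for r in using_flat_field(RECORDS, "flat_a.fts")])
```

The activity log use case follows the same pattern as `near_comment`, with the event record's time span taking the place of the comment timestamp.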

Time-stamped Activity Logs, maintained by MLSO staff. Comments on solar events (Coronal Mass Ejections, Active Regions)

Next Steps:
- Deployment of provenance record retrieval as part of the Virtual Solar Terrestrial Observatory.
- Semantic encoding of MLSO event logs, or of data from Lockheed Martin's Heliophysics Events Knowledge Base [3].
- Expanded use of dimensions in the data cube, to include FITS header data.

References:
[1] Mauna Loa Solar Observatory (High Altitude Observatory Site): http://mlso.hao.ucar.edu/
[2] RDF Datacube Vocabulary: http://www.w3.org/TR/vocab-data-cube/
[3] Heliophysics Event Knowledge Base: http://www.lmsal.com/hek/index.html