Coding Provenance in Software and Matching Tools to Data
![Page 1: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/1.jpg)
Coding Provenance in Software and Matching Tools to Data
OPeNDAP Provenance Project
and
ESIP ToolMatch Project
Patrick West, Tetherless World Constellation, Rensselaer Polytechnic Institute
![Page 2: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/2.jpg)
What is Provenance?
• Provenance is information about entities, activities, and people involved in producing a piece of data or thing.
• In Data Science we’re interested in keeping track of, or being able to trace back, how a data product was generated and from what.
• E.g., as part of the Ecosystem Status Report there’s an interesting plot in one of the chapters that I’m interested in learning more about.
2
![Page 3: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/3.jpg)
Generating a Plot
3
![Page 4: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/4.jpg)
How did I get there?
4
![Page 5: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/5.jpg)
I know how it was generated
• Because I’m the one who added the plot to the document
• I know how the plot was generated
• I wrote parts of the software in OPeNDAP Hyrax that’s doing the data access, manipulation, and transformation
• So I know: a plot is generated by accessing a set of data using OPeNDAP Hyrax, which generates a DAP DataDDS object by reading in a set of NetCDF files, constraining and projecting the data, running a server-side function or two, and doing an aggregation; that data product is then used to generate the plot.
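The constraining and projecting in that list are driven by the request URL itself. A minimal sketch in Python, assuming the DAP2 URL conventions (a `.dods` suffix asks Hyrax for the binary DataDDS response, `.nc` asks for NetCDF when the fileout_netcdf module is installed, and a hyperslab is written `var[start:stride:stop]` with an inclusive stop); the server, dataset path, variable names, and the helper functions are hypothetical:

```python
from urllib.parse import quote


def hyperslab(var: str, *dims: tuple[int, int, int]) -> str:
    """Build a DAP2 projection such as 'sst[0:1:11][10:1:20]'.

    Each dimension is (start, stride, stop); DAP2 treats stop as inclusive.
    """
    return var + "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in dims)


def dap2_request_url(base: str, dataset: str, suffix: str, projections: list[str]) -> str:
    """Join a server URL, a dataset path, a response suffix, and a constraint expression.

    suffix is e.g. "dods" (binary DataDDS) or "nc" (NetCDF via fileout_netcdf).
    Brackets are percent-encoded; ':' and ',' are left readable.
    """
    constraint = ",".join(projections)
    return f"{base.rstrip('/')}/{dataset}.{suffix}?{quote(constraint, safe=':,')}"


if __name__ == "__main__":
    # Hypothetical server, dataset, and variable names.
    print(dap2_request_url(
        "http://example.org/opendap",
        "data/sst_2012.nc",
        "dods",
        [hyperslab("sst", (0, 1, 11), (10, 1, 20)), "time"],
    ))
    # http://example.org/opendap/data/sst_2012.nc.dods?sst%5B0:1:11%5D%5B10:1:20%5D,time
```

Percent-encoding the brackets is a cautious choice; some clients send them literally.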
5
![Page 6: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/6.jpg)
Generating a Plot
6
Diagram: cells in an IPython Notebook send an OPeNDAP request URL to OPeNDAP Hyrax, which reads in data and spits out data ("Badda Bing Badda Boom"); the notebook uses that data and generates the plot.
BUT I WANT TO KNOW MORE
![Page 7: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/7.jpg)
Some information I WANT to know
• How was that plot generated?
• What software was used to generate the plot and any intermediary data?
• What data files were read in to generate the plot, what was done to the data, and by what?
• Where did those data files come from? What parameters are in them? What sensors measured those parameters? Tell me about how the data were measured.
7
![Page 8: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/8.jpg)
Generating a Plot
8
Diagram: cells in an IPython Notebook send an OPeNDAP request URL to OPeNDAP Hyrax, which reads in data and spits out data; the notebook uses that data and generates the plot.
Where did the data files come from?
![Page 9: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/9.jpg)
Linked Data
• I’m also interested in who develops and publishes the software, how it is licensed, and how I could use it.
• I’m interested in what IPython Notebooks are, what they can do, and whether I could use them for other projects.
• And I want to be able to let the “owner” of the data files know that I’ve used the results of an access in a publication, presentation, article, or whatever.
9
![Page 10: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/10.jpg)
What the project focuses on
10
Diagram: a request URL enters OPeNDAP Hyrax, which is made up of the OLFS and the BES; the BES components shown are NetCDF, dap, ServerSideFunctions, aggregate, and transform.
![Page 11: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/11.jpg)
W3C Prov
11
![Page 12: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/12.jpg)
Prov-O
12
:dds_of_reading a prov:Entity ;
    dcterms:format opendap:DataDDS ;
    prov:wasGeneratedBy [
        a prov:Activity ;
        prov:used <http://test.opendap.org/dap/data/h5/monday.h5>,
                  <http://test.opendap.org/dap/data/h5/tuesday.h5> ;
        prov:wasAssociatedWith <opendapi:software/hdf5_handler/2.1.1>
    ] .
<http://test.opendap.org/dap/data/h5/monday.h5> a vsto:Dataset, prov:Entity, toolmatch:DataCollection ;
    toolmatch:hasAccessURL <http://test.opendap.org/dap/data/h5/monday.h5> .
<http://test.opendap.org/dap/data/h5/tuesday.h5> a vsto:Dataset, prov:Entity, toolmatch:DataCollection ;
    toolmatch:hasAccessURL <http://test.opendap.org/dap/data/h5/tuesday.h5> .
![Page 13: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/13.jpg)
Prov-O
13
:constrained_dds a prov:Entity ;
    dcterms:format opendap:DataDDS ;
    prov:wasGeneratedBy [ a prov:Activity ; prov:used :dds_of_reading ; prov:wasAssociatedWith <opendapi:software/BES/3.12.0> ] .
:aggregated_dds a prov:Entity ;
    dcterms:format opendap:DataDDS ;
    prov:wasGeneratedBy [ a prov:Activity ; prov:used :constrained_dds ; prov:wasAssociatedWith <opendapi:software/ncml_module/1.2.2> ] .
:result a foaf:Document ;
    nfo:fileName "thursday.h5" ;
    dcterms:format netcdf ;
    prov:wasGeneratedBy [ a prov:Activity ; prov:used :aggregated_dds ; prov:wasAssociatedWith <opendapi:software/fileout_netcdf/1.2.1> ] .
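Together with the reading step, these statements form a chain from `:result` back to the HDF5 files. A minimal sketch of that trace-back in Python, assuming the Turtle has already been turned into a hand-written dictionary (a real implementation would parse it with an RDF library or query the triple store); `GENERATED_BY`, `lineage`, and `source_files` are hypothetical names:

```python
from dataclasses import dataclass, field


@dataclass
class Generation:
    """One prov:wasGeneratedBy activity: its associated software and what it used."""
    software: str
    used: list[str] = field(default_factory=list)


# Hand-transcribed from the Prov-O statements: entity -> the activity that generated it.
GENERATED_BY = {
    ":result": Generation("opendapi:software/fileout_netcdf/1.2.1", [":aggregated_dds"]),
    ":aggregated_dds": Generation("opendapi:software/ncml_module/1.2.2", [":constrained_dds"]),
    ":constrained_dds": Generation("opendapi:software/BES/3.12.0", [":dds_of_reading"]),
    ":dds_of_reading": Generation(
        "opendapi:software/hdf5_handler/2.1.1",
        ["http://test.opendap.org/dap/data/h5/monday.h5",
         "http://test.opendap.org/dap/data/h5/tuesday.h5"],
    ),
}


def lineage(entity: str, depth: int = 0) -> list[str]:
    """Indented trace of 'entity <- software', following prov:used back to the sources."""
    gen = GENERATED_BY.get(entity)
    if gen is None:  # nothing recorded: treat it as an original source
        return ["  " * depth + f"{entity} (source)"]
    lines = ["  " * depth + f"{entity} <- {gen.software}"]
    for used in gen.used:
        lines.extend(lineage(used, depth + 1))
    return lines


def source_files(entity: str) -> list[str]:
    """The leaf entities (here, the data files) that the entity ultimately derives from."""
    gen = GENERATED_BY.get(entity)
    if gen is None:
        return [entity]
    found: list[str] = []
    for used in gen.used:
        for src in source_files(used):
            if src not in found:
                found.append(src)
    return found


if __name__ == "__main__":
    print("\n".join(lineage(":result")))
```

The walk assumes the provenance graph has no cycles, which holds for a generated-by/used chain like this one.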
![Page 14: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/14.jpg)
DOAP – Description of a Project
14
![Page 15: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/15.jpg)
DOAP – Description of a Project
15
<http://opendap.tw.rpi.edu/instances/software/BES> a doap:Project, prov:Entity ;
    doap:name "OPeNDAP Back-End Server (BES)" ;
    doap:developer <http://tw.rpi.edu/instances/PatrickWest>,
                   <http://tw.rpi.edu/instances/DanHalloway>,
                   <http://tw.rpi.edu/instances/James_Gallagher>,
                   <http://tw.rpi.edu/instances/NathanPotter> ;
    doap:homepage <http://opendap.org/download/hyrax?q=BES_software> ;
    doap:vendor <http://tw.rpi.edu/instances/OPeNDAP> ;
    doap:repository <http://opendap.tw.rpi.edu/instances/Repository> ;
    doap:bug-database <http://scm.opendap.org/trac/> ;
    doap:release <http://opendap.tw.rpi.edu/instances/software/BES/3.12.0> ;
    doap:description "BES is a high-performance back-end server software framework that allows data providers more flexibility in providing end users views of their data." ;
    doap:license <http://opendap.tw.rpi.edu/instances/License> .
<http://opendap.tw.rpi.edu/instances/software/BES/3.12.0> a doap:Version, prov:Entity ;
    prov:specializationOf <http://opendap.tw.rpi.edu/instances/software/BES> ;
    doap:name "BES-3.12.0" ;
    doap:revision "3.12.0" ;
    doap:download-page <http://opendap.org/download/hyrax/1.9> ;
    doap:repository <http://scm.opendap.org/svn/tags/bes/3.12.0> ;
    doap:license <http://opendap.tw.rpi.edu/instances/License> ;
    doap:created "2013-08-27" .
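With the release linked to its project through `prov:specializationOf`, the Linked Data questions (who develops the software, who publishes it, how it is licensed) become lookups. A minimal sketch over a hand-transcribed subset of the DOAP triples; the dictionary layout and the `objects` and `describe_release` helpers are hypothetical:

```python
BASE = "http://opendap.tw.rpi.edu/instances/"
PEOPLE = "http://tw.rpi.edu/instances/"

# Hand-transcribed subset of the DOAP triples: (subject, predicate) -> objects.
DOAP = {
    (BASE + "software/BES", "doap:developer"): [
        PEOPLE + "PatrickWest", PEOPLE + "DanHalloway",
        PEOPLE + "James_Gallagher", PEOPLE + "NathanPotter",
    ],
    (BASE + "software/BES", "doap:vendor"): [PEOPLE + "OPeNDAP"],
    (BASE + "software/BES", "doap:license"): [BASE + "License"],
    (BASE + "software/BES/3.12.0", "prov:specializationOf"): [BASE + "software/BES"],
    (BASE + "software/BES/3.12.0", "doap:revision"): ["3.12.0"],
    (BASE + "software/BES/3.12.0", "doap:license"): [BASE + "License"],
}


def objects(subject: str, predicate: str) -> list[str]:
    """All objects of (subject, predicate), or an empty list."""
    return DOAP.get((subject, predicate), [])


def describe_release(version: str) -> dict[str, object]:
    """Resolve a doap:Version to its project, then collect who made it and under what license."""
    project = next(iter(objects(version, "prov:specializationOf")), version)
    return {
        "project": project,
        "revision": next(iter(objects(version, "doap:revision")), None),
        "developers": objects(project, "doap:developer"),
        "vendor": objects(project, "doap:vendor"),
        # Prefer a license stated on the release; fall back to the project's.
        "license": objects(version, "doap:license") or objects(project, "doap:license"),
    }


if __name__ == "__main__":
    for key, value in describe_release(BASE + "software/BES/3.12.0").items():
        print(key, value)
```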
![Page 16: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/16.jpg)
DOAP – Description of a Project
16
<http://opendap.tw.rpi.edu/instances/Repository> a doap:SVNRepository ; doap:location <http://scm.opendap.org/svn/> ; doap:browse <http://scm.opendap.org/svn/> .
<http://opendap.tw.rpi.edu/instances/License> dc:description "This software is distributed under the GNU Lesser General Public License <http://www.gnu.org/licenses/lgpl.html>" ; doap:name "GNU LESSER GENERAL PUBLIC LICENSE" ; rdfs:seeAlso <http://www.gnu.org/licenses/lgpl.html> .
<http://opendap.tw.rpi.edu/id/opendap/D9IH6677D3I6HDIHD36IHDI7DH>
    # The hash above is: HASH(config file, BES version that read it)
    a prov:Agent ;
    prov:wasDerivedFrom <http://opendap.tw.rpi.edu/instances/software/hdf5_handler/2.1.1>,
                        <http://opendap.tw.rpi.edu/instances/software/BES/3.12.0>,
                        <http://opendap.tw.rpi.edu/instances/software/ncml_module/1.2.2>,
                        <http://opendap.tw.rpi.edu/instances/software/fileout_netcdf/1.2.1> ;
    prov:wasDerivedFrom :config_file_hash ;
    # b/c BES set it up:
    prov:wasAttributedTo <http://scm.opendap.org/svn/tags/bes/3.9.2> .
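The agent IRI is meant to be deterministic: the same configuration read by the same BES version should always yield the same agent. A minimal sketch, assuming SHA-256 as the HASH, a NUL byte separating the two inputs, and upper-case hex (the slide does not say which hash or encoding is used); `agent_iri` and the sample configuration text are hypothetical:

```python
import hashlib

AGENT_BASE = "http://opendap.tw.rpi.edu/id/opendap/"


def agent_iri(config_text: str, bes_version: str) -> str:
    """Mint a deterministic agent IRI from a BES configuration and the BES version that read it.

    SHA-256, the NUL separator, and upper-case hex are assumptions; the slide only says
    HASH(config file, BES version that read it).
    """
    digest = hashlib.sha256()
    digest.update(bes_version.encode("utf-8"))
    digest.update(b"\x00")  # keeps ("3.1", "2x") distinct from ("3.12", "x")
    digest.update(config_text.encode("utf-8"))
    return AGENT_BASE + digest.hexdigest().upper()


if __name__ == "__main__":
    config = "hypothetical bes.conf contents\n"
    print(agent_iri(config, "3.12.0"))
    print(agent_iri(config, "3.9.2"))  # a different BES version gives a different agent
```

Because the IRI depends only on its two inputs, every request served by one deployment is attributed to the same agent, and upgrading the BES or editing the configuration produces a new one.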
![Page 17: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/17.jpg)
What We’re Trying
• At startup, the BES loads shared modules that handle specific tasks.
• Our first attempt was to use something called a Reporter that reports on the completion of a request, but it runs too late, after the fact.
• Our second idea was to have the modules themselves add provenance information on the fly; to me that is ideal, but it is unrealistic.
• The probable implementation is to track provenance in the BES, the software framework that communicates with the modules.
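A minimal sketch of that last option, in which the framework (not the modules) records a PROV-style activity every time it hands data to a module. `Framework`, `Activity`, and the module names are hypothetical stand-ins for the BES and its handlers, and the lambdas replace real data handling:

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class Activity:
    """One recorded step: which module ran, what it used, and what it generated."""
    module: str
    used: list[str]
    generated: str


@dataclass
class Framework:
    """Toy stand-in for the BES: modules only transform data; the framework keeps the provenance."""
    modules: dict[str, Callable[[object], object]] = field(default_factory=dict)
    trail: list[Activity] = field(default_factory=list)

    def register(self, name: str, handler: Callable[[object], object]) -> None:
        self.modules[name] = handler

    def run(self, name: str, input_id: str, value: object) -> tuple[str, object]:
        """Call one module and record prov:used / prov:wasAssociatedWith for that call."""
        result = self.modules[name](value)
        output_id = f"_:entity{len(self.trail) + 1}"  # blank-node-style id for the new entity
        self.trail.append(Activity(module=name, used=[input_id], generated=output_id))
        return output_id, result


if __name__ == "__main__":
    bes = Framework()
    # Hypothetical modules; the lambdas stand in for reading, constraining, and writing NetCDF.
    bes.register("hdf5_handler/2.1.1", lambda path: [1.0, 2.0, 3.0, 4.0])
    bes.register("BES/3.12.0", lambda data: data[1:3])
    bes.register("fileout_netcdf/1.2.1", lambda data: bytes(len(data)))
    ident = value = "http://test.opendap.org/dap/data/h5/monday.h5"
    for module in ["hdf5_handler/2.1.1", "BES/3.12.0", "fileout_netcdf/1.2.1"]:
        ident, value = bes.run(module, ident, value)
    for step in bes.trail:
        print(step)
```

Keeping the bookkeeping in the framework means the handlers need no changes, at the cost of seeing each module only as a black box.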
17
![Page 18: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/18.jpg)
What’s next
• Get more use cases about what types of information we want to collect
• Write the story about what we’re trying to do
• Come up with software use cases for the implementation
• Continue discussing provenance with the core OPeNDAP group
• Continue to work with the original Prov group (Tim, Jim, and Stephan) in discussions
18
![Page 19: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/19.jpg)
Questions
19
![Page 20: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/20.jpg)
ToolMatch Usecase
• "I need data for carbon dioxide (CO2) concentrations, a climate change indicator, for the summer of 2012, that can be accessed via OPeNDAP Hyrax and plotted as a time series."
• "I need data with measurements of atmospheric aerosol optical depth sliced along latitude and longitude, returned as NetCDF data, and accessible in MATLAB."
20
![Page 21: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/21.jpg)
Using SADL
21
![Page 22: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/22.jpg)
Inferencing
22
* Equivalent Class
  DataCollection <Aqua_AIRS_Level2_Plus_AMSU>
  and (isAccessedBy value OPeNDAP) or (hasDataStorageFormat value NetCDF)
  and (usesGridType value AuxiliaryLatLonGrid) or (usesGridType value RegularLatLonGrid)
  and usesConvention value ClimateForecast_CF
* Subclass Of
  mappedBy value IDV
  and mappedBy value McIDAS-V
  and mappedBy value Panoply
Inferred
![Page 23: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/23.jpg)
Inferencing
23
* Equivalent Class
  DataCollection
  and (isAccessedBy value OPeNDAP) or (hasDataFormat value NetCDF)
  and usesConvention value CF1Convention
  and usesConvention value RegularLatLonGrid
* Subclass Of
  mappedBy value Ferret
  and mappedBy value GrADS
Inferred
![Page 24: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/24.jpg)
Inferencing
24
* Equivalent Class
  DataCollection
  and (isAccessedBy value GrADSDataServer) or (isAccessedBy value Hyrax) or (isAccessedBy value ThreddsDataServer) or (isAccessedBy value erddap)
* Subclass Of
  isAccessedBy value OPeNDAP
Inferred
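The inferred superclasses are what make the final query simple: the reasoner adds the `isAccessedBy OPeNDAP` fact, and that fact in turn triggers the tool mapping. A minimal forward-chaining sketch in Python standing in for the OWL/SADL reasoner, assuming the `and ... or ...` chain on the first Inferencing slide groups each `or` pair together; the collection facts and function names are hypothetical:

```python
Facts = dict[str, set[str]]  # property -> values for one data collection

OPENDAP_SERVERS = {"GrADSDataServer", "Hyrax", "ThreddsDataServer", "erddap"}
GRID_TYPES = {"AuxiliaryLatLonGrid", "RegularLatLonGrid"}


def accessed_via_opendap(c: Facts) -> bool:
    """A collection accessed by any of these servers is accessed by OPeNDAP."""
    servers = c.get("isAccessedBy", set())
    if servers & OPENDAP_SERVERS and "OPeNDAP" not in servers:
        servers.add("OPeNDAP")
        return True
    return False


def mapped_by_gridded_cf_tools(c: Facts) -> bool:
    """Read as: (OPeNDAP or NetCDF) and (either grid type) and CF -> IDV, McIDAS-V, Panoply."""
    if (("OPeNDAP" in c.get("isAccessedBy", set())
         or "NetCDF" in c.get("hasDataStorageFormat", set()))
            and c.get("usesGridType", set()) & GRID_TYPES
            and "ClimateForecast_CF" in c.get("usesConvention", set())):
        tools = c.setdefault("mappedBy", set())
        new = {"IDV", "McIDAS-V", "Panoply"} - tools
        tools |= new
        return bool(new)
    return False


def forward_chain(c: Facts) -> Facts:
    """Re-run every rule until a full pass infers nothing new (a fixpoint)."""
    rules = [accessed_via_opendap, mapped_by_gridded_cf_tools]
    while any([rule(c) for rule in rules]):  # list, not generator: run every rule each pass
        pass
    return c


if __name__ == "__main__":
    # Hypothetical collection: served by Hyrax, regular lat/lon grid, CF conventions.
    collection = {"isAccessedBy": {"Hyrax"}, "usesGridType": {"RegularLatLonGrid"},
                  "usesConvention": {"ClimateForecast_CF"}}
    print(sorted(forward_chain(collection)["mappedBy"]))  # ['IDV', 'McIDAS-V', 'Panoply']
```

Nothing in the collection mentions OPeNDAP directly; the tool mapping only fires because the first rule adds that fact.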
![Page 25: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/25.jpg)
Resulting Query
25
The resulting query to find the set of tools available to visualize a data collection becomes very simple:
DESCRIBE ?tool
WHERE {
  <data_collection> toolmatch:visualizedBy ?tool .
  ?tool rdf:type toolmatch:Tool .
}
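What a triple store does with that WHERE clause is join the two patterns on `?tool`. A minimal sketch of that basic-graph-pattern matching in Python, assuming hypothetical `ex:` resources in place of `<data_collection>`; it returns the bindings a SELECT would produce, whereas a real endpoint answering DESCRIBE would go on to return the triples describing each tool:

```python
Triple = tuple[str, str, str]

# Hypothetical triples, as they might look after inference has added visualizedBy links.
TRIPLES: set[Triple] = {
    ("ex:collection1", "toolmatch:visualizedBy", "ex:Panoply"),
    ("ex:collection1", "toolmatch:visualizedBy", "ex:IDV"),
    ("ex:collection1", "toolmatch:visualizedBy", "ex:SomeService"),  # not typed as a Tool
    ("ex:Panoply", "rdf:type", "toolmatch:Tool"),
    ("ex:IDV", "rdf:type", "toolmatch:Tool"),
}


def match(pattern: Triple, binding: dict[str, str]) -> list[dict[str, str]]:
    """Extend binding with every way the pattern ('?x' terms are variables) matches a triple."""
    results = []
    for triple in TRIPLES:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if new.setdefault(term, value) != value:  # already bound to something else
                    break
            elif term != value:
                break
        else:  # no break: every term matched
            results.append(new)
    return results


def select(patterns: list[Triple]) -> list[dict[str, str]]:
    """Evaluate a basic graph pattern by joining the patterns left to right."""
    bindings: list[dict[str, str]] = [{}]
    for pattern in patterns:
        bindings = [b for old in bindings for b in match(pattern, old)]
    return bindings


if __name__ == "__main__":
    rows = select([
        ("ex:collection1", "toolmatch:visualizedBy", "?tool"),
        ("?tool", "rdf:type", "toolmatch:Tool"),
    ])
    print(sorted({row["?tool"] for row in rows}))  # ['ex:IDV', 'ex:Panoply']
```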
![Page 26: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/26.jpg)
The Result
26
Screenshot panels: Description; Tools
![Page 27: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/27.jpg)
Where are we and what’s next
• We’ve got part of the ontology done
• We’ve got stuff in the triple store
• We need to complete the dataset ontology piece
• We need to verify the ontology and rules
• We need crowd sourcing for more tools and information about tools
• Patrick needs to understand rules better
27
![Page 28: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/28.jpg)
Questions
28
![Page 29: Coding Provenance in Software and Matching Tools to Data](https://reader036.fdocuments.in/reader036/viewer/2022062304/5681457b550346895db24f20/html5/thumbnails/29.jpg)
References
OPeNDAP Provenance Project
• Prov Overview - http://www.w3.org/TR/prov-overview/
• OPeNDAP Prov - https://github.com/tetherless-world/opendap/
• OPeNDAP LODSPeaKr - http://opendap.tw.rpi.edu/index.html
• OPeNDAP Endpoint - http://opendap.tw.rpi.edu/virtuoso/sparql
• OPeNDAP - http://opendap.org
ToolMatch Project
• ToolMatch - http://wiki.esipfed.org/index.php/ToolMatch
• ToolMatch Virtual Server - http://toolmatch.tw.rpi.edu/
• ToolMatch Schema - http://toolmatch.tw.rpi.edu/docs/index
• ToolMatch Endpoint - http://toolmatch.tw.rpi.edu/sparql
29