Standard Provenance Reporting and Scientific Software … · 2016. 2. 14. · Catherine Wise,...

Catherine Wise, Nicholas J Car, Ryan Fraser and Geoff Squire Data61 and LAND & WATER Standard Provenance Reporting and Scientific Software Management in Virtual Labs


Catherine Wise, Nicholas J Car, Ryan Fraser and Geoff Squire

Data61 and LAND & WATER

Standard Provenance Reporting and Scientific Software Management in Virtual Labs


Outline

• What are VLs?

• What is VHIRL?

• What is provenance?

• How does VHIRL manage provenance (or not)?

• How do we represent VHIRL’s actions using standardised provenance?

• What work, other than representation, is needed for provenance?

• What benefits do we get from this work?


What are VLs?


From https://nectar.org.au/virtual-laboratories-1, they are:

data repositories and computational tools and streamlining research workflows



What is VHIRL?


• Virtual Hazards Impact & Risk Laboratory (VHIRL) is a scientific workflow portal

• Gives researchers access to cloud computing for natural hazards research

• uses data from a variety of sources

• uses cloud computing resources

• currently has tools for earthquakes, tsunamis & tropical cyclones in the Asia-Pacific region



Components of the Virtual Lab: Virtual Hazard Impact & Risk Laboratory (VHIRL)

[Figure: layered component diagram. Layer labels: Data Services; Processing Services; Compute Services; Enablers; Virtual Laboratories/Apps; Data Analytics. Component labels: Magnetics; Gravity; DEM; Landsat; Bathymetry; eScript; ANUGA; TCRM; Cyclone Wind Model; Surface Wave Propagation (earthquake); NCI Petascale; NCI Cloud; NeCTAR Cloud; Amazon Cloud; Desktop; Service Orchestration; Provenance Metadata; Auth.; Coastal Inundation; Tsunami Inundation; Scenario; Cyclone Wind Path Calculation.]

Connectivity via Provenance | Melanie Ayre | eResearch Australasia 2015, Brisbane


What is provenance?


From http://en.wikipedia.org/wiki/Provenance#Computer_Science:


“Computer science uses the term provenance to mean the lineage of data or processes, as per data provenance. However there is a field of informatics research within computer science called provenance that studies how provenance of data and processes should be characterised, stored and used. Semantic web standards bodies, such as the World Wide Web Consortium, ratified a standard for provenance representation in 2014, known as PROV.”


How do we represent VLs using standardised provenance?


VHIRL’s own data management

• Natively tracks ‘everything’ used for scenario (re)runs

• Is not a: Data store, Software repo, Records management system

• Externalises as much information management as possible

• Code managed by the SSSC


Scientific Solutions Software Centre (SSSC)

• SSSC is a web-based system to manage code & dependencies

• Contains Problems & Solutions that define a workflow

• Each Solution consists of a Toolbox

• Toolboxes are code wrapped in a Python script + a description of the required inputs

[Figure: Class diagram for the SSSC]
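The SSSC structure described on this slide can be sketched as a few Python dataclasses. This is only an illustration: the class diagram itself is not reproduced in the transcript, so the field names, the InputSpec helper and the assumption that a Problem groups its Solutions are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InputSpec:
    """One required input of a Toolbox (hypothetical shape)."""
    name: str
    description: str


@dataclass
class Toolbox:
    """Code wrapped in a Python script plus a description of its inputs."""
    name: str
    script: str  # path or URI of the wrapping Python script (assumed field)
    inputs: List[InputSpec] = field(default_factory=list)


@dataclass
class Solution:
    """A Solution refers to the Toolbox that implements it."""
    name: str
    toolbox: Toolbox


@dataclass
class Problem:
    """Assumption: a Problem groups the Solutions that address it."""
    name: str
    solutions: List[Solution] = field(default_factory=list)


def required_inputs(problem: Problem) -> List[str]:
    """Names of every input needed by any Solution to the Problem."""
    return [spec.name for s in problem.solutions for spec in s.toolbox.inputs]


# Illustrative values only; none of these names come from the SSSC itself.
toolbox = Toolbox(
    name="example-toolbox",
    script="run_model.py",
    inputs=[InputSpec("elevation", "netCDF elevation grid"),
            InputSpec("config", "model configuration file")],
)
problem = Problem("example-problem", [Solution("example-solution", toolbox)])
print(required_inputs(problem))
```

Keeping the input description next to the wrapped script is what would let a portal build an input form for a Toolbox without reading its code.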


Scientific Solutions Software Centre (SSSC)

• Beautiful, RESTful API, for example: http://vhirl-dev.csiro.au/scm/toolbox/2

• Solution maps to prov:Plan

• No RDF metadata, yet!


Mapping VHIRL to PROV 1

[Diagram: Input Data → Process → Output Data]


Mapping VHIRL to PROV 2

[Diagram: Code, Config and Input Data → Process → Output Data. Labelled “Ontology Design Pattern”.]


Mapping VHIRL to PROV 3

[Diagram: Code, Config and Input Data → Process → Output Data, annotated with “used” and “Who / which system”. PROV classes shown: Entity, Activity, Agent.]
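As a rough illustration of this mapping, the sketch below writes one run out as PROV-O triples using only the Python standard library. The prov:used, prov:wasGeneratedBy and prov:wasAssociatedWith properties are standard PROV-O terms; the example.org base URI, the function names and the run, dataset and user identifiers are hypothetical and are not VHIRL's own output.

```python
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/vl/"  # hypothetical base URI for illustration


def describe_run(run, inputs, outputs, agent):
    """Return (subject, predicate, object) URI triples for one run."""
    triples = [
        (EX + run, RDF_TYPE, PROV + "Activity"),
        (EX + agent, RDF_TYPE, PROV + "Agent"),
        (EX + run, PROV + "wasAssociatedWith", EX + agent),
    ]
    # Code, config and input data are all Entities used by the Activity.
    for name in inputs:
        triples.append((EX + name, RDF_TYPE, PROV + "Entity"))
        triples.append((EX + run, PROV + "used", EX + name))
    # Output data are Entities generated by the Activity.
    for name in outputs:
        triples.append((EX + name, RDF_TYPE, PROV + "Entity"))
        triples.append((EX + name, PROV + "wasGeneratedBy", EX + run))
    return triples


def to_ntriples(triples):
    """Serialise URI-only triples in N-Triples syntax."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = describe_run(
    run="run/456",
    inputs=["code/1", "config/1", "dataset/123"],
    outputs=["dataset/out-1"],
    agent="user/jane",
)
print(to_ntriples(triples))
```

A production system would more likely use an RDF library than hand-built strings; the point here is only which PROV class and property each box and arrow corresponds to.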


Mapping VHIRL to PROMS

[Diagram. Labels: Report N; Entity, Activity, Agent; Reporting System X; R.S. Report]



VHIRL provenance into PROMS Server

[Diagram: Reports (Report N, Report M) from Reporting System X and Reporting System Y are reported and stored in an Organisational Provenance Store. Other labels: Entity, Activity, Agent; R.S. Report.]


Modelling VHIRL’s data types

[Diagram. Labels: user; The VL; VL Run; Report N; managed data; web service data; user supplied data; managed code; user supplied code; output data]


PROMS Reporting Toolkits


VHIRL’s native PROV output

RDF file


What work, other than representation, is needed for provenance?


Provenance effort (step) pyramid

• Data Management

• Establishing Reporting

• Continued Reporting


Data Management

• managed data, web service data, user supplied data, managed code, user supplied code, output data: all Entities need to be ID’d (via URI) and persisted

• VL Run: each VL run is reported as an Activity within a Report

• The VL: each VL instance has/needs an ID and is modelled as a Reporting System

• user: each VL user is known by their login (account) details. Modelled as a Reporter

• Report N: each VL Report is ID’d and persisted in the VL Provenance Store
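The requirement that every Entity be ID'd via URI and persisted can be illustrated with a small in-memory registry. This is a minimal sketch under stated assumptions: the example.org base URI, the EntityRegistry class and the UUID-based identifier scheme are hypothetical, and a real VL would persist to durable storage under its own persistent-identifier domain.

```python
import uuid

BASE_URI = "http://example.org/vl"  # hypothetical persistent-ID domain


class EntityRegistry:
    """In-memory stand-in for a persistent store keyed by URI."""

    def __init__(self):
        self._items = {}

    def register(self, kind, payload):
        """Mint a URI for the payload, store it and return the URI."""
        uri = f"{BASE_URI}/{kind}/{uuid.uuid4()}"
        self._items[uri] = payload
        return uri

    def resolve(self, uri):
        """Return the payload previously stored under the URI."""
        return self._items[uri]


registry = EntityRegistry()
dataset_uri = registry.register("dataset", {"title": "Grid X"})
print(dataset_uri)
```

Once an Entity has a resolvable URI, a provenance report only needs to carry that URI rather than a copy of the data.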


Data Management

• managed data: VL ID’d and persisted

• web service data: cited using PROMS-O format

• user supplied data: soon to be VL ID’d and persisted, with minimal metadata recorded too

• managed code: SSSC ID’d and persisted

• user supplied code: perhaps SSSC ID’d and persisted, perhaps VL managed

• output data: soon to be VL ID’d and persisted, if required, perhaps with time limits


Virtual Labs Service Citation Example

Template:

[{ref}] {service title}
{service endpoint URI}
{query}
{time queried}
{cached copy ID}

Example:

[1] “Subset of elevation”
http://pid.csiro.au/service/anuga-thredds
“bussleton.nc?var=elevation&spatial=bb&north=-33.06495205829679&south=-33.551573283840156&west=114.84967874597227&east=115.70661233971667&temporal=all&time_start=&time_end=&horizStride”
“2014-12-15T13:15:11”
http://pid.csiro.au/dataset/abcd1234
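The five-part citation template above could be filled in programmatically along the following lines. This is a sketch only: the ServiceCitation class and its field names are hypothetical, and the query string is shortened for readability.

```python
from dataclasses import dataclass


@dataclass
class ServiceCitation:
    """The five citation fields from the template, plus a reference number."""
    ref: int
    title: str
    endpoint: str
    query: str
    time_queried: str  # ISO 8601 timestamp
    cached_copy: str

    def format(self) -> str:
        """Render the citation, one template field per line."""
        return "\n".join([
            f'[{self.ref}] "{self.title}"',
            self.endpoint,
            f'"{self.query}"',
            f'"{self.time_queried}"',
            self.cached_copy,
        ])


citation = ServiceCitation(
    ref=1,
    title="Subset of elevation",
    endpoint="http://pid.csiro.au/service/anuga-thredds",
    query="bussleton.nc?var=elevation&spatial=bb",  # shortened for readability
    time_queried="2014-12-15T13:15:11",
    cached_copy="http://pid.csiro.au/dataset/abcd1234",
)
print(citation.format())
```

Recording the query and the time it was issued matters because a web service can return different data for the same query later on; the cached copy ID points at what was actually used.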


Establishing Reporting

[Diagram: VL Report → Organisational Provenance Store (querying & redelivery). Side label: Provenance Reporting Toolkit: C#, Java, Python.]


Establishing Reporting - Reporting Toolkits

[Diagram. Labels: managed data “Grid X”; web service data “Service Y”; VL Run “Run 456”; Agent N; Report N “Report for Run 456”]

e1 = Entity(title='Grid X',
            description='netCDF grid of property X',
            uri='http://eg-vl.org.au/dataset/123',
            downloadURL='http://eg-vl.org.au/dataset/123?_view=dl',
            wasAttributedTo='http://data.ga.gov.au/id/person/john.doe')


e2 = ServiceEntity(
    title='Subset of elevation',
    description='5km solar radiation interpolated raster service',
    serviceBaseUri='http://siss2.anu.edu.au/anuga/busselton.nc',
    query='var=elevation&spatial=bb&north=-33.06495205&south=-33.551573283&west=114.84967874&east=115.70661233&temporal=all&time_start=&time_end=&horizStride',
    queriedAtTime='2014-12-15T13:15:11',
    cachedCopy='http://bom.gov.au/dataset/678')


a0 = Activity(
    title='Run 456',
    description='Upper bound run, full Grid X use',
    wasAssociatedWith={VL added automatically},
    startedAtTime={VL added automatically},
    endedAtTime={VL added automatically},
    usedEntities=[e1, e2],
    generatedEntities={VL added automatically})


r0 = Report(
    title='Report for Run 456',
    description='Upper bound run, full Grid X use',
    startingActivity={VL added automatically},
    endingActivity={VL added automatically})

rs0 = ReportSender('http://provstore.vl.org.au/report/')
rs0.send(r0)
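The reporting steps on these slides (describe the Entities, wrap the run in an Activity, bundle it into a Report and send it) can be sketched end to end as below. This is not the PROMS reporting toolkit itself: the classes are hypothetical stand-ins with reduced fields, the sender collects JSON in memory instead of POSTing to a provenance store, and all example.org URIs are invented for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Entity:
    title: str
    uri: str
    description: str = ""


@dataclass
class Activity:
    title: str
    usedEntities: List[Entity]
    description: str = ""
    # The slides mark these as "VL added automatically"; the sketch
    # fills them in inside run_and_report() below.
    wasAssociatedWith: Optional[str] = None
    startedAtTime: Optional[str] = None
    endedAtTime: Optional[str] = None
    generatedEntities: List[Entity] = field(default_factory=list)


@dataclass
class Report:
    title: str
    activity: Activity


class ReportSender:
    """Collects serialised reports instead of sending them over HTTP."""

    def __init__(self, endpoint: str):
        self.endpoint = endpoint
        self.outbox: List[str] = []

    def send(self, report: Report) -> str:
        body = json.dumps(asdict(report), indent=2)
        self.outbox.append(body)
        return body


def run_and_report(title, inputs, agent_uri, sender):
    """Wrap a pretend VL run and fill in the automatic fields."""
    started = datetime.now(timezone.utc).isoformat()
    # Placeholder for the real computation and its output dataset.
    output = Entity(title=f"Output of {title}",
                    uri="http://example.org/vl/dataset/out-1")
    ended = datetime.now(timezone.utc).isoformat()
    activity = Activity(title=title, usedEntities=inputs,
                        wasAssociatedWith=agent_uri,
                        startedAtTime=started, endedAtTime=ended,
                        generatedEntities=[output])
    return sender.send(Report(title=f"Report for {title}", activity=activity))


sender = ReportSender("http://example.org/provstore/report/")
e1 = Entity(title="Grid X", uri="http://example.org/vl/dataset/123")
body = run_and_report("Run 456", [e1], "http://example.org/vl/user/jane", sender)
print(body)
```

The split mirrors the slides: the person launching the run supplies only titles, descriptions and input Entities, while timing, agent and output details are attached by the system around the run.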


What do we get from this work?


Graph power!

[Diagram. Labels: Report N; Reporting System X; ...]


URI power!

[Diagram. Labels: Report N; Reporting System X; corporate staff DB; temp repo; public web service; DAP-style repo; PROMS instance]


Distributed graphs!

[Diagram. Labels: GA PROMS instance; VL PROMS instance; Uni Prov Store]

Distributed Querying via endpoint cache