“provenance”

“provenance” DATA TRACK Chair : Krystyna Marek Rapporteur: Wolfram Horstmann 6th e-Infrastructure Concertation Lyon 24 Nov 2008


Page 1: “provenance”

“provenance”

DATA TRACK

Chair: Krystyna Marek

Rapporteur: Wolfram Horstmann

6th e-Infrastructure Concertation, Lyon

24 Nov 2008

Page 2: “provenance”

Motivation

• Last two meetings were on standards

• It was proposed to have a more focussed discussion

– Focus on practice and interoperability rather than standards

• Select an arbitrary but important topic

Page 3: “provenance”

Notions of Provenance

• Where do data objects* originate from?

– Scientific work (examples)

• Instrumentation techniques: manufacturers of hardware and software

• Methodologies: processes, e.g. gene sequencing

– Technical/local (examples)

• (Web) identifiers

• Database, repository name

* Primary data, documents, metadata …

Page 4: “provenance”

Why Provenance?

• Quoting / citing / referencing as a global scientific principle

– “Reproducible research”

• Giving credit to authors / creators in distributed environments

• Original location / context has to be known

• Experienced in Grid environments [1]

Page 5: “provenance”

Provenance & Interoperability

• Re-Use / Sharing: “Addressing/Accessing”

– Common view, common use

– Unidirectional: no change of data objects (DOs)!

• Federation: “Discovering in Context”

– Remote representation of distributed DOs

• Aggregation: “Contextualizing”

– Add unchanged object in a context

• Processing/Annotation: “Changing”

– Uni- vs. bidirectional: change of DOs and remote representation vs. back-storage (e.g. CVS)

Page 6: “provenance”

IVOA

• Astronomy area: Repositories use OAI-PMH to provide general

• Provenance as a kind of metadata

– “Observation data model”

– History of data (process “lineage”)

• Processing

• Configuration: telescope, camera

• Ambient conditions: temperature etc.

– Versioning is included (also algorithms etc.)
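The lineage idea on this slide can be sketched as a small record type. This is only an illustration under stated assumptions: the class and field names (`ObservationProvenance`, `ProcessingStep`, `configuration`, `ambient_conditions`) and all values are hypothetical and are not taken from the IVOA observation data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class ProcessingStep:
    """One step in the lineage: which algorithm ran, and in which version."""
    algorithm: str
    version: str


@dataclass
class ObservationProvenance:
    """Hypothetical provenance record; names are illustrative, not IVOA's."""
    configuration: Dict[str, str]         # e.g. telescope, camera
    ambient_conditions: Dict[str, float]  # e.g. temperature
    lineage: List[ProcessingStep] = field(default_factory=list)

    def record(self, algorithm: str, version: str) -> None:
        """Append a processing step to the history of the data."""
        self.lineage.append(ProcessingStep(algorithm, version))

    def history(self) -> List[str]:
        """Return the lineage as readable 'algorithm@version' strings."""
        return [f"{step.algorithm}@{step.version}" for step in self.lineage]


# Placeholder values, chosen only for the example.
prov = ObservationProvenance(
    configuration={"telescope": "T1", "camera": "C1"},
    ambient_conditions={"temperature_celsius": -5.0},
)
prov.record("bias_subtraction", "1.0")
prov.record("flat_fielding", "2.1")
print(prov.history())
```

Storing the algorithm version with every step is what lets versioning travel with the data rather than living in a separate log.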

Page 7: “provenance”

MetaFor

• Data from numerical models

• Descriptive information from model

• Models are often transformed

• Database / Registry for models in distributed repositories

Page 8: “provenance”

D4Science

• Framework for

• More than simple import framework

• Graphs representing provenance information

– Thematic: fishing site / statistic /

Page 9: “provenance”

DRIVER

• Focus on document repositories

– Some 100 …

• Simple Provenance

– OAI-PMH

• Further (2nd order) Provenance

– OAI-PMH (“about”): repository identifiers

– Enhanced Publications >> OAI-ORE

• Semantic Model (named graphs) representing packages of documents and data objects
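As a rough illustration of the second-order provenance mentioned above, the sketch below reads an OAI-PMH “about” container that follows the OAI provenance guidelines. The sample record, its URLs, and the helper name `origin_descriptions` are assumptions made for the example; they are not DRIVER data or DRIVER code.

```python
import xml.etree.ElementTree as ET

# Namespace of the OAI provenance container.
PROV_NS = "http://www.openarchives.org/OAI/2.0/provenance"

# Hypothetical "about" section of a re-exposed (harvested) record.
SAMPLE_ABOUT = """
<about xmlns="http://www.openarchives.org/OAI/2.0/">
  <provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance">
    <originDescription harvestDate="2008-11-01T00:00:00Z" altered="false">
      <baseURL>http://repository.example.org/oai</baseURL>
      <identifier>oai:repository.example.org:1234</identifier>
      <datestamp>2008-10-15</datestamp>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </originDescription>
  </provenance>
</about>
"""


def origin_descriptions(about_xml: str):
    """Return one dict per originDescription found in the container."""
    root = ET.fromstring(about_xml.strip())
    ns = {"p": PROV_NS}
    results = []
    for od in root.iter(f"{{{PROV_NS}}}originDescription"):
        results.append({
            "base_url": od.findtext("p:baseURL", namespaces=ns),
            "identifier": od.findtext("p:identifier", namespaces=ns),
            "datestamp": od.findtext("p:datestamp", namespaces=ns),
            "harvest_date": od.get("harvestDate"),
            # 'altered' says whether the harvester changed the metadata.
            "altered": od.get("altered") == "true",
        })
    return results


for origin in origin_descriptions(SAMPLE_ABOUT):
    print(origin["base_url"], origin["identifier"], origin["altered"])
```

The `baseURL` and `identifier` pair is what lets an aggregator point back to the originating repository.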

Page 10: “provenance”

Solutions

• Provenance

– Registries for curator, publisher etc.

– Resolving over registry

• Diversity of approaches

– CIDOC-CRM, OPM, EuroStats

– Languages: RDF / OAI-ORE

Page 11: “provenance”

Differentiations

• Expertise from data centers as opposed to data providers

– Infrastructures should provide functions to add provenance information (but do not)

– e.g. EGEE provides an additional module for recording provenance data
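What an infrastructure-level function for adding provenance information might look like can be sketched as a decorator that wraps any processing step. This is a minimal sketch, not the EGEE module: the decorator `with_provenance`, the record fields, and the two example steps are all hypothetical.

```python
import functools
from datetime import datetime, timezone


def with_provenance(step_name, version):
    """Hypothetical decorator: make a processing step log minimal metadata."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(data, provenance):
            result = func(data)
            entry = {
                "step": step_name,
                "version": version,
                "timestamp": datetime.now(timezone.utc).isoformat(),
            }
            # Build a new list so earlier provenance records stay unchanged.
            return result, provenance + [entry]
        return wrapper
    return decorator


@with_provenance("calibrate", "0.1")
def calibrate(values):
    """Example step: scale every value by two."""
    return [v * 2 for v in values]


@with_provenance("average", "0.1")
def average(values):
    """Example step: arithmetic mean."""
    return sum(values) / len(values)


data, prov = calibrate([1, 2, 3], [])
mean, prov = average(data, prov)
print(mean, [entry["step"] for entry in prov])
```

Because the wrapper is supplied by the infrastructure, a data provider gets a provenance trail without writing any recording code.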

Page 12: “provenance”

Hot topics

• Propagating provenance: versioning

• Disambiguation / Deduplication

– different identical objects

• Who provides the data?

– Each processing step should provide at least some metadata
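One common way to approach the deduplication problem above is to compare content digests instead of identifiers. The sketch below assumes byte-identical copies and uses invented identifiers; near-duplicates and differing serialisations of the same object would need a different technique.

```python
import hashlib
from collections import defaultdict


def content_digest(payload: bytes) -> str:
    """SHA-256 digest of the object's bytes, used as a content fingerprint."""
    return hashlib.sha256(payload).hexdigest()


def group_duplicates(objects):
    """Map identifier -> bytes to groups of identifiers with identical content."""
    by_digest = defaultdict(list)
    for identifier, payload in objects.items():
        by_digest[content_digest(payload)].append(identifier)
    # Keep only digests that are shared by more than one identifier.
    return [sorted(ids) for ids in by_digest.values() if len(ids) > 1]


# Hypothetical holdings: the same object exposed by two repositories.
holdings = {
    "oai:repo-a.example.org:1": b"same bytes",
    "oai:repo-b.example.org:9": b"same bytes",
    "oai:repo-a.example.org:2": b"other bytes",
}
print(group_duplicates(holdings))
```

Each group lists the identifiers that point at the same content, which is the information needed to pick one canonical origin.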

Page 13: “provenance”

Recommendations for Infrastructure

• Standards for provenance: non-existent?

– Each processing step should provide at least some metadata

– Look deeper into specific implementations in subject communities

• Technical point-to-point organisation

– Bilateral

• Programming a meeting

– 24/25th ESA: earth science meeting?