Security and privacy in provenance
![Page 1: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/1.jpg)
Architecture Tutorial
Security and privacy in provenance
Simon Miles, King’s College London
![Page 2: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/2.jpg)
Outline
• Provenance
• Models and Systems
• Illustrative Application
• Privacy and Security Issues
![Page 3: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/3.jpg)
Provenance
![Page 4: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/4.jpg)
What Provenance Is
• Oxford English Dictionary:
– the fact of coming from some particular source or quarter; origin, derivation
– the history or pedigree of a work of art, manuscript, rare book, etc.;
– concretely, a record of the passage of an item through its various owners.
• Provenance is important for:
– Interpretation
– Judging value
![Page 5: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/5.jpg)
Causation
• Everything that is part of the provenance of an item is a cause of that item being as it is
• For example, provenance of a bottle of wine includes:
– Grapes from which it is made
– Where those grapes grew
– Steps in the wine’s preparation
– How the wine was stored
– Between which parties the wine was transported, e.g. producer to distributor to retailer
![Page 6: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/6.jpg)
Motivating Applications
• We and other projects interviewed and supported users with issues regarding provenance in a range of domains, including:
– Bioinformatics
– Proteomics
– Aircraft simulation
– Social planning
– Genetic diseases
– Brain image analysis
– Particle Physics
– Organ transplant
– Police database integration
– Chemical analysis
– Grid service fault tolerance
– Astronomy
![Page 7: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/7.jpg)
Provenance Questions
• How did I (or someone else) come by this result?
• What was common and relevant in the history of this set of successful outcomes?
• Was the process claimed to be performed the one which was actually performed?
![Page 8: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/8.jpg)
Provenance Questions
• What inputs were used to derive this output?
• What software produced this data?
• Can I generalise from the process by which this result was produced to a re-usable plan?
![Page 9: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/9.jpg)
Provenance Questions
• Were these regulations followed in producing this result?
• Are these two independent conclusions actually based on the same faulty assumption/input?
• What differed between the way these two results were produced?
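Two of the questions above (a shared faulty input, and what differed between two results) reduce to set operations over the causes of each result. A minimal sketch, assuming a hypothetical causal graph held as a dictionary; the item names and the functions `causes`, `shared_causes` and `differing_causes` are illustrative only and are not part of any provenance system named in these slides:

```python
# Hypothetical causal graph: each item maps to the items it was directly derived from.
derived_from = {
    "conclusion_1": ["analysis_1"],
    "conclusion_2": ["analysis_2"],
    "analysis_1": ["dataset_a", "calibration"],
    "analysis_2": ["dataset_b", "calibration"],
}


def causes(item, graph):
    """Return every direct or indirect cause of an item."""
    found = set()
    stack = [item]
    while stack:
        for parent in graph.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def shared_causes(a, b, graph):
    """Causes that two results have in common (e.g. one faulty input)."""
    return causes(a, graph) & causes(b, graph)


def differing_causes(a, b, graph):
    """Causes that appear in the history of only one of the two results."""
    return causes(a, graph) ^ causes(b, graph)


print(sorted(shared_causes("conclusion_1", "conclusion_2", derived_from)))
print(sorted(differing_causes("conclusion_1", "conclusion_2", derived_from)))
```

Here the two conclusions look independent but both depend on the same calibration item, which is exactly the situation the second question asks about.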
![Page 10: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/10.jpg)
Shared Histories and Futures
• Multiple data can be produced by one process
• One process can use data from many sources as input
• The provenance (and futures) of data items overlap
• It is therefore suspect to assume that one data item = one provenance, or that provenance can simply be stored with the data
![Page 11: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/11.jpg)
Causal Provenance Models
Illustrative Application
![Page 12: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/12.jpg)
Causal graphs
[Figure: causal graph. Node: Donor Organ Decision: Yes]
![Page 13: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/13.jpg)
Causal graphs
[Figure: causal graph. Nodes: Donor Organ Decision: Yes; Family Consent Decision: Yes; Blood Test Results: -ve. Edge label: decision based on]
![Page 14: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/14.jpg)
Causal graphs
[Figure: causal graph. Nodes: Donor Organ Decision: Yes; Family Consent Decision: Yes; Blood Test Results: -ve; Blood Test Request: 432; Family Consent Request: 432. Edge labels: decision based on; response to]
![Page 15: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/15.jpg)
Causal graphs
[Figure: causal graph. Nodes: Donor Organ Decision: Yes; Family Consent Decision: Yes; Patient Brain Death: PID 432; Blood Test Results: -ve; Blood Test Request: 432; Family Consent Request: 432. Edge labels: decision based on; response to; triggered by]
![Page 16: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/16.jpg)
Causal graphs
[Figure: complete causal graph. Nodes: Donor Organ Decision: Yes; Family Consent Decision: Yes; Patient Brain Death: PID 432; Blood Test Results: -ve; Blood Test Request: 432; Family Consent Request: 432. Edge labels: decision based on (twice); response to (twice); triggered by (twice)]
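The graph built up over the preceding slides can be written down as a list of labelled edges and walked backwards from any item. This is a sketch only: the edge endpoints below are one assumed reading of the figure (the transcript preserves the node and edge labels but not which nodes each edge connects), and the function name `provenance` is hypothetical.

```python
# Assumed reading of the figure: (effect, relation, cause).
EDGES = [
    ("Donor Organ Decision", "decision based on", "Family Consent Decision"),
    ("Donor Organ Decision", "decision based on", "Blood Test Results"),
    ("Family Consent Decision", "response to", "Family Consent Request"),
    ("Blood Test Results", "response to", "Blood Test Request"),
    ("Family Consent Request", "triggered by", "Patient Brain Death"),
    ("Blood Test Request", "triggered by", "Patient Brain Death"),
]


def provenance(item, edges=EDGES):
    """Return every (effect, relation, cause) edge reachable backwards from item."""
    result = []
    seen = {item}
    frontier = [item]
    while frontier:
        current = frontier.pop(0)
        for effect, relation, cause in edges:
            if effect == current:
                result.append((effect, relation, cause))
                if cause not in seen:
                    seen.add(cause)
                    frontier.append(cause)
    return result


for effect, relation, cause in provenance("Blood Test Results"):
    print(f"{effect} --{relation}--> {cause}")
```

Asking for the provenance of the donor organ decision returns all six edges; asking for the provenance of the blood test results returns only the two edges behind that item.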
![Page 17: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/17.jpg)
Causal Connections
[Figure labels: Patient after donation with two kidneys; Donation operation]
• Causes and effects are occurrences
– Occurrence of an event, or
– Occurrence of a data item or physical object being in a particular state
![Page 18: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/18.jpg)
Documentation and Provenance
• We can distinguish
– process documentation (the documentation recorded into a store about processes)
– provenance (everything that caused an item to be as it is)
• Process documentation is recorded as processes are executed
• The data items that a process will ultimately produce may not be known at that time
• Provenance of an entity is obtained as the result of a query over process documentation
[Figure labels: Process documentation; Provenance]
![Page 19: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/19.jpg)
Process Documentation
• Documentation of one process comes from multiple, possibly independent, sources
• May share a store or use separate ones
[Figure: documentation sources Family, Testing Lab and Doctor; documented items Blood Test Results, Blood Test Request, Family Consent Decision, Family Consent Request, Donor Organ Decision: Yes, Patient Brain Death: PID 432]
![Page 20: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/20.jpg)
Provenance Scope
• An item is caused to be as it is by previous events, which were themselves caused by other events
• The causal graph could go back to the beginning of time
• If all this information were returned as the result of a query, it would be unmanageable and mostly irrelevant to the querier
• Therefore, the querier needs to scope the query to what is relevant
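One simple way to scope a query is to bound how far back it walks and to filter which causes it follows. The sketch below assumes a hypothetical dictionary of direct causes; `scoped_provenance`, `max_depth` and `is_relevant` are illustrative names, and real systems may well scope queries in other ways.

```python
from collections import deque


def scoped_provenance(item, causes_of, max_depth, is_relevant=lambda node: True):
    """Breadth-first walk over causes, cut off by depth and by a relevance test."""
    selected = []
    seen = {item}
    queue = deque([(item, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not look further back than the requested scope
        for cause in causes_of.get(node, []):
            if cause in seen or not is_relevant(cause):
                continue
            seen.add(cause)
            selected.append(cause)
            queue.append((cause, depth + 1))
    return selected


# Hypothetical direct causes; the chain could be extended back indefinitely.
causes_of = {
    "donation decision": ["blood test results", "family consent"],
    "blood test results": ["blood sample"],
    "blood sample": ["patient admission"],
    "patient admission": ["ambulance call"],
    "family consent": ["consent request"],
}

print(scoped_provenance("donation decision", causes_of, max_depth=2))
print(scoped_provenance("donation decision", causes_of, max_depth=2,
                        is_relevant=lambda node: not node.startswith("family")))
```

With a depth of two the admission and the ambulance call are never returned, and the relevance test prunes the whole consent branch.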
![Page 21: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/21.jpg)
Open Data Model
[Figure labels: Organisation 1; Organisation 2; Organisation 3; Provenance Stores]
• Distributed processes involve functionality from multiple independent organisations
• Each needs to record documentation independently
• We need a common, open data model and interfaces for recording and querying data in that model
![Page 22: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/22.jpg)
Inference
[Figure labels: Digitally Controlled Process; Blood Test Results; Blood Test Request]
![Page 23: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/23.jpg)
Inference
[Figure labels: Inferred Physical Process; Digitally Controlled Process; Blood Test Results; Blood Test Request; Sent Blood Sample; Received Blood Sample]
![Page 24: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/24.jpg)
Privacy and Security Issues
![Page 25: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/25.jpg)
Anonymised User Actions
• Provenance records for healthcare will include documentation regarding the actions of patients (or samples of theirs)
– Going to see a particular (their) GP
– Undergoing surgery at a particular hospital
– Their blood sample being sent to a testing lab
• Even if the patient is anonymised within the records, the pattern of their actions can be enough to uniquely identify them
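The risk can be made concrete by grouping anonymised records by pseudonym and checking whether any pattern of actions occurs only once. This is a sketch under stated assumptions: the records, pseudonyms and place names are invented for illustration, and `uniquely_identifiable` is a hypothetical helper, not a recognised anonymity metric.

```python
from collections import Counter

# Hypothetical anonymised records: (pseudonym, action, place).
records = [
    ("p1", "visit GP", "Surgery A"),
    ("p1", "surgery", "Hospital X"),
    ("p2", "visit GP", "Surgery A"),
    ("p2", "surgery", "Hospital X"),
    ("p3", "visit GP", "Surgery B"),
    ("p3", "surgery", "Hospital X"),
    ("p3", "blood sample sent", "Lab L"),
]


def action_patterns(records):
    """Group actions by pseudonym into an order-independent pattern."""
    patterns = {}
    for pseudonym, action, place in records:
        patterns.setdefault(pseudonym, set()).add((action, place))
    return {p: frozenset(actions) for p, actions in patterns.items()}


def uniquely_identifiable(records):
    """Pseudonyms whose pattern of actions is shared with nobody else."""
    patterns = action_patterns(records)
    counts = Counter(patterns.values())
    return sorted(p for p, pattern in patterns.items() if counts[pattern] == 1)


print(uniquely_identifiable(records))
```

Anyone who knows that a particular person visited Surgery B and had a sample sent to Lab L can single out p3 despite the pseudonym.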
![Page 26: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/26.jpg)
Data and Metadata Rights
• Provenance is often viewed as metadata to the data of which it provides a history
• Provenance information is usually generated automatically at runtime, so it is not known in advance what that information will be, yet appropriate rights still have to be applied to the provenance
• How do access rights of the provenance metadata relate to those of the data?
![Page 27: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/27.jpg)
Multi-Data Metadata
• Furthermore, provenance is often metadata to multiple data items
• For example, a record of the process of a transplant operation is the provenance of
– The transplanted organ,
– The decision to transplant,
– Blood tests carried out to decide to transplant, etc.
• Each may be stored separately and have very different access control policies
![Page 28: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/28.jpg)
Necessary Distribution of Query
• It is sometimes necessary to distribute parts of the provenance data about a process into multiple stores
• For example, in the OTM case, by EU law the data regarding activity within each hospital had to remain within that hospital
• To answer a provenance question, we need to query across distributed stores
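A query across distributed stores can follow causal links from store to store, asking each store only about one item at a time. The sketch below is an assumption-laden illustration: `ProvenanceStore`, `causes_of` and `distributed_provenance` are hypothetical names, the hospital records are invented, and no access control or network transport is modelled.

```python
class ProvenanceStore:
    """A hypothetical store holding only the records of one hospital."""

    def __init__(self, name, records):
        self.name = name
        self._records = records  # list of (effect, cause) pairs

    def causes_of(self, item):
        """Answer a narrow question: what directly caused this item?"""
        return [cause for effect, cause in self._records if effect == item]


def distributed_provenance(item, stores):
    """Follow causal links across stores; map each cause to the store that documented it."""
    found = {}
    frontier = [item]
    while frontier:
        current = frontier.pop()
        for store in stores:
            for cause in store.causes_of(current):
                if cause not in found:
                    found[cause] = store.name
                    frontier.append(cause)
    return found


hospital_a = ProvenanceStore("Hospital A", [("transplant decision", "organ offer")])
hospital_b = ProvenanceStore("Hospital B", [
    ("organ offer", "donor blood test"),
    ("donor blood test", "donor brain death"),
])

print(distributed_provenance("transplant decision", [hospital_a, hospital_b]))
```

Each hospital keeps its own records; only the answers to individual questions cross the boundary, which is one possible way to respect a requirement that data stay where it was produced.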
![Page 29: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/29.jpg)
Automatic Capture
• Provenance is often viewed as metadata to the data of which it provides a history
• Provenance information is usually generated automatically at runtime, so it is not known in advance what that information will be, yet appropriate rights still have to be applied to the provenance
• How do access rights of the provenance metadata relate to those of the data?
![Page 30: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/30.jpg)
Traffic Confidentiality and Inference
• Traffic confidentiality means hiding the fact that a service was used by a client, even where transmitted data is encrypted
– A pharmaceutical company querying a small lab’s public database concerning a particular disease
• Can help achieve confidentiality by using intermediaries who use multiple services
• But the actual service used could still be inferred from provenance, which is set up to allow inferences
![Page 31: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/31.jpg)
Extra Material
![Page 32: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/32.jpg)
Extra Material Index
• Motivation for general provenance models
• Interoperability and the Open Provenance Model
• Provenance technologies in database research, digital libraries, semantic web
• Provenance in Tupelo (from NCSA)
• Provenance in Taverna (from Manchester)
• The Provenance Challenges
• Open research issues
![Page 33: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/33.jpg)
Motivation for Common, General Provenance Models
![Page 34: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/34.jpg)
Separately Documented Aspects
• Attribution and related events
– Modified by Simon Miles, compressed by X
– Created at time T1, deposited at T2
• Documentation of the processing of data
– Enactment of workflows
– Chain of ownership
• Versioning
• Differing practice, technologies, emphasis: workflows, DB research, libraries, semweb
![Page 35: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/35.jpg)
Preparation for Questions
• Don’t know in advance of something being produced that it will be produced
– When documenting events, can’t yet associate that documentation with what those events ultimately produce
• Don’t know in advance of being asked (about provenance) what will be asked
– When documenting provenance, can’t restrict documentation to what you know will be used
![Page 36: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/36.jpg)
Shared Histories and Futures
• Multiple data can be produced by one process
• One process can use data from many sources as input
• The provenance (and futures) of data items overlap
• It is therefore suspect to assume that one data item = one provenance, or that provenance can simply be stored with the data
![Page 37: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/37.jpg)
Alternative Accounts
• In some disciplines or for some kinds of data, provenance can be disputed
• Even within a computer system, there can be multiple accounts of apparently the same event
[Figure: nodes A and B; two accounts of one message, “A sent X to B” and “A sent Y to B”; label: corruption]
![Page 38: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/38.jpg)
Common General Models
• Provide skeleton for documenting all aspects of provenance
• Record lots without (much) regard to particular questions...
• Then query as relevant to required usage
• System interoperation through common serialisation
• Can connect records from different systems involved in producing one data item
![Page 39: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/39.jpg)
Provenance Scope
• An item is caused to be as it is by previous events, which were themselves caused by other events
• The causal graph could go back to the beginning of time
• If all this information were returned as the result of a query, it would be unmanageable and mostly irrelevant to the querier
• Therefore, the querier needs to scope the query to what is relevant
![Page 40: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/40.jpg)
Interoperability
![Page 41: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/41.jpg)
Open Data Model
[Figure labels: Organisation 1; Organisation 2; Organisation 3; Provenance Stores]
• Distributed processes involve functionality from multiple independent organisations
• Each needs to record documentation independently
• We need a common, open data model and interfaces for recording and querying data in that model
![Page 42: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/42.jpg)
Open Provenance Model
Can describe any process (not just workflow execution)
Allows alternate accounts by different observers
http://openprovenance.org
![Page 43: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/43.jpg)
OPM Requirements
• To allow provenance information to be exchanged between systems, by means of a compatibility layer based on a shared provenance model.
• To allow developers to build and share tools that operate on such a provenance model.
• To define the model in a precise, technology-agnostic manner.
• To support a digital representation of provenance for any “thing”, whether produced by computer systems or not.
![Page 44: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/44.jpg)
OPM Non-Requirements
• OPM does not specify the internal representations that systems have to adopt to store and manipulate provenance internally.
• OPM does not define a computer-parsable syntax for this model (but prototype RDF, XML schemas have been developed)
• OPM does not specify protocols to store such provenance information in provenance repositories.
• OPM does not specify protocols to query provenance repositories.
![Page 45: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/45.jpg)
Contributors
• Original contributors from:
– Universities: Southampton, Indiana, King’s College, Manchester, Davis, Hasselt, Utah, Southern California
– Microsoft, NCSA, PNNL
• Plus 3rd challenge participants including:
– Universities: Harvard, Chicago, Santa Barbara, Amsterdam
– SDSC
![Page 46: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/46.jpg)
Open Provenance Model
• 3 node types – artifact, process, agent
• 5 arc types – used, generated, triggered, derived, controlled – and inference rules
• Generic – extensibility via annotation
• Choice of granularity and focus (e.g., artifact or process-centric)
![Page 47: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/47.jpg)
Entities
• Artifact: Immutable piece of state, which may have a physical embodiment in a physical object, or a digital representation in a computer system.
• Process: Action or series of actions performed on or caused by artifacts, and resulting in new artifacts.
• Agent: Contextual entity acting as a catalyst of a process, enabling, facilitating, controlling, affecting its execution.
![Page 48: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/48.jpg)
Edges
[Figure: OPM edges between artifacts (A) and processes (P), labelled used, was generated by, was triggered by, was derived from]
Role identifiers on edges specify in what way an artifact relates to a process
![Page 49: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/49.jpg)
Pegasus Example
[Figure: OPM graph. Process “Produce Sky Mosaic” used (inputSet) artifact “FITS DataSet” and used (size) artifact “Degree”; artifact “Mosaic” was generated by (output) the process; the process was controlled by (enactor) agent “Pegasus / Condor DAGMan”]
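The Pegasus example can be encoded with the three OPM node types and five arc types, checking that each arc connects the right kinds of node. This is a sketch, not an OPM toolkit: the classes `Edge` and `OPMGraph` are hypothetical, the arc names follow the usual OPM spelling (`used`, `wasGeneratedBy`, `wasTriggeredBy`, `wasDerivedFrom`, `wasControlledBy`), and the node-to-edge assignment is an assumed reading of the figure.

```python
from dataclasses import dataclass

NODE_KINDS = {"artifact", "process", "agent"}

# For each arc type: (kind of the effect node, kind of the cause node).
ARC_TYPES = {
    "used": ("process", "artifact"),
    "wasGeneratedBy": ("artifact", "process"),
    "wasTriggeredBy": ("process", "process"),
    "wasDerivedFrom": ("artifact", "artifact"),
    "wasControlledBy": ("process", "agent"),
}


@dataclass(frozen=True)
class Edge:
    effect: str
    arc: str
    cause: str
    role: str = ""


class OPMGraph:
    def __init__(self):
        self.nodes = {}  # node name -> node kind
        self.edges = []

    def add_node(self, name, kind):
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[name] = kind

    def add_edge(self, effect, arc, cause, role=""):
        if arc not in ARC_TYPES:
            raise ValueError(f"unknown arc type: {arc}")
        actual = (self.nodes[effect], self.nodes[cause])
        if actual != ARC_TYPES[arc]:
            raise ValueError(f"{arc} must link {ARC_TYPES[arc]}, got {actual}")
        self.edges.append(Edge(effect, arc, cause, role))


graph = OPMGraph()
graph.add_node("FITS DataSet", "artifact")
graph.add_node("Degree", "artifact")
graph.add_node("Mosaic", "artifact")
graph.add_node("Produce Sky Mosaic", "process")
graph.add_node("Pegasus / Condor DAGMan", "agent")
graph.add_edge("Produce Sky Mosaic", "used", "FITS DataSet", role="inputSet")
graph.add_edge("Produce Sky Mosaic", "used", "Degree", role="size")
graph.add_edge("Mosaic", "wasGeneratedBy", "Produce Sky Mosaic", role="output")
graph.add_edge("Produce Sky Mosaic", "wasControlledBy",
               "Pegasus / Condor DAGMan", role="enactor")

for edge in graph.edges:
    print(f"{edge.effect} {edge.arc}({edge.role}) {edge.cause}")
```

Keeping the role on the edge rather than on the node is what lets the same artifact play different roles in different processes.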
![Page 50: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/50.jpg)
Mapping Attribution to OPM
[Figure: OPM graph. Process “creation” used two artifacts; artifact A was generated by “creation”; agent Simon Miles is linked to “creation” by wasActionOf]
A dc:creator “Simon Miles”
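The mapping on this slide can be sketched as a function that expands one attribution statement into a small causal graph. The function `attribution_to_opm` and the naming of the creation process are hypothetical; `wasActionOf` is taken from the figure's label, and treating it as the link between the creation process and the agent is an assumption.

```python
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"


def attribution_to_opm(subject, predicate, value):
    """Expand one dc:creator statement into OPM-style nodes and edges."""
    if predicate != DC_CREATOR:
        raise ValueError("only dc:creator is handled in this sketch")
    process = f"creation of {subject}"  # hypothetical name for the implied process
    nodes = {subject: "artifact", process: "process", value: "agent"}
    edges = [
        (subject, "wasGeneratedBy", process),
        (process, "wasActionOf", value),  # edge label as shown in the figure
    ]
    return nodes, edges


nodes, edges = attribution_to_opm("A", DC_CREATOR, "Simon Miles")
for effect, arc, cause in edges:
    print(effect, arc, cause)
```

The flat statement says only who the creator is; the graph form makes room for what the creation process used, which the bibliographic statement cannot express.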
![Page 51: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/51.jpg)
Provenance Technologies in database research, digital libraries, semantic web
![Page 52: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/52.jpg)
Database Research
• In database research, the concept of provenance has been used for:
– Inferring what database table values affected a query result (Buneman et al)
– Tracking the changes in relational data structure between versions of a database
– Tracking changes in database schemas (Chiticariu and Tan)
![Page 53: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/53.jpg)
Why & Where Provenance (Buneman et al.)
SELECT name, telephone
FROM employee
WHERE salary > (SELECT AVG(salary) FROM employee)
[Figure: employee table with columns name, telephone, salary; names Alfred, Bertha, Charlie, Denise, Eric; telephone numbers 020 7848 ….; salaries 900, 800, 700, 600, 500. Query result with columns name, telephone, listing Denise and Eric. Arrows labelled “where” and “why”]
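Where-provenance can be illustrated by running a valid form of the slide's query and reporting, for each output cell, the source cell it was copied from. This is a minimal sketch using Python's built-in `sqlite3`: the telephone values and the pairing of salaries with names are hypothetical (the transcript does not preserve the table layout), and only where-provenance is shown; why-provenance would also have to account for the rows feeding the average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, telephone TEXT, salary INTEGER)")
# Hypothetical rows: placeholder telephone values and an assumed salary pairing.
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    ("Alfred", "tel-A", 500),
    ("Bertha", "tel-B", 600),
    ("Charlie", "tel-C", 700),
    ("Denise", "tel-D", 800),
    ("Eric", "tel-E", 900),
])

QUERY = """
SELECT rowid, name, telephone
FROM employee
WHERE salary > (SELECT AVG(salary) FROM employee)
ORDER BY rowid
"""


def where_provenance(conn):
    """Map each output cell to the (table, rowid, column) it was copied from."""
    locations = {}
    for rowid, name, telephone in conn.execute(QUERY):
        locations[(name, "name")] = ("employee", rowid, "name")
        locations[(name, "telephone")] = ("employee", rowid, "telephone")
    return locations


for cell, source in where_provenance(conn).items():
    print(cell, "<-", source)
```

Selecting `rowid` alongside the requested columns is what ties each output value back to a specific location in the source table.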
![Page 54: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/54.jpg)
Digital Library Technologies
• In digital libraries, a set of standards is sometimes used to provide data structures for storing metadata along with archived objects: OAIS, METS, PREMIS...
• An Archival Information Package (AIP) provides write-once data and metadata
• AIP metadata can contain identifiers and relationships to connect one version to preceding versions, and record events relevant to the archived object, e.g. compression, integrity check
![Page 55: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/55.jpg)
Provenance in RDF
• Different schemes have been suggested for recording documentation on the provenance of statements in RDF
• Reified statements:
A: http://...subj http://...isRelated http://...obj
B: <A> http://...hasCreator “Simon”
• Named graphs
• Causal graph explicit as part of data model
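Reification can be sketched with plain tuples: statement A is described by four triples using the standard RDF reification vocabulary, and a further triple about A records its creator. The `http://example.org/` namespace and the helper names `reify` and `creators_of` are hypothetical; an RDF library would normally be used instead of bare tuples.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace standing in for the elided URIs


def reify(statement_id, subject, predicate, obj):
    """Describe a statement with the four standard RDF reification triples."""
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
    ]


def creators_of(triples, subject, predicate, obj):
    """Find who is recorded as creator of a given statement."""
    def statements_with(prop, value):
        return {s for s, p, o in triples if p == RDF + prop and o == value}

    ids = (statements_with("subject", subject)
           & statements_with("predicate", predicate)
           & statements_with("object", obj))
    return sorted(o for s, p, o in triples if s in ids and p == EX + "hasCreator")


# Statement A, then statement B about A.
triples = reify(EX + "A", EX + "subj", EX + "isRelated", EX + "obj")
triples.append((EX + "A", EX + "hasCreator", "Simon"))

print(creators_of(triples, EX + "subj", EX + "isRelated", EX + "obj"))
```

The cost of this scheme is visible in the sketch: one asserted statement becomes four descriptive triples before any provenance is attached.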
![Page 56: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/56.jpg)
Provenance as Bibliography
• Dublin Core can be used to express bibliography information: creator, publisher, subject, etc.
– http://purl.org/dc/elements/1.1/creator
• Not as expressive as causal graphs and can be captured in a graph
– e.g. who created something is part of the process by which it was created
• But DC metadata common across applications and easy to use
![Page 57: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/57.jpg)
Provenance in Tupelo
Thanks to Joe Futrelle, National Centre for Supercomputing Applications, for the following slides
![Page 58: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/58.jpg)
Tupelo: semantic content
Abstracts content from storage implementations (e.g., Sesame, Mulgara)
Provides location-independent addressing of content and metadata
Supports transparent mirroring, caching, failover, etc.
(tupeloproject.org)
![Page 59: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/59.jpg)
Tupelo
• “Tupelo... provides a Web access protocol and Java API (Application Program Interface) that interface with an RDF (Resource Description Framework) mapping of the Open Provenance Model.”
– Towards provenance-aware geographic information systems, ACM SIGSPATIAL 2008
![Page 60: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/60.jpg)
NCSA Provenance Infrastructure
[Figure: layered architecture. Labels: Open Provenance Model; Tupelo Semantic Content Repository; Context (three times); Store (three times); OPM toolkit (twice); Visualization, interaction; Tracking, modeling, presentation; Abstraction, inference, storage; desktop, portal, etc.]
![Page 61: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/61.jpg)
Tupelo Provenance API
• Java API to record OPM data as RDF, e.g.
Artifact artifact = graph.newArtifact("input file 1");
graph.assertArtifact(artifact);
![Page 62: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/62.jpg)
Tupelo Provenance API
• Query OPM graph by searching for patterns in RDF
Unifier u = new Unifier();
u.setColumnNames("file", "path");
u.addPattern("file", Rdf.TYPE, PC3Utilities.ns("CSV_file"));
u.addPattern("file", PC3Utilities.ns("PathToFile"), "path");
context.perform(u);
![Page 63: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/63.jpg)
Provenance in Taverna
Thanks to Paolo Missier, University of Manchester, for the following slides
![Page 64: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/64.jpg)
Taverna
• “The Taverna workbench is a free software tool for designing and executing workflows, created by the myGrid project”
– Taverna website
![Page 65: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/65.jpg)
Collections example: from genes to SNPs
[Figure: workflow steps: gene -> genomic region; extend region; retrieve SNPs in the region; rearrange SNP details]
• See myexperiment.org: http://www.myexperiment.org/workflows/166
[ ENSG00000139618 , ENSG00000083093 ]
[[<1,23554512,16,rs45585833>, <1,23554712,16,rs45594034>,...],[<1,31820153,13,ENSSNP10730823>, <1,31818497,13,ENSSNP10730820>,...] ]
![Page 66: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/66.jpg)
Collections, iterations, and provenance
[Figure: workflow annotated with processor signatures (l(s) → l(s), l(s) → l(s), s → s, s → l(s), s → s), a dot product, and sample values: [139618, 83093]; 139618 and 83093; [16,13]; [23520984, 31786617]; [23560179, 31871809]; <16, 23560179,..>; <13, 31871809,...>; [ <1,23553692,16,rs152451>,...]; [<1,31840948,13,rs169546>,...]]
![Page 67: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/67.jpg)
Capturing provenance with iterations
• workflow processor: the elementary graph building block, P with input X:s and output Y:s
• iteration due to list depth mismatch: input list [a1...ai...an], output list [b1...bi...bn]
• semantics: Y = (map P [a1...an]) = [ (P a1) ... (P an) ] (extends to multiple inputs...)
• unfolding during execution: P[1] ... P[n], where P[i] takes ai as X and yields bi as Y
• OPM pattern: bi wasGeneratedBy P[i], P[i] used ai, for i = 1 ... n
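The unfolding can be sketched as a map that records one `used` and one `wasGeneratedBy` edge per element, so that each output element traces back to exactly one input element. The functions `run_with_iteration` and `lineage` are hypothetical and do not reflect Taverna's actual implementation; `str.upper` merely stands in for a single-item processor P.

```python
def run_with_iteration(processor, name, inputs):
    """Apply a single-item processor to a list, recording one OPM pattern per element."""
    outputs, edges = [], []
    for i, a in enumerate(inputs, start=1):
        invocation = f"{name}[{i}]"  # one unfolded invocation per list element
        b = processor(a)
        outputs.append(b)
        edges.append((invocation, "used", a))
        edges.append((b, "wasGeneratedBy", invocation))
    return outputs, edges


def lineage(value, edges):
    """Trace an output element back to the input element(s) it came from."""
    for effect, arc, cause in edges:
        if arc == "wasGeneratedBy" and effect == value:
            return [c for e, a, c in edges if a == "used" and e == cause]
    return []


outputs, edges = run_with_iteration(str.upper, "P", ["a1", "a2", "a3"])
print(outputs)
print(lineage("A2", edges))
```

Recording per-element invocations is what keeps the lineage fine-grained: the second output depends only on the second input, not on the whole list.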
![Page 68: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/68.jpg)
Querying provenance graphs
• Problem:
– users are rarely interested in the complete provenance graph
– noisy, possibly large, difficult to navigate
• Goal: let users identify
– variables that carry interesting values for which provenance is sought
– nodes in the graph where provenance information should be reported
![Page 69: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/69.jpg)
Provenance query - no semantics
provenance query syntax:
SELECT merged_pathways
AT get_pathways_by_genes1, mmusculus_gene_ensembl
[Figure labels: interesting value; interesting processor (twice)]
![Page 70: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/70.jpg)
Role of semantics in provenance
[Figure labels: Taverna runtime; workflows of processors P1 to P6; dataflow topology + raw lineage events; Provenance capture and query processor; lineage database (RDB); query; semantic resource annotations; reference ontologies; Semantic overlays; current implementation]
Example query: “describe the derivation of each pathway through Kegg, in which gene g is involved”
![Page 71: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/71.jpg)
The Provenance Challenges
![Page 72: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/72.jpg)
Provenance Challenges 1 & 2
IPAW 2006, HPDC 2007
20 teams, 1 workflow, 9 queries
Interoperability?
lots of manual work required
call for standards
(source: gridprovenance.org)
![Page 73: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/73.jpg)
Provenance Challenge 3
• Ended with a workshop in Amsterdam, 10-11th June
• Specifically aimed at interoperability
• Each team:
– Runs an astronomy data analysis process
– Executes queries on provenance
– Exports provenance as OPM
– Imports other teams’ OPM provenance and re-runs queries
![Page 74: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/74.jpg)
Open Issues
![Page 75: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/75.jpg)
Intention and Reason
• OPM provides a mechanistic view of what has occurred
• It does not capture assertions such as:
– X occurred because I aimed to achieve Y
– X occurred because I believed that Y was true
– X occurred because I had an obligation to ensure it did
![Page 76: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/76.jpg)
Inference and Physical Processes
[Figure labels: Digitally Controlled Process; Blood Test Results; Blood Test Request]
![Page 77: Security and privacy in provenance](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681671e550346895ddb9986/html5/thumbnails/77.jpg)
Inference and Physical Processes
[Figure labels: Inferred Physical Process; Digitally Controlled Process; Blood Test Results; Blood Test Request; Sent Blood Sample; Received Blood Sample]