Small Is Beautiful: Summarizing Scientific Workflows Using Semantic Annotations
Pinar Alper, Khalid Belhajjame, Carole A. Goble
University of Manchester
Pinar Karagoz, Middle East Technical University
IEEE 2nd International Congress on Big Data, June 27-July 2, 2013
Pinar and her daughter Nile at the end-of-year school party.
Scientific Workflows
• Data-driven analysis pipelines
• Systematic gathering of data and analysis tools into computational solutions for scientific problem-solving
• Tools for automating frequently performed data-intensive activities
• Provenance for the resulting datasets
– The method followed
– The resources used
– The datasets used
Science with workflows
• GWAS, Pharmacogenomics: association study of Nevirapine-induced skin rash in Thai population
• Trypanosomiasis (sleeping sickness parasite) in African cattle
• Astronomy & HelioPhysics
• Library Doc Preservation
• Systems Biology of Micro-Organisms
• Observing Systems Simulation Experiments (JPL, NASA)
• BioDiversity: Invasive Species Modelling
[Credit Carole A. Goble]
Provenance is paramount for science
• Reporting findings
– Derivation: how did we get this result?
• processes/programs used, execution trace, data lineage, source of components (data, services)
– History: who did what when?
• creator, contributors, timestamps
• Adapting to change
– Explanation: why did this record start to appear in the result?
– Change impact: which steps will be affected if I change this tool or data input?
PROV Primer, Gil et al.
WF Execution Trace
Retrospective provenance: actual data used, actual invocations, timestamps and data derivation trace
WF Description
Prospective provenance: intended method for analysis
Workflows can get complex!
• Overwhelming for users who are not the developers
• Abstractions required for reporting
• Lineage queries result in very long trails
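To see why lineage queries over a raw execution trace produce long trails, here is a minimal sketch that computes the full upstream lineage of a result as the transitive closure of a derivation relation (in the spirit of PROV wasDerivedFrom). The trace, its item names and the `was_derived_from` mapping are hypothetical; a real trace would come from the workflow system's provenance export.

```python
from collections import deque

# Hypothetical derivation trace: each item maps to the items it was
# directly derived from.
was_derived_from = {
    "chart.png": ["table_clean.csv"],
    "table_clean.csv": ["table_merged.csv"],
    "table_merged.csv": ["part_a.csv", "part_b.csv"],
    "part_a.csv": ["response_a.xml"],
    "part_b.csv": ["response_b.xml"],
    "response_a.xml": ["query_a.txt"],
    "response_b.xml": ["query_b.txt"],
}


def lineage(item: str, derivations: dict[str, list[str]]) -> list[str]:
    """Return every ancestor of `item`, breadth-first (nearest first)."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque([item])
    while queue:
        current = queue.popleft()
        for source in derivations.get(current, []):
            if source not in seen:
                seen.add(source)
                order.append(source)
                queue.append(source)
    return order


trail = lineage("chart.png", was_derived_from)
print(len(trail), trail)  # 8 ancestors, nearest first
```

Six of the eight ancestors are intermediate formatting, merging and protocol artifacts; only the two queries are inputs a reader is likely to care about.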
Reason and extent of complexity
• a.k.a. Shims
• Dealing with data and protocol heterogeneities
• Local organization of data
Garijo D., Alper P., Belhajjame K., et al.
D. Hull et al.
~ 60%
Static Ways to Tackle Complexity: Process-Wise and Data-Wise Abstractions
• Sub-workflows
– Not always a significant unit of function (e.g., created for aesthetic purposes)
• Bookmarked data links
– Cluster the output signature
– Further complicate the workflow
• Components
– Library dependent
Our Solution: Workflow Description Summaries
• A graph model for representing workflows
• Graph re-write rules for summarization
IF <performs certain function> THEN <re-write WF graph>
(condition expressed with motifs; action expressed with reduction primitives)
PART-1: Scientific Workflow Motifs
• Domain-independent categorization
– Data-oriented nature
– Resource/implementation-oriented nature
• Captured in a lightweight OWL ontology: http://purl.org/net/wf-motifs
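Because motifs live in an ontology, the condition of an IF/THEN summarization rule can be matched by subsumption: a rule written for a general motif also fires on its more specific sub-motifs. This is a minimal sketch under stated assumptions: the `PARENT` hierarchy fragment and the names DataPreparation, FormatTransformation and DataOrientedMotif are hypothetical (not copied from http://purl.org/net/wf-motifs), and the `Rule` class is an illustrative encoding, not the authors' rule syntax.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical fragment of a motif hierarchy: child -> parent.
PARENT = {
    "FormatTransformation": "DataPreparation",
    "DataPreparation": "DataOrientedMotif",
    "DataRetrieval": "DataOrientedMotif",
    "DataMoving": "DataOrientedMotif",
}


def is_a(motif: str, ancestor: str) -> bool:
    """True if `motif` equals `ancestor` or lies below it in the hierarchy."""
    current: Optional[str] = motif
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


@dataclass(frozen=True)
class Rule:
    """IF an operation has a motif subsumed by `motif` THEN apply `primitive`."""
    motif: str
    primitive: str  # e.g. "eliminate", "collapse_up", "collapse_down"


def fire(op_motifs: set[str], rules: list[Rule]) -> Optional[str]:
    """Primitive of the first matching rule, or None to keep the operation."""
    for rule in rules:
        if any(is_a(m, rule.motif) for m in op_motifs):
            return rule.primitive
    return None


rules = [Rule("DataMoving", "collapse_up"), Rule("DataPreparation", "collapse_down")]
print(fire({"FormatTransformation"}, rules))  # collapse_down (via DataPreparation)
print(fire({"DataRetrieval"}, rules))         # None
```

Rule order matters in this sketch: the first matching rule wins, which is one simple way to resolve operations that carry several motifs.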
A graph model of data-driven workflows: pure dataflows
$W = \langle N, E \rangle$
Operation and port nodes: $N = N_{op} \cup N_{p}$
Dataflow edges: $E = E_{opp} \cup E_{pp} \cup E_{pop}$
Motif annotations over operations
motifs(color_pathway_by_objects) = {m1:DataRetrieval}
motifs(Get_Image_From_URL_2) = {m2:DataMoving}
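The graph model and the motif annotations above can be encoded directly. In this minimal sketch, operation and port nodes are separate sets, edges are split into the three kinds Eopp, Epp and Epop, and `motifs` maps operations to their annotations. Reading the subscripts as operation-to-port, port-to-port and port-to-operation edges is our assumption; the port names and the link between the two operations are hypothetical, while the operation names and their motifs are taken from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Workflow:
    """W = <N, E> with N = Nop ∪ Np and E = Eopp ∪ Epp ∪ Epop."""
    ops: set[str] = field(default_factory=set)                 # Nop: operation nodes
    ports: set[str] = field(default_factory=set)               # Np: port nodes
    e_opp: set[tuple[str, str]] = field(default_factory=set)   # operation -> output port (assumed)
    e_pp: set[tuple[str, str]] = field(default_factory=set)    # output port -> input port (assumed)
    e_pop: set[tuple[str, str]] = field(default_factory=set)   # input port -> operation (assumed)
    motifs: dict[str, set[str]] = field(default_factory=dict)  # motif annotations over operations

    def nodes(self) -> set[str]:
        return self.ops | self.ports

    def edges(self) -> set[tuple[str, str]]:
        return self.e_opp | self.e_pp | self.e_pop

    def successors(self, op: str) -> set[str]:
        """Operations that consume data produced by `op`."""
        outs = {p for (o, p) in self.e_opp if o == op}
        ins = {q for (p, q) in self.e_pp if p in outs}
        return {o for (q, o) in self.e_pop if q in ins}


w = Workflow()
w.ops |= {"color_pathway_by_objects", "Get_Image_From_URL_2"}
w.ports |= {"color_pathway_by_objects.out", "Get_Image_From_URL_2.url"}  # hypothetical
w.e_opp.add(("color_pathway_by_objects", "color_pathway_by_objects.out"))
w.e_pp.add(("color_pathway_by_objects.out", "Get_Image_From_URL_2.url"))
w.e_pop.add(("Get_Image_From_URL_2.url", "Get_Image_From_URL_2"))
w.motifs["color_pathway_by_objects"] = {"DataRetrieval"}
w.motifs["Get_Image_From_URL_2"] = {"DataMoving"}

print(w.successors("color_pathway_by_objects"))  # {'Get_Image_From_URL_2'}
```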
PART-2: Workflow reduction primitives
• Collapse (Up/Down)
• Compose
• Eliminate
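The reduction primitives listed above (collapse up/down, compose, eliminate) can be sketched on a simplified, operation-only view of the graph, with ports folded into direct operation-to-operation edges. The semantics here are our reading of the primitive names, not the paper's formal definitions: eliminate drops an operation and reconnects its predecessors to its successors; collapse up and collapse down fold an operation into its predecessors or successors, which record it in `members`; compose replaces an adjacent pair with one composite node. `OpGraph`, `detach` and the `a+b` naming are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class OpGraph:
    """Operation-only view of a workflow (ports folded into direct edges)."""
    succ: dict[str, set[str]]  # operation -> successor operations
    members: dict[str, set[str]] = field(default_factory=dict)  # node -> original ops it stands for

    def __post_init__(self) -> None:
        for n in self.succ:
            self.members.setdefault(n, {n})

    def preds(self, op: str) -> set[str]:
        return {u for u, vs in self.succ.items() if op in vs}

    def detach(self, op: str) -> tuple[set[str], set[str]]:
        """Remove `op` and its edges; return its former (predecessors, successors)."""
        p = self.preds(op)
        s = self.succ.pop(op)
        for vs in self.succ.values():
            vs.discard(op)
        return p, s


def eliminate(g: OpGraph, op: str) -> None:
    """Drop `op`; predecessors are linked to successors so dependencies survive."""
    g.members.pop(op)
    p, s = g.detach(op)
    for u in p:
        g.succ[u] |= s


def collapse_up(g: OpGraph, op: str) -> None:
    """Fold `op` into its predecessors, which take over its outgoing links."""
    absorbed = g.members.pop(op)
    p, s = g.detach(op)
    for u in p:
        g.succ[u] |= s
        g.members[u] |= absorbed


def collapse_down(g: OpGraph, op: str) -> None:
    """Fold `op` into its successors, which take over its incoming links."""
    absorbed = g.members.pop(op)
    p, s = g.detach(op)
    for u in p:
        g.succ[u] |= s
    for v in s:
        g.members[v] |= absorbed


def compose(g: OpGraph, a: str, b: str) -> str:
    """Replace the adjacent pair a -> b with a single composite operation."""
    if b not in g.succ[a]:
        raise ValueError("compose expects a direct link a -> b")
    new = f"{a}+{b}"
    absorbed = g.members.pop(a) | g.members.pop(b)
    pa, sa = g.detach(a)
    pb, sb = g.detach(b)
    g.succ[new] = (sa | sb) - {a, b}
    for u in (pa | pb) - {a, b}:
        g.succ[u].add(new)
    g.members[new] = absorbed
    return new


g = OpGraph({"fetch": {"reformat"}, "reformat": {"analyse"}, "analyse": {"plot"}, "plot": set()})
collapse_down(g, "reformat")         # the format step is folded into "analyse"
print(g.succ)                        # {'fetch': {'analyse'}, 'analyse': {'plot'}, 'plot': set()}
print(sorted(g.members["analyse"]))  # ['analyse', 'reformat']
```

In this operation-only view, eliminate and the two collapses rewire edges identically; they differ only in whether the removed step is forgotten or kept as part of a neighbour. `compose` performs no acyclicity check.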
How will rules be put to use?
• Strategies as a set of rules for summarization
• Two sample strategies based on an empirical analysis of workflows
• Reporting:
– Process: significant activities (Retrieval, Analysis, Visualization)
– Data:
• Reduced cardinality
• Stripped of protocol-specific payload/formatting
Two sample strategies
• By-Elimination
– Minimal annotation effort
– Single rule
• By-Collapse
– More specific annotation
– Multiple rules
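The contrast between the two strategies can be sketched on an operation-only graph. Under our assumptions, By-Elimination is the single rule "IF an operation has no significant motif THEN eliminate it", so only significant operations need annotating, while By-Collapse uses several motif-specific rules that fold an operation into a neighbour. The motif names DataAnalysis, DataVisualization and FormatTransformation, the rule list `COLLAPSE_RULES`, and all function and operation names are hypothetical; only DataRetrieval and DataMoving appear on the slides.

```python
Graph = dict[str, set[str]]  # operation -> successor operations (ports folded away)

# Hypothetical motif vocabulary and rules.
SIGNIFICANT = {"DataRetrieval", "DataAnalysis", "DataVisualization"}
COLLAPSE_RULES = [("DataMoving", "up"), ("FormatTransformation", "down")]


def remove(g: Graph, op: str) -> Graph:
    """Return a copy of g without `op`, its predecessors linked to its successors."""
    out = {u: set(vs) - {op} for u, vs in g.items() if u != op}
    for u, vs in g.items():
        if op in vs and u != op:
            out[u] |= g[op]
    return out


def by_elimination(g: Graph, motifs: dict[str, set[str]]) -> Graph:
    """Single rule: IF an operation has no significant motif THEN eliminate it."""
    for op in list(g):
        if not (motifs.get(op, set()) & SIGNIFICANT):
            g = remove(g, op)
    return g


def by_collapse(g: Graph, motifs: dict[str, set[str]]) -> tuple[Graph, dict[str, set[str]]]:
    """Several motif-specific rules, each folding an operation into a neighbour."""
    members = {op: {op} for op in g}  # what each summary node stands for
    for op in list(g):
        direction = next((d for m, d in COLLAPSE_RULES if m in motifs.get(op, set())), None)
        if direction is None:
            continue  # no rule fires: keep the operation
        if direction == "up":
            targets = {u for u, vs in g.items() if op in vs}
        else:
            targets = set(g[op])
        if not targets:
            continue  # nothing to collapse into: keep the operation
        for t in targets:
            members[t] |= members[op]
        del members[op]
        g = remove(g, op)
    return g, members


wf = {"fetch_pathway": {"move_image"}, "move_image": {"to_png"},
      "to_png": {"show"}, "show": set()}
ann = {"fetch_pathway": {"DataRetrieval"}, "move_image": {"DataMoving"},
       "to_png": {"FormatTransformation"}, "show": {"DataVisualization"}}
print(by_elimination(wf, ann))  # {'fetch_pathway': {'show'}, 'show': set()}
summary, members = by_collapse(wf, ann)
print(summary)                  # {'fetch_pathway': {'show'}, 'show': set()}
```

Both strategies yield the same two-node graph here, but By-Collapse also records that `move_image` and `to_png` were folded into `fetch_pathway` and `show`; that extra bookkeeping is what the more specific annotations pay for.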
Overall Approach
[Figure: architecture with Workflow Designer, Taverna Workbench, WF Description, Motif Ontology, Summarization Rules, Summarizer and WF Summary]
Analysis Data Set
• 30 workflows from the Taverna system
• Entire dataset & queries accessible from http://www.myexperiment.org/packs/467.html
• Manual annotation using the motif vocabulary
Summaries at a glance
[Figure panels: By-Collapse, By-Elimination]
• Causal ordering of operations
• Reduced depth
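The two properties listed above can be checked mechanically for any original/summary pair: depth (operations on the longest path) should shrink, and any two operations kept in the summary should reach one another exactly when they did in the original. This is a minimal sketch on operation-only adjacency dicts; the example graphs are hypothetical and are not taken from the analysed dataset. Composite nodes that do not exist in the original are simply skipped by the ordering check.

```python
Graph = dict[str, set[str]]  # operation -> successor operations


def depth(g: Graph) -> int:
    """Number of operations on the longest path of a DAG."""
    memo: dict[str, int] = {}

    def longest_from(n: str) -> int:
        if n not in memo:
            memo[n] = 1 + max((longest_from(s) for s in g[n]), default=0)
        return memo[n]

    return max((longest_from(n) for n in g), default=0)


def reaches(g: Graph, a: str, b: str) -> bool:
    """True if there is a non-empty path a -> ... -> b."""
    stack, seen = list(g[a]), set()
    while stack:
        n = stack.pop()
        if n == b:
            return True
        if n not in seen:
            seen.add(n)
            stack.extend(g[n])
    return False


def preserves_causal_order(original: Graph, summary: Graph) -> bool:
    """Operations kept in the summary must be ordered as in the original."""
    kept = [n for n in summary if n in original]
    return all(reaches(original, a, b) == reaches(summary, a, b)
               for a in kept for b in kept if a != b)


original = {"fetch": {"move"}, "move": {"to_png"}, "to_png": {"show"}, "show": set()}
summary = {"fetch": {"show"}, "show": set()}
print(depth(original), depth(summary))            # 4 2
print(preserves_causal_order(original, summary))  # True
```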
Mechanistic Effect of Summarization
[Figure: By-Collapse vs. By-Elimination]
User Summaries vs. Summary Graphs
Related Work
• User views over provenance (O. Biton et al.)
– User-specified significant operations
– Automatic partitioning of the workflow graph
• Provenance redaction (T. Cadenhead et al.)
– Redaction primitives
– Graph queries with regular expressions
• Provenance publishing (S. C. Dey et al.)
– User policies on publishing (hide, retain)
– Consistency checks
Highlights
• Re-writing workflow graphs with rules
• Exploiting semantic annotations of operations
• Controlled, primitive-based re-writing
– Preserve acyclicity
• Users indirectly control the summarization
– Encoding their preferences as summary rules
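Why controlled re-writing has to guard acyclicity: merging two operations that are linked both directly and through a third operation turns that indirect path into a cycle. The sketch below merges two nodes naively and tests the result with a three-colour depth-first cycle check; a controlled rewriter would refuse the unsafe merge. The graph and the function names `merge`, `has_cycle` and `safe_to_compose` are hypothetical illustrations, not part of the authors' system.

```python
Graph = dict[str, set[str]]  # operation -> successor operations


def merge(g: Graph, a: str, b: str) -> Graph:
    """Naively replace a and b by one node 'a+b' (no safety check)."""
    new = f"{a}+{b}"
    out: Graph = {}
    for u, vs in g.items():
        if u in (a, b):
            continue
        out[u] = {new if v in (a, b) else v for v in vs}
    out[new] = (g[a] | g[b]) - {a, b}
    return out


def has_cycle(g: Graph) -> bool:
    """Three-colour depth-first search: a grey successor is a back edge."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in g}

    def visit(n: str) -> bool:
        colour[n] = GREY
        for s in g[n]:
            if colour[s] == GREY or (colour[s] == WHITE and visit(s)):
                return True
        colour[n] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in g)


def safe_to_compose(g: Graph, a: str, b: str) -> bool:
    """Composition is safe only if the merged graph stays acyclic."""
    return not has_cycle(merge(g, a, b))


g = {"a": {"b", "c"}, "b": {"c"}, "c": set()}
print(safe_to_compose(g, "a", "b"))  # True: the merged node only feeds c
print(safe_to_compose(g, "a", "c"))  # False: a -> b -> c becomes a+c -> b -> a+c
```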
Future Work
• Querying of workflow execution provenance using summaries
Thank you!
Carole A. Goble, University of Manchester
Khalid Belhajjame, University of Manchester
Pinar Karagoz, Middle East Technical University
Pinar Alper, University of Manchester
Bibliography
D. Garijo, P. Alper, K. Belhajjame, O. Corcho, C. Goble, and Y. Gil. Common motifs in scientific workflows: An empirical analysis. In Proceedings of the IEEE eScience Conference, 2012.
P. Alper, K. Belhajjame, C. A. Goble, and P. Senkul. Enhancing and abstracting scientific workflow provenance for data publishing. Submitted for publication to BIGProv 2013 International Workshop on Managing and Querying Provenance Data at Scale, co-located with EDBT 2013.
O. Biton et al. Querying and managing provenance through user views in scientific workflows. In 2008 IEEE 24th International Conference on Data Engineering, pages 1072–1081, Apr. 2008.
J. Cheney et al. Provenance in databases: Why, how, and where. Foundations and Trends in Databases, 1(4):379–474, 2009.
D. Hull et al. Treating shimantic web syndrome with ontologies. In AKT Workshop on Semantic Web Services, 2004.
S. C. Dey, D. Zinn, and B. Ludäscher. ProPub: Towards a declarative approach for publishing customized, policy-aware provenance. In Proceedings of the 23rd International Conference on Scientific and Statistical Database Management, SSDBM'11, pages 225–243, Berlin, Heidelberg, 2011. Springer-Verlag.
Y. Gil and S. Miles, editors. The PROV Model Primer. http://www.w3.org/TR/prov-primer/
T. Cadenhead, V. Khadilkar, M. Kantarcioglu, and B. Thuraisingham. Transforming provenance using redaction. In Proceedings of the 16th ACM Symposium on Access Control Models and Technologies, SACMAT '11, pages 93–102, New York, NY, USA, 2011. ACM.
Taverna: Open source and domain-independent workflow management system. http://www.taverna.org.uk/