ISWC 2005, Galway
Seven Bottlenecks to Workflow Reuse and Repurposing
Antoon Goderis
Ulrike Sattler
Phillip Lord
Carole Goble
University of Manchester
Take home message
• New problem: workflow reuse and repurposing is happening; how do we make it scale?
• Data: Survey of 6 e-Science middleware projects
• Requirements analysis: 7 bottlenecks
– Creating a pool of process knowledge
– Accessing this pool
e-Science
• Support sharing and col-laboratories in science
• The world of distributed web services
– A boom in services: e.g. 1800+ bio services in the myGrid project
• Pulled together as in silico experiments
– Scientist-friendly workflow languages
– Hard to build (>1 year!)
– A boom in workflows? 100 workflows in myGrid, up to 50 services
Evolving e-Science to a Web of Science?
• In silico experiments as commodities and know-how
• Share, reuse, repurpose
– authoring time, quality and provenance collection
[Figure labels: Manchester, CS; Manchester, Biology; Newcastle, CS]
[Figure: Workflow by example. Activities: discover existing work; edit workflow (repurposing actions); try out workflow; deploy workflow; register and annotate workflow and new services for reuse; maintain reuse/repurpose history. Actors: scientists & developers; scientists; 3rd party annotation providers.]
Wroe, Goble, Goderis, Lord et al. Recycling workflows and services through discovery and reuse. CCPE 2005
Analyze This
Analyze This × #scientists × #workflows × #versions × #runs
Workflow | Web service
Describes process | Describes process
Different workflow languages: BPEL, Scufl etc. | SOAP/WSDL interface
Orchestration/choreography of Web and web services | Participant in a workflow
Executable with workflow enactor | Executable
Can be published as a web or Web service |
Workflow reuse | Web service reuse
Reuse of editable processes | Reuse of encapsulated processes
Repurpose / build on other people’s work | Incorporate other people’s work
Hackable; change data/control flow | Parametrisable operations
Discovery based on data/control flow | Discovery based on WSDL operations
Measures of aggregated task similarity and flow similarity | Measures of task similarity
Repurposing, discovery and composition
• Discovery
– The process of finding, ranking and selecting existing resources
• Composition
– The process of combining resources into a new working assembly
– (auto-) discovery + (auto-) integration
• Repurposing
– Auto discovery + manual integration
– Need techniques for composition-oriented discovery
• Discovery supporting integration through rankings
A field report of six projects
• www.myGrid.org.uk
– reuse by collaborators
– personal reuse (versioning)
• www.kepler-project.org
– 10 complex workflows
– reuse of distributed execution models
• www.inforsense.com
– intranet exchanges within large pharmas
• www.geodise.org
– 150 Matlab functions, 10 scripts
– reuse of function combinations
No support for comparing workflows! No third party reuse!
7 bottlenecks to reuse & repurposing
Service availability
Workflow interoperability
Workflow rigidity
Discovery model
Process KA
IP rights
Ranking
We are here
Step 1: Collect as many workflows as possible
Step 2: Make this collection usable
e-Science community
Semantic Web community?
Wanted: technology providers
The bottlenecks, in more detail
1. Service availability
– Web services: Kepler actors, myGrid processors, Inforsense services
– Local services: Web enable, encode, repository
2. Intellectual property rights
– Anonymization; journal policies
3. Workflow rigidity
– Evolution and adaptation: parametrisation
4 The nice thing about workflow standards…
• Workflow languages abound
• Out of 6 projects, 5 do not use BPEL
• Behavioural semantics left implicit, as a feature
• Repurposing in case of multiple workflow systems
– outside system boundaries
– and across
[Figure: Benesh notation and Labanotation]
4 The nice thing about workflow standards…
• Bring out the behavioural semantics
– Comparing 3 projects through workflow patterns
• E.g. simple merge
– Scientific workflows use functional programming patterns
– How do these combine into different distributed execution models?
– WSMO/SWSI/OWL-S?
5 What belongs in the discovery model?
• How to retrieve existing scientific workflows?
– Scientists & developers facing distributed programs
• For scientists? Data flow discovery, in jargon, largely abstracting from control
• For developers? Control flow discovery, largely abstracting from data
– Workflow patterns, Kepler distributed execution models
• Process networks, process algebra, Petri nets…
[Figure: example sequence data, ACAAGATGCCATTGT]
5 What belongs in the discovery model?
• For scientists
– WSMO Capability and OWL-S Profile clearly not intended for data flow-based queries
– OWL DL: A-Box based workflow queries [Goderis+DL’05]
• For developers
– Workflow patterns, Kepler distributed execution models
• Pattern example based retrieval
• An early table of combined execution models
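For the scientist-facing case, a data flow query can be pictured as asking which stored workflows turn one kind of data into another, regardless of control constructs. The following is a minimal sketch, assuming each workflow is stored as a list of (input type, service, output type) steps. The type names, service names and workflow names are hypothetical, and the plain graph search is a stand-in for illustration, not the OWL DL A-Box approach cited on the slide.

```python
from collections import deque


def transforms(steps, source, target):
    """Return True if the data links in `steps` lead from `source` to `target`."""
    # Build a graph of data types; the service name is ignored on purpose,
    # since this query abstracts from everything except data flow.
    graph = {}
    for in_type, _service, out_type in steps:
        graph.setdefault(in_type, set()).add(out_type)

    # Breadth-first search over data types.
    seen, queue = {source}, deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            return True
        for nxt in graph.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def discover(pool, source, target):
    """Return the names of workflows in `pool` that map `source` data to `target` data."""
    return sorted(name for name, steps in pool.items()
                  if transforms(steps, source, target))


# Hypothetical workflow pool (assumption, not taken from the survey).
pool = {
    "annotate_gene": [
        ("dna_sequence", "translate", "protein_sequence"),
        ("protein_sequence", "predict_domains", "domain_report"),
    ],
    "align_only": [
        ("dna_sequence", "align", "alignment"),
    ],
}
hits = discover(pool, "dna_sequence", "domain_report")
```

A developer-facing query would instead match on control flow constructs (workflow patterns, execution models), which this sketch deliberately leaves out.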
6 New challenges in Knowledge Acquisition
• Who does the annotation?
• What should be in the annotation?
– Workflow fragments
• Task aggregation/prediction
• “Service decomposition”
– The things that went wrong!
6 New challenges in Knowledge Acquisition
• Who does the annotation?
– Updated service ontology learning and automated service annotation techniques
• What should be in the annotation?
– Workflow fragments
• “Service decomposition”
– Cutting up service webs
» Social network analysis (services as users!)
– The things that went wrong
• Web site usability mining
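One reading of "cutting up service webs" is to look for services that repeatedly appear together across a pool of workflows, since such groups are candidate fragments to annotate. The following is a minimal sketch, assuming each workflow is just a list of service names. Co-occurrence counting is used here as a simple stand-in for social network analysis, and all workflow and service names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_occurring_pairs(workflows, min_support=2):
    """Return service pairs that appear together in at least `min_support` workflows."""
    counts = Counter()
    for services in workflows.values():
        # Sort and de-duplicate so each unordered pair is counted once per workflow.
        for pair in combinations(sorted(set(services)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical workflow pool (assumption, not taken from the survey).
workflows = {
    "wf1": ["fetch_sequence", "blast", "render_report"],
    "wf2": ["fetch_sequence", "blast", "align"],
    "wf3": ["fetch_sequence", "render_report"],
}
fragments = co_occurring_pairs(workflows)
```

Pairs that clear the support threshold could be offered to annotators as candidate workflow fragments; the threshold itself is an arbitrary choice in this sketch.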
7 Ranking workflow relevance
• Repurposing: measuring integration effort
• Ranking data flow (in jargon)
– Structural edit distance
– E.g. services to remove/add/replace to make 2 workflows equal
– For OWL workflow ontology, need abduction or off-line processing
• Ranking control flow
– Relationship between control flow constructs
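The data flow ranking idea (count the services to remove, add or replace to make two workflows equal) can be sketched as a set-based edit count. This is a minimal sketch, assuming each workflow is reduced to a set of service names. It ignores the links between services, so it is only a rough proxy for a structural edit distance, and the workflow and service names are hypothetical.

```python
def service_edit_distance(query, candidate):
    """Count service edits that turn `query` into `candidate`.

    A removal paired with an addition is counted as one replacement.
    """
    to_remove = query - candidate
    to_add = candidate - query
    return max(len(to_remove), len(to_add))


def rank_by_effort(query, pool):
    """Order workflow names in `pool` by ascending edit distance to `query`."""
    return sorted(
        pool,
        key=lambda name: (service_edit_distance(query, pool[name]), name),
    )


# Hypothetical workflows, each reduced to a set of service names (assumption).
query = {"fetch_sequence", "blast", "render_report"}
pool = {
    "wf_same": {"fetch_sequence", "blast", "render_report"},
    "wf_swap": {"fetch_sequence", "fasta_search", "render_report"},
    "wf_short": {"fetch_sequence"},
}
ranking = rank_by_effort(query, pool)
```

A lower distance suggests less manual integration effort when repurposing. Ranking on control flow would need a separate measure over control constructs, which is not attempted here.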
Take home message
• Problem: Workflow reuse and repurposing is happening, how do we make it scale?
• Data: Survey of 6 e-Science middleware projects
• Requirements analysis: 7 bottlenecks
– Creating a pool of process knowledge
• Workflow interoperability
– Accessing this pool of knowledge
• Workflow discovery, KA and ranking
Acknowledgements
• This work is supported by the UK e-Science programme EPSRC GR/R67743.
• The authors would like to acknowledge the myGrid team. Hannah Tipney developed the Williams’ syndrome workflow and is supported by The Wellcome Foundation (G/R:1061183). We thank the survey interviewees for their contribution: Chris Wroe, Mark Greenwood and Peter Li (myGrid), Ilkay Altintas (Kepler), Vasa Curcin (InforSense), Ian Wang (Triana), Colin Puleston (Geodise) and Ben Butchart (Sedna).
• Sean Bechhofer provided useful comments on the draft.