ISWC 2005, Galway Seven Bottlenecks to Workflow Reuse and Repurposing Antoon Goderis Ulrike Sattler Phillip Lord Carole Goble University of Manchester


Page 1:

ISWC 2005, Galway

Seven Bottlenecks to Workflow Reuse and Repurposing

Antoon Goderis

Ulrike Sattler

Phillip Lord

Carole Goble

University of Manchester

Page 2:

Take home message

• New problem
  – Workflow reuse and repurposing is happening; how do we make it scale?
• Data: Survey of 6 e-Science middleware projects
• Requirements analysis: 7 bottlenecks
  – Creating a pool of process knowledge
  – Accessing this pool

Page 3:

e-Science

• Support sharing and collaboratories in science
• The world of distributed web services
  – A boom in services: e.g. 1800+ bio services in the myGrid project
• Pulled together as in silico experiments
  – Scientist-friendly workflow languages
  – Hard to build (>1 year!)
  – A boom in workflows? 100 workflows in myGrid, up to 50 services

Page 5:

[Diagram of the workflow reuse process; labels as extracted]

Actors: Scientists & developers; 3rd party annotation providers; Scientists

Activities:
• Discover existing work
• Edit workflow (repurposing actions)
• Try out workflow
• Register and annotate workflow and new services for reuse
• Deploy workflow
• Maintain reuse/repurpose history

Other label: Workflow by example

Wroe, Goble, Goderis, Lord et al. Recycling workflows and services through discovery and reuse. CCPE 2005

Page 6:

Analyze This

Page 7:

Analyze This × #scientists × #workflows × #versions × #runs

Page 8:

Workflow | Web service
Describes process | Describes process
Different workflow languages: BPEL, Scufl etc. | SOAP/WSDL interface
Orchestration/choreography of Web and web services | Participant in a workflow
Executable with workflow enactor | Executable
Can be published as a web or Web service |

Page 9:

Workflow reuse | Web service reuse
Reuse of editable processes | Reuse of encapsulated processes
Repurpose / build on other people’s work | Incorporate other people’s work
Hackable; change data/control flow | Parametrisable operations
Discovery based on data/control flow | Discovery based on WSDL operations
Measures of aggregated task similarity and flow similarity | Measures of task similarity
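
The contrast between task similarity and flow similarity can be sketched in a few lines. This is only an illustration, not a measure proposed in the talk: it assumes a workflow can be reduced to a set of directed data links between named services, and it uses Jaccard overlap as a stand-in for both measures. The service names and the `alpha` weight are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets; two empty sets count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def tasks(links):
    """Collect every service that appears at either end of a link."""
    return {service for link in links for service in link}


def workflow_similarity(links_a, links_b, alpha=0.5):
    """Weighted mix of aggregated task similarity and flow similarity."""
    task_sim = jaccard(tasks(links_a), tasks(links_b))  # which services are shared
    flow_sim = jaccard(links_a, links_b)                # which links are shared
    return alpha * task_sim + (1 - alpha) * flow_sim


# Hypothetical workflows, each a set of (producer, consumer) data links.
wf_a = {("fetch_sequence", "blast"), ("blast", "render_report")}
wf_b = {("fetch_sequence", "blast"), ("blast", "align")}

print(workflow_similarity(wf_a, wf_b))
```

Under these assumptions, comparing two Web services would use only the task term, while comparing two workflows also rewards shared links between services.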

Page 10:

Repurposing, discovery and composition

• Discovery
  – The process of finding, ranking and selecting existing resources
• Composition
  – The process of combining resources into a new working assembly
  – (auto-) discovery + (auto-) integration
• Repurposing
  – Auto discovery + manual integration
  – Need techniques for composition-oriented discovery
    • Discovery supporting integration through rankings

Page 11:

A field report of six projects

• www.myGrid.org.uk
  – reuse by collaborators
  – personal reuse (versioning)
• www.kepler-project.org
  – 10 complex workflows
  – reuse of distributed execution models
• www.inforsense.com
  – intranet exchanges within large pharmas
• www.geodise.org
  – 150 Matlab functions, 10 scripts
  – reuse of function combinations

Page 12:

A field report of six projects


No support for comparing workflows!
No third party reuse!

Page 13:

7 bottlenecks to reuse & repurposing

Service availability

Workflow interoperability

Workflow rigidity

Discovery model

Process KA

IP rights

Ranking

We are here

Page 14:

Step 1: Collect as many workflows as possible

Ranking

Service availability

Workflow interoperability

Workflow rigidity

Discovery model

Process KA

IP rights

Page 15:

Ranking

Service availability

Workflow interoperability

Workflow rigidity

Discovery model

Process KA

IP rights

Step 2: Make this collection usable

Page 16:

Ranking

Service availability

Workflow interoperability

Workflow rigidity

Discovery model

Process KA

IP rights

e-Science community

Semantic Web community?

Wanted: technology providers

Page 17:

The bottlenecks, in more detail

1. Service availability
  – Web services: Kepler actors, myGrid processors, InforSense services
  – Local services: Web enable, encode, repository

2. Intellectual property rights
  – Anonymization; journal policies

3. Workflow rigidity
  – Evolution and adaptation: parametrisation

Page 18:

4 The nice thing about workflow standards…

• Workflow languages abound
• Out of 6 projects, 5 do not use BPEL
• Behavioural semantics left implicit, as a feature
• Repurposing in case of multiple workflow systems
  – outside system boundaries
  – and across

[Images: Benesh notation; Labanotation]

Page 19:

4 The nice thing about workflow standards…

• Bring out the behavioural semantics
  – Comparing 3 projects through workflow patterns
    • E.g. simple merge
  – Scientific workflows use functional programming patterns
  – How do these combine into different distributed execution models?
  – WSMO/SWSI/OWL-S?

Page 20:

5 What belongs in the discovery model?

• How to retrieve existing scientific workflows?
  – Scientists & developers facing distributed programs
• For scientists? Data flow discovery, in jargon, largely abstracting from control
• For developers? Control flow discovery, largely abstracting from data
  – Workflow patterns, Kepler distributed execution models
    • Process networks, process algebra, Petri nets…

[Figure: sequence ACAAGATGCCATTGT = ?]

Page 21:

5 What belongs in the discovery model?

• For scientists
  – WSMO Capability and OWL-S Profile clearly not intended for data flow-based queries
  – OWL DL: A-Box based workflow queries [Goderis+DL’05]
• For developers
  – Workflow patterns, Kepler distributed execution models
    • Pattern example based retrieval
    • An early table of combined execution models

Page 22:

6 New challenges in Knowledge Acquisition

• Who does the annotation?
• What should be in the annotation?
  – Workflow fragments
    • Task aggregation/prediction
    • “Service decomposition”
  – The things that went wrong!

Page 23:

6 New challenges in Knowledge Acquisition

• Who does the annotation?
  – Updated service ontology learning and automated service annotation techniques
• What should be in the annotation?
  – Workflow fragments
    • “Service decomposition”
      – Cutting up service webs
        » Social network analysis (services as users!)
  – The things that went wrong
    • Web site usability mining

Page 24:

7 Ranking workflow relevance

• Repurposing: measuring integration effort
• Ranking data flow (in jargon)
  – Structural edit distance
  – E.g. services to remove/add/replace to make 2 workflows equal
  – For OWL workflow ontology, need abduction or off-line processing
• Ranking control flow
  – Relationship between control flow constructs
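
The edit-distance idea above (count the services to remove, add or replace until two workflows are equal) can be sketched as follows. This is a simplified illustration under a strong assumption: each workflow is a linear pipeline, so the classic sequence edit distance applies. Real workflows are graphs, where the corresponding measure is harder to compute. The function names, workflow names and service names are hypothetical.

```python
def service_edit_distance(source, target):
    """Minimum number of service removals, additions and replacements
    needed to turn the `source` pipeline into the `target` pipeline."""
    # previous[j] is the distance between the processed prefix of
    # `source` and the first j services of `target`.
    previous = list(range(len(target) + 1))
    for i, src in enumerate(source, start=1):
        current = [i]
        for j, tgt in enumerate(target, start=1):
            replace_cost = 0 if src == tgt else 1
            current.append(min(
                previous[j] + 1,                 # remove src
                current[j - 1] + 1,              # add tgt
                previous[j - 1] + replace_cost,  # keep or replace
            ))
        previous = current
    return previous[-1]


def rank_by_effort(query, candidates):
    """Order candidate workflow names by estimated integration effort."""
    return sorted(
        candidates,
        key=lambda name: service_edit_distance(candidates[name], query),
    )


# Hypothetical repository of linear workflows.
query = ["fetch_sequence", "blast", "render_report"]
candidates = {
    "wf_1": ["fetch_sequence", "blast", "align", "render_report"],
    "wf_2": ["fetch_sequence", "clustal"],
    "wf_3": ["fetch_sequence", "blast", "render_report"],
}
ranking = rank_by_effort(query, candidates)
print(ranking)
```

A lower distance stands for less manual integration work, so the ranking puts the workflows that are cheapest to repurpose first.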

Page 25:

Take home message

• Problem: Workflow reuse and repurposing is happening; how do we make it scale?
• Data: Survey of 6 e-Science middleware projects
• Requirements analysis: 7 bottlenecks
  – Creating a pool of process knowledge
    • Workflow interoperability
  – Accessing this pool of knowledge
    • Workflow discovery, KA and ranking

Page 26:

Acknowledgements

• This work is supported by the UK e-Science programme EPSRC GR/R67743.

• The authors would like to acknowledge the myGrid team. Hannah Tipney developed the Williams’ syndrome workflow and is supported by The Wellcome Foundation (G/R:1061183). We thank the survey interviewees for their contribution: Chris Wroe, Mark Greenwood and Peter Li (myGrid), Ilkay Altintas (Kepler), Vasa Curcin (InforSense), Ian Wang (Triana), Colin Puleston (Geodise) and Ben Butchart (Sedna).

• Sean Bechhofer provided useful comments on the draft.