CHESS seminar July 2005 Promoting reuse and repurposing on the Semantic Grid Antoon Goderis...
CHESS seminar July 2005
Promoting reuse and repurposing on the Semantic Grid
Antoon Goderis
University of Manchester, UK
CHESS seminar, 19 July 2005
Talk plan
• The grid
• The semantic grid
• Reuse and repurposing
• 7 bottlenecks to repurposing
• Semantics to the rescue
The Grid
1. Pervasive and dependable computing utility
2. A distributed computing infrastructure for advanced science and engineering
3. Coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organisations
Science in the 21st century
• Huge quantities of data
• Huge number of data collection devices
• Analysis is the bottleneck
• Global distributed science
– Collaboration and sharing the norm
• In silico experiments
– Build, reuse, repurpose on-line concurrent processes (workflows)
114 genomes, 735 in progress
Grid application evolution
Large scale data, large number of machines, expensive computation, simple semantics, small numbers of people
Smaller scale data, less computationally intensive, complex heterogeneous applications, complex semantics, many people
High Energy Physics
Functional Genomics, Oceanography, Biodiversity, Earth Science, Neuroscience
The Semantic Grid
• The Grid has been about large scale computation
• But the applications are also about collaboration
• A gap between grid computing endeavours and the vision of Grid computing
• To support the full richness of the vision we need both grid and semantic web (technologies)
• Knowledge explicitly asserted & explicitly used
[Figure: Classical Web, Classical Grid, Semantic Web and Semantic Grid placed along two axes, "Richer semantics" and "More computation"]
Source: Norman Paton
Semantics in Grid workflows
• Classification and discovery of computational and data resources; provenance trails
• Declarative specification of services, workflows and their requirements; problem solving selection
• Job control, distributed execution models, semantic integration, resource brokering, resource scheduling
• Encoding performance metrics, service state, event notification topics, access rights to databases, personal profiles and security groupings; charging infrastructure
From building workflows to recycling them
• Reuse of workflows
– Best practice
– Training
– Peer review
• Repurposing
– Adapt and extend useful fragments
– Build on best practice
– Across groups / communities
Analyze This x #scientists x #workflows x #versions x #runs
Bridging user information need and workflow descriptions
Network effects!
Reuse and repurposing
• A user will reuse a workflow or workflow fragment that fits their purpose and could be customised with different parameter settings or data inputs to solve their particular scientific problem.
– A piece of an experimental description that is a coherent sub-workflow that makes sense to a domain specialist (in Ptolemy, a composite actor)
– A snippet of workflow code + annotation
• A user will repurpose a workflow or workflow fragment by
1. finding one that is close enough to be the basis of a new workflow for a different purpose and
2. making small changes to its structure to fit it to its new purpose.
Aiming for automated discovery of ranked fragments
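The distinction between reuse and repurposing can be made concrete with a small sketch. The dictionary layout, the step names and both helper functions are hypothetical illustrations, not part of any workflow system named in this talk: reuse keeps the structure and changes only parameter settings, while repurposing makes a small structural change.

```python
from copy import deepcopy

# Hypothetical fragment: an ordered list of steps plus parameter settings.
BASE_FRAGMENT = {
    "steps": ["fetch_sequences", "align_sequences", "plot_alignment"],
    "params": {"database": "db_a", "max_hits": 10},
}


def reuse(fragment, **new_params):
    """Reuse: same structure, different parameter settings or data inputs."""
    result = deepcopy(fragment)
    result["params"].update(new_params)
    return result


def repurpose(fragment, old_step, new_step):
    """Repurpose: a small structural change, here swapping one step for another."""
    result = deepcopy(fragment)
    result["steps"] = [new_step if s == old_step else s for s in result["steps"]]
    return result


reused = reuse(BASE_FRAGMENT, database="db_b")
repurposed = repurpose(BASE_FRAGMENT, "plot_alignment", "annotate_genes")
print(reused["steps"] == BASE_FRAGMENT["steps"])
print(repurposed["steps"])
```

Both helpers copy the fragment first, so the original stays available for other users.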
7 bottlenecks to workflow repurposing
1. Lack of a comprehensive discovery model
2. Process knowledge acquisition bottleneck
3. Lack of workflow fragment rankings
4. Workflow interoperability
5. Restrictions on service availability
6. Rigidity of service and workflow definitions
7. Intellectual property rights on workflows
Collect enough workflows
Make workflows usable
A comprehensive discovery model
• A user will repurpose a workflow or workflow fragment by
1. finding one that is close enough to be the basis of a new workflow for a different purpose and
2. making small changes to its structure to fit it to its new purpose.
• Based on semantic annotation, find a set of workflows, which people can then edit
– For scientists: data flow based queries in their jargon, largely abstracting from control
– For developers: control flow based queries, largely abstracting from data
Kepler: http://kepler.ecoinformatics.org/
Courtesy Bertram Ludaescher
A comprehensive discovery model
• Scientist queries
– Find all processes where sequence alignment is followed by visualisation
– Given a set of data points, services, or fragments, have these been connected up in an existing base of workflows? Alternatives?
– Show me the provenance of this workflow
• Developer queries
– How have people applied this dataflow execution model (e.g. in Ptolemy, an SDF Director)?
– How can it be combined with other execution models?
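As a rough illustration of the scientist query "find all processes where sequence alignment is followed by visualisation", the sketch below treats each workflow as a directed graph whose steps carry one task annotation. The workflow base, the step identifiers, the annotation labels and the function names are all invented for this example; a real system would draw the annotations from an ontology rather than plain strings.

```python
from collections import deque

# Hypothetical workflow base: steps with task annotations and data links.
WORKFLOWS = {
    "wf_align_view": {
        "tasks": {"s1": "sequence_retrieval", "s2": "sequence_alignment",
                  "s3": "format_conversion", "s4": "visualisation"},
        "links": [("s1", "s2"), ("s2", "s3"), ("s3", "s4")],
    },
    "wf_view_only": {
        "tasks": {"t1": "sequence_retrieval", "t2": "visualisation"},
        "links": [("t1", "t2")],
    },
    "wf_align_store": {
        "tasks": {"u1": "sequence_alignment", "u2": "storage"},
        "links": [("u1", "u2")],
    },
}


def downstream(links, start):
    """Return every step reachable from start by following data links."""
    successors = {}
    for src, dst in links:
        successors.setdefault(src, []).append(dst)
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def followed_by(workflows, first_task, later_task):
    """Names of workflows where a first_task step feeds, directly or
    indirectly, a later_task step."""
    hits = []
    for name, wf in workflows.items():
        tasks = wf["tasks"]
        for step, task in tasks.items():
            if task != first_task:
                continue
            reachable = downstream(wf["links"], step)
            if any(tasks[d] == later_task for d in reachable):
                hits.append(name)
                break
    return sorted(hits)


print(followed_by(WORKFLOWS, "sequence_alignment", "visualisation"))
```

The query abstracts from control: it only asks whether data can flow from one kind of task to another, which is the data flow view the talk assigns to scientists.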
A comprehensive discovery model
• Challenges
– Libraries of (scientific) task based patterns
    • E.g. task semantics of gene annotation pipelines classified in OWL
– Libraries of design patterns for distributed behaviour
    • Identify how people build concurrent systems; how they choose (combinations of) execution semantics
    • A good start: workflow patterns for Petri Nets
– E.g. synchronizing merge and multi-merge
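The two workflow patterns named above differ in how often the step after the merge runs. The sketch below is an informal reading of those patterns, not a Petri net implementation: branch names and function names are made up, and timing is reduced to a list of branches that have completed so far.

```python
def multi_merge(completed_branches):
    """Multi-merge: the downstream step fires once for every incoming
    branch that completes, with no synchronisation."""
    return ["downstream after " + branch for branch in completed_branches]


def synchronizing_merge(active_branches, completed_branches):
    """Synchronizing merge: the downstream step fires once, and only
    when every branch that was activated has completed."""
    if not active_branches:
        return []
    if set(active_branches) <= set(completed_branches):
        return ["downstream after " + "+".join(sorted(active_branches))]
    return []


# Two branches were activated; compare the two merge behaviours.
print(multi_merge(["A", "B"]))
print(synchronizing_merge(["A", "B"], ["A"]))
print(synchronizing_merge(["A", "B"], ["A", "B"]))
```

Two workflows that look identical as graphs can therefore behave differently, which is why the behavioural pattern needs to be recorded alongside the task annotation.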
Workflow fragment rankings
• A user will repurpose a workflow or workflow fragment by
1. finding one that is close enough to be the basis of a new workflow for a different purpose and
2. making small changes to its structure to fit it to its new purpose.
• We need metrics for processes
– For scientists: ranking scientific relevance
– For developers:
    • compare processes based on the same execution semantics
    • compare different execution semantics
• Challenge: defining the metrics, and combining them into rankings
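The talk leaves the metrics open, so the following is only one possible shape for "defining the metrics and combining them into rankings". It assumes Jaccard overlap of task annotations as a stand-in for scientific relevance, an exact match on the execution model as a stand-in for the developer metric, and arbitrary weights; the fragment names and data are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two annotation sets, between 0.0 and 1.0."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_fragments(query, fragments, w_task=0.7, w_exec=0.3):
    """Order fragment names by a weighted sum of two illustrative metrics."""
    scored = []
    for name, frag in fragments.items():
        task_score = jaccard(query["tasks"], frag["tasks"])
        exec_score = 1.0 if frag["director"] == query["director"] else 0.0
        scored.append((w_task * task_score + w_exec * exec_score, name))
    # Highest score first; ties broken alphabetically for a stable order.
    return [name for _, name in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical fragments annotated with tasks and an execution model.
FRAGMENTS = {
    "frag_a": {"tasks": {"sequence_alignment", "visualisation"}, "director": "SDF"},
    "frag_b": {"tasks": {"sequence_alignment", "gene_annotation"}, "director": "PN"},
    "frag_c": {"tasks": {"image_processing"}, "director": "SDF"},
}
QUERY = {"tasks": {"sequence_alignment", "visualisation"}, "director": "SDF"}

print(rank_fragments(QUERY, FRAGMENTS))
```

With these weights a fragment that only shares the execution model can outrank one that shares a task, which shows how sensitive a ranking is to the way the metrics are combined.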
Workflow interoperability
• A user will repurpose a workflow or workflow fragment by
1. finding one that is close enough to be the basis of a new workflow for a different purpose and
2. making small changes to its structure to fit it to its new purpose.
• Workflows take a long time to build and get very large
• The nice thing about standards…
• Different workflow systems, different (implicit) semantics
• Import workflows across workflow environments
1. Manually redo it in your own
2. Wrapping
3. Auto-rewrite to new environment • eg
Workflow interoperability
• To inform interoperation, we need a layer of abstraction that captures behavioural semantics
• Many non-standardised formalisms out there
– Functional languages - one paradigm fits all?
– Petri nets
– Process algebras
– Finite State Machines
– All (hierarchical-) combinations of these
• Challenge:
– Behavioural design patterns to compare formalism classes, e.g. PN and SDF Director
Conclusions
• Grid = Semantic Grid
• Reuse <> repurposing
• Task and behavioural semantics both needed for repurposing
• Design patterns for distributed processes: a long road ahead
– Task semantics
– Behavioural semantics
EPSRC funded UK eScience Program Pilot Project
Many slides taken from Carole Goble
Core
• Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizaukaus, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Jan Humble, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Ian Roberts, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson, Jimi Worthington and Chris Wroe.
Users
• Simon Pearce and Claire Jennings, Institute of Human Genetics School of Clinical Medical Sciences, University of Newcastle, UK
• Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK
• Steve Kemp, Liverpool, UK
Postgraduates
• Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire
Industrial
• Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM)
• Robin McEntire (GSK)
Collaborators
• Keith Decker
References
• Publications on
– Home page: www.cs.man.ac.uk/~goderisa
– myGrid site: www.mygrid.org.uk