myGrid and Taverna: Now and in the Future
![Page 1: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/1.jpg)
myGrid and Taverna: Now and in the Future
Dr. K. Wolstencroft
University of Manchester
![Page 2: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/2.jpg)
Background
• myGrid: middleware components to support in silico experiments in biology
• Originally designed to support bioinformatics
chemoinformatics
health informatics
medical imaging
integrative biology
![Page 3: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/3.jpg)
History
EPSRC-funded UK e-Science Programme pilot project
![Page 4: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/4.jpg)
myGrid in OMII-UK
OGSA-DAI
myGrid
OMII Stack
March 2006
10 Developers
Dedicated design, implementation, testing and support team – moving towards production quality software
![Page 5: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/5.jpg)
Lots of Resources
NAR 2006 – over 850 databases
![Page 6: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/6.jpg)
The User Community
Bioinformatics is an open community
• Open access to data
• Open access to resources
• Open access to tools
• Open access to applications
Global in silico biological research
![Page 7: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/7.jpg)
The User Community Problems
• Everything is Distributed
– Data, Resources and Scientists
• Heterogeneous data
• Very few standards
– I/O formats, data representation, annotation
– Everything is a string!
Integration of data and interoperability of resources is difficult
![Page 8: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/8.jpg)
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE    116    116       BINDS PEP (BY SIMILARITY).
FT   CONFLICT    374    374       S -> A (IN REF. 3).
SQ   SEQUENCE   429 AA;  46016 MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
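Every field in a record like the one above has to be recovered by string handling. The following is a minimal sketch of that, assuming each line starts with a two-letter line code and that sequence lines after `SQ` are indented. The function name `parse_flat_record` and the shortened sample record are illustrative only and are not part of myGrid or Taverna.

```python
from collections import defaultdict


def parse_flat_record(text):
    """Group the lines of a Swiss-Prot-style flat-file record by line code.

    Assumes each line starts with a two-letter code (ID, DE, OS, ...) and
    that sequence lines following SQ start with whitespace.
    """
    fields = defaultdict(list)
    sequence_parts = []
    in_sequence = False
    for line in text.splitlines():
        if not line.strip():
            continue
        if in_sequence and line[0].isspace():
            # Sequence lines: drop the spaces between blocks of residues.
            sequence_parts.append("".join(line.split()))
            continue
        code, _, value = line.partition(" ")
        fields[code].append(value.strip())
        in_sequence = code == "SQ"
    record = {code: " ".join(values) for code, values in fields.items()}
    record["sequence"] = "".join(sequence_parts)
    return record


# Shortened sample based on the record shown above (sequence truncated).
SAMPLE = """ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE).
OS   BACILLUS SUBTILIS.
SQ   SEQUENCE   429 AA;  46016 MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP
"""

record = parse_flat_record(SAMPLE)
print(record["ID"].split()[0])   # entry name
print(record["OS"])              # organism
print(record["sequence"])        # residues with spacing removed
```

Even this small parser depends on layout conventions that differ between databases, which is the integration problem the slide describes.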
![Page 9: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/9.jpg)
myGrid Approach - Workflows
General technique for describing and enacting a process
describes what you want to do, not how you want to do it
Simple language specifies how bioinformatics processes fit together – processes are web services
- High level workflow diagram separated from any lower level coding – therefore, you don’t have to be a coder to build workflows
[Figure: example workflow. Sequence → RepeatMasker web service → GenScan web service → BLAST web service → predicted genes out]
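The idea of wiring services together can be sketched in a few lines. This is an illustration only: it is not Scufl and not how Taverna is implemented. The three functions are local stand-ins for the remote RepeatMasker, GenScan and BLAST services, and they return made-up placeholder values.

```python
from functools import reduce


# Hypothetical stand-ins for remote web services; real services would be
# called over the network and return real analysis results.
def repeat_masker(sequence):
    """Pretend to mask repeats by lower-casing runs of 'AAAA'."""
    return sequence.replace("AAAA", "aaaa")


def gen_scan(masked_sequence):
    """Pretend to predict genes: split the sequence on masked regions."""
    return [part for part in masked_sequence.split("aaaa") if part]


def blast(predicted_genes):
    """Pretend to search a database: label each predicted gene."""
    return {gene: f"hit-for-{gene}" for gene in predicted_genes}


def run_workflow(steps, workflow_input):
    """Feed the output of each step into the next one, in order."""
    return reduce(lambda data, step: step(data), steps, workflow_input)


# The workflow states what to do (the order of services), not how each
# service does its work.
workflow = [repeat_masker, gen_scan, blast]
result = run_workflow(workflow, "GATTAAAACCGG")
print(result)
```

Swapping or adding a step only changes the `workflow` list, which mirrors the separation between the high-level diagram and the lower-level code.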
![Page 10: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/10.jpg)
Freefluo Workflow enactor
[Figure: the Taverna Workbench and the enactor operate on Scufl + the Workflow Object Model. Processor types shown: plain web service, Soaplab, local application, BioMOBY, SeqHound, BioMART]
Workflow execution layers (SCUFL):
• Application data flow layer: Scufl graph + service introspection
• Execution flow layer: list management; implicit iteration mechanism; MIME & semantic type decoration; fault management; service alternates
• Processor invocation layer
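Implicit iteration, as listed for the execution flow layer, means that a processor written for a single item is applied to each element when it receives a list instead. A rough sketch of that idea follows, assuming nested Python lists stand in for workflow data. `invoke_with_implicit_iteration` is a hypothetical helper, not a Taverna API.

```python
def invoke_with_implicit_iteration(processor, data, expected_depth=0):
    """Apply `processor` to `data`, iterating when the data is too deep.

    `expected_depth` is the list depth the processor expects: 0 for a
    single item, 1 for a flat list, and so on.
    """
    def depth(value):
        # Depth of a (possibly nested) list; an empty list counts as depth 1.
        if isinstance(value, list):
            return 1 + (depth(value[0]) if value else 0)
        return 0

    if depth(data) > expected_depth:
        # Too deep: map over the outer list and keep the list structure.
        return [
            invoke_with_implicit_iteration(processor, item, expected_depth)
            for item in data
        ]
    return processor(data)


def reverse_complement(seq):
    """A single-item processor: reverse-complement one DNA string."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq))


single = invoke_with_implicit_iteration(reverse_complement, "ATGC")
many = invoke_with_implicit_iteration(reverse_complement, ["ATGC", "AAC"])
nested = invoke_with_implicit_iteration(len, [["a", "b"], ["c"]], expected_depth=1)
print(single, many, nested)
```

The workflow author never writes the loop; the mismatch between the depth supplied and the depth expected is what triggers it.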
![Page 11: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/11.jpg)
Taverna Workflow Components
• Scufl: Simple Conceptual Unified Flow Language
• Taverna: writing, running workflows & examining results
• SOAPLAB: makes applications available
• Freefluo: workflow engine to run workflows
[Figure: Freefluo calls a SOAPLAB web service wrapping any application, or a web service directly, e.g. DDBJ BLAST]
![Page 12: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/12.jpg)
What Services we Support
![Page 13: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/13.jpg)
User Interaction Handling
• Interaction Service and corresponding Taverna processor allows a workflow to call out to an expert human user
• Used to embed the Artemis annotation editor within an otherwise automated genome annotation pipeline
Collaboration with the University of Bergen
Ref: Poster, Nettab 2005
• R for numerical analysis (microarray informatics amongst others)
![Page 14: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/14.jpg)
What shall I do when a service fails?
• Most services are owned by other people
• No control over service failure
• Some are research level
Workflows are only as good as the services they connect!
To help, Taverna can:
• Notify failures
• Instigate retries
• Set criticality
• Substitute services
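The four behaviours listed above can be combined in one small routine. The sketch below is an assumption about how they might fit together, not Taverna's actual fault-management code. `invoke_with_fallback`, `ServiceFailure` and the two BLAST stand-ins are hypothetical names.

```python
class ServiceFailure(Exception):
    """Raised by a service stand-in when a call fails."""


def invoke_with_fallback(services, payload, retries=2, critical=True, notify=print):
    """Try each service in turn, retrying before moving to a substitute.

    `services` is an ordered list of callables: the preferred service
    first, then its substitutes. If everything fails, a critical step
    raises and a non-critical step returns None so the workflow can go on.
    """
    for service in services:
        for attempt in range(1, retries + 2):  # first try plus `retries` retries
            try:
                return service(payload)
            except ServiceFailure as error:
                notify(f"{service.__name__} failed (attempt {attempt}): {error}")
    if critical:
        raise ServiceFailure("all services failed for a critical step")
    return None


# Hypothetical services: the primary one is always down.
def primary_blast(sequence):
    raise ServiceFailure("service unavailable")


def mirror_blast(sequence):
    return f"report for {sequence}"


messages = []
report = invoke_with_fallback(
    [primary_blast, mirror_blast], "MEKLNIAGGD", retries=1, notify=messages.append
)
print(report)
print(len(messages))
```

Passing `critical=False` lets a workflow continue without the result of an optional step instead of stopping.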
![Page 15: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/15.jpg)
myGrid Users
• ~20000 downloads
• Users in US, Singapore, UK, Europe, Australia
• Systems biology
• Proteomics
• Gene/protein annotation
• Microarray data analysis
• Medical image analysis
![Page 16: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/16.jpg)
Trypanosomiasis Study
Resistance to trypanosomiasis in cattle in Kenya
Andy Brass, Paul Fisher – University of Manchester
• Form of sleeping sickness in cattle – known as n’gana
• Caused by Trypanosoma brucei
![Page 17: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/17.jpg)
Study involves
Microarray data
QTL
SNPs
Metabolic pathway analysis
Need to access microarray data, genomic sequence information, pathway databases AND integrate the results
![Page 18: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/18.jpg)
![Page 19: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/19.jpg)
Workflow Reuse
Addison's disease
SNP design
Protein annotation
Microarray analysis
myGrid Workflow Repository
http://workflows.mygrid.org.uk/repository
![Page 20: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/20.jpg)
Components designed to work together
[Figure: myGrid component diagram. Labels: Scufl workflows + Taverna Workflow Workbench; OGSA-Distributed Query Processing; results management; LSID; mIR; e-Science coordination; e-Science mediator; e-Science process patterns; e-Science events; notification service; myGrid information model; metadata & provenance management using semantics; KAVE; service management; publication and discovery using semantics; Feta; Pedro; ontology; portal & application tools]
![Page 21: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/21.jpg)
Data Management
• Workflows can generate vast amount of data - how can we manage and track it?
• Data AND metadata AND experiment provenance
• LSIDs – to identify objects
• Semantic Web technologies (RDF, ontologies)
– To store knowledge provenance
• Taverna workflow workbench & plugins
– Ensure automated recording
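An LSID is a URN of the form `urn:lsid:authority:namespace:object`, with an optional revision at the end. As a small illustration of how such identifiers can be taken apart, here is a sketch of a parser. The `parse_lsid` helper and the example identifier are made up for this illustration and do not come from myGrid.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text):
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:5]):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


# The authority, namespace and object below are made up for illustration.
example = parse_lsid("urn:lsid:example.org:blast_report:12345:2")
print(example)
```

Because the identifier carries its own authority and namespace, a data item can be referred to unambiguously in provenance records without copying the data itself.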
![Page 22: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/22.jpg)
KAVE Data and metadata management
• Life Science Identifiers (LSIDs)
• Information Model
• File management
• Support for custom database building
• Provenance metadata capture using RDF
• SRB integration
• OGSA-DAI integration
[Figure: RDF provenance graph linking data generated by services/workflows, concepts and services. Data nodes (urn:data1, urn:data2, urn:data12, urn:data:3, urn:data:f1, urn:data:f2, and hits such as urn:hit1…, urn:hit2…., urn:hit50…..) and invocation nodes (urn:compareinvocation3, urn:BlastNInvocation3, urn:invocation5) are connected by properties including [input], [output], [directlyDerivedFrom], [distantlyDerivedFrom], [instanceOf], [hasHits], [contains], [similar_sequence_to], [performsTask], [hasName] and [type]. Concept and type labels: Blast_report, SwissProt_seq, Sequence_hit, LSDatum, DatumCollection, literals, Properties. Other labels: "Find similar sequence", "New sequence", "Missed sequence"]
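A provenance graph of this kind can be held as subject, property, object triples. The sketch below uses plain Python tuples rather than an RDF library, purely to show the shape of the data and one lineage query. The identifiers follow the style used on the slide, but the specific edges are invented for illustration and are not transcribed from it.

```python
# Each triple is (subject, property, object). The edges are illustrative
# and are NOT transcribed from the slide.
TRIPLES = [
    ("urn:BlastNInvocation3", "input", "urn:data1"),
    ("urn:BlastNInvocation3", "output", "urn:data2"),
    ("urn:data2", "instanceOf", "Blast_report"),
    ("urn:data2", "directlyDerivedFrom", "urn:data1"),
    ("urn:compareinvocation3", "input", "urn:data2"),
    ("urn:compareinvocation3", "output", "urn:data12"),
    ("urn:data12", "directlyDerivedFrom", "urn:data2"),
]


def objects(subject, prop, triples=TRIPLES):
    """Return every object linked to `subject` by `prop`."""
    return [o for s, p, o in triples if s == subject and p == prop]


def lineage(item, triples=TRIPLES):
    """Follow directlyDerivedFrom links back to the original inputs."""
    ancestors = []
    frontier = [item]
    while frontier:
        current = frontier.pop()
        for parent in objects(current, "directlyDerivedFrom", triples):
            if parent not in ancestors:
                ancestors.append(parent)
                frontier.append(parent)
    return ancestors


print(lineage("urn:data12"))
```

Walking the derivation links answers the question of which inputs a given result ultimately depends on.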
![Page 23: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/23.jpg)
Provenance Browsing in Taverna
New in Taverna 1.4
![Page 24: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/24.jpg)
Feta Semantic Discovery
Over 3000 services!
Find services by their function
Questions we can ask:
Find me all the services that perform a multiple sequence alignment and accept protein sequences in FASTA format as input
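A query like this amounts to filtering annotated service descriptions by task and by input. The following is a simplified sketch under that assumption. The registry entries, field names and `find_services` function are hypothetical and do not reflect Feta's actual data model or query interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDescription:
    name: str
    task: str
    input_type: str
    input_format: str


# Hypothetical annotated services; names and annotations are made up.
REGISTRY = [
    ServiceDescription("align_proteins", "multiple_sequence_alignment",
                       "protein_sequence", "FASTA"),
    ServiceDescription("align_dna", "multiple_sequence_alignment",
                       "DNA_sequence", "FASTA"),
    ServiceDescription("search_proteins", "similarity_search",
                       "protein_sequence", "FASTA"),
]


def find_services(registry, **criteria):
    """Return services whose annotations match every given criterion."""
    return [
        service for service in registry
        if all(getattr(service, field) == value for field, value in criteria.items())
    ]


matches = find_services(
    REGISTRY,
    task="multiple_sequence_alignment",
    input_type="protein_sequence",
    input_format="FASTA",
)
print([service.name for service in matches])
```

Exact string matching is the simplification here. Annotating services with ontology terms would additionally allow a query for a general concept to match services described with a more specific one.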
![Page 25: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/25.jpg)
myGrid Ontology
[Figure: the myGrid ontology is made up of an upper level ontology, task ontology, informatics ontology, molecular biology ontology, bioinformatics ontology and web service ontology, linked by "specialises" and "contributes to" relations. Example concepts: sequence, biological_sequence, protein_sequence, nucleotide_sequence, DNA_sequence, protein_structure_feature. Example services: Similarity Search Service, BLAST service, BLASTp service, InterProScan service]
![Page 26: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/26.jpg)
Feta Architecture
[Figure: Feta architecture. Labels: Feta Engine service; semantic discovery; Taverna Workbench; Feta GUI client; DL reasoner; ontology editor; ontologist; user; classification in RDF(S); build myGrid domain ontology; obtain descriptions; obtain classification; Feta descriptions]
![Page 27: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/27.jpg)
Annotations
• Feta has been available for ~1 year
• Not yet in the release
• Need critical mass of services before release
• Annotation experiments with users and domain experts
• Domain expert annotations much better
– hiring a full-time annotator – see the myGrid website for details
![Page 28: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/28.jpg)
Gene annotation pipeline workflow
Integration and visualisation of GD annotation workflow results
Provenance Record
Custom Data Model
Input
Result
Results Integration
Smarter workflow design incorporating visualisation – VBI collaboration
![Page 29: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/29.jpg)
Utopia
SeqVista
Visualisation
![Page 30: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/30.jpg)
New Plans for Taverna 2.0
![Page 31: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/31.jpg)
Evolving challenges
• Long running data intensive workflows
• Manipulation of confidential or otherwise protected information
• Use with classical grid systems
• Interaction with users during workflows
![Page 32: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/32.jpg)
Development
• Development of Taverna 2.0
– reworking of the processor model to include dual execution semantics incorporating data and control flow
– enhanced support for long-running workflows
• fully distributed workflow enactment and authoring
• User steering
– large scale data transfer
![Page 33: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/33.jpg)
Enhanced Processor Model
• Modular dispatcher mechanism
– Dynamic service binding
– Recursive invocation
– Data filter implementation
– Retry, failover, back-off behaviours
• Transparent third party data transfers
• High throughput stream handling with implicit iteration semantics
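One way to read "modular dispatcher" is as a stack of small layers wrapped around the service call, each adding a single behaviour. The sketch below illustrates that reading with a retry layer using exponential back-off and a failover layer. It is an assumption about the design, not the Taverna 2.0 implementation, and all names in it are hypothetical.

```python
import time


def retry_layer(max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Build a layer that retries a failing call with exponential back-off."""
    def wrap(invoke):
        def layered(payload):
            for attempt in range(max_attempts):
                try:
                    return invoke(payload)
                except RuntimeError:
                    if attempt == max_attempts - 1:
                        raise
                    # Wait base_delay, then 2x, then 4x, ... between tries.
                    sleep(base_delay * 2 ** attempt)
        return layered
    return wrap


def failover_layer(alternate):
    """Build a layer that calls `alternate` if the wrapped call fails."""
    def wrap(invoke):
        def layered(payload):
            try:
                return invoke(payload)
            except RuntimeError:
                return alternate(payload)
        return layered
    return wrap


def build_dispatcher(invoke, layers):
    """Stack layers around `invoke`; the first layer ends up innermost."""
    for layer in layers:
        invoke = layer(invoke)
    return invoke


# Hypothetical flaky service: fails twice, then succeeds.
calls = {"count": 0}


def flaky_service(payload):
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporarily unavailable")
    return payload.upper()


delays = []  # record requested delays instead of actually sleeping
dispatch = build_dispatcher(
    flaky_service,
    [retry_layer(max_attempts=3, base_delay=0.5, sleep=delays.append),
     failover_layer(lambda payload: "fallback")],
)
result = dispatch("atgc")
print(result, delays)
```

Because each behaviour is its own layer, new ones such as a data filter could be added to the list without touching the others.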
![Page 34: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/34.jpg)
3rd Party Data Transfers
• Allows ‘in place’ referencing of data
– Large data sets no longer round-trip between workflow engine and data provider
– Allows restricted access to sensitive data
• Automatic de-reference when a reference type is linked to a value type within a workflow
– Connecting a grid service to a web service
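The difference between passing a reference and passing a value, and the automatic de-reference at the boundary between the two, can be sketched as follows. This is a simplified illustration under stated assumptions: `DATA_STORE` stands in for a remote data provider, and `DataReference`, `link` and the two services are hypothetical names, not Taverna 2.0 APIs.

```python
from dataclasses import dataclass


# Stand-in for a remote data provider; keys play the role of references.
DATA_STORE = {"ref:genome-1": "ATGC" * 1000}


@dataclass(frozen=True)
class DataReference:
    """Points at data held by the provider instead of carrying it."""
    location: str

    def dereference(self):
        return DATA_STORE[self.location]


def link(value_or_reference, consumer, accepts_references):
    """Pass data to `consumer`, de-referencing only when it needs values."""
    if isinstance(value_or_reference, DataReference) and not accepts_references:
        value_or_reference = value_or_reference.dereference()
    return consumer(value_or_reference)


def grid_service(reference):
    """Hypothetical reference-aware service: passes the reference along."""
    return DataReference(reference.location)


def web_service(sequence):
    """Hypothetical value-based service: needs the actual data."""
    return len(sequence)


ref = DataReference("ref:genome-1")
passed_on = link(ref, grid_service, accepts_references=True)
length = link(passed_on, web_service, accepts_references=False)
print(passed_on, length)
```

The large payload is only fetched at the last step, where a value-based service actually needs it.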
![Page 35: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/35.jpg)
Streaming Data
• Allow execution of downstream workflow stages on partially complete results from upstream.
Service 1 Service 2 Service 3
Non streaming (Taverna 1), entire iteration must complete at each stage
Streamed data, Service 2 starts operating on partial results from Service 1
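The contrast between the two modes can be shown with Python generators. The sketch below is only an analogy for the behaviour described above, with made-up services that log each item they handle. It does not represent the Taverna enactor.

```python
def service_1(items, log):
    for item in items:
        log.append(f"s1:{item}")
        yield item * 2


def service_2(items, log):
    for item in items:
        log.append(f"s2:{item}")
        yield item + 1


def run_batch(inputs, log):
    """Non-streaming: each stage finishes the whole list before the next."""
    stage_1 = list(service_1(inputs, log))
    return list(service_2(stage_1, log))


def run_streamed(inputs, log):
    """Streaming: each result flows downstream as soon as it is produced."""
    return list(service_2(service_1(inputs, log), log))


batch_log, stream_log = [], []
batch_result = run_batch([1, 2], batch_log)
stream_result = run_streamed([1, 2], stream_log)
print(batch_log)
print(stream_log)
```

Both runs produce the same results; only the order of work differs. In the streamed run the second service starts on the first item before the first service has seen the second.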
![Page 36: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/36.jpg)
Recursive Invocation
• Dispatcher allowing recursive invocation to be plugged into per operation semantics.
[Figure: recursive invocation loop. Steps shown: receive input, invoke operation, gather result set, test for completion, modify input set, return result]
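The steps named on this slide can be arranged into a simple loop. The sketch below shows one plausible arrangement, written iteratively for clarity. The order of the steps and the paged-service example are assumptions, and `invoke_recursively` is a hypothetical name.

```python
def invoke_recursively(operation, first_input, is_complete, next_input, max_rounds=100):
    """Call `operation` repeatedly until `is_complete` says to stop."""
    results = []
    current_input = first_input               # receive input
    for _ in range(max_rounds):
        result = operation(current_input)     # invoke operation
        results.append(result)                # gather result set
        if is_complete(result):               # test for completion
            return results                    # return result
        current_input = next_input(current_input, result)  # modify input set
    raise RuntimeError("no completion after max_rounds invocations")


# Hypothetical paged service: returns up to 2 records per call.
RECORDS = ["r1", "r2", "r3", "r4", "r5"]


def fetch_page(offset):
    return RECORDS[offset:offset + 2]


pages = invoke_recursively(
    fetch_page,
    first_input=0,
    is_complete=lambda page: len(page) < 2,
    next_input=lambda offset, page: offset + len(page),
)
print(pages)
```

The `max_rounds` guard is an addition for safety so that a service that never signals completion cannot loop forever.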
![Page 37: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/37.jpg)
Future Direction
• Enhancements to the Workflow Core
• Enhancements to user interface and experience
• Expanded use of semantic web technologies
• Code remains open source and always will
![Page 38: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/38.jpg)
Latest News
• See plans for Taverna 2.0 on the myGrid wiki
• Taverna development is user-driven
– Please keep in touch and tell us what you would like to see via the myGrid mailing lists: Taverna Users, Taverna Hackers
• Bioinformatics curator for service annotation
Details on the myGrid website
![Page 39: my Grid and Taverna: Now and in the Future](https://reader036.fdocuments.in/reader036/viewer/2022081511/56814585550346895db26586/html5/thumbnails/39.jpg)
Acknowledgements
• The myGrid group – Past and Present
• OMII-UK
• Carole Goble
• Pinar Alper
• Tom Oinn
• Antoon Goderis
• Matthew Gamble
• Daniele Turi