Bioinformatics Workflows
Chris Wroe
(based on material from the myGrid team & May Tassabehji / Hannah Tipney, Medical Genetics, St Mary's)
Bioinformatics pipelines on the web
• Copying and pasting from one web-based application to the next, with annotation by hand
• Advantages: quick, easy access to distributed resources
• Disadvantages: time consuming, error prone, and the procedure is tacit, so both protocol and results are difficult to share
[Diagram: RepeatMasker → BLASTn → Twinscan]
Automating pipelines
• Using Perl/Matlab scripts to implement a pipeline
• Advantages: automation, quick to write, significant community resources (e.g. BioPerl)
• Disadvantages: hard to explain, hard to relocate, hard to tinker with
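As a rough illustration of the scripted approach, the sketch below hard-wires three steps, written in Python rather than Perl or Matlab. The functions `mask_repeats`, `find_similar` and `predict_genes` are hypothetical stand-ins, not real RepeatMasker, BLASTn or Twinscan calls; they only manipulate strings so the example runs anywhere.

```python
# Hypothetical stand-ins for the three tools; none of this calls real software.

def mask_repeats(sequence):
    """Stand-in for repeat masking: lower-case every 'AAAA' run."""
    return sequence.replace("AAAA", "aaaa")

def find_similar(sequence):
    """Stand-in for a similarity search: positions of a fixed motif."""
    motif = "ATG"
    return [i for i in range(len(sequence)) if sequence.startswith(motif, i)]

def predict_genes(sequence, hits):
    """Stand-in for gene prediction: up to 9 bases starting at each hit."""
    return [sequence[i:i + 9] for i in hits]

def run_pipeline(sequence):
    # The order of the steps and the data passed between them are buried
    # in code, which is what makes such scripts hard to explain or tinker with.
    masked = mask_repeats(sequence)
    hits = find_similar(masked)
    return predict_genes(masked, hits)

genes = run_pipeline("CCATGAAAAGGGTTTATGCC")
```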
Workflows
[Diagram: Sequence in → RepeatMasker Web Service → BLASTn Web Service → Twinscan Web Service → Predicted genes out]
• Simple scripting language aims to specify how steps of a pipeline link together
• High level picture of the pipeline separated from any low level fiddling
• Application logic and low level fiddling encapsulated in remote web services
• Advantages: automation, quick to write, easier to explain, share, relocate, and record provenance of results in a standard way
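The separation described above can be sketched in Python: the workflow is plain data naming the steps and their order, and a small runner executes it and records provenance. This is not Scufl, nor the Taverna or FreeFluo API. The `SERVICES` table, the `WORKFLOW` list and the stub functions are assumptions made for illustration, with local functions standing in for remote web services.

```python
# A minimal sketch: the workflow description is kept apart from the step logic.
# The "services" are local stubs standing in for remote web services.

def repeat_masker(seq):
    return seq.replace("AAAA", "NNNN")  # stub: mask one fixed repeat

def blastn(seq):
    return seq  # stub: passes the sequence through unchanged

def twinscan(seq):
    # stub: chop the sequence into 3-base pieces as fake "genes"
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]

# Hypothetical registry mapping step names to callables.
SERVICES = {"RepeatMasker": repeat_masker, "BLASTn": blastn, "Twinscan": twinscan}

# The high-level picture: only the order in which steps are linked.
WORKFLOW = ["RepeatMasker", "BLASTn", "Twinscan"]

def run_workflow(workflow, services, data):
    """Feed each step the previous step's output and log what happened."""
    provenance = []
    for name in workflow:
        result = services[name](data)
        provenance.append({"step": name, "input": data, "output": result})
        data = result
    return data, provenance

result, provenance = run_workflow(WORKFLOW, SERVICES, "ATGAAAACCC")
```

Reordering or swapping a step means editing `WORKFLOW` or `SERVICES`, not the runner, and the provenance list gives a uniform record of every intermediate result.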
Workflow components in myGrid
• Scufl – Simple Conceptual Unified Flow Language
– Developed by myGrid members at EBI
– Designed to be as simple as possible, with just enough features to support bioinformatics workflows
• Taverna – a tool for writing and running workflows and examining results (http://taverna.sourceforge.net)
• FreeFluo – workflow engine to run workflows (http://freefluo.sourceforge.net)
Workflow use
• Newcastle University (Anil Wipat, Peter Li)
– Affymetrix Microarray Analysis Workflow
– Gene annotation workflow
• Manchester University (May Tassabehji and PhD student Hannah Tipney, Medical Genetics, St Mary's; Wellcome Trust funded)
– Gene alerting service workflow (GAS)
– Gene and protein annotation workflow
• And others
Workflow experience: positives
• Easy to get started with Taverna (1-2 hours tutorial)
• Sharing does happen
• Cuts the time taken to perform one pipeline from 2 weeks to 2 hours
Workflow experience: outstanding issues
• Early days: web services are rare; significant time is taken to wrap applications as web services (licensing, installation, maintenance)
– Soaplab and Gowlab try to help (http://industry.ebi.ac.uk/soaplab)
• Fiddly bits don’t go away: Many ‘shim’ services needed to ensure the output of one step fits the expected input of another
• Automation produces many results in a short amount of time. Issues of result management and display
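A shim is often just a few lines of format conversion. The sketch below assumes, purely for illustration, that one step emits a FASTA record while the next expects a bare sequence string; `fasta_to_raw` is a hypothetical name, not a service from myGrid.

```python
def fasta_to_raw(fasta_text):
    """Drop FASTA header lines (starting with '>') and join the sequence lines."""
    lines = [line.strip() for line in fasta_text.splitlines()]
    return "".join(line for line in lines if line and not line.startswith(">"))

raw = fasta_to_raw(">example sequence\nATGCC\nGGTTA\n")
```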
Other workflow systems
• Commercial bioinformatics – drug discovery
– Incogen VIBE
– TurboWorx Pipeline Pilot
• eScience
– DiscoveryNet (bioinformatics – proprietary)
– Kepler (US ecology)
– Triana (UK physics, astronomy, signal processing)
Workflow standards
• Can’t have enough of them! All currently come from e-Business rather than science community
• BPEL – Business Process Execution Language
• WS – Orchestration
• XML Process Definition Language (XPDL)
• Business Process Markup Language (BPML)