The Taverna Software Suite

The Taverna Software Suite Prof Carole Goble FREng FBCS CITP The University of Manchester, UK [email protected] http://www.mygrid.org.uk http://www.taverna.org.uk


Carole Goble at EGI User Forum, SHIWA, 2013

Transcript of The Taverna Software Suite

Page 1: The Taverna Software Suite

The Taverna Software Suite

Prof Carole Goble FREng FBCS CITP

The University of Manchester, UK

[email protected]

http://www.mygrid.org.uk

http://www.taverna.org.uk

Page 2: The Taverna Software Suite

The Taverna Suite of Tools

Client User Interfaces

User Interfaces

Workflow Repository

Service Catalogue

Third Party Tools

Web Portals / Gateways

Activity and Service Plug-in Manager

Workflow Provenance

Workflow Server

Secure Service Access: OAuth 1 & 2, username/password, certificates.

Workflow Engine

Virtual Machine

Prog APIs

Command Line

Player

Workflow Components

Workbench

Taverna Lite

Interaction Server

Page 3: The Taverna Software Suite

VPH-Share Project: Models of Human Physiology

Eagle Genomics & NHS: Next Generation Sequencing based Patient Diagnostics

Astronomy & HelioPhysics

Library Doc

Preservation

Systems Biology of Micro-Organisms

OpenTox Project Chemistry Development Kit

Drug Toxicity

BioDiversity Invasive Species Modelling

Metagenomics

Page 4: The Taverna Software Suite

5820 members, 304 groups, 2415 workflows, 604 files and 229 packs (research objects)

Page 5: The Taverna Software Suite

biovel.myexperiment.org

Page 7: The Taverna Software Suite

The Wf4Ever Components

http://www.wf4ever-project.org

Models: Encoded in Standards; Contributed to Standards

Services: Foundational, Extension, User; APIs, Architecture; Web protocols/services

Policy and Planning: Leveraging established protocols; Preservation planning, policies; Best workflow design practices

Reference Systems: Command line + Third party systems; User Driver

Page 8: The Taverna Software Suite

The Research Object www.researchobjects.org

Execution Platform

Page 9: The Taverna Software Suite

Using and Making Standards

Standard id for each component: ORCID, DOI, URI

OAI-ORE: Structuring and bundling descriptions and components.

W3C Open Annotation Data Model (AO): Wf4Ever instrumental and hosting rollout meeting in Manchester. Transferable annotations.

Structured and semantically tagged packs for exchange and for linking across repositories

Semantic Web Encoding

Aggregation

Annotation

Identity

ro Ontology
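
To make the identity, aggregation and annotation layers above concrete, here is a minimal sketch of a research object manifest built as a plain Python dictionary. It is not the Wf4Ever implementation: the function name, the example URIs and the placeholder ORCID are hypothetical, and the prefixed term names (`ro:`, `ore:`, `ao:`, `dct:`) are written as I understand the vocabularies and should be checked against the published ontologies.

```python
import json


def build_research_object(ro_uri, creator_id, resources, annotations):
    """Return a JSON-LD-like dict for a research object (illustrative only).

    resources   -- list of URIs to aggregate
    annotations -- list of (target_uri, annotation_body_uri) pairs
    """
    return {
        # Identity: every component gets a URI (an ORCID or DOI is also a URI).
        "@id": ro_uri,
        "@type": ["ro:ResearchObject", "ore:Aggregation"],
        "dct:creator": {"@id": creator_id},
        # Aggregation: OAI-ORE style bundling of the components.
        "ore:aggregates": [{"@id": uri, "@type": "ro:Resource"} for uri in resources],
        # Annotation: each annotation links a target to a separate body document.
        # The key "annotations" is a hypothetical container name.
        "annotations": [
            {
                "@type": "ro:AggregatedAnnotation",
                "ao:annotatesResource": {"@id": target},
                "ao:body": {"@id": body},
            }
            for target, body in annotations
        ],
    }


base = "http://example.org/ro/demo/"  # hypothetical research object URI
ro = build_research_object(
    base,
    "https://orcid.org/0000-0000-0000-0000",  # placeholder, not a real ORCID
    [base + "workflow.t2flow", base + "input.csv"],
    [(base + "workflow.t2flow", base + ".ro/workflow-notes.rdf")],
)
print(json.dumps(ro, indent=2))
```

Keeping annotation bodies as separate documents, rather than embedding them in the workflow file, is what lets annotations be transferred between repositories along with the pack.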

Page 10: The Taverna Software Suite

Preservation Checklist

Monitoring environment

Metadata Completeness

Release, not Publish: Software release practice for workflows and scripts, services, data, articles, research objects

Gamble, Zhao, Klyne, Goble. MIM: A Minimum Information Model Vocabulary and Framework for Scientific Linked Data, 8th IEEE e-Science 2012, Chicago, USA

W3C PROV: Repair record; Preserved record of execution

Gil, Miles, Belhajjame, Deus, Garijo, Klyne, Missier, Soiland-Reyes, Zednik. Primer for the PROV Provenance Model. World Wide Web Consortium (W3C). 2012.

Belhajjame, Goble, Soiland-Reyes, De Roure. Fostering Scientific Workflow Preservation Through Discovery of Substitute Services. Proc 7th IEEE eScience 2011 Stockholm Sweden

Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Vocabularies: minim, wfprov, roevo
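
The metadata completeness check on this slide can be illustrated with a small sketch. It is loosely inspired by the checklist idea behind the minim vocabulary and is not the minim tooling: the function name, the field names and the three result levels are assumptions made for illustration.

```python
def evaluate_checklist(metadata, must=(), should=()):
    """Score a metadata record against MUST and SHOULD requirements.

    A field counts as missing when it is absent or empty.
    """
    def missing(keys):
        return [k for k in keys if metadata.get(k) in (None, "", [], {})]

    missing_must = missing(must)
    missing_should = missing(should)
    if missing_must:
        level = "fails"
    elif missing_should:
        level = "minimally satisfies"
    else:
        level = "fully satisfies"
    return {
        "level": level,
        "missing_must": missing_must,
        "missing_should": missing_should,
    }


# Hypothetical record describing a research object.
record = {
    "title": "Invasive species niche model",
    "creator": "https://orcid.org/0000-0000-0000-0000",  # placeholder
    "workflow": "workflow.t2flow",
    "inputs": [],  # declared but empty, so it counts as missing
}
report = evaluate_checklist(
    record,
    must=("title", "creator", "workflow"),
    should=("inputs", "provenance"),
)
print(report)
```

Running such a check periodically, rather than once at deposit time, is one way to connect the checklist to the monitoring idea on the same slide.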

Page 11: The Taverna Software Suite

Preservation Model

Experiment Descriptions: Organise workflows into structured studies

wfdesc: Inputs, outputs, dependencies

Workflow Decay: Component, Data & Infrastructure unavailability or inaccessibility

Taverna Components

Experiment Decay

Methodological changes: New technologies, resources, components, data

Workflow Motifs

IEEE e-Science 2012; FGCS submission

Best Practices: SWAT4LS

Page 12: The Taverna Software Suite

http://www.researchobject.org/

W3C Research Object for Scholarly Communication (ROSC) Community Group

http://www.w3.org/community/rosc/

Page 13: The Taverna Software Suite

Taverna Engine Execution

• Scufl2 language
• Functional dataflow, simple control flows, implicit iteration
• Linking services and tools
• Data movement, monitoring, staging, reference
• “In Workflow Programming” Beanshell scripting
• Provenance collection: W3C PROV(+) format
• Plug-in Framework
  – Infrastructures: Grid, HPC, Web Services (SOAP, REST)
  – Domain: CDK, BioMart, VOTable, SADI
  – Common Tools: Excel Spreadsheets, Google Refine, R
• OAuth security plug-in
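
The "implicit iteration" bullet can be sketched in a few lines: when a list arrives at a port that expects a single value, the service is invoked once per element, and several list inputs are combined by a cross or dot product. The code below is a simplified illustration, not the Taverna engine. It assumes every port expects a single value and returns a flat list, whereas a real cross product would preserve nesting. All names are hypothetical.

```python
from itertools import product


def invoke(service, inputs, strategy="cross"):
    """Call `service` once, or once per combination of list-valued inputs."""
    list_ports = [name for name, value in inputs.items() if isinstance(value, list)]
    if not list_ports:
        return service(**inputs)  # no iteration needed

    fixed = {n: v for n, v in inputs.items() if n not in list_ports}
    columns = [inputs[n] for n in list_ports]
    if strategy == "dot":
        # Dot product: pair up elements by position.
        if len({len(c) for c in columns}) > 1:
            raise ValueError("dot product needs lists of equal length")
        combos = zip(*columns)
    elif strategy == "cross":
        # Cross product: every combination of elements.
        combos = product(*columns)
    else:
        raise ValueError("unknown strategy: " + strategy)
    return [service(**fixed, **dict(zip(list_ports, combo))) for combo in combos]


def align(sequence, database):
    """Stand-in for a remote service call."""
    return f"{sequence}@{database}"


single = invoke(align, {"sequence": "ACGT", "database": "uniprot"})
iterated = invoke(align, {"sequence": ["ACGT", "TTGA"], "database": "uniprot"})
crossed = invoke(align, {"sequence": ["A", "B"], "database": ["x", "y"]})
dotted = invoke(align, {"sequence": ["A", "B"], "database": ["x", "y"]}, strategy="dot")
print(single, iterated, crossed, dotted)
```

The workflow author never writes the loop; the mismatch between the data's depth and the port's expected depth triggers it.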

Page 14: The Taverna Software Suite

Taverna Pro-Workbench

• Desktop application
• GUI
• Intermediate results views
• Gateway to BioCatalogue and myExperiment
• Plug-in Framework

Page 15: The Taverna Software Suite

Components

Workflow blocks made of a workflow
• Well described
• Well behaved
• Well looked after
• Agreed fail
• Agreed formats in and out
• Agreed provenance

Deposited in myExperiment

Grouped into families

Page 18: The Taverna Software Suite

Desktop Client

http://www.xworx.org/

Data Centric Interface

BIFI (Beautiful Interfaces for Inputs) Taverna Workbench Plug-in, GUI definition language

Page 19: The Taverna Software Suite

Data services
• Vanilla Taverna
  – Domain data type neutral
• AstroTaverna plug-in
  – IVOA data services
  – VOTables
• PyWPS plug-in
  – Exposes OGC-compliant Web Processing Services that can handle large data

Page 20: The Taverna Software Suite

Taverna Server
• Multiple clients, Multi-user
• SOAP and REST API

[Diagram labels: Server Host; Taverna Server “Client”; Taverna Server Front End; TavServ Back End; Service]
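
As a rough illustration of how a client might drive the REST API, the sketch below builds (but does not send) the two HTTP requests needed to create and start a run. The endpoint paths, the media type and the `Operating` status value are assumptions modelled on my understanding of the Taverna Server 2 REST interface and must be checked against the server's own documentation. The base URL, run identifier and function names are hypothetical.

```python
from urllib.request import Request

# Assumed media type for a t2flow workflow document.
T2FLOW_TYPE = "application/vnd.taverna.t2flow+xml"


def new_run_request(base_url, workflow_xml):
    """Build a POST that submits a workflow and asks the server for a new run."""
    return Request(
        base_url.rstrip("/") + "/rest/runs",  # assumed path
        data=workflow_xml,
        headers={"Content-Type": T2FLOW_TYPE},
        method="POST",
    )


def start_run_request(run_url):
    """Build a PUT that moves an existing run into its executing state."""
    return Request(
        run_url.rstrip("/") + "/status",  # assumed path
        data=b"Operating",                # assumed status value
        headers={"Content-Type": "text/plain"},
        method="PUT",
    )


base = "http://localhost:8080/taverna-server"  # hypothetical deployment
create = new_run_request(base, b"<workflow/>")  # placeholder workflow body
start = start_run_request(base + "/rest/runs/example-run-id")

for req in (create, start):
    print(req.get_method(), req.full_url)
```

To actually send a request, pass it to `urllib.request.urlopen` with whatever credentials the server requires. Multi-user operation means each run belongs to the authenticated user who created it.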

Page 21: The Taverna Software Suite

Taverna Server Family
• Taverna Server
  – Multiple clients, Multi-user
  – SOAP and REST API
• Taverna Server Amazon Machine Image
  – Bundled R server, Atom feed server
  – Multiple instances in Amazon Cloud and as required, for multiple users/uses and different security scenarios
• Taverna Virtual Machine
• Taverna Command Line
• Bundled Servers

Page 22: The Taverna Software Suite

Calling DCI Grid/Cloud Services
• Expose services/tools as WSDL/REST services
  – HELIO: Fixed host name
  – VPH-Share: Services running on dynamically started instances
  – SZTAKI Desktop Grid – BOINC/Debian Package
• Specific service/extension to Taverna
  – UNICORE plugin: Ask grid what services are available, include services in a workflow, invoke services on the grid (see talk by Shahbaz Memon)
• Library to control job submission to grid
  – PBS plugin: beanshells in a workflow include invocations of jobs
  – KnowARC plugin: Advanced Resource Connector to submit jobs to NorduGrid

Page 23: The Taverna Software Suite

Cloud Analytics for Life Sciences

[Diagram labels: Web interface; Input SNPs; Results; Storage (S3); Ensembl (MySQL); Cache (S3); Workflow engine orchestrator; Taverna Server (three instances); e-Hive; other; Application specific tools and Web Services (WS, Tool)]

• All user interaction via web interface
• User data stored in the Cloud
• Data for all tools and Web Services stored in the Cloud
• Unified access to different workflow engines with our common REST API
• Tools and Web Services for each workflow are installed together for easy replication
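
The idea of unified access to different workflow engines behind one common API can be sketched as a small adapter layer. This is not the platform's code: the class names, the payload fields and the returned dictionaries are hypothetical, and the back ends only record what they would do instead of contacting a real engine.

```python
from abc import ABC, abstractmethod


class WorkflowEngine(ABC):
    """Common interface that every engine adapter implements."""

    @abstractmethod
    def submit(self, workflow_id, inputs):
        """Queue a workflow run and return a description of it."""


class TavernaBackend(WorkflowEngine):
    def submit(self, workflow_id, inputs):
        # A real adapter would forward the request to a Taverna Server here.
        return {"engine": "taverna", "workflow": workflow_id,
                "inputs": inputs, "state": "queued"}


class EHiveBackend(WorkflowEngine):
    def submit(self, workflow_id, inputs):
        # A real adapter would seed an e-Hive pipeline here.
        return {"engine": "ehive", "workflow": workflow_id,
                "inputs": inputs, "state": "queued"}


ENGINES = {"taverna": TavernaBackend(), "ehive": EHiveBackend()}


def handle_run_request(payload):
    """Dispatch one request body from the common API to the chosen engine."""
    engine = ENGINES.get(payload.get("engine"))
    if engine is None:
        raise ValueError("unknown engine: " + repr(payload.get("engine")))
    return engine.submit(payload["workflow"], payload.get("inputs", {}))


run = handle_run_request(
    {"engine": "taverna", "workflow": "snp-annotation", "inputs": {"snps": "rs123"}}
)
print(run)
```

The web interface only needs to know the common request shape. Adding another engine means adding one adapter and one registry entry.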

Page 24: The Taverna Software Suite
Page 25: The Taverna Software Suite

Tavoop—Taverna & Hadoop

• Compiles Taverna Workflow to collection of Hadoop jobs
• Designed for handling very large amounts of data
  – Overhead to using Hadoop, but wins if enough data
  – Data ingest (expensive step) must have already been done
• Supports Taverna Platform Execution interface
• Parallelisable service types
• http://wiki.opf-labs.org/display/SP/P

PL

[Diagram labels: Hadoop Cluster; Taverna Execution Interface; Tavoop Compiler; Portal (Taverna Player); GUI Application (Workbench)]

Page 26: The Taverna Software Suite

Interacting with a workflow

• Many workflows need user interaction
• A workflow on a server does not need to be “press a button and wait”
  – VPH-Share opens a VNC connection to the spawned instance.
• Taverna Interaction Service
  – Users interact with a workflow (wherever it is running) in a web browser.
  – Interaction Service Plug-in in workbench

Page 27: The Taverna Software Suite

URLs and Frames

Page 28: The Taverna Software Suite

Taverna Tool Spectrum

[Diagram labels: users range from Technical Computational Scientist to Domain Scientist; tools: Workbench, Workbench Components, Lite, Domain-Specific Website / Tool / Portal, Player, Command Line; scales: Workflow Visibility (High to Low), Concept Knowledge (Taverna, Domain)]

Page 29: The Taverna Software Suite

Taverna Client Family
• Java library / Ruby gem
• Run a Taverna workflow in another workflow system e.g. Galaxy tools
• Command line
• Simple Taverna “player”
  – Fixed workflow
• Upload & run workflows and choose data
  – Universitat Pompeu Fabra’s “Soaplab MajorDomo”
  – Taverna Lite

Page 30: The Taverna Software Suite

Taverna-Lite: Generic Web-based Client
• Hide complexity
• Access to datasets
• Upload and interact with workflows

Build Portal
• Homepage
• User-Sessions
• Workflow Management
• Run Management
• Server Credentials

Uses Components for simpler assembly and workflow edits

Page 31: The Taverna Software Suite

Web apps to create and run workflows

Service Chaining Editor

Pete Walker et al, Plymouth Marine Laboratory

For chaining OGC Web Processing Service geospatial Web services

Page 32: The Taverna Software Suite

Web apps to create and run workflows: Online Taverna

• Dr Vadim Surpin and Vitaly Sharanutsa
• Institute for Information Transmission Problems of Russian Academy of Sciences (IITP RAS)

An online, in-browser application for assembling and running Taverna workflows over an HPC platform

Software Sustainability Institute Booth: Dr Vadim Surpin

Page 33: The Taverna Software Suite

Upload workflow by URL Online Taverna

Page 34: The Taverna Software Suite

Taverna 3

Beta July 2013

Page 35: The Taverna Software Suite

Summary

• Taverna Suite for interactive and batch workflows

• Flexible Plug-ins and Flexibly Plugged-in

• Themed Taverna

• Establishing Taverna Foundation

• We welcome collaboration/contribution

• http://www.taverna.org.uk

Page 36: The Taverna Software Suite

Learn more…
• myGrid – http://www.mygrid.org.uk
• Taverna – http://www.taverna.org.uk
• myExperiment – http://www.myexperiment.org
• BioCatalogue – http://www.biocatalogue.org
• Wf4ever – http://www.wf4ever-project.org
• SCAPE – http://www.scape-project.eu
• Software Sustainability Institute – http://www.software.ac.uk
• BioVeL – http://www.biovel.eu

Page 37: The Taverna Software Suite

• Virtual data objects
  – Johan
• MOU
  – Portals for BioVeL
  – DCI platforms
• myExperiment – SHIWA repository (execution)
  – How can we interchange