2. Carole Goble - University of Manchester

Provisioning bioinformatics using open source software: plate spinning, wave riding and friendships. Professor Carole Goble FREng FBCS CITP University of Manchester, UK [email protected]

Page 1: 2. Carole Goble - University of Manchester

Provisioning bioinformatics using open source software: plate spinning, wave riding and friendships.

Professor Carole Goble FREng FBCS CITP
University of Manchester, UK

Page 2: 2. Carole Goble - University of Manchester

Cytoscape

Bio* BioBabel

Page 3: 2. Carole Goble - University of Manchester

Social collaboration environments for sharing, curating and cataloguing personal, group and community contributed scientific assets. BSD
5000+ registered users, 56 countries
1600+ workflows, 1700+ services

Scientific workflow management system for accessing open, public data services, assembling data processing and analysis pipelines and recording provenance. LGPL
361 organisations, 48 countries
70,000+ binary downloads, ~4000 source

http://www.mygrid.org.uk

Handy tools for data management tasks in bioinformatics. BSD

Page 4: 2. Carole Goble - University of Manchester

The Taverna Open Source Suite of Tools

Client User Interfaces
GUI Workbench
Workflow Repository

Service Catalogue

Third Party Tools

Programming and APIs

Web Portals

Activity and Service Plug-in Manager

Provenance Store

Workflow Server

Open Provenance Model

Secure Service Access

Workflow Engine

e-Galaxy

Virtual Machine

Page 5: 2. Carole Goble - University of Manchester

Scientific workflows, scripts and pipelines
Now also neuroscience, music and numerical analysis
Developed with Oxford and Southampton

Web-based Software & Sharing Services
“Mobilising the long tail of scientists for all our benefit”

Common Ruby on Rails platform
Common and exchanged codebases

Systems Biology models, data and protocols
Adopted by 4 EU-wide consortiums and 4 UK sites
Developed with HITS and Stellenbosch

Crowd sourced curated Web services
Adopted by EdUnify and ELDA education projects
Developed with EBI and EMBRACE network

Find experts, advice, scripts, variable sets
Towards interface for UK Data Archives
Developed with NIBHI

Page 6: 2. Carole Goble - University of Manchester

BioPortal

Controlled vocabulary restrictions

RightField: Wired in Annotation
http://www.rightfield.org.uk

Page 7: 2. Carole Goble - University of Manchester
Page 8: 2. Carole Goble - University of Manchester

New Bioinformatics. Like the Old Bioinformatics? But bigger.

• Large scale data pipelines & Next Gen Sequencing
– Cloud Analytics
• Sharing pipelines & expertise
• Service reuse & curation
• Data/model sharing
• Metadata annotation
• Data integration

• Trained 950+ bioinformaticians

Page 9: 2. Carole Goble - University of Manchester

Pharmacogenomics
Association study of Nevirapine-induced skin rash in Thai Population

HIV and TB research in South Africa
Tryps in African Cattle

Astronomy & HelioPhysics

Library Document Preservation

Systems Biology for BioFuels and Crop research

Observing Systems Simulation Experiments

JPL, NASA

User base – I’ll come back to this

Page 10: 2. Carole Goble - University of Manchester

Sharing Platform Trusted Service

Curation
Incentives, Content

Standards & Content
Governance & Policy
Metadata standards
Data sharing policy
Rules of curation
Methodology

Local data
Centralised data

Preservation & Publication Platform

Gateway
Public data banks

Software & Tools
Open source

PALS
Advice, Consultancy, Training

Knowledge Network Skills & Community Building

Page 11: 2. Carole Goble - University of Manchester

OSS Benefits
Academic, Industry & Society Consumers
South Africa, South America, Thailand

• Free software and resources.
• Developer and user ramps.
• Transparent access to code.
• Constant innovation.
• No vendor lock-in.
• Capacity building in academia and small biotech
• Open content (workflow) authoring community
• Open developer community.

Page 12: 2. Carole Goble - University of Manchester

OSS Benefits
Academic, Industry & Society Producers
myGrid projects and the team

• Reputation.
• Adoption & Capacity Building
• Collaborations
• Help to do more with smart people
– Open content community
– Open developer community
– Constant innovation and improvement.
– Customisation and extensions.
• Sustainability paths – more funds
– Impact metrics

Page 13: 2. Carole Goble - University of Manchester

Where does open source come from?
Affects governance and approach

• Industry
– Eclipse, Apache: Foundations
• Community
– Bio*: Open Bioinformatics Foundation
• Projects
– Taverna, myExperiment, BioCatalogue, SEEK, Galaxy, KNIME, EMBOSS, Bioclipse, GMOD, BioConductor, Cytoscape, SADI, Copasi ….
– Project based, sometimes foundations, often GitHub
• Individual
– PhD student (3 year cycle) & Independents
– Stuck on SourceForge

Page 14: 2. Carole Goble - University of Manchester

Provisioning Bioinformatics using Open Source Software

• Free
– Funding models
• Open Source
– License models
• Open Development
– Contribution models

Provisioning

Open Source Software

• Sustained & Documented
• Service & Preserved
• Support & Community

Page 15: 2. Carole Goble - University of Manchester

Plate spinning:
• Staying relevant to new challenges (but stable for current ones)
• Supporting & sustaining software and communities
• Balancing research & engineering

Friendships:
• Coordinated Core foundation
• Collaborations + Contributions
• Activity and staff streams

Wave riding:
• Opportunities for funding and collaborations
• Readiness to adapt to be adopted

Page 16: 2. Carole Goble - University of Manchester

Core Team
24 post-docs, 8 organisations, 4 countries, 12 projects, 5 research councils, 28 PALs, £1.5 million per annum

Bioinformaticians
Software Engineers
Comp Sci Researchers

System Admins
Tool developers
Bench scientists

Service providers
Infrastructure providers

Page 17: 2. Carole Goble - University of Manchester
Page 18: 2. Carole Goble - University of Manchester

All Collaborative Projects
and frequently international

This makes transitioning to open source development easier.

Page 19: 2. Carole Goble - University of Manchester

Cooperative SCRUM & Community Intelligence
Keeping it real: Act local, think and look global. Partner.

• The PALS
– Advocacy and Alerts.
• Agile development
– Team building.
– Continual release.
– “Web 2.0” style.
– Discard at least once.
• Lots of collaborations.

[1] De Roure, D. and Goble, C. "Software Design for Empowering Scientists," IEEE Software, vol. 26, no. 1, pp. 88-95, Jan/Feb 2009
[2] http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html

Page 20: 2. Carole Goble - University of Manchester

Friendships: Software Contributors
Contributor Licence Agreement (Apache style)

Collaborate with
Work with

Don’t know
Know

Virtual Liver

TavernaPBS

TavernaLC

ChemTaverna
EdUnify

eGalaxy

OPALTavernaCDK-Taverna

SADI
R Shell

Friends
Family

Acquaintances
Strangers

Core Team

CVRG

caBIG

Page 21: 2. Carole Goble - University of Manchester

Production

Research

Applications

Seed Project

SageCite

Page 22: 2. Carole Goble - University of Manchester

A Critical Mass of resource streams

• Research Councils & Industry
– Hurrah for EU FP7, JISC and the BBSRC!
• Generic software: Applications
– AstroPhysics, Astronomy, Chemistry, Document Preservation, Social Science, BioDiversity.
– Cookie cutter, Cross development

In kind contributions
• Other projects and intl partners
• Students

Page 23: 2. Carole Goble - University of Manchester

Coordinated plate-spinning Egosystem

• Resource planning
• Matrix management
• Deliverable balancing
• Agenda balancing
• Stakeholder balancing
• Complexity drift
• Core slow down

On the other hand….
• Resilient
• Planned for
• Skunky
• Community Coordination
• What open source is

Page 24: 2. Carole Goble - University of Manchester

Every project brings its own deliverables and slows down the core roadmap

Makes coordination even tougher

Funding dips, funding fashions and cycles, funding gaps

Critical mass, top slicing subsidises, collaborators & friends, fund raisers and “platform awards”

Page 25: 2. Carole Goble - University of Manchester

(Funding) Fashions and Agile Wave riding

• Predicting the next wave
• Making the wave
• Spotting the wave and repositioning
• Riding out the lulls and cycles

• Workflows -> e-Labs
• New models of publishing
• Semantics yo-yo
– Was Semantic Web, now Linked Data
• Grid -> Cloud

Page 26: 2. Carole Goble - University of Manchester

Coordination

Sustainability

Interoperability

Adoption

Critical Mass Community

Software

The Open Source Software Polo Mint Model

Page 27: 2. Carole Goble - University of Manchester

Sustainability of the Core
Software is Free like Puppies are Free
BIS RCUK Expert Panel report on e-infrastructure

Novelty
• Funding agencies (excl BBSRC BBR and T&R)
• Self-promotion

Research vs Production Confusion
• Claiming research has a user base
• Claiming production is research

Entropy
• Curating services
• Service Decay
• Bit Rot

Page 28: 2. Carole Goble - University of Manchester

Adoption: academic not so different to commercial. Risk Management

• Fit for adoption (& customisation)
– Not just stuck on SourceForge.
– Engineering Quality and Documentation.
– Plan for adoption & exploitation.
– Release cycles
• Help stakeholders adopt it.
– The last mile, ramps
– Engineering and Documentation.
– Expertise: Support desk, SLAs
– Community self-help
– Reward, not hinder, adoption.

Page 29: 2. Carole Goble - University of Manchester

Strategic Interoperability and Flocking
Amplify adoption, capability and usefulness.

• Taverna + CDK, RShell, EMBOSS
• Taverna + Bioclipse
• Taverna + Galaxy
• Galaxy + Cytoscape
• Galaxy + GMOD

• Used Standards & Formats
• Simple and suitable APIs
• Common frameworks, e.g. OSGi
• Compatible Licensing
– OSSWatch

Page 30: 2. Carole Goble - University of Manchester

• Drive long-term vision and secure resources.
• Best practice, governance, policing.
• Coordinate contributions & co-shaping.

Models:
• Benign dictator
• Community democracy
• Independent market place

A Foundation?
• Export legal risk and admin overhead
• IP assignment and due diligence
• Community building
• Succession: Benevolent Dictator for Life?

Coordination. A core. A leader.

Page 31: 2. Carole Goble - University of Manchester

Industry

Commercialisation Business Models
• Dual Support: customisation and service
• Eagle Genomics support partnership!
• Indemnify against potential IP infringement

Full blown commercialisation
• Slow down
• Open Source starvation
• Flow back guarantees
• Different priorities

Page 32: 2. Carole Goble - University of Manchester

Training

• Tutorials and Training
• Summer schools
• Developer and User Days
• Annotation Jamborees
• Undergraduate and Postgraduate Bioinformatics

Software ● Services ● Content ● Skills ● Community ●

Page 33: 2. Carole Goble - University of Manchester

Provisioning Bioinformatics using Open Source Software. Going to get more so.

Provisioning

Open Source Software

Challenge is securing sustainability
of software
of skills
of community

Page 34: 2. Carole Goble - University of Manchester

Further Information
• myGrid
– http://www.mygrid.org.uk
• Taverna
– http://www.taverna.org.uk
• myExperiment
– http://www.myexperiment.org
• BioCatalogue
– http://www.biocatalogue.org
• SysMO-SEEK
– http://www.sysmo-db.org
• MethodBox
– http://www.methodbox.org.uk
• OMII-UK
– http://www.omii.ac.uk
• Software Sustainability Institute
– http://www.software.ac.uk