2. Carole Goble - University of Manchester
Provisioning bioinformatics using open source software: plate spinning, wave riding and friendships.
Professor Carole Goble FREng FBCS CITP, University of Manchester
Cytoscape
Bio* BioBabel
Social collaboration environments for sharing, curating and cataloguing personal, group and community-contributed scientific assets. BSD. 5000+ registered users, 56 countries; 1600+ workflows, 1700+ services.
Scientific workflow management system for accessing open, public data services, assembling data processing and analysis pipelines and recording provenance. LGPL. 361 organisations, 48 countries; 70,000+ binary downloads, ~4000 source downloads.
http://www.mygrid.org.uk
Handy tools for data management tasks in bioinformatics. BSD
The Taverna Open Source Suite of Tools
Client User Interfaces, GUI Workbench, Workflow Repository
Service Catalogue
Third Party Tools
Programming and APIs
Web Portals
Activity and Service Plug-in Manager
Provenance Store
Workflow Server
Open Provenance Model
Secure Service Access
Workflow Engine
e-Galaxy
Virtual Machine
Scientific workflows, scripts and pipelines. Now also neuroscience, music and numerical analysis. Developed with Oxford and Southampton.
Web-based Software & Sharing Services: “Mobilising the long tail of scientists for all our benefit”
Common Ruby on Rails platform. Common and exchanged codebases.
Systems Biology models, data and protocols. Adopted by 4 EU-wide consortia and 4 UK sites. Developed with HITS and Stellenbosch.
Crowd-sourced, curated Web services. Adopted by the EdUnify and ELDA education projects. Developed with EBI and the EMBRACE network.
Find experts, advice, scripts and variable sets. Towards an interface for UK Data Archives. Developed with NIBHI.
BioPortal
Controlled vocabulary restrictions
RightField: Wired-in Annotation http://www.rightfield.org.uk
New Bioinformatics. Like the Old Bioinformatics? But bigger.
• Large-scale data pipelines & Next Gen Sequencing
  – Cloud analytics
• Sharing pipelines & expertise
• Service reuse & curation
• Data/model sharing
• Metadata annotation
• Data integration
• Trained 950+ bioinformaticians
Pharmacogenomics: association study of Nevirapine-induced skin rash in the Thai population
HIV and TB research in South Africa. Trypanosomiasis in African cattle
Astronomy & HelioPhysics
Library Document Preservation
Systems Biology for BioFuels and Crop research
Observing Systems Simulation Experiments
JPL, NASA
User base – I’ll come back to this
Sharing Platform Trusted Service
Curation: Incentives, Content
Standards & Content, Governance & Policy: Metadata standards, Data sharing policy, Rules of curation, Methodology
Local data, Centralised data
Preservation & Publication Platform
Gateway: Public data banks
Software & Tools: Open source
PALs: Advice, Consultancy, Training
Knowledge Network Skills & Community Building
OSS Benefits: Academic, Industry & Society Consumers (South Africa, South America, Thailand)
• Free software and resources.
• Developer and user ramps.
• Transparent access to code.
• Constant innovation.
• No vendor lock-in.
• Capacity building in academia and small biotech.
• Open content (workflow) authoring community.
• Open developer community.
OSS Benefits: Academic, Industry & Society Producers (myGrid projects and the team)
• Reputation.
• Adoption & capacity building.
• Collaborations.
• Help to do more with smart people.
  – Open content community.
  – Open developer community.
  – Constant innovation and improvement.
  – Customisation and extensions.
• Sustainability paths – more funds.
  – Impact metrics.
Where does open source come from? It affects governance and approach.
• Industry
  – Eclipse, Apache: foundations
• Community
  – Bio*: Open Bioinformatics Foundation
• Projects
  – Taverna, myExperiment, BioCatalogue, SEEK, Galaxy, KNIME, EMBOSS, Bioclipse, GMOD, Bioconductor, Cytoscape, SADI, COPASI …
  – Project-based, sometimes foundations, often GitHub
• Individual
  – PhD students (3-year cycle) & independents
  – Stuck on SourceForge
Provisioning Bioinformatics using Open Source Software
• Free
  – Funding models
• Open Source
  – License models
• Open Development
  – Contribution models
Provisioning
Open Source Software
• Sustained & Documented
• Service & Preserved
• Support & Community
Plate spinning:
• Staying relevant to new challenges (but stable for current ones)
• Supporting & sustaining software and communities
• Balancing research & engineering
Friendships:
• Coordinated core foundation
• Collaborations + contributions
• Activity and staff streams
Wave riding:
• Opportunities for funding and collaborations
• Readiness to adapt to be adopted
Core Team: 24 post-docs, 8 organisations, 4 countries, 12 projects, 5 research councils, 28 PALs, £1.5 million per annum
Bioinformaticians, Software Engineers, Comp Sci Researchers
System Admins, Tool Developers, Bench Scientists
Service Providers, Infrastructure Providers
All collaborative projects, and frequently international.
This makes transitioning to open source development easier.
Cooperative Scrum & Community Intelligence. Keeping it real: act local, think and look global. Partner.
• The PALs
  – Advocacy and alerts.
• Agile development
  – Team building.
  – Continual release.
  – “Web 2.0” style.
  – Discard at least once.
• Lots of collaborations.
[1] De Roure, D. and Goble, C. "Software Design for Empowering Scientists," IEEE Software, vol. 26, no. 1, pp. 88-95, Jan/Feb 2009.
[2] http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html
Friendships: Software Contributors. Contributor Licence Agreement (Apache-style)
Collaborate with / Work with
Don’t know / Know
Virtual Liver
TavernaPBS
TavernaLC
ChemTaverna, EdUnify
eGalaxy
OPAL Taverna, CDK-Taverna
SADI, R Shell
Friends, Family
Acquaintances, Strangers
Core Team
CVRG
caBIG
Production
Research
Applications
Seed Project
SageCite
A Critical Mass of resource streams
• Research Councils & Industry
  – Hurrah for EU FP7, JISC and the BBSRC!
• Generic software: Applications
  – Astrophysics, Astronomy, Chemistry, Document Preservation, Social Science, Biodiversity.
  – Cookie cutter, cross development
• In-kind contributions
  – Other projects and international partners
  – Students
Coordinated plate-spinning Egosystem
• Resource planning
• Matrix management
• Deliverable balancing
• Agenda balancing
• Stakeholder balancing
• Complexity drift
• Core slow-down
On the other hand…
• Resilient
• Planned for
• Skunky
• Community coordination
• What open source is
Every project brings its own deliverables and slows down the core roadmap
Makes coordination even tougher
Funding dips, funding fashions and cycles, funding gaps
Critical mass, top-slicing subsidies, collaborators & friends, fund raisers and “platform awards”
(Funding) Fashions and Agile Wave riding
• Predicting the next wave
• Making the wave
• Spotting the wave and repositioning
• Riding out the lulls and cycles
• Workflows -> e-Labs
• New models of publishing
• Semantics yo-yo
  – Was Semantic Web, now Linked Data
• Grid -> Cloud
Coordination
Sustainability
Interoperability
Adoption
Critical Mass Community
Software
The Open Source Software Polo Mint Model
Sustainability of the Core. Software is Free like Puppies are Free. BIS RCUK Expert Panel report on e-infrastructure
Novelty
• Funding agencies (excluding BBSRC BBR and T&R)
• Self-promotion
Research vs Production Confusion
• Claiming research has a user base
• Claiming production is research
Entropy
• Curating services
• Service decay
• Bit rot
Adoption: academic is not so different from commercial. Risk management.
• Fit for adoption (& customisation)
  – Not just stuck on SourceForge.
  – Engineering quality and documentation.
  – Plan for adoption & exploitation.
  – Release cycles.
• Help stakeholders adopt it.
  – The last mile, ramps.
  – Engineering and documentation.
  – Expertise: support desk, SLAs.
  – Community self-help.
  – Reward, not hinder, adoption.
Strategic Interoperability and Flocking: amplify adoption, capability and usefulness.
• Taverna + CDK, RShell, EMBOSS
• Taverna + Bioclipse
• Taverna + Galaxy
• Galaxy + Cytoscape
• Galaxy + GMOD
• Used standards & formats
• Simple and suitable APIs
• Common frameworks, e.g. OSGi
• Compatible licensing
  – OSS Watch
• Drive long-term vision and secure resources.
• Best practice, governance, policing.
• Coordinate contributions & co-shaping.
Models:
• Benign dictator
• Community democracy
• Independent marketplace
A Foundation?
• Export legal risk and admin overhead
• IP assignment and due diligence
• Community building
• Succession: Benevolent Dictator for Life?
Coordination. A core. A leader.
Industry
Commercialisation Business Models
• Dual support: customisation and service
• Eagle Genomics support partnership!
• Indemnify against potential IP infringement
Full-blown commercialisation
• Slow-down
• Open source starvation
• Flow-back guarantees
• Different priorities
Training
• Tutorials and Training
• Summer schools
• Developer and User Days
• Annotation Jamborees
• Undergraduate and Postgraduate Bioinformatics
Software ● Services ● Content ● Skills ● Community ●
Provisioning Bioinformatics using Open Source Software. Going to get more so.
Provisioning
Open Source Software
The challenge is securing sustainability of software, of skills and of community.
Further Information
• myGrid
  – http://www.mygrid.org.uk
• Taverna
  – http://www.taverna.org.uk
• myExperiment
  – http://www.myexperiment.org
• BioCatalogue
  – http://www.biocatalogue.org
• SysMO-SEEK
  – http://www.sysmo-db.org
• MethodBox
  – http://www.methodbox.org.uk
• OMII-UK
  – http://www.omii.ac.uk
• Software Sustainability Institute
  – http://www.software.ac.uk