SysMO-DB: Just Enough Exchange for Systems Biology Data and Models

Post on 02-Jan-2016



Carole Goble, Katy Wolstencroft, Stuart Owen, Sergejs Aleksejevs - University of Manchester

Wolfgang Müller, O. Krebs, Isabel Rojas - EML Research gGmbH (= not for profit)

Jacky Snoep - University of Stellenbosch

MS eScience Workshop, Pittsburgh, PA

SysMO=SYStems biology of Micro Organisms

[Figure: counts (2), (2), (29), (22), (9), (4), (1); labels lost in extraction]

11 projects, 91 partners, 9 countries, started 2007

Started July 2008, 3 years, 3 staff + 3 investigators, 3 teams over 3 sites

Sensitively retrofit a data access, model handling and data integration platform.

Support and manage the diversity of data, models and competencies.

Web-based solution for:
  exchange of data, models and processes (intra- and inter-consortia)
  search for data, models and processes across the initiative
  dissemination of results

SysMO-DB

SysMO-DB Team

University of Manchester, UK: Carole Goble, Katy Wolstencroft, Stuart Owen, Sergejs Aleksejevs

EML Research gGmbH, Germany: Wolfgang Müller, Olga Krebs, Isabel Rojas

University of Stellenbosch, South Africa: Jacky Snoep

Connect projects, connect to outside

Project specific solutions

Internally used tools & data

Outside data and tools

Project

Public

My Disk: Data, Models, Workflows

Personal

SysMO-DB, inter-project

Own solutions: own data solutions and collaboration environments (wikis, e-Groupware, PHProject, BaseCamp, PLONE, Alfresco, bespoke commercial …); files and spreadsheets.

Suspicion: suspicion and caution over sharing. Interesting interplay between modellers, experimentalists and bioinformaticians.

Data issues: many do not have data, do not follow the standards that exist, or do not know who is doing what. Much of the data cannot be compared: different organisms, different strains.

Resource issues: no extra resources for the consortia. 91 institutes, 11 consortia, some overlapping.

Principles…

Go for a series of small victories
Realistic
Don't reinvent
Migrate to standards
Sustainable and extensible
Provide instant gratification
Address doubt and anxiety
Build it

Three types of people

[Diagram: modellers, experimentalists and bioinformaticians, linked by "Exchange" arrows]

„Natural“ collaboration within SysMO

Short, simplified, black and white:
  Collaboration during project design
  Varying methods of collaboration during project
    Binomes (one modeller, one experimentalist)
    Groups collaborating with groups (occasional/formalized exchange of information)
  Varying success
  Need for a watering hole/meeting point
    Application where experimentalists/bioinformaticians/modellers meet

({{flickr| |title=Hot Watering Hole Action |description= |photographer=betty x1138 |photographer_location=NYC, USA |photographer_url=http://flickr.com/photos/98334721@N00 |flickr_url=http://flickr.com/photos/98334721@N00/25901056 |taken=2005-07-14 09:04:32)


Trying to make experimentalists, modellers, bioinformaticians peacefully share resources

Some numbers & some consequences

1 software engineer, 1 bioinformatician, 1 bio-database specialist
11 projects, 91 partners
  20 programmer days/year/project
  2.5 programmer days/year/partner
  "Just in case" approach impossible
Focus on real needs
  "Just in time", "just enough"
  The right 20%
Help people help themselves
Communication!
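The per-project and per-partner figures can be roughly reproduced with a little arithmetic. The sketch below assumes one software engineer working about 220 days per year; that number is an assumption and does not appear on the slide, so the results only approximate the slide's rounded values.

```python
# Assumption (not on the slide): one full-time software engineer
# contributes roughly 220 working days per year.
WORKING_DAYS_PER_YEAR = 220
PROJECTS = 11   # from the slide
PARTNERS = 91   # from the slide

days_per_project = WORKING_DAYS_PER_YEAR / PROJECTS
days_per_partner = WORKING_DAYS_PER_YEAR / PARTNERS

print(f"{days_per_project:.1f} programmer days/year/project")
print(f"{days_per_partner:.1f} programmer days/year/partner")
```

With so few days per partner, building features "just in case" is clearly out of reach, which is the point the slide makes.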

80-20 rule: 80% of the features won't be used anyway

[Chart labels: 20% (useful features), 80%]

Social approach
  Questionnaires
  PALs (Project Area Liaison)
    21 postdocs and PhD students
    Bio/bioinf/modeller
    Our design and technical collaboration team
    Very intense face-to-face and virtual collaboration
    UK and Continental PALs chapters
  Audits and sharing
    Methods, data, models, standards, software, schemas, spreadsheets, SOPs…

Communication via PALs

DB team, PALs, projects

Show what is there
Suggest what is possible
Ask for requirements
Give requirements
Tell priorities
Rate outcomes
Suggest improvements
Double check
Transmit
Disseminate
Collect answers

Outcome of first PALs meeting:
  Need to find the guy who does xyz: Yellow pages
  Need to store Standard Operating Procedures
  Almost all our data is Excel

What's there: SysMO-SEEK screenshots

Yellow pages

Tag clouds

Bookmarks

Yellow pages tabs

ISA tabs

Standard Operating Procedures

JWS connection for modellers

View Study

New Assay (ISA)

Rights and sharing

Rights and sharing: create group

So much for the webapp

Rights + sharing
Connection to modellers' tools
Yellow pages
SOPs

Almost there: improved Excel support

Matthew Horridge

Towards Just-Enough Exchange

Incremental steps from beta to beta 


Largely a story about how to handle Excel sheets for the users' benefit

SysMO Just Enough Exchange

[Diagram labels]

COSMIC: Alfresco
BaCell-SysMO: Alfresco
MOSES: Wiki
SysMO-LAB: Wiki
Public resources: SABIO-RK
Other sources: spreadsheets, BASE

Need for tradeoff

Huge number of systems
Huge number of standards (MIBBI, OBO…)
  Some of them big standards
Too much for a few people to cope with, but:
  Comparison needs standardisation
  Search needs standardisation
  Need to move incrementally to just-enough standard implementation

Path = goal: the journey is part of the reward

Let people use what they use anyway
If changes are necessary, be as unobtrusive as possible
Be aware of legacy data
Nudge people towards best practices
Give instantly useful added value to as many users as possible: simple search, simple exchange, simple tool use

A roadmap

Provide convincing Web 2.0 functionality for use and as appetizer
  Yellow pages
  SOPs
Upload service
  Hand-triggered upload of link/file
  Hand-added metadata
Harvesting + change detection service
  Automatic download
  Hand-added metadata
Support for Excel templates
  Promote internal standards by use + tooling
  Mappers + parsers
  Classifiers
Use other data types where appropriate
  SBML, Matlab, Mathematica…
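One way to picture the change detection step of the harvesting service is to compare content hashes between two harvesting runs. This is a minimal sketch, not SysMO-DB's actual implementation: the function names `file_digest` and `detect_changes` and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def detect_changes(root: Path, known: dict) -> dict:
    """Compare files under `root` with digests from the previous harvest.

    `known` maps relative path -> digest and is updated in place, so it
    can be passed to the next run unchanged. (Hypothetical helper.)
    """
    report = {"new": [], "changed": [], "removed": []}
    seen = set()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        seen.add(rel)
        digest = file_digest(path)
        if rel not in known:
            report["new"].append(rel)
        elif known[rel] != digest:
            report["changed"].append(rel)
        known[rel] = digest
    for rel in sorted(set(known) - seen):
        report["removed"].append(rel)
        del known[rel]
    return report
```

Files reported as new or changed would then be the ones that still need hand-added metadata, in line with the roadmap item.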

Stability hierarchy

Increasing stability, from narrowest to widest scope:
  Single group: template for a group of experiments
  Single SysMO project: project-level template
  Whole SysMO: more stable JERM data model, template best practice

Parsers/annotators enter into that. Use mappers where needed.

JERM Extraction Architecture

[Diagram components: Harvester, Data handler, Classifier/Dispatcher, Template recognizer, Extractor, Mapper, Parser; outputs labelled Data and Metadata; source: Project repositories]
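The diagram only names the components, so the following is a hedged sketch of how a classifier/dispatcher, template recognizer, extractor and mapper might fit together for spreadsheet data. All class and function names, the header-matching rule and the example template are hypothetical and are not taken from the SysMO-DB code base.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Template:
    """A known spreadsheet layout (hypothetical structure)."""
    name: str
    header: list    # expected cells of the first row
    mapping: dict   # project column name -> common field name


@dataclass
class Extracted:
    template: str
    metadata: dict
    data: list


def recognise(sheet: list, templates: list) -> Optional[Template]:
    """Template recognizer: match the first row against known headers."""
    if not sheet:
        return None
    first_row = [cell.strip().lower() for cell in sheet[0]]
    for template in templates:
        if first_row == [h.lower() for h in template.header]:
            return template
    return None


def extract(sheet: list, template: Template) -> Extracted:
    """Extractor + mapper: build records keyed by the common field names."""
    fields = [template.mapping.get(h, h) for h in template.header]
    data = [dict(zip(fields, row)) for row in sheet[1:]]
    metadata = {"template": template.name, "fields": fields}
    return Extracted(template.name, metadata, data)


def dispatch(sheet: list, templates: list) -> Optional[Extracted]:
    """Classifier/dispatcher: route a sheet to the extractor, if recognised."""
    template = recognise(sheet, templates)
    return extract(sheet, template) if template else None
```

A sheet is represented here as rows of strings, as a harvester or data handler might deliver it after reading a file. Unrecognised sheets return `None`, so they can still be stored as plain files with hand-added metadata.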

Oops: some projects not prolonged

Need all project data in the system fast, so…

JERM Extraction Architecture

[Diagram repeated with the same components (Harvester, Data handler, Classifier/Dispatcher, Template recognizer, Extractor, Mapper, Parser) over the project repositories]

Lessons we're learning: some interesting bits along the way

Subsetting: Don't overwhelm

Standards need to be comprehensive
  Goal: "Minimum information"… (MIBBI)
  Tends to be a superset of what is needed for a project
  Examples of non-applicable attributes: tissue of a single cell, gender
Useful to use adapted subset templates

Experimental design selection list
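The idea of an adapted subset template can be sketched as follows. The checklist attributes below are invented for illustration and are not taken from any actual MIBBI checklist; `subset_template` and `missing_fields` are hypothetical helper names.

```python
# Hypothetical full checklist; attribute names are illustrative only.
FULL_CHECKLIST = [
    "organism", "strain", "growth_medium", "temperature", "tissue", "gender",
]


def subset_template(full, not_applicable):
    """Return the checklist without attributes that do not apply."""
    unknown = set(not_applicable) - set(full)
    if unknown:
        raise ValueError(f"not in checklist: {sorted(unknown)}")
    return [attr for attr in full if attr not in not_applicable]


def missing_fields(record, template):
    """List template attributes that the record leaves empty."""
    return [attr for attr in template if not record.get(attr)]


# For a single-cell micro-organism project, tissue and gender do not apply.
microbe_template = subset_template(FULL_CHECKLIST, {"tissue", "gender"})
```

Users then only see and fill in the attributes that make sense for their project, while the full checklist stays available for mapping to the wider standard later.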

From biofolksonomy to ontology

Observation:
  Fast-growing set of standards
  Standards are a moving target
Incremental approach:
  Keyword annotation
  Controlled selection lists
  Home-brewed taxonomies
  Use of/contribution to standard ontologies
  Provide migration tools

Tags + suggestions

Home-brewed taxonomy
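The step from free keywords to controlled selection lists can be supported by suggesting existing terms while a user types a tag. This is a small sketch using fuzzy string matching from Python's standard library; the vocabulary and the function name `suggest_terms` are assumptions for illustration, not the SysMO-SEEK implementation.

```python
import difflib

# Hypothetical controlled list; in practice this would grow out of the
# tags that users have already entered.
CONTROLLED_TERMS = ["glycolysis", "chemostat", "microarray", "proteomics"]


def suggest_terms(tag, vocabulary=CONTROLLED_TERMS, limit=3):
    """Suggest controlled terms that closely match a free-text tag."""
    cleaned = tag.strip().lower()
    return difflib.get_close_matches(cleaned, vocabulary, n=limit, cutoff=0.6)
```

Accepting a suggestion nudges users towards shared terms without forbidding new tags, which fits the incremental approach described above.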

A word on software

Template tooling: Excel, Java

SysMO-SEEK (open source under Apache license): Ruby on Rails
  Convention over configuration
  Libraries & plugins
    Rails specific (e.g. acts_as_authenticated)
    SOLR & Lucene introduce Java/Ruby

Database: MySQL, also tested with SQLite (exclude DB dependencies)

Summary

SysMO-DB as a virtual meeting point for different flavours of systems biologists

SysMO-DB's mantra: just enough, just in time
  Flexible JERM extraction architecture
  Just enough metadata (incremental)
  A lot done, still a lot to do

Challenges ahead…

Social:
  PALs work great and are motivated
  Now need more, more, more data, data, data

Technical:
  Publishing into public repositories
  Search + exploration: the test for data quality
    Hierarchical faceted search
    Distributed search via Taverna workflows
  More workflows via SysMO-SEEK
  Improve modelling support

Bonus track: what if…

…the average data quality is below par?

"Nagging functionality":
  Remind people of potentially faulty metadata
  Give suggestions on what to improve and how
  Give the possibility to create automatic mappings
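A "nagging" check could look roughly like the sketch below: it flags empty required fields and values outside a controlled list, and suggests a close match where one exists. The field names, the example vocabulary and the function `nag` are all hypothetical; the slides do not describe how such a feature would be implemented.

```python
import difflib


def nag(record, required, vocabularies):
    """Return human-readable reminders for a metadata record.

    `required` lists fields that must not be empty; `vocabularies` maps
    a field name to its allowed values. (Hypothetical helper.)
    """
    reminders = []
    for field in required:
        if not str(record.get(field, "")).strip():
            reminders.append(f"'{field}' is empty: please fill it in.")
    for field, allowed in vocabularies.items():
        value = record.get(field)
        if value and value not in allowed:
            close = difflib.get_close_matches(value, allowed, n=1)
            hint = f" Did you mean '{close[0]}'?" if close else ""
            reminders.append(f"'{field}' has unknown value '{value}'.{hint}")
    return reminders


# Example record with one empty field and one misspelt value.
example = {"organism": "E. coly", "strain": ""}
for line in nag(example, ["organism", "strain"],
                {"organism": ["E. coli", "S. cerevisiae"]}):
    print(line)
```

An accepted suggestion could also be stored as an automatic mapping, so the same misspelling is corrected silently the next time.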

Thanks

EML people: Isabel, Olga

UMAN people: Carole, Katy, Finn, Stuart, Sergejs

Jacky at Stellenbosch

BBSRC BMBF KTF

…and Microsoft for sponsoring this workshop

www.sysmo-db.org

End + questions