Outline
Warm-up for a discussion on harmonizing similar initiatives in the NBIC sequencing, metabolomics, proteomics and biobanking task forces (plus friends such as NuGO, EBI, GEN2PHEN, BBMRI-NL, SysMO, EU-PANACEA and the Groningen Genomics Coordination Center).
LIMS (laboratory information management system), AKA study capturing framework (SCF), AKA sample treatment tracker, AKA investigation metadata annotator
Outline
• What do we mean by LIMS / SCF?
• Ingredients for collaboration
• Suggestive discussion topics
LIMS/SCF is a portal for:
[Figure: platforms (Sequencing, Genotyping, Microarrays, Mass spec, …), study data (Individuals, Samples, Protocols, Results, Background info) and analyses (Peak finding, SNP analysis, GWAS, xQTL, …)]
Examples: Proteomics
• CilairDB
• Corra
• OpenBIS
• OpenMS
• …
http://www.cisd.ethz.ch/software/openBIS/HCS
Examples: Sequencing/Genotyping
• SequenceLIMS
• ChIPLIMS Nijmegen
• GenotypeLIMS
• IBIDAS?
• iSeq
• …
Courtesy Joris Lops, GCC & LifeLines
Examples: Biobanking
• QTL XGAP/EU-PANACEA
• GWAS XGAP/LifeLines
• HGVbaseG2P 2.0
• …
Courtesy Joeri van der Velde & friends, GCC & LifeLines
Working hypotheses
1. Each platform has one or more study ‘portals’
  • Captures all wet-lab and dry-lab flows
  • Links to (or copies from) public annotations
  • Provides values and data inputs for pipelines
  • Stores provenance and results of all pipeline runs (as result files)
2. All tools developed in BioAssist will be connected to them
  • Need to think about user interaction
  • Need to think about data exchange (formats)
  • i.e. what does the biologist want?
3. We can benefit greatly if we harmonize and share work
  • Each domain has specific needs, but we can still share
  • Data models, user interfaces, back-ends, …
  • Is coordinating this a task for CET?
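The hypothesis that a portal ‘stores provenance and results of all pipeline runs’ can be made concrete with one small record per run. A minimal sketch, assuming a run is described by a tool name, its parameters, and checksums of its input and output files; the `PipelineRun` class, its fields and the `peak-finder` tool are hypothetical, not part of any platform named here.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Hex digest used to tie result files back to exactly these inputs."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class PipelineRun:
    # Hypothetical provenance record; field names are illustrative only.
    tool: str
    parameters: dict
    inputs: dict = field(default_factory=dict)    # file name -> checksum
    outputs: dict = field(default_factory=dict)   # file name -> checksum
    started: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def add_input(self, name: str, content: bytes) -> None:
        self.inputs[name] = sha256_of(content)

    def add_output(self, name: str, content: bytes) -> None:
        self.outputs[name] = sha256_of(content)

    def to_json(self) -> str:
        # Stored next to the result files so the run can be traced later.
        return json.dumps(asdict(self), sort_keys=True)


run = PipelineRun(tool="peak-finder", parameters={"min_intensity": 1000})
run.add_input("sample1.mzXML", b"...raw spectra...")
run.add_output("sample1.peaks.tsv", b"mz\trt\n")
print(run.to_json())
```

Hashing file contents rather than storing paths alone means a result can still be matched to its inputs after files are moved or renamed.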
Ingredients for collaboration
1. Conceptual model
  • To capture all data, including variation/extension mechanisms
2. Exchange formats
  • To exchange between public and private databases
3. User interfaces
  • Data import wizards
  • Extraction / query modules
  • Platforms for analysis!!!
4. Back-end engines
  1. Large-scale binary data
  2. Automatic generation of services/pipelines
1. Conceptual model
• Targets: the thing being followed
  AKA: Individuals, Samples, Panels/Groups, Material
• Features: an abstract property of a target
  AKA: Characteristics, Comments
• Values: a concrete property of a target (at a certain time)
  AKA: Data
• Protocols: description of an activity
  AKA: EventType, Template
• ProtocolApplications: use of a protocol that produced one or more values
  AKA: Events, Activity, Assay
• Investigation: a container for all of the above, plus contacts/publications
  AKA: Study, Project, Laboratory, Partner
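The six concepts above can be sketched as plain data classes to show how they reference each other. This is a minimal illustration assuming string values and ISO date strings for time; the class layout, the protocol name and the date are hypothetical, and only the Height/179cm/Ind1 example comes from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Target:                 # AKA individual, sample, panel, material
    name: str


@dataclass
class Feature:                # abstract property of a target
    name: str
    unit: Optional[str] = None


@dataclass
class Protocol:               # description of an activity
    name: str
    features: List[Feature] = field(default_factory=list)


@dataclass
class Value:                  # concrete property of a target at a certain time
    target: Target
    feature: Feature
    value: str
    time: Optional[str] = None


@dataclass
class ProtocolApplication:    # one use of a protocol, which produces values
    protocol: Protocol
    time: str
    values: List[Value] = field(default_factory=list)

    def record(self, target: Target, feature: Feature, value: str) -> Value:
        # Values produced by this application share its time stamp.
        v = Value(target, feature, value, self.time)
        self.values.append(v)
        return v


@dataclass
class Investigation:          # container for all of the above
    name: str
    applications: List[ProtocolApplication] = field(default_factory=list)
    contacts: List[str] = field(default_factory=list)

    def values_for(self, target: Target) -> List[Value]:
        # Identity comparison: the same Target object, not just an equal name.
        return [v for app in self.applications for v in app.values
                if v.target is target]


height = Feature("Height", unit="cm")
measure = Protocol("measure height", features=[height])   # hypothetical protocol
ind1 = Target("Ind1")
app = ProtocolApplication(measure, time="2010-01-01")     # hypothetical date
app.record(ind1, height, "179")
study = Investigation("demo", applications=[app])
print([(v.feature.name, v.value, v.time) for v in study.values_for(ind1)])
# [('Height', '179', '2010-01-01')]
```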
‘Pheno-OM’ (generic variation mechanism)
Flexible: any feature, value, and target combination
[Figure: Pheno-OM class diagram with ObservedValue, ObservationTarget (Panel, Individual), ObservableFeature, Protocol, ProtocolApplication, ObservedRelation and InferredValue, with time-stamped values. Example: ObservedValue 179cm for ObservableFeature Height on ObservationTarget Ind1.]
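The ‘any feature, value, and target combination’ idea amounts to storing observations in long (entity-attribute-value) form and pivoting them into a table only when needed. A minimal sketch of that idea, not the Pheno-OM schema itself; the `pivot` helper and the Sex and Panel1 rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class ObservedValue(NamedTuple):
    target: str               # ObservationTarget, e.g. an individual or a panel
    feature: str              # ObservableFeature, e.g. "Height"
    value: str
    time: Optional[str] = None


def pivot(values: List[ObservedValue]) -> Dict[str, Dict[str, str]]:
    """Turn long-format observations into target -> {feature: value}.

    Later observations of the same target/feature overwrite earlier ones.
    """
    table: Dict[str, Dict[str, str]] = defaultdict(dict)
    for ov in values:
        table[ov.target][ov.feature] = ov.value
    return dict(table)


observations = [
    ObservedValue("Ind1", "Height", "179cm"),
    ObservedValue("Ind1", "Sex", "male"),       # hypothetical
    ObservedValue("Panel1", "Type", "RIL"),     # a panel is a target too
]
print(pivot(observations))
```

New features need no schema change: a new kind of measurement is just new rows, which is what makes this mechanism generic.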
XGAP (extension-based variation mechanism)
Swertz et al. (2010) Genome Biology 11(3).
[Figure: XGAP model. A DATA element is a matrix whose rows and columns are dimension elements, either TRAITs or SUBJECTs. TRAIT extensions include PROBE (name, gene, chromosome, locus), MARKER (name, allele, chromosome, locus) and MASSPEAK (name, m/z, retention time); SUBJECT extensions include PANEL (name, type: CSS, RIL, …; parent panels), INDIVIDUAL (name, strain, mother, father, sex) and SAMPLE (name, individual, tissue); and so on.]
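The extension mechanism can be sketched with subclassing: a generic matrix only knows about traits and subjects, and domain types add their own fields. A minimal sketch, assuming rows are traits and columns are subjects; the class names follow the figure, but the code, marker positions and genotype values are hypothetical and not the XGAP implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Trait:                  # dimension element: something measured
    name: str


@dataclass
class Subject:                # dimension element: something measured on
    name: str


@dataclass
class Marker(Trait):          # extension: adds marker-specific fields
    allele: str = ""
    chromosome: str = ""
    locus: str = ""


@dataclass
class Individual(Subject):    # extension: adds individual-specific fields
    strain: str = ""
    sex: str = ""


class DataMatrix:
    """A data element: rows and columns are dimension elements, cells are values."""

    def __init__(self, rows: Sequence[Trait], cols: Sequence[Subject]):
        self.rows = list(rows)
        self.cols = list(cols)
        self.cells: List[List[Optional[str]]] = [
            [None] * len(self.cols) for _ in self.rows]

    @staticmethod
    def _index(elements, name: str) -> int:
        # Raises ValueError if the name is not a dimension of this matrix.
        return [e.name for e in elements].index(name)

    def put(self, row: str, col: str, value: str) -> None:
        self.cells[self._index(self.rows, row)][self._index(self.cols, col)] = value

    def get(self, row: str, col: str) -> Optional[str]:
        return self.cells[self._index(self.rows, row)][self._index(self.cols, col)]


markers = [Marker("m1", chromosome="1", locus="1200"),
           Marker("m2", chromosome="2", locus="5400")]
inds = [Individual("Ind1", strain="RIL1"), Individual("Ind2", strain="RIL2")]
genotypes = DataMatrix(markers, inds)
genotypes.put("m1", "Ind2", "B")
print(genotypes.get("m1", "Ind2"))
```

The matrix code never changes when a new extension such as a mass peak is added; only a new `Trait` subclass is needed.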
ISA-TAB (generic model)
Differs from MAGE-TAB:
• Nested investigations (as studies)
• Templates for assays
• More aligned to FuGE
• But some find it too difficult
ISA =
• Investigation
• Study (an Investigation component)
• Assay (a Study component)
• Data files
Still in the testing phase, though…
http://isatab.sf.net
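The Investigation, Study, Assay, Data files nesting maps naturally onto a tree. A minimal sketch that walks such a tree and lists the files an archive would contain, assuming the ISA-TAB `i_`/`s_`/`a_` file-name prefixes; the investigation, study, assay and data file names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:                  # a component of a Study
    name: str
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:                  # an Investigation component
    name: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    name: str
    studies: List[Study] = field(default_factory=list)


def isa_files(inv: Investigation) -> List[str]:
    """List the files an ISA-TAB archive for `inv` would contain (illustrative names)."""
    files = [f"i_{inv.name}.txt"]
    for study in inv.studies:
        files.append(f"s_{study.name}.txt")
        for assay in study.assays:
            files.append(f"a_{assay.name}.txt")
            files.extend(assay.data_files)
    return files


inv = Investigation("liver", studies=[
    Study("mice", assays=[Assay("transcriptome", ["arrays.cel"]),
                          Assay("metabolome", ["peaks.mzXML"])]),
])
print(isa_files(inv))
```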
MIBBI
• MIBBI: Minimum Information for Biological and Biomedical Investigations (31 areas in total)
http://mibbi.sourceforge.net
Taylor et al. (2008) Nature Biotechnology 26(8), p. 889
MIAME Minimum Information About a Microarray Experiment
MIAPA Minimum Information About a Phylogenetic Analysis
MIAPAR Minimum Information About a Protein Affinity Reagent
MIAPE Minimum Information About a Proteomics Experiment
MIARE Minimum Information About an RNAi Experiment
MIFlowCyt Minimum Information for a Flow Cytometry Experiment
MIGen Minimum Information about a Genotyping Experiment
MIGS Minimum Information about a Genome Sequence
MIMPP Minimal Information for Mouse Phenotyping Procedures
MINSEQE Minimum Information about a high-throughput SeQuencing Experiment
MIPFE Minimal Information for Protein Functional Evaluation
MIQAS Minimal Information for QTLs and Association Studies
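A minimum-information guideline becomes useful in a portal when records can be checked against it before submission. A minimal sketch, assuming a checklist is simply a list of required item names; the `CHECKLIST` contents and the record are hypothetical and do not reproduce the items of MIGen or any other MIBBI module.

```python
from typing import Dict, List

# Hypothetical checklist; a real MIBBI module defines its own items.
CHECKLIST: Dict[str, List[str]] = {
    "genotyping": ["organism", "platform", "sample_id", "protocol"],
}


def missing_items(record: Dict[str, str], checklist: str) -> List[str]:
    """Return the required items that are absent or empty in `record`."""
    return [item for item in CHECKLIST[checklist]
            if not record.get(item, "").strip()]


record = {"organism": "Mus musculus", "platform": "SNP array",
          "sample_id": "S1", "protocol": ""}
print(missing_items(record, "genotyping"))  # ['protocol']
```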
• Connect to R statistics
• Workflow-ready web services
• UML documentation of your model
• Edit & trace your data
• Import/export to Excel
• Plug in your own scripts (OntBrowse)
Tech keywords: object-oriented data models, multi-platform Java, Tomcat/GlassFish web server, MySQL/PostgreSQL database, Eclipse/NetBeans IDE, Java API, WSDL/SOAP API, R-project API, MVC, FreeMarker templates and CSS for custom layout, open source.
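‘Automatic generation of services’ from an object-oriented data model can be illustrated by generating database tables from a model description. A minimal sketch, assuming a model is a mapping from entity names to typed fields; the `MODEL` format, type names and `generate_ddl` helper are hypothetical and are not the generator behind the features listed above.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical model description: entity -> [(field name, field type)].
MODEL: Dict[str, List[Tuple[str, str]]] = {
    "Individual": [("name", "string"), ("sex", "enum")],
    "ObservedValue": [("target", "xref"), ("feature", "xref"), ("value", "string")],
}

# Illustrative mapping from model field types to SQL column types.
# An "xref" becomes a plain integer column here; a fuller generator would
# also emit a FOREIGN KEY constraint.
SQL_TYPES = {"string": "VARCHAR(255)", "enum": "VARCHAR(32)", "xref": "INTEGER"}


def generate_ddl(model: Dict[str, List[Tuple[str, str]]]) -> str:
    """Generate one CREATE TABLE statement per entity."""
    statements = []
    for entity, fields in model.items():
        columns = ["  id INTEGER PRIMARY KEY"]
        columns += [f"  {name} {SQL_TYPES[ftype]}" for name, ftype in fields]
        statements.append(f"CREATE TABLE {entity} (\n" + ",\n".join(columns) + "\n);")
    return "\n\n".join(statements)


ddl = generate_ddl(MODEL)
print(ddl)

# Check that the generated schema is accepted by a real database engine.
conn = sqlite3.connect(":memory:")
conn.executescript(ddl)
```

The same model description could drive further generators (web services, an R API, import/export), which is what keeps those layers consistent with each other.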
find.investigation()                 # 102 downloaded
obs <- find.observedvalue(…)         # 43,920 downloaded
# some calculation
add.inferredvalue(res)               # 36 added
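The R session above follows a download, compute, upload pattern: fetch observed values, derive something, and store the result as inferred values. A minimal Python sketch of that pattern, assuming the calculation is averaging repeated measurements per target; `InMemoryStore` is a hypothetical stand-in, not the R API shown above, and the repeat measurement and Ind2 value are invented.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Row = Tuple[str, str, float]   # (target, feature, value)


class InMemoryStore:
    """Hypothetical stand-in for a study database."""

    def __init__(self, observed: List[Row]):
        self.observed = observed
        self.inferred: List[Row] = []

    def find_observedvalue(self, feature: str) -> List[Row]:
        return [row for row in self.observed if row[1] == feature]

    def add_inferredvalue(self, rows: List[Row]) -> int:
        self.inferred.extend(rows)
        return len(rows)


store = InMemoryStore([("Ind1", "Height", 179.0),
                       ("Ind1", "Height", 181.0),   # hypothetical repeat measurement
                       ("Ind2", "Height", 170.0)])  # hypothetical

obs = store.find_observedvalue("Height")

# "Some calculation": average repeated measurements per target.
by_target: Dict[str, List[float]] = defaultdict(list)
for target, _feature, value in obs:
    by_target[target].append(value)
res = [(target, "MeanHeight", mean(values))
       for target, values in sorted(by_target.items())]

print(store.add_inferredvalue(res), "added")   # prints "2 added"
```

Keeping derived results as inferred values, separate from the observed ones, preserves the distinction between measurements and analysis output.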
3. User interfaces
3. User interfaces (import wizards)
• http://www.obofoundry.org/
• http://bioportal.bioontology.org/ (REST services)
• http://www.ebi.ac.uk/ontology-lookup/ (SOAP services)
• http://ontocat.sf.net (simple API around BioPortal)
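An import wizard can use ontology terms to suggest annotations for the columns of an uploaded spreadsheet. A minimal sketch, assuming the terms and synonyms have already been fetched into a local list; the term IDs are placeholders (not real ontology identifiers), and the matching rule (case-insensitive, underscores as spaces) is a simple assumption rather than how any of the services above match terms.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical term list with placeholder IDs (not real ontology terms).
TERMS: List[Tuple[str, str, List[str]]] = [
    ("EX:0001", "body height", ["height", "stature"]),
    ("EX:0002", "body weight", ["weight", "body mass"]),
]


def normalise(text: str) -> str:
    """Lower-case, treat underscores as spaces, and collapse whitespace."""
    return " ".join(text.lower().replace("_", " ").split())


def build_index(terms: List[Tuple[str, str, List[str]]]) -> Dict[str, str]:
    index: Dict[str, str] = {}
    for term_id, label, synonyms in terms:
        for name in [label, *synonyms]:
            index[normalise(name)] = term_id
    return index


def suggest_terms(headers: List[str], index: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each spreadsheet column header to a term ID, or None if unknown."""
    return {header: index.get(normalise(header)) for header in headers}


index = build_index(TERMS)
print(suggest_terms(["Height", "Body_Mass", "Tissue"], index))
# {'Height': 'EX:0001', 'Body_Mass': 'EX:0002', 'Tissue': None}
```

Unmatched columns (None) are the ones a wizard would hand to the user, or to a remote ontology search, for manual annotation.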
Things to discuss as next steps?
Put all people/tools in this room on the table
• Agree on exchange formats & models (generic/specific)
• Test drive data exchange or even federation
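A first test drive of data exchange can be as simple as checking that one site's export re-imports losslessly at another site. A minimal sketch, assuming a tab-delimited format with three agreed columns; the `FIELDS` list and both helper functions are hypothetical, not an agreed NBIC format.

```python
import csv
import io
from typing import Dict, List

FIELDS = ["target", "feature", "value"]   # hypothetical agreed exchange columns


def export_tsv(rows: List[Dict[str, str]]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def import_tsv(text: str) -> List[Dict[str, str]]:
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    # Reject files that do not carry the agreed columns.
    missing = set(FIELDS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [dict(row) for row in reader]


site_a = [{"target": "Ind1", "feature": "Height", "value": "179cm"}]
site_b = import_tsv(export_tsv(site_a))
assert site_b == site_a, "round trip lost information"
```

A round-trip check like this is a cheap regression test once two systems agree on a format, before attempting full federation.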
Share the work
• Communicate requirements and plans
• Reuse each other's user interface components
• Share scalable back-ends (for high throughput data)
Invest in technology interoperation
• Invest in Galaxy callback to MOLGENIS/Grails (data chooser)?
• Invest in a MOLGENIS to Grails generator (must be easy)?
Something for the NBIC management team to think about