Building dbNP the nutritional phenotype database · 2010-02-08 · the European Nutrigenomics...
the European Nutrigenomics Organisation
NuGO
Building dbNP: the nutritional phenotype database
A real data structure for systems biology
Chris Evelo
Department of Bioinformatics - BiGCaT
Maastricht University
The NBX network
1. Small server network (nbx)
2. Data-Grid
3. Nutritional Phenotype Database (dbNP)
4. Genomics pipelines
5. Statistical tools
6. Pathway analysis
7. Pathway development
8. Systems Biology
NBX network
BioLinux 5.0 repository at NEBC
NuGO NBX repositories, Potsdam, Maastricht
Ubuntu 8.04
Included in BioLinux
e.g. http://nbx1.nugo.org
1. Web access through GenePattern, the Broad Institute's interface for biologists to bioinformatics.
2. A NuGO desktop for interactive analysis.
The NuGO Desktop
Also for:
• Yet another nbx (small server)
• A workstation (PC)
• Memory stick (when needed)
The Data-Grid
Transparent data sharing
Allows you to:
• Access shared data on any nbx without knowing where it is.
• Share data with other NuGO members
• Use NuGOnet to give access (LDAP)
Contains PPS and focus team areas
The overall structure
[Figure: the overall structure of dbNP. Components shown: research question; study subset selection; query interface; study metadata db (study design, samples); analysis Tech X; Technology X raw data db; Technology X data processing; Technology X clean data db; identifier mapping (BridgeDB); Pathway & GO and Technology X profiles; data selection for bioinformatics; statistical toolbox; bioinformatics toolbox; research answer.]
The basic structure
[Figure: study design, sample analysis, clean data db, study metadata db and data analysis, connected by steps numbered 1 to 5.]
1) Protocols are stored in the study metadata database.
2) Analytical procedures are carried out on study samples.
3) The results are processed into ‘clean data’.
4) By interrogating the study metadata database, data subsets of multiple studies can be selected
5) and analyzed by statistical and bioinformatics tools.
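The five steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table names, columns, study identifiers and values are all hypothetical and do not reflect the actual dbNP data model. It shows study metadata and clean data held separately, a metadata query that selects a subset across two studies, and a trivial statistic computed on that subset.

```python
import sqlite3
from statistics import mean

# Hypothetical, simplified schema; the real dbNP data model is not shown here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE study (study_id TEXT PRIMARY KEY, design TEXT, protocol TEXT);
    CREATE TABLE sample (sample_id TEXT PRIMARY KEY, study_id TEXT,
                         tissue TEXT, diet TEXT);
    CREATE TABLE clean_data (sample_id TEXT, feature TEXT, value REAL);
""")

# Step 1: protocols and study design go into the study metadata database.
conn.executemany("INSERT INTO study VALUES (?, ?, ?)", [
    ("S1", "high fat vs low fat", "protocol text for S1"),
    ("S2", "high fat vs low fat", "protocol text for S2"),
])
conn.executemany("INSERT INTO sample VALUES (?, ?, ?, ?)", [
    ("S1-a", "S1", "liver", "high fat"),
    ("S1-b", "S1", "liver", "low fat"),
    ("S2-a", "S2", "liver", "high fat"),
    ("S2-b", "S2", "muscle", "high fat"),
])

# Steps 2 and 3: sample analysis and processing yield clean data (invented values).
conn.executemany("INSERT INTO clean_data VALUES (?, ?, ?)", [
    ("S1-a", "geneX", 2.0),
    ("S1-b", "geneX", 1.0),
    ("S2-a", "geneX", 4.0),
    ("S2-b", "geneX", 8.0),
])


def select_subset(tissue, diet):
    """Step 4: interrogate the metadata to pick samples across studies."""
    rows = conn.execute(
        """SELECT s.study_id, s.sample_id, c.feature, c.value
           FROM sample s JOIN clean_data c ON c.sample_id = s.sample_id
           WHERE s.tissue = ? AND s.diet = ?
           ORDER BY s.sample_id""",
        (tissue, diet),
    )
    return rows.fetchall()


# Step 5: hand the selected subset to a (trivial) statistical tool.
subset = select_subset("liver", "high fat")
mean_value = mean(value for _, _, _, value in subset)
print(subset)
print(mean_value)
```

Keeping the metadata apart from the measurements is what allows one query to span several studies without touching the measurement tables' layout.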
The food intake module
[Figure: the basic structure (steps 1 to 5) extended with food consumption and food analysis, steps 6 to 9.]
6) Food consumption is captured
7) and stored in the clean data database.
8) Foods are analysed
9) and food composition data are stored in the clean data database.
2) The food metabolome is analysed in biofluid samples.
5) Statistical and bioinformatic analyses of the food metabolome data are used to assess food intake or compliance with the dietary intervention.
Why Pathway Analysis?
Intuitive to Biologists
• Provide biological context for results
• More efficient than searching databases gene-by-gene
Computation on Pathway Content
• Analyze over-representation of changed genes or metabolites
NBX Pathway tools
• PathVisio: open source, Maastricht/San Francisco, general pathway tool, statistics, metabolomics, plugin architecture.
• Eu.Gene: Florence, pathway statistics, large pathway collection, uses PathVisio.
• Metacore: commercial tool, interesting content.
• WikiPathways: pathway creation, web services.
• Cytoscape: open source, network analysis, plugins for PathVisio/WikiPathways.
Biological Pathways
PathVisio
• Visualize data on biological pathways
• It can use gene expression, proteomics and metabolomics data
• Identify significantly changed processes
www.pathvisio.org
Martijn P van Iersel, Thomas Kelder, Alexander R Pico, Kristina Hanspers,
Susan Coort, Bruce R Conklin, Chris Evelo (2008) Presenting and
exploring biological pathways with PathVisio. BMC Bioinformatics 9: 399
Pathway Analysis: Z-score
2-dimensional pathway profile vector:
PID1, zscore 1
PID2, zscore 2
…
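The slide does not spell out how the z-score is computed. As a minimal sketch, assuming the hypergeometric-based z-score commonly used in MAPPFinder-style over-representation analysis, a pathway profile vector could be built as follows. The function name, the pathway counts and the totals are hypothetical.

```python
import math


def pathway_zscore(r, n, R, N):
    """Over-representation z-score for one pathway.

    r: changed genes in the pathway, n: measured genes in the pathway,
    R: changed genes overall,        N: measured genes overall.
    """
    expected = n * R / N
    variance = n * (R / N) * (1 - R / N) * (1 - (n - 1) / (N - 1))
    if variance == 0:
        return 0.0
    return (r - expected) / math.sqrt(variance)


# Invented counts: (changed genes in pathway, measured genes in pathway).
pathway_counts = {"PID1": (15, 50), "PID2": (5, 50)}
total_changed, total_measured = 100, 1000

# The pathway profile vector: one (pathway id, z-score) pair per pathway.
profile = sorted(
    ((pid, pathway_zscore(r, n, total_changed, total_measured))
     for pid, (r, n) in pathway_counts.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for pid, z in profile:
    print(pid, round(z, 2))
```

A pathway whose number of changed genes equals the expected number gets a z-score of 0; positive values indicate more changed genes than expected by chance.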
WikiPathways
• Public resource for biological pathways
• Anyone can contribute and curate
• More up-to-date representation of biological knowledge
WikiPathways: Pathway Editing for the People. Alexander R. Pico, Thomas Kelder, Martijn P. van Iersel, Kristina Hanspers, Bruce R. Conklin, Chris Evelo. PLoS Biology 2008: 6: 7. e184
Commentaries: Big data: Wikiomics. Mitch Waldrop. Nature 2008: 455, 22-25
We the curators. Allison Doerr. Nature Methods 2008: 5, 754 - 755
[Screenshots of a WikiPathways pathway page: Download; tutorials are in Help; revision history: select two versions and click to open the diff tool.]
Species
• We can add any species that is in ENSEMBL
• Or anything where you create an ENSEMBL-like DB
• Just ask!
Metabolomics
Visualization and statistics fully supported
HMDB
ChEBI
NuGOwiki
Automatic content generation
GO, KEGG and BioPAX: converters available
- GO analysis in R (through web services), visualisation in tables.
- KEGG full downloadable set
- BioPAX: interesting for Reactome round trip
Assisted content generation
Suggestions from:
• HMDB
• KEGG
• BIND/IntAct
• Text mining (Phasar, EBI-tools)
• WikiPathways itself
• Extendible…
Portals
Need one?
Web service
Mining biological pathways using WikiPathways web services. Thomas Kelder, Alexander R Pico, Kristina Hanspers, Martijn P van Iersel, Chris Evelo, Bruce R Conklin. PLoS One 2009: 4:7 07
Webservice example
Integrates data from
ArrayExpress Atlas
with WikiPathways pathways
Cytoscape plugin
Search and open pathways
from WikiPathways directly in
Cytoscape.
R Example
• SSOAP: http://www.omegahat.org/SSOAP/
GSEA in R on WikiPathways pathways using web services
Credit system
A real life example
Using Pathways in Networks
The NuGO PPS1 for Dummies
• Mice
• High fat vs low fat diet
• Samples taken before and after treatment
• Three tissues: liver, muscle and white adipose
Distribution of genes vs. pathways
[Figure panels: genes with q <= 0.01; pathways with z >= 2]
Cytoscape visualization
• Network visualization
• Data visualization
– Z score for pathways
– Expression profile for reporter nodes
[Figure legend: pathway nodes (A, B) and significant reporter nodes (G1, G2, G3).
G1, G3: present in pathway A
G2: present in pathway A and B
Pathway nodes: higher z-score. Reporter nodes: Log2(tn/t0): [t1, t6, t9, t12]]
Liver, all pathways
Pathways with a high z-score group together.
This explains why there are relatively few significant genes, but many pathways with a high z-score.
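The network from the legend can be sketched as a small bipartite structure. This is an illustration only: the pathway membership follows the legend (G1 and G3 in pathway A, G2 in pathways A and B), while the expression values and the helper name are invented. It shows how one reporter gene shared by two pathways links them, and how a Log2(tn/t0) profile is computed.

```python
import math
from collections import defaultdict
from itertools import combinations

# Pathway membership as given in the figure legend.
pathway_members = {"A": {"G1", "G2", "G3"}, "B": {"G2"}}

# Bipartite edges: one edge per (pathway, reporter gene) pair.
edges = sorted((p, g) for p, genes in pathway_members.items() for g in genes)

# Invert the membership to see which pathways each reporter belongs to.
gene_to_pathways = defaultdict(set)
for pathway, gene in edges:
    gene_to_pathways[gene].add(pathway)

# Pathways that share a reporter end up connected in the network.
linked = defaultdict(set)
for gene, pathways in gene_to_pathways.items():
    for p, q in combinations(sorted(pathways), 2):
        linked[(p, q)].add(gene)


def log2_profile(t0, later):
    """Log2(tn/t0) for each later time point tn."""
    return [math.log2(tn / t0) for tn in later]


# Invented expression values for one reporter at t0 and at t1, t6, t9, t12.
profile_g2 = log2_profile(100.0, [200.0, 400.0, 100.0, 50.0])

print(edges)
print(dict(linked))
print(profile_g2)
```

Because a single significant reporter can sit in several pathways, a small set of significant genes can still push many connected pathways to a high z-score.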
Can we also?
We can show metabolites and do pathway statistics.
Can we also work with fluxes and integrate dynamic modelling?
Not yet! SBML integration started (hard).
Flux representation planned (easier).
Can we also?
We can show gene products.
Can we also combine them with sequence-based regulatory information (e.g. TF binding)?
Yes! Direct connection to Cytoscape, many data combinations possible (but needs coding).
Can we also?
We can show gene products.
Can we also show SNPs?
Yes! Well… almost ;-) BridgeDB connections allow this.
But… What do you want to see?
Integrated Pipelines
• QC
• Normalisation
• Data treatment
• Statistics
• …
Most often done in R/Bioconductor
Using NuGO R-servers or Grid
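The idea of an integrated pipeline, a fixed chain of steps applied to every data set, can be sketched as below. The slides say this is most often done in R/Bioconductor; the Python version here is only an illustration, and the individual steps (dropping samples with missing values, median centring, a per-feature mean) are simple stand-ins chosen as assumptions, not the actual NuGO procedures.

```python
from statistics import mean, median


def quality_control(samples):
    """QC stand-in: drop samples that contain missing values (None)."""
    return {name: values for name, values in samples.items()
            if None not in values}


def normalise(samples):
    """Normalisation stand-in: median-centre each sample."""
    centred = {}
    for name, values in samples.items():
        centre = median(values)
        centred[name] = [v - centre for v in values]
    return centred


def summarise(samples):
    """Statistics stand-in: per-feature mean across the remaining samples."""
    return [mean(column) for column in zip(*samples.values())]


def run_pipeline(data, steps):
    """Apply the steps in order, feeding each result into the next step."""
    for step in steps:
        data = step(data)
    return data


# Invented raw values: three samples with three features each.
raw = {
    "s1": [1.0, 2.0, 3.0],
    "s2": [2.0, 4.0, 6.0],
    "s3": [1.0, None, 2.0],
}
result = run_pipeline(raw, [quality_control, normalise, summarise])
print(result)
```

Expressing the pipeline as an ordered list of functions makes it straightforward to run the same sequence on a local server or to distribute it over a grid.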
BridgeDb
Problem: Identifier Mapping
Affymetrix probeset 100234_at ↔ Entrez Gene 3643 ?
Solution: Conversion tools
Problem: Usability
• Check for double IDs
• Check for missing IDs
• Only 1000 at once
• Check alignment of Excel columns
• Manual
• Error-prone
Solution: Built-in Mapping
• Generic bioinformatics platforms should have identifier mapping built-in.
BioConductor
PathVisio
Cytoscape
... Batteries included
Solution: Built-in Mapping
Entrez Gene 3643 ↔ mapping service ↔ Affymetrix probeset 100234_at
Problem: Which mapping service?
• Synergizer
• EnsMart
• DAVID
• CRONOS
• AliasServer
• MatchMiner
• OntoTranslate
Solution: Abstraction Layer
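What an abstraction layer buys can be shown with a small sketch. This is not the BridgeDb API: the class and method names are hypothetical, and the single mapping row simply reuses the identifier pair from the slide as example data. Tools talk to one interface, and the concrete mapping service behind it can be swapped or chained.

```python
from abc import ABC, abstractmethod


class IdMapper(ABC):
    """Hypothetical common interface that every mapping backend implements."""

    @abstractmethod
    def map_id(self, source, identifier, target):
        """Return the set of target identifiers for one source identifier."""


class TableMapper(IdMapper):
    """Backend built from a local table of (system, id, system, id) rows."""

    def __init__(self, rows):
        self._index = {}
        for src_sys, src_id, tgt_sys, tgt_id in rows:
            # Index both directions so the table can be queried either way.
            self._index.setdefault((src_sys, src_id, tgt_sys), set()).add(tgt_id)
            self._index.setdefault((tgt_sys, tgt_id, src_sys), set()).add(src_id)

    def map_id(self, source, identifier, target):
        return set(self._index.get((source, identifier, target), set()))


class FallbackMapper(IdMapper):
    """Tries several backends in order and returns the first non-empty answer."""

    def __init__(self, mappers):
        self._mappers = list(mappers)

    def map_id(self, source, identifier, target):
        for mapper in self._mappers:
            result = mapper.map_id(source, identifier, target)
            if result:
                return result
        return set()


def annotate(probesets, mapper):
    """Tool code depends only on the interface, not on a specific service."""
    return {p: mapper.map_id("Affymetrix", p, "Entrez Gene") for p in probesets}


# Example data: the identifier pair shown on the slide, plus an empty backend.
local = TableMapper([("Affymetrix", "100234_at", "Entrez Gene", "3643")])
mapper = FallbackMapper([TableMapper([]), local])
annotation = annotate(["100234_at", "unknown_at"], mapper)
print(annotation)
```

Replacing `local` with a backend that calls a remote mapping service would not require any change to `annotate`, which is the point of the layer.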
Or, from a user perspective:
[Figure: dbNP from a user perspective. Labels shown:
Biological information: genome info, proteome info, metabolite info
Nutrition information: food composition, food metabolome, intake
Bioinformatics toolbox: Genepattern, R, PathVisio, Cytoscape
Generic data storage: GEO, ArrayExpress, proteome db, metabolite db
Specific db's: SelenoDB, HuGE-net, Polyphenol db
Knowledge sources: WikiPathways, Semantic Web, PubMed]
Semantic web integration
Literature, text mining
External databases, data mining
Concept store, triple store
dbNP, analytical tools
Knowledge integration allows systems biology approaches
www.dbNP.org
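For readers unfamiliar with triple stores, the sketch below shows the basic idea in plain Python. It is a toy illustration only, not the planned dbNP implementation: the subjects, predicates and objects are invented. Facts from different sources are stored as (subject, predicate, object) triples, and integration amounts to following links across them.

```python
# Hypothetical facts; identifiers and predicates are invented for illustration.
triples = {
    ("study1", "measures", "geneX"),
    ("study1", "measures", "geneY"),
    ("geneX", "part_of", "pathwayA"),
    ("geneY", "part_of", "pathwayB"),
    ("pathwayA", "described_in", "paper1"),
}


def match(subject=None, predicate=None, obj=None):
    """Return all triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


def pathways_for_study(study):
    """Follow two links: study -> measured gene -> pathway."""
    genes = [o for _, _, o in match(subject=study, predicate="measures")]
    return sorted({o for gene in genes
                   for _, _, o in match(subject=gene, predicate="part_of")})


print(pathways_for_study("study1"))
```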
Available
• NBXs
• DataGrid
• Toolbox (R, Cytoscape, PathVisio, Eu.Gene, etc.)
• BridgeDB
• Data processing pipelines
– Affymetrix
– Two colour
– ChIP and Methylation arrays
– (miRNA analysis)
• Environment (Grails) chosen
Being done
• Specifications (on wiki)
• Data format (ISA-Tab +)
• Capturing tool
• Data structure
• Importing SNP data and representing
Many people involved
Challenges of molecular nutrition research 6:
The Nutritional Phenotype database to store, share and
evaluate nutritional systems biology studies.
Ben van Ommen, Jildau Bouwman, Lars Dragsted,
Christian A. Drevon, Ruan Elliott, Philip de Groot, Jim
Kaput, John C. Mathers, Michael Müller, Fre Pepping, Jahn
Saito, Augustin Scalbert, Marijana Radonjic, Philippe
Rocca-Serra, Tony Travis, Suzan Wopereis and
Chris Evelo. Genes and Nutrition in press
Developers on dbNP.org
Many more to come…