Daehee Hwang and Leroy Hood, Institute for Systems Biology
2 Why Prequips for Systems Biology with proteomic data?
• Need for visualization, analysis, and integration of multiple proteomic datasets: raw data level, peptide level, protein level, multi-sample analysis
• Need for an interface between proteomic data and systems biology analytical tools such as network/pathway analyses
3 Integration of proteomic data at various levels
Raw Data (MS, MS/MS)
Peptide ID + Quantitation
Protein ID + Quantitation
?
Trans-Proteomic Pipeline
Communication not possible!
4 Pep3D: Quality Assessment
Prequips
Multi Sample
Raw Data (MS, MS/MS)
Peptide ID + Quantitation
Protein ID + Quantitation
?
Trans-Proteomic Pipeline
Pep3D
Properties
- quality assessment
- 2D gel-like visualization
Gaggle
Network Analysis: Cytoscape
Interaction Database: STRING
Pathway Database: KEGG
Microarray Data Analysis: Mayday, TIGR
5 Pep3D: Quality Assessment
Pep3D Instance 1
Pep3D Instance 2
Communication not possible!
6 Interface to Systems Biology
Gaggle
Network Analysis: Cytoscape
Interaction Database: STRING
Pathway Database: KEGG
Microarray Data Analysis: Mayday, TIGR
Raw Data (MS, MS/MS)
Peptide ID + Quantitation
Protein ID + Quantitation
?
Trans-Proteomic Pipeline
Communication not possible!
7 Prequips Overview
Prequips
Multi Sample
Gaggle
Network Analysis: Cytoscape
Interaction Database: STRING
Pathway Database: KEGG
Microarray Data Analysis: Mayday, TIGR
Key Properties
- handles multiple samples at all levels
- integrates high-level analysis tools
- is extensible
Raw Data (MS, MS/MS)
Peptide ID + Quantitation
Protein ID + Quantitation
?
Trans-Proteomic Pipeline
8 Integration of proteomic datasets at various levels
Trans-Proteomic Pipeline
Mass Spectrometer
raw data (e.g. mzXML, mzData, ...)
Database Search
peptide-level data (e.g. pepXML, AnalysisXML, ...)
Validation
Peptide Quantification
Protein Inference
Protein Quantitation
protein-level data (e.g. protXML, ...)
annotation
further analysis results
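The three data levels and their example file formats can be written down as a small lookup. This is only an illustrative Python sketch, not Prequips code (Prequips itself is implemented in Java): the format names are taken from the slides, while `Level`, `FORMAT_LEVELS` and `level_for_format` are hypothetical names introduced here.

```python
from enum import Enum


class Level(Enum):
    """The three data levels named on the slide."""
    RAW = "raw data"
    PEPTIDE = "peptide-level data"
    PROTEIN = "protein-level data"


# Format names come from the slides; the mapping table itself is an assumption.
FORMAT_LEVELS = {
    "mzxml": Level.RAW,
    "mzdata": Level.RAW,
    "pepxml": Level.PEPTIDE,
    "analysisxml": Level.PEPTIDE,
    "protxml": Level.PROTEIN,
}


def level_for_format(format_name: str) -> Level:
    """Return the data level for a format name, ignoring case."""
    try:
        return FORMAT_LEVELS[format_name.lower()]
    except KeyError:
        raise ValueError(f"unknown proteomics format: {format_name!r}") from None
```

A tool that integrates all levels would presumably dispatch on something like this before choosing a reader for a file.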
9 Data model
Raw Data, Peptide Level, Protein Level (each with Core and Meta information)
Single-Sample Analysis
Multi-Sample Analysis
Project
Data Providers
Data Structures
protein-level data source, e.g. protXML files
peptide-level data source, e.g. pepXML, dta or AnalysisXML files
raw data level, e.g. mzXML or mzData files
Viewers
Perspectives
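One way to picture the data model is as nested records: a project holds single-sample analyses, each analysis holds elements at the three levels, and each element carries core data plus open-ended meta information. The Python sketch below is an assumption-laden illustration, not the Prequips implementation; all class and field names (`Element`, `SingleSampleAnalysis`, `Project`, `protein_ids_across_samples`) and the protein ids are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Element:
    """One item at any level: core data plus open-ended meta information."""
    core: Dict[str, Any]
    meta: Dict[str, Any] = field(default_factory=dict)


@dataclass
class SingleSampleAnalysis:
    """All three data levels for one sample."""
    name: str
    raw: List[Element] = field(default_factory=list)
    peptides: List[Element] = field(default_factory=list)
    proteins: List[Element] = field(default_factory=list)


@dataclass
class Project:
    """A project groups analyses and offers a multi-sample view."""
    name: str
    analyses: List[SingleSampleAnalysis] = field(default_factory=list)

    def protein_ids_across_samples(self) -> Dict[str, List[str]]:
        """Map each protein id to the names of the samples that contain it."""
        seen: Dict[str, List[str]] = {}
        for analysis in self.analyses:
            for protein in analysis.proteins:
                seen.setdefault(protein.core["id"], []).append(analysis.name)
        return seen


# Toy usage with made-up protein ids.
project = Project("demo", [
    SingleSampleAnalysis("Mock1", proteins=[Element({"id": "P1"}), Element({"id": "P2"})]),
    SingleSampleAnalysis("Mock2", proteins=[Element({"id": "P1"})]),
])
by_protein = project.protein_ids_across_samples()
```

Keeping meta information in a separate open dictionary mirrors the slide's split between Core and Meta: new analysis results can be attached without changing the core structure.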
10 Case Study: Toponomic change in drug-treated Mø
Proteins: Calreticulin, BiP, Bcl2, ATPase, Lamp1
Fraction #: 2 4 6 8 10 12 14 16 18 20 (8% to 28%)
iTRAQ channels: 114 115 116 117
Samples: Mock1, Mock2, Thapsigargin
11 Visualization: Single exp.
CID spectra that have been selected
detailed information about one of the level 2 spectra
project manager
peak map for run 29
level 1 spectrum & corresponding CID spectra
level 1
level 2
level 2
all scans of Mock 1 experiment
12 Visualization: Multiple exps.
(polymer?) contamination in all 4 runs (this would be hard to see with Pep3D)
green = 0, red = 1
13 Visualization: assess, quantify, etc.
Mock-up (software is under development):
Axes: m/z (min to max), retention time (min to max)
map 1, map 2, map 3, map 4, map 5, map 6
map 1, map 2, map 3, map 4 (with X marks in the mock-up)
Doesn’t really match the remaining 3 maps!
14 Prequips & the Gaggle
Gaggle Boss
Prequips
Mayday
R statistical environment
Cytoscape
Exchange of data structures such as name lists, lists of name-value pairs, matrices and networks.
KEGG, DAVID
Browser
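The four exchanged data structures and the hub-style relay through the Gaggle Boss can be sketched as follows. This is a toy Python illustration under stated assumptions and does not use the real Gaggle API: `ToyBoss`, `register`, `broadcast`, `Matrix` and `Network` are hypothetical names, and the real system connects separate applications rather than in-process callbacks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

# The four kinds of message named on the slide.
NameList = List[str]
NameValuePairs = List[Tuple[str, float]]


@dataclass
class Matrix:
    row_names: List[str]
    column_names: List[str]
    values: List[List[float]]


@dataclass
class Network:
    nodes: List[str]
    edges: List[Tuple[str, str]]


Message = Union[NameList, NameValuePairs, Matrix, Network]
Handler = Callable[[str, Message], None]


class ToyBoss:
    """Hypothetical stand-in for the Gaggle Boss: relays messages between tools."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, tool_name: str, handler: Handler) -> None:
        self._handlers[tool_name] = handler

    def broadcast(self, sender: str, message: Message) -> None:
        # Deliver to every registered tool except the sender itself.
        for tool_name, handler in self._handlers.items():
            if tool_name != sender:
                handler(sender, message)


# Toy usage: Prequips broadcasts a name list and Cytoscape receives it.
received = []
boss = ToyBoss()
boss.register("Prequips", lambda sender, message: None)
boss.register("Cytoscape", lambda sender, message: received.append((sender, message)))
boss.broadcast("Prequips", ["calreticulin"])
```

The point of the hub design is that each tool only needs to talk to the boss, so adding a tool does not require changing the others.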
15 Mayday
16 Cytoscape
overall mouse protein/protein interaction map in Cytoscape
17 Analysis: Feature extraction
Protein table
Gaggle plugin for interaction with other tools
Filters
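Filtering a protein table before broadcasting a selection can be pictured as composing simple row predicates. The Python sketch below is an illustration only: the column names (`probability`, `ratio`), the thresholds and the rows are made up, and the slides do not say which filters Prequips actually provides.

```python
from typing import Any, Callable, Dict, List

ProteinRow = Dict[str, Any]
RowFilter = Callable[[ProteinRow], bool]


def min_probability(threshold: float) -> RowFilter:
    """Keep rows whose identification probability is at least `threshold`."""
    return lambda row: float(row["probability"]) >= threshold


def ratio_outside(low: float, high: float, column: str) -> RowFilter:
    """Keep rows whose ratio in `column` lies outside the interval [low, high]."""
    return lambda row: not (low <= float(row[column]) <= high)


def apply_filters(rows: List[ProteinRow], filters: List[RowFilter]) -> List[ProteinRow]:
    """Return the rows that pass every filter."""
    return [row for row in rows if all(f(row) for f in filters)]


# Toy table with made-up values.
rows = [
    {"name": "P1", "probability": 0.99, "ratio": 2.5},
    {"name": "P2", "probability": 0.50, "ratio": 3.0},
    {"name": "P3", "probability": 0.95, "ratio": 1.1},
]
selected = apply_filters(rows, [min_probability(0.9), ratio_outside(0.5, 2.0, "ratio")])
names = [row["name"] for row in selected]
```

The resulting `names` list has the shape of a name list, which is one of the data structures the Gaggle exchanges.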
18 Analysis: Feature extraction
Gaggle plugin: selection for broadcast
calreticulin
19 Analysis: Feature selection
Mock1 Mock2 Thapsigargin
20 Broadcast to Gaggle
21 Prequips to Gaggle
22 Gaggle Boss
23 Gaggle to Cytoscape
24 Integration: Network Analysis
proteasome complex
ribosome large subunit
chaperones
actin filament regulation
Thapsigargin 114 iTRAQ ratio
25 Cytoscape to Prequips
26 Analysis: Feature extraction - Module selection
the ids sent from Cytoscape through the Gaggle
proteasome proteins
27 Prequips & the Gaggle
28 Analysis: Functional enrichment
the proteasome complex is enriched compared to a mouse genome background
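The slides do not say which statistical test produced the enrichment result. A common choice for "enriched compared to a background" is the upper tail of the hypergeometric distribution, sketched here in Python as an assumption rather than as the method actually used; the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(
    background_size: int,
    annotated_in_background: int,
    selection_size: int,
    annotated_in_selection: int,
) -> float:
    """Probability of drawing at least `annotated_in_selection` annotated items.

    Items are drawn without replacement: `selection_size` items from a
    background of `background_size`, of which `annotated_in_background`
    carry the annotation of interest (for example, a complex membership).
    """
    N, K = background_size, annotated_in_background
    n, k = selection_size, annotated_in_selection
    upper = min(K, n)
    # Sum the hypergeometric probabilities for k, k+1, ..., upper.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Small worked case: background of 10 with 3 annotated, selection of 4 with 2 annotated.
p_small = hypergeom_enrichment_p(10, 3, 4, 2)
```

For the small case the sum is (3·21 + 1·7) / 210 = 1/3, which is easy to check by hand. A small p-value for a real selection would suggest the annotation is over-represented relative to the background.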
29 Prequips Summary
Prequips
Multi Sample
Gaggle
Network Analysis: Cytoscape
Interaction Database: STRING
Pathway Database: KEGG
Microarray Data Analysis: Mayday, TIGR
Key Properties
- handles multiple samples at all levels
- integrates high-level analysis tools
- is extensible
Raw Data (MS, MS/MS)
Peptide ID + Quantitation
Protein ID + Quantitation
?
Trans-Proteomic Pipeline
30 Conclusion
• General and extensible software for systems biology research with proteomics mass spectrometry data.
• Capability to integrate data from various sources for visualization and analysis.
• An interactive environment that supports (visual) data exploration.
31 Software details
• implemented in Java
• based on Eclipse Rich Client Platform
• extremely modular architecture
• multiple plugin interfaces
– e.g. viewers, data providers, algorithms
• meta information framework
– analysis results, sequence information, annotation, ...
– data structures as plugins
– requirement to support future analytical tools and data sources
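The idea of several plugin interfaces (viewers, data providers, algorithms) can be illustrated with a minimal registry keyed by extension point. Prequips is written in Java on the Eclipse Rich Client Platform, so the Python sketch below is only a loose analogy under stated assumptions: `PluginRegistry`, the extension-point names and the two example plugin classes are all hypothetical.

```python
from typing import Any, Callable, Dict, List


class PluginRegistry:
    """Hypothetical registry that groups plugin factories by extension point."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Dict[str, Callable[..., Any]]] = {}

    def register(self, extension_point: str, name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        """Decorator that records a factory under an extension point and name."""
        def decorator(factory: Callable[..., Any]) -> Callable[..., Any]:
            self._plugins.setdefault(extension_point, {})[name] = factory
            return factory
        return decorator

    def names(self, extension_point: str) -> List[str]:
        """Sorted names of the plugins registered for an extension point."""
        return sorted(self._plugins.get(extension_point, {}))

    def create(self, extension_point: str, name: str, *args: Any, **kwargs: Any) -> Any:
        """Instantiate a registered plugin by calling its factory."""
        return self._plugins[extension_point][name](*args, **kwargs)


registry = PluginRegistry()


@registry.register("data_provider", "protXML")
class ProtXmlProvider:
    level = "protein"


@registry.register("viewer", "peak map")
class PeakMapViewer:
    level = "raw"


provider = registry.create("data_provider", "protXML")
```

Because the core only looks plugins up by extension point and name, support for a new file format or viewer can be added without touching existing code, which is the property the slide calls extensible.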
32 Acknowledgements
• Special thanks to Nils Gehlenborg
• Hood Lab: Inyoul Lee
• Kay Nieselt
• Aebersold Lab: Nichole King, James Eddes, Eric Deutsch, Ning Zhang, David Shteynberg, Wei Yan, and Andrew Garbutt
• Paul Shannon for help with the Gaggle
33 Prequips
Core
Mayday
Database
Gaggle
R
Visualization
Excel
PostgreSQL database
MySQL database
R environment, Bioconductor
SBEAMS: SBEAMS installation
Machine Learning: WEKA Library
anything else
34 Cytoscape