Daehee Hwang Leroy Hood Institute for Systems Biology


Transcript of Daehee Hwang Leroy Hood Institute for Systems Biology

Page 1: Daehee Hwang Leroy Hood Institute for Systems Biology

Daehee Hwang, Leroy Hood

Institute for Systems Biology

Page 2: Daehee Hwang Leroy Hood Institute for Systems Biology

Why Prequips for Systems Biology with proteomic data?

• Need for visualization, analysis, and integration of multiple proteomic datasets: raw data level, peptide level, protein level, multi-sample analysis

• Need for an interface between proteomic data and systems biology analytical tools such as network/pathway analyses

Page 3: Daehee Hwang Leroy Hood Institute for Systems Biology

Integration of proteomic data at various levels

[Diagram: three separate Trans-Proteomic Pipeline runs, each with Raw Data (MS, MS/MS), Peptide Id + Quantitation, Protein Id + Quantitation, and an open "?". Communication between them is not possible!]

Page 4: Daehee Hwang Leroy Hood Institute for Systems Biology

Pep3D: Quality Assessment

[Diagram: Prequips (Multi Sample) shown with the Trans-Proteomic Pipeline (Raw Data (MS, MS/MS), Peptide Id + Quantitation, Protein Id + Quantitation, "?") and the Gaggle tools: Network Analysis (Cytoscape), Interaction Database (STRING), Pathway Database (KEGG), Microarray Data Analysis (Mayday, TIGR).]

Pep3D Properties
- quality assessment
- 2D gel-like visualization

Page 5: Daehee Hwang Leroy Hood Institute for Systems Biology

Pep3D: Quality Assessment

[Diagram: Pep3D Instance 1 and Pep3D Instance 2. Communication not possible!]

Page 6: Daehee Hwang Leroy Hood Institute for Systems Biology

Interface to Systems Biology

[Diagram: the Gaggle tools (Network Analysis: Cytoscape; Interaction Database: STRING; Pathway Database: KEGG; Microarray Data Analysis: Mayday, TIGR) next to the Trans-Proteomic Pipeline (Raw Data (MS, MS/MS), Peptide Id + Quantitation, Protein Id + Quantitation, "?"). Communication not possible!]

Page 7: Daehee Hwang Leroy Hood Institute for Systems Biology

Prequips Overview

[Diagram: Prequips (Multi Sample) shown with the Trans-Proteomic Pipeline (Raw Data (MS, MS/MS), Peptide Id + Quantitation, Protein Id + Quantitation, "?") and the Gaggle tools: Network Analysis (Cytoscape), Interaction Database (STRING), Pathway Database (KEGG), Microarray Data Analysis (Mayday, TIGR).]

Key Properties
- handles multiple samples at all levels
- integrates high-level analysis tools
- is extensible

Page 8: Daehee Hwang Leroy Hood Institute for Systems Biology

Integration of proteomic datasets at various levels

[Diagram: Trans-Proteomic Pipeline data flow]
- Mass Spectrometer → raw data (e.g. mzXML, mzData, ...)
- Database Search → peptide-level data (e.g. pepXML, AnalysisXML, ...)
- Validation, Peptide Quantification
- Protein Inference → protein-level data (e.g. protXML, ...)
- Protein Quantitation
- annotation, further analysis results

Page 9: Daehee Hwang Leroy Hood Institute for Systems Biology

Data model

[Diagram labels: Project; Single-Sample Analysis; Multi-Sample Analysis; Data Structures at three levels (Raw Data, Peptide Level, Protein Level), each with Core and Meta parts; Data Providers; Viewers; Perspectives.]

Data Providers
- raw data level, e.g. mzXML or mzData files
- peptide-level data source, e.g. pepXML, dta or AnalysisXML files
- protein-level data source, e.g. protXML files
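
The data model on this slide (a project holding several samples, each with sources at the raw, peptide, and protein level) can be illustrated with a small sketch. This is not Prequips code: the class names `Sample` and `Project`, the function `data_level`, and the suffix-based lookup are all assumptions made for illustration. Only the file-format names come from the slide.

```python
from dataclasses import dataclass, field
from pathlib import Path

# Hypothetical suffix-to-level table; the format names are taken from the
# slide, the idea of dispatching on the file suffix is an assumption.
LEVEL_BY_SUFFIX = {
    ".mzxml": "raw",
    ".mzdata": "raw",
    ".pepxml": "peptide",
    ".dta": "peptide",
    ".protxml": "protein",
}


def data_level(filename: str) -> str:
    """Return 'raw', 'peptide' or 'protein' based only on the file suffix."""
    suffix = Path(filename).suffix.lower()
    if suffix not in LEVEL_BY_SUFFIX:
        raise ValueError(f"no data provider for {filename!r}")
    return LEVEL_BY_SUFFIX[suffix]


def _empty_levels() -> dict:
    return {"raw": [], "peptide": [], "protein": []}


@dataclass
class Sample:
    """One sample with its source files grouped by data level."""
    name: str
    sources: dict = field(default_factory=_empty_levels)

    def add(self, filename: str) -> None:
        self.sources[data_level(filename)].append(filename)


@dataclass
class Project:
    """A project groups samples so they can be compared level by level."""
    samples: dict = field(default_factory=dict)

    def add_file(self, sample_name: str, filename: str) -> None:
        sample = self.samples.setdefault(sample_name, Sample(sample_name))
        sample.add(filename)

    def files_at_level(self, level: str) -> dict:
        """Multi-sample view: the files of one level for every sample."""
        return {name: s.sources[level] for name, s in self.samples.items()}


project = Project()
project.add_file("Mock1", "mock1.mzXML")
project.add_file("Mock1", "mock1.pepXML")
project.add_file("Thapsigargin", "thaps.mzXML")
print(project.files_at_level("raw"))
```

Keeping the level as a key on every sample is what makes a multi-sample query a simple loop over samples; the file names used here are invented.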

Page 10: Daehee Hwang Leroy Hood Institute for Systems Biology

Case Study: Toponomic change in drug-treated Mø

[Figure: marker proteins Calreticulin, BiP, Bcl2, ATPase, and Lamp1 across fraction # 2 to 20 (scale labeled 8% and 28%); iTRAQ labels 114, 115, 116, 117; samples Mock1, Mock2, Thapsigargin.]

Page 11: Daehee Hwang Leroy Hood Institute for Systems Biology

Visualization: Single exp.

[Screenshot annotations:]
- project manager
- peak map for run 29
- all scans of Mock 1 experiment
- level 1 spectrum & corresponding CID spectra (level 1, level 2, level 2)
- CID spectra that have been selected
- detailed information about one of the level 2 spectra

Page 12: Daehee Hwang Leroy Hood Institute for Systems Biology

Visualization: Multiple exps.

[Screenshot: (polymer?) contamination in all 4 runs (this would be hard to see with Pep3D). Color scale: green = 0, red = 1.]

Page 13: Daehee Hwang Leroy Hood Institute for Systems Biology

Visualization: assess, quantify, etc.

Mock-up (software is under development):

[Mock-up: m/z (min to max) against retention time (min to max) for maps 1 to 6; a grid of maps 1 to 4 with X marks and the note "Doesn't really match the remaining 3 maps!"]

Page 14: Daehee Hwang Leroy Hood Institute for Systems Biology

Prequips & the Gaggle

[Diagram: Gaggle Boss connected to Prequips, Mayday, the R statistical environment, Cytoscape, and a browser (KEGG, DAVID).]

Exchange of data structures such as name lists, lists of name-value pairs, matrices and networks.
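
The hub-and-spoke exchange described on this slide can be sketched in a few lines. The code below is a toy model, not the Gaggle API: the classes `Boss` and `Tool`, the method names, and the four payload types are hypothetical stand-ins for the data structures the slide lists (name lists, name-value pairs, matrices, networks).

```python
from dataclasses import dataclass, field


# Toy payload types mirroring the four kinds of data named on the slide.
@dataclass
class NameList:
    names: list


@dataclass
class NameValuePairs:
    values: dict


@dataclass
class Matrix:
    row_names: list
    column_names: list
    rows: list


@dataclass
class Network:
    nodes: list
    edges: list  # list of (source, target) tuples


@dataclass
class Tool:
    """A connected application that records whatever it receives."""
    name: str
    received: list = field(default_factory=list)

    def handle(self, source: str, payload) -> None:
        self.received.append((source, payload))


class Boss:
    """Central hub: tools register once, then broadcast through the hub."""

    def __init__(self):
        self.tools = {}

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def broadcast(self, source: str, payload, target: str = None) -> list:
        # Without a target, send to every registered tool except the sender.
        if target is not None:
            recipients = [target]
        else:
            recipients = [n for n in self.tools if n != source]
        for name in recipients:
            self.tools[name].handle(source, payload)
        return recipients


boss = Boss()
for tool_name in ("Prequips", "Cytoscape", "Mayday"):
    boss.register(Tool(tool_name))

# Send a selection of protein names from one tool to one other tool.
boss.broadcast("Prequips", NameList(["calreticulin"]), target="Cytoscape")
print(boss.tools["Cytoscape"].received)
```

Routing everything through one hub means each tool needs a single connection rather than one per partner, which is the property the diagram illustrates.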

Page 15: Daehee Hwang Leroy Hood Institute for Systems Biology

Mayday

Page 16: Daehee Hwang Leroy Hood Institute for Systems Biology

Cytoscape

overall mouse protein/protein interaction map in Cytoscape

Page 17: Daehee Hwang Leroy Hood Institute for Systems Biology

Analysis: Feature extraction

[Screenshot annotations: protein table; filters; Gaggle plugin for interaction with other tools.]

Page 18: Daehee Hwang Leroy Hood Institute for Systems Biology

Analysis: Feature extraction

[Screenshot annotations: Gaggle plugin: selection for broadcast; calreticulin.]

Page 19: Daehee Hwang Leroy Hood Institute for Systems Biology

Analysis: Feature selection

Mock1 Mock2 Thapsigargin

Page 20: Daehee Hwang Leroy Hood Institute for Systems Biology

Broadcast to Gaggle

Page 21: Daehee Hwang Leroy Hood Institute for Systems Biology

Prequips to Gaggle

[Diagram: Gaggle Boss connected to Prequips, Mayday, the R statistical environment, Cytoscape, and a browser (KEGG, DAVID), as on page 14.]

Exchange of data structures such as name lists, lists of name-value pairs, matrices and networks.

Page 22: Daehee Hwang Leroy Hood Institute for Systems Biology

Gaggle Boss

Page 23: Daehee Hwang Leroy Hood Institute for Systems Biology

Gaggle to Cytoscape

[Diagram: Gaggle Boss connected to Prequips, Mayday, the R statistical environment, Cytoscape, and a browser (KEGG, DAVID), as on page 14.]

Exchange of data structures such as name lists, lists of name-value pairs, matrices and networks.

Page 24: Daehee Hwang Leroy Hood Institute for Systems Biology

Integration: Network Analysis

[Network figure with labeled modules: proteasome complex, ribosome large subunit, chaperones, actin filament regulation. Label: Thapsigargin 114 iTRAQ ratio.]

Page 25: Daehee Hwang Leroy Hood Institute for Systems Biology

Cytoscape to Prequips

[Diagram: Gaggle Boss connected to Prequips, Mayday, the R statistical environment, Cytoscape, and a browser (KEGG, DAVID), as on page 14.]

Exchange of data structures such as name lists, lists of name-value pairs, matrices and networks.

Page 26: Daehee Hwang Leroy Hood Institute for Systems Biology

Analysis: Feature extraction - Module selection

[Screenshot annotations: the ids sent from Cytoscape through the Gaggle; proteasome proteins.]

Page 27: Daehee Hwang Leroy Hood Institute for Systems Biology

Prequips & the Gaggle

[Diagram: Gaggle Boss connected to Prequips, Mayday, the R statistical environment, Cytoscape, and a browser (KEGG, DAVID), as on page 14.]

Exchange of data structures such as name lists, lists of name-value pairs, matrices and networks.

Page 28: Daehee Hwang Leroy Hood Institute for Systems Biology

Analysis: Functional enrichment

The proteasome complex is enriched compared to a mouse genome background.
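
The slide does not say which statistic was used to call the proteasome complex enriched. One common choice for this kind of over-representation analysis is a one-sided hypergeometric test, sketched below as an assumption rather than as the method behind the slide. The function name and all counts in the usage lines are invented for illustration.

```python
from math import comb


def hypergeom_enrichment_p(background: int, annotated: int,
                           selected: int, hits: int) -> float:
    """One-sided hypergeometric p-value P(X >= hits).

    background: number of genes/proteins in the reference set
    annotated:  how many of those carry the annotation of interest
    selected:   size of the list being tested
    hits:       how many selected entries carry the annotation
    """
    if not (0 <= annotated <= background and 0 <= selected <= background):
        raise ValueError("annotated and selected must lie within background")
    if not (0 <= hits <= min(annotated, selected)):
        raise ValueError("hits cannot exceed annotated or selected")
    total = comb(background, selected)
    # Sum the probabilities of seeing `hits` or more annotated entries.
    tail = sum(
        comb(annotated, k) * comb(background - annotated, selected - k)
        for k in range(hits, min(annotated, selected) + 1)
    )
    return tail / total


# Illustrative numbers only (not taken from the talk).
p_value = hypergeom_enrichment_p(background=20000, annotated=40,
                                 selected=100, hits=8)
print(f"p = {p_value:.3g}")
```

The background matters as much as the selected list: the same eight hits would be far less remarkable against a reference set in which the annotation is common.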

Page 29: Daehee Hwang Leroy Hood Institute for Systems Biology

Prequips Summary

[Diagram: Prequips (Multi Sample) shown with the Trans-Proteomic Pipeline (Raw Data (MS, MS/MS), Peptide Id + Quantitation, Protein Id + Quantitation, "?") and the Gaggle tools: Network Analysis (Cytoscape), Interaction Database (STRING), Pathway Database (KEGG), Microarray Data Analysis (Mayday, TIGR).]

Key Properties
- handles multiple samples at all levels
- integrates high-level analysis tools
- is extensible

Page 30: Daehee Hwang Leroy Hood Institute for Systems Biology

Conclusion

• General and extensible software for systems biology research with proteomics mass spectrometry data.

• Integrates data from various sources for visualization and analysis.

• An interactive environment that supports (visual) data exploration.

Page 31: Daehee Hwang Leroy Hood Institute for Systems Biology

Software details

• implemented in Java
• based on Eclipse Rich Client Platform
• extremely modular architecture
• multiple plugin interfaces
  – e.g. viewers, data providers, algorithms
• meta information framework
  – analysis results, sequence information, annotation, ...
  – data structures as plugins
  – requirement to support future analytical tools and data sources
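
The idea of multiple plugin interfaces (viewers, data providers, algorithms) can be illustrated with a small registry. Prequips itself is a Java application on the Eclipse Rich Client Platform, where plugins contribute through extension points; the Python sketch below only mimics that idea. Every name in it (`plugin`, `plugins_for`, `create`, the two example classes) is hypothetical.

```python
from collections import defaultdict

# The three kinds of plugin interface named on the slide.
EXTENSION_POINTS = ("viewer", "data_provider", "algorithm")

# extension point -> {plugin name -> class}
_registry = defaultdict(dict)


def plugin(extension_point: str, name: str):
    """Class decorator that registers a plugin under an extension point."""
    if extension_point not in EXTENSION_POINTS:
        raise ValueError(f"unknown extension point: {extension_point!r}")

    def decorator(cls):
        _registry[extension_point][name] = cls
        return cls

    return decorator


def plugins_for(extension_point: str) -> list:
    """Names of all plugins registered for one extension point."""
    return sorted(_registry[extension_point])


def create(extension_point: str, name: str, *args, **kwargs):
    """Instantiate a registered plugin by name."""
    return _registry[extension_point][name](*args, **kwargs)


@plugin("data_provider", "protXML")
class ProtXmlProvider:
    """Placeholder provider; a real one would parse protXML files."""
    level = "protein"


@plugin("viewer", "protein_table")
class ProteinTableViewer:
    """Renders rows as tab-separated text, one row per line."""

    def render(self, rows) -> str:
        return "\n".join("\t".join(str(cell) for cell in row) for row in rows)


viewer = create("viewer", "protein_table")
print(viewer.render([("calreticulin", 1.5)]))
```

Because the core only looks plugins up by extension point and name, a new viewer or data source can be added without touching existing code, which is the extensibility the slide emphasizes.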

Page 32: Daehee Hwang Leroy Hood Institute for Systems Biology

Acknowledgements

• Special thanks to Nils Gehlenborg

• Hood Lab: Inyoul Lee

• Kay Nieselt

• Aebersold Lab: Nichole King, James Eddes, Eric Deutsch, Ning Zhang, David Shteynberg, Wei Yan, and Andrew Garbutt

• Paul Shannon for help with the Gaggle

Page 33: Daehee Hwang Leroy Hood Institute for Systems Biology

[Diagram: Prequips Core with connections to Mayday, Database (PostgreSQL database, MySQL database), Gaggle, R (R environment, Bioconductor), Visualization, Excel, SBEAMS (SBEAMS installation), Machine Learning (WEKA Library), and anything else.]

Page 34: Daehee Hwang Leroy Hood Institute for Systems Biology

Cytoscape