Functional Interpretation of Large-scale Omics Data through Pathway and Network Analysis

Post on 26-Jan-2016


Functional Interpretation of Large-scale Omics Data through Pathway and Network Analysis. Bio-Trac 40 (Protein Bioinformatics), October 9, 2008. Zhang-Zhi Hu, M.D., Research Associate Professor, Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology

Transcript of Functional Interpretation of Large-scale Omics Data through Pathway and Network Analysis

1

Bio-Trac 40 (Protein Bioinformatics)

October 9, 2008

Zhang-Zhi Hu, M.D.
Research Associate Professor
Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology
Georgetown University Medical Center

Functional Interpretation of Large-scale Omics Data through Pathway and Network Analysis

2

Overview

• Introduction
– What are large-scale omics data?
– What do they tell you? How to interpret?

• Approaches
– Omics data integration
– Resources: databases and tools

• Case studies

• Systems biology
– Top-down, bottom-up
– Pathway, network modeling

3

Bioinformatics focus is changing…

• Individual molecules
– DNA, RNA, proteins
– Sequence, structure, function
– Evolutionary analysis

• Population of molecules
– Genome, proteome and other “-omes”
– Interactions, complexes
– Pathways, processes
– High level organizations

Genomics, Proteomics

4

From One Gene: multiple genetic variants, multiple transcripts,

multiple protein products…

and PTMs…

5

To Global Knowledge: The “-ome” and “-omics”

Genome, Transcriptome, Proteome, Metabolome

Other “-omes”: ORFeome, Promoterome, Interactome, Receptome, Phenome, more…

6

Gastric Cancer: global analysis

[Figure residue: axis label "Genes"; highlighted region "ECM cluster"]

Potential Gene Markers: SPARC, COL3A1, SULF1, YARS, ABCA5, THY1, SIDT2

Corresponding to ECM cluster (Chen et al., 2003; Qiu et al., 2007)

7

Identification of novel MAP kinase pathway signaling targets

Twenty-five targets of this signaling pathway were identified, of which only five were previously characterized as MKK/ERK effectors. The remaining targets suggest novel roles for this signaling cascade in cellular processes of nuclear transport, nucleotide excision repair, nucleosome assembly, membrane trafficking, and cytoskeletal regulation. -- Mol Cell. 6:1343-54, 2000

~3500 spots, ~91 spot changes reproducible

Digest of U-24

(PMA/TPA K562 cells MAPK pathway targets)

8

Drosophila Embryo Interaction Map

The proteins in the map that bear an RA (Ras Association) or RBD (Raf-like Ras-binding) domain define a discrete subnetwork around Ras-like GTPases (colored in yellow).

The exploration of the present map leads to numerous biological hypotheses and expands our knowledge of regulatory protein networks important in human cancer, as shown by the biological analysis of a particularly interesting network surrounding the Ras oncogene. Genome Res. 15:376-84, 2005.

Using Y2H technology with 102 bait proteins homologous to human cancer genes, 2300 interactions were detected, 710 of them high confidence.

9

Strategy for Functional Analyses of Omics Data

Functional analysis

Protein mapping

Functional annotation

Omics Data

Microarray, 2D, IP, MS, etc.

Data integration

Bioinformatics Databases

Gene, Protein, PPI, Pathway, PTM, etc.

Literature (MEDLINE)

Text mining

Pathway, network, biomarker discovery

~50% GO annotations

Biological pathways (e.g. KEGG, Reactome, PID, BioCarta)

GO Profiling: molecular function, biological process, cellular component

Molecular networks (e.g. interaction, association)

<10% pathway annotations

biological insights

10

Methods for Functional Analysis

• Omics data integration
• Functional profiling
• Pathway analysis
• Resources/knowledgebases
– Molecular databases
– Omics data repositories
• Bioinformatics tools
– Open source: DAVID, FatiGO, iProXpress
– Commercial: Ingenuity, GeneGO
• Literature
– Text mining
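As an illustration of what functional profiling can involve, the sketch below scores how over-represented an annotation term is in a selected gene list. It uses a one-sided hypergeometric test, which is an assumption made here for illustration and is not stated on the slides as the method of any listed tool. The function names and the toy gene identifiers (g1 to g20) are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_hits, n_selected, n_annotated, n_background):
    """Return P(X >= n_hits) when n_selected genes are drawn without
    replacement from n_background genes, n_annotated of which carry the term."""
    max_hits = min(n_selected, n_annotated)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
        for k in range(n_hits, max_hits + 1)
    )
    return tail / total


def profile_terms(selected, background, term_to_genes):
    """Score every term and return (term, hits, p_value) sorted by p_value."""
    background = set(background)
    selected = set(selected) & background
    results = []
    for term, genes in term_to_genes.items():
        annotated = set(genes) & background
        hits = selected & annotated
        p_value = hypergeom_enrichment_p(
            len(hits), len(selected), len(annotated), len(background)
        )
        results.append((term, len(hits), p_value))
    return sorted(results, key=lambda row: row[2])


# Toy data: all identifiers and term memberships are made up.
background = [f"g{i}" for i in range(1, 21)]
term_to_genes = {
    "apoptosis": [f"g{i}" for i in range(1, 6)],
    "transport": [f"g{i}" for i in range(6, 16)],
}
selected = ["g1", "g2", "g3", "g6"]
ranking = profile_terms(selected, background, term_to_genes)
```

In practice the p-values from many terms would need a multiple-testing correction, which this sketch leaves out.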

11

Principles of multi-omics data integration for Systems Biology

Protein-Centric Omics Analysis (iProXpress)

[Diagram: omics data types mapped to proteins for functional profiling and analysis]

• Genomics: dbSNP/HapMap (NS-SNP)
• Epigenomics: DNA methylation profiling (coding genes)
• Transcriptomics: mRNA microarray; dbEST (coding EST)
• Proteomics: protein, peptide
• Peptidomics: natural peptides
• Metabolomics: metabolites (HMDB)
• Functional Profiling and Analysis: biological processes, signaling pathways, metabolic pathways, function sites
• Other diagram labels: protein precursor, splicing forms, Enzyme1, Enzyme2, protease/peptidase

12

http://pir.georgetown.edu/pirwww/search/idmapping.shtml

ID Mapping

Information matrix

Functional profiling

Batch gene/protein retrieval and profiling

Enter ID, gi #
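The batch ID mapping step can be pictured as a lookup from submitted identifiers to protein accessions, keeping track of what could not be mapped. The sketch below is a minimal in-memory version under that assumption. It does not call the PIR ID mapping service, and the identifiers and accessions in the toy table are placeholders, not real database records.

```python
def map_ids(input_ids, mapping):
    """Map each submitted ID to a sorted list of target accessions.

    Returns (mapped, unmapped): a dict for IDs found in the mapping table
    and a list of IDs with no entry. Blank inputs are skipped.
    """
    mapped = {}
    unmapped = []
    for raw in input_ids:
        key = raw.strip()
        if not key:
            continue
        targets = mapping.get(key)
        if targets:
            mapped[key] = sorted(targets)
        else:
            unmapped.append(key)
    return mapped, unmapped


# Hypothetical mapping table: placeholder gi numbers to placeholder accessions.
toy_mapping = {
    "gi:0000001": {"ACC_0001"},
    "gi:0000002": {"ACC_0002", "ACC_0003"},
}
mapped, unmapped = map_ids(
    ["gi:0000001", " gi:0000002 ", "gi:9999999", ""], toy_mapping
)
```

Reporting the unmapped identifiers separately matters because one submitted ID can map to several proteins or to none, and both cases affect downstream profiling counts.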

13

Well annotated entry: human p53

(P53_HUMAN)

Comments (CC line)

Cross References

(DR line)

Features (FT line)

References (RX line)

21 years!

Protein annotations

GO

14

what cellular component?

what molecular function?

what biological process?

15

Biological Pathways and Networks

Signaling pathways

Metabolic pathways

Organelle biogenesis

Molecular networks

16

Pathways

http://www.pnas.org/cgi/data/0610772104/DC1/30

Global gene expression in skeletal muscle from gastric bypass patients before surgery and 1 year afterward.

General trend after surgery: up-regulated anaerobic metabolism; down-regulated oxidative phosphorylation

Proc Natl Acad Sci U S A. 2007 Feb 6;104(6):1777-82

green, down-regulated genes red, up-regulated genes white, no data available

Human metabolic maps

17

Databases of Protein Functions

• Metabolic Pathways
– KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
– EcoCyc: Encyclopedia of E. coli Genes and Metabolism
– MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)

• Inter-Molecular Interactions and Regulatory Pathways
– IntAct: Protein interaction data from literature and user submission
– BIND: Descriptions of interactions, molecular complexes and pathways
– DIP: Catalogs experimentally determined interactions between proteins
– Reactome: A curated knowledgebase of biological pathways
– Pathway Interaction Database (PID)
– BioCarta: Biological pathways of human and mouse
– Pathway Commons

• GO and GO annotation projects

18

Gene Ontology

(GO)

19

GO Slim

http://www.geneontology.org/GO.slims.shtml

20

Biological Pathway Resource Collection

http://www.pathguide.org/

• Protein-protein interactions

• Metabolic pathways

• Signaling pathways

• Pathway diagrams

• Transcription factors / gene regulatory networks

• Protein-compound interactions

• Genetic interaction networks

21

http://www.pathwaycommons.org/pc/home.do

22

KEGG Metabolic & Regulatory Pathways

KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions.

(http://www.genome.ad.jp/kegg/pathway.html)

23

BioCarta Cellular Pathways

(http://www.biocarta.com/index.asp)

Transforming Growth Factor (TGF) beta signaling [Homo sapiens]

24

Transforming Growth Factor (TGF) beta signaling [Homo sapiens]

Event -> REACT_6879.1: Activated type I receptor phosphorylates R-SMAD directly [Homo sapiens]
Object -> REACT_7364.1: Phospho-R-SMAD [cytosol]
Event -> REACT_6760.1: Phospho-R-SMAD forms a complex with CO-SMAD [Homo sapiens]
Object -> REACT_7344.1: Phospho-R-SMAD:CO-SMAD complex [cytosol]
Event -> REACT_6726.1: The phospho-R-SMAD:CO-SMAD transfers to the nucleus
Object -> REACT_7382.2: Phospho-R-SMAD:CO-SMAD complex [nucleoplasm]
……

(http://reactome.org/cgi-bin/eventbrowser?DB=gk_current&FOCUS_SPECIES=Homo%20sapiens&ID=170834&)

Reactome: events and objects (including modified forms and complex)

25

PID

Transforming Growth Factor beta signaling

26

Reactome PID

~26 proteins in PID are not defined in Reactome, while only 2 proteins in Reactome are not defined in PID
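A comparison of this kind reduces to set operations on the component lists exported from each database. The sketch below shows the idea with placeholder names. It assumes the two lists already use the same identifier scheme, which in practice requires an ID mapping step first, and the function name and toy members are hypothetical.

```python
def compare_pathway_members(members_a, members_b):
    """Compare two component lists after trimming and case-folding names.

    Returns a dict with the members found only in the first list, only in
    the second list, and in both.
    """
    a = {m.strip().upper() for m in members_a if m.strip()}
    b = {m.strip().upper() for m in members_b if m.strip()}
    return {
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "shared": sorted(a & b),
    }


# Placeholder component names, not the actual pathway contents.
source_a = ["protA", "protB", "protC"]
source_b = ["PROTB", "protD"]
comparison = compare_pathway_members(source_a, source_b)
```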

Transforming Growth Factor (TGF) beta signaling

27

TGF-beta signaling – comparison between PID and Reactome

[Figure: pathway diagram spanning cytoplasm and nucleus]

• Components labeled: TGF-beta, LAP, Furin, TGF-beta receptors I and II, Smad 2, Smad 4, Smad 7, STRAP, Ski, Shc, XIAP, TAK1, MEKK1, MAPKKK, ERK1/2, CaM
• Downstream labels: p38 MAPK pathway, JNK cascade, degradation, DNA binding and transcription regulation
• Inputs: growth signals, stress signals, Ca2+
• PRO identifiers shown: PRO:000000410, PRO:000000523, PRO:000000616, PRO:000000650
• Modification key: phosphorylation (P) at serine (S), threonine (T) and tyrosine (Y); ubiquitination (U) at lysine (K)
• Color key: only reported in Reactome; common in both Reactome & PID

* All others are in PID. Not all components in the pathway from both databases are listed

28

PRIDE: centralized, standards compliant, public data repository for proteomics data

http://www.ebi.ac.uk/pride/

GEO: a gene expression/molecular abundance repository

http://www.ncbi.nlm.nih.gov/geo/

IntAct: open source database system and analysis tools for protein interaction data

http://www.ebi.ac.uk/intact/

29

Analysis Tools

• iProXpress
– http://pir.georgetown.edu/iproxpress/
• DAVID
– http://david.abcc.ncifcrf.gov/
• Babelomics - FatiGO
– http://babelomics.bioinfo.cipf.es/
• Commercial:
– Ingenuity: http://www.ingenuity.com/
– GeneGO: http://www.genego.com/
• Visual tools:
– Cytoscape: http://www.cytoscape.org/
– CellDesigner: http://www.celldesigner.org/

30

iProXpress: Integrative analysis of proteomic and gene expression data

Data

Information

Knowledge

MS spectrum

Peptide ident.

Protein ident.

Function, Pathway, Family

Categorize, Statistics, Association

http://pir.georgetown.edu/iproxpress/

31

iProXpress – Pathway Profiling

• Protein information matrix: extensive annotations including protein name, family classification, function, protein-protein interaction, pathway…

• Functional profiling: iterative categorization, sorting, cross-dataset comparison, coupled with manual examination.

ER Mit

Mit

ER

KEGG pathway

• Organelle proteome data sets

32


Cross-data groups comparative profiling

iProXpress Analysis Interface

33

http://david.abcc.ncifcrf.gov/

34

A Literature-Derived Network for Yeast

All MEDLINE abstracts processed using statistical co-occurrence and NLP methods:

• Functional association (co-occurrence) – grey shades

• Physical interaction – green

• Regulation of expression – red

• Phosphorylation – dark blue

• Dephosphorylation – light blue

Inference: Ssn3 ->Hsp104 (b) and Ume6 -> Ino2 & Erg9 (c) expressions

Jensen et al., 2006
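The co-occurrence part of such a literature-derived network can be approximated by counting how often two names appear in the same abstract. The sketch below does only that. It is not the statistical scoring or NLP pipeline of Jensen et al., and the toy abstracts are invented sentences used purely to exercise the code.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_counts(abstracts, names):
    """Count, for each pair of names, the abstracts that mention both.

    Matching is case-insensitive on whole words. Pairs are stored as
    alphabetically sorted tuples so each pair has one key.
    """
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in names
    }
    counts = Counter()
    for text in abstracts:
        present = sorted(n for n, pat in patterns.items() if pat.search(text))
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Invented toy abstracts; they make no biological claims.
toy_abstracts = [
    "Ssn3 and HSP104 are mentioned together.",
    "Ume6 and Ino2 are mentioned together.",
    "Ssn3, Hsp104 and Ume6 all appear here.",
]
edges = cooccurrence_counts(toy_abstracts, ["Ssn3", "Hsp104", "Ume6", "Ino2"])
```

Raw counts favor frequently mentioned genes, so a real pipeline would normalize them against how often each name occurs on its own.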

35

Pathway studies: analysis of proteomics and gene expression data from cancer research

I. Estrogen Signaling Pathways (estrogen-induced apoptosis)

II. Purine Metabolic Pathways (radiation-induced DNA repair)

III. Melanosome Biogenesis (comparative organelle proteomic profiling)

Breast cancer cells (+E2) IP (AIB1, pY) 1D-gel MS/MS

Human fibroblast (AT patient) + irradiation 2D-gel MS

DNA microarray

Melanoma cells, isolation of stage-specific melanosomes, MS

Case Studies

36

pY-IP AIB1-IP

MS proteomics Expression Profiling, Pathway/Network Mapping

Integrated Bioinformatics

I. Estrogen Signaling Pathways (estrogen-induced apoptosis)

E2, 200 nM for 2 h

MCF-7, MCF-7/5C

Estrogen deprived condition

Apoptosis

Breast cancer cells

Signaling pathway: early events?

AIB1

Growth

Mimicking clinical condition: 2nd phase anti-estrogen drug resistance

Hu ZZ, et al. (2008) US HUPO

37

GO profiling (biological process)

Proteins only in E2 treated MCF-7/5C cells from both pY-IP and AIB1-IP

Chromosome remodeling & co-repression, cell cycle inhibition, apoptosis

Transcription

Cell communication

38

G(o) alpha-2 subunit (pY/5C +E2)

RAP1GAP (AIB1/5C+E2)

Pathway Mapping:

39

Hypothesized E2-induced Apoptosis Pathways

[Table residue: proteins identified by pY-IP and AIB1-IP (Sirt3, TLE3, Rap1GAP, GNAO2, CDK1, CIP29), with a Function column listing histone modification, co-repression, BAD-mediated apoptosis, cell cycle arrest, GPCR signaling (G(o) alpha-2), growth inhibition, and apoptosis]

[Diagram labels: E2, ERa, GPR30, Gas?, GNAO2, Rap1GAP, Rap1a, MEK, ERK, CDK1, BAD, CIP29, TLE3, RUNX3, Sirt3, AIB1; compartments: cytoplasm?, nucleus; outcomes: apoptosis, cell growth]

40

Text mining for protein-protein interaction (PPI) information

41

Proteins differentially expressed (1093)

2D-gel/MS

mRNAs differentially expressed (231)

DNA Microarray

(13 proteins/genes)

Intersections

Expression Profiling, Pathway/Network Mapping

Integrated Bioinformatics

AT5BIVA ATCL8

ATM introduced

AT patient fibroblast

Ionizing Radiation

Sensitive to IR damage

Resistant to IR damage

ATM-mutated ATM-wild type

II. Purine Metabolic Pathways (radiation-induced DNA repair)

ATM

Hu ZZ, et al. (2008) J Prot. Bioinfo.

42

KEGG pathway profiles

43

(RRM2)

44

Purine metabolic pathway

Ribonucleoside diphosphate reductase subunit M2 (RRM2)

DNA synthesis DNA repair

1.17.4.1

ATP X dATP

ADP dADP

dGTP X GTP

dGDPGDP

1.17.4.1

45

RRM2

p53BRCA1

HDAC1

Functional Association Networks

RRM2 is connected to other major DNA repair and cell cycle proteins, such as p53, BRCA1, and HDAC1.

46

[Diagram labels: ATM, BRCA1, p53, HDAC1, RRM1, RRM2, RR complex, DNA repair]

RRM2 in radiation-induced ATM-p53-mediated DNA repair pathway

47

III. Organelle Proteomes

[Figure: melanosome biogenesis diagram; cell labels: Melanocyte, Keratinocytes]

• Compartments: nucleus, Golgi, early endosome, late endosome, lysosome, Stage I hybrid organelle, Stage II, Stage IV
• Proteins labeled: PEDF, MART1, TYR, Tyrp1, DCT, OA1, Pmel17, Flotillin-2, Matp, MGST3, Atp7a, AP-2a, SLC24A5 (golden), vATPaseG2, Rab5, Rab5c, Rab27a, Rab38, P21-rac1, Lyst, DDT?, Myo-Va, -actin, ARPC4, Vinculin, Drebrin, Sec24, VAP-A
• Molecular motors: kinesin, dynein/dynactin, dynamin, myosin V, myosin Ic, Id, I4
• Ions: Cu2+, H+, Na+/K+/Ca2+
• Legend (tag letters V, M and P appear in the figure): newly identified and validated; mouse color gene homolog; proposed new protein

* Untagged are known melanosome proteins

Comparative organelle proteome profiling makes it possible to propose key proteins potentially involved in the regulation of organelle biogenesis

Schematic drawing of melanosome biogenesis pathway and key proteins involved in each stage.

Chi A, et al. (2006) J. Prot. Res.

48

Towards Systems Biology

(Nature 422:193, 2003)

Genomics, Transcriptomics, Proteomics, Metabolomics, Bioinformatics, Bibliomics, …omics

Literature Mining

Integrated knowledge and tools are needed for Systems Biology research

49

What is Systems Biology?

‘Systems biology defines and analyses the interrelationships of all of the elements in a functioning system in order to understand how the system works.’ -- Leroy Hood

Systems Biology, 2004, 1(1):19-27.

• How an organism works from an overall perspective.

• Interactions of parts of biological systems

– how molecules work together to serve a regulatory function in cells or between cells.

– how cells work to make organs, how organs work to make a person.

• Systems biology is the converse of reductionist biology.

50

Reductionist vs. Systems Biology

The driving force in 20th century biology has been reductionism:

From the population to the individual
From the individual to the cell
From the cell to the biomolecule
From the biomolecule to the genome
From the genome to the genome sequence

With the publication of genome sequences, reductionist biology has reached its endpoint

The driving force for 21st century biology will be integration:

Integrating the activity of genes and regulators into regulatory networks

Integrating the interactions of amino acids into protein folding predictions

Integrating the interactions of metabolites into metabolic networks

Integrating the interactions of cells into organisms

Integrating the interactions of individuals into ecosystems

51

Although the individual components are unique to a given organism, the topologic properties of cellular networks share surprising similarities with those of natural and social networks

Level 4

Level 3

Level 2

Level 1

Universal Organizing Principles

Large-scale organization

Functional modules

Regulatory motif, pathway

Omics data, information

52

Approaches: top-down or bottom-up

Bruggeman FJ, Westerhoff HV. Trends Microbiol. 2007 15:45-50.

• top-down: systemic-data driven, to discover or refine pre-existing models that describe the measured data (more on regulatory models). Emerges as dominant method due to “-omics”.

• bottom-up: starts with the molecular properties to construct models to predict systemic properties followed by validation and model refinement (more on kinetic models) (Silicon cell program: http://www.siliconcell.net/)

Three types of models

53

Top-down

Curr Opin Chem Biol. 2006 Dec;10(6):551-8.

Yeast two-hybrid

Combination of techniques (Y2H, protein arrays)

Integration of other types of information (expression, localization or genetic studies)

dynamic biologically relevant interaction subnetworks

54

EGFR-GAB1-ERK/Akt network

J Biol Chem. 2006 281:19925-38

Bottom-up

EGFR signaling network model is constructed based on the reaction stoichiometry and kinetic constants

The model allows predictions of temporal patterns of cellular responses to EGF under diverse perturbations (e.g., EGF doses):

• The dynamics of GAB1 tyr-phosphorylation is controlled by positive GAB1-PI3K and negative MAPK-GAB1 feedbacks.
• The essential function of GAB1 is to enhance PI3K/Akt activation and extend the duration of Ras/MAPK signaling.
• GAB1 plays a critical role in cell proliferation and tumorigenesis by amplifying positive interactions between survival and mitogenic pathways
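To make the bottom-up idea concrete, the sketch below integrates a toy two-variable kinetic model with forward Euler steps: a stimulus activates a signaling species, and a second species that it induces feeds back to damp the activation. This is not the published EGFR-GAB1-ERK/Akt model. The equations, variable names, and all rate constants are arbitrary assumptions chosen only to show how kinetic constants and a perturbation (the stimulus dose) turn into a predicted time course.

```python
def simulate(stimulus, k_act=1.0, k_deact=0.5, k_fb=0.8, k_off=0.2,
             k_i=0.3, dt=0.01, t_end=20.0):
    """Forward-Euler integration of a toy activation/feedback model.

    a: active fraction of a signaling protein (0..1)
    y: active fraction of a feedback inhibitor induced by a (0..1)

        da/dt = k_act * stimulus * (1 - a) / (1 + y / k_i) - k_deact * a
        dy/dt = k_fb * a * (1 - y) - k_off * y

    Returns a list of (time, a, y) tuples. All parameters are illustrative.
    """
    a, y = 0.0, 0.0
    trajectory = [(0.0, a, y)]
    steps = int(round(t_end / dt))
    for i in range(1, steps + 1):
        # Evaluate both derivatives from the current state before updating.
        da = k_act * stimulus * (1.0 - a) / (1.0 + y / k_i) - k_deact * a
        dy = k_fb * a * (1.0 - y) - k_off * y
        a += dt * da
        y += dt * dy
        trajectory.append((i * dt, a, y))
    return trajectory


# Compare the response to the same dose with and without the feedback loop.
with_feedback = simulate(stimulus=1.0)
without_feedback = simulate(stimulus=1.0, k_fb=0.0)
```

Setting k_fb to zero removes the feedback loop, which is the kind of in-silico perturbation a kinetic model allows before any experiment is run. A stiff or larger model would call for a proper ODE solver rather than fixed-step Euler.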

55

Gene regulatory networks (GRNs)

Reprod Toxicol. 19:281-90, 2005

WIRED Systems biology looks at the connections between components in cells.

Essential elements of the role of Dorsal in establishing dorsoventral polarity in Drosophila embryonic development

56

Modeling of the main modules of cell-cycle progression

Chembiochem 5:1322-33, 2004

Three functional units:

• Start function: onset of S-phase
• Cyclin cascades (C1, C2, C3)
• End function: onset of mitosis to cell division

57

Challenges to Systems Biology

• A complete characterization of an organism (molecular constituents, interactions, cell function)
• Spatial-temporal molecular characterization of a cell
• A thorough systems analysis of “molecular response” of a cell to external/internal perturbations
• Information must be integrated into mathematical models to enable knowledge testing by formulating hypotheses and discovery of new biological mechanisms…

58

Cellular Maps?

signaling, metabolism, gene regulation …