Functional Interpretation of Large-scale Omics Data through Pathway and Network Analysis
1
Bio-Trac 40 (Protein Bioinformatics)
October 9, 2008
Zhang-Zhi Hu, M.D.
Research Associate Professor
Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology
Georgetown University Medical Center
Functional Interpretation of Large-scale Omics Data through Pathway and Network Analysis
2
Overview
• Introduction
- What are large-scale omics data?
- What do they tell you? How to interpret?
• Approaches
- Omics data integration
- Resources: databases and tools
• Case studies
• Systems biology
- Top-down, bottom-up
- Pathway, network modeling
3
Bioinformatics focus is changing…
• Individual molecules
– DNA, RNA, proteins
– Sequence, structure, function
– Evolutionary analysis
• Population of molecules
– Genome, proteome and other “-omes”
– Interactions, complexes
– Pathways, processes
– High level organizations
Genomics, Proteomics
4
From One Gene: multiple genetic variants, multiple transcripts,
multiple protein products…
and PTMs…
5
To Global Knowledge: The “-ome” and “-omics”
Genome, Transcriptome, Proteome, Metabolome
Other “-omes”: ORFeome, Promoterome, Interactome, Receptome, Phenome, more…
6
SPARC
COL3A1
SULF1
YARS
ABCA5
THY1
SIDT2
Corresponding to ECM cluster (Chen et al., 2003; Qiu et al., 2007)
ECM cluster
Genes
Potential Gene Markers
Global analysis
Gastric Cancer
7
Identification of novel MAP kinase pathway signaling targets
Twenty-five targets of this signaling pathway were identified, of which only five were previously characterized as MKK/ERK effectors. The remaining targets suggest novel roles for this signaling cascade in cellular processes of nuclear transport, nucleotide excision repair, nucleosome assembly, membrane trafficking, and cytoskeletal regulation. -- Mol Cell. 6:1343-54, 2000
~3500 spots; ~91 spot changes reproducible
Digest of U-24
(PMA/TPA K562 cells MAPK pathway targets)
8
Drosophila Embryo Interaction Map
The proteins in the map that bear an RA (Ras Association) or RBD (Raf-like Ras-binding) domain define a discrete subnetwork around Ras-like GTPases (colored in yellow).
The exploration of the present map leads to numerous biological hypotheses and expands our knowledge of regulatory protein networks important in human cancer as shown by the biological analysis of a particularly interesting network surrounding the Ras oncogene. Genome Res. 15:376-84, 2005.
Using Y2H technology with 102 bait proteins homologous to human cancer genes, 2300 interactions were detected, 710 of them with high confidence.
9
Strategy for Functional Analyses of Omics Data
Functional analysis
Protein mapping
Functional annotation
Omics Data
Microarray, 2D, IP, MS, etc.
Data integration
Bioinformatics Databases
Gene, Protein, PPI, Pathway, PTM, etc.
Literature (MEDLINE), Text mining
Pathway, network, biomarker discovery
~50% GO annotations
Biological pathways (e.g. KEGG, Reactome, PID, BioCarta)
GO Profiling: Molecular function, biological process, cellular component
Molecular networks (e.g. interaction, association)
<10% pathway annotations
biological insights
10
Methods for Functional Analysis
• Omics data integration
• Functional profiling
• Pathway analysis
• Resources/knowledgebases
– Molecular databases
– Omics data repositories
• Bioinformatics tools
– Open source: DAVID, FatiGO, iProXpress
– Commercial: Ingenuity, GeneGO
• Literature
– Text mining
11
Principles of multi-omics data integration for Systems Biology
Protein-Centric –Omics Analysis
Protein precursor
Signaling Pathways
Splicing forms
Functional Profiling and Analysis
Biological Processes
Function Sites
Metabolic Pathways
Enzyme1
Enzyme2
Protease/ Peptidase
iProXpress
DNA methylation profiling: coding genes
Epigenomics
Transcriptomics
mRNA microarray
dbEST coding EST
Proteomics
Protein Peptide
Peptidomics
Natural peptides
dbSNP/ HapMap: NS-SNP
Genomics Metabolomics
Metabolites: HMDB
12
http://pir.georgetown.edu/pirwww/search/idmapping.shtml
ID Mapping
Information matrix
Functional profiling
Batch gene/protein retrieval and profiling
Enter ID, gi #
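The batch ID mapping step can be pictured with a minimal Python sketch. It is only an illustration of the idea, not the PIR service itself: the lookup table EXAMPLE_MAP, the identifiers in it and the function map_ids are all hypothetical, and a real table would come from an ID mapping resource rather than being typed in.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical lookup table for illustration only; the identifiers are
# placeholders, not real database accessions.
EXAMPLE_MAP: Dict[str, str] = {
    "GENE_A": "PROT_0001",
    "GENE_B": "PROT_0002",
    "GENE_C": "PROT_0002",  # two input IDs may map to the same protein
}


def map_ids(ids: Iterable[str], table: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Return (mapped, unmapped) for a batch of input identifiers."""
    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for raw in ids:
        key = raw.strip()
        if not key:
            continue  # skip blank lines in a pasted ID list
        if key in table:
            mapped[key] = table[key]
        else:
            unmapped.append(key)  # keep track of IDs that need manual review
    return mapped, unmapped


mapped, unmapped = map_ids(["GENE_A", " GENE_C ", "", "GENE_X"], EXAMPLE_MAP)
print(mapped)
print(unmapped)
```

Reporting the unmapped identifiers separately matters in practice, because proteins that fail to map silently drop out of every later profiling step.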
13
Well annotated entry: human p53
(P53_HUMAN)
Comments (CC line)
Cross References
(DR line)
Features (FT line)
References (RX line)
21 years!
Protein annotations
GO
14
what cellular component?
what molecular function?
what biological process?
15
Biological Pathways and Networks
Signaling pathways
Metabolic pathways
Organelle biogenesis
Molecular networks
16
Pathways
http://www.pnas.org/cgi/data/0610772104/DC1/30
Global gene expression in skeletal muscle from gastric bypass patients before surgery and 1 year afterward.
General trend after surgery: up-regulated anaerobic metabolism; down-regulated oxidative phosphorylation
Proc Natl Acad Sci U S A. 2007 Feb 6;104(6):1777-82
green: down-regulated genes; red: up-regulated genes; white: no data available
Human metabolic maps
17
Databases of Protein Functions
• Metabolic Pathways
– KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
– EcoCyc: Encyclopedia of E. coli Genes and Metabolism
– MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
• Inter-Molecular Interactions and Regulatory Pathways
– IntAct: Protein interaction data from literature and user submission
– BIND: Descriptions of interactions, molecular complexes and pathways
– DIP: Catalogs experimentally determined interactions between proteins
– Reactome: A curated knowledgebase of biological pathways
– Pathway Interaction Database (PID)
– BioCarta: Biological pathways of human and mouse
– Pathway Commons
• GO and GO annotation projects
18
Gene Ontology
(GO)
19
GO Slim
http://www.geneontology.org/GO.slims.shtml
20
Biological Pathway Resource Collection
http://www.pathguide.org/
• Protein-protein interactions
• Metabolic pathways
• Signaling pathways
• Pathway diagrams
• Transcription factors / gene regulatory networks
• Protein-compound interactions
• Genetic interaction networks
21
http://www.pathwaycommons.org/pc/home.do
22
KEGG Metabolic & Regulatory Pathways
KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions.
(http://www.genome.ad.jp/kegg/pathway.html)
23
BioCarta Cellular Pathways
(http://www.biocarta.com/index.asp)
Transforming Growth Factor (TGF) beta signaling [Homo sapiens]
24
Transforming Growth Factor (TGF) beta signaling [Homo sapiens]
Event -> REACT_6879.1: Activated type I receptor phosphorylates R-SMAD directly [Homo sapiens]
Object -> REACT_7364.1: Phospho-R-SMAD [cytosol]
Event -> REACT_6760.1: Phospho-R-SMAD forms a complex with CO-SMAD [Homo sapiens]
Object -> REACT_7344.1: Phospho-R-SMAD:CO-SMAD complex [cytosol]
Event -> REACT_6726.1: The phospho-R-SMAD:CO-SMAD transfers to the nucleus
Object -> REACT_7382.2: Phospho-R-SMAD:CO-SMAD complex [nucleoplasm]
……
(http://reactome.org/cgi-bin/eventbrowser?DB=gk_current&FOCUS_SPECIES=Homo%20sapiens&ID=170834&)
Reactome: events and objects (including modified forms and complex)
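The event/object listing on this slide can be read as an alternating chain. The Python sketch below parses such lines into records and pairs each event with the object that follows it. The Step type, the regular expression and the "event is followed by the object it produces" reading are assumptions made for illustration; this is not a Reactome API.

```python
import re
from typing import List, NamedTuple, Tuple


class Step(NamedTuple):
    kind: str       # "Event" or "Object"
    stable_id: str  # e.g. "REACT_6879.1"
    name: str


# Assumed line layout: "<kind> -> <REACT id>: <free-text name>"
LINE = re.compile(r"^(Event|Object)\s*->\s*(REACT_\d+\.\d+):\s*(.+)$")


def parse_steps(text: str) -> List[Step]:
    """Turn the slide-style listing into a list of Step records."""
    steps: List[Step] = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            steps.append(Step(match.group(1), match.group(2), match.group(3).strip()))
    return steps


def event_object_pairs(steps: List[Step]) -> List[Tuple[str, str]]:
    """Pair each event with the object listed directly after it."""
    return [
        (a.stable_id, b.stable_id)
        for a, b in zip(steps, steps[1:])
        if a.kind == "Event" and b.kind == "Object"
    ]


LISTING = """\
Event -> REACT_6879.1: Activated type I receptor phosphorylates R-SMAD directly [Homo sapiens]
Object -> REACT_7364.1: Phospho-R-SMAD [cytosol]
Event -> REACT_6760.1: Phospho-R-SMAD forms a complex with CO-SMAD [Homo sapiens]
Object -> REACT_7344.1: Phospho-R-SMAD:CO-SMAD complex [cytosol]
"""

steps = parse_steps(LISTING)
pairs = event_object_pairs(steps)
print(pairs)
```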
25
PID
Transforming Growth Factor beta signaling
26
Reactome PID
~26 proteins in PID are not defined in Reactome, while only 2 in Reactome are not defined in PID
Transforming Growth Factor (TGF) beta signaling
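A comparison of this kind reduces to set operations once both member lists use the same naming. The Python sketch below shows the idea under stated assumptions: the member names are placeholders, the memberships are made up, and neither list reflects the curated content of PID or Reactome.

```python
from typing import Dict, Iterable, Set


def normalize(names: Iterable[str]) -> Set[str]:
    """Upper-case and trim names so spelling variants compare equal."""
    return {name.strip().upper() for name in names if name.strip()}


def compare_members(first: Iterable[str], second: Iterable[str]) -> Dict[str, Set[str]]:
    """Split two pathway member lists into unique and shared parts."""
    a, b = normalize(first), normalize(second)
    return {"only_first": a - b, "only_second": b - a, "shared": a & b}


# Hypothetical member lists for illustration only.
pathway_db_1 = ["ProtA", "ProtB", "ProtC", "ProtD"]
pathway_db_2 = ["PROTB", "protc ", "ProtE"]

result = compare_members(pathway_db_1, pathway_db_2)
for label, members in result.items():
    print(label, sorted(members))
```

Normalizing names first is the important design choice: two databases rarely spell the same protein identically, so a raw set difference would overstate how much they disagree.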
27
TGF-beta signaling – comparison between PID and Reactome
[Figure: pathway diagram]
Components shown: TGF-beta, LAP, Furin, TGF-beta receptor (types I and II), Smad 2, Smad 4, Smad 7, STRAP, Ski, Shc, XIAP, TAK1, MEKK1, MAPKKK, ERK1/2, CaM
Other labels: Cytoplasm, Nucleus, Growth signals, Stress signals, Ca2+, P38 MAPK pathway, JNK cascade, Degradation, DNA binding and transcription regulation
PRO identifiers: PRO:000000410, PRO:000000523, PRO:000000616, PRO:000000650
Phosphorylation (P) at Serine (S), Threonine (T) and Tyrosine (Y)
Ubiquitination (U) at Lysine (K)
Legend: Only reported in Reactome; Common in both Reactome & PID
* All others are in PID. Not all components in the pathway from both databases are listed
28
PRIDE: centralized, standards compliant, public data repository for proteomics data
http://www.ebi.ac.uk/pride/
GEO: a gene expression/molecular abundance repository
http://www.ncbi.nlm.nih.gov/geo/
IntAct: open source database system and analysis tools for protein interaction data
http://www.ebi.ac.uk/intact/
29
Analysis Tools
• iProXpress
– http://pir.georgetown.edu/iproxpress/
• DAVID
– http://david.abcc.ncifcrf.gov/
• Babelomics - FatiGO
– http://babelomics.bioinfo.cipf.es/
• Commercial:
– Ingenuity: http://www.ingenuity.com/
– GeneGO: http://www.genego.com/
• Visual tools:
– Cytoscape: http://www.cytoscape.org/
– CellDesigner: http://www.celldesigner.org/
30
iProXpress: Integrative analysis of proteomic and gene expression data
Data
Information
Knowledge
MS spectrum
Peptide ident.
Protein ident.
Function, Pathway, Family
Categorize, Statistics, Association
http://pir.georgetown.edu/iproxpress/
31
iProXpress – Pathway Profiling
• Protein information matrix: extensive annotations including protein name, family classification, function, protein-protein interaction, pathway…
• Functional profiling: iterative categorization, sorting, cross-dataset comparison, coupled with manual examination.
ER Mit
Mit
ER
KEGG pathway
• Organelle proteome data sets
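The categorization and cross-dataset comparison described on this slide can be sketched in a few lines of Python. This is not the iProXpress implementation: the function names are hypothetical, and the protein IDs and their pathway assignments are made up for the example.

```python
from collections import Counter
from typing import Dict, List


def profile(annotations: Dict[str, List[str]]) -> Counter:
    """Count how many proteins in one data set carry each category."""
    counts: Counter = Counter()
    for categories in annotations.values():
        counts.update(set(categories))  # count a protein once per category
    return counts


def compare_profiles(datasets: Dict[str, Dict[str, List[str]]]) -> Dict[str, Dict[str, int]]:
    """Build a category-by-dataset matrix of protein counts."""
    profiles = {name: profile(ann) for name, ann in datasets.items()}
    categories = sorted(set().union(*profiles.values()))
    return {cat: {name: profiles[name][cat] for name in datasets} for cat in categories}


# Hypothetical annotations: protein ID -> pathway categories.
datasets = {
    "ER": {
        "prot1": ["N-Glycan biosynthesis"],
        "prot2": ["N-Glycan biosynthesis", "Oxidative phosphorylation"],
    },
    "Mit": {
        "prot3": ["Oxidative phosphorylation"],
        "prot4": ["Oxidative phosphorylation"],
        "prot5": [],  # proteins without pathway annotation are common
    },
}

table = compare_profiles(datasets)
for category, row in table.items():
    print(category, row)
```

Sorting or filtering the resulting matrix by the difference between columns is one simple way to surface categories that distinguish the data sets before manual examination.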
32
iProXpress Analysis Interface
Cross-data groups comparative profiling
33
http://david.abcc.ncifcrf.gov/
34
A Literature-Derived Network for Yeast
All MEDLINE abstracts processed using statistical co-occurrence and NLP methods:
• Functional association (co-occurrence) – grey shades
• Physical interaction – green
• Regulation of expression – red
• Phosphorylation – dark blue
• Dephosphorylation – light blue
Inference: Ssn3 ->Hsp104 (b) and Ume6 -> Ino2 & Erg9 (c) expressions
Jensen et al., 2006
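The co-occurrence part of such a literature network can be illustrated with a small Python sketch that counts how often two names appear in the same abstract. It covers only raw counting, not the statistical scoring or NLP used in the cited work, and the three "abstracts" are made-up toy sentences rather than MEDLINE text.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Iterable, Set


def cooccurrence(abstracts: Iterable[str], names: Set[str]) -> Counter:
    """Count abstracts in which each pair of names is mentioned together."""
    counts: Counter = Counter()
    for text in abstracts:
        tokens = set(re.findall(r"[A-Za-z0-9]+", text))
        present = sorted(names & tokens)  # sorted so each pair has one key
        counts.update(combinations(present, 2))
    return counts


# Toy sentences invented for the example; they make no biological claim.
abstracts = [
    "Ssn3 and Hsp104 appear in this toy sentence.",
    "Hsp104 is mentioned with Ssn3 again.",
    "Ume6 and Ino2 share this one.",
]
names = {"Ssn3", "Hsp104", "Ume6", "Ino2"}

counts = cooccurrence(abstracts, names)
for pair, n in counts.most_common():
    print(pair, n)
```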
35
Case Studies
Pathway studies: analysis of proteomics and gene expression data from cancer research
I. Estrogen Signaling Pathways (estrogen-induced apoptosis)
Breast cancer cells (+E2) → IP (AIB1, pY) → 1D-gel → MS/MS
II. Purine Metabolic Pathways (radiation-induced DNA repair)
Human fibroblast (AT patient) + irradiation → 2D-gel → MS; DNA microarray
III. Melanosome Biogenesis (comparative organelle proteomic profiling)
Melanoma cell → isolation of stage specific melanosomes → MS
36
pY-IP AIB1-IP
MS proteomics Expression Profiling, Pathway/Network Mapping
Integrated Bioinformatics
I. Estrogen Signaling Pathways (estrogen-induced apoptosis)
E2: 200 nM for 2 h
MCF-7, MCF-7/5C
Estrogen deprived condition
Apoptosis
Breast cancer cells
Signaling pathway: early events?
AIB1
Growth
Mimicking clinical condition: 2nd phase anti-estrogen drug resistance
Hu ZZ, et al. (2008) US HUPO
37
GO profiling (biological process)
Proteins only in E2 treated MCF-7/5C cells from both pY-IP and AIB1-IP
Chromosome remodeling & co-repression, cell cycle inhibition, apoptosis
Transcription
Cell communication
38
G(o) alpha-2 subunit (pY/5C +E2)
RAP1GAP (AIB1/5C+E2)
Pathway Mapping:
39
Hypothesized E2-induced Apoptosis Pathways
[Figure: hypothesized pathway diagram]
Proteins (from pY-IP and AIB1-IP): Sirt3, TLE3, Rap1GAP, GNAO2, CDK1, CIP29
Functions listed: Histone modification, apoptosis; Co-repression, apoptosis; BAD-mediated apoptosis; Cell cycle arrest/apoptosis; G(o) alpha-2, GPCR signaling; Growth inhibition/apoptosis
Diagram labels: E2, ERa, GPR30, Gas?, Rap1a, MEK, ERK, RUNX3, BAD, AIB1, Cytoplasm?, Nucleus, Cell growth, Apoptosis
40
Text mining for protein-protein interaction (PPI) information
41
Proteins differentially expressed (1093)
2D-gel/MS
mRNAs differentially expressed (231)
DNA Microarray
(13 proteins/genes)
Intersections
Expression Profiling, Pathway/Network Mapping
Integrated Bioinformatics
AT5BIVA ATCL8
ATM introduced
AT patient fibroblast
Ionizing Radiation
Sensitive to IR damage
Resistant to IR damage
ATM-mutated ATM-wild type
II. Purine Metabolic Pathways (radiation-induced DNA repair)
ATM
Hu ZZ, et al. (2008) J Prot. Bioinfo.
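Intersecting a protein list with an mRNA list only works after both are expressed in a common identifier space. The Python sketch below assumes gene symbols as that common space; the function name, the protein IDs and the gene symbols are hypothetical placeholders, not data from the study.

```python
from typing import Dict, Iterable, List


def shared_genes(protein_to_gene: Dict[str, str], mrna_genes: Iterable[str]) -> List[str]:
    """Return gene symbols present in both the protein and mRNA lists."""
    protein_genes = {g.strip().upper() for g in protein_to_gene.values() if g.strip()}
    transcript_genes = {g.strip().upper() for g in mrna_genes if g.strip()}
    return sorted(protein_genes & transcript_genes)


# Hypothetical inputs: protein ID -> gene symbol, and a list of gene
# symbols taken from a microarray result.
proteins = {"PROT_0001": "GeneA", "PROT_0002": "GeneB", "PROT_0003": ""}
mrnas = ["GENEB", "GeneC"]

shared = shared_genes(proteins, mrnas)
print(shared)
```

Proteins without a gene symbol (PROT_0003 here) cannot take part in the intersection, which is one reason the overlap between proteomic and transcriptomic lists can look smaller than expected.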
42
KEGG pathway profiles
43
(RRM2)
44
Purine metabolic pathway
Ribonucleoside diphosphate reductase subunit M2 (RRM2)
DNA synthesis DNA repair
1.17.4.1: ADP → dADP
1.17.4.1: GDP → dGDP
ATP X dATP
dGTP X GTP
45
RRM2
p53
BRCA1
HDAC1
Functional Association Networks
RRM2 connected to other major DNA repair and cell cycle proteins, such as p53, BRCA1, HDAC1.
46
RRM2 in radiation-induced ATM-p53-mediated DNA repair pathway
[Figure: pathway diagram]
Diagram labels: ATM, BRCA1, p53, HDAC1, RRM2, RRM1 (RR complex), DNA repair
47
III. Organelle Proteomes
[Figure: melanosome biogenesis diagram]
Cells and compartments: Melanocyte, Keratinocytes, Nucleus, Golgi, Early endosome, Late endosome, Lysosome
Stages shown: Stage I hybrid organelle, Stage II, Stage IV
Proteins shown: PEDF, MART1, TYR, Tyrp1, Flotillin-2, Matp, MGST3, Atp7a, AP-2a, SLC24A5 (golden), vATPaseG2, Rab27a, Rab5, Rab5c, Rab38, P21-rac1, Myo-Va, Lyst, DDT?, -actin, ARPC4, Vinculin, Drebrin, Pmel17, Sec24, VAP-A, OA1, DCT
Molecular motors: kinesin, dynein/dynactin, dynamin, Myosin V, myosin Ic, Id, I4
Ions: Cu2+, H+, Na+/K+/Ca2+
Protein tags: V, M, P
Legend entries: Newly identified and validated; Mouse color gene homolog; Proposed new protein
* Untagged are known melanosome proteins
Comparative organelle proteome profiling makes it possible to propose key proteins potentially involved in the regulation of organelle biogenesis
Schematic drawing of melanosome biogenesis pathway and key proteins involved in each stage.
Chi A, et al. (2006) J. Prot. Res.
48
Towards Systems Biology
(Nature 422:193, 2003)
Genomics, Transcriptomics, Proteomics, Metabolomics
Bioinformatics
Bibliomics, …omics
Literature Mining
Integrated knowledge and tools are needed for Systems Biology research
49
What is Systems Biology?
‘Systems biology defines and analyses the interrelationships of all of the elements in a functioning system in order to understand how the system works.’ -- Leroy Hood
Systems Biology, 2004, 1(1):19-27.
• How an organism works from an overall perspective.
• Interactions of parts of biological systems
– how molecules work together to serve a regulatory function in cells or between cells.
– how cells work to make organs, how organs work to make a person.
• Systems biology is the converse of reductionist biology.
50
Reductionist vs. Systems Biology
The driving force in 20th century biology has been reductionism:
From the population to the individual
From the individual to the cell
From the cell to the biomolecule
From the biomolecule to the genome
From the genome to the genome sequence
With the publication of genome sequences, reductionist biology has reached its endpoint
The driving force for 21st century biology will be integration:
Integrating the activity of genes and regulators into regulatory networks
Integrating the interactions of amino acids into protein folding predictions
Integrating the interactions of metabolites into metabolic networks
Integrating the interactions of cells into organisms
Integrating the interactions of individuals into ecosystems
51
Although the individual components are unique to a given organism, the topologic properties of cellular networks share surprising similarities with those of natural and social networks
Universal Organizing Principles
Level 4: Large-scale organization
Level 3: Functional modules
Level 2: Regulatory motif, pathway
Level 1: Omics data, information
52
Approaches: top-down or bottom-up
Bruggeman FJ, Westerhoff HV. Trends Microbiol. 2007 15:45-50.
• top-down: driven by systemic data; used to discover new models or refine pre-existing models that describe the measured data (more on regulatory models). It is emerging as the dominant method owing to “-omics” data.
• bottom-up: starts with molecular properties to construct models that predict systemic properties, followed by validation and model refinement (more on kinetic models) (Silicon cell program: http://www.siliconcell.net/)
Three types of models
53
Top-down
Curr Opin Chem Biol. 2006 Dec;10(6):551-8.
Yeast two-hybrid
Combination of techniques (Y2H, protein arrays)
Integration of other types of information (expression, localization or genetic studies)
dynamic biologically relevant interaction subnetworks
54
EGFR-GAB1-ERK/Akt network
J Biol Chem. 2006 281:19925-38
Bottom-up
EGFR signaling network model is constructed based on the reaction stoichiometry and kinetic constants
The model allows predictions of temporal patterns of cellular responses to EGF under diverse perturbations (e.g., EGF doses):
• The dynamics of GAB1 tyr-phosphorylation is controlled by positive GAB1-PI3K and negative MAPK-GAB1 feedbacks.
• The essential function of GAB1 is to enhance PI3K/Akt activation and extend the duration of Ras/MAPK signaling.
• GAB1 plays a critical role in cell proliferation and tumorigenesis by amplifying positive interactions between survival and mitogenic pathways
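To make the bottom-up idea concrete, the Python sketch below simulates a single reversible phosphorylation step with mass-action kinetics and compares two stimulus doses. It is a toy model, not the published EGFR-GAB1 network: the single species, the rate constants k_on and k_off, the doses and the forward Euler integrator are all assumptions chosen for illustration.

```python
from typing import List


def simulate(stimulus: float, k_on: float = 1.0, k_off: float = 0.5,
             total: float = 1.0, dt: float = 0.01, steps: int = 5000) -> List[float]:
    """Integrate d[Xp]/dt = k_on*S*(T - Xp) - k_off*Xp with forward Euler."""
    xp = 0.0  # phosphorylated fraction starts at zero
    trace = [xp]
    for _ in range(steps):
        rate = k_on * stimulus * (total - xp) - k_off * xp
        xp += dt * rate
        trace.append(xp)
    return trace


def steady_state(stimulus: float, k_on: float = 1.0, k_off: float = 0.5,
                 total: float = 1.0) -> float:
    """Analytical steady state of the same equation, used as a check."""
    return k_on * stimulus * total / (k_on * stimulus + k_off)


low_dose = simulate(stimulus=0.1)
high_dose = simulate(stimulus=1.0)
print(round(low_dose[-1], 4), round(steady_state(0.1), 4))
print(round(high_dose[-1], 4), round(steady_state(1.0), 4))
```

Comparing the simulated end point with the analytical steady state is a cheap sanity check on the integrator; a full network model would use many coupled equations and a proper ODE solver, but the workflow of setting constants, perturbing the dose and reading out the time course stays the same.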
55
Gene regulatory networks (GRNs)
Reprod Toxicol. 19:281-90, 2005
WIRED Systems biology looks at the connections between components in cells.
Essential elements of the role of Dorsal in establishing dorsoventral polarity in Drosophila embryonic development
56
Modeling of the main modules of cell-cycle progression
Chembiochem 5:1322-33, 2004
Three functional units:
• Start function: onset of S-phase
• Cyclin cascades (C1, C2, C3)
• End function: onset of mitosis to cell division
57
Challenges to Systems Biology
• A complete characterization of an organism (molecular constituents, interactions, cell function)
• Spatial-temporal molecular characterization of a cell
• A thorough systems analysis of “molecular response” of a cell to external/internal perturbations
• Information must be integrated into mathematical models to enable knowledge testing by formulating hypotheses and discovery of new biological mechanisms…
58
Cellular Maps?
signaling, metabolism, gene regulation …