Building biological networks from diverse genomic data
Chad Myers
Department of Computer Science, Lewis-Sigler Institute for Integrative Genomics
Princeton University
PRIME Workshop on Pathway Databases and Modeling Tools June 16, 2006
2
Motivation: building biological networks from experimental data
Explosion of functional genomic DATA
KNOWLEDGE of components and inter-relationships that lead to function
Find missing pathway components
Detect uncharacterized crosstalk between pathways
Discover novel pathways
3
Motivation: building biological networks from experimental data
Functional genomic data are noisy
How can we harness this information without sacrificing precision?
4
Directed network discovery: involving the biologist in the search process
Previous approaches to network analysis from genomic data:
largely undirected global approaches that detect interesting network features
Incorporating expert direction can:
Improve sensitivity and precision by using context information
Focus on relevant information for biologist user (allows interactivity)
[Figure: two-hybrid interaction network, yeast (SH3 domain), Boone lab]
Previous work: Bader et al. (2003), Asthana et al. (2004), Yamanashi et al. (2004, 2005), Kato et al. (2005)
5
bioPIXIE system overview
bioPIXIE: Pathway Inference from eXperimental Interaction Evidence
6
Overview
How do we integrate heterogeneous evidence?
Expert-driven network discovery
Making it usable: practical visualization and other interface considerations
Does it work? (evaluation experiments and biological validation)
Challenges/opportunities and future work
7
Heterogeneous data integration
Diverse forms of data: what’s a unifying framework?
Variable coverage, reliability, and relevance
Integration scheme should utilize information in data when available, but be robust when missing
physical binding
genetic interaction
cellular localization
expression
sequence (TF motifs, coding, …)
Bayes net
Map to associations of genes/proteins
8
Bayes net for evidence integration
Functional Relationship
Microarray correlation
Shared transcription factors
Purified complex
Affinity precipitation
Two-hybrid
Synthetic lethality
Synthetic rescue
Co-localization
We infer: P(protein i is functionally related to protein j | evidence)
Input evidence: grouped by lab (source) and by type
Structure:
Naïve Bayes (~60 nodes)
(also tried TAN)
CPT’s:
learned from GO gold standard
Fully-connected, weighted graph
of proteins
…
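As a rough illustration of the naïve Bayes integration described on this slide, the sketch below combines several evidence types into a posterior probability of functional relationship for one protein pair. The binary observations, the dataset names, and every number in `LIKELIHOODS` are assumptions made for illustration; the talk states only that the CPTs are learned from a GO gold standard and does not give their values.

```python
def posterior_related(prior, likelihoods, observations):
    """Naive Bayes posterior P(FR = yes | evidence) for one protein pair.

    prior: P(FR = yes) before seeing any evidence.
    likelihoods: dataset -> (P(obs = True | FR = yes), P(obs = True | FR = no)).
    observations: dataset -> True/False. Datasets absent from this dict are
        treated as missing and contribute no factor, so the estimate stays
        well defined when coverage is incomplete.
    """
    yes, no = prior, 1.0 - prior
    for dataset, observed in observations.items():
        p_yes, p_no = likelihoods[dataset]
        yes *= p_yes if observed else 1.0 - p_yes
        no *= p_no if observed else 1.0 - p_no
    return yes / (yes + no)


# Hypothetical CPT entries, NOT values from the talk.
LIKELIHOODS = {
    "two_hybrid": (0.20, 0.01),
    "coexpression": (0.60, 0.30),
    "colocalization": (0.70, 0.40),
}

if __name__ == "__main__":
    evidence = {"two_hybrid": True, "coexpression": True}
    print(posterior_related(0.05, LIKELIHOODS, evidence))
```

Skipping missing datasets, rather than treating them as negative observations, is one simple way to meet the stated goal of using data when available while staying robust when it is absent.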
10
Expert-driven network discovery
Local search in the PPI network centered at the query
Which proteins should we extract as a single, functionally coherent group?
Should consider: confidence in links and topology surrounding query group
11
Extracting relevant proteins
Basic idea: compute expected linkage to query set
e_ij = P(protein i is functionally related to protein j | evidence)
X_ij: binary RV with prob. e_ij
S_Q(p_i): # of links from protein i to query set, Q
Find proteins that maximize:
$$E[S_Q(p_i)] = E\Big[\sum_{p_j \in Q} X_{ij}\Big] = \sum_{p_j \in Q} E[X_{ij}] = \sum_{p_j \in Q} e_{ij}$$
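The expected-linkage score can be sketched in a few lines: because each link is a binary variable with probability e_ij, the expected number of links from a protein to the query set is the sum of its edge probabilities into that set. The edge-dictionary layout, the function names, and the placeholder protein labels are assumptions for illustration, not the bioPIXIE implementation.

```python
def expected_linkage(edge_prob, query):
    """Return {protein: E[S_Q(p_i)]} for every protein outside the query set.

    edge_prob: dict mapping frozenset({a, b}) -> e_ij (posterior probability
        that a and b are functionally related). Missing pairs count as 0.
    query: iterable of query proteins Q.
    """
    query = set(query)
    proteins = {p for pair in edge_prob for p in pair}
    return {
        p: sum(edge_prob.get(frozenset((p, q)), 0.0) for q in query)
        for p in proteins - query
    }


def top_k(edge_prob, query, k):
    """Rank candidates by expected linkage (ties broken by name), keep k."""
    scores = expected_linkage(edge_prob, query)
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]


if __name__ == "__main__":
    # Placeholder protein labels and probabilities, for illustration only.
    edges = {
        frozenset(("A", "B")): 0.9,
        frozenset(("A", "C")): 0.2,
        frozenset(("B", "C")): 0.3,
        frozenset(("A", "D")): 0.8,
        frozenset(("B", "D")): 0.7,
    }
    print(top_k(edges, {"A", "B"}, 1))
```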
What about indirect links to the query set?
12
Graph search: handling indirect links
Solution: iterative expanding search where indirect links to the query through high confidence neighbors are counted
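One way to read the iterative expanding search is as a greedy loop: in each round, the best-connected candidates that clear a confidence threshold join the group, so that in later rounds links to them count toward other candidates. The sketch below follows that reading. The threshold `min_score`, the `per_round` and `rounds` limits, and the stopping rule are assumptions; the slide does not specify them.

```python
def expand_query(edge_prob, query, rounds=2, per_round=1, min_score=0.5):
    """Greedy iterative expansion around a query set.

    edge_prob: dict mapping frozenset({a, b}) -> probability of a functional link.
    Returns the query followed by accepted proteins in order of acceptance.
    """
    group = list(query)
    proteins = {p for pair in edge_prob for p in pair}
    for _ in range(rounds):
        # Expected linkage of each outside protein to the CURRENT group,
        # which includes neighbors accepted in earlier rounds.
        scores = {
            p: sum(edge_prob.get(frozenset((p, g)), 0.0) for g in group)
            for p in proteins
            if p not in group
        }
        ranked = sorted(scores, key=lambda p: (-scores[p], p))
        accepted = [p for p in ranked if scores[p] >= min_score][:per_round]
        if not accepted:
            break  # nothing is confidently linked; stop expanding
        group.extend(accepted)
    return group


if __name__ == "__main__":
    # Placeholder labels: N2 is only weakly linked to the query Q1 directly,
    # but strongly linked to Q1's high-confidence neighbor N1.
    edges = {
        frozenset(("Q1", "N1")): 0.9,
        frozenset(("N1", "N2")): 0.8,
        frozenset(("Q1", "N2")): 0.1,
    }
    print(expand_query(edges, ["Q1"]))
```

In the toy example, N2 would be rejected by a single round of direct scoring but is recovered once N1 has joined the group, which is the indirect-link behavior the slide describes.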
14
Making bioPIXIE usable
Guiding principles:
Accessibility (users can access most recent data with little effort)
Simplicity vs. flexibility
Drill-down (details, e.g. supporting exp. data, hidden until requested)
Browseable
15
Graph visualization
17
Evaluation experiments
Recovering known network components:
How much does integration help?
Results averaged over 31 pathways, processes, and complexes (KEGG, GO, MIPS)
Use 10 random proteins from each group as the query set and try to recover the remaining members
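The recovery experiment can be sketched as a hold-out procedure: sample 10 members of a known group as the query, rank all other proteins, and score how early the held-out members appear. The sketch uses average precision, a common approximation to the area under the precision-recall curve; this slide does not name the metric (a later slide's axis label reads AUPRC), so the choice of metric, the function names, and the fixed seed are assumptions.

```python
import random


def split_query(members, n_query=10, seed=0):
    """Randomly split a known group into a query set and held-out members."""
    rng = random.Random(seed)
    query = set(rng.sample(sorted(members), n_query))
    return query, set(members) - query


def average_precision(ranking, positives):
    """Mean of precision@rank taken at each rank where a positive appears.

    Positives that never appear in the ranking contribute zero, because the
    sum is divided by the total number of positives.
    """
    hits, total = 0, 0.0
    for rank, protein in enumerate(ranking, start=1):
        if protein in positives:
            hits += 1
            total += hits / rank
    return total / len(positives) if positives else 0.0


if __name__ == "__main__":
    # Placeholder group of 15 labels; a real run would use a KEGG, GO, or MIPS group.
    group = {f"P{i}" for i in range(15)}
    query, held_out = split_query(group)
    print(len(query), len(held_out))
    print(average_precision(["a", "x", "b"], {"a", "b"}))
```

Averaging this score over many groups and random splits would give a summary comparable in spirit to the results reported on this slide.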
18
Evaluation experiments (2)
Recovering known network components:
Do naïve methods of integration/search work just as well?
Results averaged over 31 pathways, processes, and complexes (KEGG, GO, MIPS)
Use 10 random proteins from each group as the query set and try to recover the remaining members
19
Biological validation: finding new components
S. cerevisiae uncharacterized gene, YPL077C
Predicted involvement in chromosome segregation
Using bioPIXIE to characterize unknown genes
20
Biological validation: finding new components
P-value based on blind counting: 1.98 × 10^-7 (Fisher’s exact test)
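For readers unfamiliar with Fisher's exact test, the sketch below computes a one-sided p-value for a 2×2 contingency table directly from the hypergeometric distribution. The counts behind the reported p-value are not given in this transcript, so the example uses a small textbook-style table that is purely hypothetical, and the one-sided alternative is an assumption.

```python
from math import comb  # requires Python 3.8+


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns P(top-left cell >= a) under the hypergeometric null with all
    row and column totals held fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)  # largest feasible value of the top-left cell
    numerator = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return numerator / comb(n, col1)


if __name__ == "__main__":
    # Hypothetical counts, NOT data from the talk.
    print(fisher_exact_greater(3, 1, 1, 3))
```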
21
(Helmut Pospiech)
Biological validation: novel links between pathways
DNA replication initiation:
Cdc7: “switch” that starts replication (activated by Dbf4)
Linked to Hsp90 complex by our method
Hsp90 (yeast: hsc82, hsp82): cytosolic molecular chaperone that participates in the folding of several signaling kinases and hormone receptors
22
Genetic analysis of DNA replication-Hsp90 link
[Figure: three panels, 10^5 cells each, at RT, 30°C, and 37°C. Strains: wt, dbf4Δ, hsp82Δ, dbf4Δ hsp82Δ; wt, dbf4Δ, hsc82Δ, dbf4Δ hsc82Δ; wt, dbf4Δ, cpr7Δ, dbf4Δ cpr7Δ]
YKO Dbf4 vs. hsp82, hsc82 and co-chaperones: cpr7, sti1, cdc37
24
Practical challenges/opportunities
Visualizing complex networks of interactions in a meaningful way
How does it scale with added data?
Easy user navigation around the network
Data-centric vs. established knowledge views
How do we overlay current knowledge of pathways with predictions derived from experimental data?
25
Future work
An observation:
The more specific we can be about the end goal, the better the accuracy of our prediction
26
Future work
Exploiting relevance and reliability variation: context-specific integration
27
Summary
bioPIXIE can facilitate precise network discovery from experimental data using:
Bayesian data integration
Expert-directed search
Web-based dynamic interface
bioPIXIE is an effective tool for browsing genomic evidence and generating specific, testable hypotheses
http://pixie.princeton.edu
28
Acknowledgements
http://pixie.princeton.edu
Olga Troyanskaya
Drew Robson
Adam Wible
Kara Dolinski
Camelia Chiriac
Matt Hibbs
Curtis Huttenhower
David Botstein Lab
Leonid Kruglyak Lab
Thank you!
29
Evaluation experiments (3): what about noise in the query set?
[Figure: AUPRC vs. # of random proteins out of 20 total query proteins]
31
Hydroxyurea sensitivity (replication inhibitor)
[Figure: wt, cpr7Δ, sti1Δ, dbf4Δ, hsp82Δ, hsc82Δ, dbf4Δ hsp82Δ, dbf4Δ hsc82Δ, dbf4Δ sti1Δ, and dbf4Δ cpr7Δ strains, 10^6 cells, at 30°C and 37°C with HU at 0 mM, 50 mM, and 100 mM]
32
Is this interaction specific to DNA replication?
MMS sensitivity (induces DNA damage)
[Figure: wt, cpr7Δ, sti1Δ, dbf4Δ, hsp82Δ, hsc82Δ, dbf4Δ hsp82Δ, dbf4Δ hsc82Δ, dbf4Δ sti1Δ, and dbf4Δ cpr7Δ strains, 10^6 cells, at 37°C]
MMS treatment has no apparent effect at RT, 30°C or 37°C (shown)
Conclusions:
Hsp90 complex plays specific role in DNA replication
Hsc82 and hsp82 do not have identical function
Possible new link between signaling cascades, stress, and DNA replication
Our system generates specific, testable hypotheses