![Page 1: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/1.jpg)
Genecentric: Finding Graph Theoretic Structure in High-Throughput Epistasis Data
Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott
Tufts University
![Page 2: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/2.jpg)
Protein-protein interaction
![Page 3: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/3.jpg)
High-throughput Interaction Data: aka ‘The Hairball’
![Page 4: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/4.jpg)
What we want:
What we have:
Question: Can we infer anything about "real" pathways from the low-resolution graph model of pairwise interactions?
![Page 5: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/5.jpg)
The hairball: A simple graph model
vertices ↔ genes/proteins
edges ↔ physical interactions or genetic interactions
simplifications:
• undirected
• loses temporal information
• difficult to decompose into separate processes
• conflates different PPI types into one class of "physical interactions"
![Page 6: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/6.jpg)
1) Physical interactions
2) Genetic interactions (epistasis)
![Page 7: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/7.jpg)
Interaction types
• We distinguish here between two types of interaction:
– physical interactions
– genetic interactions
![Page 8: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/8.jpg)
Genetic interactions (epistasis)
Only 18% of yeast genes are essential (the yeast dies when they’re removed).
For the rest, we can compare the growth of the double knockout to its component single knockouts.
![Page 9: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/9.jpg)
Genetic interactions (epistasis)
• For non-essential genes, we can compare the growth of the double knockout to its component single knockouts
Picture: Ulitsky
![Page 10: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/10.jpg)
Nonessential Genes
– Some genes are non-essential because they are only required under certain conditions (e.g., an enzyme to metabolize a particular nutrient).
– Other genes are non-essential because the network has some built-in redundancy.
• One gene (completely or partially) compensates for the loss of another.
• One functional pathway (completely or partially) compensates for the loss of another.
![Page 11: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/11.jpg)
Redundant pathways and synthetic lethality
![Page 12: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/12.jpg)
Kelley and Ideker (2005): Between-Pathway Model (BPM)
![Page 13: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/13.jpg)
Between-Pathway Model (BPM)
In reality, the data are very incomplete
![Page 14: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/14.jpg)
Kelley and Ideker (2005) and Ulitsky and Shamir (2007)
• Goal: detect putative BPMs in yeast interactome
• Method:
1) find densely-connected subsets of the physical protein-protein interaction (PI) network (putative pathways)
2) check the genetic interaction (GI) network to see if patterns in density of genetic interactions correlate with these putative pathways
3) check resulting structures for overrepresentation of biological function (gene set enrichment)
![Page 15: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/15.jpg)
Kelley and Ideker (2005) and Ulitsky and Shamir (2007)
[Figure: steps (1), (2), and (3) of the method; modules labeled "enriched for function X" and "enriched for function Y"]
![Page 16: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/16.jpg)
Kelley and Ideker (2005) and Ulitsky and Shamir (2007)
• Problems:
– Sparse data limits the potential scope of discovery
– Independent validation is difficult
![Page 17: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/17.jpg)
Further work on this problem:
Synthetic lethality:
– Ulitsky and Shamir (2007)
– Ma, Tarrone and Li (2008)
– Brady, Maxwell, Daniels and Cowen (2009)
– Hescott, Leiserson, Cowen and Slonim (2010)
Epistasis (weighted) data:
– Kelley and Kingsford (2011)
– Leiserson, Tatar, Cowen and Hescott (2011)
![Page 18: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/18.jpg)
So: what is the right way to generalize BPMs to edge weights?
![Page 19: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/19.jpg)
Quantitative interaction data
New methods generate high-throughput data for genetic interactions.
E-MAP (Epistatic Miniarray Profile)
• Data is scalar (-22 to 15)
• Synthetic lethal: x < -2.5
• Synthetic sick: -2.5 < x < 0
• Alleviating: 0 < x < 2.5
• Synthetic rescue: x > +2.5
SGA (Synthetic Genetic Array)
• Smaller weights (-1.1 to 0.8)
[Figure: example interaction graph with weighted edges]
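The E-MAP cut-offs on this slide can be written as a small lookup. This is only an illustrative sketch: the function name `classify_emap_score` is hypothetical, and the handling of the exact boundary values (-2.5, 0, +2.5), which the slide leaves open, is an assumption.

```python
def classify_emap_score(x):
    """Label an E-MAP score using the cut-offs listed on the slide.

    Assumption: the slide does not say where the exact boundary values
    belong, so -2.5 is treated as "synthetic sick", 0 as "neutral",
    and +2.5 as "alleviating".
    """
    if x < -2.5:
        return "synthetic lethal"
    if x < 0:
        return "synthetic sick"
    if x == 0:
        return "neutral"
    if x <= 2.5:
        return "alleviating"
    return "synthetic rescue"
```

SGA weights fall in a much narrower range (-1.1 to 0.8), so these thresholds would not carry over to SGA data unchanged.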
![Page 20: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/20.jpg)
Want the most negative weight across the cut between the two sides
[Figure: example interaction graph with weighted edges]
![Page 21: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/21.jpg)
What is the Quality of a BPM?
Once we obtain a candidate BPM we can score it using interaction data.
Sum interactions within
Sum interactions between
Take the difference and normalize to create an interaction score
[Figure: example interaction graph with weighted edges]
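One way to read the scoring recipe on this slide is sketched below. It is not taken from the Genecentric source: the name `bpm_score`, the representation of weights as a dict keyed by `frozenset` gene pairs, and the choice to normalize by the total number of genes in the BPM are all assumptions made for illustration.

```python
from itertools import combinations


def bpm_score(weights, side_a, side_b):
    """Interaction score of a candidate BPM (side_a, side_b).

    weights: dict mapping frozenset({u, v}) -> interaction weight;
             missing pairs are treated as weight 0 (assumption).
    """
    def w(u, v):
        return weights.get(frozenset((u, v)), 0.0)

    # Sum of weights inside each of the two pathways.
    within = sum(
        w(u, v)
        for side in (side_a, side_b)
        for u, v in combinations(sorted(side), 2)
    )
    # Sum of weights running between the two pathways.
    between = sum(w(u, v) for u in side_a for v in side_b)
    # Assumed normalization: divide by the number of genes in the BPM.
    return (within - between) / (len(side_a) + len(side_b))
```

Because aggravating interactions are negative, a good BPM has strongly negative weight between its two sides, which makes `within - between` large and positive.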
![Page 22: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/22.jpg)
Genecentric takes the perspective of each gene in turn
What is the ‘best’ candidate BPM that contains node g?
Consider a diverse set of GLOBAL partitions that try to MAXIMIZE our objective function over the whole graph. Which genes are consistently placed in the same (opposite) partition as g?
[Figure: example interaction graph with weighted edges]
![Page 23: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/23.jpg)
So we can extract a gene’s best BPM from a diverse set of good global bipartitions
Idea for constructing the global bipartitions: Maximal cut
![Page 24: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/24.jpg)
Create a random bipartition
For every vertex (gene) assign to a partition at random
![Page 25: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/25.jpg)
Local search method
Now for each gene, v, consider its interaction scores
![Page 26: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/26.jpg)
Unhappy vs happy vertices
![Page 27: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/27.jpg)
Flip
Flip to the other side to make it happy!
same(v) is now opposite(v) and opposite(v) is same(v)
some vertices could change to happy or unhappy
![Page 28: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/28.jpg)
Important properties
Flip will always terminate
- finite number of possible partitions
- weight between partitions decreases with each flip
- everyone is happy eventually
- local optimum
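The random bipartition and the flip-based local search from the preceding slides can be sketched as follows. This is a minimal illustration, not Genecentric's actual code. The name `weighted_flip`, the weight dict keyed by `frozenset` pairs, and the tolerance `eps` are assumptions. So is the definition of "unhappy", which the slides only show pictorially: here a vertex is unhappy when its total weight to its own side is lower than its total weight to the opposite side, so that flipping it lowers the weight crossing the partition.

```python
import random


def weighted_flip(genes, weights, rng=None, eps=1e-12):
    """Return a locally optimal bipartition as a dict gene -> 0 or 1.

    weights: dict mapping frozenset({u, v}) -> interaction weight;
             missing pairs count as 0 (assumption).
    """
    if rng is None:
        rng = random.Random()
    # Start from a random bipartition.
    side = {g: rng.randrange(2) for g in genes}

    def gain(v):
        # How much the cross-partition weight would drop if v flipped.
        same = opposite = 0.0
        for u in genes:
            if u == v:
                continue
            w = weights.get(frozenset((u, v)), 0.0)
            if side[u] == side[v]:
                same += w
            else:
                opposite += w
        return opposite - same

    # Keep flipping unhappy vertices until everyone is happy.
    changed = True
    while changed:
        changed = False
        for v in genes:
            if gain(v) > eps:
                side[v] ^= 1  # same(v) and opposite(v) swap roles
                changed = True
    return side
```

Each flip strictly lowers the total weight between the two sides, which is the termination argument given on the slide. Different random starts can end in different local optima, and that variation is what supplies the diverse set of bipartitions.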
![Page 29: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/29.jpg)
How we make a BPM from bipartitions
Run weighted flip on the entire graph of interactions M times (M = 250). Then, for every gene g:
Some genes will stay on the same side as g for most runs.
Some genes will stay on the opposite side from g for most runs.
Most will switch sides among the different runs.
[Figure: example interaction graph with weighted edges]
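Extracting one gene's candidate BPM from the M bipartitions might look like the sketch below. The function name `candidate_bpm` and the input layout (one dict of side labels per run) are hypothetical; the consistency threshold `c` plays the role of the parameter C listed later in the deck (default 90%).

```python
def candidate_bpm(g, runs, c=0.9):
    """Build the candidate BPM for gene g.

    runs: list of dicts, one per bipartition, mapping gene -> 0 or 1.
    c:    fraction of runs in which a gene must agree (or disagree)
          with g to be included; assumed to be greater than 0.5.
    """
    if not runs:
        raise ValueError("need at least one bipartition")
    m = len(runs)
    same, opposite = {g}, set()
    for h in runs[0]:
        if h == g:
            continue
        # Number of runs in which h landed on the same side as g.
        agree = sum(1 for side in runs if side[h] == side[g])
        if agree >= c * m:
            same.add(h)
        elif m - agree >= c * m:
            opposite.add(h)
        # Genes that switch sides too often are left out.
    return same, opposite
```

Only the relative placement of `h` and `g` matters, so it makes no difference that the labels 0 and 1 are arbitrary from run to run.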
![Page 30: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/30.jpg)
BPM collection: Removing Redundancies
Sort by score, add to final output set if Jaccard index < .66 for all previously added BPMs
Remove BPMs that are too large or small
[Figure: example interaction graph with weighted edges]
Take the difference and divide by the size
Numbers chosen to match previous studies
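The pruning step could be sketched like this. The names `jaccard` and `prune_bpms` are hypothetical. Two details are assumptions, since the slide does not specify them: the Jaccard index is computed over the union of both sides of each BPM, and the size limits are applied to each side separately.

```python
def jaccard(x, y):
    """Jaccard index of two sets (0.0 if both are empty)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def prune_bpms(scored_bpms, j=0.66, min_size=3, max_size=25):
    """Greedily keep high-scoring BPMs that do not overlap too much.

    scored_bpms: iterable of (score, side_a, side_b) with sets of genes.
    """
    kept = []
    # Visit candidates from best to worst score.
    for score, a, b in sorted(scored_bpms, key=lambda t: t[0], reverse=True):
        # Drop BPMs whose modules are too small or too large.
        if not all(min_size <= len(s) <= max_size for s in (a, b)):
            continue
        genes = a | b
        # Keep only if sufficiently different from everything kept so far.
        if all(jaccard(genes, ka | kb) < j for _, ka, kb in kept):
            kept.append((score, a, b))
    return kept
```

Processing candidates in score order means that when two BPMs overlap heavily, the higher-scoring one is the one that survives.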
![Page 31: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/31.jpg)
How do we measure results?
• FuncAssociate to measure gene set enrichment
Berriz, Beaver, Cenik, Tasan, Roth, “Next generation software for functional trend analysis,” Bioinformatics, 2009, 25(22): 3043-4.
• Location of physical interactions
![Page 32: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/32.jpg)
Our Results
![Page 33: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/33.jpg)
Comparison to previous methods: yeast ChromBio E-MAP

| Study | #Modules (%Enriched) | #BPMs | Enriched Same Function | Enriched Same or Similar Function |
|---|---|---|---|---|
| Bandyopadhyay et al. | 37 (35) | 96 | 41 (43%) | 53 (55%) |
| Ulitsky et al. | 43 (43) | 111 | 43 (39%) | 71 (64%) |
| Kelley et al. | 40 (40) | 98 | 35 (36%) | 52 (53%) |
| Genecentric | 112 (103) | 58 | 39 (67%) | 43 (74%) |
![Page 34: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/34.jpg)
How does Genecentric work with various data?
[Figure: four example weighted interaction graphs, labeled E-MAP (Cell Cycle), E-MAP (S. pombe), SGA, and E-MAP (MAP-K)]
![Page 35: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/35.jpg)
Genecentric on Various Data Sets

| Data Set | #BPMs | Enriched Same Function | Enriched Same or Similar Function |
|---|---|---|---|
| Collins et al. (Cell Cycle) | 58 | 39 (67%) | 43 (74%) |
| Fiedler et al. (MAP-K) | 5 | 0 (0%) | 4 (80%) |
| Tong et al. (SGA) | 149 | 8 (5%) | 17 (11%) |
| Roguev et al. (S. pombe) | 16 | 1 (6%) | 1 (6%) |
![Page 36: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/36.jpg)
Consider physical interactions
[Figure: example weighted graphs, labeled "genetic interactions" and "Physical Interactions"]
![Page 37: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/37.jpg)
Physical interactions in Local Cut BPMs

| Data Set | PIs within Pathways | Expected by chance within | PIs between Pathways | Expected by chance between |
|---|---|---|---|---|
| Collins et al. | 172 | 20 | 18 | 20 |
| Fiedler et al. | 13 | 1 | 1 | 1 |
| Tong et al. | 147 | 41 | 17 | 39 |
![Page 38: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/38.jpg)
Modifying the weights
How does alleviating interaction data affect the results?
Do extreme weights affect the quality of the results?
Does a continuum of possible weights change the results?
[Figure: example interaction graph with weighted edges]
![Page 39: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/39.jpg)
Local Cut Weight Variants

| Weight scheme | #BPMs | Enriched Same Function | Enriched Same or Similar Function |
|---|---|---|---|
| Unchanged | 58 | 39 (67%) | 43 (74%) |
| No alleviating | 26 | 17 (65%) | 19 (73%) |
| Large values capped | 68 | 4 (6%) | 6 (9%) |
| Alleviating +1, Aggravating -1 | 30 | 3 (10%) | 7 (23%) |
![Page 40: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/40.jpg)
Genecentric: try this at home
• Project name: Genecentric
• Project homepage: http://bcb.cs.tufts.edu/genecentric
• Operating system: platform independent
• Programming language: Python
• Other requirements: Python 2.6 or higher
• License: GNU Public License (GPL 2.0)
![Page 41: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/41.jpg)
Genecentric parameters
• Set M (number of randomized bipartitions) default 250
• Set C (consistency of same side/opposite side for inclusion in g’s BPM) default 90%
• Set J (Jaccard index, how much overlap before similar BPMs are pruned) default .66
• Do you want a min or max size module? (default 3-25)
• FuncAssociate parameters: genespace, p-value
![Page 42: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/42.jpg)
Genecentric works out of the box
• “New” E-MAP of plasma membrane genes from Aguilar et al. in 2010.
• 374 genes including those known to be involved in endocytosis, signaling, lipid metabolism, eisome function.
• Genecentric was run with default E-MAP parameters, except C was lowered from .9 to .8 to produce more BPMs (22 instead of 6)
![Page 43: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/43.jpg)
Genecentric on plasma membrane E-MAP : example BPM
• COG6 COG5 COG8 PIB2 COG7
• Intra-Golgi vesicle-mediated transport, protein targeting to vacuole
BPM2
• ARL1 VPS35 GET3 ARL3 SYS1 GOT1 PEP8 SFT2 MNN1 VPS17
• Protein transport, Golgi apparatus, endosome transport, vesicle-mediated transport
BPM1
![Page 44: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/44.jpg)
Genecentric on plasma membrane E-MAP : example BPM
• SLT2 BCK1 CLC1
• Endoplasmic reticulum unfolded protein response
BPM2
• PEX1 PEX6 EDE1 SKN7 ERG4 ADH1 PEX15 ARC18 EMC33
• Protein import into peroxisome matrix, receptor recycling
BPM1
![Page 45: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/45.jpg)
Biological Findings (cont.)
• Some complexes come up again and again
– could they be global mechanisms of fault tolerance?
In Plasma Membrane:
– COG complex
In ChromBio:
– SWR-C complex (Chromatin remodeling)
– Prefoldin complex (Chaperone)
– MRE11 complex (DNA damage repair)
![Page 46: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/46.jpg)
Co-authors and collaborators
• Ben Hescott
• Max Leiserson
• Diana Tatar
• Maxim Kachalov
![Page 47: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/47.jpg)
thanks.
![Page 48: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/48.jpg)
A Graph Theory Problem
• Our algorithm samples from the maximal bipartite subgraphs. With what distribution? Is it uniform? Is it proportional to the number of edges that cross the cut?
• What are the properties of the stable bipartite subgraphs of the synthetic lethal network? Are they conserved across species?
![Page 49: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/49.jpg)
Approach
• Run the partitioning algorithm 250 times on the yeast SL network (G).
• For each gene g in G,
– Construct a set A consisting of g and all nodes in G which wind up in the same set as g at least 70% of the time.
– Construct another set B consisting of all nodes in G which wind up in the opposite set from g at least 70% of the time.
• We call the subgraph of G defined by A and B the “stable bipartite subgraph of g”, and designate it as a candidate BPM.
![Page 50: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/50.jpg)
Delete a gene in pathway 1; see if the changes in pathway 2 are coherent
![Page 51: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/51.jpg)
[Figure: validation pipeline; labels: BPM, Deleted Gene, Pathway restriction, Sort, log10 ratio]
![Page 52: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/52.jpg)
Validation: Microarray Data
• Rosetta compendium (Hughes et al., 2000):
– contains yeast expression profiles of 276 deletion mutants, i.e., for each gene in the yeast genome, it measures how that gene's expression level changes when a particular gene g is deleted, as compared to wild-type yeast.
![Page 53: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/53.jpg)
At step i (from N to 1):
Calculate the weighted percent of genes in the pathway seen so far and the percent of genes not in the pathway
Score is the max difference
![Page 54: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/54.jpg)
How to validate a pathway
• Using a permutation test, we sample 99 random subsets of genes, each the same size as the pathway
• We calculate the cluster rank score for each of these 99 sets
• We sort the 99 test scores together with the pathway score
• The p-value is the percentile
• A pathway is validated if its p-value is <= 0.1
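The permutation test described here can be sketched generically. The name `permutation_p_value` is hypothetical, and `score_fn` stands in for the cluster rank score, whose exact definition is not reproduced here. The sketch also assumes that a larger score is more extreme and that ties count against the pathway.

```python
import random


def permutation_p_value(pathway, all_genes, score_fn, n_random=99, rng=None):
    """Empirical p-value of a pathway's score against random gene sets.

    pathway:   set of genes in the pathway being validated
    all_genes: collection of genes to draw random subsets from
    score_fn:  callable taking a set of genes and returning a number
               (placeholder for the cluster rank score)
    """
    if rng is None:
        rng = random.Random()
    observed = score_fn(pathway)
    pool = sorted(all_genes)
    # Score n_random random subsets of the same size as the pathway.
    null_scores = [
        score_fn(set(rng.sample(pool, len(pathway))))
        for _ in range(n_random)
    ]
    # Position of the pathway among all n_random + 1 scores,
    # counting from the most extreme one.
    rank = 1 + sum(1 for s in null_scores if s >= observed)
    return rank / (n_random + 1)
```

With 99 random subsets the smallest attainable p-value is 0.01, and a pathway passes the slide's threshold when at most 9 random sets score at least as high as it does.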
![Page 55: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/55.jpg)
Delete a gene in pathway 1; see if the changes in pathway 2 are coherent
We call a pathway “Validated” if its Cluster Rank Score has p-value < .1
![Page 56: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/56.jpg)
Kelley-Ideker Histogram of the Lowest CRS per Pathway per BPM
This histogram displays all the CRS scores from all of the results from Kelley and Ideker’s BPMs bucketed according to their lowest p value score. The p value scores <= 0.10 indicate a validated BPM.
![Page 57: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/57.jpg)
Ulitsky Histogram of the Lowest CRS per Pathway per BPM
This histogram displays all the CRS scores from all of the results from Ulitsky’s BPMs bucketed according to their lowest p value score. The p value scores <= 0.10 indicate a validated BPM.
![Page 58: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/58.jpg)
Ma Histogram of the Lowest CRS per Pathway per BPM
This histogram displays all the CRS scores from all of the results from Ma’s BPMs bucketed according to their lowest p value score. The p value scores <= 0.10 indicate a validated BPM.
![Page 59: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/59.jpg)
Brady Histogram of the Lowest CRS per BPM
This histogram displays all the CRS scores from all of the results from Brady’s BPMs bucketed according to their lowest p value score. The p value scores <= 0.10 indicate a validated BPM. Clearly, Brady’s BPMs are disproportionately represented in the lower p value range.
![Page 60: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/60.jpg)
Results

| BPM dataset | # paths hit by knockouts | # validated pathways | % validated pathways |
|---|---|---|---|
| Kelley-Ideker (05) | 160 | 16 | 10% |
| Ulitsky-Shamir (07) | 36 | 5 | 14% |
| Ma et al. (08) | 54 | 6 | 11% |
| Our results | 959 | 230 | 24% |
![Page 61: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/61.jpg)
A Tantalizing Peek of What We can Do With More Data!
• A heat map of the differential expression of yeast genes in pathway 2 in response to the deletion of two different genes (SHE4 and GAS1) from pathway 1 in a validated BPM of Ma et al.
![Page 62: Genecentric: Finding Graph Theoretic Structure in High- Throughput Epistasis Data Andrew Gallant, Max Leiserson, M. Kachalov, Lenore Cowen, Ben Hescott.](https://reader035.fdocuments.in/reader035/viewer/2022062518/56649e585503460f94b50cd3/html5/thumbnails/62.jpg)
A random-gene validation test couples the two pathways together