Identifying the importance of higher order epistasis€¦ · Identifying the importance of higher...
Transcript of Identifying the importance of higher order epistasis€¦ · Identifying the importance of higher...
-
Identifying the importance of higher order epistasis
Jacob Jaffe
B.Sc. Honors Candidate in Applied Mathematic - Biology
Advisor: Professor Daniel Weinreich Ph.D. Second Reader: Professor Anastasios Matzavinos Ph.D.
-
2
Table of Contents Abstract...............................................................................................................................3Introduction.........................................................................................................................4Methods...............................................................................................................................8
Data Handling ......................................................................................................................... 8Walsh Coefficients ................................................................................................................. 8Linear regression ................................................................................................................... 9Statistical testing .................................................................................................................. 10
Results................................................................................................................................12Discussion..........................................................................................................................16
Handling experimental uncertainty ..................................................................................... 17Handling non-binary cardinalities ....................................................................................... 18
Conclusions.......................................................................................................................19Acknowledgments............................................................................................................20Bibliography.......................................................................................................................21Appendix............................................................................................................................25
1a. Palmer et al. (2015) linear regression figure ................................................................ 252a. Mira et al. (2015) table of TEM b-50 lactamase system grown on 15 different antibiotic backgrounds. ....................................................................................................... 262b. Regression Analysis of Mira et al. (2015) ..................................................................... 273a. Other Sub-sampled data from the same system ......................................................... 28
-
3
Abstract
Higher order epistasis, interactions between more than two mutations, is prevalent in
many different fitness landscapes. While traditionally, evolutionary biologists have
viewed higher order epistatic interactions as less important than main effects or pairwise
epistatic interactions, recent work suggests that higher order epistatic interactions can
explain a portion of variance in fitness landscapes. In this thesis, I use the Walsh
Transformation to rank the explanatory power of individual epistatic coefficients. I find
that in 13 of 19 distinct datasets, higher order epistasis is not incrementally less
important than lower order epistasis or point mutations. This result provides strong
evidence that higher order epistatic interactions play an important role in evolution and
determining fitness landscapes. Furthermore, it shows that new techniques are needed
to better quantify epistasis in non-homogeneous loci systems.
-
4
Introduction Epistasis, the interaction between genes or mutations, is integral to understanding the
dynamics that guide evolution. Epistasis is widely observed in nature and is implicated
in biological phenomena ranging from human disease to antibiotic resistance (Phillips et
al. 1998; Weinreich et al. 2006). Historically, the term epistasis has taken on different
definitions to describe distinct, yet sometimes interacting, biological events including
allele frequencies in Mendelian genetics, unexpected molecular or protein regulatory
interactions resulting from multiple base pair changes, and the statistical deviation from
the additive expectation of two individual mutations (Bateson & Mendel 1913; Fisher et
al. 1930; Philips et al. 2008). For the purposes of this thesis, when I refer to epistasis, I
will mean functional epistasis (Weinreich et al. 2005). Specifically, I will take epistasis to
be the phenotypic deviation from the additive combination of two individual mutations for
a constant environmental and genetic background (Weinreich et. al 2013).
In evolutionary biology, measuring fitness or reproductive success, is an approach used
to quantify and predict evolutionary trends of a population of organisms. The concept of
a fitness landscape, first introduced by Seawall Wright (1932), is a graphical
representation of genotype-phenotype mapping of fitness where proximity signifies
genotypic relatedness and peaks signify local fitness maxima. A population is said to
climb up the fitness landscape when it acquires beneficial mutations that increase
fitness. Consequently, populations can become ‘stuck’ on local fitness peak when
-
5
reaching the global fitness maximum would involve traversing fitness ‘valleys’. [For
definitions of relevant terms see Box 1]
It is widely accepted that sign epistasis can
give rise to increased complexity, or
‘ruggedness’, in fitness landscapes (Weinreich
et al. 2005). This additional ruggedness can
substantially limit the number of accessible
mutational pathways for natural selection, since
each successive acquired mutation must either
be beneficial or neutral. To investigate these consequences on accessible mutational
pathways, it is necessary to consider the projection of an organism’s fitness in
nucleotide sequence space where each locus represents the presence or absence of a
given mutation (Weinreich et al. 2013; Smith, 1970). For example, if each locus (L) has
two possible states, wild type or mutated, constructing all 2L to combinations of
mutations and measuring a proxy for fitness, allows one to construct a complete fitness
landscape. Qualitatively, these measurements can be considered as rank orders in
phenotype, where paths must follow increasing rank orders to reach a fitness maximum
(Crona et al. 2013).
Traditionally, evolutionary biologists have only considered the main effects of single
mutations and pairwise epistatic effects on fitness important in evolution (Hartl, 2014). If
pairwise epistasis is evolutionary significant, how often does pairwise epistasis itself
Box 1: Relevant terms Epistasis: Interactions between genes or mutations such that the phenotypic consequence of multiple mutations differs from the additive expectation based on the constituent mutations. Higher-order epistasis: epistasis of order three or more. Sign epistasis: epistasis in which the sign of the effect of a mutation depends on genetic background. Magnitude epistasis: epistasis in which the sign of the effect of a mutation is independent of genetic background.
-
6
vary with all other genetic backgrounds (Weinreich et al. 2013)? Only recently have
higher order epistatic relationships, interactions between three or more mutations,
begun to be examined. In part, this is due to the sheer number of higher order
interactions. For a given dataset of size n, there are !" k-way epistatic interactions, and
2L total interactions (Weinreich et al. 2013).
Using the Walsh transformation to quantify different orders of epistasis, higher order
epistatic interactions were found in 14 datasets analyzed by Weinreich et al. (2013).
Even though we can empirically observe epistasis in the data, this observation alone
does not mean that higher order epistasis is as significant as main effects or pairwise
epistasis. A recent contribution by Palmer et al. (2015) on trimethoprim resistance in
Escherichia coli modeled the importance of higher order epistasis in shaping a fitness
landscape. Their results showed that a portion of higher order epistatic interactions
were responsible for shaping more of the fitness landscape than some of individual
effects of mutations and pairwise epistatic interactions [Appendix 1a].
For this thesis, I will analyze all available combinatorically complete fitness data sets
using Walsh coefficients to quantify epistatic terms. I will construct regression models of
increasing complexity with these data and quantify the significance of higher order
epistasis against the naïve expectation that as the order of epistasis increases, the
relative contribution to shaping the fitness landscape decreases. Lastly, I will discuss
-
7
the implications of these findings and future directions of research in higher order
epistasis.
-
8
Methods Data Handling
The combinatorically complete datasets used in this meta-study come from many
different biological systems of interest in evolutionary biology and medicine. For each
individual system fitness was inferred by a proxy: growth rate assay, 75% Inhibitory
Concentration (IC-75), Minimum Inhibitory Concentration (MIC), binding assays, and
melting points. While not directly equivalent to each other or direct fitness, growth rate,
MIC and IC-75 measurements have been shown to be closely correlated (Whitlock &
Bourguet 2000). In addition, binding assays and melting points, provide metrics to
measure the fitness of proteins. In experiments that tested the same combinations of
mutations against many different antibiotic backgrounds, I have randomly chosen one
such background to include in my final analysis. Datasets with locus that contain more
than two states have been randomly subsampled to create bi-allelic combinatorically
complete data (Palmer et al. 2015).
Walsh Coefficients
Prior to calculating Walsh Coefficients, data were log-transformed consistent with
Weinreich et al. (2013), which minimized higher order epistatic effects. In each
transformed dataset of 2L fitness values, each allele was represented by a binary bit
string, where one represented the presence of a mutation and zero represented the
absence (Weinreich et al 2013). These bit strings were sorted from smallest to largest
according to their binary values and represented as a vector W. Walsh coefficients were
-
9
generated through applying the Walsh transformation to W and scaling the result by a
factor of 2-L [Box 2]. The resulting E vector of Walsh Coefficients is already sorted from
smallest to largest per their binary values. The binary representation of each Walsh
coefficient can be thought of as the epistatic interaction term between the Loci with ones
in the bit string (Weinreich et al. 2013). The number of number present in each Walsh
coefficient bit string representation are referred to as the Walsh coefficients epistatic
order.
An important property of the Walsh Transformation is that it is symmetrical and has an
orthogonal basis. This guarantees the independence of each Walsh coefficient and
prevents pairwise epistatic coefficients from capturing higher order epistatic effects, a
common issue with regression analysis (Heckendorn & Whitley 1999; Hartl, 2014). It
also allows one to easily convert Walsh coefficients back to fitness values.
Linear regression A step-wise linear regression was performed by back-transforming subsets of Walsh
coefficients to fitness values and quantifying how much of the data these coefficients
explained. To start, each Walsh coefficient paired with the 0th Walsh coefficient (the
grand average) and back-transformed to fitness. The subset of two with the most
explanatory power was chosen, keeping track of the non-0th Walsh coefficient and the
remaining unexplained variance. Next, all other Walsh coefficients were put into subsets
with the 0th Walsh coefficient, and the previous winner. These subsets of three were
back-transformed to fitness values, and the triplicate with the most explanatory power
-
10
was chosen, noting the last added Walsh coefficient and the remaining unexplained
variance. This process was iterated until no Walsh coefficients remained. The order that
Walsh coefficients were added to this model could be predicted by absolute magnitude
of the Walsh Coefficients (Lan & Heckendorn, pers. comm. 2017).
Statistical testing Intuitively, we would expect Walsh Coefficients with lower epistatic order, to explain
more variance in the fitness data. This intuition comes from the fact that biological
systems are primarily additive, and main effects should play a more dominant role in
shaping the fitness landscape. The implication of this is observation is that the order in
which Walsh coefficients are added to the regression model should be predicted by the
epistatic interaction order the coefficient represents. For example, in a system with
three possible mutations, L = 3 (2L Walsh coefficients), we would expect the order of
Walsh coefficients by explanatory power to be exactly the grand average Walsh
coefficient (0th order epistasis), followed by the three main effects coefficients (1st order
epistasis), the three pairwise epistatic coefficients (2nd order epistasis), and ending with
the 3rd order Walsh coefficient.
Kendall’s tb rank correlation, which can account for multiple ties, was calculated
between the observed ordering of Walsh coefficients epistatic order detailed in our
linear regression model and the expected ordering of Walsh Coefficient epistatic order
(Kendall, 1938).
-
11
Figure 1: (a) sample calculation of Walsh coefficients. (b) Oobserved is ordered according to the corresponding Walsh Coefficient absolute magnitude. Note how the 4th Walsh coefficient of epistasis order 2, has an absolute magnitude less than the 8th Walsh coefficient of epistasis order 3. (c) This is the t-b distribution generated through 10,000 psuedo-sample by randomly sampling Walsh coefficient order and calculating tb between each pseudo-sample and the Oexpected. The p-value was the proportion of pseudo-sample tb greater than our observed tb. In this case to the right of the orange line.
To characterize how ordered the epistatic orders of the Walsh coefficients were relative
to Walsh coefficients being ordered randomly, a probability distribution was generated
by randomly pseudo-sampling Walsh Coefficient 1000 times without replacement.
Kendal tb was calculated between the pseudo-samples of Walsh coefficient order and
expected order of Walsh coefficient epistatic order. We then took our p-value to be the
proportion of Tau from the pseudo-sample greater than observed correlation and
corrected for testing between multiple datasets with the Holm-Bonferroni correction
(Holm, 1979). Figure1
-
12
Results In most datasets, step-wise regression revealed Walsh coefficients with higher order
epistatic order appearing before at least one main effect or pairwise Walsh coefficient
[Figure 1]. Only 7 of the 19 datasets exhibited statistically significant orderings of Walsh
Coefficient by epistatic ordering [Table 1]. In studies that tested the same mutational
sites across different environmental background, I found Walsh coefficient order and the
subsequent ordering of epistatic changed relative to environment [Appendix 2a & 2b].
In these cases I generated a complete table of uncorrected p-values [Appendix 3a].
-
13
-
14
Figure 2: Each subfigure letter corresponds to a step-wise linear regression performed on datasets in table 1. Each point represents the epistatic order of the last added parameter to the regression, while the y-axis represents the remaining unexplained variance in the model.
-
15
Table 1: Statistical tests in unique systems (some description taken directly from Weinreich et al. 2013)
* indicates statistical significance
Figure 1
System
# of loci
Tau
Holm-Bonferroni
Corrected p-value
Reference
e Methylobacterium extorquens
Beneficial mutations in novel metabolic pathway
4 0.7527 0* Chou et al. 2011
d Escherichia coli
Beneficial mutations that increase growth rates
5 0.5415 0.0009*
Khan et al. 2011
s E. coli
Coenzyme used in isopropylmalate dehydrogenase (IMDH)
9 0.5257 0.00187*
Lunzer et al. 2005
g E. coli
Metallo-β-lactamases 4 0.6989 0.00288* Meini et al 2015
p Drosophila melanogaster
5 0.4896 0.00465* Whitlock, & Bourguet 2000
b Saccharomyces cerevisiae
Plasmodium falciparum DHFR 3 0.8182 0.00924* Brown et al. 2010
f Avian Lysozome
3 0.8182 0.0104* Malcolm et al. 1990
i HIV Glycoprotein Mutants
5 0.3964 0.0366* Da Silva et al. 2010
q E. coli
TEM b-Lactamase mutation 5 0.3731 0.05907 Weinreich et al. 2006
h E. coli
piperacillin with clavulanic acid 5 0.3290 0.12 Tan et al. 2010
n E. coli
pyrimethamine resistance 4 0.4516 0.15417 Lozovsky et al. 2009
r Solinaceae sequiterpine
6 0.1974 0.21344 O’Maille et al. 2006
c Plasmodium falciparum
Pyrimethamine resistance 3 0.4545 0.38108 Constanzo & Hartl 2011
o E. coli
Trimethoprim resistance recombined DHFR gene
6 0.1324 0.5883
Palmer et al. 2016
j Aspergillus niger
testing recombination and the evolution of sex 5 0.13062 0.6531 De Visser et al. 2009
k E. coli
TEM b-50 lactamase cefotaxime background 4 0.1505 1.21668 Mira et al. 2015
a Mammalian glucocorticoid receptor
4 0.1075 1.07058 Bridgham et al. 2006
l Saccharomyces cerevisiae (haploid) evolution of sex. 6 0.02522 0.8007 Hall et al. 2010
m Saccharomyces cerevisiae (diploid) evolution of sex. 6 0.06620 0.74048 Hall et al. 2010
-
16
Discussion For 13 of the 19 tested datasets, I conclude that the ordering of Walsh Coefficients was
not significantly different from coefficients being randomly ordered. While the largest
dataset tested, Lunzer et al. (2005), was found to be ordered non-randomly, the number
of loci within a dataset did not appear to predict if the dataset’s Walsh coefficients would
be significantly ordered. In addition, the type of biological system did not appear to be a
predictor of ordering; some datasets testing antibiotic resistance, and metabolic growth
pathways were significantly ordered, while others were not. One pattern that did emerge
was that all datasets that came from studies exploring the evolution of sex were not
significantly ordered. were intergenic while all the other studies were intragenic.
How unexpected is it that some higher order epistatic terms appear to explain more
variance than main effects or pairwise epistasis? Particularly in larger datasets, when
higher order epistatic coefficients greatly outnumbering main effects and pairwise
epistatic coefficients, is it inevitable that there will be some randomness in how these
coefficients are ordered? In reviewing many of the same datasets used in this thesis,
Hartl (2014) notes that in many of these systems, as beneficial mutations accumulate,
their returns diminish, following a curve similar to a Michaelis-Menten enzyme kinetics
curve. Hartl (2014) reasons that when beneficial mutations result map to fitness on the
very shallow or very steep slope of the curve, fitness appears additive, but when it is
mapped to the center of the curve, magnitude epistasis appears.
-
17
However, while magnitude epistasis can change the apparent magnitude of Walsh
Coefficients, it does not change the order in which Walsh coefficients appear in our
regression model. A recent paper by Sailor & Harms (2017) agreed that not accounting
for non-linearity prior preforming the Walsh Transformation can overestimate higher
order epistatic interactions. The authors suggest performing a non-linear regression to
identify the optimal power transformation to linearize raw data and account for non-
additive interactions.
Do different environmental backgrounds affect how Walsh Coefficients order
themselves by explanatory power in the same gene system? In Mira et al. (2014), the
authors measured the average growth rates of all combinations of the same four
mutations in the TEM-50 b-Lactamase gene across 15 different antibiotic media. The
resulting tb correlation coefficients were highly dependent on the antibiotic medium the
bacteria were grown in, with coefficients ranging from -.29 to .41. This difference in
correlation statistics is not surprising, since the authors noted that natural selection
favored different combinations of mutations in different growth media (Mira et al. 2014).
Handling experimental uncertainty The next step to develop a more robust model and statistical testing will involve
handling uncertainty. Particularly in the context of ranking Walsh Coefficients by
explanatory power in the regression model, when does the additional explanatory power
of very small Walsh Coefficients become indistinguishable? A potential solution to this
-
18
problem would involve implementing an information criterion in the regression model,
such as Akiko Information Criterion (AIC), which would identify when to stop keeping
track of Walsh Coefficient ranks. In this scheme, all coefficients after the AIC cutoff
would be tied for having the least explanatory power.
Handling non-binary cardinalities The Walsh Transformation is limited in that it cannot handle non-biallelic loci systems.
This turns out to be an impediment when analyzing epistatic interactions in genotype
space with four possible nucleic acids and protein space, which can have as many as
twenty amino acid substitutions. The concept of the Walsh Transformations has been
shown to be generalizable to encompass homogeneously larger alphabets (Zhou et al.
2007). However, both the Walsh Transformation and its extension into higher
dimensionalities suffer from the inability to handle heterogeneous loci systems.
Anderson et al. (2015) has proposed a scheme that identifies epistatic interactions in
systems where each locus can handle alphabets of size 2k. Still, this scheme is not a
robust method to handle different alphabet sizes, and becomes requires one to
construct a transformation table by hand. Within this thesis, I handled heterogeneous
loci systems by randomly subsampling from all available mutations. Palmer et al. (2015)
had one non-binary locus with a cardinality of three. This allowed for creating
subsampling affected Walsh coefficient ordering. The resulting tb correlation
coefficients ranged from .13 - .24, however, regardless of which subsamples were
chosen, the resulting corrected p-values would have been non-significant.
-
19
Conclusions
Higher order epistasis is present in every combinatorically complete dataset that I have
examined. It is pervasive in shaping fitness landscapes and determining evolutionary
accessible peaks. The Walsh Transform provides a straightforward way to quantify
higher order epistasis. Due to the Walsh Transform’s orthogonal properties, ranking
Walsh coefficients by absolute magnitude is equivalent to identifying coefficients by
order of explanatory power in a regression model (Lan & Heckendorn, pers. commun
2017). Contrary to the naïve expectation that higher order epistatic terms are less
important than main effects or pairwise epistatic terms, I have shown that in 13 of the 19
datasets examined, higher order epistatic terms were not significantly less important
than other terms.
-
20
Acknowledgments I would like to thank Professor Daniel Weinreich Ph.D. and Yinghong Lan for providing me with invaluable mentorship throughout this project. I would also like to thank my parents for supporting me throughout my life and feeding my insatiable curiosity throughout my childhood.
-
21
Bibliography Anderson DW, McKeown AN, Thornton JW. Intermolecular epistasis shaped the
function and evolution of an ancient transcription factor and its DNA binding sites. Elife. 2015 Jun 15;4:e07864.
Bateson W, Mendel G. Mendel's principles of heredity. University press; 1913. Bridgham, J.T., Carroll, S.M. and Thornton, J.W., 2006. Evolution of hormone-receptor
complexity by molecular exploitation. Science, 312(5770), pp.97-101. Brown, K.M., Costanzo, M.S., Xu, W., Roy, S., Lozovsky, E.R. and Hartl, D.L., 2010.
Compensatory mutations restore fitness during the evolution of dihydrofolate reductase. Molecular biology and evolution, 27(12), pp.2682-2690.
Chou HH, Chiu HC, Delaney NF, Segrè D, Marx CJ. Diminishing returns epistasis
among beneficial mutations decelerates adaptation. Science. 2011 Jun 3;332(6034):1190-2.
Costanzo, M.S. and Hartl, D.L., 2011. The evolutionary landscape of antifolate
resistance in Plasmodium falciparum. Journal of genetics, 90(2), pp.187-190. Crona K, Greene D, Barlow M. The peaks and geometry of fitness landscapes. Journal
of theoretical biology. 2013 Jan 21;317:1-0. da Silva, J., Coetzer, M., Nedellec, R., Pastore, C. and Mosier, D.E., 2010. Fitness
epistasis and constraints on adaptation in a human immunodeficiency virus type 1 protein region. Genetics, 185(1), pp.293-303.
de Visser, J.A.G., Park, S.C. and Krug, J., 2009. Exploring the effect of sex on empirical
fitness landscapes. the american naturalist, 174(S1), pp.S15-S30. Fisher RA. The genetical theory of natural selection: a complete variorum edition.
Oxford University Press; 1930. Flynn KM, Cooper TF, Moore FB, Cooper VS. The environment affects epistatic
interactions to alter the topology of an empirical fitness landscape. PLoS Genet. 2013 Apr 4;9(4):e1003426.
Franke J, Klözer A, de Visser JA, Krug J. Evolutionary accessibility of mutational
pathways. PLoS Comput Biol. 2011 Aug 18;7(8):e1002134.
-
22
Hall, D.W., Agan, M. and Pope, S.C., 2010. Fitness epistasis among 6 biosynthetic loci in the budding yeast Saccharomyces cerevisiae. Journal of heredity, 101(suppl 1), pp.S75-S84.
Hartl DL. What can we learn from fitness landscapes?. Current opinion in microbiology.
2014 Oct 31;21:51-7. Heckendorn RB, Whitley D. Predicting epistasis from mathematical models.
Evolutionary Computation. 1999;7(1):69-101. Holm S. A simple sequentially rejective multiple test procedure. Scandinavian journal of
statistics. 1979 Jan 1:65-70. Kendall MG. A new measure of rank correlation. Biometrika. 1938 Jun 1;30(1/2):81-93. Khan, A.I., Dinh, D.M., Schneider, D., Lenski, R.E. and Cooper, T.F., 2011. Negative
epistasis between beneficial mutations in an evolving bacterial population. Science, 332(6034), pp.1193-1196.
Lan & Heckendorn, personal Commication 2017 Lozovsky, E.R., Chookajorn, T., Brown, K.M., Imwong, M., Shaw, P.J.,
Kamchonwongpaisan, S., Neafsey, D.E., Weinreich, D.M. and Hartl, D.L., 2009. Stepwise acquisition of pyrimethamine resistance in the malaria parasite. Proceedings of the National Academy of Sciences, 106(29), pp.12025-12030.
Lunzer, M., Miller, S.P., Felsheim, R. and Dean, A.M., 2005. The biochemical
architecture of an ancient adaptive landscape. Science, 310(5747), pp.499-501. Malcolm BA, Wilson KP, Matthews BW, Kirsch JF, Wilson AC. Ancestral lysozymes
reconstructed, neutrality tested, and thermostability linked to hydrocarbon packing. Nature. 1990 May 3;345(6270):86.
Meini MR, Tomatis PE, Weinreich DM, Vila AJ. Quantitative description of a protein
fitness landscape based on molecular features. Molecular biology and evolution. 2015 Jul 1;32(7):1774-87.
Miller, S.P., Lunzer, M. and Dean, A.M., 2006. Direct demonstration of an adaptive
constraint. Science, 314(5798), pp.458-461. Mira PM, Crona K, Greene D, Meza JC, Sturmfels B, Barlow M. Rational design of
antibiotic treatment plans: a treatment strategy for managing evolution and reversing resistance. PloS one. 2015 May 6;10(5):e0122283.
-
23
O'maille, P.E., Malone, A., Dellas, N., Hess, B.A., Smentek, L., Sheehan, I.,
Greenhagen, B.T., Chappell, J., Manning, G. and Noel, J.P., 2008. Quantitative exploration of the catalytic landscape separating divergent plant sesquiterpene synthases. Nature chemical biology, 4(10), pp.617-623.
Palmer, A.C., Toprak, E., Baym, M., Kim, S., Veres, A., Bershtein, S. and Kishony, R.,
2015. Delayed commitment to evolutionary fate in antibiotic resistance fitness landscapes. Nature communications, 6.
Phillips PC. Epistasis—the essential role of gene interactions in the structure and
evolution of genetic systems. Nature Reviews Genetics. 2008 Nov 1;9(11):855-67.
Phillips PC. The language of gene interaction. Genetics. 1998 Jul 1;149(3):1167-71. Sailer ZR, Harms MJ. Detecting High-Order Epistasis in Nonlinear Genotype-Phenotype
Maps. Genetics. 2017 Mar 1;205(3):1079-88. Smith, John Maynard. "Natural selection and the concept of a protein space." (1970):
563-564. Tan, L., Serene, S., Chao, H.X. and Gore, J., 2011. Hidden randomness between
fitness landscapes limits reverse evolution. Physical review letters, 106(19), p.198102.
Weinreich DM, Delaney NF, DePristo MA, Hartl DL. Darwinian evolution can follow only
very few mutational paths to fitter proteins. science. 2006 Apr 7;312(5770):111-4. Weinreich DM, Lan Y, Wylie CS, Heckendorn RB. Should evolutionary geneticists worry
about higher-order epistasis? Current opinion in genetics & development. 2013 Dec 31;23(6):700-7.
Weinreich DM, Watson RA, Chao L. Perspective: sign epistasis and genetic constraint
on evolutionary trajectories. Evolution. 2005 Jun;59(6):1165-74. Whitlock, M.C. and Bourguet, D., 2000. Factors affecting the genetic load in Drosophila:
synergistic epistasis and correlations among fitness components. Evolution, 54(5), pp.1654-1660.
Wright S. The roles of mutation, inbreeding, crossbreeding, and selection in evolution.
na; 1932 Aug.
-
24
Zhou S, Sun Z, Heckendorn RB. Extended probe method for linkage discovery over high-cardinality alphabets. InProceedings of the 9th annual conference on Genetic and evolutionary computation 2007 Jul 7 (pp. 1484-1491). ACM.
-
25
Appendix 1a. Palmer et al. (2015) linear regression figure
“Each genotype was fitted by a series of models of increasing complexity, that assigned changes in resistance through parameters based on the grand average of fitness, the individual effects of mutations, and pairwise and higher order epistatic interactions. The single parameter with the greatest explanatory power was chosen, followed by the parameter with the second greatest explanatory power when combined with the first. Each point in the plot denotes the class of parameter added and the residual variance with that many parameters.” Palmer et al. (2015)
-
26
2a. Mira et al. (2015) table of TEM b-50 lactamase system grown on 15 different antibiotic backgrounds.
Antibiotic Media Number of loci tb correlation p-value
Ampicillin (AMP) 4 0.02151 0.4573 Amoxicillin (AM) 4 -0.032258 0.5448 Cefaclor (CEC) 4 0.40860 0.025 Cefotaxime (CTX) 4 0.15054 0.2424 Ceftizoxime (ZOX) 4 0.35484 0.0531 Cefuroxime (CRO) 4 0.322581 0.0673 Ceftriaxone (AMC) 4 0.150538 0.242 Amoxicillin + Clavulanic acid (AMC) 4 0 0.4832
Ceftazidime (CAZ) 4 0.150538 0.239 Cefotetan (CTT) 4 0.010753 0.4696 Ampicillin + Sulbactam (SAM) 4 0.376344 0.0397
Cefotprozil (CPR) 4 0.215054 0.1605 Cefpodoxime (CPD 4 0.193549 0.1863 Pipercillin + Tazobactam (TZP) 4 0.225806 0.145
Cefepime (FEP) 4 -0.29032 0.902
-
27
2b. Regression Analysis of Mira et al. (2015)
These models show that different medium backgrounds produce different fitness landscapes and regression models.
-
28
3a. Other Sub-sampled data from the same system
System
# of loci
tb correlation
p-value
Reference
E. coli Subset 01
Trimethoprim resistance 6
0.1324
.09305
Palmer et al. 2015
E. coli Subset 02
Trimethoprim resistance 6 0.1412 0.08228
Palmer et al. 2015
E. coli Subset 12
Trimethoprim resistance 6 0.2478 0.00717
Palmer et al. 2015
A. niger testing recombination and the
evolution of sex 5
0.13062
.1089
De Visser et al. 2009
A. niger testing recombination and the
evolution of sex 5 0.2176 0.07031
Franke et al. 2011
Escherichia coli Beneficial mutations that increase
growth rates in Guanazole 5 0.4819 0.0029
Flynn et al. 2013
Escherichia coli Beneficial mutations that increase
growth rates in EGTA 5 0.3549 0.0076
Flynn et al. 2013
Escherichia coli Beneficial mutations that increase
growth 5 0.5415 0
Khan et al. 2011