Identifying the importance of higher order epistasis€¦ · Identifying the importance of higher...

28
Identifying the importance of higher order epistasis Jacob Jaffe B.Sc. Honors Candidate in Applied Mathematic - Biology Advisor: Professor Daniel Weinreich Ph.D. Second Reader: Professor Anastasios Matzavinos Ph.D.

Transcript of Identifying the importance of higher order epistasis€¦ · Identifying the importance of higher...

  • Identifying the importance of higher order epistasis

    Jacob Jaffe

    B.Sc. Honors Candidate in Applied Mathematic - Biology

    Advisor: Professor Daniel Weinreich Ph.D. Second Reader: Professor Anastasios Matzavinos Ph.D.

  • 2

    Table of Contents Abstract...............................................................................................................................3Introduction.........................................................................................................................4Methods...............................................................................................................................8

    Data Handling ......................................................................................................................... 8Walsh Coefficients ................................................................................................................. 8Linear regression ................................................................................................................... 9Statistical testing .................................................................................................................. 10

    Results................................................................................................................................12Discussion..........................................................................................................................16

    Handling experimental uncertainty ..................................................................................... 17Handling non-binary cardinalities ....................................................................................... 18

    Conclusions.......................................................................................................................19Acknowledgments............................................................................................................20Bibliography.......................................................................................................................21Appendix............................................................................................................................25

    1a. Palmer et al. (2015) linear regression figure ................................................................ 252a. Mira et al. (2015) table of TEM b-50 lactamase system grown on 15 different antibiotic backgrounds. ....................................................................................................... 262b. Regression Analysis of Mira et al. (2015) ..................................................................... 273a. Other Sub-sampled data from the same system ......................................................... 28

  • 3

    Abstract

    Higher order epistasis, interactions between more than two mutations, is prevalent in

    many different fitness landscapes. While traditionally, evolutionary biologists have

    viewed higher order epistatic interactions as less important than main effects or pairwise

    epistatic interactions, recent work suggests that higher order epistatic interactions can

    explain a portion of variance in fitness landscapes. In this thesis, I use the Walsh

    Transformation to rank the explanatory power of individual epistatic coefficients. I find

    that in 13 of 19 distinct datasets, higher order epistasis is not incrementally less

    important than lower order epistasis or point mutations. This result provides strong

    evidence that higher order epistatic interactions play an important role in evolution and

    determining fitness landscapes. Furthermore, it shows that new techniques are needed

    to better quantify epistasis in non-homogeneous loci systems.

  • 4

    Introduction Epistasis, the interaction between genes or mutations, is integral to understanding the

    dynamics that guide evolution. Epistasis is widely observed in nature and is implicated

    in biological phenomena ranging from human disease to antibiotic resistance (Phillips et

    al. 1998; Weinreich et al. 2006). Historically, the term epistasis has taken on different

    definitions to describe distinct, yet sometimes interacting, biological events including

    allele frequencies in Mendelian genetics, unexpected molecular or protein regulatory

    interactions resulting from multiple base pair changes, and the statistical deviation from

    the additive expectation of two individual mutations (Bateson & Mendel 1913; Fisher et

    al. 1930; Philips et al. 2008). For the purposes of this thesis, when I refer to epistasis, I

    will mean functional epistasis (Weinreich et al. 2005). Specifically, I will take epistasis to

    be the phenotypic deviation from the additive combination of two individual mutations for

    a constant environmental and genetic background (Weinreich et. al 2013).

    In evolutionary biology, measuring fitness or reproductive success, is an approach used

    to quantify and predict evolutionary trends of a population of organisms. The concept of

    a fitness landscape, first introduced by Seawall Wright (1932), is a graphical

    representation of genotype-phenotype mapping of fitness where proximity signifies

    genotypic relatedness and peaks signify local fitness maxima. A population is said to

    climb up the fitness landscape when it acquires beneficial mutations that increase

    fitness. Consequently, populations can become ‘stuck’ on local fitness peak when

  • 5

    reaching the global fitness maximum would involve traversing fitness ‘valleys’. [For

    definitions of relevant terms see Box 1]

    It is widely accepted that sign epistasis can

    give rise to increased complexity, or

    ‘ruggedness’, in fitness landscapes (Weinreich

    et al. 2005). This additional ruggedness can

    substantially limit the number of accessible

    mutational pathways for natural selection, since

    each successive acquired mutation must either

    be beneficial or neutral. To investigate these consequences on accessible mutational

    pathways, it is necessary to consider the projection of an organism’s fitness in

    nucleotide sequence space where each locus represents the presence or absence of a

    given mutation (Weinreich et al. 2013; Smith, 1970). For example, if each locus (L) has

    two possible states, wild type or mutated, constructing all 2L to combinations of

    mutations and measuring a proxy for fitness, allows one to construct a complete fitness

    landscape. Qualitatively, these measurements can be considered as rank orders in

    phenotype, where paths must follow increasing rank orders to reach a fitness maximum

    (Crona et al. 2013).

    Traditionally, evolutionary biologists have only considered the main effects of single

    mutations and pairwise epistatic effects on fitness important in evolution (Hartl, 2014). If

    pairwise epistasis is evolutionary significant, how often does pairwise epistasis itself

    Box 1: Relevant terms Epistasis: Interactions between genes or mutations such that the phenotypic consequence of multiple mutations differs from the additive expectation based on the constituent mutations. Higher-order epistasis: epistasis of order three or more. Sign epistasis: epistasis in which the sign of the effect of a mutation depends on genetic background. Magnitude epistasis: epistasis in which the sign of the effect of a mutation is independent of genetic background.

  • 6

    vary with all other genetic backgrounds (Weinreich et al. 2013)? Only recently have

    higher order epistatic relationships, interactions between three or more mutations,

    begun to be examined. In part, this is due to the sheer number of higher order

    interactions. For a given dataset of size n, there are !" k-way epistatic interactions, and

    2L total interactions (Weinreich et al. 2013).

    Using the Walsh transformation to quantify different orders of epistasis, higher order

    epistatic interactions were found in 14 datasets analyzed by Weinreich et al. (2013).

    Even though we can empirically observe epistasis in the data, this observation alone

    does not mean that higher order epistasis is as significant as main effects or pairwise

    epistasis. A recent contribution by Palmer et al. (2015) on trimethoprim resistance in

    Escherichia coli modeled the importance of higher order epistasis in shaping a fitness

    landscape. Their results showed that a portion of higher order epistatic interactions

    were responsible for shaping more of the fitness landscape than some of individual

    effects of mutations and pairwise epistatic interactions [Appendix 1a].

    For this thesis, I will analyze all available combinatorically complete fitness data sets

    using Walsh coefficients to quantify epistatic terms. I will construct regression models of

    increasing complexity with these data and quantify the significance of higher order

    epistasis against the naïve expectation that as the order of epistasis increases, the

    relative contribution to shaping the fitness landscape decreases. Lastly, I will discuss

  • 7

    the implications of these findings and future directions of research in higher order

    epistasis.

  • 8

    Methods Data Handling

    The combinatorically complete datasets used in this meta-study come from many

    different biological systems of interest in evolutionary biology and medicine. For each

    individual system fitness was inferred by a proxy: growth rate assay, 75% Inhibitory

    Concentration (IC-75), Minimum Inhibitory Concentration (MIC), binding assays, and

    melting points. While not directly equivalent to each other or direct fitness, growth rate,

    MIC and IC-75 measurements have been shown to be closely correlated (Whitlock &

    Bourguet 2000). In addition, binding assays and melting points, provide metrics to

    measure the fitness of proteins. In experiments that tested the same combinations of

    mutations against many different antibiotic backgrounds, I have randomly chosen one

    such background to include in my final analysis. Datasets with locus that contain more

    than two states have been randomly subsampled to create bi-allelic combinatorically

    complete data (Palmer et al. 2015).

    Walsh Coefficients

    Prior to calculating Walsh Coefficients, data were log-transformed consistent with

    Weinreich et al. (2013), which minimized higher order epistatic effects. In each

    transformed dataset of 2L fitness values, each allele was represented by a binary bit

    string, where one represented the presence of a mutation and zero represented the

    absence (Weinreich et al 2013). These bit strings were sorted from smallest to largest

    according to their binary values and represented as a vector W. Walsh coefficients were

  • 9

    generated through applying the Walsh transformation to W and scaling the result by a

    factor of 2-L [Box 2]. The resulting E vector of Walsh Coefficients is already sorted from

    smallest to largest per their binary values. The binary representation of each Walsh

    coefficient can be thought of as the epistatic interaction term between the Loci with ones

    in the bit string (Weinreich et al. 2013). The number of number present in each Walsh

    coefficient bit string representation are referred to as the Walsh coefficients epistatic

    order.

    An important property of the Walsh Transformation is that it is symmetrical and has an

    orthogonal basis. This guarantees the independence of each Walsh coefficient and

    prevents pairwise epistatic coefficients from capturing higher order epistatic effects, a

    common issue with regression analysis (Heckendorn & Whitley 1999; Hartl, 2014). It

    also allows one to easily convert Walsh coefficients back to fitness values.

    Linear regression A step-wise linear regression was performed by back-transforming subsets of Walsh

    coefficients to fitness values and quantifying how much of the data these coefficients

    explained. To start, each Walsh coefficient paired with the 0th Walsh coefficient (the

    grand average) and back-transformed to fitness. The subset of two with the most

    explanatory power was chosen, keeping track of the non-0th Walsh coefficient and the

    remaining unexplained variance. Next, all other Walsh coefficients were put into subsets

    with the 0th Walsh coefficient, and the previous winner. These subsets of three were

    back-transformed to fitness values, and the triplicate with the most explanatory power

  • 10

    was chosen, noting the last added Walsh coefficient and the remaining unexplained

    variance. This process was iterated until no Walsh coefficients remained. The order that

    Walsh coefficients were added to this model could be predicted by absolute magnitude

    of the Walsh Coefficients (Lan & Heckendorn, pers. comm. 2017).

    Statistical testing Intuitively, we would expect Walsh Coefficients with lower epistatic order, to explain

    more variance in the fitness data. This intuition comes from the fact that biological

    systems are primarily additive, and main effects should play a more dominant role in

    shaping the fitness landscape. The implication of this is observation is that the order in

    which Walsh coefficients are added to the regression model should be predicted by the

    epistatic interaction order the coefficient represents. For example, in a system with

    three possible mutations, L = 3 (2L Walsh coefficients), we would expect the order of

    Walsh coefficients by explanatory power to be exactly the grand average Walsh

    coefficient (0th order epistasis), followed by the three main effects coefficients (1st order

    epistasis), the three pairwise epistatic coefficients (2nd order epistasis), and ending with

    the 3rd order Walsh coefficient.

    Kendall’s tb rank correlation, which can account for multiple ties, was calculated

    between the observed ordering of Walsh coefficients epistatic order detailed in our

    linear regression model and the expected ordering of Walsh Coefficient epistatic order

    (Kendall, 1938).

  • 11

    Figure 1: (a) sample calculation of Walsh coefficients. (b) Oobserved is ordered according to the corresponding Walsh Coefficient absolute magnitude. Note how the 4th Walsh coefficient of epistasis order 2, has an absolute magnitude less than the 8th Walsh coefficient of epistasis order 3. (c) This is the t-b distribution generated through 10,000 psuedo-sample by randomly sampling Walsh coefficient order and calculating tb between each pseudo-sample and the Oexpected. The p-value was the proportion of pseudo-sample tb greater than our observed tb. In this case to the right of the orange line.

    To characterize how ordered the epistatic orders of the Walsh coefficients were relative

    to Walsh coefficients being ordered randomly, a probability distribution was generated

    by randomly pseudo-sampling Walsh Coefficient 1000 times without replacement.

    Kendal tb was calculated between the pseudo-samples of Walsh coefficient order and

    expected order of Walsh coefficient epistatic order. We then took our p-value to be the

    proportion of Tau from the pseudo-sample greater than observed correlation and

    corrected for testing between multiple datasets with the Holm-Bonferroni correction

    (Holm, 1979). Figure1

  • 12

    Results In most datasets, step-wise regression revealed Walsh coefficients with higher order

    epistatic order appearing before at least one main effect or pairwise Walsh coefficient

    [Figure 1]. Only 7 of the 19 datasets exhibited statistically significant orderings of Walsh

    Coefficient by epistatic ordering [Table 1]. In studies that tested the same mutational

    sites across different environmental background, I found Walsh coefficient order and the

    subsequent ordering of epistatic changed relative to environment [Appendix 2a & 2b].

    In these cases I generated a complete table of uncorrected p-values [Appendix 3a].

  • 13

  • 14

    Figure 2: Each subfigure letter corresponds to a step-wise linear regression performed on datasets in table 1. Each point represents the epistatic order of the last added parameter to the regression, while the y-axis represents the remaining unexplained variance in the model.

  • 15

    Table 1: Statistical tests in unique systems (some description taken directly from Weinreich et al. 2013)

    * indicates statistical significance

    Figure 1

    System

    # of loci

    Tau

    Holm-Bonferroni

    Corrected p-value

    Reference

    e Methylobacterium extorquens

    Beneficial mutations in novel metabolic pathway

    4 0.7527 0* Chou et al. 2011

    d Escherichia coli

    Beneficial mutations that increase growth rates

    5 0.5415 0.0009*

    Khan et al. 2011

    s E. coli

    Coenzyme used in isopropylmalate dehydrogenase (IMDH)

    9 0.5257 0.00187*

    Lunzer et al. 2005

    g E. coli

    Metallo-β-lactamases 4 0.6989 0.00288* Meini et al 2015

    p Drosophila melanogaster

    5 0.4896 0.00465* Whitlock, & Bourguet 2000

    b Saccharomyces cerevisiae

    Plasmodium falciparum DHFR 3 0.8182 0.00924* Brown et al. 2010

    f Avian Lysozome

    3 0.8182 0.0104* Malcolm et al. 1990

    i HIV Glycoprotein Mutants

    5 0.3964 0.0366* Da Silva et al. 2010

    q E. coli

    TEM b-Lactamase mutation 5 0.3731 0.05907 Weinreich et al. 2006

    h E. coli

    piperacillin with clavulanic acid 5 0.3290 0.12 Tan et al. 2010

    n E. coli

    pyrimethamine resistance 4 0.4516 0.15417 Lozovsky et al. 2009

    r Solinaceae sequiterpine

    6 0.1974 0.21344 O’Maille et al. 2006

    c Plasmodium falciparum

    Pyrimethamine resistance 3 0.4545 0.38108 Constanzo & Hartl 2011

    o E. coli

    Trimethoprim resistance recombined DHFR gene

    6 0.1324 0.5883

    Palmer et al. 2016

    j Aspergillus niger

    testing recombination and the evolution of sex 5 0.13062 0.6531 De Visser et al. 2009

    k E. coli

    TEM b-50 lactamase cefotaxime background 4 0.1505 1.21668 Mira et al. 2015

    a Mammalian glucocorticoid receptor

    4 0.1075 1.07058 Bridgham et al. 2006

    l Saccharomyces cerevisiae (haploid) evolution of sex. 6 0.02522 0.8007 Hall et al. 2010

    m Saccharomyces cerevisiae (diploid) evolution of sex. 6 0.06620 0.74048 Hall et al. 2010

  • 16

    Discussion For 13 of the 19 tested datasets, I conclude that the ordering of Walsh Coefficients was

    not significantly different from coefficients being randomly ordered. While the largest

    dataset tested, Lunzer et al. (2005), was found to be ordered non-randomly, the number

    of loci within a dataset did not appear to predict if the dataset’s Walsh coefficients would

    be significantly ordered. In addition, the type of biological system did not appear to be a

    predictor of ordering; some datasets testing antibiotic resistance, and metabolic growth

    pathways were significantly ordered, while others were not. One pattern that did emerge

    was that all datasets that came from studies exploring the evolution of sex were not

    significantly ordered. were intergenic while all the other studies were intragenic.

    How unexpected is it that some higher order epistatic terms appear to explain more

    variance than main effects or pairwise epistasis? Particularly in larger datasets, when

    higher order epistatic coefficients greatly outnumbering main effects and pairwise

    epistatic coefficients, is it inevitable that there will be some randomness in how these

    coefficients are ordered? In reviewing many of the same datasets used in this thesis,

    Hartl (2014) notes that in many of these systems, as beneficial mutations accumulate,

    their returns diminish, following a curve similar to a Michaelis-Menten enzyme kinetics

    curve. Hartl (2014) reasons that when beneficial mutations result map to fitness on the

    very shallow or very steep slope of the curve, fitness appears additive, but when it is

    mapped to the center of the curve, magnitude epistasis appears.

  • 17

    However, while magnitude epistasis can change the apparent magnitude of Walsh

    Coefficients, it does not change the order in which Walsh coefficients appear in our

    regression model. A recent paper by Sailor & Harms (2017) agreed that not accounting

    for non-linearity prior preforming the Walsh Transformation can overestimate higher

    order epistatic interactions. The authors suggest performing a non-linear regression to

    identify the optimal power transformation to linearize raw data and account for non-

    additive interactions.

    Do different environmental backgrounds affect how Walsh Coefficients order

    themselves by explanatory power in the same gene system? In Mira et al. (2014), the

    authors measured the average growth rates of all combinations of the same four

    mutations in the TEM-50 b-Lactamase gene across 15 different antibiotic media. The

    resulting tb correlation coefficients were highly dependent on the antibiotic medium the

    bacteria were grown in, with coefficients ranging from -.29 to .41. This difference in

    correlation statistics is not surprising, since the authors noted that natural selection

    favored different combinations of mutations in different growth media (Mira et al. 2014).

    Handling experimental uncertainty The next step to develop a more robust model and statistical testing will involve

    handling uncertainty. Particularly in the context of ranking Walsh Coefficients by

    explanatory power in the regression model, when does the additional explanatory power

    of very small Walsh Coefficients become indistinguishable? A potential solution to this

  • 18

    problem would involve implementing an information criterion in the regression model,

    such as Akiko Information Criterion (AIC), which would identify when to stop keeping

    track of Walsh Coefficient ranks. In this scheme, all coefficients after the AIC cutoff

    would be tied for having the least explanatory power.

    Handling non-binary cardinalities The Walsh Transformation is limited in that it cannot handle non-biallelic loci systems.

    This turns out to be an impediment when analyzing epistatic interactions in genotype

    space with four possible nucleic acids and protein space, which can have as many as

    twenty amino acid substitutions. The concept of the Walsh Transformations has been

    shown to be generalizable to encompass homogeneously larger alphabets (Zhou et al.

    2007). However, both the Walsh Transformation and its extension into higher

    dimensionalities suffer from the inability to handle heterogeneous loci systems.

    Anderson et al. (2015) has proposed a scheme that identifies epistatic interactions in

    systems where each locus can handle alphabets of size 2k. Still, this scheme is not a

    robust method to handle different alphabet sizes, and becomes requires one to

    construct a transformation table by hand. Within this thesis, I handled heterogeneous

    loci systems by randomly subsampling from all available mutations. Palmer et al. (2015)

    had one non-binary locus with a cardinality of three. This allowed for creating

    subsampling affected Walsh coefficient ordering. The resulting tb correlation

    coefficients ranged from .13 - .24, however, regardless of which subsamples were

    chosen, the resulting corrected p-values would have been non-significant.

  • 19

    Conclusions

    Higher order epistasis is present in every combinatorically complete dataset that I have

    examined. It is pervasive in shaping fitness landscapes and determining evolutionary

    accessible peaks. The Walsh Transform provides a straightforward way to quantify

    higher order epistasis. Due to the Walsh Transform’s orthogonal properties, ranking

    Walsh coefficients by absolute magnitude is equivalent to identifying coefficients by

    order of explanatory power in a regression model (Lan & Heckendorn, pers. commun

    2017). Contrary to the naïve expectation that higher order epistatic terms are less

    important than main effects or pairwise epistatic terms, I have shown that in 13 of the 19

    datasets examined, higher order epistatic terms were not significantly less important

    than other terms.

  • 20

    Acknowledgments I would like to thank Professor Daniel Weinreich Ph.D. and Yinghong Lan for providing me with invaluable mentorship throughout this project. I would also like to thank my parents for supporting me throughout my life and feeding my insatiable curiosity throughout my childhood.

  • 21

    Bibliography Anderson DW, McKeown AN, Thornton JW. Intermolecular epistasis shaped the

    function and evolution of an ancient transcription factor and its DNA binding sites. Elife. 2015 Jun 15;4:e07864.

    Bateson W, Mendel G. Mendel's principles of heredity. University press; 1913. Bridgham, J.T., Carroll, S.M. and Thornton, J.W., 2006. Evolution of hormone-receptor

    complexity by molecular exploitation. Science, 312(5770), pp.97-101. Brown, K.M., Costanzo, M.S., Xu, W., Roy, S., Lozovsky, E.R. and Hartl, D.L., 2010.

    Compensatory mutations restore fitness during the evolution of dihydrofolate reductase. Molecular biology and evolution, 27(12), pp.2682-2690.

    Chou HH, Chiu HC, Delaney NF, Segrè D, Marx CJ. Diminishing returns epistasis

    among beneficial mutations decelerates adaptation. Science. 2011 Jun 3;332(6034):1190-2.

    Costanzo, M.S. and Hartl, D.L., 2011. The evolutionary landscape of antifolate

    resistance in Plasmodium falciparum. Journal of genetics, 90(2), pp.187-190. Crona K, Greene D, Barlow M. The peaks and geometry of fitness landscapes. Journal

    of theoretical biology. 2013 Jan 21;317:1-0. da Silva, J., Coetzer, M., Nedellec, R., Pastore, C. and Mosier, D.E., 2010. Fitness

    epistasis and constraints on adaptation in a human immunodeficiency virus type 1 protein region. Genetics, 185(1), pp.293-303.

    de Visser, J.A.G., Park, S.C. and Krug, J., 2009. Exploring the effect of sex on empirical

    fitness landscapes. the american naturalist, 174(S1), pp.S15-S30. Fisher RA. The genetical theory of natural selection: a complete variorum edition.

    Oxford University Press; 1930. Flynn KM, Cooper TF, Moore FB, Cooper VS. The environment affects epistatic

    interactions to alter the topology of an empirical fitness landscape. PLoS Genet. 2013 Apr 4;9(4):e1003426.

    Franke J, Klözer A, de Visser JA, Krug J. Evolutionary accessibility of mutational

    pathways. PLoS Comput Biol. 2011 Aug 18;7(8):e1002134.

  • 22

    Hall, D.W., Agan, M. and Pope, S.C., 2010. Fitness epistasis among 6 biosynthetic loci in the budding yeast Saccharomyces cerevisiae. Journal of heredity, 101(suppl 1), pp.S75-S84.

    Hartl DL. What can we learn from fitness landscapes?. Current opinion in microbiology.

    2014 Oct 31;21:51-7. Heckendorn RB, Whitley D. Predicting epistasis from mathematical models.

    Evolutionary Computation. 1999;7(1):69-101. Holm S. A simple sequentially rejective multiple test procedure. Scandinavian journal of

    statistics. 1979 Jan 1:65-70. Kendall MG. A new measure of rank correlation. Biometrika. 1938 Jun 1;30(1/2):81-93. Khan, A.I., Dinh, D.M., Schneider, D., Lenski, R.E. and Cooper, T.F., 2011. Negative

    epistasis between beneficial mutations in an evolving bacterial population. Science, 332(6034), pp.1193-1196.

    Lan & Heckendorn, personal Commication 2017 Lozovsky, E.R., Chookajorn, T., Brown, K.M., Imwong, M., Shaw, P.J.,

    Kamchonwongpaisan, S., Neafsey, D.E., Weinreich, D.M. and Hartl, D.L., 2009. Stepwise acquisition of pyrimethamine resistance in the malaria parasite. Proceedings of the National Academy of Sciences, 106(29), pp.12025-12030.

    Lunzer, M., Miller, S.P., Felsheim, R. and Dean, A.M., 2005. The biochemical

    architecture of an ancient adaptive landscape. Science, 310(5747), pp.499-501. Malcolm BA, Wilson KP, Matthews BW, Kirsch JF, Wilson AC. Ancestral lysozymes

    reconstructed, neutrality tested, and thermostability linked to hydrocarbon packing. Nature. 1990 May 3;345(6270):86.

    Meini MR, Tomatis PE, Weinreich DM, Vila AJ. Quantitative description of a protein

    fitness landscape based on molecular features. Molecular biology and evolution. 2015 Jul 1;32(7):1774-87.

    Miller, S.P., Lunzer, M. and Dean, A.M., 2006. Direct demonstration of an adaptive

    constraint. Science, 314(5798), pp.458-461. Mira PM, Crona K, Greene D, Meza JC, Sturmfels B, Barlow M. Rational design of

    antibiotic treatment plans: a treatment strategy for managing evolution and reversing resistance. PloS one. 2015 May 6;10(5):e0122283.

  • 23

    O'maille, P.E., Malone, A., Dellas, N., Hess, B.A., Smentek, L., Sheehan, I.,

    Greenhagen, B.T., Chappell, J., Manning, G. and Noel, J.P., 2008. Quantitative exploration of the catalytic landscape separating divergent plant sesquiterpene synthases. Nature chemical biology, 4(10), pp.617-623.

    Palmer, A.C., Toprak, E., Baym, M., Kim, S., Veres, A., Bershtein, S. and Kishony, R.,

    2015. Delayed commitment to evolutionary fate in antibiotic resistance fitness landscapes. Nature communications, 6.

    Phillips PC. Epistasis—the essential role of gene interactions in the structure and

    evolution of genetic systems. Nature Reviews Genetics. 2008 Nov 1;9(11):855-67.

    Phillips PC. The language of gene interaction. Genetics. 1998 Jul 1;149(3):1167-71. Sailer ZR, Harms MJ. Detecting High-Order Epistasis in Nonlinear Genotype-Phenotype

    Maps. Genetics. 2017 Mar 1;205(3):1079-88. Smith, John Maynard. "Natural selection and the concept of a protein space." (1970):

    563-564. Tan, L., Serene, S., Chao, H.X. and Gore, J., 2011. Hidden randomness between

    fitness landscapes limits reverse evolution. Physical review letters, 106(19), p.198102.

    Weinreich DM, Delaney NF, DePristo MA, Hartl DL. Darwinian evolution can follow only

    very few mutational paths to fitter proteins. science. 2006 Apr 7;312(5770):111-4. Weinreich DM, Lan Y, Wylie CS, Heckendorn RB. Should evolutionary geneticists worry

    about higher-order epistasis? Current opinion in genetics & development. 2013 Dec 31;23(6):700-7.

    Weinreich DM, Watson RA, Chao L. Perspective: sign epistasis and genetic constraint

    on evolutionary trajectories. Evolution. 2005 Jun;59(6):1165-74. Whitlock, M.C. and Bourguet, D., 2000. Factors affecting the genetic load in Drosophila:

    synergistic epistasis and correlations among fitness components. Evolution, 54(5), pp.1654-1660.

    Wright S. The roles of mutation, inbreeding, crossbreeding, and selection in evolution.

    na; 1932 Aug.

  • 24

    Zhou S, Sun Z, Heckendorn RB. Extended probe method for linkage discovery over high-cardinality alphabets. InProceedings of the 9th annual conference on Genetic and evolutionary computation 2007 Jul 7 (pp. 1484-1491). ACM.

  • 25

    Appendix 1a. Palmer et al. (2015) linear regression figure

    “Each genotype was fitted by a series of models of increasing complexity, that assigned changes in resistance through parameters based on the grand average of fitness, the individual effects of mutations, and pairwise and higher order epistatic interactions. The single parameter with the greatest explanatory power was chosen, followed by the parameter with the second greatest explanatory power when combined with the first. Each point in the plot denotes the class of parameter added and the residual variance with that many parameters.” Palmer et al. (2015)

  • 26

    2a. Mira et al. (2015) table of TEM b-50 lactamase system grown on 15 different antibiotic backgrounds.

    Antibiotic Media Number of loci tb correlation p-value

    Ampicillin (AMP) 4 0.02151 0.4573 Amoxicillin (AM) 4 -0.032258 0.5448 Cefaclor (CEC) 4 0.40860 0.025 Cefotaxime (CTX) 4 0.15054 0.2424 Ceftizoxime (ZOX) 4 0.35484 0.0531 Cefuroxime (CRO) 4 0.322581 0.0673 Ceftriaxone (AMC) 4 0.150538 0.242 Amoxicillin + Clavulanic acid (AMC) 4 0 0.4832

    Ceftazidime (CAZ) 4 0.150538 0.239 Cefotetan (CTT) 4 0.010753 0.4696 Ampicillin + Sulbactam (SAM) 4 0.376344 0.0397

    Cefotprozil (CPR) 4 0.215054 0.1605 Cefpodoxime (CPD 4 0.193549 0.1863 Pipercillin + Tazobactam (TZP) 4 0.225806 0.145

    Cefepime (FEP) 4 -0.29032 0.902

  • 27

    2b. Regression Analysis of Mira et al. (2015)

    These models show that different medium backgrounds produce different fitness landscapes and regression models.

  • 28

    3a. Other Sub-sampled data from the same system

    System

    # of loci

    tb correlation

    p-value

    Reference

    E. coli Subset 01

    Trimethoprim resistance 6

    0.1324

    .09305

    Palmer et al. 2015

    E. coli Subset 02

    Trimethoprim resistance 6 0.1412 0.08228

    Palmer et al. 2015

    E. coli Subset 12

    Trimethoprim resistance 6 0.2478 0.00717

    Palmer et al. 2015

    A. niger testing recombination and the

    evolution of sex 5

    0.13062

    .1089

    De Visser et al. 2009

    A. niger testing recombination and the

    evolution of sex 5 0.2176 0.07031

    Franke et al. 2011

    Escherichia coli Beneficial mutations that increase

    growth rates in Guanazole 5 0.4819 0.0029

    Flynn et al. 2013

    Escherichia coli Beneficial mutations that increase

    growth rates in EGTA 5 0.3549 0.0076

    Flynn et al. 2013

    Escherichia coli Beneficial mutations that increase

    growth 5 0.5415 0

    Khan et al. 2011