Identifying the importance of higher order epistasis€¦ · Identifying the importance of higher...

Identifying the importance of higher order epistasis

Jacob Jaffe

B.Sc. Honors Candidate in Applied Mathematic - Biology

Advisor: Professor Daniel Weinreich Ph.D. Second Reader: Professor Anastasios Matzavinos Ph.D.

2

Table of Contents Abstract...............................................................................................................................3Introduction.........................................................................................................................4Methods...............................................................................................................................8

Data Handling ......................................................................................................................... 8Walsh Coefficients ................................................................................................................. 8Linear regression ................................................................................................................... 9Statistical testing .................................................................................................................. 10

Results................................................................................................................................12Discussion..........................................................................................................................16

Handling experimental uncertainty ..................................................................................... 17Handling non-binary cardinalities ....................................................................................... 18

Conclusions.......................................................................................................................19Acknowledgments............................................................................................................20Bibliography.......................................................................................................................21Appendix............................................................................................................................25

1a. Palmer et al. (2015) linear regression figure ................................................................ 252a. Mira et al. (2015) table of TEM b-50 lactamase system grown on 15 different antibiotic backgrounds. ....................................................................................................... 262b. Regression Analysis of Mira et al. (2015) ..................................................................... 273a. Other Sub-sampled data from the same system ......................................................... 28

3

Abstract

Higher order epistasis, interactions between more than two mutations, is prevalent in

many different fitness landscapes. While traditionally, evolutionary biologists have

viewed higher order epistatic interactions as less important than main effects or pairwise

epistatic interactions, recent work suggests that higher order epistatic interactions can

explain a portion of variance in fitness landscapes. In this thesis, I use the Walsh

Transformation to rank the explanatory power of individual epistatic coefficients. I find

that in 13 of 19 distinct datasets, higher order epistasis is not incrementally less

important than lower order epistasis or point mutations. This result provides strong

evidence that higher order epistatic interactions play an important role in evolution and

determining fitness landscapes. Furthermore, it shows that new techniques are needed

to better quantify epistasis in non-homogeneous loci systems.

4

Introduction Epistasis, the interaction between genes or mutations, is integral to understanding the

dynamics that guide evolution. Epistasis is widely observed in nature and is implicated

in biological phenomena ranging from human disease to antibiotic resistance (Phillips et

al. 1998; Weinreich et al. 2006). Historically, the term epistasis has taken on different

definitions to describe distinct, yet sometimes interacting, biological events including

allele frequencies in Mendelian genetics, unexpected molecular or protein regulatory

interactions resulting from multiple base pair changes, and the statistical deviation from

the additive expectation of two individual mutations (Bateson & Mendel 1913; Fisher et

al. 1930; Philips et al. 2008). For the purposes of this thesis, when I refer to epistasis, I

will mean functional epistasis (Weinreich et al. 2005). Specifically, I will take epistasis to

be the phenotypic deviation from the additive combination of two individual mutations for

a constant environmental and genetic background (Weinreich et. al 2013).

In evolutionary biology, measuring fitness or reproductive success, is an approach used

to quantify and predict evolutionary trends of a population of organisms. The concept of

a fitness landscape, first introduced by Seawall Wright (1932), is a graphical

representation of genotype-phenotype mapping of fitness where proximity signifies

genotypic relatedness and peaks signify local fitness maxima. A population is said to

climb up the fitness landscape when it acquires beneficial mutations that increase

fitness. Consequently, populations can become ‘stuck’ on local fitness peak when

5

reaching the global fitness maximum would involve traversing fitness ‘valleys’. [For

definitions of relevant terms see Box 1]

It is widely accepted that sign epistasis can

give rise to increased complexity, or

‘ruggedness’, in fitness landscapes (Weinreich

et al. 2005). This additional ruggedness can

substantially limit the number of accessible

mutational pathways for natural selection, since

each successive acquired mutation must either

be beneficial or neutral. To investigate these consequences on accessible mutational

pathways, it is necessary to consider the projection of an organism’s fitness in

nucleotide sequence space where each locus represents the presence or absence of a

given mutation (Weinreich et al. 2013; Smith, 1970). For example, if each locus (L) has

two possible states, wild type or mutated, constructing all 2L to combinations of

mutations and measuring a proxy for fitness, allows one to construct a complete fitness

landscape. Qualitatively, these measurements can be considered as rank orders in

phenotype, where paths must follow increasing rank orders to reach a fitness maximum

(Crona et al. 2013).

Traditionally, evolutionary biologists have only considered the main effects of single

mutations and pairwise epistatic effects on fitness important in evolution (Hartl, 2014). If

pairwise epistasis is evolutionary significant, how often does pairwise epistasis itself

Box 1: Relevant terms Epistasis: Interactions between genes or mutations such that the phenotypic consequence of multiple mutations differs from the additive expectation based on the constituent mutations. Higher-order epistasis: epistasis of order three or more. Sign epistasis: epistasis in which the sign of the effect of a mutation depends on genetic background. Magnitude epistasis: epistasis in which the sign of the effect of a mutation is independent of genetic background.

6

vary with all other genetic backgrounds (Weinreich et al. 2013)? Only recently have

higher order epistatic relationships, interactions between three or more mutations,

begun to be examined. In part, this is due to the sheer number of higher order

interactions. For a given dataset of size n, there are !" k-way epistatic interactions, and

2L total interactions (Weinreich et al. 2013).

Using the Walsh transformation to quantify different orders of epistasis, higher order

epistatic interactions were found in 14 datasets analyzed by Weinreich et al. (2013).

Even though we can empirically observe epistasis in the data, this observation alone

does not mean that higher order epistasis is as significant as main effects or pairwise

epistasis. A recent contribution by Palmer et al. (2015) on trimethoprim resistance in

Escherichia coli modeled the importance of higher order epistasis in shaping a fitness

landscape. Their results showed that a portion of higher order epistatic interactions

were responsible for shaping more of the fitness landscape than some of individual

effects of mutations and pairwise epistatic interactions [Appendix 1a].

For this thesis, I will analyze all available combinatorically complete fitness data sets

using Walsh coefficients to quantify epistatic terms. I will construct regression models of

increasing complexity with these data and quantify the significance of higher order

epistasis against the naïve expectation that as the order of epistasis increases, the

relative contribution to shaping the fitness landscape decreases. Lastly, I will discuss

7

the implications of these findings and future directions of research in higher order

epistasis.

8

Methods Data Handling

The combinatorically complete datasets used in this meta-study come from many

different biological systems of interest in evolutionary biology and medicine. For each

individual system fitness was inferred by a proxy: growth rate assay, 75% Inhibitory

Concentration (IC-75), Minimum Inhibitory Concentration (MIC), binding assays, and

melting points. While not directly equivalent to each other or direct fitness, growth rate,

MIC and IC-75 measurements have been shown to be closely correlated (Whitlock &

Bourguet 2000). In addition, binding assays and melting points, provide metrics to

measure the fitness of proteins. In experiments that tested the same combinations of

mutations against many different antibiotic backgrounds, I have randomly chosen one

such background to include in my final analysis. Datasets with locus that contain more

than two states have been randomly subsampled to create bi-allelic combinatorically

complete data (Palmer et al. 2015).

Walsh Coefficients

Prior to calculating Walsh Coefficients, data were log-transformed consistent with

Weinreich et al. (2013), which minimized higher order epistatic effects. In each

transformed dataset of 2L fitness values, each allele was represented by a binary bit

string, where one represented the presence of a mutation and zero represented the

absence (Weinreich et al 2013). These bit strings were sorted from smallest to largest

according to their binary values and represented as a vector W. Walsh coefficients were

9

generated through applying the Walsh transformation to W and scaling the result by a

factor of 2-L [Box 2]. The resulting E vector of Walsh Coefficients is already sorted from

smallest to largest per their binary values. The binary representation of each Walsh

coefficient can be thought of as the epistatic interaction term between the Loci with ones

in the bit string (Weinreich et al. 2013). The number of number present in each Walsh

coefficient bit string representation are referred to as the Walsh coefficients epistatic

order.

An important property of the Walsh Transformation is that it is symmetrical and has an

orthogonal basis. This guarantees the independence of each Walsh coefficient and

prevents pairwise epistatic coefficients from capturing higher order epistatic effects, a

common issue with regression analysis (Heckendorn & Whitley 1999; Hartl, 2014). It

also allows one to easily convert Walsh coefficients back to fitness values.

Linear regression A step-wise linear regression was performed by back-transforming subsets of Walsh

coefficients to fitness values and quantifying how much of the data these coefficients

explained. To start, each Walsh coefficient paired with the 0th Walsh coefficient (the

grand average) and back-transformed to fitness. The subset of two with the most

explanatory power was chosen, keeping track of the non-0th Walsh coefficient and the

remaining unexplained variance. Next, all other Walsh coefficients were put into subsets

with the 0th Walsh coefficient, and the previous winner. These subsets of three were

back-transformed to fitness values, and the triplicate with the most explanatory power

10

was chosen, noting the last added Walsh coefficient and the remaining unexplained

variance. This process was iterated until no Walsh coefficients remained. The order that

Walsh coefficients were added to this model could be predicted by absolute magnitude

of the Walsh Coefficients (Lan & Heckendorn, pers. comm. 2017).

Statistical testing Intuitively, we would expect Walsh Coefficients with lower epistatic order, to explain

more variance in the fitness data. This intuition comes from the fact that biological

systems are primarily additive, and main effects should play a more dominant role in

shaping the fitness landscape. The implication of this is observation is that the order in

which Walsh coefficients are added to the regression model should be predicted by the

epistatic interaction order the coefficient represents. For example, in a system with

three possible mutations, L = 3 (2L Walsh coefficients), we would expect the order of

Walsh coefficients by explanatory power to be exactly the grand average Walsh

coefficient (0th order epistasis), followed by the three main effects coefficients (1st order

epistasis), the three pairwise epistatic coefficients (2nd order epistasis), and ending with

the 3rd order Walsh coefficient.

Kendall’s tb rank correlation, which can account for multiple ties, was calculated

between the observed ordering of Walsh coefficients epistatic order detailed in our

linear regression model and the expected ordering of Walsh Coefficient epistatic order

(Kendall, 1938).

11

Figure 1: (a) sample calculation of Walsh coefficients. (b) Oobserved is ordered according to the corresponding Walsh Coefficient absolute magnitude. Note how the 4th Walsh coefficient of epistasis order 2, has an absolute magnitude less than the 8th Walsh coefficient of epistasis order 3. (c) This is the t-b distribution generated through 10,000 psuedo-sample by randomly sampling Walsh coefficient order and calculating tb between each pseudo-sample and the Oexpected. The p-value was the proportion of pseudo-sample tb greater than our observed tb. In this case to the right of the orange line.

To characterize how ordered the epistatic orders of the Walsh coefficients were relative

to Walsh coefficients being ordered randomly, a probability distribution was generated

by randomly pseudo-sampling Walsh Coefficient 1000 times without replacement.

Kendal tb was calculated between the pseudo-samples of Walsh coefficient order and

expected order of Walsh coefficient epistatic order. We then took our p-value to be the

proportion of Tau from the pseudo-sample greater than observed correlation and

corrected for testing between multiple datasets with the Holm-Bonferroni correction

(Holm, 1979). Figure1

12

Results In most datasets, step-wise regression revealed Walsh coefficients with higher order

epistatic order appearing before at least one main effect or pairwise Walsh coefficient

[Figure 1]. Only 7 of the 19 datasets exhibited statistically significant orderings of Walsh

Coefficient by epistatic ordering [Table 1]. In studies that tested the same mutational

sites across different environmental background, I found Walsh coefficient order and the

subsequent ordering of epistatic changed relative to environment [Appendix 2a & 2b].

In these cases I generated a complete table of uncorrected p-values [Appendix 3a].

14

Figure 2: Each subfigure letter corresponds to a step-wise linear regression performed on datasets in table 1. Each point represents the epistatic order of the last added parameter to the regression, while the y-axis represents the remaining unexplained variance in the model.

15

Table 1: Statistical tests in unique systems (some description taken directly from Weinreich et al. 2013)

* indicates statistical significance

Figure 1

System

# of loci

Tau

Holm-Bonferroni

Corrected p-value

Reference

e Methylobacterium extorquens

Beneficial mutations in novel metabolic pathway

4 0.7527 0* Chou et al. 2011

d Escherichia coli

Beneficial mutations that increase growth rates

5 0.5415 0.0009*

Khan et al. 2011

s E. coli

Coenzyme used in isopropylmalate dehydrogenase (IMDH)

9 0.5257 0.00187*

Lunzer et al. 2005

g E. coli

Metallo-β-lactamases 4 0.6989 0.00288* Meini et al 2015

p Drosophila melanogaster

5 0.4896 0.00465* Whitlock, & Bourguet 2000

b Saccharomyces cerevisiae

Plasmodium falciparum DHFR 3 0.8182 0.00924* Brown et al. 2010

f Avian Lysozome

3 0.8182 0.0104* Malcolm et al. 1990

i HIV Glycoprotein Mutants

5 0.3964 0.0366* Da Silva et al. 2010

q E. coli

TEM b-Lactamase mutation 5 0.3731 0.05907 Weinreich et al. 2006

h E. coli

piperacillin with clavulanic acid 5 0.3290 0.12 Tan et al. 2010

n E. coli

pyrimethamine resistance 4 0.4516 0.15417 Lozovsky et al. 2009

r Solinaceae sequiterpine

6 0.1974 0.21344 O’Maille et al. 2006

c Plasmodium falciparum

Pyrimethamine resistance 3 0.4545 0.38108 Constanzo & Hartl 2011

o E. coli

Trimethoprim resistance recombined DHFR gene

6 0.1324 0.5883

Palmer et al. 2016

j Aspergillus niger

testing recombination and the evolution of sex 5 0.13062 0.6531 De Visser et al. 2009

k E. coli

TEM b-50 lactamase cefotaxime background 4 0.1505 1.21668 Mira et al. 2015

a Mammalian glucocorticoid receptor

4 0.1075 1.07058 Bridgham et al. 2006

l Saccharomyces cerevisiae (haploid) evolution of sex. 6 0.02522 0.8007 Hall et al. 2010

m Saccharomyces cerevisiae (diploid) evolution of sex. 6 0.06620 0.74048 Hall et al. 2010

16

Discussion For 13 of the 19 tested datasets, I conclude that the ordering of Walsh Coefficients was

not significantly different from coefficients being randomly ordered. While the largest

dataset tested, Lunzer et al. (2005), was found to be ordered non-randomly, the number

of loci within a dataset did not appear to predict if the dataset’s Walsh coefficients would

be significantly ordered. In addition, the type of biological system did not appear to be a

predictor of ordering; some datasets testing antibiotic resistance, and metabolic growth

pathways were significantly ordered, while others were not. One pattern that did emerge

was that all datasets that came from studies exploring the evolution of sex were not

significantly ordered. were intergenic while all the other studies were intragenic.

How unexpected is it that some higher order epistatic terms appear to explain more

variance than main effects or pairwise epistasis? Particularly in larger datasets, when

higher order epistatic coefficients greatly outnumbering main effects and pairwise

epistatic coefficients, is it inevitable that there will be some randomness in how these

coefficients are ordered? In reviewing many of the same datasets used in this thesis,

Hartl (2014) notes that in many of these systems, as beneficial mutations accumulate,

their returns diminish, following a curve similar to a Michaelis-Menten enzyme kinetics

curve. Hartl (2014) reasons that when beneficial mutations result map to fitness on the

very shallow or very steep slope of the curve, fitness appears additive, but when it is

mapped to the center of the curve, magnitude epistasis appears.

17

However, while magnitude epistasis can change the apparent magnitude of Walsh

Coefficients, it does not change the order in which Walsh coefficients appear in our

regression model. A recent paper by Sailor & Harms (2017) agreed that not accounting

for non-linearity prior preforming the Walsh Transformation can overestimate higher

order epistatic interactions. The authors suggest performing a non-linear regression to

identify the optimal power transformation to linearize raw data and account for non-

additive interactions.

Do different environmental backgrounds affect how Walsh Coefficients order

themselves by explanatory power in the same gene system? In Mira et al. (2014), the

authors measured the average growth rates of all combinations of the same four

mutations in the TEM-50 b-Lactamase gene across 15 different antibiotic media. The

resulting tb correlation coefficients were highly dependent on the antibiotic medium the

bacteria were grown in, with coefficients ranging from -.29 to .41. This difference in

correlation statistics is not surprising, since the authors noted that natural selection

favored different combinations of mutations in different growth media (Mira et al. 2014).

Handling experimental uncertainty The next step to develop a more robust model and statistical testing will involve

handling uncertainty. Particularly in the context of ranking Walsh Coefficients by

explanatory power in the regression model, when does the additional explanatory power

of very small Walsh Coefficients become indistinguishable? A potential solution to this

18

problem would involve implementing an information criterion in the regression model,

such as Akiko Information Criterion (AIC), which would identify when to stop keeping

track of Walsh Coefficient ranks. In this scheme, all coefficients after the AIC cutoff

would be tied for having the least explanatory power.

Handling non-binary cardinalities The Walsh Transformation is limited in that it cannot handle non-biallelic loci systems.

This turns out to be an impediment when analyzing epistatic interactions in genotype

space with four possible nucleic acids and protein space, which can have as many as

twenty amino acid substitutions. The concept of the Walsh Transformations has been

shown to be generalizable to encompass homogeneously larger alphabets (Zhou et al.

2007). However, both the Walsh Transformation and its extension into higher

dimensionalities suffer from the inability to handle heterogeneous loci systems.

Anderson et al. (2015) has proposed a scheme that identifies epistatic interactions in

systems where each locus can handle alphabets of size 2k. Still, this scheme is not a

robust method to handle different alphabet sizes, and becomes requires one to

construct a transformation table by hand. Within this thesis, I handled heterogeneous

loci systems by randomly subsampling from all available mutations. Palmer et al. (2015)

had one non-binary locus with a cardinality of three. This allowed for creating

subsampling affected Walsh coefficient ordering. The resulting tb correlation

coefficients ranged from .13 - .24, however, regardless of which subsamples were

chosen, the resulting corrected p-values would have been non-significant.

19

Conclusions

Higher order epistasis is present in every combinatorically complete dataset that I have

examined. It is pervasive in shaping fitness landscapes and determining evolutionary

accessible peaks. The Walsh Transform provides a straightforward way to quantify

higher order epistasis. Due to the Walsh Transform’s orthogonal properties, ranking

Walsh coefficients by absolute magnitude is equivalent to identifying coefficients by

order of explanatory power in a regression model (Lan & Heckendorn, pers. commun

2017). Contrary to the naïve expectation that higher order epistatic terms are less

important than main effects or pairwise epistatic terms, I have shown that in 13 of the 19

datasets examined, higher order epistatic terms were not significantly less important

than other terms.

20

Acknowledgments I would like to thank Professor Daniel Weinreich Ph.D. and Yinghong Lan for providing me with invaluable mentorship throughout this project. I would also like to thank my parents for supporting me throughout my life and feeding my insatiable curiosity throughout my childhood.

21

Bibliography Anderson DW, McKeown AN, Thornton JW. Intermolecular epistasis shaped the

function and evolution of an ancient transcription factor and its DNA binding sites. Elife. 2015 Jun 15;4:e07864.

Bateson W, Mendel G. Mendel's principles of heredity. University press; 1913. Bridgham, J.T., Carroll, S.M. and Thornton, J.W., 2006. Evolution of hormone-receptor

complexity by molecular exploitation. Science, 312(5770), pp.97-101. Brown, K.M., Costanzo, M.S., Xu, W., Roy, S., Lozovsky, E.R. and Hartl, D.L., 2010.

Compensatory mutations restore fitness during the evolution of dihydrofolate reductase. Molecular biology and evolution, 27(12), pp.2682-2690.

Chou HH, Chiu HC, Delaney NF, Segrè D, Marx CJ. Diminishing returns epistasis

among beneficial mutations decelerates adaptation. Science. 2011 Jun 3;332(6034):1190-2.

Costanzo, M.S. and Hartl, D.L., 2011. The evolutionary landscape of antifolate

resistance in Plasmodium falciparum. Journal of genetics, 90(2), pp.187-190. Crona K, Greene D, Barlow M. The peaks and geometry of fitness landscapes. Journal

of theoretical biology. 2013 Jan 21;317:1-0. da Silva, J., Coetzer, M., Nedellec, R., Pastore, C. and Mosier, D.E., 2010. Fitness

epistasis and constraints on adaptation in a human immunodeficiency virus type 1 protein region. Genetics, 185(1), pp.293-303.

de Visser, J.A.G., Park, S.C. and Krug, J., 2009. Exploring the effect of sex on empirical

fitness landscapes. the american naturalist, 174(S1), pp.S15-S30. Fisher RA. The genetical theory of natural selection: a complete variorum edition.

Oxford University Press; 1930. Flynn KM, Cooper TF, Moore FB, Cooper VS. The environment affects epistatic

interactions to alter the topology of an empirical fitness landscape. PLoS Genet. 2013 Apr 4;9(4):e1003426.

Franke J, Klözer A, de Visser JA, Krug J. Evolutionary accessibility of mutational

pathways. PLoS Comput Biol. 2011 Aug 18;7(8):e1002134.

22

Hall, D.W., Agan, M. and Pope, S.C., 2010. Fitness epistasis among 6 biosynthetic loci in the budding yeast Saccharomyces cerevisiae. Journal of heredity, 101(suppl 1), pp.S75-S84.

Hartl DL. What can we learn from fitness landscapes?. Current opinion in microbiology.

2014 Oct 31;21:51-7. Heckendorn RB, Whitley D. Predicting epistasis from mathematical models.

Evolutionary Computation. 1999;7(1):69-101. Holm S. A simple sequentially rejective multiple test procedure. Scandinavian journal of

statistics. 1979 Jan 1:65-70. Kendall MG. A new measure of rank correlation. Biometrika. 1938 Jun 1;30(1/2):81-93. Khan, A.I., Dinh, D.M., Schneider, D., Lenski, R.E. and Cooper, T.F., 2011. Negative

epistasis between beneficial mutations in an evolving bacterial population. Science, 332(6034), pp.1193-1196.

Lan & Heckendorn, personal Commication 2017 Lozovsky, E.R., Chookajorn, T., Brown, K.M., Imwong, M., Shaw, P.J.,

Kamchonwongpaisan, S., Neafsey, D.E., Weinreich, D.M. and Hartl, D.L., 2009. Stepwise acquisition of pyrimethamine resistance in the malaria parasite. Proceedings of the National Academy of Sciences, 106(29), pp.12025-12030.

Lunzer, M., Miller, S.P., Felsheim, R. and Dean, A.M., 2005. The biochemical

architecture of an ancient adaptive landscape. Science, 310(5747), pp.499-501. Malcolm BA, Wilson KP, Matthews BW, Kirsch JF, Wilson AC. Ancestral lysozymes

reconstructed, neutrality tested, and thermostability linked to hydrocarbon packing. Nature. 1990 May 3;345(6270):86.

Meini MR, Tomatis PE, Weinreich DM, Vila AJ. Quantitative description of a protein

fitness landscape based on molecular features. Molecular biology and evolution. 2015 Jul 1;32(7):1774-87.

Miller, S.P., Lunzer, M. and Dean, A.M., 2006. Direct demonstration of an adaptive

constraint. Science, 314(5798), pp.458-461. Mira PM, Crona K, Greene D, Meza JC, Sturmfels B, Barlow M. Rational design of

antibiotic treatment plans: a treatment strategy for managing evolution and reversing resistance. PloS one. 2015 May 6;10(5):e0122283.

23

O'maille, P.E., Malone, A., Dellas, N., Hess, B.A., Smentek, L., Sheehan, I.,

Greenhagen, B.T., Chappell, J., Manning, G. and Noel, J.P., 2008. Quantitative exploration of the catalytic landscape separating divergent plant sesquiterpene synthases. Nature chemical biology, 4(10), pp.617-623.

Palmer, A.C., Toprak, E., Baym, M., Kim, S., Veres, A., Bershtein, S. and Kishony, R.,

2015. Delayed commitment to evolutionary fate in antibiotic resistance fitness landscapes. Nature communications, 6.

Phillips PC. Epistasis—the essential role of gene interactions in the structure and

evolution of genetic systems. Nature Reviews Genetics. 2008 Nov 1;9(11):855-67.

Phillips PC. The language of gene interaction. Genetics. 1998 Jul 1;149(3):1167-71. Sailer ZR, Harms MJ. Detecting High-Order Epistasis in Nonlinear Genotype-Phenotype

Maps. Genetics. 2017 Mar 1;205(3):1079-88. Smith, John Maynard. "Natural selection and the concept of a protein space." (1970):

563-564. Tan, L., Serene, S., Chao, H.X. and Gore, J., 2011. Hidden randomness between

fitness landscapes limits reverse evolution. Physical review letters, 106(19), p.198102.

Weinreich DM, Delaney NF, DePristo MA, Hartl DL. Darwinian evolution can follow only

very few mutational paths to fitter proteins. science. 2006 Apr 7;312(5770):111-4. Weinreich DM, Lan Y, Wylie CS, Heckendorn RB. Should evolutionary geneticists worry

about higher-order epistasis? Current opinion in genetics & development. 2013 Dec 31;23(6):700-7.

Weinreich DM, Watson RA, Chao L. Perspective: sign epistasis and genetic constraint

on evolutionary trajectories. Evolution. 2005 Jun;59(6):1165-74. Whitlock, M.C. and Bourguet, D., 2000. Factors affecting the genetic load in Drosophila:

synergistic epistasis and correlations among fitness components. Evolution, 54(5), pp.1654-1660.

Wright S. The roles of mutation, inbreeding, crossbreeding, and selection in evolution.

na; 1932 Aug.

24

Zhou S, Sun Z, Heckendorn RB. Extended probe method for linkage discovery over high-cardinality alphabets. InProceedings of the 9th annual conference on Genetic and evolutionary computation 2007 Jul 7 (pp. 1484-1491). ACM.

25

Appendix 1a. Palmer et al. (2015) linear regression figure

“Each genotype was fitted by a series of models of increasing complexity, that assigned changes in resistance through parameters based on the grand average of fitness, the individual effects of mutations, and pairwise and higher order epistatic interactions. The single parameter with the greatest explanatory power was chosen, followed by the parameter with the second greatest explanatory power when combined with the first. Each point in the plot denotes the class of parameter added and the residual variance with that many parameters.” Palmer et al. (2015)

26

2a. Mira et al. (2015) table of TEM b-50 lactamase system grown on 15 different antibiotic backgrounds.

Antibiotic Media Number of loci tb correlation p-value

Ampicillin (AMP) 4 0.02151 0.4573 Amoxicillin (AM) 4 -0.032258 0.5448 Cefaclor (CEC) 4 0.40860 0.025 Cefotaxime (CTX) 4 0.15054 0.2424 Ceftizoxime (ZOX) 4 0.35484 0.0531 Cefuroxime (CRO) 4 0.322581 0.0673 Ceftriaxone (AMC) 4 0.150538 0.242 Amoxicillin + Clavulanic acid (AMC) 4 0 0.4832

Ceftazidime (CAZ) 4 0.150538 0.239 Cefotetan (CTT) 4 0.010753 0.4696 Ampicillin + Sulbactam (SAM) 4 0.376344 0.0397

Cefotprozil (CPR) 4 0.215054 0.1605 Cefpodoxime (CPD 4 0.193549 0.1863 Pipercillin + Tazobactam (TZP) 4 0.225806 0.145

Cefepime (FEP) 4 -0.29032 0.902

27

2b. Regression Analysis of Mira et al. (2015)

These models show that different medium backgrounds produce different fitness landscapes and regression models.

28

3a. Other Sub-sampled data from the same system

System

# of loci

tb correlation

p-value

Reference

E. coli Subset 01

Trimethoprim resistance 6

0.1324

.09305

Palmer et al. 2015

E. coli Subset 02

Trimethoprim resistance 6 0.1412 0.08228

Palmer et al. 2015

E. coli Subset 12

Trimethoprim resistance 6 0.2478 0.00717

Palmer et al. 2015

A. niger testing recombination and the

evolution of sex 5

0.13062

.1089

De Visser et al. 2009

A. niger testing recombination and the

evolution of sex 5 0.2176 0.07031

Franke et al. 2011

Escherichia coli Beneficial mutations that increase

growth rates in Guanazole 5 0.4819 0.0029

Flynn et al. 2013


growth rates in EGTA 5 0.3549 0.0076

Flynn et al. 2013


growth 5 0.5415 0

Khan et al. 2011

Identifying the importance of higher order epistasis€¦ · Identifying the importance of higher...

Documents

Transcript of Identifying the importance of higher order epistasis€¦ · Identifying the importance of higher...