Computational Protein Evolution - DiVA...

82
ACTA UNIVERSITATIS UPSALIENSIS UPPSALA 2020 Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1922 Computational Protein Evolution Modeling the Selectivity and Promiscuity of Engineered Enzymes KLAUDIA SZELER ISSN 1651-6214 ISBN 978-91-513-0918-7 urn:nbn:se:uu:diva-407494

Transcript of Computational Protein Evolution - DiVA...

Page 1: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

ACTAUNIVERSITATIS

UPSALIENSISUPPSALA

2020

Digital Comprehensive Summaries of Uppsala Dissertationsfrom the Faculty of Science and Technology 1922

Computational Protein Evolution

Modeling the Selectivity and Promiscuity ofEngineered Enzymes

KLAUDIA SZELER

ISSN 1651-6214ISBN 978-91-513-0918-7urn:nbn:se:uu:diva-407494

Page 2: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

Dissertation presented at Uppsala University to be publicly examined in Room C4:301,BMC, Husargatan 3, Uppsala, Monday, 18 May 2020 at 13:15 for the degree of Doctor ofPhilosophy. The examination will be conducted in English. Faculty examiner: Professor MariaRamos (University of Porto).

AbstractSzeler, K. 2020. Computational Protein Evolution. Modeling the Selectivity and Promiscuityof Engineered Enzymes. Digital Comprehensive Summaries of Uppsala Dissertations fromthe Faculty of Science and Technology 1922. 81 pp. Uppsala: Acta Universitatis Upsaliensis.ISBN 978-91-513-0918-7.

Enzymes are biological catalysts that significantly increase the rate of all biochemical reactionsthat take place within cells and are essential to maintain life. Many questions regardingtheir function remain unknown. Experimental techniques, such as kinetic measurements,spectroscopy, and site-directed mutagenesis, can provide relevant information about enzymestructure, key residues, active site conformations, and kinetics. However, they struggleto provide a full picture of enzyme catalysis. Combining experiments with computationaltechniques gives the possibility to generate a complete explanation with atomistic resolution.Computational modeling offers an incredibly robust toolkit that can provide detailed insight intothe reactivity and dynamics of biomolecules.

Compounds that contain phosphate and sulfate groups are essential in the living world.They are present as i.e., a biological source of energy (ATP), signaling molecules (GTP),coenzymes, building blocks (DNA, RNA). Furthermore, phosphate esters can be used asinsecticides, herbicides, flame retardants, and as chemical weapons. Cleavage of the phosphatebond involves an extremely low rate of spontaneous hydrolysis, nevertheless it is commonreaction in living organisms. Phosphatases (enzymes catalysing cleavage of phosphate bond)are crucial in both physiological regulation as well as serious pathological conditions includingasthma, immunosuppression, cardiovascular diseases, diabetes.

Understanding the basis of phosphoryl and sulfuryl transfer reactions is crucial for medical,biological, and biotechnological industries in order to i.e., create and improve existingdrugs, modify enzyme structures, understand the development of some diseases. However,despite decades of both experimental and computational studies, mechanistic details of thesereactions remain controversial. These reactions can occur via multiple different mechanismsinvolving intermediate steps or transition state structures. To solve these puzzles, we performedcomputational studies to verify the reaction pathway of diaryl sulfate diesters hydrolysis.We suggest that the reaction proceeds through a concerted mechanism with a loose (slightlydissociative) transition state.

Serum paraoxonase 1 (PON1) is calcium-dependent lactonase, which is bound to high-densitylipoprotein (HDL) with apolipoprotein A-I (ApoA-I). The enzyme is highly promiscuousand catalyzes the hydrolysis of multiple, different types of chemical compounds, such aslactones, aromatic esters, oxons, and organophosphates. We performed several, complex studieson PON1’s reaction mechanism, promiscuity, PON1-HDL interactions, and evolutionarytrajectories. One of the most extensively used approaches in this thesis was the empirical valencebond (EVB) method. Our models reproduce essential experimental observables and providemechanistic insights and a better understanding of the enzymes role and its evolutionary derivedpromiscuity.

Keywords: enzymes, kinetics, catalysis, density functional theory, empirical valence bond,diesters, evolution, promiscuity

Klaudia Szeler, Department of Chemistry - BMC, Biochemistry, Box 576, Uppsala University,SE-75123 Uppsala, Sweden.

© Klaudia Szeler 2020

ISSN 1651-6214ISBN 978-91-513-0918-7urn:nbn:se:uu:diva-407494 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-407494)

Page 3: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

I was taught that the way of progress is neither swift nor easy.

Maria Skłodowska-Curie

To my beloved husband, kids and parents.

Page 4: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its
Page 5: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I Szeler K., Williams N. H., Hengge A. C., Kamerlin S. C. L.,

Modeling the alkaline hydrolysis of diaryl sulfate diesters: A mechanistic study. J. Org. Chem., Manuscript submitted

II Ben-David M., Sussman J. L., Maxwell C. I., Szeler K., Kamer-

lin S. C. L. And Tawfik D. S. Catalytic stimulation by restrained active-site floppiness – The case of high-density lipoprotein bound serum paraoxonase 1. J. Mol. Biol. 2015, 427, 1359.

III Blaha-Nelson D., Krüger D. M., Szeler K., Ben-David M. and

Kamerlin S. C. L. Active site hydrophobicity and the convergent evolution of paraoxonase activity in structurally divergent en-zymes: The case of serum paraoxonase 1. J. Am. Chem. Soc. 2017, 139, 1155.

IV Ben-David M., Soskine M., Dubovetskyi A., Cherukuri K., Dym

O., Sussman J. L., Liao Q., Szeler K., Kamerlin S. C. L., Tawfik D. S. Enzyme evolution: An epistatic ratchet versus a smooth re-versible transition. Mol. Biol. Evol. 2019, 37, 4, 1133.

Reprints were made with permission from the respective publishers.

Page 6: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

Additional Publications

I Ma H., Szeler K., Kamerlin S. C. L. and Widersten M. Linking coupled motions and entropic effects to the catalytic activity of 2-deoxyribose-5-phosphate aldolase (DERA). Chem. Sci. 2016, 7, 1415.

II Carvalho T. P., Szeler K., Vavitsas K., Aqvist J. and Kamerlin

S. C. L. Modelling the mechanisms of biological GTP hydroly-sis. Arch. Biochem. Biophys. 2015, 582.

III Petrović D., Szeler K. and Kamerlin S. C. L. Challenges and ad-

vances in the computational modeling of biological phosphate hydrolysis. ChemComm 2018, 54, 377.

Page 7: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

Contents

1. Introduction .............................................................................................. 11

2. Enzymes ................................................................................................... 13

2.1. Enzyme kinetics ................................................................................ 14 2.1.1. Michaelis-Menten kinetics ......................................................... 14

2.2. Enzyme catalysis ............................................................................... 17

2.3. Enzymes dynamics, promiscuity and evolution ................................ 20

3. Computational approaches ....................................................................... 23

3.1. Quantum mechanical approaches ...................................................... 23 3.1.1. Born-Oppenheimer approximation ............................................ 24 3.1.2. Basis sets .................................................................................... 25 3.1.3. Hartree-Fock method ................................................................. 26 3.1.4. Density functional theory ........................................................... 27

3.2. Classical methods .............................................................................. 28 3.2.1. Force fields ................................................................................. 29 3.2.2. Molecular dynamics ................................................................... 30 3.2.3. Free energy calculations............................................................. 31

3.3. Multiscale models ............................................................................. 32 3.3.1. The combined quantum mechanics/molecular mechanics (QM/MM) approach ............................................................................. 32 3.3.2. The empirical valence bond method .......................................... 33

3.4. Analysis of the results ....................................................................... 36 3.4.1. Clustering algorithms ................................................................. 36 3.4.2. Principal component analysis ..................................................... 38

4. Phosphate and sulfate esters ..................................................................... 40

4.1. Mechanisms of phosphate and sulfate ester hydrolysis .................... 41

4.2. Linear free energy relationship ......................................................... 43

Page 8: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

4.3. Kinetic isotope effects ....................................................................... 45

4.4. The nature of the transition states in involved in phosphate and sulfate ester hydrolysis (Paper I) ......................................................................... 46

5. Understanding the mechanism and evolution of enzymes that catalyze phosphoryl and sulfuryl transfer. ................................................................. 49

5.1. Catalytic mechanism and properties of serum paraoxonase 1 .......... 49

5.2. Serum paraoxonase 1 ........................................................................ 50

5.3. Catalytic floppiness and the activity of PON1 (Paper II) ................. 52

5.4. Influence of solvation on PON1 activity (Paper III) ......................... 56

5.5. An epistatic ratchet in PON1 (Paper IV) .......................................... 62

6. Conclusions .............................................................................................. 67

7. Sammanfattning på svenska ..................................................................... 69

8. Acknowledgements .................................................................................. 71

Bibliography ................................................................................................ 73

Page 9: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

Abbreviations

AMBER – Assisted Model Building with Energy Refinement Apo-enzyme – substrate-free form of an enzyme CAD – coronary artery disease CHARMM – Chemistry at HARvard Macromolecular Mechanics DFT – density functional theory EVB – empirical valence bond FEP – free energy perturbation FF – force field GROMOS- GROningen MOlecular Simulation KIE – kinetic isotope effect LFER – linear free energy relationship LRA – linear response approximation MD – molecular dynamics MO – molecular orbital OPLS-AA – optimized potentials for liquid simulations PBC – periodic boundary conditions PCA – principal component analysis PDB - Protein Data Bank PES – potential energy surface PON – Enzymes from the serum paraoxonase family PON1 - serum paraoxonase 1 QM - quantum mechanics QM/MM – hybrid quantum mechanics/molecular mechanics VB – valence bond vdW – van der Waals

Page 10: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its
Page 11: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

11

1. Introduction

Phosphate esters are crucial among living organisms.[1,2] For example, DNA and RNA, which are the most important molecules in cell biology, are phos-phodiesters. Besides, many coenzymes contain phosphate groups in their structures. ATP, an essential biological source of energy, and GTP, a ubiqui-tous signaling molecule, both contain phosphate groups.[1,3] Suggested reasons why nature has chosen phosphate for the most critical roles in our body in-clude the highly stable phosphate bond and unique chemical properties of phosphoric acid.[4] The stability of the phosphate bond invokes an extremely low rate of spontaneous hydrolysis at neutral pH; the half-life of cleaving of an individual phosphate bond in DNA is around 30 million years at room tem-perature.[5] The chemical properties of phosphoric acid make it favorable in nature as it can simultaneously link two nucleotides and ionize, the provided negative charge stabilizes the diesters and prevent hydrolysis.[2]

Cleavage of the stable phosphate bond is a common reaction in living or-ganisms despite being energetically costly in the absence of catalysts. En-zymes act as nature’s leading catalysts to overcome unfavorably energetic bar-riers. They have not only biological importance but also industry potential. Enzymes can be used for the production of medicines, biofuels, chemicals, food, detergents, and treatment of metabolic disorders.[6] Understanding how enzymes evolve, gain new activities (promiscuity of enzymes), and perform catalysis is crucial for medical, chemical, biological, and biotechnological in-dustries in order to understand and potentially modify existing structure into ‘better or superior’ enzymes.

Phosphate esters are not only the critical structures in the living world but can also be used as chemical constituents of insecticides, herbicides, flame retardants, and weapons.[7,8] The first organophosphate insecticide were made in 1850,[9] and with time, an increased usage (most notably after World War II) resulted in increasing numbers of poisoned people every year.[8] Exposure of these compounds can occur through inhalation, ingestion, or skin contact. To alleviate the symptoms associated with organophosphate poisoning, serum paraoxonase 1 (PON1) was identified as an enzyme that can cleave the organ-ophosphate bonds of some pesticides and nerve agents.[10] Although, PON1 has been broadly studied for its role in the prevention of the coronary artery disease (CAD),[11–13] and as a potential drug for organophosphate pesticide and nerve agent poisoning,[14] specific questions remain unanswered, including re-action mechanism, enzyme specificity, promiscuity, and evolutionary origin.

Page 12: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

12

The purpose of this thesis was to understand the PON1 enzyme reaction mechanisms better and provide evolution perspectives using relevant compu-tational modeling of organophosphate hydrolysis. Additional focus was placed on the modeling of the non-enzymatic reaction of sulfate diesters hy-drolysis in solution. To achieve these aims, different computational and anal-ysis methods were applied, among which empirical valence bond calculations, molecular dynamics, and density functional theory were used extensively.

Outline of the thesis and papers The thesis starts with a general overview of enzymes, with attention to enzyme catalysis, kinetics, and the concept of enzyme promiscuity, as well as evolu-tionary aspects. Chapter 3 brings a theoretical background and fundamental aspects of computational chemistry methods used in this thesis. It is followed by the essential features of phosphate and sulfate chemistry with particular attention to analytical methods. This chapter covers a detailed mechanistic study of the non-enzymatic sulfate diesters hydrolysis in Paper I. The main interests of chapter 5 is the overview of the reaction mechanism, promiscuous properties, and evolutionary perspectives of serum paraoxonase 1. This chap-ter includes Papers II, III, and III.

Page 13: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

13

2. Enzymes

It is impossible to imagine the world that we know without enzymes as they are the catalyst of life, which speed up essentially all biochemical reactions.[15] They are biological molecules (proteins), which are essential for respiration, food digestion (metabolism), muscle movements, nerve functions, and many other roles.[3] All cells contain enzymes of different sizes and complexity.[15] Most biological reactions are too slow to occur in the absence of a catalyst under normal conditions, as, i.e., some reactions would require hundreds to millions of years. For instance, one of the most efficient enzymes – orotidine 5’-phosphate (OMP) decarboxylase enhances the rate of reaction by a factor of about 1017.[16,17]

The first identified enzyme was amylase isolated from germinating barley by Payen and Persoz in 1833.[18] A few years later, Schwann recognized the first animal enzyme – pepsin isolated from the stomach wall of animals as a component of gastric juice.[19]

Enzymes are typically assumed to be highly specific, still, most enzymes can promiscuously catalyze reactions different from their specialized func-tion.[20] The history of enzyme identification drives back 200 years; however, new structures are classified every year, and there remain gaps in the under-standing of enzyme mechanisms and their evolution. It is not clear if many modern enzymes retain their promiscuous functions as a remnant of their an-cestors or by design. How can the same active site catalyze different reactions, which are sometimes entirely unrelated? Some enzymes, including iso-propylmalate isomerase,[21] retro-aldolases,[22] PriA,[23] or serum paraoxonase 1,[24] and many others,[25,26] have native and promiscuous functions driven by conformational changes (in some other cases by single or multiple mutations).

The beginning of solving the puzzles in understanding enzyme structure could be dated back to 1926 when Sumner, who crystallized urease,[27] made a big step forward. His work proved that enzymes are proteins, a hypothesis that used to be highly controversial.[28,29] Biochemists realized that their unique structure could drive enzyme function.[30] Unfortunately, knowledge about enzymes at this stage was negligible. The next important discovery was made by Johnson and Phillips, who determined the first 3D X-ray structure of lysozyme in 1965.[31] During this time, enzymes were considered to be frozen structures until Koshland introduced his idea about ‘induced-fit’ and enzyme flexibility.[32] An enzyme fold, or three-dimensional structure, is defined by its unique sequence, which introduces interactions between amino acids and

Page 14: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

14

defines the properties of the enzyme. Since the prediction of enzymes fold is limited to identification of the homologous structures, experimental tech-niques are typically used. Enzymes have one or more active sites where the reaction is catalyzed.[15,33] A single active site could be used to adapt multiple substrates leading to enzyme promiscuity, which is explained later in this chapter. Computational methods can be successfully used to predict enzyme mechanisms and identify crucial residues for catalysis, properties that may be used to improve enzyme structures for different purposes. In contrast to ex-periments, they allow us to track the mobility of the enzyme in an atomistic resolution at fs timescales,[34] and understand how bonds are forming/breaking along the reaction pathway.[35]

2.1. Enzyme kinetics Investigation of enzyme kinetics is essential for understanding cellular sys-tems and studying enzyme mechanisms. Michaelis and Menten demonstrated one of the best-known mathematical models of single substrate enzymatic re-action in 1913.[36] Although their equation was derived previously by Henri, his experiments failed to support the theory.[37] Briggs and Haldane, in 1925, demonstrated a modified version of the model with steady-state approxima-tion,[38] resulting in the version well-known and used today. In the first step, the enzyme (E) binds to the substrate (S), forming a complex (ES), also known as the Michaelis complex. This complex rearranges into the transition state and intermediary state (ES‡) between substrate and product. Finally, the tran-sition state transforms into the product (P) and dissociates from the enzyme. The resulting reaction sequence is shown in Eq. 2.1. The formation of the en-zyme-substrate complex step is reversible – with rate constants k1 and k-1.

⇋ → (2.1)

where k1 is rate constant of enzyme-substrate complex formation, and k2 is product formation rate constant.

2.1.1. Michaelis-Menten kinetics The Michaelis-Menten model, which is a widely used model for the represen-tation of enzyme kinetics, is a form of relationship between maximum reaction velocity (Vmax), substrate concentration, and the Michaelis constant (KM). By definition, KM is equivalent to the substrate concentration at half of the maxi-mum reaction velocity (v=Vmax/2).

Page 15: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

15

Figure 1. A schematic plot of the reaction velocity as a function of substrate concen-tration for an enzymatic reaction.

The Michaelis-Menten equation can be directly derived from the reaction rates as:

(2.2)

E S ES (2.3)

E S ES (2.4)

ES (2.5)

E S ES (2.6)

In most systems, the concentration of the enzyme-substrate complex [ES] will approach a steady-state in which the rate of formation of [ES] and disappear-ance of the product are equal, and therefore the overall concentration of [ES] is constant. By invoking the steady-state approximation,

0 (2.7)

Eq. 2.6. may be rewritten as:

Page 16: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

16

(2.8) Since the enzyme in the system is either free or in complex, we can reframe the equation as a function of initial concentration, a known quantity, and com-plex concentration as:

(2.9)

E (2.10)

The maximum velocity of reaction will always occur at the saturation of all available enzymes. Therefore,

(2.11)

Substituting Eqs. 2.10. and 2.11. into Eq. 2.5., moreover, acknowledging that

the reaction rate is , we can obtain the rate of the reaction as:

(2.12)

Defining the Michaelis-Menten constant, KM, as:

(2.13)

We obtain the Michaelis-Menten equation as:

(2.14)

KM, kcat and kcat/KM are commonly used for validation of enzyme kinetics. These parameters can be used to compare different enzymes catalyzing the same reaction and validate which enzyme is ‘better’ through experimental techniques. Measurements of enzyme kinetics are used in computational bio-chemistry to confirm enzyme mechanisms, test promiscuous substrates, and predict improvement by mutations. Assuming that enzymes work mainly un-der steady-state conditions, both in-vivo and in-vitro, kinetic parameters can be determined as kcat – the catalytic constant of substrate to product conversion and KM – the Michaelis constant defined as the substrate concentration when the initial rate is in half of the maximum velocity (used as the indicator of specificity). The ratio between these two parameters (kcat/KM) is often used to compare different enzymes catalyzing the same reaction.

Page 17: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

17

2.2. Enzyme catalysis Understanding enzyme catalysis could have a significant impact on drug de-sign, disease treatment, fuel production, and many other aspects of chemistry and biology. As an example, the knowledge of transition states involved in reaction allows developing new inhibitors that will bind strongly to their cog-nate enzymes.[39] Additionally, pharmaceutical studies focused on proteases, which can be potential drug targets, as well as prognostic and diagnostic bi-omarkers, i.e., in cancer research.[40] The usage of enzymes in biodiesel pro-duction makes it more environmentally friendly and less energy-intensive in comparison to traditional alkaline catalyzed processes. Another example is the enzymatic hydrolysis of cellulose in bioethanol production that allows reduc-ing costs as it is carried out at mild conditions.[41] The chemical mechanisms of enzyme catalysis have been hot topics within both experimental and theo-retical research. Enzymes use specific functional groups and/or cofactors, which alter reaction rates lowering the activation energy as compare to the uncatalyzed reactions (Fig. 2). In order for a reaction to occur, reactant mole-cules must obtain the required amount of energy to cross the reaction barrier, also known as the activation energy ΔG‡. Some reactions would never occur in solution and at ambient temperatures due to very high ΔG‡. Various models of catalysis have been presented over the years.

Figure 2. Schematic illustration of a theoretical free energy profile for a one-step re-action. The uncatalyzed reaction in solution is shown in green, whereas the reaction catalyzed by an enzyme is shown in blue.

Page 18: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

18

Lock and key model The lock and key model, introduced by Emil Fischer in 1894, describes the substrate as a key perfectly shaped to fit into an enzyme’s active site (the ‘lock’) (Fig. 3a). The substrate needs to match the size, shape, and chemical nature of the active site to induce a catalytic effect.[42] This model does not explain enzyme flexibility and was later proven to be too simple to explain catalysis.

Induced-fit and conformational selection models The induced-fit model suggests that binding of a ligand induces a conforma-tional change in the enzyme and ligand (Fig. 3b) in comparison to the lock and key model where an enzyme structure is unchanged with or without substrate. In the induced-fit model, the substrate binds to the enzyme, which rearranges to improve stabilization.[32,43] An additional model of enzyme catalysis, the conformational selection model, describes an equilibrium of active site con-formations between different conformational states. In contrast to the induced-fit model (where binding occurs first and then is followed by rearrangement), conformational change precedes ligand binding (Fig. 3c).[44–46]

Figure 3. Schematic illustration of the protein-ligand models (a), lock and key (b) induced fit, and (c) conformational selection. Reprinted with permission from Du et al. [47]

Ground state destabilization The ground state destabilization concept began with Bayliss’ suggestion that a rise of the chemical potential of substrates could increase reaction rates.[48] It was followed by Quastels’ proposal that a substrate could be activated by an enzyme’s idiosyncratic electric field.[49] The concept was clarified by Jencks, who stated that enzymes increase the reaction rates by destabilizing the bound substrate relative to the transition state.[50] When the substrate binds to the enzyme, energy increases, either through changes to the substrate or enzyme. This increase effectively reduces the free energy required to reach

Page 19: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

19

the transition state from the substrate-bound state. Despite a strong theoretical basis at that time, the hypothesis does not fully explain enzyme catalysis. The amount of energy associated with steric strain due to the ligand-binding is small compared to the activation energy.[51]

Transition state stabilization Another model suggests that an enzyme’s active site provides the optimal electrostatic environment (dipoles of the active site in suitable orientations) to stabilize the transition state more effectively than in aqueous solution. The same reaction in solution without enzyme will have a significant increase in free energy spent on the reorganization of the dipoles.[52–54] The transition state stabilization model was first introduced by Pauling,[55] who theorized that the enzyme (catalyst) enhances the reaction rate with stronger TS binding, which functionally lowers the energy compared to the uncatalyzed reaction. Transi-tion-state analogs studied 20 years later supported this hypothesis.[56] For many years, this phenomenon was discussed and explanations explored, in-cluding electrostatic stabilization and preorganization theories, which were confirmed by experimental and computational studies.[57–60]

In 1967, Vernon claimed that the charge stabilization is presumably a dom-inant factor in catalysis.[61] Later, Warshel claimed that electrostatic effects could be the most important factor for reducing the activation barrier and ex-plaining enzyme catalysis.[62] He suggested the use of a reaction in solution as a reference reaction to observe the catalytic effect of enzymes, a procedure still prevalent today. Electrostatic stabilization enhances reaction rates through contributions by protein charges, permanent and induced dipoles, and solvation. In his review,[60] he stated that a preorganized polar environment is provided by the enzyme active site and stabilizes the transition state signifi-cantly more than an environment in solution.[63] In essence, dipoles in solution require additional work to reorganize while enzymes provide already prear-ranged polar environment to stabilize the transition state. A schematic expla-nation of the phenomena is shown in Fig. 4. Computational studies confirm that the preorganization of the dipoles in the active site is the dominant con-tributor to enzyme catalysis.[60,64,65]

Page 20: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

20

Figure 4. Schematic illustration of preorganized electrostatics. Dipoles in the ground state of the enzyme (B) are preorganized to adapt to the transition state, whereas dipole reorganization is still needed to happen in a solution enzyme-free environment (A). Adapted with permission from ref.[60] Copyright 2006, American Chemical Society.

2.3. Enzymes dynamics, promiscuity and evolution It is widely accepted that, in many cases, the biological properties of enzymes may not be fully explained and understood when looking at a static 3D repre-sentation of an enzyme. Enzymes are flexible catalysts, and the contribution of their dynamics to biological functions and molecular recognitions has been shown in many experimental and computational studies.[66–71]

The dynamic movements of enzymes range in time length from fast, local bond vibrations to slow, global conformational motions.[25] A variety of move-ments allow enzymes to allosterically regulate functions,[72] access catalyti-cally competent conformations,[73] promote order or disorder into their struc-tures,[74,75] and gain new activity or specificity.[76] Conformational diversity and functional promiscuity can facilitate enzyme evolution. An enzyme can adopt multiple structures with one polypeptide sequence, and that allows for the repurposing or developing an entirely new active site.[77] Additionally, even a single, distant mutation can significantly affect the conformational equilibrium. Enzyme dynamics and flexibility in catalytic efficiency is a widely studied topic. An enzyme’s reaction rate is modulated and strongly dependent on the enzymes ‘floppiness’, there have therefore been many stud-ies about the importance of dynamics on enzyme catalysis (i.e., DERA,[78,79] DHFR,[80] glucose oxidase,[81] -lactamases,[82] cyclohexadienyl dehydra-tase,[83] ribonuclease A,[84,85] GTPases,[86,87] indoleglycerol phosphate synthe-tase (inGPS),[88,89] histone deacetylase amidohydrolase (HDAH),[90] to name a few examples). More recently, the effect of too much floppiness was stud-ied,[91,92] and it was shown to decrease catalytic efficiencies, as discussed in Paper II.

Page 21: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

21

Conformational dynamics allow an enzyme to perform promiscuous mech-anisms or catalyze promiscuous substrates. In the case of enzyme catalysis, promiscuity describes any enzymatic activity outside of those for which the enzyme evolved. Promiscuous reactions are not part of an organism’s physi-ology. Despite long-lasting knowledge of this phenomenon,[20,93,94] promiscu-ity has become an important topic with the expansion of enzyme engineering, bio-catalysis, and evolution.[95–97] Some mutations can drastically change an enzyme’s reaction capabilities, sometimes leading to a much more proficient catalyst or allow for the enzyme to gain promiscuous function.[98] Because of its complexity, natural enzyme evolution drives enzymes to be ‘good enough’ to perform their task and satisfy the needs of the cell. For this reason, many natural enzymes retain some promiscuity despite evolving for specialized tasks. Some enzymes, such as those which most likely originated from the same gene, maintain similar reaction intermediates and the ability to catalyze each other’s reactions, even with higher efficiency than the native en-zyme.[99,100]

Enzymes promiscuity can be classified as:[101] Conditions promiscuity – enzymes adapted to unnatural conditions: such

as anhydrous media, extreme temperatures or pH. Substrate promiscuity – enzymes that exhibit broad substrate specificity. Catalytic promiscuity – enzymes that catalyze different chemical reac-

tions. The promiscuity can occur as: Accidental – wild type enzyme that catalyzes side reaction (‘moonlight-

ing’). Induced – new promiscuity appears by single or multiple mutations.

Though this classification aids in understanding enzyme promiscuity, the dif-ferent types of promiscuity are not independent, and several types of promis-cuity may be observed within a single enzyme.[102] Most enzymes can catalyze other reactions or substrates in addition to the native function. When unnatu-ral substrates are introduced, enzymes may explore new promiscuous func-tions with some of these substrates (i.e., Pseudomonas aeruginosa arylsulfa-tase (PaS)).[103–105] In addition to different types of promiscuity, enzymes can adapt new functions through varying methods, including allosteric or active site conformational changes (i.e., human Sulfotransferase Family 1A Member 1 (SULT1A1) and Pyrococcus horikoshii isopropylmalate isomerase small subunit (PhIPMI-s), Acinetobacter baumannii β-lactamase), and changes to the protonation states of active site residues (i.e., malonate semialdehyde de-carboxylase (MSAD)).[76,106] Some of the enzymes catalyze the promiscuous and the native reaction switching the active site configuration while others use the same configuration and features; this includes enzymes that use nucleo-

Page 22: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

22

philic catalysis. An enzyme can sometimes use two different subsets of resi-dues for the native and new function (as seen in the case of serum paraoxonase 1) or substitute the cofactors.[76]

The first hypothesis about evolution and promiscuity was made in the 1970s by Jensen[107] and Yčas[108]. They proposed that ancestral enzymes were multifunctional generalists, but gene duplication and divergence changed them to become more specialized with higher activities for preferred sub-strates.[107–109] Conformational changes in the loop or side-chain dynamics al-low enzymes to adopt new structures and gain promiscuity. These confor-mations could be insignificant in the wild type enzyme, but with proper mu-tations, conformational populations could shift towards different confor-mations allowing new activities to evolve and dominate the native reactions. It is generally agreed that most modern enzyme activities originated from ran-dom mutations as a consequence of the evolutionary process. Gene duplica-tion is one of the possibilities for the appearance of such mutations. In such cases, one copy of the gene is responsible for retaining the original function-ality, while another gene evolves towards new functionality.[110] This process involves the evolutionary adaptation of an existing active site for a new activ-ity with multiple generations of multifunctional enzymes as the intermediate step. James and Tawfik provided a fundamental understanding of the role of dynamics in enzyme evolution, however, there are still many unknowns in this subject. They proposed that enzyme function can emerge on previously non-catalytic scaffolds by modulating an enzyme’s dynamical properties.[77,111]

Recently, researchers[112–115] investigated various protein states along a di-rected evolution trajectory to clarify the role of dynamics in enzyme evolution. The residues which were responsible for new functions appeared initially as a collection of several productive conformations in a mix of many non-produc-tive ones. Additionally, secondary mutations acted to eliminate non-produc-tive structures and shifted the equilibrium towards productive conformations. Some conformational fluctuations which were necessary for original function but irrelevant for the evolved one were eliminated over the evolution process to support the development of new functionalities. Nevertheless, it was rare for new enzyme function to emerge from the re-functionalization of an exist-ing active site without the presence of a multifunctional enzyme ancestor. The new function could additionally be introduced through conformational shifts of existing active-site residues and mutations that affected protein stability.[112]

Page 23: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

23

3. Computational approaches

3.1. Quantum mechanical approaches The connection between physics and chemistry is clear, although the Dirac statement from 1929 brought confusion: ‘The underlying physical laws nec-essary for the mathematical theory of a large part of physics and the whole of chemistry are thus completely known, and the difficulty is only that the ex-act application of these laws leads to equations much too complicated to be soluble.’[116] Some scientists concluded that all of chemistry is already known and can be explained with physical laws in the future; chemistry can be re-duced to physics. Now, 90 years later, it is still the case that large parts of chemistry are not yet known or understood. To this day, we struggle to solve those equations.

It has to be mentioned that many of the current methods used today origi-nated from the quantum chemistry pioneers of the early 1920s.[117,118] Their explanations and applications were restricted to the one-, two-atom systems, or small symmetric structures that can be solved by hand. An explanation of black body radiation by Planck in 1900 was one of the most important mile-stones.[118] Afterward, Danish physicist Bohr published his first paper that fun-damentally changed our understanding of the atom.[119] Another brilliant sci-entist was Rutherford, who proved that the nucleus is massive but very small in the same moment.[120] Formulations of Sommerfeld, de Broglie, Pauli, Hei-senberg, and Schrödinger are known as the foundations for non-relativistic quantum mechanics.[117,121] Recent years of developments (mostly in DFT methods[122]) allowed quantum chemistry calculations of moderate sizes mol-ecules to reach accuracies comparable to experiments.[35]

Quantum mechanics is a theoretical field that describes and predicts the behavior of physical systems ranging from particles through nuclei, atoms, and radiation to molecules and condensed matter.[123] Quantum chemistry ap-plies quantum mechanics to study various properties of molecules and their reactions.[35] The foundation of quantum chemistry is a wave model, and every system can be described by a wave function,[117] which is a mathematical de-scription of the probability of a particle’s quantum state (i.e., position, mo-mentum, time and/or spin).[124]

Solving the Schrödinger equation is a major challenge in quantum chemis-try because of its complexity for many-electron systems.

Page 24: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

24

, , (3.1)

The Hamiltonian operator determines the time evolution of wave function:

, (3.2)

The Hamiltonian could be derived as: Electron Nuclear kinetic energy kinetic energy

∑ ∑ (3.3)

∑ ∑| |, ∑

electron-electron electron-nucleus nucleus-nucleus interaction

where RI, RJ, and ri, rj are positions of nuclei and electrons, respectively; MI and me are the nucleus and electron masses; vacuum permittivity, Z the atomic number, and is the Laplacian operator;

(3.4)

The wave function for N electron system can be written as:

1,2… 1 2 … (3.5)

3.1.1. Born-Oppenheimer approximation It is possible to solve the Schrödinger equation analytically if the wave func-tion ( is known. Unfortunately, this is not always the case as -occurs to be extraordinarily complicated for almost all chemical systems (more compli-cated than H2 structure). The Born-Oppenheimer approximation (BOA) over-comes this issue, allowing for a solution to the Schrödinger equation. It con-siders the movements of the electrons to be in the field of the fixed nuclei, and therefore allows to decouple movements of both the nuclei and electrons into two Schrödinger equations:

, (3.6)

Page 25: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

25

Resulting in the wave-function, which does not satisfy the Pauli exclusion principle: two electrons could not be in the same state at the same time. The Slater determinant form is the simplest solution:

Ψ√ !

1 1 … 12 2 … 2

⋮ ⋮…

⋮ (3.7)

BOA allows us to split the Hamiltonian into two separate terms – one corre-sponding to nuclei and other to electrons:

(3.8)

∑ ∑ ∑,

⟨ ⟩ (3.9)

∑ ∑ ∑,

∑ ∑, (3.10)

Applying BOA to calculations of wave function and energies could be written as:

(3.11)

where {x} is the set of spatial coordinates and spin states of all electrons in the system. The minimal ground state of the system (Eel), which is bigger or equal to the real electronic ground state (E0), is represented as follows:

⟨ ⟩∗

∗ (3.12)

3.1.2. Basis sets A practical form of the Slater determinant wave function with BOA approxi-mation is used in ab initio theory. [125] It states that the electronic wave func-tion is unaffected by nuclear motions and basis sets, which represent molecu-lar orbitals. Basis sets used to describe molecular orbitals or electron densities of electronic structures are a crucial component of existing methods. They aim to give the best representation with a small computational cost. The basis set is a collection of one-particle functions used to build molecular orbital or elec-tron density.[126] The choice of functions is essential for accurate results. The most popular functions for molecular systems are nuclear-centered Slater

Page 26: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

26

(STO) or Gaussian type (GTO) functions, with GTO more preferable because of its higher computational efficiency.[127] Every molecular (spatial) orbital is expressed as a linear combination of n basis set functions:

∑ (3.13)

where is the coefficient of the primitive Gaussian (or Slater) function (ba-sis function) with r as a component, and n as the number of functions in the expansion. The coefficients are known as the molecular orbital expansion coefficients or so-called the MO coefficients. Primitive or uncontracted func-tion means that both the coefficient and the exponent parameters vary during ab initio calculations. It is worth mentioning that significant computational effort is needed to run the corresponding calculations, and this is why con-tracted functions are more commonly used since both components are pre-determined and kept constant during calculations. Slater and Gaussian basis functions are formulated as:

, , (3.14)

where f(x,y,z) is polynomial, and using larger gives more effective center-ing. If n=1, we use STO- orbitals, whereas n=2 corresponds to GTO-orbitals. They are different levels of basis sets, which depend on the number of basis functions used to characterize an electron.

3.1.3. Hartree-Fock method To obtain the exact solution of the Schrödinger equation is only possible for tiny systems, and for bigger systems, one can apply approximations. The Har-tree-Fock (HF) approach, developed in 1927,[128] provides an approximate so-lution of the electronic Schrödinger equation. This model uses the following approximations: the BOA, independent electron, and linear combination of atomic orbitals. All relativistic effects are entirely neglected. The total wave function is constrained to be a product of N one-electron orbitals without in-cluding the spin of the electrons or anti-symmetry. A single Slater determinant describes the eigenfunction of each energy. This method is also named as the self-consistent field method (SCF) as the final field is required to be self-con-sistent with the assumed initial field. The HF method allows to split Hamilto-nian ( of the multi-electron wave function to an individual, one electron Hamiltonians, named Fock operators :

∑,

∑ (3.15)

Page 27: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

27

where the first part ( characterizes one-electron terms, and the second part ( describes the potential involved in the electron-electron repulsion. In the HF method, a two-electron repulsion operator is replaced by one-elec-tron operator , and it represents the average of repulsions.

3.1.4. Density functional theory Density functional theory method (DFT) has been one of the most popular methods for energetics calculations in the last 30 years.[129] DFT is commonly used for various chemical systems to calculate ground-state electronic struc-tures of atoms and molecules as it allows us to achieve the right balance be-tween accurate results and low computational costs. DFT focuses on the elec-tron density (probability of finding an electron at a given position) of systems in contrast to the methods based on the Schrödinger equation.

The idea of using electron density dates back to the beginning of the 20th century. However, it was not explored until Hohenberg and Kohn created the mathematical basis for DFT in 1964,[130] which has helped to develop research in solid-state physics.[131] DFT is based on two Hohenberg and Kohn theo-rems,[130] which state that the electron density of the system determines the ground state of the system and that it is possible to acquire the electronic den-sity and the ground state energy with a variational principle. Consequently, Kohn and Sham proposed to work with non-interacting electrons where the system is constructed in a way that its density is the same as of the interacting electrons (Fig. 5). They replaced the many-body problem with a system of non-interacting auxiliary particles, and introduce so-called Kohn-Sham orbit-als.[132]

Figure 5. The comparison of many-body (Schrödinger method) and DFT perspec-tives. In the DFT method, the non-interacting model is used instead of including all interactions between the electrons. This model uses electron density, which mimics the behavior and properties of electrons.

Page 28: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

28

The ground state of the system is represented as:

∑ … | , , , , … , , | (3.16)

It is a single Slater determinant of the orbitals. The difficulty of the many-electron systems is to calculate the function of 3N variables. DFT method di-minishes the 3N degrees of freedom of the N-body system to only three spatial coordinates.

The single determinant is replaced by the exchange-correlation functional, which contains terms for computing the exchange and the electron correlation energies:

⟨ ⟩ ⟨ ⟩ (3.17)

where EX[P] and EC[P] are the exchange, and the correlation functional, re-spectively.

, , , (3.18)

The exchange functional is unknown and needs to be approximated what makes it an essential ingredient of Kohn-Sham density functional theory (KS-DFT). The approximations widely used in DFT are local spin-density approx-imation (LSDA),[132] generalized gradient approximation (GGA),[133] meta-GGA[134] and hyper-GGA.[135] Most of the functionals are successful in many applications but fail in some circumstances.[136]

The functionals used for calculations presented in this thesis were B97X-D,[137] M11L,[138] and M062X.[53] As shown in Paper I and by many other re-search groups,[129,139–142] usage of the functional should be system and prob-lem-dependent as the wrong choice could lead to entirely different conclusions and chemically unrealistic results.

3.2. Classical methods The methods described above are computationally too expensive to use for greater systems, such as enzymes in a solvent environment. Molecular me-chanics (MM) does not treat the movements of electrons explicitly according to the Schrödinger equation. Instead, it uses a classical description based on Newton’s equations to characterize interactions between atoms in the mole-cule. It decreases the complexity of calculations.[126] Atoms are described as balls bonded to one another by mechanical springs (‘ball and spring’ model).

Page 29: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

29

The set of analytical parameters and functions used to describe the atoms and molecules in a given system is called a force field.

3.2.1. Force fields We assume that the nucleus is heavy enough to be treated with classical me-chanics instead of the Schrödinger equation.[126] The force field energy (VFF) is a sum of bonded and non-bonded interatomic interactions of the system (Fig. 6).[143]

(3.19)

where bonded interactions can be described as a sum of stretching (corre-sponding to bonds), bending (angles) and torsional (dihedrals) energies:

(3.20)

Non-bonded terms are described by van der Waals and electrostatic interac-tions.

(3.21)

Figure 6. Equations and schematic representation of force field forces. Reprinted from ref. [144] Copyright 2014, with permission from Elsevier.

Force field represents all parameters of the molecules in the system, and based on the second Newton’s law, it can be defined as the negative gradient of po-tential energy in the specific position of the particle. Parameters from one

Page 30: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

30

force field are not straightforwardly transferable to others as there are derived either from experiments or high-level theory calculations of a set of small models that mimic a set of bigger systems.

Examples of commonly used force fields for biomolecular simulation are AMBER99SB,[145] AMBER14SB,[146] CHARMM36,[147] or OPLS-AA.[148] In our publications, the all-atom OPLS-AA force field was largely used to char-acterize the enzyme systems studied.

3.2.2. Molecular dynamics Molecular dynamics (MD) is one of the most widely used methods in many areas of chemistry, biology, biotechnology, engineering, materials, and other parts of science and technology.[149] The main goal of computational simula-tions is to mimic ‘real-life’ behavior, to light up the invisible microscopic and nanoscopic details. They describe real atoms’ behavior and motions as realis-tically as a chosen potential energy function. Molecular simulations coupled with modern computing power offers the perfect connection between theory and experiments, and they can save both time and money. Simulations can be used to predict the behavior of materials, calculate binding energies or explain enzyme mechanisms.

For every computer simulation, there is a compromise between the accu-racy of results and the time/cost of calculations.[150] In comparison to high-level theory methods, classical MD simulations are fast and allow us to simu-late systems (currently) in the billions[151] of atoms, yet they have some obvi-ous limitations as a result of empirically fitted parameters or the approxima-tions used to describe the force field.[152]

Molecular dynamics simulations compute motions and velocity of a partic-ular molecule over time.[143] In classical MD, a force field is used to describe the system. Changes in the atomic coordinates of a system as a function of time are generated by integrating Newton’s second law of motion.

(3.22)

where F is the net force, m is mass of the body, and a is its acceleration.

(3.23)

where is the mass of particle along coordinate with force on that particle in this direction.

The major problem with molecular mechanics methods is that they do not allow us to introduce chemical changes in the system, such as bond breaking or forming. Although, other approaches, such as the cluster model approach,

Page 31: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

31

QM/MM methods, or introducing the reactive force fields (i.e., ReaxFF[153]), are successful for that purpose.

3.2.3. Free energy calculations Free energy perturbation (FEP) is a statistical mechanics method often used to compute free energy differences from molecular dynamics or Monte Carlo simulations.[154] FEP is one of the oldest and most widely used strategies to calculate the free energy difference between states. FEP calculations can often provide relatively accurate free energies within a mean absolute deviation (MAD) from experimental results of 4-6 kJ/mol.[155–158] Perturbation methods were initially used in mathematics and later applied to physics and chemistry. The idea of the theory is to introduce an initial problem, so-called the unper-turbed problem, and then a target problem, and which is the problem of inter-est, and it is represented as a perturbation of the unperturbed problem. The effect is named the perturbation parameter. The free energy difference of the reference and target system is characterized by the Hamiltonian of the refer-ence system and Hamiltonian of the difference.

, , ∆ , (3.24)

To obtain a free energy difference between two states, the Zwanzig equa-tion[159] is used:

∆ 0 → 1 ⟨exp ⟩ (3.25)

Furthermore, Zwanzig derived the free energy change as a power series with E1 as the energy of the target state defined as the sum of the energies of the reference state E0 and a small perturbing potential, V:

(3.26)

∆ ⟨ ⟩ ⟨ ⟩ ⟨ ⟩ (3.27)

The difference is estimated to the first order by the reference ensemble average of the V, and to the second-order by addition of the squared deviation of V from its mean or the variance.[154]

One of the main limitations of this method is that the results depend strongly on the starting conditions.[160–162] The most common solution to solve this issue is to run several independent simulations with different random-number seeds of starting velocities.

Page 32: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

32

3.3. Multiscale models There are different ways to treat the system in computational mechanistic en-zymology. The system can be treated as a model of only the reactive parts with a complete and accurate quantum description (i.e., DFT) of small systems yet with limitations to describe the environmental effects for systems larger than a few hundred atoms due to very demanding computational cost. Molecular mechanics approaches based on classical potentials are efficient methods with relatively low costs that are helpful in the explanation of environmental be-havior, but they are unable to describe the electronic changes during a chemi-cal reaction. As a solution to overcome these issues, one could use multilayer approaches (such as the hybrid QM/MM approaches), which treats the system as a reactive model with a complete quantum mechanical description, and it includes less complicated descriptions of the non-reacting part of the system. Moreover, the environment can be treated with different levels of approxima-tions: as an all-atom model, coarse-grained model, or dielectric continuum (Fig. 7). The representing QM region in QM/MM methods can be grouped into molecular orbital (MO) and valence bond (VB)-based approaches.

Figure 7. Schematic depiction of the QM/MM approach of an enzyme system. When the reactive model is treated with QM level theory, the middle part is QM/MM cou-pling, and the outer region is treated with MM level of theory. Reprinted with permis-sions from The Royal Society of Chemistry, Romero-Rivera et.al.[163]

3.3.1. The combined quantum mechanics/molecular mechanics (QM/MM) approach The quantum mechanics/molecular mechanics (QM/MM) approach was intro-duced as a concept by Warshel and Levitt in 1976.[62] They presented a method with essential features applied to an enzymatic reaction. Later, in 1990, this method was implemented into the CHARMM force field by Field, Bash, and

Page 33: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

33

Karplus.[164] QM/MM methods combine both QM and MM approaches. More precisely, the reactive part is treated with the quantum mechanics methods, and the outer part of the system (e.g., enzyme or solvent) is a molecular me-chanical component. There are different ways how to treat interactions be-tween the QM and MM regions. Therefore the total energy of the system is not merely a straightforward sum of the two energies of both parts; instead, the interactions are treated by a given embedding scheme.[165] Depending on the type of approach chosen, one possibility is to use link atoms that cover the QM region, and are not part of the entire system. Another way is to introduce unique features to both subsystems. The standard choice for the QM part is DFT or semi-empirical methods, whereas any molecular dynamics force field can be used for the MM part. Different schemes are introduced for QM/MM calculations, among which subtractive and additive schemes are often used. In the subtractive scheme, the entire system is calculated with the MM method (1), the inner part is determined by QM (2), and the internal subsystem is cal-culated with MM (3). The energy of the entire system is then obtained by summing energies of (1) and (2) and subtracting (3):

/ (3.28)

where S means the system, I inner part, and L link atoms. An example of the QM/MM subtractive scheme is Integrated Molecular

Orbital/Molecular Mechanics (n-layered ONIOM) method.[166] In the additive scheme, MM is performed only on the outer region, and an

explicit coupling term EQM-MM(I,O) is introduced between the two subsystems, the capped inner part (I+L) is calculated with QM level theory.

/ , (3.29)

The electrostatic interactions between subsystems can be handled at different levels of sophistication: mechanical, polarized, or electrostatic embedding.[165]

QM/MM calculations are not trivial, and modeling an enzymatic reaction can become very challenging. It is possible to gain highly accurate results and detailed mechanistic insights into enzymatic reactions with a carefully pre-pared system; otherwise, QM/MM can provide many errors as every error or wrong choice of parameters in the initial phase cannot be recovered at a later stage.

3.3.2. The empirical valence bond method Biomolecular modeling has evolved to a high level of performance in the last few years as a result of the hard work of the pioneers in the field, such as

Page 34: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

34

Nobel prize winners in 2013 Warshel, Karplus, and Levitt.[167] They demon-strated multiscale models for complex chemical systems that allow moving biochemical modeling to the next level. However, even with increasing com-putational power, advanced mechanistic studies on biomolecular systems with full QM descriptions are highly computationally expensive. As shown before, hybrid QM/MM methods overcome this problem, and the empirical valence bond approach (EVB) as an example of the QM/MM approach is a highly successive method to simulate reaction pathways in biochemistry while ob-taining reasonable results for a reasonable cost. It uses the valence bond (VB) theory instead of the molecular orbital (MO) theory that was shortly men-tioned before.

EVB method is a semi-empirical approach, and it uses an entirely classical description of the different valence bond configurations onward a chemical reaction.[168] It is possible to describe the reaction system with covalent, ionic, and mixture bond types in the set of EVB configurations. The linear combina-tion of VB states, which define the chemical reaction, is described as the min-ima of curved energy surfaces. It can be obtained in the modified version of the classical force field description.[169] EVB approach has great computa-tional benefits: 1) it is noticeably fast because of the usage of the classical force field, and 2) it can carry an incredible amount of chemical information with proper parametrization of the force field.[170] On the other hand, the meth-ods bear many limitations as they are strongly dependent on parametrization (the simulations are as good as the introduced parameters), to name one of them.

Although the EVB approach can be used to describe many states in the chemical reaction, here for simplicity, we focus on describing the EVB ap-proach for only two reaction states. The ground state EVB potential function is represented by the lowest eigenvalue of the secular equation:

22 4 (3.30)

where H11 and H22 are Hamiltonians for different states: reactants, and prod-ucts; and H12 is the phase-independent coupling between two EVB states (Fig. 8).

(3.31)

where represents the interaction potential between the solute atoms and the surroundings, the surrounding-surroundings interaction potential is shown as .

Page 35: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

35

Figure 8. Schematic representation of the free energy surface for simple two-state reaction B-C → A-B . H12 represents the coupling energy between states, and is the reorganization energy.

It should be mentioned that EVB uses Morse potentials instead of Harmonic potentials for atoms involved in bond breaking/forming. The FEP approach and umbrella sampling procedure are involved in calculating the free energy of the reaction.[168] The coupling parameter m connects two states via a set of intermediate potentials, which are sufficiently similar and speed up calcula-tions allowing adequate sampling between two the potentials E1 and E2. It is done by sampling with the mapping potential (m):

1 (3.32)

Potentials 1 and 2 have their minima at the reactant and product geometry, respectively. The free energy difference between the states can be formulated as:

→ ⟨ ⟩ (3.33)

where kB is Boltzmann constant, and T is temperature.

∆ → ∑ → (3.34)

∆ represents only the free energy associated with moving from potential 1 to 2 on the constraint potential m, and it is not the activation energy G‡. EVB does not provide absolute free energies but free energy relative to the ground state. It is possible to obtain free energy (G(x)), which corresponds to the trajectories shifting on a ground-state potential (Eg). The way to achieve this is to define a reaction coordinate (x) for the particular reaction of interest,

Page 36: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

36

which could be determined in terms of the energy gap () between two VB states, 1 and 2.

∆ ≅ ∆ ⟨ ⟩ (3.35)

The energy difference between the mapping (m) and the ground state potential (Eg) (calculated at every point of the trajectory with the Boltzmann average of the difference) allows for correcting free energy obtained on the mapping po-tential.

As mentioned, the EVB approach is a semi-empirical method, and it is re-liable only with proper parametrization, as shown in many publications.[171–

174] The main philosophy is to use a well-defined reference state, to which we fit the off-diagonal coupling element, Hij, and the gas-phase shift, α. The choice of the reference state is essential, and it could be either the non-enzy-matic reaction in the aqueous environment (fitted to experiment or high-level quantum chemical calculations) or the wild type enzyme with comparison to a range of substituted enzyme variants. However, EVB is accurate enough to calculate exact kinetic quantities, such as kinetic isotope effects as proved by many successful studies.[171–174]

The linear response approximation (LRA) can be used to evaluate energy differences between two states A and B,

∆ → ⟨ ⟩ ⟨ ⟩ (3.36)

where ⟨… ⟩ and ⟨… ⟩ are the ensemble average sampled on UA and UB, re-spectively, and is reorganization energy. The equation is used with an assumption that the free energy functionals, which describe two states, are parabolas of similar curvature.

The EVB approach allows screening for different factors, which drive ca-talysis. It points to the source of the effect as well as all the reorganization effects in the system. The method can be used to justify, for example, the ef-fect of the solute to changes in the system, which is shown in Paper III.

3.4. Analysis of the results 3.4.1. Clustering algorithms Every computational modeling calculation generates a large amount of data that needs to be analyzed to find an answer and proper description of a given question. There is a significant increase in the amount of data and no sign of when/if the increase ends. It brings an important question: what is essential to

Page 37: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

37

keep and what is safe to leave out.[175] Biological systems are flexible, and they can adopt a large number of different conformations. One simulation can gen-erate millions of individual structural snapshots describing the movements of proteins. Therefore, it is crucial to choose a proper method for analyzing, sort-ing, and/or statistically evaluating the trajectories. For this reason, clustering methods have been commonly used to sort an individual snapshot into clusters – groups with similar properties.

Clustering methods divide similar objects into smaller sub-groups based on various algorithms and then group them based on similarities or specific dis-tances. Conformations from MD trajectories are grouped with geometrical or kinetically similarity, which could massively reduce the amount of data for additional analysis. Moreover, the algorithm needs to generate clusters where members will have high similarity inside the same cluster, however, which are dissimilar to members of other clusters.[175]

Furthermore, clustering algorithms can be split into partitional and hierar-chical clustering approaches.[176] In the former, the set of data is un-nested in comparison to the latter algorithm. A partitional clustering divides the set of results into non-overlapping clusters while the hierarchical approach nests them in as a tree (dendrogram). Each snapshot from MD simulation contains three-dimensional Cartesian coordinates of all atoms in the system, including all biomolecules, solvents, and ions if present. One of the geometrical criteria in the partitional clustering approach is to compare them with a distance be-tween pairs of conformations by root-mean-square deviation (RMSD). An ex-ample of the clustering method is shown in Fig. 9, which shows several groups highlighted with different colors representing different conformational states within the MD trajectory.

Figure 9. Clustering results for small peptide showing possible conformations and folding over time. Reprinted with permission from Rajan et al.[177]

Page 38: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

38

Examples of clustering algorithm types are hierarchical methods (agglomera-tive, divisive), partitioning methods (i.e., relocation, probabilistic, k-mean, density-based), grid-based methods, constraint-based methods; scalable clus-tering; machine learning (i.e., evolutionary, gradient descent), and high-di-mensionality data.[175] The two first algorithm types are commonly used in science. Hierarchical algorithms determine new clusters using previously es-tablished clusters, where partitioning algorithms establish all clusters at the same time.[178] Relocation methods determine clusters by relocating the proto-types during iterations when density-based algorithms look to areas with a high population of data. The k-means is a highly robust and straightforward method, why it is usually the first type of algorithms chosen for analysis. How-ever, it tends to find local instead of global optima depending on starting val-ues.

3.4.2. Principal component analysis In addition to clustering methods, snapshots from MD simulations can be an-alyzed with principal component analysis (PCA) method. PCA is a statistical technique, and a standard method used to find patterns in high-dimensional data that allows highlighting their similarities and differences.[175] This method was introduced in 1901 by Pearson[179] for computing a few variables and later extended by Hotelling in 1933[180] to account for a higher number of variables. The main goal of this method is to reduce the dimensionality of a multivariate data set.

The process starts with a subtraction of the mean of each data dimension. The next step is to calculate the covariance matrix for each pair of dimensions, and then eigenvectors and eigenvalues for the covariance matrix. Eigenvalues (and their corresponding eigenvectors) are ordered from largest to smallest, the size of the eigenvalue indicating the amount of variance described by its corresponding eigenvector. Normally, eigenvectors with large eigenvalues are used for analysis, whilst eigenvectors with negligible eigenvalues are disre-garded (as they describe very little of the variance in the data set).[175]

PCA analysis is looking for combinations of interrelated variables (x1, x2, …, xj) based upon variances to produce a transformed uncorrelated set of var-iables (p1, p2, …, pi, called principal components). The principal component is a linear combination of variables with a coefficient of the variable.

∑ , (3.37)

where ci,j is the coefficient of the descriptor xj. PCA is a standard statistical tool used in molecular dynamics data mining.

Amadei et al. suggested that correlated motions, comprising only a few de-

Page 39: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

39

grees of freedom, are functionally important, in contrast to not relevant inde-pendent movements and Gaussian fluctuations.[181] PCA, when applied to the analysis of MD simulations, reduces the number of dimensions required to explain protein dynamics. It is possible to do it by a decomposition process that groups observed motions from largest to smallest spatial scales,[182–184] simplified, only the dominant modes corresponding to the protein motions from MD simulations are extracted.[185] Fig. 10 shows a representative output in the form of a diagonalized covariance matrix from PCA analysis.

Figure 10. Example of results from dynamic cross correlation maps (DCCM) of the DERA (2-deoxyribose-5-phopshate aldolase) enzyme for (A) wild-type DERA and (B) structure with S239P mutation. Reprinted with permissions from The Royal So-ciety of Chemistry, Ma et. al.[79]

Page 40: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

40

4. Phosphate and sulfate esters

Phosphate and, to a lesser extent, sulfate transfer reactions are among the most common reactions in biology.[186,187] In particular, phosphoryl transfer reac-tions play a crucial role in cellular signaling, energy metabolism as well as protein synthesis, and many other processes. In contrast, sulfuryl transfer re-actions are found in cellular signaling pathways, are involved additionally, in hormone regulation, detoxification, and cellular degradation.[188] The general-ized structure of several phosphate and sulfur esters, as well as that of the biologically important compound ATP, is shown in Fig. 11. Understanding the rigorous mechanism of phosphate and sulfate hydrolysis in solution is fun-damental for sufficiently explaining the mechanism that enzymes use to cata-lyze these reactions, such as DNA polymerase or various phosphatases.

Figure 11. Phosphate and sulfate esters with crucial biological compound ATP (aden-osine triphosphate) as an example.

This chapter focuses on different mechanistic possibilities for these reactions. Then, special attention is given to two physical organic chemistry approaches to characterize transition states, specifically linear free energy relationship (LFER) and kinetic isotope effect (KIE). Finally, the properties of the sulfur

Page 41: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

41

and phosphate ester transitions states, which is the main focus in Paper I, are discussed.

4.1. Mechanisms of phosphate and sulfate ester hydrolysis Hydrolysis of phosphate and sulfate esters can occur via different scenarios and possibilities (Fig. 12). One of the most debated mechanistic questions is whether the phosphoryl transfer occurs via a stepwise or concerted pathway. Therefore, regardless of decades of computational and experimental studies, mechanistic details remain controversial with a strong disagreement between studies.[4,187]

Research on different mechanistic possibilities of phosphate esters started in the early 1950s,[189,190] where it has been observed that hydrolysis rate for protonated phosphate monoesters occurred faster than for monoester dianions. The observation suggested the stepwise dissociative pathway, where the rate-determining step was unimolecular decomposition to a metaphosphate inter-mediate, followed by the rapid addition of water. Later, it was shown that for monoesters with particularly good leaving groups, the dianion hydrolysis was faster than for monoanionic form.[191] Although the experiments supported this finding,[192,193] the following studies suggested that the obtained results were based on the properties of a transition state and not the intermediate state. The difference between them being that the transition state described a local max-imum of the reaction coordinate, whereas the intermediate state lied in a local minimum.[187]

Page 42: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

42

Figure 12. Several possible reaction pathways for phosphate ester hydrolysis. (A) More-O’Ferral-Jencks (MFJ) diagram, which illustrates a hypothetic reaction path-way for the phosphoryl transfer. The reaction can proceed by an associative pathway with the phosphorane intermediate; a dissociative pathway with the metaphosphate intermediate, or a concerted pathway with the ‘tight’, ‘synchronous’, or ‘loose’ tran-sition state. (B) Multiple possible reaction mechanisms for phosphate monoester hy-drolysis: associative (AN+DN), dissociative (DN+AN), concerted (ANDN). Addition-ally, the reaction can be either substrate- or solvent assisted.

Different experimental and theoretical methods have been used to validate all possible scenarios of reaction pathways for phosphate and sulfate esters. As a result of low-lying d-orbitals on the phosphorus atom, the reaction can proceed via stepwise associative (AN + DN; with phosphorane intermediate) or step-wise dissociative (DN + AN; with metaphosphate intermediate) pathways. The nucleophilic attack appears earlier to the leaving group departure (including pentavalent phosphorane intermediate) in the associative pathway. On the other hand, in the dissociative pathway, the leaving group departure appears before the nucleophilic attack. Additionally, the reactions can occur via a con-certed mechanism, with existence again of two possibilities: concerted asso-ciative or concerted dissociative pathways. Multiple studies were performed to verify all possibilities, and there is no clear evidence for the pathways that include the formation of the metaphosphate intermediates, it could not be proven experimentally by stereochemical studies and by KIE and LFER meth-ods.[194,195] In the case of phosphate monoester dianions, experiments[196–198] suggested a concerted dissociative pathway, where computational studies[199–

201] have presented the possibility of both concerted pathways: with a dissoci-ative or associative character of TS depending on the pKa of the leaving group.

Page 43: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

43

It appears that for systems with high pKa of leaving groups, the reaction pro-ceeds via the associative pathway, and gradually gets more dissociative with more acidic leaving groups. However, proper interpretation of the experi-mental results could be problematic as different pathways result in similar ex-perimental observables.[202] In contrast to phosphate monoester dianions, which are generally considered to proceed via concerted pathways, computa-tional studies on phosphate monoester monoanions suggested stepwise disso-ciative and associative mechanisms. Unfortunately, some of the studies seem to be controversial as they were prepared in gas-phase or did not compare dissociative and associative pathways for the same system.[4] Moreover, asso-ciative and dissociative pathways are equally reasonable in the case of methyl phosphate.[203] The additional discussion revealed if substrate-assisted cataly-sis is a viable mechanism for phosphate monoester hydrolysis (what was pro-posed in the 1990s’).[204,205] Thereafter, computational studies on two extreme cases: 4-nitrophenyl and methyl phosphates suggested that the solvent-as-sisted pathway (concerted dissociative pathway) was energetically more pre-ferred in the case of 4-nitrophenyl phosphate in contrast to methyl phosphate where the substrate-assisted pathway (more associative TS) was the energeti-cally choice. The mechanism appeared sensitive to the leaving group pKa, as in previous cases. Furthermore, phosphate diesters and triesters showed a more associative transition state in comparison to monoesters. Phosphate diesters and monoesters are more leaving group dependent than triesters. In the case of the poor leaving group, such as dineopentyl phosphate diester, a stepwise pathway with phosphorane intermediate was observed.[206]

Studies comparing phosphate and sulfate esters (p-nitrophenyl phosphate (pNPP) and p-nitrophenyl sulfate (pNPS)) suggested that both compounds proceed via loose transition states.[207,208] However, in contrast to experimental suggestions, theoretical studies[200,205,209,210] proposed a solvent-assisted mech-anism for pNPS, as implied before, but a more associative, substrate-assisted pathway for pNPP.

Even with extensive computational and experimental studies, the full ex-planation of the nature of transition states and mechanistic details for phos-phate and sulfate esters hydrolysis are still unknown, and more studies com-bining experimental data and theoretical evaluations are needed.

4.2. Linear free energy relationship The reaction mechanisms of organic compounds are often interpreted using linear free energy relationships (LFER) method, which allows analyzing the effect of substitutions of organic compounds on rate constants of the cleavage reaction. It is possible to characterize a transition state of the reaction by mod-ifying the substituents of the structure and measure the effects of this substi-tution on reaction rates and equilibria. When the bond breaks/forms in the TS,

Page 44: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

44

a change in electron density and charge is expected. When electron-withdraw-ing substituents are introduced, they stabilize the development of the negative charge but destabilize the development of positive charge. The opposite effect is observed when electron-donating groups are introduced.

The Hammett equation treats the substitution effect of the organic com-pound on the rate of equilibria and could be expressed as:

log (4.1)

or

log (4.2)

where kX and KX are the rates or equilibrium constant, respectively, and X is the substitution of hydrogen (H), , and are the substituent and reaction con-stants, respectively. The magnitude and the sign of the reaction constant sug-gest whether the negative charge is built ( 1 or not ( 1 .

For the catalytic step, the Gibbs free energy for dissociation of a proton is proportional to the activation energy. If the relation is not linear, it means that the reaction mechanism is not the same for the chosen group of catalysts. The Brønsted correlation of the rate constants with nucleophile pKa ( is used to determine the nature of the transition state. Reactions with low values of have the transition state close to the reactant form with a less extensive proton transfer, and an opposite trend is observed for high values when the TS resem-bles the product form. The Brønsted equation is as follow:

(4.3)

or

log (4.4) LFER can be specified as a correlation between log(k) and log(KEQ), where the slope shows how much charge was built up and how extensively the bond was cleaved/formed in the transition state. Typically, graphs of log(k) or log(KEQ) versus pKa (Brønsted plots) are often shown instead of plots of log(k) and log(KEQ). It is worth mentioning that KEQ values are difficult to obtain. Therefore, the log(k) vs. pKa plots are more convenient to use.

In the case of phosphate esters, the pKa measurements of the nucleophile or the leaving group provide a reference for the charge development in the same way as the equilibrium for cleavage involving the development of a charge -1 on the phenolate.

Page 45: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

45

4.3. Kinetic isotope effects Kinetic isotope effects (KIE) probe the change in bond strength at the transi-tion state for a chemical reaction. To do so, they determine changes in the reaction rates k, which occur if the atom for the concerned compound is sub-stituted with a heavy isotope. From quantum mechanics, all bonds have a char-acteristic minimum vibrational energy state, called the zero-point energy, which is dependent on the bond strength as well as the masses of the atoms involved in the bond. When the mass increases, the zero-point energy de-creases. If the bond breaks, a loss of the vibrational energy is observed, and the difference between the heavy atom and light isotope disappears. When the bonds are partially broken in the TS, the difference between isotopes decreases relative to the ground state. When an isotopic substitution appears in positions of bond cleavage, the term primary isotope effect is introduced. If the bond is not broken in that position, we observe a secondary isotope effect, which is always smaller than the primary isotope effect. Also, the value decreases with distance from the isotopically labeled atom to the reaction site. The dominant vibrational contribution for the primary isotope effect usually comes from the stretching mode (loss or gain).

The substitution can be calculated from the equation: KIE = kL/kH (4.5)

where kL and kH are light and heavy atom isotope rate constants, respectively. Including activation free energy (∆ ‡):

∆ ‡ ∆ ‡

(4.6)

The second equation contains effect of the quantum mechanical nuclear tun-neling included in (∆ ‡). Additionally, the effect of substitution can change the value of the equilibrium constant, but not modify the nature of the equilib-rium. The effect is called the equilibrium isotope effect (EIE), which can be calculated from changes in the equilibrium constants K:

EIE = KL/KH (4.7)

When the value of KL > KH, it is specified as a ‘normal’ isotope effect, in opposite situation, when KH > KL, it is referred to an ‘inverse’ isotope effect.

KIE approach is highly potent as it allows us to obtain information about transition states involved in the reactions. However, it is not sufficient if a non-chemical step is rate limiting. Furthermore, in the case of phosphoryl

Page 46: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

46

compound reactions, the isotope effect does not have exceptional mechanistic interpretation.

4.4. The nature of the transition states in involved in phosphate and sulfate ester hydrolysis (Paper I) Many researchers have focused their interests on insights of enzymatic hy-drolysis of both phosphatase and sulfatase activities,[4,7,187,211,212] where an un-derstanding of non-enzymatic catalysis is not less essential as it gives a knowledge of the nature of the transition states as well as the fundamental chemistry that could be used for better understanding of the enzymatic reac-tions. Nevertheless, scientists focused more on phosphoryl transfer reactions, and sulfate hydrolysis has received less intention. In the case of monoesters, both experimental[207,213–217] and computational[209,210] studies suggested mech-anistic similarities of both sulfate monoester transition states and the phos-phate monoesters, although transition states of sulfate monoesters appeared to be slightly more compact.[210] In contrast to the monoesters, sulfate diester hydrolysis can proceed through C-O or S-O cleavage. It was shown that dial-kyl and aryl alkyl sulfate diesters react by C-O bond break rather than S-O bond cleavage.[218,219] Based on experimental work of Younker and Hengge,[220] diaryl sulfate diesters appeared to involve the S-O bond cleavage with a direct nucleophilic attack on sulfur. That allows us to directly compare these compounds to previously studied phosphate monoesters, aryl phosphate diesters, as well as fluoro-substituted phosphates, pyridinio-N-phosphates, and sulfonate monoesters in order to validate their transition state nature and mechanism pathway. In our work, we performed theoretical calculations of substituted diaryl sulfate diesters hydrolysis previously studied experimen-tally. The studied compounds vary in their leaving group pKa values, which are shown in Table 1.

Table 1. List of the leaving group substitutions on the phenyl group with the respective pKa values.

Leaving group (phenol) Leaving group pKa

4-Cl-3-NO2 7.75 4-NO2 7.15 2,6-F2 7.12 2-F-4-NO2 6.2 2,3,4,5,6-F5 5.33 3-F-4-NO2 5.3

In order to shed light on the transition state nature of different substituents, we used three different functionals: M062X,[53] M11L,[221] and B97X-D[137] as well as solvation models with a various number of explicit water molecules (0

Page 47: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

47

to 8). We demonstrated that the calculated LFERs are dependent on the chosen functional as well as the number of explicit water molecules included in the system, as shown in Fig. 13.

Figure 13. Comparison of experimental and calculated linear free energy relationship for hydrolysis of substituted diaryl sulfate diesters with various solvent molecules (marked with different colors) and (A) M11L, (B) M062X, (C) B97X-D functionals.

In comparison to previously studied compounds, which were mostly charged, diaryl sulfate diesters are neutral structures. Based on this, we observed that the explicit molecules interacted better with the polarized TS and got distorted in the RS and PS. The observation suggested that the water oxygen atoms are better hydrogen bond acceptors than the sulfate oxygen atoms. It explained the differences in the calculated free energies as the hydroxide ion moved from an ideal in-line attack position. We showed that the structures were largely

Page 48: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

48

independent of the number of explicit solvent molecules introduced in the sys-tem. Despite the differences between used methods, we obtained good agree-ment with the experimental values of KIE (Table 2). Our focus was put on reproducing the trends observed experimentally, such as the LFER slope, not on replication of the absolute energies for the reactions.

We confirmed that hydrolysis of the mentioned compounds proceeds through a concerted pathway with loose transition states (concerted with a dis-sociative nature) without the presence of an intermediate, which was sug-gested experimentally. The bond cleavage to the leaving group occurred be-fore the formation of the nucleophilic bond. The nature of transition states was similar to previously studied compounds, however, it appeared to be slightly tighter. The knowledge gained from the observations can increase un-derstanding of the enzymes that catalyze these reactions, mainly the selectivity of the enzymes. Moreover, it put light on challenges in the modeling of sulfate ester hydrolysis.

Table 2. Comparison between experimental[220] and calculated KIE for the 4-nitro-phenyl phenyl sulfate hydrolysis with a different number of explicit water molecules and M11L, B97X-D, M062X functionals.

Waters Experiment M11L ωB97X-D M062X

18kbridge 15k 18kbridge 15k 18kbridge 15k 18kbridge 15k

1.003 ± 0.002 1.0000 ± 0.0005

0 1.003 1.0006 1.006 1.0012 1.005 1.0008

1 1.004 1.0010 1.006 1.0014 1.006 1.0007

2 1.003 1.0005 1.007 1.0009 1.003 1.0010

3 1.004 1.0005 1.007 1.0014 1.004 1.0010

4 1.003 1.0007 1.005 1.0007 1.005 1.0005

5 1.003 1.0005 1.006 1.0010 1.002 1.0008

6 1.006 1.0004 1.006 1.0013 1.003 1.0007

7 1.004 1.0006 1.006 1.0013 1.005 1.0007

8 1.005 1.0006 1.006 1.0013 1.003 1.0007

Page 49: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

49

5. Understanding the mechanism and evolution of enzymes that catalyze phosphoryl and sulfuryl transfer.

Hydrolysis of phosphate esters has an extremely crucial role in biological pro-cesses, such as protein synthesis, energy production or signal transduction, and many others as phosphates are building blocks for DNA and RNA. DNA is incredibly stable compound with extremely slow uncatalyzed hydrolysis rates, on the timescale of millions of years. Enzymes, one of the building blocks of life, speed up reactions with high efficiency. Many of the specialists’ enzymes have promiscuous functions and can catalyze different reactions. One such enzyme is serum paraoxonase 1 (PON1), which is native lactonase with promiscuous esterase activity.

The central role of this chapter is to overview the most relevant studies of the enzyme serum paraoxonase 1, which is an excellent example of organo-phosphatases that evolve from lactonases with high promiscuity to catalyze various of compounds. This part covers Papers II, III, and IV.

5.1. Catalytic mechanism and properties of serum paraoxonase 1 Serum paraoxonases (PONs) are a family of detoxifying lactonases found only in mammals; however, Tawfik et al.[222] identified a similar kind of family in bacteria (PON-like ‘quorum-quenching’ lactonases). Three identified mam-malian PON enzymes (PON1, PON2, PON3) catalyze the same reaction, lac-tone hydrolysis, with different substrates specificities and share around 65% sequence identity. They are encoded by three separate genes on the same chro-mosome 7 (human) or 6 (mouse) and reside primarily in the liver due to their function. PON’s have multifunctional roles in many biochemical pathways such us detoxification of reactive molecules, regulation of cell apoptosis and proliferation, bioactivation of drugs, protection against oxidative damage and lipid peroxidation, contribution to innate immunity.[223] Because of their im-portant functions in many fields, they are considered as ‘moonlighting pro-teins’. PON enzymes are highly promiscuous, and in addition to their lacto-

Page 50: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

50

nase functions, they can successfully catalyze the hydrolysis of organophos-phate (OP), nerve agents, and pesticides. Although the PONs family are lac-tonases, which also promiscuous catalyze organophosphates hydrolysis, both PON2 and PON3 are inefficient catalysts of organophosphate hydrolysis in comparison to PON1.[10,224]

PON1 is well-studied for its protective role against coronary heart diseases (CAD), yet research on enzyme polymorphisms associated with CAD remains controversial. Several studies have shown a connection between PON1 Q192R and L55M polymorphisms and CAD,[225–229] but others disagree with this hypothesis.[230–232] This multitasking enzyme can hydrolyze a variety of substrates like arylesters, thioesters, lactones, thiolactones, organophosphates, carbonates.[10,233,234] The mechanism of promiscuous substrates hydrolysis is still poorly explained, and there are many arguments about essential residues that are crucial for hydrolysis.

5.2. Serum paraoxonase 1 PON1 is a six-bladed -propeller consisting of 354 amino acid residues. It has two calcium ions in the active site, one important for catalysis, the other for the stability of the enzyme (Fig. 14).[235,236] It is synthesized in the liver and then secreted into plasma, where it binds to high-density lipoprotein (HDL), particularly one containing apolipoprotein A-I (ApoA-I) (Fig. 14a).[236–238] PON1 is known to protect HDL and LDL (low-density lipoprotein) from oxi-dation and destroys oxidized lipids of lipoproteins.[236,239–241]

PON1 is widely promiscuous: it catalyzes the hydrolyses of oxons such as paraoxon, diaoxon, and chlorpyrifos; aromatic esters like phenylacetate; aro-matic and aliphatic lactones; cyclic carbonates like dihydrocoumarin, -butyr-olactone, homocysteine thiolactone; nerve agents like sarin, soman, and ta-bun.[223,233,242–247]

PON1 has been shown to use a different set of catalytic residues for native and promiscuous activity (Fig. 16)[241,247–249]. The native lactonase activity is a multi-step reaction with the nucleophilic attack to form a tetrahedral interme-diate, this step is the most likely the rate-limiting step, then the lactone ring is opened, and C-S bond is hydrolyzed. Organophosphate hydrolysis is a one-step hydroxide attack. Nucleophilic activation proceeds through general base catalysis, which is D269 for organophosphate hydrolysis and H115 for lactone hydrolysis.

Page 51: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

51

Figure 14. Overall structure of PON1 (PDB ID: 3SRG[247]). View from (A) a side, and (B) an above of the propeller. Including three helices (H1-H3) at top interacting with the HDL structure. Calcium ions represented as green spheres. Flexible loop (colored in pink), active site residues, and ligand analog (2-hydroxyquinoline; yellow) pre-sented. Adapted by permission from Springer Nature Customer Service Centre GmbH: Nature, Nature Structural & Molecular Biology, Harel et al.[250] Copyright 2004.

Figure 15. Active site residues of serum paraoxonase 1 in complex with 5-thiobutyl butyrolactone (TBBL) substrate (PDB ID:3SRG[247]).

Page 52: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

52

Figure 16. Mechanism of PON1 hydrolysis reaction for organophosphate (A) and lac-tone (B) substrates.

5.3. Catalytic floppiness and the activity of PON1 (Paper II) Many of the mammalian enzymes are membrane-associated, and studies on membrane proteins suggested that lipid membranes affect local structure, dy-namics, and activity of the proteins.[40,251–254] Understanding these interactions is essential to describe catalysis, however, a detailed explanation is unknown. Experimental work has shown that HDL binding is crucial for PON1 stability (inactivation due to calcium loss affects the rate of misfolding), which de-creases more than 350-fold, as does the catalytic efficiency.[237,255] It leads to kcat/KM increase up to 20-fold with lipophilic lactonases and detergent-associ-ated PON1.[237,255] The catalytic efficiency of the detergent-free enzyme was impossible to measure because extensive delipidation causes a complete loss of activity.

Serum paraoxonase 1 contains three helices on top of the structure, out of which two (H1 and H2) were suggested to play a role in the anchoring to HDL (Fig. 14).[250] Truncation of the H1 helix reduces affinity to HDL by ~100-fold, and direct evidence of the role of the H2 is missing.[237] HDL binding stimulates the native lactone activity, but the promiscuous paraoxonase activ-ity is less stimulated. Paper II is a combination of experimental (mutational and kinetic analysis) and computational studies (EVB calculations). We per-formed mutational and structural analyses with computational simulations to validate the role of the HDL bounding to PON1 activity. We analyzed the interactions of the PON1 enzyme with a detergent molecule (DDM: n-do-decyl--D-maltoside), which was also included in the medium used in protein crystallization. The detergent interacted with residues of H1 and H2 helixes, suggesting to have similar hydrophobic nature as the HDL-PON1 structure.

Page 53: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

53

Our analysis implies that binding of detergent molecules and the crystal pack-ing interactions associated with helixes H1 and H2 play a role in limiting the enzyme conformational flexibility and assist in forming the protein crystal. We proposed that HDL binding affects the organization of the active site res-idues and its shape. The in-situ analysis of PON1 and HDL interactions showed a strong HDL stimulation of lactonase activity, when the paraoxonase activity was less affected (Fig. 17). No stimulation was noticed for the H1-truncated PON1 mutant.[237]

Figure 17. The results of activity stimulation performed by in situ expression of PON1 recombinant with the presence of high-density lipoprotein particles. Reprinted from ref. [241] Copyright 2015, with permission from Elsevier.

The positions of hydrophobic residues in the H2 helix, Y190, W914, L198, Y185, and F186, are highly conserved in mammalian PON1s,[222] which is un-usual for surface located residues. The mutational analysis, which was per-formed to validate the role of hydrophobic residues in HDL binding and stim-ulation, demonstrated that single substitutions decrease the HDL-induced sta-bilization and stimulation only to a small degree. However, a double mutant of Y190 and W194 revealed a more significant effect on stimulation and sta-bilization, reducing it by nearly 14-fold. The Y185G mutant presented almost a complete loss of HDL-stimulation, and mutants of F186 showed a decrease in lactonase stimulation and stability. The mutational analysis supported the role of the residues H2 helix in the HDL binding and its stimulation. The res-idues of helix H2 were found to be connected to the active site calcium, and floppy loop via >20Å chain of hydrogen bond interactions showed in Fig. 18. Additional mutational analysis of other residues involved in the hydrogen bond tunnel indicates that these amino acids have a crucial role in stabilization of calcium ion and in-direct in catalysis as the active site becomes unstable due to the loss of hydrogen bonds.

Page 54: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

54

Following the experiments, we performed EVB calculations with diverse models representing different helices conformations due to the unavailability of the crystal structure of the enzyme bound to HDL. The models were the snapshots from the PON1 crystal structure (PDB ID: 3SRG[247]) with most representative normal modes (NM7-NM11; obtained from normal mode anal-ysis, which was performed on ElNémo server[256]) with lipophilic lactone sub-strate- TBBL (5-thiobutyl butyrolactone), and promiscuous substrate- paraoxon (PXN). Detailed information about normal modes is included in Pa-per II. The models did not necessarily represent the structure of PON1 bound to HDL but could provide a reasonable set of starting structures for better un-derstanding H1 and H2 role in binding. All calculations were performed in the Q simulation package[257] with the OPLS-AA[148] force field. The wild type crystal structure with both substrates served as our reference structure. We modeled only the rate-limiting step (the nucleophilic hydroxide attack to form a tetrahedral intermediate) to provide more extended simulations of the differ-ent models and WT structure. The maximal observed displacement of H2 he-lix with normal modes was 4.9Å in comparison to crystal structure. As sug-gested in previous work,[247] we used different mechanisms for lactonase and organophosphate activity, H115 and D269, taking the role as a general base, respectively (Fig. 16). The network of interactions (more than 20Å) between helix H2 (residues H184 and D183) and the active site residues that are in direct contact with the substrate and a floppy loop containing Y71 residue seems to be crucial for effective hydrolysis. It was observed in our calculations and validated by experimental set of mutations. The calculated activation and reaction free energies (Table 3) were different depending on the model used for calculations and provide evidence of the importance of H2 helix for effec-tive catalysis. We observed that the same positioning of residues has a differ-ent effect on the native and promiscuous substrate. However, restraining the active site floppiness has a significant effect on both substrates.

Page 55: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

55

Figure 18. Hydrogen bond network between residues located on helix H2 and cata-lytic calcium (crystal structure with lactone analog 2-hydroxyquinoline 2HQ, PDB ID: 3SRG[247]). Reprinted from ref.[241] Copyright 2015, with permission from Else-vier.

Table 3. Calculated activation and reaction free energies for lactonase and promiscu-ous paraoxonase activities with different initial structures. NM7-NM11 stands for the most representative normal modes of PON1 structure. The calculated energies are av-eraged over 15 EVB simulations per system. ‘H2 initial’ means the initial position of H2 helix in comparison to the crystal structure (in Å units), and ‘H2 after 5 ns’ are averaged RMSD values over 5 ns of equilibration of every system.

System ∆ ‡ ∆ ∆∆ ‡ H2 (initial) H2 (after 5ns)

Lactonase activity NM10 12.2 0.8 3.8 1.4 -2.0 4.36 4.58 0.18 NM11 14.0 1.1 6.7 1.5 -0.2 1.09 1.78 0.15 Crystal structure 14.2 0.7 8.8 1.3 0.0 0.00 1.48 0.14 NM9 16.4 0.4 11.6 0.6 +2.2 1.15 1.98 0.23 NM8 17.8 2.1 11.2 2.0 +3.6 2.39 2.73 0.09 NM7 18.1 0.5 13.7 0.8 +3.9 1.76 2.96 0.29 Paraoxonase activity NM7 14.9 0.5 -11.8 1.0 -1.8 1.76 2.87 0.25 NM8 16.7 0.6 -8.7 1.7 0.0 2.39 2.80 0.16 Crystal structure 16.7 0.7 -7.2 1.0 0.0 0.00 2.14 0.60 NM9 17.4 0.6 -10.4 1.6 +0.7 1.15 1.73 0.13 NM10 22.1 0.7 -1.2 0.9 +4.7 4.36 4.38 0.25 NM11 26.0 0.8 -5.6 4.0 +11.1 1.09 1.69 0.10

Page 56: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

56

Figure 19. Superimposition of PON1 bound 2HQ crystal structure with the PON1 bound PXN model. Reprinted from ref.[241] Copyright 2015, with permission from Elsevier.

The displacement of Y71 makes the active site size favorable for bigger sub-strates such as PXN (Fig. 19). Additional roles of Y71 and the active site loop are discussed in the following Section 5.4. The most optimal mode for lacto-nase activity was the least optimal mode for organophosphate activity (Table 3). Even if we could not reproduce the exact effect of HDL PON1 interactions, we reproduced the critical structural features of the system. We obtained a good agreement with experimental free energy barriers using the crystal struc-tures, with calculated values of 14.2 kcal/mol (∆G‡

exp=14.9 kcal/mol) for TBBL hydrolysis and 16.7 kcal/mol (∆G‡

exp=16.5 kcal/mol) for PXN hydrol-ysis.

5.4. Influence of solvation on PON1 activity (Paper III) Following our previous study, we examined the role of tyrosine-71 and the active site loop, where the residue is located. We already found that tyrosine is responsible for the proper positioning of substrates and helps to keep the loop closed. (Fig. 20).

Page 57: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

57

Figure 20. (A) The overall structure of PON1 with floppy active site loop (residues 70-81) colored in red. (B) Tunnel of interactions between helix H2 and active site with all relevant amino acids. Reprinted (adapted) with permission from ref.[24] Copyright 2017, American Chemical Society.

In the previous study,[241] we observed only a small displacement of Y71 in response to the H2 reposition when PON1 bounds the lactone substrate. When the loop is in a closed position, Y71 points directly towards the active site, which makes it challenging to fit bulky organophosphate substrate even though kinetic data demonstrates that PON1 can hydrolyze it efficiently with kcat/KM = 2149 393 M-1 s-1. The only way for PXN to fit in the active site of PON1 is to push Y71 out. We wanted to validate the role of the loop in that case for promiscuous activity. Following the experimental results for Y71 mu-tagenesis, we performed molecular dynamics equilibration for all systems: wild type PON1 bound to lactone and organophosphate substrates (TBBL and PXN) and structures with tyrosine mutants: Y71F, Y71A, Y71G, Y71M, and Y71W with both substrates. Due to the lack of crystal structures of mutants, we used the wild type crystal (PDB ID: 3SRG[247]), which contains a full loop density, and we introduced point mutations using the Dunbrack library[258] im-plemented in Chimera[259] software. Similar to before, all calculations were performed using the Q-package[257] and the OPLS-AA[148] force field. To val-idate the energetics effects of tyrosine mutations, we performed EVB calcula-tions for equilibrated systems with ten replicas, each differing in initial veloc-ities. In comparison to our previous study, we modeled both steps of the TBBL reaction.

Experimental results were complemented by our calculations (Fig. 21), and they showed that mutations of tyrosine mostly affect promiscuous activity (16% of screened variants of paraoxonase vs. 52% variants of lactonase showed 50% activity compared to WT enzyme). The results agreed with a hypothesis that the native functions of the enzyme are more robust and re-sistant to mutations in contrast to promiscuous functions.

Page 58: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

58

Figure 21. Comparison between experimental (blue) and calculated (green) free en-ergies of hydrolysis of TBBL (A) and PXN (B) for wild type (WT) and Y71 mutants (Y71W, Y71M, Y71F, Y71A, and Y71G). The error bars for calculated energies are shown as green lines. Reprinted (adapted) with permission from ref.[24] Copyright 2017, American Chemical Society.

As can be observed in Fig. 20, Y71 forms a hydrogen bond network with D183 and N168 and indirectly interact with S166 and H184. During our simulations, we observed that binding to PXN significantly reduced the loops flexibility, even if the active site volume was increased (Fig. 22). However, the loop was still highly flexible and could adapt multiple conformations in all enzyme-substrate wild-type and mutant systems (Fig. 23). We did not observe a direct trend between introduced mutations and loop behavior, but there was a visible effect of the overall reorganization of the active site. The most significant change was observed when tyrosine was mutated to phenylalanine, which in-troduced a loss of the OH group, causing the interaction with D183 to disap-pear. In the apo enzyme structure (without substrate), the F71 side chain was positioned to the hydrophobic region below the floppy loop formed by L69, I74, and V346. In systems with both substrates, Y71 was pushed out from the active site, which affected the opening of the flexible loop and changed the conformation of H134 (Fig. 23).

Page 59: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

59

Figure 22. Root mean square fluctuations (RMSF, Å) of Cα atoms calculated over the last 25 ns of the simulations for all systems: wild type (WT) and Y71 mutants shown in different colors. The (A) graph represents the substrate-free systems, (B) enzyme bound to TBBL, and (C) enzyme with PXN substrate. Reprinted (adapted) with per-mission from ref. [24]. Copyright 2017, American Chemical Society.

Page 60: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

60

Figure 23. The main observations of conformational change due to Y71 mutations ((A) Y71F; (B) Y71W; (C) Y71G; (D) Y71M) from molecular dynamics simulations. Apo form is presented in green color, an enzyme with paraoxon substrate in violet, complex with TBBL in salmon. Reprinted (adapted) with permission from ref. [24]. Copyright 2017, American Chemical Society.

The I74, V346, and L69 residues play critical roles in the enzyme’s specificity and functions, and the effects of their mutations were discussed in other stud-ies.[229,245] Mutations of these residues to small amino acids as, i.e., Gly or Ala affected not only the loop dynamics but also the hydrophobic region. A sub-stitution for methionine did not have a strong effect on the hydrophobic region positions; only minimal perturbations were found. The effects suggested that the Y71 mutations had an indirect effect on the whole active site configura-tion, mainly to hydrophobic region residues. We recognized that the interac-tions between N166 and D183, as well as S166 and K192, were disturbed as a consequence of the loss of the hydrogen bonding network. Furthermore, the K192 residue was already known to be important for interactions with HDL,[241] as explained in the previous study.

Page 61: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

61

Figure 24. Solvation effects on the active site in different mutants’ variants for com-plexes with PXN (A and B) and TBBL (C and D). The solvent-accessible region is colored as a gray shadow, and all solvent molecules within 6Å from the reacting center are shown here. Reprinted (adapted) with permission from ref. [24]. Copyright 2017, American Chemical Society.

We observed that the active site loop position and the hydrophobic region were affected by the specific mutation, which was also supported by differ-ences in solvation of the active site. When the loop was closed, it created a hydrophobic cage with only a nucleophilic water molecule present. On the other hand, more solvent molecules were in close proximity to the substrate when the loop opened (Fig. 24). This affected the local electrostatic environ-ment of the active site.

The increased volume of the active site resulted in an increased number of water molecules in the active site. In order to validate the influence of solva-tion on the calculated activation free energies, we applied the linear response approximation to our EVB trajectories to extract solute-solvent interactions (‘so-lute’ is chosen as the reacting atoms) to the complete reorganization energy (as explained in the EVB subsection). This allowed for the assessment of changes in the system reorganization energy simultaneously when moving be-tween EVB states. Higher activation energy correlates with larger reorganiza-tion energy. The solute-solvent reorganization energy was observed to be higher for paraoxonase than lactonase activity.

In our calculations, we examined the role of Y71 and its mutants in the active site stability and catalytic efficiency of PON1. We suggested that the Y71 acts as a hydrophobic gate, preserving the active site loop in the closed

Page 62: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

62

position, and keeping solvent molecules outside of the active site, which in-creases effective hydrolysis for the hydrophobic lactones and neutral organo-phosphates. Additionally, it stabilizes other residues in the positions that al-lows the formation of the H-bond network, which was shown to be essential for substrate binding and catalysis. The substitutions affected floppy loop mo-bility, increasing flexibility, and disturbing position of another crucial residue K192. Mutations of Y71 have a proven effect on the whole enzyme structure, dynamics, and activity. The knowledge could be used for further evolutionary studies because selective hydration can affect promiscuous and native activi-ties in a different way allowing to control the artificial evolution of enzymes.

5.5. An epistatic ratchet in PON1 (Paper IV) The evolutionary trajectories are believed to be irreversible,[260] and reversion of mutations that switch functionality causes loss of both native and new func-tions.[261–264] However, evolutionary transition states, and the sequences which are suitable for both forward and reverse divergence, have been reported.[265] The enzyme trajectories gaining new functions seem to be exceptionally epi-static and much more likely to be irreversible than changing the sequence, although keeping the same function.[266–268] Although non-epistatic trajectories were described[269] with an unknown mechanistic basis. Paper IV is a com-bined experimental and computational work, and it analyses the evolutionary trajectories of PON1. Closely following the experiments, we implemented a mechanistic analysis of newly generated evolutionary trajectories of serum paraoxonase 1 with native (lactonase) and promiscuous (organophosphate) ac-tivities. Both of the trajectories started with a single mutation, H115W. This mutation caused a loss in native activity and increased promiscuous function-ality. The neo-functionalization trajectory increased the organophosphate ac-tivity, while the re-functionalization trajectory restored the lactonase activity. The histidine revertants of the trajectories demonstrated an opposite trend. The revertants of the re-functionalization trajectory were fully active, whereas the revertant of neo-functionalization trajectory lost both functions. To computa-tionally validate mechanistic insights, we performed substrate-free (apo form) Hamiltonian replica exchange molecular dynamics (HREX-MD) simulations with PLUMED 2.3.0 plugin,[270] where we probed the rotamer space accessi-ble for the side chains of residues marked in experiments as crucial – on posi-tion 53 and 115. HREX-MD simulations were performed in GROMACS v. 5.1.4.[271,272] on the wild type PON1 (PDB ID: 3SRG[247]), 5m-OPH-115H crystal (PDB ID: G6MU[248]), and manually generated 5m-OPH-115H/G69L, 5m-OPH-115H/F222S, 5m-OPH-115H/R134H, and 5m-OPH-115H/G69L/R134H mutants. Our calculations were performed with the lipid DDM micelle generated by the Micelle Maker web server and the OPLS-AA force field for consistency with our EVB calculations.

Page 63: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

63

We validated the position of active site residues in all revertant crystals to unravel the mechanistic explanation of the irreversibility and found that both crucial active site residues, H115 and E53, demonstrated diverse positions. Histidine was found in three possible positions, which we named ‘out’ (far from active site calcium and N168), ‘in’ (close to calcium ion and N168), and ‘alternate’ (between mentioned positions). Two E53 positions were observed: one similar to the wild type position (‘on’), another one close to the H115W mutant location (‘off’). They are presented in Fig. 26. Additionally, we noted that the displacement of E53 is correlated with the shift of calcium ion (Fig. 26). In the revertant structure (5m-OPH-115H), histidine 115 appeared in a completely different position than in the wild type (Fig. 25), with a displace-ment of E53’s side chain that could affect in loss of OPH activity, as well as in the failure of regaining the native activity. Different conformation observed for both residues in the 5m-OPH crystal (in comparison to wild-type) can ex-plain why the 5m-OPH trajectory was irreversible. The HREX-MD data sug-gest that the displacements already exist in the wild type of PON1.

Figure 25. Conformational space analysis of (A) H115 and (B) E53 dihedral angles observed in various systems (wild-type PON1, the 5m-OPH mutant, the revertants:

Page 64: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

64

5m-OPH-115H, 5m-OPH-115H/G69L, 5m-OPH-115H/G69L/R134H, 5m-OPH-115H/S222F, 5m-OPH-115H/G69L/R134H, and 5m-OPH-115H/G69L/R134H and S222F) (analysis on results from HREX-MD). Three positions for H115 (‘in’, ‘out’ and ‘alternate’) and two for E53 (‘on’ and ‘off’) were observed, as explained in the text. The figure was reproduced from Ref. [248], by permission of Oxford University Press.

Figure 26. Possible configurations of histidine 115 and glutamine 53 in crystal struc-tures. (A) Overlay of wild type PON1 (blue) with 5m-OPH crystal (orange) and its revertant (5m-OPH-115H; pink). (B) Comparison of the structure of wild type PON1 (pink) and 5m-OPH (brown) structures both bound to lactone substrate (not shown here). The major conformations of histidine 115 (C) and glutamate 53 (D) observed in our studies. (E) Conformational space analysis of the dihedral angles of glutamate 53 and histidine 115 side chains for different structures. The figure was reproduced from Ref.[248], by permission of Oxford University Press.

We performed EVB calculations in order to describe how the reversion of tra-jectory appeared. All calculations were again performed in the Q-package[257] with the OPLS-AA[148] force fields using all variants of mutants with lactone substrate (TBBL). We used wild type crystal structure (PDB ID: 3SRG[247]), and H115W mutant structure (PDB ID: 4HHO[248]) of PON1 expressed in E. coli, and all other mutations were introduced manually using the Richardson rotamer library[273] implemented in Chimera[259] software.

Page 65: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

65

There were two possible suggestions to explain why the activity was re-gaining: 1) E53 could take over and compensate for the loss of H115, or 2) D269 acts as a general base similar to the promiscuous activity. We could reproduce experimental energetics results (Fig. 27) and proved that in almost all revertant structures E53/H115-based mechanism takes over, except the fi-nal round (9H8) where a swap occurred to the D269-based mechanism of the reaction (Fig. 28). The 9H8 variant (round 9) has H134R mutation, and this reversion led H115 to move to non-catalytic ‘out’ conformation.

We concluded that three residues E53, D269 (calcium ligating residues), and H115 appeared to be highly versatile and had a crucial role in the revers-ibility of the PON1 evolutionary trajectories. The 5m-OPH variant, which ap-peared in the neo-functionalization trajectory, showed high organophosphate activity and almost no lactonase activity due to change in reaction mechanism caused mainly by displacement of E53 residue. Introducing histidine back brought both E53 and H115 displacement, causing loss of both promiscuous and native activities. Additionally, our results indicate that serum paraoxonase 1 uses the alternative/promiscuous mechanism with re-functionalization tra-jectories. However, when the histidine residue was reversed, the original mechanism took over a backup D269-based mechanism (excluding last round, 9H8 variant). This indicates that in the case of PON1, the pre-existing mech-anism is more potent to reversion than a newly established mechanism. We speculated that the D269 mechanism was not the intrinsic mechanism, but more probably one which appeared by optimization of PON1’s catalytic effi-ciency that introduced new interactions reinforcing the primary mechanism.

Page 66: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

66

Figure 27. The proposed reaction mechanism for native (A) and promiscuous (B) ac-tivity. The comparison between experimental (red) and calculated (blue) free energies for mutants (C) and revertant (D) structures. The figure was reproduced from Ref.[248] by permission of Oxford University Press.

Figure 28. The energetics from EVB calculations for mutants and revertants with de-tailed information about the mechanism. The figure was reproduced from Ref.[248] by permission of Oxford University Press.

Page 67: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

67

6. Conclusions

Phosphate and sulfate ester hydrolysis are essential biological reactions in-volved in living organisms. Full interpretation of enzymes specificity and cat-alytic efficiency can aid to successfully engineer new enzyme variants, find and improve potential drugs, as well as increase our understanding of how natural selection works in the context of enzyme evolution. Unfortunately, it is a challenging and extremely complex problem, with many unsolved puzzles despite years of investigations. Experimental techniques offer valuable in-sights into enzyme structure and catalytic efficiency. However, they do not provide mechanistic details in the atomistic resolution, which is possible with computational approaches. Recent years in the development of computational methods contribute to more detailed and accurate calculations, which can be used for careful investigation of reaction mechanisms and prediction of pos-sible improvements.

The studies presented here focused on mechanistic insights to non-enzy-matic sulfur diester hydrolysis (Paper I) in purpose to provide a detailed ex-planation of the reaction mechanism of these compounds that could be used for a better understanding of enzymatic catalysis. Computational methods, such as DFT calculations, were addressed to explain the nature of transition states and reaction pathways of diaryl sulfate diesters hydrolysis. We com-pared obtained results to existing data for phosphate and sulfate transfer reac-tions to validate the differences in the reaction mechanisms. Additionally, we showed that the selection of appropriate DFT functionals could be challeng-ing, and proper investigation for the specific problem is needed. Our analysis supported the experimental assumption that hydrolysis of diaryl sulfate diesters proceeds through a concerted mechanism, with slight dissociative na-ture of transition states. These results provide an improved understanding of enzyme substrate selectivity as well as on the chemical role of sulfate esters.

Other significant studies were achieved on enzymatic catalysis of organo-phosphate and lactone hydrolyses. The main goal here was to provide a better understanding of the PON1 promiscuity and catalysis, as well as to enhance knowledge of enzyme evolution. We used the EVB approach to validate the role of interactions between PON1 and HDL in Paper II. We showed that the interactions occur via binding to H1 and H2 helixes, which provide a hydrogen bond network and proper alignment of the crucial catalytic residues. These interactions restrain unproductive degrees of freedom inside the active site, and in this way, improve enzyme catalysis. In Paper III, we investigated the

Page 68: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

68

role of the active site loop on enzyme performance. Results from our calcula-tions demonstrated that the loop with Y71 residue is acting as the hydrophobic gate, which keeps solvent molecules outside of the active site. Mutations of the loop increased the solvation of the active site and reduce enzyme effec-tiveness. In the third enzymatic study (Paper IV), we analyzed the PON1 tra-jectories from directed evolution experiments, which both started with H115W mutation (H115 is proved to act as a general base). One of the trajec-tories (neo-functionalization trajectory) amplified promiscuous activity, and other (re-functionalization trajectory) reconstructed the native activity. The H115 revertants of the first trajectory lost both functions, although revertants of another were completely active. The studies aimed to discover possible changes in the structure and mechanism which provide modifications in the enzyme activity. We showed that positions of two residues (E53, H115) are crucial for reversion of function; however, PON1 has a backup D269 mecha-nism that could be used for both native- lactonase and promiscuous- organo-phosphate activity. These findings are essential for understanding natural and engineered enzyme evolution. We demonstrate that even highly conserved residues may change over long enough evolutionary trajectories.

Studies presented in the thesis provide relevant insight into phosphate and sulfate transfer reactions, which could be used in rational drug design as many prodrugs contain phosphate groups. Further, this work could also help to dis-cover better drugs for the treatment of organophosphate poisoning. Addition-ally, our findings provide fundamental knowledge in understanding enzyme promiscuity and selectivity, and this understanding can be applied to improve our ability to artificially evolve enzymes.

Page 69: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

69

7. Sammanfattning på svenska

Enzymer är biologiska katalysatorer som påskyndar alla biokemiska reakt-ioner som äger rum i celler. Trots deras livsavgörande roller är många frågor om deras funktioner obesvarade. Experimentella tekniker, såsom kinetiska mätningar, spektroskopiska analyser och lägesspecifik mutagenes, ger viktig information om enzymstruktur, aminosyrarester, samt konformation av aktiva ytan. Men forskare söker fortfarande en fullständig förståelse av enzymkata-lys. Att kombinera experiment med beräkningstekniker ger möjligheter till en förklaring av samband mellan struktur och funktion. Beräkningsmodellering är en robust verktygslåda som kan bidra till en grundläggande analys av bio-molekylers reaktivitet och dynamik.

Föreningar som innehåller fosfat- och sulfatgrupper är viktiga för liv. De används som biologisk energikälla (ATP), som signalmolekyler (GTP), ko-enzym, och byggstenar (DNA, RNA). Dessutom kan fosfatestrar användas som insekticider, herbicider, flamskyddsmedel och som kemvapen. Klyvning av fosfatbindningen sker extremt långsamt vid spontan hydrolys, men det är en mycket vanlig reaktion i levande organismer. Fosfataser (enzymer som ka-talyserar klyvning av fosfatbindning) är avgörande för både fysiologisk regle-ring och för vissa allvarliga sjukdomstillstånd som astma, immunsuppression, hjärt-kärlsjukdomar och diabetes.

Att förstå grunderna för fosforyl- och sulfurylöverföringsreaktioner är av-görande för medicinska och biotekniska industrier, för att kunna skapa nya och förbättra befintliga läkemedel, modifiera enzymstrukturer, samt för alt förstå utvecklingen av vissa sjukdomar. Trots årtionden av både experiment och beräkningsstudier är mekanistiska detaljer om dessa reaktioner emellertid fortsatt kontroversiella. Reaktionerna kan ske via flera vägar som involverar mellansteg eller övergångstillstånd. För att lösa detta pussel utförde vi beräk-ningsstudier för att verifiera reaktionsvägen för diarylsulfatdiester-hydrolys (artikel I). Vi föreslår att reaktionen fortgår genom er samordna mekanism med ett löst (något dissociativt) övergångstillstånd.

Serumparaoxonas 1 (PON1) är ett kalciumberoende laktonas, som binder till högdensitetslipoprotein (HDL) med apolipoprotein A-I (ApoA-I). Enzy-met är mycket promiskuöst och katalyserar hydrolys av flera olika föreningar såsom laktoner, aromatiska estrar, oxoner och organofosfater. Vi utförde flera studier av PON1s reaktionsmekanism, dess promiskuitet, interaktioner med HDL och dess evolutionära utveckling. Vi använde EVB-metoden för att va-lidera rollen av interaktioner mellan PON1 och HDL i artikel II. Vi visade att

Page 70: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

70

interaktionerna sker via bindning till -helixarna H1 och H2, vilken resulterar i ett vätebindningsnätverk som leder till korrekt anpassning av de avgörande katalytiska aminosyrorna. Dessa interaktioner begränsar onödig rörelsefrihet på det aktiva stället, och på det sättet förbättras enzymkatalysen. I artikel III undersökte vi rollen av en ögla i proteinstrukturen för enzymfunktion. Resultat från våra beräkningar visar att Y71 fungerar som en hydrofob grind, som ute-stänger vatten molekyler från det aktiva sätet. Mutationer i öglan ökade sol-vatiseringen av den aktiva ytan vilket minskade enzymets effektivitet. I den tredje enzymatiska studien (artikel IV) analyserade vi utifrån riktade evolut-ionsexperiment hur PON1 utvecklades från en införd H115W mutation; H115 har visat sig fungera som en generell bas. Frågeställningen gällde hur evolut-ionen mot nya funktioner skedde. En av utvecklingsvägarna (neo-funktional-iseringsvägen), förstärkte promiskuös aktivitet medan den andra (re-funktion-aliseringsvägen) återskapade den nativa aktiviteten. H115-revertanterna från den vägen förlorade dock båda funktionerna. Studierna syftade till att upp-täcka möjliga förändringar i strukturen och mekanismen som ger modifie-ringar i enzymaktiviteten. Vi visade att positioner för två aminosyror (E53 och H115) är avgörande för en funktion. PON1 har emellertid en säkerhetskopie-rad D269-mekanism som kan användas för både nativ laktonas och promis-kuös organofosfataktivitet. Dessa fynd är viktiga för att förstå naturlig och artificiell enzymutveckling. Vi föreslår att även mycket konserverade ami-nosyror kan förändras givet tillräckligt med tid.

Beräkningsmodellering är en otroligt kraftfull verktygslåda för analys av biomolekylers reaktivitet och dynamik. En av de mest använda programva-rorna i denna avhandling var empirical valence bond (EVB) metoden. Våra modeller kan reproducera experimentella resultat och ge mekanistiska insikter och en bättre förståelse av enzymers roll och proteiners evolution.

Page 71: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

71

8. Acknowledgements

I would like to thank all the people whose assistance had a significant impact on my knowledge, Ph.D. studies, and this thesis.

I wish to express my deepest gratitude to my supervisors Lynn Kamerlin and Mikael Widersten. Thank you, Lynn, for allowing me to work with challeng-ing and fascinating projects, and to meet brilliant scientists from all over the world. I am grateful for your guidance throughout this research. Mikael, I wish to acknowledge the support and feedback during my studies as well as writing the thesis. Your sincerity and motivation have deeply inspired me and help to go through more difficult moments. I am thankful to all collaborators for teaching me scientific thinking and showing the outstanding knowledge in their area. Besides, I would like to thank Kamerlin group members (former and present) for compelling discussions, group meetings full of feedback, and every fika. Fernanda, Alexandra and Chris for introducing me to research projects and Ph.D. Miha, Anna, David, Beat, and Alex for discussions about EVB, which taught me a lot; for bringing questions even for trivial problems that allowed me to look to them from other perspectives. Additionally, for all board games evening and barbecues. Paul, for your exceptional knowledge about Q pro-gram and assistance in calculations and problem-solving. I am glad that you have never gave up even when I had many questions. Dusan, I am grateful for all your support during last years of my Ph.D., and every office discussion (scientific and not scientific as well). Malin, Yash, thank you for your opti-mism. Michal, for the stimulating discussions and showing the light in the darkness. Marina, Sergi, I am grateful for our meetings, RPG sessions, and parties. Catia and Ana Rita, which are always smiling, thank you for your support and introduction to DFT world. Jasmin and Rory, I would like to express my sincere gratitude for your feedback on my thesis and English sup-port. Rory, thank you for all animals, they were super cute! Adrian, Anil, Sebastian for extraordinary thinking in problem-solving solutions and all jokes during lunches. Andrey, thank you for Slavic discussions and giving me a lot information about our culture. Thank you all for fantastic time in the group and being part of my scientific family.

Page 72: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

72

My sincere thanks also go to Huan for showing me the beauty and power of experiments. Zaza, thank you for the accommodation and for introducing me to the Swedish world and ICM. My appreciation to all people that came to our board games evenings for an exciting time. I would like to thank everybody at ICM and Chemistry departments for scientific discussions, amazing sem-inars and Friday fikas. Finally, I would like to acknowledge with gratitude the support and love of my family- my parents, Teresa and Gerard, that always were proud of me and help me to reach unreachable. My husband, Irek, without whom I would never be where I am. You still support me and follow in every decision. This thesis would not have been possible without you.

The cover of this thesis was design using available vector pictures from freepik.com

Page 73: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

73

Bibliography

[1] H. J. Jessen, N. Ahmed, A. Hofer, Org. Biomol. Chem. 2014, 12, 3526–3530. [2] F. H. Westheimer, Science 1987, 235, 1173–1178. [3] D. L. N. M. M. Cox, Lehninger Principles of Biochemistry, W. H. Freeman 2013. [4] S. C. Kamerlin, P. K. Sharma, R. B. Prasad, A. Warshel, Q Rev Biophys 2013,

46, 1–132. [5] G. K. Schroeder, C. Lad, P. Wyman, N. H. Williams, R. Wolfenden, Proc. Natl.

Acad. Sci. U. S. A. 2006, 103, 4052 LP – 4055. [6] S. Adhikari, Enzym. Food Biotechnol., Academic Press, 2019, 711–721. [7] C. M. Timperley, Best Synth. Methods Organophosphorus Chem., Academic

Press 2015, 365–562. [8] E. L. Robb, M. B. Baker, Organophosphate Toxic., Treasure Island 2019. [9] G. Georgiadis, C. Mavridis, C. Belantis, I. E. Zisis, I. Skamagkas, I.

Fragkiadoulaki, I. Heretis, V. Tzortzis, K. Psathakis, A. Tsatsakis, et al., Toxicology 2018, 406–407, 129–136.

[10] O. Khersonsky, D. S. Tawfik, Biochemistry 2005, 44, 6371–6382. [11] T. Bhattacharyya, S. J. Nicholls, E. J. Topol, R. Zhang, X. Yang, D. Schmitt, X.

Fu, M. Shao, D. M. Brennan, S. G. Ellis, et al., JAMA 2008, 299, 1265–1276. [12] T. S. Sigales, G. Uliano, L. Muniz, C. Barros, A. Schneider, S. C. Valle, J.

Pediatr. (Rio. J). 2019. [13] N. Shunmoogam, P. Naidoo, R. Chilton, Vasc. Health Risk Manag. 2018, 14,

137–143. [14] C. E. Furlong, S. M. Suzuki, R. C. Stevens, J. Marsillach, R. J. Richter, G. P.

Jarvik, H. Checkoway, A. Samii, L. G. Costa, A. Griffith, et al., Chem. Biol. Interact. 2010, 187, 355–361.

[15] J. M. Berg, J. L. Tymoczko, L. Stryer, Biochemistry, W.H. Freeman 2012. [16] W. Yang, Q. Rev. Biophys. 2011, 44, 1–93. [17] A. Radzicka, R. Wolfenden, Science 1995, 267, 90–93. [18] A. Payen, J. F. Persoz, Ann Chim 1833, 53, 73–92. [19] Florkin M, Rev. Med. Liege 1957, 12, 139–44. [20] O. K. and D. S. Tawfik, Annu. Rev. Biochem. 2010, 79, 471–505. [21] Y. Yasutake, M. Yao, N. Sakai, T. Kirita, I. Tanaka, J. Mol. Biol. 2004, 344,

325–333. [22] A. Romero-Rivera, M. Garcia-Borras, S. Osuna, ACS Catal. 2017, 7, 8524–8532. [23] A. V Due, J. Kuper, A. Geerlof, J. P. von Kries, M. Wilmanns, Proc. Natl. Acad.

Sci. U. S. A. 2011, 108, 3554–3559. [24] D. Blaha-Nelson, D. M. Kruger, K. Szeler, M. Ben-David, S. C. L. Kamerlin, J.

Am. Chem. Soc. 2017, 139, 1155–1167. [25] D. Petrović, V. A. Risso, S. C. L. Kamerlin, J. M. Sanchez-Ruiz, J. R. Soc.

Interface 2018, 15, 20180330. [26] A. Pabis, V. A. Risso, J. M. Sanchez-Ruiz, S. C. L. Kamerlin, Curr. Opin. Struct.

Biol. 2018, 48, 83–92. [27] J. B. Sumner, J. Biol. Chem. 1926, 69, 435–441.

Page 74: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

74

[28] D. Crowfoot, Nature 1940, 145, 990–991. [29] R. D. Simoni, R. H. Hill, M. Vaughan, J. Biol. Chem. 2002, 277, 23e. [30] D. Ringe, G. A. Petsko, Science 2008, 320, 1428 LP – 1429. [31] L. N. Johnson, D. C. Phillips, Nature 1965, 206, 761–763. [32] D. E. J. Koshland, Nature 2004, 432, 447. [33] T. Einav, L. Mazutis, R. Phillips, J. Phys. Chem. B 2016, 120, 6021–6037. [34] M. C. Childers, V. Daggett, Mol. Syst. Des. Eng. 2017, 2, 9–33. [35] Y. Tu, A. Laaksonen, Comb. Quantum Mech. Mol. Mech. Some Recent

Progresses QM/MM Methods, Academic Press 2010, 1–15. [36] M. M. L. Michaelis L., Biochem. Z. 1913, 49, 333–369. [37] K. A. Johnson, FEBS Lett. 2013, 587, 2753–2766. [38] G. E. Briggs, J. B. Haldane, Biochem. J. 1925, 19, 338–339. [39] V. L. Schramm, Chem. Rev. 2018, 118, 11194–11258. [40] A.-N. Bondar, C. del Val, S. H. White, Structure 2009, 17, 395–405. [41] S. Al-Zuhair, K. B. Ramachandran, M. Farid, M. K. Aroua, P. Vadlani, S.

Ramakrishnan, L. Gardossi, Enzyme Res. 2011, 2011, 658263. [42] E. Fischer, Berichte der Dtsch. Chem. Gesellschaft 1894, 27, 2985–2993. [43] D. E. Koshland, Proc. Natl. Acad. Sci. U. S. A. 1958, 44, 98–104. [44] C. J. Tsai, S. Kumar, B. Ma, R. Nussinov, Protein Sci. 1999, 8, 1181–1190. [45] T. R. Weikl, C. von Deuster, Proteins 2009, 75, 104–110. [46] S. Fieulaine, A. Boularot, I. Artaud, M. Desmadril, F. Dardel, T. Meinnel, C.

Giglione, PLoS Biol. 2011, 9, e1001066. [47] X. Du, Y. Li, Y.-L. Xia, S.-M. Ai, J. Liang, P. Sang, X.-L. Ji, S.-Q. Liu, Int. J.

Mol. Sci. 2016, 17. [48] W. M. Bayliss, Nature 1925, 116, 744. [49] J. H. Quastel, Biochem. J. 1926, 20, 166–194. [50] W. P. Jencks, Adv. Enzymol. Relat. Areas Mol. Biol. 1975, 43, 219–410. [51] A. Warshel, J. Florián, Proc. Natl. Acad. Sci. U. S. A. 1998, 95, 5950–5955. [52] B. Ensing, A. Laio, M. Parrinello, M. L. Klein, J. Phys. Chem. B 2005, 109,

6676–6687. [53] Y. Zhao, D. G. Truhlar, Theor. Chem. Acc. 2008, 120, 215–241. [54] F. Duarte, S. Gronert, S. C. L. Kamerlin, J. Org. Chem. 2014, 79, 1280–1288. [55] L. Pauling, Chem. Eng. News Arch. 1946, 24, 1375–1377. [56] R. Wolfenden, Nature 1969, 223, 704–705. [57] H. Gu, S. Zhang, K.-Y. Wong, B. K. Radak, T. Dissanayake, D. L. Kellerman,

Q. Dai, M. Miyagi, V. E. Anderson, D. M. York, et al., Proc. Natl. Acad. Sci. U. S. A. 2013, 110, 13002–13007.

[58] D. Burschowsky, A. van Eerde, M. Ökvist, A. Kienhöfer, P. Kast, D. Hilvert, U. Krengel, Proc. Natl. Acad. Sci. U. S. A. 2014, 111, 17516–17521.

[59] K. E. Ranaghan, L. Ridder, B. Szefczyk, W. A. Sokalski, J. C. Hermann, A. J. Mulholland, Org. Biomol. Chem. 2004, 2, 968–980.

[60] A. Warshel, P. K. Sharma, M. Kato, Y. Xiang, H. Liu, M. H. M. Olsson, Chem. Rev. 2006, 106, 3210–3235.

[61] C. A. Vernon, Proc. R. Soc. London. Ser. B, Biol. Sci. 1967, 167, 389–401. [62] A. Warshel, M. Levitt, J. Mol. Biol. 1976, 103, 227–249. [63] A. Warshel, Proc. Natl. Acad. Sci. U. S. A. 1978, 75, 5250–5254. [64] A. Warshel, J. Biol. Chem. 1998, 273, 27035–27038. [65] A. T. P. Carvalho, A. Barrozo, D. Doron, A. V. Kilshtain, D. T. Major, S. C. L.

Kamerlin, J. Mol. Graph. Model. 2014, 54, 62–79. [66] T. Zou, V. A. Risso, J. A. Gavira, J. M. Sanchez-Ruiz, S. B. Ozkan, Mol. Biol.

Evol. 2015, 32, 132–143.

Page 75: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

75

[67] N. Doucet, P. Y. Savard, J. N. Pelletier, S. M. Gagne, J. Biol. Chem. 2007, 282, 21448–21459.

[68] K. A. Henzler-Wildman, M. Lei, V. Thai, S. J. Kerns, M. Karplus, D. Kern, Nature 2007, 450, 913–916.

[69] I. Bahar, T. R. Lezon, L.-W. Yang, E. Eyal, Annu. Rev. Biophys. 2010, 39, 23–42.

[70] T. Sikosek, H. S. Chan, J. R. Soc. Interface 2014, 11, 20140419. [71] D. A. Liberles, S. A. Teichmann, I. Bahar, U. Bastolla, J. Bloom, E. Bornberg-

Bauer, L. J. Colwell, A. P. J. de Koning, N. V Dokholyan, J. Echave, et al., Protein Sci. 2012, 21, 769–785.

[72] H. N. Motlagh, J. O. Wrabl, J. Li, V. J. Hilser, Nature 2014, 508, 331–339. [73] S. Osuna, G. Jimenez-Oses, E. L. Noey, K. N. Houk, Acc. Chem. Res. 2015, 48,

1080–1089. [74] U. Olsson, M. Wolf-Watz, Nat. Commun. 2010, 1, 111. [75] C. Schulenburg, D. Hilvert, Top. Curr. Chem. 2013, 337, 41–67. [76] O. Khersonsky, C. Roodveldt, D. S. Tawfik, Curr. Opin. Chem. Biol. 2006, 10,

498–508. [77] L. C. James, D. S. Tawfik, Trends Biochem. Sci. 2003, 28, 361–368. [78] M. Schulte, D. Petrovic, P. Neudecker, R. Hartmann, J. Pietruszka, S. Willbold,

D. Willbold, V. Panwalkar, ACS Catal. 2018, 8, 3971–3984. [79] H. Ma, K. Szeler, S. C. L. Kamerlin, M. Widersten, Chem. Sci. 2016, 7, 1415–

1421. [80] J. R. Schnell, H. J. Dyson, P. E. Wright, Annu. Rev. Biophys. Biomol. Struct.

2004, 33, 119–140. [81] A. Haouz, C. Twist, C. Zentz, P. Tauc, B. Alpert, Eur. Biophys. J. 1998, 27, 19–

25. [82] O. Fisette, S. Gagne, P. Lague, Biophys. J. 2012, 103, 1790–1801. [83] B. E. Clifton, J. A. Kaczmarski, P. D. Carr, M. L. Gerth, N. Tokuriki, C. J.

Jackson, Nat. Chem. Biol. 2018, 14, 542–547. [84] A. T. Brunger, C. L. 3rd Brooks, M. Karplus, Proc. Natl. Acad. Sci. U. S. A.

1985, 82, 8458–8462. [85] A. Merlino, L. Vitagliano, M. A. Ceruso, A. Di Nola, L. Mazzarella, Biopolymers

2002, 65, 274–283. [86] H. Li, X.-Q. Yao, B. J. Grant, PLoS Comput. Biol. 2018, 14, e1006364. [87] D. Spiering, L. Hodgson, Cell Adh. Migr. 2011, 5, 170–180. [88] D. D. Boehr, R. N. D’Amico, K. F. O’Rourke, Protein Sci. 2018, 27, 825–838. [89] D. Mazumder-Shivakumar, T. C. Bruice, Proc. Natl. Acad. Sci. U. S. A. 2004,

101, 14379–14384. [90] M. Korbus, S. Backe, F.-J. Meyer-Almes, J. Mol. Recognit. 2015, 28, 201–209. [91] E. M. Flynn, J. A. Hanson, T. Alber, H. Yang, J. Am. Chem. Soc. 2010, 132,

4772–4780. [92] S. Y. Lam, R. C. Y. Yeung, T.-H. Yu, K.-H. Sze, K.-B. Wong, PLoS Biol. 2011,

9, e1001027. [93] P. J. O’Brien, D. Herschlag, Chem. Biol. 1999, 6, R91–R105. [94] S. D. Copley, Curr. Opin. Chem. Biol. 2003, 7, 265–272. [95] K. Hult, P. Berglund, Curr. Opin. Biotechnol. 2003, 14, 395–400. [96] U. T. Bornscheuer, R. J. Kazlauskas, Angew. Chemie - Int. Ed. 2004, 43, 6032–

6040. [97] P. B. and S. Park, Curr. Org. Chem. 2005, 9, 325–336. [98] A. Aharoni, L. Gaidukov, O. Khersonsky, S. M. Gould, C. Roodveldt, D. S.

Tawfik, Nat. Genet. 2005, 37, 73–76.

Page 76: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

76

[99] W. S. Yew, J. Akana, E. L. Wise, I. Rayment, J. A. Gerlt, Biochemistry 2005, 44, 1807–1815.

[100] J. B. Thoden, E. A. Taylor Ringia, J. B. Garrett, J. A. Gerlt, H. M. Holden, I. Rayment, Biochemistry 2004, 43, 5716–5727.

[101] K. Hult, P. Berglund, Trends Biotechnol. 2007, 25, 231-238. [102] P. Rona, R. Ammon, H. Trurnit, Biochem. Z. 1931, 247, 113–145. [103] J. A. Gerlt, P. C. Babbitt, I. Rayment, Arch. Biochem. Biophys. 2005, 433, 59–

70. [104] A. Pabis, S. C. L. Kamerlin, Curr. Opin. Struct. Biol. 2016, 37, 14–21. [105] S. Lund, T. Courtney, G. J. Williams, ChemBioChem 2019, 20, 2217–2221. [106] R. D. Gupta, Sustain. Chem. Process. 2016, 4, 2. [107] R. A. Jensen, Annu. Rev. Microbiol. 1976, 30, 409–425. [108] M. Ycas, J. Theor. Biol. 1974, 44, 145–160. [109] M. S. Newton, V. L. Arcus, M. L. Gerth, W. M. Patrick, Curr. Opin. Struct.

Biol. 2018, 48, 110–116. [110] S. W. Soukup, Teratology 1974, 9, 250–251. [111] L. C. James, D. S. Tawfik, Protein Sci. 2001, 10, 2600–2607. [112] E. Campbell, M. Kaltenbach, G. J. Correy, P. D. Carr, B. T. Porebski, E. K.

Livingstone, L. Afriat-Jurnou, A. M. Buckle, M. Weik, F. Hollfelder, et al., Nat. Chem. Biol. 2016, 12, 944–950.

[113] G. Bhabha, D. C. Ekiert, M. Jennewein, C. M. Zmasek, L. M. Tuttle, G. Kroon, H. J. Dyson, A. Godzik, I. A. Wilson, P. E. Wright, Nat. Struct. Mol. Biol. 2013, 20, 1243–1249.

[114] C. J. Jackson, J.-L. Foo, N. Tokuriki, L. Afriat, P. D. Carr, H.-K. Kim, G. Schenk, D. S. Tawfik, D. L. Ollis, Proc. Natl. Acad. Sci. U. S. A. 2009, 106, 21631–21636.

[115] J. P. Klinman, A. Kohen, J. Biol. Chem. 2014, 289, 30205–30212. [116] P. A. M. Dirac, Proc. R. Soc. London 1929, A123, 714–733. [117] L. Piela, Ideas of Quantum Chemistry, Elsevier, 2007. [118] M. J. Klein, Arch. Hist. Exact Sci. 1961, 1, 459–479. [119] N. Bohr, London, Edinburgh, Dublin Philos. Mag. J. Sci. 1913, 26, 1–25. [120] E. Rutherford, London, Edinburgh, Dublin Philos. Mag. J. Sci. 1911, 21, 669–

688. [121] A. Aspect, J. Villain, Comptes Rendus Phys. 2017, 18, 583–585. [122] R. G. Parr, in Horizons Quantum Chem., Springer 1980, 5–15. [123] Richard P. Feynman 1918-1988, The Feynman Lectures on Physics, Addison-

Wesley Pub. Co., C1963-1965. [124] E. Merzbachen, Quantum Mechanics, Wiley 1998. [125] F. Jensen, Introduction to Computational Chemistry, John Wiley & Sons, Inc.

2006. [126] A. R. Leach, Molecular Modelling : Principles and Applications, Prentice Hall

2001. [127] R. Pagni, J. Chem. Educ. 2006, 83, 387. [128] D. R. Hartree, Math. Proc. Cambridge Philos. Soc. 1928, 24, 89–110. [129] N. Mardirossian, M. Head-Gordon, Mol. Phys. 2017, 115, 2315–2372. [130] P. Hohenberg, W. Kohn, Phys. Rev. 1964, 136, B864–B871. [131] T. van Mourik, M. Bühl, M.-P. Gaigeot, Philos. Trans. A. Math. Phys. Eng.

Sci. 2014, 372, 20120488. [132] W. Kohn, L. J. Sham, Phys. Rev. 1965, 140, A1133–A1138. [133] J. P. Perdew, K. Burke, M. Ernzerhof, Phys. Rev. Lett. 1996, 77, 3865–3868. [134] J. P. Perdew, L. Constantin, Phys. Rev. B 2007, 75.

Page 77: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

77

[135] J. P. Perdew, V. N. Staroverov, J. Tao, G. E. Scuseria, Phys. Rev. A 2008, 78, 52513.

[136] F. Liu, J. Kong, J. Chem. Theory Comput. 2017, 13, 2571–2580. [137] J.-D. Chai, M. Head-Gordon, Phys. Chem. Chem. Phys. 2008, 10, 6615–6620. [138] R. Peverati, D. G. Truhlar, J. Phys. Chem. Lett. 2011, 2, 2810–2817. [139] N. Mardirossian, M. Head-Gordon, J. Chem. Phys. 2016, 145, 186101. [140] H. Arı, Z. Büyükmumcu, Comput. Mater. Sci. 2017, 138, 70–76. [141] K. M. Flurchick, Math. Comput. Appl. 2015, 20, 111–120. [142] H. Hirao, J. Phys. Chem. A 2011, 115, 9308–9313. [143] L. Monticelli, E. Salonen, Biomolecular Simulations: Methods and Protocols,

Humana Press 2013. [144] R. A. Latour, Colloids Surfaces B Biointerfaces 2014, 124, 25–37. [145] K. Lindorff-Larsen, S. Piana, K. Palmo, P. Maragakis, J. L. Klepeis, R. O.

Dror, D. E. Shaw, Proteins 2010, 78, 1950–1958. [146] J. A. Maier, C. Martinez, K. Kasavajhala, L. Wickstrom, K. E. Hauser, C.

Simmerling, J. Chem. Theory Comput. 2015, 11, 3696–3713. [147] J. Huang, S. Rauscher, G. Nawrocki, T. Ran, M. Feig, B. L. de Groot, H.

Grubmüller, A. D. MacKerell Jr, Nat. Methods 2017, 14, 71–73. [148] W. L. Jorgensen, J. Tirado-Rives, J. Am. Chem. Soc. 1988, 110, 1657–1666. [149] Y. I. Yang, Q. Shao, J. Zhang, L. Yang, Y. Q. Gao, J. Chem. Phys. 2019, 151,

70902. [150] R. Elber, J. Chem. Phys. 2016, 144, 60901. [151] Y. Shibuta, S. Sakane, E. Miyoshi, S. Okita, T. Takaki, M. Ohno, Nat.

Commun. 2017, 8, 10. [152] P. E. M. Lopes, O. Guvench, A. D. MacKerell Jr, Methods Mol. Biol. 2015,

1215, 47–71. [153] T. P. Senftle, S. Hong, M. M. Islam, S. B. Kylasa, Y. Zheng, Y. K. Shin, C.

Junkermeier, R. Engel-Herbert, M. J. Janik, H. M. Aktulga, et al., npj Comput. Mater. 2016, 2, 15011.

[154] C. Chipot, A. Pohorille, Free Energy Calculations: Theory and Applications in Chemistry and Biology Springer Series in Chemical Physics, Springer 2007.

[155] C. D. Christ, T. Fox, J. Chem. Inf. Model. 2014, 54, 108–120. [156] P. Mikulskis, S. Genheden, U. Ryde, J. Chem. Inf. Model. 2014, 54, 2794–

2806. [157] L. Wang, Y. Wu, Y. Deng, B. Kim, L. Pierce, G. Krilov, D. Lupyan, S.

Robinson, M. K. Dahlgren, J. Greenwood, et al., J. Am. Chem. Soc. 2015, 137, 2695–2703.

[158] E. Harder, W. Damm, J. Maple, C. Wu, M. Reboul, J. Y. Xiang, L. Wang, D. Lupyan, M. K. Dahlgren, J. L. Knight, et al., J. Chem. Theory Comput. 2016, 12, 281–296.

[159] R. W. Zwanzig, J. Chem. Phys. 1954, 22, 1420–1426. [160] A. Elofsson, L. Nilsson, J. Mol. Biol. 1993, 233, 766–780. [161] L. S. Caves, J. D. Evanseck, M. Karplus, Protein Sci. 1998, 7, 649–666. [162] M. Lawrenz, R. Baron, J. A. McCammon, J. Chem. Theory Comput. 2009, 5,

1106–1116. [163] A. Romero-Rivera, M. Garcia-Borràs, S. Osuna, Chem. Commun. 2017, 53,

284–297. [164] M. J. Field, P. A. Bash, M. Karplus, J. Comput. Chem. 1990, 11, 700–733. [165] H. M. Senn, W. Thiel, Angew. Chem. Int. Ed. Engl. 2009, 48, 1198–1229. [166] M. Svensson, S. Humbel, R. D. J. Froese, T. Matsubara, S. Sieber, K.

Morokuma, J. Phys. Chem. 1996, 100, 19357–19363. [167] The Nobel Prize in Chemistry 2013, 2013.

Page 78: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

78

[168] F. Duarte, S. C. L. Kamerlin, Theory and Applications of the Empirical Valence Bond Approach: From Physical Chemistry to Chemical Biology, John Wiley & Sons 2017.

[169] S. C. L. Kamerlin, A. Warshel, Wiley Interdiscip. Rev. Comput. Mol. Sci. 2011, 1, 30–45.

[170] M. Purg, S. C. L. Kamerlin, Methods Enzymol. 2018, 607, 3–51. [171] M. H. M. Olsson, W. W. Parson, A. Warshel, Chem. Rev. 2006, 106, 1737–

1756. [172] I. Feierberg, V. Luzhkov, J. Åqvist, J. Biol. Chem. 2000, 275, 22657–22662. [173] E. Hatcher, A. V Soudackov, S. Hammes-Schiffer, J. Am. Chem. Soc. 2007,

129, 187–196. [174] S. C. L. Kamerlin, J. Mavri, A. Warshel, FEBS Lett. 2010, 584, 2759–2766. [175] R. Sullivan, Introduction to Data Mining for the Life Sciences, Humana Press

2012. [176] P.-N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, Addison

Wesley Longman Publishing Co. 2006. [177] A. Rajan, P. L. Freddolino, K. Schulten, PLoS One 2010, 5, e9890–e9890. [178] W. A. Barbakh, Y. Wu, C. Fyfe, Stud. Comput. Intell. 2009, 249, 1–234. [179] K. Pearson, London, Edinburgh, Dublin Philos. Mag. J. Sci. 1901, 2, 559–572. [180] H. Hotelling, J. Educ. Psychol. 1933, 24, 417–441. [181] A. Amadei, A. B. Linssen, H. J. Berendsen, Proteins 1993, 17, 412–425. [182] C. C. David, D. J. Jacobs, Methods Mol. Biol. 2014, 1084, 193–226. [183] H. Abdi, L. J. Williams, Wiley Interdiscip. Rev. Comput. Stat. 2010, 2, 433–

459. [184] I. Jolliffe, Principal Component Analysis., Wiley 2005. [185] S. Haider, G. N. Parkinson, S. Neidle, Biophys. J. 2008, 95, 296–311. [186] C. Lad, N. H. Williams, R. Wolfenden, Proc. Natl. Acad. Sci. U. S. A. 2003,

100, 5607 LP – 5610. [187] J. K. Lassila, J. G. Zalatan, D. Herschlag, Annu. Rev. Biochem. 2011, 80, 669–

702. [188] S. R. Hanson, M. D. Best, C.-H. Wong, Angew. Chem. Int. Ed. Engl. 2004, 43,

5736–5763. [189] W. W. Butcher, F. H. Westheimer, J. Am. Chem. Soc. 1955, 77, 2420–2424. [190] J. Kumamoto, F. H. Westheimer, J. Am. Chem. Soc. 1955, 77, 2515–2518. [191] G. Di Sabato, W. P. Jencks, J. Am. Chem. Soc. 1961, 83, 4400–4405. [192] A. J. Kirby, A. G. Varvoglis, J. Am. Chem. Soc. 1967, 89, 415–423. [193] G. Di Sabato, W. P. Jencks, J. Am. Chem. Soc. 1961, 83, 4400–4405. [194] S. L. Buchwald, J. M. Friedman, J. R. Knowles, J. Am. Chem. Soc. 1984, 106,

4911–4916. [195] S. L. Buchwald, D. E. Hansen, A. Hassett, J. R. Knowles, Methods Enzymol.

1982, 87, 279–301. [196] A. J. Kirby, W. P. Jencks, J. Am. Chem. Soc. 1965, 87, 3209–3216. [197] A. C. Hengge, W. A. Edens, H. Elsing, J. Am. Chem. Soc. 1994, 116, 5045–

5049. [198] A. J. Kirby, A. G. Varvoglis, J. Am. Chem. Soc. 1967, 89, 415–423. [199] J. Florián, A. Warshel, J. Phys. Chem. B 1998, 102, 719–734. [200] M. Klähn, E. Rosta, A. Warshel, J. Am. Chem. Soc. 2006, 128, 15310–15323. [201] N. Iche-Tarrat, M. Ruiz-Lopez, J.-C. Barthelat, A. Vigroux, Chemistry 2007,

13, 3617–3629. [202] J. Aqvist, K. Kolmodin, J. Florian, A. Warshel, Chem. Biol. 1999, 6, R71-80. [203] S. C. L. Kamerlin, J. Wilkie, Org. Biomol. Chem. 2011, 9, 5394–5406. [204] T. Schweins, R. Langen, A. Warshel, Nat. Struct. Biol. 1994, 1, 476–484.

Page 79: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

79

[205] J. Florián, A. Warshel, J. Am. Chem. Soc. 1997, 119, 5473–5474. [206] S. C. L. Kamerlin, J. Florián, A. Warshel, ChemPhysChem 2008, 9, 1767–

1773. [207] R. H. Hoff, P. Larsen, A. C. Hengge, J. Am. Chem. Soc. 2001, 123, 9338–9344. [208] A. C. Hengge, Acc. Chem. Res. 2002, 35, 105–112. [209] S. C. L. Kamerlin, J. Org. Chem. 2011, 76, 9228–9238. [210] F. Duarte, J. Åqvist, N. H. Williams, S. C. L. Kamerlin, J. Am. Chem. Soc.

2015, 137, 1081–1093. [211] M. F. Mohamed, F. Hollfelder, Biochim Biophys Acta 2013, 1834, 417–424. [212] A. Pabis, F. Duarte, S. C. Kamerlin, Biochemistry 2016, 55, 3061–3081. [213] E. J. Fendler, J. H. Fendler, J. Org. Chem. 1968, 33, 3852–3859. [214] S. J. Benkovic, P. A. Benkovic, J. Am. Chem. Soc. 1966, 88, 5504–5511. [215] A. Hopkins, R. A. Day, A. Williams, J. Am. Chem. Soc. 1983, 105, 6062–6070. [216] N. Bourne, A. Hopkins, A. Williams, J. Am. Chem. Soc. 1985, 107, 4327–

4331. [217] B. T. Burlingham, L. M. Pratt, E. R. Davidson, V. J. Shiner, J. Fong, T. S.

Widlanski, J. Am. Chem. Soc. 2003, 125, 13036–13037. [218] E. Buncel, C. Chuaqui, J. Org. Chem. 1980, 45, 2825–2830. [219] E. Buncel, A. Raoult, J. F. Wiltshire, J. Am. Chem. Soc. 1973, 95, 799–802. [220] J. M. Younker, A. C. Hengge, J. Org. Chem. 2004, 69, 9043–9048. [221] R. Peverati, D. G. Truhlar, J. Phys. Chem. Lett. 2012, 3, 117–124. [222] H. Bar-Rogovsky, A. Hugenmatter, D. S. Tawfik, J. Biol. Chem. 2013, 288,

23914–23927. [223] J. J. Ceron, F. Tecles, A. Tvarijonaviciute, BMC Vet. Res. 2014, 10, 74. [224] D. I. Draganov, J. F. Teiber, A. Speelman, Y. Osawa, R. Sunahara, B. N. La

Du, J. Lipid Res. 2005, 46, 1239–1247. [225] J. Ruiz, H. Blanche, R. W. James, M. C. Garin, C. Vaisse, G. Charpentier, N.

Cohen, A. Morabia, P. Passa, P. Froguel, Lancet (London, England) 1995, 346, 869–872.

[226] H. Schmidt, R. Schmidt, K. Niederkorn, A. Gradert, M. Schumacher, N. Watzinger, H. P. Hartung, G. M. Kostner, Stroke 1998, 29, 2043–2048.

[227] J. G. Wheeler, B. D. Keavney, H. Watkins, R. Collins, J. Danesh, Lancet (London, England) 2004, 363, 689–695.

[228] B. Mackness, G. K. Davies, W. Turkie, E. Lee, D. H. Roberts, E. Hill, C. Roberts, P. N. Durrington, M. I. Mackness, Arterioscler. Thromb. Vasc. Biol. 2001, 21, 1451–1457.

[229] N. Gupta, S. Singh, V. N. Maturu, Y. P. Sharma, K. D. Gill, PLoS One 2011, 6, e17805.

[230] M. Antikainen, S. Murtomaki, M. Syvanne, R. Pahlman, E. Tahvanainen, M. Jauhiainen, M. H. Frick, C. Ehnholm, J. Clin. Invest. 1996, 98, 883–885.

[231] M. Arca, D. Ombres, A. Montali, F. Campagna, E. Mangieri, G. Tanzilli, P. P. Campa, G. Ricci, R. Verna, G. Pannitteri, Eur. J. Clin. Invest. 2002, 32, 9–15.

[232] D. Ombres, G. Pannitteri, A. Montali, A. Candeloro, F. Seccareccia, F. Campagna, R. Cantini, P. P. Campa, G. Ricci, M. Arca, Arterioscler. Thromb. Vasc. Biol. 1998, 18, 1611–1616.

[233] S. Billecke, D. Draganov, R. Counsell, P. Stetson, C. Watson, C. Hsu, B. N. La Du, Drug Metab. Dispos. 2000, 28, 1335–1342.

[234] L. G. Costa, T. B. Cole, G. P. Jarvik, C. E. Furlong, Annu. Rev. Med. 2003, 54, 371–392.

[235] C. L. Kuo, B. N. La Du, Drug Metab. Dispos. 1998, 26, 653–660. [236] D. I. Draganov, B. N. La Du, Naunyn. Schmiedebergs. Arch. Pharmacol. 2004,

369, 78–88.

Page 80: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

80

[237] L. Gaidukov, D. S. Tawfik, Biochemistry 2005, 44, 11843–11854. [238] M. C. Blatter, R. W. James, S. Messmer, F. Barja, D. Pometta, Eur. J. Biochem.

1993, 211, 871–879. [239] S. D. Nguyen, N. D. Hung, P. Cheon-Ho, K. M. Ree, S. Dai-Eun, Biochim.

Biophys. Acta 2009, 1790, 155–160. [240] M. Aviram, Free Radic. Biol. Med. 2004, 37, 1301–1303. [241] M. Ben-David, J. L. Sussman, C. I. Maxwell, K. Szeler, S. C. L. Kamerlin, D.

S. Tawfik, J. Mol. Biol. 2015, 427, 1359–1374. [242] H. G. Davies, R. J. Richter, M. Keifer, C. A. Broomfield, J. Sowalla, C. E.

Furlong, Nat. Genet. 1996, 14, 334–336. [243] O. Khersonsky, D. S. Tawfik, Chembiochem 2006, 7, 49–53. [244] H. Jakubowski, J. Biol. Chem. 2000, 275, 3957–3962. [245] G. Amitai, L. Gaidukov, R. Adani, S. Yishay, G. Yacov, M. Kushnir, S.

Teitlboim, M. Lindenbaum, P. Bel, O. Khersonsky, et al., FEBS J. 2006, 273, 1906–1919.

[246] M. Goldsmith, Y. Ashani, Y. Simo, M. Ben-David, H. Leader, I. Silman, J. L. Sussman, D. S. Tawfik, Chem. Biol. 2012, 19, 456–466.

[247] M. Ben-David, M. Elias, J.-J. Filippi, E. Dunach, I. Silman, J. L. Sussman, D. S. Tawfik, J. Mol. Biol. 2012, 418, 181–196.

[248] M. Ben-David, M. Soskine, A. Dubovetskyi, K.-P. Cherukuri, O. Dym, J. L. Sussman, Q. Liao, K. Szeler, S. C. L. Kamerlin, D. S. Tawfik, Mol. Biol. Evol. 2019, 37, 4, 1133-1147.

[249] G. Aggarwal, R. Prajapati, R. K. Tripathy, P. Bajaj, A. R. S. Iyengar, A. T. Sangamwar, A. H. Pande, PLoS One 2016, 11, e0147999.

[250] M. Harel, A. Aharoni, L. Gaidukov, B. Brumshtein, O. Khersonsky, R. Meged, H. Dvir, R. Ravelli, A. Mccarthy, L. Toker, et al., Nat. Struct. Mol. Biol. 2004, 11, 412–419.

[251] S. Urban, M. S. Wolfe, Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 1883–1888. [252] L. Kalvodova, N. Kahya, P. Schwille, R. Ehehalt, P. Verkade, D. Drechsel, K.

Simons, J. Biol. Chem. 2005, 280, 36815–36823. [253] A.-N. Bondar, C. del Val, J. A. Freites, D. J. Tobias, S. H. White, Structure

2010, 18, 847–857. [254] E. Jardon-Valadez, A.-N. Bondar, D. J. Tobias, Biophys. J. 2010, 99, 2200–

2207. [255] L. Gaidukov, R. I. Viji, S. Yacobson, M. Rosenblat, M. Aviram, D. S. Tawfik,

Biochemistry 2010, 49, 532–538. [256] K. Suhre, Y.-H. Sanejouand, Nucleic Acids Res. 2004, 32, W610-4. [257] J. Marelius, K. Kolmodin, I. Feierberg, J. Aqvist, J. Mol. Graph. Model. 1998,

16, 213-225,261. [258] M. V Shapovalov, R. L. Dunbrack Jr, Structure 2011, 19, 844–858. [259] E. F. Pettersen, T. D. Goddard, C. C. Huang, G. S. Couch, D. M. Greenblatt,

E. C. Meng, T. E. Ferrin, J. Comput. Chem. 2004, 25, 1605–1612. [260] F. Liu, E. Proynov, J.-G. Yu, T. R. Furlani, J. Kong, J. Chem. Phys. 2012, 137,

114104. [261] O. Soylemez, F. A. Kondrashov, Genome Biol. Evol. 2012, 4, 1213–1222. [262] M. J. Harms, J. W. Thornton, Nat. Rev. Genet. 2013, 14, 559–571. [263] T. N. Starr, J. W. Thornton, Protein Sci. 2016, 25, 1204–1218. [264] J. Cheema, J. A. Faraldos, P. E. O’Maille, Plant Sci. 2017, 255, 29–38. [265] A. Wellner, M. Raitses Gurevich, D. S. Tawfik, PLoS Genet. 2013, 9,

e1003665. [266] J. T. Bridgham, E. A. Ortlund, J. W. Thornton, Nature 2009, 461, 515–519. [267] A. Gupta, C. Adami, PLoS Genet. 2016, 12, e1005960.

Page 81: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

81

[268] N. C. Wu, A. J. Thompson, J. Xie, C.-W. Lin, C. M. Nycholat, X. Zhu, R. A. Lerner, J. C. Paulson, I. A. Wilson, Nat. Commun. 2018, 9, 1264.

[269] M. Kaltenbach, J. R. Burke, M. Dindo, A. Pabis, F. S. Munsberg, A. Rabin, S. C. L. Kamerlin, J. P. Noel, D. S. Tawfik, Nat. Chem. Biol. 2018, 14, 548–555.

[270] G. A. Tribello, M. Bonomi, D. Branduardi, C. Camilloni, G. Bussi, Comput. Phys. Commun. 2014, 185, 604–613.

[271] D. Van Der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark, H. J. C. Berendsen, J. Comput. Chem. 2005, 26, 1701–1718.

[272] M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E. Lindahl, SoftwareX 2015, 1–2, 19–25.

[273] S. C. Lovell, J. M. Word, J. S. Richardson, D. C. Richardson, Proteins 2000, 40, 389–408.

Page 82: Computational Protein Evolution - DiVA portaluu.diva-portal.org/smash/get/diva2:1416765/FULLTEXT01.pdf · mechanistic insights and a better understanding of the enzymes role and its

Acta Universitatis UpsaliensisDigital Comprehensive Summaries of Uppsala Dissertationsfrom the Faculty of Science and Technology 1922

Editor: The Dean of the Faculty of Science and Technology

A doctoral dissertation from the Faculty of Science andTechnology, Uppsala University, is usually a summary of anumber of papers. A few copies of the complete dissertationare kept at major Swedish research libraries, while thesummary alone is distributed internationally throughthe series Digital Comprehensive Summaries of UppsalaDissertations from the Faculty of Science and Technology.(Prior to January, 2005, the series was published under thetitle “Comprehensive Summaries of Uppsala Dissertationsfrom the Faculty of Science and Technology”.)

Distribution: publications.uu.seurn:nbn:se:uu:diva-407494

ACTAUNIVERSITATIS

UPSALIENSISUPPSALA

2020