Chapter 1

DIRECTED EVOLUTION

(Overall Strategies and Methods for Improved Enzymatic Performance)

1.1. Introduction

Enzymes are the product of biological evolution, a process that has taken millions of years.
They are among the most remarkable biomolecules known because of their extraordinary
specificity and catalytic power, which are far greater than those of man-made catalysts.
However, because they are adapted to their physiological roles, the activity and stability
of naturally occurring enzymes are often far from what organic chemists, biochemists
and biotechnologists need. This is particularly true for the stability of enzymes in organic
solvents and for reactions that require high selectivity and ultimately yield industrially
important compounds.

It is widely recognized that enzymes hold tremendous potential for industry.

Currently, over 500 products across a wide spectrum of applications utilize enzymes in their

manufacture. A variety of enzymes have been in industrial use since before genetic

engineering appeared. However, merely a couple of dozen enzymes account for over 90% of

total industrial enzyme use. Some common examples are listed in Table 1.1.

Table 1.1 Important Proteins Used Industrially

Protein                               Function
Amylases                              Hydrolysis of starch for brewing
Lactase                               Hydrolysis of lactose in milk processing
Invertase                             Hydrolysis of sucrose
Cellulase                             Hydrolysis of cellulose from plant materials
Glucose isomerase                     Conversion of glucose to fructose for high-fructose syrups
Pectinase                             Hydrolysis of pectins to clarify fruit juices, etc.
Proteases (ficin, bromelain, papain)  Hydrolysis of proteins for meat tenderizing and clarification of fruit juices
Rennet                                Protease used in cheese making
Glucose oxidase                       Antioxidant in processed foods
Catalase                              Antioxidant in processed foods
Lipases                               Lipid hydrolysis in preparing cheese and other foods

(Source. Clark, David P, et.al. (2012). Biotechnology : Academic Cell Update. London : Academic Press of

Elsevier Inc.)

Three distinct yet complementary approaches have been used to develop

enzymes with optimal catalyst performance in the past several decades (Figure 1.1). The

objective here is to engineer proteins so that they may be used under industrial conditions

without being denatured and losing activity. However, it is also possible to alter proteins to

change the specificity of their enzyme activities or even to create totally new enzyme

activities. Ultimately, it may be possible to design proteins from basic principles.

One approach is rational design, in which site-specific changes are made to the target
enzyme with the aid of detailed knowledge of the protein's structure, function, and catalytic
mechanism. Another is directed evolution, which involves repeated cycles of random
mutagenesis and/or gene recombination followed by high-throughput screening or selection
of functionally improved mutants. The third, bioprospecting, searches the diversity that
already exists in nature for enzymes with the desired properties.

Figure 1.1. Existing approaches for developing commercially viable enzymes that often require optimized

features such as activity, selectivity and stability. Among the three most widely used approaches including

directed evolution (1), rational design (2), and bioprospecting (3), directed evolution is considered as the most

effective approach in filling the functional gap between naturally occurring enzymes and the commercially

viable enzymes in terms of time and cost.

(Source. Rubin-Pitel, Sheryl B. and Zhao, Huimin. (2006). Recent Advances in Biocatalysis by Directed

Enzyme Evolution. Combinatorial Chemistry & High Throughput Screening, 9, 247-257)

1.2. Directed Evolution

Generally, there are two main strategies for protein engineering, directed evolution and
rational design, which can be combined into semi-rational design or focused (designed)
directed evolution (Figure 1.2a). Directed evolution is based on three fundamental steps:
producing a library of mutants from the parental protein by introducing sequence diversity;
identifying the desired variants by efficient screening or selection; and subjecting the
selected variants to further mutagenesis and recombination for further improvement of the
protein (Cole and Gaucher, 2011).
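
The three steps described above can be sketched as a toy simulation. This is only an illustration under stated assumptions: the "screen" is a hypothetical fitness function that counts matches to an arbitrary target sequence, and the names `mutate`, `fitness` and `directed_evolution`, the library size and the mutation rate are all invented for the example rather than taken from any published protocol.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def mutate(seq, rate, rng):
    """Substitute each position with a random residue with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else aa for aa in seq)


def fitness(seq, target):
    """Toy screen (assumption): number of positions matching a hypothetical optimum."""
    return sum(a == b for a, b in zip(seq, target))


def directed_evolution(parent, target, rounds=5, library_size=200, rate=0.05, seed=0):
    """Repeat: diversify the current parent, screen the library, keep the best variant."""
    rng = random.Random(seed)
    best = parent
    history = [fitness(best, target)]
    for _ in range(rounds):
        # Step 1: create a mutant library from the current parent.
        library = [mutate(best, rate, rng) for _ in range(library_size)]
        library.append(best)  # keeping the parent means the score never decreases
        # Step 2: screen the library and pick the best-scoring variant.
        best = max(library, key=lambda s: fitness(s, target))
        # Step 3: the selected variant becomes the parent of the next round.
        history.append(fitness(best, target))
    return best, history
```

Calling `directed_evolution("A" * 30, some_target)` returns the best variant found and its score after each round. A real campaign differs in that the fitness landscape is unknown and each evaluation is a wet-lab assay.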

Figure 1.2. (a) Overview of approaches for protein engineering by random, rational and combined

methods (b) Overview of directed enzyme evolution.

(Source. (a) Steiner, Kerstin and Schwab, Helmut. (2012). Recent Advances in Rational Approaches for

Enzyme Engineering. Computational and Structural Biotechnology Journal,2, 1-12 (b) Tao, H. & Cornish, V.

W. (2002). Milestones in directed enzyme evolution. Curr Opin Chem Biol, 6, 858-64)

Both sequence diversity and screening or selection are important for obtaining a desired
property of an enzyme catalyst by in vitro evolution. Many researchers prefer to build large
libraries and then use high-throughput assays to pick out the desired variant. However, a
high-throughput assay often measures only a proxy for the desired function and can
therefore capture variants that score well on the proxy without having the property actually
sought. To avoid such unwanted variants, more accurate low-throughput assays should be
considered if the library size can be reduced without losing functional diversity; otherwise
it is worth remembering that ‘you get what you screen for’.

The advantage of directed evolution is that no structural information is needed and

that variations at unexpected positions distant from the active site can be introduced.

However, usually the changes are small and several rounds of evolution have to be applied

and thus a high number of variants have to be screened, which is time and labor consuming

and requires cheap, fast and reliable high-throughput assays.

1.3. Directed Evolution: Improving Enzyme Properties

Directed evolution has enjoyed great success in improving existing enzyme

characteristics. In the following sections, only a few selected examples will be highlighted.

Alterations have been made for almost all aspects of enzyme properties, such as substrate

specificity, product specificity, selectivity, activity, stability, or folding/solubility. Such

alterations are required for enzymes to become practically useful biocatalysts or therapeutics.

Directed evolution represents a highly effective strategy for discovering and

optimizing enzymes for industrial applications. It is complementary in its approach to

alternative, equally powerful methods that exploit the inherent diversity that already exists in

nature. Directed evolution is also very effective in engineering enzyme stability and activity.

Unlike rational design, which tends to improve one enzyme property at a time (in fact,

attempts to rationally alter one enzyme property often disrupt other existing important

characteristics), directed evolution may improve multiple enzyme properties simultaneously.

For example, five rounds of directed evolution consisting of alternate cycles of error-

prone PCR and in vitro gene recombination coupled with screening led to the isolation of a

highly stable and active subtilisin E mutant. This mutant contained eight thermo-stabilizing

mutations, which were located all over the protein structure. It showed a >200-fold longer

thermal inactivation half-life at 65°C, an 18°C higher temperature optimum, and a >5-fold

higher activity than the wild-type enzyme.

Another impressive example is the simultaneous improvement of four distinct enzyme

properties of subtilisin, including thermostability, activity in organic solvents, activity at pH

10, and activity at pH 5.5 by directed evolution. Family shuffling was used to recombine 26

homologous subtilisin genes to create a library of chimerical subtilisin genes. Out of 654

active subtilisins, a few mutants showed significant improvement over any of the parental

enzymes for each individual enzyme property.

1.4. Methods of Directed Evolution

A range of strategies is available for introducing diversity into the starting gene(s).
They can be broadly divided into two classes, (i) non-recombinative and (ii) recombinative
methods, and can create libraries ranging from as few as 200 variants to many tens of
thousands of variants.

Figure 1.3. The directed evolution cycle requires the gene (or genes) of interest, but there is no

requirement for a detailed knowledge of structure or function. Diversity may be introduced using a range of

methods, and after expression variants with the desired property are selected or screened out of the mixture.

Further rounds of directed evolution may be carried out using the first-generation DNA as parent for the second.

(Source. Williams, G. J., Nelson, A. S., and Berry, A. (2004). Directed Evolution of Enzymes for Biocatalysis

and The Life Sciences. Cellular and Molecular Life Sciences, 61, 3034–3046)

1.4.1. Non-recombinative methods

Saturation Mutagenesis

Non-recombinative methods generally create diversity via point mutation and

include the directed substitution of single amino acids, the insertion or deletion of

more than one amino acid, for example by cassette mutagenesis, and random

mutagenesis across the whole gene. Thus, a variety of methods are available

depending on the extent of mutation required. In cases where a high-resolution

structure of the target protein with bound substrate or inhibitor is available, residues

which contact the substrate can be identified and can be hypothesized to be

responsible in varying degrees for the natural reaction specificity.

Mutation of these contacting residues to all other 19 amino acids by saturation

mutagenesis (sm) can often lead to the identification of variants with significantly

altered substrate specificity. For example, Schultz and co-workers used saturation

mutagenesis at five positions in the active site of the Methanococcus jannaschii

tyrosyl transfer RNA (tRNA) synthetase to alter the amino acid specificity so that it

accepts only an unnatural amino acid. Using several rounds of positive and negative

growth selection, a mutant synthetase was obtained which had a kcat/Km for the target

unnatural amino acid O-methyl-L-tyrosine, 100-fold higher than for the natural

substrate tyrosine.
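
A short calculation shows why saturating several positions at once quickly produces very large libraries. The sketch below assumes NNK degenerate codons (32 codons per site covering all 20 amino acids), which the text does not specify, and assumes all variants are equally likely; the function names are illustrative.

```python
import math


def library_sizes(n_positions, codons_per_site=32, amino_acids=20):
    """Return (protein variants, codon combinations) for n saturated positions."""
    return amino_acids ** n_positions, codons_per_site ** n_positions


def clones_for_coverage(n_variants, coverage=0.95):
    """Clones to sample so that any given variant is seen with probability `coverage`.

    Uses P(missed) = exp(-T / V) for T clones drawn from V equiprobable variants.
    """
    return math.ceil(-n_variants * math.log(1.0 - coverage))


proteins, codons = library_sizes(5)
print(proteins, codons)             # 3200000 33554432
print(clones_for_coverage(codons))  # roughly three times the codon library size
```

For five saturated positions, as in the synthetase example above, the library is far too large for plate-based screening, which is consistent with the use of growth selection in that study.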

A disadvantage of using crystal structures to identify residues thought to be

responsible for substrate specificity is that this approach may ignore residues distant

from the active site.

Error-Prone PCR (epPCR)

Error-prone PCR was the first method described for random mutagenesis. The
technique relies on the fact that Taq DNA polymerase lacks proofreading activity and
misincorporates nucleotides at a frequency of 0.1 × 10^-4 to 2 × 10^-4 per nucleotide
during strand extension in PCR. The fidelity of several DNA polymerases has been
characterized and, among them, Taq polymerase has the lowest fidelity, which makes it
the best candidate for in vitro mutagenesis (Cadwell and Joyce, 1992). Despite the
important and growing use of non-recombinative methods for variant library production,
the most significant changes in enzyme function have been created using recombinative
methods.
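
To get a feel for what the quoted error rates mean for a whole gene, the following sketch estimates the expected number of mutations and the spread across clones. It assumes that errors are independent, that the rate applies per nucleotide per polymerase pass, and that the number of mutations per clone is Poisson distributed; the gene length and number of passes are hypothetical inputs.

```python
import math


def expected_mutations(error_rate, gene_length, passes=1):
    """Mean number of mutations per gene copy after `passes` rounds of copying."""
    return error_rate * gene_length * passes


def poisson_pmf(k, mean):
    """Probability of exactly k mutations when the mean is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


# Upper end of the quoted range, hypothetical 1000 bp gene, 10 effective passes
mean = expected_mutations(2e-4, 1000, passes=10)
for k in range(5):
    print(k, round(poisson_pmf(k, mean), 3))
```

Under these assumptions the unmodified Taq error rate gives only a handful of mutations per gene, which is why error-prone protocols add further mutagenic conditions.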

In addition to the use of Taq DNA polymerase, increasing the concentration of MgCl2,
nucleotide analogs, and MnCl2 can promote misincorporation during PCR (Kaur and
Sharma, 2006). DNA polymerase has one binding site for the template, one for dNTP and
one for dNMP. Binding of Mn2+ affects base-pairing properties by altering the template
and substrate molecules. It also interacts with the DNA polymerase, reducing its
selectivity for the correct nucleotide before insertion (Beckman et al., 1985).

Figure 1.4. Proposed mechanisms for infidelity during DNA replication by metal ions

(Source. Zakour, R. A., Kunkel, T. A. & Loeb, L. A. 1981. Metal-induced infidelity of DNA synthesis.

Environ Health Perspect, 40, 197-205)

dITP is a naturally occurring base analog that is occasionally found at the first
position of the tRNA anticodon. It can pair, with or without hydrogen bonds, with any of
the four nucleotides. In the tRNA anticodon it pairs with A, C, G and U, and with poly
(Aristarkhova et al.) it forms a stable complex (Ohtsuka et al., 1985).

An important point to consider in error-prone PCR is that beneficial mutations are
rare in comparison with deleterious ones, so a beneficial mutation combined with a
deleterious one may yield an inactive enzyme. The mutation frequency should therefore
be kept low in order to obtain a high number of desired variants. In addition, most
available error-prone PCR protocols are not random enough: they favor transitions over
transversions. A transition exchanges one pyrimidine for another pyrimidine, or one
purine for another purine (AT↔GC and TA↔CG), whereas a transversion exchanges a
purine for a pyrimidine or a pyrimidine for a purine (AT↔CG, AT↔TA, GC↔CG,
GC↔TA).
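
The distinction between the two mutation classes can be made concrete with a small helper. This is a minimal sketch with an illustrative function name; it classifies single-base substitutions on one strand.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base change ref -> alt."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError("ref and alt must be two different DNA bases")
    # Same chemical class (purine/purine or pyrimidine/pyrimidine) is a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


counts = {"transition": 0, "transversion": 0}
for ref in "ACGT":
    for alt in "ACGT":
        if ref != alt:
            counts[classify_substitution(ref, alt)] += 1
print(counts)  # {'transition': 4, 'transversion': 8}
```

Of the 12 possible substitutions, 8 are transversions, so an unbiased method would produce twice as many transversions as transitions. A library dominated by transitions therefore samples a restricted set of amino acid changes.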

Table 1.2 Non-recombinative methods

Method                  Advantages                          Disadvantages
Error-prone PCR         Simplicity                          Accumulates deleterious mutations
                                                            Limited amino acid substitutions
                                                            Polymerase bias
Saturation mutagenesis  Simplicity                          Limited diversity generation
                        Mutate specific site(s) in a gene   Gene sequence required
                        Access all 20 amino acids

1.4.2. Recombinative Methods

DNA Shuffling

Natural selection works on new sequences generated both by mutation and

recombination. DNA shuffling is a method of artificial evolution that includes the

creation of novel mutations as well as recombination. The gene to be improved is cut

into random segments around 100 to 300 base pairs long. The segments are then

reassembled by using a suitable DNA polymerase with overlapping segments or by

using some version of overlap PCR. This recombines segments from different copies

of the same gene (Figure 1.5a).

A more powerful variant of DNA shuffling is to start with several closely

related (i.e., homologous) versions of the same gene from different organisms. The

genes are cut at random with appropriate restriction enzymes and the segments mixed

before reassembly. The result is a mixture of genes that have recombined different

segments from different original genes (Figure 1.5b). Note that the reassembled

segments keep their original natural order. For example, several related β-lactamases

from different enteric bacteria have been shuffled. The shuffled genes were cloned

onto a plasmid vector and transformed into host bacteria. The bacteria were then

screened for resistance to selected β-lactam antibiotics. This approach yielded

improved β-lactamases that degraded certain penicillins and cephalosporins more

rapidly and so made their host cells up to 500-fold more resistant to these β-lactam

antibiotics.
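
The idea that shuffled genes mix segments from several parents while keeping the segments in their natural order can be illustrated with a deliberately simplified model. The sketch assumes pre-aligned parents of equal length and uniformly random crossover points, which ignores the homology bias of real reassembly; `shuffle_family` is a hypothetical name.

```python
import random


def shuffle_family(parents, n_crossovers, rng):
    """Build one chimera: pick crossover points, then a random donor per segment."""
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("this toy model needs aligned parents of equal length")
    points = sorted(rng.sample(range(1, length), n_crossovers))
    bounds = [0] + points + [length]
    segments = []
    for start, end in zip(bounds, bounds[1:]):
        donor = rng.choice(parents)  # adjacent segments may come from the same parent
        segments.append(donor[start:end])  # position in the gene is preserved
    return "".join(segments)


rng = random.Random(1)
parents = ["AAAAAAAAAAAA", "CCCCCCCCCCCC", "GGGGGGGGGGGG"]
print(shuffle_family(parents, 3, rng))
```

Every position of the chimera comes from the same position in one of the parents, which is the sense in which the reassembled segments keep their original order.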

Figure 1.5. (a) DNA Shuffling for a Single Gene. Introducing point mutations and shuffling gene

segments can generate a better version of a protein. First, many copies of the original gene are

generated with random mutations. The genes are then cut into random segments. Last, the fragments

are reassembled using overlap PCR. The new constructs must be assessed for enhanced protein

function. (b) DNA Shuffling for Multiple Related Genes. Shuffling segments from related genes can

also enhance the function of a particular protein. The original set of related genes are digested into

small fragments and reassembled using PCR. The new combinations are tested for a change in function.

(Source. Clark, David P, et.al. (2012). Biotechnology : Academic Cell Update. London : Academic

Press of Elsevier Inc.)

Family Shuffling

Family shuffling applies DNA shuffling to a group of naturally occurring
homologous genes rather than to laboratory-created mutants. It significantly
accelerates the rate of functional enzyme improvement in a single
recombination-selection cycle. Although they are powerful methods, DNA shuffling

and family shuffling are not without limitations. Shuffling methods require the

presence of zones of relatively high sequence homology surrounding regions of

diversity.

Additionally, significant biases are found in where crossover events occur and

in which parents are involved: crossover tends to occur in regions of higher

homology, and among parents which share greater sequence identity. Bias is also

introduced by nonrandom gene fragmentation by the DNaseI enzyme. All of these

factors limit the diversity created in a shuffled library. In extreme cases, lack of

homology among parents can lead to the majority of reconstructed “shuffled”

sequences entirely representing a single parent.

Staggered Extension Protocol (StEP)

DNA shuffling requires a large amount of template DNA. The Staggered Extension
Protocol (StEP) is another recombination method that was developed to overcome the
limitations of DNA shuffling. It does not require the DNaseI fragmentation step and
yields chimeric genes through template switching. The template sequences go through
repeated cycles of denaturation and extremely short annealing and polymerase-catalyzed
extension steps. In each cycle, the growing fragments anneal to different templates
based on sequence complementarity and extend further. This is repeated until
full-length sequences form. The technique has been used in the evolution of
thermostable subtilisin: five thermostabilized subtilisin E variants identified by a
single round of epPCR were recombined by StEP, and screening the resulting library
yielded a subtilisin E whose half-life at 65 °C was 50 times that of the wild type.

Random Chimeragenesis on Transient Templates (RACHITT)

RACHITT does not utilize thermocycling, strand switching, or staggered

extension of primers. Instead, a uracil-containing parent gene is made single-stranded

to serve as a scaffold for the ordering of top-strand fragments of additional,

homologous parent gene(s), and recombination occurs when fragments from different

parent genes hybridize to the scaffold. Pfu DNA polymerase 3’-5’ exonuclease

activity removes the unhybridized 5’ or 3’ overhanging “flaps” created by fragment

annealing, and also fills gaps between the annealed fragments using the transient

scaffold as a template.

The template strand is then eliminated by treatment with uracil-DNA glycosylase
before the template-chimera hybrid is subjected to PCR, which amplifies
double-stranded, homoduplex chimeric gene sequences. The process of RACHITT
recombination is illustrated in Figure 1.6. RACHITT provides a significantly higher
crossover rate than other family shuffling methods, with an average of 14 crossovers
per gene versus one to four for most other methods. RACHITT also generates 100%
chimeric progeny, with no duplication of recombination patterns among the chimeric
genes. Although the benefits of this method are obvious, its use may be limited by its
complexity and by the requirement to create single-stranded gene fragments as well as
a single-stranded, uracil-containing DNA template.

Figure 1.6. Random homologous DNA recombination by RACHITT

(Source. Rubin-Pitel, Sheryl B., et.al. (2001). Chapter 3. Directed Evolution Tools in Bioproduct and

Bioprocess Development. Net Bioethanol, 19, 419-427)

Exon Shuffling

Exon shuffling requires the creation of DNA fragments containing exons, which are
amplified with a mixture of synthetic chimeric oligonucleotides so that the fragments
can be spliced together randomly. These spliced fragments are then assembled by
primerless PCR, where individual fragments prime against each other to recreate a
full-length gene.

Recombination occurs when a chimeric oligonucleotide connects an exon from one

parent gene to a second exon from a different parent gene. The diversity in an exon

shuffling library is controlled by the number of modules which are recombined, and

the number of homologs that are included for each module; in some cases, the

availability of homologous domains may limit the creation of a shuffled library. The

diversity of an exon shuffling library can also be controlled experimentally through

the design of the chimeric oligonucleotides, facilitating certain connections between

domains but not others, or by modifying the molar ratio of domain-encoding fragments

to control the stoichiometry of the individual domains in the progeny. As with other

recombination methods, additional diversity can be created in the library by

introducing random point mutations, insertions, or deletions. Rearranging the order of

domain-encoding exons also creates novel diversity.

Figure 1.7. Method of non-homologous recombination by exon shuffling

(Source. Rubin-Pitel, Sheryl B., et.al. (2001). Chapter 3. Directed Evolution Tools in Bioproduct and

Bioprocess Development. Net Bioethanol, 19, 419-427)

Incremental Truncation for the Creation of Hybrids Enzymes (ITCHY)

To overcome a limitation of DNA shuffling, which can create crossovers only in
homologous regions, Ostermeier et al. developed an approach for generating fusion
libraries between two gene fragments, called Incremental Truncation for the Creation
of Hybrid Enzymes (ITCHY). Two parental genes are digested with exonuclease III
under controlled conditions to generate truncated gene libraries with progressive 1 bp
deletions.

The truncated 5’ fragments of one gene are fused to the truncated 3’ fragments of
the other gene, yielding a library of chimeric sequences that are then expressed and
screened or selected for improved enzyme activity. The method allows the creation of
functional fusions from overlapping amino- or carboxy-terminal gene fragments
independent of DNA sequence homology. However, it has a lengthy protocol and
requires extensive time-point sampling. To overcome these shortcomings, an alternative
procedure termed THIO-ITCHY was developed, which creates ITCHY libraries using
nucleotide triphosphate analogs such as α-phosphorothioate dNTPs.

The nucleotide analogs protect the DNA from exonuclease digestion and hence
lead to the desired variation in truncation length upon nuclease treatment. The two
targeted gene fragments can be combined in a single vector, because the generation of
diversity is no longer a function of timed exonuclease digestion but is instead based on
the random distribution of the α-phosphorothioate nucleotides.

Figure 1.8. Methods used for creating libraries using directed evolution

(Source. Kaur, J. and Sharma, R. (2006). Directed Evolution : An Approach to Engineer Enzymes. Critical

Reviews in Biotechnology, 26, 165-199)

Table 1.3 Recombinative methods

Method            Advantages                                               Disadvantages
DNA shuffling     Robust, flexible                                         Biased to crossover in high-homology regions
                  Back-crossing to parent removes non-essential mutations  Low crossover rate
                                                                           High percentage of parents
Family shuffling  Exploits natural diversity                               Biased to crossover in high-homology regions
                  Accelerates functional enzyme improvement                Needs high sequence homology in the gene family
                                                                           High percentage of parents
StEP              Simplicity                                               Needs high homology
                                                                           Low crossover rate
                                                                           Needs tight control of PCR
RACHITT           No parent genes in a shuffled library                    Complex
                  Higher rate of recombination                             Requires synthesis and fragmentation of
                  Recombines genes of low sequence homology                single-stranded complement DNA
Exon shuffling    Preserves exon function                                  Requires known intron-exon organization of target gene
                                                                           Limited diversity
ITCHY             Eliminates recombination bias                            Limited to two parents
                  Structural knowledge not needed                          Significant fraction of progeny out-of-frame
                                                                           Complex, labor-intensive
THIO-ITCHY        Same advantages as ITCHY                                 Same disadvantages as ITCHY
                  Combines recombination and random mutagenesis            Incorporated dNTP analogs may complicate
                  Simplified ITCHY method                                  further experimentation

Chapter 2

CASE STUDIES

2.1 Biocatalysis Engineering of GAR Transformylase using Incremental Truncation for

Creation of Hybrid Enzymes

2.1.1. Background

Figure 1. DNA Shuffling and Crossover Point

(Source: http://academic.pgcc.edu/~kroberts/Lecture/Chapter%207/07-29_Recombination_L.jpg)

DNA shuffling has been used to improve enzyme activity, stability and folding, and to
alter substrate specificity. In this technique, parental genes are fragmented and subsequently
reassembled by PCR to reconstitute the full-length genes. During reassembly, novel
combinations of the parental genes arise along with new point mutations. The result is a
large library of mutant genes from which variants that have acquired a desired function are
identified using an appropriate selection or screening system. The method requires
relatively high levels of DNA homology to recombine genes in vitro. Consequently, DNA
shuffling cannot exploit a large portion of the total combinatorial space, because crossover
points between shuffled genes occur only in regions of relatively high DNA homology
and at loci of identity.

Crossovers between structurally homologous proteins at sites lacking DNA homology

are likely to be productive for protein engineering. Exchange of non-homologous low-energy

structures was a more productive strategy than DNA shuffling. However, no combinatorial

strategy for creating hybrids between genes that lack DNA homology has been demonstrated.

While it is true that DNA shuffling of families of genes with DNA homology can create

hybrid enzymes with new properties, such molecular breeding is only feasible for genes with

high genetic homology and, for this reason, is unlikely to evolve an entirely novel function. It

is important to realize that the primary rationale for success in the shuffling of families of

genes is the similarity of the three-dimensional structures of the proteins they encode, not the

degree of DNA homology. Indeed, it is an interesting question whether successful directed

evolution on homologous families might be equally or better served by the creation of genes

with crossovers between family members at regions of little or no genetic homology.

Incremental gene truncation libraries can be used to identify loci for the functional
bisection of proteins, and a number of protein engineering strategies that utilize incremental
truncation have been proposed. A combinatorial method for biocatalysis engineering called
ITCHY (Incremental Truncation for the Creation of Hybrid Enzymes) creates combinatorial
libraries between two genes in a manner that is independent of DNA sequence homology.
ITCHY libraries allow the identification of a more diverse set of functional fusions than DNA
shuffling.

2.1.2. Basic Principle

2.1.2.1 Incremental truncation

Knowing where to make the fusions is a central problem in the creation of such
hybrids. Since current methodologies for genes lacking high homology were limited to ‘try it
and see if it works’, we developed a combinatorial approach to this problem termed
incremental truncation. Through incremental truncation we can create fusion libraries of
many (or all) different combinations of lengths of two genes. This approach, described
herein, is thus a combinatorial solution to the questions ‘where can enzymes or enzyme
fragments be fused to produce active hybrids?’ and ‘where are the points at which an
enzyme can be bisected?’. In addition, we outline a method that should circumvent the
homology limitations of DNA shuffling by allowing shuffling of genes independent of
sequence homology.

For the average size gene, the separate construction of all possible one-codon

truncations would require the assembly of hundreds of plasmids, a labor intensive and time

consuming task. Incremental truncation of DNA, on the other hand, allows the construction

of a library containing all possible truncations of a gene, gene fragment or DNA library in a

single experiment (Figure 2).

Incremental truncation is achieved by utilizing the slow, directional, controlled

digestion of DNA. During this digestion, small aliquots are frequently removed and the

digestion quenched. Thus by taking multiple samples over a given time period we can create

a library of all possible single base-pair deletions of a given piece of DNA.

We have been using Exonuclease III (Exo III) which exhibits such properties. Exo III

has been previously shown to be useful in the creation of large truncations of linear DNA and

for techniques in the sequencing of large genes. The digestion rate of Exo III at 37 °C (500

bases/min) is much too fast for purposes of incremental truncation where every one-codon

deletion is desired. However, the digestion rate of the exonuclease can be affected by a

variety of methods such as lowering the incubation temperature, altering the digestion buffer

composition, inclusion of a nuclease inhibitor or lowering the ratio of enzyme to DNA.
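
A rough calculation makes the timing problem concrete. The sketch below assumes a constant digestion rate and evenly spaced aliquots; the 500 bases/min figure comes from the text, while the slower rate and the 30-second interval are hypothetical values chosen for illustration.

```python
def bases_per_aliquot(rate_bases_per_min, interval_s):
    """Bases removed between two consecutive aliquots at a constant digestion rate."""
    return rate_bases_per_min * interval_s / 60.0


def max_interval_s(rate_bases_per_min, max_bases=3):
    """Longest sampling interval (s) that removes at most `max_bases` between aliquots."""
    return 60.0 * max_bases / rate_bases_per_min


print(bases_per_aliquot(500, 30))  # 250.0 bases skipped between 30 s aliquots
print(max_interval_s(500))         # sub-second sampling needed for one-codon steps
print(max_interval_s(10))          # 18.0 s at a hypothetical slowed rate of 10 bases/min
```

At the unmodified rate, one-codon resolution would need aliquots taken several times per second, which is why the digestion has to be slowed before incremental truncation becomes practical.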

Figure 2. Incremental Truncation

(Source: http://www.sciencedirect.com/science/article/pii/S0968089699001431)

Incremental truncation is a method for creating a combinatorial library containing one

base pair deletions of a gene or gene fragment of interest. In this protocol, truncations are

introduced in opposite directions on fragments from two different genes in two separate

reactions. The sets of truncated DNA molecules from each digestion are ligated to each other

with DNA ligase. The resulting “fusions” are cloned as chimeric molecules. The library of

cloned fusions is transformed into bacteria and used for further experiments (e.g., phage

display, enzymatic activity assay, etc.).

2.1.2.2 Hybrid Enzymes

Hybrid enzymes are engineered to contain elements of two or more enzymes. They
can therefore be generated in a number of ways (Figure 3): an existing enzyme can be altered

by a single point mutation (or series of point mutations) based on structures existing in a

second enzyme; similarly, secondary-structural elements or whole domains of enzymes, or

monomeric units of multimeric enzymes, can be exchanged; fusions between two enzymes

that have separate and distinct activities are also, by this definition, hybrid enzymes.

The construction of hybrid enzymes parallels the strategies that nature uses to evolve

enzymes. It is generally thought that enzymes have evolved to fit a specific niche in biology

through such processes as gene duplication, domain recruitment and fixation of multiple

point mutations. Similarly, hybrid-enzyme approaches seek to recruit established functions

and properties from existing enzymes and incorporate them into the engineered enzyme.

Hybrid enzymes have often been used to determine the differences between related

enzymes, identifying those residues or structures that impart a specific property that one

enzyme has but another, homologous, enzyme does not. For example, hybrids between two

highly homologous proteinases from Lactococcus lactis were used to determine which

residues were responsible for their cleavage specificity and rate towards α- and β-casein.

The hybrids were also used to identify an additional unique domain involved in substrate

binding that was absent from related subtilisins. Hybrid enzymes have also been used to

investigate the relative merits of structural and sequence alignments between related

enzymes.

Figure 3. Generation of hybrid enzymes. (a) Substitution of point mutations, secondary structures or both

from enzyme A into a homologous enzyme B.

(b) Exchange of functional domains between enzymes C and D or fusion of the intact enzymes.

(Source: http://www.jhu.edu/chembe/ostermeier/pdf/04_TrendsBiotech.pdf)

2.1.2.3 Incremental Truncation for Creating Hybrid Enzymes

ITCHY, the combination of two incremental truncation libraries, creates diversity
by fusing two gene fragments. Performing ITCHY on a single gene generates libraries of
proteins with internal deletions and duplications, whereas performing ITCHY between two
different genes generates libraries of fusion proteins in a DNA-homology-independent fashion.

ITCHY allows the creation of hybrid enzyme libraries between a random-length

5’ fragment of the gene encoding protein A and a random-length 3’ fragment of the gene

encoding protein B. A key step in this process is the digestion of the parent genes with

exonuclease III (ExoIII) in the presence of NaCl such that the reaction rate is limited to ≤ 10

bases/min. During ExoIII digestion, small aliquots are removed at short intervals and

quenched by addition to a low-pH, high-salt buffer. As ExoIII digests DNA at a relatively

uniform rate, members of the library ostensibly correspond to progressive 1 bp deletions.
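The sampling arithmetic implied by the time-course digestion can be sketched as follows. This is only an illustration, assuming a constant digestion rate of 10 bases/min (the upper limit quoted above), a hypothetical 30 s interval between aliquots and a 270 bp truncation range; the interval and the function name are assumptions, not part of the published protocol.

```python
def aliquot_schedule(rate_bases_per_min, max_truncation_bp, interval_s):
    """Return (time_s, expected_truncation_bp) for each aliquot.

    Assumes ExoIII digests at a constant, uniform rate, which is an
    idealization of the real reaction.
    """
    total_time_s = max_truncation_bp / rate_bases_per_min * 60
    n_aliquots = int(total_time_s // interval_s)
    schedule = []
    for i in range(1, n_aliquots + 1):
        t = i * interval_s
        schedule.append((t, rate_bases_per_min * t / 60))
    return schedule


# Hypothetical sampling plan: 10 bases/min, 270 bp range, one aliquot every 30 s.
schedule = aliquot_schedule(rate_bases_per_min=10, max_truncation_bp=270, interval_s=30)
for time_s, truncation_bp in schedule[:3]:
    print(f"{time_s:>5} s  ~{truncation_bp:.0f} bp removed")
```

Under these assumptions the full range takes 27 minutes and successive aliquots differ by about 5 bases on average. Real digestion is unlikely to be perfectly synchronous, so pooled aliquots would be expected to contain intermediate lengths as well.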

Figure 4. Schematic overview of THIO-ITCHY using α-phosphothioate nucleotide incorporation by PCR

amplification. (a) Linearization of the starting plasmid by restriction digestion at the unique site between the two

genes or gene fragments. (b) PCR amplification of the entire linearized vector in the presence of a mixture of

dNTPs and αS-dNTPs as described in Materials and Methods. (c) Incubation of the plasmid with exonuclease III

results in hydrolysis of standard dNMPs while the dNMP analogs will block enzymatic degradation. (d) The

single-stranded overhangs of the plasmids are removed enzymatically with mung bean nuclease. (e) The blunt-

ended constructs are recircularized by intramolecular ligation.

(Source: http://nar.oxfordjournals.org/content/29/4/e16/F2.expansion)

2.1.3. Enzyme GAR Transformylase

Figure 5. GAR Transformylase Structure

(http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/enzymes/GetPage.pl?ec_number=2.1.2.2)

GAR transformylase plays an important role in purine biosynthesis, and its inhibition has

potential therapeutic benefit. Formyl transfer reactions play a key role in the

construction of the purine heterocycle during de novo purine biosynthesis. Formylation is

catalyzed early in the pathway by the purN glycinamide ribonucleotide transformylase (GAR

Transformylase, EC 2.1.2.2) in a tetrahydrofolate-dependent manner and also by the purT

GAR transformylase in a tetrahydrofolate-independent manner in bacteria.

Figure 6. Reaction Catalyzed by GAR Transformylase

(http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/enzymes/GetPage.pl?ec_number=2.1.2.2)

2.1.4. Genes Encoding Enzyme

Gene

E. coli: purN

H: GART segment

Gene function

E. coli: Monofunctional GAR transformylase of 212 amino acids

H: Trifunctional enzyme glycinamide ribonucleotide synthetase-aminoimidazole ribonucleotide synthetase-glycinamide ribonucleotide transformylase

Enzyme function

E. coli: Catalyses the transfer of the formyl group from the cofactor N10-formyl-tetrahydrofolate (fTHF) to the amino group of GAR to yield formyl-glycinamide ribonucleotide (fGAR)

H: Utilizes cofactor fTHF and is functional as a separate domain

There is 50% identity at the DNA level between the two genes and 41% identity (60%

homology) at the amino acid level. Amino acid alignment between purN and the GART

segment reveals no gaps, although GART lacks nine amino acids at the C terminus. Structures

of the active sites of the two enzymes have been reported to be essentially identical, but the

structure of GART was not available in the Protein Data Bank at the time of the study.
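Percent identity figures such as those quoted above are computed over aligned positions. The sketch below shows one common way to do this, assuming the two sequences have already been aligned to equal length; the function name and the toy sequences are hypothetical and are not the real PurN or GART sequences. The "homology" (similarity) figure would additionally need a substitution matrix, which is not shown.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned positions carrying the same residue.

    Both sequences must already be aligned and of equal length;
    positions where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical toy alignment, used only to show the calculation.
toy_a = "MKIVVLISGN"
toy_b = "MRVAVLISGT"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.0f}% identity")
```

Other conventions divide by the full alignment length or by the shorter sequence, so reported values are only comparable when the denominator is stated.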

2.1.5. Mechanism of ITCHY

2.1.5.1 Making ITCHY Libraries

ITCHY libraries for this case were created between 5’ fragments of purN (1-144) and

3’ fragments of GART (54-203). There are two libraries:

IT-A was created by electroporation into DH5α to obtain a larger library.

IT-B was created by electroporation into DH5α-E.

The chosen gene fragments have 270 bp of overlap; thus, as each gene fragment

would have 1 to 270 bp truncated, an ideal library containing one member of each of the

desired fusions would have 270 × 270 = 72,900 members. IT-B should contain all possible

fusions between the two gene fragments in the region of overlap. The size diversity of IT-A and

IT-B was evaluated on randomly selected library members and found to be essentially

random, but with a bias against small fusions (Figure 7).
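The library-size arithmetic above can be reproduced by enumerating every pair of truncation lengths. The sketch below is illustrative only. The in-frame test is an added assumption, not a figure from the study: it supposes that both untruncated fragments begin and end on codon boundaries, in which case a fusion keeps the reading frame when the total number of removed bases is a multiple of three.

```python
from itertools import product

OVERLAP_BP = 270  # truncation range quoted in the text (1 to 270 bp per fragment)


def enumerate_fusions(overlap_bp=OVERLAP_BP):
    """All (i, j) pairs: i bp removed from the 5' fragment, j bp from the 3' fragment."""
    lengths = range(1, overlap_bp + 1)
    return list(product(lengths, lengths))


def is_in_frame(i, j):
    """In-frame test, valid only under the codon-boundary assumption stated above."""
    return (i + j) % 3 == 0


fusions = enumerate_fusions()
ideal_library_size = len(fusions)
in_frame_count = sum(is_in_frame(i, j) for i, j in fusions)

print(ideal_library_size)  # 72900
print(in_frame_count)      # 24300
```

Under that assumption roughly one third of the ideal library would be in frame, which is one reason an experimental library needs to be considerably larger than the number of fusions of interest.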

Figure 7. Size distribution of libraries. The sizes of the gene fusions of randomly selected members of IT-

A (m) and IT-B (P) were estimated by gel electrophoresis and arranged in descending size order. The

shaded area represents the theoretical size range based on the deletion of 1–270 bases of each fragment.

Fusions larger than the desired size range result from fusion of gene fragments in which truncation has

stopped in the approximately 30 bp spacer between the start of truncation and the gene to be truncated. The

dashed line indicates the size of hybrid genes that are fused where their parents’ sequences align.

(Source: Ostermeier, Marc et al. Combinatorial Protein Engineering by Incremental Truncation. (1999), Proc.

Natl. Acad. Sci. USA Vol. 96, pp. 3562–3567, March 1999 Biochemistry.)

Hybrids of PurN and GART with GAR transformylase activity were selected for on

minimal media using the E. coli auxotroph TX680F, which lacks a functional GAR

transformylase. Because GAR transformylase activity is essential for purine biosynthesis,

TX680F is unable to grow on minimal media in the absence of purines. Plasmid DNA was

purified from the libraries and transformed into TX680F.

The number of active fusions per library of IT-A and IT-B was estimated to be 9 and

111, respectively, by multiplying the number of colonies on the selective plates by the library size and

dividing by the number of colony-forming units plated on the selective plates.
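The estimate described above is a simple proportion and can be written out as follows. The function name and the plate counts are hypothetical, chosen only to illustrate the arithmetic; they are not the counts from the study.

```python
def estimate_active_fusions(colonies_on_selective, library_size, cfu_plated):
    """Estimated active fusions = colonies x library size / CFU plated."""
    if cfu_plated <= 0:
        raise ValueError("cfu_plated must be positive")
    return colonies_on_selective * library_size / cfu_plated


# Hypothetical example: 30 colonies grew after plating 500,000 CFU
# from a library of 100,000 independent members.
estimate = estimate_active_fusions(
    colonies_on_selective=30,
    library_size=100_000,
    cfu_plated=500_000,
)
print(estimate)  # 6.0
```

Plating more colony-forming units than the library size oversamples the library, so the ratio of library size to CFU plated scales the raw colony count down to the number of distinct active members.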

Amino acid sequences of randomly selected active fusions of IT-A and IT-B were

determined by DNA sequencing (Table 1). Active genes were found with fusions in regions

of high and low homology (Figure 8) and within loops, α-helices and β-sheets of PurN.

Almost all fusion points of active hybrids occurred at sites of exact alignment (Figure 9).

Table 1. Active PurN-GART Fusions



Figure 8. Fusion points of active PurN–GART hybrids relative to the alignment of PurN and GART. Crossovers

of active fusions found by ITCHY are shown by a solid line and those found by DNA shuffling are shown by a

dashed line. To the left of the crossover point, the fusion has sequence from PurN. To the right of the crossover

point, the fusion has sequence from GART. If the crossover occurred in a region of DNA identity, the exact

fusion point could not be assigned and is shown at the 3’ end of the region of identity. Regions of sequence

identity are indicated in gray. *Three key active site residues. The region chosen to search for active fusions is

shown between the long dashed lines. None of the sequenced, active members of IT-A or IT-B had any other

mutations. Covalent fusion of the two fragments was found to be necessary as no interspecies heterodimers

between any combination of truncated fragments of PurN (1-144) and GART (54-203) were able to complement

the auxotroph.



Figure 9. Fusion points of active PurN–GART hybrids mapped onto the structure of PurN. The region

searched for active fusions (residues 54–144) is shown in green, and the area outside the search region is shown

in blue. The side chains of the three key active site residues, Asn106, His108, and Asp144, are shown in yellow.

The substrate GAR (top) and a cofactor analog (5-deaza-5,6,7,8-tetrahydrofolate) are shown in white. The

amino acids of PurN to which fusion of a GART fragment results in an active enzyme are shown in red. These

include fusion points identified by sequencing and those that can be inferred given that an active fusion was

found elsewhere in a region of amino acid identity (e.g., 104–113). The locations of fusion points of hybrids

characterized in Table 2 are indicated by the numbers.



2.1.5.2 Kinetic Characterization of Active Hybrids

Based on kcat/Km (GAR), all characterized fusions were found to have activities at

least 500-fold lower than that of wild-type PurN. In a very simplistic view, the

Km of the hybrids can be expected to be similar to that of PurN (because the PurN fragment contains most of

the residues important for binding) and the kcat to be similar to that of GART (because all or

part of the key active site residues derive from GART).
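The fold reduction in kcat/Km can be calculated as in the sketch below. All kinetic values here are hypothetical and picked only to keep the arithmetic exact; they are not the constants reported for PurN or for any hybrid.

```python
def catalytic_efficiency(kcat_per_s, km_micromolar):
    """Return kcat/Km in units of s^-1 uM^-1."""
    if km_micromolar <= 0:
        raise ValueError("Km must be positive")
    return kcat_per_s / km_micromolar


def fold_reduction(reference_efficiency, hybrid_efficiency):
    """How many times lower the hybrid's kcat/Km is than the reference."""
    return reference_efficiency / hybrid_efficiency


# Hypothetical constants for illustration only.
wild_type = catalytic_efficiency(kcat_per_s=16.0, km_micromolar=8.0)
hybrid = catalytic_efficiency(kcat_per_s=0.25, km_micromolar=64.0)
fold = fold_reduction(wild_type, hybrid)
print(f"{fold:.0f}-fold lower kcat/Km")
```

Because kcat/Km is a ratio, a loss in efficiency can come from a lower kcat, a higher Km, or both, which is why the two constants are considered separately in the text.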

Table 2. Kinetic constants of selected PurN–GART fusions



2.1.5.3 Conclusion

ITCHY can create combinatorial libraries of genes in a manner that is independent of

DNA sequence homology, as demonstrated by the identification of 10 active PurN-GART fusion proteins

between N-terminal fragments of E. coli GAR transformylase PurN and C-terminal

fragments of human GAR transformylase GART.

The optimum start of the GART domain may have been difficult to define and

nonnative E. coli codon usage at the N terminus of the fusions might have led to poor

expression.

Fusions within the active site may be less disruptive because of structural similarity

within this region, whereas fusions distal to the active site with lesser homology may

be more disruptive.

Insertions of a few amino acids or even entire proteins have been shown to be

compatible with activity in other enzymes. Thus, fusions with a few extra amino acids

were expected not only to be active but also to be the predominant active species.

The observed predominance of crossovers at positions of precise alignment can be explained in two ways:

1. The linear distances between conserved residues may have some importance

for structure and/or function.

2. The decrease in activity caused by extra amino acids is small, but this small

decrease may be enough to prevent complementation of the auxotroph since

kcat/Km is already reduced 500- to 10,000-fold for active fusions.

Insertion or deletion of residues in hybrids could prove advantageous for the

engineering of other proteins.

Residues 63 and 112 of PurN might be good choices as fusion points for creating

hybrid enzymes. Although fusion between PurN and GART at residue 112 proved to

be active, no active fusions were found with a PurN fragment shorter than residues

1-100.

Kinetic characterization of four active hybrids from IT-A and IT-B (Table 2) suggests

that hybrids fused between residues 54 and 100 are not active because of weak

binding of the substrate and cofactor, as the Km values for both substrate and cofactor

increase with the length of the GART fragment in the fusion. The exception is that IT-A1 has a

higher Km(fDDF) than IT-A5, even though IT-A1 contains less of GART.

Presumably, this is attributable to IT-A1 being fused in the 140–144 loop, which has

been shown to have a role in binding the cofactor.

2.1.6. Advantages

The ITCHY method enabled identification of a more diverse set of active chimeras than

DNA shuffling, principally as a result of the relatively nonbiased and non-homology-based

method that creates the fusions. The active fusions identified by ITCHY demonstrate that

crossovers between genes at regions of structural homology, irrespective of DNA sequence

homology, are important for creating functional hybrid enzymes.

Although the library created by DNA shuffling had a higher frequency of positives, it

was not very diverse. Fused genes of ITCHY libraries could have been initially selected for

size (e.g. the size of the original genes) resulting in an enrichment for active members of

probably 10- to 100-fold to give a frequency of 0.1-1.0%. DNA shuffling can create hybrids

with multiple crossovers, whereas ITCHY libraries are limited to one crossover point per

library member. One can envision an iterative method for ITCHY in order to create library

members with multiple crossovers. However, as ITCHY libraries create all possible

crossovers between two genes, DNA shuffling of ITCHY libraries should allow one to create

a library of genes with multiple crossovers that include crossovers at regions of no homology,

thus accessing a more diverse sequence space. In a fashion analogous to DNA family

shuffling, which improves directed evolution by accessing a more diverse yet functional

sequence space, such a strategy should prove useful for the directed evolution of proteins. In

addition, ITCHY libraries should have applications in the creation of novel enzymes by

domain and subdomain swapping as well as in the determination of structure/function

relationships by characterizing hybrids of interspecies homologs.

2.1.7. Improvement

An improvement over incremental truncation for the creation of hybrid enzymes

(ITCHY) is SCRATCHY (ITCHY combined with DNA shuffling). The approach combines

two methods for recombining genes: ITCHY and DNA shuffling. First, ITCHY is used to

create a comprehensive set of fusions between fragments of genes in a DNA homology-

independent fashion. This artificial family is then subjected to a DNA-shuffling step to

augment the number of crossovers. SCRATCHY libraries were created from the

glycinamide–ribonucleotide formyltransferase (GART) genes from E. coli (purN) and human

(hGART).

REFERENCES

Aristarkhova, S. A., Burlakova, E. B. & Sheludchenko, N. I. 1979. [Effect of lecithin on liver

microsomal lipid peroxidation]. Biokhimiia, 44, 125-9.

Beckman, R. A., Mildvan, A. S. & Loeb, L. A. 1985. On the fidelity of DNA replication:

manganese mutagenesis in vitro. Biochemistry, 24, 5810-7.

Cadwell, R. C. & Joyce, G. F. 1992. Randomization of genes by PCR mutagenesis. PCR

Methods Appl, 2, 28-33.

Cole, M. F. & Gaucher, E. A. (2011). Exploiting models of molecular evolution to efficiently

direct protein engineering. J Mol Evol, 72, 193-203.

Dasu, V. Venkata et al. Developments in Directed Evolution for Improving Enzyme

Functions. (18 August 2007), Appl Biochem Biotechnol (2007) 143:212–223 DOI

10.1007/s12010-007-8003-4

Kaur, J. & Sharma, R. 2006. Directed evolution: an approach to engineer enzymes. Critical

Reviews in Biotechnology, 26, 165-99.

Nixon, Andrew E. et al. Hybrid Enzymes: Manipulating Enzyme Design. (1998), TIBTECH

JUNE 1998 (VOL 16), Elsevier Science Ltd.

Nixon, Andrew E. et al. Incremental Truncation as a Strategy in the Engineering of Novel

Biocatalysts. (15 October 1998), Bioorganic & Medicinal Chemistry 7 (1999)

2139–2144.

Ohtsuka, E., Matsuki, S., Ikehara, M., Takahashi, Y. & Matsubara, K. 1985. An alternative

approach to deoxyoligonucleotides as hybridization probes by insertion of deoxyinosine

at ambiguous codon positions. Journal of Biological Chemistry, 260, 2605-2608.

Ostermeier, Marc et al. Combinatorial Protein Engineering by Incremental Truncation.

(1999), Proc. Natl. Acad. Sci. USA Vol. 96, pp. 3562–3567, March 1999 Biochemistry.

Rubin-Pitel, Sheryl B. and Zhao, Huimin. (2006). Recent Advances in Biocatalysis by

Directed Enzyme Evolution. Combinatorial Chemistry & High Throughput Screening,

9, 247-257.

Tao, H. & Cornish, V. W. (2002). Milestones in directed enzyme evolution. Curr Opin Chem

Biol, 6, 858-64.

Williams, G. J., Nelson, A. S., and Berry, A. (2004). Directed Evolution of Enzymes for

Biocatalysis and the Life Sciences. Cellular and Molecular Life Sciences, 61, 3034–3046.

Zakour, R. A., Kunkel, T. A. & Loeb, L. A. 1981. Metal-induced infidelity of DNA

synthesis. Environ Health Perspect, 40, 197-205.
