Directed Evolution (2)
Uploaded by ardiellaputri
Chapter 1
DIRECTED EVOLUTION
(Overall Strategies and Methods for Improved Enzymatic Performance)
1.1. Introduction
Enzymes are the products of biological evolution, which has taken millions of years.
They are among the most remarkable biomolecules known because of their extraordinary
specificity and catalytic power, which are far greater than those of man-made catalysts.
However, because they are adapted to their physiological roles, the activity and stability
of naturally occurring enzymes are often far from what organic chemists, biochemists
and biotransformation specialists need. This is true for the stability of enzymes in organic
solvents and for certain other reactions that require high selectivity and ultimately yield
industrially important compounds.
It is widely recognized that enzymes hold tremendous potential for industry.
Currently, over 500 products across a wide spectrum of applications utilize enzymes in their
manufacture. A variety of enzymes have been in industrial use since before genetic
engineering appeared. However, merely a couple of dozen enzymes account for over 90% of
total industrial enzyme use. Some common examples are listed in Table 1.1.
Table 1.1 Important Proteins Used Industrially
Protein                                 Function
Amylases                                Hydrolysis of starch for brewing
Lactase                                 Hydrolysis of lactose in milk processing
Invertase                               Hydrolysis of sucrose
Cellulase                               Hydrolysis of cellulose from plant materials
Glucose isomerase                       Conversion of glucose to fructose for high-fructose syrups
Pectinase                               Hydrolysis of pectins to clarify fruit juices, etc.
Proteases (ficin, bromelain, papain)    Hydrolysis of proteins for meat tenderizing and clarification of fruit juices
Rennet                                  Protease used in cheese making
Glucose oxidase                         Antioxidant in processed foods
Catalase                                Antioxidant in processed foods
Lipases                                 Lipid hydrolysis in preparing cheese and other foods
(Source. Clark, David P., et al. (2012). Biotechnology: Academic Cell Update. London: Academic Press of
Elsevier Inc.)
Three main approaches, distinct yet complementary, have been used over the past several
decades to develop enzymes with optimal catalytic performance (Figure 1.1). The
objective is to engineer proteins so that they can be used under industrial conditions
without being denatured and losing activity. It is also possible to alter proteins to
change the specificity of their enzyme activities or even to create entirely new enzyme
activities. Ultimately, it may be possible to design proteins from basic principles.
One approach is rational design, in which site-specific changes are made to the target
enzyme with the aid of detailed knowledge of the protein structure, function, and catalytic
mechanism. Another approach is directed evolution, which involves repeated cycles of
random mutagenesis and/or gene recombination followed by high-throughput screening or
selection of functionally improved mutants. The third approach, bioprospecting, searches
the diversity that already exists in nature for suitable enzymes.
Figure 1.1. Existing approaches for developing commercially viable enzymes, which often require optimized
features such as activity, selectivity and stability. Among the three most widely used approaches,
directed evolution (1), rational design (2), and bioprospecting (3), directed evolution is considered the most
effective, in terms of time and cost, at filling the functional gap between naturally occurring enzymes and
commercially viable enzymes.
(Source. Rubin-Pitel, Sheryl B. and Zhao, Huimin. (2006). Recent Advances in Biocatalysis by Directed
Enzyme Evolution. Combinatorial Chemistry & High Throughput Screening, 9, 247-257)
1.2. Directed Evolution
Generally, there are two main strategies for protein engineering: directed evolution and
rational design, which can be combined into semi-rational design or focused directed
(designed) evolution (Figure 1.2a). Directed evolution is based on three fundamental
steps: producing a mutant library from the parental protein by introducing sequence
diversity; identifying the desired variants by efficient screening or selection; and
further mutagenesis and recombination of the selected variants to improve the protein
further (Cole and Gaucher, 2011).
Figure 1.2. (a) Overview of approaches for protein engineering by random, rational and combined
methods (b) Overview of directed enzyme evolution.
(Source. (a) Steiner, Kerstin and Schwab, Helmut. (2012). Recent Advances in Rational Approaches for
Enzyme Engineering. Computational and Structural Biotechnology Journal,2, 1-12 (b) Tao, H. & Cornish, V.
W. (2002). Milestones in directed enzyme evolution. Curr Opin Chem Biol, 6, 858-64)
Sequence diversity and screening/selection are both important for obtaining a desired
property in an enzyme catalyst by in vitro evolution. Researchers generally prefer to design
large libraries and then use high-throughput assays to select the desired variant. However, a
high-throughput assay can capture variants that are improved only for a proxy of the desired
function. To avoid such unwanted variants, more accurate low-throughput assays should be
considered if the library size can be reduced without losing functional diversity; otherwise, it
is worth remembering that ‘you get what you screen for’.
The advantage of directed evolution is that no structural information is needed and
that variations at unexpected positions distant from the active site can be introduced.
However, the changes are usually small, so several rounds of evolution have to be applied
and a large number of variants have to be screened, which is time- and labor-consuming
and requires cheap, fast and reliable high-throughput assays.
1.3. Directed Evolution: Improving Enzyme Properties
Directed evolution has enjoyed great success in improving existing enzyme
characteristics. In the following sections, only a few selected examples will be highlighted.
Alterations have been made for almost all aspects of enzyme properties, such as substrate
specificity, product specificity, selectivity, activity, stability, or folding/solubility. Such
alterations are required for enzymes to become practically useful biocatalysts or therapeutics.
Directed evolution represents a highly effective strategy for discovering and
optimizing enzymes for industrial applications. It is complementary in its approach to
alternative, equally powerful methods that exploit the inherent diversity that already exists in
nature. Directed evolution is also very effective in engineering enzyme stability and activity.
Unlike rational design, which tends to improve one enzyme property at a time (in fact,
attempts to rationally alter one enzyme property often disrupt other existing important
characteristics), directed evolution may improve multiple enzyme properties simultaneously.
For example, five rounds of directed evolution consisting of alternating cycles of
error-prone PCR and in vitro gene recombination, coupled with screening, led to the isolation
of a highly stable and active subtilisin E mutant. This mutant contained eight thermostabilizing
mutations, which were distributed all over the protein structure. It showed a >200-fold longer
thermal inactivation half-life at 65°C, an 18°C higher temperature optimum, and a >5-fold
higher activity than the wild-type enzyme.
Another impressive example is the simultaneous improvement by directed evolution of four
distinct properties of subtilisin: thermostability, activity in organic solvents, activity at pH
10, and activity at pH 5.5. Family shuffling was used to recombine 26 homologous subtilisin
genes to create a library of chimeric subtilisin genes. Out of 654 active subtilisins, a few
mutants showed significant improvement over any of the parental enzymes for each individual
enzyme property.
1.4. Methods of Directed Evolution
A range of strategies is available for introducing diversity into the starting gene(s).
These can be broadly divided into two classes, (i) non-recombinative and (ii)
recombinative methods, and can create libraries ranging from as few as 200 variants to
many tens of thousands of variants.
Figure 1.3. The directed evolution cycle requires the gene (or genes) of interest, but there is no
requirement for a detailed knowledge of structure or function. Diversity may be introduced using a range of
methods, and after expression variants with the desired property are selected or screened out of the mixture.
Further rounds of directed evolution may be carried out using the first-generation DNA as parent for the second.
(Source. Williams, G. J., Nelson, A. S., and Berry, A. (2004). Directed Evolution of Enzymes for Biocatalysis
and the Life Sciences. Cellular and Molecular Life Sciences, 61, 3034–3046)
1.4.1. Non-recombinative methods
Saturation Mutagenesis
Non-recombinative methods generally create diversity via point mutation and
include the directed substitution of single amino acids, the insertion or deletion of
more than one amino acid, for example by cassette mutagenesis, and random
mutagenesis across the whole gene. Thus, a variety of methods are available
depending on the extent of mutation required. In cases where a high-resolution
structure of the target protein with bound substrate or inhibitor is available, residues
which contact the substrate can be identified and can be hypothesized to be
responsible in varying degrees for the natural reaction specificity.
Mutating these contacting residues to each of the other 19 amino acids by saturation
mutagenesis (SM) can often identify variants with significantly altered substrate
specificity. For example, Schultz and co-workers used saturation mutagenesis at five
positions in the active site of the Methanococcus jannaschii tyrosyl transfer RNA
(tRNA) synthetase to alter the amino acid specificity so that it accepts only an
unnatural amino acid. Using several rounds of positive and negative growth selection,
a mutant synthetase was obtained whose kcat/Km for the target unnatural amino acid,
O-methyl-L-tyrosine, was 100-fold higher than for the natural substrate tyrosine.
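To give a sense of scale for an experiment of this kind, the sketch below counts the variants in a library that fully randomizes five positions. The NNK codon scheme (32 codons that cover all 20 amino acids) and the 95% coverage target are assumptions added for illustration and are not taken from the study described above. The oversampling estimate uses the common approximation that picking T clones from V equally likely variants samples a fraction 1 - exp(-T/V) of them.

```python
import math

AMINO_ACIDS = 20   # standard proteinogenic amino acids
NNK_CODONS = 32    # assumption: NNK degenerate codons (N = A/C/G/T, K = G/T)


def library_size(positions: int, options_per_position: int) -> int:
    """Number of distinct variants when every position is fully randomized."""
    return options_per_position ** positions


def clones_for_coverage(variants: int, coverage: float) -> int:
    """Clones to screen so that roughly `coverage` of all variants is sampled.

    Assumes every variant is equally likely, so the expected fraction seen
    after T picks is 1 - exp(-T / variants).
    """
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must be strictly between 0 and 1")
    return math.ceil(-variants * math.log(1.0 - coverage))


if __name__ == "__main__":
    positions = 5  # five randomized active-site positions, as in the example
    protein_variants = library_size(positions, AMINO_ACIDS)
    dna_variants = library_size(positions, NNK_CODONS)
    print(f"protein-level variants: {protein_variants:,}")
    print(f"NNK codon combinations: {dna_variants:,}")
    print(f"clones for ~95% coverage: {clones_for_coverage(dna_variants, 0.95):,}")
```

Under these assumptions, five positions already correspond to 3,200,000 protein sequences, which suggests why a growth selection rather than a clone-by-clone screen is attractive for libraries of this size.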
A disadvantage of using crystal structures to identify residues thought to be
responsible for substrate specificity is that this approach may ignore residues distant
from the active site.
Error-Prone PCR (epPCR)
Error-prone PCR was the first method described for achieving random mutagenesis.
The technique relies on the fact that Taq DNA polymerase lacks proofreading activity
and therefore incorporates mispaired nucleotides at a frequency of 0.1 × 10^-4 to
2 × 10^-4 per nucleotide during strand extension in a PCR reaction. Despite the
important and growing use of non-recombinative methods for producing variant
libraries, the most significant changes in enzyme function have been created using
recombinative methods. The fidelity of several DNA polymerases has been
characterized, and among them Taq polymerase has the lowest fidelity, which makes
it the best candidate for in vitro mutagenesis (Cadwell and Joyce, 1992).
In addition to using Taq DNA polymerase, increasing the concentration of MgCl2,
adding nucleotide analogs, or adding MnCl2 can promote mispairing during PCR (Kaur
and Sharma, 2006). DNA polymerase has one binding site for the template, one for
dNTP and one for dNMP. Binding of Mn2+ affects base-pairing properties by altering
the template and substrate molecules. It also interacts with the DNA polymerase,
reducing the enzyme's selectivity for nucleotides before they are inserted (Beckman
et al., 1985).
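The error frequencies quoted above can be turned into a rough mutation load per gene. The sketch below is a back-of-the-envelope estimate, not a protocol: it assumes that errors accumulate linearly with the number of template doublings and that the number of mutations per gene copy follows a Poisson distribution. The gene length (1000 bp) and the number of doublings (20) are hypothetical values chosen only for illustration.

```python
import math


def expected_mutations(error_rate: float, gene_length_bp: int, doublings: float) -> float:
    """Mean mutations per gene copy.

    Simplifying assumption: rate per nucleotide per duplication
    multiplied by gene length and by the number of doublings.
    """
    return error_rate * gene_length_bp * doublings


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k mutations under a Poisson model."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


if __name__ == "__main__":
    gene_length_bp = 1000   # hypothetical gene length
    doublings = 20          # hypothetical number of template doublings
    for rate in (0.1e-4, 2e-4):  # error-frequency range quoted in the text
        mean = expected_mutations(rate, gene_length_bp, doublings)
        unmutated = poisson_pmf(0, mean)
        print(f"rate {rate:.1e}: mean {mean:.1f} mutations/gene, "
              f"{unmutated:.1%} of copies unmutated")
```

With these hypothetical inputs the two ends of the quoted range give means of about 0.2 and 4 mutations per gene, which illustrates why reaction conditions are tuned to reach a chosen mutation frequency.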
Figure 1.4. Proposed mechanisms for infidelity during DNA replication by metal ions
(Source. Zakour, R. A., Kunkel, T. A. & Loeb, L. A. 1981. Metal-induced infidelity of DNA synthesis.
Environ Health Perspect, 40, 197-205)
dITP is a naturally occurring base analog which is occasionally found at the first
position of the tRNA anticodon. It can pair, with or without hydrogen bonds, with any
of the four nucleotides. In the tRNA anticodon it pairs with A, C, G and U, and with poly
(Aristarkhova et al.) they make a stable complex (Ohtsuka et al., 1985).
The important point to consider in error-prone PCR is that beneficial mutations
are rare compared with deleterious ones, so a combination of beneficial and
deleterious mutations can yield an inactive enzyme. The mutation frequency should
therefore be kept low in order to obtain a high number of desired variants. Most
available error-prone PCR protocols are not random enough: they favor transitions
over transversions. Transitions exchange one pyrimidine for another pyrimidine, or
one purine for another purine (AT↔GC and TA↔CG), while transversions exchange a
purine for a pyrimidine or a pyrimidine for a purine (AT↔CG, AT↔TA, GC↔CG, GC↔TA).
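The distinction between transitions and transversions can be made concrete with a small classifier. This is a minimal sketch for single-base substitutions on one strand; the function names and the example sequences are illustrative and do not come from any cited protocol.

```python
from typing import Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    # Same chemical class (purine/purine or pyrimidine/pyrimidine) = transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def count_substitutions(parent: str, mutant: str) -> Tuple[int, int]:
    """Count (transitions, transversions) between two equal-length sequences."""
    if len(parent) != len(mutant):
        raise ValueError("sequences must have the same length")
    transitions = transversions = 0
    for ref, alt in zip(parent.upper(), mutant.upper()):
        if ref == alt:
            continue
        if classify_substitution(ref, alt) == "transition":
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


if __name__ == "__main__":
    # Illustrative sequences: A->G is a transition, G->T is a transversion.
    print(count_substitutions("ACGT", "GCTT"))
```

Of the twelve possible single-base substitutions, four are transitions and eight are transversions, so an unbiased mutagenesis method would be expected to give about twice as many transversions as transitions. Comparing observed counts against that baseline is one simple way to quantify the bias described above.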
Table 1.2 Non-recombinative methods
Method: Error-prone PCR
Advantages: Simplicity
Disadvantages: Accumulates deleterious mutations; limited amino acid substitutions; polymerase bias
Method: Saturation mutagenesis
Advantages: Simplicity; mutates specific site(s) in a gene; accesses all 20 amino acids
Disadvantages: Limited diversity generation; gene sequence required
1.4.2. Recombinative Methods
DNA Shuffling
Natural selection works on new sequences generated both by mutation and
recombination. DNA shuffling is a method of artificial evolution that includes the
creation of novel mutations as well as recombination. The gene to be improved is cut
into random segments around 100 to 300 base pairs long. The segments are then
reassembled by using a suitable DNA polymerase with overlapping segments or by
using some version of overlap PCR. This recombines segments from different copies
of the same gene (Figure 1.5a).
A more powerful variant of DNA shuffling is to start with several closely
related (i.e., homologous) versions of the same gene from different organisms. The
genes are cut at random with appropriate restriction enzymes and the segments mixed
before reassembly. The result is a mixture of genes that have recombined different
segments from different original genes (Figure 1.5b). Note that the reassembled
segments keep their original natural order. For example, several related β-lactamases
from different enteric bacteria have been shuffled. The shuffled genes were cloned
onto a plasmid vector and transformed into host bacteria. The bacteria were then
screened for resistance to selected β-lactam antibiotics. This approach yielded
improved β-lactamases that degraded certain penicillins and cephalosporins more
rapidly and so made their host cells up to 500-fold more resistant to these β-lactam
antibiotics.
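The idea that shuffled segments come from different parents while keeping their original order can be illustrated with a toy simulation. The sketch below makes several simplifying assumptions that do not hold in a real experiment: the parents are pre-aligned and of equal length, crossover points are chosen uniformly at random (real shuffling favors regions of high homology), and no new point mutations are introduced. The function name and the example sequences are hypothetical.

```python
import random
from typing import List, Sequence


def shuffle_family(parents: Sequence[str], n_crossovers: int, rng: random.Random) -> str:
    """Build one chimera from equal-length, pre-aligned parent sequences.

    Segments keep their original order; only the parent supplying each
    segment changes at a crossover point.
    """
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to the same length")
    if not 0 <= n_crossovers < length:
        raise ValueError("n_crossovers must be between 0 and length - 1")
    # Simplification: crossover positions are uniform along the gene.
    cuts = sorted(rng.sample(range(1, length), n_crossovers))
    bounds = [0] + cuts + [length]
    chimera: List[str] = []
    source = rng.randrange(len(parents))
    for start, end in zip(bounds, bounds[1:]):
        chimera.append(parents[source][start:end])
        # Switch to a different parent at each crossover.
        if len(parents) > 1:
            source = rng.choice([i for i in range(len(parents)) if i != source])
    return "".join(chimera)


if __name__ == "__main__":
    # Homopolymer "parents" make the origin of each segment easy to see.
    parents = ["A" * 30, "C" * 30, "G" * 30]
    rng = random.Random(0)
    for _ in range(3):
        print(shuffle_family(parents, n_crossovers=3, rng=rng))
```

Because each position in the chimera is copied from the same position in one of the parents, the output always has the parental length, which mirrors the statement that reassembled segments keep their natural order.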
Figure 1.5. (a) DNA Shuffling for a Single Gene. Introducing point mutations and shuffling gene
segments can generate a better version of a protein. First, many copies of the original gene are
generated with random mutations. The genes are then cut into random segments. Last, the fragments
are reassembled using overlap PCR. The new constructs must be assessed for enhanced protein
function. (b) DNA Shuffling for Multiple Related Genes. Shuffling segments from related genes can
also enhance the function of a particular protein. The original set of related genes are digested into
small fragments and reassembled using PCR. The new combinations are tested for a change in function.
(Source. Clark, David P., et al. (2012). Biotechnology: Academic Cell Update. London: Academic
Press of Elsevier Inc.)
Family Shuffling
Family shuffling applies DNA shuffling to a group of naturally occurring
homologous genes rather than laboratory-created mutants. It significantly
accelerates the rate of functional enzyme improvement in a single
recombination-selection cycle. Although they are powerful methods, DNA shuffling
and family shuffling are not without limitations. Shuffling methods require
zones of relatively high sequence homology surrounding the regions of diversity.
Additionally, significant biases are found in where crossover events occur and
in which parents are involved: crossover tends to occur in regions of higher
homology, and among parents which share greater sequence identity. Bias is also
introduced by nonrandom gene fragmentation by the DNaseI enzyme. All of these
factors limit the diversity created in a shuffled library. In extreme cases, lack of
homology among parents can lead to the majority of reconstructed “shuffled”
sequences entirely representing a single parent.
Staggered Extension Protocol (StEP)
DNA shuffling requires a large amount of template DNA. The Staggered Extension
Protocol (StEP) is another method that was developed to overcome the limitations
of DNA shuffling. This method does not require the DNaseI fragmentation step
and yields chimeric genes through template switching. The template sequences go
through repeated cycles of denaturation and extremely short annealing/
polymerase-catalyzed extension. In each cycle, the growing fragments anneal to
different templates based on sequence complementarity and extend further. This is
repeated until full-length sequences form. This technique has been used in the
evolution of thermostable subtilisin: five thermostabilized subtilisin E variants
identified by a single round of epPCR were recombined by StEP, and screening of
the resulting library yielded a subtilisin E variant whose half-life at 65 °C was
50 times that of the wild type.
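The template-switching cycle described above can be sketched as a toy simulation. It is a simplified model under stated assumptions, not a description of the actual reaction: the templates are pre-aligned and of equal length, the template used in each cycle is picked uniformly at random (in reality annealing depends on sequence complementarity), and the extension length per cycle is drawn uniformly from a hypothetical range.

```python
import random
from typing import List, Sequence, Tuple


def step_recombine(
    templates: Sequence[str], min_ext: int, max_ext: int, rng: random.Random
) -> Tuple[str, int]:
    """Grow one strand by short extensions, re-annealing each cycle.

    Returns the full-length product and the number of cycles it took.
    """
    length = len(templates[0])
    if any(len(t) != length for t in templates):
        raise ValueError("templates must be aligned to the same length")
    if not 1 <= min_ext <= max_ext:
        raise ValueError("need 1 <= min_ext <= max_ext")
    product: List[str] = []
    position = 0
    cycles = 0
    while position < length:
        template = rng.choice(templates)           # anneal to some template
        extension = rng.randint(min_ext, max_ext)  # very short extension step
        end = min(position + extension, length)
        product.append(template[position:end])     # copy from that template
        position = end
        cycles += 1
    return "".join(product), cycles


if __name__ == "__main__":
    templates = ["A" * 40, "C" * 40]  # homopolymers make switches visible
    product, cycles = step_recombine(templates, 3, 8, random.Random(42))
    print(product, cycles)
```

In this model the number of cycles needed to reach full length is bounded by the gene length divided by the shortest and longest extension steps, which gives some intuition for why StEP needs many short cycles and tight control of the PCR conditions.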
Random Chimeragenesis on Transient Templates (RACHITT)
RACHITT does not use thermocycling, strand switching, or staggered
extension of primers. Instead, a uracil-containing parent gene is made single-stranded
to serve as a scaffold for ordering top-strand fragments of additional,
homologous parent gene(s), and recombination occurs when fragments from different
parent genes hybridize to the scaffold. Pfu DNA polymerase 3’-5’ exonuclease
activity removes the unhybridized 5’ or 3’ overhanging “flaps” created by fragment
annealing, and the polymerase also fills the gaps between the annealed fragments
using the transient scaffold as a template.
The template strand is then eliminated by treatment with uracil-DNA glycosylase
before the template-chimera hybrid is subjected to PCR, which results in
amplification of double-stranded, homoduplex chimeric gene sequences. The
process of RACHITT recombination is illustrated in Figure 1.6. RACHITT provides a
significantly higher rate of crossover than other family shuffling methods,
with an average of 14 crossovers per gene versus one to four crossovers for most other
methods. RACHITT also generates 100% chimeric progeny with no duplication of
recombination patterns among the chimeric genes. Although the benefits of this method
are obvious, its use may be limited by its complexity and by the requirement to create
single-stranded gene fragments as well as a single-stranded, uracil-containing DNA template.
Figure 1.6. Random homologous DNA recombination by RACHITT
(Source. Rubin-Pitel, Sheryl B., et al. (2001). Chapter 3. Directed Evolution Tools in Bioproduct and
Bioprocess Development. Net Bioethanol, 19, 419-427)
Exon Shuffling
In exon shuffling, DNA fragments containing exons are amplified with a mixture of
synthetic chimeric oligonucleotides, causing the fragments to be spliced together
randomly. These spliced fragments are then assembled by primerless PCR,
where individual fragments prime against each other to recreate a full-length gene.
Recombination occurs when a chimeric oligonucleotide connects an exon from one
parent gene to a second exon from a different parent gene. The diversity in an exon
shuffling library is controlled by the number of modules that are recombined and
the number of homologs that are included for each module; in some cases, the
availability of homologous domains may limit the creation of a shuffled library. The
diversity of an exon shuffling library can also be controlled experimentally through
the design of the chimeric oligonucleotides, which can facilitate certain connections
between domains but not others, or by modifying the molar ratio of domain-encoding
fragments to control the stoichiometry of the individual domains in the progeny. As
with other recombination methods, additional diversity can be created in the library by
introducing random point mutations, insertions, or deletions. Rearranging the order of
domain-encoding exons also creates novel diversity.
Figure 1.7. Method of non-homologous recombination by exon shuffling
(Source. Rubin-Pitel, Sheryl B., et al. (2001). Chapter 3. Directed Evolution Tools in Bioproduct and
Bioprocess Development. Net Bioethanol, 19, 419-427)
Incremental Truncation for the Creation of Hybrid Enzymes (ITCHY)
To overcome a disadvantage of DNA shuffling, which can create crossovers
only in homologous regions, Ostermeier et al. created an approach for generating
fusion libraries between two gene fragments, called Incremental Truncation for the
Creation of HYbrid enzymes (ITCHY). Two parental genes are digested with
exonuclease III under controlled conditions to generate truncated gene libraries
with progressive 1 bp deletions.
The truncated 5’ fragments of one gene are fused to truncated 3’ fragments of
the other gene, which yields a library of chimeric sequences that are then expressed
and screened or selected for improved enzyme activity. This allows the creation of
functional fusions of genes from overlapping amino- or carboxy-terminal gene
fragments, independent of DNA sequence homology. However, this method has a
lengthy protocol and requires extensive time-point sampling. To overcome these
shortcomings, an alternative procedure, termed THIO-ITCHY, was developed to
create ITCHY libraries using nucleotide triphosphate analogs such as
α-phosphorothioate dNTPs.
The incorporated nucleotide analogs protect the DNA from exonuclease digestion
and hence lead to the desired variation in truncation length upon nuclease treatment.
The two target gene fragments can be combined in a single vector, because the
generation of diversity is no longer a function of timed exonuclease digestion but is
instead based on the random distribution of the α-phosphorothioate nucleotides.
Figure 1.8. Methods used for creating libraries using directed evolution
(Source. Kaur, J. and Sharma, R. (2006). Directed Evolution : An Approach to Engineer Enzymes. Critical
Reviews in Biotechnology, 26, 165-199)
Table 1.3 Recombinative methods
Method: DNA shuffling
Advantages: Robust, flexible; back-crossing to parent removes non-essential mutations
Disadvantages: Biased to crossover in high-homology regions; low crossover rate; high percentage of parents
Method: Family shuffling
Advantages: Exploits natural diversity; accelerates functional enzyme improvement
Disadvantages: Biased to crossover in high-homology regions; needs high sequence homology in the gene family; high percentage of parents
Method: StEP
Advantages: Simplicity
Disadvantages: Needs high homology; low crossover rate; needs tight control of PCR
Method: RACHITT
Advantages: No parent genes in a shuffled library; higher rate of recombination; recombines genes of low sequence homology
Disadvantages: Complex; requires synthesis and fragmentation of single-stranded complement DNA
Method: Exon shuffling
Advantages: Preserves exon function
Disadvantages: Requires known intron-exon organization of the target gene; limited diversity
Method: ITCHY
Advantages: Eliminates recombination bias; structural knowledge not needed
Disadvantages: Limited to two parents; significant fraction of progeny out of frame; complex, labor-intensive
Method: THIO-ITCHY
Advantages: Same advantages as ITCHY; combines recombination and random mutagenesis; simplified ITCHY method
Disadvantages: Same disadvantages as ITCHY; incorporated dNTP analogs may complicate further experimentation
Chapter 2
CASE STUDIES
2.1 Biocatalysis Engineering of GAR Transformylase using Incremental Truncation for
Creation of Hybrid Enzymes
2.1.1. Background
Figure 1. DNA Shuffling and Crossover Point
(Source: http://academic.pgcc.edu/~kroberts/Lecture/Chapter%207/07-29_Recombination_L.jpg)
DNA shuffling has been used to improve enzyme activity, stability and folding, and to
alter substrate specificity. In this technique, parental genes are fragmented and subsequently
reassembled by PCR to reconstitute the full-length genes. During this reassembly process,
novel combinations of the parental genes arise along with new point mutations. The result of
DNA shuffling is a large library of mutant genes, from which variants that have acquired a
desired function are isolated using an appropriate selection or screening system. This method
requires relatively high levels of DNA homology to recombine genes in vitro. Consequently,
DNA shuffling cannot exploit a large portion of the total combinatorial space, because crossover
points between shuffled genes occur only in regions of relatively high DNA homology
and at loci of identity.
Crossovers between structurally homologous proteins at sites lacking DNA homology
are likely to be productive for protein engineering. Exchange of non-homologous low-energy
structures was a more productive strategy than DNA shuffling. However, no combinatorial
strategy for creating hybrids between genes that lack DNA homology has been demonstrated.
While it is true that DNA shuffling of families of genes with DNA homology can create
hybrid enzymes with new properties, such molecular breeding is only feasible for genes with
high genetic homology and, for this reason, is unlikely to evolve an entirely novel function. It
is important to realize that the primary rationale for success in the shuffling of families of
genes is the similarity of the three-dimensional structures of the proteins they encode, not the
degree of DNA homology. Indeed, it is an interesting question whether successful directed
evolution on homologous families might be equally or better served by the creation of genes
with crossovers between family members at regions of little or no genetic homology.
Incremental gene truncation libraries can be used to identify loci for the functional
bisection of proteins, and a number of protein engineering strategies that utilize
incremental truncation have been proposed. A combinatorial method for biocatalysis
engineering called ITCHY (Incremental Truncation for the Creation of Hybrid Enzymes)
creates combinatorial libraries between two genes in a manner that is independent of DNA
sequence homology. ITCHY libraries allow the identification of a more diverse set of
functional fusions than DNA shuffling.
2.1.2. Basic Principle
2.1.2.1 Incremental truncation
Knowing where to make the fusions is a central problem in the creation of such
hybrids. Since existing methodologies for genes lacking high homology were limited to ‘try it
and see if it works’, a combinatorial approach to this problem, termed incremental
truncation, was developed. Incremental truncation makes it possible to create fusion
libraries of many (or all) different combinations of lengths of two genes. The approach is
thus a combinatorial solution to the questions ‘where can enzymes or enzyme fragments be
fused to produce active hybrids?’ and ‘where are the points at which an enzyme can be
bisected?’. In addition, it should circumvent the homology limitations of DNA shuffling by
allowing genes to be shuffled independently of sequence homology.
For the average size gene, the separate construction of all possible one-codon
truncations would require the assembly of hundreds of plasmids, a labor intensive and time
consuming task. Incremental truncation of DNA, on the other hand, allows the construction
of a library containing all possible truncations of a gene, gene fragment or DNA library in a
single experiment (Figure 2).
Incremental truncation is achieved by utilizing the slow, directional, controlled
digestion of DNA. During this digestion, small aliquots are frequently removed and the
digestion quenched. Thus by taking multiple samples over a given time period we can create
a library of all possible single base-pair deletions of a given piece of DNA.
Exonuclease III (Exo III) exhibits such properties. Exo III has previously been shown
to be useful for creating large truncations of linear DNA and in techniques for sequencing
large genes. Its digestion rate at 37 °C (500 bases/min) is much too fast for incremental
truncation, where every one-codon deletion is desired. However, the digestion rate of the
exonuclease can be reduced by a variety of methods, such as lowering the incubation
temperature, altering the composition of the digestion buffer, including a nuclease
inhibitor, or lowering the ratio of enzyme to DNA.
Figure 2. Incremental Truncation
(Source: http://www.sciencedirect.com/science/article/pii/S0968089699001431)
Incremental truncation is a method for creating a combinatorial library containing one
base pair deletions of a gene or gene fragment of interest. In this protocol, truncations are
introduced in opposite directions on fragments from two different genes in two separate
reactions. The sets of truncated DNA molecules from each digestion are ligated to each other
with DNA ligase. The resulting “fusions” are cloned as chimeric molecules. The library of
cloned fusions is transformed into bacteria and used for further experiments (e.g., phage
display, enzymatic activity assay, etc.).
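The size of such a fusion library, and the share of fusions that keep the reading frame, can be estimated by simple enumeration. The sketch below is an idealized count under assumptions that are not part of the protocol: every single-base-pair truncation of both fragments is present and equally represented, both gene lengths are multiples of three, and a fusion is counted as in frame when the retained lengths of the two fragments sum to a multiple of three. The example lengths (600 bp and 900 bp) are hypothetical.

```python
from typing import Tuple


def fusion_library(len_a_bp: int, len_b_bp: int) -> Tuple[int, int]:
    """Return (total fusions, in-frame fusions) for two truncation libraries.

    Gene A contributes a 5' fragment of 1..len_a_bp retained bases and
    gene B a 3' fragment of 1..len_b_bp retained bases. Assumes both
    lengths are multiples of 3, so a fusion is in frame when the retained
    lengths sum to a multiple of 3.
    """
    if len_a_bp % 3 or len_b_bp % 3:
        raise ValueError("this sketch assumes gene lengths divisible by 3")
    total = len_a_bp * len_b_bp
    in_frame = sum(
        1
        for a in range(1, len_a_bp + 1)
        for b in range(1, len_b_bp + 1)
        if (a + b) % 3 == 0
    )
    return total, in_frame


if __name__ == "__main__":
    total, in_frame = fusion_library(600, 900)  # hypothetical gene lengths
    print(f"possible fusions: {total:,}")
    print(f"in-frame fusions: {in_frame:,} ({in_frame / total:.1%})")
```

In this idealized count exactly one third of the fusions are in frame, which is consistent with the out-of-frame progeny listed as a disadvantage of ITCHY in Table 1.3.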
2.1.2.2 Hybrid Enzymes
Hybrid enzymes are engineered to contain elements of two or more enzymes. They can
therefore be generated in a number of ways (Figure 3): an existing enzyme can be altered
by a single point mutation (or series of point mutations) based on structures existing in a
second enzyme; similarly, secondary-structural elements or whole domains of enzymes, or
monomeric units of multimeric enzymes, can be exchanged; fusions between two enzymes
that have separate and distinct activities are also, by this definition, hybrid enzymes.
The construction of hybrid enzymes parallels the strategies that nature uses to evolve
enzymes. It is generally thought that enzymes have evolved to fit a specific niche in biology
through such processes as gene duplication, domain recruitment and fixation of multiple
point mutations. Similarly, hybrid-enzyme approaches seek to recruit established functions
and properties from existing enzymes and incorporate them into the engineered enzyme.
Hybrid enzymes have often been used to determine the differences between related
enzymes, identifying those residues or structures that impart a specific property that one
enzyme has but another, homologous, enzyme does not. For example, hybrids between two
highly homologous proteinases from Lactococcus lactis were used to determine which
residues were responsible for their cleavage specificity and rate towards α- and β-casein.
The hybrids were also used to identify an additional unique domain involved in substrate
binding that was absent from related subtilisins. Hybrid enzymes have also been used to
investigate the relative merits of structural and sequence alignments between related
enzymes.
Figure 3. Generation of hybrid enzymes. (a) Substitution of point mutations, secondary structures or both
from enzyme A into a homologous enzyme B.
(b) Exchange of functional domains between enzymes C and D or fusion of the intact enzymes.
(Source: http://www.jhu.edu/chembe/ostermeier/pdf/04_TrendsBiotech.pdf)
2.1.2.3 Incremental Truncation for Creating Hybrid Enzymes
The combination of two incremental truncation libraries called ITCHY creates diversity
by fusing two gene fragments. Performing ITCHY on a single gene generates libraries of
proteins with internal deletions and duplications whereas performing ITCHY between two
different genes generate libraries of fusion proteins in a DNA-homology independent fashion.
ITCHY allows the creation of hybrid enzyme libraries between a random length
5’fragment of the gene encoding protein A and a random length 3’ fragment of the gene
encoding protein B. A key step in this process is the digestion of the parent genes with
exonuclease III (ExoIII) in the presence of NaCl such that the reaction rate is limited to ≤ 10
bases/min. During ExoIII digestion, small aliquots are removed at short intervals and
quenched by addition to a low-pH, high salt buffer. As ExoIII digests DNA at a relatively
uniform rate, members of the library ostensibly correspond to progressive 1 bp deletions.
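The truncation-and-fusion idea described above can be illustrated in silico. The following Python sketch is only an illustration under stated assumptions: the sequences are toy strings (not purN or GART), the 6-second sampling interval is a hypothetical value chosen so that a 10 bases/min digestion advances by 1 bp per aliquot, and the function names are invented for this example.

```python
from itertools import product


def aliquot_truncations(rate_bp_per_min, interval_s, n_aliquots):
    """Expected bases removed at each sampling time, assuming a uniform ExoIII rate."""
    return [rate_bp_per_min * interval_s * k / 60 for k in range(1, n_aliquots + 1)]


def itchy_fusions(frag_a, frag_b, max_trunc):
    """Enumerate every fusion of a 3'-truncated frag_a with a 5'-truncated frag_b.

    Returns (cut_a, cut_b, fused_sequence) tuples. No sequence homology is used
    to choose the fusion point.
    """
    fusions = []
    for cut_a, cut_b in product(range(1, max_trunc + 1), repeat=2):
        head = frag_a[: len(frag_a) - cut_a]  # 5' fragment of gene A, shortened at its 3' end
        tail = frag_b[cut_b:]                 # 3' fragment of gene B, shortened at its 5' end
        fusions.append((cut_a, cut_b, head + tail))
    return fusions


# Toy example with hypothetical sequences
library = itchy_fusions("ATGGCTAAAGGT", "CCGTTTGAATAA", max_trunc=3)
print(len(library))                   # 3 x 3 = 9 members
print(aliquot_truncations(10, 6, 3))  # [1.0, 2.0, 3.0]
```

Each library member carries exactly one crossover, which is why the size of an ideal library grows as the product of the two truncation ranges.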
Figure 4. Schematic overview of THIO-ITCHY using α-phosphothioate nucleotide incorporation by PCR
amplification. (a) Linearization of the starting plasmid by restriction digestion at the unique site between the two
genes or gene fragments. (b) PCR amplification of the entire linearized vector in the presence of a mixture of
dNTPs and αS-dNTPs as described in Materials and Methods. (c) Incubation of the plasmid with exonuclease III
results in hydrolysis of standard dNMPs while the dNMP analogs will block enzymatic degradation. (d) The
single-stranded overhangs of the plasmids are removed enzymatically with mung bean nuclease. (e) The blunt-
ended constructs are recircularized by intramolecular ligation.
(Source: http://nar.oxfordjournals.org/content/29/4/e16/F2.expansion)
2.1.3. Enzyme GAR Transformylase
Figure 5. GAR Transformylase Structure
(http://www.ebi.ac.uk/thornton-srv/databases/cgi bin/enzymes/GetPage.pl?ec_number=2.1.2.2)
GAR transformylase has an important role in purine biosynthesis, and its inhibition has
potential therapeutic benefit. Formyl transfer reactions play a key role in the
construction of the purine heterocycle during de novo purine biosynthesis. Formylation is
catalyzed early in the pathway by the purN glycinamide ribonucleotide transformylase (GAR
Transformylase, EC 2.1.2.2) in a tetrahydrofolate-dependent manner and also by the purT
GAR transformylase in a tetrahydrofolate-independent manner in bacteria.
Figure 6. Reaction Catalyzed by GAR Transformylase
(http://www.ebi.ac.uk/thornton-srv/databases/cgi bin/enzymes/GetPage.pl?ec_number=2.1.2.2)
2.1.4. Genes Encoding Enzyme
Property | E. coli | Human
Gene | purN | GART segment
Gene Function | Monofunctional GAR transformylase of 212 amino acids | Trifunctional enzyme glycinamide ribonucleotide synthetase-aminoimidazole ribonucleotide synthetase-glycinamide ribonucleotide transformylase
Enzyme Function | Catalyses the transfer of the formyl group from the cofactor N10-formyl-tetrahydrofolate (fTHF) to the amino group of GAR to yield formyl-glycinamide ribonucleotide (fGAR) | Utilizes cofactor fTHF and is functional as a separate domain
There is 50% identity at the DNA level between the two genes and 41% identity (60%
homology) at the amino acid level. Amino acid alignment between purN and the GART
segment reveals no gaps, although GART lacks nine amino acids at the C terminus. Structures
of the active sites of the two enzymes have been reported to be essentially identical, but the
structure of GART is not available in the Protein Data Bank.
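As a brief illustration of how an identity figure such as the 41% above is obtained from an alignment, the sketch below counts matching positions in two pre-aligned sequences. The sequences are hypothetical toy strings, the function name is invented for this example, and excluding gapped columns from the denominator is one convention among several.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of compared alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Assumption: columns containing a gap ('-') in either sequence are skipped
    compared = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


# Toy alignment: 3 of 4 positions match
print(percent_identity("MKIV", "MRIV"))  # 75.0
```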
2.1.5. Mechanism of ITCHY
2.1.5.1 Making ITCHY Libraries
ITCHY libraries for this case were created between 5’ fragments of purN (1-144) and
3’ fragments of GART (54-203). There are two libraries:
IT-A was created by electroporation into DH5α to obtain a larger library.
IT-B was created by electroporation into DH5α-E.
The chosen gene fragments overlap over a region of 270 bp; thus, as each gene fragment
would have 1 to 270 bp truncated, an ideal library containing one member of each of the
desired fusions would have 270 × 270 = 72,900 members. IT-B should contain all possible
fusions between the two gene fragments in the region of overlap. The size diversity of IT-A and
IT-B was evaluated on randomly selected library members and found to be essentially
random, but with a bias against small fusions (Figure 7).
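The library-size arithmetic above can be checked with a few lines of Python. The coverage estimate is an addition for illustration only: it assumes every fusion is equally likely to be sampled (the text notes a bias against small fusions, so real coverage would be lower), and the three-fold oversampling is a hypothetical value, not a figure from the source.

```python
import math


def ideal_library_size(max_trunc_a, max_trunc_b):
    """Number of distinct fusions when each fragment can lose 1..max_trunc bases."""
    return max_trunc_a * max_trunc_b


def expected_coverage(n_transformants, library_size):
    """Expected fraction of distinct members sampled at least once.

    Poisson approximation under the assumption of uniform sampling.
    """
    return 1.0 - math.exp(-n_transformants / library_size)


size = ideal_library_size(270, 270)
print(size)                                         # 72900
print(round(expected_coverage(3 * size, size), 2))  # 0.95 (hypothetical 3-fold oversampling)
```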
Figure 7. Size distribution of libraries. The sizes of the gene fusions of randomly selected members of IT-
A (m) and IT-B (P) were estimated by gel electrophoresis and arranged in descending size order. The
shaded area represents the theoretical size range based on the deletion of 1–270 bases of each fragment.
Fusions larger than the desired size range result from fusion of gene fragments in which truncation has
stopped in the approximately 30 bp spacer between the start of truncation and the gene to be truncated. The
dashed line indicates the size of hybrid genes that are fused where their parents’ sequences align.
(Source: Ostermeier, Marc et al. Combinatorial Protein Engineering by Incremental Truncation. (1999), Proc.
Natl. Acad. Sci. USA Vol. 96, pp. 3562–3567, March 1999 Biochemistry.)
Hybrids of PurN and GART with GAR transformylase activity were selected for on
minimal media using E. coli auxotroph TX680F, which lacks a functional GAR
transformylase. Because GAR transformylase activity is essential for purine biosynthesis,
TX680F is unable to grow on minimal media in the absence of purines. Plasmid DNA was
purified from the libraries and transformed into TX680F.
The number of active fusions per library of IT-A and IT-B was estimated to be 9 and
111, respectively, by multiplying the number of colonies on the selective plates by the library size and
dividing by the number of colony-forming units plated on the selective plates.
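The estimate described above is a simple proportion, sketched below in Python. The input values are hypothetical and chosen only to make the arithmetic easy to follow; they are not the colony counts or library sizes of IT-A or IT-B.

```python
def estimate_active_fusions(colonies_on_selective, library_size, cfu_plated):
    """Scale the observed frequency of positives up to the whole library.

    colonies_on_selective / cfu_plated is the fraction of plated cells that grew;
    multiplying by library_size gives the expected number of active members.
    """
    if cfu_plated <= 0:
        raise ValueError("cfu_plated must be positive")
    return colonies_on_selective * library_size / cfu_plated


# Hypothetical numbers: 12 colonies from 400,000 CFU plated, library of 100,000 members
print(estimate_active_fusions(12, 100_000, 400_000))  # 3.0
```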
Amino acid sequences of randomly selected active fusions of IT-A and IT-B were
determined by DNA sequencing (Table 1). Active genes were found with fusions in regions
of high and low homology (Figure 8) and within loops, α-helices and β-sheets of PurN.
Almost all fusion points of active hybrids occurred at sites of exact alignment (Figure 9).
Table 1. Active PurN-GART Fusions
(Source: Ostermeier, Marc et al. Combinatorial Protein Engineering by Incremental Truncation. (1999), Proc.
Natl. Acad. Sci. USA Vol. 96, pp. 3562–3567, March 1999 Biochemistry.)
Figure 8. Fusion points of active PurN–GART hybrids relative to the alignment of PurN and GART. Crossovers
of active fusions found by ITCHY are shown by a solid line and those found by DNA shuffling are shown by a
dashed line. To the left of the crossover point, the fusion has sequence from PurN. To the right of the crossover
point, the fusion has sequence from GART. If the crossover occurred in a region of DNA identity, the exact
fusion point could not be assigned and is shown at the 3’ end of the region of identity. Regions of sequence
identity are indicated in gray. *Three key active site residues. The region chosen to search for active fusions is
shown between the long dashed lines.
(Source: Ostermeier, Marc et al. Combinatorial Protein Engineering by Incremental Truncation. (1999), Proc.
Natl. Acad. Sci. USA Vol. 96, pp. 3562–3567, March 1999 Biochemistry.)
None of the sequenced, active members of IT-A or IT-B had any other mutations. Covalent
fusion of the two fragments was found to be necessary, as no interspecies heterodimers
between any combination of truncated fragments of PurN (1-144) and GART (54-203) were
able to complement the auxotroph.
Figure 9. Fusion points of active PurN–GART hybrids mapped onto the structure of PurN. The region
searched for active fusions (residues 54–144) is shown in green, and the area outside the search region is shown
in blue. The side chains of the three key active site residues, Asn106, His108, and Asp144, are shown in yellow.
The substrate GAR (top) and a cofactor analog (5-deaza-5,6,7,8-tetrahydrofolate) are shown in white. The
amino acids of PurN to which fusion of a GART fragment results in an active enzyme are shown in red. These
include fusion points identified by sequencing and those that can be inferred given that an active fusion was
found elsewhere in a region of amino acid identity (e.g., 104–113). The locations of fusion points of hybrids
characterized in Table 2 are indicated by the numbers.
(Source: Ostermeier, Marc et al. Combinatorial Protein Engineering by Incremental Truncation. (1999), Proc.
Natl. Acad. Sci. USA Vol. 96, pp. 3562–3567, March 1999 Biochemistry.)
2.1.5.2 Kinetic Characterization of Active Hybrids
Based on kcat/Km (GAR), all characterized fusions were found to have activities at
least 500-fold lower than that of wild-type PurN. In a very simplistic view, the Km of the
hybrids can be expected to be similar to that of PurN (because the PurN fragment contains most of
the residues important for binding) and the kcat to be similar to that of GART (because all or
part of the key active site residues derive from GART).
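The comparison above rests on the ratio of catalytic efficiencies, which the following Python sketch makes explicit. The kinetic constants are hypothetical values picked to give a 500-fold difference (units assumed to be s^-1 for kcat and µM for Km); they are not the measured constants of PurN or of any hybrid in Table 2.

```python
def catalytic_efficiency(kcat, km):
    """Catalytic efficiency kcat/Km."""
    return kcat / km


def fold_reduction(wild_type_eff, hybrid_eff):
    """How many times lower the hybrid's kcat/Km is than the wild type's."""
    return wild_type_eff / hybrid_eff


# Hypothetical constants: the hybrid has both a lower kcat and a higher Km
wild_type = catalytic_efficiency(kcat=20.0, km=10.0)
hybrid = catalytic_efficiency(kcat=0.5, km=125.0)
print(round(fold_reduction(wild_type, hybrid), 1))  # 500.0
```

Because kcat/Km combines both constants, a hybrid can lose activity through weaker binding (higher Km), slower turnover (lower kcat), or both.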
Table 2. Kinetic constants of selected PurN–GART fusions
(Source: Ostermeier, Marc et al. Combinatorial Protein Engineering by Incremental Truncation. (1999), Proc.
Natl. Acad. Sci. USA Vol. 96, pp. 3562–3567, March 1999 Biochemistry.)
2.1.5.3 Conclusion
ITCHY can create combinatorial libraries of genes in a manner that is independent of
DNA sequence homology, as shown by the identification of 10 active PurN-GART fusion
proteins between N-terminal fragments of E. coli GAR transformylase PurN and C-terminal
fragments of human GAR transformylase GART.
The optimum start of the GART domain may have been difficult to define and
nonnative E. coli codon usage at the N terminus of the fusions might have led to poor
expression.
Fusions within the active site may be less disruptive because of structural similarity
within this region, whereas fusions distal to the active site with lesser homology may
be more disruptive.
Insertions of a few amino acids or even entire proteins have been shown to be
compatible with activity in other enzymes. Thus, it was expected that fusions with a few
extra amino acids would not only be active but would also be the predominant active species.
The predominance of crossovers at positions of precise alignment can be explained in two ways:
1. The linear distances between conserved residues may have some importance
for structure and/or function.
2. The decrease in activity caused by extra amino acids is small, but this small
decrease may be enough to prevent complementation of the auxotroph since
kcat/Km is already reduced 500- to 10,000-fold for active fusions.
Insertion or deletion of residues in hybrids could prove advantageous for the
engineering of other proteins.
Residues 63 and 112 of PurN might be good choices as fusion points for creating
hybrid enzymes. Although the fusion between PurN and GART at residue 112 proved to
be active, no active fusions were found with a PurN fragment shorter than residues
1-100.
Kinetic characterization of four active hybrids from IT-A and IT-B (Table 2) suggests
that hybrids fused between residues 54 and 100 are not active because of weak
binding of the substrate and cofactor, as the Km values for both substrate and cofactor
increase with the length of the GART in the fusion. The exception is that IT-A1 has a
higher Km(fDDF) than IT-A5, even though IT-A1 contains less of GART.
Presumably, this is attributable to IT-A1 being fused in the 140–144 loop, which has
been shown to have a role in binding the cofactor.
2.1.6. Advantages
The ITCHY method enabled identification of a more diverse set of active chimeras than
DNA shuffling, principally as a result of the relatively nonbiased and non-homology based
method that creates the fusions. The active fusions identified by ITCHY demonstrate that
crossovers between genes at regions of structural homology, irrespective of DNA sequence
homology, are important for creating functional hybrid enzymes.
Although the library created by DNA shuffling had a higher frequency of positives, it
was not very diverse. Fused genes of ITCHY libraries could have been initially selected for
size (e.g. the size of the original genes) resulting in an enrichment for active members of
probably 10- to 100-fold to give a frequency of 0.1-1.0%. DNA shuffling can create hybrids
with multiple crossovers, whereas ITCHY libraries are limited to one crossover point per
library member. One can envision an iterative method for ITCHY in order to create library
members with multiple crossovers. However, as ITCHY libraries create all possible
crossovers between two genes, DNA shuffling of ITCHY libraries should allow one to create
a library of genes with multiple crossovers that include crossovers at regions of no homology,
thus accessing a more diverse sequence space. In a fashion analogous to DNA family
shuffling, which improves directed evolution by accessing a more diverse yet functional
sequence space, such a strategy should prove useful for the directed evolution of proteins. In
addition, ITCHY libraries should have applications in the creation of novel enzymes by
domain and subdomain swapping as well as in the determination of structure/function
relationships by characterizing hybrids of interspecies homologs.
2.1.7. Improvement
An improvement over incremental truncation for the creation of hybrid enzymes
(ITCHY) is SCRATCHY (ITCHY combined with DNA shuffling). The approach combines
two methods for recombining genes: ITCHY and DNA shuffling. First, ITCHY is used to
create a comprehensive set of fusions between fragments of genes in a DNA homology-
independent fashion. This artificial family is then subjected to a DNA-shuffling step to
augment the number of crossovers. SCRATCHY libraries were created from the
glycinamide–ribonucleotide formyltransferase (GART) genes from E. coli (purN) and human
(hGART).
REFERENCES
Aristharkova, S. A., Burlakova, E. B. & Sheludchenko, N. I. 1979. [Effect of lecithin on liver
microsomal lipid peroxidation]. Biokhimiia, 44, 125-9.
Beckman, R. A., Mildvan, A. S. & Loeb, L. A. 1985. On the fidelity of DNA replication:
manganese mutagenesis in vitro. Biochemistry, 24, 5810-7.
Cadwell, R. C. & Joyce, G. F. 1992. Randomization of genes by PCR mutagenesis. PCR
Methods Appl, 2, 28-33.
Cole, M. F. & Gaucher, E. A. 2011. Exploiting models of molecular evolution to efficiently
direct protein engineering. J Mol Evol, 72, 193-203.
Dasu, V. V. et al. 2007. Developments in directed evolution for improving enzyme
functions. Appl Biochem Biotechnol, 143, 212-223. DOI 10.1007/s12010-007-8003-4
Kaur, J. & Sharma, R. 2006. Directed evolution: an approach to engineer enzymes. Critical
Reviews in Biotechnology, 26, 165-99.
Nixon, A. E. et al. 1998. Hybrid enzymes: manipulating enzyme design. TIBTECH, 16
(June 1998). Elsevier Science Ltd.
Nixon, A. E. et al. 1999. Incremental truncation as a strategy in the engineering of novel
biocatalysts. Bioorganic & Medicinal Chemistry, 7, 2139-2144.
Ohtsuka, E., Matsuki, S., Ikehara, M., Takahashi, Y. & Matsubara, K. 1985. An alternative
approach to deoxyoligonucleotides as hybridization probes by insertion of deoxyinosine
at ambiguous codon positions. Journal of Biological Chemistry, 260, 2605-2608.
Ostermeier, M. et al. 1999. Combinatorial protein engineering by incremental truncation.
Proc. Natl. Acad. Sci. USA, 96, 3562-3567.
Rubin-Pitel, S. B. & Zhao, H. 2006. Recent advances in biocatalysis by directed enzyme
evolution. Combinatorial Chemistry & High Throughput Screening, 9, 247-257.
Tao, H. & Cornish, V. W. 2002. Milestones in directed enzyme evolution. Curr Opin Chem
Biol, 6, 858-64.
Williams, G. J., Nelson, A. S. & Berry, A. 2004. Directed evolution of enzymes for
biocatalysis and the life sciences. Cellular and Molecular Life Sciences, 61, 3034-3046.
Zakour, R. A., Kunkel, T. A. & Loeb, L. A. 1981. Metal-induced infidelity of DNA
synthesis. Environ Health Perspect, 40, 197-205.