Exploration of peptide recognition using directed evolution of the … · 2013. 7. 16. · vii List...


Exploration of peptide recognition using directed evolution of the PDZ domain fold

by

Megan Elizabeth McLaughlin

A thesis submitted in conformity with the requirements for the degree of Master of Science

Graduate Department of Molecular Genetics

University of Toronto

© Copyright by Megan McLaughlin 2013


Exploration of peptide recognition using directed evolution of the PDZ domain fold

Megan Elizabeth McLaughlin

Master of Science

Graduate Department of Molecular Genetics

University of Toronto

2013

Abstract

The PDZ domain family is one of the most abundant peptide recognition modules in metazoan proteomes. Characterization of natural PDZ domains has provided insight into the structural basis and diversity of peptide recognition by this fold. To test the limits of the current model, I evolved synthetic PDZ domains. These synthetic variants, built on the Erbin PDZ domain scaffold and selected for binding to peptides with different position-2 residues, were characterized using high-throughput peptide profiling. This approach generated insight into subclass specificities in the most common natural specificity classes (Class I, [ST]-2, and Class II, Φ-2), demonstrated an alternative basis for a rare specificity (Class III, D-2) and predicted that some natural domains may exhibit an easily evolved novel specificity (Class IV, R-2). These results also emphasize that some non-contact residues may have a disproportionate effect on position-2 specificity, contrary to the predictions of the original model.


Acknowledgements

Thank you to my supervisor, Dev Sidhu, for the tremendous opportunities you have offered and for your patience. Thanks also to my committee members, Alan Davidson and Tony Pawson, for challenging me to think more critically about this project and for sharing your enthusiasm.

I owe a debt of gratitude to the fantastic group of people in the Sidhu lab, past and present, for your camaraderie and commiseration. Special thanks go to: Andreas Ernst, for your patient training, kindness and support; I will look back on our scientific arguments with great fondness.

Sarah Barker, for insisting that we celebrate every success, even the minor ones. Now we are prepared to celebrate some major successes, right?

Nicolas Economopoulos, for highlighting the hilarity in otherwise tragic experimental disasters. Linda Beatty, for all your efforts that keep the lab running and too often go unappreciated.

Wei Ye, for your unfailing optimism and tireless dedication. Working with you has been the best part of the last few years. Unfortunately our project is not documented in this thesis!

To far-flung friends, thanks for not letting me take myself too seriously. Please accept my apologies for making you visit me in Toronto.

Most importantly, thank you to my family for your tremendous support and love, for demonstrating the virtues of boundless curiosity and for teaching me to persevere.


Table of Contents

Acknowledgements iii
Table of Contents iv
List of Tables vi
List of Figures vii
Chapter 1 Introduction 1
1.1 Cellular function depends on specific interactions between domains and peptides 2
1.2 PDZ domain biology and C-terminal peptide recognition 2
1.3 Natural PDZ domain specificities 4
1.4 Structural basis for position-2 specificities 5
1.5 Directed evolution of synthetic PDZ domain variants 5
1.6 Summary and rationale 6
Chapter 2 Directed evolution of position-2 specificities in the PDZ domain fold 7
2.1 Introduction and rationale 8
2.2 Directed evolution of PDZ variants 8
2.3 Preliminary analysis of PDZ variant sequences 12
2.4 Strategy for analyzing selected PDZ variants 14
2.5 Specificity profiling of PDZ variants 15
2.6 Multiplex high-throughput sequencing of enriched peptide-phage pools 17
2.7 Sequencing data filtering and logo generation 18
2.8 Results 19
2.8.1 Selectant specificity profiles 20
2.8.2 Proximal residue double mutant specificity profiles 25
2.8.3 Proximal residue single mutant specificity profiles 27
2.8.4 Distal residue mutant specificity profiles 29
2.9 Discussion 33
2.9.1 The set of engineered variants encompasses the range of specificities observed among natural PDZ domains. 33
2.9.2 Class I specificity can be achieved by canonical or non-canonical contacts. 34
2.9.3 Class II sub-specificities require two proximal contact residues, but the most selective variants depend on an alternative binding mode. 35
2.9.4 Class III variant suggests α2-5 can be the major specificity determining residue. 37
2.9.5 Class IV specificity can be generated by polar or non-polar contacts, and this specificity may exist in the set of natural domains. 38
2.9.6 Model for helix contributions to specificity, structure or generic affinity is an oversimplification. 40


Table of Contents (continued)

2.10 Materials and Methods 41
2.10.0 Strains 41
2.10.1 PDZ-phage library construction 41
2.10.2 Cloning, expression and purification of GST-peptide selection targets 42
2.10.3 Selection of PDZ-phage library against GST-peptides 43
2.10.4 Binding validation and sequencing of PDZ variants 44
2.10.5 Subcloning, expression and purification of PDZ variants 45
2.10.6 High-throughput peptide profiling of PDZ variants 47
2.10.7 Preparation of barcoded cluster-ready PCR products for Illumina sequencing 48
2.10.8 Sequencing data processing and logo generation 49
Chapter 3 Conclusion 51
3.1 Summary of work and future directions 52
References 54


List of Tables

Table 2.1 PDZ variants selected directly from the library according to specificity class. 19


List of Figures

Figure 1.1 Structural basis of PDZ domain specificity for C-terminal peptides. 3

Figure 2.1 Combinatorial library designed to generate PDZ variants with novel position-2 specificities. 9

Figure 2.2 Directed evolution of PDZ variants using phage display. 10

Figure 2.3 Peptide targets with different position-2 residues for selection of PDZ variants with different specificities. 10

Figure 2.4 Competitive selection strategy to enhance recovery of selective PDZ variants. 11

Figure 2.5 Selection strategy to recover protease-resistant PDZ variants. 11

Figure 2.6 Sequences of PDZ variants recovered from peptide and protease resistance selections. 13

Figure 2.7 Analysis of selected PDZ variants suggests randomized residues contribute to specificity, generic affinity or structure. 14

Figure 2.8 Strategy for analyzing PDZ variant specificities. 15

Figure 2.9 Specificity profiling of PDZ variants using phage-displayed peptide libraries. 16

Figure 2.10 Barcoding strategy for multiplex high-throughput sequencing. 17

Figure 2.11 Alignment of barcoded cluster-ready PCR product to part of peptide-phage display vector. 18

Figure 2.12 PDZ selectants with Class I peptide profiles ([ST]-2). 21

Figure 2.13 PDZ selectants with Class II peptide profiles (Φ-2). 22

Figure 2.14 PDZ selectants with Class III (D-2), Class IV (R-2) and naive library peptide profiles. 23

Figure 2.15 PDZ selectants that are non-selective for position-2. 24


List of Figures (continued)

Figure 2.16 Peptide profiles for PDZ variants with mutations proximal to peptide position-2. 26

Figure 2.17 Peptide profiles for PDZ variants with single mutations proximal to peptide position-2. 28

Figure 2.18 Peptide profiles for PDZ variants with mutations distal to peptide position-2. 30

Figure 2.19 Summary of position-2 information content and hydrophobicity scores for profiled PDZ variants. 31

Figure 2.20 Summary of position-2 information content and hydrophobicity scores for profiled PDZ variants, coded according to net change in formal charge. 32

Figure 2.21 Comparison of position-2 specificities found in natural PDZ domains and engineered variants. 33

Figure 2.22 Summary of human PDZ domain sequences corresponding to residues randomized in the combinatorial library. 35

Figure 2.23 Necessity and sufficiency of two mutations to yield Class II W-2 specificity. 36

Figure 2.24 Multiple specificity logos for PDZ variants with Class II W-2 specificity. 36

Figure 2.25 Necessity of at least two mutations to yield Class III specificity in this PDZ variant. 37

Figure 2.27 Necessity and sufficiency of two mutations to yield Class IV specificity. 37


Chapter 1 Introduction


1.1 Cellular function depends on specific interactions between domains and peptides

Cellular function depends on the dynamic spatiotemporal organization of proteins within the cell, which is accomplished in large part by protein-protein interaction domains [1]. A subset of these interaction domains recognize short peptide sequences in other proteins and are known as peptide recognition modules (PRMs) [2, 3]. Duplication and divergence have shuffled these domains into many different protein contexts. Some PRM families have hundreds of representatives in the human genome, each using the same fold to recognize different partner proteins [4]. Understanding how these modules bind specifically to cognate partners, selected from large collections of similar alternatives, is a fundamental question [3].

Our understanding of peptide specificity in some PRM families is quite advanced, owing largely to two complementary approaches: in vitro characterization of peptide preferences using combinatorial methods (peptide profiling and peptide arrays), and detailed structural and biophysical studies of representative modules. PDZ domains have served as a model family for these approaches [5-13]. Studies of large sets of natural PDZ domains have uncovered a wide range of specificities and endogenous ligands [12, 13]. Structures of key PDZ domain-peptide ligand complexes have helped explain how different peptide specificities can be generated by the same overall fold.

To test the completeness of our current understanding of PDZ domain specificity, I evolved synthetic PDZ variants and compared their specificities and sequences to those of natural PDZ domains. Based on which specificities were evolvable, and which mutations were required, it is possible to refine our understanding of selective peptide recognition by the PDZ domain fold.

1.2 PDZ domain biology and C-terminal peptide recognition

PDZ domains are peptide recognition modules that typically bind to the C termini of other proteins. There are over 250 PDZ domains in the human genome, located in more than 100 proteins [16]. Their biological roles include assembling signaling complexes at specialized subcellular sites such as synapses by binding receptors, ion channels, cytoskeletal proteins and adhesion proteins, among others [17].

The intrinsic specificity of a PDZ domain dictates its affinity for, and range of, possible binding partners, which is the foundation for its biological function. PDZ domains that share similar intrinsic specificities may be involved in the same biological process. Competition for limited binding partners can result in heightened sensitivity to local concentrations. In conjunction with relatively moderate affinities and fast off-rates, this means PDZ-peptide interactions can contribute to the responsiveness of a network or pathway [17].

Higher-order regulation may prevent interactions that are biophysically possible but biologically deleterious. These regulatory mechanisms can include temporal control such as transcriptional regulation, spatial control such as cellular compartmentalization, and post-translational modification for reversible steric inhibition of binding. Many of these regulatory mechanisms depend on other interaction domains found in multidomain PDZ-containing proteins [1].

PDZ domains may recognize up to the last seven residues of their peptide ligand, although the last four peptide residues and the terminal carboxylate constitute the core recognition motif and contribute most of the binding energy [8]. PDZ domain:peptide ligand affinities typically range from high nanomolar (nM) to low micromolar (µM). The last peptide residue is designated position 0, with neighbouring residues numbered towards the N terminus as position-1, position-2 and so on. The region of the PDZ domain that recognizes each ligand position is correspondingly referred to as site 0, site-1 and so on (Figure 1.1).

Other PDZ domain interactions have been identified, including binding to internal peptides (e.g., Dvl [18]), binding to lipids [19], and domain-domain dimerization [20-22]. Although the biological relevance of some of these unusual interactions is well established, this thesis focuses on the canonical binding interaction with C-terminal peptides.

The structural basis for observed specificities has been determined with the help of a number of PDZ domain structures [5, 6, 8, 9, 20, 21, etc.]. In the canonical binding mode, the terminal carboxylate of the peptide ligand interacts with the carboxylate-binding loop of the domain, and the rest of the peptide binds in a groove between an α-helix and a β-strand (Figure 1.1). In effect, the peptide adds on to the domain secondary structure through β-strand augmentation; these main chain:main chain hydrogen bonds contribute favourably to binding energy and dictate the overall binding mode, but cannot determine specificity [8]. Another consequence of β-strand augmentation is that the bound peptide adopts an extended conformation, so its side chains contact distinct regions of the PDZ domain [8]. Thus, favourable contacts for adjacent ligand positions may be relatively independent of one another. Side chain:side chain interactions generally determine a domain's specificity, and peptide side chain:domain main chain interactions contribute to a lesser degree. Peptide side chains that contribute to the stability of the complex are preferred. In general, the breadth of natural PDZ specificities can be attributed to the fold's ability to support diversity at key positions. Likewise, a domain's specificity can be altered by mutation of one or more of these specificity determinants [13].

Figure 1.1 Structural basis of PDZ domain specificity for C-terminal peptides. A, Distinct regions of the PDZ domain (grey surface) interact with each ligand sidechain (green stick). The peptide positions are numbered in decreasing order from the C terminus, which is position 0. The peptide's C-terminal carboxylate group inserts into a pocket on the domain surface. B, Secondary structure elements of the PDZ domain (grey ribbons) support ligand binding. The peptide ligand adds on to a domain β-strand by β-augmentation, which satisfies the ligand's main chain hydrogen bonds. The flanking α-helix orients some of its sidechains towards the ligand. Favourable sidechain-sidechain interactions contribute to a domain's peptide specificity. (PDB: 1N7T, NMR structure of the Erbin PDZ domain in complex with ETWV-COOH peptide.)

1.3 Natural PDZ domain specificities

Initial studies grouped each PDZ domain into one of three major classes based on amino acid preference at just one ligand residue, position-2 [6], as follows:

- Class I: [ST]X[VIL]$
- Class II: φXφ$
- Class III: [DE]Xφ$

where each motif spans positions -2, -1 and 0, φ = hydrophobic residue, $ = C terminus and X = any residue.
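As an illustration of the position numbering and the three-class scheme above, the following Python sketch indexes residues from the C terminus (position 0, -1, -2, ...) and tests a peptide against each class motif. The hydrophobic set used for φ (AVILMFWY) is an assumption made for illustration, since the class definitions do not enumerate it, and the helper names are hypothetical. Real PDZ specificity is not reducible to these regular expressions.

```python
import re

# Assumed hydrophobic set standing in for "φ"; not defined by the class scheme itself.
HYDROPHOBIC = "AVILMFWY"

# Each motif covers positions -2, -1 and 0; "." is X (any residue), "$" is the C terminus.
CLASS_MOTIFS = {
    "I": re.compile(r"[ST].[VIL]$"),
    "II": re.compile(rf"[{HYDROPHOBIC}].[{HYDROPHOBIC}]$"),
    "III": re.compile(rf"[DE].[{HYDROPHOBIC}]$"),
}


def residue_at(peptide: str, position: int) -> str:
    """Return the residue at a C-terminal position (0, -1, -2, ...)."""
    if position > 0 or -position >= len(peptide):
        raise ValueError("position must be 0 or negative and lie within the peptide")
    # Position 0 is the last residue, so index backwards from the end of the string.
    return peptide[len(peptide) - 1 + position]


def classify(peptide: str) -> list[str]:
    """Return every class whose motif matches the peptide's C terminus."""
    return [name for name, motif in CLASS_MOTIFS.items() if motif.search(peptide)]


for pep in ("ETWV", "EFWV", "EDWV", "ERWV"):
    print(pep, residue_at(pep, -2), classify(pep))
```

With this assumed hydrophobic set, the Erbin ligand ETWV falls into Class I, while the arginine variant ERWV matches none of the three classical motifs.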

Subsequent studies have shown that PDZ domains can have preferences for additional positions of the ligand, and can exhibit subtle differences in specificity. A study from our group identified sixteen PDZ specificity classes based on preferences across four or more peptide positions [13]. Other groups have argued that the range of specificities is best represented as a continuum rather than discrete classes [12]. The distinction is largely semantic, as it is increasingly apparent that PDZ domains exhibit subtle differences in specificity that may be biologically relevant. However, the class distinction based on position-2 is useful, in particular because specificity for position-2 is the simplest and best understood on a structural level.


1.4 Structural basis for position-2 specificities

Based on structural analysis and mutagenesis, there are two major specificity-determining residues for position-2: two residues of an α-helix whose side chains are proximal to the position-2 side chain (Figure 2.1A) [8]. According to the structural nomenclature for these domains, the secondary structure elements (SSEs) and their residues are numbered from the N to the C terminus, with loops identified by their flanking SSEs. Class I preference for residues with hydroxyl groups (serine or threonine) at position-2 is typically mediated by a hydrogen bond to a histidine side chain at the first residue of that helix, α2-1, as observed in many Class I co-complex structures such as the Erbin PDZ domain in complex with ETWV-COOH peptide [8] (Figure 2.1A). Mutation of the α2-1 residue can switch a domain's peptide preference from one class to another: introduction of a histidine at this position can sometimes convert a domain to Class I, while substitution of the histidine with another residue can switch a domain from Class I to Class II [23, 24]. Class II preference for hydrophobic position-2 residues is generally mediated by hydrophobic side chains at α2-1 and/or α2-5, which are close enough to form van der Waals contacts. However, attempts to rationalize PDZ preference for particular subtypes of hydrophobic residues, such as aliphatic or aromatic residues, have had limited success. The rare Class III preference for a negatively charged aspartate at position-2 is mediated, in the case of the nNOS PDZ domain, by a hydrogen bond to a tyrosine residue at α2-5 [29]. Residues at α2-1, and sometimes α2-5, are therefore generally considered to dictate position-2 specificity. However, this model cannot fully account for within-class subtleties observed among natural PDZ domains, such as a strict preference for threonine over serine at position-2.

1.5 Directed evolution of synthetic PDZ domain variants

Duplication and divergence of a progenitor PDZ fold account for the breadth of natural PDZ domain specificities. In a more targeted way, it is possible to switch a PDZ domain's specificity by mutating key residues [23, 24]. However, point mutations often result in a loss of any preference rather than an alteration in specificity [25], which suggests that combinations of mutations are necessary. One difference between natural and directed evolution is that in directed evolution, mutations need not accumulate stepwise while preserving protein function at each step, and they can be targeted to particular residues. The full range of genetically encoded amino acid diversity can be incorporated at specificity-determining residues using degenerate oligonucleotide-directed mutagenesis, generating large combinatorial libraries of PDZ variants [25]. These libraries can be displayed as genetic fusions to phage coat proteins. Individual variants with desired specificities can be selected from the library by their ability to bind a particular peptide ligand, and selected PDZ variants can then be produced recombinantly for characterization (Figure 2.2). Given that specificity for position-2 is the best understood, I designed a combinatorial library based on that model and evolved PDZ variants specific for different position-2 residues.

1.6 Summary and rationale

The PDZ domain family is one of the most abundant peptide recognition modules in metazoan proteomes. Extensive characterization of natural PDZ domains has provided insight into the structural basis and diversity of peptide recognition by this fold. Our understanding of specificity for peptide position-2 is the most advanced, but the basis for subtle within-class differences has not been systematically explored. In order to test the limits of the current model for position-2 specificity and gain further insight, I evolved synthetic PDZ domains with a wide range of position-2 specificities. Systematic characterization of these synthetic PDZ domains suggests that previously unappreciated domain residues can tune within-class specificities. Moreover, the PDZ domain fold is capable of supporting new specificities that have not yet been observed among natural domains.


Chapter 2 Directed evolution of position-2 specificities in the PDZ domain fold


2.1 Introduction and rationale

Starting from a single PDZ domain as a scaffold, I designed a combinatorial library intended to yield new position-2 specificities, based on the current understanding of their structural basis. Synthetic PDZ domain variants with interesting position-2 preferences were isolated from the library based on their ability to bind a peptide with a particular position-2 residue. Peptide preferences of these engineered variants were then characterized using a phage-displayed random peptide library. Additional PDZ variants that incorporated fewer mutations were also profiled, to assess the necessity and sufficiency of particular mutations for generating a given specificity.

2.2 Directed evolution of PDZ variants

There are two major specificity-determining residues for position-2: two residues of an α-helix whose side chains are proximal to the position-2 side chain (Figure 2.1A) [8]. These two proximal residues were therefore randomized in the combinatorial library. The adjacent positions of the α-helix, whose side chains are distal from the peptide, were also randomized to allow more conformational diversity of the proximal residues. In total, seven domain residues in the α-helix (β5:α2-1 to α2-6) were randomized to allow all twenty genetically encoded amino acids at each position (Figure 2.1B). The Erbin PDZ domain was used as the scaffold, and the library was displayed on phage as a fusion to one of the minor coat proteins [25]. PDZ variants with desirable specificities were identified by iterative selection for binding to a particular peptide sequence (Figure 2.2). Target peptides with the desired position-2 residues were produced as GST fusions in E. coli (Figure 2.3). To increase the selective pressure in favour of PDZ variants that were selective for the immobilized target peptide, GST-peptides with other position-2 residues were added as competitors in solution during the later selection rounds (Figure 2.4). A parallel selection for protease resistance was carried out to identify structural limitations on the combinatorial library (Figure 2.5) [25]. PDZ variant clones from the fourth and fifth rounds of selection were tested for binding to the target peptide, relative to a GST control, using enzyme-linked immunosorbent assays (ELISAs). I screened 48 to 96 clones from each peptide selection and sequenced up to 24 clones, choosing those with the highest ratio of specific binding for subsequent study. In total, 339 unique PDZ variants were identified from the peptide binding selections.
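The clone-picking step can be sketched as a filter-and-rank over ELISA readings. This is a hypothetical illustration: the `Clone` record and the signal values are invented, and applying the ratio cutoff of 2 (the value quoted for the protease-resistance screen) to the peptide selections is an assumption rather than the exact criterion used. The cap of 24 mirrors the number of clones sequenced per selection.

```python
from dataclasses import dataclass


@dataclass
class Clone:
    """ELISA readout for one PDZ-phage clone (hypothetical record)."""
    name: str
    target_signal: float  # signal on the immobilized GST-peptide target
    gst_signal: float     # signal on the GST-only control


def binding_ratio(clone: Clone) -> float:
    """Specific-binding ratio: target signal over GST background."""
    return clone.target_signal / clone.gst_signal


def pick_clones(clones, min_ratio: float = 2.0, max_picks: int = 24):
    """Keep clones above min_ratio, best ratios first, at most max_picks."""
    passing = [c for c in clones if binding_ratio(c) > min_ratio]
    passing.sort(key=binding_ratio, reverse=True)
    return passing[:max_picks]


# Hypothetical readings for three clones.
clones = [Clone("A1", 1.8, 0.10), Clone("A2", 0.30, 0.20), Clone("A3", 1.2, 0.05)]
print([c.name for c in pick_clones(clones)])  # ['A3', 'A1']
```

Ranking by ratio rather than raw target signal favours clones whose binding is specific to the peptide rather than to GST or the plate.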
I also screened 96 clones from the protease resistance selections for binding to anti-gD antibody relative to a GST control; all clones had ELISA ratios much greater than 2, and all were sequenced. Unique PDZ variant sequences from each selection are summarized as logos in Figure 2.6.
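Sequence logos such as those in Figure 2.6 are built from per-position residue frequencies. The Python sketch below computes a frequency table for equal-length sequences of a randomized segment and the information content of each position in bits (log2 20 minus the Shannon entropy). The example sequences and function names are hypothetical, and the sketch omits the small-sample corrections that logo software often applies.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_frequencies(sequences):
    """Per-position amino acid frequencies for equal-length sequences."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    table = []
    for i in range(length):
        counts = Counter(s[i] for s in sequences)
        # Letters outside the 20 standard residues are ignored here.
        table.append({aa: counts[aa] / len(sequences) for aa in AMINO_ACIDS})
    return table


def information_content(column):
    """Bits at one position: maximum entropy log2(20) minus observed entropy."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(len(AMINO_ACIDS)) - entropy


# Hypothetical three-residue segments: the middle position is invariant.
freqs = position_frequencies(["KAH", "RAY"])
for i, column in enumerate(freqs):
    print(i, round(information_content(column), 2))
```

An invariant position, like the alanine observed at α2-4, scores the maximum of about 4.32 bits, whereas a position tolerant of all substitutions approaches zero.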


Figure 2.1 Combinatorial library designed to generate PDZ variants with novel position-2 specificities. A, PDZ domain residues whose sidechains are proximal to the peptide position-2 sidechain include α2-1 and α2-5 (magenta spheres with sidechains shown as sticks). These residues are the major specificity determinants for position-2 (orange stick). In Class I PDZ domains, as shown here, a histidine residue in the α-helix (α2-1) can form a hydrogen bond with the hydroxyl group of the position-2 residue, resulting in a preference for serine or threonine. B, PDZ domain residues randomized in the combinatorial library, designed to yield new specificities for position-2 (orange sphere). Seven contiguous positions of the Erbin PDZ domain, encompassing the two major specificity determinants, were randomized to allow all genetically encoded amino acids (magenta spheres). The theoretical diversity of this library is 20^7 = 1.28 × 10^9 unique PDZ variants. The mutated region spans residues β5:α2-4 to α2-6, according to the structural nomenclature: secondary structure elements (SSEs) and their residues are numbered from the N to the C terminus of the domain, with loops identified by their flanking SSEs. For example, His(α2-1) is the first residue of the second α-helix of the canonical PDZ fold. The Erbin PDZ domain actually lacks the first α-helix found in many other PDZ domains. (PDB: 1N7T)

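The theoretical diversity quoted in the Figure 2.1 caption follows from twenty possible residues at each of seven positions. The sketch below reproduces that number and, as an added illustration, estimates what fraction of the theoretical variants a library of a given size would be expected to contain under a Poisson approximation. That approximation assumes every amino acid is equally likely at every position, which degenerate codons do not guarantee; the library sizes passed in are hypothetical, and the function names are illustrative.

```python
import math


def theoretical_diversity(randomized_positions: int, alphabet_size: int = 20) -> int:
    """Distinct protein variants when every randomized position can be any residue."""
    return alphabet_size ** randomized_positions


def expected_coverage(library_size: float, diversity: int) -> float:
    """Expected fraction of variants present at least once (uniform Poisson sampling)."""
    return 1.0 - math.exp(-library_size / diversity)


d = theoretical_diversity(7)
print(d)  # 1280000000
for size in (1e9, 1e10):  # hypothetical numbers of independent clones
    print(f"{size:.0e} clones -> {expected_coverage(size, d):.2f} of variants")
```

Even under uniform sampling, a library only as large as its theoretical diversity is expected to contain about 63% of the possible variants, which is why libraries are typically built several-fold larger than the diversity they target.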


Figure 2.2 Directed evolution of PDZ variants using phage display. Large combinatorial libraries of PDZ variants can be displayed as fusions to bacteriophage M13 coat proteins, such that the sequence of the protein displayed on a single phage particle is encoded by the genome packaged within. This physical connection between phenotype and genotype makes it possible to select a displayed PDZ variant based on its binding properties, and then readily obtain its sequence from the phage genome. Selections are performed by immobilizing the ligand of interest on a solid support, allowing the library of PDZ variants the opportunity to bind, and then washing away the phage that fail to bind the immobilized ligand. The retained PDZ-phage can be used to infect bacteria, in which they replicate, and the new pool of phage can then be used for another selection. After repeated rounds of selection, PDZ-phage with the selected properties dominate the population, at which point individual PDZ-phage can be isolated, sequenced and subcloned for expression and further characterization.


Target peptides (only position-2, the second residue shown, is varied)

Class I, [ST]X[VIL]$: GST-ETWV-COOH, GST-ESWV-COOH

Class II, ΦXΦ$: GST-EWWV-COOH, GST-EYWV-COOH, GST-EFWV-COOH, GST-EIWV-COOH, GST-ELWV-COOH, GST-EVWV-COOH, GST-EAWV-COOH, GST-EMWV-COOH

Class III, [DE]XΦ$: GST-EDWV-COOH, GST-EEWV-COOH

No natural class: GST-ERWV-COOH, GST-EKWV-COOH, GST-EHWV-COOH, GST-EQWV-COOH, GST-ENWV-COOH, GST-ECWV-COOH, GST-EPWV-COOH, GST-EGWV-COOH

Figure 2.3 Peptide targets with different position-2 residues for selection of PDZ variants with different specificities. The twenty peptides listed were produced as GST fusions. Only position-2 was altered, and it is color-coded according to sidechain character. The other peptide positions match the optimal ligand sequence for the Erbin PDZ domain, which is the parental scaffold for the combinatorial library. The overall binding mode of these peptides to the PDZ variants should be preserved, based on favourable contacts between optimal ligand residues and unmutated regions of the PDZ variants. The major classes of natural PDZ specificities are outlined, to illustrate that only some of the target peptides belong to specificity classes observed for natural domains to date.
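Because only position-2 differs among the targets, the panel and the competitor set used for each competitive selection can be generated from the parental ETWV ligand. This Python sketch is illustrative only: the helper names are hypothetical, and the residue order simply follows the listing in Figure 2.3.

```python
PARENT_LIGAND = "ETWV"  # optimal Erbin PDZ ligand; position-2 is T
# Position-2 residues in the order listed in Figure 2.3.
POSITION2_RESIDUES = "TSWYFILVAMDERKHQNCPG"


def substitute_position(peptide: str, position: int, residue: str) -> str:
    """Replace the residue at a C-terminal position (0, -1, -2, ...)."""
    index = len(peptide) - 1 + position  # position 0 is the last residue
    if not 0 <= index < len(peptide):
        raise ValueError("position outside the peptide")
    return peptide[:index] + residue + peptide[index + 1:]


# One target peptide per position-2 residue.
TARGETS = {aa: substitute_position(PARENT_LIGAND, -2, aa) for aa in POSITION2_RESIDUES}


def competitors_for(target_residue: str) -> list[str]:
    """The other 19 peptides, added as soluble competitors for one target."""
    return [pep for aa, pep in TARGETS.items() if aa != target_residue]


for aa in ("T", "D", "R"):
    print(aa, TARGETS[aa], len(competitors_for(aa)))
```

Generating the panel from one parent sequence keeps positions -3, -1 and 0 fixed at the Erbin optimum, which is what allows differences in selection outcome to be attributed to position-2.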


Figure 2.4 Competitive selection strategy to enhance recovery of selective PDZ variants. Each target peptide was immobilized, and the remaining 19 GST-peptides with other position-2 residues were added as competitors in solution. The concentration of each individual competitor was 17 nM, for a total competitor concentration of approximately 300 nM. PDZ variants that bound the competitors in solution were more likely to be washed away. Selective PDZ variants that preferentially bound the immobilized target, even in the presence of closely related competitors, were more likely to be retained.

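The competitor concentrations in Figure 2.4 can be checked, and the intent of the competition step illustrated, with a minimal Python sketch. It assumes an idealized equilibrium in which soluble competitors are in excess and bind the PDZ variant at a single, mutually exclusive site, so the fraction of variant left free to bind the immobilized target is 1 / (1 + Σ[C_i]/K_d,i). The dissociation constants are hypothetical, and the model ignores phage avidity, ligand depletion and washing kinetics, so it indicates direction rather than magnitude. Note that 19 competitors at 17 nM each sum to 323 nM, which agrees only approximately with the quoted total of about 300 nM.

```python
N_COMPETITORS = 19
EACH_COMPETITOR_NM = 17.0  # per-competitor concentration from Figure 2.4


def fraction_free(competitors):
    """Fraction of PDZ variant not occupied by soluble competitors.

    competitors: iterable of (concentration_nM, kd_nM) pairs.
    Assumes competitor in excess and mutually exclusive single-site binding.
    """
    occupancy_ratio = sum(conc / kd for conc, kd in competitors)
    return 1.0 / (1.0 + occupancy_ratio)


total_nm = N_COMPETITORS * EACH_COMPETITOR_NM
print(f"total competitor: {total_nm:.0f} nM")

# Hypothetical Kd values: a non-selective variant binds every competitor at 1 uM,
# while a selective variant binds the competitors only weakly (100 uM).
non_selective = fraction_free([(EACH_COMPETITOR_NM, 1_000.0)] * N_COMPETITORS)
selective = fraction_free([(EACH_COMPETITOR_NM, 100_000.0)] * N_COMPETITORS)
print(f"free fraction, non-selective: {non_selective:.2f}")
print(f"free fraction, selective: {selective:.3f}")
```

Under these assumptions the competition step shifts the odds against non-selective variants rather than removing them outright, consistent with the caption's "more likely" wording.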

Figure 2.5 Selection strategy to recover protease-resistant PDZ variants. The recombinant phage coat protein displays an N-terminal gD epitope (SMADPNRFRGKDL) in addition to a PDZ variant. Immobilized anti-gD antibody can retain PDZ-variant phage particles only if the PDZ variant is intact. Unstable PDZ variants tend to be susceptible to degradation by bacterial outer membrane proteases. Over multiple rounds of anti-gD selection, protease-susceptible PDZ variants are depleted and protease-resistant PDZ variants are enriched. Protease resistance correlates with protein stability, so selections for protease resistance can reveal structural limitations imposed on library diversity.


2.3 Preliminary analysis of PDZ variant sequences

This systematic approach revealed trends in the PDZ variant sequences that were independent of the peptide target and unlikely to contribute to specificity. First, all the protease-resistant PDZ variants had an alanine residue at α2-4, as did the vast majority of PDZ variants selected for peptide binding. This strongly suggested that other residues were structurally deleterious at this location of the α-helix. Structurally, this is reasonable because the α2-4 alanine side chain packs against the hydrophobic core of the domain [8]. There was no bias at the remaining randomized residues of the protease-resistant variants, indicating that these residues were structurally tolerant to the full range of substitutions (Figure 2.6B).

Second, the aggregate logo of all peptide-binding PDZ variants showed that arginine residues occurred frequently at the distal positions of the engineered helix. These trends could not be attributed to structural limitations and therefore most likely resulted from selection for peptide binding. Given that the peptides were quite negatively charged (ETWVCOO⁻ has a formal charge of 2−), it was possible that PDZ variants with additional positive charges had tighter affinities owing to improved charge complementarity and faster on-rates. These effects were observed for PDZ variants selected against all peptide targets, and were unlikely to contribute to position-2 specificity.

Encouragingly, there were peptide sequence-specific trends at the two major specificity-determinant residues, evident in some peptide-specific logos (Figure 2.6C) and corroborated by the absence of such trends in the aggregate logo (Figure 2.6D). This suggested that the selections had enriched for PDZ variants with the intended specificities.
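The formal-charge bookkeeping behind the 2− figure can be made explicit with a short Python sketch. It is a simplified illustration under stated assumptions, not the analysis used in this thesis: His is treated as uncharged, Cys and Tyr are ignored, and because the peptide is fused to GST only the C-terminal carboxylate contributes a terminal charge. The function name `net_formal_charge` is hypothetical.

```python
# Simplified formal charges of ionizable side chains near neutral pH
# (assumption: His treated as neutral; Cys and Tyr ignored).
SIDECHAIN_CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1}


def net_formal_charge(peptide: str, free_c_terminus: bool = True,
                      free_n_terminus: bool = False) -> int:
    """Net formal charge of a peptide from side chains and termini.

    In a GST fusion the peptide's N-terminus is a peptide bond, so by
    default only the C-terminal carboxylate (COO-) is counted.
    """
    charge = sum(SIDECHAIN_CHARGE.get(aa, 0) for aa in peptide.upper())
    if free_c_terminus:
        charge -= 1  # C-terminal carboxylate
    if free_n_terminus:
        charge += 1  # free N-terminal amine
    return charge


print(net_formal_charge("ETWV"))  # -2
print(net_formal_charge("ERWV"))  # -1
```

Under the same assumptions, a target with Arg at position-2 (ERWV) comes out at 1−, so the net charge of the target peptides varies somewhat with the position-2 residue.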


[Figure 2.6, panels A and C: panel A shows the Erbin PDZ domain structure in complex with the ETWVCOOH peptide; panel C shows sequence logos of the PDZ variants selected against each EXWVCOOH target peptide, with 3 to 23 unique domains per target. Logo graphics not reproduced.]

Figure 2.6 Sequences of PDZ variants recovered from peptide and protease resistance selections. A, structure of the Erbin PDZ domain in complex with ETWVCOOH peptide, with the seven randomized positions (spheres) color-coded to match the wildtype domain sequence shown below. These seven domain positions correspond to the logo positions in B, C and D. B, PDZ variants selected for binding to immobilized anti-gD antibody, via an N-terminal gD epitope, to recover protease-resistant domains. C, PDZ variants selected for binding to EXWVCOOH peptides, with target peptide shown to the left and number of unique domains indicated below. D, single logo representing all the PDZ variants selected for binding to the twenty position-2 peptides (i.e., all the variants shown in panel C). (PDB: 1N7T)

Logos summarize only the seven domain positions that were randomized in the combinatorial library. The height of a column is proportional to the information content at that position. The height of each symbol within a column is proportional to its frequency. The single-letter amino acid codes are colored according to their chemical character.

[Figure 2.6, panels B and D: logos for the 95 protease-resistant domains (protease resistance selection) and the 339 peptide-binding domains (peptide binding selections). Logo graphics not reproduced.]

2.4 Strategy for analyzing selected PDZ variants

Based on these trends, it appeared that the randomized α-helix residues contributed to structure, to specificity or to generic affinity (Figure 2.7), depending on whether their sidechains packed against the domain core, were proximal to the position-2 sidechain, or were distal to the position-2 sidechain. To test this model, I characterized PDZ variants with only proximal mutations or only distal mutations, in addition to PDZ variants that were selected directly from the library (selectants). I also characterized all the point mutants of either proximal specificity-determining residue, to establish whether single mutations were sufficient to generate the specificities observed in the multiple mutants (Figure 2.8).
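The derivative sets described above can be generated mechanically from a selectant's seven randomized residues. The Python sketch below assumes, as an inference from the wildtype sequence EHGQAVS (His at α2-1, Val at α2-5), that the seven positions are written in logo order with α2-1 at index 1 and α2-5 at index 5; the helper names are hypothetical.

```python
WILDTYPE = "EHGQAVS"              # Erbin PDZ residues at the seven randomized positions
PROXIMAL = (1, 5)                 # assumed indices of alpha2-1 (His) and alpha2-5 (Val)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def proximal_only(selectant: str, wildtype: str = WILDTYPE) -> str:
    """Keep the selectant's proximal residues; revert all others to wildtype."""
    return "".join(s if i in PROXIMAL else w
                   for i, (s, w) in enumerate(zip(selectant, wildtype)))


def distal_only(selectant: str, wildtype: str = WILDTYPE) -> str:
    """Keep the selectant's distal residues; revert the proximal ones to wildtype."""
    return "".join(w if i in PROXIMAL else s
                   for i, (s, w) in enumerate(zip(selectant, wildtype)))


def point_mutants(wildtype: str = WILDTYPE) -> list[str]:
    """Every single substitution at either proximal position (19 x 2)."""
    mutants = []
    for i in PROXIMAL:
        for aa in AMINO_ACIDS:
            if aa != wildtype[i]:
                mutants.append(wildtype[:i] + aa + wildtype[i + 1:])
    return mutants


print(proximal_only("THRRAAR"))  # EHGQAAS
print(distal_only("THRRAAR"))    # THRRAVR
print(len(point_mutants()))      # 38
```

The single-mutant count of 19 × 2 = 38 matches the number of point mutants profiled.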

Figure 2.7 Analysis of selected PDZ variants suggests randomized residues contribute to specificity, generic affinity or structure. Logos summarize the recovered sequences of different PDZ variants (showing only the seven positions randomized in the combinatorial library, cyan). Colour-coded structures and arrows highlight the proposed proximal specificity determinants (magenta), distal generic affinity contributors (blue), and the structurally important residue (black).

[Figure 2.7 panels: logos of the randomized positions for all ligand-selected variants (n = 339) and structure-selected variants (n = 95); likely contribution of each randomized position: specificity, generic affinity or structure.]

2.5 Specificity profiling of PDZ variants

Rather than profiling all 339 PDZ variants, I chose representative selectants and cloned them and their derivative PDZ variants into a GST fusion vector for expression in E. coli. I expressed and purified 169 PDZ variants: 87 selectants, 17 variants with proximal mutations, 22 variants with distal mutations, 38 point mutants at either proximal residue and 5 controls. I profiled these PDZ variants using a random peptide library to assess their specificities in a relatively unbiased way (Figure 2.9). Enrichment of peptide-phage over the rounds of selection was monitored by peptide-phage ELISAs. After three rounds of selection, ELISAs showed strong selective binding of peptide-phage pools to GST-PDZ variants compared to GST alone. Two additional rounds of selection were carried out to increase the stringency of the selection by enforcing high levels of competition between peptide-phage for a limited amount of immobilized PDZ variant. Pooled ELISAs with the fifth-round pools of peptide-phage exhibited strong, selective binding to the target GST-PDZ variant compared to GST alone (data not shown).

Figure 2.8 Strategy for analyzing PDZ variant specificities. PDZ variants obtained directly from the combinatorial library by selection (referred to as “selectants”) were peptide profiled. In addition, derivative PDZ variants that incorporated only the proximal or distal mutations were profiled, to ascertain whether the proximal mutations were sufficient to yield the selectant’s specificity. PDZ variants with single mutations at either proximal position (α2-1 or α2-5) were also profiled, to determine whether both proximal mutations were necessary to generate their specificities. In addition, affinity measurements for these sets of domains will illustrate whether the distal residues improve peptide-binding affinity and whether this has a detrimental effect on specificity.

[Figure 2.8 schematic: variant PDZ from selection compared with wildtype PDZ; specificity residues (proximal) and generic affinity residues (distal). Questions posed: Are direct contacts sufficient for specificity? Are both direct contacts necessary for specificity?]

Figure 2.9 Specificity profiling of PDZ variants using phage-displayed peptide libraries. Immobilized PDZ variants were incubated with a random C-terminal peptide library displayed on phage. In this case, the peptide library was seven residues long, with a theoretical diversity of 20⁷ = 1.28 × 10⁹ peptides. The practical diversity of the library exceeded 10¹⁰ unique members. Each PDZ variant was incubated with ~10¹² phage, oversampling the practical library diversity by ~100-fold, so every PDZ variant was expected to encounter essentially all possible random peptides. Peptide-phage that fail to bind are washed away, and remaining phage are eluted and replicated by passage through a bacterial host. The selection procedure is repeated with the amplified phage. After several rounds of selection, the population becomes enriched with peptide-phage that bind the immobilized PDZ domain. Single peptide-phage clones can be tested for binding in an ELISA and sequenced, or the enriched phage pool can be sequenced in parallel using high-throughput methods. Sequences of binding peptides yield a position weight matrix (PWM) for the PDZ variant, which can be represented as a logo. The total height of each column indicates the information content at that position, and the height of each symbol within the column indicates its frequency. PWM and logo are reproduced from Tonikian et al. (2007) Identifying specificity profiles for peptide recognition modules from phage-displayed peptide libraries. Nature Protocols 2, 1368–1386.
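The library-coverage arithmetic in the Figure 2.9 caption can be checked with a few lines of Python. The Poisson model for copy number is an added idealization (it assumes every library member is equally represented in the input phage), not something measured in this work; the variable names are illustrative.

```python
import math

PEPTIDE_LENGTH = 7
theoretical_diversity = 20 ** PEPTIDE_LENGTH  # distinct random 7-mer peptides
practical_diversity = 1e10                    # unique library members (caption: >10^10)
phage_per_selection = 1e12                    # phage applied per PDZ variant (~10^12)

# Mean number of copies of each library member in the input phage.
oversampling = phage_per_selection / practical_diversity

# Idealized Poisson model: chance that a given member has zero copies.
p_member_absent = math.exp(-oversampling)

print(f"theoretical diversity: {theoretical_diversity:.3g}")  # 1.28e+09
print(f"oversampling: {oversampling:.0f}-fold")               # 100-fold
print(f"P(member absent): {p_member_absent:.1e}")
```

Under this idealization the chance of missing any particular member is vanishingly small, which is the sense in which every variant is expected to encounter essentially all peptides; uneven representation in a real library would raise that probability.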


2.6 Multiplex high-throughput sequencing of enriched peptide-phage pools

High-throughput sequencing was used to decode the selected peptide sequences. This technology offered millions of short reads, so that several hundred PDZ profiling samples could be multiplexed in a single sequencing run. Multiplexing was accomplished with barcoded forward and reverse PCR primers, which also added the adapter sequences necessary for the sequencing reactions (Figures 2.10 and 2.11). Peptide-phage pools from the fifth round of selections served as templates in separate PCR reactions. The PCR products were then pooled, purified by gel extraction and submitted for sequencing.
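The 24 × 24 combinatorial barcoding scheme can be sketched in Python as a lookup from (forward, reverse) barcode pairs to pools. This is a minimal illustration under assumptions, not the processing pipeline used for this thesis: the barcode sequences are hypothetical placeholders (only their 6-nt and 8-nt lengths come from Figure 2.11), and reads are assumed to have already been parsed into forward barcode, reverse barcode and insert.

```python
from itertools import product


def make_barcodes(length: int, n: int) -> list[str]:
    """Hypothetical barcodes: the first n DNA words of a given length.

    Real barcode sets are designed with a minimum Hamming distance
    between members; these placeholders are not.
    """
    return ["".join(p) for p, _ in zip(product("ACGT", repeat=length), range(n))]


FORWARD = make_barcodes(6, 24)  # 6-nt forward barcodes
REVERSE = make_barcodes(8, 24)  # 8-nt reverse barcodes

# Each (forward, reverse) pair identifies one peptide-phage pool: 24 x 24 = 576.
POOL_OF = {pair: idx for idx, pair in enumerate(product(FORWARD, REVERSE))}


def demultiplex(reads):
    """Group (forward_bc, reverse_bc, insert) tuples by pool; drop unknown pairs."""
    pools: dict[int, list[str]] = {}
    for fwd, rev, insert in reads:
        pool = POOL_OF.get((fwd, rev))
        if pool is not None:
            pools.setdefault(pool, []).append(insert)
    return pools


print(len(POOL_OF))  # 576
```

Using two barcodes per read is what lets 24 + 24 primers label 576 pools; reads whose barcode pair is not in the lookup (for example, because of a sequencing error in a barcode) are simply discarded in this sketch.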

Figure 2.10 Barcoding strategy for multiplex high-throughput sequencing. Specificity profiling of PDZ variants yields pools of peptide-phage that are highly enriched with binding peptide sequences. High-throughput sequencing methods generate millions of short reads, suitable for decoding the recombinant peptide-encoding region of the phage genome and in sufficient number to allow substantial multiplexing of peptide-phage pools. PCR reactions are used to add forward and reverse barcodes that uniquely identify each peptide-phage pool, in addition to the adapter sequences necessary for sequencing. Each sequence read includes the forward and reverse barcodes in addition to the peptide-coding sequence, so that the profiles can be deconvolved.

[Figure 2.10 schematic: pools of phage encoding peptides selected by different PDZ variants → label each pool with two unique barcodes by PCR (24 × 24 primers label 576 pools uniquely) → sequence barcodes and peptide-encoding region (millions of short reads, each carrying a forward barcode, the peptide-encoding region and a reverse barcode) → deconvolve profiles.]

2.7 Sequencing data filtering and logo generation

Each Illumina lane yielded more than 36 million reads. The sequencing data were processed by Simon Taehyung Kim, a summer student in the Kim lab, as follows. The data were filtered to ensure a minimum read quality, and then deconvolved based on barcodes. The reads in each deconvolved sample were then filtered based on frequency, to omit rare sequences that most likely corresponded to PCR or sequencing errors. Finally, each unique DNA sequence was translated and these peptides were summarized as logos [26, 13]. The height of each logo column corresponds to the information content at that position, on a normalized scale from 0 (non-specific, uniform distribution of all 20 residues) to 1 (completely specific, only one amino acid observed).
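The frequency filter and the normalized information content described above can be sketched in Python. The sketch assumes the reads have already been translated into C-terminal peptides, uses a hypothetical count cutoff (`min_count`, since no value is stated here), and applies no small-sample correction, so it may differ in detail from the logo software cited [26].

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def filter_by_count(sequences, min_count=2):
    """Keep unique sequences observed at least `min_count` times.

    `min_count` is a hypothetical cutoff; rarer sequences are treated as
    likely PCR or sequencing errors.
    """
    counts = Counter(sequences)
    return [seq for seq, n in counts.items() if n >= min_count]


def residue_at(peptide: str, ligand_position: int) -> str:
    """Residue at a ligand position counted from the C-terminus (0, -1, -2, ...)."""
    return peptide[ligand_position - 1]


def normalized_information_content(peptides, ligand_position: int) -> float:
    """(log2(20) - H) / log2(20), with H the Shannon entropy in bits.

    0 = uniform use of all 20 residues; 1 = a single residue observed.
    """
    counts = Counter(residue_at(p, ligand_position) for p in peptides)
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    h_max = math.log2(len(AMINO_ACIDS))
    return (h_max - entropy) / h_max


# Toy example (not thesis data)
peptides = ["GGGETWV", "AAAESWV"]
print(normalized_information_content(peptides, 0))   # 1.0 (V only)
print(normalized_information_content(peptides, -2))  # T/S split, about 0.77
```

Indexing ligand positions from the C-terminus keeps position-2 well defined regardless of how many upstream residues are displayed.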

[Figure 2.11 sequences, reconstructed from the figure text; N marks random-peptide or barcode positions.]
Phagemid (top):
ATTCACCGCGTACGGGGGCCGCCTGGAGGAGAACATCGACAGCGCCCCCGGTGGCGGANNNNNNNNNNNNNNNNNNNNNTGATAAACCGATACAATTAAAGGCTCCTTTTGGAGCCTTTTTTTTGGAGATTTTCAACGTGAAAAAATTATTATTCGCAATTCCTTTAGTTGTT
Translation (reading frame starting at the second nucleotide; X = random residue, * = stop):
F T A Y G G R L E E N I D S A P G G G X X X X X X X * * T D T I K G S F W S L F F G D F Q R E K I I I R N S F S C
Barcoded PCR product (bottom), by segment:
Adapter 1: CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT
Barcode (6 nt): NNNNNN
Constant / PCR primer annealing: AGCGCCCCCGGTGGCGGA
Random peptide 7mer (21 nt): NNNNNNNNNNNNNNNNNNNNN
Constant / PCR primer annealing: TGATAAACCGATACAATTAAAGGCTCCT
Barcode (8 nt): NNNNNNNN
Adapter 2: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
The figure also labels the M13 major coat protein region of the phagemid.

Figure 2.11 Alignment of barcoded cluster-ready PCR product to part of peptide-phage display vector. Part of the phagemid genome, encoding the recombinant coat protein (top sequence), is PCR amplified. The PCR amplification primers yield a barcoded DNA molecule with the necessary adapters for Illumina flow cell hybridization, amplification and sequencing primer annealing (bottom sequence).


2.8 Results

To engineer PDZ variants with different position-2 specificities, I used structure-guided combinatorial library design and directed evolution. To evaluate whether the resulting PDZ variants had new specificities, I used random peptide libraries to profile their binding preferences. Peptides that bound each PDZ variant are presented here as logos. PDZ variant specificities were classified according to their position-2 preferences. Most natural PDZ domain specificities were recapitulated by the synthetic PDZ variants, and some novel position-2 specificities were observed as well. Although most of the analysis focuses on position-2, all seven peptide positions are shown in each logo because they reflect the peptide binding mode. Most PDZ variants retain the same preferences at positions -3, -1 and 0, which indicates that the canonical peptide binding mode is preserved in these variants. In a few cases, PDZ variants have altered preferences at these other positions in addition to position-2, indicating that these peptides probably bind the variant domain in a non-canonical manner.

The selectant logos were classified by inspection (Figures 2.12–2.15), with minor adjustments based on the position-2 information content and hydrophobicity scores to ensure consistency. The threshold for non-specificity was set by the position-2 information content score of the naive peptide library, calculated from all unique reads. Peptide profile logos for the derivative PDZ variants with proximal mutations, distal mutations or single mutations are shown separately (Figures 2.16–2.18).

In addition to the logos, graphs showing the information content score and the hydrophobicity score of the position-2 amino acid distribution are shown in Figure 2.19. Graphs showing the same information, but colored to indicate the net change in formal charge of PDZ variants compared to the parental scaffold, are shown in Figure 2.20.
The information content was calculated as for the logos [26]. The hydrophobicity score was calculated using the Black and Mould scale [27] normalized to a range of 1, so that glycine scored 0, phenylalanine scored 0.5 and arginine scored -0.5 [28].
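A frequency-weighted mean is one natural reading of a "hydrophobicity score of the position-2 amino acid distribution", and the Python sketch below assumes that interpretation. Only the three anchor values quoted in the text (glycine 0, phenylalanine 0.5, arginine -0.5) are filled in; the remaining residues would have to be taken from the published Black and Mould scale [27], so the dictionary is a placeholder and the function name is hypothetical.

```python
# Placeholder: only the three anchor values stated in the text. The other
# 17 residues must be filled in from the published Black and Mould scale,
# shifted so that glycine = 0.
NORMALIZED_HYDROPHOBICITY = {"G": 0.0, "F": 0.5, "R": -0.5}


def hydrophobicity_score(frequencies: dict[str, float],
                         scale: dict[str, float] = NORMALIZED_HYDROPHOBICITY) -> float:
    """Frequency-weighted mean hydrophobicity of one position's residue distribution.

    `frequencies` may be raw counts or fractions; they are normalized here.
    """
    missing = set(frequencies) - set(scale)
    if missing:
        raise KeyError(f"no scale value for: {sorted(missing)}")
    total = sum(frequencies.values())
    return sum(f * scale[aa] for aa, f in frequencies.items()) / total


print(hydrophobicity_score({"F": 0.5, "R": 0.5}))  # 0.0
print(hydrophobicity_score({"F": 3, "G": 1}))      # 0.375
```

Raising an error for residues without a scale value avoids silently scoring them as neutral.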

Class                          Description   Count
Class I                        [ST]-2           27
Class II                       Φ-2              30
Class III                      D-2               1
Class IV                       R-2               6
Position-2 non-selective       non-2            22
All positions non-selective    non               1

Table 2.1 PDZ variants selected directly from the library according to specificity class.
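The selectant logos in Table 2.1 were classified by inspection, so there is no formula to reproduce. As a hedged illustration only, a rule-based approximation might combine the naive-library information-content threshold with the summed frequency of residue groups at position-2. The residue groups, the 0.5 majority cutoff and the function name are hypothetical, and the "all positions non-selective" category is not handled.

```python
# Hypothetical position-2 residue groups, loosely following Table 2.1.
CLASS_GROUPS = {
    "Class I": set("ST"),
    "Class II": set("AVILMFWY"),  # hydrophobic; exact membership is an assumption
    "Class III": set("D"),
    "Class IV": set("R"),
}


def classify_position2(frequencies: dict[str, float], information_content: float,
                       naive_threshold: float, majority: float = 0.5) -> str:
    """Rule-based approximation of a position-2 specificity class.

    `frequencies` are residue fractions at position-2; `naive_threshold`
    is the position-2 information content of the naive library.
    """
    if information_content <= naive_threshold:
        return "Position-2 non-selective"
    totals = {name: sum(frequencies.get(aa, 0.0) for aa in group)
              for name, group in CLASS_GROUPS.items()}
    best = max(totals, key=totals.get)
    return best if totals[best] > majority else "unclassified"


print(classify_position2({"T": 0.7, "S": 0.2, "A": 0.1}, 0.6, 0.1))  # Class I
```

A rule like this could at most flag borderline logos for a second look; it does not replace the inspection-based assignments.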


2.8.1 Selectant specificity profiles

Of the 87 PDZ variants selected directly from the combinatorial library, 86 had strong peptide binding preferences across at least the last four ligand positions. Nearly all of the profiled selectants retained the parental scaffold’s peptide preferences at positions -3, -1 and 0. Only two variants had altered preferences for peptide positions other than position-2; both of these PDZ variants had altered position-1 preferences (see below). The distribution of selectants across specificity classes is described in Table 2.1.


Figure 2.12 PDZ selectants with Class I peptide profiles ([ST]-2). Peptide profiles for PDZ variants selected directly from the combinatorial library (i.e., PDZ selectants) are shown as logos. Each PDZ selectant’s engineered sequence is given below the logo, with residues colored according to physicochemical character. The number of unique peptide sequences represented by the logo is also indicated.

[Figure 2.12 logos not reproduced. The 27 Class I selectants are grouped by position-2 preference (T-2, [ST]-2, polar-2) and by α2-1 residue (canonical, α2-1 His; non-canonical, α2-1 Xaa); each logo represents 117 to 894 unique peptides. Engineered helix sequences shown: FGRVRAW, PLWWGYV, CLAVMPF, THRRAAR, PFWRAWC, MGRWRSH, RPFWAWL, PARRAWR, THARARR, SHNRARR, SVWRARR, SHRRARA, PWWWGYL, PLRRAQR, SHRRARQ, SHKRARN, SYWRAWF, PLWWGWV, RLWMAYW, SHRRAAR, SRPRAKW, SPFWAYV, PRWRASQ, TLRRAAS, SGQRAAR, SQRRAAR, PVWWGWL.]

Figure 2.13 PDZ selectants with Class II peptide profiles (Φ-2). Peptide profiles for PDZ variants selected directly from the combinatorial library (i.e., PDZ selectants) are shown as logos. Each PDZ selectant’s engineered sequence is given below the logo, with residues colored according to physicochemical character. The number of unique peptide sequences represented by the logo is also indicated. Symbols denote aromatic (Ω), aliphatic (Ψ) or hydrophobic (Φ) residues.

[Figure 2.13 logos not reproduced. The 30 Class II selectants are grouped by position-2 preference (W-2, Ψ-2, Ω-2, Φ-2); each logo represents 162 to 1610 unique peptides. Engineered helix sequences shown: SGGQAAR, GYWRAWW, SAWRARR, TPRRARW, SPWRARR, SARRAAR, IQRAWST, SLRRAWR, DARRAAR, SPRRAAR, PMRRAKY, PFWRALV, TRRRAIR, PLARAMT, FEEIAAR, SPARAWW, PLRRAWL, SARRAQR, SRRRAAQ, PPFWAWW, VRQGAWW, SRRRARW, RLRRAAR, DARRAWR, PRWRAVQ, TLRRAAR, SLRRAAR, PLARAWR, PLWRARR, RLGRAQR.]

[Figure 2.14 logos not reproduced. Panels: Class III (D-2), Class IV (R-2; labelled subgroups α2-1 Leu/α2-5 Glu and other) and the naive library; logos represent 97 to 787 unique peptides each. Engineered helix sequences shown for the seven selectants: FLVLVSG, TLDGAEW, RLEEAER, RLWRAER, SPWWAYI, SKWRAQR, PHWRAYR. Figure note: *120,126 reads passed the PHRED30 quality threshold prior to filtering based on read count.]

Figure 2.14 PDZ selectants with Class III (D-2), Class IV (R-2) and naive library peptide profiles. Peptide profiles for PDZ variants selected directly from the combinatorial library (i.e., PDZ selectants) are shown as logos. Each PDZ selectant’s engineered sequence is given below the logo, with residues colored according to physicochemical character. The number of unique peptide sequences represented by the logo is also indicated. Symbols denote aromatic (Ω), aliphatic (Ψ) or hydrophobic (Φ) residues.


Figure 2.15 PDZ selectants that are non-selective for position-2. Peptide profiles for PDZ variants selected directly from the combinatorial library (i.e., PDZ selectants) are shown as logos. Boxed logos are highlighted because they might be expected to have other specificities, based on their engineered helix sequences. Each PDZ selectant’s engineered sequence is given below the logo, with residues colored according to physicochemical character. The number of unique peptide sequences represented by the logo is also indicated.

[Figure 2.15 logos not reproduced. The 22 position-2 non-selective selectants (X-2) are shown, with labelled groups α2-1 His and α2-1 Leu/α2-5 Glu; each logo represents 73 to 1344 unique peptides. Engineered helix sequences shown: SHRRAYR, PFWRAWV, SWWRAYK, PLWWGFV, RFGRAYR, SFGRAWW, PWWRAYR, TVWRARK, RPWWAYL, PGPQAAH, SLAGATG, RFRRAVR, TLWRAER, SLRKAYR, RLWRAEW, SRARARH, SQRRALR, SLRRATR, GGPRALR, PLARATR, SFRRALW, SFRRAWR.]

2.8.2 Proximal residue double mutant specificity profiles

The set of 17 PDZ variants with mutations at both residues proximal to the position-2 sidechain (α2-1 and α2-5) encompassed the same range of specificities as the selectants, with the exception of Class III [D]-2, which was not observed. One of these PDZ variants exhibited the same alteration of specificity at position-1 as two of the selectants (Figure 2.16). This set of PDZ variants exhibited a range of information content and hydrophobicity scores similar to that of the selectants (Figure 2.19).


Figure 2.16 Peptide profiles for PDZ variants with mutations proximal to peptide position-2. These pairwise mutations at α2-1 and α2-5 (highlighted) were observed in PDZ variants selected directly from the library. Residues at the distal positions (grey) match the wildtype Erbin PDZ domain sequence. Symbols denote aromatic (Ω), aliphatic (Ψ) or hydrophobic (Φ) residues.

[Figure 2.16 logos not reproduced. Panels: Class I, Class II (W-2, Ω-2, Ψ-2, Φ-2), Class III (none), Class IV (α2-1 Leu/α2-5 Glu and other) and non-selective, with an α2-1 Xaa label; each logo represents 251 to 1041 unique peptides. Helix sequences of the 17 proximal double mutants: EGGQAAS, ELGQAYS, EQGQALS, ELGQAQS, EPGQAYS, ELGQAES, EPGQAWS, EMGQAKS, EFGQAYS, ERGQARS, EFGQAWS, ERGQAKS, ELGQAWS, EAGQAQS, ERGQAAS, ELGQAAS, EAGQAAS.]

2.8.3 Proximal residue single mutant specificity profiles

The set of 38 PDZ variants with single mutations of either proximal residue (α2-1 or α2-5) did not exhibit Class III [D]-2 specificity, nor any altered peptide preference at ligand positions other than position-2 (Figure 2.17). In general, mutation of His (α2-1) resulted in class switching from a preference for polar residues (Class I) to hydrophobic residues (Class II), or to a loss of specificity for position-2. In contrast, mutation of Val (α2-5) did not alter the preference for polar residues, but did affect the strength of that preference, as evidenced by the range of information content scores for these PDZ variants (Figure 2.19C).


Figure 2.17 Peptide profiles for PDZ variants with single mutations proximal to peptide position-2. Mutations at α2-1 or α2-5 are highlighted according to physicochemical character. Residues at the remaining positions (grey) match the wildtype Erbin PDZ domain sequence. The profile of wildtype Erbin PDZ (helix EHGQAVS) is included for comparison.

[Figure 2.17 logos not reproduced. Panels: α2-1 mutations (helix EXGQAVS, X = each of the 20 amino acids) and α2-5 mutations (helix EHGQAXS); each panel includes wildtype EHGQAVS. Each logo represents 66 to 1957 unique peptides.]

2.8.4 Distal residue mutant specificity profiles

The set of 21 PDZ variants with five mutations at the helix positions distal to the position-2 sidechain (with the proximal residues His (α2-1) and Val (α2-5) unaltered) either retained Class I specificity or were non-selective for position-2 (Figure 2.18). The strength of the position-2 preference varied; some PDZ variants had equivalent or higher position-2 information content scores than the parental scaffold, but most had lower scores (Figure 2.19B). In general, information content scores decreased as the formal charge of the distal positions of the engineered helix increased (Figure 2.20B).
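The charge axis used for the distal variants (net change in formal charge over the five distal residues relative to EHGQAVS) can be computed as below. The Python sketch assumes the distal positions are indices 0, 2, 3, 4 and 6 of the seven-residue helix string (all but α2-1 and α2-5) and treats His as neutral; the helper name is hypothetical.

```python
WILDTYPE = "EHGQAVS"
DISTAL = (0, 2, 3, 4, 6)                     # assumed indices of the five distal positions
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}  # His treated as neutral (assumption)


def distal_charge_change(variant: str, wildtype: str = WILDTYPE) -> int:
    """Net change in formal charge over the distal positions versus wildtype."""
    def distal_charge(seq: str) -> int:
        return sum(CHARGE.get(seq[i], 0) for i in DISTAL)
    return distal_charge(variant) - distal_charge(wildtype)


for v in ["SHFWAVV", "SHGRAVW", "THRRAVR", "RHRRAVR"]:
    print(v, f"{distal_charge_change(v):+d}")
```

For these four distal mutants the sketch gives +1, +2, +4 and +5, within the 1+ to 5+ range used to group Figure 2.18; the wildtype Glu at the first position is why replacing it alone already adds one unit of charge.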


Figure 2.18 Peptide profiles for PDZ variants with mutations distal to peptide position-2. These combinations of five mutations (highlighted) were observed in PDZ variants selected directly from the library. Residues at the proximal positions (grey) match the wildtype Erbin PDZ domain sequence (EHGQAVS). Peptide profiles are grouped according to specificity class and increase in formal charge (in columns).

[Figure 2.18 logos not reproduced. The 21 distal mutants are grouped by specificity class (Class I, other) and, in columns, by increase in formal charge (1+ to 5+); each logo represents 115 to 1170 unique peptides. Helix sequences: THRRAVR, SHGRAVW, THRRAVS, SHFWAVV, PHWRAVV, SHRKAVR, SHARAVH, PHRRAVY, SHRRAVR, RHWWAVL, PHRRAVR, SHRRAVA, RHRRAVR, PHFWAVW, SHQRAVR, PHARAVR, THWRAVR, SHPRAVW, RHGRAVR, DHRRAVR, RHRRAVS.]

Figure 2.19 Summary of position-2 information content and hydrophobicity scores for profiled PDZ variants. The information content at position-2 was calculated as for the logos, assuming a uni-form background distribution of peptide sequences. The hydrophobicity scores were calculated based on Black and Mould’s (1991) normalized scale, where glycine = 0, ar-ginine = -0.5 and phenylalanine = 0.5. Selectants are shown in A, and in D, where they are colour-coded according to their assigned specificity class. Variants with only proxi-mal mutations or only distal mutations are shown in B. Variants with single mutations at either proximal position are shown in C. Controls shown in each panel include the naive library (yellow triangle) or the parental scaffold (wildtype Erbin PDZ, orange circle).

[Figure 2.19, panels A to D: hydrophobicity at position-2 (y-axis, -0.50 to 0.50) plotted against information content at position-2 (x-axis, 0.00 to 1.00). Panel legends: A, Selectants; B, Proximal mutants and Distal mutants; C, α2-1 single mutants and α2-5 single mutants; D, Class I, Class II, Class III, Class IV and Nonspecific. Every panel also shows Wildtype Erbin PDZ and Naive library.]
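The two quantities plotted in Figure 2.19 can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the thesis: information content is taken as log2(20) minus the Shannon entropy of the position-2 residue frequencies (uniform background, no small-sample correction), and the hydrophobicity score is assumed to be the frequency-weighted mean of a per-residue scale. Only the three anchor values quoted in the caption (G = 0, R = -0.5, F = 0.5) are used; the full Black and Mould scale is not reproduced here, and the peptides are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Mapping, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residues_at(peptides: Iterable[str], pdz_position: int = -2) -> list[str]:
    """Residue at a PDZ ligand position, numbering the C-terminal residue as 0.

    Position-2 is therefore the third residue from the C-terminus (Python index -3).
    """
    return [peptide[pdz_position - 1] for peptide in peptides]


def information_content(residues: Sequence[str], normalize: bool = False) -> float:
    """log2(20) minus the Shannon entropy in bits, assuming a uniform background.

    normalize=True divides by log2(20) so the value lies in [0, 1]; which form
    the thesis plots is not stated, so this option is an assumption.
    """
    counts = Counter(residues)
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    max_bits = math.log2(len(AMINO_ACIDS))
    ic = max_bits - entropy
    return ic / max_bits if normalize else ic


def hydrophobicity_score(residues: Sequence[str], scale: Mapping[str, float]) -> float:
    """Frequency-weighted mean of a per-residue hydrophobicity scale (assumed definition)."""
    counts = Counter(residues)
    total = sum(counts.values())
    return sum(scale[aa] * n for aa, n in counts.items()) / total


# Only the anchor values quoted in the caption; a real analysis needs the full scale.
ANCHOR_SCALE = {"G": 0.0, "R": -0.5, "F": 0.5}

# Hypothetical ligands, written N- to C-terminus.
peptides = ["WEFWV", "WEFWV", "WERWV", "WEGWV"]
pos2 = residues_at(peptides)                      # ['F', 'F', 'R', 'G']
print(round(information_content(pos2), 3))        # 2.822, i.e. log2(20) - 1.5 bits
print(hydrophobicity_score(pos2, ANCHOR_SCALE))   # 0.125
```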


[Figure 2.20, panels A and B: hydrophobicity at position-2 (y-axis, -0.50 to 0.50) plotted against information content at position-2 (x-axis, 0.00 to 1.00); symbol colour scale for net change in formal charge, 0 to 5; controls: Wildtype Erbin PDZ and Naive library.]

Figure 2.20 Summary of position-2 information content and hydrophobicity scores for profiled PDZ variants, coded according to net change in formal charge. PDZ variant symbols are color-coded according to the net difference in formal charge relative to the parental scaffold helix EHGQAVS, considering only the five distal residues. Selectants are shown in A. PDZ variants with only distal mutations are shown in B.
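The charge coding in Figure 2.20, and the helix formal charges quoted in Section 2.9.5, can be reproduced with a small helper. This sketch assumes that Asp and Glu count as -1, Lys and Arg as +1 and His as neutral, and that the five distal residues are the engineered positions other than α2-1 and α2-5. Under these assumptions the parental helix EHGQAVS has a distal charge of -1, and the values below agree with the 1-, neutral and 3+ charges stated in the text for TLDGAEW, ELGQAES, RLEEAER and RLWRAER.

```python
# Order of the seven randomized positions in each engineered helix sequence.
HELIX_POSITIONS = ("β5:α2-4", "α2-1", "α2-2", "α2-3", "α2-4", "α2-5", "α2-6")
PROXIMAL = {"α2-1", "α2-5"}
# Assumed side-chain formal charges near neutral pH; His is treated as neutral.
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}
PARENT = "EHGQAVS"  # wildtype Erbin PDZ helix


def distal_charge(helix: str) -> int:
    """Net formal charge over the five distal positions only."""
    if len(helix) != len(HELIX_POSITIONS):
        raise ValueError(f"expected a 7-residue helix, got {helix!r}")
    return sum(CHARGE.get(aa, 0)
               for aa, pos in zip(helix, HELIX_POSITIONS) if pos not in PROXIMAL)


def charge_change(helix: str, parent: str = PARENT) -> int:
    """Net difference in distal formal charge relative to the parental helix."""
    return distal_charge(helix) - distal_charge(parent)


for helix in ("EHGQAVS", "TLDGAEW", "ELGQAES", "RLEEAER", "RLWRAER"):
    print(helix, distal_charge(helix), charge_change(helix))
```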


2.9 Discussion

2.9.1 The set of engineered variants encompasses the range of specificities observed among natural PDZ domains.

The engineered variants span a range of specificities similar to that observed among natural PDZ domains to date (Figure 2.21) [13, 29]. This includes examples of the major specificity classes (I, II and III), although not of every sub-specificity within a class. The missing sub-specificities tend to be those that depend on unusual peptide binding modes, such as Dvl2 (G-2), in which two residues occupy site-2 [18], or PDLIM4, which can accommodate proline or aspartate at position-2 (Sidhu and Ernst, unpublished results). In addition, I recovered PDZ selectants with specificities that have not yet been documented among natural domains. These include several examples of domains that prefer R-2, referred to as Class IV in this thesis. I also recovered PDZ variants specific for W-2, a Class II subtype that has not been described before; these variants also have different preferences at flanking ligand positions, indicating that the overall peptide binding mode is altered (discussed further below).

Figure 2.21 Comparison of position-2 specificities found in natural PDZ domains and engineered variants. Top row, summary of natural position-2 specificities. Red lettering indicates specificities found in a survey of nearly half of the PDZ domains encoded in the human and worm genomes (reproduced from Tonikian et al. (2008) A specificity map for the PDZ domain family. PLoS Biol 6:e239). Blue lettering indicates the specificity of nNOS PDZ (logo generated from peptide sequences reported in Stricker et al. (1997) PDZ domain of neuronal nitric oxide synthase recognizes novel C-terminal peptide sequences. Nature Biotechnol 15:336-42). Bottom row, summary of position-2 specificities found among the PDZ variants selected directly from the combinatorial library, including two specificities that have not yet been reported among natural domains, shown in black lettering. Natural specificities with no engineered counterpart correspond to a) domains that employ non-canonical peptide binding modes, including Dvl2 (G-2, PDB:3CBX) and PDLIM4 ([PD]-2, unpublished), or b) domains whose specificities are not reconciled by structures solved in complex with suboptimal ligands, such as CASK (F-2, PDB:1KWA). Symbols denote aromatic (Ω), aliphatic (Ψ) or hydrophobic (Φ) residues.

[Figure 2.21: position-2 sequence logos in two rows, labelled Natural and Engineered.]


Given that the selection targets matched the optimal specificity of Erbin-PDZ at positions other than position-2, it is not surprising that only a few variants with unusual peptide binding modes were recovered. The baseline affinity of the scaffold domain for the selection targets, independent of position-2 contacts, also explains why variants with negligible position-2 preferences were still recovered. Based on previous affinity measurements [8], the T-2 to A-2 substitution reduces affinity about 100-fold, from 20 nM for peptide WETWV-COOH to 2 uM for peptide WEAWV-COOH.
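Expressed as a free-energy difference, this 100-fold loss corresponds to roughly 2.7 kcal/mol. The worked equation below assumes that the two reported values behave as equilibrium dissociation constants and, since no temperature is given here, takes T = 298 K:

```latex
\Delta\Delta G = RT \ln\frac{K_d(\mathrm{WEAWV})}{K_d(\mathrm{WETWV})}
  \approx \left(1.987\times10^{-3}\,\mathrm{kcal\,mol^{-1}\,K^{-1}}\right)(298\,\mathrm{K})
  \ln\frac{2\,\mu\mathrm{M}}{20\,\mathrm{nM}}
  \approx 0.592 \times 4.61 \approx 2.7\ \mathrm{kcal\,mol^{-1}}
```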

2.9.2 Class I specificity can be achieved by canonical or non-canonical contacts.

Of the 27 Class I selectants, only seven have the canonical His (α2-1) specificity determinant. The α2-5 residues (Ala or Arg) in these canonical selectants correspond to the most S-selective or T-selective α2-5 point mutants, an indication that the selection conditions, with competitors in solution, did favor selectivity (Figure 2.17). The remaining 20 Class I selectants must employ non-canonical specificity determinants. In a few cases, the canonical peptide binding mode is probably preserved. One example is the variant with Gln (α2-1); the point mutants demonstrate that this residue can substitute for His (α2-1) (Figure 2.17). In most cases, however, the position of the α-helix relative to the rest of the domain is probably altered, bringing other contacts into proximity with the position-2 sidechain. Of these 20 non-canonical Class I selectants, 11 have residues in their engineered sequences that likely affect the register, rotation or conformation of the α-helix; specifically, 7 contain glycine and 4 contain proline (excluding the helix-initiating residue, where proline is frequently observed but unlikely to perturb the helix). For instance, four variants with Gly (α2-4) all exhibit Class I specificities (Figure 2.12); the only other variant with Gly (α2-4) also prefers polar position-2 residues but is below the threshold for specificity (Figure 2.15). The α2-4 sidechain packs against the hydrophobic core; all 96 protease-selected variants have Ala (α2-4), as does the parental scaffold. The Ala to Gly (α2-4) substitution may allow the helix to kink or shift, resulting in Class I specificity via alternate contacts. Interestingly, these same variants also all have proline at the helix-initiating position (β5:α2-4), which may compensate for the reduced stability incurred by the Gly (α2-4) substitution.
Some of the most successful PDZ specificity prediction approaches are based on the identity of contact residues, determined by structural consensus [13]. In contrast, these engineering efforts highlight the influence of non-contact, or “second sphere”, residues on specificity. The set of human PDZ domains includes more diversity at α2-4 than was observed in the engineered variants (Figure 2.22). It would be interesting to see whether computational prediction of natural PDZ specificities could be improved by including α2-4 along with direct contact residues. Characterization of this large set of variants also demonstrates that His (α2-1) is not entirely predictive of Class I specificity. Some α2-5 residues preclude that specificity (e.g., Trp (α2-5), Figure 2.17). Also, several distal residue mutants prefer residues other than [ST]-2 despite having the same two proximal residues as the Class I parental scaffold; this is true of some natural domains as well, such as MAGI1-PDZ4 [13]. These subtleties present challenges for predicting natural PDZ domain specificities from primary sequences. If the structural changes that prevent His (α2-1) from binding peptides with [ST]-2 can be identified, this information could be used to reduce the false-positive rate of Class I predictions.

2.9.3 Class II sub-specificities require two proximal contact residues, but the most selective variants depend on an alternative binding mode.

Mutation of His (α2-1) to nearly any other residue results in a PDZ variant that prefers aromatic residues at position-2 (Figures 2.17 and 2.19C). Achieving selectivity within Class II requires additional mutations at α2-5. For instance, two mutations are necessary and sufficient to generate a PDZ variant with a preference for aliphatic residues at position-2 (Figure 2.16). There does not seem to be an obvious relationship between the bulk or branching of the two proximal residues and preference for aromatic or aliphatic position-2 residues. Moreover, some variants with hydrophobic proximal residues exhibit Class IV specificity for R-2. These results may reflect fundamental differences between polar/ionic and hydrophobic interactions. Whereas specificity for polar sidechains relies on geometry and polarization, specificity for hydrophobic sidechains relies on surface area and on steric constraints that disfavor some residues. In that sense, the α2-1 mutants are not necessarily “specific” for aromatic position-2 residues; rather, in the absence of favorable polar contacts or steric constraints, the largest hydrophobic sidechains (i.e., aromatic residues) make the largest energetic contributions and are preferred. However, the absolute energetic contribution from these contacts may be small (compared to a hydrogen bond between domain and peptide, for instance) and may not indicate a high degree of selectivity. Peptide profiling by phage display selects the highest-affinity peptide binders, and others have noted that phage peptide profiles tend to be more hydrophobic in character than the set of natural ligands for a given domain [30]. Nevertheless, selectivity for a particular hydrophobic residue at position-2 is clearly achievable, as is evident in the set of natural domains and in some of the engineered variants, although it seems to require more than two direct contact residues. For instance, the most selective Class II variant also had altered specificity at positions-1 and -3, indicating that the overall peptide binding mode was altered (Figure 2.23). The derivative PDZ variant with matching proximal residues Gly (α2-1) and Ala (α2-5) shares this specificity, but it would be misleading to attribute this to direct favorable contacts. The depth of peptide profiling data generated using this approach may aid modeling efforts to determine the structural alterations that permit selectivity for W-2. Covariation between peptide positions can be identified with confidence, given the large number of binding peptides [15, 31]. For these variants, the full set of binding peptides is best represented by three separate logos (Figure 2.24).

Figure 2.22 Summary of human PDZ domain sequences corresponding to residues randomized in the combinatorial library. An alignment of all human PDZ domains (n=364) was downloaded from the SMART database (January 17, 2012). Correct alignment of the α2 helix was confirmed by comparison with available PDZ structures. Frequency logos for domains with (n=108) and without (n=156) the canonical Class I specificity determinant His (α2-1) are shown, with single-letter amino acid symbols color-coded according to physicochemical character. In these frequency logos, the height of each column is fixed and does not represent information content; the height of each symbol is proportional to its representation in the alignment.

[Figure 2.22: two frequency logos over positions β5:α2-4, α2-1, α2-2, α2-3, α2-4, α2-5 and α2-6.]
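Covariation between two ligand positions can be quantified in several ways. The sketch below uses mutual information with a shuffle-based null, a generic technique that is not necessarily the method of references [15, 31]. Positions follow the PDZ convention (C-terminal residue = 0), the function names are illustrative, and the peptide list is hypothetical.

```python
import math
import random
from collections import Counter
from typing import Sequence


def column(peptides: Sequence[str], pdz_position: int) -> list[str]:
    """Residues at a PDZ ligand position (C-terminal residue = position 0)."""
    return [p[pdz_position - 1] for p in peptides]


def mutual_information(x: Sequence[str], y: Sequence[str]) -> float:
    """Mutual information in bits between two aligned residue columns."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum((nxy / n) * math.log2(nxy * n / (px[a] * py[b]))
               for (a, b), nxy in joint.items())


def covariation(peptides, pos_i, pos_j, n_shuffles=1000, seed=0):
    """Observed MI plus an empirical p-value from shuffling one column."""
    x, y = column(peptides, pos_i), column(peptides, pos_j)
    observed = mutual_information(x, y)
    rng = random.Random(seed)
    shuffled = list(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if mutual_information(x, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_shuffles + 1)


# Hypothetical ligands in which position-3 and position-1 covary perfectly.
peptides = ["ETWV", "KTYV"] * 25
mi, p = covariation(peptides, -3, -1)
print(f"MI = {mi:.3f} bits, p = {p:.4f}")
```

The shuffle null matters because mutual information is biased upward in finite samples, so a nonzero value alone does not establish covariation.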

Figure 2.23 Necessity and sufficiency of two mutations to yield Class II W-2 specificity. Each PDZ variant’s engineered sequence is given below its logo, along with the number of unique peptide sequences it represents.

[Figure 2.23 panels, engineered helix sequence (number of unique peptides): PDZ variant, SGGQAAR (1610); α2-1 mutant, EGGQAVS (731); α2-5 mutant, EHGQAAS (560); proximal mutant, EGGQAAS (754); distal mutant, not profiled.]

Figure 2.24 Multiple specificity logos for PDZ variants with Class II W-2 specificity. Each PDZ variant’s engineered sequence is given below its logo, along with the number of unique peptide sequences it represents.

[Figure 2.24 panels: PDZ variant SGGQAAR (1610 peptides), represented by Logos 1 to 3 (771, 351 and 486 peptides); PDZ variant EGGQAAS (754 peptides), represented by three logos (407, 171 and 175 peptides).]


In all three logos, W-2 is preferred, but the alternative preferences at the other peptide positions reflect the overall binding mode(s) and could be used to refine a structural model that rationalizes W-2 specificity.

2.9.4 Class III variant suggests α2-5 can be the major specificity determining residue.

I recovered one PDZ variant specific for aspartate at ligand position-2, which is the same specificity exhibited by the nNOS PDZ domain [29] (Figure 2.25). Structural alignment of the nNOS PDZ structure (PDB:1B8Q) and the library scaffold Erbin-PDZ (PDB:1N7T) shows that both domains can form a hydrogen bond between the α2-1 residue (Tyr or His) and the position-2 sidechain (D-2 or T-2) (Figure 2.26). However, the PDZ variant’s Leu (α2-1) cannot form a hydrogen bond with D-2. Instead, it seems likely that Ser (α2-5) is the hydrogen bond donor, and that the other helix mutations allow α2-5 to become the major favorable contact (Figure 2.25). In some natural PDZ domains, α2-5 is likewise the major contact; for example, the preference of GRIP1-PDZ6 for Y-2 is mediated by Ile (α2-5) [20]. This switch might be permitted by a change in helix register due to Val (α2-4), which is rare in the set of engineered variants and not observed in the protease-resistant variants, but common in the set of human PDZ domains (Figure 2.22). This Class III variant illustrates that the same functional specificity can be generated by many structural mechanisms, and suggests that particular non-contact residues may have a disproportionate effect on the relative importance of direct contact residues.

Figure 2.25 Necessity of at least two mutations to yield Class III specificity in this PDZ variant. Each PDZ variant’s engineered sequence is given below its logo, along with the number of unique peptide sequences it represents.

[Figure 2.25 panels, engineered helix sequence (number of unique peptides): PDZ variant, FLVLVSG (238); α2-1 mutant, ELGQAVS (230); α2-5 mutant, EHGQASS (1028); proximal mutant, not profiled; distal mutant, not profiled.]

Figure 2.26 Comparison of Erbin PDZ (Class I) and nNOS (Class III) peptide binding modes with the sequence of the engineered Class III variant. Structural alignment of the Erbin PDZ domain (light grey surface, grey ligand ETWV-COOH) and the nNOS PDZ domain (dark grey surface, cyan ligand VDSV-COOH). The engineered helix residues are shown as a ribbon, with the α2-1 and α2-5 sidechains shown as sticks. Hydrogen bonds between α2-1 and position-2 are indicated (His (α2-1) and T-2 in Erbin, yellow bond; Tyr (α2-1) and D-2 in nNOS, orange bond). By comparison, the engineered PDZ variant cannot form a hydrogen bond between its α2-1 Leu residue and D-2; instead, its Ser (α2-5) residue may provide the dominant favorable contact, due to a change in the register or rotation of the engineered α-helix, enabled by flanking mutations such as Val (α2-4) or Gly (α2-6).

[Figure 2.26 panels: Erbin PDZ, helix EHGQAVS (1004 peptides); nNOS PDZ, helix SYDSALE; Variant PDZ, helix FLVLVSG (238 peptides).]

2.9.5 Class IV specificity can be generated by polar or non-polar contacts, and this specificity may exist in the set of natural domains.

Figure 2.27 Necessity and sufficiency of two mutations to yield Class IV specificity. Each PDZ variant’s engineered sequence is given below its logo, along with the number of unique peptide sequences it represents.

[Figure 2.27 panels, engineered helix sequence (number of unique peptides): PDZ variants TLDGAEW (781), RLEEAER (434) and RLWRAER (250); α2-1 mutant, ELGQAVS (230); α2-5 mutant, EHGQAES (252); proximal mutant, ELGQAES (403); distal mutant, not profiled.]

Several PDZ variants exhibited specificity for R-2, which has not been widely reported among natural PDZ domains to date (Figure 2.14). Direct contacts are sufficient to generate this specificity, but apparently through two different mechanisms. Three variants with Leu (α2-1) and Glu (α2-5) prefer R-2, as does the proximal mutant with these two residues (Figure 2.27). This specificity is probably mediated by an ionic interaction between the carboxylate group of Glu (α2-5) and the guanidinium group of R-2 (as well as hydrophobic contact between Leu (α2-1) and the aliphatic portion of the R-2 sidechain). Alternatively, two variants with Tyr (α2-5) (Figure 2.14) and point mutants with aromatic α2-5 residues (Figure 2.17) also bind R-2 peptides, as do proximal mutants with Phe or Leu at α2-1 and aromatic α2-5 residues (Figure 2.16). In this case, the interaction probably relies on cation-π interactions. The logos suggest that R-2 specificity based on putative polar/ionic contacts is more selective. It would be interesting to compare absolute affinities, and also the degree of selectivity between variants with similar specificities mediated by different interacting residues (see Section 3.1 Future Directions). The influence of distal residues on the strength of specificity is illustrated by the range of information content scores for R-2-specific variants with proximal residues Leu (α2-1) and Glu (α2-5) (Figure 2.15B, blue squares). The most selective PDZ variant (helix TLDGAEW) is far more selective for R-2 than the proximal mutant (helix ELGQAES), despite many similarities: both variants have a formal charge of 1- (considering distal residues only), and both have a glycine residue, albeit at different helix positions. The distal helix residues in the more selective variant must optimize the proximal residue contacts with the R-2 sidechain. The next most selective PDZ variant (helix RLEEAER) has a neutral formal charge, while the third most selective variant has a formal charge of 3+ (considering distal residues only). It appears that the degree of selectivity decreases as the charge complementarity between the helix and the ligand (ETWV-COOH) increases.
The location of the charges may also matter; the second most selective PDZ variant has a neutral formal charge only because the two glutamate residues in the center of the helix are counterbalanced by two arginine residues at either end. Given that specificity for R-2 seems easily evolvable, requiring only direct contacts and no apparent alteration of the PDZ fold, it is curious that this specificity has not been widely observed among natural domains. There are three natural human PDZ domains with Leu (α2-1) and Glu (α2-5): the second PDZ domain of DLG5, the third PDZ domain of USH1C, and the only PDZ domain of TIAM2. DLG5_PDZ2 has not been specificity profiled by our group or others. USH1C_PDZ3 was peptide profiled but did not yield specific peptide sequences [13]. However, TIAM2 was studied by three groups [6, 12, 13] whose contradictory results prompted a further independent study, which showed that TIAM2 prefers positively charged residues (R, H, K) and tyrosine (Y) at position-2 [32]. Biologically relevant binding partners for TIAM2 with [RHK]-2 residues have not yet been identified. Further specificity profiling of natural domains, to establish whether they do prefer basic residues at position-2, could demonstrate that the study of engineered variants can generate predictive insights into natural specificities.


2.9.6 Model for helix contributions to specificity, structure or generic affinity is an oversimplification.

The summary of specificities suggests that pairwise combinations of residues at the proximal positions are generally necessary and sufficient to generate most specificities in this scaffold. However, it is clear that attributing an entirely structural role to one residue is an oversimplification. Variants with residues other than the most structurally preferred Ala (α2-4) were among the most specific. These variants might also be less stable, which would represent a tradeoff in favor of the variant’s peptide binding function. Structural constraints on tolerated diversity at α2-4 do not preclude its importance for determining position-2 specificity; in fact, it may be among the most important non-contact residues. Similarly, introduction of helix-disrupting glycine or proline residues yielded some of the most specific PDZ variants, but not in a way that is consistent with the model outlined above. As a whole, these findings suggest that models for predicting natural PDZ domain specificity could be improved by accounting for the effects of key non-contact residues. Testing the effect of charged residues at distal positions of the α2 helix on affinity and specificity will require affinity measurements. The inverse relationship between helix formal charge and position-2 information content score among the set of PDZ variants with distal mutations does suggest that a tradeoff exists: increased charge complementarity between domain and ligand may reduce the energetic importance of specific favorable position-2 contacts. It also suggests that the scaffold PDZ domain, and perhaps other natural PDZ domains, may be optimized to minimize generic affinity in order to maximize selectivity.
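One possible way to quantify an inverse relationship of this kind is a rank correlation between distal-helix formal charge and position-2 information content. The sketch below computes Spearman's rho in plain Python, using average ranks for ties (scipy.stats.spearmanr is an established alternative). It is not an analysis reported in this thesis, and the input values in the example are hypothetical placeholders rather than measurements.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: net distal charge change vs. position-2 information content.
charges = [1, 1, 2, 4]
info_content = [0.9, 0.7, 0.5, 0.2]
print(round(spearman_rho(charges, info_content), 3))
```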


2.10 Materials and Methods

2.10.0 Strains

E. coli strain XL1Blue (Stratagene) was used to express GST-fusion proteins and to propagate phage during selections. PDZ-phage libraries were prepared by single-strand mutagenesis using dU-ssDNA harvested from phage propagated in E. coli strain CJ236 (New England Biolabs). PDZ-phage and peptide-phage libraries were amplified and re-amplified using E. coli strain SS320, which was generated by mating strain XL1Blue to strain MC1061.

2.10.1 PDZ-phage library construction

A phage display library of Erbin PDZ variants was constructed using a single-strand mutagenesis method, as described [25]. The display phagemid encoded the Erbin PDZ domain with an N-terminal epitope tag (gD tag: SMADPNRFRGKDL) as a fusion to the minor coat protein, pIII. Seven contiguous positions of the Erbin PDZ domain were completely randomized. A version of the display phagemid with stop codons in the region to be mutated served as the template for library construction. The stop codons ensured that wildtype Erbin PDZ clones were not overrepresented in the naïve library, and that only successfully mutated clones were displayed on phage. Single-stranded stop template DNA was harvested from phage produced in E. coli strain CJ236, which aberrantly incorporates dUTP into replicating DNA. The mutagenic oligonucleotides had 15 bases complementary to the 5’ and 3’ sequences flanking the sites to be mutated, to allow annealing to the stop template. The sites to be mutated were synthesized with mixtures of bases. Each of the seven randomized positions was encoded by an NNK codon (N = A, T, G or C; K = G or T), whose 32 permutations encode all 20 amino acids and the amber stop codon. The mutagenic oligonucleotide was enzymatically phosphorylated at its 5’ end and annealed to the stop template. The rest of the second strand of DNA was enzymatically synthesized and ligated via the oligonucleotide’s 5’ phosphoryl group, to covalently close the second strand of DNA (CCC-dsDNA). After purification to remove salt and concentrate the CCC-dsDNA, it was electroporated into E. coli strain SS320. This strain possesses enzymes that preferentially replace the dUTP-containing stop template strand, which resulted in a dsDNA phagemid encoding a PDZ variant with mutations in its α-helix.
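The statement that the 32 NNK codons cover all 20 amino acids plus the amber stop codon can be checked by enumeration. This sketch translates each codon with the standard genetic code, encoded as the usual 64-letter string in TCAG order; it also shows that NNK does not represent residues equally, since Leu, Arg and Ser each have three NNK codons while twelve residues have only one.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); '*' marks stop codons.
# Codon index = 16 * first + 4 * second + third, with bases ordered as in BASES.
CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(codon: str) -> str:
    """Translate one DNA codon (T/C/A/G) with the standard genetic code."""
    first, second, third = (BASES.index(base) for base in codon)
    return CODE[16 * first + 4 * second + third]


def nnk_codons() -> list[str]:
    """All codons of the form NNK, where N = A/C/G/T and K = G/T."""
    return ["".join(bases) for bases in product("ACGT", "ACGT", "GT")]


codons = nnk_codons()
usage = Counter(translate(c) for c in codons)
stops = [c for c in codons if translate(c) == "*"]

print(len(codons))                                       # 32
print(len(usage) - 1, "amino acids", stops)              # 20 amino acids ['TAG']
print(sorted(aa for aa, n in usage.items() if n == 3))   # ['L', 'R', 'S']
```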
The preferential removal of the dUTP-containing template increases the mutation rate to approximately 80%, which increases the proportion of library transformants that actually encode PDZ variants (rather than the stop template). In order to exhaustively sample this library, the electroporation was carried out on a scale that exceeds the theoretical library diversity. At the protein level, the theoretical library diversity was 20⁷ ≈ 1.3 × 10⁹. The electroporation yielded ~1 × 10¹⁰ transformants, as assessed by serial dilutions plated less than one doubling time (30 minutes) after electroporation. The practical library diversity therefore oversampled the theoretical library diversity by several-fold. Oversampling of the practical library diversity during selections (see below) allowed us to infer that the theoretical library was also oversampled. The electrocompetent SS320 cells were pre-transformed with the helper phagemid M13KO7, which encodes all the other proteins necessary for phage production, in addition to a wildtype version of pIII. Consequently, every transformed cell was primed to propagate phage particles displaying a PDZ variant and packaging the phagemid that encodes it. The electroporated SS320 were diluted into rich medium containing antibiotics that selected for the display phagemid and the helper phagemid, and cultured until saturation. At that point, the secreted phage particles were harvested from the culture medium by polyethylene glycol (PEG) precipitation and titered in preparation for the first round of selections.
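The oversampling argument can be made more concrete by estimating the fraction of the 20⁷ protein sequences expected to appear at least once. The sketch below assumes that clones are independent draws (Poisson sampling) and weights each sequence by its NNK codon usage after excluding the stop codon: three residues with three codons, five with two and twelve with one, out of 31 sense codons per position. The clone number passed in is an assumption: it should count only mutated, stop-free clones, and how many of the ~1 × 10¹⁰ transformants qualify is not stated here.

```python
import math

POSITIONS = 7
SENSE_CODONS = 31                    # 32 NNK codons minus the amber stop TAG
# NNK codon multiplicities: {codons per residue: number of residues}
MULTIPLICITY = {1: 12, 2: 5, 3: 3}


def expected_coverage(n_clones: float, positions: int = POSITIONS) -> float:
    """Expected fraction of all 20**positions sequences observed at least once.

    Each clone is assumed to be an independent draw from the stop-free NNK
    distribution, so a sequence's probability is the product of its
    per-position codon counts divided by 31**positions.
    """
    total = SENSE_CODONS ** positions
    seen = 0.0
    # Group sequences by how many positions carry 1-, 2- or 3-codon residues.
    for n1 in range(positions + 1):
        for n2 in range(positions - n1 + 1):
            n3 = positions - n1 - n2
            n_sequences = (math.comb(positions, n1) * math.comb(positions - n1, n2)
                           * MULTIPLICITY[1] ** n1 * MULTIPLICITY[2] ** n2
                           * MULTIPLICITY[3] ** n3)
            p_sequence = (2 ** n2 * 3 ** n3) / total
            # 1 - exp(-N p): chance a given sequence appears at least once
            seen += n_sequences * -math.expm1(-n_clones * p_sequence)
    return seen / 20 ** positions


diversity = 20 ** POSITIONS
print(f"{diversity:.2e} protein sequences")
print(f"{1e10 / diversity:.1f}-fold more transformants than sequences")
# Hypothetical: treat 80% of ~1e10 transformants as usable, stop-free clones.
print(f"{expected_coverage(0.8 * 1e10):.2f} expected coverage")
```

Because the rarest NNK sequences are several-fold less likely than the average one, coverage under this model stays below the figure implied by a uniform library of the same size.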

2.10.2 Cloning, expression and purification of GST-peptide selection targets

Target peptides were expressed as C-terminal fusions to N-terminally 6xHis-tagged glutathione-S-transferase (GST) in an IPTG-inducible open reading frame (ORF) (vector pS2771). The set of twenty vectors encoding different position-2 residues was generated by oligonucleotide-directed single-strand mutagenesis, at 1/10th of the scale described above for library construction, and transformed into chemically competent XL1Blue. Single colonies were propagated in 2YT + 100 ug/ml carbenicillin and stored as glycerol stocks (25% glycerol v/v) at -80°C. GST-peptide sequences were confirmed by Sanger sequencing. Starter cultures for protein expression were inoculated from glycerol stocks and propagated at 37°C, 200 rpm overnight. The following day, 2-L baffled flasks containing 250 ml of 2YT + 100 ug/ml carbenicillin were inoculated with 5 ml of overnight culture and grown at 37°C, 200 rpm until reaching early log phase (OD600 = 0.4). Expression was induced by addition of isopropyl-β-D-thiogalactopyranoside (IPTG) to a final concentration of 0.4 mM, and the cultures were grown for another 20 hours at 16°C, 200 rpm. Cells were harvested by centrifugation (17,600 x g) at 4°C and cell pellets were flash frozen in liquid nitrogen. Frozen cell pellets were resuspended in 5 ml PBS (137 mM NaCl, 2.7 mM KCl, 4.3 mM Na2HPO4, 1.47 mM KH2PO4, pH 7.4) containing 1 mM EDTA, 1 mM DTT, 0.5% Triton X-100 (v/v) and additional protease inhibitors (1 tablet per 50 ml buffer, Roche) and transferred to 50 ml conical vials. Cells were lysed by three freeze-thaw cycles (flash freezing in liquid nitrogen, thawing in a 37°C water bath). Insoluble material was pelleted by centrifugation at 26,700 x g, 4°C for 20 minutes. Cleared lysate was transferred to equilibrated glutathione-sepharose 4B resin (150 ul packed resin per 250 ml culture volume, Amersham) and incubated at 4°C with end-over-end mixing for at least two hours. Lysate and resin were transferred to gravity flow columns and washed with cold buffers: once with 3 ml PBS, once with 3 ml PBS + 150 mM NaCl and once more with 3 ml PBS. Columns were capped and GST-peptides were eluted by incubation for 30 minutes at 4°C with 1 ml of 10 mM reduced glutathione (Sigma) in 50 mM Tris-Cl, pH 8.0. Protein purity and expected size were confirmed by SDS-PAGE and Coomassie staining; purity generally exceeded 95% based on visual inspection. Protein concentration was calculated from absorbance at 280 nm and the sequence-specific theoretical extinction coefficient (www.expasy.org/tools/protparam.html). Typical yields ranged from 2 mg to 10 mg of GST-peptide per liter of bacterial culture. Protease inhibitors EDTA and PMSF were added to a final concentration of 1 mM each and proteins were stored at 4°C.
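The concentration calculation can be sketched with the Beer-Lambert law and the residue-based extinction coefficient that ProtParam reports (5500 M⁻¹cm⁻¹ per Trp, 1490 per Tyr and 125 per cystine). The function names are illustrative, the example sequence and absorbance are hypothetical, and the path length and molecular weight are inputs the user must supply; whether cysteines are counted as cystines should match the redox state of the purified protein.

```python
def extinction_coefficient_280(sequence: str, cysteines_paired: bool = False) -> int:
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1), ProtParam-style."""
    seq = sequence.upper()
    cystines = seq.count("C") // 2 if cysteines_paired else 0
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * cystines


def concentration_from_a280(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Molar concentration from Beer-Lambert: A = epsilon * c * l."""
    if epsilon <= 0:
        raise ValueError("no Trp, Tyr or cystine; A280 cannot be used")
    return a280 / (epsilon * path_cm)


# Hypothetical sequence and reading; use the full GST-peptide sequence in practice.
eps = extinction_coefficient_280("MSWWYKCDC", cysteines_paired=True)  # 2*5500 + 1490 + 125
molar = concentration_from_a280(0.5, eps)
mg_per_ml = molar * 27_000  # assumes a hypothetical molecular weight of 27,000 g/mol
print(eps, f"{molar * 1e6:.1f} uM", f"{mg_per_ml:.2f} mg/ml")
```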

2.10.3 Selection of PDZ-phage library against GST-peptides

First round of selections: Selection targets were immobilized by adsorption to microtiter plates (96-well Maxisorp, NUNC) during overnight incubations at 4°C. For the first round, eight wells were coated with 100 ul of 100 ug/ml of each GST-peptide, or with anti-gD antibody (Genentech) diluted 1:1000 in PBS. Wells coated overnight were then blocked for 2 hours at room temperature with 200 ul PBS + 0.5% BSA. Blocked wells were washed three times with PBS. Naïve PDZ-phage library was resuspended to approximately 10¹² cfu/ml in PBS + 0.5% BSA + 0.05% Tween-20. Naïve library was aliquoted (100 ul per well) and sealed plates were incubated at room temperature with gentle agitation for 2 hours. Unbound phage were discarded and wells were washed eight times with PBS + 0.05% Tween-20. Bound phage were eluted by a 10-minute incubation with 100 ul of 0.1 M HCl. Phage eluted from wells coated with the same target peptide were combined and transferred to sterile 1.5 ml tubes containing neutralization buffer (1 M Tris, pH 11; 15 ul neutralization buffer per 100 ul of eluted phage). The entire volume of neutralized phage from each selection target was used to infect 4 ml of mid-log phase E. coli XL1Blue (OD600 = 0.4 to 0.8) during a 30-minute incubation at 37°C, 200 rpm. A small volume of infected cells (10 ul) was removed to estimate the output phage titer (see below). Helper phage (M13KO7, New England Biolabs) was added at a concentration of 5 × 10¹⁰ cfu/ml to each culture, which was grown at 37°C, 200 rpm for an hour to allow superinfection. Each superinfected culture was transferred to 50 ml 2YT + carbenicillin 100 ug/ml + kanamycin 25 ug/ml in a 250 ml baffled flask and grown for 16 hours at 37°C, 200 rpm to propagate phage for the subsequent round.
The naïve library input was titered by 10-fold serial dilution in PBS (10^-1 to 10^-16) and infection of log-phase XL1Blue (10 ul phage dilution + 90 ul cells) for 30 minutes at 37°C, 200 rpm in a sealed 96-well plate, followed by plating on 2YT + carbenicillin plates and growth at 37°C overnight. The number of colonies was used to calculate the concentration of naïve library phage used in the first round of selections. The selection output was titered by serial dilution of the 10 ul aliquot of infected cells in 2YT and immediate plating on 2YT + carbenicillin plates for growth at 37°C overnight.
Second round of selections: Three wells were coated with each GST-peptide or anti-gD antibody, as described for the first round. In addition, three wells were coated with GST alone, to be used in a preselection step to remove GST-binding PDZ-phage. Phage were harvested from the overnight culture supernatants as follows. Bacteria were pelleted by centrifugation at 26,700 x g for 10 minutes at 4°C, and 40 ml of culture supernatant was transferred to a clean tube containing 10 ml of PEG/NaCl solution (20% (w/v) polyethylene glycol 8000 + 2.5 M NaCl) and incubated on ice for 30 minutes. After centrifugation at 26,700 x g for 20 minutes, the supernatant was discarded. The tubes were centrifuged again briefly to collect residual supernatant, which was removed by pipetting. The precipitated phage was resuspended by gentle pipetting in 1 ml PBS + 0.5% BSA + 0.05% Tween-20 and transferred to sterile 1.5 ml tubes. The resuspended phage was centrifuged at 16,100 x g and 4°C to pellet insoluble material, and the phage supernatant was transferred to a clean tube for use as round 2 input phage. After the GST-coated preselection wells had been blocked for 2 hours with 200 ul PBS + 0.5% BSA, they were washed three times with PBS + 0.5% Tween-20. Then 100 ul of input phage from each selection was added to three wells and incubated at room temperature for 1 hour. After the GST-peptide or anti-gD coated wells had been blocked for 3 hours, they were washed three times with PBS + 0.5% Tween-20. Input phage from the GST preselection wells were transferred to the appropriate selection wells and incubated at room temperature for 2 hours. Washing, phage elution, and neutralization were carried out as for round 1. Half the neutralized phage was used to infect 1.5 ml of log-phase XL1Blue, which was superinfected and propagated overnight as described. Round 2 input phage was titered using the method described for the naïve library, and round 2 output phage was titered as described for round 1 output phage. Residual input and output phage were stored at -20°C.
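
Colony counts from the serial dilutions convert to a titer of the undiluted stock as sketched below. The 10 ul phage dilution in a 100 ul infection comes from the text; how much of each infection was plated is not stated, so plated_ul is an assumption, as is the 30 to 300 colony "countable" window used to choose which plates to average.

```python
def titer_cfu_per_ml(colonies: int, dilution_exponent: int,
                     phage_ul: float = 10.0, infection_ul: float = 100.0,
                     plated_ul: float = 100.0) -> float:
    """Phage concentration (cfu/ml) of the undiluted stock.

    colonies: colonies on the plate from the 10^-dilution_exponent dilution
    phage_ul / infection_ul: 10 ul diluted phage in a 100 ul infection (from the text)
    plated_ul: volume of the infection mix that was plated (ASSUMPTION)
    """
    phage_ml_plated = (phage_ul / 1000.0) * (plated_ul / infection_ul)
    return colonies / phage_ml_plated * 10 ** dilution_exponent


def average_titer(counts: dict[int, int], lo: int = 30, hi: int = 300, **kwargs) -> float:
    """Average the titers of plates whose counts fall in an assumed countable window."""
    usable = [titer_cfu_per_ml(c, exp, **kwargs)
              for exp, c in counts.items() if lo <= c <= hi]
    if not usable:
        raise ValueError("no plate in the countable range")
    return sum(usable) / len(usable)


# Hypothetical colony counts keyed by dilution exponent (10^-8, 10^-9, 10^-10)
print(f"{average_titer({8: 500, 9: 50, 10: 5}):.1e} cfu/ml")  # 5.0e+12 cfu/ml
```

Only the 10^-9 plate falls in the window here, so it alone sets the estimate; with several countable plates the function averages them.
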
Third, fourth and fifth rounds of selection: Subsequent rounds of selection were carried out in the same manner as round 2, with the following modification: GST-peptides were added as competitors during selection. First, competitor mixtures were prepared that contained nineteen GST-peptides (excluding a given selection target) each at a concentration of 0.5 ug/ml. Only 90 ul of input phage were transferred from each preselection well to selection well, and then 10 ul of appropriate competitor mixture was added to each selection well. The final concentration of all GST-peptides in solution was 320 nM; the final concentration of each individual GST-peptide was 17 nM.
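
The stated competitor concentrations are internally consistent: nineteen competitors at 17 nM each sum to 323 nM, close to the stated 320 nM total. The sketch below reproduces this bookkeeping and adds a mass-to-molar conversion helper; the molecular weight in the example call is a hypothetical round number, not that of the GST-peptide fusions.

```python
def total_competitor_nM(per_peptide_nM: float, n_competitors: int = 19) -> float:
    """Combined concentration of the competitor GST-peptides in a selection well."""
    return per_peptide_nM * n_competitors


def ug_per_ml_to_nM(conc_ug_per_ml: float, mw_da: float) -> float:
    """1 ug/ml = 1e-3 g/L; divide by molar mass (g/mol) for mol/L, times 1e9 for nM."""
    return conc_ug_per_ml * 1e6 / mw_da


print(total_competitor_nM(17))       # 323
print(ug_per_ml_to_nM(1.0, 50_000))  # 20.0 (hypothetical 50 kDa protein)
```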

2.10.4 Binding validation and sequencing of PDZ variants

PDZ-phage clones from rounds 4 and 5 were tested for peptide binding by enzyme-linked immunosorbent assay (ELISA) and positive binders were sequenced, as follows. Adjacent wells of a 384-well Maxisorp plate were coated with 30 ul of selection target GST-peptide (100 ug/ml) or GST control (100 ug/ml) overnight at 4°C. For anti-gD selections, adjacent wells were coated with 30 ul of anti-gD antibody diluted 1:1000 in PBS, or GST control. Output phage from selection rounds 4 and 5 were infected into log-phase XL1Blue and serial dilutions were plated to 2YT+carbenicillin.

Single colonies were picked into 450 ul of 2YT + 100 ug/ml carbenicillin + 5x10^10 cfu/ml M13KO7 in a 96-well block and grown overnight at 37°C, 200 rpm to produce clonal phage supernatants. I picked 96 clones from each peptide selection and from the selection for protease resistance, for a total of 2100 clones screened. The following day, coated wells were blocked with PBS + 0.5% BSA for one hour at room temperature and then washed three times with PBS + 0.05% Tween-20. Overnight cultures were centrifuged at 3400 x g for 10 minutes to pellet bacteria, and then 30 ul of each clonal phage supernatant was transferred to GST-peptide and GST control wells and incubated at room temperature for one hour. Wells were washed four times with PBS + 0.05% Tween-20. Anti-M13:HRP conjugated antibody was diluted 1:8000 in PBS + 0.5% BSA + 0.05% Tween-20, and 30 ul was added to every well and allowed to bind at room temperature for 45 minutes. Wells were washed eight times with PBS + 0.05% Tween-20. Colourimetric HRP substrate reagents (TMB substrate, Pierce) were mixed 1:1 and 25 ul was added to every well. The reaction was allowed to proceed for 5 to 10 minutes with gentle shaking and then stopped by addition of 25 ul of 1 M H3PO4. Absorbance at 450 nm was measured for each well using a plate reader. The ratio of binding to GST-peptide relative to GST control was used to rank PDZ-phage clones for each selection. The top 24 clones from each selection, with a minimum two-fold difference in binding to GST-peptide over GST control, were sequenced. All 96 clonal PDZ-phage from the selection for protease resistance exhibited strong, specific binding to immobilized antibody; all 96 clones were sequenced. The clonal phage supernatant was used as template in a PCR reaction that amplified the displayed PDZ variant and added universal primer sites to either end of the PCR product.
Residual primers and dNTPs were removed enzymatically using exonuclease I (USB) and shrimp alkaline phosphatase (USB), and the cleaned-up PCR product was submitted for Sanger sequencing.
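
The clone-picking rule above (rank by the ratio of target to GST-control A450, then keep up to 24 clones with at least a two-fold ratio) can be sketched as follows. The Clone record and the floor on the control signal are illustrative assumptions, not part of the original analysis.

```python
from dataclasses import dataclass


@dataclass
class Clone:
    clone_id: str
    a450_target: float   # signal in the GST-peptide (or anti-gD) well
    a450_control: float  # signal in the adjacent GST control well


def binding_ratio(clone: Clone, floor: float = 0.01) -> float:
    # The floor (an assumption) avoids dividing by a near-zero control reading.
    return clone.a450_target / max(clone.a450_control, floor)


def pick_for_sequencing(clones: list[Clone], top_n: int = 24,
                        min_ratio: float = 2.0) -> list[Clone]:
    """Rank clones by target/control ratio; return up to top_n with ratio >= min_ratio."""
    ranked = sorted(clones, key=binding_ratio, reverse=True)
    return [c for c in ranked if binding_ratio(c) >= min_ratio][:top_n]


plate = [Clone("A", 1.2, 0.1), Clone("B", 0.3, 0.2),
         Clone("C", 0.9, 0.3), Clone("D", 0.5, 0.05)]
print([c.clone_id for c in pick_for_sequencing(plate)])  # ['A', 'D', 'C']
```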

2.10.5 Subcloning, expression and purification of PDZ variants

PDZ variants were expressed as C-terminal fusions to N-terminally 6xHis-tagged GST in an IPTG-inducible ORF (vector pS2771). PDZ variants selected from the library were subcloned using conventional methods: PCR was used to amplify each PDZ variant’s coding sequence and to add flanking KpnI and SpeI restriction sites, and the digested PCR product was ligated into similarly digested vector pS2771 and transformed into XL1Blue. PDZ variants that were not directly selected from the library (point mutants, the proximal residue mutant and the distal residue mutants) were generated by oligonucleotide-directed single-strand mutagenesis at 1/10th of the scale described above for library construction and transformed into chemically competent XL1Blue. Some of the single-mutant PDZ domains were described in previous work [13]; these plasmids were resequenced and used to transform chemically competent XL1Blue.

In all cases, single colonies were arrayed in 96-well boxes with individual cells, grown overnight in 200 ul 2YT + 100 ug/ml carbenicillin and stored as glycerol stocks (25% glycerol v/v) at -80°C. Correct GST-PDZ variants were re-arrayed to two 96-well plates and resequenced. GST-PDZ variants were expressed in a 96-well format using deep-well plates (Whatman). GST-PDZ glycerol stocks were partly thawed and used to inoculate overnight starter cultures (450 ul 2YT + 100 ug/ml carbenicillin, 37°C, 200 rpm). The following day, each starter culture plate was used to inoculate four duplicate expression plates filled with 1.4 ml 2YT + 50 ug/ml carbenicillin + 0.4 mM IPTG. The expression plates were grown at 37°C, 200 rpm for 48 hours. One expression plate was centrifuged to pellet the bacteria (3400 x g for 10 minutes), the supernatant was discarded, and the cultures from a second duplicate plate were transferred on top of these pellets and centrifuged again. This process was repeated to combine the contents of the four duplicate plates in a single plate. The plate containing the bacterial pellets was sealed with foil and frozen at -20°C overnight. GST-PDZ variants were purified via their N-terminal 6xHis tags by immobilized metal affinity chromatography (IMAC). Frozen bacterial pellets were resuspended in 250 ul resuspension buffer (50 mM HEPES pH 7.5, 500 mM NaCl, 5 mM imidazole, 5% glycerol, 1 mM PMSF, 1 mM benzamidine) by pipetting. After thorough resuspension, bacteria were lysed by addition of 750 ul of lysis buffer (50 mM HEPES pH 7.5, 500 mM NaCl, 5 mM imidazole, 5% glycerol, 1 mM PMSF, 1 mM benzamidine, 1% Triton X-100, 1 mg/ml lysozyme, 5 units/ml DNase I) and incubation with vigorous shaking at room temperature for 30 minutes. Insoluble material was collected by centrifugation at 3400 x g for 60 minutes.
During centrifugation, Ni-NTA resin was equilibrated in resuspension buffer and 50 ul of packed resin was aliquoted to each well of a 96-well filter plate (Seahorse Bioscience). Filter plates were centrifuged briefly to confirm retention of Ni-NTA resin in every well and to remove excess equilibration buffer, and then the bottom of the filter plate was sealed with parafilm. Cleared lysate was transferred to the resin-containing filter plate, which was sealed with foil and incubated with end-over-end mixing for at least one hour at room temperature. Unbound lysate was removed by centrifugation. The resin was washed four times with wash buffer (50 mM HEPES pH 7.5, 500 mM NaCl, 30 mM imidazole, 5% glycerol). For each wash, the bottom of the filter plate was sealed with parafilm, wash buffer was aliquoted to each well, the top of the filter plate was sealed with adhesive foil and the plate was shaken vigorously to resuspend the resin. Wash buffer was removed by centrifugation (1000 x g). To elute the GST-PDZ proteins, the bottom of the filter plate was sealed, and 200 ul of elution buffer (50 mM HEPES pH 7.5, 500 mM NaCl, 250 mM imidazole, 5% glycerol) was aliquoted to each well and incubated for 10 minutes at room temperature. Eluate was collected by centrifugation into a non-binding 96-well plate (NUNC). Protein concentrations were determined by Bradford assay in 96-well format. SDS-PAGE and Coomassie staining were used to evaluate the purity of selected GST-PDZ samples representing the full range of observed concentrations. Typical protein yields ranged from 5 to 50 ug, and purity generally exceeded 90% based on visual inspection.
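
Bradford readings are usually converted to concentrations through a standard curve. A minimal sketch, assuming a linear least-squares fit over the linear range of the assay and readings at 595 nm; the BSA standards, the sample reading and its dilution are hypothetical. The last step multiplies by the 200 ul elution volume to give a per-well yield comparable to the 5 to 50 ug range reported for these purifications.

```python
def fit_standard_curve(conc: list[float], absorbance: list[float]) -> tuple[float, float]:
    """Ordinary least-squares line: absorbance = slope * conc + intercept."""
    n = len(conc)
    mean_x = sum(conc) / n
    mean_y = sum(absorbance) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, absorbance))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sample_concentration(a595: float, slope: float, intercept: float,
                         dilution: float = 1.0) -> float:
    """Invert the standard curve, then correct for any dilution of the sample."""
    return (a595 - intercept) / slope * dilution


# Hypothetical BSA standards (ug/ml) and their A595 readings
standards_ug_ml = [0, 5, 10, 15, 20]
standards_a595 = [0.00, 0.10, 0.20, 0.30, 0.40]
slope, intercept = fit_standard_curve(standards_ug_ml, standards_a595)

conc = sample_concentration(0.30, slope, intercept, dilution=10)  # about 150 ug/ml
yield_ug = conc * 0.200                                           # 200 ul eluate, about 30 ug
print(round(conc, 1), round(yield_ug, 1))
```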

2.10.6 High-throughput peptide profiling of PDZ variants

A peptide-phage library displaying random heptapeptides fused to the C-terminus of the M13 major coat protein was selected against immobilized GST-PDZ variants to elucidate their peptide binding preferences, as previously described [25]. The primary peptide-phage library was produced using single-strand mutagenesis, essentially as described in section 2.4.1, using the degenerate oligonucleotide 5’AGTTTTATAAATATTNNKNNKNNKNNKNNKNNKNNKTTGCTAAAAACTTTC. The electroporation diversity (1x10^10) exceeded the theoretical library diversity (1x10^9). Carrying out 200 selections in parallel with an input titer of at least 1x10^12 peptide-phage per PDZ variant required at least 2x10^14 phage, so the primary library had to be re-amplified. Briefly, 1x10^12 bacteria (250 ml of E. coli SS320 at OD600 = 0.8) were infected with 1x10^12 peptide-phage for 30 minutes at 37°C, 200 rpm. Serial dilutions of the infected culture were plated to confirm that the re-infection titer exceeded the theoretical and practical library diversities by several orders of magnitude. Infected cells were then superinfected with helper phage (1x10^13 cfu M13KO7) for 60 minutes at 37°C, 200 rpm to rescue every infected cell; the rate of superinfection was also confirmed by titration. The superinfected culture was then diluted into 5 L superbroth + 100 ug/ml carbenicillin + 25 ug/ml kanamycin and cultured at 37°C, 200 rpm in eight 2-L baffled flasks for 22 hours. The re-amplified peptide-phage library was harvested from the culture supernatants by two rounds of PEG precipitation, as described for preparation of the PDZ variant library. The library was stored in 50% glycerol (v/v) at -20°C and the stock concentration was titered as described above; the total yield of re-amplified peptide-phage exceeded 1x10^15 cfu.
First round of selections: For the first round, 20 ul of purified protein was diluted in 80 ul of sterile PBS (X to X ug/ml final concentration) in each well and incubated overnight at 4°C. Wells were blocked for two hours at room temperature with blocking buffer (PBS + 0.5% BSA). The peptide-phage library was re-precipitated and resuspended at 1x10^13 cfu/ml in PBS + 0.5% BSA + 0.05% Tween-20. Blocked wells were washed three times with PBS + 0.05% Tween-20, and 100 ul of naïve peptide-phage library was aliquoted to each well and allowed to bind at 4°C for 2 hours. Unbound phage were discarded and wells were washed eight times with cold PBS + 0.5% Tween-20. Bound phage were eluted by incubation with mid-log phase XL1Blue (OD600 = 0.4 to 0.8) for 30 minutes at 37°C, 200 rpm. The XL1Blue cells were superinfected with helper phage (1x10^10 cfu/ml M13KO7) for 45 minutes at 37°C, 200 rpm and then transferred to a 96-well block containing 1.4 ml 2YT + 100 ug/ml carbenicillin + 25 ug/ml kanamycin.
Cultures were grown for 20 hours at 37°C, 200 rpm to propagate phage for the next round of selection. The naïve library input was titered by serial dilution, infection into log-phase XL1Blue and plating on 2YT + carbenicillin. The first-round output from one row of one selection plate (including a parental Erbin GST-PDZ control) was also titered by serial dilution and plating.
Second through fifth rounds of selection: For all subsequent rounds, 10 ul of purified protein was diluted in 90 ul of sterile PBS in each well of a Maxisorp plate and incubated overnight at 4°C. Overnight cultures were centrifuged (3400 x g, 10 minutes) to pellet the bacteria. Phage supernatant was transferred to a sterile 96-well block and pH-adjusted by addition of 1/10th volume of 10X PBS stock. The pH-adjusted phage supernatant was heat-sterilized by incubation at 65°C for 30 minutes. Selections were carried out as for round 1, except that sterilized, pH-adjusted phage supernatant was used instead of naïve library. Input and output phage from one row of the selection plate (including a wild-type GST-PDZ control) were titered as described above.
Peptide-phage ELISA: Enrichment of peptide-phage pools was monitored for selection rounds 1 to 5 by ELISA. Adjacent wells of a 384-well Maxisorp plate were coated overnight at 4°C with 30 ul of GST-PDZ variant (same concentration range as for selections) or GST control (10 ug/ml). The following day, coated wells were blocked with PBS + 0.5% BSA for one hour at room temperature and then washed three times with PBS + 0.05% Tween-20. Heat-sterilized, pH-adjusted peptide-phage supernatant, chilled to 4°C, was added to GST-PDZ variant and GST control wells (30 ul each) and incubated at 4°C for 2 hours. The ELISA plates were washed four times with cold PBS + 0.5% Tween-20. Anti-M13:HRP conjugated antibody was diluted 1:8000 in PBS + 0.5% BSA + 0.05% Tween-20, and 30 ul was added to every well and allowed to bind at room temperature for 45 minutes. Wells were washed eight times with PBS + 0.05% Tween-20. Colourimetric peroxidase substrate reagents (TMB substrate, Pierce) were mixed 1:1 and 25 ul was added to every well.
The reaction was allowed to proceed for 5 to 10 minutes with gentle shaking and then stopped by addition of 25 ul of 1 M H3PO4. Absorbance at 450 nm was measured for each well using a plate reader. The ratio of peptide-phage binding to GST-PDZ variant relative to GST control showed the progress of enrichment.
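
The library-scale numbers given for the peptide-phage library earlier in this section follow from simple arithmetic, reproduced below as a sketch. Only the inputs come from the text; the function names are illustrative. Twenty amino acids at seven positions give 20^7 (about 1.3x10^9) peptides, the theoretical diversity, whereas the NNK codon scheme (N = A/C/G/T, K = G/T) encodes 32^7 (about 3.4x10^10) DNA sequences. The coverage figure is an upper bound that assumes every phage infects a distinct cell.

```python
AMINO_ACIDS = 20
NNK_CODONS = 4 * 4 * 2  # N = A/C/G/T, N = A/C/G/T, K = G/T -> 32 codons per position


def peptide_diversity(length: int = 7) -> int:
    return AMINO_ACIDS ** length


def nnk_dna_diversity(length: int = 7) -> int:
    return NNK_CODONS ** length


def total_phage_needed(n_selections: int, phage_per_selection: float) -> float:
    return n_selections * phage_per_selection


def phage_per_well(stock_cfu_per_ml: float, volume_ul: float) -> float:
    return stock_cfu_per_ml * volume_ul / 1000.0


def fold_coverage(n_sampled: float, diversity: float) -> float:
    """Average number of times each possible sequence is sampled (upper bound)."""
    return n_sampled / diversity


print(peptide_diversity())                          # 1280000000
print(nnk_dna_diversity())                          # 34359738368
print(f"{total_phage_needed(200, 1e12):.0e}")       # 2e+14 phage for 200 selections
print(f"{phage_per_well(1e13, 100):.0e}")           # 1e+12 phage per 100 ul well
print(fold_coverage(1e12, peptide_diversity()))     # 781.25
```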

2.10.7 Preparation of barcoded cluster-ready PCR products for Illumina sequencing

The Illumina platform requires the addition of DNA adapters to either end of the region of interest, to permit hybridization to the flow cell, amplification of the hybridized DNA, and annealing of the sequencing primer. I added the adapters by polymerase chain reaction (PCR) on peptide-phage template. I included barcodes in the adapter PCR primers so that these unique six- and eight-base sequences would be part of each sequencing read. Unique combinations of 24 forward and 24 reverse barcodes were used to multiplex several hundred peptide-phage pools. The barcodes were designed to have equal representation of all bases at all positions, and to be sufficiently different that a single incorrect base call could not convert one barcode into another. The forward and reverse primers, with the barcodes replaced by X, were as follows:
forward primer: 5’CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTXXXXXXAGCGCCCCCGGGGCGGA
reverse primer: 5’AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTXXXXXXXXAGGAGCCTTTAATTGTATCGGT
The barcoding PCR reactions were designed to minimize the possibility of introducing bias. The reactions used the highest recommended amount of Phusion high-fidelity DNA polymerase (Fermentas) and sufficient phage supernatant to supply an excess of template phage genomes. Each 50 ul reaction contained 1X HF buffer, 200 uM dNTPs, 0.5 uM forward primer (IDT), 0.5 uM reverse primer (IDT), 1 U Phusion enzyme and 20 ul of pH-adjusted, heat-sterilized peptide-phage supernatant. The thermocycler program was: (1) 98°C, 180 s; (2) 98°C, 10 s; (3) 68°C, 10 s; (4) 72°C, 10 s; (5) go to step 2, 24X; (6) 72°C, 300 s; (7) hold at 4°C. Approximately equal amounts of barcoded PCR product from each peptide-phage pool were combined, based on agarose gel electrophoresis band intensity. The pool was concentrated using a column clean-up kit (Qiagen) and then purified by agarose gel extraction. The manufacturer’s protocol was modified to dissolve the agarose by extended incubation at room temperature rather than by heating, because heating has been shown to introduce a GC bias [33]. The DNA concentration of the purified PCR product was quantitated using a dsDNA-specific fluorescent dye (PicoGreen, Invitrogen) and supplied to Sequensys (La Jolla, California) for single-end, 125-base sequencing. Two independent PCR reactions were prepared for each fifth-round peptide-phage selection pool, combined separately and run on different lanes of the Illumina sequencer. The agreement between these technical replicates was exceedingly high (data not shown).
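
The two barcode design rules described above can be checked programmatically: balanced base composition at every position, and no two barcodes within one substitution of each other (a minimum pairwise Hamming distance of at least 2). Barcoded primers can then be generated by substituting a barcode for the X placeholders. This is a sketch; the four toy barcodes are hypothetical, not the 24 forward and 24 reverse barcodes used here (which give 24 x 24 = 576 possible pairs).

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes: list[str]) -> int:
    """Smallest Hamming distance between any two barcodes; >= 2 means no
    single base-call error can turn one barcode into another."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def is_base_balanced(barcodes: list[str]) -> bool:
    """True if A, C, G and T occur equally often at every position."""
    return all(len({col.count(base) for base in "ACGT"}) == 1 for col in zip(*barcodes))


def fill_barcode(template: str, barcode: str) -> str:
    """Replace the run of X placeholders in a primer template with a barcode."""
    placeholder = "X" * len(barcode)
    if template.count("X") != len(barcode) or placeholder not in template:
        raise ValueError("barcode length does not match the X placeholder")
    return template.replace(placeholder, barcode)


# Forward primer from the text (5' to 3'), six X placeholders
FORWARD_TEMPLATE = "CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTXXXXXXAGCGCCCCCGGGGCGGA"

toy = ["ACGTAC", "CGTACG", "GTACGT", "TACGTA"]  # hypothetical barcodes
print(min_pairwise_distance(toy))  # 6
print(is_base_balanced(toy))       # True
print(fill_barcode(FORWARD_TEMPLATE, toy[0]))
```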

2.10.8 Sequencing data processing and logo generation

Each Illumina lane yielded more than 36 million reads. The sequencing data were processed by Simon Taehyung Kim (a student in Philip Kim’s lab), using software he designed for this application [31]. The data were filtered to remove any read containing a base call with a quality score below PHRED 30, so every retained base call has an estimated accuracy of at least 99.9%. More than 30 million reads met that quality threshold. To address misattribution due to sequencing errors in the barcodes, the data were filtered to retain only sequences that appeared multiple times. The total number of sequences per multiplexed sample varied by several orders of magnitude, owing to the approximate quantitation before pooling PCR products. Instead of setting a fixed cutoff (e.g. only including sequences that appeared at least 100 times), sequences were ranked in order of decreasing read count and the top 5% most frequently observed sequences were retained and translated. Peptide sequences with premature stop codons were omitted, and the remaining peptide sequences were summarized as logos, with every unique sequence represented once, regardless of the number of corresponding reads. The distribution of read counts was extremely skewed, so the top 5% of most common sequences represent a much larger percentage of the total reads. In effect, this filtering method omitted sequences that were observed only once or a few times, which probably correspond to PCR or sequencing errors rather than true binding peptide sequences. This approach resulted in logos representing a few hundred peptides, each of which was sequenced thousands of times, with correspondingly high confidence. In contrast, processing the naïve library sequences in this manner reduces 120,126 reads to a logo with no obvious specificity representing 96 peptides.
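
The filtering logic described above can be sketched for a single demultiplexed sample. This is not the software of [31]. It assumes reads have already been assigned to samples by barcode and trimmed to the 21-nt randomized region (a hypothetical input format), that FASTQ qualities use Phred+33 encoding, and that TAG is treated as a stop codon (amber suppressor strains can read it as Gln, so this is a simplification). A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q30 means one expected error per 1,000 base calls.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, ...)
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AA)}


def phred_to_error(q: int) -> float:
    """Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def min_phred(quality: str, offset: int = 33) -> int:
    """Lowest Phred score in a FASTQ quality string (Phred+33 assumed)."""
    return min(ord(ch) - offset for ch in quality)


def translate(dna: str) -> str:
    """Translate in frame 0; codons containing ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def logo_peptides(reads, min_q: int = 30, top_percent: int = 5) -> list[str]:
    """reads: (dna, quality) pairs for ONE demultiplexed, trimmed sample.
    Returns unique peptides, each listed once, for building a logo."""
    counts = Counter(dna for dna, qual in reads if min_phred(qual) >= min_q)
    ranked = counts.most_common()                     # decreasing read count
    keep = math.ceil(len(ranked) * top_percent / 100)  # top 5% of unique sequences
    peptides, seen = [], set()
    for dna, _ in ranked[:keep]:
        peptide = translate(dna)
        if "*" in peptide or peptide in seen:         # drop premature stops, duplicates
            continue
        seen.add(peptide)
        peptides.append(peptide)
    return peptides
```

Keeping a percentage of unique sequences rather than a fixed read-count cutoff mirrors the rationale in the text: sample depths differed by orders of magnitude, so a single absolute threshold would be too strict for shallow samples and too lenient for deep ones.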

3 Conclusion

3.1 Summary of work and future directions

The goal of this investigation was to gain insight into PDZ domain specificity. Using directed evolution, I selected a large set of PDZ variants for their ability to bind peptides with different position-2 residues. Peptide profiling of this comprehensive set of PDZ variants allowed me to assess the necessity and sufficiency of a subset of mutations to alter specificity. This systematic approach generated insight into subclass specificities in the most common natural specificity classes (I [ST]-2 and II Φ-2), demonstrated an alternative basis for a rare specificity (Class III D-2) and predicted that some natural domains may exhibit an easily evolved novel specificity (Class IV R-2). These results also emphasize that key non-contact residues may have a disproportionate effect on position-2 specificity. Ongoing and future experiments will test whether inclusion of non-contact residue α2-4 can improve predictions of natural PDZ domain position-2 specificity. As well, human PDZ domains that are anticipated to have Class IV specificity will be peptide profiled to test this prediction. Given that one or two mutations can sometimes be sufficient to completely alter the peptide binding mode, this work also underscores the importance of unbiased characterization of engineered peptide binding modules; comparing affinities for a limited number of peptides could be quite misleading. This at least partly explains why previous PDZ engineering efforts have failed to yield custom peptide binding reagents, despite enthusiastic promises [23, 24]. Combinatorial libraries probably need to include critical non-contact residues in order to yield truly selective variants, particularly for hydrophobic specificities. The downside to this approach is that the overall peptide binding mode may change in unpredictable ways. The highly multiplexed peptide profiling approach detailed here represents a key technical advance that will facilitate future engineering efforts.
High-throughput peptide profiling allows efficient prioritization of subsequent low-throughput, quantitative studies. In this case, the next step is to quantitate the selectivity of some key variants by measuring their affinities for appropriate sets of peptides. These will include the set of 20 peptides with different position-2 residues, as well as three ligands for the W-2 variants based on the multiple specificity logos. Shawn Li’s group at the University of Western Ontario will undertake these measurements using fluorescence polarization. Quantitative affinity measurements will also test the effect of charged residues at distal positions of the α2 helix on affinity and specificity. These quantitative measurements will help us to understand the energetic consequences of these mutations on binding in more detail. The tremendous expansion of the PDZ domain in metazoan proteomes makes sense in light of this fold’s ability to support diversity at key specificity-determining residues, as demonstrated by the outcome of this engineering endeavour.

References

[1] Pawson, T. (2003). Assembly of Cell Regulatory Systems Through Protein Interaction Domains. Science 300, 445–452.

[2] Tong, A.H.Y. (2001). A Combined Experimental and Computational Strategy to Define Protein Interaction Networks for Peptide Recognition Modules. Science 295, 321–324.

[3] Sidhu, S.S., Bader, G.D., and Boone, C. (2003). Functional genomics of intracellular peptide recognition domains with combinatorial biology methods. Curr Opin Chem Biol 7, 97–102.

[4] Bhattacharyya, R.P., Reményi, A., Yeh, B.J., and Lim, W.A. (2006). Domains, motifs, and scaffolds: the role of modular interactions in the evolution and wiring of cell signaling circuits. Annu. Rev. Biochem. 75, 655–680.

[5] Doyle, D.A., Lee, A., Lewis, J., Kim, E., Sheng, M., and MacKinnon, R. (1996). Crystal structures of a complexed and peptide-free membrane protein-binding domain: molecular basis of peptide recognition by PDZ. Cell 85, 1067–1076.

[6] Songyang, Z., Fanning, A.S., Fu, C., Xu, J., Marfatia, S.M., Chishti, A.H., Crompton, A., Chan, A.C., Anderson, J.M., and Cantley, L.C. (1997). Recognition of unique carboxyl-terminal motifs by distinct PDZ domains. Science 275, 73–77.

[7] Laura, R.P., Witt, A.S., Held, H.A., Gerstner, R., Deshayes, K., Koehler, M.F.T., Kosik, K.S., Sidhu, S.S., and Lasky, L.A. (2002). The Erbin PDZ domain binds with high affinity and specificity to the carboxyl termini of delta-catenin and ARVCF. J. Biol. Chem. 277, 12906–12914.

[8] Skelton, N.J. (2002). Origins of PDZ Domain Ligand Specificity: Structure Determination and Mutagenesis of the Erbin PDZ Domain. Journal of Biological Chemistry 278, 7645–7654.

[9] Appleton, B.A. (2006). Comparative Structural Analysis of the Erbin PDZ Domain and the First PDZ Domain of ZO-1: Insights into Determinants of PDZ Domain Specificity. Journal of Biological Chemistry 281, 22312–22320.

[10] Zhang, Y. (2006). Convergent and Divergent Ligand Specificity among PDZ Domains of the LAP and Zonula Occludens (ZO) Families. Journal of Biological Chemistry 281, 22299–22311.

[11] Runyon, S.T., Zhang, Y., Appleton, B.A., Sazinsky, S.L., Wu, P., Pan, B., Wiesmann, C., Skelton, N.J., and Sidhu, S.S. (2007). Structural and functional analysis of the PDZ domains of human HtrA1 and HtrA3. Protein Science 16, 2454–2471.

[12] Stiffler, M.A., Chen, J.R., Grantcharova, V.P., Lei, Y., Fuchs, D., Allen, J.E., Zaslavskaia, L.A., and MacBeath, G. (2007). PDZ domain binding selectivity is optimized across the mouse proteome. Science 317, 364–369.

[13] Tonikian, R., Zhang, Y., Sazinsky, S.L., Currell, B., Yeh, J.-H., Reva, B., Held, H.A., Appleton, B.A., Evangelista, M., Wu, Y., et al. (2008). A specificity map for the PDZ domain family. PLoS Biol. 6, e239.

[14] Tonikian, R., Zhang, Y., Boone, C., and Sidhu, S.S. (2007). Identifying specificity profiles for peptide recognition modules from phage-displayed peptide libraries. Nat Protoc 2, 1368–1386.

[15] Ernst, A., Gfeller, D., Kan, Z., Seshagiri, S., Kim, P.M., Bader, G.D., and Sidhu, S.S. (2010). Co-evolution of PDZ domain-ligand interactions analyzed by high-throughput phage display and deep sequencing. Mol Biosyst 6, 1782–1790.

[16] Letunic, I., Doerks, T., and Bork, P. (2012). SMART 7: recent updates to the protein domain annotation resource. Nucleic Acids Res. 40, D302–305.

[17] Sheng, M., and Sala, C. (2001). PDZ domains and the organization of supramolecular complexes. Annu. Rev. Neurosci. 24, 1–29.

[18] Zhang, Y., Appleton, B.A., Wiesmann, C., Lau, T., Costa, M., Hannoush, R.N., and Sidhu, S.S. (2009). Inhibition of Wnt signaling by Dishevelled PDZ peptides. Nat. Chem. Biol. 5, 217–219.

[19] Zimmermann, P., Meerschaert, K., Reekmans, G., Leenaerts, I., Small, J.V., Vandekerckhove, J., David, G., and Gettemans, J. (2002). PIP(2)-PDZ domain binding controls the association of syntenin with the plasma membrane. Mol. Cell 9, 1215–1225.

[20] Im, Y.J., Park, S.H., Rho, S.-H., Lee, J.H., Kang, G.B., Sheng, M., Kim, E., and Eom, S.H. (2003). Crystal structure of GRIP1 PDZ6-peptide complex reveals the structural basis for class II PDZ target recognition and PDZ domain-mediated multimerization. J. Biol. Chem. 278, 8501–8507.

[21] Im, Y.J. (2003). Crystal Structure of the Shank PDZ-Ligand Complex Reveals a Class I PDZ Interaction and a Novel PDZ-PDZ Dimerization. Journal of Biological Chemistry 278, 48099–48104.

[22] Chang, B.H., Gujral, T.S., Karp, E.S., BuKhalid, R., Grantcharova, V.P., and MacBeath, G. (2011). A systematic family-wide investigation reveals that ~30% of mammalian PDZ domains engage in PDZ-PDZ interactions. Chem. Biol. 18, 1143–1152.

[23] Reina, J., Lacroix, E., Hobson, S.D., Fernandez-Ballester, G., Rybin, V., Schwab, M.S., Serrano, L., and Gonzalez, C. (2002). Computer-aided design of a PDZ domain to recognize new target sequences. Nat. Struct. Biol. 9, 621–627.

[24] Ferrer, M., Maiolo, J., Kratz, P., Jackowski, J.L., Murphy, D.J., Delagrave, S., and Inglese, J. (2005). Directed evolution of PDZ variants to generate high-affinity detection reagents. Protein Eng. Des. Sel. 18, 165–173.

[25] Ernst, A., Sazinsky, S.L., Hui, S., Currell, B., Dharsee, M., Seshagiri, S., Bader, G.D., and Sidhu, S.S. (2009). Rapid evolution of functional complexity in a domain family. Sci Signal 2, ra50.

[26] Schneider, T.D., and Stephens, R.M. (1990). Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 18, 6097–6100.

[27] Black, S.D., and Mould, D.R. (1991). Development of hydrophobicity parameters to analyze proteins which bear post- or cotranslational modifications. Anal. Biochem. 193, 72–82.

[28] Voynov, V., Chennamsetty, N., Kayser, V., Helk, B., and Trout, B.L. (2009). Predictive tools for stabilization of therapeutic proteins. MAbs 1, 580–582.

[29] Stricker, N.L., Christopherson, K.S., Yi, B.A., Schatz, P.J., Raab, R.W., Dawes, G., Bassett, D.E., Jr, Bredt, D.S., and Li, M. (1997). PDZ domain of neuronal nitric oxide synthase recognizes novel C-terminal peptide sequences. Nat. Biotechnol. 15, 336–342.

[30] Luck, K., and Trave, G. (2011). Phage display can select over-hydrophobic sequences that may impair prediction of natural domain-peptide interactions. Bioinformatics 27, 899–902.

[31] Kim, T., Tyndel, M.S., Huang, H., Sidhu, S.S., Bader, G.D., Gfeller, D., and Kim, P.M. (2012). MUSI: an integrated system for identifying multiple specificity from very large peptide or nucleic acid data sets. Nucleic Acids Res. 40, e47.

[32] Shepherd, T.R., Hard, R.L., Murray, A.M., Pei, D., and Fuentes, E.J. (2011). Distinct Ligand Specificity of the Tiam1 and Tiam2 PDZ Domains. Biochemistry 50, 1296–1308.

[33] Quail, M.A., Kozarewa, I., Smith, F., Scally, A., Stephens, P.J., Durbin, R., Swerdlow, H., and Turner, D.J. (2008). A large genome center’s improvements to the Illumina sequencing system. Nat. Methods 5, 1005–1010.