AID Recognizes Structured DNA for Class Switch...

18
Article AID Recognizes Structured DNA for Class Switch Recombination Graphical Abstract Highlights d Structured substrates, such as G4 substrates, are preferred AID targets in vitro d A bifurcated substrate-binding surface supports structured- substrate recognition d G4 substrates induce AID oligomerization upon binding d Disrupting structured-substrate recognition or AID oligomerization compromises CSR Authors Qi Qiao, Li Wang, Fei-Long Meng, Joyce K. Hwang, Frederick W. Alt, Hao Wu Correspondence [email protected] In Brief Qiao et al. demonstrated that structured substrates, like G4 and branched DNA, are preferred AID targets in vitro. A bifurcated substrate-binding surface in AID structure supports structured- substrate recognition. G4 substrates mimicking the Ig S regions also induce cooperative AID oligomerization. Disrupting structured-substrate recognition or AID oligomerization both compromise CSR. Qiao et al., 2017, Molecular Cell 67, 361–373 August 3, 2017 ª 2017 Elsevier Inc. http://dx.doi.org/10.1016/j.molcel.2017.06.034

Transcript of AID Recognizes Structured DNA for Class Switch...

Article

AID Recognizes Structure

d DNA for Class SwitchRecombination

Graphical Abstract

Highlights

d Structured substrates, such as G4 substrates, are preferred

AID targets in vitro

d A bifurcated substrate-binding surface supports structured-

substrate recognition

d G4 substrates induce AID oligomerization upon binding

d Disrupting structured-substrate recognition or AID

oligomerization compromises CSR

Qiao et al., 2017, Molecular Cell 67, 361–373August 3, 2017 ª 2017 Elsevier Inc.http://dx.doi.org/10.1016/j.molcel.2017.06.034

Authors

Qi Qiao, Li Wang, Fei-Long Meng,

Joyce K. Hwang,

Frederick W. Alt, Hao Wu

[email protected]

In Brief

Qiao et al. demonstrated that structured

substrates, like G4 and branched DNA,

are preferred AID targets in vitro. A

bifurcated substrate-binding surface in

AID structure supports structured-

substrate recognition. G4 substrates

mimicking the Ig S regions also induce

cooperative AID oligomerization.

Disrupting structured-substrate

recognition or AID oligomerization both

compromise CSR.

Molecular Cell

Article

AID Recognizes Structured DNAfor Class Switch RecombinationQi Qiao,1,2,5 Li Wang,1,2,5 Fei-Long Meng,1,2,3,4 Joyce K. Hwang,1,2,3 Frederick W. Alt,1,2,3 and Hao Wu1,2,6,*1Program in Cellular and Molecular Medicine, Boston Children’s Hospital2Department of Biological Chemistry and Molecular Pharmacology3Howard Hughes Medical InstituteHarvard Medical School, Boston, MA 02115, USA4Present address: Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 320

Yue-yang Road, Shanghai 200031, China5These authors contributed equally6Lead Contact

*Correspondence: [email protected]

http://dx.doi.org/10.1016/j.molcel.2017.06.034

SUMMARY

Activation-induced cytidine deaminase (AID) initiatesboth class switch recombination (CSR) and somatichypermutation (SHM) in antibody diversification.Mechanisms of AID targeting and catalysis remainelusive despite its critical immunological rolesand off-target effects in tumorigenesis. Here, we pro-duced active human AID and revealed its preferredrecognitionanddeaminationof structuredsubstrates.G-quadruplex (G4)-containing substrates mimickingthe mammalian immunoglobulin switch regions areparticularly good AID substrates in vitro. By solvingcrystal structures of maltose binding protein (MBP)-fused AID alone and in complex with deoxycytidinemonophosphate, we surprisingly identify a bifurcatedsubstrate-binding surface that explains structuredsubstrate recognition by capturing two adjacent sin-gle-stranded overhangs simultaneously. Moreover,G4 substrates induce cooperative AID oligomeriza-tion. Structure-based mutations that disrupt bifur-cated substrate recognition or oligomerization bothcompromise CSR in splenic B cells. Collectively, ourdata implicate intrinsic preference of AID for struc-tured substrates and uncover the importance of G4recognition and oligomerization of AID in CSR.

INTRODUCTION

Antibody diversification is a central process in adaptive immunity

that produces antigen-specific, high-affinity antibodies to com-

bat millions of antigens. In B cells, immunoglobulin (Ig) genes un-

dergo two DNA-alteration events to enhance the specificity and

functionality of antibodies: somatic hypermutation (SHM) and

class switch recombination (CSR). Until now, activation-induced

cytidine deaminase (AID) is the single enzyme that is known to

initiate both SHM and CSR. AID belongs to the APOBEC cytidine

M

deaminase family. It converts deoxycytidine into deoxyuridine

on single-stranded DNA (ssDNA) substrates in vitro and in vivo

but does not show detectable activity on RNA substrates (Bran-

steitter et al., 2003). Although AID exhibits sequence similarity

to APOBEC cytidine deaminases, its critical function in antibody

diversification, especially in CSR, cannot be substituted by

APOBEC proteins.

Across the genome, AID predominantly mutates immunoglob-

ulin genes at the variable (V) and switch (S) regions. In SHM, V re-

gion sequencing data showed that a WRCH (W = A/T, R = A/G,

and H = A/C/T) hotspot motif exhibits higher AID-induced muta-

tion probability than average, by about 2- to 4-fold (Larijani et al.,

2005; Rogozin and Diaz, 2004). In CSR, mammalian S regions lie

between sets of constant region exons and are enriched in the

AGCT sequence, a palindromic example of WRCH. Deamination

at both strands of DNA may contribute to double-strand breaks

(DSBs) by co-opted repair pathways (Han et al., 2011; Yeap

et al., 2015). Joining of AID-initiated DSBs replaces the IgM

heavy chain constant region with that of other isotypes to

achieve class switching (Matthews et al., 2014a). Patients with

AID mutations produce mainly low-affinity IgM antibodies with

impairment in CSR and/or SHM, a syndrome known as hyper-

IgM immunodeficiency (Xu et al., 2012). AID off-target mutations

as well as the subsequent DSBs and chromosomal transloca-

tions often promote tumorigenesis, in particular for many types

of leukemia and lymphoma (Xu et al., 2012).

The mechanism of AID targeting has been a long-standing

mystery. Genome sequencing data showed that AID mutates

Ig regions at a 10�4–10�3 per base, per generation frequency

(McKean et al., 1984; Rajewsky et al., 1987), which is a million-

fold higher than the 10�9 genomic basal mutation frequency.

Numerous models of AID targeting have been proposed, which

both propel the field forward and leave remaining questions un-

answered. Chromatin accessibility and transcription that pro-

vides ssDNA substrates for AID to act is a known prerequisite,

but it does not explain AID targeting specificity to limited

genomic regions. Convergently transcribed regions in certain su-

per-enhancer loci appear to correlate with enhanced AID

off-target mutations (Meng et al., 2014), but it is not clear

whether Ig regions undergo convergent transcription, and not

olecular Cell 67, 361–373, August 3, 2017 ª 2017 Elsevier Inc. 361

all super-enhancer loci with convergent transcription are AID tar-

gets (Qian et al., 2014). The originally proposed hotspot model

(Pham et al., 2003) also does not appear sufficient, as ssDNA

substrates containing hotspot motifs do not show advantage in

recruiting AID in vitro (Larijani et al., 2007), and hotspot motif dis-

tribution in the genome does not correlate with the AID off-tar-

geting spectrum in vivo (Duke et al., 2013). In AID recruiter

models, various proteins, including replication protein A (RPA)

(Chaudhuri et al., 2004), Spt5 (Pavri et al., 2010), and 14-3-3

(Xu et al., 2010), have been proposed in guiding AID to genes.

However, the genome distribution of these recruiters is not

unique to Ig regions. A more recent recruiter model proposed

that transcribed G-repeat rich switch RNAs form G-quadruplex

(G4) structures, bind AID, and guide AID to genes for CSR (Zheng

et al., 2015). However, themechanism of transfer of AID between

G4 RNA and target DNA has not been addressed.

Here, we present a comprehensive biochemical and structural

study on AID, which provides unexpected insights on AID tar-

geting specificity. Through protein engineering, we produced a

fully functional monomeric AID with intact activity in vitro and in

cells. Using various forms of DNA, we found that structured

substrates containing multiple ssDNA overhangs, like G4 and

branched DNA, are preferred by AID in binding and deamination

over linear ssDNA substrates in vitro. This observation may form

the basis for the frequent targeting of AID tomammalian Ig switch

regions containing high density ofG-repeat sequences.Wedeter-

mined the crystal structures of maltose binding protein (MBP)-

fused AID and its complex with cytidine (C), deoxycytidine (dC),

and deoxycytidine monophosphate (dCMP). These structures

not only explain the discrimination between DNA and RNA in

AID catalysis but also reveal a bifurcated substrate-binding sur-

face, which strongly supports that one AID recognizes two adja-

cent ssDNA overhangs from one structured substrate to achieve

high affinity. In addition, we observed that G4 structured sub-

strates induce AID cooperative oligomerization, which may pro-

mote clustered mutations in Ig S regions. Structure-guided muta-

genesis revealed that both the bifurcated substrate binding

surface and the putative oligomerization interface are essential

for CSR, elucidating recognition of structured substrates as an

important AID-targeting mechanism to Ig S regions.

RESULTS

Active AID Monomer from Protein EngineeringAlthough endogenous AID extracted from B cells exhibited near-

monomer molecular mass (Chaudhuri et al., 2003), previously

published data (Larijani et al., 2007) and our results showed

largely heterogeneous and poorly active aggregates of recombi-

nant wild-type (WT) AID (Figures 1A, 1B, S1A, and S1B). To

obtain physiologically relevant recombinant AID for functional

elucidation, we performed rounds of protein engineering. We

found that the His-MBP-fused double-mutant H130A/R131E

produced a low amount of monomeric AID, and further N-

and C-terminal tail (CTT) truncations improved the monomer

yield, resulting in constructs that we named AID.mono+CTT

and AID.mono, respectively (Figures 1A, 1B, S1A, and S1B).

Both monomeric and aggregated AID fractions were purified

and compared in deamination assays, revealing that monomeric

362 Molecular Cell 67, 361–373, August 3, 2017

AID with and without CTT exhibited much higher deamination

activity on ssDNA than aggregated fractions of WT AID,

AID.mono+CTT, and AID.mono (Figure 1C). Compared to pre-

viously reported AID turnover rates (�10 fmol/min/mg; King

et al., 2015), the activities of monomeric AID.mono+CTT and

AID.mono were roughly 1,000 fold higher (�104 fmol/min/mg).

These data suggest that aggregation renders AID largely inac-

tive. Consistent with the lack of sequence conservation of

H130 and R131 among AID species (Figure S1C), the H130A/

R131E mutant behaved as WT in mutation frequency by an

SHM-mimic rifampicin resistance (RifR) assay (Petersen-Mahrt

et al., 2002; Figure 1D) and fully rescuedCSRwhen reconstituted

into AID-deficient ex vivo CSR-activated splenic B cells (Figures

1E and S1F). Notably, all AID constructs tested in RifR and CSR

assays contain only the internal mutations, with no fusion tag,

and no truncations of the nuclear transportation signals at the

N or C terminus. Because AID.mono+CTT and AID.mono both

showed robust deamination activity and the His-MBP tag did

not affect this activity (Figures S1D and S1E), we mainly used

the His-MBP-fused AID.mono with higher yield in subsequent

biochemical characterizations.

Linear and G4 Structured Substrates from Ig S RegionsGenome sequencing has shown that themammalian Ig S regions

contain abundant tandem G repeats interspersed by AGCT hot-

spots and are heavily targeted by AID during CSR (Figure S2A;

Yu et al., 2003). It has been proposed that, during transcription,

G4 structures form on the G-repeat non-template strand and

contribute to R-loop stability (Duquette et al., 2004). Consis-

tently, we found that authentic non-template Sm fragment (64

nt) and a single G-repeat substrate both spontaneously assem-

bled into G4 structures (Figures 2A and 2B). The G4 assembly

was validated by the fluorescence enhancement of a G4-specific

dye and the disruptive effect of LiCl (Figure 2B; Bardin and Leroy,

2008). Gel electrophoresis showed that the 64-nt Sm fragment

formed heterogeneous oligomers, likely representing mixed

inter- and intramolecular G4s (Figure S2B). Differently, the single

G-repeat substrate formed homogeneous intermolecular G4

that could be separated from linear ssDNA by size-exclusion

chromatography and gel electrophoresis (Figures 2C, 2D, and

S2B). Dimethyl sulfate protection footprinting indicated that

all Gs in the GGGGTG motif were involved in the G4 assembly

(Figure S2C; Sun and Hurley, 2010). Circular dichroism (CD)

spectroscopy showed that the G-repeat substrates were pre-

dominantly parallel, instead of anti-parallel, G4 structures (Fig-

ure S2D; Vorlı�ckova et al., 2012). Notably, the purified linear

and G4 fractions of the single G-repeat substrate share the iden-

tical DNA sequence and only differ in structure. The same size-

exclusion purification procedure was used in preparing other

linear and G4 structured substrates containing a single G repeat

for in vitro studies.

AID, but Not APOBECs, Prefers Binding andDeaminating G4 Structured Substrates over LinearSubstratesDespite the identical DNA sequence, the purified linear and G4

substrates exhibited distinct behavior in AID binding and deam-

ination assays. The G4 structured substrates displayed�10-fold

Figure 1. Active Monomeric AID from Protein Engineering

(A) A schematic diagram of construct design with indicated mutation sites. More details are shown in Figure S1A.

(B) Gel filtration chromatography of WT AID, AID.mono+CTT, AID.mono, and AID.mono after MBP removal. The measured molecular masses by multi-angle light

scattering (MALS) and the theoretical molecular masses are shown.

(C) In vitro deamination assay showing that both monomeric AID.mono and AID.mono+CTT had much higher activity than aggregated AID on ssDNA. Experi-

ments used 0.1 mM AID and 1 mM DNA.

(D) Rifampicin resistance (RifR) assay (Petersen-Mahrt et al., 2002) in E. coli KL16 and its uracil-DNA glycosylase (UDG)-deficient derivative BW310 (ung�/�),showingmutation frequencies by AID.WT and AIDwithmutations in AID.mono and AID.crystal. UDG removes the uracil base as a first step in base-excision repair

following cytidine deamination, and its deficiency enhances AID-mediated mutation frequency. Data are represented as mean ± SD from 12 independent

measurements.

(E) CSR rescue of AID-deficient splenic B cells ex vivo by AID.WT and by AID with mutations in AID.mono and AID.crystal. Percent (%) CSR to IgG1 is the ratio

between GFP+/IgG1+ cells (upper right quadrant) and total GFP+ cells (the two right quadrants). Data are represented asmean ± SD from three to six independent

measurements.

See also Figure S1.

higher AID binding affinity (KD = 0.1–0.2 mM) than the linear sub-

strates of the same sequence (KD = 1.5–7.1 mM; Figure 3A). The

difference is irrespective of the hotspot or the direction of the

ssDNA overhang relative to the G-repeat sequence (Figure 3A).

By dissecting the G4 substrate structure, we found that AID

did not bind to the core structure but rather required at least

5-nt single-stranded overhangs for optimal interaction (Fig-

ure 3B). Previously, binding of AID to switch region RNA G4 tran-

scripts has been observed (Zheng et al., 2015). Interestingly, we

found that the binding of AID to RNA G4 is equal to DNA G4, with

similar affinities and requirement for single-stranded overhangs

(Figure 3C). Therefore, AID appears to recognize single-stranded

overhangs adjacent to a G4 core, with little dependence on their

sequence, orientation, and whether they are DNA or RNA.

Consistently, in vitro deamination assays showed that

AID.mono with and without CTT exhibited more robust activity

on the G4 structured substrates than the linear substrates,

despite the identical DNA sequence (Figures 3D and S3A). The

Molecular Cell 67, 361–373, August 3, 2017 363

Figure 2. Linear and G4 Structured Substrates

(A) Two types of AID substrates: one with multiple G-repeats and hotspots as in a mouse Sm fragment and the other with a single G repeat and hotspot. The

potential ability of these substrates to form either intramolecular or intermolecular G4 structure is illustrated.

(B) G4 structure formation confirmed by the G4-specific dye N-methyl mesoporphyrin IX (NMM). LiCl is known to inhibit G4 structures. In total, 10 mM DNA and

20 mMNMMwere used in each experiment. Heat denaturationwas done by 95�C incubation for 10min followed by flash cooling to eliminate DNA structures. Data

are represented as mean ± SD from three independent measurements.

(C) Separation of intermolecular G4 and linear fractions of single G-repeat substrate using a Superdex 75 gel filtration column.

(D) Native gel showing the clear separation of linear and G4 structured substrate from Superdex 75 gel filtration purification in (C).

See also Figure S2.

preference remained irrespective whether the substrates contain

a hotspot (AGCT) or cold spot (TTCT; Figures 3D and S3A).

Remarkably, in a competition assay, linear ssDNAwith or without

hotspots did not affect G4 structured substrate deamination

even at 100-fold molar excess but severely inhibited linear sub-

strate deamination at 10-fold molar excess (Figure 3E). In com-

parison, two AID homologs APOBEC3A and APOBEC3G did

not show a catalytic preference for G4 structured substrate (Fig-

ure 3F), suggesting that the G4 structure preference may be a

unique feature for AID specific functions, like CSR.

Location-Dependent Deamination by AID on G4Structured SubstrateTo probe how AID performs catalysis on G4 structured sub-

strates, we designed a series of G4 substrates with a hotspot

(AACT) or a cold spot (TTCT) located at different positions of

an ssDNA overhang. We found that the peak deamination activ-

ity was achieved when the target deoxycytidine was placed at

the third position 30 to the G4 core (Figures 3G and S3B). AID-

mediated deamination steadily decayed when the deoxycytidine

was moved away from the third position, albeit still significantly

364 Molecular Cell 67, 361–373, August 3, 2017

higher than that for the linear substrate (Figure 3G). Peak activ-

ities at the third position were similar between hotspot- and

cold-spot-containing substrates, suggesting that the G4 struc-

ture might override the hotspot preference in AID targeting. In

positions away from the third position, deamination activity on

hotspot substrates roughly doubled that on cold spot sub-

strates, recapitulating the observed hotspot preference in vitro

and in cells (Larijani et al., 2005; Pham et al., 2003). Notably, in

Ig S regions, a deoxycytidine often exists exactly at the third po-

sition from the G-repeat motif (GGGGTG; Figure S2A), an ideal

site for AID deamination, as suggested by our results.

Because of the significant cooperativity observed in binding

assays using G4 structured substrates (Hill coefficient n R 2;

Figure 3A), we suspected that AID-AID interaction might

occur uponG4 binding. The hypothesis was confirmed by recon-

stituting AID.mono/G4 complex in vitro, which mainly eluted

from the void position of a size-exclusion column with an

apparent measured molecular mass of �1.3 MDa (Figure S3C)

andwas observed as large oligomers under electronmicroscopy

(Figure S3D). To discern the consequence of AID oligomeriza-

tion, we designed a similar series of G4 substrates with the target

Figure 3. AID Preferentially Binds and Deaminates G4 Structured Substrates

(A) Electrophoretic mobility shift assay (EMSA) curves showing the significantly higher AID binding affinity of G4 fractions than linear fractions with identical

primary sequences.

(B) KD calculated by EMSA showing that ssDNA overhangs in G4 substrates are required for AID binding. Affinities increased with overhang length and plateaued

at 5 nt.

(C) EMSA showing that AID binds to RNA G4 similarly as to DNA G4.

(D) In vitro deamination assays showing that AID has a higher deamination activity on G4 substrates than on linear substrates, with or without hotspots.

Experiments used 0.1 mM AID and 1 mM DNA.

(E) Competition assays showing that excess linear substrates did not compete with G4 substrates. Experiments used 1 mM AID, 1 mM substrate DNA, and up to

100 mM competitor ssDNA. Reaction time was 10 min.

(F) In vitro deamination assay showing that Apobec3A and 3G did not exhibit G4 preference. Experiments used 0.1 mMAPOBEC protein and 1 mMsubstrate DNA.

(G) In vitro deamination assays showing that the peak activity of AID appears when the substrate nucleotide is at the third position 30 to the G4 core. Experiments

used 0.1 mM AID and 1 mM DNA. Reaction time was 10 min.

(H) In vitro deamination assays showing that AID oligomerization causes clustered mutations. Experiments used 0.2 mM DNA and 0.2, 0.4, or 0.8 mM AID.mono.

Reaction time was 2 min.

Data in (A)–(C), (G), and (H) are represented as mean ± SD from three independent measurements. See also Figure S3.

Molecular Cell 67, 361–373, August 3, 2017 365

C located at up to 18 nt away from the G4 core (Figure S3E),

which showed a similar peak of deamination at the third position

at a low AID concentration (Figure 3H, blue trace). However,

when we increased the AID concentration to induce oligomeriza-

tion, we observed that the cytidine sites away from the G4 core

were more efficiently deaminated (Figure 3H, green and yellow

traces), suggesting that AID oligomerization on G4 may spread

the mutations to more distal sites. The peaks of deamination

are separated by �6 nt (Figure 3H), which is consistent with

the 5-nt minimal ssDNA length for AID binding that we identified

earlier (Figure 3B).

Structures of AID and Its Complex with SubstratesBecause oligomerization of AID.mono upon binding to a G4

substrate resulted in a heterogeneous oligomeric complex,

we screened additional mutations on surface hydrophobic resi-

dues that could disrupt AID oligomerization to facilitate crystalli-

zation. We found that the F42E/F141Y/F145E triple mutation

plus shortening of the linker to the MBP tag rendered AID.mono

entirely monomeric as measured by multi-angle light scattering

(MALS) (Figures 1A and S4A). The new construct that we named

AID.crystal maintained G4 preference in vitro (Figure S4B)

and formed a homogeneous AID2/G4 complex without further

oligomerization, as determined byMALS (Figure 4A). The stoichi-

ometry of the complex suggested that each AID is capable of

binding two ssDNA overhangs in the G4 substrate. Confirma-

tively, a designed branched substrate with only two ssDNA over-

hangs displayed a 1:1 interaction with AID.crystal (Figure 4A).

Interestingly, AID.mono bound and deaminated better on the

two-overhang branched substrate in comparison with a single-

overhang substrate, but the binding did not show cooperativity

(Hill coefficient nz1.0; Figures 4B and S4C) as for G4 substrates

(Figure 3A).

The MBP-fused AID.crystal and its catalytically dead mutant

E58A were crystallized in complex with G4 DNA or branched

DNA but did not crystallize alone. However, despite confirmed

presence of DNA in the crystals (Figure S4D), only fragmented

DNA density was visible, which appeared to mediate crystal

packing. Indeed, AID also co-crystallized with blunt-ended dou-

ble-stranded DNA (dsDNA) that it did not bind in solution (Fig-

ure 4B); the dsDNA stacked in the crystal lattice and likely

neutralized repulsion between highly positively charged AID

with isoelectric point (PI) of �9.0 (Figure S4E). Upon trying

many different substrates (Table S1), we obtained in total

seven structures of AID, alone and in complex with cytidine

(C), 20-deoxycytidine (dC), and 20-deoxycytidine-50-monophos-

phate (dCMP) at a highest resolution of 2.4 A (Figures 4C; Table

S2). Importantly, mutations in AID.crystal at residues F42, F141,

and F145 all localize on the opposite side of the active site

defined by the bound dCMP (Figure 4D).

In contrast to the recently reported crystal structure of an AID-

APOBEC3A hybrid (AIDv) alone (Pham et al., 2016), our AID/

dCMP complex captured using the E58A mutant revealed a

deep substrate channel and the direction of ssDNA binding

(Figure 4E). Because of the replacement of residues 7–36 of

AID with those of APOBEC3A, the AIDv structure exhibits dis-

rupted shape and charge distribution at the substrate channel

(Figures 4F and S4F). The active site is comprised of the catalytic

366 Molecular Cell 67, 361–373, August 3, 2017

proton-donating residue E58 and the Zn2+ ion coordinated by

H56, C87, C90, and usually a fourth ligand, e.g., a water in the

dCMP complex or a cacodylic acid from an Apo-AID crystalliza-

tion condition (Figures 4G and S4G). No significant conforma-

tional changes were observed at the active site between WT

and E58A structures (Figure S4H).

Mechanisms of Cytosine Recognition and DNA/RNADifferentiationThe dCMP-defined substrate channel is mainly formed by the

a1-b1, b2-a2, and the b4-a4 loops (Figure 4E), among which the

b4-a4 loop was previously designated as the recognition loop

important for hotspot specificity (Kohli et al., 2009; Wang et al.,

2010). Although there are limited global conformational differ-

ences between structures of AID alone and its complexes, the

side chain of F115 in the b4-a4 recognition loop is antiparallel to

theconservedY114 inall complexstructures (FigureS4I),whereas

a stacking conformation between F115 and Y114 was also

observed in Apo-AID structures (Figure S4I), suggesting that the

flip of F115 may be induced or stabilized by substrate binding.

The cytosine is cradled by aromatic residues H56, W84, and

Y114 (Figure 4G) and precisely positioned by interactions with

surrounding residues. The atom N4 forms hydrogen bonds to

the Zn2+-coordinating water and the carbonyl oxygen of S85,

whereas the atom O2 hydrogen bonds with the hydroxyl of T27

(Figure 4G). If we superimpose the E58-containing Apo-AID

structure, N4 also interacts with the side chain of E58 (Figure 4G).

In contrast, the product uracil possesses an O4 instead of an N4

and cannot form the stabilizing hydrogen bonds, consistent

with the lack of ligand density in AID co-crystallized with uridine

(Table S1). The core arrangement of the AID active site is akin to

that of human cytidine deaminase (CDA) despite a different

structural fold (Figure S4J), with similar Zn2+ coordination and

location of the catalytic Glu (Figure S4K). This structural observa-

tion suggests that AID uses a similar deamination mechanism, in

which the E58 side chain interacts with N3 of the pyrimidine ring

to facilitate nucleophilic attack at C4 by the Zn2+-activated water

for deamination (Chaudhuri and Alt, 2004).

Previous data showed that AID does not deaminate RNA

substrates (Bransteitter et al., 2003). Particularly, replacing

the target dC with C on an otherwise ssDNA substrate abol-

ished AID-induced deamination (Nabel et al., 2013). Support-

ing this observation, we only captured significant electron

density of dCMP, but not CMP, in the catalytic center (Fig-

ure S4L), in spite of similar binding behaviors of AID for DNA

and RNA in vitro (Figures 3A and 3C). In the AID/dCMP com-

plex, R25 interacts with the 50-phosphate and Y114 interacts

with the O50, whereas N51 hydrogen bonds with the 30-OH

(Figure 4G). Compared to the Apo-AID structure, both R25

and N51 undergo significant side chain adjustments upon sub-

strate binding (Figure 4H). Proximity of the carbonyl oxygen of

R25 to the C20 of the deoxyribose indicates a steric hindrance

if the deoxyribose is replaced by a ribose (Figures 4G and 4H).

Without the 50-phosphate, the AID structure in complex with

C or dC either showed different sugar position or much

weaker density (Figures S4L and S4M), suggesting that the

50-phosphate is essential for fixing dCMP orientation in the

catalytic center for DNA/RNA differentiation. The interactions

Figure 4. Structures of AID and Its Complex with dCMP

(A) Gel filtration chromatography with in-line MALS showing that AID.crystal binds G4 DNA in 2:1 ratio and branched DNA in 1:1 ratio. Measured and calculated

molecular masses are labeled.

(B) EMSA curves showing enhanced AID binding affinity for branched substrate with two overhangs (red), in comparison to that with one overhang (linear

substrate, black) or no overhang (dsDNA, blue). Data are represented as mean ± SD from three independent measurements.

(C) Ribbon diagram of human AID in rainbow color showing the secondary structures and catalytic residues near active site Zn2+. W, water.

(D) Locations of F42, F141, and F145 mutated in AID.crystal on the face of the crystal structure opposite to the active site.

(E) Surface charge distribution of the AID/dCMP structure and pocket prediction revealed a substrate binding channel that passes through the active site

(green mesh).

(F) Comparison between the AID-APOBEC3A hybrid AIDv (PDB: 5JJ4) and AID.crystal showed distinct surface charge distribution at the substrate channel.

(G) Substrate dCMP in AID (E58A) catalytic center showing the interactionswith surrounding residues. The E58 side chain is taken from theWTApo-AID structure.

(H) Alignment between Apo- and dCMP-bound AID structures showing the movement of R25 and N51 upon substrate recognition.

(I) Mutations associated with the hyper-IgM syndrome mapped to AID structure.

See also Figure S4 and Tables S1–S3.

Molecular Cell 67, 361–373, August 3, 2017 367

Figure 5. AID Structures Revealed a Bifur-

cated Recognition Site for Two ssDNA

Overhangs

(A) Surface charge distribution of AID structure

revealed an additional positive patch away from

the substrate-binding channel for recognition of an

additional ssDNA.

(B) Sequence alignment among different species

of AID as well as human APOBECs, showing that

the unique basic residue distribution in the sub-

strate channel and the assistant patch is not

conserved in APOBECs.

(C) Mutations on positively charged residues in

the substrate chain channel abolished deamina-

tion on substrates with either one or two ssDNA

overhangs. The K34S/R77S/R107S mutant is a

negative control on positively charged residues

elsewhere.

(D) Mutations on positively charged residues on

the assistant patch impaired deamination on the

substrate with two overhangs, without signifi-

cantly affecting AID activity on that with one

overhang. Experiments in (C) and (D) used 0.1 mM

AID and 1 mM DNA.

(E) Mutations on the assistant patch abolished

CSR. The AIDv was also completely deficient in

rescuing CSR. Notably, all AID constructs in this

assay contain intact CTT, including AIDv. Data are

represented as mean ± SD from three indepen-

dent measurements.

See also Figure S5.

we observed in the complex structure explain many hyper-IgM

syndrome mutations (Figure 4I; Table S3) and previously re-

ported disruptive mutations, such as R25A/D, T27A, N51A,

and Y114A (Basu et al., 2005; King et al., 2015; Shivarov

et al., 2008).

A Bifurcated Substrate-Binding Surface Explains G4Preferences and Is Critical for CSRAs described earlier, our in vitro data suggested that one AID in-

teracts with two ssDNA overhangs (Figure 4A). Supporting this

observation, we found that, apart from the substrate channel,

the AID structure contains an additional positively charged sur-

face at helix a6, which we named the ‘‘assistant patch’’ (Fig-

ure 5A). Together with the substrate channel, the AID structure

suggested a bifurcated substrate-binding surface wedged by a

negatively charged b4-a4 loop (Figure 5A). This structural feature

is surprisingly analogous to branched nucleic acid recognition by

the T4 RNase H (Devos et al., 2007) and Cas9 (Jiang et al., 2016;

Figures S5A and S5B). Thus, we propose a simultaneous recog-

368 Molecular Cell 67, 361–373, August 3, 2017

nition mechanism for two overhangs in

structured substrates, such as G4 in

AID targeting, in which one ssDNA over-

hang passes through the active site

and an adjacent ssDNA binds at the

assistant patch to enhance affinity.

Sequence alignment shows that the posi-

tively charged residues in the bifurcated

substrate-binding surface are highly

conserved in AID across species, but this conservation is absent

in APOBEC homologs (Figure 5B), explaining the lack of G4 pref-

erence in APOBEC3A and APOBEC3G (Figure 3F).

To validate the bifurcated substrate-binding surface, we

mutated either the substrate channel or the assistant patch.

For the substrate channel, the triple mutation on K22, R24, and

R25 and the double mutation on R50 and R52 abolished AID

deamination activity on DNA substrates with either one or two

ssDNA overhangs (Figure 5C). As a control, a triple mutation

on residues K34, R77, and R107, which are localized far away

from the substrate-binding surface, did not affect AID activity

(Figure 5C). For the assistant patch, mutations on R171, R174,

R177, and R178 specifically compromised AID deamination ac-

tivity on substrate with two ssDNA overhangs, without signifi-

cantly altering AID activity on one ssDNA overhang substrate

(Figure 5D), confirming its assistant role in recognizing structured

substrates with multiple overhangs.

Remarkably, when reconstituted into AID-deficient splenic

B cells, the assistant patch mutants showed completely

Figure 6. AID Oligomerization and Struc-

tural Comparison with APOBECs

(A) Bio-layer interferometry by Blitz showing the

kinetics of a hotspot containing G4 substrate

binding by AID.mono (left) and AID.crystal

(right). Much faster dissociation was observed for

AID.crystal in comparison with AID.mono.

(B) Bio-layer interferometry by Blitz showing the

kinetics of non-hotspot G4 substrate binding by

AID.mono (left) and AID.crystal (right). The much

faster dissociation remained for AID.crystal.

(C) The U-shaped substrate-binding channel

observed in A3A-ssDNA complex structure (PDB:

5SWW).

(D) AID surface with the U-shaped substrate from

in A3A, showing steric clash.

(E) A structure model of AID in complex with the

AGCTT ssDNA.

(F) A schematic diagram suggesting that a

U-shaped substrate channel in A3A and A3B may

not support structured substrate recognition.

See also Figure S6.

abolished CSR activity (<1% of the WT; Figures 5E and S5C).

Among the mutants, R174S was previously reported from pa-

tients with hyper-IgM syndrome (Durandy et al., 2006; Honjo

et al., 2012; Table S3). Moreover, the AID-APOBEC3A hybrid

Mole

AIDv (Pham et al., 2016) was also

completely deficient in conducting CSR

(Figures 5E and S1A), likely due to the

disrupted substrate channel orientation

(Figures 4F and S4F). Of note, all AID

constructs tested in the CSR assay

were with intact N-terminal nuclear local-

ization sequence (NLS) and CTT, sup-

porting that the functional deficiencies

were solely contributed by the internal

mutations. Collectively, we demonstrate

that the integrity of the bifurcated sub-

strate-binding surface, including both

the substrate channel and the assistant

patch, is essential for AID function

in CSR.

Putative AID Oligomerization on G4Contributes to CSRCompared to AID.mono, the three addi-

tional mutations in AID.crystal (F42/F141/

F145) localize on the opposite side of the

proposed substrate-binding surface (Fig-

ure 4D). Interestingly, these mutations

impaired CSR by �60% in comparison

to the WT without affecting AID deamina-

tion activity in vitro and in the SHM-mimic

RifR assay (Figures 1D, 1E, and S4B), sug-

gesting that CSR may require more AID

properties than SHM. DNA binding assay

showed that the oligomerization-deficient

AID.crystal exhibited no cooperativity (Hill

coefficient n z1.0; Figure S5D) or formed large oligomers (Fig-

ure S3D) upon G4 binding, in comparison to the highly coopera-

tive AID.mono (Figure 3A). Additionally, AID.crystal exhibited

much faster dissociation from G4 structured substrates than

cular Cell 67, 361–373, August 3, 2017 369

Figure 7. A Model of G4-Structure-Medi-

ated AID Recruitment and Oligomerization

that Create Mutation Clusters and DSB

WT-like AID.mono as measured by bio-layer interferometry, irre-

spective of the DNA sequence (Figures 6A and 6B). Based on

these data, we hypothesize that AID oligomerization upon G4

structure binding contributes to CSR by enhancing AID recruit-

ment and accumulation to Ig S regions. It should be noted that

AID.mono only cooperatively oligomerizes on G4, but not on a

simpler structured substrate, such as branched DNA, that may

be formed by local secondary structures (Figure 4B). The AID

oligomerization may be further facilitated by the CTT (Mondal

et al., 2016) and result in previously observed processivity

(Pham et al., 2003).

Comparison of Substrate Recognition by AID andAPOBECsTwo recent papers reported crystal structures of APOBECs

in complex with ssDNAs, including those of the non-catalytic

APOBEC3G N-domain (A3G-N) (Xiao et al., 2016), APOBEC3A

(A3A), and APOBEC3B containing a replaced a1-b1 loop from

A3A (A3B-A chimera; Shi et al., 2017). Consistent with its lack

370 Molecular Cell 67, 361–373, August 3, 2017

of an active site, A3G-N interacts with

ssDNA at a surface shifted from the anal-

ogous AID active site (Figure S6A). The

A3A and A3B-A chimera structures

displayed the same binding mode to

U-shaped ssDNA (Figure 6C), among

which the cognate C at the center binds

similarly as dCMP in AID (Figure 6D).

However, the remaining ssDNA clashes

with the AID surface and does not follow

the observed substrate channel of AID

(Figures 6D and S6B).

To further validate the observed sub-

strate channel in AID, we modeled the

binding of an AGCTT hotspot substrate

using the bound dCMP position as an an-

chor (Figure 6E). We found that the b4-a4

recognition loop stayed in the conforma-

tion observed in the dCMPcomplex struc-

ture (Figure S4I) and that conformation re-

arrangement is only required at the a1-b1

loop to accommodate the substrate at

the �1 and �2 positions (Figures 6E and

S6C). No gross conformational changes

are needed for the superficial binding

at +1 and +2 positions (Figure 6E). The

modeling exercise therefore supports the

observed substrate channel. We suspect

that, if a U-shaped substrate channel ex-

ists in AID as in A3A and A3B, it will likely

not be able to support simultaneous bind-

ing of another ssDNA at the assistant

patch,whichmaybe the reason for the dif-

ference between substrate recognition by AID and APOBECs

(Figure 6F) and for the CSR deficiency of AIDv (Figure 5E).

DISCUSSION

In this study, we combined biochemical, biophysical, and

structural approaches to elucidate AID targeting mechanisms,

especially in CSR. By using the fully functional AID.mono, we

clearly demonstrated that G4 substrates mimicking the Ig S

regions are preferred AID targets in vitro. Different from the pre-

viously proposed hotspot hypothesis, our data indicate that

the AID preference for G4 substrates is predominantly due to

their bundled ssDNA overhangs structure rather than any pri-

mary sequence motif. By solving structures of the MBP-fused

AID.crystal alone and in complex with dCMP, we observed a

bifurcated substrate-binding surface, which strongly supports

a model of one AID recognizing two adjacent ssDNA overhangs

in a structured substrate, such as G4, to achieve high affinity

(Figure 7). Although AID can recruit either DNA or RNA in a

sequence-independent manner, the positioning of dCMP in the

active site suggests that only deoxycytidine, but not cytidine,

can flip into the catalytic center to be deaminated.

Other than Ig S regions, it has been shown that G4 structures

maybeprevalent in other regions ofmammalian genomes (Cham-

bers et al., 2015; Maizels and Gray, 2013). Especially, G-rich re-

gions in AID off-target genes, like c-MYC and BCL6, have been

proposed to be targeted byAID (Duquette et al., 2005, 2007). After

inspecting many identified AID off-target genes, we found that

these genes often contain G-repeat (GGG) enriched regions,

particularly in their non-template strands, which may lead to G4

assembly during transcription and direct AID recruitment (Fig-

ure S6D; Table S4). However, more experimental data and

genome sequence analysis may be required to establish the cor-

relation between G-repeat motif distribution and AID targeting.

Additionally, our data suggest that G4 DNA-binding-induced

cooperative AID oligomerization may contribute to its accumula-

tion in Ig S regions, promoting high-density mutations, DSBs, and

their joining during CSR (Figure 7). In comparison, SHM can be

induced by CSR-deficient AID variants in cells (Hwang et al.,

2015;Phamet al., 2016) andbyAPOBEC3homologsduring retro-

viral infection (Halemano et al., 2014), suggesting that mutations

in Ig V regions may not require all the AID features employed in

CSR. Because chromatin immunoprecipitation sequencing data

showed that Ig V regions may not bind AID as stably as S regions

(Matthews et al., 2014b), and the assistant patch mutant R174S

also causes SHM deficiency (Durandy et al., 2006; Honjo et al.,

2012), we speculate that Ig V regions may only transiently recruit

AID using local secondary structures like branched DNA, which

do not induce AID oligomerization and stable association.

Although our biochemical data and the bipartite DNA binding

surface in AID structure strongly suggest a model of how AID rec-

ognizes structured substrates (Figure 6F), a definitive complex

structure that contains fully characterized substrate conformation

is still lacking, despite extensive efforts using both crystallography

and electron microscopy (EM). Co-crystallization of purified AID/

G4, AID/branched DNA, and AID/linear DNA complexes did not

yield visible density for the substrates, likely due to crystal packing

effects. Alternatively, we tried co-crystallization using 1- to 5-nt

ssDNA fragments together with a 12-bp dsDNA (Table S1), but

any ssDNA fragment longer than 1 nt did not reveal any density

in the active site. Neither did soaking of various substrates into

pre-formed AID crystals (Table S1). We further used EM on the

AID/G4 complex, but the flexibility of the complex caused issues

in the reconstruction (Figure S6E). Thus, the exactmolecular basis

for ourmodel remains tobedeterminedby futurestructural studies

of AID/DNAcomplexes. Due to the close correlation of AID activity

with B cell lymphoma and other types of cancers, the AID struc-

tureswill alsoprovide templates for potential therapeutic interven-

tion against this important cytidine deaminase.

STAR+METHODS

Detailed methods are provided in the online version of this paper

and include the following:

d KEY RESOURCES TABLE

d CONTACT FOR REAGENT AND RESOURCE SHARING

d METHOD DETAILS

B Protein Engineering and Purification

B Ex Vivo CSR Assay

B Rifampicin Resistance (RifR) Assay

B In Vitro Deamination Assay

B In Vitro EMSA

B G4 and Branched DNA Purification and Character-

ization

B AID/DNA Complex Crystallization and Structure Deter-

mination

B Molecular Mass Measurement by Multi-Angle Light

Scattering (MALS)

B Electron Microscopy

d QUANTIFICATION AND STATISTICAL ANALYSES

B In Vitro EMSA and Deamination Assay Quantification

B Data Analysis in Flow Cytometry

d DATA AND SOFTWARE AVAILABILITY

B Accession Numbers

SUPPLEMENTAL INFORMATION

Supplemental Information includes six figures and four tables can be found

with this article online at http://dx.doi.org/10.1016/j.molcel.2017.06.034.

AUTHOR CONTRIBUTIONS

H.W. supervised the project. Q.Q. and L.W. designed the experiments and

performed protein engineering, in vitro assays, structure determination, and

analysis. F.-L.M. performed ex vivo CSR assay, and F.W.A. supervised the

effort. H.W. and Q.Q. wrote most of the manuscript with help from J.K.H.

and F.W.A. for the introduction. All authors contributed to data analysis and

critical interpretation of results and approved the manuscript.

ACKNOWLEDGMENTS

We thank Ermelinda Damko and Devendra Srivastava for their earlier work on

this project; Ming Tian, Zhou Du, Leng Siew Yeap, Jiazhi Hu, and Junchao

Dong for discussions; Yang Li for EM data analysis; Xia Xie for assistances

in CSR assay; Rida Mourtada in Dr. Loren D. Walensky’s lab and Kelly Arnett

in Harvard Medical School Center for Macromolecular Interactions for their

assistance with CD spectroscopy; Sukumar Narayanasami and Surajit Bane-

rjee of NE-CAT at the Advance Photon Source for their assistance on data

collection; andMaria Ericsson and Louise Trakimas of HarvardMedical School

EM facility for assistance on EM imaging. This work was supported by a Can-

cer Research Institute Irvington Postdoctoral Fellowship (to Q.Q.), a Lym-

phoma Research Foundation Fellowship (to F.-L.M.), NIH R01 AI077595 (to

F.W.A.), and NIH F30 AI114179 (to J.K.H.). F.W.A. is an Investigator of the Ho-

ward Hughes Medical Institute.

Received: March 1, 2017

Revised: May 3, 2017

Accepted: June 27, 2017

Published: July 27, 2017

REFERENCES

Adams, P.D., Afonine, P.V., Bunkoczi, G., Chen, V.B., Davis, I.W., Echols, N.,

Headd, J.J., Hung, L.W., Kapral, G.J., Grosse-Kunstleve, R.W., et al. (2010).

PHENIX: a comprehensive Python-based system for macromolecular struc-

ture solution. Acta Crystallogr. D Biol. Crystallogr. 66, 213–221.

Bardin, C., and Leroy, J.L. (2008). The formation pathway of tetramolecular

G-quadruplexes. Nucleic Acids Res. 36, 477–488.

Molecular Cell 67, 361–373, August 3, 2017 371

Basu, U., Chaudhuri, J., Alpert, C., Dutt, S., Ranganath, S., Li, G., Schrum, J.P.,

Manis, J.P., and Alt, F.W. (2005). The AID antibody diversification enzyme is

regulated by protein kinase A phosphorylation. Nature 438, 508–511.

Bransteitter, R., Pham, P., Scharff, M.D., and Goodman, M.F. (2003).

Activation-induced cytidine deaminase deaminates deoxycytidine on single-

stranded DNA but requires the action of RNase. Proc. Natl. Acad. Sci. USA

100, 4102–4107.

Chambers, V.S., Marsico, G., Boutell, J.M., Di Antonio, M., Smith, G.P., and

Balasubramanian, S. (2015). High-throughput sequencing of DNA G-quadru-

plex structures in the human genome. Nat. Biotechnol. 33, 877–881.

Chaudhuri, J., and Alt, F.W. (2004). Class-switch recombination: interplay of

transcription, DNA deamination and DNA repair. Nat. Rev. Immunol. 4,

541–552.

Chaudhuri, J., Tian, M., Khuong, C., Chua, K., Pinaud, E., and Alt, F.W. (2003).

Transcription-targeted DNA deamination by the AID antibody diversification

enzyme. Nature 422, 726–730.

Chaudhuri, J., Khuong, C., and Alt, F.W. (2004). Replication protein A interacts

with AID to promote deamination of somatic hypermutation targets. Nature

430, 992–998.

Cheng, H.L., Vuong, B.Q., Basu, U., Franklin, A., Schwer, B., Astarita, J., Phan,

R.T., Datta, A., Manis, J., Alt, F.W., and Chaudhuri, J. (2009). Integrity of the

AID serine-38 phosphorylation site is critical for class switch recombination

and somatic hypermutation in mice. Proc. Natl. Acad. Sci. USA 106,

2717–2722.

Devos, J.M., Tomanicek, S.J., Jones, C.E., Nossal, N.G., and Mueser, T.C.

(2007). Crystal structure of bacteriophage T4 50 nuclease in complex with a

branched DNA reveals how flap endonuclease-1 family nucleases bind their

substrates. J. Biol. Chem. 282, 31713–31724.

Duke, J.L., Liu, M., Yaari, G., Khalil, A.M., Tomayko, M.M., Shlomchik, M.J.,

Schatz, D.G., and Kleinstein, S.H. (2013). Multiple transcription factor binding

sites predict AID targeting in non-Ig genes. J. Immunol. 190, 3878–3888.

Duquette, M.L., Handa, P., Vincent, J.A., Taylor, A.F., and Maizels, N. (2004).

Intracellular transcription of G-rich DNAs induces formation of G-loops, novel

structures containing G4 DNA. Genes Dev. 18, 1618–1629.

Duquette, M.L., Pham, P., Goodman, M.F., and Maizels, N. (2005). AID binds

to transcription-induced structures in c-MYC that map to regions associated

with translocation and hypermutation. Oncogene 24, 5791–5798.

Duquette, M.L., Huber, M.D., and Maizels, N. (2007). G-rich proto-oncogenes

are targeted for genomic instability in B-cell lymphomas. Cancer Res. 67,

2586–2594.

Durandy, A., Peron, S., Taubenheim, N., and Fischer, A. (2006). Activation-

induced cytidine deaminase: structure-function relationship as based on the

study of mutants. Hum. Mutat. 27, 1185–1191.

Emsley, P., Lohkamp, B., Scott, W.G., and Cowtan, K. (2010). Features and

development of Coot. Acta Crystallogr. D Biol. Crystallogr. 66, 486–501.

Halemano, K., Guo, K., Heilman, K.J., Barrett, B.S., Smith, D.S., Hasenkrug,

K.J., and Santiago, M.L. (2014). Immunoglobulin somatic hypermutation by

APOBEC3/Rfv3 during retroviral infection. Proc. Natl. Acad. Sci. USA 111,

7759–7764.

Han, L., Masani, S., and Yu, K. (2011). Overlapping activation-induced cytidine

deaminase hotspot motifs in Ig class-switch recombination. Proc. Natl. Acad.

Sci. USA 108, 11584–11589.

Honjo, T., Kobayashi, M., Begum, N., Kotani, A., Sabouri, S., and Nagaoka, H.

(2012). The AID dilemma: infection, or cancer? Adv. Cancer Res. 113, 1–44.

Hwang, J.K., Alt, F.W., and Yeap, L.S. (2015). Relatedmechanisms of antibody

somatic hypermutation and class switch recombination. Microbiol. Spectr. 3,

MDNA3-0037-2014.

Jiang, F., Taylor, D.W., Chen, J.S., Kornfeld, J.E., Zhou, K., Thompson, A.J.,

Nogales, E., and Doudna, J.A. (2016). Structures of a CRISPR-Cas9 R-loop

complex primed for DNA cleavage. Science 351, 867–871.

King, J.J., Manuel, C.A., Barrett, C.V., Raber, S., Lucas, H., Sutter, P., and

Larijani, M. (2015). Catalytic pocket inaccessibility of activation-induced

372 Molecular Cell 67, 361–373, August 3, 2017

cytidine deaminase is a safeguard against excessive mutagenic activity.

Structure 23, 615–627.

Kitamura, S., Ode, H., Nakashima,M., Imahashi, M., Naganawa, Y., Kurosawa,

T., Yokomaku, Y., Yamane, T., Watanabe, N., Suzuki, A., et al. (2012). The

APOBEC3C crystal structure and the interface for HIV-1 Vif binding. Nat.

Struct. Mol. Biol. 19, 1005–1010.

Kohli, R.M., Abrams, S.R., Gajula, K.S., Maul, R.W., Gearhart, P.J., and

Stivers, J.T. (2009). A portable hot spot recognition loop transfers sequence

preferences from APOBEC family members to activation-induced cytidine

deaminase. J. Biol. Chem. 284, 22898–22904.

Larijani, M., Frieder, D., Basit, W., and Martin, A. (2005). The mutation spec-

trum of purified AID is similar to the mutability index in Ramos cells and in

ung(-/-)msh2(-/-) mice. Immunogenetics 56, 840–845.

Larijani, M., Petrov, A.P., Kolenchenko, O., Berru, M., Krylov, S.N., and Martin,

A. (2007). AID associates with single-strandedDNAwith high affinity and a long

complex half-life in a sequence-independent manner. Mol. Cell. Biol.

27, 20–30.

Maizels, N., and Gray, L.T. (2013). The G4 genome. PLoS Genet. 9, e1003468.

Matthews, A.J., Zheng, S., DiMenna, L.J., and Chaudhuri, J. (2014a).

Regulation of immunoglobulin class-switch recombination: choreography of

noncoding transcription, targeted DNA deamination, and long-range DNA

repair. Adv. Immunol. 122, 1–57.

Matthews, A.J., Husain, S., and Chaudhuri, J. (2014b). Binding of AID to DNA

does not correlate with mutator activity. J. Immunol. 193, 252–257.

McKean, D., Huppi, K., Bell, M., Staudt, L., Gerhard, W., and Weigert, M.

(1984). Generation of antibody diversity in the immune response of BALB/c

mice to influenza virus hemagglutinin. Proc. Natl. Acad. Sci. USA 81,

3180–3184.

Meng, F.L., Du, Z., Federation, A., Hu, J., Wang, Q., Kieffer-Kwon, K.R.,

Meyers, R.M., Amor, C., Wasserman, C.R., Neuberg, D., et al. (2014).

Convergent transcription at intragenic super-enhancers targets AID-initiated

genomic instability. Cell 159, 1538–1548.

Mondal, S., Begum, N.A., Hu, W., and Honjo, T. (2016). Functional require-

ments of AID’s higher order structures and their interaction with RNA-binding

proteins. Proc. Natl. Acad. Sci. USA 113, E1545–E1554.

Nabel, C.S., Lee, J.W., Wang, L.C., and Kohli, R.M. (2013). Nucleic acid deter-

minants for selective deamination of DNA over RNA by activation-induced

deaminase. Proc. Natl. Acad. Sci. USA 110, 14225–14230.

Pavri, R., Gazumyan, A., Jankovic, M., Di Virgilio, M., Klein, I., Ansarah-

Sobrinho, C., Resch, W., Yamane, A., Reina San-Martin, B., Barreto, V.,

et al. (2010). Activation-induced cytidine deaminase targets DNA at sites of

RNA polymerase II stalling by interaction with Spt5. Cell 143, 122–133.

Petersen-Mahrt, S.K., Harris, R.S., and Neuberger, M.S. (2002). AIDmutates E.

coli suggesting a DNA deamination mechanism for antibody diversification.

Nature 418, 99–103.

Pham, P., Bransteitter, R., Petruska, J., and Goodman, M.F. (2003).

Processive AID-catalysed cytosine deamination on single-stranded DNA sim-

ulates somatic hypermutation. Nature 424, 103–107.

Pham, P., Afif, S.A., Shimoda, M., Maeda, K., Sakaguchi, N., Pedersen, L.C.,

andGoodman, M.F. (2016). Structural analysis of the activation-induced deox-

ycytidine deaminase required in immunoglobulin diversification. DNA Repair

(Amst.) 43, 48–56.

Qian, J., Wang, Q., Dose, M., Pruett, N., Kieffer-Kwon, K.R., Resch, W., Liang,

G., Tang, Z., Mathe, E., Benner, C., et al. (2014). B cell super-enhancers and

regulatory clusters recruit AID tumorigenic activity. Cell 159, 1524–1537.

Rajewsky, K., Forster, I., and Cumano, A. (1987). Evolutionary and somatic se-

lection of the antibody repertoire in the mouse. Science 238, 1088–1094.

Rogozin, I.B., and Diaz, M. (2004). Cutting edge: DGYW/WRCH is a better pre-

dictor of mutability at G:C bases in Ig hypermutation than the widely accepted

RGYW/WRCY motif and probably reflects a two-step activation-induced cyti-

dine deaminase-triggered process. J. Immunol. 172, 3382–3384.

Shi, K., Carpenter, M.A., Banerjee, S., Shaban, N.M., Kurahashi, K.,

Salamango, D.J., McCann, J.L., Starrett, G.J., Duffy, J.V., Demir, O., et al.

(2017). Structural basis for targeted DNA cytosine deamination and muta-

genesis by APOBEC3A and APOBEC3B. Nat. Struct. Mol. Biol. 24, 131–139.

Shivarov, V., Shinkura, R., and Honjo, T. (2008). Dissociation of in vitro DNA

deamination activity and physiological functions of AID mutants. Proc. Natl.

Acad. Sci. USA 105, 15866–15871.

Sun, D., and Hurley, L.H. (2010). Biochemical techniques for the characteriza-

tion of G-quadruplex structures: EMSA, DMS footprinting, and DNA poly-

merase stop assay. Methods Mol. Biol. 608, 65–79.

Vorlı�ckova, M., Kejnovska, I., Sagi, J., Ren�ciuk, D., Bedna�rova, K., Motlova, J.,

and Kypr, J. (2012). Circular dichroism and guanine quadruplexes. Methods

57, 64–75.

Wang, M., Yang, Z., Rada, C., and Neuberger, M.S. (2009). AID upmutants iso-

lated using a high-throughput screen highlight the immunity/cancer balance

limiting DNA deaminase activity. Nat. Struct. Mol. Biol. 16, 769–776.

Wang, M., Rada, C., and Neuberger, M.S. (2010). Altering the spectrum of

immunoglobulin V gene somatic hypermutation by modifying the active site

of AID. J. Exp. Med. 207, 141–153.

Winn, M.D., Ballard, C.C., Cowtan, K.D., Dodson, E.J., Emsley, P., Evans,

P.R., Keegan, R.M., Krissinel, E.B., Leslie, A.G., McCoy, A., et al. (2011).

Overview of the CCP4 suite and current developments. Acta Crystallogr. D

Biol. Crystallogr. 67, 235–242.

Xiao, X., Li, S.X., Yang, H., and Chen, X.S. (2016). Crystal structures of

APOBEC3G N-domain alone and its complex with DNA. Nat. Commun.

7, 12193.

Xu, Z., Fulop, Z., Wu, G., Pone, E.J., Zhang, J., Mai, T., Thomas, L.M.,

Al-Qahtani, A., White, C.A., Park, S.R., et al. (2010). 14-3-3 adaptor proteins

recruit AID to 50-AGCT-30-rich switch regions for class switch recombination.

Nat. Struct. Mol. Biol. 17, 1124–1135.

Xu, Z., Zan, H., Pone, E.J., Mai, T., and Casali, P. (2012). Immunoglobulin

class-switch DNA recombination: induction, targeting and beyond. Nat. Rev.

Immunol. 12, 517–531.

Yeap, L.S., Hwang, J.K., Du, Z., Meyers, R.M., Meng, F.L., Jakubauskait _e, A.,

Liu, M., Mani, V., Neuberg, D., Kepler, T.B., et al. (2015). Sequence-intrinsic

mechanisms that target AID mutational outcomes on antibody genes. Cell

163, 1124–1137.

Yu, K., Chedin, F., Hsieh, C.L.,Wilson, T.E., and Lieber, M.R. (2003). R-loops at

immunoglobulin class switch regions in the chromosomes of stimulated B

cells. Nat. Immunol. 4, 442–451.

Yu, J., Zhou, Y., Tanaka, I., and Yao, M. (2010). Roll: a new algorithm for the

detection of protein pockets and cavities with a rolling probe sphere.

Bioinformatics 26, 46–52.

Zheng, S., Vuong, B.Q., Vaidyanathan, B., Lin, J.Y., Huang, F.T., and

Chaudhuri, J. (2015). Non-coding RNA generated following lariat debranching

mediates targeting of AID to DNA. Cell 161, 762–773.

Molecular Cell 67, 361–373, August 3, 2017 373

STAR+METHODS

KEY RESOURCES TABLE

REAGENT or RESOURCE SOURCE IDENTIFIER

Antibodies

APC Rat anti-Mouse IgG1 BD Biosciences 560089; RRID: AB_1645625

Anti-AID Chaudhuri et al., 2003 N/A

Anti-GFP MBL Code # 598; RRID: AB_10597267

Anti-CD40 eBioscience 16-0402-86

Bacterial and Virus Strains

E.coli Bl21 (DE3) Agilent Technologies 200131

E. coli KL16 Coli Genetic Stock Center CGSC# 4245

E. coli BW310 Coli Genetic Stock Center CGSC# 6078

GIBCO Sf9 cells Thermo Scientific 11496-015

DH10Bac Thermo Scientific 10361-012

Chemicals, Peptides, and Recombinant Proteins

SYBR Gold Nucleic Acid Gel Stain Thermo Scientific S-11494

40% Acrylamide/Bis Solution, 19:1 Bio-Rad 161-0144

SIGMAFAST Protease Inhibitor Cocktail Tablets, EDTA-Free Sigma-Aldrich S8830-20TAB

Rifampicin Sigma-Aldrich R3501-250MG

Dimethyl sulfate Sigma-Aldrich D186309-100ML

Piperidine Sigma-Aldrich 411027-100ML

Cellfectin II Reagent Invitrogen Cat#10362-100

Recombinant Mouse Interleukin-4/IL-4 Novoprotein Cat#CK15

Lipopolysaccharides from Escherichia coliO111:B4 Sigma Aldrich L2630-100MG

Uracil-DNA Glycosylase NEB M0280S

20-deoxycytidine Sigma-Aldrich D3897-500MG

20-deoxycytidine 50-monophosphate Sigma-Aldrich D7625-100MG

Cytidine Sigma-Aldrich C4654-1G

Cytidine 50-monophosphate Sigma-Aldrich C1006-1G

Uranyl formate Electron Microscopy Sciences 22450

PreScission protease GE Healthcare 27-0843-01

LB Broth RPI L24066-5000.0

HyClone SFX-Insect Media Thermo Scientific SH30278LS

Grace’s Insect Medium, Supplemented Thermo Scientific 11605-102

Fetal Bovine Serum Sigma-Aldrich TMS-013-B

Critical Commercial Assays

EasySep Mouse B Cell Isolation Kit STEMCELL Technologies Cat#19854

Deposited Data

AID.crystal, co-crystallized with G4 DNA Protein Data Bank PDB: 5W0Z

AID.crystal E58A, co-crystallized with Branched DNA and

cacodylic acid

Protein Data Bank PDB: 5W0R

AID.crystal E58A, co-crystallized with dsDNA and cytidine Protein Data Bank PDB: 5W1C

AID.crystal E58A, co-crystallized with dsDNA and dCMP Protein Data Bank PDB: 5W0U

Recombinant DNA

pGEX6P1 GE Healthcare Cat#28954648

pTrc99A DNA Resource Core N/A

pFastBac Thermo Scientific 10584-027

All synthesized DNA/RNA Integrated DNA Technologies N/A

(Continued on next page)

e1 Molecular Cell 67, 361–373.e1–e4, August 3, 2017

Continued

REAGENT or RESOURCE SOURCE IDENTIFIER

Other

S1000 Thermal Cycler Bio-Rad S1000

ChemiDoc MP imager Bio-Rad 1708280

Image Lab Version 4.1 Bio-Rad N/A

Image Scanner FLA-9000 Fujifilm N/A

Multi Gauge Version 3.0 Fujifilm N/A

Origin OriginLab Corporation N/A

mini-DAWN TRISTAR Wyatt Technology N/A

Optilab DSP Wyatt Technology N/A

ASTRA V Wyatt Technology N/A

BLItz system with Streptavidin (SA) Biosensor ForteBio N/A

BLItz 1.1 ForteBio N/A

Tecnai G2 Spirit BioTWIN electron microscope FEI N/A

J-815 CD Spectropolarimeter Jasco N/A

Phenix Adams et al., 2010 N/A

CCP4 Winn et al., 2011 N/A

Coot Emsley et al., 2010 N/A

Pymol Schrodinger N/A

POCASA 1.1 Yu et al., 2010 N/A

YASARA YASARA Biosciences N/A

Integrative Genomics Viewer Broad Institute N/A

CONTACT FOR REAGENT AND RESOURCE SHARING

Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Dr. Hao

Wu ([email protected]).

METHOD DETAILS

Protein Engineering and PurificationConstructs of human AID (UniProt: Q9GZX7) were generated in the pFastBac vector (Thermo Fisher Scientific) and expressed in Sf9

insect cells for 48 hr using recombinant baculoviruses. To overcome AID aggregation in recombinant expression, we created

numerous AID constructs and found that adding the maltose binding protein (MBP) solubility tag, modifying the N-terminal nuclear

localization sequence (NLS) and removing the C-terminal tail (CTT) resulted in improved yield, but retained AID aggregation.

By screening non-conserved AID surface residue mutations, we identified the AID mutant H130A/R131E. We named this construct

AID.mono because it contained a distinct monomeric fraction in addition to the aggregated faction, in contrast to complete

aggregation of wild-type (WT) AID. When necessary, the MBP-tag was removed by incubating with the PreScission Protease

(GE Healthcare) at 1/100 ratio at 4�C overnight. Adding C-terminal tail back (AID.mono+CTT) decreased the monomer yield by

95%. In AID.crystal, the linker between AID and MBP was shortened to facilitate crystallization. The proteins were affinity-purified

using amylose resin (New England Biolabs), followed by chromatography using Superdex 200 10/300 GL, monomer fraction

was further purified by HiTrap Heparin HP (GE Healthcare) and another Superdex 200. Final gel filtration buffer contains 20 mM

Bis-Tris at pH 6.8, 200 mM NaCl and 1 mM tris(2-carboxyethyl)phosphine) (TCEP).

Human Apobec3A (1-199) and APOBEC3G (197-380) was cloned into a pGEX6p-1 vector and expressed in Bl21 (DE3) strain. The

N-GST fusion proteins were purified by Glutathione Sepharose 4B resin (GE Healthcare), and the tag was removed by on-column

Precision Protease treatment. APOBEC proteins were further purified over a Superdex 200.

Ex Vivo CSR AssayThe AID constructs used in this assay contained point mutations in AID.mono (H130A/R131E), AID.crystal (H130A/R131E plus F42E/

F141Y/F145E), assistant patch residues, or AIDv. The ex vivo CSR assaywas performed as described previously (Cheng et al., 2009).

Molecular Cell 67, 361–373.e1–e4, August 3, 2017 e2

Briefly, AID-deficient mouse splenic B cells were stimulated with IL-4 (Novoprotein) and anti-CD40 (eBioscience) to induce CSR to

IgG1. One day after stimulation, WT or mutant AID together with GFP via an internal ribosome entry site (IRES) was retrovirally deliv-

ered into the cells. The percentage of GFP-positive cells that underwent switching to IgG1 was taken as the level of CSR rescue. The

data for each mutant were either from 3 or 6 mice.

Rifampicin Resistance (RifR) AssayWT and mutant AID were cloned into the pTrc99A vector (pTrc99A-AID) and the rifampicin resistance (RifR) assay was performed as

previously described (Wang et al., 2009). Briefly, E. coli strain KL16 (Hfr [PO-45] relA1 spoT1 thi-1) and its UDG-deficient derivative

BW310 bacteria transformedwith pTrc99A-AID plasmidswere grown overnight to saturation in LBmedium supplementedwith ampi-

cillin (100 mg/ml) and isopropyl b-D-1-thiogalactopyranoside (IPTG, 1 mM), and plated on LB low-salt agar containing ampicillin

(100 mg/ml) and rifampicin (50 mg/ml). Mutation frequency was measured by determining the median number of colony-forming cells

that survived selection per 107 viable cells plated from 12 independent cultures. The identity of mutations was determined by

sequencing the relevant section of rpoB (typically from 25 to 200 individual colonies) after PCR amplification using oligonucleotides

50-TTGGCGAAATGGCGGAAAACC-30 and 50-CACCGACGGATACCACCTGCTG-30) (synthesized by Integrated DNA Technologies).

In Vitro Deamination AssayEach 10ml reaction contained 0.1 or 1 mMAID, and 1 mMsubstrateDNA labeled by 6-carboxyfluorescein (FAM) at 50 end (synthesized by

Integrated DNA Technologies) in 20 mMHEPES at pH 7.5, 100 mM KCl and 1 mMDTT. Following incubation at 37�C for the indicated

length of time, each reaction was raised to 95�C for 10 min and flash cooled to inactivate AID and resolve DNA structures. Sufficient

amount of UDG (NewEngland Biolabs, 5 unit with each unit catalyzing 60 pmol/min at 37�C) was then added and themixture was incu-

bated at 37�C for 1h. Lastly, NaOH was added to a final concentration of 0.15 M, and treated at 95�C for 15 min to break abasic sites.

Urea gelwasused to separate the product from the substrate. Experiments in Figures 5Gand 5HusedSYBRGold staining to save cost.

Images were taken on Image Scanner FLA-9000 (Fujifilm) using excitation wavelength of 495 nm and emission wavelength of 519 nm.

Quantification was performed using the Image Lab (Bio-Rad) and Multi Gauge (Fujifilm) software.

In Vitro EMSAUp to 25 mMAIDwas titrated into 10 nM FAM labeled DNA followed by 10%–12%acrylamide:bisacrylamide (19:1) native gel to sepa-

rate free DNA and the AID/DNA complex. Quantified free DNA amounts at different AID concentrations were used to calculate the

dissociation constant (KD) and the Hill coefficient (n) of the AID/DNA interaction. 100 nM unlabeled TGGGGT1-5 and TGGGGT10were used in experiments of Figure 3B to eliminate any effect from the fluorophore and its linker to DNA. SYBR Gold was used

for staining. Images were taken on Image Scanner FLA-9000 (Fujifilm). TheMulti Gauge (Fujifilm) software was used for quantification

and the Origin software (OriginLab Corporation) was used in curve fitting.

G4 and Branched DNA Purification and CharacterizationTo generate mostly G4 structures, Sm fragment and G-repeat DNAs were dissolved in 20 mM HEPES at pH 7.5, 100 mM KCl and

1 mM DTT, incubated at 95�C for 5 min and slowly cooled down to room temperature. In contrast, to generate mostly linear DNA,

heat denaturation at 95�C for 10 min followed by flash cooling on ice was used to eliminate DNA structures. A Jasco J-815 Circular

dichroism (CD) spectropolarimeter was used in determining the parallel and anti-parallel conformation of the G4 assembly (Vor-

lı�ckova et al., 2012). When preparing large amounts of G4 structures, up to 1M KCl was used. The size exclusion column Superdex

75 or 200 (GE Healthcare) was used to separate G4 and linear DNA fractions. Notably, the FAM fluorescence intensity was quenched

significantly upon single G-repeat G4 assembly. DMS protection assay was performed as described (Sun and Hurley, 2010).

Branched DNAs were generated by the similar annealing procedure without the further purification steps.

AID/DNA Complex Crystallization and Structure DeterminationGel filtration purified AID.crystal/G4 complex containing G4 DNA with the sequence of TGGGGTTTTTTT was crystallized using

0.26 M NaCl, 0.1 M MES at pH 6.0 and 12% PEG3350 at 25�C. Other AID complexes with G4 substrates including TGGGGTTTTT,

TGGGGTTTTTT, TGGGGTTCTTTT, TGGGGTTCTTT, TGGGGAACTTT, TGGGGAAUTTT and TTTTTTGGGGGTTCTTT were also

crystallized, which showed similar AID structures and no ordered DNA.

Gel filtration purified AID.crystal (WT and E58A)/branched complexes containing DNA with the sequences of GTTCAAGGCCA

GAGCTTT and TTTTTCCTGGCCTTGAAC were crystallized using 0.05 M sodium cacodylate at pH 5.5, 20 mM MgCl2, 10 mM

CaCl2, 10 mM spermidine and 5% PEG3350 at 16�C.When using dsDNA as a crystallization chaperone, AID.crystal (E58A) was mixed with dsDNA with the sequences of GTTCAAGG

CCAG and CTGGCCTTGAAC at 1:1 molar ratio, and crystallized either alone or with various substrate ligands such as 50-dCMP, de-

oxycytidine, and cytidine (Sigma) at 20mM concentration, under 0.1MMES at pH 6.2, 3%PEG3350 and 10mMCaCl2 at 16�C.Many

other short oligos were also tried in co-crystallization, but did not show ordered nucleotide density in the structures (Table S1).

The AID/G4 structure was solved by molecular replacement calculations in Phaser using the crystal structure of MBP (PDB: 3VD8)

and APOBEC3C as the model (PDB: 3VOW) (Kitamura et al., 2012). Subsequent structures were determined by molecular

replacement using the AID structure as a model. Phenix, CCP4 and Coot were used in model building and refinement (Adams

e3 Molecular Cell 67, 361–373.e1–e4, August 3, 2017

et al., 2010; Emsley et al., 2010; Winn et al., 2011). The POCASA 1.1 server was used for pocket prediction (Yu et al., 2010). The

YASARA server was used for energy minimization of the substrate manually docked to AID (YASARA Biosciences). Pymol was uti-

lized for molecular visualization and structure display (Schrodinger).

Molecular Mass Measurement by Multi-Angle Light Scattering (MALS)AID alone and AID-DNA complexes were reconstituted and purified as described above. The complex peak fractions con-

taining �0.2 mg protein were loaded onto a Superdex 200 gel filtration column coupled to a three-angle light scattering detector

(mini-DAWN TRISTAR) and a refractive index detector (Optilab DSP) (Wyatt Technology). Data analysis was carried out using

ASTRA V.

Electron MicroscopyAID and AID/DNA complex were applied to carbon-coated grids and negatively stained with 1%uranyl formate (ElectronMicroscopy

Sciences). Samples were examined in a Tecnai G2 Spirit BioTWIN electron microscope (FEI) at an accelerating voltage of 80 keV and

a nominal magnification of 3 49,000.

QUANTIFICATION AND STATISTICAL ANALYSES

In Vitro EMSA and Deamination Assay QuantificationImages were obtained by instruments described in Method Details. The band areas were boxed and backgrounds were subtracted.

The band quantification results were averaged from three repeats and the error bars were shown as standard deviations.

Data Analysis in Flow CytometryCell populationwas identified in FlowJo. Unpaired t test was used and two-tailed P valuewas calculated usingGraphPad Prism6. The

class switch recombination (CSR) results showed average from three biological replicates, with bars indicated SD.

DATA AND SOFTWARE AVAILABILITY

Accession NumbersThe accession numbers for the data reported in this paper are PDB: 5W0Z (AID.crystal, co-crystallized with G4 DNA), 5W0R

(AID.crystal E58A, co-crystallized with Branched DNA and cacodylic acid), 5W1C (AID.crystal E58A, co-crystallized with dsDNA

and cytidine), and 5W0U (AID.crystal E58A, co-crystallized with dsDNA and dCMP).

Molecular Cell 67, 361–373.e1–e4, August 3, 2017 e4