psi+gag RRE cPPT hU6 EGFP sgRNA PGK WPRE 2A …...Unmodified A 37 5 A 375-Cas 9 HT 29-Cas 9 293...

22
Unmodified A375 A375-Cas9 HT29-Cas9 293T-Cas9 MOLM13-Cas9 BV2-Cas9 0 20 40 60 80 100 Percent EGFP Negative Cas9 Activity Assay hU6 EGFP sgRNA PGK WPRE ·/75 ·/75 psi+gag RRE cPPT Puromycin 2A EGFP pXPR_011 GFP - A F-SCA A375 A375-Cas9 0.52% 82.1% A B C Supplementary Figure 1. Cas9 activity assays for cell lines used in this study. (a) Schematic of pXPR_011 (Addgene #59702). Cells that do not express Cas9 will express EGFP and puromycin resistance, while cells with sufficient levels of active Cas9 will utilize the sgRNA targeting EGFP. Placement of EGFP downstream of a 2A site allows for continued puromy- cin resistance after indel formation. (b) Quantitation of the fraction of cells with Cas9 activity. Samples were analyzed ten to fourteen days post-infection.(c) Example flow cytometey plots. Nature Biotechnology: doi:10.1038/nbt.3437

Transcript of psi+gag RRE cPPT hU6 EGFP sgRNA PGK WPRE 2A …...Unmodified A 37 5 A 375-Cas 9 HT 29-Cas 9 293...

Page 1: psi+gag RRE cPPT hU6 EGFP sgRNA PGK WPRE 2A …...Unmodified A 37 5 A 375-Cas 9 HT 29-Cas 9 293 T-MOLM 13-B V 2-Cas9 0 20 40 60 80 100 Perce n t E GF P Ne g at i v e Cas9 Activity

Unmod

ified A

375

A375-C

as9

HT29-C

as9

293T

-Cas

9

MOLM13

-Cas

9

BV2-Cas

90

20

40

60

80

100

Perc

ent E

GFP

Neg

ative

Cas9 Activity Assay

hU6 EGFP sgRNA PGK WPRE

�·/75 �·/75

psi+gag RRE cPPT

Puromycin 2A EGFP

pXPR_011

GFP - A

F-SC

A

A375 A375-Cas9

0.52% 82.1%

A

B

C

Supplementary Figure 1. Cas9 activity assays for cell lines used in this study. (a) Schematic of pXPR_011 (Addgene #59702). Cells that do not express Cas9 will express EGFP and puromycin resistance, while cells with sufficient levels of active Cas9 will utilize the sgRNA targeting EGFP. Placement of EGFP downstream of a 2A site allows for continued puromy-cin resistance after indel formation. (b) Quantitation of the fraction of cells with Cas9 activity. Samples were analyzed ten to fourteen days post-infection.(c) Example flow cytometey plots.

Nature Biotechnology: doi:10.1038/nbt.3437

Page 2: psi+gag RRE cPPT hU6 EGFP sgRNA PGK WPRE 2A …...Unmodified A 37 5 A 375-Cas 9 HT 29-Cas 9 293 T-MOLM 13-B V 2-Cas9 0 20 40 60 80 100 Perce n t E GF P Ne g at i v e Cas9 Activity

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

Avana Pool 1

Avana Pool 2

Avana Pool 3

Avana Pool 4

Avana Pool 5

Avana Pool 6

GeCKOv2

Cu

mu

lative

re

ad

fra

ctio

n

Supplementary Figure 2 Distribution of sgRNA abundance in the plasmid

DNA pool of each Avana subpool in lentiGuide (solid lines) and GeCKOv2

library in lentiGuide (dashed line).

Fraction of ranked sgRNAs

Nature Biotechnology: doi:10.1038/nbt.3437

Page 3: psi+gag RRE cPPT hU6 EGFP sgRNA PGK WPRE 2A …...Unmodified A 37 5 A 375-Cas 9 HT 29-Cas 9 293 T-MOLM 13-B V 2-Cas9 0 20 40 60 80 100 Perce n t E GF P Ne g at i v e Cas9 Activity

Expand Cas9 expressing cell line

ůŝďƌĂƌLJ�Ăƚ�ĚĞƐŝƌĞĚ�ƌĞƉƌĞƐĞŶƚĂƟŽŶ

^Ɖůŝƚ�ƚŽ�ƐŵĂůů�ŵŽůĞĐƵůĞ

2 weeks passaging

�džƚƌĂĐƚ�Ő�E��ĨƌŽŵ�ƐĐƌĞĞŶ�ĞŶĚ�ĐĞůůƉĞůůĞƚƐ�ĂŶĚ�ƉĞƌĨŽƌŵ�ďĂƌĐŽĚŝŶŐ�W�Z

^Ƶďŵŝƚ�ƐĂŵƉůĞƐ�ĂŶĚ�ĚĞĐŽŶǀŽůƵƚĞ�ŽƵƚƉƵƚ

/ůůƵŵŝŶĂ�ƐĞƋƵĞŶĐŝŶŐ

Supplementary Figure 3 Pooled Screen workflow. Cells are infected with a Cas9-expression plasmid and are subjected to one week or more of blasticidin selection. Cells are infected with library and selected with puromycin for one week before being split into small molecule condi-tions or continued passaging for positive and negative selection screens, respectively. At then end point, cell pellets are collected and gDNA extracted. Samples are then barcoded and sequenced by Illumina. For analysis, the log2-fold-change of each sgRNA is determined rela-tive to the starting plasmid DNA (pDNA) pool.

ϭн�ǁĞĞŬ�ďůĂƐƟĐŝĚŝŶ�ƐĞůĞĐƟŽŶ

Nature Biotechnology: doi:10.1038/nbt.3437

Page 4: psi+gag RRE cPPT hU6 EGFP sgRNA PGK WPRE 2A …...Unmodified A 37 5 A 375-Cas 9 HT 29-Cas 9 293 T-MOLM 13-B V 2-Cas9 0 20 40 60 80 100 Perce n t E GF P Ne g at i v e Cas9 Activity

0 50 100

10-4

10-3

10-2

Gene Ranks

RIG

ER p

-val

ueAvana lentiCRISPRv2

Avana lentiGuide

GeCKOv1 lentiCRISPRv1

GeCKOv1 lentiGuide

GeCKOv2 lentiGuide

Supplementary Figure 4 Comparison of the p-values for the top 100 genes

determined by weighted sum analysis in RIGER. y-axis is plotted in log10-

scale.

Nature Biotechnology: doi:10.1038/nbt.3437

Page 5: psi+gag RRE cPPT hU6 EGFP sgRNA PGK WPRE 2A …...Unmodified A 37 5 A 375-Cas 9 HT 29-Cas 9 293 T-MOLM 13-B V 2-Cas9 0 20 40 60 80 100 Perce n t E GF P Ne g at i v e Cas9 Activity

40

60

80

100Day 010 days, no drug10 days, vemurafnib

1 2 3 4 5 6 7 8 9 10 A AB B C A B C D A B C D ENegative Controls CUL3 NF1 NF2 TP53

Perc

enta

ge E

GFP

-neg

ativ

e ce

lls

Supplementary Figure 5 Validation of EGFP competition assay to confirm individual sgRNA activity. A375-Cas9 cells were infected with individual sgRNAs, selected with puromycin for one week, and then mixed with A375-Cas9 cells also expressing EGFP and puromycin resis-tance. The ratio in each population was determined by flow cytometry. Populations were then maintained with vemurafenib or with no drug for 10 days, and the relative fraction of sgRNA-containing, EGFP-negative cells was determined by flow cytometry. For the 10 negative controls, all cells died after 10 days in the presence of vemurafenib and thus could not be assessed by flow cytometry.

Nature Biotechnology: doi:10.1038/nbt.3437

Page 6: psi+gag RRE cPPT hU6 EGFP sgRNA PGK WPRE 2A …...Unmodified A 37 5 A 375-Cas 9 HT 29-Cas 9 293 T-MOLM 13-B V 2-Cas9 0 20 40 60 80 100 Perce n t E GF P Ne g at i v e Cas9 Activity

2 3 4 5 6Number of subpools

Perc

enta

ge o

f gen

es re

tain

ed

Supplementary Figure 6 Subsampling analysis of Avana library in selumetinib resistance screening data. STARS was run on all six subpools to determine the number of genes that passed at FDR thresholds of 10% and 25% (first number in legend). Subpools were then iteratively removed and STARS was re-run to determine the average number of genes retained by using fewer subpools at FDR thresholds of 10%, 25% and 75% (second number in legend). Error bars represent one standard deviation when subsampling from different combina-tions of subpools.

LG 10,10

LG 10,75

LG 25,25LC 10,10

LC 10,75

LC 25,25

0

50

100

Nature Biotechnology: doi:10.1038/nbt.3437

Page 7: psi+gag RRE cPPT hU6 EGFP sgRNA PGK WPRE 2A …...Unmodified A 37 5 A 375-Cas 9 HT 29-Cas 9 293 T-MOLM 13-B V 2-Cas9 0 20 40 60 80 100 Perce n t E GF P Ne g at i v e Cas9 Activity

ï8 ï6 ï4 ï2 0 2 4 6 80.0

0.2

0.4

0.6

0.8

1.0

Negative controls

Targeting

Log2-fold changeLog2-fold change

Negative controls

Avana lentiGuide

Log2-fold change

Cum

ulat

ive

Dis

trib

utio

n Avana, 6 subpools

A375 cells

Supplementary Figure 7. Change in abundance of Avana library in negative selection

screens. (a) Cumulative distribution of 996 negative controls and 108,467 gene target-

ing sgRNAs in A375 cells for the Avana library in lentiGuide. Data are from two biologi-

cal replicates. (b) Cumulative distribution of 1000 negative controls and 73,687 gene

targeting sgRNAs in HT29 cells for the Avana library screened with 4 subpools in

lentiGuide. Data are from three biological replicates.

ï5 ï4 ï3 ï2 ï1 0 1 2 3

0.0

0.2

0.4

0.6

0.8

1.0

Cum

ulat

ive

Dis

trib

utio

n

Log2-fold change

Negative controls

Targeting

BAvana, 4 subpools

HT29 cells

A

Nature Biotechnology: doi:10.1038/nbt.3437

Page 8: psi+gag RRE cPPT hU6 EGFP sgRNA PGK WPRE 2A …...Unmodified A 37 5 A 375-Cas 9 HT 29-Cas 9 293 T-MOLM 13-B V 2-Cas9 0 20 40 60 80 100 Perce n t E GF P Ne g at i v e Cas9 Activity

1 10

19

28

37

46

55

64

73

82

91

100

0.0001

0.001

0.01

0.1

1

Top 100 Core Essential Genes

Fa

lse

Dis

cove

ry R

ate

GeCKOv2

lentiGuide

Avana

lentiGuide

Supplementary Figure 8. Depletion of core essential genes with the

Avana and GeCKOv2 libraries as assessed by STARS analysis. Data

from two biological replicates were merged and STARS analysis

performed for all genes.

Nature Biotechnology: doi:10.1038/nbt.3437

Page 9: psi+gag RRE cPPT hU6 EGFP sgRNA PGK WPRE 2A …...Unmodified A 37 5 A 375-Cas 9 HT 29-Cas 9 293 T-MOLM 13-B V 2-Cas9 0 20 40 60 80 100 Perce n t E GF P Ne g at i v e Cas9 Activity

RIBO

SOM

E

SPLI

CEO

SOM

E

AMIN

OAC

YL T

RNA

BIO

SYNT

HESI

S

DNA

REPL

ICAT

ION

RNA

POLY

MER

ASE

PRO

TEAS

OM

E

PRO

TEIN

EXP

ORT

OXI

DATI

VE P

HOSP

HORY

LATI

ON

NUCL

EOTI

DE E

XCIS

ION

REPA

IR

PYRI

MID

INE

MET

ABO

LISM

HOM

OLO

GO

US R

ECO

MBI

NATI

ON

MIS

MAT

CH R

EPAI

R

RNA

DEG

RADA

TIO

N

CELL

CYC

LE

1.5

2.0

2.5

3.0

KEGG Gene Set

GSE

A No

rmal

ized

Enric

hmen

t Sco

reAvanaGeCKOv2

Supplementary Figure 9. Gene Set Enrichment Analysis (GSEA) of negative selection screens performed in A375 cells with the indicated libraries in lentiGuide. All KEGG gene sets with normalized enrichment scores of 2.0 or greater are shown for Avana, with the corresponding score shown for GeCKOv2. The input ranked gene list was determined by RIGER analysis using the weighted sum option.

Nature Biotechnology: doi:10.1038/nbt.3437

Page 10: psi+gag RRE cPPT hU6 EGFP sgRNA PGK WPRE 2A …...Unmodified A 37 5 A 375-Cas 9 HT 29-Cas 9 293 T-MOLM 13-B V 2-Cas9 0 20 40 60 80 100 Perce n t E GF P Ne g at i v e Cas9 Activity

0 20 40 60 80 1000.0

0.2

0.4

0.6

0.8

1.0

Avana, 4 subpools (0.84)GeCKOv2 (0.71)

Percent-Rank of Core Essential sgRNAs

Cum

ulat

ive p

erce

ntag

e

Supplementary Figure 10 ROC-AUC analysis of core essential genes in HT29 cells for Avana screened with 4 subpools and the GeCKOv2 library in lentiGuide. AUCs for each library are specified in brackets. Data are from three biological replicates.

Nature Biotechnology: doi:10.1038/nbt.3437

Page 11: psi+gag RRE cPPT hU6 EGFP sgRNA PGK WPRE 2A …...Unmodified A 37 5 A 375-Cas 9 HT 29-Cas 9 293 T-MOLM 13-B V 2-Cas9 0 20 40 60 80 100 Perce n t E GF P Ne g at i v e Cas9 Activity

0

5

10

15 P

opula

tion D

oublings

A375

6-thioguanineno drug

0

5

10

HT29

Popula

tion D

oublings

293T

0

5

10

15

20

Popula

tion D

oublings

Control

sgRNA HPRT NUDT5

sgRNA 4 sgRNA 4sgRNA 6 sgRNA 6

Supplementary Figure 11 Validation of 6-thioguanine resistance hits. Individual sgRNAs

cloned into lentiGuide were infected into 3 cell lines, selected with puromycin, and after 1

week of selection were split into 6-thioguanine or continued to be passaged in the absence

of drug (day 0). An sgRNA targeting EGFP was used as a control. The populations were

counted at each passage, and the cumulative number of population doublings was deter-

mined relative to day 0. For A375 cells, the columns represent cell counts taken on days 2,

5, 7, 9, 11, 13, and 15; for HT29 cells, days 5, 7, 10, 15; for 293T cells, days 2, 5, 7, 9, 11,

13, and 15.

Nature Biotechnology: doi:10.1038/nbt.3437

Page 12: psi+gag RRE cPPT hU6 EGFP sgRNA PGK WPRE 2A …...Unmodified A 37 5 A 375-Cas 9 HT 29-Cas 9 293 T-MOLM 13-B V 2-Cas9 0 20 40 60 80 100 Perce n t E GF P Ne g at i v e Cas9 Activity

CCDC101

CUL3 NF1

Vem

urafe

nib

Log2

-fold

Cha

nge

MED12 TADA2B

Selu

meti

nib

TADA1

HPRT1

6-T

hio

gu

an

ine

Percent Protein Length

Supplementary Figure 12 Activity of sgRNAs across the length of the targeted protein for the RES dataset.

Log2

-fold

Cha

nge

Log2

-fold

Cha

nge

Log2

-fold

Cha

nge

NF2

Nature Biotechnology: doi:10.1038/nbt.3437

Page 13: psi+gag RRE cPPT hU6 EGFP sgRNA PGK WPRE 2A …...Unmodified A 37 5 A 375-Cas 9 HT 29-Cas 9 293 T-MOLM 13-B V 2-Cas9 0 20 40 60 80 100 Perce n t E GF P Ne g at i v e Cas9 Activity

0.50

0.55

0.60

0.65

0.70

0.75

0.80

0.85

AUC

SVM + LogReg (Doench, RS1)L1 class., 1 & 2 ntsL1 class., 1 nt only (Xu)SVM, 1 & 2 ntsSVM, 1 nts only (Wang)

Train:Test:

FCFC

RESRES

FC+RESFC+RES

Supplementary Figure 13 Performance of various classification models assessed by area under the curve (AUC). Error bars represent one standard deviation for performance across genes using a leave-one-gene-out method for training and testing.

Nature Biotechnology: doi:10.1038/nbt.3437

Page 14: psi+gag RRE cPPT hU6 EGFP sgRNA PGK WPRE 2A …...Unmodified A 37 5 A 375-Cas 9 HT 29-Cas 9 293 T-MOLM 13-B V 2-Cas9 0 20 40 60 80 100 Perce n t E GF P Ne g at i v e Cas9 Activity

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Spea

rman

cor

rela

tion

train: FCtest: FC

train: RES test: RES

train: FC+RES test: FC+RES

0.50

0.55

0.60

0.65

0.70

0.75

0.80

0.85

AU

C

L1 RegressionL1 Classifier

train: FCtest: FC

train: RES test: RES

train: FC+RES test: FC+RES

Supplementary Figure 14 Comparison of two comparable predictive models, one of which uses only a 0/1 binarization of the sgRNA scores (L1 Classifier), and the other that uses the sgRNA scores directly (L1 Regression). (a) uses area under the curve (AUC) as the performance metric, while (b) uses Spearman correlation as the metric. For all three data sets, and across both metrics, the regression approach outperforms the classification approach. Each bar shows the mean value across all genes, with the error bars denoting one standard deviation across the genes. Statistical significance of the improvement in Spearman correlation when using regression over classification is, for FC, RES and FC+RES data, respectively, p < 1×10-16, p = 5.4×10-13, p < 1×10-16

A

B

Nature Biotechnology: doi:10.1038/nbt.3437

Page 15: psi+gag RRE cPPT hU6 EGFP sgRNA PGK WPRE 2A …...Unmodified A 37 5 A 375-Cas 9 HT 29-Cas 9 293 T-MOLM 13-B V 2-Cas9 0 20 40 60 80 100 Perce n t E GF P Ne g at i v e Cas9 Activity

posi

tion

dep.

ord

er 2

posi

tion

dep.

ord

er 1

posi

tion

ind.

ord

er 2

Tm, s

gRN

A nt

s 16

- 20

amin

o ac

id c

ut p

ositi

on

Tm, s

gRN

A nt

s 8

- 15

posi

tion

ind.

ord

er 1

Tm, D

NA

30m

er

NG

GN

inte

ract

ion

GC

cou

nt

Tm s

gRN

A nt

s 3

- 7

0.0

0.1

0.2

0.3

0.4

0.5

Aver

age

Gin

i im

porta

nces

Supplementary Figure 15. Top features weights comprising Rule Set 2. The Gini impor-tances sum to 1.0. The two most important features, postition dependent order 1 and 2, refer to the identity of a specific single and dinucleotides at a particular position, e.g. is there an A at position 3, is there an AG at positions 9 and 10. Position independent order 1 and 2 refer to the total number of particular nucleotides, e.g. how many As are there in the sequence, how many CGs. The 3 melting temperature features (Tm) for the sgRNA were discretized by the UHJLRQ�RI�WKH�VJ51$��ZLWK�SRVLWLRQ���DV�WKH��·�HQG�RI�WKH�VJ51$�DQG�SRVLWLRQ����DV�WKH�3$0�proximal nucleotide. The NGGN interaction consisted of the two nucleotides that occur in the 3$0�RQ�HDFK�VLGH�RI�WKH�LQYDULDQW�**�

Nature Biotechnology: doi:10.1038/nbt.3437

Page 16: psi+gag RRE cPPT hU6 EGFP sgRNA PGK WPRE 2A …...Unmodified A 37 5 A 375-Cas 9 HT 29-Cas 9 293 T-MOLM 13-B V 2-Cas9 0 20 40 60 80 100 Perce n t E GF P Ne g at i v e Cas9 Activity

Supplementary Figure 16 Performance of the final model for Rule Set 2, gradient boosted regression trees, when trained and tested on all datasets.

Spea

rman

cor

rela

tion

FC RES FC+RES0.40

0.45

0.50

0.55

0.60

0.65

FCRESFC+RES

Test Data

Trained Data

Nature Biotechnology: doi:10.1038/nbt.3437

Page 17: psi+gag RRE cPPT hU6 EGFP sgRNA PGK WPRE 2A …...Unmodified A 37 5 A 375-Cas 9 HT 29-Cas 9 293 T-MOLM 13-B V 2-Cas9 0 20 40 60 80 100 Perce n t E GF P Ne g at i v e Cas9 Activity

Rule

Set 1 S

core

0.2

0.4

0.6

0.8

Eff Ineff Eff IneffEff Ineff

MouseHumanHuman

n:

Activity:

Species:

731 438 671 237 830 234

All essentialNon-RiboRibosomalDataset:

****

0.0

Supplementary Figure 17 Performance of Rule Set 1 on negative selection

data, using negative selection data as curated by Xu et al. From Wang et al.,

sgRNAs were classified as either effective or ineffective for dropout screens

performed in HL-60 and KBM-7 human cell lines, and split into two classes,

one for ribosomal genes and the other for essential, non-ribosomal genes. The

data curated from Koike-Yusa et al. represent a dropout screen performed in

mouse ES cells, and all essential genes were kept as one class in the curation

performed by Xu et al. Rule Set 1 effectively distinguishes effective from

ineffective sgRNAs in all cases, with p-values of 1.4x10-32, 1.8x10-16, and

1.1x10-11 from left to right (two-sample Kolmogorov-Smirnov test).

********

Nature Biotechnology: doi:10.1038/nbt.3437

Page 18: psi+gag RRE cPPT hU6 EGFP sgRNA PGK WPRE 2A …...Unmodified A 37 5 A 375-Cas 9 HT 29-Cas 9 293 T-MOLM 13-B V 2-Cas9 0 20 40 60 80 100 Perce n t E GF P Ne g at i v e Cas9 Activity

Supplementary Figure 18 CD33 flow cytometry outline. Gates were set on the BD

FACSDiVa software. The sample was first gated to exclude debris; two additional gates

were applied to select singlets and a fourth to select viable cells via DAPI. Finally, the

population was gated, using a CD-33 DAPI positive control and a DAPI only negative

control, in order to sort for PE negative cells (green bracket).

FSC-A

SS

C-H

SS

C-A

DA

PI-

A

SSC-A

FS

C-H

SSC-W FSC-W

0 100K 200K 0 100K 200K

0 100K 200K 0 104

Nature Biotechnology: doi:10.1038/nbt.3437

Page 19: psi+gag RRE cPPT hU6 EGFP sgRNA PGK WPRE 2A …...Unmodified A 37 5 A 375-Cas 9 HT 29-Cas 9 293 T-MOLM 13-B V 2-Cas9 0 20 40 60 80 100 Perce n t E GF P Ne g at i v e Cas9 Activity

Supplementary Figure 19 Protein-activity map of CD33. Log2-fold change of sgRNAs utilizing the canonical NGG PAM across the length of the target protein are shown. The sgRNAs in the region of highest activity (red) were included for further analysis, and sgRNAs targeted to regions of low activity or exhibiting low activity (blue) were excluded from subsequent analysis.

20

4

3

2

1

0

-1

-2

-3

-4

IncludedExcluded

Log 2-f

old-

chan

ge

Percent Protein40 60 80 100

Nature Biotechnology: doi:10.1038/nbt.3437

Page 20: psi+gag RRE cPPT hU6 EGFP sgRNA PGK WPRE 2A …...Unmodified A 37 5 A 375-Cas 9 HT 29-Cas 9 293 T-MOLM 13-B V 2-Cas9 0 20 40 60 80 100 Perce n t E GF P Ne g at i v e Cas9 Activity

15

10

5

0

40

30

20

10

0

20

15

10

5

0

Mismatch Insertion

Deletion

Supplementary Figure 20 Distributions of the number of unique position/nucleotide instances in the off-target data set, for the three different classes of imperfect interac-tions.

Number of instances7

Den

sity

9 10 11 12 1314 15 16 1718 19 20 21 22 23 25 26 29Number of instances

Den

sity

60 62 63 64 65

Den

sity

Number of instances7 10 11 12 13 14 15 16 17 18 19 20 21 22 23 25 26 28

Nature Biotechnology: doi:10.1038/nbt.3437

Page 21: psi+gag RRE cPPT hU6 EGFP sgRNA PGK WPRE 2A …...Unmodified A 37 5 A 375-Cas 9 HT 29-Cas 9 293 T-MOLM 13-B V 2-Cas9 0 20 40 60 80 100 Perce n t E GF P Ne g at i v e Cas9 Activity

1.0

0.8

0.6

0.4

0.2

0.0

Supplementary Figure 21 Comparison of the average delta-log2-fold-change and percent-active calculations for variant sgRNAs for all mismatch interac-tions; Pearson correlation coefficient = -0.97.

Perc

ent A

ctiv

e

Average delta log2-fold-change0 1 2 3 4

Nature Biotechnology: doi:10.1038/nbt.3437

Page 22: psi+gag RRE cPPT hU6 EGFP sgRNA PGK WPRE 2A …...Unmodified A 37 5 A 375-Cas 9 HT 29-Cas 9 293 T-MOLM 13-B V 2-Cas9 0 20 40 60 80 100 Perce n t E GF P Ne g at i v e Cas9 Activity

Supplementary Figure 22. Optimized bowtie2 misses potential off-target sites. For three

sgRNAs that had previously been assessed by Guide-Seq all possible off-target sites

were identified by two search methods, optimized bowtie2 (see Methods) and Cas-

OFFinder. The potential sites were then assessed by the CFD score, and the total number

of off-target sites in the human genome receiving a CFD score > 0.2 are plotted.

GAGTCCGAGCAGAAGAAGA

GGAATCCCTTCTGCAGCACC

GACCCCCTCCACCCCGCCTC0

400

800

1200

sgRNA sequence

Num

ber o

f off-

targ

et s

ites

with

CFD

sco

re >

0.2

Cas-OFFinder

Optimized bowtie2

Nature Biotechnology: doi:10.1038/nbt.3437