Post on 19-Jan-2020
Unmod
ified A
375
A375-C
as9
HT29-C
as9
293T
-Cas
9
MOLM13
-Cas
9
BV2-Cas
90
20
40
60
80
100
Perc
ent E
GFP
Neg
ative
Cas9 Activity Assay
hU6 EGFP sgRNA PGK WPRE
�·/75 �·/75
psi+gag RRE cPPT
Puromycin 2A EGFP
pXPR_011
GFP - A
F-SC
A
A375 A375-Cas9
0.52% 82.1%
A
B
C
Supplementary Figure 1. Cas9 activity assays for cell lines used in this study. (a) Schematic of pXPR_011 (Addgene #59702). Cells that do not express Cas9 will express EGFP and puromycin resistance, while cells with sufficient levels of active Cas9 will utilize the sgRNA targeting EGFP. Placement of EGFP downstream of a 2A site allows for continued puromy-cin resistance after indel formation. (b) Quantitation of the fraction of cells with Cas9 activity. Samples were analyzed ten to fourteen days post-infection.(c) Example flow cytometey plots.
Nature Biotechnology: doi:10.1038/nbt.3437
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
Avana Pool 1
Avana Pool 2
Avana Pool 3
Avana Pool 4
Avana Pool 5
Avana Pool 6
GeCKOv2
Cu
mu
lative
re
ad
fra
ctio
n
Supplementary Figure 2 Distribution of sgRNA abundance in the plasmid
DNA pool of each Avana subpool in lentiGuide (solid lines) and GeCKOv2
library in lentiGuide (dashed line).
Fraction of ranked sgRNAs
Nature Biotechnology: doi:10.1038/nbt.3437
Expand Cas9 expressing cell line
ůŝďƌĂƌLJ�Ăƚ�ĚĞƐŝƌĞĚ�ƌĞƉƌĞƐĞŶƚĂƟŽŶ
^Ɖůŝƚ�ƚŽ�ƐŵĂůů�ŵŽůĞĐƵůĞ
2 weeks passaging
�džƚƌĂĐƚ�Ő�E��ĨƌŽŵ�ƐĐƌĞĞŶ�ĞŶĚ�ĐĞůůƉĞůůĞƚƐ�ĂŶĚ�ƉĞƌĨŽƌŵ�ďĂƌĐŽĚŝŶŐ�W�Z
^Ƶďŵŝƚ�ƐĂŵƉůĞƐ�ĂŶĚ�ĚĞĐŽŶǀŽůƵƚĞ�ŽƵƚƉƵƚ
/ůůƵŵŝŶĂ�ƐĞƋƵĞŶĐŝŶŐ
Supplementary Figure 3 Pooled Screen workflow. Cells are infected with a Cas9-expression plasmid and are subjected to one week or more of blasticidin selection. Cells are infected with library and selected with puromycin for one week before being split into small molecule condi-tions or continued passaging for positive and negative selection screens, respectively. At then end point, cell pellets are collected and gDNA extracted. Samples are then barcoded and sequenced by Illumina. For analysis, the log2-fold-change of each sgRNA is determined rela-tive to the starting plasmid DNA (pDNA) pool.
ϭн�ǁĞĞŬ�ďůĂƐƟĐŝĚŝŶ�ƐĞůĞĐƟŽŶ
Nature Biotechnology: doi:10.1038/nbt.3437
0 50 100
10-4
10-3
10-2
Gene Ranks
RIG
ER p
-val
ueAvana lentiCRISPRv2
Avana lentiGuide
GeCKOv1 lentiCRISPRv1
GeCKOv1 lentiGuide
GeCKOv2 lentiGuide
Supplementary Figure 4 Comparison of the p-values for the top 100 genes
determined by weighted sum analysis in RIGER. y-axis is plotted in log10-
scale.
Nature Biotechnology: doi:10.1038/nbt.3437
40
60
80
100Day 010 days, no drug10 days, vemurafnib
1 2 3 4 5 6 7 8 9 10 A AB B C A B C D A B C D ENegative Controls CUL3 NF1 NF2 TP53
Perc
enta
ge E
GFP
-neg
ativ
e ce
lls
Supplementary Figure 5 Validation of EGFP competition assay to confirm individual sgRNA activity. A375-Cas9 cells were infected with individual sgRNAs, selected with puromycin for one week, and then mixed with A375-Cas9 cells also expressing EGFP and puromycin resis-tance. The ratio in each population was determined by flow cytometry. Populations were then maintained with vemurafenib or with no drug for 10 days, and the relative fraction of sgRNA-containing, EGFP-negative cells was determined by flow cytometry. For the 10 negative controls, all cells died after 10 days in the presence of vemurafenib and thus could not be assessed by flow cytometry.
Nature Biotechnology: doi:10.1038/nbt.3437
2 3 4 5 6Number of subpools
Perc
enta
ge o
f gen
es re
tain
ed
Supplementary Figure 6 Subsampling analysis of Avana library in selumetinib resistance screening data. STARS was run on all six subpools to determine the number of genes that passed at FDR thresholds of 10% and 25% (first number in legend). Subpools were then iteratively removed and STARS was re-run to determine the average number of genes retained by using fewer subpools at FDR thresholds of 10%, 25% and 75% (second number in legend). Error bars represent one standard deviation when subsampling from different combina-tions of subpools.
LG 10,10
LG 10,75
LG 25,25LC 10,10
LC 10,75
LC 25,25
0
50
100
Nature Biotechnology: doi:10.1038/nbt.3437
ï8 ï6 ï4 ï2 0 2 4 6 80.0
0.2
0.4
0.6
0.8
1.0
Negative controls
Targeting
Log2-fold changeLog2-fold change
Negative controls
Avana lentiGuide
Log2-fold change
Cum
ulat
ive
Dis
trib
utio
n Avana, 6 subpools
A375 cells
Supplementary Figure 7. Change in abundance of Avana library in negative selection
screens. (a) Cumulative distribution of 996 negative controls and 108,467 gene target-
ing sgRNAs in A375 cells for the Avana library in lentiGuide. Data are from two biologi-
cal replicates. (b) Cumulative distribution of 1000 negative controls and 73,687 gene
targeting sgRNAs in HT29 cells for the Avana library screened with 4 subpools in
lentiGuide. Data are from three biological replicates.
ï5 ï4 ï3 ï2 ï1 0 1 2 3
0.0
0.2
0.4
0.6
0.8
1.0
Cum
ulat
ive
Dis
trib
utio
n
Log2-fold change
Negative controls
Targeting
BAvana, 4 subpools
HT29 cells
A
Nature Biotechnology: doi:10.1038/nbt.3437
1 10
19
28
37
46
55
64
73
82
91
100
0.0001
0.001
0.01
0.1
1
Top 100 Core Essential Genes
Fa
lse
Dis
cove
ry R
ate
GeCKOv2
lentiGuide
Avana
lentiGuide
Supplementary Figure 8. Depletion of core essential genes with the
Avana and GeCKOv2 libraries as assessed by STARS analysis. Data
from two biological replicates were merged and STARS analysis
performed for all genes.
Nature Biotechnology: doi:10.1038/nbt.3437
RIBO
SOM
E
SPLI
CEO
SOM
E
AMIN
OAC
YL T
RNA
BIO
SYNT
HESI
S
DNA
REPL
ICAT
ION
RNA
POLY
MER
ASE
PRO
TEAS
OM
E
PRO
TEIN
EXP
ORT
OXI
DATI
VE P
HOSP
HORY
LATI
ON
NUCL
EOTI
DE E
XCIS
ION
REPA
IR
PYRI
MID
INE
MET
ABO
LISM
HOM
OLO
GO
US R
ECO
MBI
NATI
ON
MIS
MAT
CH R
EPAI
R
RNA
DEG
RADA
TIO
N
CELL
CYC
LE
1.5
2.0
2.5
3.0
KEGG Gene Set
GSE
A No
rmal
ized
Enric
hmen
t Sco
reAvanaGeCKOv2
Supplementary Figure 9. Gene Set Enrichment Analysis (GSEA) of negative selection screens performed in A375 cells with the indicated libraries in lentiGuide. All KEGG gene sets with normalized enrichment scores of 2.0 or greater are shown for Avana, with the corresponding score shown for GeCKOv2. The input ranked gene list was determined by RIGER analysis using the weighted sum option.
Nature Biotechnology: doi:10.1038/nbt.3437
0 20 40 60 80 1000.0
0.2
0.4
0.6
0.8
1.0
Avana, 4 subpools (0.84)GeCKOv2 (0.71)
Percent-Rank of Core Essential sgRNAs
Cum
ulat
ive p
erce
ntag
e
Supplementary Figure 10 ROC-AUC analysis of core essential genes in HT29 cells for Avana screened with 4 subpools and the GeCKOv2 library in lentiGuide. AUCs for each library are specified in brackets. Data are from three biological replicates.
Nature Biotechnology: doi:10.1038/nbt.3437
0
5
10
15 P
opula
tion D
oublings
A375
6-thioguanineno drug
0
5
10
HT29
Popula
tion D
oublings
293T
0
5
10
15
20
Popula
tion D
oublings
Control
sgRNA HPRT NUDT5
sgRNA 4 sgRNA 4sgRNA 6 sgRNA 6
Supplementary Figure 11 Validation of 6-thioguanine resistance hits. Individual sgRNAs
cloned into lentiGuide were infected into 3 cell lines, selected with puromycin, and after 1
week of selection were split into 6-thioguanine or continued to be passaged in the absence
of drug (day 0). An sgRNA targeting EGFP was used as a control. The populations were
counted at each passage, and the cumulative number of population doublings was deter-
mined relative to day 0. For A375 cells, the columns represent cell counts taken on days 2,
5, 7, 9, 11, 13, and 15; for HT29 cells, days 5, 7, 10, 15; for 293T cells, days 2, 5, 7, 9, 11,
13, and 15.
Nature Biotechnology: doi:10.1038/nbt.3437
CCDC101
CUL3 NF1
Vem
urafe
nib
Log2
-fold
Cha
nge
MED12 TADA2B
Selu
meti
nib
TADA1
HPRT1
6-T
hio
gu
an
ine
Percent Protein Length
Supplementary Figure 12 Activity of sgRNAs across the length of the targeted protein for the RES dataset.
Log2
-fold
Cha
nge
Log2
-fold
Cha
nge
Log2
-fold
Cha
nge
NF2
Nature Biotechnology: doi:10.1038/nbt.3437
0.50
0.55
0.60
0.65
0.70
0.75
0.80
0.85
AUC
SVM + LogReg (Doench, RS1)L1 class., 1 & 2 ntsL1 class., 1 nt only (Xu)SVM, 1 & 2 ntsSVM, 1 nts only (Wang)
Train:Test:
FCFC
RESRES
FC+RESFC+RES
Supplementary Figure 13 Performance of various classification models assessed by area under the curve (AUC). Error bars represent one standard deviation for performance across genes using a leave-one-gene-out method for training and testing.
Nature Biotechnology: doi:10.1038/nbt.3437
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
Spea
rman
cor
rela
tion
train: FCtest: FC
train: RES test: RES
train: FC+RES test: FC+RES
0.50
0.55
0.60
0.65
0.70
0.75
0.80
0.85
AU
C
L1 RegressionL1 Classifier
train: FCtest: FC
train: RES test: RES
train: FC+RES test: FC+RES
Supplementary Figure 14 Comparison of two comparable predictive models, one of which uses only a 0/1 binarization of the sgRNA scores (L1 Classifier), and the other that uses the sgRNA scores directly (L1 Regression). (a) uses area under the curve (AUC) as the performance metric, while (b) uses Spearman correlation as the metric. For all three data sets, and across both metrics, the regression approach outperforms the classification approach. Each bar shows the mean value across all genes, with the error bars denoting one standard deviation across the genes. Statistical significance of the improvement in Spearman correlation when using regression over classification is, for FC, RES and FC+RES data, respectively, p < 1×10-16, p = 5.4×10-13, p < 1×10-16
A
B
Nature Biotechnology: doi:10.1038/nbt.3437
posi
tion
dep.
ord
er 2
posi
tion
dep.
ord
er 1
posi
tion
ind.
ord
er 2
Tm, s
gRN
A nt
s 16
- 20
amin
o ac
id c
ut p
ositi
on
Tm, s
gRN
A nt
s 8
- 15
posi
tion
ind.
ord
er 1
Tm, D
NA
30m
er
NG
GN
inte
ract
ion
GC
cou
nt
Tm s
gRN
A nt
s 3
- 7
0.0
0.1
0.2
0.3
0.4
0.5
Aver
age
Gin
i im
porta
nces
Supplementary Figure 15. Top features weights comprising Rule Set 2. The Gini impor-tances sum to 1.0. The two most important features, postition dependent order 1 and 2, refer to the identity of a specific single and dinucleotides at a particular position, e.g. is there an A at position 3, is there an AG at positions 9 and 10. Position independent order 1 and 2 refer to the total number of particular nucleotides, e.g. how many As are there in the sequence, how many CGs. The 3 melting temperature features (Tm) for the sgRNA were discretized by the UHJLRQ�RI�WKH�VJ51$��ZLWK�SRVLWLRQ���DV�WKH��·�HQG�RI�WKH�VJ51$�DQG�SRVLWLRQ����DV�WKH�3$0�proximal nucleotide. The NGGN interaction consisted of the two nucleotides that occur in the 3$0�RQ�HDFK�VLGH�RI�WKH�LQYDULDQW�**�
Nature Biotechnology: doi:10.1038/nbt.3437
Supplementary Figure 16 Performance of the final model for Rule Set 2, gradient boosted regression trees, when trained and tested on all datasets.
Spea
rman
cor
rela
tion
FC RES FC+RES0.40
0.45
0.50
0.55
0.60
0.65
FCRESFC+RES
Test Data
Trained Data
Nature Biotechnology: doi:10.1038/nbt.3437
Rule
Set 1 S
core
0.2
0.4
0.6
0.8
Eff Ineff Eff IneffEff Ineff
MouseHumanHuman
n:
Activity:
Species:
731 438 671 237 830 234
All essentialNon-RiboRibosomalDataset:
****
0.0
Supplementary Figure 17 Performance of Rule Set 1 on negative selection
data, using negative selection data as curated by Xu et al. From Wang et al.,
sgRNAs were classified as either effective or ineffective for dropout screens
performed in HL-60 and KBM-7 human cell lines, and split into two classes,
one for ribosomal genes and the other for essential, non-ribosomal genes. The
data curated from Koike-Yusa et al. represent a dropout screen performed in
mouse ES cells, and all essential genes were kept as one class in the curation
performed by Xu et al. Rule Set 1 effectively distinguishes effective from
ineffective sgRNAs in all cases, with p-values of 1.4x10-32, 1.8x10-16, and
1.1x10-11 from left to right (two-sample Kolmogorov-Smirnov test).
********
Nature Biotechnology: doi:10.1038/nbt.3437
Supplementary Figure 18 CD33 flow cytometry outline. Gates were set on the BD
FACSDiVa software. The sample was first gated to exclude debris; two additional gates
were applied to select singlets and a fourth to select viable cells via DAPI. Finally, the
population was gated, using a CD-33 DAPI positive control and a DAPI only negative
control, in order to sort for PE negative cells (green bracket).
FSC-A
SS
C-H
SS
C-A
DA
PI-
A
SSC-A
FS
C-H
SSC-W FSC-W
0 100K 200K 0 100K 200K
0 100K 200K 0 104
Nature Biotechnology: doi:10.1038/nbt.3437
Supplementary Figure 19 Protein-activity map of CD33. Log2-fold change of sgRNAs utilizing the canonical NGG PAM across the length of the target protein are shown. The sgRNAs in the region of highest activity (red) were included for further analysis, and sgRNAs targeted to regions of low activity or exhibiting low activity (blue) were excluded from subsequent analysis.
20
4
3
2
1
0
-1
-2
-3
-4
IncludedExcluded
Log 2-f
old-
chan
ge
Percent Protein40 60 80 100
Nature Biotechnology: doi:10.1038/nbt.3437
15
10
5
0
40
30
20
10
0
20
15
10
5
0
Mismatch Insertion
Deletion
Supplementary Figure 20 Distributions of the number of unique position/nucleotide instances in the off-target data set, for the three different classes of imperfect interac-tions.
Number of instances7
Den
sity
9 10 11 12 1314 15 16 1718 19 20 21 22 23 25 26 29Number of instances
Den
sity
60 62 63 64 65
Den
sity
Number of instances7 10 11 12 13 14 15 16 17 18 19 20 21 22 23 25 26 28
Nature Biotechnology: doi:10.1038/nbt.3437
1.0
0.8
0.6
0.4
0.2
0.0
Supplementary Figure 21 Comparison of the average delta-log2-fold-change and percent-active calculations for variant sgRNAs for all mismatch interac-tions; Pearson correlation coefficient = -0.97.
Perc
ent A
ctiv
e
Average delta log2-fold-change0 1 2 3 4
Nature Biotechnology: doi:10.1038/nbt.3437
Supplementary Figure 22. Optimized bowtie2 misses potential off-target sites. For three
sgRNAs that had previously been assessed by Guide-Seq all possible off-target sites
were identified by two search methods, optimized bowtie2 (see Methods) and Cas-
OFFinder. The potential sites were then assessed by the CFD score, and the total number
of off-target sites in the human genome receiving a CFD score > 0.2 are plotted.
GAGTCCGAGCAGAAGAAGA
GGAATCCCTTCTGCAGCACC
GACCCCCTCCACCCCGCCTC0
400
800
1200
sgRNA sequence
Num
ber o
f off-
targ
et s
ites
with
CFD
sco
re >
0.2
Cas-OFFinder
Optimized bowtie2
Nature Biotechnology: doi:10.1038/nbt.3437