Rozenblatt-Rosen et al., 2012 SUPPLEMENTARY INFORMATION · 17 Pathway enrichment analysis Gene...
Transcript of Rozenblatt-Rosen et al., 2012 SUPPLEMENTARY INFORMATION · 17 Pathway enrichment analysis Gene...
W W W. N A T U R E . C O M / N A T U R E | 1
SUPPLEMENTARY INFORMATIONdoi:10.1038/nature11288
Contents
Supplementary Figures and Legends
1 Summary of viral ORFs screened 2 Overlap of Y2H and TAP-MS data sets 3 Enrichment of GO terms for targeted host proteins 4 Network of HPV E7 protein complex associations 5 Chromatin accessibility of IRF1 binding sites 6 Regulatory loops 7 Heatmap of transcriptome perturbations 8 Growth rate and senescence of selected IMR-90 cell lines expressing viral ORFs 9 Viral proteins, transcription factors and clusters 10 MAML1 is targeted by E6 proteins from HPV5 and HPV8 11 The E6 protein from HPV5 and HPV8 regulates expression of Natch pathway responsive
genes HES1 and DLL4 12 DNA tumour viruses target cancer genes 13 Reproducibility of viral-host protein associations observed in replicate TAP-MS
experiments as a function of the number of unique peptides detected 14 Somatic mutations in genome-wide tumour sequencing projects 15 Network of VirHostSM to host targets and cancers 16 The IMR-90 cell culture pipeline 17 Silver stain analyses of viral-host protein complexes 18 Cluster propensity of microarrays before and after ComBat 19 Cluster coherence 20 Expression bias of TAP-MS associations 21 Overlap of viral-host protein pairs identified through TAP-MS with a literature-curated
positive reference set 22 Assessment of Y2H data set
Supplementary Methods
A. Viral ORFeome cloning B. Yeast two-hybrid (Y2H) assay C. Cell culture pipeline D. HPV E6 oncoproteins and Notch signalling E. Tandem Affinity Purifications (TAP) followed by Mass Spectrometry (MS) F. Subtracting likely non-specific protein associations (“Tandome”) G. TAP reproducibility H. Virus-human Positive Reference Set (PRS)
Rozenblatt-Rosen et al., 2012
3
SUPPLEMENTARY INFORMATION
2 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
I. Pathway enrichment analysis J. Microarray preprocessing and differential expression K. Microarray gene and sample clustering L. Predicting cell-specific transcription factor binding sites M. Transcription factor binding site (TFBS) enrichment analysis N. Randomization of gene clusters for enrichment analyses O. Evaluating predictive power of regulatory cascades P. Building a probabilistic map of IMR-90 specific RBPJ binding sites Q. Notch pathway enrichment analysis R. Identification of loci implicated in familial and somatic cancer S. Testing enrichment of gene sets for cancer genes T. Viral target overlap with candidate cancer genes identified by transposon screens U. Comparison of viral interactome and prioritization of cancer genes
Supplementary Tables (available on Nature website)
1 List of viral ORFs Summary of all viral ORFs tested, including all information for each viral ORF and the data generated for each ORF.
2 Y2H data set Summary of virus-host binary interactions identified by Y2H screens.
3 Host proteins targeted by viral proteins in Y2H
List of host proteins in the Y2H network that are significantly targeted or untargeted by viral proteins.
4 TAP data set List of virus-host protein associations identified in two independent TAP-MS experiments.
5 GO cluster summary List of the GO terms that are enriched in each cluster of differentially regulated genes, and the adjusted P values and odds ratios associated with them.
6 KEGG cluster summary List of all the KEGG terms that are enriched in each cluster of differentially regulated genes, and the adjusted P values and odds ratios associated with them.
7 TF list by Y2H,TAP,Array Virally targeted TFs with an enriched number of predicted binding sites for genes within each cluster. Listed TFs come from either associations through TAP-MS, interactions through Y2H, or are differentially expressed in response to viral ORF expression.
8 COSMIC Classic annotation COSMIC Classic genes are annotated based on literature review as tumour suppressor, oncogene or both. If no call could be made, the designation is left as unclear.
9 VirHost data set 947 VirHost proteins identified through TAP-MS association (3 or more unique peptides), Y2H interaction, or as differentially expressed TFs from motif enrichment analysis.
10 Transposon Candidates Overlap between Sleeping Beauty and murine leukemia virus transposon-based screens and VirHost data set.
Rozenblatt-Rosen et al., 2012
4
W W W. N A T U R E . C O M / N A T U R E | 3
SUPPLEMENTARY INFORMATION RESEARCH
11 Polyphen score Cumulative POLYPHEN2 score calculated for each gene with one or more somatic mutations from genome-wide tumour sequencing data.
12 GWAS–VirHost, SCNA-DEL–VirHost, SCNA-AMP–VirHost intersections
Genes at intersection of VirHost data set with genes identified through cancer GWAS or through cancer SCNA analysis (amp=amplification, del=deletion).
13 PCR Primers and viral ORF sequences
PCR primers used to generate viral ORFs from Ad5, HPVs and PyVs and sequence of each viral ORF.
14 Autoactivators List of viral ORFs that act as auto-activators in Y2H or wNAPPA protein interaction assays.
15 Tandome List of host proteins identified in more than 1% of 108 control TAP experiments (Tandome).
16 Y2H PRS, TAP-MS PRS List of Y2H and TAP-MS Positive Reference Sets (PRS). 17 Pathway enrichment analysis Gene Ontology term enrichment for viral targets identified
by TAP-MS for Ad5, EBV, HPV and PyV viruses; Odds ratio and resampling-adjusted P-values are provided.
18 Microarray batches Lists of the individual microarrays in the gene expression data set, and the batch in which each one was processed.
19 Differentially regulated genes Annotation of the most frequently perturbed host genes, with the cluster membership, fold changes, and adjusted P values in each cell line expressing a viral ORF relative to control cell line.
20 GWAS loci Mapping of loci previously identified through cancer GWAS to nearby genes.
21 SCNA loci Loci previously identfied through SCNA analysis of cancers, with corresponding statistical evidence (q-value and residual q-value) and genes within each interval.
22 wNAPPA results wNAPPA scores for tested viral ORF interactions.
Supplementary Notes
1. Significantly targeted and untargeted proteins 2. HI-2 and analysis of overlaps between Y2H, TAP-MS and their respective PRS 3. Measurement of the precision of the Y2H data set by wNAPPA 4. Network motif identification 5. Cellular growth phenotypes 6. Cancer pathways perturbed by viral ORFs
Supplementary References
Rozenblatt-Rosen et al., 2012
5
SUPPLEMENTARY INFORMATION
4 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
Tested byY2H: 123
Transduced intoIMR90: 87
6855 19
Viral ORFs screened
2426 29
73
10
Y2H: 53
TAP-MS: 54
Microarray: 63
Viral ORF data sets
Supplementary Figure 1. Summary of viral ORFs screened. Overlap of the number of viral ORFs tested in the Y2H and cell line branches of the experimental pipeline (left Venn diagram), and the overlaps of viral ORFs that yielded data in each channel of the experimental pipeline (right Venn diagram).
Rozenblatt-Rosen et al., 2012
6
OR
HPV8E6
Virus HostEBV-EBNALPEBV-EBNA1EBV-EBNA1SV40-LTSV40-LTSV40-LT
SP100KPNA6SRPK2ZC3H18TP53CNBP
Freq
uenc
y
Sampling spaceY2H
TAP-MS
0
50
100
150
200
250
0
50
100
150
200
250
P < 0.001
0 10 20 30 40 0 10 20 30 40
Y2H
Y2H
+N(H
I-2)
Number of virus-host protein interactionsoverlapping between Y2H and TAP-MS
SV40LT TP53
TAP-MSV-H
Y2HV-H
SV40LT TP53
TAP-MSV-H
Y2HV-H
PDLIM7
GEMY2HV-H
Y2HH-H
TAP-MSV-H
P < 0.001
P < 0.001P < 0.001
Supplementary Figure 2. Overlap of Y2H and TAP-MS data sets. Upper panels: number of virus-host interactions observed (green arrow) in Y2H and TAP-MS versus those seen by chance through random sampling of the Y2H (red) or TAP-MS (blue) search spaces, with six shared inter-actions observed listed. Lower panels: corresponding overlaps with expanded (Y2H+N(HI-2)) network, which includes human proteins “one hop” away in the HI-2 human-human interactome network.
Rozenblatt-Rosen et al., 2012
7
W W W. N A T U R E . C O M / N A T U R E | 5
SUPPLEMENTARY INFORMATION RESEARCH
1 5Odds Ratio
HP
V
EB
V
Pol
yom
aviru
s
Ade
novi
rus
ATPase activity
EBV
PDLIM7
P4HA1
P4HA2 PDLIM7
P4HB
P4HA3
Procollagen-proline4-dioxygenase activity
POLYO
PDLIM7
STRN
TP53 PDLIM7
STRN4
STRN3
PP2A binding
ADENO5
PDLIM7BAX
BH domain binding
BAK1BIKGPI-anchor transamidase complex
Mitotic cell cycle
Proteasome complex
Apoptosis
Procollagen-proline 4-dioxygenase activity
Protein phosphatase 2A binding
BH domain binding
mRNA 5’UTR binding
Supplementary Figure 3. Enrichment of GO terms for targeted host proteins. Enrichment of GO terms for host proteins physically interacting with viral proteins (Supplementary Table 17). Three examples are shown. All Odds Ratios higher than 5 were set to 5 for visualization purposes.
Rozenblatt-Rosen et al., 2012
8
SUPPLEMENTARY INFORMATION
6 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
Shared host targetsof HPV-E7 proteins
Freq
uenc
y0
300
600
0 150
60
120
10.5 21.5
Freq
uenc
y
Ratio = E7 of same classE7 of different class
P < 0.001
P = 0.424
CutaneousHigh-riskLow-risk 30
Supplementary Figure 4. Network of HPV E7 protein complex associations. Network of p r o t e i n complex associations of E7 viral proteins from six HPV types (hexagons, coloured according to disease class) with host proteins (circles). Host proteins that associate with two or more E7 proteins are coloured according to the disease class(es) of the corresponding HPV types. Circle size is proportional to the number of associations between host and viral proteins in the E7 networks. Viral-host protein complex associations (links) are weighted by the number of unique peptides detected for the host protein (thin links: 1-2; thick links: ≥ 3). Distribution plots of 1,000 randomised networks and experimentally observed data (green arrow) for the number of host proteins targeted at random by two or more proteins in the corresponding sub-networks (left grams), or the ratio of the probability of a host protein being targeted by viral pro-teins from the same class to the probability it is targeted by viral proteins from different classes (right histograms). Representative random networks selected from these distributions are shown as insets in the histograms.
Rozenblatt-Rosen et al., 2012
9
W W W. N A T U R E . C O M / N A T U R E | 7
SUPPLEMENTARY INFORMATION RESEARCH
Response to Type I interferonPadj < 0.001
OR = 40
IFITM1 IFIT3 IFIT2 IFIT1IFI6 IFI35
IRF1
IMR
90 D
NA
se A
cces
ibili
ty (c
ount
s pe
r 10
mill
ion
read
s)
Position (bp)
0
50
100
150
200
●●
IFIT3 IFIT1L IFIT1
91,100,000 91,120,000 91,140,000 91,160,000
AAACTGGAAAGTGAAACT AAAGGGGAAAGTGAAACTSite 1 Site 2
2 4 6 8 10 12 14 16 18
0
0.5
1.0
1.5
2.0
IRF1 binding motif
Info
rmat
ion
cont
ent
Position
Supplementary Figure 5. Chromatin accessibility of IRF1 binding sites. DNase accessible canonical IRF1 binding sites in the promoters of interferon inducible genes highly expressed in response to Group III viral protein expression (left). Predicted targets of IRF1 within cluster C24 are significantly annotated for the GO term “Type I interferon response” (right).
Rozenblatt-Rosen et al., 2012
10
SUPPLEMENTARY INFORMATION
8 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
E2F1
TFAP2A
SMAD3
TFAP2A
RBPJ
HES1
Autoregulatory loops
Freq
uenc
y
0
50
100
150
4 8 12
0
200
400
0 50 100 150
Freq
uenc
y
0
50
100
150
0 2,000 4,000 6,000
Freq
uenc
y
Feedback loops
Feedforward loops
0
P < 0.001
P < 0.001
P < 0.001
Supplementary Figure 6. Regulatory loops. Null distributions of autoregulatory (top), feedback (middle) and feedforward (bottom) motifs compared to the number of observed network motifs (green arrow) for enriched TFs.
Rozenblatt-Rosen et al., 2012
11
W W W. N A T U R E . C O M / N A T U R E | 9
SUPPLEMENTARY INFORMATION RESEARCH
DNA damage response (DNA replication)(Circadian rhythm)Potassium ion import
Nucleosome assembly Regulation of cell proliferationType I interferon response
Response to hypoxia
PDGFR signalling (Pathways in Cancer)Cell morphogenesis in differentiationProteinaceous extracellular matrix
TGF-beta signalling pathwayGlutathione metabolism
Response to radiation (Wnt signalling)
Lipid biosynthesis
Lysosome (Lysosome)Lysosome (Lysosome)
(Phagosome)
C31C30C29C28C27C26C25C24C23C22C21C20C19C18C17C16C15C14C13 C12C11C10C9C8C7C6C5C4C3C2C1
Regulation of fibroblast proliferation
(Calcium signalling pathway)
Growth factor binding
Enriched GO (KEGG)TP53RB/E2F
5−E6
8−E6
BIL
F1E
4OR
F3B
LRF2
TSV
-LS
TM
CP
yV-L
TTR
MC
PyV
-LT
LMP
2AE
1B19
KB
NR
F1B
RLF
1E
BN
A1
BX
LF1
E4O
RFB LF
211
-E6
6b-E
6B
ALF
1M
CP
yV-L
ST
BN
LF2B
BFL
F2B
TRF1
BR
RF2
16-E
6XX
18-E
6X5-
E7
E4O
RF1
BFR
F0.5
E3G
P19
KB
SR
F1B
GLF
311
-E5B
16-E
56b
-E5A
BFR
F28-
E7
18-E
5E
BN
ALP
6b-E
5BB
HR
F1B
GLF
3.5
6b-E
711
-E7
BN
LF2A
BD
LF3.
5B
DLF
4B
VLF
1SV40−S
TW
U-L
ST
JCC
Y-L
ST
16-E
7JC
Mad
1-LS
TS
V40
-LT
HP
yV7-
LST
HP
yV6-
LST
16−E
6NOX
18−E
6NOX
16-E
6NO
XI1
28T
16-E
618−E
618−E
7MCPyV−S
T
[snoRNA]#
TRAIL binding (p53 signalling pathway)
I II III
Cell-substrate adhesion
Acute inflammatory response
DNA damage response
Enriched TF
0 12 (fold change)
Color Key
−1Log
E2F* SP* REL* CTCF MAZ
POU2F2 CREB3* SP* E2F* CRXBPTF REL TFAP2* NRF1 ETV4
STAT NFKB* REL* CTCF E2F*
REL E2F* MAFF PATZ1 NRF1
STAT1 EGR2RELA FOSL2 JUNB TFAP4 EGR2
E2F* STAT* TFAP4 EGR2 EP300
GABPB2
NFYA E2F* ZFP161 MYC FOSL*
STAT* TP53 REL FOSL2 E2F*
FOSL2 ARNTL
FOSL2
FOSL2 JUN TFAP4 STAT1 EGR2
ERG GABPB2 FOS* NFE2L2
REL* NFKB*IRF1 STAT1 CREB* PATZ1 MAZ
RBPJ CREB3* EGR2 EHF STAT*
E2F* FOSL2 JUN TFAP2A
EGR2 FOSL2
E2F*
E2F* NRF1 EGR1
Supplementary Figure 7. Heatmap of transcriptome perturbations. Enlarged version of Fig. 2a. Enriched GO terms and KEGG pathways are listed adjacent to the numbered expression clus-ters. Transcription factors (TFs) with enriched binding sites and gene targets enriched for the listed GO and/or KEGG pathways that are physically associated with or differentially expressed in response to viral proteins are shown, with * denoting multiple members of a TF family. TFs are ranked by odds ratio (OR) for pathway enrichment, and up to 5 TFs are shown for any cluster. In cluster C1 eight of the nine transcripts are snoRNAs (# sign). Upper dendrogram is shaded by viORF grouping. Blocks show which viral proteins associate with the indicated host proteins, as detected in our data set (grey) or manually curated (green).
EBV HPV Adenovirus Polyomavirus
Rozenblatt-Rosen et al., 2012
12
SUPPLEMENTARY INFORMATION
1 0 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
-0.8 -0.6 -0.4 -0.2 0.0 0.2 0.4
0.0
0.1
0.2
0.3
SV40-LT
HVP5-E6
HPV6b-E6
HPV8-E6
HPV16-E6
N-GFPSV40-ST
MCPyV-ST
EBV-BGLF3
N-GFP
SV40-LT
HPV18-E6NOX
HPV18-E6X
MCPyV-LTTRUNC
C-GFP
Average cluster C31 log2(fold change)
Gro
wth
rate
(1/d
ays)
-0.8 -0.6 -0.4 -0.2 0.0 0.2 0.4
0
20
40
60
80
100
SV40-LT
HVP5-E6
HPV8-E6
HPV6b-E6HPV16-E6
N-GFP
SV40-ST
MCPyV-ST
MCPyV-LTTRUNC
EBV-BGLF3
N-GFP
SV40-LT
HPV18-E6NOX
HPV18-E6XC-GFP
Per
cent
age
of s
enes
cent
cel
ls
Average cluster C31 log2(fold change)
R = 0.8Padj = 0.008
R = -0.92Padj = 0.016
Supplementary Figure 8. Growth rate and senescence of selected IMR-90 cell lines expressing viral ORFs. Growth rate and senescence plotted against average expression of genes in cluster 31. Error bars in growth rate represent standard deviation across three samples in all cases except for MCPyV-LTTRUNC, which is across two samples, and HPV5-E6 and HPV8-E6, which represent one sample each. Error bars in senescence represent standard devia-tion across two samples.
Rozenblatt-Rosen et al., 2012
13
W W W. N A T U R E . C O M / N A T U R E | 1 1
SUPPLEMENTARY INFORMATION RESEARCH
LF2
HPV18-E5
SV40-ST
HPV11-E7
HPV6b-E5BHPV6b-E7
BRLF1
HPV16-E7HPV8-E7
BNLF2A
GATA6REL
CREB3L1STAT4
ZNF219
GABPB2CREBL2ARNTLFOSL2CREB3BACH1FOXK1ERG
STAT1E2F5TFDP2E2F1
ELF1TP73
SP4HBP1CREB5
STAT2ATF3
BLRF2BDLF3.5BGLF3
MCPyV-STHPV5-E6
HPV18-E7
HPV16-E6XX
BFLF2
ZFP161
NFIATEAD2
FLI1
NFIBTFAP4SMAD3
KLF7 HPV8-E6
HPV18-E6EBNA1
BILF1
HPV11-E5B
MCPyV-LTTRUNC
TSV-LSTMCPyV-LT
BFRF0.5
HES1
TFAP2C
LEF1
E2F3STAT3POU2F2
MYBL1EGR1MYC
FOS
TP53
HPV6b-E6
BDLF4HPV5-E7
WU-LST
BTRF1
E4ORF1NR5A2
RELB
E2F4
TFDP1
EGR2
NFKB2JUNB
NFYA
EHFCEBPB
BHRF1EBNALP
JCMad1-LSTHPV11-E6
JCCY-LSTBALF1
BXLF1
NFKB1MAXMAFF
CRXBPTF
EP300
PRDM1
RB1ZNF281NFE2L3
TEAD4RORA
SV40-LT
MCPyV-LST
BNRF1
BVLF1HPyV7-LST
MYBL2IRF1
CTCFSOX11
RUNX1JUN
ETV4
STAT5A
ELF4
MAZMYBSP1SP3
GABPB1ELK3
NFE2L2
E2F2
FOSL1
NFYBETS1
PATZ1
BNLF2BHPV16-E6NOX
BSRF1HPV16-E6
HPV16-E6NOXI128T
BGLF3.5
E4ORFB
LMP2ABRRF2
HPV18-E6NOXE1B19K
HPyV6-LST
E4ORF3
C
Differential expression? V
C
Experimental TFC V
Predicted
Cell-substrate adhesion
Regulation of fibroblast proliferation
DNA damage response
Causes expression change of TF
EBV HPV Adenovirus Polyomavirus
TAP-MS Y2H Binding site enrichment
TFs
Clusters
HPV18E6
HPV5E6
C27
SV40LT
ETV4
E4ORF3
BPTF
BILF1
ELF4
HPV11E7
JUN TEAD2 FOSL2
HPV6bE7
HPV5E6
C8
HPV8E6
HPyV6LST
TFAP4
EGR1
HPV18E7
HPV6bE7
FOSL2
C13
HPV16E6
HPV11E7BRLF1
TFs
Clusters
Viralproteins
Viralproteins
C28
C24
C25
C30
C31
C14
C9
C22C11C19C23C20C16
C8C15C13
C3
C27
C18
C26
C10
C7C17C21
C2
C12C5C4
C6C29
Supplementary Figure 9. Viral proteins, transcription factors and clusters. Network repre-sentation of all predicted viral protein-TF-cluster cascades (left). Schematic (right) shows how viral protein-TF-target gene network was constructed, with representative networks underneath.
Rozenblatt-Rosen et al., 2012
14
SUPPLEMENTARY INFORMATION
1 2 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
IMR-90
WB
IP: HA
kDa HP
V5-
E6
HP
V8-
E6
HP
V6b
-E6
HP
V11
-E6
HP
V16
-E6
HP
V18
-E6
GFP
G
FP W
CL
EP300 225 -
102 - MAML1 150 -
GFP
HP
V5-
E6
HP
V8-
E6
HP
V6b
-E6
HP
V11
-E6
HP
V16
-E6
HP
V18
-E6
WB
WCL
EP300 225 -
kDa
MAML1 102 - 150 -
HA 31 - 24 -
38 -
UBE3A 76 -
102 - 102 -
HA *31 - 24 -
38 -
UBE3A 76 -
Actin 38 -
WB
IP: HA
kDa siM
AM
L1
HP
V5-
E6
HP
V8-
E6
HP
V6b
-E6
HP
V11
-E6
HP
V16
-E6
HP
V18
-E6
siC
ontro
l
EP300 225 -
U-2 OS
HP
V18
-E6
HP
V16
-E6
HP
V11
-E6
HP
V6b
-E6
HP
V8-
E6
HP
V5-
E6
pNM
CV
WB
WCL
EP300 225 -
kDa siM
AM
L1
siC
ontro
l
150 -
pNM
CV
HA
UBE3A
24 -
102 - 76 - UBE3A 102 -
76 -
MAML1 150 - 120 -
24 - HA
150 - 102 - MAML1
52 - Actin 38 -
WCL
Supplementary Figure 10. MAML1 is targeted by E6 proteins from HPV5 and HPV8. Western blots of whole cell lysates (WCL) and co-immunoprecipitations of HPV E6 proteins in IMR-90 cells (upper panels) or U-2 OS cells (lower panels).
Rozenblatt-Rosen et al., 2012
15
W W W. N A T U R E . C O M / N A T U R E | 1 3
SUPPLEMENTARY INFORMATION RESEARCH
N-C
MV
5E6
8E6
6bE
6
11E
6
16E
6
18E
6
ICD
RLU
No
ICD
1
2
U-2 OS CellsHES1 reporter
25, 50, 100 ng of plasmid
3
0 WB:
FLAG
IMR-90
Actin
MAML1 150 - 52 -
kDa
38 -
Rel
ativ
e ex
pres
sion DLL4
HES1
Primers
Rel
ativ
e ex
pres
sion
MAML1 HES1 DLL4
Control MAML1
GFP
HPV5-E6
HPV6b-E
6HPV11
-E6
HPV16-E
6
HPV18-E
6
Cont
rol
MAM
L1
00.20.40.60.81.01.21.41.61.82.0
0
0.2
0.4
0.6
0.8
1.0
1.2 shRNA
HPV8-E6
Supplementary Figure 11. The E6 protein from HPV5 and HPV8 regulates expression of Notch pathway responsive genes HES1 and DLL4. Upper panels: qPCR of Notch pathway responsive genes upon expression of E6 proteins from different HPV types (left panel) or upon knockdown of MAML1 (right panel). Error bars represent standard deviation across two experi-ments and three experiments, respectively. Western blot of MAML1 in IMR-90 cells upon shRNA knockdown. Lower panel: HES1 luciferase reporter assays of co-transfected Notch intracellular domain (Notch-ICD) and E6 proteins at increasing concentrations from different HPV types in U-2 OS cells. Error bars represent standard deviation across three experiments. Western blot (inset) depicts E6 protein expression from the indicated HPV type in a representative experiment.
Rozenblatt-Rosen et al., 2012
16
SUPPLEMENTARY INFORMATION
1 4 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
−log(P−value)
Fold
enr
ichm
ent i
n C
OS
MIC
Cla
ssic
pr
otei
n re
cove
ry
1.6
1.8
2.0
2.2
2.4
●
●
●
19
17
16
15
13
1.0 1.5 2.0 2.5 3.0
Uniquepeptides
● 1● 2● 3
45
P=0.05
19,398 proteins outside diagram
TAP-MS
COSMIC Classic
Y2H TF 56359
4612
286
13
11 2
0
0
0
0
91
16 cancer proteins within Y2H+TF+TAP-MS: P = 0.007
Supplementary Figure 12. DNA tumour viruses target cancer genes. Fold enrichment of COSMIC Classic genes in TAP-MS, Y2H and TF VirHost data sets as a function of the number of unique peptides (circle sizes) detected for host proteins identified through TAP-MS. Numbers of proteins in intersections of VirHost and COSMIC Classic genes are indicated. Venn diagram of overlaps between proteins identified by TAP-MS (≥3 unique peptides), Y2H, TF and COSMIC Classic genes.
Rozenblatt-Rosen et al., 2012
17
80
9095
97 98 99
Rep
rodu
cibi
lity
(%)
Minimum number of unique peptides
40
60
50
70
80
90
100
53
1 2 3 4 5 6 7 8 9 10
TAP-A, TAP-BTAP-B, TAP-AAverage
Supplementary Figure 13. Reproducibility of viral-host protein associations observed in replicate TAP-MS experiments as a function of the number of unique peptides detected. Data were plotted based on both experimental orientations (TAP-A then TAP-B, blue line and TAP-B then TAP-A, red line), with the average indicated by the green dashed line.
Rozenblatt-Rosen et al., 2012
18
W W W. N A T U R E . C O M / N A T U R E | 1 5
SUPPLEMENTARY INFORMATION RESEARCH
Protein
0.0
0.1
0.2
0.3
0.4
0.5
20 40 60 80 100
Pol
yphe
n2 m
utat
ion
dele
terio
usne
ss s
core
Polyphen2 Prediction
UnscoredBenignPossibly DamagingProbably Damaging
Somatic mutation count
Cancer type (source)
Chronic Lymphocytic Leukemia
GBM(Baylor)
GBM(WashU)
Head&Neck(Agrawal)
Head&Neck(Stransky)
Medulloblastoma
Multiple Myeloma
Non−Small Cell Lung Cancer
Ovarian(Baylor)
Ovarian(Broad)
Ovarian(WashU)
Small Cell Lung Cancer
0 2000 4000 6000
0.0
0.1
0.2
0.4
0.5
TP53
HIS
T1H
4AP
TEN
KR
AS
HIS
T1H
3AC
DK
N2A
NR
AS
C19
orf5
3E
GFR
RH
OA
0.3
Supplementary Figure 14. Somatic mutations in genome-wide tumour sequencing projects. Somatic mutations identified through genome-wide sequencing studies of human cancers and evaluated for impact using Polyphen2 (upper panel). Ranking of proteins with somatic mutations in cancer based on the cumulative inferred impact of all observed mutations (lower panel) (Supplementary Table 11). Inset: top ten ranked proteins.
Rozenblatt-Rosen et al., 2012
19
SUPPLEMENTARY INFORMATION
1 6 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
Supplementary Figure 15. Network of VirHostSM to host targets and cancers. Mapping of VirHostSM gene products to both tumours in which they are mutated (left) and to viral interactors (right). Proteins annotated with the GO term “regulation of apoptosis” indicated in purple.
MCPyV-LT
16-E6
SV40-ST
18-E6NOX16-E6NOX
18-E7
BGLF3.5
Medulloblastoma
Ovarian
Small cell lung
11-E7
JCMad1-LST
BVLF1
HPyV6-LST18-E6HPyV7-LST
MCPyV-ST
Glioblastoma
Head and neck
Multiple myeloma
BK-LTTRUNC
WU-LST
BGLF3
JCCY-LST
EBNA3B5-E7
18-E6X
BTRF1
BNRF1
LMP2A
16-E7
BSRF1
EBNALP
E1A
BBRF2
11-E5B
E1B19KE4ORFB
33-E7
BDLF4
EBNA1
BALF1
16-E5BFRF0.5
18-E5BSLF2/BMLF1
BHRF1
BNLF2A
6b-E5A
SV40-LT
8-E76b-E7
8-E6
PSMB6
FBXW7
PSMA4CDK4
MEOX2POLR1C
DMRT3
MAGEA1NFE2L2
FOXK1
POLM
ELF1
DNAJA3CNP
ZBTB25WDYHV1
GMCL1TP53TAF9
PRTFDC1
UBE2IPRAF2
TMEM126ALAMTOR1
BAK1BIK
KHDRBS2RPS24
KRTAP13-1
C1orf50
ROPN1
NFYBFAM71C
EGFR
STAU1
ARHGDIARRAS2
RABL3CHRM2
F2R
RB1PSMC5
PPP2R1A
Regulation of apoptosis
Y2H interactionTAP-MS association
COSMIC Classic
TF
E4ORF3
5-E6BDLF3.5MCPyV-LTTRUNC
BNLF2B
BILF1
Causes expression change of TF
Rozenblatt-Rosen et al., 2012
20
W W W. N A T U R E . C O M / N A T U R E | 1 7
SUPPLEMENTARY INFORMATION RESEARCH
0.5 x106
1 x106
P3
Split 1:3P4
P5
P6
Thaw
3
9
27
Split 1:3
Split 1:3
Transduce 26 viral genes& controls (GFP, SV40LT)
P6
Split 1:5P7
Split 1:3P8
5
9
P9
RNA
5
P12
Protein
18
Freeze
P7, P8, P10
Puromycin selection
Western Blot Analysis
MycoplasmaTesting
Sequence viORF
Bioanalyzer
P0
Split 1:3P1
P2
P3
Thaw
Cell bank Generation of cell lines Quality Controls
IMR-90 Bank
IMR-90 Bank
3
9
27
Split 1:3
Split 1:3
Supplementary Figure 16. The IMR-90 cell culture pipeline. Generation of the IMR-90 cell bank (left), and generation of IMR-90 cell lines expressing viORFs (middle) and quality control (QC) measures taken (right).
Rozenblatt-Rosen et al., 2012
21
SUPPLEMENTARY INFORMATION
1 8 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
50
110
90
160200
MW(kDa)
40
30
201510
70
Viral Protein
SubcellularFractionCell Line #
C N C N C N C N C N C N
1 33 59 85 99 111
CTRL-GFP CTRL-GFP CTRL-GFP CTRL-GFP CTRL-GFP CTRL-GFP
Viral Protein
SubcellularFractionCell Line # 0 00
CTRL-IMR90 CTRL-IMR90 CTRL-VECTOR
C N C N C N
Viral Protein
SubcellularFractionCell Line #
N M N M N M N M N M
44 45 47 48 49
HPV-6b E5A HPV-6b E5B HPV-11 E5B HPV-16 E5 HPV-18 E5
Rozenblatt-Rosen et al., 2012
22
W W W. N A T U R E . C O M / N A T U R E | 1 9
SUPPLEMENTARY INFORMATION RESEARCH
C N C N C N C N C N C N C N C N
Viral Protein
SubcellularFractionCell Line # 21 22 23 24 25 26 38 40
HPV-5 E6 HPV-6b E6 HPV-8 E6 HPV-11 E6 HPV-16 E6 HPV-18 E6 HPV-5 E6 HPV-8 E6
Viral Protein
87 91 92 93
HPV-16 E6no* HPV-16 E6no*I128T
HPV-18 E6no* HPV-18 E6*
C N C N C N C NSubcellularFractionCell Line #
C N C N C N C N C N C N C N C N
Viral Protein
SubcellularFractionCell Line # 15 16 17 18 19 20 96 113
HPV-5 E7 HPV-6b E7 HPV-8 E7 HPV-11 E7 HPV-16 E7 HPV-18 E7 HPV-5 E7 HPV-16 E7
Rozenblatt-Rosen et al., 2012
23
SUPPLEMENTARY INFORMATION
2 0 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
Viral Protein
SubcellularFractionCell Line #
C N C N C N C N C N C N C N
5 6 37 60 86 112 50
POLYO-SV40LT
POLYO-SV40LT
POLYO-SV40LT
POLYO-SV40LT
POLYO-SV40LT
POLYO-SV40LT
POLYO-SV40sT
Viral Protein
7 51 118
POLYO-MCPyVLT
POLYO-MCPyVsT
POLYO-MCPyVLsT
C N C N C NSubcellularFractionCell Line #
Viral Protein
SubcellularFractionCell Line #
C N C N C N C N C N C N
114 115 116 117 119 120
POLYO-JCCYLsT
POLYO-JCMad1LsT
POLYO-BKLsT
POLYO-WULsT
POLYO-HPy6LsT
POLYO-HPy7LsT
Rozenblatt-Rosen et al., 2012
24
W W W. N A T U R E . C O M / N A T U R E | 2 1
SUPPLEMENTARY INFORMATION RESEARCH
C N C N C N C N C N C N
Viral Protein
SubcellularFractionCell Line # 14 53 54 56 57 70
EBV-EBNALP EBV-LMP2a EBV-BHRF1 EBV-BRLF1 EBV-BNRF1 EBV-EBNA1
Viral Protein
77 78
79 81 82 109 110
EBV-BTRF1 EBV-BGLF3
EBV-BVLF1 EBV-BDLF4 EBV-BDLF3.5 EBV-BXLF1 EBV-BALF1
C N C N
C N C N C N C N C NSubcellularFractionCell Line #
Viral Protein
SubcellularFractionCell Line #
C N C N C N
C N C N C N
129 130 131
132 135 136
EBV-BNLF2a EBV-BNLF2b EBV-BFRF0.5
EBV-BSRF1 EBV-BGLF3.5 BFLF2
C N C N C N
102 122 125
ADENO5E1B19K
ADENO5E4ORF1
ADENO5E3GP19K
Supplementary Figure 17. Silver stain analyses of viral-host protein complexes. A 15% aliquot of the final eluate of each TAP was resolved on polyacrylamide gels then visualized by silver-stain (shown here for the first TAP replicate).
Rozenblatt-Rosen et al., 2012
25
SUPPLEMENTARY INFORMATION
2 2 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
Supplementary Figure 18. Cluster propensity of microarrays before and after ComBat. Hierarchical clustering of all microarrays (a) before and (b) after applying ComBat, an algorithm used for removing batch effects in microarray data.
X21D
X120
EX5
6CX5
6DX5
6AX5
6BX5
6E X3A
X3D
X3B
X3C
X3E
X109
EX1
09A
X109
CX1
09B
X109
DX3
5EX3
5AX3
5DX3
5BX3
5CX7
0BX7
0C X2B
X2A
X2D
X70E
X70A
X70D
X121
CX1
21A
X121
BX1
21D
X121
EX4
9DX4
9CX4
9AX4
9BX4
9EX1
27B
X127
AX1
27C
X127
DX1
27E
X7A
X7C
X7E
X7B
X7D
X130
AX1
30C
X130
DX1
18C
X118
DX1
36A
X136
CX1
36D
X58C
X58E
X58B
X58A
X58D X4
DX3
4BX4
CX2
CX2
EX1
07D
X107
AX1
07B
X107
CX1
07E
X124
BX1
24E
X124
DX1
24A
X124
CX2
3AX2
3EX2
3CX2
3BX2
3DX3
8BX4
0AX4
0BX2
1AX2
1CX2
1BX2
1EX4
0cX4
0DX3
8CX3
8EX3
8AX3
8DX5
5DX5
5AX5
5BX5
5CX5
5EX9
8CX9
8DX9
8AX9
8BX9
8EX2
4AX2
4BX2
4DX2
2BX2
2CX2
2DX2
2AX2
2EX2
4CX2
4EX1
30B
X130
EX1
36B
X136
EX1
18A
X118
BX1
18E
X74A
X74B
X77B
X77D
X77E
X53B
X53D X4
AX4
BX4
EX5
3EX5
3AX5
3CX1
23E
X123
DX1
23C
X123
AX1
23B
X34C
X34A
X34D
X20E
.2X2
0E.3
X20E
.4X2
0E.1
X20E
_040
7X2
0E_0
416
X120
AX1
14B
X114
EX1
14C
X114
AX1
14D
X120
CX1
20B
X120
DX1
19A
X119
CX1
19B
X119
DX1
19E
X87C
X87D
X25B
X25D
X87B
X92B
X92A
X92E
X87A
X91A
X25A
X26A
X60C
X60A
X60B
X60D
X60E
X6D
X5A
X5D
X6A
X5C
X5E
X86A
X86B
X6C
X37B
X37D
X6C
_040
7X6
C_0
416
X6C
.1X6
C.2
X87E
X19A
X135
CX1
35B
X135
EX4
7AX4
7EX4
8AX4
8EX8
5EX8
5BX8
5CX5
9AX3
3CX3
3DX1
7EX4
4AX4
4BX4
4EX1
31A
X88C
X132
AX7
5EX7
8EX7
8DX8
1DX8
2DX4
5EX4
5CX4
5AX4
5BX7
8AX8
2AX8
2CX8
5DX1
7AX9
6DX8
8DX9
3DX1
4BX1
4AX1
4EX1
02D
X102
BX1
02C
X50A
X50B
X50E
X50C
X50D
X99E
X102
EX3
3AX8
8EX7
9DX7
9EX8
2EX7
5DX5
9EX5
9BX5
9DX4
5DX1
29C
X54B
X54D
X75C
X135
AX1
22E
X125
EX4
8DX1
9BX1
9CX1
32B
X125
BX1
31B
X122
BX1
22C
X125
CX1
31C
X1B
X1C
X1E
X16C
X15A
X15D
X17B
X17C
X51C
X51D
X51E
X51A
X51B
X88A
X102
AX9
9DX5
9CX7
5BX3
3BX3
3EX1
6BX1
5EX1
5BX1
5CX8
5AX9
3AX4
4CX1
11B
X111
EX1
11A
X111
DX1
AX1
25A
X132
CX1
8AX1
6EX1
29A
X129
EX1
9DX1
9EX1
8CX1
8DX1
6AX1
8BX1
8EX1
32D
X132
EX1
29D
X78B
X78C
X48B
X48C
X47C
X47B
X47D
X44D X1
DX1
6DX1
7DX6
C.3
X92C
X92D
X91C
X91D
X91B
X91E
X86D
X86C
X86E
X5B
X6B
X6E
X6C
_033
010
X37A
X37C
X37E
X26C
X26B
X26E
X26D
X25C
X25E
X20E
_033
010
X20C
X20E
X20D
X20A
X20B
X57C
X57A
X57B
X57D
X57E
X81C
X54A
X54C
X54E
X122
AX1
22D
X131
EX1
35D
X125
DX1
31D
X75A
X93E
X99B
X96B
X88B
X93B
X81E
X129
BX1
11C
X74D
X74E
X81B
X77A
X81A
X74C
X77C
X14C
X14D
X93C
X96C
X99A
X99C
X96A
X96E
X110
DX1
10A
X110
BX1
10C
X110
EX7
9AX7
9BX7
9CX1
12A
X112
DX1
12C
X112
BX1
12E
X36C
X36A
X36E
X36B
X36D
X117
AX1
17B
X117
CX1
17D
X117
EX1
15B
X115
EX1
15A
X115
CX1
15D
X113
AX1
13E
X113
BX1
13C
X113
D
X12
0EX
87E
X87
CX
87D
X92
CX
92D
X91
CX
91D
X91
BX
91E
X6C
.3X
86D
X86
CX
86E
X98
CX
98D
X98
AX
98B
X98
EX
85D
X85
EX
85B
X85
CX
96D
X88
DX
93D
X88
EX
102D
X10
2BX
102C
X99
EX
102E
X79
DX
79E
X75
DX
59E
X59
BX
59D
X77
EX
82E
X13
1AX
135A
X12
9CX
125C
X13
1CX
125E
X12
5BX
131B
X12
2EX
122B
X12
2CX
107D
X10
7AX
107B
X10
7CX
107E
X11
0DX
110A
X11
0BX
110C
X11
0EX
109E
X10
9AX
109C
X10
9BX
109D
X11
2AX
112E
X11
2DX
112B
X11
2CX
117C
X11
7DX
117E
X11
7AX
117B
X11
5BX
115E
X11
5AX
115C
X11
5DX
113A
X11
3EX
113B
X11
3CX
113D
X12
0CX
120B
X12
0DX
119A
X11
9BX
119C
X11
9DX
119E
X12
0AX
114B
X11
4EX
114A
X11
4CX
114D
X57
CX
57A
X57
BX
57D
X57
EX
36C
X36
AX
36E
X36
BX
36D
X6C
X6C
_040
7X
6C_0
416
X6E
X6C
_033
010
X37
BX
37D
X6D
X5D X5E
X37
AX
37C
X37
EX
5AX
6AX
5C X5B
X6B
X6C
.1X
6C.2
X60
CX
60A
X60
BX
60D
X60
EX
79A
X79
BX
79C
X59
CX
78B
X78
CX
74C
X74
EX
77C
X81
BX
77A
X81
AX
75B
X81
CX
18A
X16
EX
16B
X15
BX
15C
X1A
X54
AX
19D
X19
EX
18D
X15
EX
18E
X47
AX
47E
X47
CX
47B
X47
DX
44D
X44
EX
33B
X33
EX
44C
X48
BX
48C
X2D
X4D X4E
X15
DX
16D
X17
DX
1DX
14D
X4C
X2C X2E
X3D X3E
X3A
X3B
X3C
X35
EX
35D
X35
CX
35A
X35
BX
53B
X53
DX
53E
X53
AX
53C
X2A
X2B
X4A
X4B
X50
AX
50B
X50
CX
50D
X50
EX
33A
X48
DX
48A
X48
EX
33C
X33
DX
44A
X44
BX
45C
X45
AX
45B
X45
DX
45E
X49
DX
49C
X49
AX
49B
X49
EX
17A
X19
AX
1EX
14E
X14
CX
14A
X14
BX
54C
X54
EX
54B
X54
DX
17C
X17
EX
16A
X18
BX
16C
X18
CX
19B
X19
CX
1BX
1CX
15A
X17
BX
56C
X56
DX
56A
X56
BX
56E
X85
AX
93A
X93
CX
88C
X96
CX
99A
X99
CX
96A
X96
EX
93E
X88
BX
93B
X96
BX
99B
X88
AX
102A
X99
DX
129A
X12
9EX
129D
X11
1BX
111E
X13
5CX
135B
X13
5EX
129B
X11
1CX
122A
X12
2DX
135D
X12
5DX
131D
X13
1EX
132B
X13
2CX
132D
X13
2EX
132A
X12
5AX
111A
X11
1DX
58C
X58
EX
58B
X58
AX
58D
X70
EX
70A
X70
DX
70B
X70
CX
74A
X74
DX
77D
X74
BX
77B
X82
CX
78A
X59
AX
82A
X78
EX
81E
X78
DX
81D
X82
DX
75E
X75
AX
75C
X51
CX
51D
X51
EX
51A
X51
BX
20E
.3X
20E
.2X
20E
.1X
20E
.4X
20A
X20
BX
20E
_033
010
X20
E_0
407
X20
EX
20C
X20
DX
20E
_041
6X
86A
X86
BX
87B
X92
BX
92A
X92
EX
87A
X91
AX
26A
X25
AX
25D
X25
CX
25B
X25
EX
26D
X26
EX
26B
X26
CX
40c
X40
DX
38E
X38
CX
38A
X38
DX
38B
X40
AX
40B
X21
DX
21B
X21
CX
21A
X21
EX
23C
X23
BX
23D
X23
AX
23E
X12
3CX
123E
X12
3DX
123A
X12
3BX
124D
X12
4BX
124E
X12
4AX
124C
X12
1CX
121D
X12
1EX
121A
X12
1BX
136A
X11
8CX
118D
X13
0AX
130C
X13
0DX
127B
X12
7AX
127C
X12
7DX
127E
X11
8AX
118B
X11
8EX
136C
X13
6DX
130B
X13
0EX
136B
X13
6EX
55A
X55
BX
55D
X55
CX
55E
X7B
X7D X7A
X7C X7E
X34
BX
34C
X34
AX
34D
X24
BX
24A
X24
DX
24C
X24
EX
22B
X22
DX
22E
X22
AX
22C
Group 1 Group 2 Group 3 Group 4 Group 5 Group 6 Group 7 Group 8
a
b
Rozenblatt-Rosen et al., 2012
26
Enriched GO pathways
Freq
uenc
y
0
20
40
P < 0.01
20 40 60Enriched KEGG pathways
Freq
uenc
y
0
15
30
P < 0.01
0.0 0.4 0.8
Supplementary Figure 19. Cluster coherence. Frequency of KEGG and GO pathway enrich-ment as compared to randomly generated clusters.
Rozenblatt-Rosen et al., 2012
27
W W W. N A T U R E . C O M / N A T U R E | 2 3
SUPPLEMENTARY INFORMATION RESEARCH
Average number of unique peptides detected
Mea
n m
RN
A in
tens
ity in
con
trol s
ampl
es
6
8
10
12
14
75th percentile
50th percentile
25th percentile
10 20 30 40 50 60
Supplementary Figure 20. Expression bias of TAP-MS associations. Viral targets identified through TAP-MS are biased towards highly expressed genes. Relationship between mRNA abun-dance in control samples of IMR-90 (y-axis) versus the number of unique peptides observed for viral associations for that protein (x-axis). Number of unique peptides is the average for two biological replicates. If more than one virus association is seen for a host protein, the association with the highest number of unique peptides is shown. Horizontal lines correspond to the indicated percentiles for mRNA abundance of all transcripts on the microarray.
Rozenblatt-Rosen et al., 2012
28
SUPPLEMENTARY INFORMATION
2 4 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
Fold enrichment over random sets
Number of identified unique peptides
● 1 ● 2 ● 3 4 5
Num
ber o
f hos
t−vi
rus
PR
Sas
soci
atio
ns re
cove
red
10
11
12
13
14
15
16
17 ● ●
●
30 35 40 45 50 55 60
Supplementary Figure 21. Overlap of viral-host protein pairs identified through TAP-MS with a literature-curated positive reference set. All points are significant at P < 0.001. Dot size reflects minimum number of unique peptide detections required for protein identification.
Rozenblatt-Rosen et al., 2012
29
Signal threshold
Rec
over
y ra
te (%
)
PRS (50)RRS (94)Y2H Viral-Hostdata set (447)
1 2 3 4 5
0
5
10
15
20
25
30
PRS RRS Y2HData set
Rec
over
y ra
te (%
)
0
2
4
6
8
10
12
14
Supplementary Figure 22. Assessment of Y2H data set. Percentage of Y2H interacting protein pairs positive in wNAPPA assay at increasing assay signal for PRS, RRS and Y2H viral-host data set. Inset: fraction of PRS or Y2H viral-host data set positive in wNAPPA at a threshold of 1% RRS positive.
Rozenblatt-Rosen et al., 2012
30
W W W. N A T U R E . C O M / N A T U R E | 2 5
SUPPLEMENTARY INFORMATION RESEARCH
Supplementary Methods
A. Viral ORFeome cloning
Open-reading frame (ORF) clones encoding selected proteins from adenovirus 5 (Ad5), seven
human papillomaviruses, and nine polyomaviruses (Supplementary Table 1) were obtained by
PCR-based Gateway recombinational cloning, following a protocol previously described for
cloning Epstein-Barr virus ORFs4. To generate viral entry clones ORFs were PCR-amplified with
KOD HotStart Polymerase (Novagen). The PCR primers used contained attB1.1 and attB2.1
recombination sites fused to ~20 nucleotides of ORF-specific forward and reverse primers,
respectively. Primers used to clone viral ORFs are listed (Supplementary Table 13). PCR
products were then transferred into pDONR223 by a Gateway BP reaction, followed by
transformation into chemically competent E. coli DH5α cells, selecting for spectinomycin
resistance30.
Sequence-verified entry clone viral ORFs were transferred by Gateway LR recombinational
cloning (Invitrogen) into appropriate expression vectors. Recombination products were directly
transformed into E. coli (DH5α-T1R strain) via selection for ampicillin resistance in liquid LB
media. Plasmid DNA was extracted from E. coli grown overnight in a 96-well format using a
Qiagen BioRobot 8000. The prepared plasmid DNA was used for transformations into yeast
cells or for generation of retrovirus for subsequent transductions into IMR-90 human cells.
For assay of protein interactions by yeast two-hybrid (Y2H), viral ORFs (viORFs) were
introduced into both pDEST-DB and pDEST-AD-CYH2 destination vectors9, generating Gal4
DNA binding domain (DB)-viORF hybrid proteins and Gal4 activation domain (AD)-viORF hybrid
proteins, respectively.
For expression in mammalian cells, sequence validated viral ORFs in pDONR223 were
recombined into either the Gateway destination vector MSCV-N-Flag-HA-IRES-PURO (NTAP)
or MSCV-C-Flag-HA-IRES-PURO (CTAP) (gifts of M. Sowa and J. W. Harper)31 using LR
Rozenblatt-Rosen et al., 2012
31
SUPPLEMENTARY INFORMATION
2 6 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
Clonase (Invitrogen). The following primer sets were used for resequencing of the resulting
plasmids:
MSCV-N and MSCV-C Fwd: 5’-CCCTTGAACCTCCTCGTTCGACC-3’
MSCV-N Rev: 5’-GCCAAAAGACGGCAATATGGTGG-3’
MSCV-C Rev: 5’-GTCGGGCACGTCGTAGGG-3’
Adenovirus: Nine full length ORFs (Supplementary Table 1) were PCR amplified from Ad5
genomic DNA (human adenovirus 5 strain Adenoid 75 from the ATCC (ATCC VR-5) to generate
Gateway entry clones.
Epstein-Barr Virus (EBV): Eighty-one EBV ORFs were investigated (Supplementary Table 1).
EBV Gateway entry clones and Y2H DB-X and AD-Y clones were previously described4.
Selected EBV ORFs (Supplementary Table 1) were transferred to the NTAP vector for
subsequent transduction of IMR-90 cells.
Human Papillomaviruses (HPV): Seven HPV types32-38 (virus classifications from
http://pave.niaid.nih.gov/#prototypes?type=human) were chosen for this study: HPV6b, 11, 16,
18 and 33 of the alpha genus associated with mucosal lesions (gifts of Y. Jacob), and HPV5
and HPV8 (gifts of Y. Jacob and H. Pfister) of the beta genus which infect cutaneous epithelia.
ORFs clones encoding the early region proteins (E4, E5 proteins where relevant, E6, and E7) of
all seven HPVs (Supplementary Table 1) were amplified by PCR and cloned by recombination
into Gateway vectors.
HPV16 and HPV18 E6 variants: Due to the occurrence of internal splicing in HPV16-E6 and
HPV18-E6 and the desire to assure that full-length E6 proteins were expressed in IMR-90 cells,
various splice-defective derivatives of HPV16-E6 and HPV18-E6 were also generated. HPV11,
6b, 5 and 8 do not encode internally spliced versions of E6. HPV16-E6X, HPV16-E6XX and
HPV18-E6X correspond to the previously described major splice variants of these proteins39,40.
“E6X” designates the E6*, whereas E6XX designates the E6** splice variants. Primers used to
clone and mutagenize HPV16-E6 and HPV18-E6 variants are listed (Supplementary Table 13).
Rozenblatt-Rosen et al., 2012
32
W W W. N A T U R E . C O M / N A T U R E | 2 7
SUPPLEMENTARY INFORMATION RESEARCH
HPV16-E6X, HPV16-E6XX and HPV18-E6X were cloned by PCR amplification using as
template genomic DNA from IMR-90 cell populations that had been transduced with NTAP-
HPV16-E6 or NTAP-HPV18-E6 expression vectors. These cell lines had been shown to contain
integrated copies of these splice variants during the quality control process of the IMR-90 cell
culture pipeline. PCR products with the 5’ CACC overhang were directionally cloned into the
pENTR vector, sequenced, cloned into the NTAP vector and re-sequenced.
To generate versions of HPV16 and HPV18-E6 proteins that do not undergo internal splicing
events (designated E6NOX), the splice donor site(s) encoded within the respective E6 ORFs
were eliminated by site-directed mutagenesis (QuikChange®; Stratagene)40. As a consequence
the HPV16-E6NOX and HPV18-E6NOX proteins carry a V to L mutation at amino acid residues
42 and 44, respectively40.
The previously characterized HPV16-E6 mutants defective in p53 binding (Y54D) and
UBE3A/E6AP binding (I128T)41 were generated by site directed mutagenesis (QuikChange®;
Stratagene) of the NTAP-HPV16-E6NOX vector.
PDZ protein binding defective HPV16-E6 and HPV18- E6 mutants were generated by
cloning the HPV16-E6NOX and HPV18-E6NOX ORFs into the CTAP vector. This construct
fuses HA and FLAG epitope tags to the C-terminal PDZ binding domains and thereby blocks
association with cellular PDZ proteins.
Polyomaviruses: ORF clones were obtained from nine polyomaviruses: BK, HPyV6, HPyV7,
JCCY, JCMad1, MCPyV, SV40, TSV and WU. The entire early region of BK was PCR amplified
from BKPyV Dunlop genomic DNA cloned into pBR322 (gift from Michael Imperiale and Peter
Howley). Upon quality control we found that the vector encoded a truncated form of Large T
protein. The entire early region of HPyV6 was PCR amplified from pHPyV6-607a (Addgene
plasmid 24727)42. The entire early region of HPyV7 was PCR amplified from pHPyV7-713a
(Addgene plasmid 24728)42. The entire early region of JCCY was amplified from JCCY genomic
DNA (gift from Igor Koralnik). The entire early region of JCMad1 was amplified from JCMad1
Rozenblatt-Rosen et al., 2012
33
SUPPLEMENTARY INFORMATION
2 8 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
genomic DNA (gift from Igor Koralnik). The entire early region of TSV was amplified from
pUC19-TSV43. The entire WU early region was amplified from pcDNA WUER (gift from David
Wang). SV40 ORFs encoding Large T antigen (LT)44, Small T antigen (ST)45, Agnoprotein,
17KT, VP1, VP2, and VP3 were PCR amplified from SV40 genomic DNA.
MCPyV-LST was PCR amplified from Addgene plasmid 24729 pMCPyV-R17a containing
the MCPyV R17a genome42. MCPyV ORF clones MCPyV-LT, MCPyV-LTtrunc (which contains
a premature stop codon at position 961-963), and MCPyV-ST were subcloned from a MCPyV-
LST ORF that was first PCR amplified from tumour samples. MCPyV-ST was PCR amplified
directly, whereas MCPyV-LT and MCPyV-LTtrunc were obtained by overlap extension PCR46 to
generate a cDNA for LT (nt 429 to 861 of MCV). MCPyV-SPLT was created similarly but with
MCPyV-LT as template and with a different set of overlapping primers (spanning nucleotides
1622 to 2778 of MCV). LT, ST, and SPLT use the same Gateway attB1-ORF primer and LT and
SPLT use the same Gateway attB2-ORF primer. After two separate PCRs with the relevant
Gateway-ORF primer and internal spanning primer pairs, overlap extension PCR produced the
final PCR product for subsequent BP cloning to generate MCPyV-LT, MCPyV -LTtrunc, and
MCPyV -SPLT entry clones.
B. Yeast two-hybrid (Y2H) assay
Y2H assays used viral ORFs or ORF fragments against the human ORFeome v5.1 collection
consisting of ~15,000 full-length human ORFs10,30 (http://horfdb.dfci.harvard.edu/). Yeast strains
Y8800 and Y893047, of mating type MATa and MATα respectively, harboring the genotype leu2-
3,112 trp1-901 his3Δ200 ura3-52 gal4Δ gal80Δ GAL2::ADE2 GAL1::HIS3@LYS2
GAL7::lacZ@MET2 cyh2R, were transformed together with AD-viORF and DB-viORF
constructs, respectively. For Y2H screening MATa and MATα haploid yeast strains carrying the
corresponding AD-viORFs and DB-viORFs were mated against haploid yeast cells of the
appropriate mating type carrying DB-human hybrid proteins (DB-huORF) or AD-human hybrid
Rozenblatt-Rosen et al., 2012
34
W W W. N A T U R E . C O M / N A T U R E | 2 9
SUPPLEMENTARY INFORMATION RESEARCH
proteins (AD-huORFs), respectively. All AD-viORF and AD-huORF plasmids also carry the
counter-selectable marker CYH2, which allows selection on plates containing cycloheximide
(CHX) of yeast cells that do not contain any AD-Y plasmid9,48. Colonies growing on CHX-
containing medium lacking histidine are considered latent autoactivators and the presumptive
pair removed from the data set9,48. At each stage of the interactome mapping pipeline, reporter
gene activity is evaluated in parallel both on regular selective plates and on CHX-containing
plates.
For Y2H screens with viral ORFs as DB-hybrid proteins, the 123 Y8930:DB-viORF yeast
strains (Y8930 transformed with a unique DB-viORF) were individually mated against mini-
libraries containing 188 different Y8800:AD-HuORF yeast strains, each carrying a unique AD-
huORF49. Twenty-seven of the viral ORFs consistently behaved as autoactivators when
screened as DB-viORF fusions in yeast grown on selective medium (Supplementary Table 14).
Primary reciprocal screens of ~15,000 DB-huORFs against AD-viORFs used individual
Y8930:DB-huORF strains mated against virus-specific mini-libraries of Y8800:AD-viORF pools.
Primary screens in both orientations were completed twice and all initial positive pairs from the
primary screens underwent secondary phenotyping prior to determining the interaction pairs by
sequencing11,49,50. Each interacting pair was individually retested from fresh stocks of viral and
human yeast strains in both orientations, irrespective of the original Y2H orientation in which the
viral-human interactions were initially identified. Autoactivating baits were removed at each step
of the primary screening and during retesting.
Viral-human interactions found in multiple primary screens and verified by pair-wise
retesting are valid biophysical interactions11,49,50. Multiple pair-wise retesting can demonstrate
that interactions are reproducibly reliable even though any given pair may not retest positive
each and every time, providing an additional level of confidence in the data set. Besides the
pair-wise retesting done at the time of the original screens, we also carried out an additional
retest of all verified Y2H interaction pairs followed by resequencing of all DB-X and AD-Y ORFs
Rozenblatt-Rosen et al., 2012
35
SUPPLEMENTARY INFORMATION
3 0 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
to confirm ORF identity in each interaction pair. The final set of viral-human interactions,
consisting of all sequence-verified pairs that successfully retested in a CHX-sensitive manner,
contains 53 viral ORFs engaged in 454 interactions with 307 human proteins (Supplementary
Table 2). These 454 interactions constitute a set of biophysically real interactions that can be
successfully used for further analysis and downstream investigations.
For screens with HPV proteins, all individual viral-human pairs from each HPV strain were
also retested in both orientations using the orthologous ORFs of all seven HPVs against all
human proteins found by at least one HPV (i.e., a particular huORF found to interact with only
HPV16-E6 was tested against all seven HPV E6 proteins multiple times and in both
orientations). The final data set of HPV-human interactions includes those additional interaction
pairs even if they were not initially found in the primary screens, as long as they did score
positive multiple times in pair-wise retests, in either orientation, without exhibiting growth on
CHX-containing medium and after the identity of the specific HPV protein was confirmed by
sequencing. Virus-host interactions detected by Y2H are available at
http://interactome.dfci.harvard.edu/V_hostome/index.php.
C. Cell culture pipeline
Generating the IMR-90 cell bank: We obtained the human diploid fibroblast cell line IMR-90
(CCL-186) from the American Type Culture Collection (ATCC). IMR-90 cells were cultured in
DMEM (Cellgro) supplemented with 15% FBS (Omega Scientific), 1% Pen Strep (GIBCO), 1%
Glutamax (GIBCO) and 1% MEM NEAA (GIBCO). A master cell bank of third passage cells
consisted of 27 aliquots that were frozen and stored in liquid nitrogen. A single vial of frozen
cells was thawed and used to generate each batch of 26 cell lines in the pipeline
(Supplementary Fig. 16).
Generating IMR-90 cell lines expressing viral ORFs: Recombinant retrovirus was produced
in Phoenix cells by co-transfecting (Lipofectamine LTX, Invitrogen) plasmids encoding the viral
Rozenblatt-Rosen et al., 2012
36
W W W. N A T U R E . C O M / N A T U R E | 3 1
SUPPLEMENTARY INFORMATION RESEARCH
ORF, GAG/POL and ENV. To produce cell lines expressing each viral ORF, a vial of IMR-90
cells from the bank was thawed and expanded (5.0 x 105 IMR-90 cells were seeded onto 10 cm
plate per transduction) to allow simultaneous transduction of 26 individual viral ORFs or controls
(Supplementary Fig. 16), and then infected twice with recombinant retrovirus for 5 – 8 hr in the
presence of 5 µg/ml hexadimethrine bromide (Sigma) with subsequent selection with puromycin
(2 µg/ml).
To permit comparisons between different batches of 26 cell lines, each batch included
biological replicates of the GFP control and SV40 LT. Five pipeline batches with a total of 87
unique viral ORFs led to the establishment of 75 IMR-90 cell lines expressing unique viral ORFs
(Fig. 1b and Supplementary Table 1).
Quality control (QC): IMR-90 cells expressing viral ORFs were uniformly expanded to generate
several vials of cells that were frozen and stored. IMR-90 cell lines expressing viral ORFs were
banked at passages 7, 8, and 10. All cell lines had to pass several Quality Control (QC) steps
(Supplementary Fig. 16): semi-quantitative western blot analysis for viral ORF expression using
anti-HA antibodies (HA-11 Clone 16B12, Covance), mycoplasma testing (MycoAlert Kit, Lonza),
and sequencing of the integrated viral ORF from genomic DNA post transduction. Sixty-four of
the 75 cell lines initially established passed QC (Fig. 1b and Supplementary Table 1). Following
QC, cell lines were processed for microarray analysis and tandem affinity purification.
For the sequencing QC step, genomic DNA was extracted from all cell lines (DNeasy Blood
and Tissue Kit, Qiagen) and the viral ORFs were amplified with LA Taq (Takara), using the
following vector-specific primer sets:
MSCV-N and MSCV-C Fwd primer: 5’-CGCATGGACACCCAGACCAGGTC-3’
MSCV-N Rev primer: 5’-TCACGACATTCAACAGACCTTGC-3’
MSCV-C Rev primer: 5’-GTCGGGCACGTCGTAGGG-3’
PCR products were extracted from the gel (QIAquick Gel Extraction Kit, Qiagen) and sequenced
with the following primers:
Rozenblatt-Rosen et al., 2012
37
SUPPLEMENTARY INFORMATION
3 2 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
MSCV-N and MSCV-C Fwd primer: 5’-CCCTTGAACCTCCTCGTTCGACC-3’
MSCV-N Rev primer: 5’-GCCAAAAGACGGCAATATGGTGG-3’
RNA isolation and microarray analysis: IMR-90 cell lines expressing viral ORFs at passage 9
were seeded into five replicates at a density of 1 x 106 cells per 10-cm plates and harvested
after 48 hr in RNAlater (Ambion). After isolation of total RNA (RNeasy, Qiagen) RNA integrity
was determined using a Bioanalyzer (Agilent). Gene expression was assayed using Human
Gene 1.0 ST arrays (Affymetrix) according to standard protocols.
Cell proliferation and senescence assays: We measured the growth rate of a subset of IMR-
90 lines expressing the viral ORFs HPV5-E6, HPV8-E6, MCPyV-LTTRUNC, SV40-ST, MCPyV-
ST, HPV6b-E6, HPV18-E6X, C-GFP, N-GFP (two biological replicates), EBV-BGLF3, HPV18-
E6NOX, HPV16-E6, and SV40-LT (two biological replicates) between passages 9 and 12. Cells
were seeded in triplicate onto six 12-well plates (day 0; 2 x 104 cells per well) and cell density
was measured by crystal violet assay on days 1, 4, 7, 9, 11, and 14. All cell density values were
normalized to the value measured at day 1. For senescence assays, IMR-90 cells expressing
viral ORFs between passages 10 and 12 were seeded in triplicate onto 6-well plates and
subjected after 48 hr to a SA-b-gal Senescence Colorimetric Assay (Sigma). Stained cells were
overlaid with 50% glycerol and photographed at 10X magnification under a Nikon Eclipse E300
microscope with a Spot digital camera (Diagnostic Instruments, Inc.). Three non-overlapping
fields were photographed. The number of SA-b-gal positive cells was determined by
examination of the digital images51.
D. HPV E6 oncoproteins and Notch signalling
The U-2 OS cells used in several of the Notch signalling experiments were cultured in DMEM
(Invitrogen) supplemented with 10% FBS (Gemini Bio-products), and 1% Pen Strep (Invitrogen).
Generating pNCMV expression vectors encoding HPV E6 proteins: PCR amplification
products encoding HPV E6 proteins were amplified using B-Actin HPV E6 expression vectors as
Rozenblatt-Rosen et al., 2012
38
W W W. N A T U R E . C O M / N A T U R E | 3 3
SUPPLEMENTARY INFORMATION RESEARCH
a template52. PCR amplification products with a 5’ BamHI and a 3’ BglII restriction enzyme
cleavage sites were directionally cloned into the BamHI site of the pNCMV vector52,53, resulting
in an in-frame fusion of the FLAG and HA epitope tags at the N-terminus of the E6 coding
sequence. This approach was used for all HPV E6 proteins except for HPV18 E6 which has an
internal BamHI site, and was cloned into the pNCMV vector using PCR amplification products
with BglII restriction enzyme cleavage site at both the 5’ and 3’. All constructs were sequence
validated.
Transfection, cell lysates, immunoprecipitation, immunoblotting, and antibodies: U-2 OS
cells were seeded onto 15 cm plates (2 x 106) and transfected with 12.5 micrograms of a control
pNCMV vector or of a pNCMV vector encoding the indicated HPV E6 proteins using
XtremeGENE 9 (Roche). Cells were harvested 48 hours post transfection. Both IMR-90 cell
lines expressing viral ORFs and transfected U-2 OS cells were lysed in EBC buffer54 (50 mM
Tris HCl, pH 8.5, 150 mM NaCl, 0.5% NP-40, 0.5 mM EDTA, proteinase and phosphatase
inhibitors). Extracts were cleared at 14,000 rpm for 15 min. For immunoprecipitations, equal
amounts of cell extracts were incubated for 4 hr at 4°C with 30 µ l of anti-HA agarose (Sigma).
The beads were washed four times with EBC buffer and resuspended in sample loading buffer
(Bio-Rad). Proteins were resolved by SDS-PAGE (CriterionTM TGXTM precast gels, Bio-Rad) and
subjected to immunoblot analysis with one of the following commercially available antibodies:
anti-p300 (A-300-358A, Bethyl Laboratories), anti-MAML1 (4608S, Cell Signaling Technology or
A-300-672A, Bethyl Laboratories), anti-E6AP (H-182, sc-25509, Santa-Cruz Biotechnology),
anti-vinculin (V9131, Sigma), and anti-HA (HA-11 Clone 16B12, Covance). After incubation with
the appropriate secondary antibodies, antigen/antibody complexes were detected by enhanced
chemiluminescence (SuperSignal, Pierce).
Reverse transcription-quantitative PCR (RT-qPCR) analyses: Total RNA isolated from IMR-
90 cells expressing viral ORFs and total RNA isolated (RNeasy, Qiagen) from IMR-90 cells
expressing MAML1 shRNA or control was converted to cDNA with a QuantiTect Reverse
Rozenblatt-Rosen et al., 2012
39
SUPPLEMENTARY INFORMATION
3 4 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
Transcription Kit (Qiagen). This cDNA preparation was diluted five-fold, and 1 µl was used per
reaction. Real-time PCR quantification, was done in triplicate, using the Brilliant III Ultra Fast
SYBR Green QPCR Master Mix (Agilent technologies), and a Stratagene MX3005 instrument.
The qPCR primer pairs for the human genes HES1 (HP208643), DLL4 (HP213393), and
MAML1 (HP211129) were obtained from OriGene. GAPDH (HP205798) was used as the
internal reference standard. We used the 2(-ΔΔ
Ct) method55 to quantify transcript levels.
RNA interference: For lentivirus-mediated RNA interference, the lentiviral construct targeting
MAML1 (SHCLNG-NM_014757, TRCN0000003353, Sigma) [5’-
CCGGCATGATACAGTTAAGAGGAATCTCGAGATTCCTCTTAACTGTATCATGTTTTT-3’] or
the control empty vector pLKO.1ps (gift of William Hahn) were transfected into HEK293FT cells
according to published protocols56. Puromycin selection (2 µg/ml) began two days after infecting
IMR-90 cells with lentiviruses and was maintained thereafter. IMR-90 cells transfected at
passage 9 with MAML1 shRNA or control pLKO.1ps vectors were seeded (1 x 106) into five
replicates (5 x 10-cm plates) and harvested after 48 hr in RNAlater (Ambion). After isolation of
total RNA (RNeasy, Qiagen) RNA integrity was determined using a Bioanalyzer (Agilent). Gene
expression was assayed using Human Gene 1.0 ST arrays. For siRNA-mediated RNA
interference ON-TARGETplus SMARTpool siRNA (Thermo Scientific) targeting MAML1 (L-
013417-00; sequences below), or control ON-TARGETplus Non-targeting siRNA (D-001810-10-
20; sequences below) were transfected into cells using Lipofectamine RNAiMAX (Invitrogen).
Cells were harvested 72 hours post-transfection.
siRNA J-013417-06, MAML1 GUUAGGCUCUCCACAAGUG siRNA J-013417-07, MAML1 GGCAUAACCCAGAUAGUUG siRNA J-013417-08, MAML1 GCAGCUGUCCAUAUAAGU siRNA J-013417-09, MAML1 UCGAAGACCUGCCUUGCAU siRNA D-001810-01, non-target UGGUUUACAUGUCGACUAA siRNA D-001810-02, non-target UGGUUUACAUGUUGUGUGA siRNA D-001810-03, non-target UGGUUUACAUGUUUUCUGA siRNA D-001810-04, non-target UGGUUUACAUGUUUUCCUA
Rozenblatt-Rosen et al., 2012
40
W W W. N A T U R E . C O M / N A T U R E | 3 5
SUPPLEMENTARY INFORMATION RESEARCH
Luciferase Reporter Assays: U-2 OS cells were seeded in triplicate onto 6-well plates (2 x 105
cells per well). Each well was transfected with 10 nanograms of pGK Renilla, 0.5 micrograms of
pGL2 mHes1-Luc, 0.25 microgram of pcDNA3 2HA-ICN1 (gift of Elliot Kieff) and increasing
concentrations (25, 50, and 100 nanograms) of the NCMV vector control or NCMV encoding the
indicated HPV E6 proteins using X-tremeGENE 9 (Roche). The total amount of DNA per
transfection was brought up to 2 microgram per well by adding salmon sperm DNA. After 48
hours cells were lysed with Passive Lysis Buffer (Promega), and subjected to the Dual-
Luciferase Reporter Assay (Promega). Luciferase activity was measured with a LMax II384
microplate reader (Molecular Devices), and the Renilla luciferase was used as the internal
reference standard.
E. Tandem Affinity Purifications (TAP) followed by Mass Spectrometry (MS)
IMR-90 fractionation and protein extract preparation: For each viral ORF tested the
corresponding IMR-90 cells were seeded into twelve to eighteen 15 cm dishes to produce ~2 to
4 x 108 cells by harvesting time. Cells were washed in situ with ice-cold PBS, harvested using a
cell lifter and collected by centrifugation at 500 x g for 5 min at 4°C. Subcellular fractions were
obtained by differential digitonin fractionation57 with the following modifications. To prepare
nuclear and cytoplasmic fractions, cell pellets were resuspended in five pellet volumes of ice-
cold digitonin extraction buffer (0.015% digitonin, 300 mM sucrose, 100 mM NaCl, 10 mM
PIPES, 3 mM MgCl2, pH 6.8) containing protease inhibitors (Roche) and then incubated end-
over-end for 10 min at 4°C. Triton X-100 was added to a final concentration of 0.5% and the cell
suspension was vortexed for 10 seconds. Triton X-100 extracted cells were then divided equally
between three tubes and centrifuged at 1000 x g for 10 min at 4°C. The supernatants (crude
cytoplasmic fraction) were transferred to fresh tubes apart from the nuclear pellets. Both
fractions were flash frozen in liquid nitrogen and stored at -80°C. To prepare nuclear and
Rozenblatt-Rosen et al., 2012
41
SUPPLEMENTARY INFORMATION
3 6 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
membrane fractions, cell pellets were resuspended in five pellet volumes of ice-cold digitonin
extraction buffer and then incubated end-over-end for 10 min at 4°C. The cell suspension was
centrifuged at 500 x g for 10 min at 4°C. The resulting pellet was resuspended in five pellet
volumes of Triton X-100 extraction buffer (0.05% Triton X-100, 300 mM sucrose, 100 mM NaCl,
10 mM PIPES, 3 mM MgCl2, pH 7.4) containing protease inhibitors and then incubated end-
over-end for 30 min at 4°C. The suspension was divided equally into three tubes and
centrifuged at 5000 x g for 10 min at 4°C. The supernatant (crude membrane fraction) was
transferred to fresh tubes and flash frozen in liquid nitrogen, apart from the nuclear pellets,
which were also flash frozen. Before affinity purification the composition of the cytoplasmic and
membrane fractions was adjusted to 150 mM NaCl, 50 mM Tris pH 7.5. The composition of the
cytoplasmic fraction was additionally adjusted by adding NP40 to a final concentration of 0.05%.
TAP of viral protein complexes: Viral proteins were purified from 6 to 9 plate-equivalents of
nuclear and cytoplasmic (or nuclear and membrane) fractions by sequential FLAG and HA
immunoprecipitation12,58. Tandem affinity purifications were typically done in batches of 2 to 6
cell lines expressing viral ORFs and included empty vectors or GFP controls cells. For FLAG
purification 40 ml of FLAG agarose bead slurry (FLAG-M2 agarose, Sigma) was added to
cellular extracts, which were then incubated for 4 hr at 4°C under constant end-over-end mixing.
The FLAG-agarose beads were collected by low speed centrifugation and washed three times
with 500 ml of cold FLAG IP buffer (150 mM NaCl, 50 mM Tris, 1 mM EDTA, 0.5% NP40, 10%
glycerol and protease inhibitors) and once with 500 ml of HA-IP buffer (150 mM NaCl, 50 mM
Tris, 1 mM EDTA, 0.05% NP40, 10% glycerol and protease inhibitors). FLAG-purified
complexes were recovered by two successive 30 min elutions with 20 ml FLAG elution buffer
(400 µg /ml of FLAG peptide in HA IP buffer) at 4°C under constant vortexing. To prevent FLAG
agarose bead carry-over, pooled eluates were filtered through a pre-wetted 0.45 mm filter
before the subsequent HA purification step. For HA purification 20 ml of HA agarose bead slurry
(HA-F7 agarose conjugate, Santa Cruz) was added to cellular fractions, which were then
Rozenblatt-Rosen et al., 2012
42
W W W. N A T U R E . C O M / N A T U R E | 3 7
SUPPLEMENTARY INFORMATION RESEARCH
incubated overnight at 4°C under constant vortexing. HA agarose beads were collected by low
speed centrifugation and washed three times with 200 ml of cold HA-IP buffer. HA-purified
complexes were recovered by two successive 30 min elutions in 20 ml HA elution buffer (400 µg
/ml of 6xHis-HA peptide in HA IP buffer) at 4°C under constant vortexing. To prevent HA
agarose bead carry-over, pooled eluates were filtered through a pre-wetted 0.45 mm filter
before analysis. A 15% aliquot of each HA eluate was resolved on 4-12% acrylamide gels run in
MOPS buffer (NuPAGE, Invitrogen), and viral and associated host proteins were visualised by
silver-stain (Supplementary Fig. 17).
Biological replicates: We repeated the tandem affinity purification process for 144 (95%) of the
initial 152 sub-cellular fractions used for the first iteration (TAP1). This biological repeat (TAP2)
was initiated several months after the completion of TAP1.
Sample digestion of viral-protein multi-component complexes: Purified viral proteins and
associated host proteins were denatured and reduced by incubation at 56°C for 30 min in 10
mM DTT and 0.1% RapiGest (Waters). Reduced cysteines were alkylated by adding
iodoacetamide to a final concentration of 20 mM and then incubated at room temperature for 20
min in the dark. Protein digestion was carried out overnight at 37°C after adding 2 mg of trypsin
and adjusting the pH to 8.0 with a 1 M Tris solution. For tryptic peptide clean-up RapiGest was
inactivated following the protocol of the manufacturer. Tryptic peptides were purified by batch-
mode reverse-phase C18 chromatography (Poros 10R2, Applied Biosystems) using 40 ml of a
50% bead slurry in RP buffer A (0.1% trifluoroacetic acid), washed with 100 ml of the same
buffer and eluted with 50 ml of RP buffer B (40% acetonitrile in 0.1% trifluoroacetic acid). After
vacuum concentration, peptides were further purified by strong cation exchange SCX
chromatography (Poros 20HS, Applied Biosystems) using 20 ml of a 50% bead slurry in SCX
buffer A (25% ACN in 0.1% formic acid), washed with 20 ml of the same buffer and eluted with
20 ml of SCX buffer B (25% ACN, 300 mM KCl in 0.1% formic acid).
Rozenblatt-Rosen et al., 2012
43
SUPPLEMENTARY INFORMATION
3 8 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
Primary LC-MS/MS data acquisition: After vacuum concentration, half of each sample was
loaded onto a pre-column (100 µm I.D. x 4 cm; packed with POROS 10R2, Applied Biosystems)
at a flow rate of 4 µl/min for 15 min, using a NanoAcquity Sample Manager (20 µl sample loop)
and UPLC pump59. After loading, the peptides were gradient eluted (1-30% B in 45 min; buffer
A: 0.2 M acetic acid, buffer B: 0.2 M acetic acid in acetonitrile) at a flow rate of ~50 nl/min to an
analytical column (30 µm I.D. x 12 cm; packed with Monitor 5 µm C18 from Column
Engineering, Ontario, CA), and introduced into an LTQ-Orbitrap XL mass spectrometer
(ThermoFisher Scientific) by electrospray ionization (spray voltage = 2200V). Three rapid LC
gradients were included at the end of every analysis to minimize peptide carry-over between
successive analyses and to re-equilibrate the columns. The mass spectrometer was
programmed to operate in data dependent mode, such that the top eight most abundant
precursors in each MS scan (detected in the Orbitrap mass analyzer, resolution = 60,000) were
subjected to MS/MS resolution (CAD, linear ion trap detection, collision energy = 35%,
precursor isolation width = 2.8 Da, threshold = 20,000). Dynamic exclusion was enabled with a
repeat count of 1 and a repeat duration of 30 sec. The second half of each sample was
reanalyzed by LC-MS/MS after the completion of the initial set of analyses (“Technical
Replicates”).
Database searching: Orbitrap raw data files were processed within the multiplierz software
environment60. MS spectra were recalibrated using the background ion (Si(CH3)2O)6 at m/z
445.12 +/- 0.03 and converted into a Mascot generic format (.mgf). MS spectra were searched
using Mascot version 2.3 against four appended databases of: i) human protein sequences
(downloaded from RefSeq on 07/11/2011); ii) viral proteins analyzed in this study; iii) common
lab contaminants and iv) a decoy database generated by reversing the sequences from the
human database. Search parameters specified a precursor ion mass tolerance of 1 Da, a
product ion mass tolerance of 0.6 Da, trypsin with a maximum of two missed cleavages,
Rozenblatt-Rosen et al., 2012
44
W W W. N A T U R E . C O M / N A T U R E | 3 9
SUPPLEMENTARY INFORMATION RESEARCH
variable oxidation of methionine (M, +16 Da), and carbamidomethylation of cysteines (C, +57
Da).
Mascot search results from all experimental runs were separately imported for TAP1 and
TAP2 into two multiplierz mzResults files (.mzd)61 for further processing: The list of peptide hits
from the Mascot searches was filtered to exclude peptides with precursor mass error greater
than 5 ppm. Sequence matches to the decoy database were used to establish a 1% false
discovery rate (FDR) filter for the resulting peptide identifications. A rapid peptide matching
algorithm62 was then used to map peptide sequences to all possible human Entrez Gene IDs.
Only peptides that were uniquely assignable to a single Entrez Gene ID were considered as
detection evidence. For each viral-host protein association, peptide evidences for a given target
human protein were combined across different subcellular fractions and LC-MS/MS technical
replicates, when applicable. Any Entrez Gene ID present in the “Tandome” (Supplementary
Table 15) was automatically removed from the list of putative interactors.
TAP-LC-MS quality controls: We used a multi-tiered approach to minimize the occurrence of
peptide carryover across LC-MS analyses: 1) samples within each batch of LC-MS analyses
were injected in order of increasing abundance and complexity, as evaluated by SDS-
PAGE/silver stain visualisation of the corresponding TAPs; 2) three rapid LC gradients were run
at the end of every LC-MS analysis; 3) for all LC-MS analyses, we systematically evaluated run-
to-run carry-over by calculating extracted ion chromatogram (XIC) intensities of peptides derived
from viral proteins. 4) samples within each purification batch, and LC-MS analysis sequence
were randomized between biological replicates (TAP1 and TAP2). This multi-step quality control
process led us to discard all LC-MS data for EBV-BFRF2 and EBV-BRRF2 from our TAP2 (and
as a consequence from the final data set, since we only report proteins identified reproducibly
across TAP1 and TAP2), and for two GFP negative controls. Because we identified cross
contamination between HPV8 E6 and HPV11 E6 at an early point of our project, we were able
to repeat the TAP and LC-MS analyses to generate a “clean” data set for bio-replicates.
Rozenblatt-Rosen et al., 2012
45
SUPPLEMENTARY INFORMATION
4 0 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
Pathway visualisation and analysis: Protein associations detected between HPV-E6 and
HPV-E7 oncoproteins and host proteins were imported into Pathway Palette for data analysis
and visualisation63. For each of these data sets, we first calculated the probability that the
observed set of human proteins targeted by multiple E6 or E7 across different HPV types
(HPV5, HPV6b, HPV8, HPV11, HPV16 and HPV18) could occur by chance (Fig. 1d and
Supplementary Fig. 4): for each viral protein of a given HPV type, the same number of
associated human proteins as that experimentally observed in our data was randomly chosen
from our HI-2 database of human binary interactions
(http://interactome.dfci.harvard.edu/H_sapiens/index.php). This process was repeated 1,000
times. The number of human proteins associated with two or more viral proteins from different
HPV types was calculated in each simulated network and the frequency distribution of random
graphs with a given number of shared nodes was computed. Lastly, this distribution was used to
calculate the probability that the number of human proteins targeted by multiple HPV types
observed in our experimental graphs (40 for E6 and 25 for E7) could occur by chance. We also
calculated the probability of detecting human proteins targeted by two viral proteins from HPVs
of the same class: the total number of human proteins targeted by viral proteins from pairs of
HPVs of the same class ([HPV5 and HPV8], [HPV6b and HPV11], [HPV16 and HPV18]) was
extracted from our TAP-MS data. This value was divided by the total number of human proteins
that could theoretically be shared across all pairs of HPV from the same class in our observed
networks. The same ratio was calculated for human proteins targeted by two viral proteins from
HPVs of different classes. The ratio of these two ratios (4.9 for E6 and 1.05 for E7, indicated by
green arrows in the frequency distribution on the right) represents the overall enrichment of
human proteins targeted by proteins encoded by HPVs of the same versus different class
observed in our graphs. One thousand randomized networks were created as before and used
to evaluate the probability that the observed ratio could occur by chance (Fig. 1d and
Supplementary Fig. 4). Virus-host interactions detected by TAP-MS are available at
Rozenblatt-Rosen et al., 2012
46
W W W. N A T U R E . C O M / N A T U R E | 4 1
SUPPLEMENTARY INFORMATION RESEARCH
http://interactome.dfci.harvard.edu/V_hostome/index.php?page=home. Graphs of protein
interaction networks for each viral protein, including the silver stain image of a representative
tandem affinity purification, are accessible at: http://blaispathways.dfci.harvard.edu/Palette.html.
F. Subtracting likely non-specific protein associations (“Tandome”)
Twenty five control TAP experiments were run at multiple points during the entire project. These
controls consisted of FLAG-HA purifications from nuclear, cytoplasmic or membrane extracts
prepared from parental IMR-90 cells, from IMR-90 expressing FLAG-HA-tagged GFP or from
IMR-90 cells transduced with the empty NTAP or CTAP vectors. An additional 83 control TAP
experiments consisted of FLAG-HA or FLAG-StrepTactin purifications on HeLa subcellular
extracts. Any Entrez Gene ID identified in more than 1% of the control TAP experiments
(whether or not the Gene ID was detected by a uniquely assignable peptide sequence) was
included in the “Tandome” list. Any of the 679 Gene IDs present in the Tandome was
systematically removed from all viral protein interaction TAP data sets (Supplementary Table
15).
G. TAP reproducibility
The reproducibility rate of TAP-MS across the two biological replicates (TAP1 and TAP2)
corresponds to the percentage of viral-host protein associations detected in one TAP (the
“reference TAP”) at a given uniquely assignable peptide threshold, which was confirmed in the
other (the “repeat TAP”). This percentage was calculated for both permutations in which the
“reference TAP” and the “repeat TAP” were successively (TAP1 and TAP2) and (TAP2 and
TAP1), respectively. The average reproducibility rate was also calculated. As expected,
reproducibility increased as a function of the number of unique peptides detected
(Supplementary Fig. 13).
Rozenblatt-Rosen et al., 2012
47
SUPPLEMENTARY INFORMATION
4 2 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
H. Virus-human Positive Reference Set (PRS)
The VirusMINT database curates from the literature viral-host interactions for 10 disease-
relevant virus groups64. Each curated interaction gets an evidence count, where multiple
evidences could be multiple publications, or multiple methodological verifications, reporting the
same interaction. To generate a Positive Reference Set (PRS) of high confidence interactions,
we only collected interactions supported by two or more evidences in VirusMINT, resulting in a
master list of 134 interactions. Given the different types of associations detected by Y2H and
TAP-MS, a specific PRS was designed for each one of these data sets (Supplementary Table
16). For the Y2H-specific PRS, we first selected host proteins with an available ORF clone in
the human ORFeome 5.1 collection (http://horfdb.dfci.harvard.edu/), resulting in a list of 94 viral-
host protein interactions. This list was filtered to remove interactions involving known
autoactivators in the yeast two-hybrid assay (Supplementary Table 14) and interactions that
were not tested in the yeast two-hybrid screen. The resulting Y2H-specific PRS contains 62
interactions. Starting with the master list of 135 interactions, the TAP-specific PRS was created
by removing interactions that involved viruses that were not tested in the TAP-MS assay and
interactions that involved a human protein present in our “Tandome” (Supplementary Table 15),
resulting in a TAP-specific PRS of 94 interactions (Supplementary Table 16).
I. Pathway enrichment analysis
Pathway enrichment used FuncAssociate 2.0 (http://llama.mshri.on.ca/funcassociate/)65, which
performs a Fisher’s Exact Test and compares the observed P-value for the query set to the
minimum P-value obtained from Monte Carlo sampling of the appropriate gene sample space.
When the query set of genes arose from multiple sample spaces, a Python script of the
FuncAssociate approach was used to independently sample from each space so that the
random query sets matched the original set in size and composition. KEGG pathway
annotations were downloaded from http://www.genome.jp/kegg/download/. Gene Ontology
Rozenblatt-Rosen et al., 2012
48
W W W. N A T U R E . C O M / N A T U R E | 4 3
SUPPLEMENTARY INFORMATION RESEARCH
annotations were downloaded from the FuncAssociate 2.0 website. For all analyses we
considered only specific pathways and functional terms, defined here as those which have been
annotated with fewer than 1000 genes. All P-values were adjusted for testing of all pathways in
the GOA or KEGG databases (Supplementary Table 17), including the reported P-values for
apoptosis pathway members. Additionally, pathway enrichment of TAP-MS data sets were
assessed at unique peptide thresholds ranging from 1 to 5, with the resulting P-values corrected
for the five hypotheses tested, using the Bonferroni method. Odds ratios were calculated using
2x2 contingency tables.
J. Microarray preprocessing and differential expression
Microarray data analysis used R/Bioconductor (http://www.bioconductor.org). Data was first
normalized by robust multiarray averaging (RMA) using a custom Chip Description File (CDF)
from the Michigan Microarray Lab (http://brainarray.mbni.med.umich.edu, Version 13). The
Human Gene 1.0 ST array from Affymetrix contains 764,885 probes, with 26 probes located
across the full length of each annotated gene from the March 2006 human genome sequence
assembly. The Michigan Microarray Lab CDF reanalyzes and matches each probe sequence to
the most current gene annotation, producing summarized expression values for 19,628 genes
with Entrez Gene IDs. Using this custom CDF rather than the original Affymetrix annotation
ensured a more accurate estimate of gene expression levels66.
A total of 435 microarrays were used to profile cell lines expressing 63 viral ORFs, as well
as controls. The arrays were processed in eight groups (Supplementary Table 18). Note that
each group includes cell lines expressing proteins from a variety of viruses, five of the groups
include control cell lines, and the order of the arrays was randomized. These preemptive
measures were taken in order to eliminate potential systematic biases that could affect our
results and interpretation. The normalized data showed a slight propensity to cluster according
to the batch covariate (Supplementary Fig. 18). We used ComBat67 to reduce this batch effect.
Rozenblatt-Rosen et al., 2012
49
SUPPLEMENTARY INFORMATION
4 4 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
ComBat applies a linear model to the expression levels, with factors corresponding to global
expression, sample-dependent expression, additive/multiplicative batch effects and noise. It
then synthesizes information across genes using an empirical Bayes framework, and thus
estimates and removes batch effects (Supplementary Fig. 18). The R package limma was used
to test for differential expression between each of the 63 conditions and GFP-control arrays.
Multiple testing adjustments were carried out with the Benjamini-Hochberg method.
K. Microarray gene and sample clustering
We used the R package limma to test for differential expression across all pairwise comparisons
between biologically distinct samples, including the control. The top 2,944 genes were selected
by ranking all genes on the array by the number of significant comparisons (Padj < 0.005;
Supplementary Table 19). This transcriptome profiling data is available at
http://interactome.dfci.harvard.edu/V_hostome/index.php?page=home. We applied the R
package mclust to carry out model-based clustering on the top 2,944 genes. We used the hmap
function in the seriation R package to generate the heatmap representation of mean fold-
change of expression of genes within each cluster.
L. Predicting cell-specific transcription factor binding sites
Position weight matrices (PWMs) corresponding to mammalian transcription factors were
obtained from the TRANSFAC 7.0, Jaspar, and UniProbe databases. We supplemented
available PWMs by downloading experimentally determined genome-wide transcription factor
binding sites from the GEO and the UCSC genome databases. We conducted de novo motif
discovery using the MEME 4.3 and Weeder 1.4.2 software tools. We then mapped motifs to
genes using either the keys provided by the databases or based on the antibody used for the
ChIP experiment. A total of 610 genes were identified for which at least one PWM was
available. Of these 361 had at least one “high-quality” PWM, defined as being derived from
Rozenblatt-Rosen et al., 2012
50
W W W. N A T U R E . C O M / N A T U R E | 4 5
SUPPLEMENTARY INFORMATION RESEARCH
either a ChIP-seq or ChIP-chip experiment or from a universal protein binding microarray
(PBM).
Database Number of PWMs Transfac: http://www.biobase-international.com/ 709 Jaspar: http://jaspar.genereg.net/ 721 Uniprobe (not in Jaspar): http://the_brain.bwh.harvard.edu/uniprobe/ 26 UCSC: http://genome.ucsc.edu/ or GEO: http://www.ncbi.nlm.nih.gov/geo/
67
Clustering of position weight matrices: Our identification of regulatory variants depends on
correctly mapping genes to motifs, although our integrative approach can tolerate some errors.
We took a computational approach to improving the quality of the gene to motif mapping. For
genes for which a “high-quality PWM” was available, we eliminated any other PWM that showed
no resemblance. To do so systematically, we hierarchically clustered all PWMs. First 100,000
random 30-mers were generated. Then for each of the ~1500 PWMs, the maximum match
score (MMS) for overlap between the motif and 30-mer was generated by comparing the
observed frequency fbj of base b at each position j in the PWM with the background frequency of
that base pb in the genome and summing over the length of the motif:
The matrix of 30-mer x PWM was hierarchically clustered using the hclust function in R, with
average linkage and the Pearson correlation coefficient used as the metric for similarity. A
coarse clustering cutpoint (height = 0.85, resulting in 232 total PWM clusters) was used to
broadly group together PWMs with some evidence of similarity. For genes having at least one
high quality PWM, we eliminated all PWMs that failed to cluster with any of these (although we
retained PWMs that represented dimers of the high quality motif). This approach rejected 113
gene-PWM pairs.
Rozenblatt-Rosen et al., 2012
51
SUPPLEMENTARY INFORMATION
4 6 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
Selection of motifs for genome-wide binding site prediction and enrichment analysis: We
selected individual motifs for subsequent analysis with the goal of providing at least one high
quality motif for every gene. For each of the 610 genes we selected all motifs derived from a
ChIP-seq, ChIP-chip or protein binding microarray experiment. If any other motif existed (i.e.
from Transfac or Jaspar), we included it if it appeared to correspond to a dimer or half-site not
represented in the otherwise selected high-quality motifs. All motifs were selected prior to
further analysis for genes without any high-quality PWM. A total of 841 motifs corresponding to
573 TFs satisfied the above criteria and the minimum match score of 12.
M. Transcription factor binding site (TFBS) enrichment analysis
IMR-90-specific TFBS were predicted by identifying binding sites with an MMS≥12 within
accessible chromatin regions. Accessible chromatin regions were identified using IMR-90-
specific genome-wide DNase-seq data downloaded in wig file format from the GEO database
(GSM530665, GSM530666). Wig files were processed using the BEADS packages68 to correct
for systematic biases in GC-content and ability to map reads. Corrected counts were smoothed
and binned into 250 bp groups. Bins with a significant number of counts were identified by
assuming that such counts follow a Poisson distribution. The Benjamini-Hochberg procedure
was used to estimate false discovery rates and contiguous bins with FDR < 0.001 were merged
into peaks. Only peaks observed within both replicate IMR-90 data sets were used for
subsequent analyses.
The enrichment of TFBS within a given expression cluster was computed for each TF by
counting the number of TFBS in DNase accessible regions located within 25 kb of the
transcription start site of any transcript within this cluster and comparing the count to a null
distribution of TFBS from length- and GC-content matched sets of mappable genomic regions.
For each cluster, nearby DNase-seq peaks were assembled into a single file and the peaks
trimmed to common lengths of 250, 500, 750, 1000, 1250, 1500, 1750, 2000, 2250 or 2500 bp.
Rozenblatt-Rosen et al., 2012
52
W W W. N A T U R E . C O M / N A T U R E | 4 7
SUPPLEMENTARY INFORMATION RESEARCH
Next, 200 sets of length-matched “control” DNase-accessible segments were sampled from the
random genome segments. For each motif, the number of TFBS for each set of 200 sequences
was computed, and the values fitted to a Poisson distribution. This null distribution was used to
estimate the P-value for the actual number of TFBS within IMR-90 cells. The enrichment P-
values for all TF motifs within each cluster were combined and a tail-based false discovery rate
was estimated using the Benjamini-Hochberg procedure69. TFs with motifs significant at an FDR
of 0.5% or less were selected for further analysis. In total, we found 239 TFs to be enriched for
target genes in one or more clusters.
N. Randomization of gene clusters for enrichment analyses
To assess the biological coherence of identified gene clusters (Supplementary Tables 5 and 6)
we computed the random expectation of enriched KEGG pathways and GO terms. For 100
iterations, we randomly reassigned the 2,944 genes to 31 clusters matched in size to those
observed and repeated the enrichment analysis, counting at each iteration the average number
of significant KEGG pathways and GO terms per cluster. The computationally more intensive
GO term enrichment was performed by counting the average number of nominally significant
pathways (P < 0.05) per cluster. We found that there were more GO and KEGG annotations
than expected by chance (P < 0.01), underscoring the functional coherence of these clusters
(Supplementary Fig. 19).
O. Evaluating predictive power of regulatory cascades
We computed scores for the viral ORF-TF-cluster network in the following way: for each viral
ORF, we first identified all genes for which we would expect differential expression. To do so,
we identified all TFs that had a physical association with the protein encoded by the viral ORF
or was differentially expressed in response to its expression, considering fold changes greater
than 1.2 at a Padj < 0.001. For each of these TFs, we then identified all microarray clusters
enriched for genes containing the corresponding binding site within their promoters. Within
Rozenblatt-Rosen et al., 2012
53
SUPPLEMENTARY INFORMATION
4 8 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
these clusters, all genes with a high-probability binding site for that TF in their promoter were
selected. The union of these genes across all TFs targeted by a given viral ORF represents the
set of genes that would be expected to show differential expression upon transduction of this
viral ORF. We chose as the meaningful statistic the fraction of these target genes that were
observed to be differentially expressed on the corresponding array (i.e. where that viral ORF
was transduced) and then computed the average over all viral ORFs. A null distribution was
created by computing the same score for 1,000 random permutations of the 63 array conditions
(Fig. 2b). We used Cytoscape 2.8.1 to visualise the network70. The resulting network
(Supplementary Fig. 9) illustrates how viral proteins may regulate DNA damage response, as
well as core functions of fibroblasts, like proliferation and cell adhesion via multiple TFs.
P. Building a probabilistic map of IMR-90 specific RBPJ binding sites
To build a map of IMR-90 specific TF binding sites, we followed a previously described
probabilistic framework71. The approach combines PWM information with complementary
genomic and experimental data in a mixture model, and outputs a posterior probability of DNA
binding. Features of interest include chromatin accessibility (as obtained through genome-wide
DNase-seq number or histone mark ChIP-seq experiments), multi-species sequence
conservation, and distance from the nearest gene transcription start site.
We downloaded publicly available data from diverse sites. FASTA files corresponding to the
primary sequence of the complete human genome (hg19) were downloaded from the UCSC
FTP site. IMR-90-specific chromatin accessibility data was generated as in “Transcription factor
binding site (TFBS) enrichment analysis”. We collected individual base conservation information
from the phastCon tables in the UCSC genome browser. To identify transcription start sites for
all RefSeq IDs we downloaded the refGene.txt file from the UCSC Genome Browser FTP site.
Using the data obtained for each of the RBPJ PWM, we proceeded as follows:
Rozenblatt-Rosen et al., 2012
54
W W W. N A T U R E . C O M / N A T U R E | 4 9
SUPPLEMENTARY INFORMATION RESEARCH
1. Match score: the complete genome was scanned to compute a match score at each
base pair. All matches with a score ≥ 12 (an arbitrary threshold for similarity) were
included as potential binding sites.
2. Chromatin accessibility score: we defined a 200 bp “chromatin accessibility window”
around each potential binding site from (1) by assigning the total number of DNase-seq
reads in the 100 bp upstream and downstream of the motif starting position.
3. Nearest transcription start site: we identified the nearest transcription start site to each of
the potential binding sites.
4. Conservation score: we averaged the phastCon conservation score over the length of
the motif and assigned this average score to each potential binding site.
We then used the CENTIPEDE R package71 to infer an IMR-90-specific probability of RBPJ
binding at each position in the genome.
To generate a short list of Notch pathway transcriptional targets, we took the union of Notch
pathway members (as defined in KEGG) and potential Notch target genes72. From this set, we
chose the genes containing an IMR-90-specific high probability RBPJ binding site in their
promoters to generate the heatmap in Fig. 3b.
Q. Notch pathway enrichment analysis
Enrichment of viral protein targets for Notch pathway members compared the number of protein
complex associations or direct interactions between viral proteins and Notch pathway members
(as defined by KEGG Pathways) to the number of interactions selected at random for each viral
protein. The sampling methods have been defined in “HI-2 and analysis of overlaps between
Y2H, TAP-MS and PRS”.
R. Identification of loci implicated in familial and somatic cancer
The COSMIC Classic list of causal genes with somatic mutations observed in tumours was
downloaded from http://www.sanger.ac.uk/genetics/CGP/Classic/. To assemble the genome-
Rozenblatt-Rosen et al., 2012
55
SUPPLEMENTARY INFORMATION
5 0 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
wide association loci associated with cancer we consulted the publicly available Catalog of
Published Genome-Wide Association Studies (http://www.genome.gov/gwastudies/) and
selected all variants associated with cancer. We grouped variants showing evidence of linkage
disequilibrium and, for the resulting 107 loci, selected as the index SNP the most highly
associated variant for any phenotype.
We next mapped SNPs to neighboring genes, a task for which there is no consensus
procedure73. We first defined the boundaries in which SNPs tagged by the index variant may
reside. To do so, we identified all SNPs that were in close linkage disequilibrium (R2 ≥ 0.5) with
the index SNP. We then marched outward from the outermost SNP positions to the nearest
recombination hotspot74 (downloaded from the UCSC genome browser). We assumed that any
SNP within the region bounded by the two recombination hot spots may be a causal variant
responsible for the original association. Since variants within enhancers can act at considerable
distance from their physical position, we also considered all genes within 250 kb from the
hotspot boundaries. When no genes resided within these boundaries, we moved outwards in 50
Kb intervals until a gene was identified (Supplementary Table 20).
To compile SCNA loci implicated in somatic forms of cancer we consulted a recent
compendium27 of loci derived from SNP arrays of over 3,000 tumour samples. Physical
boundaries for the SCNAs were used to map each locus to genes contained fully within the
SCNA interval (Supplementary Table 21).
S. Testing enrichment of gene sets for cancer genes
We followed the simple hypothesis that the viral interactome is enriched in cancer genes and
looked for overlap between viral interaction partners and cancer genes highlighted in the
COSMIC Classic list (Supplementary Table 8). The number of overlapping genes was
compared to that obtained from 10,000 random gene sets matched in size and composition to
the query set.
Rozenblatt-Rosen et al., 2012
56
W W W. N A T U R E . C O M / N A T U R E | 5 1
SUPPLEMENTARY INFORMATION RESEARCH
T. Viral target overlap with candidate cancer genes identified by transposon screens
We identified four “Sleeping Beauty” transposon-based murine insertional mutagenesis screens
focused on finding genetic determinants of tumorigenesis in murine cancer models. These
studies correspond to murine models of colon cancer with75,76, or without77 mutations of the Apc
tumour suppressor, and pancreatic carcinoma78. The list of candidate cancer genes was
extracted from the main text or supplementary data of each manuscript and pooled to form a
single set of 1,359 genes (Supplementary Table 10). As expected, the candidate genes
identified through insertional mutagenesis exhibited a marked bias towards long genes. We also
noted that genes in this candidate set tend to be highly expressed based on our IMR90
microarray data.
We assessed the overlap between the Sleeping Beauty candidate genes and the VirHost
set by permutation. The analysis was limited to the universe of human genes with mouse
orthologues (downloaded from the Mouse Genome Database). We corrected for the bias
observed in gene expression and gene length by binning genes into quartiles based on mRNA
expression and based on gene length using the longest distance between the transcription start
site and 3’ end of all transcripts corresponding to a single gene. A total of sixteen bins were
created, corresponding to all combinations between bins of expression and gene length. 1,249
Sleeping Beauty candidate genes and 900 VirHost genes could be assigned to one of these
bins, with 156 genes common to the two sets. To assess the chance expectation for overlap,
random sets of 1,249 genes were selected that matched the original query set in gene length
and expression composition. Ten thousand iterations were used and the empirical P-value
determined to be the fraction of random gene sets with at least 156 overlapping genes.
We also identified an insertional mutagenesis screen completed to find genetic determinants
of leukemia using the Maloney Murine Leukemia virus79. The candidate gene set was
downloaded from the Supplementary Data, mapped to human orthologues and tested for
Rozenblatt-Rosen et al., 2012
57
SUPPLEMENTARY INFORMATION
5 2 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
overlap against the VirHost set. A significant overlap (20 out of 213 genes, P = 0.048) was
found.
U. Comparison of viral interactome and prioritization of cancer genes
Catalogues of somatic mutations from genome-wide cancer sequencing studies were obtained
from the supplementary tables of nine recently published studies80-88. The deleteriousness of
each mutation was assessed with the Polyphen2 web server
(http://genetics.bwh.harvard.edu/pph2/), which provides a score (HumVar) between 0 and 1,
with 1 being the most damaging (Supplementary Table 11). A cumulative deleteriousness score
for each protein was obtained by summing all Polyphen2 scores. We observed that large
proteins tended to have higher scores and thus normalized scores for protein length. All proteins
with at least one somatic mutation were ranked by cumulative deleteriousness score, with TP53
having the highest score (Supplementary Fig. 14). For a direct comparison of COSMIC Classic
gene overlap with the viral interactome at various peptide count thresholds, a matching number
of somatically mutated proteins were selected from the top of the ranked list and statistical
significance assessed by Fisher’s Exact Test. Genes at SCNA-AMP, SCNA-DEL and GWAS
loci were also tested against COSMIC Classic genes using Fisher’s Exact Test. Odds ratios
were calculated with 2x2 contingency tables.
Rozenblatt-Rosen et al., 2012
58
W W W. N A T U R E . C O M / N A T U R E | 5 3
SUPPLEMENTARY INFORMATION RESEARCH
Supplementary Notes
1. Significantly targeted and untargeted proteins
“Significantly targeted or untargeted” host proteins are sets of proteins that interact with viral
proteins at a higher or lower frequency, respectively, than would be expected given their degree
in the human HI-2 interaction network. The full set of viral-human Y2H interactions
(Supplementary Table 2) includes interactions obtained after recursive testing of the viral
proteins from all seven HPV types against a human interactor found by any one HPV in the
initial screen. To eliminate any bias introduced by the recursive nature of the verification retest
experiments with HPV proteins, and insure that all viruses are analyzed within the context of the
same reference search space this analysis only included interactions detected in the primary
screens and directly verified by pairwise retesting. In addition, we further restricted this analysis
to human proteins present in HI-2 (174 out of 307). We first counted the number of interactions
between viral proteins and human targets in the virus-human interaction network and then
randomly selected an identical number of proteins from HI-2. Sampling was performed with
replacement (i.e. a protein could be chosen more than once) and the likelihood of choosing any
protein determined by its number of interacting partners (degree) in HI-2. This process was
repeated 10,000 times to generate a random distribution When less than 5% of the simulations
generated a number of interactions equal to or greater than the number of experimentally
observed interactions for a given host protein, this was considered “significantly targeted”. When
more than 95% of the simulations generated a number of interactions equal to or greater than
the number of experimentally determined interactions, the host protein was considered
“significantly untargeted”. Otherwise, the human protein was considered “expectedly targeted”.
2. HI-2 and analysis of overlaps between Y2H, TAP-MS and their respective PRS
The human protein-protein interactions previously reported in our three high-throughput data
sets11,49,50 were updated to the gene annotations in human ORFeome 5.1 and then combined to
Rozenblatt-Rosen et al., 2012
59
SUPPLEMENTARY INFORMATION
5 4 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
generate the HI-2 list of unique interactions. Distinctions between ORFs corresponding to
multiple splice variant of a human gene were ignored when reporting protein-protein
interactions.
The significance of overlaps between different data sets was assessed by Fisher’s Exact
Test or permutation. We examined three overlaps, the overlap between TAP-MS and Y2H, as
well as the overlap of these two data sets separately with their respective PRS (Supplementary
Table 16). The search space for calculating the Y2H overlap with the Y2H-specific PRS is the
hORFeome v5.1. The search space for the TAP-MS and Y2H overlap corresponds to the
intersection of the hORFeome 5.1 and TAP-MS search spaces. The TAP-MS search space was
used to calculate the overlap between the TAP-MS data set and the TAP-MS-specific PRS. We
noticed that host proteins identified as viral targets by the TAP-MS technology tend to be
encoded by genes with high mRNA expression in IMR-90 cells (Supplementary Fig. 20). To
circumvent this expression bias when considering the TAP-MS data set, random sampling was
performed from a set of genes matching the query set in terms of expression quartiles, as
judged from genome-wide microarray analysis of IMR-90 cells.
Statistical significance of overlaps observed between the Y2H data set or the TAP-MS data
set and their respective PRS were computed by random sampling from the appropriate search
space, and controlling for expression bias for TAP-MS. Both our TAP-MS protein complex and
Y2H binary data sets significantly overlapped with PRS pairs (17 out of 94 protein pairs [P <
0.001] and 1 out of 62 protein pairs [P = 0.047] respectively). The overlap between our protein
complex viral-host interactions and those in the PRS varied as a function of analytical
parameters in our TAP-MS data: We observed that the fold enrichment of PRS proteins in the
set of host proteins detected in protein complex associations increased with the number of
unique peptides, although with fewer associations recovered at more stringent thresholds
(Supplementary Fig. 21).
Rozenblatt-Rosen et al., 2012
60
W W W. N A T U R E . C O M / N A T U R E | 5 5
SUPPLEMENTARY INFORMATION RESEARCH
Twenty four viral ORFs were found to have interactors in both TAP-MS and Y2H data sets
(Supplementary Fig. 1), and of these a total of six viral ORF-host protein interactions were
common to the two technologies (Supplementary Fig. 2). The observed overlap of six
interactions was significantly higher (P < 0.001) than the most common random expectation of
0.81 interactions (Supplementary Fig. 2). To test whether the set of genes identified by TAP-MS
as targets of a given viral protein included the immediate neighbours of viral targets identified by
Y2H, we repeated the analysis after expanding the list of Y2H targets to their direct neighbours
in HI-2. We observed that targets of viral protein identified through both technologies showed a
significant tendency to interact with each other (P < 0.001; Supplementary Fig. 2) implying that
host targets of the two technologies tend to ‘fall in the same neighbourhood’ of the network.
3. Measurement of the precision of the Y2H data set by wNAPPA
We measured the fraction of true biophysical interactions in our virus-host binary Y2H data set
(precision) by comparing the recovery rate of the detected interactions to those of the Positive
Reference Set (PRS) and Random Reference Set (RRS) in a wNAPPA orthogonal assay. In the
wNAPPA assay, bait and prey fusion proteins were expressed by coupled transcription-
translation and GST-tagged bait proteins captured by anti-GST antibody. Interactions were
detected using antibody to HA with standard immunochemical protocols. One viral ORF
(adeno5-E1A) was found to bind non-specifically to the GST plate and was therefore removed
from subsequent analysis of the wNAPPA data (Supplementary Table 22). The precision was
calculated for a recovery rate of 1% for the RRS, corresponding to a z-score threshold of 1.5. At
this threshold, we observe recovery rates of 9% ± 1% for the interactions detected by Y2H, 10%
± 4% for the PRS interactions and precisely 1% ± 1% for the RRS interactions. The precision
value was calculated according to the following formula50.
Rozenblatt-Rosen et al., 2012
61
SUPPLEMENTARY INFORMATION
5 6 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
where I+ is the number of true positives in the detected interactions (precision), f+ is the false
positive rate of wNAPPA (obtained as the fraction of RRS pairs reported positive) and (1 – f-) is
the fraction of Y2H supported PRS pairs that report positive in wNAPPA. Error bars of the
precision were calculated according to the variation of those values in the [1.5, 1.6] z-score
window. These measurements led to a precision of 90% +/- 7% for the virus-host binary Y2H
data set. We determined the fraction of true biophysical interactions in our virus-host binary Y2H
data set to be ~90% by comparing the validation rates of the data set to those of the PRS and
random reference set (Supplementary Fig. 22).
4. Network motif identification
We hypothesized that the striking variation in expression of multiple clusters may relate to TF
activity reinforcing feedback or feedforward loops. We thus used the list of predicted TF binding
sites to explore the interconnectedness of TFs. We focused on the TFs that were enriched for
TFBS in expression clusters and physically associated with or were differentially expressed in
response to expression of viral proteins (Supplementary Table 7). We looked for instances of
autoregulatory loops, feedback loops, and feed-forward loops – network motifs that might
amplify or modulate signals in biological networks – and compared the number observed in our
data to that observed after 1,000 random selections of TFs chosen from the set of TFs with one
or more predicted high-probability promoter binding site. There were far more interconnections
than expected by chance (P < 0.001; Supplementary Fig. 6), suggesting that viral proteins
target a dense regulatory TF network.
5. Cellular growth phenotypes
If gene expression clusters regulated by viral proteins were relevant to human cancer, then the
downstream variation in expression of host genes would be reflected in cellular growth
phenotypes. We measured growth and senescence rates for fifteen IMR-90 viral protein
expressing and control cell lines, and computed their Pearson correlation coefficients (R) with
Rozenblatt-Rosen et al., 2012
62
W W W. N A T U R E . C O M / N A T U R E | 5 7
SUPPLEMENTARY INFORMATION RESEARCH
the average gene expression level in each cluster. The significance of the correlation was
assessed for each cluster by randomly sampling 1,000 sets of equal size from the full gene
universe represented on the microarray. The resulting P values were adjusted for multiple
testing across the 31 clusters using the Benjamini-Hochberg procedure. There was significant
correlation between five clusters and growth or senescence phenotypes. Cluster C31
demonstrated the strongest correlation between mRNA expression and both cellular growth (R
= 0.8, Padj = 0.008) and senescence (R = -0.92, Padj = 0.016) (Supplementary Fig. 8). This
correlation suggests that the expression variation observed in each cluster across cell lines (Fig.
2a) is functionally relevant to viral proteins altering cellular phenotypes.
6. Cancer pathways perturbed by viral ORFs
Viral proteins perturb diverse core signalling pathways and biological functions of the host cell,
including many of the known hallmarks of cancer14. Dysregulation of these pathways perturbed
by viral ORFs could lead to tumorigenesis. Together, our transcriptional analysis and the
interactome data sets shed light on known and potentially novel connections between viral
proteins and cancer mechanisms.
We found several signatures of growth suppressor evasion in the transcriptional data. Viral
proteins in group III, which include polyomavirus and high-risk HPV proteins that target RB1,
were correlated with increased expression of genes involved in cell proliferation, including
components of the cell cycle (CDC25A, CDC25C, CDK1) and DNA replication machinery
(PCNA, RFC and MCM family members) and genes involved in nucleosome assembly (HIST1H
histone family). TP53-dependent pathways were typically downregulated in group III proteins,
reflecting that many of these viral proteins bind to and inactivate TP53. Pathways include cell-
cycle arrest mediated by CDKN1A as well as cell death through BAX and TRAIL pathway
components (cluster C12, 2.0-fold enriched for TP53 binding sites, P = 2.3 X 10-9). We also
observed that group II viral proteins, which include alternatively spliced E6 proteins from high-
Rozenblatt-Rosen et al., 2012
63
SUPPLEMENTARY INFORMATION
5 8 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
risk HPV types, HPV E5 proteins, E7 proteins from low-risk HPV6b and HPV11 and high-risk
HPV16, and many EBV proteins including EBNALP, induced a large decrease in TGFβ
signalling genes (cluster C16).
Metabolic dysregulation is often observed in cancer cells. Group III viral proteins induced
high expression of genes involved in cholesterol biosynthesis genes (DHCR24, HMGCS1,
SQLE) and fatty acid metabolism (MGLL, FABP3, ELOVL6, SCD), potentially reflecting the
need of rapidly proliferating cells and tumours for cholesterol89 and unsaturated fatty acids90,91.
Many viral proteins also increased expression of genes involved in response to hypoxia (cluster
C22).
Other characteristics that enable tumour growth include angiogenesis, invasion, and
metastasis. Group III viral proteins induced increased transcription of genes in the VEGF
pathway (cluster C23) and decreased transcription of genes involved in core functions of
fibroblasts, including expression of collagen components, secreted growth factors and
metalloproteases, and cell adhesion molecules such as integrins and protocadherins (clusters
C13, C17, C18, C19; Supplementary Fig. 9). Such changes imply some amount of
“reprogramming” induced by oncogenic viral proteins and again draw parallels with cancer
pathogenesis. High-risk HPV proteins may achieve changes to cell-substrate adhesion and
ECM production through transcriptional regulation of the growth factor EGR1. Other TFs with
highly enriched binding sites for these pathway genes include FOSL2, which interacts with low-
risk HPV E7s, and RUNX1. Still other viral ORFs may act through the transcription factors KLF7
and ARNTL, both of which are involved in insulin signalling92,93.
Many of the transcriptional changes caused by viral proteins relate to the two enabling
characteristics of cancer: genome instability and mutation, and tumour-promoting inflammation.
Group III viral proteins caused decreased expression of genes involved in the response to DNA
damage, including not only cell cycle arrest and apoptosis, but also autophagy via lysosome
assembly (clusters C6 and C7) and acidification (cluster C3; 3.5-fold enriched for binding sites
Rozenblatt-Rosen et al., 2012
64
W W W. N A T U R E . C O M / N A T U R E | 5 9
SUPPLEMENTARY INFORMATION RESEARCH
of the autophagy implicated TF NRF2/NFE2L294 (P = 0.0007); and oxidative stress control
through induction of G6PD and glutathione transferases GSTM4 and GSTM5 (cluster C15). We
also observed marked upregulation of DNA nucleotide excision and double-stranded break
repair (cluster C26 and C27: BRCA1, BRCA2, POL epsilon/delta, FANCB, XRCC2, BLM,
RAD51) and robust activation of two “inflammasome” pathways known to be triggered by DNA
damage: the type I interferon response via IRF activity (cluster C24, 9.5-fold enriched, P = 3 X
10-10; Fig. 2b) and inflammatory responses (e.g. IL6, CCL2, IL1B) via NF-κB activity (cluster
C23, 4.7-fold enriched for NF-κB sites, P = 1.8 X 10-5). Our network suggests that HPV E6s,
EBV proteins, and polyomaviruses activate inflammatory pathways by regulating the expression
of TFs like IRF1, NF-kB, RELB, and MAFF. In contrast, HPV E7s perturb REL directly through a
binary interaction. Tumour virus proteins also induce changes in the expression of genes
involved in other signalling pathways, including PDGFR signaling, calcium signalling, and WNT
signalling.
Pathway enrichment analysis of proteins identified through TAP-MS or Y2H was also
instructive on established and novel pathways targeted by viruses. To identify host pathways
targeted by viral proteins in our combined interactome data set, we explored the enrichment of
Gene Ontology (GO) terms among host target proteins (Supplementary Fig. 3). Enrichment for
some of the GO terms was expected (e.g., “mitotic cell cycle” for HPV, EBV and polyomavirus
and “protein phosphatase 2A binding” for polyomavirus). Over-representation of “mitotic cell
cycle” genes amongst HPV-associated proteins in our data set arises not only from established
associations of HPV E6 and E7 with the RB1 and TP53 tumour suppressors, but also from
associations observed with 28 additional cell cycle proteins, including PCNA, MCM3 and
CDKN1A (Padj < 0.01, OR = 3.3) (Supplementary Table 17).
Enrichment analysis also revealed previously unrecognised pathways unique to specific
viruses. EBV proteins collectively bind members of the procollagen-proline 4-hydroxylase family
(Padj= 0.05, OR = 389; Supplementary Fig. 3) including P4HA2, a hypoxia- and p53-responsive
Rozenblatt-Rosen et al., 2012
65
SUPPLEMENTARY INFORMATION
6 0 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
protein, which inhibits angiogenesis and tumour growth in mice95. Adenoviral proteins bind
multiple members of the BH-domain family (BAX, BIK, BAK1; Padj = 0.04, OR = 159), which play
a critical role in apoptosis96. The adenovirus E4ORF1 protein binds to a family of mRNA 5’UTR
binding proteins (Padj = 0.03, OR = 319), including the insulin-like growth factor binding protein 2
(IGFBP2), which has recently been implicated in angiogenesis and metastasis in breast
cancer97.
Rozenblatt-Rosen et al., 2012
66
W W W. N A T U R E . C O M / N A T U R E | 6 1
SUPPLEMENTARY INFORMATION RESEARCH
Supplementary References 30. Rual, J.F. et al., Human ORFeome version 1.1: a platform for reverse proteomics.
Genome Res. 14, 2128-2135 (2004). 31. Sowa, M.E., Bennett, E.J., Gygi, S.P., & Harper, J.W., Defining the human
deubiquitinating enzyme interaction landscape. Cell 138, 389-403 (2009). 32. Cole, S.T. & Danos, O., Nucleotide sequence and comparative analysis of the human
papillomavirus type 18 genome. Phylogeny of papillomaviruses and repeated structure of the E6 and E7 gene products. J. Mol. Biol. 193, 599-608 (1987).
33. Dartmann, K., Schwarz, E., Gissmann, L., & zur Hausen, H., The nucleotide sequence and genome organization of human papilloma virus type 11. Virology 151, 124-130 (1986).
34. Fuchs, P.G., Iftner, T., Weninger, J., & Pfister, H., Epidermodysplasia verruciformis-associated human papillomavirus 8: genomic sequence and comparative analysis. J. Virol. 58, 626-634 (1986).
35. Zachow, K.R., Ostrow, R.S., & Faras, A.J., Nucleotide sequence and genome organization of human papillomavirus type 5. Virology 158, 251-254 (1987).
36. Cole, S.T. & Streeck, R.E., Genome organization and nucleotide sequence of human papillomavirus type 33, which is associated with cervical cancer. J. Virol. 58, 991-995 (1986).
37. Schwarz, E. et al., DNA sequence and genome organization of genital human papillomavirus type 6b. EMBO J. 2, 2341-2348 (1983).
38. Seedorf, K., Krammer, G., Durst, M., Suhai, S., & Rowekamp, W.G., Human papillomavirus type 16 DNA sequence. Virology 145, 181-185 (1985).
39. Schwarz, E. et al., Structure and transcription of human papillomavirus sequences in cervical carcinoma cells. Nature 314, 111-114 (1985).
40. Sedman, S.A. et al., The full-length E6 protein of human papillomavirus type 16 has transforming and trans-activating activities and cooperates with E7 to immortalize keratinocytes in culture. J. Virol. 65, 4860-4866 (1991).
41. Liu, Y. et al., Multiple functions of human papillomavirus type 16 E6 contribute to the immortalization of mammary epithelial cells. J. Virol. 73, 7297-7307 (1999).
42. Schowalter, R.M., Pastrana, D.V., Pumphrey, K.A., Moyer, A.L., & Buck, C.B., Merkel cell polyomavirus and two previously unknown polyomaviruses are chronically shed from human skin. Cell Host Microbe 7, 509-515 (2010).
43. van der Meijden, E. et al., Discovery of a new human polyomavirus associated with trichodysplasia spinulosa in an immunocompromized patient. PLoS Pathog. 6, e1001024 (2010).
44. Zalvide, J. & DeCaprio, J.A., Role of pRb-related proteins in simian virus 40 large-T-antigen-mediated transformation. Mol. Cell. Biol. 15, 5800-5810 (1995).
45. Hahn, W.C. et al., Enumeration of the simian virus 40 early region elements necessary for human cell transformation. Mol. Cell. Biol. 22, 2111-2123 (2002).
46. Zhong, Q. et al., Edgetic perturbation models of human inherited disorders. Mol. Syst. Biol. 5, 321 (2009).
47. James, P., Halladay, J., & Craig, E.A., Genomic libraries and a host strain designed for highly efficient two-hybrid selection in yeast. Genetics 144, 1425-1436 (1996).
48. Walhout, A.J. & Vidal, M., A genetic strategy to eliminate self-activator baits prior to high-throughput yeast two-hybrid screens. Genome Res. 9, 1128-1134 (1999).
49. Rual, J.F. et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature 437, 1173-1178 (2005).
50. Venkatesan, K. et al., An empirical framework for binary interactome mapping. Nature Methods 6, 83-90 (2009).
Rozenblatt-Rosen et al., 2012
67
51. Litovchick, L., Florens, L.A., Swanson, S.K., Washburn, M.P., & DeCaprio, J.A., DYRK1A protein kinase promotes quiescence and senescence through DREAM complex assembly. Genes Dev. 25, 801-813 (2011).
52. Spangle, J.M. & Munger, K., The human papillomavirus type 16 E6 oncoprotein activates mTORC1 signaling and increases protein synthesis. J. Virol. 84, 9398-9407 (2010).
53. Baker, S.J., Markowitz, S., Fearon, E.R., Willson, J.K., & Vogelstein, B., Suppression of human colorectal carcinoma cell growth by wild-type p53. Science 249, 912-915 (1990).
54. Litovchick, L. et al., Evolutionarily conserved multisubunit RBL2/p130 and E2F4 protein complex represses human cell cycle-dependent genes in quiescence. Mol. Cell 26, 539-551 (2007).
55. Livak, K.J. & Schmittgen, T.D., Analysis of relative gene expression data using real-time quantitative PCR and the 2(-ΔΔCT) Method. Methods 25, 402-408 (2001).
56. Stewart, S.A. et al., Lentivirus-delivered stable gene silencing by RNAi in primary cells. RNA 9, 493-501 (2003).
57. Ramsby, M.L. & Makowski, G.S., Differential detergent fractionation of eukaryotic cells. Analysis by two-dimensional gel electrophoresis. Methods Mol. Biol. 112, 53-66 (1999).
58. Nakatani, Y. & Ogryzko, V., Immunoaffinity purification of mammalian protein complexes. Methods Enzymol. 370, 430-444 (2003).
59. Ficarro, S.B. et al., Improved electrospray ionization efficiency compensates for diminished chromatographic resolution and enables proteomics analysis of tyrosine signaling in embryonic stem cells. Anal. Chem. 81, 3440-3447 (2009).
60. Parikh, J.R. et al., multiplierz: an extensible API based desktop environment for proteomics data analysis. BMC Bioinformatics 10, 364 (2009).
61. Webber, J.T., Askenazi, M., & Marto, J.A., mzResults: an interactive viewer for interrogation and distribution of proteomics results. Mol. Cell. Proteomics 10, M110 003970 (2011).
62. Askenazi, M., Marto, J.A., & Linial, M., The complete peptide dictionary--a meta-proteomics resource. Proteomics 10, 4306-4310 (2010).
63. Askenazi, M., Li, S., Singh, S., & Marto, J.A., Pathway Palette: a rich internet application for peptide-, protein- and network-oriented analysis of MS data. Proteomics 10, 1880-1885 (2010).
64. Chatr-aryamontri, A. et al., VirusMINT: a viral protein interaction database. Nucleic Acids Res. 37, D669-673 (2009).
65. Berriz, G.F., Beaver, J.E., Cenik, C., Tasan, M., & Roth, F.P., Next generation software for functional trend analysis. Bioinformatics 25, 3043-3044 (2009).
66. Dai, M. et al., Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res. 33, e175 (2005).
67. Johnson, W.E., Li, C., & Rabinovic, A., Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118-127 (2007).
68. Cheung, M.S., Down, T.A., Latorre, I., & Ahringer, J., Systematic bias in high-throughput sequencing data and its correction by BEADS. Nucleic Acids Res. 39, e103 (2011).
69. Benjamini, Y. & Hochberg, Y., Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B (Methodological) 57, 289-300 (1995).
70. Smoot, M.E., Ono, K., Ruscheinski, J., Wang, P.L., & Ideker, T., Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 27, 431-432 (2011).
71. Pique-Regi, R. et al., Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res. 21, 447-455 (2011).
72. Palomero, T. et al., NOTCH1 directly regulates c-MYC and activates a feed-forward-loop transcriptional network promoting leukemic cell growth. Proc. Natl Acad. Sci. USA 103, 18261-18266 (2006).
Rozenblatt-Rosen et al., 2012
68
SUPPLEMENTARY INFORMATION
6 2 | W W W. N A T U R E . C O M / N A T U R E
RESEARCH
51. Litovchick, L., Florens, L.A., Swanson, S.K., Washburn, M.P., & DeCaprio, J.A., DYRK1A protein kinase promotes quiescence and senescence through DREAM complex assembly. Genes Dev. 25, 801-813 (2011).
52. Spangle, J.M. & Munger, K., The human papillomavirus type 16 E6 oncoprotein activates mTORC1 signaling and increases protein synthesis. J. Virol. 84, 9398-9407 (2010).
53. Baker, S.J., Markowitz, S., Fearon, E.R., Willson, J.K., & Vogelstein, B., Suppression of human colorectal carcinoma cell growth by wild-type p53. Science 249, 912-915 (1990).
54. Litovchick, L. et al., Evolutionarily conserved multisubunit RBL2/p130 and E2F4 protein complex represses human cell cycle-dependent genes in quiescence. Mol. Cell 26, 539-551 (2007).
55. Livak, K.J. & Schmittgen, T.D., Analysis of relative gene expression data using real-time quantitative PCR and the 2(-ΔΔCT) Method. Methods 25, 402-408 (2001).
56. Stewart, S.A. et al., Lentivirus-delivered stable gene silencing by RNAi in primary cells. RNA 9, 493-501 (2003).
57. Ramsby, M.L. & Makowski, G.S., Differential detergent fractionation of eukaryotic cells. Analysis by two-dimensional gel electrophoresis. Methods Mol. Biol. 112, 53-66 (1999).
58. Nakatani, Y. & Ogryzko, V., Immunoaffinity purification of mammalian protein complexes. Methods Enzymol. 370, 430-444 (2003).
59. Ficarro, S.B. et al., Improved electrospray ionization efficiency compensates for diminished chromatographic resolution and enables proteomics analysis of tyrosine signaling in embryonic stem cells. Anal. Chem. 81, 3440-3447 (2009).
60. Parikh, J.R. et al., multiplierz: an extensible API based desktop environment for proteomics data analysis. BMC Bioinformatics 10, 364 (2009).
61. Webber, J.T., Askenazi, M., & Marto, J.A., mzResults: an interactive viewer for interrogation and distribution of proteomics results. Mol. Cell. Proteomics 10, M110 003970 (2011).
62. Askenazi, M., Marto, J.A., & Linial, M., The complete peptide dictionary--a meta-proteomics resource. Proteomics 10, 4306-4310 (2010).
63. Askenazi, M., Li, S., Singh, S., & Marto, J.A., Pathway Palette: a rich internet application for peptide-, protein- and network-oriented analysis of MS data. Proteomics 10, 1880-1885 (2010).
64. Chatr-aryamontri, A. et al., VirusMINT: a viral protein interaction database. Nucleic Acids Res. 37, D669-673 (2009).
65. Berriz, G.F., Beaver, J.E., Cenik, C., Tasan, M., & Roth, F.P., Next generation software for functional trend analysis. Bioinformatics 25, 3043-3044 (2009).
66. Dai, M. et al., Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res. 33, e175 (2005).
67. Johnson, W.E., Li, C., & Rabinovic, A., Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118-127 (2007).
68. Cheung, M.S., Down, T.A., Latorre, I., & Ahringer, J., Systematic bias in high-throughput sequencing data and its correction by BEADS. Nucleic Acids Res. 39, e103 (2011).
69. Benjamini, Y. & Hochberg, Y., Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B (Methodological) 57, 289-300 (1995).
70. Smoot, M.E., Ono, K., Ruscheinski, J., Wang, P.L., & Ideker, T., Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 27, 431-432 (2011).
71. Pique-Regi, R. et al., Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res. 21, 447-455 (2011).
72. Palomero, T. et al., NOTCH1 directly regulates c-MYC and activates a feed-forward-loop transcriptional network promoting leukemic cell growth. Proc. Natl Acad. Sci. USA 103, 18261-18266 (2006).
Rozenblatt-Rosen et al., 2012
68
73. Raychaudhuri, S. et al., Identifying relationships among genomic disease regions: predicting genes at pathogenic SNP associations and rare deletions. PLoS Genet. 5, e1000534 (2009).
74. Myers, S., Bottolo, L., Freeman, C., McVean, G., & Donnelly, P., A fine-scale map of recombination rates and hotspots across the human genome. Science 310, 321-324 (2005).
75. Starr, T.K. et al., A Sleeping Beauty transposon-mediated screen identifies murine susceptibility genes for adenomatous polyposis coli (Apc)-dependent intestinal tumorigenesis. Proc. Natl Acad. Sci. USA 108, 5765-5770 (2011).
76. March, H.N. et al., Insertional mutagenesis identifies multiple networks of cooperating genes driving intestinal tumorigenesis. Nature Genet. 43, 1202-1209 (2011).
77. Starr, T.K. et al., A transposon-based genetic screen in mice identifies genes altered in colorectal cancer. Science 323, 1747-1750 (2009).
78. Mann, K.M. et al., Sleeping Beauty mutagenesis reveals cooperating mutations and pathways in pancreatic adenocarcinoma. Proc. Natl Acad. Sci. USA 109, 5934-5941 (2012).
79. Bergerson, R.J. et al., An insertional mutagenesis screen identifies genes that cooperate with Mll-AF9 in a murine leukemogenesis model. Blood 119, 4512-4523 (2012).
80. Cancer Genome Atlas Research Network, Integrated genomic analyses of ovarian carcinoma. Nature 474, 609-615 (2011).
81. Agrawal, N. et al., Exome sequencing of head and neck squamous cell carcinoma reveals inactivating mutations in NOTCH1. Science 333, 1154-1157 (2011).
82. Chapman, M.A. et al., Initial genome sequencing and analysis of multiple myeloma. Nature 471, 467-472 (2011).
83. Lee, W. et al., The mutation spectrum revealed by paired genome sequences from a lung cancer patient. Nature 465, 473-477 (2010).
84. Parsons, D.W. et al., An integrated genomic analysis of human glioblastoma multiforme. Science 321, 1807-1812 (2008).
85. Parsons, D.W. et al., The genetic landscape of the childhood cancer medulloblastoma. Science 331, 435-439 (2011).
86. Pleasance, E.D. et al., A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 463, 184-190 (2010).
87. Puente, X.S. et al., Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia. Nature 475, 101-105 (2011).
88. Stransky, N. et al., The mutational landscape of head and neck squamous cell carcinoma. Science 333, 1157-1160 (2011).
89. Kuehnle, K. et al., Prosurvival effect of DHCR24/Seladin-1 in acute and chronic responses to oxidative stress. Mol. Cell. Biol. 28, 539-550 (2008).
90. Scaglia, N., Caviglia, J.M., & Igal, R.A., High stearoyl-CoA desaturase protein and activity levels in simian virus 40 transformed-human lung fibroblasts. Biochim. Biophys. Acta 1687, 141-151 (2005).
91. Scaglia, N. & Igal, R.A., Stearoyl-CoA desaturase is involved in the control of proliferation, anchorage-independent growth, and survival in human transformed cells. J. Biol. Chem. 280, 25339-25349 (2005).
92. Kawamura, Y., Tanaka, Y., Kawamori, R., & Maeda, S., Overexpression of Kruppel-like factor 7 regulates adipocytokine gene expressions in human adipocytes and inhibits glucose-induced insulin secretion in pancreatic beta-cell line. Mol. Endocrinol. 20, 844-856 (2006).
93. Marcheva, B. et al., Disruption of the clock components CLOCK and BMAL1 leads to hypoinsulinaemia and diabetes. Nature 466, 627-631 (2010).
Rozenblatt-Rosen et al., 2012
69
W W W. N A T U R E . C O M / N A T U R E | 6 3
SUPPLEMENTARY INFORMATION RESEARCH
73. Raychaudhuri, S. et al., Identifying relationships among genomic disease regions: predicting genes at pathogenic SNP associations and rare deletions. PLoS Genet. 5, e1000534 (2009).
74. Myers, S., Bottolo, L., Freeman, C., McVean, G., & Donnelly, P., A fine-scale map of recombination rates and hotspots across the human genome. Science 310, 321-324 (2005).
75. Starr, T.K. et al., A Sleeping Beauty transposon-mediated screen identifies murine susceptibility genes for adenomatous polyposis coli (Apc)-dependent intestinal tumorigenesis. Proc. Natl Acad. Sci. USA 108, 5765-5770 (2011).
76. March, H.N. et al., Insertional mutagenesis identifies multiple networks of cooperating genes driving intestinal tumorigenesis. Nature Genet. 43, 1202-1209 (2011).
77. Starr, T.K. et al., A transposon-based genetic screen in mice identifies genes altered in colorectal cancer. Science 323, 1747-1750 (2009).
78. Mann, K.M. et al., Sleeping Beauty mutagenesis reveals cooperating mutations and pathways in pancreatic adenocarcinoma. Proc. Natl Acad. Sci. USA 109, 5934-5941 (2012).
79. Bergerson, R.J. et al., An insertional mutagenesis screen identifies genes that cooperate with Mll-AF9 in a murine leukemogenesis model. Blood 119, 4512-4523 (2012).
80. Cancer Genome Atlas Research Network, Integrated genomic analyses of ovarian carcinoma. Nature 474, 609-615 (2011).
81. Agrawal, N. et al., Exome sequencing of head and neck squamous cell carcinoma reveals inactivating mutations in NOTCH1. Science 333, 1154-1157 (2011).
82. Chapman, M.A. et al., Initial genome sequencing and analysis of multiple myeloma. Nature 471, 467-472 (2011).
83. Lee, W. et al., The mutation spectrum revealed by paired genome sequences from a lung cancer patient. Nature 465, 473-477 (2010).
84. Parsons, D.W. et al., An integrated genomic analysis of human glioblastoma multiforme. Science 321, 1807-1812 (2008).
85. Parsons, D.W. et al., The genetic landscape of the childhood cancer medulloblastoma. Science 331, 435-439 (2011).
86. Pleasance, E.D. et al., A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 463, 184-190 (2010).
87. Puente, X.S. et al., Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia. Nature 475, 101-105 (2011).
88. Stransky, N. et al., The mutational landscape of head and neck squamous cell carcinoma. Science 333, 1157-1160 (2011).
89. Kuehnle, K. et al., Prosurvival effect of DHCR24/Seladin-1 in acute and chronic responses to oxidative stress. Mol. Cell. Biol. 28, 539-550 (2008).
90. Scaglia, N., Caviglia, J.M., & Igal, R.A., High stearoyl-CoA desaturase protein and activity levels in simian virus 40 transformed-human lung fibroblasts. Biochim. Biophys. Acta 1687, 141-151 (2005).
91. Scaglia, N. & Igal, R.A., Stearoyl-CoA desaturase is involved in the control of proliferation, anchorage-independent growth, and survival in human transformed cells. J. Biol. Chem. 280, 25339-25349 (2005).
92. Kawamura, Y., Tanaka, Y., Kawamori, R., & Maeda, S., Overexpression of Kruppel-like factor 7 regulates adipocytokine gene expressions in human adipocytes and inhibits glucose-induced insulin secretion in pancreatic beta-cell line. Mol. Endocrinol. 20, 844-856 (2006).
93. Marcheva, B. et al., Disruption of the clock components CLOCK and BMAL1 leads to hypoinsulinaemia and diabetes. Nature 466, 627-631 (2010).
Rozenblatt-Rosen et al., 2012
69
94. Rao, V.A. et al., The antioxidant transcription factor Nrf2 negatively regulates autophagy and growth arrest induced by the anticancer redox agent mitoquinone. J. Biol. Chem. 285, 34447-34459 (2010).
95. Teodoro, J.G., Parker, A.E., Zhu, X., & Green, M.R., p53-mediated inhibition of angiogenesis through up-regulation of a collagen prolyl hydroxylase. Science 313, 968-971 (2006).
96. Kelekar, A. & Thompson, C.B., Bcl-2-family proteins: the role of the BH3 domain in apoptosis. Trends Cell Biol. 8, 324-330 (1998).
97. Png, K.J., Halberg, N., Yoshida, M., & Tavazoie, S.F., A microRNA regulon that mediates endothelial recruitment and metastasis by cancer cells. Nature 481, 190-194 (2011).
Rozenblatt-Rosen et al., 2012
70