Rozenblatt-Rosen et al., 2012 SUPPLEMENTARY INFORMATION · 17 Pathway enrichment analysis Gene...

W W W. N A T U R E . C O M / N A T U R E | 1

SUPPLEMENTARY INFORMATIONdoi:10.1038/nature11288

Contents

Supplementary Figures and Legends

1 Summary of viral ORFs screened 2 Overlap of Y2H and TAP-MS data sets 3 Enrichment of GO terms for targeted host proteins 4 Network of HPV E7 protein complex associations 5 Chromatin accessibility of IRF1 binding sites 6 Regulatory loops 7 Heatmap of transcriptome perturbations 8 Growth rate and senescence of selected IMR-90 cell lines expressing viral ORFs 9 Viral proteins, transcription factors and clusters 10 MAML1 is targeted by E6 proteins from HPV5 and HPV8 11 The E6 protein from HPV5 and HPV8 regulates expression of Natch pathway responsive

genes HES1 and DLL4 12 DNA tumour viruses target cancer genes 13 Reproducibility of viral-host protein associations observed in replicate TAP-MS

experiments as a function of the number of unique peptides detected 14 Somatic mutations in genome-wide tumour sequencing projects 15 Network of VirHostSM to host targets and cancers 16 The IMR-90 cell culture pipeline 17 Silver stain analyses of viral-host protein complexes 18 Cluster propensity of microarrays before and after ComBat 19 Cluster coherence 20 Expression bias of TAP-MS associations 21 Overlap of viral-host protein pairs identified through TAP-MS with a literature-curated

positive reference set 22 Assessment of Y2H data set

Supplementary Methods

A. Viral ORFeome cloning B. Yeast two-hybrid (Y2H) assay C. Cell culture pipeline D. HPV E6 oncoproteins and Notch signalling E. Tandem Affinity Purifications (TAP) followed by Mass Spectrometry (MS) F. Subtracting likely non-specific protein associations (“Tandome”) G. TAP reproducibility H. Virus-human Positive Reference Set (PRS)

Rozenblatt-Rosen et al., 2012

3

SUPPLEMENTARY INFORMATION

2 | W W W. N A T U R E . C O M / N A T U R E

RESEARCH

I. Pathway enrichment analysis J. Microarray preprocessing and differential expression K. Microarray gene and sample clustering L. Predicting cell-specific transcription factor binding sites M. Transcription factor binding site (TFBS) enrichment analysis N. Randomization of gene clusters for enrichment analyses O. Evaluating predictive power of regulatory cascades P. Building a probabilistic map of IMR-90 specific RBPJ binding sites Q. Notch pathway enrichment analysis R. Identification of loci implicated in familial and somatic cancer S. Testing enrichment of gene sets for cancer genes T. Viral target overlap with candidate cancer genes identified by transposon screens U. Comparison of viral interactome and prioritization of cancer genes

Supplementary Tables (available on Nature website)

1 List of viral ORFs Summary of all viral ORFs tested, including all information for each viral ORF and the data generated for each ORF.

2 Y2H data set Summary of virus-host binary interactions identified by Y2H screens.

3 Host proteins targeted by viral proteins in Y2H

List of host proteins in the Y2H network that are significantly targeted or untargeted by viral proteins.

4 TAP data set List of virus-host protein associations identified in two independent TAP-MS experiments.

5 GO cluster summary List of the GO terms that are enriched in each cluster of differentially regulated genes, and the adjusted P values and odds ratios associated with them.

6 KEGG cluster summary List of all the KEGG terms that are enriched in each cluster of differentially regulated genes, and the adjusted P values and odds ratios associated with them.

7 TF list by Y2H,TAP,Array Virally targeted TFs with an enriched number of predicted binding sites for genes within each cluster. Listed TFs come from either associations through TAP-MS, interactions through Y2H, or are differentially expressed in response to viral ORF expression.

8 COSMIC Classic annotation COSMIC Classic genes are annotated based on literature review as tumour suppressor, oncogene or both. If no call could be made, the designation is left as unclear.

9 VirHost data set 947 VirHost proteins identified through TAP-MS association (3 or more unique peptides), Y2H interaction, or as differentially expressed TFs from motif enrichment analysis.

10 Transposon Candidates Overlap between Sleeping Beauty and murine leukemia virus transposon-based screens and VirHost data set.


4


SUPPLEMENTARY INFORMATION RESEARCH

11 Polyphen score Cumulative POLYPHEN2 score calculated for each gene with one or more somatic mutations from genome-wide tumour sequencing data.

12 GWAS–VirHost, SCNA-DEL–VirHost, SCNA-AMP–VirHost intersections

Genes at intersection of VirHost data set with genes identified through cancer GWAS or through cancer SCNA analysis (amp=amplification, del=deletion).

13 PCR Primers and viral ORF sequences

PCR primers used to generate viral ORFs from Ad5, HPVs and PyVs and sequence of each viral ORF.

14 Autoactivators List of viral ORFs that act as auto-activators in Y2H or wNAPPA protein interaction assays.

15 Tandome List of host proteins identified in more than 1% of 108 control TAP experiments (Tandome).

16 Y2H PRS, TAP-MS PRS List of Y2H and TAP-MS Positive Reference Sets (PRS). 17 Pathway enrichment analysis Gene Ontology term enrichment for viral targets identified

by TAP-MS for Ad5, EBV, HPV and PyV viruses; Odds ratio and resampling-adjusted P-values are provided.

18 Microarray batches Lists of the individual microarrays in the gene expression data set, and the batch in which each one was processed.

19 Differentially regulated genes Annotation of the most frequently perturbed host genes, with the cluster membership, fold changes, and adjusted P values in each cell line expressing a viral ORF relative to control cell line.

20 GWAS loci Mapping of loci previously identified through cancer GWAS to nearby genes.

21 SCNA loci Loci previously identfied through SCNA analysis of cancers, with corresponding statistical evidence (q-value and residual q-value) and genes within each interval.

22 wNAPPA results wNAPPA scores for tested viral ORF interactions.

Supplementary Notes

1. Significantly targeted and untargeted proteins 2. HI-2 and analysis of overlaps between Y2H, TAP-MS and their respective PRS 3. Measurement of the precision of the Y2H data set by wNAPPA 4. Network motif identification 5. Cellular growth phenotypes 6. Cancer pathways perturbed by viral ORFs

Supplementary References


5



RESEARCH

Tested byY2H: 123

Transduced intoIMR90: 87

6855 19

Viral ORFs screened

2426 29

73

10

Y2H: 53

TAP-MS: 54

Microarray: 63

Viral ORF data sets

Supplementary Figure 1. Summary of viral ORFs screened. Overlap of the number of viral ORFs tested in the Y2H and cell line branches of the experimental pipeline (left Venn diagram), and the overlaps of viral ORFs that yielded data in each channel of the experimental pipeline (right Venn diagram).


6

OR

HPV8E6

Virus HostEBV-EBNALPEBV-EBNA1EBV-EBNA1SV40-LTSV40-LTSV40-LT

SP100KPNA6SRPK2ZC3H18TP53CNBP

Freq

uenc

y

Sampling spaceY2H

TAP-MS

0

50

100

150

200

250

0

50

100

150

200

250

P < 0.001

0 10 20 30 40 0 10 20 30 40

Y2H

Y2H

+N(H

I-2)

Number of virus-host protein interactionsoverlapping between Y2H and TAP-MS

SV40LT TP53

TAP-MSV-H

Y2HV-H

SV40LT TP53

TAP-MSV-H

Y2HV-H

PDLIM7

GEMY2HV-H

Y2HH-H

TAP-MSV-H

P < 0.001

P < 0.001P < 0.001

Supplementary Figure 2. Overlap of Y2H and TAP-MS data sets. Upper panels: number of virus-host interactions observed (green arrow) in Y2H and TAP-MS versus those seen by chance through random sampling of the Y2H (red) or TAP-MS (blue) search spaces, with six shared inter-actions observed listed. Lower panels: corresponding overlaps with expanded (Y2H+N(HI-2)) network, which includes human proteins “one hop” away in the HI-2 human-human interactome network.


7



1 5Odds Ratio

HP

V

EB

V

Pol

yom

aviru

s

Ade

novi

rus

ATPase activity

EBV

PDLIM7

P4HA1

P4HA2 PDLIM7

P4HB

P4HA3

Procollagen-proline4-dioxygenase activity

POLYO

PDLIM7

STRN

TP53 PDLIM7

STRN4

STRN3

PP2A binding

ADENO5

PDLIM7BAX

BH domain binding

BAK1BIKGPI-anchor transamidase complex

Mitotic cell cycle

Proteasome complex

Apoptosis

Procollagen-proline 4-dioxygenase activity

Protein phosphatase 2A binding

BH domain binding

mRNA 5’UTR binding

Supplementary Figure 3. Enrichment of GO terms for targeted host proteins. Enrichment of GO terms for host proteins physically interacting with viral proteins (Supplementary Table 17). Three examples are shown. All Odds Ratios higher than 5 were set to 5 for visualization purposes.


8



RESEARCH

Shared host targetsof HPV-E7 proteins

Freq

uenc

y0

300

600

0 150

60

120

10.5 21.5

Freq

uenc

y

Ratio = E7 of same classE7 of different class

P < 0.001

P = 0.424

CutaneousHigh-riskLow-risk 30

Supplementary Figure 4. Network of HPV E7 protein complex associations. Network of p r o t e i n complex associations of E7 viral proteins from six HPV types (hexagons, coloured according to disease class) with host proteins (circles). Host proteins that associate with two or more E7 proteins are coloured according to the disease class(es) of the corresponding HPV types. Circle size is proportional to the number of associations between host and viral proteins in the E7 networks. Viral-host protein complex associations (links) are weighted by the number of unique peptides detected for the host protein (thin links: 1-2; thick links: ≥ 3). Distribution plots of 1,000 randomised networks and experimentally observed data (green arrow) for the number of host proteins targeted at random by two or more proteins in the corresponding sub-networks (left grams), or the ratio of the probability of a host protein being targeted by viral pro-teins from the same class to the probability it is targeted by viral proteins from different classes (right histograms). Representative random networks selected from these distributions are shown as insets in the histograms.


9



Response to Type I interferonPadj < 0.001

OR = 40

IFITM1 IFIT3 IFIT2 IFIT1IFI6 IFI35

IRF1

IMR

90 D

NA

se A

cces

ibili

ty (c

ount

s pe

r 10

mill

ion

read

s)

Position (bp)

0

50

100

150

200

●●

IFIT3 IFIT1L IFIT1

91,100,000 91,120,000 91,140,000 91,160,000

AAACTGGAAAGTGAAACT AAAGGGGAAAGTGAAACTSite 1 Site 2

2 4 6 8 10 12 14 16 18

0

0.5

1.0

1.5

2.0

IRF1 binding motif

Info

rmat

ion

cont

ent

Position

Supplementary Figure 5. Chromatin accessibility of IRF1 binding sites. DNase accessible canonical IRF1 binding sites in the promoters of interferon inducible genes highly expressed in response to Group III viral protein expression (left). Predicted targets of IRF1 within cluster C24 are significantly annotated for the GO term “Type I interferon response” (right).


10



RESEARCH

E2F1

TFAP2A

SMAD3

TFAP2A

RBPJ

HES1

Autoregulatory loops

Freq

uenc

y

0

50

100

150

4 8 12

0

200

400

0 50 100 150

Freq

uenc

y

0

50

100

150

0 2,000 4,000 6,000

Freq

uenc

y

Feedback loops

Feedforward loops

0

P < 0.001

P < 0.001

P < 0.001

Supplementary Figure 6. Regulatory loops. Null distributions of autoregulatory (top), feedback (middle) and feedforward (bottom) motifs compared to the number of observed network motifs (green arrow) for enriched TFs.


11



DNA damage response (DNA replication)(Circadian rhythm)Potassium ion import

Nucleosome assembly Regulation of cell proliferationType I interferon response

Response to hypoxia

PDGFR signalling (Pathways in Cancer)Cell morphogenesis in differentiationProteinaceous extracellular matrix

TGF-beta signalling pathwayGlutathione metabolism

Response to radiation (Wnt signalling)

Lipid biosynthesis

Lysosome (Lysosome)Lysosome (Lysosome)

(Phagosome)

C31C30C29C28C27C26C25C24C23C22C21C20C19C18C17C16C15C14C13 C12C11C10C9C8C7C6C5C4C3C2C1

Regulation of fibroblast proliferation

(Calcium signalling pathway)

Growth factor binding

Enriched GO (KEGG)TP53RB/E2F

5−E6

8−E6

BIL

F1E

4OR

F3B

LRF2

TSV

-LS

TM

CP

yV-L

TTR

MC

PyV

-LT

LMP

2AE

1B19

KB

NR

F1B

RLF

1E

BN

A1

BX

LF1

E4O

RFB LF

211

-E6

6b-E

6B

ALF

1M

CP

yV-L

ST

BN

LF2B

BFL

F2B

TRF1

BR

RF2

16-E

6XX

18-E

6X5-

E7

E4O

RF1

BFR

F0.5

E3G

P19

KB

SR

F1B

GLF

311

-E5B

16-E

56b

-E5A

BFR

F28-

E7

18-E

5E

BN

ALP

6b-E

5BB

HR

F1B

GLF

3.5

6b-E

711

-E7

BN

LF2A

BD

LF3.

5B

DLF

4B

VLF

1SV40−S

TW

U-L

ST

JCC

Y-L

ST

16-E

7JC

Mad

1-LS

TS

V40

-LT

HP

yV7-

LST

HP

yV6-

LST

16−E

6NOX

18−E

6NOX

16-E

6NO

XI1

28T

16-E

618−E

618−E

7MCPyV−S

T

[snoRNA]#

TRAIL binding (p53 signalling pathway)

I II III

Cell-substrate adhesion

Acute inflammatory response

DNA damage response

Enriched TF

0 12 (fold change)

Color Key

−1Log

E2F* SP* REL* CTCF MAZ

POU2F2 CREB3* SP* E2F* CRXBPTF REL TFAP2* NRF1 ETV4

STAT NFKB* REL* CTCF E2F*

REL E2F* MAFF PATZ1 NRF1

STAT1 EGR2RELA FOSL2 JUNB TFAP4 EGR2

E2F* STAT* TFAP4 EGR2 EP300

GABPB2

NFYA E2F* ZFP161 MYC FOSL*

STAT* TP53 REL FOSL2 E2F*

FOSL2 ARNTL

FOSL2

FOSL2 JUN TFAP4 STAT1 EGR2

ERG GABPB2 FOS* NFE2L2

REL* NFKB*IRF1 STAT1 CREB* PATZ1 MAZ

RBPJ CREB3* EGR2 EHF STAT*

E2F* FOSL2 JUN TFAP2A

EGR2 FOSL2

E2F*

E2F* NRF1 EGR1

Supplementary Figure 7. Heatmap of transcriptome perturbations. Enlarged version of Fig. 2a. Enriched GO terms and KEGG pathways are listed adjacent to the numbered expression clus-ters. Transcription factors (TFs) with enriched binding sites and gene targets enriched for the listed GO and/or KEGG pathways that are physically associated with or differentially expressed in response to viral proteins are shown, with * denoting multiple members of a TF family. TFs are ranked by odds ratio (OR) for pathway enrichment, and up to 5 TFs are shown for any cluster. In cluster C1 eight of the nine transcripts are snoRNAs (# sign). Upper dendrogram is shaded by viORF grouping. Blocks show which viral proteins associate with the indicated host proteins, as detected in our data set (grey) or manually curated (green).

EBV HPV Adenovirus Polyomavirus


12


1 0 | W W W. N A T U R E . C O M / N A T U R E

RESEARCH

-0.8 -0.6 -0.4 -0.2 0.0 0.2 0.4

0.0

0.1

0.2

0.3

SV40-LT

HVP5-E6

HPV6b-E6

HPV8-E6

HPV16-E6

N-GFPSV40-ST

MCPyV-ST

EBV-BGLF3

N-GFP

SV40-LT

HPV18-E6NOX

HPV18-E6X

MCPyV-LTTRUNC

C-GFP

Average cluster C31 log2(fold change)

Gro

wth

rate

(1/d

ays)

-0.8 -0.6 -0.4 -0.2 0.0 0.2 0.4

0

20

40

60

80

100

SV40-LT

HVP5-E6

HPV8-E6

HPV6b-E6HPV16-E6

N-GFP

SV40-ST

MCPyV-ST

MCPyV-LTTRUNC

EBV-BGLF3

N-GFP

SV40-LT

HPV18-E6NOX

HPV18-E6XC-GFP

Per

cent

age

of s

enes

cent

cel

ls

Average cluster C31 log2(fold change)

R = 0.8Padj = 0.008

R = -0.92Padj = 0.016

Supplementary Figure 8. Growth rate and senescence of selected IMR-90 cell lines expressing viral ORFs. Growth rate and senescence plotted against average expression of genes in cluster 31. Error bars in growth rate represent standard deviation across three samples in all cases except for MCPyV-LTTRUNC, which is across two samples, and HPV5-E6 and HPV8-E6, which represent one sample each. Error bars in senescence represent standard devia-tion across two samples.


13

W W W. N A T U R E . C O M / N A T U R E | 1 1


LF2

HPV18-E5

SV40-ST

HPV11-E7

HPV6b-E5BHPV6b-E7

BRLF1

HPV16-E7HPV8-E7

BNLF2A

GATA6REL

CREB3L1STAT4

ZNF219

GABPB2CREBL2ARNTLFOSL2CREB3BACH1FOXK1ERG

STAT1E2F5TFDP2E2F1

ELF1TP73

SP4HBP1CREB5

STAT2ATF3

BLRF2BDLF3.5BGLF3

MCPyV-STHPV5-E6

HPV18-E7

HPV16-E6XX

BFLF2

ZFP161

NFIATEAD2

FLI1

NFIBTFAP4SMAD3

KLF7 HPV8-E6

HPV18-E6EBNA1

BILF1

HPV11-E5B

MCPyV-LTTRUNC

TSV-LSTMCPyV-LT

BFRF0.5

HES1

TFAP2C

LEF1

E2F3STAT3POU2F2

MYBL1EGR1MYC

FOS

TP53

HPV6b-E6

BDLF4HPV5-E7

WU-LST

BTRF1

E4ORF1NR5A2

RELB

E2F4

TFDP1

EGR2

NFKB2JUNB

NFYA

EHFCEBPB

BHRF1EBNALP

JCMad1-LSTHPV11-E6

JCCY-LSTBALF1

BXLF1

NFKB1MAXMAFF

CRXBPTF

EP300

PRDM1

RB1ZNF281NFE2L3

TEAD4RORA

SV40-LT

MCPyV-LST

BNRF1

BVLF1HPyV7-LST

MYBL2IRF1

CTCFSOX11

RUNX1JUN

ETV4

STAT5A

ELF4

MAZMYBSP1SP3

GABPB1ELK3

NFE2L2

E2F2

FOSL1

NFYBETS1

PATZ1

BNLF2BHPV16-E6NOX

BSRF1HPV16-E6

HPV16-E6NOXI128T

BGLF3.5

E4ORFB

LMP2ABRRF2

HPV18-E6NOXE1B19K

HPyV6-LST

E4ORF3

C

Differential expression? V

C

Experimental TFC V

Predicted

Cell-substrate adhesion

Regulation of fibroblast proliferation

DNA damage response

Causes expression change of TF

EBV HPV Adenovirus Polyomavirus

TAP-MS Y2H Binding site enrichment

TFs

Clusters

HPV18E6

HPV5E6

C27

SV40LT

ETV4

E4ORF3

BPTF

BILF1

ELF4

HPV11E7

JUN TEAD2 FOSL2

HPV6bE7

HPV5E6

C8

HPV8E6

HPyV6LST

TFAP4

EGR1

HPV18E7

HPV6bE7

FOSL2

C13

HPV16E6

HPV11E7BRLF1

TFs

Clusters

Viralproteins

Viralproteins

C28

C24

C25

C30

C31

C14

C9

C22C11C19C23C20C16

C8C15C13

C3

C27

C18

C26

C10

C7C17C21

C2

C12C5C4

C6C29

Supplementary Figure 9. Viral proteins, transcription factors and clusters. Network repre-sentation of all predicted viral protein-TF-cluster cascades (left). Schematic (right) shows how viral protein-TF-target gene network was constructed, with representative networks underneath.


14



RESEARCH

IMR-90

WB

IP: HA

kDa HP

V5-

E6

HP

V8-

E6

HP

V6b

-E6

HP

V11

-E6

HP

V16

-E6

HP

V18

-E6

GFP

G

FP W

CL

EP300 225 -

102 - MAML1 150 -

GFP

HP

V5-

E6

HP

V8-

E6

HP

V6b

-E6

HP

V11

-E6

HP

V16

-E6

HP

V18

-E6

WB

WCL

EP300 225 -

kDa

MAML1 102 - 150 -

HA 31 - 24 -

38 -

UBE3A 76 -

102 - 102 -

HA *31 - 24 -

38 -

UBE3A 76 -

Actin 38 -

WB

IP: HA

kDa siM

AM

L1

HP

V5-

E6

HP

V8-

E6

HP

V6b

-E6

HP

V11

-E6

HP

V16

-E6

HP

V18

-E6

siC

ontro

l

EP300 225 -

U-2 OS

HP

V18

-E6

HP

V16

-E6

HP

V11

-E6

HP

V6b

-E6

HP

V8-

E6

HP

V5-

E6

pNM

CV

WB

WCL

EP300 225 -

kDa siM

AM

L1

siC

ontro

l

150 -

pNM

CV

HA

UBE3A

24 -

102 - 76 - UBE3A 102 -

76 -

MAML1 150 - 120 -

24 - HA

150 - 102 - MAML1

52 - Actin 38 -

WCL

Supplementary Figure 10. MAML1 is targeted by E6 proteins from HPV5 and HPV8. Western blots of whole cell lysates (WCL) and co-immunoprecipitations of HPV E6 proteins in IMR-90 cells (upper panels) or U-2 OS cells (lower panels).


15



N-C

MV

5E6

8E6

6bE

6

11E

6

16E

6

18E

6

ICD

RLU

No

ICD

1

2

U-2 OS CellsHES1 reporter

25, 50, 100 ng of plasmid

3

0 WB:

FLAG

IMR-90

Actin

MAML1 150 - 52 -

kDa

38 -

Rel

ativ

e ex

pres

sion DLL4

HES1

Primers

Rel

ativ

e ex

pres

sion

MAML1 HES1 DLL4

Control MAML1

GFP

HPV5-E6

HPV6b-E

6HPV11

-E6

HPV16-E

6

HPV18-E

6

Cont

rol

MAM

L1

00.20.40.60.81.01.21.41.61.82.0

0

0.2

0.4

0.6

0.8

1.0

1.2 shRNA

HPV8-E6

Supplementary Figure 11. The E6 protein from HPV5 and HPV8 regulates expression of Notch pathway responsive genes HES1 and DLL4. Upper panels: qPCR of Notch pathway responsive genes upon expression of E6 proteins from different HPV types (left panel) or upon knockdown of MAML1 (right panel). Error bars represent standard deviation across two experi-ments and three experiments, respectively. Western blot of MAML1 in IMR-90 cells upon shRNA knockdown. Lower panel: HES1 luciferase reporter assays of co-transfected Notch intracellular domain (Notch-ICD) and E6 proteins at increasing concentrations from different HPV types in U-2 OS cells. Error bars represent standard deviation across three experiments. Western blot (inset) depicts E6 protein expression from the indicated HPV type in a representative experiment.


16



RESEARCH

−log(P−value)

Fold

enr

ichm

ent i

n C

OS

MIC

Cla

ssic

pr

otei

n re

cove

ry

1.6

1.8

2.0

2.2

2.4

●

●

●

19

17

16

15

13

1.0 1.5 2.0 2.5 3.0

Uniquepeptides

● 1● 2● 3

45

P=0.05

19,398 proteins outside diagram

TAP-MS

COSMIC Classic

Y2H TF 56359

4612

286

13

11 2

0

0

0

0

91

16 cancer proteins within Y2H+TF+TAP-MS: P = 0.007

Supplementary Figure 12. DNA tumour viruses target cancer genes. Fold enrichment of COSMIC Classic genes in TAP-MS, Y2H and TF VirHost data sets as a function of the number of unique peptides (circle sizes) detected for host proteins identified through TAP-MS. Numbers of proteins in intersections of VirHost and COSMIC Classic genes are indicated. Venn diagram of overlaps between proteins identified by TAP-MS (≥3 unique peptides), Y2H, TF and COSMIC Classic genes.


17

80

9095

97 98 99

Rep

rodu

cibi

lity

(%)

Minimum number of unique peptides

40

60

50

70

80

90

100

53

1 2 3 4 5 6 7 8 9 10

TAP-A, TAP-BTAP-B, TAP-AAverage

Supplementary Figure 13. Reproducibility of viral-host protein associations observed in replicate TAP-MS experiments as a function of the number of unique peptides detected. Data were plotted based on both experimental orientations (TAP-A then TAP-B, blue line and TAP-B then TAP-A, red line), with the average indicated by the green dashed line.


18



Protein

0.0

0.1

0.2

0.3

0.4

0.5

20 40 60 80 100

Pol

yphe

n2 m

utat

ion

dele

terio

usne

ss s

core

Polyphen2 Prediction

UnscoredBenignPossibly DamagingProbably Damaging

Somatic mutation count

Cancer type (source)

Chronic Lymphocytic Leukemia

GBM(Baylor)

GBM(WashU)

Head&Neck(Agrawal)

Head&Neck(Stransky)

Medulloblastoma

Multiple Myeloma

Non−Small Cell Lung Cancer

Ovarian(Baylor)

Ovarian(Broad)

Ovarian(WashU)

Small Cell Lung Cancer

0 2000 4000 6000

0.0

0.1

0.2

0.4

0.5

TP53

HIS

T1H

4AP

TEN

KR

AS

HIS

T1H

3AC

DK

N2A

NR

AS

C19

orf5

3E

GFR

RH

OA

0.3

Supplementary Figure 14. Somatic mutations in genome-wide tumour sequencing projects. Somatic mutations identified through genome-wide sequencing studies of human cancers and evaluated for impact using Polyphen2 (upper panel). Ranking of proteins with somatic mutations in cancer based on the cumulative inferred impact of all observed mutations (lower panel) (Supplementary Table 11). Inset: top ten ranked proteins.


19



RESEARCH

Supplementary Figure 15. Network of VirHostSM to host targets and cancers. Mapping of VirHostSM gene products to both tumours in which they are mutated (left) and to viral interactors (right). Proteins annotated with the GO term “regulation of apoptosis” indicated in purple.

MCPyV-LT

16-E6

SV40-ST

18-E6NOX16-E6NOX

18-E7

BGLF3.5

Medulloblastoma

Ovarian

Small cell lung

11-E7

JCMad1-LST

BVLF1

HPyV6-LST18-E6HPyV7-LST

MCPyV-ST

Glioblastoma

Head and neck

Multiple myeloma

BK-LTTRUNC

WU-LST

BGLF3

JCCY-LST

EBNA3B5-E7

18-E6X

BTRF1

BNRF1

LMP2A

16-E7

BSRF1

EBNALP

E1A

BBRF2

11-E5B

E1B19KE4ORFB

33-E7

BDLF4

EBNA1

BALF1

16-E5BFRF0.5

18-E5BSLF2/BMLF1

BHRF1

BNLF2A

6b-E5A

SV40-LT

8-E76b-E7

8-E6

PSMB6

FBXW7

PSMA4CDK4

MEOX2POLR1C

DMRT3

MAGEA1NFE2L2

FOXK1

POLM

ELF1

DNAJA3CNP

ZBTB25WDYHV1

GMCL1TP53TAF9

PRTFDC1

UBE2IPRAF2

TMEM126ALAMTOR1

BAK1BIK

KHDRBS2RPS24

KRTAP13-1

C1orf50

ROPN1

NFYBFAM71C

EGFR

STAU1

ARHGDIARRAS2

RABL3CHRM2

F2R

RB1PSMC5

PPP2R1A

Regulation of apoptosis

Y2H interactionTAP-MS association

COSMIC Classic

TF

E4ORF3

5-E6BDLF3.5MCPyV-LTTRUNC

BNLF2B

BILF1

Causes expression change of TF


20



0.5 x106

1 x106

P3

Split 1:3P4

P5

P6

Thaw

3

9

27

Split 1:3

Split 1:3

Transduce 26 viral genes& controls (GFP, SV40LT)

P6

Split 1:5P7

Split 1:3P8

5

9

P9

RNA

5

P12

Protein

18

Freeze

P7, P8, P10

Puromycin selection

Western Blot Analysis

MycoplasmaTesting

Sequence viORF

Bioanalyzer

P0

Split 1:3P1

P2

P3

Thaw

Cell bank Generation of cell lines Quality Controls

IMR-90 Bank

IMR-90 Bank

3

9

27

Split 1:3

Split 1:3

Supplementary Figure 16. The IMR-90 cell culture pipeline. Generation of the IMR-90 cell bank (left), and generation of IMR-90 cell lines expressing viORFs (middle) and quality control (QC) measures taken (right).


21



RESEARCH

50

110

90

160200

MW(kDa)

40

30

201510

70

Viral Protein

SubcellularFractionCell Line #

C N C N C N C N C N C N

1 33 59 85 99 111

CTRL-GFP CTRL-GFP CTRL-GFP CTRL-GFP CTRL-GFP CTRL-GFP

Viral Protein

SubcellularFractionCell Line # 0 00

CTRL-IMR90 CTRL-IMR90 CTRL-VECTOR

C N C N C N

Viral Protein


N M N M N M N M N M

44 45 47 48 49

HPV-6b E5A HPV-6b E5B HPV-11 E5B HPV-16 E5 HPV-18 E5


22



C N C N C N C N C N C N C N C N

Viral Protein

SubcellularFractionCell Line # 21 22 23 24 25 26 38 40

HPV-5 E6 HPV-6b E6 HPV-8 E6 HPV-11 E6 HPV-16 E6 HPV-18 E6 HPV-5 E6 HPV-8 E6

Viral Protein

87 91 92 93

HPV-16 E6no* HPV-16 E6no*I128T

HPV-18 E6no* HPV-18 E6*

C N C N C N C NSubcellularFractionCell Line #

C N C N C N C N C N C N C N C N

Viral Protein

SubcellularFractionCell Line # 15 16 17 18 19 20 96 113

HPV-5 E7 HPV-6b E7 HPV-8 E7 HPV-11 E7 HPV-16 E7 HPV-18 E7 HPV-5 E7 HPV-16 E7


23



RESEARCH

Viral Protein


C N C N C N C N C N C N C N

5 6 37 60 86 112 50

POLYO-SV40LT

POLYO-SV40LT

POLYO-SV40LT

POLYO-SV40LT

POLYO-SV40LT

POLYO-SV40LT

POLYO-SV40sT

Viral Protein

7 51 118

POLYO-MCPyVLT

POLYO-MCPyVsT

POLYO-MCPyVLsT

C N C N C NSubcellularFractionCell Line #

Viral Protein



114 115 116 117 119 120

POLYO-JCCYLsT

POLYO-JCMad1LsT

POLYO-BKLsT

POLYO-WULsT

POLYO-HPy6LsT

POLYO-HPy7LsT


24




Viral Protein

SubcellularFractionCell Line # 14 53 54 56 57 70

EBV-EBNALP EBV-LMP2a EBV-BHRF1 EBV-BRLF1 EBV-BNRF1 EBV-EBNA1

Viral Protein

77 78

79 81 82 109 110

EBV-BTRF1 EBV-BGLF3

EBV-BVLF1 EBV-BDLF4 EBV-BDLF3.5 EBV-BXLF1 EBV-BALF1

C N C N

C N C N C N C N C NSubcellularFractionCell Line #

Viral Protein


C N C N C N

C N C N C N

129 130 131

132 135 136

EBV-BNLF2a EBV-BNLF2b EBV-BFRF0.5

EBV-BSRF1 EBV-BGLF3.5 BFLF2

C N C N C N

102 122 125

ADENO5E1B19K

ADENO5E4ORF1

ADENO5E3GP19K

Supplementary Figure 17. Silver stain analyses of viral-host protein complexes. A 15% aliquot of the final eluate of each TAP was resolved on polyacrylamide gels then visualized by silver-stain (shown here for the first TAP replicate).


25



RESEARCH

Supplementary Figure 18. Cluster propensity of microarrays before and after ComBat. Hierarchical clustering of all microarrays (a) before and (b) after applying ComBat, an algorithm used for removing batch effects in microarray data.

X21D

X120

EX5

6CX5

6DX5

6AX5

6BX5

6E X3A

X3D

X3B

X3C

X3E

X109

EX1

09A

X109

CX1

09B

X109

DX3

5EX3

5AX3

5DX3

5BX3

5CX7

0BX7

0C X2B

X2A

X2D

X70E

X70A

X70D

X121

CX1

21A

X121

BX1

21D

X121

EX4

9DX4

9CX4

9AX4

9BX4

9EX1

27B

X127

AX1

27C

X127

DX1

27E

X7A

X7C

X7E

X7B

X7D

X130

AX1

30C

X130

DX1

18C

X118

DX1

36A

X136

CX1

36D

X58C

X58E

X58B

X58A

X58D X4

DX3

4BX4

CX2

CX2

EX1

07D

X107

AX1

07B

X107

CX1

07E

X124

BX1

24E

X124

DX1

24A

X124

CX2

3AX2

3EX2

3CX2

3BX2

3DX3

8BX4

0AX4

0BX2

1AX2

1CX2

1BX2

1EX4

0cX4

0DX3

8CX3

8EX3

8AX3

8DX5

5DX5

5AX5

5BX5

5CX5

5EX9

8CX9

8DX9

8AX9

8BX9

8EX2

4AX2

4BX2

4DX2

2BX2

2CX2

2DX2

2AX2

2EX2

4CX2

4EX1

30B

X130

EX1

36B

X136

EX1

18A

X118

BX1

18E

X74A

X74B

X77B

X77D

X77E

X53B

X53D X4

AX4

BX4

EX5

3EX5

3AX5

3CX1

23E

X123

DX1

23C

X123

AX1

23B

X34C

X34A

X34D

X20E

.2X2

0E.3

X20E

.4X2

0E.1

X20E

_040

7X2

0E_0

416

X120

AX1

14B

X114

EX1

14C

X114

AX1

14D

X120

CX1

20B

X120

DX1

19A

X119

CX1

19B

X119

DX1

19E

X87C

X87D

X25B

X25D

X87B

X92B

X92A

X92E

X87A

X91A

X25A

X26A

X60C

X60A

X60B

X60D

X60E

X6D

X5A

X5D

X6A

X5C

X5E

X86A

X86B

X6C

X37B

X37D

X6C

_040

7X6

C_0

416

X6C

.1X6

C.2

X87E

X19A

X135

CX1

35B

X135

EX4

7AX4

7EX4

8AX4

8EX8

5EX8

5BX8

5CX5

9AX3

3CX3

3DX1

7EX4

4AX4

4BX4

4EX1

31A

X88C

X132

AX7

5EX7

8EX7

8DX8

1DX8

2DX4

5EX4

5CX4

5AX4

5BX7

8AX8

2AX8

2CX8

5DX1

7AX9

6DX8

8DX9

3DX1

4BX1

4AX1

4EX1

02D

X102

BX1

02C

X50A

X50B

X50E

X50C

X50D

X99E

X102

EX3

3AX8

8EX7

9DX7

9EX8

2EX7

5DX5

9EX5

9BX5

9DX4

5DX1

29C

X54B

X54D

X75C

X135

AX1

22E

X125

EX4

8DX1

9BX1

9CX1

32B

X125

BX1

31B

X122

BX1

22C

X125

CX1

31C

X1B

X1C

X1E

X16C

X15A

X15D

X17B

X17C

X51C

X51D

X51E

X51A

X51B

X88A

X102

AX9

9DX5

9CX7

5BX3

3BX3

3EX1

6BX1

5EX1

5BX1

5CX8

5AX9

3AX4

4CX1

11B

X111

EX1

11A

X111

DX1

AX1

25A

X132

CX1

8AX1

6EX1

29A

X129

EX1

9DX1

9EX1

8CX1

8DX1

6AX1

8BX1

8EX1

32D

X132

EX1

29D

X78B

X78C

X48B

X48C

X47C

X47B

X47D

X44D X1

DX1

6DX1

7DX6

C.3

X92C

X92D

X91C

X91D

X91B

X91E

X86D

X86C

X86E

X5B

X6B

X6E

X6C

_033

010

X37A

X37C

X37E

X26C

X26B

X26E

X26D

X25C

X25E

X20E

_033

010

X20C

X20E

X20D

X20A

X20B

X57C

X57A

X57B

X57D

X57E

X81C

X54A

X54C

X54E

X122

AX1

22D

X131

EX1

35D

X125

DX1

31D

X75A

X93E

X99B

X96B

X88B

X93B

X81E

X129

BX1

11C

X74D

X74E

X81B

X77A

X81A

X74C

X77C

X14C

X14D

X93C

X96C

X99A

X99C

X96A

X96E

X110

DX1

10A

X110

BX1

10C

X110

EX7

9AX7

9BX7

9CX1

12A

X112

DX1

12C

X112

BX1

12E

X36C

X36A

X36E

X36B

X36D

X117

AX1

17B

X117

CX1

17D

X117

EX1

15B

X115

EX1

15A

X115

CX1

15D

X113

AX1

13E

X113

BX1

13C

X113

D

X12

0EX

87E

X87

CX

87D

X92

CX

92D

X91

CX

91D

X91

BX

91E

X6C

.3X

86D

X86

CX

86E

X98

CX

98D

X98

AX

98B

X98

EX

85D

X85

EX

85B

X85

CX

96D

X88

DX

93D

X88

EX

102D

X10

2BX

102C

X99

EX

102E

X79

DX

79E

X75

DX

59E

X59

BX

59D

X77

EX

82E

X13

1AX

135A

X12

9CX

125C

X13

1CX

125E

X12

5BX

131B

X12

2EX

122B

X12

2CX

107D

X10

7AX

107B

X10

7CX

107E

X11

0DX

110A

X11

0BX

110C

X11

0EX

109E

X10

9AX

109C

X10

9BX

109D

X11

2AX

112E

X11

2DX

112B

X11

2CX

117C

X11

7DX

117E

X11

7AX

117B

X11

5BX

115E

X11

5AX

115C

X11

5DX

113A

X11

3EX

113B

X11

3CX

113D

X12

0CX

120B

X12

0DX

119A

X11

9BX

119C

X11

9DX

119E

X12

0AX

114B

X11

4EX

114A

X11

4CX

114D

X57

CX

57A

X57

BX

57D

X57

EX

36C

X36

AX

36E

X36

BX

36D

X6C

X6C

_040

7X

6C_0

416

X6E

X6C

_033

010

X37

BX

37D

X6D

X5D X5E

X37

AX

37C

X37

EX

5AX

6AX

5C X5B

X6B

X6C

.1X

6C.2

X60

CX

60A

X60

BX

60D

X60

EX

79A

X79

BX

79C

X59

CX

78B

X78

CX

74C

X74

EX

77C

X81

BX

77A

X81

AX

75B

X81

CX

18A

X16

EX

16B

X15

BX

15C

X1A

X54

AX

19D

X19

EX

18D

X15

EX

18E

X47

AX

47E

X47

CX

47B

X47

DX

44D

X44

EX

33B

X33

EX

44C

X48

BX

48C

X2D

X4D X4E

X15

DX

16D

X17

DX

1DX

14D

X4C

X2C X2E

X3D X3E

X3A

X3B

X3C

X35

EX

35D

X35

CX

35A

X35

BX

53B

X53

DX

53E

X53

AX

53C

X2A

X2B

X4A

X4B

X50

AX

50B

X50

CX

50D

X50

EX

33A

X48

DX

48A

X48

EX

33C

X33

DX

44A

X44

BX

45C

X45

AX

45B

X45

DX

45E

X49

DX

49C

X49

AX

49B

X49

EX

17A

X19

AX

1EX

14E

X14

CX

14A

X14

BX

54C

X54

EX

54B

X54

DX

17C

X17

EX

16A

X18

BX

16C

X18

CX

19B

X19

CX

1BX

1CX

15A

X17

BX

56C

X56

DX

56A

X56

BX

56E

X85

AX

93A

X93

CX

88C

X96

CX

99A

X99

CX

96A

X96

EX

93E

X88

BX

93B

X96

BX

99B

X88

AX

102A

X99

DX

129A

X12

9EX

129D

X11

1BX

111E

X13

5CX

135B

X13

5EX

129B

X11

1CX

122A

X12

2DX

135D

X12

5DX

131D

X13

1EX

132B

X13

2CX

132D

X13

2EX

132A

X12

5AX

111A

X11

1DX

58C

X58

EX

58B

X58

AX

58D

X70

EX

70A

X70

DX

70B

X70

CX

74A

X74

DX

77D

X74

BX

77B

X82

CX

78A

X59

AX

82A

X78

EX

81E

X78

DX

81D

X82

DX

75E

X75

AX

75C

X51

CX

51D

X51

EX

51A

X51

BX

20E

.3X

20E

.2X

20E

.1X

20E

.4X

20A

X20

BX

20E

_033

010

X20

E_0

407

X20

EX

20C

X20

DX

20E

_041

6X

86A

X86

BX

87B

X92

BX

92A

X92

EX

87A

X91

AX

26A

X25

AX

25D

X25

CX

25B

X25

EX

26D

X26

EX

26B

X26

CX

40c

X40

DX

38E

X38

CX

38A

X38

DX

38B

X40

AX

40B

X21

DX

21B

X21

CX

21A

X21

EX

23C

X23

BX

23D

X23

AX

23E

X12

3CX

123E

X12

3DX

123A

X12

3BX

124D

X12

4BX

124E

X12

4AX

124C

X12

1CX

121D

X12

1EX

121A

X12

1BX

136A

X11

8CX

118D

X13

0AX

130C

X13

0DX

127B

X12

7AX

127C

X12

7DX

127E

X11

8AX

118B

X11

8EX

136C

X13

6DX

130B

X13

0EX

136B

X13

6EX

55A

X55

BX

55D

X55

CX

55E

X7B

X7D X7A

X7C X7E

X34

BX

34C

X34

AX

34D

X24

BX

24A

X24

DX

24C

X24

EX

22B

X22

DX

22E

X22

AX

22C

Group 1 Group 2 Group 3 Group 4 Group 5 Group 6 Group 7 Group 8

a

b


26

Enriched GO pathways

Freq

uenc

y

0

20

40

P < 0.01

20 40 60Enriched KEGG pathways

Freq

uenc

y

0

15

30

P < 0.01

0.0 0.4 0.8

Supplementary Figure 19. Cluster coherence. Frequency of KEGG and GO pathway enrich-ment as compared to randomly generated clusters.


27



Average number of unique peptides detected

Mea

n m

RN

A in

tens

ity in

con

trol s

ampl

es

6

8

10

12

14

75th percentile

50th percentile

25th percentile

10 20 30 40 50 60

Supplementary Figure 20. Expression bias of TAP-MS associations. Viral targets identified through TAP-MS are biased towards highly expressed genes. Relationship between mRNA abun-dance in control samples of IMR-90 (y-axis) versus the number of unique peptides observed for viral associations for that protein (x-axis). Number of unique peptides is the average for two biological replicates. If more than one virus association is seen for a host protein, the association with the highest number of unique peptides is shown. Horizontal lines correspond to the indicated percentiles for mRNA abundance of all transcripts on the microarray.


28



RESEARCH

Fold enrichment over random sets

Number of identified unique peptides

● 1 ● 2 ● 3 4 5

Num

ber o

f hos

t−vi

rus

PR

Sas

soci

atio

ns re

cove

red

10

11

12

13

14

15

16

17 ● ●

●

30 35 40 45 50 55 60

Supplementary Figure 21. Overlap of viral-host protein pairs identified through TAP-MS with a literature-curated positive reference set. All points are significant at P < 0.001. Dot size reflects minimum number of unique peptide detections required for protein identification.


29

Signal threshold

Rec

over

y ra

te (%

)

PRS (50)RRS (94)Y2H Viral-Hostdata set (447)

1 2 3 4 5

0

5

10

15

20

25

30

PRS RRS Y2HData set

Rec

over

y ra

te (%

)

0

2

4

6

8

10

12

14

Supplementary Figure 22. Assessment of Y2H data set. Percentage of Y2H interacting protein pairs positive in wNAPPA assay at increasing assay signal for PRS, RRS and Y2H viral-host data set. Inset: fraction of PRS or Y2H viral-host data set positive in wNAPPA at a threshold of 1% RRS positive.


30



Supplementary Methods

A. Viral ORFeome cloning

Open-reading frame (ORF) clones encoding selected proteins from adenovirus 5 (Ad5), seven

human papillomaviruses, and nine polyomaviruses (Supplementary Table 1) were obtained by

PCR-based Gateway recombinational cloning, following a protocol previously described for

cloning Epstein-Barr virus ORFs4. To generate viral entry clones ORFs were PCR-amplified with

KOD HotStart Polymerase (Novagen). The PCR primers used contained attB1.1 and attB2.1

recombination sites fused to ~20 nucleotides of ORF-specific forward and reverse primers,

respectively. Primers used to clone viral ORFs are listed (Supplementary Table 13). PCR

products were then transferred into pDONR223 by a Gateway BP reaction, followed by

transformation into chemically competent E. coli DH5α cells, selecting for spectinomycin

resistance30.

Sequence-verified entry clone viral ORFs were transferred by Gateway LR recombinational

cloning (Invitrogen) into appropriate expression vectors. Recombination products were directly

transformed into E. coli (DH5α-T1R strain) via selection for ampicillin resistance in liquid LB

media. Plasmid DNA was extracted from E. coli grown overnight in a 96-well format using a

Qiagen BioRobot 8000. The prepared plasmid DNA was used for transformations into yeast

cells or for generation of retrovirus for subsequent transductions into IMR-90 human cells.

For assay of protein interactions by yeast two-hybrid (Y2H), viral ORFs (viORFs) were

introduced into both pDEST-DB and pDEST-AD-CYH2 destination vectors9, generating Gal4

DNA binding domain (DB)-viORF hybrid proteins and Gal4 activation domain (AD)-viORF hybrid

proteins, respectively.

For expression in mammalian cells, sequence validated viral ORFs in pDONR223 were

recombined into either the Gateway destination vector MSCV-N-Flag-HA-IRES-PURO (NTAP)

or MSCV-C-Flag-HA-IRES-PURO (CTAP) (gifts of M. Sowa and J. W. Harper)31 using LR


31



RESEARCH

Clonase (Invitrogen). The following primer sets were used for resequencing of the resulting

plasmids:

MSCV-N and MSCV-C Fwd: 5’-CCCTTGAACCTCCTCGTTCGACC-3’

MSCV-N Rev: 5’-GCCAAAAGACGGCAATATGGTGG-3’

MSCV-C Rev: 5’-GTCGGGCACGTCGTAGGG-3’

Adenovirus: Nine full length ORFs (Supplementary Table 1) were PCR amplified from Ad5

genomic DNA (human adenovirus 5 strain Adenoid 75 from the ATCC (ATCC VR-5) to generate

Gateway entry clones.

Epstein-Barr Virus (EBV): Eighty-one EBV ORFs were investigated (Supplementary Table 1).

EBV Gateway entry clones and Y2H DB-X and AD-Y clones were previously described4.

Selected EBV ORFs (Supplementary Table 1) were transferred to the NTAP vector for

subsequent transduction of IMR-90 cells.

Human Papillomaviruses (HPV): Seven HPV types32-38 (virus classifications from

http://pave.niaid.nih.gov/#prototypes?type=human) were chosen for this study: HPV6b, 11, 16,

18 and 33 of the alpha genus associated with mucosal lesions (gifts of Y. Jacob), and HPV5

and HPV8 (gifts of Y. Jacob and H. Pfister) of the beta genus which infect cutaneous epithelia.

ORFs clones encoding the early region proteins (E4, E5 proteins where relevant, E6, and E7) of

all seven HPVs (Supplementary Table 1) were amplified by PCR and cloned by recombination

into Gateway vectors.

HPV16 and HPV18 E6 variants: Due to the occurrence of internal splicing in HPV16-E6 and

HPV18-E6 and the desire to assure that full-length E6 proteins were expressed in IMR-90 cells,

various splice-defective derivatives of HPV16-E6 and HPV18-E6 were also generated. HPV11,

6b, 5 and 8 do not encode internally spliced versions of E6. HPV16-E6X, HPV16-E6XX and

HPV18-E6X correspond to the previously described major splice variants of these proteins39,40.

“E6X” designates the E6*, whereas E6XX designates the E6** splice variants. Primers used to

clone and mutagenize HPV16-E6 and HPV18-E6 variants are listed (Supplementary Table 13).


32



HPV16-E6X, HPV16-E6XX and HPV18-E6X were cloned by PCR amplification using as

template genomic DNA from IMR-90 cell populations that had been transduced with NTAP-

HPV16-E6 or NTAP-HPV18-E6 expression vectors. These cell lines had been shown to contain

integrated copies of these splice variants during the quality control process of the IMR-90 cell

culture pipeline. PCR products with the 5’ CACC overhang were directionally cloned into the

pENTR vector, sequenced, cloned into the NTAP vector and re-sequenced.

To generate versions of HPV16 and HPV18-E6 proteins that do not undergo internal splicing

events (designated E6NOX), the splice donor site(s) encoded within the respective E6 ORFs

were eliminated by site-directed mutagenesis (QuikChange®; Stratagene)40. As a consequence

the HPV16-E6NOX and HPV18-E6NOX proteins carry a V to L mutation at amino acid residues

42 and 44, respectively40.

The previously characterized HPV16-E6 mutants defective in p53 binding (Y54D) and

UBE3A/E6AP binding (I128T)41 were generated by site directed mutagenesis (QuikChange®;

Stratagene) of the NTAP-HPV16-E6NOX vector.

PDZ protein binding defective HPV16-E6 and HPV18- E6 mutants were generated by

cloning the HPV16-E6NOX and HPV18-E6NOX ORFs into the CTAP vector. This construct

fuses HA and FLAG epitope tags to the C-terminal PDZ binding domains and thereby blocks

association with cellular PDZ proteins.

Polyomaviruses: ORF clones were obtained from nine polyomaviruses: BK, HPyV6, HPyV7,

JCCY, JCMad1, MCPyV, SV40, TSV and WU. The entire early region of BK was PCR amplified

from BKPyV Dunlop genomic DNA cloned into pBR322 (gift from Michael Imperiale and Peter

Howley). Upon quality control we found that the vector encoded a truncated form of Large T

protein. The entire early region of HPyV6 was PCR amplified from pHPyV6-607a (Addgene

plasmid 24727)42. The entire early region of HPyV7 was PCR amplified from pHPyV7-713a

(Addgene plasmid 24728)42. The entire early region of JCCY was amplified from JCCY genomic

DNA (gift from Igor Koralnik). The entire early region of JCMad1 was amplified from JCMad1


33



RESEARCH

genomic DNA (gift from Igor Koralnik). The entire early region of TSV was amplified from

pUC19-TSV43. The entire WU early region was amplified from pcDNA WUER (gift from David

Wang). SV40 ORFs encoding Large T antigen (LT)44, Small T antigen (ST)45, Agnoprotein,

17KT, VP1, VP2, and VP3 were PCR amplified from SV40 genomic DNA.

MCPyV-LST was PCR amplified from Addgene plasmid 24729 pMCPyV-R17a containing

the MCPyV R17a genome42. MCPyV ORF clones MCPyV-LT, MCPyV-LTtrunc (which contains

a premature stop codon at position 961-963), and MCPyV-ST were subcloned from a MCPyV-

LST ORF that was first PCR amplified from tumour samples. MCPyV-ST was PCR amplified

directly, whereas MCPyV-LT and MCPyV-LTtrunc were obtained by overlap extension PCR46 to

generate a cDNA for LT (nt 429 to 861 of MCV). MCPyV-SPLT was created similarly but with

MCPyV-LT as template and with a different set of overlapping primers (spanning nucleotides

1622 to 2778 of MCV). LT, ST, and SPLT use the same Gateway attB1-ORF primer and LT and

SPLT use the same Gateway attB2-ORF primer. After two separate PCRs with the relevant

Gateway-ORF primer and internal spanning primer pairs, overlap extension PCR produced the

final PCR product for subsequent BP cloning to generate MCPyV-LT, MCPyV -LTtrunc, and

MCPyV -SPLT entry clones.

B. Yeast two-hybrid (Y2H) assay

Y2H assays used viral ORFs or ORF fragments against the human ORFeome v5.1 collection

consisting of ~15,000 full-length human ORFs10,30 (http://horfdb.dfci.harvard.edu/). Yeast strains

Y8800 and Y893047, of mating type MATa and MATα respectively, harboring the genotype leu2-

3,112 trp1-901 his3Δ200 ura3-52 gal4Δ gal80Δ GAL2::ADE2 GAL1::HIS3@LYS2

GAL7::lacZ@MET2 cyh2R, were transformed together with AD-viORF and DB-viORF

constructs, respectively. For Y2H screening MATa and MATα haploid yeast strains carrying the

corresponding AD-viORFs and DB-viORFs were mated against haploid yeast cells of the

appropriate mating type carrying DB-human hybrid proteins (DB-huORF) or AD-human hybrid


34



proteins (AD-huORFs), respectively. All AD-viORF and AD-huORF plasmids also carry the

counter-selectable marker CYH2, which allows selection on plates containing cycloheximide

(CHX) of yeast cells that do not contain any AD-Y plasmid9,48. Colonies growing on CHX-

containing medium lacking histidine are considered latent autoactivators and the presumptive

pair removed from the data set9,48. At each stage of the interactome mapping pipeline, reporter

gene activity is evaluated in parallel both on regular selective plates and on CHX-containing

plates.

For Y2H screens with viral ORFs as DB-hybrid proteins, the 123 Y8930:DB-viORF yeast

strains (Y8930 transformed with a unique DB-viORF) were individually mated against mini-

libraries containing 188 different Y8800:AD-HuORF yeast strains, each carrying a unique AD-

huORF49. Twenty-seven of the viral ORFs consistently behaved as autoactivators when

screened as DB-viORF fusions in yeast grown on selective medium (Supplementary Table 14).

Primary reciprocal screens of ~15,000 DB-huORFs against AD-viORFs used individual

Y8930:DB-huORF strains mated against virus-specific mini-libraries of Y8800:AD-viORF pools.

Primary screens in both orientations were completed twice and all initial positive pairs from the

primary screens underwent secondary phenotyping prior to determining the interaction pairs by

sequencing11,49,50. Each interacting pair was individually retested from fresh stocks of viral and

human yeast strains in both orientations, irrespective of the original Y2H orientation in which the

viral-human interactions were initially identified. Autoactivating baits were removed at each step

of the primary screening and during retesting.

Viral-human interactions found in multiple primary screens and verified by pair-wise

retesting are valid biophysical interactions11,49,50. Multiple pair-wise retesting can demonstrate

that interactions are reproducibly reliable even though any given pair may not retest positive

each and every time, providing an additional level of confidence in the data set. Besides the

pair-wise retesting done at the time of the original screens, we also carried out an additional

retest of all verified Y2H interaction pairs followed by resequencing of all DB-X and AD-Y ORFs


35



RESEARCH

to confirm ORF identity in each interaction pair. The final set of viral-human interactions,

consisting of all sequence-verified pairs that successfully retested in a CHX-sensitive manner,

contains 53 viral ORFs engaged in 454 interactions with 307 human proteins (Supplementary

Table 2). These 454 interactions constitute a set of biophysically real interactions that can be

successfully used for further analysis and downstream investigations.

For screens with HPV proteins, all individual viral-human pairs from each HPV strain were

also retested in both orientations using the orthologous ORFs of all seven HPVs against all

human proteins found by at least one HPV (i.e., a particular huORF found to interact with only

HPV16-E6 was tested against all seven HPV E6 proteins multiple times and in both

orientations). The final data set of HPV-human interactions includes those additional interaction

pairs even if they were not initially found in the primary screens, as long as they did score

positive multiple times in pair-wise retests, in either orientation, without exhibiting growth on

CHX-containing medium and after the identity of the specific HPV protein was confirmed by

sequencing. Virus-host interactions detected by Y2H are available at

http://interactome.dfci.harvard.edu/V_hostome/index.php.

C. Cell culture pipeline

Generating the IMR-90 cell bank: We obtained the human diploid fibroblast cell line IMR-90

(CCL-186) from the American Type Culture Collection (ATCC). IMR-90 cells were cultured in

DMEM (Cellgro) supplemented with 15% FBS (Omega Scientific), 1% Pen Strep (GIBCO), 1%

Glutamax (GIBCO) and 1% MEM NEAA (GIBCO). A master cell bank of third passage cells

consisted of 27 aliquots that were frozen and stored in liquid nitrogen. A single vial of frozen

cells was thawed and used to generate each batch of 26 cell lines in the pipeline

(Supplementary Fig. 16).

Generating IMR-90 cell lines expressing viral ORFs: Recombinant retrovirus was produced

in Phoenix cells by co-transfecting (Lipofectamine LTX, Invitrogen) plasmids encoding the viral


36



ORF, GAG/POL and ENV. To produce cell lines expressing each viral ORF, a vial of IMR-90

cells from the bank was thawed and expanded (5.0 x 105 IMR-90 cells were seeded onto 10 cm

plate per transduction) to allow simultaneous transduction of 26 individual viral ORFs or controls

(Supplementary Fig. 16), and then infected twice with recombinant retrovirus for 5 – 8 hr in the

presence of 5 µg/ml hexadimethrine bromide (Sigma) with subsequent selection with puromycin

(2 µg/ml).

To permit comparisons between different batches of 26 cell lines, each batch included

biological replicates of the GFP control and SV40 LT. Five pipeline batches with a total of 87

unique viral ORFs led to the establishment of 75 IMR-90 cell lines expressing unique viral ORFs

(Fig. 1b and Supplementary Table 1).

Quality control (QC): IMR-90 cells expressing viral ORFs were uniformly expanded to generate

several vials of cells that were frozen and stored. IMR-90 cell lines expressing viral ORFs were

banked at passages 7, 8, and 10. All cell lines had to pass several Quality Control (QC) steps

(Supplementary Fig. 16): semi-quantitative western blot analysis for viral ORF expression using

anti-HA antibodies (HA-11 Clone 16B12, Covance), mycoplasma testing (MycoAlert Kit, Lonza),

and sequencing of the integrated viral ORF from genomic DNA post transduction. Sixty-four of

the 75 cell lines initially established passed QC (Fig. 1b and Supplementary Table 1). Following

QC, cell lines were processed for microarray analysis and tandem affinity purification.

For the sequencing QC step, genomic DNA was extracted from all cell lines (DNeasy Blood

and Tissue Kit, Qiagen) and the viral ORFs were amplified with LA Taq (Takara), using the

following vector-specific primer sets:

MSCV-N and MSCV-C Fwd primer: 5’-CGCATGGACACCCAGACCAGGTC-3’

MSCV-N Rev primer: 5’-TCACGACATTCAACAGACCTTGC-3’

MSCV-C Rev primer: 5’-GTCGGGCACGTCGTAGGG-3’

PCR products were extracted from the gel (QIAquick Gel Extraction Kit, Qiagen) and sequenced

with the following primers:


37



RESEARCH

MSCV-N and MSCV-C Fwd primer: 5’-CCCTTGAACCTCCTCGTTCGACC-3’

MSCV-N Rev primer: 5’-GCCAAAAGACGGCAATATGGTGG-3’

RNA isolation and microarray analysis: IMR-90 cell lines expressing viral ORFs at passage 9

were seeded into five replicates at a density of 1 x 106 cells per 10-cm plates and harvested

after 48 hr in RNAlater (Ambion). After isolation of total RNA (RNeasy, Qiagen) RNA integrity

was determined using a Bioanalyzer (Agilent). Gene expression was assayed using Human

Gene 1.0 ST arrays (Affymetrix) according to standard protocols.

Cell proliferation and senescence assays: We measured the growth rate of a subset of IMR-

90 lines expressing the viral ORFs HPV5-E6, HPV8-E6, MCPyV-LTTRUNC, SV40-ST, MCPyV-

ST, HPV6b-E6, HPV18-E6X, C-GFP, N-GFP (two biological replicates), EBV-BGLF3, HPV18-

E6NOX, HPV16-E6, and SV40-LT (two biological replicates) between passages 9 and 12. Cells

were seeded in triplicate onto six 12-well plates (day 0; 2 x 104 cells per well) and cell density

was measured by crystal violet assay on days 1, 4, 7, 9, 11, and 14. All cell density values were

normalized to the value measured at day 1. For senescence assays, IMR-90 cells expressing

viral ORFs between passages 10 and 12 were seeded in triplicate onto 6-well plates and

subjected after 48 hr to a SA-b-gal Senescence Colorimetric Assay (Sigma). Stained cells were

overlaid with 50% glycerol and photographed at 10X magnification under a Nikon Eclipse E300

microscope with a Spot digital camera (Diagnostic Instruments, Inc.). Three non-overlapping

fields were photographed. The number of SA-b-gal positive cells was determined by

examination of the digital images51.

D. HPV E6 oncoproteins and Notch signalling

The U-2 OS cells used in several of the Notch signalling experiments were cultured in DMEM

(Invitrogen) supplemented with 10% FBS (Gemini Bio-products), and 1% Pen Strep (Invitrogen).

Generating pNCMV expression vectors encoding HPV E6 proteins: PCR amplification

products encoding HPV E6 proteins were amplified using B-Actin HPV E6 expression vectors as


38



a template52. PCR amplification products with a 5’ BamHI and a 3’ BglII restriction enzyme

cleavage sites were directionally cloned into the BamHI site of the pNCMV vector52,53, resulting

in an in-frame fusion of the FLAG and HA epitope tags at the N-terminus of the E6 coding

sequence. This approach was used for all HPV E6 proteins except for HPV18 E6 which has an

internal BamHI site, and was cloned into the pNCMV vector using PCR amplification products

with BglII restriction enzyme cleavage site at both the 5’ and 3’. All constructs were sequence

validated.

Transfection, cell lysates, immunoprecipitation, immunoblotting, and antibodies: U-2 OS

cells were seeded onto 15 cm plates (2 x 106) and transfected with 12.5 micrograms of a control

pNCMV vector or of a pNCMV vector encoding the indicated HPV E6 proteins using

XtremeGENE 9 (Roche). Cells were harvested 48 hours post transfection. Both IMR-90 cell

lines expressing viral ORFs and transfected U-2 OS cells were lysed in EBC buffer54 (50 mM

Tris HCl, pH 8.5, 150 mM NaCl, 0.5% NP-40, 0.5 mM EDTA, proteinase and phosphatase

inhibitors). Extracts were cleared at 14,000 rpm for 15 min. For immunoprecipitations, equal

amounts of cell extracts were incubated for 4 hr at 4°C with 30 µ l of anti-HA agarose (Sigma).

The beads were washed four times with EBC buffer and resuspended in sample loading buffer

(Bio-Rad). Proteins were resolved by SDS-PAGE (CriterionTM TGXTM precast gels, Bio-Rad) and

subjected to immunoblot analysis with one of the following commercially available antibodies:

anti-p300 (A-300-358A, Bethyl Laboratories), anti-MAML1 (4608S, Cell Signaling Technology or

A-300-672A, Bethyl Laboratories), anti-E6AP (H-182, sc-25509, Santa-Cruz Biotechnology),

anti-vinculin (V9131, Sigma), and anti-HA (HA-11 Clone 16B12, Covance). After incubation with

the appropriate secondary antibodies, antigen/antibody complexes were detected by enhanced

chemiluminescence (SuperSignal, Pierce).

Reverse transcription-quantitative PCR (RT-qPCR) analyses: Total RNA isolated from IMR-

90 cells expressing viral ORFs and total RNA isolated (RNeasy, Qiagen) from IMR-90 cells

expressing MAML1 shRNA or control was converted to cDNA with a QuantiTect Reverse


39



RESEARCH

Transcription Kit (Qiagen). This cDNA preparation was diluted five-fold, and 1 µl was used per

reaction. Real-time PCR quantification, was done in triplicate, using the Brilliant III Ultra Fast

SYBR Green QPCR Master Mix (Agilent technologies), and a Stratagene MX3005 instrument.

The qPCR primer pairs for the human genes HES1 (HP208643), DLL4 (HP213393), and

MAML1 (HP211129) were obtained from OriGene. GAPDH (HP205798) was used as the

internal reference standard. We used the 2(-ΔΔ

Ct) method55 to quantify transcript levels.

RNA interference: For lentivirus-mediated RNA interference, the lentiviral construct targeting

MAML1 (SHCLNG-NM_014757, TRCN0000003353, Sigma) [5’-

CCGGCATGATACAGTTAAGAGGAATCTCGAGATTCCTCTTAACTGTATCATGTTTTT-3’] or

the control empty vector pLKO.1ps (gift of William Hahn) were transfected into HEK293FT cells

according to published protocols56. Puromycin selection (2 µg/ml) began two days after infecting

IMR-90 cells with lentiviruses and was maintained thereafter. IMR-90 cells transfected at

passage 9 with MAML1 shRNA or control pLKO.1ps vectors were seeded (1 x 106) into five

replicates (5 x 10-cm plates) and harvested after 48 hr in RNAlater (Ambion). After isolation of

total RNA (RNeasy, Qiagen) RNA integrity was determined using a Bioanalyzer (Agilent). Gene

expression was assayed using Human Gene 1.0 ST arrays. For siRNA-mediated RNA

interference ON-TARGETplus SMARTpool siRNA (Thermo Scientific) targeting MAML1 (L-

013417-00; sequences below), or control ON-TARGETplus Non-targeting siRNA (D-001810-10-

20; sequences below) were transfected into cells using Lipofectamine RNAiMAX (Invitrogen).

Cells were harvested 72 hours post-transfection.

siRNA J-013417-06, MAML1 GUUAGGCUCUCCACAAGUG siRNA J-013417-07, MAML1 GGCAUAACCCAGAUAGUUG siRNA J-013417-08, MAML1 GCAGCUGUCCAUAUAAGU siRNA J-013417-09, MAML1 UCGAAGACCUGCCUUGCAU siRNA D-001810-01, non-target UGGUUUACAUGUCGACUAA siRNA D-001810-02, non-target UGGUUUACAUGUUGUGUGA siRNA D-001810-03, non-target UGGUUUACAUGUUUUCUGA siRNA D-001810-04, non-target UGGUUUACAUGUUUUCCUA


40



Luciferase Reporter Assays: U-2 OS cells were seeded in triplicate onto 6-well plates (2 x 105

cells per well). Each well was transfected with 10 nanograms of pGK Renilla, 0.5 micrograms of

pGL2 mHes1-Luc, 0.25 microgram of pcDNA3 2HA-ICN1 (gift of Elliot Kieff) and increasing

concentrations (25, 50, and 100 nanograms) of the NCMV vector control or NCMV encoding the

indicated HPV E6 proteins using X-tremeGENE 9 (Roche). The total amount of DNA per

transfection was brought up to 2 microgram per well by adding salmon sperm DNA. After 48

hours cells were lysed with Passive Lysis Buffer (Promega), and subjected to the Dual-

Luciferase Reporter Assay (Promega). Luciferase activity was measured with a LMax II384

microplate reader (Molecular Devices), and the Renilla luciferase was used as the internal

reference standard.

E. Tandem Affinity Purifications (TAP) followed by Mass Spectrometry (MS)

IMR-90 fractionation and protein extract preparation: For each viral ORF tested the

corresponding IMR-90 cells were seeded into twelve to eighteen 15 cm dishes to produce ~2 to

4 x 108 cells by harvesting time. Cells were washed in situ with ice-cold PBS, harvested using a

cell lifter and collected by centrifugation at 500 x g for 5 min at 4°C. Subcellular fractions were

obtained by differential digitonin fractionation57 with the following modifications. To prepare

nuclear and cytoplasmic fractions, cell pellets were resuspended in five pellet volumes of ice-

cold digitonin extraction buffer (0.015% digitonin, 300 mM sucrose, 100 mM NaCl, 10 mM

PIPES, 3 mM MgCl2, pH 6.8) containing protease inhibitors (Roche) and then incubated end-

over-end for 10 min at 4°C. Triton X-100 was added to a final concentration of 0.5% and the cell

suspension was vortexed for 10 seconds. Triton X-100 extracted cells were then divided equally

between three tubes and centrifuged at 1000 x g for 10 min at 4°C. The supernatants (crude

cytoplasmic fraction) were transferred to fresh tubes apart from the nuclear pellets. Both

fractions were flash frozen in liquid nitrogen and stored at -80°C. To prepare nuclear and


41



RESEARCH

membrane fractions, cell pellets were resuspended in five pellet volumes of ice-cold digitonin

extraction buffer and then incubated end-over-end for 10 min at 4°C. The cell suspension was

centrifuged at 500 x g for 10 min at 4°C. The resulting pellet was resuspended in five pellet

volumes of Triton X-100 extraction buffer (0.05% Triton X-100, 300 mM sucrose, 100 mM NaCl,

10 mM PIPES, 3 mM MgCl2, pH 7.4) containing protease inhibitors and then incubated end-

over-end for 30 min at 4°C. The suspension was divided equally into three tubes and

centrifuged at 5000 x g for 10 min at 4°C. The supernatant (crude membrane fraction) was

transferred to fresh tubes and flash frozen in liquid nitrogen, apart from the nuclear pellets,

which were also flash frozen. Before affinity purification the composition of the cytoplasmic and

membrane fractions was adjusted to 150 mM NaCl, 50 mM Tris pH 7.5. The composition of the

cytoplasmic fraction was additionally adjusted by adding NP40 to a final concentration of 0.05%.

TAP of viral protein complexes: Viral proteins were purified from 6 to 9 plate-equivalents of

nuclear and cytoplasmic (or nuclear and membrane) fractions by sequential FLAG and HA

immunoprecipitation12,58. Tandem affinity purifications were typically done in batches of 2 to 6

cell lines expressing viral ORFs and included empty vectors or GFP controls cells. For FLAG

purification 40 ml of FLAG agarose bead slurry (FLAG-M2 agarose, Sigma) was added to

cellular extracts, which were then incubated for 4 hr at 4°C under constant end-over-end mixing.

The FLAG-agarose beads were collected by low speed centrifugation and washed three times

with 500 ml of cold FLAG IP buffer (150 mM NaCl, 50 mM Tris, 1 mM EDTA, 0.5% NP40, 10%

glycerol and protease inhibitors) and once with 500 ml of HA-IP buffer (150 mM NaCl, 50 mM

Tris, 1 mM EDTA, 0.05% NP40, 10% glycerol and protease inhibitors). FLAG-purified

complexes were recovered by two successive 30 min elutions with 20 ml FLAG elution buffer

(400 µg /ml of FLAG peptide in HA IP buffer) at 4°C under constant vortexing. To prevent FLAG

agarose bead carry-over, pooled eluates were filtered through a pre-wetted 0.45 mm filter

before the subsequent HA purification step. For HA purification 20 ml of HA agarose bead slurry

(HA-F7 agarose conjugate, Santa Cruz) was added to cellular fractions, which were then


42



incubated overnight at 4°C under constant vortexing. HA agarose beads were collected by low

speed centrifugation and washed three times with 200 ml of cold HA-IP buffer. HA-purified

complexes were recovered by two successive 30 min elutions in 20 ml HA elution buffer (400 µg

/ml of 6xHis-HA peptide in HA IP buffer) at 4°C under constant vortexing. To prevent HA

agarose bead carry-over, pooled eluates were filtered through a pre-wetted 0.45 mm filter

before analysis. A 15% aliquot of each HA eluate was resolved on 4-12% acrylamide gels run in

MOPS buffer (NuPAGE, Invitrogen), and viral and associated host proteins were visualised by

silver-stain (Supplementary Fig. 17).

Biological replicates: We repeated the tandem affinity purification process for 144 (95%) of the

initial 152 sub-cellular fractions used for the first iteration (TAP1). This biological repeat (TAP2)

was initiated several months after the completion of TAP1.

Sample digestion of viral-protein multi-component complexes: Purified viral proteins and

associated host proteins were denatured and reduced by incubation at 56°C for 30 min in 10

mM DTT and 0.1% RapiGest (Waters). Reduced cysteines were alkylated by adding

iodoacetamide to a final concentration of 20 mM and then incubated at room temperature for 20

min in the dark. Protein digestion was carried out overnight at 37°C after adding 2 mg of trypsin

and adjusting the pH to 8.0 with a 1 M Tris solution. For tryptic peptide clean-up RapiGest was

inactivated following the protocol of the manufacturer. Tryptic peptides were purified by batch-

mode reverse-phase C18 chromatography (Poros 10R2, Applied Biosystems) using 40 ml of a

50% bead slurry in RP buffer A (0.1% trifluoroacetic acid), washed with 100 ml of the same

buffer and eluted with 50 ml of RP buffer B (40% acetonitrile in 0.1% trifluoroacetic acid). After

vacuum concentration, peptides were further purified by strong cation exchange SCX

chromatography (Poros 20HS, Applied Biosystems) using 20 ml of a 50% bead slurry in SCX

buffer A (25% ACN in 0.1% formic acid), washed with 20 ml of the same buffer and eluted with

20 ml of SCX buffer B (25% ACN, 300 mM KCl in 0.1% formic acid).


43



RESEARCH

Primary LC-MS/MS data acquisition: After vacuum concentration, half of each sample was

loaded onto a pre-column (100 µm I.D. x 4 cm; packed with POROS 10R2, Applied Biosystems)

at a flow rate of 4 µl/min for 15 min, using a NanoAcquity Sample Manager (20 µl sample loop)

and UPLC pump59. After loading, the peptides were gradient eluted (1-30% B in 45 min; buffer

A: 0.2 M acetic acid, buffer B: 0.2 M acetic acid in acetonitrile) at a flow rate of ~50 nl/min to an

analytical column (30 µm I.D. x 12 cm; packed with Monitor 5 µm C18 from Column

Engineering, Ontario, CA), and introduced into an LTQ-Orbitrap XL mass spectrometer

(ThermoFisher Scientific) by electrospray ionization (spray voltage = 2200V). Three rapid LC

gradients were included at the end of every analysis to minimize peptide carry-over between

successive analyses and to re-equilibrate the columns. The mass spectrometer was

programmed to operate in data dependent mode, such that the top eight most abundant

precursors in each MS scan (detected in the Orbitrap mass analyzer, resolution = 60,000) were

subjected to MS/MS resolution (CAD, linear ion trap detection, collision energy = 35%,

precursor isolation width = 2.8 Da, threshold = 20,000). Dynamic exclusion was enabled with a

repeat count of 1 and a repeat duration of 30 sec. The second half of each sample was

reanalyzed by LC-MS/MS after the completion of the initial set of analyses (“Technical

Replicates”).

Database searching: Orbitrap raw data files were processed within the multiplierz software

environment60. MS spectra were recalibrated using the background ion (Si(CH3)2O)6 at m/z

445.12 +/- 0.03 and converted into a Mascot generic format (.mgf). MS spectra were searched

using Mascot version 2.3 against four appended databases of: i) human protein sequences

(downloaded from RefSeq on 07/11/2011); ii) viral proteins analyzed in this study; iii) common

lab contaminants and iv) a decoy database generated by reversing the sequences from the

human database. Search parameters specified a precursor ion mass tolerance of 1 Da, a

product ion mass tolerance of 0.6 Da, trypsin with a maximum of two missed cleavages,


44



variable oxidation of methionine (M, +16 Da), and carbamidomethylation of cysteines (C, +57

Da).

Mascot search results from all experimental runs were separately imported for TAP1 and

TAP2 into two multiplierz mzResults files (.mzd)61 for further processing: The list of peptide hits

from the Mascot searches was filtered to exclude peptides with precursor mass error greater

than 5 ppm. Sequence matches to the decoy database were used to establish a 1% false

discovery rate (FDR) filter for the resulting peptide identifications. A rapid peptide matching

algorithm62 was then used to map peptide sequences to all possible human Entrez Gene IDs.

Only peptides that were uniquely assignable to a single Entrez Gene ID were considered as

detection evidence. For each viral-host protein association, peptide evidences for a given target

human protein were combined across different subcellular fractions and LC-MS/MS technical

replicates, when applicable. Any Entrez Gene ID present in the “Tandome” (Supplementary

Table 15) was automatically removed from the list of putative interactors.

TAP-LC-MS quality controls: We used a multi-tiered approach to minimize the occurrence of

peptide carryover across LC-MS analyses: 1) samples within each batch of LC-MS analyses

were injected in order of increasing abundance and complexity, as evaluated by SDS-

PAGE/silver stain visualisation of the corresponding TAPs; 2) three rapid LC gradients were run

at the end of every LC-MS analysis; 3) for all LC-MS analyses, we systematically evaluated run-

to-run carry-over by calculating extracted ion chromatogram (XIC) intensities of peptides derived

from viral proteins. 4) samples within each purification batch, and LC-MS analysis sequence

were randomized between biological replicates (TAP1 and TAP2). This multi-step quality control

process led us to discard all LC-MS data for EBV-BFRF2 and EBV-BRRF2 from our TAP2 (and

as a consequence from the final data set, since we only report proteins identified reproducibly

across TAP1 and TAP2), and for two GFP negative controls. Because we identified cross

contamination between HPV8 E6 and HPV11 E6 at an early point of our project, we were able

to repeat the TAP and LC-MS analyses to generate a “clean” data set for bio-replicates.


45



RESEARCH

Pathway visualisation and analysis: Protein associations detected between HPV-E6 and

HPV-E7 oncoproteins and host proteins were imported into Pathway Palette for data analysis

and visualisation63. For each of these data sets, we first calculated the probability that the

observed set of human proteins targeted by multiple E6 or E7 across different HPV types

(HPV5, HPV6b, HPV8, HPV11, HPV16 and HPV18) could occur by chance (Fig. 1d and

Supplementary Fig. 4): for each viral protein of a given HPV type, the same number of

associated human proteins as that experimentally observed in our data was randomly chosen

from our HI-2 database of human binary interactions

(http://interactome.dfci.harvard.edu/H_sapiens/index.php). This process was repeated 1,000

times. The number of human proteins associated with two or more viral proteins from different

HPV types was calculated in each simulated network and the frequency distribution of random

graphs with a given number of shared nodes was computed. Lastly, this distribution was used to

calculate the probability that the number of human proteins targeted by multiple HPV types

observed in our experimental graphs (40 for E6 and 25 for E7) could occur by chance. We also

calculated the probability of detecting human proteins targeted by two viral proteins from HPVs

of the same class: the total number of human proteins targeted by viral proteins from pairs of

HPVs of the same class ([HPV5 and HPV8], [HPV6b and HPV11], [HPV16 and HPV18]) was

extracted from our TAP-MS data. This value was divided by the total number of human proteins

that could theoretically be shared across all pairs of HPV from the same class in our observed

networks. The same ratio was calculated for human proteins targeted by two viral proteins from

HPVs of different classes. The ratio of these two ratios (4.9 for E6 and 1.05 for E7, indicated by

green arrows in the frequency distribution on the right) represents the overall enrichment of

human proteins targeted by proteins encoded by HPVs of the same versus different class

observed in our graphs. One thousand randomized networks were created as before and used

to evaluate the probability that the observed ratio could occur by chance (Fig. 1d and

Supplementary Fig. 4). Virus-host interactions detected by TAP-MS are available at


46



http://interactome.dfci.harvard.edu/V_hostome/index.php?page=home. Graphs of protein

interaction networks for each viral protein, including the silver stain image of a representative

tandem affinity purification, are accessible at: http://blaispathways.dfci.harvard.edu/Palette.html.

F. Subtracting likely non-specific protein associations (“Tandome”)

Twenty five control TAP experiments were run at multiple points during the entire project. These

controls consisted of FLAG-HA purifications from nuclear, cytoplasmic or membrane extracts

prepared from parental IMR-90 cells, from IMR-90 expressing FLAG-HA-tagged GFP or from

IMR-90 cells transduced with the empty NTAP or CTAP vectors. An additional 83 control TAP

experiments consisted of FLAG-HA or FLAG-StrepTactin purifications on HeLa subcellular

extracts. Any Entrez Gene ID identified in more than 1% of the control TAP experiments

(whether or not the Gene ID was detected by a uniquely assignable peptide sequence) was

included in the “Tandome” list. Any of the 679 Gene IDs present in the Tandome was

systematically removed from all viral protein interaction TAP data sets (Supplementary Table

15).

G. TAP reproducibility

The reproducibility rate of TAP-MS across the two biological replicates (TAP1 and TAP2)

corresponds to the percentage of viral-host protein associations detected in one TAP (the

“reference TAP”) at a given uniquely assignable peptide threshold, which was confirmed in the

other (the “repeat TAP”). This percentage was calculated for both permutations in which the

“reference TAP” and the “repeat TAP” were successively (TAP1 and TAP2) and (TAP2 and

TAP1), respectively. The average reproducibility rate was also calculated. As expected,

reproducibility increased as a function of the number of unique peptides detected



47



RESEARCH

H. Virus-human Positive Reference Set (PRS)

The VirusMINT database curates from the literature viral-host interactions for 10 disease-

relevant virus groups64. Each curated interaction gets an evidence count, where multiple

evidences could be multiple publications, or multiple methodological verifications, reporting the

same interaction. To generate a Positive Reference Set (PRS) of high confidence interactions,

we only collected interactions supported by two or more evidences in VirusMINT, resulting in a

master list of 134 interactions. Given the different types of associations detected by Y2H and

TAP-MS, a specific PRS was designed for each one of these data sets (Supplementary Table

16). For the Y2H-specific PRS, we first selected host proteins with an available ORF clone in

the human ORFeome 5.1 collection (http://horfdb.dfci.harvard.edu/), resulting in a list of 94 viral-

host protein interactions. This list was filtered to remove interactions involving known

autoactivators in the yeast two-hybrid assay (Supplementary Table 14) and interactions that

were not tested in the yeast two-hybrid screen. The resulting Y2H-specific PRS contains 62

interactions. Starting with the master list of 135 interactions, the TAP-specific PRS was created

by removing interactions that involved viruses that were not tested in the TAP-MS assay and

interactions that involved a human protein present in our “Tandome” (Supplementary Table 15),

resulting in a TAP-specific PRS of 94 interactions (Supplementary Table 16).

I. Pathway enrichment analysis

Pathway enrichment used FuncAssociate 2.0 (http://llama.mshri.on.ca/funcassociate/)65, which

performs a Fisher’s Exact Test and compares the observed P-value for the query set to the

minimum P-value obtained from Monte Carlo sampling of the appropriate gene sample space.

When the query set of genes arose from multiple sample spaces, a Python script of the

FuncAssociate approach was used to independently sample from each space so that the

random query sets matched the original set in size and composition. KEGG pathway

annotations were downloaded from http://www.genome.jp/kegg/download/. Gene Ontology


48



annotations were downloaded from the FuncAssociate 2.0 website. For all analyses we

considered only specific pathways and functional terms, defined here as those which have been

annotated with fewer than 1000 genes. All P-values were adjusted for testing of all pathways in

the GOA or KEGG databases (Supplementary Table 17), including the reported P-values for

apoptosis pathway members. Additionally, pathway enrichment of TAP-MS data sets were

assessed at unique peptide thresholds ranging from 1 to 5, with the resulting P-values corrected

for the five hypotheses tested, using the Bonferroni method. Odds ratios were calculated using

2x2 contingency tables.

J. Microarray preprocessing and differential expression

Microarray data analysis used R/Bioconductor (http://www.bioconductor.org). Data was first

normalized by robust multiarray averaging (RMA) using a custom Chip Description File (CDF)

from the Michigan Microarray Lab (http://brainarray.mbni.med.umich.edu, Version 13). The

Human Gene 1.0 ST array from Affymetrix contains 764,885 probes, with 26 probes located

across the full length of each annotated gene from the March 2006 human genome sequence

assembly. The Michigan Microarray Lab CDF reanalyzes and matches each probe sequence to

the most current gene annotation, producing summarized expression values for 19,628 genes

with Entrez Gene IDs. Using this custom CDF rather than the original Affymetrix annotation

ensured a more accurate estimate of gene expression levels66.

A total of 435 microarrays were used to profile cell lines expressing 63 viral ORFs, as well

as controls. The arrays were processed in eight groups (Supplementary Table 18). Note that

each group includes cell lines expressing proteins from a variety of viruses, five of the groups

include control cell lines, and the order of the arrays was randomized. These preemptive

measures were taken in order to eliminate potential systematic biases that could affect our

results and interpretation. The normalized data showed a slight propensity to cluster according

to the batch covariate (Supplementary Fig. 18). We used ComBat67 to reduce this batch effect.


49



RESEARCH

ComBat applies a linear model to the expression levels, with factors corresponding to global

expression, sample-dependent expression, additive/multiplicative batch effects and noise. It

then synthesizes information across genes using an empirical Bayes framework, and thus

estimates and removes batch effects (Supplementary Fig. 18). The R package limma was used

to test for differential expression between each of the 63 conditions and GFP-control arrays.

Multiple testing adjustments were carried out with the Benjamini-Hochberg method.

K. Microarray gene and sample clustering

We used the R package limma to test for differential expression across all pairwise comparisons

between biologically distinct samples, including the control. The top 2,944 genes were selected

by ranking all genes on the array by the number of significant comparisons (Padj < 0.005;

Supplementary Table 19). This transcriptome profiling data is available at

http://interactome.dfci.harvard.edu/V_hostome/index.php?page=home. We applied the R

package mclust to carry out model-based clustering on the top 2,944 genes. We used the hmap

function in the seriation R package to generate the heatmap representation of mean fold-

change of expression of genes within each cluster.

L. Predicting cell-specific transcription factor binding sites

Position weight matrices (PWMs) corresponding to mammalian transcription factors were

obtained from the TRANSFAC 7.0, Jaspar, and UniProbe databases. We supplemented

available PWMs by downloading experimentally determined genome-wide transcription factor

binding sites from the GEO and the UCSC genome databases. We conducted de novo motif

discovery using the MEME 4.3 and Weeder 1.4.2 software tools. We then mapped motifs to

genes using either the keys provided by the databases or based on the antibody used for the

ChIP experiment. A total of 610 genes were identified for which at least one PWM was

available. Of these 361 had at least one “high-quality” PWM, defined as being derived from


50



either a ChIP-seq or ChIP-chip experiment or from a universal protein binding microarray

(PBM).

Database Number of PWMs Transfac: http://www.biobase-international.com/ 709 Jaspar: http://jaspar.genereg.net/ 721 Uniprobe (not in Jaspar): http://the_brain.bwh.harvard.edu/uniprobe/ 26 UCSC: http://genome.ucsc.edu/ or GEO: http://www.ncbi.nlm.nih.gov/geo/

67

Clustering of position weight matrices: Our identification of regulatory variants depends on

correctly mapping genes to motifs, although our integrative approach can tolerate some errors.

We took a computational approach to improving the quality of the gene to motif mapping. For

genes for which a “high-quality PWM” was available, we eliminated any other PWM that showed

no resemblance. To do so systematically, we hierarchically clustered all PWMs. First 100,000

random 30-mers were generated. Then for each of the ~1500 PWMs, the maximum match

score (MMS) for overlap between the motif and 30-mer was generated by comparing the

observed frequency fbj of base b at each position j in the PWM with the background frequency of

that base pb in the genome and summing over the length of the motif:

The matrix of 30-mer x PWM was hierarchically clustered using the hclust function in R, with

average linkage and the Pearson correlation coefficient used as the metric for similarity. A

coarse clustering cutpoint (height = 0.85, resulting in 232 total PWM clusters) was used to

broadly group together PWMs with some evidence of similarity. For genes having at least one

high quality PWM, we eliminated all PWMs that failed to cluster with any of these (although we

retained PWMs that represented dimers of the high quality motif). This approach rejected 113

gene-PWM pairs.


51



RESEARCH

Selection of motifs for genome-wide binding site prediction and enrichment analysis: We

selected individual motifs for subsequent analysis with the goal of providing at least one high

quality motif for every gene. For each of the 610 genes we selected all motifs derived from a

ChIP-seq, ChIP-chip or protein binding microarray experiment. If any other motif existed (i.e.

from Transfac or Jaspar), we included it if it appeared to correspond to a dimer or half-site not

represented in the otherwise selected high-quality motifs. All motifs were selected prior to

further analysis for genes without any high-quality PWM. A total of 841 motifs corresponding to

573 TFs satisfied the above criteria and the minimum match score of 12.

M. Transcription factor binding site (TFBS) enrichment analysis

IMR-90-specific TFBS were predicted by identifying binding sites with an MMS≥12 within

accessible chromatin regions. Accessible chromatin regions were identified using IMR-90-

specific genome-wide DNase-seq data downloaded in wig file format from the GEO database

(GSM530665, GSM530666). Wig files were processed using the BEADS packages68 to correct

for systematic biases in GC-content and ability to map reads. Corrected counts were smoothed

and binned into 250 bp groups. Bins with a significant number of counts were identified by

assuming that such counts follow a Poisson distribution. The Benjamini-Hochberg procedure

was used to estimate false discovery rates and contiguous bins with FDR < 0.001 were merged

into peaks. Only peaks observed within both replicate IMR-90 data sets were used for

subsequent analyses.

The enrichment of TFBS within a given expression cluster was computed for each TF by

counting the number of TFBS in DNase accessible regions located within 25 kb of the

transcription start site of any transcript within this cluster and comparing the count to a null

distribution of TFBS from length- and GC-content matched sets of mappable genomic regions.

For each cluster, nearby DNase-seq peaks were assembled into a single file and the peaks

trimmed to common lengths of 250, 500, 750, 1000, 1250, 1500, 1750, 2000, 2250 or 2500 bp.


52



Next, 200 sets of length-matched “control” DNase-accessible segments were sampled from the

random genome segments. For each motif, the number of TFBS for each set of 200 sequences

was computed, and the values fitted to a Poisson distribution. This null distribution was used to

estimate the P-value for the actual number of TFBS within IMR-90 cells. The enrichment P-

values for all TF motifs within each cluster were combined and a tail-based false discovery rate

was estimated using the Benjamini-Hochberg procedure69. TFs with motifs significant at an FDR

of 0.5% or less were selected for further analysis. In total, we found 239 TFs to be enriched for

target genes in one or more clusters.

N. Randomization of gene clusters for enrichment analyses

To assess the biological coherence of identified gene clusters (Supplementary Tables 5 and 6)

we computed the random expectation of enriched KEGG pathways and GO terms. For 100

iterations, we randomly reassigned the 2,944 genes to 31 clusters matched in size to those

observed and repeated the enrichment analysis, counting at each iteration the average number

of significant KEGG pathways and GO terms per cluster. The computationally more intensive

GO term enrichment was performed by counting the average number of nominally significant

pathways (P < 0.05) per cluster. We found that there were more GO and KEGG annotations

than expected by chance (P < 0.01), underscoring the functional coherence of these clusters


O. Evaluating predictive power of regulatory cascades

We computed scores for the viral ORF-TF-cluster network in the following way: for each viral

ORF, we first identified all genes for which we would expect differential expression. To do so,

we identified all TFs that had a physical association with the protein encoded by the viral ORF

or was differentially expressed in response to its expression, considering fold changes greater

than 1.2 at a Padj < 0.001. For each of these TFs, we then identified all microarray clusters

enriched for genes containing the corresponding binding site within their promoters. Within


53



RESEARCH

these clusters, all genes with a high-probability binding site for that TF in their promoter were

selected. The union of these genes across all TFs targeted by a given viral ORF represents the

set of genes that would be expected to show differential expression upon transduction of this

viral ORF. We chose as the meaningful statistic the fraction of these target genes that were

observed to be differentially expressed on the corresponding array (i.e. where that viral ORF

was transduced) and then computed the average over all viral ORFs. A null distribution was

created by computing the same score for 1,000 random permutations of the 63 array conditions

(Fig. 2b). We used Cytoscape 2.8.1 to visualise the network70. The resulting network

(Supplementary Fig. 9) illustrates how viral proteins may regulate DNA damage response, as

well as core functions of fibroblasts, like proliferation and cell adhesion via multiple TFs.

P. Building a probabilistic map of IMR-90 specific RBPJ binding sites

To build a map of IMR-90 specific TF binding sites, we followed a previously described

probabilistic framework71. The approach combines PWM information with complementary

genomic and experimental data in a mixture model, and outputs a posterior probability of DNA

binding. Features of interest include chromatin accessibility (as obtained through genome-wide

DNase-seq number or histone mark ChIP-seq experiments), multi-species sequence

conservation, and distance from the nearest gene transcription start site.

We downloaded publicly available data from diverse sites. FASTA files corresponding to the

primary sequence of the complete human genome (hg19) were downloaded from the UCSC

FTP site. IMR-90-specific chromatin accessibility data was generated as in “Transcription factor

binding site (TFBS) enrichment analysis”. We collected individual base conservation information

from the phastCon tables in the UCSC genome browser. To identify transcription start sites for

all RefSeq IDs we downloaded the refGene.txt file from the UCSC Genome Browser FTP site.

Using the data obtained for each of the RBPJ PWM, we proceeded as follows:


54



1. Match score: the complete genome was scanned to compute a match score at each

base pair. All matches with a score ≥ 12 (an arbitrary threshold for similarity) were

included as potential binding sites.

2. Chromatin accessibility score: we defined a 200 bp “chromatin accessibility window”

around each potential binding site from (1) by assigning the total number of DNase-seq

reads in the 100 bp upstream and downstream of the motif starting position.

3. Nearest transcription start site: we identified the nearest transcription start site to each of

the potential binding sites.

4. Conservation score: we averaged the phastCon conservation score over the length of

the motif and assigned this average score to each potential binding site.

We then used the CENTIPEDE R package71 to infer an IMR-90-specific probability of RBPJ

binding at each position in the genome.

To generate a short list of Notch pathway transcriptional targets, we took the union of Notch

pathway members (as defined in KEGG) and potential Notch target genes72. From this set, we

chose the genes containing an IMR-90-specific high probability RBPJ binding site in their

promoters to generate the heatmap in Fig. 3b.

Q. Notch pathway enrichment analysis

Enrichment of viral protein targets for Notch pathway members compared the number of protein

complex associations or direct interactions between viral proteins and Notch pathway members

(as defined by KEGG Pathways) to the number of interactions selected at random for each viral

protein. The sampling methods have been defined in “HI-2 and analysis of overlaps between

Y2H, TAP-MS and PRS”.

R. Identification of loci implicated in familial and somatic cancer

The COSMIC Classic list of causal genes with somatic mutations observed in tumours was

downloaded from http://www.sanger.ac.uk/genetics/CGP/Classic/. To assemble the genome-


55



RESEARCH

wide association loci associated with cancer we consulted the publicly available Catalog of

Published Genome-Wide Association Studies (http://www.genome.gov/gwastudies/) and

selected all variants associated with cancer. We grouped variants showing evidence of linkage

disequilibrium and, for the resulting 107 loci, selected as the index SNP the most highly

associated variant for any phenotype.

We next mapped SNPs to neighboring genes, a task for which there is no consensus

procedure73. We first defined the boundaries in which SNPs tagged by the index variant may

reside. To do so, we identified all SNPs that were in close linkage disequilibrium (R2 ≥ 0.5) with

the index SNP. We then marched outward from the outermost SNP positions to the nearest

recombination hotspot74 (downloaded from the UCSC genome browser). We assumed that any

SNP within the region bounded by the two recombination hot spots may be a causal variant

responsible for the original association. Since variants within enhancers can act at considerable

distance from their physical position, we also considered all genes within 250 kb from the

hotspot boundaries. When no genes resided within these boundaries, we moved outwards in 50

Kb intervals until a gene was identified (Supplementary Table 20).

To compile SCNA loci implicated in somatic forms of cancer we consulted a recent

compendium27 of loci derived from SNP arrays of over 3,000 tumour samples. Physical

boundaries for the SCNAs were used to map each locus to genes contained fully within the

SCNA interval (Supplementary Table 21).

S. Testing enrichment of gene sets for cancer genes

We followed the simple hypothesis that the viral interactome is enriched in cancer genes and

looked for overlap between viral interaction partners and cancer genes highlighted in the

COSMIC Classic list (Supplementary Table 8). The number of overlapping genes was

compared to that obtained from 10,000 random gene sets matched in size and composition to

the query set.


56



T. Viral target overlap with candidate cancer genes identified by transposon screens

We identified four “Sleeping Beauty” transposon-based murine insertional mutagenesis screens

focused on finding genetic determinants of tumorigenesis in murine cancer models. These

studies correspond to murine models of colon cancer with75,76, or without77 mutations of the Apc

tumour suppressor, and pancreatic carcinoma78. The list of candidate cancer genes was

extracted from the main text or supplementary data of each manuscript and pooled to form a

single set of 1,359 genes (Supplementary Table 10). As expected, the candidate genes

identified through insertional mutagenesis exhibited a marked bias towards long genes. We also

noted that genes in this candidate set tend to be highly expressed based on our IMR90

microarray data.

We assessed the overlap between the Sleeping Beauty candidate genes and the VirHost

set by permutation. The analysis was limited to the universe of human genes with mouse

orthologues (downloaded from the Mouse Genome Database). We corrected for the bias

observed in gene expression and gene length by binning genes into quartiles based on mRNA

expression and based on gene length using the longest distance between the transcription start

site and 3’ end of all transcripts corresponding to a single gene. A total of sixteen bins were

created, corresponding to all combinations between bins of expression and gene length. 1,249

Sleeping Beauty candidate genes and 900 VirHost genes could be assigned to one of these

bins, with 156 genes common to the two sets. To assess the chance expectation for overlap,

random sets of 1,249 genes were selected that matched the original query set in gene length

and expression composition. Ten thousand iterations were used and the empirical P-value

determined to be the fraction of random gene sets with at least 156 overlapping genes.

We also identified an insertional mutagenesis screen completed to find genetic determinants

of leukemia using the Maloney Murine Leukemia virus79. The candidate gene set was

downloaded from the Supplementary Data, mapped to human orthologues and tested for


57



RESEARCH

overlap against the VirHost set. A significant overlap (20 out of 213 genes, P = 0.048) was

found.

U. Comparison of viral interactome and prioritization of cancer genes

Catalogues of somatic mutations from genome-wide cancer sequencing studies were obtained

from the supplementary tables of nine recently published studies80-88. The deleteriousness of

each mutation was assessed with the Polyphen2 web server

(http://genetics.bwh.harvard.edu/pph2/), which provides a score (HumVar) between 0 and 1,

with 1 being the most damaging (Supplementary Table 11). A cumulative deleteriousness score

for each protein was obtained by summing all Polyphen2 scores. We observed that large

proteins tended to have higher scores and thus normalized scores for protein length. All proteins

with at least one somatic mutation were ranked by cumulative deleteriousness score, with TP53

having the highest score (Supplementary Fig. 14). For a direct comparison of COSMIC Classic

gene overlap with the viral interactome at various peptide count thresholds, a matching number

of somatically mutated proteins were selected from the top of the ranked list and statistical

significance assessed by Fisher’s Exact Test. Genes at SCNA-AMP, SCNA-DEL and GWAS

loci were also tested against COSMIC Classic genes using Fisher’s Exact Test. Odds ratios

were calculated with 2x2 contingency tables.


58



Supplementary Notes

1. Significantly targeted and untargeted proteins

“Significantly targeted or untargeted” host proteins are sets of proteins that interact with viral

proteins at a higher or lower frequency, respectively, than would be expected given their degree

in the human HI-2 interaction network. The full set of viral-human Y2H interactions

(Supplementary Table 2) includes interactions obtained after recursive testing of the viral

proteins from all seven HPV types against a human interactor found by any one HPV in the

initial screen. To eliminate any bias introduced by the recursive nature of the verification retest

experiments with HPV proteins, and insure that all viruses are analyzed within the context of the

same reference search space this analysis only included interactions detected in the primary

screens and directly verified by pairwise retesting. In addition, we further restricted this analysis

to human proteins present in HI-2 (174 out of 307). We first counted the number of interactions

between viral proteins and human targets in the virus-human interaction network and then

randomly selected an identical number of proteins from HI-2. Sampling was performed with

replacement (i.e. a protein could be chosen more than once) and the likelihood of choosing any

protein determined by its number of interacting partners (degree) in HI-2. This process was

repeated 10,000 times to generate a random distribution When less than 5% of the simulations

generated a number of interactions equal to or greater than the number of experimentally

observed interactions for a given host protein, this was considered “significantly targeted”. When

more than 95% of the simulations generated a number of interactions equal to or greater than

the number of experimentally determined interactions, the host protein was considered

“significantly untargeted”. Otherwise, the human protein was considered “expectedly targeted”.

2. HI-2 and analysis of overlaps between Y2H, TAP-MS and their respective PRS

The human protein-protein interactions previously reported in our three high-throughput data

sets11,49,50 were updated to the gene annotations in human ORFeome 5.1 and then combined to


59



RESEARCH

generate the HI-2 list of unique interactions. Distinctions between ORFs corresponding to

multiple splice variant of a human gene were ignored when reporting protein-protein

interactions.

The significance of overlaps between different data sets was assessed by Fisher’s Exact

Test or permutation. We examined three overlaps, the overlap between TAP-MS and Y2H, as

well as the overlap of these two data sets separately with their respective PRS (Supplementary

Table 16). The search space for calculating the Y2H overlap with the Y2H-specific PRS is the

hORFeome v5.1. The search space for the TAP-MS and Y2H overlap corresponds to the

intersection of the hORFeome 5.1 and TAP-MS search spaces. The TAP-MS search space was

used to calculate the overlap between the TAP-MS data set and the TAP-MS-specific PRS. We

noticed that host proteins identified as viral targets by the TAP-MS technology tend to be

encoded by genes with high mRNA expression in IMR-90 cells (Supplementary Fig. 20). To

circumvent this expression bias when considering the TAP-MS data set, random sampling was

performed from a set of genes matching the query set in terms of expression quartiles, as

judged from genome-wide microarray analysis of IMR-90 cells.

Statistical significance of overlaps observed between the Y2H data set or the TAP-MS data

set and their respective PRS were computed by random sampling from the appropriate search

space, and controlling for expression bias for TAP-MS. Both our TAP-MS protein complex and

Y2H binary data sets significantly overlapped with PRS pairs (17 out of 94 protein pairs [P <

0.001] and 1 out of 62 protein pairs [P = 0.047] respectively). The overlap between our protein

complex viral-host interactions and those in the PRS varied as a function of analytical

parameters in our TAP-MS data: We observed that the fold enrichment of PRS proteins in the

set of host proteins detected in protein complex associations increased with the number of

unique peptides, although with fewer associations recovered at more stringent thresholds



60



Twenty four viral ORFs were found to have interactors in both TAP-MS and Y2H data sets

(Supplementary Fig. 1), and of these a total of six viral ORF-host protein interactions were

common to the two technologies (Supplementary Fig. 2). The observed overlap of six

interactions was significantly higher (P < 0.001) than the most common random expectation of

0.81 interactions (Supplementary Fig. 2). To test whether the set of genes identified by TAP-MS

as targets of a given viral protein included the immediate neighbours of viral targets identified by

Y2H, we repeated the analysis after expanding the list of Y2H targets to their direct neighbours

in HI-2. We observed that targets of viral protein identified through both technologies showed a

significant tendency to interact with each other (P < 0.001; Supplementary Fig. 2) implying that

host targets of the two technologies tend to ‘fall in the same neighbourhood’ of the network.

3. Measurement of the precision of the Y2H data set by wNAPPA

We measured the fraction of true biophysical interactions in our virus-host binary Y2H data set

(precision) by comparing the recovery rate of the detected interactions to those of the Positive

Reference Set (PRS) and Random Reference Set (RRS) in a wNAPPA orthogonal assay. In the

wNAPPA assay, bait and prey fusion proteins were expressed by coupled transcription-

translation and GST-tagged bait proteins captured by anti-GST antibody. Interactions were

detected using antibody to HA with standard immunochemical protocols. One viral ORF

(adeno5-E1A) was found to bind non-specifically to the GST plate and was therefore removed

from subsequent analysis of the wNAPPA data (Supplementary Table 22). The precision was

calculated for a recovery rate of 1% for the RRS, corresponding to a z-score threshold of 1.5. At

this threshold, we observe recovery rates of 9% ± 1% for the interactions detected by Y2H, 10%

± 4% for the PRS interactions and precisely 1% ± 1% for the RRS interactions. The precision

value was calculated according to the following formula50.


61



RESEARCH

where I+ is the number of true positives in the detected interactions (precision), f+ is the false

positive rate of wNAPPA (obtained as the fraction of RRS pairs reported positive) and (1 – f-) is

the fraction of Y2H supported PRS pairs that report positive in wNAPPA. Error bars of the

precision were calculated according to the variation of those values in the [1.5, 1.6] z-score

window. These measurements led to a precision of 90% +/- 7% for the virus-host binary Y2H

data set. We determined the fraction of true biophysical interactions in our virus-host binary Y2H

data set to be ~90% by comparing the validation rates of the data set to those of the PRS and

random reference set (Supplementary Fig. 22).

4. Network motif identification

We hypothesized that the striking variation in expression of multiple clusters may relate to TF

activity reinforcing feedback or feedforward loops. We thus used the list of predicted TF binding

sites to explore the interconnectedness of TFs. We focused on the TFs that were enriched for

TFBS in expression clusters and physically associated with or were differentially expressed in

response to expression of viral proteins (Supplementary Table 7). We looked for instances of

autoregulatory loops, feedback loops, and feed-forward loops – network motifs that might

amplify or modulate signals in biological networks – and compared the number observed in our

data to that observed after 1,000 random selections of TFs chosen from the set of TFs with one

or more predicted high-probability promoter binding site. There were far more interconnections

than expected by chance (P < 0.001; Supplementary Fig. 6), suggesting that viral proteins

target a dense regulatory TF network.

5. Cellular growth phenotypes

If gene expression clusters regulated by viral proteins were relevant to human cancer, then the

downstream variation in expression of host genes would be reflected in cellular growth

phenotypes. We measured growth and senescence rates for fifteen IMR-90 viral protein

expressing and control cell lines, and computed their Pearson correlation coefficients (R) with


62



the average gene expression level in each cluster. The significance of the correlation was

assessed for each cluster by randomly sampling 1,000 sets of equal size from the full gene

universe represented on the microarray. The resulting P values were adjusted for multiple

testing across the 31 clusters using the Benjamini-Hochberg procedure. There was significant

correlation between five clusters and growth or senescence phenotypes. Cluster C31

demonstrated the strongest correlation between mRNA expression and both cellular growth (R

= 0.8, Padj = 0.008) and senescence (R = -0.92, Padj = 0.016) (Supplementary Fig. 8). This

correlation suggests that the expression variation observed in each cluster across cell lines (Fig.

2a) is functionally relevant to viral proteins altering cellular phenotypes.

6. Cancer pathways perturbed by viral ORFs

Viral proteins perturb diverse core signalling pathways and biological functions of the host cell,

including many of the known hallmarks of cancer14. Dysregulation of these pathways perturbed

by viral ORFs could lead to tumorigenesis. Together, our transcriptional analysis and the

interactome data sets shed light on known and potentially novel connections between viral

proteins and cancer mechanisms.

We found several signatures of growth suppressor evasion in the transcriptional data. Viral

proteins in group III, which include polyomavirus and high-risk HPV proteins that target RB1,

were correlated with increased expression of genes involved in cell proliferation, including

components of the cell cycle (CDC25A, CDC25C, CDK1) and DNA replication machinery

(PCNA, RFC and MCM family members) and genes involved in nucleosome assembly (HIST1H

histone family). TP53-dependent pathways were typically downregulated in group III proteins,

reflecting that many of these viral proteins bind to and inactivate TP53. Pathways include cell-

cycle arrest mediated by CDKN1A as well as cell death through BAX and TRAIL pathway

components (cluster C12, 2.0-fold enriched for TP53 binding sites, P = 2.3 X 10-9). We also

observed that group II viral proteins, which include alternatively spliced E6 proteins from high-


63



RESEARCH

risk HPV types, HPV E5 proteins, E7 proteins from low-risk HPV6b and HPV11 and high-risk

HPV16, and many EBV proteins including EBNALP, induced a large decrease in TGFβ

signalling genes (cluster C16).

Metabolic dysregulation is often observed in cancer cells. Group III viral proteins induced

high expression of genes involved in cholesterol biosynthesis genes (DHCR24, HMGCS1,

SQLE) and fatty acid metabolism (MGLL, FABP3, ELOVL6, SCD), potentially reflecting the

need of rapidly proliferating cells and tumours for cholesterol89 and unsaturated fatty acids90,91.

Many viral proteins also increased expression of genes involved in response to hypoxia (cluster

C22).

Other characteristics that enable tumour growth include angiogenesis, invasion, and

metastasis. Group III viral proteins induced increased transcription of genes in the VEGF

pathway (cluster C23) and decreased transcription of genes involved in core functions of

fibroblasts, including expression of collagen components, secreted growth factors and

metalloproteases, and cell adhesion molecules such as integrins and protocadherins (clusters

C13, C17, C18, C19; Supplementary Fig. 9). Such changes imply some amount of

“reprogramming” induced by oncogenic viral proteins and again draw parallels with cancer

pathogenesis. High-risk HPV proteins may achieve changes to cell-substrate adhesion and

ECM production through transcriptional regulation of the growth factor EGR1. Other TFs with

highly enriched binding sites for these pathway genes include FOSL2, which interacts with low-

risk HPV E7s, and RUNX1. Still other viral ORFs may act through the transcription factors KLF7

and ARNTL, both of which are involved in insulin signalling92,93.

Many of the transcriptional changes caused by viral proteins relate to the two enabling

characteristics of cancer: genome instability and mutation, and tumour-promoting inflammation.

Group III viral proteins caused decreased expression of genes involved in the response to DNA

damage, including not only cell cycle arrest and apoptosis, but also autophagy via lysosome

assembly (clusters C6 and C7) and acidification (cluster C3; 3.5-fold enriched for binding sites


64



of the autophagy implicated TF NRF2/NFE2L294 (P = 0.0007); and oxidative stress control

through induction of G6PD and glutathione transferases GSTM4 and GSTM5 (cluster C15). We

also observed marked upregulation of DNA nucleotide excision and double-stranded break

repair (cluster C26 and C27: BRCA1, BRCA2, POL epsilon/delta, FANCB, XRCC2, BLM,

RAD51) and robust activation of two “inflammasome” pathways known to be triggered by DNA

damage: the type I interferon response via IRF activity (cluster C24, 9.5-fold enriched, P = 3 X

10-10; Fig. 2b) and inflammatory responses (e.g. IL6, CCL2, IL1B) via NF-κB activity (cluster

C23, 4.7-fold enriched for NF-κB sites, P = 1.8 X 10-5). Our network suggests that HPV E6s,

EBV proteins, and polyomaviruses activate inflammatory pathways by regulating the expression

of TFs like IRF1, NF-kB, RELB, and MAFF. In contrast, HPV E7s perturb REL directly through a

binary interaction. Tumour virus proteins also induce changes in the expression of genes

involved in other signalling pathways, including PDGFR signaling, calcium signalling, and WNT

signalling.

Pathway enrichment analysis of proteins identified through TAP-MS or Y2H was also

instructive on established and novel pathways targeted by viruses. To identify host pathways

targeted by viral proteins in our combined interactome data set, we explored the enrichment of

Gene Ontology (GO) terms among host target proteins (Supplementary Fig. 3). Enrichment for

some of the GO terms was expected (e.g., “mitotic cell cycle” for HPV, EBV and polyomavirus

and “protein phosphatase 2A binding” for polyomavirus). Over-representation of “mitotic cell

cycle” genes amongst HPV-associated proteins in our data set arises not only from established

associations of HPV E6 and E7 with the RB1 and TP53 tumour suppressors, but also from

associations observed with 28 additional cell cycle proteins, including PCNA, MCM3 and

CDKN1A (Padj < 0.01, OR = 3.3) (Supplementary Table 17).

Enrichment analysis also revealed previously unrecognised pathways unique to specific

viruses. EBV proteins collectively bind members of the procollagen-proline 4-hydroxylase family

(Padj= 0.05, OR = 389; Supplementary Fig. 3) including P4HA2, a hypoxia- and p53-responsive


65



RESEARCH

protein, which inhibits angiogenesis and tumour growth in mice95. Adenoviral proteins bind

multiple members of the BH-domain family (BAX, BIK, BAK1; Padj = 0.04, OR = 159), which play

a critical role in apoptosis96. The adenovirus E4ORF1 protein binds to a family of mRNA 5’UTR

binding proteins (Padj = 0.03, OR = 319), including the insulin-like growth factor binding protein 2

(IGFBP2), which has recently been implicated in angiogenesis and metastasis in breast

cancer97.


66



Supplementary References 30. Rual, J.F. et al., Human ORFeome version 1.1: a platform for reverse proteomics.

Genome Res. 14, 2128-2135 (2004). 31. Sowa, M.E., Bennett, E.J., Gygi, S.P., & Harper, J.W., Defining the human

deubiquitinating enzyme interaction landscape. Cell 138, 389-403 (2009). 32. Cole, S.T. & Danos, O., Nucleotide sequence and comparative analysis of the human

papillomavirus type 18 genome. Phylogeny of papillomaviruses and repeated structure of the E6 and E7 gene products. J. Mol. Biol. 193, 599-608 (1987).

33. Dartmann, K., Schwarz, E., Gissmann, L., & zur Hausen, H., The nucleotide sequence and genome organization of human papilloma virus type 11. Virology 151, 124-130 (1986).

34. Fuchs, P.G., Iftner, T., Weninger, J., & Pfister, H., Epidermodysplasia verruciformis-associated human papillomavirus 8: genomic sequence and comparative analysis. J. Virol. 58, 626-634 (1986).

35. Zachow, K.R., Ostrow, R.S., & Faras, A.J., Nucleotide sequence and genome organization of human papillomavirus type 5. Virology 158, 251-254 (1987).

36. Cole, S.T. & Streeck, R.E., Genome organization and nucleotide sequence of human papillomavirus type 33, which is associated with cervical cancer. J. Virol. 58, 991-995 (1986).

37. Schwarz, E. et al., DNA sequence and genome organization of genital human papillomavirus type 6b. EMBO J. 2, 2341-2348 (1983).

38. Seedorf, K., Krammer, G., Durst, M., Suhai, S., & Rowekamp, W.G., Human papillomavirus type 16 DNA sequence. Virology 145, 181-185 (1985).

39. Schwarz, E. et al., Structure and transcription of human papillomavirus sequences in cervical carcinoma cells. Nature 314, 111-114 (1985).

40. Sedman, S.A. et al., The full-length E6 protein of human papillomavirus type 16 has transforming and trans-activating activities and cooperates with E7 to immortalize keratinocytes in culture. J. Virol. 65, 4860-4866 (1991).

41. Liu, Y. et al., Multiple functions of human papillomavirus type 16 E6 contribute to the immortalization of mammary epithelial cells. J. Virol. 73, 7297-7307 (1999).

42. Schowalter, R.M., Pastrana, D.V., Pumphrey, K.A., Moyer, A.L., & Buck, C.B., Merkel cell polyomavirus and two previously unknown polyomaviruses are chronically shed from human skin. Cell Host Microbe 7, 509-515 (2010).

43. van der Meijden, E. et al., Discovery of a new human polyomavirus associated with trichodysplasia spinulosa in an immunocompromized patient. PLoS Pathog. 6, e1001024 (2010).

44. Zalvide, J. & DeCaprio, J.A., Role of pRb-related proteins in simian virus 40 large-T-antigen-mediated transformation. Mol. Cell. Biol. 15, 5800-5810 (1995).

45. Hahn, W.C. et al., Enumeration of the simian virus 40 early region elements necessary for human cell transformation. Mol. Cell. Biol. 22, 2111-2123 (2002).

46. Zhong, Q. et al., Edgetic perturbation models of human inherited disorders. Mol. Syst. Biol. 5, 321 (2009).

47. James, P., Halladay, J., & Craig, E.A., Genomic libraries and a host strain designed for highly efficient two-hybrid selection in yeast. Genetics 144, 1425-1436 (1996).

48. Walhout, A.J. & Vidal, M., A genetic strategy to eliminate self-activator baits prior to high-throughput yeast two-hybrid screens. Genome Res. 9, 1128-1134 (1999).

49. Rual, J.F. et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature 437, 1173-1178 (2005).

50. Venkatesan, K. et al., An empirical framework for binary interactome mapping. Nature Methods 6, 83-90 (2009).


67

51. Litovchick, L., Florens, L.A., Swanson, S.K., Washburn, M.P., & DeCaprio, J.A., DYRK1A protein kinase promotes quiescence and senescence through DREAM complex assembly. Genes Dev. 25, 801-813 (2011).

52. Spangle, J.M. & Munger, K., The human papillomavirus type 16 E6 oncoprotein activates mTORC1 signaling and increases protein synthesis. J. Virol. 84, 9398-9407 (2010).

53. Baker, S.J., Markowitz, S., Fearon, E.R., Willson, J.K., & Vogelstein, B., Suppression of human colorectal carcinoma cell growth by wild-type p53. Science 249, 912-915 (1990).

54. Litovchick, L. et al., Evolutionarily conserved multisubunit RBL2/p130 and E2F4 protein complex represses human cell cycle-dependent genes in quiescence. Mol. Cell 26, 539-551 (2007).

55. Livak, K.J. & Schmittgen, T.D., Analysis of relative gene expression data using real-time quantitative PCR and the 2(-ΔΔCT) Method. Methods 25, 402-408 (2001).

56. Stewart, S.A. et al., Lentivirus-delivered stable gene silencing by RNAi in primary cells. RNA 9, 493-501 (2003).

57. Ramsby, M.L. & Makowski, G.S., Differential detergent fractionation of eukaryotic cells. Analysis by two-dimensional gel electrophoresis. Methods Mol. Biol. 112, 53-66 (1999).

58. Nakatani, Y. & Ogryzko, V., Immunoaffinity purification of mammalian protein complexes. Methods Enzymol. 370, 430-444 (2003).

59. Ficarro, S.B. et al., Improved electrospray ionization efficiency compensates for diminished chromatographic resolution and enables proteomics analysis of tyrosine signaling in embryonic stem cells. Anal. Chem. 81, 3440-3447 (2009).

60. Parikh, J.R. et al., multiplierz: an extensible API based desktop environment for proteomics data analysis. BMC Bioinformatics 10, 364 (2009).

61. Webber, J.T., Askenazi, M., & Marto, J.A., mzResults: an interactive viewer for interrogation and distribution of proteomics results. Mol. Cell. Proteomics 10, M110 003970 (2011).

62. Askenazi, M., Marto, J.A., & Linial, M., The complete peptide dictionary--a meta-proteomics resource. Proteomics 10, 4306-4310 (2010).

63. Askenazi, M., Li, S., Singh, S., & Marto, J.A., Pathway Palette: a rich internet application for peptide-, protein- and network-oriented analysis of MS data. Proteomics 10, 1880-1885 (2010).

64. Chatr-aryamontri, A. et al., VirusMINT: a viral protein interaction database. Nucleic Acids Res. 37, D669-673 (2009).

65. Berriz, G.F., Beaver, J.E., Cenik, C., Tasan, M., & Roth, F.P., Next generation software for functional trend analysis. Bioinformatics 25, 3043-3044 (2009).

66. Dai, M. et al., Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res. 33, e175 (2005).

67. Johnson, W.E., Li, C., & Rabinovic, A., Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118-127 (2007).

68. Cheung, M.S., Down, T.A., Latorre, I., & Ahringer, J., Systematic bias in high-throughput sequencing data and its correction by BEADS. Nucleic Acids Res. 39, e103 (2011).

69. Benjamini, Y. & Hochberg, Y., Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B (Methodological) 57, 289-300 (1995).

70. Smoot, M.E., Ono, K., Ruscheinski, J., Wang, P.L., & Ideker, T., Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 27, 431-432 (2011).

71. Pique-Regi, R. et al., Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res. 21, 447-455 (2011).

72. Palomero, T. et al., NOTCH1 directly regulates c-MYC and activates a feed-forward-loop transcriptional network promoting leukemic cell growth. Proc. Natl Acad. Sci. USA 103, 18261-18266 (2006).


68



RESEARCH

51. Litovchick, L., Florens, L.A., Swanson, S.K., Washburn, M.P., & DeCaprio, J.A., DYRK1A protein kinase promotes quiescence and senescence through DREAM complex assembly. Genes Dev. 25, 801-813 (2011).

52. Spangle, J.M. & Munger, K., The human papillomavirus type 16 E6 oncoprotein activates mTORC1 signaling and increases protein synthesis. J. Virol. 84, 9398-9407 (2010).

53. Baker, S.J., Markowitz, S., Fearon, E.R., Willson, J.K., & Vogelstein, B., Suppression of human colorectal carcinoma cell growth by wild-type p53. Science 249, 912-915 (1990).

54. Litovchick, L. et al., Evolutionarily conserved multisubunit RBL2/p130 and E2F4 protein complex represses human cell cycle-dependent genes in quiescence. Mol. Cell 26, 539-551 (2007).

55. Livak, K.J. & Schmittgen, T.D., Analysis of relative gene expression data using real-time quantitative PCR and the 2(-ΔΔCT) Method. Methods 25, 402-408 (2001).

56. Stewart, S.A. et al., Lentivirus-delivered stable gene silencing by RNAi in primary cells. RNA 9, 493-501 (2003).

57. Ramsby, M.L. & Makowski, G.S., Differential detergent fractionation of eukaryotic cells. Analysis by two-dimensional gel electrophoresis. Methods Mol. Biol. 112, 53-66 (1999).

58. Nakatani, Y. & Ogryzko, V., Immunoaffinity purification of mammalian protein complexes. Methods Enzymol. 370, 430-444 (2003).

59. Ficarro, S.B. et al., Improved electrospray ionization efficiency compensates for diminished chromatographic resolution and enables proteomics analysis of tyrosine signaling in embryonic stem cells. Anal. Chem. 81, 3440-3447 (2009).

60. Parikh, J.R. et al., multiplierz: an extensible API based desktop environment for proteomics data analysis. BMC Bioinformatics 10, 364 (2009).

61. Webber, J.T., Askenazi, M., & Marto, J.A., mzResults: an interactive viewer for interrogation and distribution of proteomics results. Mol. Cell. Proteomics 10, M110 003970 (2011).

62. Askenazi, M., Marto, J.A., & Linial, M., The complete peptide dictionary--a meta-proteomics resource. Proteomics 10, 4306-4310 (2010).

63. Askenazi, M., Li, S., Singh, S., & Marto, J.A., Pathway Palette: a rich internet application for peptide-, protein- and network-oriented analysis of MS data. Proteomics 10, 1880-1885 (2010).

64. Chatr-aryamontri, A. et al., VirusMINT: a viral protein interaction database. Nucleic Acids Res. 37, D669-673 (2009).

65. Berriz, G.F., Beaver, J.E., Cenik, C., Tasan, M., & Roth, F.P., Next generation software for functional trend analysis. Bioinformatics 25, 3043-3044 (2009).

66. Dai, M. et al., Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res. 33, e175 (2005).

67. Johnson, W.E., Li, C., & Rabinovic, A., Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118-127 (2007).

68. Cheung, M.S., Down, T.A., Latorre, I., & Ahringer, J., Systematic bias in high-throughput sequencing data and its correction by BEADS. Nucleic Acids Res. 39, e103 (2011).

69. Benjamini, Y. & Hochberg, Y., Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B (Methodological) 57, 289-300 (1995).

70. Smoot, M.E., Ono, K., Ruscheinski, J., Wang, P.L., & Ideker, T., Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 27, 431-432 (2011).

71. Pique-Regi, R. et al., Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res. 21, 447-455 (2011).

72. Palomero, T. et al., NOTCH1 directly regulates c-MYC and activates a feed-forward-loop transcriptional network promoting leukemic cell growth. Proc. Natl Acad. Sci. USA 103, 18261-18266 (2006).


68

73. Raychaudhuri, S. et al., Identifying relationships among genomic disease regions: predicting genes at pathogenic SNP associations and rare deletions. PLoS Genet. 5, e1000534 (2009).

74. Myers, S., Bottolo, L., Freeman, C., McVean, G., & Donnelly, P., A fine-scale map of recombination rates and hotspots across the human genome. Science 310, 321-324 (2005).

75. Starr, T.K. et al., A Sleeping Beauty transposon-mediated screen identifies murine susceptibility genes for adenomatous polyposis coli (Apc)-dependent intestinal tumorigenesis. Proc. Natl Acad. Sci. USA 108, 5765-5770 (2011).

76. March, H.N. et al., Insertional mutagenesis identifies multiple networks of cooperating genes driving intestinal tumorigenesis. Nature Genet. 43, 1202-1209 (2011).

77. Starr, T.K. et al., A transposon-based genetic screen in mice identifies genes altered in colorectal cancer. Science 323, 1747-1750 (2009).

78. Mann, K.M. et al., Sleeping Beauty mutagenesis reveals cooperating mutations and pathways in pancreatic adenocarcinoma. Proc. Natl Acad. Sci. USA 109, 5934-5941 (2012).

79. Bergerson, R.J. et al., An insertional mutagenesis screen identifies genes that cooperate with Mll-AF9 in a murine leukemogenesis model. Blood 119, 4512-4523 (2012).

80. Cancer Genome Atlas Research Network, Integrated genomic analyses of ovarian carcinoma. Nature 474, 609-615 (2011).

81. Agrawal, N. et al., Exome sequencing of head and neck squamous cell carcinoma reveals inactivating mutations in NOTCH1. Science 333, 1154-1157 (2011).

82. Chapman, M.A. et al., Initial genome sequencing and analysis of multiple myeloma. Nature 471, 467-472 (2011).

83. Lee, W. et al., The mutation spectrum revealed by paired genome sequences from a lung cancer patient. Nature 465, 473-477 (2010).

84. Parsons, D.W. et al., An integrated genomic analysis of human glioblastoma multiforme. Science 321, 1807-1812 (2008).

85. Parsons, D.W. et al., The genetic landscape of the childhood cancer medulloblastoma. Science 331, 435-439 (2011).

86. Pleasance, E.D. et al., A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 463, 184-190 (2010).

87. Puente, X.S. et al., Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia. Nature 475, 101-105 (2011).

88. Stransky, N. et al., The mutational landscape of head and neck squamous cell carcinoma. Science 333, 1157-1160 (2011).

89. Kuehnle, K. et al., Prosurvival effect of DHCR24/Seladin-1 in acute and chronic responses to oxidative stress. Mol. Cell. Biol. 28, 539-550 (2008).

90. Scaglia, N., Caviglia, J.M., & Igal, R.A., High stearoyl-CoA desaturase protein and activity levels in simian virus 40 transformed-human lung fibroblasts. Biochim. Biophys. Acta 1687, 141-151 (2005).

91. Scaglia, N. & Igal, R.A., Stearoyl-CoA desaturase is involved in the control of proliferation, anchorage-independent growth, and survival in human transformed cells. J. Biol. Chem. 280, 25339-25349 (2005).

92. Kawamura, Y., Tanaka, Y., Kawamori, R., & Maeda, S., Overexpression of Kruppel-like factor 7 regulates adipocytokine gene expressions in human adipocytes and inhibits glucose-induced insulin secretion in pancreatic beta-cell line. Mol. Endocrinol. 20, 844-856 (2006).

93. Marcheva, B. et al., Disruption of the clock components CLOCK and BMAL1 leads to hypoinsulinaemia and diabetes. Nature 466, 627-631 (2010).


69



73. Raychaudhuri, S. et al., Identifying relationships among genomic disease regions: predicting genes at pathogenic SNP associations and rare deletions. PLoS Genet. 5, e1000534 (2009).

74. Myers, S., Bottolo, L., Freeman, C., McVean, G., & Donnelly, P., A fine-scale map of recombination rates and hotspots across the human genome. Science 310, 321-324 (2005).

75. Starr, T.K. et al., A Sleeping Beauty transposon-mediated screen identifies murine susceptibility genes for adenomatous polyposis coli (Apc)-dependent intestinal tumorigenesis. Proc. Natl Acad. Sci. USA 108, 5765-5770 (2011).

76. March, H.N. et al., Insertional mutagenesis identifies multiple networks of cooperating genes driving intestinal tumorigenesis. Nature Genet. 43, 1202-1209 (2011).

77. Starr, T.K. et al., A transposon-based genetic screen in mice identifies genes altered in colorectal cancer. Science 323, 1747-1750 (2009).

78. Mann, K.M. et al., Sleeping Beauty mutagenesis reveals cooperating mutations and pathways in pancreatic adenocarcinoma. Proc. Natl Acad. Sci. USA 109, 5934-5941 (2012).

79. Bergerson, R.J. et al., An insertional mutagenesis screen identifies genes that cooperate with Mll-AF9 in a murine leukemogenesis model. Blood 119, 4512-4523 (2012).

80. Cancer Genome Atlas Research Network, Integrated genomic analyses of ovarian carcinoma. Nature 474, 609-615 (2011).

81. Agrawal, N. et al., Exome sequencing of head and neck squamous cell carcinoma reveals inactivating mutations in NOTCH1. Science 333, 1154-1157 (2011).

82. Chapman, M.A. et al., Initial genome sequencing and analysis of multiple myeloma. Nature 471, 467-472 (2011).

83. Lee, W. et al., The mutation spectrum revealed by paired genome sequences from a lung cancer patient. Nature 465, 473-477 (2010).

84. Parsons, D.W. et al., An integrated genomic analysis of human glioblastoma multiforme. Science 321, 1807-1812 (2008).

85. Parsons, D.W. et al., The genetic landscape of the childhood cancer medulloblastoma. Science 331, 435-439 (2011).

86. Pleasance, E.D. et al., A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 463, 184-190 (2010).

87. Puente, X.S. et al., Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia. Nature 475, 101-105 (2011).

88. Stransky, N. et al., The mutational landscape of head and neck squamous cell carcinoma. Science 333, 1157-1160 (2011).

89. Kuehnle, K. et al., Prosurvival effect of DHCR24/Seladin-1 in acute and chronic responses to oxidative stress. Mol. Cell. Biol. 28, 539-550 (2008).

90. Scaglia, N., Caviglia, J.M., & Igal, R.A., High stearoyl-CoA desaturase protein and activity levels in simian virus 40 transformed-human lung fibroblasts. Biochim. Biophys. Acta 1687, 141-151 (2005).

91. Scaglia, N. & Igal, R.A., Stearoyl-CoA desaturase is involved in the control of proliferation, anchorage-independent growth, and survival in human transformed cells. J. Biol. Chem. 280, 25339-25349 (2005).

92. Kawamura, Y., Tanaka, Y., Kawamori, R., & Maeda, S., Overexpression of Kruppel-like factor 7 regulates adipocytokine gene expressions in human adipocytes and inhibits glucose-induced insulin secretion in pancreatic beta-cell line. Mol. Endocrinol. 20, 844-856 (2006).

93. Marcheva, B. et al., Disruption of the clock components CLOCK and BMAL1 leads to hypoinsulinaemia and diabetes. Nature 466, 627-631 (2010).


69

94. Rao, V.A. et al., The antioxidant transcription factor Nrf2 negatively regulates autophagy and growth arrest induced by the anticancer redox agent mitoquinone. J. Biol. Chem. 285, 34447-34459 (2010).

95. Teodoro, J.G., Parker, A.E., Zhu, X., & Green, M.R., p53-mediated inhibition of angiogenesis through up-regulation of a collagen prolyl hydroxylase. Science 313, 968-971 (2006).

96. Kelekar, A. & Thompson, C.B., Bcl-2-family proteins: the role of the BH3 domain in apoptosis. Trends Cell Biol. 8, 324-330 (1998).

97. Png, K.J., Halberg, N., Yoshida, M., & Tavazoie, S.F., A microRNA regulon that mediates endothelial recruitment and metastasis by cancer cells. Nature 481, 190-194 (2011).


70

Rozenblatt-Rosen et al., 2012 SUPPLEMENTARY INFORMATION · 17 Pathway enrichment analysis Gene...

Documents

Transcript of Rozenblatt-Rosen et al., 2012 SUPPLEMENTARY INFORMATION · 17 Pathway enrichment analysis Gene...