ShinyGO: a web application for in-depth analysis of gene sets · 2018-05-04 · GO Cellular...



Supplementary document


ShinyGO: a web application for in-depth analysis of gene sets

Steven Xijin Ge1,*, and Dongmin Jung1,2

1Department of Mathematics and Statistics, Box 2225, Brookings, SD, USA 57007

2Avison Biomedical Research Center, Yonsei University, Seoul, South Korea

Table 1 compares ShinyGO (http://ge-lab.org/go/) with several other tools for enrichment analysis using gene lists.

Methods

To enable gene ID conversion, we downloaded all available gene ID mappings from Ensembl. The final mapping table for the current ShinyGO v0.4 release consists of 135,832,098 rows, mapping various gene IDs, including DNA microarray probe names, to Ensembl gene IDs.
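The lookup step can be pictured with a minimal sketch. The dictionary below is a tiny, hypothetical stand-in for the mapping table, and the function name `convert_ids` and the case-insensitive matching are assumptions for illustration, not ShinyGO's actual implementation. It shows how several query IDs can collapse onto one Ensembl gene ID, which is why the number of mapped genes can be smaller than the number of query IDs.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical excerpt of an ID-mapping table: (any gene ID) -> Ensembl gene ID.
# The real table is far larger; these rows are illustrative only.
ID_MAP: Dict[str, str] = {
    "TP53": "ENSG00000141510",
    "P53": "ENSG00000141510",          # alias pointing to the same gene
    "CDKN1A": "ENSG00000124762",
    "ENSG00000124762": "ENSG00000124762",
}


def convert_ids(query: Iterable[str], id_map: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (unique Ensembl IDs in first-seen order, query IDs that could not be mapped)."""
    mapped: List[str] = []
    seen = set()
    unmapped: List[str] = []
    for raw in query:
        key = raw.strip().upper()      # assumption: matching ignores case and whitespace
        ensembl = id_map.get(key)
        if ensembl is None:
            unmapped.append(raw)
        elif ensembl not in seen:      # two query IDs may map to the same gene
            seen.add(ensembl)
            mapped.append(ensembl)
    return mapped, unmapped


genes, missing = convert_ids(["tp53", "P53", "CDKN1A", "NOT_A_GENE"], ID_MAP)
print(genes, missing)
```

Here four query strings yield two Ensembl IDs and one unmapped entry.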

Enrichment is calculated using the hypergeometric distribution, followed by false discovery rate (FDR) correction. The background gene-set consists of all protein-coding genes in the genome. As many of the enriched GO terms are related or redundant (e.g., “cell cycle” and “cell cycle process”), we provide two plots to summarize such correlation, following a method developed in [2]. We first measure the distance between two gene-sets as $1 - N_i / N_u$, where $N_i$ and $N_u$ are the numbers of genes in the intersection and the union of the two sets, respectively. The distance matrix is used to construct a hierarchical clustering tree using average linkage, and to construct a network of GO terms using a cutoff of 0.05 on the overlap ratio [2].
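The enrichment step at the start of this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not say which FDR procedure is used, so Benjamini-Hochberg is assumed here, and the background size of 20,000 protein-coding genes is a round, hypothetical figure. The function names are ours.

```python
from math import comb
from typing import List


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): chance of seeing at least k gene-set members in a list of n genes,
    when K of the N background genes belong to the gene-set."""
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values (assumed FDR method), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):           # walk from the largest P-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Example: 4 of the 7 genes of a small gene-set found in a 147-gene list,
# with an ASSUMED background of 20,000 protein-coding genes.
p = hypergeom_pvalue(k=4, n=147, K=7, N=20000)
print(p)
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

The raw P-value for one gene-set is computed first; the adjustment is then applied across all gene-sets tested, so the FDR reported for a term depends on how many terms are in the chosen database.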

To identify enriched TF binding motifs, transcript annotation and promoter sequences are retrieved from Ensembl. For genes with multiple transcripts, the transcription start site (TSS) shared by the largest number of transcripts is used. If multiple TSS locations have the same number of transcripts, then the most upstream TSS is used. Promoters are scanned using TF binding motifs in CIS-BP [4]. Instead of defining a binary outcome of binding or not binding, which depends on arbitrary cutoffs, we record the best score for each TF in every 300-bp and 600-bp promoter sequence. Student’s t-test is then used to compare the scores observed in a group of genes against those of the rest of the genes. The P-values are corrected for multiple testing using the false discovery rate (FDR).
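One reading of the TSS selection rule above is: take the TSS used by the most transcripts, and break ties by taking the most upstream one. The sketch below follows that reading. The function name `pick_tss` is hypothetical, and the strand-aware interpretation of "most upstream" (smallest coordinate on the plus strand, largest on the minus strand) is an assumption, since the text does not spell it out.

```python
from collections import Counter
from typing import List


def pick_tss(tss_positions: List[int], strand: str) -> int:
    """Choose one TSS for a gene.

    tss_positions holds one TSS coordinate per annotated transcript.
    The TSS shared by the most transcripts wins; ties go to the most
    upstream TSS, interpreted strand-aware (an assumption).
    """
    if not tss_positions:
        raise ValueError("gene has no annotated transcripts")
    counts = Counter(tss_positions)
    best = max(counts.values())
    tied = [pos for pos, c in counts.items() if c == best]
    return min(tied) if strand == "+" else max(tied)


print(pick_tss([100, 100, 250], "+"))   # two transcripts share position 100
print(pick_tss([100, 250], "-"))        # tie, minus strand: larger coordinate is upstream
```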

Use case

As an example, we analyzed a set of 149 genes (Table 2) up-regulated in lymphoblast cells (TK6, WTK1, and NH32) treated with ionizing radiation [6]. This gene list is available on the MSigDB [9] website [11]. The 149 human genes are mapped to 147 Ensembl gene IDs for enrichment analysis, as shown by the mapping information available on the “Genes” tab. Below, we use a large collection of human gene-sets (Table 3) to investigate this list.

GO Biological Process

Using Gene Ontology (GO) Biological Process gene-sets, we obtain the enrichment results shown in Table 4. The top terms are related to positive regulation of cellular metabolic process, response to stress, apoptosis, etc. These terms, many of which are related, are ranked by FDR in the table. A more organized presentation of these terms is shown in Figure 1, where related terms are grouped together. For example, several apoptosis-related terms are grouped in the branch at the bottom of Figure 1. The most significant groups of terms are related to nitrogen metabolism. Ionizing radiation induces reactive oxygen and nitrogen species, which might activate signaling pathways in response to DNA damage [13]. As shown in Figure 1, other groups of terms are related to regulation of biosynthesis, response to stimulus, and apoptosis. Many of these processes are known to underpin the cellular response to ionizing radiation [14].

Table 1. Comparison of selected enrichment analysis tools.

| Tool | #Organisms | Custom Background | Reference |
|---|---|---|---|
| Enrichr | 2 | No | [1] |
| GOrilla | 8 | No | [3] |
| PlantGSEA | 15 | No | [5] |
| Panther | 112 | No | [7] |
| STRING | 2031 | No | [8] |
| DAVID | 65,000 | Yes | [10] |
| g:Profiler | 208 | No | [12] |
| ShinyGO** | 208 | No | Present study |

ShinyGO new features: visualization of overlapping gene sets, gene characteristics plots (gene length, GC content, chromosomal location), KEGG pathway diagram, protein-protein interaction (PPI) network.

GO Cellular Component

When switching to GO Cellular Component, we find that this list is overrepresented with 61 (41%) nuclear proteins (Table 5 and Figure 2) (FDR < 7.3×10⁻¹¹, hypergeometric test). Several small and highly specific functional categories are also identified. For example, 4 out of the 7 proteins involved in the I-κB/NF-κB complex are included in the gene list (FDR < 2.6×10⁻⁶). As shown in Table 5, the 4 proteins are RELA, NFKB2, NFKBIA, and NFKB1; this gene-level information is also available in the downloaded enrichment results. The I-κB/NF-κB complex plays important roles in the immune response [15]. This list also contains 6 out of the 42 proteins in the cyclin-dependent protein kinase holoenzyme complex (FDR < 9.4×10⁻⁶). Both proteins in the PCNA-p21 complex are included in the list (FDR < 7.5×10⁻⁴). Downstream of p53 signaling pathways, the interaction of p21 and PCNA plays a role in regulating the cell cycle after DNA damage [16]. Enriched terms can also be displayed as connected networks. For example, Figure 4 shows a large cluster of interconnected terms related to chromatin and the nucleus, and a group of 3 terms related to the membrane and cell surface.

GO Molecular Function

Using GO Molecular Function (Table 6 and Figure 3), ShinyGO reveals that 40 (27%) of the 147 genes have DNA binding transcription factor activity (FDR < 2.9×10⁻¹⁴). This list contains many transcription factors, such as JUNB, NFKB1, STAT1, and MYC, which give rise to many of the terms in the large branch in the lower part of Figure 3. Other, less significant terms include kinase binding and cytokine receptor binding.

KEGG pathways

Using KEGG pathways, we can detect overrepresentation of genes in cancer pathways with FDR < 8.5×10⁻²⁰ (Table 7). Thirty-two genes in the list are related to tumorigenesis (Figure 5). Other significant pathways include the P53 signaling pathway, for DNA damage response, and the TNF and NF-κB signaling pathways, for immune response. ShinyGO retrieves the pathway diagrams from the KEGG web server and highlights the user’s genes (Figure 6).

Transcription Factor target genes

To investigate whether the 149 genes can be regulated by common transcription factors (TFs), we choose the “TF.Target” gene-sets. These are verified or predicted TF target genes compiled from various sources, including RegNetwork [17], CircuitsDB [18], TRED [19], ENCODE [20], and TRRUST [21]. As shown in Table 8, the most significantly enriched TF is p53, which is represented by multiple gene-sets from different databases. Among the 147 genes, 27 (18%) are target genes of p53 (FDR < 7.08×10⁻²¹), which plays a critical role in the cellular response to ionizing radiation [16]. Other significant TFs are NFKB and RELA, which probably mediate the immune response via the Rela/NF-κB pathway [15]. Consistent with this enrichment, NFKB1, NFKB2, and RELA are included in the 147 query genes. Other TFs with enriched target genes include SP1 and BRCA1. These enrichment results are organized in Figure 7. The 4 gene-sets related to p53 target genes are grouped together. The 12 gene-sets of Rela/NF-κB also form a bigger branch on the tree. Another group of TFs at the bottom of the tree includes SP1, FOS, and JUN. Taking advantage of a large collection of transcription factor target genes, ShinyGO can help generate hypotheses on gene regulation.

microRNA target genes

As shown in Table 10, 13 of the 147 genes are target genes of miR-145, with FDR < 9.23×10⁻⁶. Previous studies have shown that miR-145 is involved in DNA damage repair and is regulated by p53 [22]. As shown in Figure 8, two gene-sets are related to miR-145. miR-21 target genes are also overrepresented in the 147 genes (FDR < 9.23×10⁻⁶). miR-21 has also been shown to be involved in DNA damage repair [23, 24], probably by targeting MSH2, a mismatch repair gene. Other microRNAs in Figure 8 are less significant but may also merit further investigation. For example, miR-146 might be regulated by p53 as well [25].

Gene characteristics

ShinyGO can also compare the list of 147 genes with the rest of the genes in several respects. Figure 9A shows that these genes are distributed randomly across the chromosomes (Chi-squared test, P = 0.97). A detailed genomic location map is shown in Figure 11. Figure 9C indicates that the genes are all protein-coding. These genes seem to have fewer exons (Chi-squared test, P = 0.023) and more transcript isoforms (Chi-squared test, P = 0.0086) than other protein-coding genes (Figure 9B and D). The distributions of the lengths of various gene features are shown in Figure 10. The 147 genes are similar to other genes in the lengths of their coding sequences, transcripts, genomic spans, and 3’ UTRs (untranslated regions). Their 5’ UTRs are slightly longer than those of other protein-coding genes (t-test, P = 0.026, Figure 10B). Plotting and t-tests on these gene lengths are done on a log scale, as the transformed data are closer to a normal distribution. The GC content of these genes is also similar to that of other genes.
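The log-scale comparison of gene lengths can be illustrated with a small sketch. Several details are assumptions rather than facts from the text: the base of the logarithm (base 10 here), the t-test variant (Welch's unequal-variance form here), and the length values, which are made up. Only the t statistic is computed; turning it into a P-value needs the t distribution, which is outside the Python standard library.

```python
from math import log10, sqrt
from statistics import mean, variance
from typing import Sequence


def welch_t_log10(x: Sequence[float], y: Sequence[float]) -> float:
    """Welch's t statistic computed on log10-transformed values (assumed variant)."""
    lx = [log10(v) for v in x]
    ly = [log10(v) for v in y]
    # Standard error of the difference in means, without assuming equal variances.
    se = sqrt(variance(lx) / len(lx) + variance(ly) / len(ly))
    return (mean(lx) - mean(ly)) / se


# Hypothetical 5' UTR lengths (bp) for query genes and for other genes.
query_lengths = [100, 1000, 10000]
other_lengths = [10, 100, 1000]
t_stat = welch_t_log10(query_lengths, other_lengths)
print(t_stat)
```

Working on the log scale keeps a few very long genes from dominating the means and variances, which is the motivation given in the text.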

STRING API

The STRING API recognized 138 (94%) of the 147 genes. In addition to enrichment analysis based on GO and KEGG, STRING also offers enrichment of protein domains using the Pfam and InterPro databases. As shown in Table 11, this list is overrepresented with 5 proteins containing the Cyclin, N-terminal domain and 5 proteins containing the Helix-loop-helix DNA-binding domain.
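As a rough illustration of the identifier-recognition step, the sketch below only builds a request URL for STRING's public `get_string_ids` method; it does not send the request. The text does not say which STRING endpoints or parameters ShinyGO uses, so the endpoint choice, the helper name `string_id_request_url`, and the two example genes are assumptions, and the current STRING API documentation should be checked before relying on this.

```python
from typing import Iterable
from urllib.parse import urlencode

STRING_API = "https://string-db.org/api"  # public STRING endpoint (verify against current docs)


def string_id_request_url(genes: Iterable[str], species: int = 9606, fmt: str = "tsv") -> str:
    """Build a get_string_ids URL that asks STRING to map gene symbols to its own identifiers."""
    params = {
        "identifiers": "\r".join(genes),  # STRING separates multiple identifiers with a carriage return
        "species": species,               # NCBI taxonomy ID; 9606 is human
    }
    return f"{STRING_API}/{fmt}/get_string_ids?{urlencode(params)}"


url = string_id_request_url(["NFKBIA", "RELA"])
print(url)
```

Genes that come back without a STRING identifier would account for the difference between the query size and the number of recognized genes.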

A protein-protein interaction (PPI) network can be retrieved for all or some of the genes in the list. Figure 12 shows the network for 50 genes, indicating the interactions of NFKBIA, RELA, and other factors underlying the immune response. This network is also richly annotated with links to original studies and even protein structures.

In conclusion, using ShinyGO, users can easily perform in-depth analysis of gene lists and generate novel hypotheses for further study.

Table 2. Genes upregulated by radiation in lymphoblasts [6].

ADORA3 AHR AKAP5 ARL2BP ATF5 AXL BAI3 BATF BBC3 BIRC3 BTG1 BTG2 CARTPT CASP1 CCL18 CCL20 CCL21 CCL3 CCL4 CCNE2 CCNF CCNG1 CCNG2 CCNI CCNK CCR1

CCT4 CCT7 CD164 CD59 CD79A CD80 CD83 CDK1 CDK6 CDKN1A CDKN1C CFLAR CGRRF1 CLK1 COL4A2 COL6A1 COX7B CRAT CXCL10 DDB2 DDIT3 DKC1 DLX2 DPYSL3

DUSP6 ENC1 ENO1 EP300 FAM134C FAS FCN3 FGFR1 FN1 GDF15 GRN HBEGF HOXC5 IFITM1 IFNGR1 IL2RB IRF4 IRF9 ITGA2 JAK2 JUNB KIF20B KLF4 KLRC2 KRT14

KRT19 KRT8 LGALS3 LZTR1 MAP7 MAX MDFI MED21 MSC MSH2 MYC NCOA3 NDUFB5 NFE2L1 NFKB1 NFKB2 NFKBIA NFKBIE NR4A3 PCNA PKD2L1 PLAGL2 PLAU PLCG2

PLK1 PLOD3 POU3F4 POU5F1 PPM1D PRMT1 PTN PTPN11 PTPN22 PTPN6 PTTG1 RABGAP1 RAD50 RANBP1 RB1 RELA RFX5 ROCK1 RORA RPS6KA1 RRM2B SGK1 SIPA1

SMAD3 SNAPC2 STAT1 STAT5A STK4 TANK TCF21 TFDP2 TGFBR3 TGIF1 TLE3 TNFAIP3 TNFAIP8 TNFRSF8 TNFSF10 TP53I3 TP53TG1 TRAF1 TRRAP TSC22D3 UBE2C

VEGFC XPC ZBTB48 ZNF141 ZNF274 ZNF85


Table 3. Gene Set databases for enrichment analysis in human.

| Type | Subtype/Database name | #GeneSets | Source | Ref. |
|---|---|---|---|---|
| Gene Ontology | Biological Process (BP) | 12661 | Ensembl 91 | [26] |
| | Cellular Component (CC) | 1581 | Ensembl 91 | |
| | Molecular Function (MF) | 3191 | Ensembl 91 | |
| KEGG | KEGG | 321 | Release 84.1 | [27] |
| MSigDB.Computational | Computational gene sets | 858 | MSigDB 6.1 | [9] |
| MSigDB.Curated | REACTOME | 674 | MSigDB 6.1 | |
| | BIOCARTA | 217 | MSigDB 6.1 | |
| | PID | 196 | MSigDB 6.1 | |
| | KEGG | 186 | MSigDB 6.1 | |
| | Literature | 3465 | MSigDB 6.1 | |
| MSigDB.Hallmark | hallmark | 50 | MSigDB 6.1 | |
| MSigDB.Immune | Immune system | 4872 | MSigDB 6.1 | |
| MSigDB.Location | Cytogenetic band | 326 | MSigDB 6.1 | |
| MSigDB.Motif | TF and miRNA Motifs | 836 | MSigDB 6.1 | |
| MSigDB.Oncogenic | Oncogenic signatures | 189 | MSigDB 6.1 | |
| GeneSetDB.Pathways | Biocarta | 197 | GeneSetDB | [28] |
| | EHMN | 55 | GeneSetDB | |
| | HumanCyc | 59 | GeneSetDB | |
| | KEGG(Disease) | 81 | GeneSetDB | |
| | NetPath | 25 | GeneSetDB | |
| | Reactome | 776 | GeneSetDB | |
| | WikiPathways | 152 | GeneSetDB | |
| GeneSetDB.Drug | CTD | 1047 | GeneSetDB | |
| | DrugBank | 176 | GeneSetDB | |
| | MATADOR | 266 | GeneSetDB | |
| | SIDER | 473 | GeneSetDB | |
| | SMPDB | 81 | GeneSetDB | |
| GeneSetDB.Other | CancerGenes | 23 | GeneSetDB | |
| | HPO | 1639 | GeneSetDB | |
| | INOH | 76 | GeneSetDB | |
| | MethCancerDB | 21 | GeneSetDB | |
| | MethyCancer | 54 | GeneSetDB | |
| | MPO | 3134 | GeneSetDB | |
| | PID | 195 | GeneSetDB | |
| | STITCH | 4616 | GeneSetDB | |
| | T3DB | 846 | GeneSetDB | |
| TF.Target | TFactS | 109 | GeneSetDB | |
| | RegNetwork | 1041 | v.2015 | [17] |
| | CircuitsDB | 849 | V.2012 | [18] |
| | TRED | 131 | tftargets | [19] |
| | ITFP | 1974 | tftargets | [29] |
| | Neph2012 | 16477 | tftargets | [30] |
| | Marbach2016 | 609 | tftargets | [31] |
| | ENCODE | 157 | tftargets | [20] |
| | TRRUST | 798 | tftargets | [21] |
| miRNA.Targets | MicroCosm | 44 | GeneSetDB | [28] |
| | miRTarBase | 62 | GeneSetDB | |
| | miRDB | 2588 | V 5.0 | [32] |
| | TargetScan | 583 | V 7.1 | [33] |
| | miRTarBase | 2599 | V 7.0 | [34] |
| | RegNetwork | 618 | V. 2015 | [17] |
| | CircuitsDB | 140 | V. 2012 | [18] |

Total: 72,394


Table 4. Enriched GO Biological Process terms for the 147 genes induced by radiation.

| FDR | Genes in list | Total genes | Functional Category | Genes (first 6) |
|---|---|---|---|---|
| 1.52E-24 | 77 | 3202 | Positive regulation of cellular metabolic process | NFE2L1 CCNK AHR CCL20 GDF15 CCL4 |
| 1.52E-24 | 76 | 3087 | Positive regulation of nitrogen compound metabolic process | NFE2L1 CCNK AHR CCL20 GDF15 CCL4 |
| 2.1E-24 | 92 | 4771 | Negative regulation of cellular process | BIRC3 NFKB2 MSH2 MDFI TNFRSF8 TSC22D3 |
| 4.86E-24 | 76 | 3200 | Positive regulation of macromolecule metabolic process | NFE2L1 CCNK AHR CCL20 GDF15 CCL4 |
| 1.38E-23 | 78 | 3452 | Positive regulation of metabolic process | NFE2L1 CCNK AHR CCL20 GDF15 CCL4 |
| 1.39E-21 | 83 | 4228 | Response to stress | FAS NFKB2 CD59 MSH2 JAK2 STK4 |
| 1.63E-19 | 71 | 3305 | Response to organic substance | FAS NFKB2 CCL20 TNFRSF8 NCOA3 GDF15 |
| 5.46E-19 | 57 | 2137 | Programmed cell death | CFLAR BIRC3 MSH2 JAK2 TNFRSF8 PLAGL2 |
| 7.54E-19 | 69 | 3214 | Cellular response to chemical stimulus | CCL20 NCOA3 GDF15 CCL21 DUSP6 JUNB |
| 1.22E-18 | 58 | 2265 | Cell death | CFLAR BIRC3 MSH2 JAK2 TNFRSF8 PLAGL2 |
| 1.62E-18 | 49 | 1586 | Regulation of programmed cell death | BIRC3 JAK2 GDF15 TSC22D3 DDIT3 CFLAR |
| 3.56E-18 | 54 | 1998 | Apoptotic process | CFLAR BIRC3 MSH2 JAK2 TNFRSF8 PLAGL2 |
| 4.04E-18 | 83 | 4810 | System development | MSH2 JAK2 CD79A SGK1 TNFRSF8 DUSP6 |
| 4.21E-18 | 50 | 1705 | Regulation of cell death | BIRC3 JAK2 GDF15 TSC22D3 JUNB DDIT3 |
| 4.96E-18 | 48 | 1568 | Regulation of apoptotic process | BIRC3 JAK2 GDF15 TSC22D3 DDIT3 CFLAR |
| 5.15E-18 | 82 | 4729 | Regulation of cellular biosynthetic process | NFKB2 NFE2L1 CCNK AHR MDFI TFDP2 |
| 5.33E-18 | 80 | 4512 | Regulation of macromolecule biosynthetic process | NFKB2 NFE2L1 CCNK AHR MDFI TFDP2 |
| 1.18E-17 | 82 | 4798 | Regulation of biosynthetic process | NFKB2 NFE2L1 CCNK AHR MDFI TFDP2 |
| 1.28E-17 | 64 | 2945 | Regulation of multicellular organismal process | CD83 TNFRSF8 DUSP6 VEGFC FGFR1 CDK6 |
| 1.48E-17 | 61 | 2682 | Cellular response to organic substance | CCL20 NCOA3 GDF15 CCL21 DUSP6 JUNB |
| 1.49E-17 | 81 | 4713 | Response to chemical | FAS NFKB2 CCL20 TNFRSF8 NCOA3 GDF15 |
| 1.67E-17 | 70 | 3550 | Animal organ development | JAK2 CD79A DUSP6 TGFBR3 CDK6 PRMT1 |
| 3.45E-17 | 47 | 1587 | Positive regulation of multicellular organismal process | VEGFC FGFR1 TNFRSF8 IRF4 CASP1 CARTPT |
| 4.03E-17 | 79 | 4573 | Regulation of nucleobase-containing compound metabolic process | NFKB2 NFE2L1 CCNK MSH2 AHR MDFI |
| 4.03E-17 | 67 | 3315 | Regulation of molecular function | RABGAP1 BIRC3 CCNK CCL20 CCL3 CCL4 |
| 4.18E-17 | 64 | 3031 | Cell surface receptor signaling pathway | CD79A CCL20 GDF15 CCL21 DUSP6 VEGFC |
| 1.59E-16 | 53 | 2129 | Cell proliferation | FAS JAK2 HBEGF VEGFC JUNB FGFR1 |
| 2.25E-16 | 37 | 986 | Immune system development | MSH2 JAK2 CD79A CDK6 PRMT1 AXL |
| 2.91E-16 | 54 | 2246 | Response to external stimulus | FAS CD59 PKD2L1 CCL20 TNFAIP3 TNFRSF8 |
| 3.34E-16 | 61 | 2879 | Intracellular signal transduction | NFKB2 MSH2 JAK2 STK4 NFKB1 CCL20 |


Figure 1. Hierarchical tree summarizing significant GO terms. Bigger dots at the end of branches correspond to more significant FDR values, which are printed in front of the terms. Terms sharing more genes are grouped together. Branch labels in the figure: Nitrogen metabolism; Apoptosis; Response to stimulus; Regulation of biosynthesis.
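A tree of this kind can be built from pairwise gene-set distances of the form 1 minus the ratio of shared genes to the union, merged by average linkage. The sketch below is a plain-Python illustration of that idea, not ShinyGO's code: the function names are ours, the three gene-sets are toy examples loosely based on genes listed in Table 4, and the naive merge loop is only suitable for a small number of terms.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |intersection| / |union| of two gene-sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def average_linkage(sets: Dict[str, Set[str]]) -> List[Tuple[FrozenSet[str], FrozenSet[str], float]]:
    """Agglomerative clustering with average linkage; returns the merge history."""
    dist = {frozenset(p): jaccard_distance(sets[p[0]], sets[p[1]])
            for p in combinations(sets, 2)}
    clusters: List[FrozenSet[str]] = [frozenset([name]) for name in sets]
    merges = []

    def linkage(c1: FrozenSet[str], c2: FrozenSet[str]) -> float:
        # Average of all pairwise distances between members of the two clusters.
        return sum(dist[frozenset((x, y))] for x in c1 for y in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


# Toy gene-sets (illustrative membership only).
toy_terms = {
    "Apoptotic process": {"CFLAR", "BIRC3", "MSH2", "JAK2"},
    "Programmed cell death": {"CFLAR", "BIRC3", "MSH2", "JAK2", "TNFRSF8"},
    "Response to stress": {"FAS", "NFKB2", "CD59", "MSH2"},
}
merges = average_linkage(toy_terms)
for left, right, height in merges:
    print(sorted(left), sorted(right), round(height, 3))
```

The two cell-death terms share most of their genes and merge first at a small distance, which mirrors how related terms end up on the same branch of the tree.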


Table 5. Enriched GO Cellular Component terms.

| FDR | Genes in list | Total genes | Functional Category | Genes (first 6) |
|---|---|---|---|---|
| 7.26E-11 | 61 | 3657 | Nucleoplasm | BIRC3 FAS RRM2B CCNK JAK2 EP300 |
| 2.68E-10 | 58 | 3502 | Protein complex | CD79A COX7B CASP1 KIF20B XPC RELA |
| 3.68E-10 | 65 | 4336 | Nuclear lumen | RAD50 JUNB BIRC3 FAS RRM2B FGFR1 |
| 1.25E-09 | 67 | 4715 | Nuclear part | AHR RAD50 XPC JUNB BIRC3 FAS |
| 2.60E-06 | 4 | 7 | I-kappaB/NF-kappaB complex | RELA NFKB2 NFKBIA NFKB1 |
| 9.44E-06 | 6 | 42 | Cyclin-dependent protein kinase holoenzyme complex | CDK6 CDKN1A RB1 CDK1 CCNE2 CCNK |
| 9.44E-06 | 13 | 332 | Transcription factor complex | TFDP2 MAX RB1 SMAD3 JUNB RELA |
| 5.26E-05 | 16 | 595 | Nuclear chromosome part | RAD50 JUNB MSH2 STAT1 NCOA3 PCNA |
| 9.52E-05 | 16 | 629 | Nuclear chromosome | RAD50 JUNB MSH2 STAT1 NCOA3 PCNA |
| 1.82E-04 | 12 | 385 | Nuclear chromatin | RAD50 JUNB STAT1 NCOA3 KLF4 SMAD3 |
| 1.82E-04 | 10 | 262 | External side of plasma membrane | CD79A TGFBR3 CD59 CCR1 FAS IL2RB |
| 7.54E-04 | 2 | 2 | PCNA-p21 complex | CDKN1A PCNA |
| 7.99E-04 | 6 | 99 | Serine/threonine protein kinase complex | CDK6 CDKN1A RB1 CDK1 CCNE2 CCNK |
| 1.30E-03 | 13 | 560 | Chromatin | RAD50 JUNB STAT1 NCOA3 KLF4 SMAD3 |
| 1.45E-03 | 6 | 113 | Protein kinase complex | CDK6 CDKN1A RB1 CDK1 CCNE2 CCNK |
| 1.71E-03 | 17 | 932 | Chromosomal part | RAD50 JUNB MSH2 STAT1 NCOA3 PCNA |
| 1.71E-03 | 7 | 170 | Nuclear transcription factor complex | TFDP2 MAX RB1 JUNB DDIT3 TRRAP |
| 2.49E-03 | 16 | 875 | Cell surface | CD79A FAS TGFBR3 ENO1 CD59 HBEGF |
| 3.59E-03 | 12 | 559 | Side of membrane | JAK2 CD79A TGFBR3 CD59 PTPN22 CCR1 |
| 3.59E-03 | 15 | 821 | Transferase complex | CDK6 CDKN1A MAX PRMT1 DDB2 RB1 |
| 3.72E-03 | 6 | 143 | RNA polymerase II transcription factor complex | TFDP2 MAX RB1 JUNB DDIT3 TRRAP |
| 3.91E-03 | 2 | 5 | Transcription factor AP-1 complex | JUNB DDIT3 |
| 3.91E-03 | 9 | 340 | Receptor complex | CD79A AHR TGFBR3 FGFR1 ITGA2 KLRC2 |
| 3.95E-03 | 11 | 496 | Centrosome | RANBP1 ARL2BP CDK6 CCT4 PCNA KIF20B |
| 5.13E-03 | 17 | 1060 | Chromosome | RAD50 JUNB MSH2 STAT1 NCOA3 PCNA |
| 5.13E-03 | 2 | 6 | CD95 death-inducing signaling complex | FAS CFLAR |
| 5.29E-03 | 42 | 3939 | Extracellular space | CCL20 CCL21 VEGFC CXCL10 CCL4 CCL18 |
| 1.09E-02 | 2 | 9 | Zona pellucida receptor complex | CCT4 CCT7 |
| 1.09E-02 | 2 | 9 | Death-inducing signaling complex | CFLAR FAS |
| 1.47E-02 | 2 | 11 | Chaperonin-containing T-complex | CCT4 CCT7 |


Figure 2. Enriched GO Cellular Component terms.


Table 6. Enriched GO Molecular Function Terms.

| Enrichment FDR | Genes in list | Total genes | Functional Category |
|---|---|---|---|
| 2.90E-14 | 40 | 1322 | DNA binding transcription factor activity |
| 2.90E-14 | 45 | 1706 | Transcription regulator activity |
| 3.66E-12 | 29 | 798 | RNA polymerase II transcription factor activity, sequence-specific DNA binding |
| 3.66E-12 | 30 | 861 | Double-stranded DNA binding |
| 6.40E-12 | 30 | 897 | Regulatory region DNA binding |
| 6.40E-12 | 30 | 899 | Regulatory region nucleic acid binding |
| 6.40E-12 | 30 | 895 | Transcription regulatory region DNA binding |
| 1.90E-11 | 32 | 1080 | Sequence-specific DNA binding |
| 1.64E-10 | 26 | 763 | Sequence-specific double-stranded DNA binding |
| 3.19E-10 | 25 | 726 | Transcription regulatory region sequence-specific DNA binding |
| 1.00E-09 | 23 | 643 | RNA polymerase II regulatory region sequence-specific DNA binding |
| 1.01E-09 | 23 | 646 | RNA polymerase II regulatory region DNA binding |
| 1.51E-09 | 19 | 431 | Transcriptional activator activity, RNA polymerase II transcription regulatory region sequence-specific binding |
| 2.66E-09 | 21 | 561 | Transcription factor binding |
| 7.34E-09 | 22 | 658 | Kinase binding |
| 2.25E-08 | 41 | 2275 | Enzyme binding |
| 3.93E-08 | 15 | 311 | Cytokine receptor binding |
| 4.00E-08 | 37 | 1948 | Molecular function regulator |
| 9.50E-08 | 33 | 1648 | Receptor binding |
| 1.33E-07 | 20 | 645 | Transcription factor activity, protein binding |
| 1.49E-07 | 43 | 2653 | DNA binding |
| 5.08E-07 | 19 | 636 | Transcription factor activity, transcription factor binding |
| 5.08E-07 | 18 | 569 | Transcription cofactor activity |
| 6.59E-07 | 22 | 865 | Protein complex binding |
| 9.86E-07 | 30 | 1559 | Identical protein binding |
| 2.28E-06 | 10 | 171 | Hormone receptor binding |
| 2.63E-06 | 14 | 382 | Core promoter proximal region sequence-specific DNA binding |
| 2.70E-06 | 14 | 384 | Core promoter proximal region DNA binding |
| 4.81E-06 | 14 | 404 | Transcription factor activity, RNA polymerase II core promoter proximal region sequence-specific binding |
| 1.01E-05 | 13 | 369 | RNA polymerase II core promoter proximal region sequence-specific DNA binding |


Figure 3. Enriched GO Molecular Function terms.


Figure 4. The network of enriched GO Cellular Component terms. The size of a node represents the number of genes in the gene-set. The thickness of an edge indicates the percentage of overlapping genes.
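A network like this can be derived from pairwise overlaps: each term is a node sized by its gene count, and two terms are linked when their overlap ratio reaches a cutoff. In the sketch below the overlap ratio is taken to be shared genes divided by the union, and 0.05 is used as the cutoff; both are our reading of the Methods rather than confirmed implementation details. The three complexes use the query genes listed for them in Table 5, and the function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def overlap_network(sets: Dict[str, Set[str]], cutoff: float = 0.05
                    ) -> Tuple[Dict[str, int], List[Tuple[str, str, float]]]:
    """Return (node sizes, edges) where an edge links terms with overlap ratio >= cutoff."""
    nodes = {name: len(genes) for name, genes in sets.items()}   # node size = gene count
    edges = []
    for a, b in combinations(sets, 2):
        union = sets[a] | sets[b]
        ratio = len(sets[a] & sets[b]) / len(union) if union else 0.0
        if ratio >= cutoff:
            edges.append((a, b, ratio))                          # ratio drives edge thickness
    return nodes, edges


complexes = {
    "PCNA-p21 complex": {"CDKN1A", "PCNA"},
    "I-kappaB/NF-kappaB complex": {"RELA", "NFKB2", "NFKBIA", "NFKB1"},
    "Cyclin-dependent protein kinase holoenzyme complex":
        {"CDK6", "CDKN1A", "RB1", "CDK1", "CCNE2", "CCNK"},
}
nodes, edges = overlap_network(complexes)
print(nodes)
print(edges)
```

Only the two complexes that share CDKN1A are connected; the NF-κB complex shares no genes with the others and stays isolated.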


Table 7. Enriched KEGG pathways.

| FDR | Genes in list | Total genes | Functional Category | Genes |
|---|---|---|---|---|
| 8.46E-20 | 32 | 530 | Pathways in cancer | CDK6 CDKN1A COL4A2 DDB2 EP300 FGFR1 FN1 BBC3 BIRC3 IFNGR1 FAS IL2RB …… |
| 2.74E-15 | 15 | 93 | Small cell lung cancer | CDK6 CDKN1A COL4A2 DDB2 FN1 BIRC3 ITGA2 MAX MYC NFKB1 NFKBIA RB1 RELA TRAF1 CCNE2 |
| 4.55E-13 | 15 | 133 | Measles | CDK6 IRF9 BBC3 IFNGR1 FAS IL2RB JAK2 NFKB1 NFKBIA RELA STAT1 STAT5A TNFAIP3 TNFSF10 CCNE2 |
| 1.02E-12 | 15 | 143 | Hepatitis B | CDK6 CDKN1A DDB2 EP300 FAS SMAD3 MYC NFKB1 NFKBIA PCNA RB1 RELA STAT1 STAT5A CCNE2 |
| 6.51E-12 | 13 | 107 | Th17 cell differentiation | AHR IFNGR1 IL2RB IRF4 JAK2 SMAD3 NFKB1 NFKBIA NFKBIE RELA RORA STAT1 STAT5A |
| 1.54E-11 | 11 | 68 | P53 signaling pathway | CDK6 CDKN1A DDB2 BBC3 FAS RRM2B PPM1D CCNG1 CCNG2 CCNE2 TP53I3 |
| 2.61E-11 | 15 | 185 | Kaposi's sarcoma-associated herpesvirus infection | CDK6 CDKN1A IRF9 CCR1 EP300 IFNGR1 FAS JAK2 MYC NFKB1 NFKBIA PLCG2 RB1 RELA STAT1 |
| 2.80E-11 | 13 | 124 | Cell cycle | CDK6 CDKN1A CDKN1C EP300 SMAD3 MYC PCNA PLK1 RB1 TFDP2 CCNE2 PTTG1 CDK1 |
| 4.92E-11 | 11 | 78 | Chronic myeloid leukemia | CDK6 CDKN1A DDB2 SMAD3 MYC NFKB1 NFKBIA PTPN11 RB1 RELA STAT5A |
| 8.20E-11 | 12 | 108 | TNF signaling pathway | BIRC3 FAS CXCL10 JUNB NFKB1 NFKBIA RELA CCL20 TNFAIP3 TRAF1 VEGFC CFLAR |
| 2.90E-10 | 11 | 93 | NF-kappa B signaling pathway | BIRC3 NFKB1 NFKB2 NFKBIA PLAU PLCG2 RELA CCL21 TNFAIP3 TRAF1 CFLAR |
| 6.78E-10 | 14 | 201 | Viral carcinogenesis | CDK6 CDKN1A IRF9 EP300 NFKB1 NFKB2 NFKBIA RANBP1 RB1 RELA STAT5A TRAF1 CCNE2 CDK1 |
| 1.38E-09 | 15 | 255 | HTLV-I infection | CDKN1A EP300 IL2RB SMAD3 MYC NFKB1 NFKB2 NFKBIA PCNA RANBP1 RB1 RELA STAT5A TRRAP PTTG1 |
| 2.41E-09 | 13 | 184 | Transcriptional misregulation in cancer | CDKN1A DDB2 DDIT3 DUSP6 BIRC3 IL2RB MAX MYC NFKB1 PLAU RELA TRAF1 NR4A3 |
| 5.91E-09 | 13 | 199 | Epstein-Barr virus infection | CDKN1A EP300 MYC NFKB1 NFKB2 NFKBIA NFKBIE PLCG2 RB1 RELA TNFAIP3 TRAF1 CDK1 |
| 8.00E-09 | 10 | 99 | AGE-RAGE signaling pathway in diabetic complications | COL4A2 FN1 JAK2 SMAD3 NFKB1 PLCG2 RELA STAT1 STAT5A VEGFC |
| 1.22E-08 | 12 | 173 | Influenza A | IRF9 EP300 IFNGR1 FAS CXCL10 JAK2 NFKB1 NFKBIA RELA STAT1 CASP1 TNFSF10 |
| 2.33E-08 | 12 | 184 | Herpes simplex infection | IRF9 EP300 IFNGR1 FAS JAK2 NFKB1 NFKBIA PTPN11 RELA STAT1 TRAF1 CDK1 |
| 5.65E-08 | 12 | 200 | Proteoglycans in cancer | CDKN1A HBEGF FGFR1 FN1 FAS ITGA2 MYC PLAU PLCG2 PTPN6 PTPN11 ROCK1 |
| 5.96E-08 | 15 | 350 | PI3K-Akt signaling pathway | CDK6 CDKN1A COL4A2 COL6A1 FGFR1 FN1 IL2RB ITGA2 JAK2 MYC NFKB1 RELA SGK1 VEGFC CCNE2 |
| 5.96E-08 | 11 | 163 | Necroptosis | IRF9 BIRC3 IFNGR1 FAS JAK2 STAT1 STAT5A TNFAIP3 CASP1 TNFSF10 CFLAR |
| 5.96E-08 | 11 | 162 | Jak-STAT signaling pathway | CDKN1A IRF9 EP300 IFNGR1 IL2RB JAK2 MYC PTPN6 PTPN11 STAT1 STAT5A |
| 5.96E-08 | 9 | 92 | Th1 and Th2 cell differentiation | IFNGR1 IL2RB JAK2 NFKB1 NFKBIA NFKBIE RELA STAT1 STAT5A |
| 7.99E-08 | 9 | 97 | Prostate cancer | CDKN1A EP300 FGFR1 NFKB1 NFKBIA PLAU RB1 RELA CCNE2 |
| 9.44E-08 | 8 | 70 | B cell receptor signaling pathway | NFKB1 NFKBIA NFKBIE PLCG2 PTPN6 RELA IFITM1 CD79A |
| 1.17E-07 | 10 | 137 | Apoptosis | DDIT3 BBC3 BIRC3 FAS NFKB1 NFKBIA RELA TRAF1 TNFSF10 CFLAR |
| 1.19E-07 | 14 | 319 | Human papillomavirus infection | CDK6 CDKN1A IRF9 COL4A2 COL6A1 EP300 FN1 FAS ITGA2 NFKB1 RB1 RELA STAT1 CCNE2 |
| 1.46E-07 | 8 | 75 | Pancreatic cancer | CDK6 CDKN1A DDB2 SMAD3 NFKB1 RB1 RELA STAT1 |
| 2.49E-07 | 10 | 150 | MicroRNAs in cancer | CDK6 CDKN1A EP300 MYC NFKB1 PLAU PLCG2 ROCK1 CCNG1 CCNE2 |
| 4.18E-07 | 10 | 159 | Cellular senescence | RAD50 CDK6 CDKN1A SMAD3 MYC NFKB1 RB1 RELA CCNE2 CDK1 |


Figure 5. Pathways in cancer from KEGG with query genes highlighted in red.


Figure 6. The gene list is enriched with genes belonging to the P53 signaling pathway.


Table 8. Enriched TF target genes.

| FDR | Genes in list | Total genes | Functional Category | Genes (first 10) |
|---|---|---|---|---|
| 7.08E-21 | 27 | 279 | Tftargets:TF Target TP53 | BIRC3 FAS CASP1 CCNG1 CDKN1A CLK1 DDB2 JAK2 LGALS3 MSH2 |
| 3.73E-19 | 26 | 298 | TRRUST:TF Target RELA | ADORA3 BIRC3 BTG2 CCL20 CCL3 CCL4 CD80 CD83 CDK6 CDKN1A |
| 1.22E-18 | 41 | 1035 | RegNetwork:TF Target TP53 | CDKN1A CCNG1 TP53I3 RRM2B JAK2 FAS DDB2 BTG2 XPC TP53TG1 |
| 3.78E-18 | 25 | 300 | TRRUST:TF Target NFKB1 | ADORA3 BIRC3 BTG2 CCL20 CCL3 CCL4 CD80 CD83 CDKN1A CFLAR |
| 4.05E-18 | 19 | 131 | TFactS Rel/NF-kappaB target genes | FN1 FAS IRF4 JUNB LGALS3 MYC NFKB1 NFKB2 NFKBIA PLAU |
| 1.88E-16 | 19 | 161 | TRRUST:TF Target TP53 | BBC3 BTG2 CASP1 CDK1 CDKN1A DDB2 FAS GDF15 MYC NFKB1 |
| 4.36E-16 | 26 | 415 | TFactS SP1 | CDKN1A CDKN1C COL4A2 COL6A1 DKC1 HBEGF AHR FN1 FAS IL2RB |
| 4.51E-14 | 22 | 328 | Tftargets:TF Target SP1 | FAS TNFRSF8 CDK1 CDKN1A CDKN1C COL4A2 DKC1 FN1 IFNGR1 IL2RB |
| 1.81E-13 | 16 | 143 | TFactS TP53 | CDKN1A DDB2 BBC3 FAS MSH2 MYC RRM2B PCNA PLK1 RB1 |
| 1.84E-13 | 29 | 704 | RegNetwork:TF Target RELA | NFKB2 TRAF1 SMAD3 NFKBIA NFKB1 CXCL10 BIRC3 VEGFC TSC22D3 TANK |
| 1.84E-13 | 15 | 118 | RegNetwork:TF Target BRCA1 | PCNA JAK2 DDB2 CDK1 EP300 JUNB MED21 MYC NFKB1 RB1 |
| 2.09E-13 | 18 | 208 | Tftargets:TF Target NFKB1 | BIRC3 FAS CD80 CD79A FN1 CXCL10 IRF4 JUNB MYC NFKB1 |
| 4.95E-13 | 33 | 983 | RegNetwork:TF Target NFKB1 | TRAF1 VEGFC SMAD3 PLAU NFKBIA NFKB2 NFKB1 JUNB CXCL10 CD83 |
| 8.79E-13 | 21 | 345 | RegNetwork:TF Target AR | SMAD3 RELA POU5F1 PLAU NFKBIA NCOA3 MYC KRT19 FGFR1 CDKN1A |
| 3.69E-12 | 23 | 467 | TRRUST:TF Target SP1 | AHR CD83 CDK6 CDKN1A DDB2 FAS FGFR1 HBEGF IFNGR1 IL2RB |
| 5.94E-11 | 27 | 769 | Tftargets:TF Target c-Myc | TNFRSF8 CDK6 COL4A2 CRAT DDIT3 DKC1 DPYSL3 ENO1 IFNGR1 JUNB |
| 8.53E-11 | 21 | 441 | Tftargets:TF Target NFKB | TGIF1 NFKBIA PTTG1 ARL2BP BIRC3 RFX5 TNFAIP3 JUNB STAT5A MSC |
| 3.47E-10 | 41 | 1915 | RegNetwork:TF Target JUN | KRT8 CDKN1A ZNF274 VEGFC TRAF1 TNFSF10 TNFRSF8 TNFAIP3 TGIF1 STAT1 |
| 2.22E-09 | 20 | 472 | RegNetwork:TF Target NFKB2 | TRAF1 NFKB2 NFKB1 TANK STAT5A SMAD3 ROCK1 RELA RANBP1 POU3F4 |
| 4.16E-09 | 11 | 103 | TFactS GLI2 | CDKN1A IRF9 CLK1 IFNGR1 JUNB MYC PCNA IFITM1 CFLAR CCNG2 |
| 4.48E-09 | 12 | 135 | TFactS NFKB1 | FN1 FAS CXCL10 IRF4 MYC RRM2B PLAU CCL3 CCL4 TRAF1 |
| 4.48E-09 | 31 | 1239 | RegNetwork:TF Target FOS | KRT8 VEGFC TRAF1 TNFAIP3 TGIF1 RABGAP1 PTN NR4A3 LGALS3 KLF4 |
| 5.52E-09 | 9 | 57 | TRRUST:TF Target BRCA1 | CDKN1A DDB2 DDIT3 EP300 IRF9 JAK2 MYC STAT1 XPC |
| 1.39E-08 | 10 | 88 | Tftargets:TF Target RELA | BIRC3 FN1 CXCL10 IRF4 MYC NFKB1 NFKB2 NFKBIA CCL3 CCL20 |
| 2.28E-08 | 9 | 67 | RegNetwork:TF Target IKBKB | CFLAR NCOA3 NFKB1 NFKB2 NFKBIA NFKBIE RELA TANK TNFAIP3 |
| 2.55E-08 | 12 | 159 | TFactS FOXO1 | CDKN1A CDKN1C COL4A2 BIRC3 KRT19 RELA TRAF1 TNFSF10 CFLAR CCNG2 |
| 2.64E-08 | 12 | 160 | TFactS E2F1 | CDKN1A CDKN1C DDB2 FGFR1 BBC3 MYC PCNA PLK1 RB1 NCOA3 |
| 3.41E-08 | 52 | 3378 | RegNetwork:TF Target SP1 | NFKB2 SMAD3 SIPA1 RORA RELA RABGAP1 POU5F1 NFE2L1 KRT19 IFNGR1 |
| 3.77E-08 | 7 | 31 | TRRUST:TF Target RB1 | CDK1 CDKN1A DDIT3 FGFR1 MYC PCNA RB1 |
| 4.17E-08 | 15 | 297 | Tftargets:TF Target RELA | CXCL10 STAT5A NR4A3 CCL20 NFKBIA PTTG1 TNFAIP3 ARL2BP NFKB2 BTG2 |

Figure 7. Enriched TF target genes presented as a tree.

Table 9. Overrepresented TF binding motifs in the promoters of the 147 genes. Note that the NFKB2 and TFDP2 genes themselves are included in this list.

Enriched motif in promoter | TF | TF family | P val. | FDR | Score | Note
GGGGGGGGGCC | PATZ1 | C2H2 ZF | 4.31E-05 | 0.006683 | 20 |
GGGATTTCC | REL | Rel | 0.000224 | 0.017361 | 15 |
GGGCGTG | KLF7 | C2H2 ZF | 0.000639 | 0.022875 | 11 |
GGCGGGAA | E2F4 | E2F | 0.000732 | 0.022875 | 14 |
GTGGGCGTGGC | SP6 | C2H2 ZF | 0.000738 | 0.022875 | 15 |
TGCGGG | ZBTB1 | C2H2 ZF | 0.00095 | 0.02453 | 8 |
CACGTG | TCFL5 | bHLH | 0.001136 | 0.025163 | 9 |
GGGGGCGGGGC | SP2 | C2H2 ZF | 0.001484 | 0.028149 | 21 |
GGGGGGT | ZIC5 | C2H2 ZF | 0.001634 | 0.028149 | 8 |
GGGCGGGAA | E2F6 | E2F | 0.001818 | 0.028176 | 12 |
GGGGATTCCCC | NFKB2 | Rel | 0.003435 | 0.048406 | 15 | * Query
GGCCGGAG | MBD2 | MBD | 0.006025 | 0.075734 | 12 |
CCCGCATACAACGAA | CENPB | CENPB | 0.006352 | 0.075734 | 10 |
CTGACTCAT | FOSB | bZIP | 0.00816 | 0.089428 | 10 |
CACAGCGGGGGGTC | ZIC4 | C2H2 ZF | 0.009188 | 0.089428 | 11 |
GGAAGTGC | ZSCAN10 | C2H2 ZF | 0.009231 | 0.089428 | 7 |
GTGGGCGTGG | SP8 | C2H2 ZF | 0.01048 | 0.090955 | 11 |
TGTCAGGGGGC | INSM1 | C2H2 ZF | 0.010562 | 0.090955 | 9 |
GGGATTTCCCA | HIVEP1 | C2H2 ZF | 0.018106 | 0.1447 | 11 |
CTTCCGGGTC | NR2C2 | Nuclear receptor | 0.018671 | 0.1447 | 9 |
GCCAATCA | PBX3 | Homeodomain | 0.023319 | 0.16396 | 10 |
CAATAGCGGTGGTG | ZBTB4 | C2H2 ZF | 0.023698 | 0.16396 | 7 |
TGGGCA | HIC2 | C2H2 ZF | 0.02433 | 0.16396 | 5 |
CG | KDM2B | CxxC | 0.025573 | 0.165161 | 2 |
AAATGGCGGGAAA | TFDP2 | DP,E2F | 0.027874 | 0.172819 | 8 | * Query
AGCATGACTCAT | BACH1 | bZIP | 0.029371 | 0.175096 | 11 |
TGACTCAGCA | NFE2 | bZIP | 0.032016 | 0.183797 | 9 |
GATGACTCA | BACH2 | bZIP | 0.035729 | 0.197785 | 8 |

Table 10. Overrepresented microRNA target gene-sets.

FDR | Genes in list | Total genes | Functional Category | Genes
9.23E-06 | 13 | 235 | MiRTarBase:miRNA Target hsa-miR-145-5p | KLF4 CDKN1A STAT1 MYC POU5F1 PPM1D SMAD3 CDK6 PLAGL2 ROCK1 BTG1 DUSP6 HBEGF
5.99E-05 | 18 | 608 | MiRTarBase:miRNA Target hsa-miR-21-5p | PLOD3 CDK6 BTG2 FAS TGIF1 NCOA3 MYC TGFBR3 MSH2 TNFAIP3 CCR1 CCL20 NFKB1 TCF21 RB1 RABGAP1 CCNG1 CXCL10
5.99E-05 | 7 | 56 | MiRTarBase hsa-miR-21 | MSH2 MYC TGFBR3 TGIF1 BTG2 NCOA3 PLOD3
1.83E-04 | 8 | 107 | MiRTarBase:miRNA Target hsa-miR-504-5p | FAS BBC3 TP53I3 RABGAP1 CDK6 PLCG2 RORA NDUFB5
1.83E-04 | 5 | 24 | MiRTarBase hsa-miR-145 | CDKN1A MYC POU5F1 STAT1 KLF4
3.05E-04 | 4 | 13 | MiRTarBase hsa-miR-222 | CDKN1C BBC3 STAT5A TNFSF10
3.05E-04 | 11 | 264 | MiRTarBase:miRNA Target hsa-miR-29a-3p | CDK6 COL4A2 PPM1D TNFAIP3 CCT4 KLF4 BTG2 SGK1 BBC3 AHR MYC
3.05E-04 | 11 | 261 | MiRTarBase:miRNA Target hsa-miR-18a-5p | NCOA3 TSC22D3 SMAD3 TLE3 COX7B PCNA ITGA2 STK4 FAS RORA TNFAIP3
4.61E-04 | 8 | 132 | MiRTarBase:miRNA Target hsa-miR-6507-5p | CDK1 BTG2 ROCK1 SGK1 TNFAIP3 RANBP1 TGIF1 STK4
4.61E-04 | 13 | 403 | RegNetwork:miRNA Target hsa-miR-21 | BTG2 CCL20 CCR1 CDK6 CDKN1A FAS MSH2 MYC NCOA3 PLOD3 TGFBR3 TGIF1 TNFAIP3
5.11E-04 | 30 | 1857 | MiRTarBase:miRNA Target hsa-miR-26b-5p | EP300 CDK6 ZNF141 TRAF1 PPM1D BTG1 CCNI NDUFB5 FN1 AKAP5 TP53I3 NR4A3 KLRC2 NFKBIE COL4A2 BTG2 ZNF85 TNFAIP3 DDB2 DPYSL3 CCT7 BATF KRT8 NFKB1 SIPA1 PCNA IRF4 RB1 ITGA2 TNFSF10
1.20E-03 | 9 | 204 | RegNetwork:miRNA Target hsa-miR-146a | CCNE2 CDKN1A FAS NFKB1 NR4A3 ROCK1 STAT1 TGIF1 TNFAIP8
1.28E-03 | 4 | 21 | MiRTarBase hsa-miR-146a | CDKN1A FAS NFKB1 ROCK1
1.81E-03 | 17 | 779 | MiRTarBase:miRNA Target hsa-miR-98-5p | MYC NCOA3 BIRC3 TRAF1 CDKN1A AHR FAS NFKB2 IFNGR1 TNFSF10 CCNF HBEGF PLAGL2 STK4 PLCG2 CD59 TGFBR3
2.19E-03 | 8 | 173 | MiRTarBase:miRNA Target hsa-miR-3665 | TRAF1 ZNF85 BTG2 CDKN1A FGFR1 GDF15 PLAGL2 PLK1
3.41E-03 | 3 | 10 | MiRTarBase hsa-miR-34b* | CDK6 MYC CCNE2
4.16E-03 | 17 | 849 | MiRTarBase:miRNA Target hsa-miR-24-3p | MYC UBE2C NFKBIA CDK1 TNFAIP3 PLAGL2 PCNA RANBP1 CCNG1 TFDP2 NFE2L1 AKAP5 PPM1D STK4 CCL3 CCL4 BBC3
4.16E-03 | 3 | 11 | RegNetwork:miRNA Target hsa-miR-34b* | CCNE2 CDK6 MYC
4.43E-03 | 14 | 608 | MiRTarBase:miRNA Target hsa-let-7e-5p | COL6A1 CCNG1 TRRAP STK4 NCOA3 CDKN1A MYC AHR PLAGL2 PLCG2 CD59 TGFBR3 PLK1 TNFAIP3
4.50E-03 | 8 | 201 | MiRTarBase:miRNA Target hsa-miR-146a-5p | ROCK1 NFKB1 CDKN1A FAS STAT1 IFITM1 CD80 MSC
4.50E-03 | 9 | 260 | MiRTarBase:miRNA Target hsa-miR-29b-3p | COL4A2 CDK6 NCOA3 GRN BBC3 BTG2 SGK1 TNFAIP3 MYC
4.50E-03 | 3 | 12 | MiRTarBase hsa-miR-503 | CDKN1A CCNF CCNE2
4.53E-03 | 7 | 150 | MiRTarBase hsa-miR-124 | CDK6 TSC22D3 AHR RELA CD164 PLOD3 CD59
5.46E-03 | 10 | 337 | MiRTarBase:miRNA Target hsa-let-7g-5p | MYC FN1 STK4 NCOA3 CDKN1A AHR PLAGL2 PLCG2 CD59 TGFBR3
5.46E-03 | 14 | 635 | MiRTarBase:miRNA Target hsa-let-7a-5p | CDK6 MYC NFKB1 CDKN1A TRRAP CCNG1 STK4 NCOA3 AHR PLAGL2 PLCG2 CD59 TGFBR3 TNFAIP3
6.01E-03 | 7 | 160 | MiRTarBase:miRNA Target hsa-miR-3919 | ITGA2 MYC SGK1 IFNGR1 CD59 BTG1 RELA
6.26E-03 | 7 | 162 | RegNetwork:miRNA Target hsa-miR-518c | CARTPT GRN NDUFB5 NFE2L1 PTN RB1 TFDP2
6.70E-03 | 4 | 38 | MiRTarBase:miRNA Target hsa-miR-4750-3p | CDK6 RORA CDK1 XPC
7.52E-03 | 3 | 16 | MiRTarBase hsa-miR-221 | CDKN1C BBC3 TNFSF10
7.52E-03 | 12 | 512 | MiRTarBase:miRNA Target hsa-let-7c-5p | MYC CCNF TRRAP CCNG1 STK4 NCOA3 CDKN1A AHR PLAGL2 PLCG2 CD59 TGFBR3
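
As stated in the Methods, enrichment is based on the hypergeometric distribution followed by FDR correction. The sketch below illustrates how the "Genes in list" and "Total genes" columns could feed into such a test. It is not ShinyGO's actual code: the background size of 20,000 genes, the query size of 147, the use of the Benjamini-Hochberg procedure, and the function names are all assumptions made for illustration, so the output will not reproduce the FDR values in the table.

```python
from math import comb


def hypergeom_tail(k, K, n, N):
    """P(X >= k): chance of at least k gene-set members in the query.

    k: query genes in the gene-set, K: gene-set size,
    n: query size, N: background size.
    """
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# (genes in list, total genes) from the first two rows of Table 10.
# N_BACKGROUND and QUERY_SIZE are assumed values, not taken from ShinyGO.
N_BACKGROUND = 20000
QUERY_SIZE = 147
rows = [(13, 235), (18, 608)]
pvals = [hypergeom_tail(k, K, QUERY_SIZE, N_BACKGROUND) for k, K in rows]
print(benjamini_hochberg(pvals))
```

In a real analysis the correction would be applied across every gene-set tested, not only the two rows used here.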

Figure 8. Overrepresented miRNA target genes. The most significant are two gene-sets for miR-145 and three gene-sets for miR-21.

Figure 9. Gene characteristics plots. A) Distribution of the 147 genes on the chromosomes. B) Number of exons. C) Type of genes. D) Number of transcripts per gene. Chi-squared tests are used to compare the query genes with the other genes in the genome.
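
The chi-squared comparison mentioned in the caption can be sketched as a Pearson test on a contingency table with one row for query genes and one for all other genes. The counts below are hypothetical and the function name is an assumption; the exact table layout used by ShinyGO is not given in this document.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for a count table.

    Assumes every row and column has a non-zero total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical counts of genes with 1-5, 6-10, and more than 10 exons.
query_counts = [60, 55, 32]
other_counts = [9000, 6500, 4300]
stat, dof = chi_squared_statistic([query_counts, other_counts])
print(stat, dof)
# The p-value is the upper tail of a chi-squared distribution with `dof`
# degrees of freedom, for example scipy.stats.chi2.sf(stat, dof) if SciPy
# is available.
```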

Figure 10. Distribution of gene feature lengths. Gene lengths are log-transformed. T-tests are conducted to compare the query genes with the other protein-coding genes in the genome. The 147 genes have slightly longer 5' UTRs than other genes.
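
A minimal sketch of the comparison described in the caption follows. The caption does not say which t-test variant or which logarithm base is used, so Welch's unequal-variance test and log10 are assumptions here, and the lengths are made-up values.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na  # squared standard error of mean A
    vb = variance(sample_b) / nb  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, dof


# Hypothetical 5' UTR lengths in bp, log10-transformed before testing.
query_lengths = [310, 420, 150, 980, 265, 530]
other_lengths = [120, 240, 95, 400, 180, 310, 75, 220]
t, dof = welch_t(
    [math.log10(x) for x in query_lengths],
    [math.log10(x) for x in other_lengths],
)
print(t, dof)
```

A positive t statistic indicates that the first sample has the larger mean on the log scale.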

Figure 11. Display of query genes on the genome. Only part of the plot covering chromosomes 19-22 and X is shown.

Table 11. Enriched Pfam domains in the query genes.

adj.Pval | nGenes | Pathways
6.10E-08 | 5 | Cyclin, N-terminal domain
1.60E-05 | 5 | Helix-loop-helix DNA-binding domain
3.50E-04 | 3 | bZIP transcription factor
5.80E-04 | 2 | Rel homology domain (RHD)
7.00E-04 | 3 | Small cytokines (intecrine/chemokine), interleukin-8 like
7.80E-04 | 2 | BTG family
1.70E-03 | 2 | TCP-1/cpn60 chaperonin family
2.00E-03 | 2 | Cyclin, C-terminal domain
2.40E-03 | 2 | Protein-tyrosine phosphatase
2.80E-03 | 2 | Ankyrin repeat

Figure 12. Protein-protein interaction network among query genes. This interactive, richly annotated network is accessible on the STRING website through a custom link generated by the API.
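
One way such a link might be requested is sketched below. It only builds the request URL and makes no network call. The endpoint and parameter names follow our understanding of the public STRING API (`get_link`, `identifiers`, `species`, `caller_identity`) and should be checked against the current STRING documentation; the function name and caller string are hypothetical, and the request ShinyGO itself sends is not shown in this document.

```python
from urllib.parse import urlencode


def string_link_request(genes, species=9606, caller="example_script"):
    """Build a STRING API URL that asks for a link to the network page.

    Assumed endpoint: /api/tsv-no-header/get_link, with identifiers
    separated by carriage returns (encoded as %0D).
    """
    params = {
        "identifiers": "\r".join(genes),
        "species": species,  # 9606 is the NCBI taxonomy ID for human
        "caller_identity": caller,
    }
    return "https://string-db.org/api/tsv-no-header/get_link?" + urlencode(params)


# A few of the query genes that appear in the tables above.
url = string_link_request(["TP53", "CDKN1A", "NFKB1", "MYC"])
print(url)
```

Fetching this URL (for example with `urllib.request.urlopen`) is expected to return the address of the interactive network page.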

References:
1. Kuleshov MV, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, Koplev S, Jenkins SL, Jagodnik KM, Lachmann A, et al: Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res 2016, 44:W90-97.
2. Jung D, Ge SX: PPInfer: a Bioconductor package for inferring functionally related proteins using protein interaction networks. F1000Research 2018, 6:1969.
3. Eden E, Navon R, Steinfeld I, Lipson D, Yakhini Z: GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics 2009, 10:48.
4. Weirauch MT, Yang A, Albu M, Cote AG, Montenegro-Montero A, Drewe P, Najafabadi HS, Lambert SA, Mann I, Cook K, et al: Determination and inference of eukaryotic transcription factor sequence specificity. Cell 2014, 158:1431-1443.
5. Yi X, Du Z, Su Z: PlantGSEA: a gene set enrichment analysis toolkit for plant community. Nucleic Acids Res 2013, 41:W98-103.
6. Tsai MH, Chen X, Chandramouli GV, Chen Y, Yan H, Zhao S, Keng P, Liber HL, Coleman CN, Mitchell JB, Chuang EY: Transcriptional responses to ionizing radiation reveal that p53R2 protects against radiation-induced mutagenesis in human lymphoblastoid cells. Oncogene 2006, 25:622-632.
7. Mi H, Muruganujan A, Casagrande JT, Thomas PD: Large-scale gene function analysis with the PANTHER classification system. Nat Protoc 2013, 8:1551-1566.
8. Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al: STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 2015, 43:D447-452.
9. Liberzon A, Birger C, Thorvaldsdottir H, Ghandi M, Mesirov JP, Tamayo P: The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst 2015, 1:417-425.
10. Huang da W, Sherman BT, Lempicki RA: Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 2009, 4:44-57.
11. MSigDB website [http://software.broadinstitute.org/gsea/msigdb/cards/TSAI_RESPONSE_TO_IONIZING_RADIATION.html]
12. Reimand J, Arak T, Adler P, Kolberg L, Reisberg S, Peterson H, Vilo J: g:Profiler-a web server for functional interpretation of gene lists (2016 update). Nucleic Acids Res 2016, 44:W83-89.
13. Mikkelsen RB, Wardman P: Biological chemistry of reactive oxygen and nitrogen and radiation-induced signal transduction mechanisms. Oncogene 2003, 22:5734-5754.
14. Maier P, Hartmann L, Wenz F, Herskind C: Cellular Pathways in Response to Ionizing Radiation and Their Targetability for Tumor Radiosensitization. Int J Mol Sci 2016, 17.
15. Zhang Q, Lenardo MJ, Baltimore D: 30 Years of NF-kappaB: A Blossoming of Relevance to Human Pathobiology. Cell 2017, 168:37-57.
16. Ando T, Kawabe T, Ohara H, Ducommun B, Itoh M, Okamoto T: Involvement of the interaction between p21 and proliferating cell nuclear antigen for the maintenance of G2/M arrest after DNA damage. J Biol Chem 2001, 276:42971-42977.
17. Liu ZP, Wu C, Miao H, Wu H: RegNetwork: an integrated database of transcriptional and post-transcriptional regulatory networks in human and mouse. Database (Oxford) 2015, 2015.
18. Friard O, Re A, Taverna D, De Bortoli M, Cora D: CircuitsDB: a database of mixed microRNA/transcription factor feed-forward regulatory circuits in human and mouse. BMC Bioinformatics 2010, 11:435.
19. Jiang C, Xuan Z, Zhao F, Zhang MQ: TRED: a transcriptional regulatory element database, new entries and other development. Nucleic Acids Res 2007, 35:D137-140.
20. Consortium EP: An integrated encyclopedia of DNA elements in the human genome. Nature 2012, 489:57-74.
21. Han H, Shim H, Shin D, Shim JE, Ko Y, Shin J, Kim H, Cho A, Kim E, Lee T, et al: TRRUST: a reference database of human transcriptional regulatory interactions. Sci Rep 2015, 5:11432.
22. Suzuki HI, Yamagata K, Sugimoto K, Iwamoto T, Kato S, Miyazono K: Modulation of microRNA processing by p53. Nature 2009, 460:529-533.

23. Yu Y, Wang Y, Ren X, Tsuyada A, Li A, Liu LJ, Wang SE: Context-dependent bidirectional regulation of the MutS homolog 2 by transforming growth factor beta contributes to chemoresistance in breast cancer cells. Mol Cancer Res 2010, 8:1633-1642.
24. Wan G, Mathur R, Hu X, Zhang X, Lu X: miRNA response to DNA damage. Trends Biochem Sci 2011, 36:478-484.
25. Ghose J, Sinha M, Das E, Jana NR, Bhattacharyya NP: Regulation of miR-146a by RelA/NFkB and p53 in STHdh(Q111)/Hdh(Q111) cells, a cell model of Huntington's disease. PLoS One 2011, 6:e23837.
26. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000, 25:25-29.
27. Kanehisa M, Furumichi M, Tanabe M, Sato Y, Morishima K: KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res 2017, 45:D353-D361.
28. Araki H, Knapp C, Tsai P, Print C: GeneSetDB: A comprehensive meta-database, statistical and visualisation framework for gene set analysis. FEBS Open Bio 2012, 2:76-82.
29. Zheng G, Tu K, Yang Q, Xiong Y, Wei C, Xie L, Zhu Y, Li Y: ITFP: an integrated platform of mammalian transcription factors. Bioinformatics 2008, 24:2416-2417.
30. Neph S, Stergachis AB, Reynolds A, Sandstrom R, Borenstein E, Stamatoyannopoulos JA: Circuitry and dynamics of human transcription factor regulatory networks. Cell 2012, 150:1274-1286.
31. Marbach D, Lamparter D, Quon G, Kellis M, Kutalik Z, Bergmann S: Tissue-specific regulatory circuits reveal variable modular perturbations across complex diseases. Nat Methods 2016, 13:366-370.
32. Wong N, Wang X: miRDB: an online resource for microRNA target prediction and functional annotations. Nucleic Acids Res 2015, 43:D146-152.
33. Agarwal V, Bell GW, Nam JW, Bartel DP: Predicting effective microRNA target sites in mammalian mRNAs. Elife 2015, 4.
34. Chou CH, Shrestha S, Yang CD, Chang NW, Lin YL, Liao KW, Huang WC, Sun TH, Tu SJ, Lee WH, et al: miRTarBase update 2018: a resource for experimentally validated microRNA-target interactions. Nucleic Acids Res 2018, 46:D296-D302.