
Gene Signatures and Knowledge-Guided Gene Set Characterization Lab

KnowEnG Center

Signatures and Knowledge-Guided Characterization | KnowEnG Center

PowerPoint by Charles Blatti

Introduction

The goals of this lab are as follows:

1. Define a novel gene expression signature based on estrogen receptor status in TCGA samples using the integrative LINCS (iLINCS) data portal, and identify other similar known gene expression signatures.

2. Use networks of prior knowledge to identify pathways, additional genes, and other annotations that relate to the gene set of our novel signature using SPIA, GeneMANIA, and KnowEnG’s DRaWR.


Step 0: Download and Extract Data Files

For viewing and manipulating the files needed for this laboratory exercise, download the following archive:

http://publish.illinois.edu/computational-genomics-course/files/2019/06/07_Signatures_and_Characterization.zip

Right-click the archive and extract its contents to your course directory. We will use the files found in:

[course_directory]/07_Signatures_and_Characterization/


Creating a Novel Gene Expression Signature

In this exercise, we will use the iLINCS data portal to extract gene expression data from TCGA BRCA samples and build a gene signature based on estrogen receptor status.


Step 1: Perturbagen and Disease Datasets

Open your web browser and go to the iLINCS data portal: http://www.ilincs.org/ilincs/

This portal, curated by the LINCS Data Coordination and Integration Center, contains transcriptomic and proteomic datasets from the many LINCS affiliated projects, including the LINCS L1000 assay. It also contains several other large public datasets of perturbations to cell lines and samples of disease.

We will define a custom gene signature from TCGA data and see how it relates to the library of signatures generated from the LINCS L1000 project.


Step 2: Select Breast Cancer Dataset

Click on “Datasets” in the options along the top

Select the “All Datasets” tab

Click the “Choose” button for TCGA datasets

Find “919 mRNA-seq breast invasive carcinoma (BRCA) samples from TCGA project” by Collins, et al. Click “Analyze”.


Step 3A: Creating a Novel Gene Signature

Click on “Create a Signature”

In the “Select grouping variable” dropdown, select “breast_carcinoma_estrogen_receptor_status”

In the “Select group 1” dropdown, select “Negative”

In the “Select group 2” dropdown, select “Positive”

Finally, click on the “Create Signature” button
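As a rough illustration of what creating a signature from two groups involves, the sketch below computes, for each gene, the difference in mean log expression between the groups and a Welch t statistic. This is not the iLINCS implementation: the choice of Welch's t-test, the direction of the comparison (second group minus first), the function names, and the toy expression values are all assumptions made for illustration.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def make_signature(expr, groups, group1, group2):
    """Return {gene: (mean difference, Welch t)} for group2 minus group1.

    expr   : {gene: [log-scale expression value per sample]}
    groups : [group label per sample], in the same order as the values
    """
    idx1 = [i for i, g in enumerate(groups) if g == group1]
    idx2 = [i for i, g in enumerate(groups) if g == group2]
    signature = {}
    for gene, values in expr.items():
        in_group2 = [values[i] for i in idx2]
        in_group1 = [values[i] for i in idx1]
        signature[gene] = (
            mean(in_group2) - mean(in_group1),
            welch_t(in_group2, in_group1),
        )
    return signature


# Toy data, made up for illustration (not TCGA values).
groups = ["Negative", "Negative", "Negative", "Positive", "Positive", "Positive"]
expr = {
    "ESR1": [1.0, 1.5, 0.5, 6.0, 6.5, 5.5],
    "GAPDH": [8.0, 8.2, 7.8, 8.1, 7.9, 8.0],
}
signature = make_signature(expr, groups, "Negative", "Positive")
```

In this toy example ESR1 receives a large positive difference and t statistic, while the housekeeping-like gene shows essentially no difference between the groups.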


Step 3B: Our ER Status Gene Signature

When the signature is calculated, a quick summary of the number of samples from each group is presented.

Next, we will look more closely at the genes involved in our signature.

Step 4A: Examining Gene Expression of our Signature

To get statistics about how the signature is defined, we will select “Modify the list of selected genes”

We are presented with a volcano plot showing the log fold change (x-axis) and differential expression significance (y-axis) of each gene.

Thresholds on both of these criteria define the genes selected for the signature.

Slide the differential expression range to the values -3 and 3 so that there are only about 100 genes in our ER status signature. Click “Analyze”.
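The thresholding described above can be sketched as a simple filter: keep a gene only if its log fold change lies outside the chosen range and its p-value passes a significance cutoff. The function name, the p-value cutoff of 0.05, and the toy statistics are assumptions for illustration; the portal's own defaults may differ.

```python
def select_signature_genes(stats, min_abs_logfc=3.0, max_p=0.05):
    """Keep genes outside the (-min_abs_logfc, +min_abs_logfc) band
    that also pass the p-value cutoff.

    stats : {gene: (log fold change, p-value)}
    """
    return sorted(
        gene
        for gene, (logfc, p) in stats.items()
        if abs(logfc) >= min_abs_logfc and p <= max_p
    )


# Toy values, made up for illustration.
toy_stats = {
    "ESR1": (5.2, 1e-80),
    "KRT5": (-3.4, 1e-30),
    "GAPDH": (0.1, 0.6),
    "GENE_X": (3.5, 0.2),  # large change but not significant
}
selected = select_signature_genes(toy_stats)
```

Widening the range (a larger `min_abs_logfc`) shrinks the selected gene list, which is the effect of moving the sliders to -3 and 3 in the portal.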


Step 4B: Examining Gene Expression of our Signature

Click the “Signature Data” tab and “Show selected genes” to see the list of selected signature genes

Note that ESR1, estrogen receptor 1, is the most significantly differentially expressed gene, which is consistent with the immunohistochemical staining assay result that defined the positive and negative groups.

Because the number of samples (868) is high, the differential expression p-values are very significant for these top signature genes.

Click the “Download” button to save the “Signature with only selected genes” table as an Excel file called “subsetSig_*.xls” and move it to [course_directory]/07_Signatures_and_Characterization/. We will use this later.


Finding Similar Signatures from Public Libraries

Here we will attempt to find signatures from large public collections that relate to the ER status signature we defined. These public signatures are defined using both basic methods as well as the Characteristic Direction method.


Step 5A: Finding Related L1000 Signatures

Click on the “Connected Signatures” tab and “Use Complete Signature” in the bottom half of the screen.

We will start by looking at shRNA gene knockdown signatures defined from the LINCS L1000 project, in which only 978 genes are measured.

Click the checkbox next to “LINCS consensus (CGS) gene knockdown signature”. These consensus signatures are defined by combining all of the different shRNAs, with different seeds, that target the same gene and by comparing to appropriate control experiments.


Step 5B: Finding Related L1000 Signatures

To view the similar signatures when the calculation is complete, expand the results by clicking on “992 of LINCS consensus (CGS) gene knockdown signature”.

The third result is CDK4, cell division protein kinase 4, an important regulator of cell cycle progression. Previous literature has shown that silencing of CDK4 will have a variable influence on cancer progression based on the expression of estrogen receptor. Inhibition of CDK4 increases migration and stem-like cell activity in ER negative breast cancer.


Step 5C: Finding Related L1000 Signatures

By clicking on the small graph icon in the Concordance column, we can see the correlation between our ER status signature and the LINCS L1000 signature for CDK4 knockdown.
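One simple way to think about the concordance between two signatures is as a correlation of their per-gene values over the genes they share. The sketch below uses a Pearson correlation; whether iLINCS computes its concordance score in exactly this way is an assumption, and the function name and toy values are hypothetical.

```python
import math


def concordance(sig_a, sig_b):
    """Pearson correlation of two signatures over their shared genes.

    sig_a, sig_b : {gene: differential expression value}
    """
    shared = sorted(set(sig_a) & set(sig_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared genes")
    xs = [sig_a[g] for g in shared]
    ys = [sig_b[g] for g in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Toy signatures, made up for illustration. The second one is an exact
# sign-flipped, rescaled copy of the first on the shared genes.
er_status = {"ESR1": 4.0, "GATA3": 2.0, "FOXA1": 3.0, "KRT5": -2.0}
knockdown = {"ESR1": -2.0, "GATA3": -1.0, "FOXA1": -1.5, "KRT5": 1.0, "TP53": 0.3}
r = concordance(er_status, knockdown)
```

A value near +1 indicates that the two signatures change the shared genes in the same direction, and a value near -1 (as in this toy case) indicates opposite directions. Genes measured in only one signature, such as the toy TP53 entry, are ignored.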


Step 6: Related ENCODE Signatures

Uncheck the checkbox for “LINCS consensus (CGS) gene knockdown signature” and click the checkbox next to “ENCODE Transcription Factor Binding Signatures”. Expand the results when computed.

These signatures are defined by creating gene-level scores that integrate the distance of transcription factor ChIP peaks and the likelihood that the gene is regulated in the condition using the TREG method, True REGulatory TF-gene interactions.

Two of the top five results we recover are TF signatures of estrogen receptor binding, meaning our ER status differential expression signature matches ER differential binding signatures. The other TFs in the top signatures also have known roles in mediating ER binding.


Step 7A: Finding Related Signatures Using Characteristic Direction

Click on the “Analysis Results” tab which contains many different methods for analyzing our novel ER status gene signature. We will discuss some of these next.

The public signatures so far have been defined by independently considering whether each gene is differentially expressed. The following exercise uses signatures defined with the characteristic direction method (L1000CDS²), which represents each signature as an arrow in gene expression space.


Step 7B: Finding Related Signatures Using Characteristic Direction

The L1000CDS² tool is a LINCS L1000 characteristic direction signature search engine where users can find matches to their input signature from 33K small molecule perturbagen signatures covering 62 cell lines and 4K small molecules.

We will go directly to the L1000CDS² tool by pasting this link in our browser: http://amp.pharm.mssm.edu/L1000CDS2/


Step 7C: Finding Related Signatures Using Characteristic Direction

Find, open, and copy the contents of the file [course_directory]/07_Signatures_and_Characterization/ERstatus_top100_logDiffExp.csv

This is our ER status gene signature with the pair of columns [Name_GeneSymbol, Value_LogDiffExp] extracted from our earlier Excel download.

Paste the contents of this file into the “up genes” text box on the left. Its name will change to “signature”.

In the Configuration panel,
• switch mimic to reverse small molecule signature
• make sure latest database version is selected
• uncheck the three remaining checkboxes

Click the “Search” button


Step 7D: Finding Related Signatures Using Characteristic Direction

These are the top small-molecule LINCS L1000 signatures that are most opposite to our ER status signature. The idea is that if our ER status signature represents a direction in gene expression space, these signatures of small-molecule perturbations represent the best reversal of that signature.

The top two results are unnamed Broad compounds, but the JAK2 inhibitor cucurbitacin I is known to reduce mammary tumorigenesis and metastasis by inhibiting Rac1 activity, which is frequently elevated in ER positive tumors.
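The idea of treating signatures as directions and searching for the one that best reverses ours can be sketched with cosine similarity: a value near -1 means the two arrows point in opposite directions. This is only an illustration of the concept, not the L1000CDS² implementation; the function names, the compound names `drug_A` to `drug_C`, and all values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two signatures over their shared genes."""
    shared = sorted(set(u) & set(v))
    dot = sum(u[g] * v[g] for g in shared)
    norm_u = math.sqrt(sum(u[g] ** 2 for g in shared))
    norm_v = math.sqrt(sum(v[g] ** 2 for g in shared))
    return dot / (norm_u * norm_v)


def rank_reversers(query, library):
    """Sort library signatures from most opposite to most similar."""
    scored = [(name, cosine(query, sig)) for name, sig in library.items()]
    return sorted(scored, key=lambda item: item[1])


# Toy query and library, made up for illustration.
query = {"ESR1": 1.0, "GATA3": 1.0, "KRT5": -1.0}
library = {
    "drug_A": {"ESR1": -1.0, "GATA3": -1.0, "KRT5": 1.0},  # exact reversal
    "drug_B": {"ESR1": 1.0, "GATA3": 0.5, "KRT5": -1.0},   # mimics the query
    "drug_C": {"ESR1": 1.0, "GATA3": -1.0, "KRT5": 0.0},   # unrelated direction
}
ranking = rank_reversers(query, library)
```

Searching in “reverse” mode corresponds to reading this ranking from the top (most negative similarity), while “mimic” mode would read it from the bottom.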


Step 8: Caveats about using LINCS signatures

Only 978 genes are measured in the L1000 assay. Other gene values can be imputed from the Connectivity Map dataset. However, the missing or imputed values can make signature analysis less reliable.

Also, although tens of thousands of signatures exist, most possible signatures are still missing. Tools are being developed to identify signatures by learning models on dense parts of the data cube and then learning how to correctly transfer those models to sparse regions.


Discovering Pathways Related to Our Gene Signature

In this section, we will consider some of the characterization resources that are available for gene signatures and gene sets.


Step 9: Standard Gene Set Enrichment

Back on the iLINCS tab, two of the “Analysis Results” tools that are linked to are Enrichr and DAVID.

DAVID is the enrichment tool used in the Regulatory Genomics lab.

Both tools use standard statistical enrichment tests to examine the overlap of the 100 genes of our ER status gene signature with Gene Ontology term annotations, pathways, and other gene sets.

These tools output the results in slightly different ways, so you may want to explore them in your own time.
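A standard enrichment test of this kind can be sketched with the hypergeometric distribution: given a universe of genes, an annotated gene set, and our query list, it asks how likely an overlap at least as large as the observed one would be by chance. This is a generic one-sided test, not the exact procedure used by Enrichr or DAVID (which apply their own adjustments); the function name and the example numbers are assumptions.

```python
from math import comb


def enrichment_p(universe_size, set_size, query_size, overlap):
    """One-sided hypergeometric p-value: P(X >= overlap).

    universe_size : number of genes that could have been selected
    set_size      : number of universe genes annotated to the term
    query_size    : number of genes in our signature
    overlap       : number of signature genes annotated to the term
    """
    total = comb(universe_size, query_size)
    largest = min(set_size, query_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, query_size - k)
        for k in range(overlap, largest + 1)
    )
    return tail / total


# Hypothetical numbers: 10 of our 100 signature genes fall in a
# 200-gene pathway, out of a universe of 20,000 genes.
p_value = enrichment_p(20000, 200, 100, 10)
```

With these hypothetical numbers only about one overlapping gene would be expected by chance, so an overlap of ten yields a very small p-value. In practice the p-values for all tested terms also need multiple-testing correction.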


Step 10A: Pathway Network Enrichment Test

Signaling Pathway Impact Analysis (SPIA) is a method for assessing the impact of a gene set on a pathway. It combines standard enrichment p-values with network-perturbation-based p-values.

Click on “Pathway Analysis”

The estrogen signaling pathway is the third result related to our ER status gene signature, although the overall adjusted p-value “SPIA padj” is not significant. Our gene signature is computed to activate the pathway.
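To make the idea of combining two kinds of evidence concrete, here is a minimal sketch. It assumes the enrichment p-value and the perturbation p-value are independent and combines them through the distribution of their product: for two independent uniform p-values, the probability that their product is at most c equals c - c·ln(c). Our understanding is that SPIA's global p-value follows this rule, but treat that, and the function name `combined_p`, as assumptions to verify against the SPIA documentation.

```python
import math


def combined_p(p_enrichment, p_perturbation):
    """Combine two independent p-values via the distribution of their product.

    For independent uniform p-values, P(p1 * p2 <= c) = c - c * ln(c).
    """
    c = p_enrichment * p_perturbation
    if c <= 0.0:
        return 0.0
    return c - c * math.log(c)


# Hypothetical inputs: both lines of evidence are individually borderline.
p_global = combined_p(0.05, 0.05)
```

Two borderline p-values of 0.05 combine into a global value of roughly 0.017, which shows how agreement between the two lines of evidence can strengthen the overall call. This value would still need adjustment for the number of pathways tested.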


Step 10B: Pathway Network Enrichment Test

Click on the KEGG icon in the last column for Estrogen signaling pathway.

Yellow nodes are up-regulated genes in our signature, blue are down-regulated.


Step 11A: GeneMANIA

Return to the analysis result by clicking on “Differential Expression Signature” in the tool bar at the top

The last linked tool we will explore today from iLINCS is GeneMANIA.

GeneMANIA is a network-based guilt-by-association algorithm that finds the network neighbors of an input gene set from a heterogeneous collection of interaction networks

Go to https://genemania.org/
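The guilt-by-association idea can be illustrated with a much simpler scoring rule than the one GeneMANIA actually uses: score every non-query gene by the total weight of its edges to the query genes and return the top-scoring neighbors. The function name, the edge weights, and the tiny network below are made up for illustration; GeneMANIA additionally learns how to weight and combine many networks.

```python
def rank_neighbors(edges, query, top_n):
    """Rank non-query genes by total edge weight to the query genes.

    edges : {(gene_a, gene_b): weight} for an undirected network
    query : set of query gene names
    """
    scores = {}
    for (a, b), weight in edges.items():
        for src, dst in ((a, b), (b, a)):
            if src in query and dst not in query:
                scores[dst] = scores.get(dst, 0.0) + weight
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Toy network with illustrative weights (not real interaction data).
toy_edges = {
    ("ESR1", "NCOA3"): 0.9,
    ("GATA3", "NCOA3"): 0.4,
    ("ESR1", "NCOA7"): 0.5,
    ("NCOA7", "TP53"): 0.8,
    ("GATA3", "FOXA1"): 0.7,
}
toy_query = {"ESR1", "GATA3", "FOXA1"}
neighbors = rank_neighbors(toy_edges, toy_query, top_n=2)
```

Genes that are strongly connected to several query genes rise to the top, whether or not they are differentially expressed themselves. A gene linked only to other non-query genes, such as the toy TP53 entry, receives no score under this simple rule.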


Step 11A: GeneMANIA (continued)

We are going to enter the top 20 differentially expressed genes from our ER status gene signature. We will use GeneMANIA to return 20 additional network neighbor genes (not necessarily differentially expressed themselves).

Then we will look at functional enrichment of this combined set of 40 genes.

Find, open, and copy the contents of the file [course_directory]/07_Signatures_and_Characterization/ERstatus_top20.txt

These are the top 20 differentially expressed genes of our ER status gene signature, extracted from the Name_GeneSymbol column of our earlier Excel download.

Paste this list into the text box at the top left corner of the main page


Step 11B: GeneMANIA

Click on the stacked-dots options button

This first list shows all the possible networks that GeneMANIA will consider combining for the analysis of our twenty genes

Select “Customise advanced options”

This menu shows that we are going to find at most 20 neighbors using the automatic network weighting scheme, which is based on our 20 query genes

Click the search magnifying glass.


Step 11C: GeneMANIA

The resulting network contains our 20 input genes (striped) and our 20 predicted network neighbors (solid). The size of each network neighbor indicates its final guilt-by-association value on the composite affinity network.

You may choose between three arrangements of the graph. The stacked arrangement may be easiest for understanding the nodes. You can hover over any node to highlight its neighbors.

For example,
• NCOA7 is also known as Estrogen Nuclear Receptor Coactivator 1
• NCOA3 is associated with Estrogen-Receptor Positive Breast Cancer

Both are connected to ESR1 (and other top 20 genes) through pathway edges, and neither is in our original 100-gene differential expression signature.


Step 11D: GeneMANIA

On the right side are the selected interaction networks that were relevant to the 20 input genes, sorted by type and by weight. You can toggle the networks to display any set of edges. The highest weighted co-expression network is from breast tumors and relates the top 20 genes to each other fairly well, but does not connect them to the predicted 20.


Step 11E: GeneMANIA

Finally, we can perform the standard enrichment tests incorporating our predicted neighbors into our gene set.

Click on the pie chart in the bottom left corner

We see most of the results relating to hormone and steroid signaling pathways and receptors.


Gene Set Characterization Using Discriminative Random Walks

In this final exercise, we will find terms related to the 100 top differentially expressed genes of our ER status signature using the DRaWR method, which incorporates the functional annotation terms directly in the network-based algorithm.


Step 12A: Log In to the KnowEnG Platform

KnowEnG Platform: https://knoweng.org/analyze/

Go to development version: https://dev.knoweng.org/ (will be at end of course)

Login with CILogon - Login service through other accounts

Search: Urbana, Mayo, Google, Github

Step 12B: Launch DRaWR Analysis

The first page has links to many resources, but we will get started by clicking “Start a New Pipeline”

The KnowEnG Analysis Platform has many knowledge network-informed pipelines. You will learn about more of them in the afternoon session.

For now, hover over Gene Set Characterization and click “Start Pipeline”.


Step 12C: Upload Data

Leave the default species “Human”

Find, open, and copy the contents of the file [course_directory]/07_Signatures_and_Characterization/ERstatus_top100.txt

These are the top 100 differentially expressed genes of our ER status gene signature, extracted from the Name_GeneSymbol column of our earlier Excel download.

Click on the “Upload New Data” tab

Select the “Paste a Gene List” button.

Give your gene list a name, e.g. “ERstatus_gene_list”

Paste the file contents into the gene list text box. Click “Done”

Click “Select” next to the name of your pasted list and you should see a checkmark

Click “Next”


Step 12D: Configure Algorithm Parameters

We will choose to use a subset of 4 gene set collections available in the knowledge network:

• Ontologies: Gene Ontology (default)
• Pathways: Enrichr Pathway Membership (must add)
• Pathways: Reactome Pathways Curated (must add)
• Tissue Expression: GEO Expression Set (must add)

(unclick Protein Domains: PFam Protein Domains)

Click “Next”

Step 12E: Configure Network Parameters

Click “Yes” for the question about using the Knowledge Network

The Knowledge Network we will use is an integrated network from the HumanNet project (“HumanNet Integrated Network”)

Network size information can be found here

The amount of network smoothing controls how much importance is put on network connections instead of the original 100 genes. We will use the default of 50%.

Click “Next”


Step 12F: Reminder about DRaWR Algorithm

• Squares are the Gene Ontology and pathway terms we selected
• Query Genes are our 100 ER status signature genes
• Gray edges are the HumanNet Integrated Network
• We are asking the algorithm to find property squares that a random walker who is forced to restart often at the query genes will visit unusually frequently
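The random walk described in the bullets above can be sketched as a random walk with restart on a small network of gene and property nodes. A walk that restarts at the query genes is compared with a baseline walk that restarts at any gene, and property nodes are ranked by how much more often the query walk visits them. This is a simplified illustration and not the KnowEnG DRaWR implementation; the restart probability of 0.5, the function names, and the toy network are assumptions.

```python
def random_walk_with_restart(adj, restart, restart_prob=0.5, iters=200):
    """Approximate stationary visit probabilities by power iteration.

    adj     : {node: {neighbor: weight}}, symmetric, every node has an edge
    restart : {node: probability} restart distribution summing to 1
    """
    nodes = sorted(adj)
    prob = {n: restart.get(n, 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: restart_prob * restart.get(n, 0.0) for n in nodes}
        for n in nodes:
            total = sum(adj[n].values())
            for m, w in adj[n].items():
                nxt[m] += (1.0 - restart_prob) * prob[n] * w / total
        prob = nxt
    return prob


def build_adjacency(edge_list):
    """Build a symmetric, unit-weight adjacency dict from (a, b) pairs."""
    adj = {}
    for a, b in edge_list:
        adj.setdefault(a, {})[b] = 1.0
        adj.setdefault(b, {})[a] = 1.0
    return adj


# Toy network: four genes in a chain plus two property (term) nodes.
toy_adj = build_adjacency([
    ("GENE_A", "GENE_B"), ("GENE_B", "GENE_C"), ("GENE_C", "GENE_D"),
    ("GENE_A", "term:X"), ("GENE_B", "term:X"),
    ("GENE_C", "term:Y"), ("GENE_D", "term:Y"),
])
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
query = ["GENE_A", "GENE_B"]

query_walk = random_walk_with_restart(toy_adj, {g: 1 / len(query) for g in query})
baseline_walk = random_walk_with_restart(toy_adj, {g: 1 / len(genes) for g in genes})

# Rank property nodes by how much more often the query walk visits them.
term_scores = {
    node: query_walk[node] - baseline_walk[node]
    for node in toy_adj
    if node.startswith("term:")
}
```

Here `term:X`, which is attached to both query genes, scores higher than `term:Y`. Raising `restart_prob` keeps the walker closer to the query genes, while lowering it lets network connections carry more influence.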


Step 12G: Launch DRaWR Job

Change job name to “gene_set_characterization-DRaWR-HN”

Verify all the parameters are correct.

Click “Submit Job”

While this is running, we are going to launch the standard Fisher exact enrichment tests with the same data sets.

Click “Start New Pipeline”


Step 13: Launch Standard Enrichment Tests

Hover over Gene Set Characterization and click “Start Pipeline”

Click “Select” next to the name of your pasted list and you should see a checkmark. Click “Next”

Select the same 4 collections:
• Ontologies: Gene Ontology (default)
• Pathways: Enrichr Pathway Membership (must add)
• Pathways: Reactome Pathways Curated (must add)
• Tissue Expression: GEO Expression Set (must add)
• (unclick Protein Domains: PFam Protein Domains)

Click “Next”

Click “No” for the question about using the Knowledge Network. Click “Next”

Change job name to “gene_set_characterization-fisher”

Verify all the parameters are correct.

Click “Submit Job”


Step 14A: View DRaWR Results

Click the “Go to Data Page” button

You can check the status of your jobs here. Gray arrows mean that your job is currently queued or running. A red icon means something went wrong.

Otherwise, when your job is successfully finished, you should be able to click the green arrow and see the primary result files.

Click on the DRaWR job “gene_set_characterization-DRaWR-HN”

Then click on the “View Results” button


Step 14B: View DRaWR Results

Slide the filter slider all the way to the right.

The DRaWR method picks up many GEO Expression gene sets that relate to ESR1, estrogen, and estradiol.

DRaWR also ranks highly a number of pathway and Gene Ontology terms related to the extracellular matrix, which is known to have many molecules affected by estrogens and related to ER expression.


Step 14C: View Fisher Results

Click the “Data” link at the top of the page

Click on the Fisher job “gene_set_characterization-fisher”

Then click on the “View Results” button

Slide the filter slider all the way to the right.

The Fisher method finds the same GEO Expression gene sets that relate to ESR1, estrogen, and estradiol, as well as some additional estradiol ones that DRaWR missed. It also detects many more GEO gene sets that are less obviously related.

The standard enrichment method does not detect any highly significant enrichments with pathways or Gene Ontology terms. The extracellular matrix terms detected by DRaWR are strongly connected to the signature genes, but mostly through their HumanNet network neighbors and not through direct connections.


Main Take Home Messages

• When you create your own gene signature,
  • you can search libraries of public gene signatures that might provide you with insights relating to
    • mechanisms (e.g. gene knockdowns and transcription factor binding) or
    • treatments (e.g. reverse small molecule perturbagens).

• A gene signature or more simply a gene set can be analyzed
  • in the context of a pathway, interaction, or other affinity network
  • to provide complementary annotations to standard enrichment tests
