Summarizing Differential Expression Using Mann-Whitney U-tests.


Page 1: Summarizing Differential Expression Using Mann-Whitney U-tests.

Summarizing Differential Expression Using Mann-Whitney U-tests

Page 2: Summarizing Differential Expression Using Mann-Whitney U-tests.

RNA-Seq… at Its Most Basic Form

Samples from two conditions

Isolate RNA

Generate cDNA

Create sequencing library by fragmenting, size selection, and adding adaptors

Run sequencer

Generate short reads

Identify differentially expressed genes

Profound biological discovery

Page 3: Summarizing Differential Expression Using Mann-Whitney U-tests.

Heat stress experiment analyzed with tag-based RNA-seq

[Figure labels: individual, control, stress]

Page 4: Summarizing Differential Expression Using Mann-Whitney U-tests.

Input:
- list of significant genes (“our list”)
- all GO annotations for all genes in a genome (or transcriptome)

Enrichment test: whether “our list” contains more representatives of a certain GO category than expected by chance (Fisher’s exact, hypergeometric, or similar test)

Gene Ontology enrichment analysis (classic)
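
The classic enrichment test can be sketched as a one-sided hypergeometric tail probability, which is what a one-sided Fisher's exact test computes for this kind of table. This is a minimal illustration in Python using only the standard library; the function name and all counts are hypothetical and not taken from the slides.

```python
from math import comb

def hypergeom_enrichment_p(n_genome, n_category, n_list, n_overlap):
    """P(X >= n_overlap) when n_list genes are drawn without replacement
    from n_genome genes, of which n_category belong to the GO category."""
    max_overlap = min(n_category, n_list)
    total = comb(n_genome, n_list)
    tail = sum(
        comb(n_category, k) * comb(n_genome - n_category, n_list - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total

# Hypothetical counts: 1000 annotated genes, 50 of them in the category,
# 100 genes in "our list", 12 of which fall in the category
# (about 5 would be expected by chance).
p = hypergeom_enrichment_p(1000, 50, 100, 12)
print(f"one-sided enrichment p = {p:.4g}")
```

Note that the result depends on where the significance cutoff for "our list" was drawn, since the cutoff decides which genes are counted.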

Page 5: Summarizing Differential Expression Using Mann-Whitney U-tests.
Page 6: Summarizing Differential Expression Using Mann-Whitney U-tests.

Mann-Whitney U-test

• Use ranks to test if distributions of group X and group Y are different

• Robust to outliers and does not require normally distributed data
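
As a rough illustration of the rank-based idea, the sketch below computes the U statistic and a two-sided p-value from the normal approximation. It is a simplified version written for this transcript (no tie or continuity correction), not the routine used by any particular package; the function names and the example numbers are made up.

```python
from math import erfc, sqrt

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U for group x, two-sided p) via the normal approximation,
    without tie or continuity correction."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return u_x, p

# Made-up data: group X contains one extreme outlier (50.0).
group_x = [1.1, 2.3, 2.9, 3.0, 50.0]
group_y = [3.5, 4.1, 4.8, 5.2, 6.0]
u, p = mann_whitney_u(group_x, group_y)
print(f"U = {u}, two-sided p = {p:.3f}")
```

Because only ranks enter the calculation, replacing the outlier 50.0 with 7.0 would leave U unchanged.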

Page 7: Summarizing Differential Expression Using Mann-Whitney U-tests.

Input:
- list of genes with measures to rank them
- GO annotations for all genes in a genome (or transcriptome)

Enrichment test: whether a GO category is significantly enriched with either top- or bottom-ranking genes (two-sided Mann-Whitney U test, or permutations)

Advantages:
- no need to choose a “significance cutoff”
- can keep track of direction of change

Gene Ontology enrichment analysis (rank-based)

[Figure labels: control, stress; genes annotated with the GO term]

MWU test determines whether genes annotated with the GO term in question (stripes on the white box to the left) are significantly “bunched up” either on top or at the bottom of the ranked list.

“delta rank”: mean rank of GO-term genes minus mean rank of all other genes (how much shift in ranks there is).
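
The delta rank definition above can be written out in a few lines. This sketch assumes ranks are assigned in ascending order of the measure (so a positive delta rank means the GO-term genes sit toward the high end of the list) and that there are no tied measures; the gene names, measures, and GO membership are illustrative only.

```python
def delta_rank(measure_by_gene, go_genes):
    """Mean rank of GO-term genes minus mean rank of all other genes.

    Assumptions: ascending ranks (largest measure gets the largest rank)
    and no tied measures.
    """
    ordered = sorted(measure_by_gene, key=measure_by_gene.get)
    rank = {gene: position + 1 for position, gene in enumerate(ordered)}
    inside = [rank[g] for g in ordered if g in go_genes]
    outside = [rank[g] for g in ordered if g not in go_genes]
    return sum(inside) / len(inside) - sum(outside) / len(outside)

# Illustrative measures for eight genes; three belong to the GO term.
measures = {
    "gene1": 4, "gene2": 3, "gene3": 2, "gene4": 1,
    "gene5": -1, "gene6": -2, "gene7": -3, "gene8": -4,
}
print(delta_rank(measures, {"gene1", "gene2", "gene3"}))  # 4.0
```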

Page 8: Summarizing Differential Expression Using Mann-Whitney U-tests.

control treatment

Differential Expression Analysis (DESeq, edgeR)

Name    p-value   signed -log10(p)   Rank
gene1   0.0001     4                 1
gene2   0.001      3                 2
gene3   0.01       2                 3
gene4   0.1        1                 4
gene5   0.1       -1                 5
gene6   0.01      -2                 6
gene7   0.001     -3                 7
gene8   0.0001    -4                 8

delta rank
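
The ranking in the table can be reproduced with a short sketch. The p-values are those shown on the slide; the log fold changes are hypothetical and only their sign is used, on the assumption that the sign of the -log(p) column reflects the direction of change.

```python
from math import log10

# (name, p-value from the slide, hypothetical log fold change)
results = [
    ("gene1", 0.0001, 2.4), ("gene2", 0.001, 1.8),
    ("gene3", 0.01, 1.1), ("gene4", 0.1, 0.3),
    ("gene5", 0.1, -0.2), ("gene6", 0.01, -0.9),
    ("gene7", 0.001, -1.5), ("gene8", 0.0001, -2.2),
]

def signed_neg_log10_p(p_value, log_fold_change):
    """-log10(p), made negative when the gene is down-regulated."""
    score = -log10(p_value)
    return score if log_fold_change >= 0 else -score

scored = {name: signed_neg_log10_p(p, lfc) for name, p, lfc in results}
# Rank 1 goes to the largest score, as in the table.
ranking = sorted(scored, key=scored.get, reverse=True)
rank = {name: position + 1 for position, name in enumerate(ranking)}

for name in ranking:
    print(f"{name}\t{scored[name]:+.0f}\t{rank[name]}")
```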

Page 9: Summarizing Differential Expression Using Mann-Whitney U-tests.

- Clustering GO categories according to the proportion of shared genes brings similar biological processes together.

- Merge identical or very similar categories to reduce redundancy.

Some GO categories in your data might share the same genes (and some may overlap completely)
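
To make the merging step concrete, here is a small sketch. It measures overlap as the number of shared genes divided by the size of the smaller category and merges greedily above a threshold. Both the overlap definition and the 0.75 threshold are assumptions made for illustration, and greedy merging is a simpler stand-in for hierarchical clustering; category and gene names are hypothetical.

```python
def shared_fraction(genes_a, genes_b):
    """Shared genes as a fraction of the smaller category (assumed definition)."""
    return len(genes_a & genes_b) / min(len(genes_a), len(genes_b))

def merge_similar(categories, threshold=0.75):
    """Greedily merge categories whose shared fraction reaches the threshold.

    Returns a list of (category names, union of their genes), visiting
    larger categories first.
    """
    merged = []
    by_size = sorted(categories.items(), key=lambda item: -len(item[1]))
    for name, genes in by_size:
        for names, union in merged:
            if shared_fraction(union, genes) >= threshold:
                names.append(name)
                union.update(genes)
                break
        else:
            merged.append(([name], set(genes)))
    return merged

# Hypothetical categories: term_B is a complete subset of term_A.
categories = {
    "term_A": {"g1", "g2", "g3", "g4"},
    "term_B": {"g1", "g2", "g3"},
    "term_C": {"g7", "g8", "g9"},
    "term_D": {"g4", "g9", "g10"},
}
for names, genes in merge_similar(categories):
    print(names, sorted(genes))
```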

Page 10: Summarizing Differential Expression Using Mann-Whitney U-tests.

Run R Script GO_MWU.R

• go to ~/Desktop/Mann-Whitney_U-tests/MWU_go

• open the file GO_MWU.R

• execute commands by highlighting and pressing control + enter

Page 11: Summarizing Differential Expression Using Mann-Whitney U-tests.
Page 12: Summarizing Differential Expression Using Mann-Whitney U-tests.

gene,logP
isogroup0,0.6
isogroup1,3.5
isogroup10,6.8
isogroup100,6.4
isogroup1000,1.7
isogroup10000,0.1
isogroup10001,-0.2
isogroup10002,0.6
isogroup10003,-0.4

heats.csv (differential expression dataset)

V1             V2
isogroup15359  GO:0001614;GO:0004931;GO:0009719;
isogroup0      GO:0004687
isogroup100    GO:0003779;GO:0008091
isogroup10001  GO:0003993
isogroup10002  GO:0005524;GO:0016887;GO:0000166;GO:0017111
isogroup10003  GO:0006605;GO:0006886;GO:0016020;GO:0015031;
isogroup10004  GO:0004197;GO:0006508;GO:0008234;GO:0004217
isogroup10006  GO:0001733
isogroup10007  GO:0003824;GO:0008152;GO:0000247;GO:0008416;GO:0016863;GO:0018842;GO:0004165

amil_defog_iso2go.tab (links genes with their GO terms)
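
How the two tables fit together can be sketched as follows: read the gene measures, read the gene-to-GO table, and group the measures by GO term. The sketch is in Python for illustration (the workshop script itself is in R) and assumes the annotation file has two whitespace-separated columns with GO terms joined by semicolons. The sample rows are copied from the slide; the function names are hypothetical.

```python
import csv
import io
from collections import defaultdict

measures_text = """gene,logP
isogroup0,0.6
isogroup100,6.4
isogroup10001,-0.2
isogroup10002,0.6
isogroup10003,-0.4
"""

annotations_text = (
    "V1\tV2\n"
    "isogroup15359\tGO:0001614;GO:0004931;GO:0009719;\n"
    "isogroup0\tGO:0004687\n"
    "isogroup100\tGO:0003779;GO:0008091\n"
    "isogroup10002\tGO:0005524;GO:0016887;GO:0000166;GO:0017111\n"
)

def read_measures(handle):
    """Map gene name to its measure from a gene,logP table."""
    return {row["gene"]: float(row["logP"]) for row in csv.DictReader(handle)}

def read_gene_to_go(handle):
    """Map gene name to its list of GO terms; skip header or malformed rows."""
    gene_to_go = {}
    for line in handle:
        fields = line.split()
        if len(fields) != 2 or not fields[1].startswith("GO:"):
            continue
        gene, terms = fields
        # A trailing semicolon yields an empty string, which is dropped.
        gene_to_go[gene] = [term for term in terms.split(";") if term]
    return gene_to_go

def measures_by_go(measures, gene_to_go):
    """Group measures by GO term, keeping only genes that have a measure."""
    grouped = defaultdict(dict)
    for gene, terms in gene_to_go.items():
        if gene in measures:
            for term in terms:
                grouped[term][gene] = measures[gene]
    return dict(grouped)

measures = read_measures(io.StringIO(measures_text))
gene_to_go = read_gene_to_go(io.StringIO(annotations_text))
grouped = measures_by_go(measures, gene_to_go)
print(grouped["GO:0003779"])
```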

id: GO:0000002
name: mitochondrial genome maintenance
namespace: biological_process
def: "The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome." [GOC:ai, GOC:vw]
is_a: GO:0007005 ! mitochondrion organization

[Term]
id: GO:0000003
name: reproduction
namespace: biological_process
alt_id: GO:0019952
alt_id: GO:0050876
def: "The production of new individuals that contain some portion of genetic material inherited from one or more parent organisms." [GOC:go_curators, GOC:isa_complete, GOC:jl, ISBN:0198506732]
subset: goslim_generic
subset: goslim_pir
subset: goslim_plant
subset: gosubset_prok

go.obo (links GO terms with names, namespaces, and definitions)
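
A minimal reader for the `[Term]` stanzas shown above might look like the sketch below. It keeps only the `name` and `namespace` tags and assumes that each stanza begins with `[Term]` and lists `id:` before the other tags; it is an illustration, not a full OBO parser. The function name is hypothetical.

```python
def parse_obo_terms(text):
    """Map GO id to {'name': ..., 'namespace': ...} for every [Term] stanza.

    Assumes the id tag is the first tag inside each stanza.
    """
    terms = {}
    in_term = False
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are kept.
            in_term = line == "[Term]"
            current = None
        elif in_term and ": " in line:
            key, value = line.split(": ", 1)
            if key == "id":
                current = terms.setdefault(value, {})
            elif current is not None and key in ("name", "namespace"):
                current[key] = value
    return terms

sample = """[Term]
id: GO:0000002
name: mitochondrial genome maintenance
namespace: biological_process
is_a: GO:0007005 ! mitochondrion organization

[Term]
id: GO:0000003
name: reproduction
namespace: biological_process
alt_id: GO:0019952
"""

terms = parse_obo_terms(sample)
print(terms["GO:0000003"])
```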

Page 13: Summarizing Differential Expression Using Mann-Whitney U-tests.

Molecular function:

Cellular component:

Dendrograms: sharing of genes between categories.

Fractions: genes with an unadjusted p<0.05 / total number of genes within the category.

FDR-adjusted p-values

GO_MWU: response of adult corals to 3 days of heat stress

https://github.com/z0on/GO_MWU
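
The slide does not say which procedure produced the FDR-adjusted p-values. As one common possibility, the sketch below applies the Benjamini-Hochberg adjustment; treating it as the method used here is an assumption, and the input p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never decrease as the raw p-values increase.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted

# Made-up per-category p-values.
raw = [0.001, 0.01, 0.03, 0.04, 0.5]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"p = {p:<6} adjusted = {q:.3f}")
```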

Page 14: Summarizing Differential Expression Using Mann-Whitney U-tests.

Run R Script GO_MWU.R

• go to ~/Desktop/Mann-Whitney_U-tests/MWU_go

• open the file GO_MWU.R

• execute commands by highlighting and pressing control + enter

Page 15: Summarizing Differential Expression Using Mann-Whitney U-tests.

KOG-MWU: same idea as GO_MWU (“KOGMWU” package in R)

Non-hierarchical and [mostly] non-overlapping nature of KOG class annotations allows for quantitative comparisons of diverse datasets based on KOG delta-ranks.

“categories enriched with either up- or down-regulated genes”
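
One way to compare two datasets quantitatively is to correlate their delta ranks across KOG classes. The sketch below computes a Pearson correlation by hand; the delta-rank values are invented for illustration, and the choice of Pearson correlation is an assumption rather than something stated on the slide.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented delta ranks per KOG class for two hypothetical datasets.
dataset_a = {
    "Energy production and conversion": -310.0,
    "Translation, ribosomal structure and biogenesis": -540.0,
    "Signal transduction mechanisms": 220.0,
    "Cytoskeleton": 150.0,
}
dataset_b = {
    "Energy production and conversion": -120.0,
    "Translation, ribosomal structure and biogenesis": -400.0,
    "Signal transduction mechanisms": 90.0,
    "Cytoskeleton": 260.0,
}

# Compare only the classes present in both datasets, in a fixed order.
shared = sorted(set(dataset_a) & set(dataset_b))
r = pearson([dataset_a[k] for k in shared], [dataset_b[k] for k in shared])
print(f"correlation of KOG delta ranks: r = {r:.2f}")
```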

Page 16: Summarizing Differential Expression Using Mann-Whitney U-tests.

Questions