2 Outline Review of major computational approaches to facilitate biological interpretation of ...

56
Chapter 8: Biological Knowledge Assembly and Interpretation Ju Han Kim Division of Biomedical Informatics, Seoul National University College of Medicine, Seoul, Korea, Presenter: Zhen Gao

Transcript of 2 Outline Review of major computational approaches to facilitate biological interpretation of ...

Page 1: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

Chapter 8: Biological Knowledge Assembly and Interpretation

Ju Han Kim Division of Biomedical Informatics, Seoul National University College of Medicine,

Seoul, Korea,

Presenter: Zhen Gao

Page 2: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

2

Outline

Review of major computational approaches to facilitate biological interpretation of high-throughput microarray and RNA-Seq experiments.

Page 3: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

3

Input: Microarray / RNA seq

DEG: Differentially Expressed Genes

co-expression / clustering

Gene Set-Wise Differential Expression Analysis

Differential Co-Expression Analysis

Interest gene, genes list, gene pair or gene list pair

FAA: Functional Annotation Analysis:Gene Ontology (GO) or Pathway analysis

Gene list with annotations

Visualization, sematic assembling and knowledge learning:Concept lattice analysis : BioLattice

Page 4: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

4

FAA: Functional Annotation Analysis GO: Gene Ontology Pathway DEG: Differentially Expressed Genes GSEA: Gene Set Enrichment Analysis Biological Interpretation and Biological

Semantics Concept lattice analysis

Glossary

Page 5: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

Pathway and Ontology-Based Analysis

GO and biological pathway-based analysis: one of the most powerful methods for inferring

the biological meanings of expression changes list of genes obtained by:

differential expression analysis co-expression analysis (or clustering)

Page 6: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

6

Pathway and Ontology-Based Analysis

Page 7: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

7

Page 8: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

8

Attributes can be applied for FAA:

transcription factor binding clinical phenotypes like disease associations MeSH (Medical Subject Heading) terms microRNA binding sites protein family memberships chromosomal bands, etc GO terms biological pathways

Pathway and Ontology-Based Analysis

Page 9: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

9

Features may have their own ontological

structures

GO has a structure as a DAG (Directed Acyclic Graph)

Pathway and Ontology-Based Analysis

Page 10: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

10

DEGs:

Pathway and Ontology-Based Analysis

Page 11: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

11

Input: Microarray / RNA seq

DEG: Differentially Expressed Genes

co-expression / clustering

Gene Set-Wise Differential Expression Analysis

Differential Co-Expression Analysis

Interest gene, genes list, gene pair or gene list pair

FAA: Functional Annotation Analysis:Gene Ontology (GO) or Pathway analysis

Gene list with annotations

Visualization, sematic assembling and knowledge learning:Concept lattice analysis : BioLattice

Page 12: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

12

DEGs: 3 techniques which help obtain DEGs:

t-test Wilcoxon’s rank sum test ANOVA

Need to note that multiple-hypothesis-testing problem should be properly managed

Pathway and Ontology-Based Analysis

Page 13: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

13

Co-expression analysis

Pathway and Ontology-Based Analysis

Page 14: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

14

Co-expression analysis

puts similar expression profiles together and different ones apart

Returning genes that are assumed to be co-regulated

Clustering algorithms: hierarchical-tree clustering partitional clustering

Pathway and Ontology-Based Analysis

Page 15: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

15

Pathways are powerful resources for the

understanding of shared biological processes E.g.: KEGG, MetaCyc and BioCarta (signaling

pathways)

Pathway and Ontology-Based Analysis

Page 16: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

16

MetaCyc:

an experimentally determined non-redundant metabolic pathway database

It is the largest collection containing over 1400 metabolic pathways

Pathway and Ontology-Based Analysis

Page 17: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

17

Ontology / GO:

providing a shared understanding of a certain domain of information

controlled vocabularies

DAG structures with 3 vocabularies of GO: Molecular Function (MF) Cellular Compartment (CC) Biological Process (BP)

Pathway and Ontology-Based Analysis

Page 18: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

18

Common Gos:

MIPS: integrated source, protein properties, variety of complete genomes

MeSH: clinical including disease names OMIM (Online Mendelian Inheritance in Man) UMLS (Unified Medical Language System)

Pathway and Ontology-Based Analysis

Page 19: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

19

GO enrichment test: For example

if 20% of the genes in a gene list are annotated with a GO term ‘apoptosis’

only 1% of the genes in the whole human genome fall into this functional category

Pathway and Ontology-Based Analysis

Page 20: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

20

Common statistical tests:

Chi-square binomial hypergeometric tests

Pathway and Ontology-Based Analysis

Page 21: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

21

hypergeometric test:

Pathway and Ontology-Based Analysis

Page 22: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

22

Avoid pitfalls when using hypergeometric test

Choice of background, that makes substantial impact on the result. All genes having at least one GO annotation all genes ever known in genome databases all genes on the microarray

GO has a hierarchical tree (or graphical) structure while hypergeometric test assumes independence of categories

Pathway and Ontology-Based Analysis

Page 23: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

23

Common Tools

DAVID ArrayX- Path Pathway Miner EASE GOFish GOTree etc.

Pathway and Ontology-Based Analysis

Page 24: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

24

Page 25: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

25

Gene Set-Wise Differential Expression Analysis

Page 26: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

26

Input: Microarray / RNA seq

DEG: Differentially Expressed Genes

co-expression / clustering

Gene Set-Wise Differential Expression Analysis

Differential Co-Expression Analysis

Interest gene, genes list, gene pair or gene list pair

FAA: Functional Annotation Analysis:Gene Ontology (GO) or Pathway analysis

Gene list with annotations

Visualization, sematic assembling and knowledge learning:Concept lattice analysis : BioLattice

Page 27: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

27

Evaluates coordinated differential expression

of gene groups

Gene Set Enrichment Analysis (GSEA) The first developed in this category evaluates for each a pre-defined gene set the

significant association with phenotypic classes

Gene Set-Wise Differential Expression Analysis

Page 28: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

28

Difference between FAA and GSEA:

FAA: find over-represented GO terms from a interesting gene list

GSEA: obtain the pre-defined gene list first and test the changes under different conditions.

Gene Set-Wise Differential Expression Analysis

Page 29: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

29

Page 30: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

30

Advantages of gene set-wise differential expression

analysis: successfully identified modest but coordinated

changes in gene expression that might have been missed by conventional ‘individual gene-wise’ differential expression analysis.

(many tiny expression changes can collectively create a big change)

straightforward biological interpretation because the gene sets are defined by biological knowledge

Gene Set-Wise Differential Expression Analysis

Page 31: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

31

Enrichment Score (ES) is calculated by evaluating the

fractions of genes in S (‘‘hits’’) weighted by their correlation and the fractions of genes not in S (‘‘misses’’) present up to a given position i in the ranked gene list, L, where N genes are ordered according to the correlation,

Gene Set-Wise Differential Expression Analysis

Page 32: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

32

Typical gene sets:

regulatory-motif function-related disease-related sets

Database: MSigDB:

6769 gene sets classified into five different collections Has some interesting extensions

Gene Set-Wise Differential Expression Analysis

Page 33: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

33

Differential Co-Expression Analysis

Page 34: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

34

Input: Microarray / RNA seq

DEG: Differentially Expressed Genes

co-expression / clustering

Gene Set-Wise Differential Expression Analysis

Differential Co-Expression Analysis

Interest gene, genes list, gene pair or gene list pair

FAA: Functional Annotation Analysis:Gene Ontology (GO) or Pathway analysis

Gene list with annotations

Visualization, sematic assembling and knowledge learning:Concept lattice analysis : BioLattice

Page 35: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

35

Co-expression analysis:

determines the degree of co-expression of a cluster of genes under a certain condition

Differential co-expression analysis: determines the degree of co-expression difference of a

gene pair or a gene cluster across different conditions

Differential Co-Expression Analysis

Page 36: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

36

3 major types:

(a) differential co-expression of gene cluster(s) (b) gene pair-wise differential co- expression (c) differential co-expression of paired gene sets

Differential Co-Expression Analysis

Page 37: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

37

Page 38: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

38

Type (a), identify differentially co-expressed gene

cluster(s) between two conditions Let conditions and genes be denoted by J and I,

respectively. The mean squared residual of model is a measurement of co-expression of genes:

Differential Co-Expression Analysis

Page 39: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

39

Differential Co-Expression Analysis

Type (a) cont.

Page 40: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

40

Type (b)

Differential Co-Expression Analysis

Page 41: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

41

Type (b), identify differentially co-expressed gene pairs

Techniques: F-statistic A meta-analytic approach

Differential Co-Expression Analysis

Page 42: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

42

Note that identification of differentially co-expressed

gene clusters or gene pairs usually do not use a pre-defined gene sets or pairs.

Thus the interpretation may also be improved by ontology and pathway-based annotation analysis.

Differential Co-Expression Analysis

Page 43: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

43

Type (c), dCoxS (differential co-expression of gene sets)

algorithm identifies gene set pairs differentially co-expressed across different conditions

Biological pathways can be used as pre-defined gene sets and the differential co-expression of the biological pathway pairs between conditions is analyzed.

Differential Co-Expression Analysis

Page 44: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

44

Type (c) cont.

To measure the expression similarity between paired gene-sets under the same condition, dCoxS defines the interaction score (IS) as the correlation coefficient between the sample-wise entropies. Even when the numbers of the genes in different pathways are different, IS can always be obtained because it uses only sample-wise distances regardless of whether the two pathways have the same number of genes or not.

Differential Co-Expression Analysis

Page 45: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

45

Type (c) cont.

Differential Co-Expression Analysis

Page 46: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

46

Biological Interpretation and Biological Semantics

Page 47: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

47

Input: Microarray / RNA seq

DEG: Differentially Expressed Genes

co-expression / clustering

Gene Set-Wise Differential Expression Analysis

Differential Co-Expression Analysis

Interest gene, genes list, gene pair or gene list pair

FAA: Functional Annotation Analysis:Gene Ontology (GO) or Pathway analysis

Gene list with annotations

Visualization, sematic assembling and knowledge learning:Concept lattice analysis : BioLattice

Page 48: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

48

Biomedical semantics provides rich descriptions for

biomedical domain knowledge.

Motivation for Biological Semantics: GO has limitations:

The result of GO is typically a long unordered list of annotations

Most of the analysis tools evaluate only one cluster at a time time-consuming to read the massive annotation lists hard to manually assemble Many annotations are redundant

Biological Interpretation and Biological Semantics

Page 49: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

49

Introducing BioLattice:

a mathematical framework based on concept lattice analysis organize traditional clusters and associated annotations

into a lattice of concepts A graphical summary considers gene expression clusters as objects and

annotations as attributes

Thus, complex relations among clusters and annotations are clarified, ordered and visualized.

Biological Interpretation and Biological Semantics

Page 50: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

50

Another advantage of BioLattice is that heterogeneous

biological knowledge resources can be added

Biological Interpretation and Biological Semantics

Page 51: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

51

Page 52: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

52

Tool to construct BioLattice:

The Ganter algorithm http:// www.snubi.org/software/biolattice/

Biological Interpretation and Biological Semantics

Page 53: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

53

Page 54: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

54

Review of major computational approaches to

facilitate biological interpretation of high-throughput microarray and RNA-Seq experiments.

Conclusion

Page 55: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

55

Input: Microarray / RNA seq

DEG: Differentially Expressed Genes

co-expression / clustering

Gene Set-Wise Differential Expression Analysis

Differential Co-Expression Analysis

Interest gene, genes list, gene pair or gene list pair

FAA: Functional Annotation Analysis:Gene Ontology (GO) or Pathway analysis

Gene list with annotations

Visualization, sematic assembling and knowledge learning:Concept lattice analysis : BioLattice

Page 56: 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.

56