Posted on 22-Dec-2015
Chapter 8: Biological Knowledge Assembly and Interpretation
Ju Han Kim, Division of Biomedical Informatics, Seoul National University College of Medicine, Seoul, Korea
Presenter: Zhen Gao
Outline
Review of major computational approaches to facilitate biological interpretation of high-throughput microarray and RNA-Seq experiments.
Input: Microarray / RNA-Seq
DEG: Differentially Expressed Genes
Co-expression / clustering
Gene Set-Wise Differential Expression Analysis
Differential Co-Expression Analysis
Gene of interest, gene list, gene pair or gene-list pair
FAA: Functional Annotation Analysis: Gene Ontology (GO) or pathway analysis
Gene list with annotations
Visualization, semantic assembly and knowledge learning: concept lattice analysis (BioLattice)
Glossary
FAA: Functional Annotation Analysis
GO: Gene Ontology
Pathway
DEG: Differentially Expressed Genes
GSEA: Gene Set Enrichment Analysis
Biological Interpretation and Biological Semantics
Concept lattice analysis
Pathway and Ontology-Based Analysis
GO and biological pathway-based analysis is one of the most powerful methods for inferring the biological meaning of expression changes in a list of genes obtained by:
differential expression analysis
co-expression analysis (or clustering)
Attributes that can be used for FAA include:
transcription factor binding
clinical phenotypes such as disease associations
MeSH (Medical Subject Headings) terms
microRNA binding sites
protein family memberships
chromosomal bands
GO terms
biological pathways, etc.
Features may have their own ontological structures; for example, GO is structured as a DAG (directed acyclic graph).
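Because GO is a DAG, a gene annotated to a specific term is implicitly annotated to every ancestor of that term (GO's "true path rule"). A minimal Python sketch of propagating annotations upward; the term names and gene symbols are hypothetical placeholders, not real GO identifiers.

```python
from collections import defaultdict

# Toy GO-like DAG: child term -> list of parent terms (hypothetical terms).
# A term may have several parents, which is what makes it a DAG, not a tree.
PARENTS = {
    "apoptotic_process": ["cell_death", "programmed_process"],
    "cell_death": ["biological_process"],
    "programmed_process": ["biological_process"],
    "biological_process": [],
}

def ancestors(term, parents):
    """Return all ancestors of `term` (excluding itself) by depth-first search."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen

def propagate(direct, parents):
    """Map each term to the genes annotated to it or to any of its descendants."""
    by_term = defaultdict(set)
    for gene, terms in direct.items():
        for term in terms:
            for t in {term} | ancestors(term, parents):
                by_term[t].add(gene)
    return dict(by_term)

# Hypothetical direct annotations (gene -> most specific terms).
direct = {"GENE_A": ["apoptotic_process"], "GENE_B": ["cell_death"]}
annotated = propagate(direct, PARENTS)
print(sorted(annotated["biological_process"]))  # ['GENE_A', 'GENE_B']
```

After propagation, parent and child terms share genes, which is one reason enrichment results for related GO terms are not independent.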
DEGs:
DEGs: three common techniques for obtaining DEGs:
t-test
Wilcoxon's rank-sum test
ANOVA
Note that the multiple-hypothesis-testing problem must be properly managed, since thousands of genes are tested at once.
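The slide does not say how to manage multiple testing; one widely used option, assumed here, is the Benjamini-Hochberg false discovery rate (FDR) procedure. A minimal pure-Python sketch, assuming per-gene p-values have already been computed with one of the tests above; the gene names, p-values and the 0.05 cutoff are made up for illustration.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-gene p-values, e.g. from a t-test.
pvals = {"G1": 0.001, "G2": 0.008, "G3": 0.039, "G4": 0.041, "G5": 0.60}
genes = list(pvals)
qvals = benjamini_hochberg([pvals[g] for g in genes])
degs = [g for g, q in zip(genes, qvals) if q < 0.05]
print(degs)  # ['G1', 'G2']
```

G3 and G4 would pass an uncorrected 0.05 cutoff but not the FDR-adjusted one.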
Co-expression analysis
Co-expression analysis groups genes with similar expression profiles together and separates dissimilar ones, returning clusters of genes that are assumed to be co-regulated.
Clustering algorithms:
hierarchical-tree clustering
partitional clustering
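A minimal sketch of hierarchical-tree clustering of expression profiles, assuming 1 - Pearson correlation as the distance and average linkage; both are common choices for co-expression but are not prescribed by the chapter, and the four toy profiles are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_linkage(profiles, k):
    """Merge clusters bottom-up until k remain; distance = 1 - Pearson r."""
    names = list(profiles)
    dist = {(a, b): 1 - pearson(profiles[a], profiles[b])
            for a in names for b in names}
    clusters = [[n] for n in names]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters.
                d = sum(dist[a, b] for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]                          # j > i, so index i is unaffected
    return clusters

# Toy profiles over four samples: G1/G2 rise together, G3/G4 fall together.
profiles = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.1, 5.9, 8.0],
    "G3": [4.0, 3.0, 2.0, 1.0],
    "G4": [8.2, 6.0, 4.1, 2.0],
}
print(average_linkage(profiles, 2))
```

Stopping at a fixed number of clusters k is a simplification; in practice the full tree is usually built and cut afterwards. A partitional method such as k-means would instead start from k clusters and reassign genes iteratively.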
Pathways are powerful resources for understanding shared biological processes, e.g., KEGG, MetaCyc and BioCarta (signaling pathways).
MetaCyc:
a non-redundant database of experimentally determined metabolic pathways
the largest such collection, containing over 1400 metabolic pathways
Ontology / GO:
provides a shared understanding of a given domain of information
uses controlled vocabularies
GO consists of three vocabularies, each structured as a DAG: Molecular Function (MF), Cellular Component (CC) and Biological Process (BP)
Other common ontologies:
MIPS: an integrated source of protein properties for a variety of complete genomes
MeSH: clinical terms, including disease names
OMIM (Online Mendelian Inheritance in Man)
UMLS (Unified Medical Language System)
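GO enrichment test: for example, if 20% of the genes in a gene list are annotated with the GO term ‘apoptosis’ while only 1% of the genes in the whole human genome fall into this functional category, the term is about 20-fold over-represented in the list. An enrichment test asks whether such over-representation is larger than expected by chance.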
Common statistical tests:
chi-square test
binomial test
hypergeometric test
hypergeometric test:
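The formula from this slide is not reproduced in the text. A standard one-sided form, assumed here, gives the probability of seeing at least k annotated genes in a list of n genes when K of the N background genes carry the annotation: P(X >= k) = sum over i from k to min(n, K) of C(K, i) C(N - K, n - i) / C(N, n). A pure-Python sketch using exact integer arithmetic; the counts are hypothetical but mirror the ‘apoptosis’ example (20% of the list versus 1% of the background).

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N background genes, K annotated, n drawn).

    k: genes in the list annotated with the term
    n: size of the gene list
    K: background genes annotated with the term
    N: size of the background
    """
    hi = min(n, K)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, hi + 1))
    return numerator / comb(N, n)  # exact integers, one final division

# Hypothetical: 20 of 100 listed genes carry the term; 200 of 20000 background genes do.
p_genome = hypergeom_upper_tail(k=20, n=100, K=200, N=20000)
# Same list against a smaller hypothetical background (e.g. annotated genes only).
p_annotated = hypergeom_upper_tail(k=20, n=100, K=200, N=12000)
print(p_genome, p_annotated)
```

Both p-values are far below any usual threshold, but the second, computed against a smaller background in which the term is relatively more common, is noticeably larger: the background-choice effect listed among the pitfalls.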
Pitfalls to avoid when using the hypergeometric test:
The choice of background has a substantial impact on the result. Common choices are:
all genes having at least one GO annotation
all genes ever recorded in genome databases
all genes on the microarray
GO has a hierarchical DAG structure (genes annotated to a child term are also counted under its parents), whereas the hypergeometric test assumes that categories are independent.
Common tools:
DAVID, ArrayXPath, Pathway Miner, EASE, GOFish, GOTree, etc.
Gene Set-Wise Differential Expression Analysis
Gene set-wise differential expression analysis evaluates coordinated differential expression of groups of genes.
Gene Set Enrichment Analysis (GSEA), the first method developed in this category, evaluates each pre-defined gene set for significant association with phenotypic classes.
Difference between FAA and GSEA:
FAA: finds over-represented GO terms in a gene list of interest.
GSEA: starts from pre-defined gene sets and tests whether their expression changes between conditions.
Advantages of gene set-wise differential expression analysis:
It has successfully identified modest but coordinated changes in gene expression that might have been missed by conventional individual gene-wise differential expression analysis (many tiny expression changes can collectively create a big change).
Biological interpretation is straightforward because the gene sets are defined by biological knowledge.
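The idea that many tiny changes add up can be illustrated with simple arithmetic. This sketch uses Stouffer's z-score combination, a generic meta-analysis device swapped in for illustration rather than the GSEA statistic itself; the 20 genes and their z-scores are hypothetical.

```python
import math

def two_sided_p(z):
    """Two-sided standard-normal p-value for a z-score."""
    return math.erfc(abs(z) / math.sqrt(2))

def stouffer(zs):
    """Combine independent z-scores: Z = sum(z) / sqrt(number of scores)."""
    return sum(zs) / math.sqrt(len(zs))

# 20 genes in one hypothetical set, each only modestly shifted (z = 1.0).
zs = [1.0] * 20
print(round(two_sided_p(zs[0]), 3))  # 0.317: no single gene is significant
z_set = stouffer(zs)
print(round(z_set, 2), two_sided_p(z_set))  # combined z = 4.47, p well below 0.001
```

The toy assumes the genes are independent; real genes in a set are correlated, which is one reason GSEA assesses significance by permuting phenotype labels rather than with a closed-form formula like this.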
The Enrichment Score (ES) is calculated on a ranked gene list L, in which N genes are ordered by their correlation with the phenotype. At each position i in L, the fraction of genes in S (‘hits’) seen so far, weighted by their correlation, is compared with the fraction of genes not in S (‘misses’) seen so far; the ES is the maximum deviation of this running difference from zero.
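A minimal sketch of the running-sum ES described above, assuming the common weighting in which each hit contributes |r_j|^p divided by the sum of |r|^p over all hits, and each miss contributes 1 / (N - N_H), with N_H the number of hits. The ranked list and gene sets are hypothetical.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Running-sum enrichment score of `gene_set` along a ranked gene list.

    ranked:   (gene, correlation) pairs sorted by decreasing correlation.
    gene_set: the genes in S.
    p:        exponent on |correlation| for hits (p = 0 gives equal weights).
    """
    n = len(ranked)
    in_set = [gene in gene_set for gene, _ in ranked]
    n_hit = sum(in_set)
    if n_hit == 0 or n_hit == n:
        raise ValueError("S must contain some, but not all, of the ranked genes")
    # Normaliser so that the hit fraction reaches 1 at the end of the list.
    n_r = sum(abs(r) ** p for (_, r), hit in zip(ranked, in_set) if hit)
    p_hit = p_miss = 0.0
    es = 0.0
    for (_, r), hit in zip(ranked, in_set):
        if hit:
            p_hit += abs(r) ** p / n_r
        else:
            p_miss += 1.0 / (n - n_hit)
        deviation = p_hit - p_miss
        if abs(deviation) > abs(es):
            es = deviation  # keep the signed maximum deviation from zero
    return es

# Hypothetical ranked list (correlation with the phenotype, highest first).
ranked = [("A", 0.9), ("B", 0.8), ("C", 0.5), ("D", 0.1), ("E", -0.3), ("F", -0.7)]
print(enrichment_score(ranked, {"A", "B"}))  # set at the top: ES close to +1
print(enrichment_score(ranked, {"E", "F"}))  # set at the bottom: ES close to -1
```

Significance is normally judged by comparing the observed ES with ESs computed on permuted data, which this sketch omits.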
Typical gene sets:
regulatory-motif sets
function-related sets
disease-related sets
Database: MSigDB, with 6769 gene sets classified into five collections; it has some interesting extensions.
Differential Co-Expression Analysis
Co-expression analysis: determines the degree of co-expression of a cluster of genes under a given condition.
Differential co-expression analysis: determines how the degree of co-expression of a gene pair or gene cluster differs across conditions.
Three major types:
(a) differential co-expression of gene cluster(s)
(b) gene pair-wise differential co-expression
(c) differential co-expression of paired gene sets
Type (a): identify differentially co-expressed gene cluster(s) between two conditions. Let the conditions and genes be denoted by J and I, respectively. The mean squared residual of the model measures the co-expression of the genes in I across J; a lower residual indicates stronger co-expression.
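The formula on this slide is not reproduced in the text. Assuming the residual takes the usual biclustering form (as in Cheng and Church), the mean squared residual of genes I over conditions (samples) J is H(I, J) = (1 / (|I||J|)) * sum over i in I, j in J of (a_ij - a_iJ - a_Ij + a_IJ)^2, where a_iJ, a_Ij and a_IJ are the row, column and overall means. A sketch that scores one hypothetical gene cluster separately in two conditions; a large change in H suggests differential co-expression.

```python
def mean_squared_residual(matrix):
    """Mean squared residual H(I, J) of a genes-by-samples submatrix.

    Residual of entry a_ij: a_ij - a_iJ - a_Ij + a_IJ, where a_iJ is the row
    (gene) mean, a_Ij the column (sample) mean and a_IJ the overall mean.
    """
    rows, cols = len(matrix), len(matrix[0])
    row_mean = [sum(r) / cols for r in matrix]
    col_mean = [sum(matrix[i][j] for i in range(rows)) / rows for j in range(cols)]
    all_mean = sum(row_mean) / rows
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            res = matrix[i][j] - row_mean[i] - col_mean[j] + all_mean
            total += res * res
    return total / (rows * cols)

# Hypothetical cluster of three genes, four samples per condition.
cond_a = [[1.0, 2.0, 3.0, 4.0],
          [2.0, 3.0, 4.0, 5.0],   # same shape as gene 1, shifted up
          [0.5, 1.5, 2.5, 3.5]]   # same shape as gene 1, shifted down
cond_b = [[1.0, 4.0, 2.0, 3.0],
          [5.0, 2.0, 4.0, 1.0],
          [2.0, 2.0, 5.0, 1.0]]
h_a = mean_squared_residual(cond_a)  # essentially 0: genes move together
h_b = mean_squared_residual(cond_b)  # 1.5: genes no longer move together
print(h_a, h_b, h_b - h_a)
```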
Type (a) cont.
Type (b)
Type (b): identify differentially co-expressed gene pairs.
Techniques: an F-statistic; a meta-analytic approach
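The slide names an F-statistic and a meta-analytic approach without giving formulas. As a simpler swapped-in illustration (neither of those methods), a gene pair can be tested for differential co-expression by comparing its Pearson correlation in two conditions with Fisher's z-transformation. The expression values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def diff_coexpression(x1, y1, x2, y2):
    """Compare corr(x, y) in condition 1 with condition 2.

    Uses Fisher's z = atanh(r), whose standard error is about 1 / sqrt(n - 3)
    per condition. Returns (r1, r2, z statistic, two-sided p-value).
    """
    r1, r2 = pearson(x1, y1), pearson(x2, y2)
    se = math.sqrt(1 / (len(x1) - 3) + 1 / (len(x2) - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / se
    return r1, r2, z, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical gene pair: positively co-expressed in condition 1, inversely in condition 2.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y_cond1 = [1.2, 1.9, 3.4, 3.8, 5.1, 6.3, 6.8, 8.2, 9.1, 9.7]
y_cond2 = [9.5, 8.1, 8.4, 6.2, 5.9, 5.0, 3.1, 3.6, 1.8, 1.2]
r1, r2, z, p = diff_coexpression(x, y_cond1, x, y_cond2)
print(round(r1, 3), round(r2, 3), round(z, 2), p)
```

A low p-value flags the pair as differentially co-expressed; when many pairs are screened, the multiple-testing issue raised for DEGs applies again.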
Note that methods for identifying differentially co-expressed gene clusters or gene pairs usually do not use pre-defined gene sets or pairs. Their results can therefore also benefit from ontology- and pathway-based annotation analysis.
Type (c): the dCoxS (differential co-expression of gene sets) algorithm identifies pairs of gene sets that are differentially co-expressed across conditions.
Biological pathways can be used as the pre-defined gene sets, so that the differential co-expression of pathway pairs between conditions is analyzed.
Type (c) cont.
To measure the expression similarity between paired gene sets under the same condition, dCoxS defines the interaction score (IS) as the correlation coefficient between the sample-wise entropies of the two gene sets. Because IS depends only on sample-wise distances, it can be obtained whether or not the two pathways contain the same number of genes.
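A rough sketch of the interaction-score idea. dCoxS itself measures sample-to-sample differences with an entropy-based measure; here plain Euclidean distance between samples, restricted to each gene set, is swapped in as a simpler stand-in, so this is not the dCoxS implementation. The two hypothetical pathways deliberately have different sizes (three genes and two genes).

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def sample_distances(expr, genes):
    """Distance between every pair of samples, using only `genes`.

    Euclidean distance is a stand-in for the entropy-based measure in dCoxS.
    """
    n_samples = len(expr[genes[0]])
    return [math.sqrt(sum((expr[g][s] - expr[g][t]) ** 2 for g in genes))
            for s, t in combinations(range(n_samples), 2)]

def interaction_score(expr, set_a, set_b):
    """Correlate the two gene sets' sample-pair distance vectors.

    Both vectors have one entry per sample pair, so set sizes may differ.
    """
    return pearson(sample_distances(expr, set_a), sample_distances(expr, set_b))

P1 = ["g1", "g2", "g3"]  # hypothetical pathway with three genes
P2 = ["h1", "h2"]        # hypothetical pathway with two genes
cond1 = {"g1": [1, 2, 3, 4, 5], "g2": [2, 3, 4, 5, 6], "g3": [1, 1, 2, 2, 3],
         "h1": [1, 2, 3, 4, 5], "h2": [5, 6, 7, 8, 9]}
cond2 = dict(cond1, h1=[3, 1, 5, 2, 4], h2=[7, 9, 5, 8, 6])  # P2 pattern reshuffled
is1 = interaction_score(cond1, P1, P2)
is2 = interaction_score(cond2, P1, P2)
print(round(is1, 3), round(is2, 3), round(is1 - is2, 3))
```

Pathway P1 behaves identically in both conditions, but its interaction with P2 drops from strongly positive to negative in condition 2, the kind of change a differential co-expression analysis of pathway pairs is meant to flag.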
Type (c) cont.
Biological Interpretation and Biological Semantics
Biomedical semantics provides rich descriptions of biomedical domain knowledge.
Motivation for biological semantics: GO-based analysis has limitations:
The result is typically a long, unordered list of annotations.
Most analysis tools evaluate only one cluster at a time.
The massive annotation lists are time-consuming to read and hard to assemble manually.
Many annotations are redundant.
Introducing BioLattice:
a mathematical framework based on concept lattice analysis that organizes traditional clusters and their associated annotations into a lattice of concepts
a graphical summary that treats gene expression clusters as objects and annotations as attributes
Thus, complex relations among clusters and annotations are clarified, ordered and visualized.
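In concept lattice analysis (formal concept analysis), a concept is a pair (A, B) of objects A (here, clusters) and attributes B (annotations) such that B is exactly the set of annotations shared by all clusters in A, and A is exactly the set of clusters carrying all of B. A naive sketch that enumerates every concept of a tiny, hypothetical cluster-by-annotation table by closing each attribute subset; it is exponential in the number of attributes and only meant to make the definitions concrete.

```python
from itertools import combinations

# Hypothetical context: cluster -> annotations found enriched in it.
CONTEXT = {
    "C1": {"apoptosis", "p53 pathway"},
    "C2": {"apoptosis", "cell cycle"},
    "C3": {"cell cycle", "DNA repair"},
}
ATTRS = set().union(*CONTEXT.values())

def extent(attrs):
    """Clusters having every annotation in `attrs`."""
    return frozenset(o for o, a in CONTEXT.items() if attrs <= a)

def intent(objs):
    """Annotations shared by every cluster in `objs` (all of them if objs is empty)."""
    shared = set(ATTRS)
    for o in objs:
        shared &= CONTEXT[o]
    return frozenset(shared)

def all_concepts():
    """Every formal concept (extent, intent), found by closing all attribute subsets."""
    concepts = set()
    attrs = sorted(ATTRS)
    for r in range(len(attrs) + 1):
        for subset in combinations(attrs, r):
            objs = extent(set(subset))
            concepts.add((objs, intent(objs)))  # (B', B'') is always a concept
    return concepts

for objs, attrs in sorted(all_concepts(), key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(objs), sorted(attrs))
```

Ordering these concepts by inclusion of their cluster sets gives the lattice; for example, ({C1}, {apoptosis, p53 pathway}) sits below ({C1, C2}, {apoptosis}).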
Another advantage of BioLattice is that heterogeneous biological knowledge resources can be added.
Tool to construct BioLattice (based on the Ganter algorithm): http://www.snubi.org/software/biolattice/
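Ganter's best-known algorithm for enumerating formal concepts is NextClosure, which lists every closed attribute set (concept intent) exactly once, in so-called lectic order, without storing the whole lattice. Assuming that is the algorithm meant here, a compact sketch on a hypothetical cluster-by-annotation table; it is not the BioLattice software's own code.

```python
# Hypothetical context: cluster -> annotations enriched in it.
CONTEXT = {
    "C1": {"apoptosis", "p53 pathway"},
    "C2": {"apoptosis", "cell cycle"},
    "C3": {"cell cycle", "DNA repair"},
}
# NextClosure needs a fixed linear order on attributes; sorted() provides one.
ATTRS = sorted(set().union(*CONTEXT.values()))

def closure(attrs):
    """B'': annotations shared by all clusters that have every annotation in B."""
    shared = set(ATTRS)
    for annotations in CONTEXT.values():
        if attrs <= annotations:
            shared &= annotations
    return frozenset(shared)

def next_closure(current):
    """Return the lectically next closed attribute set after `current`, or None."""
    for i in range(len(ATTRS) - 1, -1, -1):
        m = ATTRS[i]
        if m in current:
            continue
        smaller = frozenset(ATTRS[:i])
        candidate = closure((current & smaller) | {m})
        # Accept only if closing added no attribute that precedes m.
        if candidate & smaller == current & smaller:
            return candidate
    return None  # current is the full attribute set, the last closed set

def all_intents():
    """Enumerate every concept intent once, starting from the closure of {}."""
    intents = []
    current = closure(frozenset())
    while current is not None:
        intents.append(current)
        current = next_closure(current)
    return intents

for b in all_intents():
    clusters = sorted(o for o, a in CONTEXT.items() if b <= a)
    print(clusters, sorted(b))
```

Each intent, paired with the clusters that carry all of its annotations, is one node of the concept lattice.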
Conclusion
Review of major computational approaches to facilitate biological interpretation of high-throughput microarray and RNA-Seq experiments.