Tutorial 7 Gene expression analysis 1. How to interpret an expression matrix Expression data DBs -...


Page 1: Tutorial 7 Gene expression analysis 1. How to interpret an expression matrix Expression data DBs - GEO General clustering methods Unsupervised Clustering.

Tutorial 7

Gene expression analysis

1

Page 2:

Gene expression analysis

• How to interpret an expression matrix

• Expression data DBs - GEO

• General clustering methods: Unsupervised Clustering

• Hierarchical clustering

• K-means clustering

• Tools for clustering - EPCLUST

• Functional analysis - GO annotation

Page 3:

Gene expression data sources

• Microarrays

• RNA-seq experiments

Page 4:

How to interpret an expression data matrix

• Each column represents all the gene expression levels from:

  – In two-color array: a single experiment.

  – In one-color array: a single sample.

• Each row represents the expression of a gene across all experiments.

Exp1/Sample 1   Exp2/Sample 2   Exp3/Sample 3   Exp4/Sample 4   Exp5/Sample 5   Exp6/Sample 6

Gene 1 -1.2 -2.1 -3 -1.5 1.8 2.9

Gene 2 2.7 0.2 -1.1 1.6 -2.2 -1.7

Gene 3 -2.5 1.5 -0.1 -1.1 -1 0.1

Gene 4 2.9 2.6 2.5 -2.3 -0.1 -2.3

Gene 5 0.1 2.6 2.2 2.7 -2.1

Gene 6 -2.9 -1.9 -2.4 -0.1 -1.9 2.9
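The row and column reading of the matrix can be sketched in a few lines of Python. This is only an illustration: the values are the ones shown on the slide, Gene 5 is left out because its row lists only five values, and the helper names gene_profile and sample_profile are made up for this example.

```python
# Expression matrix from the slide: one row per gene, one column per experiment/sample.
# Gene 5 is omitted because its row in the slide lists only five values.
samples = ["Exp1", "Exp2", "Exp3", "Exp4", "Exp5", "Exp6"]
matrix = {
    "Gene 1": [-1.2, -2.1, -3.0, -1.5, 1.8, 2.9],
    "Gene 2": [2.7, 0.2, -1.1, 1.6, -2.2, -1.7],
    "Gene 3": [-2.5, 1.5, -0.1, -1.1, -1.0, 0.1],
    "Gene 4": [2.9, 2.6, 2.5, -2.3, -0.1, -2.3],
    "Gene 6": [-2.9, -1.9, -2.4, -0.1, -1.9, 2.9],
}


def gene_profile(gene):
    """Row: the expression of one gene across all experiments."""
    return dict(zip(samples, matrix[gene]))


def sample_profile(sample):
    """Column: the expression of every gene in one experiment/sample."""
    j = samples.index(sample)
    return {gene: values[j] for gene, values in matrix.items()}


print(gene_profile("Gene 1"))
print(sample_profile("Exp5"))
```

Clustering genes compares rows with each other; clustering samples would compare columns.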

Page 5:

How to interpret an expression data matrix

Each element is a log2-transformed value:

• In two-color array: a log ratio, log2(T/R)

  T - the gene expression level in the test sample

  R - the gene expression level in the reference sample

• In one-color array: log2(X)

  X - the gene expression level in the current sample
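As a small worked example of these definitions, the sketch below computes both kinds of matrix element. The intensity values are hypothetical and the function names are chosen only for this illustration.

```python
import math


def two_color_value(test, reference):
    """log2(T/R): positive when T > R, zero when T == R, negative when T < R."""
    return math.log2(test / reference)


def one_color_value(intensity):
    """log2(X) for the intensity X measured in a single sample."""
    return math.log2(intensity)


# Hypothetical intensities, picked so the results are easy to check by hand.
print(two_color_value(800.0, 200.0))  # T is 4 times R  -> 2.0
print(two_color_value(200.0, 800.0))  # T is a quarter of R -> -2.0
print(one_color_value(1024.0))        # 2**10 -> 10.0
```

The log scale makes up- and down-regulation symmetric: a 4-fold increase and a 4-fold decrease give +2 and -2.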

Exp1/Sample 1   Exp2/Sample 2   Exp3/Sample 3   Exp4/Sample 4   Exp5/Sample 5   Exp6/Sample 6

Gene 1 -1.2 -2.1 -3 -1.5 1.8 2.9

Gene 2 2.7 0.2 -1.1 1.6 -2.2 -1.7

Gene 3 -2.5 1.5 -0.1 -1.1 -1 0.1

Gene 4 2.9 2.6 2.5 -2.3 -0.1 -2.3

Gene 5 0.1 2.6 2.2 2.7 -2.1

Gene 6 -2.9 -1.9 -2.4 -0.1 -1.9 2.9

Page 6:

How to interpret an expression data matrix

In two-color array, the color scale:

• Red indicates a positive log ratio: T>R

• Black indicates a log ratio of zero: T≈R

• Green indicates a negative log ratio: T<R

[Heat map: Gene 1 to Gene 6 (rows) by Samp 1 to Samp 6 (columns)]

In one-color array, the color scale:

• Bright green indicates a high expression value

• Black indicates no expression

[Heat map: Gene 1 to Gene 6 (rows) by Expr.1 to Expr.6 (columns)]
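The two-color scale can be sketched as a mapping from a log ratio to an RGB color: red for positive values (T>R), green for negative values (T<R), black at zero. The function name two_color_rgb and the saturation point max_abs=3.0 are assumptions made for this illustration; real heat-map tools choose their own scaling.

```python
def two_color_rgb(log_ratio, max_abs=3.0):
    """Map a log ratio to an (R, G, B) tuple.

    Positive ratios (T > R) are red, negative ratios (T < R) are green,
    and zero is black. max_abs is an assumed saturation point: any value
    with |log ratio| >= max_abs is drawn at full brightness.
    """
    level = min(abs(log_ratio), max_abs) / max_abs
    brightness = round(255 * level)
    if log_ratio > 0:
        return (brightness, 0, 0)
    if log_ratio < 0:
        return (0, brightness, 0)
    return (0, 0, 0)


for value in (2.9, 0.0, -2.9):
    print(value, two_color_rgb(value))
```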

Page 7:

Microarray Data: Different representations

[Figure: two plots with axes labeled Exp and Log ratio; regions marked T<R and T>R]

Page 8:

How to analyze gene expression data

Page 9:

Expression profiles DBs

• GEO (Gene Expression Omnibus): http://www.ncbi.nlm.nih.gov/geo/

• Human genome browser: http://genome.ucsc.edu/

• ArrayExpress: http://www.ebi.ac.uk/arrayexpress/

Page 10:

The current rate of submission and processing is over 10,000 Samples per month.

In 2002, the Nature journals announced a requirement that microarray data be deposited in public databases.

Page 11:

Searching for expression profiles in GEO

http://www.ncbi.nlm.nih.gov/geo/

Page 12:

GEO accession IDs

GPL**** - platform ID

GSM**** - sample ID

GSE**** - series ID

GDS**** - dataset ID

• A Series record defines a set of related Samples considered to be part of a group.

• A GDS record represents a collection of biologically and statistically comparable GEO samples. Not every experiment has a GDS.
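The accession prefixes above can be recognized mechanically. The following is a minimal sketch, assuming an accession is one of the four prefixes followed by digits; classify_accession is a name made up for this example, and the accession strings used are placeholders, not real records.

```python
import re

# Prefix -> record type, as listed on the slide.
RECORD_TYPES = {"GPL": "platform", "GSM": "sample", "GSE": "series", "GDS": "dataset"}

# Assumed shape of an accession: a known prefix followed by one or more digits.
ACCESSION = re.compile(r"^(GPL|GSM|GSE|GDS)(\d+)$")


def classify_accession(accession):
    """Return the GEO record type for an accession, or None if it does not match."""
    match = ACCESSION.match(accession.strip().upper())
    if match is None:
        return None
    return RECORD_TYPES[match.group(1)]


# Placeholder accessions, for illustration only.
for acc in ("GSE0001", "gds0002", "GPL0003", "GSM0004", "XYZ0005"):
    print(acc, "->", classify_accession(acc))
```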

Page 13:

Download dataset

Clustering

Statistical analysis

Page 14:

Clustering analysis

Page 15:

Clustering analysis – zoom in

Page 16:

Clustering analysis – zoom in

Page 17:

Page 18:

Viewing the expression levels

Page 19:

Viewing the expression levels

Page 20:

Page 21:

Clustering

Grouping together “similar” genes

Page 22:

Clustering

• Unsupervised learning: The classes are unknown a priori and need to be “discovered” from the data.

• Supervised learning: The classes are predefined and the task is to understand the basis for the classification from a set of labeled objects. This information is then used to classify future observations.

http://www.bioconductor.org/help/course-materials/2002/Seattle02/Cluster/cluster.pdf

Page 23:

Unsupervised Clustering

• Hierarchical methods - These methods provide a hierarchy of clusters, from the smallest set, where all objects are in one cluster, through to the largest set, where each observation is in its own cluster.

• Partitioning methods - These usually require the specification of the number of clusters. Then a mechanism for apportioning objects to clusters must be determined.

http://www.bioconductor.org/help/course-materials/2002/Seattle02/Cluster/cluster.pdf

Page 24:

Hierarchical Clustering

This clustering method is based on distances between expression profiles of different genes. Genes with similar expression patterns are grouped together.

Page 25:

Rings a bell?...

• In both phylogenetic trees and clustering, we create a tree based on a distance matrix.

• When computing phylogenetic trees, we compute distances between sequences.

• When computing clustering dendrograms, we compute distances between expression values.

[Figure: the sequence string ATCTGTCCGCTCGATGTGTGCGCTTG with a score, next to the expression values of Gene 1 and Gene 2 across Expr.1 to Expr.6 with a score]

Page 26:

How to determine the similarity between two genes?

Patrik D'haeseleer, How does gene expression clustering work?, Nature Biotechnology 23, 1499 - 1501 (2005) , http://www.nature.com/nbt/journal/v23/n12/full/nbt1205-1499.html
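Two commonly used answers to this question are Euclidean distance and Pearson correlation. The sketch below applies both to Gene 1 and Gene 2 from the example matrix; the choice of these two measures and the helper names are assumptions for illustration, and the slide itself does not commit to a particular measure.

```python
import math

# Gene 1 and Gene 2 from the example expression matrix.
gene1 = [-1.2, -2.1, -3.0, -1.5, 1.8, 2.9]
gene2 = [2.7, 0.2, -1.1, 1.6, -2.2, -1.7]


def euclidean(a, b):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(a, b):
    """Pearson correlation: +1 for the same shape, -1 for the opposite shape."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_distance(a, b):
    """Turn the correlation into a distance: 0 for identical shapes, 2 for opposite."""
    return 1.0 - pearson(a, b)


print("Euclidean distance:  ", euclidean(gene1, gene2))
print("Pearson correlation: ", pearson(gene1, gene2))
print("Correlation distance:", correlation_distance(gene1, gene2))
```

Euclidean distance is sensitive to the magnitude of the values, while correlation compares only the shape of the two profiles, so the two measures can group genes differently.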

Page 27:

Hierarchical clustering methods produce a tree, or dendrogram. They avoid specifying how many clusters are appropriate by providing a partition for each K. The partitions are obtained by cutting the tree at different levels.

[Figure: a dendrogram cut at three levels, giving 2 clusters, 4 clusters, and 6 clusters]
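The idea of one partition per K can be sketched with a small agglomerative procedure: start with every gene in its own cluster and repeatedly merge the two closest clusters. Stopping when K clusters remain gives the same partition as cutting the finished tree at that level. This is a simplified illustration, not the method of any particular tool: Euclidean distance and average linkage are assumed choices, and the profiles are the complete rows of the example matrix (Gene 5 is omitted because its row is incomplete).

```python
import math

profiles = {
    "Gene 1": [-1.2, -2.1, -3.0, -1.5, 1.8, 2.9],
    "Gene 2": [2.7, 0.2, -1.1, 1.6, -2.2, -1.7],
    "Gene 3": [-2.5, 1.5, -0.1, -1.1, -1.0, 0.1],
    "Gene 4": [2.9, 2.6, 2.5, -2.3, -0.1, -2.3],
    "Gene 6": [-2.9, -1.9, -2.4, -0.1, -1.9, 2.9],
}


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2):
    """Mean pairwise distance between the members of two clusters."""
    pairs = [(g, h) for g in c1 for h in c2]
    return sum(euclidean(profiles[g], profiles[h]) for g, h in pairs) / len(pairs)


def cluster(k):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [[gene] for gene in profiles]
    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


for k in (5, 3, 2, 1):
    print(k, "clusters:", cluster(k))
```

Because each step only merges existing clusters, the partitions are nested: every cluster at K=3 lies entirely inside one cluster at K=2.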

Page 28:

The more clusters you ask for, the higher the similarity within each cluster.

http://discoveryexhibition.org/pmwiki.php/Entries/Seo2009

Page 29:

Hierarchical clustering results

http://www.spandidos-publications.com/10.3892/ijo.2012.1644

Page 30:

Unsupervised Clustering – K-means clustering

An algorithm to classify the data into K groups.

[Figure: example with K=4]

Page 31:

How does it work?

The algorithm iteratively divides the genes into K groups and calculates the center of each group. The result is a grouping in which each gene is closest to the center of its own group; it is a local optimum and can depend on the initial choice of centers.

1. k initial "means" (in this case k=3) are randomly selected from the data set (shown in color).

2. k clusters are created by associating every observation with the nearest mean.

3. The centroid of each of the k clusters becomes the new mean.

4. Steps 2 and 3 are repeated until convergence has been reached.
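The four steps can be sketched directly in Python. This is a minimal illustration under stated assumptions: the points are made-up two-dimensional values rather than real expression profiles, Euclidean distance is assumed, and the function names, the fixed random seed, and the max_iter safeguard are choices made for this example.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(vectors):
    """Coordinate-wise mean (centroid) of a non-empty list of vectors."""
    return [sum(column) / len(column) for column in zip(*vectors)]


def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: pick k initial means at random from the data set.
    means = rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # Step 2: assign every observation to the nearest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: euclidean(p, means[c]))
            clusters[nearest].append(p)
        # Step 3: the centroid of each cluster becomes the new mean
        # (an empty cluster keeps its previous mean).
        new_means = [mean_vector(c) if c else means[i] for i, c in enumerate(clusters)]
        # Step 4: stop when the means no longer change (convergence).
        if new_means == means:
            break
        means = new_means
    return means, clusters


# Made-up 2-D points forming three visible groups.
points = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
    [20.0, 0.0], [20.0, 1.0], [21.0, 0.0],
]
means, clusters = kmeans(points, k=3)
for mean, members in zip(means, clusters):
    print(mean, members)
```

Different seeds can lead to different final groupings, which is why k-means is often run several times with different starting means.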

Page 32:

How should we determine K?

• Trial and error

• Take K as the square root of the number of genes
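The square-root rule of thumb is a one-line calculation. The sketch below rounds to the nearest integer and never returns less than 1; the rounding and the function name are assumptions, since the slide does not say how to handle non-square gene counts.

```python
import math


def rule_of_thumb_k(n_genes):
    """Starting guess for K: the square root of the number of genes, rounded."""
    return max(1, round(math.sqrt(n_genes)))


print(rule_of_thumb_k(100))   # 100 genes -> K = 10
print(rule_of_thumb_k(2500))  # 2500 genes -> K = 50
```

The value is only a starting point for the trial-and-error approach, not a guarantee of a good clustering.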

Page 33:

Tool for clustering - EPCLUST

http://www.bioinf.ebc.ee/EP/EP/EPCLUST/

Page 34:

Page 35:

Choose distance metric

Choose algorithm

Page 36:

Hierarchical clustering

Page 37:

Zoom in by clicking on the nodes

Page 38:

Page 39:

K-means clustering

Page 40:

Graphical representation of the cluster

Samples found in cluster

Page 41:

10 clusters, as requested

Page 42:

Now what?

Now that we have clusters, we want to know the function of each group.

There is a need for some kind of generalization of gene functions.

Page 43:

Gene Ontology (GO)

http://www.geneontology.org/

The Gene Ontology project provides an ontology of defined terms representing gene product properties. The ontology covers three domains:

Page 44:

Gene Ontology (GO)

• Cellular Component (CC) - the parts of a cell or its extracellular environment.

• Molecular Function (MF) - the elemental activities of a gene product at the molecular level, such as binding or catalysis.

• Biological Process (BP) - operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms.

Page 45:

The GO tree

Page 46:

GO sources

ISS - Inferred from Sequence/Structural Similarity

IDA - Inferred from Direct Assay

IPI - Inferred from Physical Interaction

TAS - Traceable Author Statement

NAS - Non-traceable Author Statement

IMP - Inferred from Mutant Phenotype

IGI - Inferred from Genetic Interaction

IEP - Inferred from Expression Pattern

IC - Inferred by Curator

ND - No Data available

IEA - Inferred from Electronic Annotation
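The evidence codes are handy as a lookup table, for example to keep only annotations backed by an experiment. The sketch below is illustrative: the grouping of IDA, IPI, IMP, IGI and IEP as experimental codes follows the Gene Ontology evidence-code grouping as far as the codes on this slide go, and the gene and term names in the annotation list are placeholders, not real annotations.

```python
# Evidence codes as listed on the slide.
EVIDENCE_CODES = {
    "ISS": "Inferred from Sequence/Structural Similarity",
    "IDA": "Inferred from Direct Assay",
    "IPI": "Inferred from Physical Interaction",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IEP": "Inferred from Expression Pattern",
    "IC": "Inferred by Curator",
    "ND": "No Data available",
    "IEA": "Inferred from Electronic Annotation",
}

# Codes from the list above that rest on experimental evidence.
EXPERIMENTAL_CODES = {"IDA", "IPI", "IMP", "IGI", "IEP"}


def describe(code):
    """Spell out an evidence code, or report that it is not in the table."""
    return EVIDENCE_CODES.get(code, "unknown evidence code")


def experimental_only(annotations):
    """Keep (gene, term, code) annotations whose code is experimental."""
    return [a for a in annotations if a[2] in EXPERIMENTAL_CODES]


# Placeholder annotations: (gene, GO term, evidence code).
annotations = [
    ("geneA", "term_1", "IDA"),
    ("geneB", "term_1", "IEA"),
    ("geneC", "term_2", "IMP"),
]
print(describe("IEA"))
print(experimental_only(annotations))
```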

Page 47:

Page 48:

DAVID

Functional Annotation Bioinformatics Microarray Analysis

 

• Identify enriched biological themes, particularly GO terms

• Discover enriched functional-related gene/protein groups

• Cluster redundant annotation terms

• Explore gene names in batch

http://david.abcc.ncifcrf.gov/

Page 49:

ID conversion

annotation

classification

Page 50:

Functional annotation

Upload

Genes from your list involved in this category

Charts for each category

Page 51:

Minimum number of genes for corresponding term

Maximum EASE score / E-value

Genes from your list involved in this category

E-Value

Enriched terms associated with your genes

Source of term
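To give a feel for what an enrichment score measures, the sketch below computes a plain one-sided hypergeometric p-value (equivalent to a one-sided Fisher's exact test): the chance of seeing at least the observed number of term-annotated genes in a list drawn at random from the background. This is not DAVID's own calculation; DAVID describes its EASE score as a more conservative modification of Fisher's exact test, which is not reproduced here. All counts and the function name are hypothetical.

```python
from math import comb


def hypergeom_tail(list_hits, list_total, pop_hits, pop_total):
    """P(X >= list_hits) when list_total genes are drawn at random from a
    background of pop_total genes, pop_hits of which carry the term."""
    upper = min(list_total, pop_hits)
    favourable = sum(
        comb(pop_hits, x) * comb(pop_total - pop_hits, list_total - x)
        for x in range(list_hits, upper + 1)
    )
    return favourable / comb(pop_total, list_total)


# Hypothetical counts: 8 of 50 list genes carry a term that annotates
# 200 of 10000 background genes (1 hit would be expected by chance).
p_value = hypergeom_tail(list_hits=8, list_total=50, pop_hits=200, pop_total=10000)
print(p_value)
```

A small p-value means the term appears in the gene list more often than expected by chance; the thresholds on the slide (minimum gene count, maximum score) filter which terms are reported.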

Page 52:

A group of terms having similar biological meaning due to sharing similar gene members

Page 53:

Gene expression analysis

• How to interpret an expression matrix

• Expression data DBs - GEO

• General clustering methods: Unsupervised Clustering

• Hierarchical clustering

• K-means clustering

• Tools for clustering - EPCLUST

• Functional analysis - GO annotation