
Tutorial 8: Gene expression analysis

Outline:
- How to interpret an expression matrix
- Expression data DBs: GEO
- Clustering: hierarchical clustering, K-means clustering
- Tools for clustering: EPCLUST
- Functional analysis: GO annotation, DAVID

Gene expression data sources: microarrays and RNA-seq experiments.

How to interpret an expression data matrix

Each column holds the expression levels of all genes in a single sample. Each row holds the expression of a single gene across all experiments.

[Figure: an expression matrix with one row per gene and columns Sample 1 to Sample 6]

Raw data pre-processing

Raw data are the values we get from the microarray or the sequencer. "Raw values" is a general term for the measurements made by an instrument. In microarrays, the raw data are probe intensities. In sequencing, the raw data are the sequenced reads, which are later summarized as counts per gene.

Raw data almost always need some processing before they reach adequate quality and carry biological meaning. For example, raw high-throughput sequencing data are the sequenced reads. These must be mapped to the genome and possibly filtered, and only then is variant calling done.

Expression profile DBs

GEO (Gene Expression Omnibus), the Human genome browser, and ArrayExpress.

The current rate of submission and processing is over 10,000 samples per month. In 2002, Nature journals announced a requirement to deposit microarray data in public databases.

Searching for expression profiles in GEO

GEO accession IDs:
- GPL****: platform ID
- GSM****: sample ID
- GSE****: series ID
- GDS****: dataset ID

A Series (GSE) record defines a set of related samples considered to be part of a group. A GDS record represents a collection of biologically and statistically comparable GEO samples. Not every experiment has a GDS.
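Here is a minimal Python sketch of how the accession prefixes above map to GEO record types. It recognizes only the four prefixes listed in this tutorial, each followed by digits. The function name `geo_record_type` is hypothetical. The example IDs are illustrative and are not meant to point at specific records.

```python
import re

# The four accession prefixes listed in this tutorial and what they identify.
GEO_PREFIXES = {
    "GPL": "platform",
    "GSM": "sample",
    "GSE": "series",
    "GDS": "dataset",
}

# A known prefix followed by one or more digits, e.g. "GSE2034".
_ACCESSION_RE = re.compile(r"^(GPL|GSM|GSE|GDS)(\d+)$")


def geo_record_type(accession: str) -> str:
    """Return 'platform', 'sample', 'series' or 'dataset' for a GEO accession.

    Raises ValueError if the string is not a GPL/GSM/GSE/GDS accession.
    """
    match = _ACCESSION_RE.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a GPL/GSM/GSE/GDS accession: {accession!r}")
    return GEO_PREFIXES[match.group(1)]


if __name__ == "__main__":
    # Illustrative IDs only.
    for acc in ["GPL570", "GSM12345", "GSE2034", "GDS1234"]:
        print(acc, "->", geo_record_type(acc))
```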
GDS record options: Download dataset, Clustering, Statistical analysis.

The raw data (SOFT file) contains the probes, the gene names, the expression values per sample (GSM), and the gene annotations.

[Screenshots: clustering analysis, zoomed-in views of the clusters, and viewing the expression levels]

Clustering

Clustering groups together genes with a similar expression signature. This clustering method is based on distances between the expression profiles of different genes, so genes with similar expression patterns are grouped together.

Hierarchical clustering

Both phylogenetic trees and clustering build a tree from a distance matrix. For phylogenetic trees, we compute distances between sequences (for example, a score comparing ATCTGTCCGCTCG with ATGTGTGCGCTTG). For clustering dendrograms, we compute distances between expression values (for example, between the profiles of Gene 1 and Gene 2 across Expr.1 to Expr.6). Rings a bell?

Hierarchical clustering methods produce a tree, or dendrogram. They avoid having to specify in advance how many clusters are appropriate. Instead, partitions are obtained by cutting the tree at different levels.

[Figure: the same dendrogram cut into 2, 4, and 6 clusters]

The more clusters you cut the tree into, the higher the similarity within each cluster.

[Figure: hierarchical clustering results (source: Seo2009)]

You can cluster both samples and genes (separately).

K-means clustering

K-means is an unsupervised clustering algorithm that classifies the data into K groups.

[Figure: example with K = 4]

How does it work? The algorithm iteratively divides the genes into K groups and recomputes the center of each group. The result is K clusters whose genes lie close to their cluster centers. Because the initial means are chosen at random, this is a local optimum, and different starts can give different clusters.

1. K initial "means" (in this case K = 3) are randomly selected from the data set (shown in color).
2. K clusters are created by associating every observation with the nearest mean.
3. The centroid of each of the K clusters becomes the new mean.
4. Steps 2 and 3 are repeated until convergence has been reached.

How should we determine K?
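Before turning to that question, here is a minimal Python sketch of the four K-means steps above, with K passed in as a parameter. Some assumptions are not from the slides:
- Each gene is a list of expression values across samples.
- Distances are Euclidean; EPclust, for instance, lets you choose other metrics.
- The random start is seeded so that runs are reproducible.

The names `k_means`, `euclidean`, and `centroid` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# One expression profile: a gene's expression values across samples.
Profile = Sequence[float]


def euclidean(a: Profile, b: Profile) -> float:
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(profiles: List[Profile]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of profiles."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]


def k_means(profiles: List[Profile], k: int, max_iter: int = 100,
            seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Cluster gene expression profiles into k groups.

    Returns (labels, means): labels[i] is the cluster index of profiles[i],
    and means[c] is the centroid of cluster c.
    """
    rng = random.Random(seed)
    # Step 1: choose k observations at random as the initial means.
    means = [list(p) for p in rng.sample(profiles, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Step 2: associate every observation with its nearest mean.
        new_labels = [min(range(k), key=lambda c: euclidean(p, means[c]))
                      for p in profiles]
        # Step 4: converged once no observation changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: the centroid of each cluster becomes its new mean.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the previous mean if a cluster is empty
                means[c] = centroid(members)
    return labels, means


if __name__ == "__main__":
    # Illustrative toy data: two low-expression and two high-expression genes.
    genes = [
        [1.0, 1.2, 0.9],
        [1.1, 1.0, 1.0],
        [8.0, 8.2, 7.9],
        [7.9, 8.1, 8.0],
    ]
    labels, means = k_means(genes, k=2)
    print(labels)
```

On this toy data, every choice of starting means ends with the first two genes in one cluster and the last two in the other. On real data, it is common to rerun with several seeds, because each run only reaches a local optimum.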
Trial and error, or take K as the square root of the number of genes.

Tool for clustering: EPclust

[Screenshots: choose a distance metric; choose an algorithm]

Hierarchical clustering: zoom in by clicking on the nodes.

K-means clustering: a graphical representation of each cluster and the samples found in it. Here there are 10 clusters, as requested.

Now that we have clusters, we want to know the function of each group. This requires some kind of generalization of gene functions. Now what?

Gene Ontology (GO)

The Gene Ontology project provides an ontology of defined terms representing gene product properties. The ontology covers three domains:
- Cellular Component (CC): the parts of a cell or its extracellular environment.
- Molecular Function (MF): the elemental activities of a gene product at the molecular level, such as binding or catalysis.
- Biological Process (BP): operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms.

[Figure: the GO tree, a partial example]

DAVID: Functional Annotation Bioinformatics Microarray Analysis
- Identify enriched biological themes, particularly GO terms
- Discover enriched functionally related gene/protein groups
- ID conversion
- Annotation

Functional annotation: upload

Upload the gene list you want to explore (for example, all the genes in a certain cluster). What is the identifier?
Probes, gene names, or gene IDs. You can supply a background list as well.

Functional annotation: results

Different kinds of enrichment are calculated. For each category, the results show the genes from your list involved in that category, with a chart for each category.

For each enriched term associated with your genes, the chart shows:
- the source of the term
- the genes from your list involved in this category
- the P-value
- the adjusted P-value

Two thresholds control which terms are listed: the minimum number of genes for the corresponding term, and the maximum EASE score / E-value.
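DAVID computes these P-values for you. To show what such a number measures, here is a minimal Python sketch of one standard over-representation test, the one-sided hypergeometric (Fisher exact) test. It also applies a Benjamini-Hochberg adjustment across many terms.

This is an assumption about the method, not DAVID's code. As we understand DAVID's documentation, the EASE score is a modified Fisher exact P-value that subtracts one gene from the list's count in the term. The `ease=True` option below mimics that reading. Which correction lies behind an "Adjusted P-Value" column depends on the tool; Benjamini-Hochberg is used here as one common choice. All function and parameter names are hypothetical.

```python
from math import comb
from typing import List


def enrichment_p_value(list_hits: int, list_size: int, term_size: int,
                       background_size: int, ease: bool = False) -> float:
    """One-sided hypergeometric (Fisher exact) P-value for over-representation.

    list_hits:       genes from your list annotated with the term
    list_size:       genes in your list
    term_size:       background genes annotated with the term
    background_size: genes in the background list
    ease:            if True, subtract one from list_hits first (a sketch of
                     the EASE-score modification, not DAVID's own code)
    """
    a = list_hits - 1 if ease else list_hits
    if a <= 0:
        return 1.0  # P(X >= 0) is always 1
    total = comb(background_size, list_size)
    # Sum P(X = x) over every achievable x >= a.
    tail = sum(
        comb(term_size, x) * comb(background_size - term_size, list_size - x)
        for x in range(a, min(list_size, term_size) + 1)
    )
    return tail / total


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Illustrative numbers only: 15 of 200 list genes carry a term that
    # annotates 300 of 20,000 background genes.
    p = enrichment_p_value(15, 200, 300, 20000)
    p_ease = enrichment_p_value(15, 200, 300, 20000, ease=True)
    print(p, p_ease)
    print(benjamini_hochberg([p, 0.04, 0.2]))
```

The background list matters here: `background_size` and `term_size` come from it, so a different background changes the P-value even for the same gene list.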