Gene ontology & hypergeometric test Simon Rasmussen CBS - DTU.

Date posted: 19-Dec-2015

Page 1:

Gene ontology & hypergeometric test

Simon Rasmussen

CBS - DTU

Page 2:

The DNA Microarray Analysis Pipeline

• Question/hypothesis

• Experimental design

• Array design and probe design, or buy a standard chip / array

• Sample preparation and hybridization

• Image analysis

• Normalization

• Expression index calculation

• Comparable gene expression data

• Statistical analysis; fit to model (time series)

• Advanced data analysis
– Clustering, PCA, gene annotation analysis, promoter analysis
– Classification, meta analysis, survival analysis, regulatory network

Page 3:

Gene Ontology

• Gene Ontology (GO) is a collection of controlled vocabularies describing the biology of a gene product in any organism

• Very useful for interpreting the biological function of microarray data

• Organized in 3 independent sets of ontologies in a tree structure
– Molecular function (MF)
– Biological process (BP)
– Cellular component (CC)

Page 4:

Tree structure

• Controlled, networked terms (about 25,000 in total)
– Parent / child network organized as a tree
– Terms get more detailed as you move down the network

Page 5:

Relationship

• A gene can be
– present in any of the ontologies (MF / BP / CC)
– a member of several GO terms

• True path rule
– If a gene is a member of a term, it is also a member of the term's parents
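
The true path rule above can be illustrated with a small sketch. It assumes a toy ontology stored as a dictionary from each term to its direct parents; the term names and the helper names `ancestors` and `apply_true_path_rule` are hypothetical and are not taken from the real GO graph or from any GO library.

```python
from collections import defaultdict

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {
    "DNA replication": ["DNA metabolic process"],
    "DNA metabolic process": ["biological process"],
    "biological process": [],
}

def ancestors(term, parents):
    """Return the term together with every term reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

def apply_true_path_rule(direct_annotations, parents):
    """Expand each gene's direct terms to include all ancestor terms."""
    expanded = defaultdict(set)
    for gene, terms in direct_annotations.items():
        for term in terms:
            expanded[gene] |= ancestors(term, parents)
    return dict(expanded)

# A gene annotated only to the most specific term ends up in all three terms.
annotations = apply_true_path_rule({"geneA": {"DNA replication"}}, PARENTS)
```

Propagating annotations this way before counting means that a general term's gene count includes the genes of all its more specific descendants.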

Page 6:

GO Tree example

• Visit www.geneontology.org for more information

Page 7:

KEGG

• KEGG PATHWAYS:
– Manually drawn pathway maps representing our knowledge of the molecular interaction and reaction networks, for a large selection of organisms

• Pathway categories:
– 1. Metabolism
– 2. Genetic Information Processing
– 3. Environmental Information Processing
– 4. Cellular Processes
– 5. Human Diseases
– 6. Drug Development

• Other pathway database: Reactome

Page 8:

KEGG example

Page 9:

Using Gene ontology

• Input: any list of genes, e.g. from a microarray experiment
– Cluster of genes with similar expression
– Up/down regulated genes

• Question we ask:
– Are any GO terms overrepresented in the gene list, compared to what would happen by chance?

• Method
– Hypergeometric testing
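
As a rough sketch of how the question above turns into a calculation, the code below counts, for each term, how many genes from the list fall in that term and computes the chance of seeing at least that many under random sampling without replacement. The gene identifiers, term names, and the function names `upper_tail` and `go_enrichment` are hypothetical, and the background is assumed to be the set of all genes measured on the array.

```python
from math import comb

def upper_tail(k, population, successes, draws):
    """P(X >= k) for a hypergeometric X (sampling without replacement)."""
    total = comb(population, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, min(successes, draws) + 1)
    )
    return hits / total

def go_enrichment(gene_list, background, term_to_genes):
    """Return {term: (overlap, p_value)} for every term in term_to_genes."""
    background = set(background)
    gene_list = set(gene_list) & background
    results = {}
    for term, members in term_to_genes.items():
        members = set(members) & background
        overlap = len(gene_list & members)
        p = upper_tail(overlap, len(background), len(members), len(gene_list))
        results[term] = (overlap, p)
    return results

# Hypothetical toy data: 20 background genes, 4 of them in the gene list.
background = {f"g{i}" for i in range(20)}
term_to_genes = {
    "term_A": {"g0", "g1", "g2"},
    "term_B": {"g10", "g11", "g12", "g13", "g14"},
}
results = go_enrichment({"g0", "g1", "g2", "g10"}, background, term_to_genes)
```

In this toy data all three `term_A` genes are in the list, so `term_A` gets a much smaller p-value than `term_B`. In practice many terms are tested at once, so some form of multiple-testing correction is usually applied to the resulting p-values; that step is not shown here.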

Page 10:

Hypergeometric test

• The hypergeometric distribution arises from sampling without replacement from a fixed population.

• We want to calculate the probability of drawing 7 or more white balls in a sample of 10 balls, given the distribution of balls in the urn
– Urn: 20 white balls out of 100 balls
– Sample: 10 balls
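
The urn calculation can be sketched directly from binomial coefficients. The numbers (20 white balls, 100 balls in total, 10 drawn, 7 or more white) come from the slide; the constant and function names are illustrative only.

```python
from math import comb

WHITE, TOTAL, DRAWN = 20, 100, 10  # urn contents and sample size from the slide

def pmf(k):
    """Probability of drawing exactly k white balls without replacement."""
    return comb(WHITE, k) * comb(TOTAL - WHITE, DRAWN - k) / comb(TOTAL, DRAWN)

# "7 or more" means summing the exact probabilities for k = 7, 8, 9, 10.
p_seven_or_more = sum(pmf(k) for k in range(7, DRAWN + 1))
print(f"P(X >= 7) = {p_seven_or_more:.3g}")
```

Each term counts the ways to pick k white balls and 10 - k non-white balls, divided by the number of ways to pick any 10 balls from the urn.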

Page 11:

Example

• List of 80 significant genes from a microarray experiment of yeast (~6000 genes)

• 10 of the 80 genes are in the BP GO term "DNA replication"
– Total number of yeast genes in the GO term is 100

• What is the probability of this occurring by chance?
– Urn: 100 white balls out of 6000 balls
– Sample: 80 balls in total, 10 white and 70 non-white

• p = 6.2 × 10^-8

• The GO term DNA replication is overrepresented in our list
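
In urn terms, the example maps to a population of 6000 balls, 100 of them white, with 80 drawn and 10 white observed. A sketch of the corresponding tail sum, assuming the tail includes the observed count (10 or more), is:

```latex
\[
P(X \ge 10) \;=\; \sum_{k=10}^{80}
  \frac{\binom{100}{k}\,\binom{5900}{80-k}}{\binom{6000}{80}}
\]
```

Some software reports the strictly greater tail, P(X > 10), which starts the sum at k = 11 and gives a smaller value. The slide does not say which convention produced the p-value above, so check the convention before comparing numbers.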