Comparative genomics

Comparative genomics Haixu Tang School of Informatics

Page 1: Comparative genomics

Comparative genomics

Haixu Tang

School of Informatics

Page 2: Comparative genomics

WGS of human genome

• 2001 Two assemblies of the initial human genome sequence published

– International Human Genome Project

– Celera Genomics: WGS approach

Page 3: Comparative genomics

Model organisms

• 1995 Haemophilus influenzae sequenced

• 1997 E. coli sequenced

• 1998 Complete sequence of the Caenorhabditis elegans genome

• 2000 Complete sequence of the euchromatic portion of the Drosophila melanogaster genome

Page 4: Comparative genomics

Why model organisms?

• Testing and improvements of genome sequencing technology and strategy

Page 5: Comparative genomics

Model organisms

• 1993 Whole genome shotgun sequencing proposed (J. C. Venter)

• 1995 Haemophilus influenzae sequenced, ~1.5-2 Mbp

• 1995 Automated fluorescent sequencing instruments and robotic operations (PerkinElmer, Inc.)

• 1996 Yeast sequenced

• 1996 Double-barrelled sequencing

• 1997 E. coli sequenced, ~4 Mbp

• 1998 Complete sequence of the Caenorhabditis elegans genome, ~100 Mbp

• 1998 Whole genome shotgun sequencing (Weber & Myers)

• 2000 Complete sequence of the euchromatic portion of the Drosophila melanogaster genome, ~180 Mbp

Page 6: Comparative genomics

Why model organisms?

• Testing and improvements of genome sequencing technology and strategy

• Model organisms have important biological implications themselves.

Page 7: Comparative genomics

Model organisms

• 1995 Haemophilus influenzae sequenced (infectious disease)

• 1996 Yeast sequenced (industry and biology)

• 1997 E. coli sequenced (industry and biotechnology)

• 1998 Complete sequence of the Caenorhabditis elegans genome (multi-cellular organism, development)

• 2000 Complete sequence of the euchromatic portion of the Drosophila melanogaster genome (genetics, entomology)

Page 8: Comparative genomics

Why model organisms?

• Testing and improvements of genome sequencing technology and strategy.

• Model organisms have important biological implications themselves.

• Genome sequences provide useful information to study genome function and evolution.

Page 9: Comparative genomics

Model organisms

• 1995 Haemophilus influenzae sequenced (bacterial)

• 1996 Yeast sequenced (uni-cellular)

• 1997 E. coli sequenced (bacterial)

• 1998 Complete sequence of the Caenorhabditis elegans genome (multi-cellular organism, nematode)

• 2000 Complete sequence of the euchromatic portion of the Drosophila melanogaster genome (multi-cellular organism, insect)

Page 10: Comparative genomics

Model mammalian and vertebrate genomes

• 2001 Human genome

• 2002 Mouse genome

– Initial sequencing and comparative analysis of the mouse genome

• 2003 Rat genome

• 2004 Chicken genome (first bird)

• 2005 Chimpanzee genome

Page 11: Comparative genomics

Comparative genomics

• Solving biological problems by comparing genomic sequences

– Function of genes and genomes

– Evolution of genes and genomes

• Data-driven approaches

– Computational methods are the core

Page 12: Comparative genomics

Which genomes to sequence?

• Species having important biological applications

• For comparative genomics studies

– Functional consideration

  • Evolutionarily divergent genomes → conserved elements, e.g. human vs. mouse (~75% identical)

  • Evolutionarily close genomes → divergent elements, e.g. human vs. chimpanzee (98.4% identical)

– Evolutionary consideration

  • Specific evolutionary puzzles, e.g. whole genome duplications in yeast
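The identity percentages quoted above are typically derived from aligned sequences. The slides do not say how the ~75% and 98.4% figures were computed, so the following is only a minimal sketch of one common definition (matching columns divided by gap-free aligned columns); the function name and the toy sequences are illustrative assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of gap-free alignment columns in which both sequences agree.

    Assumes the two strings are rows of an existing pairwise alignment,
    so they must have the same length. Other definitions of identity
    (e.g. counting gap columns in the denominator) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy alignment (hypothetical sequences, not real human/mouse data).
print(percent_identity("ACGT-ACGT", "ACGTTACGA"))
```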

Page 13: Comparative genomics

Ongoing eukaryotic genome projects

• http://igweb.integratedgenomics.com/ERGO_supplement/genomes_eukarya.html

• >20 yeasts, insects (12 Drosophila, 2 mosquitoes, silkworm), flea, sea urchin, frog, fish (zebrafish, Fugu), mammals (mouse, rat, dog, cow, pig, monkey, etc.), plants (Arabidopsis, rice (>2), maize, etc.)

Page 14: Comparative genomics

Comparative genomics: case studies

• Gene function and evolution

• Gene-gene relationship

• Genome evolution

Page 15: Comparative genomics

Homologue relationships of genes

• Orthologues: any pairwise gene relation where the ancestor node (the last common ancestor of the two genes in the gene tree) is a speciation event

• Paralogues: any pairwise gene relation where the ancestor node is a duplication event
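The two definitions above can be read as a rule on a gene tree: find the last common ancestor of the two genes and look at the event that node represents. The sketch below assumes a tree whose internal nodes are already labelled "speciation" or "duplication"; the `Node` class, the function names, and the toy tree are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    event: Optional[str] = None  # "speciation" or "duplication"; None for leaves
    children: List["Node"] = field(default_factory=list)


def path_to(node: Node, leaf_label: str) -> Optional[List[Node]]:
    """Return the list of nodes from `node` down to the named leaf, or None."""
    if not node.children:
        return [node] if node.label == leaf_label else None
    for child in node.children:
        sub = path_to(child, leaf_label)
        if sub is not None:
            return [node] + sub
    return None


def relation(root: Node, gene_a: str, gene_b: str) -> str:
    """Classify two distinct leaf genes by the event at their last common ancestor."""
    if gene_a == gene_b:
        raise ValueError("need two different genes")
    path_a, path_b = path_to(root, gene_a), path_to(root, gene_b)
    if path_a is None or path_b is None:
        raise KeyError("gene not found in tree")
    lca = None
    for x, y in zip(path_a, path_b):
        if x is y:
            lca = x  # still on the shared part of both root-to-leaf paths
        else:
            break
    if lca.event == "speciation":
        return "orthologues"
    if lca.event == "duplication":
        return "paralogues"
    raise ValueError("internal node lacks a speciation/duplication label")


# Toy tree: one duplication before a human/mouse speciation, giving two
# gene copies per species (H1, M1 from copy 1; H2, M2 from copy 2).
tree = Node("root", "duplication", [
    Node("copy1", "speciation", [Node("H1"), Node("M1")]),
    Node("copy2", "speciation", [Node("H2"), Node("M2")]),
])
print(relation(tree, "H1", "M1"))  # last common ancestor is a speciation
print(relation(tree, "H1", "M2"))  # last common ancestor is the duplication
```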

Page 16: Comparative genomics

Homologue relationships

[Figure: gene tree drawn along a time axis. Node labels: A, A1, A2, M1, M2, M2', H1, H2. Event labels: duplication, speciation. Relationship labels: inparalogues, outparalogues, orthologues.]

Page 17: Comparative genomics

Functional implications

• Orthologous genes → same function in different species

• Paralogous genes → different functions

Page 18: Comparative genomics

Yeast species

[Figure: phylogenetic tree of yeast and other fungal species: cerevisiae, paradoxus, mikatae, bayanus, glabrata, castellii, lactis, gossypii, waltii, hansenii, albicans, lipolytica, crassa, graminearum, grisea, nidulans, pombe; time labels ~5M and ~20M]

• 5-20 million years of divergence

• Sufficient conservation to align

• Sufficient divergence to identify conserved functional elements

Page 19: Comparative genomics

Large scale genome evolution

• Most genes have a clear match

• Clear blocks of synteny
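One simple way to make "blocks of synteny" concrete is to look for runs of matched genes that are consecutive in both genomes. The sketch below assumes one-to-one gene matches are already given as pairs of gene-order indices; the function name, the `min_size` parameter, and the toy data are illustrative assumptions, and real synteny detection tolerates gaps and local shuffling that this version does not.

```python
from typing import List, Tuple

Match = Tuple[int, int]  # (gene index in genome A, gene index in genome B)


def synteny_blocks(matches: List[Match], min_size: int = 2) -> List[List[Match]]:
    """Group matches into maximal runs that are adjacent in both genomes.

    A run may go in the same direction in genome B (+1 steps) or be
    inverted (-1 steps), but may not switch direction midway.
    """
    blocks: List[List[Match]] = []
    current: List[Match] = []
    direction = 0  # 0 = not yet known, +1 = collinear, -1 = inverted
    for pair in sorted(matches):
        if current:
            step_a = pair[0] - current[-1][0]
            step_b = pair[1] - current[-1][1]
            if step_a == 1 and step_b in (1, -1) and direction in (0, step_b):
                direction = step_b
                current.append(pair)
                continue
            if len(current) >= min_size:
                blocks.append(current)
        current = [pair]
        direction = 0
    if len(current) >= min_size:
        blocks.append(current)
    return blocks


# Toy data: a collinear run, an inverted run, and one isolated match.
toy_matches = [(0, 10), (1, 11), (2, 12), (3, 5), (4, 4), (5, 3), (6, 20)]
for block in synteny_blocks(toy_matches):
    print(block)
```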

Page 20: Comparative genomics
Page 21: Comparative genomics

Human–chimpanzee comparisons

• Positive selection: a sequence change in a species that results in increased fitness is subject to positive selection. As a consequence, the change normally becomes fixed, leading to adaptive evolution of that species.

Page 22: Comparative genomics

Genome vs. Genes

• A whole genome sequence reveals not only which genes exist in a genome, but also which genes are absent from it (deleted).

Page 23: Comparative genomics

Phylogenetic profile analysis

• A non-homology-based approach to gene function prediction

• The phylogenetic profile of a gene is a string encoding the presence or absence of the gene in every sequenced genome

• The phylogenetic profiles of genes involved in the same biological process are often "similar", since the genes may co-evolve.

Page 24: Comparative genomics

Phylogenetic profile analysis

• Phylogenetic profile (against N genomes)

– For each gene X in a target genome (e.g., E. coli), build a phylogenetic profile as follows

– If gene X has a homolog in genome #i, the ith bit of X's phylogenetic profile is "1"; otherwise it is "0"
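The construction above can be sketched in a few lines. This assumes homolog detection (for example by sequence search) has already been done and summarised as, for each genome, the set of target genes that have a homolog there; the function name, the dictionary layout, and all gene and genome names are hypothetical.

```python
from typing import Dict, List, Set


def phylogenetic_profile(gene: str, genomes: List[str],
                         homologs: Dict[str, Set[str]]) -> str:
    """Return a bit string whose ith character is '1' if `gene` has a
    homolog in genomes[i] and '0' otherwise."""
    return "".join("1" if gene in homologs.get(g, set()) else "0"
                   for g in genomes)


# Hypothetical toy input: N = 4 reference genomes, two target genes.
genomes = ["genome_1", "genome_2", "genome_3", "genome_4"]
homologs = {
    "genome_1": {"geneX", "geneY"},
    "genome_2": {"geneY"},
    "genome_3": {"geneX"},
    "genome_4": {"geneX", "geneY"},
}

profiles = {g: phylogenetic_profile(g, genomes, homologs)
            for g in ("geneX", "geneY")}
print(profiles)  # {'geneX': '1011', 'geneY': '1101'}
```

The order of `genomes` fixes the meaning of each bit, so the same list has to be used for every gene being compared.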

Page 25: Comparative genomics

Phylogenetic profile analysis

• Example: phylogenetic profiles based on 89 genomes

orf1034:1110110110010111110100010100000000111100011111110110111010101
orf1036:1011110001000001010000010010000000010111101110011011010000101
orf1037:1101100110000001110010000111111001101111101011101111000010100
orf1038:1110100110010010110010011100000101110101101111111111110000101
orf1039:1111111111111111111111111111111111111111101111111111111111101
orf104: 1000101000000000000000101000000000110000000000000100101000100
orf1040:1110111111111101111101111100000111111100111111110110111111101
orf1041:1111111111111111110111111111111101111111101111111111111111101
orf1042:1110100101010010010110000100001001111110111110101101100010101
orf1043:1110100110010000010100111100100001111110101111011101000010101
orf1044:1111100111110010010111010111111001111111111111101101100010101
orf1045:1111110110110011111111111111111101111111101111111111110010101
orf1046:0101100000010001011000000111110000010100000001010010100000000
orf1047:0000000000000001000010000001000100000000000000010000000000000
orf105: 0110110110100010111101101010111001101100101111100010000010001
orf1054:0100100110000001100001000100000000100100100001000100100000000

Genes with similar phylogenetic profiles tend to have related functions or to be functionally linked (D. Eisenberg and colleagues, 1999)
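"Similar" needs a distance measure. The slides do not name one, so the sketch below assumes Hamming distance (the number of genomes in which two profiles disagree) purely for illustration; the function names and the short toy profiles are hypothetical, and other measures such as mutual information are also used in practice.

```python
from typing import Dict, List, Tuple


def hamming(p: str, q: str) -> int:
    """Number of positions at which two equal-length profiles differ."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    return sum(a != b for a, b in zip(p, q))


def most_similar(query: str, profiles: Dict[str, str],
                 k: int = 3) -> List[Tuple[int, str]]:
    """Return the k genes whose profiles are closest to the query gene's,
    as (distance, gene) pairs sorted by distance and then by name."""
    ranked = sorted((hamming(profiles[query], prof), name)
                    for name, prof in profiles.items() if name != query)
    return ranked[:k]


# Toy profiles over six genomes (hypothetical, not the slide's data).
toy_profiles = {
    "geneA": "110101",
    "geneB": "110100",
    "geneC": "001010",
    "geneD": "111101",
}
print(most_similar("geneA", toy_profiles, k=2))  # [(1, 'geneB'), (1, 'geneD')]
```

Genes that land near the top of such a ranking are candidates for a functional link, not confirmed partners; profiles that are nearly all ones or all zeros carry little information either way.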

Page 26: Comparative genomics

Genome evolution

• Genome rearrangement

• Whole genome duplication

Page 27: Comparative genomics

Turnip vs Cabbage: Look and Taste Different

• Although cabbages and turnips share a recent common ancestor, they look and taste different

Page 28: Comparative genomics

Turnip vs Cabbage: Comparing Gene Sequences Yields No Evolutionary Information

Page 29: Comparative genomics

Turnip vs Cabbage: Different mtDNA Gene Order

• Gene order comparison [figure: mtDNA gene order, panels labelled Before and After]

Evolution is manifested as the divergence in gene order

Page 30: Comparative genomics

Comparative Genomic Architecture of Human and Mouse Genomes

To locate where the corresponding gene lies in humans, the relative architectures of the human and mouse genomes were analyzed.

Page 31: Comparative genomics

Types of Rearrangements

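(Genes are numbered, a minus sign marks reversed orientation, and | separates two chromosomes.)

Reversal: 1 2 3 4 5 6 → 1 2 -5 -4 -3 6

Translocation: 1 2 3 | 4 5 6 → 1 2 6 | 4 5 3

Fusion: 1 2 3 | 4 5 6 → 1 2 3 4 5 6

Fission: 1 2 3 4 5 6 → 1 2 3 | 4 5 6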
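The four rearrangement types on this slide can be written as small operations on signed gene lists, where each chromosome is a list of integers and a negative sign marks reversed orientation. This is a minimal sketch; the function names and the index conventions (Python slice positions) are assumptions made for illustration.

```python
from typing import List, Tuple

Chromosome = List[int]  # signed gene numbers; the sign encodes orientation


def reversal(chrom: Chromosome, i: int, j: int) -> Chromosome:
    """Reverse the segment chrom[i:j] and flip the orientation of its genes."""
    return chrom[:i] + [-g for g in reversed(chrom[i:j])] + chrom[j:]


def translocation(a: Chromosome, b: Chromosome,
                  i: int, j: int) -> Tuple[Chromosome, Chromosome]:
    """Exchange the tails of two chromosomes, cutting a at i and b at j."""
    return a[:i] + b[j:], b[:j] + a[i:]


def fusion(a: Chromosome, b: Chromosome) -> Chromosome:
    """Join two chromosomes end to end."""
    return a + b


def fission(chrom: Chromosome, i: int) -> Tuple[Chromosome, Chromosome]:
    """Split one chromosome into two at position i."""
    return chrom[:i], chrom[i:]


print(reversal([1, 2, 3, 4, 5, 6], 2, 5))          # [1, 2, -5, -4, -3, 6]
print(translocation([1, 2, 3], [4, 5, 6], 2, 2))   # ([1, 2, 6], [4, 5, 3])
print(fusion([1, 2, 3], [4, 5, 6]))                # [1, 2, 3, 4, 5, 6]
print(fission([1, 2, 3, 4, 5, 6], 3))              # ([1, 2, 3], [4, 5, 6])
```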

Page 32: Comparative genomics

Comparative Genomic Architectures: Mouse vs Human Genome

• Humans and mice have similar genomes, but their genes are ordered differently

• ~245 rearrangements

– Reversals

– Fusions

– Fissions

– Translocations

Page 33: Comparative genomics

Hypothesis (1997): Whole Genome Duplication

[Figure: yeast species tree (cerevisiae, paradoxus, mikatae, bayanus, glabrata, castellii, lactis, gossypii, waltii, hansenii, albicans, lipolytica, crassa, graminearum, grisea, nidulans, pombe) with a "?" marker and a time label of ~100M]

Page 34: Comparative genomics

Hypothetical resolution of WGD

• A 1:2 mapping where

– nearly every region in species Y would correspond to two sister regions in S. cerevisiae

– the two sister regions in S. cerevisiae would contain ordered interleaving subsequences of the genes in the corresponding region of species Y

– nearly every region of S. cerevisiae would correspond to one region of species Y, and thus be paired to a sister region in S. cerevisiae
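The "ordered interleaving subsequences" condition can be checked mechanically once genes in the two S. cerevisiae sister regions have been mapped to their matches in species Y. The sketch below assumes that mapping is already done and ignores orientation and local rearrangements; the function names and the toy gene labels are hypothetical.

```python
from typing import List, Sequence


def is_ordered_subsequence(sub: Sequence[str], full: Sequence[str]) -> bool:
    """True if the genes of `sub` appear in `full` in the same relative order."""
    remaining = iter(full)
    # `gene in remaining` advances the iterator, so order is enforced.
    return all(gene in remaining for gene in sub)


def supports_one_to_two(region_y: List[str], sister_1: List[str],
                        sister_2: List[str]) -> bool:
    """Both sister regions keep the gene order of the species Y region."""
    return (is_ordered_subsequence(sister_1, region_y)
            and is_ordered_subsequence(sister_2, region_y))


def coverage(region_y: List[str], sister_1: List[str],
             sister_2: List[str]) -> float:
    """Fraction of species Y genes found in at least one sister region."""
    covered = set(sister_1) | set(sister_2)
    return sum(gene in covered for gene in region_y) / len(region_y)


# Toy region: eight ancestral-order genes; each sister kept a subset,
# and gene "c" survived in both copies.
region_y = list("abcdefgh")
sister_1 = list("acdg")
sister_2 = list("bcefh")
print(supports_one_to_two(region_y, sister_1, sister_2))  # True
print(coverage(region_y, sister_1, sister_2))             # 1.0
```

A gene present in both sisters, like "c" here, corresponds to a retained duplicate pair; genes present in only one sister reflect loss of the other copy.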

Page 35: Comparative genomics
Page 36: Comparative genomics

Hypothesis (1997): Whole Genome Duplication

[Figure: yeast species tree (cerevisiae, paradoxus, mikatae, bayanus, glabrata, castellii, lactis, gossypii, waltii, hansenii, albicans, lipolytica, crassa, graminearum, grisea, nidulans, pombe) with a "?" marker and a time label of ~100M]

Page 37: Comparative genomics

Aligning the S. cerevisiae and K. waltii genomes

• Most regions in K. waltii map to two regions in S. cerevisiae, each containing matches to only a subset of the K. waltii genes

Page 38: Comparative genomics

Duplication covers the whole S. cerevisiae genome

Page 39: Comparative genomics

What happens to genes post WGD?

• 12% (457) of paralogous gene pairs were retained

• 76 of the 457 gene pairs (17%) show accelerated protein evolution