Comparative genomics
![Page 1: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/1.jpg)
Comparative genomics
Haixu Tang
School of Informatics
![Page 2: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/2.jpg)
WGS of human genome
• 2001 Two assemblies of the initial human genome sequence published
– International Human Genome Project
– Celera Genomics: WGS approach
![Page 3: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/3.jpg)
Model organisms
• 1995 Haemophilus influenzae sequenced
• 1997 E. coli sequenced
• 1998 Complete sequence of the Caenorhabditis elegans genome
• 2000 Complete sequence of the euchromatic portion of the Drosophila melanogaster genome
![Page 4: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/4.jpg)
Why model organisms?
• Testing and improvements of genome sequencing technology and strategy
![Page 5: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/5.jpg)
Model organisms
• 1993 Whole genome shotgun sequencing proposed (J. C. Venter)
• 1995 Haemophilus influenzae sequenced, ~1.5-2 Mbp
• 1995 Automated fluorescent sequencing instruments and robotic operations (PerkinElmer, Inc.)
• 1996 Yeast sequenced
• 1996 Double-barrelled sequencing
• 1997 E. coli sequenced, ~4 Mbp
• 1998 Complete sequence of the Caenorhabditis elegans genome, ~100 Mbp
• 1998 Whole genome shotgun sequencing (Weber & Myers)
• 2000 Complete sequence of the euchromatic portion of the Drosophila melanogaster genome, ~180 Mbp
![Page 6: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/6.jpg)
Why model organisms?
• Testing and improvements of genome sequencing technology and strategy
• Model organisms have important biological implications themselves.
![Page 7: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/7.jpg)
Model organisms
• 1995 Haemophilus influenzae sequenced (infectious disease)
• 1996 Yeast sequenced (industry and biology)
• 1997 E. coli sequenced (industry and biotechnology)
• 1998 Complete sequence of the Caenorhabditis elegans genome (multi-cellular organism, development)
• 2000 Complete sequence of the euchromatic portion of the Drosophila melanogaster genome (genetics, entomology)
![Page 8: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/8.jpg)
Why model organisms?
• Testing and improvements of genome sequencing technology and strategy.
• Model organisms have important biological implications themselves.
• Genome sequences provide useful information to study genome function and evolution.
![Page 9: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/9.jpg)
Model organisms
• 1995 Haemophilus influenzae sequenced (bacterium)
• 1996 Yeast sequenced (uni-cellular eukaryote)
• 1997 E. coli sequenced (bacterium)
• 1998 Complete sequence of the Caenorhabditis elegans genome (multi-cellular organism, nematode)
• 2000 Complete sequence of the euchromatic portion of the Drosophila melanogaster genome (multi-cellular organism, insect)
![Page 10: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/10.jpg)
Model mammalian and vertebrate genomes
• 2001 Human genome
• 2002 Mouse genome
– Initial sequencing and comparative analysis of the mouse genome
• 2003 Rat genome
• 2004 Chicken genome (first bird)
• 2005 Chimpanzee genome
![Page 11: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/11.jpg)
Comparative genomics
• Solving biological problems by comparing genomic sequences
– Function of genes and genomes
– Evolution of genes and genomes
• Data-driven approaches
– Computational methods are the core
![Page 12: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/12.jpg)
Which genomes to sequence?
• Species having important biological applications
• For comparative genomics studies
– Functional consideration
  • Evolutionarily divergent genomes reveal conserved elements, e.g. human vs. mouse (~75% identical)
  • Evolutionarily close genomes reveal divergent elements, e.g. human vs. chimpanzee (98.4% identical)
– Evolutionary consideration
  • Specific evolutionary puzzles, e.g. whole genome duplications in yeast
![Page 13: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/13.jpg)
Ongoing eukaryotic genome projects
• http://igweb.integratedgenomics.com/ERGO_supplement/genomes_eukarya.html
• >20 yeasts; insects (12 Drosophila, 2 mosquitoes, silkworm); flea; sea urchin; frog; fish (zebrafish, Fugu); mammals (mouse, rat, dog, cow, pig, monkey, etc.); plants (Arabidopsis, rice (>2), maize, etc.)
![Page 14: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/14.jpg)
Comparative genomics: case studies
• Gene function and evolution
• Gene-gene relationship
• Genome evolution
![Page 15: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/15.jpg)
Homologue relationships of genes
• Orthologues: any pairwise gene relation where the ancestor node is a speciation event
• Paralogues: any pairwise gene relation where the ancestor node is a duplication event
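The two definitions above can be turned into a small procedure: find the most recent common ancestor node of two genes in a gene tree and read off its event type. The sketch below is illustrative only; the `Node` class, the `relation` function and the toy tree (one duplication followed by a human/mouse speciation in each copy) are assumptions made for this example, not part of the lecture.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                    # gene name for leaves, free text otherwise
    event: Optional[str] = None   # "speciation" or "duplication" (internal nodes)
    children: List["Node"] = field(default_factory=list)


def leaves(node):
    """Return the set of leaf labels below a node."""
    if not node.children:
        return {node.label}
    result = set()
    for child in node.children:
        result |= leaves(child)
    return result


def relation(root, gene_a, gene_b):
    """Classify two distinct leaf genes by the event at their last common ancestor."""
    node = root
    while True:
        # Descend while a single child still contains both genes.
        for child in node.children:
            below = leaves(child)
            if gene_a in below and gene_b in below:
                node = child
                break
        else:
            break
    return "orthologues" if node.event == "speciation" else "paralogues"


# Hypothetical toy tree: an ancestral duplication, then a speciation in each copy.
tree = Node("root", "duplication", [
    Node("copy1", "speciation", [Node("H1"), Node("M1")]),
    Node("copy2", "speciation", [Node("H2"), Node("M2")]),
])

print(relation(tree, "H1", "M1"))  # last common ancestor is a speciation
print(relation(tree, "H1", "H2"))  # last common ancestor is a duplication
```

In this toy tree H1 and M2 also come out as paralogues, because their last common ancestor is the duplication at the root even though they sit in different species.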
![Page 16: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/16.jpg)
Homologue Relationships
[Figure: gene tree along a time axis with duplication and speciation events; node labels A, A1, A2, M1, M2, M2', H1, H2; gene pairs marked as inparalogues, outparalogues and orthologues.]
![Page 17: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/17.jpg)
Functional implications
• Orthologous genes: typically the same function in different species
• Paralogous genes: often different functions
![Page 18: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/18.jpg)
Yeast species
[Figure: species tree with tips cerevisiae, paradoxus, mikatae, bayanus, glabrata, castellii, lactis, gossypii, waltii, hansenii, albicans, lipolytica, crassa, graminearum, grisea, nidulans, pombe; branches annotated ~5M and ~20M.]
• 5-20 million years
• Sufficient conservation to align
• Sufficient divergence to identify conserved functional elements
![Page 19: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/19.jpg)
Large scale genome evolution
• Most genes have a clear match
• Clear blocks of synteny
![Page 20: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/20.jpg)
![Page 21: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/21.jpg)
Human–chimpanzee comparisons
• Positive selection: a sequence change in a species that results in increased fitness is subject to positive selection. As a consequence, the change normally becomes fixed, leading to adaptive evolution of that species.
![Page 22: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/22.jpg)
Genome vs. Genes
• The whole genome sequence reveals not only which genes are present in a genome, but also which genes are absent (e.g. deleted).
![Page 23: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/23.jpg)
Phylogenetic profile analysis
• A non-homology-based approach to gene function prediction
• The phylogenetic profile of a gene is a string encoding the presence or absence of the gene in every sequenced genome
• The phylogenetic profiles of genes involved in the same biological process are often "similar", since they may co-evolve.
![Page 24: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/24.jpg)
Phylogenetic profile analysis
• Phylogenetic profile (against N genomes)
– For each gene X in a target genome (e.g., E. coli), build a phylogenetic profile as follows
– If gene X has a homolog in genome #i, the ith bit of X's phylogenetic profile is "1"; otherwise it is "0"
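The construction above amounts to one presence/absence test per genome. A minimal sketch follows, assuming the homology search has already been run and summarised; the `hits` table, the gene names and the `phylogenetic_profile` helper are hypothetical and only illustrate the bit-string encoding.

```python
def phylogenetic_profile(gene, genomes, has_homolog):
    """Return a '0'/'1' string with one bit per genome, in the order given."""
    return "".join("1" if has_homolog(gene, g) else "0" for g in genomes)


# Hypothetical homolog table: genome name -> target-genome genes with a hit there.
hits = {
    "genome_1": {"geneX", "geneY"},
    "genome_2": {"geneY"},
    "genome_3": {"geneX", "geneY"},
    "genome_4": set(),
}
genomes = sorted(hits)  # fix the genome order once so that bit i always means genome i


def in_table(gene, genome):
    return gene in hits[genome]


profile_x = phylogenetic_profile("geneX", genomes, in_table)
profile_y = phylogenetic_profile("geneY", genomes, in_table)
print(profile_x, profile_y)
```

How a "homolog" is called (for example, which similarity threshold is used) is left to the `has_homolog` callback, since the slides do not specify it.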
![Page 25: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/25.jpg)
Phylogenetic profile analysis
• Example – phylogenetic profiles based on 89 genomes
orf1034: 1110110110010111110100010100000000111100011111110110111010101
orf1036: 1011110001000001010000010010000000010111101110011011010000101
orf1037: 1101100110000001110010000111111001101111101011101111000010100
orf1038: 1110100110010010110010011100000101110101101111111111110000101
orf1039: 1111111111111111111111111111111111111111101111111111111111101
orf104: 1000101000000000000000101000000000110000000000000100101000100
orf1040: 1110111111111101111101111100000111111100111111110110111111101
orf1041: 1111111111111111110111111111111101111111101111111111111111101
orf1042: 1110100101010010010110000100001001111110111110101101100010101
orf1043: 1110100110010000010100111100100001111110101111011101000010101
orf1044: 1111100111110010010111010111111001111111111111101101100010101
orf1045: 1111110110110011111111111111111101111111101111111111110010101
orf1046: 0101100000010001011000000111110000010100000001010010100000000
orf1047: 0000000000000001000010000001000100000000000000010000000000000
orf105: 0110110110100010111101101010111001101100101111100010000010001
orf1054: 0100100110000001100001000100000000100100100001000100100000000
Genes with similar phylogenetic profiles tend to have related functions or to be functionally linked (D. Eisenberg and colleagues, 1999)
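One simple way to make "similar profiles" concrete is to count the positions at which two profiles differ (Hamming distance) and report pairs below a cutoff. This is only a sketch under that assumption; the slides do not say which similarity measure was used, and the toy profiles and the cutoff here are invented for illustration.

```python
from itertools import combinations


def hamming(p, q):
    """Number of genomes in which exactly one of the two genes is present."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(p, q))


def similar_pairs(profiles, max_distance):
    """Return (gene, gene, distance) for every pair within max_distance."""
    pairs = []
    for (name_a, p), (name_b, q) in combinations(sorted(profiles.items()), 2):
        d = hamming(p, q)
        if d <= max_distance:
            pairs.append((name_a, name_b, d))
    return sorted(pairs, key=lambda t: t[2])


# Hypothetical 10-genome profiles (not taken from the slide).
toy = {
    "geneA": "1101100110",
    "geneB": "1101100100",
    "geneC": "0010011001",
    "geneD": "1111111111",
}

print(similar_pairs(toy, max_distance=1))
```

Only geneA and geneB are reported here, since they differ in a single genome. Genes present in almost every genome carry little signal under this measure, so a real analysis would likely need a more careful score.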
![Page 26: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/26.jpg)
Genome evolution
• Genome rearrangement
• Whole genome duplication
![Page 27: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/27.jpg)
Turnip vs Cabbage: Look and Taste Different
• Although cabbages and turnips share a recent common ancestor, they look and taste different
![Page 28: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/28.jpg)
Turnip vs Cabbage: Comparing Gene Sequences Yields No Evolutionary Information
![Page 29: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/29.jpg)
Turnip vs Cabbage: Different mtDNA Gene Order
• Gene order comparison:
[Figure: mtDNA gene order, panels labelled "Before" and "After".]
Evolution is manifested as the divergence in gene order
![Page 30: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/30.jpg)
Comparative Genomic Architecture of Human and Mouse Genomes
To locate where the corresponding gene lies in humans, the relative architectures of the human and mouse genomes were analyzed.
![Page 31: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/31.jpg)
Types of Rearrangements
Reversal: 1 2 3 4 5 6 → 1 2 -5 -4 -3 6
Translocation: 1 2 3 | 4 5 6 → 1 2 6 | 4 5 3
Fusion: two chromosomes joined into one (1 2 3 4 5 6)
Fission: one chromosome (1 2 3 4 5 6) split into two
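The four operations can be written as short list manipulations on signed gene orders, where a negative number means the gene sits on the opposite strand. The function names and index conventions below are assumptions chosen for this sketch; the reversal and translocation calls reproduce the numeric examples on the slide.

```python
def reversal(chrom, i, j):
    """Reverse the segment chrom[i:j] and flip the sign of each gene in it."""
    return chrom[:i] + [-g for g in reversed(chrom[i:j])] + chrom[j:]


def translocation(a, b, i, j):
    """Exchange the tails a[i:] and b[j:] between two chromosomes."""
    return a[:i] + b[j:], b[:j] + a[i:]


def fusion(a, b):
    """Join two chromosomes end to end."""
    return a + b


def fission(chrom, i):
    """Split one chromosome into two at position i."""
    return chrom[:i], chrom[i:]


print(reversal([1, 2, 3, 4, 5, 6], 2, 5))
print(translocation([1, 2, 3], [4, 5, 6], 2, 2))
print(fusion([1, 2, 3], [4, 5, 6]))
print(fission([1, 2, 3, 4, 5, 6], 3))
```

The split points used for fusion and fission are arbitrary here, since the slide does not show where the two chromosomes meet.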
![Page 32: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/32.jpg)
Comparative Genomic Architectures: Mouse vs Human Genome
• Humans and mice have similar genomes, but their genes are ordered differently
• ~245 rearrangements
– Reversals
– Fusions
– Fissions
– Translocations
![Page 33: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/33.jpg)
Hypothesis (1997): Whole Genome Duplication
[Figure: species tree with tips cerevisiae, paradoxus, mikatae, bayanus, glabrata, castellii, lactis, gossypii, waltii, hansenii, albicans, lipolytica, crassa, graminearum, grisea, nidulans, pombe; annotated with "?" and ~100M.]
![Page 34: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/34.jpg)
Hypothetical resolution of WGD
• A 1:2 mapping where
– nearly every region in species Y would correspond to two sister regions in S. cerevisiae
– the two sister regions in S. cerevisiae would contain ordered interleaving subsequences of the genes in the corresponding region of species Y
– nearly every region of S. cerevisiae would correspond to one region of species Y, and thus be paired to a sister region in S. cerevisiae
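The "ordered interleaving subsequences" condition can be checked mechanically for a single region. The sketch below assumes each S. cerevisiae gene has already been renamed to its match in species Y, ignores orientation, and additionally requires every Y gene to survive in at least one sister region; that last requirement is a simplification (the slide says "nearly every"), and all names and data are hypothetical.

```python
def is_ordered_subsequence(sub, full):
    """True if the genes of `sub` appear in `full` in the same relative order."""
    remaining = iter(full)
    # `gene in remaining` consumes the iterator, so order is enforced.
    return all(gene in remaining for gene in sub)


def consistent_with_wgd(region_y, sister_1, sister_2):
    """Check the 1:2 signature: two ordered subsets that together cover region_y."""
    covered = set(sister_1) | set(sister_2)
    return (
        is_ordered_subsequence(sister_1, region_y)
        and is_ordered_subsequence(sister_2, region_y)
        and covered == set(region_y)
    )


# Hypothetical region of species Y and its two S. cerevisiae sister regions.
region_y = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
sister_1 = ["g1", "g3", "g4", "g7"]
sister_2 = ["g2", "g4", "g5", "g6", "g8"]   # g4 is retained in both copies

print(consistent_with_wgd(region_y, sister_1, sister_2))
```

A gene such as g4 that appears in both sister regions corresponds to a retained paralogous pair; genes seen in only one sister region were lost from the other copy.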
![Page 35: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/35.jpg)
![Page 36: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/36.jpg)
Hypothesis (1997): Whole Genome Duplication
[Figure: species tree with tips cerevisiae, paradoxus, mikatae, bayanus, glabrata, castellii, lactis, gossypii, waltii, hansenii, albicans, lipolytica, crassa, graminearum, grisea, nidulans, pombe; annotated with "?" and ~100M.]
![Page 37: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/37.jpg)
Aligning the S. cerevisiae and K. waltii genomes
• Most regions in K. waltii mapped to two regions in S. cerevisiae with each containing matches to only a subset of the K. waltii genes
![Page 38: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/38.jpg)
Duplication covers the whole S. cerevisiae genome
![Page 39: Comparative genomics](https://reader035.fdocuments.in/reader035/viewer/2022062721/568136d8550346895d9e7486/html5/thumbnails/39.jpg)
What happens to genes post WGD?
• 12% (457) of paralogous gene pairs were retained
• 76 of the 457 gene pairs (17%) show accelerated protein evolution