Comparative Genomics in Vertebrates: Lessons from the Tetraodon nigroviridis genome
1
Comparative Genomics in Vertebrates:
Lessons from the Tetraodon nigroviridis genome
2
R. Hinegardner 1968
3
Identify human genes by comparison to a compact vertebrate genome
Tetraodon genomic sequence
Human genomic sequence Exons
4
Query: SPWTFPS*FLMSSSMKVPSWSRISSPM*GIL*STVSSST
       SPWTFPS* L+SSS+KV S S  SSPM*GIL  T SSST
Sbjct: SPWTFPS*LLISSSIKVSSSSFTSSPM*GILHKTXSSST

Query: LLFQLFLALSDLKQLRILHTDLKPDNVMLVD--EKELKIKLMDFGLALLTHEAKT--GTI
       +L Q+  AL  LK L ++H DLKP+N+MLVD   +  ++K++DFG A  +H +KT   T
Sbjct: ILQQVATALKKLKSLGLIHADLKPENIMLVDPVRQPYRVKVIDFGSA--SHVSKTVCSTY

Query: VNALAQYSHNEDEEEEEEHDFKVDKT-DLCDSKKHPE
       VNAL QY+ ++D+++ ++ + + +K  DL D +   E
Sbjct: VNALGQYNDDDDDDDGDDPEEREEKQKDLEDHRDDKE

Query: RYKELTEQQMPGALPPECTPNMDGPHARSVRREQSLHSFHTLFCRRCFKYDRFLH
       +YKELTEQQ+PGALPPECTPN+DGP+A+SV+REQSLHSFHTLFCRRCFKYD FLH
Sbjct: KYKELTEQQLPGALPPECTPNIDGPNAKSVQREQSLHSFHTLFCRRCFKYDCFLH
5
BLAST
A T T G C G T A T G C A G C G T A G C A A T T G C G A T A C
T T A C G C G A T G T A G A C A G C G T A G C A A T G T T G C A
Exact match
Query
Subject
word of size W = 11 bases
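The seeding step on this slide can be sketched in a few lines of Python. This is only an illustration of the idea (index every word of size W in the query, then look up each word of the subject), not the actual BLAST implementation. The function name `find_seeds` is hypothetical; the two sequences are the query and subject shown on the slide.

```python
from collections import defaultdict

def find_seeds(query, subject, word_size=11):
    """Return sorted (query_pos, subject_pos) pairs where a word of
    length word_size occurs identically in both sequences."""
    # Index every word of the query by its start position.
    index = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)
    # Scan the subject and report every exact word match.
    hits = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], ()):
            hits.append((i, j))
    return sorted(hits)

# Sequences from the slide (spaces removed).
QUERY = "ATTGCGTATGCAGCGTAGCAATTGCGATAC"
SUBJECT = "TTACGCGATGTAGACAGCGTAGCAATGTTGCA"

for q_pos, s_pos in find_seeds(QUERY, SUBJECT):
    print(q_pos, s_pos, QUERY[q_pos:q_pos + 11])
```

Each reported pair is a candidate anchor from which an alignment can then be extended in both directions.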
6
A T T G C G T A T G C A G C G T A G C A A T T G C G A T A C
T T A C G C G A T G T A G A C A G C G T A G C A A T G T T G C A
Blast:
Query
Subject
T A T G C A G C G T A G C A A T
Scoring matrix NUC.4.4
   A  T  G  C  N
A  5 -4 -4 -4 -2
T -4  5 -4 -4 -2
G -4 -4  5 -4 -2
C -4 -4 -4  5 -2
N -2 -2 -2 -2 -1
Scores along the extension: +5, -4, -4, +5
Cumulative drop after two mismatches: 8 < X
X = threshold for cumulative score of successive mismatches = 21 by default
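The extension step can be sketched as follows. This is a minimal ungapped extension under stated assumptions: the scores are the +5 match and -4 mismatch of NUC.4.4 (N is ignored), X = 21 is the default quoted on the slide, and extension stops once the running score falls more than X below the best score seen. The function names and the toy sequences are hypothetical and are not taken from the slides.

```python
def extend_one_side(pairs, x_drop, match, mismatch):
    """Walk along aligned base pairs; return (best_score, best_length)."""
    score = best = best_len = 0
    for length, (a, b) in enumerate(pairs, start=1):
        score += match if a == b else mismatch
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break  # fell more than X below the best score: stop
    return best, best_len

def xdrop_extend(query, subject, q_start, s_start, word_size,
                 x_drop=21, match=5, mismatch=-4):
    """Extend an exact seed in both directions without gaps.
    Returns (q_from, q_to, score), with q_to exclusive."""
    q_end, s_end = q_start + word_size, s_start + word_size
    seed_score = word_size * match  # the seed is an exact match
    right, right_len = extend_one_side(
        zip(query[q_end:], subject[s_end:]), x_drop, match, mismatch)
    left, left_len = extend_one_side(
        zip(reversed(query[:q_start]), reversed(subject[:s_start])),
        x_drop, match, mismatch)
    return q_start - left_len, q_end + right_len, seed_score + left + right

# Toy sequences (hypothetical): an 11-base seed flanked by two matching
# bases and two mismatching bases on each side.
toy_query = "CCTAACGTACGTACGTTCC"
toy_subject = "GGTAACGTACGTACGTTGG"
q_from, q_to, score = xdrop_extend(toy_query, toy_subject, 4, 4, 11)
print(toy_query[q_from:q_to], score)
```

The extension keeps the two matching flanking bases on each side and discards the mismatching ends, because only the best-scoring prefix of each extension is retained.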
W
7
Word “W” = 3 amino acids
(threshold “X”)
(threshold “T”)
L E C N Q L I P I A H K T C P E G K N L
H K T
H L T
H V T
H Y T
Y K T
N K T
L K C H N T Q L P F I Y K T C P E G K N
Extension
Automaton
TBLASTX, BLASTP, BLASTX
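For protein searches the seed is not an exact match but any three-residue word that scores at least T against a query word. A small sketch of that neighborhood computation is given below for the query word HKT. Two assumptions are made: the value T = 11 is chosen for illustration (the slide does not state it), and the alphabet is restricted to the seven residues H, K, T, L, V, Y and N to keep the example short. The scores are BLOSUM62 values for the query residues H, K and T; the names `SCORES` and `neighborhood` are hypothetical.

```python
from itertools import product

# Restricted alphabet (assumption, to keep the example small).
RESIDUES = "HKTLVYN"

# BLOSUM62 scores for the query residues H, K and T.
SCORES = {
    "H": {"H": 8, "K": -1, "T": -2, "L": -3, "V": -3, "Y": 2, "N": 1},
    "K": {"H": -1, "K": 5, "T": -1, "L": -2, "V": -2, "Y": -2, "N": 0},
    "T": {"H": -2, "K": -1, "T": 5, "L": -1, "V": 0, "Y": -2, "N": 0},
}

def neighborhood(word, threshold):
    """Return {candidate_word: score} for all words over RESIDUES that
    score at least `threshold` against `word`."""
    result = {}
    for candidate in product(RESIDUES, repeat=len(word)):
        score = sum(SCORES[a][b] for a, b in zip(word, candidate))
        if score >= threshold:
            result["".join(candidate)] = score
    return result

for w, s in sorted(neighborhood("HKT", 11).items(), key=lambda kv: -kv[1]):
    print(w, s)
```

Under these assumptions the words listed on the slide (HKT, HLT, HVT, HYT, YKT, NKT) all fall inside the neighborhood of HKT, which is why the subject word YKT can anchor an alignment with the query word HKT.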
8
  A R N D C Q E G H I L K M F P S T W Y V B Z X *
A 4 -1 -2 -2 0 -1 -1 0 -2 -1 -1 -1 -1 -2 -1 1 0 -3 -2 0 -2 -1 0 -4
R -1 5 0 -2 -3 1 0 -2 0 -3 -2 2 -1 -3 -2 -1 -1 -3 -2 -3 -1 0 -1 -4
N -2 0 6 1 -3 0 0 0 1 -3 -3 0 -2 -3 -2 1 0 -4 -2 -3 3 0 -1 -4
D -2 -2 1 6 -3 0 2 -1 -1 -3 -4 -1 -3 -3 -1 0 -1 -4 -3 -3 4 1 -1 -4
C 0 -3 -3 -3 9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1 -3 -3 -2 -4
Q -1 1 0 0 -3 5 2 -2 0 -3 -2 1 0 -3 -1 0 -1 -2 -1 -2 0 3 -1 -4
E -1 0 0 2 -4 2 5 -2 0 -3 -3 1 -2 -3 -1 0 -1 -3 -2 -2 1 4 -1 -4
G 0 -2 0 -1 -3 -2 -2 6 -2 -4 -4 -2 -3 -3 -2 0 -2 -2 -3 -3 -1 -2 -1 -4
H -2 0 1 -1 -3 0 0 -2 8 -3 -3 -1 -2 -1 -2 -1 -2 -2 2 -3 0 0 -1 -4
I -1 -3 -3 -3 -1 -3 -3 -4 -3 4 2 -3 1 0 -3 -2 -1 -3 -1 3 -3 -3 -1 -4
L -1 -2 -3 -4 -1 -2 -3 -4 -3 2 4 -2 2 0 -3 -2 -1 -2 -1 1 -4 -3 -1 -4
K -1 2 0 -1 -3 1 1 -2 -1 -3 -2 5 -1 -3 -1 0 -1 -3 -2 -2 0 1 -1 -4
M -1 -1 -2 -3 -1 0 -2 -3 -2 1 2 -1 5 0 -2 -1 -1 -1 -1 1 -3 -1 -1 -4
F -2 -3 -3 -3 -2 -3 -3 -3 -1 0 0 -3 0 6 -4 -2 -2 1 3 -1 -3 -3 -1 -4
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4 7 -1 -1 -4 -3 -2 -2 -1 -2 -4
S 1 -1 1 0 -1 0 0 0 -1 -2 -2 0 -1 -2 -1 4 1 -3 -2 -2 0 0 0 -4
T 0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 1 5 -2 -2 0 -1 -1 0 -4
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1 1 -4 -3 -2 11 2 -3 -4 -3 -2 -4
Y -2 -2 -2 -3 -2 -1 -2 -3 2 -1 -1 -2 -1 3 -3 -2 -2 2 7 -1 -3 -2 -1 -4
V 0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4 -3 -2 -1 -4
B -2 -1 3 4 -3 0 1 -1 0 -3 -4 0 -3 -3 -2 0 -1 -4 -3 -3 4 1 -1 -4
Z -1 0 0 1 -3 3 4 -2 0 -3 -3 1 -1 -3 -1 0 -1 -3 -2 -2 1 4 -1 -4
X 0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2 0 0 -2 -1 -1 -1 -1 -1 -4
* -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 1
BLOSUM62 scoring matrix
9
Results
TBLASTX:
- Non-substitutive scoring matrix: match = +15, mismatch = -12
- Initial anchoring word: W= 5
- Never more than 2 consecutive mismatches: X = 25
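The link between X = 25 and "never more than 2 consecutive mismatches" follows from the mismatch penalty of 12. A short check is sketched below, assuming that extension stops as soon as the cumulative drop exceeds X; the function name is hypothetical.

```python
def max_consecutive_mismatches(mismatch_score, x_threshold):
    """Largest number of consecutive mismatches whose cumulative
    penalty does not exceed the drop-off threshold X."""
    penalty = -mismatch_score
    count = 0
    while (count + 1) * penalty <= x_threshold:
        count += 1
    return count

# Exofish settings from the slide: mismatch = -12, X = 25.
print(max_consecutive_mismatches(-12, 25))
```

Two mismatches cost 24, which stays within X = 25, while a third brings the drop to 36 and ends the extension.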
10
33% of the Tetraodon genome
TBLASTX, W=5, X=25, n.s. matrix
(10 hours)
8.3 million alignments
322 annotated human genes
11
Length (bp)
% Id
12
Exofish performance
Sensitivity: genes 62.5%, exons 27.5%
Specificity: 100%
On 322 genes
Human gene
Tetraodon matches
Ecores (Evolutionary Conserved Regions)
Ecores per gene 2.58
13
Exofish
(fishing for exons… with fish exons)
Human genomic sequence
Tetraodon genome
Compute alignments
Assemble selected alignments
Ecores: Evolutionary conserved regions
Select alignments
Filter repeats and low complexity regions
14
a
Genscan
Exofish
Annotation
Carnitine palmitoyl transferase I
15
Genscan
Exofish
Exofish
Genscan Exofish
Annotation
Annotation
Similar to mouse HTF9C
Ran binding protein 1
KIAA1292 protein
16
Exofish 43728 ecores
Estimating the number of genes in a genome
RefSeq (13751 genes)
65 %
35 %
? ecores
(How many genes ?)
Human genome
17
How many genes in the human genome?
42066 ecores found in 42.4% of the human genome
42066 / 0.424 = 99212 ecores in the entire genome
11% of ecores correspond to pseudogenes
99212 x 0.89 = 88299 ecores correspond to genes
A gene possesses on average between 2.58 and 3.18 ecores
88299 / 3.18 = 27767 genes
88299 / 2.58 = 34224 genes
28000 < Human genes < 34000
Estimation based on 42% of the human genome (January 2000)
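The arithmetic on this slide can be reproduced step by step. The sketch below uses only the figures quoted above; the function and parameter names are hypothetical, and each intermediate value is rounded to a whole number as on the slide.

```python
def estimate_gene_count(ecores_found, genome_fraction, pseudogene_fraction,
                        ecores_per_gene_low, ecores_per_gene_high):
    """Extrapolate a gene count range from an ecore count."""
    # Scale the ecore count up to the whole genome.
    ecores_total = round(ecores_found / genome_fraction)
    # Remove the share of ecores that fall in pseudogenes.
    ecores_in_genes = round(ecores_total * (1 - pseudogene_fraction))
    # More ecores per gene means fewer genes, and vice versa.
    genes_low = round(ecores_in_genes / ecores_per_gene_high)
    genes_high = round(ecores_in_genes / ecores_per_gene_low)
    return ecores_total, ecores_in_genes, genes_low, genes_high

print(estimate_gene_count(42066, 0.424, 0.11, 2.58, 3.18))
```

The range is driven almost entirely by the assumed number of ecores per gene: the two bounds differ only in that divisor.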
18
Genesweep (organized by Ewan Birney at Cold Spring Harbor in May 2000).
Science, 28 May 2000
19
Organism                  Nb. genes   Genome size
Influenza virus                   8      0.001 Mb
Poliovirus                        1      0.007 Mb
Mycoplasma genitalium           480       0.58 Mb
Archaeoglobus fulgidus        2,420       2.18 Mb
Mesorhizobium loti            6,746       7.03 Mb
Yeast                         6,000         16 Mb
Nematode                     19,000        100 Mb
Drosophila                   14,000        120 Mb
Arabidopsis                  25,000        100 Mb
Human                        30,000      3,000 Mb
Number of genes in eukaryotes: a paradox?
Paramecium                   40,000         80 Mb
How to estimate the number of genes in a genome… without knowing the sequence?
[Figure: published estimates of the number of human genes, by year (1992 to 2006); vertical axis from 20,000 to 160,000 genes. Estimates shown: Antequera and Bird; Fields et al.; Roest Crollius et al.; Lander et al.; Ewing and Green; Liang et al.]
21
64% of the genome is anchored to chromosomes
36% remains as independent sequences
350 Mb genome
21 chromosomes
Whole Genome Shotgun Sequencing : 8 X
Assembly with Arachne
Physical mapping
Gnathostomata (jawed vertebrates)
Chondrichthyes (cartilaginous fishes)
Actinopterygii (ray-finned fishes)
Osteichthyes (bony fishes)
Mammals
Tetrapods
Coelacanthimorpha
Sauropsida
Mus musculus
Homo sapiens
Gallus gallus
Oryzias latipes
Tetraodon nigroviridis
Takifugu rubripes
Danio rerio
Sarcopterygii (lobe-finned fishes)
Teleosts
Acipenseriforms (sturgeons, …)
Percomorphs
Otophysi
Cypriniforms
Beloniforms
Tetraodontiforms
225 My
Paleozoic
Mesozoic
23
Ancestral species
orthologs
paralogs
Species 1 Species 2
speciation
A B
duplication
B’
• A and B derive from an ancestral gene by speciation: they are orthologs
• B’ appears by duplication of B: they are paralogs
Signature?
Signature?
24
Ancestral genome
Duplication
Deletions
Intra-chromosomal rearrangements
Fusions and fissions
Translocations
Time (tens of millions of years)
25
Tetraodon: n = 1078
  Ks <= 0.35: n = 330 (30.6%)
  Ks > 0.35: n = 748 (69.4%)
Takifugu: n = 995
  Ks <= 0.35: n = 179 (18.0%)
  Ks > 0.35: n = 816 (82.0%)
Identification of duplicate genes
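The percentages on this slide follow directly from the counts on either side of the Ks = 0.35 cut-off. A minimal check is sketched below; the function name is hypothetical, and assigning the first set of counts to Tetraodon and the second to Takifugu is an assumption based on the order of the labels on the slide.

```python
def ks_split(low_ks_count, high_ks_count):
    """Return (total, % with Ks <= 0.35, % with Ks > 0.35)."""
    total = low_ks_count + high_ks_count
    return (total,
            round(100 * low_ks_count / total, 1),
            round(100 * high_ks_count / total, 1))

print("Tetraodon", ks_split(330, 748))
print("Takifugu", ks_split(179, 816))
```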
26
Distribution of 748 duplicate genes in the Tetraodon genome
27
Common ancestor
duplication
diploidization
Homo sapiens Tetraodon nigroviridis
28
Human genome: Synteny with the Tetraodon genome
Tetraodon genome: Synteny with the human genome
29
30
The Paleozoic era
31
Drawing by Z. Burian under the direction of Prof. J. Augusta
The giant placoderm Dunkleosteus (~7 metres) chases two Cladoselache sharks
Gnathostomata (jawed vertebrates)
Chondrichthyes (cartilaginous vertebrates)
Actinopterygii (ray-finned fishes)
Osteichthyes (bony vertebrates)
Mammals
Tetrapods
Coelacanthimorpha
Sauropsida
Mus musculus
Homo sapiens
Gallus gallus
Oryzias latipes
Tetraodon nigroviridis
Takifugu rubripes
Danio rerio
Sarcopterygii (lobe-finned fishes)
Teleosts
Acipenseriforms (sturgeons, …)
Percomorphs
Otophysi
Cypriniforms
Tetraodontiforms
Beloniforms
33
34
The ancestral Osteichthyes genome (bony vertebrates)
35
What are the intermediate steps in the evolution of the Tetraodon and human genomes?
36
Modeling the evolution of a duplicated Tetraodon chromosome
Gene order is progressively rearranged over time along Tetraodon and human chromosomes (independently)
The degree of rearrangement along a chromosome segment is thus a measure of elapsed time
Modeling a few simple cases of chromosomal rearrangements in Tetraodon:
1) No rearrangement
2) A recent fusion between two chromosomes
3) An ancient fusion between two chromosomes
4) A fission (break) of a chromosome
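One simple way to put a number on "degree of rearrangement" is to count breakpoints: pairs of genes that are neighbors in one gene order but not in the other. The sketch below is an illustration of that general idea only; it is not stated in the slides to be the measure used in this work. The function name and the toy gene orders are hypothetical.

```python
def count_breakpoints(order_a, order_b):
    """Count adjacent gene pairs of order_a that are not adjacent
    (in either orientation) in order_b."""
    adjacent_in_b = {frozenset(pair) for pair in zip(order_b, order_b[1:])}
    return sum(1 for pair in zip(order_a, order_a[1:])
               if frozenset(pair) not in adjacent_in_b)

# Toy gene orders (hypothetical): genes 3-4-5 are inverted in the second.
ancestral = [1, 2, 3, 4, 5, 6]
rearranged = [1, 2, 5, 4, 3, 6]
print(count_breakpoints(ancestral, rearranged))
```

A single inversion creates two breakpoints, and an unrearranged segment has none, so under this measure a higher count along a segment suggests more elapsed time since the event that created it.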
37
A simple case: no interchromosomal rearrangement after the duplication
Tetraodon nigroviridis
Homo sapiens
Ancestral genome
38
39
[Figure: Distribution of 6884 orthologs in their respective genomes. Axes: Tetraodon chromosomes (1 to 21) and human chromosomes (1 to 22, X). A second panel, labeled "9 11", plots human chromosomes (1 to 22, X) against Tetraodon chromosomes.]
40
41
42
Olivier Jaillon, Jean-Marc Aury, Jean-Louis Petit, Laurence Bouneau, Cécile Fischer, Alain Bernot, Sophie Nicaud, Carole Dossat, Béatrice Segurens, Corinne Dasilva, Marcel Salanoubat, Michael Levy, Nathalie Boudet, Véronique Anthouard, Claire Jubin, Vanina Castelli, Michael Katinka, Benoît Vacherie, Zineb Skalli, Laurence Cattolico, Julie Poulain, Simone Duprat, Philippe Brottier, Guillaume Lardier, Vincent Schachter, Francis Quetier, William Saurin, Claude Scarpelli, Patrick Wincker, Jean Weissenbach, Hugues Roest Crollius
Georges Lutfalla, Christian Biémont, Jean-Nicolas Volff
Jérôme Gouzy, Daniel Kahn
Nicole Stange-Thomann, Evan Mauceli, David Jaffe, Sheila Fisher, Kevin J. McKernan, Paul McEwan, Stephanie Bosak, Mike Zody, Jill Mesirov, Kerstin Lindblad-Toh, Bruce Birren, Chad Nusbaum, Eric S. Lander
Jean-Pierre Coutanceau, Catherine Ozouf-Costaz
Frédéric Brunet, Marc Robinson-Rechavi, Vincent Laudet
Sergi Castellano, Genis Parra, Charles Chapple, Roderic Guigó