In silico reconstruction of an ancestral mammalian genome
![Page 1: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/1.jpg)
In silico reconstruction of an ancestral mammalian genome
UQAM
Bioinformatics Seminar
Mathieu Blanchette
![Page 2: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/2.jpg)
CGACTGCATCAGACGACGATCAGACTACTATATCAGCAGATTACGGTGCATCGTATTTACGTTACGCATGACGATCAGACTACGCATAGATAGA TGCATCAGACGACGATCAGACTACTATATCAGCAGATTACGGTCGATTATTTACGTTACGCATGACGATCAGACTACGCATAGATAGAGCAATA CGACTGCATCAGACGACGATCAGACTACTATATCAGCAGATTACGGTGCGTATTTACGTTACGCATGACGATCAGACTACGCATAGATAGAGCA CGCATCAGACGACGATCAGACTACTATATCAGCAGATTACGGTCGTAACGTTACGCATGACGATCAGACTACGCATAGATAGAGCCGATCATCT CAGACGACGATCAGACTACTATATCAGCAGATTACGGTGGCATACTAATCGTATTTACGTTACGCATGACGATCAGACTACGCATAGATAGAAA CGACGATCAGACTACTATATCAGCAGATTACGGTGCGCGAATTCATATATTTACGTTACGCATGACGATCAGACTACGCATAGATAGATTGATA CATCAGACGACGATCAGACTACTATATCAGCAGATTACGGTGCATATTTTACGTTACGCATGACGATCAGACTACGCATAGATAGAGATCATCATCAGACGACGATCAGACTACTATATCAGCAGATTACGGTAGCATTCTCGTATTTACGTTACGCATGACGATCAGACTACGCATAGATAGAATGC ACGACGATCAGACTACTATATCAGCAGATTACGGTGATAGATACGATCGTATTTACGTTACGCATGACGATCAGACTACGCATAGATAGAGATAGCATCAGACGACGATCAGACTACTATATCAGCAGATTACGGTGATACGCATGACGATCAGACTACGCATAGATAGATTATTACTGGATACTGCA
The Human genome
• Sequence of ~3×10^9 nucleotides
• Complete sequence is known (2001)
HOW DOES IT WORK??
![Page 3: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/3.jpg)
Comparative Genomics
• Goal: Functional annotation of the genome
– What is the role of each region of the genome?
– Very hard to answer...
• Idea: Look not only at what our genome is now, but also at how it evolved
– Different types of functional regions have different evolutionary signatures
• Complete genomes are sequenced for:
– Human, chimp, mouse, rat, house, chicken, zebrafish, pufferfish
• Partial genomes are available for:
– Dog, cow, rabbit, elephant, armadillo
![Page 4: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/4.jpg)
![Page 5: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/5.jpg)
Mutations
G(t)   = ACGTAGGCGATCAG---ATCGAT
G(t+1) = ACGAAGG--ATCAGGGGATCGAT
• Most frequent mutations: substitutions, deletions, insertions
• Other less frequent mutations:
– Duplications
– Genome rearrangements (e.g. large inversions)
• Mutations happen randomly
• Natural selection favors mutations that improve fitness
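The three small-scale mutation types can be read directly off the two aligned sequences G(t) and G(t+1) from this slide. The following is a minimal sketch; the function name `classify_mutations` is hypothetical and the per-base counting convention is an assumption made for illustration.

```python
def classify_mutations(before: str, after: str) -> dict:
    """Count mutated bases column by column in a gapped pairwise alignment."""
    if len(before) != len(after):
        raise ValueError("aligned sequences must have equal length")
    counts = {"substitutions": 0, "deleted_bases": 0, "inserted_bases": 0}
    for old, new in zip(before, after):
        if old == "-" and new == "-":
            continue  # column carries no information
        if old == "-":
            counts["inserted_bases"] += 1   # base exists only at t+1
        elif new == "-":
            counts["deleted_bases"] += 1    # base exists only at t
        elif old != new:
            counts["substitutions"] += 1    # both present but different
    return counts


# The two sequences shown on the slide.
g_t = "ACGTAGGCGATCAG---ATCGAT"
g_t1 = "ACGAAGG--ATCAGGGGATCGAT"
result = classify_mutations(g_t, g_t1)
```

On the slide's example this counts one substitution (T to A), two deleted bases (CG) and three inserted bases (GGG).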
![Page 6: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/6.jpg)
A random walk in genome space
![Page 7: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/7.jpg)
http://www.broad.mit.edu/personal/jpvinson/phylogenetics/bigtree_1_0.jpg
Mammalian evolution
- Rapid radiation ~75 Myrs ago
- Many nearly independent lineages
- Many “noisy” copies of ancestor
- Accurate reconstruction of ancestors may be feasible
![Page 8: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/8.jpg)
Ancestral Genome Reconstruction
Given:
- Genomic sequences of several mammals
- Phylogenetic tree
Find: The genomic sequence of all their ancestors
ARMADILLO TGCTACTAATATTTAGTACATAGAGCCCAGGGGTGCTGCTGAAAGTCTTAAAATGCACAGTGTAGCCCCTCCTCC
COW GCCTCTCTTTCTGCCCTGCAGGCTAGAATGTATCACTTAGATGTTCCAAATCAGAAAGTGTTCAGCCATTTCCATACC
HORSE GTCACAATTTAGGAAGTGCCACTGGCCTCTAGAGGGTAGAAGACAGGGATGCTAATAATCATCCCACGTCATCCTACAGTGCTCAGAACAGCACCCCTACCCTCACCCC
CAT GTCACAGTTTAGGGGGTACTACTGGCATCTATCGGGTGGAGGATAGGGATACTGATAATCATTCTACAGTGCACAGGACAGTACCCCTACTTTCACCCC
DOG GTCACAATTTGGGGGATACTACTGGCATCTAATGGGTAGAGGACAGGGATACTGATAATTGCTTTACAGTGCACAGGACAGCACCCTTATCTTCACCCC
HEDGEHOG GTCATAGTTTGATTATATGGGCTTCTTAGTAGACAAAGAAAAAGATGTTCTGGTAGTCATTCTGCTTTCCATATGATAGCACTCCCATCTTCACTTC
MOUSE GTCACAGTTTGGAGGATGTTACTGACATCTAGAGAGTAGACTTTAAAGATACTGATAGTCACCCCATTGTGCACCTCC
RAT GTCACAATTTGGAGGATGTTACTGGCATCTAGAGAGTAGACTTTAAGGACACTGATAATCATACTATGCTGCACTTCC
RABBIT ATCACAATTTGGGGAACACCACTGGCATCTCGGGTAGCAGGCCAGGCATGCTGGTAATTATACTACAGTGCACAGTACAGTTCCCCACATCCCGCACC
LEMUR ATCACAATTGGGGGTGCCACGGTCCTCCAGTGGGTAGAGAACAGGGAGGCTGATAACCACCCTGCAGTGCACAGGGCAGTGCCCCACTCCCACCAC
MOUSE-LEMUR ATCACAGTTGGGGGATGCCACTGGCCTCAAGTGGGTAGAGAACAGGGAGGCTGAAAACCACCCTGCAGAGCACGGGGCAGTGCCTTCACCACCACTCC
VERVET GTCAGAATTTGGGGGATGCTTCTGGCTCTACTTGGGTAGAGAAACAGGGATGCTTATAATCATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCC
MACAQUE GTCAGAATTTGGGGGATGCTTCTGGCTCTACTTGGGTAGAGAAACAGGAATGCTTATAATCATCCTACAGTGCACAGGTCAGTACCCCCACCCACACTCC
BABOON GTCAGAATTTGGGGGATGCTTCTGGCTCTACTTGGGTAGAAAAACAGGGATGCTTATAATCATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCC
ORANGUTAN GTCACGATTTGGGAGATGCTTCTGGCTCGACTTGGGTAGAGAAGCGGGGATGCTTATAATCATCCAACAGTGCACAGGACAGTACCCCCACCCACACTCC
GORILLA GTCACGATTTGGGGGATGCTTCTGGCTCAACTTGGGTAGAGAAGTGGGGATGCTTATACTCATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCC
CHIMP GTCACGATTTGGGGGATGCTTCTGGCTCAACTTGGGTAGAGAAGCGGGGATGCTTATAATCATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCC
HUMAN GTCACGATTTGGGGGATGCTTCTGGCTCAACTTGGGTAGAGAAGCGGGGATGCTTATAATCATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCC
Mutational operations
• Small-scale: Substitutions, deletions, insertions (incl. transposons)
• Large-scale: Genome rearrangement, segmental/tandem duplications
All of it: Functional, non-functional, introns, intergenic, repeats, everything*!
(*): Heterochromatin not included
![Page 9: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/9.jpg)
Reconstruction algorithm
1) Identify syntenic regions in each species
• Blastz (Schwartz et al.) and Chaining/netting program (Kent et al.)
• In ENCODE case: targeted BAC sequencing
![Page 10: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/10.jpg)
Reconstruction algorithm
2) Compute multiple genome alignment
• TBA program (Blanchette, Miller, et al.)
ARMADILLO ----------------TGCTACTAATAT-----T-TAGTA-CATAGAG-CC-CAGGGGTGCTGCTGAAA----------GTCTTAAAATGCACAGTGTAGCCCCTCCTCC------------ACAAAGAATTAACTAGCCCAGAATGTCAGGA--------GT--A-CCAAG
COW GCCTCTCTTT-----------CTGCCCTGCAGGC-TAGAA-TGTATCA-CT-TAGATGTTCCAA---------------ATCAGAAAGTGTTCAG----------CCATTTCCATACCACC----AGGAGCTA-CAATGTTGGGCTGCAGCTA--------TTTGGATCAAA
HORSE GTCACAATTTAGGAAGTGCCACTGGCCT-----C-TAGAG-GGTAGAA-GA-CAGGGATGCTAATAATCATCCCACGTCATCCTACAGTGCTCAGAACAGCACCCCTACCCTCACCCCATCAACAAAGAATTATCCAGCCCAAAATGCCAATA--------GT--GCCCAGA
CAT GTCACAGTTTAGGGGGTACTACTGGCAT-----C-TATCG-GGTGGAG-GA-TAGGGATACTGATAATC----------ATTCTACAGTGCACAGGACAGTACCCCTACTTTCACCCCACAA-CAAAGAATTATCCAGCCCAAAATGCCAACA--------GT--GCTCAGA
DOG GTCACAATTTGGGGGATACTACTGGCAT-----C-TAATG-GGTAGAG-GA-CAGGGATACTGATAATT----------GCTTTACAGTGCACAGGACAGCACCCTTATCTTCACCCCAAAAGCAAAGTATTATCCAGCCCCAAATGCCAATG--------GT--GCTCAGA
HEDGEHOG GTCATAGTTT----GATTATATGGGCTT-----CTTAGTA-GACAAAGAAA-AAGATGTTCTGGTAGTC----------ATTCTGCTTTCCATATGATAGCACTCCCATCTTCACTTCCAAAATTAAGAGTCATCATACTCAGTGTGCCAATA--------TG--GCCCAGA
MOUSE GTCACAGTTTGGAGGATGTTACTGACAT-----C-TAGAG-AGTAGAC-TT-TAAAGATACTGATAGTC----------ACCCCATTGTGCAC---------------------CTCCAACAATAATGGCTCATCGAAACCTAAATGCCAATCTGCCAATTAT--GTCCATG
RAT GTCACAATTTGGAGGATGTTACTGGCAT-----C-TAGAG-AGTAGAC-TT-TAAGGACACTGATAATC----------ATACTATGCTGCAC---------------------TTCCAACAATAATGGCTCATCTAGACCTAAATACCAATCTGCCAATTAT--ATCCATG
RABBIT ATCACAATTTGGGGAACACCACTGGCAT-----C-TCGGGTAGCAGGC----CAGGCATGCTGGTAATT----------ATACTACAGTGCACAGTACAGTTCCCCACATCCCGCACCAACAACA--GGTTTATGCTGCCCAAAGTGCCAGTGTGC-----------CCACG
LEMUR ATCACAA-TTGGGGG-TGCCACGGTCCT-----C-CAGTG-GGTAGAG-AA-CAGGGAGGCTGATAACC----------ACCCTGCAGTGCACAGGGCAGTGCC-CCACTCCCACCACAACAATGGAGAATTATTGGGCCCCAAATGCCAATA--------GT--GCCCAAG
MOUSELEMUR ATCACAG-TTGGGGGATGCCACTGGCCT-----C-AAGTG-GGTAGAG-AA-CAGGGAGGCTGAAAACC----------ACCCTGCAGAGCACGGGGCAGTGCCTTCACCACCACTCCAACAACGGAGAATTATTGGGTCCCAAATGCCAATA--------GT--GCCCAGG
VERVET GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAG-AAACAGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGAACCCAAAATGTTAATA--------GT--GTCCAGG
MACAQUE GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAG-AAACAGGAATGCTTATAATC----------ATCCTACAGTGCACAGGTCAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGGACCCAAAATGCTAATG--------GT--GTCCAGG
BABOON GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAA-AAACAGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGGACCCAAAATGTTAATG--------GT--GTCCAGG
ORANGUTAN GTCACGATTTGGGAGATGCTTCTGGCTC-----G-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCAACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCACTGGACCCAAAATGTTAATG--------GT--GTCCAGG
GORILLA GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGTGGGGATGCTTATACTC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCGAAAATGTTAATG--------GT--GTCCAGG
CHIMP GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCGAAAATGTTAATG--------GT--GTCCAGA
HUMAN GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCTAAAATGTTAATG--------GT--GTCCAGG
• Goal: Phylogenetic correctness
• Two nucleotides are aligned if and only if they have a common ancestor.
![Page 11: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/11.jpg)
Reconstruction algorithm
3) Reconstruct insertion/deletion history
• Find most likely explanation for gaps observed
ARMADILLO ----------------TGCTACTAATAT-----T-TAGTA-CATAGAG-CC-CAGGGGTGCTGCTGAAA----------GTCTTAAAATGCACAGTGTAGCCCCTCCTCC------------ACAAAGAATTAACTAGCCCAGAATGTCAGGA--------GT--A-CCAAG
COW GCCTCTCTTT-----------CTGCCCTGCAGGC-TAGAA-TGTATCA-CT-TAGATGTTCCAA---------------ATCAGAAAGTGTTCAG----------CCATTTCCATACCACC----AGGAGCTA-CAATGTTGGGCTGCAGCTA--------TTTGGATCAAA
HORSE GTCACAATTTAGGAAGTGCCACTGGCCT-----C-TAGAG-GGTAGAA-GA-CAGGGATGCTAATAATCATCCCACGTCATCCTACAGTGCTCAGAACAGCACCCCTACCCTCACCCCATCAACAAAGAATTATCCAGCCCAAAATGCCAATA--------GT--GCCCAGA
CAT GTCACAGTTTAGGGGGTACTACTGGCAT-----C-TATCG-GGTGGAG-GA-TAGGGATACTGATAATC----------ATTCTACAGTGCACAGGACAGTACCCCTACTTTCACCCCACAA-CAAAGAATTATCCAGCCCAAAATGCCAACA--------GT--GCTCAGA
DOG GTCACAATTTGGGGGATACTACTGGCAT-----C-TAATG-GGTAGAG-GA-CAGGGATACTGATAATT----------GCTTTACAGTGCACAGGACAGCACCCTTATCTTCACCCCAAAAGCAAAGTATTATCCAGCCCCAAATGCCAATG--------GT--GCTCAGA
HEDGEHOG GTCATAGTTT----GATTATATGGGCTT-----CTTAGTA-GACAAAGAAA-AAGATGTTCTGGTAGTC----------ATTCTGCTTTCCATATGATAGCACTCCCATCTTCACTTCCAAAATTAAGAGTCATCATACTCAGTGTGCCAATA--------TG--GCCCAGA
MOUSE GTCACAGTTTGGAGGATGTTACTGACAT-----C-TAGAG-AGTAGAC-TT-TAAAGATACTGATAGTC----------ACCCCATTGTGCAC---------------------CTCCAACAATAATGGCTCATCGAAACCTAAATGCCAATCTGCCAATTAT--GTCCATG
RAT GTCACAATTTGGAGGATGTTACTGGCAT-----C-TAGAG-AGTAGAC-TT-TAAGGACACTGATAATC----------ATACTATGCTGCAC---------------------TTCCAACAATAATGGCTCATCTAGACCTAAATACCAATCTGCCAATTAT--ATCCATG
RABBIT ATCACAATTTGGGGAACACCACTGGCAT-----C-TCGGGTAGCAGGC----CAGGCATGCTGGTAATT----------ATACTACAGTGCACAGTACAGTTCCCCACATCCCGCACCAACAACA--GGTTTATGCTGCCCAAAGTGCCAGTGTGC-----------CCACG
LEMUR ATCACAA-TTGGGGG-TGCCACGGTCCT-----C-CAGTG-GGTAGAG-AA-CAGGGAGGCTGATAACC----------ACCCTGCAGTGCACAGGGCAGTGCC-CCACTCCCACCACAACAATGGAGAATTATTGGGCCCCAAATGCCAATA--------GT--GCCCAAG
MOUSELEMUR ATCACAG-TTGGGGGATGCCACTGGCCT-----C-AAGTG-GGTAGAG-AA-CAGGGAGGCTGAAAACC----------ACCCTGCAGAGCACGGGGCAGTGCCTTCACCACCACTCCAACAACGGAGAATTATTGGGTCCCAAATGCCAATA--------GT--GCCCAGG
VERVET GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAG-AAACAGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGAACCCAAAATGTTAATA--------GT--GTCCAGG
MACAQUE GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAG-AAACAGGAATGCTTATAATC----------ATCCTACAGTGCACAGGTCAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGGACCCAAAATGCTAATG--------GT--GTCCAGG
BABOON GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAA-AAACAGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGGACCCAAAATGTTAATG--------GT--GTCCAGG
ORANGUTAN GTCACGATTTGGGAGATGCTTCTGGCTC-----G-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCAACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCACTGGACCCAAAATGTTAATG--------GT--GTCCAGG
GORILLA GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGTGGGGATGCTTATACTC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCGAAAATGTTAATG--------GT--GTCCAGG
CHIMP GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCGAAAATGTTAATG--------GT--GTCCAGA
HUMAN GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCTAAAATGTTAATG--------GT--GTCCAGG
![Page 13: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/13.jpg)
Reconstruction algorithm
3) Reconstruct insertion/deletion history
– Find most likely explanation for gaps observed
• This defines the presence/absence of a base at each position of each ancestor
ARMADILLO ----------------TGCTACTAATAT-----T-TAGTA-CATAGAG-CC-CAGGGGTGCTGCTGAAA----------GTCTTAAAATGCACAGTGTAGCCCCTCCTCC------------ACAAAGAATTAACTAGCCCAGAATGTCAGGA--------GT--A-CCAAG
COW GCCTCTCTTT-----------CTGCCCTGCAGGC-TAGAA-TGTATCA-CT-TAGATGTTCCAA---------------ATCAGAAAGTGTTCAG----------CCATTTCCATACCACC----AGGAGCTA-CAATGTTGGGCTGCAGCTA--------TTTGGATCAAA
HORSE GTCACAATTTAGGAAGTGCCACTGGCCT-----C-TAGAG-GGTAGAA-GA-CAGGGATGCTAATAATCATCCCACGTCATCCTACAGTGCTCAGAACAGCACCCCTACCCTCACCCCATCAACAAAGAATTATCCAGCCCAAAATGCCAATA--------GT--GCCCAGA
CAT GTCACAGTTTAGGGGGTACTACTGGCAT-----C-TATCG-GGTGGAG-GA-TAGGGATACTGATAATC----------ATTCTACAGTGCACAGGACAGTACCCCTACTTTCACCCCACAA-CAAAGAATTATCCAGCCCAAAATGCCAACA--------GT--GCTCAGA
DOG GTCACAATTTGGGGGATACTACTGGCAT-----C-TAATG-GGTAGAG-GA-CAGGGATACTGATAATT----------GCTTTACAGTGCACAGGACAGCACCCTTATCTTCACCCCAAAAGCAAAGTATTATCCAGCCCCAAATGCCAATG--------GT--GCTCAGA
HEDGEHOG GTCATAGTTT----GATTATATGGGCTT-----CTTAGTA-GACAAAGAAA-AAGATGTTCTGGTAGTC----------ATTCTGCTTTCCATATGATAGCACTCCCATCTTCACTTCCAAAATTAAGAGTCATCATACTCAGTGTGCCAATA--------TG--GCCCAGA
MOUSE GTCACAGTTTGGAGGATGTTACTGACAT-----C-TAGAG-AGTAGAC-TT-TAAAGATACTGATAGTC----------ACCCCATTGTGCAC---------------------CTCCAACAATAATGGCTCATCGAAACCTAAATGCCAATCTGCCAATTAT--GTCCATG
RAT GTCACAATTTGGAGGATGTTACTGGCAT-----C-TAGAG-AGTAGAC-TT-TAAGGACACTGATAATC----------ATACTATGCTGCAC---------------------TTCCAACAATAATGGCTCATCTAGACCTAAATACCAATCTGCCAATTAT--ATCCATG
RABBIT ATCACAATTTGGGGAACACCACTGGCAT-----C-TCGGGTAGCAGGC----CAGGCATGCTGGTAATT----------ATACTACAGTGCACAGTACAGTTCCCCACATCCCGCACCAACAACA--GGTTTATGCTGCCCAAAGTGCCAGTGTGC-----------CCACG
LEMUR ATCACAA-TTGGGGG-TGCCACGGTCCT-----C-CAGTG-GGTAGAG-AA-CAGGGAGGCTGATAACC----------ACCCTGCAGTGCACAGGGCAGTGCC-CCACTCCCACCACAACAATGGAGAATTATTGGGCCCCAAATGCCAATA--------GT--GCCCAAG
MOUSELEMUR ATCACAG-TTGGGGGATGCCACTGGCCT-----C-AAGTG-GGTAGAG-AA-CAGGGAGGCTGAAAACC----------ACCCTGCAGAGCACGGGGCAGTGCCTTCACCACCACTCCAACAACGGAGAATTATTGGGTCCCAAATGCCAATA--------GT--GCCCAGG
VERVET GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAG-AAACAGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGAACCCAAAATGTTAATA--------GT--GTCCAGG
MACAQUE GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAG-AAACAGGAATGCTTATAATC----------ATCCTACAGTGCACAGGTCAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGGACCCAAAATGCTAATG--------GT--GTCCAGG
BABOON GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAA-AAACAGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGGACCCAAAATGTTAATG--------GT--GTCCAGG
ORANGUTAN GTCACGATTTGGGAGATGCTTCTGGCTC-----G-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCAACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCACTGGACCCAAAATGTTAATG--------GT--GTCCAGG
GORILLA GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGTGGGGATGCTTATACTC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCGAAAATGTTAATG--------GT--GTCCAGG
CHIMP GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCGAAAATGTTAATG--------GT--GTCCAGA
HUMAN GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCTAAAATGTTAATG--------GT--GTCCAGG
NNNNNNNNNNNNNNNNNNNNNNNNNNNN-----N-NNNNN-NNNNNNN-NN-NNNNNNNNNNNNNNNNN----------NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
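To illustrate what "presence/absence of a base at each position of each ancestor" means, here is a minimal sketch that assigns presence or absence to every internal node for one alignment column. It uses plain Fitch parsimony as a stand-in for the likelihood-based method of the talk, treats each column independently (the talk's method works on whole gaps), and runs on a hypothetical five-species toy topology. The function name `fitch_presence` and the tie-breaking rule (prefer presence at the root) are assumptions.

```python
def fitch_presence(tree, present):
    """Infer presence (True) or absence (False) of a base at each internal node.

    tree    -- nested 2-tuples of leaf names, e.g. (("HUMAN", "CHIMP"), "MOUSE")
    present -- dict mapping leaf name to True (base) or False (gap) for one column
    Returns a dict mapping each internal node (its tuple) to True or False.
    """
    candidate = {}

    def up(node):
        # Bottom-up pass: intersect the children's sets, or take the union if disjoint.
        if isinstance(node, str):
            candidate[node] = {present[node]}
            return candidate[node]
        left, right = up(node[0]), up(node[1])
        common = left & right
        candidate[node] = common if common else left | right
        return candidate[node]

    assigned = {}

    def down(node, parent_state):
        # Top-down pass: follow the parent when possible, else prefer presence.
        if isinstance(node, str):
            return
        options = candidate[node]
        state = parent_state if parent_state in options else max(options)
        assigned[node] = state
        down(node[0], state)
        down(node[1], state)

    up(tree)
    down(tree, None)
    return assigned


# Hypothetical toy topology, not the tree used in the talk.
toy_tree = ((("HUMAN", "CHIMP"), "MOUSE"), ("DOG", "COW"))

# Column with a gap only in mouse: explained by one deletion on the mouse branch.
gap_in_mouse = fitch_presence(
    toy_tree,
    {"HUMAN": True, "CHIMP": True, "MOUSE": False, "DOG": True, "COW": True},
)

# Column present only in human and chimp: explained by one insertion in their ancestor.
primate_only = fitch_presence(
    toy_tree,
    {"HUMAN": True, "CHIMP": True, "MOUSE": False, "DOG": False, "COW": False},
)
```

In the first column every ancestor is inferred to carry the base; in the second only the human-chimp ancestor does.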
![Page 14: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/14.jpg)
Reconstruction algorithm
ARMADILLO ----------------TGCTACTAATAT-----T-TAGTA-CATAGAG-CC-CAGGGGTGCTGCTGAAA----------GTCTTAAAATGCACAGTGTAGCCCCTCCTCC------------ACAAAGAATTAACTAGCCCAGAATGTCAGGA--------GT--A-CCAAG
COW GCCTCTCTTT-----------CTGCCCTGCAGGC-TAGAA-TGTATCA-CT-TAGATGTTCCAA---------------ATCAGAAAGTGTTCAG----------CCATTTCCATACCACC----AGGAGCTA-CAATGTTGGGCTGCAGCTA--------TTTGGATCAAA
HORSE GTCACAATTTAGGAAGTGCCACTGGCCT-----C-TAGAG-GGTAGAA-GA-CAGGGATGCTAATAATCATCCCACGTCATCCTACAGTGCTCAGAACAGCACCCCTACCCTCACCCCATCAACAAAGAATTATCCAGCCCAAAATGCCAATA--------GT--GCCCAGA
CAT GTCACAGTTTAGGGGGTACTACTGGCAT-----C-TATCG-GGTGGAG-GA-TAGGGATACTGATAATC----------ATTCTACAGTGCACAGGACAGTACCCCTACTTTCACCCCACAA-CAAAGAATTATCCAGCCCAAAATGCCAACA--------GT--GCTCAGA
DOG GTCACAATTTGGGGGATACTACTGGCAT-----C-TAATG-GGTAGAG-GA-CAGGGATACTGATAATT----------GCTTTACAGTGCACAGGACAGCACCCTTATCTTCACCCCAAAAGCAAAGTATTATCCAGCCCCAAATGCCAATG--------GT--GCTCAGA
HEDGEHOG GTCATAGTTT----GATTATATGGGCTT-----CTTAGTA-GACAAAGAAA-AAGATGTTCTGGTAGTC----------ATTCTGCTTTCCATATGATAGCACTCCCATCTTCACTTCCAAAATTAAGAGTCATCATACTCAGTGTGCCAATA--------TG--GCCCAGA
MOUSE GTCACAGTTTGGAGGATGTTACTGACAT-----C-TAGAG-AGTAGAC-TT-TAAAGATACTGATAGTC----------ACCCCATTGTGCAC---------------------CTCCAACAATAATGGCTCATCGAAACCTAAATGCCAATCTGCCAATTAT--GTCCATG
RAT GTCACAATTTGGAGGATGTTACTGGCAT-----C-TAGAG-AGTAGAC-TT-TAAGGACACTGATAATC----------ATACTATGCTGCAC---------------------TTCCAACAATAATGGCTCATCTAGACCTAAATACCAATCTGCCAATTAT--ATCCATG
RABBIT ATCACAATTTGGGGAACACCACTGGCAT-----C-TCGGGTAGCAGGC----CAGGCATGCTGGTAATT----------ATACTACAGTGCACAGTACAGTTCCCCACATCCCGCACCAACAACA--GGTTTATGCTGCCCAAAGTGCCAGTGTGC-----------CCACG
LEMUR ATCACAA-TTGGGGG-TGCCACGGTCCT-----C-CAGTG-GGTAGAG-AA-CAGGGAGGCTGATAACC----------ACCCTGCAGTGCACAGGGCAGTGCC-CCACTCCCACCACAACAATGGAGAATTATTGGGCCCCAAATGCCAATA--------GT--GCCCAAG
MOUSELEMUR ATCACAG-TTGGGGGATGCCACTGGCCT-----C-AAGTG-GGTAGAG-AA-CAGGGAGGCTGAAAACC----------ACCCTGCAGAGCACGGGGCAGTGCCTTCACCACCACTCCAACAACGGAGAATTATTGGGTCCCAAATGCCAATA--------GT--GCCCAGG
VERVET GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAG-AAACAGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGAACCCAAAATGTTAATA--------GT--GTCCAGG
MACAQUE GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAG-AAACAGGAATGCTTATAATC----------ATCCTACAGTGCACAGGTCAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGGACCCAAAATGCTAATG--------GT--GTCCAGG
BABOON GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAA-AAACAGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGGACCCAAAATGTTAATG--------GT--GTCCAGG
ORANGUTAN GTCACGATTTGGGAGATGCTTCTGGCTC-----G-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCAACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCACTGGACCCAAAATGTTAATG--------GT--GTCCAGG
GORILLA GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGTGGGGATGCTTATACTC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCGAAAATGTTAATG--------GT--GTCCAGG
CHIMP GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCGAAAATGTTAATG--------GT--GTCCAGA
HUMAN GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCTAAAATGTTAATG--------GT--GTCCAGG
GTCACAATTTGGGGGATGCTACTGGCAT-----C-TAGTG-GGTAGAG-AA-CAGGGATGCTGATAATC----------ATCCTACAGTGCACAGGACAGTGCCCCCACCCCCACTCCAACAACAAAGAATTATCCGGCCCAAAATGCCAATA--------GT--GCCCAGG
4) Infer maximum-likelihood nucleotide at each position
– Felsenstein algorithm with context-sensitive model
• Ancestral sequences are inferred!
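Step 4 can be sketched for a single alignment column with Felsenstein's pruning algorithm. The sketch below swaps in the simple Jukes-Cantor substitution model instead of the context-sensitive model named on the slide, treats a gap at a leaf as missing data, and uses a uniform prior over bases at the root. The tree encoding `(left, t_left, right, t_right)` and all function names are assumptions made for illustration.

```python
import math

BASES = "ACGT"


def jc69_prob(same: bool, t: float) -> float:
    """Jukes-Cantor probability of seeing the same or a different base after time t."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


def partial_likelihoods(node):
    """Return [P(leaves below node | node has base x) for x in BASES].

    node is a leaf character ("A", "C", "G", "T" or "-") or a tuple
    (left_child, left_branch_length, right_child, right_branch_length).
    """
    if isinstance(node, str):
        if node == "-":
            return [1.0] * 4  # gap treated as missing data (assumption)
        return [1.0 if base == node else 0.0 for base in BASES]
    left, t_left, right, t_right = node
    like_left = partial_likelihoods(left)
    like_right = partial_likelihoods(right)
    result = []
    for x in BASES:
        from_left = sum(jc69_prob(x == y, t_left) * like_left[j]
                        for j, y in enumerate(BASES))
        from_right = sum(jc69_prob(x == y, t_right) * like_right[j]
                         for j, y in enumerate(BASES))
        result.append(from_left * from_right)
    return result


def most_likely_root_base(tree) -> str:
    """With a uniform prior, the most likely root base maximizes the likelihood."""
    likes = partial_likelihoods(tree)
    best = max(range(len(BASES)), key=lambda i: likes[i])
    return BASES[best]


# One column observed as A, A, G, A at four leaves of a hypothetical tree.
column_tree = (("A", 0.1, "A", 0.1), 0.1, ("G", 0.1, "A", 0.1), 0.1)
root_base = most_likely_root_base(column_tree)
```

Running the same computation at every column and every internal node gives a full set of inferred ancestral sequences under this simplified model.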
![Page 15: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/15.jpg)
Optimal indel reconstruction
Not so easy!
NNNNNNNNNNNNNNN
NN------NNNNNNN
NNNN-------NNNN
NNNNNN-----NNNN
![Page 18: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/18.jpg)
Reconstructing indel history
Not so easy!
NNNNNNNNNNNNNNN
NN------NNNNNNN
NNNN-------NNNN
NNNNNN-----NNNN
NNNNNNNNNNNNNNN
NN------NNNNNNN
NNNN-------NNNN
NNNNNN-----NNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NN----------------------NNNNNNN
NNNN-----------------------NNNN
NNNNNN---------------------NNNN
![Page 20: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/20.jpg)
Inferring indel history
• Given:
– A multiple sequence alignment
– A phylogenetic tree
– Probability model for deletions
  • Probability depends on deletion length and branch length
– Probability model for insertions
  • Probability depends on insertion length, branch length, and content
• Find: The most likely set of insertions and deletions that led to the given alignment
• NP-hard (Chindelevitch et al. 2006)
• Fredslund et al. (2003): Restricted enumeration
• Blanchette et al. (2004): Greedy algorithm
• Chindelevitch et al. (2006): Integer Linear Programming
![Page 21: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/21.jpg)
Partial Results - Deletions only
• If only deletions are allowed and all deletions have the same probability (cost), then:
– Rectangle-covering problem, where the tree determines which sets of rows are admissible
NNNNNNN---NN-----NNNNNNNNN--NN-----NN---NNNNNNNNNN---NNN--NNNNNNNNNNNNNN
– Exact polynomial-time greedy algorithm
– Idea: There always exists a “forced move”, i.e. a gap that can only be covered by a single maximal deletion
![Page 22: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/22.jpg)
Measuring accuracy
• We use simulations of mammalian sequence evolution to evaluate the accuracy of the reconstruction on neutrally evolving DNA.
- Start with a random (realistic) ancestral sequence
AGCATAGA
![Page 23: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/23.jpg)
Measuring accuracy
1) Simulate evolution along the mammalian tree
AGCATAGAACGACGATAAGCATAAGCATCAGAGCAAATCAGACTACAAGCATCAGCAGGAGGCTAGGACATCAAGGACACCAAGGACACCAAGGACCCCAAGGACCCCAAGGATTCAGGATTCAGGATTCAGGGTTCAGGGTTC
AGCATAGA
AGGATAGA
AGCATTAGA
AGCATTGAGA
![Page 24: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/24.jpg)
Measuring accuracy
- Use TBA to align the sequences generated
AG-C-AT---
ACGA-CG---
A----GC---
AGC--AT---
AGCA-A----
AGAC-TA---
AGCAATC---
AGGC------
AGGC------
AGGA-CA---
AGGA-CACCA
AGGA-CACCA
AGGA-CCCCA
AGGA-CCCCA
AGGA--TTC-
AGGA--TTC-
AGGA--TTC-
AGGG--TTC-
AGGG--TTC-
AGCATAGA
AGGATAGA
AGCATTAGA
AGCATTGAGA
![Page 25: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/25.jpg)
Measuring accuracy
- Reconstruct indel history from the alignment
AGCATAGA
AGGATAGA
AGCATTAGA
AGCATTGAGA
![Page 26: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/26.jpg)
Measuring accuracy
- Infer ancestral sequences at each node
AGCATAGA
AGGATAGA
AGCATTAGA
AGCATTGAGA
AGATCGA
AGCTTGAGA
AGTATTTAGA
AGTATAGGA
![Page 27: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/27.jpg)
Measuring accuracy
For each node, align true and predicted ancestor
Count: Missing bases + Added bases + Substituted bases
AGCATAGA
AGGATAGA
ACGCATTAGA
AGCATTGAGA
AGATCGA
AGCTTGAGA
AGTATTTAGA
AGTATAGGA
ACGCATT-AGA (true ancestor)
A-GTATTTAGA (predicted ancestor)
3 errors / 10 bp: error rate = 0.3
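The error count on this slide can be reproduced with a few lines of code. This is a minimal sketch: the function name `reconstruction_errors` is hypothetical, and dividing by the length of the true ancestor is an assumption (the true and predicted sequences in the slide's example are both 10 bp long, so the slide does not settle which one is the denominator).

```python
def reconstruction_errors(true_aln: str, pred_aln: str) -> dict:
    """Compare an aligned true ancestor with its aligned prediction."""
    if len(true_aln) != len(pred_aln):
        raise ValueError("aligned sequences must have equal length")
    missing = added = substituted = 0
    for true_base, pred_base in zip(true_aln, pred_aln):
        if true_base == "-" and pred_base == "-":
            continue
        if pred_base == "-":
            missing += 1        # base in the true ancestor, absent from the prediction
        elif true_base == "-":
            added += 1          # base in the prediction, absent from the true ancestor
        elif true_base != pred_base:
            substituted += 1    # both present but different
    true_length = sum(1 for base in true_aln if base != "-")
    total = missing + added + substituted
    return {
        "missing": missing,
        "added": added,
        "substituted": substituted,
        "error_rate": total / true_length,
    }


# The pair shown on the slide.
errors = reconstruction_errors("ACGCATT-AGA", "A-GTATTTAGA")
```

For the slide's pair this gives one missing base, one added base and one substitution, hence 3 errors over 10 bp.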
![Page 28: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/28.jpg)
Simulation details
• We simulate neutrally evolving regions of 50 kb
• We model:
- Lineage-specific neutral mutation rates
- Insertions and deletions based on empirical frequency and length distributions
- Insertion of transposable elements
- CpG effect
• We don’t model:
- DNA polymerase slippage
- Positive selection
- Genome rearrangement, duplications
• Sanity checks: Simulated sequences are similar to actual mammalian sequences:
– Same pair-wise percent identity
– Same frequency and length distribution of insertions and deletions
– Same repetitive content and age distribution of repeats
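To make the simulation idea concrete, here is a heavily simplified sketch of evolving a sequence along one branch. It is not the simulator described above. It assumes single-base indels, uniform substitutions, and fixed per-site rates, and it leaves out transposable elements, the CpG effect, lineage-specific rates, and empirical indel length distributions. The function name `evolve` and its parameters are hypothetical.

```python
import random

BASES = "ACGT"


def evolve(seq, sub_rate, ins_rate, del_rate, rng):
    """Evolve `seq` along one branch with per-site event probabilities.

    Each base is deleted with probability del_rate, otherwise substituted
    by a different base with probability sub_rate. After each original
    position, one random base is inserted with probability ins_rate.
    """
    out = []
    for base in seq:
        r = rng.random()
        if r < del_rate:
            pass  # the base is deleted
        elif r < del_rate + sub_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
        if rng.random() < ins_rate:
            out.append(rng.choice(BASES))
    return "".join(out)


# Usage: a seeded generator keeps a run reproducible.
rng = random.Random(0)
ancestor = "".join(rng.choice(BASES) for _ in range(200))
descendant = evolve(ancestor, sub_rate=0.05, ins_rate=0.01, del_rate=0.01, rng=rng)
```

Applying `evolve` recursively down a tree, with a different rate per branch, yields a set of leaf sequences whose true ancestors are known, which is what makes the accuracy measurement possible.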
![Page 29: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/29.jpg)
Guess which ancestor can be best reconstructed?
Eizirik et al. 2001
![Page 30: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/30.jpg)
Reconstructability and tree topology
Star phylogeny (root R with n independent descendents)
• Leaves are independent
• Accuracy approaches 100% exponentially fast as n increases
Bifurcating root (root R with children A and B, n dependent descendents)
• Information lost between R and A or B can’t be recovered
• Can’t do better than if A and B were reconstructed perfectly
• Accuracy < 100% for all n
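The contrast between the two topologies can be illustrated with a toy model that is not taken from the talk. Assume a two-state site, a majority-vote reconstruction, and symmetric change probabilities. In the star case each of the n leaves independently differs from the root with probability p. In the bifurcating case A and B each differ from R with probability q, and we are generously given A and B without error. The function names and parameters are hypothetical.

```python
from math import comb


def star_accuracy(n, p):
    """Probability that a majority vote over n independent leaves
    recovers the root state (ties are broken by a fair coin flip)."""
    total = 0.0
    for k in range(n + 1):  # k = number of leaves that kept the root state
        prob = comb(n, k) * (1 - p) ** k * p ** (n - k)
        if 2 * k > n:
            total += prob
        elif 2 * k == n:
            total += 0.5 * prob
    return total


def bifurcating_upper_bound(q):
    """Best achievable accuracy at R when A and B are known exactly.

    If A and B agree, guessing their shared state is right when neither
    changed, with probability (1 - q)**2. If they disagree, a coin flip
    is right half the time, contributing q * (1 - q). The sum is 1 - q.
    """
    return (1 - q) ** 2 + q * (1 - q)


# Accuracy for a star tree with growing n, and the bifurcating ceiling.
star_curve = [star_accuracy(n, 0.2) for n in (1, 3, 11, 51, 101)]
ceiling = bifurcating_upper_bound(0.2)
```

Under these assumptions the star accuracy climbs toward 1 as n grows, while the bifurcating ceiling stays at 1 - q however many descendants hang below A and B.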
![Page 31: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/31.jpg)
Eizirik et al. 2001
![Page 32: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/32.jpg)
How many species do we need?
Best choice of species:
- Sample many taxa
- Choose slowly evolving species
[Figure: percentage of error (missing bases, added bases, mismatches) against the number of species used (4, 5, 7, 10, 15, 20); y-axis from 0 to 14]
![Page 33: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/33.jpg)
What if the fast-radiation model is wrong?
[Figure: error percentage (added bases, missing bases, mismatches) against the multiplicative factor for early branches (0, 1, 2, 4); y-axis from 0 to 7]
![Page 34: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/34.jpg)
Reconstructing real ancestors
![Page 35: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/35.jpg)
MOUSE-LEMUR
COW
RAT
CHIMP, GORILLA, ORANGUTAN, MACAQUE, VERVET, BABOON
For this set of species, simulations predict:
- Expected accuracy ~95%
![Page 36: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/36.jpg)
External validation using ancestral transposons
Transposon consensus
Actual mammalian ancestor
Human relic
![Page 37: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/37.jpg)
External validation using ancestral transposons
Transposon consensus
Actual mammalian ancestor
0.391 subst/site
0.117 subst/site
Reconstructed mammalian ancestor
Human relic
0.314 subst/site
![Page 38: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/38.jpg)
External validation using ancestral transposons
Transposon consensus
Actual mammalian ancestor
0.391 subst/site
0.117 subst/site
Error = 0.026 subst/site
Reconstructed mammalian ancestor
Human relic
0.314 subst/site
![Page 39: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/39.jpg)
![Page 40: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/40.jpg)
What’s next? Whole genome!
• Data available
– Whole genomes: Human, chimp, mouse, rat, dog
– Unassembled/low coverage genomes: Cow, rabbit, armadillo, elephant
• Challenges:
– Fewer species
– Unassembled contigs
– Genome rearrangements
– Recombination hotspots
We expect that 90% of the Boreoeutherian genome can be reconstructed with ~90% accuracy
![Page 41: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/41.jpg)
![Page 42: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/42.jpg)
Why should we care?
• An ancestral genome lets us see what changed in our genome and when
– Allows detection and “dating” of lineage-specific innovations (e.g. FOXP2).
• Allows a better understanding of the forces driving genome evolution
• New model organism?
– Human genome is 4 times closer to the ancestral genome than to the mouse genome: better model for human phenotypes?
![Page 43: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/43.jpg)
![Page 44: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/44.jpg)
Even if we had the full genomes of all living mammalian species:
• Technological problem:
– We can’t synthesize large regions of DNA
• Many regions can’t be reconstructed at all:
– Heterochromatin
– Regions with high recombination rates
• 99% base-by-base accuracy is not enough
– One mistake may be enough to make life impossible
![Page 45: In silico reconstruction of an ancestral mammalian genome](https://reader036.fdocuments.in/reader036/viewer/2022062409/568150a5550346895dbead02/html5/thumbnails/45.jpg)
Acknowledgements
• David Haussler, Brian Raney (UC Santa Cruz)
• Webb Miller (Penn State Univ.)
• Eric Green (NHGRI)
• UC Santa Cruz group:
– Adam Siepel, Robert Baertsch, Gill Bejerano, Jim Kent
• McGill group:
– Leonid Chindelevitch, Zhentao Li, Eric Blais