CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr....
![Page 1: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/1.jpg)
CSCE555 Bioinformatics
Lecture 12 Phylogenetics I
Meeting: MW 4:00PM-5:15PM SWGN2A21
Instructor: Dr. Jianjun Hu
Course page: http://www.scigen.org/csce555
University of South Carolina
Department of Computer Science and Engineering
2008 www.cse.sc.edu
HAPPY CHINESE NEW YEAR
![Page 2: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/2.jpg)
Outline
• Introduction to Evolution
• What is phylogeny and phylogenetics
• Application of phylogenetics
• Algorithms for phylogenetic inference
![Page 3: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/3.jpg)
How did life evolve on earth?
Courtesy of the Tree of Life project
An international effort to understand how life evolved on earth
Biomedical applications: drug design, protein structure and function prediction, biodiversity.
![Page 4: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/4.jpg)
Evolution
Evolution of new organisms is driven by
• Mutations
◦ The DNA sequence can be changed due to single base changes, deletion/insertion of DNA segments, etc.
• Selection bias
![Page 5: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/5.jpg)
Theory of Evolution
Basic idea
◦ Speciation events lead to the creation of different species.
◦ Speciation is caused by physical separation into groups where different genetic variants become dominant.
Any two species share a (possibly distant) common ancestor.
![Page 6: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/6.jpg)
Primate evolution
A phylogeny, also called a phylogenetic tree, is a tree that describes the sequence of speciation events that led to the formation of a set of current-day species.
![Page 7: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/7.jpg)
DNA Sequence Evolution
[Figure: a tree of DNA sequences diverging over time, one level per time point]
-3 mil yrs: AAGACTT
-2 mil yrs: AAGGCCT, TGGACTT
-1 mil yrs: AGGGCAT, TAGCCCT, AGCACTT
today: TAGCCCA, TAGACTT, AGCGCTT, AGCACAA, AGGGCAT
![Page 8: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/8.jpg)
Morphological vs. Molecular
Classical phylogenetic analysis uses morphological features: number of legs, lengths of legs, etc.
Modern biological methods allow the use of molecular features:
◦ Gene sequences
◦ Protein sequences
◦ Whole genome sequences, e.g. rearrangements
![Page 9: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/9.jpg)
Morphological topology (based on McKenna and Bell, 1997)
[Figure: mammalian tree built from morphological features]
Clade labels: Archonta, Glires, Ungulata, Carnivora, Insectivora, Xenarthra
Taxa, in figure order: Bonobo, Chimpanzee, Man, Gorilla, Sumatran orangutan, Bornean orangutan, Common gibbon, Barbary ape, Baboon, White-fronted capuchin, Slow loris, Tree shrew, Japanese pipistrelle, Long-tailed bat, Jamaican fruit-eating bat, Horseshoe bat, Little red flying fox, Ryukyu flying fox, Mouse, Rat, Vole, Cane-rat, Guinea pig, Squirrel, Dormouse, Rabbit, Pika, Pig, Hippopotamus, Sheep, Cow, Alpaca, Blue whale, Fin whale, Sperm whale, Donkey, Horse, Indian rhino, White rhino, Elephant, Aardvark, Grey seal, Harbor seal, Dog, Cat, Asiatic shrew, Long-clawed shrew, Small Madagascar hedgehog, Hedgehog, Gymnure, Mole, Armadillo, Bandicoot, Wallaroo, Opossum, Platypus
![Page 10: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/10.jpg)
From sequences to a phylogenetic tree
Rat     QEPGGLVVPPTDA
Rabbit  QEPGGMVVPPTDA
Gorilla QEPGGLVVPPTDA
Cat     REPGGLVVPPTEG
There are many possible types of sequences to use (e.g. mitochondrial vs. nuclear proteins).
![Page 11: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/11.jpg)
Mitochondrial topology (based on Pupko et al.)
[Figure: mammalian tree built from mitochondrial sequences]
Clade labels: Perissodactyla, Carnivora, Cetartiodactyla, Rodentia 1, Hedgehogs, Rodentia 2, Primates, Chiroptera, Moles+Shrews, Afrotheria, Xenarthra, Lagomorpha + Scandentia
Taxa, in figure order: Donkey, Horse, Indian rhino, White rhino, Grey seal, Harbor seal, Dog, Cat, Blue whale, Fin whale, Sperm whale, Hippopotamus, Sheep, Cow, Alpaca, Pig, Little red flying fox, Ryukyu flying fox, Horseshoe bat, Japanese pipistrelle, Long-tailed bat, Jamaican fruit-eating bat, Asiatic shrew, Long-clawed shrew, Mole, Small Madagascar hedgehog, Aardvark, Elephant, Armadillo, Rabbit, Pika, Tree shrew, Bonobo, Chimpanzee, Man, Gorilla, Sumatran orangutan, Bornean orangutan, Common gibbon, Barbary ape, Baboon, White-fronted capuchin, Slow loris, Squirrel, Dormouse, Cane-rat, Guinea pig, Mouse, Rat, Vole, Hedgehog, Gymnure, Bandicoot, Wallaroo, Opossum, Platypus
![Page 12: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/12.jpg)
Phylogenetic trees
Leaves - current-day species (or taxa – plural of taxon)
Internal vertices - hypothetical common ancestors
Edge lengths - “time” from one speciation to the next
[Figure: example tree with leaves Aardvark, Bison, Chimp, Dog, Elephant]
![Page 13: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/13.jpg)
Types of Trees
A natural model to consider is that of rooted trees.
[Figure: rooted tree whose root is labeled "Common Ancestor"]
![Page 14: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/14.jpg)
Types of trees
An unrooted tree represents the same phylogeny without the root node.
Depending on the model, data from current-day species does not distinguish between different placements of the root.
![Page 15: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/15.jpg)
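Rooted versus unrooted trees
[Figure: an unrooted tree with positions a, b, and c marked, next to three rooted trees labeled Tree a, Tree b, and Tree c]
The unrooted tree represents the three rooted trees.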
![Page 16: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/16.jpg)
What is phylogenetics?
Phylogenetics is the study of evolutionary relationships among and within species.
◦ Inference of trees from data
◦ Interpreting the evolutionary tree
◦ Application of evolutionary trees
[Figure: tree with leaves crocodiles, birds, lizards, snakes, rodents, primates, marsupials]
![Page 17: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/17.jpg)
What is phylogenetics?
[Figure: tree with leaves crocodiles, birds, lizards, snakes, rodents, primates, marsupials]
This is an example of a phylogenetic tree.
![Page 18: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/18.jpg)
Applications of phylogenetics
• Forensics: Did a patient’s HIV infection result from an invasive dental procedure performed by an HIV+ dentist?
• Conservation: How much gene flow is there among local populations of island foxes off the coast of California?
• Medicine: What are the evolutionary relationships among the various prion-related diseases?
HIV case
![Page 19: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/19.jpg)
Applications of phylogenetics
1. Forensics
Did a patient’s HIV infection result from an invasive dental procedure performed by an HIV+ dentist?
![Page 20: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/20.jpg)
Phylogenetic analysis
![Page 21: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/21.jpg)
So what do the results mean?
• 2 of 3 patients closer to dentist than to local controls. Statistical significance? More powerful analyses?
• Do we have enough data to be confident in our conclusions? What additional data would help?
• If we determine that the dentist’s virus is linked to those of patients E and G, what are possible interpretations of this pattern? How could we test between them?
![Page 22: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/22.jpg)
Applications of phylogenetics
2. Conservation
How much gene flow is there among local populations of island foxes off the coast of California?
![Page 23: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/23.jpg)
http://bioquest.org/bedrock/
Wayne, R. K., Morin, P. A. 2004. Conservation Genetics in the New Molecular Age. Frontiers in Ecology and the Environment 2: 89-97. (ESA publication)
![Page 24: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/24.jpg)
Applications of phylogenetics
3. Medicine
What are the evolutionary relationships among the various prion-related diseases?
![Page 25: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/25.jpg)
Inferring Phylogenies
Trees can be inferred from:
◦ Morphology of the organisms
◦ Sequence comparison
Example:
Orc: ACAGTGACGCCCCAAACGT
Elf: ACAGTGACGCTACAAACGT
Dwarf: CCTGTGACGTAACAAACGA
Hobbit: CCTGTGACGTAGCAAACGA
Human: CCTGTGACGTAGCAAACGA
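A minimal sketch of the sequence-comparison idea, using the five example sequences above. It counts mismatching positions between each pair of equal-length sequences (a Hamming distance). The function name `mismatches` and the choice of a plain mismatch count are assumptions for illustration; the slide does not say which distance measure is intended.

```python
from itertools import combinations

# Example sequences copied from the slide.
sequences = {
    "Orc":    "ACAGTGACGCCCCAAACGT",
    "Elf":    "ACAGTGACGCTACAAACGT",
    "Dwarf":  "CCTGTGACGTAACAAACGA",
    "Hobbit": "CCTGTGACGTAGCAAACGA",
    "Human":  "CCTGTGACGTAGCAAACGA",
}


def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Mismatch count for every unordered pair of species.
pairwise = {
    (p, q): mismatches(sequences[p], sequences[q])
    for p, q in combinations(sequences, 2)
}

for (p, q), d in pairwise.items():
    print(f"{p:>6} vs {q:<6}: {d}")
```

Pairs with few mismatches (Hobbit and Human are identical here, and Dwarf differs from Hobbit at one site) would be expected to sit close together in an inferred tree.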
![Page 26: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/26.jpg)
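How Many Trees?
| # sequences | # pairwise distances | Unrooted trees: # trees | Unrooted trees: # branches / tree | Rooted trees: # trees | Rooted trees: # branches / tree |
|---|---|---|---|---|---|
| 3 | | | | | |
| 4 | | | | | |
| 5 | | | | | |
| 6 | | | | | |
| 10 | | | | | |
| 30 | | | | | |
| N | | | | | |
(assuming bifurcation only)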
![Page 27: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/27.jpg)
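How Many Trees?
| # sequences | # pairwise distances | Unrooted trees: # trees | Unrooted trees: # branches / tree | Rooted trees: # trees | Rooted trees: # branches / tree |
|---|---|---|---|---|---|
| 3 | 3 | 1 | 3 | 3 | 4 |
| 4 | 6 | 3 | 5 | 15 | 6 |
| 5 | 10 | 15 | 7 | 105 | 8 |
| 6 | 15 | 105 | 9 | 945 | 10 |
| 10 | 45 | 2,027,025 | 17 | 34,459,425 | 18 |
| 30 | 435 | $8.69 \times 10^{36}$ | 57 | $4.95 \times 10^{38}$ | 58 |
| N | $\frac{N(N-1)}{2}$ | $\frac{(2N-5)!}{2^{N-3}(N-3)!}$ | $2N-3$ | $\frac{(2N-3)!}{2^{N-2}(N-2)!}$ | $2N-2$ |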
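The closed-form counts in the table can be checked numerically. The sketch below evaluates the factorial formulas for bifurcating trees with N sequences; the function names are hypothetical and chosen only for this illustration.

```python
from math import factorial


def num_pairwise_distances(n):
    """Number of unordered pairs among n sequences: N(N-1)/2."""
    return n * (n - 1) // 2


def num_unrooted_trees(n):
    """Bifurcating unrooted trees on n leaves: (2N-5)! / (2^(N-3) (N-3)!)."""
    if n < 3:
        raise ValueError("need at least 3 sequences")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


def num_rooted_trees(n):
    """Bifurcating rooted trees on n leaves: (2N-3)! / (2^(N-2) (N-2)!)."""
    if n < 2:
        raise ValueError("need at least 2 sequences")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


for n in (3, 4, 5, 6, 10, 30):
    print(
        n,
        num_pairwise_distances(n),
        num_unrooted_trees(n),
        2 * n - 3,  # branches per unrooted tree
        num_rooted_trees(n),
        2 * n - 2,  # branches per rooted tree
    )
```

The rapid growth of these counts is why exhaustive search over all trees quickly becomes impractical as the number of sequences increases.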
![Page 28: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/28.jpg)
Phylogenetic Methods
Many different procedures exist. Three of the most popular:
Neighbor-joining
• Minimizes distance between nearest neighbors
Maximum parsimony
• Minimizes total evolutionary change
Maximum likelihood
• Maximizes likelihood of observed data
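To make the neighbor-joining idea a little more concrete, here is a sketch of only the pair-selection step, assuming the commonly used Q criterion, Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k). The slide does not state which formulation it has in mind, so the criterion, the function name `nj_pick_pair`, and the 5-taxon distance matrix are all assumptions for illustration. A full neighbor-joining run would repeat this step, merging the chosen pair and updating the matrix each time.

```python
def nj_pick_pair(D):
    """Return ((i, j), q) for the pair of taxa with the smallest Q value.

    D is a symmetric distance matrix given as a list of lists.
    """
    n = len(D)
    if n < 3:
        raise ValueError("need at least 3 taxa")
    row_sums = [sum(row) for row in D]
    best_pair, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * D[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q


# Hypothetical distances among five taxa (indices 0..4).
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

pair, q = nj_pick_pair(D)
print(pair, q)
```

Note that the selected pair (taxa 0 and 1) is not the pair with the smallest raw distance (taxa 3 and 4): the criterion also accounts for how far each taxon is from all the others.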
![Page 29: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/29.jpg)
Comparison of Methods
| Neighbor-joining | Maximum parsimony | Maximum likelihood |
|---|---|---|
| Very fast | Slow | Very slow |
| Easily trapped in local optima | Assumptions fail when evolution is rapid | Highly dependent on assumed evolution model |
| Good for generating tentative tree, or choosing among multiple trees | Best option when tractable (<30 taxa, strong conservation) | Good for very small data sets and for testing trees built using other methods |
![Page 30: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/30.jpg)
Distance based tree Construction
Distance-based methods build a weighted tree that realizes the distances between the objects.
Given a set of species (leaves in a supposed tree) and distances between them, construct a phylogeny which best “fits” the distances.
![Page 31: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/31.jpg)
Distance Matrix
Given n species, we can compute the n x n distance matrix $D_{ij}$.
$D_{ij}$ may be defined as the edit distance between a gene in species i and species j, where the gene of interest is sequenced for all n species.
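As an illustration of this definition, the sketch below fills a distance matrix with edit distances. It assumes the unit-cost (Levenshtein) edit distance, where each insertion, deletion, or substitution costs 1; the slide does not fix a cost model. The three short "genes" are made up for the example.

```python
def edit_distance(a, b):
    """Unit-cost edit distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = cur
    return prev[-1]


def distance_matrix(genes):
    """Symmetric n x n matrix of pairwise edit distances."""
    n = len(genes)
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = edit_distance(genes[i], genes[j])
    return D


# Hypothetical gene sequences for three species.
genes = ["AAGACTT", "AAGGCTT", "AGACTT"]
D = distance_matrix(genes)
for row in D:
    print(row)
```

In practice, distances are often corrected for multiple substitutions at the same site, which this plain count does not attempt.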
![Page 32: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/32.jpg)
Distances in Trees
Edges may have weights reflecting:
◦ Number of mutations on the evolutionary path from one species to another
◦ Time estimate for evolution of one species into another
In a tree T, we often compute $d_{ij}(T)$, the length of the path between leaves i and j.
![Page 33: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/33.jpg)
Distance in Trees: an Example
[Figure: weighted tree with leaves i and j marked]
$d_{1,4} = 12 + 13 + 14 + 17 + 12 = 68$
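The path-length computation can be sketched as follows. The slide's figure is not recoverable here, so the tree topology, the internal node names (u, v, w, x), and the weights on the side branches are hypothetical. Only the five edge weights on the path from leaf 1 to leaf 4 (12, 13, 14, 17, 12) come from the example.

```python
def build_tree(edges):
    """Build an undirected adjacency map {node: {neighbor: weight}}."""
    tree = {}
    for a, b, w in edges:
        tree.setdefault(a, {})[b] = w
        tree.setdefault(b, {})[a] = w
    return tree


def path_length(tree, start, goal):
    """Sum of edge weights on the unique path between two nodes of a tree."""
    stack = [(start, None, 0)]  # (node, node we came from, distance so far)
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for nbr, w in tree[node].items():
            if nbr != parent:  # a tree has no cycles, so this suffices
                stack.append((nbr, node, dist + w))
    raise ValueError("goal is not reachable from start")


# Hypothetical topology; leaves are "1".."6", internal nodes are u, v, w, x.
edges = [
    ("1", "u", 12), ("u", "v", 13), ("v", "w", 14), ("w", "x", 17), ("x", "4", 12),
    ("u", "2", 5), ("v", "3", 6), ("w", "5", 7), ("x", "6", 8),
]
tree = build_tree(edges)
print(path_length(tree, "1", "4"))
```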
![Page 34: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/34.jpg)
Fitting Distance Matrix
Given n species, we can compute the n x n distance matrix $D_{ij}$.
Evolution of these genes is described by a tree that we don’t know.
We need an algorithm to construct a tree that best fits the distance matrix $D_{ij}$.
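One way to see what "fits" can mean: a distance matrix can be realized exactly by some weighted tree when it satisfies the four-point condition, which says that for every four species, the two largest of the three pairwise sums D(i,j) + D(k,l), D(i,k) + D(j,l), D(i,l) + D(j,k) are equal. The slide does not mention this test; it is offered here as a hedged sketch, and the function name `is_additive` and both matrices are made up for illustration.

```python
from itertools import combinations


def is_additive(D, tol=1e-9):
    """Check the four-point condition on a symmetric distance matrix D.

    With fewer than four species there are no quadruples to check,
    so the function returns True.
    """
    for i, j, k, l in combinations(range(len(D)), 4):
        sums = sorted([
            D[i][j] + D[k][l],
            D[i][k] + D[j][l],
            D[i][l] + D[j][k],
        ])
        if abs(sums[2] - sums[1]) > tol:  # two largest sums must agree
            return False
    return True


# Distances read off a hypothetical tree ((A,B),(C,D)) with leaf edges
# A=2, B=3, C=1, D=5 and an internal edge of length 4.
tree_like = [
    [0, 5, 7, 11],
    [5, 0, 8, 12],
    [7, 8, 0, 6],
    [11, 12, 6, 0],
]

# Same matrix with the A-D distance perturbed from 11 to 9.
perturbed = [
    [0, 5, 7, 9],
    [5, 0, 8, 12],
    [7, 8, 0, 6],
    [9, 12, 6, 0],
]

print(is_additive(tree_like), is_additive(perturbed))
```

Real data rarely satisfy the condition exactly, which is why the problem is phrased as finding the tree that best fits the matrix rather than one that matches it perfectly.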
![Page 35: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/35.jpg)
Summary
• Evolution and Phylogeny
• Concepts of Phylogenetics
• Application of Phylogenetics
• Categories of phylogenetic inference algorithms
Next lecture:
• Detailed algorithms for phylogenetic inference
![Page 36: CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: .](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649f055503460f94c1a84f/html5/thumbnails/36.jpg)
Acknowledgement
Anonymous authors