Posted: 19-Dec-2015
Phylogeny
- A brief introduction in 4 hours -
Outline
• Introduction
• Practical approach
• Evolutionary models
• Distance-based methods / TP5_1
• Databases and software
• Sequence-based methods / TP5_2
What is phylogeny?
Phylogeny is the evolutionary history of species and the relationships among them.
Why is phylogeny of interest in a proteomics course?
What data types can be used to infer phylogenies?
• Morphological characters
• Physiological characters
• Gene order (e.g. in mitochondria)
• Sequence data
  – Nucleotide sequences
  – Amino acid sequences
• Mixed characters
• …
What is a phylogenetic tree?
• A phylogenetic tree is a model of the evolutionary relationships among species (operational taxonomic units, OTUs), based on homologous characters
• But not all trees are phylogenetic trees
  – Dendrogram: general term for a branching diagram
  – Cladogram: branching diagram without branch length estimates
  – Phylogenetic tree or phylogram: branching diagram with branch length estimates
What is a phylogenetic tree?
• Rooted or unrooted
• Bifurcating or multifurcating (resolved or unresolved)
Gene duplication
• Prokaryotes: at least 50%
• Eukaryotes: >90%
After gene duplication
• Coexistence (normally only for a short while)
• Mostly, only one copy is retained; the other copy
  – becomes nonfunctional (nonfunctionalization),
  – becomes a pseudogene (pseudogenization), or
  – is lost
• Both copies are retained
  – Distinct expression patterns
  – Distinct subcellular locations (rare)
  – One copy keeps the original function, the other copy acquires a new function (neofunctionalization)
  – Deleterious mutations in both copies, which together retain the ancestral function (subfunctionalization)
Relationships within homologs
[Figure: gene tree of an ancestral gene with a gene duplication, giving genes A and B in human, mouse and frog, and a single Drosophila gene AB; the genes are labelled as homologs, with orthologs and paralogs marked]
Homologs …
Homologs = Genes of common origin
Orthologs = (1) Genes resulting from a speciation event; (2) Genes originating from an ancestral gene in the last common ancestor of the compared genomes
Co-orthologs = Orthologs that have undergone lineage-specific gene duplications subsequent to a particular speciation event
Paralogs = Genes resulting from gene duplication
Inparalogs = Paralogs resulting from lineage-specific duplication(s) subsequent to a particular speciation event
Outparalogs = Paralogs resulting from gene duplication(s) preceding a particular speciation event
One-to-one (1:1) orthologs = Orthologs with no (known) lineage-specific gene duplications subsequent to a particular speciation event
One-to-many (1:n) orthologs = Orthologs where at least one, and at most all but one, of the compared lineages has undergone lineage-specific gene duplication subsequent to a particular speciation event
Many-to-many (n:n) orthologs = Orthologs where all of the compared lineages have undergone lineage-specific gene duplications subsequent to a particular speciation event
Xenologs = Orthologs derived by horizontal gene transfer from another lineage
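For two species, the 1:1, 1:n and n:n cases can be told apart by counting how many genes in each species descend from the single ancestral gene present at the speciation event. The sketch below assumes those copy counts are already known. `classify_orthologs` is a hypothetical helper and is not taken from any ortholog database.

```python
def classify_orthologs(copies_a: int, copies_b: int) -> str:
    """Classify the orthology relationship between two species (hypothetical helper).

    copies_a, copies_b: number of genes in species A and in species B that
    descend from the one ancestral gene present at their speciation event.
    """
    if copies_a < 1 or copies_b < 1:
        raise ValueError("each species needs at least one copy to have orthologs")
    if copies_a == 1 and copies_b == 1:
        return "1:1"  # no lineage-specific duplication in either lineage
    if copies_a == 1 or copies_b == 1:
        return "1:n"  # duplication(s) in one of the two lineages only
    return "n:n"      # duplications in both lineages


if __name__ == "__main__":
    # Drosophila gene AB versus human genes A and B (co-orthologs of gene AB)
    print(classify_orthologs(1, 2))
```

With more than two species, the same counting would be applied to every lineage that descends from the speciation event.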
Relationships between orthologs and paralogs
[Figure: the same gene tree (ancestral gene, gene duplication, human, mouse and frog genes A and B, Drosophila gene AB), annotated with: Orthologs (Group 1), Orthologs (Group 2), Outparalogs of Group 1, Inparalogs of Group 2, Co-orthologs of Drosophila gene AB]
Practical approach I
Actin-related protein 2 (first 60 columns of the alignment)

    ARP2_A MESAP---IVLDNGTGFVKVGYAKDNFPRFQFPSIVGRPILRAEEKTGNVQIKDVMVGDE
    ARP2_B MDSQGRKVIVVDNGTGFVKCGYAGTNFPAHIFPSMVGRPIVRSTQRVGNIEIKDLMVGEE
    ARP2_C MDSQGRKVVVCDNGTGFVKCGYAGSNFPEHIFPALVGRPIIRSTTKVGNIEIKDLMVGDE
    ARP2_D MDSQGRKVVVCDNGTGFVKCGYAGSNFPEHIFPALVGRPIIRSTTKVGNIEIKDLMVGDE
    ARP2_E MDSKGRNVIVCDNGTGFVKCGYAGSNFPTHIFPSMVGRPMIRAVNKIGDIEVKDLMVGDE
           *:*     :* ******** ***  *** . **::****::*:  . *::::**:***:*

Species are: Caenorhabditis briggsae, Drosophila melanogaster, Homo sapiens, Mus musculus, Schizosaccharomyces pombe
Can you build a dendrogram (tree) for the sequences of the alignment?
Can you assign the species to the corresponding sequences of the alignment?
Phylogenetic analysis
1. Select data
2. Alignment
3. Select a data model
4. Select a substitution model
5. Tree-building
   • [Distance matrix]
   • Tree-building
6. Tree evaluation
Select data
• To be considered:
  – Input data must be homologous!
  – Number of character states
  – Phylogenetic information content
  – Size of the dataset
  – Automatically clustered data from large datasets
  – etc.
Alignment
• MSA methods
  – ClustalW
  – MUSCLE
  – MAFFT
  – ProbCons
  – T-Coffee
  – …
• See previous course …
Data model
= Characters selected for the analysis
• To be considered:
  – Each character should be homologous!
  – Missing data (in some OTUs)
  – Number of characters
  – etc.
Evolutionary models
Phylogenetic tree-building presumes a particular evolutionary model. The model used influences the outcome of the analysis and should be taken into account when interpreting the results.
• Which aspects are to be considered?
  1. Frequencies of amino acid exchanges
  2. Change of amino acid frequencies during evolution
  3. Between-site rate variation, or among-site substitution rate heterogeneity
  4. Presence of invariable sites
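Evolutionary models
Notation, e.g.
• JTT
• JTT + F
• JTT + F + gamma (4)
• JTT + F + gamma (8) + I (under discussion)
• JTT + F + I
(JTT = Jones-Taylor-Thornton amino acid substitution model; + F = amino acid frequencies estimated from the data; + gamma (k) = gamma-distributed among-site rate variation approximated with k discrete rate categories; + I = a proportion of invariable sites)
The most complex model does not always produce the best result.
The more complex the model, the harder the results are to explain.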
Tree-building methods
• Distance (matrix) methods
  1. Calculate distances for all pairs of taxa based on the sequence alignment
  2. Construct a phylogenetic tree based on the distance matrix
• Character-based (sequence) methods
  1. Construct a phylogenetic tree directly from the sequence alignment
Step 1: Compute distances
1. Estimate the number of amino acid substitutions between sequence pairs
p-distance: $\hat{p} = n_d / n$
where $\hat{p}$ = proportion of differing sites (p-distance), $n_d$ = number of amino acid differences, $n$ = number of amino acid sites used
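The p-distance can be computed directly from the ARP2 alignment shown earlier. The sketch below is minimal. Its gap handling is an assumption, not something stated on the slide: columns with a gap in either sequence are skipped (pairwise deletion), so $n$ is the number of gap-free columns for each pair.

```python
from itertools import combinations

# First 60 alignment columns of actin-related protein 2 (course example)
ALIGNMENT = {
    "ARP2_A": "MESAP---IVLDNGTGFVKVGYAKDNFPRFQFPSIVGRPILRAEEKTGNVQIKDVMVGDE",
    "ARP2_B": "MDSQGRKVIVVDNGTGFVKCGYAGTNFPAHIFPSMVGRPIVRSTQRVGNIEIKDLMVGEE",
    "ARP2_C": "MDSQGRKVVVCDNGTGFVKCGYAGSNFPEHIFPALVGRPIIRSTTKVGNIEIKDLMVGDE",
    "ARP2_D": "MDSQGRKVVVCDNGTGFVKCGYAGSNFPEHIFPALVGRPIIRSTTKVGNIEIKDLMVGDE",
    "ARP2_E": "MDSKGRNVIVCDNGTGFVKCGYAGSNFPTHIFPSMVGRPMIRAVNKIGDIEVKDLMVGDE",
}


def p_distance(seq1: str, seq2: str, gap: str = "-") -> float:
    """p = n_d / n, counted over columns where neither sequence has a gap."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (equal length)")
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != gap and b != gap]
    n = len(pairs)                            # number of amino acid sites used
    n_d = sum(1 for a, b in pairs if a != b)  # number of amino acid differences
    return n_d / n


def distance_matrix(alignment: dict) -> dict:
    """p-distances for all unordered pairs of sequences."""
    return {(x, y): p_distance(alignment[x], alignment[y])
            for x, y in combinations(sorted(alignment), 2)}


if __name__ == "__main__":
    for (x, y), p in distance_matrix(ALIGNMENT).items():
        print(f"{x} {y} p = {p:.3f}")
```

ARP2_C and ARP2_D are identical over these 60 columns, so their p-distance is 0.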
Step 1: Compute distances
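• p is a nonlinear function of t (time): as sequences diverge, multiple substitutions at the same site accumulate but are not observed, so p increasingly underestimates the number of substitutions
• Estimation of the number of aa substitutions
  – Poisson correction
    • PC distance: $d = -\ln(1 - p)$
  – Gamma correction
    • Gamma distance: $d_G = a\left[(1 - p)^{-1/a} - 1\right]$, where $a$ is the shape parameter of the gamma distribution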
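Here is a minimal sketch of the two corrections. It uses the standard formulas d = -ln(1 - p) for the Poisson correction and d = a[(1 - p)^(-1/a) - 1] for the gamma correction, where a is the gamma shape parameter. The value a = 2.0 in the example is arbitrary; in practice a has to be estimated from the data.

```python
import math


def poisson_distance(p: float) -> float:
    """Poisson-corrected (PC) distance: d = -ln(1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1.0 - p)


def gamma_distance(p: float, a: float) -> float:
    """Gamma distance: d = a * ((1 - p) ** (-1 / a) - 1), gamma shape parameter a > 0."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if a <= 0:
        raise ValueError("shape parameter a must be positive")
    return a * ((1.0 - p) ** (-1.0 / a) - 1.0)


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5, 0.7):
        print(f"p={p:.1f}  PC={poisson_distance(p):.3f}  "
              f"Gamma(a=2)={gamma_distance(p, 2.0):.3f}")
```

Both corrections exceed p, and the gap widens as p grows. A smaller a means stronger rate variation among sites and a larger correction. As a grows very large, the gamma distance approaches the PC distance.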
Step 2: Tree-building
Common distance methods
• Neighbor Joining (NJ)
• UPGMA / WPGMA
• Least Squares (LS)
• Minimum Evolution (ME)
Neighbor Joining (NJ)
• Saitou, Nei (1987)
• Principle
  – Clustering method
  – Simplified minimum evolution principle
  – Neighbors = taxa connected by a single node in an unrooted tree
  – Computational process: start from a star tree, then successively join neighbors, creating new pairs of neighbors
  – Result:
    • A single final tree with branch length estimates
    • Unrooted tree
Neighbor Joining (NJ)
• Sum of branch lengths in the star tree
• Calculate the sum of all branch lengths for every possible pair of neighbors …
Neighbor Joining (NJ)
• Calculate the length of branch X-Y
• Calculate the sum of all branch lengths again
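Here is a minimal sketch of the joining loop. It uses the Q-matrix formulation of Studier and Keppler (1988), which selects the same pair at each step as Saitou and Nei's minimum-sum-of-branch-lengths criterion but avoids recomputing the star-tree sums. The helper names and the five-taxon example distances are illustrative and do not come from the course material. Ties in Q are broken by input order, and with non-additive data some branch lengths can come out negative.

```python
def symmetric(pairs):
    """Expand {(x, y): d} given once per pair into a dict with (x, y) and (y, x)."""
    d = {}
    for (x, y), v in pairs.items():
        d[(x, y)] = d[(y, x)] = v
    return d


def neighbor_joining(names, d):
    """Neighbor Joining; returns an unrooted tree as a Newick string.

    names: taxon labels; d: symmetric dict (x, y) -> distance for all distinct pairs.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    dist = {(x, y): d[(x, y)] for x in nodes for y in nodes if x != y}
    while len(nodes) > 3:
        n = len(nodes)
        # r[x]: sum of distances from x to all other current nodes
        r = {x: sum(dist[(x, k)] for k in nodes if k != x) for x in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        i, j = min(((x, y) for a, x in enumerate(nodes) for y in nodes[a + 1:]),
                   key=lambda p: (n - 2) * dist[p] - r[p[0]] - r[p[1]])
        # Branch lengths from i and j to their new parent node u
        li = dist[(i, j)] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = dist[(i, j)] - li
        u = f"({i}:{li:.4f},{j}:{lj:.4f})"  # new node is labelled by its Newick subtree
        for k in nodes:
            if k not in (i, j):
                dist[(u, k)] = dist[(k, u)] = (dist[(i, k)] + dist[(j, k)] - dist[(i, j)]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Three nodes remain: solve the last star tree exactly
    a, b, c = nodes
    la = (dist[(a, b)] + dist[(a, c)] - dist[(b, c)]) / 2
    return f"({a}:{la:.4f},{b}:{dist[(a, b)] - la:.4f},{c}:{dist[(a, c)] - la:.4f});"


if __name__ == "__main__":
    example = symmetric({("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
                         ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
                         ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3})
    print(neighbor_joining(["a", "b", "c", "d", "e"], example))
```

For this additive example matrix, the first step joins a and b with branch lengths 2 and 3, and the final tree reproduces every input distance.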
Neighbor Joining (NJ)
Neighbor Joining (NJ)
• Advantages
  – Very efficient
  – Also suitable for large datasets
• Disadvantage
  – Does not examine all possible topologies
Bootstrap
• Used to test the robustness of a tree topology
• Bradley Efron (1979); applied to phylogenies by Felsenstein (1985)
• Principle: new MSA datasets (pseudo-replicates) are created by randomly choosing N columns, with replacement, from the original MSA, where N is the length of the original MSA
• 100-1000 replicates
• Bootstrap support values: (75%), 95%, 98%
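A pseudo-replicate can be generated as in the sketch below. The helper names and toy sequences are hypothetical, and this is not the code used by any particular package. Each replicate would then be run through the same tree-building method. The bootstrap support of a clade is the percentage of replicate trees that contain it; those two steps are not shown.

```python
import random


def bootstrap_replicate(alignment, rng):
    """One pseudo-replicate: N columns drawn at random with replacement,
    where N is the length of the original alignment."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    columns = [rng.randrange(n_cols) for _ in range(n_cols)]
    # The same column indices are used for every sequence, so columns stay intact
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n_replicates=100, seed=None):
    """Generate n_replicates pseudo-replicate alignments (a seed makes runs reproducible)."""
    rng = random.Random(seed)
    return [bootstrap_replicate(alignment, rng) for _ in range(n_replicates)]


if __name__ == "__main__":
    toy = {"t1": "MESAPIVL", "t2": "MDSQGIVV", "t3": "MDSKGIVC"}  # hypothetical toy data
    for rep in bootstrap_replicates(toy, n_replicates=3, seed=1):
        print(rep)
```

Because sampling is with replacement, some original columns appear several times in a replicate and others not at all.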
TP5 - 1st part, Exercises 1-5
http://education.expasy.org/m07_phylo.html
Ortholog databases & phylogenetic databases
Some databases providing orthologous groups and trees
• COG/KOG
• HOGENOM
• Ensembl
• OMA browser
• OrthoDB
• OrthoMCL
• Pfam
• PANDIT
• SYSTERS
• TreeBase
• Tree of Life
Phylogenetic software
Software packages
• Freely available
  – Phylip
  – BioNJ
  – PhyML
  – Tree Puzzle
  – MrBayes
  – MEGA
• Commercial
  – PAUP
Phylogenetic servers
• http://www.phylogeny.fr/
• http://bioweb.pasteur.fr/seqanal/phylogeny/intro-uk.html
• http://atgc.lirmm.fr/phyml/
• http://phylobench.vital-it.ch/raxml-bb/
• http://www.fbsc.ncifcrf.gov/app/htdocs/appdb/drawpage.php?appname=PAUP
• http://power.nhri.org.tw/power/home.htm
Sequence methods
Most common:
• Maximum Parsimony (MP)
• Maximum Likelihood (ML)
• Bayesian Inference
Maximum Parsimony (MP)
• Originally developed for morphological characters
• Hennig, 1966
• William of Ockham: the best hypothesis is the one that requires the smallest number of assumptions
Maximum Parsimony (MP)
• Principle:
  – Estimate the minimum number of substitutions required by a given topology
  – Use parsimony-informative sites only (exclude invariable sites and singletons)
  – Searching for MP trees
    • Exhaustive search
    • Branch-and-bound (Hendy & Penny, 1982)
      – Guaranteed to find the MP tree(s), but time-consuming if m > 20 (m = number of sequences)
    • Heuristic search
      – The resulting tree might not be the most parsimonious tree
  – Result
    • Multiple equally parsimonious trees are possible (summarized as a strict consensus tree or a majority-rule consensus tree)
    • Most parsimonious tree vs. true tree
    • Unrooted result trees
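To make "minimum number of substitutions for a given topology" concrete, the sketch below uses Fitch's (1971) algorithm. This is a standard choice for unweighted characters, but the slide does not name it. The sketch assumes the unrooted tree is written as a rooted binary tree, which does not change the Fitch score, and it treats a gap as an ordinary state. Skipping uninformative sites does not change which topology wins: invariable sites cost 0 steps and singletons cost 1 step on every topology.

```python
from collections import Counter


def fitch(tree, states):
    """Fitch small-parsimony pass for one site on a binary tree.

    tree: a leaf name (str) or a pair (left_subtree, right_subtree).
    states: dict leaf name -> character state at this site.
    Returns (candidate state set at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    s1, c1 = fitch(left, states)
    s2, c2 = fitch(right, states)
    common = s1 & s2
    if common:                    # intersection: no extra change needed
        return common, c1 + c2
    return s1 | s2, c1 + c2 + 1   # union: one substitution on one of the two branches


def is_parsimony_informative(column):
    """At least two character states, each present in at least two sequences."""
    return sum(1 for count in Counter(column).values() if count >= 2) >= 2


def parsimony_length(tree, alignment):
    """Sum of Fitch changes over parsimony-informative sites only."""
    names = sorted(alignment)
    total = 0
    for i in range(len(alignment[names[0]])):
        column = [alignment[name][i] for name in names]
        if is_parsimony_informative(column):
            total += fitch(tree, {name: alignment[name][i] for name in names})[1]
    return total


if __name__ == "__main__":
    aln = {"t1": "AAG", "t2": "AAG", "t3": "GAT", "t4": "GCT"}  # hypothetical toy data
    for topology in ((("t1", "t2"), ("t3", "t4")), (("t1", "t3"), ("t2", "t4"))):
        print(topology, parsimony_length(topology, aln))
```

In the toy data the second site is a singleton and is skipped. The topology grouping t1 with t2 needs fewer changes and is therefore the more parsimonious of the two.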
Maximum Parsimony (MP)
• Advantages
  – No explicit substitution model (model-free)
• Disadvantages
  – Does not correct for homoplasy (e.g. multiple substitutions at the same site)
  – Long-branch attraction (LBA): produces wrong topologies if the substitution rate varies extensively between lineages
Maximum Likelihood (ML)
• Cavalli-Sforza, Edwards (1967), gene frequency data
• Felsenstein (1981), nucleotide sequences
• Kishino (1990), proteins
• Principle
  – Maximizes the likelihood of observing the sequence data under a specific model of character state changes
  – Likelihood of a site = sum of the probabilities of every possible reconstruction of ancestral states at the internal nodes
  – Likelihood of the tree = product of the likelihoods of all sites (equivalently, log-likelihood of the tree = sum of the site log-likelihoods)
  – Result = tree with the highest likelihood
    • The likelihood is maximized to estimate branch lengths for a given topology; topologies themselves are explored by tree search
    • Search strategies: rarely exhaustive, mostly heuristic
      – NNI (nearest-neighbor interchange)
      – TBR (tree bisection and reconnection)
      – SPR (subtree pruning and regrafting)
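The "likelihood of a site" and "log-likelihood of the tree" can be made concrete with Felsenstein's pruning algorithm. Pruning computes the sum over all ancestral reconstructions node by node instead of enumerating them. For brevity, this minimal sketch assumes DNA data under the Jukes-Cantor (JC69) model rather than a protein model such as JTT. It evaluates one given tree with fixed branch lengths and does not optimize branch lengths or search topologies. The tree encoding and helper names are illustrative.

```python
import math

BASES = "ACGT"


def jc69(i, j, t):
    """JC69 probability of base j at the end of a branch of length t
    (expected substitutions per site), given base i at its start."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partial(node, site):
    """Pruning: dict x -> P(observed bases below node | node has state x).

    node: a leaf name (str) or a list of (child, branch_length) pairs.
    site: dict leaf name -> observed base at this site.
    """
    if isinstance(node, str):
        return {x: 1.0 if x == site[node] else 0.0 for x in BASES}
    result = {x: 1.0 for x in BASES}
    for child, t in node:
        child_partial = partial(child, site)
        for x in BASES:
            # Sum over the unobserved state y of the child node
            result[x] *= sum(jc69(x, y, t) * child_partial[y] for y in BASES)
    return result


def site_likelihood(tree, site):
    """Sum over root states, weighted by JC69's equal base frequencies (1/4)."""
    root = partial(tree, site)
    return sum(0.25 * root[x] for x in BASES)


def log_likelihood(tree, alignment):
    """Tree log-likelihood = sum over sites of the log site likelihoods."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    return sum(math.log(site_likelihood(tree, {name: alignment[name][i] for name in names}))
               for i in range(n_sites))


if __name__ == "__main__":
    # Hypothetical unrooted tree ((t1,t2),(t3,t4)), written with a basal trifurcation
    tree = [("t1", 0.1), ("t2", 0.1), ([("t3", 0.1), ("t4", 0.1)], 0.05)]
    aln = {"t1": "ACGT", "t2": "ACGT", "t3": "ACGA", "t4": "GCGA"}
    print(log_likelihood(tree, aln))
```

Under a reversible model like JC69, the position of the root does not affect the likelihood, so an unrooted tree can be evaluated from any internal node.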
Number of possible trees
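• Unrooted bifurcating trees: $N_U(n) = \dfrac{(2n-5)!}{2^{n-3}\,(n-3)!}$ (n ≥ 3 leaves)
• Rooted bifurcating trees: $N_R(n) = \dfrac{(2n-3)!}{2^{n-2}\,(n-2)!}$ (n ≥ 2 leaves)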
Number of possible trees
[Figure: number of rooted and unrooted trees by number of leaves]
Number of possible trees
| Leaves | Unrooted | Rooted |
|---|---|---|
| 3 | 1 | 3 |
| 4 | 3 | 15 |
| 5 | 15 | 105 |
| 6 | 105 | 945 |
| 7 | 945 | 10395 |
| 8 | 10395 | 135135 |
| 9 | 135135 | 2027025 |
| 10 | 2027025 | 34459425 |
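The tree counts can be reproduced with double factorials. The number of unrooted bifurcating trees for n leaves is (2n - 5)!!, which equals (2n - 5)! / (2^(n-3) (n - 3)!). The number of rooted bifurcating trees is (2n - 3)!!. This is a minimal sketch; the function names are illustrative.

```python
def odd_double_factorial(k):
    """k!! = k * (k - 2) * ... * 1 for odd k >= 1; 1 for k = -1 (empty product)."""
    result = 1
    for v in range(k, 0, -2):
        result *= v
    return result


def n_unrooted_trees(n):
    """Number of unrooted bifurcating trees for n >= 3 labelled leaves: (2n - 5)!!"""
    if n < 3:
        raise ValueError("an unrooted bifurcating tree needs at least 3 leaves")
    return odd_double_factorial(2 * n - 5)


def n_rooted_trees(n):
    """Number of rooted bifurcating trees for n >= 2 labelled leaves: (2n - 3)!!"""
    if n < 2:
        raise ValueError("a rooted bifurcating tree needs at least 2 leaves")
    return odd_double_factorial(2 * n - 3)


if __name__ == "__main__":
    for n in range(3, 11):
        print(n, n_unrooted_trees(n), n_rooted_trees(n))
```

A rooted tree with n leaves corresponds to an unrooted tree with n + 1 leaves, where the extra leaf marks the root. This is why each rooted count equals the unrooted count one row further down.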
Maximum Likelihood (ML)
• Programs:
  – ProML (Phylip)
  – PhyML
  – RAxML
  – …
Tree evaluation
1. Topology
   1. Comparison with the species tree
   2. Robustness, e.g. bootstrap
2. Branch lengths
TP5 – 2nd part, Exercise 6
http://education.expasy.org/m07_phylo.html