The Genome Access Course Phylogenetic Analysis
![Page 1: The Genome Access Course Phylogenetic Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062423/5681467a550346895db39dfd/html5/thumbnails/1.jpg)
The Genome Access Course
Phylogenetic Analysis
![Page 2: The Genome Access Course Phylogenetic Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062423/5681467a550346895db39dfd/html5/thumbnails/2.jpg)
Phylogenetics
• Developed by Willi Hennig (Grundzüge einer Theorie der phylogenetischen Systematik, 1950; Phylogenetic Systematics, 1966)
![Page 3: The Genome Access Course Phylogenetic Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062423/5681467a550346895db39dfd/html5/thumbnails/3.jpg)
What is the ancestral sequence?
• pfeffer
• pepper
• (pf/p)e(ff/pp)er
![Page 4: The Genome Access Course Phylogenetic Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062423/5681467a550346895db39dfd/html5/thumbnails/4.jpg)
Evolutionary Trees
• A tree is a connected, acyclic 2D graph
• Leaf: Taxon
• Node: Vertex
• Branch: Edge
• Tree length = sum of all branch lengths
• Phylogenetic trees are binary trees
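The definitions on this slide can be sketched in a few lines of Python. The `Node` type, its field names, and the example branch lengths are hypothetical and chosen only for illustration; they do not come from the course or from any phylogenetics package.

```python
from typing import NamedTuple


class Node(NamedTuple):
    """A vertex of a rooted binary tree (hypothetical representation)."""
    name: str
    length: float          # length of the branch (edge) leading to this node
    children: tuple = ()   # empty for a leaf (taxon), two subtrees otherwise


def tree_length(node: Node) -> float:
    """Tree length = sum of all branch lengths below and including this node."""
    return node.length + sum(tree_length(child) for child in node.children)


def leaves(node: Node) -> list:
    """Names of the taxa (leaves) in left-to-right order."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaves(child)]


# Made-up example: ((A:2, B:3):1, C:4)
tree = Node("root", 0.0, (
    Node("AB", 1.0, (Node("A", 2.0), Node("B", 3.0))),
    Node("C", 4.0),
))

print(leaves(tree), tree_length(tree))
```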
![Page 5: The Genome Access Course Phylogenetic Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062423/5681467a550346895db39dfd/html5/thumbnails/5.jpg)
A Generic Tree
![Page 6: The Genome Access Course Phylogenetic Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062423/5681467a550346895db39dfd/html5/thumbnails/6.jpg)
Evolutionary Trees
• Rooted
– common ancestor
– unique path to any leaf
– directed
• Unrooted
– root could be placed anywhere
– fewer possible than rooted
![Page 7: The Genome Access Course Phylogenetic Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062423/5681467a550346895db39dfd/html5/thumbnails/7.jpg)
Rooted Tree
generated by DRAWGRAM (PHYLIP)
![Page 8: The Genome Access Course Phylogenetic Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062423/5681467a550346895db39dfd/html5/thumbnails/8.jpg)
Unrooted Tree
generated by DRAWTREE (PHYLIP)
![Page 9: The Genome Access Course Phylogenetic Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062423/5681467a550346895db39dfd/html5/thumbnails/9.jpg)
Possible Evolutionary Trees
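Rooted: $(2n-3)! / (2^{n-2}(n-2)!)$
Unrooted: $(2n-5)! / (2^{n-3}(n-3)!)$

| Taxa (n) | Rooted | Unrooted |
|---|---|---|
| 2 | 1 | 1 |
| 3 | 3 | 1 |
| 4 | 15 | 3 |
| 5 | 105 | 15 |
| 6 | 945 | 105 |
| 7 | 10395 | 945 |
| 8 | 135135 | 10395 |
| 9 | 2027025 | 135135 |
| 10 | 34459425 | 2027025 |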
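The counts in this table can be checked directly from the two formulas. The sketch below is only an illustration; the function names are made up here, and the case n = 2 for unrooted trees is handled separately because the formula would need the factorial of a negative number.

```python
from math import factorial


def rooted_trees(n: int) -> int:
    """Rooted binary trees on n labeled taxa: (2n-3)! / (2^(n-2) * (n-2)!)."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_trees(n: int) -> int:
    """Unrooted binary trees on n labeled taxa: (2n-5)! / (2^(n-3) * (n-3)!)."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    if n == 2:
        return 1  # a single branch joining the two taxa
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in range(2, 11):
    print(n, rooted_trees(n), unrooted_trees(n))
```

For n ≥ 3 the unrooted count for n taxa equals the rooted count for n − 1 taxa, which is why the two columns are shifted copies of each other.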
![Page 10: The Genome Access Course Phylogenetic Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062423/5681467a550346895db39dfd/html5/thumbnails/10.jpg)
Genes vs. Species
• Sequences show gene relationships, but phylogenetic histories may be different for gene and species
• Genes evolve at different speeds
• Horizontal gene transfer
![Page 11: The Genome Access Course Phylogenetic Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062423/5681467a550346895db39dfd/html5/thumbnails/11.jpg)
Methods for Phylogenetic Analysis
• Character-State
– Maximum Parsimony
– Maximum Likelihood
• Genetic Distance
– Fitch & Margoliash
– Neighbor-Joining
– Unweighted Pair Group
![Page 12: The Genome Access Course Phylogenetic Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062423/5681467a550346895db39dfd/html5/thumbnails/12.jpg)
Phylogenetic Software
• PHYLIP
• PAUP (Available in GCG)
• TREE-PUZZLE
• PhyloBLAST
• Felsenstein maintains an extensive list of programs on the PHYLIP site
![Page 13: The Genome Access Course Phylogenetic Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062423/5681467a550346895db39dfd/html5/thumbnails/13.jpg)
PHYLIP Programs
• dnapars/protpars
• dnadist/protdist
• dnaml (use fastDNAml instead)
• neighbor
• fitch/kitsch
• drawtree/drawgram
![Page 14: The Genome Access Course Phylogenetic Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062423/5681467a550346895db39dfd/html5/thumbnails/14.jpg)
Maximum Parsimony
• Most common method
• Allows use of all evolutionary information
• Build and score all possible trees
• Each node is a transformation in a character state
• Minimize tree length
• Best tree requires the fewest changes to derive all sequences
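Scoring one candidate tree can be sketched as follows. The slide does not say how changes are counted; this example assumes Fitch's small-parsimony algorithm, with each tree written as nested 2-tuples of leaf names. The alignment and both trees are invented for illustration.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    tree   : a leaf name (str) or a 2-tuple (left_subtree, right_subtree)
    states : dict mapping leaf name -> observed character at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # Children can agree on a state: no change needed at this node.
        return common, left_cost + right_cost
    # Children disagree: one change is required on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_length(tree, alignment):
    """Total number of changes the tree needs, summed over all sites."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


# Made-up alignment and two candidate topologies
alignment = {"A": "AAG", "B": "AAG", "C": "CAT", "D": "CAG"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(parsimony_length(tree_1, alignment), parsimony_length(tree_2, alignment))
```

The tree with the smaller total is the more parsimonious of the two. A full search would repeat this scoring over every candidate topology.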
![Page 15: The Genome Access Course Phylogenetic Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062423/5681467a550346895db39dfd/html5/thumbnails/15.jpg)
Which is the more parsimonious tree?
• 3 nodes, 9 node crossings
• 3 nodes, 8 node crossings
![Page 16: The Genome Access Course Phylogenetic Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062423/5681467a550346895db39dfd/html5/thumbnails/16.jpg)
Maximum Likelihood
• Reconstruction using an explicit evolutionary model
• The likelihood of a tree is calculated separately for each nucleotide site. The product of the likelihoods for each site provides the overall likelihood of the observed data.
• Demanding computationally
• Slowest method
• Use to test (or improve) an existing tree
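The per-site product can be illustrated on the smallest possible case: two aligned sequences joined by a single branch. The slide does not name a substitution model, so the sketch assumes the Jukes-Cantor (JC69) model. The sequences and the grid of candidate branch lengths are made up. Logarithms are summed instead of multiplying raw likelihoods to avoid numerical underflow.

```python
import math


def jc69_log_likelihood(seq_a, seq_b, t):
    """Log-likelihood of two aligned sequences under JC69 (assumed model).

    t is the branch length in expected substitutions per site (t > 0).
    """
    e = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * e   # probability a base is unchanged after time t
    p_diff = 0.25 - 0.25 * e   # probability of one specific different base
    log_l = 0.0
    for x, y in zip(seq_a, seq_b):
        # Site likelihood: equilibrium frequency 1/4 times transition probability
        site_likelihood = 0.25 * (p_same if x == y else p_diff)
        log_l += math.log(site_likelihood)   # log of the product over sites
    return log_l


seq_a = "ACGTACGTAC"
seq_b = "ACGTACGTGT"   # differs from seq_a at 2 of 10 sites

# Crude grid search for the branch length that maximizes the likelihood
candidates = [i / 1000 for i in range(1, 2001)]
best_t = max(candidates, key=lambda t: jc69_log_likelihood(seq_a, seq_b, t))
print(best_t)
```

On a real tree the same idea applies, but each site likelihood sums over all possible states at the internal nodes, which is what makes the method computationally demanding.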
![Page 17: The Genome Access Course Phylogenetic Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062423/5681467a550346895db39dfd/html5/thumbnails/17.jpg)
Clustering Algorithms
• Use distances to calculate phylogenetic trees
• Trees are based on the relative numbers of similarities and differences between sequences
• A distance matrix is constructed by computing pairwise distances for all sequences
• Clustering links successively more distant taxa
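A distance matrix of this kind can be sketched as below. The example assumes the sequences are already aligned to equal length and uses the proportion of differing sites (the p-distance) as the pairwise distance; the function names and sequences are illustrative only.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs: dict) -> dict:
    """Pairwise distances for all sequences, as a dict of dicts."""
    names = list(seqs)
    return {i: {j: p_distance(seqs[i], seqs[j]) for j in names} for i in names}


# Made-up aligned sequences
seqs = {"A": "ACGT", "B": "ACGA", "C": "TCGA"}
matrix = distance_matrix(seqs)

for name, row in matrix.items():
    print(name, row)
```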
![Page 18: The Genome Access Course Phylogenetic Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062423/5681467a550346895db39dfd/html5/thumbnails/18.jpg)
DNA Distances
• Distances between pairs of DNA sequences are relatively simple to compute as the sum of all base pair differences between the two sequences
• Can only work for pairs of sequences that are similar enough to be aligned
• All base changes are considered equal
• Insertion/deletions are generally given a larger weight than replacements (gap penalties).
• Possible to correct for multiple substitutions at a single site, which is common in distant relationships and for rapidly evolving sites.
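One common correction for multiple substitutions is the Jukes-Cantor distance, $d = -\tfrac{3}{4}\ln\left(1 - \tfrac{4}{3}p\right)$, where $p$ is the observed proportion of differing sites. The slide does not say which correction the course uses, so treat the choice of Jukes-Cantor here as an assumption.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Corrected distance (substitutions per site) from observed proportion p.

    Assumes the Jukes-Cantor model: all base changes equally likely.
    The correction is undefined for p >= 0.75 (sequences look random).
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# The correction grows with divergence: hidden changes pile up at distant sites
for p in (0.05, 0.30, 0.60):
    print(p, jukes_cantor_distance(p))
```

For small p the corrected and observed values are nearly equal; the gap widens as sequences diverge, which matches the slide's remark about distant relationships.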
![Page 19: The Genome Access Course Phylogenetic Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062423/5681467a550346895db39dfd/html5/thumbnails/19.jpg)
Amino Acid Distances
• More difficult to compute
• Substitutions have differing effects on structure
• Some substitutions require more than one DNA mutation
• Use replacement frequencies (PAM, BLOSUM)
![Page 20: The Genome Access Course Phylogenetic Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062423/5681467a550346895db39dfd/html5/thumbnails/20.jpg)
Fitch & Margoliash
• 3 sequences are combined at a time to define branches and calculate their length
• Additive branch lengths
• Accurate for short branches
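For three sequences A, B and C joined at one internal node, additive branch lengths follow directly from the three pairwise distances. The sketch below shows that three-point calculation; the function name and the distances are made up for illustration.

```python
def three_point_branch_lengths(d_ab: float, d_ac: float, d_bc: float):
    """Branch lengths a, b, c from the internal node to A, B and C.

    Solves a + b = d_ab, a + c = d_ac, b + c = d_bc.
    """
    a = (d_ab + d_ac - d_bc) / 2
    b = (d_ab + d_bc - d_ac) / 2
    c = (d_ac + d_bc - d_ab) / 2
    return a, b, c


# Made-up distances
a, b, c = three_point_branch_lengths(22, 39, 41)
print(a, b, c)
```

With more than three sequences, the method treats the sequences not yet in the pair as a single averaged composite so the same three-point step can be reused.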
![Page 21: The Genome Access Course Phylogenetic Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062423/5681467a550346895db39dfd/html5/thumbnails/21.jpg)
Neighbor Joining
• Most common method of tree construction
• Distance matrix adjusted for each taxon depending on its rate of evolution
• Good for simulation studies
• Most efficient computationally
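The rate adjustment can be sketched as one neighbor-joining step: each distance is corrected by the total distance of both taxa to all others, and the pair with the smallest adjusted value is joined. This is a minimal illustration, not the PHYLIP `neighbor` implementation; the helper names and the five-taxon distances are chosen for the example.

```python
def symmetric(pairs: dict) -> dict:
    """Build a full symmetric dict-of-dicts matrix from {(i, j): distance}."""
    names = sorted({name for pair in pairs for name in pair})
    dist = {i: {j: 0.0 for j in names} for i in names}
    for (i, j), d in pairs.items():
        dist[i][j] = dist[j][i] = float(d)
    return dist


def nj_pick_pair(dist: dict):
    """One neighbor-joining step: return (i, j, branch_i, branch_j)."""
    names = list(dist)
    n = len(names)
    if n < 3:
        raise ValueError("need at least 3 taxa")
    totals = {i: sum(dist[i][k] for k in names) for i in names}
    best = None
    for idx, i in enumerate(names):
        for j in names[idx + 1:]:
            # Adjusted distance: penalizes pairs that are merely fast-evolving
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    branch_i = dist[i][j] / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
    branch_j = dist[i][j] - branch_i
    return i, j, branch_i, branch_j


dist = symmetric({
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
})
print(nj_pick_pair(dist))
```

In this example the raw closest pair is d and e (distance 3), yet the adjusted criterion joins a and b first, which shows how the correction differs from simply clustering the nearest sequences.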
![Page 22: The Genome Access Course Phylogenetic Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062423/5681467a550346895db39dfd/html5/thumbnails/22.jpg)
UPGMA – Unweighted Pair Group Method with Arithmetic Mean
• Simplest method
• Calculates branch lengths between most closely related sequences
• Averages distance to next sequence or cluster
• Predicts a position for the root
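The steps above can be sketched as a short clustering loop. This is a minimal illustration under the assumption of a complete symmetric distance matrix stored as a dict of dicts; the function name, the nested-tuple tree format and the example distances are all made up.

```python
from itertools import combinations


def upgma(dist: dict):
    """Cluster by UPGMA. Returns (tree as nested tuples, node heights)."""
    d = {i: dict(row) for i, row in dist.items()}   # working copy
    size = {i: 1 for i in d}                        # leaves per cluster
    height = {i: 0.0 for i in d}                    # node height above leaves
    while len(d) > 1:
        # Join the two closest clusters
        i, j = min(combinations(d, 2), key=lambda p: d[p[0]][p[1]])
        new = (i, j)
        others = [k for k in d if k not in (i, j)]
        # Size-weighted average = arithmetic mean over all original leaf pairs
        row = {k: (size[i] * d[i][k] + size[j] * d[j][k]) / (size[i] + size[j])
               for k in others}
        height[new] = d[i][j] / 2
        size[new] = size[i] + size[j]
        for k in (i, j):
            del d[k]
        for k in others:
            del d[k][i]
            del d[k][j]
            d[k][new] = row[k]
        row[new] = 0.0
        d[new] = row
    root = next(iter(d))
    return root, height


# Made-up ultrametric distances
raw = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
       ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10}
names = ["A", "B", "C", "D"]
dist = {i: {j: 0.0 for j in names} for i in names}
for (i, j), value in raw.items():
    dist[i][j] = dist[j][i] = float(value)

tree, height = upgma(dist)
print(tree, height[tree])
```

The last cluster formed is the root, which is how the method predicts a root position. That placement is only meaningful if all lineages evolve at roughly the same rate.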
![Page 23: The Genome Access Course Phylogenetic Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062423/5681467a550346895db39dfd/html5/thumbnails/23.jpg)
Phylogenetic Complications
• Errors
• Loss of function
• Convergent evolution
• Lateral gene transfer
![Page 24: The Genome Access Course Phylogenetic Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062423/5681467a550346895db39dfd/html5/thumbnails/24.jpg)
Validation
• Use several different algorithms and data sets
• NJ methods generate one tree, possibly supporting a tree built by parsimony or maximum likelihood
• Bootstrapping
– Perturb data and note effect on tree
– Repeat many times
– If the tree is unchanged in ~90% of replicates, its correctness is supported
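The bootstrap loop can be sketched as below. The perturbation used here is the usual one, resampling alignment columns with replacement. `build_tree` stands for any tree-building function and is a hypothetical parameter; `closest_pair` is only a toy stand-in so the example runs, not a real tree method.

```python
import random
from itertools import combinations


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Resample alignment columns with replacement (same length as input)."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}


def bootstrap_support(alignment, build_tree, replicates=100, seed=0):
    """Fraction of replicates whose tree equals the tree from the real data."""
    rng = random.Random(seed)
    reference = build_tree(alignment)
    same = sum(build_tree(bootstrap_alignment(alignment, rng)) == reference
               for _ in range(replicates))
    return same / replicates


def closest_pair(alignment):
    """Toy stand-in for a tree builder: the pair with fewest differences."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    return frozenset(min(
        combinations(alignment, 2),
        key=lambda p: hamming(alignment[p[0]], alignment[p[1]]),
    ))


# Made-up alignment: A and B are identical, C differs at every site
alignment = {"A": "ACGTACGT", "B": "ACGTACGT", "C": "TGCATGCA"}
support = bootstrap_support(alignment, closest_pair, replicates=50)
print(support)
```

In practice support is usually reported per branch rather than for the whole tree, but the resample-and-rebuild loop is the same.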
![Page 25: The Genome Access Course Phylogenetic Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062423/5681467a550346895db39dfd/html5/thumbnails/25.jpg)
Are there bugs in our genome?
N-acetylneuraminate lyase
![Page 26: The Genome Access Course Phylogenetic Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062423/5681467a550346895db39dfd/html5/thumbnails/26.jpg)
The End