The Genome Access Course Phylogenetic Analysis


Page 1: The Genome Access Course Phylogenetic Analysis

The Genome Access Course

Phylogenetic Analysis

Page 2: The Genome Access Course Phylogenetic Analysis

Phylogenetics

• Developed by Willi Hennig (Grundzüge einer Theorie der Phylogenetischen Systematik, 1950; Phylogenetic Systematics, 1966)

Page 3: The Genome Access Course Phylogenetic Analysis

What is the ancestral sequence?

• pfeffer

• pepper

• (pf/p)e(ff/pp)er

Page 4: The Genome Access Course Phylogenetic Analysis

Evolutionary Trees

• A tree is a connected, acyclic 2D graph

• Leaf: Taxon

• Node: Vertex

• Branch: Edge

• Tree length = sum of all branch lengths

• Phylogenetic trees are binary trees
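The definitions on this slide can be made concrete with a small sketch. It assumes a tree stored as a list of `(node, node, branch_length)` edges; that representation and the names `is_tree` and `tree_length` are illustrative choices, not something the slides specify.

```python
def is_tree(edges):
    """Return True if the edge list forms a connected, acyclic graph."""
    nodes = {n for u, v, _ in edges for n in (u, v)}
    parent = {n: n for n in nodes}  # union-find forest

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for u, v, _ in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # this edge would close a cycle
        parent[root_u] = root_v
    # With no cycles, having exactly |nodes| - 1 edges implies connectedness.
    return len(edges) == len(nodes) - 1


def tree_length(edges):
    """Tree length = sum of all branch lengths."""
    return sum(length for _, _, length in edges)


# Hypothetical example: three taxa (A, B, C) and two internal nodes.
edges = [("root", "x", 1.0), ("x", "A", 2.0), ("x", "B", 2.5), ("root", "C", 4.0)]
print(is_tree(edges), tree_length(edges))
```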

Page 5: The Genome Access Course Phylogenetic Analysis

A Generic Tree

Page 6: The Genome Access Course Phylogenetic Analysis

Evolutionary Trees

• Rooted

– common ancestor

– unique path to any leaf

– directed

• Unrooted

– root could be placed anywhere

– fewer possible than rooted

Page 7: The Genome Access Course Phylogenetic Analysis

Rooted Tree generated by DRAWGRAM (PHYLIP)

Page 8: The Genome Access Course Phylogenetic Analysis

Unrooted Tree generated by DRAWTREE (PHYLIP)

Page 9: The Genome Access Course Phylogenetic Analysis

Possible Evolutionary Trees

Rooted: $\frac{(2n-3)!}{2^{n-2}\,(n-2)!}$

Unrooted: $\frac{(2n-5)!}{2^{n-3}\,(n-3)!}$

Taxa (n)   Rooted      Unrooted
2          1           1
3          3           1
4          15          3
5          105         15
6          945         105
7          10395       945
8          135135      10395
9          2027025     135135
10         34459425    2027025
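The counts can be checked directly from the two formulas. The sketch below is a minimal implementation; the function names are illustrative, and the unrooted count for n = 2 is treated as a special case (a single tree), since the formula itself is only defined for n ≥ 3.

```python
from math import factorial


def rooted_trees(n):
    """Number of rooted binary trees on n taxa: (2n-3)! / (2^(n-2) (n-2)!)."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_trees(n):
    """Number of unrooted binary trees on n taxa: (2n-5)! / (2^(n-3) (n-3)!)."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    if n == 2:
        return 1  # special case: two taxa joined by a single branch
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in range(2, 11):
    print(n, rooted_trees(n), unrooted_trees(n))
```

For example, 6 taxa give 945 rooted trees, and the unrooted count for n taxa always equals the rooted count for n - 1 taxa.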

Page 10: The Genome Access Course Phylogenetic Analysis

Genes vs. Species

• Sequences show gene relationships, but phylogenetic histories may be different for gene and species

• Genes evolve at different speeds

• Horizontal gene transfer

Page 11: The Genome Access Course Phylogenetic Analysis

Methods for Phylogenetic Analysis

• Character-State

– Maximum Parsimony

– Maximum Likelihood

• Genetic Distance

– Fitch & Margoliash

– Neighbor-Joining

– Unweighted Pair Group

Page 12: The Genome Access Course Phylogenetic Analysis

Phylogenetic Software

• PHYLIP

• PAUP (Available in GCG)

• TREE-PUZZLE

• PhyloBLAST

• Felsenstein maintains an extensive list of programs on the PHYLIP site

Page 13: The Genome Access Course Phylogenetic Analysis

PHYLIP Programs

• dnapars/protpars

• dnadist/protdist

• dnaml (use fastDNAml instead)

• neighbor

• fitch/kitsch

• drawtree/drawgram

Page 14: The Genome Access Course Phylogenetic Analysis

Maximum Parsimony

• Most common method

• Allows use of all evolutionary information

• Build and score all possible trees

• Each node is a transformation in a character state

• Minimize tree length

• Best tree requires the fewest changes to derive all sequences
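Scoring a single candidate tree can be sketched as follows. The slides do not name a scoring algorithm; this example assumes Fitch's small-parsimony method, a tree written as nested 2-tuples of taxon names, and a toy alignment invented for illustration.

```python
def fitch_site(tree, states):
    """Return (candidate ancestral states, minimum changes) for one site."""
    if isinstance(tree, str):  # leaf: the observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:  # children agree on at least one state: no extra change
        return common, left_cost + right_cost
    # children disagree: one change is needed on this node's branches
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        site = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, site)[1]
    return total


# Hypothetical toy alignment of four taxa.
alignment = {"A": "ACGT", "B": "ACGT", "C": "TCGA", "D": "TCGA"}
print(parsimony_score((("A", "B"), ("C", "D")), alignment))  # 2 changes
print(parsimony_score((("A", "C"), ("B", "D")), alignment))  # 4 changes
```

The first topology needs fewer changes, so it is the more parsimonious of the two. An exhaustive search would repeat this scoring for every possible tree.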

Page 15: The Genome Access Course Phylogenetic Analysis

Which is the more parsimonious tree?

[Figure: two candidate trees, each with 3 nodes; one requires 9 node crossings, the other 8 node crossings]

Page 16: The Genome Access Course Phylogenetic Analysis

Maximum Likelihood

• Reconstruction using an explicit evolutionary model

• The likelihood of a tree is calculated separately for each nucleotide site. The product of the likelihoods for each site provides the overall likelihood of the observed data.

• Demanding computationally

• Slowest method

• Use to test (or improve) an existing tree
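The product over sites can be sketched for the simplest possible case: two aligned sequences joined by a single branch. The example assumes the Jukes-Cantor model (equal base frequencies, equal substitution rates), which the slides do not name, and sums log-likelihoods instead of multiplying raw likelihoods to avoid numerical underflow.

```python
import math


def jc69_log_likelihood(seq1, seq2, d):
    """Log-likelihood of two aligned sequences separated by branch length d.

    d is the expected number of substitutions per site and must be > 0.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay  # P(base unchanged along the branch)
    p_diff = 0.25 - 0.25 * decay  # P(change to one specific other base)
    log_l = 0.0
    for a, b in zip(seq1, seq2):
        # site likelihood = base frequency (1/4) * transition probability
        log_l += math.log(0.25 * (p_same if a == b else p_diff))
    return log_l


# Hypothetical sequences differing at 1 of 10 sites.
s1, s2 = "ACGTACGTAC", "ACGTACGTAT"
for d in (0.01, 0.1, 0.5):
    print(d, jc69_log_likelihood(s1, s2, d))
```

The branch length with the highest log-likelihood is the maximum likelihood estimate. A full tree search repeats this kind of optimization over many branches and topologies, which is why the method is slow.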

Page 17: The Genome Access Course Phylogenetic Analysis

Clustering Algorithms

• Use distances to calculate phylogenetic trees

• Trees are based on the relative numbers of similarities and differences between sequences

• A distance matrix is constructed by computing pairwise distances for all sequences

• Clustering links successively more distant taxa

Page 18: The Genome Access Course Phylogenetic Analysis

DNA Distances

• Distances between pairs of DNA sequences are relatively simple to compute as the sum of all base pair differences between the two sequences

• Can only work for pairs of sequences that are similar enough to be aligned

• All base changes are considered equal

• Insertion/deletions are generally given a larger weight than replacements (gap penalties).

• Possible to correct for multiple substitutions at a single site, which is common in distant relationships and for rapidly evolving sites.
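A minimal sketch of these ideas: the proportion of differing sites (p-distance), followed by a correction for multiple substitutions. The correction shown is the Jukes-Cantor formula, chosen here as one common option; the slides do not say which correction is intended. As a simplifying assumption, columns containing a gap (`-`) are skipped instead of being given a gap penalty.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: ignore gapped columns instead of weighting them.
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(1 for a, b in pairs if a != b) / len(pairs)


def jukes_cantor(p):
    """Correct a p-distance for multiple substitutions at the same site."""
    if p >= 0.75:
        raise ValueError("sequences too divergent for this correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = p_distance("ACGTACGT", "ACGTACGA")  # 1 difference in 8 sites
print(p, jukes_cantor(p))
```

The corrected distance is always at least as large as the raw proportion, and the gap between the two grows as sequences diverge.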

Page 19: The Genome Access Course Phylogenetic Analysis

Amino Acid Distances

• More difficult to compute

• Substitutions have differing effects on structure

• Some substitutions require more than one DNA mutation

• Use replacement frequencies (PAM, BLOSUM)

Page 20: The Genome Access Course Phylogenetic Analysis

Fitch & Margoliash

• 3 sequences are combined at a time to define branches and calculate their length

• Additive branch lengths

• Accurate for short branches
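The three-sequence step can be illustrated with the standard three-point formula: given the pairwise distances among A, B and C, the lengths of the three branches meeting at one internal node follow from the additivity assumption. The distances below are made up for illustration.

```python
def three_point_branches(d_ab, d_ac, d_bc):
    """Branch lengths (a, b, c) from an internal node to taxa A, B and C.

    Solves a + b = d_ab, a + c = d_ac, b + c = d_bc.
    """
    a = (d_ab + d_ac - d_bc) / 2
    b = (d_ab + d_bc - d_ac) / 2
    c = (d_ac + d_bc - d_ab) / 2
    return a, b, c


# Hypothetical pairwise distances.
print(three_point_branches(22, 39, 41))  # (10.0, 12.0, 29.0)
```

With more than three sequences, the method treats the remaining sequences as a composite third group and repeats this calculation.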

Page 21: The Genome Access Course Phylogenetic Analysis

Neighbor Joining

• Most common method of tree construction

• Distance matrix adjusted for each taxon depending on its rate of evolution

• Good for simulation studies

• Most efficient computationally
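The rate adjustment can be sketched as one neighbor-joining step: each pairwise distance is adjusted by the total distance of both taxa to all others (the Q criterion), and the pair with the smallest adjusted value is joined. The function name and the five-taxon matrix are illustrative; a full implementation would replace the joined pair with a new node and repeat.

```python
def nj_pick_pair(dist):
    """Return (i, j, length_i, length_j) for the next pair to join.

    dist is a symmetric matrix (list of lists) for at least 3 taxa.
    """
    n = len(dist)
    totals = [sum(row) for row in dist]  # total distance of each taxon
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q criterion: distance adjusted for each taxon's divergence
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    # Branch lengths from i and j to the new internal node
    length_i = dist[i][j] / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
    length_j = dist[i][j] - length_i
    return i, j, length_i, length_j


# Hypothetical distance matrix for five taxa.
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_pick_pair(dist))  # (0, 1, 2.0, 3.0)
```

Note that taxa 3 and 4 have the smallest raw distance (3), yet taxa 0 and 1 are joined first because the adjustment accounts for how far each taxon is from everything else.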

Page 22: The Genome Access Course Phylogenetic Analysis

UPGMA – Unweighted Pair Group Method with Arithmetic Mean

• Simplest method

• Calculates branch lengths between most closely related sequences

• Averages distance to next sequence or cluster

• Predicts a position for the root
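A compact sketch of the procedure: repeatedly join the two closest clusters, place the new node at half their distance, and average distances to the remaining clusters weighted by cluster size. The nested-tuple output format `(left, right, node_height)` is an illustrative choice, not a standard.

```python
def upgma(names, dist):
    """Build a rooted tree as nested tuples (left, right, node_height)."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (tree, size)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), d_ij = min(d.items(), key=lambda item: item[1])  # closest pair
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        new_d = {}
        for k in clusters:
            d_ik = d[(min(i, k), max(i, k))]
            d_jk = d[(min(j, k), max(j, k))]
            # arithmetic average over all members of the merged cluster
            new_d[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        d.update(new_d)
        clusters[next_id] = ((tree_i, tree_j, d_ij / 2), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical distances among three taxa.
tree = upgma(["A", "B", "C"], [[0, 2, 6], [2, 0, 6], [6, 6, 0]])
print(tree)  # ('C', ('A', 'B', 1.0), 3.0)
```

The last merge defines the root, which is how the method predicts a root position. That placement is only reliable if all lineages evolve at about the same rate.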

Page 23: The Genome Access Course Phylogenetic Analysis

Phylogenetic Complications

• Errors

• Loss of function

• Convergent evolution

• Lateral gene transfer

Page 24: The Genome Access Course Phylogenetic Analysis

Validation

• Use several different algorithms and data sets

• NJ methods generate one tree, possibly supporting a tree built by parsimony or maximum likelihood

• Bootstrapping

– Perturb data and note effect on tree

– Repeat many times

– If the tree is unchanged in ~90% of replicates, its correctness is supported
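The bootstrap loop can be sketched as follows, assuming the usual way of perturbing the data: resampling alignment columns with replacement. `build_tree` is a hypothetical placeholder for any tree-building function that returns a comparable topology; it is not part of any particular package. In practice support is usually reported per branch, whereas this sketch follows the slide and counts whole-tree agreement.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


def bootstrap_support(alignment, build_tree, replicates=100, seed=0):
    """Fraction of replicates whose tree matches the original tree.

    build_tree is a user-supplied (hypothetical) function mapping an
    alignment to a topology that can be compared with ==.
    """
    rng = random.Random(seed)
    reference = build_tree(alignment)
    unchanged = sum(
        build_tree(bootstrap_alignment(alignment, rng)) == reference
        for _ in range(replicates)
    )
    return unchanged / replicates


# Hypothetical two-sequence alignment; show one resampled replicate.
aln = {"A": "ACGT", "B": "ACGA"}
print(bootstrap_alignment(aln, random.Random(1)))
```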

Page 25: The Genome Access Course Phylogenetic Analysis

Are there bugs in our genome?

N-acetylneuraminate lyase

Page 26: The Genome Access Course Phylogenetic Analysis

The End