Phylogenetics workshop: Protein sequence phylogeny week 2
Darren Soanes
• Species trees
• Interpretation of trees
• Taxon sampling
• Tools
• Lateral (horizontal) gene transfer
• Fast evolving genes
Using DNA sequence to construct trees
TGCTATT TGCTTTT TGCTTTT
TGCTATT – ancestral DNA sequence
TGCTTTT – sequence change due to mutation
Reversals can confuse phylogenies
TGCTATT TGCTATT TGCTTTT TGCTTTT TGCTTTT
TGCTATT – ancestral DNA sequence
TGCTTTT – sequence change
TGCTATT – reversal
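To make the problem concrete, here is a minimal Python sketch using the sequences from the slide. It compares the number of changes that actually happened along a lineage with the number that is visible when only the first and last sequences are compared. The function and variable names are illustrative and do not come from any phylogenetics package.

```python
def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# History of one lineage: ancestral sequence, mutation, then reversal.
history = ["TGCTATT", "TGCTTTT", "TGCTATT"]

# Changes that really happened, summed step by step along the lineage.
actual_changes = sum(hamming(a, b) for a, b in zip(history, history[1:]))

# Changes we can see if we only compare the ancestor with the final sequence.
observed_changes = hamming(history[0], history[-1])
```

Two mutations occurred, but the observed difference is zero, so a tree built from the observed differences would treat this lineage as unchanged.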
To minimise the effect of reversals
• Use DNA sequences that are evolving slowly – mutations happen rarely.
• Use long stretches of DNA.
• Align sequences, use the parts of the alignment that show a high degree of conservation.
• rDNA sequences (genes that encode ribosomal RNA) are often used.
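One way to pick out the well-conserved parts of an alignment can be sketched as follows. This is only an illustration: the toy alignment, the 0.75 threshold and the rule "drop any column containing a gap" are assumptions made for the example, not settings recommended by the workshop or taken from a particular trimming tool.

```python
from collections import Counter


def conserved_columns(alignment, min_fraction):
    """Return indices of gap-free columns whose commonest residue
    makes up at least min_fraction of the column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    keep = []
    for i in range(length):
        column = [seq[i] for seq in alignment]
        if "-" in column:
            continue  # assumption: skip columns that contain gaps
        top_count = Counter(column).most_common(1)[0][1]
        if top_count / len(column) >= min_fraction:
            keep.append(i)
    return keep


def trim(alignment, columns):
    """Keep only the chosen columns in every sequence."""
    return ["".join(seq[i] for i in columns) for seq in alignment]


# Hypothetical toy alignment (four sequences, ten columns).
alignment = [
    "TGCTATTA-G",
    "TGCTTTTACG",
    "TGCTTTTGTG",
    "TGCTTTTCAG",
]

columns = conserved_columns(alignment, min_fraction=0.75)
trimmed = trim(alignment, columns)
```

The threshold is a trade-off: set it too high and the informative column (position 4 here) is discarded along with the noisy ones.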
Species tree constructed using ribosomal DNA (rDNA) sequence
Using protein sequences to create species trees
• Advantages
– protein sequences evolve more slowly than DNA sequences (many DNA mutations are neutral – they do not change amino acid sequences)
– reversals are less common than in DNA
• Single copy protein encoding genes are identified
• Protein sequences are joined together (concatenated) to create a multiple protein sequence for each species
• Sequences are aligned
• Disadvantage – need sequenced genomes
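The joining step can be sketched in a few lines of Python. The species names and sequences below are made up for illustration, and dropping any species that lacks one of the proteins is an assumption of this sketch, not a rule stated in the workshop. The important point is that every species uses the same protein order.

```python
def concatenate(proteins):
    """Join per-protein sequences into one long sequence per species.

    proteins: list of dicts, one per protein, mapping species -> sequence.
    Only species present for every protein are kept (assumption).
    """
    shared = set(proteins[0])
    for protein in proteins[1:]:
        shared &= set(protein)
    return {
        species: "".join(protein[species] for protein in proteins)
        for species in sorted(shared)
    }


# Hypothetical single copy proteins from four species.
protein_1 = {"species_A": "MKVLT", "species_B": "MKVALT", "species_C": "MRVALS"}
protein_2 = {
    "species_A": "GHWE",
    "species_B": "GHWD",
    "species_C": "GHD",
    "species_D": "GHWE",
}

joined = concatenate([protein_1, protein_2])
```

Here species_D is left out because protein_1 was not found in it. The joined sequences would then go into a multiple sequence alignment program.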
Fungal species trees – more proteins = better resolution
[Figure: trees built from 30 proteins and from 60 proteins; labelled groups: basidiomycetes, ascomycetes, filamentous ascomycetes, yeasts, zygomycete]
[Figure labels: oomycete (not fungi), microsporidia, plant]
Fungal Species Tree (based on 153 concatenated protein sequences)
Clades
A clade consists of an ancestor organism and all its descendants.
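The definition can be checked mechanically: a group of taxa is a clade only if some node in the tree has exactly that group as its descendants. The sketch below assumes a tree written as nested Python tuples. The toy topology reuses group names from the fungal tree slides but is only an illustration, not the workshop's actual tree.

```python
def leaf_set(node):
    """All taxa (leaves) descended from a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def all_clades(node):
    """The leaf set of every node in the tree, leaves and root included."""
    if isinstance(node, str):
        return {leaf_set(node)}
    clades = {leaf_set(node)}
    for child in node:
        clades |= all_clades(child)
    return clades


def is_clade(tree, taxa):
    """True if some ancestor has exactly these taxa as its descendants."""
    return frozenset(taxa) in all_clades(tree)


# Illustrative toy topology written as nested tuples.
tree = ((("yeasts", "filamentous ascomycetes"), "basidiomycetes"), "zygomycete")

ascomycetes = is_clade(tree, ["yeasts", "filamentous ascomycetes"])
not_a_clade = is_clade(tree, ["yeasts", "basidiomycetes"])
```

The second group is not a clade in this toy tree: the common ancestor of yeasts and basidiomycetes also gave rise to the filamentous ascomycetes, so they cannot be left out.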
Gene trees
• The evolutionary history of genes can be represented as phylogenetic trees based on alignment of protein sequences.
• Gene duplication and loss can be inferred from phylogenetic trees.
• Protein sequences evolve more slowly than DNA sequences (due to redundancy in the genetic code).
Gene duplication
• Gene duplication due to unequal crossing over during meiosis can create gene families.
• Sequence and function of different members of a gene family can diverge.
Sequence homology (1)
• Genes are said to be homologous if they share a common evolutionary ancestor.
• Orthologues are genes in different species that evolved from a common ancestral gene by speciation. Normally, orthologues retain the same function in the course of evolution (e.g. myoglobin in mammals).
Sequence homology (2)
• Paralogous genes are related by duplication within a genome. Paralogues often evolve new functions, even if these are related to the original one.
• In-paralogues: paralogues that were duplicated after a given speciation event and are therefore in the same species.
• Out-paralogues: paralogues that were duplicated before a given speciation event. They are not necessarily in the same species.
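These definitions boil down to one question: what kind of event sits at the last common ancestor of the two genes in the gene tree? A minimal sketch, assuming a gene tree whose internal nodes have already been labelled as "speciation" or "duplication" (the tuple format and the gene names are invented for this example):

```python
def leaves(node):
    """All genes below a node; a leaf is just a gene name."""
    if isinstance(node, str):
        return {node}
    _event, left, right = node
    return leaves(left) | leaves(right)


def relationship(node, gene_1, gene_2):
    """Classify two genes by the event at their last common ancestor."""
    if isinstance(node, str):
        raise ValueError("need two different genes")
    event, left, right = node
    for child in (left, right):
        members = leaves(child)
        if gene_1 in members and gene_2 in members:
            # Both genes sit below this child, so the ancestor is deeper.
            return relationship(child, gene_1, gene_2)
    if not {gene_1, gene_2} <= leaves(node):
        raise ValueError("both genes must be in the tree")
    return "orthologues" if event == "speciation" else "paralogues"


# Hypothetical gene tree: one duplication (alpha/beta) followed by
# speciation into species A, B and C.
gene_tree = (
    "duplication",
    ("speciation", "A_alpha", ("speciation", "B_alpha", "C_alpha")),
    ("speciation", "A_beta", ("speciation", "B_beta", "C_beta")),
)

same_copy = relationship(gene_tree, "A_alpha", "B_alpha")
different_copy = relationship(gene_tree, "A_alpha", "C_beta")
```

In this toy tree the duplication came before the speciation events, so the alpha and beta copies are out-paralogues with respect to those speciations.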
Orthology and paralogy
[Figure labels: paralogues, in-paralogues, out-paralogues]
A, B and C are different species
α and β are different paralogues of the same gene
Evolution of globin superfamily in human lineage
TOR gene duplication events in fungi
TOR: protein kinase, subunit of a complex that regulates cell growth in response to nutrient availability and cellular stresses
Taxon sampling methods
• BLAST: easiest, though subjective
• Occurrence of Pfam (protein family) motif
• Clustering, e.g.
– INPARANOID http://inparanoid.sbc.su.se/cgi-bin/index.cgi
– orthoMCL http://www.orthomcl.org/cgi-bin/OrthoMclWeb.cgi
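One widely used way to make BLAST-based selection less subjective is the reciprocal best hit rule: keep a pair of genes only if each is the other's top hit. The sketch below is not a reimplementation of either clustering tool listed above. The gene names and scores are invented, and the nested-dict input format is an assumption (real BLAST output would need parsing first).

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    scores: dict of query -> dict of subject -> similarity score.
    """
    return {query: max(hits, key=hits.get) for query, hits in scores.items() if hits}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores from searching species A against B, and B against A.
a_vs_b = {
    "A_gene1": {"B_gene1": 250.0, "B_gene2": 90.0},
    "A_gene2": {"B_gene1": 85.0, "B_gene2": 60.0},
}
b_vs_a = {
    "B_gene1": {"A_gene1": 245.0, "A_gene2": 80.0},
    "B_gene2": {"A_gene1": 88.0, "A_gene2": 50.0},
}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

Only A_gene1 and B_gene1 pick each other, so they would be the candidate orthologue pair. A_gene2 and B_gene2 are left unpaired by this rule.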
Minimum bootstrap
• A bootstrap value of 70% is thought to be broadly similar to a P-value of 0.05.
• Minimum bootstrap used depends on study.
• To improve bootstrap support
– remove poorly aligned sequences if possible; poor alignment can be due to mis-annotation of genomes.
– change taxon sampling.
Collapse branches with bootstrap less than a defined value
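Collapsing a branch means removing a poorly supported node and attaching its children directly to its parent, which leaves a multifurcation. A minimal sketch, assuming a tree stored as (support, children) tuples with taxon names as leaves. The tree, the support values and the data format are invented for illustration, and the 70% cut-off is taken from the bootstrap slide.

```python
def collapse(node, min_support):
    """Return a copy of the tree in which every internal node with
    bootstrap support below min_support has been removed and its
    children attached to its parent."""
    if isinstance(node, str):
        return node  # a leaf (taxon name)
    support, children = node
    new_children = []
    for child in children:
        child = collapse(child, min_support)
        weak = (
            not isinstance(child, str)
            and child[0] is not None
            and child[0] < min_support
        )
        if weak:
            new_children.extend(child[1])  # promote the grandchildren
        else:
            new_children.append(child)
    return (support, new_children)


# Hypothetical tree; the root has no support value (None).
tree = (None, [(95, ["A", "B"]), (40, ["C", (80, ["D", "E"])])])

collapsed = collapse(tree, min_support=70)
```

The node with 40% support disappears, so C and the well-supported D/E group now hang directly from the root.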
Lateral gene transfer (purine-cytosine permease)
[Figure labels: oomycete, fungi]
Eukaryotic Tree of Life
[Figure labels: Phytophthora sojae, Aspergillus oryzae]
Genes that evolve quickly (1)
• Synonymous substitution – change in DNA sequence that does not affect the amino acid sequence, often in the third position of a codon, e.g. CCG (Pro)→CCA (Pro).
• Non-synonymous substitution - change in DNA sequence that does affect the amino acid sequence, often in the first or second position of a codon, e.g. CCG (Pro)→CAG (Gln).
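The two examples above can be checked by translating the codons before and after the change. This sketch builds the standard genetic code as a lookup table; the function name is illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_before, codon_after):
    """Return 'synonymous' if both codons encode the same amino acid,
    otherwise 'non-synonymous'."""
    if CODON_TABLE[codon_before] == CODON_TABLE[codon_after]:
        return "synonymous"
    return "non-synonymous"


third_position = classify_substitution("CCG", "CCA")   # Pro -> Pro
second_position = classify_substitution("CCG", "CAG")  # Pro -> Gln
```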
Genes that evolve quickly (2)
• For a given protein encoding gene (comparison between orthologues in more than one species):
• dN = number of non-synonymous substitutions per non-synonymous site
• dS = number of synonymous substitutions per synonymous site
• We can calculate the ratio dN/dS.
• For most genes this is < 1 (purifying selection).
• For genes under evolutionary pressure to change protein sequence (diversify), dN/dS > 1.
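To show where the ratio comes from, here is a deliberately simplified counting sketch for two aligned coding sequences. It counts synonymous and non-synonymous sites, counts the observed differences of each kind, and divides. It is not the method used by any particular package: it applies no correction for multiple changes at the same site, it refuses codons that differ at more than one position, and it treats changes to stop codons as non-synonymous. The two short sequences are invented.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_sites(codon):
    """Fraction of the nine possible single-base changes that keep the
    amino acid, scaled so that each codon position counts as one site."""
    count = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    count += 1
    return count / 3


def simple_dn_ds(seq_1, seq_2):
    """Return (dN, dS, dN/dS) as uncorrected proportions."""
    if len(seq_1) != len(seq_2) or len(seq_1) % 3:
        raise ValueError("need two in-frame sequences of equal length")
    syn_sites = 0.0
    syn_diffs = nonsyn_diffs = 0
    for i in range(0, len(seq_1), 3):
        c1, c2 = seq_1[i:i + 3], seq_2[i:i + 3]
        # Average the site counts of the two sequences.
        syn_sites += (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        n_diff = sum(a != b for a, b in zip(c1, c2))
        if n_diff > 1:
            raise ValueError("sketch handles at most one change per codon")
        if n_diff == 1:
            if CODON_TABLE[c1] == CODON_TABLE[c2]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    nonsyn_sites = len(seq_1) - syn_sites
    if syn_sites == 0 or nonsyn_sites == 0 or syn_diffs == 0:
        raise ValueError("ratio is undefined for these sequences")
    d_s = syn_diffs / syn_sites
    d_n = nonsyn_diffs / nonsyn_sites
    return d_n, d_s, d_n / d_s


# Hypothetical orthologous sequences: one synonymous change
# (CCG -> CCA) and one non-synonymous change (CCG -> CAG).
d_n, d_s, ratio = simple_dn_ds("CCGCCG", "CCACAG")
```

Even though there is one difference of each kind, the ratio is well below 1, because there are far more non-synonymous sites than synonymous sites at which a change could have happened.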
Genes that evolve quickly (3)
• CodeML (part of the PAML package) will calculate dN/dS for a set of orthologues from different (closely related) species.
• Human vs chimpanzee: rapidly evolving genes are involved in immunity, reproduction and olfaction (smell).
• Genes with very low dN/dS (under purifying selection) are involved in metabolism, intracellular signalling and nerve / brain function.
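CodeML reads its settings from a control file. As a rough sketch, the Python below writes the text of a minimal control file that asks for a single dN/dS ratio across the whole tree. The file names are hypothetical and the chosen values are assumptions for illustration; check the option meanings against the PAML documentation before running a real analysis.

```python
def codeml_control(seqfile, treefile, outfile):
    """Build the text of a minimal codeml control file."""
    settings = [
        ("seqfile", seqfile),    # codon alignment of the orthologues
        ("treefile", treefile),  # tree relating the species
        ("outfile", outfile),    # where codeml writes its results
        ("runmode", 0),          # 0: use the tree given in treefile
        ("seqtype", 1),          # 1: codon sequences
        ("CodonFreq", 2),        # 2: F3x4 codon frequencies
        ("model", 0),            # 0: one dN/dS ratio for all branches
        ("NSsites", 0),          # 0: one dN/dS ratio for all sites
        ("icode", 0),            # 0: universal genetic code
        ("fix_omega", 0),        # 0: estimate omega (dN/dS) from the data
        ("omega", 0.4),          # starting value for omega
    ]
    return "\n".join(f"{key} = {value}" for key, value in settings) + "\n"


# Hypothetical file names.
control_text = codeml_control("orthologues.phy", "species.tree", "results.txt")
```

The text would be saved as a control file (for example codeml.ctl) in the directory where codeml is run.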