Lecture 5 : Phylogenies 9/16/09. Translated blast = protein vs translated database.

Page 1

Lecture 5: Phylogenies

9/16/09

Page 2

Translated BLAST = protein query vs. translated nucleotide database

Page 3

Blasting GenBank - blastn

Z. bruijni - long beaked echidna
T. aculeatus - echidna
T. rostratus - honey possum

AX8GS9DG01S

Page 4

Blasting GenBank - discontiguous megablast - exactly the same as blastn

Z. bruijni - long beaked echidna
T. aculeatus - echidna
T. rostratus - honey possum

AX9N23U7014

Page 5

Blasting GenBank - megablast - same species but different order

Z. bruijni - long beaked echidna
T. aculeatus - echidna
T. rostratus - honey possum

AX9TUM1G016

Page 6

Blasting GenBank - tblastn

AX9DYYTE01N

T. aculeatus - echidna
S. brachyurus - quokka
S. crassicaudata - fat tailed dunnart
M. fasciatus - numbat
I. obesulus - quenda

Page 7

Species found by BLAST

I. obesulus = quenda = bandicoot
T. aculeatus = echidna
M. fasciatus = numbat
T. rostratus = honey possum
S. crassicaudata = fat tailed dunnart
O. anatinus = platypus
S. brachyurus = quokka
Z. bruijni = long beaked echidna

Page 8

HomoloGene - can be reached from the NCBI home page

Scroll down - they are listed alphabetically

Page 9

Questions

Phylogenies - what are they?

1. How do we build them?

2. What do they tell us?

Page 10

Phylogeny

Evolutionary history of a group of organisms, especially as depicted in a family tree

Haeckel, 1879

Page 11

Things trees might tell you:

How are organisms with a particular trait related?

Did the trait evolve multiple times or only once?

What is the evolutionary pathway?
  Of organisms
  Of genes

Page 12

Molecules can be used to learn how organisms are related

Page 13

To learn about vertebrate evolution: compare >600 genes

1998

Page 14

Used genes to measure time

1) Time since common ancestor with human

2) Time since two groups diverged

Page 15

More recent version of vertebrate evolution which shows divergence times on the animal tree

Ponting 2008

Page 16

[Figure: tree with the following species labels]

Orangutan, Human, Chimp, Rhesus monkey, Mouse, Rat, Dog, Cat, Horse, Cow, Opossum, Wallaby, Anole, Chicken, Frog, Fish (Medaka, Fugu, Tetraodon, Zebrafish), Elephant shark, Lamprey, Platypus

Page 17

Primates 25 MY

Mammals 100 MY

All vertebrates 550 MY

Tetrapods 420 MY

Fish 320 MY

Page 18

Molecular clock

Molecules change at a steady rate

We can calibrate how fast they change using fossils

The molecules then become a timepiece to measure how recently different groups split off from each other
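The clock idea can be sketched in a few lines of Python. This is only an illustration under a strict constant-rate assumption: the function names and all numbers below are made up for the example, not taken from the lecture data. A fossil-dated split calibrates the rate, and that rate then converts another pairwise distance into a time.

```python
def calibrate_rate(distance, divergence_time_my):
    """Substitutions per site per million years along one lineage."""
    # After a split both lineages change independently, so the pairwise
    # distance has accumulated over 2 * T of evolution in total.
    return distance / (2.0 * divergence_time_my)


def divergence_time(distance, rate):
    """Estimated time since two sequences split, in millions of years."""
    return distance / (2.0 * rate)


# Hypothetical calibration: a pair with distance 0.10 whose split is
# dated by fossils to 100 MY ago (illustrative values only).
rate = calibrate_rate(distance=0.10, divergence_time_my=100.0)

# Hypothetical query pair with distance 0.025.
estimate = divergence_time(distance=0.025, rate=rate)
print(f"rate = {rate:.4f} per site per MY, estimated split = {estimate:.1f} MY ago")
```

The estimate is only as good as the steady-rate assumption and the fossil date used for calibration.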

Page 19

Sequence conservation may be high

Gene might code for a protein which is highly constrained

Might have to interact with lots of other proteins

Selection might be quite strong

Page 20

Sequence conservation may be low

Not much constraint

Few sites of interaction

Selection might be weak

Page 21

Phylogeny steps

Align sequences so homologous AA can be compared

Determine the similarity between sequences

Use this to generate a relationship between sequences

Page 22

ClustalW2 to align sequences

Page 23

Put sequences in FASTA file

>TetraodonG1
MVWDGGIEPNGTEGKNFYIPMSNRTGIVRSPFEYPQYYLVDPIMFKMLALYMFFLICTGTPINGLTLLVTAQNKKLRQPLNYILVNLAVAGLIMCAFGFTITITSAINGYFILGATACAVEGFMATLGGEVALWSLVVLAIERYIVVCKPMGSFKFTGTHAAVGVLFTWIMAFACAGPPLFGWSRYLPEGMQCSCGPDYYTLAPGYNNESYVIYMFVVHFFVPVFLIFFTYGSLVLTVRAAAQQQESESTQKAQREVTRMCILMVLGFLVAWTPYATFSGWIFMNKGAAFHPLTAALCAFFAKSSALYNPVIYVLMNKQFRNCMLSTFGMGGAVDDETSVSASKTEVSSVS

>ZebrafishG1
MNGTEGSNFYIPMSNRTGLVRSPYDYTQYYLAEPWKFKALAFYMFLLIIFGFPINVLTLVVTAQHKKLRQPLNYILVNLAFAGTIMVIFGFTVSFYCSLVGYMALGPLGCVMEGFFATLGGQVALWSLVVLAIERYIVVCKPMGSFKFSANHAMAGIAFTWFMACSCAVPPLFGWSRYLPEGMQTSCGPDYYTLNPEYNNESYVMYMFSCHFCIPVTTIFFTYGSLVCTVKAAAAQQQESESTQKAEREVTRMVILMVLGFLFAWVPYASFAAWIFFNRGAAFSAQAMAVPAFFSKTSAVFNPIIYVLLNKQFRSCMLNTLFCGKSPLGDDESSSVSTSKTEVSSVSPA

>CichlidG1
MAWEGGIEPNGTEGKNFYIPMSNRTGIVRSPFEYTQYYLADPIFFKLLAFYMFFLICTGTPINSLTLFVTAQNKKLRQPLNYILVNLAVAGLIMCCFGFTITITSAFNGYFILGSTFCAIEGFMATLGGEVALWSLVVLAIERYIVVCKPMGSFKFSGAHAGAGVLFTWIMAMACAAPPLFGWSRYIPEGMQCSCGPDYYTLAPGFNNESYVIYMFVVHFFVPVFIIFFTYGSLVMTVKAAAAQQQDSASTQKAEKEVTRMCVLMVMGFLIAWTPYASFAGWIFMNKGASFSALTAAIPAFFAKSSALYNPVIYVLMNKQFRNCMLSTIGMGGMVEDETSVSTSKTEVSSVS
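For anyone scripting this step, a FASTA file is just a ">" header line followed by one or more sequence lines. A minimal reader might look like the sketch below. The function name `parse_fasta` and the records `seqA` and `seqB` are hypothetical, and the sequences are short made-up fragments rather than the full opsins.

```python
def parse_fasta(text):
    """Return {name: sequence} from FASTA-formatted text."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            name = line[1:].strip()  # header text after '>'
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequence may span several lines
    return {n: "".join(parts) for n, parts in records.items()}


# Hypothetical two-record file; the header and the sequence sit on separate lines.
example = """>seqA
MNGTEGSNFY
IPMSNRTG
>seqB
MVWDGGIEPN
"""

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```

The key formatting point is that each ">" name gets its own line, and the sequence starts on the next line.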

Page 24

Aligned sequences .aln ; Jalview gives colored version

Funky tree .dnd (need special program to draw)

Scroll down this page for tree (use Phylogram)

Page 25

CLUSTAL W (1.83) multiple sequence alignment

TetraodonG1     MVWDGGIEPNGTEGKNFYIPMSNRTGIVRSPFEYPQYYLVDPIMFKMLALYMFFLICTGT 60
CichlidG1       MAWEGGIEPNGTEGKNFYIPMSNRTGIVRSPFEYTQYYLADPIFFKLLAFYMFFLICTGT 60
ZebrafishG1     --------MNGTEGSNFYIPMSNRTGLVRSPYDYTQYYLAEPWKFKALAFYMFLLIIFGF 52
                         *****.***********:****::*.****.:*  ** **:***:**  *

TetraodonG1     PINGLTLLVTAQNKKLRQPLNYILVNLAVAGLIMCAFGFTITITSAINGYFILGATACAV 120
CichlidG1       PINSLTLFVTAQNKKLRQPLNYILVNLAVAGLIMCCFGFTITITSAFNGYFILGSTFCAI 120
ZebrafishG1     PINVLTLVVTAQHKKLRQPLNYILVNLAFAGTIMVIFGFTVSFYCSLVGYMALGPLGCVM 112
                *** ***.****:***************.** **  ****::: .:: **: **.  *.:

TetraodonG1     EGFMATLGGEVALWSLVVLAIERYIVVCKPMGSFKFTGTHAAVGVLFTWIMAFACAGPPL 180
CichlidG1       EGFMATLGGEVALWSLVVLAIERYIVVCKPMGSFKFSGAHAGAGVLFTWIMAMACAAPPL 180
ZebrafishG1     EGFFATLGGQVALWSLVVLAIERYIVVCKPMGSFKFSANHAMAGIAFTWFMACSCAVPPL 172
                ***:*****:**************************:. ** .*: ***:** :** ***

TetraodonG1     FGWSRYLPEGMQCSCGPDYYTLAPGYNNESYVIYMFVVHFFVPVFLIFFTYGSLVLTVR- 239
CichlidG1       FGWSRYIPEGMQCSCGPDYYTLAPGFNNESYVIYMFVVHFFVPVFIIFFTYGSLVMTVKA 240
ZebrafishG1     FGWSRYLPEGMQTSCGPDYYTLNPEYNNESYVMYMFSCHFCIPVTTIFFTYGSLVCTVKA 232
                ******:***** ********* * :******:***  ** :**  ********* **:

TetraodonG1     AAAQQQESESTQKAQREVTRMCILMVLGFLVAWTPYATFSGWIFMNKGAAFHPLTAALCA 299
CichlidG1       AAAQQQDSASTQKAEKEVTRMCVLMVMGFLIAWTPYASFAGWIFMNKGASFSALTAAIPA 300
ZebrafishG1     AAAQQQESESTQKAEREVTRMVILMVLGFLFAWVPYASFAAWIFFNRGAAFSAQAMAVPA 292
                ******:* *****::***** :***:***.**.***:*:.***:*:**:* . : *: *

TetraodonG1     FFAKSSALYNPVIYVLMNKQFRNCMLSTFGMGG--AVDDETS-VSASKTEVSSVS-- 351
CichlidG1       FFAKSSALYNPVIYVLMNKQFRNCMLSTIGMGG--MVEDETS-VSTSKTEVSSVS-- 352
ZebrafishG1     FFSKTSAVFNPIIYVLLNKQFRSCMLNTLFCGKSPLGDDESSSVSTSKTEVSSVSPA 349
                **:*:**::**:****:*****.***.*:  *     :**:* **:*********

Page 26

Alignment is key

Any other analysis that you do is only as good as your alignment

If your alignment is bad subsequent analyses will be bad

Junk in = Junk out

Page 27

Alignments

Tell you about sequence conservation
  How much is there?
  Where is it?

Page 28

Calculate sequence similarities

Zebrafish  M--------NGTEGSNFYIPMSNR
Trout      M------Q-NGTEGSNFYIPMSNR
Medaka     M------E-NGTEGKNFYIPMNNR
Cod        M----RMEANGTEGKNFYIPMSNR
Halibut    MVWDGGIEPNGTEGKNFYIPMSNR
Tetraodon  MVWDGGIEPNGTEGKNFYIPMSNR
Goldfish   M--------NGTEGNNFYVPLSNR
Killifish  M---GYG-PNGTEGNNFYIPMSNK
           *        *****.***:*:.*:

Pairwise comparisons

Page 29

Use tree to show sequence relationships

Short branches mean sequences are more similar

Long branches mean there are more differences

Page 30

Q3. How do we build phylogenies?

Assume the relationships involve bifurcating branches

[Figure: tree diagrams labeled with the sequences ATC, ATG, ACG, CCG, CCC]

Page 31

Methods to determine similarities

Parsimony

Distance

Maximum likelihood

Bayesian

Page 32

Parsimony

The least complex explanation is the most likely to be correct
  Occam's razor

The preferred phylogenetic tree is the one that requires the fewest changes
  Count up # changes for all possible trees
  Find the shortest one
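As a rough illustration of "count the changes for each tree and keep the shortest", the sketch below scores the three possible groupings of four sequences. It uses Fitch's counting rule, which is one standard way to get the minimum number of changes on a given tree; the lecture does not say which counting procedure it has in mind. The taxon labels A-D and the function names are made up for the example.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    `tree` is a leaf name (str) or a pair (left, right) of subtrees.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The children can agree on a state: no extra change needed here.
        return shared, left_changes + right_changes
    # The children disagree: at least one change on a branch below this node.
    return left_set | right_set, left_changes + right_changes + 1


def tree_length(tree, seqs):
    """Total minimum number of changes summed over all aligned sites."""
    n_sites = len(next(iter(seqs.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in seqs.items()})[1]
        for i in range(n_sites)
    )


# Hypothetical labels for four aligned sequences.
seqs = {"A": "ATCG", "B": "ATCG", "C": "ACCG", "D": "ACCG"}

# The three possible groupings of four taxa.
trees = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}

lengths = {label: tree_length(t, seqs) for label, t in trees.items()}
best = min(lengths, key=lengths.get)
for label, n in lengths.items():
    print(label, n)
print("most parsimonious:", best)
```

Enumerating every tree is only practical for a handful of sequences, because the number of possible trees grows very quickly with the number of taxa.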

Page 33

Trees based on parsimony

[Figure: two trees for the sequences ATCG, ATCG, ACCG, ACCG with C/T changes marked; one tree is labeled "Most parsimonious"]

Page 34

Trees based on parsimony

[Figure: the same two trees reduced to the variable site (T, T, C, C) with C/T changes marked; one tree is labeled "Most parsimonious"]

Page 35

Can't always distinguish tree topologies

[Figure: two trees for the states T, T, C, C with C/T changes marked; labeled "Equally parsimonious"]

Page 36

Other limitations

All changes are weighted the same
  C-T same as C-A
  Same no matter how long it takes for the change to occur
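To make the C-T versus C-A point concrete: C-T is a transition (pyrimidine to pyrimidine) and C-A is a transversion (pyrimidine to purine). One common refinement, not described on the slide, is to give the two kinds of change different costs. The sketch below is only an illustration of that idea; the costs 1 and 2 are arbitrary and the function names are made up.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(a, b):
    """Classify a single-base difference."""
    a, b = a.upper(), b.upper()
    if a == b:
        return "none"
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def weighted_changes(seq1, seq2, transition_cost=1, transversion_cost=2):
    """Sum of per-site costs between two aligned sequences.

    The default costs are arbitrary illustration values.
    """
    cost = {"none": 0, "transition": transition_cost, "transversion": transversion_cost}
    return sum(cost[substitution_type(a, b)] for a, b in zip(seq1, seq2))


print(substitution_type("C", "T"))       # pyrimidine <-> pyrimidine
print(substitution_type("C", "A"))       # pyrimidine <-> purine
print(weighted_changes("ACCG", "ATCG"))  # one C-T difference
print(weighted_changes("ACCG", "AACG"))  # one C-A difference
```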

Page 37

Distance methods

Calculate a numerical value for sequence differences
  Do for all pairwise combinations

Build tree by joining the most similar sequences first and then the more divergent ones

Page 38

Distance methods

Fast

Pretty robust

Only deals with data in pairs

Page 39

Pairwise distances

Taxa1 AACGGTCATGGCGTTGCATT
Taxa2 AACGGTCAGGGCGTTGCATT
Taxa3 AACGGTCACGCCGCTGCATT

       1     2     3
1      0    .05   .15
2     .05    0    .15
3     .15   .15    0

Page 40

Distance, d

p is the fractional similarity of two sequences (fraction of aligned sites that match)

Simplest form of distance: d = 1 - p

AACGGTCATGGCGTTGCATT
AACGGTCACGGCGTTGCATT

p = 19/20, d = 0.05
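The d = 1 - p calculation is easy to check by machine. The sketch below is a plain-Python illustration (the function name `p_distance` is made up) that computes the distance for the pair above and then fills in the 3 x 3 table for Taxa1 to Taxa3 from the pairwise distances slide.

```python
def p_distance(seq1, seq2):
    """d = 1 - p, where p is the fraction of aligned sites that match."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq1, seq2))
    p = matches / len(seq1)
    return 1.0 - p


# The pair shown with d = 1 - p.
pair_d = round(p_distance("AACGGTCATGGCGTTGCATT", "AACGGTCACGGCGTTGCATT"), 2)

# The three taxa from the pairwise distances table.
taxa = {
    "Taxa1": "AACGGTCATGGCGTTGCATT",
    "Taxa2": "AACGGTCAGGGCGTTGCATT",
    "Taxa3": "AACGGTCACGCCGCTGCATT",
}
names = sorted(taxa)
matrix = {
    (a, b): round(p_distance(taxa[a], taxa[b]), 2) for a in names for b in names
}

print("pair distance:", pair_d)
for a in names:
    print(a, [matrix[(a, b)] for b in names])
```

Rounding to two decimals is only for display; it hides floating-point noise such as 0.15000000000000002.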

Page 41

Tree building

Neighbor joining
  Join most similar pair of sequences
  Add more divergent after

       1     2     3
1      0    .05   .15
2     .05    0    .15
3     .15   .15    0

[Figure: tree with tips 1, 2, 3]
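The "join the most similar pair, then add the more divergent ones" idea can be sketched as a greedy clustering loop. Note the assumption: this is a simplified average-linkage (UPGMA-style) procedure, not the full neighbor-joining algorithm, which additionally corrects for lineages that evolve at different rates. The function name and the Newick-style output are choices made for this example.

```python
def cluster_tree(names, dist):
    """Greedy average-linkage clustering; returns a Newick-style string.

    `dist` maps frozenset({a, b}) to the distance between leaves a and b.
    """
    clusters = {name: [name] for name in names}  # cluster label -> member leaves

    def average_distance(c1, c2):
        pairs = [(a, b) for a in clusters[c1] for b in clusters[c2]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        labels = sorted(clusters)
        # Pick the closest pair of current clusters.
        _, a, b = min(
            (average_distance(x, y), x, y)
            for i, x in enumerate(labels)
            for y in labels[i + 1:]
        )
        merged = f"({a},{b})"
        clusters[merged] = clusters.pop(a) + clusters.pop(b)
    return next(iter(clusters)) + ";"


# Distances from the 3 x 3 table above.
dist = {
    frozenset(("1", "2")): 0.05,
    frozenset(("1", "3")): 0.15,
    frozenset(("2", "3")): 0.15,
}
newick = cluster_tree(["1", "2", "3"], dist)
print(newick)
```

Sequences 1 and 2 are joined first because 0.05 is the smallest entry in the table, and sequence 3 is attached afterwards.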

Page 42

How different can 2 sequences get?

At infinite time, two sequences match only by chance
  Probability a base is the same = 1/4
  DNA only has 4 bases

Certain sites will start to change multiple times
  Need to account for these multiple hits

Page 43

Random sequences

Write down 20 bases of sequence

Page 44

Compare your sequence to this one

AGTCCGATTACGGCTAGCAG

What fraction of sites are the same in the two sequences?
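The classroom exercise can also be run as a quick simulation. The sketch below assumes all four bases are equally likely, which a hand-written "random" sequence may not satisfy. The seed and the number of trials are arbitrary choices.

```python
import random


def random_dna(length, rng):
    """A random sequence with all four bases equally likely."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def fraction_same(seq1, seq2):
    """Fraction of positions at which two equal-length sequences match."""
    return sum(a == b for a, b in zip(seq1, seq2)) / len(seq1)


rng = random.Random(0)  # fixed seed so the run is repeatable
reference = "AGTCCGATTACGGCTAGCAG"

trials = [
    fraction_same(random_dna(len(reference), rng), reference)
    for _ in range(10000)
]
mean_match = sum(trials) / len(trials)
print(f"mean fraction of matching sites: {mean_match:.3f}")
```

Any single 20-base comparison can land well above or below one quarter; only the average over many random sequences settles near 1/4.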

Page 45

Sequence similarity decays to 25% over long times

[Plot: sequence similarity (y-axis, 0 to 1.2) against time (x-axis, 0 to 3.5)]

Page 46

Sequence difference maxes at 0.75

[Plot: sequence difference (y-axis, 0 to 1) against time (x-axis, 0 to 3.5)]

Page 47

Sequence change accumulates linearly with time at the beginning

[Plot: sequence difference (y-axis, 0 to 1) against time (x-axis, 0 to 3.5)]

Page 48

DNA models

Use different DNA models to account for how sequences evolve with time
  Allows you to apply different molecular clocks
  Relate sequence change to time
  Clock is not linear except for small changes and short times

Models are the same as those used in maximum likelihood methods
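The slides do not name a specific model. As an assumed example, the Jukes-Cantor model is the simplest: it treats all base changes as equally likely and corrects the observed difference for multiple hits at the same site. The function names below are made up for this sketch.

```python
import math


def jc69_distance(observed_difference):
    """Jukes-Cantor corrected distance (expected substitutions per site).

    `observed_difference` is the fraction of sites that differ.
    """
    if not 0.0 <= observed_difference < 0.75:
        raise ValueError("observed difference must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * observed_difference)


def jc69_expected_difference(distance):
    """Observed difference expected after `distance` substitutions per site."""
    return 0.75 * (1.0 - math.exp(-(4.0 / 3.0) * distance))


for diff in (0.05, 0.15, 0.50, 0.70):
    print(f"observed {diff:.2f} -> corrected {jc69_distance(diff):.3f}")

# Saturation: a very large true distance still shows at most 75% difference.
print(f"{jc69_expected_difference(100.0):.3f}")
```

For small differences the corrected and observed values are nearly equal, which is the linear part of the curve. The correction grows rapidly as the observed difference approaches 0.75.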

Page 49

How good is your tree?

Bootstrap approach
  Run the same method multiple times
  Subsample data each time
    Use 50% of data
  See how reproducible the trees are
  Count how many times a particular grouping occurs
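A toy version of this resampling loop is sketched below, following the slide's description of using 50% of the data in each run. Two assumptions are worth flagging: the classic bootstrap instead resamples alignment columns with replacement up to the full length, and the "grouping" here is simply whether one pair of sequences is strictly the closest pair, standing in for a full tree-building step. Function names, seed, and replicate count are arbitrary.

```python
import random


def n_differences(seq1, seq2, columns):
    """Number of differing sites, counted only over the chosen columns."""
    return sum(seq1[i] != seq2[i] for i in columns)


def support_for_pair(seqs, pair, replicates=1000, fraction=0.5, seed=1):
    """Fraction of subsamples in which `pair` is strictly the closest pair."""
    rng = random.Random(seed)
    names = sorted(seqs)
    length = len(seqs[names[0]])
    sample_size = max(1, int(length * fraction))
    target_key = frozenset(pair)
    hits = 0
    for _ in range(replicates):
        # Draw half of the alignment columns without replacement.
        columns = rng.sample(range(length), sample_size)
        diffs = {
            frozenset((a, b)): n_differences(seqs[a], seqs[b], columns)
            for i, a in enumerate(names)
            for b in names[i + 1:]
        }
        others = [v for k, v in diffs.items() if k != target_key]
        if all(diffs[target_key] < v for v in others):
            hits += 1
    return hits / replicates


taxa = {
    "Taxa1": "AACGGTCATGGCGTTGCATT",
    "Taxa2": "AACGGTCAGGGCGTTGCATT",
    "Taxa3": "AACGGTCACGCCGCTGCATT",
}
support = support_for_pair(taxa, ("Taxa1", "Taxa2"))
print(f"Taxa1+Taxa2 grouped in {100 * support:.0f}% of subsamples")
```

With only three informative sites in these short sequences, the grouping is not recovered in every subsample, which is exactly what a support value below 100 is meant to convey.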

Page 50

Distance tree for rod and cone transducin alpha subunit

Branch lengths are proportional to sequence differences

Page 51

Bootstrap values are given for each node; they tell how reproducible that grouping is

[Figure: tree with node bootstrap values 58, 100, 100, 95, 98, 72, 69, 72, 98, 86, 98, 68, 97]