Molecular evidence for endosymbiosis Perform blastp to investigate sequence similarity among domains...

Post on 29-Dec-2015

Molecular evidence for endosymbiosis

• Performed blastp to investigate sequence similarity among the domains of life

• Found that yeast nuclear genes are more similar in sequence to archaeal genes (suggesting a more recent common ancestor)

• Found that yeast mitochondrial genes are more similar in sequence to eubacterial genes

t-test and significance

• A t-test determines whether two data sets are likely to come from the same population (i.e., share the same mean) or whether the difference between them is statistically significant

• Calculate the mean and standard deviation of each data set, then derive a pooled (weighted) standard deviation to use in the t statistic

• Compare the resulting t statistic to the critical t value obtained from a t-table or software
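The three steps above can be sketched as Student's pooled two-sample t-test. This is a minimal sketch that assumes the two groups have equal variances; the function name and the toy numbers are hypothetical, and the critical value 2.776 (two-tailed, alpha = 0.05, df = 4) is taken from a standard t-table rather than computed.

```python
import math
from statistics import mean, stdev


def pooled_t_test(x, y):
    """Student's two-sample t statistic, assuming both groups share one variance."""
    n1, n2 = len(x), len(y)
    s1, s2 = stdev(x), stdev(y)  # sample standard deviations (n - 1 in the denominator)
    df = n1 + n2 - 2
    # Pooled ("weighted") standard deviation: each variance weighted by its degrees of freedom
    sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df)
    t = (mean(x) - mean(y)) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, df


# Hypothetical toy data, not measurements from the study described above
group_a = [5.0, 6.0, 7.0]
group_b = [1.0, 2.0, 3.0]
t, df = pooled_t_test(group_a, group_b)
t_critical = 2.776  # two-tailed, alpha = 0.05, df = 4, read from a t-table
print(f"t = {t:.3f}, df = {df}, significant: {abs(t) > t_critical}")
```

If the two groups' variances are clearly unequal, Welch's t-test is the usual alternative to pooling.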

Origins of eukaryotic cells

Martin-Müller hypothesis

Evidence from phylogenetic relationships

Mycobacterium leprae vs. M. tuberculosis

• The M. leprae genome (3.2 Mb) is ~50% coding, in contrast to 4.4 Mb and ~91% coding for M. tuberculosis

• Comparing genomes using MUMmer:

• http://www.tigr.org/tigr-scripts/CMR2/webmum/mumplot

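How MUMmer works:

• Uses suffix trees to create an internal representation of a genome sequence

• Identifies maximal unique matches (MUMs), exact matches that occur exactly once in each genome and cannot be extended; version 1.0 adds sequence 2 to the suffix tree built for sequence 1, whereas version 2.0 builds the tree for sequence 1 only and streams sequence 2 through it

• Regions between MUMs are aligned with the Smith-Waterman algorithm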
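To make the MUM idea concrete, here is a deliberately naive sketch. It does not use a suffix tree (the data structure MUMmer actually relies on), so it is only practical for toy strings; the function names and the `min_len` threshold are hypothetical. It treats a MUM as an exact match that cannot be extended left or right and that occurs exactly once in each sequence.

```python
def count_occurrences(s, sub):
    """Number of (possibly overlapping) occurrences of sub in s."""
    return sum(1 for i in range(len(s) - len(sub) + 1) if s.startswith(sub, i))


def find_mums(a, b, min_len=3):
    """Brute-force search (no suffix tree) for maximal unique matches.

    Returns (start_in_a, start_in_b, matched_string) tuples.
    """
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            # Left-maximal: skip if the match could be extended one base to the left
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Right-maximal: extend as far as the two sequences keep agreeing
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k < min_len:
                continue
            match = a[i:i + k]
            # Unique: the matched string occurs exactly once in each sequence
            if count_occurrences(a, match) == 1 and count_occurrences(b, match) == 1:
                mums.append((i, j, match))
    return mums


print(find_mums("GATTACAGG", "CCGATTACATT"))  # [(0, 2, 'GATTACA')]
```

In this example the match `ATT` (positions 1 and 8) is also maximal, but it is rejected because `ATT` occurs twice in the second sequence.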

Origin of species

• Mitochondrial DNA and human evolution

• Evolution of pathogens

Phylogeny – data mining by biologists

• Molecular phylogenetics uses clustering and tree-building techniques to infer evolutionary relationships among biological sequences

Why phylogenetics?

• Understand evolutionary history

• Map pathogen strain diversity for vaccines

• Assist in epidemiology (e.g., tracing HIV transmission from a dentist to patients)

• Aid in prediction of function of novel genes

• Biodiversity

• Microbial ecology

Changes can occur

Observing differences in nucleotides

• The simplest measure of distance between two sequences is to count the number of sites where the two sequences differ

• Over time the same site may undergo repeated substitutions (multiple hits), especially if sites are not all equally likely to change

• As time goes by, the number of observed differences between two sequences becomes an increasingly poor estimator of the actual number of substitutions that have occurred
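The counting approach and the multiple-hit problem can be illustrated with a small simulation. This is a hypothetical sketch: substitutions are applied uniformly at random to any site (an equal-rates assumption), and the function names, sequence length, and seed are arbitrary choices.

```python
import random

BASES = "ACGT"


def p_distance(s1, s2):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def mutate(seq, n_subs, rng):
    """Apply n_subs substitutions at random sites; a site may be hit more than once."""
    s = list(seq)
    for _ in range(n_subs):
        i = rng.randrange(len(s))
        s[i] = rng.choice([b for b in BASES if b != s[i]])  # always change the base
    return "".join(s)


rng = random.Random(1)
ancestor = "".join(rng.choice(BASES) for _ in range(1000))
for n_subs in (100, 500, 1000, 2000):
    descendant = mutate(ancestor, n_subs, rng)
    observed = round(p_distance(ancestor, descendant) * len(ancestor))
    print(f"{n_subs:5d} substitutions -> {observed} observed differences")
```

Because repeated hits at one site count as at most one difference (and a second hit can even restore the original base), the observed count falls further behind the true number of substitutions as more accumulate.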

The relationship between time and observed differences is non-linear (observed differences level off as substitutions accumulate)

Various substitution models have been developed to estimate evolutionary distance more accurately

• All use the following framework:

Probability matrix

$p_{AC}$ is the probability that a site that starts as A is a C at the end of time interval $t$ (and similarly for the other entries)

Base composition of the sequence: $f_A$ = frequency of A (and similarly $f_C$, $f_G$, $f_T$)
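Written out, this framework is a 4 × 4 matrix of substitution probabilities plus a vector of base frequencies. The sketch below uses rows for the starting base and columns for the ending base, matching the definition of $p_{AC}$; individual models differ in how they constrain these entries.

```latex
P(t) =
\begin{pmatrix}
p_{AA} & p_{AC} & p_{AG} & p_{AT} \\
p_{CA} & p_{CC} & p_{CG} & p_{CT} \\
p_{GA} & p_{GC} & p_{GG} & p_{GT} \\
p_{TA} & p_{TC} & p_{TG} & p_{TT}
\end{pmatrix},
\qquad \sum_{j \in \{A,C,G,T\}} p_{ij} = 1,
\qquad f_A + f_C + f_G + f_T = 1
```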

Jukes-Cantor Model

• Distance between any two sequences is given by: $d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$

• $p$ is the proportion of nucleotides that differ between the two sequences

• All substitutions are equally probable: each off-diagonal entry of the probability matrix is $\alpha$, and each diagonal entry is $1 - 3\alpha$
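The Jukes-Cantor correction can be computed directly from the p-distance. This is a minimal sketch with hypothetical function names and toy sequences; note that the formula is undefined once $p$ reaches 3/4, the expected proportion of differing sites between two unrelated sequences with equal base frequencies.

```python
import math


def p_distance(s1, s2):
    """Proportion of aligned sites at which the two sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def jukes_cantor(p):
    """Jukes-Cantor distance d = -3/4 ln(1 - 4/3 p)."""
    if not 0 <= p < 0.75:
        # For p >= 3/4 the log argument is <= 0: the sequences look saturated
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTTCGTAA")  # 2 of 10 sites differ, p = 0.2
d = jukes_cantor(p)
print(f"p = {p:.3f}, JC distance = {d:.3f}")
```

The corrected distance (about 0.233) is larger than the raw p-distance of 0.2, reflecting substitutions hidden by multiple hits.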

Kimura’s two-parameter model

• $d = \frac{1}{2}\ln\frac{1}{1 - 2P - Q} + \frac{1}{4}\ln\frac{1}{1 - 2Q}$

• $P$ and $Q$ are the proportions of sites at which the two sequences differ by a transition and by a transversion, respectively

• Accounts for transition bias: transitions (purine to purine or pyrimidine to pyrimidine) are more common than transversions
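The two-parameter distance can be sketched by classifying each differing site as a transition or a transversion. This minimal sketch assumes aligned, ungapped DNA containing only A, C, G, T; the function name and toy sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def kimura_2p(s1, s2):
    """Kimura two-parameter distance between two aligned, ungapped DNA sequences."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(s1.upper(), s2.upper()):
        if a not in VALID or b not in VALID:
            raise ValueError(f"unsupported character: {a!r} or {b!r}")
        if a == b:
            continue
        # Transition: purine <-> purine or pyrimidine <-> pyrimidine; otherwise transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    n = len(s1)
    P, Q = transitions / n, transversions / n
    if 1 - 2 * P - Q <= 0 or 1 - 2 * Q <= 0:
        raise ValueError("K2P distance is undefined: sequences too divergent")
    return 0.5 * math.log(1 / (1 - 2 * P - Q)) + 0.25 * math.log(1 / (1 - 2 * Q))


# One transition (A->G) and one transversion (A->C) in 10 sites: P = Q = 0.1
d = kimura_2p("AAAAAAAAAA", "GAAAAAAACA")
print(f"K2P distance = {d:.4f}")
```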

Evolutionary models

Implementing models and building trees

Rooted vs. unrooted

• Root – the common ancestor of all taxa considered

• Unrooted – shows relationships among taxa without specifying their common ancestor

• The root is often specified with an outgroup, a distantly related species (e.g., an archaeal species in a tree of mammals)

Tree building

• Get protein/RNA/DNA sequences

• Construct multiple sequence alignment

• Compute pairwise distances (for distance-based methods)

• Build tree – topology and distances

• Estimate reliability

• Visualize

Distance methods

• UPGMA

• Neighbor joining

Unweighted pair-group method using arithmetic averages (UPGMA)

• Assumes a constant rate of gene substitution, i.e., a constant rate of evolution (molecular clock)

• Clustering algorithm: computes distances between all sequences, merges the closest pair, recalculates the distances from the new node to all remaining sequences as averages, then merges the next closest pair, and repeats until one cluster remains

• Produces a rooted tree
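The merge-and-average loop described above can be sketched in Python. This is a minimal, unoptimized sketch assuming a symmetric distance matrix; the function name, the toy matrix, and the choice of a Newick string as output are illustrative. Each new node is placed at half the distance between the two clusters it joins, which is where the constant-rate assumption enters.

```python
def upgma(names, dist):
    """Build a rooted UPGMA tree; returns a Newick string with branch lengths.

    names: list of taxon labels
    dist:  symmetric matrix (list of lists) of pairwise distances
    """
    # Each active cluster id maps to (newick_subtree, number_of_leaves, node_height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by (smaller_id, larger_id)
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters
        (i, j), dij = min(d.items(), key=lambda item: item[1])
        tree_i, size_i, height_i = clusters.pop(i)
        tree_j, size_j, height_j = clusters.pop(j)
        height = dij / 2  # constant-rate assumption: both lineages diverged equally
        subtree = f"({tree_i}:{height - height_i:g},{tree_j}:{height - height_j:g})"
        # Distance from the new node to each remaining cluster: size-weighted average
        for k in clusters:
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        del d[(i, j)]
        clusters[next_id] = (subtree, size_i + size_j, height)
        next_id += 1
    tree, _, _ = next(iter(clusters.values()))
    return tree + ";"


taxa = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 4, 6],
    [2, 0, 4, 6],
    [4, 4, 0, 6],
    [6, 6, 6, 0],
]
print(upgma(taxa, matrix))  # (D:3,(C:2,(A:1,B:1):1):1);
```

Weighting by cluster size makes each new distance the plain average over all original leaf pairs, which is what "unweighted" refers to in the method's name.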

Testing the reliability of trees

• Interior branch test or Bootstrap analysis

• Bootstrap analysis – resample the alignment's sites with replacement to build many pseudo-replicate data sets, re-draw a tree from each, and count how often each branching pattern recurs. Bootstrap values of 70% (or, more stringently, 95%) or greater are normally considered reliable
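The resample-and-recount idea can be sketched as follows. To keep the example self-contained, the "tree" built from each replicate is reduced to a single grouping, namely which pair of taxa is closest by p-distance (a stand-in for the first join of a distance-method tree, not a full tree search). All function names and the toy alignment are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the same number of columns."""
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def closest_pair(alignment):
    """Stand-in for tree building: the pair of taxa with the smallest p-distance."""
    def p_dist(a, b):
        return sum(x != y for x, y in zip(a, b)) / len(a)
    return min(combinations(sorted(alignment), 2),
               key=lambda pair: p_dist(alignment[pair[0]], alignment[pair[1]]))


def bootstrap_support(alignment, n_reps=100, seed=0):
    """Percentage of replicates in which each pairing is recovered."""
    rng = random.Random(seed)
    counts = Counter(closest_pair(bootstrap_replicate(alignment, rng)) for _ in range(n_reps))
    return {pair: 100 * c / n_reps for pair, c in counts.items()}


# Hypothetical toy alignment
alignment = {
    "A": "ACGTACGTACGTACGTACGT",
    "B": "ACGTACGTACGAACGTACGT",
    "C": "ACTTACGAACGTTCGTAGGT",
    "D": "TCTAACGAGCGTTCCTAGGA",
}
print(bootstrap_support(alignment, n_reps=1000, seed=42))
```

A pairing recovered in at least 70% of replicates would usually be read as reasonably well supported, following the threshold given in the slide.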

Homework due on 10/6

• Discovery questions in Chapter 2

• 4, 25-27