Post on 29-Oct-2019
Evolution, Trees and HIV
[Figure: cumulative branch length vs. collection date, with trend line]
Random biological fact of the day: Galton and regression
https://commons.wikimedia.org/wiki/File:Francis_Galton_1850s.jpg
https://commons.wikimedia.org/wiki/File:Sweet_Pea-01.jpg
https://commons.wikimedia.org/wiki/File:Sweet_pea_(Lathyrus_odoratus)_seeds.jpg
[Figure: daughter pea diameter vs. mother pea diameter]
Where we’ve been
• Part 1: mechanisms
  • 4 forces
  • Pop gen model (simulator)
  • Sequence change over longer time scales: substitutions
• Part 2: methods
  • Calculating distances (Jukes-Cantor)
  • Reconstructing trees (Neighbor joining)
    • Unrooted trees
    • Inferring host switch events
  • Inferences about selection (Ka/Ks)
  • Cumulative branch length vs. collection date plots
https://commons.wikimedia.org/wiki/File:HIV-budding-Color.jpg
Themes
• Evolution as organizing principle
• Sequence information has many uses
• Molecular shape → molecular function
• Importance of stochastic processes
Advice for studying…
• Review problems
  • Written homework
  • Coding homework
  • Recitation problems
  • Activities from lecture
• Go through lecture slides
• Spend time connecting the pieces
Topics for today
• Revisiting the homework (a bit)
• Some notes on methods we’ve talked about
• Questions?
Homework 2: population size, drift and selection
Homework 2: genetic variation, drift and selection
Mutation vs. substitution rate
[Figure: a population of GGG sequences being copied from one generation to the next. Copying one sequence is 1 replication. Population size is 4, so we have 4 replications per generation. One copy carries a mutation (TGG).]
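Substitution rate = expected number of substitutions per site per generation:
$$\text{Substitution rate} = \mu\,\frac{\text{mutations}}{\text{site}\cdot\text{replication}} \times N\,\frac{\text{replications}}{\text{generation}} \times \frac{1}{N}\,\frac{\text{substitutions}}{\text{mutation}} = \mu$$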
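The 1/N factor (substitutions per mutation) is the chance that a new mutation with no effect on fitness eventually takes over the population. The sketch below checks that by simulation. It assumes a simple resampling model in which every individual in the next generation copies a randomly chosen parent; the course simulator may work differently. The names `fixation_fraction`, `mu` and `N` are illustrative and not taken from the homework code.

```python
import random

def fixation_fraction(pop_size, trials, seed=0):
    """Fraction of new neutral mutations that reach fixation.

    Assumed model: each generation, every one of the pop_size offspring
    copies a parent chosen uniformly at random (no selection).
    """
    rng = random.Random(seed)
    fixed = 0
    for _ in range(trials):
        count = 1  # a new mutation starts as a single copy
        while 0 < count < pop_size:
            freq = count / pop_size
            # Each offspring inherits the mutation with probability freq.
            count = sum(rng.random() < freq for _ in range(pop_size))
        if count == pop_size:
            fixed += 1
    return fixed / trials

N = 4        # population size used on the slide
mu = 0.0003  # mutation probability per site per replication (one of the homework values)

est = fixation_fraction(N, 20000)
print("estimated substitutions per mutation:", est, "expected:", 1 / N)
print("implied substitution rate:", mu * N * est, "mutation rate:", mu)
```

Under these assumptions the estimate should land close to 1/N, so the N replications per generation and the 1/N fixation chance cancel and the substitution rate comes out near `mu`.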
Homework 3: mutation vs. substitution rate
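Substitution rate = expected number of substitutions per site per generation:
$$\text{Substitution rate} = \mu\,\frac{\text{mutations}}{\text{site}\cdot\text{replication}} \times N\,\frac{\text{replications}}{\text{generation}} \times \frac{1}{N}\,\frac{\text{substitutions}}{\text{mutation}} = \mu$$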
Case D (no selection)
>>> mutProb=0.0003
>>> printSummary(subL,".6f")
0 0.000299 (0.000266-0.000332)
1 0.000266 (0.000235-0.000297)
2 0.000289 (0.000255-0.000323)
>>> mutProb=0.0006
>>> printSummary(subL,".6f")
0 0.000575 (0.000531-0.000619)
1 0.000585 (0.000539-0.000631)
2 0.000630 (0.000578-0.000682)

Case E (selection)
>>> mutProb=0.0003
>>> printSummary(subL,".6f")
0 0.000050 (0.000039-0.000060)
1 0.000052 (0.000040-0.000063)
2 0.000293 (0.000268-0.000317)
>>> mutProb=0.0006
>>> printSummary(subL,".6f")
0 0.000112 (0.000095-0.000130)
1 0.000101 (0.000085-0.000117)
2 0.000583 (0.000551-0.000614)
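One way to read these numbers is to divide each observed substitution rate by the mutation probability that produced it. The sketch below does that with the first value from each output line above (taken here to be the per-site mean); the dictionary layout and the helper name `rate_ratios` are illustrative and not part of the homework code.

```python
# Observed substitution rates per site (sites 0, 1, 2), keyed by mutProb.
# Values are copied from the printSummary output shown above.
case_d = {  # no selection
    0.0003: [0.000299, 0.000266, 0.000289],
    0.0006: [0.000575, 0.000585, 0.000630],
}
case_e = {  # selection
    0.0003: [0.000050, 0.000052, 0.000293],
    0.0006: [0.000112, 0.000101, 0.000583],
}

def rate_ratios(results):
    """Return substitution rate / mutation probability for each site."""
    return {mut_prob: [rate / mut_prob for rate in rates]
            for mut_prob, rates in results.items()}

for label, results in (("Case D", case_d), ("Case E", case_e)):
    for mut_prob, ratios in rate_ratios(results).items():
        formatted = ", ".join(f"{r:.2f}" for r in ratios)
        print(f"{label} mutProb={mut_prob}: {formatted}")
```

A ratio near 1 means the site is substituting at about the mutation rate. In Case D every site stays close to 1; in Case E sites 0 and 1 fall well below 1 while site 2 does not.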
Methods for reconstructing a tree: how they fit together
Sequence samples
TAATTCATGAGAAAGATAT
TAGTAGCAGGAAT
Align sequences
TAATTCATGAGAAAGATAT
TAGTA------GCAGGAAT
Calculate proportion of sites that are different
$D = 6/13$
Jukes-Cantor correction to get substitutions per site
$K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}D\right) = 0.717$
Fill in entry in distance matrix
Repeat until we have distances for all pairs of sequences we want to look at
Neighbor-joining
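The distance steps of this pipeline can be sketched in a few lines of Python. This is a minimal illustration, assuming the sequences are already aligned and that columns containing a gap are skipped, which is how the 13 in 6/13 arises for the pair above. The function names are illustrative, and the neighbor-joining step itself is not shown.

```python
import math

def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites, ignoring columns where either sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    diffs = sum(a != b for a, b in pairs)
    return diffs / len(pairs)

def jukes_cantor(d):
    """Jukes-Cantor corrected distance in substitutions per site (requires d < 0.75)."""
    return -0.75 * math.log(1 - 4 * d / 3)

def distance_matrix(aligned_seqs):
    """Symmetric matrix of Jukes-Cantor distances for every pair of aligned sequences."""
    n = len(aligned_seqs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = jukes_cantor(p_distance(aligned_seqs[i], aligned_seqs[j]))
            matrix[i][j] = matrix[j][i] = dist
    return matrix

# The aligned pair from the slide.
aligned = ["TAATTCATGAGAAAGATAT",
           "TAGTA------GCAGGAAT"]
matrix = distance_matrix(aligned)
print(round(matrix[0][1], 3))  # 0.717
```

With more than two sequences, `distance_matrix` fills in every pairwise entry, and that matrix is the input a neighbor-joining routine would take.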
Jukes-Cantor: two applications we’ve seen
TAATTCATGAGAAAGATAT
TAGTA------GCAGGAAT
Calculate proportion of sites that are different
$D = 6/13$
Jukes-Cantor correction to get substitutions per site
$K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}D\right) = 0.717$
Ka/Ks
Ancestor: GTA
Descendants: CTA (aa), GCA (aa), TTA (aa), GTT (syn), GTG (syn), GTA, GTA, GTA
Calculate average proportion of sites that are different
$3/16$
Jukes-Cantor correction to get substitutions per site
$K_a = -\frac{3}{4}\ln\left(1 - \frac{4}{3}\cdot\frac{3}{16}\right) = 0.216$
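The 3/16 and 0.216 for the amino-acid-changing (aa) substitutions can be reproduced with a short sketch. It assumes the 16 comes from counting nonsynonymous sites as the fraction of single-base changes at each codon position that alter the amino acid (2 sites for GTA), multiplied by the 8 descendant sequences; that reading is an interpretation of the slide, and the function names are illustrative.

```python
import math
from itertools import product

# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def single_changes(codon, pos):
    """All codons that differ from codon by one base at position pos."""
    return [codon[:pos] + b + codon[pos + 1:] for b in BASES if b != codon[pos]]

def nonsyn_sites(codon):
    """Number of nonsynonymous sites in a codon (assumed counting rule, see lead-in)."""
    aa = CODON_TABLE[codon]
    return sum(
        sum(CODON_TABLE[c] != aa for c in single_changes(codon, pos)) / 3
        for pos in range(3)
    )

def nonsyn_differences(ancestor, descendant):
    """Count base differences from the ancestor that change the amino acid."""
    count = 0
    for pos in range(3):
        if descendant[pos] != ancestor[pos]:
            changed = ancestor[:pos] + descendant[pos] + ancestor[pos + 1:]
            count += CODON_TABLE[changed] != CODON_TABLE[ancestor]
    return count

def jukes_cantor(d):
    """Jukes-Cantor corrected distance in substitutions per site."""
    return -0.75 * math.log(1 - 4 * d / 3)

ancestor = "GTA"
descendants = ["CTA", "GCA", "TTA", "GTT", "GTG", "GTA", "GTA", "GTA"]

changes = sum(nonsyn_differences(ancestor, d) for d in descendants)
sites = nonsyn_sites(ancestor) * len(descendants)
ka = jukes_cantor(changes / sites)
print(changes, sites, round(ka, 3))  # 3 16.0 0.216
```

Ks would follow the same pattern with synonymous changes and synonymous sites; it is left out here because the slide only works through the Ka side.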
“Industrial strength” approaches
• More realistic trees
• Ancestor seq not required
Our approach
• Ancestor seq
• Star phylogeny