how do we explain the patterns of variation observed in DNA sequences?
![Page 1: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/1.jpg)
Molecular evolution:
• How do we explain the patterns of variation observed in DNA sequences?
• How do we detect selection by comparing silent site substitutions to replacement substitutions?
• How do we detect selection by comparing fixed differences between species to polymorphisms within species?
• How do we detect selection by using hitchhiking?
Goal: understand the logic behind key tests.
![Page 2: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/2.jpg)
Neutralist vs. selectionist view
Are most substitutions due to drift or natural selection?
“Neutralist” vs. “selectionist”
Agree that:
• Most mutations are deleterious and are removed.
• Some mutations are favourable and are fixed.
Dispute:
• Are most replacement mutations that fix beneficial or neutral?
• Is observed polymorphism due to selection or drift?
![Page 3: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/3.jpg)
Reminder: substitution vs. polymorphism
What happens after a mutation changes a nucleotide at a locus?
Polymorphism: mutant allele is one of several present in population
Substitution: the mutant allele fixes in the population. (New mutations at other nucleotides may occur later.)
![Page 4: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/4.jpg)
Substitution schematic

| Individual | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Time 0 | aaat | aaat | aaat | aaat | aaat | aaat | aaat |
| Time 10 | aaat | aaat | aaat | aaat | acat | aaat | aaat |
| Time 20 | aaat | aaat | acat | aaat | acat | acat | acat |
| Time 30 | acat | acat | acat | acat | acat | acat | acat |
| Time 40 | acat | acat | actt | acat | acat | acat | acat |

• Times 10-29: polymorphism
• Time 30: mutation fixed -> substitution
• Time 40: new mutation: polymorphism
![Page 5: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/5.jpg)
Reminder: substitution rates for neutral mutations
• Most neutral mutations are lost
• Only 1 out of 2N fix
• Most that are lost go quickly (< 20 generations for population sizes from 100 - 2000)
• Most replacement mutations are lost since deleterious: rate of loss is faster than neutral
![Page 6: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/6.jpg)
Data in favor of neutrality
• Substitutions in DNA appear to be clock-like
Figure 6.21
![Page 7: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/7.jpg)
Drift model pseudocode
Population with 2N - 1 copies of allele A, 1 of allele a
For each generation, draw from prior generation alleles:
-> generate a random number. If less than f(A), new allele = A. Otherwise, new allele = a.
-> repeat until 2N alleles drawn
Check to see outcome of drift:
-> If a is lost, start over.
-> If a has fixed, note the number of generations (years) taken.
-> Otherwise, go to the next generation with the new allele frequencies.
Repeat 100x per population size
Test populations of 100, 500, 1000, 1500, and 2000
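The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the code behind the slides: the function names, the seed, and the small population sizes in the demo are assumptions chosen so that it runs in a second or two. It tracks the count of the new allele a, which is equivalent to drawing A whenever the random number is below f(A).

```python
import random


def neutral_fixation_time(n_diploid, rng):
    """Follow one new neutral allele 'a' until it is lost or fixed.

    Returns the number of generations to fixation, or None if 'a' is lost.
    """
    copies = 2 * n_diploid          # 2N allele copies in the population
    count_a = 1                     # start with a single copy of 'a'
    generations = 0
    while 0 < count_a < copies:
        freq_a = count_a / copies
        # Draw 2N alleles from the prior generation; each is 'a' with
        # probability f(a), otherwise 'A'.
        count_a = sum(1 for _ in range(copies) if rng.random() < freq_a)
        generations += 1
    return generations if count_a == copies else None


def mean_fixation_time(n_diploid, replicates, seed=0):
    """Average fixation time over runs in which 'a' fixed (losses restart)."""
    rng = random.Random(seed)       # hypothetical fixed seed for repeatability
    times = []
    while len(times) < replicates:
        t = neutral_fixation_time(n_diploid, rng)
        if t is not None:
            times.append(t)
    return sum(times) / len(times)


if __name__ == "__main__":
    # Small sizes keep the demo fast; the slides use 100 to 2000.
    for n in (10, 20, 40):
        print(n, round(mean_fixation_time(n, replicates=30), 1), "theory 4N =", 4 * n)
```

Because only about 1 in 2N runs ends in fixation, most of the running time goes to restarts after loss, which is why larger populations take much longer to simulate.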
![Page 8: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/8.jpg)
Times to fix for neutral alleles
(Only 1/2N fix: how long do they take?)
[Figure: years to fix a neutral mutation (y-axis, 0 to 35,000) against population size (x-axis, 0 to 2000)]
Estimated formula: fixation time = 4.07 * N – 57
Theoretical formula: fixation time = 4N
![Page 9: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/9.jpg)
Puzzle for neutrality
• Rates of substitution are clock-like per year, not per generation.
[Figure, two panels plotting substitutions (y-axis) against years (x-axis) for rabbits and elephants: "Actual pattern" and "Expected pattern"]
![Page 10: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/10.jpg)
Revised theory: the nearly neutral theory
Figure 6.22
![Page 11: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/11.jpg)
Can we distinguish selection from drift using sequence data?
• Compare two species: infer where substitutions have occurred.
• Silent site substitutions should be neutral (dS)
• Non-synonymous substitutions are expected to be deleterious (usually) (dN)
• So, expect dN / dS < 1
Translation: the rate of non-synonymous substitutions (dN) is less than the rate of synonymous substitutions (dS)
![Page 12: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/12.jpg)
dN / dS and inferences about selection
• dN / dS < 1: replacements are deleterious
• dN / dS = 1: replacements are neutral
• dN / dS > 1: replacements are beneficial
![Page 13: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/13.jpg)
What happens to fixation time with selection? Model pseudocode
Population with 2N - 1 copies of allele A, 1 of allele a
WA = 1 + s; Wa = 1
For each generation, draw from prior generation alleles:
-> generate a random number. If greater than f(A), new allele = a. Otherwise, test fitness: if random < WA, new allele = A.
-> repeat until 2N alleles drawn
Check to see outcome of drift:
-> If a is lost, start over.
-> If a has fixed, note the number of generations (years) taken.
-> Otherwise, go to the next generation with the new allele frequencies.
Repeat 100x per fitness value
Test populations of 100
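A hedged Python sketch of a drift model with selection follows. It does not reproduce the accept/reject fitness test in the pseudocode; instead it uses the standard Wright-Fisher weighting, in which the new allele a is assumed to be the favoured one with relative fitness 1 + s and is sampled with probability p(1 + s) / (p(1 + s) + (1 - p)). The function names, seed, and demo values are illustrative assumptions.

```python
import random


def selected_fixation_time(n_diploid, s, rng):
    """Follow one new allele 'a' with relative fitness 1 + s (assumed).

    Returns generations to fixation, or None if 'a' is lost.
    """
    copies = 2 * n_diploid
    count_a = 1
    generations = 0
    while 0 < count_a < copies:
        p = count_a / copies
        # Selection step: weight the frequency of 'a' by its fitness.
        p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
        # Drift step: draw 2N alleles using the weighted frequency.
        count_a = sum(1 for _ in range(copies) if rng.random() < p_sel)
        generations += 1
    return generations if count_a == copies else None


def mean_selected_fixation_time(n_diploid, s, replicates, seed=0):
    """Average fixation time over runs in which 'a' fixed (losses restart)."""
    rng = random.Random(seed)       # hypothetical fixed seed for repeatability
    times = []
    while len(times) < replicates:
        t = selected_fixation_time(n_diploid, s, rng)
        if t is not None:
            times.append(t)
    return sum(times) / len(times)


if __name__ == "__main__":
    n = 100                         # population size used on the slide
    for s in (0.02, 0.05):
        mean_t = mean_selected_fixation_time(n, s, replicates=10)
        print("s =", s, "mean generations to fix:", round(mean_t, 1))
```

Setting s = 0 reduces the weighted frequency to p, so the same function then behaves as a pure drift model.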
![Page 14: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/14.jpg)
Time to fix favourable allele
Ne = 100 (black line: estimated time to fixation = 2 ln(2N) / |s|)
[Figure: years to fix (y-axis, 0 to 1200; labelled "Years to fix neutral mutation") against selection coefficient (x-axis, 0 to 0.05)]
![Page 15: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/15.jpg)
Time to fix: neutral vs. favourable
[Figure: years to fix (y-axis, 0 to 35,000; labelled "Years to fix neutral mutation") against population size (x-axis, 0 to 2000)]
Simulation results: black – neutral mutations; red – favourable mutations
![Page 16: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/16.jpg)
Time to fixation: drift is slow
Neutral:
• New mutations per generation: 2Neu
• Probability of fixing a new mutation: 1 / 2Ne
• Fixations per generation: 2Neu * 1 / 2Ne = u
• Time to fix: 4Ne
Favored by selection:
• New mutations per generation: 2Neu (but how many are favourable?)
• Probability of fixing a favored mutation: 2|s|
• Fixations per generation: 2Neu * 2|s| * prob. favourable
• Time to fix: 2 ln(2Ne) / |s|
2 ln(2Ne) / |s| << 4Ne: shorter time to fixation
Derivations of these results are tough! See Kimura (1962) and Kimura and Ohta (1969).
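To put numbers on the comparison, the short sketch below evaluates the two time-to-fixation formulas and the neutral substitution rate. The function names and the example values of Ne, s, and u are illustrative assumptions, not data from the slides.

```python
import math


def neutral_fix_time(ne):
    """Expected generations for a neutral mutation to fix: 4Ne."""
    return 4 * ne


def selected_fix_time(ne, s):
    """Approximate generations for a favoured mutation to fix: 2 ln(2Ne) / |s|."""
    return 2 * math.log(2 * ne) / abs(s)


def neutral_substitution_rate(ne, u):
    """Fixations per generation: 2Ne*u new mutations times 1/(2Ne) chance each."""
    return (2 * ne * u) * (1 / (2 * ne))


if __name__ == "__main__":
    for ne, s in ((100, 0.05), (2000, 0.01)):
        print("Ne =", ne, "s =", s,
              "neutral:", neutral_fix_time(ne),
              "selected:", round(selected_fix_time(ne, s), 1))
    # Population size cancels, so the neutral rate equals the mutation rate u.
    print(neutral_substitution_rate(1000, 2e-8))
```

For Ne = 100 and s = 0.05 the selected estimate is about 212 generations against 400 for a neutral mutation, and the gap widens as Ne grows because the selected time rises only with ln(2Ne).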
![Page 17: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/17.jpg)
Time to fixation: favourable and neutral
![Page 18: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/18.jpg)
dN / dS data: BRCA1
dN / dS < 1
dN / dS > 1
Figure 6.21
![Page 19: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/19.jpg)
Molecular evidence of selection II: McDonald-Kreitman Test
dN / dS is very conservative: many selective events may be missed.
Example: immunoglobulins. dN / dS = 0.37 overall.
We suspect selection favoring new combinations at key sites. Antigen recognition sites: dN / dS > 3.0
![Page 20: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/20.jpg)
Evidence of selection II: McDonald-Kreitman test
![Page 21: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/21.jpg)
McDonald-Kreitman test III
If evolution of protein is neutral, the percentage of mutations that alter amino acids should be the same along any branch
If all mutations are neutral, all should have the same probability of persisting
So: dN / dS among polymorphisms should be the same as within fixed differences
![Page 22: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/22.jpg)
McDonald-Kreitman logic
• Silent sites
- always neutral
- fix slowly
- contribute to polymorphism
• Replacement sites
- mainly unfavourable
- if neutral, fix at same rate as silent and contribute to polymorphism
- proportion of replacement mutations that are neutral determines dN / dS for polymorphism
- if favourable, fix quickly and do not contribute to polymorphism: higher dN / dS for fixed differences, lower for polymorphism
![Page 23: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/23.jpg)
Time to fixation: favourable and neutral
![Page 24: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/24.jpg)
Polymorphism and fixation
[Diagram: silent vs. replacement sites, with mutations classed as neutral or deleterious]
1 / 2N neutral mutations fix
![Page 25: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/25.jpg)
Polymorphism and fixation
[Diagram: silent vs. replacement sites, with mutations classed as neutral, deleterious, or favourable]
Neutral: 1 / 2N neutral mutations fix (slow)
Favourable: 2|s| fix (fast)
![Page 26: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/26.jpg)
dN / dS for neutral and favourable
[Diagram: dN and dS contributions to polymorphism and to fixation, shown for neutral and for favourable replacements]
Neutral: (dN / dS) poly = (dN / dS) fixed
Favourable: (dN / dS) poly < (dN / dS) fixed
![Page 27: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/27.jpg)
McDonald-Kreitman hypotheses
H0: All mutations are neutral.
Then, dN / dS for polymorphic sites should equal dN / dS for fixed differences
H1: replacements are favoured. Favoured mutations fix rapidly, so dN / dS for polymorphic < dN / dS fixed
![Page 28: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/28.jpg)
Example of MK test: ADH in Drosophila
Compare sequences of D. simulans and D. yakuba for ADH (alcohol dehydrogenase)

| | Fixed differences | Polymorphic sites |
|---|---|---|
| Replacement | 7 | 2 |
| Silent | 17 | 42 |
| % replacement | 7 / 24 = 29% | 2 / 44 = 5% |

Significance? Use χ2 test for independence
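As a worked sketch of that test, the code below computes the Pearson χ2 statistic for the 2 x 2 ADH table (7 and 17 fixed, 2 and 42 polymorphic) without a continuity correction, and converts it to a p-value for 1 degree of freedom. The function name is an assumption made for illustration. One of the expected counts falls below 5 here, so an exact test may be preferable in practice; the χ2 version is shown because it is the test the slide names.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the table [[a, b], [c, d]].

    Returns (chi2, p_value) with 1 degree of freedom, no continuity correction.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Rows: replacement, silent. Columns: fixed differences, polymorphic sites.
    chi2, p = chi_square_2x2(7, 2, 17, 42)
    print("chi2 =", round(chi2, 2), "p =", round(p, 4))
```

The statistic comes out near 8.2, well above the 3.84 critical value for 1 degree of freedom at the 0.05 level, so the excess of replacement changes among fixed differences is unlikely under the neutral null hypothesis.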
![Page 29: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/29.jpg)
Evidence of selection III: selective sweeps
• Imagine a new mutation that is strongly favored (e.g. insecticide resistance in mosquitoes)
![Page 30: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/30.jpg)
Detecting selection using linkage: G6PD in humans
Natural history:
• Located on X chromosome
• Encodes glucose-6-phosphate dehydrogenase
• Red blood cells lack mitochondria
• Glycolysis only
• NADPH only via pentose-phosphate shunt, which requires G6PD
• NADPH needed for glutathione, which protects against oxidation
![Page 31: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/31.jpg)
G6PD and malaria
• Malaria (Plasmodium falciparum) infects red blood cells
• Has limited G6PD function typically (but can produce the enzyme)
• Uses NADPH from red blood cell
• In G6PD deficient individuals?
![Page 32: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/32.jpg)
G6PD mutants
• Different mutants result in different levels of enzymatic activity
• Severe mutants result in destruction of red blood cells and anemia
• Most common mutant: G6PD-202A
• Usually mild effects: may increase risk of miscarriage
• Prediction: G6PD and malaria?
![Page 33: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/33.jpg)
Frequency of G6PD deficiency
![Page 34: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/34.jpg)
Has G6PD-202A been selected?
• 14 markers up to 413,000 bp from G6PD
• LD?
• Long distance LD implies strong, recent selection
![Page 35: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/35.jpg)
Has G6PD-202A been selected?
[Figure: linkage disequilibrium (y-axis) against kb from core region (x-axis)]
Fig 7.14
![Page 36: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/36.jpg)
Alternative hypothesis: drift caused linkage disequilibrium
[Figure: allele frequency, with G6PD-202A marked]
Figure 7.14b
![Page 37: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/37.jpg)
Detecting selection II: CCR5Δ32
![Page 38: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/38.jpg)
Detecting selection II: CCR5Δ32
• Stephens (1998) found strong disequilibrium between CCR5-Δ32 and nearby markers
• Implies recent origin (< 2000 years): recombination breaks down linkage
• Implies selected
![Page 39: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/39.jpg)
Detecting selection II: CCR5Δ32
• But: new data – November 2005.
• Better map:
![Page 40: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/40.jpg)
Detecting selection: summary
• Several approaches to detecting selection
- dN / dS
- McDonald-Kreitman test
- using hitchhiking
Challenges of each method?
![Page 41: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/41.jpg)
Other uses of molecular data: the coalescent
Any two alleles in a population share a common ancestor in the previous generation with probability 1 / 2Ne.
Therefore, going backwards in time, the expected time to find the common ancestor is 1 / (1 / 2Ne) = 2Ne generations.
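The waiting-time argument can be checked with a small simulation. In this sketch, each step backwards in time is a generation in which the two lineages coalesce with probability 1 / 2Ne, so the waiting time is geometric with mean 2Ne. The function names, the seed, and Ne = 50 are assumptions chosen only to keep the run short.

```python
import random


def coalescence_time(ne, rng):
    """Generations back until two lineages share an ancestor.

    Each generation they coalesce with probability 1 / (2 * ne).
    """
    p = 1 / (2 * ne)
    generations = 1
    while rng.random() >= p:        # no shared ancestor yet: go back one more
        generations += 1
    return generations


def mean_coalescence_time(ne, replicates, seed=0):
    """Average coalescence time over independent pairs of lineages."""
    rng = random.Random(seed)       # hypothetical fixed seed for repeatability
    total = sum(coalescence_time(ne, rng) for _ in range(replicates))
    return total / replicates


if __name__ == "__main__":
    ne = 50
    print("simulated mean:", round(mean_coalescence_time(ne, 2000), 1),
          "expected 2Ne:", 2 * ne)
```

Individual waiting times vary widely around the mean, which is one reason estimates based on a single pair of sequences are noisy.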
![Page 42: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/42.jpg)
Coalescent II
![Page 43: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/43.jpg)
Coalescent and sequences
Imagine that you have two sequences at a locus.
They shared a common ancestor 2Ne generations ago.
They accumulate mutations at rate u per generation per basepair.
2Ne generations per lineage * 2 lineages * u = 4Neu expected differences per basepair between the two sequences.
![Page 44: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/44.jpg)
Coalescent example
We sequence 1000 base pairs from two sequences and find 16 base pair differences. How large is the population?
Assume u = 2 x 10^-8.
4Neu * 1000 = 16
8 x 10^-5 * Ne = 16
Ne * 10^-5 = 2
Ne = 200,000
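The same arithmetic as a small function, for trying other values. This is a sketch under the assumptions of the example (two sequences, neutral sites, expected differences per base pair of 4Neu); the function name is hypothetical.

```python
def effective_size_from_differences(differences, sites, u):
    """Solve 4 * Ne * u * sites = differences for Ne.

    differences: observed pairwise differences between two sequences
    sites:       number of base pairs compared
    u:           mutation rate per base pair per generation
    """
    per_site_diversity = differences / sites
    return per_site_diversity / (4 * u)


if __name__ == "__main__":
    ne = effective_size_from_differences(differences=16, sites=1000, u=2e-8)
    print("Estimated Ne:", round(ne))
```

With 16 differences in 1000 base pairs and u = 2 x 10^-8, this reproduces the estimate of 200,000 worked out above.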
![Page 45: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/45.jpg)
Neutral theory as a null model
![Page 46: how do we explain the patterns of variation observed in DNA sequences?](https://reader033.fdocuments.in/reader033/viewer/2022050800/568167c9550346895ddd1762/html5/thumbnails/46.jpg)
Additional readings
Eyre-Walker (2006) The genomic rate of adaptive evolution. Trends in Ecology and Evolution 21:569-575. (Well-written review)
Gillespie (2004) Population genetics: a concise guide. Johns Hopkins: Baltimore, MD. (Very short, clear, but dense!)
Graur and Li (2000) Fundamentals of molecular evolution. Sinauer: Sunderland, MA. (Very clear)
Kimura (1962) On the probability of fixation of mutant genes in populations. Genetics 47:713-719. (If you really want the derivation)
Kimura and Ohta (1969) The average number of generations until fixation of a mutant gene in a finite population. Genetics 61:763-771. (If you really want the derivation)
Sabeti et al. (2005) The case for selection at CCR5-Δ32. PLoS Biology 3:1963-1969.
Questions: 1. Explain why clock-like rates of substitutions per year did not fit with the neutral theory.
See posted molecular evolution practice questions: highly recommended!