Spezielle Statistik in der Biomedizin
WS 2014/15
Introduction to Population Genetics
What is population genetics?
Describes the genetic structure
and variation of populations.
• Causes
• Maintenance
• Changes
Theorizes on the evolutionary
forces acting on populations.
• Drift
• Mutation
• Selection
• Migration
Phenotypic and Genotypic Variation
Phenotypic Variation
• Mendel 1866: paper on heredity; discrete phenotypic
variation.
• Galton (1822–1911): study of human hereditary
differences; continuous variation (e.g., height).
Genotypic Variation
• Classical Hypothesis: variation is due to mutation (↑)
vs. negative selection (↓).
• Balance Hypothesis: selection maintains variation
because it favors heterozygotes or rare genotypes.
• Neutral Theory: maybe much variation has only very
little effect?
Definitions
Locus: Place on a chromosome where an allele
resides.
Allele: The bit of DNA at that place. The same allele
can have different DNA sequences.
Site: A specific, unique position of a single
nucleotide in a genome.
Segregating (Polymorphic) Site: A site with different
nucleotides in independently sampled alleles (e.g.,
Single Nucleotide Polymorphism; SNP).
Silent (Synonymous) Polymorphism: Alternative
codons code for the same amino acid.
Replacement (Non-Synonymous) Polymorphism: A
nucleotide polymorphism that causes an amino acid
polymorphism.
Example: DNA Variation in Drosophila
The alcohol dehydrogenase (ADH) locus in D.
melanogaster has the typical exon-intron structure
of eukaryotic genes. Kreitman (1983)
analyzed 11 alleles from Florida, Washington, Africa,
Japan and France.
Exon Coding region.
Intron Non-coding region.
DNA for Coding Region at ADH locus
ADH Alleles for Different Dmel Populations
The Great Obsession of Population Genetics
What evolutionary forces could have led to such
divergence between individuals within the same
species?
Why do silent polymorphisms preponderate over
replacement polymorphisms?
Table: Summary of polymorphic sites within melanogaster and
fixed differences between melanogaster and erecta.
McDonald-Kreitman Test
Tests whether both silent and replacement variation are
neutral; follows the standard procedure for 2 × 2
contingency tables.
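A minimal sketch of the 2 × 2 procedure in R. The counts below are invented purely to show the layout (they are not the values from the table on the slide), and Fisher's exact test is used here as one common choice for 2 × 2 tables; a chi-square test would be another.

```r
# Hypothetical counts, for illustration only.
mk_counts <- matrix(
  c(3,  9,    # replacement: polymorphic, fixed
    30, 15),  # silent:      polymorphic, fixed
  nrow = 2, byrow = TRUE,
  dimnames = list(
    type  = c("replacement", "silent"),
    class = c("polymorphic", "fixed")
  )
)

# Under neutrality of both classes, the ratio of replacement to silent
# changes should be the same for polymorphic and fixed sites.
mk_test <- fisher.test(mk_counts)

print(mk_counts)
print(mk_test$p.value)
```

A small p-value would suggest that the two ratios differ, i.e. that silent and replacement variation are not both neutral.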
Segregating Sites and Nucleotide Mismatches
There are different ways to quantify the amount of
nucleotide mismatches in sequence data.
Segregating sites: S = number of nucleotide sites that
differ among the aligned sequences.
Average number of nucleotide mismatches:
π = (total number of nucleotide mismatches) / (total number of pairwise comparisons)
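Both quantities can be computed directly from an alignment. The following sketch uses four made-up sequences; the names `seqs`, `aln` and `pi_hat` are illustrative and not part of the course material.

```r
# Hypothetical aligned sequences, one string per sampled allele.
seqs <- c("ACGTACGT",
          "ACGTACGA",
          "ACTTACGA",
          "ACGTACGT")

# Character matrix: rows = sequences, columns = sites.
aln <- do.call(rbind, strsplit(seqs, ""))

# S: number of sites with more than one distinct nucleotide.
S <- sum(apply(aln, 2, function(site) length(unique(site)) > 1))

# Number of mismatches for every pair of sequences.
pairs <- combn(nrow(aln), 2)
mismatches <- apply(pairs, 2, function(ij) sum(aln[ij[1], ] != aln[ij[2], ]))

# Average number of mismatches per pairwise comparison.
pi_hat <- sum(mismatches) / ncol(pairs)

print(S)       # 2 segregating sites
print(pi_hat)  # 7 mismatches over 6 comparisons
```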
The Parameter θ
The parameter θ = 4Nu (N is the population size, u is
the mutation rate) determines the level of
variation under the neutral model.
The estimates of both S and π (the average number of
nucleotide mismatches) can be related to θ.
Testing for equality between these two estimates is
one of many ways to detect departures from
neutrality.
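As a rough sketch of how the two estimates are obtained: under the standard neutral model the expected value of π is θ, and the expected value of S is θ multiplied by the sum 1 + 1/2 + ... + 1/(n−1) for a sample of n sequences (Watterson's estimator). The input values below are hypothetical.

```r
n      <- 4      # number of sampled sequences (hypothetical)
S      <- 2      # number of segregating sites (hypothetical)
pi_hat <- 7 / 6  # average number of pairwise mismatches (hypothetical)

# Watterson's estimator: theta from the number of segregating sites.
a_n     <- sum(1 / seq_len(n - 1))
theta_S <- S / a_n

# The average number of pairwise mismatches estimates theta directly.
theta_pi <- pi_hat

# Under neutrality the two estimates should agree on average.
difference <- theta_pi - theta_S
print(c(theta_S = theta_S, theta_pi = theta_pi, difference = difference))
```

A formal test divides this difference by an estimate of its standard deviation (Tajima's D); that normalisation is omitted here.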
Genotype Frequency and Allele Frequency
To introduce the notion of genotype and allele
frequencies, we will not refer to a particular sample,
but rather to one locus that has two alleles A and a,
segregating in the population.
Genotype frequency
Genotype             AA     Aa     aa
Relative frequency   x11    x12    x22
The frequency of the A allele in the population is
p = x11 + ½ x12
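The same calculation on a sample, as a small sketch; the genotype counts are made up for illustration.

```r
# Hypothetical genotype counts in a sample of 100 individuals.
genotype_counts <- c(AA = 30, Aa = 50, aa = 20)

# Relative genotype frequencies x11, x12, x22.
x <- genotype_counts / sum(genotype_counts)

# Each AA carries two A alleles, each Aa carries one.
p <- x[["AA"]] + 0.5 * x[["Aa"]]
q <- 1 - p  # frequency of the a allele

print(c(p = p, q = q))
```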
Hardy-Weinberg Law
Relates the allele frequencies to the genotype
frequencies at an autosomal locus in an equilibrium
randomly mating population. More assumptions:
• nonoverlapping generations
• sexual reproduction
• diploid
• diallelic
• P(male) = P(female)
• infinite population size
• no selection
• no mutation
• no migration
The Hardy-Weinberg genotype frequencies (p² for AA,
2p(1−p) for Aa and (1−p)² for aa) are reached after one
generation of random mating and remain unchanged in all
generations after the first. This is a statement we can
test against (a null model)!
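One way to test against this null model is a chi-square goodness-of-fit test on observed genotype counts. The sketch below assumes a diallelic locus and uses invented counts.

```r
# Hypothetical observed genotype counts.
observed <- c(AA = 30, Aa = 50, aa = 20)
n <- sum(observed)

# Allele frequency estimated from the sample.
p <- (2 * observed[["AA"]] + observed[["Aa"]]) / (2 * n)

# Expected counts under Hardy-Weinberg proportions.
expected <- n * c(AA = p^2, Aa = 2 * p * (1 - p), aa = (1 - p)^2)

# Chi-square statistic; 3 classes - 1 - 1 estimated parameter = 1 df.
chi_sq  <- sum((observed - expected)^2 / expected)
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)

print(round(expected, 2))
print(c(chi_sq = chi_sq, p_value = p_value))
```

With these counts the statistic is close to zero, so the sample gives no reason to reject Hardy-Weinberg proportions.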
Problem
There are two islands of the same size, one inhabited
with people with 5 fingers, and one with people with 6
fingers. The islands, through some geological
cataclysm, crash into one another. The first generation
hybrids between the inhabitants of the two islands
have 6 fingers. How will the frequency of the six-finger
allele change over time?
(Thanks to Andrea Betancourt for the problem set)
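A possible way to explore the problem numerically, assuming a single locus, that the six-finger allele A is dominant (because the hybrids have six fingers), and that the merged population meets the Hardy-Weinberg assumptions. These are assumptions of this sketch, not statements from the problem set.

```r
# Generation 0: islands of equal size, so the merged population is
# half AA (six fingers) and half aa (five fingers).
x <- c(AA = 0.5, Aa = 0, aa = 0.5)

for (generation in 1:4) {
  p <- x[["AA"]] + 0.5 * x[["Aa"]]                         # f(A) among gametes
  x <- c(AA = p^2, Aa = 2 * p * (1 - p), aa = (1 - p)^2)   # random mating
  six_fingered <- x[["AA"]] + x[["Aa"]]                    # dominant phenotype
  cat(sprintf("generation %d: f(A) = %.2f, six-fingered = %.2f\n",
              generation, p, six_fingered))
}
```

Under these assumptions the allele frequency does not change; only the genotype frequencies move to Hardy-Weinberg proportions in the first generation of random mating.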
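[Figure: before the collision, one island carries only aa genotypes (f(a) = 1) and the other only AA genotypes (f(A) = 1); after the collision, the gametes A and a form a single pool with f(A) = p and f(a) = 1 − p.]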
R exercise
> source("~/Desktop/exercises.R")
> run_simulation()
Population size ? 10
Initial frequency of red ? 0.5
(Hit q to quit)
Genetic Drift
In finite populations, random changes in allele frequencies
result from variation in the number of offspring between
individuals.
• Genetic drift causes random changes in allele
frequencies.
• Hence, alleles can be lost from the population (genetic
variation is removed).
• The random changes have no preferred direction: the
expected change in allele frequency is zero.
Wright-Fisher model; binomial sampling
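A minimal sketch of Wright-Fisher binomial sampling. The function `wf_drift` is hypothetical and is not the `run_simulation()` from the exercise file; it assumes a diploid population, so each generation draws 2N allele copies.

```r
# Simulate allele frequency under drift alone.
# N: number of diploid individuals; p0: initial frequency.
wf_drift <- function(N, p0, generations) {
  p <- numeric(generations + 1)
  p[1] <- p0
  for (t in seq_len(generations)) {
    # The next generation is a binomial sample of 2N allele copies.
    copies <- rbinom(1, size = 2 * N, prob = p[t])
    p[t + 1] <- copies / (2 * N)
  }
  p
}

set.seed(1)  # for reproducibility only
trajectory <- wf_drift(N = 10, p0 = 0.5, generations = 50)
plot(trajectory, type = "l", ylim = c(0, 1),
     xlab = "generation", ylab = "allele frequency")
```

Once the frequency hits 0 or 1 it stays there, which is how drift removes variation.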
Genetic Drift
> run_many_simulations()
Population size ? 100
Initial frequency of red ? 0.5
How many runs to simulate ? 100
• What effect does changing the population
size have?
• What about changing the initial frequency?
Genetic Drift
• What is the expected change in frequency
under this model?
• Variance?
Genetic drift
(Variance effective size of Wright 1931)
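Under binomial sampling of 2N allele copies, the expected change in frequency is 0 and its variance is p(1−p)/(2N). The sketch below checks both values by simulation; the parameter values are arbitrary.

```r
N <- 100   # diploid population size (arbitrary)
p <- 0.5   # current allele frequency (arbitrary)

set.seed(42)  # for reproducibility only
# Many independent single-generation samples from the same starting point.
p_next  <- rbinom(1e5, size = 2 * N, prob = p) / (2 * N)
delta_p <- p_next - p

mean_dp    <- mean(delta_p)          # should be close to 0
var_dp     <- var(delta_p)           # should be close to var_theory
var_theory <- p * (1 - p) / (2 * N)

print(c(mean = mean_dp, variance = var_dp, theory = var_theory))
```

Replacing N by the variance effective size Ne in the formula gives the amount of drift in a population that is not an ideal Wright-Fisher population.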
Genetic Drift
Usually, Ne < N
- Separate sexes
- Variance in offspring number
- Bottlenecks in population size
Wright 1931; Kimura 1983
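Two of these effects have simple textbook formulas, sketched here with hypothetical helper names: with separate sexes Ne = 4·Nm·Nf / (Nm + Nf), and with fluctuating size Ne is the harmonic mean of the per-generation sizes.

```r
# Effective size with unequal numbers of breeding males and females.
ne_separate_sexes <- function(n_males, n_females) {
  4 * n_males * n_females / (n_males + n_females)
}

# Effective size over several generations: harmonic mean of the sizes,
# which is dominated by the smallest (bottleneck) generation.
ne_fluctuating <- function(sizes) {
  length(sizes) / sum(1 / sizes)
}

ne_sexes      <- ne_separate_sexes(n_males = 10, n_females = 90)
ne_bottleneck <- ne_fluctuating(c(1000, 1000, 10, 1000))

print(ne_sexes)       # census size 100, effective size 36
print(ne_bottleneck)  # about 38.8, far below the arithmetic mean
```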
Selection
Individuals with different genotypes may leave different
numbers of offspring (on average). Given the fitness
scheme below, what is the expected change in
frequency for the A allele?
Genotype                       AA          Aa              aa
Frequency                      p²          2p(1−p)         (1−p)²
Relative number of offspring   w11         w12             w22
Frequency in the offspring     p²w11/w̄     2p(1−p)w12/w̄    (1−p)²w22/w̄
where w̄ = p²w11 + 2p(1−p)w12 + (1−p)²w22 is the mean fitness.
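The frequency of A after selection is the AA frequency in the offspring plus half the Aa frequency. A small sketch, with `next_p` as a hypothetical helper name:

```r
# Expected frequency of A after one generation of selection.
next_p <- function(p, w11, w12, w22) {
  # Mean fitness of the population.
  w_bar <- p^2 * w11 + 2 * p * (1 - p) * w12 + (1 - p)^2 * w22
  # All alleles of AA offspring plus half the alleles of Aa offspring.
  (p^2 * w11 + p * (1 - p) * w12) / w_bar
}

p       <- 0.2
p_prime <- next_p(p, w11 = 1.2, w12 = 1.1, w22 = 1)
delta_p <- p_prime - p

print(c(p_prime = p_prime, delta_p = delta_p))
```

With equal fitnesses the function returns p unchanged, as expected in the absence of selection.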
Selection
Individuals with different genotypes may leave different
numbers of offspring (on average).
Genotype                       AA           Aa            aa
Frequency                      p²           2p(1−p)       (1−p)²
Relative number of offspring   w11 = 1+s    w12 = 1+hs    w22 = 1
Selection
Selection as a Wright-Fisher process:
Genotype                       AA      Aa         aa
Frequency                      p²      2p(1−p)    (1−p)²
Relative number of offspring   1+2s    1+s        1
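A sketch of how selection can be combined with Wright-Fisher sampling: first apply the deterministic change due to selection, then draw the next generation binomially from the adjusted frequency. The function `wf_selection` is hypothetical and is not the code behind `run_many_simulations()`.

```r
# Wright-Fisher model with fitnesses 1+2s, 1+s, 1 for AA, Aa, aa.
wf_selection <- function(N, p0, s, generations) {
  p <- numeric(generations + 1)
  p[1] <- p0
  for (t in seq_len(generations)) {
    w_bar <- 1 + 2 * s * p[t]                           # mean fitness
    p_sel <- p[t] + p[t] * (1 - p[t]) * s / w_bar       # after selection
    p[t + 1] <- rbinom(1, size = 2 * N, prob = p_sel) / (2 * N)  # drift
  }
  p
}

set.seed(3)  # for reproducibility only
trajectory <- wf_selection(N = 100, p0 = 0.2, s = 0.1, generations = 100)
plot(trajectory, type = "l", ylim = c(0, 1),
     xlab = "generation", ylab = "allele frequency")
```

Setting `s = 0` reduces this to drift alone.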
Selection
> run_many_simulations(s=0.1)
Population size ? 100
Initial frequency of red ? 0.2
How many runs to simulate ? 100
• What effect does changing N have?
  • Try 10, 100, 1000 with s = 0.01 and p = 0.1.
• What effect does changing the initial frequency have?
  • Try 0.01, 0.1, 0.5 with N = 100 and s = 0.01.
[Figure panels: N = 10, N = 100]
[Figure panels: p0 = 0.01, p0 = 0.1, p0 = 0.5]
Divergence between populations
Run simulations with genetic drift alone
• What effect does changing N have?
  • Try, e.g., 10, 100, 1000. Note that the scale will change
  on the x-axis. To plot in a new window, type 'quartz()'
  (macOS; dev.new() works on other systems).
• What effect does changing the initial frequency have?
  • Try, e.g., 0.01, 0.1, 0.5.
[Figure panels: Ne = 10, Ne = 100, Ne = 1000]
[Figure panels: p0 = 0.01, p0 = 0.1, p0 = 0.5]
Run simulations with selection
Two scenarios:
1) selection in opposite directions
> run_many_simulations(s=c(-0.1, 0.1))
2) selected vs. control population
> run_many_simulations(s=c(0.1, 0))
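The first scenario could be sketched as follows. The function `simulate_pair` is hypothetical (the internals of `run_many_simulations()` are not shown in the slides); it assumes two independent populations with the same N and starting frequency, and fitnesses 1+2s, 1+s, 1.

```r
# Two independent Wright-Fisher populations, one selection coefficient each.
simulate_pair <- function(N, p0, s_pair, generations) {
  freqs <- matrix(p0, nrow = generations + 1, ncol = 2)
  for (t in seq_len(generations)) {
    for (k in 1:2) {
      p <- freqs[t, k]
      s <- s_pair[k]
      p_sel <- p + p * (1 - p) * s / (1 + 2 * s * p)   # selection
      freqs[t + 1, k] <- rbinom(1, 2 * N, p_sel) / (2 * N)  # drift
    }
  }
  freqs
}

set.seed(7)  # for reproducibility only
pair <- simulate_pair(N = 100, p0 = 0.1, s_pair = c(-0.1, 0.1),
                      generations = 100)

# Divergence: absolute difference in allele frequency per generation.
divergence <- abs(pair[, 1] - pair[, 2])
matplot(pair, type = "l", lty = 1, ylim = c(0, 1),
        xlab = "generation", ylab = "allele frequency")
```

Passing `s_pair = c(0.1, 0)` gives the selected vs. control scenario, and `s_pair = c(0, 0)` gives divergence by drift alone.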
[Figure panels: selected vs. control; selected in opposite directions]
[Figure: selected in opposite directions, s = ±0.01, N = 100, p0 = 0.1]
Selection vs. Drift
In a population of size 200, an allele changes
frequency from 0.2 to 0.22 in a single generation.
- What is the probability of this occurring if the
allele is neutral?
- What if the allele is favorable, with a heterozygous
s of 0.1?
Selection vs. Drift
Neutral:
• Binomial probability with 2N = 400, k = 0.22 × 400 = 88, and p = 0.2.
> dbinom(x=88, size=400, prob=0.2)
[1] 0.02952489
Selected for, s = 0.1:
• Binomial probability with 2N = 400, k = 88, and
p' = p + p(1−p)s / mean(w) = 0.215.
> dbinom(x=88, size=400, prob=0.215)
[1] 0.04670787
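The same calculation written out step by step, as a sketch. It assumes the fitness scheme 1+2s, 1+s, 1, so that mean(w) = 1 + 2sp; the variable names are illustrative.

```r
N <- 200    # diploid population size
p <- 0.2    # starting allele frequency
k <- 88     # allele copies in the next generation (0.22 * 400)
s <- 0.1    # heterozygous selection coefficient

# Expected frequency after selection, before sampling.
w_bar <- 1 + 2 * s * p
p_sel <- p + p * (1 - p) * s / w_bar

# Probability of observing k copies under each model.
prob_neutral  <- dbinom(k, size = 2 * N, prob = p)
prob_selected <- dbinom(k, size = 2 * N, prob = p_sel)

print(c(p_sel = p_sel, neutral = prob_neutral, selected = prob_selected))
```

The selected-model value differs slightly from the one above because p' is not rounded to 0.215 here.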
Selection vs. Drift
Suppose in the same population, the allele
changes from 0.22 to 0.25 (k = 100). What is
the probability of the whole sequence for
both models?
- neutral?
> dbinom(x=88, size=400, prob=0.2) * dbinom(x=100, size=400, prob=0.22)
[1] 0.0004914107
- s = 0.1? (p' = 0.236)
> dbinom(x=88, size=400, prob=0.215) * dbinom(x=100, size=400, prob=0.236)
[1] 0.001734624
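The product over generations generalises to a trajectory of any length. The helper `trajectory_prob` below is a hypothetical sketch that assumes fitnesses 1+2s, 1+s, 1 and treats each generation as an independent binomial draw given the previous frequency.

```r
# Probability of a sequence of allele counts under Wright-Fisher sampling.
# counts: observed allele copies in successive generations.
trajectory_prob <- function(counts, p0, N, s = 0) {
  p    <- p0
  prob <- 1
  for (k in counts) {
    p_exp <- p + p * (1 - p) * s / (1 + 2 * s * p)   # expected after selection
    prob  <- prob * dbinom(k, size = 2 * N, prob = p_exp)
    p     <- k / (2 * N)                              # observed frequency
  }
  prob
}

neutral  <- trajectory_prob(c(88, 100), p0 = 0.2, N = 200)
selected <- trajectory_prob(c(88, 100), p0 = 0.2, N = 200, s = 0.1)

# How much better the selection model explains the observations.
likelihood_ratio <- selected / neutral
print(c(neutral = neutral, selected = selected, ratio = likelihood_ratio))
```

The selected-model value differs slightly from the one above because p' is not rounded at each step.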
Summary
Population genetics is concerned with the genetic
basis of evolution.
In a population geneticist's world, evolution is the
change in the frequencies of genotypes through
time.
Probabilistic models of evolution are constructed
and checked for compatibility with real
data.
Outlook
Some further topics not (yet) discussed here
Demographic inference
Migration
Recombination
Epistasis . . .