Introduction to Population Geneticsi122server.vu-wien.ac.at/pop/Kosiol_website/... · Population...

Spezielle Statistik in der Biomedizin WS 2014/15 Introduction to Population Genetics

Spezielle Statistik in der Biomedizin

WS 2014/15

Introduction to Population Genetics

What is population genetics?

Describes the genetic structure and variation of populations.

• Causes

• Maintenance

• Changes

Theorizes on the evolutionary forces acting on populations.

• Drift

• Mutation

• Selection

• Migration

Phenotypic and Genotypic Variation

Phenotypic Variation

• Mendel (1866): paper on heredity; discrete phenotypic variation.

• Galton (1822-1911): study of human hereditary differences; continuous variation (e.g., height).

Genotypic Variation

• Classical Hypothesis: variation is due to mutation (↑) vs. negative selection (↓).

• Balance Hypothesis: selection maintains variation because it favors heterozygotes or rare genotypes.

• Neutral Theory: maybe much of the variation has very little effect?

Definitions

Locus: Place on a chromosome where an allele resides.

Allele: The bit of DNA at that place. The same allele can have different DNA sequences.

Site: A specific, unique position of a single nucleotide in a genome.

Segregating (Polymorphic) Site: A site with different nucleotides in independently sampled alleles (e.g., Single Nucleotide Polymorphism; SNP).

Silent (Synonymous) Polymorphism: Alternative codons code for the same amino acid.

Replacement (Non-Synonymous) Polymorphism: A nucleotide polymorphism that causes an amino acid polymorphism.

Example: DNA Variation in Drosophila

The alcohol dehydrogenase (ADH) locus in D. melanogaster has the typical exon-intron structure of eukaryotic genes. Kreitman & Gillespie (1983) analyzed 11 alleles from Florida, Washington, Africa, Japan and France.

Exon: Coding region.

Intron: Non-coding region.

DNA for Coding Region at ADH locus

ADH Alleles for Different Dmel Populations

The Great Obsession of Population Genetics

What evolutionary forces could have led to such divergence between individuals within the same species?

Why do silent polymorphisms preponderate over replacement polymorphisms?

Table: Summary of polymorphic sites within melanogaster and fixed differences between melanogaster and erecta.

McDonald-Kreitman Test

Tests whether both silent and replacement variation are neutral; follows the standard procedure for 2 × 2 contingency tables.
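A minimal sketch of such a 2 × 2 test in R, assuming Fisher's exact test as the "standard procedure" (a chi-square or G-test would be set up the same way). The counts are hypothetical placeholders, not the ADH values from the table on the previous slide.

```r
# Hypothetical counts for illustration only (not the ADH data).
mk_counts <- matrix(
  c(2, 40,    # polymorphic within species: replacement, silent
    8, 18),   # fixed between species:      replacement, silent
  nrow = 2, byrow = TRUE,
  dimnames = list(class = c("polymorphic", "fixed"),
                  type  = c("replacement", "silent"))
)

# If both kinds of variation are neutral, the replacement:silent ratio
# should be the same for polymorphic sites and for fixed differences.
mk_test <- fisher.test(mk_counts)

print(mk_counts)
print(mk_test$p.value)
```

A small p-value indicates that the two ratios differ, i.e. that at least one class of variation is unlikely to be neutral.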

Segregating Sites and Nucleotide Mismatches

There are different ways to quantify the amount of nucleotide mismatches in sequence data.

Segregating sites: S = number of nucleotide sites that differ among the aligned sequences.

Average number of nucleotide mismatches = (total number of nucleotide mismatches) / (total number of pairwise comparisons).
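Both quantities can be computed directly from an alignment. The sketch below uses a made-up toy alignment of four sequences; the variable names are chosen for illustration only.

```r
# Toy alignment (hypothetical data): four sequences, eight sites.
seqs <- c("ACGTACGT",
          "ACGTACGA",
          "ACCTACGA",
          "ACGTACGT")
aln <- do.call(rbind, strsplit(seqs, ""))  # rows = sequences, columns = sites

# Segregating sites: columns that contain more than one nucleotide.
S <- sum(apply(aln, 2, function(site) length(unique(site)) > 1))

# Number of mismatches for every pair of sequences.
pairs <- combn(nrow(aln), 2)
mismatches <- apply(pairs, 2, function(ij) sum(aln[ij[1], ] != aln[ij[2], ]))

# Average over all pairwise comparisons.
avg_mismatch <- sum(mismatches) / ncol(pairs)

print(S)             # 2
print(avg_mismatch)  # 7/6, about 1.17
```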

The Parameter θ

The parameter θ = 4Nu (N is the population size, u is the mutation rate) determines the level of variation under the neutral model.

Estimates of both S and the average number of nucleotide mismatches can be related to θ.

Testing for equality between these two estimates is one of many ways to detect departures from neutrality.
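The slide does not preserve the formulas. The sketch below assumes the standard neutral (infinite-sites) results: the expected number of segregating sites is θ times the harmonic number 1 + 1/2 + ... + 1/(n-1), and the expected average number of pairwise mismatches is θ itself. The summary statistics are made-up values.

```r
# Hypothetical summary statistics for a sample of n alleles.
n <- 11              # number of sampled alleles
S <- 14              # segregating sites (made-up value)
avg_mismatch <- 5.2  # average number of pairwise mismatches (made-up value)

a_n <- sum(1 / seq_len(n - 1))  # 1 + 1/2 + ... + 1/(n-1)

theta_S  <- S / a_n        # estimate of theta from segregating sites
theta_pi <- avg_mismatch   # estimate of theta from pairwise mismatches

# Under neutrality both estimate the same theta, so this should be near 0.
theta_diff <- theta_pi - theta_S
print(c(theta_S = theta_S, theta_pi = theta_pi, diff = theta_diff))
```

Turning the difference into a formal test requires its sampling variance, which is not covered here.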

Genotype Frequency and Allele Frequency

To introduce the notion of genotype and allele frequencies, we will not refer to a particular sample, but rather to one locus that has two alleles A and a, segregating in the population.

Genotype frequency

Genotype            AA    Aa    aa
Relative frequency  x11   x12   x22

The frequency of the A allele in the population is

p = x11 + ½ x12
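As a small worked example, assuming hypothetical genotype counts, the allele frequency follows from the relative genotype frequencies:

```r
# Hypothetical genotype counts for a diallelic locus.
counts <- c(AA = 30, Aa = 50, aa = 20)
x <- counts / sum(counts)              # relative frequencies x11, x12, x22

p <- unname(x["AA"] + 0.5 * x["Aa"])   # frequency of allele A
q <- 1 - p                             # frequency of allele a

print(c(p = p, q = q))                 # p = 0.55, q = 0.45
```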

Hardy-Weinberg Law

Relates the allele frequencies to the genotype frequencies at an autosomal locus in an equilibrium randomly mating population. More assumptions:

• nonoverlapping generations

• sexual reproduction

• diploid

• diallelic

• P(male) = P(female)

• infinite population size

• no selection

• no mutation

• no migration

Hardy-Weinberg Law

The Hardy-Weinberg genotype frequencies (p^2, 2p(1-p) and (1-p)^2 for AA, Aa and aa) will remain unchanged in all generations after the first. This gives us a null model that we can test against!
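One common way to test against this null model is a chi-square goodness-of-fit test. The sketch below assumes the Hardy-Weinberg expectations p^2, 2p(1-p), (1-p)^2 and uses hypothetical genotype counts.

```r
# Hypothetical observed genotype counts (AA, Aa, aa).
observed <- c(AA = 30, Aa = 50, aa = 20)
n <- sum(observed)

# Allele frequency of A estimated from the sample.
p <- unname((2 * observed["AA"] + observed["Aa"]) / (2 * n))

# Expected counts under Hardy-Weinberg: p^2, 2p(1-p), (1-p)^2.
expected <- n * c(AA = p^2, Aa = 2 * p * (1 - p), aa = (1 - p)^2)

# Chi-square goodness of fit with 1 degree of freedom
# (3 genotype classes - 1 - 1 estimated allele frequency).
chi_sq  <- sum((observed - expected)^2 / expected)
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)

print(round(expected, 2))
print(c(chi_sq = chi_sq, p_value = p_value))
```

With these counts the deviation from Hardy-Weinberg proportions is tiny, so the null model is not rejected.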

Problem

There are two islands of the same size, one inhabited by people with 5 fingers, and one by people with 6 fingers. The islands, through some geological cataclysm, crash into one another. The first-generation hybrids between the inhabitants of the two islands have 6 fingers. How will the frequency of the six-finger allele change over time?

(Thanks to Andrea Betancourt for the problem set)

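Problem

[Diagram: before the collision, one island carries only aa genotypes (f(a) = 1) and the other only AA genotypes (f(A) = 1). After the collision, the gametes A and a form a single pool with f(A) = p and f(a) = 1 - p.]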
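A sketch of one way to reason about the problem, assuming random mating and the other Hardy-Weinberg assumptions. A denotes the six-finger allele, taken to be dominant because the first-generation hybrids have six fingers; with islands of equal size, f(A) = 0.5 after the collision.

```r
# Sketch under Hardy-Weinberg assumptions; A = six-finger allele (assumed dominant).
p <- 0.5            # islands of equal size: f(A) = 0.5 in the merged population
generations <- 5

for (g in seq_len(generations)) {
  # Genotype frequencies after one round of random mating.
  geno <- c(AA = p^2, Aa = 2 * p * (1 - p), aa = (1 - p)^2)
  p_next <- unname(geno["AA"] + 0.5 * geno["Aa"])   # allele frequency in the offspring
  six_fingered <- unname(geno["AA"] + geno["Aa"])   # frequency of the six-finger phenotype
  cat(sprintf("generation %d: f(A) = %.2f, six-fingered = %.2f\n",
              g, p_next, six_fingered))
  p <- p_next
}
```

Under these assumptions the allele frequency stays at 0.5 in every generation, while the six-finger phenotype settles at 0.75 after the first round of random mating.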

R exercise

> source("~/Desktop/exercises.R")

> run_simulation()

Population size ? 10

Initial frequency of red ? 0.5

(Hit q to quit)

Genetic Drift

In finite populations, random changes in allele frequencies result from variation in the number of offspring between individuals.

• Genetic drift causes random changes in allele frequencies.

• Hence, alleles can be lost from the population (genetic variation is removed).

• The direction of the random changes is neutral: on average, the frequency neither increases nor decreases.

Wright-Fisher model; binomial sampling
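The binomial sampling step can be sketched in a few lines of R. The function `simulate_drift` is a hypothetical helper written for this illustration; it is not the code in the course's exercises.R.

```r
# Hypothetical helper (not from exercises.R): Wright-Fisher drift at one locus.
simulate_drift <- function(N, p0, generations) {
  p <- numeric(generations + 1)
  p[1] <- p0
  for (t in seq_len(generations)) {
    # Each of the 2N gene copies in the next generation is drawn
    # independently from the current generation (binomial sampling).
    p[t + 1] <- rbinom(1, size = 2 * N, prob = p[t]) / (2 * N)
  }
  p
}

set.seed(1)  # fixed seed so the run is reproducible
traj <- simulate_drift(N = 10, p0 = 0.5, generations = 50)
print(traj)
```

Calling `plot(traj, type = "l", ylim = c(0, 1))` shows the trajectory; once the frequency hits 0 or 1 it stays there, because the allele is lost or fixed.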

Genetic Drift

> run_many_simulations()

Population size ? 100

Initial frequency of red ? 0.5

How many runs to simulate ? 100

• What effect does changing the population size have?

• What effect does changing the initial frequency have?

Genetic Drift

• What is the expected change in frequency under this model?

• What is its variance?
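The slide's own derivation is not preserved in this transcript. As a sketch of the standard answer, assume that the 2N gene copies of the next generation are sampled binomially with the current frequency p:

```latex
\begin{align*}
X &\sim \mathrm{Binomial}(2N,\, p), \qquad p' = \frac{X}{2N} \\
\mathbb{E}[\Delta p] &= \mathbb{E}[p'] - p = \frac{2Np}{2N} - p = 0 \\
\mathrm{Var}[\Delta p] &= \mathrm{Var}[p'] = \frac{2N\,p(1-p)}{(2N)^2} = \frac{p(1-p)}{2N}
\end{align*}
```

So drift does not change the frequency on average, and its variance shrinks as the population size grows.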

Genetic drift

(Variance effective size of Wright 1931)

Genetic Drift

Usually, Ne < N

- Separate sexes

- Variance in offspring number

- Bottlenecks in population size

Wright 1931; Kimura 1983
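Two of these effects have simple textbook approximations, sketched below. The formulas are the standard ones (4·Nm·Nf/(Nm+Nf) for separate sexes, the harmonic mean for fluctuating population size); they are assumed here, since the slide itself does not state them, and the function names are chosen for illustration.

```r
# Separate sexes: Ne = 4 * Nm * Nf / (Nm + Nf).
ne_sex_ratio <- function(n_males, n_females) {
  4 * n_males * n_females / (n_males + n_females)
}

# Fluctuating population size: Ne is approximately the harmonic mean
# of the population sizes over the generations considered.
ne_harmonic <- function(sizes) {
  length(sizes) / sum(1 / sizes)
}

print(ne_sex_ratio(10, 90))                  # census size 100, skewed sex ratio
print(ne_harmonic(c(1000, 1000, 10, 1000)))  # a one-generation bottleneck
```

In both examples Ne is well below the census size: 36 instead of 100 in the first case, and about 39 instead of roughly 1000 in the second.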

Selection

Individuals with different genotypes may leave different numbers of offspring (on average). Given the fitness scheme below, what is the expected change in frequency for the A allele?

Genotype                      AA             Aa                  aa
Frequency                     p^2            2p(1-p)             (1-p)^2
Relative number of offspring  w11            w12                 w22
Frequency in the offspring    p^2 w11 / w̄    2p(1-p) w12 / w̄     (1-p)^2 w22 / w̄

(w̄ is the mean fitness.)
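The expected change follows from the last row of the table: the new frequency of A is the AA offspring frequency plus half of the Aa offspring frequency. A minimal sketch, assuming w̄ = p^2 w11 + 2p(1-p) w12 + (1-p)^2 w22 and a hypothetical function name `next_freq`:

```r
# One generation of selection at a diallelic locus (deterministic, no drift).
next_freq <- function(p, w11, w12, w22) {
  q <- 1 - p
  w_bar <- p^2 * w11 + 2 * p * q * w12 + q^2 * w22   # mean fitness
  # A is carried by all AA offspring and by half of the Aa offspring.
  (p^2 * w11 + p * q * w12) / w_bar
}

p <- 0.2
delta_p <- next_freq(p, w11 = 1.2, w12 = 1.1, w22 = 1) - p
print(delta_p)   # about 0.0154
```

With equal fitnesses the function returns p unchanged, which is the Hardy-Weinberg case.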

Selection

Individuals with different genotypes may leave different numbers of offspring (on average).

Genotype                      AA        Aa         aa
Frequency                     p^2       2p(1-p)    (1-p)^2
Relative number of offspring  w11       w12        w22
                              = 1+s     = 1+hs     = 1

Selection

Selection as a Wright-Fisher process:

Genotype                      AA       Aa         aa
Frequency                     p^2      2p(1-p)    (1-p)^2
Relative number of offspring  1+2s     1+s        1
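A sketch of how selection and drift can be combined in one Wright-Fisher step: first compute the expected frequency after selection with fitnesses 1+2s, 1+s, 1, then sample 2N gene copies binomially. `simulate_selection` is a hypothetical helper, not the course's run_many_simulations().

```r
# Hypothetical helper (not from exercises.R): Wright-Fisher model with selection.
simulate_selection <- function(N, p0, s, generations) {
  p <- numeric(generations + 1)
  p[1] <- p0
  for (t in seq_len(generations)) {
    q <- 1 - p[t]
    # Mean fitness for genotype fitnesses 1+2s (AA), 1+s (Aa), 1 (aa).
    w_bar <- p[t]^2 * (1 + 2 * s) + 2 * p[t] * q * (1 + s) + q^2
    # Expected frequency of A after selection ...
    p_sel <- (p[t]^2 * (1 + 2 * s) + p[t] * q * (1 + s)) / w_bar
    # ... followed by binomial sampling of 2N gene copies (drift).
    p[t + 1] <- rbinom(1, size = 2 * N, prob = p_sel) / (2 * N)
  }
  p
}

set.seed(42)  # fixed seed so the run is reproducible
traj <- simulate_selection(N = 100, p0 = 0.2, s = 0.1, generations = 100)
print(tail(traj, 1))
```

Setting s = 0 reduces this to pure drift.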

Selection

> run_many_simulations(s=0.1)

Population size ? 100

Initial frequency of red ? 0.2

How many runs to simulate ? 100

• What effect does changing N have?

• Try 10, 100, 1000 with s=0.01 and p=0.1.

• What effect does changing the initial frequency have?

• Try 0.01, 0.1, 0.5 with N=100, s=0.01.

Selection

[Figure panels: N = 10; N = 100]

Selection

[Figure panels: p0 = 0.01; p0 = 0.1; p0 = 0.5]

Divergence between populations

Run simulations with genetic drift alone:

• What effect does changing N have?

• Try, e.g., 10, 100, 1000. Note that the scale on the x-axis will change. To plot in a new window, type 'quartz()' (macOS; 'dev.new()' works on any platform).

• What effect does changing the initial frequency have?

• Try, e.g., 0.01, 0.1, 0.5.

Divergence between populations

[Figure panels: Ne = 10; Ne = 100; Ne = 1000]

Divergence between populations

[Figure panels: p0 = 0.01; p0 = 0.1; p0 = 0.5]

Divergence between populations

Run simulations with selection

Two scenarios:

1) selection in opposite directions: run_many_simulations(s=c(-0.1, 0.1))

2) selected vs. control population: run_many_simulations(s=c(0.1, 0))

Divergence between populations

[Figure panels: selected vs. control; selected in opposite directions]

Divergence between populations

[Figure: selected in opposite directions; s = ±0.01, N = 100, p0 = 0.1]

Selection vs. Drift

In a population of size 200, an allele changes frequency from 0.2 to 0.22 in a single generation.

- What is the probability of this occurring if the allele is neutral?

- What if the allele is favorable, with a selection coefficient of s = 0.1 in heterozygotes?

Selection vs. Drift

Neutral:

• Binomial probability with 2N = 400, k = 0.22 * 400 = 88, and p = 0.2.

> dbinom(x=88, size=400, prob=0.2)

[1] 0.02952489

Selected for, s = 0.1:

• Binomial probability with 2N = 400, k = 88, and p' = p + (p*(1-p)*s)/mean(w) = 0.215.

> dbinom(x=88, size=400, prob=0.215)

[1] 0.04670787
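The value of p' can be computed rather than typed in. The sketch below assumes the fitness scheme 1+2s, 1+s, 1 from the earlier Selection slides, for which the mean fitness simplifies to 1 + 2ps. Because p' is not rounded to 0.215 here, the second probability differs slightly from the value above.

```r
N <- 200
p <- 0.2
s <- 0.1
k <- round(0.22 * 2 * N)   # 88 copies of the allele in the next generation

# Expected frequency after selection with fitnesses 1+2s, 1+s, 1.
w_bar <- 1 + 2 * p * s                  # mean fitness
p_sel <- p + p * (1 - p) * s / w_bar    # about 0.2154

prob_neutral  <- dbinom(k, size = 2 * N, prob = p)
prob_selected <- dbinom(k, size = 2 * N, prob = p_sel)

print(c(p_sel = p_sel, neutral = prob_neutral, selected = prob_selected))
```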

Selection vs. Drift

Suppose in the same population, the allele then changes from 0.22 to 0.25 (k = 100). What is the probability of the whole sequence under both models?

- neutral?

> dbinom(x=88, size=400, prob=0.2) * dbinom(x=100, size=400, prob=0.22)

[1] 0.0004914107

- s = 0.1? (p' = 0.236)

> dbinom(x=88, size=400, prob=0.215) * dbinom(x=100, size=400, prob=0.236)

[1] 0.001734624
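The probabilities multiply because, in the Wright-Fisher model, each generation depends only on the frequency in the previous one. A sketch that generalizes the calculation to any sequence of allele counts follows; `trajectory_prob` is a hypothetical helper, and it again assumes fitnesses 1+2s, 1+s, 1.

```r
# Probability of observing the allele counts `ks` in successive generations,
# starting from frequency p0 (s = 0 gives the neutral model).
trajectory_prob <- function(ks, p0, N, s = 0) {
  p <- p0
  prob <- 1
  for (k in ks) {
    # Expected frequency after selection (equals p when s = 0).
    p_exp <- p + p * (1 - p) * s / (1 + 2 * p * s)
    prob <- prob * dbinom(k, size = 2 * N, prob = p_exp)
    p <- k / (2 * N)   # the observed frequency is the next starting point
  }
  prob
}

neutral  <- trajectory_prob(c(88, 100), p0 = 0.2, N = 200)
selected <- trajectory_prob(c(88, 100), p0 = 0.2, N = 200, s = 0.1)

print(c(neutral = neutral, selected = selected, ratio = selected / neutral))
```

The ratio of the two probabilities indicates how much better the selection model explains the observed sequence than the neutral model does.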

Summary

Population genetics is concerned with the genetic basis of evolution.

In a population geneticist's world, evolution is the change in the frequencies of genotypes through time.

Probabilistic models of evolution are constructed and checked for compatibility with real data.

Outlook

Some further topics not (yet) discussed here

Demographic inference

Migration

Recombination

Epistasis . . .