Populations. Large populations Terns Dryopteris fragrans, a rare cliff fern Small populations.

Populations

Large populations

Terns

Dryopteris fragrans, a rare cliff fern

Small populations

Dynamic populations

Homo sapiens

Complex populations

Markers: isozymes

AFLPs


Illumina Beadstation genotyping for SNPs

• High-throughput genotyping

• Genotyping of a cross:

• Low cost per genotype (5-20 cents), but the need to assay a large number of genotypes at once (384, 768, or 1536) makes the total cost large (thousands of dollars)

What do populations have to do with genetic markers?

Influence levels of diversity

Conversely, polymorphic genetic markers can be used to infer many population processes

Emphasis in FRST432 is the latter

Quantifying genetic variation

Gene frequency

Genetic diversity

Hardy-Weinberg

Estimation of gene frequency

For co-dominant loci, simply count the numbers (“gene counting method”)

The gene counting method also gives the “maximum likelihood estimate”

Estimation of allele frequency

MN blood group: genotype number

MM 392 MN 707 NN 320

Total: 1419 individuals (the number of genes sampled is twice this, 2 × 1419 = 2838)

Frequency of M: $p_M = (2 \times 392 + 707) / (2 \times 1419) = 0.525$

Frequency of N: $p_N = 1 - p_M = 0.475$


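Estimation based upon gene counting: $p_A = (2N_{AA} + N_{Aa}) / (2N_{AA} + 2N_{Aa} + 2N_{aa})$

More theoretical relationship: $p_A = f_{AA} + 0.5 f_{Aa}$, where the $f$'s are genotype frequencies

Binomial sampling variance: $\mathrm{Var}(p_A) = \mathrm{Var}(p_a) = p_A (1 - p_A) / (2N)$

Use the variance to construct a confidence interval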
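The gene counting estimate and its sampling variance can be sketched in a few lines of Python, using the MN blood group counts given earlier. The function name and the normal-approximation 95% interval (estimate ± 1.96 SE) are illustrative assumptions, not part of the slides.

```python
import math

def allele_freq_by_counting(n_AA, n_Aa, n_aa):
    """Gene-counting estimate of the frequency of allele A at a co-dominant locus.

    Returns (p, standard error), using the binomial variance p(1-p)/(2N).
    """
    n = n_AA + n_Aa + n_aa               # number of individuals
    p = (2 * n_AA + n_Aa) / (2 * n)      # each individual carries two genes
    se = math.sqrt(p * (1 - p) / (2 * n))
    return p, se

# MN blood group counts from the slide: MM = 392, MN = 707, NN = 320
p_M, se_M = allele_freq_by_counting(392, 707, 320)

# Assumed normal approximation for a 95% confidence interval
ci_low, ci_high = p_M - 1.96 * se_M, p_M + 1.96 * se_M
print(f"p_M = {p_M:.3f}, SE = {se_M:.4f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

Non-overlapping intervals from two populations give a rough first check when comparing gene frequencies among populations.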

Comparing among populations

Hardy-Weinberg

Predict genotypic frequencies from gene frequencies

$F(AA) = p^2$

$F(Aa) = 2pq$

$F(aa) = q^2$

Expansion of $(p+q)^2$

HW is basis for almost all models

Inbreeding also detected as excess of homozygotes
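As a hedged illustration, the HW predictions can be compared with observed genotype counts. The chi-square goodness-of-fit statistic used below is a common choice but is an assumption here (the slides do not name a test), and the function names are hypothetical.

```python
def hw_expected_counts(n_AA, n_Aa, n_aa):
    """Expected genotype counts under Hardy-Weinberg, given observed counts."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)   # gene-counting estimate of p
    q = 1 - p
    return (n * p * p, n * 2 * p * q, n * q * q)

def hw_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square statistic (1 degree of freedom) for departure from HW."""
    observed = (n_AA, n_Aa, n_aa)
    expected = hw_expected_counts(n_AA, n_Aa, n_aa)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# MN blood group counts from the earlier slide
expected = hw_expected_counts(392, 707, 320)
chi2 = hw_chi_square(392, 707, 320)
print([round(e, 1) for e in expected], round(chi2, 4))
```

A statistic well below 3.84 (the 5% critical value for one degree of freedom) indicates no detectable departure from HW; an excess of homozygotes, as under inbreeding, would inflate it.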

Historical context

Is the population going to be driven to a particular frequency for an allele simply because it is inherited in a Mendelian fashion?

Is the recessive phenotype driven to occur in 25% of the population?

Hardy and Weinberg proved this was false

HW “Equilibrium”

Equilibrium = nothing changes across generations

Genotypes are transient, broken up each generation

Reconstituted randomly into zygotes

Reached in just one generation
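A small deterministic sketch of why equilibrium is reached in one generation: genotypes are broken up into a gene pool and reconstituted at random. The starting genotype frequencies are hypothetical, and the function assumes the HW conditions (random mating, no mutation, migration, selection, or drift).

```python
def random_mating(f_AA, f_Aa, f_aa):
    """One generation of random mating: returns the offspring genotype frequencies."""
    p = f_AA + 0.5 * f_Aa      # allele frequency in the gene pool
    q = 1 - p
    return (p * p, 2 * p * q, q * q)

# Hypothetical start far from HW proportions: no heterozygotes at all
start = (0.5, 0.0, 0.5)
gen1 = random_mating(*start)
gen2 = random_mating(*gen1)
print(gen1, gen2)              # HW proportions after one generation, unchanged after that

# A rare recessive allele (q = 0.1) is not driven toward a 25% recessive phenotype
rare1 = random_mating(0.9, 0.0, 0.1)
rare2 = random_mating(*rare1)
print(rare1[2], rare2[2])
```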

Assumptions of HW

No directive forces: no mutation, migration, or selection

No dispersive forces: infinite population size, random mating

Predictions of HW

Allele frequencies unchanged over time

After one generation, genotypic frequencies unchanged over time

Allele frequencies, not genotypic frequencies, are sufficient parameters for models

One prediction of H-W rule

The fundamental measure of genetic variation: expected heterozygosity

At one locus, the gene frequency of the $i$-th allele is $p_i$

The expected Hardy-Weinberg frequency of the homozygous genotype is $p_i^2$

Over all possible alleles $i$, $i = 1, \dots, n$, the probability that the locus is homozygous for any allele is

$J = \sum_{i=1}^{n} p_i^2$

Expected heterozygosity = 1 - expected homozygosity

$H = 1 - J = 1 - \sum_{i=1}^{n} p_i^2$

often referred to as gene diversity
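A minimal sketch of the calculation in Python. The allele frequencies in the examples are hypothetical and chosen only to show how H behaves.

```python
def expected_homozygosity(freqs):
    """J: sum of squared allele frequencies at one locus."""
    return sum(p * p for p in freqs)

def expected_heterozygosity(freqs):
    """H = 1 - J, the gene diversity at one locus."""
    return 1.0 - expected_homozygosity(freqs)

# Hypothetical allele frequencies
print(expected_heterozygosity([0.5, 0.5]))     # two equally frequent alleles
print(expected_heterozygosity([0.9, 0.1]))     # one common and one rare allele
print(expected_heterozygosity([0.25] * 4))     # four equally frequent alleles
```

H is largest when alleles are equally frequent and falls toward zero as one allele approaches fixation.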

Heterozygosity at 20 variable allozymes out of 71 loci sampled in a population of European people

Gene locus  Enzyme encoded                              Heterozygosity H
Aph         Alkaline phosphatase (placental)            0.53
Acph        Acid phosphatase                            0.52
Gpt         Glutamate-pyruvate transaminase             0.50
Adh-3       Alcohol dehydrogenase-3                     0.48
Peps        Pepsinogen                                  0.47
Pgm-2       Phosphoglucomutase-2                        0.38
Pept-A      Peptidase-A                                 0.37
Pgm-1       Phosphoglucomutase-1                        0.36
Me          Malic enzyme                                0.30
Ace         Acetylcholinesterase                        0.23
Adn         Adenosine deaminase                         0.11
Gput        Galactose-1-phosphate uridyl transferase    0.11
Adk         Adenylate kinase                            0.09
Amy         Amylase (pancreatic)                        0.09
Adh-2       Alcohol dehydrogenase-2                     0.07
6Pgdh       6-Phosphogluconate dehydrogenase            0.05
Hk          Hexokinase (white-cell)                     0.05
Got         Glutamate-oxaloacetate transaminase         0.03
Pept-C      Peptidase-C                                 0.02
Pept-D      Peptidase-D                                 0.02
51 loci invariant (monomorphic)                         0.00

After H. Harris and D. A. Hopkinson, J. Human Genetics
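The table can be summarized as an average heterozygosity over all 71 loci, counting the 51 invariant loci as H = 0. This is a sketch using the tabulated values; the variable names are illustrative.

```python
# Heterozygosity at the 20 variable loci, in table order
variable_loci_H = [0.53, 0.52, 0.50, 0.48, 0.47, 0.38, 0.37, 0.36, 0.30, 0.23,
                   0.11, 0.11, 0.09, 0.09, 0.07, 0.05, 0.05, 0.03, 0.02, 0.02]
n_invariant = 51                               # monomorphic loci, H = 0

n_loci = len(variable_loci_H) + n_invariant
mean_H = sum(variable_loci_H) / n_loci         # invariant loci contribute zero
prop_variable = len(variable_loci_H) / n_loci  # share of loci showing any variation

print(f"{n_loci} loci, mean H = {mean_H:.3f}, proportion variable = {prop_variable:.2f}")
```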

Comparison of isozyme variation across kingdoms

Variation of diversity among species

Explaining levels of diversity is a prime activity of population genetics

Plants have the most diverse array of life histories: short-lived species and self-fertilizers have the least variation, long-lived outcrossers have the most

Vertebrates have narrowest array of life histories, hence lowest variation of diversity among species

Just explaining the mean level of diversity is challenging

Outcome of complex interplay of mutation, selection, and chance (drift)…

Q. What does heterozygosity measure?

A. The tendency for a population to have “intermediate” gene frequencies

Other measures of genetic variation

Polymorphism, Ford (1940): “the occurrence together in the same habitat of two or more discontinuous forms in such proportions that the rarest of them cannot be maintained by recurrent mutation”

Probably not a good definition in 2006

Polymorphism, Cavalli-Sforza and Bodmer (1971): “the occurrence in the same population of two or more alleles at one locus, each with appreciable frequency”

But what is “appreciable frequency”?

Other measures of diversity

Proportion of polymorphic loci: P

Practical definition of “appreciable frequency”: an arbitrary upper limit on the frequency of the most common allele

0.95 normally

0.99 sometimes (used when the sample is adequate, N > 100)

Number of alleles, n

Also called allele diversity or allele richness; strongly influenced by sample size

Effective number of alleles: $n_e = 1 / (1 - H)$

The number of equally frequent alleles that gives the observed H
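A short sketch of these two measures. The function names and the example frequencies are hypothetical; the sketch assumes a locus counts as polymorphic when its most common allele is at or below the chosen limit.

```python
def effective_number_of_alleles(freqs):
    """n_e = 1 / (1 - H), where H is the expected heterozygosity."""
    H = 1.0 - sum(p * p for p in freqs)
    return 1.0 / (1.0 - H)

def is_polymorphic(freqs, limit=0.95):
    """True if the most common allele does not exceed the frequency limit."""
    return max(freqs) <= limit

# Four equally frequent alleles: the effective number equals the actual number
print(effective_number_of_alleles([0.25] * 4))

# Unequal frequencies: two alleles are present, but the effective number is close to one
print(effective_number_of_alleles([0.9, 0.1]))

# The same locus can be monomorphic at the 0.95 limit and polymorphic at 0.99
print(is_polymorphic([0.96, 0.04], limit=0.95), is_polymorphic([0.96, 0.04], limit=0.99))
```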

P vs. H over taxa

Measures of nucleotide diversity

Proportion of sites that differ = S/N

S = number of segregating sites

N = number of nucleotide sites

Depends on the number of sequences aligned: the more sequences, the higher S (like the proportion of polymorphic loci)

Nucleotide diversity

Heterozygosity averaged over aligned sites

If there are K sequences, make all possible pairwise comparisons (there are K(K-1)/2 comparisons)

Analogous to H as estimated from gene frequencies
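Both measures can be sketched for a small alignment. The three sequences below are invented for illustration, and the code assumes equal-length, gap-free aligned sequences.

```python
from itertools import combinations

def segregating_sites(seqs):
    """S: number of alignment columns with more than one nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def nucleotide_diversity(seqs):
    """Average proportion of differing sites over all K(K-1)/2 pairs of sequences."""
    n_sites = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total_diff = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total_diff / (len(pairs) * n_sites)

# Hypothetical alignment: K = 3 sequences, N = 4 sites
seqs = ["ACGT", "ACGA", "ACCA"]
S = segregating_sites(seqs)
print(S, S / len(seqs[0]), nucleotide_diversity(seqs))
```

Adding more sequences can only keep S the same or raise it, whereas the pairwise average does not grow systematically with sample size.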


Gene counting

$\mathrm{Freq}(A) = \mathrm{Freq}(AA) + 0.5\,\mathrm{Freq}(Aa)$

Binomial sampling variance: $\mathrm{Var}(p) = p(1-p)/(2N)$

Use the variance to construct a confidence interval

Dominance: need Hardy-Weinberg


Dominance: assume Hardy-Weinberg

$f_{aa} = q^2$

$\hat{q} = \sqrt{f_{aa}}$

$\mathrm{Var}(\hat{q}) = (1 - q^2) / (4N)$

Kermode bear example

A total of 87 bears were collected for hair samples on Gribbell, Princess Royal and Roderick Islands

66 were black, 21 were white

Frequency of recessive phenotype = 21/(66+21) = 0.241

Estimate of gene frequency of white gene is square root of this: sqrt(0.241) = 0.49

Variance is $(1 - 0.49^2) / (4 \times 87) = 0.00218$

SE is the square root of this: sqrt(0.00218) = 0.047

We also have nucleotide data for the gene underlying Kermode coat color

AA and AG = black, GG = white

42 AA, 24 AG, 21 GG

Gene frequency of G (white) = (24 + 2 × 21) / (2 × 87) = 0.38

SE = sqrt(q(1-q)/(2N)) = 0.037

Using just coat color, with white recessive: q = 0.49, SE = 0.047 (from previous slide)

q is higher (0.49 vs. 0.38); why?
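The two Kermode estimates can be reproduced side by side. This is a sketch with hypothetical function names: the first estimator assumes Hardy-Weinberg proportions (needed under dominance), and the second uses gene counting on the nucleotide genotypes.

```python
import math

def q_from_recessive_phenotype(n_recessive, n_total):
    """Estimate q from the recessive phenotype frequency, assuming Hardy-Weinberg."""
    f_aa = n_recessive / n_total
    q = math.sqrt(f_aa)
    se = math.sqrt((1 - q * q) / (4 * n_total))
    return q, se

def q_from_genotypes(n_AA, n_AG, n_GG):
    """Gene-counting estimate of the G allele frequency from co-dominant genotypes."""
    n = n_AA + n_AG + n_GG
    q = (n_AG + 2 * n_GG) / (2 * n)
    se = math.sqrt(q * (1 - q) / (2 * n))
    return q, se

# Coat color only: 21 white bears out of 87
q_dom, se_dom = q_from_recessive_phenotype(21, 87)

# Nucleotide genotypes: 42 AA, 24 AG, 21 GG
q_cod, se_cod = q_from_genotypes(42, 24, 21)

print(f"phenotype-based q = {q_dom:.3f} (SE {se_dom:.3f})")
print(f"genotype-based  q = {q_cod:.3f} (SE {se_cod:.3f})")
```

The phenotype-based estimate is only as good as the HW assumption behind it, which is why it can disagree with the direct count.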

Expected frequency of white bears

Using co-dominant Mc1r data, expected number of GG = $87 \times (0.38)^2 = 12.5$

Observed number is 21 (>>12.5)

Can be caused by:

• Assortative mating, which creates an excess of the white genotype (GG) over HW expectations

• Variation of gene frequency among islands

Microsatellite loci show no excess homozygosity!

• Assortative mating at coat color locus

• Excess homozygosity only at Mc1r
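The excess of white homozygotes at Mc1r can be put on the same scale as a per-locus Fis value. The sketch below assumes the standard definition F = 1 - (observed heterozygosity / expected heterozygosity), which the slides describe only in words; the function names are hypothetical.

```python
def allele_freq(n_AA, n_Aa, n_aa):
    """Gene-counting frequency of the second allele (a)."""
    n = n_AA + n_Aa + n_aa
    return (n_Aa + 2 * n_aa) / (2 * n)

def fis(n_AA, n_Aa, n_aa):
    """F = 1 - H_obs / H_exp; positive values mean excess homozygosity."""
    n = n_AA + n_Aa + n_aa
    q = allele_freq(n_AA, n_Aa, n_aa)
    h_obs = n_Aa / n
    h_exp = 2 * q * (1 - q)
    return 1 - h_obs / h_exp

# Mc1r genotypes in Kermode bears: 42 AA, 24 AG, 21 GG
q_G = allele_freq(42, 24, 21)
expected_GG = 87 * q_G ** 2        # HW expectation for white (GG) bears
F_mc1r = fis(42, 24, 21)
print(f"expected GG = {expected_GG:.1f}, observed GG = 21, F = {F_mc1r:.2f}")
```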

Null alleles or inbreeding?

Fis values (excess homozygosity above HW expectations) for Yellow Warbler microsatellites

Locus     Fis value
Caµ 28    0.30
Dpµ 01    0.01
Dpµ 03    0.05
Dpµ 15    0.12
Dpµ 16    0.00
Maµ 23    0.02

Another exercise in HW: null alleles increase apparent homozygote frequency

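Sum of all true homozygotes plus all heterozygous nulls (allele $n$ is the null allele):

$\sum_{i=1}^{n-1} p_i^2 + 2 \sum_{i=1}^{n-1} p_i p_n$

$= J_e + 2 p_n (1 - p_n)$

Equals expected homozygosity plus (approximately) twice the null frequency

$\begin{pmatrix} p_1^2 & p_1 p_2 & \cdots & p_1 p_n \\ p_2 p_1 & p_2^2 & \cdots & p_2 p_n \\ \vdots & \vdots & \ddots & \vdots \\ p_n p_1 & p_n p_2 & \cdots & p_n^2 \end{pmatrix}$

(e.g., sum the last row and column of the expansion of gene frequencies, except for the lower right corner)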
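A minimal sketch of the null-allele calculation, with hypothetical allele frequencies. It follows the expression on the slide and does not rescale for null homozygotes, which would produce no band at all and so would go unscored.

```python
def apparent_homozygosity(visible_freqs, p_null):
    """Frequency of genotypes scored as homozygous when one allele is a null.

    visible_freqs: frequencies of the detectable alleles
    p_null: frequency of the null allele (all frequencies sum to 1)
    """
    true_homozygotes = sum(p * p for p in visible_freqs)
    null_heterozygotes = 2 * p_null * sum(visible_freqs)   # visible allele + null
    return true_homozygotes + null_heterozygotes

# Hypothetical locus: two visible alleles plus a null at frequency 0.1
visible, p_n = [0.5, 0.4], 0.1
J_apparent = apparent_homozygosity(visible, p_n)
J_visible = sum(p * p for p in visible)
print(J_visible, J_apparent)
```

Here a null allele at a frequency of only 0.1 raises the apparent homozygosity from 0.41 to 0.59, which would register as a large positive Fis at that locus alone.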

Populations: defining and identifying

Two major paradigms for defining populations

• Ecological paradigm: a group of individuals of the same species that co-occur in space and time and have an opportunity to interact with each other.

• Evolutionary paradigm: a group of individuals of the same species living in close enough proximity that any member of the group can potentially mate with any other member.

Cocoa from 32 abandoned estates in Trinidad

88 Imperial College Selection (ICS) clones conserved in the International Cocoa Genebank, Trinidad, assayed for 35 microsatellite loci

Unweighted pair group method used to construct a dendrogram of relatedness between individuals

The different colored groups can be identified by eye, or identified with the computer program “STRUCTURE” (as was done here).

Yellow perch

The yellow perch plays a significant role in the survival and success of the double-crested cormorant and other birds, predatory fish, commercial fishermen, and sport fishermen in the Great Lakes region. This fish must be properly managed in order to prevent the trophic structure and economy of the Great Lakes region from collapsing.

The yellow perch (Perca flavescens) is found in the United States and Canada, and looks similar to the European perch but is paler. It is in the same family as the walleye, but in a different family from the white perch.

mtDNA control region haplotype frequency patterning for Yellow Perch spawning site groups across North America

Relationships among mtDNA haplotypes of Yellow Perch

Allele distribution for six representative Yellow Perch microsatellite loci among selected regions. Rings represent loci, colors within a ring represent alleles.

Bayesian assignment of Yellow Perch genetic structure, using STRUCTURE. Vertical bars represent individuals, colors within a bar represent probability of assignment to a cluster. 8 microsatellite loci, 25 collection sites, N= 495 fish, K=10

Inference of population structure using multi-locus genotype data

Pritchard, Stephens, and Donnelly (2000) Falush, Stephens, and Pritchard (2003)

STRUCTURE V2.1 Pritchard, J.K., and Wen, W. (2004)

Main objective of “structure”

Assign individuals to populations on the basis of their genotypes, while simultaneously estimating population allele frequencies

Infer number of populations “K” in the process

Other objectives

Begin with a set of predefined populations and classify individuals of unknown origin

Identify the extent of admixture of individuals

Infer the origin of particular loci in the sampled individuals

Structure is a Bayesian Model Based method of clustering

many assumptions about parameters and distributions

Four basic models

1. Model without admixture: each individual is assumed to originate in one (and only one) of K populations

2. Model with admixture: each individual is assumed to have inherited some proportion of its ancestry from each of K populations

Four basic models

3. Linkage model

“Chunks” of chromosomes are derived as intact units from one or another of the K populations, and all allele copies on the same “chunk” derive from the same population.

Four basic models

4. F model: the populations all diverged from a common ancestral population at the same time, but the model allows that the populations may have experienced different amounts of drift since the divergence event

Assumptions

• The main modeling assumptions are Hardy-Weinberg equilibrium (HW) within populations and complete linkage equilibrium (no linkage disequilibrium, LD) between loci within populations

• The model accounts for the presence of Hardy-Weinberg or linkage disequilibrium by introducing population structure, and attempts to find population groupings that (as far as possible) are not in disequilibrium

Hardy-Weinberg

Gives relationship between gene frequencies and genotypic frequencies, assuming random mating

$F(AA) = p^2$

$F(Aa) = 2pq$

$F(aa) = q^2$

STRUCTURE uses HW predictions to infer the extent of a randomly mating population

Pairwise comparison of LD along chromosomes, high LD is red, low LD is green

Bayesian procedure employed by STRUCTURE

Step 1: estimate the allele frequencies for each population, assuming that the population of origin of each individual is known.

Step 2: estimate the population of origin of each individual, assuming that the population allele frequencies are known.

Iterate several times using a “Markov-Chain Monte-Carlo” procedure
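The two alternating steps can be sketched as a toy Gibbs sampler for the model without admixture. This is an illustration only, not the STRUCTURE implementation: it assumes biallelic loci coded as 0, 1, or 2 copies of one allele, a uniform Beta(1, 1) prior on allele frequencies, equal prior weight for each population, and invented data. All function and variable names are hypothetical.

```python
import math
import random

def genotype_log_prob(g, p):
    """Log probability of a genotype under HW; g counts copies of allele 1 (0, 1 or 2)."""
    hw = ((1 - p) ** 2, 2 * p * (1 - p), p ** 2)
    return math.log(hw[g])

def gibbs_no_admixture(genotypes, K, n_iter=200, seed=1):
    """Toy sampler: genotypes[i][l] in {0, 1, 2}; returns one population label per individual."""
    rng = random.Random(seed)
    n, n_loci = len(genotypes), len(genotypes[0])
    z = [rng.randrange(K) for _ in range(n)]       # arbitrary starting assignment
    for _ in range(n_iter):
        # Step 1: sample allele frequencies, treating the assignments as known.
        freqs = []
        for k in range(K):
            members = [genotypes[i] for i in range(n) if z[i] == k]
            row = []
            for l in range(n_loci):
                ones = sum(g[l] for g in members)
                zeros = 2 * len(members) - ones
                p = rng.betavariate(1 + ones, 1 + zeros)    # assumed Beta(1, 1) prior
                row.append(min(max(p, 1e-9), 1 - 1e-9))     # keep the logs finite
            freqs.append(row)
        # Step 2: sample each individual's population, treating the frequencies as known.
        for i in range(n):
            logw = [sum(genotype_log_prob(g, p) for g, p in zip(genotypes[i], freqs[k]))
                    for k in range(K)]
            top = max(logw)
            weights = [math.exp(w - top) for w in logw]
            z[i] = rng.choices(range(K), weights=weights)[0]
    return z

# Hypothetical data: two strongly differentiated groups of 10 individuals at 8 loci
data = [[2] * 8 for _ in range(10)] + [[0] * 8 for _ in range(10)]
labels = gibbs_no_admixture(data, K=2)
print(labels)
```

The labels returned are those of the final iteration; a real analysis would discard a burn-in period and summarize assignment probabilities over many retained iterations.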

Good and bad things about “structure”

When populations are real, it is the most efficient way to estimate the number of populations K and the membership of individuals in populations

When populations are more continuous (for example, a continuous cline), it can impose incorrect structure on the data and create an arbitrary number of artificial groups.

Human variation and differentiation

Hundreds of microsatellites now available

Alu markers

Can evolutionary history be reconstructed?

Are there distinct “races”?

Are certain populations less diverse?

K is set to 3

We place individuals in three groups, without prior knowledge of group membership

The more loci, the better the identification of groups

• Human Genome Diversity Panel

• 55 Indigenous Populations from 5 Continents: Africa, Americas, Asia, Europe, Oceania, total of 1,056 people

• 377 microsatellite markers assayed

Noah Rosenberg et al, Science, 2002

Structure within structure

Jun Li et al, Science, 2008

Human Genome Diversity Panel, 938 individuals from 51 populations, 5 continents

650,000 SNP Markers

Bayesian prior for population assignment

Ursus americanus ssp. kermodei

Purpose of Kermode bear study, conducted in conjunction with Western Forest Products

• Determine if white bear populations are genetically unique for other types of genetic variation

• Identify the gene, or genes, that cause the white coat color difference

• Infer the role of natural selection vs. genetic drift from patterns of genetic variation for this gene

• Predict effects of forest practices using this information

Populations sampled for Kermode bear hairs

Barb wire hair trap with Kermode hair

From 1685 hair samples to 766 microsatellite profiles to 216 unique genotypes (22 Kermode)

Kermode-containing populations (yellow): perhaps 10% less genetic variation, but other island populations show 10% less variation too

Genetic divergence (below diagonal), gene flow (above diagonal)

Relationship of populations based upon pairwise genetic divergence (previous table); gene frequencies of the white phase given in parentheses

[Figure: population tree labeled with white-phase gene frequencies: (0.56), (0.05), (0.08), (0.00), (0.02), (.013), (0.05), (0.04), (0.21), (0.10), (0.00), (0.33)]

Kermode populations are not closely related to each other; some suggestion of complex interrelations

E-Pr H (Hawkesbury)

E-Pr W-H

P (Pooly Is), R (Roderick Is)

T (Terrace/Nass)
