Populations. Large populations Terns Dryopteris fragrans, a rare cliff fern Small populations.
Populations
Large populations
Terns
Dryopteris fragrans, a rare cliff fern
Small populations
Dynamic populations
Homo sapiens
Complex populations
Markers: isozymes
AFLPs
Illumina BeadStation genotyping for SNPs
• High-throughput genotyping
• Genotyping of a cross:
• Low cost per genotype (5-20 cents), but the need to assay a large number of genotypes at once (384, 768, or 1586) makes the total cost large (thousands of dollars)
What do populations have to do with genetic markers
Influence levels of diversity
Conversely, polymorphic genetic markers can infer many population processes
Emphasis in FRST432 is the latter
Quantifying genetic variation
Gene frequency
Genetic diversity
Hardy-Weinberg
Estimation of gene frequency
For co-dominant loci, simply count the numbers (“gene counting method”)
Gene counting method also is the “maximum likelihood estimate”
Estimation of allele frequency
MN blood group: number of individuals of each genotype
MM 392, MN 707, NN 320
Total: 1419 individuals (the number of gene copies sampled is twice this, 2838)
Frequency of M = p_M = (2 x 392 + 707)/(2 x 1419) = 0.525
Frequency of N is 1 - p_M = 0.475
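The gene counting arithmetic for the MN blood group can be sketched in a few lines of Python. The function name `allele_freq_from_counts` is made up for illustration; the genotype counts are the ones on the slide.

```python
def allele_freq_from_counts(n_AA, n_Aa, n_aa):
    """Gene counting estimate of the frequency of allele A at a co-dominant locus."""
    n_individuals = n_AA + n_Aa + n_aa
    # Each homozygote carries two copies of A, each heterozygote one copy;
    # the denominator is the total number of gene copies (2 per individual).
    return (2 * n_AA + n_Aa) / (2 * n_individuals)

# MN blood group counts from the slide: MM = 392, MN = 707, NN = 320
p_M = allele_freq_from_counts(392, 707, 320)
p_N = 1 - p_M
print(f"p_M = {p_M:.3f}, p_N = {p_N:.3f}")
```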
Estimation of gene frequency
Estimation based upon gene counting: $p_A = (2N_{AA} + N_{Aa}) / (2N_{AA} + 2N_{Aa} + 2N_{aa})$
More theoretical relationship: $p_A = f_{AA} + 0.5 f_{Aa}$, where the $f$'s are genotypic frequencies
$\mathrm{Var}(p_A) = \mathrm{Var}(p_a) = p_A(1 - p_A)/(2N)$
Binomial sampling variance
Construct confidence interval
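A minimal sketch of the binomial variance and a confidence interval, using the MN blood group counts. The slide does not say how to build the interval, so the normal approximation (1.96 standard errors for roughly 95% coverage) is an assumption here.

```python
import math

n_individuals = 1419  # MN blood group sample: 392 MM + 707 MN + 320 NN
p_M = (2 * 392 + 707) / (2 * n_individuals)

# Binomial sampling variance: N diploid individuals contribute 2N gene copies
var_p = p_M * (1 - p_M) / (2 * n_individuals)
se_p = math.sqrt(var_p)

# Assumption: normal approximation, +/- 1.96 standard errors for ~95% coverage
ci_low = p_M - 1.96 * se_p
ci_high = p_M + 1.96 * se_p
print(f"p_M = {p_M:.3f}, SE = {se_p:.4f}, approx. 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

Non-overlapping intervals from two populations are one rough way to compare gene frequencies among populations.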
Comparing among populations
Hardy-Weinberg
Predict genotypic frequencies from gene frequencies
$F(AA) = p^2$, $F(Aa) = 2pq$, $F(aa) = q^2$
Expansion of $(p + q)^2$
HW is basis for almost all models
Inbreeding also detected as excess of homozygotes
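As an illustration, Hardy-Weinberg expected genotype counts can be compared with observed counts. This sketch reuses the MN blood group counts from the allele frequency slide; the chi-square goodness-of-fit statistic is a standard choice that the slides do not name, so treat it as an assumption.

```python
# Observed MN blood group genotype counts (from the allele frequency slide)
observed = {"MM": 392, "MN": 707, "NN": 320}
n = sum(observed.values())

# Allele frequencies by gene counting
p = (2 * observed["MM"] + observed["MN"]) / (2 * n)
q = 1 - p

# Hardy-Weinberg expected counts: N * p^2, N * 2pq, N * q^2
expected = {"MM": n * p**2, "MN": n * 2 * p * q, "NN": n * q**2}

# Chi-square goodness of fit (1 degree of freedom: 3 classes - 1 - 1 estimated parameter)
chi2 = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)

# A deficit of heterozygotes (equivalently, an excess of homozygotes) would give F > 0
F = 1 - observed["MN"] / expected["MN"]

for g in observed:
    print(f"{g}: observed {observed[g]}, expected {expected[g]:.1f}")
print(f"chi-square = {chi2:.4f}, F = {F:.4f}")
```

For these data the statistic is far below 3.84, the 5% critical value for 1 degree of freedom, so there is no sign of an excess of homozygotes.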
Historical context
Is the population going to be driven to a particular frequency for an allele simply because it is inherited in a Mendelian fashion?
Is the recessive phenotype driven to occur in 25% of the population?
Hardy and Weinberg proved this was false
HW “Equilibrium”
Equilibrium = nothing changes across generations
Genotypes are transient, broken up each generation
Reconstituted randomly into zygotes
Reached in just one generation
Assumptions of HW
No directive forces: no mutation, migration, or selection
No dispersive forces: infinite population size, random mating
Predictions of HW
Allele frequencies unchanged over time
After one generation, genotypic frequencies unchanged over time
Allele frequencies, not genotypic frequencies, are sufficient parameters for models
One prediction of H-W rule
The fundamental measure of genetic variation: expected heterozygosity
At one locus, the gene frequency of the i-th allele is $p_i$
The expected Hardy-Weinberg frequency of the homozygous genotype is $p_i^2$
Over all possible alleles i, i = 1, ..., n, the probability that the locus is homozygous for any allele is
$J = \sum_{i=1}^{n} p_i^2$
Expected heterozygosity = 1 - expected homozygosity
$H = 1 - J = 1 - \sum_{i=1}^{n} p_i^2$
often referred to as gene diversity
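A short sketch of expected heterozygosity (gene diversity) as one minus the sum of squared allele frequencies. The function name and the example frequencies are illustrative only.

```python
def expected_heterozygosity(freqs):
    """Gene diversity H = 1 - sum(p_i^2) for allele frequencies at one locus."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)  # J, expected homozygosity
    return 1.0 - homozygosity

# Two equally frequent alleles give the two-allele maximum, H = 0.5
print(expected_heterozygosity([0.5, 0.5]))
# A rare second allele gives a low H
print(expected_heterozygosity([0.95, 0.05]))
```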
Heterozygosity at 20 variable allozyme loci out of 71 loci sampled in a population of European people

Gene locus   Enzyme encoded                             Heterozygosity (H)
Aph          Alkaline phosphatase (placental)           0.53
Acph         Acid phosphatase                           0.52
Gpt          Glutamate-pyruvate transaminase            0.50
Adh-3        Alcohol dehydrogenase-3                    0.48
Peps         Pepsinogen                                 0.47
Pgm-2        Phosphoglucomutase-2                       0.38
Pept-A       Peptidase-A                                0.37
Pgm-1        Phosphoglucomutase-1                       0.36
Me           Malic enzyme                               0.30
Ace          Acetylcholinesterase                       0.23
Adn          Adenosine deaminase                        0.11
Gput         Galactose-1-phosphate uridyl transferase   0.11
Adk          Adenylate kinase                           0.09
Amy          Amylase (pancreatic)                       0.09
Adh-2        Alcohol dehydrogenase-2                    0.07
6Pgdh        6-Phosphogluconate dehydrogenase           0.05
Hk           Hexokinase (white-cell)                    0.05
Got          Glutamate-oxaloacetate transaminase        0.03
Pept-C       Peptidase-C                                0.02
Pept-D       Peptidase-D                                0.02
51 loci      Invariant (monomorphic)                    0.00
After H. Harris and D. A. Hopkinson, J. Human Genetics
Comparison of isozyme variation across kingdoms
Variation of diversity among species
Explaining levels of diversity is a prime activity of population genetics
Plants have the most diverse array of life histories: short-lived species and self-fertilizers have the least variation, long-lived outcrossers have the most
Vertebrates have the narrowest array of life histories, hence the lowest variation of diversity among species
Just explaining the mean level of diversity is challenging
Outcome of complex interplay of mutation, selection, and chance (drift)…
Q. What does heterozygosity measure?
A. The tendency for a population to have “intermediate” gene frequencies
Other measures of genetic variation
Polymorphism, Ford (1940): “the occurrence together in the same habitat of two or more discontinuous forms in such proportions that the rarest of them cannot be maintained by recurrent mutation”
Probably not a good definition in 2006
Polymorphism, Cavalli-Sforza and Bodmer (1971): “the occurrence in the same population of two or more alleles at one locus, each with appreciable frequency”
But what is “appreciable frequency”?
Other measures of diversity
Proportion of polymorphic loci: P
Practical definition of “appreciable frequency”: an arbitrary upper limit on the frequency of the most common allele
0.95 normally
0.99 sometimes (used when sample is adequate, N > 100)
Number of alleles, n
Also called allele diversity or allele richness; strongly influenced by sample size
Effective number of alleles: $n_e = 1/(1 - H)$, the number of equally frequent alleles that gives the observed H
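The effective number of alleles can be sketched directly from the formula. The allele frequencies below are invented for illustration; they show that four alleles with uneven frequencies behave like fewer than two equally frequent ones.

```python
def effective_number_of_alleles(freqs):
    """n_e = 1 / (1 - H), where H = 1 - sum(p_i^2); equivalently 1 / sum(p_i^2)."""
    H = 1.0 - sum(p * p for p in freqs)
    return 1.0 / (1.0 - H)

# Four equally frequent alleles: n_e equals the actual number of alleles
even = [0.25, 0.25, 0.25, 0.25]
# Four alleles, one common and three rare (hypothetical frequencies)
uneven = [0.7, 0.1, 0.1, 0.1]

print(effective_number_of_alleles(even))
print(effective_number_of_alleles(uneven))
```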
P vs. H over taxa
Measures of nucleotide diversity
Proportion of sites that differ = S/N
S = number of segregating sites
N = number of nucleotide sites
Depends on the number of sequences aligned: the more sequences, the higher S (like the proportion of polymorphic loci)
Nucleotide diversity
Heterozygosity averaged over aligned sites
If there are K sequences, make all possible pairwise comparisons (there are K(K-1)/2 comparisons)
Analogous to H as estimated from gene frequencies
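Both measures can be sketched on a toy alignment. The four sequences are made up for illustration, and the code assumes an ungapped alignment with sequences of equal length.

```python
from itertools import combinations

def segregating_sites(seqs):
    """Number of aligned positions at which more than one nucleotide is observed (S)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all K(K-1)/2 pairs of sequences."""
    n_sites = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total_diffs / (len(pairs) * n_sites)

# Hypothetical alignment: K = 4 sequences, N = 10 sites
alignment = [
    "ACGTACGTAC",
    "ACGTACGTAT",
    "ACGAACGTAC",
    "ACGTACGTAC",
]

S = segregating_sites(alignment)
print("S/N =", S / len(alignment[0]))
print("nucleotide diversity =", nucleotide_diversity(alignment))
```

Adding more sequences can only keep S the same or raise it, whereas the pairwise average does not grow systematically with the number of sequences.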
Estimation of gene frequency
Gene counting
Freq(A) = Freq(AA) + 0.5 Freq(Aa)
Var(p) = p(1-p)/(2N)
Binomial sampling variance
Construct confidence interval
Dominance: need Hardy-Weinberg
Estimation of gene frequency
Dominance: assume Hardy-Weinberg
$f_{aa} = q^2$
$\hat{q} = \sqrt{f_{aa}}$
$\mathrm{Var}(\hat{q}) = (1 - q^2)/(4N)$
Kermode bear example
A total of 87 bears were collected for hair samples on Gribbell, Princess Royal and Roderick Islands
66 were black, 21 were white
Frequency of recessive phenotype = 21/(66+21) = 0.241
Estimate of gene frequency of white gene is square root of this: sqrt(0.241) = 0.49
Variance is (1 - 0.49^2)/(4 x 87) = 0.00218
SE is the square root of this: sqrt(0.00218) = 0.047
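The Kermode bear calculation can be reproduced as a short sketch. It assumes Hardy-Weinberg proportions, as the estimator for a recessive trait requires; variable names are illustrative.

```python
import math

n_black, n_white = 66, 21   # phenotype counts from the slide
n = n_black + n_white

# Under Hardy-Weinberg, the frequency of the recessive phenotype is q^2
f_aa = n_white / n
q_hat = math.sqrt(f_aa)

# Sampling variance of the square-root estimator: (1 - q^2) / (4N)
var_q = (1 - q_hat**2) / (4 * n)
se_q = math.sqrt(var_q)

print(f"f_aa = {f_aa:.3f}, q = {q_hat:.2f}, Var = {var_q:.5f}, SE = {se_q:.3f}")
```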
We also have nucleotide data for gene underlying Kermode coat color
AA and AG = black, GG = white
42 AA, 24 AG, 21 GG
Gene frequency of G (white) = (24 + 2 x 21)/(2 x 87) = 0.38
SE = sqrt(q(1-q)/(2N)) = 0.037
Using just coat color, with white recessive: q = 0.49, SE = 0.047 (from previous slide)
q is higher (0.49 vs. 0.38); why?
Expected frequency of white bears
Using co-dominant Mc1r data, expected number of GG = 87 x (0.38)^2 = 12.5
Observed number is 21 (>> 12.5)
Can be caused by:
Assortative mating, which creates an excess of the white genotype (GG) over HW expectations
Variation of gene frequency among islands
Microsatellite loci show no excess homozygosity!
Assortative mating at coat color locus
Excess homozygosity only at Mc1r
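A minimal sketch of the Mc1r comparison, using the genotype counts given for the Kermode bears (42 AA, 24 AG, 21 GG). The fixation index F = 1 - H_obs/H_exp is a standard summary of excess homozygosity; the slides do not report it for this locus, so it is included only as an illustration.

```python
n_AA, n_AG, n_GG = 42, 24, 21   # Mc1r genotype counts from the slide
n = n_AA + n_AG + n_GG

# Gene counting estimate of the frequency of G (white)
q = (n_AG + 2 * n_GG) / (2 * n)
p = 1 - q

# Hardy-Weinberg expectation for the number of white (GG) bears
expected_GG = n * q**2

# Excess homozygosity: compare observed and expected heterozygote frequencies
H_obs = n_AG / n
H_exp = 2 * p * q
F = 1 - H_obs / H_exp

print(f"q = {q:.2f}, expected GG = {expected_GG:.1f}, observed GG = {n_GG}")
print(f"H_obs = {H_obs:.3f}, H_exp = {H_exp:.3f}, F = {F:.2f}")
```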
Null alleles or inbreeding?
Fis values (excess homozygosity above HW expectations) for Yellow Warbler microsatellites
Locus Fis value
Caµ 28 0.30
Dpµ 01 0.01
Dpµ 03 0.05
Dpµ 15 0.12
Dpµ 16 0.00
Maµ 23 0.02
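Another exercise in HW: null alleles increase apparent homozygote frequency
Apparent homozygosity (allele n is the null):
$\sum_{i=1}^{n} p_i^2 + 2\sum_{i=1}^{n-1} p_i p_n$
Sum of all true homozygotes plus all heterozygous nulls
$= J_e + 2p_n(1 - p_n)$
Equals expected homozygosity plus (approximately, for a rare null) twice the null frequency
Expansion of gene frequencies:
$\begin{pmatrix} p_1^2 & p_1 p_2 & \cdots & p_1 p_n \\ p_2 p_1 & p_2^2 & \cdots & p_2 p_n \\ \vdots & \vdots & \ddots & \vdots \\ p_n p_1 & p_n p_2 & \cdots & p_n^2 \end{pmatrix}$
(e.g., sum the last row and column of the expansion of gene frequencies, except for the lower right corner)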
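The null allele result can be checked numerically by enumerating the expansion of gene frequencies. This is a sketch with invented allele frequencies, in which the last allele is treated as the null; following the slide, null homozygotes are counted with the true homozygotes.

```python
from itertools import product

def apparent_homozygosity(freqs):
    """J_e + 2 * p_n * (1 - p_n), with the last allele in freqs being the null."""
    J_e = sum(p * p for p in freqs)   # expected homozygosity
    p_null = freqs[-1]
    return J_e + 2 * p_null * (1 - p_null)

def apparent_homozygosity_by_enumeration(freqs):
    """Sum genotype frequencies p_i * p_j over cells scored as homozygous."""
    null = len(freqs) - 1
    total = 0.0
    for i, j in product(range(len(freqs)), repeat=2):
        # True homozygotes (i == j), plus any genotype carrying a null allele,
        # because only the other allele is visible on the gel
        if i == j or i == null or j == null:
            total += freqs[i] * freqs[j]
    return total

# Hypothetical frequencies: three visible alleles and a null at 5%
freqs = [0.50, 0.30, 0.15, 0.05]
print(apparent_homozygosity(freqs))
print(apparent_homozygosity_by_enumeration(freqs))
```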
Populations: defining and identifying
Two major paradigms for defining populations
• Ecological paradigm: A group of individuals of the same species that co-occur in space and time and have an opportunity to interact with each other.
• Evolutionary paradigm: A group of individuals of the same species living in close enough proximity that any member of the group can potentially mate with any other member.
Cocoa from 32 abandoned estates in Trinidad
88 Imperial College Selection (ICS) clones conserved in the International Cocoa Genebank, Trinidad, assayed for 35 microsatellite loci
Unweighted pair group method used to construct a dendrogram of relatedness between individuals
The different colored groups can be identified by eye, or identified with the computer program “STRUCTURE” (as was done here).
Yellow perch
The yellow perch plays a significant role in the survival and success of the double-crested cormorant and other birds, predatory fish, commercial fishermen, and sport fishermen in the Great Lakes region. This fish must be properly managed in order to prevent the trophic structure and economy of the Great Lakes region from collapsing.
The yellow perch (Perca flavescens) is found in the United States and Canada, and looks similar to the European perch but is paler. It is in the same family as the walleye, but in a different family from the white perch.
mtDNA control region haplotype frequency patterning for Yellow Perch spawning site groups across North America
Relationships among mtDNA haplotypes of Yellow Perch
Allele distribution for six representative Yellow Perch microsatellite loci among selected regions. Rings represent loci, colors within a ring represent alleles.
Bayesian assignment of Yellow Perch genetic structure, using STRUCTURE. Vertical bars represent individuals, colors within a bar represent probability of assignment to a cluster. 8 microsatellite loci, 25 collection sites, N= 495 fish, K=10
Inference of population structure using multi-locus genotype data
Pritchard, Stephens, and Donnelly (2000) Falush, Stephens, and Pritchard (2003)
STRUCTURE V2.1 Pritchard, J.K., and Wen, W. (2004)
Main objective of “structure”
Assign individuals to populations on the basis of their genotypes, while simultaneously estimating population allele frequencies
Infer number of populations “K” in the process
Other objectives
Begin with a set of predefined populations and classify individuals of unknown origin
Identify the extent of admixture of individuals
Infer the origin of particular loci in the sampled individuals
Structure is a Bayesian Model Based method of clustering
many assumptions about parameters and distributions
Four basic models
1. Model without admixture: each individual is assumed to originate in one (and only one) of K populations
2. Model with admixture: each individual is assumed to have inherited some proportion of its ancestry from each of K populations
3. Linkage model: “chunks” of chromosomes are derived as intact units from one or another of the K populations, and all allele copies on the same “chunk” derive from the same population
4. F model: the populations all diverged from a common ancestral population at the same time, but may have experienced different amounts of drift since the divergence event
Assumptions
• The main modeling assumptions are Hardy-Weinberg equilibrium (HW) within populations and complete linkage equilibrium between loci within populations
• The model accounts for departures from HW, or for linkage disequilibrium (LD), by introducing population structure, and attempts to find population groupings that (as far as possible) are not in disequilibrium
Hardy-Weinberg
Gives relationship between gene frequencies and genotypic frequencies, assuming random mating
$F(AA) = p^2$, $F(Aa) = 2pq$, $F(aa) = q^2$
The extent of a randomly mating population is predicted by STRUCTURE using HW predictions
Pairwise comparison of LD along chromosomes, high LD is red, low LD is green
Bayesian procedure employed by STRUCTURE
Step 1: estimate the allele frequencies for each population, assuming that the population of origin of each individual is known.
Step 2: estimate the population of origin of each individual, assuming that the population allele frequencies are known.
Iterate several times using a “Markov-Chain Monte-Carlo” procedure
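The two alternating steps can be illustrated with a toy Gibbs sampler for the model without admixture. This is a sketch under strong simplifying assumptions, not the STRUCTURE program: loci are biallelic, genotypes are coded as the number of copies of one allele (0, 1, or 2), allele frequencies get a flat Beta(1, 1) prior, and every population has equal prior probability. The function name and the data are hypothetical.

```python
import math
import random

def gibbs_no_admixture(genotypes, K, n_iter=200, seed=0):
    """Toy Gibbs sampler: alternate between allele frequencies and assignments."""
    rng = random.Random(seed)
    n_loci = len(genotypes[0])
    z = [rng.randrange(K) for _ in genotypes]   # random starting assignments
    p = []
    for _ in range(n_iter):
        # Step 1: sample allele frequencies, treating assignments as known
        p = []
        for k in range(K):
            members = [g for g, zi in zip(genotypes, z) if zi == k]
            row = []
            for l in range(n_loci):
                ones = sum(g[l] for g in members)    # copies of allele "1"
                zeros = 2 * len(members) - ones      # copies of allele "0"
                q = rng.betavariate(1 + ones, 1 + zeros)
                row.append(min(max(q, 1e-9), 1 - 1e-9))  # keep the logs finite
            p.append(row)
        # Step 2: sample assignments, treating allele frequencies as known
        for i, g in enumerate(genotypes):
            log_w = []
            for k in range(K):
                # Hardy-Weinberg genotype probability within population k; the
                # factor 2 for heterozygotes is the same for every k and is dropped
                log_w.append(sum(g[l] * math.log(p[k][l])
                                 + (2 - g[l]) * math.log(1 - p[k][l])
                                 for l in range(n_loci)))
            top = max(log_w)
            weights = [math.exp(w - top) for w in log_w]
            z[i] = rng.choices(range(K), weights=weights)[0]
    return z, p

# Hypothetical data: 6 individuals rich in allele "1", 6 rich in allele "0", 5 loci
genotypes = [[2, 2, 1, 2, 2]] * 6 + [[0, 0, 1, 0, 0]] * 6
z, p = gibbs_no_admixture(genotypes, K=2)
print(z)
```

The cluster labels are arbitrary (label switching), so only the grouping of individuals is meaningful, not the label numbers. A real analysis would also discard burn-in iterations and summarize assignments over many samples rather than reading off the last one.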
Good and bad things about “structure”
When populations are real, it is the most efficient way to estimate the number of populations K and the membership of individuals in populations
When populations are more continuous (for example, a continuous cline), it can impose incorrect structure on the data and create an arbitrary number of artificial groups.
Human variation and differentiation
Hundreds of microsatellites now available
ALU markers
Can evolutionary history be reconstructed?
Are there distinct “races”?
Are certain populations less diverse?
K is set to 3
We place individuals in three groups, without prior knowledge of group membership
The more loci, the better the identification of groups
• Human Genome Diversity Panel
• 55 Indigenous Populations from 5 Continents: Africa, Americas, Asia, Europe, Oceania, total of 1,056 people
• 377 microsatellite markers assayed
Noah Rosenberg et al, Science, 2002
Structure within structure
Jun Li et al, Science, 2008
Human Genome Diversity Panel, 938 individuals from 51 populations, 5 continents
650,000 SNP Markers
Bayesian prior for population assignment
Ursus americanus kermodei
Purpose of Kermode bear study, conducted in conjunction with Western Forest Products
• Determine if white bear populations are genetically unique for other types of genetic variation
• Identify the gene, or genes, that cause the white coat color difference
• Infer the role of natural selection vs. genetic drift from patterns of genetic variation for this gene
• Predict effects of forest practices using this information
Populations sampled for Kermode bear hairs
Barb wire hair trap with Kermode hair
From 1685 hair samples to 766 microsatellite profiles to 216 unique genotypes (22 Kermode)
Kermode-containing populations (yellow): perhaps 10% less genetic variation, but other island populations show 10% less variation too
Genetic divergence (below diagonal), gene flow (above diagonal)
Relationship of populations based upon pairwise genetic divergence (previous table); gene frequencies of white phase given in parenthesis
[Figure: tree of populations; white-phase gene frequencies shown at the tips: (0.56), (0.05), (0.08), (0.00), (0.02), (.013), (0.05), (0.04), (0.21), (0.10), (0.00), (0.33)]
Kermode populations are not closely related to each other; some suggestion of complex interrelations
Population labels: E-Pr H (Hawkesbury); E-Pr W-H; P (Pooly Is), R (Roderick Is); T (Terrace/Nass)