Post on 16-Jan-2016
Population Genetics: Chapter 3
Epidemiology 217
January 16, 2011
Outline
Allele Frequency Estimation
Hardy-Weinberg equilibrium (HWE)
HWE Game
Population Substructure
Allele Frequency
Diploid, autosomal locus with 2 alleles: A and a
Allele frequency is the fraction:
(No. of a particular allele) / (No. of all alleles in population)
[Figure: bar chart of allele frequencies (A vs. C) for rs1036819, a longevity SNP, in the CEU, CHB, JPT, and YRI populations; frequency axis from 0 to 1. Frequency pairs listed in the chart: 0.86/0.14, 0.64/0.36, 0.53/0.47, 0.93/0.08.]
Allele (Gamete) Frequency
Let p = Freq(A), the frequency of the dominant allele
Let q = Freq(a), the frequency of the recessive allele
Then, p + q = 1
Genotype Frequency
$p^2$ = frequency of homozygous dominant genotype
$q^2$ = frequency of homozygous recessive genotype
$2pq$ = frequency of heterozygous genotype
Then, $p^2 + 2pq + q^2 = 1$
Estimating Allele Frequencies from Genotype Frequencies
Genotypes:  AA      Aa      aa
Frequency:  $p^2$   $2pq$   $q^2$
Frequency of A allele = $p^2 + \tfrac{1}{2}(2pq)$
Frequency of a allele = $q^2 + \tfrac{1}{2}(2pq)$
Ex. Calculation: Allele Frequencies
Assume N = 200 in each of two populations
Pop 1: 90 AA, 40 Aa, 70 aa (N = 200)
Pop 2: 45 AA, 130 Aa, 25 aa (N = 200)
In Pop 1:
p = 90/200 + ½ (40/200) = 0.45 + 0.10 = 0.55
q = 70/200 + ½ (40/200) = 0.35 + 0.10 = 0.45
In Pop 2:
p = 45/200 + ½ (130/200) = 0.225 + 0.325 = 0.55
q = 25/200 + ½ (130/200) = 0.125 + 0.325 = 0.45
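The gene-counting arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `allele_frequencies` and its argument names are made up for this example and are not part of the lecture.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (p, q) for a two-allele locus from genotype counts.

    Hypothetical helper for illustration; the names are assumptions.
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("need at least one genotyped individual")
    # Each homozygote carries two copies of its allele,
    # each heterozygote carries one copy of each allele.
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = (2 * n_aa + n_Aa) / (2 * n)
    return p, q


# Genotype counts taken from the two example populations
pop1 = allele_frequencies(90, 40, 70)
pop2 = allele_frequencies(45, 130, 25)
print("Pop 1 (p, q):", pop1)
print("Pop 2 (p, q):", pop2)
```

Counting alleles directly, (2 x AA + Aa) / 2N, is the same calculation as adding the homozygote frequency to half the heterozygote frequency.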
Take home points
p + q = 1 (sum of the allele frequencies = 1)
$p^2 + 2pq + q^2 = 1$ (sum of the genotype frequencies = 1)
Two populations with markedly different genotype frequencies can have the same allele frequencies
Hardy-Weinberg
The Hardy-Weinberg principle states that both allele and genotype frequencies in a population remain constant (that is, they are in equilibrium) from generation to generation unless specific disturbing influences are introduced.
$p^2 + 2pq + q^2 = 1$
Hardy-Weinberg Assumptions
Allele frequencies do not vary IF:
Large population
Random mating
No in or out migration
No isolated groups within the population
No mutation
No selection (no allele is advantageous)
Test of Hardy-Weinberg Equilibrium
1. Calculate observed allele & genotype frequencies
Observed genotype counts: 100 GG, 30 AG, 20 AA (N = 150)
Genotype frequencies
GG = 100/150 = 0.67
AG = 30/150 = 0.20
AA = 20/150 = 0.13
Allele frequencies
G alleles = 100*2 + 30 = 230
A alleles = 20*2 + 30 = 70
Total alleles = 300
G afq (p) = 230/300 = 0.77
A afq (q) = 1 - p = 0.23
2. Calculate expected genotype frequencies based on HW: $p^2 + 2pq + q^2 = 1$
$p^2$ (GG) = 0.77 * 0.77 = 0.59
$2pq$ (AG) = 2 * 0.77 * 0.23 = 0.35
$q^2$ (AA) = 0.23 * 0.23 = 0.05
3. Compare expected genotype frequencies to observed frequencies
      expected   observed
GG    0.59       0.67
AG    0.35       0.20
AA    0.05       0.13
Chi-square test = $\sum (\text{observed} - \text{expected})^2 / \text{expected}$, computed on genotype counts
= 29.17 with 1 degree of freedom
p = $6.6 \times 10^{-8}$, so the sample is out of H-W equilibrium
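The three steps can be sketched in Python using only the standard library. The function name `hwe_chi_square` is an assumption made for this illustration. The p-value relies on the fact that, for 1 degree of freedom, the upper tail of the chi-square distribution equals `erfc(sqrt(x / 2))`. Expected values are computed as counts rather than frequencies, because the chi-square statistic is defined on counts.

```python
import math


def hwe_chi_square(n_GG, n_AG, n_AA):
    """Chi-square goodness-of-fit test for HWE at a two-allele locus.

    Hypothetical helper for illustration. Returns (chi2, p_value), 1 df.
    """
    n = n_GG + n_AG + n_AA
    p = (2 * n_GG + n_AG) / (2 * n)  # frequency of the G allele
    q = 1 - p                        # frequency of the A allele
    if p == 0 or q == 0:
        raise ValueError("locus is monomorphic; the test is undefined")
    # Expected genotype counts under Hardy-Weinberg proportions
    expected = [n * p * p, n * 2 * p * q, n * q * q]
    observed = [n_GG, n_AG, n_AA]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = hwe_chi_square(100, 30, 20)
print(f"chi-square = {chi2:.2f}, p = {p_value:.2g}")
```

With the counts from the example (100 GG, 30 AG, 20 AA), this reproduces the chi-square value and p-value quoted on the slide.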
HWE can be easily expanded to account for any number of alleles at a locus
3 allele case (p1, p2, p3)
Allele frequencies: $p_1 + p_2 + p_3 = 1$
Genotype frequencies: $p_1^2 + p_2^2 + p_3^2 + 2p_1p_2 + 2p_1p_3 + 2p_2p_3 = 1$
4 allele case (p1, p2, p3, p4)
Allele frequencies: $p_1 + p_2 + p_3 + p_4 = 1$
Genotype frequencies: $p_1^2 + p_2^2 + p_3^2 + p_4^2 + 2p_1p_2 + 2p_1p_3 + 2p_1p_4 + 2p_2p_3 + 2p_2p_4 + 2p_3p_4 = 1$
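For any number of alleles, the expected genotype frequencies are the terms of the squared sum of allele frequencies: one squared term per homozygote and one doubled product per heterozygote. A minimal Python sketch follows. The function name and the example allele frequencies are hypothetical and chosen only for illustration.

```python
from itertools import combinations_with_replacement


def expected_genotype_frequencies(allele_freqs):
    """Expected HWE genotype frequencies for any number of alleles.

    allele_freqs maps allele name -> frequency (hypothetical helper).
    """
    if abs(sum(allele_freqs.values()) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    expected = {}
    # Every unordered pair of alleles, including an allele with itself
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        fa, fb = allele_freqs[a], allele_freqs[b]
        expected[(a, b)] = fa * fa if a == b else 2 * fa * fb
    return expected


# Hypothetical allele frequencies, for illustration only
three = expected_genotype_frequencies({"A1": 0.5, "A2": 0.3, "A3": 0.2})
four = expected_genotype_frequencies(
    {"A1": 0.4, "A2": 0.3, "A3": 0.2, "A4": 0.1}
)
print(len(three), "genotypes, total =", sum(three.values()))
print(len(four), "genotypes, total =", sum(four.values()))
```

With k alleles there are k homozygous and k(k-1)/2 heterozygous genotypes, so 6 genotypes for three alleles and 10 for four.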
Application of Hardy-Weinberg Equilibrium
For genetic association studies:
Used as QC measure to assess the accuracy of the genotyping method
Expect SNPs to be in HWE among control populations (ethnic-specific)
Violations of HWE could indicate genotyping errors or bias in data
HWE Game
1. Everyone receives ~5 pairs of cards
2. Two allele model: Red (R allele) & Black (B allele)
3. Random Mating: Exchange one card from each pair with another person (keep cards face down)
4. Determine genotype frequency: RR, RB, BB
5. Determine allele frequency: R, B
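The game can also be simulated. The sketch below is a simplified model, not the classroom procedure itself: every pair of cards keeps one card and hands the other into a common pool that is shuffled and redistributed, so a pair may occasionally receive a card from the same player. The function name, the number of players, the starting frequency of red cards, and the random seed are all assumptions made for illustration.

```python
import random
from collections import Counter


def play_hwe_game(n_players=30, pairs_per_player=5, freq_red=0.5, seed=0):
    """Simulate one round of the card game (hypothetical sketch)."""
    rng = random.Random(seed)
    n_pairs = n_players * pairs_per_player
    # Deal: each pair of cards is one diploid genotype
    pairs = [
        ["R" if rng.random() < freq_red else "B" for _ in range(2)]
        for _ in range(n_pairs)
    ]
    red_before = sum(pair.count("R") for pair in pairs) / (2 * n_pairs)

    # Random mating: keep one card per pair, pass the other into a shuffled pool
    kept = [pair[0] for pair in pairs]
    passed = [pair[1] for pair in pairs]
    rng.shuffle(passed)
    offspring = list(zip(kept, passed))

    # Genotype frequencies: RR, RB, BB (card order within a pair is ignored)
    counts = Counter("".join(sorted(pair, reverse=True)) for pair in offspring)
    genotype_freqs = {g: counts[g] / n_pairs for g in ("RR", "RB", "BB")}
    red_after = sum(pair.count("R") for pair in offspring) / (2 * n_pairs)
    return {
        "red_before": red_before,
        "red_after": red_after,
        "genotypes": genotype_freqs,
    }


result = play_hwe_game()
print(result)
```

Exchanging cards only reshuffles them, so the allele frequency is unchanged by the round. The genotype frequencies should land near $p^2$, $2pq$, and $q^2$, with sampling noise that shrinks as the number of pairs grows.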
Population Stratification
Population stratification is a form of confounding in genetic studies where a gene under study shows marked variation in allele frequency across subgroups of a population and these subgroups differ in their baseline risk of disease
Population Stratification: Confounding
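[Diagram: classic confounding, in which a true risk factor is linked to both the exposure of interest and disease. In the genetic analogue, ethnicity is linked to both the genotype of interest and a true risk factor for disease.]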
Wacholder, JNCI, 2000
Population Stratification: Gm3;5,13,14 in admixed sample of Native Americans of the Pima and Papago tribes
Study Population: 4,290 Pima and Papago Indians
Genetic Variant: Gm3;5,13,14 haplotype (Gm system of human immunoglobulin G)
Outcome: Type 2 diabetes
Question: Is the Gm3;5,13,14 haplotype associated with Type 2 diabetes?
Knowler, AJHG, 1988
Unadjusted for ethnic background: OR = 0.27 (95% CI 0.18-0.40)
Full heritage American Indian population
Gm3;5,13,14: + ~1%, - ~99%
NIDDM prevalence: ~40%
Caucasian population
Gm3;5,13,14: + ~66%, - ~34%
NIDDM prevalence: ~15%
Gm3;5,13,14 haplotype    Cases     Controls
+                        7.80%     29.00%
-                        92.20%    71.00%
Adjusted for ethnic background: OR = 0.83 (95% CI 0.58-1.18)
Index of Indian heritage    Gm3;5,13,14 haplotype    % Diabetes
0                           65.8%                    18.5%
4                           42.1%                    28.5%
8                           1.6%                     39.2%
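A small Python sketch shows how pooling two subgroups can create an association that is absent within each subgroup. It uses the approximate haplotype frequencies and NIDDM prevalences quoted for the two ancestral populations (~1% and ~40%, ~66% and ~15%). Two things are assumptions made purely for illustration: the haplotype and disease are treated as independent within each subgroup, and the subgroups are mixed 50/50. The function name is hypothetical, and the result is not a re-analysis of the study data.

```python
def crude_odds_ratio(strata):
    """Pooled haplotype-disease odds ratio across subgroups.

    strata: list of (weight, haplotype_freq, disease_prevalence).
    Assumes the haplotype and disease are independent within each
    subgroup (hypothetical helper for illustration).
    """
    a = b = c = d = 0.0
    for weight, hap, dis in strata:
        a += weight * hap * dis              # haplotype +, diseased
        b += weight * hap * (1 - dis)        # haplotype +, not diseased
        c += weight * (1 - hap) * dis        # haplotype -, diseased
        d += weight * (1 - hap) * (1 - dis)  # haplotype -, not diseased
    return (a * d) / (b * c)


strata = [
    (0.5, 0.01, 0.40),  # full heritage American Indian (0.5 weight is assumed)
    (0.5, 0.66, 0.15),  # Caucasian (0.5 weight is assumed)
]
pooled_or = crude_odds_ratio(strata)
within_or = crude_odds_ratio(strata[:1])
print(f"within-subgroup OR = {within_or:.2f}, pooled OR = {pooled_or:.2f}")
```

Within each subgroup the odds ratio is 1 by construction, yet the pooled odds ratio falls well below 1. This is the pattern of a strong crude association that weakens once ancestry is taken into account.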
Ancestry Informative Markers
Polymorphisms with known allele frequency differences across ancestral groups
Useful in estimating ancestry in admixed individuals
Example: Duffy locus (codes for blood group)
Allele frequency near 100% in sub-Saharan Africans vs. other groups
Protects against P. vivax (malaria)
Example AIM: Duffy locus
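[Figure: bar chart of allele frequencies (A vs. G) for rs1814778 at the Duffy locus in U.S. Hispanics, Mayan, U.S. Whites, England, Germany, U.S. Blacks, Central Africa, and Nigeria; frequency axis from 0 to 1. Values listed in the chart, in order: 0.15/0.85, 0.99/0.01, 0.77/0.24, 1.00, 1.00, 0.53/0.47, 1.00, 1.00.]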
http://www.ncbi.nlm.nih.gov/projects/SNP
Population Inbreeding
Population inbreeding occurs when mating between close relatives is preferred or when a population is geographically isolated. It causes deviations from HWE by producing a deficit of heterozygotes.
How to quantify the amount of inbreeding in a population?
Inbreeding coefficient, F
The probability that a random individual in the population inherits two copies of the same allele from a common ancestor
F ranges 0 to 1:
F is low in random mating populations
F is close to 1 in self-fertilizing populations (e.g., plants)
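The heterozygote deficit can be made concrete with the standard textbook relationship between F and genotype frequencies, which is not spelled out on the slides: AA = $p^2 + Fpq$, Aa = $2pq(1 - F)$, aa = $q^2 + Fpq$. The Python sketch below uses hypothetical function names. As an arithmetic example it reuses the G/A genotype counts from the HWE test (30 heterozygotes among 150 individuals, p = 230/300). Treating that heterozygote deficit as entirely due to inbreeding is an assumption made only to show the calculation.

```python
def genotype_freqs_with_inbreeding(p, F):
    """Genotype frequencies at a two-allele locus given inbreeding coefficient F.

    Hypothetical helper; uses the standard textbook relationship.
    """
    q = 1 - p
    return {
        "AA": p * p + F * p * q,
        "Aa": 2 * p * q * (1 - F),  # heterozygotes shrink by a factor (1 - F)
        "aa": q * q + F * p * q,
    }


def estimate_F(observed_het, p):
    """Estimate F as 1 - (observed / expected heterozygosity)."""
    expected_het = 2 * p * (1 - p)
    return 1 - observed_het / expected_het


# Counts from the HWE test example: 100 GG, 30 AG, 20 AA
p = 230 / 300
F_hat = estimate_F(30 / 150, p)
freqs = genotype_freqs_with_inbreeding(p, F_hat)
print(f"F estimate = {F_hat:.3f}")
print(freqs)
```

With F = 0 the frequencies reduce to the Hardy-Weinberg proportions, and with F = 1 there are no heterozygotes at all.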
Helgason, Science, 2008
Kinship & Reproduction: Icelandic couples
[Figure panels: # of children; # of children that reproduce; # of grandchildren; mean lifespan of children]