Lecture 3 l dand_haplotypes_full
lekki-frazier-wood
Introduction
All about my classes
• Lectures are stand-alone: no preparation is needed beyond previous course content.
• Nearly always provide additional resources:
  - Take-home exercise
  - Papers referenced
  - Resources such as other lecture slides
All about me….
All about you….
I always try to orient you to the session
• Go over the theory of linkage disequilibrium and haplotypes
• Calculate linkage disequilibrium by hand
• Relaxing session: the story of HapMap
• Lab: today I will walk you through a hand-holding look at HapMap.
• Each part is ~30 minutes, so please spend extra time familiarizing yourself with HapMap.
I try to give you your learning objectives
• Primary objectives
  • Describe linkage disequilibrium and a haplotype
  • Explain the meaning of r² = 1.0, r² = 0.8 and r² = 0.5
  • Find a region of interest (ROI) on HapMap
  • Locate tag SNPs for an ROI on HapMap
• Secondary objectives
  • Describe how mutations and recombination give rise to linkage disequilibrium and haplotypes
  • Calculate D, D’ and r² by hand
  • List key differences between D, D’ and r²
  • Evaluate the contribution of HapMap to public health genetics
Part 1: Haplotype and Linkage Disequilibrium theory
One source of variation in our DNA occurs through mutation events….
[Figure: a mutation event (A/C) in the ancestral population is carried into the present-day population]
Mutations that proliferate are ‘SNPs’
• Single Nucleotide Polymorphisms
• The most common type of variation in DNA
• Substitution of 1 nucleotide for another
• About 2/3 of SNPs involve C -> T
• Definition is evolving:
  • Old definition: the variant must be seen in at least 1% of the population
• SNPs occur ~ every 300 bp
• Therefore ~10 million SNPs in the human genome
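The ~10 million figure is simple arithmetic. A minimal sketch, assuming a haploid human genome of roughly 3 billion base pairs (an approximation not stated on the slide); `approx_snp_count` is a hypothetical helper:

```python
# Assumption: haploid human genome of roughly 3 billion bp
GENOME_LENGTH_BP = 3_000_000_000
# From the slide: about one SNP every ~300 bp
BP_PER_SNP = 300


def approx_snp_count(genome_length_bp: int, bp_per_snp: int) -> int:
    """Rough SNP count if SNPs occur about once every `bp_per_snp` bases."""
    return genome_length_bp // bp_per_snp


print(f"{approx_snp_count(GENOME_LENGTH_BP, BP_PER_SNP):,}")  # 10,000,000
```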
The number of mutations increases over time
[Figure: a 1st mutation event (A/C) followed by a 2nd mutation event (G/C) at a second locus]
Proliferating SNPs give rise to haplotypes
• A haplotype is “A specific set of DNA variants observed on a single chromosome, or part of a chromosome”
• In practice, usually referring to a set of SNPs within a single gene
Haplotypes:
[Figure: chromosomes in the population carrying combinations of the A/C and G/C SNPs]
Haplotype 1: AG
Haplotype 2: CG
Haplotype 3: CC
Haplotype 4: AC
Resolve the population haplotypes!
• Two SNPs (G/C and A/T) in the sequence C G A C T A G T: GA, CA, GT, CT
• Three SNPs (G/C, A/T and G/T) in the sequence C G A C T A G T A C C A: GAG, CAG, GTG, CTG, GAT, CAT, GTT, CTT
How many possible haplotypes?
• Two SNPs (GA, CA, GT, CT): 2² = 4
• Three SNPs (GAG, CAG, GTG, CTG, GAT, CAT, GTT, CTT): 2³ = 8
In general: 2 (alleles) to the power of n loci, 2ⁿ
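The 2ⁿ rule can be checked by enumerating every combination of alleles. A minimal Python sketch; `possible_haplotypes` is a hypothetical helper, and the loci are the G/C, A/T and G/T SNPs from the example:

```python
from itertools import product


def possible_haplotypes(alleles_per_locus):
    """Every haplotype formed by picking one allele at each locus."""
    return ["".join(combo) for combo in product(*alleles_per_locus)]


two_snps = possible_haplotypes([("G", "C"), ("A", "T")])
three_snps = possible_haplotypes([("G", "C"), ("A", "T"), ("G", "T")])

print(two_snps)         # ['GA', 'GT', 'CA', 'CT']
print(len(three_snps))  # 8, i.e. 2 ** 3
```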
How many haplotypes does a person have for a given chromosomal region?
[Figure: the same region (C G A C T A G T, with SNPs G/C and A/T) on each of a person's two chromosomes]
But what if the person is homozygous at both loci?
[Figure: a person homozygous at both loci (C/C and T/T) carries two identical CT haplotypes]
Haplotype overview
• Method of characterizing variation at more than one locus on a chromosome
• Only 1 allele from each locus
• But as many alleles as there are loci on the chromosome, IF those loci contain variation (SNPs)
• Like SNPs, each person has 2 haplotypes, which (like SNP alleles) may be the same
• The number of possible haplotypes in the population is 2 to the power of n loci (2ⁿ).
Variation in our DNA also occurs through recombination
[Figure: before recombination the population carries haplotypes AG, CG and CC; after recombination a new haplotype, AC, appears]
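A single crossover can be modeled as swapping the alleles on one side of a breakpoint. This is a simplified sketch (one crossover, haplotypes written as strings with one allele per locus, hypothetical `recombine` helper) showing how a crossover between an AG and a CC chromosome produces the new AC haplotype:

```python
def recombine(hap1: str, hap2: str, breakpoint: int) -> tuple[str, str]:
    """Single crossover between two haplotypes (one character per locus).

    Alleles before `breakpoint` stay on their original chromosome;
    alleles from `breakpoint` onward are swapped between the two.
    """
    return hap1[:breakpoint] + hap2[breakpoint:], hap2[:breakpoint] + hap1[breakpoint:]


# Crossover between the two loci of an AG and a CC chromosome
print(recombine("AG", "CC", 1))  # ('AC', 'CG')
```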
The number of recombination events increases over time
Our chromosomes are mosaics…
• The extent and conservation of pieces depends on:
  • Recombination rate
  • Mutation rate
  • Population size
  • Natural selection
What do these mosaics mean for our haplotypes?
Key concept: alleles often co-occur at greater than chance levels
Linkage Disequilibrium (LD)
• The nonrandom association of alleles at different loci
• Equilibrium – when things are ‘in balance’ or as we would expect
• Disequilibrium occurs when a particular allele at one locus is found on the same chromosome as a specific allele at a second locus more often than expected if the loci were segregating independently in the population. The loci are then in disequilibrium – out of balance, or not what we would expect
Linkage disequilibrium is a measurable property
Determined by space and time
Time decreases linkage disequilibrium (more generations allow more recombination)
Space decreases linkage disequilibrium (loci farther apart recombine more often)
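Both effects can be illustrated with a standard population-genetics result that the slides do not state: assuming random mating in a large population, D shrinks by a factor of (1 − c) each generation, where c is the recombination fraction between the two loci (larger for loci farther apart). A minimal sketch with a hypothetical starting D of 0.25 and a hypothetical helper `ld_after_generations`:

```python
def ld_after_generations(d0: float, c: float, generations: int) -> float:
    """D after `generations` of random mating: D_t = (1 - c) ** t * D_0.

    c is the recombination fraction between the two loci (0 <= c <= 0.5);
    loci that are farther apart have a larger c.
    """
    if not 0.0 <= c <= 0.5:
        raise ValueError("recombination fraction must lie between 0 and 0.5")
    return d0 * (1 - c) ** generations


# Hypothetical starting value D_0 = 0.25, tracked over 100 generations
for c in (0.001, 0.01, 0.1):  # from closely linked to more distant loci
    print(f"c = {c}: D after 100 generations = {ld_after_generations(0.25, c, 100):.4f}")
```

More generations (time) and larger c (space) both drive D toward 0.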
Summary of part 1
• Mutations give rise to SNPs
• SNPs give rise to haplotypes
• A haplotype is a specific set of DNA variants
• Recombination patterns lead to linkage disequilibrium
• Linkage disequilibrium is when we see haplotypes more often than expected by chance
Questions before we proceed to calculating LD?
Part 2: Calculating Linkage Disequilibrium
All about Punnett squares…
                 Locus B
                 B       b       Totals
Locus A   A      PAB     PAb     PA
          a      PaB     Pab     Pa
          Totals PB      Pb      1.0
2 loci: A (alleles A/a) and B (alleles B/b). What are our haplotypes?
All about Punnett squares (in LD calculation)…
• Each cell contains the frequency of a haplotype
• Row and column totals contain the frequency of an allele
• The four cells, the row totals and the column totals should each sum to 1.0
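The Punnett-square bookkeeping can be written as a short function. A minimal Python sketch (the name `allele_frequencies` and the argument order P_AB, P_Ab, P_aB, P_ab are illustrative choices, not from the slides): it takes the four haplotype frequencies in the cells and returns the allele frequencies in the row and column totals.

```python
def allele_frequencies(p_AB, p_Ab, p_aB, p_ab):
    """Row and column totals (P_A, P_a, P_B, P_b) of the 2x2 haplotype table."""
    total = p_AB + p_Ab + p_aB + p_ab
    # The four haplotype frequencies should sum to 1.0 (allow rounding error)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"haplotype frequencies sum to {total}, not 1.0")
    p_A = p_AB + p_Ab  # row total for allele A
    p_a = p_aB + p_ab  # row total for allele a
    p_B = p_AB + p_aB  # column total for allele B
    p_b = p_Ab + p_ab  # column total for allele b
    return p_A, p_a, p_B, p_b


print([round(p, 2) for p in allele_frequencies(0.2, 0.5, 0.3, 0.0)])  # [0.7, 0.3, 0.5, 0.5]
```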
Measures of Linkage Disequilibrium
• (A little history lesson)
• Three measures of LD:
  • D
  • D’
  • r²
Measures of Linkage Disequilibrium - D
• 1960 Lewontin & Kojima
• D – unstandardized measure of how far the association between two alleles differs from that expected by chance
Linkage Equilibrium
$P_{AB} = P_A P_B$
Linkage Disequilibrium
$P_{AB} \neq P_A P_B$
$D = P_{AB} - P_A P_B$
Linkage Disequilibrium – an example
Given the following haplotype frequencies, are the alleles in linkage disequilibrium? In other words, what is D?
$P_{AB} = 0.2$, $P_{Ab} = 0.5$, $P_{aB} = 0.3$, $P_{ab} = 0.0$
Step 1: Complete the Punnett square
                 Locus B
                 B       b       Totals
Locus A   A      .2      .5      .7
          a      .3      .0      .3
          Totals .5      .5      1.0
Step 2: Calculate allele frequencies
$P_A = 0.7$, $P_a = 0.3$, $P_B = 0.5$, $P_b = 0.5$
Step 3: Calculate D
$D = P_{AB} - P_A P_B = 0.2 - (0.7 \times 0.5) = -0.15$
Are the alleles in linkage disequilibrium?
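Steps 1 to 3 can be sketched in Python. This is a hypothetical helper (haplotype frequencies passed in the order P_AB, P_Ab, P_aB, P_ab); a D different from 0 indicates that the alleles are in disequilibrium:

```python
def linkage_disequilibrium_d(p_AB, p_Ab, p_aB, p_ab):
    """D = P_AB - P_A * P_B from the four haplotype frequencies."""
    if abs(p_AB + p_Ab + p_aB + p_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1.0")
    p_A = p_AB + p_Ab  # allele frequency of A (row total)
    p_B = p_AB + p_aB  # allele frequency of B (column total)
    expected_AB = p_A * p_B  # P_AB expected under linkage equilibrium
    return p_AB - expected_AB


d = linkage_disequilibrium_d(0.2, 0.5, 0.3, 0.0)
print(round(d, 2))  # -0.15; D != 0, so the alleles are in LD
```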
Measures of Linkage Disequilibrium - D
Problems:
• Sign is arbitrary
• Range depends on allele frequencies
Measures of Linkage Disequilibrium – D’
• 1964 Lewontin
• D’ – standardizes D by the maximum possible value it can take
• $D' = D / D_{\max/\min}$
Step 4: Calculate Dmax/min
$P_A = 0.7$, $P_a = 0.3$, $P_B = 0.5$, $P_b = 0.5$, $D = -0.15$
• Where D is positive: $D_{\max}$ = the lesser of $P_A P_b$ or $P_a P_B$
• Where D is negative: $D_{\min}$ = the larger of $-P_A P_B$ or $-P_a P_b$
What is our Dmax/min?
$D_{\min} = \max(-0.7 \times 0.5,\ -0.3 \times 0.5) = \max(-0.35,\ -0.15) = -0.15$
Step 5: Calculate D’
$D = -0.15$, $D_{\min} = -0.15$
$D' = D / D_{\max/\min}$
$D' = -0.15 / -0.15 = 1$
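Steps 4 and 5 can be combined into one hypothetical helper that follows the rules above: Dmax when D is positive, Dmin when D is negative. A minimal sketch, with haplotype frequencies passed as P_AB, P_Ab, P_aB, P_ab:

```python
def d_prime(p_AB, p_Ab, p_aB, p_ab):
    """Lewontin's D' = D / Dmax (D > 0) or D / Dmin (D < 0)."""
    p_A, p_a = p_AB + p_Ab, p_aB + p_ab  # row totals
    p_B, p_b = p_AB + p_aB, p_Ab + p_ab  # column totals
    d = p_AB - p_A * p_B
    if d > 0:
        d_limit = min(p_A * p_b, p_a * p_B)  # Dmax: the lesser of PA*Pb, Pa*PB
    elif d < 0:
        d_limit = max(-p_A * p_B, -p_a * p_b)  # Dmin: the larger of -PA*PB, -Pa*Pb
    else:
        return 0.0  # D = 0: no disequilibrium to standardize
    return d / d_limit


print(round(d_prime(0.2, 0.5, 0.3, 0.0), 2))  # 1.0
```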
Measures of Linkage Disequilibrium – D’
• D’ = ±1 = complete LD
  • No evidence for recombination
  • Ancestral haplotype not disrupted
Problems
• D’ is inflated in small samples (small N)
• D’ is inflated with rare alleles
• D’ gives no information on allele frequency
Measures of Linkage Disequilibrium – r2
• 1968 Hill & Robertson
• r² = square of the correlation coefficient between alleles at 2 loci
Step 6: Calculate r²
$P_A = 0.7$, $P_a = 0.3$, $P_B = 0.5$, $P_b = 0.5$, $D = -0.15$
$r^2 = \frac{D^2}{P_A P_a P_B P_b}$
$r^2 = \frac{(-0.15)^2}{0.7 \times 0.3 \times 0.5 \times 0.5} = \frac{0.0225}{0.0525} \approx 0.43$
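The r² calculation as a hypothetical helper, using the same argument order (P_AB, P_Ab, P_aB, P_ab). A minimal sketch; it refuses to divide by zero when one locus is monomorphic, since r² is undefined there:

```python
def r_squared(p_AB, p_Ab, p_aB, p_ab):
    """Hill & Robertson's r^2 = D^2 / (PA * Pa * PB * Pb)."""
    p_A, p_a = p_AB + p_Ab, p_aB + p_ab  # row totals
    p_B, p_b = p_AB + p_aB, p_Ab + p_ab  # column totals
    d = p_AB - p_A * p_B
    denominator = p_A * p_a * p_B * p_b
    if denominator == 0:
        raise ValueError("r^2 is undefined when a locus has only one allele")
    return d ** 2 / denominator


print(round(r_squared(0.2, 0.5, 0.3, 0.0), 2))  # 0.43
```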
Measures of Linkage Disequilibrium – r2
• r² ranges from 0 to 1
• r² = 1: the two markers give identical information
Problems
What can we learn from our 3 measures of LD?
• D = -0.15
• D’ = 1.0
• r² = 0.43
D’ vs r²
• Both are measures of association, with 1 being the maximum and indicating the most LD
• BUT r² can only reach 1 when the allele frequencies at the two loci are equal.
Perfect LD
• Equal allele frequency
• Allelic association is as strong as possible
  – 2 haplotypes observed
  – No detected recombination between SNPs
D’ = 1, r² = 1
Complete LD
• Unequal allele frequency
  – 3 haplotypes observed
  – No detected recombination between SNPs
D’ = 1, r² < 1
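The contrast between perfect and complete LD can be reproduced numerically. A minimal sketch with a hypothetical `ld_measures` helper: the "perfect" frequencies are made up (only AB and ab observed, at 0.5 each), while the "complete" case reuses the lecture example, where three haplotypes are observed because P_ab = 0.

```python
def ld_measures(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D', r^2) for two biallelic loci from haplotype frequencies."""
    p_A, p_a = p_AB + p_Ab, p_aB + p_ab  # row totals
    p_B, p_b = p_AB + p_aB, p_Ab + p_ab  # column totals
    d = p_AB - p_A * p_B
    if d > 0:
        d_limit = min(p_A * p_b, p_a * p_B)  # Dmax
    elif d < 0:
        d_limit = max(-p_A * p_B, -p_a * p_b)  # Dmin
    else:
        d_limit = None  # D = 0
    d_prime = d / d_limit if d_limit else 0.0
    r2 = d ** 2 / (p_A * p_a * p_B * p_b)
    return d, d_prime, r2


# Hypothetical perfect LD: only AB and ab observed, equal allele frequencies
perfect = ld_measures(0.5, 0.0, 0.0, 0.5)
# Complete LD: the lecture example, three haplotypes observed (P_ab = 0)
complete = ld_measures(0.2, 0.5, 0.3, 0.0)

for label, (d, dp, r2) in (("perfect", perfect), ("complete", complete)):
    print(f"{label}: D = {d:.2f}, D' = {dp:.2f}, r^2 = {r2:.2f}")
```

Both cases give D’ = 1, but only the equal-frequency case reaches r² = 1.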
Calculate your own linkage disequilibrium measures D, D’ and r²:
$P_{AB} = 0.6$, $P_{Ab} = 0.1$, $P_{aB} = 0.2$, $P_{ab} = 0.1$
At the end of the day…..
Linkage disequilibrium is the non-random association of markers (SNPs) at two or more loci
…But what does this mean for applying genetics to public health? (Finally we get there…)
Part 3: Using LD in genetic studies: the HapMap consortium
The Human Genome Project
dbSNP
Cystic Fibrosis
Inflammatory bowel disease
• Likely had many causal variants
• Heritable: MZ > DZ twin concordance
• 10% of those with IBD had 1 relative with IBD
• Reasonable linkage signal on Chr 5
• What could explain this structure?
Inflammatory bowel disease
5q31
8 SNPs: GGACAACCAATTCGGG
Haplotype Map
• Adds information on diversity to the Human Genome Project
• How did HapMap and Human genome project differ?
• ‘Chunks’ of data
“Short cuts”
[Figure: SNP genotypes across a short region]
SNPs 1, 3 and 4 are tag SNPs
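The idea behind these "short cuts" can be sketched as a simple greedy pairwise tagger. This is an illustrative simplification, not the Tagger or Haploview algorithm: it estimates r² between SNPs from phased haplotype strings (one character per SNP) and keeps picking the SNP that captures the most still-untagged SNPs at r² ≥ 0.8. The function names and haplotypes are hypothetical.

```python
from itertools import combinations


def pairwise_r2(haplotypes, i, j):
    """r^2 between SNPs i and j, estimated from phased haplotype strings."""
    n = len(haplotypes)
    allele_i = haplotypes[0][i]  # call the first haplotype's allele at SNP i "A"
    allele_j = haplotypes[0][j]  # and its allele at SNP j "B"
    p_A = sum(h[i] == allele_i for h in haplotypes) / n
    p_B = sum(h[j] == allele_j for h in haplotypes) / n
    p_AB = sum(h[i] == allele_i and h[j] == allele_j for h in haplotypes) / n
    denominator = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denominator == 0:  # a monomorphic SNP carries no LD information
        return 0.0
    return (p_AB - p_A * p_B) ** 2 / denominator


def greedy_tag_snps(haplotypes, threshold=0.8):
    """Pick SNPs until every SNP has r^2 >= threshold with at least one tag."""
    n_snps = len(haplotypes[0])
    covers = {s: {s} for s in range(n_snps)}  # every SNP tags itself
    for i, j in combinations(range(n_snps), 2):
        if pairwise_r2(haplotypes, i, j) >= threshold:
            covers[i].add(j)
            covers[j].add(i)
    untagged, tags = set(range(n_snps)), []
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs
        best = max(range(n_snps), key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return sorted(tags)


# Hypothetical phased haplotypes over 4 SNPs (one character per SNP)
haplotypes = ["AATT", "GGCC", "AACC", "GGTT"]
print(greedy_tag_snps(haplotypes))  # [0, 2]
```

With these 0-based indices, SNPs 0 and 1 carry identical information, as do SNPs 2 and 3, so one tag from each pair is enough.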
HapMap
• Launched in 2001
• Open-access resource for all researchers
• Data released in real time
• Spin-off from the Human Genome Project
• Qu: What was the key difference between the HGP and HapMap?
  • Characterizes LD across the genome
  • Also develops analytic tools
    • Haploview
HapMap
“The success of the HapMap will be measured in terms of the genetic discoveries enabled, and improved knowledge of disease aetiology.”
HapMap
Mark Daly: “The community’s response after a number of years of struggling and to not finding genetic factors for complex disease”.
HapMap – Phase I
• Launched in 2001; production 2002-3
• Phase I
  • Not comprehensive
  • 90 Yoruba individuals
  • 90 individuals of European descent
  • 45 Han Chinese
  • 45 Japanese
  • 1,000,000 SNPs
HapMap – Phase I
[Figure: minor allele frequency of Phase I SNPs]
HapMap – Phase I
• Released in 2005
• 1 million SNPs
• August 2006: “dbSNP included more than ten million SNPs, and more than 40% of them were known to be polymorphic. By comparison, at the start of the project, fewer than 3 million SNPs were identified, and no more than 10% of them were known to be polymorphic.”
HapMap – an LD plot
HapMap – Phase I
Recombination hotspots are widespread and account for LD structure
HapMap – Phase I
Tagger
Table 7: Number of selected tag SNPs to capture all observed common SNPs in the Phase I HapMap for the three analysis panels using pairwise tagging at different r² thresholds

Pairwise threshold    YRI        CEU        CHB+JPT
r² ≥ 0.5              324,865    178,501    159,029
r² ≥ 0.8              474,409    293,835    259,779
r² = 1                604,886    447,579    434,476
Will tag SNPs picked from HapMap apply to other population samples?
Population differences add very little inefficiency (stolen slide from ASHG… I can’t source this)
[Figure: CEU tag SNPs compared against samples of whites from Los Angeles, CA and from Botnia, Finland]
CEU: Utah residents with European ancestry (CEPH)
HapMap – Phases II and III
• Phase II
  • >3.1 million genetic variants
  • Captured 90 to 96 percent of common genetic variation
• Phase III
  • 1,301 samples from 11 populations
HapMap and Public Health
• How has HapMap helped us in the quest to find genes for disorders?
What is next for HapMap?
• 1,000 Genomes Project
Part 4: HapMap Practical
Goals of this lab
Part 1
1. Find HapMap SNPs near a gene.
2. View patterns of LD amongst the SNPs.
3. Select tag SNPs.
4. Download information on the SNPs for use in Haploview.
5. Evaluate genotype data in a paper against HapMap data.
Part 2
6. Make a file from data for use in Haploview.
Data origin
Goals of this lab
Part 1
1. Find HapMap SNPs near a gene.
> Navigate to HapMap
> Using release #27 (Phase 3), locate the LRP1 gene (hint: it is a landmark).
> Answer questions 1-3
1. Go to hapmap.ncbi.nlm.nih.gov
2. Select release #27, Phase 3
3. Put LRP1 in the search box
5. Look at the information
6. Turn different tracks on and off
(Don’t forget ‘update image’)
7. Count the genotyped SNPs
8. Create an LD plot
9. Choose tag SNPs
10. Download the LRP1 data
11. Open the data in Haploview and answer questions 4-7
Slide graveyard
4. Look at the different PPARγ
[Figure: the sequence C _ G A C T A _ G T A C C _ A with three SNPs (T/C, A/G and G/T) gives 8 possible haplotypes: TAG, TAT, TGG, TGT, CAG, CAT, CGG, CGT; only 3 of these are observed in the population]
1. Information about our population
• Factors that influence linkage disequilibrium:
  • Genetic drift
  • Mutation
  • Founder effects
  • Selection
  • Stratification
• Factors that maintain linkage disequilibrium:
  • Selection
  • Non-random mating
  • Linkage
• Mainstay of ‘population genetics’
2. Interpretation of our findings
• Genetic association is correlational; therefore, we cannot make causal inferences
  • SNP1 -> Trait
  • SNP1 and SNP2 are in LD
  • We don’t know which is the true causal variant
Linkage Disequilibrium coefficient D
$P_{AB} \neq P_A P_B$
$D_{AB} = P_{AB} - P_A P_B$
$P_{AB} = P_A P_B + D_{AB}$
Problems:
• Sign is arbitrary
• Range depends on allele frequencies
Q: Why are these problems for applied genetics in public health?
Calculating Linkage Equilibrium
                 Locus B
                 B       b       Totals
Locus A   A      PAB     PAb     PA
          a      PaB     Pab     Pa
          Totals PB      Pb      1.0
Linkage Equilibrium
$P_{AB} = P_A P_B$
$P_{Ab} = P_A P_b = P_A (1 - P_B)$
$P_{aB} = P_a P_B = (1 - P_A) P_B$
$P_{ab} = P_a P_b = (1 - P_A)(1 - P_B)$
S.M. Bray, J.G. Mulle, A.F. Dodd, A.E. Pulver, S. Wooding and S.T. Warren. Signatures of founder effects, admixture and selection in the Ashkenazi Jewish population. PNAS Early Edition (2010).
8 possible haplotypes:
Haplotype 1: C T G A C T A A G T A C C G A
Haplotype 2: C T G A C T A A G T A C C T A
Haplotype 3: C T G A C T A G G T A C C G A
Haplotype 4: C T G A C T A G G T A C C T A
Haplotype 5: C C G A C T A A G T A C C G A
Haplotype 6: C C G A C T A A G T A C C T A
Haplotype 7: C C G A C T A G G T A C C G A
Haplotype 8: C C G A C T A G G T A C C T A
Then we get recombination
[Figure: before recombination, haplotypes AG, CG and CC; after recombination, a new AC haplotype]
[Figure: the 8 possible haplotypes of the three-SNP sequence, from ancestor to present day]
Recombination on an individual level
Measures of Linkage Disequilibrium - D
• At a single locus with alleles A and a: $P_A = 1 - P_a$
Refresher
• Recombination
Sources of variation in our DNA
New Concept – Linkage Disequilibrium
• Linkage Disequilibrium is the tendency for 2 (or more) SNPs to be inherited together
• AATAAGCCTGATC
• ATTAAGCCTGATC
• AATTAGCCTGATC
• ATTAAGGCTGATC
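Finding the SNPs in a set of aligned sequences is a column-by-column comparison. A minimal sketch using the four example sequences (`variable_sites` is a hypothetical helper; positions are 0-based), which also reads off each sequence's haplotype at those sites:

```python
def variable_sites(sequences):
    """0-based positions where aligned sequences do not all share the same base."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    return [pos for pos in range(length) if len({s[pos] for s in sequences}) > 1]


seqs = ["AATAAGCCTGATC", "ATTAAGCCTGATC", "AATTAGCCTGATC", "ATTAAGGCTGATC"]
sites = variable_sites(seqs)
# Each sequence's alleles at the variable sites form its haplotype
haplotypes = ["".join(s[pos] for pos in sites) for s in seqs]
print(sites, haplotypes)  # [1, 3, 6] ['AAC', 'TAC', 'ATC', 'TAG']
```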
Why is this important?
• Allows us to genotype only certain SNPs across the genome…
• …so we can infer more than we type
Haplotype
• Inheritance of a cluster of SNPs
• “Haploid” + “Genotype”