
Page 1: Topic #3 Linkage Disequilibrium, Haplotypes & Tagging University of Wisconsin Genetic Analysis Workshop June 2011.

Topic #3
Linkage Disequilibrium, Haplotypes & Tagging

University of Wisconsin

Genetic Analysis Workshop

June 2011


Overview

• Fate of a new mutation
• Linkage Disequilibrium (LD)
  – Measurement
  – Indirect association
• SNP selection based on LD
  – Haplotypes
  – SNP selection by tagging
• Practical
  – SNP selection using Haploview


Introduction of a Mutation into a Population

[Figure: marker alleles (1 or 2) along each of nine chromosomes; every row in the transcript reads:]

1 2 2 1 1 2 1 1 2 2 1 2 2 1 1 2 2 2 1 2 2 1 1 1 1 2 1 2 2 2 1 2 2 2 1 1 1 2 1


Haplotype Concept

• The sequence 111212 in this location becomes a signature for the chromosome carrying the mutation
• Haplotype – alleles inherited together at linked loci on the same chromosome
• The 111212 haplotype will not be a perfect marker of disease
  – At the time the mutation arose, there may have been other chromosomes with 111212
  – New mutations
  – Recombination


Indirect Association

• Each of the alleles in the 111212 haplotype is also expected to be indirectly associated with carrying the mutation.

• Indirect association is an association of a marker with phenotype that is non-causal, being based on linkage disequilibrium (LD)


Linkage Disequilibrium (LD)

• Mendel’s Second Law: alleles at different loci assort independently

• Linkage Disequilibrium (LD): population-level association of alleles at linked loci


How LD is Measured

LD – population-level association between linked loci

A locus: A1 or A2

B locus: B1 or B2

Let $P(A_1) = p_{A_1}$

Let $P(B_1) = p_{B_1}$

Let $P(A_1B_1) = p_{A_1B_1}$

$D = p_{A_1B_1} - p_{A_1}p_{B_1}$, which equals 0 if the two loci are independent
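The definition of D above can be checked numerically. The following is a minimal Python sketch; the function name and the haplotype frequencies are hypothetical values chosen only for illustration, not data from the workshop.

```python
def ld_coefficient(p_a1b1: float, p_a1: float, p_b1: float) -> float:
    """D = P(A1B1) - P(A1) * P(B1); zero when the two loci are independent."""
    return p_a1b1 - p_a1 * p_b1


# Hypothetical haplotype frequencies (must sum to 1).
hap = {"A1B1": 0.5, "A1B2": 0.1, "A2B1": 0.1, "A2B2": 0.3}

# Allele frequencies are the marginals of the haplotype table.
p_a1 = hap["A1B1"] + hap["A1B2"]  # 0.6
p_b1 = hap["A1B1"] + hap["A2B1"]  # 0.6

d = ld_coefficient(hap["A1B1"], p_a1, p_b1)
print(f"D = {d:.2f}")  # 0.5 - 0.6 * 0.6 = 0.14
```

Under independence the A1B1 haplotype frequency would be 0.36, so the observed excess of 0.14 is the LD between these two hypothetical loci.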


Common LD Measures

• $|D|$
  – Preferred measure for population geneticists
  – Maximum value is bounded by the marginals (the allele frequencies)
• $D’ = |D|/D_{max}$
  – D’ varies between 0 and 1
  – Does not have an easy interpretation; 1.0 is achieved if one off-diagonal cell of the 2×2 haplotype table is zero
• $r^2$ (also written $\Delta^2$) $= D^2/[p(1-p)q(1-q)]$, where p and q are the allele frequencies at the two loci
  – Has several interpretations:
    • = squared (phi) correlation, so lies in [0,1]
    • = $\chi^2/N$
  – Directly related to power for indirect association
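To make the contrast between these measures concrete, here is a small Python sketch that computes D, D’ and r² from the four haplotype counts and compares r² with χ²/N. The function names and the counts are hypothetical, and the sketch assumes both loci are polymorphic (otherwise the r² denominator is zero).

```python
def ld_measures(n11: int, n12: int, n21: int, n22: int):
    """Return (D, D', r2) from counts of haplotypes A1B1, A1B2, A2B1, A2B2."""
    n = n11 + n12 + n21 + n22
    p = (n11 + n12) / n  # frequency of allele A1
    q = (n11 + n21) / n  # frequency of allele B1
    d = n11 / n - p * q
    # Largest |D| the allele frequencies allow, given the sign of D.
    if d >= 0:
        d_max = min(p * (1 - q), (1 - p) * q)
    else:
        d_max = min(p * q, (1 - p) * (1 - q))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p * (1 - p) * q * (1 - q))
    return d, d_prime, r2


def chi_square_2x2(n11: int, n12: int, n21: int, n22: int) -> float:
    """Pearson chi-square (no continuity correction) for the 2x2 haplotype table."""
    n = n11 + n12 + n21 + n22
    numerator = n * (n11 * n22 - n12 * n21) ** 2
    denominator = (n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22)
    return numerator / denominator


# Hypothetical counts with one empty off-diagonal cell (no A1B2 haplotypes).
counts = (50, 0, 30, 20)
d, d_prime, r2 = ld_measures(*counts)
chi2_over_n = chi_square_2x2(*counts) / sum(counts)
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r2 = {r2:.3f}, chi2/N = {chi2_over_n:.3f}")
```

In this example D’ is approximately 1 because one off-diagonal cell is empty, while r² is only about 0.25 and matches χ²/N, which illustrates why the two measures can disagree sharply when allele frequencies differ between loci.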


Allelic Association

• Direct Association
  – Initially it was thought that we could pick the genes and the (single) genetic variant within each gene that was relevant for disease
• Indirect Association
  – The existence of LD opens up the possibility of tests by indirect association – we don’t need to actually test the causal variant but rather need only genotype a marker that is in high LD with the causal variant


Indirect and Direct Allelic Association

Direct Association

[Diagram: disease locus D]

Assess relationship of D locus to phenotype directly – expect D to be a functional polymorphism in a candidate gene

Indirect Association

[Diagram: disease locus D with nearby markers M1, M2, M3]

Assess relationship of D locus indirectly by determining whether markers (Mi) are associated with disease – Mi don’t need to be functional


Martin, E.R. et al. (2000). SNPing away at complex disease … AJHG 67: 383-394


Dawson, E. et al. (2002). A first-generation LD map of 22. Nature 418: 544-547


Population Differences

Weiss, K.M & Clark, A.G. (2002). Trends in Genetics, 18(1):19-24.


Recombination Hotspots

Kauppi, L., Jeffreys, A. J., & Keeney, S. (2004). Where the crossovers are: Recombination distributions in mammals. Nature Reviews Genetics, 5, 413-424

Hotspots typically span 1-2 kb


Haplotype Blocks


Two- and Three-locus Haplotypes

Martin, E.R. et al. (2000). SNPing away at complex disease … AJHG 67: 383-394

APOE locus and haplotypes containing APOE


Two- and Three-locus Haplotypes

Martin, E.R. et al. (2000). SNPing away at complex disease … AJHG 67: 383-394

3-locus haplotype gives a stronger signal than individual markers


SNP Selection by Tagging

• Basic rationale:
  – The power for a causal SNP in a sample of size N is equivalent to the power of a tagging SNP in a sample of size N/r², where r² is the LD between the tagging SNP and the causal SNP
• Tagging SNP selection:
  – Based on some reference sample (HapMap)
  – Two overarching strategies
    • Pairwise tagging
    • Multimarker tagging

de Bakker, P. I. W., et al. (2005). Efficiency and power in genetic association studies. Nature Genetics, 37(11), 1217-1223.
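The N/r² rationale on this slide translates directly into a sample-size inflation factor. A minimal Python sketch follows; the function name and the example numbers are hypothetical and only illustrate the arithmetic.

```python
import math


def tagging_sample_size(n_causal: int, r2: float) -> int:
    """Sample size needed when genotyping a tag SNP instead of the causal SNP.

    n_causal: sample size that would give the desired power at the causal SNP.
    r2: LD (r squared) between the tag SNP and the causal SNP, in (0, 1].
    """
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in the interval (0, 1]")
    return math.ceil(n_causal / r2)


# Hypothetical study designed for N = 1000 at the causal variant.
for r2 in (1.0, 0.8, 0.5):
    print(f"r2 = {r2}: need N = {tagging_sample_size(1000, r2)}")
```

With r² = 0.8 the required sample grows by 25% (1000 to 1250), and with r² = 0.5 it doubles.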


Reference Sample: HapMap (www.hapmap.org)

• HapMap Phase 1:
  – SNP Selection Strategy (yield ~ 1 million):
    • >1 common SNP every 5 kb, total of 1.3 million before QC
    • MAF > .05
    • Some priority for non-synonymous cSNPs
  – Sample: N=270 (269) individuals from 4 populations
    • 30 trios of Europeans from Utah (CEU)
    • 45 unrelated Han Chinese (CHB)
    • 45 unrelated Japanese (JPT)
    • 30 Yoruban trios from Nigeria (YRI)


Reference Sample: HapMap (www.hapmap.org)

• Phase 2:
  – 2.1 million additional SNPs
    • Total now averages ~1 per kb; >98% of common variants within 5 kb
    • Focus still on MAF > .05
    • Average max r² of untyped common SNPs to a typed SNP:

Population   HapMap I   HapMap II
YRI          .67        .90
CEU          .85        .96
CHB+JPT      .83        .95


Reference Sample: HapMap (www.hapmap.org)

• Phase 3:
  – Expand to N=1115 in 11 ancestral groups; 2.1 million additional SNPs

Label  # samples  QC+ Draft 1  Population sample
ASW*   90         71           African ancestry in Southwest USA
CEU*   180        162          Utah residents with Northern and Western European ancestry from the CEPH collection
CHB    90         82           Han Chinese in Beijing, China
CHD    100        70           Chinese in Metropolitan Denver, Colorado
GIH    100        83           Gujarati Indians in Houston, Texas
JPT    91         82           Japanese in Tokyo, Japan
LWK    100        83           Luhya in Webuye, Kenya
MEX*   90         71           Mexican ancestry in Los Angeles, California
MKK*   180        171          Maasai in Kinyawa, Kenya
TSI    100        77           Toscans in Italy
YRI*   180        163          Yoruba in Ibadan, Nigeria
Total  1,301      1,115

* Sample consists of family triples


HAPMAP3, Release 2

[Screenshot: HapMap browser showing the COMT region in NCBI B36, with the phase, release and build labelled]


HapMap Genotyped SNPs in COMT


Using Haploview to Identify Tagging SNPs for COMT

• Download Data from HapMap
  – Choose HapMap Download, Phase 3, and Release 2
  – Choose population
  – Choose chromosome (22) and region (NCBI B36/hg18)
    • Transcription starts at 18309 kb; I will start at 18304 kb
    • Transcription ends at 18337 kb; I will end at 18340 kb
• Haploview Analysis
  – Get LD plot
  – Run Tagger (pairwise)
  – Force include/exclude
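The idea behind pairwise tagging can be sketched in a few lines of Python. This is a simplified greedy illustration, not Haploview's Tagger implementation: the function name, the SNP labels, and the r² values are all hypothetical, and only a "force include" option is modelled.

```python
def greedy_pairwise_tags(snps, r2, threshold=0.8, force_include=()):
    """Choose tag SNPs so that every SNP has r2 >= threshold with at least one tag.

    snps: list of SNP names.
    r2: dict mapping frozenset({snp_a, snp_b}) to pairwise r2 (missing pairs count as 0).
    force_include: SNPs that must be in the tag set (e.g. known functional variants).
    """
    def captured_by(tag):
        # A SNP always captures itself, plus any SNP in high enough LD with it.
        return {s for s in snps
                if s == tag or r2.get(frozenset((s, tag)), 0.0) >= threshold}

    tags = list(force_include)
    untagged = set(snps)
    for tag in tags:
        untagged -= captured_by(tag)

    while untagged:
        # Pick the SNP capturing the most still-untagged SNPs (ties: first in list).
        best = max(snps, key=lambda s: len(captured_by(s) & untagged))
        tags.append(best)
        untagged -= captured_by(best)
    return tags


# Hypothetical region with five SNPs and made-up pairwise r2 values.
snps = ["s1", "s2", "s3", "s4", "s5"]
r2 = {
    frozenset(("s1", "s2")): 0.95,
    frozenset(("s1", "s3")): 0.85,
    frozenset(("s2", "s3")): 0.60,
    frozenset(("s4", "s5")): 0.90,
}
print(greedy_pairwise_tags(snps, r2))                          # ['s1', 's4']
print(greedy_pairwise_tags(snps, r2, force_include=("s2",)))   # ['s2', 's4', 's1']
```

Forcing s2 into the set costs an extra tag here, because s2 does not capture s3 at the 0.8 threshold and s1 is still needed.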


COMT LD Plot (D’)


COMT LD Plot (r²)


COMT Tagging SNPs (15 tag 24 at avg r² = .996)

Tag SNP    bp        Location   MAF  Other SNPs Tagged
rs5748489  18307146  5’         .37  rs1544325
rs4646310  18308806  5’         .21  rs6518591
rs737865   18310121  Intron #1  .32  rs737866, rs2020917, rs8185002
rs174675   18314051  Intron #1  .31  rs933271
rs5993882  18317533  Intron #1  .26
rs5993883  18317638  Intron #1  .41
rs740601   18330763  Intron #2  .45  rs2239393, rs4646312
rs4680     18331271  Exon #4    .47  rs4633
rs4646316  18332132  Intron #5  .25
rs174696   18333176  Intron #5  .18
rs9306235  18335157  Intron #5  .11
rs9332377  18335692  Intron #5  .17  rs165728
rs165824   18339366  Intron #5  .06
rs165815   18339473  3’         .12
rs5993891  18339746  3’         .06


LD Plot Available from SNPInfo (http://manticore.niehs.nih.gov/)


Conclusions

• Alleles at linked loci tend to be inherited together, which produces the population-level association known as linkage disequilibrium (LD)
• Because recombination is not uniform, the genome has a “block-like” structure (haplotype blocks)
• You do not need to have the “causal variant” in your genotyped set if it is adequately tagged
• A major strategy for SNP selection is to ensure adequate coverage (r² > .8) of common genetic variants in a gene, which can be done with Haploview