Page 1: EPI293 Design and analysis of gene association studies Winter Term 2008 Lecture 7: Genome-wide association scans Peter Kraft pkraft@hsph.harvard.edu Bldg.

EPI293: Design and analysis of gene association studies

Winter Term 2008

Lecture 7: Genome-wide association scans

Peter Kraft

pkraft@hsph.harvard.edu
Bldg 2 Rm 206

2-4271

Page 2

[Figure: timeline with axis ticks at 1900, 1920, 1940, 1960, 1980, 1990, 2000, 2005, 2006, 2007]

Events marked on the timeline, in roughly chronological order:

• Rediscovery of Mendel’s laws
• Principles of linkage analysis discovered
• Association between blood groups and malignant disease published
• Association between blood groups and malignant disease fails to replicate
• RFLPs available for linkage analysis
• Microsatellite maps for genome-wide linkage analysis developed
• Human Genome Project launched
• Risch and Merikangas paper
• Human Genome Project working draft completed; beginnings of SNP map
• HapMap launched
• First genome-wide association study
• HapMap Phase I completed (draft Phase II available)
• Genome-wide SNP panels developed
Page 3

Page 4

5 December 2007 pkraft@hsph.harvard.edu

Linkage Analysis

[Figure: pedigree diagram with marker genotypes, by row: A A, B B, C C, C C / A B, C C / A C, A C, B C, B C, A C]

Page 5

[Figure: case-control illustration. Sixteen subjects are labeled GG, Gg, or gg (5 GG, 6 Gg, 5 gg). A summary table of genotype counts by case/control status shows 3 Gg subjects in each group, and counts of 4 and 1 (cases) and 1 and 4 (controls) for the two homozygous genotypes.]

Page 6

Linkage vs. Association

• Linkage studies
  – Pro: can scan genome with fewer markers
  – Cons: can only detect alleles with large effect; limited resolution (identify broad region, not individual genes); requires data on multiple family members
• Association studies
  – Pros: can detect subtle effects; very fine resolution
  – Cons: requires 0.5 to 1 million markers to cover whole genome; requires large sample size

Page 7

Risch and Merikangas (1996) Science 273:1516-7

Page 8

Schlötterer C. Nat Rev Genet. 2004;5:63-9.

Page 9

Page 10

Published Genome-Wide Association Scans

• Ozaki K. Myocardial infarction. Nat Genet 2002;32:650–4.
• Klein RJ. Age-related macular degeneration. Science 2005;308:385–9.
• Maraganore DM. Parkinson disease. Am J Hum Genet 2005;77:685–93.
• Shiffman D. Myocardial infarction. Am J Hum Genet 2005;77:596–605.
• Cheung VG. Gene expression. Nature 2005;437:1365-9.
• Stranger BE. Gene expression. PLOS Genet 2005;1:695-704.
• Mah S. Schizophrenia. Mol Psychiatry 2006;11:471-8.
• Herbert A. Obesity. Science 2006;312:279-83.

Reviews

• Hirschhorn J. Nat Reviews Genet 2005;6:95-108.
• Wang WY. Nat Reviews Genet 2005;6:109-18.
• Thomas DC. Am J Hum Genet 2005;77:337-45.
• Thomas DC. Cancer Epidemiol Biomarkers Prev 2006;15:595-8.
• Evans DM. Trends in Genetics 2006 (epub)

OLD SLIDE!!!!

Page 11

Klein RJ Science 2005;308:385–9

• 96 cases, 50 controls
• 103,611 SNPs
• rs380390: recessive OR 7.4 (2.9-19)
• PAR (70%)
• Genotyping errors
• Functionality
• Replication: Science 2005;308:421–4; Science 2005;308:419–21

Page 12

Tier | Subjects | SNPs
Tier 1 | 443 sib pairs | 198,000 SNPs
Tier 2 | 332 matched unrelated case-control pairs | 3,148 SNPs

No SNPs pass the Bonferroni-corrected significance threshold (2.5 × 10⁻⁷).

Maraganore Am J Hum Genet 2005;77:685-93
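The Bonferroni threshold quoted on this slide can be checked with one line of arithmetic. The sketch below assumes a family-wise error rate of 0.05 divided evenly over the 198,000 Tier 1 SNPs; the function name is illustrative and not taken from the lecture.

```python
def bonferroni_threshold(n_tests: int, fwer: float = 0.05) -> float:
    """Per-test significance level that keeps the family-wise error rate at `fwer`."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return fwer / n_tests


# Assumed inputs: 198,000 Tier 1 SNPs and a 0.05 family-wise error rate.
tier1_threshold = bonferroni_threshold(198_000)
print(f"{tier1_threshold:.2e}")  # 2.53e-07, i.e. roughly 2.5 x 10^-7
```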

Page 13

Page 14

Known Breast Cancer Genes, November 2006

Known Prostate Cancer Genes, November 2006

Page 15

Known Breast Cancer Genes, Fall 2007

Known Prostate Cancer Genes, Fall 2007

Page 16

Kraft and Cox 2008 in: Rao and Gu, eds.

Page 17

Outline

• Power issues
  – Tagging efficiency of genome-wide panels
  – Multi-stage design and analysis
• Design issues
• Analytic issues
  – Imputation
• CGEMS examples

Page 19

Page 20

Page 21

[Figure: schematic relating known and unknown variants, with links labeled r²]

Page 22

Barrett JC. Nat Genet 2006;38:659-62.
Pe’er I. Nat Genet 2006;38:663-7.

Page 23

International HapMap Consortium. Nature. 2007 Oct 18;449(7164):851-61

Page 24

Page 25

Distribution of max r² with tag panel as a function of MAF

[Figure: stacked bar chart. x-axis: MAF bins (< 5%, 5-12.5%, 12.5-25%, 25-37.5%, 37.5-50%); y-axis: 0.00 to 1.00; legend (max r² bins): 0, .01-.30, .31-.60, .61-.80, .81-.90, .90-1.00]

Tags chosen from a “pseudo Phase II HapMap” and evaluated against ENCODE SNPs

Page 26

The fundamental theorem of the HapMap

A study that genotypes N cases and N controls at a marker that has a correlation of r² with a disease susceptibility locus has the same power as a study that genotypes N* = r²N cases and N* controls at the disease susceptibility locus.

Power adjusting for tagging efficiency

$\overline{\mathrm{Pow}}(N) = \int \mathrm{Pow}(r^2 N)\, f(r^2)\, dr^2$, where $f$ is the distribution of the maximum r² between a variant and the tag panel.

Pritchard JK. Am J Hum Genet 2001;69:1-14.
Jorgenson Am J Hum Genet 2006;78:884-8.
Terwilliger JD Eur J Hum Genet 2006;14:426-37.
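A minimal sketch of the sample-size rescaling, assuming a two-sided allelic test with a normal approximation and a per-allele odds ratio. The power formula, the function names, and the r² distribution `R2_DIST` are illustrative assumptions and are not taken from the lecture or the cited papers.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def allelic_power(n_cases, maf, odds_ratio, alpha):
    """Approximate power of a two-sided allelic test, n_cases cases and as many controls.

    Assumes a per-allele odds ratio and a control allele frequency equal to `maf`.
    """
    if n_cases <= 0:
        return alpha  # no information: rejection rate equals the type I error
    p0 = maf
    odds1 = odds_ratio * p0 / (1 - p0)
    p1 = odds1 / (1 + odds1)  # allele frequency in cases
    se = sqrt((p0 * (1 - p0) + p1 * (1 - p1)) / (2 * n_cases))  # 2N alleles per group
    delta = abs(p1 - p0) / se
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(delta - z_crit) + STD_NORMAL.cdf(-delta - z_crit)


def indirect_power(n_cases, r2, maf, odds_ratio, alpha):
    """Power at a tag SNP: same test with the effective sample size r2 * N."""
    return allelic_power(r2 * n_cases, maf, odds_ratio, alpha)


def averaged_power(n_cases, r2_dist, maf, odds_ratio, alpha):
    """Average indirect power over a discrete distribution {r2 value: probability}."""
    return sum(w * indirect_power(n_cases, r2, maf, odds_ratio, alpha)
               for r2, w in r2_dist.items())


# Hypothetical distribution of max r2 with the tag panel (weights sum to 1).
R2_DIST = {1.0: 0.5, 0.8: 0.3, 0.5: 0.15, 0.0: 0.05}

direct = allelic_power(5000, 0.10, 1.3, 1e-7)
fixed80 = indirect_power(5000, 0.8, 0.10, 1.3, 1e-7)
averaged = averaged_power(5000, R2_DIST, 0.10, 1.3, 1e-7)
print(direct, fixed80, averaged)
```

Averaging over the r² distribution, rather than fixing r² at a single value, accounts for the variants that the panel tags poorly or not at all.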

Page 27

[Figure: power versus sample size (cases) in a 3 × 3 grid of panels. Columns: OR=1.3, OR=1.5, OR=1.8; rows: MAF=.01, MAF=.05, MAF=.10. Curves: direct; indirect (averaged over r²); indirect (r² fixed at 80%).]

Page 28

Outline

• Power issues
  – Tagging efficiency of genome-wide panels
  – Multi-stage design and analysis
• Design issues
• Analytic issues
  – Imputation
• CGEMS examples

Page 29

[Figure: SNPs-by-subjects schematic of a multi-stage design, with stage test statistics T1, T2, T3]

Page 30

Power of multi-stage designs

Replication analysis

Power = $\Pr(T_1 > k_1, \ldots, T_S > k_S) = \Pr(T_1 > k_1) \cdots \Pr(T_S > k_S)$

$k_s = \mathrm{Quantile}(1 - m_{s+1}/m_s)$

Joint analysis

Power = $\Pr(T_1^* > k_1^*, \ldots, T_S^* > k_S^*)$

$T_s^* = \sum_{i=1}^{s} T_i$

$k_s^*$ chosen s.t. the expected number of markers (under the null) taken to the (s+1)st stage is $m_{s+1}$

$m_{S+1}$ is the number of expected false leads (under the null) at the end of the Sth stage (e.g. $m_{S+1} = .05$ is strong control of FWER at α = .05)

Skol. Nat Genet 2006;38:209-13; Wang Genet Epidemiol 2006;30:356-68; Kraft (in prep)
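The replication-analysis formula can be sketched numerically. The code below assumes one-sided tests and stage statistics that are independent and normal with unit variance and mean `effect * sqrt(N_s)`. The effect size, marker counts, and subject counts are hypothetical inputs, and the function names are not from the cited papers.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def stage_thresholds(markers):
    """markers = [m_1, ..., m_S, m_{S+1}]; returns [(alpha_s, k_s), ...].

    alpha_s = m_{s+1} / m_s is the fraction of null markers carried forward,
    and k_s is the matching upper quantile of a standard normal statistic.
    """
    thresholds = []
    for m_now, m_next in zip(markers, markers[1:]):
        alpha = m_next / m_now
        thresholds.append((alpha, STD_NORMAL.inv_cdf(1 - alpha)))
    return thresholds


def replication_power(markers, subjects, effect):
    """Product of per-stage powers Pr(T_1 > k_1) * ... * Pr(T_S > k_S)."""
    power = 1.0
    for (_, k), n in zip(stage_thresholds(markers), subjects):
        power *= 1 - STD_NORMAL.cdf(k - effect * sqrt(n))
    return power


# Hypothetical three-stage design: markers tested per stage, then m_{S+1} = 5.
markers = [500_000, 20_000, 1_500, 5]
subjects = [2_400, 3_000, 3_000]
thresholds = stage_thresholds(markers)
power = replication_power(markers, subjects, effect=0.05)
print(thresholds, power)
```

A joint analysis would instead compare cumulative statistics against thresholds chosen for the correlated sequence, which needs a multivariate normal calculation and is not attempted here.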

Page 31

Multistage Design and Analysis

• It is (or should be) well known that “replication analysis” is statistically inefficient [cf. Thomas DC et al (1985) AJE, Skol (2006) Nat Genet]

• Usually you can find a multistage design that has almost the same power as a single-stage design but is much cheaper

• Multi-stage design is NOT a way of finessing the multiple testing issue. If genotypes were free, you would genotype everybody for every SNP and test all SNPs at a very, very small alpha level.

• Multi-stage design IS a way of saving big $s, ₤s, €s, etc.

Page 32

Amount of savings and cheapest design depend on prices—which are very fluid!

Page 33

Calculating power for “replication analysis”

Stage | Number of markers | Number of subjects | Effective level | Power
1 | M1 | N1 | α1 = M2/M1 | P1 = 1 − β(q, OR, r, N1, α1)
2 | M2 | N2 | α2 = M3/M2 | P2 = 1 − β(q, OR, r, N2, α2)
k | Mk | Nk | αk = Mk+1/Mk | Pk = 1 − β(q, OR, r, Nk, αk)
Overall | | | | Πi=1..k Pi

Mk+1 is “number of significant tests expected under the null”

E.g. Mk+1 = .05 is the Bonferroni-corrected threshold for M1 tests

Page 34

Calculating power for “replication analysis”

q=10%; dominant OR=1.4; M4=5

Stage | Number of markers | Number of subjects | Effective level | Power
1 | 500,000 | 2,400 (1:1 case:control) | α1 = .003 | .883
2 | 1,500 | 6,000 | α2 = .003 | .999
Overall | | | | .882

Cost: ca. USD 700 × 2,400 + USD 60 × 6,000 = USD 2.04 million

Page 35

Calculating power for “replication analysis”

q=10%; dominant OR=1.4; M4=5

Stage | Number of markers | Number of subjects | Effective level | Power
1 | 500,000 | 2,400 (1:1 case:control) | α1 = .04 | .999
2 | 20,000 | 3,000 | α2 = .075 | .998
3 | 1,500 | 3,000 | α3 = .003 | .950
Overall | | | | .946

Cost: ca. USD 700 × 2,400 + USD 200 × 3,000 + USD 60 × 3,000 = USD 2.46 million

Two-stage study with equivalent power costs > 2.8 million
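The cost and overall-power figures for the two-stage and three-stage designs on these slides follow from simple arithmetic. A short sketch, using the per-subject prices, subject counts, and stage powers quoted on the slides; the function names are illustrative.

```python
from math import prod


def design_cost(stages):
    """Total genotyping cost in USD; each stage is (cost_per_subject, n_subjects)."""
    return sum(cost * n for cost, n in stages)


def overall_power(stage_powers):
    """Replication-analysis power: the product of the per-stage powers."""
    return prod(stage_powers)


two_stage_cost = design_cost([(700, 2_400), (60, 6_000)])
three_stage_cost = design_cost([(700, 2_400), (200, 3_000), (60, 3_000)])
two_stage_power = overall_power([0.883, 0.999])
three_stage_power = overall_power([0.999, 0.998, 0.950])

print(two_stage_cost, three_stage_cost)  # 2040000 2460000
print(round(two_stage_power, 3), round(three_stage_power, 3))  # 0.882 0.947
```

The product of the rounded three-stage powers is about .947, slightly above the .946 shown on the slide, presumably because the slide multiplied unrounded values.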

Page 36

Three different per-SNP pricing scenarios considered

Nsnp | A | B | C
20000 | 5.9 | 17.7 | 1
1500 | 35.8 | 107.4 | 1
20 | 91.7 | 275.1 | 1

Prices relative to per-SNP costs for whole-genome platform

Page 37

Pricing scheme A; cost relative to single stage study using 7,000 subjects

Page 38

Relative costs for studies with 65% power

Page 39

Power for single stage studies, accounting for tagging efficiency

Page 40

Power for three stage studies, accounting for tagging efficiency

[Figure: power versus relative cost, one panel each for Illumina 550, Affy 500, and Affy 1,000]

Page 41

Illumina 550 Affy 500 Affy 1,000

(Simulated) tagging properties of three panels

Page 42

How to select SNPs for 2nd Stage?

• Rank by increasing p-value
  – But recall, prob. of being false positive depends not only on p-value, but also on power and prior
    • Hence Bayesian alternatives [WTCCC, Wakefield 2007 Am J Hum Genet]
    • Quasi-Bayesian FPRP [Wacholder et al 2004; Samani 2007 NEJM]
    • Prior-weighted analyses [Roeder 2007 Genet Epidemiol, Lewinger 2007 Genet Epidemiol]
    • Pragmatist: meh, no big difference in practice
• What about multiple SNPs in high LD?
  – Cull so as to interrogate as many regions as possible (“broad” follow up), or retain to try and distinguish causal variant (“deep” follow up)?
• Can I improve coverage by genotyping more SNPs around “hits”?
  – Again: “deep” coverage
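The dependence of a false positive on p-value threshold, power, and prior can be made concrete with the false positive report probability (FPRP) of Wacholder et al. A minimal sketch; the function name is illustrative and the inputs in the example are hypothetical, chosen only to show the effect of a low prior.

```python
def fprp(alpha: float, power: float, prior: float) -> float:
    """False positive report probability.

    Probability that there is no true association given a result significant
    at level `alpha`, for a test with the stated `power` and a `prior`
    probability that the association is real.
    """
    false_positive = alpha * (1 - prior)
    true_positive = power * prior
    return false_positive / (false_positive + true_positive)


# Hypothetical inputs: the same alpha and power under two different priors.
candidate_gene = fprp(alpha=1e-4, power=0.5, prior=0.01)
random_snp = fprp(alpha=1e-4, power=0.5, prior=1e-5)
print(candidate_gene, random_snp)
```

With these inputs the same p-value threshold yields a far less credible finding when the prior is that of an arbitrary SNP than when it is that of a well-motivated candidate.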

Page 43

“broad” / “deep” defined

[Figure: two schematics, labeled “broad” follow-up and “deep” follow-up]

Page 44

Thought Experiment

• Two kinds of GWAS products
  – Tagging: captures HapMap II at r²>80%
  – Random: has density of Affy 500k
• Choose additional SNPs in 2nd stage so that you tag region spanning “hit” in HapMap II at >95%
• Does this increase your power over simply genotyping the top hit?

Page 45

[Figure: results for the Tagging Panel and the Random Panel. For each panel: a stacked bar chart of max r² bins (0, .01-.30, .31-.60, .61-.80, .81-.90, .90-1.00) by MAF bin (< 5%, 5-12.5%, 12.5-25%, 25-37.5%, 37.5-50%), and a scatter plot of R² with the denser map against R² with the initial map, by MAF bin. Annotations (# markers per region): 3.22 × markers; 1.46 × markers.]

Page 46

[Figure: maximum power for budget versus cost, one panel each for the Tagging Panel and the Random Panel. Curves: Broad, Deep; reference: power of one-stage design.]

Two-stage designs; OR=1.3, MAF=.10; 7,000 cases/controls

Page 47

“deep” follow-up “broad” follow-up

Am J Hum Genet 2007

Page 48

Very small gain in power from fine mapping (= deep follow-up). Is it worth the opportunity cost? Genotyping a lot of extra markers to “fine map” null loci means you will miss the chance to replicate the true signals that happened to be lower on your list.

Page 49

Power calculations

http://www.sph.umich.edu/csg/abecasis/CaTS/

Page 50

Page 51

http://www.hsph.harvard.edu/faculty/kraft/soft.htm

Page 52

Outline

• Power issues
  – Tagging efficiency of genome-wide panels
  – Multi-stage design and analysis
• Design issues
• Analytic issues
  – Imputation
• CGEMS examples

Page 53

• Subject selection
• Flexible but simple analysis
  – (Multistage design may limit analysis options)
• Sample heterogeneity across stages
• Data QC
• Population stratification
• Bioinformatics
• Data sharing, scientific replication, and validation

The design of genome-wide association studies is an art of the possible.

Page 54

Outline

• Power issues
  – Tagging efficiency of genome-wide panels
  – Multi-stage design and analysis
• Design issues
• Analytic issues
  – Imputation
• CGEMS examples

Page 55

Analytic issues

• Multiple comparisons
• Phenotypic / Genetic heterogeneity
• Epistasis
• Incorporating external information
• Imputation

Page 56

BPC3-1 A T A A C
BPC3-2 A T G A T
BPC3-3 C T A T
BPC3-4 C C C
BPC3-5 A T G A C
BPC3-6 C G A T
BPC3-7 A T A T
BPC3-8 C C A C
BPC3-9 A T A A T
BPC3-10 A T G C
BPC3-11 T A A
BPC3-12 A C G C T
BPC3-13 A T A A C

HapM-1 AACGTTTGAACT CCATTGCAC
HapM-2 AAGGTTTGAACT CTATTGCAT
HapM-3 CAGGTTTGAACT CTATTGCAT
BPC3-1 AACGTTTGAACTACTATTGCAC
BPC3-2 AACGTTTGAACTGCTATTGCAT
BPC3-3 CAGGTTTGAACT CTATTGCAT
BPC3-4 CAGGTTCGAACT CTCTTGCAC
BPC3-5 AATGTTTGAACTGCTATTGCAC
BPC3-6 CATGTTCGAACTGCTATTGCAT
BPC3-7 AATGTTTGAACT CTATTGCAT
BPC3-8 CAGGTTCGAACTACTCTTGCAC
BPC3-9 AATGTTTGAACTACTATTGCAT
BPC3-10 AATGTTTGAACTGCT TTGCAC
BPC3-11 AATGTTTGAACTACTATTGCAC
BPC3-12 AATGTTCGAACTGCTCTTGCAT
BPC3-13 AATGTTTGAACTACTATTGCAC

BPC3-1 AACGTTTGAACTACTATTGCAC
BPC3-2 AACGTTTGAACTGCTATTGCAT
BPC3-3 CAGGTTTGAACTACTATTGCAT
BPC3-4 CAGGTTCGAACTACTCTTGCAC
BPC3-5 AATGTTTGAACTGCTATTGCAC
BPC3-6 CATGTTCGAACTGCTATTGCAT
BPC3-7 AATGTTTGAACTGCTATTGCAT
BPC3-8 CAGGTTCGAACTACTCTTGCAC
BPC3-9 AATGTTTGAACTACTATTGCAT
BPC3-10 AATGTTTGAACTGCT TTGCAC
BPC3-11 AATGTTTGAACTACTATTGCAC
BPC3-12 AATGTTCGAACTGCTCTTGCAT
BPC3-13 AATGTTTGAACTACTATTGCAC

Page 57

Accuracy?

Marchini et al. (2007)

Page 58

Accuracy?

Li et al.

Page 59

Power Gains?

Marchini et al. (2007)

Page 60

Implementation

• MACH 1.0 (Li Y et al. submitted)
• IMPUTE (Marchini et al. Nat Genet 2007)
• Bim-Bam (Servin and Stephens, PLoS Genet 2007)

Reference panel | (Sub) cohorts
CEU | ACS, ATBC, EPIC, HPFS, MEC-W, PHS, PLCO-W
CHB+JPT | MEC-J
Cosmopolitan | MEC-B, MEC-H, MEC-L, PLCO-B

Page 61

de Bakker et al. Nat Genet 2007

Page 62

Outline

• Power issues
  – Tagging efficiency of genome-wide panels
  – Multi-stage design and analysis
• Design issues
• Analytic issues
  – Imputation
• CGEMS examples

Page 63

NCI’s CGEMS project

Parallel GWA scans for breast and prostate cancer susceptibility loci

The design of genome-wide association studies is an art of the possible.

• Initial Study: 1200 cases / 1200 controls
• Replication Study #1: 3000 cases / 3000 controls
• Replication Study #2: 3000 cases / 3000 controls
• Replication Study #3: 1200 cases / 1200 controls

SNPs carried through successive stages: ~500,000 Tag SNPs → ~15,000 SNPs → ~1,500 SNPs → ca. 200 + new ht-SNPs → ca. 15-20 loci

Control Type I error at 5 × 10⁻⁵

For prostate: PLCO

Page 64

Yeager et al. 2007 Nat Genet

“Fast Track” Partial Replication

Not shown: ca. 100 other “top SNPs” that did not replicate convincingly.

Page 65

Characterization

Multi-locus modeling provides evidence for independent effects!

# | Model name | Nparms | -2 log L | p-value | AIC | BIC | BIC Weight
0 | NULL: Intercept only | 1 | 11691.71 | ref | 11693.71 | 11700.75 | 0.000
1 | SNP1 - Dominant Model | 2 | 11636.95 | 1.36E-13 | 11640.95 | 11655.03 | 0.000
2 | SNP1 - Recessive Model | 2 | 11653.34 | 5.86E-10 | 11657.34 | 11671.42 | 0.000
3 | SNP1 - Additive (log odds) Model | 2 | 11622.22 | 7.68E-17 | 11626.22 | 11640.30 | 0.000
4 | SNP1 - Codominant Model | 3 | 11621.62 | 6.02E-16 | 11627.62 | 11648.74 | 0.000
5 | SNP2 - Dominant Model | 2 | 11614.80 | 1.79E-18 | 11618.80 | 11632.88 | 0.000
6 | SNP2 - Recessive Model | 2 | 11674.28 | 2.98E-05 | 11678.28 | 11692.36 | 0.000
7 | SNP2 - Additive (log odds) Model | 2 | 11610.08 | 1.64E-19 | 11614.08 | 11628.16 | 0.000
8 | Two additive (log odds) SNPs, additive (log odds) interaction | 3 | 11548.83 | 9.43E-32 | 11554.83 | 11575.95 | 0.747
9 | Two additive (log odds) SNPs, additive (risk scale) interaction | 3 | 11551.00 | 2.79E-31 | 11557.00 | 11578.12 | 0.253
10 | Two codominant SNPs, general interaction | 9 | 11541.41 | 1.70E-28 | 11559.41 | 11622.77 | 0.000

Say we know two SNPs are associated with risk. Next step is to ask: How? Do they each contribute to disease risk (i.e. conditional on the other SNP, does adding a SNP improve model fit)? How do they “interact”?
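The AIC and BIC-weight columns of the model table can be recomputed from the tabulated values. The sketch below assumes the usual definitions, AIC = −2 log L + 2 × Nparms and BIC weights proportional to exp(−ΔBIC/2); the slide does not state how its weights were obtained, so the agreement is a consistency check rather than a description of the author's method.

```python
from math import exp

# (model number, Nparms, -2 log L, BIC), copied from the slide's model table.
MODELS = [
    (0, 1, 11691.71, 11700.75),
    (1, 2, 11636.95, 11655.03),
    (2, 2, 11653.34, 11671.42),
    (3, 2, 11622.22, 11640.30),
    (4, 3, 11621.62, 11648.74),
    (5, 2, 11614.80, 11632.88),
    (6, 2, 11674.28, 11692.36),
    (7, 2, 11610.08, 11628.16),
    (8, 3, 11548.83, 11575.95),
    (9, 3, 11551.00, 11578.12),
    (10, 9, 11541.41, 11622.77),
]


def aic(neg2_log_lik: float, n_parms: int) -> float:
    """Akaike information criterion."""
    return neg2_log_lik + 2 * n_parms


def bic_weights(bics):
    """Normalized exp(-delta_BIC / 2), with delta measured from the best (lowest) BIC."""
    best = min(bics)
    raw = [exp(-(b - best) / 2) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


weights = bic_weights([bic for _, _, _, bic in MODELS])
for (number, n_parms, dev, _), w in zip(MODELS, weights):
    print(number, round(aic(dev, n_parms), 2), round(w, 3))
```

Models 8 and 9 carry essentially all of the weight, and the nine-parameter model 10 carries almost none despite having the smallest −2 log L.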

Page 66

[Figure: two bar charts of the odds ratio (relative to '00'), axis 0.00 to 2.50, by genotype at locus A (aa, Aa, AA) and locus B (bb, Bb, BB). Panels: “Additive (log odds) SNPs, additive (log odds) interaction”, a.k.a. “Main effects only”; and “Unrestricted model”.]

Although the saturated model (with 8 unrestricted log odds ratio parameters) is “closest to the data,” the BIC suggests it is “too close.” The exceptional pattern for odds across the A locus in the BB stratum is probably just noise (small cells), not “gene-gene interaction”.

Page 67

Pooled Phase I and II Results

Region | Pooled Phase I+II p-value | Initial Scan rank | Initial Scan p-value
8q24 | 3.07E-19 | 116 | 1.12E-04
8q24 | 6.58E-12 | 300 | 3.92E-04
HNF1B | 9.58E-10 | 384 | 5.21E-04
MSMB | 7.31E-13 | 24,223 | 0.042
11q13 | 1.76E-09 | 2,439 | 0.004
CTBP2 | 1.70E-07 | 319 | 4.09E-04
JAZF1 | 2.14E-06 | 24,407 | 0.042

Thomas et al, in press

Page 68

Population Attributable Risk (PAR)

Locus | Freq. | ORmul | PAR
8q24-a | 0.10 | 1.43 | 8%
8q24-c | 0.50 | 1.26 | 22%
HNF1B | 0.49 | 1.22 | 19%
MSMB | 0.40 | 1.22 | 16%
11q13 | 0.48 | 1.23 | 20%
CTBP2 | 0.27 | 1.17 | 9%
JAZF1 | 0.23 | 1.10 | 14%

Joint PAR ~ 60%

Thomas et al, in press
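A single-locus PAR can be sketched from the allele frequency and the per-allele odds ratio. The code below assumes Hardy-Weinberg proportions, a multiplicative (per-allele) effect, and a rare disease so that the odds ratio approximates the relative risk. Whether the slide's PAR column was computed this way is not stated, and the function name is illustrative.

```python
def par_multiplicative(freq: float, or_allele: float) -> float:
    """PAR = 1 - 1 / E[RR], with E[RR] = (1 - p + p * OR)^2 under the stated assumptions."""
    mean_relative_risk = (1 - freq + freq * or_allele) ** 2
    return 1 - 1 / mean_relative_risk


# Inputs taken from the slide's rows for 8q24-a and 8q24-c.
par_8q24_a = par_multiplicative(0.10, 1.43)
par_8q24_c = par_multiplicative(0.50, 1.26)
print(round(par_8q24_a, 2), round(par_8q24_c, 2))  # 0.08 0.22
```

Under these assumptions the formula reproduces the tabulated 8% and 22% for the two 8q24 rows.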

Page 69

PARs do not add!

[Figure: diagram relating All Cases to exposure E and genes G1 and G2]

• Marginal PAR for exposure E is 100%
• Marginal PAR for gene G1 is 100%
• Marginal PAR for gene G2 is 20%

A joint PAR of 60% for top seven loci does not mean there are no other risk loci, nor does it mean modifiable environmental factors do not influence prostate cancer risk.

Page 70

Individual Risk Prediction

Odds ratio comparing 90th percentile to 10th percentile ~ 2.5

Based on allele frequencies in controls and multi-locus model assuming codominant effects at each locus and multiplicative effects across loci

Thomas et al, submitted

Page 71

Probability that a man in the top 10th percentile of risk according to seven-SNP model develops prostate cancer: 45%

Positive predictive value for a screening test that predicts prostate cancer for men with a genetic risk profile above a given threshold; recall that PPV involves test sensitivity and specificity AS WELL AS incidence rates (here: age-specific rates from the ACS website)
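The dependence of PPV on sensitivity, specificity, and incidence follows from Bayes' rule. A minimal sketch; the inputs in the example are hypothetical and are not the values behind the 45% figure quoted on the slide.

```python
def positive_predictive_value(sensitivity: float, specificity: float, incidence: float) -> float:
    """Probability of disease given a positive test, by Bayes' rule."""
    true_positive = sensitivity * incidence
    false_positive = (1 - specificity) * (1 - incidence)
    return true_positive / (true_positive + false_positive)


# Hypothetical test flagging the top 10% of a risk score (specificity 0.90),
# evaluated at two different incidence rates.
ppv_low_incidence = positive_predictive_value(0.20, 0.90, 0.02)
ppv_high_incidence = positive_predictive_value(0.20, 0.90, 0.15)
print(ppv_low_incidence, ppv_high_incidence)
```

The same test yields a higher PPV when applied at a higher incidence rate, which is why the age-specific rates matter.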

Page 72

Novel Risk Loci

• 8q24
  – Three independent loci with no known function, associated with risks of prostate and colorectal cancer
• HNF1B (TCF2)
  – Prostate cancer risk alleles associated with decreased risk of T2D
• MSMB
  – Encodes beta-microseminoprotein, a proposed prostate-cancer biomarker
• CTBP2
  – Has anti-apoptotic activity
• JAZF1
  – Fused by translocation with SUZ12 in endometrial cancer

Page 73

Where to from Here?

These results open up new and often unexpected avenues for research (cf. 8q24 region). They may also point to etiologic pathways as targets for treatment.

Despite large PARs, individually these variants are not good predictors of an individual's risk. But taken together they may (MAY) be useful for prediction: either for screening or prognosis. The performance of any screening panel will need to be evaluated in independent studies, and its ultimate efficacy will depend on its discriminative power and the availability of an intervention proven to reduce risk.

In the next 3-5 years we'll see many more discoveries using the simple, brute force approach illustrated here. The new challenge will be making sense out of it all: characterizing effects in different populations, looking for gene-environment interactions, developing new treatments and sound & ethical prevention strategies to reduce cancer morbidity and mortality.

Page 74

Acknowledgements

NCI Core Genotyping Facility
NCI Division of Cancer Epidemiology and Genetics
Harvard School of Public Health

Stephen Chanock
Gilles Thomas
Meredith Yeager
Kevin Jacobs
Bob Hoover
Richard Hayes
Sholom Wacholder
Nilanjan Chatterjee
Kai Yu
David Hunter
Jiali Han
Connie Chen

And all the subjects and support staff from the participating studies!

Page 75

Further Reading

New England Journal of Medicine, 2 August 2007