Genome-wide Association Studies John S. Witte. Association Studies Hirschhorn & Daly, Nat Rev Genet...
Genome-wide Association Studies
John S. Witte
Association Studies
Hirschhorn & Daly, Nat Rev Genet 2005
Candidate Gene or GWAS
Affymetrix Array
Genome-wide Association Studies
Altshuler & Clark, Science 2005
Genome-wide Association Studies (GWAS)
GWAS+ Strategy
[Figure: study phases plotted against time, # markers, and # samples. Discovery: multi-stage GWAS+. Confirmation / characterization: follow-up genotyping+. Clarification: sequencing+.]
One- and Two-Stage GWA Designs
[Figure: genotyping grids of samples (1, 2, 3, …, N) by SNPs (1, 2, 3, …, M). One-stage design: a single samples-by-markers block. Two-stage design: Stage 1 and Stage 2 blocks, shown for replication-based analysis and for joint analysis.]
Multistage Designs
• Joint analysis has more power than replication
• p-value in Stage 1 must be liberal
• Lower cost, but no gain in power
• http://www.sph.umich.edu/csg/abecasis/CaTS/index.html
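The contrast between joint and replication-based analysis can be illustrated with a small sketch. It assumes each stage yields a z-statistic for the same SNP and combines them with weights equal to the square root of each stage's share of the samples, which is one common way to form a joint statistic. The function names and the example numbers are illustrative and do not come from the slides.

```python
import math

def joint_z(z_stage1, z_stage2, frac_stage1):
    """Combine stage z-statistics, weighting by the fraction of samples in each stage."""
    return (math.sqrt(frac_stage1) * z_stage1
            + math.sqrt(1.0 - frac_stage1) * z_stage2)

def two_sided_p(z):
    """Two-sided normal p-value for a z-statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical SNP: z = 3.0 in each stage, half of the samples in Stage 1.
z = joint_z(3.0, 3.0, 0.5)
print(round(z, 3), two_sided_p(z), two_sided_p(3.0))
```

In this toy case the joint statistic is larger than either stage alone, which is the intuition behind the first bullet above.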
QC Steps
• Filter SNPs and individuals: MAF, low call rates
• Test for HWE among controls & within ethnic groups. Use conservative alpha-level
• Check for relatedness. Identity-by-state calculations.
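A minimal sketch of the per-SNP QC quantities listed above, assuming genotypes are coded as 0, 1, or 2 copies of one allele with None for a missing call. The function names are hypothetical, and the HWE check uses a simple 1-df chi-square test, which is an assumption (exact tests are also widely used). Relatedness checks are not covered here.

```python
import math

def call_rate(genotypes):
    """Fraction of non-missing genotype calls."""
    return sum(g is not None for g in genotypes) / len(genotypes)

def minor_allele_freq(genotypes):
    """Minor allele frequency among called genotypes."""
    called = [g for g in genotypes if g is not None]
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)

def hwe_chi_square(genotypes):
    """Chi-square statistic (1 df) and p-value for Hardy-Weinberg equilibrium."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    observed = [called.count(k) for k in (0, 1, 2)]
    p = (observed[1] + 2 * observed[2]) / (2 * n)  # frequency of the counted allele
    q = 1.0 - p
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
    return chi2, p_value

# Hypothetical control genotypes in exact HWE proportions plus 5 missing calls.
controls = [0] * 25 + [1] * 50 + [2] * 25 + [None] * 5
print(call_rate(controls), minor_allele_freq(controls), hwe_chi_square(controls))
```

Any thresholds applied to these quantities (for MAF, call rate, or the HWE alpha level) are study-specific choices and are left out deliberately.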
Analysis of GWAS
• Most common approach: look at each SNP one-at-a-time.
• Possibly add in multi-marker information.
• Further investigate / report top SNPs only.
• Or backwards replication…
P-values
GWAS Analysis
• Most commonly trend test.
• Log additive model, logistic regression.
• Adjust for potential population stratification.
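As an illustration of the trend test named above, here is a sketch of the Cochran-Armitage test computed from a 2 x 3 table of genotype counts, using additive weights (0, 1, 2). The function name and the example counts are hypothetical, and the sketch makes no adjustment for population stratification.

```python
import math

def trend_test(case_counts, control_counts, weights=(0, 1, 2)):
    """Cochran-Armitage trend test; returns (chi-square with 1 df, p-value)."""
    n_cases = sum(case_counts)
    n_controls = sum(control_counts)
    n_total = n_cases + n_controls
    col = [r + s for r, s in zip(case_counts, control_counts)]
    # Weighted difference between case and control genotype counts.
    t = sum(w * (r * n_controls - s * n_cases)
            for w, r, s in zip(weights, case_counts, control_counts))
    # Variance of t under the null hypothesis of no association.
    var = (n_cases * n_controls / n_total) * (
        sum(w * w * n * (n_total - n) for w, n in zip(weights, col))
        - 2 * sum(weights[i] * weights[j] * col[i] * col[j]
                  for i in range(len(col)) for j in range(i + 1, len(col)))
    )
    chi2 = t * t / var
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical counts of genotypes with 0, 1, 2 risk alleles.
chi2, p = trend_test([20, 50, 30], [30, 50, 20])
print(chi2, p)
```

In a real analysis the same additive coding would typically enter a logistic regression so that covariates can be included.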
Quantile-Quantile (QQ) Plot
[Figure: QQ plot of GWAS p-values; p-values plotted by chromosome. Source: http://cgems.cancer.gov]
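A sketch of how the points of a QQ plot can be computed from a list of p-values, assuming all p-values lie strictly between 0 and 1. The genomic inflation factor is not mentioned on the slide; it is included here as a commonly reported companion summary. Function names are illustrative.

```python
import math
from statistics import NormalDist, median

def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, sorted ascending."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    # Expected quantiles of n uniform p-values under the null.
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))

def genomic_inflation(p_values):
    """Median observed 1-df chi-square divided by its null median."""
    nd = NormalDist()
    chi2 = [nd.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    null_median = nd.inv_cdf(0.75) ** 2  # median of chi-square with 1 df
    return median(chi2) / null_median

# Perfectly uniform p-values fall on the diagonal with inflation close to 1.
uniform = [(i - 0.5) / 1000 for i in range(1, 1001)]
print(qq_points(uniform)[-1], genomic_inflation(uniform))
```

Points rising above the diagonal across most of the range, rather than only in the upper tail, are one sign of stratification or other systematic bias.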
Example: GWAS of Prostate Cancer
Witte, Nat Genet 2007
[Figure: -log(p-value), 0 to 30, versus position on 8q24 (Mb), 128.10 to 128.70. Series: Gudmundsson et al.; Haiman et al.; Yeager et al.; Combined (adjusted). Labeled SNPs: rs6983267, rs1447295, rs16901979. Regions 1, 2, and 3 are marked.]
Multiple prostate cancer loci on 8q24
Prostate Cancer Replications
Modest ORs
Witte, Nat Rev Genet 2009

Chr region  SNP         Alleles  Freq (controls)  Freq (cases)  OR    p-value      Nearby genes / function
2p15        rs721048    G/A      0.19             0.21          1.15  7.7x10^-9    EHBP1: endocytic trafficking
3p12        rs2660753   C/T      0.10             0.12          1.30  2.7x10^-8    Intergenic
6q25        rs9364554   C/T      0.29             0.33          1.21  5.5x10^-10   SLC22A3: drugs and toxins
7q21        rs6465657   T/C      0.46             0.50          1.19  1.1x10^-9    LMTK2: endosomal trafficking
8q24 (2)    rs16901979  C/A      0.04             0.06          1.52  1.1x10^-12   Intergenic
8q24 (3)    rs6983267   T/G      0.50             0.56          1.25  9.4x10^-13   Intergenic
8q24 (1)    rs1447295   C/A      0.10             0.14          1.42  6.4x10^-18   Intergenic
10q11       rs10993994  C/T      0.38             0.46          1.38  8.7x10^-29   MSMB: suppressor prop.
10q26       rs4962416   T/C      0.27             0.32          1.18  2.7x10^-8    CTBP2: antiapoptotic activity
11q13       rs7931342   T/G      0.51             0.56          1.21  1.7x10^-12   Intergenic
17q12       rs4430796   G/A      0.49             0.55          1.22  1.4x10^-11   HNF1B: suppressor properties
17q24       rs1859962   T/G      0.46             0.51          1.20  2.5x10^-10   Intergenic
19q13       rs2735839   A/G      0.83             0.87          1.37  1.5x10^-18   KLK2/KLK3: PSA
Xp11        rs5945619   T/C      0.36             0.41          1.29  1.5x10^-9    NUDT10, NUDT11: apoptosis
SNPs Missed in Replication?
Witte, Nat Rev Genet, 2009
24,223 smallest P-value!
Manolio et al. Clin Invest 2008
www.genome.gov/gwastudies
Population Attributable Risk and GWAS
[Figure: population attributable risk (PAR, 0 to 0.75) versus risk allele frequency (0 to 1). Reference points: smoking & lung cancer; BRCA1 & breast cancer. GWAS results shown for prostate cancer, MI, type 2 diabetes, obesity, Crohn's, type 1 diabetes, lung cancer, and breast cancer.]
Jorgenson & Witte, 2009
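To make the relationship between PAR and risk allele frequency concrete, here is a sketch under stated assumptions: Hardy-Weinberg genotype frequencies, a multiplicative per-allele relative risk, and the odds ratio used as an approximation to the relative risk. The slides do not state which PAR formula was used, so this is one standard form and not necessarily the authors' calculation.

```python
def population_attributable_risk(risk_allele_freq, per_allele_rr):
    """PAR = 1 - 1 / sum_g(f_g * RR_g) over genotypes with 0, 1, 2 risk alleles."""
    p = risk_allele_freq
    q = 1.0 - p
    genotype_freqs = [q * q, 2 * p * q, p * p]       # Hardy-Weinberg (assumed)
    genotype_rrs = [1.0, per_allele_rr, per_allele_rr ** 2]  # multiplicative model (assumed)
    mean_rr = sum(f * rr for f, rr in zip(genotype_freqs, genotype_rrs))
    return 1.0 - 1.0 / mean_rr

# Example with a common allele and a modest effect: frequency 0.50, OR 1.25.
print(population_attributable_risk(0.50, 1.25))
```

The example shows why a common allele with a modest OR can carry a sizeable PAR even though it predicts little for any one individual.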
Limitations of GWAS
• Not very predictive
Witte, Nat Rev Genet 2009
Example: AUC for Br Cancer Risk
Gail = 58%
SNPs = 58.9%
Gail + SNPs = 61.8%
Wacholder et al. NEJM 2010
Limitations of GWAS
• Not very predictive
• Explain little heritability
• Focus on common variation
• Many associated variants are not causal
Where’s the Heritability?
McCarthy et al., 2008
Many more of these?
See: NEJM, April 30, 2009
Common disease rare variant (CDRV) hypothesis: diseases are due to multiple rare variants with intermediate penetrances (allelic heterogeneity)
Will GWAS results explain more heritability?
• Possibly, if…
  1. Causal SNPs not yet detected due to power / practical issues (e.g., not yet included in replication studies).
  2. Stronger effects for causal SNPs: associated SNP may only serve as a marker for multiple different causal SNPs.
Imputation of SNP Genotypes
• Estimate unmeasured or missing genotypes.
• Based on measured SNPs and external info (e.g., haplotype structure of HapMap).
• Increase GWAS power.
• Allow for combining data across different platforms (e.g., Affy & Illumina) (for replication / meta-analysis).
Imputation Example
Study sample, observed genotypes:
. . . . A . . . . . . . A . . . . A . . .
. . . . G . . . . . . . C . . . . A . . .
Reference haplotypes (HapMap / 1K Genomes):
C G A G A T C T C C T T C T T C T G T G C
C G A G A T C T C C C G A C C T C A T G G
C C A A G C T C T T T T C T T C T G T G C
C G A A G C T C T T T T C T T C T G T G C
C G A G A C T C T C C G A C C T T A T G C
T G G G A T C T C C C G A C C T C A T G G
C G A G A T C T C C C G A C C T T G T G C
C G A G A C T C T T T T C T T T T G T A C
C G A G A C T C T C C G A C C T C G T G C
C G A A G C T C T T T T C T T C T G T G C
Gonçalo Abecasis
Identify Match with Reference
(Observed genotypes and reference haplotypes as above.)
Phase chromosomes, impute missing genotypes
Observed genotypes (imputed alleles in lowercase):
c g a g A t c t c c c g A c c t c A t g g
c g a a G c t c t t t t C t t t c A t g g
Gonçalo Abecasis
http://www.sph.umich.edu/csg/abecasis/MACH
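The matching step can be sketched as a toy "copying" model: each study haplotype is treated as a mosaic of reference haplotypes, and a least-cost path is found with a Viterbi-style recursion. This is a deliberately simplified illustration, not the MaCH algorithm. The cost values, the function name, and the assumption that the study haplotypes are already phased are all hypothetical choices made for the example.

```python
REFERENCE = [
    "CGAGATCTCCTTCTTCTGTGC", "CGAGATCTCCCGACCTCATGG",
    "CCAAGCTCTTTTCTTCTGTGC", "CGAAGCTCTTTTCTTCTGTGC",
    "CGAGACTCTCCGACCTTATGC", "TGGGATCTCCCGACCTCATGG",
    "CGAGATCTCCCGACCTTGTGC", "CGAGACTCTTTTCTTTTGTAC",
    "CGAGACTCTCCGACCTCGTGC", "CGAAGCTCTTTTCTTCTGTGC",
]

def impute(observed, reference, switch_cost=1.0, mismatch_cost=5.0):
    """Return (imputed haplotype, reference index copied at each site)."""
    n_ref = len(reference)

    def emit(h, s):
        # Missing sites ('.') carry no information, so they cost nothing.
        return 0.0 if observed[s] in (".", reference[h][s]) else mismatch_cost

    cost = [emit(h, 0) for h in range(n_ref)]
    back = []
    for s in range(1, len(observed)):
        best = min(range(n_ref), key=cost.__getitem__)
        new_cost, pointers = [], []
        for h in range(n_ref):
            # Either keep copying haplotype h or switch from the best previous one.
            if cost[h] <= cost[best] + switch_cost:
                prev, c = h, cost[h]
            else:
                prev, c = best, cost[best] + switch_cost
            new_cost.append(c + emit(h, s))
            pointers.append(prev)
        cost = new_cost
        back.append(pointers)
    # Trace the least-cost path back from the last site.
    h = min(range(n_ref), key=cost.__getitem__)
    path = [h]
    for pointers in reversed(back):
        h = pointers[h]
        path.append(h)
    path.reverse()
    imputed = "".join(a if a != "." else reference[h][s].lower()
                      for s, (a, h) in enumerate(zip(observed, path)))
    return imputed, path

print(impute("....A.......A....A...", REFERENCE)[0])
print(impute("....G.......C....A...", REFERENCE)[0])
```

The first study haplotype matches a single reference haplotype at all three typed sites, so no switch is needed. The second matches no single reference haplotype, so the least-cost path copies from two of them, which is the mosaic idea behind the slide. Imputation software uses probabilistic versions of this idea and averages over many possible paths.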
Imputation Application
TCF7L2 gene region & T2D from the WTCCC data
[Figure: association results by chromosomal position. Observed genotypes in black; imputed genotypes in red.]
Marchini, Nature Genetics 2007
http://www.stats.ox.ac.uk/~marchini/#software
Genome-wide Sequence Studies
• Trade off between number of samples, depth, and genomic coverage.
Sample size  Depth  MAF 0.5-1%  MAF 2-5%
1,000        20x    perfect     perfect
2,000        10x    r2=0.98     r2=0.995
4,000        5x     r2=0.90     r2=0.98
Gonçalo Abecasis
Near-term Design Choices
• For example, between:
  1. Sequencing few subjects with extreme phenotypes:
     • e.g., 200 cases, 200 controls, 4x coverage. Then follow up in larger population.
  2. 10M SNP chip based on 1,000 genomes.
     • 5K cases, 5K controls.
• Which design will work best…?
Polygenic Models
• Many weak associations combine to risk?
• Score model:
  \[ x_j = \sum_{i=1}^{m} \ln(OR_i) \times SNP_{ij} \]
  where
  – ln(OR_i) = ‘score’ for SNP_i from ‘discovery’ sample
  – SNP_ij = # of alleles (0, 1, 2) for SNP_i, person j in ‘validation’ sample
  – Large number of SNPs (m)
• x_j associated with disease?
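The score model can be written out directly. The sketch below assumes the odds ratios come from a discovery sample and the allele counts (0, 1, 2) from one person in a validation sample; the function name and the example values are illustrative only.

```python
import math

def polygenic_score(odds_ratios, allele_counts):
    """x_j = sum over SNPs of ln(OR_i) * SNP_ij for one person j."""
    if len(odds_ratios) != len(allele_counts):
        raise ValueError("one allele count is needed per SNP")
    return sum(math.log(or_i) * count
               for or_i, count in zip(odds_ratios, allele_counts))

# Hypothetical discovery ORs for three SNPs and one person's allele counts.
discovery_ors = [1.15, 1.30, 1.21]
person_counts = [0, 1, 2]
print(polygenic_score(discovery_ors, person_counts))
```

Whether the score is associated with disease would then be tested in the validation sample, for example by comparing scores between cases and controls.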
ISC / Purcell et al. Nature 2009
Purcell / ISC et al. Nature 2009
Application of Model
Application to CGEMs PCa GWAS
• 1,172 cases, 1,157 controls from PLCO Trial
• Oversampled more aggressive cases.
• Illumina 550K array.
• PCa & stratified by disease aggressiveness.
• Split into halves, resampling:
  – one as ‘discovery’ sample;
  – other as ‘validation’.
• LD filter: r2 = 0.5.
Witte & Hoffman 2010
Results for Prostate Cancer
Nat Rev Cancer 2010;10:205-212
Common Polygenic Model for Prostate and Breast Cancer?
- CGEMs GWAS data on prostate and breast cancer.
- Use one cancer as ‘discovery’ sample, the other as ‘validation’.
Results for PCa & BrCa
Complex diseases: Many causes = many causal pathways!
[Figure: causal diagram linking genetic susceptibility, diet, physical activity, obesity, diabetes, hypertension, hyperlipidemia, atherosclerosis, and vulnerable plaques to MI.]
Pathways
• Many websites / companies provide ‘dynamic’ graphic models of molecular and biochemical pathways.
• Example: BioCarta: http://www.biocarta.com/
• May be interested in potential joint and/or interaction effects of multiple genes in one pathway.
Moving Beyond Genome
Transcriptome: All messenger RNA molecules (‘transcripts’)
Proteome: All proteins in cell or organism
Metabolome: All metabolites in a biological organism (end products of its gene expression).
Systems Biology