Analysis of imputed rare variants

Analysis of imputed rare variants Andrew Morris Advanced Topics in GWAS Toronto, 30 May 2012


Page 1: Analysis of imputed rare variants

Analysis of imputed rare variants

Andrew Morris

Advanced Topics in GWAS

Toronto, 30 May 2012

Page 2: Analysis of imputed rare variants

Introduction

• GWAS have been successful in detecting novel loci for complex traits:
  • typically characterised by common variants of modest effect;
  • together explain relatively little of the heritability.

• Low-frequency and rare variation may contribute to the “missing” genetic component of complex traits:
  • IFIH1 and type 1 diabetes;
  • MYH6 and sick sinus syndrome.

Page 3: Analysis of imputed rare variants

Rare variants and complex disease

• Rare variants are likely to have arisen from founder effects in the last few generations.

• Rare variants are expected to have larger effects on complex traits than common variants.

• Statistical methods focus on the accumulation of minor alleles at rare variants (mutational load) within the same functional unit.

Page 4: Analysis of imputed rare variants

GRANVIL

• Test of association of phenotype with the proportion of rare variants at which individuals carry minor alleles.

• Model disease phenotype via regression on p_i (this proportion for individual i) and any other covariates in a GLM framework.

Example: an individual carrying minor alleles at 3 of 10 rare variants (1 0 0 0 0 1 0 0 0 1) has p_i = 3/10.

Reedik Magi: http://www.well.ox.ac.uk/GRANVIL/
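The burden idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not GRANVIL's own code: the function names are hypothetical, and an ordinary least-squares fit for a quantitative trait stands in for the general GLM (a binary disease phenotype would need logistic regression instead).

```python
import numpy as np

def rare_variant_proportion(genotypes):
    """Proportion of rare variants at which each individual carries a minor allele.

    genotypes: (n_individuals, n_variants) array of minor-allele counts (0, 1 or 2).
    """
    carries_minor = np.asarray(genotypes) > 0
    return carries_minor.mean(axis=1)

def burden_regression(phenotype, proportion, covariates=None):
    """Least-squares fit of phenotype ~ intercept + proportion (+ covariates).

    Returns the coefficient on the proportion, its standard error and z-statistic.
    """
    y = np.asarray(phenotype, dtype=float)
    n = len(y)
    columns = [np.ones(n), np.asarray(proportion, dtype=float)]
    if covariates is not None:
        # Accept one covariate (length n) or several (n x k).
        columns.extend(np.asarray(covariates, dtype=float).reshape(n, -1).T)
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    sigma2 = residuals @ residuals / (n - X.shape[1])
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    se = np.sqrt(cov_beta[1, 1])
    return beta[1], se, beta[1] / se

# The individual shown on the slide: minor alleles at 3 of 10 rare variants.
example = rare_variant_proportion([[1, 0, 0, 0, 0, 1, 0, 0, 0, 1]])
```

Collapsing the variants into a single proportion keeps the test at one degree of freedom per gene, at the price of assuming all rare variants act in the same direction.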

Page 5: Analysis of imputed rare variants

Assaying rare genetic variation

• Gold-standard approach to assaying rare genetic variation is through re-sequencing, which is expensive on the scale of the whole genome.

• GWAS genotyping arrays are inexpensive, but are not designed to capture rare genetic variation.

• Increasing availability of large-scale reference panels of whole-genome re-sequencing data: 1000 Genomes Project and the UK10K Project.

• Impute into GWAS scaffolds up to these reference panels to recover genotypes at rare variants at no additional cost, other than computing.

Page 6: Analysis of imputed rare variants

GRANVIL: imputed variants

• Test of association of phenotype with the proportion of rare variants at which individuals carry minor alleles.

• Replace direct genotypes with the posterior probability of a heterozygous or rare homozygous call from imputation.

• Model disease phenotype via regression on p_i (this proportion for individual i) and any other covariates in a GLM framework.

Example: posterior probabilities 0.9 0.1 0.2 0.1 0.1 0.8 0.1 0.1 0.1 0.6 across 10 rare variants give p_i = 3.1/10.
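As a hedged sketch of the substitution described here, the carrier indicator at each variant can be replaced by the imputed probability of carrying at least one minor allele. The function name, the array layout (three genotype probabilities per variant) and the example numbers are illustrative assumptions, not GRANVIL's input format.

```python
import numpy as np

def expected_rare_variant_proportion(genotype_probs):
    """Expected proportion of rare variants at which an individual carries a minor allele.

    genotype_probs: array whose last axis holds the three posterior probabilities
    (P(common homozygote), P(heterozygote), P(rare homozygote)) for each variant,
    with shape (n_variants, 3) or (n_individuals, n_variants, 3).
    """
    probs = np.asarray(genotype_probs, dtype=float)
    # Probability of a heterozygous or rare homozygous call at each variant.
    carrier_prob = probs[..., 1] + probs[..., 2]
    return carrier_prob.mean(axis=-1)

# Hypothetical individual with three imputed rare variants.
one_individual = [
    [0.1, 0.8, 0.1],  # carrier probability 0.9
    [0.9, 0.1, 0.0],  # carrier probability 0.1
    [0.8, 0.2, 0.0],  # carrier probability 0.2
]
p_i = expected_rare_variant_proportion(one_individual)
```

With hard genotype calls (probabilities of exactly 0 or 1) this reduces to the directly genotyped proportion.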

Page 7: Analysis of imputed rare variants

Study question

• Can we make use of imputation into GWAS scaffolds up to re-sequencing reference panels to detect rare variant associations with complex traits?

• Simulation study performed to compare power to detect association using GRANVIL for four alternative strategies for assaying rare genetic variation.

Page 8: Analysis of imputed rare variants

Design and analysis strategies

[Figure: schematic of the analysis cohort and the phased reference panel.]

Page 9: Analysis of imputed rare variants

Strategy 1. Re-sequence analysis cohort

[Figure: schematic of the phased reference panel and the analysis cohort.]

Page 10: Analysis of imputed rare variants

Strategy 2. Genotype analysis cohort for variants in reference panel

[Figure: schematic of the phased reference panel and the analysis cohort.]

Variants not present in the reference panel will be missed.

Page 11: Analysis of imputed rare variants

Strategy 3. Genotype analysis cohort with GWAS chip

[Figure: schematic of the phased reference panel and the analysis cohort.]

Page 12: Analysis of imputed rare variants

Strategy 4. Genotype analysis cohort with GWAS chip and impute variants on reference panel

[Figure: schematic of the phased reference panel and the analysis cohort.]

Recovery of rare variants on the reference panel by imputation.

Page 13: Analysis of imputed rare variants

Simulation study

• Simulate 1050kb region of genome containing 50kb gene in phased reference panel (120, 500 or 4000 individuals) and analysis cohort (2000 individuals).

• Select causal variants within gene subject to maximum MAF and total MAF.

• Simulate quantitative trait for analysis cohort given causal variants and contribution to overall trait variance.
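The last step, generating a quantitative trait in which the causal variants account for a chosen share of the variance, might look like the sketch below. It is not the simulation code behind these slides: the function name is hypothetical, every causal minor allele is assumed to have the same additive effect, and the toy genotypes are drawn independently rather than from simulated haplotypes.

```python
import numpy as np

def simulate_trait(genotypes, causal_idx, variance_explained, rng):
    """Simulate a unit-variance trait in which causal variants explain a fixed share.

    genotypes: (n_individuals, n_variants) minor-allele counts.
    causal_idx: column indices of the causal variants.
    variance_explained: target fraction of trait variance (e.g. 0.05).
    """
    G = np.asarray(genotypes, dtype=float)[:, causal_idx]
    load = G.sum(axis=1)  # assumption: equal effect per causal minor allele
    genetic = load - load.mean()
    var_g = genetic.var()
    if var_g == 0:
        raise ValueError("no variation at the causal variants")
    # Rescale so the genetic component has variance equal to the target share.
    genetic = genetic * np.sqrt(variance_explained / var_g)
    noise = rng.normal(0.0, np.sqrt(1.0 - variance_explained), size=len(genetic))
    return genetic + noise

rng = np.random.default_rng(1)
toy_genotypes = rng.binomial(2, 0.005, size=(2000, 20))  # 2000 individuals, MAF 0.5%
trait = simulate_trait(toy_genotypes, [0, 1, 2, 3, 4], 0.05, rng)
```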

Page 14: Analysis of imputed rare variants

Simulation study

• Apply each strategy and test for association of rare variants (MAF<1% in analysis cohort) with quantitative trait using GRANVIL.
  • Strategies 3 and 4: GWAS Illumina 660K chip.
  • Strategy 4: Imputation performed using IMPUTEv2 allowing a “buffer” of 500kb, with low quality imputed variants (info score < 0.4) excluded from analysis.

• Assess power to detect association at a nominal 5% significance threshold.
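The variant inclusion rule and the power estimate described on this slide can be written down compactly. The helper names are hypothetical, and requiring MAF > 0 (so that only variants polymorphic in the cohort are counted) is an assumption added for the sketch.

```python
import numpy as np

def select_variants(maf, info=None, maf_max=0.01, info_min=0.4):
    """Boolean mask of rare variants to include in the burden test.

    maf: minor allele frequencies in the analysis cohort.
    info: imputation info scores, or None for directly assayed variants.
    """
    maf = np.asarray(maf, dtype=float)
    keep = (maf > 0) & (maf < maf_max)
    if info is not None:
        # Low-quality imputed variants (info score < 0.4) are excluded.
        keep &= np.asarray(info, dtype=float) >= info_min
    return keep

def empirical_power(p_values, alpha=0.05):
    """Fraction of simulation replicates with p-value below the nominal threshold."""
    return float(np.mean(np.asarray(p_values, dtype=float) < alpha))

mask = select_variants([0.005, 0.02, 0.003, 0.0], info=[0.9, 0.9, 0.3, 0.9])
power = empirical_power([0.01, 0.2, 0.04, 0.5])
```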

Page 15: Analysis of imputed rare variants

Maximum MAF of causal variant: 1%. Total MAF of causal variants: 5%.

Power at nominal 5% significance threshold, assuming 5% contribution to trait variance.

Page 16: Analysis of imputed rare variants

Maximum MAF of causal variant: 0.5%. Total MAF of causal variants: 2%.

Power at nominal 5% significance threshold, assuming 5% contribution to trait variance.

Page 17: Analysis of imputed rare variants

Comments

• We can recover up to 80% of the power to detect rare variant associations attained through re-sequencing by imputation into GWAS data.

• Essential to include a “buffer” for imputation.

• As the MAF of causal variants decreases, larger reference panels offer greater power.

• Limiting assumptions of simulation study:
  • No re-sequencing or phasing errors in the reference panel, and no miscalled or missing genotypes in the analysis cohort.
  • Reference panel ascertained from same population as analysis cohort.

Page 18: Analysis of imputed rare variants

Application to WTCCC

• GWAS of seven complex human diseases from the UK (2000 cases each and 3000 shared controls from the 1958 British Birth Cohort and National Blood Service):
  • bipolar disorder (BD), coronary artery disease (CAD), Crohn’s disease (CD), hypertension (HT), rheumatoid arthritis (RA), type 1 diabetes (T1D) and type 2 diabetes (T2D).

• Individuals genotyped using the Affymetrix GeneChip 500K Mapping Array Set.

Page 19: Analysis of imputed rare variants

Quality control

• Samples excluded on the basis of mismatch with external data, low call rate, outlying heterozygosity, duplication, relatedness, and non-European ancestry.

• SNPs excluded on the basis of:
  • call rate <95% (<99% if MAF <5%);
  • extreme deviation from HWE (exact p < 5.7×10⁻⁷);
  • MAF <1%.

Cohort      Samples passing QC
Controls    2,938
BD          1,868
CAD         1,926
CD          1,748
HT          1,952
RA          1,860
T1D         1,963
T2D         1,924

A total of 16,179 samples and 391,060 high-quality autosomal SNPs were carried forward for analysis.
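The SNP exclusion criteria on this slide translate directly into a filter. The sketch below only illustrates those thresholds: the function name is hypothetical, and it assumes the per-SNP call rate, MAF and exact HWE p-value have already been computed elsewhere.

```python
import numpy as np

def snp_passes_qc(call_rate, maf, hwe_p):
    """Boolean mask of SNPs retained under the QC thresholds listed on the slide."""
    call_rate, maf, hwe_p = (np.asarray(a, dtype=float) for a in (call_rate, maf, hwe_p))
    # Stricter call-rate requirement (99%) for SNPs with MAF below 5%.
    min_call_rate = np.where(maf < 0.05, 0.99, 0.95)
    return (call_rate >= min_call_rate) & (hwe_p >= 5.7e-7) & (maf >= 0.01)

# Four hypothetical SNPs: only the first passes every criterion.
passed = snp_passes_qc(
    call_rate=[0.96, 0.96, 0.999, 0.999],
    maf=[0.20, 0.03, 0.005, 0.30],
    hwe_p=[0.5, 0.5, 0.5, 1e-9],
)
```

Note that the MAF <1% exclusion applies to the genotyped scaffold only; the rare variants analysed later enter through imputation.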

Page 20: Analysis of imputed rare variants

Fine-scale UK population structure

• Fine-scale population structure may have greater impact on rare variants than on common SNPs because of recent founder effects.

• Used EIGENSTRAT to construct principal components representing axes of genetic variation across the UK, based on 27,770 high-quality, LD-pruned (r² < 0.2) common autosomal SNPs (MAF > 5%).
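For orientation, principal components of this kind can be obtained from a standardised genotype matrix by singular value decomposition. The sketch below is in the spirit of that approach rather than a reproduction of EIGENSTRAT: the function name is hypothetical, the per-SNP scaling is one common choice, and LD pruning and outlier removal are assumed to have been done beforehand.

```python
import numpy as np

def principal_components(genotypes, n_components=3):
    """Top principal components of a samples x SNPs matrix of minor-allele counts."""
    G = np.asarray(genotypes, dtype=float)
    freq = G.mean(axis=0) / 2.0
    polymorphic = (freq > 0) & (freq < 1)  # monomorphic SNPs carry no information
    G, freq = G[:, polymorphic], freq[polymorphic]
    # Centre each SNP and scale by its expected binomial standard deviation.
    Z = (G - 2.0 * freq) / np.sqrt(2.0 * freq * (1.0 - freq))
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

rng = np.random.default_rng(2)
toy = rng.binomial(2, 0.3, size=(50, 200))  # 50 samples, 200 common SNPs
pcs = principal_components(toy)
```

The resulting columns can be passed as covariates to the association model to absorb population structure.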

Page 21: Analysis of imputed rare variants

Fine-scale UK population structure

Page 22: Analysis of imputed rare variants

Imputation

• SNPs mapped to NCBI build 37 of human genome.

• Samples imputed up to 1000 Genomes Phase 1 cosmopolitan reference panel (June 2011 interim release).

• 8.23M imputed autosomal rare variants (MAF<1%) polymorphic in WTCCC.

• 5.38M (65.3%) were “well-imputed” (i.e. Info score > 0.4) and carried forward for analysis.

• Mean info score was 0.618, and 17.3% had info score > 0.8.

Page 23: Analysis of imputed rare variants

Rare variant analysis

• Test for association of each disease with accumulation of rare variants (MAF<1%) within genes using GRANVIL.

• Gene boundaries defined from UCSC human genome database (build 37).

• Analyses adjusted for three principal components to account for fine-scale UK population structure.

• Genome-wide significance threshold p < 1.7×10⁻⁶: Bonferroni adjustment for 30,000 genes.
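The threshold follows from dividing a family-wise error rate by the number of gene-level tests. The slide does not state the error rate; 5% is assumed here because it reproduces the quoted value after rounding.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold controlling the family-wise error rate at alpha."""
    return alpha / n_tests

# Assumed 5% family-wise error rate across 30,000 genes.
threshold = bonferroni_threshold(0.05, 30_000)
```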

Page 24: Analysis of imputed rare variants

No evidence of residual population structure

Page 25: Analysis of imputed rare variants

Rare variant association with CAD

• Genome-wide significant evidence of association of CAD with rare variants in the gene PRDM10 (p = 4.9×10⁻⁸).

• Gene contains 122 well-imputed rare variants with mean MAF of 0.23%.

• Accumulations of minor alleles across these variants were associated with decreased risk of disease: odds ratio 0.828 (0.774-0.886) per minor allele.
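To relate the reported odds ratio and interval to the p-value, one can move to the log-odds scale. This is a back-of-the-envelope check rather than the analysis itself: it assumes the interval is a 95% Wald interval (the slide does not say so), and the function names are hypothetical.

```python
import math

def log_odds_summary(odds_ratio, ci_lower, ci_upper, z_crit=1.959964):
    """Recover log-odds effect, standard error, z-statistic and two-sided p-value."""
    beta = math.log(odds_ratio)
    # A Wald interval spans beta +/- z_crit * se on the log-odds scale.
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2.0 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return beta, se, z, p

def odds_ratio_for_k_alleles(or_per_allele, k):
    """Odds ratio for carrying k minor alleles under a multiplicative per-allele effect."""
    return or_per_allele ** k

beta, se, z, p = log_odds_summary(0.828, 0.774, 0.886)
or_two_alleles = odds_ratio_for_k_alleles(0.828, 2)
```

Under these assumptions the recovered p-value should fall in the same order of magnitude as the one reported for PRDM10, and the same function can be applied to the other odds ratios in this talk.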

Page 26: Analysis of imputed rare variants

Rare variant association with T1D

• Genome-wide significant evidence of association of T1D with rare variants in multiple genes from the MHC.

• Strongest signal of association observed for HLA-DRA (p = 2.0×10⁻¹³).
  • Gene contains 23 well-imputed rare variants with mean MAF of 0.32%.
  • Accumulations of minor alleles across these variants were associated with decreased risk of disease: odds ratio 0.556 (0.476-0.650) per minor allele.

Page 27: Analysis of imputed rare variants

T1D association across the MHC

[Figure: rare variant association signals for T1D across the MHC. Labelled genes: PBMUCL2, NCR3, EHMT2, SLC44A4, TNXA, PBX2, AGPAT1, C6orf10, HLA-DRB5, HLA-DRA.]

• Ten genes achieve genome-wide significant evidence of rare variant association with T1D.

Page 28: Analysis of imputed rare variants

T1D association across the MHC

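[Figure: rare variant association signals for T1D across the MHC. Labelled genes: PBMUCL2, SKIVL2, EHMT2, SLC44A4, TNXB, PBX2, AGPAT1, HLA-DMA, HLA-DRB5, HLA-DRA.]

• After additional adjustment for the additive effect of the lead GWAS common variant from the MHC (rs9268645).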

Page 29: Analysis of imputed rare variants

T1D association across the MHC

Page 30: Analysis of imputed rare variants

Comments

• GRANVIL assumes the same direction of effect on the trait of all rare variants within the functional unit.

• Methods allowing for different directions of effect of rare variants are well established for re-sequencing data, and are being generalised to allow for imputation.

• The most powerful rare variant test will depend on the underlying genetic architecture of the trait.

Page 31: Analysis of imputed rare variants

Summary

• Simulations suggest that we can recover up to 80% of the power to detect rare variant associations attained through re-sequencing by imputation into GWAS data.

• Requires no additional cost, other than computation, which is not trivial!

• Imputation up to 1000 Genomes reference panel into GWAS data from WTCCC highlighted:
  • novel association of rare genetic variation in PRDM10 with CAD;
  • complex genetic architecture underlying T1D association across the MHC involving multiple genes.

Page 32: Analysis of imputed rare variants

Lab practical

• Use GRANVIL to test for association of T1D with imputed rare variants within genes across the MHC, using data from the WTCCC.

• Investigate the impact on results of:
  • the MAF threshold for inclusion of rare variants in the analysis;
  • filtering rare variants on the basis of annotation;
  • gene boundary definition.