BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of...

50
BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27 - DGM 328 CH-1005 Lausanne Switzerland work: ++41-21-692-5452 cell: ++41-78-663-4980 http://serverdgm.unil.ch/bergmann
  • date post

    21-Dec-2015
  • Category

    Documents

  • view

    215
  • download

    0

Transcript of BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of...

Page 1: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

BSc Course:

"Experimental design“

Genome-wide Association Studies

Sven BergmannDepartment of Medical Genetics

University of LausanneRue de Bugnon 27 - DGM 328 

CH-1005 Lausanne Switzerland

 

work: ++41-21-692-5452cell: ++41-78-663-4980

http://serverdgm.unil.ch/bergmann

Page 2: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Overview• Population stratification

• Associations: Basics

• Whole genome associations

• Genotype imputation

• Uncertain genotypes

• New Methods

Page 3: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Overview• Population stratification

• Associations: Basics

• Whole genome associations

• Genotype imputation

• Uncertain genotypes

• New Methods

Page 4: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

6’18

9 in

divi

dual

s

Phenotypes

159 measurement

144 questions

Genotypes

500.000 SNPs

CoLaus = Cohort Lausanne

Collaboration with:Vincent Mooser (GSK), Peter Vollenweider & Gerard Waeber (CHUV)

Page 5: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…

ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…

ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…

ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…

ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…

Genetic variation in SNPs (Single Nucleotide Polymorphisms)

Page 6: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Analysis of Genotypes only

Principle Component Analysis reveals SNP-vectors explaining largest variation in the data

Page 7: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Example: 2PCs for 3d-data

http://ordination.okstate.edu/PCA.htm

Raw data points: {a, …, z}

Page 8: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Example: 2PCs for 3d-data

http://ordination.okstate.edu/PCA.htm

Normalized data points: zero mean (& unit std)!

Page 9: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Example: 2PCs for 3d-data

http://ordination.okstate.edu/PCA.htm

Identification of axes with the most variance

Most variance is along PCA1

The direction of most variance

perpendicular to PCA1 defines

PCA2

Page 10: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Ethnic groups cluster according to geographic distances

PC1 PC1

PC

2P

C2

Page 11: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

PCA of POPRES cohort

Page 12: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Overview• Population stratification

• Associations: Basics

• Whole genome associations

• Genotype imputation

• Uncertain genotypes

• New Methods

Page 13: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Phenotypic variation:

Page 14: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

0

0.2

0.4

0.6

0.8

1

1.2

-6 -4 -2 0 2 4 6

What is association?chromosomeSNPs trait variant

Genetic variation yields phenotypic variation

Population with ‘ ’ allele Population with ‘ ’ allele

Distributions of “trait”

Page 15: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Quantifying Significance

Page 16: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

T-test

t-value (significance) can be translated into p-value (probability)

Page 17: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Association using regression

genotype Coded genotype

phen

otyp

e

Page 18: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Regression analysis

X

Y

“response”

“feature(s)”

“intercept”

“coefficients”

“residuals”

Page 19: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Regression formalism

(monotonic)transformation

phenotype(response variable)of individual i

effect size(regression coefficient)

coded genotype(feature) of individual i

p(β=0)error(residual)

Goal: Find effect size that explains best all (potentially transformed) phenotypes as a linear function of the genotypes and estimate the probability (p-value) for the data being consistent with the null hypothesis (i.e. no effect)

Page 20: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Overview• Population stratification

• Associations: Basics

• Whole genome associations

• Genotype imputation

• Uncertain genotypes

• New Methods

Page 21: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Whole Genome Association

Page 22: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Whole Genome AssociationCurrent microarrays probe ~1M SNPs!

Standard approach: Evaluate significance for association of each SNP independently:

sig

nif

ican

ce

Page 23: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Whole Genome Associationsi

gn

ific

ance

Manhattan plot

ob

serv

edsi

gn

ific

ance

Expected significance

Quantile-quantile plot

Chromosome & position

GWA screens include large number of statistical tests!• Huge burden of correcting for multiple testing!• Can detect only highly significant associations (p < α / #(tests) ~ 10-7)

Page 24: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

GWAS: >20 publications in 2006/2007

Massive!

Page 25: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.
Page 26: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.
Page 27: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.
Page 28: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.
Page 29: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Genome-wide meta-analysis for serum calcium identifies significantly associated SNPs near the

calcium-sensing receptor (CASR) gene

Karen Kapur, Toby Johnson, Noam D. Beckmann, Joban Sehmi, Toshiko Tanaka, Zoltán Kutalik, Unnur Styrkarsdottir, Weihua Zhang, Diana Marek, Daniel F. Gudbjartsson, Yuri Milaneschi, Hilma Holm, Angelo DiIorio, Dawn Waterworth, Andrew Singleton, Unnur Steina Bjornsdottir, Gunnar Sigurdsson, Dena Hernandez, Ranil DeSilva, Paul Elliott, Gudmundur Eyjolfsson, Jack M Guralnik, James Scott, Unnur Thorsteinsdotti, Stefania Bandinelli, John Chambers, Kari Stefansson, Gérard Waeber, Luigi Ferrucci, Jaspal S Kooner, Vincent Mooser, Peter Vollenweider, Jacques S. Beckmann, Murielle Bochud, Sven Bergmann

Page 30: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Current insights from GWAS:

• Well-powered (meta-)studies with (ten-)thousands of samples have identified a few (dozen) candidate loci with highly significant associations

• Many of these associations have been replicated in independent studies

Page 31: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Current insights from GWAS:

• Each locus explains but a tiny (<1%) fraction of the phenotypic variance

• All significant loci together explain only a small (<10%) of the variance

David Goldstein:

“~93,000 SNPs would be required to explain 80% of the population variation in height.”

Common Genetic Variation and Human Traits, NEJM 360;17

Page 32: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

The “Missing variance” (Non-)Problem

Why should a simplistic (additive) model using incomplete or approximate features possibly explain anything close to the genetic variance of a complex trait?

… and it doesn’t have to as long as Genome-wide Association Studies are meant to as an undirected approach to elucidate new candidate loci that impact the trait!

Page 33: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

1. Other variants like Copy Number Variations or epigenetics may play an important role

2. Interactions between genetic variants (GxG) or with the environment (GxE)

3. Many causal variants may be rare and/or poorly tagged by the measured SNPs

4. Many causal variants may have very small effect sizes

5. Overestimation of heritabilities from twin-studies?

So what do we miss?

Page 34: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Overview• Population stratification

• Associations: Basics

• Whole genome associations

• Genotype imputation

• Uncertain genotypes

• New Methods

Page 35: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Intensity of Allele G

Inte

nsi

ty o

f A

llele

A

Genotypes are called with varying uncertainty

Page 36: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Some Genotypes are missing at all …

Page 37: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

… but are imputed with different uncertainties

Page 38: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

… using Linkage Disequilibrium!

Markers close together on chromosomes are often transmitted together, yielding a non-zero correlation between the alleles.

Marker 1 2 3 n

LD

D

Page 39: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Conclusion

• Genotypic markers are always measured or inferred with some degree of uncertainty

• Association methods should take into account this uncertainty

Page 40: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Two easy ways dealing with uncertain genotypes

1. Genotype Calling: Choose the most likely genotype and continue as if it is true(p11=10%, p12=20% p22=70% => G=2)

2. Mean genotype: Use the weighted average genotype(p11=10%, p12=20% p22=70% => G=1.6)

Page 41: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Overview• Associations: Basics

• Whole genome associations

• Population stratification

• Genotype imputation

• Uncertain genotypes

• New Methods

Page 42: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

1. Improve measurements:- measure more variants (e.g. by UHS)- measure other variants (e.g. CNVs)- measure “molecular phenotypes”

2. Improve models:- proper integration of uncertainties- include interactions- multi-layer models

How could our models become more predictive?

Page 43: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Towards a layered Systems Model

We need intermediate (molecular) phenotypes to better understand organismal phenotypes

Page 44: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Organisms

Data types

Conditions

Developmental

Physiological

Environmental

Experimental

Clinical

– Genotypic data– Gene expression data– Metabolomics data– Interaction data– Pathways

The challenge of many datasets: How to integrate all the information?

Page 45: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Network Approaches for Integrative Association Analysis

Using knowledge on physical gene-interactions or pathways to prioritize the search for functional interactions

Page 46: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Using expression datafrom Hapmap panel we computed all 1012 pairwiseinteractions for several expression values:

The strongest interactions do not occur for SNPs with high (marginal) significance!

Bioinformatics Advance Access published online on April 7, 2010 Bioinformatics, doi:10.1093/bioinformatics/btq147

Page 47: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Transcription Modules reduce Complexity

SB, J Ihmels & N Barkai Physical Review E (2003)

http://maya.unil.ch:7575/ExpressionView

Page 48: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Association of (average) module expression is often stronger than for any

of its constituent genes

Page 49: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Overview• Associations: Basics

• Whole genome associations

• Population stratification

• Genotype imputation

• Uncertain genotypes

• New Methods

Page 50: BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

• Analysis of genome-wide SNP data reveals

that population structure mirrors geography

• Genome-wide association studies elucidate

candidate loci for a multitude of traits, but

have little predictive power so far

• Future improvement will require– better genotyping (CGH, UHS, …) – New analysis approaches (interactions,

networks, data integration)

Take-home Messages: