Human genetic variation and its contribution to complex traits


description

Guest lecture by Prof. Dr. ir. Bart Deplancke introducing the basic principles of systems genetics.

Transcript of Human genetic variation and its contribution to complex traits

Page 1: Human genetic variation and its contribution to complex traits

Deplancke Lab

Monica Albarca

Jean-Daniel Feuz

Carine Gubelmann

Korneel Hens

Alina Isakova

Irina Krier

Andreas Massouras

Sunil Raghav

Jovan Simicevic

Sebastian Waszak

Wiebke Westhall

You?

deplanckelab.epfl.ch

Page 2: Human genetic variation and its contribution to complex traits

Human genetic variation and its contribution to complex traits

Laboratory of Systems Biology and Genetics

26 June 2000

Bart Deplancke ([email protected])

Page 3: Human genetic variation and its contribution to complex traits

The human genome First announcement

June 2000: first announcement of a working draft (haplotype!), followed by the Nature and Science papers in February 2001.

June 2001: chromosome 20 finished, with the others following until chromosome 1 was finished in May 2006.

International Human Genome Sequencing Consortium (2001) Nature 409:860-921; Venter et al. (2001) Science 291:1304-1351.

Gregory et al. (2006), Nature, 441, 315-321

James Kent (UCSC) Eugene Myers (Celera)

Page 4: Human genetic variation and its contribution to complex traits

Why are we so phenotypically different?

Page 5: Human genetic variation and its contribution to complex traits

Classes of human genetic variation

Common versus rare: refers to the frequency of the minor allele in the human population.

• Common variants = minor allele frequency (MAF) > 1% in the population. Also described as polymorphisms.
• Rare variants = MAF < 1%

Neutrality:

• The vast majority of genetic variants are likely neutral = no contribution to phenotypic variation.
• Some may reach significant frequencies, but only by chance.

Two different nucleotide composition classes:

• Single nucleotide variants
• Structural variants

Page 6: Human genetic variation and its contribution to complex traits

Single nucleotide variants

ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…

ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…

ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…

ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…

ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…

T/G T/G A/C
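The alignment above can be turned into a small worked example. The sketch below is illustrative only: it assumes the five sequences are already aligned haplotypes, drops the "..." gaps so the columns line up, and uses a hypothetical helper name (`find_snvs`). It reports each variable column together with its minor allele frequency (MAF).

```python
from collections import Counter

# The two distinct haplotypes shown on the slide, with the "..." gaps removed.
HAP_T = "ATTGCAATCCGTGG" + "ATCGAGCCA" + "TACGATTGCACGCCG"
HAP_G = "ATTGCAAGCCGTGG" + "ATCTAGCCA" + "TACGATTGCAAGCCG"

# Same order as on the slide: five sampled chromosomes.
haplotypes = [HAP_T, HAP_G, HAP_G, HAP_T, HAP_G]


def find_snvs(seqs):
    """Return (position, major allele, minor allele, MAF) for every
    alignment column that carries more than one allele (0-based positions)."""
    snvs = []
    for pos, column in enumerate(zip(*seqs)):
        counts = Counter(column)
        if len(counts) > 1:
            (major, _), (minor, minor_n) = counts.most_common(2)
            snvs.append((pos, major, minor, minor_n / len(seqs)))
    return snvs


for pos, major, minor, maf in find_snvs(haplotypes):
    print(f"position {pos}: {major}/{minor}, MAF = {maf:.2f}")
```

With only five chromosomes every variable site here has MAF = 0.40, so all three would count as common variants under the 1% threshold; real MAF estimates need far larger samples.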

Page 7: Human genetic variation and its contribution to complex traits

Simple 5’ to 3’ read-out

How are SNPs detected?

Unique oligonucleotide primers to generate minimally overlapping long-range PCR products of 10-kb average length

High-density oligonucleotide arrays

Flanking issues

Chee et al., Science, 1996

Page 8: Human genetic variation and its contribution to complex traits

How are SNPs detected? Other strategies

Reduced representation shotgun sequencing followed by genomic alignment

From Rothberg et al. Nature Biotech, 2001

Clustered alignment

Gene-centric studies

Reference sequence

Page 9: Human genetic variation and its contribution to complex traits

The SNP database - dbSNP http://www.ncbi.nlm.nih.gov/projects/SNP/

Three “out of Africa” genomes:

• 1.2 million (67%) (all three), 1.7 million (52%) (any two), 1.0 million (30%) unique
• Overall, 5.2 million SNPs in the three genomes, the majority being present in dbSNP
• Data indicate that most SNVs are common rather than rare


Page 10: Human genetic variation and its contribution to complex traits

Single nucleotide variants

• Estimated that the human genome contains > 11 million SNPs (~7 million with MAF > 5%, the rest between 1% and 5%).
• Unknown how many rare or even novel (“de novo”) SNVs there are.
• SNP alleles in the same genomic interval are often correlated with one another: “linkage disequilibrium (LD)” = nonrandom association of alleles. LD varies in a complex and unpredictable manner across the genome and between different populations.
• International HapMap Project: can we divide the genome into groups of highly correlated SNPs that are generally inherited together = “LD bins”?

Number of tag SNPs required to capture common Phase II SNPs

Page 11: Human genetic variation and its contribution to complex traits

Single nucleotide variants

• International HapMap Project: can we divide the genome into groups of highly correlated SNPs that are generally inherited together = “LD bins”?

Number of tag SNPs required to capture common Phase II SNPs

Pairwise linkage disequilibrium (LD): r2 (if r2 = 1, the SNPs are statistically indistinguishable)

Based on genotyping over 3.1 million SNPs in 270 individuals from 4 geographically diverse populations (Frazer et al., Nature, 2007)

Recap

By genotyping the DNA sample of an individual with a “tagging” SNP from each LD bin, knowledge regarding 80% of SNPs with a MAF > 5% across the genome is gained. (Frazer et al., Nature Rev. Genetic., 2010)
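To make the pairwise r2 statistic concrete, here is a minimal sketch that computes it from phased haplotypes using the standard definition r2 = D² / (pA(1-pA)pB(1-pB)), with D = pAB - pA·pB. The function name and the toy haplotypes are assumptions made for illustration; they are not HapMap data.

```python
def r_squared(haps, i, j):
    """r^2 between two biallelic sites i and j, given phased haplotype strings.

    Both sites must be polymorphic in the sample, otherwise the
    denominator is zero.
    """
    n = len(haps)
    a = haps[0][i]  # arbitrary reference allele at site i
    b = haps[0][j]  # arbitrary reference allele at site j
    p_a = sum(h[i] == a for h in haps) / n
    p_b = sum(h[j] == b for h in haps) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haps) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Two sites that always travel together: one SNP perfectly tags the other.
linked = ["TG", "GT", "GT", "TG", "GT"]
# Two sites whose alleles combine freely: no LD.
unlinked = ["AC", "AG", "TC", "TG"]

print(r_squared(linked, 0, 1))
print(r_squared(unlinked, 0, 1))
```

In the first toy sample either site can serve as the tag for the other (r2 of 1 up to floating-point rounding); in the second, genotyping one site says nothing about the other (r2 of 0).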

Page 12: Human genetic variation and its contribution to complex traits

Scan Entire Genome - 500,000 SNPs

Querying human genetic variation

Page 13: Human genetic variation and its contribution to complex traits

Population Stratification

Subdivision of a population into different ethnic groups with potentially different marker allele frequencies and thus different disease prevalence

Principal Component Analysis reveals SNP-vectors explaining largest variation in the data

From Sven Bergmann, UNIL
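As a rough illustration of how principal component analysis exposes stratification, the sketch below extracts the first principal component of a tiny, made-up genotype matrix (rows = individuals, columns = SNPs, entries = minor allele counts 0/1/2). Everything here is an assumption for illustration: the data are invented, the helper `first_pc` is hypothetical, and power iteration stands in for the eigen-decomposition routines that real analyses run on hundreds of thousands of SNPs.

```python
import math

# Toy genotypes: three individuals from each of two hypothetical populations
# whose allele frequencies differ at SNPs 1-4; SNP 5 is uninformative.
genotypes = [
    [2, 2, 0, 0, 1],  # population A
    [2, 1, 0, 0, 1],
    [2, 2, 0, 1, 1],
    [0, 0, 2, 2, 1],  # population B
    [0, 1, 2, 2, 1],
    [0, 0, 2, 1, 1],
]


def first_pc(matrix, iterations=100):
    """Coordinates of each individual on PC1 (up to scale and sign)."""
    n, m = len(matrix), len(matrix[0])
    # Centre every SNP column on its mean.
    means = [sum(row[j] for row in matrix) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in matrix]
    # Individual-by-individual Gram matrix of the centred data.
    gram = [[sum(x * y for x, y in zip(r1, r2)) for r2 in centered]
            for r1 in centered]
    # Power iteration converges to the leading eigenvector of the Gram matrix.
    v = [1.0] + [0.0] * (n - 1)
    for _ in range(iterations):
        w = [sum(g * x for g, x in zip(row, v)) for row in gram]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


scores = first_pc(genotypes)
for label, score in zip("AAABBB", scores):
    print(label, round(score, 3))
```

Individuals from the two toy populations land on opposite sides of zero on PC1, which is the pattern a stratified cohort shows before correction.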

Page 14: Human genetic variation and its contribution to complex traits

Ethnic groups cluster according to geographic distances

[Figure: two PCA scatter plots; axes PC1 and PC2]

Population Stratification

From Sven Bergmann, UNIL

Page 15: Human genetic variation and its contribution to complex traits

PCA of POPRES cohort

Population Stratification

From Sven Bergmann, UNIL

Page 16: Human genetic variation and its contribution to complex traits

A classic that opened the door to structural variant research:

Structural variants

(Frazer et al., Nature Rev. Genetic., 2010)

Sebat et al. Large-Scale Copy Number Polymorphism in the Human Genome. Science, 2004.

Used ROMA technique to detect copy number variants

Page 17: Human genetic variation and its contribution to complex traits

1) Genome digestion
2) Adapters ligated to sticky ends and PCR amplification
3) After PCR, representations of the entire genome (restriction fragments) are amplified so as to pronounce relative increases or decreases, or to preserve equal copy number, in the two genomes.
4) Representations of the two different genomes are labeled with different fluorophores and co-hybridized to a microarray with probes specific to restriction site locations across the entire human genome.

Representational Oligonucleotide Microarray Analysis (ROMA)

Page 18: Human genetic variation and its contribution to complex traits

Representational Oligonucleotide Microarray Analysis (ROMA)

On average, individuals (20 tested) differed by 11 CNPs (average length = 465 kb) affecting 70 genes.

Page 19: Human genetic variation and its contribution to complex traits

Our ability to detect SVs is still very poor (see later)

Structural variants (SVs)

(Frazer et al., Nature Rev. Genetic., 2010)

Page 20: Human genetic variation and its contribution to complex traits

Structural variants (SVs)

Fosmid-based library sequencing of 8 humans (4 Yoruba and 4 non-African) (Kidd et al., Nature, 2008)

• 1 million fosmid clones/individual
• Both ends of each clone insert were sequenced, giving a pair of high-quality end sequences (termed an end-sequence pair (ESP); ~450 bp/sequence)

Only SVs over 8 kb can be detected

Page 21: Human genetic variation and its contribution to complex traits

Structural variants (SVs)

Fosmid-based library sequencing of 8 humans (4 Yoruba and 4 non-African) (Kidd et al., Nature, 2008)

~2,000 SVs that were experimentally verified

Novel sequence (either in gaps (black) or not (orange))

Page 22: Human genetic variation and its contribution to complex traits

Structural variants (SVs)

Fosmid-based library sequencing of 8 humans (4 Yoruba and 4 non-African) (Kidd et al., Nature, 2008)

~2,000 SVs that were experimentally verified

Novel sequence (either in gaps (black) or not (orange))

• 50% of SVs seen in >1 individual
• ~50% lay outside regions of the genome previously described as structurally variant
• 525 new insertion sequences
• 20% of all genetic variants = SVs, but they cover >70% of nucleotide variation
• SVs cover between 9 and 25 Mb (~0.5-1% of the genome)
• The majority of SVs are yet to be discovered

Page 23: Human genetic variation and its contribution to complex traits

Structural variants (SVs)

Fosmid-based library sequencing of 8 humans (4 Yoruba and 4 non-African) (Kidd et al., Nature, 2008)

Regions of increased SNV density

Page 24: Human genetic variation and its contribution to complex traits

Structural variants and linkage disequilibrium

McCarroll et al., Nature Genet., 2008

• Most common, diallelic CNPs (with MAF greater than 5%) were perfectly captured (r2 = 1.0) by at least one SNP tag from HapMap Phase II
• Mean r2 as a function of distance from a polymorphism = indistinguishable for SNPs and diallelic CNPs → common, diallelic CNPs are ancestral mutations

Common SVs are in LD with tagging SNPs

Page 25: Human genetic variation and its contribution to complex traits

Contribution of variants to phenotypes?

Page 26: Human genetic variation and its contribution to complex traits

Common versus rare

“Common disease – common variant hypothesis”

versus

Common complex traits are the summation of low-frequency, high-penetrance variants

OR = odds ratio; PAR = population attributable risk = measure of the multifactorial inherited component of a disease
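The two effect-size measures named above can be written down in a few lines. This is a sketch under stated assumptions: the odds ratio is taken from a 2x2 table of carrier counts, and PAR is computed with Levin's formula, PAR = p(RR - 1) / (1 + p(RR - 1)), which is one common definition and not necessarily the one used on the slide. All counts and function names are invented for illustration.

```python
def odds_ratio(carrier_cases, noncarrier_cases, carrier_controls, noncarrier_controls):
    """Odds of carrying the risk allele in cases divided by the odds in controls."""
    return (carrier_cases * noncarrier_controls) / (noncarrier_cases * carrier_controls)


def population_attributable_risk(carrier_frequency, relative_risk):
    """Levin's formula: fraction of cases that would not occur without the risk allele."""
    excess = carrier_frequency * (relative_risk - 1)
    return excess / (1 + excess)


# Hypothetical common variant with a modest effect.
common_or = odds_ratio(30, 70, 20, 80)
common_par = population_attributable_risk(0.30, 1.5)

# Hypothetical rare variant with a large effect.
rare_par = population_attributable_risk(0.001, 10.0)

print(f"OR = {common_or:.2f}, PAR (common) = {common_par:.3f}, PAR (rare) = {rare_par:.4f}")
```

The comparison shows why the two hypotheses differ in practice: a common variant with a small relative risk can carry a larger population attributable risk than a rare variant with a much larger relative risk.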

Page 27: Human genetic variation and its contribution to complex traits

Whole Genome Association studies

How significant is this?

Page 28: Human genetic variation and its contribution to complex traits

P-value

Note: “Genome-wide” is a misnomer

• 20% of common SNPs not or only partially tagged
• Rare variants not tagged at all

Whole genome association studies
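To put the "how significant is this?" question in numbers, the sketch below runs a 1-degree-of-freedom allelic chi-square test on a 2x2 table of allele counts and compares the p-value with a Bonferroni-corrected threshold for 500,000 tested SNPs. The counts, the family-wise error rate of 0.05, and the function name are assumptions for illustration; real studies also model genotypes, covariates and population structure.

```python
import math


def allelic_chi2(case_minor, case_major, ctrl_minor, ctrl_major):
    """Pearson chi-square (1 df) on allele counts; returns (chi2, p-value)."""
    table = [[case_minor, case_major], [ctrl_minor, ctrl_major]]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][k] + table[1][k] for k in range(2)]
    chi2 = 0.0
    for i in range(2):
        for k in range(2):
            expected = row_sums[i] * col_sums[k] / total
            chi2 += (table[i][k] - expected) ** 2 / expected
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


n_tests = 500_000
bonferroni_threshold = 0.05 / n_tests  # about 1e-7

# 1,000 cases and 1,000 controls, i.e. 2,000 alleles per group.
chi2, p_value = allelic_chi2(600, 1400, 500, 1500)
print(f"chi2 = {chi2:.2f}, p = {p_value:.1e}, threshold = {bonferroni_threshold:.0e}")
print("genome-wide significant:", p_value < bonferroni_threshold)
```

A p-value that would look convincing for a single test (a few times 1e-4 here) falls far short of the corrected threshold, which is why GWA studies need large samples and replication.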

Page 29: Human genetic variation and its contribution to complex traits

Whole Genome Association studies

Scan Entire Genome - 500,000 SNPs

Identify local regions of interest, examine genes, SNP density, regulatory regions, etc.

Replicate the finding

[Figure: association plots; y-axis: -log10(p)]

From Sven Bergmann, UNIL

Concept

Page 30: Human genetic variation and its contribution to complex traits

McCarthy et al., Nature Rev. Genet., 2008

Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–678 (2007).

Whole Genome Association studies

Visualization

Page 31: Human genetic variation and its contribution to complex traits

Scan Entire Genome - 500,000 SNPs

Identify local regions of interest, examine genes, SNP density, regulatory regions, etc.

Replicate the finding

[Figure: association plots; y-axis: -log10(p)]

Whole genome association studies

Concept

From Sven Bergmann (UNIL)

Page 32: Human genetic variation and its contribution to complex traits

Whole genome association studies An avalanche of GWA studies

• From 2006: >220 studies reported to date
• For over 80 phenotypes, 300 loci have been implicated
• Most implicated loci were identified for the first time (no prior knowledge)

Page 33: Human genetic variation and its contribution to complex traits

Whole genome association studies Type 2 diabetes: an example

• 18 genomic intervals, with 4 containing previously implicated genes
• Major message: the molecular diversity of T2D genes was not anticipated, thus:

(Patients with the same disease) ≠ (Patients with the same underlying biological disorder)

Frazer et al., Nat. Rev. Genet., 2010

Page 34: Human genetic variation and its contribution to complex traits

Whole genome association studies Overlap of genetic risk factor loci for common diseases

• 15 loci are associated with two or more diseases (8 are shown)
• Not necessarily the same impact (PTPN22: + for Crohn’s, - for other autoimmune (ai) diseases)
• Different diseases may have similar molecular underpinnings
  • Expected: ai diseases (same clinical features)
  • Unexpected: e.g. GCKR in both triglyceride (TGC) levels and ai disease

Frazer et al., Nat. Rev. Genet., 2010

Page 35: Human genetic variation and its contribution to complex traits

Whole genome association studies From association to molecular mechanism

• Very difficult:
  • what are the precise variants associated with a trait?
  • if located in exons: easy, but outside, then what?
  • most are located outside exons! (e.g. the 9p21 association with myocardial infarction is located 150 kb from the nearest gene!)
• May have a regulatory function, i.e. control gene expression


• Humans are heterozygous at more functional cis-regulatory sites than at amino acid positions, with 10,700 functional biallelic cis-regulatory polymorphisms in a typical human (Rockman and Wray. Mol. Biol. Evol., 2002: 19, 1991).
• 34% of promoter polymorphisms (170 tested) significantly modulated reporter gene expression (>1.5-fold) (Hoogendoorn et al., Hum. Mol. Genet., 2003: 12, 2249).
• Case study with the CC chemokine receptor 5 (CCR5), a major chemokine coreceptor of HIV-1 necessary for viral entry into cells:
  • G to A SNP of CCR5 at –2459 nt
  • CCR5 density: low (homozygous GG), intermediate (GA), and highest (homozygous –2459AA) (Salkowitz et al., Clin. Immunol., 2003: 108, 234).

Page 36: Human genetic variation and its contribution to complex traits

Whole genome association studies Mapping eQTLs

• Transcript abundance = a quantitative trait that can be mapped with considerable power = eQTLs

Environment Genetics

Heritability (H2) = genetic variance over total trait variance with 0 = no genetic effects and 1 = all variance is under genetic control
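Written as a formula, and assuming the simplest variance decomposition in which the total phenotypic variance is the sum of a genetic and an environmental component (no gene-environment interaction or covariance; the symbols are chosen here for illustration), the definition above reads:

```latex
H^2 \;=\; \frac{\sigma_G^2}{\sigma_P^2} \;=\; \frac{\sigma_G^2}{\sigma_G^2 + \sigma_E^2},
\qquad 0 \le H^2 \le 1
```

Here \sigma_G^2 is the genetic variance, \sigma_E^2 the environmental variance and \sigma_P^2 the total trait variance, so H^2 = 0 when \sigma_G^2 = 0 and H^2 = 1 when \sigma_E^2 = 0.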

Classic paper: Schadt et al., Nature, 2003 Genetics of gene expression surveyed in maize, mouse and man

• Liver tissues from 111 F2 mice (constructed from C57BL/6J and DBA/2J)
• Microarray analysis of 23,574 genes: 7,861 significantly differentially expressed (either in the parental strains or in at least 10% of the F2 mice)
• eQTL identification (logarithm of the odds (LOD) score > 4.3 (P-value < 0.00005)) for 2,123 genes
• These eQTLs explained 25% of the transcription variation of the corresponding genes
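The pairing of LOD > 4.3 with P < 0.00005 can be checked with a short calculation. The sketch assumes the usual large-sample approximation in which 2·ln(10)·LOD behaves like a chi-square statistic, and it assumes 2 degrees of freedom (additive plus dominance effect in an F2 intercross); neither assumption is stated on the slide, and the result is a pointwise p-value, not a genome-wide one. The function name is hypothetical.

```python
import math


def lod_to_p(lod, df=2):
    """Approximate pointwise p-value for a LOD score (chi-square approximation)."""
    chi2 = 2 * math.log(10) * lod  # likelihood-ratio statistic
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2))
    if df == 2:
        return math.exp(-chi2 / 2)  # equals 10 ** -lod
    raise ValueError("this sketch only supports df = 1 or df = 2")


p_2df = lod_to_p(4.3)
p_1df = lod_to_p(4.3, df=1)
print(f"LOD 4.3 -> p = {p_2df:.2e} (2 df), p = {p_1df:.2e} (1 df)")
```

Under the 2-df assumption the p-value is simply 10 to the power of minus the LOD, i.e. about 5e-5 for LOD 4.3, in line with the threshold quoted above.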

Page 37: Human genetic variation and its contribution to complex traits

Whole genome association studies Mapping eQTLs

Schadt et al., Nature, 2003

% eQTL across 920 evenly spaced bins, each 2 cM wide

• Several hotspots (>1% of detected eQTLs are located within a 4 cM interval)

• 40% of genes with ≥ 1 eQTL (LOD > 3.0) had more than one eQTL, and close to 4% of such genes had more than three eQTL

Gene expression = complex trait

Page 38: Human genetic variation and its contribution to complex traits

Whole genome association studies Mapping eQTLs

Schadt et al., Nature, 2003

Known polymorphisms between the two parental strains

• Overlap between polymorphism and eQTL = cis-acting transcriptional regulation

For example:

• The C5 gene: a 2 bp deletion in the coding region in DBA mice results in rapid transcript decay compared with B6. A LOD of 27.4 centred over the C5 gene on chromosome 2 is readily detected (black curve).
• The Alad gene: present in 2 copies in DBA

Page 39: Human genetic variation and its contribution to complex traits

Whole genome association studies Mapping eQTLs

Schadt et al., Nature, 2003

Combining clinical, gene expression and genetic factors

• Classical QTLs for FPM (fat pad mass): 4 significant loci
• Further analyses with subgroups: additional loci identified
• Some QTLs only affect a subset of the F2 population, demonstrating the complexity underlying traits such as obesity

Page 40: Human genetic variation and its contribution to complex traits

Whole genome association studies Mapping eQTLs

Dixon et al., Nature Genet., 2007: A genome-wide association study of global gene expression

• 206 families of British descent using immortalized lymphoblastoid cell lines (LCLs) from 400 children (Affy microarrays; 54,675 transcripts ~ 20,599 genes)

~15,000 transcripts with H2 > 0.3

Gene Ontology descriptors for:

• Response to unfolded protein (HSFs, chaperones)
• Immune responses and apoptosis
• Regulation of progression through the cell cycle
• RNA processing and DNA repair

Page 41: Human genetic variation and its contribution to complex traits

Whole genome association studies Mapping eQTLs

Dixon et al., Nature Genet., 2007: A genome-wide association study of global gene expression

• 206 families of British descent using immortalized lymphoblastoid cell lines (LCLs) from 400 children (Affy microarrays; 54,675 transcripts ~ 20,599 genes)

• Trans effects are weaker than those in cis

• Nevertheless, significant trans associations were detected:

e.g. 1) for ~700 transcripts, the peak of association was on the same chromosome but >100 kb from the nearest transcribed gene; 2) for 10,382 transcripts, the peak of association was on a different chromosome

Page 42: Human genetic variation and its contribution to complex traits

Whole genome association studies Mapping eQTLs

Libioulle et al., PLOS Genet., 2007

Using eQTLs to better understand GWAS results

GWAS for Crohn’s disease

• Disease-associated polymorphisms may be regulating PTGER4 expression in cis, but from >250 kb away: more research needed, but likely a regulatory polymorphism

1.25 Mb gene desert

• One of the neighboring genes, PTGER4, may be involved
• Trace eQTLs in LCL data

Page 43: Human genetic variation and its contribution to complex traits

Whole genome association studies Mapping eQTLs

Stranger et al., Science, 2007: Relative Impact of Nucleotide and Copy Number Variation on Gene Expression Phenotypes

We looked at SNPs, but what about structural variants?

• LCLs of 210 unrelated HapMap individuals from four populations
• Copy number variants were identified via CGH against a common reference individual

[Figure: two panels, SNP and CNV; axis label: “From probe associated with linked gene”]

• SNPs and CNVs captured 83.6% and 17.7% of the total detected genetic variation in gene expression, respectively
• SNP associations lie close to their respective genes, less so for CNVs
• Little overlap between SNP and CNV associations (only 20%)
• Not “mere” gene dosage effects

Page 44: Human genetic variation and its contribution to complex traits

Whole genome association studies How universal are GWAS findings?

• Allele frequencies are different in different populations
• LD patterns across loci that co-segregate with a causally associated variant may be different from population to population
• Control for population differences is essential in large studies

Frazer et al., Nat. Rev. Genet., 2010

Associated with myocardial infarction

LD less strong in African population: bottleneck principle

Red = high pairwise SNP correlation

SNPs that efficiently (r2 > 0.8) tag one another are connected

Page 45: Human genetic variation and its contribution to complex traits

Whole genome association studies Impact so far

• No complex traits for which > 10% of the genetic variance is explained, e.g. T2D: 18 genetic variants together explain < 4% of the total trait liability
• Sample size may compensate (increased statistical power). But… studies for lipid phenotypes involving >40,000 people still explain <10%, and some diseases have only a low number of affected individuals
• Does the answer lie in structural variants? Most are still unmapped. But… they are likely in LD with common SNPs
• Does the answer lie in rare variants? Possibly…
  • Rare variants are not in LD with tagging SNPs and thus so far undetected (Amish study)
  • Can have very high penetrance
  • However, how to detect on a population-wide basis?

Page 46: Human genetic variation and its contribution to complex traits

The power of whole-genome sequencing

• Sequenced genomes of 2 parents and their 2 children, both children affected by Miller syndrome
• Identified 3.7 million SNPs that varied within the family
• Resequenced 34,000 candidate mutations → 28 de novo mutations
• Narrowing down via “rare” assumption and knowledge of recessive inheritance
• Found one gene, dihydroorotate dehydrogenase (DHODH), known to be involved

Miller syndrome: autosomal recessive genetic trait (Roach et al., Science, 2010)

Whole genome association studies

Page 47: Human genetic variation and its contribution to complex traits

Toward the elucidation of each person’s genetic make-up Entering the age of personalized medicine

Necessary for:

1) DNA-based risk assessment for common complex disease
2) Drug discovery (new implicated genes can be identified)

But also to:

3) Identify molecular signatures for disease diagnosis and prognosis

And for:

4) A DNA-guided therapy and dose selection: a person’s genetic make-up significantly affects the efficacy of a drug

• Polymorphisms in the VKORC1 and CYP2C9 genes dictate the effective dose levels of the anticoagulant warfarin
• Polymorphisms in the UGT1A1 gene correlate with increased toxicity of the anti-colon cancer drug irinotecan
• Polymorphisms in the MTHFR gene are associated with increased toxicity of methotrexate used to treat Crohn’s disease
• Polymorphisms in the CYP2D6 gene dictate the probability of relapse in women with breast cancer treated with tamoxifen

Page 48: Human genetic variation and its contribution to complex traits

The revolution of high-throughput sequencing: Illumina Entering the age of personalized medicine

Solid phase amplification: 1) initial priming and extending of the single-stranded, single-molecule template, and 2) bridge amplification of the immobilized template with immediately adjacent primers to form clusters.

Metzker et al., Nat. Rev. Genet., 2010


Page 49: Human genetic variation and its contribution to complex traits

From sequence to genome: mapping reads Entering the age of personalized medicine

Trapnell and Salzberg, Nat. Biotech., 2009

Four sequences of equal length = seeds

If 1 SNP, the other 3 seeds are intact; if 2 SNPs, the other 2 seeds are intact; thus, max 2 SNPs/read. Limitation: indexing takes up huge memory

Using BW (Burrows-Wheeler), the index for the entire human genome fits into < 2 GB of memory

Is 30 times faster than seed indexing

Also is limited to 2 SNPs within one read
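The seed argument above is a pigeonhole argument: a read cut into four seeds with at most two mismatches must contain at least two untouched seeds, so an exact-match lookup of the seeds will find the true location. The sketch below illustrates that idea with a plain hash index of the reference. It is a deliberately simplified toy, not the algorithm of any particular aligner; the reference string, the read and the function names are all made up.

```python
from collections import defaultdict


def build_index(reference, seed_len):
    """Hash index: every seed-length substring of the reference -> start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - seed_len + 1):
        index[reference[pos:pos + seed_len]].append(pos)
    return index


def map_read(read, reference, index, seed_len, max_mismatches=2):
    """Return sorted (start, mismatches) for placements supported by an exact seed hit."""
    hits = set()
    for s in range(len(read) // seed_len):
        offset = s * seed_len
        seed = read[offset:offset + seed_len]
        for pos in index.get(seed, []):
            start = pos - offset  # where the whole read would begin
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.add((start, mismatches))
    return sorted(hits)


reference = "ACGTTGCATGCCGATAGGCTTACCGATTGACA"
index = build_index(reference, seed_len=4)

# reference[8:24] is TGCCGATAGGCTTACC; this read differs at two positions,
# one in the first seed and one in the third seed.
read = "AGCCGATAGTCTTACC"
print(map_read(read, reference, index, seed_len=4))
```

The second and fourth seeds still match exactly and both point to start position 8, where the full read verifies with two mismatches. The memory cost of holding such a hash index for a whole genome is the limitation noted above.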

Page 50: Human genetic variation and its contribution to complex traits

Burrows-Wheeler transform

Entering the age of personalized medicine

Wikipedia

Easier to compress strings with runs of repeated characters
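For readers who want to see the transform itself, here is a naive sketch that builds the Burrows-Wheeler transform by sorting all rotations of the text, plus the textbook inversion. It assumes a sentinel character "$" that occurs once, at the end, and sorts before every other character. Real read aligners do not build the rotation table explicitly; this quadratic version is only meant to show the idea on a toy string.

```python
def bwt(text):
    """Burrows-Wheeler transform: last column of the sorted rotation table."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, sentinel="$"):
    """Rebuild the original text by repeatedly prepending and re-sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(c + row for c, row in zip(last_column, table))
    # The original text is the row that ends with the sentinel.
    return next(row for row in table if row.endswith(sentinel))


transformed = bwt("banana$")
print(transformed)               # annb$aa
print(inverse_bwt(transformed))  # banana$
```

The transformed string groups identical characters into runs ("nn", "aa"), which is what makes it easier to compress, and the transform is fully reversible.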

Page 51: Human genetic variation and its contribution to complex traits

A first human genome project using HTS

Entering the age of personalized medicine

Bentley et al., Nature, 2008

• Solexa Technology
• First: X-chromosome
  • 204 million reads
  • Sampling of sequence fragments is close to random (GC content has a slight effect)

Page 52: Human genetic variation and its contribution to complex traits

A first human genome project using HTS

Entering the age of personalized medicine

Bentley et al., Nature, 2008

• 135 Gb of sequence (~4 billion paired 35-base reads) (8 weeks)
• The approximate consumables cost = $250,000
• 97% of the reads were aligned using MAQ
• 99.9% of the human reference covered with ≥ 1 read at 40.6X

99% agreement with HapMap results!

Page 53: Human genetic variation and its contribution to complex traits

More human genome projects

Entering the age of personalized medicine

Snyder et al., G&D, 2010

Page 54: Human genetic variation and its contribution to complex traits

More human genome projects

Entering the age of personalized medicine

Snyder et al., G&D, 2010

Page 55: Human genetic variation and its contribution to complex traits

More human genome projects

Entering the age of personalized medicine

Snyder et al., G&D, 2010

Page 56: Human genetic variation and its contribution to complex traits

Tackling the SV problem using HTS

Entering the age of personalized medicine

• Really difficult and progress is limited.
• Existing methods are based on two approaches:
  • Paired-end mapping (PEM)
  • Depth-of-coverage (DOC) approach

• The ends of each fragment are tagged by a biotinylated (B) nucleotide
• Circularization forms a junction between the two ends
• Circularized DNA is randomly fragmented and the biotinylated junction fragments are recovered
• Standard sequencing procedure thereafter

Page 57: Human genetic variation and its contribution to complex traits

Tackling the SV problem using HTS: paired-end mapping

Entering the age of personalized medicine

Medvedev et al., Nature Meth., 2009
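The core logic of paired-end mapping can be sketched in a few lines: the two ends of a fragment of known size are mapped to the reference, and a mapped distance that deviates from the expected insert size hints at a structural difference. The thresholds, numbers and function name below are illustrative assumptions; real PEM callers also use read orientation (for inversions) and require clusters of supporting pairs.

```python
def classify_pair(mapped_distance, expected_insert, tolerance):
    """Interpret one read pair from the distance between its ends on the reference."""
    if mapped_distance > expected_insert + tolerance:
        # Ends lie further apart on the reference than in the sample:
        # the sample lacks sequence that the reference has.
        return "deletion"
    if mapped_distance < expected_insert - tolerance:
        # Ends lie closer together on the reference than in the sample:
        # the sample carries extra sequence.
        return "insertion"
    return "concordant"


# Hypothetical library with 3 kb inserts and a 500 bp tolerance.
for distance in (3100, 5200, 1800):
    print(distance, classify_pair(distance, expected_insert=3000, tolerance=500))
```

Note that an insertion longer than the insert size cannot be spanned by a single pair, one of the reasons SV detection from short reads remains difficult.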

Page 58: Human genetic variation and its contribution to complex traits

Tackling the SV problem using HTS: DOC

Entering the age of personalized medicine

Snyder et al., G&D, 2010 Campbell et al., Nature Genet., 2008
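The depth-of-coverage idea reduces to a ratio: if reads sample the genome close to randomly, the read depth in a window scales with the number of copies of that window. The sketch below converts per-window depths into integer copy-number estimates for a diploid genome. The depths, the expected depth and the function name are invented for illustration, and the sketch ignores the GC-content and mappability corrections and the segmentation step that real DOC methods rely on.

```python
def copy_number_calls(window_depths, expected_depth, ploidy=2):
    """Estimate copy number per window from read depth relative to the diploid expectation."""
    return [round(ploidy * depth / expected_depth) for depth in window_depths]


# Hypothetical mean depths in consecutive windows of a genome sequenced at ~40X.
depths = [40, 41, 19, 21, 39, 62, 40]
calls = copy_number_calls(depths, expected_depth=40)
print(calls)
```

In this toy example the third and fourth windows look like a heterozygous deletion (one copy) and the sixth window like a single-copy gain (three copies).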

Page 59: Human genetic variation and its contribution to complex traits

Entering the age of personalized medicine

Snyder et al., G&D, 2010

Tackling the SV problem using HTS: state-of-the-art