The integration of ENCODE into the Study of the Complexity of … · 2015. 8. 3. · to HN and...

The Integration of ENCODE into the Study of the Complexity of Cancer Susceptibility Stephen J. Chanock, M.D. Director, Division of Cancer Epidemiology & Genetics July 1, 2015


Page 1

The Integration of ENCODE into the Study of the Complexity of Cancer Susceptibility

Stephen J. Chanock, M.D. Director, Division of Cancer Epidemiology & Genetics

July 1, 2015

Page 2

Etiology of Cancer Environment Genetic Susceptibility

Lifestyle

Mutations

SNPs

STRs

Chromosomes

(Epigenetics)

100% 100%

“Triggers” “Set Point” “Chance”

Page 3

Cancer Genomics: 4 Spaces

Columns: Germline, Somatic
Rows: Discovery, Clinically Actionable

'Drivers', 'Passengers'
TCGA/ICGC, COSMIC data
>115 Cancer Syndromes
>25 Moderate Penetrant
>475 GWAS Loci
BRCA1/2
Lynch Syndrome
ACMG "Actionable"
Targeted Therapy: HER2, EGFR, BRAF600
Heterogeneity, Metastases

TEST: Single Mutation, Single Gene, Panel of "Cancer Genes", Exome, Whole Genome

Page 4

What Happens When There is More than One Genome?

Challenge of Cancer Genomics

Page 5

TCGA: Driven by the Numbers But not yet validated in the laboratory….

Value of Frequency in Generating Hypotheses But, further laboratory work is needed…

Page 6

Evidence for Heritability of Cancer

1866 Broca observed heritability based on familial breast cancer

Interim Twin/Family/Sibling studies…

1969 Li-Fraumeni observed familial clustering (TP53)

1971 Knudson postulated “two-hit” hypothesis for retinoblastoma

1991 Positional cloning of a familial breast cancer gene (BRCA1)

Page 7

>115 Genes Mutated in Cancer Susceptibility Syndromes

ATM, PTCH, FLCN, BLM, BRCA1, BRCA2, MLH1, MSH2, POT1, PMS2, MSH6, HRAS, PTEN, DKC1, TERC, TERT, TINF2, WRAP53, TP53, NF1, EXT1, EXT2, FANCA, FANCB, FANCC, FANCD2, FANCE, FANCF, FANCG, PAX5, FANCI, FANCL, FANCJ, FANCM, FANCN/BRIP1, RAD51C, TSC2, CDH1, KIT, PDGFRA, HRPT2, RUNX1, CHEK2, CDKN2A, CDK4, BUB1B, MEN1, RET, ALK, PHOX2B, NF2, NBS1, SDHD, SDHC, SDHB, STK11, APC, BMPR1A, SMAD4, MUTYH, FH, MET, RB1, SMARCB1, RECQL4, GPC3, TSC1, ERCC4, VHL, WRN, WT1, XPA, ERCC3, XPC, ERCC2, DDB2, SLX4, ERCC5, POLH, GATA2, CEBPA, SDHA, SDHAF2, TMEM127, MAX, DICER1, BAP1, ELANE, HAX1, CBL, PTPN11, MITF, GALNT12, HOXB13, CYLD, EGFR, SUFU, DIS3L2, SBDS

Ascertained in Families
Rare Mutation with Strong Effect
Oncogenes & Tumor Suppressors

Incomplete "Penetrance"
Not all affected develop cancer
Modifiers: genetic & environmental

Page 8

TCGA: Lessons Learned from the Data Survival Analyses

Impact of germline or somatic mutations

Page 9

Rahman Nature 2014

Nearly 50% "High Frequency" Somatic Mutations

High Penetrance Mutations & Somatic Alterations

Page 10

Search for Common Variants in Complex Diseases

Reproducible Technology: SNP Microarray Chip

>5 M genotyped SNPs across genome
>30 M imputed SNPs across genome

High Concordance: >99.5%/assay
'Markers' across the genome

Commitment to Mapping

Creates a multiple testing problem
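The multiple testing problem can be made concrete with a Bonferroni correction. The sketch below is illustrative and not taken from the slides: the figure of one million independent tests is the usual convention behind the 5e-8 genome-wide threshold, and the function name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumption: roughly one million independent common-variant tests genome-wide,
# which gives the conventional genome-wide significance threshold.
conventional = bonferroni_threshold(0.05, 1_000_000)

# Naive correction over every imputed SNP on the slide (>30 M). This ignores
# linkage disequilibrium between markers, so it is overly strict.
naive_imputed = bonferroni_threshold(0.05, 30_000_000)

print(f"conventional: {conventional:.1e}, naive imputed: {naive_imputed:.1e}")
```

Because neighboring SNPs are correlated, the effective number of independent tests is far smaller than the number of imputed markers, which is why the conventional threshold is generally retained.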

Page 11

Published Cancer GWAS Etiology Hits: July 2015

Multiple: 31
Basal Cell: 13
Bladder: 14
Breast: 85
Cervical: 7
CLL: 31
Colorectal: 31
Endometrial: 1
Esophageal Sq: 17
Ewing Sarcoma: 3
Gallbladder: 1
Gastric: 4
Glioma: 10
Hodgkins: 8
Kidney: 5
Liver: 7
Lung: 20
Melanoma: 14
Multiple Myeloma: 6
Nasopharyngeal: 11
Neuroblastoma: 9
Non-Hodgkin: 13
Osteosarcoma: 2
Ovary: 16
Pancreas: 14
Pediatric ALL: 9
Prostate: 99
Thyroid: 5
Wilms: 3
Testicular: 21

[Figure: published cancer GWAS loci, each labeled by a nearby gene or cytoband, for example CASP8, FGFR2, TERT/CLPTM1L, MYC, 8q24, CDKN2A/CDKN2B, 9p21.3, and ABO.]

>490 Disease Loci marked by SNPs, for >29 "cancers"
1 Locus marked by a CNV
20% Intergenic
~8% Shared between cancers = Pleiotropy

Virtually none are associated with outcomes

>100 to be reported by ONCOARRAY (Breast/prostate/lung/colon/ovarian)

Page 12

GWAS Signals & Somatic Mutations: No Strong Correlation

Redundant Pathways: 'NOT Drivers'

[Figure: GWAS genes compared with permutation genes, binned by number of mutations per 100 individuals (0, 0-1, 1-2, 2-3, 3-4, 4-5, >5).]

M Machiela, in revision

Page 13

Interpretation: Correlation does not imply causation. GWAS designs provide no mechanism to distinguish statistical association from causation.

Balding, Nature Reviews Genetics 2006

Linkage, GWAS, NGS

Page 14

Can we use ENCODE to prioritize SNPs for follow-up?

Comprehensive Functional Annotation of 77 Prostate Cancer Risk Loci

Dennis J. Hazelett1*, Suhn Kyong Rhie1, Malaina Gaddis2, Chunli Yan1, Daniel L. Lakeland3, Simon G. Coetzee4, Ellipse/GAME-ON consortium5", Practical consortium6", Brian E. Henderson5, Houtan Noushmehr4, Wendy Cozen7, Zsofia Kote-Jarai6, Rosalind A. Eeles6,8, Douglas F. Easton9, Christopher A. Haiman5, Wange Lu10, Peggy J. Farnham2, Gerhard A. Coetzee1*

1 Departments of Urology and Preventive Medicine, Norris Cancer Center, University of Southern California Keck School of Medicine, Los Angeles, California, United States of America, 2 Department of Biochemistry and Molecular Biology, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America, 3 Sonny Astani Department of Civil and Environmental Engineering, University of Southern California, Los Angeles, California, United States of America, 4 Department of Genetics, University of Sao Paulo, Ribeirao Preto, Brazil, 5 Department of Preventive Medicine, Norris Cancer Center, University of Southern California Keck School of Medicine, Los Angeles, California, United States of America, 6 The Institute of Cancer Research, Sutton, United Kingdom, 7 USC Keck School of Medicine, Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, United States of America, 8 Royal Marsden National Health Service (NHS) Foundation Trust, London and Sutton, United Kingdom, 9 Centre for Cancer Genetic Epidemiology, Department of Oncology, University of Cambridge, Cambridge, United Kingdom, 10 Eli and Edythe Broad Center for Regenerative Medicine and Stem Cell Research, Department of Biochemistry and Molecular Biology, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America

Abstract

Genome-wide association studies (GWAS) have revolutionized the field of cancer genetics, but the causal links between increased genetic risk and onset/progression of disease processes remain to be identified. Here we report the first step in such an endeavor for prostate cancer. We provide a comprehensive annotation of the 77 known risk loci, based upon highly correlated variants in biologically relevant chromatin annotations; we identified 727 such potentially functional SNPs. We also provide a detailed account of possible protein disruption, microRNA target sequence disruption and regulatory response element disruption of all correlated SNPs at r² ≥ 0.5. 88% of the 727 SNPs fall within putative enhancers, and many alter critical residues in the response elements of transcription factors known to be involved in prostate biology. We define as risk enhancers those regions with enhancer chromatin biofeatures in prostate-derived cell lines with prostate-cancer correlated SNPs. To aid the identification of these enhancers, we performed genomewide ChIP-seq for H3K27-acetylation, a mark of actively engaged enhancers, as well as the transcription factor TCF7L2. We analyzed in depth three variants in risk enhancers, two of which show significantly altered androgen sensitivity in LNCaP cells. This includes rs4907792, that is in linkage disequilibrium (r² = 0.91) with an eQTL for NUDT11 (on the X chromosome) in prostate tissue, and rs10486567, the index SNP in intron 3 of the JAZF1 gene on chromosome 7. Rs4907792 is within a critical residue of a strong consensus androgen response element that is interrupted in the protective allele, resulting in a 56% decrease in its androgen sensitivity, whereas rs10486567 affects both NKX3-1 and FOXA-AR motifs where the risk allele results in a 39% increase in basal activity and a 28% fold-increase in androgen stimulated enhancer activity. Identification of such enhancer variants and their potential target genes represents a preliminary step in connecting risk to disease process.
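The abstract selects SNPs correlated with each index SNP at r² ≥ 0.5. A minimal sketch of that filtering step follows. It assumes genotypes coded as 0/1/2 allele dosages and uses the squared Pearson correlation of dosages as a stand-in for haplotype-based r²; the function names and toy data are hypothetical and are not the authors' pipeline.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between two genotype-dosage vectors (0/1/2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return (sxy * sxy) / (sxx * syy)


def correlated_snps(index_dosages, candidates, threshold=0.5):
    """Names of candidate SNPs whose r2 with the index SNP meets the threshold."""
    return [name for name, dosages in candidates.items()
            if genotype_r2(index_dosages, dosages) >= threshold]


# Toy data for eight individuals (illustrative only).
index_snp = [0, 1, 2, 1, 0, 2, 1, 0]
candidates = {
    "snp_a": [0, 1, 2, 1, 0, 2, 1, 0],  # identical to the index SNP
    "snp_b": [1, 0, 1, 2, 0, 0, 2, 1],  # nearly uncorrelated with it
}
selected = correlated_snps(index_snp, candidates)
print(selected)
```

In practice r² would be taken from a reference panel for the relevant population rather than recomputed from a handful of samples.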

Citation: Hazelett DJ, Rhie SK, Gaddis M, Yan C, Lakeland DL, et al. (2014) Comprehensive Functional Annotation of 77 Prostate Cancer Risk Loci. PLoS Genet 10(1):e1004102. doi:10.1371/journal.pgen.1004102

Editor: Vivian G. Cheung, University of Michigan, United States of America

Received October 1, 2013; Accepted November 14, 2013; Published January 30, 2014

Copyright: © 2014 Hazelett et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: The work reported here was funded by the National Institutes of Health (NIH) [CA109147, U19CA148537 and U19CA148107 to GAC; 5T32CA009320-27 to HN and NIDH/NHGRI U54HG006996 to PJF] and David Mazzone Awards Program (GAC) and 5T32GM067587 for MG. The scientific development and funding of this project were in part supported by the Genetic Associations and Mechanisms in Oncology (GAME-ON): a NCI Cancer Post-GWAS Initiative. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected] (DJH); [email protected] (GAC)

" Membership of the Ellipse/GAME-ON consortium and the Practical consortium is provided in the Acknowledgments.

Introduction

The basic goal of research into human genetics is to connect variation at the genetic level with variation in organismal and cellular phenotype. Until recently, inferences about such connections have been limited to the kind associated with heritable disorders and developmental syndromes. Such variations often turn out to be the result of disruptions to protein coding sequences of critical enzymes for an affected pathway. Recent advances in genomics and medicine have begun to illuminate a sea of variation of a more subtle variety, not always the result of mutation of protein coding sequences. In particular, genome-wide association studies (GWAS) have identified thousands of variants associated with hundreds of disease traits [1]. These variants, typically encoded by single nucleotide polymorphisms (SNPs), are given landmark status and called 'index-SNPs' (they are also frequently referred to in the literature as 'tag-SNPs') as the reference for disease or phenotype association in that region. The vast majority

PLOS Genetics | www.plosgenetics.org 1 January 2014 | Volume 10 | Issue 1 | e1004102

GPA: A Statistical Approach to Prioritizing GWAS Results by Integrating Pleiotropy and Annotation

Dongjun Chung1,2., Can Yang1,3,4., Cong Li5, Joel Gelernter3,6,7,8, Hongyu Zhao1,5,7,9*

1 Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America, 2 Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina, United States of America, 3 Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, United States of America, 4 Department of Mathematics, Hong Kong Baptist University, Hong Kong, China, 5 Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America, 6 VA CT Healthcare Center, West Haven, Connecticut, United States of America, 7 Department of Genetics, Yale School of Medicine, West Haven, Connecticut, United States of America, 8 Department of Neurobiology, Yale School of Medicine, New Haven, Connecticut, United States of America, 9 VA Cooperative Studies Program Coordinating Center, West Haven, Connecticut, United States of America

Abstract

Results from Genome-Wide Association Studies (GWAS) have shown that complex diseases are often affected by many genetic variants with small or moderate effects. Identifications of these risk variants remain a very challenging problem. There is a need to develop more powerful statistical methods to leverage available information to improve upon traditional approaches that focus on a single GWAS dataset without incorporating additional data. In this paper, we propose a novel statistical approach, GPA (Genetic analysis incorporating Pleiotropy and Annotation), to increase statistical power to identify risk variants through joint analysis of multiple GWAS data sets and annotation information because: (1) accumulating evidence suggests that different complex diseases share common risk bases, i.e., pleiotropy; and (2) functionally annotated variants have been consistently demonstrated to be enriched among GWAS hits. GPA can integrate multiple GWAS datasets and functional annotations to seek association signals, and it can also perform hypothesis testing to test the presence of pleiotropy and enrichment of functional annotation. Statistical inference of the model parameters and SNP ranking is achieved through an EM algorithm that can handle genome-wide markers efficiently. When we applied GPA to jointly analyze five psychiatric disorders with annotation information, not only did GPA identify many weak signals missed by the traditional single phenotype analysis, but it also revealed relationships in the genetic architecture of these disorders. Using our hypothesis testing framework, statistically significant pleiotropic effects were detected among these psychiatric disorders, and the markers annotated in the central nervous system genes and eQTLs from the Genotype-Tissue Expression (GTEx) database were significantly enriched. We also applied GPA to a bladder cancer GWAS data set with the ENCODE DNase-seq data from 125 cell lines. GPA was able to detect cell lines that are biologically more relevant to bladder cancer. The R implementation of GPA is currently available at http://dongjunchung.github.io/GPA/.
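The abstract says that parameter inference and SNP ranking rely on an EM algorithm. The sketch below is a deliberately reduced illustration of that idea for a single GWAS with no annotation, and it is not the GPA implementation. It assumes null p-values are Uniform(0, 1) and associated p-values follow Beta(alpha, 1) with alpha below 1; all function names and the simulated data are hypothetical.

```python
import math
import random


def fit_two_group_mixture(pvalues, n_iter=200):
    """EM for p ~ (1 - pi1) * Uniform(0, 1) + pi1 * Beta(alpha, 1).

    Returns (pi1, alpha, posteriors); posteriors are the per-SNP probabilities
    of being non-null computed in the last E-step.
    """
    logs = [math.log(max(p, 1e-300)) for p in pvalues]  # guard against p == 0
    pi1, alpha = 0.5, 0.5
    post = []
    for _ in range(n_iter):
        # E-step: posterior probability that each SNP belongs to the non-null group.
        post = []
        for lp in logs:
            f1 = pi1 * alpha * math.exp((alpha - 1.0) * lp)
            post.append(f1 / (f1 + (1.0 - pi1)))
        # M-step: closed-form updates for the mixing weight and the Beta shape.
        total = sum(post)
        pi1 = total / len(post)
        alpha = -total / sum(z * lp for z, lp in zip(post, logs))
    return pi1, alpha, post


def simulate_pvalues(n, pi1, alpha, seed=0):
    """Draw p-values from the assumed mixture (U ** (1 / alpha) is Beta(alpha, 1))."""
    rng = random.Random(seed)
    return [rng.random() ** (1.0 / alpha) if rng.random() < pi1 else rng.random()
            for _ in range(n)]


pvals = simulate_pvalues(10_000, pi1=0.2, alpha=0.2)
pi1_hat, alpha_hat, posteriors = fit_two_group_mixture(pvals)
print(f"estimated non-null fraction {pi1_hat:.2f}, shape {alpha_hat:.2f}")
```

SNPs can then be ranked by posterior probability instead of raw p-value. GPA, as described in the abstract, extends this kind of model to several GWAS jointly and to annotation data, which this sketch omits.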

Citation: Chung D, Yang C, Li C, Gelernter J, Zhao H (2014) GPA: A Statistical Approach to Prioritizing GWAS Results by Integrating Pleiotropy and Annotation. PLoS Genet 10(11): e1004787. doi:10.1371/journal.pgen.1004787

Editor: Hua Tang, Stanford University, United States of America

Received February 17, 2014; Accepted September 29, 2014; Published November 13, 2014

This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Funding: This study was supported by National Institutes of Health grants RC2 DA028909, R01 DA12690, R01 DA12849, R01 DA18432, R01 AA11330, R01 AA017535, P50 AA12870, R01 GM59507, R01 DA030976, UL1 RR024139, and the VA Cooperative Studies Program of the Department of Veterans Affairs, Office of Research and Development. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* Email: [email protected]

. These authors contributed equally to this work.

Introduction

Hundreds of genome-wide association studies (GWAS) have been conducted to study the genetic bases of complex human traits. As of January, 2014, more than 12,000 single-nucleotide polymorphisms (SNPs) have been reported to be significantly associated with at least one complex trait (see the web resource of GWAS catalog [1] http://www.genome.gov/gwastudies/). Despite these successes, these significantly associated SNPs can only explain a small portion of genetic contributions to complex traits/diseases [2]. For example, human height is a highly heritable trait whose heritability is estimated to be around 80%, i.e., 80% of variation in height within the same population can be attributed to genetic effects [3]. Based on large-scale GWAS, about 180 SNPs have been reported to be significantly associated with human height [4]. However, these loci together only explain about 5-10% of variation in height [2,4,5]. This phenomenon is referred to as the "missing heritability" [2,6,7].

Identifying the source of this missing heritability has drawn much attention from researchers, and progress has been made towards explaining the apparent discrepancy. The role of a much greater-than-expected set of common variants (minor allele frequency (MAF) ≥ 0.01) has been shown to be critical in explaining the phenotypic variance [8]. Instead of only using genome-wide significant SNPs, Yang et al. [9] reported that, by using all genotyped common SNPs, 45% of the variance for human height can be explained. This result suggests that a large proportion of the heritability is not actually missing: given the limited sample size, many individual effects of genetic markers are too weak to pass the genome-wide significance, and thus those variants remain undiscovered. So far, people have found similar genetic architectures for many other complex traits [10], such as

PLOS Genetics | www.plosgenetics.org 1 November 2014 | Volume 10 | Issue 11 | e1004787

YES & NO

Page 15

GWAS Signals: "One by One" Investigation
Insights into Biology
Perturbations of Redundant Pathways/Processes
Not Causal. Instead… Functional Contribution

Page 16

Bladder Cancer GWAS Discovery, Clinical Trial Target: Prostate Stem Cell Antigen (PSCA)

PI: M Prokunina-Olsson

PSCA, 8q24.3: Discovered 2009

Fine Mapping: Genotyping & Imputation, rs2294008

Translational application: rs2294008 predicts PSCA expression in tumors. Anti-PSCA therapy?

Possible Clinical Trial: Therapeutic humanized anti-PSCA antibody for bladder cancer

[Figure panels: mRNA; Surface expression; Protein, IHC]

Functional Studies Risk allele T

!mRNA expression

Mila Prokunina-Olsson

Page 17

Architecture of Genetic Susceptibility of Cancer: Defining Distinct Spaces

Damaging Drivers
Perturbation: Key pathways
BRCA1, TP53, RB, PTCH

Polygenic Models
SNPs & SNPs

Page 18

What Fraction of the Polygenic Component Contributes to Each Cancer?

• 13 cancer GWAS
• 49,492 cases
• 34,131 controls (often used in > 1 study)

• Use genotyped SNPs

• Explains 10-50% of variability on the liability scale
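Heritability "on the liability scale" is commonly obtained by rescaling an estimate made on the observed case-control scale. The sketch below shows that standard transformation under a liability-threshold model. The slides do not state which estimator was used, so treat this as an assumption. The prevalence and case fraction in the example are illustrative values, not figures from the study.

```python
from statistics import NormalDist


def observed_to_liability_h2(h2_observed, prevalence, case_fraction):
    """Rescale observed-scale heritability from a case-control sample to the liability scale.

    prevalence    -- population lifetime risk K of the disease
    case_fraction -- proportion P of cases in the study sample
    """
    if not (0.0 < prevalence < 1.0 and 0.0 < case_fraction < 1.0):
        raise ValueError("prevalence and case_fraction must lie strictly between 0 and 1")
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)  # liability threshold for disease
    z = std_normal.pdf(threshold)                     # normal density at that threshold
    k, p = prevalence, case_fraction
    return h2_observed * (k * (1.0 - k)) ** 2 / (z * z * p * (1.0 - p))


# Illustrative inputs: observed-scale estimate 0.20, 1% prevalence, balanced design.
h2_liability = observed_to_liability_h2(0.20, prevalence=0.01, case_fraction=0.5)
print(f"liability-scale heritability: {h2_liability:.3f}")
```

The correction matters because cases are heavily oversampled relative to their population frequency in most cancer GWAS.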

Page 19

Across Cancer Types

Sampson - under review

Page 20

Shared Heritability from GWAS: 13 Distinct Cancers

(49,492 cases and 34,131 shared controls)

Shared factors: Some expected
• Testes & Kidney
• CLL & DLBCL
• Bladder & Lung (smoking)
Others not…
• DLBCL & Osteosarcoma

Josh Sampson + 280 co-authors

Page 21

Prediction is difficult, especially about the future.

Yogi Berra, Dan Quayle, Niels Bohr

Page 22

Genetic Predisposition to Breast Cancer, European Population

[Figure: population genotype relative risk (y-axis, ticks from 1 to 10) plotted against population risk-allele frequency (x-axis, 0 to 0.9). Labeled genes: BRCA1, BRCA2, TP53, PTEN, ATM, CHEK2, PALB2, BRIP1, RAD51C, ERCC2. One region of the plot is marked "??".]

> Doubled in 2014… now 100 loci
60 more with OncoArray

Explain ~35-40% excess familial risk

15% + 35-40% > 50% FRR

Page 23

• Total heritability corresponds to 2-fold sibling relative risk.
• GWAS heritability: ~3000 SNPs explain 1.4-fold sibling relative risk
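The two sibling relative risks can be turned into a fraction of familial risk explained. The sketch below assumes risks combine multiplicatively, so the fraction is computed on the log scale. This is a common convention, but it is an assumption here because the slide does not state the scale, and the function name is hypothetical.

```python
import math


def fraction_of_familial_risk(lambda_explained, lambda_total):
    """Share of the sibling relative risk accounted for, measured on the log scale."""
    if lambda_explained <= 1.0 or lambda_total <= 1.0:
        raise ValueError("relative risks must exceed 1")
    return math.log(lambda_explained) / math.log(lambda_total)


# Values from the slide: ~3000 SNPs give a 1.4-fold sibling relative risk,
# against a 2-fold sibling relative risk for total heritability.
projected_fraction = fraction_of_familial_risk(1.4, 2.0)
print(f"{projected_fraction:.1%} of familial risk on the log scale")
```

Under this assumption the projected polygenic set accounts for a little under half of the familial risk.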

2015 Projections for General Population
So far, we explain ~35-40% familial risk
Change in Absolute Risk for Screening

JuHyun Park

Page 24

Projected Distribution of Absolute Lifetime Risk (Age 30-80) of Breast Cancer for US Caucasian Women

Average Risk = 11.2%
Full model
Risk at bottom decile = 4.3%
Risk at top decile = 23.7%

Page 25

Predicted Prostate Cancer Risk by SNP Profile Distribution (76 SNPs)

[Figure: prostate cancer risk (y-axis, 0 to 0.7) by age (x-axis, 50 to 85), with curves for the 1st, 10th, 90th, and 99th percentiles and the mean.]

Antoniou, personal communication

Age is critical
Public Health Value

Page 26

Genome-wide association studies

Large chromosomal abnormalities, structural variation, aneuploidy in germline DNA

Unexpected Findings

Rodriguez-Santiago AJHG 2010; Jacobs et al Nature Genetics 2012; Laurie et al Nature Genetics 2012

Page 27

Nature Genetics 44, 614–616 (2012) doi:10.1038/ng.2311

Somatic Mosaicism: the Dynamic Genome

Page 28

Rate of Mosaicism by Chromosome: Adjusted for Chromosomal Size

Page 29

Combined Sample Detected Events >2Mb (N=1,330 events in 127,417 individuals)

Mosaic Gain Mosaic Copy Neutral Mosaic Loss

* Combined GENEVA+TGSI+TGSII, N=127,417 Mitch Machiela

AJHG 2015

Non-Heme Cancers Cancer-free Controls

Page 30

Breakpoint Analysis of Large Mosaic Regions

• 688 Interstitial Events
• 543 Telomeric Copy Neutral Events

• Examined
– 200kb Windows
– 500 Permutations

• Enrichment of ENCODE elements?
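The enrichment question above can be framed as a permutation test: count breakpoints whose surrounding window touches an annotated element, then compare that count with counts from randomly placed breakpoints. The sketch below is one way to set this up and is not the analysis code behind the slides. It assumes a single linear genome coordinate, non-overlapping feature intervals, a window of 100 kb on each side (200 kb in total), and uniform random placement as the null; all names and the toy data are hypothetical.

```python
import bisect
import random


def count_overlaps(breakpoints, starts, ends, half_window):
    """Count breakpoints whose window [bp - half_window, bp + half_window] touches a feature.

    starts/ends describe sorted, non-overlapping feature intervals.
    """
    hits = 0
    for bp in breakpoints:
        lo, hi = bp - half_window, bp + half_window
        i = bisect.bisect_left(ends, lo)  # first feature ending at or after the window start
        if i < len(starts) and starts[i] <= hi:
            hits += 1
    return hits


def permutation_enrichment(breakpoints, features, genome_length,
                           half_window=100_000, n_perm=500, seed=0):
    """Return (observed, null_mean, null_95_interval, empirical_p) for feature overlap."""
    features = sorted(features)
    starts = [s for s, _ in features]
    ends = [e for _, e in features]
    observed = count_overlaps(breakpoints, starts, ends, half_window)
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        shuffled = [rng.randrange(genome_length) for _ in breakpoints]
        null.append(count_overlaps(shuffled, starts, ends, half_window))
    null.sort()
    interval = (null[int(0.025 * n_perm)], null[min(n_perm - 1, int(0.975 * n_perm))])
    p_value = (1 + sum(c >= observed for c in null)) / (n_perm + 1)
    return observed, sum(null) / n_perm, interval, p_value


# Toy example: ten 50 kb features, with every breakpoint placed inside one of them.
toy_features = [(i * 10_000_000, i * 10_000_000 + 50_000) for i in range(10)]
toy_breakpoints = [i * 10_000_000 + 25_000 for i in range(10)]
observed, null_mean, null_interval, p_value = permutation_enrichment(
    toy_breakpoints, toy_features, genome_length=100_000_000)
print(observed, null_mean, null_interval, p_value)
```

A real analysis would permute within chromosomes and match on properties such as distance to the telomere, since telomeric events are not uniformly placed.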

Page 31

ENCODE Features around Breakpoint Regions

[Figure: permutation distributions with mean (µ) and 95% CI. Panels: 549 Telomeric Copy Neutral; 688 Interstitial Loss.]

*GENEVA+TGSI+TGSII events >2Mb

Page 32

ENCODE Features around Breakpoint Regions: Open Chromatin

[Figure panels: Telomeric Copy Neutral; Interstitial Loss]

*GENEVA+TGSI+TGSII events >2Mb

Page 33

ENCODE Features around Breakpoint Regions: Repetitive Elements

[Figure panels: Telomeric Copy Neutral; Interstitial Loss]

*GENEVA+TGSI+TGSII events >2Mb

Page 34

ENCODE Features around Breakpoint Regions: Gene-rich Regions

[Figure panels: Telomeric Copy Neutral; Interstitial Loss]

*GENEVA+TGSI+TGSII events >2Mb

Page 35

ENCODE Features around Breakpoint Regions

[Figure panels: Telomeric Copy Neutral; Interstitial Loss]

*GENEVA+TGSI+TGSII events >2Mb

Page 36

Detectable Mosaicism: Tip of the Iceberg?

Large events detected by SNP arrays/aCGH
Detect smaller events with new algorithms for NGS (NEJM 2014)
"U" shape curve: seen in very young & aging population
Significance for aging diseases

Xie et al Nat Med 2014

Page 37

Current Challenges of Explaining Susceptibility

• Tissue Specificity
• Tissue of origin
• Adjacent cells
• Immunological Modulation

• Example: Selective Success of Immune Blockade (PD-1)

• Timing of Effect

• Interaction with environmental stimuli

Page 38

Immense Value of ENCODE

Scientific

• Spectacular Resource for Understanding the Functional Basis of Susceptibility
– Prioritization of variants

• Opportunity to Explore Novel Elements
– Individual
– Interactions

Cultural

• Team Science
– Short Term
– Long Term

• Establish Thresholds & Standards
– Driven by Questions at hand

Page 39

Acknowledgements

NCI-DCEG: Joseph Fraumeni, Robert Hoover, Peggy Tucker, Meredith Yeager, Mitch Machiela, Nilanjan Chatterjee, JuHyun Park, Zhaoming Wang, Weiyin Zhou, Nathaniel Rothman, Mila Prokunina-Olsson, Joshua Sampson, Victoria Fisher

NCI-Special Experts: David Hunter (HSPH), Peter Kraft (HSPH), Montse Garcia-Closas (ICR)

More than 400 Collaborators
