R.K.Wilson 2007 “Cancer Genomics” rwilson@watson.wustl.edu Richard K. Wilson, Ph.D. Washington...

Post on 22-Dec-2015

R.K.Wilson 2007

“Cancer Genomics”

rwilson@watson.wustl.edu

Richard K. Wilson, Ph.D.

Washington University School of Medicine

Human Genome v1.0

• Technology
• Software tools
• Infrastructure

Ancillary genomes: mouse, chimp, etc.

Discovery
• Cancer
• Other diseases

Cancer Genomics

Next-generation sequencing technology

list of candidate genes

large collection of patient samples

PCR-based re-sequencing

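[Figure: EGFR domain diagram. Labeled regions: EGF ligand binding, TM, tyrosine kinase, autophos (Y sites). Labeled motifs: GXGXXG, K, LREA, DFG. Labeled residue positions: 718, 745, 776, 835, 858, 869 (Y869), 947, 964; residue letters R, H, M.]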

EGFR mutations in NSCLC

Most TKI responders have EGFR mutations:
• Study 1: 8/9 (89%) vs. 0/7 controls
• Study 2: 5/5 (100%) vs. 0/4 controls
• Study 3: 19/24 (79%) vs. 0/20 controls

~600 genes of interest

~200 lung adenocarcinoma samples

Tumor Sequencing Project

• Sequencing Centers: BCM-HGSC, BI, WUGSC
• Cancer Centers: MSKCC, DFCI, SCC, MDA

• Too expensive to sequence the whole genome; therefore, focus on “druggable” targets.

• For lung adenocarcinoma TSP: ~600 genes (exons only)
– Receptor tyrosine kinases (e.g. EGFR)
– Selected serine-threonine kinases
– Known oncogenes
– Known tumor suppressor genes
– EGFR pathway genes
– DNA repair genes
– Etc.

TSP Target List

SNP Arrays

DNA Chips/SNP Arrays

Lung Adeno Genomic Events

SNP Array Analysis

Weir et al. Nature (2007)

Lung Adenocarcinoma Amplifications

Weir et al. Nature (2007)

KRAS and TP53 Are Mutated in About 1/3 of Tumor Samples

Indels have not been included in the analysis

[Figure: bar chart of # of mutations (y-axis, 0 to 70) per gene (x-axis). Gene labels, in extraction order: KRAS, E2F4, TP53, GNAS, STK11, EGFR, LRRK2, CDKN2A, EPHA3, NF1, SCARF2, PTPRD, LMTK2, TYK2, RIN1, ROR2, MKNK2, ERBB4, LRP1B, NTRK1, MYO3B, PIK3CG, LZTR1, JAG2, CDC2L2, EPHA5, CDH11, PAK3, SLC38A3, PIK3C3, INSRR, NTRK3, ATM, PRKCG, BAGE4, KDR, PTEN, NRAS, ZMYND10, PDGFRA, INHBA, PFTK1, TP73L, FLT4, LTK, DOCK3, NTRK2, EPHB6, IRAK2, ITK, EPHB1, APC, EPHA7, BAGE3, MST1, LMTK3, PAK7, GATA1, TFDP1, PRKACB, TSHR, MINK1, FGFR4, RB1, FGFR1.]

Mutations in lung adenocarcinoma

Mutations in TP53, ERBB3, and AKT3 appear to correlate with tumor grade

[Figure: mutation status by tumor grade group; group sizes N=24, N=85, N=71.]

• Mutations in PDGFRA, PTEN, NTRK1 and PRKDC show positive correlation with tumor stage.

• Mutations in LRP1B, PRKDC, TP53, and APC correlate with the solid tumor histological subtype of lung adenocarcinoma.

• Mutations in EGFR and MYO3B correlate strongly with never-smoker status; mutations in KRAS and LRP1B correlate strongly with smoking.

Correlations between mutations and clinical features

Screen of kinase domains in glioblastoma: no recurrent mutations. But …

• 18/132 glioblastoma (13.6%); + 1 KD
• 1/8 glioblastoma cell lines (12.5%)
• 0/11 lower grade gliomas
• 151 total samples
• 119 lung tumors: no EC mutations
• 270 HapMap normals: no EC mutations

[Figure: EGFR exons 1-28 aligned with protein domains (EC subdomains I-IV, TM, JM, KD/kinase), with mutation positions marked; red = somatic, blue = germline, black = unknown. Labeled mutations: L861Q, A289V/D/T, T263P, R108K, D46N, G63R, R324L, E330K, G598V, P596L. Also shown: EGFRvIII (del AA 30-297).]

EGFR mutations in glioblastoma

• Hypothesis-driven (biased):
- Gene sets with related functions: “kinome”, “phosphatome”
- Genes mutated in other cancers
- Closely related genes
- Investigator-driven ideas

• Data-driven (unbiased):
- Use genomic platforms to identify loci with recurrent somatic alterations
- Array-based RNA profiling
- Array CGH
- Array-based SNP genotyping

Genomic Studies of Cancer

• Project initiated in 2002.
• Primary tumors, matched normal tissue (i.e., germline variants vs. somatic mutations)
• “Discovery set” (46 tumors) + “Validation set” (94 tumors)
• Initial target list: 450 genes
• Orthogonal technologies (CGH arrays, expression profiling, etc.) for genome characterization and to detect additional sequencing targets.

Acute myelogenous leukemia

- FLT3: 29%

- NPM1: 25%

- NRAS: 9.6%

- PTPN11: 4%

- RUNX1: 4%

- GCSFR: 4%

- Others: 2-3%

Acute myelogenous leukemia

• What are we missing outside of the exons?

• PCR-based re-sequencing:
- Relatively expensive
- Diploid (at best) & low coverage

Is there a better approach?

Solexa/Illumina 1G Analyzer

Illumina flow cell

• Acts as the microfluidic conduit for cluster generation and sequencing reagents.
• 8-lane flow cell configuration.
• Separate libraries can be sequenced in each lane, or the same library in all.
• ~60M clusters are sequenced per flow cell.

Next Generation Sequencing Technologies

Genome size: 3000 Mb

                  3730           454 FLX        Solexa
Req'd coverage    6              12             20
bp/read           600            250            32
Reads/run         96             400,000        28,000,000
bp/run            57,600         100,000,000    896,000,000
# runs req'd      312,500        360            67
Cost per run      $48            $6,800         $9,300
Total cost        $15,000,000    $2,448,000     $622,768
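The figures in this comparison follow from simple arithmetic: bp/run = bp/read × reads/run, runs required = genome size × required coverage / bp/run, and total cost = runs required × cost per run. The sketch below reproduces the slide's numbers under those assumptions. The `Platform` class and its field names are illustrative, not part of the original deck. Note that the Solexa total on the slide matches the unrounded run count (about 66.96 runs), not 67 whole runs.

```python
import math
from dataclasses import dataclass

GENOME_SIZE_BP = 3_000_000_000  # 3000 Mb, as stated on the slide


@dataclass
class Platform:
    """Per-platform figures copied from the slide (names are illustrative)."""
    name: str
    bp_per_read: int
    reads_per_run: int
    required_coverage: int
    cost_per_run_usd: int

    @property
    def bp_per_run(self) -> int:
        return self.bp_per_read * self.reads_per_run

    def runs_required(self, genome_size_bp: int = GENOME_SIZE_BP) -> float:
        # Fractional number of runs needed to reach the required coverage.
        return genome_size_bp * self.required_coverage / self.bp_per_run

    def total_cost_usd(self, genome_size_bp: int = GENOME_SIZE_BP) -> float:
        # Uses the fractional run count, which is what matches the slide.
        return self.runs_required(genome_size_bp) * self.cost_per_run_usd


PLATFORMS = [
    Platform("3730", 600, 96, 6, 48),
    Platform("454 FLX", 250, 400_000, 12, 6_800),
    Platform("Solexa", 32, 28_000_000, 20, 9_300),
]

for p in PLATFORMS:
    print(
        f"{p.name:8s} bp/run={p.bp_per_run:>13,} "
        f"runs={math.ceil(p.runs_required()):>8,} "
        f"cost=${p.total_cost_usd():>13,.0f}"
    )
```

Changing `required_coverage` or `GENOME_SIZE_BP` shows how sensitive the total cost is to the coverage target chosen for each read length.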

Data types:
• Whole genome sequence (tumor genome): Solexa
• FL cDNA normalized library: Solexa + 454
• Whole genome sequence (epidermal genome): Solexa

Analysis plans:
• Compare sequence to previously identified mutations.
• Compare increasing coverage levels to heterozygous SNPs from Affy/Illumina arrays for coverage evaluation.
• Devise strategic approaches to find novel variants; validate and characterize.

AML: Whole Genome Sequencing

“933124”

• 57 y/o Caucasian female

• De novo M1 AML
• 100% blasts in initial BM sample
• Relapsed and died at 11 months
• Normal cytogenetics
• No LOH on Affy 500K SNP array
• Informed consent for whole genome sequencing

As of 1/28/08:
• 75 Solexa runs completed (32 bp reads)
• 62 billion bp (~22X haploid coverage)
• 2,123,143 sequence variants detected (Q30)
• 492,569 (23.2%) are previously undiscovered SNPs
• 46,320 heterozygous (informative) SNPs from Affy and Illumina SNP arrays.
• 77% of informative SNPs with both WT and variant alleles were detected in the genome sequence.
• 97.4% of informative SNPs of either allele were detected in the genome sequence.

AML: Whole Genome Sequencing
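The summary statistics above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming haploid coverage is simply total sequenced bases divided by genome size, and assuming the 3000 Mb genome size quoted on the earlier technology-comparison slide. Under that assumption 62 billion bp works out to roughly 20.7X, so the slide's ~22X presumably rests on a different base count or effective genome size. Function and variable names are illustrative.

```python
GENOME_SIZE_BP = 3_000_000_000   # assumption: 3000 Mb, from the earlier slide
TOTAL_BP = 62_000_000_000        # 62 billion bp sequenced
VARIANTS_TOTAL = 2_123_143       # Q30 sequence variants
VARIANTS_NOVEL = 492_569         # previously undiscovered SNPs


def haploid_coverage(total_bp: int, genome_size_bp: int) -> float:
    """Average number of times each base of a haploid genome is sequenced."""
    return total_bp / genome_size_bp


def percent(part: int, whole: int) -> float:
    return 100.0 * part / whole


coverage = haploid_coverage(TOTAL_BP, GENOME_SIZE_BP)
novel_pct = percent(VARIANTS_NOVEL, VARIANTS_TOTAL)
already_known = VARIANTS_TOTAL - VARIANTS_NOVEL

print(f"haploid coverage: {coverage:.1f}X")
print(f"novel variants:   {novel_pct:.1f}%")
print(f"already known:    {already_known:,}")
```

The difference between total and novel variants, 1,630,574, is the dbSNP count reported on the variant-breakdown slide.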

“933124” genome sequence

2,123,143 variants
  dbSNP: 1,630,574
  Genic: 334,477
    Splice_site: 99
    Other: 329,322
    Coding: 5,056
      Synonymous: 1,222
      Missense: 3,402
      Nonsense: 320
      Nonstop: 9
  Intergenic: 145,092

*Only reporting Q30 variants
*Genic region = gene boundary +/- 50kb

AML: Whole Genome Sequencing

454 cDNA sequencing:
Number of mapped cDNA reads: 306,267

Solexa cDNA sequencing:
Number of mapped reads: 47,153,784

AML: Transcriptome Sequencing

Various cDNA library construction procedures & normalization schemes

Expressed genes: variant:germline frequencies

– MYCBP2 1188:345
– HSP90B1 694:1347
– BCCIP 391:394
– NCOR1 256:268
– CHFR 230:52
– DNAJ 218:0
– PTPN11 198:1
– NUMA1 157:2
– CASPASE 7 145:147
– HOX C6 118:2
– PLEKHC1 112:14
– NTRK3 112:10
– CDC2 96:82

AML: Transcriptome Sequencing

V194M (C to T) in FLT3

[Figure: two panels, “cDNA sequence” and “Tumor genome sequence”, each with the C and T alleles labeled.]

• Currently using SXOligoSearchG (Synamatix) to detect small (1-2 bp) indels.

• Evaluating software tools for detection of larger indels.

AML: Whole Genome Sequencing

AML: Current status

thirsty for knowledge?

• Diploid coverage was obtained for 77% of an AML M1 tumor genome with 22x haploid coverage.

• 2.1M sequence variants found (similar to other whole genomes already ‘finished’).

• ~495,000 novel variants: SNPs vs. somatic mutations
• 10x coverage of epidermis (“normal”) genome just completed; may identify >90% of variants as rare SNPs.
• Remaining 50,000 variants are being prioritized by detection in cDNA: should be <1,000
• Very rare somatic mutations in cDNA thus far (only 2 validated).
• No mutator (“driver”) phenotype is readily apparent for this AML case; “passenger” mutations appear to be rare.
• We continue to sift through the data…

AML: Current status
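The prioritization described above is essentially a sequence of set operations: start from the tumor variants, drop those already known as SNPs, drop those also seen in the matched normal (epidermis) genome, and then keep the ones detected in cDNA. The sketch below illustrates that funnel on made-up toy data. It is not the pipeline used for this project; the `Variant` type, the `prioritize` function, and all coordinates are hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical variant key: position plus reference and alternate base."""
    chrom: str
    pos: int
    ref: str
    alt: str


def prioritize(tumor: set, known_snps: set, normal: set, seen_in_cdna: set) -> dict:
    """Return the variant sets remaining after each filtering stage."""
    novel = tumor - known_snps                   # not in a SNP database
    candidate_somatic = novel - normal           # absent from matched normal
    expressed = candidate_somatic & seen_in_cdna  # also detected in cDNA
    return {
        "novel": novel,
        "candidate_somatic": candidate_somatic,
        "expressed_candidates": expressed,
    }


# Toy data only; none of these coordinates come from the slides.
a = Variant("chrA", 100, "C", "T")
b = Variant("chrA", 200, "G", "A")
c = Variant("chrB", 300, "A", "G")
d = Variant("chrB", 400, "T", "C")

stages = prioritize(
    tumor={a, b, c, d},
    known_snps={a},
    normal={b},
    seen_in_cdna={c},
)

for stage_name, variants in stages.items():
    print(f"{stage_name}: {len(variants)}")
```

Keeping every intermediate set, instead of only the final one, makes it easy to report how many variants survive each stage, which is how the slide summarizes progress.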

• Exon-targeted sequencing (TSP, glioblastoma) is revealing useful & interesting findings; expensive & slow!

• Next Gen sequencing is here and will have a substantial near-term impact on the study of cancer genomes!

• Ancillary genome-based technologies (expression profiling, SNP arrays, cDNA sequencing) are crucial for understanding the target genome before considering WGS.

• The dream is not hype: a comprehensive understanding of the “cancer genome” is probable, and will change the way that you diagnose & treat your patients.

Cancer Genomics

Acknowledgments

• WU Genome Sequencing Center
Elaine Mardis, Li Ding, Dave Dooling, Tracy Miner, Mike McLellan, Ginger Fewell, Jim Eldred, Asif Chinwalla, Yumi Kasai, Lucinda Fulton, Vince Magrini, Matt Hickenbotham, Lisa Cook, Michael Wendl, Michael Province

• WU Siteman Cancer Center
Tim Ley, Mark Watson, Matt Walter, Rhonda Ries, Jackie Payton, John DiPersio, Dan Link, Michael Tomasson, Tim Graubert, Sharon Heath

• TSP/TCGA Colleagues
Baylor HGSC, Broad Institute, many others…

• Funding sources
NHGRI (Wilson), NCI (Ley), Alvin J. Siteman (AML WGS)

genome.wustl.edu