
Transcript
Page 1: 20170209 ngs for_cancer_genomics_101

Next Generation Sequencing for Cancer Genomics 101
2017/02/10
Ino de Bruijn
Jorge Reis-Filho Lab

Page 2: 20170209 ngs for_cancer_genomics_101

Content
• DNA sequencing
  – Targets
    • DNA, genes, exons, introns
    • RNA
• How do we analyze NGS data?
  – Genetic changes
  – Mutational Signatures
  – RNASeq

Page 3: 20170209 ngs for_cancer_genomics_101

Sequencing DNA in the modern era
• DNA sequencing converts real-world DNA into digital DNA
• In the 1980s
  – Sanger sequencing
  – Compare short regions of DNA
    • Possible by hand
• In the mid-2000s
  – Parallelization of sequencing reactions
  – Generates billions of DNA reads
    • DNA read: short stretch of DNA
  – Compare whole genomes
    • Impossible by hand

CACGTCTAAGGGCGAAGAGCTGACTGCTTTTTT

Page 4: 20170209 ngs for_cancer_genomics_101

Targeting parts of the genome
• Human genome has 3 billion bases
• Be cost effective:
  – Focus on the part of the genome related to your subject

Page 5: 20170209 ngs for_cancer_genomics_101

What is a gene?
• Human genome: 3 billion bases
  – 23 pairs of chromosomes
  – Certain stretches of DNA code for proteins, which perform a wide variety of functions in your body (~20,000 genes in total)

Page 6: 20170209 ngs for_cancer_genomics_101

Gene to protein

Exons comprise only ~1.5% of the genome
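The two figures from the slides (3 billion bases, ~1.5% exonic) can be combined into a quick back-of-the-envelope calculation. This is only a sketch: the sequencing budget is a made-up number, and it assumes every sequenced base lands on the target, which real capture protocols do not achieve.

```python
GENOME_BASES = 3_000_000_000   # ~3 billion bases (from the slides)
EXON_PER_MILLE = 15            # exons are ~1.5% of the genome (from the slides)

# Integer arithmetic keeps the result exact.
exome_bases = GENOME_BASES * EXON_PER_MILLE // 1000

# Hypothetical budget: total number of bases the sequencer will produce.
budget_bases = 90_000_000_000

# Average depth if the budget is spread over the whole genome or only the exons.
# Assumes perfect targeting, which is an idealization.
genome_depth = budget_bases / GENOME_BASES
exome_depth = budget_bases / exome_bases

print(f"Exome size: ~{exome_bases / 1e6:.0f} million bases")
print(f"Average depth, whole genome: {genome_depth:.0f}x")
print(f"Average depth, exons only:   {exome_depth:.0f}x")
```

For the same budget, restricting the target to exons buys far deeper sequencing of each base, which is the cost argument behind targeted approaches.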

Page 7: 20170209 ngs for_cancer_genomics_101

Different targets
• Whole Genome Sequencing (WGS)
  – E.g. rearrangements outside genes
• Whole Exome Sequencing (WES)
  – E.g. gene discovery (rare/unknown tumors)
• Custom target
  – MSK-IMPACT (Integrated Mutation Profiling of Actionable Cancer Targets)
    • 410 genes related to cancer
    • >15K patients profiled at MSKCC
  – https://cbioportal.mskcc.org/study?id=mskimpact

Page 8: 20170209 ngs for_cancer_genomics_101

NGS Principles

Page 9: 20170209 ngs for_cancer_genomics_101

NGS Principles - Coverage

Sequence the same part many times:
Coverage is the number of times a base is covered by a read
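The definition of coverage can be sketched in a few lines. The reads below are invented half-open intervals (start, end) on a toy 10-base region; real pipelines compute this from aligned reads with dedicated tools.

```python
def per_base_coverage(reads, region_start, region_end):
    """Count how many reads cover each position in [region_start, region_end)."""
    depth = [0] * (region_end - region_start)
    for start, end in reads:
        # Clip each read to the region before counting.
        for pos in range(max(start, region_start), min(end, region_end)):
            depth[pos - region_start] += 1
    return depth


# Hypothetical reads as (start, end) with the end position excluded.
reads = [(0, 5), (2, 7), (4, 9)]
depth = per_base_coverage(reads, 0, 10)
mean_depth = sum(depth) / len(depth)

print(depth)       # [1, 1, 2, 2, 3, 2, 2, 1, 1, 0]
print(mean_depth)  # 1.5
```

Position 4 is covered by all three reads, while the last position is not covered at all, so a mutation there could not be seen.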

Page 10: 20170209 ngs for_cancer_genomics_101

NGS Principles - Coverage
• Not all reads retrieved are correct
  – Many sources of error when sequencing
    • DNA library prep protocol
    • Sequencing error rate
• Sequencing groups of cells
  – Certain genetic changes occur only in a small fraction of cells
• Need to sequence the same part multiple times to gain confidence
  – The amount depends on the analysis & expectation
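Why the required depth depends on the expectation can be illustrated with a simple binomial model. This is a sketch under strong assumptions: reads are treated as independent draws, sequencing errors are ignored, and the 5% fraction and the 3-read threshold are hypothetical choices.

```python
from math import comb


def prob_at_least(k, depth, fraction):
    """Probability of seeing at least k mutant reads among `depth` reads,
    if each read carries the mutation with probability `fraction`."""
    return sum(
        comb(depth, i) * fraction**i * (1 - fraction) ** (depth - i)
        for i in range(k, depth + 1)
    )


# Hypothetical scenario: a mutation present in 5% of the reads,
# and a caller that wants at least 3 supporting reads.
for depth in (20, 100, 500):
    p = prob_at_least(3, depth, 0.05)
    print(f"depth {depth:>3}: P(>= 3 mutant reads) = {p:.3f}")
```

Under this model the chance of collecting enough supporting reads rises with depth, which is why rare changes call for deeper sequencing.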

Page 11: 20170209 ngs for_cancer_genomics_101

How to analyze the NGS data?
• Some might guess this is where the bioinformatician comes in…

Page 12: 20170209 ngs for_cancer_genomics_101

How to analyze the NGS data?
• Some might guess this is where the bioinformatician comes in…

Too late: the bioinformatician should have been helping you design the experiment

Page 13: 20170209 ngs for_cancer_genomics_101

How to analyze NGS data?
• Tons of different options
  – What is the research question?
• Common analysis: identify genetic changes in the tumor

Page 14: 20170209 ngs for_cancer_genomics_101

Identify the genetic changes

Meyerson et al. Nat Rev Genet 2010

Page 15: 20170209 ngs for_cancer_genomics_101

Identify the genetic changes
• Compare against the reference human genome
  – Gives both germline and somatic mutations
• How to differentiate?
  – Databases with common germline variants miss many
• Somatic mutations
  – Take DNA from normal cells and tumor cells
  – Filter out mutations that are also found in the normal
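The tumor/normal filtering idea amounts to a set difference. In this sketch the variants are invented tuples of (chromosome, position, reference base, alternate base); real somatic callers work on read-level evidence rather than on finished variant lists.

```python
def somatic_candidates(tumor_variants, normal_variants):
    """Keep tumor variants that were not also seen in the matched normal."""
    normal = set(normal_variants)
    return [variant for variant in tumor_variants if variant not in normal]


# Hypothetical variant calls, both made against the reference genome.
tumor = [
    ("chr1", 1000, "A", "G"),   # also in normal: germline
    ("chr7", 5500, "C", "T"),   # tumor only: somatic candidate
    ("chr17", 7700, "G", "A"),  # tumor only: somatic candidate
]
normal = [
    ("chr1", 1000, "A", "G"),
    ("chr2", 2000, "T", "C"),
]

print(somatic_candidates(tumor, normal))
```

Variants shared by tumor and normal are treated as germline and dropped; what remains is the list of somatic candidates.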

Page 16: 20170209 ngs for_cancer_genomics_101

Identify somatic mutations

Page 17: 20170209 ngs for_cancer_genomics_101

Identify mutations
• Automated pipelines to do this
  – Example: mutation calling tools take into account
    • Number of reads having the mutation versus all reads (Mutation Allelic Fraction (MAF))
    • Coverage at that position
    • Read quality score
    • If calling somatic mutations
      – Mutation in the normal
• Every parameter makes assumptions about the data: communicate the goal of the project
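The parameters listed above can be sketched as a toy filter. The `Site` record and every threshold here are hypothetical, chosen only to show how each parameter encodes an assumption about the data; this is not the logic of any particular mutation caller.

```python
from dataclasses import dataclass


@dataclass
class Site:
    alt_reads: int            # reads carrying the mutation
    total_reads: int          # coverage at the position
    mean_base_quality: float  # average quality of the supporting bases
    normal_alt_reads: int     # mutant reads seen in the matched normal


def allelic_fraction(alt_reads, total_reads):
    """Mutation allelic fraction: mutant reads divided by all reads."""
    return alt_reads / total_reads if total_reads else 0.0


def passes_filters(site, min_depth=20, min_fraction=0.05,
                   min_quality=20.0, max_normal_alt=0):
    """Apply illustrative thresholds; every default is an assumption."""
    return (
        site.total_reads >= min_depth
        and allelic_fraction(site.alt_reads, site.total_reads) >= min_fraction
        and site.mean_base_quality >= min_quality
        and site.normal_alt_reads <= max_normal_alt
    )


confident = Site(alt_reads=12, total_reads=100, mean_base_quality=32.0, normal_alt_reads=0)
subclonal = Site(alt_reads=2, total_reads=100, mean_base_quality=32.0, normal_alt_reads=0)

print(passes_filters(confident))                       # True
print(passes_filters(subclonal))                       # False with the defaults
print(passes_filters(subclonal, min_fraction=0.01))    # True once the goal changes
```

The second site fails only because of the chosen minimum fraction, so a project looking for rare subclonal mutations would need different settings than one looking for clonal drivers.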

Page 18: 20170209 ngs for_cancer_genomics_101

Categorize Mutations
• Silent/Nonsilent
  – Does the mutation alter phenotype?
• In exonic region
  – Synonymous: the encoded amino acid stays the same
  – Nonsynonymous: changes the amino acid sequence of the protein
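The synonymous/nonsynonymous distinction can be checked by translating the codon before and after the change. This sketch uses the standard genetic code and assumes the codon is given on the coding strand; the function name and the example codons are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_change(codon, position, alt_base):
    """Return 'synonymous' or 'nonsynonymous' for a single-base change.

    `codon` is the reference codon on the coding strand and
    `position` is the 0-based index (0, 1 or 2) of the changed base.
    """
    mutated = codon[:position] + alt_base + codon[position + 1:]
    if CODON_TABLE[codon] == CODON_TABLE[mutated]:
        return "synonymous"
    return "nonsynonymous"


print(classify_coding_change("GAA", 2, "G"))  # GAA (Glu) to GAG (Glu)
print(classify_coding_change("GAG", 1, "T"))  # GAG (Glu) to GTG (Val)
```

The first change leaves glutamate in place, while the second swaps it for valine and therefore alters the protein.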

Page 19: 20170209 ngs for_cancer_genomics_101

Categorize Mutations
• Oncogenesis
  – Oncogenes (the gas)
    • Cell growth
    • Activation causes cancer
  – Tumor Suppressor Genes (the brakes)
    • DNA repair, slow down cell division
    • Loss of function causes cancer
  – Two Hit Hypothesis (Knudson 1971)

Page 20: 20170209 ngs for_cancer_genomics_101

Mutational Signatures
• Find activated mutational processes
• Use the identified SNVs (single nucleotide variants) to determine them
  – Use 1 base of context on both the 5’ and 3’ side
    • .C. > .T.
  – 6 base substitution classes
    • C>A, C>G, C>T, T>A, T>C, T>G
  – 4 possible bases on both sides
  – Total: 6 * 4 * 4 = 96 possible substitution types
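The count of 96 can be reproduced by enumerating every combination. The sketch also folds changes at A or G onto the opposite strand so that the reference base is always C or T, which is the usual convention for explaining why only six classes are needed; the bracket notation and function names are illustrative choices.

```python
from itertools import product

SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
BASES = "ACGT"

# 6 substitution classes x 4 bases on the 5' side x 4 bases on the 3' side.
CLASSES = [
    f"{five}[{sub}]{three}"
    for sub in SUBSTITUTIONS
    for five, three in product(BASES, repeat=2)
]

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    return sequence.translate(COMPLEMENT)[::-1]


def classify_snv(context, alt_base):
    """Map an SNV to one of the 96 classes.

    `context` is the 3-base reference sequence centered on the mutated base.
    Changes at A or G are reported on the opposite strand so the
    reference base is always C or T.
    """
    if context[1] in "AG":
        context = reverse_complement(context)
        alt_base = alt_base.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt_base}]{context[2]}"


print(len(CLASSES))             # 96
print(classify_snv("ACG", "T"))  # A[C>T]G
print(classify_snv("CGT", "A"))  # same event seen from the other strand: A[C>T]G
```

Counting how often each of the 96 classes occurs in a tumor gives the profile that signature analysis then tries to explain.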

Page 21: 20170209 ngs for_cancer_genomics_101

Mutational Signatures

Alexandrov, L.B. et al. Nature 2013

Biological processes generating somatic mutations in cancer samples

Dataset: 4,938,362 mutations from 7,042 cancers

• Aging signature: CG>TA transitions at NpCpG
• Defective DNA MMR signature: CG>TA transitions at NpCpG; CG>AT transversions at CpCpC
• POLE signature: C>A transversions at TpCpT; T>G at TpTpT

I, indels; T, transcriptional strand bias

Page 22: 20170209 ngs for_cancer_genomics_101

Copy Number Analysis

[Figure: relative copy number categories: amplification, gain, neutral, loss, deletion]
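One way to picture the five categories is as bins on a log2 ratio of tumor to normal read depth. This is a sketch only: the slides give no thresholds, so every cutoff below is a hypothetical placeholder, and real copy number tools also correct for tumor purity, ploidy and segment boundaries.

```python
from math import log2


def relative_copy_number(tumor_depth, normal_depth):
    """Log2 ratio of tumor to normal read depth for a region."""
    return log2(tumor_depth / normal_depth)


def categorize(log_ratio, gain=0.3, amplification=1.0, loss=-0.3, deletion=-1.0):
    """Bin a log2 ratio into five categories; all cutoffs are assumptions."""
    if log_ratio >= amplification:
        return "amplification"
    if log_ratio >= gain:
        return "gain"
    if log_ratio <= deletion:
        return "deletion"
    if log_ratio <= loss:
        return "loss"
    return "neutral"


# Hypothetical (tumor depth, normal depth) pairs for five regions.
for tumor_depth, normal_depth in [(400, 100), (150, 100), (100, 100), (75, 100), (50, 100)]:
    ratio = relative_copy_number(tumor_depth, normal_depth)
    print(f"{tumor_depth:>3} vs {normal_depth}: {categorize(ratio)}")
```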

Page 23: 20170209 ngs for_cancer_genomics_101

RNA-Seq Analysis
• Gene expression
  – Find genes with low or high expression

Breast Invasive Carcinoma (TCGA, Nature 2012)

[Figure legend: AMP+Upreg, AMP, Upreg]

Page 24: 20170209 ngs for_cancer_genomics_101

RNA-Seq Analysis
• Gene expression as a prognostic indicator

Verhaak et al (JCI, 2013)

Page 25: 20170209 ngs for_cancer_genomics_101

RNA-Seq Analysis
• Fusion gene detection
  – E.g. TMPRSS2:ERG (50% of prostate cancers)