Post on 07-Apr-2017
Next Generation Sequencing for Cancer Genomics 101
2017/02/10
Ino de Bruijn
Jorge Reis-Filho Lab
Content
• DNA sequencing
  – Targets
    • DNA, genes, exons, introns
    • RNA
• How do we analyze NGS data?
  – Genetic changes
  – Mutational Signatures
  – RNA-Seq
Sequencing DNA in the modern era
• DNA sequencing converts real-world DNA into digital DNA
• In the 1980s
  – Sanger sequencing
  – Compare short regions of DNA
    • Possible by hand
• In the mid-2000s
  – Parallelization of sequencing reactions
  – Generates billions of DNA reads
    • DNA read: short stretch of DNA
  – Compare whole genomes
    • Impossible by hand
CACGTCTAAGGGCGAAGAGCTGACTGCTTTTTT
Targeting parts of the genome
• Human genome has 3 billion bases
• Be cost effective:
  – Focus on the part of the genome related to your subject
What is a gene?
• Human genome: 3 billion bases
  – 23 chromosomes
  – Certain stretches of DNA code for proteins, which perform a wide variety of functions in your body (~20,000 protein-coding genes in total)
Gene to protein
Exons comprise only ~1.5% of genome
Different targets
• Whole Genome Sequencing (WGS)
  – E.g. rearrangements outside genes
• Whole Exome Sequencing (WES)
  – E.g. gene discovery (rare/unknown tumors)
• Custom target
  – MSK-IMPACT (Integrated Mutation Profiling of Actionable Cancer Targets)
    • 410 genes related to cancer
    • >15K patients profiled at MSKCC
  – https://cbioportal.mskcc.org/study?id=mskimpact
NGS Principles
NGS Principles - Coverage
Sequence the same part many times: coverage is the number of times a base is covered by a read
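The definition of coverage can be made concrete with a small sketch. It assumes reads are already aligned and represented as hypothetical (start, end) intervals on a short region; real pipelines read alignments from files rather than tuples.

```python
def per_base_coverage(reads, region_length):
    """Count how many reads cover each position of a region.

    reads: list of (start, end) alignments, 0-based, end exclusive
           (an assumed, simplified representation of aligned reads).
    """
    depth = [0] * region_length
    for start, end in reads:
        # Clip each read to the region before counting.
        for pos in range(max(start, 0), min(end, region_length)):
            depth[pos] += 1
    return depth


# Three hypothetical overlapping reads on a 10-base region.
reads = [(0, 5), (2, 7), (4, 10)]
coverage = per_base_coverage(reads, 10)
mean_coverage = sum(coverage) / len(coverage)
print(coverage)       # per-base coverage
print(mean_coverage)  # average coverage over the region
```

Position 4 is covered by all three reads, so its coverage is 3, while the region's edges are covered only once.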
NGS Principles - Coverage
• Not all reads retrieved are correct
  – Many errors when sequencing
    • DNA library prep protocol
    • Sequencing error rate
• Sequencing groups of cells
  – Certain genetic changes occur only in a small fraction of cells
• Need to sequence the same part multiple times to get confidence
  – Amount depends on analysis & expectation
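Why the required amount depends on expectation can be illustrated with a simple binomial model. This is a sketch under stated assumptions, not how any particular tool works: reads are treated as independent, sequencing errors are ignored, and the allelic fraction (0.05) and minimum read count (5) are illustrative values only.

```python
from math import comb


def prob_at_least_k_mutant_reads(depth, allelic_fraction, k):
    """P(X >= k) for X ~ Binomial(depth, allelic_fraction).

    Simplified model: every read independently carries the mutation
    with probability `allelic_fraction`; sequencing errors are ignored.
    """
    p_fewer = sum(
        comb(depth, i) * allelic_fraction**i * (1 - allelic_fraction) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


# A mutation present in a small fraction of cells needs deeper coverage
# before enough supporting reads are likely to be seen.
for depth in (30, 100, 500):
    p = prob_at_least_k_mutant_reads(depth, allelic_fraction=0.05, k=5)
    print(f"depth {depth:>3}x: P(>=5 mutant reads) = {p:.3f}")
```

Under this model the chance of seeing enough mutant reads rises with depth, which is why low-fraction changes call for higher coverage.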
How to analyze the NGS data?
• Some might guess this is where the bioinformatician comes in…
Too late - the bioinformatician should have been helping you design the experiment
How to analyze NGS data?
• Tons of different options
  – What is the research question?
• Common analysis: identify genetic changes in the tumor
Identify the genetic changes
Meyerson et al. Nat Rev Genet 2010
Identify the genetic changes
• Compare against the reference human genome
  – Gives both germline and somatic mutations
• How to differentiate?
  – Databases with common germline variants miss many
• Somatic mutations
  – Take DNA from normal cells and tumor cells
  – Filter out mutations also found in the normal
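The tumor/normal filtering step can be sketched as a set difference. The variant tuples (chromosome, position, reference base, alternate base) and their values below are hypothetical, and real callers compare read evidence rather than exact matches of called variants.

```python
def somatic_candidates(tumor_variants, normal_variants):
    """Keep variants seen in the tumor but not in the matched normal.

    Variants found in both samples are assumed to be germline.
    """
    return sorted(set(tumor_variants) - set(normal_variants))


# Hypothetical variants as (chromosome, position, ref, alt).
tumor = [
    ("chr1", 1000, "A", "G"),  # also in the normal: germline
    ("chr2", 2000, "C", "T"),
    ("chr3", 3000, "G", "A"),
]
normal = [
    ("chr1", 1000, "A", "G"),
    ("chr4", 4000, "T", "C"),
]

somatic = somatic_candidates(tumor, normal)
print(somatic)
```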
Identify somatic mutations
Identify mutations
• Automated pipelines to do this
  – Example: mutation calling tools take into account
    • Number of reads having the mutation versus all reads (Mutation Allelic Fraction (MAF))
    • Coverage at that position
    • Read quality score
    • If calling somatic mutations
      – Mutation in the normal
• Every parameter makes assumptions about the data: communicate the goal of the project
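A minimal sketch of how such parameters might combine into a filter. The `SiteCounts` type, the function names, and all thresholds are hypothetical choices for illustration, not the defaults of any real mutation caller; read quality scores are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    alt_reads: int             # tumor reads carrying the mutation
    total_reads: int           # tumor coverage at the position
    normal_alt_reads: int = 0  # reads carrying the mutation in the normal


def mutation_allelic_fraction(alt_reads, total_reads):
    """Fraction of reads at a position that carry the mutation."""
    if total_reads == 0:
        return 0.0
    return alt_reads / total_reads


def passes_filters(site, min_coverage=20, min_maf=0.05, max_normal_alt_reads=0):
    """Apply illustrative thresholds; each one encodes an assumption about the data."""
    maf = mutation_allelic_fraction(site.alt_reads, site.total_reads)
    return (
        site.total_reads >= min_coverage
        and maf >= min_maf
        and site.normal_alt_reads <= max_normal_alt_reads
    )


confident = SiteCounts(alt_reads=12, total_reads=60)
low_coverage = SiteCounts(alt_reads=3, total_reads=8)
in_normal = SiteCounts(alt_reads=12, total_reads=60, normal_alt_reads=4)

print(mutation_allelic_fraction(12, 60))
print(passes_filters(confident), passes_filters(low_coverage), passes_filters(in_normal))
```

Lowering `min_maf` would keep more subclonal mutations but also more sequencing errors, which is one reason the project goal matters when choosing parameters.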
Categorize Mutations
• Silent/Nonsilent
  – Does the mutation alter phenotype?
• In exonic region
  – Synonymous: encoded amino acid stays the same
  – Nonsynonymous: changes the encoded amino acid of the protein
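The synonymous/nonsynonymous distinction can be checked by translating the codon before and after the change. This sketch uses the standard genetic code and assumes the codon is given on the coding strand; the function name and the example codons are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (third base varies fastest); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(codon, position, alt_base):
    """Classify a single-base change within a codon (position is 0, 1 or 2)."""
    mutated = codon[:position] + alt_base + codon[position + 1:]
    if CODON_TABLE[codon] == CODON_TABLE[mutated]:
        return "synonymous"
    return "nonsynonymous"


print(classify_codon_change("GAG", 2, "A"))  # GAG -> GAA, both encode E
print(classify_codon_change("GAG", 1, "T"))  # GAG -> GTG, E becomes V
```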
Categorize Mutations
• Oncogenesis
  – Oncogenes (the gas)
    • Cell growth
    • Activation causes cancer
  – Tumor Suppressor Genes (the brakes)
    • DNA repair, slow down cell division
    • Loss of function causes cancer
  – Two Hit Hypothesis (Knudson 1971)
Mutational Signatures
• Find activated mutational processes
• Use the identified SNVs (single nucleotide variants) to determine them
  – Use 1 base of context on both the 5’ and 3’ side
    • .C. > .T.
  – 6 base substitution classes
    • C>A, C>G, C>T, T>A, T>C, T>G
  – 4 possible bases on both sides
  – Total: 6 * 4 * 4 = 96 possible substitution types
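The count of 96 can be verified by enumerating the classes. The bracket label format (for example `A[C>T]G`) and the folding of purine-reference SNVs onto the pyrimidine strand via reverse complement are assumptions of this sketch; the folding is what allows only C and T to appear as reference bases.

```python
from itertools import product

BASES = "ACGT"
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def trinucleotide_classes():
    """All combinations of 6 substitutions x 4 bases 5' x 4 bases 3'."""
    return [
        f"{five}[{sub}]{three}"
        for sub in SUBSTITUTIONS
        for five, three in product(BASES, repeat=2)
    ]


def classify_snv(five, ref, alt, three):
    """Label an SNV with its context, reporting it on the pyrimidine strand."""
    if ref in "AG":
        # Reverse complement: swap the flanking bases and complement everything.
        ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
        five, three = three.translate(COMPLEMENT), five.translate(COMPLEMENT)
    return f"{five}[{ref}>{alt}]{three}"


classes = trinucleotide_classes()
print(len(classes))                      # 6 * 4 * 4
print(classify_snv("A", "G", "A", "T"))  # G>A on one strand is C>T on the other
```

Counting how many SNVs of a sample fall into each of these classes gives the 96-value profile from which signatures are extracted.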
Mutational Signatures
Alexandrov, L.B. et al. Nature 2013
Biological processes generating somatic mutations in cancer samples
Dataset: 4,938,362 mutations from 7,042 cancers
• Aging signature: CG>TA transitions at NpCpG
• Defective DNA MMR signature: CG>TA transitions at NpCpG; CG>AT transversions at CpCpC
• POLE signature: C>A transversions at TpCpT; T>G at TpTpT
Figure labels: I, indels; T, transcriptional strand bias
Copy Number Analysis
Relative copy number: amplification, gain, neutral, loss, deletion
RNA-Seq Analysis
• Gene expression
  – Find low or highly expressed genes
Breast Invasive Carcinoma (TCGA, Nature 2012)
Figure legend: AMP+Upreg, AMP, Upreg
RNA-Seq Analysis
• Gene expression as prognosis indicator
Verhaak et al (JCI, 2013)
RNA-Seq Analysis
• Fusion gene detection
  – E.g. TMPRSS2:ERG (50% of prostate cancers)