20170209 ngs for_cancer_genomics_101

Transcript of 20170209 ngs for_cancer_genomics_101

  • Next Generation Sequencing for Cancer Genomics 101

    2017/02/10

    Ino de Bruijn, Jorge Reis-Filho Lab

  • Content

    • DNA sequencing
    • Targets
    • DNA, genes, exons, introns
    • RNA
    • How do we analyze NGS data?
    • Genetic changes
    • Mutational Signatures
    • RNASeq

  • Sequencing DNA in the modern era

    DNA sequencing converts real-world DNA into digital DNA.

    • In the 1980s: Sanger sequencing
        • Compare short regions of DNA
        • Possible by hand
    • In the mid-2000s: parallelization of sequencing reactions
        • Generates billions of DNA reads
        • DNA read: a short stretch of DNA
        • Compare whole genomes
        • Impossible by hand


  • Targeting parts of the genome

    • The human genome has 3 billion bases
    • Be cost-effective: focus on the part of the genome related to your subject

  • What is a gene?

    • Human genome: 3 billion bases, 23 chromosome pairs
    • Certain stretches of DNA code for proteins, which perform a wide variety of functions in your body (~20,000 in total)

  • Gene to protein

    Exons comprise only ~1.5% of the genome
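As a rough worked example, the two figures quoted on these slides (3 billion bases, ~1.5% exonic) give the approximate size of an exome target. The variable names are illustrative only.

```python
# Figures taken from the slides; both are approximations.
GENOME_BASES = 3_000_000_000   # ~3 billion bases in the human genome
EXON_FRACTION = 0.015          # exons are ~1.5% of the genome

# Approximate number of bases targeted by whole exome sequencing.
exome_bases = round(GENOME_BASES * EXON_FRACTION)

# How many times smaller the exome target is than the whole genome.
fold_reduction = GENOME_BASES / exome_bases

print(f"Exome target: ~{exome_bases:,} bases ({fold_reduction:.0f}x smaller)")
```

This is why restricting sequencing to exons can be much cheaper than sequencing the whole genome at the same depth.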

  • Different targets

    • Whole Genome Sequencing (WGS), e.g. rearrangements outside genes
    • Whole Exome Sequencing (WES), e.g. gene discovery (rare/unknown tumors)
    • Custom target
        • MSK-IMPACT (Integrated Mutation Profiling of Actionable Cancer Targets)
        • 410 genes related to cancer
        • >15K patients profiled at MSKCC
        • https://cbioportal.mskcc.org/study?id=mskimpact

  • NGS Principles

  • NGS Principles - Coverage

    Sequence the same part many times: coverage is the number of times a base is covered by a read

  • NGS Principles - Coverage

    • Not all reads retrieved are correct
    • Many sources of error when sequencing
        • DNA library prep protocol
        • Sequencing error rate
    • Sequencing groups of cells
        • Certain genetic changes occur only in a small fraction of cells
    • Need to sequence the same part multiple times to gain confidence
    • The amount depends on the analysis and expectation
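The definition of coverage above can be sketched in a few lines. This is a minimal illustration, assuming reads are given as half-open `(start, end)` intervals on a single chromosome; the function name and the toy reads are hypothetical, and real pipelines compute depth from aligned BAM files with dedicated tools.

```python
from collections import Counter

def per_base_coverage(reads, region_start, region_end):
    """Count how many reads cover each position in [region_start, region_end).

    `reads` is a list of (start, end) half-open intervals (an assumption
    made for this sketch).
    """
    depth = Counter()
    for start, end in reads:
        # Only count the part of the read that falls inside the region.
        for pos in range(max(start, region_start), min(end, region_end)):
            depth[pos] += 1
    return [depth[pos] for pos in range(region_start, region_end)]

# Three toy reads of length 5 over a 10-base region.
reads = [(0, 5), (2, 7), (4, 9)]
coverage = per_base_coverage(reads, 0, 10)
mean_coverage = sum(coverage) / len(coverage)
```

Here position 4 is covered by all three reads, while position 9 is not covered at all, which shows why the mean coverage alone can hide poorly covered bases.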

  • How to analyze the NGS data?

    Some might guess this is where the bioinformatician comes in.

    Too late: the bioinformatician should have been helping you design the experiment.

  • How to analyze NGS data?

    • Tons of different options
    • What is the research question?
    • Common analysis: identify genetic changes in the tumor

  • Identify the genetic changes

    Meyerson et al. Nat Rev Genet 2010

  • Identify the genetic changes

    • Compare against the reference human genome
        • Gives both germline and somatic mutations
        • How to differentiate?
        • Databases of common germline variants miss many
    • Somatic mutations
        • Take DNA from normal cells and tumor cells
        • Filter out mutations found in the normal
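The tumor/normal filtering idea can be sketched as a set difference. This is a simplified illustration, assuming each variant is a `(chromosome, position, ref, alt)` tuple; the function name and the example variants are made up, and real somatic callers model both samples jointly rather than subtracting call sets.

```python
def somatic_candidates(tumor_variants, normal_variants):
    """Keep variants seen in the tumor but not in the matched normal.

    Variants present in the normal are assumed to be germline.
    """
    germline = set(normal_variants)
    return [variant for variant in tumor_variants if variant not in germline]

# Hypothetical calls against the reference genome (not real data).
tumor = [("chr1", 1000, "C", "T"), ("chr2", 2000, "G", "A"), ("chr3", 3000, "T", "G")]
normal = [("chr2", 2000, "G", "A")]

somatic = somatic_candidates(tumor, normal)
```

The variant shared with the normal sample is dropped as germline, leaving the two tumor-only variants as somatic candidates.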

  • Identify somatic mutations

  • Identify mutations

    • Automated pipelines do this
    • Example: mutation calling tools take into account
        • Number of reads having the mutation versus all reads (Mutation Allelic Fraction (MAF))
        • Coverage at that position
        • Read quality score
        • If calling somatic mutations: mutation in the normal
    • Every parameter makes assumptions about the data: communicate the goal of the project
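To make the parameters above concrete, here is a toy filter built on the mutation allelic fraction. It is only a sketch: the function names and every threshold are hypothetical defaults chosen for illustration, not values from the slides or from any particular caller, and read quality is left out.

```python
def allelic_fraction(alt_reads, total_reads):
    """Fraction of reads at a position that carry the mutation (MAF)."""
    if total_reads == 0:
        return 0.0
    return alt_reads / total_reads

def passes_filters(alt_reads, total_reads, normal_alt_reads=0,
                   min_coverage=20, min_fraction=0.05, max_normal_alt_reads=0):
    """Apply simple, assumed thresholds to one candidate mutation."""
    if total_reads < min_coverage:
        return False  # too few reads to be confident
    if allelic_fraction(alt_reads, total_reads) < min_fraction:
        return False  # could be sequencing error
    if normal_alt_reads > max_normal_alt_reads:
        return False  # also seen in the normal: likely germline
    return True
```

For example, `passes_filters(12, 100)` accepts a mutation seen in 12 of 100 reads, while `passes_filters(1, 10)` rejects one for low coverage. A subclonal mutation present in a small fraction of cells could fall below `min_fraction`, which is one way a threshold encodes an assumption about the data.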

  • Categorize Mutations

    • Silent/Nonsilent: does the mutation alter phenotype?
    • In exonic region
        • Synonymous: amino acid code stays the same
        • Nonsynonymous: changes the amino acid code of the protein
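The synonymous/nonsynonymous distinction can be illustrated with the standard genetic code. This is a minimal sketch that assumes the reference and mutated codons are already known and in frame; the function name is hypothetical, and real annotation tools also handle transcripts, strands and splice sites.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_codon_change(ref_codon, alt_codon):
    """Return 'synonymous' if both codons encode the same amino acid."""
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "nonsynonymous"
```

For example, GAA to GAG still encodes glutamate (synonymous), while GAA to GTA changes glutamate to valine (nonsynonymous).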

  • Categorize Mutations

    • Oncogenesis
    • Oncogenes (the gas)
        • Cell growth
        • Activation causes cancer
    • Tumor Suppressor Genes (the brakes)
        • DNA repair, slow down cell division
        • Loss of function causes cancer
        • Two Hit Hypothesis (Knudson 1971)

  • Mutational Signatures

    • Find activated mutational processes
    • Use the identified SNVs (single nucleotide variants) to determine them
    • Use 1 base of context on both the 5' and 3' side: .C. > .T.
    • 6 base substitution classes: C>A, C>G, C>T, T>A, T>C, T>G
    • 4 possible bases on both sides
    • Total: 6 * 4 * 4 = 96 possible substitution types
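The count of 96 can be checked by enumerating the classes. The sketch below also folds purine-reference SNVs (A or G) onto the opposite strand, which is the usual convention behind having only six substitution classes; the function names and the `A[C>T]G` label format are illustrative choices, not part of the slides.

```python
from itertools import product

BASES = "ACGT"
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def mutation_classes():
    """Enumerate every substitution with one flanking base on each side."""
    return [
        f"{five}[{sub}]{three}"
        for sub in SUBSTITUTIONS
        for five, three in product(BASES, repeat=2)
    ]

def classify_snv(five, ref, alt, three):
    """Label one SNV, reporting it relative to the pyrimidine (C or T) strand."""
    if ref in "AG":
        # Reverse complement: the flanking bases swap sides.
        five, ref, alt, three = (
            three.translate(COMPLEMENT),
            ref.translate(COMPLEMENT),
            alt.translate(COMPLEMENT),
            five.translate(COMPLEMENT),
        )
    return f"{five}[{ref}>{alt}]{three}"

classes = mutation_classes()
```

Counting how often each of these labels occurs in a tumor gives the 96-element profile that signature analysis works from.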

  • Mutational Signatures

    Alexandrov, L.B. et al. Nature 2013

    Biological processes generating somatic mutations in cancer samples

    Dataset: 4,938,362 mutations from 7,042 cancers

    • Aging signature: C:G>T:A transitions at NpCpG
    • Defective DNA MMR signature: C:G>T:A transitions at NpCpG; C:G>A:T transversions at CpCpC
    • POLE signature: C>A transversions at TpCpT; T>G at TpTpT

    Legend: I, indels; T, transcriptional strand bias

  • Copy Number Analysis

    [Figure: relative copy number, with a deletion marked]
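The slide's own definition of relative copy number did not survive in this transcript. One common way to express it, assumed here rather than taken from the slides, is the log2 ratio of tumor to normal read depth over a region. The function names and thresholds below are hypothetical, and normalization for total sequencing depth and tumor purity is omitted.

```python
import math

def log2_ratio(tumor_depth, normal_depth):
    """Log2 ratio of tumor to normal read depth for one region (assumed metric)."""
    return math.log2(tumor_depth / normal_depth)

def call_region(ratio, loss_threshold=-0.3, gain_threshold=0.3):
    """Label a region using illustrative, assumed thresholds."""
    if ratio <= loss_threshold:
        return "deletion"
    if ratio >= gain_threshold:
        return "gain"
    return "neutral"

# A region with half the depth in the tumor has a log2 ratio of -1.
ratio = log2_ratio(50, 100)
label = call_region(ratio)
```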

  • RNA-Seq Analysis

    Gene expression: find lowly or highly expressed genes

    Breast Invasive Carcinoma (TCGA, Nature 2012)




  • RNA-Seq Analysis

    Gene expression as prognosis indicator

    Verhaak et al (JCI, 2013)

  • RNA-Seq Analysis

    Fusion gene detection, e.g. TMPRSS2:ERG (50% of prostate cancers)

  • debruiji@mskcc.org