The Application of Next Generation Sequencing (NGS) in cancer treatment
Next generation sequencing (NGS): procedures and applications
Istituto di Tecnologie Biomediche Consiglio Nazionale delle Ricerche
ITB - CNR
Ermanno Rizzi, PhD
Ferrara, 05/05/2014
NGS…what?
• High-performance sequencing: high-throughput sequencing (HTS)
Why?
• Demand for more sequencing data at lower cost (compared to Sanger).
When?
• 2005: first NGS platform, the GS20 by 454 Life Sciences … something is changing
• high quantity and quality of available data • new genomics applications
…what will change?
• new multidisciplinary approaches.
Next Generation Sequencing NGS: Intro
Workflow
Reference sequence
Mapping reads
Depth (also “X” or fold): average number of reads covering each base of the reference
Coverage: % of the reference sequence covered by at least one read
Keywords
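Depth and coverage can be made concrete with a small calculation. The sketch below assumes reads have already been mapped and are given as hypothetical half-open (start, end) intervals on a toy 20 bp reference; it counts per-base depth, then derives mean depth ("X") and coverage (% of reference bases hit by at least one read).

```python
def depth_and_coverage(reference_length, alignments):
    """Per-base depth, mean depth ("X") and coverage (%) from mapped reads.

    alignments: list of (start, end) half-open, 0-based intervals on the reference.
    """
    depth = [0] * reference_length
    for start, end in alignments:
        # Clip each read to the reference and add one to every base it covers
        for pos in range(max(0, start), min(end, reference_length)):
            depth[pos] += 1
    mean_depth = sum(depth) / reference_length
    covered = sum(1 for d in depth if d > 0)
    coverage_pct = 100.0 * covered / reference_length
    return depth, mean_depth, coverage_pct


# Toy example: a 20 bp reference and three mapped reads (hypothetical values)
depth, mean_depth, coverage_pct = depth_and_coverage(20, [(0, 8), (4, 12), (15, 20)])
print(f"mean depth = {mean_depth:.2f}X, coverage = {coverage_pct:.0f}%")
```

The example shows why the two keywords differ: a run can reach about 1X mean depth while still leaving part of the reference (here positions 12-14) uncovered. With real data the per-base counts would come from a pileup tool such as `samtools depth`.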
NGS Platforms
Roche/454 Genome Sequencer FLX-Titanium, Junior
Illumina Genome Analyzer, HiSeq, MiSeq
Life SOLiD 5500 W Series Genetic Analysis Systems
Life Ion Personal Genome Machine (PGM) and Proton
NGS platform comparison
Platform | Sequencing chemistry | Signal detection
Roche/454 | Single-nucleotide addition (SNA) by pyrosequencing | Luminescence
Ion Torrent (PGM) or Proton | Single-nucleotide addition (SNA) by semiconductor sequencing | pH change on the chip
Illumina | Sequencing by synthesis (SBS) using cyclic reversible termination (CRT) | Fluorescence
SOLiD | Sequencing by ligation (SBL) | Fluorescence
NGS platform distribution
Platform | Run time | Read length (bp) | Gigabases/run | Reagent cost/run
Roche/454 Titanium FLX+ | 20 hrs | 800 | 0.8 | $6,200
Roche/454 Titanium Junior | 10 hrs | 400 | 0.04 | $977
Ion Torrent - Proton I | 4 hrs | 175 | 12.2 | $1,000
Illumina HiSeq 2500 - high output v3 | 2 days | 50 | 75 | $5,866
Illumina HiSeq 2500 - high output v3 | 11 days | 200 | 300 | $13,580
Life Technologies SOLiD 5500xl | 8 days | 110 | 155 | $10,503
NGS cost comparison
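One way to compare the platforms in the cost table is to normalise by output. The sketch below reuses the figures quoted in that table and computes reagent cost per gigabase and the idealised fold coverage of a human genome per run. The 3.1 Gb genome size is an assumption, and the calculation assumes every sequenced base is usable and maps uniquely, which real runs never achieve.

```python
# Figures copied from the cost table above: (Gigabases/run, reagent cost/run in USD)
PLATFORMS = {
    "Roche/454 Titanium FLX+": (0.8, 6200),
    "Roche/454 Titanium Junior": (0.04, 977),
    "Ion Torrent - Proton I": (12.2, 1000),
    "Illumina HiSeq 2500 (2 days)": (75, 5866),
    "Illumina HiSeq 2500 (11 days)": (300, 13580),
    "Life Technologies SOLiD 5500xl": (155, 10503),
}

HUMAN_GENOME_GB = 3.1  # assumed haploid human genome size in gigabases


def cost_per_gb(gigabases, cost):
    """Reagent cost divided by the gigabases produced in one run."""
    return cost / gigabases


def fold_coverage(gigabases, genome_gb=HUMAN_GENOME_GB):
    """Idealised mean depth ("X") if every base of the run maps to the genome."""
    return gigabases / genome_gb


for name, (gb, cost) in PLATFORMS.items():
    print(f"{name:32s} ${cost_per_gb(gb, cost):>9,.0f}/Gb  {fold_coverage(gb):6.1f}X human genome")
```

Cost per run and cost per base can rank the platforms very differently, which is why both appear among the selection criteria for choosing a platform.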
NGS workflow: from sample prep to final data
Target sample → library preparation → library amplification → NGS signal detection and analysis → read analysis → final data
Library preparation
DNA fragmentation
• Sonication • Ultrasonication (Covaris) • Nebulization • Enzymatic
Adapter addition and multiplexing (MID or index)
• Ligation • Tagmentation • Paired ends or mate pair
Library quality control and quantitation
• Fluorometer • qPCR • Agilent Bioanalyzer
…to library amplification
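Multiplexing works because each library carries its own index (MID) sequence, so pooled reads can be sorted back to samples after the run. A minimal sketch of that demultiplexing step, assuming hypothetical 6 bp indexes and tolerating at most one mismatch; reads that match no index, or more than one, are left unassigned.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, sample_indexes, max_mismatches=1):
    """Return the sample whose index matches index_read, or None.

    A read is assigned only if exactly one index is within max_mismatches,
    so ambiguous or unrecognised reads are left unassigned.
    """
    hits = [
        sample
        for sample, index in sample_indexes.items()
        if len(index) == len(index_read) and hamming(index, index_read) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


# Hypothetical 6 bp indexes for two multiplexed libraries
INDEXES = {"sample_A": "ACGTAC", "sample_B": "TGCATG"}
print(assign_sample("ACGTAA", INDEXES))  # one mismatch from sample_A
print(assign_sample("GGGGGG", INDEXES))  # too far from both, so None
```

Allowing a mismatch rescues reads with a sequencing error in the index, but it is only safe when the indexes in a pool differ from each other at several positions.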
Library preparation procedure: adapter ligation
Roche/454 Rapid Protocol:
• for all Roche/454 applications • starting material: 500 ng of gDNA
Library preparation procedure: enzymatic “tagmentation”
Illumina Nextera “tagmentation”:
• for small genomes • small amount of starting material: 1 or 50 ng of gDNA
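To see why 1 ng counts as a small amount of starting material, the input mass can be converted into genome copies. The sketch assumes an average mass of 650 g/mol per double-stranded base pair and a 3.1 Gb haploid human genome (both assumptions); the bacterial example uses the 3,564,836 bp A. venetianus assembly size reported later in this talk.

```python
AVOGADRO = 6.022e23        # molecules per mole
BP_MASS_G_PER_MOL = 650    # assumed average mass of one double-stranded base pair


def genome_copies(input_ng, genome_bp):
    """Approximate number of haploid genome copies in input_ng of DNA."""
    genome_mass_g = genome_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return input_ng * 1e-9 / genome_mass_g


HUMAN_BP = 3.1e9  # assumed haploid human genome size
for ng in (500, 50, 1):
    print(f"{ng:>4} ng human gDNA ~ {genome_copies(ng, HUMAN_BP):,.0f} haploid copies")

# A small bacterial genome yields far more copies per nanogram
print(f"   1 ng bacterial gDNA ~ {genome_copies(1, 3_564_836):,.0f} copies")
```

With only about 300 human genome copies in 1 ng, every copy counts, while the same mass of a small bacterial genome holds hundreds of thousands of copies, consistent with recommending low-input tagmentation for small genomes.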
Library preparation procedure: paired ends
Amplification and selection
• Coupling the ends of large fragments: 3, 8, 20 kb
• Larger contigs • Improved scaffolding • Complete genome sequencing • Detection of structural variants
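Paired ends help with structural variants because the distance between the two mates in the library is known. If the mates map much farther apart on the reference than the insert size, the sample probably lacks the intervening sequence (a deletion); if they map closer, extra sequence is probably present. A minimal sketch, assuming a hypothetical 8 kb mate-pair library with a ±1 kb tolerance and ignoring mate orientation and inter-chromosomal pairs:

```python
def classify_pair(pos1, pos2, expected_insert, tolerance):
    """Compare the mapped distance of two mates with the library insert size.

    pos1, pos2: mapped start coordinates of the two mates on the same chromosome.
    """
    observed = abs(pos2 - pos1)
    if abs(observed - expected_insert) <= tolerance:
        return "concordant"
    if observed > expected_insert:
        return "discordant: too far apart (possible deletion)"
    return "discordant: too close (possible insertion)"


# Hypothetical 8 kb mate-pair library, accepting +/- 1 kb
print(classify_pair(100_000, 108_200, 8_000, 1_000))
print(classify_pair(100_000, 115_000, 8_000, 1_000))
```

A single discordant pair is weak evidence; in practice a structural variant is called only when several independent pairs point to the same rearrangement.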
Library amplification: Emulsion PCR (emPCR)
Roche/454:
• Library captured onto bead surfaces
• Water-in-oil emulsion creates microreactors
• Millions of DNA fragments amplified on bead surfaces
• DNA beads recovered by breaking the emulsion
• Enrichment to eliminate null beads and recover positive beads
Library amplification: Bridge amplification
Illumina bridge amplification • High-density primers attached to the slide • Solid-phase amplification • 100-200 million spatially separated template clusters
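The need to remove null beads after emPCR follows from how templates are loaded. Assuming templates distribute randomly over beads (a Poisson model, which is an assumption here), the mean number of templates per bead, λ, fixes the fractions of empty, clonal (one template) and mixed beads:

```python
import math


def bead_occupancy(templates_per_bead):
    """Poisson fractions of beads carrying 0, exactly 1, or more than 1 template."""
    lam = templates_per_bead
    p0 = math.exp(-lam)        # null beads
    p1 = lam * math.exp(-lam)  # clonal beads, the useful ones
    return p0, p1, 1.0 - p0 - p1


for lam in (0.1, 0.5, 1.0):
    empty, single, mixed = bead_occupancy(lam)
    print(f"lambda={lam}: null {empty:.1%}, clonal {single:.1%}, mixed {mixed:.1%}")
```

Raising λ increases clonal beads only at the cost of more mixed beads, which give unreadable signals, so loading is kept low and the resulting excess of null beads is removed by enrichment.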
www.molecularecologist.com
Platform error rates
Instrument | Primary errors | Single-pass error rate (%) | Final error rate (%)
3730xl (capillary) | Substitution | 0.1-1 | 0.1-1
454, all models | Indel | 1 | 1
Illumina, all models | Substitution | ~0.1 | ~0.1
Ion Torrent, all chips | Indel | ~1 | ~1
SOLiD 5500xl | A-T bias | ~5 | ≤0.1
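Error rates like those in the table are often quoted as Phred quality scores, Q = -10·log10(p). The sketch converts the table's approximate single-pass rates (expressed as fractions) into Q values and into the expected number of errors in a 100 bp read; the 100 bp read length is only an illustrative choice.

```python
import math


def phred_quality(error_rate):
    """Phred quality Q = -10 * log10(p) for an error probability p."""
    return -10 * math.log10(error_rate)


def expected_errors(read_length, error_rate):
    """Mean number of erroneous bases in a read of the given length."""
    return read_length * error_rate


# Approximate single-pass error rates from the table above, as fractions
RATES = {"Illumina": 0.001, "454 / Ion Torrent": 0.01, "SOLiD single pass": 0.05}
for name, p in RATES.items():
    print(f"{name:18s} Q{phred_quality(p):.0f}, ~{expected_errors(100, p):.1f} errors per 100 bp read")
```

A 0.1% rate corresponds to Q30 and a 1% rate to Q20, the two thresholds most often used to describe base-call quality.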
The best platform is…
Things to consider: • Amount of data • Read length • Support from the company • User community • Post-sequencing requirements • Platform cost • Kit cost • Cost per base
• De novo vs re-sequencing
• Target: DNA, RNA
• Single-end vs paired-end
• Sequencing approaches: shotgun vs enrichment
• Enrichment by PCR • Enrichment by probe capture • ChIP-Seq for epigenetic studies
NGS applications
Target: DNA and RNA
DNA Seq = Genomics • genome sequencing (nuDNA or mtDNA) • exome • variant calling: mutations and SNPs • copy number variation (CNV) • Epigenetics
• ChIP-Seq: promoter methylation, histone modifications, transcription factors
RNA Seq = Transcriptomics • gene expression levels • variant calling • splicing variants • fusion transcripts • transcript discovery
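Gene expression levels in RNA Seq are estimated from read counts, which must be corrected for transcript length (longer transcripts collect more reads) and for sequencing depth. Transcripts per million (TPM) is one common unit for this; it is not named in the slides and is used here only as an illustration, with hypothetical counts and lengths.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from raw read counts and transcript lengths."""
    # Reads per base: normalises for transcript length
    rates = {g: counts[g] / lengths_bp[g] for g in counts}
    total = sum(rates.values())
    # Rescale so the values of one sample sum to one million
    return {g: rate / total * 1e6 for g, rate in rates.items()}


# Hypothetical counts for two transcripts of different lengths
print(tpm({"gene_1": 100, "gene_2": 200}, {"gene_1": 1000, "gene_2": 4000}))
```

Although gene_2 has twice the reads, it is four times longer, so after normalisation gene_1 has the higher expression estimate.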
Sample preparation for RNA Seq
RNA Seq: sequencing of • mRNA • ncRNA • small RNA • microRNA
cDNA synthesis primed with poly-T or random hexamers
NGS applications: DNA
Total genomic DNA → direct sequencing
Total genomic DNA → probe capture → capture sequencing
Total genomic DNA → PCR → amplicon sequencing
Enrichment by probe capture
Capture probe design and synthesis → library preparation → pre-amplification → capture probe hybridization (“fishing”) → washing → target recovery → NGS protocol
Capture sequencing: the target
Exome
• targets: exons
• ~1% of the human genome
• size: ~30 Mb
• ~85% of known disease-related mutations
• multi-sample variant calling (MSVC) for low-pass sequencing
Custom
• Diagnostic genome regions
• Chromosomes
• Specific regions (kinases, transcription factors…)
• Specific genes (HLA, MHC, etc.)
Aims • Identification of rare and common variants
• single nucleotide variants • insertions and deletions (InDels)
• SNP analysis • copy number variations (CNV)
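Because an exome is only ~1% of the genome, the success of a capture is usually judged by how much it enriches target reads over what shotgun sequencing would give by chance. A minimal sketch of that fold-enrichment calculation, using the ~30 Mb exome size from the slide, an assumed 3.1 Gb genome, and hypothetical read counts:

```python
def enrichment_fold(on_target_reads, total_reads, target_bp, genome_bp):
    """Fold enrichment = fraction of reads on target / fraction of genome targeted."""
    on_target_fraction = on_target_reads / total_reads
    target_fraction = target_bp / genome_bp
    return on_target_fraction / target_fraction


# Exome-like target (~30 Mb of a ~3.1 Gb genome); read counts are hypothetical
fold = enrichment_fold(on_target_reads=6_000_000, total_reads=10_000_000,
                       target_bp=30e6, genome_bp=3.1e9)
print(f"enrichment ~{fold:.0f}-fold")
```

Without capture, only about 1% of reads would fall on the exome; 60% on-target reads therefore corresponds to roughly a 60-fold enrichment.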
Looking forward…
Sanger sequencing → NGS → Third Generation Sequencing (TGS)
Third generation: single-molecule sequencing
Company | TGS principle
Helicos Genetic Analysis Platform | Virtual Terminator nucleotides
Pacific Biosciences | Anchored DNA polymerase + zero-mode waveguide (ZMW)
VisiGen Biotechnologies | Modified DNA polymerase + Fluorescence Resonance Energy Transfer (FRET)
Halcyon Molecular; ZS Genetics | Transmission Electron Microscopy (TEM)
Oxford Nanopore | Modified α-hemolysin pore + measurement of ionic current
TGS features • No “wash-and-scan” technology • “Real time”: very fast • No synchronization required, so no dephasing problem • Single-molecule sequencing
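The dephasing problem arises because cluster- or bead-based platforms read thousands of copies of a template in lockstep. If a copy occasionally lags or runs ahead by one base, the summed signal blurs as the run goes on. A deliberately simplified model, assuming every copy independently slips with a fixed hypothetical probability per cycle and never re-synchronises, shows how quickly the in-phase fraction decays:

```python
def in_phase_fraction(per_cycle_slip, cycles):
    """Fraction of copies in a cluster/bead still in step after n cycles.

    Simplified model: each copy independently lags or leads with
    probability per_cycle_slip at every cycle, and never re-synchronises.
    """
    return (1.0 - per_cycle_slip) ** cycles


# Hypothetical 0.5% slip per cycle
for n in (50, 100, 200):
    print(f"cycle {n}: {in_phase_fraction(0.005, n):.0%} of copies in phase")
```

A single-molecule reader has no population to fall out of step, which is why TGS needs no synchronization.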
Applications @ ITB-CNR • Shotgun: bacteria genome finishing • PCR enrichment: Integrome study in Gene Therapy • Variant calling in ancient DNA • RNA seq: transcriptome of breast cancer • Metagenomics
Shotgun: bacteria genome finishing
Fuel droplet; A. venetianus colonies
Circular representations of A. venetianus VE-C3 chromosome and plasmids.
Acinetobacter venetianus VE-C3 genome sequencing
• Roche/454 + Illumina sequencing • 3,564,836 bp assembled
Adhesion to fuel oil: wee cluster for n-alkane adhesion
Metabolism of n-alkanes: alk-like sequences, cytochrome P450
Resistance to heavy metals found in the Venice Lagoon: As, Cd, Co, Cr, Cu, Hg, Pb, Zn
Identification of bioremediation and resistance clusters
Phylogenetic analysis conducted using a set of conserved proteins: FusA, IleS, LepA, LeuS, PyrG, RecA, RecG, RplB, RpoB
Each genome is represented by an arc and the different genomes (arcs) are connected by vertices accounting for their shared sequence similarity
Phylogenetic analysis: Acinetobacter pangenome
BLAST comparisons of Acinetobacter species.
Integrome study in Gene Therapy
Proviral vector Integration
Diagram: proviral vector integrated into the human genome
Integration site (IS): the nucleotide sequence at the vector-genome junction, e.g. GATCCGTTTCAGTCGATCAGTGGGCATA
Integrome: all detectable IS in the human genome
Recovery of vector-genome junctions: Ligation-Mediated PCR (LM-PCR)
Diagram: integrated proviral vector (5’ LTR, 3’ LTR) within genomic DNA; PstI and MseI restriction sites; ligated linker; LM-PCR followed by nested PCR
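After LM-PCR and nested PCR, each product is expected to start in the vector LTR, continue into the flanking human genome, and end in the ligated linker. The genomic part is what gets mapped to locate the IS. A minimal sketch of that trimming step, using exact matching and hypothetical LTR-end and linker sequences (real reads need mismatch-tolerant matching):

```python
def extract_integration_flank(read, ltr_end, linker, min_len=20):
    """Return the genomic sequence between the LTR end and the linker, or None."""
    start = read.find(ltr_end)
    if start < 0:
        return None  # no vector-genome junction in this read
    flank = read[start + len(ltr_end):]
    linker_pos = flank.find(linker)
    if linker_pos >= 0:
        flank = flank[:linker_pos]  # trim the ligated linker
    return flank if len(flank) >= min_len else None  # too short to map reliably


# Hypothetical LTR-end and linker sequences around the IS example from the slide
LTR_END = "ACGTACGTTGCA"
LINKER = "GTAATACGACTC"
read = LTR_END + "GATCCGTTTCAGTCGATCAGTGGGCATA" + LINKER
print(extract_integration_flank(read, LTR_END, LINKER))
```

The minimum length filter reflects that very short flanks cannot be placed unambiguously in the human genome.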
Distribution of retroviral integrations around transcription start sites.
Distribution of the distance of MLV and HIV integration sites from the transcription start site (TSS) of targeted genes at 2500-bp (A), 50-bp (B), or 5-bp (C) resolution.
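Plots like this one are built by computing, for each IS, its signed distance from the TSS of the nearest gene (taking the gene's strand into account) and then counting distances in bins of the chosen resolution. A minimal sketch with hypothetical coordinates; negative distances mean upstream of the gene, and each bin is labelled by its lower edge:

```python
from collections import Counter


def signed_tss_distance(site, tss, strand):
    """Distance of an integration site from a TSS, negative = upstream of the gene."""
    d = site - tss
    return d if strand == "+" else -d


def bin_distances(distances, bin_width):
    """Count distances per bin; each key is the lower edge of its bin."""
    return Counter((d // bin_width) * bin_width for d in distances)


# Hypothetical (IS position, TSS position, gene strand) triples
sites = [(1000, 1200, "+"), (5000, 4990, "-"), (3000, 2995, "+")]
distances = [signed_tss_distance(*s) for s in sites]
print(distances)
print(bin_distances(distances, 50))
print(bin_distances(distances, 2500))
```

The same distances look very different at different bin widths: at 2500 bp two sites share a bin, while at 50 bp all three separate, which is why the figure shows several resolutions.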
Results and applications • Integration pattern of proviral vectors used in gene therapy
• A tool to study transcriptionally active regions, applied in stem cell studies
Ancient DNA analysis by NGS
… to recover genetic info from the past.
• To determine phylogenetic relationships among extinct and extant animals
• For palaeogenetics and evolutionary biology studies
Why study ancient DNA (aDNA)?
Diagram: “Homo” evolution from a common ancestor
Why study aDNA?
• For anthropological applications and population genetics on modern humans and on Early Modern Humans (EMH)
• To study the domestication process
aDNA analysis challenge: authenticity assessment (ancient vs modern)
Features of aDNA • low amount • high degradation • small fragment size: 70-120 bp • contamination • post-mortem damage
Features of aDNA: Misincorporation pattern I
Patterns of damage in genomic DNA sequences from a Neandertal. Briggs AW, PNAS 2007
Features of aDNA: Misincorporation pattern II
Temporal Patterns of Nucleotide Misincorporations and DNA Fragmentation in Ancient DNA. Sawyer S, PLoS One. 2012
C to T misincorporations at the first position of mtDNA fragments as a function of age.
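The damage patterns described by Briggs et al. and Sawyer et al. are used as an authenticity check: genuine aDNA shows an excess of C to T differences at the first positions of reads. A minimal sketch of that tabulation, assuming reads are already aligned to the reference without gaps and using hypothetical toy alignments; dedicated tools (for example mapDamage) do this on real alignment files.

```python
def ct_profile(alignments, n_positions=10):
    """Fraction of reference C read as T at each position from the read 5' end.

    alignments: (read, reference) pairs already aligned without gaps.
    """
    c_seen = [0] * n_positions
    c_to_t = [0] * n_positions
    for read, ref in alignments:
        for i, (r, g) in enumerate(zip(read[:n_positions], ref[:n_positions])):
            if g == "C":
                c_seen[i] += 1
                if r == "T":
                    c_to_t[i] += 1
    return [t / c if c else 0.0 for t, c in zip(c_to_t, c_seen)]


# Toy gapless alignments (hypothetical): C->T changes pile up at position 0
pairs = [("TAGCT", "CAGCT"), ("TTGCA", "CTGCA"), ("CAGTT", "CAGCT")]
profile = ct_profile(pairs, n_positions=5)
print([round(f, 2) for f in profile])
```

On a real dataset, a high value at position 0 that drops towards the read interior supports an ancient origin, whereas a flat profile suggests modern contamination.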
Ancient DNA and NGS
• Single-locus PCR • Multiplex PCR • Shotgun approach • Custom capture: the best approach in terms of
• discrimination of endogenous vs exogenous DNA
• cost • enrichment ratio
Forensic DNA
Features forensic DNA shares with aDNA
• Fragmentation • Low amount • High level of contamination
NGS applied to forensic DNA: Short Tandem Repeat (STR) counting and analysis
STR profiling
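With NGS the repeat region is read base by base, so STR units can be counted directly from reads rather than inferred from fragment length as in capillary electrophoresis, and interruptions inside the repeat become visible. A simplified sketch that finds the longest uninterrupted run of a motif in a hypothetical read; real allele designations follow locus-specific conventions that this does not implement.

```python
def longest_repeat_run(seq, motif):
    """Largest number of consecutive, in-frame copies of motif in seq."""
    k = len(motif)
    best = 0
    for frame in range(k):  # the repeat can start at any offset
        run = 0
        for i in range(frame, len(seq) - k + 1, k):
            if seq[i:i + k] == motif:
                run += 1
                best = max(best, run)
            else:
                run = 0  # an interrupted unit ends the current run
    return best


# Hypothetical read spanning a tetranucleotide STR with one interrupted unit
read = "TCTA" * 3 + "TCCA" + "TCTA" * 5
print(longest_repeat_run(read, "TCTA"))
```

Two alleles with the same total length but different internal sequence would look identical by fragment size, yet give different results here.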
RNA seq: transcriptome of breast cancer
Rationale • primary human lobular breast cancer tissue • 132,000 reads • validated by RT-PCR
Results:
• one deletion • two novel ncRNAs • ten unknown or rare transcript isoforms • a novel gene fusion • thousands of novel non-coding transcripts • more than three hundred reads corresponding to the non-coding RNA MALAT1, which is highly expressed in many human carcinomas
Intragenic deletion in WHSC1L1, identified by the 99-base read 1B.
The green and blue arrows represent the two halves of the fusion transcript which map on the opposite order to the genome.
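The gene fusion above was recognised from a read whose two halves map to the genome in the opposite order. A minimal sketch of that check, assuming each half of the read has already been aligned and described by a hypothetical `Hit` record (chromosome, leftmost coordinate, strand); the 1 Mb `max_gap` cutoff is also an assumption.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    chrom: str
    pos: int      # leftmost mapped coordinate of the read half
    strand: str   # "+" or "-"


def classify_split_read(first_half, second_half, max_gap=1_000_000):
    """Label a read whose two halves map to separate genomic locations."""
    if first_half.chrom != second_half.chrom or first_half.strand != second_half.strand:
        return "fusion candidate: different chromosome or strand"
    step = second_half.pos - first_half.pos
    if first_half.strand == "-":
        step = -step  # on the minus strand the 3' half should map to the left
    if step < 0:
        return "fusion candidate: halves map in reverse order"
    if step > max_gap:
        return "fusion candidate: halves unusually far apart"
    return "collinear: compatible with normal splicing"


print(classify_split_read(Hit("chr1", 5_000, "+"), Hit("chr1", 2_000, "+")))
print(classify_split_read(Hit("chr1", 2_000, "+"), Hit("chr1", 5_000, "+")))
```

A collinear split read is what normal splicing across an intron produces; only the other cases are worth validating, for example by RT-PCR as in this study.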
Metagenomics
Human Metagenomics
Environmental Metagenomics
Microbiome: "the ecological community of commensal, symbiotic, and pathogenic microorganisms that literally share our body space."
Metagenomics: the study of genetic material recovered directly from environmental samples.
Metagenomics vs 16S rRNA sequencing
Ingrid Cifola
Clarissa Consolandi
Clelia Peano
Roberta Bordoni
Alessandro Pietrelli
Marco Severgnini
Eleonora Mangano
Eva Pinatel
Luca Petiti
Simone Puccio
Santosh Anand
Gianluca De Bellis
Cristina Battaglia
Thanks to my colleagues