NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Session 2.1.1 - VHIR,...
1
Vall d’Hebron Institut de Recerca (VHIR)
Rosa Prieto, Head of the High Tech Unit
15/05/2014
Health Research Institute accredited by the Instituto de Salud Carlos III (ISCIII)
NEXT GENERATION SEQUENCING TECHNOLOGIES AND APPLICATIONS
COURSE ON BIOINFORMATICS FOR BIOMEDICAL RESEARCH
2
Index
1. INTRODUCTION TO NGS
2. NGS TECHNOLOGY OVERVIEW
3. NGS APPLICATIONS OVERVIEW
4. WHAT IS NEXT IN SEQUENCING TECHNOLOGIES?
5
Introduction
Personalized medicine era
Biomarker identification:
•Diagnostic
•Susceptibility/risk (prevention)
•Prognostic (indolent vs. aggressive)
•Predictive (response)
-The right therapeutic strategy for the right person at the right time
-Predisposition to disease
-Early and targeted prevention
7
Introduction: “omics”
“Omics”
Omics aims at the collective characterization and quantification of pools of biological molecules that translate into the structure, function, and dynamics of an organism or organisms (Wikipedia).
http://www.genomicglossaries.com/content/omes.asp
High-throughput technologies:
•Genomics
•Epigenomics
•Metagenomics
•Transcriptomics
•Proteomics
•Metabolomics
•Lipidomics
8
Next generation sequencing
The future is here, now?
Everything can be sequenced…
9
Introduction to NGS technologies
Sequencing technology milestones
1st generation, 2nd generation, 3rd generation
http://www.ipc.nxgenomics.org/newsletter/no11.htm
First generation sequencing: automatic sequencer ABI (1987)
Second generation sequencing: (GS20)
Human genome: 3,234.83 Mb (haploid); $2.7 billion
NGS increases capacity and reduces costs
Moore’s Law: the number of transistors in an integrated circuit doubles roughly every two years (1965).
Source - NHGRI : http://www.genome.gov/sequencingcosts/
Date     Cost per Mb   Cost per Genome   % cost vs. Sep-01
Sep-01   $5,292.39     $95,263,072       100%
Sep-02   $3,413.80     $61,448,422       64.5039%
Oct-03   $2,230.98     $40,157,554       42.1544%
Oct-04   $1,028.85     $18,519,312       19.4402%
Oct-05   $766.73       $13,801,124       14.4874%
Oct-06   $581.92       $10,474,556       10.9954%
Oct-07   $397.09       $7,147,571        7.5030%
Oct-08   $3.81         $342,502          0.3595%
Oct-09   $0.78         $70,333           0.0738%
Oct-10   $0.32         $29,092           0.0305%
Oct-11   $0.086        $7,743            0.0081%
Oct-12   $0.074        $6,618            0.0069%
Oct-13   $0.057        $5,096            0.0053%
Jan-14   $0.045        $4,008            0.0042%
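As a rough illustration of the comparison with Moore's Law, the sketch below recomputes the last column of the table from a few of the cost-per-genome figures and contrasts them with a cost that merely halved every two years. The function names and the two-year halving model are assumptions made for this example; they are not part of the NHGRI data.

```python
from datetime import date

# Cost per genome (USD), a subset of the NHGRI figures quoted in the table.
COST_PER_GENOME = [
    (date(2001, 9, 1), 95_263_072),
    (date(2002, 9, 1), 61_448_422),
    (date(2007, 10, 1), 7_147_571),
    (date(2008, 10, 1), 342_502),
    (date(2014, 1, 1), 4_008),
]


def percent_of_baseline(cost, baseline):
    """Cost expressed as a percentage of the Sep-01 baseline."""
    return 100.0 * cost / baseline


def moore_projection(baseline, start, when, halving_years=2.0):
    """Hypothetical cost if it only halved every `halving_years` years."""
    years = (when - start).days / 365.25
    return baseline * 0.5 ** (years / halving_years)


start, baseline = COST_PER_GENOME[0]
for when, cost in COST_PER_GENOME:
    projected = moore_projection(baseline, start, when)
    print(f"{when:%b-%y}  actual ${cost:>12,}  "
          f"({percent_of_baseline(cost, baseline):.4f}%)  "
          f"Moore-like ${projected:>14,.0f}")
```

Under this simple model the two curves stay fairly close until 2007, and the actual cost falls far below the Moore-like projection from 2008 onwards, which is the period when second-generation platforms entered routine use.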
Sanger sequencing vs. NGS (2nd and 3rd generation)
Sanger:
1. DNA fragmentation
2. Cloning into vectors (vector DNA); bacterial transformation; growth and isolation
3. Cycle sequencing (primer, polymerase, dNTPs, labelled ddNTPs; example sequence: CTATGCTCG)
   Electrophoresis (1 sequence per capillary)
4. Image processing
2nd generation NGS:
1. DNA fragmentation
2. In vitro adapter ligation and clonal amplification
3. Massively parallel sequencing
4. Image processing and data analysis
3rd generation NGS:
1. DNA fragmentation
2. and 3. In vitro adapter ligation and massively parallel sequencing, WITHOUT amplification
4. Image processing and data analysis
Comparison of different NGS platforms
-Similarities (and differences vs. Sanger):
•Library preparation: starting material is short fragments of nucleic acids; adapter ligation; multiplexing (MID tags)
•Clonal amplification (not for 3rd generation sequencing)
•Massively parallel sequencing
•The use of physical location to identify unique reads is a critical concept for all next generation sequencing systems. The density of the reads and the ability to record them without interfering noise are vital to the throughput of a given instrument.
•The signal needs to be processed and post-treated to obtain the individual sequences
•Complex data analysis due to the large amount of data
-Differences:
•Clonal amplification method/sequencing technology/signal detection
•Throughput
•Read length
•Run time
•Cost per base
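The multiplexing step mentioned above can be sketched as follows: each sample's library carries a short MID tag at the start of its reads, and after sequencing the reads are sorted back into per-sample bins by that tag. The tag sequences, sample names and the `demultiplex` function are hypothetical, and the sketch assumes exact-match tags at the 5' end with no sequencing errors in the tag.

```python
def demultiplex(reads, mid_tags):
    """Assign each read to a sample by its leading MID tag and trim the tag.

    reads    -- iterable of read sequences (strings)
    mid_tags -- dict mapping tag sequence -> sample name (hypothetical values)
    """
    bins = {sample: [] for sample in mid_tags.values()}
    bins["unassigned"] = []
    for read in reads:
        for tag, sample in mid_tags.items():
            if read.startswith(tag):
                bins[sample].append(read[len(tag):])  # keep the insert only
                break
        else:
            # No tag matched: keep the read aside rather than guessing.
            bins["unassigned"].append(read)
    return bins


example_tags = {"ACGT": "sample_1", "TGCA": "sample_2"}
example_reads = ["ACGTTTGACC", "TGCAGGGTAC", "NNNNACGT"]
print(demultiplex(example_reads, example_tags))
```

Real demultiplexing tools usually tolerate one or more mismatches in the tag; exact matching is used here only to keep the idea visible.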
16
2nd generation NGS platforms
Illumina: HiSeq 2500, MiSeq, NextSeq 500, HiSeq X Ten (expected 2014)
Life Technologies: SOLiD 5500xl, Ion PGM, Ion Proton
Roche: GS Junior 454, GS FLX+ 454
(Figure label: benchtop instruments)
17
NGS general workflow
1. Library preparation: DNA fragmentation and in vitro adaptor ligation. Different kinds of libraries (amplicons, shotgun, cDNA...)
2. Clonal amplification: emulsion PCR (454 sequencing, Ion Proton/PGM) or bridge PCR (Illumina technology)
3. Cyclic array sequencing: pyrosequencing (454 sequencing), semiconductor sequencing (Ion Proton/PGM) or 4-colour fluorescent nucleotides (Illumina technology)
18
Clonal amplification by emPCR (454, Ion)
emPCR-based systems (Roche, SOLiD, Ion)
- 1 starting effective fragment per microreactor
- ~10^6 microreactors per ml
- All processed in parallel (clonal amplification)
High-speed shaker
19
Clonal amplification by emPCR (454, Ion)
Clonal amplification??
No empty beads
No beads containing more than one amplified fragment
1) Bead vs. starting DNA quantity titration
2) Optimal enrichment: 5-20% OK
- Melt (dsDNA)
- Annealing of a biotin-labelled primer to the capture beads carrying ssDNA
- Addition of streptavidin-coated magnetic beads
- Melt
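The reason for titrating beads against starting DNA can be illustrated with a simple loading model. The sketch below assumes, purely for illustration, that fragments land on beads independently, so the number of fragments per bead follows a Poisson distribution; the slides do not state this model, and `bead_loading` is a hypothetical helper.

```python
import math


def bead_loading(mean_fragments_per_bead):
    """Fractions of empty, clonal (exactly one fragment) and mixed beads.

    Assumes Poisson-distributed fragments per bead (an illustrative model).
    """
    lam = mean_fragments_per_bead
    empty = math.exp(-lam)          # P(0 fragments)
    clonal = lam * math.exp(-lam)   # P(exactly 1 fragment)
    mixed = 1.0 - empty - clonal    # P(2 or more fragments)
    return empty, clonal, mixed


for ratio in (0.05, 0.1, 0.2, 0.5, 1.0):
    empty, clonal, mixed = bead_loading(ratio)
    print(f"DNA:bead ratio {ratio:4.2f}  empty {empty:6.1%}  "
          f"clonal {clonal:6.1%}  mixed {mixed:6.1%}")
```

In this model, low DNA:bead ratios give mostly empty beads but very few mixed ones, while higher ratios reduce empty beads at the price of more mixed beads. That trade-off is one plausible reading of why only a modest fraction of DNA-carrying beads is considered acceptable after enrichment.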
20
Bridge amplification (Illumina)
Cluster generation: bridge PCR, 100-200 million clusters
HiSeq 2500: 2 flow cells, 8 lanes per flow cell
- Binding of single strands to the adapters
- Double-stranded clonal clusters
- Removal of the reverse strands
- Blocking and addition of the sequencing primer
21
GS FLX 454 sequencing
Metal-coated PTP reduces crosstalk
29 μm well diameter (20/bead)
3,400,000 wells per PTP
22
GS FLX 454 sequencing
Pyrosequencing (sequencing by synthesis)
CCD camera
“Flowgram” (signal intensity is proportional to the number of nucleotides incorporated in the sequence)
- Throughput limited by the number of wells in the PTP
- Errors in homopolymers (454)
- Long sequences (up to 1000 bp) are achieved
- Low throughput, very expensive reagents
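A flowgram and the homopolymer problem can be sketched in a few lines. The code below is an idealised model, not the instrument's base caller: it assumes a fixed flow order of T, A, C, G (an assumption for this example), a noise-free signal equal to the number of bases incorporated per flow, and it treats the template string as the strand being synthesised.

```python
def flowgram(template, flow_order="TACG"):
    """Ideal flowgram: one (base, signal) pair per nucleotide flow.

    The signal is the number of consecutive template bases matching the
    flowed nucleotide, so a homopolymer gives a single, taller peak.
    """
    unknown = set(template) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")
    signals, pos, flow = [], 0, 0
    while pos < len(template):
        base = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(template) and template[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals


def call_bases(signals):
    """Call bases by rounding each flow signal to the nearest integer."""
    return "".join(base * round(signal) for base, signal in signals)


ideal = flowgram("TTAGGGC")
print(ideal)              # one peak per flow, height = homopolymer length
print(call_bases(ideal))  # recovers the template from ideal signals

# With noise, a long homopolymer is easy to miscount: a signal of 6.6
# for a true run of 6 rounds to 7 and yields an insertion error.
print(call_bases([("G", 6.6)]))
```

Because the peak height has to be converted back into an integer run length, small relative errors in the signal matter more as the homopolymer gets longer, which fits the homopolymer errors listed above.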
23
Illumina sequencing
Reversible dye terminator nucleotides (sequencing by synthesis)
1. Cyclic delivery of the 4 fluorescent nucleotides
2. Incorporation
3. Image capture
4. Removal of the 3’ terminator
- Limited by the fragment length that can “bridge”
- Labelled nucleotides are not incorporated as efficiently as native ones
- Short sequences
- Strand-specific errors, substitutions towards the end of the read, base substitution errors (systematic error GGT > GGG)
- High throughput, expensive machines, cost per Mb OK
24
Ion Torrent sequencing
ION TORRENT (Life Techn.)
Fragmentation & adapter sequences
Clonal amplification (emPCR on beads)
Deposition of the beads + DNA into the wells of the chip
1. Sequential release of unmodified nucleotides
2. Incorporation of a nucleotide by the polymerase releases an H+
3. Direct and simultaneous detection of a pH change in all the wells
•pH meter, no optical system: rapid output improvement based on chips
•Fast runs (native nucleotides)
•Inexpensive machine and reagents
•Fails in homopolymer detection
25
NGS data analysis
454 sequencing
Pyrosequencing
26
NGS platforms comparison
PLATFORM | ROCHE GS FLX+ 454 | ILLUMINA HISEQ 2500 | ION PROTON
Library preparation | emPCR | Bridge amplification | emPCR
Sequencing chemistry | Pyrosequencing | Reversible dye terminators | pH change
Read length | Up to 1000 bp | From 2x125 bp to 2x300 bp | Up to 200 bp
Run time | 22 hrs | 7 hrs-6 days | From 2 to 4 hrs
Throughput/run | Up to 700 Mb | 500-1000 Gb (1 Tb) | 10 Gb (PI), 100 Gb (PII)
Equipment cost | $500,000 | $750,000 | $250,000
Reagents cost/run | $8,000 | $5,500 | $1,000
GOOD! | Longest read length | High throughput/low cost per base/ease of use | Quick, easy to use and cheap
BAD! | High error rate in homopolymers (>6); very expensive; low throughput; not automated at all | Short sequences; strand-specific errors, substitutions towards the end of the read, base substitution errors (systematic error GGT > GGG) | Errors in homopolymers; higher bias than Illumina
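To make the "cost per base" difference concrete, the sketch below divides the reagent cost per run by the maximum throughput per run quoted in the comparison. It is a back-of-the-envelope estimate under stated assumptions: it uses the upper throughput figure for each platform (1000 Gb for the HiSeq 2500, the 10 Gb PI chip for the Ion Proton), ignores equipment cost, labour and failed runs, and the dictionary layout is invented for this example.

```python
# Figures copied from the comparison; "max_gb" is the upper throughput bound.
PLATFORMS = {
    "ROCHE GS FLX+ 454": {"reagents_usd": 8_000, "max_gb": 0.7},
    "ILLUMINA HISEQ 2500": {"reagents_usd": 5_500, "max_gb": 1_000},
    "ION PROTON (PI chip)": {"reagents_usd": 1_000, "max_gb": 10},
}


def reagent_cost_per_gb(reagents_usd, max_gb):
    """Reagent cost per gigabase at full nominal throughput (USD/Gb)."""
    return reagents_usd / max_gb


for name, spec in PLATFORMS.items():
    cost = reagent_cost_per_gb(spec["reagents_usd"], spec["max_gb"])
    print(f"{name:22s} ~${cost:,.2f} per Gb")
```

The three results differ by orders of magnitude, which is why the run price alone says little about which platform is cheapest for a given application.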
27
NGS High-Throughput Platforms comparison
HiSeq 2500
- Two modes: Rapid Run and High Output
- Single/dual flow cells
- PE 2 x 125 bp
- 120 Gb in 27 hours (Rapid)
- 1 Tb in 6 days (High)
- 20 exomes in a day
- 1 human genome in a day
- 30 RNAseq samples in 5 hours
- Human exome, 30x: approx. 800-1000 €
- Human RNAseq (30 M reads, 100 bp PE, strand-specific): approx. 800-1000 €
- Human whole genome, 30x: 4000 €
HiSeq X Ten (10 HiSeq X)
- Only High Output mode
- Single/dual flow cells
- PE 2 x 150 bp
- 600 Gb in a day (dual flow cell)
- 1.8 Tb in 3 days (4x faster than HiSeq 2500)
- HiSeq X Ten: 10,000 genomes at 30x per year
Ion Proton
- Ion PI chip: up to 20 Gb output (specification: 10 Gb); read length: up to 200 bp; run time: 2-4 hrs; 1 human exome (approx. 1000 €)
- Ion PII chip: up to 100 Gb output (expected 2014), now reduced to 20-30 Gb at launch; run time: 2-4 hrs; read length: 100 bp; human whole genome (10x, ?)
- Ion PIII chip (???): 200 Gb output per run
Source: Nextgenseek.com & Allseq.com. All these costs are indicative as of May 2014 and are in no way binding for the UAT.
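The throughput figures above can be related to sequencing depth with the usual approximation: mean coverage = bases sequenced / genome size. The sketch below assumes the 3,234.83 Mb haploid figure quoted earlier in the slides is the human genome size, and that every sequenced base is usable (no duplicates, no filtering, no unmapped reads), so the results are optimistic upper bounds. The function names are made up for this example.

```python
HAPLOID_GENOME_BP = 3_234.83e6  # haploid size quoted in the slides (assumed human)


def mean_coverage(yield_bp, genome_bp=HAPLOID_GENOME_BP):
    """Average depth if the whole run is spent on one genome."""
    return yield_bp / genome_bp


def genomes_per_run(yield_bp, target_coverage=30, genome_bp=HAPLOID_GENOME_BP):
    """Whole genomes that fit in one run at the target mean coverage."""
    return int(mean_coverage(yield_bp, genome_bp) // target_coverage)


runs = {
    "HiSeq 2500 Rapid (120 Gb)": 120e9,
    "HiSeq 2500 High Output (1 Tb)": 1e12,
    "HiSeq X (1.8 Tb)": 1.8e12,
}
for name, yield_bp in runs.items():
    print(f"{name:30s} {mean_coverage(yield_bp):6.1f}x total, "
          f"{genomes_per_run(yield_bp)} genome(s) at 30x")
```

Under these assumptions a 120 Gb Rapid Run corresponds to roughly 37x on a single human genome, which is consistent with the "1 human genome in a day" figure.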
28
NGS Platforms specifications and applications
Ion PGM/Ion Proton, Illumina
29
Roche 454
NGS Platforms specifications and applications
PacBio RSII (3rd generation)
31
NGS advantages and limitations
Journal of Investigative Dermatology (2013) 133