Curso Medicina Genomica en Oncologia · Curso Medicina Genomica en Oncologia Junio 2017 Carlos...

Introduction to NGS technologies Curso Medicina Genomica en Oncologia Junio 2017 Carlos Mackintosh, PhD Cancer Genetics Lead [email protected]


Page 1: Curso Medicina Genomica en Oncologia · Curso Medicina Genomica en Oncologia Junio 2017 Carlos Mackintosh, PhD Cancer Genetics Lead carlos.mackintosh@imegen.es. 2 The Human Genome

Introduction to NGS technologies

Curso Medicina Genomica en Oncologia

June 2017

Carlos Mackintosh, PhD

Cancer Genetics Lead, carlos.mackintosh@imegen.es


The Human Genome Project And The First Revolution In DNA Sequencing

© 2017 IMEGEN – Confidential information. All rights reserved.


Human Genome Project progress

- The HGP is one of the greatest endeavors of humanity and was accomplished by a sequencing technology developed in 1977 (Chain termination or Sanger Sequencing)

- 1st Revolution in the Sequencing world

- Parallel sequencing through Single Chain Termination + capillary electrophoresis + shotgun sequencing


Sanger Sequencing + Capillary Electrophoresis: parallel sequencing (non-massive)


Source: wikipedia

- In 1995 Applied Biosystems released the ABI PRISM 310 genetic analyzer, a breakthrough in DNA sequencing technology.

- A few years later, they released the 96-capillary high-throughput ABI PRISM 3700 DNA analyzer and ABI PRISM 3100 genetic analyzer, used during most of the HGP

- 2002: 3730xl DNA analyzer, quadrupling the sequencing production capacity


Shotgun Sequencing, simplified


- Random fragmentation of the whole genome by enzymatic approaches

- Cloning the fragments in high-capacity, single-copy bacterial/yeast vectors (YACs, BACs, PACs, fosmids, etc.)

- Generating a 2nd Generation map of the whole genome, using STS, RFLPs, Restriction Sites, etc.

- Selection of clones generating the largest scaffolds, the tiling path

- Fragmentation of each clone, subcloning in lower capacity plasmids (M13) and sequencing of each subclone

- Bioinformatic assembly

BAC clone libraries

Used by Celera Genomics

Used by the public consortium


Moore’s Law

- Intel co-founder Gordon Moore's observation that computing power tends to double, and that its price therefore halves, every 2 years has held true for nearly 50 years with only minor revision.

- https://www.genome.gov/sequencingcostsdata/


Sanger Seq, ABI PRISM 3730xl: 2004 Human Genome $15,000,000. Many months, with many of them…

Illumina HiSeq X Ten: 2017 Human Genome $1,000. Less than 3 days. 45 human genomes a day, ten machines
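As a rough check of how the two price points on this slide compare with Moore's law, the sketch below computes the fold reduction expected from a halving every 2 years and the one implied by the quoted genome prices. The function name is illustrative, and the prices are simply the two figures quoted here, not values taken from the genome.gov dataset.

```python
def moore_fold_reduction(years: float, halving_period: float = 2.0) -> float:
    """Fold reduction in price if the price halves every `halving_period` years."""
    return 2 ** (years / halving_period)


years = 2017 - 2004
expected = moore_fold_reduction(years)  # price halving every 2 years
observed = 15_000_000 / 1_000           # genome prices quoted on the slide

print(f"Moore's law over {years} years: ~{expected:.0f}-fold cheaper")
print(f"Sequencing, 2004 to 2017: {observed:.0f}-fold cheaper")
```

Under these figures the drop in sequencing cost is more than two orders of magnitude steeper than a Moore's-law trend over the same period.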


Current NGS Technologies


- 2nd Generation NGS (short reads / clonal amplification)

1. Discontinued or Almost: 454 (Roche), SOLiD

2. Reigning kings: Illumina (Solexa)

3. Gaining Market: Complete Genomics (BGI), Ion Torrent

4. Just Arrived or Upcoming: Qiagen (Intelligent Biosystems), Agilent, Illumina Firefly

- 3rd Generation NGS (single molecule seq. / long reads)

- Pacific Biosciences (no longer a Roche partner) – SMRT

- Oxford Nanopore

- Genia (Roche)

- Synthetic Long Reads:

- 10x Genomics

- Illumina Synthetic Long Reads (formerly Moleculo)

- Failed platforms: Helicos Biosciences, VisiGen, Genizon Biosciences, Starlight, etc.


Sequencing Revolutions – Sequencing Generations

Nat Rev Microbiol. 2015 Dec;13(12):787-94

1st Gen Sequencing 2nd Gen Sequencing 3rd Gen Sequencing

NGS


NGS technologies: Library Prep and Basic terminology


Basic Steps of Library Preparation in 2nd Gen NGS


Source: http://tucf-genomics.tufts.edu/home/faq

- Sequencing by Ligation (SBL) or Sequencing by synthesis (SBS)

- 1st: Random fragmentation of the DNA, followed by size selection. There are multiple ways to fragment:

- 1. Sonication (Covaris) / Nebulization

- 2. Restriction Enzymes

- 3. Transposons (Tagmentation)

- 4. Chemical/Heat (RNA-seq)

- 5. None: not necessary if starting from PCR products

- 2nd: End repair (blunt ends), 3’-A tailing, Adapter Ligation

- 3rd: Clonal Amplification on a Solid Support (beads or glass) with or without Emulsion PCR

- This is needed in order to enable detection of the sequencing signal by cameras, microscopes or semiconductor-transistors

- Exception: Complete Genomics, BGI, “DNA nanoballs” are generated in solution

- Source of duplicates, which will be removed during the bioinformatic analysis

- 4th: Deposition on a solid surface and sequencing:

- Flow cell or patterned flow cell: SOLiD/Illumina/Complete Genomics. Picotiter plate: 454

SBS can be:

- Cyclic Reversible Termination (CRT)

- Single Nucleotide Addition (SNA)


Fragmentation Approaches: Acoustic shearing and Tagmentation


Source: Bitesizebio.com

- Acoustic Shearing: Hydrodynamic shearing. Covaris (Adaptive Focused Acoustics™).

- Bursts of ultrasonic acoustic energy at a frequency 15 to 30 times higher than a sonicator. Wavelength of only a few millimeters focused into a discrete zone within a sample vessel immersed in a water bath.

- Controllable power intensity, duration, and duty factor of these bursts.

- Tagmentation: Nextera system (developed by Epicentre).

- Transposomes insert randomly into DNA by ‘cut and paste’. Tagmentation fragments the DNA and adds the sequences required for PCR amplification (two transposomes, each with one of the two flowcell-compatible adaptors; dual indexing).

Source: Covaris Website.

Figure panels: Tagmentation; Acoustic Shearing


Quality Control: Capillary gel electrophoresis and Fluorometer

- Capillary gel electrophoresis: “Lab-on-a-Chip”

- Micro-fabricated glass chips: two glass layers (one with through-holes, another with micro-channels); upper plastic caddy with the wells to load 1st the gel-fluorescent dye and 2nd the samples/ladder.

- Gel-like images (bands), electropherograms, DIN value and concentration

- 1st: check sample integrity; 2nd: check the efficiency of the fragmentation process; 3rd: check the library

- Fluorometer: Specific dyes developed by Molecular Probes bind differentially to DNA or RNA.

- They emit fluorescence after binding to DNA/RNA, proportionally to their concentration in the sample

- Qubit (ThermoFisher): several products for different ranges of DNA/RNA concentration.

- Qubit dsDNA HS Assay: 0.2 – 100 ng

Source: ESD journal

Electropherogram from NGS library

Source: ThermoFisher webpage


NGS basics dictionary (I)

- Read (2nd Gen NGS): the consensus sequence of a cluster or molecular clone that is obtained at the end of the sequencing process, or to put more simply, the sequence of a DNA/RNA fragment

- Library: A collection of DNA fragments with adapters ligated to each end.

- Multiplexing: several libraries (i.e. from different samples) can be pooled and sequenced simultaneously.

- Unique index sequences are added to each DNA fragment during library preparation. Each read can be traced back to its corresponding sample during data analysis.

- Barcoding: molecular barcodes allow the identification of duplicate reads, significantly improving base calling accuracy even at low allelic frequencies.

- Paired-end sequencing: sequencing of both ends of the DNA fragments, and alignment of both reads as a pair.

- More accurate read alignment. Ability to detect indels. Helps remove PCR duplicates. Higher number of SNV calls.

- Highly repetitive nature of the genome: increased likelihood of uniquely mapping the pair, since at least one end of the pair will be correctly aligned to the reference sequence.

- Identification of structural variations (indels, copy number polymorphisms, inversions, insertions, novel genome content, and translocations) from the separation and orientation of aligned read pairs on the reference: interpair distances larger or smaller than expected from the library size, split reads, or forward and reverse reads aligned either in an unexpected orientation or onto different chromosomes

- Mate pair reads: size of DNA fragments is larger (over 1kb). Optimal for building scaffolds and finding large size structural variants
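The way paired-end reads expose structural variation can be sketched as a simple classifier over aligned read pairs. This is a minimal illustration, not how any particular variant caller works: the `ReadPair` fields, the expected interpair distance of 350 bp and the tolerance of 150 bp are all hypothetical, and real tools estimate the expected distance from the library's own distribution.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom1: str
    pos1: int      # leftmost aligned position of read 1
    strand1: str   # "+" or "-"
    chrom2: str
    pos2: int      # leftmost aligned position of read 2
    strand2: str


def classify_pair(pair: ReadPair, expected_distance: int = 350,
                  tolerance: int = 150) -> str:
    """Label a read pair by how it deviates from the expected layout."""
    if pair.chrom1 != pair.chrom2:
        return "translocation candidate"
    # Order the two reads by position on the reference.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    # Assumed normal layout: forward read on the left, reverse on the right.
    if (left_strand, right_strand) != ("+", "-"):
        return "unexpected orientation (inversion candidate)"
    distance = right_pos - left_pos
    if distance > expected_distance + tolerance:
        return "deletion candidate"   # reads map further apart than the library allows
    if distance < expected_distance - tolerance:
        return "insertion candidate"  # reads map closer than expected
    return "concordant"


print(classify_pair(ReadPair("chr1", 1000, "+", "chr1", 1300, "-")))
print(classify_pair(ReadPair("chr1", 1000, "+", "chr1", 5000, "-")))
print(classify_pair(ReadPair("chr1", 1000, "+", "chr9", 1300, "-")))
```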


NGS basics dictionary (II)

- Duplicate reads: biased over-amplification of some fragments of DNA, which will be over-represented in the library and will cause an incorrect estimation of the allelic frequency

- Number of PCR cycles

- Starting molecular complexity of the sample DNA: low starting quantity and/or quality (FFPE) will increase the portion of duplicates.

- Coverage (or Sequencing Depth): average number of reads that align to each base of the reference genome (or the subset of the reference genome under study).

- Coverage level determines whether variant discovery can be made with confidence.

- Whole genome: 30-fold (30X) at >90% of bases.

- Cancer: at least 200x at >90% of bases, 1000x average optimal, to detect mutations at 5-10% allelic frequency

- Coverage considerations:

- Multiple variant reads are needed to call a variant in any particular position

- Reads are not distributed evenly: many bases have lower than average coverage while many others have higher coverage, hence the distinction between minimum coverage and average coverage

- Illumina Coverage Calculator tool: http://support.illumina.com/downloads/sequencing_coverage_calculator.html
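The coverage figures above can be made concrete with a small calculation. The sketch below estimates average coverage as reads × read length / target size, and uses a binomial model to ask how likely it is to see enough variant reads at a given depth and allelic frequency. The threshold of 5 supporting reads is a hypothetical choice for illustration, and the model ignores sequencing errors and uneven read distribution.

```python
from math import comb


def mean_coverage(n_reads: int, read_length: int, target_size: int) -> float:
    """Average number of reads covering each base of the target."""
    return n_reads * read_length / target_size


def prob_at_least_k_variant_reads(depth: int, allele_fraction: float, k: int) -> float:
    """P(at least k of `depth` reads carry the variant), binomial sampling only."""
    p_fewer = sum(
        comb(depth, i) * allele_fraction**i * (1 - allele_fraction) ** (depth - i)
        for i in range(k)
    )
    return 1 - p_fewer


# 600 million reads of 150 bp over a 3 Gb genome
print(mean_coverage(600_000_000, 150, 3_000_000_000))

# Chance of seeing >= 5 variant reads for a 5% allelic-frequency mutation
for depth in (30, 200, 1000):
    p = prob_at_least_k_variant_reads(depth, 0.05, 5)
    print(f"{depth}x: {p:.3f}")
```

Under these assumptions a 5% variant is very unlikely to reach 5 supporting reads at 30x but does so most of the time at 200x, which is in line with the higher depth recommended for cancer samples.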


Source: University of Pennsylvania, NGS core, website, https://ngsc.med.upenn.edu/#/

Source: Illumina webpage, “Illumina sequence introduction”

NGS basics dictionary (III)

Illumina read structure

Library multiplexing
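Library multiplexing relies on tracing each read back to its sample through its index sequence. The following is a minimal sketch of that demultiplexing step; the index sequences, sample names and the one-mismatch tolerance are hypothetical and do not describe any vendor's software.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, sample_by_index, max_mismatches=1):
    """Assign (index_seq, insert_seq) reads to samples by their index sequence."""
    bins = defaultdict(list)
    for index_seq, insert_seq in reads:
        matches = [
            sample
            for index, sample in sample_by_index.items()
            if len(index) == len(index_seq)
            and hamming(index, index_seq) <= max_mismatches
        ]
        # Ambiguous or unmatched indexes are not assigned to any sample.
        sample = matches[0] if len(matches) == 1 else "undetermined"
        bins[sample].append(insert_seq)
    return dict(bins)


samples = {"ACGTAC": "tumor", "TGCATG": "normal"}
reads = [
    ("ACGTAC", "AAAA"),  # exact match
    ("TGCATC", "CCCC"),  # one sequencing error in the index
    ("GGGGGG", "TTTT"),  # unknown index
]
print(demultiplex(reads, samples))
```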


NGS technologies: 2nd Generation NGS

Clonal Amplification by PCR → Signal Detection (Short reads)


Sanger Sequencing vs. 454-Life Sciences NGS


Nature Biotechnology 26, 1135 - 1145 (2008)

Nature 437, 376-380. 2005. Genome sequencing in microfabricated high-density picolitre reactors. Marcel Margulies, […], Jonathan M. Rothberg

6986 cites as of Dec 16


Clonal Amplification on a Solid Support


Modified from Nature Reviews Genetics 17, 333–351 (2016)

Emulsion PCR: 454, SOLiD, Ion Torrent

Bridge Amplification: Solexa, Illumina


Clonal amplification by Emulsion PCR - 454/Ion torrent/SOLiD


- Fragmented and Ligated DNA is first denatured, then it is mixed with a great excess of beads

- Each bead has thousands of copies of an oligonucleotide with a sequence complementary to one of the adapters – Most of the beads only capture one DNA fragment

- The solution is mixed with oil, forming tiny droplets that act as microreactors where clonal amplification happens

- Emulsion is broken and empty beads (those that did not capture any DNA fragment) are discarded by streptavidin selection / centrifugation

- Beads are deposited into wells of a micro-titer plate (454) or a chip (Ion Torrent): one bead per well. SOLiD: beads are deposited and crosslinked to a glass surface
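Why a great excess of beads leads to mostly clonal beads can be illustrated with a Poisson model of fragment capture. This is a sketch under stated assumptions: capture is treated as random and independent, and the average of 0.1 fragments per bead is a hypothetical value, not a protocol specification.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the mean is lam."""
    return lam**k * exp(-lam) / factorial(k)


def bead_loading(fragments_per_bead: float) -> dict:
    """Fractions of empty, single-fragment and multi-fragment beads."""
    empty = poisson_pmf(0, fragments_per_bead)
    single = poisson_pmf(1, fragments_per_bead)
    multiple = 1 - empty - single
    return {
        "empty": empty,
        "single": single,
        "multiple": multiple,
        # Among beads that captured DNA, the share carrying one fragment only
        "clonal_among_loaded": single / (single + multiple),
    }


for name, value in bead_loading(0.1).items():
    print(f"{name}: {value:.4f}")
```

With these numbers most beads stay empty, which is the price paid for keeping the loaded beads clonal, and it is why the empty beads have to be discarded after the emulsion is broken.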


Clonal amplification by Bridge Amplification: Solexa/Illumina


- No beads, library is simply diluted and added to the flowcell surface

- Flowcell surface has the oligos complementary to the adapters

- Bridge amplification (isothermal amplification) happens and clusters of clonal amplification grow

- HiSeq 3000 – 4000: patterned flow cell

Modified from Illumina website


Clonal amplification in solution (no solid support): Complete Genomics, BGI


- DNA is fragmented and ligated to the first of four adapter sequences.

- The template is amplified, circularized and cleaved with a type II endonuclease.

- The second set of adapters is added, followed by amplification, circularization and cleavage.

- This process is repeated for the remaining two adapters.

- Library molecules undergo a rolling circle amplification (Phi 29 DNA polymerase), generating a large mass of concatamers called DNA nanoballs, which are then deposited on a patterned flow cell

- Adapter sequences are palindromic, forcing the compact structure of the nanoball when in ssDNA

Nat Rev Genet. 2016 May 17;17(6):333-51


Sequencing by Synthesis, Single Nucleotide Addition, Semiconductor Sequencing: Ion Torrent

- Semiconductor Sequencing: protons released after dNTP incorporation – change in pH is detected

- Integrated complementary metal-oxide-semiconductor (CMOS): circuit on a microchip that contains different types of semiconductor transistors to create a circuit that both uses very little power and is resistant to high levels of electronic noise.

- Ion-sensitive field-effect transistor (ISFET): A type of transistor that is sensitive to changes in ion concentration.

- No fluorochromes / No microscopes or CCD camera

- Only one dNTP species is present during each cycle; multiple identical dNTPs can be incorporated during a cycle, increasing the electric signal
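Single nucleotide addition can be sketched as a flowgram: one dNTP species is offered per flow, and the signal grows with the number of identical bases incorporated in that flow. The code below is an idealized model; the flow order "TACG" is an assumption for illustration, and a real instrument measures a noisy signal rather than exact integers.

```python
def flowgram(synthesized: str, flow_order: str = "TACG") -> list:
    """Ideal number of incorporations per flow for the strand being synthesized."""
    if set(synthesized) - set(flow_order):
        raise ValueError("sequence contains bases absent from the flow order")
    signals = []
    pos = 0
    flow = 0
    while pos < len(synthesized):
        base = flow_order[flow % len(flow_order)]  # the only dNTP present in this flow
        run = 0
        # A homopolymer is incorporated entirely within a single flow.
        while pos < len(synthesized) and synthesized[pos] == base:
            run += 1
            pos += 1
        signals.append(run)
        flow += 1
    return signals


print(flowgram("TTAGGGC"))
```

The homopolymer GGG appears as a single flow with a threefold signal, so telling 6 from 7 identical bases depends on resolving a small relative difference in that signal.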

Modified from Nat Rev Genet. 2016 May 17;17(6):333-51 and ThermoFisher webpage


- High homopolymer error: the pH change detected by the sensor is imperfectly proportional to the number of nucleotides incorporated. Errors in homopolymers > 6-8 bp

- 400 bp read-lengths

- S5: no need for argon (unlike the PGM), but no paired-end option

- Coupled to Ion Chef: fast and simple library prep (finally simplifying the beads-emulsion PCR protocol)

- Overall error rate is on par with other NGS platforms in non-homopolymer regions

- Different types of chips and instruments to tune sequencer performance

- Throughput: 50 Mb to 15 Gb

- Runtimes: 2 - 7 hours, faster than most other platforms.

- Niche: Gene-panel sequencing and point-of-care clinical applications

- Ion Torrent is attempting to capitalize on this: fast diagnostics

- Low amount starting DNA (1-10ng) due to IonTorrent focus on amplicon gene panel applications (AmpliSeq); optimal for FFPE samples (cancer): low quality and quantity DNA yield


Sequencing by Synthesis, Single Nucleotide Addition: Ion Torrent, key specs


Sequencing by Synthesis, Single Nucleotide Addition: Ion Torrent, key specs

Modified from Nat Rev Genet. 2016 May 17;17(6):333-51


Sequencing by Synthesis, Cyclic Reversible Termination: Illumina, Qiagen

- 1. Nucleotides are blocked by a 3′-O-azidomethyl group and labeled with a base-specific, cleavable fluorophore (Qiagen system: 3′-O-allyl group). Mutagenesis of DNA polymerase is required to facilitate the incorporation of 3′-blocked terminators

- 2. During each cycle, only one nucleotide is incorporated (blocked 3′ group)

- 3. The slide is imaged by total internal reflection fluorescence (TIRF) microscopy using either two or four laser channels. TIRF allows observation of a very thin region of a specimen, usually less than 200 nm.

- 4. The dye is then cleaved and the 3′-OH is regenerated with the reducing agent tris(2-carboxyethyl)phosphine (TCEP) (Qiagen system: mixture of palladium and P(PhSO3Na)3 (TPPTS))

Nat Rev Genet. 2016 May 17;17(6):333-51


Qiagen (Intelligent Biosystems)

Illumina (Solexa)

- Key invention for SBS-CRT technologies

- Nucleotides are blocked by a 3′-O-azidomethyl group (Illumina) or a 3′-O-allyl group (Qiagen) and labeled with a base-specific, cleavable fluorophore.

- Mutagenesis of DNA polymerase is required to facilitate the incorporation of 3′-blocked terminators

- In blue: cleavage site for Fluorochrome removal and residual linker structure

- In red: terminating functional groups or blocking agents

- Reducing agents to unblock the incorporated nucleotides:

- Illumina: Tris(2-carboxyethyl)phosphine (TCEP).

- Qiagen: mixture of palladium and P(PhSO3Na)3 (TPPTS)

- Others: Lightning Terminators (LaserGen), Virtual Terminators (Helicos BioSciences)

Sequencing by Synthesis, Cyclic Reversible Termination: 3′-blocked reversible terminators

Nat Rev Genet. 2010 Jan;11(1):31-46


Illumina, complete process video


https://www.youtube.com/watch?v=fCd6B5HRaZ8

(Source: Illumina)


Sequencing by Synthesis, Cyclic Reversible Termination: Illumina, key specs

- Illumina dominates the short-read sequencing industry

- Most common error type: substitutions (especially after a ‘G’ base). – dNTPs are not added sequentially (as in SNA approaches) but simultaneously

- Underrepresentation of AT-rich and GC-rich regions (amplification bias during template preparation)

- Overall, very high accuracy (>99.5%)

- Mature technology, highly validated over many years

- Wide range of platforms:

- Ultra low-throughput: MiniSeq

- Low-throughput: MiSeq

- Mid throughput: NextSeq

- High-throughput: HiSeq 2500/3000/4000

- Ultra-high-throughput: HiSeq X (1,800 human genomes, 30x, per year); NovaSeq 5000/6000

- Many options for runtime, read structure and read length (up to 300 bp):

- Single-end and paired-end; synthetic long reads

- 36 – 300 bp

- Throughput range: 540 Mb – 900 Gb: 14M reads – 3B reads

- Low homopolymer error because of the CRT technology (lower than SBS-SNA platforms).

- Index mis-assignment due to “cluster bleeding” (solved with patterned flow cells in HiSeq 3000-4000) and/or PCR recombination (0.1-0.5% of reads; due to PCR not done in individual emulsion droplets)


Sequencing by Synthesis, Cyclic Reversible Termination: Illumina, key specs

Modified from Nat Rev Genet. 2016 May 17;17(6):333-51 (partial table v2/v3 reagents not shown)


2nd Generation NGS known limitations


- Read length determined by the number of sequencing cycles

- Dephasing: errors in the synchronous growing of the strands during the consecutive cycles of dNTP additions. Requirement of high cycle efficiency

- Lagging strands: n – 1 or n – x from the expected cycle, result from incomplete extension

- Leading strands: n + 1 or n + x, from the addition of multiple nucleotides or probes in a population of identical templates.

- Read sequence is a consensus sequence of the cluster/molecular clone

- Signal dephasing increases fluorescence noise, causing base-calling errors and shorter reads.

- PCR required: short reads are unavoidable.

- Millions of reads needed for coverage values that can be reached with only thousands of 3rd Generation long reads.

- Need for dNTP addition cycles (sequencing cycles), making it a slow technology

- PCR limitations: AT-rich / GC-rich sequences bias

- Structural Variation not resolved

- Only useful for species with reference genomes available (or small genomes such as bacteria or virus)
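The effect of dephasing on read length can be illustrated with a small model of a cluster. In each cycle a strand is assumed to lag (no extension) or lead (two extensions) with a small probability; the per-cycle rates of 0.5% used here are hypothetical and chosen only to show the trend.

```python
def in_phase_fraction(n_cycles: int, p_lag: float = 0.005, p_lead: float = 0.005) -> float:
    """Fraction of strands in a cluster whose length matches the cycle number."""
    # dist maps "bases extended so far" to the fraction of strands at that length.
    dist = {0: 1.0}
    steps = ((0, p_lag), (1, 1 - p_lag - p_lead), (2, p_lead))
    for _ in range(n_cycles):
        new_dist = {}
        for length, fraction in dist.items():
            for step, prob in steps:
                new_dist[length + step] = new_dist.get(length + step, 0.0) + fraction * prob
        dist = new_dist
    return dist.get(n_cycles, 0.0)


for cycles in (1, 50, 150, 300):
    print(f"cycle {cycles}: {in_phase_fraction(cycles):.2f} of strands in phase")
```

As the in-phase fraction falls, the consensus signal of the cluster is diluted by strands reporting neighbouring bases, which is why signal quality decays along the read and read length is capped.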


2nd Gen NGS: all about “big boxes”


Source: Illumina website

5500xl SOLiD (Applied Biosystems); GS Junior (Roche); Ion Torrent S5 (ThermoFisher)

Source: manufacturer websites

HiSeq 2000 (Illumina)

1944, The Colossus computer


NGS technologies: 3rd generation NGS

Single Molecule Seq. (Long reads)


Pacific Biosciences, SMRT (single-molecule real-time) DNA sequencing method

- Flowcell: Zero-mode waveguide (ZMW) ‘wells’, each holding 20 zeptoliters (10⁻²¹ liters).

- The ZMW holes are ~70 nm in diameter and ~100 nm in depth. Light travel is impaired through a small aperture → the optical field decays exponentially inside the chamber.

- Within this tiny volume, the activity of DNA polymerase incorporating a single nucleotide can be readily detected, at a pace of ~3 bases/second

- DNA polymerase immobilized in the bottom: detection of only the bottom of the well where the nucleotide incorporation happens

- dNTP incorporation on each single-molecule visualized with a laser and camera: recording of the color and duration of emitted light (labeled nucleotide pauses during incorporation at the bottom of the ZMW)
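The dimensions quoted above can be checked with a short calculation of the geometric volume of a ZMW treated as a cylinder. The cylinder approximation and the helper name are assumptions made for this sketch.

```python
from math import pi

NM3_TO_LITERS = 1e-24  # 1 nm^3 = 1e-27 m^3 = 1e-24 L
ZEPTOLITER = 1e-21     # liters


def cylinder_volume_zl(diameter_nm: float, depth_nm: float) -> float:
    """Volume of a cylindrical well in zeptoliters."""
    volume_nm3 = pi * (diameter_nm / 2) ** 2 * depth_nm
    return volume_nm3 * NM3_TO_LITERS / ZEPTOLITER


print(f"{cylinder_volume_zl(70, 100):.0f} zL")
```

The whole well comes out at a few hundred zeptoliters, well above the 20 zeptoliters quoted. One reading, offered here as an assumption rather than something the slide states, is that the smaller figure refers to the illuminated detection zone at the bottom of the well, where the optical field has not yet decayed.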

Nat Rev Genet. 2010 Jan;11(1):31-46


- The fluorophore is attached to the terminal phosphate group: cleaved fluorescent dye molecule then diffuses out of the detection volume.

- Circular template that allows each template (SMRTbell template) to be sequenced multiple times as the polymerase repeatedly traverses the circular molecule.

- These multiple passes are used to generate a consensus read of insert, known as a circular consensus sequence (CCS)


Pacific Biosciences, SMRT (single-molecule real-time) DNA sequencing method

Nat Rev Genet. 2010 Jan;11(1):31-46

Nat Rev Genet. 2016 May 17;17(6):333-51


Pacific Biosciences, SMRT DNA sequencing method, key specs

- Average read length 10 – 15kb; reads surpass 50 kb easily; Fast runtime

- Ideal for de novo genome assembly, complex long-range genomic structures and for full-length transcript sequencing.

- Single-pass error rate for long reads is 15%, indel errors the most frequent ones (homopolymers)

- Errors are randomly distributed and can be discarded with high coverage and/or the use of a circular template → up to ~99.999% for insert sequences derived from at least 10 subreads

- Consensus sequence: multiple overlapping reads from a single molecule of DNA are aligned to each other; most likely base at each position is determined, reducing single-pass error rates.

- A high-quality consensus sequence derived from the circular template is called a circular consensus sequence (CCS). 50x is optimal to achieve accuracy.

- Runtimes and throughput can be controlled by regulating the length of time that the sensor monitors the ZMW; longer templates require longer times.

- 1 kb library run for 1 hour generates around 7,500 bases of sequence per molecule, with an average of 8 passes; a 4-hour run will generate around 30,000 bases per molecule and ~30 passes.

- Conversely, a 10 kb library requires a 4-hour run to generate ~30,000 bases with ~3 passes.

- PacBio RS II: high cost of the machine and runs + limited throughput (around $1,000 per Gb); need for high coverage.

- PacBio Sequel: throughput 7× RS II, lower costs; able to sequence a human genome at 30× coverage
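The relation between run time, library size and number of passes quoted above is simple arithmetic, sketched below. The calculation ignores the length of the SMRTbell adapters, which is an assumption of this sketch, and the helper names are illustrative.

```python
def passes(bases_per_molecule: int, insert_length: int) -> float:
    """Approximate number of times the polymerase traverses the insert."""
    return bases_per_molecule / insert_length


def bases_per_second(bases_per_molecule: int, run_hours: float) -> float:
    """Average polymerase speed implied by the yield of one molecule."""
    return bases_per_molecule / (run_hours * 3600)


print(passes(7_500, 1_000))     # 1 kb library, 1-hour run
print(passes(30_000, 1_000))    # 1 kb library, 4-hour run
print(passes(30_000, 10_000))   # 10 kb library, 4-hour run
print(round(bases_per_second(7_500, 1), 2))
```

The 1-hour figure works out to 7.5 passes, which the slide rounds to 8, and the implied speed of about 2 bases per second is of the same order as the ~3 bases/second quoted for the polymerase.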


Oxford Nanopore, Nanopore Sequencing

- Direct sequencing of ssDNA through nanopore (1.2 nm wide)

- Ion current passes through the pore constantly.

- A motor protein translocates the ssDNA → modulation of the current passing through the pore

- The graph showing the current alteration over time is called “squiggle space”

- More than 1,000 distinct signals (one for each possible k-mer), and even more when modified bases are taken into account

- Nanopores are built into membranes, lying across the microsupport

- Flowcell with sensor array: collection of electrodes and microsupports, each one has an individually addressable electronic channel;

- Signal is measured by an application-specific integrated circuit (ASIC) in the flow cell, and processed by MinKNOW software: real-time quality control data, experimental control, etc.

- Real Time data analysis with Metrichor: data is acquired and analyzed in Real Time
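The "more than 1,000" signals mentioned above match the number of possible 5-mers over the four bases. The value k = 5 is an assumption used here because it reproduces that figure; the fifth symbol "M" standing for a modified base is purely illustrative.

```python
from itertools import product


def kmers(k: int, alphabet: str = "ACGT") -> list:
    """All possible k-mers over the given alphabet."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


print(len(kmers(5)))            # four canonical bases
print(len(kmers(5, "ACGTM")))   # adding one modified base as a fifth symbol
```

Each additional modified base multiplies the number of current levels a base caller has to distinguish, which is part of why decoding the squiggle is harder than reading one signal per base.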

Nat Rev Genet. 2016 May 17;17(6):333-51


https://www.youtube.com/watch?v=BNz880V52rQ

Source: Oxford Nanopore

Oxford Nanopore, Nanopore Sequencing at work


Oxford Nanopore, Nanopore Sequencing, key specs

- ONP motto: “Sequencing anything, anywhere” → field testing epidemics (Ebola outbreak), environmental monitoring, infectious disease (antibiotic resistance), NASA took one to space

- R9.4 pore: 500 bp/sec

- 1D, 10 Gb, 92% accuracy

- 2D, 5 Gb, 97% accuracy

- These accuracy values are still substantially lower than those of other platforms. However, new analysis algorithms and the new 1D2 library prep could solve this issue

- MinION: palm-sized, USB 3 cable connection to a laptop (no external power supply)

- SmidgION (not released yet): connection to the cell phone, ultraportable, low throughput

- PromethION (not released yet): ultra-high-throughput platform, 48 individual flow cells, each with 3,000 pores running at 500 bp per second.

- ~2–4 Tb for a 2-day run (if true, a competitor to Illumina’s HiSeq X).

- Sequence yourself: detect infections, cancer, immunologic responses, before getting sick

- Simple and fast workflow: sample added to a port in the MinION, flows across the surface of the sensor array

- RUN UNTIL option: run until enough of the selected data is collected: minutes (virus detection) or days (genome analysis); Reverse pore mode: eject sequences that are not intended to be sequenced.

- Easy and fast library prep; automated library prep: VolTRAX (not released yet)

- Direct seq. of RNA + Many other products and accessories in development
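
The PromethION figures quoted above (48 flow cells, 3,000 pores each, 500 bp per second, 2-day run) can be sanity-checked with a back-of-the-envelope calculation. The sketch below computes the theoretical ceiling, assuming every pore sequences continuously for the whole run, which is an idealization and not a manufacturer specification. It then expresses the quoted 2–4 Tb as a fraction of that ceiling.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def theoretical_yield_bp(flow_cells: int, pores_per_cell: int,
                         bp_per_sec: int, days: int) -> int:
    """Upper bound on yield if every pore were active 100% of the time."""
    return flow_cells * pores_per_cell * bp_per_sec * days * SECONDS_PER_DAY


# Values taken from the slide; continuous pore occupancy is an assumption.
ceiling_bp = theoretical_yield_bp(flow_cells=48, pores_per_cell=3000,
                                  bp_per_sec=500, days=2)
ceiling_tb = ceiling_bp / 1e12

print(f"Theoretical ceiling: {ceiling_tb:.1f} Tb")
for quoted_tb in (2, 4):
    fraction = quoted_tb / ceiling_tb
    print(f"{quoted_tb} Tb corresponds to {fraction:.0%} of the ceiling")
```

Under these assumptions the ceiling is roughly 12.4 Tb, so the quoted 2–4 Tb would correspond to pores being productive for about one sixth to one third of the run.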


3rd Generation NGS, conclusions


- Easier sample preparation: adapter ligation, no PCR

- Read length determined by the speed of the sequencing technology (no sequencing cycles)

- Faster runtimes (continuous run, no need for cycles of dNTP addition)

- Real Time sequencing: the read sequence is obtained as it is being read by the device

- No PCR involved: long reads (1 kb – 500 kb), fragmentation during DNA extraction may be the only limit

- Long reads are much easier to align to the ref. genome: Structural variation, repeats (de novo assembly of genomes), complete structure of transcripts

- Precise characterization of highly polymorphic loci (HLA, CYP2D6)

- Synthetic long read approaches

- No duplicates: no error carryover from PCR amplification

- Lower accuracy methods, but improving using artificial intelligence algorithms

- Single molecules susceptible to multiple nucleotide additions (homopolymer error).

- No cluster consensus: deletion errors owing to quenching effects between adjacent dye molecules, or no signal because of the incorporation of dark nucleotides (a nucleotide or probe that loses its fluorescent label by cleavage or hydrolysis). These issues are solved in 2nd Gen by cluster consensus.
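
Why a consensus helps can be illustrated with a simple majority-vote model. This is only an analogy to cluster consensus, not a description of how any platform calls bases. The sketch assumes errors are independent between observations and that all wrong calls agree with each other (a pessimistic simplification). The 8% error rate is an illustrative value, chosen to match the 92% 1D accuracy quoted earlier in the deck.

```python
from math import comb


def majority_error(p: float, n: int) -> float:
    """Probability that more than half of n independent observations are wrong.

    p is the per-observation error rate; n must be odd to avoid ties.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of observations to avoid ties")
    first_majority = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(first_majority, n + 1))


PER_READ_ERROR = 0.08  # illustrative value only

for n in (1, 3, 5, 9):
    print(f"{n} observation(s): consensus error = "
          f"{majority_error(PER_READ_ERROR, n):.6f}")
```

In this toy model a single observation keeps the full 8% error, while even three independent observations bring the consensus error below 2%.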


3rd Gen NGS: from big boxes to ultraportable


PacBio Sequel (no longer Roche)

Source: manufacturer websites

MinION (Oxford Nanopore) SmidgION (Oxford Nanopore)


NGS technologies: Key concepts and Applications


NGS applications


- De novo genome assembly: long reads + short reads hybrid approach (plus an optical genome mapping technology such as Irys from BioNano Genomics)

- Variant discovery: Whole genome, Exome, Targeted Sequencing (Gene panels) → Clinical applications: detection of SNVs, structural variations and mutations in Mendelian diseases and cancer

- Transcriptomics: RNA-seq, GRO-seq, Ribosome profiling

- Metagenomics: bacterial 16S sequencing to study the communities of microbial genomes that are present in the sample (no need to culture isolated microbes).

- Epigenomics: ChIP-seq, FAIRE-seq, methyl-seq, DNase-seq

- Single-cell genomics, transcriptomics, and epigenomics

- Many more…

Source: http://finchtalk.blogspot.com.es/2008_11_01_archive.html


Targeted Re-Sequencing, Target Enrichment and Gene Panels


- Price of a 30x whole genome is still high

- Seq. Depth: very deep sequencing is needed for applications such as cancer genomics (especially for liquid biopsy).

- Faster time to results and library multiplexing

- The whole exome is about 60 times smaller than the whole genome (less than 2% of the human genome): around 50 Mb versus 3 Gb | A panel of 300-400 genes is under 2 Mb

- Hybridization capture and Multiplexed PCR → most popular technologies

- Microdroplet PCR or MicroFluidics → PCR approaches in droplets or physical partitions, not very popular

- Capture on solid surface (microarrays) → Not used anymore

- Molecular Inversion Probes → Not very popular

- dCas9 target enrichment is a new approach (ONP is developing it)

- Most common applications: exome and gene panels
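
The target sizes quoted above (3 Gb genome, around 50 Mb exome, under 2 Mb for a panel) translate directly into sequencing depth. The sketch below computes the size ratio and the mean depth obtained from a fixed amount of sequence. The run yield (enough for a 30x genome) and the default of 100% on-target reads are illustrative assumptions, not figures from the slides.

```python
GENOME_BP = 3_000_000_000  # ~3 Gb, from the slide
EXOME_BP = 50_000_000      # ~50 Mb, from the slide
PANEL_BP = 2_000_000       # upper bound for a 300-400 gene panel, from the slide


def mean_depth(yield_bp: float, target_bp: int, on_target: float = 1.0) -> float:
    """Mean depth = usable (on-target) bases divided by target size."""
    return yield_bp * on_target / target_bp


# Hypothetical run: exactly the output needed for a 30x whole genome.
run_yield_bp = 30 * GENOME_BP

targets = {"genome": GENOME_BP, "exome": EXOME_BP, "panel": PANEL_BP}
for name, size in targets.items():
    fold_smaller = GENOME_BP / size
    depth = mean_depth(run_yield_bp, size)
    print(f"{name:>6}: {fold_smaller:>6.0f}x smaller than genome, "
          f"mean depth {depth:,.0f}x")
```

With the same output, the exome reaches about 1,800x and a 2 Mb panel about 45,000x. In practice that capacity is shared between many multiplexed samples, and the real on-target fraction is below 100%, which can be modelled with the `on_target` parameter.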

Nat Rev Genet. 2010 Jan;11(1):31-46


Target Enrichment by Probe Hybridization in Solution


- 50-80% enrichment (on-target sequences)

- 80-120 bp RNA baits (synthetic or cRNA), end labeled with a biotin molecule (can be DNA as well)

- Streptavidin magnetic beads used for isolation

- Baits are digested with RNase

- High amount of DNA required: at least 100-200 ng for optimal results, although it can be reduced to 10 ng (at the cost of a higher proportion of duplicates)

- Time consuming: several hyb-clean up rounds required

- Popular Products: Roche Nimblegen SeqCap, Agilent SureSelect, Illumina TruSight, IDT xGen panels

- Barcoding for dedup: Haloplex technology
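
The idea behind barcoding for deduplication can be sketched as follows: reads that share the same mapping position and the same molecular barcode are assumed to be PCR copies of one original molecule and are collapsed into one. The `Read` record, its fields and the toy data are hypothetical; this is a generic illustration, not the HaloPlex implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    chrom: str     # chromosome the read maps to
    start: int     # mapping start position
    barcode: str   # molecular barcode attached before amplification
    sequence: str  # read sequence (toy data here)


def deduplicate(reads: list[Read]) -> list[Read]:
    """Keep the first read seen for each (chrom, start, barcode) combination."""
    first_seen: dict[tuple[str, int, str], Read] = {}
    for read in reads:
        key = (read.chrom, read.start, read.barcode)
        first_seen.setdefault(key, read)
    return list(first_seen.values())


# Toy example: two PCR copies of one molecule, plus two distinct molecules.
reads = [
    Read("chr1", 1000, "ACGTAC", "TTGACCA"),
    Read("chr1", 1000, "ACGTAC", "TTGACCA"),  # same position + barcode: duplicate
    Read("chr1", 1000, "GGTCAA", "TTGACCA"),  # same position, different molecule
    Read("chr1", 2500, "ACGTAC", "CCATGGA"),  # different position
]

unique_reads = deduplicate(reads)
print(f"{len(reads)} reads in, {len(unique_reads)} unique molecules out")
```

Without the barcode, the third read would be indistinguishable from a PCR duplicate of the first and would be discarded by position-only deduplication.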

Source: Agilent webpage


Target Enrichment by Multiplexed PCR: AmpliSeq (Ion Torrent)

Modified from Clin Endocrinol (Oxf). 2017 Apr;82(4):533-42

- Up to 24,000 targets per tube (Ampliseq Exome: 8 tubes)

- Tiling PCR reactions are split into at least two independent reactions (to avoid overlap between the PCR products)

- Different PCR size depending on the source of the sample:

- Fresh/frozen: up to 250 bp

- FFPE: up to 120 bp

- Pre-designed panels and custom panels (AmpliSeq tool)

- Non-amplified primers are digested

- Amplicons are subject to bead-emulsion clonal amplification, as explained previously

- One to thousands of genes

- Up to 384 barcodes for multiplexing

- GeneRead (Qiagen): higher number of PCRs with more overlap and hence more reactions (at least 4 reactions)
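
The reason tiled amplicons end up in two or more separate reactions can be shown with a small greedy assignment: amplicons are sorted by start position and each one goes into the first pool where it does not overlap the previous amplicon. The function, the coordinates and the amplicon names are hypothetical; this is a sketch of the general idea, not the design algorithm used by AmpliSeq or GeneRead.

```python
def assign_pools(amplicons: list[tuple[str, int, int]]) -> dict[str, int]:
    """Assign (name, start, end) amplicons to pools so none overlap within a pool.

    Returns a mapping from amplicon name to a zero-based pool index.
    """
    pool_ends: list[int] = []  # end coordinate of the last amplicon in each pool
    assignment: dict[str, int] = {}
    for name, start, end in sorted(amplicons, key=lambda a: a[1]):
        for index, last_end in enumerate(pool_ends):
            if start > last_end:  # no overlap with this pool's last amplicon
                pool_ends[index] = end
                assignment[name] = index
                break
        else:  # overlaps every existing pool: open a new one
            pool_ends.append(end)
            assignment[name] = len(pool_ends) - 1
    return assignment


# Toy tiling design: each amplicon overlaps its neighbour by 20 bp.
tiled = [
    ("amp1", 100, 250),
    ("amp2", 230, 380),
    ("amp3", 360, 510),
    ("amp4", 490, 640),
]

pools = assign_pools(tiled)
for name, pool in pools.items():
    print(f"{name} -> pool {pool + 1}")
print(f"Pools needed: {max(pools.values()) + 1}")
```

With simple neighbour-to-neighbour overlap, alternating amplicons between two pools is enough. Denser tiling, where an amplicon overlaps more than one neighbour, forces additional pools.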

Source: Qiagen webpage


Hybridization capture vs Multiplexed PCR

HYB CAPTURE PROS & CONS (Traditionally Illumina):

- More reliable detection of variants

- Lower duplicates

- High degree of variability in coverage and representation

- Capturing of off-target fragments

- Requirement of hundreds of nanograms of DNA

- Prep + Sequencing price will be higher than whole genome sequencing at some point in the near future

MULTIPLEXED PCR/AMPLICONS PROS & CONS (Traditionally Ion Torrent):

- Higher proportion of on-target reads

- Fast workflow and low cost

- Low DNA input (10 ng DNA)

- Primer cross reactivity

- Dropouts: Variation at primer binding sites could lead to amplicon dropout

- Also high degree of variability in coverage and representation


Thank you very much


Instituto de Medicina Genómica SL

Agustín Escardino 9,

Parc Científic de la Universitat de València

46980 Paterna (Valencia, España)

+34 963 212 340

Imegen.es
