NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)



Course: Bioinformatics for Biomedical Research (2014). Session: 2.1.2 - Next Generation Sequencing. Technologies and Applications. Part II: NGS Applications I. Statistics and Bioinformatics Unit (UEB) & High Technology Unit (UAT) from Vall d'Hebron Research Institute (www.vhir.org), Barcelona.

Transcript of NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

Page 1: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)


Vall d’Hebron Institut de Recerca (VHIR)

Rosa Prieto, Head of the High Tech Unit

[email protected]

15/05/2014

Institut d’Investigació Sanitària acreditat per l’Instituto de Salud Carlos III (ISCIII)

NEXT GENERATION SEQUENCING TECHNOLOGIES AND APPLICATIONS

COURSE ON BIOINFORMATICS FOR BIOMEDICAL RESEARCH

Page 2: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

Index

1. INTRODUCTION TO NGS
2. NGS TECHNOLOGY OVERVIEW
3. NGS APPLICATIONS OVERVIEW
4. WHAT IS NEXT IN SEQUENCING TECHNOLOGIES?

Page 3: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

NGS applications

-Amplicon sequencing
-Targeted DNA resequencing
-Exome sequencing
-Whole genome sequencing

-Metagenomics

-RNA sequencing
-Targeted RNA resequencing

-Epigenomics
-Sequencing of free DNA/RNA (plasma/serum)

Page 4: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

Considerations to use NGS

-What do I want to sequence? Whole genome, exome, several genes, metagenome, epigenome, RNAseq...

-How many samples?

-Length of read required?

-Quality and quantity of starting material?

-Size of nucleic acids to sequence

-Amount of sequence needed: coverage

(Depth of) Coverage: how many times a particular base is sequenced. 30x = each base has been read by 30 sequences (on average).

Depth of coverage = (number of reads * read length) / size of target genome

(Breadth of) Coverage: fraction of the target sequence that has been covered (at a given depth of coverage).
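The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names, the read counts and the per-base depths are made-up assumptions, not values from the course.

```python
def depth_of_coverage(n_reads: int, read_length: int, target_size: int) -> float:
    """Average depth = (number of reads * read length) / size of target."""
    return n_reads * read_length / target_size


def breadth_of_coverage(per_base_depth: list[int], min_depth: int = 1) -> float:
    """Fraction of target positions covered by at least `min_depth` reads."""
    covered = sum(1 for d in per_base_depth if d >= min_depth)
    return covered / len(per_base_depth)


# Hypothetical run: 1,000,000 reads of 150 bases over a 5,000,000-base target.
print(depth_of_coverage(1_000_000, 150, 5_000_000))  # 30.0, i.e. 30x

# Hypothetical per-base depths for a toy 10-base target.
depths = [0, 0, 12, 31, 35, 40, 33, 30, 8, 0]
print(breadth_of_coverage(depths))                # 0.7 (covered at least once)
print(breadth_of_coverage(depths, min_depth=30))  # 0.5 (covered at 30x or more)
```

Note that the same average depth can hide very different breadths, which is why both numbers are usually reported together.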

Page 5: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

Considerations to use NGS

Which depth of coverage do I need? It is an empirical value that depends on the objective of the study and its particular conditions (consensus values may exist).

Page 6: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

Amplicon sequencing: viral quasispecies

HCV, HBV and HIV virus populations have special characteristics:

In an infected patient, the viral population shows high rates of mutation and replication. It is a complex mixture of different mutants.

Goal of the study:

Detection and quantification of mutations, or combinations of mutations, that could confer resistance to viral inhibitors in samples from infected patients.

Special interest in mutations present at a low frequency (minor variants).

Page 7: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

Amplicon sequencing: viral quasispecies

WHY IS NGS APPROPRIATE FOR THIS KIND OF STUDY?

Minor variants often play an important role in the development of resistance to antiviral treatments in patients, even if they make up a very low percentage of the viral population.

Minor variants may not be detected by classical sequencing methods: you obtain hundreds of sequences with much effort and at high cost.

NGS detects variants present at a very low frequency efficiently: you obtain thousands of sequences at relatively low cost.

454 technology is the most appropriate method in this particular case (long sequences are achieved).
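To see why "hundreds" versus "thousands" of sequences matters for minor variants, here is a rough sketch under a simple binomial sampling model. The 1% frequency, the threshold of 5 supporting reads and the function name are illustrative assumptions, and sequencing errors are ignored, so real detection limits will be less favourable.

```python
from math import comb


def prob_at_least(k: int, n_reads: int, freq: float) -> float:
    """P(at least k of n_reads carry a variant present at frequency freq),
    under a binomial sampling model that ignores sequencing errors."""
    p_fewer = sum(
        comb(n_reads, i) * freq**i * (1 - freq) ** (n_reads - i)
        for i in range(k)
    )
    return 1 - p_fewer


# Hypothetical minor variant at 1% of the viral population,
# called only if it is seen in at least 5 reads (an arbitrary threshold).
for n in (200, 5000):
    print(n, round(prob_at_least(5, n, 0.01), 3))
```

With a few hundred sequences the variant is usually missed, while with several thousand it is almost always observed often enough to be called.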

Page 8: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

Targeted sequencing using gene panels

Array-based capture system

Liquid capture system

Page 9: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

Targeted sequencing using gene panels

Illumina

Ion Torrent

Page 10: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

Considerations that affect capture efficiency

-Quality and quantity of input DNA
-Repeat elements, tandem repeats and pseudogenes: uneven distribution of coverage
-Extreme GC content: 5’UTR, first exons of genes, promoter regions
-Library insert length and its distribution:
•Different capture platforms recommend different sets of standard practices for sample library preparation.
•As a result of these underlying chemistries, each platform has its own range of recommended fragment sizes. Agilent insert size ranges from 100 to 300 bp, Nimblegen ranges from 150 to 250 bp and TruSeq has the broadest range of 300 to 500 bp.

-Consistent laboratory procedures.
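As a small illustration of the insert-length point, a library's fragment sizes could be checked against the ranges quoted above. This is a sketch only: the fragment sizes are invented and the helper name is hypothetical; the ranges are the ones given on this slide.

```python
# Recommended library insert size ranges in bp, as quoted on the slide.
RECOMMENDED_INSERT_BP = {
    "Agilent": (100, 300),
    "Nimblegen": (150, 250),
    "TruSeq": (300, 500),
}


def fraction_in_range(insert_sizes: list[int], platform: str) -> float:
    """Fraction of fragments whose size falls inside the platform's range."""
    low, high = RECOMMENDED_INSERT_BP[platform]
    in_range = [s for s in insert_sizes if low <= s <= high]
    return len(in_range) / len(insert_sizes)


# Hypothetical fragment sizes (bp) measured for one library.
sizes = [120, 180, 210, 240, 260, 320, 410]
for platform in RECOMMENDED_INSERT_BP:
    print(platform, round(fraction_in_range(sizes, platform), 2))
```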

Page 11: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

Sequence capture for cancer genomics

Page 12: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

Exome vs. whole genome sequencing

PROS:
• Enabling technologies: NGS machines, open-source algorithms, capture reagents, lowering cost, big sample collections
• Exomes are more cost-effective (less sequencing for the same coverage): human genome 3.2 Gb vs. human exome approx. 50 Mb (1-2% of the genome)
• Simplified bioinformatics analysis compared to whole genomes

CHALLENGES:
• Many Mendelian disorders still cannot be interpreted
• Rare variants need large sample sizes
• Exome might miss regions of interest (e.g. novel non-coding genes)
• Exome reagents do not capture all exons
• Interpretation of clinical data is sometimes unsuccessful

Shendure, Genome Biol 2011
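The "less sequencing for the same coverage" argument can be made concrete with a back-of-the-envelope calculation. The sizes are the ones quoted on this slide; the 30x depth is just an example, and the sketch optimistically assumes that every read falls on target, which is not true for real capture experiments.

```python
GENOME_SIZE_BP = 3_200_000_000  # human genome, 3.2 Gb (from the slide)
EXOME_SIZE_BP = 50_000_000      # human exome, approx. 50 Mb (from the slide)


def bases_needed(target_size_bp: int, depth: float) -> float:
    """Bases to sequence for a given mean depth, rearranging
    depth = bases sequenced / target size. Assumes all reads are on target."""
    return target_size_bp * depth


depth = 30  # hypothetical target depth
genome_bases = bases_needed(GENOME_SIZE_BP, depth)
exome_bases = bases_needed(EXOME_SIZE_BP, depth)

print(f"genome: {genome_bases / 1e9:.1f} Gb")                    # genome: 96.0 Gb
print(f"exome: {exome_bases / 1e9:.1f} Gb")                      # exome: 1.5 Gb
print(f"ratio: {genome_bases / exome_bases:.0f}x")               # ratio: 64x
print(f"exome fraction: {EXOME_SIZE_BP / GENOME_SIZE_BP:.1%}")   # exome fraction: 1.6%
```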

Page 13: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

Exome sequencing workflow

[Figure: exome sequencing workflow diagram; the only label recovered from the slide is "emPCR"]

Page 14: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

Illumina exome sequencing

Kits

-Nimblegen EZ capture
-Agilent SureSelect
-Raindance
-...

Sequencers

Page 15: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

Ion exome sequencing

Page 16: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

Whole genome sequencing

De novo sequencing

Resequencing

http://www.ncbi.nlm.nih.gov/projects/WGS/WGSprojectlist.cgi

Page 17: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

Whole genome sequencing

Shotgun

Sequenced reads → Contigs → Scaffolds → Mapped scaffolds → Genome map

Long reads (454, PacBio, PE Illumina reads)

Page 18: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

Whole genome sequencing

Sequencing of the bacterial strain E. coli O104:H4 with GS Junior, MiSeq and PGM.

1. Creation of a reference assembly (Roche GS FLX+ shotgun + 8 kb PE, coverage 32x). It contains 1 chromosome (5.3 Mb) and 2 plasmids. 153 gaps remain, corresponding to unresolved repetitive regions.

2. Sequencing of the same strain using:
• 2 runs of the 454 GS Junior
• 2 316 chips of the Ion Torrent PGM
• 1 run of the MiSeq (2x150 bases)

Performance comparison of benchtop high-throughput sequencing platforms. Nat. Biotechnol. 30 (5): 434-441 (2012)

Page 19: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

Whole genome sequencing

Conclusions: “One important conclusion from this evaluation is that saying that one has “sequenced a bacterial genome” means different things on different benchtop sequencing platforms”

                     | MiSeq                    | GS Junior              | Ion Torrent
Throughput/run       | The highest              | The lowest             | The fastest
Errors               | The lowest               | Intermediate (indels)  | Many, especially in homopolymers
Read length          | Intermediate (2x150 bp)  | The longest (520 bp)   | The shortest (100 bp)
Run time             | The longest (27 hr)      | Intermediate (9 hr)    | The shortest (3 hr)
Price per Mb         | The cheapest             | The most expensive     | Intermediate
Other considerations | Unfillable gaps          | Errors in homopolymers | The worst performance

Performance comparison of benchtop high-throughput sequencing platforms. Nat. Biotechnol. 30 (5): 434-441 (2012)

Page 20: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

1000 Genomes Project

• The small fraction of the genome that varies between individuals can explain differences in susceptibility to a disease, in the response to drugs or in the reaction to environmental factors. The “1000 Genomes Project” will try to establish a map of the human genome that includes a description of as many of its variations as possible, dramatically improving on the information obtained with the HapMap project.

• The project is carried out with the main support of three institutions: the Wellcome Trust Sanger Institute (Hinxton, England), the Beijing Genomics Institute (Shenzhen, China) and the National Human Genome Research Institute, which is part of the NIH (National Institutes of Health, USA).

Page 21: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

1000 Genomes Project

Methods:

1-Low coverage (5x) sequencing: SOLiD + Illumina

2-Whole exome sequencing (80× average coverage across a consensus target of 24 Mb spanning more than 15,000 genes): SeqCap EZ Human Exome Library, Nimblegen, and SureSelect All Exon V2 Target Enrichment kit from Agilent.

3-SNP genotyping: Initially all samples were typed using a Sequenom MassArray SNP Genotyping panel of 23 SNPs and one gender-determining assay to establish a genetic fingerprint. After gender concordance was verified, the samples were placed on 96-well plates for the Illumina HumanOmni2.5-Quad v1.0 B SNP array.

Page 22: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

Personal Genome Project

The project will publish the genotype of the volunteers together with detailed information on their phenotype: medical records, various tests, MRI images, etc. All the information will be available to anyone on the Internet, so that researchers can test various hypotheses about the relationships between genotype, environment and phenotype.

Page 23: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

NCBI’s Resources for Phenotype (MedGen), Tests (GTR) and Variation (ClinVar)

MedGen to research the phenotype
http://www.ncbi.nlm.nih.gov/medgen/

GTR (Genetic Testing Registry) to choose appropriate tests
http://www.ncbi.nlm.nih.gov/gtr/

ClinVar to research variant pathogenicity
http://www.ncbi.nlm.nih.gov/clinvar/
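Besides the web pages, these three databases can also be searched programmatically through NCBI's E-utilities. The sketch below only builds the search URLs and does not send any request; the helper name, the example search term and the `retmax` value are illustrative assumptions, not part of the course material.

```python
from urllib.parse import urlencode

# Base address of the E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for one Entrez database (no request is sent)."""
    query = urlencode({"db": db, "term": term, "retmode": "json", "retmax": retmax})
    return f"{EUTILS_ESEARCH}?{query}"


# Hypothetical search term, tried against the three resources named above.
for db in ("medgen", "gtr", "clinvar"):
    print(esearch_url(db, "Marfan syndrome"))
```

Opening one of the printed URLs in a browser, or fetching it with any HTTP client, should return a JSON list of matching record identifiers.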

Page 24: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

NCBI’s Resources for Phenotype (MedGen), Tests (GTR) and Variation (ClinVar)

Patient showing signs compatible with Marfan syndrome:

Page 25: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

NCBI’s Resources for Phenotype (MedGen), Tests (GTR) and Variation (ClinVar)

Page 26: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)


List of tests for Marfan syndrome (panels included)

Page 27: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

NCBI’s Resources for Phenotype (MedGen), Tests (GTR) and Variation (ClinVar)

Page 28: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)


Page 29: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

NCBI’s Resources for Phenotype (MedGen), Tests (GTR) and Variation (ClinVar)

Page 30: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)


Searching ClinVar

NM_000138.4:c.4786C>T
FBN1:c.4786C>T
c.4786C>T
Arg1596Ter
R1596*
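The nucleotide-level and protein-level search terms above describe the same change, and the link between them can be checked with a little arithmetic. The following is a minimal sketch, not a full HGVS parser: it handles only simple coding substitutions, the function name is hypothetical, and it assumes that c. position 1 is the A of the ATG start codon.

```python
import re

# Matches simple coding substitutions such as "NM_000138.4:c.4786C>T";
# the reference prefix (transcript or gene symbol) is optional.
HGVS_CDS_SUBSTITUTION = re.compile(
    r"^(?:(?P<reference>[^:]+):)?c\.(?P<position>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_cds_substitution(expression: str) -> dict:
    """Split a simple c. substitution into its parts and locate its codon."""
    match = HGVS_CDS_SUBSTITUTION.match(expression)
    if match is None:
        raise ValueError(f"not a simple coding substitution: {expression!r}")
    position = int(match["position"])
    return {
        "reference": match["reference"],         # None when no prefix is given
        "position": position,
        "ref": match["ref"],
        "alt": match["alt"],
        "codon": (position - 1) // 3 + 1,        # 1-based codon number
        "base_in_codon": (position - 1) % 3 + 1, # 1, 2 or 3
    }


for term in ("NM_000138.4:c.4786C>T", "FBN1:c.4786C>T", "c.4786C>T"):
    print(parse_cds_substitution(term))
```

For c.4786C>T the computed codon is 1596 and the change hits its first base, which agrees with the protein-level names Arg1596Ter and R1596*.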

Page 31: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)

ClinVar detailed display

Allele summary
• Gene
• Variant type
• Genomic location
• HGVS expressions*
• Molecular consequence*
• Links*
• Frequency*

Phenotype summary
• Names
• Links*
• Age of onset*
• Prevalence*

Interpretation
• Significance
• Review status*
• Accession.version*

* May be provided by NCBI

Page 32: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)


ClinVar detailed display