Introduction to NGS

Post on 14-Jun-2015

4.655 views 2 download

Tags:

description

Introduction to NGS - Ana Conesa -Massive sequencing data analysis workshop -Granada 2011

Transcript of Introduction to NGS

Introduction to NGS

Ana Conesa

Head of Genomics of Gene Expression Lab

Centro de Investigaciones Prínicpe Felipe

aconesa@cipf.es

http://bioinfo.cipf.es/aconesa

Next Generation Sequencing

NGS has brought high speed not only to genome sequencing and personal medicine, but has also change the way we do genome research:

Got a question on genome organization:

SEQUENCE IT!!!!

NGS technologies

Cost-effectiveFast

Ultra throughputCloning-freeShort reads

Roche 454 pyrosequencing

Roche 454 pyrosequencing

Roche 454

GS Junior, benchtop

Solexa

Solexa

Solexa-HiSeq

200 Gb/run in 8 days2x100 bp fragments

2 billion reads per run

Helicos

SOLiD

SOLiD

* Sequencing output in “color space”

* Needs reference genome to translate to base space.

SOLiD 5500

* Fifth 3-based encoded primer

* Sequencing output in base space

* No reference needed

5500 xl-u SOLiD

180 Gb/run (microbeads)300 Gb/run (nanobeads)

35-75 bp fragments

2.8 - 4.8 billion reads/run

2x6 lanes/run96 bar-codes

99.99% accuracy

Pacific Biosystems

Real time DNA synthesisUp to 12000 nt??

50 bases/second??

Ion Torrent

•$ 50.000•$ 500 /sample

• 1 hour/run• > 200 nt lengths

•Reads H+ released by DNA polymerase

Comparison

•Short fragments•Errors: Hexamer bias•High throughput•Cheap

•Resequencing:•ChipSeq•RNASeq•MethylSeq

•Short fragments•Color-space•High throughput•Cheap

•Resequencing:•ChipSeq•RNASeq•MethylSeq

•Long fragments•Errors: poly nts•Low throughput•Expensive

•De novo sequencing:•Amplicon sequencing

Roche 454 Solexa SOLiD

Applications

De novo sequencingResequencingExome SequencingRNA-seqGenome annotationChip-seqMethyl-seq…….

Applications

De novo sequencingResequencingExome SequencingRNA-seqGenome annotationChip-seqMethyl-seq…….

Basic steps NGS data processing

QC and read cleaning

Basic steps NGS data processing

QC and read cleaning

Mapping

Basic steps NGS data processing

QC and read cleaning

MappingFeature

identification

Basic steps NGS data processing

QC and read cleaning

MappingFeature

identification

SNVsIndels

Rearrang.

RPKMSplicing

DNA Binding site

RNA-seq

Elucidate gene models

Quantify gene expression

RNA-seq

Elucidate gene models

RNA-seq protocol*

total RNA purification

oligodT

RiboZ

mRNA preparation

2nd strand synthesis fragmentation1st strand synthesis

RNADNA

*Solexa Pair-End

RNA-seq protocol (II)

A

A

A

A

A

A

A

A

A

A

adenylation 3’ ends

ligate adapters

amplification

SEQUENCING!

library

10

0b

p la

d

400-200

400-200

Strand-specific RNAseq

Strand-specific RNA-seq

fastq: sequence data and qualities

SAM/BAM: mapping data and qualities

File formats

Some Figures

1 Solexa run ==8 lanes ==25 M reads/lane==2 x 4 G fastq/lane (PE)32 G disk space

Mapping @ processor 12 cores, 48 GB RAM , 4TB disk 24 hours

SAM (Ascii) / BAM (Binary) output 36 G / 9 G

How much does it “cost” (computationally) to sequence a human transcriptome?

One human transcriptome: 100 Million reads

Applications of RNAseq

Qualitative:* Alternative splicing* Antisense expression* Extragenic expression* Alternative 5’ and 3’ usage* Detection of fusion transcripts

….

Quantitative:* Differential expression* Dynamic range of gene expression

….

Tophat/CufflinksScripture

Alexa

edgeRDESeqbaySeqNOISeq

Advantages of RNAseq?

* Non targeted transcript detection* No need of reference genome* Strand specificity* Find novels splicing sites* Larger dynamic range* Detects expression and SNVs* Detects rare transcripts

….

* Restricted to probes on array* Needs genome knowledge* Normally, not strand specific* Exon arrays difficult to use* Smaller dynamic range* Does not provide sequence info* Rare transcripts difficult

….

RNAseq microarrays

and…. are there any disadvantages?????

Resequencing

Exome Sequencing

DNA (patient)

Gene A Gene B

Produce shotgun library

Capture exon sequences

Wash & Sequence

Map againstreference genome

Determine variants,Filter, comparepatients

candidate genes

1

2

3

45

Exome capture

The principle: comparison of patients

Patient 1

Patient 2

Patient 3

Patient 4

Patient 5

Patient 6

mutation

candidate gene (shares mutation for all patients)

ChipSeq

MethylSeq

MIDseq

Census NGS methods

Sucessful Stories

Miller syndrome

Species composition of metagenomic DNAextracted from mammoth hair.

Conclusions

NGS is revolutionizing how we do genome research

Conclusions

NGS is revolutionizing how we do genome research

But it will also revolutionize our lives….

Conclusions

NGS is revolutionizing how we do genome research

But it will also revolutionize our lives….

If we manage to process and analyze the data

YOUR SUCESSFUL STORY???

Have a great MDA course?