
NEXT GENERATION SEQUENCING II

R. PIAZZA (MD, PHD) DEPT OF MEDICINE AND SURGERY UNIVERSITY OF MILANO BICOCCA

R. Piazza – NGS Sequencing II

SANGER SEQUENCING

[Figure: DNA + DNA Polymerase; 5’ and 3’ strand ends; Capillary Electrophoresis]

Flowcell

NEXT-GENERATION-SEQUENCING


Genomic DNA

DNA Library

~100bp ~100bp

Single-Read Paired-End

NEXT-GENERATION-SEQUENCING


HIGH-THROUGHPUT SEQUENCING

The sequence is read in each cluster through multiple cycles of nucleotide incorporation


NEXT-GENERATION-SEQUENCING

THROUGHPUT

SINGLE SEQUENCING RUN

6 000 000 000 000 bp !!


FASTQ

NGS ANALYSIS: FIRST STEPS

Each read takes 4 lines:

1 READ IDENTIFIER

2 SEQUENCE

3 + (SEPARATOR)

4 QUALITY (PHRED)

p = 1/100

Quality = -10 log10(1/100)

Quality = -10 log10(10^-2)

Quality = -10 * -2 = 20

p = 1/1000

Quality = -10 log10(1/1000)

Quality = -10 log10(10^-3)

Quality = -10 * -3 = 30

Quality = 40 -> p = 1/10000
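The conversion between error probability and Phred quality can be sketched in a few lines of Python. The function names are illustrative, and the ASCII offset of 33 used to decode a quality string is an assumption (it is the common Sanger/Illumina 1.8+ encoding, but the slides do not state which encoding is used).

```python
import math


def error_prob_to_phred(p):
    """Phred quality for an error probability p: Q = -10 * log10(p)."""
    return -10.0 * math.log10(p)


def phred_to_error_prob(q):
    """Inverse conversion: p = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def decode_quality_string(quality, offset=33):
    """Decode a FASTQ quality line into Phred scores.

    offset=33 is an assumption (Phred+33 encoding).
    """
    return [ord(ch) - offset for ch in quality]


if __name__ == "__main__":
    for p in (1 / 100, 1 / 1000, 1 / 10000):
        print(p, round(error_prob_to_phred(p)))
```

With this encoding each base quality fits in a single character, which is why the quality line has the same length as the sequence line.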


FASTQ

NGS ANALYSIS: FIRST STEPS

SEQUENCE

QUALITY (PHRED)

Number of FASTQ elements = 1 000 000 000 000 bp / 100 bp per read = 10 billion FASTQ elements

Sequence: 100 bases per read -> 100 bytes

Quality: same length as the sequence = 100 bytes

Lines 1 + 3 = ~50 bytes

10 billion FASTQ elements x (100 bytes + 100 bytes + 50 bytes)

= 2500 billion bytes = 2500 Gigabytes = 2.5 Terabytes
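The storage estimate can be reproduced with a short calculation. This is only a back-of-the-envelope sketch: the function name and default values are illustrative, newline characters and compression are ignored, and the ~50 bytes for lines 1 and 3 is the approximation given on the slide.

```python
def fastq_storage_bytes(total_bases, read_length=100, header_bytes=50):
    """Estimate uncompressed FASTQ size for a run.

    Each record holds the sequence (read_length bytes), the quality line
    (same length) and roughly header_bytes for lines 1 and 3.
    """
    n_records = total_bases // read_length
    bytes_per_record = read_length + read_length + header_bytes
    return n_records, n_records * bytes_per_record


if __name__ == "__main__":
    n_records, total = fastq_storage_bytes(1_000_000_000_000)
    print(f"{n_records:,} records, {total / 1e12:.2f} terabytes")
```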


FASTQ

NGS ANALYSIS: FIRST STEPS

1 2 3 4

SEQUENCE

QUALITY (PHRED)

ALIGNMENT TO A REFERENCE

PROBLEM: A STANDARD NGS EXPERIMENT MAY GENERATE HUNDREDS OF MILLIONS OF INDIVIDUAL READS !!

BWA - http://bio-bwa.sourceforge.net/

BOWTIE - http://bowtie-bio.sourceforge.net/index.shtml

BOWTIE2 - http://bowtie-bio.sourceforge.net/bowtie2/index.shtml


ALIGNMENT ?

1) SAM (Sequence Alignment Map)

2) BAM (Binary Alignment Map) + BAI index file

Name Chromosome Position Sequence and Quality

SAMTOOLS (samtools.sourceforge.net/)

Li H et al., Bioinformatics. 2009 Aug 15;25(16):2078-9.
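As an illustration of what an alignment record contains, the sketch below parses one SAM line into the fields named on the slide (name, chromosome, position, sequence and quality). The field order follows the 11 mandatory tab-separated SAM columns; the `SamRecord` class and the example line are hypothetical and not taken from the lecture. In practice BAM files are read with SAMTOOLS rather than by hand.

```python
from dataclasses import dataclass


@dataclass
class SamRecord:
    name: str        # QNAME, column 1
    flag: int        # FLAG, column 2
    chromosome: str  # RNAME, column 3
    position: int    # POS, column 4 (1-based leftmost position)
    mapq: int        # MAPQ, column 5
    cigar: str       # CIGAR, column 6
    sequence: str    # SEQ, column 10
    quality: str     # QUAL, column 11


def parse_sam_line(line):
    """Parse one alignment line (not a '@' header line) of a SAM file."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 columns")
    return SamRecord(
        name=fields[0],
        flag=int(fields[1]),
        chromosome=fields[2],
        position=int(fields[3]),
        mapq=int(fields[4]),
        cigar=fields[5],
        sequence=fields[9],
        quality=fields[10],
    )


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    example = "read_1\t0\tchr17\t7577000\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII"
    print(parse_sam_line(example))
```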


ALIGNMENT VIEWER - IGV


VARIANT CALLING

SOMATIC MUTATION OR SNP ?

SINGLE NUCLEOTIDE POLYMORPHISM: A/T, present in both CASE and CONTROL reads

SOMATIC VARIANT: C>A, present in CASE reads only

CASE SAMPLE

..CGGCATTGGGACAGACAACAACAGCACTTCTGACCAAGCGGAGAAGAGCT..
  CGGCATTGGGACAGACAACAACAGAACTTCT
   GGCATTGGGACAGACAACAACAGAACTTCTG
   GGCATTGGGACAGACTACAACAGCACTTCTG
    GCATTGGGACAGACTACAACAGCACTTCTGA
      ATTGGGACAGACAACAACAGAACTTCTGAC
        TGGGACAGACTACAACAGCACTTCTGACCA
         GGGACAGACAACAACAGAACTTCTGACCAA
           GACAGACTACAACAGCACTTCTGACCAAGC

CONTROL SAMPLE

..CGGCATTGGGACAGACAACAACAGCACTTCTGACCAAGCGGAGAAGAGCT..
  CGGCATTGGGACAGACTACAACAGCACTTCT
   GGCATTGGGACAGACAACAACAGCACTTCTG
   GGCATTGGGACAGACAACAACAGCACTTCTG
    GCATTGGGACAGACTACAACAGCACTTCTGA
      ATTGGGACAGACAACAACAGCACTTCTGAC
        TGGGACAGACTACAACAGCACTTCTGACCA
         GGGACAGACAACAACAGCACTTCTGACCAA
           GACAGACAACAACAGCACTTCTGACCAAGC
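The case/control comparison can be sketched in Python using the reads shown on this slide. This is a toy illustration, not a real variant caller: `best_offset` is a brute-force stand-in for an aligner such as BWA or Bowtie, the 20% allele-fraction threshold is an arbitrary assumption, and positions are 0-based offsets within the reference excerpt.

```python
from collections import Counter

REFERENCE = "CGGCATTGGGACAGACAACAACAGCACTTCTGACCAAGCGGAGAAGAGCT"

CASE_READS = [
    "CGGCATTGGGACAGACAACAACAGAACTTCT",
    "GGCATTGGGACAGACAACAACAGAACTTCTG",
    "GGCATTGGGACAGACTACAACAGCACTTCTG",
    "GCATTGGGACAGACTACAACAGCACTTCTGA",
    "ATTGGGACAGACAACAACAGAACTTCTGAC",
    "TGGGACAGACTACAACAGCACTTCTGACCA",
    "GGGACAGACAACAACAGAACTTCTGACCAA",
    "GACAGACTACAACAGCACTTCTGACCAAGC",
]

CONTROL_READS = [
    "CGGCATTGGGACAGACTACAACAGCACTTCT",
    "GGCATTGGGACAGACAACAACAGCACTTCTG",
    "GGCATTGGGACAGACAACAACAGCACTTCTG",
    "GCATTGGGACAGACTACAACAGCACTTCTGA",
    "ATTGGGACAGACAACAACAGCACTTCTGAC",
    "TGGGACAGACTACAACAGCACTTCTGACCA",
    "GGGACAGACAACAACAGCACTTCTGACCAA",
    "GACAGACAACAACAGCACTTCTGACCAAGC",
]


def best_offset(read, reference):
    """Offset with the fewest mismatches (brute force, no gaps)."""
    offsets = range(len(reference) - len(read) + 1)
    return min(
        offsets,
        key=lambda off: sum(a != b for a, b in zip(read, reference[off:])),
    )


def pileup(reads, reference):
    """Count the bases observed at each reference position."""
    columns = [Counter() for _ in reference]
    for read in reads:
        off = best_offset(read, reference)
        for i, base in enumerate(read):
            columns[off + i][base] += 1
    return columns


def variant_alleles(reads, reference, min_fraction=0.2):
    """Non-reference alleles seen in at least min_fraction of the reads."""
    found = {}
    for pos, counts in enumerate(pileup(reads, reference)):
        depth = sum(counts.values())
        for base, n in counts.items():
            if base != reference[pos] and n / depth >= min_fraction:
                found[(pos, reference[pos], base)] = n / depth
    return found


def classify(case_reads, control_reads, reference):
    """Label each case variant as 'SNP' (also in control) or 'somatic'."""
    case = variant_alleles(case_reads, reference)
    control = variant_alleles(control_reads, reference)
    return {key: "SNP" if key in control else "somatic" for key in case}


if __name__ == "__main__":
    for (pos, ref, alt), label in sorted(
        classify(CASE_READS, CONTROL_READS, REFERENCE).items()
    ):
        print(pos, f"{ref}>{alt}", label)
```

The key design point is that the control sample acts as the germline baseline: a variant allele seen in both samples is treated as inherited, while one seen only in the case sample is a somatic candidate.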


SOMATIC VARIANT: DRIVER OR PASSENGER ?


NGS GOES DIGITAL

CASE (TUMOR)

CONTROL (GERMLINE)

????

NGS GOES DIGITAL

CONTROL

CASE

[Figure: read count versus genomic position, Case and Control]

Statistical module: Wilcoxon Signed-Rank test

Test statistic W:

$$W = \sum_{i=1}^{N_r} \left[ \operatorname{sgn}\left(x_i^{\mathrm{case}} - x_i^{\mathrm{control}}\right) \cdot R_i \right]$$

As the sample size increases (N_r > 10), the distribution of W converges to a Gaussian distribution, so a Z-score can be computed!

Estimating the error function of the normal distribution of W using the Abramowitz and Stegun approximation, equation 7.1.26:

$$\operatorname{erf}(x) \approx 1 - \left(a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4 + a_5 t^5\right) e^{-x^2}$$
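A minimal Python sketch of this statistical step is given below. It is not the lecture's actual implementation. The constants are those published for Abramowitz and Stegun equation 7.1.26, with t = 1/(1 + px). The z-score uses the usual normal approximation with sigma_W = sqrt(N_r (N_r + 1)(2 N_r + 1) / 6), and the handling of ties (average ranks) and of zero differences (dropped) is an assumption.

```python
import math

# Abramowitz and Stegun 7.1.26 constants (maximum absolute error about 1.5e-7)
P = 0.3275911
A = (0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429)


def erf_approx(x):
    """erf(x) ~ 1 - (a1*t + ... + a5*t^5) * exp(-x^2), with t = 1 / (1 + p*x)."""
    sign = -1.0 if x < 0 else 1.0  # erf is odd; the formula holds for x >= 0
    x = abs(x)
    t = 1.0 / (1.0 + P * x)
    poly = sum(a * t ** (k + 1) for k, a in enumerate(A))
    return sign * (1.0 - poly * math.exp(-x * x))


def signed_rank_w(case, control):
    """Return (W, N_r): sum of signed ranks and number of non-zero pairs."""
    diffs = [c - k for c, k in zip(case, control) if c != k]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for m in range(i, j + 1):
            ranks[order[m]] = average_rank
        i = j + 1
    w = sum(math.copysign(r, d) for r, d in zip(ranks, diffs))
    return w, len(diffs)


def wilcoxon_test(case, control):
    """Return (W, z, two-sided p) using the normal approximation."""
    w, n_r = signed_rank_w(case, control)
    if n_r == 0:
        return 0.0, 0.0, 1.0
    sigma = math.sqrt(n_r * (n_r + 1) * (2 * n_r + 1) / 6.0)
    z = w / sigma
    p = 1.0 - erf_approx(abs(z) / math.sqrt(2.0))
    return w, z, p


if __name__ == "__main__":
    control = [100, 98, 105, 110, 95, 102, 99, 101, 97, 104, 103, 96]
    case = [c + d for c, d in zip(control, range(1, 13))]
    print(wilcoxon_test(case, control))
```

The normal approximation is only reasonable for larger N_r, in line with the N_r > 10 condition above; for small samples an exact table of W would be needed.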


SINGLE NUCLEOTIDE POLYMORPHISMS

http://atlasofscience.org


LOSS OF HETEROZYGOSITY – ALLELIC IMBALANCE

[Figure: A and T alleles in the CONTROL and CASE samples]
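One simple way to look for allelic imbalance is to compare the fraction of reads carrying each allele at a position that is heterozygous in the control. The sketch below is an assumption-laden illustration: the function names, the 0.4 to 0.6 heterozygosity window and the 0.2 minimum shift are hypothetical choices, not values from the lecture.

```python
def allele_fraction(ref_reads, alt_reads):
    """Fraction of reads supporting the alternative allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this position")
    return alt_reads / total


def shows_allelic_imbalance(control, case, het_window=(0.4, 0.6), min_shift=0.2):
    """True if the site is heterozygous in control and skewed in case.

    control and case are (ref_reads, alt_reads) tuples.
    """
    control_af = allele_fraction(*control)
    case_af = allele_fraction(*case)
    heterozygous = het_window[0] <= control_af <= het_window[1]
    return heterozygous and abs(case_af - control_af) >= min_shift


if __name__ == "__main__":
    # Hypothetical counts: balanced A/T in control, mostly A in case.
    print(shows_allelic_imbalance(control=(50, 50), case=(90, 10)))
```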


BIOINFORMATICS – CEQer2

COMPARATIVE EXONIC QUANTIFICATION ANALYZER

http://www.ngsbicocca.org/

Piazza R. et al., PLoS One. 2013 Oct 4;8(10):e74825.

Piazza R. et al., Nat Genet. 2013 Jan;45(1):18-24.

Gambacorti-Passerini C. et al., Blood. 2015 Jan 15;125(3):499-503.

Piazza R. et al., Nucleic Acids Res. 2012 Sep;40(16):e123.

Spinelli R. et al., Mol Genet Genomic Med. 2013 Nov;1(4):246-59.

Piazza R. et al., Nat Comm. 2018. In press.

BIOINFORMATICS – CEQer2

[Figure: CML-CP, CML-BC and SOLID TUMOR samples; Chr17, TP53]

BIOINFORMATICS – CEQer2

SOMATIC UNIPARENTAL DISOMY


NORMAL CHROMOSOMES: 1 MATERNAL, 1 PATERNAL

CBL

Chromosome break

OS = Oncosuppressor

MUTATION

HOMOLOGOUS REPAIR


BACTERIAL GENOME


Sequence Type: ST-1879

Organism: Klebsiella pneumoniae

MLST Profile: kpneumoniae

GENE  % IDENTITY  HSP Length  Allele Length  GAPS  BEST MATCH
GAPA  100         237         450            0     GAPA_1
INFB  100         318         318            0     INFB_3
MDH   100         477         477            0     MDH_1
PGI   100         402         432            0     PGI_1
PHOE  100         179         420            0     PHOE_1
RPOB  100         501         501            0     RPOB_1
TONB  100         414         414            0     TONB_79
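The MLST result above can be turned into an allelic profile with a few lines of Python. This sketch only restates the table; the function names are illustrative. It also flags genes whose HSP length is shorter than the allele length, since a 100% identity over a partial match may deserve a second look. How such partial matches should be interpreted is not stated on the slide.

```python
# (gene, % identity, HSP length, allele length, gaps, best match), from the table
MLST_HITS = [
    ("GAPA", 100.0, 237, 450, 0, "GAPA_1"),
    ("INFB", 100.0, 318, 318, 0, "INFB_3"),
    ("MDH", 100.0, 477, 477, 0, "MDH_1"),
    ("PGI", 100.0, 402, 432, 0, "PGI_1"),
    ("PHOE", 100.0, 179, 420, 0, "PHOE_1"),
    ("RPOB", 100.0, 501, 501, 0, "RPOB_1"),
    ("TONB", 100.0, 414, 414, 0, "TONB_79"),
]


def allelic_profile(hits):
    """Map each gene to the allele number of its best match."""
    return {gene: int(best.rsplit("_", 1)[1]) for gene, *_rest, best in hits}


def partially_covered(hits):
    """Genes whose alignment (HSP) does not span the whole allele."""
    return [
        gene
        for gene, _identity, hsp_len, allele_len, _gaps, _best in hits
        if hsp_len < allele_len
    ]


if __name__ == "__main__":
    print(allelic_profile(MLST_HITS))
    print(partially_covered(MLST_HITS))
```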

Bartual SG, Seifert H, Hippler C, Luzon MA, Wisplinghoff H, Rodriguez-Valera F. J Clin Microbiol 2005; 43:4382-90.

Griffiths D, Fawley W, Kachrimanidou M, et al. J Clin Microbiol 2010; 48:770-8.

Lemee L, Dhalluin A, Pestel-Caron M, Lemeland JF, Pons JL. J Clin Microbiol 2004; 42:2609-17.

Wirth T, Falush D, Lan R, et al. Mol Microbiol 2006; 60:1136-51.

Jaureguy F, Landraud L, Passet V, et al. BMC Genomics 2008; 9:560.

Larsen MV, Cosentino S, Rasmussen S, et al. J Clin Microbiol 2012; 50(4):1355-61.

Resistance gene  Identity  Query/HSP  Contig                                 Position in contig  Phenotype                  Accession no.
blaOXA-9         99.76     840/840    NODE_11694_length_1029_cov_574.739563  106..944            Beta-lactam resistance     JF703130
blaLEN12         90.79     789/684    NODE_2323_length_681_cov_631.278992    18..701             Beta-lactam resistance     AJ635406
aadA2            99.59     780/491    NODE_3745_length_470_cov_805.451050    1..490              Aminoglycoside resistance  X68227
blaTEM-79        100       861/621    NODE_8636_length_601_cov_644.326111    1..621              Beta-lactam resistance     AF190692
aph(3')-Ia       100       816/700    NODE_258_length_680_cov_479.180878     1..700              Aminoglycoside resistance  V00359
mph(A)           99.72     906/704    NODE_10668_length_684_cov_417.179810   1..704              Macrolide resistance       D16251
catA1            99.85     660/660    NODE_1982_length_1526_cov_495.138275   248..907            Phenicol resistance        V00622
dfrA12           100       498/498    NODE_4437_length_917_cov_476.905121    416..913            Trimethoprim resistance    AB571791
QnrS1            100       657/657    NODE_6327_length_1563_cov_355.095337   459..1115           Quinolone resistance       AB187515
fosA             96.9      420/420    NODE_745_length_1522_cov_307.268066    45..464             Fosfomycin resistance      NZ_AFBO01000747

BACTERIAL GENOME


BACTERIAL GENOME – HIERARCHICAL CLUSTERING


BACTERIAL GENOME – PCA/PCoA


BacteriaFingerprint example showing simulated outbreak/epidemic data. HCE makes it possible to trace the origin of the outbreak (Milan).


MICROBIOME

MICROBIOME: collection of genomes of microbes in a system

MICROBIOTA: collection of organisms that are present in a system


MICROBIOME - UTILITY


To track inflammatory bowel diseases such as Crohn’s disease or ulcerative colitis

Differences in gut microbial communities have been identified between individuals with non-alcoholic fatty liver disease (NAFLD) who had mild to moderate liver fibrosis and those who had advanced liver fibrosis

The gut microbiome might be an important factor in a wide range of health issues such as obesity, asthma, diabetes, cancer, autoimmune disorders and heart disease.


Genomic DNA

DNA Library

NEXT-GENERATION-SEQUENCING


MICROBIOME


MICROBIOME - RESULTS


BacteriaFingerprint simulated outbreak

THANK YOU FOR YOUR ATTENTION!


SNP FILTERING

Chromosome  Position   Ref  Var  Gene    Codon Change  AA Change                  Var Type  Polymorphism  MAF  Clinical Polymorphism
chr11       119148931  G    A    CBL     TGT->TAT      Cys384Tyr                  SNV                     0
chr12       22811995   A    G    ETNK1   AAT->AGT      Asn244Ser                  SNV       rs370316713   -1   Non-Clinical
chr18       42531913   G    A    SETBP1  GGC->AGC      Gly870Ser                  SNV       rs267607040   -1   Clinical
chr19       57840143   T    C    ZNF543  TTT->TCT      Phe438Ser                  SNV                     0
chr20       31021250   C    T    ASXL1   CGA->TGA      Arg416*; Arg417*; Arg308*  SNV       rs375215583   -1   Non-Clinical
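The mechanics of SNP filtering can be sketched by splitting the variants in the table according to whether they carry a polymorphism identifier. The sketch makes no claim about which rows the lecture intends to keep or discard: the data structure and function name are illustrative, and the split only shows what a filter based on known polymorphisms would see.

```python
# Rows from the table above; rsid is None when no polymorphism is listed.
VARIANTS = [
    {"gene": "CBL", "aa_change": "Cys384Tyr", "rsid": None, "clinical": None},
    {"gene": "ETNK1", "aa_change": "Asn244Ser", "rsid": "rs370316713", "clinical": "Non-Clinical"},
    {"gene": "SETBP1", "aa_change": "Gly870Ser", "rsid": "rs267607040", "clinical": "Clinical"},
    {"gene": "ZNF543", "aa_change": "Phe438Ser", "rsid": None, "clinical": None},
    {"gene": "ASXL1", "aa_change": "Arg416*", "rsid": "rs375215583", "clinical": "Non-Clinical"},
]


def split_by_known_polymorphism(variants):
    """Return (novel, known): variants without and with a listed rsID."""
    novel = [v for v in variants if v["rsid"] is None]
    known = [v for v in variants if v["rsid"] is not None]
    return novel, known


if __name__ == "__main__":
    novel, known = split_by_known_polymorphism(VARIANTS)
    print("no rsID:", [v["gene"] for v in novel])
    print("with rsID:", [(v["gene"], v["clinical"]) for v in known])
```

A filter that discarded every variant with an rsID would remove three of the five rows here, including one annotated as Clinical, so the filtering criterion has to be chosen with care.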

Gambacorti-Passerini C. et al., Blood. 2015 Jan 15;125(3):499-503.

Piazza R. et al., Nat Genet. 2013 Jan;45(1):18-24.

Hoischen A. et al., Nat Genet. 2010 Jun;42(6):483-5. doi: 10.1038/ng.581.
