The Application of Next Generation Sequencing for HLA Genotyping in the Clinical Laboratory

Dianne De Santis
Department of Clinical Immunology, PathWest, Royal Perth Hospital, Perth, Australia

Why do we need Next Generation Sequencing in the HLA laboratory?

Number of HLA alleles, April 2013 (http://www.ebi.ac.uk/ipd/imgt/hla/intro.html)

Locus       Alleles
HLA-A       2,244
HLA-B       2,934
HLA-C       1,788
HLA-DRB1    1,317
HLA-DQB1    323
HLA-DPB1    185

HLA ambiguity results from the amplification and Sanger sequencing based typing (SBT) of partial genes

Allele ambiguity - results when polymorphisms that distinguish alleles fall outside of the regions examined by the typing system, or when the gene sequence in the database is incomplete

• A*01:01:01:01 vs A*01:01:01:02N (Intron 2)

• DRB1*12:01 vs DRB1*12:06 (Exon 3)

Current genotyping strategy at DCI includes sequencing of exons 2-4 for HLA-A, -B, -C, exons 2-3 for DQB1, and exon 2 for DPB1 and DRB genes
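
As a rough illustration of allele ambiguity, the sketch below compares two alleles only within the regions a typing system examines. The sequences and region names are invented for illustration and are not real HLA sequences; the point is only that a difference outside the examined regions goes unseen.

```python
# Toy model: each allele maps region name -> sequence (made-up data, not real HLA).
ALLELE_X = {"exon2": "GCTCCCACTC", "intron2": "GTGAGTGCGG", "exon3": "GGTCTCACAC"}
ALLELE_Y = {"exon2": "GCTCCCACTC", "intron2": "GTGAGTACGG", "exon3": "GGTCTCACAC"}


def distinguishable(allele_a, allele_b, regions_examined):
    """Return True if the two alleles differ in at least one examined region."""
    return any(allele_a[region] != allele_b[region] for region in regions_examined)


# Exons only: the intron 2 difference is never looked at.
exons_only = distinguishable(ALLELE_X, ALLELE_Y, ["exon2", "exon3"])
# Including the intron exposes the difference.
with_intron = distinguishable(ALLELE_X, ALLELE_Y, ["exon2", "intron2", "exon3"])
print(exons_only, with_intron)
```

With exons only, the two toy alleles cannot be told apart; adding the intron resolves them.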

Genotype ambiguity - results from an inability to establish phase (cis/trans ambiguity) between closely linked polymorphisms identified by the typing system

HLA ambiguity results from heterozygous sequencing by Sanger SBT

A*01:01:01:01 + 24:02:01:01   ------------- Y-------Y----- -------------R- ------Y------ -------------MS
A*01:14 + 24:46               ------------- Y-------Y----- -------------R- ------Y------ -------------MS
A*24:02:01:01                 -------------- C------T------ ------------A- ------C------ ------------AC
A*24:46                       -------------- C------T------ ------------A- ------C------ ------------CG
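
The cis/trans problem can be sketched in a few lines. The example below merges two aligned allele sequences into the unphased IUPAC string a heterozygous Sanger trace would give, and shows two different allele pairs collapsing to the same string. The five-base sequences are hypothetical and far shorter than real exons.

```python
# IUPAC codes for the base (or pair of bases) seen at one position.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("AC"): "M",
    frozenset("GT"): "K", frozenset("CG"): "S", frozenset("AT"): "W",
}


def sanger_consensus(allele_1, allele_2):
    """Combine two aligned allele sequences into one unphased IUPAC string."""
    if len(allele_1) != len(allele_2):
        raise ValueError("alleles must be aligned to the same length")
    return "".join(IUPAC[frozenset((a, b))] for a, b in zip(allele_1, allele_2))


# Hypothetical alleles: the same two variant positions, combined in cis or in trans.
pair_1 = ("ACGTA", "CCGTG")
pair_2 = ("ACGTG", "CCGTA")

print(sanger_consensus(*pair_1))
print(sanger_consensus(*pair_2))
```

Both pairs give the same heterozygous string, so the trace alone cannot say which pair is present.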

The discovery of new HLA alleles has resulted in an increase in heterozygous allele combinations that are identical in the commonly sequenced regions

Year 2000, IMGT Release 1.5

Allele 1          Allele 2
A*02011           A*03011

Year 2005, IMGT Release 2.8

Allele 1          Allele 2
A*02010101        A*03010101
A*0226            A*0307
A*0234            A*0308
A*02010101        A*03010102N
A*02010102L       A*03010101
A*02010102L       A*03010102N
A*02010101        A*03010103
A*02010102L       A*03010103

Year 2010, IMGT Release 2.28

Allele 1          Allele 2
A*02010101        A*03010101
A*0226            A*0307
A*0234            A*0308
A*02010101        A*03010102N
A*02010102L       A*03010101
A*02010102L       A*03010102N
A*02010101        A*03010103
A*02010102L       A*03010103
A*0224            A*0317
A*0290            A*0309
A*02010103        A*03010101
A*02010103        A*03010102N
A*02010103        A*03010103
A*020102          A*030112
A*0323            A*9295

Year 2011, IMGT Release 3.3.0

Allele 1          Allele 2
A*02:01:01:01     A*03:01:01:01
A*02:26           A*03:07
A*02:34           A*03:08
A*02:01:01:01     A*03:01:01:02N
A*02:01:01:02L    A*03:01:01:01
A*02:01:01:02L    A*03:01:01:02N
A*02:01:01:01     A*03:01:01:03
A*02:01:01:02L    A*03:01:01:03
A*02:24:01        A*03:17
A*02:90           A*03:09
A*02:01:01:03     A*03:01:01:01
A*02:01:01:03     A*03:01:01:02N
A*02:01:01:03     A*03:01:01:03
A*02:01:02        A*03:01:12
A*03:23:01        A*02:195
A*02:01:52        A*03:01:03
A*02:35:01        A*03:108
A*02:237          A*03:05

Year 2013, IMGT Release 3.9.01

550 combinations (ex2+3)
20 combinations (ex2-4)

Next-Generation Sequencing (NGS) is a method that can provide a complete solution to the limitations of current typing systems

Due to the limitations of existing methods and the increasing rate of discovery of new alleles, there is strong demand for a new method for HLA genotyping.

NGS Features Important for HLA Typing:

• Clonal amplification
  provides sequencing information for a single DNA molecule, ensuring the identification of phase

• Massively parallel
  large sequencing capacity enables an expansion of the HLA regions sequenced and the ability to include other genes, e.g. KIR, C4
  amplification and sequencing of multiple loci (HLA class I & II) from many individuals (barcoding) in a single pool

Next Generation Sequencing Strategy

• Template preparation: short range PCR, LR-PCR, exome capture, WGS
• Sequencing: CE, MiSeq, PGM, GS Junior, 454/FLX, Pacific Biosciences
• Allele-calling and analysis: in-house, commercial

The specific pathway will vary in different laboratories. Cost? Convenience? Flexibility?

Next Generation Sequencing Technologies

Next 'Second' Generation Sequencing
• 454 pyrosequencing (read length <1000bp)
• Illumina Sequencing (read length 2 x 250bp)
• SOLiD sequencing (read length 50-75bp)

Third Generation Sequencing
• Ion semiconductor sequencing (Ion Torrent) (read length <400bp)
• Pacific Biosciences RS – Single molecule real-time sequencing (read length 3-6kb)
• Oxford Nanopore – Single molecule

Next Generation Sequencing Workflow: 454, Illumina, Ion Torrent PGM

454 Amplicon Library Generation: MIDs and adaptors are incorporated during genomic PCR

Figure: fusion primers A and B, each carrying a key and a MID, flank the sequence of interest. Locus-specific PCR amplification is followed by emPCR amplification and sequencing.

Primer design includes:

• GS sequencing Primer A or Primer B 454 sequencing adapter (which includes a four-base library "key" sequence) at the 5-prime portion of the oligonucleotide (25 nt)

• Target-specific sequence at the 3-prime end of the oligonucleotide

• Multiplex Identifier (MID) sequence to allow for automated software identification of samples after pooling/multiplexing and sequencing

In recently introduced workflow improvements for HLA genotyping, amplicons are pooled immediately following this genomic PCR, which means less reagent cost and less hands-on time (see www.454.com).
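
The primer layout and the role of the MID can be sketched as below. Every sequence here (adapter, key, MIDs, target primer) is a placeholder chosen for illustration, not a real 454 adapter, MID or HLA primer, and the read is assumed to begin at the key. The function names are hypothetical.

```python
# Placeholder sequences only; none of these are real 454 or HLA sequences.
ADAPTER = "ACGTACGTACGTACGTACGTA"   # placeholder 21 nt adapter
KEY = "TCAG"                        # placeholder four-base library key
MIDS = {"sample_1": "ACGAGTGCGT", "sample_2": "ACGCTCGACA"}  # placeholder MIDs
TARGET_PRIMER = "GGATCCTTAGCA"      # placeholder locus-specific sequence


def fusion_primer(sample):
    """Assemble 5'-adapter + key, MID, target-specific sequence-3'."""
    return ADAPTER + KEY + MIDS[sample] + TARGET_PRIMER


def demultiplex(read):
    """Return the sample whose MID follows the key, or None if nothing matches."""
    if not read.startswith(KEY):
        return None
    for sample, mid in MIDS.items():
        if read.startswith(mid, len(KEY)):
            return sample
    return None


# A pooled read is assumed to start at the key, then MID, primer and insert.
read = KEY + MIDS["sample_2"] + TARGET_PRIMER + "TTGACC"
print(fusion_primer("sample_1"))
print(demultiplex(read))
```

Because each sample carries its own MID, reads from a single pooled run can be assigned back to samples by software.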

454 Sequencing procedure (www.454.com)

Figure: workflow panels
• Amplicon pool (A and B)
• Mix amplicons & DNA capture beads
• Add PCR reagents & emulsion oil
• PCR in "water-in-oil" emulsion micro-reactors
• Isolate DNA containing beads
• Load beads onto PicoTiter™ Plate (PTP)
• Load enzyme beads (sulfurylase, luciferase)
• Load PTP on sequencer
• Pyro-sequence (bases shown: T A G C T): PPi + APS are converted to ATP by sulfurylase; luciferase uses ATP and luciferin to give light + oxyluciferin
• Light captured by cooled 16 Mpixel CCD camera
• Read flowgram

HLA Genotyping International Study (www.454.com)

• 1,280 genotypes sequenced, 95% allele assignment
• Overall concordance 97.2%
• Median ambiguity string ~1-2
• Analysis time cut >25-fold with little or no reflex testing required

Rapid, scalable and highly automated HLA Genotyping using next-generation sequencing: A transition from research to diagnostics: Danzer et al. BMC Genomics 2013

• 454 GS Junior
• 17 exons of HLA-A, -B, -C, DQB1, DPB1, DRB1,3,4,5
• 173 samples, 18 GS Junior runs
• Protocol validated under routine conditions in accordance with established policies and procedures of the European Federation for Immunogenetics (EFI)
• Average read count = 66,078 ± 16.8/run, median read length of 425bp ± 24
• Conexio ATF for analysis

• From a total of 1,273 loci analysed, 1,241 (97.3%) were initially successful
• DRB3 – amplification of exon 2 in the groups DRB3*01 and *02 was inadequate, leading to uncertain results (n=20)
• 77.2% of genotypes were called reliably without editing
• 22.8% needed manual editing
    • Mostly DRB1 due to co-amplified DRB pseudogenes or PCR artefacts
    • HLA-C*07 as a result of C homopolymer region in exon 4
• The mean ambiguity reduction for the analysed loci was 93.5% with no significant improvement for DRB3, DRB4, DRB5

• In nature, when a nucleotide is incorporated into a strand of DNA by a polymerase, a hydrogen ion is released as a by-product.

• Each well holds a different DNA template. Beneath the wells is an ion-sensitive layer, and beneath that layer is an ion sensor.

PGM Ion Torrent Sequencing Technology

The chip is flooded with one nucleotide at a time, in T, G, A, C order. If a nucleotide is incorporated, a hydrogen ion is released and the pH changes; the pH change is converted to a voltage and recorded by the semiconductor sensor.

If the next nucleotide that floods the chip is not incorporated, no hydrogen ion is released and therefore there is no pH change.

If there are two identical bases in a row on the DNA strand, the signal is doubled and two bases are recorded.

PGM Ion Torrent Sequencing Technology
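
The flow logic described on this slide can be sketched as a small decoder. It is a simplified illustration, assuming idealised signals normalised so that one incorporated base gives a value near 1; the signal values and the function name are made up, and real base callers model noise and phasing far more carefully.

```python
FLOW_ORDER = "TGAC"  # nucleotide flow order as stated on the slide


def decode_flows(signals, flow_order=FLOW_ORDER):
    """Turn per-flow signals into bases; a signal near n means n identical bases."""
    bases = []
    for index, signal in enumerate(signals):
        count = max(0, int(round(signal)))  # no incorporation gives a signal near 0
        bases.append(flow_order[index % len(flow_order)] * count)
    return "".join(bases)


# Made-up normalised signals for six flows: T, G, A, C, T, G.
signals = [1.02, 0.05, 1.96, 0.97, 0.03, 1.10]
print(decode_flows(signals))
```

The third flow (A) has roughly double the signal, so two A bases are recorded. Long homopolymers are harder because the signal differences between n and n+1 bases shrink in relative terms.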

Library Preparation for PGM NGS

1 PCR and pool (A, B, C, DRB, DQB1, DPB1)
2 Enzymatic Fragmentation
3 Adapter and Barcode Ligation

Figure: each target insert is ligated between Adapter A with a barcode and the P1 Adapter. Barcode 001 gives Library 1, Barcode 002 gives Library 2, Barcode 003 gives Library 3.

Emulsion PCR and bead enrichment

1 Anneal ssDNA to ISPs (Ion Sphere Particles)
2 Emulsify beads and ssDNA into water-in-oil microreactors
3 Clonal amplification by PCR
4 Enrichment of templated ISPs

Sequencing of HLA fragment libraries with 400bp chemistry reveals a high proportion of 400bp reads

Uneven read depth distribution across HLA Class I genes remains a problem with longer read chemistry and is most likely due to GC content of these genes and bias in the emPCR

Figure: read depth across exon 2 (Ex2) and exon 3 (Ex3) of HLA-A, HLA-B, HLA-C, DRB1, DQB1 and DPB1 for Patient 1 and Patient 2, with maximum read depth marked. Generated from HLA plugin, Life Technologies.

However, minimum read depths obtained following minor modifications to the Ion PGM library preparation protocol are sufficient for accurate allele calling

A*02:01:01:01, B*13:02:01, B*15:01:01:01, C*03:04:01:01, C*06:02:01:01, DRB1*04:04:01, DRB1*07:01:01

Data generated from HLA plugin, Life Technologies

Minimum read depths of less than 100 reads/base still enable accurate allele calling

A*11:01:01, A*25:01:01, B*07:02:01, B*35:01:01:01, C*04:01:01:01, C*07:02:01, DRB1*04:07:01, DRB1*15:01:01
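
Read depth per base, and its minimum over a region, can be computed as sketched below. The read intervals, the region length and the threshold are illustrative assumptions; the slides do not state a specific cut-off.

```python
def per_base_depth(reads, region_length):
    """Count reads covering each position; reads are (start, end) half-open intervals."""
    depth = [0] * region_length
    for start, end in reads:
        for position in range(max(start, 0), min(end, region_length)):
            depth[position] += 1
    return depth


def passes_min_depth(depth, threshold):
    """True if every position in the region reaches the (hypothetical) threshold."""
    return min(depth) >= threshold


# Made-up alignments over a 10-base region.
reads = [(0, 6), (2, 10), (4, 10), (0, 10)]
depth = per_base_depth(reads, 10)
print(depth, min(depth), max(depth))
```

Reporting the minimum as well as the maximum matters when coverage is uneven, since allele calling is limited by the least covered positions.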

Sequencing on the Ion PGM identifies DNA fragments representing each HLA allele

Figure: aligned reads labelled 68:01:02 and 02:01:01:01. Adapted from ASSIGN-MPS, Conexio Genomics.

Good read depth ensures accurate base calling with an average base-call error rate ~1-5%

Figure: read depth for 68:01:02 and 02:01:01:01. Adapted from ASSIGN-MPS, Conexio Genomics.
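
Why depth compensates for a per-read error rate of a few percent can be shown with a simple binomial calculation. This is a sketch under strong assumptions that are not from the slides: a homozygous position, independent errors, and every error counted against the true base, which makes it an upper bound.

```python
from math import comb


def miscall_probability(depth, error_rate):
    """Probability that more than half of the reads at one position are wrong."""
    first_majority = depth // 2 + 1
    return sum(
        comb(depth, k) * error_rate**k * (1 - error_rate) ** (depth - k)
        for k in range(first_majority, depth + 1)
    )


# Compare a few depths at an assumed 5% per-read error rate.
for depth in (1, 3, 11, 101):
    print(depth, miscall_probability(depth, 0.05))
```

Under these assumptions the chance of a wrong majority call falls steeply as depth grows, which is consistent with accurate calls at modest read depths.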

NGS-PGM sequencing resolves common allele ambiguities and in some cases identifies the less common allele combination

EXON 1

SBT: B*08:01/08:19N, B*27:05/27:13
NGS: B*08:01:01, B*27:13

Figure: reads labelled 08:01:01 and 27:13; B*27:13 (A) vs B*27:05 (C). Adapted from ASSIGN-MPS, Conexio Genomics.

Illumina Sequencing Technology: Sequencing by Synthesis

• Sequencing by synthesis uses 4 fluorescently labelled nucleotides to sequence millions of clusters attached to a flow cell

• Clonal amplification occurs on a flow cell by bridge amplification

• Flow cell contains a dense lawn of primers complementary to adaptors on target fragments

• Unlabelled nucleotides and enzyme are added to build double-stranded bridges on flow cell

• Double-stranded DNA then denatured to allow sequencing of single stranded template on flow cell

• Sequencing occurs by adding four labelled reversible terminators, primers and polymerase

• After laser excitation, fluorescence is emitted from each cluster and captured 

• Then cycle is repeated to capture subsequent bases

Paired-End Sequencing on Illumina platform increases read length

Figure: Read 1 (250 bases) and Read 2 (250 bases) are sequenced from opposite ends of an insert of variable length.

Phasing Paired-End Sequences

Figure: paired-end reads linking heterozygous positions in B*07:02:01 and B*41:01

• Insert sizes: 787 and 685 bases
• Phased polymorphisms 609 bases apart
• Using paired-end 250 base reads
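
The geometry behind paired-end phasing can be checked with a short sketch. It counts how many fragment start positions let read 1 cover the first polymorphism and read 2 cover the second. The coordinate system and function names are assumptions made for illustration; only the insert sizes, the 609-base distance and the 250-base read length come from the slide.

```python
def pair_links(snp_1, snp_2, insert_start, insert_size, read_length=250):
    """True if read 1 covers snp_1 and read 2 covers snp_2 for one fragment."""
    insert_end = insert_start + insert_size
    read_1 = range(insert_start, insert_start + read_length)
    read_2 = range(insert_end - read_length, insert_end)
    return snp_1 in read_1 and snp_2 in read_2


def linking_offsets(distance, insert_size, read_length=250):
    """Count fragment starts (relative to the first SNP at 0) that link both SNPs."""
    return sum(
        pair_links(0, distance, start, insert_size, read_length)
        for start in range(-insert_size, 1)
    )


for insert_size in (787, 685, 500):
    print(insert_size, linking_offsets(609, insert_size))
```

In this toy model both insert sizes from the slide can link polymorphisms 609 bases apart, while a 500-base insert cannot, because the two reads never reach both positions at once.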

Phase-defined complete sequencing of the HLA genes by next-generation sequencing: Hosomichi et al. BMC Genomics 2013

• Long range PCR amplicons (3.4kb-13.6kb) including entire regions of HLA-A, -B, -C, -DRB1, -DQB1, -DPB1
• Paired-end reads of 2 x 250bp
• 33 homozygous cell lines, 11 HLA heterozygous samples, and 3 parent-child families
• 2 methods of sequencing:
    • Individual-tagging method – all of the PCR amplicons of 6 HLA genes from an individual were pooled before library preparation
    • Gene-tagging method – each PCR amplicon barcoded before library preparation

• Individual-tagging method
    • 66.35% of reads mapped to reference hg19 with average read depth of 157x
    • 33 homozygous cell lines x 6 amplicons = 198 amplicons
    • 32/198 amplicons – HLA sequences could not be generated due to low read depth
    • 152/166 completely homozygous sequences were identical to the reference sequence (IMGT/HLA)
    • 14 were found to be novel – variants in intronic sequence
    • Unable to obtain phase-defined sequences for HLA heterozygous samples

Phase-defined complete sequencing of the HLA genes by next-generation sequencing: Hosomichi et al. BMC Genomics 2013

• Gene-tagging method
    • 73.1% of all reads mapped to hg19 reference for 66 amplicons
    • Average depth ranged from 146x – 6,678x, mean 2,281x
    • 100 HLA gene haplotypes were defined, 32 HLA gene haplotypes recorded 1-5bp mismatches
    • 17 HLA gene haplotypes had mismatches in exonic regions = ?sequence error
    • 15 HLA gene haplotypes had mismatches in intronic regions = new alleles

• Study concludes that although able to define phase in HLA heterozygous samples using the described mapping algorithm (map to hg19), the gene-tagging method would be low throughput and costly compared with the individual-tagging method

Important considerations for HLA Typing by NGS

• Read length – important for the linking of resolving polymorphisms and therefore establishing phase

Figure: positions of nucleotide differences (A, G, C, T) along the gene (Ex1 to Ex8, scale 0 to 3500 bp) for four allele pairs:
• 145 differences between A*01:01:01:01 and A*02:01:01:01
• 32 differences between A*66:02 and A*68:02:01:01
• 2 differences between A*01:01:01:01 and A*01:03
• 21 differences between A*26:01:01 and A*34:02
Distances marked on the figure: 211 bp, 549 bp, 374 bp, 893 bp, 829 bp, 1305 bp

Lind C et al., Hum Immunol. 10:1033-42 (2010)
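
The read length consideration can be made concrete with a small check of which neighbouring polymorphisms a single read can span. The positions below are hypothetical; the 1305-base gap merely echoes one of the distances marked on the figure, and the function name is an assumption.

```python
def unlinked_gaps(positions, read_length):
    """Return adjacent polymorphism pairs that one read of read_length cannot span."""
    ordered = sorted(positions)
    return [
        (left, right)
        for left, right in zip(ordered, ordered[1:])
        if right - left + 1 > read_length  # a read must cover both end positions
    ]


# Hypothetical positions of differences between two alleles.
positions = [120, 300, 1605, 1700]

print(unlinked_gaps(positions, 400))   # short reads leave one gap unlinked
print(unlinked_gaps(positions, 3000))  # long reads link every neighbouring pair
```

When any gap stays unlinked, phase across that gap has to come from paired-end information, longer reads, or other evidence.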

‘Third’ Generation Sequencing: Pacific Biosciences RS-Single molecule real-time sequencing

Two sequencing modes of Pacific Biosciences RS

Sample preparation

Standard: LS – long sequencing reads
• Large insert sizes (2kb-10kb)
• Generates one pass on each molecule sequenced

Circular Consensus: CCS – high quality sequencing reads
• Small insert sizes 500bp
• Generates multiple passes on each molecule sequenced
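
The idea behind circular consensus, that several noisy passes over one molecule can be combined into a higher quality read, can be sketched as a per-position majority vote. This is a deliberately simplified model with made-up passes of equal length and substitution errors only; real single-molecule reads also contain insertions and deletions, so real consensus calling needs alignment first.

```python
from collections import Counter


def circular_consensus(passes):
    """Majority vote per position across equal-length passes of one molecule."""
    if len({len(single_pass) for single_pass in passes}) != 1:
        raise ValueError("this toy model assumes passes of equal length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*passes)
    )


# Five made-up passes over the same insert, three of them with one error each.
passes = ["ACGTTGCA", "ACGATGCA", "ACGTTGCA", "TCGTTGCA", "ACGTTGCC"]
print(circular_consensus(passes))
```

Each error appears in only one pass, so the vote recovers the underlying sequence; a single pass in standard mode has no such correction.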

PacBio Single Molecule Sequencing and HLA genotyping

• No PCR amplification of genomic DNA

• With the ability to sequence between 3-6kb, the technology will allow sequencing of complete HLA Class I genes and most of the HLA Class II genes, eliminating the problem of phase ambiguity

• However, the technology is still in development for the application of HLA genotyping and the cost of sequencing a sample is currently too high for implementation into the HLA diagnostic laboratory

BUT WATCH THIS SPACE!!!!