1 Rapid Detection of Genetic Engineering, Structural Variation, … · identify genetic...

1

Rapid Detection of Genetic Engineering, Structural Variation, and Antimicrobial Resistance 1

Markers in Bacterial Biothreat Pathogens by Nanopore Sequencing 2

A. S. Gargis1*, B. Cherney1, A. B. Conley2, H. P. McLaughlin1, D. Sue1 3

4

1 Division of Preparedness and Emerging Infections, National Center for Emerging and Zoonotic 5

Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA. 6

2 IHRC-Georgia Tech Applied Bioinformatics Laboratory, Atlanta, Georgia, USA. 7

* Corresponding author. Current Address: Division of Healthcare Quality Promotion, National Center for 8

Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, 9

Georgia, USA. 10

11

12

13

14

15

16

17

18

19

20

21

22

made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also

The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint

https://doi.org/10.1101/730093

2

Abstract 23

Widespread release of Bacillus anthracis (anthrax) or Yersinia pestis (plague) would prompt a 24

public health emergency. During an exposure event, high-quality whole genome sequencing (WGS) can 25

identify genetic engineering, including the introduction of antimicrobial resistance (AMR) genes. Here, 26

we developed rapid WGS laboratory and bioinformatics workflows using a long-read nanopore 27

sequencer (MinION) for Y. pestis (6.5h) and B. anthracis (8.5h) and sequenced strains with different 28

AMR profiles. Both salt-precipitation and silica-membrane extracted DNA were suitable for MinION 29

WGS using both rapid and field library preparation methods. In replicate experiments, nanopore quality 30

metrics were defined for genome assembly and mutation analysis. AMR markers were correctly 31

detected and >99% coverage of chromosomes and plasmids was achieved using 100,000 raw sequencing 32

reads. While chromosomes and large and small plasmids were accurately assembled, including novel 33

multimeric forms of the Y. pestis virulence plasmid, pPCP1, MinION reads were error-prone, 34

particularly in homopolymer regions. MinION sequencing holds promise as a practical, front-line 35

strategy for on-site pathogen characterization to speed the public health response during a biothreat 36

emergency. 37

38

Introduction 39

Yersinia pestis and Bacillus anthracis are the etiological agents of plague and anthrax, 40

respectively. Deliberate misuse of these pathogens as bioterrorism agents poses a serious public health 41

and safety threat, due to low infectious doses, high lethality, and the ease of production, dissemination 42

and communicability 1. Strains of both species were weaponized previously, and the 2001 Amerithrax 43

incident underscores the importance of biological threat preparedness and rapid laboratory response 2, 3. 44

Natural anthrax and plague outbreaks are reported annually in animals, and human cases are often 45

zoonotic. While these diseases continue to cycle in isolated, enzootic foci, re-emergence was reported 46



https://doi.org/10.1101/730093

3

during 2016 anthrax outbreaks in Russia and Sweden4, and the 2017 pneumonic plague outbreak in 47

Madagascar 5. Both naturally-occurring and engineered antimicrobial resistance (AMR) have been 48

reported in Y. pestis and B. anthracis 6. Most B. anthracis strains are susceptible to antibiotics, but 49

surveys of clinical and environmental isolates indicate penicillin resistance occurs in 2 to 16% of strains 50

7. Attenutated B. anthracis strains with AMR were isolated following laboratory in vitro passaging 51

(selective pressure) and/or targeted genetic manipulation 8, 9. Laboratory-selected virulent and avirulent 52

Y. pestis strains with quinolone-resistance were previously reported 10, 11 and engineered multi-drug 53

resistant (MDR) strains are a public health concern 12, 13. Y. pestis is intrinsically susceptible to every 54

antibiotic recommended for human plague therapy, but a few Y. pestis isolates with transferable 55

plasmid-mediated resistance were reported from Madagascar 13, 14, 15, 16. 56

A public health response to incidents involving human plague or anthrax requires timely and 57

accurate AMR profiling. Conventional broth microdilution (BMD) is the gold standard antimicrobial 58

susceptibility testing (AST) method, but requires lengthy incubation times for B. anthracis (16 to 20 59

hours) and Y. pestis (at least 24 hours)17, 18. Several new phenotypic methods can rapidly assess 60

functional susceptibility using: real-time PCR to detect growth in the presence of antimicrobial agents 19, 61

bioluminescent reporter phage 20, flow cytometry 21, laser light scattering technology 22, and optical 62

screening 23. While these phenotypic methods are more rapid than BMD AST, they cannot definitively 63

detect genetic manipulation. Whole genome sequencing (WGS) can reveal evidence of genetic 64

engineering in bacteria, such as the introduction of genes, mutations and/or plasmids. While there is not 65

a comprehensive understanding of all genes or mutations associated with resistance, WGS is invaluable 66

for understanding novel genotypic/phenotypic relationships. Moreover, unlike PCR–based methods, a 67

comprehensive knowledge of AMR is not required, since de novo WGS is unbiased and can reveal 68

previously undescribed markers without targeted primer sets. Genomic data complements phenotypic 69



https://doi.org/10.1101/730093

4

AST results and offers insight into mechanisms of drug resistance, which is especially critical in the age 70

of synthetic biology. 71

Two next-generation sequencing (NGS) approaches are used for microbial WGS: short-read 72

sequencing (e.g. Illumina) and long-read-sequencing (e.g. Pacific Biosciences, PacBio and Oxford 73

Nanopore Technologies, ONT). Short-read sequence data have low error rates and are useful for strain 74

typing and the identification of single nucleotide polymorphisms (SNPs) 24, 25. The short (maximally 75

300nt) read lengths limit the ability to resolve repetitive sequences or structural genomic 76

rearrangements, and can result in unfinished, fragmented de novo assemblies comprised of hundreds of 77

contigs 25, 26. Many AMR gene regions are flanked by repetitive insertion sequences (IS), and short-read 78

sequencing cannot span these regions 25. This can result in inaccurate chromosome or plasmid 79

assemblies, including multiple copies of a repeat region collapsed into one location 25, 27. Long-read 80

sequencing platforms produce reads ranging from several kilobases (kb) to greater than 10 kb in length 81

that can bridge gaps between repetitive regions and allow complete bacterial genome finishing 25. 82

However, long-read sequence data have higher error rates than short-read data 28, and are dominated by 83

insertions and deletions (indels), especially in homopolymeric regions 24, 29. Small plasmids (<7 kb) are 84

difficult to accurately assemble from PacBio reads due to size selection limitations of library preparation 85

and data analysis 30 31. However, short- and long- read NGS data are complementary and together 86

generate highly accurate hybrid genome assemblies 25, 26, 32, 33. 87

Work with Y. pestis and B. anthracis requires specialized facilities to ensure both biosecurity and 88

biosafety. Physical, high-containment laboratory space is often limited, and instruments used for 89

molecular characterization, such as WGS, are typically located in separate areas at lower containment 90

levels. NGS instruments can incur high capital costs, occupy a large footprint or require specialized 91

technical training, making routine on-site sequencing cost- and space- prohibitive 28, 34. By contrast, 92

there is minimal capital cost for the MinION (ONT) device, which is the size of a USB thumb-drive, and 93



https://doi.org/10.1101/730093

5

its operation requires a computer or ONT device controller that together occupy little benchtop space. 94

Therefore, nanopore WGS can occur within a high-containment laboratory with limited space. Unlike 95

PacBio sequencing that requires ≥1 µg of genomic DNA (gDNA), only 400 ng of gDNA is required for 96

library preparation using the MinION Rapid Sequencing Kit. Sequencing libraries can also be prepared 97

using a MinION Field Sequencing Kit that includes shelf-stable lyophilized reagents. After a run 98

begins, MinION sequence data are available in minutes enabling real-time sequence analysis. 99

The field of nanopore sequencing is rapidly evolving, and benchmarking studies that use 100

standardized conditions to assess run-to-run variability are lacking. Few studies have assessed the 101

accuracy and consistency of nanopore sequencing using the same strain with consistent chemistry, 102

consumables, and software versions 26, 35, 36. We developed a custom bioinformatics pipeline to compare 103

the genome assemblies from standardized replicate nanopore sequencing experiments to define 104

sequencing quality metrics, establish performance characteristics, and demonstrate the feasibility of 105

rapid long-read WGS for Y. pestis and B. anthracis. The resulting same-day laboratory workflow for 106

DNA isolation, library preparation, nanopore sequencing, and bioinformatics analysis is timely and 107

effective for the rapid detection of known markers related to AMR and evidence of genetic engineering 108

in bacterial biothreat pathogens. 109

110

Results 111

Y. pestis gDNA quality and quantity. All Y. pestis gDNA met ONT criteria for MinION sequencing as 112

measured by optical density absorbancy ratios at 260 nm and 280 nm (A260/280) and at 260 nm and 230 113

nm (A260/230) (Supp. Table 1). Both silica-membrane (SM)- and salt-precipitation (SP)-based gDNA 114

extraction methods produced Y. pestis gDNA of comparable quality and quantity, with intact high 115



https://doi.org/10.1101/730093

6

molecular weight (HMW) and little evidence of shearing (Supp. Fig. 1). The SP method produced 116

higher molecular weight DNA compared to SM extraction, but required an hour longer to complete. 117

Y. pestis MinION sequencing technical replicates using Rapid Sequencing and Field Sequencing 118

Kit libraries. Y. pestis S1037, a derivative of the attenuated strain All2237, was sequenced to establish 119

baseline performance and quality metrics for rapid nanopore WGS. Strain S1037 contains two virulence 120

plasmids, pMT1 (~110 kb) and pPCP1 (9.5 kb monomer and ~19 kb dimer), and is ciprofloxacin non-121

susceptible (CIP-NS, MIC 4 µg/ml by BMD AST). Unlike A1122, strain S1037 contains a gyrA single 122

nucleotide polymorphism (SNP) (Table 1) in the previously described Y. pestis quinolone resistance-123

determining region (QRDR) 11. Three replicate rapid sequencing libraries were prepared from one SP 124

gDNA extraction (S1037, SP, Rapid 1, 2, and 3; Fig. 1) and from one SM gDNA extraction (S1027, SM, 125

Rapid 1, 2, and 3; Fig. 1) to evaluate variation among sequencing runs from the same sample. Libraries 126

were also prepared with the Field Sequencing Kit using the same SP and SM gDNA used in the 127

technical replicate study (SP Field and SM Field; Fig. 1). Each library was sequenced for up to 48 128

hours, but only reads generated during the first 16 hours of sequencing were analyzed to allow for a 129

standardized comparison of data sets. 130

After base calling, FASTQ reads with an average quality (Q) score of <7 were discarded, 131

resulting in removal of 4 to 74% of total reads from analysis (Fig. 1, Supp. Table 3). The MinKNOW 132

software became unresponsive after ~11-12 hours of sequencing for SP Rapid 1 and SM Rapid 1, and 133

both were restarted and continued for ~6-8 additional hours (Supp. Table 2). While SP Rapid 1 134

appeared unaffected by the software problem, SM Rapid 1 lost ~74% of reads following basecalling and 135

contained the fewest reads ≥Q7 of any data set. All flowcells met ONT’s minimum active pore 136

requirement (≥800) during platform QC, but the flowcell used for SP Rapid 3 had an initial QC of 871 137

active pores that decreased to 759 after loading the library (Fig. 1, Supp. Table 3). SP Rapid 3 lost ~6% 138



https://doi.org/10.1101/730093

7

of reads following base-calling, but the amount of data generated and average Q-score resembled other 139

technical replicate runs (Fig. 1, Supp. Table 3). 140

The number of ≥Q7 FASTQ reads generated was inconsistent within and among the SP and SM 141

replicate runs. During the first 16 hours of sequencing, the SP technical replicates generated 310,906-142

639,598 ≥Q7 reads, while SM experiments generated 305,824-1,172,803 ≥Q7 reads. The average read 143

lengths within and among the SM and SP replicates varied. Both average read length and number of 144

reads generated contributed to the amount of useable data (Fig. 1, Supp. Table 3). For example, SP 145

Rapid 2 (388,435 reads) had more ≥Q7 reads than SP Rapid 3 (310,906 reads), but SP Rapid 3 read 146

lengths were ~1.75X longer. The amount of SP Rapid 3 run data generated was greater (2.6 Gb vs. 1.8 147

Gb). Overall, the range of useable data generated by the SP replicate experiments (1.8 to 3.1 Gb) was 148

comparable to SM (1.3 to 4.6 Gb). The average Q-scores of the SM and SP Rapid Sequencing technical 149

replicates ranged from 12.5 to 13.2. The amount of sequencing data (≥Q7 reads and total Gb) generated 150

using the Field Sequencing Kit was comparable to the Rapid Sequencing Kit (Fig. 1, Supp. Table 3). 151

The Field Sequencing Kits produced data sets with the overall highest SP (13.5) and SM (13.9) Q-152

scores. 153

The strand-to-pore ratio is a quality metric that represents the quantity of MinION flow cell 154

pores actively sequencing DNA (pores in strand) relative to total available pores. The ratio was 155

recorded after ~10 minutes of sequencing per run to assess the library preparation quality (Supp. Table 156

2). A ≥50% strand-to-pore ratio was expected for the RAD-004 Rapid Sequencing Kit (ONT 157

Community and personal communication). For SP Rapid 1, 2, and 3, the average strand-to-pore ratios 158

were 60%, 63%, and 64%, respectively; and for SM Rapid 1, 2, and 3, the ratios were 80%, 75%, and 159

82%, respectively. Ratios increased to >80% within the first 1-2 hours of sequencing for all SM and SP 160

replicates. 161



https://doi.org/10.1101/730093

8

MinION sequencing performance and Y. pestis S1037 de novo assembly using retrospective 162

downsampling of passed fast5 reads. A retrospective downsampling of passed fast5 reads was 163

performed to determine the amount of data necessary to produce high-quality genome assemblies. 164

Assembly quality was measured by indel frequency, genome coverage, ability to assemble large and 165

small plasmids, and accurate detection of the gyrA SNP associated with the CIP-NS phenotype (Fig. 2). 166

Minimal improvements in overall assembly quality were observed when >75,000 reads were 167

analyzed (Fig. 2). De novo assemblies using ≤75,000 reads did not consistently assemble the Y. pestis 168

chromosome, or large (pMT1) and small (pPCP1) plasmids. Subsequent nanopore analyses were 169

performed with 100,000 reads only. The average number of mismatches for the SP and SM 100,000-170

read data sets ranged from 220 to 368, while Illumina MiSeq and PacBio assemblies of the same strain 171

contained 44 and 29 mismatches, respectively (Fig. 2a). Thousands of indels (3,151 to 8,251) were 172

detected in nanopore assemblies (Fig. 2b), compared to Illumina (18 indels) and PacBio (93 indels), and 173

were predominantly located in homopolymeric regions. Indels were not significantly reduced in 174

nanopore assemblies supplemented with >100,000 additional reads (data not shown). Using 100,000 175

reads, averaged across the three Rapid SP and three Rapid SM Y. pestis S1037 data sets, the poly-N 176

length and the fraction of poly-N's correct in the assemblies was plotted for each nucleotide type using 177

both the RACON and Nanopolish error corrected assemblies (Supp. Fig. 2). Nanopolish increased the 178

fraction of Y. pestis poly-N lengths correctly called, but this fraction exponentially decreased for lengths 179

>4. Increased accuracy was observed when calling poly-As and Ts compared to poly-Gs and Cs. The 180

gyrA point mutation, which is not located in a homopolymeric region, was correctly identified in all Y. 181

pestis assemblies (Fig. 2g). 182

The first 100,000 fast5 reads yielded >99% identity with A1122 across the ~4.6 megabase Y. 183

pestis genome for all experiments (Fig. 2c). For the chromosome, all nanopore assemblies generated a 184

single contig, except the SP Rapid 1 data set (2 contigs observed) (Fig. 2g). The average depth of 185



https://doi.org/10.1101/730093

9

chromosome coverage was 101X for all replicate Y. pestis S1037 data sets (Fig. 2d). Plasmid pMT1 was 186

consistently assembled to completion, with multiple overlaping reads spanning the entire plasmid length 187

in all 100,000-read data sets, however pPCP1 was not. Un- or misassembled pPCP1 fragments were 188

detected (i.e. assembly of multiple small fragments with homology to pPCP1 or bases in different order 189

than the reference sequence) (Fig. 2g). Reads corresponding to pPCP1 with lengths greater than 9.5 kb 190

and 19 kb were also identified, and are discussed below. Overall, the SP sequencing runs produced 191

higher average coverage for both pMT1 (104X) and pPCP1 (3,567X) compared to the SM runs (22X 192

and 431X, respectively) (Fig. 2e and 2f). Among the SM and SP Rapid and Field Sequencing Kit runs, 193

the average time to generate 100,000 fast5 reads was ~2 hours (Supp. Table 2) and data analysis 194

(including base-calling) required ~2.5 additional hours. For Y. pestis, SM gDNA extraction from an 195

agar culture isolate, MinION WGS, de novo assembly and analysis can be completed in ≤6.5 hours (Fig. 196

3). 197

Detection of multimeric pPCP1 forms in Y. pestis S1037 and A1122 assemblies. Analysis of read 198

length distributions among the SP and SM technical replicates revealed a greater number of long reads 199

generated in SP experiments (Supp. Fig. 3). SP data sets also contained more ~18.2 kb reads, which 200

correspond to the pPCP1 dimer (2-mer) form. Most reads from Y. pestis S1037 SP data sets with 201

regions homologous to pPCP1 were ~9.1 kb (~17%) or ~18.2 kb (~80%) (Supp. Fig. 4), which 202

correspond to the previously described monomer (~9.5 kb) and dimer (~19 kb) forms 38. A few reads 203

(~3%) with lengths >18.2 kb were detected in the 100,000-read data sets from SP extracted gDNA, but 204

were absent in SM data sets (Supp. Fig. 3 and Supp. Fig. 4). Dimer, trimer, and tetramer pPCP1 forms 205

are linked together in a 5′-3′ head-to-toe fashion (Fig. 4). Multimers were present in the nanopore data 206

sets derived from the Field Sequencing and three Rapid Sequencing Kit preparations of the same SP 207

extracted gDNA (SP-1). An S1037 assembly using Illumina short-read data resulted in three contigs 208

(688 nt, 1,894 nt, and 7,375 nt) with pPCP1 homology, and did not assemble the ≥19 kb plasmid forms 209



https://doi.org/10.1101/730093

10

(Fig. 2g). An S1037 PacBio assembly resulted in 3 contigs corresponding to the chromosome 210

(4,576,792 nt), pMT1 (95,849 nt), and a partially assembled pPCP1 (21,732 nt) (Fig. 2g). 211

An additional Y. pestis S1037 SP gDNA extraction (SP-2, Supp. Table 1) was prepared using 212

care to minimize DNA shearing (e.g. without vortexing and minimal pipetting) to improve WGS 213

detection of multimer pPCP1 forms. Following sequencing using the Rapid Kit, the resulting 100,000-214

read de novo assembly yielded comparable sequence quality to the technical replicate assemblies (data 215

not shown) and overall contained more reads that were >18.2 kb than the S1037 SP-1 data set (Supp. 216

Fig. 5a). The average read length (10,960 nt) was greater than the longest average read length (7,800 nt) 217

observed in the technical replicate experiments. The SP-2 assembly contained an increased fraction of 218

~18.2 kb reads that corresponded to the pPCP1 plasmid dimer (Supp. Fig. 5b). Few reads with pPCP1 219

homology measured ~45,500 nt (4 reads) and 54,600 nt (1 read), which correspond to putative 5-mer 220

and 6-mer forms (Supp. Fig. 5b). 221

To determine if Y. pestis A1122, the parent strain of S1037, also contained reads >18.2 kb with 222

pPCP1 homology, SP and SM extracted gDNA was used for Rapid Sequencing Kit library preparation 223

and nanopore sequencing. The 100,000-read de novo assemblies from A1122 yielded comparable 224

sequence quality to the technical replicate assemblies (data not shown). A1122 reads >18.2 kb with 225

pPCP1 homology (Supp. Fig 6) were assigned 3- or 4-mer forms. When >100,000 reads were analyzed, 226

additional pPCP1 homologous reads in this size range or greater were not detected. A PacBio assembly 227

of A1122 resulted in 3 contigs corresponding to the chromosome (4,586,979 nt), pMT1 (96,144 nt), and 228

pPCP1 (8,935 nt) (data not shown). 229

B. anthracis gDNA quality and quantity. The rapid Y. pestis nanopore sequencing approach was 230

applied to B. anthracis strains with different AMR profiles to evaluate de novo assembly quality and to 231

identify plasmids and known AMR markers/mutations (Table 1). The SM DNA extraction method was 232

selected over the SP method for this study based on speed (1.5 hours time saving). Results from SP 233



https://doi.org/10.1101/730093

11

extracted gDNA were comparable in quality and quantity (data not shown). B. anthracis SM extractions 234

produced HMW DNA (>30 kb) with minimal shearing (Supp. Fig. 7) at concentrations acceptable for 235

MinION sequencing (Supp. Table 1). The SM extraction for B. anthracis Sterne/pUTE29 met ONT’s 236

absorbancy ratio requirements, but the A260/230 ratio for strains 411A2 and UT308 were lower than the 237

recommended range of 2.0-2.2. Repeated extraction attempts did not improve the ratios for these strains 238

(data not shown). Additional clean-up by DNA precipitation was not performed to avoid extending the 239

procedure time and to minimize gDNA shearing. Rapid Sequencing Kit libraries were prepared using 240

gDNA from UT308, 411A2, and Sterne/pUTE29 and Field Sequencing Kit libraries were prepared using 241

gDNA from the latter two strains. 242

B. anthracis MinION WGS and retrospective down sampling analysis. Following nanopore 243

sequencing of B. anthracis strains, 100,000 passed fast5 reads were analyzed (Fig. 5). The average 244

number of mismatches among the Rapid and Field Sequencing Kit B. anthracis data sets ranged from 245

169 to 594 (Fig. 5a). B. anthracis Sterne/pUTE29 Illumina MiSeq and PacBio assemblies contained 246

fewer mismatches (107 and 87, respectively). Nanopore B. anthracis data sets contained thousands of 247

indels (6,262-11,574), compared to Illumina (24 indels) and PacBio (21 indels) assemblies (Fig. 5b). 248

Nanopolish increased the fraction of B. anthracis poly-N lengths correctly called, but this fraction 249

decreased with lengths >4. Increased accuracy was observed when calling poly-As and -Ts compared to 250

poly-Gs and -Cs (Supp. Fig. 8). 251

The 100,000-read analysis yielded >99% identity across the ~5.2 Mb B. anthracis genome in 252

every data set analyzed (Fig. 5c). B. anthracis nanopore assemblies contained single contigs for the 253

chromosome (66X average depth of coverage) and pXO1 (~180 kb, 223X average depth of coverage), 254

which were consistantly assembled to completion (Fig. 5d, e, g). Analysis of 100,000 fast5 reads 255

provided sufficient coverage to detect known AMR markers (chromosomal or plasmid-associated) 256

(Table 1). For Sterne/pUTE29, the ~7.3 kb pUTE29 plasmid was consistantly assembled using both 257



https://doi.org/10.1101/730093

12

Rapid and Field Kit nanopore data (average 5,621X depth of coverage) (Fig. 5f, g). However, pUTE29 258

was partially assembled or unassembled using Illumina and PacBio data, respectively (Fig. 5g). The 259

gyrA and parC mutations associated with quinolone resistance were detected in the CIP-NS strain 260

411A2 (Fig. 5g). Among the SM Rapid and Field Kit runs, the average time to generate 100,000 fast5 261

reads was ~3 hours (Supp. Table 2) and data analysis required ~ 2.5 additional hours (including base-262

calling). For B. anthracis, SM DNA extractions from an agar culture isolate, MinION WGS, and de 263

novo assembly and analysis can be completed in ≤ 8.5 hours (Fig. 3). 264

Discussion 265

During an outbreak or public health emergency involving bacterial biothreat pathogens, critical 266

decisions are made immediately after a release, when details about the causative agent(s) are limited, 267

and at every stage of the response. Rapid reporting of accurate laboratory testing results (pathogen 268

identification, strain typing, AST) are critical for informing a public health response. For example, 269

detection of genetic manipulation and/or unexpected AMR profiles could signal the introduction of 270

plasmids, AMR genes, or other evidence of engineering. Speeding this response to limit the impact of 271

biothreat incidents is a goal of the 2018 U.S. National Biodefense Strategy 39. 272

Culture- and DNA-based methods have improved the ability of laboratories to rapidly 273

characterize microbial pathogens. The introduction of NGS platforms has accelerated these efforts, and 274

WGS offers a comprehensive genome-wide snapshot of microbial pathogens. However, DNA 275

sequencing using Illumina and PacBio platforms is time-consuming compared to nanopore sequencing 276

and relies on instuments with a large footprint that are often space-prohibitive. Advances in long-read 277

microbial DNA sequencing with the MinION instrument have transformed the fields of NGS and 278

molecular characterzation, enabling real-time pathogen sequencing for epidemiologic investigations40. 279

MinION WGS is promising as a front-line strategy for on-site microbial characterization due to the 280

speed of sample preparation, relatively low cost and device portability 41. Here, we describe a nanopore 281



https://doi.org/10.1101/730093

13

sequencing approach for Y. pestis and B. anthracis with a turnaround time that is ~90% faster than 282

current Illumina MiSeq workflows and yields complete genome assemblies, including large and small 283

plasmids, and accurately identifies AMR genes and mutations within a work day. 284

For the widespread adoption of WGS-based approaches in public health and clinical laboratories, 285

validated methods that are simple to set up are essential. During an outbreak investigation, laboratory 286

methods must be reliable and reproducible. Both commercially-available DNA extraction methods 287

evaluated here were capable of generating gDNA of sufficient quality, quantity, and molecular weight 288

for Y. pestis and B. anthracis MinION sequencing. The SM method was the fastest (30 min for Y. pestis 289

and 1.5 hr for B. anthracis) and simple to perform. Despite the sub-optimal quality of some B. anthracis 290

extractions, de novo assemblies using data from these samples were unaffected, indicating downstream 291

library preparation and sequencing reactions were not compromised. Further optimization could 292

improve the extraction process and DNA quality, and work is ongoing to evaluate the sequencing 293

performance of DNA extracted from wild type B. anthracis encapsulated vegetative cells. 294

Shelf-stable reagents that do not require cold chain storage are preferred for rapid deployment 295

and use in the field. The lyophilized Field Sequencing Kit, which does not require a cold chain, 296

performed comparably to the Rapid Kit. To identify the minimum amount of sequencing time needed to 297

consistently generate enough data for high-quality Y. pestis and B. anthracis de novo genome 298

assemblies, a retrospective down sampling analysis was performed. On average, only 100,000 fast5 299

reads were necessary to assemble the chromosome and large and small plasmids with sufficient 300

coverage to correctly identify known AMR genes and mutations. Additional reads did not strengthen 301

the assembly quality, and any improvement of coverage and number of mismatches, indels, and % 302

identity was negligible. Our laboratory and bioinformatic workflows were achieved in <8.5 hours 303

starting from a culture isolate, and accurately detected genomic AMR markers in approximately half the 304

time required to complete functional AST for Y. pestis and B. anthracis. 305



https://doi.org/10.1101/730093

14

We demonstrate that thresholds and quality metrics can be established that are indicative of 306

nanopore sequencing quality. MinION sequencing performance was assessed by comparing data from 307

two technical replicate experiments where the DNA extraction, library preparation kit, sequencing 308

conditions, MinKNOW software, and MinION instrument and flowcell versions remained the same. 309

Since the majority of nanopore data is generated during the first 16 hours of sequencing when the most 310

active pores are used 26, 36, only the passed reads generated during this time were analyzed in order to 311

standardize the technical replicate data sets. Considerable run-to-run variation was observed among the 312

SP and SM replicates from the same biological sample. We demonstrated that the number of active 313

pores was not predictive of the amount or quality of data. These inconsistencies could result from 314

intrinsic flow cell variability and day-to-day loading efficiencies. However, comparable average quality 315

scores were observed for basecalled reads (≥Q7) for both the SP and SM data sets. 316

The rapid sequencing workflow developed in this study aimed to deliver the fastest turnaround 317

time per bacterial isolate using one flowcell per run. The study avoids the complexities of 318

multiplexing/demultiplexing sequences during data analysis. Due to the variability we observed in 319

flowcell performance, multiplexing the same sample on a single flow cell could be used to establish 320

“within run” precision. Improved rapid multiplex capabilities have been described since this study 321

began, but multiplexing is a balance between sequencing speed and genome coverage. Additional 322

considerations, including incorporation of positive and negative DNA controls, are needed to establish 323

indicators of sequencing quality within and between runs. Future work must build on this foundation to 324

ensure the accuracy of nanopore data and expand MinION sequencing to include multiple complex 325

samples, such as clinical specimens. 326

The assay parameters established in the Y. pestis technical replicate analysis were applied to B. 327

anthracis by sequencing three AMR strains: Sterne/pUTE29, 411A2, and UT308. With the exception of 328

an additional one hour lysis step for B. anthracis, the same SM DNA extraction protocol was employed 329



https://doi.org/10.1101/730093

15

and 100,000 fast5 reads were sufficient for >99% coverage of the chromosome and plasmids (both 330

native and introduced). While functional phenotypic methods remain essential for accurate AST, WGS 331

can be used to detect point mutations that predict AMR in some biothreat pathogens 42. For both Y. 332

pestis and B. anthracis data sets, MinION reads were error-prone compared to Illumina and PacBio data 333

and contained more indels, a well described nanopore error type 33, 34, 43. Therefore, open reading frame 334

prediction and SNP-typing were not performed for MinION assemblies due to the large number of 335

indels. Overall, use of the Nanopolish error correction tool reduced indels, but accurate basecalls could 336

not be made in homopolymer regions (poly-N lengths >4). For this reason, the homopolymeric regions 337

associated with penicillin resistance in B. anthracis, sigP-rsiP 42, were not included for analysis in this 338

study. However, QRDR mutations associated with AMR in B. anthracis and Y. pestis that are located in 339

non-homoploymeric regions of the genome were correctly identified in nanopore assemblies. Currently, 340

a hybrid (long- and short-read) assembly approach is necessary to construct complete genomes with 341

sufficent quality for variant calling (for SNP and indel detection) with nanopore data. 342

Y. pestis isolates harboring various multimeric pPCP1 forms in addition to, or in place of, one of 343

the three prototypical plasmids were described previously 38, 44. The 9.5 kb and 19 kb pPCP1 plasmids 344

are stable upon multiple passages and coexist; however, one plasmid is often present at a higher 345

concentration than the other, suggesting a level of genetic control of expression 38. Chu et al.38 noted the 346

presence of pPCP1 dimer forms in strain A1122, regardless of DNA extraction method used 38. Other 347

atypical Y. pestis plasmids have been described in Angola and C790 strains, including a chimeric 348

114,570 nt form with two tandemly repeated pPCP1 copies integrated into pMT1 45, and a pMT1-pCD1 349

chimera 46, respectively. The presence of an IS element, IS100, on pPCP1, pCD1, and pMT1 may have 350

facilitated the formation of these multimeric and chimeric plasmids via homologous recombination 45, 46. 351

Understanding these plasmid size and form variations among naturally occurring Y. pestis isolates is 352

important for distinguishing between natural and potentially engineered variation 44. 353



https://doi.org/10.1101/730093

16

In this study, the pPCP1 monomer and dimer forms were not consistently assembled correctly 354

because multiple pPCP1 reads of different sizes introduced considerable ambiguity into the assembly 355

graphs. Multiple overlapping pPCP1 reads corresponding to closed-circle multimer plasmids of various 356

increasing sizes were also identified. The longest pPCP1 multimers were detected in a Y. pestis S1037 357

assembly for which additional steps to reduce gDNA shearing during the SP extraction were employed. 358

Analysis of 100,000 reads revealed four 5-mer reads and one 6-mer read. Additional analysis of 359

300,000 reads identified eleven 5-mer reads and only one 6-mer read (data not shown); none of which 360

were detected in SP technical replicate data sets. The challenges of using long-read sequence data for 361

plasmid assembly, discussed below, and/or DNA shearing may explain the rarity of finding these large 362

multimers. 363

Chu et al.38 also described the presence of multimeric pPCP1 forms in A1122 >19 kb by 364

Southern blot analysis and they explain these multimer forms may contribute to bacterial plasmid 365

maintenance. While the Y. pestis genome is considered monomorphic 47, there is evidence for genome 366

plasticity due to large genomic rearrangements (translocations and inversions), as well as gene loss due 367

to homologous recombination between multiple copies of highly repetitive IS elements found on all 368

three plasmids and on the chromosome 48, 49, 50. We speculate that in Y. pestis strains S1037 and All22 369

the 9.5 kb pPCP1 plasmid has undergone homologous recombination with itself and with the 19 kb 370

pPCP1 plasmid to form the 2-mer to 6-mer species. 371

The power of MinION sequencing to resolve complex genomic structures was evident in the 372

ability to identify expected and novel pPCP1 plasmid multimers. These pPCP1 multimers were only 373

partially assembled using S1037 and A1122 Illumina and PacBio data. In addition, the A1122 reference 374

sequence (CP002956.1), assembled from Illumina data, contains only the pPCP1 monomer form. This 375

is an example of the limitation of using short-read data to assemble complex genomic regions and 376

highlights that assemblies in public databases may be incomplete. 377



https://doi.org/10.1101/730093

17

Small plasmid assembly using long-read data is challenging for many algorithms 51. Unlike the 378

issues associated with assembling multiple copies of the Y. pestis multimeric plasmid, the pUTE29 (~7.3 379

kb) plasmid was readily assembled in all B. anthracis data sets. However, the bioinformatics pipeline in 380

this study was tailored to assemble these plasmids. It is important to note that these parameters may not 381

be generalizable for every data set. While this work was performed using a small number of strains, the 382

availablity of resistant Y. pestis and B. anthraicis isolates is limited; for both laboratory derived and 383

naturally-occuring isolates. Further optimization is required to assure the pipeline can correctly identify 384

different plasmid and AMR profiles, and this is currently under investigation. Since the completion of 385

this work, new assemblers, e.g. WTDBG2 52 and FLYE 53, have been released and address the 386

limitations of plasmid assembly using long-read sequence data. 387

Other limitations of nanopore sequencing (e.g. cost, throughput, and accuracy) could be 388

mitigated by future updates to the software, hardware, library preparation kits, and reagents 41. But, 389

with these improvements and updates come challenges such as locking down assay parameters to assess 390

and compare performance 36, 54, 55. The workflows described here for same-day bacterial MinION WGS, 391

assembly, and bioinformatics analysis for biological threat pathogens show the utility of nanopore 392

sequencing for rapid diagnostics that will be critical during a biothreat event. As NGS technologies 393

continue to evolve and stablize, real-time nanopore WGS for bacterial characterization, including AMR 394

marker detection, chromosome, large and small plasmid assembly, and accurate SNP detection, may 395

soon be a reality. 396

397

Methods 398

Biosafety Procedures. Only select agent-excluded strains were used in this study (Y. pestis A1122, Y. 399

pestis S1037, B. anthracis Sterne/pUTE29, B. anthracis 411A2, and B. anthracis UT308). All 400

procedures were performed in a biosafety level 2 (BSL-2) laboratory by trained personnel wearing 401



https://doi.org/10.1101/730093

18

appropriate personal protective equipment (PPE) according to the CDC/NIH publication Biosafety in 402

Microbiological and Biomedical Laboratories, 5th edition 56. 403

Bacterial Strains. Study strains are listed in Table 1. Y. pestis A1122 was passaged on increasing 404

concentrations of CIP using a previously described method 11 to create the CIP-NS strain, S1037. This 405

strain was developed as a control strain with approval from the CDC Institutional Biosafety Committee 406

and Institutional Biosecurity Board. 407

Growth Conditions. All strains were cultured for 16-20 hours (B. anthracis) or 24 hours (Y. pestis) on 408

BD BBLTM Trypticase Soy Agar II with 5% sheep blood (SBA) (Thermo Fisher Scientific, Waltham, 409

MA, USA) at 35°C in ambient air from glycerol stocks stored at -70°C. 410

Antimicrobial Susceptibility Testing. Broth microdilution (BMD) was performed to determine 411

antimicrobial susceptibility following the Clinical and Laboratory Standards Institute guidelines 57. 412

Genomic DNA Extraction. gDNA for each study strain was extracted from agar plate cultures, 413

described above. For Y. pestis, colonies were transferred to 200µl of Phosphate Buffer Solution (0.01M, 414

pH 7.40) and vortexed until completely resuspended. B. anthracis colonies were transferred to 200µl of 415

lysis buffer (20mM Tris HCL pH 8.0, 2mM EDTA, 1.2% Triton X-100) and incubated for 60 min at 416

37°C. Y. pestis and B. anthracis gDNA was isolated using the MasterPure Complete DNA and RNA 417

Purification Kit (Lucigen, Middleton, WI), a salt-precipitation (SP) extraction method, and the QIAamp 418

DNA Blood Mini Kit (Qiagen Valencia, CA), a silica-membrane (SM) extraction method. Both 419

extractions were performed according to manufacturer’s specifications. Y. pestis extracts were eluted in 420

75µl of Elution Buffer (EB) (10 mM Tris-Cl, pH 8.5, Qiagen, Valencia, CA) and B. anthracis extracts 421

were eluted in 50µl EB. gDNA samples were subsequently stored at 4°C. 422

Assessment of gDNA quality and quantity. gDNA extractions were evaluated for quantity and quality 423

using the Qubit dsDNA BR Assay Kit (Invitrogen, Waltham, MA) and NanoDrop 2000 424

spectrophotometer (ThermoFisher Scientific, Pittsburgh, PA) following manufacturer’s instructions. 425



https://doi.org/10.1101/730093

19

gDNA integrity was assessed following 0.6% agarose gel electrophoresis of 100ng of each sample to 426

visualize the presence of intact, high molecular weight (HMW) DNA. The Quick-Load 1 kb Extend 427

DNA Ladder (New England BioLabs, Ipswich, MA) was used as a molecular marker. 428

Nanopore DNA library preparation. Libraries were prepared using 400ng of gDNA per strain and the 429

Rapid Sequencing Kit, SQK-RAD004 (ONT, Oxford, England) or the Field Sequencing Kit, SQK-430

LRK001 (ONT, pre-released), according to manufacturer’s specifications. All Field Sequencing Kit 431

components were stored at room temperature prior to use. Y. pestis S1037 libraries were prepared in 432

triplicate on the same day from the same gDNA extraction and stored at 4°C for no longer than 3 days. 433

Nanopore sequencing. Sequencing was performed on a MinION (MK1B) for 16-48 hours using R9.4.1 434

flowcells and MinKNOW (version 18.01.6). All flowcells met the minimum ≥800 active pore count 435

requirement immediately prior to sequencing. Each sample was sequenced on a separate flowcell and 436

each flowcell was used for only one sequencing run. For the technical replicate study, the same 437

laboratorian loaded each library into the flowcells to minimize variation. The replicate Y. pestis S1037 438

libraries were sequenced on sequential days over a 3 day period. To standardize the comparison of the 439

technical replicate Rapid Kit and Field Kit Y. pestis S1037 data sets, only raw fast5 reads from the 440

“passed folder,” generated during the first 16 hours of sequencing were analyzed. Raw fast5 reads from 441

the “failed folder” were not included in this analysis. Sequencing runs included in this study are 442

described in Supp Table 2. 443

Bioinformatics pipeline. A pipeline was developed to perform rapid de novo genome and plasmid 444

assemblies, to detect AMR variants and genes, and to identify variants (mismatches and indels), see 445

Supp Fig. 9. 446

Basecalling. Basecalling of raw fast5 reads was performed using the Albacore basecalling 447

utility (version 2.3.10), to produce raw FASTQ reads. Raw FASTQ reads were further processed using 448

Porechop (version 0.2.3) to remove adapters and any potentially chimeric sequences. Processed FASTQ 449



https://doi.org/10.1101/730093

20

reads were used to generate a draft assembly using minimap2 58 (version 2.10-r770-dirty) and miniasm 450

59 (0.2-r168-dirty). First, all-versus-all alignment of the FASTQ reads was performed using minimap2 451

with the option ‘-x ava-ont’. The resulting minimap2 alignments and input reads were used to create a 452

draft assembly with miniasm, using the options ‘-s 1750 –h 1000 –I .5’. Errors in the draft assembly 453

generated by miniasm were iteratively improved using RACON 60 (version 1.1.1). For 5 iterations, 454

minimap2 was used to align the reads to an assembly and the alignments were used by RACON to 455

produce a consensus assembly. The first iteration used the draft assembly generated by miniasm, while 456

subsequent iterations used the consensus assembly generated by the previous RACON step. The final 457

output of the RACON process was further processed using Nanopolish 61 (version 0.10.1). Nanopolish 458

was run in the variants mode using the options ‘--faster --consensus --min-candidate-frequency 0.1’, and 459

the vcf2fasta feature was used to create the final consensus sequence. 460

Detection of known genes related to AMR. The ResFinder62 database of AMR genes was 461

queried against the final Nanopolish assembly using BLASTn 63 and the options ‘-perc_identity 90 –462

evalue 1e-10’ to identify potential AMR sequences in each assembly. AMR gene results were clustered 463

using BEDTools 64 merge (version 2.24.0) with the options ‘-d -30’ in order to identify unique, non-464

overlapping regions with high similarity to AMR sequences. For each cluster, BEDTools intersect was 465

used to find the gene with the highest % identity, which most completely covered the cluster. 466

Detection of known mutations related to AMR. Basecalls from the nanopore sequencing reads 467

were compared to a reference genome using minimap2 58 by aligning reads against the corresponding 468

reference sequence (Y. pestis A1122, Accession No. CP002956.1, or B. anthracis Ames Ancestor, 469

Accession No. AE017334) with the options ‘-a -x map-ont’. The samtools 65 mpileup utility (version 470

1.2) was used to find the numbers of each base seen in the reads at each position in the QRDR gene 471

coding regions. These basecall counts were used with bcftools 65 call (version 1.8) to identify variant 472

sites in those genes. 473



https://doi.org/10.1101/730093

21

Characterization of Plasmids. Contigs ≥500,000 nt were excluded from plasmid 474

characterization as they likely represent chromosomal sequence. Remaining contigs ( ≤500,000 nt) 475

were queried with a custom database of plasmids and artificial sequences using the pblat 66 utility 476

(version 35). A custom algorithm, pChunks, was used to find the set of plasmid sequences present in the 477

assembly such that the maximal amount of plasmid sequence was included. 478

Detection of mismatches, insertions and deletions. The consensus assembly generated by the 479

last iteration of RACON was aligned with the corresponding reference sequence (Y. pestis A1122 or B. 480

anthracis Ames Ancestor) to identify mismatches and small indels using the dnadiff utility from the 481

MUMmer 67 package (version 4.0.0beta2). Gaps from the resulting ‘rdiff’ (reference differences) and 482

‘qdiff’ (query differences) outputs were used to identify indels in one genome relative to the other. 483

Sequence gaps ≥100 bases, and with positive lengths in both the assembly and reference sequences 484

(denoting real insertions and not duplications) were kept as insertions. Mismatches identified by 485

MUMmer were compared to the presence of homopolymeric regions in the reference sequence to 486

determine the number of homopolymeric regions that were assembled correctly. Coverage of the 487

reference chromosome and plasmids was characterized by aligning reads against the references 488

sequences using minimap2, and enumerating the mean per-base coverage. 489

Downsampling. Y. pestis and B. anthracis reads were retrospectively assembled using the above 490

analysis in the order they were generated for the first 25,000, 50,000, 75,000, 100,000 passed fast5 reads 491

to determine the minimum number of sequencing reads required for chromosome and plasmid assembly, 492

as well as the identification of known AMR genes and mutations. Per assembly, the number of 493

mismatches, indels, and percent identity were determined using an alignment to the reference sequence. 494

The chromosome and plasmid precent coverage was calculated. 495

pPCP1 Analysis. For each SP S1037 Y. pestis 100,000-read data set, basecalled reads of ≥1000 nt were 496

aligned to the pPCP1 reference sequence from Y. pestis A1122, Accession No. CP002956.1, using 497



https://doi.org/10.1101/730093

22

minimap2. Reads that aligned to the pPCP1 sequence across at least 95% of their length were included 498

for analysis. Reads were assigned to a 1, 2, 3, 4, 5, or 6-mer version of the pPCP1 sequence based on 499

their length and the fractions of these versions in each sample were used to determine the pPCP1 500

multimer fraction (Supp Fig. 4). All SP Y. pestis S1037 data sets contained reads corresponding to 2, 3, 501

and 4-mer forms of plasmid pPCP1, and SP Rapid 1 was chosen as the representative data set to use to 502

generate Circos plots 68 representing the distribution of reads from the plasmid multimers. From the SP 503

Rapid 1 aligned reads, a representative read for each multimeric pPCP1 sequence was choosen to 504

represent the complete plasmid. The representative read was rotated such that the first base of the read 505

aligned to the first base of the pPCP1 reference sequence. The remaining reads for each multimeric 506

pPCP1 sequence were aligned against the reference read. Known coding sequences from pPCP1 were 507

found in the multimeric forms by querying these features against the representative read using BLASTn 508

63. 509

Illumina library preparation, sequencing, and analysis. Sequencing of Y. pestis S1037 and B. 510

anthracis Sterne/pUTE29 was performed on an Illumina MiSeq using the v2 kit (2x250 nt paired-end) 511

with Nextera XT libraries. Sequencing adapters were removed and reads were trimmed using Cutadapt 512

(v1.14). De novo assembly was performed using the SPAdes Genome Assembler (v3.10.0). The 513

consensus Illumina assemblies were aligned with the corresponding Y. pestis A1122 or B. anthracis 514

Ames Ancestor reference assemblies to identify mismatches and small indels using the dnadiff utility 515

from the MUMmer 67 package (version 4.0.0beta2). 516

PacBio library preparation, sequencing, and analysis. Libraries (10 kb) of Y. pestis S1037 and B. 517

anthracis Sterne/pUTE29 were prepared with the SMRTbell Template Prep Kit 1.0 (Pacific Biosciences, 518

Menlo Park, CA) and bound using the DNA/Polymerase Binding Kit P6v2 (Pacific Biosciences, Menlo 519

Park, CA). The bound libraries were then loaded on one SMRTcell and sequenced with C4v2 chemistry 520

for 270 min movies on the RSII instrument (Pacific Biosciences, Menlo Park, CA). Sequence reads 521



https://doi.org/10.1101/730093

23

were de novo assembled using RS_HGAP Assembly.3 in the SMRT Analysis 2.3.0 portal. The 522

consensus PacBio assemblies were aligned with the corresponding Y. pestis A1122 or B. anthracis Ames 523

Ancestor reference assemblies to identify mismatches and small indels using the dnadiff utility from the 524

MUMmer 67 package (version 4.0.0beta2). Comparison of PacBio reads corresponding to the pPCP1 525

sequence was carried out as described above for the analysis of pPCP1. 526

Data Availability 527

The nanopore, illumina, and PacBio sequence data sets generated and analyzed in the current study are 528

available from NCBI in the Sequence Read Archive under bioproject PRJNA523610. All nanopore 529

sequencing runs and corresponding BioSample accession numbers are also listed in Supp Table 2. All 530

data supporting the findings of this study are available within the article and its supplementary 531

information files, or are available upon request. 532

533

References 534

1. Wagar E. Bioterrorism and the Role of the Clinical Microbiology Laboratory. Clin Microbiol Rev 29, 175-535 189 (2016). 536

537 2. Jernigan DB, et al. Investigation of bioterrorism-related anthrax, United States, 2001: epidemiologic 538

findings. Emerg Infect Dis 8, 1019-1028 (2002). 539

540 3. Jernigan JA, et al. Bioterrorism-related inhalational anthrax: the first 10 cases reported in the United 541

States. Emerg Infect Dis 7, 933-944 (2001). 542

543 4. Shandomy S IA, Raizman E, Bruni M, Palamara E, Pittiglio C, Lubroth J. Anthrax outbreaks: a warning for 544

improved prevention, control and heightened awareness. In: Empres Watch. Food and Agriculture 545 Organization of the United Nations (2016). 546

547 5. Mead PS. Plague in Madagascar - A Tragic Opportunity for Improving Public Health. N Engl J Med 378, 548

106-108 (2018). 549

550 6. Weigel LM, Morse S. A. Implications of antibiotic resistance in potential agents of bioterrorism. In 551

Mayers DL (ed), Antimicrobial Drug Resistance Humana Press, Totowa, NJ, 1293-1312 (2009). 552



https://doi.org/10.1101/730093

24

553 7. Chen Y, Succi J, Tenover FC, Koehler TM. Beta-lactamase genes of the penicillin-susceptible Bacillus 554

anthracis Sterne strain. J Bacteriol 185, 823-830 (2003). 555

556 8. Pomerantsev AP, Shishkova NA, Marinin LI. Comparison of therapeutic effects of antibiotics of the 557

tetracycline group in the treatment of anthrax caused by a strain inheriting tet-gene of plasmid pBC16. 558 Antibiot Khimioter 37, 31-34 (1992). 559

560 9. Price LB, Vogler A, Pearson T, Busch JD, Schupp JM, Keim P. In vitro selection and characterization of 561

Bacillus anthracis mutants with high-level resistance to ciprofloxacin. Antimicrob Agents Chemother 47, 562 2362-2365 (2003). 563

564 10. Ryzhko IV, et al. Virulence of rifampicin and quinolone resistant mutants of strains of plague microbe 565

with Fra+ and Fra- phenotypes. Antibiot Khimioter 39, 32-36 (1994). 566

567 11. Lindler LE, Fan W, Jahan N. Detection of ciprofloxacin-resistant Yersinia pestis by fluorogenic PCR using 568

the LightCycler. J Clin Microbiol 39, 3649-3655 (2001). 569

570 12. Inglesby TV, et al. Plague as a biological weapon: medical and public health management. Working 571

Group on Civilian Biodefense. JAMA 283, 2281-2290 (2000). 572

573 13. Galimand M, Carniel E, Courvalin P. Resistance of Yersinia pestis to antimicrobial agents. Antimicrob 574

Agents Chemother 50, 3233-3236 (2006). 575

576 14. Guiyoule A, et al. Transferable plasmid-mediated resistance to streptomycin in a clinical isolate of 577

Yersinia pestis. Emerg Infect Dis 7, 43-48 (2001). 578

579 15. Galimand M, et al. Multidrug resistance in Yersinia pestis mediated by a transferable plasmid. N Engl J 580

Med 337, 677-680 (1997). 581

582 16. Cabanel N, Bouchier C, Rajerison M, Carniel E. Plasmid-mediated doxycycline resistance in a Yersinia 583

pestis strain isolated from a rat. Int J Antimicrob Agents 51, 249-254 (2018). 584

585 17. CLSI. Methods for antimicrobial diluiton and disc susceptibility testing of infrequently isolated or 586

fastidious bacteria, 3rd ed. M45. CLSI, Wayne, PA. (2016). 587

588 18. Urich SK, Chalcraft L, Schriefer ME, Yockey BM, Petersen JM. Lack of antimicrobial resistance in Yersinia 589

pestis isolates from 17 countries in the Americas, Africa, and Asia. Antimicrob Agents Chemother 56, 590 555-558 (2012). 591

592 19. Weigel LM, Sue D, Michel PA, Kitchel B, Pillai SP. A rapid antimicrobial susceptibility test for Bacillus 593

anthracis. Antimicrob Agents Chemother 54, 2793-2800 (2010). 594



https://doi.org/10.1101/730093

25

595 20. Schofield DA, et al. Bacillus anthracis diagnostic detection and rapid antibiotic susceptibility 596

determination using 'bioluminescent' reporter phage. J Microbiol Methods 95, 156-161 (2013). 597

598 21. Steinberger-Levy I, et al. Enrichment of Yersinia pestis from blood cultures enables rapid antimicrobial 599

susceptibility determination by flow cytometry. Adv Exp Med Biol 603, 339-350 (2007). 600

601 22. Bugrysheva JV, Lascols C, Sue D, Weigel LM. Rapid Antimicrobial Susceptibility Testing of Bacillus 602

anthracis, Yersinia pestis, and Burkholderia pseudomallei by Use of Laser Light Scattering Technology. J 603 Clin Microbiol 54, 1462-1471 (2016). 604

605 23. McLaughlin HP, Gargis AS, Michel P, Sue D, Weigel LM. Optical Screening for Rapid Antimicrobial 606

Susceptibility Testing and for Observation of Phenotypic Diversity among Strains of the Genetically 607 Clonal Species Bacillus anthracis. J Clin Microbiol 55, 959-970 (2017). 608

609 24. Goodwin S, McPherson JD, McCombie WR. Coming of age: ten years of next-generation sequencing 610

technologies. Nature reviews Genetics 17, 333-351 (2016). 611

612 25. Wick RR, Judd LM, Gorrie CL, Holt KE. Completing bacterial genome assemblies with multiplex MinION 613

sequencing. Microbial genomics 3, e000132 (2017). 614

615 26. Lu H, Giordano F, Ning Z. Oxford Nanopore MinION Sequencing and Genome Assembly. Genomics, 616

proteomics & bioinformatics 14, 265-279 (2016). 617

618 27. Ricker N, Qian H, Fulthorpe RR. The limitations of draft assemblies for understanding prokaryotic 619

adaptation and evolution. Genomics 100, 167-175 (2012). 620

621 28. Besser J, Carleton HA, Gerner-Smidt P, Lindsey RL, Trees E. Next-generation sequencing technologies 622

and their application to the study and control of bacterial infections. Clinical microbiology and infection : 623 the official publication of the European Society of Clinical Microbiology and Infectious Diseases 24, 335-624 341 (2018). 625

626 29. Laehnemann D, Borkhardt A, McHardy AC. Denoising DNA deep sequencing data-high-throughput 627

sequencing errors and their correction. Briefings in bioinformatics 17, 154-179 (2016). 628

629 30. Stoesser N, et al. First Report of blaIMP-14 on a Plasmid Harboring Multiple Drug Resistance Genes in 630

Escherichia coli Sequence Type 131. Antimicrob Agents Chemother 60, 5068-5071 (2016). 631

632 31. Forde BM, et al. The complete genome sequence of Escherichia coli EC958: a high quality reference 633

sequence for the globally disseminated multidrug resistant E. coli O25b:H4-ST131 clone. PloS one 9, 634 e104400 (2014). 635

636



https://doi.org/10.1101/730093

26

32. Bashir A, et al. A hybrid approach for the automated finishing of bacterial genomes. Nature 637 Biotechnology 30, 701 (2012). 638

639 33. Goldstein S, Beka L, Graf J, Klassen JL. Evaluation of strategies for the assembly of diverse bacterial 640

genomes using MinION long-read sequencing. BMC Genomics 20, 23 (2019). 641

642 34. Ashton PM, et al. MinION nanopore sequencing identifies the position and structure of a bacterial 643

antibiotic resistance island. Nat Biotechnol 33, 296-300 (2015). 644

645 35. Jain M, et al. MinION Analysis and Reference Consortium: Phase 2 data release and analysis of R9.0 646

chemistry. F1000Research 6, 760 (2017). 647

648 36. Tyler AD, et al. Evaluation of Oxford Nanopore’s MinION Sequencing Device for Microbial Whole 649

Genome Sequencing Applications. Scientific Reports 8, 10931 (2018). 650

651 37. Jawetz E, Meyer, K.F. Avirulent strains of Pasteurella pestis. Journal of Infectious Disease 124-143 (1943). 652

653 38. Chu MC, Dong XQ, Zhou X, Garon CF. A cryptic 19-kilobase plasmid associated with U.S. isolates of 654

Yersinia pestis: a dimer of the 9.5-kilobase plasmid. The American journal of tropical medicine and 655 hygiene 59, 679-686 (1998). 656

657 39. The National Biodefense Strategy. Office of the President of the United States, 2018. 658

https://www.hsdl.org/?view&did=815921. Access Date: June 4, 2019. 659

660 40. Quick J, et al. Real-time, portable genome sequencing for Ebola surveillance. Nature 530, 228-232 661

(2016). 662

663 41. Gwinn M, MacCannell D, Armstrong GL. Next-Generation Sequencing of Infectious Pathogens. JAMA, 664

(2019). 665

666 42. Gargis AS, et al. Analysis of Whole-Genome Sequences for the Prediction of Penicillin Resistance and 667

beta-Lactamase Activity in Bacillus anthracis. mSystems 3, (2018). 668

669 43. O'Donnell CR, Wang H, Dunbar WB. Error analysis of idealized nanopore sequencing. Electrophoresis 34, 670

2137-2144 (2013). 671

672 44. Anisimov AP, Lindler LE, Pier GB. Intraspecific diversity of Yersinia pestis. Clin Microbiol Rev 17, 434-464 673

(2004). 674

675 45. Eppinger M, et al. Genome sequence of the deep-rooted Yersinia pestis strain Angola reveals new 676

insights into the evolution and pangenome of the plague bacterium. J Bacteriol 192, 1685-1699 (2010). 677



https://doi.org/10.1101/730093

27

678 46. Rajanna C, et al. Characterization of pPCP1 Plasmids in Yersinia pestis Strains Isolated from the Former 679

Soviet Union. International journal of microbiology 2010, 760819 (2010). 680

681 47. Achtman M. Evolution, population structure, and phylogeography of genetically monomorphic bacterial 682

pathogens. Annual review of microbiology 62, 53-70 (2008). 683

684 48. Radnedge L, Agron PG, Worsham PL, Andersen GL. Genome plasticity in Yersinia pestis. Microbiology 685

(Reading, England) 148, 1687-1698 (2002). 686

687 49. Wren BW. The yersiniae--a model genus to study the rapid evolution of bacterial pathogens. Nature 688

reviews Microbiology 1, 55-64 (2003). 689

690 50. Parkhill J, et al. Genome sequence of Yersinia pestis, the causative agent of plague. Nature 413, 523-527 691

(2001). 692

693 51. Kolmogorov M, Yuan J, Lin Y, Pevzner P. Assembly of Long Error-Prone Reads Using Repeat Graphs. 694

bioRxiv, 247148 (2018). 695

696 52. Ruan J, Li H. Fast and accurate long-read assembly with wtdbg2. bioRxiv, 530972 (2019). 697

698 53. Kolmogorov M, Yuan J, Lin Y, Pevzner PA. Assembly of long, error-prone reads using repeat graphs. Nat 699

Biotechnol 37, 540-546 (2019). 700

701 54. Lemon JK, Khil PP, Frank KM, Dekker JP. Rapid Nanopore Sequencing of Plasmids and Resistance Gene 702

Detection in Clinical Isolates. J Clin Microbiol 55, 3530-3543 (2017). 703

704 55. Giordano F, et al. De novo yeast genome assemblies from MinION, PacBio and MiSeq platforms. Sci Rep 705

7, 3935 (2017). 706

707 56. Chosewood LC, Wilson DE, Centers for Disease Control and Prevention (U.S.), National Institutes of 708

Health (U.S.). Biosafety in microbiological and biomedical laboratories, 5th edn. U.S. Dept. of Health and 709 Human Services, Public Health Service, Centers for Disease Control and Prevention, National Institutes of 710 Health (2009). 711

712 57. M45. Methods for Antimicrobial Dilution and Disc Susceptibility Testing of Infrequently Isolated or 713

Fastidious Bacteria. Clinical and Laboratory Standards Institute 3rd ed, (2015). 714

715 58. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics (Oxford, England) 34, 716

3094-3100 (2018). 717

718



https://doi.org/10.1101/730093

28

59. Li H. Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences. 719 Bioinformatics (Oxford, England) 32, 2103-2110 (2016). 720

721 60. Vaser R, Sovic I, Nagarajan N, Sikic M. Fast and accurate de novo genome assembly from long 722

uncorrected reads. Genome research 27, 737-746 (2017). 723

724 61. Loman NJ, Quick J, Simpson JT. A complete bacterial genome assembled de novo using only nanopore 725

sequencing data. Nature methods 12, 733-735 (2015). 726

727 62. Kleinheinz KA, Joensen KG, Larsen MV. Applying the ResFinder and VirulenceFinder web-services for 728

easy identification of acquired antibiotic resistance and E. coli virulence genes in bacteriophage and 729 prophage nucleotide sequences. Bacteriophage 4, e27943 (2014). 730

731 63. Chen Y, Ye W, Zhang Y, Xu Y. High speed BLASTN: an accelerated MegaBLAST search tool. Nucleic acids 732

research 43, 7762-7768 (2015). 733

734 64. Quinlan AR. BEDTools: The Swiss-Army Tool for Genome Feature Analysis. Current protocols in 735

bioinformatics 47, 11.12.11-34 (2014). 736

737 65. Li H, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics (Oxford, England) 25, 738

2078-2079 (2009). 739

740 66. Kent WJ. BLAT--the BLAST-like alignment tool. Genome research 12, 656-664 (2002). 741

742 67. Delcher AL, Salzberg SL, Phillippy AM. Using MUMmer to identify similar regions in large sequence sets. 743

Current protocols in bioinformatics Chapter 10, Unit 10.13 (2003). 744

745 68. Krzywinski M, et al. Circos: an information aesthetic for comparative genomics. Genome research 19, 746

1639-1645 (2009). 747

748 69. Ross CL, Thomason KS, Koehler TM. An extracytoplasmic function sigma factor controls beta-lactamase 749

gene expression in Bacillus anthracis and other Bacillus cereus group species. J Bacteriol 191, 6683-6693 750 (2009). 751

752 70. Gargis AS, Lascols C, McLaughlin HP, Conley AB, Hoffmaster AR, Sue D. Genome Sequences of Penicillin-753

Resistant Bacillus anthracis Strains. Microbiol Resour Announc 8, (2019). 754

755

756

757

758



https://doi.org/10.1101/730093

29

Acknowledgements 759

This work was supported by the U.S. CDC Office of Public Health Preparedness and Response (now 760

Center for Preparedness and Response). We thank Christopher Lynberg, CDC, and Charles Johnston, 761

General Dynamics contract agency, for computational support, technical advice, and valuable 762

discussions. We thank Dhwani Batra, CDC, and Lori A. Rowe, CDC, for PacBio sequencing and 763

analysis. We thank Jeannine Petersen, CDC, Luke C. Kingry, CDC, and Alex Hoffmaster, CDC, for 764

their critical review of the manuscript. The findings and conclusions in this report are those of the 765

authors and do not necessarily represent the views of the Centers for Disease Control and Prevention. 766

The use of trade names is for identification only and does not imply endorsement by the Centers for 767

Disease Control and Prevention. 768

769

Author Contributions 770

A.S.G. and D.S. conceived the project and all authors contributed to the design of experiments. A.S.G., 771

B.C., and H.P.M. performed all experiments and all authors contributed to data analysis. A.B.C. 772

established the bioinformatics pipeline, performed all sequence analysis, and created all figures. A.S.G. 773

led development of the manuscript, and all authors contributed to the writing process. All authors 774

reviewed and edited the manuscript. 775

776

Competing interests 777

The authors declare no competing interests. 778

779

780

781

782



https://doi.org/10.1101/730093

30

Figure Legends: 783

Figure 1. Nanopore sequencing run metrics for Y. pestis strain S1037. Genomic DNA was 784

extracted by the salt-precipitation (SP) or the silica-membrane (SM) method and libraries were prepared 785

in triplicate using the ONT Rapid Sequencing Kit (Rapid), or the Field Sequencing Kit (Field). All run 786

metrics were based on a 16 hour sequencing run time and the active pore count (-o-) was recorded 787

during the platform QC step prior to the sequencing run. The number of raw fast5 reads generated 788

(green, passed; yellow, failed) and raw FASTQ Albacore base-called reads with an average quality score 789

of ≥ 7 (blue) are displayed. 790

791

Figure 2. Assessment of nanopore sequencing performance and de novo assembly for Y. pestis 792

S1037. DNA sequencing libraries were prepared using the Rapid Sequencing Kit (in triplicate) and 793

Field Sequencing Kit from a salt-precipitation (SP) or a silica-membrane (SM) extraction. Assemblies 794

were generated using analysis of the first 25,000, 50,000, 75,000, and 100,000 passed fast5 nanopore 795

reads. a-c Determination of mismatches, indels, and % identity, and d-f fold coverage for chromosomal 796

and plasmid sequences of Y. pestis S1037 data sets aligned to the All22 reference strain. g Using 797

100,000 passed fast5 reads, the ability to assemble the chromosome and plasmids, and to detect the gyrA 798

SNP associated with the non-susceptible CIP phenotype is depicted in a presence/absence plot. 799

Assembled is defined as a complete sequence with the same nucleotide order as the reference sequence 800

(navy), partially assembled represents a sequence that was assembled but not as a single contig (green), 801

misassembled includes a sequence that was assembled but the bases are in a different order than the 802

reference sequence, or a sequence assembled multiple times resulting in an incomplete assembly 803

(yellow), and unassembled refers to a sequence that was not present in the assembly (orange). 804

805

Figure 3. Time required to detect antimicrobial resistance markers in Y. pestis and B. anthracis 806

using nanopore whole genome sequencing. Time 0:00 starts from a pure isolated culture. The time to 807



https://doi.org/10.1101/730093

31

extract genomic DNA by the silica-membrane method is shown, including the one hour B. anthracis 808

lysis step. Quality control steps include assessment of DNA concentration, quality, and integrity. 809

Library preparation time is estimated for the Rapid Sequencing Kit. The average time required to 810

generate 100,000 passed fast5 reads is shown for each species. Genome analysis includes base-calling 811

using Albacore and de novo genome and plasmid assembly. See methods and Supplementary Figure 9 812

for the bioinformatics analysis pipeline details. 813

814

Figure 4. Detection of multimeric forms of pPCP1 in nanopore sequence assemblies of Y. pestis 815

S1037. pPCP1 is ~9.5 kb, but the pPCP1 1-mer assembly is ~9.1 kb due to the presence of small indels 816

in the nanopore-generated sequence. Reads >18.2 kb were assigned to the 3- or 4-mer form based on 817

length. A representative read (grey) for each multimer was selected as a reference for the complete 818

plasmid the remaining reads were aligned against this sequence (purple). Coding sequences (green) and 819

IS100 elements (yellow) are shown in the center. The 2-, 3- and 4-mer forms are linked together in a 5′-820

3′ head-to-toe orientation. 821

822

Figure 5. Assessment of nanopore sequencing performance and de novo assembly for B. anthracis 823

strains. DNA sequencing libraries were prepared using the Rapid and Field Sequencing Kits from a 824

silica-membrane extraction. Assemblies were generated using analysis of the first 25,000, 50,000, 825

75,000, and 100,000 passed fast5 nanopore reads. a-c Determination of mismatches, indels and % 826

identity, and d-f fold coverage for chromosomal and plasmid sequences in data sets of B. anthracis 827

strains (Sterne/pUTE29, 411A2, and UT308) aligned to the Ames Ancestor reference sequence. g Using 828

analysis of 100,000 passed fast5 reads, the ability to assemble the chromosome and plasmids (blue) and 829

detect the gyrA and parC SNPs associated with the non-susceptible CIP phenotype (teal) or wild type 830

sequence (N/A), is depicted in a presence/absence plot. Assembled is defined as a complete sequence 831



https://doi.org/10.1101/730093

32

with the same nucleotide order as the reference sequence (navy), partially assembled represents a 832

sequence that was assembled but not as a single contig (green), and unassembled refers to a sequence 833

that was not present in the assembly (orange). 834

835



https://doi.org/10.1101/730093

33

Figure 1. 836

837

838

839

840



https://doi.org/10.1101/730093

34

Figure 2a-f.841

842



https://doi.org/10.1101/730093

35

Figure 2g. 843

844

845

846

847

848

849

850

851

852



https://doi.org/10.1101/730093

36

Figure 3. 853

854



https://doi.org/10.1101/730093

37

Figure 4. 855

856



https://doi.org/10.1101/730093

38

Figure 5a-f. 857

858



https://doi.org/10.1101/730093

39

Figure 5g. 859

860

861



https://doi.org/10.1101/730093

40

Table 1. Bacterial strains used in this study. 862

Strain Susceptibility

Profile Plasmids Gene

Nucleotide

Mutation Reference

Y. pestis A1122 CIPS, DOXS, GENS pMT1+, pPCP1+(*), pCD1− gyrA Wild Type (37)

Y. pestis S1037 CIPNS, DOXS, GENS pMT1+, pPCP1+(*), pCD1− gyrA C249 ►A This work

B. anthracis

Sterne/pUTE29 PENS, CIPS, DOXNS pX01+, pX02−, pUTE29+ tetL NA (19)

B. anthracis 411A2 PENS, CIPNS, DOXS pX01+, pX02− gyrA

parC

C254 ►T

C242 ►T (19)

B. anthracis

UT308 PENR, CIPS, DOXS pX01+(#), pX02− gyrA Wild Type (69, 70)

Susceptibility, plasmid profiles and genes/mutations associated with antibiotic resistance investigated in this study. The first 100,000 fast5 reads generated 863 for each sequencing run were analyzed. (*) multimer forms of this plasmid were detected, (#) UT308 pXO1 is missing the pathogenicity island containing 864 the anthrax toxin genes, (NA) no nucleotide mutations associated with resistance are found in this strain. 865



https://doi.org/10.1101/730093

1 Rapid Detection of Genetic Engineering, Structural Variation, … · identify genetic...

Documents

Transcript of 1 Rapid Detection of Genetic Engineering, Structural Variation, … · identify genetic...