1 Rapid Detection of Genetic Engineering, Structural Variation, … · identify genetic...
Transcript of 1 Rapid Detection of Genetic Engineering, Structural Variation, … · identify genetic...
1
Rapid Detection of Genetic Engineering, Structural Variation, and Antimicrobial Resistance 1
Markers in Bacterial Biothreat Pathogens by Nanopore Sequencing 2
A. S. Gargis1*, B. Cherney1, A. B. Conley2, H. P. McLaughlin1, D. Sue1 3
4
1 Division of Preparedness and Emerging Infections, National Center for Emerging and Zoonotic 5
Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA. 6
2 IHRC-Georgia Tech Applied Bioinformatics Laboratory, Atlanta, Georgia, USA. 7
* Corresponding author. Current Address: Division of Healthcare Quality Promotion, National Center for 8
Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, 9
Georgia, USA. 10
11
12
13
14
15
16
17
18
19
20
21
22
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
2
Abstract 23
Widespread release of Bacillus anthracis (anthrax) or Yersinia pestis (plague) would prompt a 24
public health emergency. During an exposure event, high-quality whole genome sequencing (WGS) can 25
identify genetic engineering, including the introduction of antimicrobial resistance (AMR) genes. Here, 26
we developed rapid WGS laboratory and bioinformatics workflows using a long-read nanopore 27
sequencer (MinION) for Y. pestis (6.5h) and B. anthracis (8.5h) and sequenced strains with different 28
AMR profiles. Both salt-precipitation and silica-membrane extracted DNA were suitable for MinION 29
WGS using both rapid and field library preparation methods. In replicate experiments, nanopore quality 30
metrics were defined for genome assembly and mutation analysis. AMR markers were correctly 31
detected and >99% coverage of chromosomes and plasmids was achieved using 100,000 raw sequencing 32
reads. While chromosomes and large and small plasmids were accurately assembled, including novel 33
multimeric forms of the Y. pestis virulence plasmid, pPCP1, MinION reads were error-prone, 34
particularly in homopolymer regions. MinION sequencing holds promise as a practical, front-line 35
strategy for on-site pathogen characterization to speed the public health response during a biothreat 36
emergency. 37
38
Introduction 39
Yersinia pestis and Bacillus anthracis are the etiological agents of plague and anthrax, 40
respectively. Deliberate misuse of these pathogens as bioterrorism agents poses a serious public health 41
and safety threat, due to low infectious doses, high lethality, and the ease of production, dissemination 42
and communicability 1. Strains of both species were weaponized previously, and the 2001 Amerithrax 43
incident underscores the importance of biological threat preparedness and rapid laboratory response 2, 3. 44
Natural anthrax and plague outbreaks are reported annually in animals, and human cases are often 45
zoonotic. While these diseases continue to cycle in isolated, enzootic foci, re-emergence was reported 46
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
3
during 2016 anthrax outbreaks in Russia and Sweden4, and the 2017 pneumonic plague outbreak in 47
Madagascar 5. Both naturally-occurring and engineered antimicrobial resistance (AMR) have been 48
reported in Y. pestis and B. anthracis 6. Most B. anthracis strains are susceptible to antibiotics, but 49
surveys of clinical and environmental isolates indicate penicillin resistance occurs in 2 to 16% of strains 50
7. Attenutated B. anthracis strains with AMR were isolated following laboratory in vitro passaging 51
(selective pressure) and/or targeted genetic manipulation 8, 9. Laboratory-selected virulent and avirulent 52
Y. pestis strains with quinolone-resistance were previously reported 10, 11 and engineered multi-drug 53
resistant (MDR) strains are a public health concern 12, 13. Y. pestis is intrinsically susceptible to every 54
antibiotic recommended for human plague therapy, but a few Y. pestis isolates with transferable 55
plasmid-mediated resistance were reported from Madagascar 13, 14, 15, 16. 56
A public health response to incidents involving human plague or anthrax requires timely and 57
accurate AMR profiling. Conventional broth microdilution (BMD) is the gold standard antimicrobial 58
susceptibility testing (AST) method, but requires lengthy incubation times for B. anthracis (16 to 20 59
hours) and Y. pestis (at least 24 hours)17, 18. Several new phenotypic methods can rapidly assess 60
functional susceptibility using: real-time PCR to detect growth in the presence of antimicrobial agents 19, 61
bioluminescent reporter phage 20, flow cytometry 21, laser light scattering technology 22, and optical 62
screening 23. While these phenotypic methods are more rapid than BMD AST, they cannot definitively 63
detect genetic manipulation. Whole genome sequencing (WGS) can reveal evidence of genetic 64
engineering in bacteria, such as the introduction of genes, mutations and/or plasmids. While there is not 65
a comprehensive understanding of all genes or mutations associated with resistance, WGS is invaluable 66
for understanding novel genotypic/phenotypic relationships. Moreover, unlike PCR–based methods, a 67
comprehensive knowledge of AMR is not required, since de novo WGS is unbiased and can reveal 68
previously undescribed markers without targeted primer sets. Genomic data complements phenotypic 69
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
4
AST results and offers insight into mechanisms of drug resistance, which is especially critical in the age 70
of synthetic biology. 71
Two next-generation sequencing (NGS) approaches are used for microbial WGS: short-read 72
sequencing (e.g. Illumina) and long-read-sequencing (e.g. Pacific Biosciences, PacBio and Oxford 73
Nanopore Technologies, ONT). Short-read sequence data have low error rates and are useful for strain 74
typing and the identification of single nucleotide polymorphisms (SNPs) 24, 25. The short (maximally 75
300nt) read lengths limit the ability to resolve repetitive sequences or structural genomic 76
rearrangements, and can result in unfinished, fragmented de novo assemblies comprised of hundreds of 77
contigs 25, 26. Many AMR gene regions are flanked by repetitive insertion sequences (IS), and short-read 78
sequencing cannot span these regions 25. This can result in inaccurate chromosome or plasmid 79
assemblies, including multiple copies of a repeat region collapsed into one location 25, 27. Long-read 80
sequencing platforms produce reads ranging from several kilobases (kb) to greater than 10 kb in length 81
that can bridge gaps between repetitive regions and allow complete bacterial genome finishing 25. 82
However, long-read sequence data have higher error rates than short-read data 28, and are dominated by 83
insertions and deletions (indels), especially in homopolymeric regions 24, 29. Small plasmids (<7 kb) are 84
difficult to accurately assemble from PacBio reads due to size selection limitations of library preparation 85
and data analysis 30 31. However, short- and long- read NGS data are complementary and together 86
generate highly accurate hybrid genome assemblies 25, 26, 32, 33. 87
Work with Y. pestis and B. anthracis requires specialized facilities to ensure both biosecurity and 88
biosafety. Physical, high-containment laboratory space is often limited, and instruments used for 89
molecular characterization, such as WGS, are typically located in separate areas at lower containment 90
levels. NGS instruments can incur high capital costs, occupy a large footprint or require specialized 91
technical training, making routine on-site sequencing cost- and space- prohibitive 28, 34. By contrast, 92
there is minimal capital cost for the MinION (ONT) device, which is the size of a USB thumb-drive, and 93
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
5
its operation requires a computer or ONT device controller that together occupy little benchtop space. 94
Therefore, nanopore WGS can occur within a high-containment laboratory with limited space. Unlike 95
PacBio sequencing that requires ≥1 µg of genomic DNA (gDNA), only 400 ng of gDNA is required for 96
library preparation using the MinION Rapid Sequencing Kit. Sequencing libraries can also be prepared 97
using a MinION Field Sequencing Kit that includes shelf-stable lyophilized reagents. After a run 98
begins, MinION sequence data are available in minutes enabling real-time sequence analysis. 99
The field of nanopore sequencing is rapidly evolving, and benchmarking studies that use 100
standardized conditions to assess run-to-run variability are lacking. Few studies have assessed the 101
accuracy and consistency of nanopore sequencing using the same strain with consistent chemistry, 102
consumables, and software versions 26, 35, 36. We developed a custom bioinformatics pipeline to compare 103
the genome assemblies from standardized replicate nanopore sequencing experiments to define 104
sequencing quality metrics, establish performance characteristics, and demonstrate the feasibility of 105
rapid long-read WGS for Y. pestis and B. anthracis. The resulting same-day laboratory workflow for 106
DNA isolation, library preparation, nanopore sequencing, and bioinformatics analysis is timely and 107
effective for the rapid detection of known markers related to AMR and evidence of genetic engineering 108
in bacterial biothreat pathogens. 109
110
Results 111
Y. pestis gDNA quality and quantity. All Y. pestis gDNA met ONT criteria for MinION sequencing as 112
measured by optical density absorbancy ratios at 260 nm and 280 nm (A260/280) and at 260 nm and 230 113
nm (A260/230) (Supp. Table 1). Both silica-membrane (SM)- and salt-precipitation (SP)-based gDNA 114
extraction methods produced Y. pestis gDNA of comparable quality and quantity, with intact high 115
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
6
molecular weight (HMW) and little evidence of shearing (Supp. Fig. 1). The SP method produced 116
higher molecular weight DNA compared to SM extraction, but required an hour longer to complete. 117
Y. pestis MinION sequencing technical replicates using Rapid Sequencing and Field Sequencing 118
Kit libraries. Y. pestis S1037, a derivative of the attenuated strain All2237, was sequenced to establish 119
baseline performance and quality metrics for rapid nanopore WGS. Strain S1037 contains two virulence 120
plasmids, pMT1 (~110 kb) and pPCP1 (9.5 kb monomer and ~19 kb dimer), and is ciprofloxacin non-121
susceptible (CIP-NS, MIC 4 µg/ml by BMD AST). Unlike A1122, strain S1037 contains a gyrA single 122
nucleotide polymorphism (SNP) (Table 1) in the previously described Y. pestis quinolone resistance-123
determining region (QRDR) 11. Three replicate rapid sequencing libraries were prepared from one SP 124
gDNA extraction (S1037, SP, Rapid 1, 2, and 3; Fig. 1) and from one SM gDNA extraction (S1027, SM, 125
Rapid 1, 2, and 3; Fig. 1) to evaluate variation among sequencing runs from the same sample. Libraries 126
were also prepared with the Field Sequencing Kit using the same SP and SM gDNA used in the 127
technical replicate study (SP Field and SM Field; Fig. 1). Each library was sequenced for up to 48 128
hours, but only reads generated during the first 16 hours of sequencing were analyzed to allow for a 129
standardized comparison of data sets. 130
After base calling, FASTQ reads with an average quality (Q) score of <7 were discarded, 131
resulting in removal of 4 to 74% of total reads from analysis (Fig. 1, Supp. Table 3). The MinKNOW 132
software became unresponsive after ~11-12 hours of sequencing for SP Rapid 1 and SM Rapid 1, and 133
both were restarted and continued for ~6-8 additional hours (Supp. Table 2). While SP Rapid 1 134
appeared unaffected by the software problem, SM Rapid 1 lost ~74% of reads following basecalling and 135
contained the fewest reads ≥Q7 of any data set. All flowcells met ONT’s minimum active pore 136
requirement (≥800) during platform QC, but the flowcell used for SP Rapid 3 had an initial QC of 871 137
active pores that decreased to 759 after loading the library (Fig. 1, Supp. Table 3). SP Rapid 3 lost ~6% 138
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
7
of reads following base-calling, but the amount of data generated and average Q-score resembled other 139
technical replicate runs (Fig. 1, Supp. Table 3). 140
The number of ≥Q7 FASTQ reads generated was inconsistent within and among the SP and SM 141
replicate runs. During the first 16 hours of sequencing, the SP technical replicates generated 310,906-142
639,598 ≥Q7 reads, while SM experiments generated 305,824-1,172,803 ≥Q7 reads. The average read 143
lengths within and among the SM and SP replicates varied. Both average read length and number of 144
reads generated contributed to the amount of useable data (Fig. 1, Supp. Table 3). For example, SP 145
Rapid 2 (388,435 reads) had more ≥Q7 reads than SP Rapid 3 (310,906 reads), but SP Rapid 3 read 146
lengths were ~1.75X longer. The amount of SP Rapid 3 run data generated was greater (2.6 Gb vs. 1.8 147
Gb). Overall, the range of useable data generated by the SP replicate experiments (1.8 to 3.1 Gb) was 148
comparable to SM (1.3 to 4.6 Gb). The average Q-scores of the SM and SP Rapid Sequencing technical 149
replicates ranged from 12.5 to 13.2. The amount of sequencing data (≥Q7 reads and total Gb) generated 150
using the Field Sequencing Kit was comparable to the Rapid Sequencing Kit (Fig. 1, Supp. Table 3). 151
The Field Sequencing Kits produced data sets with the overall highest SP (13.5) and SM (13.9) Q-152
scores. 153
The strand-to-pore ratio is a quality metric that represents the quantity of MinION flow cell 154
pores actively sequencing DNA (pores in strand) relative to total available pores. The ratio was 155
recorded after ~10 minutes of sequencing per run to assess the library preparation quality (Supp. Table 156
2). A ≥50% strand-to-pore ratio was expected for the RAD-004 Rapid Sequencing Kit (ONT 157
Community and personal communication). For SP Rapid 1, 2, and 3, the average strand-to-pore ratios 158
were 60%, 63%, and 64%, respectively; and for SM Rapid 1, 2, and 3, the ratios were 80%, 75%, and 159
82%, respectively. Ratios increased to >80% within the first 1-2 hours of sequencing for all SM and SP 160
replicates. 161
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
8
MinION sequencing performance and Y. pestis S1037 de novo assembly using retrospective 162
downsampling of passed fast5 reads. A retrospective downsampling of passed fast5 reads was 163
performed to determine the amount of data necessary to produce high-quality genome assemblies. 164
Assembly quality was measured by indel frequency, genome coverage, ability to assemble large and 165
small plasmids, and accurate detection of the gyrA SNP associated with the CIP-NS phenotype (Fig. 2). 166
Minimal improvements in overall assembly quality were observed when >75,000 reads were 167
analyzed (Fig. 2). De novo assemblies using ≤75,000 reads did not consistently assemble the Y. pestis 168
chromosome, or large (pMT1) and small (pPCP1) plasmids. Subsequent nanopore analyses were 169
performed with 100,000 reads only. The average number of mismatches for the SP and SM 100,000-170
read data sets ranged from 220 to 368, while Illumina MiSeq and PacBio assemblies of the same strain 171
contained 44 and 29 mismatches, respectively (Fig. 2a). Thousands of indels (3,151 to 8,251) were 172
detected in nanopore assemblies (Fig. 2b), compared to Illumina (18 indels) and PacBio (93 indels), and 173
were predominantly located in homopolymeric regions. Indels were not significantly reduced in 174
nanopore assemblies supplemented with >100,000 additional reads (data not shown). Using 100,000 175
reads, averaged across the three Rapid SP and three Rapid SM Y. pestis S1037 data sets, the poly-N 176
length and the fraction of poly-N's correct in the assemblies was plotted for each nucleotide type using 177
both the RACON and Nanopolish error corrected assemblies (Supp. Fig. 2). Nanopolish increased the 178
fraction of Y. pestis poly-N lengths correctly called, but this fraction exponentially decreased for lengths 179
>4. Increased accuracy was observed when calling poly-As and Ts compared to poly-Gs and Cs. The 180
gyrA point mutation, which is not located in a homopolymeric region, was correctly identified in all Y. 181
pestis assemblies (Fig. 2g). 182
The first 100,000 fast5 reads yielded >99% identity with A1122 across the ~4.6 megabase Y. 183
pestis genome for all experiments (Fig. 2c). For the chromosome, all nanopore assemblies generated a 184
single contig, except the SP Rapid 1 data set (2 contigs observed) (Fig. 2g). The average depth of 185
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
9
chromosome coverage was 101X for all replicate Y. pestis S1037 data sets (Fig. 2d). Plasmid pMT1 was 186
consistently assembled to completion, with multiple overlaping reads spanning the entire plasmid length 187
in all 100,000-read data sets, however pPCP1 was not. Un- or misassembled pPCP1 fragments were 188
detected (i.e. assembly of multiple small fragments with homology to pPCP1 or bases in different order 189
than the reference sequence) (Fig. 2g). Reads corresponding to pPCP1 with lengths greater than 9.5 kb 190
and 19 kb were also identified, and are discussed below. Overall, the SP sequencing runs produced 191
higher average coverage for both pMT1 (104X) and pPCP1 (3,567X) compared to the SM runs (22X 192
and 431X, respectively) (Fig. 2e and 2f). Among the SM and SP Rapid and Field Sequencing Kit runs, 193
the average time to generate 100,000 fast5 reads was ~2 hours (Supp. Table 2) and data analysis 194
(including base-calling) required ~2.5 additional hours. For Y. pestis, SM gDNA extraction from an 195
agar culture isolate, MinION WGS, de novo assembly and analysis can be completed in ≤6.5 hours (Fig. 196
3). 197
Detection of multimeric pPCP1 forms in Y. pestis S1037 and A1122 assemblies. Analysis of read 198
length distributions among the SP and SM technical replicates revealed a greater number of long reads 199
generated in SP experiments (Supp. Fig. 3). SP data sets also contained more ~18.2 kb reads, which 200
correspond to the pPCP1 dimer (2-mer) form. Most reads from Y. pestis S1037 SP data sets with 201
regions homologous to pPCP1 were ~9.1 kb (~17%) or ~18.2 kb (~80%) (Supp. Fig. 4), which 202
correspond to the previously described monomer (~9.5 kb) and dimer (~19 kb) forms 38. A few reads 203
(~3%) with lengths >18.2 kb were detected in the 100,000-read data sets from SP extracted gDNA, but 204
were absent in SM data sets (Supp. Fig. 3 and Supp. Fig. 4). Dimer, trimer, and tetramer pPCP1 forms 205
are linked together in a 5′-3′ head-to-toe fashion (Fig. 4). Multimers were present in the nanopore data 206
sets derived from the Field Sequencing and three Rapid Sequencing Kit preparations of the same SP 207
extracted gDNA (SP-1). An S1037 assembly using Illumina short-read data resulted in three contigs 208
(688 nt, 1,894 nt, and 7,375 nt) with pPCP1 homology, and did not assemble the ≥19 kb plasmid forms 209
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
10
(Fig. 2g). An S1037 PacBio assembly resulted in 3 contigs corresponding to the chromosome 210
(4,576,792 nt), pMT1 (95,849 nt), and a partially assembled pPCP1 (21,732 nt) (Fig. 2g). 211
An additional Y. pestis S1037 SP gDNA extraction (SP-2, Supp. Table 1) was prepared using 212
care to minimize DNA shearing (e.g. without vortexing and minimal pipetting) to improve WGS 213
detection of multimer pPCP1 forms. Following sequencing using the Rapid Kit, the resulting 100,000-214
read de novo assembly yielded comparable sequence quality to the technical replicate assemblies (data 215
not shown) and overall contained more reads that were >18.2 kb than the S1037 SP-1 data set (Supp. 216
Fig. 5a). The average read length (10,960 nt) was greater than the longest average read length (7,800 nt) 217
observed in the technical replicate experiments. The SP-2 assembly contained an increased fraction of 218
~18.2 kb reads that corresponded to the pPCP1 plasmid dimer (Supp. Fig. 5b). Few reads with pPCP1 219
homology measured ~45,500 nt (4 reads) and 54,600 nt (1 read), which correspond to putative 5-mer 220
and 6-mer forms (Supp. Fig. 5b). 221
To determine if Y. pestis A1122, the parent strain of S1037, also contained reads >18.2 kb with 222
pPCP1 homology, SP and SM extracted gDNA was used for Rapid Sequencing Kit library preparation 223
and nanopore sequencing. The 100,000-read de novo assemblies from A1122 yielded comparable 224
sequence quality to the technical replicate assemblies (data not shown). A1122 reads >18.2 kb with 225
pPCP1 homology (Supp. Fig 6) were assigned 3- or 4-mer forms. When >100,000 reads were analyzed, 226
additional pPCP1 homologous reads in this size range or greater were not detected. A PacBio assembly 227
of A1122 resulted in 3 contigs corresponding to the chromosome (4,586,979 nt), pMT1 (96,144 nt), and 228
pPCP1 (8,935 nt) (data not shown). 229
B. anthracis gDNA quality and quantity. The rapid Y. pestis nanopore sequencing approach was 230
applied to B. anthracis strains with different AMR profiles to evaluate de novo assembly quality and to 231
identify plasmids and known AMR markers/mutations (Table 1). The SM DNA extraction method was 232
selected over the SP method for this study based on speed (1.5 hours time saving). Results from SP 233
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
11
extracted gDNA were comparable in quality and quantity (data not shown). B. anthracis SM extractions 234
produced HMW DNA (>30 kb) with minimal shearing (Supp. Fig. 7) at concentrations acceptable for 235
MinION sequencing (Supp. Table 1). The SM extraction for B. anthracis Sterne/pUTE29 met ONT’s 236
absorbancy ratio requirements, but the A260/230 ratio for strains 411A2 and UT308 were lower than the 237
recommended range of 2.0-2.2. Repeated extraction attempts did not improve the ratios for these strains 238
(data not shown). Additional clean-up by DNA precipitation was not performed to avoid extending the 239
procedure time and to minimize gDNA shearing. Rapid Sequencing Kit libraries were prepared using 240
gDNA from UT308, 411A2, and Sterne/pUTE29 and Field Sequencing Kit libraries were prepared using 241
gDNA from the latter two strains. 242
B. anthracis MinION WGS and retrospective down sampling analysis. Following nanopore 243
sequencing of B. anthracis strains, 100,000 passed fast5 reads were analyzed (Fig. 5). The average 244
number of mismatches among the Rapid and Field Sequencing Kit B. anthracis data sets ranged from 245
169 to 594 (Fig. 5a). B. anthracis Sterne/pUTE29 Illumina MiSeq and PacBio assemblies contained 246
fewer mismatches (107 and 87, respectively). Nanopore B. anthracis data sets contained thousands of 247
indels (6,262-11,574), compared to Illumina (24 indels) and PacBio (21 indels) assemblies (Fig. 5b). 248
Nanopolish increased the fraction of B. anthracis poly-N lengths correctly called, but this fraction 249
decreased with lengths >4. Increased accuracy was observed when calling poly-As and -Ts compared to 250
poly-Gs and -Cs (Supp. Fig. 8). 251
The 100,000-read analysis yielded >99% identity across the ~5.2 Mb B. anthracis genome in 252
every data set analyzed (Fig. 5c). B. anthracis nanopore assemblies contained single contigs for the 253
chromosome (66X average depth of coverage) and pXO1 (~180 kb, 223X average depth of coverage), 254
which were consistantly assembled to completion (Fig. 5d, e, g). Analysis of 100,000 fast5 reads 255
provided sufficient coverage to detect known AMR markers (chromosomal or plasmid-associated) 256
(Table 1). For Sterne/pUTE29, the ~7.3 kb pUTE29 plasmid was consistantly assembled using both 257
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
12
Rapid and Field Kit nanopore data (average 5,621X depth of coverage) (Fig. 5f, g). However, pUTE29 258
was partially assembled or unassembled using Illumina and PacBio data, respectively (Fig. 5g). The 259
gyrA and parC mutations associated with quinolone resistance were detected in the CIP-NS strain 260
411A2 (Fig. 5g). Among the SM Rapid and Field Kit runs, the average time to generate 100,000 fast5 261
reads was ~3 hours (Supp. Table 2) and data analysis required ~ 2.5 additional hours (including base-262
calling). For B. anthracis, SM DNA extractions from an agar culture isolate, MinION WGS, and de 263
novo assembly and analysis can be completed in ≤ 8.5 hours (Fig. 3). 264
Discussion 265
During an outbreak or public health emergency involving bacterial biothreat pathogens, critical 266
decisions are made immediately after a release, when details about the causative agent(s) are limited, 267
and at every stage of the response. Rapid reporting of accurate laboratory testing results (pathogen 268
identification, strain typing, AST) are critical for informing a public health response. For example, 269
detection of genetic manipulation and/or unexpected AMR profiles could signal the introduction of 270
plasmids, AMR genes, or other evidence of engineering. Speeding this response to limit the impact of 271
biothreat incidents is a goal of the 2018 U.S. National Biodefense Strategy 39. 272
Culture- and DNA-based methods have improved the ability of laboratories to rapidly 273
characterize microbial pathogens. The introduction of NGS platforms has accelerated these efforts, and 274
WGS offers a comprehensive genome-wide snapshot of microbial pathogens. However, DNA 275
sequencing using Illumina and PacBio platforms is time-consuming compared to nanopore sequencing 276
and relies on instuments with a large footprint that are often space-prohibitive. Advances in long-read 277
microbial DNA sequencing with the MinION instrument have transformed the fields of NGS and 278
molecular characterzation, enabling real-time pathogen sequencing for epidemiologic investigations40. 279
MinION WGS is promising as a front-line strategy for on-site microbial characterization due to the 280
speed of sample preparation, relatively low cost and device portability 41. Here, we describe a nanopore 281
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
13
sequencing approach for Y. pestis and B. anthracis with a turnaround time that is ~90% faster than 282
current Illumina MiSeq workflows and yields complete genome assemblies, including large and small 283
plasmids, and accurately identifies AMR genes and mutations within a work day. 284
For the widespread adoption of WGS-based approaches in public health and clinical laboratories, 285
validated methods that are simple to set up are essential. During an outbreak investigation, laboratory 286
methods must be reliable and reproducible. Both commercially-available DNA extraction methods 287
evaluated here were capable of generating gDNA of sufficient quality, quantity, and molecular weight 288
for Y. pestis and B. anthracis MinION sequencing. The SM method was the fastest (30 min for Y. pestis 289
and 1.5 hr for B. anthracis) and simple to perform. Despite the sub-optimal quality of some B. anthracis 290
extractions, de novo assemblies using data from these samples were unaffected, indicating downstream 291
library preparation and sequencing reactions were not compromised. Further optimization could 292
improve the extraction process and DNA quality, and work is ongoing to evaluate the sequencing 293
performance of DNA extracted from wild type B. anthracis encapsulated vegetative cells. 294
Shelf-stable reagents that do not require cold chain storage are preferred for rapid deployment 295
and use in the field. The lyophilized Field Sequencing Kit, which does not require a cold chain, 296
performed comparably to the Rapid Kit. To identify the minimum amount of sequencing time needed to 297
consistently generate enough data for high-quality Y. pestis and B. anthracis de novo genome 298
assemblies, a retrospective down sampling analysis was performed. On average, only 100,000 fast5 299
reads were necessary to assemble the chromosome and large and small plasmids with sufficient 300
coverage to correctly identify known AMR genes and mutations. Additional reads did not strengthen 301
the assembly quality, and any improvement of coverage and number of mismatches, indels, and % 302
identity was negligible. Our laboratory and bioinformatic workflows were achieved in <8.5 hours 303
starting from a culture isolate, and accurately detected genomic AMR markers in approximately half the 304
time required to complete functional AST for Y. pestis and B. anthracis. 305
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
14
We demonstrate that thresholds and quality metrics can be established that are indicative of 306
nanopore sequencing quality. MinION sequencing performance was assessed by comparing data from 307
two technical replicate experiments where the DNA extraction, library preparation kit, sequencing 308
conditions, MinKNOW software, and MinION instrument and flowcell versions remained the same. 309
Since the majority of nanopore data is generated during the first 16 hours of sequencing when the most 310
active pores are used 26, 36, only the passed reads generated during this time were analyzed in order to 311
standardize the technical replicate data sets. Considerable run-to-run variation was observed among the 312
SP and SM replicates from the same biological sample. We demonstrated that the number of active 313
pores was not predictive of the amount or quality of data. These inconsistencies could result from 314
intrinsic flow cell variability and day-to-day loading efficiencies. However, comparable average quality 315
scores were observed for basecalled reads (≥Q7) for both the SP and SM data sets. 316
The rapid sequencing workflow developed in this study aimed to deliver the fastest turnaround 317
time per bacterial isolate using one flowcell per run. The study avoids the complexities of 318
multiplexing/demultiplexing sequences during data analysis. Due to the variability we observed in 319
flowcell performance, multiplexing the same sample on a single flow cell could be used to establish 320
“within run” precision. Improved rapid multiplex capabilities have been described since this study 321
began, but multiplexing is a balance between sequencing speed and genome coverage. Additional 322
considerations, including incorporation of positive and negative DNA controls, are needed to establish 323
indicators of sequencing quality within and between runs. Future work must build on this foundation to 324
ensure the accuracy of nanopore data and expand MinION sequencing to include multiple complex 325
samples, such as clinical specimens. 326
The assay parameters established in the Y. pestis technical replicate analysis were applied to B. 327
anthracis by sequencing three AMR strains: Sterne/pUTE29, 411A2, and UT308. With the exception of 328
an additional one hour lysis step for B. anthracis, the same SM DNA extraction protocol was employed 329
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
15
and 100,000 fast5 reads were sufficient for >99% coverage of the chromosome and plasmids (both 330
native and introduced). While functional phenotypic methods remain essential for accurate AST, WGS 331
can be used to detect point mutations that predict AMR in some biothreat pathogens 42. For both Y. 332
pestis and B. anthracis data sets, MinION reads were error-prone compared to Illumina and PacBio data 333
and contained more indels, a well described nanopore error type 33, 34, 43. Therefore, open reading frame 334
prediction and SNP-typing were not performed for MinION assemblies due to the large number of 335
indels. Overall, use of the Nanopolish error correction tool reduced indels, but accurate basecalls could 336
not be made in homopolymer regions (poly-N lengths >4). For this reason, the homopolymeric regions 337
associated with penicillin resistance in B. anthracis, sigP-rsiP 42, were not included for analysis in this 338
study. However, QRDR mutations associated with AMR in B. anthracis and Y. pestis that are located in 339
non-homoploymeric regions of the genome were correctly identified in nanopore assemblies. Currently, 340
a hybrid (long- and short-read) assembly approach is necessary to construct complete genomes with 341
sufficent quality for variant calling (for SNP and indel detection) with nanopore data. 342
Y. pestis isolates harboring various multimeric pPCP1 forms in addition to, or in place of, one of 343
the three prototypical plasmids were described previously 38, 44. The 9.5 kb and 19 kb pPCP1 plasmids 344
are stable upon multiple passages and coexist; however, one plasmid is often present at a higher 345
concentration than the other, suggesting a level of genetic control of expression 38. Chu et al.38 noted the 346
presence of pPCP1 dimer forms in strain A1122, regardless of DNA extraction method used 38. Other 347
atypical Y. pestis plasmids have been described in Angola and C790 strains, including a chimeric 348
114,570 nt form with two tandemly repeated pPCP1 copies integrated into pMT1 45, and a pMT1-pCD1 349
chimera 46, respectively. The presence of an IS element, IS100, on pPCP1, pCD1, and pMT1 may have 350
facilitated the formation of these multimeric and chimeric plasmids via homologous recombination 45, 46. 351
Understanding these plasmid size and form variations among naturally occurring Y. pestis isolates is 352
important for distinguishing between natural and potentially engineered variation 44. 353
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
16
In this study, the pPCP1 monomer and dimer forms were not consistently assembled correctly 354
because multiple pPCP1 reads of different sizes introduced considerable ambiguity into the assembly 355
graphs. Multiple overlapping pPCP1 reads corresponding to closed-circle multimer plasmids of various 356
increasing sizes were also identified. The longest pPCP1 multimers were detected in a Y. pestis S1037 357
assembly for which additional steps to reduce gDNA shearing during the SP extraction were employed. 358
Analysis of 100,000 reads revealed four 5-mer reads and one 6-mer read. Additional analysis of 359
300,000 reads identified eleven 5-mer reads and only one 6-mer read (data not shown); none of which 360
were detected in SP technical replicate data sets. The challenges of using long-read sequence data for 361
plasmid assembly, discussed below, and/or DNA shearing may explain the rarity of finding these large 362
multimers. 363
Chu et al.38 also described the presence of multimeric pPCP1 forms in A1122 >19 kb by 364
Southern blot analysis and they explain these multimer forms may contribute to bacterial plasmid 365
maintenance. While the Y. pestis genome is considered monomorphic 47, there is evidence for genome 366
plasticity due to large genomic rearrangements (translocations and inversions), as well as gene loss due 367
to homologous recombination between multiple copies of highly repetitive IS elements found on all 368
three plasmids and on the chromosome 48, 49, 50. We speculate that in Y. pestis strains S1037 and All22 369
the 9.5 kb pPCP1 plasmid has undergone homologous recombination with itself and with the 19 kb 370
pPCP1 plasmid to form the 2-mer to 6-mer species. 371
The power of MinION sequencing to resolve complex genomic structures was evident in the 372
ability to identify expected and novel pPCP1 plasmid multimers. These pPCP1 multimers were only 373
partially assembled using S1037 and A1122 Illumina and PacBio data. In addition, the A1122 reference 374
sequence (CP002956.1), assembled from Illumina data, contains only the pPCP1 monomer form. This 375
is an example of the limitation of using short-read data to assemble complex genomic regions and 376
highlights that assemblies in public databases may be incomplete. 377
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
17
Small plasmid assembly using long-read data is challenging for many algorithms 51. Unlike the 378
issues associated with assembling multiple copies of the Y. pestis multimeric plasmid, the pUTE29 (~7.3 379
kb) plasmid was readily assembled in all B. anthracis data sets. However, the bioinformatics pipeline in 380
this study was tailored to assemble these plasmids. It is important to note that these parameters may not 381
be generalizable for every data set. While this work was performed using a small number of strains, the 382
availablity of resistant Y. pestis and B. anthraicis isolates is limited; for both laboratory derived and 383
naturally-occuring isolates. Further optimization is required to assure the pipeline can correctly identify 384
different plasmid and AMR profiles, and this is currently under investigation. Since the completion of 385
this work, new assemblers, e.g. WTDBG2 52 and FLYE 53, have been released and address the 386
limitations of plasmid assembly using long-read sequence data. 387
Other limitations of nanopore sequencing (e.g. cost, throughput, and accuracy) could be 388
mitigated by future updates to the software, hardware, library preparation kits, and reagents 41. But, 389
with these improvements and updates come challenges such as locking down assay parameters to assess 390
and compare performance 36, 54, 55. The workflows described here for same-day bacterial MinION WGS, 391
assembly, and bioinformatics analysis for biological threat pathogens show the utility of nanopore 392
sequencing for rapid diagnostics that will be critical during a biothreat event. As NGS technologies 393
continue to evolve and stablize, real-time nanopore WGS for bacterial characterization, including AMR 394
marker detection, chromosome, large and small plasmid assembly, and accurate SNP detection, may 395
soon be a reality. 396
397
Methods 398
Biosafety Procedures. Only select agent-excluded strains were used in this study (Y. pestis A1122, Y. 399
pestis S1037, B. anthracis Sterne/pUTE29, B. anthracis 411A2, and B. anthracis UT308). All 400
procedures were performed in a biosafety level 2 (BSL-2) laboratory by trained personnel wearing 401
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
18
appropriate personal protective equipment (PPE) according to the CDC/NIH publication Biosafety in 402
Microbiological and Biomedical Laboratories, 5th edition 56. 403
Bacterial Strains. Study strains are listed in Table 1. Y. pestis A1122 was passaged on increasing 404
concentrations of CIP using a previously described method 11 to create the CIP-NS strain, S1037. This 405
strain was developed as a control strain with approval from the CDC Institutional Biosafety Committee 406
and Institutional Biosecurity Board. 407
Growth Conditions. All strains were cultured for 16-20 hours (B. anthracis) or 24 hours (Y. pestis) on 408
BD BBLTM Trypticase Soy Agar II with 5% sheep blood (SBA) (Thermo Fisher Scientific, Waltham, 409
MA, USA) at 35°C in ambient air from glycerol stocks stored at -70°C. 410
Antimicrobial Susceptibility Testing. Broth microdilution (BMD) was performed to determine 411
antimicrobial susceptibility following the Clinical and Laboratory Standards Institute guidelines 57. 412
Genomic DNA Extraction. gDNA for each study strain was extracted from agar plate cultures, 413
described above. For Y. pestis, colonies were transferred to 200µl of Phosphate Buffer Solution (0.01M, 414
pH 7.40) and vortexed until completely resuspended. B. anthracis colonies were transferred to 200µl of 415
lysis buffer (20mM Tris HCL pH 8.0, 2mM EDTA, 1.2% Triton X-100) and incubated for 60 min at 416
37°C. Y. pestis and B. anthracis gDNA was isolated using the MasterPure Complete DNA and RNA 417
Purification Kit (Lucigen, Middleton, WI), a salt-precipitation (SP) extraction method, and the QIAamp 418
DNA Blood Mini Kit (Qiagen Valencia, CA), a silica-membrane (SM) extraction method. Both 419
extractions were performed according to manufacturer’s specifications. Y. pestis extracts were eluted in 420
75µl of Elution Buffer (EB) (10 mM Tris-Cl, pH 8.5, Qiagen, Valencia, CA) and B. anthracis extracts 421
were eluted in 50µl EB. gDNA samples were subsequently stored at 4°C. 422
Assessment of gDNA quality and quantity. gDNA extractions were evaluated for quantity and quality 423
using the Qubit dsDNA BR Assay Kit (Invitrogen, Waltham, MA) and NanoDrop 2000 424
spectrophotometer (ThermoFisher Scientific, Pittsburgh, PA) following manufacturer’s instructions. 425
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
19
gDNA integrity was assessed following 0.6% agarose gel electrophoresis of 100ng of each sample to 426
visualize the presence of intact, high molecular weight (HMW) DNA. The Quick-Load 1 kb Extend 427
DNA Ladder (New England BioLabs, Ipswich, MA) was used as a molecular marker. 428
Nanopore DNA library preparation. Libraries were prepared using 400ng of gDNA per strain and the 429
Rapid Sequencing Kit, SQK-RAD004 (ONT, Oxford, England) or the Field Sequencing Kit, SQK-430
LRK001 (ONT, pre-released), according to manufacturer’s specifications. All Field Sequencing Kit 431
components were stored at room temperature prior to use. Y. pestis S1037 libraries were prepared in 432
triplicate on the same day from the same gDNA extraction and stored at 4°C for no longer than 3 days. 433
Nanopore sequencing. Sequencing was performed on a MinION (MK1B) for 16-48 hours using R9.4.1 434
flowcells and MinKNOW (version 18.01.6). All flowcells met the minimum ≥800 active pore count 435
requirement immediately prior to sequencing. Each sample was sequenced on a separate flowcell and 436
each flowcell was used for only one sequencing run. For the technical replicate study, the same 437
laboratorian loaded each library into the flowcells to minimize variation. The replicate Y. pestis S1037 438
libraries were sequenced on sequential days over a 3 day period. To standardize the comparison of the 439
technical replicate Rapid Kit and Field Kit Y. pestis S1037 data sets, only raw fast5 reads from the 440
“passed folder,” generated during the first 16 hours of sequencing were analyzed. Raw fast5 reads from 441
the “failed folder” were not included in this analysis. Sequencing runs included in this study are 442
described in Supp Table 2. 443
Bioinformatics pipeline. A pipeline was developed to perform rapid de novo genome and plasmid 444
assemblies, to detect AMR variants and genes, and to identify variants (mismatches and indels), see 445
Supp Fig. 9. 446
Basecalling. Basecalling of raw fast5 reads was performed using the Albacore basecalling 447
utility (version 2.3.10), to produce raw FASTQ reads. Raw FASTQ reads were further processed using 448
Porechop (version 0.2.3) to remove adapters and any potentially chimeric sequences. Processed FASTQ 449
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
20
reads were used to generate a draft assembly using minimap2 58 (version 2.10-r770-dirty) and miniasm 450
59 (0.2-r168-dirty). First, all-versus-all alignment of the FASTQ reads was performed using minimap2 451
with the option ‘-x ava-ont’. The resulting minimap2 alignments and input reads were used to create a 452
draft assembly with miniasm, using the options ‘-s 1750 –h 1000 –I .5’. Errors in the draft assembly 453
generated by miniasm were iteratively improved using RACON 60 (version 1.1.1). For 5 iterations, 454
minimap2 was used to align the reads to an assembly and the alignments were used by RACON to 455
produce a consensus assembly. The first iteration used the draft assembly generated by miniasm, while 456
subsequent iterations used the consensus assembly generated by the previous RACON step. The final 457
output of the RACON process was further processed using Nanopolish 61 (version 0.10.1). Nanopolish 458
was run in the variants mode using the options ‘--faster --consensus --min-candidate-frequency 0.1’, and 459
the vcf2fasta feature was used to create the final consensus sequence. 460
Detection of known genes related to AMR. The ResFinder62 database of AMR genes was 461
queried against the final Nanopolish assembly using BLASTn 63 and the options ‘-perc_identity 90 –462
evalue 1e-10’ to identify potential AMR sequences in each assembly. AMR gene results were clustered 463
using BEDTools 64 merge (version 2.24.0) with the options ‘-d -30’ in order to identify unique, non-464
overlapping regions with high similarity to AMR sequences. For each cluster, BEDTools intersect was 465
used to find the gene with the highest % identity, which most completely covered the cluster. 466
Detection of known mutations related to AMR. Basecalls from the nanopore sequencing reads 467
were compared to a reference genome using minimap2 58 by aligning reads against the corresponding 468
reference sequence (Y. pestis A1122, Accession No. CP002956.1, or B. anthracis Ames Ancestor, 469
Accession No. AE017334) with the options ‘-a -x map-ont’. The samtools 65 mpileup utility (version 470
1.2) was used to find the numbers of each base seen in the reads at each position in the QRDR gene 471
coding regions. These basecall counts were used with bcftools 65 call (version 1.8) to identify variant 472
sites in those genes. 473
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
21
Characterization of Plasmids. Contigs ≥500,000 nt were excluded from plasmid 474
characterization as they likely represent chromosomal sequence. Remaining contigs ( ≤500,000 nt) 475
were queried with a custom database of plasmids and artificial sequences using the pblat 66 utility 476
(version 35). A custom algorithm, pChunks, was used to find the set of plasmid sequences present in the 477
assembly such that the maximal amount of plasmid sequence was included. 478
Detection of mismatches, insertions and deletions. The consensus assembly generated by the 479
last iteration of RACON was aligned with the corresponding reference sequence (Y. pestis A1122 or B. 480
anthracis Ames Ancestor) to identify mismatches and small indels using the dnadiff utility from the 481
MUMmer 67 package (version 4.0.0beta2). Gaps from the resulting ‘rdiff’ (reference differences) and 482
‘qdiff’ (query differences) outputs were used to identify indels in one genome relative to the other. 483
Sequence gaps ≥100 bases, and with positive lengths in both the assembly and reference sequences 484
(denoting real insertions and not duplications) were kept as insertions. Mismatches identified by 485
MUMmer were compared to the presence of homopolymeric regions in the reference sequence to 486
determine the number of homopolymeric regions that were assembled correctly. Coverage of the 487
reference chromosome and plasmids was characterized by aligning reads against the references 488
sequences using minimap2, and enumerating the mean per-base coverage. 489
Downsampling. Y. pestis and B. anthracis reads were retrospectively assembled using the above 490
analysis in the order they were generated for the first 25,000, 50,000, 75,000, 100,000 passed fast5 reads 491
to determine the minimum number of sequencing reads required for chromosome and plasmid assembly, 492
as well as the identification of known AMR genes and mutations. Per assembly, the number of 493
mismatches, indels, and percent identity were determined using an alignment to the reference sequence. 494
The chromosome and plasmid precent coverage was calculated. 495
pPCP1 Analysis. For each SP S1037 Y. pestis 100,000-read data set, basecalled reads of ≥1000 nt were 496
aligned to the pPCP1 reference sequence from Y. pestis A1122, Accession No. CP002956.1, using 497
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
22
minimap2. Reads that aligned to the pPCP1 sequence across at least 95% of their length were included 498
for analysis. Reads were assigned to a 1, 2, 3, 4, 5, or 6-mer version of the pPCP1 sequence based on 499
their length and the fractions of these versions in each sample were used to determine the pPCP1 500
multimer fraction (Supp Fig. 4). All SP Y. pestis S1037 data sets contained reads corresponding to 2, 3, 501
and 4-mer forms of plasmid pPCP1, and SP Rapid 1 was chosen as the representative data set to use to 502
generate Circos plots 68 representing the distribution of reads from the plasmid multimers. From the SP 503
Rapid 1 aligned reads, a representative read for each multimeric pPCP1 sequence was choosen to 504
represent the complete plasmid. The representative read was rotated such that the first base of the read 505
aligned to the first base of the pPCP1 reference sequence. The remaining reads for each multimeric 506
pPCP1 sequence were aligned against the reference read. Known coding sequences from pPCP1 were 507
found in the multimeric forms by querying these features against the representative read using BLASTn 508
63. 509
Illumina library preparation, sequencing, and analysis. Sequencing of Y. pestis S1037 and B. 510
anthracis Sterne/pUTE29 was performed on an Illumina MiSeq using the v2 kit (2x250 nt paired-end) 511
with Nextera XT libraries. Sequencing adapters were removed and reads were trimmed using Cutadapt 512
(v1.14). De novo assembly was performed using the SPAdes Genome Assembler (v3.10.0). The 513
consensus Illumina assemblies were aligned with the corresponding Y. pestis A1122 or B. anthracis 514
Ames Ancestor reference assemblies to identify mismatches and small indels using the dnadiff utility 515
from the MUMmer 67 package (version 4.0.0beta2). 516
PacBio library preparation, sequencing, and analysis. Libraries (10 kb) of Y. pestis S1037 and B. 517
anthracis Sterne/pUTE29 were prepared with the SMRTbell Template Prep Kit 1.0 (Pacific Biosciences, 518
Menlo Park, CA) and bound using the DNA/Polymerase Binding Kit P6v2 (Pacific Biosciences, Menlo 519
Park, CA). The bound libraries were then loaded on one SMRTcell and sequenced with C4v2 chemistry 520
for 270 min movies on the RSII instrument (Pacific Biosciences, Menlo Park, CA). Sequence reads 521
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
23
were de novo assembled using RS_HGAP Assembly.3 in the SMRT Analysis 2.3.0 portal. The 522
consensus PacBio assemblies were aligned with the corresponding Y. pestis A1122 or B. anthracis Ames 523
Ancestor reference assemblies to identify mismatches and small indels using the dnadiff utility from the 524
MUMmer 67 package (version 4.0.0beta2). Comparison of PacBio reads corresponding to the pPCP1 525
sequence was carried out as described above for the analysis of pPCP1. 526
Data Availability 527
The nanopore, illumina, and PacBio sequence data sets generated and analyzed in the current study are 528
available from NCBI in the Sequence Read Archive under bioproject PRJNA523610. All nanopore 529
sequencing runs and corresponding BioSample accession numbers are also listed in Supp Table 2. All 530
data supporting the findings of this study are available within the article and its supplementary 531
information files, or are available upon request. 532
533
References 534
1. Wagar E. Bioterrorism and the Role of the Clinical Microbiology Laboratory. Clin Microbiol Rev 29, 175-535 189 (2016). 536
537 2. Jernigan DB, et al. Investigation of bioterrorism-related anthrax, United States, 2001: epidemiologic 538
findings. Emerg Infect Dis 8, 1019-1028 (2002). 539
540 3. Jernigan JA, et al. Bioterrorism-related inhalational anthrax: the first 10 cases reported in the United 541
States. Emerg Infect Dis 7, 933-944 (2001). 542
543 4. Shandomy S IA, Raizman E, Bruni M, Palamara E, Pittiglio C, Lubroth J. Anthrax outbreaks: a warning for 544
improved prevention, control and heightened awareness. In: Empres Watch. Food and Agriculture 545 Organization of the United Nations (2016). 546
547 5. Mead PS. Plague in Madagascar - A Tragic Opportunity for Improving Public Health. N Engl J Med 378, 548
106-108 (2018). 549
550 6. Weigel LM, Morse S. A. Implications of antibiotic resistance in potential agents of bioterrorism. In 551
Mayers DL (ed), Antimicrobial Drug Resistance Humana Press, Totowa, NJ, 1293-1312 (2009). 552
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
24
553 7. Chen Y, Succi J, Tenover FC, Koehler TM. Beta-lactamase genes of the penicillin-susceptible Bacillus 554
anthracis Sterne strain. J Bacteriol 185, 823-830 (2003). 555
556 8. Pomerantsev AP, Shishkova NA, Marinin LI. Comparison of therapeutic effects of antibiotics of the 557
tetracycline group in the treatment of anthrax caused by a strain inheriting tet-gene of plasmid pBC16. 558 Antibiot Khimioter 37, 31-34 (1992). 559
560 9. Price LB, Vogler A, Pearson T, Busch JD, Schupp JM, Keim P. In vitro selection and characterization of 561
Bacillus anthracis mutants with high-level resistance to ciprofloxacin. Antimicrob Agents Chemother 47, 562 2362-2365 (2003). 563
564 10. Ryzhko IV, et al. Virulence of rifampicin and quinolone resistant mutants of strains of plague microbe 565
with Fra+ and Fra- phenotypes. Antibiot Khimioter 39, 32-36 (1994). 566
567 11. Lindler LE, Fan W, Jahan N. Detection of ciprofloxacin-resistant Yersinia pestis by fluorogenic PCR using 568
the LightCycler. J Clin Microbiol 39, 3649-3655 (2001). 569
570 12. Inglesby TV, et al. Plague as a biological weapon: medical and public health management. Working 571
Group on Civilian Biodefense. JAMA 283, 2281-2290 (2000). 572
573 13. Galimand M, Carniel E, Courvalin P. Resistance of Yersinia pestis to antimicrobial agents. Antimicrob 574
Agents Chemother 50, 3233-3236 (2006). 575
576 14. Guiyoule A, et al. Transferable plasmid-mediated resistance to streptomycin in a clinical isolate of 577
Yersinia pestis. Emerg Infect Dis 7, 43-48 (2001). 578
579 15. Galimand M, et al. Multidrug resistance in Yersinia pestis mediated by a transferable plasmid. N Engl J 580
Med 337, 677-680 (1997). 581
582 16. Cabanel N, Bouchier C, Rajerison M, Carniel E. Plasmid-mediated doxycycline resistance in a Yersinia 583
pestis strain isolated from a rat. Int J Antimicrob Agents 51, 249-254 (2018). 584
585 17. CLSI. Methods for antimicrobial diluiton and disc susceptibility testing of infrequently isolated or 586
fastidious bacteria, 3rd ed. M45. CLSI, Wayne, PA. (2016). 587
588 18. Urich SK, Chalcraft L, Schriefer ME, Yockey BM, Petersen JM. Lack of antimicrobial resistance in Yersinia 589
pestis isolates from 17 countries in the Americas, Africa, and Asia. Antimicrob Agents Chemother 56, 590 555-558 (2012). 591
592 19. Weigel LM, Sue D, Michel PA, Kitchel B, Pillai SP. A rapid antimicrobial susceptibility test for Bacillus 593
anthracis. Antimicrob Agents Chemother 54, 2793-2800 (2010). 594
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
25
595 20. Schofield DA, et al. Bacillus anthracis diagnostic detection and rapid antibiotic susceptibility 596
determination using 'bioluminescent' reporter phage. J Microbiol Methods 95, 156-161 (2013). 597
598 21. Steinberger-Levy I, et al. Enrichment of Yersinia pestis from blood cultures enables rapid antimicrobial 599
susceptibility determination by flow cytometry. Adv Exp Med Biol 603, 339-350 (2007). 600
601 22. Bugrysheva JV, Lascols C, Sue D, Weigel LM. Rapid Antimicrobial Susceptibility Testing of Bacillus 602
anthracis, Yersinia pestis, and Burkholderia pseudomallei by Use of Laser Light Scattering Technology. J 603 Clin Microbiol 54, 1462-1471 (2016). 604
605 23. McLaughlin HP, Gargis AS, Michel P, Sue D, Weigel LM. Optical Screening for Rapid Antimicrobial 606
Susceptibility Testing and for Observation of Phenotypic Diversity among Strains of the Genetically 607 Clonal Species Bacillus anthracis. J Clin Microbiol 55, 959-970 (2017). 608
609 24. Goodwin S, McPherson JD, McCombie WR. Coming of age: ten years of next-generation sequencing 610
technologies. Nature reviews Genetics 17, 333-351 (2016). 611
612 25. Wick RR, Judd LM, Gorrie CL, Holt KE. Completing bacterial genome assemblies with multiplex MinION 613
sequencing. Microbial genomics 3, e000132 (2017). 614
615 26. Lu H, Giordano F, Ning Z. Oxford Nanopore MinION Sequencing and Genome Assembly. Genomics, 616
proteomics & bioinformatics 14, 265-279 (2016). 617
618 27. Ricker N, Qian H, Fulthorpe RR. The limitations of draft assemblies for understanding prokaryotic 619
adaptation and evolution. Genomics 100, 167-175 (2012). 620
621 28. Besser J, Carleton HA, Gerner-Smidt P, Lindsey RL, Trees E. Next-generation sequencing technologies 622
and their application to the study and control of bacterial infections. Clinical microbiology and infection : 623 the official publication of the European Society of Clinical Microbiology and Infectious Diseases 24, 335-624 341 (2018). 625
626 29. Laehnemann D, Borkhardt A, McHardy AC. Denoising DNA deep sequencing data-high-throughput 627
sequencing errors and their correction. Briefings in bioinformatics 17, 154-179 (2016). 628
629 30. Stoesser N, et al. First Report of blaIMP-14 on a Plasmid Harboring Multiple Drug Resistance Genes in 630
Escherichia coli Sequence Type 131. Antimicrob Agents Chemother 60, 5068-5071 (2016). 631
632 31. Forde BM, et al. The complete genome sequence of Escherichia coli EC958: a high quality reference 633
sequence for the globally disseminated multidrug resistant E. coli O25b:H4-ST131 clone. PloS one 9, 634 e104400 (2014). 635
636
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
26
32. Bashir A, et al. A hybrid approach for the automated finishing of bacterial genomes. Nature 637 Biotechnology 30, 701 (2012). 638
639 33. Goldstein S, Beka L, Graf J, Klassen JL. Evaluation of strategies for the assembly of diverse bacterial 640
genomes using MinION long-read sequencing. BMC Genomics 20, 23 (2019). 641
642 34. Ashton PM, et al. MinION nanopore sequencing identifies the position and structure of a bacterial 643
antibiotic resistance island. Nat Biotechnol 33, 296-300 (2015). 644
645 35. Jain M, et al. MinION Analysis and Reference Consortium: Phase 2 data release and analysis of R9.0 646
chemistry. F1000Research 6, 760 (2017). 647
648 36. Tyler AD, et al. Evaluation of Oxford Nanopore’s MinION Sequencing Device for Microbial Whole 649
Genome Sequencing Applications. Scientific Reports 8, 10931 (2018). 650
651 37. Jawetz E, Meyer, K.F. Avirulent strains of Pasteurella pestis. Journal of Infectious Disease 124-143 (1943). 652
653 38. Chu MC, Dong XQ, Zhou X, Garon CF. A cryptic 19-kilobase plasmid associated with U.S. isolates of 654
Yersinia pestis: a dimer of the 9.5-kilobase plasmid. The American journal of tropical medicine and 655 hygiene 59, 679-686 (1998). 656
657 39. The National Biodefense Strategy. Office of the President of the United States, 2018. 658
https://www.hsdl.org/?view&did=815921. Access Date: June 4, 2019. 659
660 40. Quick J, et al. Real-time, portable genome sequencing for Ebola surveillance. Nature 530, 228-232 661
(2016). 662
663 41. Gwinn M, MacCannell D, Armstrong GL. Next-Generation Sequencing of Infectious Pathogens. JAMA, 664
(2019). 665
666 42. Gargis AS, et al. Analysis of Whole-Genome Sequences for the Prediction of Penicillin Resistance and 667
beta-Lactamase Activity in Bacillus anthracis. mSystems 3, (2018). 668
669 43. O'Donnell CR, Wang H, Dunbar WB. Error analysis of idealized nanopore sequencing. Electrophoresis 34, 670
2137-2144 (2013). 671
672 44. Anisimov AP, Lindler LE, Pier GB. Intraspecific diversity of Yersinia pestis. Clin Microbiol Rev 17, 434-464 673
(2004). 674
675 45. Eppinger M, et al. Genome sequence of the deep-rooted Yersinia pestis strain Angola reveals new 676
insights into the evolution and pangenome of the plague bacterium. J Bacteriol 192, 1685-1699 (2010). 677
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
27
678 46. Rajanna C, et al. Characterization of pPCP1 Plasmids in Yersinia pestis Strains Isolated from the Former 679
Soviet Union. International journal of microbiology 2010, 760819 (2010). 680
681 47. Achtman M. Evolution, population structure, and phylogeography of genetically monomorphic bacterial 682
pathogens. Annual review of microbiology 62, 53-70 (2008). 683
684 48. Radnedge L, Agron PG, Worsham PL, Andersen GL. Genome plasticity in Yersinia pestis. Microbiology 685
(Reading, England) 148, 1687-1698 (2002). 686
687 49. Wren BW. The yersiniae--a model genus to study the rapid evolution of bacterial pathogens. Nature 688
reviews Microbiology 1, 55-64 (2003). 689
690 50. Parkhill J, et al. Genome sequence of Yersinia pestis, the causative agent of plague. Nature 413, 523-527 691
(2001). 692
693 51. Kolmogorov M, Yuan J, Lin Y, Pevzner P. Assembly of Long Error-Prone Reads Using Repeat Graphs. 694
bioRxiv, 247148 (2018). 695
696 52. Ruan J, Li H. Fast and accurate long-read assembly with wtdbg2. bioRxiv, 530972 (2019). 697
698 53. Kolmogorov M, Yuan J, Lin Y, Pevzner PA. Assembly of long, error-prone reads using repeat graphs. Nat 699
Biotechnol 37, 540-546 (2019). 700
701 54. Lemon JK, Khil PP, Frank KM, Dekker JP. Rapid Nanopore Sequencing of Plasmids and Resistance Gene 702
Detection in Clinical Isolates. J Clin Microbiol 55, 3530-3543 (2017). 703
704 55. Giordano F, et al. De novo yeast genome assemblies from MinION, PacBio and MiSeq platforms. Sci Rep 705
7, 3935 (2017). 706
707 56. Chosewood LC, Wilson DE, Centers for Disease Control and Prevention (U.S.), National Institutes of 708
Health (U.S.). Biosafety in microbiological and biomedical laboratories, 5th edn. U.S. Dept. of Health and 709 Human Services, Public Health Service, Centers for Disease Control and Prevention, National Institutes of 710 Health (2009). 711
712 57. M45. Methods for Antimicrobial Dilution and Disc Susceptibility Testing of Infrequently Isolated or 713
Fastidious Bacteria. Clinical and Laboratory Standards Institute 3rd ed, (2015). 714
715 58. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics (Oxford, England) 34, 716
3094-3100 (2018). 717
718
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
28
59. Li H. Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences. 719 Bioinformatics (Oxford, England) 32, 2103-2110 (2016). 720
721 60. Vaser R, Sovic I, Nagarajan N, Sikic M. Fast and accurate de novo genome assembly from long 722
uncorrected reads. Genome research 27, 737-746 (2017). 723
724 61. Loman NJ, Quick J, Simpson JT. A complete bacterial genome assembled de novo using only nanopore 725
sequencing data. Nature methods 12, 733-735 (2015). 726
727 62. Kleinheinz KA, Joensen KG, Larsen MV. Applying the ResFinder and VirulenceFinder web-services for 728
easy identification of acquired antibiotic resistance and E. coli virulence genes in bacteriophage and 729 prophage nucleotide sequences. Bacteriophage 4, e27943 (2014). 730
731 63. Chen Y, Ye W, Zhang Y, Xu Y. High speed BLASTN: an accelerated MegaBLAST search tool. Nucleic acids 732
research 43, 7762-7768 (2015). 733
734 64. Quinlan AR. BEDTools: The Swiss-Army Tool for Genome Feature Analysis. Current protocols in 735
bioinformatics 47, 11.12.11-34 (2014). 736
737 65. Li H, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics (Oxford, England) 25, 738
2078-2079 (2009). 739
740 66. Kent WJ. BLAT--the BLAST-like alignment tool. Genome research 12, 656-664 (2002). 741
742 67. Delcher AL, Salzberg SL, Phillippy AM. Using MUMmer to identify similar regions in large sequence sets. 743
Current protocols in bioinformatics Chapter 10, Unit 10.13 (2003). 744
745 68. Krzywinski M, et al. Circos: an information aesthetic for comparative genomics. Genome research 19, 746
1639-1645 (2009). 747
748 69. Ross CL, Thomason KS, Koehler TM. An extracytoplasmic function sigma factor controls beta-lactamase 749
gene expression in Bacillus anthracis and other Bacillus cereus group species. J Bacteriol 191, 6683-6693 750 (2009). 751
752 70. Gargis AS, Lascols C, McLaughlin HP, Conley AB, Hoffmaster AR, Sue D. Genome Sequences of Penicillin-753
Resistant Bacillus anthracis Strains. Microbiol Resour Announc 8, (2019). 754
755
756
757
758
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
29
Acknowledgements 759
This work was supported by the U.S. CDC Office of Public Health Preparedness and Response (now 760
Center for Preparedness and Response). We thank Christopher Lynberg, CDC, and Charles Johnston, 761
General Dynamics contract agency, for computational support, technical advice, and valuable 762
discussions. We thank Dhwani Batra, CDC, and Lori A. Rowe, CDC, for PacBio sequencing and 763
analysis. We thank Jeannine Petersen, CDC, Luke C. Kingry, CDC, and Alex Hoffmaster, CDC, for 764
their critical review of the manuscript. The findings and conclusions in this report are those of the 765
authors and do not necessarily represent the views of the Centers for Disease Control and Prevention. 766
The use of trade names is for identification only and does not imply endorsement by the Centers for 767
Disease Control and Prevention. 768
769
Author Contributions 770
A.S.G. and D.S. conceived the project and all authors contributed to the design of experiments. A.S.G., 771
B.C., and H.P.M. performed all experiments and all authors contributed to data analysis. A.B.C. 772
established the bioinformatics pipeline, performed all sequence analysis, and created all figures. A.S.G. 773
led development of the manuscript, and all authors contributed to the writing process. All authors 774
reviewed and edited the manuscript. 775
776
Competing interests 777
The authors declare no competing interests. 778
779
780
781
782
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
30
Figure Legends: 783
Figure 1. Nanopore sequencing run metrics for Y. pestis strain S1037. Genomic DNA was 784
extracted by the salt-precipitation (SP) or the silica-membrane (SM) method and libraries were prepared 785
in triplicate using the ONT Rapid Sequencing Kit (Rapid), or the Field Sequencing Kit (Field). All run 786
metrics were based on a 16 hour sequencing run time and the active pore count (-o-) was recorded 787
during the platform QC step prior to the sequencing run. The number of raw fast5 reads generated 788
(green, passed; yellow, failed) and raw FASTQ Albacore base-called reads with an average quality score 789
of ≥ 7 (blue) are displayed. 790
791
Figure 2. Assessment of nanopore sequencing performance and de novo assembly for Y. pestis 792
S1037. DNA sequencing libraries were prepared using the Rapid Sequencing Kit (in triplicate) and 793
Field Sequencing Kit from a salt-precipitation (SP) or a silica-membrane (SM) extraction. Assemblies 794
were generated using analysis of the first 25,000, 50,000, 75,000, and 100,000 passed fast5 nanopore 795
reads. a-c Determination of mismatches, indels, and % identity, and d-f fold coverage for chromosomal 796
and plasmid sequences of Y. pestis S1037 data sets aligned to the All22 reference strain. g Using 797
100,000 passed fast5 reads, the ability to assemble the chromosome and plasmids, and to detect the gyrA 798
SNP associated with the non-susceptible CIP phenotype is depicted in a presence/absence plot. 799
Assembled is defined as a complete sequence with the same nucleotide order as the reference sequence 800
(navy), partially assembled represents a sequence that was assembled but not as a single contig (green), 801
misassembled includes a sequence that was assembled but the bases are in a different order than the 802
reference sequence, or a sequence assembled multiple times resulting in an incomplete assembly 803
(yellow), and unassembled refers to a sequence that was not present in the assembly (orange). 804
805
Figure 3. Time required to detect antimicrobial resistance markers in Y. pestis and B. anthracis 806
using nanopore whole genome sequencing. Time 0:00 starts from a pure isolated culture. The time to 807
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
31
extract genomic DNA by the silica-membrane method is shown, including the one hour B. anthracis 808
lysis step. Quality control steps include assessment of DNA concentration, quality, and integrity. 809
Library preparation time is estimated for the Rapid Sequencing Kit. The average time required to 810
generate 100,000 passed fast5 reads is shown for each species. Genome analysis includes base-calling 811
using Albacore and de novo genome and plasmid assembly. See methods and Supplementary Figure 9 812
for the bioinformatics analysis pipeline details. 813
814
Figure 4. Detection of multimeric forms of pPCP1 in nanopore sequence assemblies of Y. pestis 815
S1037. pPCP1 is ~9.5 kb, but the pPCP1 1-mer assembly is ~9.1 kb due to the presence of small indels 816
in the nanopore-generated sequence. Reads >18.2 kb were assigned to the 3- or 4-mer form based on 817
length. A representative read (grey) for each multimer was selected as a reference for the complete 818
plasmid the remaining reads were aligned against this sequence (purple). Coding sequences (green) and 819
IS100 elements (yellow) are shown in the center. The 2-, 3- and 4-mer forms are linked together in a 5′-820
3′ head-to-toe orientation. 821
822
Figure 5. Assessment of nanopore sequencing performance and de novo assembly for B. anthracis 823
strains. DNA sequencing libraries were prepared using the Rapid and Field Sequencing Kits from a 824
silica-membrane extraction. Assemblies were generated using analysis of the first 25,000, 50,000, 825
75,000, and 100,000 passed fast5 nanopore reads. a-c Determination of mismatches, indels and % 826
identity, and d-f fold coverage for chromosomal and plasmid sequences in data sets of B. anthracis 827
strains (Sterne/pUTE29, 411A2, and UT308) aligned to the Ames Ancestor reference sequence. g Using 828
analysis of 100,000 passed fast5 reads, the ability to assemble the chromosome and plasmids (blue) and 829
detect the gyrA and parC SNPs associated with the non-susceptible CIP phenotype (teal) or wild type 830
sequence (N/A), is depicted in a presence/absence plot. Assembled is defined as a complete sequence 831
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
32
with the same nucleotide order as the reference sequence (navy), partially assembled represents a 832
sequence that was assembled but not as a single contig (green), and unassembled refers to a sequence 833
that was not present in the assembly (orange). 834
835
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
33
Figure 1. 836
837
838
839
840
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
34
Figure 2a-f.841
842
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
35
Figure 2g. 843
844
845
846
847
848
849
850
851
852
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
36
Figure 3. 853
854
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
37
Figure 4. 855
856
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
38
Figure 5a-f. 857
858
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
39
Figure 5g. 859
860
861
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint
40
Table 1. Bacterial strains used in this study. 862
Strain Susceptibility
Profile Plasmids Gene
Nucleotide
Mutation Reference
Y. pestis A1122 CIPS, DOXS, GENS pMT1+, pPCP1+(*), pCD1− gyrA Wild Type (37)
Y. pestis S1037 CIPNS, DOXS, GENS pMT1+, pPCP1+(*), pCD1− gyrA C249 ►A This work
B. anthracis
Sterne/pUTE29 PENS, CIPS, DOXNS pX01+, pX02−, pUTE29+ tetL NA (19)
B. anthracis 411A2 PENS, CIPNS, DOXS pX01+, pX02− gyrA
parC
C254 ►T
C242 ►T (19)
B. anthracis
UT308 PENR, CIPS, DOXS pX01+(#), pX02− gyrA Wild Type (69, 70)
Susceptibility, plasmid profiles and genes/mutations associated with antibiotic resistance investigated in this study. The first 100,000 fast5 reads generated 863 for each sequencing run were analyzed. (*) multimer forms of this plasmid were detected, (#) UT308 pXO1 is missing the pathogenicity island containing 864 the anthrax toxin genes, (NA) no nucleotide mutations associated with resistance are found in this strain. 865
made available for use under a CC0 license. certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also
The copyright holder for this preprint (which was notthis version posted August 15, 2019. . https://doi.org/10.1101/730093doi: bioRxiv preprint