Draft

Survey of compound microsatellites in multiple

Lactobacillus genomes

Journal: Canadian Journal of Microbiology

Manuscript ID cjm-2015-0136.R2

Manuscript Type: Article

Date Submitted by the Author: 25-Aug-2015

Complete List of Authors: Basharat, Zarrin; Fatima Jinnah Women University, Yasmin, Azra; Fatima Jinnah Women University

Keyword: Lactobacillus, compound microsatellite, data mining, in silico

https://mc06.manuscriptcentral.com/cjm-pubs

Canadian Journal of Microbiology

Survey of compound microsatellites in multiple Lactobacillus genomes

Zarrin Basharat and Azra Yasmin*

Corresponding author email: [email protected]

Microbiology & Biotechnology Research Laboratory, Department of Environmental Sciences, Fatima Jinnah Women University, 46000, Pakistan

Running title: Compound microsatellites in Lactobacillus genomes

Abstract

Compound microsatellites are simple sequence repeats in which two or more individual microsatellites are joined together or lie adjacent to each other. The aim of this study was to investigate such composite microsatellites in genomes of the genus Lactobacillus. In silico inspection of microsatellite clustering in the genomes of fourteen Lactobacillus species revealed a wealth of compound microsatellites. All of the mined compound microsatellites were imperfect and composed of variant motifs, and their number increased in all genomes as the dMAX value was raised from 10 to 50. A majority of these repeats were present in coding regions. Microsatellite density was correlated with compound microsatellite density. The differences observed in compound microsatellites between eukaryotes, E. coli and Lactobacilli are suggestive of diverse genomic features and of a fundamental distinction in how compound microsatellites are created and fixed among these organisms.

Keywords: Lactobacillus, compound microsatellite, data mining, in silico

Introduction

Microsatellite polymorphism has been studied for preliminary typing of lactic acid bacteria (Chebeňová et al. 2010), suggesting the presence of compound microsatellites. Compound microsatellites are composed of two or more adjacent individual microsatellites: m1-xn-m2 is termed a ‘2-microsatellite’ compound and m1-xn-m2-xt-m3 a ‘3-microsatellite’ compound, where m denotes an individual microsatellite and x an intervening sequence. Analysis and characterization of these microsatellites is important for interpreting their origin, describing the mutational processes involved, and understanding the structure of these partly understood sequences. Compound microsatellites have been reported in diverse taxa across viruses, prokaryotes and eukaryotes (Gur-Arie et al. 2000; Chen et al. 2012; George et al. 2014). They constitute approximately 10% of simple sequence repeats (SSRs) in the human genome (Weber 1990), including highly polymorphic compound repeats such as (dC-dA)n(dG-dT)n (Bull et al. 1999). Kofler et al. (2008) carried out an in silico examination of microsatellite clustering in the genomes of several eukaryotes, in which compound microsatellites accounted for 4–25% of all microsatellites.

Alleles of compound microsatellite loci can vary in the length of one or both repeats (Urquhart et al. 1994; Garza and Freimer 1996; Brinkmann et al. 1998). The length of the repeat motifs in a compound microsatellite is likely to affect the structure of the microsatellite. Knowledge about compound microsatellites may offer valuable information concerning the evolution and imperfection of microsatellites (Kofler et al. 2008). Sequencing can expose differences between alleles of the same length in compound microsatellites (Bull et al. 1999), as with imperfect ones (Laura et al. 1999). Owing to the sequencing of numerous genomes and the accessibility of sequence data, microsatellites can be explored and analyzed in silico. Compound microsatellites have so far been explored in only a minuscule fraction of sequenced genomes, and a plethora of genomes remain to be examined for this category of microsatellites. Comparative data analysis can thus be used to shed light on microsatellite and genome variation in different organisms.

Compound microsatellites have previously been investigated in whole Escherichia coli genomes (Chen et al. 2011), and that work was used as a model for this study. Fourteen bacterial species in the genus Lactobacillus were screened in both coding and non-coding regions for the occurrence of compound microsatellites. The complexity of compound microsatellites was also analyzed. This study can aid in understanding the differences among compound simple sequence repeats (cSSRs) in various organisms and contribute to the biological understanding of this category of microsatellites in bacteria.

2 Material and Methods

2.1 Genome sequences

In the present study, we analyzed compound microsatellites in fourteen whole genomes of Lactobacillus, ranging in size from 1,827,111 bp in Lactobacillus salivarius UCC118 (Accession ID: NC_007929.1) to 3,308,274 bp in Lactobacillus plantarum WCFS1 (Accession ID: NC_004567.1). All studied genomes are listed in the ‘Genome-level Extraction Mode’ of IMEx (Mudunuri and Nagarajaram 2007). Accession numbers of the whole genomes examined are given in Table 1.

2.2 Data mining and analysis

Compound microsatellites were mined using IMEx (Mudunuri and Nagarajaram 2007) with the following parameters for Lactobacillus genomes: Include Flanking regions: 10 bp; Type of Repeat: imperfect; Repeat Size: all; Minimum Repeat Number: 12, 6, 4, 3, 3, 3; Imperfection Limit/repeat unit: 1, 1, 1, 2, 2, 3; Percent Imperfection in Repeat Tract: 10%; Maximum distance allowed between any two adjacent SSRs forming a cSSR (i.e. dMAX): 10 bp, with complete standardization. The obtained results were then analyzed. Previous studies focused on microsatellites of varying minimum lengths, such as 15 bp for eukaryotes with mismatches (Kofler et al. 2008) and 12 bp for eukaryotes (Toth et al. 2000; Buschiazzo and Gemmell 2010) and for prokaryotes with mismatches (Chen et al. 2011). Datasets obtained with different minimum repeat numbers cannot be validly compared, so only the 12 bp threshold was used for comparing the bacteria studied here with previously studied E. coli genomes.
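
The role of the minimum repeat numbers can be illustrated with a minimal Python sketch. It is not IMEx and does not reproduce its algorithm: it detects perfect repeats only (no imperfection limits), and it assumes that the six values 12, 6, 4, 3, 3, 3 apply to motif lengths of 1 to 6 bp in that order. The function names and the toy sequence are hypothetical.

```python
import re

# Minimum repeat number for motif lengths 1-6 bp (values from the parameter list;
# their assignment to motif lengths 1..6 is an assumption of this sketch).
MIN_REPEATS = {1: 12, 2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """Return True if the motif is not a repetition of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_perfect_ssrs(seq):
    """Return sorted (start, end, motif, copies) tuples; 0-based, end exclusive."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # One motif of `size` bases followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                copies = (match.end() - match.start()) // size
                hits.append((match.start(), match.end(), motif, copies))
    return sorted(hits)


# Hypothetical toy sequence containing an (AT)6 and an (AAG)4 tract.
toy = "GGGGG" + "AT" * 6 + "CCGTA" + "AAG" * 4 + "T"
for hit in find_perfect_ssrs(toy):
    print(hit)
```

With these thresholds every reported tract is at least 12 bp long (12 x 1, 6 x 2, 4 x 3, 3 x 4), which is consistent with the 12 bp basis of comparison stated above.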

2.3 Statistical analysis

All statistical analyses were performed using IBM SPSS v22. Linear regression (R²) was used to evaluate the influence of genome size and GC content on SSRs and cSSRs, and the correlation between SSR and cSSR density. A P-value of <0.05 was considered significant.
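
For readers without access to SPSS, the R² of a simple linear regression can be sketched in a few lines of Python. This is an illustration only, not the authors' SPSS procedure: it computes R² as the squared Pearson correlation, does not compute a P-value, and the example densities are hypothetical numbers rather than values from this study.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    # For one predictor, R^2 equals the squared Pearson correlation coefficient.
    return sxy ** 2 / (sxx * syy)


# Hypothetical SSR and cSSR densities (repeats per Mb), for illustration only.
ssr_density = [500.0, 650.0, 800.0, 1000.0]
cssr_density = [4.0, 8.0, 11.0, 16.0]
print(round(r_squared(ssr_density, cssr_density), 3))
```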

3 Results and Discussion

Prokaryotic and eukaryotic genomes vary in the composition of their compound microsatellites. Compound microsatellites are putatively involved in the regulation of gene expression and in determining protein function in a number of species (Kashi and King 2006; Chen et al. 2011). Though their significance in Lactobacillus is yet to be proven experimentally, complex regulation at the functional level is possible. Assessment of cSSRs can aid in understanding the incidence, distribution and complexity of compound microsatellites in Lactobacillus genomes.

3.1 Number, location and genesis of compound microsatellites

Analysis revealed 21,974 microsatellites and 285 compound microsatellites in the whole assessed dataset of Lactobacillus species (Table 1). To allow comparison across genomes of varying sizes, microsatellite and compound microsatellite counts were normalized (SSR/cSSR density = number of repeats per Mb of genome). Microsatellite density varied widely, ranging from 506.7 to 1040.9 (mean = 747.2, SD = 156.4), while compound microsatellite density ranged from 3.3 to 16.4 (mean = 10.0, SD = 4.7). Lactobacillus salivarius had the highest SSR (1040.9) and cSSR (16.4) density. The lowest SSR density (506.7) was observed in Lactobacillus brevis ATCC 367, while the lowest cSSR density (3.3) was observed in Lactobacillus casei ATCC 334. cSSR density was weakly correlated with GC content and with genome size (R² = 0.34, P < 0.05 and R² = 0.39, P < 0.05, respectively). cSSR density was strongly correlated with SSR density (R² = 0.79, P < 0.01), which differs considerably from the previously analysed E. coli genomes (R² = 0.214, P < 0.01). This suggests that the association of SSR density with cSSR density may not depend on DNA polymerases or the replication method, as put forward by Chen et al. (2011) to explain differences between eukaryotic and prokaryotic cSSR density influenced by SSR distribution in the genomes. Prokaryotes share the same rolling circle replication method, so the difference between our results and those reported by Chen et al. (2011) suggests that the association between SSR and cSSR densities depends on species and recombination (presence of certain types of repeat motifs) (Kofler et al. 2008) rather than on DNA replication. Remarkably similar SSR and cSSR distributions were observed among genomes of the same species (Lactobacillus reuteri DSM 20016 and Lactobacillus reuteri JCM 1112; Lactobacillus delbrueckii subsp. bulgaricus ATCC 11842 and Lactobacillus delbrueckii subsp. bulgaricus ATCC BAA-365).
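
The density normalization described above can be reproduced from the genome sizes and repeat counts in Table 1. The following Python sketch is illustrative only; the variable and function names are hypothetical, and recomputed values may differ in the last digit from those quoted in the text because of rounding.

```python
# (serial, organism, genome size in bp, SSR count, cSSR count), copied from Table 1.
GENOMES = [
    ("S1", "L. acidophilus NCFM", 1993560, 1819, 29),
    ("S2", "L. brevis ATCC 367", 2291220, 1161, 8),
    ("S3", "L. casei ATCC 334", 3079196, 1566, 10),
    ("S4", "L. delbrueckii subsp. bulgaricus ATCC 11842", 1864998, 1370, 30),
    ("S5", "L. fermentum IFO 3956", 2098684, 1304, 7),
    ("S6", "L. gasseri ATCC 33323", 1894360, 1652, 26),
    ("S7", "L. helveticus DPC 4571", 2080931, 1653, 21),
    ("S8", "L. plantarum WCFS1", 3308274, 1944, 20),
    ("S9", "L. reuteri DSM 20016", 1999618, 1544, 19),
    ("S10", "L. reuteri JCM 1112", 2039414, 1575, 19),
    ("S11", "L. sakei subsp. sakei 23K", 1884661, 1313, 17),
    ("S12", "L. salivarius UCC118", 1827111, 1902, 30),
    ("S13", "L. johnsonii NCC 533", 1992676, 1811, 29),
    ("S14", "L. delbrueckii subsp. bulgaricus ATCC BAA-365", 1856951, 1360, 20),
]


def density(count, genome_size_bp):
    """Number of repeats per megabase of genome."""
    return count / (genome_size_bp / 1_000_000)


# One row per genome: (serial, organism, SSR density, cSSR density).
rows = [
    (serial, name, density(ssr, size), density(cssr, size))
    for serial, name, size, ssr, cssr in GENOMES
]

for serial, name, ssr_d, cssr_d in rows:
    print(f"{serial:>3}  {name:<46} SSR/Mb = {ssr_d:7.1f}  cSSR/Mb = {cssr_d:5.1f}")

print("Total SSRs:", sum(g[3] for g in GENOMES))
print("Total cSSRs:", sum(g[4] for g in GENOMES))
```

Sorting `rows` by the third or fourth field gives the genomes with the highest and lowest SSR or cSSR density.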

The majority of compound microsatellites (220/285) were located in coding regions (Supplementary Table 1). It has been demonstrated before that microsatellites are more abundant in coding regions than in non-coding regions of some prokaryotes (Gur-Arie et al. 2000; Ellegren 2004; Li et al. 2004), possibly due to enhanced selection in coding regions. Owing to their higher polymorphism, compound microsatellites also have a greater ability to alter gene function than a single microsatellite (Chen et al. 2012). However, no specific distribution pattern of cSSRs could be inferred, as their distribution was not homogeneous within the genomes and compound microsatellites were not concentrated in specific genes. Numerous SSRs and cSSRs were present in hypothetical proteins, so it is difficult to comment on the occurrence of cSSRs with reference to specific proteins or their subunits unless annotation information for all proteins in the studied genomes is available.

3.2 Effect of dMAX on compound microsatellite occurrence

dMAX is the most influential parameter when identifying compound microsatellites in a genome. Hence, it is essential to verify the impact of dMAX on the detection of compound microsatellites (Kofler et al. 2008; Chen et al. 2011; George et al. 2014). We assessed this impact in all listed Lactobacillus strains (Table 1) by examining the presence of cSSRs as dMAX was varied from the default threshold of 10 bp (chosen to provide maximum sensitivity for compound microsatellite identification by allowing for mismatches in the microsatellite search) up to 50 bp. The number of compound microsatellites increased with increasing dMAX in all Lactobacillus genomes (Figure 1), indicating that increasing the maximum allowed distance between two adjoining SSRs results in more SSRs being classified as cSSRs.
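
How dMAX governs cSSR detection can be sketched in Python as a simple chaining rule: consecutive SSRs are merged into one compound microsatellite whenever the gap between them is at most dMAX. This is a simplified illustration under stated assumptions, not the IMEx implementation; the function name and the SSR coordinates are hypothetical.

```python
def group_into_cssrs(ssrs, dmax):
    """Chain SSRs into compound microsatellites.

    ssrs: iterable of (start, end) coordinates, 0-based with end exclusive.
    Consecutive SSRs separated by at most `dmax` bases (a negative gap means
    the tracts overlap) are placed in the same cluster. Only clusters with
    two or more SSRs are returned, since a single SSR is not compound.
    """
    clusters, current = [], []
    for ssr in sorted(ssrs):
        if current and ssr[0] - current[-1][1] <= dmax:
            current.append(ssr)
        else:
            if len(current) >= 2:
                clusters.append(current)
            current = [ssr]
    if len(current) >= 2:
        clusters.append(current)
    return clusters


# Hypothetical SSR coordinates; gaps between neighbours are 6, 30, 228 and 28 bp.
ssrs = [(100, 112), (118, 130), (160, 172), (400, 412), (440, 455)]

for dmax in (10, 20, 30, 40, 50):
    found = group_into_cssrs(ssrs, dmax)
    complexity = [len(cluster) for cluster in found]
    print(f"dMAX = {dmax:2d}: {len(found)} cSSR(s), complexity = {complexity}")
```

On this toy input, raising dMAX both adds cSSRs and lengthens existing ones, so a ‘2-microsatellite’ compound can become a ‘3-microsatellite’ compound once a more distant neighbour falls within range.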

3.3 Motifs and complexity of cSSRs in studied genomes

Compound microsatellites are thought to have originated through imperfections in microsatellites. To determine the organization of individual microsatellite motifs and their role in forming compound microsatellites in Lactobacillus genomes, we also examined the arrangement and structural make-up of compound microsatellites (Supplementary Table 1). Interestingly, all coupled motifs of compound microsatellites (e.g. AAGGT-CTTGTT) were unique, meaning that they are distinct for each Lactobacillus species. Such motifs could have arisen through imperfections in duplication. This observation was consistent with E. coli compound microsatellites, which are formed of very dissimilar motifs having two or more distinct bases (Chen et al. 2011). In eukaryotes, cSSRs have been reported to comprise very similar motifs in at least 90% of cases (Kofler et al. 2008). Motif duplication, i.e. similar motifs on both ends of the spacer sequence, occurred once in Lactobacillus delbrueckii subsp. bulgaricus ATCC 11842 ((AAT)n-(X)y-(AAT)z) and in Lactobacillus delbrueckii subsp. bulgaricus ATCC BAA-365 ((GAA)5-X3-(GAA)4), once in Lactobacillus reuteri DSM 20016 and Lactobacillus reuteri JCM 1112 ((ATT)4-X9-(ATT)4), twice in Lactobacillus gasseri ATCC 33323 ((AT)6-X8-(AT)6, (TTTA)3-X8-(TTTA)3) and four times in Lactobacillus plantarum WCFS1 ((CAT)4-X9-(CAT)5, (AAT)5-X6-(AAT)5-X6-(AAT)4, (ACT)5-X6-(ACT)4, (TTA)4-X6-(TTA)5). The occurrence of similar duplicated motifs in strains of the same species (Lactobacillus delbrueckii subsp. bulgaricus and Lactobacillus reuteri) suggests that they follow a similar pattern of duplication in their genomes. Furthermore, a cSSR can be characterized either as a perfect compound microsatellite, e.g. (GT)n(AG)n, or as an overlapping compound microsatellite, in which the preceding microsatellite overlaps a few bases of the next one, e.g. (AGG)n(GT)n (Kumar et al. 2014). Overlapping compound microsatellite motifs existed in all Lactobacillus genomes, ranging from a maximum of seventeen in Lactobacillus johnsonii NCC 533 to a minimum of one in Lactobacillus fermentum IFO 3956 (Table 1).

Inspection of cSSR complexity indicated that all of the cSSRs found in the surveyed genomes were ‘2-microsatellite’, ‘3-microsatellite’ or ‘4-microsatellite’ compound microsatellites. This differs from the previously analysed E. coli genomes, which did not contain any ‘4-microsatellite’ compound microsatellites. The complexity of cSSRs increased further with increased dMAX, with every genome showing at least one ‘3-microsatellite’ cSSR, eight genomes showing a ‘4-microsatellite’ cSSR and one genome showing a ‘5-microsatellite’ cSSR (Supplementary Table 2). This indicates greater complexity of cSSRs in Lactobacillus genomes than in E. coli (Chen et al. 2011), but still less than in eukaryotes, which can have a complexity of up to ‘8-microsatellite’ (Kofler et al. 2008). The complexity of compound microsatellites in prokaryotes cannot, therefore, be attributed to genome size, as Lactobacillus genomes range from 1.8 to 3.3 Mb (Kant et al. 2011) compared with 3.5 to 5.5 Mb for E. coli (Bergthorsson and Ochman 1995). We suggest that complexity depends on the abundance of microsatellites, which might increase the frequency of adjoining microsatellites by chance.

4 Conclusion

Knowledge about the biological significance of compound microsatellites in bacteria is limited, and completely sequenced genomes should be examined to systematically reveal the nature and evolutionary dynamics of cSSRs. Comparative analysis of compound microsatellites in various Lactobacillus genomes with previously studied eukaryotic and prokaryotic genomes shows that this category of microsatellites is overrepresented in Lactobacilli. It is assumed that the overrepresentation of compound microsatellites in bacteria of the genus Lactobacillus is due either to increased duplication of imperfections in microsatellites or to DNA replication slippage. The population size of bacteria may also be a crucial factor governing the abundance of microsatellites and, hence, of compound microsatellites. The diversity of cSSRs in Lactobacillus genomes may be useful for better understanding their genetic diversity, evolutionary biology and strain/genotype demarcations.

References

Bergthorsson, U., Ochman, H. 1995. Heterogeneity of genome sizes among natural isolates of Escherichia coli. J. Bacteriol., 177(20): 5784-5789.

Brinkmann, B., Klintschar, M., Neuhuber, F., Hühne, J., Rolf, B. 1998. Mutation rate in human microsatellites: Influence of the structure and length of the tandem repeat. Am. J. Hum. Genet., 62: 1408-1415. doi: 10.1086/301869.

Bull, L.N., Pabón-Peña, C.R., Freimer, N.B. 1999. Compound microsatellite repeats: practical and theoretical features. Genome Res., 9(9): 830-838. doi: 10.1101/gr.9.9.830.

Buschiazzo, E., Gemmell, N.J. 2010. Conservation of human microsatellites across 450 million years of evolution. Genome Biol. Evol., evq007. doi: 10.1093/gbe/evq007.

Chebeňová, V., Berta, G., Kuchta, T., Brežná, B., Pangallo, D. 2010. Randomly-amplified microsatellite polymorphism for preliminary typing of lactic acid bacteria from Bryndza Cheese. Folia Microbiol. (Praha), 55(6): 598-602. doi: 10.1007/s12223-010-0096-4.

Chen, M., Tan, Z., Zeng, G., Zeng, Z. 2012. Differential distribution of compound microsatellites in various human immunodeficiency virus type 1 complete genomes. Infect. Genet. Evol., 12(7): 1452-1457. doi: 10.1016/j.meegid.2012.05.006.

Chen, M., Zeng, G., Tan, Z., Jiang, M., Zhang, J., Zhang, C., Lu, L., Lin, Y., Peng, J. 2011. Compound microsatellites in complete Escherichia coli genomes. FEBS Letters, 585(7): 1072-1076. doi: 10.1016/j.febslet.2011.03.005.

Ellegren, H. 2004. Microsatellites: simple sequences with complex evolution. Nat. Rev. Genet., 5(6): 435-445. doi: 10.1038/nrg1348.

Garza, J.C., Freimer, N.B. 1996. Homoplasy for size at microsatellite loci in humans and chimpanzees. Genome Res., 6: 211-217. doi: 10.1101/gr.6.3.211.

George, B., Gnanasekaran, P., Jain, S.K., Chakraborty, S. 2014. Genome wide survey and analysis of small repetitive sequences in caulimoviruses. Infect. Genet. Evol., 27: 15-24. doi: 10.1016/j.meegid.2014.06.018.

Gur-Arie, R., Cohen, C.J., Eitan, Y., Shelef, L., Hallerman, E.M., Kashi, Y. 2000. Simple sequence repeats in Escherichia coli: abundance, distribution, composition, and polymorphism. Genome Res., 10(1): 62-71.

Kant, R., Blom, J., Palva, A., Siezen, R.J., de Vos, W.M. 2011. Comparative genomics of Lactobacillus. Microbial Biotechnol., 4(3): 323-332. doi: 10.1111/j.1751-7915.2010.00215.x.

Kashi, Y., King, D.G. 2006. Simple sequence repeats as advantageous mutators in evolution. Trends Genet., 22(5): 253-259. doi: 10.1016/j.tig.2006.03.005.

Kofler, R., Schlotterer, C., Luschutzky, E., Lelley, T. 2008. Survey of microsatellite clustering in eight fully sequenced species sheds light on the origin of compound microsatellites. BMC Genomics, 9(1): 612. doi: 10.1186/1471-2164-9-612.

Kumar, M., Kapil, A., Shanker, A. 2014. MitoSatPlant: Mitochondrial microsatellites database of viridiplantae. Mitochondrion, 19: 334-337. doi: 10.1016/j.mito.2014.02.002.

Laura, N.B., Pabón-Peña, C.R., Freimer, N.B. 1999. Compound microsatellite repeats: practical and theoretical features. Genome Res., 9(9): 830-838. doi: 10.1101/gr.9.9.830.

Li, Y.C., Korol, A.B., Fahima, T., Nevo, E. 2004. Microsatellites within genes: structure, function, and evolution. Mol. Biol. Evol., 21(6): 991-1007. doi: 10.1093/molbev/msh073.

Mudunuri, S.B., Nagarajaram, H.A. 2007. IMEx: Imperfect Microsatellite Extractor. Bioinformatics, 23(10): 1181-1187. doi: 10.1093/bioinformatics/btm097.

Toth, G., Gaspari, Z., Jurka, J. 2000. Microsatellites in different eukaryotic genomes: survey and analysis. Genome Res., 10: 967-981. doi: 10.1101/gr.10.7.967.

Urquhart, A., Kimpton, C.P., Downes, T.J., Gill, P. 1994. Variation in short tandem repeat sequences - A survey of twelve microsatellite loci for use as forensic identification markers. Int. J. Leg. Med., 107(1): 13-20. doi: 10.1007/bf01247268.

Weber, J.L. 1990. Informativeness of human (dC-dA)n x (dG-dT)n polymorphisms. Genomics, 7: 524-530. doi: 10.1016/0888-7543(90)90195-z.

Figure Legend Information

Figure 1. Increment of cSSR in Lactobacillus genomes with increased dMAX values. S1, S2, ... refer to the genome serial numbers in Table 1.


Table 1. List of analysed Lactobacillus genomes and compound microsatellites

| Serial # | Accession | Organism | Genome size (bp) | GC content (%) | Coding density (%) | Total number of microsatellites | Total number of compound microsatellites | Number of overlapping compound microsatellites |
|---|---|---|---|---|---|---|---|---|
| S1 | NC_006814.3 | Lactobacillus acidophilus NCFM | 1,993,560 | 34.7 | 88.2 | 1819 | 29 | 15 |
| S2 | NC_008497 | Lactobacillus brevis ATCC 367 | 2,291,220 | 46.2 | 83.4 | 1161 | 8 | 5 |
| S3 | NC_008526.1 | Lactobacillus casei ATCC 334 | 3,079,196 | 46.6 | 81.9 | 1566 | 10 | 3 |
| S4 | NC_008054.1 | Lactobacillus delbrueckii subsp. bulgaricus ATCC 11842 | 1,864,998 | 49.7 | 73.46 | 1370 | 30 | 15 |
| S5 | NC_010610.1 | Lactobacillus fermentum IFO 3956 | 2,098,684 | 51.5 | 80.2 | 1304 | 7 | 1 |
| S6 | NC_008530.1 | Lactobacillus gasseri ATCC 33323 | 1,894,360 | 35.3 | 88.2 | 1652 | 26 | 8 |
| S7 | NC_010080.1 | Lactobacillus helveticus DPC 4571 | 2,080,931 | 37.1 | 73.6 | 1653 | 21 | 10 |
| S8 | NC_004567.1 | Lactobacillus plantarum WCFS1 | 3,308,274 | 44.47 | 83.4 | 1944 | 20 | 6 |
| S9 | NC_009513.1 | Lactobacillus reuteri DSM 20016 | 1,999,618 | 38.9 | 85.0 | 1544 | 19 | 7 |
| S10 | NC_010609.1 | Lactobacillus reuteri JCM 1112 | 2,039,414 | 38.9 | 83.3 | 1575 | 19 | 6 |
| S11 | NC_007576.1 | Lactobacillus sakei subsp. sakei 23K | 1,884,661 | 41.26 | 87.0 | 1313 | 17 | 7 |
| S12 | NC_007929.1 | Lactobacillus salivarius UCC118 | 1,827,111 | 33.04 | 82.5 | 1902 | 30 | 12 |
| S13 | NC_005362.1 | Lactobacillus johnsonii NCC 533 | 1,992,676 | 34.6 | 89.0 | 1811 | 29 | 17 |
| S14 | NC_008529.1 | Lactobacillus delbrueckii subsp. bulgaricus ATCC BAA-365 | 1,856,951 | 49.69 | 77.65 | 1360 | 20 | 8 |
