Download - Genomic and transcriptomic analysis of Lactobacillus ... · Tiivistelmä – Referat – Abstract ... The genes play role in citrate metabolism pathways were significantly down-regulated

Genomic and Transcriptomic Analysis ofLactobacillus rhamnosus LC705

Ilhan Cem Duru

Master’s thesisUniversity Of Helsinki

Faculty of Biological and Environmental Sciences Department of Biosciences

January 2017

Tiedekunta – Fakultet – FacultyFaculty of Biological and Environmental Sciences

Laitos – Institution – DepartmentDepartment of Biosciences

Tekijä – Författare – Author Ihan Cem DuruTyön nimi – Arbetets titel – TitleGenomic and transcriptomic analysis of Lactobacillus rhamnosus LC705Oppiaine – Läroämne – SubjectBiotechnologyTyön laji – Arbetets art – LevelMaster’s Thesis

Aika – Datum – Month and yearJanuary 2017

Sivumäärä – Sidoantal – Number of pages71

Tiivistelmä – Referat – Abstract

Lactobacilli are gram-positive lactic acid bacteria with wide beneficial properties for human health and foodproduction. Today most of the fermented products and probiotic foods are produced by lactobacilli species. Oneof the most using area of lactobacilli species is fermented products especially dairy products. Lactobacilli speciescan be used as starter or adjunct cultures in dairy products and play important role for preservation and quality,texture and flavor formation. Additionally, probiotic properties of lactobacilli species provide several health effectfor human by stimulation of immune system and protection against pathogens. Lactobacillus rhamnosus LC705 isa facultatively heterofermentative type lactobacilli which is used in production of dairy products as adjunct starterand protective culture. The complete and annotated genome sequence of L. rhamnosus strain LC705 published on2009. Known characteristics of L. rhamnosus strain LC705 are food preservation, toxin removal and healthbenefits when combined with other probiotic strains. However, molecular mechanism behind these characteristicsare not known or not clearly understood. To get further insight on these properties and roles in cheese ripening ofstrain LC705, we re-annotated genome of the LC705 with updated methods and databases, analyzed metabolicpathways of LC705, and performed RNA-seq experiment to determine gene expression changes of LC705 duringwarm room (25 °C) and cold room (5 °C) cheese ripening process.

Several un-characterized proteins of LC705 were annotated (77) and 1197 enzyme commission (EC) numbersare added to annotation file with re-annotation of genes of LC705. More importantly, re-annotation provided us 72new pathways of LC705 which is 35% of the entire collection of 201 pathways. Analyzes of pathways showedthat genome of LC705 has responsible genes for production of flavor compounds such as acetoin and diacetylwhich are provide buttery flavor to dairy products, and hydrogen sulfide which is a volatile sulfur compound thatcause unlikeable odor. Additionally to flavor compounds, we defined genes that produce anti-fungus compoundsand bacteriocin which provide food preservation characteristic to LC705. Determination of gene expressionrespond of LC705 during warm room and cold room cheese ripening process with RNA-Seq showed that centralmetabolism genes that responsible for lyase activity, degradation activity, disaccharides and monosaccharidesmetabolism are warm induced genes. The genes play role in citrate metabolism pathways were significantlydown-regulated during cold room, citrate degradation pathways are critical for buttery flavor products, thereforebuttery flavor compounds are produced by LC705 during warm room. Finally, during cold room ripening, thegenes of LC705 that produces ethanol and acetyl-CoA from pyruvate was up-regulated, so we may say thatLC705 uses pyruvate to produce ethanol and acetyl-CoA instead of lactic acid.

Avainsanat – Nyckelord – KeywordsLactic acid bacteria, RNA-seq, gene annotation, cheese, metabolic pathwayOhjaaja tai ohjaajat – Handledare – Supervisor or supervisors Olli-Pekka Smolander and Petri AuvinenSäilytyspaikka – Förvaringställe – Where depositedUniversity of Helsinki, Viikki Campus LibraryMuita tietoja – Övriga uppgifter – Additional information-

Table of Contents

Abbreviations.............................................................................................................................................5

1. Introduction............................................................................................................................................6

1.1. Lactobacillus...................................................................................................................................6

1.1.1. Role of Lactobacilli in Fermented Dairy Products..................................................................6

1.1.2. Probiotic strains........................................................................................................................9

1.2. Lactobacillus rhamnosus strain LC705........................................................................................10

1.3. Transcriptome................................................................................................................................14

1.4. RNA–Seq Analysis of the Transcriptome of the Bacteria.............................................................16

1.4.1. RNA Extraction......................................................................................................................16

1.4.2. Library Preparation................................................................................................................17

1.4.3. Sequencing of cDNA Library................................................................................................18

1.4.4. Analysis of Bacterial RNA-Seq Data for Differential Gene Expression Analysis.................21

1.4.4.1. Quality Check.................................................................................................................21

1.4.4.2. Filtering and Trimming...................................................................................................23

1.4.4.3. Mapping..........................................................................................................................23

1.4.4.4. Transcript Quantification................................................................................................24

1.4.4.5. Differential Gene Expression Analysis...........................................................................24

1.5. Functional Gene Annotation.........................................................................................................26

2. Aims of the Study.................................................................................................................................28

3. Materials and Methods.........................................................................................................................29

3.1. Cheese Sampling and RNA-sequencing.......................................................................................29

3.2. Gene Annotation Update and Pathway Analysis...........................................................................30

3.3. RNA-Seq Data Analysis................................................................................................................31

3.3.1. Filtering and Quality Control.................................................................................................31

3.3.2. Mapping and Counting Reads................................................................................................32

3.3.3. Differential gene expression..................................................................................................32

3.3.4. Gene enrichment analysis......................................................................................................33

4. Results..................................................................................................................................................35

4.1. Gene annotation update.................................................................................................................35

4.2. Metabolic pathways and metabolism of strain LC705 related to cheese......................................35

4.2.1. Fermentation Pathways..........................................................................................................37

4.2.2. Enzymatic Degradation of Amino Acids................................................................................37

3

4.2.3. Carboxylates Degradation......................................................................................................39

4.2.4. Carbohydrates Degradation....................................................................................................41

4.2.5. Production of Biogenic Amines.............................................................................................41

4.2.6. Vitamin Biosynthesis.............................................................................................................41

4.2.7. Lipolysis and fatty acid biosynthesis.....................................................................................43

4.2.8. Protective activities................................................................................................................43

4.3. RNA-Seq data analysis.................................................................................................................44

4.3.1. Read Filtering, Quality Check and Mapping.........................................................................44

4.3.2. Differential gene analysis.......................................................................................................44

4.2.3. Gene Ontology Enrichment Analysis.....................................................................................45

4.2.4. Pathway enrichment analysis.................................................................................................49

5. Discussion............................................................................................................................................52

6. Conclusions..........................................................................................................................................57

7. Acknowledgements..............................................................................................................................58

8. References............................................................................................................................................59

4

AbbreviationsBAM Binary Alignment/MapBBH bi-directional best hitBP biological processbp base pairCC cellular component EC enzyme commissionEMP Embden-Meyerhof-Parnas pathway FDR false discovery ratesGIT gastrointestinal tractGO Gene Ontology HboV human bocavirusIBS irritable bowel syndromeIFNγ interferon-γLFC log fold changeMF molecular function NSLAB non-starter lactic acid bacteriapadj adjusted P value PBMC peripheral blood mononuclear cellsPLP pyridoxal 5′-phosphate RPKM reads per kilobase per million mapped readsSAM Sequence Alignment/MapSMRT single-molecule real-timeTC Transporter Classification TMM trimmed mean of M-valuesUQ upper quartile VSC volatile sulfur compound ZMW zero-mode waveguide

5

1. Introduction

1.1. Lactobacillus

The genus Lactobacillus is one of the largest genus among the lactic acid bacteria and it

contains more than 150 species. (October 2016; http://www.ncbi.nlm.nih.gov/Taxonomy). This genus

contains rod-shaped, gram-positive bacteria that are acid-tolerant and has a genome with low GC

(guanine and cytosine)-content (Salvetti et al., 2012). Lactobacilli are mostly aero-tolerant anaerobic

organisms and they are strictly fermentative. They generally prefer to live in rich carbohydrate-

containing areas. They are found in mucosal flora of humans and animals (oral, intestinal and vaginal

microflora), in plants and in fermenting food (Merk et al., 2005). Based on the fermentation properties

of lactobacilli, there are three groups; obligately homofermentative, facultatively heterofermentative

and obligately heterofermentative (summarized in figure 1). Obligately homofermentative lactobacilli

principally convert hexoses to lactic acid through Embden-Meyerhof-Parnas pathway (EMP) or

glycolysis (Mcdonald et al., 1987). This lactobacilli group cannot ferment pentoses and gluconates.

Facultatively heterofermentative lactobacilli not only can convert hexoses to lactic acid by using the

EMP, but also it can additionally convert pentoses and gluconate to acetic acid, ethanol and formic acid

via the pentose phosphate pathway under glucose limitation (Salvetti et al., 2012). Obligately

heterofermentative lactobacilli primarily ferment pentoses and hexoses to lactic acid, ethanol and CO2

through phosphogluconate pathway (Salvetti et al., 2012).

1.1.1. Role of Lactobacilli in Fermented Dairy Products

Large number of lactobacilli species are used in the food industry, mainly in dairy products like

cheese, yogurt, fermented milk, and also in other fermented foods such as fermented meat sausages

(Fontana et al. 2016) and in bakery products (Garofalo et al., 2012). Reasons why lactobacilli are

commonly used in the food industry are the role of fermentation of lactobacilli, the food preservative

activity by acidification and production of bacteriocin and the anti-fungal compounds and the

enhancement of flavor, texture and nutrition (Giraffa et al., 2010).

6

Figure 1. Three different lactobacilli groups based on fermentation properties summarized.

The most common field that lactobacilli species are used in is cheese production. In cheese

production, the microbial enzymes are of main importance for the development of flavor. Cheese flavor

development is a complex biochemical process in which lactobacilli join this process by catabolize

milk compounds such as sugars (mostly lactose and citrate), proteins, amino acids and lipids.

Lactobacilli species like Lactobacillus helveticus can be used in cheese production as a starter

culture (O’Sullivan et al., 2016). However, lactobacilli species are mostly used as a non-starter lactic

acid bacteria (NSLAB). Starter cultures starts and accelerate the cheese ripening process in cheese by

producing intracellular enzymes such as peptidases, lipases and esterase (Hannon et al., 2003).

Breakdown of proteins and fats are mainly done by starter cultures, and these activities play an

important role in cheese flavor development (Lortal and Chapot-Chartier, 2005). Lysis of proteins by

starter cultures lead to better flavor in cheese and increase concentration of free amino acids (Lortal and

Chapot-Chartier, 2005). In cheese, casein is the main protein, and degradation of casein by peptidases

causes the reduction of bitterness taste (Fallico et al., 2005). Lipase activities in starter cultures cause

hydrolysis of milk fat into free fatty acids which are precursors of flavor compounds like esters

7

(Esteban-Torres et al., 2014). As a lactobacilli strain, Lactobacillus helveticus is highly autolytic strain

and it is used for accelerate the ripening process (Hannon et al., 2003).

NSLAB flora in cheese is dominated by facultative heterofermentative mesophilic lactobacilli

species such as Lactobacillus casei, Lactobacillus paracasei, Lactobacillus rhamnosus and

Lactobacillus plantarum (Gobbetti et al., 2015). NSLAB is used as secondary flora – as adjunct

culture- in cheese flavor production for different cheese types. Especially in Swiss-type cheeses, flavor

intensity depends to counts of facultatively heterofermentative lactobacilli and Propionibacterium

(Bouton et al., 2009). Several studies have showed the importance of adjunct lactobacilli on cheese

flavor development, and it is reported that adjunct lactobacilli species use free fatty acids, free amino

acids, lysis products of starter culture and some sugars. (Burns et al., 2012; Ortakci et al., 2015; Sgarbi

et al., 2013). Aminotransferase activity and glutamate dehydrogenase activity play key roles in

producing flavor products in lactobacilli (Peralta et al., 2016). Some NSLAB can utilize citrate in the

production of aroma compounds. As a result of citrate fermenting pathways in NSLAB, C4-

metabolites, which are acetoin, diacetyl and 2,3-butanediol, are synthesized. These compounds are

highly important for flavor formation (Martino et al., 2016). Selection of NSLAB is also a key step for

cheese production since properties of NSLAB vary a lot depending on species and strains. Some strains

can have dramatic effect on cheese flavor formation and some strains do not have any significant effect

(Crow et al., 2001).

Starter and adjunct lactobacilli microbiota play key roles in dairy product production, quality

and flavor formation. However, they can also produce undesirable compounds like biogenic amines

(Schirone et al., 2013). The production of biogenic amines, mostly histamine, tyramine, putrescine and

cadaverine, occurs from decarboxylation of amino acids such as histidine, tyrosine, lysine and ornithine

(Bunkova et al., 2010) (Summarized in figure 2). Biogenic amides are potential toxic compounds in

which consumption of high doses (>40 mg per meal) can cause several different types of diseases like

migraine, hypotension and allergic response (Shalaby, 1996). Recommended upper limit for the sum of

biogenic amines in cheese is 900 mg/kg (Valsamaki et al., 2000). Biogenic amine poisoning is mostly

reported from fish consumption, but cheese is also one of the reported causes (Linares et al., 2012).

Hence, controlling lactobacilli decarboxylation activities is important for food safety to avoid high

biogenic amine production.

8

1.1.2. Probiotic strains

Lactobacilli are also used in the health industries as probiotics. The definition of probiotics is

“live microorganisms which, when administered in adequate amounts, confer health benefit on the

host” defined by FAO/WHO (2002). Lactobacilli have been found to provide health benefits for

humans when 1 × 106 - 1 × 1010 viable cells per day were consumed (Bernardeau et al., 2006). The

most common lactobacilli species that are used in health industries due to many health benefits are

Lactobacillus rhamnosus, Lactobacillus acidophilus, Lactobacillus plantarum, and Lactobacillus casei

(Espino et al., 2015). Lactabacillus rhamnosus is the most studied lactobacilli species as a probiotics

(Douillard et al., 2013). Several studies show that lactobacilli provide health benefits to the host, such

as immunomodulatory effects (Johansson et al., 2016), protection effect against colon carcinogenesis

(Gamallat et al., 2016), reduction aggregation capacity of gastric pathogen Helicobacter pylori

(Bergonzelli et al., 2006), reduction of blood lipid concentration and heart disease (Liong and Shah

2005) and control of intestinal inflammation (Gao et al., 2015). Stimulation of immune system,

prevention of disease and protection against pathogens are the most common health impacts of

lactobacilli. Additionally lactobacilli species contribute to the nutritional support in humans by

producing several number of different enzymes that human cannot produce. These enzymes break

down important nutritional compounds for humans such as galactooligosaccharides (GOS), lactose and

some starches that cause abdominal pain (Turpin et al., 2010). Furthermore, vitamin synthesis from

lactobacilli is considered to be part of the nutritional benefit conferred from probiotic consumption

(Rossi et al., 2011). For example, vitamin B groups are produced by lactobacilli and vitamin B

complexes are essential for human diet (Kleerebezem and Hugenholtz, 2003). Lactobacilli attract

interest in markets of probiotic foods due to these health effects. Lactobacilli are species used around

the world in commercial products (table 1) and consumption of lactobacilli-containing probiotics is

high, especially in Europe (Saxelin, 2008). Hence, lactobacilli created huge probiotic industry accounts

for billions of Euros.

9

Figure 2. Decarboxylation activities in lactobacilli cause biogenic amines which are toxic compounds

for human. Produced biogenic amines from fermented food; histamine, cadaverine, tyramine and

putrescine are produced by histidine decarboxylase, lysine decarboxylase, tyrosine decarboxylase and

ornithine decarboxylase respectively.

1.2. Lactobacillus rhamnosus strain LC705

Lactobacillus rhamnosus strain LC705 belongs to Lactobacillus rhamnosus species which is

used in dairy products as protective and adjunct culture and probiotics. Strain LC705 is facultatively

heterofermentative type of lactobacilli, It can ferment sugars to lactic acid, ethanol, CO2 and citrate.

Strain LC705 is dairy adapted, allowing to be used as an adjunct starter culture in dairy products

(Kankainen et al., 2009). It also has properties of food preservation, toxin removal and health benefits

when combined with other probiotic strains (Espino et al., 2015).

10

Table 1. Lactobacilli strains which are used in commercial products and their health effect. (Saxelin et

al., 2005; Siezen and Wilson, 2010).

Species Strain Commercial Product Health effect

Lactobacillus rhamnosus

GG ActifitPlus®, Gefilus® Immune-modulatory effect, prevents diarrhoea

Lactobacillus helveticus& Lactobacillus rhamnosus

R0052 & R0011 A’Biotica® Helicobacter pyloriinhibition

Lactobacillus casei DN114001 Actimel®, DanActive® Acute diarrhoea treatment, infectionprevention

Lactobacillus plantarum 299v ProViva® Iron absorption

Lactobacillus paracasei St11 Lactobacillus fortis® Immune-modulatory effect, gut health

Lactobacillus casei Shirota Yakult® Alleviation of acute diarrhoea

Lactobacillus johnsonii NCC533 LC1 range® Immune-modulatory effect, pathogen inhibition

Complete and annotated genome sequence of strain LC705 was published in 2009

(Kankainen et al., 2009) has 3.03 Mbp genome and one plasmid. The genome contains 2992 genes and

the GC content is 47%. Compared to the Lactobacillus rhamnosus strain GG, strain LC705 has bad

adhesion capacity to human intestinal mucus (Myllyluoma et al., 2007). The comparative genomics

study between strain GG and strain LC705 showed that the reason for the poor adhesion of LC705 is

due to the lack of pilus gene cluster spaCBA (Kankainen et al., 2009). Within the Lactobacillus

rhamnosus species, only strain GG has unique pilus genes, which allows it to colonize the

gastrointestinal tract (GIT) efficiently.

Despite the poor adhesion adhesion capability of strain LC705 to GIT, it is used in

combination with other probiotic strains due to important health effects. Nine different studies show

health effects of probiotic therapy that contains strain LC705. When strain LC705 is used in probiotic

therapy with other probiotic strains which are lactobacillus rhamnosus GG, Bifidobacterium breve

11

Bb99 and Propionibacterium freudenreichii ssp. shermanii JS, It has been observed to alleviate

symptoms in irritable bowel syndrome (IBS) (Kajander et al., 2005; Kajander et al., 2008; Lyra et al.,

2010), decrease mast-cell activation and the effects of other allergy-related mediators (Oksaharju et al.,

2011), potentially reduces the presence of human bocavirus (HboV) (Lehtoranta et al., 2012). Another

study (Collado et al., 2007) with the same bacteria combination showed that probiotic bacteria

combinations have better synergistic effects against pathogens than the individual strains. When these

four strains are used with GOS, they stimulates the peripheral blood mononuclear cells (PBMC)

proliferation and interferon-γ (IFNγ) production, which are beneficial in allergic (Holma et al., 2011).

The same species combination was also tested for newborn infants, and the combination was safe for

the infants and also provided respiratory infection resistance (Kukkonen et al., 2008). LC705-

containing probiotic therapy in combination with L. rhamnosus GG, P. freudenreichii ssp. Shermanii

JS, and Bifidobacterium lactis Bb12 have other health benefits as well. This combination has beneficial

effects on the gastric mucosa in Helicobacter pylori-infected patients (Myllyluoma et al., 2007).

Additionally, previously it has been show that LC705 can activate the inflammasome and IFN response

in human macrophages better than strain GG (Miettinen et al., 2012). In conclusion, strain LC705 can

be used as probiotic strain with other probiotic strains in dairy products. Health effects of probiotic

therapies by using different combinations are summarized in table 2.

Toxin removal is one of the characteristics of LC705 (Kajander et al., 2007). The first study that

showed the ability of toxin removal of LC705 was published on 1998. This study showed that strain

LC705 has a significant effect in reducing levels of aflatoxin B1 (AFB1), which is a type of mycotoxin

in liquid media (El-Nezami et al., 1998). Aflatoxins are produced by the fungal species Aspergillus

flavus (A.flavus) and Aspergillus parasiticus, which are found in a variety of food products (Magnussen

et al., 2013). Within different aflatoxin types, aflatoxin B1 being the most common and toxic type, it is

also carcinogenic for animals and humans and increases the risk of liver cancer (Magnussen et al.,

2013). Removal of aflatoxin is important for foods which are contaminated with aflatoxin. Another

study (Nada et al., 2009) showed that strain LC705 not only removes the aflatoxin but also can inhibit

A.flavus growth which is responsible for aflatoxin production. Although LC705 is able to remove

aflatoxin by itself, combinations of lactic acid bacteria strains may work better in removing several

toxins (Halttunen et al., 2008). One clinical study in China showed that probiotic supplements

including strain LC705 reduced urinary excretion of aflatoxin B1-N7-guanine (AFB-N7-guanine), a

biomarker for increased risk of liver cancer (El-Nezami et al., 2006). Thus strain LC705 may be used

12

in diet to prevent the development of liver cancer. In addition to aflatoxin, LC705 can remove another

dangerous mycotoxin zearalenone, which can be found in foods such as maize, oats and wheat (El-

Nezami et al., 2002). More than 250 strains of lactic acid bacteria were tested for aflatoxins removal

and L. rhamnosus strains GG and LC705 are the most efficient for aflatoxin binding (El-Nezami et al.,

2006). Feature of toxin removal makes strain LC705 important for food and health industries.

Table 2. Probiotic therapy studies that contain strain LC705.

Probiotic strain

combination type

Health Effect(s) Reference(s)

Combination A Alleviates IBS.Decreases mast-cell activation and the effects of allergy-related mediators.Reduces the presence of HboV.Synergistic effects against pathogens.Consumption with GOS stimulates PBMC proliferation and IFNγ production.Increases resistance to respiratory infections for newborn infants.

Kajander et al., 2008, Lyra et al., 2010,Oksaharju et al., 2011,Lehtoranta et al., 2012,Collado et al., 2007,Holma et al., 2011,Kukkonen et al., 2008

Combination B Beneficial effect on gastric mucosa in Helicobacter pylori infected patients.

Myllyluoma et al., 2007,

Combination A = Lactobacillus rhamnosus GG, Lactobacillus rhamnosus GG, Bifidobacterium breve

Bb99 and Propionibacterium freudenreichii ssp. shermanii JS, Combination B = L. rhamnosus GG, L.

rhamnosus LC705, P. freudenreichii ssp. shermanii JS and Bifidobacterium lactis Bb12

L. rhamnosus LC705 is used in dairy products as a protective culture. The dairy products are a

very good growth medium for a great range of microorganisms, which allows microbial contamination

(Ruegg, 2013). In dairy products, food spoilage yeasts, molds and some pathogens must be controlled

for food safety. In cheese, common potential pathogenic bacteria species are Listeria monocytogenes

and Escherichia coli (Luukkonen et al., 2005), which may cause severe diseases. Several economic

losses may happen because of foodborne diseases because of these pathogens (Ruegg, 2013; Schlech

and Acheson, 2000). Hence food hygiene is critical for dairy products production. Protective cultures in

dairy products are very important sources for bio-preservative. Microorganisms that are used as bio-13

preservatives produce organic acids, hydrogen peroxide, diacetyl, antifungal peptides, and bacteriocins

as antimicrobial metabolites (Heredia-Castro et al., 2015). Due to restrictions on the use of chemical

additives and preservatives, it is important to use bio-preservatives in the food industry (Gao et al.,

2015). For that purpose, bacteriocin-based bio-preservation is common in the food industry (Gálvez et

al., 2007). Not much is known specifically about protective activities of the strain LC705. Suomalainen

and Mäyrä-Mäkinen (1999) developed a protective culture combination to replace chemical

preservatives like sorbic acid and acetic acid in fermented milk and bread. The developed protective

culture combination consists of two different bacteria: Lactobacillus rhamnosus LC705 and

Propionibacterium freudenreichii ssp. shermanii JS. That study showed that L. rhamnosus LC705 and

P. freudenreichii ssp. shermanii JS together inhibited the growht of food spoilage yeasts, molds and

Bacillus species. Another study (Luukkonen et al., 2005) showed that Lactobacillus rhamnosus LC705

can inhibit the growth of Listeria in Edam cheese. Neither of the studies explained the mechanism of

the inhibitory effect. This inhibitory mechanism requires further investigation in LC705.

1.3. Transcriptome

The transcriptome is the whole set of both coding and non-coding RNAs transcribed from the

genome of an organism at a specific stage. Transcription analysis is very common and important

because it provides functional characterization of the genome, an understanding of the expression of

genome at the transcription level and annotation of transcripts and operons. This information helps to

reconstruct gene networks that give the opportunity to understand cellular functions, regulatory

pathways and biological systems and also to understand disease processes and help in diagnosis and

drug discovery.

For transcriptome studies, different types of technologies have been developed: hybridization

techniques (microarrays) and sequencing techniques. For the hybridization techniques, mostly

annotated genomes are used to construct microarrays, and RNAs of the sample are converted to labeled

complementary DNA (cDNA). Incubation process provide hybridization of labeled cDNA and

microarrays which provide detection of transcript levels (Van Vliet, 2010). To get very high-resolution

results from hybridization techniques, genomic tiling microarrays were constructed. It uses overlapping

short sequence of probes that provide genome-wide analysis of RNA with high resolution (Selinger et

al., 2000). Genomic tiling microarray method is not commonly used because it is very expensive (Pinto

et al., 2011). For a long time, microarrays were mainly used in transcriptomic studies and thousands of

14

publications were published using microarrays. However, microarray technology has critical

limitations. It produces high rate of noise because of cross-hybridization, and transcription level

comparison is often difficult because it requires complicated normalization methods (Wang et al.,

2009).

Sequencing techniques for transcriptome studies are based on determining cDNA sequence.

Sequencing methods like Sanger sequencing of cDNA or expressed sequence tags (EST) libraries and

tag-based methods such as serial analysis of gene expression (SAGE), cap analysis of gene expression

(CAGE) and massively parallel signature sequencing (MPSS) were used for studying transcriptomes.

These methods are highly expensive and have some difficulties in mapping to the reference genome

(Wang et al., 2009). Due to these disadvantages, usage of traditional sequencing is limited in

transcription studies. New approach to sequencing which is next generation ultrahigh-throughput

sequencing, provided a new method for quantifying transcriptomes. This method is RNA-sequencing

(RNA-Seq). RNA-Seq has several advantages over other traditional methods. It is similar price than

other methods. It does not require probe sequences nor genome to be existing prior analysis. It can

analyse whole transcription, thereby it is possible to work with operons and non-coding RNAs.

Additionally, RNA-Seq data provides high resolution results, which improves the mapping of the

sequences to the reference genome. It does not produce high background noise and new transcripts can

be seen with RNA-Seq. Lastly, it is more sensitive in low levels of expression when sufficient

sequencing depth is reached. With these advantages, RNA-Seq became the gold standard for

transcriptome studies (Wang et al., 2009).

The basic working principle of RNA-Seq starts with the conversion of isolated RNAs to a

library of cDNA fragments with the addition of adapter sequences by library-construction methods.

Molecules which are obtained by library-construction are sequenced via ultrahigh-throughput

sequencing machine to get single-end sequencing or pair-end sequencing results. Any of the next-

generation sequencing platforms like Illumina, SOLiD, Roche 454, Ion Torrent, and Pacific biosciences

(Pacbio) can be used for RNA-sequencing. The size and quality of the reads from sequencing machine

vary depending on the DNA-sequencing machine and DNA-sequencing technology. Every platform

uses different library construction protocols and methods. After getting the reads from sequencing,

bioinformatic analysis is needed to process the huge data. The bioinformatic analysis is critical for

RNA-Seq and it is as challenging as the experimental process including isolation of RNAs, cDNA

library construction and sequencing. Bioinformatic analysis procedure of RNA-Seq can be vary

15

depending on experimental design, but it mostly includes quality control of RNA-Seq data and filtering

of the data, alignment of the filtered reads to a reference genome, calculation of expression levels of

genes, analysis of differentially expressed genes and their visualization.

1.4. RNA–Seq Analysis of the Transcriptome of the Bacteria

RNA transcripts in a population of bacterial cells can be studied by sequencing of cDNA

libraries with RNA-Seq technology. In comparison to eukaryotic genomes, bacterial genomes are

smaller and contain high rates of coding sequences (CDS). Nevertheless, compared to eukaryotic RNA

transcriptome studies, bacterial transcriptome studies are more challenging. Eukaryotic messenger

RNAs (mRNAs) have a long poly-A tail that provides easy mRNA isolation from other RNAs by

hybridization to immobilized poly-T residues. However, mostly prokaryotic mRNAs either do not

contain poly-A tails (Srinivasan et al., 1975) or contain a short poly-A tail (Sarkar et al., 1997).

Additionally, bacterial mRNA generally has shorter life time and high ribonuclease activity make

bacterial mRNA unstable (Deutscher, 2006). Due to these difficulties, RNA-Seq technology was first

used for eukaryotic cells (Van Vliet, 2010). These difficulties were overcame with new methods (Sorek

and Cossart, 2010) that include capturing and removing rRNAs, degrading rRNAs with magnetic

beads, or using artificially polyadenylated mRNAs. Currently, developing technologies are creating

methods and protocols (Avraham et al., 2016; Shishkin et al., 2015) that allow working on single cells

and studying host and pathogen transcriptomes with barcoding and pooling RNAs before library

preparation.

The following sub-chapters will cover the steps of RNA-Seq workflow (summarized in figure

3), RNA-Seq data analysis for transcriptome and determination of gene expression changes in bacterial

cells.

1.4.1. RNA Extraction

To generating cDNA libraries of bacterial RNA, the first step is to purify RNA from bacteria.

Generally RNA is extracted from bacteria by using commercially available kits like RNAEasy

(Qiagen), TRIZOL Max (Life Technologies), or RiboPure (Life Technologies). These kits extract RNA

by using specific lysis and filters or membranes. The kits provide easy, rapid and efficient purification

of high-quality RNA. It is important to choose a method or kit that does not bias the transcriptome and

16

is able to collect enough RNA for the library construction (Croucher and Thomson, 2010; Wilhelm and

Landry, 2009). Library construction requires approximately 0.1–10 μg of total RNA, depending on the

aim of the study and library construction protocols (Pease and Sooknanan, 2012). DNA contamination

is very common during the RNA extraction step, hence DNase treatment is highly recommended to

degrade genomic DNA. For bacterial cells, extracted RNA samples contain high-concentration of

rRNA. High amounts of rRNA can create wasted effort sequencing the same rRNA millions of times

(Wilhelm and Landry, 2009). Hence depletion of rRNA by hybridization using commercial kits like

Ribo-Zero Magnetic Kit (Bacteria) (Epicentre Biotechnologies) and MicrobExpress Bacterial mRNA

Purification Kit (Ambion) can be done to enrich mRNA and non-coding RNA (ncRNA). Nevertheless,

Lahens et al. (2014) showed that rRNA depletion creates critical biases in coverage. Therefore mRNA

enrichment step is optional. After RNA extraction, it is important to check quality of RNA for

degradation, quantity and purity.

1.4.2. Library Preparation

RNA samples are not directly sequenced. Next-generation sequencing technology prefers to

sequence DNA directly because of technical maturity and better stability of DNA (Korpelainen et al.,

2014). Therefore, the main aim of library construction for RNA-Seq is to convert single-stranded RNA

to a library of cDNA fragments by reverse transcription with adding adapter sequences. Adapter

sequences help sequences reads to combine with primers for amplification during sequencing. Library

preparation steps can vary depending on library protocols. These variations between protocols can

create differences in strand-specificity, library complexity, coverage and level of gene expression

(Levin et al., 2010). There are several different protocols for library preparation, but the main basic

steps are mRNA fragmentation, cDNA synthesis by reverse transcription, adapter ligation and PCR

amplification and purification of library.

There are two different main protocols for library construction: non-strand-specific protocol and

strand-specific protocol. Non-strand-specific protocols are cheaper and contains fewer steps compared

to strand-specific protocols. Non-strand-specific protocols work well in gene expression studies but

cannot determine original strand of the transcript which is important for overlapping transcripts, gene

regulation and antisense RNA studies. Non-strand-specific library generation is mostly based on one

standard protocol. The fragmentation step in non-strand-specific library generation can be done in two

ways: fragmentation of cDNA or fragmentation of RNA samples. Both methods create the same cDNA

17

library. However, RNA sample fragmentation has less bias compared to cDNA fragmentation (Wang et

al., 2009). After the RNA samples fragmentation, forward and reverse strands of cDNA are synthesized

by reverse transcription using random hexamers which bind to random complementary sites on a target

cDNA.

To determine the strand of the transcript, strand-specific protocols are used. Strand-specific

protocols are technically more challenging and more time consuming compared to non-strand-specific

protocols. There are two main classes of strand-specific RNA-Seq library construction protocols that

use different techniques. The first technique uses strand-specific adapters in a known orientation, hence

these protocols mark the 3’ end and 5’ ends of the mRNA when cDNAs are created. The second

technique is chemical modification of strands including bisulfide treatment and dUTP modification

(Mills et al., 2013). Every step in strand-specific protocols can change depending on the sequencing

platform used, and every sequencing platform has published its own protocols and kits for library

construction. It is recommended to use the library construction protocols of sequencing platforms, but it

can be expensive. Hence low-cost RNA sequencing protocols (Hou et al., 2015; Wang et al., 2011) are

quite common. Library preparation step is critical for the whole analysis because protocols are

generally biased processes. Several factors can create bias including favored specific sites in

fragmentation of RNA. PCR library amplification step can create critical bias including sequence

artifacts due to unequal amplification or amplification efficiency (van Dijk et al., 2014). Bias in the

library preparation step can cause low-complexity libraries, which is problematic for sequencing step.

Hence it is very important to choose protocol and kits carefully.

1.4.3. Sequencing of cDNA Library

Purified cDNA fragments can be sequenced by using a variety of next-generation sequencing

platforms such as Illumina, SOLiD, Ion Torrent and Pacific Biosciences. Each different sequencing

platform provides different lengths of reads, depth of coverage and quality by using different methods

to sequence. Illumina is the most commonly used high-throughput sequencing platform that sequences

via synthesis technique (Reuter et al., 2015). In this process, cDNA fragments are hybridized to glass

slide (flowcell) based on complementarity with adapter sequences. To create high amount of hybridized

cDNA fragments, they are amplified as a bridge and clusters are generated by polymerases. After the

bridge amplification, the reverse strand is removed by washing. For sequencing by synthesis,

nucleotides are differently labeled with fluorescent tags are added (Liu et al., 2012). Additional

18

sequencing reagent start polymerization reaction and only one base complement the template at a time.

Different fluorescent tags from the nucleotides produce unique emission wavelengths and imaging of

the emission wavelength are used to determine the base by a coupled-charge device (Reuter et al.,

2015). Sequencing can be performed at one end or from both ends (paired-end). If paired-end

sequencing is used, both ends of the cDNA fragment is sequenced. Illumina produces wide ranges of

instruments (Miniseq series, Miseq series, Hiseq series and NovaSeq series) with different features.

Depending on the instrument model, read lengths can be 100-300 nucleotides and run time of these

instruments can be vary between 4 hours and 3 days.

The SOLiD sequencing platform uses oligonucleotide ligation-based chemistry instead of

synthesis method. In SOLiD platform, cDNA fragments are attached to magnetic beads that contains

complementary oligonucleotides and emulsion PCR is performed for clonal sequencing as an

amplification step. Amplification products are selected and bound to a flowcell by covalent bonding.

On a SOLiD flowcell, sequencing is performed by a DNA ligase. Universal primer is bound to the

adapter sequence of amplified fragments to create a ligation site for specific octamers, which contain

di-base probes and fluorescent labels (Shendure and Ji, 2008). There are 16 possible di-base

combination with four fluorescent labels. If the di-base is complementary to the cDNA fragment,

ligation occurs and fluorescent signals are produced. In the second cycle, specific octamer is cleaved at

the 5th base and a ligation site is generated for the new specific octamer. Therefore ligation cycles

sequence every 5th base of amplified fragments. The number of cycles of ligation determine the read

length. For the 35 base-pair (bp) read length, seven cycles of ligation is performed. After cycles of

ligation, the first universal primer is removed from the amplified cDNA fragment and new primer is

bound to the adapter sequence of amplified fragment at the n-1 position for a new round of ligation

cycles. This process is repeated with new primer positions to get the whole read of the amplified

fragment. Due to the usage of dinucleotide probes, each base is interrogated twice. The di-base probe

approach from SOLiD provide a low error rate with accuracy of up to 99.99% (Korpelainen et al.,

2014). Depending on the sequencer model of the SOLiD platform, the read length can be vary from 35

bp to 85 bp and the run time can take up to 7 days (Liu et al., 2012).

Ion Torrent is another sequencing platform that uses sequencing by synthesis technique like the

Illumina sequencing platform. Instead of measuring light signals, Ion Torrent detects pH changes

caused by the production of hydrogen ions during DNA synthesis (Rothberg et al., 2011). To detect

hydrogen ions, Ion Torrent platform uses complementary metal-oxide semiconductor (CMOS)

19

technology with ion-sensitive field-effect transistor because high sensitivity of hydrogen ions

(Rothberg et al., 2011). The process in Ion Torrent starts with hybridization of cDNA fragments to

bead. cDNA libraries which contain adapter sequences are clonally amplified to magnetic bead by

using emulsion PCR. Template-carrying beads are loaded to sensor microwells where sequencing by

synthesis chemical reaction occurs. In the sensor microwells of the Ion Torrent platform, four different

nucleotides provided one by one. When a nucleotide is complementary to the template, it binds to

template by polymerase and hydrogen ion is released. Released hydrogen is detected by the

semiconductor sensor and converted to a voltage signal (Reuter et al., 2015). The Ion Torrent

sequencing platform offers high-speed, high-quality and low-cost sequencing performance without

fluorescence labeling and camera scanning. It has a limitation with homopolymer repeats.

Homopolymer of length longer than six base pairs create high error rates (Reuter et al., 2015). Ion

Torrent sequencing platform allow to use maximum 200 bp to 400 bp read length. The run time of the

sequencing can vary between two and eight hours depending on the machine model.

The Pacific Biosciences (PacBio) sequencing platform uses single-molecule real-time (SMRT)

technology which represents third-generation sequencing. The leading difference of SMRT sequencing

is that it requires only single molecules and enables real-time observation. As it works with single

molecules, clonal amplification by PCR is not needed; hence DNA preparation time is shorter than in

other sequencing platforms and can avoid bias and errors caused by PCR for whole genome sequencing

(Liu et al., 2012). For RNA-Seq, Pacbio platform uses the isoform sequencing (Iso-Seq) application.

Iso-Seq requires large scale PCR but it provides full length transcripts without assembly and isoforms

of transcripts. In SMRT, the sequencing template (SMRTbell) is prepared by ligation of hairpin

adapters to both end of cDNA molecules. Prepared SMRTbell is transferred to chip named SMRT cell

which contains zeptoliter-sized chambers called zero-mode waveguides (ZMW). DNA synthesis occurs

in the ZMW with polymerase at the bottom which binds to the hairpin structure of SMRTbell. During

DNA synthesis, four fluorescently-labeled nucleotides are used. Fluorescent signals is detected by

camera in real time (Reuter et al., 2015). One of the advantages of the PacBio platform is read length.

Compare to other platforms PacBio can generate much longer reads. The average read length of the

PacBio sequencing is over 10 Kbp and maximum reads can be longer than 60 Kbp (Rhoads and Au,

2015). Nevertheless, PacBio sequencing platform creates high error rate (up to 10-15%) compared to

other sequencing platforms (Rhoads and Au, 2015).

20

1.4.4. Analysis of Bacterial RNA-Seq Data for Differential Gene Expression Analysis

The huge amount of data is created from RNA-Seq needs to be analyzed. Interpretation of

RNA-Seq data requires robust computational programs, bioinformatic tools and powerful servers.

There are several types of applications for RNA-Seq experiments like fusion gene detection,

differential gene expression studies and RNA editing. There is no single pipeline for those applications.

Quality control, data filtering and trimming, alignment of reads (mapping) or de novo assembly and

indexing (quantification) of reads are common steps for all applications. For gene expression studies,

extra statistical steps are needed for identifying significance in genetic expression within the context of

experimental conditions.

1.4.4.1. Quality Check

As a result of RNA-Seq millions of raw short sequence reads are mostly generated as a FASTQ

file. During generation of these reads, critical quality problems may occur because of sequencing

errors, library preparation errors and PCR bias. These poor-quality sequences can badly affect mapping

and assembly results and gene expression estimates. Raw bases also contains adapter sequences that

need to be trimmed before mapping. To check the quality of raw reads, several tools have been

published, such as FastQC (Andrews, 2010), NGSQC (Dai et al., 2010) and PRINSEQ (Schmieder and

Edwards, 2011). With these tools, it is possible to determine base sequence qualities based on Phred

quality score, sequence GC content, read length distribution, sequence duplication and adapter

contents. Base sequence qualities based on Phred quality score measures the error probability for each

base (Ewing and Green, 1998). Quality values are encoded into the next-generation sequencing output

(FASTQ file format) with ASCII characters. Depending on the sequencing platform, different ASCII

offsets can be used. Offset of 64 and 33 (also named Sanger encoding) are mainly used. For files that

use Phred scores with an ASCII offset of 64, the 64 th character is used as zero. For files with an ASCII

ofset of 33, the 33rd character is used as zero. The Phred quality score calculation is logarithmic based.

For example, a Phred score of 20 means there is a 1 in 100 chance that the base is wrong, hence

accuracy is 99%. Similarly, a Phred score of 30 means that on average every 1000 bp read contains an

error, hence accuracy is 99.9%.

21

Figure 3. RNA-Seq workflow starts with sample preparation and RNA extraction. Total RNA from

samples can be extracted with some commercial kits. mRNA enrichment step is optional and can be

applied with commercial kits. After pure RNA extracted, library preparation can be done with RNA

fragmentation, reverse transcription, adapter ligation, and library amplification steps. Finally

sequencing can be done with next-generation sequencing platforms. Produced RNA-Seq data can be

analyzed with bioinformatic tools.

22

1.4.4.2. Filtering and Trimming

After the quality check, filtering and trimming the data is an important step to get rid of bad-

quality reads and adapter sequences, while improving the mappability. Filtering the data represents

removing the whole read, while trimming represents removing poor-quality ends of reads.

Trimmomatic (Bolger et al., 2014) and Cutadapt (Martin, 2011) can be used for both filtering and

trimming the data. For filtering the data, the mean quality of reads and read length are two key factors.

It is difficult to define best values for minimum quality and minimum length filtering. Filtering values

should be applied according to experimental design and sequencing platform. For example, for gene

expression profile studies, filtering values affect significantly the gene expression estimates (de Sá et

al., 2015; Williams et al., 2016). Ideally the recommended Phred quality score is 25 or higher

(Korpelainen et al., 2014). Therefore quality filtering should be applied for reads with scores less than

25 and for very short reads. Trimming the data is needed for RNA-seq studies for two main reasons:

adapter trimming and low-quality base trimming. For filtering the reads, the mean quality of reads is

considered. However, for trimming the reads, base quality is considered. For many sequencing

platforms, quality of reads decreases towards the 3’ end of reads (Conesa et al., 2016). The low-quality

ends need to be trimmed based on base quality. Adapter sequences that were added to reads during

library generation should also be trimmed. If adapter sequence is not known, some adapter prediction

algorithms (Schmieder et al., 2010) can be used. Bioinformatic tools that are used for filtering and

trimming the data may require knowing the type of data information like paired-end or single-end,

adapter location (3’end or 5’end) and type quality encoding type (ASCII 33 or 64). It is critical to

define these parameters to get proper results from filtering and trimming RNA-Seq data.

1.4.4.3. Mapping

After high-quality data is obtained from filtering and trimming, the next step is to align these

millions of reads to a reference genome or transcriptome to define the locations of the reads. Mapping

for bacterial RNA-Seq data is easier than eukaryotic RNA-Seq data, due to the fact that splicing

activities are very rare and that introns are very low. Due to the millions of reads that are aligned, speed

was a problem for mapping (Trapnell and Salzberg, 2009). Several published tools such as STAR

(Dobin et al., 2013), TopHat2 (Kim et al., 2013) and Bowtie2 (Langmead and Salzberg, 2012) provide

ultra-fast and memory-efficient read aligning. The algorithms that are used for alignment not only

23

provide ultra-fast alignment, but also cope with mismatches and indels very efficiently. The typical

created output file from aligners is Sequence Alignment/Map (SAM) files (Li et al., 2009). SAM files

contain several information about mapped read like mapping quality, reference sequence name, query

quality and read size. SAM files can be unsorted and require high amount of disk space. Hence it is

efficient to convert and sort SAM files to the Binary Alignment/Map (BAM) file format, which is the

binary representation of SAM files. After mapping, it is important to check the quality of mapping and

percentage of mapped reads. Additionally, visualizing the reads by using a reference genome can allow

quality check by human eye. Visualizing can be done with Integrative Genomics Viewer (Robinson et

al., 2011) and Tablet (Milne et al., 2010).

1.4.4.4. Transcript Quantification

After mapping, quantification is done to define the number of reads that are aligned to each

gene, transcript or exon. If a reference genome is already annotated, aligned reads can be easily

counted by using the coordinates of transcripts. This gene expression estimates can be done with

bioinformatic tools such as HTSeq (Anders et al., 2015) and featureCounts (Liao et al., 2014). For the

quantification step, reads mapped to multiple regions or overlapped with multiple genes can create

problems for differential gene expression studies. HTSeq and featureCounts offer the option to exclude

these multi-overlapped reads or count them for each overlapped gene. Generally for differential gene

expression analysis, excluding multi-overlapped reads is considered the better option (Liao et al., 2014;

Anders et al., 2015). If there is no reference genome, quantification can be done by using de novo

transcriptome assembly. Finally, transcript quantification can be done without mapping the reads by

using software such as Sailfish (Patro et al., 2014) and Kallisto (Bray et al., 2016). Transcript

quantification without mapping is much faster than other methods because of skipping mapping step.

Quantification without mapping method uses counts of k-mers across reads. Hence the k-mer length is

a critical parameter for these tools. It is important to choose proper parameters to get accurate results

(Patro et al., 2014; Bray et al., 2016).

1.4.4.5. Differential Gene Expression Analysis

Gene expression profiling identifies genes that are differentially expressed in different

conditions. Basically, differential gene expression analysis is a statistical power analysis for transcript

quantification. Several different R packages have been published for differential gene expression24

analysis such as DESeq2 (Love et al., 2014), edgeR (Robinson et al., 2010), EBSeq (Leng et al., 2013),

and ShrinkBayes (van de Wiel et al., 2014). There are three main steps for differentially expressed gene

analysis: normalization, dispersion estimation and significance testing to provide false discovery rates

(FDRs). Normalization is the first step for differential gene expression analysis, raw RNA-seq data has

some problems that originated from sequencing. Sequencing depth and library sizes are different for

each samples, hence raw counts cannot be used without proper normalization (Soneson et al., 2013).

For that purpose different normalization methods are exist like total count, upper quartile (UQ),

trimmed mean of M-values (TMM), reads per kilobase per million mapped reads (RPKM) and DESeq

normalization method. Dillies et al. (2013) showed that total count, RPKM and UQ methods are not

efficient, hence TMM and DESeq normalization methods are recommended to use for normalization

step. Most R packages use raw count data and perform internal normalization. Some packages like

ShrinkBayes require normalized data.

Fold change based on read counts is commonly used measure for differential gene expression

(Feng et al., 2012). It is a measure that determines quantity changes between samples. Using raw fold

change can create problems for low counts, hence dispersion estimation is needed before significance

testing. For example, to remove this problem, DESeq2 uses log fold change (LFC) shrinkage approach

(Love et al., 2014). As a result of LCF shrinkage approach, DESeq2 R package returns shrunken log2

fold change estimates for each gene. Shrunken LFC is divided by its standard error for the significance

testing (Wald test) in DESeq2 package. The Wald test gives P-values of gene expressions. As multiple

testing creates false positive results, adjustment for it is needed. Hence, estimation of FDR with the

procedure of Benjamini-Hochberg is used in DESeq2 to adjuste multiple testing (Love et al., 2014).

Different methods can be seen by other R packages, but as a result of statistical power analysis, FDR

and fold change data determine the significantly differentially expressed genes. Up-regulated and

down-regulated gene lists can be created by determining a proper threshold for FDR and fold change

data.

1.4.4.6. Downstream Analysis: Gene Set Enrichment and Pathway Analysis

Produced gene sets from differential gene expression analysis (up-regulated and down-regulated

genes) need to be biologically interpreted by enrichment analysis to determine molecular functions and

pathways of differentially expressed genes. For the gene enrichment analysis, transcriptome needs to be

annotated by functional categories. Gene Ontology (GO) (Gene Ontology Consortium et al., 2013) is

25

the most common annotation library for classification of genes. GO contains three different main

domains including biological process (BP), molecular function (MF) and cellular component (CC).

Annotated genes are assigned with descriptive GO definitions (GO terms) that show the process or

activity of a gene. Several tools have been published for GO classification and GO enrichment such as

DAVID (Huang et al., 2007), BiNGO (Maere et al., 2005) and LEGO (Dong et al., 2016). There are

different approaches for enrichment analysis statistical tests. Fisher's exact test and the hypergeometric

test are two main common tests for enrichment analysis (Rivals et al., 2007). Basically, these tools

calculate P-values with statistical tests for GO terms dependent on a list of up-regulated or down-

regulated genes.

Besides GO enrichment, other databases like KEGG (Kanehisa and Goto, 2000) and BioCyc

(Caspi et al., 2016) can be used for pathway analysis, pathway construction and defining enriched

pathways. Pathway analysis is critical for understanding the biological functions of transcriptome data.

It provides determination of function of genes and their roles in pathways. Pathway construction can be

done with automatic pathway servers such as KAAS (Moriya et al., 2007) and RAST (Aziz et al.,

2008). These servers are fully automatic, therefore functional annotation of proteins is also done by

servers with finding best similarity matches and through homology to reference genes. Additional to

fully automatic pathway construction servers, bioinformatic tools like Pathway Tools (Karp et al.,

2002) can be used to construct pathways. Unlike pathway construction servers, Pathway Tools does not

functionally annotate the genome. It requires annotation file which can include protein names, enzyme

commission (EC) number and GO terms. Annotated proteins are mapped to BioCyc pathway database

in Pathway Tools. As the annotation is not automatic, it is possible to construct more precise and clear

pathways in Pathway Tools by checking the annotation manually. Similar to GO enrichment analysis,

pathway enrichment analysis is done with statistical tests such as Fisher's exact test. By using separated

lists of up-regulated and down-regulated genes, it is possible to define which pathway is activated or

deactivated under specific condition.

1.5. Functional Gene Annotation

The prediction of functional role of genes is critical step for genomic studies. Functional

annotation is the first step that provide biological role of the gene. Without accurate annotation, there is

a big possibility to miss important results. The information of functional annotation of genes can be

26

vary depending on the databases such as UniProt (Boutet et al., 2016), RefSeq (O'Leary et al., 2016),

KEGG and GO. As a result of gene annotation, function of a gene can be defined with functional

description (protein name), EC number, Transporter Classification (TC) number (Saier et al., 2016) and

GO terms. EC and TC numbers are classification numbers, EC number is four digit numerical

classification for enzymes, and therefore it is more useful for defining enzyme functions, while TC

number is used for classification of membrane transport proteins. Automated functional gene

annotation is mostly done by sequence similarity and through homology. Sequence similarity method

uses sequence-sequence comparison against databases by using alignment algorithms by using

alignment tools like BLAST (Altschul et al., 1990). With best hit score of alignment, annotation

determined with finding homology to genes which function is experimentally known.

Automated gene annotation based on sequence similarity provide fast annotation, but error rate

can be quite high (Jones et al., 2007). Best matching sequence is not always give accurate results for

annotation (Promponas et al., 2015), and it requires manual curation to discard false positive

annotations. Due to it is very slow to check all annotation manually, bioinformatic tools which use

intelligent methods to decrease risk of incorrect annotation like PANNZER (Koskinen et al., 2015) can

be used. PANNZER reduces errors by using k-nearest neighbour clustering and novel weighted

statistical testing (Z-score) instead of best BLAST hit method. Additionally, the source of error and

misannotation can be also false positive annotations in databases, level of misannotation can reach 60%

for some databases (Schnoes et al., 2009). Therefore using multiple databases and choosing accurate

database is one of the critical steps for successful functional gene annotation.

27

2. Aims of the Study

The aim of this study was provide genomic and transcriptomic analysis of Lactobacillus

rhamnosus strain LC705 by using RNA-Seq technology and bioinformatic methods to increase

understanding of roles of L. rhamnosus strain LC705 during cheese ripening in different temperatures.

In this work, the specific focus was to:

I) Re-annotate gene annotations of strain LC705.

II) Analyze metabolic pathways of strain LC705 for cheese ripening by using updated gene

annotations.

III) Determine the gene expression responses of Strain LC705 during warm room and cold room cheese

ripening process.

28

3. Materials and Methods

3.1. Cheese Sampling and RNA-sequencing

RNA were extracted for RNA transcriptome analysis from three warm and three cold room

cheese samples during ripening process (Figure 4). Cheese type which was used for this research was

semi hard Maasdam-type cheese. For this study, semi hard Maasdam-type cheese cooked with

mesophilic cooking recipe by using mesophilic starter and thermophilic cultures; Lactococcus lactis

ssp. lactis, Lactococcus lactis ssp. cremoris, Lactobacillus rhamnosus strain LC705, Lactobacillus

helveticus and Propionibacterium freudenreichii ssp. shermanii JS. Three cooked cheeses named A, B

and C were cut into four parts and wrapped with ripening foil for the ripening step. First 30 days,

cheeses were ripened in warm room which was at +20°C. After 30 days in warm room ripening,

cheeses were transferred into cold room which was at +5°C, and ripening process was continued for

another 30 days. The first triplet samples were collected at warm room on day 12 of ripening and

samples were named A1, B1 and C1. The second triplet samples were collected at cold room on day 37

and samples were named A3, B3 and C3 (Figure 4).

Cell isolation process from cheese samples started by blending cheese samples with 2% tri-

sodium citrate in a Stomacher filter pouch and mixing with Stomacher blender for 2.5 minutes.

Obtained liquid was centrifuged for 7 minutes then liquid and fat were removed. Suspension of pellet

was done in 2 ml RNA Protect Bacteria Reagent (Qiagen). After second centrifuge process, liquid and

fat were removed again and cell pellet was obtained. For the RNA extraction from cell pellet, method

described by (Koskenniemi et al., 2011) was used.

RIBOZero kit (Epicentre) was used for removing rRNA by following manufacturer instructions.

By using Ovation RNA-Seq System V2 (NuGEN Technologies Inc., San Carlos, CA) RNA fractions

were amplified as cDNA and libraries were constructed. Libraries were sequenced by using the SOLiD

5500XL (Life technologies) sequencer with single-end sequencing method with non-strand-specific

protocol. SOLID 5500XL sequencer converted 306,557,758 reads to FASTQ-files. Created FASTQ-

files were transferred to computer servers for analyzing and interpreting the data.

29

Figure 4. Cooked semi hard Maasdam-type cheeses were ripened 60 days in total. 30 days in warm

room and 30 days in cold room. First triplet samples (A1, B1 and C1) collected at day 12, quarter of the

cheese was taken for RNA-sequencing. Second triplet samples (A3, B3 and C3) collected at day 37,

another quarter of the cheese was taken for RNA-sequencing.

3.2. Gene Annotation Update and Pathway Analysis

Before the RNA-Seq data analysis, as a first step of the project, the gene annotation of strain

LC705 (published on 2009 (Kankainen et al., 2009)) was updated. To update the gene annotation of

strain LC705, the bioinformatic tools PANNZER, Blast2GO (Conesa et al., 2005) and KEGG

automatic annotation server (KAAS) were used. For identification and visualization of the new

pathways with updated annotations, we used Pathway Tools. Gene annotation update was performed in

two steps, first step was updating conserved and putative proteins and the second was adding updated

enzyme code numbers to annotated genes.

30

For the conserved and putative proteins update, PANNZER was used. Proteins with Description

Similarity Measure scores higher than 0.7 were considered as significantly high score in PANNZER.

To obtain updated EC numbers, protein FASTA file, Blast2GO and KAAS tools were used.

With Blast2GO, we run NCBI BLASTp by using RefSeq and Swiss-Prot database by using parameters

of BLAST expectation value (E-value) 1.03E-3, number of BLAST hits 20 and taxonomy filter

bacteria. Thereafter annotation file contains EC numbers of 904 protein products was generated by the

tool with RefSeq database. KAAS tool was also used by uploading FASTA file of strain LC705 to

KAAS server and using BBH (bi-directional best hit) alignment method. KAAS gave us EC numbers

of 615 protein products as an output. Comparing Blast2GO and KAAS EC number annotation results

showed that 500 genes annotated commonly. Addition to 500 genes, Blast2GO annotated 404 unique

genes more while KAAS annotated 115 unique genes. Combination of these results gave us EC

numbers for 1019 genes. Obtained EC numbers and new protein descriptions from PANNZER were

added to annotation file of LC705 with manually written Python script and UNIX command-line

applications.

Finally, to see the difference between old protein products and new protein products, we

compared old annotation file and updated annotation file in Pathway Tools. For that purpose, we

converted annotation files to Genbank format with EMBOSS Seqret (Rice et al., 2000). Each different

pathway is checked manually by comparing two different annotation. New annotation file was also

used for metabolic pathway analysis of strain LC705 with Pathway Tools and important pathways

related to cheese ripening were reported.

3.3. RNA-Seq Data Analysis

RNA-Seq data analysis was performed to analyze differential gene expression analysis between

two different time points. Our RNA-Seq data analysis workflow was summarized in figure 5. Basically

workflow steps contain filtering and quality control, mapping (align the reads to genome), counting

reads per genes, differential expression analysis.

3.3.1. Filtering and Quality Control

Cutadapt tool was used to filter and trim sequences. SOLiD next generation platform uses adapter

sequence which is “CGCCTTGGCCGTACAGCAG” in library-construction step and create reads with

31

adapter sequence. This adapter sequence was removed from reads using Cutadapt. Addition to adapter

sequence trimming, reads that contain less than 30 base pairs and low-quality ends which have less

than 25 phred quality score were filtered by using “cutadapt -m 30 -q 25 -a

CGCCTTGGCCGTACAGCAG -o 3a_cutadapt.fastq 3a_combined.fastq” command for sample 3a, and

all samples were processed identically. Adapter sequence and quality trimming can create high number

of short reads. Short reads should be removed because short reads are hard to map unambiguously.

Next, quality check was done for reads by using FastQC.

3.3.2. Mapping and Counting Reads

Generated reads were mapped onto the genome of strain LC705 using Bowtie2. To check

mapping results by human eye, reads that aligned to the genome of strain LC705 are visualized by

Tablet software. Mapped reads were sorted and converted to BAM file format using SAMtools (Li et

al., 2009) with “samtools view -bS 3a_lc705_map.sam | samtools sort – 3a_lc705_map_sorted”

command for sample 3a, and all samples were processed identically. Sorted BAM files were used for

creating count data using HTSeq using “htseq-count -s no -f bam -t gene --idattr ID

3a_lc705_sorted.bam lc705.gff > 3a_lc705.counts” command. In more detail, with “-s no” command,

we specified non-strand-specific protocol was used for sequencing, “-f bam” used for specify format of

the input data, “-t gene” indicated feature type of annotation file, “--idattr ID” indicated feature ID that

is used in annotation file. The mode of HTSeq was not changed, default mode union which exclude

multi-overlapped reads was used.

3.3.3. Differential gene expression

The differential expressions between two time points were obtained with DESeq2 using R by creating

two conditions warm room and cold room ripening. Thus the design for differential expression analysis

was depending only on one condition. A1, B1 and C1 samples were assigned to condition warm room

ripening, and A3, B3, C3 samples were assigned to condition cold room ripening. By using this design

matrix, comparative analysis result table was created. For FDR values, DESeq2 software returns

adjusted P value (padj) for each genes. In this study genes with padj < 0.10 and log2 Fold Change > 1

were considered as significantly differentially expressed.

32

3.3.4. Gene enrichment analysis

Gene enrichment analysis performed with two different tools; Pathway Tools and Blast2GO.

Pathway Tools provided pathway enrichment analysis by using 0.10 P-value limiting and Fisher Exact

test. Blast2GO software assigned biological functions based on BLAST sequence homologies and

Gene ontology annotations of strain LC705. Gene enrichment analysis up- and down-regulated genes

over all strain LC705 genes were tested by using Blast2GO with 0.05 P-value filter mode, Fisher’s

Exact test.

33

Figure 5. For the beginning, adapter and low quality sequences were removed. Then the quality of

reads was checked. For the next step, origin of the reads was described by mapping to the reference

genome. Then reads were counted per genes using HTSeq with union mode and, finally differential

gene expression analysis was done by using statistical testing.

34

4. Results

4.1. Gene annotation update

We updated published gene annotations of the strain LC705 to obtain new and more accurate

gene annotations for our study. The published annotation file of strain LC705 contained approximately

850 proteins that did not have a specific gene annotation which were described as a putative or

conserved proteins in annotation file. Out of that 850 proteins that did not have a specific gene

annotation, we defined 77 proteins with PANNZER tools. Already annotated genes were updated by

assigning Enzyme Commission (EC) number. We used two different databases (RefSeq and Swiss-Prot)

for blasting. Annotation results from RefSeq database gave more proteins sequences with BLAST Hits

and GO annotation compared to annotation results from Swiss-Prot database (Figure 6). In more

detail, %75 of total protein sequences from LC 705 are annotated with Refseq database, while 52% of

total protein sequences annotated with Swiss-Prot database. Hence to creating EC numbers, we decided

to use RefSeq database annotation results. In total, we added 1197 EC numbers to annotation file of

strain LC705 by using Blast2GO and KAAS annotation results.

The comparison between old and updated annotation file in Pathway Tools showed that gene

annotation update was successful. With the updated annotation file we saw that there were 72 new

pathways were gained while 10 pathways were lost after the pathway analysis. Within these 72 new

pathways, there were several new pathways related to cheese ripening like sugar degradations, amino

acid degradations and fermentation pathways which are key pathways for cheese production and flavor

formation. The table 3 summarizes new pathways and pathways that got lost as a result of gene

annotation update.

4.2. Metabolic pathways and metabolism of strain LC705 related to cheese

Updated annotation file was analyzed to identify pathways and metabolism of strain LC705

related to cheese production. Pathways of enzymatic degradation of amino acids, carboxylates

degradation, carbohydrates degradation, production of biogenic amines, vitamin biosynthesis, lipolysis

and fatty acid biosynthesis and fermentation were pathways which were related to cheese development

35

and cheese flavor formation in LC705. In addition to cheese flavor pathways, we also saw pathways

which produce protective compounds for cheese.

Figure 6. Bar-plot illustration of gene annotation results by using RefSeq and Swiss-Prot databases.

RefSeq database results have more genes with BLAST Hits and GO annotation (The figure created

with Blast2GO).

36

4.2.1. Fermentation Pathways

There are two types of lactic acid fermentation; homolactic and heterolactic fermentation

(summarized in figure 7). According to pathway analysis strain LC705 can use both homolactic and

heterolactic fermentation.

In homolactic fermentation glucose which is obtained from disaccharides (lactose, maltose,

melibose, sucrose and trehalose) degradation is used to produce L-lactate in strain LC705. Glucose is

fermented to pyruvate, and then pyruvate is converted to L-lactate by L-lactate dehydrogenase enzyme.

In heterolactic fermentation fructose and glucose can be used as a source. Glucose is converted

to glucose 6-phosphate by glucokinase, fructose is first converted to fructose 6-phosphate by

fructokinase and then converted to glucose 6-phosphate by glucose-6-phosphate isomerase enzyme.

Strain LC705 does not have a gene that produces fructokinase which would convert fructose to fructose

6-phosphate. However, fructose 6-phosphate can be produced by mannitol, sorbitol, sorbose, sucrose

and mannose degradation in strain LC705. Therefore we can say that in heterolactic fermentation

fructose cannot be used by strain LC705 due to lack of fructokinase. Instead of fructose, some other

hexitols and hexoses are source of fructose 6-phosphate in strain LC705. Glucose and fructose 6-

phosphate are fermented to ethanol and pyruvate in heterolactic fermentation. Pyruvate is converted to

L-lactate and D-lactate by L-lactate dehydrogenase and D-lactate dehydrogenase.

4.2.2. Enzymatic Degradation of Amino Acids

For the enzymatic degradation of amino acids, we saw that LC705 has genes for degradation of

L-asparagine, L-aspartate, L-cysteine, L-glutamate, L-glutamine, L-ornithine, L-homocysteine, L-

lysine and L-serine. Altogether strain LC705 can degrade nine different amino acids. Within these

amino acids, degradation of L-aspartate, L-cysteine, L-serine and L-glutamate are important for cheese

flavor formation.

For the L-aspartate degradation pathway, strain LC705 was found to possess aminotransferases

and aspartate aminotransferase enzymes that convert L-aspartate to oxaloacetate which is an α-keto

acid. Oxaloacetate produced from L-aspartate degradation pathway can then be converted to pyruvate

by oxaloacetate decarboxylase enzyme.

37

Table 3. All gained and lost pathways after gene annotations updated.

38

Similarly, L-serine can also be converted to pyruvate and ammonium by L-serine dehydratase.

Produced pyruvate can be converted by LC705 into acetic acid, acetoin, ethanol and diacetyl which are

aroma compounds and contribute to flavor formation.

L-cysteine degradation occurs in strain LC705 by cystathionine gamma-synthase enzyme.

This enzyme converts L-cysteine to pyruvate, hydrogen sulfide, H2O and ammonium. Hydrogen sulfide

is a volatile sulfur compound (VSC). VSCs are very important flavor compounds in a variety of

cheeses (Fuchsmann et al., 2015).

The last important pathway of enzymatic amino acid degradation pathway related to cheese

flavor is L-glutamate degradation. Strain LC705 produces glutamate dehydrogenase enzyme which

converts L-glutamate to 2-oxoglutarate. 2-oxoglutarate is an α-ketoglutarate which is a good α-keto

donor for transaminases in lactic acid bacteria.

4.2.3. Carboxylates Degradation

For the carboxylates degradation, strain LC705 has potential to degrade nine different

carboxylates which are (S)-malate, 2-oxobutanoate, acetyl-CoA, glycolate, L-ascorbate, malonate,

citrate, D-gluconate, mevalonate. Within these nine different carboxylates, citrate degradation produces

important flavor compounds for cheese.

Our study showed that strain LC705 has citrate lyase genes that are responsible for citrate metabolism.

On the first step in this pathway, citrate lyase complex catalyzes citrate to acetate and oxaloacetate. On

the next step, oxaloacetate is decarboxylated by the oxaloacetate decarboxylase enzyme, pyruvate and

CO2 are generated as a result of this reaction. Acetoin and diacetyl are produced by pyruvate

degradation. Acetolactate synthase enzyme is responsible for converting 2 pyruvate to (S)-2-

acetolactate, and α-acetolactate decarbocylase enzyme is responsible for conversion of (S)-2-

acetolactate to (R)-acetoin in strain LC705. Diacetyl is produced by non-enzymatic oxidative

decarboxylation from (S)-2-acetolactate. However, strain LC705 does not have genes that are

responsible for the production of 2,3-butanediol from acetoin (Figure 8). Therefore strain LC705 can

produce only acetoin and diacetyl from citrate, but not 2,3-butanediol.

39

Figure 7. Heterolactic and homolactic pathways from LC705. Blue colored enzymes can be produced

by LC705. Red colored enzymes cannot be produced by LC705. Products of heterolactic fermentation

are CO2, ethanol, S-lactate and R-lactate. Product of homolactic fermentation is S-lactate.40

4.2.4. Carbohydrates Degradation

Carbohydrate sources for strain LC705 are galactose, L-rhamnose, lactose, sucrose, trehalose,

xylose, 2’-deoxy-a-D-ribose 1-phospate, N-acetylneuraminate, D-mannose, fructose, L-sorbose,

maltose, melibose, ribose. The genes responsible for the degradation of these carbohydrates are present

in the genome of the strain LC705.

Strain LC705 can catabolize lactose by using with two different pathways; lactose and galactose

degradation I and lactose degradation III. Lactose degradation III pathway converts lactose to galactose

and glucose by α- and β-galactosidase and glycosyl hydrolase. Lactose and galactose degradation I

pathway converts lactose 6’-phosphate to D-galactose 6’-phosphate and glucose by enzyme 6-phospho-

β-galactosidase. These glucose molecules which are produced by lactose degradation in strain LC705

are used in heterolactic and homolactic fermentation and converted to lactate.

4.2.5. Production of Biogenic Amines

Our pathway analysis showed that strain LC705 has a potential to produce biogenic amines

which are toxic compounds through two different pathways. The first pathway is conversion of L-

ornithine into putrescine by ornithine decarboxylase enzyme. The second pathway is degradation of L-

lysine to cadaverine and CO2 by lysine decarboxylase enzyme. Expression levels of decarboxylase

enzyme and lysine decarboxylase enzyme were also checked, and responsible genes were

expressedduring cheese ripening. Consequently, the genome of strain LC705 contains genes which are

responsible to produce putrescine and cadaverine.

4.2.6. Vitamin Biosynthesis

Results of pathway analysis indicated that vitamin B biosynthesis is also possible by strain

LC705. Cobalamin (vitamin B12), thiamine (vitamin B1), vitamin B6 biosynthesis pathways were found

in the genome.

The strain LC705 does not have the full genetic information required for biosynthesis of

adenosylcobalamin which is one of the active forms of vitamin B12. Biosynthetic pathways for

adenosylcobalamin require several genes (Rodionov et al., 2003). However, strain LC705 possesses

only Cob(I)alamin adenosyltransferase which produces adenosylcobinamide from cob(I)alamin. So that

we can say that in presence of cob(I)alamin, strain LC705 is able to produce vitamin B12.

41

Figure 8. Summary of the production of buttery flavor compounds (acetoin and diacetyl) in strain

LC705. Buttery flavor compounds are produced through citrate degradation. Responsible enzyme

names and NCBI protein numbers are given in picture. Lactic acid bacteria that have genes to produce

acetoin reductase can produce 2,3-butanediol. Nevertheless, strain LC705 does not have these genes.

Pathway analysis results showed that thiamine (vitamin B1) is produced via salvage pathways in

LC705. Five genes are present in the genome with predicted functions related to thiamine biosynthesis.

Pyridoxal 5′-phosphate (PLP) is the active form of vitamin B6. Pathway analysis showed that

strain LC705 can produce PLP by salvage pathway. Bacteria can synthesize PLP by two main

pathways; de novo pathway and a salvage pathway (El Qaidi et al., 2013). Strain LC705 does not have

genes to produce PLP by the de novo pathway, but it has genes for salvage pathway. In salvage

42

pathway vitamers which are pyridoxine, pyridoxamine and pyridoxal are phosphorylated by kinases in

strain LC705 and, phosphorylated compounds are then oxidized to PLP by pyridoxine 5'-phosphate

oxidase.

4.2.7. Lipolysis and fatty acid biosynthesis

Pathway analysis showed that strain LC705 does not have many pathways for fatty acid and

lipid degradation. According to pathway analysis, LC705 has only glycerol degradation I pathway. In

this pathway glycerol is degraded by glycerol kinase and α-glycerophosphate oxidase to glycerone

phosphate which is a simple carbohydrate. Produced glycerone phosphate is used then in

glyconeogenesis and glycolysis.

Addition to lipolysis, LC705 has pathways that produce several fatty acids. These fatty acids are

cyclopropane fatty acid, palmitate, stearate, vaccenate and gondoate. These fatty acids do not have

direct effect to the cheese flavors.

4.2.8. Protective activities

LC705 has a potential to produce anti-fungal compounds. During fermentation, strain LC705

produces organic acids, like lactic acid and acetic acid, which increase acidity of the environment. Not

only organic acids are produced from strain LC705, but also some specific anti-fungal compounds are

produced. There are 26 anti-fungal compounds published for Lactic acid bacteria (Brosnan et al.,

2012). LC705 has genes that responsible to produce Cytidine (CAR89628.1, CAR89573.1 and

CAR89636.1), succinic acid (CAR91792.1 and CAR89391.1), and hydrogen peroxide (CAR88894.1

and CAR89058.1) that belong to these anti-fungal compounds. In addition to the anti-fungal activities,

strain LC705 has prebacteriocin and bacteriocin transporter protein genes which could be related to the

inhibition of the growth of other bacterial strains.

43

4.3. RNA-Seq data analysis

4.3.1. Read Filtering, Quality Check and Mapping

We had approximately 152 and 154 million RNA-Seq reads obtained from the warm and cold

room samples, respectively. After filtering these reads, we observed that more than 80% reads passed

filters for each sample and replicate (Table 4). Quality control results were good to continue with this

reads. Mapping results showed that between 4.91% and 7.7% of reads from first time point samples

and between 8.7% and 11.1% of reads from the second time point samples mapped to the strain LC705

genome (Table 4).

Table 4. Filtering and mapping results for every sample including total reads and written reads.

4.3.2. Differential gene analysis

Our differential gene analysis showed that there were 327 significantly differentially expressed

genes between warm room ripening and cold room ripening (padj <0.10 | log2fold > 1). Figure 9 shows

the most significantly differentially expressed genes between cold room and warm room ripening

according to adjusted p-values (padj) of differentially expressed genes. Within 327 differentially

expressed genes 153 genes were up-regulated and 174 genes were down-regulated during cold room

cheese ripening. To analyze these differentially expressed genes, gene enrichment analysis was done as

a next step.

44

Figure 9. Heat map of the 60 most significantly differentially expressed genes between 2 time points.

Dark blue represents high expression (high number of reads), light blue represents low expression (low

number of reads).

4.2.3. Gene Ontology Enrichment Analysis

Gene ontology enrichment analysis for up-regulated genes (Figure 10) showed that gene ontology

terms belonging to biological process and molecular function while there are no cellular component

GO terms. In total we saw 102 different enriched GO terms.

45

Enriched GO terms for up-regulated genes during cold room cheese ripening are mostly related

to biosynthesis of purine, ribonucleoside, nucleobase and nucleoside. Transport GO term

(GO:0006810) was also highly enriched in biological process GO terms for up-regulated genes in cold

room. The GO terms of inorganic anion transmembrane transport, inorganic anion transport, phosphate

ion transport, sulfur compound transport, phosphate transmembrane transporter activity, sulfate

transmembrane transport, sulfate transport and inorganic ion transmembrane transport are main

transport related GO terms for cold-induced genes of strain LC705. Additionally, we saw both negative

and positive regulation GO terms in enrichment analysis for up-regulated genes in cold room.

Biological process GO terms contain 16 negative regulation GO terms. At the same time there are five

positive regulation GO terms which are same with negative regulations (RNA metabolic process,

transcription (DNA-templated), RNA biosynthetic process, nucleic acid-templated transcription,

nucleobase-containing compound metabolic process). Genes that cause both negative regulation and

positive regulation were up-regulated in cold room.

GO enrichment analysis for down-regulated genes during cold room cheese ripening (Figure

11) showed terms mostly in biological process and molecular function while there are only 3 terms in

cellular component GO terms. There were 28 GO terms in total.

The biological process GO terms indicated that carbohydrate metabolism genes expression was

significantly down-regulated during cold room ripening. Lactose, disaccharide and oligosaccharide

metabolic and catabolic processes were down-regulated during cold room ripening. Some other

metabolic processes like acetyl-CoA and nucleobase metabolic processes were also enriched for down-

regulated genes in cold room. On the molecular function GO terms, we saw that in general lyase

activity genes were down-regulated during cold room ripening. Particularly, carbon-carbon lyase,

citrate lysase and carboxy-lyase were significantly down-regulated during cold room ripening. Most of

the other molecular function GO terms were related to degradation and utilization of carbohydrates and

carboxylates like galactose-6-phosphate isomerase activity, glutaconyl-CoA decarboxylase activity,

alpha-glucosidase activity, tagatose-bisphosphate aldolase activity and oxaloacetate decarboxylase

activity. The GO term citrate lyase complex which is found in down-regulated cellular component GO

terms is also additional evidence that supports the results that lyase activity genes were down-regulated

during cold room ripening.

46

Figure 10.Barplots of differentially enriched GO terms for up-regulated genes in cold room ripening.

GO terms were sorted by p-value.

47

Figure 11. Barplots of differentially enriched GO terms for down-regulated genes in cold room

ripening. GO terms were sorted by p-value.

48

4.2.4. Pathway enrichment analysis

As a result of the pathway enrichment analysis of the significantly up-regulated and down-

regulated genes in cold room cheese ripening, we found 28 significantly enriched pathways (fisher test,

values lower than 0.10) for up-regulated genes (Table 5) and 20 significantly enriched pathway (fisher

test, values lower than 0.10) for down-regulated genes in cold room ripening (Table 6).

Pathway enrichment analysis for up-regulated genes in cold room verified that in cold room

ripening strain LC705 produces high amount of nucleosides and nucleotides, 13 of 28 significantly

enriched pathways were related to nucleosides and nucleotides biosynthesis. We also observed that in

cold room strain LC705 started to use some alternative sugars as carbon sources such as L-malate, S-

malate, Chitin and N-acetylglucosamine. Up-regulation of reductive monocarboxylic acid cycle

pathway is a good indicator that of increased formate biosynthesis in cold room ripening. In that

pathway pyruvate is converted to acetyl-CoA and formate. We saw that pyruvate fermentation to

ethanol pathway was significantly up-regulated during cold room ripening.

Pathway enrichment analysis for down-regulated genes in cold room showed that citrate

degradation and citrate lyase activation pathways had lowest p-value within other down-regulated

pathways. Following pathways are lactose and galactose degradation pathways. Pathway enrichment

analysis also showed that responsible genes for main pathways like sugar degradation, carbohydrate

degradation, glycolysis and homolactic fermentation were down-regulated in cold room ripening.

49

Table 5. List of pathways which were obtained from enrichment analysis for up-regulated genes in cold

room

50

Table 6. List of pathways which were obtained from enrichment analysis for down-regulated genes in

cold room.

51

5. Discussion

For this study, pathways of the strain LC705 were analyzed by Pathway Tools software. For

pathway analysis, using updated gene annotations is highly important to get accurate results. For our

project, before the gene annotation update step, published gene annotations were containing only name

of the gene products and around 850 non-annotated genes. To obtain better results in pathway analysis,

we updated gene annotations and we added EC numbers to annotations. For the EC number annotation,

the database selection was challenging. We tested different databases for NCBI BLASTp step and we

chose to use RefSeq database due to high number of annotations. KEGG database also provided EC

numbers for non-annotated proteins. As a result, 72 new pathways were discovered by Pathway Tools.

Pathway Tools software showed 201 different pathways for strain LC705 and 72 new pathways

approximately equal to 35% of the whole 201 pathways. These new pathways were important pathways

with critical roles in sugar degradation, amino acid degradation and fermentation. For example, as a

facultative heterofermentative lactobacilli, strain LC705 should have heterolactic and homolactic

fermentation pathways. Without updating the annotations we cannot see heterolactic fermentation

pathway for strain LC705 with Pathway Tools software but the pathway is among the 72 new

pathways. The gene re-annotation part was very important for this project, since due to this step we

gain the ability to characterize pathways for strain LC705 in more thorough way. To have right

pathways is critical for genome wide transcriptomic analysis in this project.

Lactobacilli species are one of the most important key factors in cheese production and cheese

flavor development. These species can be used in cheese ripening process as a starter culture or non-

starter adjunct culture to increase quality of the cheese. During cheese ripening, microbiological and

biochemical changes occur on cheese and used starter and adjunct cultures influence cheese quality and

flavor. For this study, we analyzed role of strain LC705 during cheese ripening. Strain LC705 was used

as an adjunct culture in this study. Adjunct cultures play important role in providing or enhancing the

characteristic flavors and textures of cheese (Settanni and Moschetti, 2010). Flavor formation which

occurs during cheese ripening is a complicated activity. Glycolysis, lipolysis and proteolysis are the

leading catabolic pathways that are involved to flavor formation (Yvon and Rijnen, 2001). It has also

been shown that amino acid catabolism has an important role in flavor formation in cheese by

producing aroma compounds (Yvon and Rijnen, 2001).

52

Our pathway analysis showed that strain LC705 contains at least 201 different pathways and

several number of pathways are related to cheese ripening, flavor and texture of the cheese.

Genome of the strain LC705 is equipped with important enzymes participating amino acid

degradation. By degrading L-aspartate and L-serine, buttery flavor products were produced by strain

LC705. However, buttery flavor is mostly produced through citrate degradation (Martino et al., 2016).

L-aspartate and L-serine degradation is a good alternative to produce diacetyl and acetoin for bacteria

which cannot degrade citrate like some Pediococcus strains (Irmler et al., 2013). We also saw that

strain LC705 can produce hydrogen sulfide which is a VSC. Mostly, VSCs are produced from the

degradation of the sulfur-containing amino acids like methionine and cysteine. VSCs are highly volatile

and reactive hence even with low quantities the flavor can be highly affected (Fuchsmann et al., 2015).

As a VSC, hydrogen sulfide produces unlikeable odor of rotten eggs (Landaud et al. 2008). Now we

have identified genes that are responsible for hydrogen sulfide production which may make strain

LC705 a strong flavor effector for cheese. L-glutamate degradation is important factor for amino acid

degradation activities with α-ketoglutarate end product. It is shown that deprivation of α-ketoglutarate

limits the amino acid degradation activities which enhance cheese aroma in lactic acid bacteria

(Amárita et al., 2006). Lactic acid bacteria which cannot produce α-ketoglutarate by glutamate

degradation, need to uptake α-ketoglutarate from external source by citrate transporter (Pudlik and

Lolkema, 2013). 2-oxoglutarate, an α-ketoglutarate, is produced by L-glutamate degradation in LC705.

Due to the genes that produce α-ketoglutarate compound LC705 does not need α-ketoglutarate from an

external source.

Genes responsible to citrate degradation were also found in the genome of strain LC705. Citrate

degradation is important for cheese flavor and limited number of LAB species are capable of utilizing

citrate (Smid and Kleerebezem, 2014). Citrate fermenting pathways are responsible for producing

acetoin, diacetyl and 2,3 butanediol which provide buttery flavor to cheese (Martino et al., 2016). Our

study showed that LC705 does not contain genes to produce 2,3 butanediol. Even the production of

diacetyl and acetoin is important for cheese, as it provides the buttery flavor. Citrate fermentation

pathways also produce CO2. This CO2 production by citrate metabolism can also be important for the

eye formation in cheese (O'Sullivan et al., 2016). Thus we can say that strain LC705 is responsible for

gas formation and flavor formation by having citrate degradation pathway.

Lactose and galactose are the two main simple carbohydrates in milk. In cheese ripening

process, mostly starter cultures are used for the lactose degradation. As an adjunct culture, LC705

53

possess lactose degradation genes, hence it can join lactose degradation process with starter cultures

and can be also named adjunct starter culture. Galactose degradation is important for cheese because

the existence of galactose was found to be related with lower qualities of the fermented products

(Neves et al., 2010). Additionally the existence if galactose is one of the major reasons why dairy

products turn brown when they are cooked or heated (Neves et al., 2010). Thus galactose utilization is

needed to develop higher-quality dairy products. Strain LC705 has galactose degradation genes.

Therefore, as an adjunct starter, strain LC705 contributes to cheese quality.

On the other hand adjunct cultures can be produce unwanted by-products like biogenic amines

(Weinrichter et al., 2004). Biogenic amines which pose a risk for consumer are formed by microbial

decarboxylation of the corresponding amino acids (Bunkova et al., 2010). Our pathway analysis results

showed that strain LC705 possesses genes to produce putrescine and cadaverine. Putrescine and

cadaverine enhance toxic effect of other amines by inhibiting of detoxifying enzymes (Bunkova et al.,

2010) and can cause some pharmacological effects like hypotension and bradycardia (Shalaby, 1996). It

is difficult to remove biogenic amines after they are formed. Therefore the formation of the biogenic

amines should be controlled (Valsamaki et al., 2000). Earlier studies (Bunkova et al., 2010; Valsamaki

et al., 2000; Pachlová et al., 2012) have showed that biogenic amines are produced in larger quantities

at higher temperature in cheese ripening. Interestingly, our transcription analysis between cold and

warm room ripening did not show any significant difference for genes that responsible for producing

putrescine and cadaverine. However, temperature is not only the factor that influence production of

biogenic amines. Bacterial density, pH and NaCl concentration have a significant effect on production

of biogenic amines (Valsamaki et al., 2000). Therefore we can say that these factors limited the

production of biogenic amines in warm room ripening.

Many industrial lactic acid bacteria can synthesize vitamin B complexes (Capozzi et al., 2012).

Our pathway analysis results showed that strain LC705 is equipped with genes to produce cobalamin

(vitamin B12), thiamine (vitamin B1), and vitamin B6. It is important to produce thiamine because it

plays a key role in energy metabolism. Thiamine is a fundamental cofactor for several types of

enzymes. It is required in alcohol fermentation and catabolism of carbohydrates (Falder et al., 2010). It

is also an advantage for cheese production as animals cannot produce thiamine and thus need to obtain

it from their diet (Sriram et al., 2012). Additionally, vitamin B6 is an essential cofactor and has several

key roles in amino acid and cellular metabolism (Toney, 2005).

54

Lipolysis is one of the major sources of cheese flavor and odor compounds. Though strain

LC705 does not have important genes for lipolysis activities. Therefore LC705 does not contribute to

flavor enhancement by lipolysis.

LAB and Propionibacterium are defined as bio-preservative micro-organisms because they have

a role in preservation of fermented food. It is shown that combination of Lactobacillus rhamnosus

strain LC705 and Propionibacterium freudenreichii ssp. shermanii strain JS is the most effective

bacterial combination against yeast and molds, but the actual mechanism behind this behavior was not

elucidated (Suomalainen and Mäyrä-Mäkinen, 1999). Our study showed that LC705 produces organic

acids and anti-fungal compounds. Anti-fungal activity occurs by synergistic effect of organic acids and

anti-fungal compounds (Le Lay et al., 2016). Moreover, it has been shown that acetic acid is the main

compound for anti-fungal activity especially against Penicillium corylophilum and Eurotium repens

(Le Lay et al., 2016). It is known that strain JS can produce acetate and propionic acid (Ojala et al.,

2016). Result that why the combination of LC705 and Propionibacterium freudenreichii ssp. shermanii

strain JS is the most effective bacterial combination against yeast and molds can be explained by

production of anti-fungal compounds, acetic acid and propionic acid from both strains. Synergistic

effect of these organic acids and anti-fungal compounds make these two bacteria effective anti-fungal

micro-organisms.

Lactobacillus rhamnosus species have a protective effects against Bacillus species

(Suomalainen and Mäyrä-Mäkinen, 1999) and Listeria species (Luukkonen et al., 2005). For strain

LC705, the mechanism of anti-bacterial effect was not known. In our research we found out that strain

LC705 has prebacteriocin and bacteriocin transporter protein genes which may explain why it represses

the growth of clostridia. Additionally we checked the expression levels of these genes with

transcriptomic analysis. We verified that prebacteriocin genes were expressed in LC705. These genes

have not been reported before.

Gene expression profiling showed that there were 327 significantly differentially expressed

genes between warm room and cold room ripening. 153 genes were are up-regulated, while 174 genes

were down-regulated during cold room cheese ripening.

Data from enrichment analysis for up-regulated genes in cold room cheese ripening indicated

that nucleobase and nucleoside biosynthesis genes are cold induced genes. We especially saw many of

the purine biosynthesis genes highly transcribed in cold room. Purine nucleotides have many key roles

in cellular metabolism like the structure of DNA and RNA, serving as enzyme cofactors, functioning in

55

cellular signaling, acting as phosphate group donors, and generating cellular energy (Zhao et al., 2015).

In addition, transporter activity was also increased for ions such as phosphate and sulfate. Increasing of

phosphate ion transporter expression can be explained by the increase of purine production, as during

purine synthesis high amount of phosphate ions are generated. This may indicate that strain LC705

produces high amount of phosphate ions and transport these during cold room cheese ripening. It is

interesting that we saw both negative regulation and positive regulation GO terms for this enrichment

analysis. Explanation of up-regulated negative regulation genes can be that the temperature change may

have activated the defense mechanism of LC705 against a heat-shock stress. In cold room, strain

LC705 started to use alternative sugars as a carbon sources. Finally in the cold room ripening, we saw

up-regulation for alternative pyruvate degradation activities for strain LC705. Pyruvate is degraded to

lactic acid, acetoin, diacetyl, acetyl-CoA and ethanol in strain LC705. Lactic acid and buttery

compounds (acetoin and diacetyl) production is critical for cheese ripening. However, in cold room

ripening, genes that convert pyruvate to acetyl-CoA and ethanol were up-regulated. Ethanol is a volatile

compound which gives dry and dusty odor to cheese. (Pogacic et al., 2016). It is interesting, because

volatile and aroma compound production is mostly up-regulated in warm room cheese ripening. In cold

room, instead of producing lactic acid and buttery flavor compounds, strain LC705 mostly produce

acetyl-CoA and ethanol.

Results of enrichment analysis for down-regulated genes in cold room cheese ripening showed

that central metabolism and lyase genes of strain LC705 are warm induced genes. Important main

carbohydrate degradation genes were down-regulated in cold room. Degradation activities are very

important for cheese production and cheese flavor. We also showed that in cold room alternative carbon

degradation genes of strain LC705 were up-regulated. Using alternative carbon sources instead of main

carbohydrates may indicate that other carbon sources were finished or decreased dramatically by the

day 37 of the cheese ripening or the main carbohydrate genes are warm induced genes. We clearly saw

that citrate degradation genes were down-regulated in cold room ripening. Citrate degradation is the

main pathway for producing buttery aroma in strain LC705. It is clear that citrate degradation genes are

warm induced genes, and strain LC705 produces buttery flavor to cheese during warm room ripening.

56

6. Conclusions

In this study, one of the main aims was to update the gene annotations of Lactobacillus

rhamnosus strain LC705. Annotation update provided new information about genome of the strain

LC705. By analyzing new gene annotations, this study provided understanding on the role of L.

rhamnosus strain LC705 in cheese ripening. Strain LC705 contributes to cheese ripening stage with

flavor formation and protective activities by producing lactic acid, buttery flavor compounds, volatile

sulfur compounds, biogenic amines, vitamins, some anti-fungus compounds and bacteriocin.

This study compared gene expression responses of strain LC705 during Maasdam-type cheese

ripening at warm temperature (25 °C) and cold temperature (5 °C). Central metabolism genes L.

rhamnosus LC705 that provide main carbohydrate degradation are warm induced genes. Buttery flavor

is provided to cheese by strain LC705 in warm room. In cold room strain LC705 preferably converts

pyruvate to acetyl-CoA and ethanol instead lactic acid and buttery flavor compounds.

57

7. Acknowledgements

First of all, I would like to thank my supervisors, Olli-Pekka Smolander and Petri Auvinen for

their encouragement, expert guidance, support and commenting my thesis through the study. I have

learned a lot from them.

I would also like to thank all members of DNA Sequencing and Genomics laboratory of the

Institute of Biotechnology, especially Juhana Kammonen and Margarita Andreevskaya for giving very

useful advices and providing comfortable and friendly working environment.

Special thanks to my family for supporting me all the time and covering all my expenses with

big sacrifices during my master’s degree studies and thesis. Warmly thanks to my lovely friend group

Mataramasuko. Last but not least, I would like to thank to Oona Nevanen for supportive and patient

listening and interest in my thesis.

58

8. References

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990). Basic local alignment search tool. J. Mol. Biol. 215, 403-410.

Amárita, F., de, l.P., Fernández, d.P., Requena, T., and Peláez, C. (2006). Cooperation between wild lactococcal strains for cheese aroma formation. Food Chem. 94, 240-246.

Anders, S., Pyl, P.T., and Huber, W. (2015). HTSeq--a Python framework to work with high-throughputsequencing data. Bioinformatics 31, 166-169.

Andrews S. (2010). FastQC: a quality control tool for high throughput sequence data. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc

Avraham, R., Haseley, N., Fan, A., Bloom-Ackermann, Z., Livny, J., and Hung, D.T. (2016). A highly multiplexed and sensitive RNA-seq protocol for simultaneous analysis of host and pathogen transcriptomes. Nat. Protocols 11, 1477-1491.

Aziz, R.K., Bartels, D., Best, A.A., DeJongh, M., Disz, T., Edwards, R.A., Formsma, K., Gerdes, S., Glass, E.M., Kubal, M., et al. (2008). The RAST Server: rapid annotations using subsystems technology. BMC Genomics 9, 75-2164-9-75.

Bergonzelli, G.E., Granato, D., Pridmore, R.D., Marvin-Guy, L.F., Donnicola, D., and Corthesy-Theulaz, I.E. (2006). GroEL of Lactobacillus johnsonii La1 (NCC 533) is cell surface associated: potential role in interactions with the host and the gastric pathogen Helicobacter pylori. Infect. Immun. 74, 425-434.

Bernardeau, M., Guguen, M., and Vernoux, J.P. (2006). Beneficial lactobacilli in food and feed: long-term use, biodiversity and proposals for specific and realistic safety assessments. FEMS Microbiol. Rev. 30, 487-513.

Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-2120.

Boutet, E., Lieberherr, D., Tognolli, M., Schneider, M., Bansal, P., Bridge, A.J., Poux, S., Bougueleret, L., and Xenarios, I. (2016). UniProtKB/Swiss-Prot, the Manually Annotated Section of the UniProt KnowledgeBase: How to Use the Entry View. Methods Mol. Biol. 1374, 23-54.

Bouton, Y., Buchin, S., Duboz, G., Pochet, S., and Beuvier, E. (2009). Effect of mesophilic lactobacilli and enterococci adjunct cultures on the final characteristics of a microfiltered milk Swiss-type cheese. Food Microbiol. 26, 183-191.

Bray, N.L., Pimentel, H., Melsted, P., and Pachter, L. (2016). Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525-527.

59

Brosnan, B., Coffey, A., Arendt, E.K., and Furey, A. (2012). Rapid identification, by use of the LTQ Orbitrap hybrid FT mass spectrometer, of antifungal compounds produced by lactic acid bacteria. Anal.Bioanal Chem. 403, 2983-2995.

Bunkova, L., Bunka, F., Mantlova, G., Cablova, A., Sedlacek, I., Svec, P., Pachlova, V., and Kracmar, S. (2010). The effect of ripening and storage conditions on the distribution of tyramine, putrescine and cadaverine in Edam-cheese. Food Microbiol. 27, 880-888.

Burns, P., Cuffia, F., Milesi, M., Vinderola, G., Meinardi, C., Sabbag, N., and Hynes, E. (2012). Technological and probiotic role of adjunct cultures of non-starter lactobacilli in soft cheeses. Food Microbiol. 30, 45-50.

Capozzi, V., Russo, P., Duenas, M.T., Lopez, P., and Spano, G. (2012). Lactic acid bacteria producing B-group vitamins: a great potential for functional cereals products. Appl. Microbiol. Biotechnol. 96, 1383-1394.

Caspi, R., Billington, R., Ferrer, L., Foerster, H., Fulcher, C.A., Keseler, I.M., Kothari, A., Krummenacker, M., Latendresse, M., Mueller, L.A., et al. (2016). The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res. 44, D471-80.

Collado, M.C., Meriluoto, J., and Salminen, S. (2007). In vitro analysis of probiotic strain combinationsto inhibit pathogen adhesion to human intestinal mucus. Food Res. Int. 40, 629-636.

Conesa, A., Gotz, S., Garcia-Gomez, J.M., Terol, J., Talon, M., and Robles, M. (2005). Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21, 3674-3676.

Conesa, A., Madrigal, P., Tarazona, S., Gomez-Cabrero, D., Cervera, A., McPherson, A., Szczesniak, M.W., Gaffney, D.J., Elo, L.L., Zhang, X., and Mortazavi, A. (2016). A survey of best practices for RNA-seq data analysis. Genome Biol. 17, 13-016-0881-8.

Croucher, N.J., and Thomson, N.R. (2010). Studying bacterial transcriptomes using RNA-seq. Curr. Opin. Microbiol. 13, 619-624.

Crow, V., Curry, B., and Hayes, M. (2001). The ecology of non-starter lactic acid bacteria (NSLAB) and their use as adjuncts in New Zealand Cheddar. Int. Dairy J. 11, 275-283.

Dai, M., Thompson, R.C., Maher, C., Contreras-Galindo, R., Kaplan, M.H., Markovitz, D.M., Omenn, G., and Meng, F. (2010). NGSQC: cross-platform quality analysis pipeline for deep sequencing data. BMC Genomics 11 Suppl 4, S7-2164-11-S4-S7.

de Sá, P.H.C.G., Veras, A.A.O., Carneiro, A.R., Pinheiro, K.C., Pinto, A.C., Soares, S.C., Schneider, M.P.C., Azevedo, V., Silva, A., and Ramos, R.T.J. (2015). The impact of quality filter for RNA-Seq. Gene 563, 165-171.

60

Deutscher, M.P. (2006). Degradation of RNA in bacteria: comparison of mRNA and stable RNA. Nucleic Acids Res. 34, 659-666.

Dillies, M.A., Rau, A., Aubert, J., Hennequet-Antier, C., Jeanmougin, M., Servant, N., Keime, C., Marot, G., Castel, D., Estelle, J., et al. (2013). A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14, 671-683.

Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M., and Gingeras, T.R. (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15-21.

Dong, X., Hao, Y., Wang, X., and Tian, W. (2016). LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights. Sci. Rep. 6, 18871.

Douillard, F.P., Ribbera, A., Kant, R., Pietila, T.E., Jarvinen, H.M., Messing, M., Randazzo, C.L., Paulin, L., Laine, P., Ritari, J., et al. (2013). Comparative genomic and functional analysis of 100 Lactobacillus rhamnosus strains and their comparison with strain GG. PLoS Genet. 9, e1003683.

El Qaidi, S., Yang, J., Zhang, J.R., Metzger, D.W., and Bai, G. (2013). The vitamin B(6) biosynthesis pathway in Streptococcus pneumoniae is controlled by pyridoxal 5'-phosphate and the transcription factor PdxR and has an impact on ear infection. J. Bacteriol. 195, 2187-2196.

El-Nezami, H., Kankaanpaa, P., Salminen, S., and Ahokas, J. (1998). Ability of dairy strains of lactic acid bacteria to bind a common food carcinogen, aflatoxin B1. Food Chem. Toxicol. 36, 321-326.

El-Nezami, H., Polychronaki, N., Salminen, S., and Mykkanen, H. (2002). Binding rather than metabolism may explain the interaction of two food-Grade Lactobacillus strains with zearalenone and its derivative (')alpha-earalenol. Appl. Environ. Microbiol. 68, 3545-3549.

El-Nezami, H.S., Polychronaki, N.N., Ma, J., Zhu, H., Ling, W., Salminen, E.K., Juvonen, R.O., Salminen, S.J., Poussa, T., and Mykkanen, H.M. (2006). Probiotic supplementation reduces a biomarker for increased risk of liver cancer in young men from Southern China. Am. J. Clin. Nutr. 83, 1199-1203.

Espino, E., Koskenniemi, K., Mato-Rodriguez, L., Nyman, T.A., Reunanen, J., Koponen, J., Ohman, T.,Siljamaki, P., Alatossava, T., Varmanen, P., and Savijoki, K. (2015). Uncovering surface-exposed antigens of Lactobacillus rhamnosus by cell shaving proteomics and two-dimensional immunoblotting. J. Proteome Res. 14, 1010-1024.

Esteban-Torres, M., Mancheño, J.M., de, l.R., and Muñoz, R. (2014). Production and characterization of a tributyrin esterase from Lactobacillus plantarum suitable for cheese lipolysis. J. Dairy Sci. 97, 6737-6744.

Ewing, B., and Green, P. (1998). Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186-194.

61

Falder, S., Silla, R., Phillips, M., Rea, S., Gurfinkel, R., Baur, E., Bartley, A., Wood, F.M., and Fear, M.W. (2010). Thiamine supplementation increases serum thiamine and reduces pyruvate and lactate levels in burn patients. Burns 36, 261-269.

Fallico, V., McSweeney, P.L.H., Horne, J., Pediliggieri, C., Hannon, J.A., Carpino, S., and Licitra, G. (2005). Evaluation of Bitterness in Ragusano Cheese*. J. Dairy Sci. 88, 1288-1300.

FAO/WHO, (2012). Guidelines for the Evaluation of Probiotics in Food, Working group report Food and Agricultural Organization of the United Nations and World Health Organization.

Feng, J., Meyer, C.A., Wang, Q., Liu, J.S., Shirley Liu, X., and Zhang, Y. (2012). GFOLD: a generalized fold change for ranking differentially expressed genes from RNA-seq data. Bioinformatics 28, 2782-2788.

Fontana, C., Bassi, D., Lopez, C., Pisacane, V., Otero, M.C., Puglisi, E., Rebecchi, A., Cocconcelli, P.S., and Vignolo, G. (2016). Microbial ecology involved in the ripening of naturally fermented llama meat sausages. A focus on lactobacilli diversity. Int. J. Food Microbiol. 236, 17-25.

Fuchsmann, P., Stern, M.T., Brugger, Y.A., and Breme, K. (2015). Olfactometry Profiles and Quantitation of Volatile Sulfur Compounds of Swiss Tilsit Cheeses. J. Agric. Food Chem. 63, 7511-7521.

Gálvez, A., Abriouel, H., López, R.L., and Omar, N.B. (2007). Bacteriocin-based strategies for food biopreservation. Int. J. Food Microbiol. 120, 51-70.

Gamallat, Y., Meyiah, A., Kuugbee, E.D., Hago, A.M., Chiwala, G., Awadasseid, A., Bamba, D., Zhang, X., Shang, X., Luo, F., and Xin, Y. (2016). Lactobacillus rhamnosus induced epithelial cell apoptosis, ameliorates inflammation and prevents colon cancer development in an animal model. Biomed. Pharmacother. 83, 536-541.

Gao, C., Major, A., Rendon, D., Lugo, M., Jackson, V., Shi, Z., Mori-Akiyama, Y., and Versalovic, J. (2015). Histamine H2 Receptor-Mediated Suppression of Intestinal Inflammation by Probiotic Lactobacillus reuteri. Mbio 6, e01358-15.

Garofalo, C., Zannini, E., Aquilanti, L., Silvestri, G., Fierro, O., Picariello, G., and Clementi, F. (2012).Selection of sourdough lactobacilli with antifungal activity for use as biopreservatives in bakery products. J. Agric. Food Chem. 60, 7719-7728.

Gene Ontology Consortium, Blake, J.A., Dolan, M., Drabkin, H., Hill, D.P., Li, N., Sitnikov, D., Bridges, S., Burgess, S., Buza, T., et al. (2013). Gene Ontology annotations and resources. Nucleic Acids Res. 41, D530-5.

Giraffa, G., Chanishvili, N., and Widyastuti, Y. (2010). Importance of lactobacilli in food and feed biotechnology. Res. Microbiol. 161, 480-487.

62

Gobbetti, M., De Angelis, M., Di Cagno, R., Mancini, L., and Fox, P.F. (2015). Pros and cons for using non-starter lactic acid bacteria (NSLAB) as secondary/adjunct starters for cheese ripening. Trends FoodSci. Technol. 45, 167-178.

Halttunen, T., Collado, M.C., El-Nezami, H., Meriluoto, J., and Salminen, S. (2008). Combining strainsof lactic acid bacteria may reduce their toxin and heavy metal removal efficiency from aqueous solution. Lett. Appl. Microbiol. 46, 160-165.

Hannon, J.A., Wilkinson, M.G., Delahunty, C.M., Wallace, J.M., Morrissey, P.A., and Beresford, T.P. (2003). Use of autolytic starter systems to accelerate the ripening of Cheddar cheese. Int. Dairy J. 13, 313-323.

Heredia-Castro, P.Y., Mendez-Romero, J.I., Hernandez-Mendoza, A., Acedo-Felix, E., Gonzalez-Cordova, A.F., and Vallejo-Cordoba, B. (2015). Antimicrobial activity and partial characterization of bacteriocin-like inhibitory substances produced by Lactobacillus spp. isolated from artisanal Mexican cheese. J. Dairy Sci. 98, 8285-8293.

Holma, R., Kekkonen, R.A., Hatakka, K., Poussa, T., Vaarala, O., Adlercreutz, H., and Korpela, R. (2011). Consumption of Galactooligosaccharides together with Probiotics Stimulates the In Vitro Peripheral Blood Mononuclear Cell Proliferation and IFNγ Production in Healthy Men. ISRN Immunology 2011, 6.

Hou, Z., Jiang, P., Swanson, S.A., Elwell, A.L., Nguyen, B.K.S., Bolin, J.M., Stewart, R., and Thomson, J.A. (2015). A cost-effective RNA sequencing protocol for large-scale gene expression studies. Scientific Reports 5, 9570.

Huang, D.W., Sherman, B.T., Tan, Q., Collins, J.R., Alvord, W.G., Roayaei, J., Stephens, R., Baseler, M.W., Lane, H.C., and Lempicki, R.A. (2007). The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists. Genome Biol. 8, R183.

Irmler, S., Bavan, T., Oberli, A., Roetschi, A., Badertscher, R., Guggenbuhl, B., and Berthoud, H. (2013). Catabolism of serine by Pediococcus acidilactici and Pediococcus pentosaceus. Appl. Environ. Microbiol. 79, 1309-1315.

Johansson, M.A., Bjorkander, S., Mata Forsberg, M., Qazi, K.R., Salvany Celades, M., Bittmann, J., Eberl, M., and Sverremark-Ekstrom, E. (2016). Probiotic Lactobacilli Modulate Staphylococcus aureus-Induced Activation of Conventional and Unconventional T cells and NK Cells. Front. Immunol.7, 273.

Jones, C.E., Brown, A.L., and Baumann, U. (2007). Estimating the annotation error rate of curated GO database sequence annotations. BMC Bioinformatics 8, 170.

63

Kajander, K., Hatakka, K., Poussa, T., Farkkila, M., and Korpela, R. (2005). A probiotic mixture alleviates symptoms in irritable bowel syndrome patients: a controlled 6-month intervention. Aliment. Pharmacol. Ther. 22, 387-394.

Kajander, K., Krogius-Kurikka, L., Rinttila, T., Karjalainen, H., Palva, A., and Korpela, R. (2007). Effects of multispecies probiotic supplementation on intestinal microbiota in irritable bowel syndrome. Aliment. Pharmacol. Ther. 26, 463-473.

Kajander, K., Myllyluoma, E., Rajilic-Stojanovic, M., Kyronpalo, S., Rasmussen, M., Jarvenpaa, S., Zoetendal, E.G., de Vos, W.M., Vapaatalo, H., and Korpela, R. (2008). Clinical trial: multispecies probiotic supplementation alleviates the symptoms of irritable bowel syndrome and stabilizes intestinal microbiota. Aliment. Pharmacol. Ther. 27, 48-57.

Kanehisa, M., and Goto, S. (2000). KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27-30.

Kankainen, M., Paulin, L., Tynkkynen, S., von Ossowski, I., Reunanen, J., Partanen, P., Satokari, R., Vesterlund, S., Hendrickx, A.P., Lebeer, S., et al. (2009). Comparative genomic analysis of Lactobacillus rhamnosus GG reveals pili containing a human- mucus binding protein. Proc. Natl. Acad.Sci. U. S. A. 106, 17193-17198.

Karp, P.D., Paley, S., and Romero, P. (2002). The Pathway Tools software. Bioinformatics 18 Suppl 1, S225-32.

Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S.L. (2013). TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36-2013-14-4-r36.

Kleerebezem, M., and Hugenholtz, J. (2003). Metabolic pathway engineering in lactic acid bacteria. Curr. Opin. Biotechnol. 14, 232-237.

Korpelainen, L., Tuimala, J., Somervuo, P., Huss, M., and Wong, G. (2014). RNA-seq Data Analysis: APractical Approach (UK: Chapman & Hall/CRC Mathematical & Computational Biology).

Koskenniemi, K., Laakso, K., Koponen, J., Kankainen, M., Greco, D., Auvinen, P., Savijoki, K., Nyman, T.A., Surakka, A., Salusjarvi, T., et al. (2011). Proteomics and transcriptomics characterization of bile stress response in probiotic Lactobacillus rhamnosus GG. Mol. Cell. Proteomics 10, M110.002741.

Koskinen, P., Toronen, P., Nokso-Koivisto, J., and Holm, L. (2015). PANNZER: high-throughput functional annotation of uncharacterized proteins in an error-prone environment. Bioinformatics 31, 1544-1552.

Kukkonen, K., Savilahti, E., Haahtela, T., Juntunen-Backman, K., Korpela, R., Poussa, T., Tuure, T., and Kuitunen, M. (2008). Long-term safety and impact on infection rates of postnatal probiotic and

64

prebiotic (synbiotic) treatment: randomized, double-blind, placebo-controlled trial. Pediatrics 122, 8-12.

Lahens, N.F., Kavakli, I.H., Zhang, R., Hayer, K., Black, M.B., Dueck, H., Pizarro, A., Kim, J., Irizarry,R., Thomas, R.S., Grant, G.R., and Hogenesch, J.B. (2014). IVT-seq reveals extreme bias in RNA sequencing. Genome Biol. 15, R86.

Landaud, S., Helinck, S., and Bonnarme, P. (2008). Formation of volatile sulfur compounds and metabolism of methionine and other sulfur compounds in fermented food. Appl. Microbiol. Biotechnol.77, 1191-1205.

Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357-359.

Le Lay, C., Coton, E., Le Blay, G., Chobert, J.M., Haertle, T., Choiset, Y., Van Long, N.N., Meslet-Cladiere, L., and Mounier, J. (2016). Identification and quantification of antifungal compounds produced by lactic acid bacteria and propionibacteria. Int. J. Food Microbiol. 239, 79-85.

Lehtoranta, L., Soderlund-Venermo, M., Nokso-Koivisto, J., Toivola, H., Blomgren, K., Hatakka, K., Poussa, T., Korpela, R., and Pitkaranta, A. (2012). Human bocavirus in the nasopharynx of otitis-prone children. Int. J. Pediatr. Otorhinolaryngol. 76, 206-211.

Leng, N., Dawson, J.A., Thomson, J.A., Ruotti, V., Rissman, A.I., Smits, B.M., Haag, J.D., Gould, M.N., Stewart, R.M., and Kendziorski, C. (2013). EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics 29, 1035-1043.

Levin, J.Z., Yassour, M., Adiconis, X., Nusbaum, C., Thompson, D.A., Friedman, N., Gnirke, A., and Regev, A. (2010). Comprehensive comparative analysis of strand-specific RNA sequencing methods. Nat Meth 7, 709-715.

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R., and 1000 Genome Project Data Processing Subgroup. (2009). The Sequence Alignment/Map formatand SAMtools. Bioinformatics 25, 2078-2079.

Liao, Y., Smyth, G.K., and Shi, W. (2014). featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923-930.

Linares, D.M., Del Rio, B., Ladero, V., Martinez, N., Fernandez, M., Martin, M.C., and Alvarez, M.A. (2012). Factors influencing biogenic amines accumulation in dairy products. Front. Microbiol. 3, 180.

Liong, M.T., and Shah, N.P. (2005). Acid and bile tolerance and cholesterol removal ability of lactobacilli strains. J. Dairy Sci. 88, 55-66.

Liu, L., Li, Y., Li, S., Hu, N., He, Y., Pong, R., Lin, D., Lu, L., and Law, M. (2012). Comparison of next-generation sequencing systems. J. Biomed. Biotechnol. 2012, 251364.

65

Lortal, S., and Chapot-Chartier, M. (2005). Role, mechanisms and control of lactic acid bacteria lysis incheese. Int. Dairy J. 15, 857-871.

Love, M.I., Huber, W., and Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550.

Luukkonen, J., Kemppinen, A., Kärki, M., Laitinen, H., Mäki, M., Sivelä, S., Taimisto, A., and Ryhänen, E. (2005). The effect of a protective culture and exclusion of nitrate on the survival of enterohemorrhagic E. coli and Listeria in Edam cheese made from Finnish organic milk. Int. Dairy J. 15, 449-457.

Lyra, A., Krogius-Kurikka, L., Nikkila, J., Malinen, E., Kajander, K., Kurikka, K., Korpela, R., and Palva, A. (2010). Effect of a multispecies probiotic supplement on quantity of irritable bowel syndrome-related intestinal microbial phylotypes. BMC Gastroenterol. 10, 110-230X-10-110.

Maere, S., Heymans, K., and Kuiper, M. (2005). BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics 21, 3448-3449.

Magnussen, A., and Parsi, M.A. (2013). Aflatoxins, hepatocellular carcinoma and public health. World J. Gastroenterol. 19, 1508-1512.

Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.Journal 17,

Martino, G.P., Quintana, I.M., Espariz, M., Blancato, V.S., and Magni, C. (2016). Aroma compounds generation in citrate metabolism of Enterococcus faecium: Genetic characterization of type I citrate gene cluster. Int. J. Food Microbiol. 218, 27-37.

Mato Rodriguez, L., Ritvanen, T., Joutsjoki, V., Rekonen, J., and Alatossava, T. (2011). The role of copper in the manufacture of Finnish Emmental cheese. J. Dairy Sci. 94, 4831-4842.

McDonald, L.C., McFeeters, R.F., Daeschel, M.A., and Fleming, H.P. (1987). A differential medium forthe enumeration of homofermentative and heterofermentative lactic Acid bacteria. Appl. Environ. Microbiol. 53, 1382-1384.

Merk, K., Borelli, C., and Korting, H.C. (2005). Lactobacilli – bacteria–host interactions with special regard to the urogenital tract. International Journal of Medical Microbiology 295, 9-18.

Miettinen, M., Pietila, T.E., Kekkonen, R.A., Kankainen, M., Latvala, S., Pirhonen, J., Osterlund, P., Korpela, R., and Julkunen, I. (2012). Nonpathogenic Lactobacillus rhamnosus activates the inflammasome and antiviral responses in human macrophages. Gut Microbes 3, 510-522.

Mills, J.D., Kawahara, Y., and Janitz, M. (2013). Strand-Specific RNA-Seq Provides Greater Resolution of Transcriptome Profiling. Curr. Genomics 14, 173-181.

66

Milne, I., Bayer, M., Cardle, L., Shaw, P., Stephen, G., Wright, F., and Marshall, D. (2010). Tablet--nextgeneration sequence assembly visualization. Bioinformatics 26, 401-402.

Moriya, Y., Itoh, M., Okuda, S., Yoshizawa, A.C., and Kanehisa, M. (2007). KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 35, W182-5.

Myllyluoma, E., Kajander, K., Mikkola, H., Kyronpalo, S., Rasmussen, M., Kankuri, E., Sipponen, P., Vapaatalo, H., and Korpela, R. (2007). Probiotic intervention decreases serum gastrin-17 in Helicobacter pylori infection. Dig. Liver Dis. 39, 516-523.

Nada, S., Amra, H., Deabes, M., and Omara, E. (2009). Saccharomyces Cerevisiae And Probiotic Bacteria Potentially Inhibit Aflatoxins Production In Vitro And In Vivo Studies. The Internet Journal of Toxicology 8,

Neves, A.R., Pool, W.A., Solopova, A., Kok, J., Santos, H., and Kuipers, O.P. (2010). Towards enhanced galactose utilization by Lactococcus lactis. Appl. Environ. Microbiol. 76, 7048-7060.

Ojala, T., Laine, P.K.S., Ahlroos, T., Tanskanen, J., Pitkänen, S., Salusjärvi, T., Kankainen, M., Tynkkynen, S., Paulin, L., and Auvinen, P. (2017). Functional genomics provides insights into the role of Propionibacterium freudenreichii ssp. shermanii JS in cheese ripening. Int. J. Food Microbiol. 241, 39-48.

Oksaharju, A., Kankainen, M., Kekkonen, R.A., Lindstedt, K.A., Kovanen, P.T., Korpela, R., and Miettinen, M. (2011). Probiotic Lactobacillus rhamnosus downregulates FCER1 and HRH4 expression in human mast cells. World J. Gastroenterol. 17, 750-759.

O'Leary, N.A., Wright, M.W., Brister, J.R., Ciufo, S., Haddad, D., McVeigh, R., Rajput, B., Robbertse, B., Smith-White, B., Ako-Adjei, D., et al. (2016). Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733-45.

Ortakci, F., Broadbent, J.R., Oberg, C.J., and McMahon, D.J. (2015). Growth and gas production of a novel obligatory heterofermentative Cheddar cheese nonstarter lactobacilli species on ribose and galactose. J. Dairy Sci. 98, 3645-3654.

O'Sullivan, D.J., McSweeney, P.L., Cotter, P.D., Giblin, L., and Sheehan, J.J. (2016). Compromised Lactobacillus helveticus starter activity in the presence of facultative heterofermentative Lactobacillus casei DPC6987 results in atypical eye formation in Swiss-type cheese. J. Dairy Sci. 99, 2625-2640.

Pachlová, V., Buňka, F., Flasarová, R., Válková, P., and Buňková, L. (2012). The effect of elevated temperature on ripening of Dutch type cheese. Food Chem. 132, 1846-1854.

Patro, R., Mount, S.M., and Kingsford, C. (2014). Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms. Nat. Biotechnol. 32, 462-464.

Pease, J., and Sooknanan, R. (2012). A rapid, directional RNA-seq library preparation workflow for Illuminareg] sequencing. Nat Meth 9,

67

Peralta, G.H., Bergamini, C.V., and Hynes, E.R. (2016). Aminotransferase and glutamate dehydrogenase activities in lactobacilli and streptococci. Braz J. Microbiol. 47, 741-748.

Pinto, A.C., Melo-Barbosa, H.P., Miyoshi, A., Silva, A., and Azevedo, V. (2011). Application of RNA-seq to reveal the transcript profile in bacteria. Genet. Mol. Res. 10, 1707-1718.

Pogacic, T., Maillard, M.B., Leclerc, A., Herve, C., Chuat, V., Valence, F., and Thierry, A. (2016). Lactobacillus and Leuconostoc volatilomes in cheese conditions. Appl. Microbiol. Biotechnol. 100, 2335-2346.

Promponas, V.J., Iliopoulos, I., and Ouzounis, C.A. (2015). Annotation inconsistencies beyond sequence similarity-based function prediction - phylogeny and genome structure. Stand. Genomic Sci. 10, 108-015-0101-2. eCollection 2015.

Pudlik, A.M., and Lolkema, J.S. (2013). Uptake of alpha-ketoglutarate by citrate transporter CitP drivestransamination in Lactococcus lactis. Appl. Environ. Microbiol. 79, 1095-1101.

Reuter, J.A., Spacek, D.V., and Snyder, M.P. (2015). High-throughput sequencing technologies. Mol. Cell 58, 586-597.

Rhoads, A., and Au, K.F. (2015). PacBio Sequencing and Its Applications. Genomics Proteomics Bioinformatics 13, 278-289.

Rice, P., Longden, I., and Bleasby, A. (2000). EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 16, 276-277.

Rivals, I., Personnaz, L., Taing, L., and Potier, M.C. (2007). Enrichment or depletion of a GO category within a class of genes: which test? Bioinformatics 23, 401-407.

Robinson, J.T., Thorvaldsdottir, H., Winckler, W., Guttman, M., Lander, E.S., Getz, G., and Mesirov, J.P. (2011). Integrative genomics viewer. Nat. Biotechnol. 29, 24-26.

Robinson, M.D., McCarthy, D.J., and Smyth, G.K. (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-140.

Rodionov, D.A., Vitreschak, A.G., Mironov, A.A., and Gelfand, M.S. (2003). Comparative genomics ofthe vitamin B12 metabolism and regulation in prokaryotes. J. Biol. Chem. 278, 41148-41159.

Rossi, M., Amaretti, A., and Raimondi, S. (2011). Folate production by probiotic bacteria. Nutrients 3, 118-134.

Rothberg, J.M., Hinz, W., Rearick, T.M., Schultz, J., Mileski, W., Davey, M., Leamon, J.H., Johnson, K., Milgrew, M.J., Edwards, M., et al. (2011). An integrated semiconductor device enabling non-optical genome sequencing. Nature 475, 348-352.

Ruegg, P.L. (2003). Practical Food Safety Interventions for Dairy Production. J. Dairy Sci. 86, Supplement, E1-E9.

68

Saier, M.H.,Jr, Reddy, V.S., Tsu, B.V., Ahmed, M.S., Li, C., and Moreno-Hagelsieb, G. (2016). The Transporter Classification Database (TCDB): recent advances. Nucleic Acids Res. 44, D372-9.

Salvetti, E., Torriani, S., and Felis, G.E. (2012). The Genus Lactobacillus: A Taxonomic Update. Probiotics and Antimicrobial Proteins 4, 217-226.

Sarkar, N. (1997). Polyadenylation of mRNA in Prokaryotes. Annu. Rev. Biochem. 66, 173-197.

Saxelin, M. (2008). Probiotic formulations and applications, the current probiotics market, and changesin the marketplace: a European perspective. Clin. Infect. Dis. 46 Suppl 2, S76-9; discussion S144-51.

Schirone, M., Tofalo, R., Fasoli, G., Perpetuini, G., Corsetti, A., Manetta, A.C., Ciarrocchi, A., and Suzzi, G. (2013). High content of biogenic amines in Pecorino cheeses. Food Microbiol. 34, 137-144.

Schlech, W.F., and Acheson, D. (2000). Foodborne Listeriosis. Clinical Infectious Diseases 31, 770-775.

Schmieder, R., and Edwards, R. (2011). Quality control and preprocessing of metagenomic datasets. Bioinformatics 27, 863-864.

Schnoes, A.M., Brown, S.D., Dodevski, I., and Babbitt, P.C. (2009). Annotation error in public databases: misannotation of molecular function in enzyme superfamilies. PLoS Comput. Biol. 5, e1000605.

Selinger, D.W., Cheung, K.J., Mei, R., Johansson, E.M., Richmond, C.S., Blattner, F.R., Lockhart, D.J.,and Church, G.M. (2000). RNA expression analysis using a 30 base pair resolution Escherichia coli genome array. Nat Biotech 18, 1262-1268.

Settanni, L., and Moschetti, G. (2010). Non-starter lactic acid bacteria used to improve cheese quality and provide health benefits. Food Microbiol. 27, 691-697.

Sgarbi, E., Lazzi, C., Tabanelli, G., Gatti, M., Neviani, E., and Gardini, F. (2013). Nonstarter lactic acidbacteria volatilomes produced using cheese components. J. Dairy Sci. 96, 4223-4234.

Shalaby, A.R. (1996). Significance of biogenic amines to food safety and human health. Food Res. Int. 29, 675-690.

Shendure, J., and Ji, H. (2008). Next-generation DNA sequencing. Nat Biotech 26, 1135-1145.

Shishkin, A.A., Giannoukos, G., Kucukural, A., Ciulla, D., Busby, M., Surka, C., Chen, J., Bhattacharyya, R.P., Rudy, R.F., Patel, M.M., et al. (2015). Simultaneous generation of many RNA-seqlibraries in a single reaction. Nat Meth 12, 323-325.

Siezen, R.J., and Wilson, G. (2010). Probiotics genomics. Microb. Biotechnol. 3, 1-9.

Smid, E.J., and Kleerebezem, M. (2014). Production of aroma compounds in lactic fermentations. Annu. Rev. Food Sci. Technol. 5, 313-326.

69

Soneson, C., and Delorenzi, M. (2013). A comparison of methods for differential expression analysis ofRNA-seq data. BMC Bioinformatics 14, 91-2105-14-91.

Sorek, R., and Cossart, P. (2010). Prokaryotic transcriptomics: a new view on regulation, physiology and pathogenicity. Nat. Rev. Genet. 11, 9-16.

Srinivasan, P.R., Ramanarayanan, M., and Rabbani, E. (1975). Presence of polyriboadenylate sequences in pulse-labeled RNA of Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 72, 2910-2914.

Sriram, K., Manzanares, W., and Joseph, K. (2012). Thiamine in nutrition therapy. Nutr. Clin. Pract. 27,41-50.

Suomalainen, T., H., and Mäyrä-Makinen, A., M. (1999). Propionic acid bacteria as protective cultures in fermented milks and breads. Lait 79, 165-174.

Toney, M.D. (2005). Reaction specificity in pyridoxal phosphate enzymes. Arch. Biochem. Biophys. 433, 279-287.

Trapnell, C., and Salzberg, S.L. (2009). How to map billions of short reads onto genomes. Nat. Biotechnol. 27, 455-457.

Turpin, W., Humblot, C., Thomas, M., and Guyot, J. (2010). Lactobacilli as multifaceted probiotics with poorly disclosed molecular mechanisms. Int. J. Food Microbiol. 143, 87-102.

Valsamaki, K., Michaelidou, A., and Polychroniadou, A. (2000). Biogenic amine production in Feta cheese. Food Chem. 71, 259-266.

van de Wiel, M.A., Neerincx, M., Buffart, T.E., Sie, D., and Verheul, H.M. (2014). ShrinkBayes: a versatile R-package for analysis of count-based sequencing data in complex study designs. BMC Bioinformatics 15, 116-2105-15-116.

van Dijk, E.L., Jaszczyszyn, Y., and Thermes, C. (2014). Library preparation methods for next-generation sequencing: Tone down the bias. Exp. Cell Res. 322, 12-20.

Van Vliet, A.H.M. (2010). Next generation sequencing of microbial transcriptomes: challenges and opportunities. FEMS Microbiol. Lett. 302, 1.

Wang, L., Si, Y., Dedow, L.K., Shao, Y., Liu, P., and Brutnell, T.P. (2011). A Low-Cost Library Construction Protocol and Data Analysis Pipeline for Illumina-Based Strand-Specific Multiplex RNA-Seq. Plos One 6, e26426.

Wang, Z., Gerstein, M., and Snyder, M. (2009). RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10, 57-63.

Weinrichter, B., Sollberger, H., Ginzinger, W., Jaros, D., and Rohm, H. (2004). Adjunct starter properties affect characteristic features of Swiss-type cheeses. Nahrung 48, 73-79.

70

Wilhelm, B.T., and Landry, J. (2009). RNA-Seq—quantitative measurement of expression through massively parallel RNA-sequencing. Methods 48, 249-257.

Williams, C.R., Baccarella, A., Parrish, J.Z., and Kim, C.C. (2016). Trimming of sequence reads alters RNA-Seq gene expression estimates. BMC Bioinformatics 17, 103.

Yvon, M., and Rijnen, L. (2001). Cheese flavour formation by amino acid catabolism. Int. Dairy J. 11, 185-201.

Zhao, H., Chiaro, C.R., Zhang, L., Smith, P.B., Chan, C.Y., Pedley, A.M., Pugh, R.J., French, J.B., Patterson, A.D., and Benkovic, S.J. (2015). Quantitative analysis of purine nucleotides indicates that purinosomes increase de novo purine biosynthesis. J. Biol. Chem. 290, 6705-6713.

71