Holly_Davies_Dissertation

The Presence of Prevotella intermedia 17

within the human lung and its relationship

with lung cancer & COPD: a metagenomic

analysis of the human lung microbiome

Student Name: Holly Davies

Student ID: 130023847

Submitted in part candidature for the degree of B.Sc. Biology (Genetics)

Institute of Biological, Environmental and Rural Sciences

Aberystwyth University

Submitted April 2016

Contents Page

0. Preface
  0.1 Declaration
  0.2 Acknowledgements
  0.3 Abstract
1. Introduction
  1.1 Outline and Objectives
  1.2 Lung cancer & COPD
    1.2.1 Lung cancer
    1.2.2 COPD
  1.3 Prevotella intermedia 17
    1.3.1 The Prevotella genus
    1.3.2 Prevotella intermedia
    1.3.3 Prevotella intermedia 17
  1.4 Lung microbiome research
  1.5 Previous work
2. Materials and Methodology
  2.1 Aims and objectives
  2.2 Initial analysis
    2.2.1 Largest contig assembly
    2.2.2 NCBI Blast search
    2.2.3 Alignment
  2.3 Individual samples
    2.3.1 Import and sampling
    2.3.2 De novo assembly
    2.3.3 Read Mapping
3. Results
  3.1 NCBI Blast results
  3.2 NODE alignment
  3.3 Mapping of individual samples
4. Discussion
  4.1 Discussion of Results
    4.1.1 NCBI MegaBlast and NODE alignment
    4.1.2 Individual sample data
  4.2 Limitations & implications
  4.3 Further study
  4.4 Conclusions
5. References
6. Word Count
7. List of Figures/Tables/Images
8. Appendix

0.1 Declaration

Module BR32330

I certify that all material in this paper is the result of my own investigation, except

where indicated, and references used in preparation of the text have been cited. This

paper has not been previously submitted as part of any other assessed module (with the

exception of the project proposal submitted for this paper), or submitted for any other

degree or diploma.

NAME: HOLLY DAVIES

DATE: 13/04/2016

0.2. Acknowledgements

I would like to take this opportunity to thank Dr Justin Pachebat for the opportunity to be a part of this research, and for the constant and helpful advice and support throughout this entire project.

I would also like to thank everyone involved in the MEDLUNG project, specifically Joe

Healey, Simon Cameron and Tom Hitch for providing the background and basis necessary

for me to be able to conduct this research.

Finally, I would like to thank Michael Best and Louise Denny for providing motivation and support throughout this project; it has been invaluable to me.

0.3 Abstract

The aim of this project was to analyse the bacterial DNA present in the sputum of lung

cancer and COPD (Chronic Obstructive Pulmonary Disease) patients to further research into

developing a biomarker for these diseases in association with the MEDLUNG Project

(Metabolic Biomarkers for the Detection of Lung cancer) – a multicentre study on behalf of

the National Health Service (NHS). The initial analysis was conducted on an Illumina

metagenome contig assembly of data collected from 30 patients (10 healthy, 10 lung cancer,

10 COPD) using NCBI (National Center for Biotechnology Information) BLAST (Basic Local

Alignment Search Tool) searches. From this analysis Prevotella intermedia 17 was identified

within the contig assembly.

Prevotella intermedia had previously been found orally in periodontal diseases (Maeda

et al, 1998), periapical periodontitis (Jacinto et al, 2003), and noma (an acute gangrenous

disease) (Bolivar et al, 2012), and also had been found to be associated with cystic fibrosis

(Ulrich et al, 2010) and causing an increased risk of pneumonia in mice (Nagaoka et al, 2014).

Specifically, Prevotella intermedia 17 is a clinical strain of the species that had only been

isolated from the periodontal pocket (Ruan et al, 2015).

The follow-up analysis was conducted using CLC Genomics Workbench 8 (CLC bio, 2016). A de novo assembly was performed for each patient sample from the MEDLUNG collection, and each assembly was mapped to the P. intermedia 17 reference genome. This showed that P. intermedia 17 is indeed found in the lungs, and that the number of mapped reads was 85-99% lower in the lung cancer and COPD groups than in the healthy control group.

This study has discovered the presence of Prevotella intermedia 17 in the lungs for the

first time, and also that P. intermedia 17 does have a relationship with both lung cancer and

COPD in humans. This could lead to the development of a new diagnostic test for lung cancer

or COPD, or possibly further the knowledge surrounding these diseases and how they manifest

in the human lung. Developing a new diagnostic test and providing early screening for patients

is vitally important for lung cancer and COPD, as it would have the capacity to save countless

lives by giving more people access to curative treatment at an earlier stage where it can be

effective.

1. Introduction

1.1 OUTLINE AND OBJECTIVES

The aim of this project was to analyse the bacterial DNA present in the sputum of lung

cancer and COPD (Chronic Obstructive Pulmonary Disease) patients to further research into

developing a biomarker (biological molecule which is specific to said diseases) for these

diseases in association with the MEDLUNG Project (Metabolic Biomarkers for the Detection

of Lung cancer) – a multicentre study on behalf of the National Health Service (NHS). The

initial analysis was conducted on an Illumina metagenome contig assembly of data collected

from 30 patients (10 healthy, 10 lung cancer, 10 COPD) using NCBI (National Center for

Biotechnology Information) BLAST (Basic Local Alignment Search Tool) searches. From this

analysis Prevotella intermedia 17 was identified within the contig assembly.

Prevotella intermedia had previously been found orally in periodontal diseases (Maeda

et al, 1998), periapical periodontitis (Jacinto et al, 2003), and noma (an acute gangrenous

disease) (Bolivar et al, 2012). Outside of oral diseases, Prevotella intermedia had been found

to be associated with cystic fibrosis (Ulrich et al, 2010) and causing an increased risk of

pneumonia in mice (Nagaoka et al, 2014). Specifically, Prevotella intermedia 17 is a clinical

strain of the species that had only been isolated from the periodontal pocket (Ruan et al, 2015),

with no links to lung cancer/COPD, or even the lungs in general.

The Prevotella intermedia 17 reference genome was then aligned with the raw data from the individual patients to confirm its presence within the lungs, and to determine whether it is linked to lung cancer and COPD. If a link between Prevotella intermedia and these diseases were established, it could lead to a new diagnostic test being developed in further study, supporting earlier diagnosis and higher survival rates for lung cancer and COPD sufferers.

Developing a new diagnostic test and providing early screening for patients is vitally

important for lung cancer and COPD, as two-thirds of lung cancer cases are diagnosed at

advanced stages whereby curative treatment becomes unavailable (CancerResearchUK, 2015a)

and COPD is regularly under- and mis-diagnosed (W.H.O., 2015a). If an early diagnostic test

could be developed, it would have the capacity to save countless lives by giving more people

access to early treatment.

1.2 LUNG CANCER AND COPD

Lung cancer and COPD are two of the most prevalent respiratory tract disorders (CancerResearchUK, 2015a), and both have extremely high morbidity and mortality (Eddy, 1989; Mallia et al., 2007). Lung cancer is the most common cause of cancer death in the UK (CancerResearchUK, 2015b), and COPD causes 6% of deaths globally (W.H.O., 2015a). These diseases are not mutually exclusive, as a high risk of lung cancer usually accompanies a high risk of COPD (Raviv et al, 2011). Developing a biomarker for one disease may therefore give pointers towards a biomarker for the other.

1.2.1 LUNG CANCER

Lung cancer is the most common cause of cancer death in the UK, accounting for 22%

of all deaths from cancer, and is the second most common cancer in the UK

(CancerResearchUK, 2015c). Globally, 58% of lung cancer cases occurred in less developed

countries in 2012 (Ferlay et al, 2014), and accounted for 1.59 million deaths (W.H.O., 2014).

In many cases the cause of the disease is clear, with tobacco smoking accounting for more than 8 out of 10 cases; other risk factors include exposure to carcinogens and radiation, air pollution, family history and poor immunity (CancerResearchUK, 2015c). Ageing is another factor involved in the development of lung cancer: the effects of risk factors accumulate over time (overall risk accumulation), and this is combined with cellular repair mechanisms becoming less effective as a person grows older (W.H.O. 2015b). However, the World Health Organisation states that “more than 30% of cancer deaths could be prevented by modifying or avoiding key risk factors” (W.H.O. 2015b).

There are many preventative measures currently in operation that attempt to reduce the incidence of lung cancer. The main focus of these is to decrease smoking levels in populations, but there are also some measures to address the rarer risk factors. Smoking cessation is the main method of preventing lung cancer: after 10 years of smoking cessation, there is a 30-50% reduction in lung cancer mortality risk when compared to persistent smokers (Fiore et al, 1996). It is supported by government campaigning, as seen in Image 1. To help a person achieve smoking cessation, the Agency for Healthcare Research and Quality (formerly the Agency for Health Care Policy and Research [AHCPR]) developed a set of clinical smoking-cessation guidelines for the benefit of both the patient and the health care provider (Fiore et al, 1996), including documenting the patient’s tobacco use and offering one or more effective smoking cessation treatments (nicotine replacement, social support, skills training/problem solving etc.). Another method of prevention is the moderation of occupational exposure to lung carcinogens such as chromium, arsenic, nickel and asbestos, which together account for 9-15% of all lung cancers (Alberg et al, 2007).

Image 1: Government campaign supporting smoking cessation (Parry, 2010)

There are two main classifications of lung cancer: non-small cell and small cell. Non-small cell lung cancer accounts for approximately 85% of lung cancers and occurs in three types: adenocarcinoma, squamous cell carcinoma and large cell carcinoma (CancerCare®, 2016). Small cell lung cancer accounts for the remaining 15% of lung cancer incidences, and tends to grow more quickly than non-small cell tumours (CancerCare®, 2016). The most common

symptoms associated with lung cancer are coughing, shortness of breath, fatigue and blood

present in the sputum (CancerResearchUK, 2015c). Other symptoms can include weight loss,

recurrent infections such as bronchitis and pneumonia, and chest pain (American Cancer

Society, 2016). Lung cancer can also produce hormone-like substances which enter the

bloodstream, causing paraneoplastic syndromes in various tissues and organs such as

hypercalcemia (high blood calcium levels), blood clots, gynecomastia (excess breast growth in

men) and various nervous system problems (American Cancer Society, 2016).

Despite all this there is no national screening programme for lung cancer in the UK,

leading to most cases being discovered via x-ray, by which point the cancer is usually too

advanced for curative treatment (CancerResearchUK, 2015c). There are some attempts to

introduce a screening programme into the UK, such as the UK Lung Cancer Screening Trial

(UKLS), which aims to screen people most at risk (e.g. between the age of 50-75) using various

clinical tests, the most promising being CT scanning, to help diagnose lung cancer earlier

(UKLS, 2012). There are some screening programmes in the US, however they are very

selective in who they screen and also use CT scanning to determine diagnosis (CDC, 2016).

The problem with the current focus on lung cancer screening is that it requires the use of CT

scanning, which exposes the patient to radiation, possibly increasing the risk of cancer

(Brenner, 2003). Cancer Research UK state that the essential criteria for a possible screening programme are that it is simple, quick, relatively inexpensive and not harmful (CancerResearchUK, 2015c). The screening programmes currently under consideration do not meet these criteria, as they can cause harm through radiation exposure or a possible allergic reaction to the dye used in the CT scan (NHS, 2016).

The discovery of a biomarker for lung cancer could save lives through the development of a

new diagnostic test, detecting lung cancer before it can be seen on a CT scan, whilst also

complying with the essential criteria for a screening programme.

1.2.2 COPD

Chronic Obstructive Pulmonary Disease (COPD) is a lung disease which interferes with

normal breathing via a persistent blockage of airflow. It causes 25,000 deaths per year in the UK and caused more than 3 million deaths globally in 2012, approximately 6% of all deaths recorded (W.H.O., 2015a), making it the third most common cause of death in the world (Lozano et al, 2013). However, these numbers do not accurately represent how prevalent COPD is: an estimated 24 million people in the US suffer from the disease without even knowing it (American Lung Association, 2016), which highlights the need for a better diagnosis/screening programme and more public awareness.

As with lung cancer, the leading cause of COPD is cigarette smoking (NIH, 2013a). As many as 8 out of 10 COPD-related deaths are caused by smoking (U.S. Department of Health and Human Services, 2014), accounting for approximately 5.4 million deaths in 2005 (W.H.O. 2016). There are also other risk factors for COPD, which are mainly prevalent in low-income countries (W.H.O. 2016). Exposure to indoor air pollution, mainly caused by the use of biomass fuels for cooking and heating, is the biggest risk factor in these countries because more efficient resources are unavailable; approximately 3 billion people use these methods of heating (W.H.O. 2016). Other risk factors include exposure to certain types of dust and chemicals at work (e.g. coal and cadmium) and possibly urban air pollution, although the evidence for this is not conclusive (NHS, 2014). The preventative measures in place for COPD are the same as those for lung cancer, as the two diseases share the same risk factors.

The poor airflow associated with COPD results from two conditions: emphysema (the breaking down of lung tissue) and obstructive bronchiolitis (small airways disease) (Vestbo et al, 2013a), which create structural changes within the lungs, as seen in Image 2. The main symptoms associated with COPD include breathlessness, abnormal sputum and a chronic cough (W.H.O. 2015a). However, COPD can at first present no symptoms, or only mild ones, making early diagnosis difficult (NIH, 2013b).

Image 2: Structural changes in human lungs with COPD (Houghton, 2013)

There is a diagnostic test for COPD called spirometry, which is only considered for

someone over the age of 35-40 who presents with various symptoms and has had a history of

exposure to the risk factors (Vestbo et al, 2013b). Spirometry involves the use of a

bronchodilator (drug to open airways) and works by measuring the amount of airflow

obstruction present (Qaseem et al, 2011). To make a diagnosis, two measurements are made:

the forced expiratory volume in one second (FEV1) (greatest volume of air expelled in one

second), and the forced vital capacity (FVC) (greatest volume of air expelled in one full breath)

(Young & Vincent, 2010). Using these two measurements, an FEV1/FVC ratio can be calculated and compared against medical guidelines (usually a ratio lower than 70% in someone with COPD-like symptoms) to determine whether or not they have the disease; however, this can lead to an over-diagnosis of COPD in elderly patients (Qaseem et al, 2011). The issue with spirometry as a diagnostic tool is that using it on people who do not present symptoms of COPD has “evidence of uncertain effect, and therefore is currently not recommended” (Vestbo et al, 2013a). Because of this, there is no early diagnostic method for people with COPD, so by the time the disease is diagnosed it is too advanced for curative treatment to be successful (W.H.O., 2015a). Developing a diagnostic tool based on a biomarker would be highly beneficial to COPD sufferers, as it has the potential to detect the disease before symptoms have manifested, making treatment more successful. As with lung cancer, this could be introduced as a national screening programme to reduce the deaths caused by COPD, as a large number of people with COPD are not diagnosed correctly (American Lung Association, 2016).
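
The FEV1/FVC comparison described for spirometry can be written as a short calculation. The following is only a minimal sketch: the function names and the example volumes are hypothetical, and the 0.70 cut-off is simply the guideline figure quoted in the text, not a complete diagnostic rule.

```python
def fev1_fvc_ratio(fev1_litres: float, fvc_litres: float) -> float:
    """Return the FEV1/FVC ratio as a fraction between 0 and 1."""
    if fvc_litres <= 0:
        raise ValueError("FVC must be a positive volume")
    return fev1_litres / fvc_litres


def below_guideline(fev1_litres: float, fvc_litres: float,
                    threshold: float = 0.70) -> bool:
    """True when the ratio falls below the guideline threshold (assumed 0.70)."""
    return fev1_fvc_ratio(fev1_litres, fvc_litres) < threshold


# Hypothetical measurements in litres, for illustration only.
ratio = fev1_fvc_ratio(2.1, 3.5)
print(f"FEV1/FVC = {ratio:.2f}, below guideline: {below_guideline(2.1, 3.5)}")
```

A ratio below the threshold is only suggestive in someone with COPD-like symptoms, which is why the text notes the risk of over-diagnosis in elderly patients.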

1.3 PREVOTELLA INTERMEDIA 17

1.3.1. THE PREVOTELLA GENUS

The Prevotella genus is a group of anaerobic gram-negative rod-shaped bacteria most

commonly found in association with periodontal diseases (Maeda et al, 1998). It is classified

among the group of ‘black pigmented bacteria’ due to the formation of smooth and shiny

colonies with black/grey colour when grown on a blood agar plate (Shah & Collins, 1990). The original classification for these bacteria was Bacteroides melaninogenicus, until this was reclassified and split into Prevotella melaninogenica and Prevotella intermedia (Brook, 2015). The Prevotella genus is very versatile, having been found in various areas such as the

oral cavity, upper respiratory tract, urogenital tract (Eiring et al, 1998), rumen and human

faeces (Hayashi et al, 2007). Many species of Prevotella are potential/opportunistic pathogens

(Yunfeng et al, 2015) under a wide range of environments and are known to invade host tissues

(Nadkarni et al, 2012).

1.3.2 PREVOTELLA INTERMEDIA

Due to its isolation from lesions of patients, Prevotella intermedia has been identified as a putative periodontal pathogen, specifically in early periodontitis, advanced periodontitis, and acute necrotizing ulcerative gingivitis (Haffajee & Socransky, 1994). It has also been found to

invade the human coronary artery endothelial and smooth muscle cells in vitro (Dorn et al,

1999) and has been found in atheromatous plaques (Haraszthy et al, 1998). A significant find

in relation to this study is that “P. intermedia plays a critical role in the complex

pathophysiology of lung disease in patients with cystic fibrosis” when in anaerobic sputum

plugs (Ulrich et al, 2010). The results of this study could show that this situation is not limited to cystic fibrosis patients, but also applies to people suffering from lung cancer and COPD.

1.3.3. PREVOTELLA INTERMEDIA 17

P. intermedia 17 is a strain of P. intermedia clinically isolated from a human

periodontal pocket (Fukushima et al, 1992). It is differentiated from the other strains of P. intermedia (for example 27 and ATCC 25611) by the diameter of the fimbriae (curlin protein appendages carrying adhesins) present on its cell surface (Leung et al, 1989). P. intermedia 17 presents type C (8 nm diameter) fimbriae, unlike other strains of this species (Dorn et al, 1998). Dorn et al (1998) found that, in a human oral epithelial cell line, P. intermedia 17 has the ability to invade host cells whereas strains 27 and ATCC 25611 cannot, and that it also possesses strong agglutinating activity for human erythrocytes and can bind to human buccal epithelial cells more avidly than other strains. They further speculate that “the type C fimbriae could promote invasion by providing a means for the bacteria to attach to the cell surface” (Dorn et al, 1998). Fan et al (2006) further state that P. intermedia 17 possesses a cell

surface protein with a broad-spectrum extra-cellular-matrix binding ability, which probably

mediates its binding through adhesins. With P. intermedia 17’s ability to do this, it could be

possible that this strain can also invade the cells of human lungs through the epithelial layer

and extracellular matrix present on the alveoli. If this is shown to be true, it would be the first

time this strain has been found in the lungs, and further could present a pathological

relationship with lung cancer/COPD.

1.4 LUNG MICROBIOME RESEARCH

Lung microbiome research is a relatively new field in which the bacterial contents of the human lung are analysed, mostly for the purpose of disease investigation. Many factors can influence the environment in the lungs, such as oxygen, pH, hydrophobicity, temperature, salinity, predators, nutrient scarcity and many more (Dickson et al, 2015), and disease can alter these factors very easily. The microbiome of the lungs is determined by three ecological factors: “microbial immigration into the airways, elimination of microbes from the airways and the relative reproduction rates of its community members, as determined by regional growth conditions” (Dickson et al, 2015). During disease these three ecological factors change, thereby changing the bacterial species present in the lungs. Examining the changes in bacterial species gives insight into the effects the disease is having on the lungs, and possibly opens the door to new diagnostic tests and treatments being developed on that basis. Examples of successful lung microbiome projects include the identification of a core set of common bacteria found in the lungs of COPD patients (Erb-Downward et al, 2011), the discovery that certain members of Staphylococcus and Streptococcus are linked to the progression of idiopathic pulmonary fibrosis (Han et al, 2014), and the discovery that P. intermedia plays a critical role in the pathophysiology of lung disease in patients with cystic fibrosis (Ulrich et al, 2010). Using the techniques set out in these papers and many more, this study analyses the microbiome of the lung to identify a bacterial species that possesses a link to lung cancer or COPD.

1.5 PREVIOUS WORK

Thirty sputum samples were obtained for the MEDLUNG project (10 from healthy patients, 10 from patients suffering from lung cancer, 10 from patients suffering from COPD) along with the patients’ medical histories (all data were collected and treated confidentially and in compliance with ethical guidelines). Genomic DNA was extracted from these samples and used to create barcoded Illumina sequencing libraries for each individual sample. These were subsequently paired-end sequenced on an Illumina HiSeq2000 platform by Simon Cameron as part of his PhD (Cameron, 2015). From these data a de novo contig assembly was performed by Tom Hitch (IBERS PhD student), which forms the starting point for this study.

2. Materials and Methods

2.1 AIMS AND OBJECTIVES

The aim of this project was to analyse the de novo contig assembly, provided by Tom Hitch (IBERS PhD student), of the DNA samples obtained by the MEDLUNG project, in order to find a bacterial species which possessed a relationship with lung cancer or COPD, whether through its presence or its absence, that could be developed into a biomarker in future research. The discovery of a successful biomarker for these diseases could lead to the development of a new diagnostic test for lung cancer or COPD. One way this could work would be to develop a primer for the biomarker tagged with fluorescent markers, so that if the biomarker is present it would be visible under ultraviolet light, indicating whether the patient has one of these diseases (depending on the nature of the biomarker). This would help the global initiative for reducing the suffering from these diseases by enabling early diagnosis before the symptoms manifest, making curative treatment more available.

2.2 INITIAL ANALYSIS

For this part of the analysis the metagenome contig assembly produced by Tom Hitch

(IBERS PhD student) was used to discover a bacterial species present within the sputum

samples of the MEDLUNG collection. To conduct this research, the CLC Genomics

Workbench 8 software (CLC bio, 2016) was used.

2.2.1 LARGEST CONTIG ASSEMBLY

The first stage of the initial analysis involved arranging the contigs of the metagenome assembly by size, from the largest number of base pairs (bp) to the smallest. Due to time constraints on the project, only the 10 largest contigs were chosen for searching, as these represented the largest portion of the contig assembly that could be analysed in the time available. The 10 largest contigs were saved as a separate sequence list, then saved as separate sequences to allow for analysis.
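
The selection step above was carried out in CLC Genomics Workbench. Outside CLC, the same idea (sort contigs by length and keep the longest) could be sketched as follows; this is an illustration only, and the function names and toy contigs are hypothetical, not taken from the MEDLUNG assembly.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def largest_contigs(handle, n=10):
    """Return the n longest contigs, longest first."""
    contigs = list(read_fasta(handle))
    return sorted(contigs, key=lambda item: len(item[1]), reverse=True)[:n]


# Toy assembly with made-up contig names and sequences.
toy = StringIO(">contig_a\nACGT\n>contig_b\nACGTACGTAC\n>contig_c\nACGTAC\n")
top_two = largest_contigs(toy, n=2)
print([(name, len(seq)) for name, seq in top_two])
```

Each selected contig can then be written to its own file, which corresponds to saving the 10 largest contigs as separate sequences.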

2.2.2 NCBI BLAST SEARCH

These individual sequences were then subjected to a BLAST (Basic Local Alignment Search Tool) (Altschul et al, 1990) search to identify their individual components by aligning them against reference genomes. For this study, the NCBI (National Center for Biotechnology Information) BLAST database was used due to its extensive collection of reference genomes and genes, and its easy-to-use interface (NCBI, 2016). The individual sequences were run through the NCBI nucleotide BLAST function using the MegaBlast algorithm with default parameters (Morgulis et al, 2008), which compares a query sequence to reference sequences and is used for sequence identification and intra-species comparison (NCBI, 2015). P. intermedia 17 was found to hit 5 of the 10 largest contigs at relatively high query cover levels, which was unexpected as this strain of Prevotella intermedia had not yet been found in the lungs. It was therefore decided to continue with this line of research for the remainder of the project. The MegaBlast search results for these 5 contigs were saved for later reference.
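
Query cover is commonly understood as the percentage of the query sequence that is covered by at least one aligned segment. The sketch below shows one way to compute such a figure by merging overlapping segments; it is an illustration under that assumption, not NCBI's implementation, and the coordinates used are invented.

```python
def query_cover(segments, query_length):
    """Fraction of query positions covered by at least one aligned segment.

    segments: iterable of (start, end) positions on the query,
    1-based and inclusive (an assumed convention for this sketch).
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    covered = 0
    current_start = current_end = None
    for start, end in sorted((min(s, e), max(s, e)) for s, e in segments):
        if current_end is None or start > current_end:
            # No overlap with the previous block: close it and start a new one.
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlapping segment: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1
    return covered / query_length


# Invented segments on a hypothetical 1000 bp contig.
cover = query_cover([(1, 400), (301, 600), (801, 900)], 1000)
print(f"query cover = {cover:.0%}")
```

Merging before counting matters because two overlapping segments would otherwise be counted twice and inflate the figure.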

2.2.3 ALIGNMENT

To display the relation between the 5 contigs which hit P. intermedia 17 and the reference genome itself, CLC Genomics Workbench 8 was used to create a visual alignment of these contigs against the reference genome obtained from the NCBI genome database. The P. intermedia 17 reference genome was imported into CLC using the standard import, and each of the 5 contigs was then assembled to it using Toolbox > Molecular Biology Tools > Sequencing Data Analysis > Assemble Sequences to Reference, to display the query cover data obtained from the MegaBlast search results. Using the graphics function in CLC, these alignments were then exported for use in the results.

2.3 INDIVIDUAL SAMPLES

Following the discovery of P. intermedia 17 in the metagenome contig assembly, the next step was to use its reference genome to search through the individual patient samples obtained by MEDLUNG, to provide further evidence of the presence of this bacterium and to determine whether this strain was linked to lung cancer/COPD. This was also performed in CLC Genomics Workbench 8. The data consisted of 30 samples (labelled B, C or D depending on which collection of patients the sample was from, and numbered 2-11). Each sample consisted of 4 files: 2 lanes, each having 2 reads (forward and reverse).

2.3.1 IMPORT AND SAMPLING

To start this section of the research, the individual patient data had to be imported into CLC. The data were in FASTQ format and were therefore imported using the Illumina import function, with the paired reads option selected to merge the 2 read files into one. This produced 60 files: 30 samples with 2 lanes of data each. Then, due to time and computing constraints, it was decided to sample 500000 reads from each file (according to a sample size calculation, on average more than ~2000 reads were needed to be significant). To do this in CLC: Toolbox > NGS Core Tools > Sample Reads, with 500000 reads specified to be sampled.
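
The sampling itself was done with the CLC Sample Reads tool. As an illustration of the underlying idea only, the sketch below draws a random subset of read pairs while keeping each forward read with its reverse mate. The function names and toy reads are hypothetical, and the sketch loads all pairs into memory, which would not suit full-size sequencing files.

```python
import random
from io import StringIO


def read_fastq(handle):
    """Yield FASTQ records as 4-line tuples (header, sequence, plus, quality)."""
    while True:
        record = tuple(handle.readline().rstrip("\n") for _ in range(4))
        if not record[0]:
            return
        yield record


def sample_pairs(forward, reverse, n, seed=0):
    """Randomly sample n read pairs, keeping forward and reverse mates together."""
    pairs = list(zip(read_fastq(forward), read_fastq(reverse)))
    if n >= len(pairs):
        return pairs
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return rng.sample(pairs, n)


# Five made-up read pairs standing in for one lane of one sample.
fwd = StringIO("".join(f"@read{i}/1\nACGT\n+\nIIII\n" for i in range(5)))
rev = StringIO("".join(f"@read{i}/2\nTGCA\n+\nIIII\n" for i in range(5)))
subset = sample_pairs(fwd, rev, n=3)
print([f[0] for f, _ in subset])
```

Sampling pairs rather than individual reads keeps the paired-end information intact for the later assembly and mapping steps.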

2.3.2 DE NOVO ASSEMBLY

To merge the 2 ‘lane’ files for each sample and to make further analysis easier, it was decided to perform a de novo assembly for each sample. In CLC, with the 2 files of the sample selected: Toolbox > De Novo Sequencing > De Novo Assembly. Default parameters were used with the exception of the option to map reads back to the contigs, due to time constraints.

2.3.3 READ MAPPING

Once the de novo assembly had completed, the next stage was to map the assemblies

to the P. intermedia 17 reference genome to identify any reads from the bacteria genome. To

do this the Map Reads to Reference function (default parameters) was used, located in CLC >

Toolbox > NGS Core Tools > Map Reads to Reference, and the P. intermedia 17 genome

selected as the reference. Once all the mapping was completed for a full set of samples i.e. B,

a track list was created of all the mapping graphs, and the maximum graph coverage set at 3

across all sample sets for the purpose of comparison later on. These track lists were then

exported using the graphics function in CLC for comparison later. Furthermore, the number of

reads for each sample set, separated into each individual sample, were plotted as graphs for

easy interpretation as results. It was decided that the track lists of mapping graphs were to be

included in the appendix as they were summarised by the graphs formulated.
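
Once mapped read counts are available for each sample, the group comparison reduces to summing the counts per group and expressing each disease group relative to the control. The sketch below shows that arithmetic only; the group names, sample names and read counts are placeholders invented for illustration and are not values from this study.

```python
def group_totals(mapped_reads):
    """Sum per-sample mapped read counts within each patient group."""
    return {group: sum(counts.values()) for group, counts in mapped_reads.items()}


def percent_reduction(control_total, disease_total):
    """Percentage by which a disease group's total is lower than the control's."""
    if control_total <= 0:
        raise ValueError("control_total must be positive")
    return 100.0 * (control_total - disease_total) / control_total


# Placeholder counts (NOT study data): two hypothetical samples per group.
mapped_reads = {
    "control": {"sample_1": 600, "sample_2": 400},
    "lung_cancer": {"sample_1": 60, "sample_2": 40},
    "copd": {"sample_1": 30, "sample_2": 20},
}
totals = group_totals(mapped_reads)
for group in ("lung_cancer", "copd"):
    change = percent_reduction(totals["control"], totals[group])
    print(f"{group}: {change:.0f}% fewer mapped reads than control")
```

Because every file was sampled to the same number of reads, totals computed this way are comparable between groups without further normalisation.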

3. Results

3.1 NCBI BLAST RESULTS

The first set of results shows the NCBI MegaBlast search results from the initial analysis, indicating the presence of P. intermedia 17 in 5 of the 10 largest contigs (nodes). The full NCBI MegaBlast results are available in appendix (1).

NODE ID      Description                      Max Score  Total Score  Query cover  E value  Identity  Accession
NODE_54069   P. intermedia 17 chromosome II   7491       24979        90%          0.0      83%       CP003503.1
NODE_28947   P. intermedia 17 chromosome I    3517       10209        40%          0.0      81%       CP003502.1
NODE_13609   P. intermedia 17 chromosome II   6259       18001        77%          0.0      83%       CP003503.1
NODE_12098   P. intermedia 17 chromosome II   5068       18113        63%          0.0      81%       CP003503.1
NODE_18381   P. intermedia 17 chromosome II   2372       5668         33%          0.0      81%       CP003503.1

Table 1: NCBI MegaBlast results from inputting the 10 largest contigs (nodes) from the initial

analysis. Only the 5 contigs that hit P. intermedia 17 are displayed, including the chromosome

they hit, the max score, total score, query cover, E value, identity and accession ID.

All 5 of these nodes hit P. intermedia 17 at above 80% identity, with at least 33% query cover and an E value of 0. Significant hits to P. intermedia 17 were therefore found in 5 of the 10 largest contigs from the initial analysis, justifying the search for this bacterium within the individual samples.
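The summary of Table 1 can be checked programmatically. The following is a minimal Python sketch and was not part of the original NCBI/CLC workflow: the `BlastHit` class and the function names are hypothetical, the values are copied from Table 1, and the thresholds are simply those stated in the text (identity above 80%, query cover of at least 33%, E value of 0).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BlastHit:
    node: str
    chromosome: str      # P. intermedia 17 chromosome hit ("I" or "II")
    query_cover: float   # percentage of the contig covered by the hit
    e_value: float
    identity: float      # percentage identity
    accession: str


# Values copied from Table 1.
TABLE_1 = [
    BlastHit("NODE_54069", "II", 90, 0.0, 83, "CP003503.1"),
    BlastHit("NODE_28947", "I", 40, 0.0, 81, "CP003502.1"),
    BlastHit("NODE_13609", "II", 77, 0.0, 83, "CP003503.1"),
    BlastHit("NODE_12098", "II", 63, 0.0, 81, "CP003503.1"),
    BlastHit("NODE_18381", "II", 33, 0.0, 81, "CP003503.1"),
]


def passes_thresholds(hit, min_identity=80.0, min_query_cover=33.0, max_e_value=0.0):
    """Return True if a hit meets the thresholds quoted in the text."""
    return (
        hit.identity > min_identity
        and hit.query_cover >= min_query_cover
        and hit.e_value <= max_e_value
    )


def chromosome_counts(hits):
    """Count how many contigs hit each chromosome."""
    return Counter(hit.chromosome for hit in hits)


if __name__ == "__main__":
    for hit in TABLE_1:
        print(hit.node, "passes" if passes_thresholds(hit) else "fails")
    print(dict(chromosome_counts(TABLE_1)))
```

Keeping the thresholds as function parameters makes it easy to see how many contigs would survive a stricter cut-off, for example a higher minimum query cover.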

To confirm the presence of this bacterium in the contigs, each contig that returned a P. intermedia 17 hit was aligned with the reference sequence of the chromosome it hit in the MegaBlast results.

3.2 NODE ALIGNMENT

Figure 1: The alignment of the 5 contigs with the P. intermedia 17 reference genome

corresponding to the chromosomes which hit each individual node. The pink areas of the

coverage graph display the areas which align with the node.

As seen in Figure 1, P. intermedia 17 sequence is present within the metagenome contig assembly, which indicates that the bacterium is present in the lungs of the patients sampled. Some nodes contain more P. intermedia 17 sequence than others: the best match is NODE_54069, with 90% query cover and 83% identity to the reference genome, and the weakest is NODE_18381, with 33% query cover and 81% identity, as displayed in the visual alignment in Figure 1. These results support the presence of P. intermedia 17 within the lungs.

3.3 MAPPING OF INDIVIDUAL SAMPLES

The next stage of the analysis mapped the reads from the individual patient samples to the P. intermedia 17 genome, both to further support the hypothesis that P. intermedia 17 is present within the lung and to identify any relationship between P. intermedia 17 and lung cancer/COPD. The full mapping graphs from this part of the analysis are available in Appendix 2, with individual patient mapping data in Appendix 3.

The total number of mapped reads across the three patient groups (Control, Lung Cancer and COPD) was examined first.

Figure 2: The total number of mapped P. intermedia 17 reads across the three patient groups;

Control, Lung Cancer, and COPD. The raw data values are displayed above the data bars.

[Figure 2: bar chart "Total Number of Mapped P. intermedia 17 Reads". x-axis: Patient Group (Control, Lung Cancer, COPD); y-axis: Number of mapped reads. Data labels: Control 1622, Lung Cancer 226, COPD 6.]

Figure 2 shows that the highest number of mapped P. intermedia 17 reads was found in the control group (healthy patients), with the number markedly lower in lung cancer patients and lower still in COPD patients.

To check that the trend in Figure 2 was not an artefact of differing numbers of reads/bases between samples, the average percentage of mapped reads in each patient group was also examined to see whether the same trend appeared.
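The normalisation described above can be sketched in a few lines of Python. This is an illustration only: it assumes that the group figure is the mean of the per-sample percentages, which the text does not state explicitly, and the function names and example counts are hypothetical rather than data from this study.

```python
from typing import Iterable, Tuple


def percent_mapped(mapped_reads: int, total_reads: int) -> float:
    """Percentage of a sample's reads that mapped to the reference."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * mapped_reads / total_reads


def group_mean_percent(samples: Iterable[Tuple[int, int]]) -> float:
    """Mean of the per-sample percentages for one patient group.

    Each sample is a (mapped_reads, total_reads) pair, so samples with
    different sequencing depths contribute equally to the group average.
    """
    percentages = [percent_mapped(mapped, total) for mapped, total in samples]
    if not percentages:
        raise ValueError("at least one sample is required")
    return sum(percentages) / len(percentages)


# Hypothetical counts for illustration only; not data from this study.
example_group = [(50, 10_000), (150, 10_000)]

if __name__ == "__main__":
    print(group_mean_percent(example_group))
```

Averaging percentages rather than raw counts means that a sample with many more reads cannot dominate the group figure, which is the concern the comparison with Figure 2 is meant to address.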

Figure 3: The average percentage of reads from the individual patient groups that mapped to

P. intermedia 17. The actual percentage is displayed to the left of each marker.

Figure 3 nearly mirrors the trend shown in Figure 2: the amount of P. intermedia 17 within human sputum is highest in healthy people (control group), markedly lower in lung cancer patients, and lower still in COPD patients.

The distribution of mapped reads between the two chromosomes of the P. intermedia 17 genome was also examined to determine which is more prevalent among the mapped reads.

[Figure 3: line chart "Average % of Reads Mapped to P. intermedia 17". x-axis: Patient Group (Control, Lung Cancer, COPD); y-axis: Average % of mapped reads. Data labels: Control 0.756, Lung Cancer 0.289, COPD 0.022.]

Figure 4: The distribution of mapped reads in the patient groups. Chromosome 1 is displayed

by the solid black data bars and chromosome 2 is represented by the patterned data bars. The

actual value of number of reads is displayed above the data bars. For COPD (due to the low

numbers) the data bars cannot be seen – 2 represents chromosome 1 and 5 represents

chromosome 2.

Figure 4 shows that far more reads mapped to chromosome 2 of P. intermedia 17 than to chromosome 1, especially in the control group, where chromosome 2 received just over 4 times as many reads as chromosome 1. However, this could simply reflect chromosome 2 being approximately 4 times longer than chromosome 1. On the other hand, chromosome 2 was also hit by 4 of the 5 contigs that matched P. intermedia 17, compared with 1 contig for chromosome 1, which could not be affected by sequence length.

[Figure 4: bar chart "Distribution of Mapped Reads in the Patient Groups". x-axis: Patient Group (Control, Lung Cancer, COPD); y-axis: Number of mapped reads; series: Chromosome 1, Chromosome 2. Data labels: Control 299 and 1323; Lung Cancer 39 and 187; COPD 2 and 5.]

4. Discussion

4.1 DISCUSSION OF RESULTS

4.1.1 NCBI MEGABLAST AND NODE ALIGNMENT

The NCBI MegaBlast search and the subsequent node alignment show that P. intermedia 17 is present within the human lung. This is the first time that this strain of P. intermedia has been located in the human lung, and it opens the door to further studies, such as investigating the molecular relationship between P. intermedia 17 and lung cancer/COPD and whether the decrease in the bacterium is directly caused by the presence of the disease. P. intermedia has already been found in the human lung in relation to cystic fibrosis (Ulrich et al, 2010), so the species is not unknown there; strain 17, however, has only now been found to be present.

4.1.2 INDIVIDUAL SAMPLE DATA

The individual sample data support several conclusions. Firstly, together with the NCBI MegaBlast and node alignment results, they further confirm the presence of P. intermedia 17 in the human lung. They also show an interesting trend across the control, lung cancer and COPD groups. Many species of Prevotella are potential/opportunistic pathogens (Yunfeng et al, 2015) under a wide range of environments and are known to invade host tissues (Nadkarni et al, 2012). If P. intermedia 17 were a pathogen linked to lung cancer/COPD, the level of the bacterium would be expected to be higher in lung cancer/COPD patients (more bacteria present = higher chance of disease developing). However, the individual sample data show the opposite trend. The highest level of P. intermedia 17 was present in the control group (healthy patients); it was approximately 86% lower in the lung cancer group (226 mapped reads against 1622), and lower again in the COPD group, at under 1% of the control level. To ensure this was not due to variation in sample size between patient groups, the average percentage of mapped reads out of the whole sample group data was calculated, and this nearly mirrored the trend shown in the total mapped reads graph. These trends indicate that less P. intermedia 17 is present when the patient has lung cancer or COPD. This could be due to many factors, direct or indirect. The diseases could directly cause the death of P. intermedia 17 cells, for example through phagocytosis or toxin/hormone release. Alternatively, they could reduce P. intermedia numbers indirectly, for example by increasing the growth/presence of other bacterial species that compete with P. intermedia, or by changing the environment in the lungs so that it becomes uninhabitable for the bacterium. This is a question for further study, and could lead to wider knowledge of the mechanics of lung cancer and/or COPD in the human lung.
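The size of the drop between the control and lung cancer groups can be worked out directly from the totals shown in Figure 2. The snippet below is a minimal sketch with hypothetical names; only the control and lung cancer totals (1622 and 226) are used, and the COPD total can be passed to the same function.

```python
def percent_decrease(baseline: float, value: float) -> float:
    """Percentage by which `value` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - value) / baseline


# Total mapped reads taken from the Figure 2 data labels.
CONTROL_READS = 1622
LUNG_CANCER_READS = 226

if __name__ == "__main__":
    drop = percent_decrease(CONTROL_READS, LUNG_CANCER_READS)
    print(f"Lung cancer vs control: {drop:.1f}% fewer mapped reads")
```

Expressing every group relative to the control total keeps the comparison on one scale, so a "further" decrease in a second disease group is not confused with a decrease relative to the first.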

The distribution of mapped reads across the P. intermedia 17 genome was also examined to see whether lung cancer/COPD affected it. Chromosome 1 of P. intermedia 17 contains 579647 base pairs and chromosome 2 contains 2119790 base pairs, approximately four times as many. This is reflected in the distribution of mapped reads in the control patient group, with 299 reads mapping to chromosome 1 and 1323 to chromosome 2. This ratio is broadly maintained in the other two patient groups, allowing for some sampling error, which suggests that the diseases do not affect chromosome 1 and chromosome 2 differently within the human lung.
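Whether chromosome length alone explains the read distribution can be checked by normalising the counts to reads per kilobase. The following Python sketch is illustrative and was not part of the original analysis: the names are hypothetical, the chromosome lengths are those quoted above, and the read counts are the Figure 4 data labels (the lung cancer chromosome 1 count of 39 is the group total of 226 minus the 187 reads on chromosome 2).

```python
# Chromosome lengths in base pairs, as quoted in the text.
CHROMOSOME_LENGTH_BP = {"chr1": 579_647, "chr2": 2_119_790}

# Mapped reads per chromosome, from the Figure 4 data labels.
MAPPED_READS = {
    "Control": {"chr1": 299, "chr2": 1323},
    "Lung Cancer": {"chr1": 39, "chr2": 187},
    "COPD": {"chr1": 2, "chr2": 5},
}


def reads_per_kb(reads: int, length_bp: int) -> float:
    """Mapped reads per 1000 bp of reference sequence."""
    return reads / (length_bp / 1000.0)


def density_ratio(group: str) -> float:
    """Chromosome 2 read density divided by chromosome 1 read density.

    A value near 1 means the read split is explained by length alone.
    """
    counts = MAPPED_READS[group]
    chr1 = reads_per_kb(counts["chr1"], CHROMOSOME_LENGTH_BP["chr1"])
    chr2 = reads_per_kb(counts["chr2"], CHROMOSOME_LENGTH_BP["chr2"])
    return chr2 / chr1


if __name__ == "__main__":
    length_ratio = CHROMOSOME_LENGTH_BP["chr2"] / CHROMOSOME_LENGTH_BP["chr1"]
    print(f"chr2/chr1 length ratio: {length_ratio:.2f}")
    for group in MAPPED_READS:
        print(f"{group}: density ratio {density_ratio(group):.2f}")
```

With the very small COPD counts, the ratio for that group is sensitive to a change of a single read, so it should be read with caution.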

4.2 LIMITATIONS & IMPLICATIONS

Limited time and computer memory/processing power were major limitations during the research, and led to the sampling of 500000 reads from the individual sample data. For example, a de novo assembly on the original data took up to 18 hours per file, and some attempts aborted because of insufficient disk space and memory, so the work took much longer than expected given the size of the files. To analyse the full individual sample data in future, a computer with a very large amount of memory and a fast processor would be required.
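Subsampling a fixed number of reads does not itself require much memory. The sampling in this study was carried out in CLC; the sketch below is a separate illustration of one standard technique, reservoir sampling, which keeps only the chosen reads in memory while streaming through a file once. It assumes plain four-line FASTQ records, and the function names are hypothetical.

```python
import random
from typing import Iterable, Iterator, List, Optional, Tuple

FastqRecord = Tuple[str, str, str, str]


def read_fastq(handle: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, separator, quality) tuples.

    Assumes unwrapped four-line FASTQ records; blank lines between
    records are skipped.
    """
    lines = (line.rstrip("\r\n") for line in handle)
    while True:
        header = next(lines, None)
        if header is None:
            return
        if not header:
            continue
        sequence = next(lines, None)
        separator = next(lines, None)
        quality = next(lines, None)
        if sequence is None or separator is None or quality is None:
            raise ValueError("truncated FASTQ record")
        yield (header, sequence, separator, quality)


def reservoir_sample(records: Iterable[FastqRecord], k: int,
                     seed: Optional[int] = None) -> List[FastqRecord]:
    """Return up to k records chosen uniformly at random in one pass."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[FastqRecord] = []
    for index, record in enumerate(records):
        if index < k:
            reservoir.append(record)
        else:
            # Replace an existing entry with probability k / (index + 1).
            slot = rng.randint(0, index)
            if slot < k:
                reservoir[slot] = record
    return reservoir
```

As a usage example, opening a FASTQ file and calling `reservoir_sample(read_fastq(handle), 500000, seed=1)` would return at most 500000 reads, with the fixed seed making the selection repeatable.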

4.3 FURTHER STUDY

There are many routes that further research from this analysis could follow. One would be to calculate a minimum threshold of P. intermedia 17 presence within the lung for the two diseases, so that a patient falling below this threshold would be referred for further investigation or given a diagnosis. For this to work a diagnostic test would have to be developed. This could be achieved, for example, by developing a biomarker for P. intermedia 17 and tagging it with a fluorescent marker. When mixed with a patient’s sputum, the tagged biomarker would bind to any P. intermedia 17 present and be visible under ultraviolet light. The less fluorescence visible, the higher the patient’s chance of having lung cancer or COPD.

Another route of further study would be to research how lung cancer/COPD interacts with P. intermedia 17 and reduces its prevalence in affected lungs. The diseases could directly cause the death of P. intermedia 17 cells, for example through phagocytosis or toxin/hormone release. Alternatively, they could reduce P. intermedia numbers indirectly, by increasing the growth/presence of other bacterial species that compete with P. intermedia, or by changing the environment in the lungs so that it becomes uninhabitable for the bacterium.

Other P. intermedia strains were found in the NCBI MegaBlast results, so researching whether these appear in the human lung could be another route to follow. Further options include investigating whether P. intermedia 17 is related to any other diseases that predominantly affect the lungs, or even to other types of cancer, and whether the bacterium has any adverse effect upon the diseases themselves.

4.4 CONCLUSIONS

This study has not only detected Prevotella intermedia 17 in the lungs for the first time, it has also found that P. intermedia 17 has a relationship with both lung cancer and chronic obstructive pulmonary disease in humans. This could lead to the development of a new diagnostic test for lung cancer or COPD, or further the knowledge surrounding these diseases and how they manifest in the human lung. Developing a new diagnostic test and providing early screening is vitally important for lung cancer and COPD, as it could save many lives by giving more people access to curative treatment at an early stage, when it can be effective.

5. References

ALBERG AJ, FORD JG, SAMET JM. Epidemiology of lung cancer: ACCP evidence-based clinical

practice guideline (2nd edition). Chest. 2007;132(3 Suppl):29S-55S.

ALTSCHUL SF, GISH W, MILLER W, MYERS EW, LIPMAN DJ. Basic local alignment search tool.

J Mol Biol. 1990;215(3):403-10.

AMERICAN CANCER SOCIETY. 2016. Signs and Symptoms of Lung Cancer [Online]. American

Cancer Society. Available: http://www.cancer.org/cancer/lungcancer-non-

smallcell/moreinformation/lungcancerpreventionandearlydetection/lung-cancer-prevention-

and-early-detection-signs-and-symptoms [Accessed 4th April 2016].

AMERICAN LUNG ASSOCIATION. 2016. How Serious is COPD [Online]. American Lung

Association. Available: http://www.lung.org/lung-health-and-diseases/lung-disease-

lookup/copd/learn-about-copd/how-serious-is-copd.html?referrer=https://www.google.co.uk/

[Accessed 4th April 2016].

BOLIVAR I, WHITESON K, STADELMANN B, BARATTI-MAYER D, GIZARD Y, MOMBELLI

A. Bacterial diversity in oral samples of children in Niger with acute noma, acute necrotizing

gingivitis, and healthy controls. PLoS Negl Trop Dis. 2012;6(3):e1556.

BRENNER DJ. Radiation Risks Potentially Associated with Low-Dose CT Screening of Adult Smokers

for Lung Cancer. RSNA Radiology. 2004;231(2):030-880.

BROOK I. 2015. Bacteroides Infection: Background [Online]. Medscape. Available:

http://emedicine.medscape.com/article/233339-overview [Accessed 6th April 2016].

CAMERON S. Charting Human Microbiome and Metabolome Changes in Disease and Stress.

Aberystwyth University. 2015. PhD thesis.

CANCERCARE®. 2016. Types and Staging of Lung Cancer [Online]. Lungcancer.org (A program of

CancerCare®). Available: http://www.lungcancer.org/find_information/publications/163-

lung_cancer_101/268-types_and_staging [Accessed 4th April 2016].

CANCERRESEARCHUK. 2015a. Lung Cancer Survival Statistics [Online]. CancerResearchUK.

Available: http://www.cancerresearchuk.org/cancer-info/cancerstats/types/lung/survival/lung-

cancer-survival-statistics [Accessed 23rd March 2015]

CANCERRESEARCHUK. 2015b. Lung Cancer Mortality Statistics [Online]. CancerResearchUK.

Available: http://www.cancerresearchuk.org/cancer-info/cancerstats/types/lung/mortality/uk-

lung-cancer-mortality-statistics [Accessed 23rd March 2015]

CANCERRESEARCHUK. 2015c. General Factsheet for Lung Cancer [Online]. CancerResearchUK.

Available:

http://www.cancerresearchuk.org/prod_consump/groups/cr_common/@cah/@gen/documents/

generalcontent/cr_120625.pdf [Accessed 23rd March 2015]

CENTRES FOR DISEASE CONTROL AND PREVENTION (CDC). 2016. Lung Cancer – Basic

Information – What Screening tests are there? [Online]. Centres for Disease Control and

Prevention. Available: http://www.cdc.gov/cancer/lung/basic_info/screening.htm [Accessed

4th April 2016].

CLC BIO. 2016. CLC Genomics Workbench 8 [Software]. Qiagen.

DICKSON RP, HUFFNAGLE GB. The Lung Microbiome: New Principles for Respiratory

Bacteriology in Health and Disease. PLoS Pathog. 2015;11(7):e1004923.

DORN BR, DUNN WA JR, PROGULSKE-FOX A. Invasion of human coronary cells by periodontal

pathogens. Infect Immun. 1999;67(11):5792-8.

DORN BR, LEUNG KP, PROGULSKE-FOX A. Invasion of Human Oral Epithelial Cells by Prevotella

intermedia. Infect Immun. 1998;66(12):6054-6057.

EDDY, D. Screening for lung cancer. Annals of internal medicine. 1989;111:232-237.

EIRING P, WALLER K, WIDMANN A, WERNER H. Fibronectin and laminin binding of urogenital

and oral Prevotella species. Zentralbl Bakteriol. 1998;288(3):361-72.

FAN Y, DIVYA I, CECILIA A, JANINA P, LEWIS DR. Identification and characterisation of a cell

surface protein of Prevotella intermedia 17 with broad-spectrum binding activity for

extracellular matrix proteins. Proteomics. 2006;6(22):6023-32.

FERLAY J, SOERJOMATARAM I, ERVIK M, DIKSHIT R, ESER S, MATHERS C, REBELO M,

PARKIN DM, FORMAN D, BRAY F. 2014. Cancer Incidence and Mortality Worldwide:

IARC CancerBase No. 11. Globocan 2012 v1.1. 2014

FIORE MC, BAILEY WC, COHEN SJ. Smoking Cessation: Clinical Practice Guideline No 18. US

Department of Health and Human Services, Public Health Service, Agency for Health Care

Policy and Research. AHCPR Publ. 1996;96:0692.

FUKUSHIMA H, MOROI H, INOUE J, ONOE T, EZAKI T, YABUUCHI E, LEUNG KP, WALKER

CB, CLARK WB, SAGAWA H. Phenotypic characteristics and DNA relatedness in Prevotella

intermedia and similar organisms. Oral Microbiol Immunol. 1992;7(1):60-4.

HAFFAJEE AD, SOCRANSKY SS. Review: Microbial etiological agents of destructive periodontal

diseases. Periodontol 2000. 1994;5:78-111.

HAN MK, ZHOU Y, MURRAY S, TAYOB N, NOTH I, LAMA VN, MOORE BB, WHITE ES,

FLAHERTY KR, HUFFNAGLE GB, MARTINEZ FJ. Lung microbiome and disease

progression in idiopathic pulmonary fibrosis: an analysis of the COMET study. The Lancet

Respiratory Medicine. 2014;2(7):548-556.

HARASZTHY VI, ZAMBON JJ, TREVISAN M, SHAH R, ZEID M, GENCO RJ. Identification of

pathogens in atheromatous plaques. J Dent Res. 1998;77:666.

HAYASHI H, SHIBATA K, SAKAMOTO M, TOMITA S, BENNO Y. Prevotella copri sp. nov. and

Prevotella stercorea sp. nov., isolated from human faeces. Int J Syst Evol Microbiol. 2007;57(Pt

5):941-6.

HOUGHTON AM. Mechanistic links between COPD and lung cancer. Nature Reviews Cancer.

2013;13:233-245.

JACINTO RC, GOMES BP, FERRAZ CC, ZAIA AA, FILHO FJ. Microbiological analysis of infected

root canals from symptomatic and asymptomatic teeth with periapical periodontitis and the

antimicrobial susceptibility of some isolated anaerobic bacteria. Oral Microbiol Immunol.

2003;18(5):285-92.

ERB-DOWNWARD JR, THOMPSON DL, HAN MK, FREEMAN CM, MCCLOSKEY L, SCHMIDT

LA, YOUNG VB, TOEWS GB, CURTIS JL, SUNDARAM B, MARTINEZ FJ, HUFFNAGLE

GB. Analysis of the Lung Microbiome in the ‘Healthy’ Smoker and in COPD. PLoS ONE.

2011;6(2):e16384.

LEUNG KP, FUKUSHIMA H, SAGAWA H, WALKER CB, CLARK WB. Surface appendages,

hemagglutination, and adherence to human epithelial cells of Bacteroides intermedius. Oral

Microbiol Immunol. 1989;4(4):204-10.

LOZANO R, NAGHAVI M, FOREMAN K. Global and regional mortality from 235 causes of death

for 20 age groups in 1990 and 2010: a systematic analysis for the Global Burden of Disease Study

2010. Lancet. 2013;380:2095-128.

MAEDA N, OKAMOTO M, KONDO K, ISHIKAWA H, OSADA R, TSURUMOTO A. Incidence of

Prevotella intermedia and Prevotella nigrescens in periodontal health and disease. Microbiol

Immunol. 1998;42(9):583-9.

MALLIA P, CONTOLI M, CARAMORI G, PANDIT A, JOHNSTON S, PAPI A. Exacerbations of

asthma and chronic obstructive pulmonary disease (COPD): focus on virus induced

exacerbations. Current pharmaceutical design. 2003;13:73-97.

MORGULIS A, COULOURIS G, RAYTSELIS Y, MADDEN TL, AGARWALA R, SCHÄFFER AA.

Database indexing for production MegaBLAST searches. Bioinformatics. 2008;24(16):1757-

64.

NADKARNI MA, BROWNE GV, CHHOUR K, BYUN R, NGUYEN K, CHAPPLE CC. Pattern of

distribution of Prevotella species/phylotypes associated with healthy gingiva and periodontal

disease. Eur J Clin Microbiol Infect Dis. 2012;31(11):2989-99.

NAGAOKA K, YANAGIHARA K, MORINAGA Y, NAKAMURA S, HARADA T. Prevotella

intermedia Induces Severe Bacteremic Pneumococcal Pneumonia in Mice with Upregulated

Platelet-Activating Factor Receptor Expression. Infection and Immunity. 2014;82(2):587-593.

NATIONAL CENTRE FOR BIOTECHNOLOGY INFORMATION (NCBI). 2016. BLAST®

[Online]. National Centre for Biotechnology Information, U.S. National Library of Medicine.

Available: http://blast.ncbi.nlm.nih.gov/Blast.cgi [Accessed 9th April 2016].

NATIONAL CENTRE FOR BIOTECHNOLOGY INFORMATION (NCBI). 2015. BLAST

Homepage and Selected Search Pages: Introducing the BLAST homepage and form

elements/functions of selected search pages [Online]. National Centre For Biotechnology

Information. Available: ftp://ftp.ncbi.nlm.nih.gov/pub/factsheets/HowTo_BLASTGuide.pdf

[Accessed 9th April 2016].

NATIONAL HEALTH SERVICE (NHS). 2014. Chronic obstructive pulmonary disease – Causes of

COPD [Online]. NHS Choices. Available: http://www.nhs.uk/Conditions/Chronic-obstructive-

pulmonary-disease/Pages/Causes.aspx [Accessed 5th April 2016].

NATIONAL HEALTH SERVICE (NHS). 2016. CT Scan – Introduction [Online]. NHS Choices.

Available: http://www.nhs.uk/conditions/ct-scan/Pages/Introduction.aspx [Accessed 4th April

2016].

NATIONAL INSTITUTES OF HEALTH (NIH). 2013a. What is COPD? [Online]. National Heart,

Lung, and Blood Institute. Available: http://www.nhlbi.nih.gov/health/health-

topics/topics/copd/ [Accessed 5th April 2016]

NATIONAL INSTITUTES OF HEALTH (NIH). 2013b. What are the signs and symptoms of COPD?

[Online]. National Heart, Lung, and Blood Institute. Available:

https://www.nhlbi.nih.gov/health/health-topics/topics/copd/signs [Accessed 5th April 2016].

QASEEM A, WILT TJ, WEINBERGER SE, HANANIA NA, CRINER G, VAN DER MOLEN T,

MARCINIUK DD. Diagnosis and Management of Stable Chronic Obstructive Pulmonary

Disease: A Clinical Practice Guideline Update from the American College of Physicians,

American College of Chest Physicians, American Thoracic Society and European Respiratory

Society. Annals of Internal Medicine. 2011;155(3):179-91

PARRY. 2010. Use and abuse of drugs – the link between smoking and lung cancer [Image][Online]

Available:

http://www.corescience.co.uk/index.php?option=com_content&view=article&id=58%3Ause-

and-abuse-of-drugs&catid=43%3Adrugs&Itemid=41&limitstart=3 [Accessed 13th April 2016]

RAVIV S, HAWKINS K, DECAMP M, KALHAN R. Lung cancer in chronic obstructive pulmonary

disease: enhancing surgical options and outcomes. American journal of respiratory and critical

care medicine. 2011;176:532-555.

RUAN Y, SHEN L, ZOU Y, QI Z, YIN J, JIANG J, GUO L, HE L, CHEN Z, TANG Z, QIN S.

Comparative genome analysis of Prevotella intermedia strain isolated from infected root canal

reveals features related to pathogenicity and adaptation. BMC Genomics. 2015;16(1):1.

SHAH HN, COLLINS DM. NOTES: Prevotella, a new genus to include bacteroides melaninogenicus

and related species formerly classified in the genus bacteroides. Int J Systematic.

1990;40(2):205-8.

UK LUNG CANCER SCREENING TRIAL (UKLS). 2012. Background to UKLS [Online]. UKLS.

Available: https://www.ukls.org/index.html [Accessed 4th April 2016].

ULRICH M, BEER I, BRAITMAIER P, DIERKES M, KUMMER F, KRISMER B. Relative

contribution of Prevotella intermedia and Pseudomonas aeruginosa to lung pathology in

airways of patients with cystic fibrosis. Thorax. 2010;65(11):978-84.

US. DEPARTMENT OF HEALTH AND HUMAN SERVICES. 2014. The Health Consequences of

Smoking – 50 years of progress: A report of the surgeon general [Online]. Centres for Disease

Control and Prevention. Available: http://www.cdc.gov/tobacco/data_statistics/sgr/50th-

anniversary/index.htm [Accessed 5th April 2016].

VESTBO, JORGEN. Definition and Overview: Global Strategy for the Diagnosis, Management, and

Prevention of Chronic Obstructive Pulmonary Disease. Global Initiative for Chronic

Obstructive Lung Disease. 2013:pp(1-7).

VESTBO, JORGEN. Diagnosis and Assessment: Global Strategy for the Diagnosis, Management, and

Prevention of Chronic Obstructive Pulmonary Disease. Global Initiative for Chronic

Obstructive Lung Disease. 2013:pp(9-17).

WORLD HEALTH ORGANISATION (W.H.O.). 2016. Chronic respiratory diseases – Causes of

COPD [Online]. World Health Organisation. Available:

http://www.who.int/respiratory/copd/causes/en/ [Accessed 5th April 2016].

WORLD HEALTH ORGANISATION (W.H.O.). 2015a. Chronic Obstructive Pulmonary Disease

(COPD) Factsheet [Online]. World Health Organisation. Available:

http://www.who.int/mediacentre/factsheets/fs315/en/ [Accessed 23rd March 2015].

WORLD HEALTH ORGANISATION (W.H.O.). 2015b. Cancer Factsheet [Online]. World Health

Organisation. Available: http://www.who.int/mediacentre/factsheets/fs297/en/ [Accessed 4th

April 2016].

WORLD HEALTH ORGANISATION (W.H.O.). 2014. World Cancer Report 2014. [Online]. World

Health Organisation. Available:

http://apps.who.int/bookorders/anglais/detart1.jsp?codlan=1&codcol=76&codcch=31#

[Accessed 4th April 2016].

YOUNG, VINCENT B. (2010). Blueprints Medicine (5th Ed.). Philadelphia: Wolters Kluwer

Health/Lippincott William & Wilkins. p. 69. ISBN: 978-0-7817-8870-0.

YUNFENG R, LU S, YAN Z, ZHENGNAN Q, JUN Y, JIE J, LIANG G, LIN H, ZIJIANG C,

ZISHENG T, SHENGYING Q. Comparative genome analysis of Prevotella intermedia strain

isolated from infected root canal reveals features related to pathogenicity and adaptation. BMC

Genomics. 2015;16:122.

6. Word Count

The final word count for this study, excluding the final list of references,

acknowledgements, tables, table of contents, and figure/image legends is:

6692

7. List of Figures/Tables/Images

Table 1: NCBI MegaBlast Search Results for the 5 largest contigs in relation to P. intermedia 17

Figure 1: Node alignment of the 5 contigs with the P. intermedia 17 reference genome

Figure 2: Bar chart displaying the total number of mapped reads found in each of the patient groups

Figure 3: Line chart displaying the average percentage of mapped reads from the total genomic data in the patient groups

Figure 4: Bar chart displaying the distribution of mapped reads across the two chromosomes of the P. intermedia 17 genome for each of the patient groups

Image 1: Government campaign supporting smoking cessation

Image 2: Structural changes in human lungs with COPD

8. Appendix

APPENDIX 1 – NCBI MEGABLAST RESULTS (FULL)

NODE_54069

Description                                                               Max Score  Total Score  Query Cover  E value  Identity  Accession
Prevotella intermedia DNA. Complete genome. Strain: OMA14. Chromosome 1   7413       24943        91%          0.0      83%       AP014597.1
Prevotella intermedia DNA. Chromosome 2. Complete genome. Strain: 17-2    7491       24979        90%          0.0      83%       AP014925.1
Prevotella intermedia 17 chromosome II. Complete sequence.                7491       24979        90%          0.0      83%       CP003503.1

NODE_28947

Description                                                               Max Score  Total Score  Query Cover  E value  Identity  Accession
Prevotella intermedia DNA, complete genome. Strain: OMA14. Chromosome II  4071       9268         32%          0.0      83%       AP014598.1
Prevotella intermedia DNA, chromosome 1. Complete genome. Strain: 17-2    3517       10209        40%          0.0      81%       AP014926.1
Prevotella intermedia 17 chromosome I. Complete sequence                  3517       10209        40%          0.0      81%       CP003502.1
Prevotella intermedia DNA. Complete genome. Strain: OMA14. Chromosome I.  122        122          0%           3e-22    94%       AP014597.1
Prevotella intermedia DNA, chromosome 2. Complete genome. Strain 17-2     121        121          0%           1e-21    94%       AP014925.1

NODE_13609

Description                                                               Max Score  Total Score  Query Cover  E value  Identity  Accession
Prevotella intermedia 17 chromosome II. Complete sequence                 6259       18001        77%          0.0      83%       CP003503.1
Prevotella intermedia DNA, chromosome 2. Complete genome. Strain: 17-2    6255       17997        77%          0.0      83%       AP014925.1
Prevotella intermedia DNA. Complete genome. Strain: OMA14. Chromosome I   6325       15926        66%          0.0      83%       AP014597.1

NODE_12098

Description                                                               Max Score  Total Score  Query Cover  E value  Identity  Accession
Prevotella intermedia DNA. Complete genome. Strain: OMA14. Chromosome I   5265       16356        54%          0.0      82%       AP014597.1
Prevotella intermedia DNA, chromosome 2. Complete genome. Strain: 17-2    5068       18108        63%          0.0      81%       AP014952.1
Prevotella intermedia 17 chromosome II. Complete sequence                 5068       18113        63%          0.0      81%       CP003503.1

NODE_18381

Description                                                               Max Score  Total Score  Query Cover  E value  Identity  Accession
Prevotella intermedia DNA, chromosome 2. Complete genome. Strain: 17-2    2372       5663         33%          0.0      81%       AP014925.1
Prevotella intermedia 17 chromosome II. Complete sequence                 2372       5668         33%          0.0      81%       CP003503.1
Prevotella intermedia DNA. Complete genome. Strain: OMA14. Chromosome I   2287       3579         20%          0.0      80%       AP014597.1

APPENDIX 2 – MAPPING GRAPHS OF INDIVIDUAL SAMPLE DATA

Blue areas represent areas matching that of the P. intermedia 17 reference genome

SAMPLE B – CHROMOSOME 1

N.B. B11 is omitted due to no reads being mapped in either chromosome

SAMPLE B – CHROMOSOME 2

N.B. B11 is omitted due to no reads mapping on either chromosome

SAMPLE C – CHROMOSOME 1

N.B. C2, 3, 8 are omitted due to no reads mapping on either chromosome

SAMPLE C – CHROMOSOME 2

N.B. C2, 3, 8 are omitted due to no reads mapping to either chromosome

SAMPLE D – CHROMOSOME 1

N.B. D2, 4, 8, 10, 11 omitted due to no reads mapped for either chromosome.

SAMPLE D – CHROMOSOME 2

N.B. D2, 4, 8, 10, 11 omitted due to no reads mapped for either chromosome.

APPENDIX 3 – INDIVIDUAL PATIENT MAPPING DATA

SAMPLE B

B2

B3

B4

B5

B6

B7

B8

B9

B10

SAMPLE C

C4

C5

C6

C7

C9

C10

C11

SAMPLE D

D3

D5

D6

D7

D9