2 SUPPLEMENTARY FIGURE LEGENDS · 7/25/2011 · 13 Supplementary Figure 2. Evaluation of the 16S...

SUPPLEMENTARY FIGURES

SUPPLEMENTARY FIGURE LEGENDS

Supplementary Figure 1. A) The microbial DNA sequences are arranged in an absolute coordinate space based on their multimer frequencies. The user-defined target and non-target groups are shown in green and red, respectively. B) The program suggests appropriate probes based on the parameters the user defined in the preceding steps. Each probe’s hybridization area on the target bacteria is shown, and the probe sequence is colored according to its direction (sense/antisense = green/red). C) The colors in the cross-labeling graph indicate the pairwise compatibility of the relevant probes. Blue between two probes means that they are compatible and will not interfere with each other. Red means that there is a high probability that they will hybridize to each other’s target bacteria.

Supplementary Figure 2. Evaluation of the reproducibility of the 16S rRNA gene PCR reaction. Three replicate 16S rRNA gene PCR reactions were performed on one of the infant fecal samples. Probes 6_1_4 and 5_1_2 were chosen to examine the signal from selected amplified target bacteria in the sample. The mean signal for the three individual PCR products and the mean signal for a pool of the three PCR reactions, measured in triplicate, are presented. The standard deviation of the triplicates and of the three individual PCR reactions is indicated for each mean value. The mean signals for the three individual PCR products and for the pool of the three PCR products were very similar, and the individual products showed the same or lower standard deviation compared to the pooled samples.

Supplementary Figure 3. Images of two of the arrays illustrating the different bacterial profiles of two individuals. The difference in signal for probe 6_1_4 (specific for Bifidobacterium longum) is marked with white circles. Array A shows the profile of an IgE sensitized one-year-old individual, while array B shows the profile of an IgE non-sensitized one-year-old individual.

Supplementary Figure 4. Evaluation of the quantification of specific targets in a constant background of non-targets. In all experiments the total amount of template was kept constant at 100 ng in the labeling reaction, while the targets for the illustrated probes were diluted. Error bars represent standard deviations.

Supplementary Figure 1 (panels A, B and C)

SUPPLEMENTARY TABLES

Supplementary Table 1. Probe reproducibility based on 43 sample replicates.

Probe ID      Mean % variation    R² ¹
1_1           20 %                0.98
1_3_3         31 %                0.97
2_1_1         31 %                0.91
2_1_min1b     22 %                0.95
2_3_2         28 %                0.86
2_4_1         16 %                0.94
2_5_1         19 %                0.88
2_7_1         16 %                0.98
3_2           26 %                0.91
4_1           18 %                0.94
4_3_1         20 %                0.96
4_4_2         37 %                0.95
4_8_1         17 %                0.98
5_1           16 %                0.95
5_1_2         20 %                0.97
6_1_4         27 %                0.74
6_2           15 %                0.89
6_2_2         22 %                0.85
Univ 16S      12 %                0.77

¹ R² represents the squared regression coefficient.

Supplementary Table 2. Ten probe set suggestions.

Theoretical bacterial target group Probe set 1 Probe set 2 Probe set 3 Probe set 4 Probe set 5 Probe set 6 Probe set 7 Probe set 8 Probe set 9 Probe set 10

Bacteroides 1_1 1_1 1_1 1_1 1_1 1_1 1_1 1_1 1_1 1_1

Parabacteroides 1_1_3 1_1_3 1_1_3 1_1_3 1_1_3 1_1_5 1_1_5 1_1_5 1_1_5 1_1_5

Bacteroides (dorei, fragilis, thetaiotaomicron, vulgatus) 1_2_2 1_2_2 1_2_2 1_2_2 1_2_2 1_2_2 1_2_2 1_2_2 1_2_2 1_2_2

Bacteroides (dorei, fragilis, thetaiotaomicron, vulgatus) 1_3_3 1_3_3 1_3_3 1_3_3 1_3_3 1_4_1 1_4_5 1_3_3 1_4_1 1_4_5

Gamma-proteobacteria 2_1_min1b 2_1_min1b 2_1_min1b 2_1_min1b 2_1_min1b 2_1_min1b 2_1_min1b 2_1_min1b 2_1_min1b 2_1_min1b

Haemophilus 2_1_1 2_1_1 2_1_1 2_1_1 2_1_1 2_1_1 2_1_1 2_1_1 2_1_1 2_1_1

Gamma-proteobacteria subgroup 2_3_2 2_3_2 2_3_2 2_3_2 2_3_2 2_3_2 2_3_2 2_3_2 2_3_2 2_3_2



Salmonella 2_7_1 2_7_1 2_7_1 2_7_1 2_7_1 2_7_1 2_7_1 2_7_1 2_7_1 2_7_1

Proteobacteria 3_2 3_2 3_2 3_2 3_2 3_2 3_2 3_2 3_2 3_2

Firmicutes (Lactobacillales, Clostridium perf., Staphylococcus) 4_1 4_1 4_1 4_1 4_1 4_1 4_1 4_1 4_1 4_1

Lactobacillus subgroup 4_2_3 4_2_3 4_2_3 4_2_3 4_2_3 4_2_3 4_2_3 4_2_3 4_2_3 4_2_3

Clostridium ramosum 4_3_1 4_3_1 4_3_1 4_3_1 4_3_1 4_3_4 4_3_4 4_3_4 4_3_4 4_3_4

Enterococcus, Listeria 4_4_2 4_4_2 4_4_2 4_4_2 4_4_2 4_4_2 4_4_2 4_4_2 4_4_2 4_4_2

Streptococcus pyogenes 4_5_2 4_5_2 4_5_2 4_5_2 4_5_2 4_5_2 4_5_2 4_5_2 4_5_2 4_5_2

Streptococcus sanguinis 4_6_1 4_6_1 4_6_1 4_6_1 4_6_1 4_6_2 4_6_2 4_6_2 4_6_2 4_6_2

Listeria 4_7_2 4_7_2 4_7_2 4_7_2 4_7_2 4_7_2 4_7_2 4_7_2 4_7_2 4_7_2

Streptococcus pneumoniae, Enterococcus 4_8_1 4_8_1 4_8_1 4_8_1 4_8_1 4_8_1 4_8_1 4_8_1 4_8_1 4_8_1

Firmicutes (Clostridia, Bacillales, Enterococcus, Lactobacillus) 5_1 5_1 5_1 5_1 5_1 5_1 5_1 5_1 5_1 5_1

Staphylococcus 5_1_2 5_1_5 5_1_2 5_1_5 5_1_2 5_1_5 5_1_5 5_1_5 5_1_5 5_1_5

Clostridium neonatale 5_2_1 5_2_1 5_2_1 5_2_1 5_2_1 5_2_1 5_2_1 5_2_1 5_2_1 5_2_1

Bifidobacterium longum 6_1_4 6_1_4 6_1_4 6_1_4 6_1_4 6_1_4 6_1_4 6_1_4 6_1_4 6_1_4

Actinobacteria 6_2 6_2 6_2 6_2 6_2 6_2 6_2 6_2 6_2 6_2

Bifidobacterium breve 6_2_1 6_2_1 6_2_2 6_2_2 6_2_1 6_2_1 6_2_1 6_2_1 6_2_2 6_2_1

+ UNI01 (16S rRNA universal probe) and hybridization control HYC01

The probes in boldface vary between the probe sets.

Supplementary Table 3. Control probes included on arrays

Probe name Control type Probe sequence spotted on array Reference

UNI01c 16S Universal CCCCCTGCCAGCAGCCGCGGTAATACG (1)

HYC01c Hybridization control CCCCCTTGCCCGAATCGAATGCTAC N/A

NBC01c Non binding control CCCCCAGGAAGGAAGGAAGGAAGGG (2)

NBC02c Non binding control CCCCCCCCTTCCTTCCTTCCTTCCT (2)


Supplementary Table 4. Experimental design for quantification in a complex background ¹, ²

B.brev B.brev B.brev B.brev B.brev B.brev B.brev B.brev B.brev

B.long B.long B.long B.long B.long B.long B.long B.long B.long

B.frag B.frag B.frag B.frag B.frag B.frag B.frag B.frag B.frag

C.rams C.rams C.rams C.rams C.rams C.rams C.rams C.rams C.rams

E.faec E.faec E.faec E.faec E.faec E.faec E.faec E.faec E.faec

S.aure S.aure S.aure S.aure S.aure S.aure S.aure S.aure S.aure

S.pyog S.pyog S.pyog S.pyog S.pyog S.pyog S.pyog S.pyog S.pyog

E.coli E.coli E.coli E.coli E.coli E.coli E.coli E.coli E.coli

¹ Abbreviations: B.brev - Bifidobacterium breve, B.long - Bifidobacterium longum subsp. infantis, B.frag - Bacteroides fragilis, C.rams - Clostridium ramosum, E.faec - Enterococcus faecalis, S.aure - Staphylococcus aureus subsp. aureus, S.pyog - Streptococcus pyogenes, E.coli - Escherichia coli.

² Each column represents the mix in a single sample. The species in regular font were added at a final concentration of 6.25 ng/µl, while the species in bold were added at concentrations of 1, 0.5 and 0 ng/µl, respectively. All experiments were conducted in triplicate, except for the addition of 0 ng/µl, which was conducted in duplicate.

Supplementary Table 5. Quantitative range for selected probes and bacteria

Probe¹ Species Upper² Lower² R² ³

1.1 Bacteroides vulgatus 10 0.01 1.00

1.2.2 Bacteroides vulgatus 10 0.01 1.00

1.3.3 Bacteroides vulgatus 10 0.01 1.00

6.2 Bifidobacterium breve 100 0.01 0.95

6.2.2 Bifidobacterium breve 10 0.01 0.94

2.1.min1b Escherichia coli 10 0.01 0.96

3.2 Escherichia coli 10 0.01 0.90

4.1 Enterococcus faecalis 100 0.01 0.92

4.4.2 Enterococcus faecalis 10 0.01 0.95

5.1 Enterococcus faecalis 10 0.01 0.94

2.1.min1b Salmonella enterica 10 0.01 0.97

2.5.1 Salmonella enterica 10 0.01 0.97

2.7.1 Salmonella enterica 10 0.01 0.97

3.2 Salmonella enterica 10 0.01 0.96

¹ Only target probes are shown.

² Upper and Lower represent the quantitative range of the dilution series in ng. The quantitative range was determined by 10-fold dilution series of PCR products.

³ R² represents the squared regression coefficient.

SUPPLEMENTARY INFORMATION

Definitions and problem description

There are two probe types in the microarray set-up used to describe microbial communities. We define these probes as either labeling probes or capture probes. The labeling probes are short unlabeled oligonucleotides (18 to 30 nt) that become labeled if the bacteria they identify are present in the community. The labeling level is proportional to the amount of target bacteria. The label is a single fluorescent dideoxy cytosine (TAMRA-ddCTP). The capture probes are complementary to their respective labeling probes. The capture probes are immobilized on a microarray, and the labeling probes are hybridized to their complementary capture probes after labeling. The fluorescence level of each labeling probe is determined from the fluorescence of the corresponding capture probe spot. In an experiment we have a set of different labeling probes, and we know which group of bacteria each probe can identify (see Box 1). Among these probes, we want to identify a subset of labeling probes, one for each group of bacteria, that is compatible, i.e. whose members do not hybridize to other capture probes (cross-binding) or to other labeling probes (cross-labeling). It is also necessary that the labeling probes do not form secondary structures, which may result in self-labeling. A simple illustration of the microarray set-up is provided in Supplementary Information Figure 1.

Box 1. Program’s input. The program takes a “.fastagr” file as input, which is an ordinary fasta file with added information in parentheses about which group of bacteria each probe belongs to. An example with 4 groups of bacteria (Bifidobacterium, Ecoli, Staph and Veillonella) is shown below.

>1 (Bifidobacterium)
CCGAAGGCTTGCTCCCAAA
>2 (Bifidobacterium)
GCTTATTCGAAAGGTACACTCACCCCGAAGGG
>3 (Bifidobacterium)
GAGCAAGCGTGAGTAAGTTTA
>4 (Ecoli)
GAGCAAAGGTATTAACTTTACTC
>5 (Ecoli)
CCTGGACAAAGACTGACGCT
>6 (Staph)
ACACATATGTTCTTCCCTAATAA
>7 (Staph)
CCACTGGTGTTCCTCC
>8 (Veillonella)
GTTAAGCCCCGAACTTTTAAGA
>9 (Veillonella)
CGAACTTTTAAGACAGACTGAC
>10 (Veillonella)
GATTGGCAGTTTCCATCCCAT
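The input format in Box 1 can be read with a few lines of code. The following is a minimal sketch, not the program's actual parser: the function name `parse_fastagr` and the exact header pattern (`>ID (Group)`) are assumptions inferred from the example in Box 1.

```python
import re

# Assumed header layout, inferred from Box 1: ">probe_id (group name)"
HEADER = re.compile(r"^>\s*(\S+)\s*\(([^)]*)\)\s*$")

EXAMPLE = """\
>1 (Bifidobacterium)
CCGAAGGCTTGCTCCCAAA
>2 (Bifidobacterium)
GCTTATTCGAAAGGTACACTCACCCCGAAGGG
>3 (Bifidobacterium)
GAGCAAGCGTGAGTAAGTTTA
>4 (Ecoli)
GAGCAAAGGTATTAACTTTACTC
>5 (Ecoli)
CCTGGACAAAGACTGACGCT
>6 (Staph)
ACACATATGTTCTTCCCTAATAA
>7 (Staph)
CCACTGGTGTTCCTCC
>8 (Veillonella)
GTTAAGCCCCGAACTTTTAAGA
>9 (Veillonella)
CGAACTTTTAAGACAGACTGAC
>10 (Veillonella)
GATTGGCAGTTTCCATCCCAT
"""


def parse_fastagr(text):
    """Return {group: [(probe_id, sequence), ...]} in file order."""
    records = []      # each record is (probe_id, group, [sequence chunks])
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            match = HEADER.match(line)
            if match is None:
                raise ValueError(f"header without a group in parentheses: {line!r}")
            current = (match.group(1), match.group(2).strip(), [])
            records.append(current)
        elif current is None:
            raise ValueError("sequence data found before the first header")
        else:
            current[2].append(line.upper())   # sequences may span several lines

    groups = {}
    for probe_id, group, chunks in records:
        groups.setdefault(group, []).append((probe_id, "".join(chunks)))
    return groups


groups = parse_fastagr(EXAMPLE)
for name, probes in groups.items():
    print(name, [probe_id for probe_id, _ in probes])
```

Grouping the probes by name at parse time makes the later step of picking one probe per group a direct iteration over the dictionary values.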

Determination of score for hybridization level

DNA hybridizes by forming hydrogen bonds between its nucleotides: adenine (A) forms two hydrogen bonds with thymine (T), and guanine (G) forms three hydrogen bonds with cytosine (C). Thus, when matching two arbitrary probes, we give two points for each A-T pair and three points for each G-C pair. G and A are purines, while C and T are pyrimidines. Other purine-pyrimidine pairs (A-C or G-T) can also bind through a single hydrogen bond, and such a pair is given one point. Only hybridizing stretches that contain more than two consecutive nucleotide pairings are taken into consideration. The score of each stretch is squared to give longer hybridizations a higher weight. In Supplementary Information Figure 2 we have compared probe nr. 2 and probe nr. 8 from the .fastagr file in Box 1. The final score between these two probes is the sum of all hybridizing scores. The lower the score, the less these probes will interfere with each other in an experiment. Pairwise scores are calculated for each pair of probes from different groups.
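The scoring rules above can be sketched in code. This is an illustration under stated assumptions, not the program's implementation: the text does not say how the two probes are aligned, so the sketch assumes that one probe is slid along the reverse of the other (antiparallel, ungapped) and that the squared scores of all qualifying stretches over all offsets are summed. The function names are hypothetical.

```python
# Points per base pair, following the rules in the text.
PAIR_POINTS = {
    frozenset("AT"): 2,   # two hydrogen bonds
    frozenset("GC"): 3,   # three hydrogen bonds
    frozenset("AC"): 1,   # other purine-pyrimidine pair
    frozenset("GT"): 1,   # other purine-pyrimidine pair
}


def pair_points(x, y):
    """Points for one base pairing; purine-purine and pyrimidine-pyrimidine give 0."""
    return PAIR_POINTS.get(frozenset((x, y)), 0)


def stretch_score(points, min_run=3):
    """Sum of squared scores of runs with at least min_run consecutive pairings."""
    total = 0
    run_sum = run_len = 0
    for p in list(points) + [0]:          # trailing 0 closes the last run
        if p > 0:
            run_sum += p
            run_len += 1
        else:
            if run_len >= min_run:        # "more than two consecutive" pairings
                total += run_sum ** 2
            run_sum = run_len = 0
    return total


def hybridization_score(a, b, min_run=3):
    """Assumed alignment model: slide a against reversed b, no gaps."""
    b_rev = b[::-1]
    total = 0
    for offset in range(-(len(b_rev) - 1), len(a)):
        start = max(0, offset)
        stop = min(len(a), offset + len(b_rev))
        points = [pair_points(a[i], b_rev[i - offset]) for i in range(start, stop)]
        total += stretch_score(points, min_run)
    return total


probe_2 = "GCTTATTCGAAAGGTACACTCACCCCGAAGGG"
probe_8 = "GTTAAGCCCCGAACTTTTAAGA"
print(hybridization_score(probe_2, probe_8))
```

Passing the same probe twice, as in `hybridization_score(p, p)`, gives a self-hybridization score under the same assumptions.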

Algorithm for determining probe compatibility

To find the optimal combination of probes, we need to calculate the score for each combination. The combination with the minimum score will be the best one to use in the experiment. Using the data from Box 1, we have to find all possible combinations that take one probe from each of the 4 groups. Supplementary Information Figure 3 shows two different combinations of probes. The red combination consists of probes nr. 1, 4, 6, and 8, while the blue includes probes nr. 2, 5, 6, and 10. The score for one combination is the sum of the pairwise scores between each pair of probes in the combination, i.e.

Red combination:
1-4 + 1-6 + 1-8 + 4-6 + 4-8 + 6-8 = score for combination 1-4-6-8

Blue combination:
2-5 + 2-6 + 2-10 + 5-6 + 5-10 + 6-10 = score for combination 2-5-6-10

In addition to the pairwise scores, we need to add the self-hybridizing scores: 1-1, 4-4, 6-6, and 8-8 for the red combination and 2-2, 5-5, 6-6, and 10-10 for the blue combination. Optionally, the scores from the cross-binding, the cross-labeling, or both can be used in the calculations. It is also possible to use empirical data determined from real hybridization experiments.
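The score of a single combination, as described above, is the sum of all pairwise scores plus the self-hybridizing scores. A minimal sketch follows; `combination_score` is a hypothetical name, and `toy_pair_score` is a made-up stand-in for the real pairwise hybridization score, used only to make the example runnable.

```python
from itertools import combinations


def combination_score(probes, pair_score):
    """Sum of pairwise scores plus self-hybridizing scores for one combination."""
    cross = sum(pair_score(a, b) for a, b in combinations(probes, 2))
    self_hyb = sum(pair_score(p, p) for p in probes)
    return cross + self_hyb


def toy_pair_score(a, b):
    # Placeholder only: a real run would use the hybridization score
    # between the sequences of probes a and b.
    return a + b


red = (1, 4, 6, 8)
blue = (2, 5, 6, 10)
print(combination_score(red, toy_pair_score))
print(combination_score(blue, toy_pair_score))
```

For a combination of 4 probes this adds 6 pairwise terms and 4 self terms, matching the ten terms listed for the red combination.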

When all possible combinations have been calculated and stored, we get a score matrix containing all of these scores. When we sort these scores, we find which combination gives the minimum total score. This combination is then likely to give the best results in experiments.

In our simple case, we had 4 groups of probes, and each group contained either 2 or 3 probes, giving 3*2*2*3 = 36 combinations (the product of the number of probes in each group). In general, the number of combinations is calculated as follows:

$c = \prod_{i=1}^{n} x_i$ (Eq 1)

where c is the number of all possible combinations, $x_i$ is the number of probes in group i, and n is the number of groups in the dataset.
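Eq 1 and the exhaustive search can be checked on the Box 1 example. The sketch below counts the combinations, enumerates them, and picks the one with the lowest total score. The scoring function `toy_pair_score` is a made-up placeholder, so the combination it selects has no biological meaning.

```python
from itertools import combinations_with_replacement, product
from math import prod

# Probe numbers per group, taken from Box 1.
groups = {
    "Bifidobacterium": [1, 2, 3],
    "Ecoli": [4, 5],
    "Staph": [6, 7],
    "Veillonella": [8, 9, 10],
}

# Eq 1: c is the product of the group sizes.
count = prod(len(probes) for probes in groups.values())

# One probe from each group, in every possible way.
combos = list(product(*groups.values()))


def toy_pair_score(a, b):
    # Placeholder for the real pairwise hybridization score.
    return (a * b) % 7


def total_score(combo):
    # Pairs (x, y) with x before or equal to y: pairwise plus self scores.
    return sum(toy_pair_score(a, b)
               for a, b in combinations_with_replacement(combo, 2))


ranked = sorted(combos, key=total_score)
best = ranked[0]
print(count, len(combos), best, total_score(best))
```

Sorting all combinations is affordable here because there are only 36 of them; the cost analysis in the text explains why this does not scale.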

The running time of this algorithm can be found by analyzing the cost of each step:
1. Find the scores for all possible combinations
2. Choose the combination with the minimum score

The above reasoning leads to the following calculations:
1. Given the formula for the number of possible combinations (Eq 1), we want to know the worst-case cost of the algorithm. We therefore simplify the formula by assuming that the number of probes in each group equals the largest number of probes (m) found in any group in the dataset. The number of possible combinations can then be expressed as $c = m^n$.
2. The score of each combination is taken as the sum of n pairwise scores, which requires n additions.
3. Thus, computing the scores for all possible combinations requires $m^n \cdot n$ elementary operations.
4. Choosing the combination with the minimum score requires $m^n$ comparisons.
5. In summary, $m^n \cdot n + m^n = (n+1) \, m^n$ elementary operations are needed for the algorithm.
6. In Big-O notation, this is $O(n \cdot m^n)$, which grows exponentially with the number of groups n.

In a real situation we can have up to 100 groups of probes, each containing 2-6 probes. Suppose, for example, that we have 20 groups and each group contains only 3 probes. Then the number of elementary operations will be $(20+1) \cdot 3^{20}$ = 73,222,472,421.

Assuming that each elementary operation takes one microsecond ($10^{-6}$ s) on a fast computer, we will need 73,222,472,421 / ($10^6 \cdot 60 \cdot 60$) ≈ 20 hours to calculate the result, and it is highly likely that the program will crash during the calculations because it runs out of memory.
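The arithmetic in this example is easy to reproduce. The short sketch below evaluates the operation count (n+1)·m^n for 20 groups of 3 probes and converts it to hours at one million operations per second. The function name `elementary_operations` is hypothetical.

```python
def elementary_operations(n_groups, probes_per_group):
    """Worst-case operation count (n + 1) * m**n from the cost analysis."""
    return (n_groups + 1) * probes_per_group ** n_groups


ops = elementary_operations(20, 3)

# One operation per microsecond means 10**6 operations per second.
hours = ops / (10**6 * 60 * 60)

print(f"{ops:,} operations, about {hours:.1f} hours")
```

Adding a single group multiplies the count by roughly m, which is why the exhaustive search becomes impractical long before the 100 groups mentioned in the text.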

The algorithm for searching the best probe sets

The variable ScoreTable from the code example in Box 2 contains the scores for all possible combinations (scoreTable[4] contains the full score for one specific combination). Information is stored in the SumScore table for each for loop, saving us from redundant calculations. Supplementary Information Figure 4 shows what is stored after each SumScore iteration. When we have calculated the score for combination 1-4-6-8, we only need to go one step back to rapidly find the score for combination 1-4-6-9, without having to recalculate the sum of scores for 1-1, 1-4, 4-4, 1-6, 4-6 and 6-6 (this sum is already stored in SumScore[3]). The same applies when we want to calculate the full score of combination 1-4-6-10. The number in SumScore[4] (the full score for one combination) changes each time we choose another combination, so we need to save this full score before it is overwritten. For that purpose, a so-called hash table can be used, where the full score is the key and the indices of all probes in the combination are the object (a hash table is like a dictionary: keys are the keywords, and objects are these keywords translated into another language). When the hash table is complete, the program sorts it with respect to the scores (keys) and finds which combination (object) gives the best score.
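The reuse of partial sums described above can be sketched as a depth-first search over the groups. This is an illustration, not the code from Box 2: the names `search_probe_sets` and `sum_score` are assumptions, and `toy_pair_score` is a made-up placeholder score. One deliberate difference from the text: the dictionary is keyed by combination rather than by score, because two combinations can share the same score and would otherwise overwrite each other.

```python
from itertools import combinations_with_replacement, product


def search_probe_sets(groups, pair_score):
    """Return {combination: full score}, reusing partial sums per level."""
    n = len(groups)
    sum_score = [0] * (n + 1)   # sum_score[k]: score of the first k chosen probes
    chosen = []
    results = {}

    def extend(level):
        if level == n:
            # Save the full score before the next combination overwrites it.
            results[tuple(chosen)] = sum_score[n]
            return
        for probe in groups[level]:
            # Only the terms involving the new probe are added; everything
            # for the earlier probes is already in sum_score[level].
            added = pair_score(probe, probe) + sum(pair_score(p, probe) for p in chosen)
            sum_score[level + 1] = sum_score[level] + added
            chosen.append(probe)
            extend(level + 1)
            chosen.pop()

    extend(0)
    return results


def toy_pair_score(a, b):
    # Placeholder for the real pairwise hybridization score.
    return (a * b + a + b) % 11


groups = [[1, 2, 3], [4, 5], [6, 7], [8, 9, 10]]   # probe numbers from Box 1
results = search_probe_sets(groups, toy_pair_score)
best = min(results, key=results.get)
print(best, results[best])
```

Moving from combination 1-4-6-8 to 1-4-6-9 in this sketch recomputes only the four terms that involve probe 9, which is the saving the text describes for SumScore[3].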

SUPPLEMENTARY INFORMATION REFERENCES

1. Lane, D. J. 1991. Nucleic acid techniques in bacterial systematics. John Wiley and Sons, New York.

2. Sanguin, H., A. Herrera, C. Oger-Desfeux, A. Dechesne, P. Simonet, E. Navarro, T. M. Vogel, Y. Moënne-Loccoz, X. Nesme, and G. L. Grundmann. 2006. Development and validation of a prototype 16S rRNA-based taxonomic microarray for Alphaproteobacteria. Environmental Microbiology 8:289-307.

SUPPLEMENTARY INFORMATION FIGURE LEGENDS

Supplementary Information Figure 1. The role of labeling and capture probes. Possible sources of error are shown as red dashed arrows.

Supplementary Information Figure 2. Pairwise comparison of probes 8 and 2. Blue lines show the areas where hybridization might occur.

Supplementary Information Figure 3. Example with scores for two probe combinations (red and blue arrows).

Supplementary Information Figure 4. Visualizing a part of the probe set searching algorithm.

SUPPLEMENTARY INFORMATION FIGURES

Supplementary Information Figure 1. (Schematic with the labels: Capture probes immobilized on a microarray, Cross-labeling, Self-labeling, Cross-binding.)

Legend:
5’ to 3’ direction of DNA strands
A part of DNA strand from bacterium 1
Labeling probe for bacterium 1
Labeling probe for bacterium 2
A part of DNA strand from bacterium 2
Dideoxy cytosine labeled with the reporter
Capture probe for labeling probe for bacterium 1
Capture probe for labeling probe for bacterium 2

Staph: Probe 6, Probe 7
Bifidobact.: Probe 1, Probe 2, Probe 3
Veillonella: Probe 8, Probe 9, Probe 10
Ecoli: Probe 4, Probe 5

Score 1-4, Score 1-6, Score 1-8, Score 4-6, Score 8-4, Score 6-8
Score 2-5, Score 2-6, Score 2-10, Score 5-6, Score 5-10, Score 6-10

Score for this combination of probes: 1 from group 1, 4 from group 2, 6 from group 3 and 8 from group 4, based on pairwise combinations: 1-1, 1-4, 4-4, 1-6, 4-6, 6-6, 1-8, 4-8, 6-8, 8-8

SumScore[0] = 0
SumScore[1] = SumScore[0] + 1-1
SumScore[2] = SumScore[1] + 1-4 + 4-4
SumScore[3] = SumScore[2] + 1-6 + 4-6 + 6-6
SumScore[4] = SumScore[3] + 1-8 + 4-8 + 6-8 + 8-8

