2 SUPPLEMENTARY FIGURE LEGENDS · 7/25/2011 · 13 Supplementary Figure 2. Evaluation of the 16S...
SUPPLEMENTARY FIGURES
SUPPLEMENTARY FIGURE LEGENDS
Supplementary Figure 1. A) The microbial DNA sequences are arranged in an absolute coordinate space based on their multimer frequencies. The user-defined target and non-target groups are shown in green and red, respectively. B) The program suggests appropriate probes based on the parameters defined by the user in the preceding steps. Each probe’s hybridization area on the target bacteria is shown, and the probe sequence is colored according to its direction (sense/antisense = green/red). C) The colors in the cross-labeling graph indicate the pairwise compatibility of the relevant probes. Blue between two probes means that they are compatible and will not interfere with each other. Red means that there is a high probability that they will hybridize to each other’s target bacteria.

Supplementary Figure 2. Evaluation of the 16S rRNA gene PCR reaction reproducibility. Three replicate 16S rRNA gene PCR reactions were performed on one of the infant fecal samples. To examine the signal from specific amplified target bacteria in the sample, probes 6_1_4 and 5_1_2 were chosen. The mean signal for the three individual PCR products and the mean signal for a pool of the three PCR reactions, measured in triplicate, are presented. The standard deviation of the triplicates and of the three individual PCR reactions is indicated for each mean value. The mean signals for the three individual PCR products and for the pool of the three PCR products were very similar, with the same or lower standard deviation compared to the pooled samples.

Supplementary Figure 3. Pictures of two of the arrays illustrating the different bacterial profiles of two individuals. The difference in signal for probe 6_1_4 (specific for Bifidobacterium longum) is indicated with white circles. Array A shows the profile of an IgE-sensitized one-year-old individual, while array B shows the profile of an IgE non-sensitized one-year-old individual.

Supplementary Figure 4. Evaluation of the quantification of specific targets in a constant background of non-targets. In all the experiments the total amount of template was kept constant at 100 ng in the labeling reaction, while the targets for the illustrated probes were diluted. Error bars represent standard deviations.

Supplementary Figure 1 (panels A, B and C)
Supplementary Figure 2
Supplementary Figure 3
Supplementary Figure 4

SUPPLEMENTARY TABLES
Supplementary Table 1. Probe reproducibility based on 43 sample replicates.
Probe ID    Mean % variation    R² [1]
1_1 20 % 0.98
1_3_3 31 % 0.97
2_1_1 31 % 0.91
2_1_min1b 22 % 0.95
2_3_2 28 % 0.86
2_4_1 16 % 0.94
2_5_1 19 % 0.88
2_7_1 16 % 0.98
3_2 26 % 0.91
4_1 18 % 0.94
4_3_1 20 % 0.96
4_4_2 37 % 0.95
4_8_1 17 % 0.98
5_1 16 % 0.95
5_1_2 20 % 0.97
6_1_4 27 % 0.74
6_2 15 % 0.89
6_2_2 22 % 0.85
Univ 16S 12 % 0.77

[1] R² is the squared correlation coefficient (coefficient of determination).

Supplementary Table 2. Ten probe set suggestions.
Theoretical bacterial target group Probe set 1 Probe set 2 Probe set 3 Probe set 4 Probe set 5 Probe set 6 Probe set 7 Probe set 8 Probe set 9 Probe set 10
Bacteroides 1_1 1_1 1_1 1_1 1_1 1_1 1_1 1_1 1_1 1_1
Parabacteroides 1_1_3 1_1_3 1_1_3 1_1_3 1_1_3 1_1_5 1_1_5 1_1_5 1_1_5 1_1_5
Bacteroides (dorei, fragilis, thetaiotaomicron, vulgatus) 1_2_2 1_2_2 1_2_2 1_2_2 1_2_2 1_2_2 1_2_2 1_2_2 1_2_2 1_2_2
Bacteroides (dorei, fragilis, thetaiotaomicron, vulgatus) 1_3_3 1_3_3 1_3_3 1_3_3 1_3_3 1_4_1 1_4_5 1_3_3 1_4_1 1_4_5
Gamma-proteobacteria 2_1_min1b 2_1_min1b 2_1_min1b 2_1_min1b 2_1_min1b 2_1_min1b 2_1_min1b 2_1_min1b 2_1_min1b 2_1_min1b
Haemophilus 2_1_1 2_1_1 2_1_1 2_1_1 2_1_1 2_1_1 2_1_1 2_1_1 2_1_1 2_1_1
Gamma-proteobacteria subgroup 2_3_2 2_3_2 2_3_2 2_3_2 2_3_2 2_3_2 2_3_2 2_3_2 2_3_2 2_3_2
Gamma-proteobacteria subgroup 2_4_1 2_4_1 2_4_1 2_4_1 2_4_1 2_4_1 2_4_1 2_4_1 2_4_1 2_4_1
Gamma-proteobacteria subgroup 2_5_1 2_5_1 2_5_1 2_5_1 2_5_1 2_5_1 2_5_1 2_5_1 2_5_1 2_5_1
Salmonella 2_7_1 2_7_1 2_7_1 2_7_1 2_7_1 2_7_1 2_7_1 2_7_1 2_7_1 2_7_1
Proteobacteria 3_2 3_2 3_2 3_2 3_2 3_2 3_2 3_2 3_2 3_2
Firmicutes (Lactobacillales, Clostridium perf., Staphylococcus) 4_1 4_1 4_1 4_1 4_1 4_1 4_1 4_1 4_1 4_1
Lactobacillus subgroup 4_2_3 4_2_3 4_2_3 4_2_3 4_2_3 4_2_3 4_2_3 4_2_3 4_2_3 4_2_3
Clostridium ramosum 4_3_1 4_3_1 4_3_1 4_3_1 4_3_1 4_3_4 4_3_4 4_3_4 4_3_4 4_3_4
Enterococcus, Listeria 4_4_2 4_4_2 4_4_2 4_4_2 4_4_2 4_4_2 4_4_2 4_4_2 4_4_2 4_4_2
Streptococcus pyogenes 4_5_2 4_5_2 4_5_2 4_5_2 4_5_2 4_5_2 4_5_2 4_5_2 4_5_2 4_5_2
Streptococcus sanguinis 4_6_1 4_6_1 4_6_1 4_6_1 4_6_1 4_6_2 4_6_2 4_6_2 4_6_2 4_6_2
Listeria 4_7_2 4_7_2 4_7_2 4_7_2 4_7_2 4_7_2 4_7_2 4_7_2 4_7_2 4_7_2
Streptococcus pneumoniae, Enterococcus 4_8_1 4_8_1 4_8_1 4_8_1 4_8_1 4_8_1 4_8_1 4_8_1 4_8_1 4_8_1
Firmicutes (Clostridia, Bacillales, Enterococcus, Lactobacillus) 5_1 5_1 5_1 5_1 5_1 5_1 5_1 5_1 5_1 5_1
Staphylococcus 5_1_2 5_1_5 5_1_2 5_1_5 5_1_2 5_1_5 5_1_5 5_1_5 5_1_5 5_1_5
Clostridium neonatale 5_2_1 5_2_1 5_2_1 5_2_1 5_2_1 5_2_1 5_2_1 5_2_1 5_2_1 5_2_1
Bifidobacterium longum 6_1_4 6_1_4 6_1_4 6_1_4 6_1_4 6_1_4 6_1_4 6_1_4 6_1_4 6_1_4
Actinobacteria 6_2 6_2 6_2 6_2 6_2 6_2 6_2 6_2 6_2 6_2
Bifidobacterium breve 6_2_1 6_2_1 6_2_2 6_2_2 6_2_1 6_2_1 6_2_1 6_2_1 6_2_2 6_2_1
+ UNI01 (16S rRNA universal probe) and hybridization control HYC01
The probes in boldface vary between the probe sets.

Supplementary Table 3. Control probes included on arrays
Probe name    Control type             Probe sequence spotted on array    Reference
UNI01c        16S Universal            CCCCCTGCCAGCAGCCGCGGTAATACG        (1)
HYC01c        Hybridization control    CCCCCTTGCCCGAATCGAATGCTAC          N/A
NBC01c        Non binding control      CCCCCAGGAAGGAAGGAAGGAAGGG          (2)
NBC02c        Non binding control      CCCCCCCCTTCCTTCCTTCCTTCCT          (2)

Supplementary Table 4. Experimental design for quantification in a complex background [1, 2]
B.brev B.brev B.brev B.brev B.brev B.brev B.brev B.brev B.brev
B.long B.long B.long B.long B.long B.long B.long B.long B.long
B.frag B.frag B.frag B.frag B.frag B.frag B.frag B.frag B.frag
C.rams C.rams C.rams C.rams C.rams C.rams C.rams C.rams C.rams
E.faec E.faec E.faec E.faec E.faec E.faec E.faec E.faec E.faec
S.aure S.aure S.aure S.aure S.aure S.aure S.aure S.aure S.aure
S.pyog S.pyog S.pyog S.pyog S.pyog S.pyog S.pyog S.pyog S.pyog
E.coli E.coli E.coli E.coli E.coli E.coli E.coli E.coli E.coli
[1] Abbreviations: B.brev - Bifidobacterium breve, B.long - Bifidobacterium longum subsp. infantis, B.frag - Bacteroides fragilis, C.rams - Clostridium ramosum, E.faec - Enterococcus faecalis, S.aure - Staphylococcus aureus subsp. aureus, S.pyog - Streptococcus pyogenes, E.coli - Escherichia coli
[2] Each column represents the mix in a single sample. The species in regular font were added at a final concentration of 6.25 ng/µl, while the species in bold were added at concentrations of 1, 0.5 and 0 ng/µl, respectively. All experiments were conducted in triplicate, except for the addition of 0 ng/µl, which was conducted in duplicate.

Supplementary Table 5. Quantitative range for selected probes and bacteria
Probe [1]    Species    Upper [2]    Lower [2]    R² [3]
1.1 Bacteroides vulgatus 10 0.01 1.00
1.2.2 Bacteroides vulgatus 10 0.01 1.00
1.3.3 Bacteroides vulgatus 10 0.01 1.00
6.2 Bifidobacterium breve 100 0.01 0.95
6.2.2 Bifidobacterium breve 10 0.01 0.94
2.1.min1b Escherichia coli 10 0.01 0.96
3.2 Escherichia coli 10 0.01 0.90
4.1 Enterococcus faecalis 100 0.01 0.92
4.4.2 Enterococcus faecalis 10 0.01 0.95
5.1 Enterococcus faecalis 10 0.01 0.94
2.1.min1b Salmonella enterica 10 0.01 0.97
2.5.1 Salmonella enterica 10 0.01 0.97
2.7.1 Salmonella enterica 10 0.01 0.97
3.2 Salmonella enterica 10 0.01 0.96
[1] Only target probes are shown.
[2] Upper and Lower represent the quantitative range of the dilution series in ng. The quantitative range was determined by 10-fold dilution series of PCR products.
[3] R² is the squared correlation coefficient (coefficient of determination).

SUPPLEMENTARY INFORMATION
Definitions and problem description
Two probe types are used in the microarray set-up to describe microbial communities. We define these probes as either labeling probes or capture probes. The labeling probes are short unlabeled oligonucleotides (18 to 30 nt) which become labeled if the bacteria they identify are present in the community. The labeling level is proportional to the amount of target bacteria. The label is a single fluorescent dideoxy cytosine (TAMRA-ddCTP). The capture probes are complementary to the respective labeling probes. The capture probes are immobilized on a microarray, and the labeling probes are hybridized to their complementary capture probes after labeling. The level of fluorescence for each labeling probe is determined from the fluorescence of the corresponding capture probe spot. In an experiment we have a set of different labeling probes, and we know which group of bacteria each probe can identify (see Box 1). Among these probes, we want to identify a subset of labeling probes, one for each group of bacteria, that is compatible, i.e. whose members do not hybridize to other capture probes (cross-binding) or to other labeling probes (cross-labeling). It is also necessary that the labeling probes do not form secondary structures, which may result in self-labeling. A simple illustration of the microarray set-up is provided in Supplementary Information Figure 1.

Box 1. Program’s input. The program takes a “.fastagr” file as input, which is an ordinary FASTA file with added information in parentheses stating which group of bacteria each probe belongs to. An example with four groups of bacteria (Bifidobacterium, Ecoli, Staph and Veillonella) is shown below.
>1 (Bifidobacterium)
CCGAAGGCTTGCTCCCAAA
>2 (Bifidobacterium)
GCTTATTCGAAAGGTACACTCACCCCGAAGGG
>3 (Bifidobacterium)
GAGCAAGCGTGAGTAAGTTTA
>4 (Ecoli)
GAGCAAAGGTATTAACTTTACTC
>5 (Ecoli)
CCTGGACAAAGACTGACGCT
>6 (Staph)
ACACATATGTTCTTCCCTAATAA
>7 (Staph)
CCACTGGTGTTCCTCC
>8 (Veillonella)
GTTAAGCCCCGAACTTTTAAGA
>9 (Veillonella)
CGAACTTTTAAGACAGACTGAC
>10 (Veillonella)
GATTGGCAGTTTCCATCCCAT

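The input format described in Box 1 can be read with a few lines of code. The following is a minimal sketch, not the program's actual parser: the function name parse_fastagr, the header pattern, and the assumption that each record has exactly one sequence line are all illustrative choices.

```python
import re
from collections import defaultdict

# Assumed header layout: ">" + probe id + group name in parentheses.
HEADER = re.compile(r"^>\s*(\S+)\s*\(([^)]+)\)\s*$")

def parse_fastagr(text):
    """Return {group: [(probe_id, sequence), ...]} from .fastagr text."""
    groups = defaultdict(list)
    probe_id = group = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            probe_id, group = match.groups()
        elif probe_id is not None:
            groups[group].append((probe_id, line.upper()))
            probe_id = None  # assumption: one sequence line per header
        else:
            raise ValueError(f"sequence line without header: {line!r}")
    return dict(groups)

EXAMPLE = """\
>1 (Bifidobacterium)
CCGAAGGCTTGCTCCCAAA
>2 (Bifidobacterium)
GCTTATTCGAAAGGTACACTCACCCCGAAGGG
>3 (Bifidobacterium)
GAGCAAGCGTGAGTAAGTTTA
>4 (Ecoli)
GAGCAAAGGTATTAACTTTACTC
>5 (Ecoli)
CCTGGACAAAGACTGACGCT
>6 (Staph)
ACACATATGTTCTTCCCTAATAA
>7 (Staph)
CCACTGGTGTTCCTCC
>8 (Veillonella)
GTTAAGCCCCGAACTTTTAAGA
>9 (Veillonella)
CGAACTTTTAAGACAGACTGAC
>10 (Veillonella)
GATTGGCAGTTTCCATCCCAT
"""

groups = parse_fastagr(EXAMPLE)
```

Grouping the probes by the name in parentheses gives the "one probe per group" structure that the later combination search relies on.
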
Determination of score for hybridization level
DNA hybridizes by forming hydrogen bonds between its nucleotides: adenine (A) forms two hydrogen bonds with thymine (T), and guanine (G) forms three hydrogen bonds with cytosine (C). Thus, when matching two arbitrary probes, we give two points for each A-T binding and three points for each G-C binding. G and A are purines, while C and T are pyrimidines. Purines and pyrimidines can also bind with a single hydrogen bond, so any other purine (A or G) binding to a pyrimidine (C or T) is given one point. Only hybridizing stretches that contain more than two consecutive nucleotide bindings are taken into consideration. The score of each stretch is squared to give longer hybridizations a higher weight. In Supplementary Information Figure 2 we have compared probe nr. 2 and probe nr. 8 from the .fastagr file in Box 1. The final score between these two probes is the sum of all hybridizing scores. The lower the score, the less these probes will interfere with each other in an experiment. Pairwise scores are calculated for each pair of probes from different groups.

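The scoring rules above can be sketched in code. The text does not specify how the two probes are aligned, so this sketch makes several assumptions that are not from the source: the second probe is reversed (antiparallel pairing), it is slid across the first probe over every overlapping offset, a "stretch" is a run of consecutive positions with a non-zero pairing score, and the squared stretch scores are summed over all offsets. The names PAIR_POINTS, pair_points and hybridization_score are hypothetical.

```python
# Points per pairing as given in the text; A-C and G-T are the assumed
# reading of "purine binding to pyrimidine with a single hydrogen bond".
PAIR_POINTS = {
    frozenset("AT"): 2,
    frozenset("GC"): 3,
    frozenset("AC"): 1,
    frozenset("GT"): 1,
}

def pair_points(a, b):
    """Points for one base pairing; 0 for purine-purine or pyrimidine-pyrimidine."""
    return PAIR_POINTS.get(frozenset((a, b)), 0)

def hybridization_score(probe_a, probe_b, min_run=3):
    """Sum of squared stretch scores over all alignments (assumed scheme)."""
    b = probe_b[::-1]  # assumption: strands pair antiparallel
    total = 0
    for offset in range(-(len(b) - 1), len(probe_a)):
        run_points, run_len = 0, 0
        for j, base_b in enumerate(b):
            i = offset + j
            pts = pair_points(probe_a[i], base_b) if 0 <= i < len(probe_a) else 0
            if pts:
                run_points += pts
                run_len += 1
            else:
                # A stretch ends here; count it only if it has > 2 bindings.
                if run_len >= min_run:
                    total += run_points ** 2
                run_points, run_len = 0, 0
        if run_len >= min_run:
            total += run_points ** 2
    return total

probe_2 = "GCTTATTCGAAAGGTACACTCACCCCGAAGGG"
probe_8 = "GTTAAGCCCCGAACTTTTAAGA"
score_2_8 = hybridization_score(probe_2, probe_8)
```

Under these assumptions, three consecutive A-T pairings score (2+2+2)^2 = 36, while two consecutive pairings are ignored. A self-hybridizing score can be obtained by passing the same probe twice.
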
Algorithm for determining probe compatibility
In order to find the optimal combination of probes, we need to calculate a score for each combination. The combination with the minimum score will be the best one to use in the experiment. If we use the data from Box 1, we have to find all possible combinations with one probe from each of the 4 groups. Supplementary Information Figure 3 shows two different combinations of probes. The red combination consists of probes nr. 1, 4, 6, and 8, while the blue includes probes nr. 2, 5, 6, and 10. The score for one combination is the sum of all pairwise scores between each pair of the probes in the combination, i.e.
Red combination:
1-4 + 1-6 + 1-8 + 4-6 + 4-8 + 6-8 = score for combination 1-4-6-8
Blue combination:
2-5 + 2-6 + 2-10 + 5-6 + 5-10 + 6-10 = score for combination 2-5-6-10

In addition to the pairwise scores, we need to add the self-hybridizing scores: 1-1, 4-4, 6-6, and 8-8 for the red combination and 2-2, 5-5, 6-6, and 10-10 for the blue combination. Optionally, the scores from the cross-binding, the cross-labeling, or both can be used in the calculations. It is also possible to use empirical data determined from real hybridization experiments.

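The combination score described above (all pairwise scores plus the self-hybridizing scores) can be illustrated with a small lookup table. This is only a sketch: the function name combination_score is hypothetical and the score values are made up for the example, standing in for calculated or empirical scores.

```python
from itertools import combinations_with_replacement

def combination_score(combo, scores):
    """Sum the stored scores of all unordered probe pairs, self-pairs included."""
    pairs = combinations_with_replacement(sorted(combo), 2)
    return sum(scores[pair] for pair in pairs)

# Hypothetical score table for the red combination; the values are invented.
red = (1, 4, 6, 8)
scores = {pair: 1 for pair in combinations_with_replacement(red, 2)}
scores[(4, 8)] = 25  # pretend probes 4 and 8 interfere strongly

red_score = combination_score(red, scores)
```

For four probes the table holds ten entries: the six pairwise scores 1-4, 1-6, 1-8, 4-6, 4-8, 6-8 and the four self-scores 1-1, 4-4, 6-6, 8-8.
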
When all possible combinations have been calculated and stored, we get a score matrix containing all of these scores. When we sort these scores, we find which combination gives the minimum total score. This combination is then likely to give us the best results in experiments.

In our simple case, we had 4 groups of probes, and each group had either 2 or 3 probes (3*2*2*3 = 36 combinations, the product of the number of probes in each group).
In general, the number of combinations is calculated as follows:

c = \prod_{i=1}^{n} x_i (Eq 1)

where c is the number of all possible combinations, x_i is the number of probes in group i, and n is the number of groups in the dataset.

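As a rough illustration of Eq 1 and of the exhaustive search, the sketch below counts and enumerates the combinations for the Box 1 example. The scoring function toy_score is a made-up placeholder (it is not the hybridization score), and the function names are hypothetical.

```python
from itertools import combinations_with_replacement, product
from math import prod

# Probe numbers per group, taken from the Box 1 example.
groups = {
    "Bifidobacterium": [1, 2, 3],
    "Ecoli": [4, 5],
    "Staph": [6, 7],
    "Veillonella": [8, 9, 10],
}

def count_combinations(groups):
    """Eq 1: product of the number of probes in each group."""
    return prod(len(probes) for probes in groups.values())

def combination_total(combo, pair_score):
    """Pairwise plus self scores for one combination."""
    pairs = combinations_with_replacement(combo, 2)
    return sum(pair_score(a, b) for a, b in pairs)

def best_combination(groups, pair_score):
    """Exhaustive search: one probe from each group, minimum total score."""
    candidates = product(*groups.values())
    return min(candidates, key=lambda combo: combination_total(combo, pair_score))

def toy_score(a, b):
    # Placeholder only: stands in for a real pairwise hybridization score.
    return abs(a - b)

n_combinations = count_combinations(groups)
best = best_combination(groups, toy_score)
```

With three, two, two and three probes per group, the count is 36, matching the number of tuples produced by the enumeration.
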
The running time for this algorithm can be found by analyzing the cost of each step:
1. Find scores for all possible combinations
2. Choose the combination with minimum score

The above reasoning leads us to the following calculations:
1. Given the formula for the number of possible combinations (Eq 1), we want to know the worst-case cost of the algorithm. We therefore simplify this formula by assuming that the number of probes in each group equals the largest number of probes (m) found in any group of the dataset. The number of possible combinations can then be expressed as c = m^n
2. The score of each combination is counted as a sum of n score terms, which requires n additions
3. Thus computing the score for all possible combinations requires (m^n)*n elementary operations
4. Choosing the combination with minimum score requires m^n comparisons
5. Summarizing, (m^n)*n + (m^n) = (n+1)*(m^n) elementary operations are needed for the algorithm.
6. Writing this in terms of Big-O notation, we get O(n*m^n), which is exponential in the number of groups n

In a real situation we can get up to 100 groups of probes, each containing 2-6 probes. Say, for example, that we have 20 groups and each group contains only 3 probes. Then the number of elementary operations will be (20+1)*(3^20) = 73,222,472,421.

Assuming that each elementary operation takes one microsecond (10^-6 s) on a fast computer, we will need 73,222,472,421/(10^6*60*60) = approx. 20 hours to calculate the result, and it is highly likely that the program will crash during the calculations due to out-of-memory issues.

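The arithmetic behind this estimate can be checked directly. The variable names below are illustrative, and the cost of 10^-6 seconds per elementary operation is the assumption stated in the text.

```python
n_groups = 20          # number of probe groups in the example
probes_per_group = 3   # probes in each group

# (n + 1) * m^n elementary operations, following the operation count above.
operations = (n_groups + 1) * probes_per_group ** n_groups

seconds = operations * 1e-6   # assumed 10^-6 s per elementary operation
hours = seconds / 3600

print(operations)        # 73222472421
print(round(hours, 1))   # 20.3
```

Adding a single extra group of three probes multiplies the number of combinations by three, so the running time of the exhaustive approach grows very quickly with the number of groups.
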
The algorithm for searching the best probe sets
The variable ScoreTable from the code example in Box 2 contains the scores for all possible combinations (scoreTable[4] contains the full score for one specific combination). Information is stored in the SumScore table in each for loop, saving us from redundant calculations. Supplementary Information Figure 4 shows what is stored after each SumScore iteration. When we have calculated the score for combination 1-4-6-8, we need to go only one step back to rapidly find the score for combination 1-4-6-9, without having to recalculate the sum of the scores for 1-1, 1-4, 4-4, 1-6, 4-6 and 6-6 (this sum is already stored in SumScore[3]). The same applies when we want to calculate the full score of combination 1-4-6-10. The number in SumScore[4] (the full score for one combination) changes each time we choose another combination, so we need to save this full score before it is overwritten. For that purpose, a so-called hash table can be used, where the full score is the key and the indices of all probes in the combination are the object (a hash table is like a dictionary: keys are the keywords, and objects are these keywords translated into another language). When the hash table is completed, the program sorts it with respect to scores (keys) and finds which combination (object) gives the best score.

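The reuse of partial sums described above can be sketched as a depth-first enumeration. This is an illustrative reconstruction, not the code from Box 2: the function name search_probe_sets and the placeholder toy_score are hypothetical. One deliberate difference from the text is that results are kept as a list of (score, combination) pairs rather than a hash table keyed by score, so that two combinations with the same score do not overwrite each other.

```python
def search_probe_sets(groups, pair_score):
    """Enumerate all combinations, reusing partial sums (the SumScore idea)."""
    n = len(groups)
    sum_score = [0] * (n + 1)  # sum_score[k]: score of the first k chosen probes
    chosen = [None] * n
    results = []               # (full score, combination) pairs

    def extend(level):
        if level == n:
            # sum_score[n] is overwritten later, so save it now.
            results.append((sum_score[n], tuple(chosen)))
            return
        for probe in groups[level]:
            chosen[level] = probe
            # Only the terms involving the new probe are added.
            added = pair_score(probe, probe)
            added += sum(pair_score(prev, probe) for prev in chosen[:level])
            sum_score[level + 1] = sum_score[level] + added
            extend(level + 1)

    extend(0)
    results.sort(key=lambda item: item[0])
    return results

def toy_score(a, b):
    # Placeholder only: stands in for a real pairwise hybridization score.
    return abs(a - b)

# Probe numbers per group from the Box 1 example.
groups = [[1, 2, 3], [4, 5], [6, 7], [8, 9, 10]]
ranked = search_probe_sets(groups, toy_score)
best_score, best_combo = ranked[0]
```

When the search moves from combination 1-4-6-8 to 1-4-6-9, only sum_score[4] is recomputed; sum_score[3] already holds the sum for 1-1, 1-4, 4-4, 1-6, 4-6 and 6-6.
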
SUPPLEMENTARY INFORMATION REFERENCES
1. Lane, D. J. 1991. Nucleic acid techniques in bacterial systematics. John Wiley and Sons, New York.
2. Sanguin, H., A. Herrera, C. Oger-Desfeux, A. Dechesne, P. Simonet, E. Navarro, T. M. Vogel, Y. Moënne-Loccoz, X. Nesme, and G. L. Grundmann. 2006. Development and validation of a prototype 16S rRNA-based taxonomic microarray for Alphaproteobacteria. Environmental Microbiology 8:289-307.

SUPPLEMENTARY INFORMATION FIGURE LEGENDS

Supplementary Information Figure 1. The role of labeling and capture probes. Possible sources of errors are shown as red dashed arrows.

Supplementary Information Figure 2. Pairwise comparison of probes 8 and 2. Blue lines show the areas where hybridization might occur.

Supplementary Information Figure 3. Example with scores for two probe combinations (red and blue arrows).

Supplementary Information Figure 4. Visualization of a part of the probe set searching algorithm.

SUPPLEMENTARY INFORMATION FIGURES
Supplementary Information Figure 1.
Labels in the figure: Capture probes immobilized on a microarray; Cross-labeling; Self-labeling; Cross-binding.
Legend: 5’ to 3’ direction of DNA strands; a part of DNA strand from bacterium 1; labeling probe for bacterium 1; a part of DNA strand from bacterium 2; labeling probe for bacterium 2; dideoxy cytosine labeled with the reporter; capture probe for labeling probe for bacterium 1; capture probe for labeling probe for bacterium 2.

Supplementary Information Figure 2.
Supplementary Information Figure 3.
Groups and probes: Bifidobact. (Probes 1, 2, 3); Ecoli (Probes 4, 5); Staph (Probes 6, 7); Veillonella (Probes 8, 9, 10).
Pairwise scores shown: 1-4, 1-6, 1-8, 4-6, 8-4, 6-8, 2-5, 2-6, 2-10, 5-6, 5-10, 6-10.

Supplementary Information Figure 4.
Score for this combination of probes (1 from group 1, 4 from group 2, 6 from group 3 and 8 from group 4), based on the pairwise combinations 1-1, 1-4, 4-4, 1-6, 4-6, 6-6, 1-8, 4-8, 6-8, 8-8:
SumScore[0] = 0
SumScore[1] = SumScore[0] + 1-1
SumScore[2] = SumScore[1] + 1-4 + 4-4
SumScore[3] = SumScore[2] + 1-6 + 4-6 + 6-6
SumScore[4] = SumScore[3] + 1-8 + 4-8 + 6-8 + 8-8