Manual PIBWinhlp
PIBWin HELP FILE CONVERTED TO WORD
This tutorial was written by Dr. Trevor Bryant and goes far more in-depth than the Schnf Ashex Tutorial in terms of PIBWin's abilities.
Introduction
PROBABILISTIC IDENTIFICATION OF BACTERIA for Windows (PIBWin) is a Windows version of the DOS program PIB (also called Bacterial Identifier). The programme has three major functions:
the identification of an unknown isolate
the selection of additional tests to distinguish between possible strains if identification is not achieved
the storage and retrieval of results
It also has some utility functions for assessing the usefulness of identification matrices and for converting matrices into different formats. The programme uses Excel files to store identification matrices and archived results, although other file formats are supported for backwards compatibility with the DOS version. Up-to-date information on the programme can be found on the PIBWin web site www.som.soton.ac.uk/staff/tnb/pib.htm, which can also be accessed from the Help menu.
The programme is designed to use probabilistic identification matrices that have either been published in the literature or created by the user. The matrices provided with PIB have been taken from the literature. They have been typed in from the publications describing them, and users should refer to these publications for full details of the methods used when testing isolates.
Identification Matrix
The identification matrix is displayed when the Matrix tab is selected.
The matrix may be displayed as integer numbers (ranging from 1 to 99) representing the percentage probability of obtaining a positive result, or as +/v/- depending on the value selected. This option is set in Options. The view can also be changed by clicking the right mouse button and checking or unchecking Display Matrix as +/v/- on the pop-up menu. To view the full name of a test or taxon, move the cursor over the item; a pop-up box will display the name in full.
Sorting the identification matrix
The matrix can be sorted by double clicking on the name at the top of each column. The first double click performs an ascending sort (negative results first); successive double clicks alternate between descending and ascending sorts. Note that the underlying identification matrix is not affected by sorting, as the Matrix tab only displays a view of it. To return to the original order, either click the right mouse button and select Revert to original order, or select another tab and then return to the Matrix tab.
Results
The Results tab is where the results for an unknown strain are entered. There are four aspects to the Results screen:
Details Bar
Results Grid
Entering Results
Buttons
Details Bar
The details bar is where a personal key, the source of the isolate and details about the isolate can be entered.
Key can be a maximum of 15 characters. A key must be entered if the results are to be saved to an Archive file for recall at a later time.
Source is a drop-down list box which allows text of up to 50 characters to be entered. To achieve consistent entry of source text, existing values from the Archive file are displayed in the drop-down list, so the list will grow in length over time.
Details provides for a maximum of 255 characters.
The Save button is enabled once a result has been entered and there is an entry in the Key box; it is only shown on the Identification and Additional Tests tabs.
Note: If an isolate is recalled from the Archive file and the key is changed, Save will create a new, additional record in the Archive file.
Results Grid
Results can be entered in a grid or list format. This is controlled by the status of the Use List Format for Results check box. Grid format enables a 96-well microtitre plate format to be accommodated. The full name of each test is shown in a pop-up box when the cursor is placed over the test name.
List format is a scrolling list.
Entry of Results
Results can be entered using the keyboard or the mouse. There are four possible states for a result: positive (+), negative (-), indeterminate (?) and not done.
The indeterminate state is for tests that have been carried out but whose result is difficult to interpret, leaving you undecided. It allows you to record that the test has been done, rather than leaving the result as missing.
Result          Key                      Function Key   Mouse Action
Positive        + or =                   F2             Left click
Negative        - or _                   F3             Right click
Indeterminate   ? or /                   F4
Missing         <space bar> or <Enter>   F5             Repeat click
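The key bindings in the table above can be summarised as a simple lookup. The following is only an illustrative sketch, not PIBWin's own code: the `Result` enum, the `KEY_BINDINGS` dictionary, the `result_for_key` function and the labels used for the function keys, space bar and Enter are all hypothetical names chosen for this example.

```python
from enum import Enum


class Result(Enum):
    """The four result states described in the manual."""
    POSITIVE = "+"
    NEGATIVE = "-"
    INDETERMINATE = "?"
    MISSING = ""


# Bindings copied from the table; the key labels ("F2", "Space", ...) are
# illustrative strings, not identifiers used by PIBWin.
KEY_BINDINGS = {
    "+": Result.POSITIVE, "=": Result.POSITIVE, "F2": Result.POSITIVE,
    "-": Result.NEGATIVE, "_": Result.NEGATIVE, "F3": Result.NEGATIVE,
    "?": Result.INDETERMINATE, "/": Result.INDETERMINATE,
    "F4": Result.INDETERMINATE,
    "Space": Result.MISSING, "Enter": Result.MISSING, "F5": Result.MISSING,
}


def result_for_key(key, current=Result.MISSING):
    """Return the new state for a key press; unmapped keys leave it unchanged."""
    return KEY_BINDINGS.get(key, current)


print(result_for_key("="))   # unshifted "+" key
print(result_for_key("/"))   # unshifted "?" key
```

Pairing each symbol with its unshifted neighbour (`=` with `+`, `/` with `?`) reflects the table's intent that results can be typed without holding Shift on a typical layout.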
The programme has been written so that the Shift key does not have to be pressed to obtain the + or ? symbol, although some keyboard layouts may differ. To change a result, press the key for the new value. To remove a result using the mouse, click a second time. Note: because of the way the mouse works, the first left click sometimes only selects the object, so an additional click is needed.
Buttons
Reset: Clears the results of the current isolate and resets them all to missing. The details are left unchanged.
New: Clears the results and the details of the current isolate and resets all results to missing.
Recall: Recalls the results of a previous isolate from an Archive file.
Archived Results
The Archived Results screen displays the details and identification of previously entered isolates. If an Archive file is not already open, an Open window is displayed when the Recall button is pressed in the Results window.
To recall the results of a previous isolate, double click on the row of the isolate.
Sorting the Archived Results
Each column of information can be sorted. Click on the column heading to sort the archived isolates into ascending order; a second click reverses the sort into descending order.
Searching the Archived Results
The Find button activates a search of the archived results. Searching is case insensitive and does not support wild cards or complex searching. Once a hit has been obtained, the Find Next button is enabled to permit further searching. Searching is performed across all rows and columns, excluding the first column.
Technical details
The software supports two types of Archive file: Excel and DOS Archive. The DOS Archive format is for backwards compatibility with the previous DOS version of this software. Its use is not recommended, as it contains less information about isolates and is less flexible. The Excel format is recommended. The Excel Archive file can be opened and manipulated in Microsoft Excel. This enables the data to be used by other software packages and unwanted isolate information to be deleted. DO NOT CHANGE the order of the columns in the Archive file. This would make the file unusable with the identification matrix. The software performs some internal checks to detect discrepancies between the Identification matrix file and the Archive file, but these are not foolproof. It is a case of user beware, so if you wish to experiment, make sure that you have taken backups of your files before they are modified.
Identification
The identification tab is shown once a test result has been entered in the Results window.
Additional Tests
This tab is available when Identification is not successful and more than one taxon is a possible candidate for the unknown isolate. Tests may be chosen in two ways:
they may be selected so that the most likely taxon can be distinguished from other likely taxa.
they can be selected to distinguish likely taxa from each other.
Use the radio buttons to select the method of test selection, then use the spin edit box to choose the number of taxa to be considered. Use Select Tests to obtain the list of tests to be used.
Move the cursor over the strains and tests to obtain the name in full in a pop-up window. The Exclude Tests button allows you to omit specific tests before test selection is carried out.
See Also: Test Selection Algorithm
Exclude Tests
The Exclude Tests window is used by the Additional Tests and Select Best Tests for Matrix procedures. A list of tests in the current matrix is displayed. Those tests that will be omitted from the test selection procedure are shown with an asterisk * in the Excluded column. Tests can be included or excluded by clicking on the Excluded column.
Include All Tests is used to include all tests in the Test Selection procedure.
Exclude All Tests is used to exclude all tests from the Test Selection procedure; the tests that are required can then be selected by clicking in the Excluded column.
Tools
The Tools menu options provide functions for manipulating matrix files and investigating the properties of an identification matrix.
Convert Matrix
The Identification matrix file can be written in one of three formats:
Excel [*.xls]
Comma separated values [*.csv]
Fixed format [*.mat]
The Excel format is recommended because it contains more information than the other two formats. The fixed format is for backwards compatibility with the original DOS version of this software and its use is not recommended.
Convert DOS archive
This allows an Archive file created by the original DOS version of this software to be rewritten in the Excel archive format. It is strongly recommended that you convert old Archive files. Note: a new Archive file is created and the original Archive file is left untouched.
Select Best Tests
This allows investigation of the current matrix to determine which are the most important tests in the matrix. See Select Best Tests for Matrix for further details.
Calculate Matrix ID scores
This allows investigation of the current matrix to determine if there is an overlap between strains in the matrix. See Matrix ID scores for further details.
Select Best Tests for Matrix
This procedure is called from the Tools menu. It can be used to select the minimum number of tests needed to distinguish taxa in an identification matrix. Tests may be chosen in two ways:
they may be selected so that one taxon can be distinguished from other strains (taxa).
they can be selected to distinguish all strains (taxa) from each other.
Use Select Tests to obtain the list of tests to be used. Move the cursor over the strains and tests to obtain the name in full in a pop-up window. The Exclude Tests button allows you to omit specific tests before test selection is carried out.
See Also: Test Selection Algorithm
Matrix ID Scores
The Matrix ID scores procedure is called from the Tools menu. It is used to assess whether the identification matrix is capable of identifying each taxon (strain) that it contains. The procedure considers each taxon in turn and converts each percentage probability for that taxon into a positive or negative result, creating a Hypothetical Median Organism (HMO). It then uses this HMO to calculate an Identification Score using the Willcox probability. If any probabilities of 50 are encountered (typically missing data is coded as 50), the identification score is calculated in three ways, with the tests that have a value of 50 for the taxon being:
excluded
all treated as positive results
all treated as negative results
These results are shown as ID Score, Missing Positive and Missing Negative. If the ID score does not exceed the Identification Threshold, the strain with the second highest identification score is listed in the Next Strain column.
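The procedure described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than PIBWin's actual implementation: the matrix values and all function names are hypothetical, an entry above 50 is assumed to become a positive HMO result and an entry below 50 a negative one, and the Next Strain is assumed to be reported when the ID score falls below the threshold.

```python
from math import prod

# Hypothetical matrix of percentage probabilities (1-99); 50 marks missing data.
MATRIX = {
    "x": [99, 1, 50, 90],
    "y": [1, 99, 99, 10],
    "z": [99, 99, 1, 50],
}


def hypothetical_median_organism(row, fifty):
    """Turn a taxon's row into +/- results; `fifty` is None (exclude), '+' or '-'."""
    results = []
    for pct in row:
        if pct > 50:
            results.append("+")
        elif pct < 50:
            results.append("-")
        else:
            results.append(fifty)
    return results


def willcox_scores(matrix, results):
    """Normalised likelihoods of the result pattern for every taxon."""
    raw = {}
    for taxon, row in matrix.items():
        raw[taxon] = prod(
            pct / 100 if r == "+" else 1 - pct / 100
            for pct, r in zip(row, results)
            if r is not None  # excluded tests do not contribute
        )
    total = sum(raw.values())
    return {taxon: value / total for taxon, value in raw.items()}


def matrix_id_scores(matrix, threshold=0.95):
    """Build the ID Score / Missing Positive / Missing Negative report."""
    variants = (("ID Score", None), ("Missing Positive", "+"),
                ("Missing Negative", "-"))
    report = {}
    for taxon, row in matrix.items():
        entry = {}
        for label, fifty in variants:
            hmo = hypothetical_median_organism(row, fifty)
            entry[label] = willcox_scores(matrix, hmo)[taxon]
        # List the second highest scorer only when the taxon is not identified.
        scores = willcox_scores(matrix, hypothetical_median_organism(row, None))
        ranked = sorted(scores, key=scores.get, reverse=True)
        entry["Next Strain"] = ranked[1] if entry["ID Score"] < threshold else None
        report[taxon] = entry
    return report


for taxon, entry in matrix_id_scores(MATRIX).items():
    print(taxon, entry)
```

A taxon whose three scores differ noticeably is one where the entries coded as 50 matter to identification, which is the situation the three columns are meant to expose.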
Ideally the ID Score, Missing Positive and Missing Negative columns should all display values of 1.00000.
If identification is not achieved then the most likely taxa are listed in descending order of their identification scores. The Additional Tests tab is shown when the Identification tab is selected.
Differences between the unknown isolate and the likely taxa are listed in a second grid. What is displayed is controlled by the threshold values set in Options.
Options
This calls the Options window which has two tabbed Options: General and Identification. The Use default values button resets the defaults for values on the Identification tab.
Open Last Identification Matrix
The current (last) identification matrix used by the programme is automatically opened when PIBWin is started. The name of the file is displayed when this option is selected. The Open window that is normally displayed at the start of the programme is not displayed when this option is selected.
Open Last Archive File
The current (last) archive file used by the programme is automatically opened when PIBWin is started. The name of the file is displayed when this option is selected.
Display Matrix as +/v/-
The identification matrix values can either be displayed as integer numbers (ranging from 1 to 99) representing the percentage probability of obtaining a positive result, or as +/v/- depending on the value set for "Tests are displayed as positive if the percentage is equal to or greater than" on the Identification tab.
Record identification in Output Window
The identification of any unknown isolate, its atypical tests and the additional tests needed to separate possible strains are recorded in an Output window when this option is selected.
Identification achieved when the ID score is greater than or equal to [default value 0.95]
An unknown is identified when the ID score, also known as the Willcox probability, is equal to or greater than the specified value. A value within the range 0.00001 to 0.99999 can be entered, though the accepted range for this value is 0.95 to 0.999 depending on the identification matrix
and the Modal Likelihood is greater than or equal to [default value 0.01]
A second criterion, the modal likelihood, is also applied to the identification. This avoids identification when one taxon gives a high ID score, but also has several test results that differ from the unknown. A value within the range 0.00001 to 0.99999 can be entered.
List atypical results for taxa with ID scores equal to or greater than [default value 0.05]
A value within the range 0.00001 to 0.99999 can be entered.
When no identification, list taxa with ID scores equal to or greater than [default value 0.001]
This controls how many possible taxa are listed when identification is not achieved. A value within the range 0.00001 to 0.99999 can be entered.
Taxa are distinguished by at least [default value 2]
If identification is not achieved, further tests may be selected. The minimum number of tests to distinguish pairs of taxa can be varied, though traditionally 2 tests is the norm.
A test separates a pair of taxa if their percentage difference is at least [default value 70]
A pair of taxa are separated by a test if the absolute difference between their matrix entries is at least the value specified. This value can range from 51 to 98.
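The two separation settings above (the minimum percentage difference and the minimum number of separating tests) can be illustrated with a short sketch. The function names and the matrix rows are hypothetical and chosen only for this example; the defaults of 70 and 2 are the ones quoted in the option labels.

```python
def separates(pct_a, pct_b, min_difference=70):
    """A test separates two taxa if their matrix entries differ enough."""
    return abs(pct_a - pct_b) >= min_difference


def separating_tests(row_a, row_b, min_difference=70):
    """Indices (0-based) of the tests that separate the two taxa."""
    return [
        index
        for index, (a, b) in enumerate(zip(row_a, row_b))
        if separates(a, b, min_difference)
    ]


def distinguished(row_a, row_b, min_tests=2, min_difference=70):
    """True if at least `min_tests` tests separate the pair."""
    return len(separating_tests(row_a, row_b, min_difference)) >= min_tests


# Illustrative percentage rows for three taxa over four tests.
taxon_a = [1, 20, 99, 90]
taxon_b = [95, 1, 99, 1]
taxon_c = [99, 10, 85, 99]

print(separating_tests(taxon_a, taxon_b), distinguished(taxon_a, taxon_b))
print(separating_tests(taxon_b, taxon_c), distinguished(taxon_b, taxon_c))
```

In this illustration taxa a and b differ by at least 70 on two tests and so count as distinguished, whereas b and c are separated by only one test and would need further tests under the default setting of 2.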
Tests are displayed as positive if the percentage is equal to or greater than [default value 85]
The identification matrix values can either be displayed as integer numbers (ranging from 1 to 99) representing the percentage probability of obtaining a positive result, or as +/v/- depending on the value selected. This value can range from 51 to 99. The threshold for negative results is calculated as 100 minus the chosen value.
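The +/v/- display rule can be written as a small function. This is a sketch only: the name `display_symbol` is hypothetical, and it assumes that an entry equal to the negative threshold (100 minus the chosen value) is shown as negative, which the manual does not state explicitly.

```python
def display_symbol(pct, positive_threshold=85):
    """Map a percentage matrix entry (1-99) to '+', 'v' or '-'."""
    if not 51 <= positive_threshold <= 99:
        raise ValueError("threshold must be between 51 and 99")
    negative_threshold = 100 - positive_threshold
    if pct >= positive_threshold:
        return "+"
    if pct <= negative_threshold:  # assumption: boundary counts as negative
        return "-"
    return "v"  # variable result


row = [99, 85, 50, 15, 1]
print("".join(display_symbol(pct) for pct in row))
```

With the default value of 85, entries from 85 upwards show as +, entries of 15 or below show as -, and everything in between shows as v.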
Theory
Most computer assisted identification systems are based on Willcox's implementation of Bayes theorem.

$$P(t_i \mid R) = \frac{P(R \mid t_i)}{\sum_{j} P(R \mid t_j)}$$

where $P(t_i \mid R)$ is the probability that an unknown isolate, giving a pattern of test results $R$, is a member of taxon (group of bacteria) $t_i$, and $P(R \mid t_i)$ is the probability that the unknown has a pattern $R$ given that it is a member of taxon $t_i$. Bayes theorem incorporates prior probabilities; these are the expected prevalence of strains included in the identification matrix. For bacterial identification most authors give all taxa an equal chance of being isolated and therefore the prior probabilities for all taxa are set to 1.0 and omitted from the equation. The above equation therefore can be re-expressed as:

$$L_i^{*} = \frac{L_i}{\sum_{j} L_j}$$

where $L_i$ is the likelihood $P(R \mid t_i)$ and the probabilities $L_i^{*}$ are now referred to as Identification Scores, or Willcox Scores. The identification scores for each taxon are normalized values, and $L_i^{*}$ summed over all taxa equals one. Identification of an unknown isolate is achieved when $L_i^{*}$ for one taxon exceeds a specified threshold value. An example is shown below with an identification matrix consisting of three taxa for which we have the probabilities for four tests.
Identification matrix with results of unknown
                     Test 1   Test 2   Test 3   Test 4
Taxon a              0.01     0.20     0.99     0.90
Taxon b              0.95     0.01     0.99     0.01
Taxon c              0.99     0.10     0.85     0.99
Results of unknown   +        -        +        missing
An unknown has been isolated whose results for the first three tests are positive, negative and positive respectively. The likelihood that each of the taxa a, b and c will give the pattern of results observed for the unknown is calculated, for each taxon in turn, by multiplying the probability of obtaining a positive result for test 1 by the probability of obtaining a negative result for test 2 by the probability of obtaining a positive result for test 3. The original identification matrix only gives the probabilities for positive results, so to obtain the probability of a negative result we must subtract the matrix entries for test 2 from 1.
Calculation of likelihood of unknown
          Test 1   Test 2       Test 3    Likelihood
Taxon a   0.01  *  (1-0.20)  *  0.99   =  0.00792
Taxon b   0.95  *  (1-0.01)  *  0.99   =  0.93110
Taxon c   0.99  *  (1-0.10)  *  0.85   =  0.75735
                                 Sum   =  1.69637
The Identification Scores are expressed as normalized likelihoods.
Willcox probabilities (normalised likelihoods)
          Likelihood   Sum          Identification Score
Taxon a   0.00792   /  1.69637   =  0.004669
Taxon b   0.93110   /  1.69637   =  0.548877
Taxon c   0.75735   /  1.69637   =  0.446455
                          Sum    =  1.000000
In this example the unknown is not identified because no single taxon reaches the identification threshold value. Taxa b and c are still both candidates for the identity of the unknown. Threshold values of 0.999 are typically used, for example with the Enterobacteriaceae, but with other groups of bacteria, such as the streptomycetes, values as low as 0.95 have been used. In practical terms, a value of 0.999 means that the taxon which the unknown identifies with will have at least two test differences from all other taxa in the matrix.
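The worked example above can be reproduced in a few lines of Python. This is a minimal sketch of the calculation as described in the text, not PIBWin's source code; the function and variable names are hypothetical, and missing results are assumed simply to be left out of the product.

```python
from math import prod

# Identification matrix from the worked example: probability of a positive
# result for tests 1-4, per taxon.
MATRIX = {
    "a": [0.01, 0.20, 0.99, 0.90],
    "b": [0.95, 0.01, 0.99, 0.01],
    "c": [0.99, 0.10, 0.85, 0.99],
}

# Results of the unknown: "+" positive, "-" negative, None = missing.
UNKNOWN = ["+", "-", "+", None]


def likelihood(probs, results):
    """Multiply P(result | taxon) over the tests that were carried out."""
    terms = []
    for p, r in zip(probs, results):
        if r == "+":
            terms.append(p)
        elif r == "-":
            terms.append(1.0 - p)  # the matrix stores P(positive) only
        # any other value (missing) contributes nothing
    return prod(terms)


def willcox_scores(matrix, results):
    """Normalise the likelihoods so that they sum to one."""
    raw = {taxon: likelihood(probs, results) for taxon, probs in matrix.items()}
    total = sum(raw.values())
    return {taxon: value / total for taxon, value in raw.items()}


scores = willcox_scores(MATRIX, UNKNOWN)
for taxon, score in scores.items():
    print(f"{taxon}: {score:.6f}")

best = max(scores, key=scores.get)
print("identified" if scores[best] >= 0.95 else "not identified", best)
```

Because the highest score (taxon b) is well below even the 0.95 threshold, the sketch reaches the same conclusion as the text: the unknown is not identified and further tests are needed.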
Whatever type of identification system is used, there are four possible outcomes:
The unknown is identified with the correct taxon.
The unknown is misidentified, i.e. incorrectly attributed to the wrong taxon.
The unknown is not identified at all, and correctly so because the taxon to which it belongs is not present in the matrix.
The unknown is not identified, but should have been identified with a taxon that is present in the matrix.
It is important that any system deals with these possibilities, although the last one is difficult to resolve. One problem with the identification score is that if an unknown is not represented in the matrix, but one strain within the matrix is closer to it (in a-space) than all others, the unknown may be identified as this strain. This is where additional criteria should be used to assist the identification process. These include listing the differences in test results between the unknown and the strain it has been identified as, and using other numeric criteria such as taxonomic distance, the standard error of taxonomic distance or maximum likelihoods.
Taxonomic distance is the distance of an unknown from the centroid of any taxon with which it is being compared; a low score, ideally less than 1.5, indicates relatedness. The standard error of taxonomic distance assumes that the taxa are in hyperspherical normal clusters. An acceptable score is less than 2.0 to 3.0, and about half the members of a taxon will have negative scores, because they are closer to the centroid than average.
The maximum, or best, likelihood is the maximum probability for a taxon calculated using those tests carried out on the unknown. For each test, the calculation uses the larger of the probabilities of a negative and a positive result.
Maximum possible likelihoods
          Test 1       Test 2       Test 3    Best Likelihood
Taxon a   (1-0.01)  *  (1-0.20)  *  0.99   =  0.78408
Taxon b   0.95      *  (1-0.01)  *  0.99   =  0.93110
Taxon c   0.99      *  (1-0.10)  *  0.85   =  0.75735
This allows for taxa with several entries of 0.50 in a matrix. Some authors calculate the likelihood/maximum likelihood ratio, termed the modal likelihood fraction, or its inverse, and use it to decide whether to accept the identification offered by a Willcox score that has exceeded the identification threshold.
Modal likelihood fraction
          Likelihood   Best likelihood   Modal likelihood
Taxon a   0.00792   /  0.78408        =  0.010101
Taxon b   0.93110   /  0.93110        =  1.000000
Taxon c   0.75735   /  0.75735        =  1.000000
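The best likelihood and modal likelihood fraction from the tables above can be computed alongside the Willcox score, as in the following sketch. The names are hypothetical, and combining the two criteria with a logical "and" (using the default values 0.95 and 0.01 from the Options section) is an assumption about how the thresholds are applied, not a statement of PIBWin's exact logic.

```python
from math import prod

MATRIX = {
    "a": [0.01, 0.20, 0.99, 0.90],
    "b": [0.95, 0.01, 0.99, 0.01],
    "c": [0.99, 0.10, 0.85, 0.99],
}
UNKNOWN = ["+", "-", "+", None]  # None = missing


def likelihood(probs, results):
    """Probability of the observed pattern over the tests carried out."""
    return prod(
        p if r == "+" else 1.0 - p
        for p, r in zip(probs, results)
        if r in ("+", "-")
    )


def best_likelihood(probs, results):
    """Most probable pattern for the taxon over the same tests."""
    return prod(
        max(p, 1.0 - p)
        for p, r in zip(probs, results)
        if r in ("+", "-")
    )


def assess(matrix, results, id_threshold=0.95, modal_threshold=0.01):
    """Report ID score, modal likelihood fraction and acceptance per taxon."""
    raw = {taxon: likelihood(probs, results) for taxon, probs in matrix.items()}
    total = sum(raw.values())
    report = {}
    for taxon, probs in matrix.items():
        id_score = raw[taxon] / total
        modal = raw[taxon] / best_likelihood(probs, results)
        report[taxon] = {
            "id_score": id_score,
            "modal_fraction": modal,
            # assumption: both criteria must be met for identification
            "accepted": id_score >= id_threshold and modal >= modal_threshold,
        }
    return report


report = assess(MATRIX, UNKNOWN)
for taxon, entry in report.items():
    print(taxon, f"{entry['id_score']:.6f}", f"{entry['modal_fraction']:.6f}",
          entry["accepted"])
```

The modal fraction is useful as a second check because a taxon can win the normalised comparison while still fitting the unknown poorly in absolute terms; a low fraction flags that situation.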