Manual PIBWinhlp

PIBWin HELP FILE CONVERTED TO WORD

This tutorial was written by Dr. Trevor Bryant and goes far more in-depth than the Schnf Ashex Tutorial in terms of PIBWin’s abilities.

Introduction

PROBABILISTIC IDENTIFICATION OF BACTERIA for Windows (PIBWin) is a Windows version of the DOS program PIB (also called Bacterial Identifier). The programme has three major functions:

the identification of an unknown isolate

the selection of additional tests to distinguish between possible strains if identification is not achieved

the storage and retrieval of results

It also has some utility functions for assessing the usefulness of identification matrices and for converting matrices into different formats. The program uses Excel files to store identification matrices and archived results, although other file formats are supported to allow backwards compatibility with the DOS version of the programme. Up to date information on the programme can be found on the PIBWin web site www.som.soton.ac.uk/staff/tnb/pib.htm, which can also be accessed from the Help menu.

The program is designed to use probabilistic identification matrices that have either been published in the literature or created by the user. The matrices that are provided with PIB have been taken from the literature. These matrices have been typed in from the publications describing them, and users should refer to these publications for full details of the methods used when testing isolates.

Identification Matrix

The identification matrix is displayed when the Matrix tab is selected.

The matrix may be displayed as integer numbers (ranging from 1 to 99) representing the percentage probability of obtaining a positive result, or as +/v/- depending on the value selected. This option is set in the Options window. The view can be changed by clicking the right mouse button and checking or unchecking Display Matrix as +/v/- on the pop up menu. To view the full name of a test or taxon, move the cursor over the item; a pop up box will display the item in full.

Sorting the identification matrix

The matrix can be sorted by double clicking on the name at the top of each column. The first double click performs an ascending sort (negative results first); successive double clicks alternate between descending and ascending sorts. Note that the underlying identification matrix is not affected by sorting, as the Matrix tab only displays a view of it. To return to the original order, either click the right mouse button and select Revert to original order, or select another tab and then return to the Matrix tab.

Results

The Results tab is where the results for an unknown strain are entered. There are four aspects to the Results screen:

Details Bar

Results Grid

Entering Results

Buttons

Details Bar

The details bar is where a personal key, the source of the isolate and details about the isolate can be entered.

Key can be a maximum of 15 characters. A key must be entered if the results are to be saved to an Archive file for recall at a later time.

Source is a drop down list box which allows text up to a maximum of 50 characters to be entered. To achieve consistent entry of source text, existing values from the Archive file are displayed in the drop down list, so the list will grow in length over time.

Details provides for a maximum of 255 characters.

The Save button is enabled once a result has been entered and there is an entry in the Key box; it is only shown on the Identification and Additional Tests tabs.

Note: If an isolate is recalled from the Archive file and the key is changed, Save will create a new, additional record in the Archive file.

Results Grid

Results can be entered in a grid or list format. This is controlled by the status of the Use List Format for Results check box. Grid format enables a 96 well microtitre plate format to be accommodated. The full name of each test is shown in a pop up box when the cursor is placed over the test name.

List format is a scrolling list.

Entry of Results

Results can be entered using the keyboard or the mouse. There are four possible states for a result: positive (+), negative (-), indeterminate (?) and not done.

The indeterminate state is for tests that have been carried out but whose result is difficult to interpret, leaving you undecided. It allows you to record that the test has been done, rather than leaving the result as missing.

Result         Key                       Function Key   Mouse Action
Positive       + or =                    F2             Left click
Negative       - or _                    F3             Right click
Indeterminate  ? or /                    F4
Missing        <space bar> or <Enter>    F5             Repeat click

The programme has been written so that the shift key does not have to be pressed to obtain the + or ? symbol, although some keyboard layouts may differ. To change a result press the key for the new value. To remove a result using the mouse, click a second time. Note: because of the way the mouse works, the first left click sometimes only selects the object, so an additional click is needed.

Buttons

Reset: Clears the results of the current isolate and resets them all to missing. The details are left unchanged.

New: Clears the results and the details of the current isolate and resets them all to missing.

Recall: Recalls the results of a previous isolate from an Archive file.

Archived Results

The Archive Results screen displays details and identification of previously entered isolates. If an Archive file is not already open then an Open window is displayed when the Recall button is pressed in the Results window.

To recall the results of a previous isolate, double click on the row of the isolate.

Sorting the Archived Results

Each column of information can be sorted. Click on the column heading to sort the archived isolates into ascending order; a second click reverses the sort into descending order.

Searching the Archived Results

The Find button activates a search of the archived results. Searching is case insensitive and does not support wild cards or complex searching. Once a hit has been obtained, the Find Next button is enabled to permit further searching. Searching is performed across all rows and columns excluding the first column.

Technical details

The software supports two types of Archive file, Excel and DOS Archive. The DOS Archive format is for backwards compatibility with the previous DOS version of this software. Its use is not recommended: it contains less information about isolates and is less flexible. The Excel format is recommended. The Excel Archive file can be opened and manipulated in Microsoft Excel. This enables the data to be used by other software packages and unwanted isolate information to be deleted. DO NOT CHANGE the order of the columns in the Archive file. This would make the file unusable with the identification matrix. There are some internal checks that the software performs to detect discrepancies between the Identification matrix file and the Archive file, but these are not foolproof. It is a case of user beware, so if you wish to experiment make sure that you have taken backups of your files before they are modified.

Identification

The identification tab is shown once a test result has been entered in the Results window.

Additional Tests

This tab is available when Identification is not successful and more than one taxon is a possible candidate for the unknown isolate. Tests may be chosen in two ways:

they may be selected so that the most likely taxon can be distinguished from other likely taxa.

they can be selected to distinguish likely taxa from each other.

Use the radio buttons to select which method of test selection you wish to use, then use the spin edit box to choose the number of taxa to be considered. Use Select Tests to obtain the list of tests to be used.

Move the cursor over the strains and tests to obtain the name in full in a pop up window. The Exclude Tests button allows you to specifically omit certain tests before test selection is carried out.

See Also Test Selection Algorithm

Exclude Tests

The Exclude Tests window is used by the Additional Tests and Select Best Tests for Matrix procedures. A list of tests in the current matrix is displayed. Those tests that will be omitted from the test selection procedure are shown with an asterisk * in the Excluded column. Tests can be included or excluded by clicking on the Excluded column.

Include All Tests is used to include all tests in the Test Selection procedure.

Exclude All Tests is used to exclude all tests from the Test Selection procedure; those tests that are required can then be selected by clicking in the Excluded column.

Tools

The Tools menu options provide functions for manipulating matrix files and investigating the properties of an identification matrix.

Convert Matrix

The Identification matrix file can be written in one of three formats:

Excel [*.xls]

Comma separated values [*.csv]

Fixed format [*.mat]

The Excel format is recommended because it contains more information than the other two formats. The fixed format is for backwards compatibility with the original DOS version of this software and its use is not recommended.

Convert DOS archive

This allows the Archive file created by the original DOS version of this software to be rewritten in the Excel archive format. It is strongly recommended that you convert old Archive files. Note: a new Archive file is created and the original Archive file is left untouched.

Select Best Tests

This allows investigation of the current matrix to determine which are the most important tests in the matrix. See Select Best Tests for Matrix for further details.

Calculate Matrix ID scores

This allows investigation of the current matrix to determine if there is an overlap between strains in the matrix. See Matrix ID scores for further details.

Select Best Tests for Matrix

This procedure is called from the Tools Menu. The procedure can be used to select the minimum number of tests needed to distinguish taxa in an identification matrix. Tests may be chosen in two ways:

they may be selected so that one taxon can be distinguished from other strains (taxa).

they can be selected to distinguish all strains (taxa) from each other.

Use Select Tests to obtain the list of tests to be used. Move the cursor over the strains and tests to obtain the name in full in a pop up window. The Exclude Tests button allows you to specifically omit certain tests before test selection is carried out.

See Also Test Selection Algorithm

Matrix ID Scores

The Matrix ID scores procedure is called from the Tools Menu. It is used to assess whether the identification matrix is capable of identifying each taxon (strain) that is contained in it. The procedure considers each taxon in turn. It treats each percentage probability for that taxon as a positive or negative result, creating a Hypothetical Median Organism (HMO). It then uses this HMO to calculate an Identification Score using the Willcox probability. If any probabilities of 50 are encountered (typically missing data is coded as 50), the identification score is calculated in three ways, with the tests where a value of 50 is found for the taxon being:

excluded

all treated as positive results

all treated as negative results

These results are shown as ID Score, Missing Positive and Missing Negative. If the ID score does not exceed the Identification Threshold then the strain with the second highest identification score is listed in the Next Strain column.
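The procedure described above can be sketched roughly as follows. This is an illustration only, not PIBWin's actual code: the function names, the dictionary layout, the example matrix and the rule that a percentage above 50 counts as positive and below 50 as negative are all assumptions made for the sketch.

```python
from math import prod

# Hypothetical two-taxon matrix of percentage probabilities (1 to 99).
EXAMPLE_MATRIX = {
    "x": [99, 1, 50],
    "y": [1, 99, 99],
}


def hmo_results(row, missing_as):
    """Build the Hypothetical Median Organism for one taxon.

    Assumption: >50 is read as positive, <50 as negative. Entries of 50
    become `missing_as` (None = excluded, True = positive, False = negative).
    """
    return [missing_as if pct == 50 else pct > 50 for pct in row]


def id_scores(matrix, results):
    """Willcox scores: each taxon's likelihood divided by the sum over taxa."""
    likelihoods = {}
    for taxon, row in matrix.items():
        terms = [
            pct / 100 if result else 1 - pct / 100
            for pct, result in zip(row, results)
            if result is not None  # excluded tests do not contribute
        ]
        likelihoods[taxon] = prod(terms)
    total = sum(likelihoods.values())
    return {taxon: value / total for taxon, value in likelihoods.items()}


def matrix_id_scores(matrix):
    """Score each taxon's HMO against the whole matrix in the three ways."""
    modes = (("ID Score", None), ("Missing Positive", True), ("Missing Negative", False))
    return {
        taxon: {
            label: id_scores(matrix, hmo_results(row, missing_as))[taxon]
            for label, missing_as in modes
        }
        for taxon, row in matrix.items()
    }


report = matrix_id_scores(EXAMPLE_MATRIX)
for taxon, columns in report.items():
    print(taxon, {label: round(value, 5) for label, value in columns.items()})
```

A taxon whose three values are all close to 1 is well separated from the others in this toy matrix; values well below the identification threshold would point to an overlap.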

Ideally the ID Score, Missing Positive and Missing Negative columns should display values of 1.00000.

If identification is not achieved then the most likely taxa are listed in descending order of their identification scores. The Additional Tests tab is shown when the Identification tab is selected.

Differences between the unknown isolate and likely taxa are listed in a second grid. What is displayed is controlled by the threshold values set in Options.

Options

This calls the Options window which has two tabbed Options: General and Identification. The Use default values button resets the defaults for values on the Identification tab.

Open Last Identification Matrix

The current (last) identification matrix used by the programme is automatically opened when PIBWin is started. The name of the file is displayed when this option is selected. The Open window that is normally displayed at the start of the programme is not displayed when this option is selected.

Open Last Archive File

The current (last) archive file used by the programme is automatically opened when PIBWin is started. The name of the file is displayed when this option is selected.

Display Matrix as +/v/-

The identification matrix values can either be displayed as integer numbers (ranging from 1 to 99) representing the percentage probability of obtaining a positive result, or they can be displayed as +/v/- depending on the criterion set for "Tests are displayed as positive if the percentage is equal to or greater than" on the Identification tab.

Record identification in Output Window

The identification of any unknown isolate, atypical tests and additional tests to separate possible strains are recorded in an Output window when this option is selected.

Identification achieved when the ID score is greater than or equal to [default value 0.95]

An unknown is identified when the ID score, also known as the Willcox probability, is equal to or greater than the specified value. A value within the range 0.00001 to 0.99999 can be entered, though the accepted range for this value is 0.95 to 0.999, depending on the identification matrix,

and the Modal Likelihood is greater than or equal to [default value 0.01]

A second criterion, the modal likelihood, is also applied to the identification. This avoids identification when one taxon gives a high ID score, but also has several test results that differ from the unknown. A value within the range 0.00001 to 0.99999 can be entered.
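A minimal sketch of how the two criteria might be combined is given below. It assumes the ID scores and modal likelihoods have already been calculated for each taxon and are passed in as dictionaries; the function name `identify` and its parameters are hypothetical and are not taken from PIBWin.

```python
def identify(id_scores, modal_likelihoods, id_threshold=0.95, modal_threshold=0.01):
    """Return the identified taxon, or None if identification is not achieved.

    Both criteria must be met by the taxon with the highest ID score:
    ID score >= id_threshold and modal likelihood >= modal_threshold.
    The default values mirror the defaults quoted in the Options text.
    """
    best = max(id_scores, key=id_scores.get)
    if id_scores[best] >= id_threshold and modal_likelihoods[best] >= modal_threshold:
        return best
    return None


# High ID score and acceptable modal likelihood: identified.
print(identify({"x": 0.97, "y": 0.03}, {"x": 0.50, "y": 0.20}))

# High ID score but a very low modal likelihood: not identified.
print(identify({"x": 0.97, "y": 0.03}, {"x": 0.001, "y": 0.20}))
```

The second call shows the case the modal likelihood is meant to catch: one taxon dominates the ID scores, yet still fits the unknown poorly.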

List atypical results for taxa with ID scores equal to or greater than [default value 0.05]

A value within the range 0.00001 to 0.99999 can be entered.

When no identification, list taxa with ID scores equal to or greater than [default value 0.001]

This controls how many possible taxa are listed when identification is not achieved. A value within the range 0.00001 to 0.99999 can be entered.

Taxa are distinguished by at least [default value 2]

If identification is not achieved, further tests may be selected. The minimum number of tests to distinguish pairs of taxa can be varied, though traditionally 2 tests is the norm.

A test separates a pair of taxa if their percentage difference is at least [default value 70]

A pair of taxa are separated by a test if the absolute difference between their matrix entries is at least the value specified. This value can range from 51 to 98.
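The separation rule, together with the minimum number of separating tests, can be illustrated with a short sketch. The function names and the example rows are assumptions made for illustration; only the two default values (70 and 2) come from the text above.

```python
def separating_tests(row_a, row_b, min_difference=70):
    """Indices of tests whose matrix entries differ by at least min_difference."""
    return [
        index
        for index, (a, b) in enumerate(zip(row_a, row_b))
        if abs(a - b) >= min_difference
    ]


def are_distinguished(row_a, row_b, min_tests=2, min_difference=70):
    """True if at least min_tests tests separate the two taxa."""
    return len(separating_tests(row_a, row_b, min_difference)) >= min_tests


# Illustrative percentage rows for three taxa over four tests.
taxon_a = [1, 20, 99, 90]
taxon_b = [95, 1, 99, 1]
taxon_c = [99, 10, 85, 99]

print(separating_tests(taxon_a, taxon_b))   # tests 1 and 4 (indices 0 and 3)
print(are_distinguished(taxon_a, taxon_b))  # two separating tests
print(are_distinguished(taxon_b, taxon_c))  # only one separating test
```

With these rows, a and b differ by 94 and 89 on the first and last tests, so they are distinguished; b and c differ by at least 70 only on the last test, so a further test would be needed.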

Tests are displayed as positive if the percentage is equal to or greater than [default value 85]

The Identification matrix values can either be displayed as integer numbers (ranging from 1 to 99) representing the percentage probability of obtaining a positive result, or they can be displayed as +/v/- depending on the value selected. This value can range from 51 to 99. The threshold for negative results is calculated as 100 minus the chosen value.
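The +/v/- display rule can be sketched as below. The function name is hypothetical, and treating a percentage equal to the negative threshold as negative is an assumption; the text only states that the negative threshold is 100 minus the chosen value.

```python
def display_symbol(percentage, positive_threshold=85):
    """Map a matrix percentage (1 to 99) to '+', 'v' or '-'.

    Assumption: values at or below (100 - positive_threshold) are shown
    as negative; everything between the two thresholds is variable.
    """
    negative_threshold = 100 - positive_threshold
    if percentage >= positive_threshold:
        return "+"
    if percentage <= negative_threshold:
        return "-"
    return "v"


row = [99, 85, 84, 50, 16, 15, 1]
print([display_symbol(value) for value in row])
```

With the default of 85, entries of 85 and above show as +, entries of 15 and below as -, and the rest as v.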

Theory

Most computer assisted identification systems are based on Willcox's implementation of Bayes theorem:

$$P(t_i \mid R) = \frac{P(R \mid t_i)}{\sum_{j} P(R \mid t_j)}$$

where $P(t_i \mid R)$ is the probability that an unknown isolate, giving a pattern of test results $R$, is a member of taxon (group of bacteria) $t_i$, and $P(R \mid t_i)$ is the probability that the unknown has a pattern $R$ given that it is a member of taxon $t_i$. The sum runs over all taxa in the matrix. Bayes theorem incorporates prior probabilities; these are the expected prevalence of strains included in the identification matrix. For bacterial identification most authors give all taxa an equal chance of being isolated and therefore the prior probabilities for all taxa are set to 1.0 and omitted from the equation. The above equation therefore can be re-expressed as:

$$L_i^* = \frac{L_i}{\sum_{j} L_j}$$

where $L_i = P(R \mid t_i)$ is the likelihood for taxon $t_i$ and the probabilities $L_i^*$ are now referred to as Identification Scores, or Willcox Scores. The identification scores for each taxon are normalized values and $L_i^*$ for all taxa sums to one. Identification of an unknown isolate is achieved when $L_i^*$ for one taxon exceeds a specified threshold value. An example is shown below with an identification matrix consisting of three taxa for which we have the probabilities for four tests.

Identification matrix with results of unknown

Taxon               Test 1   Test 2   Test 3   Test 4
a                   0.01     0.20     0.99     0.90
b                   0.95     0.01     0.99     0.01
c                   0.99     0.10     0.85     0.99
Results of unknown  +        -        +        missing

An unknown has been isolated whose results for the first three tests are positive, negative and positive respectively. The likelihood that each of the taxa a, b and c will give the pattern of results observed for the unknown is calculated, for each taxon in turn, by multiplying the probability of obtaining a positive result for test 1 by the probability of obtaining a negative result for test 2 by the probability of obtaining a positive result for test 3. Test 4 is missing and so does not contribute.

Calculation of likelihood of unknown

Taxon   Test 1   Test 2       Test 3     Likelihood
a       0.01   * (1-0.20)   * 0.99     = 0.00792
b       0.95   * (1-0.01)   * 0.99     = 0.93110
c       0.99   * (1-0.10)   * 0.85     = 0.75735
                                   Sum = 1.69637

The original identification matrix only gives the probabilities for positive results. To obtain the probability of a negative result, the matrix entries for test 2 must be subtracted from 1.

The Identification Scores are expressed as normalized likelihoods.

Willcox probabilities (normalised likelihoods)

Taxon   Likelihood / Sum       Identification Score
a       0.00792 / 1.69637    = 0.004669
b       0.93110 / 1.69637    = 0.548877
c       0.75735 / 1.69637    = 0.446455
                         Sum = 1.000000

In this example the unknown is not identified because no single taxon reaches the identification threshold value. Taxa b and c are still both candidates for the identity of the unknown. Threshold values of 0.999 are typically used, for example with the Enterobacteriaceae, but with other groups of bacteria, such as the streptomycetes, values as low as 0.95 have been used. In practical terms, a value of 0.999 means that the taxon which the unknown identifies with will have at least two test differences from all other taxa in the matrix.
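The worked example can be reproduced with a few lines of Python. This is a sketch of the calculation as described in the text, not PIBWin's own code; the names `MATRIX`, `UNKNOWN`, `likelihood` and `willcox_scores` are chosen here for illustration.

```python
from math import prod

# Identification matrix from the example: probability of a positive
# result for tests 1 to 4, for each taxon.
MATRIX = {
    "a": [0.01, 0.20, 0.99, 0.90],
    "b": [0.95, 0.01, 0.99, 0.01],
    "c": [0.99, 0.10, 0.85, 0.99],
}

# Results of the unknown: True = positive, False = negative, None = missing.
UNKNOWN = [True, False, True, None]


def likelihood(probs, results):
    """Multiply p for each positive result and (1 - p) for each negative one."""
    return prod(
        p if result else 1.0 - p
        for p, result in zip(probs, results)
        if result is not None  # missing tests do not contribute
    )


def willcox_scores(matrix, results):
    """Normalise the likelihoods so that the scores sum to one."""
    likelihoods = {taxon: likelihood(probs, results) for taxon, probs in matrix.items()}
    total = sum(likelihoods.values())
    return {taxon: value / total for taxon, value in likelihoods.items()}


scores = willcox_scores(MATRIX, UNKNOWN)
for taxon, score in scores.items():
    print(f"{taxon}: {score:.6f}")
```

The printed scores agree with the table of Willcox probabilities to six decimal places, and none reaches a threshold of 0.95.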

Whatever type of identification system is used, there are four possible outcomes:

The unknown is identified with the correct taxon.

The unknown is misidentified, i.e. incorrectly attributed to the wrong taxon.

The unknown is not identified at all, and correctly so because the taxon to which it belongs is not present in the matrix.

The unknown is not identified, but should have been identified with a taxon that is present in the matrix.

It is important that any system deals with these possibilities, although the last one is difficult to resolve. One problem with the identification score is that if an unknown is not represented in the matrix, but one strain within the matrix is closer to it (in a-space) than all others, the unknown may be identified as this strain. This is where additional criteria should be used to assist the identification process. These include listing the differences in test results between the unknown and the strain it has been identified as, as well as the use of other numeric criteria such as taxonomic distance, the standard error of taxonomic distance measures or maximum likelihoods.

Taxonomic distance is the distance of an unknown from the centroid of any taxon with which it is being compared; a low score, ideally less than 1.5, indicates relatedness. The standard error of taxonomic distance assumes that the taxa are in hyperspherical normal clusters. An acceptable score is less than 2.0 to 3.0, and about half the members of a taxon will have negative scores, because they are closer to the centroid than average. The maximum, or best likelihood, is the maximum probability for a taxon calculated using those tests carried out on the unknown. The calculation uses the maximum of the probabilities of a negative and positive result of a test.

Maximum possible likelihoods

Taxon   Test 1       Test 2       Test 3     Best Likelihood
a       (1-0.01)   * (1-0.20)   * 0.99     = 0.78408
b       0.95       * (1-0.01)   * 0.99     = 0.93110
c       0.99       * (1-0.10)   * 0.85     = 0.75735

This allows for taxa with several entries of 0.50 in a matrix. Some authors calculate the likelihood/maximum likelihood ratio, termed the modal likelihood fraction, or its inverse, and use it to decide whether to accept the identification offered by a Willcox score that has exceeded the identification threshold.

Modal likelihood fraction

Taxon   Likelihood / Best Likelihood     Modal likelihood
a       0.00792 / 0.78408              = 0.010101
b       0.93110 / 0.93110              = 1.000000
c       0.75735 / 0.75735              = 1.000000
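The best likelihood and the modal likelihood fraction for the same example can be sketched as follows. The function names are illustrative assumptions; the arithmetic follows the description in the text (the best likelihood takes, for each test carried out, the larger of the probabilities of a positive and a negative result).

```python
from math import prod

# Matrix and unknown from the worked example (None = test not done).
MATRIX = {
    "a": [0.01, 0.20, 0.99, 0.90],
    "b": [0.95, 0.01, 0.99, 0.01],
    "c": [0.99, 0.10, 0.85, 0.99],
}
UNKNOWN = [True, False, True, None]


def likelihood(probs, results):
    """Likelihood of the observed results for one taxon."""
    return prod(
        p if result else 1.0 - p
        for p, result in zip(probs, results)
        if result is not None
    )


def best_likelihood(probs, results):
    """Largest likelihood the taxon could give over the tests carried out."""
    return prod(
        max(p, 1.0 - p)
        for p, result in zip(probs, results)
        if result is not None
    )


def modal_likelihood_fraction(probs, results):
    """Ratio of the observed likelihood to the best possible likelihood."""
    return likelihood(probs, results) / best_likelihood(probs, results)


fractions = {
    taxon: modal_likelihood_fraction(probs, UNKNOWN)
    for taxon, probs in MATRIX.items()
}
for taxon, value in fractions.items():
    print(f"{taxon}: {value:.6f}")
```

Taxon a scores about 0.0101 because the unknown disagrees with its most probable result on test 1, whereas b and c both score 1 because the unknown matches their most probable result on every test carried out.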
