Computer-Assisted Interpretation of Mass Spectra · JOHNS HOPKINS APL TECHNICAL DIGEST, VOLUME 20,...

9
JOHNS HOPKINS APL TECHNICAL DIGEST, VOLUME 20, NUMBER 3 (1999) 363 U Computer-Assisted Interpretation of Mass Spectra Carleton S. Hayek, Fernando J. Pineda, Otis W. Doss III, and Jeffrey S. Lin nder Defense Advanced Research Projects Agency sponsorship, APL is developing a miniature time-of-flight (TOF) mass spectrometer for early warning against exposure to chemical/biological agents. Intended for operation by a wide range of military and civilian personnel, the instrument must be able to detect and identify pathological agents within minutes. Key to this mission is the spectrometer operator’s interpretation of the data. Typically, interpretation of mass spectra has been the realm of professional chemists and biochemists. Other operators must rely on computer classification of the TOF mass spectrometer’s output. We describe algorithms that can be used to interpret mass spectra and that have been successful on a limited data set. These algorithms handle precisely known, and partially unknown, signatures. For precisely known signatures, a vector space problem can be formulated to estimate the optimum approximation of the measured spectrum with a combination of stored library signatures of threat agents. For partially unknown signatures, a Bayesian probabilistic approach has been taken to relate the potentially variable signature of a bacterial threat to likelihoods of chemical composition of bacterial lipids. (Keywords: Computer classification, MALDI, Mass spectrum.) INTRODUCTION The goal of the APL Miniature Time-of-Flight (TOF) Mass Spectrometer Program is to produce a field-portable device (see McLoughlin et al., this issue). This instrument is intended to be used by military or emergency civilian personnel to detect the presence of chemical and biological warfare (CBW) agents while there is still time to minimize their effects. The ability to rapidly detect and classify chemical or biological threats is critical to the safety and effective- ness of military forces and civilian populations. Key to this mission is computer-assisted interpreta- tion of the large quantity of data produced by the mass spectrometer. The complex output must be made acces- sible to operators who have no background in mass spectrometry. The software being designed to accom- plish this part of the mission is designated the Threat Identification System (TIDS). Challenges to the development of an automated threat identifier are: Possibility of highly noisy and cluttered background

Transcript of Computer-Assisted Interpretation of Mass Spectra · JOHNS HOPKINS APL TECHNICAL DIGEST, VOLUME 20,...

COMPUTER-ASSISTED INTERPRETATION OF MASS SPECTRA

U

Computer-Assisted Interpretation of Mass Spectra

Carleton S. Hayek, Fernando J. Pineda, Otis W. Doss III, and Jeffrey S. Lin

nder Defense Advanced Research Projects Agency sponsorship, APL isdeveloping a miniature time-of-flight (TOF) mass spectrometer for early warningagainst exposure to chemical/biological agents. Intended for operation by a wide rangeof military and civilian personnel, the instrument must be able to detect and identifypathological agents within minutes. Key to this mission is the spectrometer operator’sinterpretation of the data. Typically, interpretation of mass spectra has been the realmof professional chemists and biochemists. Other operators must rely on computerclassification of the TOF mass spectrometer’s output. We describe algorithms that canbe used to interpret mass spectra and that have been successful on a limited data set.These algorithms handle precisely known, and partially unknown, signatures. Forprecisely known signatures, a vector space problem can be formulated to estimate theoptimum approximation of the measured spectrum with a combination of stored librarysignatures of threat agents. For partially unknown signatures, a Bayesian probabilisticapproach has been taken to relate the potentially variable signature of a bacterial threatto likelihoods of chemical composition of bacterial lipids. (Keywords: Computerclassification, MALDI, Mass spectrum.)

INTRODUCTIONThe goal of the APL Miniature Time-of-Flight

(TOF) Mass Spectrometer Program is to producea field-portable device (see McLoughlin et al., thisissue). This instrument is intended to be used bymilitary or emergency civilian personnel to detect thepresence of chemical and biological warfare (CBW)agents while there is still time to minimize their effects.The ability to rapidly detect and classify chemical orbiological threats is critical to the safety and effective-ness of military forces and civilian populations.

JOHNS HOPKINS APL TECHNICAL DIGEST, VOLUME 20, NUMBER 3 (

Key to this mission is computer-assisted interpreta-tion of the large quantity of data produced by the massspectrometer. The complex output must be made acces-sible to operators who have no background in massspectrometry. The software being designed to accom-plish this part of the mission is designated the ThreatIdentification System (TIDS). Challenges to thedevelopment of an automated threat identifier are:

• Possibility of highly noisy and cluttered background

1999) 363

C. S. HAYEK ET AL.

• Complexity of agent molecules (hence, complexity ofmass spectral signatures), combined with similarity oftheir basic bio-organic chemistry

• Variability of bacterial molecular signatures as a resultof growth conditions and incompletely understoodionization physics

• Need for the identification algorithm to perform athigh probability of detection with low probability offalse alarm, while being able to analyze low concen-trations of agents or agent mixtures

The TOF mass spectrometer may be deployed whenthe exact signature masses of the pathological agentsare well characterized (e.g., from intelligence collectionof weaponized agents) or when they are only approx-imately characterized. With detailed prior knowledge,the agent can be identified by pattern matching to apreconstructed library. With only approximate priorknowledge, a broader approach may be required; forexample, the processor may apply a set of bio-organicmolecular consistency rules for agent identification.We report here on progress toward accomplishing thegoal of computer-assisted spectrum interpretation.

MASS SPECTROMETER OPERATIONAND SIGNATURES OF SELECTEDORGANIC COMPOUNDS

Figure 1 shows a simplified diagram of a TOF massspectrometer. The sample is introduced in the inlet,which is under vacuum. Matrix-assisted laser desorp-tion/ionization (MALDI) is used to ionize the sample,typically resulting in a net single charge on the mol-ecule. Ionized sample molecules, or ionized moleculefragments, are then accelerated by a potential differ-ence into the analyzer section. Because the fragmentshave equal kinetic energies, their velocities differ in

364 JOH

proportion to the square root of their mass ratios. Frag-ments traveling at their respective speeds will arrive atdifferent times at the detector. Voltage output from thedetector will increase with increasing number of frag-ments arriving simultaneously, thereby indicating theabundance of fragments of each mass.1 (Fragments ofthe same mass but multiply charged will not arrivesimultaneously; this, however, can be accounted for inthe abundance calculation.)

Example spectra of CBW agents are shown in Fig.2. Note the characteristic pattern of the relative inten-sities along the mass axis for each agent. For testingpurposes, simulants for CBW agents are used as sam-ples. Example spectra for simulants prepared at APL areshown in Fig. 3. Ongoing signature characterizationstudies will determine the variability of these signaturesas a function of growth and environmental conditions,sample preparation, and mass spectrometer configura-tion and operating parameters.

METHODS FOR AGENTIDENTIFICATION

We have designed two mass spectrum signal process-ing algorithms, corresponding to the two identificationscenarios mentioned earlier. One is a multivariate lin-ear least-squares regression of the unknown spectrum toa spectra library, and the second is a belief networkcapable of classifying organic substances on the basis oftheir chemical (i.e., phospholipid) content.

Multivariate Least-Squares RegressionIn the first identification scenario, the field operator

has measured a sample and must determine whether themass spectrum matches one threat signature or a com-bination of threat signatures stored in a library.

Reflectron

Ion detector

Iontrack

Laserbeam

Airsampler

Ionizationlaser

Mass spectral biosignature

Metastableions

Inte

nsity

Mass/charge ratio, m/z

Vacuumpumping

Parention

Laserbeam

Bio-agent

Desorbedbiomarkers

Laser-absorbing

matrix

Control andacquisitionelectronics

MALDI details

Figure 1. Miniature time-of-flight mass spectrometer biodetector system (MALDI = matrix-assisted laser desorption/ionization).

NS HOPKINS APL TECHNICAL DIGEST, VOLUME 20, NUMBER 3 (1999)

COMPUTER-A

100

50

0

32962.463940.1

Ricin

100

50

00 50,000 100,000 150,000

Mass-to-charge ratio, m/z

5229.3Rel

ativ

e in

tens

ity

2704.8

7752.2

15586.1Yersinia pestis

Figure 2. Preliminary mass spectrum signatures of ricin (toxin made from castor beans)and Yersinia pestis (gram-negative organism that causes plague).

The mass spectrum can be specified by a multirow,two-column table. The first column is “m/z” (massdivided by charge, where charge is typically 1). Thesecond column is “intensity,” i.e., the magnitude of thevoltage induced by the ions on the multichannel iondetector plate in the TOF mass spectrometer. When themass value sequence is the same for a group of spectra,they may be represented by their intensity columnsalone, i.e., they can be represented as “intensity vec-tors.” In this section we assume that the exact massesof the predominant fragments of the threat agents areknown, unique, and stored in a library as intensityvectors.

Previous work on this problem has demonstratedthat mass spectra of mixtures of substances (mixturesof “analytes”) result in combinations of their respectivemass spectra. Following the results of Platt et al.,2 wecast the threat identification problem as a vector spaceoptimization problem.

20

10

020

10

0100

50

0100

50

0

Rel

ativ

e in

tens

ity

103 104

Erwinia herbicola

B. globigii spores

Ovalbumin

MS-2 coat protein

Mass-to-charge ratio, m/z

Figure 3. Preliminary mass spectrum signatures of four threat agent simulants used fortesting purposes: Erwinia herbicola, a simulant for a vegetative bacterium such as plague;Bacillus globigii, a spore-forming bacterium similar to anthrax; ovalbumin, a chicken eggprotein developed as a simulant for toxins such as botulinium; and MS-2, a simulant forpathogenic viruses. The bottom two panels show peaks only.

JOHNS HOPKINS APL TECHNICAL DIGEST, VOLUME 20, NUMBER 3 (199

SSISTED INTERPRETATION OF MASS SPECTRA

In the noiseless case, the prob-lem is to determine optimumweighting coefficients bk such thatthe weighted sum of the intensityvectors in the library best matchthe unknown spectrum. (Theweighted sum will not exactlymatch the unknown spectrum be-cause of slight differences in theabsolute intensities of the librarysignature relative to a newly col-lected signature.) The approxima-tion is written as

b b b1 1 2 2L L L U+ + ≈K N N , (1)

where Lk is the abundance vector for library elementk, N is the number of elements (i.e., threat agents) inthe library, and U is the abundance vector forthe unknown. Non-zero values of bk indicate that atleast one signature line of library member Lk is presentin the sample, assuming a noiseless, interference-lessbackground.

Actual mass spectra intensity vectors are the resultof noisy measurements, with possible interference fromother substances in the environment. Noise impactsthe library elements, the unknown spectrum, and theapproach to identifying the optimum mixture coeffi-cients b.

Assuming that in Eq. 1 the model for the relation-ship between unknown and library elements is U = Lb+ j (with j being the noise in the measurement of U),then the solution for the minimum variance, unbiased

estimator b is given by the Gauss-Markov theorem3 as

9)

b = − − −( ) ,L K L L K UT T1 1 1

where T means transpose and K21 isthe covariance matrix of j.

The covariance matrix of b isgiven by

cov( ) ( ) .b = − −L K LT 1 1

The diagonal elements of cov(b)are the variances of the individualcomponents bk, and indicate theuncertainty in the estimatedweights of each library vectorcaused by the measurement noise.The weights b for the various

365

C. S. HAYEK ET AL.

(a)

(b)

library elements, together with thenumber of signature lines in eachlibrary element that appear inthe unknown (relative to totalnumber of lines in the respectivelibrary element), are presented tothe operator.

These computations for TOFmass spectrometer multivariate re-gression were implemented inTIDS Version 1 software. The soft-ware has been run on a 166-MHzPC desktop and a PC laptop, bothrunning Windows 95. Preliminarytests with the available data (thefour simulants Bacillus globigii,Erwinia herbicola, MS-2, and oval-bumin in a noiseless, interference-less environment) showed thateach simulant could be readilyidentified.

As an example, Fig. 4a shows an“unknown” mass spectrum (B. glo-bigii spectrum taken with a commer-cial mass spectrometer, the KratosMALDI 3) that is to be classified bythe processor. Previously stored inthe processor’s “threat” library weresignature lines from seven substanc-es derived from a training set ofspectra collected on an earlier date.Three lines in the unknown spec-trum correspond to the previouslyidentified B. globigii signature andare clearly visible in the rangearound 1100 m/z. Figure 4b showsthe detected lines overlaid on thethresholded spectrum. The proba-bility of false alarms (Pfa) for thethreshold process was set to 1024, or1 in 10,000. Also shown in Fig. 4b,within the box labeled “RelativeThreat Weights and ThresholdCrossings,” are the results of pro-cessing the detected peaks with theidentification algorithms. As thefigure illustrates, all three substancelines were detected and correctlyclassified as belonging to B. globigii.

Refinement of this approach will require additionaldata to characterize the signature intensity and noisedistributions. We can then invoke a match confidencemeasure we have derived4 to inform the operator of thelikelihood that the bk values presented are a result ofa true match of the library with the unknown or ofcoincidence given the noise environment.

Figure 4. (a) Mass sto the operator by thesignature classificatireader’s benefit and wunknown in Fig. 4a byderived from a separ

366

pectrum of an “unknown” (actually the simulant B. globigii ) presented APL TOF mass spectrometer TIDS after data collection and prior to

on processing. The red "Substance Lines" shown here are for theould not be available to the operator. (b) Correct classification of the multivariate regression to a library of stored mass spectra signaturesate training data set.

JOH

Probabilistic Graphical Modelingfor Phospholipid Identification

In the second identification scenario, the field op-erator has measured a sample and must determinewhether the mass spectrum just measured matches onethreat agent or a combination of threat agents based on

NS HOPKINS APL TECHNICAL DIGEST, VOLUME 20, NUMBER 3 (1999)

COMPUTER-ASSISTED INTERPRETATION OF MASS SPECTRA

models of the chemical composition of the threatagent. To demonstrate graphical modeling techniques,we are developing models for phospholipid analysis ofMALDI-TOF spectra. Membrane lipids are importantbiomarkers that have been used to classify bacterialspecies. In particular, desorption mass spectra obtainedfrom lysed bacteria have been used to distinguish bothGram’s stain and species.5,6

The most common phospholipids are composed ofa glycerol phosphate core, two fatty acids, and a polarhead group (Fig. 5). The composition of the polar headgroup determines the phospholipid class. Classifyingbacterial species by phospholipid analysis is challengingbecause the phospholipid content of a given species canbe quite variable. For example, the distribution of fattyacids in a given species depends on factors such asculture temperature and the growth phase during whichthe culture is harvested. The distribution of polar headgroups is much less sensitive to growth conditions andhas been used by itself to differentiate species.7,8

Nevertheless, it is far from clear whether phospholipidcontent alone constitutes sufficient statistics for bacte-rial classification. Thus, we are constraining our effortsto the characterization of phospholipids in mass spec-tra, and we defer to a later time the question of whetherphospholipid characterization provides sufficient statis-tics for bacterial classification.

Our approach is based on probabilistic graphicalmodels. This approach is well established in applica-tion areas with difficult data-analysis tasks such asspeech processing and multisensor fusion. Some well-known examples of graphical models are hidden Mark-ov models, influence diagrams, and Bayesian beliefnetworks.9 To illustrate the approach, consider Fig. 6.

O

CH O P O CH2 CH2 NH2

OHO O

C = O C = O

(CHx)n (CHx)n

H C H

H

H C H

H

CH2 CH2

Glycerol group

Phosphategroup

Polar head group (PE)

Fatty acid chains

Figure 5. A typical phospholipid consists of four groups. The glycerol group and thephosphate group are always the same. The variable parts are the fatty acid group and thepolar head group. The particular phospholipid shown is PE C(34:1). This notation indicatesthat the polar head group is phosphatidyl-ethanolamine (PE) and that there are 34 carbonsand 1 double bond in the fatty acid chains.

JOHNS HOPKINS APL TECHNICAL DIGEST, VOLUME 20, NUMBER 3 (19

This graph represents a joint distribution of randomvariables as the product of conditional distributions.Each vertex is associated with a conditional distribu-tion wherein the random variable associated with eachnode is conditioned on the random variables leadingto it via directed arcs. In particular, the simple four-layer graph in Fig. 6 represents the joint distributionfunction

P lipid H E( , , ),

where lipid is the random variable that represents thepresence or absence of the phospholipid in the spec-trum, E is the set of instantiated variables (variables atthe bottom the graph), and H is the set of so-calledhidden variables (all other variables). The variables inthe topmost layer are hidden variables that captureprior knowledge concerning the likelihood that a givenpolar head group (say, PE) and a given fatty acid group(say, C34:1) are found in the sample. The topmost layeris where we account for expert knowledge about growthconditions and polar head group distributions. The sec-ond layer contains the random variable lipid. This is thequery variable whose posterior conditional probability,P(lipid|E), we ultimately seek to evaluate. The vari-ables in the third layer are hidden variables that rep-resent the detectable species that could be formed fromthe fragments and adducts of the phospholipid. Figure6 models simple biochemical and detection processes.In particular, it models just one fragmentation pathway(wherein the polar head group detaches from the fattyacid group) and three adduct pathways (wherein asodium, potassium, or calcium ion bonds to the polarhead group). In a more realistic model, one would haveto account for all the likely fragments and adducts. The

variables in the bottommost layerare the evidence variables, whichrepresent features extracted from amass spectrum.

There are two classes of featuresin our simple model. The first are“detection” features, which dependon one detectable species each.They are set to the value “true” ifa line is detected in the spectrum atthe predicted mass. If no line isfound, the random variable is set tothe value “false.” Second are “rela-tive intensity” features, which de-pend on two detectable species andare set to the value “true” if the twoamplitudes of the detected lines arewithin acceptable ranges of eachother. If the relative amplitudes areoutside the acceptable ranges, thecorresponding relative intensityfeature is set to the value “false.” If

99) 367

C. S. HAYEK ET AL.

Figure 6. A graphical model representing the joint distribution associated with thephospholipid in Fig. 5, PE (C34:1). The joint distribution depends on 16 binary randomvariables. The topmost two variables correspond to the prior probabilities associated withthe occurrence of the PE polar head group and the C34:1 fatty acid chain. The single nodein the second layer is the one whose posterior probability we wish to calculate. The fivenodes in the third layer are used to represent the conditional probabilities of forminghydrogen (M + H), sodium (M + Na), or potassium (M + K) adducts, or of losing the polarhead group (M 2 phg), or losing the polar head group and picking up a hydrogen (M 2 phg+ H). The bottom layer consists of variables that represent the events of detecting lines atthe appropriate masses (Det) and variables that represent amplitude relationships be-tween detected lines (<).

Det

PE C34:1

PE (C34:1)

M – phg + H

Det < Det << DetDet

M + H M + Na M + K M – phg

Det

only one of the detectable species is detected, a relativeintensity feature cannot be calculated and, consequent-ly, the corresponding random variable cannot be set toany value and must be treated as a hidden variable.Hidden variables are the mechanism used by probabi-listic models for dealing with missing data.

Once the belief network is defined, we can use it tosolve the fundamental problem of data analysis, whichis to determine the likelihood of a conclusion given theevidence. In this case, the conclusion is whether thephospholipid in question is present (lipid = true) orabsent (lipid = false). Bayes’ rule tells us to evaluateP(lipid|E) in the following way:

P lipid E

P lipid H E

P lipid H EH

H lipid

( | )( , , )

( , , ).

,

=∑

∑ (2)

Figure 7 shows the architecture of a prototype systemwe have developed to perform these calculations. Weuse the bucket-elimination variant of the junction treealgorithm,10 which is related to the forward-backwardalgorithm commonly used in hidden Markov models.Figure 8 shows the interface presented to the user by theprototype system. The data in the file selected by theuser are plotted in the window at the upper right. The

368 JOHN

posterior probability P(lipid|E) iscomputed for several hundredphosopholipids and displayed inorder of decreasing probability, inthe lower right-hand window. Atrained analyst may step througheach of the candidate phospholip-ids and examine the detailed evi-dence used to calculate the cor-responding posterior probability. Agraphical representation of some ofthe evidence is overlaid on thedisplayed spectrum in Fig. 8.

The system we have developedis a prototype intended to explorethe feasibility of applying graphicalmodels to phospholipid analysis.We have completed the frameworkfor the system and are currentlyimproving both the preprocessingalgorithms and the graphical mod-el. To do the latter we are workingwith chemists to characterize theexpected pathways for fragmenta-tion and adduct formation. Initialresults suggest that probabilisticcalculations based on graphicalmethods are tractable and robust.As our understanding of the chem-

istry and measurement processes becomes more sophis-ticated, it is inevitable that our graphical models willbecome more complex. It is likely that the models willprove intractable for the junction tree algorithm. Tohandle intractable graphs, APL has used internal inde-pendent research and development resources to devel-op a powerful algorithm for approximate inference onhighly connected graphs.11

SUMMARYThe goal of computer-assisted identification of mass

spectra has been approached in the relatively low masschemical compound arena as demonstrated by commer-cially available programs such as the National Institutefor Standards and Technology (NIST) Mass Spectrom-eter Search Program12 and the Probability-BasedMatching Program.13 However, the high masses andcomplexity of CBW agent molecules relevant to theAPL TOF Mass Spectrometer Program, the differentionization technique (MALDI) used in the instrumentto emphasize the molecules’ unique signatures, andthe lack of the equivalent of the NIST/Wiley librariesrequire new approaches. Working with colleaguesat APL, the Army Medical Research Institute ofInfectious Diseases, the University of Maryland, Col-lege Park, and The Johns Hopkins University School

S HOPKINS APL TECHNICAL DIGEST, VOLUME 20, NUMBER 3 (1999)

COMPUTER-ASSISTED INTERPRETATION OF MASS SPECTRA

Rawdata

Interfacevalidator FFT FFT – 1 Peak

finderPeak

character Threshold Peaklist

Modelsourcecode

Lines andfeatures

Matchpeaks

and lines

Bucketelimination

Probabilities GUIGibbssampling

MFT

Completefeaturevector

Modelparameters

C + BBNcompiler

Graphicalmodel

Preprocessing

Graphicalmodel

Inference engines

or

Figure 7. Architecture of "pTool," the application that demonstrates the concepts described in the text. pTool has three major components.The first is the preprocessing component, which extracts a list of peaks from a raw spectrum. The second is the graphical modelingcomponent. Graphical models are described in a special-purpose modeling language that is implemented as an embedded language in"C." Finally, graphical models are evaluated by the inference engines, which use either exact (bucket elimination) or approximate (Gibbssampling) algorithms. We will shortly implement an approximate algorithm based on mean field theory (MFT). FFT = fast Fourier transform;GUI = graphical user interface.

of Medicine, we have developed two techniques forhandling either consistent or variable toxin and CBWagent signatures. As more findings regarding the agents’

Figure 8. This screen capture shows the graphical user interface presented to the user. Thebox at the top left allows the user to select a data file containing a spectrum (displayed atthe top right). Arrows mark the predicted positions of lines corresponding to specificphospholipids. In this case, the five lines correspond to the phospholipid PE (C34:1). Thelist of phospholipids at the lower left also gives the posterior probability that a givenphospholipid is actually present in the spectrum. The scatter plot at the bottom has onepoint per phospholipid and provides a quick look for comparing the posterior probabilitieswith a heuristic measure of relative abundance.

JOHNS HOPKINS APL TECHNICAL DIGEST, VOLUME 20, NUMBER 3 (1

signatures become available, their implications forcomputer-assisted spectrum identification will be fac-tored into algorithm development.

999)

REFERENCES1Bryden, W. A., Benson, R. C., Ecelberger, S. A.,

Phillips, T. E., Cotter, R. J., and Fenselau, C.,"The Tiny-TOF Mass Spectrometer for Chemi-cal and Biological Testing," Johns Hopkins APLTech. Dig. 16, 296–310 (1995).

2Platt, J. A., Uy, O. M., Heller, D. N., Cotter,R. J., and Fenselau, C., "Computer-Based LinearRegression Analysis of Desorption Mass Spectraof Microorganisms," Anal. Chem. 60, 1415–1419 (1988).

3Stark, H., and Woods, J., Probability, RandomProcesses, and Estimation Theory for Engineers,Prentice Hall, Englewood Cliffs, NJ (1994).

4Hayek, C. S., "Identification of Unknown MassSpectra Using Statistical Estimation," STX-97-097, JHU/APL, Laurel, MD (13 Jun 1997).

5Heller, D. N., Fenselau, C., Cotter, R. J.,Demirev, P., Olthoff, J. K., et al., "Mass-SpectralAnalysis of Complex Lipids Desorbed Directlyfrom Lyophilized Membranes and Cells,"Biochem. Biophys. Res. Commun. 142, 194–199(1987).

6Heller, D. N., Cotter, R. J., Fenselau, C., andUy, O. M., Anal. Chem. 59, 2806–2809 (1987).

7Lechevalier, M. P., "Lipids in BacterialTaxonomy—Taxonomists View," CRC Crit.Rev. Microbiol. 5, 109–210 (1977).

8Goldfine, H., "Lipids of Prokaryotes—Structureand Distribution," in Membrane Lipids ofProkaryotes, S. Razin and S. Rotern (eds.),Academic Press, New York, pp. 1–43 (1982).

9Jordan, M. I., Learning in Graphical Models, MITPress, Cambridge, MA (1999).

10Lauritzen, S. L., and Spiegelhalter, D. J., "LocalComputations with Probabilities on GraphicalStructures and Their Applications to ExpertSystems," J. Roy. Statist. Soc. Ser. B 50, 157–224(1988).

369

C. S. HAYEK ET AL.

11Pineda, F. J., and Wang, I.-J, “Phase-Space Field Theory for Sigmoid BeliefNetworks,” RSI-99-003, JHU/APL, Laurel, MD (1999).

12Stein, S. E., "Optimization and Testing of Mass Spectral Library SearchAlgorithms for Compound Identification," J. Am. Soc. Mass Spectrom. 5,859–866 (1994).

13McLafferty, F. W., Zhang, M. Y., Stauffer, D. B., and Loh, S. Y., "Comparisonof Algorithms and Databases for Matching Unknown Mass Spectra," J. Am.Soc. Mass Spectrom. 9, 92–95 (1998).

370 JOHN

ACKNOWLEDGMENTS: We would like to thank Mildred Donlon of DARPAfor the financial support to carry out this work, Harvey Ko and Wayne Bryden ofAPL for helpful engineering and biochemistry discussions, and Catherine Fenselauat the University of Maryland, College Park, and Robert Cotter of The JohnsHopkins School of Medicine for key criticisms and observations.

C. SCOTT HAYEK is a member of APL’s Principal Professional Staff. Hereceived B.S. and M.S. degrees in physics from the University of Maryland. Hejoined APL‘s Submarine Technology Department in 1978, working in oceanphysics and signal processing, and also worked for 3 years in the Naval WarfareAnalysis Department in radar signal intelligence. Mr. Hayek has been theprincipal investigator and test scientist for various sea tests studying theperformance limits of passive and active sonar. His current work is split betweenocean acoustic surveillance and counterproliferaton R&D. He is a member of theAcoustical Society of America. His e-mail address is [email protected].

THE AUTHORS

FERNANDO J. PINEDA is on the APL Principal Professional Staff and is amember of the System and Information Sciences Group of the Research andTechnology Development Center (RTDC). He earned B.S. and Ph.D. degrees inphysics from the Massachusetts Institute of Technology and the University ofMaryland, College Park, respectively. He serves as Program Manager for theApplied Mathematics Program in the RTDC and is a lecturer in the ComputerScience Department of The Johns Hopkins University. Dr. Pineda has researchinterests in neural computation, machine learning, analog VLSI, statisticalphysics, and quantum physics. He has served on the editorial boards of NeuralComputation, Applied Intelligence, Neural Networks, IEEE Transactions on NeuralNetworks, and the Johns Hopkins APL Technical Digest. His e-mail address [email protected].

OTIS W. DOSS III received his B.A. in mathematics from the University ofVirginia in 1980 and his M.S. in numerical science from The Johns HopkinsUniversity G. W. C. Whiting School of Engineering in 1984. Mr. Doss has beenworking at APL as a resident subcontractor since 1993 and currently is a memberof the Signal and Information Processing Group. Since coming to APL, he hasbeen involved in developing signal processing algorithms and software for bothactive and passive underwater acoustic applications. His current interests involvedeveloping graphical user interfaces for tactical display of passive underwateracoustic data. Mr. Doss is a member of Phi Beta Kappa. His e-mail address [email protected].

S HOPKINS APL TECHNICAL DIGEST, VOLUME 20, NUMBER 3 (1999)

COMPUTER-ASSISTED INTERPRETATION OF MASS SPECTRA

JEFFREY S. LIN received a B.S.E. degree in mechanical/aerospace engineeringfrom Princeton University in 1986 and an M.S. degree in computer science fromThe Johns Hopkins University in 1989. He is currently working on a doctoratein materials science and engineering at JHU. Mr. Lin is a member of the Systemand Information Sciences Group of the Research and Technology DevelopmentCenter, and develops systems for automated machinery diagnostics and nonde-structive evaluation of materials. His e-mail address is [email protected].

JOHNS HOPKINS APL TECHNICAL DIGEST, VOLUME 20, NUMBER 3 (1999) 371