
Nuclear Instruments and Methods in Physics Research A 652 (2011) 29–32


Global parameter optimization for maximizing radioisotope detection probabilities at fixed false alarm rates

David Portnoy, Robert Feuerbach, Jennifer Heimberg

Johns Hopkins University Applied Physics Laboratory, 11100 Johns Hopkins Road, Laurel, MD 20723, USA

Article info

Available online 7 February 2011

Keywords:

Radiological and nuclear detection

Global optimization

Receiver operating characteristic

Homeland security

0168-9002/$ - see front matter © 2011 Elsevier B.V. All rights reserved.

doi:10.1016/j.nima.2011.01.149

Corresponding author. E-mail address: [email protected] (D. Portnoy).

Abstract

Today there is a tremendous amount of interest in systems that can detect radiological or nuclear threats. Many of these systems operate in extremely high throughput situations where delays caused by false alarms can have a significant negative impact. Thus, calculating the tradeoff between detection rates and false alarm rates is critical for their successful operation. Receiver operating characteristic (ROC) curves have long been used to depict this tradeoff. The methodology was first developed in the field of signal detection. In recent years it has been used increasingly in machine learning and data mining applications. It follows that this methodology could be applied to radiological/nuclear threat detection systems. However, many of these systems do not fit into the classic principles of statistical detection theory because they tend to lack tractable likelihood functions and have many parameters, which, in general, do not have a one-to-one correspondence with the detection classes. This work proposes a strategy to overcome these problems by empirically finding parameter values that maximize the probability of detection for a selected number of probabilities of false alarm. To find these parameter values, a statistical global optimization technique that seeks to estimate portions of a ROC curve is proposed. The optimization combines elements of simulated annealing with elements of genetic algorithms. Genetic algorithms were chosen because they can reduce the risk of getting stuck in local minima. However, classic genetic algorithms operate on arrays of Boolean values or bit strings, so simulated annealing is employed to perform mutation in the genetic algorithm. The presented initial results were generated using an isotope identification algorithm developed at Johns Hopkins University Applied Physics Laboratory. The algorithm has 12 parameters: 4 real-valued and 8 Boolean. A simulated dataset was used for the optimization study; the "threat" set of spectra contained 540 SNM and industrial signatures, and the "benign" set of spectra contained 240 NORM and medical signatures. As compared to a random parameter search, the statistical optimization was able to find parameters that yield significantly higher probabilities of detection for all probabilities of false alarm from 0 to 0.1 (and equal probabilities of detection for probabilities of false alarm greater than 0.1), in a relatively small number of iterations. The number of iterations used, 1000, is also many fewer than would be required for a reasonable systematic search of the parameter space.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

Today there is a tremendous amount of interest in systems that can detect radiological or nuclear threats. Many of these systems operate in extremely high throughput situations where delays can have a significant negative impact. Thus, calculating the trade-off between detection rates and false alarm rates is critical for their successful operation. Receiver operating characteristic (ROC) curves have long been used to depict this tradeoff. The methodology was first developed in the field of signal detection [1]. In recent years it has been used increasingly in machine


learning and data mining applications [2]. It follows that they could be applied to radiological/nuclear threat detection systems. However, typical isotope identification algorithms used by many radiological/nuclear threat detection systems do not neatly fit into the principles of statistical detection theory. They tend to lack tractable likelihood functions and have many parameters. Empirical ROC curves can be used to assess the performance of systems without tractable likelihood functions [3], but often these systems are further complicated by the fact that they tend to have many parameters, which, in general, do not have a one-to-one correspondence with the classes that are to be identified. In these cases the best strategy seems to be finding those parameter values that maximize the probability of detection (Pd) for a selected number of probabilities of false alarm (Pfa). This work proposes a strategy to overcome these problems by empirically finding parameter values that maximize the probability of detection for a selected number of probabilities of false alarm.

While ROC curves can be used to compare and evaluate the performance of classifiers, the main thrust of this work is to generate graphs that can be used for parameter selection. Assuming that when Pfa decreases the best Pd associated with that Pfa also decreases, a set of parameters for several Pfa values is desired such that the number of expected false alarms can be set according to various operating conditions. Thus, the objective of this work is to define a methodology that finds settings for the parameters of isotope detection systems which yield the best detection probability for several predefined false alarm probabilities.

2. Global parameter optimization

Given enough time and resources, an estimation of all (Pd, Pfa) pairs for an algorithm's entire parameter space at a reasonable granularity may be generated. These estimates can then be displayed on a ROC scatter-plot. In general, each Pfa value will have several corresponding Pd values. For this work we are interested only in the highest Pd and its associated parameter values for a given Pfa. It is important to note that interpolation between points on this graph is liable to result in incorrect conclusions, because it cannot be assumed that an interpolation between parameter values will generate the interpolated (Pd, Pfa) pair.
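This "best Pd per Pfa" reduction of the scatter-plot can be sketched as follows. This is an illustrative helper, not code from the paper; the function name `roc_upper_envelope` is an assumption:

```python
# Sketch: given a scatter of (Pfa, Pd) estimates, keep only the best Pd
# observed at each distinct Pfa value (the upper envelope of the scatter).
from collections import defaultdict

def roc_upper_envelope(pairs):
    """pairs: iterable of (pfa, pd) tuples; returns sorted [(pfa, best_pd)]."""
    best = defaultdict(float)
    for pfa, pd in pairs:
        best[pfa] = max(best[pfa], pd)
    return sorted(best.items())

points = [(0.01, 0.60), (0.01, 0.72), (0.02, 0.80), (0.05, 0.90)]
print(roc_upper_envelope(points))  # [(0.01, 0.72), (0.02, 0.8), (0.05, 0.9)]
```

Note that, as the text cautions, the envelope points should not be interpolated between, since intermediate parameter values need not produce intermediate (Pd, Pfa) pairs.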

For algorithms with more than a handful of parameters, performing this exhaustive search of the parameter space becomes virtually impossible. Take for example an algorithm with 10 parameters. If we plan to examine a relatively modest 5 values for each parameter, we have 5^10 combinations of parameters for which Pd and Pfa values must be estimated. Now if each Pd and Pfa value requires a conservative 500 samples to compute, and each sample requires 1 s to process, the time to generate this very granular estimate of the (Pd, Pfa) pairs would be (5^10)(500)(1) ≈ 4.9×10^9 CPU-seconds, or about 154.8 CPU-years. Even if one were to go through this exercise, it is questionable whether such a granular estimate would even be useful. Thus we are left with two options: a guided approach where domain experts and algorithm designers make their best guesses as to the sub-regions of the parameter space to explore, or casting the problem into a global optimization task and using machine-learning techniques to perform the optimization. While exploring the parameter space using domain experts and algorithm designers is useful, it can lead to the exclusion of productive portions of the parameter space. The work described here uses a machine-learning method applied to the optimization problem with an aim towards reducing the time required to perform the search while at the same time reducing the bias that may be introduced by domain experts and developers.
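The back-of-envelope estimate above can be reproduced directly:

```python
# Cost of the exhaustive search described in the text:
# 10 parameters x 5 values each, 500 samples per (Pd, Pfa) estimate, 1 s/sample.
combinations = 5 ** 10                       # 9,765,625 parameter combinations
cpu_seconds = combinations * 500 * 1
cpu_years = cpu_seconds / (365 * 24 * 3600)  # using 365-day years
print(f"{cpu_seconds:.3g} CPU-seconds ~ {cpu_years:.1f} CPU-years")
# 4.88e+09 CPU-seconds ~ 154.8 CPU-years
```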

Since only the maximum Pd for some predefined set of Pfa values (Pfa targets) is desired, the problem can be cast into one of global optimization. For this we require an objective function and an algorithm to perform the optimization. The following objective function is proposed:

f(Pd, Pfat, Pfae) = (1 − m)(1 − Pd) + m |(Pfae − Pfat) / Pfat|

where

m(Pfat, Pfae) = 1 − exp(−(Pfae − Pfat)² / (2σ²))

Here Pfat is the probability of false alarm target, Pfae is the probability of false alarm estimated from the output of the identification algorithm run on the test data, and Pd is the probability of detection estimated from the output of the identification algorithm run on the test data. Because the optimization is attempting to find both the maximum Pd and the target Pfa, the function adjusts the weights of these depending on how far the estimated Pfa is from the target Pfa. Thus, when there is a large difference between the estimated Pfa and the target Pfa, the function is dominated by the term representing the difference between these two values. As the estimated Pfa gets closer to the target Pfa, the function becomes dominated by the Pd term. The mixing function m that is used to control the transition from Pfa to Pd is an unnormalized Gaussian, which has a single parameter σ. Just as the standard deviation in a Gaussian controls the spread of the distribution, the parameter σ in the mixing function controls how close the maximized Pd needs to be to the target Pfa. Smaller σ values will force the maximized Pd values closer to the Pfa targets.
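The objective and mixing functions above translate directly into code. This is an illustrative sketch, not the authors' implementation; the function and argument names are assumptions:

```python
import math

def mixing(pfa_t, pfa_e, sigma=0.01):
    """Unnormalized Gaussian mixing weight m(Pfat, Pfae)."""
    return 1.0 - math.exp(-(pfa_e - pfa_t) ** 2 / (2.0 * sigma ** 2))

def objective(pd, pfa_t, pfa_e, sigma=0.01):
    """f = (1 - m)(1 - Pd) + m * |Pfae - Pfat| / Pfat   (lower is better)."""
    m = mixing(pfa_t, pfa_e, sigma)
    return (1.0 - m) * (1.0 - pd) + m * abs(pfa_e - pfa_t) / pfa_t

# When the estimated Pfa matches the target, m = 0 and only the Pd term
# remains; far from the target, m -> 1 and the Pfa penalty dominates.
print(objective(pd=0.9, pfa_t=0.01, pfa_e=0.01))  # ~0.1
print(objective(pd=0.9, pfa_t=0.01, pfa_e=0.05))  # ~4.0, penalty-dominated
```

The second call illustrates the transition: with Pfae four σ-widths away from the 0.01 target, the relative-error term |Pfae − Pfat|/Pfat swamps the detection term, so such parameter sets score poorly regardless of their Pd.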

While any stochastic optimization technique could be used with the aforementioned objective function, the optimization technique chosen for this work combines elements of simulated annealing [4] with elements of genetic algorithms [5,6]. Genetic algorithms were chosen because they can reduce the risk of getting stuck in local minima. However, classic genetic algorithms operate on arrays of Booleans or bit strings, so simulated annealing is employed to perform mutation in the genetic algorithm.

In the genetic algorithm vernacular, each target Pfa is assigned a population of chromosomes. Each chromosome is a set of parameter values; these parameters are the ones to be optimized. A population of chromosomes consists of the most "fit" parameter values as determined by the objective function described above. During one iteration of the optimization, a chromosome is randomly selected from one of the target Pfa populations. This chromosome is mutated and, with some probability, crossed-over with another chromosome. Simulated annealing is used to perform the mutation. New values (or mutated values) are selected from a normal distribution defined by the current value of the parameter, the parameter value range, and the current variance of the distribution. The variance is lowered as the "temperature" of the system decreases. This effectively reduces the size of the mutations as the system "cools" and allows "fit" chromosomes to converge on a solution. After mutation is performed, the newly mutated chromosome is crossed-over with another chromosome taken from one of the target Pfa populations with some probability (this probability also is reduced as the system cools). Crossover involves randomly choosing some portion of one chromosome and swapping it with some portion of another chromosome. This results in two new chromosomes that need to be evaluated for fitness. The fitness of the new chromosomes is evaluated for each target Pfa using the objective function. A chromosome that has a better fitness than any member of a target Pfa population is added to that population, and the least fit chromosome is removed from that population. Every time a new chromosome is added to a population, the temperature of the system is reduced, cooling it.
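The mutation and crossover operators described above can be sketched as follows. This is an illustrative, real-valued-only sketch under assumed names, not the authors' implementation; the paper's algorithm also mutates Boolean parameters, which is omitted here:

```python
import random

def mutate(chromosome, ranges, temperature):
    """Annealing-style mutation: perturb each real-valued parameter with a
    normal step whose spread shrinks as the system temperature falls."""
    out = []
    for value, (lo, hi) in zip(chromosome, ranges):
        step = random.gauss(0.0, temperature * (hi - lo))
        out.append(min(hi, max(lo, value + step)))  # clamp to the valid range
    return out

def crossover(a, b):
    """Single-point crossover: swap a random tail segment between two
    chromosomes, yielding two new chromosomes to evaluate for fitness."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

random.seed(0)
ranges = [(0.0, 1.0)] * 4
child = mutate([0.5, 0.5, 0.5, 0.5], ranges, temperature=0.1)
c1, c2 = crossover([0, 0, 0, 0], [1, 1, 1, 1])
```

In the full loop, the mutated/crossed-over chromosomes would be scored against each target Pfa with the objective function, and the temperature reduced each time a population accepts a new member.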

The optimization algorithm has several parameters dealing with mutation and crossover probabilities, control of the system temperature, and the σ of the objective function. The values selected were determined using a fake detection algorithm, which consisted of two functions that took 6 values and returned Pd and Pfa estimates. This allowed the optimization to be run extremely fast and the optimization algorithm parameters to be set using subjective evaluations. The optimization algorithm parameters used to generate the results shown in this paper are listed below. While they do work well for two isotope identification algorithms, the one described here and a PVT-based portal algorithm (results currently unpublished), it may be overly optimistic to assume that they will work well for all detection systems. A more rigorous determination of these optimization algorithm parameters is left to future work. The population size for each target Pfa was set to 10; the starting σ for mutation was 6 and was lowered to 0.02 in 50 temperature reduction increments; the probability of crossover started at 0.2 and was lowered to 0.01 in 250 temperature reduction increments; if a chromosome was selected for crossover, then the probability of crossover with a chromosome from a different population than its own was 0.5; and the objective function σ was set to 0.01 for all target Pfa.
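The text reports only the endpoints of the mutation schedule (spread lowered from 6 to 0.02 over 50 increments), not its functional form. A geometric decay is one common choice that reproduces those endpoints; the sketch below is an assumption, not the paper's stated schedule:

```python
# Hypothetical geometric cooling schedule matching the reported endpoints:
# mutation spread goes from 6 to 0.02 in 50 temperature-reduction steps.
s_start, s_end, steps = 6.0, 0.02, 50
factor = (s_end / s_start) ** (1.0 / steps)   # per-step multiplier (~0.892)
schedule = [s_start * factor ** k for k in range(steps + 1)]
print(round(schedule[0], 3), round(schedule[-1], 3))  # 6.0 0.02
```

The crossover probability (0.2 down to 0.01 in 250 increments) could be handled with the same kind of schedule.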

3. Results

The presented initial results were generated using an isotope identification algorithm similar to the one described by Portnoy et al. [7]. The only significant difference is the ability to use the non-negative least-squares (NNLS) algorithm [8] in addition to iterative QR factorization (IQRF) for the isotope template deconvolution. The algorithm has 12 parameters: 4 real-valued and 8 Boolean. The real-valued parameters can take on values in the range 0–1 inclusive. The Pfa targets used were 0.002, 0.005, 0.01, 0.02, and 0.04. A simulated dataset was used for the optimization study; the threat set of spectra contained 540 SNM and industrial signatures, and the benign set of spectra contained 240 NORM and medical signatures, which were injected onto real background spectra. Fig. 1 shows representative ROC scatter-plots for a random parameter search and an optimized parameter search run of 1000 iterations each. The plots show the optimization focusing its search in the most relevant portions of the space: the upper-left-hand corner in the area of low Pfa and high Pd. It can also be seen that the random parameter search was not able to find parameter values that achieve probabilities of detection that the optimized search can. For instance, the optimized search found parameter values that yield an estimated Pd of 0.92 for a Pfa of 0, while the best the random search could do for the same Pfa is a Pd of 0.76.

The plots shown in Fig. 1 are representative of 10 independent runs completed for both the optimization and the random searches. Because stochastic optimization algorithms cannot guarantee finding an optimum solution, it is useful to assess the repeatability of the solutions found. To this end an optimization was run for 7000 iterations, more than 5000 iterations after the best parameter sets (as defined by the objective function) for each target Pfa stopped changing. Then 95% confidence bounds were calculated for each maximum Pd for each estimated Pfa from 0.0 to 0.045 (the lower bound ranged from 0.90 to 0.98 probability of detection and the upper bound ranged from 0.94 to 1.00). For the 10 runs none of the optimized maximum Pd fell outside the 95% confidence bounds, and none of the random results fell inside the confidence bounds.

Fig. 1. Representative ROC scatter-plots. These plots were generated using a random parameter search and the optimized search for 1000 iterations. Plot a shows the results for the optimized search, plot b shows the results for the random search, plot c shows an expanded view of the upper-left portion of plot a, and plot d shows an expanded view of the upper-left portion of plot b. It can be seen that the ROC optimization does a good job of covering the entire space and focusing its search in the area of most interest: the upper-left portion of the plot. In addition, the optimization found much higher probabilities of detection than the random search in the region from 0 to 0.015 probability of false alarm.

4. Conclusions

This article describes a methodology for generating scatter-plots similar to classic ROC curves. The motivation for doing this is to gain an understanding of the tradeoffs between probability of detection and probability of false alarm for radiological and nuclear threat detection systems that do not fit nicely into statistical detection theory. The methodology relies on stochastic optimization and an objective function that attempts to maximize probabilities of detection for probabilities of false alarm that are arbitrarily close to some target probability of false alarm. While one cannot guarantee that these stochastic algorithms will not converge on local minima, the results from initial testing indicate that the optimization search covers a significant portion of the parameter space before focusing in on the regions of interest. While this article concentrated on applications to isotope identification algorithms, there is no reason why this methodology could not be applied to other detection systems that do not fit nicely into statistical detection theory and have many parameters.

Acknowledgement

Funding for this work comes from the US Department of Homeland Security Domestic Nuclear Detection Office.

References

[1] W.W. Peterson, T.G. Birdsall, W.C. Fox, Transactions of the IRE Professional Group on Information Theory PGIT 2–4 (1954) 171.
[2] T. Fawcett, Pattern Recognition Letters 27 (2006) 861.
[3] A.H. Najmi, S.F. Magruder, BMC Medical Informatics and Decision Making 5 (2005) 33.
[4] S. Kirkpatrick, C.D. Gelatt, M.P. Vecchi, Science 220 (1983) 671.
[5] J.H. Holland, Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, MIT Press, 1992.
[6] D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison Wesley Longman, 1989.
[7] D. Portnoy, P. Bock, P. Heimberg, E. Moore, Using ALISA for High-Speed Classification of the Components and their Concentrations in Mixtures of Radioisotopes, Proceedings of the SPIE 5541 (2004).
[8] C.L. Lawson, R.J. Hanson, Solving Least Squares Problems, Prentice-Hall, 1974 (Chapter 23).