DISCLAIMER

This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.


DISCLAIMER

Portions of this document may be illegible in electronic image products. Images are produced from the best available original document.


ANOMALY DETECTION APPLIED TO A MATERIALS CONTROL AND ACCOUNTING DATABASE*

Rena Whiteson, Lisa Spanks, Tresa Yarbro, Ferman Kelso, and Janet Zirkle

Los Alamos National Laboratory

Los Alamos, New Mexico 87545 USA

Chris Baumgart

AlliedSignal Inc., Kirtland Operations

Albuquerque, New Mexico USA

ABSTRACT

An important component of the national mission of reducing the nuclear danger includes accurate recording of the processing and transportation of nuclear materials. Nuclear material storage facilities, nuclear chemical processing plants, and nuclear fuel fabrication facilities collect and store large amounts of data describing transactions that involve nuclear materials. To maintain confidence in the integrity of these data, it is essential to identify anomalies in the databases. Anomalous data could indicate error, theft, or diversion of material. Yet, because of the complex and diverse nature of the data, analysis and evaluation are extremely tedious.

This paper describes our work in the development of analysis tools to automate the anomaly detection process for the Material Accountability and Safeguards System (MASS) that tracks and records the activities associated with accountable quantities of nuclear material at Los Alamos National Laboratory. Using existing guidelines that describe valid transactions, we have created an expert system that identifies transactions that do not conform to the guidelines. Thus, this expert system can be used to focus the attention of the expert or inspector directly on significant phenomena.

BACKGROUND

The Material Accountability and Safeguards System (MASS) is a dynamic database that tracks and reports the location, use, and status of all the nuclear material items residing at Los Alamos National Laboratory (LANL). MASS is a near-real-time database with terminals located at each site with nuclear material requirements and is the Laboratory's official nuclear material accountability record. It can manage diverse operations on a variety of nuclear materials. In the history of LANL, this system has evolved from pen and paper journals to the current computerized system.[1] The large amounts of data, and the complex and diverse nature of the data, make analysis and evaluation extremely difficult.[2-4] Nevertheless, the management of nuclear material is taking on more importance as the mission of the Department of Energy evolves from one of nuclear materials production to nuclear material management and storage. Our goal is to apply advanced artificial intelligence technologies to develop automated error and anomaly detection and database assessment tools.

*This work supported by the US Department of Energy, Office of Safeguards and Security.


APPROACH

Transaction Data

Our discussions with experts at the plutonium facility have revealed two distinct types of data: transaction data, which records movement of material from one material balance area (MBA) to another, and inventory data, which records physical location of material. Our first target is the transaction data. In earlier anomaly detection work, we trained neural networks to model and predict how a system should behave and, thereby, detect when an abnormal state or event occurred.[5] The neural network approach is effective for processes for which no clear model exists and other methods will not work. However, models of many of the MASS transactions exist in the form of Process Accounting Flow Diagrams (PAFDs). These PAFDs are flowcharts that guide the user on the proper way to enter a transaction into the MASS database. See Figure 1 for an example of a PAFD. Because these models of transactions exist, we determined that a rule-based expert system would be the most efficient and effective method of detecting anomalies in the transaction data.

[Figure 1: flowchart not recoverable from the scan. Legible labels include the feed tanks for the evaporator (solution held in T1, T2, T3, T4, T5, T7, T9, T10), incoming items with IDES L520 from process statuses LR, IX, and NR, the evaporator under account 745, and the filtrate, salt, and distillate outputs.]

Figure 1. The PAFD for the Evaporator Process (EV).

Inventory Data

In addition to the transaction records, there is inventory data. We intend to generate another expert system for this data. The rules against which these records will be checked will be specific to the inventory data. We believe that analysis of the inventory data will complement the analysis of the transaction data. It will be interesting to learn which types of anomalies will be detectable by both analysis methods and which by only one.

PROCESS ACCOUNTING FLOW DIAGRAMS

Each unique process has a PAFD that describes the transactions that are valid for that process. The PAFDs provide a basis for discriminating between normal and anomalous transactions.[6-7] These diagrams describe in a flowchart fashion each type of process used at LANL and identify the transaction information that is appropriate to a given process. Such information includes the sending and receiving processes, the sending and receiving accounts, the amount of nuclear material involved, the type of measurement technique used to assay or weigh the item, and other key identifiers that describe the nature of the nuclear material.

For example, Figure 1 shows the PAFD for the evaporator process (EV). The horizontal double lines near the top of the diagram indicate items entering the evaporator process. This PAFD indicates that processes LR (Lean Residue Ion Exchange), IX (Ion Exchange), or NR (Nitrate Recovery) are the only processes that can feed into EV. Near the bottom of the diagram are three sets of horizontal double lines, which indicate items exiting from the EV process to another process. CF (Cement Fixation), LR (Lean Residue Ion Exchange), DS (Ion Exchange), and IX (Ion Exchange) are the only processes to which EV can feed. The codes in the rectangular boxes identify the Item Description (IDES) that must be associated with each transaction. This PAFD shows that items coming into the EV process must have L520 in the IDES field.
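One way to picture the information a PAFD carries is as a small lookup structure. The sketch below encodes only the EV facts stated above (the feeding processes, the receiving processes, and the required incoming IDES). The class name `Pafd`, its field names, and the function `check_incoming` are illustrative assumptions and are not part of MASS or of the rulebase described in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pafd:
    """Hypothetical container for the facts a PAFD states about one process."""
    process: str
    feeds_from: frozenset   # processes allowed to send items in
    feeds_to: frozenset     # processes allowed to receive items
    incoming_ides: str      # IDES required on incoming items


# Values taken from the description of Figure 1 in the text.
EV = Pafd(
    process="EV",
    feeds_from=frozenset({"LR", "IX", "NR"}),
    feeds_to=frozenset({"CF", "LR", "DS", "IX"}),
    incoming_ides="L520",
)


def check_incoming(pafd, sending_process, ides):
    """Return a list of problems found for an item entering the process."""
    problems = []
    if sending_process not in pafd.feeds_from:
        problems.append(f"{sending_process} may not feed {pafd.process}")
    if ides != pafd.incoming_ides:
        problems.append(f"IDES {ides} is not {pafd.incoming_ides}")
    return problems
```

An empty list means the incoming item is consistent with the diagram; each string in a non-empty list names one mismatch.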

We used the PAFDs to establish a rulebase for transaction checking. The rulebase has been coded and tested, based on PAFDs. In addition, some of the specialized knowledge that experts utilize to identify errors in the MASS database has been incorporated into the rulebase.

RULES

We built our expert system using Exsys Professional, a commercial software package for developing expert systems. The expert system distinguishes between valid and invalid transactions. We have two sets of rules in the expert system. The first set is based on established criteria that apply to all transactions. For example, every transaction record must include a Lot ID. Any transaction lacking this field is incomplete and considered erroneous.

The second set of rules depends on the individual processes. Each individual Process Status, as described by its PAFD, has a set of criteria. An example would be a transaction recording material moving from Process Status Evaporator (EV) to another Process Status. As explained above, material moving from EV may be moved to only Cement Fixation (CF), Lean Residue Ion Exchange (LR), Ion Exchange (DS), or Ion Exchange (IX). If a transaction record indicates material moved from EV to Ion Exchange (RFX), for example, that would be a violation of the PAFD and therefore anomalous. For each Process Status, we are creating an expert system. Every transaction is checked against the general rules that apply to all transactions as well as against the rules that apply to the specific Process Status.
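The two rule sets can be sketched as two small functions whose results are combined. This is a minimal illustration in Python, not the Exsys Professional rulebase itself. The record field names (`lot_id`, `from_process`, `to_process`) are assumptions; only the Lot ID requirement and the EV destination list come from the text.

```python
# General rule from the text: every transaction record must include a Lot ID.
REQUIRED_FIELDS = ("lot_id",)

# Process-specific rule from the text: EV may feed only CF, LR, DS, or IX.
ALLOWED_DESTINATIONS = {"EV": {"CF", "LR", "DS", "IX"}}


def general_violations(record):
    """Rules that apply to all transactions."""
    return [f"required field missing: {name}"
            for name in REQUIRED_FIELDS if not record.get(name)]


def process_violations(record):
    """Rules that apply only to the sending Process Status."""
    sender = record.get("from_process")
    receiver = record.get("to_process")
    allowed = ALLOWED_DESTINATIONS.get(sender)
    if allowed is not None and receiver not in allowed:
        return [f"procedure violation: {sender} may not feed {receiver}"]
    return []


def evaluate(record):
    """Check a record against both rule sets and label it."""
    violations = general_violations(record) + process_violations(record)
    return ("anomalous" if violations else "valid", violations)
```

With these definitions, a record moving material from EV to RFX is labeled anomalous with a procedure violation, which mirrors the example in the paragraph above.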

Our approach is to view each PAFD as a tree with multiple branches. By analyzing the input fields in a transaction record, the expert system can determine which branch of the PAFD tree the transaction represents. For example, the Evaporator Process, as represented in the EV PAFD in Figure 1, has nine branches:

1. Tank to Tank
2. Tank to Evaporator
3. Filtrate branch out of evaporator
4. Salt branch out of evaporator
5. Distillate branch out of evaporator
6. Recycle distillate back to evaporator
7. Filtrate output to process CF
8. Salt output to process CF
9. Filtrate output to processes CF, LR, DS, or IX

Figure 2 shows the section of the EV PAFD that describes the sixth branch: recycle distillate back to evaporator. One of the characteristics of this branch is a Lot ID that contains the substring EVDIST. If no valid branch can be identified, then the transaction is considered anomalous. If there is a valid branch, then further checking is done. In Figure 2 the numbers inside the circle indicate that the only measurement code that is valid for this branch is TOO. If the transaction record indicates any other measurement code, then the transaction would be considered anomalous. The report would reflect that the transaction is part of the recycle branch but that the measurement code is not acceptable.
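The branch-then-check logic described above can be sketched as follows. Only the recycle branch is filled in, because it is the only branch whose identifying characteristics the text gives. The field names, the `BRANCHES` table, and the wording of the returned messages are assumptions made for illustration.

```python
# Each branch has a predicate that recognizes it and a set of valid
# measurement codes. The EVDIST substring and the TOO code come from the
# text; everything else here is a hypothetical layout.
BRANCHES = [
    {
        "name": "recycle distillate back to evaporator",
        "matches": lambda rec: "EVDIST" in rec.get("lot_id", ""),
        "valid_measurement_codes": {"TOO"},
    },
]


def classify(record):
    """Identify the branch, then check the measurement code for that branch."""
    for branch in BRANCHES:
        if branch["matches"](record):
            code = record.get("measurement_code")
            if code in branch["valid_measurement_codes"]:
                return f"valid: {branch['name']}"
            return (f"anomalous: {branch['name']} branch, "
                    f"measurement code {code} not acceptable")
    # No branch recognized the record.
    return "anomalous: no valid branch identified"
```

A fuller table would carry one entry per branch of the tree, so that the first matching entry decides which further checks apply.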

RESULTS

Our expert system analyzed transaction data from March 1994 through April 1995 and generated reports that identify each transaction by a unique ID number and indicate whether it is valid or anomalous. A total of 757 transactions were evaluated. Of those transactions, 153 were judged to be anomalous. The types of anomalies include required field missing, invalid field entry, procedure violation, and a valid transaction that was not defined on the PAFD.

Some of the anomalies detected can be found by manual examination. However, some were of a type not being detected by current methods.

CONCLUSIONS AND FUTURE GOALS

The chief advantages of the expert system are the ease of development and maintenance and the accessibility of the underlying logic.

Figure 2. The recycle distillate back to evaporator branch of the evaporator process.

An expert system can provide information about a transaction other than whether it is valid or anomalous. It can evaluate a transaction and indicate how well it matched an allowable transaction, and it can report which rule(s) have been violated.
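As a rough illustration of reporting both a degree of match and the violated rules, the sketch below scores a record against one allowable-transaction template. The scoring formula (fraction of template fields satisfied) is an assumption chosen for simplicity, as are the field names; the paper does not say how its expert system measures the match.

```python
def match_report(record, template):
    """Compare a record with one allowable-transaction template.

    Returns the fraction of template fields the record satisfies and the
    names of the fields it violates.
    """
    violated = [field for field, allowed in template.items()
                if record.get(field) not in allowed]
    matched = len(template) - len(violated)
    return matched / len(template), violated


# Hypothetical template for items entering EV. The allowed values come from
# the PAFD description in the text; the field names are assumptions.
incoming_ev = {
    "from_process": {"LR", "IX", "NR"},
    "to_process": {"EV"},
    "ides": {"L520"},
}
```

A record that matches every field but carries the wrong IDES would score two thirds and list `ides` as the violated field.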

Clustering Multivariate Data

Another method we can apply to our data is cluster analysis. For multivariate data, each transaction or observation includes the value of several variables. For each transaction we examine 10 fields, so we are using 10 variables to describe each transaction. We would like to determine if, based on these 10 variables, the transactions can be grouped into two clusters: valid and anomalous. Cluster analysis is an exploratory technique that can lead to enhanced understanding of the processes generating high-dimensional data. It can also lead to practical ways to develop classification systems in cases where the classes are not known in advance, but rather are defined as a result of the clustering. The objective in cluster analysis is often to identify two groups from a sample set. There are two broad approaches to cluster analysis: hierarchical and non-hierarchical. Hierarchical clustering begins by assuming each example belongs to a distinct cluster and then merges clusters until only a few clusters remain. Non-hierarchical clustering includes methods such as k-means and, more generally, model-based methods. The k-means method divides the examples into k groups by assigning each example to the group whose mean vector is nearest. Model-based methods assume that each cluster generates data according to its own distinct distribution. We want to choose a number of clusters that best represents the data in some sense.
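To make the k-means idea concrete, here is a minimal sketch of the standard iteration (often called Lloyd's algorithm) in plain Python. It is not the analysis planned for MASS: the data are made-up two-variable points rather than 10-field transaction records, the farthest-first choice of starting centers is an arbitrary deterministic choice for this sketch, and categorical transaction fields would first need some numeric encoding, which is not shown.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on equal-length numeric tuples."""
    # Deterministic start: the first point, then repeatedly the point that is
    # farthest from the centers chosen so far.
    centers = [tuple(points[0])]
    while len(centers) < k:
        centers.append(max(
            points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest mean.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: each center becomes the mean vector of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members)
                                         for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return labels, centers


# Made-up observations with two variables each.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9)]
labels, centers = kmeans(data, k=2)
```

On this toy data the three points near (1, 1) end up in one cluster and the two points near (8, 8) in the other. Whether real transactions separate into valid and anomalous groups this cleanly is exactly the open question the paragraph above raises.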

A system such as ours, which can evaluate and assess the integrity of data, has broad applicability to safeguards databases. In addition, the same algorithms can be applied to analyzing other kinds of data. Any venture that uses databases with large amounts of data can benefit from information about the quality of its data.

As the mission of nuclear materials facilities changes from processing to storage, expert systems like ours will become extremely important in verifying the integrity of the data describing material in storage.

Near-term plans include a regular schedule of analysis of MASS data. From this site-specific anomaly detector, we intend to extract those features that would be applicable to a generic safeguards anomaly detector and incorporate them into the Local Area Network Material Accountability System (LANMAS). LANMAS is a new generation nuclear materials accounting system for Department of Energy sites.[8] LANMAS has been designed to accept site-specific error checking and error handling functions. Inclusion of technology to validate MC&A data will make LANMAS more useful. We hope to assist sites in developing anomaly detection systems for their MC&A databases. In doing so, we intend to leverage work already done, use local expertise, and use the LANMAS connection whenever possible.

Options for site implementation include integrating the anomaly detection functions into LANMAS, developing the anomaly detector as a stand-alone system, and integrating the error detection model into existing on-site systems.

REFERENCES

1. Bruce H. Erkkila and Ferman Kelso, “MASS - An Automated Accountability System,” Nucl. Mater. Manage. XXIII, 300-303 (1994).

2. R. C. Bearse, D. G. Shirk, R. S. Marshall, and C. C. Thomas, Jr., “Systems Concepts for DOE Facilities: Analysis of PF/LASS Data,” Los Alamos National Laboratory report LA-9270-MS (June 1982).

3. R. C. Bearse, N. J. Roberts, and V. L. Longmire, “Extrinsic and Intrinsic Complexities of the Los Alamos Plutonium Facility,” Nucl. Mater. Manage. XIV, 295-300 (1985).

4. J. T. Markin, R. B. Strittmatter, H. S. Vaccaro, and W. J. Whitty, “Concepts for Advanced MC&A Systems,” Nucl. Mater. Manage. XV(2), 48-54 (January 1987).

5. R. Whiteson and J. J. Britschgi, “Data Requirements for an Anomaly Detector in an Automated Safeguards System Using Neural Networks,” Nucl. Mater. Manage. XXII, 83-87 (1993).

6. R. Whiteson, Chris Baumgart, Thomas W. Tunnell, and Ferman Kelso, “An Anomaly Detector Applied to a Materials Control & Accounting System,” Nucl. Mater. Manage. XXIII, 318-323 (1994).

7. R. Whiteson, F. Kelso, C. Baumgart, and T. W. Tunnell, “The Effectiveness of a Processing Facility Plant Model,” Los Alamos National Laboratory, Safeguards Systems Group report NIS-7/94-957 (September 1994).

8. Joe Claborn and Al Alvarado, “LANMAS Core: Update and Current Directions,” Nucl. Mater. Manage. XXIII, 313-317 (1994).