
Optical implementations of radial basis classifiers

Mark A. Neifeld and Demetri Psaltis

We describe two optical systems based on the radial basis function approach to pattern classification. An optical-disk-based system for handwritten character recognition is demonstrated. The optical system computes the Euclidean distance between an unknown input and 650 stored patterns at a demonstrated rate of 26,000 pattern comparisons/s. The ultimate performance of this system is limited by optical-disk resolution to 10^11 binary operations/s. An adaptive system is also presented that facilitates on-line learning and provides additional robustness.

Key words: Neural networks, radial basis functions, pattern recognition, optical disk.

1. Introduction

We describe two optical architectures for the realization of distance-based classifiers; in particular, radial basis function classifiers. The first is an optical-disk-based implementation. The two-dimensional (2-D) storage format of the optical disk makes parallel access to data an attractive possibility. The optical disk can be thought of as a computer-addressed 2-D binary spatial light modulator (SLM) or storage medium with a space-bandwidth product (SBP) of 10^10 pixels. A number of potential applications that take advantage of these characteristics exist and have been discussed elsewhere in the literature.1,2 An optical-disk-based implementation of radial basis classifiers is quite natural owing to the large storage requirements typical of such pattern-recognition algorithms. In the system described here the optical-disk-based radial-basis-function classifier is demonstrated as a handwritten character recognition system. The second architecture is a parallel adaptive neural network that facilitates on-line learning and offers added robustness to noise and optical system imperfections.

2. Radial Basis Functions

The radial basis function (RBF) approach to pattern recognition differs from neural networks that are based on supervised, output error-driven learning

M. A. Neifeld is with the Department of Electrical and Computer Engineering, University of Arizona, Tucson, Arizona 85721. D. Psaltis is with the Department of Electrical Engineering, California Institute of Technology, Pasadena, California 91125.

Received 21 May 1992.
0003-6935/93/081370-10$05.00/0.
© 1993 Optical Society of America.

algorithms such as back error propagation (BEP) in a number of respects.3,4 It is typical for RBF-based systems to incur short learning times while requiring rather large network realizations. This approach therefore bears some similarity to memory-intensive sample-based systems such as K-nearest neighbor (KNN) classifiers.5 In such systems, learning time and learning algorithm complexity are traded for classification time and memory requirements. The motivation for using a RBF network to perform pattern-recognition tasks comes from the relatively well established mathematical framework associated with regularization theory and hypersurface reconstruction.6 In hypersurface reconstruction the problem is to construct an approximate function f̂(w, x), which takes a vector x into a prescribed output f(x). The vector w is a parameter vector used to tune the estimate f̂. For simplicity we consider one-dimensional (1-D) outputs only. In order to construct f̂ we provide a set of training samples {x_i, f(x_i); i = 1, . . . , M} taken from f(x) (i.e., the underlying hypersurface to be approximated). The problem then reduces to the choice of the form of f̂ and the appropriate parameters w, such that f̂(w, x_i) = f(x_i) for i = 1, . . . , M. This problem is identical to the pattern-recognition problem in which one is given a set of training patterns and is asked to find a classifier f̂ with the appropriate parameters w, such that the resulting machine classifies the training set correctly. In both cases we desire that future samples be mapped correctly and that the system behave well in the presence of noise. In order to obtain these desirable characteristics in hypersurface reconstruction, a criterion of smoothness is often placed on the estimator f̂.


The RBF approach may be derived as the optimal solution to the regularized problem for a specific smoothing operator.7 The RBF solution defines an approximating function f̂(w, x) as a weighted sum of radially symmetric basis functions in R^N.

Given a training set X = {x_i, f(x_i); i = 1, . . . , M} comprising a set of M points {x_i ∈ R^N; i = 1, . . . , M} and the values of the unknown function f(x) at those points, we see that the RBF approach specifies an estimator as

\[
\hat f(w, x) = \sum_{i=1}^{M} a_i \exp(-|x - t_i|^2/\sigma_i^2), \qquad (1)
\]

where the centers or templates {t_i}, the widths {σ_i}, and the weights {a_i} comprise the parameter vector w = {t_i, σ_i, a_i; i = 1, . . . , M} and are determined from the training set.

The RBF classifier seeks to approximate the underlying function as a sum of Gaussian bumps. According to the above expression f̂ comprises M of these bumps, each centered at t_i with width σ_i and weighted by a_i to form the final output. We may estimate the parameters w from the training set such that f̂(x_i) ≈ f(x_i) by using any number of supervised and/or unsupervised algorithms.3,7
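As a point of reference (not part of the original paper), the following short Python sketch evaluates the estimator of Eq. (1) for a given parameter set; the array names are illustrative.

```python
import numpy as np

def rbf_estimate(x, centers, widths, weights):
    """Evaluate Eq. (1): f(w, x) = sum_i a_i * exp(-|x - t_i|^2 / sigma_i^2).

    x       : (N,) input vector
    centers : (M, N) array of templates t_i
    widths  : (M,) array of sigma_i
    weights : (M,) array of a_i
    """
    d2 = np.sum((centers - x) ** 2, axis=1)   # squared distances |x - t_i|^2
    y = np.exp(-d2 / widths ** 2)             # RBF unit responses
    return np.dot(weights, y)                 # weighted sum (linear output unit)
```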

The RBF approach may also be considered as a neural-network architecture, as shown in Fig. 1. We define the RBF unit shown in Fig. 1(a) as a neuron, with a response given by

\[
y = \exp(-|x - t_i|^2/\sigma_i^2),
\]

where t_i is called the neuron center and σ_i the neuron width. These units are depicted in the second layer of Fig. 1(b). The output layer of the RBF network consists of a single linear unit whose output is simply the weighted sum of its inputs. The overall network mapping then conforms to Eq. (1), as desired. In Fig. 2(a) we show a RBF network for estimating a function of two input variables, and in Fig. 2(b) we depict an example of an input-space configuration of the mapping induced by such a network. The dots in Fig. 2(b) represent the training samples, and the dashed circles represent the e^(-1) contours of the four Gaussian basis functions used to construct the RBF network. As a specific example of training such a network, we used a k-means algorithm with k = 4 to determine the centers of the basis functions.8 This procedure results in determination of the four centers, shown as asterisks in the figure. In order to determine the widths associated with each center, we used a KNN algorithm. The five nearest neighbors to each center were chosen, and the average of these five distances was used as σ_i for the associated bump. Note that these procedures result in the determination of the centers t_i and the widths σ_i in a completely unsupervised fashion. In this way the first layer of a RBF network may be trained without using an error-driven procedure, thereby reducing training time. Training of the output layer can be accomplished through the use of either a mean-square error minimization procedure (e.g., adaline) or a relatively simple perceptron learning algorithm.9
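A minimal sketch of this unsupervised first-layer training, assuming the training samples are the rows of an array X; the plain Lloyd's k-means and the five-nearest-sample width rule below are our illustrative reading of the procedure described above.

```python
import numpy as np

def kmeans_centers(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means: return k cluster centers of the training samples X (M, N)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return centers

def knn_widths(centers, X, k=5):
    """sigma_i = average distance from center t_i to its k nearest training samples."""
    d = np.sqrt(((X[None, :, :] - centers[:, None, :]) ** 2).sum(-1))  # (n_centers, M)
    d.sort(axis=1)
    return d[:, :k].mean(axis=1)
```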

Fig. 1. (a) Definition and schematic of RBF units (top) and linear units (bottom). (b) General RBF network.

3. Radial-Basis-Function-Based Handwritten Character Recognition

In this section we describe the implementation of the RBF classifier trained to solve a handwritten character recognition problem. We consider the ten-class problem of identifying handwritten digits 0-9. Using a SUN3/60 workstation, several authors were asked to draw the numerals 0-9 on a 16 × 16 grid. The resulting database of 950 images (95 per class) was randomly separated into a 300-element testing set and a 650-element training set, forming our reference library. Examples of characters from the training and testing sets are shown in Fig. 3.

In order to provide both shift and scale invariance, we first preprocessed both training and testing sets so that each 16 × 16 image was centered (by repositioning each character within the 16 × 16 grid such that the number of blank rows or columns of pixels was the same on all sides of the character) and scaled to a 10 × 10 window (by stretching each character such that its maximum extent is 10 pixels).


Fig. 2. (a) RBF network for estimating a scalar function of two variables. (b) Example input-space configuration resulting from the network shown in (a).

Following this preprocessing, we unraster the 10 × 10 pixel input field to form a 100-bit binary vector, and each such vector t_i corresponding to each of the 650 preprocessed training or reference images is stored on the optical disk as a radial line. For each vector t_i we also store its complement t̄_i in the adjacent position. This method of encoding permits us to simulate bipolar templates, using disks that store binary, unipolar reflectivity values. The template pixel size in this experiment is chosen to be 177 tracks by 116 pixels along the track. Track-to-track spacing is 1.5 µm, and pixel separation is approximately 1.0 µm. This storage scheme permits us to record 1376 templates per disk.
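A rough sketch of the preprocessing and complement encoding described above, assuming each character arrives as a 16 × 16 binary array; the nearest-neighbor resampling used for the 10 × 10 scaling is a simplification of the experimental procedure.

```python
import numpy as np

def preprocess(char16):
    """Crop a 16x16 binary character to its bounding box, stretch it to a 10x10
    window, and unraster it into a 100-bit vector (simplified sketch)."""
    rows = np.any(char16, axis=1)
    cols = np.any(char16, axis=0)
    crop = char16[rows][:, cols]                        # tight bounding box
    r = (np.arange(10) * crop.shape[0] / 10).astype(int)  # nearest-neighbor resampling
    c = (np.arange(10) * crop.shape[1] / 10).astype(int)
    win = crop[np.ix_(r, c)]
    return win.reshape(100).astype(np.uint8)            # 100-bit binary vector

def disk_record(t):
    """Store a template and its complement in adjacent radial lines, which lets
    binary, unipolar disk reflectivities emulate bipolar templates."""
    return t, 1 - t
```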

The architecture we implemented is shown in Fig. 4. The preprocessed 100-bit binary vector x is presented to the system shown in Fig. 4, and the first layer of RBF units computes the RBF projections y_i = exp(-|x - t_i|^2/σ_i^2). We choose to use as RBF centers {t_i} all 650 reference images of the training set. This choice of centers also facilitated an earlier KNN-based handwritten character recognition system that has been reported elsewhere.10 After the RBF projections are calculated in the middle layer, this 650-dimensional intermediate representation is then transformed by using the interconnection matrix W to arrive at a ten-dimensional output representation as shown. Each output neuron corresponds to one of the ten classes, and a winner-take-all network then performs the classification.

Fig. 3. Examples of handwritten numerals from the training set (top) and the testing set (bottom) used in the optical RBF experiment.

Since we choose to use the entire 650-template training set as RBF centers, the only iterative learning required for the first layer of this network is for the widths {σ_i}. Of course, the second layer must also be trained to perform the desired classification on the resulting RBF representations.
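A compact sketch of the forward pass of the network in Fig. 4, with the 650 training images as centers and a winner-take-all decision over the ten output neurons (array shapes and names are illustrative).

```python
import numpy as np

def classify(x, T, sigma, W):
    """x: (100,) input; T: (650, 100) centers; sigma: (650,) widths; W: (10, 650) weights."""
    y = np.exp(-np.sum((T - x) ** 2, axis=1) / sigma ** 2)  # 650 RBF projections
    outputs = W @ y                                         # ten class scores
    return int(np.argmax(outputs))                          # winner-take-all classification
```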

There are many potential training algorithms for {σ_i} and W.


Fig. 4. RBF network for handwritten digit recognition.

The most successful algorithm we found for computing the widths {σ_i} was to make σ_i proportional to the distance between template t_i and its nearest neighbor. That is,

\[
\sigma_i = \bar\sigma \min_{j \neq i} |t_i - t_j|,
\]

where the proportionality constant σ̄ is selected a priori. Training of the output layer was most successful when W was initialized with a binary address algorithm and then trained by using the perceptron learning algorithm. The binary address algorithm does not require specific knowledge of the intermediate representations generated during training; it only requires knowledge of the class assignment of each of the 650 RBF centers. This reduces second-layer computation time and improves network performance. The binary address algorithm defines the initial W as

\[
W_{ij} = \begin{cases} 1 & \text{if } t_j \in \Omega_i \\ 0 & \text{otherwise,} \end{cases}
\]

where Ω_i denotes the set of templates assigned to class i.

Following this initialization, we use the perceptron algorithm to incorporate detailed knowledge of the training representations into the output-layer weights.
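The width rule, the binary address initialization, and the perceptron refinement might be prototyped as follows; the multiclass perceptron update shown is one common variant and is our assumption, since the exact update used is not spelled out.

```python
import numpy as np

def nn_widths(T, sigma_bar):
    """sigma_i = sigma_bar * distance from template t_i to its nearest other template."""
    d = np.sqrt(((T[:, None, :] - T[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(d, np.inf)
    return sigma_bar * d.min(axis=1)

def binary_address_init(labels, n_classes=10):
    """W_ij = 1 if template j belongs to class i, 0 otherwise."""
    labels = np.asarray(labels)
    W = np.zeros((n_classes, len(labels)))
    W[labels, np.arange(len(labels))] = 1.0
    return W

def perceptron_refine(W, Y, labels, epochs=10, eta=0.1):
    """Refine W with perceptron-style updates on the RBF representations Y (M, 650)."""
    for _ in range(epochs):
        for y, c in zip(Y, labels):
            pred = int(np.argmax(W @ y))
            if pred != c:              # reinforce the correct class, penalize the winner
                W[c] += eta * y
                W[pred] -= eta * y
    return W
```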

Using these procedures for training the RBF network, we have in computer simulation a best RBF performance of 89%, as shown in Table 1. Although the trend with increasing σ̄ is an improvement in network performance, in general we found that the broader the basis functions, the longer the perceptron algorithm takes to converge. For this reason, Table 1 does not contain any entries for σ̄ > 1.2. We note here that the best RBF network performance of 89% is substantially better than the best KNN system performance of 83% obtained using the same template library. This performance can also be compared with a single layer of ten neurons, each trained with the perceptron algorithm by using the 650-image reference library.

The recognition rate in this case is 75% on the 300-element testing set. In general, we would expect an improvement in RBF network performance with variable centers in which M, t_i, and σ_i are all optimized. This case is not studied here since we are interested primarily in the performance of the optical implementation.

In Fig. 5 we show the RBF widths computed by using the procedure described above. Each row in the figure represents the widths associated with centers in a single class. There are therefore 65 blocks per row and 10 rows in the figure. Each block is a gray-scale coding of the width associated with the corresponding template, with dark being zero width. With this encoding, each row of the figure corresponds to the values of σ_i for templates in a single class. Note that the second row in Fig. 5, corresponding to handwritten 1's, is particularly dark, which indicates that these vectors tend to be well clustered or are in general located close to other vectors. Also in Fig. 5 we see that the width associated with one particular template representing a handwritten 6 is quite broad, indicating that this vector is basically isolated in the input space. With the same display format as that in Fig. 5, Fig. 6 shows the ten weight vectors of the second layer generated for the best RBF network. The single bright row in each weight vector indicates that the weight vector is tuned to intermediate representations from essentially one class.

4. Radial-Basis-Function Optical Classifier

A schematic of the optical system used to compute the distance between an unknown preprocessed input image and each template stored on the disk in the format described above is shown in Fig. 7.

Table 1. Classification Results Obtained with a One-Nearest-Neighbor Rule for Training the Radial-Basis-Function Widths

σ̄      Training Set^a    Testing Set^b
0.5     650               226
0.7     650               257
1.0     650               266
1.2     650               267

^a Number of elements correct out of a 650-element set.
^b Number of elements correct out of a 300-element set.

Fig. 5. RBF widths computed using the one-nearest-neighbor rule with σ̄ = 1.2.


Fig. 6. Second-layer weights computed with the perceptron algorithm after initialization with the binary address algorithm. The weights for neurons 1-10 appear consecutively from left to right, top to bottom.

In this architecture an Epson liquid-crystal television is used as a 1-D SLM to present the unrastered input character to the system. An image of the input vector is formed as a radial line on the disk as shown, and the total diffracted intensity is collected by the output lens and measured with a Photodyne 1500XP detector. The detector output represents the inner product between the input vector and the illuminated template vector. The postprocessing system for this experiment consists of two parts. First, a sample-hold circuit is used to detect the peaks of the raw detector output. The amplitudes of these peaks represent the desired inner products. The sample-hold circuit is clocked by a signal that is phase locked to the sector markers that are recorded on the disk; the markers appear as 32 bright radial lines and provide a strong diffracted signal. The second stage of postprocessing consists of an analog-to-digital converter board in an IBM PC computer followed by software that implements the nonlinearity of the second layer and computes the final output.

The 650 reference images were preprocessed as described above and stored on the disk along with their complements as 100-bit binary vectors. With a disk rotation rate of 20 Hz, these 1300 vectors were processed at a rate of 26,000 inner products/s, equivalent to 2,600,000 binary operations/s. It should be pointed out that this relatively slow processing speed arises from a severe underutilization of disk capacity.

In this experiment a large template pixel size was used (177 tracks by 116 pixels along the track) in order to provide alignment simplicity. A system that utilizes the minimum disk resolution of roughly 1-µm pixels, together with a disk rotation rate of 100 Hz, would achieve an inner-product rate of 10^7/s, corresponding to a raw processing rate of 10^11 binary operations/s. The 300 testing images were preprocessed as described in Section 3 and stored in an IBM PC computer, which drove the liquid-crystal television and provided input vectors to the system. An example of the raw detector output for the all-1's input vector is shown in Fig. 8. The two tallest peaks in this trace correspond to sector markers on the disk and represent the inner product between the all-1's vector and itself. From these data we can calculate the effective brightness per input pixel measured at the detector as 0.6 nW. This value is in good agreement with the known optical losses in the system. The other peaks in Fig. 8 provide normalization data that are stored in memory and read out during postprocessing. The IBM PC samples the inner-product signal once per peak, averages four rotations worth of data (total acquisition time 0.2 s), and computes the Euclidean distances from the inner products as

\[
|x - t_i|^2 = |x|^2 + |t_i|^2 - 2\,x \cdot t_i,
\]

where x is the unknown input image and t_i is a stored template. Since our optical system actually measures t_i · x and t̄_i · x, we may form the distance for binary vectors as

\[
|x|^2 = x \cdot \mathbf{1} = x \cdot (t_i + \bar t_i), \qquad \mathbf{1} = (1, 1, \ldots, 1),
\]

so that

\[
|x - t_i|^2 = |t_i|^2 + x \cdot \bar t_i - x \cdot t_i.
\]

Once again, |t_i|^2 for i = 1, . . . , 650 is stored in normalization memory and read out during the postprocessing stage.
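In code, the postprocessing step that recovers the Euclidean distances from the two measured inner products and the stored normalization values might look like this sketch (variable names are ours).

```python
import numpy as np

def distances_from_inner_products(p, p_bar, t_norm2):
    """Recover |x - t_i|^2 from the optically measured inner products.

    p       : (M,) measured t_i . x
    p_bar   : (M,) measured t_i_bar . x  (complement templates)
    t_norm2 : (M,) stored |t_i|^2 read from normalization memory
    """
    # For binary x: |x|^2 = x.(t_i + t_i_bar) = p + p_bar, so
    # |x - t_i|^2 = |t_i|^2 + p_bar - p
    return t_norm2 + p_bar - p
```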

The optical-disk-based inner-product calculations are collected by the postprocessing system that computes the required Gaussian weighting and simulates the output layer in which a classification is made.

Fig. 7. Optical system used to compute the distance between an input and an array of stored templates.

Fig. 8. Example of raw detector output indicating the optically computed inner products.


These postprocessing steps were carried out off line in software for our experiments. The classification rate for the optical RBF system was 83%. This is compared with a recognition rate of 79% obtained using an optical KNN network based on the same template data. A comparison between the performances of the optical system and a computer simulation is shown in Table 2. The table entries indicate the number of correct classifications out of 30 for each of the 10 classes of characters 0-9. The various noise sources in the optical system result in a 6% loss of recognition rate. In order to better understand the effect of these imperfections on the RBF network performance, we constructed a computer model that incorporates error sources such as finite contrast, nonuniform illumination profile, detector noise, and quantization noise. Using values for the error variables as measured from the optical apparatus, we found that nonuniformity of the illumination profile was the limiting factor in our experiment. A plot of classification rate versus the log of the 1/e^2 Gaussian profile width is shown in Fig. 9. We can see from Fig. 9 that for the measured profile parameter value of 1.8, the expected recognition rate drops to 86%. This rate is then the noise-limited optical system performance and is close to the experimentally demonstrated 83%. The cumulative effect of these errors can be measured a second way, directly from the distance calculations. In Fig. 10 we show the 650 distances computed for a single input image (a handwritten 3), obtained by using both the ideal computer simulation and the optical system. From the figure we see that there is a substantial variation between these two plots. This variation can be quantified by computing the rms distance error over the entire testing set as

Table 2. Performance Comparison between Optical RBF Classifier and Simulation^a

Class    Experiment    Simulation
0        25            29
1        28            29
2        28            28
3        25            27
4        28            23
5        20            25
6        25            24
7        23            28
8        24            25
9        22            29
Total    248           267

Overall recognition rate    83%    89%

^a The table elements are the number of correct classifications out of 30 for each class.

\[
\Delta D_{\mathrm{rms}} = \left[\frac{1}{M} \sum_{i=1}^{M} \left(d_i^{\mathrm{opt}} - d_i^{\mathrm{sim}}\right)^2\right]^{1/2},
\]

where d_i^sim and d_i^opt are the Euclidean distances between the 300 input images and the 650 templates calculated from the simulation and the optical system, respectively. There are M = 195,000 such measurements in our case. For the results presented here the rms distance error was found to be ΔD_rms = 28.5%. Although this error is quite large, the recognition rate obtained by using the optical system is in satisfactory agreement with the expected rate, attesting to the robustness of the RBF approach.
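A sketch of the rms distance-error computation; expressing the result as a percentage of the rms simulated distance is our assumption, since the exact normalization behind the 28.5% figure is not stated.

```python
import numpy as np

def rms_distance_error(d_opt, d_sim):
    """Root-mean-square discrepancy between optical and simulated distances.

    d_opt, d_sim : arrays of the M = 195,000 distance measurements
    (300 test images x 650 templates).  Reporting the result relative to the
    rms simulated distance is an assumption made here.
    """
    rms_err = np.sqrt(np.mean((np.asarray(d_opt) - np.asarray(d_sim)) ** 2))
    return 100.0 * rms_err / np.sqrt(np.mean(np.asarray(d_sim) ** 2))
```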

5. Parallel Optical Distance Computation

In order to provide additional robustness to the optical system as well as to increase the computation speed, we propose a parallel nondisk implementation. We observe from the previous discussion that the hardware implementation of a RBF network comprises two primary components: the subsystem for computing M parallel Euclidean distances and the basis-function evaluation subsystem. Shown in Fig. 11 is an optical system that can be used to obtain the required parallel distance computation for the case of binary vectors. A similar system can be used to compute the distances for continuous-valued vectors; however, we concentrate on the binary system for now. In Fig. 11 an N-dimensional binary vector x is represented as a vertical intensity array in the input plane, and each center t_i is stored in a vertical column of the t transparency shown. This system is dual rail since it requires x and t and their complements x̄ and t̄, respectively.

Fig. 9. Predicted recognition rate versus illumination profile width (model curves for contrast = 100 and contrast = 20).


Fig. 10. (a) Experimental and (b) actual distance versus template number for a single input image (handwritten 3). Template numbers 195-260 represent the class of handwritten 3's.

We now show that by using this representation, we can perform the distance computation entirely optically.

Given an input x and a center t_i, we can write the Euclidean distance between these two vectors as

\[
d_i = |x - t_i|^2 = \sum_{j=1}^{N} (x_j - t_{ji})^2 = \sum_{j=1}^{N} d_{ji}.
\]

Fig. 11. Parallel optical distance computer.

We can further write the componentwise distances d_{ji} in the binary case as the exclusive OR (XOR) of the component bits. That is,

\[
d_{ji} = x_j \bar t_{ji} + \bar x_j t_{ji}.
\]

Writing d_{ji} in this complement form makes the optical realization more clear. Returning to the system shown in Fig. 11, we see that light from the x SLM is collimated in the x direction and imaged in the y direction so that, immediately to the right of the transparency t, the componentwise product is formed between the input and all of the centers. That is, we generate the array {x_j t_{ji}; i = 1, . . . , M; j = 1, . . . , N}. Similarly, in the lower arm of the system the complement array {x̄_j t̄_{ji}; i = 1, . . . , M; j = 1, . . . , N} is formed, and these two arrays are simultaneously imaged upon a contrast-reversing SLM. This superposition combined with the contrast reversal yields the desired component distances d_{ji} to the right of the contrast-reversing SLM. A good candidate for this contrast-reversal SLM is the optically addressed ferroelectric liquid-crystal SLM.11

Returning once again to Fig. 11, we see that, after the bitwise XOR's are computed as described above, the desired array of distances is obtained by summing in the y direction with the cylindrical lens shown. A simple 1-D SLM can be used to represent the desired widths so that, immediately to the right of the output plane shown, we obtain the desired terms {|x - t_i|^2/σ_i^2; i = 1, . . . , M}.
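The computation performed by the system of Fig. 11 can be emulated numerically as follows for binary inputs and templates (a sketch; for binary vectors the bitwise XOR summed over components equals the squared Euclidean distance).

```python
import numpy as np

def parallel_binary_distances(x, T, sigma):
    """Emulate the Fig. 11 computation for binary vectors.

    x : (N,) binary input; T : (M, N) binary centers; sigma : (M,) widths.
    d_ji = x_j*(1 - t_ji) + (1 - x_j)*t_ji is the bitwise XOR, summed over j.
    """
    xor = x * (1 - T) + (1 - x) * T    # componentwise distances d_ji
    d = xor.sum(axis=1)                # cylindrical-lens sum over j: |x - t_i|^2
    return d / sigma ** 2              # width scaling applied by the 1-D SLM
```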

Although the system described above operates on 1-D arrays of data, a 2-D version, which is better suited for operating on image data, is also possible. This 2-D extension is straightforward and involves the use of lenslet arrays for accessing the spatially multiplexed template images stored in the t and t̄ planes. The details of the 2-D system as well as those of an extended 1-D system capable of operating on continuous-valued vectors are the subject of future research and are not discussed here; however, in order to indicate the expected performance limitations of these systems, we consider the 2-D binary version.


If we assume that the inputs to our system are 100 × 100 pixel images, then the number of centers in the RBF network is limited by the SBP of the optical system. Obtaining 900 centers requires a template mask with 3000 × 3000 pixels, which in turn demands an optical system with SBP = 9 × 10^6. This would be the largest feasible implementation. Notice that although the contrast-reversal plane must have a large SBP, the output plane requires a SBP equal to only the number of centers. This is an attractive characteristic of the present system, since on-line learning will require a programmable SLM in this plane for σ_i adaptation. In the present example this SLM would be required to have only 30 × 30 pixels.

6. Parallel Basis-Function Evaluation

Having defined the optical distance computer in Section 5, we turn our attention to the second primary component of the optical RBF implementation. This component performs the basis-function evaluation. Notice that the basis-function evaluation requires only point operations in the plane of distances. Further, if we consider the case of a single output neuron [i.e., f̂(x): R^N → R^1], then completion of the RBF computation after the distance computer requires point operations only, followed by a global sum. With this observation in mind we propose the optoelectronic postprocessing chip shown in Fig. 12.

The chip shown consists of an array of modules, each module comprising a photodetector to detect the output of the optical distance computer, analog multipliers to realize the required width and output weighting, and an exponentiation unit to realize the basis-function evaluation. The output of each such module is summed on a common line to generate the network response. We should note here that all the required functions in a module are compactly achievable by using analog very-large-scale integration (VLSI) or, if more precision is required, the photodetector may be followed by an analog-to-digital converter, and then each module could be implemented in digital electronics. The 2-D extension of this postprocessing chip is once again straightforward, and, since there are no intermodule communication requirements, connectivity issues in the 2-D arrangement do not arise. All computation in this postprocessing chip is local, excepting the final sum.
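Numerically, each module performs only a local point operation on its detected distance, and the chip output is the sum of the module outputs on the common line; the following sketch (ours, not from the paper) simply restates Eq. (1) in that module-plus-global-sum organization.

```python
import numpy as np

def module_output(d_i, inv_sigma2_i, a_i):
    """One postprocessing module: scale the detected distance, exponentiate, weight."""
    return a_i * np.exp(-d_i * inv_sigma2_i)

def chip_output(d, inv_sigma2, a):
    """Network response: all module outputs summed on the common output line."""
    return np.sum(module_output(d, inv_sigma2, a))
```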

Fig. 12. Optoelectronic postprocessing chip.

Also note that this implementation has the flexibility to permit the realization of a variety of different basis functions as well as to support a useful on-line learning algorithm, which is discussed further in Section 7.

The above all-electronic postprocessing chip is particularly well suited for the case of a network with one output only. For the case of multiple outputs we have two alternative systems. If the number of outputs is relatively small (~10), then the most attractive alternative is simply to use multiple postprocessing chips. This approach retains the simplicity and the flexibility of the VLSI implementation. Alternatively, if the number of outputs is large, we may consider a hybrid approach wherein each e^(-x) box in Fig. 12 is followed by a light-modulating element, permitting the exponentially weighted distances to be read out optically, with liquid-crystal modulators, for example.12 In this way an efficient optical implementation of the output layer is facilitated. This approach has the advantage of providing scalability in terms of output units while retaining much of the convenience of the VLSI implementation. In this system, update of the output weights during a learning cycle is done optically through the use of photorefractive holograms in the output layer.13

7. Learning

The effective implementation of iterative learning algorithms is a common stumbling block in both electronic and optical neural-network architectures. Here we suggest a learning algorithm for RBF networks that is suitable for implementation with the optoelectronic hardware we described. Associated with each of the postprocessing stages (i.e., all-electronic and hybrid) is an implementation of the on-line learning algorithm. We describe the all-electronic single-output implementation here.

Referring to Eq. (1) for the RBF network response function, we can define a criterion function for the goodness of an RBF network as

\[
E = \sum_{i=1}^{M} \left[\hat f(w, x_i) - f(x_i)\right]^2 = \sum_{i=1}^{M} E_i^2,
\]

where E_i is the error between the actual and desired network responses in the presence of training vector x_i only. E is the conventional sum-of-squared-error function evaluated over the entire training set. If we assume that the training set is fixed and that each training vector is used as a single RBF center as before, then the learning procedure reduces to finding {σ_i} and {a_i} to minimize the error E. A simple gradient descent procedure is a candidate algorithm for the minimization of E. Using this procedure, we can write expressions for the update of the network parameters σ_p and a_p in response to the error E_s measured for a single input training vector x^s. These are

\[
\Delta(1/\sigma_p)^2 = -\alpha_\sigma E_s |x^s - t_p|^2\, a_p \exp(-|x^s - t_p|^2/\sigma_p^2), \qquad (2)
\]


Fig. 13. On-line learning postprocessing module.

\[
\Delta a_p = -\alpha_a E_s \exp(-|x^s - t_p|^2/\sigma_p^2), \qquad (3)
\]

where α_σ and α_a are acceleration constants for the width and output weight updates, respectively.14 Notice that these expressions define a backward error propagation type of rule for RBF networks. In this learning algorithm, however, no special backward response function is required for the RBF units owing to the fact that the exp(x) function is its own derivative. All signals required to compute the updates defined in Eqs. (2) and (3) above are present in the forward path of the network. Furthermore, we observe that all required learning signals are present in the electronic portion of the proposed implementation and that no intermodule communication is necessary. The implication of these observations is that rapid, parallel update of all network parameters can be realized with a simple modification of the postprocessing module presented earlier. In Fig. 13 we show a block diagram of the modified postprocessing module. By incorporating the additional local connections shown and by adding accumulation registers to the (1/σ_i)^2 and a_i blocks, we can implement the on-line parallel RBF learning algorithm with little increase in overall circuit complexity over the nonadaptive system. In this way a 30 × 30 element adaptive postprocessing array, capable of facilitating on-line learning in a 900-center RBF network, should be possible.
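A software sketch of the single-sample update, written as plain gradient descent on E_s^2 with respect to (1/σ_p)^2 and a_p; learning-rate factors and sign conventions may therefore differ slightly from Eqs. (2) and (3) as printed.

```python
import numpy as np

def online_update(x_s, f_desired, T, inv_sigma2, a, alpha_sigma, alpha_a):
    """Single-sample update of the widths (stored as (1/sigma_p)^2) and output weights a_p."""
    d2 = np.sum((T - x_s) ** 2, axis=1)           # |x_s - t_p|^2, available in the forward path
    g = np.exp(-d2 * inv_sigma2)                  # RBF unit responses
    E_s = np.dot(a, g) - f_desired                # single-sample error (actual - desired)
    inv_sigma2 += alpha_sigma * E_s * a * d2 * g  # width update (d g / d u_p = -d2 * g)
    a -= alpha_a * E_s * g                        # output-weight update
    return inv_sigma2, a
```

All quantities used in the update are already present in the forward path, which is the property the text exploits for a parallel, intermodule-communication-free hardware implementation.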

8. Conclusions

We have demonstrated an optical system that can implement a RBF pattern classifier. The experimental system achieved a processing rate of 2,600,000 binary operations/s, corresponding to the computation of 13,000 Euclidean distances/s.

The capability of the optical-disk-based system is limited by the maximum length of the template vectors (~10^4 bits), the maximum number of template vectors (~10^5), and the maximum disk rotation rate (~100 Hz). These upper bounds correspond to a processing rate of ~10^11 binary operations/s.
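These figures combine as follows (a worked check of the quoted rates):

\[
1300\ \text{vectors} \times 20\ \text{Hz} = 26{,}000\ \text{inner products/s}, \qquad 26{,}000 \times 100\ \text{bits} = 2.6 \times 10^{6}\ \text{binary operations/s};
\]
\[
10^{5}\ \text{templates} \times 100\ \text{Hz} = 10^{7}\ \text{inner products/s}, \qquad 10^{7} \times 10^{4}\ \text{bits} = 10^{11}\ \text{binary operations/s}.
\]

Since each Euclidean distance uses a template and its complement, the demonstrated 26,000 inner products/s corresponds to the 13,000 distances/s quoted above.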

This system was trained off line with the handwritten numerals 0-9, and it achieved a recognition rate in computer simulation of 89% on a 300-element testing set. Similar performance (91% recognition rate) was achieved by using the same off-line training procedure in an RBF network with 2000 centers; the network was trained on segmented zip-code data obtained from the U.S. Postal Service database. The optical-disk-based 650-center system achieved a recognition rate of 83%. In this study it was found that factors such as nonuniform disk reflectivity, nonuniform illumination, and finite contrast were all significant contributors to a 28% rms error in the optical distance computation. Furthermore, since this large distance error resulted in only a 6% degradation in recognition performance, the RBF approach was seen to be robust in the presence of such errors.

We might expect an on-line learning scheme in which optical system imperfections are present during the learning phase to provide compensation for those imperfections and to result in a recognition rate closer to the simulation value. We have described such an adaptive optical RBF hardware implementation. If on-line learning, more careful system design, more powerful learning algorithms for learning the optimal center and width values, and a larger hidden layer (~2000 units) are combined, the optical system should be able to approach the 91% recognition rate obtained in simulation for the zip-code data.

This work was supported by the U.S. Army Research Office and the Defense Advanced Research Projects Agency. The authors thank Seiji Kobayashi of the Sony Corporation, Subrata Rakshit, and Alan Yamamura for assistance with the optical-disk recording system.

References

1. D. Psaltis, M. A. Neifeld, and A. A. Yamamura, "Image correlators using optical memory disks," Opt. Lett. 14, 429-431 (1989).
2. D. Psaltis, M. A. Neifeld, A. A. Yamamura, and S. Kobayashi, "Optical memory disks in optical information processing," Appl. Opt. 29, 2038-2057 (1990).
3. J. Moody and C. Darken, "Fast learning in networks of locally tuned processing units," Neural Computat. 1, 281-294 (1989).
4. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, D. E. Rumelhart and J. L. McClelland, eds. (MIT, Cambridge, Mass., 1986), Vol. 1, pp. 318-362.
5. T. M. Cover and P. E. Hart, "Nearest neighbor pattern classification," IEEE Trans. Inf. Theory IT-13, 21-27 (1967).
6. T. Poggio and F. Girosi, "A theory of networks for approximation and learning," AI Memo No. 1140 (MIT Artificial Intelligence Laboratory, Cambridge, Mass., 1989).
7. T. Poggio and F. Girosi, "Networks for approximation and learning," Proc. IEEE 78, 1481-1495 (1990).
8. J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, L. M. LeCam and J. Neyman, eds. (U. California Press, Berkeley, Calif., 1967), Vol. 1, pp. 281-297.
9. R. Duda and P. Hart, Pattern Classification and Scene Analysis (Wiley, New York, 1973), Chap. 5, pp. 141-147.
10. M. A. Neifeld, S. Rakshit, A. A. Yamamura, and D. Psaltis, "Optical disk implementation of radial basis classifiers," in Optical Information Processing Systems and Architectures II, B. Javidi, ed., Proc. Soc. Photo-Opt. Instrum. Eng. 1347, 4-15 (1990).
11. D. Jared, K. M. Johnson, and G. Moddel, "Joint transform correlator using an amorphous-silicon ferroelectric liquid-crystal spatial light modulator," Opt. Commun. 76, 97-102 (1990).
12. T. J. Drabik and M. A. Handschy, "Silicon VLSI ferroelectric liquid-crystal technology for micropower optoelectronic computing devices," Appl. Opt. 29, 5220-5223 (1990).
13. D. Psaltis, D. J. Brady, and K. Wagner, "Adaptive optical networks using photorefractive crystals," Appl. Opt. 27, 1752-1759 (1988).
14. M. A. Neifeld, S. Rakshit, and D. Psaltis, "Handwritten zip code recognition using an optical radial basis function classifier," in Applications of Artificial Neural Networks II, S. K. Rogers, ed., Proc. Soc. Photo-Opt. Instrum. Eng. 1469, 250-255 (1991).
