Implementation of Multispectral Image Classification
on a Remote Adaptive Computer
Marco A. Figueiredo a, Clay S. Gloster b, Mark Stephens c,
Corey A. Graves b, Mouna Nakkar
a SGT Inc., Code 564, NASA Goddard Space Flight Center,
Greenbelt Road, Greenbelt, Maryland, 20771
b Electrical and Computer Engineering, North Carolina State University
Box 7914, NCSU, Raleigh NC 27695-7914
c Code 585, NASA Goddard Space Flight Center,
Greenbelt Road, Greenbelt, Maryland, 20771
(Please send all correspondence to Clay Gloster)
Electrical and Computer Engineering, North Carolina State University
Box 7914, NCSU, Raleigh NC 27695-7914
E-mail: [email protected]
phone: (919) 515-7348
fax: (919) 515-2285
E-mail addresses of the other authors
Marco Figueiredo: marco@fpga.gsfc.nasa.gov
Mouna Nakkar: mouna@dbmoto.com
Mark Stephens: Mark.A.Stephens.1@gsfc.nasa.gov
Corey Graves: [email protected]
Abstract
As the demand for higher performance computers for the processing of remote sensing science algorithms increases, the need to investigate new computing paradigms is justified. Field Programmable Gate Arrays enable the implementation of algorithms at the hardware gate level, leading to orders of magnitude performance increase over microprocessor based systems. The automatic classification of spaceborne multispectral images is an example of a computation intensive application that can benefit from implementation on an FPGA-based custom computing machine (adaptive or reconfigurable computer). A probabilistic neural network is used here to classify pixels of a multispectral LANDSAT-2 image. The implementation described utilizes Java client/server application programs to access the adaptive computer from a remote site. Results verify that a remote hardware version of the algorithm (implemented on an adaptive computer) is significantly faster than a local software version of the same algorithm (implemented on a typical general-purpose computer).
I. INTRODUCTION
A new generation of satellites is being developed by the National Aeronautics and Space
Administration (NASA) to compose the Earth Observing System (EOS). The instruments aboard the EOS
satellites not only extend the observation life of the current satellites, but they also extend the
capabilities of remote sensing scientists to better understand the Earth's environment. Along with
the scientific advancements of the new missions, it is also necessary to explore new technologies
that facilitate and reduce the cost of the data analysis process. In order to process the high
volume of data generated by the new EOS satellites, NASA is constructing the Distributed Active
Archive Centers (DAACs), an extensive and powerful parallel computing environment. Scientists will
be able to request certain data products from these centers for further analysis on their own
computing systems. A new technology that could bring increased processing power to the scientist's
desk, offering more complex analysis and interpretation of remote sensed scientific data, is highly
desirable. The ultimate scenario would be for the scientist to request the data directly from the
satellite along with historic data from an archive center.
Field Programmable Gate Array (FPGA)-based computing, also known as "adaptive" or "reconfigurable
computing", has emerged as a viable computing option in computationally intensive applications.
These computing systems combine the flexibility of general purpose processors with the speed of
application specific processors. By mapping hardware to FPGAs, the computer designer can optimize
the hardware for a specific application, resulting in acceleration rates of several orders of
magnitude over general purpose computers. Because the FPGAs are personalized using SRAM-based
memory cells or a fuse programming technology, they can be reconfigured by the designer for other
applications.
Several reconfigurable computers have been implemented to demonstrate the viability of
reconfigurable processors [1], [2], [3], [4]. Applications mapped to these processors include:
pattern recognition in high-energy physics [5], applications in statistical physics [6], and genetic
optimization algorithms [7], [8]. In many cases [9], [10], [11], the reconfigurable computing
implementation provided the highest performance, in terms of execution speed. The advent of
reconfigurable processors along with novel methods for mapping applications onto adaptive or
reconfigurable processors enables a new computing paradigm that may represent the future for remote
sensing scientific data processing. In fact, many applications utilizing FPGA based computers have
been developed showing orders of magnitude acceleration over microprocessor based systems [12],
[13], [14]. Moreover, microprocessors and FPGAs share the same underlying technology, the silicon
fabrication process. Therefore, it is reasonable to conclude that FPGA based machines can usually
outperform microprocessor based systems by orders of magnitude [15], [16], [17].
To achieve such performance, the application must effectively utilize the available resources. This
presents a challenge for software designers, who are generally accustomed to mapping applications
onto fixed computing systems. Generally, the designers examine the available hardware resources,
then modify their application accordingly. With reconfigurable computers, the available resources
can be generated as needed. While it may seem that this flexibility would ease the mapping process,
it actually introduces new problems, such as what components should be used, and how many of each
component should be used to generate the best performance. With conventional hardware components,
these questions are less of an issue. In addition, software engineers are generally not adept at
hardware design. Thus, several research groups have developed methods for mapping applications to
reconfigurable processors [2], [18], [19], [20], [21].
The Adaptive Scientific Data Processing (ASDP) group at NASA's Goddard Space Flight Center (GSFC),
in conjunction with researchers at North Carolina State University, has been investigating the
utilization of FPGA based computing in the processing of remote sensing scientific algorithms. The
first prototype developed by the group utilized a commercial-off-the-shelf (COTS) reconfigurable
accelerator in the implementation of an automatic classifier for the LANDSAT-2 multispectral images
[22]. The implementation discussed in this paper is an extension of the original prototype that
allows users to classify the images on the accelerator from a remote site. Results indicate that a
remote implementation of the classifier in adaptive computing hardware is faster than a software
implementation that executes on a local high-end workstation.
This paper presents details of the FPGA design and is organized as follows. Section 2 describes
the classifier algorithm that utilizes a probabilistic neural network (PNN). The implementation of
the FPGA custom computing machine is then presented. Finally, a performance analysis of local and
remote versions of the algorithm is presented.
II. THE PNN MULTISPECTRAL IMAGE CLASSIFIER
Remote sensing satellites utilize multispectral scanners to collect information about the Earth's
environment [23]. The data collected by such instruments are a set of images, each corresponding to
one spectral band. A multispectral image pixel is represented by a vector of size equal to the
number of bands. The combination of the multiple spectrum measurements represented by each element
of the pixel vector determines a signature that corresponds to a physical object being viewed by the
satellite. Through the observation of a multispectral image and the comparison of pixel vectors to
those obtained from known locations (in situ measurements), a scientist is able to identify unique
signatures of physical objects and compose classes. These classes contain multispectral pixel
representations of physical objects on the earth that are closely related. Example classes include
forest, tundra, wetland, water, etc.
Several neural network schemes have been devised for the automatic classification of multispectral
images [24]. One in particular, the Probabilistic Neural Network (PNN) classifier [25], exhibits
acceptable accuracy, very small training time, robustness to weight changes, and negligible
retraining time. A description of the derivation of the PNN classifier and details of the network
implementation including rate of false alarms, neural network size, etc. are presented in Chettri
et al. [25]. The Blackhills (South Dakota, USA) data set was generated by the Landsat 2
multispectral scanner (MSS). The image's four spectral bands (0.5-0.6 µm, 0.6-0.7 µm, 0.7-0.8 µm,
and 0.8-1.1 µm) correspond to channels 4 through 7 of the Landsat MSS sensor. There are 262,144
pixels corresponding to a 512x512 pixel image size, and each pixel represents a 76m x 76m ground
area; the images were obtained in 1973. The ground truth was provided by the United States
Geological Survey.
Figure 1 illustrates the PNN classifier procedure. Each multispectral pixel, represented by a
vector, is compared to a set of pixels belonging to a class. A probability value is calculated for
each class. The highest value indicates the class into which the pixel fits. Eq. 1 is used to derive
a value that indicates the probability that the pixel fits into class $S_k$.
$$ f(X \mid S_k) = K1[k] \sum_{i=1}^{P_k} \exp\left[ -K2[k] \, (X - W_{ki})^T (X - W_{ki}) \right] \tag{1} $$
where $X$ is a pixel vector, $W_{ki}$ is the weight $i$ of class $k$, $d$ is the number of bands,
$k$ is the number of classes, $P_k$ is the number of weights per class, and $K1[k]$, $K2[k]$ are
constants.
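As an illustration of Eq. 1, the following is a minimal floating point sketch in Python. It is not the authors' C or Java code: the function names `pnn_score` and `classify` and the sample weights are hypothetical, and K1 and K2 are simply passed in as precomputed constants.

```python
import math

def pnn_score(x, weights, k1, k2):
    """Evaluate Eq. 1 for one class: K1 * sum_i exp(-K2 * |x - w_i|^2)."""
    total = 0.0
    for w in weights:
        # Squared Euclidean distance between the pixel and one weight vector.
        dist2 = sum((xi - wi) ** 2 for xi, wi in zip(x, w))
        total += math.exp(-k2 * dist2)
    return k1 * total

def classify(x, classes):
    """Return the index of the class with the largest Eq. 1 value.

    `classes` is a list of (weights, k1, k2) tuples, one per class.
    """
    scores = [pnn_score(x, w, k1, k2) for (w, k1, k2) in classes]
    return max(range(len(scores)), key=scores.__getitem__)

# Made-up example data: two classes, d = 4 bands, K2 = 0.125.
classes = [
    ([(10, 20, 30, 40), (12, 22, 28, 41)], 1.0, 0.125),
    ([(200, 180, 90, 60), (198, 182, 92, 61)], 1.0, 0.125),
]
```

A pixel close to the first set of weights, such as `(11, 21, 29, 40)`, is assigned to class 0 because its exponential terms dominate the sum.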
III. THE FPGA IMPLEMENTATION
The first step in implementing an application on an adaptive computer is to select the FPGA-based
custom coprocessor architecture that best matches the algorithm in question. At the current state of
the technology, certain FPGA architectures provide better performance than others for a particular
class of applications. A preliminary analysis of the PNN classifier indicated that the FPGA
architecture [26], shown in Figure 2, matched well with the algorithm. The selected FPGA
architecture is composed of a PCI bus based motherboard and up to 16 plug-in modules. These plug-in
modules each contain two Xilinx 4013E FPGA devices (XFPGA and YFPGA) and provide an equivalent of
13,000 gates per FPGA, or 26,000 gates per module. The design implementation required approximately
1160 CLBs (8.5% utilization) per FPGA. Since the module contained two FPGAs and two separate memory
modules (connected via the HBUS), we can perform two lookup table (LUT) operations simultaneously.
A. Algorithm partitioning
The computation intensive portion of the multispectral image classification algorithm found in
Eq. 1 was identified by profiling an implementation of the algorithm that was written using the C
programming language. This computation was selected to be executed on the FPGA coprocessor to
improve performance for the complete classification algorithm. The graphical user interface, data
storage, adaptive coprocessor initialization code, algorithm synchronization, and data I/O are
performed by the host processor. The compute intensive PNN classification algorithm equations were
mapped onto a single module.
Figure 3 illustrates the algorithm partitioning. The host processor displays the image during
classification. The host then sends a pixel vector to the FPGA coprocessor. Classification is
performed on the coprocessor and results are returned to the host to be displayed. The host also
computes the total time required to process a complete image. If we wish to use multiple modules as
coprocessors, the host schedules a pixel vector to be processed on each module in a round robin
fashion, then gathers the results as they become available.
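The round robin scheduling of pixel vectors across modules can be sketched as follows. This is only an illustration of the assignment order; the function name `schedule_round_robin` is hypothetical and no real driver calls are shown.

```python
from itertools import cycle

def schedule_round_robin(pixels, num_modules):
    """Assign each pixel vector to a module index in round robin order."""
    modules = cycle(range(num_modules))
    return [(next(modules), pixel) for pixel in pixels]

# Five pixel vectors spread over two modules: 0, 1, 0, 1, 0.
assignments = schedule_round_robin(["p0", "p1", "p2", "p3", "p4"], 2)
```

In a real host program the results would then be collected from each module as they become available, which this sketch leaves out.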
B. FPGA application design
Due to the limited number of gates available on a single FPGA, it was not feasible to use floating
point arithmetic in our implementation of the PNN algorithm. We therefore transformed the algorithm
to use fixed point arithmetic prior to hardware implementation. The width of the fixed point
datapath was determined by simulating variable bit operations in C and comparing the results
obtained from the original algorithm in floating point. Once the fixed point classification of the
Blackhills data set yielded exactly the same results as the floating point version, the data path
width for the FPGA implementation was known. (Since the output of the PNN classifier is simply a
4-bit value representing the class that matches the pixel, the fixed point version produced exactly
the same result as the floating point version. Hence there is no loss in precision due to
implementation using fixed point arithmetic.)
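The idea of choosing a datapath width by comparing fixed point and floating point classifications can be sketched as below. This is a deliberately simplified assumption-laden example: only the K2 constant is quantized (by truncation), whereas the authors simulated variable bit operations across the datapath in C. All names (`quantize`, `labels`, `min_frac_bits`) and the sample data are hypothetical.

```python
import math

def score(x, weights, k2):
    """Sum of exponentials from Eq. 1 for one class (K1 omitted)."""
    return sum(math.exp(-k2 * sum((a - b) ** 2 for a, b in zip(x, w)))
               for w in weights)

def quantize(value, frac_bits):
    """Truncate value to a fixed point number with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return math.floor(value * scale) / scale

def labels(pixels, classes, frac_bits=None):
    """Classify every pixel; quantize K2 when frac_bits is given."""
    out = []
    for x in pixels:
        scores = []
        for weights, k2 in classes:
            k = k2 if frac_bits is None else quantize(k2, frac_bits)
            scores.append(score(x, weights, k))
        out.append(scores.index(max(scores)))
    return out

def min_frac_bits(pixels, classes, max_bits=32):
    """Smallest fractional width whose labels match the floating point run."""
    reference = labels(pixels, classes)
    for bits in range(1, max_bits + 1):
        if labels(pixels, classes, bits) == reference:
            return bits
    return None

# Made-up data: with fewer than 3 fractional bits K2 = 0.125 truncates to 0.
classes = [([(0, 0, 0, 0)], 0.125), ([(10, 10, 10, 10)], 0.125)]
pixels = [(9, 9, 9, 9)]
```

With this toy data the search stops at 3 fractional bits, the first width at which K2 = 0.125 is representable and the labels agree with the floating point reference.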
Figure 4 shows the data flow diagram for the hardware implementation of the PNN classifier. A
portion of the design was mapped onto the XFPGA and the remaining blocks were implemented on the
YFPGA of the module. The number of bands ($d$) was fixed to 4, the maximum value of the number of
weights per class ($P_k$) was fixed to 512, and the maximum number of classes ($k$) was set to 16.
As shown in Eq. 1, there are two constants, K1 and K2, that are class dependent. These constants are
pre-calculated on the host and downloaded to memory banks residing on the modules.
The weight memory was mapped to the SRAM that is connected to the YFPGA on the module. The weight
memory can be as large as 16*512*4*2 bytes = 32768 16-bit words. Each weight value occupies 10 bits.
Since each class can have up to 512 weights, an array that holds the number of weights for each
class is employed. The inputs of the array are also visible from the host processor.
A 4-bit register holds the number of classes. This register is initialized by the host before
loading the FPGA coprocessors. Due to the lack of space on the XFPGA, the K1 multiplier and the
class comparison blocks were moved to the host. These calculations amount to $k$ multiplications and
comparisons per pixel classification. Since the number of classes, $k$, is small, they do not
account for a significant amount of the computation, leading to a small performance penalty. For
example, if the number of classes $k$ = 16, the maximum number of weights per class $P_k$ = 512, and
we are classifying a 512x512 image with $d$ = 4 spectral bands, Eq. 1 is calculated 16 times per
pixel. The performance penalty amounts to only 16 multiplications and 16 comparisons per pixel of
the 512x512 image that are executed on the host rather than executed on the FPGA. This is a small
overhead relative to the more than 512^3 multiplications that are computed on the FPGA for this
example.
Figure 4 contains a Subtraction Unit that computes Y, a 4-element vector formed from the 10-bit
elements of W (w0, w1, w2, w3) minus X (x0, x1, x2, x3). The result of the subtraction ranges from
-1023 to 1023, requiring 11 bits in two's complement format. The Square Unit multiplies each 11-bit
element of the Y vector by itself (i.e. t0 = y0 x y0). The values of the elements of the T vector
range from 0 to 1,046,529, requiring 20 bits in two's complement format.
The next computation involves the Band Accumulator Unit. This unit adds the 4 elements of the
T vector together resulting in $u$, ranging in value from 0 to 4,186,116, requiring 22 bits. The
K2[k] Memory holds the K2 values for each class. $K2 = (1/2)\sigma_k^{-2}$, where
$\sigma_k = 2, 3, \ldots, 11, 12$. As a result, K2 varies between 0.125 ($\sigma_k = 2$) and
0.003472 ($\sigma_k = 12$). The largest value of K2 is 0.125 in decimal and is represented exactly
in binary (0.001). In order to increase the precision of the multiplication, the values of K2 are
stored with the decimal point shifted to the right by 2 (multiplied by four). After K2 is multiplied
by $u$ in the K2 Multiplier Unit, the decimal point of the result of the multiplication is shifted
to the left by 2 (divide by 4 effect). Since this is a representation issue, no hardware is
necessary to perform the shifts in the YFPGA (refer to Figure 4); only the host needs to maintain
the values in the K2[k] memory in the appropriate format. The K2 Multiplier Unit multiplies the K2
values for each class by the accumulated values of the difference between a pixel and a weight
vector. It delivers a 44-bit result to the TO_XFPGA unit shown in Figure 4. Bits 0 to 23 represent
the fraction portion (remember that the decimal point is shifted to the left by 2), and bits 24 to
43 represent the integer part of the result.
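The bit widths quoted for the Subtraction, Square, and Band Accumulator Units, and the range of K2, can be checked with a short Python calculation. The helper name `unsigned_bits` is hypothetical; the input width of 10 bits and the range of sigma values are taken from the text.

```python
def unsigned_bits(max_value):
    """Number of bits needed to hold 0..max_value as an unsigned integer."""
    return max(1, max_value.bit_length())

INPUT_BITS = 10
max_input = (1 << INPUT_BITS) - 1           # 1023
diff_bits = unsigned_bits(max_input) + 1    # magnitude plus a sign bit
max_square = max_input ** 2                 # largest element of T
max_sum = 4 * max_square                    # largest band accumulator value u

# K2 = 1 / (2 * sigma^2) for sigma = 2..12, and the value stored in the
# K2[k] memory with the point shifted right by 2 (multiplied by four).
k2_values = {sigma: 1.0 / (2 * sigma ** 2) for sigma in range(2, 13)}
k2_stored = {sigma: 4 * k2 for sigma, k2 in k2_values.items()}
```

The computed widths are 11 bits for the difference, 20 bits for each square, and 22 bits for the accumulated sum.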
The next operation is to compute the exponential of the negative of this number. Given the
precision of the following operations, any number above 24 will yield zero as a result. Thus, if any
of bits 43 to 29 is set or both bits 28 and 27 are set, the result of $e^{-a}$ should be zero. Only
28 bits are passed on to the Exponential LUT Unit, and they are bits 1 to 28. Bit 0 and bits 29 to
43 are discarded. It was also found that a considerable number of results of the multiplication are
zero, which indicates that the result of the exponential should be one. In order to save processing
steps in this case, the output of the multiplier is tested for zero, and a flag is passed to the
Exponential LUT Unit, indicating that its result should be 1.
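The bit tests described above can be expressed as a small Python function. This is a behavioral sketch, not the hardware description: the function name `exp_lut_input` and the string flags are hypothetical, and the 44-bit product is assumed to carry 24 fraction bits (bits 0 to 23) as stated in the text.

```python
def exp_lut_input(product):
    """Decide how the exponential of a 44-bit product is obtained.

    Returns (flag, value): flag is "one" when the product is zero,
    "zero" when the product is 24 or larger, and "lut" otherwise, in
    which case value holds bits 28..1 of the product (28 bits).
    """
    assert 0 <= product < (1 << 44)
    if product == 0:
        return ("one", 0)
    high_bits_set = (product >> 29) != 0            # any of bits 43..29
    both_27_28 = ((product >> 27) & 0b11) == 0b11   # bits 28 and 27 both set
    if high_bits_set or both_27_28:
        return ("zero", 0)
    return ("lut", (product >> 1) & ((1 << 28) - 1))
```

For example, a product representing 24.0 (the integer 24 shifted left by 24 bits) has bits 28 and 27 set and is flagged as zero, while 23.5 is passed on to the look-up table.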
A look-up table is used to determine the value of $e^{-a}$. If we assume that $a = b + c$, then:
$$ e^{-a} = e^{-(b+c)} = e^{-b} \times e^{-c} \tag{2} $$
Since $a$ is a 28-bit binary number, the value comprising bits 27 to 14 of $a$ represents $b$, and
the value comprising bits 13 to 0 of $a$ represents $c$. The ranges of values of $b$ and $e^{-b}$
are:
$$ 00000.000000000 < b < 10111.111111111, \tag{3} $$
or
$$ 0 < b < 23.9980469, \tag{4} $$
which results in
$$ 0.9980519 > e^{-b} > 3.78 \times 10^{-11} \tag{5} $$
The ranges of values of $c$ and $e^{-c}$ are:
$$ 00000.00000000000000000000001 < c < 00000.00000000011111111111111, \tag{6} $$
or
$$ 1.19 \times 10^{-7} < c < 1.8919 \times 10^{-3}, \tag{7} $$
which results in
$$ 0.999999881 > e^{-c} > 0.998109888 \tag{8} $$
The values of $e^{-b}$ and $e^{-c}$ are previously calculated and organized into a look-up table.
At run time, the values of $b$ and $c$ are used to address the look-up table stored in the memory
that is directly connected to the XFPGA. The values of $e^{-b}$ and $e^{-c}$ retrieved from the
look-up table are then multiplied to give the value of $e^{-a}$. The values stored in the look-up
table are 32 bits wide. The result of the multiplication is 64 bits, but only the most significant
32 bits are sent out. As a result,
$$ 3.77 \times 10^{-11} \le e^{-a} \le 0.998051781. \tag{9} $$
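The table-splitting technique of Eq. 2 can be sketched in Python as follows. This is a floating point illustration under stated assumptions: the 28-bit value a is taken to have 5 integer bits and 23 fraction bits, the tables hold Python floats rather than 32-bit fixed point words, and the names `EXP_B`, `EXP_C`, and `exp_neg` are hypothetical.

```python
import math

B_BITS = 14          # bits 27..14 of a: 5 integer bits and 9 fraction bits
C_BITS = 14          # bits 13..0 of a: the remaining 14 fraction bits
FRACTION_BITS = 23   # total fraction bits in a

# Precomputed tables, indexed directly by the bit fields b and c.
EXP_B = [math.exp(-(i / 2 ** 9)) for i in range(1 << B_BITS)]
EXP_C = [math.exp(-(i / 2 ** FRACTION_BITS)) for i in range(1 << C_BITS)]

def exp_neg(a):
    """Approximate e^(-a / 2^23) with two table look-ups and one multiply."""
    b = a >> C_BITS
    c = a & ((1 << C_BITS) - 1)
    return EXP_B[b] * EXP_C[c]
```

Two tables of 16,384 entries each replace a single table of 2^28 entries, at the cost of one multiplication per look-up.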
The Class Accumulator Unit sums up all the comparisons between a given pixel and all weights of
a given class, and outputs the result when it receives a flag indicating that the data to add to the
accumulator refers to the last weight in a class. The output range of the Exponential Multiplier
Unit is $3.77 \times 10^{-11} \le e^{-a} \le 0.998051781$. Thus, the largest accumulated value is
0.998051781 * 512 (max. of weights) = 511.002511872. In order to keep the precision of this value,
the accumulator is extended to 40 bits to accommodate the original 31 bits after the decimal point
and 1 bit before the decimal point, and the new 8 bits before the decimal point. Each class has a K1
value associated with it. The value of K1 is determined by the following formula:
$$ K1[k] = \frac{1}{(2\pi)^{d/2} \, \sigma_k^{d} \, P_k} \tag{10} $$
The result of the multiplication of K1 by the accumulated differences between a pixel and all
weights in a given class is compared with all other classes to determine the largest result, which
indicates in which class a pixel most probably belongs. In order to keep the values being multiplied
in the same range, allowing us to use fixed point arithmetic, the values of K1 are normalized as
follows: Given $d$, $\sigma_k$ and $P_k$, the host program calculates all K1 values, and divides
them by the largest one. The result is that one value of K1 equals 1 and all the others are less
than 1. The K1 Multiplier Unit multiplies the 40-bit result of the Class Accumulator Unit by the
32-bit K1 value from the K1 Memory Unit, and outputs a 40-bit result to a register in the Class
Comparison Unit. The Class Comparison Unit receives a value that represents the comparison between a
pixel and all weights in a class, and compares this value against the values generated for all other
classes. At the end of the calculation of all classes, it outputs a code that represents the class
which presented the largest value or is the closest match.
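The K1 normalization performed by the host can be sketched as below. The formula used for the raw K1 value is the standard PNN normalization constant, 1 / ((2 pi)^(d/2) sigma^d P), which is an assumption here since the equation is not legible in this copy; the function names `k1` and `normalized_k1` are hypothetical.

```python
import math

def k1(d, sigma, p):
    """Raw K1 for one class (assumed standard PNN normalization constant)."""
    return 1.0 / ((2 * math.pi) ** (d / 2) * sigma ** d * p)

def normalized_k1(d, sigmas, counts):
    """Divide every class's K1 by the largest one, so the maximum is 1."""
    raw = [k1(d, s, p) for s, p in zip(sigmas, counts)]
    largest = max(raw)
    return [v / largest for v in raw]

# Two classes, d = 4 bands, 512 weights each, sigma = 2 and sigma = 4.
values = normalized_k1(4, [2, 4], [512, 512])
```

Because only ratios matter after normalization, the class with sigma = 4 ends up with a K1 of 1/16 relative to the class with sigma = 2 in this example.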
C. The host software
The software that was developed for the PNN algorithm that executes on the host processor was
written in the Java programming language. We selected the Java programming language for several
reasons. Java supports software reuse, native methods, remote method invocation, and it has a
built-in security manager. Software reuse allows Java objects and methods to be used repeatedly in
different applications. Native methods allow legacy code (old software written in another language)
to be called directly from Java methods. The security manager and remote method invocation allow
Java programs to be executed on remote CPUs with the system taking care of network traffic errors,
security, etc.
The FPGA system used for development of the hardware modules contains drivers for interfacing to
the FPGA devices that are only available in the C programming language. Java was a useful choice
for a programming language since native methods allow one to call C routines directly from Java.
This is accomplished by building a dynamic link library that contains the C functions that interface
to the FPGA coprocessors. A Java native method is used to call these C functions directly.
The application was implemented using a client/server methodology to provide an interface to the
FPGA coprocessors from a remote site. The server program interfaces directly to the reconfigurable
accelerator via the C drivers. It receives a block of pixels from the client, initiates the
classification of each of the pixels on the FPGA accelerator, gathers the results into a block of
classified data, and sends the results back to the client. The client software controls the user
interface, image data input/output and translation, in addition to communication with the server.
By selecting Java as a programming language and separating the program into client and server
subsystems, the client software is completely independent of the operating system that will execute
the client program. Only the server contains code that is not only dependent on the operating system
used, but also depends on the specific reconfigurable accelerator that has been selected. Hence, in
this paper we present results obtained from an implementation of the PNN algorithm that can be
executed from a remote machine accessible, for example, on the Internet.
IV. EXPERIMENTAL RESULTS
In our experiments, we used the remote implementation of the PNN classifier to measure the
effectiveness of a client/server approach to adaptive computing. Figure 5 illustrates a potential
scenario for remote image classification. In this configuration, the server program has a direct
interface to the FPGA coprocessor. It initializes the FPGA board and loads the architecture shown in
Figure 4 into the programmable hardware. In our project, the server executes on a workstation at
NASA. The client program communicates with the server via the Internet. The client requests a
connection with the server and, once granted, sends data to the server for processing. The server
processes the data and sends the results back to the client for display. While the client is
designed to execute at a remote site, e.g. NCSU, in our experiments, both the client and server
programs were executed on a single host at NASA.
Two software implementations of the PNN algorithm were developed to compare the relative
performance of implementations in two different programming languages. One version was written
entirely in the Java programming language. The other version was written using the C programming
language. The main routine in the client spawned either the Java or C version of the algorithm via
a call from a normal or native method respectively.
Two FPGA-based hardware versions of the PNN algorithm were implemented using single or multiple
modules. We report results using two modules as we only had two modules available for our
experiments. In the single module case, one pixel or one block of pixels was sent to the FPGA
coprocessor and the results were returned to the client via the server. In the two module
experiments, one pixel or one block was sent to each of the two FPGA coprocessors in an attempt to
speed up algorithm execution by a factor of 2. Each module in the multiple module case contained a
complete implementation of the hardware in Figure 4.
A traditional version of the PNN Classifier algorithm was previously developed as the basis for
the remote version presented in this paper. This experiment demonstrates the potential merits of a
remote image classification algorithm implementation. The traditional version executed on a 100 MHz
Pentium PC. This implementation, written entirely in C, required 2043 CPU seconds to classify the
complete Blackhills data set. By augmenting the PC with a single module running the PNN classifier
at 16 MHz, the processing time was reduced to 220 CPU seconds. In this case, the adaptive computing
implementation is 9.29 times faster than the software version. Adding the additional module improved
execution time to 90 CPU seconds.
In our experiments with the remote PNN classifier, we ran a total of 4 different scenarios presented
in Figure 6. The scenarios allow us to compare local and remote versions of the algorithm that
execute on the client and server with pixel based or block based algorithms where one pixel or one
block of pixels is processed. In each experiment, we present execution times for two software
implementations (written in Java and C) and two hardware implementations (one module or two
modules).
In Table I, we present results of a remote implementation of the image classification algorithm
where one pixel is processed at a time. Note that the implementation of the algorithm in Java
requires 7598 CPU seconds to complete. The C version of the algorithm requires slightly more time
since it is actually spawned from the local client Java program to execute on the remote server
workstation. (The overhead associated with calling a C function from Java is included in the
execution time.) For all practical purposes, the C and Java versions of the algorithm require
approximately the same execution time. This was a strange result since Java is an interpreted
language; however, we noticed a drastic improvement in the execution of Java programs using more
recent versions of the Java interpreter.
The remote version of the algorithm executing on a single FPGA module was 3.57 times faster than
the remote software version. Also note that the addition of one module in the multiple module case
does not impact performance.
The next experiment involved sending a block of data from the client to the server for processing.
The results of this experiment are also shown in Table I. In our experiments, an arbitrary block
size (equal to 6 rows) was selected. (Future experiments will identify the optimal block size.)
Since there are 512 pixel vectors in a row, and 4 pixels (one per band) per vector, one block
contains 12,288 pixels. Note that the execution time of the remote Java version of the block-based
algorithm is significantly smaller than the pixel-based algorithm. The execution time reduced from
7598 to 1358 CPU seconds. Once again, the single module implementation was significantly (7.6 times)
faster than the remote software version written in Java. The addition of a second module did not
provide a speedup due to the overhead associated with sending a block of data to the server. Please
note that the FPGA coprocessor consistently processes a pixel at a time; however, the server will
wait for all pixels in a block to be processed before sending the results back to the client.
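The block partitioning used in this experiment can be illustrated with a short Python sketch. The function name `split_into_blocks` is hypothetical, and the image height of 512 rows is taken from the 512x512 image size stated earlier.

```python
def split_into_blocks(num_rows, rows_per_block):
    """Return (first_row, row_count) pairs that cover all image rows."""
    return [(start, min(rows_per_block, num_rows - start))
            for start in range(0, num_rows, rows_per_block)]

ROW_VECTORS = 512     # pixel vectors per image row
BANDS = 4             # values per pixel vector
ROWS_PER_BLOCK = 6    # block size used in the experiment

blocks = split_into_blocks(512, ROWS_PER_BLOCK)
values_per_block = ROWS_PER_BLOCK * ROW_VECTORS * BANDS
```

With 6 rows per block the image is sent as 86 blocks, the last of which holds only 2 rows, and a full block carries 12,288 values.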
Table II presents results of PNN classification executing on a local workstation. The client
program can initiate execution of either of the software or hardware algorithm implementations. In
the local pixel-based algorithm, the Java version requires about 1317 CPU seconds and the single
module implementation requires 141 CPU seconds. This is approximately an order of magnitude
improvement in execution time. The multiple module version completes in 77 seconds resulting in a
2:1 speedup over the single module as expected. The results from Table II illustrate that block
based processing is counterproductive on a local client workstation.
V. CONCLUSIONS
In this paper, it was shown that the implementation of a multispectral image classifier on an
adaptive computer yields an order of magnitude performance increase over high end workstations. If
we extract the fastest execution times for the algorithm from the Tables presented, we find an
interesting result that relates to the potential impact of remote adaptive computing technology. The
fastest remote hardware implementation of the PNN algorithm consisted of a single module requiring
178 CPU seconds to complete. On the other hand, the fastest local software version of the algorithm
was the Java version that required 1309 CPU seconds. This is 7.35 times slower than the remote
hardware implementation. Hence, for image classification, a remote hardware implementation of the
algorithm is faster than a local software implementation of the algorithm. Future work is to
identify additional applications wherein a remote hardware implementation is consistently faster
than a local software version. Additionally, experiments that quantify the effects of a heavily
loaded network connection should be conducted.
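The speedup figures quoted in the text follow directly from the reported CPU times, as this small Python check shows. The function name `speedup` is hypothetical; the times are those stated in the paper.

```python
def speedup(baseline_seconds, accelerated_seconds):
    """Ratio of baseline execution time to accelerated execution time."""
    return baseline_seconds / accelerated_seconds

local_java = 1309              # fastest local software version (CPU seconds)
remote_block_java = 1358       # remote block-based Java version
remote_single_module = 178     # fastest remote hardware version

local_vs_remote_hw = speedup(local_java, remote_single_module)
remote_sw_vs_remote_hw = speedup(remote_block_java, remote_single_module)
```

The first ratio rounds to 7.35 and the second to about 7.6, matching the values given in the conclusions and in the block-based experiment.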
[Figure: labels include pixel vector, weight vector, 10 bits, number of classes K, and number of weights in class S_k = P_k.]
Fig. 1. PNN Image Classifier.
Fig. 2. FPGA board connected to the PCI bus slot.
[Figure: labels include multi-spectral satellite image; download BIT file (executable); image pixels; classified data; display classified image; FPGA board.]
[Figure: PNN Data Flow Block Diagram. Blocks include Memory Units (among them the Weight Memory Unit and the Exponential LUT Memory Unit), Subtraction Unit, Square Unit, Accumulator Units, Multiplier Units, Comparison Unit, and Result Unit. Loop annotations: d times, P_k times. Legend: initialized before FPGA load; connected to the HBUS.]
Fig. 4. PNN Classifier Data Unit.
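The chain of units named in the data flow diagram (subtract, square, accumulate over the d bands, multiply, exponential lookup, accumulate over the P_k weights of a class, compare across the K classes) can be sketched in software as below. This is a sketch under stated assumptions, not the authors' hardware design: it assumes the standard probabilistic neural network with a Gaussian kernel and equal class priors, and the function name `pnn_classify` and the smoothing parameter `sigma` are hypothetical. The normalizing constant that is common to all classes is omitted because it does not change which class wins.

```python
import math


def pnn_classify(pixel, weights_by_class, sigma=1.0):
    """Return the index k of the class S_k with the largest PNN score.

    pixel            -- sequence of d band values for one pixel
    weights_by_class -- list of K lists; entry k holds the P_k weight
                        vectors (each of length d) of class S_k
    sigma            -- Gaussian smoothing parameter (assumed value)
    """
    best_class = -1
    best_score = -math.inf
    for k, weights in enumerate(weights_by_class):
        if not weights:
            continue  # a class without weights cannot win
        class_sum = 0.0
        for w in weights:
            # Subtract, square, and accumulate over the d bands.
            dist2 = sum((x - wj) ** 2 for x, wj in zip(pixel, w))
            # Scale by a constant, then take the exponential
            # (the diagram labels an exponential lookup table here).
            class_sum += math.exp(-dist2 / (2.0 * sigma ** 2))
        # Average over the P_k weights of this class.
        score = class_sum / len(weights)
        # Comparison step: keep the largest score seen so far.
        if score > best_score:
            best_class, best_score = k, score
    return best_class


# Two classes in a 2-band space: class 0 near the origin, class 1 near (10, 10).
example_weights = [[[0.0, 0.0], [1.0, 1.0]], [[10.0, 10.0]]]
print(pnn_classify([0.5, 0.5], example_weights))
print(pnn_classify([9.0, 9.0], example_weights))
```

In hardware the inner loop over bands and the outer loop over weights would be pipelined rather than run sequentially; the sketch only mirrors the order of operations.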
[Figure: labels include Local CPU (local.ncsu.edu, client) at North Carolina State University and Remote CPU (remote.nasa.gov, server) at NASA Goddard Space Flight Center.]
Fig. 5. Algorithm Execution on a Remote Adaptive Computer.
[Figure: chart. Vertical axis: Total Execution Time (in CPU seconds), 0 to 9000. Horizontal axis: Image Classification Type. Legend: Software (Java), Software (C), Hardware (Single), Hardware (Multiple).]
Fig. 6. Local and Remote Image Classification Execution Time.
TABLE I
REMOTE IMAGE CLASSIFICATION TIMING REPORT (IN CPU SECONDS).

                           Pixel Based           Block Based
Description             Avg Row     Total     Avg Row     Total
Software (Java)          14.5?     7597.79      2.65      135?
Software (C)             15.79     8087.2?      3.61      1??7
Hardware (Single)         4.16     212?.38      0.35      178
Hardware (Multiple)       4.25     2156.02      0.35      180

(? marks a digit that is illegible in the source; the Block Based Total column is cut off at its right edge.)
TABLE II
LOCAL IMAGE CLASSIFICATION TIMING REPORT (IN CPU SECONDS).

                           Pixel Based           Block Based
Description             Avg Row     Total     Avg Row     Total
Software (Java)           2.57     1317.31      2.56      1309
Software (C)              3.57     1871.36      3.69      1???
Hardware (Single)         0.27     13?.10       0.28      143
Hardware (Multiple)       0.11     77.?5        0.2?      1?2

(? marks a digit that is illegible in the source; the Block Based Total column is cut off at its right edge.)
Marco Figueiredo is an R&D Engineer working in the investigation and application of reconfigurable
computing at NASA Goddard Space Flight Center. He has a B.S. in Electrical Engineering from the
Universidade Federal de Minas Gerais - Brasil (19??), and a masters in Computer Engineering from
Loyola College in Maryland, USA (1991).
Clay Gloster, Jr. is currently an Associate Professor in the Department of Electrical & Computer
Engineering at North Carolina State University. He received the B.S. and M.S. degrees in Electrical
Engineering from North Carolina A&T State University and the Ph.D. degree in Computer Engineering
from North Carolina State University. He also has been employed with IBM, the Department of
Defense, and the Microelectronics Center of North Carolina. Current research focuses on the
identification of potential applications and the development of automated tools that assist
scientists/engineers in mapping these applications onto reconfigurable computing resources. He is
also actively conducting research in the area of technology based curriculum development and
distance education. Dr. Gloster is a member of Eta Kappa Nu, Tau Beta Pi, ACM, and is a registered
professional engineer.
Mark Stephens is a Computer Engineer working for the NASA Goddard Space Flight Center. He
has a B.S. in Applied Math, Theoretical Physics and Fluid Dynamics from Tulane University (1976).
He went on to get a Master of Science in Atmospheric Science at Colorado State University (1979).
Corey Graves received his B.S. degree in Electrical Engineering from North Carolina State University
in 1993 and his M.S. degree in Electrical Engineering from North Carolina A&T State University
in 1994. He is currently completing his Ph.D. work in Computer Engineering at North Carolina State
University. His current research interests include Run-time Reconfigurable Computing, Applications
of Wavelets, Adaptive DSP algorithms, and Architectures for DSP. During his years as a student, Corey
has worked with many industrial and government entities including IBM, AT&T, Sandia National
Labs, and Oak Ridge National Labs. He is currently a member of the IEEE Signal Processing Society.
Mouna Nakkar received her Bachelor's degree in Electrical Engineering from the University of North
Carolina at Charlotte, NC in 1991, her Master's from the same university in 199?, and her Ph.D.
degree in 1999 from North Carolina State University in Raleigh, NC. She joined Motorola Inc. in July
of 1999 and is currently working on microprocessor research and development. Her research interests
include reconfigurable computing, embedded processors, FPGA architecture, low power FPGAs, high
speed CMOS design, Multi-Chip-Module Packaging, 3 Dimensional MCM packaging, and Optical
Interconnects for high speed VLSI packaging. She worked on a microprocessor design during a co-op
in 1995 at Ross Technologies, Austin TX. She also served as a teaching assistant for several electrical
engineering courses.