Download - Implementation of Multispectral Image Classification on a … · 2013-04-10 · Implementation of Multispectral Image Classification on a Remote Adaptive Computer Marco A. Figueiredo

Implementation of Multispectral Image Classification

on a Remote Adaptive Computer

Marco A. Figueiredo a Clay S. Gloster b Mark Stephens c,

Corey A. Graves b, Mouna Nakkar _'

SGT Inc., Code 564, NASA Goddard Space Flight Center,

Greenbelt Road, Greenbelt, .Maryland, 20771

bElectrical and Computer Engineering, North Carolina State University

Box 7914, NCSU, Raleigh NC 27695-7914

c Code 585, NASA Goddard Space Flight Center,

Greenbelt Road, Greenbelt, Maryland, 20771

(Please send all co_espondence to Clay Glostcr)

Electrical and Computer Engineering, North Carolina State University

Box 7914, NCSU, Raleigh NC 27695-7914

E-maih [email protected]

phone: (919) 515-7348

fax: (919) 515-2285

E-mail addresses of the oth, er authors

Marco Figueiredo: marco_C__fpga.gsfc.n asa.gov

Mouna Nakkar : mouna,,_dbmoto.coln

Mark Stephens: Mark.A.Stephen. l:_gsfc.nasa.gov

Corey Graves: [email protected]

https://ntrs.nasa.gov/search.jsp?R=19990109158 2020-07-17T07:35:45+00:00Z

Abstract

As the demand for higher performance computers for the prc)cessing of r_.'mote sensing science algorithmsincreases, the need to inw_st.igate new computing paradigms is justified. I:iehl t'rogramm;tble (late Arraysenable the implementation of algorithms at the hardware gate level, leading It, orders of magnitude porformanceincrease over microprocessor based systems. The automatic classification of spaceborne multispectral imagesis an example of a comput.ation intensive application that. can I)enefit fr,,m inlplenaentation on an FPGA-based custom computing machine (adaptive or reconfigurable c(mlputer). A probabilistic neural network isused here to classify, pixols of a mvltispectral LANDSAT-2 image. The implementation &'scribed utilizes Javaclient/server application programs to access the adaptive conaputer from _ ,'emote site. l{,'sults verify that aremote hardware versic, n of the algorithm (implemented on an _daptive c_)mputer) is signiticantly f_u_ter thana local software version ,:,f the same algorithm (implemented on a typic_d g_'neral-purpose computer).

I. INTRODUCTION

A new generation of salcllites is being developed by th(' Nat.io,l,l Aeronautics and Simce Admin-

istration (NASA) to compose the Earth Observing System (EOS). The inst.rumellts aboard the EOS

satellites not only exteHcl the observation life of the curr(,nt sa.tcllilcs, but they also exlend the ca-

pabilities of remote sensing scientists to better understand the Earlll's environlnent. Along with the

scientific advancements of lhe new missions, it is also neccssat'y' 1.o (,xplore new technologies that fa-

cilitate and reduce the cosl of the data. analysis process. In order 1.o process the high volume of data

generated by the new EOS sat(qlites, NASA is constructing the l)istribuled Active Ar(hive Centers

(DAACs), an extensive and powerful parallel computing environment. Scientists will be able to request

certain data products from these centers for further analysis on their own coinputing systems. A new

technology that could bring inct'eased processing power to the scientist's desk, offering more complex

analysis and interpretai i(),l of t(,mote sensed scientific data, is higlily dosirable. The ultimate scenario

would be for the scientist to req,lest the data directly, from lhc sat(-llil.c along with histo,ic data from

an archive center.

Field Programmable (i;al.(, Array (FPGA)-based computing, also k,lown as "adaptive" or "reconfig-

urable computing", has _.morged a.s a viable computing ol)lion in c()ml)ul.ationall/ intensive applica-

tions. These computing systems combine the flexibility of goneral purl,ose l)rocessors with the speed of

application specific pro('ossors. By mapping hardware to FPGAs. the cotnl)uter designer can optimize

the hardware for a specili(' a.pplication resulting in accelerat.ioll tales of several orders of magnitude

over general purpose conll),lt.ers. Because the FPGAs at(' personalized using SI{AM-based memory

cells or a fuse programming technology, they can be reconfigul'od I:,y l]lc &,signer fol' other applications.

Several reconfigurabh' comput(ws have been implemented to d(,moiisl.rate the viability of reconfig-

urable processors [1], [2], [3], [4]. Applications mapped to these l,t'oc('ssors includ(': l)a.t.te,,_ recognition

in high-energy physics [51, a.pplications in statistical physics[6], an,I g(,mq.ic optin_iza.tiou algorithms

[7], [8]. In many cases [!)], [10], [11], the reconfigurable COml)uti,,g i,nl,lom,'ntation p,'ovid,,,l the highest

performance,in termsof executionspeed.The adventof r<,configurableprocessorsalongwith novel

methodsfor mappingapplicationsontoadaptiveor reconfigural>leprocessorsenablesa.newcomput-

ing paradigmthat may representthe futurefor remotesellsingscientificdataprocessing.In fact,

manyapplicationsutilizing FP(',Abasedcomputershave1)c(,ndevelopedshowingordersof magnitude

accelerationovermicroprocessorbasedsystems[12],[13],[14].Moreover.microprocessorsand FPOas

sharethe sameunderlyingtechnology- thesiliconfabrica(.i<>nprocess.Therefore,it is reasonableto

concludethat FPGAba.sedmachinescanusuallyoutperformmicroprocessorbasedsystemsby orders

of magnitude[15],[16],[1rl.

To achieve such perfoHiJance, the application must effec(ively utilize the availal_le resources. This

presents a challenge for software designers, who are generally acc_lstom(,d to mapping applications

onto fixed computing sys( eros. ('cnerally, the designers examiue the available hardware resources, then

modify" their application accordingly. With reconfigurable compll(.<'rs, 1.h(' available resources can be

generated as needed. \¥hile it may seem that this flexibility would ease the mapping process, it actually

introduces new problems, s llch as what components should 1)e used, and how many of each component

should be used to gen('ra(e the best performance. With conventioilal hardware COlnponents, these

questions are less of an isslle. In addition, software engineers are generally not adept at hardware design.

Thus, several research groups have developed methods I'_>r )nal)piIIg applications to rcconfigurable

processors [21, [18], [1'0], [2(1], [21].

The Adaptive Scientific 1)ata Processing (ASDP) grou 1) at .X:\S:\'s God(lard Space l"light Center

(GSFC), in conjunctioll with researchers at North Carotilm Sta)<_ (Tniversil.y, have been investigating

the utilization of FPGA 1,ased computing in the processing of remote sensing scientific algorithms.

The first prototype dev<qoI>ed bv the group utilized a con_mcrcial-off-the-shelf (COTS) rcconfigurable

accelerator in the impl_,nt_,tlta.tion of an automatic classifier for l.h<_ L:\N1)SAT-2 multispec(.ral images

[22]. The implementation discussed in this paper is an extension of t}l<, original pr()tol.yl)e that allows

users to classify" the images on the accelerator from a rem(',)e site. f/cs_llts indicate that a remote im-

plementation of the classilier in adaptive computing hardware is fasl.er than a sell.ware iml)lementation

that executes on a local lfie;h-end workstation.

This paper presents (hqails of the FPGA design and is organiz(',/ as follows. Section 2 describes

the classifier algorithm that utilizes a probabilistic neural ,ie(worl; (I'NN). The implementation of the

FPGA custom computiI_g machine is then presented, l:imdly, a I_crt'oH_lance ;,llalysis of local and

remote versions of the ;_l.gori_lm_ is presented.

II. Till.; PNN MULTISPECTRAI, IMAGE CI,ASSIFIEI/

Remote sensing satellitos utilize multispectral scanners to colh,,l il_forlnatioll abou! the Earth's

environment [23]. The data colh',:ted by such instruments are a set. ¢)t images, each corresponding to

one spectral band. A nmlt.ispect,'a.l image pixel is represe,_ed 16' a. v¢x:tor of size equal to the number

of bands. The combination of t.he multiple spectrum measur(_m('l_Is represented by each element

of the pixel vector det('rlnine a signature that corresponds (o a phs'sical object hoing viewed by the

satellite. Through the ol,servation of a. multispectral image and the corn l)arison of pixel vectors to those

obtained from known locat.i.,ls l i,_ situ measurements), a. s(ient.ist is ahle to identify uniq,te signatures

of physical objects and compos(_ ,:lasses. These classes co,ltai,l m,lllisl_ectral pixol representations of

physical objects on the (_arth that are closely related, lLxa,,iple class(,s i.cl,lde for,'st., tun,h'a, wetland,

water, etc.

Several neural network s¢:llenl¢,s have been devised for the a,ltoma_ic classification of multispectral

images [24]. One in pa,'_ic,,la.r, the Probabilistic Neural Network (I'NN) classilio," [25], (,xhibits ac-

ceptable accuracy, very small t,'aining time, robustness to weight cha,_ges, and m.gligible retraining

time. A description of 1he derivation of the PNN classifier and d(qails of the network implementa-

tion including rate of false alar, lls. neural network size, etc. a,'e pres(,,_t.ed in C'hc/Zri ct. al. [25]. The

Blackhills (South Dakota. I:SA),Iata set was generate([ by _he Lan(lsa_ :! m,,ltisp(','l.ral scanner (MSS).

The image's four spectral 1,ands (0.5-0.6 /_m, 0.6-0.7 /*,,_, 0.7-0.8 /_,H, and 0.8-1.1 /ml) correspond to

channels 4 through 7 of th¢. Lan,lsat MSS sensor. There are 262.114 l,i×els co,'resl)onding to a 512x512

pixel image size, and each i)ix('l represents a 76m x 76m grotmd a,,'a: th(' imae_('s were ohtained in

1973. The ground truth was provided by the United Stat(,s (leologi¢al S,1,'vey.

Figure 1 illustrates th, _ I'NN classifier procedure. Each multispectral pixel, represented hy a vector,

is compared to a set of I)ixels h(qonging to a class...\ l)r()bability val,_e is calculated t})r each class.

The highest value indi,atos the class into which the pix(,1 fils. ]';q. I ix used to derive a value that

indicates the probability t l_at the pixel fits into class 5'_..

] Jk

f(X I ,5'k) = Kl[k] _ e[-l,-2[_,:](.V-,r_,),(2-,i",,)] (1)i=1

where (0_ is a pixel recto, r. i_"*' is the weight i of class /,: d is the n,_ml,,,r of bands, 1,: is the number

of classes, Pk is the numh(_r of weights per class, and 1,_'1[/,:], K2[/,'] are ('o,_slants.)

lII. THISFPGA IMI'I.I.;MI:_NTATI()N

Thefirst stepin implementinganapplicationona.nactaptiveCOml,_llcr is to sel_'_'t,the l:PGA-based

customcoprocessorarchitecturethat bestmatchesthealKorithmin queslion.At ihe currentstateof

thetechnology,certainFP(',A architecturesprovidebel.l.crt)erformanc_Ihanothersfor ,_particular

classofapplications.A prelinainaryanalysisofthePNNclassifierindical.vdthatl.h_,FPGAarchitecture

[26],shownin Figure2, matchedwellwith thealgorithm.Theselecle¢lI:P(;A architectureiscomposed

of a PCI busbasedmol.herboardandup to 16plug-inmodules.Theseplug-inmoduleseachcontain

two Xilinx 4013EFP(',:\ _l_'vices(XFPGAandYFPGA) and provide'an equivalentof t3,000gates

per FPGA, or 26,000gate,spermodule. The designinll)lement.alionrequiredapproximately1160

CLBs(8.5%utilization)perl:P(',:\. Sincethemodulecontainedtwol"l'(_Asandtwoseparatememory

modules(connectedvia lhe IIBI;S), we can perform two lookul) table (I_!T) operalions simultaneously.

A. Algorithm partitioning

The computation intensive portion of the multispectral ilnage classilication algorithm found in Eq. 1

was identified by profiling an inlplementation of the a.lgoril hm that was written using the C program-

ming language. This COml)llla.lion was selected to be execllted on l[l__ I"PCA coprocessor to improve

performance for the complete classification algorithm. The g,'aphical user interface, data storage,

adaptive coprocessor initializat.ioll code, algorithm synchronization, and data 1/O ix performed by the

host processor. The coInl,llte intensive PNN classificatioll algorithm cqllal.ions were ulapt)ed onto a

single module.

Figure 3 illustrates tl_e algorillml pa.rtitioning. The hos_ processor ,lisl)lays the il,mge during classi-

fication. The host then sends a Ifixel vector to the FPGA coprocessor. (?lassificalion is t_erformed on

the coprocessor and resulls are tel urned to the host t.o I:,e ,lisplayed. The host also COlnl)Utes the total

time required to process a complete image. If we wish t<} use mull.il,le modules as coprocessors, the

host schedules a pixel \'color to I,e processed on each mo_lllle in a I_lll(1 robin fashion. 1hen gathers

the results as they become, availal_le.

B. FPGA application </(:.,i(/_

Due to the limited mm_l,('I" of gates available on a single F1)(',A. i_ was not feasihle 1_) use floating

point arithmetic in our in_l)leme,,ta.tion of the PNN algorit l,m. \Ve iherefoIe transformed lhe algorithm

to use fixed point arittnmq ic prior to hardware implemenlal ion. The' width of the tixed 1),)int datapath

was determined by sirn_llal.ing variable bit operations in (: and co_lpariil< llae i'('sl_lls ol,iained from

the original algorithm i,_ tloating point. Once the fixed point cla_siIication of lhe l]h/ckhills data

set yielded exactly the same, resltlts as the floating point version, l.h_' _lata l)ath width for the FPGA

implementation was known. (Since the output of the I'NN classitier is silnl)ly a 4-bit valu(, representing

the class that matches lhe pixel, I.he fxed point version pro,luted exacl.lv the same result as the floating

point version. Hence there i._ no loss in precision due to implemcntatio, _sing fixed point arithmetic.)

Figure 4 shows the data [low diagram for the hardware implem('iltai.ion of the' PNN classifier. A

portion of the design was mapp_,d onto the XFPGA and ihe rema.inil_v blocks were implemented on the

YFPGA of the module. The nllml_er of bands (d) was tix_'c1 to 4, 1.1_(,ma×imum value _>1'the number

of weights per class (PI.} was fix_-d to 512, and the ma.xiill_m_ numl_er of classes (/,:) was set to 16. As

shown in Eq 1, there ai_, l.wo collstants, K1 and K2, lha_. are class d_'l)_'ndent, lhese constants are

pre-caleulated on the hosl. and _lownloaded to nlenlory })alll,_s i'esidiJlg on _.he modllles.

The weight memory was mapl_ed to the SRAM thai is coi_n('ct.ed t(, 1he h'FPCA oll the lnodule. The

weight memory can be as large as 16'512"4"2 bytes - 327(;8 16-1_it w_ids. Each w_'ight value occupies

10-bits. Since each class can hav(" up to 512 weights, an array that. holds the lmlitber ol" weights for

each class is employed. Thc inp_tts of the array are also visible from the host processor.

A 4-bit register holds th(, imnll,_'r of classes. This regisl.,'r is init.i,li×ed by the hosl ])el'ore loading the

FPGA coprocessors. D_e l.o ,.h_' lack of space on the Xli'l_(i',:\. the/x'l ilmli.iplier ail_l the class compar-

ison blocks were moved to the host. These calculations allto,nt to I,' imlltiplicatio, ls and comparisons

per pixel classification. Sit,c(, lhe lmmber of classes, ].:, is small, t.lwv do _ot acco_lttt tot' a significant

amount of the computalioll, leading to a small pert'ormattce t)cnal_.5, l:or example', it" the number of

classes k = 16, the maximum ll_m_ber of weights per cla_s /_. 512, and we are (l_ssifying a 512x512

image with d = 4 spec_ ral lmnds, Eq. 1 is calculated 1(_ l.i_tes. l'l_, i_<uformai_c< - l,enalls" amounts to

only 16 multiplications and 16 comparisons per 512x517 image l.l_a_ at,- _'xecni.e_l on 1.11_'host rather

than executed on the FI'(I.,\. This is a small overhead r_,l,tl.iv_- to _.1__ _nore than 512:3 n_ltiplications

that are computed on lh_ I"I'(;A I'or this example.

Figure 4 contains a S_bl.racI_io_ I;nit that computes W, a I x 10 1)il ('le,_('_ll. vector t()l" \X (li?(I , lVl, //)2, "//)3)

minus X (Xo, Xl,Z_,x3). 'l'h(' r('s_li, of the subtraction rarities from l()i_:{ lo 1023, r('(lUil'iI_g 11 bits in

two's complement forn_al. The .qq_m.re Unit multiplies each l l-bil ('l('_,('nl of tl_(' Y v(,cl.or by itself

(i.e. to = Y0 x y0). Th(' vaht(,s of l.he elements of the '1' v(,(l()r range from 0 to 1,()1(i,52!). requiring 20

bits in two's complemcnl, forma.I..

The next computation involv(,s the Band Accunmlator (;nit. This _nit adds th(' _1elements of the

T vector together resull.i_lg in _1, ranging in value from 0 _o-1,186. l l(i. requiring 22 1)its. 'l_he K2[k]

Memory holds the K2 values for each class. K2 = (1/2)cr,72, where crk -- ".2,3 ..... I 1, 12. As a result,

K2 varies between 0.125 (rr,: = 2), and 0.003472 (crk 12). l'he targ,'st val,lc of K2 = 0. 125 in decimal

and is represented exactly itl billary (0.001). In orchw to il/crease l.lle precision of lhe mldtiplication,

the values of K2 are stored with lhe decimal point shifted Io the righ_ l_y 2 (mult.il)lied by four). After

K2 is multiplied by u in the 1(2 Multiplier Unit, the <lccin_al l)Oint of I}_e resu]l, of l[le mlllliplication is

shifted to the left by 2 (divich, 1)3 1 effect). Since this is a rCl)r('se_ll.al.i(m iSsll(', IIO hal(lwarc is necessary

to perform the shifts in the YFI'(',A (refer to Figure 'i), o1_ly the hosl needs to maintain the values in

the K2[k] memory in th(, apl)ropriate format. The 1(2 hlultiplier 1711it imllt.iplies the h'2 values for

each class by the accumula.t('d values of the difference between a pixcl and a weigllt vector. It delivers

a 44-bit result to the TO_XI:P(;:\ unit shown in Fig_lre 1. Bits 0 _o 23 tel,resent tile fra,:tion portion

(remember that the decimal point is shifted to the left I>3 2), and lilts 2.1 to 43 rel)rCs('llt l.he integer

part of the result.

The next operation is 1o complIle the exponential ot" the tlegative of I.his nlunber. (_iv(,u lhe precision

of the following operations, any ilu,nber above 24 will yield zero as n rcs, llt. Thus, if any of bits 43 to

29 is set or both bits 2S and 27 arc set, the result ot' e-" should be ×cro. Only 2.'; bits are passed on

to the Exponential LUT [Tllit, ;_lld lhey are bits 1 to 28. t_it 0 and I:,ils 29 to 43 arc discarded. It was

also found that a consi&'ral,le imlHber of results of the li_1111.il)licalion ar_' zero, which iiidicates that

the result of the exponcnl.ial should be one. In order Io s_ve pl'oces_ing sleps ill tiffs case, the output

of the multiplier is tested for zeto, and a flag is passerl lo lhc l£xpollential LUT luit., ill_licating that

its result should be 1.

A look-up table is us('d t.o del(,rlnine the value of ( . If we assllm(' lhal a = b _- c, Ill(m:

Ib+ / .... -' (2); .7: ( (

Since a is a 28-bit binnry immbcr, the value comprisin.e; lilts 27 1.o 1 i of, represcnl b, nlld the value

comprising bits 13 to 0 of (, rel,r<'sent c. The range of vnlucs of b ai_,l r -r, are:

or

which results in

00000.000000000 < b< lOIIl.lllllllll.

0 < b < 23.!)9801(;9,

(3)

(4)

0.9980519 > e -s >" 3.7S >: lO -11 (5)

The range of values of c and _ -' a.re:

O0000.O0000000001lO00000000001 < c < O0000.O00000000111 I 111111 1111,

or

(6)

which results in

1.19 x 10 -7 < c < l.S!)l!) × 1()-:_.

0.999999881 > _-_: > O.!)!)SIOi)_SS

The values of e -b and t- are previously calculated a_ld organized inio a. look ii I) fable.

(7)

(s)

At run

time, the values of b all([ c are ,Ised to address the look Ul' table stor,'d in t.he memory tllat is directly

connected to the XFPG A. Tllo val_,es of e -b and e -_ tel rie\ed from the look-up tabh' }tl'C l.h,q] multiplied

to give the value of c -r'. The values stored in the look _1t) table arc :{2-t,its wide. The result of the

multiplication is 64-bits. I,lt olllv the most significant 32 I,it.s are _ellt oul. As a r,sult.,

3.77 x 10 -11 _ e-" _ {L!)!)S0517S1. (9)

The Class Accumulator 17nil _ums up all the comparisons bctw,ell a given pixcl and all weights of

a given class, and outp_tls the ros,lt when it. receive._ a ttag indical.in.g that t.ho data lo add to the

accumulator refers to the last wei.ght in a class. The o_1_I,lt of the I".×l>om_nt.ial Mliltiplior [;nit range

is 3.77.10 TM < d < 0.99S0517,_1. Fhus, the largest ,Tcc111mllat.ed \,al_l(' is 0.99S0517S1 * 512 (max. of

weights) = 511.002511872. In order to keep the precisioll o[ d, t.ho acclmll,lator is oxten<h'd to 40 bits

to accommodate the original 31 1,il.s after the decimal i>oint and I 1,il I)e[ore the decimal point, and

the new 8 bits before lhe decimal l_oint. Each class ha_ a K1 value _ssociat.ed wilh it. The value of

K1 is determined by the following formula:

1K1 - (10)

The result of the muliiplical.ion of K1 by the accunmlated ditforence_ 1,etween a l,ixol alld all weights

in a given class is coml)are<l wilh all other classes to d(_t('rlnin(' t.l_(' I_lr.g(?s[, l'est_ll, xvhic}_ indicates in

which class a pixel most prohal,ly belongs. In order lo k('c I) the rallies h('ing mt_llil)lie(I in the same

range allowing us to use li×ed i,()int arithmetic, t.he val_l(,s of KI are normalized a_ follows: Given d,

(ra and P_, the host program calculates all K1 values, a_l(I (lividos lh('m t)v the lar_('st ore'. ]'he result

is that one value of K1 equals I a,_d all the others are l('ss I.han 1. l'h(" hl Mullil,lier l;,,it multiplies

the 40-bitresultof the ClassAccmnulatorUnit by lh_"32-1fit.1(1 value from l.h_' KI _lcmory Unit,

and outputs a 40-bit result to lhe :/ register in the (llnss (!omparis(m [7nit. The (llass Comparison

Unit receives a value that reproscqll.s the compa.rison I_¢,l.w¢,¢_na pixel and all wci<hl.s iTi a class, and

compares this value against the values generated ff)r all other classes. :\l t.he eml of Ill,' calculation

of all classes, it outputs a co(h, ilia.l, represents the class which l_r¢'scnl.ed l.he laY.e_-si, v;,lue or is the

closest match.

C. The host software

The software that. was develol,ed for the PNN a.lgorithln lhat exec,,lcs on lhc tlost processor was

written in the Java programmiil< language. We selected lhe ,lava I)r()e:r_,nming laugua.<(' for several

reasons. Java supports software r(,use, native methods, renlot.e melho_l illvocation, and il l_as a built-in

security manager. Software reuse allows ,lava objects and methods 1(, t,(' used r('l,('a.l.edl 5" in different

applications. Native methods allow legacy code (01d sol'twnr<' writl('ll ill am)t h(_r lane:uage) to be called

directly from Java metllods. _l'll_, s<,curity manager and r_'mote metho,I iilvocatioll allow .lava programs

to be executed on remote (!l'lTs with the system taking care of n¢q.work traf[ic errors, security, etc.

The FPGA system nsed t'or development of the hardware modules, contains drivers for interfacing to

the FPGA devices thal are only available in the ('. progr_l_mfing lane:uag_. Java, was a useful choice

for a programming language since native methods allow o1_, t.o call (: rouiitlcs clircct ly froin ,Java. This

is accomplished by building a <13'l/,_ll_ic link library that c_mlains the (! f_ilct.ions l l_at inl_'rfa.ce to the

FPGA coprocessors. A ,lava hal iv(, method is used to call I llesc (! I'u,lcti<ms dir,',* ly.

The application was in_l)h'i_,'l_l_'d using a client./scrv_w t_'l hodol_)e3 I_) i,rovi_lc an inl,,rface to the

FPGA coprocessors from a rcmolc site. _l'he server progra_ inl._-rfaccs dircclly lo the r,,configurable

accelerator via the C drivers, li receives a block of pixcls IlOm tile clio'hi, inilialcs the classification

of each of the pixels on lhc VPC, A accelerator, galhers i.l_e rcsn]l.s into a block of cl;_ssified data,

and sends the results 1,ack 1o Illc client. The client sofl.war_' conllOls lhe llSel' i_l_wfa.c< image data

input/output and translation, in addition to conalnunicalion witl_ lh_" server. ]_x st,l_,,'iing Java as

a programming language and s<'l_alating the program i_:o client, arid s('rver sul_sysl,<-_l_, the client

software is completely in<l<'l,<'_dc_: of the operating s3"stc_ lhal will ,,×c<:_l_, I1_, clh,nt, lm_gram. Only

the server contains code lhat is nor only dependent Ol_ i.h(, <,l)Cral.ing svsi.c_n used, 1,._t als,, depends on

the specific reconfigural,l(, accchq'ator that has been selecl.<'d, l[e_lc< in l.his 1)altar we present results

obtained from an imph'_n_,_l al io_ of the PNN algorithm lhal can he ,,xec_led tro_ a re_ot.e machine

accessible, for example, on ll_<, I_lcrnct..

I\:. EXPERIMENTALl¢l.;'_II/FS

In ourexperiments,wcusedthe r<'mote implemcl_ al.io_t of Ih(_ ]),XN d_ssilit'r 1¢, m¢'as_r<" the effec-

tiveness of a client/servtw al,lnoach to adaptive COmplllil_e;. l:igm'c 5 ill_strates a potc1_t.ial scenario

for remote image classilicatioi_, tll this configuration, 111,' scrvcl program has a direct interface to the

FPGA coprocessor. It. illitializc_ t.1/¢, tVPGA board all,l lo_ld_ the arcllil(,clure shown in I:ignre 4 into

the programmable hardwar(,, lJl _,_Ir l)roject., the Sel'\tq t'×¢'Cllles OII ;/worksla.tion ,_1 NASA. The client

program communicates with lhc server via the Inl(wllci.. fhe clicul rC<lllcsts a c_mm.cli()n with the

server and, once granted, s(,uds data to the server for tnoc_'ssing. 'll_c server l,rOccsses 1lie data and

sends the results back to the climll for display. WhiD the clicut is dcsigm.d to (.xcc_l¢, at a remote site,

e.g. NCSU, in our experimeuls, I,oth the client aud scrvm" programs xxt,l¢, ¢,xec_llc(I on a single host at

NASA.

Two software implemcul aliol,s ()f the PNN algorithm wcr(_ <l<,vcl<)l)c_l to colnl)m'(' the rclative perfor-

mance of implement.ati_ms iu tw,_ dill'trent programn_iu_ lan.g,_ages. ()_t:, \'_wsioll was wl'il 1.en entirely

in the Java programming language. The other version was writtcu usiug t.h_' C lWOgrammiug language.

The main routine in the click: _l,aW_t'd either the ,lava ov (! vcrskm_ of llw algorilhm via a call from

a normal or native melhotl r_'_l.'Cli\'ely.

Two FPGA-based havdw,wc vclsions of the PNN al._ori_ IN_ were i]_ll,l('_nentt'd ilsing siugle or mul-

tiple modules. We rel)Orl rt,mlll.s _tsing two mochiDs as we only had lwo mod_llc: available for our

experiments. In the siugh, u,)tl_llc case, one pixt-I <)r _)_1(, l,lock of l_ixt'ls w(,rc St,lit. t() t'ach FPGA

coprocessor and the res_lll s wett, _'('t _rned to the client via th(. server. I_ _l_t. two n_od_lh, _,xperiments,

one pixel or one block w;_s st'ill _o each of the t w(, 1:1'(',:\ COlWOCcs_t,rs iu all /,tl,¢qn[,: 1o speedup

algorithm execution by a lact_w of 2. Each module i_t the lnulliph' mo,hll, cast, co,tailw_l a complete

implementation of the hardwar(" iu t"igure 4.

A traditional version of t l_c I'NN (lla.ssifier algorith_u was l)rc\'i_,_lsly devclol)Ct[ as the basis for

the remote version pres_,,_,c_/ iu t l_is paper. l'his eXl)<'rimcl_t dcmo,,stralcs llac potential merits of a

remote image classificatio_ al_)rit 11111implement.alien. The traditio,,aI v¢,r_ion c×_wut_,d ,,,_ a 100 MHz

Pentium PC. This iml,lmllt'Iltali,),_, written entirely in ('., r('quircd :21113 ('I)1_ scc,),,(ls _,) classify the

complete Blackhills dala scl. ]Iv ;_,_mcnting the P(: wilh a siu_h" illo,tlllc l'UllllillK lilt' I')<N classifier

at 16 MHz, the processiug lime u'a_ r_'<luced to 220 (I1'[7 st,couds. Ill l l_i_ ca_e. I lw ,iclal)ti\'c computing

implementation is 9.2!) limes tas/cr than the soft.war(, \¢.rsioll. A,Idi_g t,l_c addit io_l_l luo,lt_h, improved

execution time to 90 (!l)l; s(,con¢ls.

In our experiments witl_ the r<,tnotc PNN classifi_'r, we" ran a total _,f I diffcrcnl _<-¢,uarios presented

10

in Figure6. Thescenariosallow1>to comparelocalandreInoteversionsofthealga)vii,bin t,ha,texecute

on the clientandserverwith I_ix¢'lhasedor block1,asedal_()rilhmswltereone1,i×elor onet)lockof

pixelsis processed.In eachexperiment,wepresentexecutiontimesfor l.wosofl.wnreiml,lcmentations

(written in JavaandC) and_wol_ar(lwa.reimplemelll.ations(Oil(' ln()(tll[(" (.,1'IXVO 1,1,M,tlc_).

In Table I, we present res_lll s of a iemote iml)lemcn, at.ion of the inme_e classiIical i_m al_,rithm where

one pixel is processed at a time. Note that the i1,11)lemeI_lation of ll,c algoritl,l_l in .lava requires

7598 CPU seconds to complete. l'h_' C version o{ the al?;orithm reqtlir<'s slightly nlore t.i,ne since it is

actually spawned from the local client .lava program I_, excclll.e on l.lw lcm(_te ser\e_ work< ation. ('The

overhead associated wilh calliiig a (! fullction fronl .lava is inchldcd i,l Ill,' exec_ltion tiIne.) For all

practical purposes, the (! a1_,l .Iav_ versions of t.hv alg,,rit hnl requilc S_l,l,roximaielv lhe sallm execution

time. This was a stra.ll,,e res,llt since ,Java is an inlcrprel<'d la.nguae_c, however, we noticed a drastic

improvement in the execul.i()t[ ,,1 .lava programs u_ii,g im),'e re(ellt \'crsi(ms of l lie .lav_ interpreter.

The remote version of the algoritl_l_/ executing on a _in.@' FI (,:\ 1_,)(1_1,' was :/.57 times faster than

the remote software version. Also m)te that the addi,.ion of cmc mc, d_lh, in the ],_flt.il,h' _nodule case

does not impact performance.

The next experiment involvc,I se,_ding a block of dal a from the ('li('_l. to the server for processing. 'The

results of this experilncnt are also sl_o\\,n in Table I. ]l_ ore' experin_(qllS, a_l arbitrary Ifl<)ck size (equal

to 6 rows) was selected. (l.'_l_lrc cxlmriments will idcl_tily lhe Ol,i.i_al I,lock size.) _;ince _here are 512

pixel vectors in a row, and 1 I,ixcl_ I,vr vector, one Iflocl¢ coiiI aiiis 1'2.22"_ Ifix¢'ls. N(_,' ll,a_ 11_¢,execution

time of the remote .lava \e_'sio_l _,t' the 1)lock-based algoritl_n_ is si.e:,tilica_tly s_,mller llla]_ the pixel-

based algorithm. The exc(_tl.ion l i_nc reduced from 7.5!)8 to 1:{58 (:1_17 sect,ntis. ()_lce a e;ai_, the single

module implementation was sigifi icanl.ly (7.6 times) [/isler lhan th(? relllol(' soft\var(' versi_m written in

Java. The addition of a se,o,ld _l_l_le did not 1)rovi,le a Sl)ee,lu 1) _11_' to file ovcrlt,'ad associated with

sending a block of dat._ to tl_e scr\c|'. Please hole ll_ai, l.l_c l"P(i:\ c_q_roccssor cot_islc_t_lv processes

a pixel at a time, how¢'x'er. 11,¢"solvet will wait for all pixel_ in a block l.r) I_e i)roccs_¢'d I,cfore sending

the results back to the cliclli.

Table II presents res_dl.s el I'NN classification excc_ i,_g ¢m a local w(_rl.:sl.al.i()_, lhe (li,'l_t. program

can initiate execution o1' eilh_., o1 ll_e software o, har_lwa,'c algol'ill,l,I i,,lpl('nu'_ll_,l iol,s. In the local

pixel-based algorithm, lhe .Ja;';/ \ersiolt requires al_oul I:_I7 (:1'1 s .tends an_l lhe single module

implementation requires 1.11 ( !I'[ see(rods. This is alq_roxi_mlely an o_'_h,, of lna.gl_il.ude il_provement

in execution time. The mulliph. _nodule version complel.es in 77 s('('_),l(ls resull.illt[ in a 2:1 speedup

over the single module as exl_ccl._._l. The results fl'om -l'ahlc I1 ill,Is, tale l.]lnl 1)locl_ I_ascd i_rocessing is

I1

counterproductiveon a]o('_llclio,if workstation.

r. CON( :l.I 7SION,q

In this paper, it was slIoWll t,ll_l 1he implementat io1_ of a ll_111tispect ral image clas_itier ol_ an adaptive

computer yields an order o[' ma_:_il u<le performance incrc-asc over higll cud workst_l ions. If we extract

the fastest execution times toy 1t,' algorithm from lhc 'l'aldcs presclllcd, we find ;_ii interesting result

that relates to the potential il,l]);_ct of remote adaptive colnputi,lg _cch,,ology. I'he t'astest remote

hardware implementalion of t l., I'NN algorithm consisted of a _i.l_lc illoduh _ l_'quirillt_ 178 CPU

seconds to complete. ()ll the _ll.'r hand, the fastcsl local software \'cvsioi_ of llw algorilhln was the

Java version that required I:_()!) (!PIT seconds. l'l_is is 7.35 limes slower lhan lllc' ren/ol,e hardware

implementation, ttence, for illmgc classification, a lcmot, c hardware illll)h'lncntat ion of l lie algorithm

is faster than a local soft.ware it,ll,h-m,_ntation of the algorithm, l:ul _lrc w,_rk is 1o i(hml ifv additional

applications wherein a r(:mol,. I,_tdware implemei_lation is consisl,'nllv faster t]l_n a local software

version. Additionally cxpcrit_..lll> that quantify tile effects of ;I I.';_\il 5' loaded ll,'twork connection

should be conducted.

[1] J. Hess, D. Lee, S. llarpcr, M. +]_m+,s. and P. Athanas, "lmplenlel_l;llion and I':v,luati.n of a I'r<,tolypt lleconfigurable

Router," in Seventh lEEK ll'ovksl..p ,m H'(,As .for Cu._tom ('omlmlmq :lla(him *, I!!!19,

[2] M.J. Wirthlin and B. L. tlul(hin,.;s, "'l)l'-;(:: A Dynamic he-lruclion S(,! Compul(,r,'" in Third 11:'15; W,,rkdtop on FPGAs

for Custom Computing .llachiu( s, [!it)7,

[3] L. Agarwal, M Wazlowski, aml S. (;].,sh. "An Asynchronous Al,lm.;.eh Io l';l[ici..l Execution of Programs on Adaptive

Architectures Utilizing t" Pl ;,,ks," in ,b, .-,rod IEEE Workshop on t"I_GA , ,[or Custom : :on_p.lin!] :lla_ hi.e*, 19'_1, pp. 101-110.

[4] P. Bertin and Herve' Touali. 'q)\M Ihogramming Environmc_lls: Praclice aml l':xpvrie,ce." in .%,,,m/II)t:?E [Vorkshopon

FPGAs ]or Custom Conzputinq :ll,u.hme._, 1994, pp. 133-135.[5] H. Hogl, A. Kugel, J. Ludvig, II NI;_,tl.,r. IC.. 11. Noffz, and P.. Zoz, "l':nahle-l--l: A %,'_ol,,l (;enera,i,,n I:P(',\ Processor," in

Third IEEE Workshop ou I:'t_(,':I _ [,. (:..'tom Computbtg M,whi_te.,, l_!)q.

[6] C.P. Cowen and S. Moimghan, "'.\ ]{.,onfigural_le Mont,e-{7?arlo (:lust,,ring Pro<essc,r (MI:(:P)," i_t ,_',c,.,.I IEEE Workshop

on FPGAs ]or Custom ('ompulinq M_l.htn_ s, 1994, pp. 59--6:..

[7] S.D. Scott, A. Samal, and S. g{th. "'II(IA: A Hardware-Based (:;cmqic Algorithm. '_ in .I(?.U/,SI(,'/).-I .S'!V.p._'ium on Field

Programmable Gate Arr.qs, 19k)5. pp 53 5!).

[8] P. Graham and B. Nelson, "'Ge._qic Algorilhn_s in Software and it* IIardware:' in l"ouvth 1EEE II.vkd., l, on FPGAs]or

Custom Computing Machittee, l_.!!i_.

[9] W. Luk, T. Lee, J. Rice, and P, {'hcl,:4, 'qleconfigurable {;Omlnlti_ :\ugnlcnled II_alily," in .%v, .th IEt'lC IVorkshop on

FPGAs for Custom (-'ompnlinq :ll,trh:.es. 199_}.[10] M. Gokhale, B. Holmes, ,\. I(_)pser, l). I,:unze, D. Lopresti, % Lueas, 1l. Minnich, ai.I P. Olsen, '%plash: .\ l/econfigurable

Linear Logic Array'," in hzt, r.,_t/,,.,,I ('o..fivence on t_aralbl Proees,.i.g, 19til), pp. q2fi _,:12.

[11] J. Vuillemin, P. Berlin. [). _;om'i.. ?d. ";hand, H. Touli, aml P. llcm(mrd, "t_ro:dranlnl:lhh ' Acqivv M, mori:,s: lleconfigurable

Systems Come of Age," I/)L'/: Iv,.,,,:, li,,._ ,., VI,,ql ,';!l_t_m_, vol, I. m_. 1, pp ",t_ I;!_. I!i!!/;

[12] P. Athanas and L. AhL¢,_I, "V,. :d lim_ ima::,e processing o_, a cuslom COml.ltin: A I,l:_lfi,_m," 1I)t)1" ('oml)*_ter, vo]. 28, pp.

16-24, February 1995.

[13] N. Shirazi P. Athanas and 3.. At,I.,_I. '[ml_lement, ation of a 2-D fast f_mrier lral,sl_,im (;n ;in 1:1)( ;,\-bzise([ on-,tom computing

machine," Proceedings o.f lh, 7,1h t.l, _ .altoaal IVorkshol_ o. t'i_ld-I_r,.Ir,.mo,_t,I, L,,!/i, a.,I ..llq_li¢,tl",ns, re] 1, pp. 282-292,1995.

[14] B. Hutchings M. Ranch{ r. "Autom;m,d Target Recognition on Splash-II,': ll)EIi .h)t.,1,,_,_",m ,m l'i, td fVv:V',..mable Custom

Computing Machines, w_l. I. pp. 2_2 211_, April 1997.

[15] K. Bazargan and M. Sarrafz;_deh, "t:,._.,h<urable Comlmling , Au.4m.,.a,,d I{<_lits,'" i. ,h', vcvth It'f'/' ll',_vl,./._p on FPGAs

`for Custom Computin:/ M,.-hit. _, I!_!i!_.

[16] Get ready for reconfiguraLl_' e_.nllmti,_,. ,'" (!omputcr [)csig_a. vol. l, pp. 55 I;3, \pril 1!_!1_.

[17] T. Hoffmann U. Nageldi*<_'<'r I_,. II_rt,q_t, in M. ller and li Ix';.iserstaul_ rn, "(h_ r<¢_,.lixural_le co p_,. ,,ssin_ u_dls," Proceed-

ings o] the 5th ReeonJig.v.ld_ .l_,hlt,,/,.,s Workshop (I_A II":_,v). vc.I. I, March :_l_ I!!:!s.

12

[18] M Wazlowski, L. Agarwal, T. l._,..\. _milh, E. Lain, P. AI}lall;lS, II. %ilverman, stud S. <',hosh, _'I'IIISM-II Compiler andArchitecture," in First II;'EL' [l',_t'/,.,h,q_ os, l"[)(;As _T" (.'t:xlon, ('o_q_Mi_ 9 3[a, tlh_¢ _, I!l!i:], pp. 9 l I;.

[19] P. M. Athanas and It. F. 3ilverulan. "I_roc_'ssor l/ecoufiguralir_n Thrrmgh In_trm:ti_,lt.qet Metaulori,hosi_;," /EEE Computer,

vol. 26, no. 3, pp. 11 -lg, 1!_93.

[20] D. Galloway, "The Traz_smogrili<.r (! Ilardware Description i_anxua_' :rod C_mq,ih.r Ibr I"[)GAs, '' i_ Third //;'EE Workshop

on FPGAs for Custom (,'omp_lim, ,Ihlchiuce, 1995.

[21] J. B. Peterson, R. B. O'Connor. and ]), M. Athanas, "Sch_,duling a.d Partitioni,,_ ASq[-(_ lh'o_rams ,,.to Multi-FPGA

CCM Architectures," iu t"ourlh IL'/;'t; W,_rk._hop on FP(;As for (_._t,,m (_oT_pulitt!l .llewhmc._, l_!_t;

[22] M. Figueiredo and C. Glosser. _qnlld,'menlation ofa prol_abili_,tic neur;ll network fi_r muhi sp<'clral image _ I;_..sificalion on anfpga-based custom cotnlml, ing machi,(C' Proc_dinys o.[ the ,sth ]'Ira.':ilia_ ,¢;yml)os_t_ o_ Neurctl :\_ l_,orks, l)t'(:ember 1998.

[23] D. L. Verbyl, Satellite rtmot_' .,,u,i_rl ,_.[ Hal_wal resom'ct._', ];,,wis lhd>lMu-rs, L,,wi- Ihddi-hers, 1!t!_"[24] M. Birmingham S. R. (:hetlri, R. I". (h'_mq,, "Design ¢7f u(.ural mtwt_rks for <'lassificalion of r, tm,_ely --nsed imagery,"

Telematics and In]ot'matzc_, vol. !i [q;. 1.15 156, 1992.

[25] S. R. Chettri and R. t". (';romp, "th_,hahiIistic neural network archihct.re for high-_,t*eed clas;sific:_tion ot remotely sensed

imagery," Telematics aud I_([orm,l_¢,. vol. II), pp. 187 19g, 1993.

[26] Giga Operations, Giga ()l_Cr,_tio._ ,_'l_,,'tc.m f?cconfigur, tl,l_ (:ouqmli_*,/ I_lalform l),,,'.m_tati,m. (;iga ()p, ralion Corpora-

tion, 1995.

10 bits

0

d number

pi;:< el weight',,.,'ec t o r vector

Numl)er of Classes K

[I

Class S kNurNDer ,Dr weiglmti

In cl_a@ "_k = Pk

Fig. 1. PNN Image Classifier.

Ii.. "2. I:])(;A Imard r,,ihnecl{.(I t_,.,, I'(11 I),>_l,)t

]l

Multi-s pectralsatellite image

_-_ download BIT file

_xecutable) image pixels '_

\ / _\ classified data ,:i_3_

"%.

"%. •

"\ _ dtsplay

"_las sift ed image

\.

FPGA Board

Z

O

e_

O

m

_mory Uni

.... IIiWeight_Mamory

Unit

0

_,_ i_[_

_mo_ Unil

i ! 1

W

ant

L

SubtFac'dc_

Unit

8

T.@)0-r

_J/ h.

Ir

d 'dme_

....-----_---%I ,.-.,_,d t;

Square- _ Unit

Unitl t ._l_ccurr, ulato u___9__,_

}:."2 a

1_.4ullJl:,lier

I.Init

I _ E-'_onenlJalLUT................................................................... Memory Unit

Lm, mm m m m mm m n m ,mum, m m ram, mmmm mmm ,imm mm mm m

XFPGA I

m<,[_/"_ Memory Uni 1

Result Unit I _, h

ICI_

ComparisonUnit

.._

/

I I _ I cla_ I d I_._,,-_._-n._l I_lKi _.!,.,I._plierm.d._,0ozurr'ul_qm-_---I_l.l_ullJp]ie f

I u_"' I_ (l ,-,_,it k I '-'nit

-., -" Pk lJrne._

eb, ec

PNN Data Flow Block Diagram

.... ,_ irmiflalize,'l L,efore F PG._.I,zoJ

___ o_-enec__,'l tothe HBUS

Vig. 4. t"NN (:h,.silier Dala [Fnit.

1(_

I LOCAL CPUIocal.ncsu.edu

,c_,e__i

North Carolina State University

I Remote CPU }remote.nasa.gov

sei'v_r

NASA Goddard Space Flight Center

_J[ _, rr,l,,ef_

Fig. 5. Algorithm Executiol, {,t _ l{emol_'Adapliv_(;{mqmt,zr.

17

Total Execution Time (in CPU seconds)

9000

80007000

6000

E 500040003000

0 20001000

0

_ _-_ -_ oo _n-

Image Classification Type

[] Software (Java)

[] Software (C)

[] Hardware (Single)

[] Hardware (Multiple)

Fig. (';. Local and Remote lm_q-(Tassific_lion l_xecution Time.

TA Igl,l,2 1

REMOTE IMAGE CLASSIFICATION '1 IMIN'(; H, EPORT (IN C])e SECONDS).

Description

Software (Java)

Software (C)

Hardware (Single)

Hardware (Multiple) I

lqxel I_ased

Avg Row Total

14.5:_ 7597.79

15.7!) 8087.2:1

4.l(i 2121>.38

4.25 I 215(i.02

l{lock Based

Avg Row Tot

2.65 135,'_ t)[

3.61 l_q 7

0.35 17_." I 70.35 t 180.

'IA I{I,E lI

LOCAL IMAGE CLASSIFICATION 'I'IMINt; I{EPOI_I' (IN CPII _ECONDS).

Description

Software (Java)

Software (C)

Hardware (Single)

Hardware (Multiple)

l'ix,'l I]ased

Avg Row Total

2.57 1317.31

3.57 1871.36

0.:27 1:] I. 10

0.11 77. i5

l{h>ck Bas,,d

Avg Row Tot

2.56 1:{0!)

3.69 1_>_

0.2S 143

0.2E 1.1:2

!13

19

Marco Figueiredo isanR&DEngineerworkingill theinvestigationandal)plicaii,,l_()fr('configurable

computingat NASAGoddardSpaceFlightCenter.Hehasa B.S.ill Electricalt';_,it_('('ringfrom the

UniversidadeFederaldeMinasGerais- Brasil(1!)_), andn mastersi_ Conlpul,,_I'._e:i_u'eringfrom

LoyolaCollegein Maryland,USA(1991).

Clay Gloster, Jr. is currentlyanAssociatelh(,f(,ssorin lira Del)artmentof Eh,cllica ,k:Computer

Engineeringat North (larolinaStateUniversity.I1(_receivedtheB.S.and M.S._l,',_,rc(>in Electrical

Engineeringfrom NorthCarolinaA&T State[Tniv(,rsity and the Ph.1). degree in (!,,llllmler Engineer-

ing from North Carolina State University. I-te als() has lw(m employed with IBNI, I/_(, l),'partment of

Defense, and the Microelectronics Center of ix,'orlll (:arolizm. Currcllt research fo,_l.,'s (,l_ lhe identifi-

cation of potential applications and the develol,mCllt of autol_mted tools that assisl :., i(.Jll ists/engineers

in mapping these applications onto reconfigurahh, c(,nputing resources. He is also ;_c_ivy,Ix" conducting

research in the area of technology based curricuh:in develol,_mnt and distance e(hl(;_ toll. Dr. Gloster

is a member of Eta Kal)pa Nil, Tat; Beta Pi. A( _,\1, an is a registered l)rol'ession_l ,._.._iil,,cr.

Mark Stephens is a (.',omp_iter Engineer workil,,, for tlm NASA (',c,<ldard Sl)aCc tli,_hl (!enter. He

has a B.S. in Applied Math, Theoretical Physics ;,_¢1 l"luid I)ynamics from TulaIl(' I _li\(wsity (1976).

He went on to get a Master of Science in Atmo._l,hcric Scic_we at (!olorado Stal_ Il_i\'cl_ily (1979).

Corey Graves received his B.S. degree in Elecl rical Engineering from Nc, rth Caroli_l:_ SI;tlc University

in 1993 and his M.S. degree in Electrical Engi,e('riHg from _<,lth (larolia_a Slate A,\I _lill,," University

in 1994. He is currently COml)leting his Ph.D. \V,)rk in Co_al,_ter Engim_ering at N,,_lh (',rolina State

University. His current research interests inclu&, f{_m-timc l{econtiKurable COml,_l i_:._. Applications

of Wavelets, Adaptive DSP algoritlams, and Archil _,¢'t_u'es I,,r 1)SP. lh_ring his 3C_,l- a s_ udent, Corey

has worked with many industrial and governntc_ll c_l.itics includi_lg IBM, ATL I. %,_l,lia National

Labs, and Oak Ridge National Labs. He is curre_ll ly a mcnll,_'r of the I]';l"I" Signal I'lo,'_.,sing Society.

Mouna Nakkar received her Bachelor's degr_'e il, I:;l,_ct.,'ical l:a_ginccring from ihc [ _ix'_'_it.y of North

Carolina at Charlotte, NC in 1991, her Master's hOln the s_ne tmiversily ill 1!)!):_. :_11_1I/or Ph.D. de-

gree in 1999 from North Carolina State University iI_ l/alci-h, NC. She joined I_l,,l,,roh_ Inc. in July

of 1999 and is currently, working on microproccss_,r research and dc\_qolmwnt., l let rcsc_rch interests

include reconfigurable comp_l.ing, embedded l,rO,cssors. 1:I'( ',A archil.c,l.ure, low i,,,,.',cr I"PGAs, high

speed CMOS design, Multi-( lhip-Module Pa.ck_v,,i_m:. 3 1)imcn_ional Nl( !NI 1)ackagi_. _,I_,l Optical In-

terconnects for high speed VLSI packaging. SIic worked Ol, a. microprocessor &'>],_l, ,l_ring a co-op

in 199.5 at Ross Technologies. Austin TX. She als,_ served i_s a lcachi_L_ assislanl t,,_ _c\_'ral electrical

engineering courses.

2O