A Modular and Scalable Architecture for the Realization of High-speed Programmable Rank Order Filters Using Threshold Logic

VLSI DESIGN 2000, Vol. 11, No. 2, pp. 115-128. Reprints available directly from the publisher. Photocopying permitted by license only. (C) 2000 OPA (Overseas Publishers Association) N.V. Published by license under the Gordon and Breach Science Publishers imprint. Printed in Malaysia.

I. HATIRNAZ, F. K. GURKAYNAK and Y. LEBLEBICI*

Worcester Polytechnic Institute, Department of Electrical and Computer Engineering, Worcester, MA 01609-2280

(Received 1 June 1999; In final form 10 November 1999)

*Corresponding author.

We present a new scalable architecture for the realization of fully programmable rank order filters (ROF). Capacitive Threshold Logic (CTL) gates are utilized for the implementation of the multi-input programmable majority (voting) functions required in the architecture. The CTL-based realization of the majority gates used in the ROF architecture allows the filter rank as well as the window size to be user-programmable, using a much smaller silicon area, compared to conventional realizations of digital median filters. The proposed filter architecture is completely modular and scalable, and the circuit complexity grows only linearly with maximum window size (m) and with word length (n). A prototype of the proposed filter circuit has been designed and fabricated using double-polysilicon 0.8 µm CMOS technology. Detailed post-layout simulations and test results of the ROF prototype circuit indicate that the new architecture can accommodate sampling clock rates of up to 50 MHz, corresponding to an effective data processing rate of 800 Mb/s for a very large filter with window size 63 and word length of 16 bits.

Keywords: Rank-order filters, median filters, DSP architectures, majority function, voting circuits, capacitive threshold circuits


1. INTRODUCTION

As a generic definition, the rank order filter (ROF) is a non-linear digital filter which determines the i-th ranking element in a given window consisting of (m) binary encoded input words (vectors). In a simple one-dimensional example, the rank-order filter would process a certain number of input vectors contained in a sliding window, and produce an output that corresponds to the i-th ranked vector in the current window, as illustrated in Figure 1. As the sliding window moves by one vector, the overall ranking (rank-ordering) will have to be updated to produce the next output, again corresponding to the i-th ranked vector in the new window. Figure 2 shows the two-dimensional application of this principle on a simple (3 x 3) window, where the center element (pixel) is being replaced by the median value contained in the current window.

FIGURE 1 One-dimensional illustration of the rank-ordering process. (Labels: sliding window containing m vectors, moving along the time axis; the smallest vector (minimum filter); the i-th ranked vector in the window; the median vector (median filter); the largest vector (maximum filter).)

FIGURE 2 Two-dimensional application of the rank-order (median) function to a (3 x 3) window. (Panels: original data window; median filtered output.)

Popular special cases of rank order filters are median, minimum and maximum filters, where the output is determined as the median, the minimum or the maximum value within the input window, respectively [1]. Variants of ROFs are widely used in digital signal and image/video processing because of their non-linear characteristics. Median filters in particular have found many applications in digital image enhancement, such as reducing the high-frequency and impulsive noise in digital images without extensive blurring and edge destruction [2, 3]. Other successful applications of ROFs include the smoothing of noisy pitch contours in speech signals, data compression in block truncation coding schemes, speckle noise reduction in coherent imaging systems, and preprocessing data for machine vision.
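To make the definition concrete, the following is a minimal software sketch of a one-dimensional rank order filter based on plain sorting. It is only a reference model for the filter's behavior, not the hardware method of this paper; the function name, the 1-based "rank-th smallest" convention and the sample data are assumptions made for illustration.

```python
def rank_order_filter(samples, window, rank):
    """Return the rank-th smallest value (1-based) of every sliding window."""
    if not 1 <= rank <= window:
        raise ValueError("rank must lie in 1..window")
    out = []
    for start in range(len(samples) - window + 1):
        ordered = sorted(samples[start:start + window])
        out.append(ordered[rank - 1])
    return out


# Hypothetical test signal; 250 plays the role of an impulsive outlier.
signal = [3, 9, 4, 250, 5, 6, 2]
minimum = rank_order_filter(signal, 3, 1)   # minimum filter
median = rank_order_filter(signal, 3, 2)    # median filter
maximum = rank_order_filter(signal, 3, 3)   # maximum filter
print(minimum, median, maximum)
```

With a window of three samples, the median output never contains the outlier, whereas the maximum filter passes it through three times.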

Several algorithms have been proposed for rank order filters that are based on data sorting. Although these algorithms are suitable for software implementations, they usually result in inefficient hardware structures, since they process the input vectors at the word level. Implementations based on stack filters have an area-time complexity of O(n^2), and the hardware complexity increases very rapidly with window size (m) [7].

In recent years, some innovative bit-serial structures for rank-order filters have been presented, which are mostly based on majority-decision algorithms [4, 6, 9]. Yet, the majority function is typically hard to realize using conventional Boolean building blocks, since it requires a large number of gates and a large logic depth. Consequently, such structures suffer from speed and area limitations, especially if the window size becomes larger than 10 vectors. Also, most of the conventional realizations result in a fixed rank and a fixed window size, which limits the flexibility of their application.

In this paper, we present a new architecture for the realization of fully programmable ROFs based on threshold logic gates, resulting in a very compact and highly modular structure. The architecture consists of a regular array that is composed of only two types of building blocks, and it allows the construction of filter structures of arbitrary size. The processing efficiency of the proposed architecture is significantly increased by fine-grain pipelining in both directions within the array, where the clock frequency remains essentially independent of the window size (m) and word length (n).

The organization of this paper is as follows: The outline of a simple bit-serial algorithm for rank ordering is presented in Section 2. In Section 3, the implementation of the programmable ROF architecture is discussed, and the main building blocks are presented. The structure and operation of the multi-input majority (voting) function blocks are presented in Section 4, followed by a discussion of the prototype ROF circuit and its test results in Section 5. Finally, the conclusions are summarized in Section 6.

2. THE RANK ORDERING ALGORITHM

2.1. Algorithm Description

A bit-serial algorithm first proposed in [6] was chosen as the basis of the programmable rank-order filter architecture implemented in this work. In this algorithm, the problem of finding a rank-order-selection for n-bit long words is reduced to finding "n" rank-order-selections among 1-bit numbers.

The algorithm can be summarized as follows, where a[j,l] denotes the l-th bit of the j-th word in the window and l = 1 is the most significant bit:

begin
  for l := 1 to n do
  begin
    -- sum[l] is the sum of the l-th bit of all the numbers
    sum[l] := 0;
    for j := 1 to m do
      sum[l] := sum[l] + a[j,l];
    -- selecting the i-th ranked bit
    if sum[l] < m - i + 1 then
      selector[l] := 0;
    else
      selector[l] := 1;
    -- where selector[l] is the l-th bit of the i-th ranked number
    for j := 1 to m do
    begin
      if a[j,l] != selector[l] then
        for q := (l + 1) to n do
          a[j,q] := a[j,l];
    end
  end
  i-th-ranked-number := selector;
  -- selector is the concatenation of selector[1], ..., selector[n]
end

The algorithm starts by processing the most significant bits (MSB) of the m = (2N + 1) words in the current window, through an m-input programmable majority gate (voting gate), to yield the MSB of the desired filter output. The key function performed by the m-input voting gate can be described as (≥ k-out-of-m), which means that the gate output is logic "1" iff (k) or more of the inputs are equal to "1". This output is then compared with the MSBs of the window elements. The vectors whose MSB is not equal to the filter output have their MSB propagated down by one position, replacing the less significant bits of the corresponding words. This process is continued for the following bit-planes. Thus, any bit that is not equal to the corresponding stage output is propagated down to the less significant positions, until the least significant bit is processed. This process ensures that all numbers that are greater (or less) than the i-th ranked number in the window are progressively eliminated, eventually leaving the correct output as the only candidate.
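The procedure described above can be sketched in software as follows. This is a minimal illustrative model, not the authors' implementation: the function name and argument names are assumptions, and the rank is expressed directly as the voting threshold k of the (≥ k-out-of-m) gate, so the function returns the k-th largest word of the window.

```python
def bit_serial_rank(words, n_bits, k):
    """Return the k-th largest of `words` using one (>= k-out-of-m) vote per bit-plane."""
    m = len(words)
    if not 1 <= k <= m:
        raise ValueError("k must lie in 1..m")
    # bits[j][l] is bit-plane l of word j, with l = 0 being the MSB.
    bits = [[(w >> (n_bits - 1 - l)) & 1 for l in range(n_bits)] for w in words]
    result = 0
    for l in range(n_bits):
        ones = sum(row[l] for row in bits)
        out = 1 if ones >= k else 0          # programmable majority (voting) decision
        result = (result << 1) | out
        for row in bits:
            if row[l] != out:
                # Mismatching word: copy this bit into all less significant positions.
                for q in range(l + 1, n_bits):
                    row[q] = row[l]
    return result


window = [184, 105, 194, 117, 75]
print(bit_serial_rank(window, 8, 3))   # median of five words
```

Setting k = 1 turns every vote into an OR and yields the maximum, while k = m turns it into an AND and yields the minimum.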

Figure 3 shows an example where five 8-bit words (denoted P through T, with decimal values of 184, 105, 194, 117 and 75, respectively) are being rank-ordered using the algorithm described above. The window size is m = 5 and the rank is k = 3, indicating that the third smallest among these five numbers is being found in 8 steps. Note that the main bit-level operation at each step amounts to a majority (rank) decision among the m bits of the same bit-plane.

In Step 1, the most significant bit plane is processed, and the output is determined as "0" since only two of the MSBs are equal to "1". Notice that the voting function performed by the programmable majority gate at this bit-plane corresponds to a (≥ 3-out-of-5) function. Immediately following this majority decision, the MSBs that do not coincide with the bit-plane output (i.e., the MSB of vector P and vector R) are propagated down to less significant bit-planes, thereby eliminating these two vectors as potential candidates for output.

FIGURE 3 An illustration of the rank-ordering algorithm, for five 8-bit words. (Columns: P, Q, R, S, T and Output; rows: Step 1 through Step 8, along the time axis.)

In Step 2, the majority output is "1" and all 5 bits at this bit-plane match the output; therefore, the process is continued on to the next bit-plane. In Step 3, vector T is eliminated by propagating its "0"-bit down to less significant bit-planes.
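The steps of this example can be reproduced with a short tracing sketch. The code below is an illustrative assumption (the function name and the bookkeeping of "newly eliminated" vectors are not part of the paper); it applies the (≥ 3-out-of-5) vote to each bit-plane of the five words quoted in the text and records which vectors first mismatch the output at each step.

```python
def trace_rank_selection(named_words, n_bits, k):
    """Return a list of (step, output_bit, newly_eliminated_names) tuples."""
    names = list(named_words)
    bits = {name: [(named_words[name] >> (n_bits - 1 - l)) & 1 for l in range(n_bits)]
            for name in names}
    eliminated = set()
    steps = []
    for l in range(n_bits):
        ones = sum(bits[name][l] for name in names)
        out = 1 if ones >= k else 0
        dropped = []
        for name in names:
            if bits[name][l] != out:
                if name not in eliminated:
                    eliminated.add(name)
                    dropped.append(name)
                # Propagate the mismatching bit to all less significant bit-planes.
                for q in range(l + 1, n_bits):
                    bits[name][q] = bits[name][l]
        steps.append((l + 1, out, dropped))
    return steps


window = {"P": 184, "Q": 105, "R": 194, "S": 117, "T": 75}
steps = trace_rank_selection(window, 8, 3)
for step, out, dropped in steps:
    print(f"Step {step}: output bit {out}, newly eliminated: {dropped}")
output_word = int("".join(str(out) for _, out, _ in steps), 2)
```

In this trace, P and R drop out in Step 1, T in Step 3 and Q in Step 4, leaving S (117) as the only remaining candidate.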

It is worth noting that in some cases, the elimination process may determine the correct output before all bit-planes are processed (as in the example, where the output vector S is essentially found after the 4th step). Yet in our implementation, the algorithm is allowed to progress until it reaches the least significant bit-plane, simply to preserve the timing integrity in subsequent runs. Also note that the algorithm allows bit-level pipelining: As the process propagates through less significant bit-planes, the more significant bit-planes can start operating on the next input vector set.

2.2. Realization of the Algorithm

The bit-serial operation flow of the algorithm described above suggests a very simple bit-level pipelined data path architecture. Figure 4 shows the conceptual hardware implementation of the operations associated with one bit-plane.

Note that each bit-plane module consists of two main blocks:

1. The modifier/selector (propagator) block, whose function is to store and to shift the actual data and to calculate the selector signal for the next processing block.

2. The majority or rank decision block, which determines the output bit as a function of (m) bits in each bit-plane, with a (≥ k-out-of-m) operation.
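The rank decision block can be modeled behaviorally as a threshold function with unit weights. The sketch below is a purely logical model under that assumption; the function names are hypothetical and nothing here represents the circuit-level realization discussed later in the paper.

```python
def voting_gate(bits, k):
    """Programmable (>= k-out-of-m) gate: output 1 iff at least k inputs are 1."""
    return 1 if sum(bits) >= k else 0


def threshold_gate(bits, weights, threshold):
    """General threshold function; the voting gate is its unit-weight special case."""
    weighted_sum = sum(w * x for w, x in zip(weights, bits))
    return 1 if weighted_sum >= threshold else 0


msbs = [1, 0, 1, 0, 0]                 # MSBs of the five-word example in Section 2.1
print(voting_gate(msbs, 3))            # only two ones, so the (>= 3-out-of-5) vote is 0
print(voting_gate(msbs, 1))            # k = 1 behaves like an OR gate
print(voting_gate(msbs, len(msbs)))    # k = m behaves like an AND gate
```

Changing only the threshold k therefore switches the same gate between maximum, median and minimum selection.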

In the modifier/selector block, also called a ROF-cell, the output of the majority function is compared with the corresponding data bit, using an XNOR gate. The result of this XNOR operation is then combined (AND operation) with the select signal originating from the previous block. This indicates whether or not the data bit taken from the previous block is a propagating one. If the data bit is a propagating one, then the new select signal will be "0", indicating that this data bit will continue propagating unchanged through the following stages. Otherwise, the select signal will only depend on the result of the comparison of the filter-slice output with the current data bit. Identical 1-bit filter slices can be used in sequence (cascade configuration) in order to process input vectors of arbitrary bit-length. Thus, the filter throughput can be increased significantly by bit-level pipelining. The modular structure of the one-bit slice described above also allows for scalable realization of the ROFs with different window sizes and word lengths.

FIGURE 4 A 1-bit slice implementation of the rank ordering algorithm (for m = 5). (Blocks: Modifier-Selector Block and Majority Decision Block, together forming one bit-slice of the bit-serial architecture.)
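The select/propagate behavior of the ROF-cell can be illustrated with a small behavioral sketch. This is an assumption-laden model rather than the gate-level circuit: the signal names, the choice that an eliminated word votes with its propagated bit, and the omission of the pipeline registers are all simplifications made here for illustration.

```python
def cell_vote_input(data_bit, select_in, shifted_in):
    """Bit presented to the majority gate: own bit while selected, else the propagated bit."""
    return data_bit if select_in else shifted_in


def cell_update(data_bit, select_in, shifted_in, majority_out):
    """Return (select_out, shifted_out) for the next bit-plane."""
    vote = cell_vote_input(data_bit, select_in, shifted_in)
    xnor = 1 if data_bit == majority_out else 0
    select_out = select_in & xnor       # stays 0 once the word is propagating
    return select_out, vote             # the vote is the bit that keeps propagating


def rank_select(words, n_bits, k):
    """Cascade n_bits identical slices (MSB first); returns the k-th largest word."""
    select = [1] * len(words)
    shifted = [0] * len(words)
    result = 0
    for l in range(n_bits - 1, -1, -1):
        data = [(w >> l) & 1 for w in words]
        votes = [cell_vote_input(d, s, p) for d, s, p in zip(data, select, shifted)]
        maj = 1 if sum(votes) >= k else 0
        result = (result << 1) | maj
        updated = [cell_update(d, s, p, maj) for d, s, p in zip(data, select, shifted)]
        select = [s for s, _ in updated]
        shifted = [p for _, p in updated]
    return result


print(rank_select([184, 105, 194, 117, 75], 8, 3))
```

Because each slice only needs the select and propagated-data signals of the slice before it, the same two functions serve every bit-plane, which mirrors the cascade configuration described in the text.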

3. IMPLEMENTATION OF THE PROGRAMMABLE ROF ARCHITECTURE

3.1. System Components

The proposed modular architecture consists of two types of blocks (cells): the ROF-cell and the majority decision gate. By using these two blocks, a programmable rank-order filter of any window size and word-length can be realized. The word-length dictates the number of the majority decision gates, whereas the window size determines the number of ROF-cells driving one of these majority gates. Figure 5 shows the ROF-cell block realization at gate level. Note that in addition to the simple Boolean operators, three DFFs are utilized for bit-level pipelining in both dimensions. At each positive clock edge, the corresponding select and data signals are fed to the next blocks, and all data transmission occurs between neighboring blocks, with the exception of the majority outputs that are transmitted to all ROF-cells in the corresponding bit-plane.

A typical layout of the ROF-cell is also shown in Figure 5. It can be seen that the layout design of this cell permits modular expansion of the array in both dimensions, by abutting the input and output signal pins of each cell with those of the neighboring cells. The programmable majority (voting) gate associated with each bit-plane is responsible for processing the propagated data output of all ROF-cells in that bit-plane. This critical system block can be realized as a single capacitive threshold logic gate, which is described in detail in Section 4.

FIGURE 5 Gate-level structure of a ROF-cell and the corresponding layout, allowing modular expansion. (Signal labels: Prev_Data, Prev_Select, Shifted_Data_In, Shifted_Data_Out, Propagated_Data_Out (to Majority Gate), Majority Output, Next_Data, Next_Select, CLK; blocks: MUX, BUFFER.)

3.2. Overall System Architecture

The signal flow between the ROF cells and the majority gates is shown in Figure 6. The modular architecture consisting of only two major blocks enables fully scalable construction of filter structures of arbitrary size. The ROF core has (n x m) ROF cells, where m = (2N + 1) is the window size and (n) is the bit-length of the input words. The ROF cells processing the bits of the same significance provide the necessary inputs to the corresponding majority decision block, which determines the filter output bit of that level. This output bit is fed to the output shift registers and back to the ROF cells, to be used in determining the select and data signals which will be the inputs of the next stage.

The top level block diagram of the programmable ROF design is shown in Figure 7. The architecture consists of three main blocks: input shift registers, ROF processing core, and output shift registers. To allow bit-level pipelined operation, the input bits are ordered using a staggered shift register array (Fig. 7). As such, the ROF pipeline accepts a new input vector of word length (n) and produces a valid output vector in each clock period. The overall latency of this architecture is (n - 1) clock cycles, meaning that the output vector generated at any given clock cycle corresponds to the input window which was completely shifted into the ROF core (n - 1) cycles earlier.
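The timing of the staggered pipeline can be illustrated with a small scheduling sketch. It assumes, purely for illustration, one pipeline stage per bit-plane and that the MSB plane is evaluated in the same cycle in which the window has been completely shifted in; the function names are hypothetical.

```python
def plane_schedule(n_bits, t_in):
    """Clock cycle at which each bit-plane (index 0 = MSB) votes on the window entered at t_in."""
    return [t_in + l for l in range(n_bits)]


def latency_cycles(n_bits):
    """Cycles between the MSB vote and the LSB vote of the same window."""
    schedule = plane_schedule(n_bits, 0)
    return schedule[-1] - schedule[0]


def window_in_plane(cycle, plane):
    """Entry cycle of the window that bit-plane `plane` is working on at `cycle`."""
    return cycle - plane


print(plane_schedule(4, 10))                         # [10, 11, 12, 13]
print(latency_cycles(4), latency_cycles(16))
print([window_in_plane(13, l) for l in range(4)])    # four different windows in flight
```

Under these assumptions, every bit-plane is busy with a different window in each cycle, which is why one output word can be produced per clock period despite the (n - 1)-cycle latency.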

3.3. Advantages of the Proposed Architecture

The realization of fully programmable rank order filters has traditionally been a very challenging design problem, mainly due to the fact that the rank selection function (programmable majority function) is extremely hardware-intensive using conventional design approaches. As a result, most of the design efforts so far have been constrained either to median-only filters without any rank selection capability, or to relatively small window sizes, or both [7, 8]. The architecture presented here offers the following advantages over other ROF implementations:

1. The CTL-based realization of the majority gates used in the ROF architecture allows the filter rank and the window size to be fully programmable, using a much smaller silicon area.

2. The rank-ordering algorithm implemented with this architecture does not require the elements of the input window to be pre-ordered, as opposed to other, stack-based ordering algorithms [7]. This increases the processing efficiency.

3. The proposed ROF has a modular architecture which enables easy expandability of the window size and word length of the input vectors without a dramatic change in performance.

4. The overall circuit complexity increases linearly both with maximum window size (m) and with word length (n).

5. The clock frequency is essentially independent of the input window size (m), and the overall latency of the pipeline is (n - 1) clock cycles.
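The linear growth of the core can be checked with a short counting sketch based on the (n x m) cell array and one majority gate per bit-plane described in Section 3.2. The function name and dictionary keys are assumptions, and the input and output shift registers are deliberately left out of the count.

```python
def rof_core_resources(window_size, word_length):
    """Count the two building blocks of the ROF core (shift registers excluded)."""
    return {
        "rof_cells": word_length * window_size,   # one cell per word and bit-plane
        "majority_gates": word_length,            # one voting gate per bit-plane
        "gate_fan_in": window_size,               # each gate sees one bit of every word
    }


prototype = rof_core_resources(4, 4)
large = rof_core_resources(63, 16)
print(prototype)
print(large)
```

Doubling either the window size or the word length doubles the number of ROF-cells, while the number of majority gates depends on the word length only.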

4. REALIZATION OF THE MAJORITY GATE USING CAPACITIVE THRESHOLD LOGIC

The design of the modifier/selector block shown in Figure 4 is relatively straightforward using conventional CMOS logic gates, whereas the majority decision block presents a bottleneck with conventional realization methods, both in terms of circuit complexity and in terms of logic depth. This well-known limitation has traditionally been a significant impediment for the hardware realization of similar structures [1, 8, 11, 12]. Since the circuit complexity (number of gates) and depth (number of stages) increase rapidly with the number of inputs, the practical fan-in of voting circuits is usually restricted. For example, a conventional realization of the 63-bit majority gate would require an equivalent of 63 6-bit full-adder circuits, arranged in a network of a logic depth of 64 (synthesized from HDL description).

Since the majority decision is in fact a threshold function, it can be most efficiently implemented with threshold logic. Therefore, capacitive threshold logic (CTL) gates, presented earlier by the authors, offer a more efficient realization of the majority function compared to conventional logic gates [5]. The most significant advantage of the CTL-based realization is that it allows the construction of a very-large-input programmable majority (voting) function as a single-level logic gate. As presented in [5], a 31-input programmable majority gate based on CTL is almost three times faster than a full custom standard CMOS realization [10] and occupies approximately one third of the area, which results in an area-delay performance increase of nearly an order of magnitude. In addition, the CTL-based majority gate can be easily integrated with the conventional CMOS gates used in the architecture, since the input and output signals are fully CMOS compatible.

The circuit layout of a 15-input single-level programmable majority gate based on CTL is shown in Figure 8, occupying a silicon area of (320 µm x 70 µm) using 0.8 µm CMOS technology. The circuit consists of an array of unit weight capacitors, simple input and threshold switches to control the input variables, and an output voltage comparator to generate the majority output. The layout area of a 31-input majority gate is only twice as big as that of the 15-bit input gate, with (620 µm x 70 µm). The operation of the 31-input majority circuit is demonstrated in Figure 9, where the voting threshold is set to 16 for the first 7 consecutive inputs, and then the threshold is re-programmed to 5 for the next 8 inputs. This circuit is capable of producing a majority output bit with a typical propagation delay of about 3 ns.

A larger version of the CTL-based programmable majority gate with 63 parallel inputs has also been designed for use as the main building block of the ROF prototype chip. The 63-input gate occupies a silicon area of (625 µm x 130 µm), and its worst-case input-to-output propagation delay remains less than 8 ns. Notice that one of the most important advantages of CTL-based realization of majority gates is that the silicon area strictly increases linearly with the input size, while the propagation delay of this one-level structure remains a weak logarithmic function of fan-in.
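The area figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the layout dimensions stated in the text; the variable names are illustrative, and the per-input area is a derived quantity, not a number reported by the authors.

```python
# Fan-in -> (width, height) of the majority gate layout in micrometres, as quoted in the text.
gate_dimensions = {15: (320, 70), 31: (620, 70), 63: (625, 130)}

areas = {fan_in: w * h for fan_in, (w, h) in gate_dimensions.items()}
area_per_input = {fan_in: areas[fan_in] / fan_in for fan_in in gate_dimensions}

for fan_in in sorted(areas):
    print(f"{fan_in:2d} inputs: {areas[fan_in]} um^2, "
          f"{area_per_input[fan_in]:.0f} um^2 per input")
print(f"31-input / 15-input area ratio: {areas[31] / areas[15]:.2f}")
print(f"63-input / 31-input area ratio: {areas[63] / areas[31]:.2f}")
```

Each doubling of the fan-in costs slightly less than twice the area, so the area per input stays between roughly 1300 and 1500 µm², which is in line with the linear growth stated in the text.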

5. EXPERIMENTAL RESULTS AND DESIGN VALIDATION

A prototype ROF test circuit has been designed using a conventional 0.8 µm double-polysilicon CMOS process, to validate the main operation principles of the architecture. The prototype blocks consist of four bit-level pipeline stages, each of which contains a 63-bit programmable majority gate to handle the rank selection. To limit overall circuit complexity and to enable easier testing, each stage is designed to process a maximum window size of four samples. The window size capability of the ROF circuit can be easily increased to accommodate up to 63 samples, since the programmable majority gates already support this window size. The top level layout of this ROF prototype is shown in Figure 10. The circuit occupies a silicon area of (800 µm x 1150 µm).

The operation of the ROF architecture is demonstrated with detailed measurement results in Figure 11, which were obtained using an HP 16500C Logic Analysis System. Here, the window size is 4, and the rank is selected to be 2, meaning that the second largest input word in the sample window will have to be selected. All four bits of the input and the output are displayed individually. The input words for this sample window are "0101", "0110", "0111" and "1000". The correct output, "0111", appears after 3 clock cycles. Figures 12(a) and 12(b) show the measured filter output for longer input sequences, again demonstrating that the prototype circuit produces the correct output sequence with a latency of 3 clock cycles.

FIGURE 8 Layout of the 15-bit input majority gate. The silicon area is (320 µm x 70 µm) using 0.8 µm CMOS technology.

FIGURE 9 The illustration of the 31-input majority gate operation. (Post-layout simulation results; signals: Reset, Eval; horizontal axis: Time [ns].)

FIGURE 10 Top-level layout of the fabricated prototype circuit.

FIGURE 11 Measured output sequences of the prototype circuit. (Signals: ROFIN 3 to ROFIN 0, ROFIN all, CLK, ROFOUT 3 to ROFOUT 0, ROFOUT all.)

FIGURE 12 Measurement results for longer sequences. (a) Searching for the second smallest vector in the window. (b) Searching for the second largest vector in the window.

FIGURE 13 Top-level layout of the ROF circuit with a window size of 63 and word-length of 16 bits. The silicon area is approximately (5 mm x 5 mm).
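The measured behavior can be mimicked with a small streaming reference model. This is a software sketch under stated assumptions, not a description of the test setup: the rank is computed by sorting, the latency is modeled as a simple delay line of (word length - 1) cycles, and the three trailing zero words are hypothetical padding used only to flush the delay line.

```python
from collections import deque


def stream_rof(samples, window, k, n_bits):
    """k-th largest of the last `window` samples, delayed by n_bits - 1 clock cycles."""
    current = deque(maxlen=window)
    delay_line = deque([None] * (n_bits - 1))
    outputs = []
    for sample in samples:
        current.append(sample)
        if len(current) == window:
            value = sorted(current, reverse=True)[k - 1]
        else:
            value = None                     # window not yet completely shifted in
        delay_line.append(value)
        outputs.append(delay_line.popleft())
    return outputs


# Sample window from the text, followed by hypothetical zero padding.
words = [0b0101, 0b0110, 0b0111, 0b1000, 0, 0, 0]
outputs = stream_rof(words, window=4, k=2, n_bits=4)
print(outputs)
```

In this model the window is complete at the fourth input word, and the value 7 ("0111") appears at the output three cycles later.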

Finally, to serve as a dramatic demonstration of the proposed filter architecture, the top-level layout of a programmable ROF core with a maximum window size of 63 samples and a word-length of 16 bits is shown in Figure 13. To our knowledge, the design of a rank order filter of this complexity has not been attempted before. The circuit occupies a silicon area of approximately (5 mm x 5 mm) with 0.8 µm double-polysilicon CMOS technology, and operates with a latency of 15 clock cycles at the clock frequency of 50 MHz, which results in an effective data rate of 800 Mbits/s. The silicon area as well as the performance of this structure compare very favorably not only with any of the existing filter architectures proposed so far [7, 8], but also with any extrapolated characteristics thereof for larger window size and word lengths.
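The quoted data rate follows from one 16-bit output word per clock period. The short calculation below makes that arithmetic explicit; the function names are illustrative, and the latency in nanoseconds is derived here from the stated 15 cycles at 50 MHz rather than reported in the paper.

```python
def effective_data_rate(word_bits, clock_hz):
    """Bits per second when one word of `word_bits` bits is produced per clock period."""
    return word_bits * clock_hz


def latency_ns(word_bits, clock_hz):
    """Pipeline latency of (word_bits - 1) clock cycles, expressed in nanoseconds."""
    return (word_bits - 1) * 1_000_000_000 / clock_hz


rate = effective_data_rate(16, 50_000_000)
print(rate / 1_000_000, "Mbits/s")
print(latency_ns(16, 50_000_000), "ns")
```

For the 16-bit core at 50 MHz this gives 800 Mbits/s, and the 15-cycle latency corresponds to 300 ns.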

6. CONCLUSION

In this paper, we have presented a scalable architecture for the realization of fully programmable rank-order filters. The bit-serial realization of the rank ordering algorithm allows a simple fine-grained pipelined filter architecture which is highly modular and easily expandable. The overall performance and processing efficiency of the architecture are also boosted significantly by the bit-serial pipelining. Capacitive Threshold Logic (CTL) gates are utilized for the implementation of the multi-input programmable majority (voting) functions required in the architecture. The CTL-based realization of the majority gates used in the ROF architecture allows the filter rank as well as the window size to be user-programmable, using a much smaller silicon area, compared to conventional realizations of digital median filters.

The circuit complexity of the proposed architecture grows only linearly both with window size (m) and with the word length (n) of the input vectors. The clock frequency remains essentially independent of the window size and word length, and the ROF core is capable of producing a new output vector (corresponding to the input window) at each clock period, with a latency of (n - 1) cycles. The operation and performance of the proposed circuit architecture have been experimentally demonstrated with a prototype chip. Results indicate that the new architecture can accommodate sampling clock rates of up to 50 MHz, corresponding to an effective data processing rate of 800 Mbits/s for a very large filter with window size 63 and word length of 16 bits.

References

[1] Richards, D. S., "VLSI median filters", IEEE Trans. Acoust. Speech, Signal Processing, 38, 145-152, January, 1990.

[2] Fitch, J. P., Coyle, E. J. and Gallagher, N. C. (1984). "Median filtering by threshold decomposition", IEEE Trans. Acoust. Speech, Signal Processing, 32, 1183-1188.

[3] Huang, T. S. and Yang, G. J., "Median filters and their applications to image processing", School of Elec. Eng., Purdue Univ., West Lafayette, IN, TR-EE 80-1, January, 1980.

[4] Gasteratos, A., Andreadis, I. and Tsalides, Ph. (1997). "Realization of rank order filters based on majority gate", Pattern Recognition, 30(9), 1571-1576.

[5] Leblebici, Y., Gurkaynak, F. K. and Mlynek, D., "A compact 31-input programmable majority gate based on capacitive threshold logic", In: Proc. IEEE Int. ASIC Conference 1998, pp. 281-285.

[6] Kar, B. K. and Pradhan, D. K., "A new algorithm for order statistic and sorting", IEEE Trans. on Signal Processing, 41, 2688-2694, August, 1993.

[7] Lin, C. C. and Kuo, C. J., "Fast response 2-D rank order algorithm by using max-min sorting network", International Conference on Image Processing 1996, 1, 403-406.

[8] Chen, C., Chen, L., Chiueh, T. and Hsiao, J., "An efficient pipelined VLSI implementation of rank order filter", ISSIPNN 1994, 2, 630-633.

[9] Lee, C. L. and Jen, C. W. (1992). "Bit-sliced median filter design based on majority gate", In: Proc. Inst. Elec. Eng. G, 139, 63-71.

[10] Swartzlander, E. E. Jr., "Parallel counters", IEEE Trans. Comput., 34, 1021-1024, November, 1973.

[11] Wendt, P. et al. (1986). "Stack filters", IEEE Trans. Acoust. Speech, Signal Processing, pp. 898-911.

[12] Gu, Q. and Swamy, M. N. S., "A binary logic synthesis approach to the bit-level implementation of generalized rank-order filters", Proc. ISCAS92, pp. 109-112.

Authors’ Biographies

Ilhan Hatirnaz received his B.Sc. degree in electrical and electronics engineering from Istanbul Technical University in 1996. From 1996 to 1998, he worked as a design engineer at the Marmara Research Institute of the Turkish Scientific and Technological Research Council. Since Fall 1998, he has been a graduate student at the Electrical and Computer Engineering Department of Worcester Polytechnic Institute. His research interests include VLSI design, novel charge-based circuit structures for data processing, and digital signal processing architectures.

Frank K. Gurkaynak received his B.Sc. and M.Sc. degrees in electrical and electronics engineering from Istanbul Technical University in 1994 and 1998, respectively. From May, 1997 to December, 1997, he also worked as a research assistant at the Integrated Systems Laboratory of the Swiss Federal Institute of Technology (EPFL) in Lausanne, Switzerland. Currently, he is a graduate student at the Electrical and Computer Engineering Department of Worcester Polytechnic Institute. His research interests include full-custom design methodologies, layout design automation for VLSI, digital arithmetic building blocks, and data-path synthesis methodologies.

Yusuf Leblebici received the B.Sc. and M.Sc. degrees in electrical engineering from Istanbul Technical University in 1984 and 1986, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana-Champaign in 1990. He worked as Visiting Assistant Professor at UIUC between 1991 and 1993, as Associate Professor of Electrical Engineering at Istanbul Technical University between 1993 and 1998, and as Invited Professor at the Swiss Federal Institute of Technology in Lausanne, Switzerland between 1996 and 1998. Currently, he is Associate Professor of Electrical and Computer Engineering at Worcester Polytechnic Institute. His research interests include digital and mixed-signal integrated circuit design, VLSI design automation and layout synthesis methodologies, and novel architectures for digital signal processing applications.
