
CHAPTER 1: INTRODUCTION

1.1 Introduction

Convolution provides the mathematical framework for DSP. It is the single most important

technique in Digital Signal Processing. Convolution is a mathematical way of combining two

signals to form a third signal. Using the strategy of impulse decomposition, systems are

described by a signal called the impulse response. In signal processing, the impulse response, or

impulse response function (IRF), of a dynamic system is its output when presented with a brief

input signal, called an impulse. More generally, an impulse response refers to the reaction of any

dynamic system in response to some external change. Convolution has applications that include statistics,

computer vision, image and signal processing, electrical engineering, and differential equations.

1.2 Introduction to Convolution

One of the most important concepts in Fourier theory, and in crystallography, is that of a

convolution. Convolutions arise in many guises, as will be shown below. Because of a mathematical property of the Fourier transform known as the convolution theorem, calculations involving convolutions can be carried out conveniently by way of Fourier transforms.

1.2.1 Convolution Definition

The convolution of f and g is written f * g, using an asterisk or star. It is defined as the integral of the product of the two functions after one is reversed and shifted. As such, it is a particular kind of integral transform:

(f * g)(t) = ∫ f(τ) g(t − τ) dτ,  with the integral taken over all τ.


While the symbol t is used above, it need not represent the time domain. But in that context, the

convolution formula can be described as a weighted average of the function f(τ) at the moment t

where the weighting is given by g(−τ) simply shifted by amount t. As t changes, the weighting

function emphasizes different parts of the input function.
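As a quick illustration of this weighted-average view, the short Python/NumPy sketch below (our own example, not part of the original text) treats a 3-point uniform kernel g as the weighting function and slides it across a small data sequence f, so that each output sample is a weighted average of f:

import numpy as np

# Weighted-average view of convolution: g is the weighting function that is
# slid across f, and each output value is a weighted average of f.
f = np.array([1.0, 4.0, 2.0, 8.0, 6.0, 3.0])   # example data
g = np.full(3, 1.0 / 3.0)                      # uniform 3-point weights

y = np.convolve(f, g, mode="valid")
print(y)   # [2.333... 4.666... 5.333... 5.666...]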

More generally, if f and g are complex-valued functions on R^d, then their convolution may be defined as the integral:

(f * g)(x) = ∫ f(y) g(x − y) dy,  with the integral taken over all of R^d.

1.3 Types of Convolution

There are two types of convolution. They are:

Linear convolution

Circular convolution

1.3.1 Linear convolution

Convolution is an integral concatenation of two signals, and it has many applications in numerous areas of signal processing. The convolution described above is linear convolution. The most popular application is the determination of the output signal of a linear time-invariant system by convolving the input signal with the impulse response of the system. Convolving two signals is equivalent to multiplying the Fourier transforms of the two signals.

Mathematical Formula:

The linear convolution of two continuous-time signals x(t) and h(t) is defined by

y(t) = x(t) * h(t) = ∫ x(τ) h(t − τ) dτ,  with the integral taken over all τ.


For discrete-time signals x(n) and h(n), the integration is replaced by a summation:

y(n) = x(n) * h(n) = Σ x(k) h(n − k),  summed over all k.
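As a sketch of this summation (in Python/NumPy rather than the report's Verilog or Matlab), the loop below evaluates y(n) = Σ x(k) h(n − k) directly and checks it against NumPy's built-in routine; the sequences are the same ones used in the worked example of Section 2.3.3:

import numpy as np

# Direct evaluation of the linear convolution sum y(n) = sum_k x(k) h(n - k).
x = np.array([1, 2, 3, 4, 5])
h = np.array([-1, 5, 3, -2, 1])

y = np.zeros(len(x) + len(h) - 1)
for n in range(len(y)):
    for k in range(len(x)):
        if 0 <= n - k < len(h):
            y[n] += x[k] * h[n - k]

print(y)                  # [-1.  3. 10. 15. 21. 33. 10. -6.  5.]
print(np.convolve(x, h))  # same result from the library routine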

1.3.2 Circular convolution

The circular convolution of two aperiodic functions occurs when one of them is convolved in the

normal way with a periodic summation of the other function. It occurs naturally in digital signal

processing when DTFTs and inverse DTFTs are replaced by DFTs and inverse DFTs.

Equivalently, the continuous frequency domain is replaced by a discrete one. (See Circular

convolution theorem.)

For a periodic function xT(t), with period T, the convolution with another function, h(t), is also periodic, and can be expressed in terms of an integration over a finite interval as follows:

(xT * h)(t) = ∫ xT(τ) hT(t − τ) dτ,  with the integral taken from t0 to t0 + T,

where t0 is an arbitrary parameter and hT(t) is a periodic summation of h, defined by:

hT(t) = Σ h(t + kT),  summed over all integers k.

When xT(t) is expressed as the periodic summation of another function, x, this convolution is sometimes referred to as a circular convolution of functions h and x.
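The following Python/NumPy sketch (our own illustration, with arbitrary length-4 sequences) computes a circular convolution both directly, by indexing one sequence modulo N as a stand-in for its periodic summation, and via DFTs, which is exactly the replacement of the continuous frequency domain by a discrete one mentioned above:

import numpy as np

# Circular convolution of two length-N sequences, two ways.
x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, 0.0, -1.0, 2.0])
N = len(x)

# Direct form: h indexed modulo N plays the role of the periodic summation hT.
direct = np.array([sum(x[k] * h[(n - k) % N] for k in range(N)) for n in range(N)])

# DFT form: circular convolution becomes point-wise multiplication of DFTs.
via_dft = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)))

print(direct)    # [ 2.  4. 10.  4.]
print(via_dft)   # matches the direct result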


1.4 Properties of convolution

This section describes the properties of convolution. The properties of convolution are:

Commutative

Associative

Distributive

1.4.1 Commutative property:

The commutative property for convolution is expressed in mathematical form:

a[n] * b[n] = b[n] * a[n]

In words, the order in which two signals are convolved makes no difference; the results are identical.

1.4.2 Associative property:

The associative property describes the way to convolve more than two signals. Convolve two of

the signals to produce an intermediate signal, then convolve the intermediate signal with the third

signal. The associative property provides that the order of the convolutions doesn't matter. As an

equation:

 

(a[n] * b[n] ) * c[n] = a[n] * ( b[n] * c[n] )

The associative property is used in system theory to describe how cascaded systems behave.

Two or more systems are said to be in a cascade if the output of one system is used as the input

for the next system. From the associative property, the order of the systems can be rearranged

without changing the overall response of the cascade. Further, any number of cascaded systems

can be replaced with a single system. The impulse response of the replacement system is found

by convolving the impulse responses of all of the original systems.


1.4.3 Distributive property:

In equation form, the distributive property is written as:

a[n] * b[n] + a[n] * c[n] = a[n] * (b[n] + c[n])

The distributive property describes the operation of parallel systems with added outputs. Two

or more systems can share the same input, x[n] , and have their outputs added to produce y[n] .

The distributive property allows this combination of systems to be replaced with a single system,

having an impulse response equal to the sum of the impulse responses of the original systems.
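A minimal numerical check of the three properties above, written in Python/NumPy with arbitrary example sequences (b and c are kept the same length so that b + c is defined for the distributive check):

import numpy as np

a = np.array([1.0, -2.0, 3.0])
b = np.array([0.5, 4.0, -1.0])
c = np.array([2.0, 1.0, 3.0])

print(np.allclose(np.convolve(a, b), np.convolve(b, a)))        # commutative
print(np.allclose(np.convolve(np.convolve(a, b), c),
                  np.convolve(a, np.convolve(b, c))))           # associative
print(np.allclose(np.convolve(a, b) + np.convolve(a, c),
                  np.convolve(a, b + c)))                       # distributive

All three checks print True.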

1.5 Applications of Convolution

Convolution and related operations are found in many applications of engineering and mathematics. The following are areas where convolution is applied.

In statistics, as noted above, a weighted moving average is a convolution.

In probability theory, the probability distribution of the sum of two independent random

variables is the convolution of their individual distributions.

In optics, many kinds of "blur" are described by convolutions. A shadow (e.g. the shadow on

the table when you hold your hand between the table and a light source) is the convolution of

the shape of the light source that is casting the shadow and the object whose shadow is being

cast. An out-of-focus photograph is the convolution of the sharp image with the shape of the

iris diaphragm. The photographic term for this is bokeh.

Similarly, in digital image processing, convolutional filtering plays an important role in many

important algorithms in edge detection and related processes.

In linear acoustics, an echo is the convolution of the original sound with a function

representing the various objects that are reflecting it.

In artificial reverberation (digital signal processing, pro audio), convolution is used to map

the impulse response of a real room on a digital audio signal (see previous and next point for

additional information).


In electrical engineering and other disciplines, the output (response) of a (stationary, or time-

or space-invariant) linear system is the convolution of the input (excitation) with the system's

response to an impulse or Dirac delta function. See LTI system theory and digital signal

processing.

In time-resolved fluorescence spectroscopy, the excitation signal can be treated as a chain of

delta pulses, and the measured fluorescence is a sum of exponential decays from each delta

pulse.

In physics, wherever there is a linear system with a "superposition principle", a convolution

operation makes an appearance.

In digital signal processing, frequency filtering can be simplified by convolving two

functions (data with a filter) in the time domain, which is analogous to multiplying the data

with a filter in the frequency domain.


CHAPTER 2: LITERATURE REVIEW

2.1 Introduction

The most important operation performed on signals is linear filtering, which can be

performed by convolution. The reason that linear filtering is so important to signal processing is

that it solves many problems and is relatively simple to describe mathematically. In this chapter

we will be looking at convolution. Convolution helps to determine the effect a system has on an

input signal. It can be shown that a linear, time-invariant system is completely characterized by

its impulse response. Using the sampling property of the delta function for continuous time

signals and the unit sample for discrete time signals we can decompose a signal into an infinite

sum / integral of scaled and shifted impulses. By knowing how a system affects a single

impulse, and by understanding the way a signal is comprised of scaled and summed impulses, it

seems reasonable that it should be possible to scale and sum the impulse responses of a system in

order to determine what output signal will results from a particular input. This is precisely what

convolution does - convolution determines the system's output from knowledge of the input and

the system's impulse response.

2.2 Convolution - Discrete time

The idea of discrete-time convolution is exactly the same as that of continuous-time convolution.

For this reason, it may be useful to look at both versions to help your understanding of this

extremely important concept. Convolution is a very powerful tool in determining a system's

output from knowledge of an arbitrary input and the system's impulse response.

We know that any discrete-time signal can be represented by a summation of scaled and

shifted discrete-time impulses. Since we are assuming the system to be linear and time-invariant, it stands to reason that an input signal comprised of a sum of scaled and shifted impulses would give rise to an output comprised of a sum of scaled and shifted impulse responses. This is exactly what occurs in convolution. For discrete-time signals the convolution equation is given by:

y[n] = x[n] * h[n] = Σ x[k] h[n − k],  summed over all k.


Graphical Interpretation:

Reflection of h[k] about k = 0, resulting in h[−k]

Shifting of h[−k] by n, resulting in h[n − k]

Element-wise multiplication of the sequences x[k] and h[n − k]

Summation of the product sequence, resulting in the convolution value y[n]
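The short Python/NumPy sketch below (an illustration of these four steps, with small made-up sequences) reflects h, zero-pads x, and slides the reflected sequence across it, multiplying and summing at each shift:

import numpy as np

x = np.array([3.0, 1.0, 2.0])
h = np.array([1.0, 0.0, -1.0])

h_rev = h[::-1]                                            # step 1: reflection
x_pad = np.concatenate([np.zeros(len(h) - 1), x, np.zeros(len(h) - 1)])

# steps 2-4: shift the reflected sequence, multiply element-wise, and sum
y = np.array([np.sum(x_pad[n:n + len(h)] * h_rev)
              for n in range(len(x) + len(h) - 1)])

print(y)                  # [ 3.  1. -1. -1. -2.]
print(np.convolve(x, h))  # identical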

2.2.1 Graphical illustration of convolution properties (Discrete - time)

A quick graphical example may help in demonstrating why convolution works.

Fig 2.2.1.1: A single impulse input yields the system's impulse response.


Fig 2.2.1.2: A scaled impulse input yields a scaled response, due to the scaling property of the

system's linearity.

Fig 2.2.1.3: We now use the time-invariance property of the system to show that a delayed input

results in an output of the same shape, only delayed by the same amount as the input.


Fig 2.2.1.4: We now use the additivity portion of the linearity property of the system to complete the picture. Since any discrete-time signal is just a sum of scaled and shifted discrete-time impulses, we can find the output from knowing the input and the impulse response.

2.3 Convolution – Analog

In this module we examine convolution for continuous time signals. This will result in the

convolution integral and its properties. These concepts are very important in Engineering and


will make any engineer's life a lot easier if the time is spent now to truly understand what is

going on.

2.3.1 Derivation of the convolution integral

To begin this, it is necessary to state the assumptions we will be making. In this instance, the

only constraints on our system are that it be linear and time-invariant.

Brief Overview of Derivation Steps:

1. An impulse input leads to an impulse response output.

2. A shifted impulse input leads to a shifted impulse response output. This is due to the time-

invariance of the system.

3. We now scale the impulse input to get a scaled impulse output. This is using the scalar

multiplication property of linearity.

4. We can now "sum up" an infinite number of these scaled impulses to get a sum of an infinite

number of scaled impulse responses. This is using the additivity attribute of linearity.

5. Now we recognize that this infinite sum is nothing more than an integral, so we convert both

sides into integrals.

6. Recognizing that the input is the function f(t), we also recognize that the output is exactly the

convolution integral.

Fig 2.3.1.1: We begin with a system defined by its impulse response, h(t).


Fig 2.3.1.2: We then consider a shifted version of the input impulse. Due to the time invariance

of the system, we obtain a shifted version of the output impulse response.

Fig 2.3.1.3: Now we use the scaling part of linearity by scaling the system by a value, f(τ), that

is constant with respect to the system variable, t.

Fig 2.3.1.4: We can now use the additivity aspect of linearity to add an infinite number of these,

one for each possible τ. Since an infinite sum is exactly an integral, we end up with the

integration known as the Convolution Integral. Using the sampling property, we recognize the

left-hand side simply as the input f(t).

2.3.2 Convolution Integral

As mentioned above, the convolution integral provides an easy mathematical way to express the

output of an LTI system based on an arbitrary signal, x (t), and the system's impulse response,

h(t). The convolution integral is expressed as

y(t) = x(t) * h(t) = ∫ x(τ) h(t − τ) dτ,  with the integral taken over all τ.


Convolution is such an important tool that it is represented by the symbol *, and can be written

as

y(t) = x(t) * h(t)

By making a simple change of variables in the convolution integral, replacing τ with t − τ, we can easily show that convolution is commutative:

x (t) * h(t) = h(t) * x(t)

2.3.3 Implementation of Convolution

Taking a closer look at the convolution integral, we find that we are multiplying the input signal

by the time-reversed impulse response and integrating. This will give us the value of the output

at one given value of t. If we then shift the time-reversed impulse response by a small amount,

we get the output for another value of t. Repeating this for every possible value of t, yields the

total output function. While we would never actually do this computation by hand in this

fashion, it does provide us with some insight into what is actually happening. We find that we

are essentially reversing the impulse response function and sliding it across the input function,

integrating as we go. This method, referred to as the graphical method, provides us with a much

simpler way to solve for the output for simple (contrived) signals, while improving our intuition

for the more complex cases where we rely on computers. In fact Texas Instruments develops

Digital Signal Processors which have special instruction sets for computations such as

convolution.
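As a sketch of this sliding-and-integrating picture (in Python/NumPy, with an illustrative pulse and a decaying-exponential impulse response of our own choosing), the continuous convolution integral can be approximated by a discrete sum taken with a small time step Δ, which is also the idea behind the continuous-time Matlab example later in this section:

import numpy as np

dt = 0.01                               # small step Δ
t = np.arange(0.0, 5.0, dt)
x = np.where(t < 1.0, 1.0, 0.0)         # unit pulse of width 1
h = np.exp(-t)                          # decaying-exponential impulse response

# y(t) ≈ Σ x(kΔ) h(t − kΔ) Δ : reverse, slide, multiply and sum, scaled by Δ
y = np.convolve(x, h) * dt

print(y[int(1.0 / dt)])                 # ≈ 0.63, close to the exact 1 − e^(−1)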

The main assumption of the consistency principle and the mutual correspondence principle

between continuous and digital transformations is that the signal is represented discretely


through shift sampling and reconstruction. An image convolution is a filtering step in which an

image is the input and a computed image is the output, with each sample of the output image

calculated by individually weighting and then constructively and/or destructively summing the

samples from some neighborhood of the input image. We implemented the algorithm shown below, which is described in the referenced literature. We take the two discrete finite-length sequences and line the columns up as in regular multiplication, but rather than carrying a number over to the next column, it is written down in the same column. For example, let us say that two discrete finite-length sequences x[n] and h[n], where x[n] = {a1 a2 a3} and h[n] = {b1 b2 b3 b4}, are convolved, y[n] = x[n]*h[n], in a way that is similar to regular multiplication, as shown below in Table 2.3.3.

Table 2.3.3
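A small Python/NumPy sketch of this tableau (with made-up numeric values standing in for a1..a3 and b1..b4): each shifted row holds the partial products of one sample of x with all of h, and summing each column, with no carry propagated to the next column, reproduces the convolution.

import numpy as np

x = [2, 1, 3]          # a1 a2 a3
h = [1, 4, 0, 2]       # b1 b2 b3 b4

cols = len(x) + len(h) - 1
rows = np.zeros((len(x), cols))
for i, a in enumerate(x):              # one shifted row per sample of x
    rows[i, i:i + len(h)] = a * np.array(h)

y = rows.sum(axis=0)                   # column sums, no carries propagated
print(rows)
print(y)                               # [ 2.  9.  7. 16.  2.  6.]
print(np.convolve(x, h))               # identical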

As we were evaluating possible design approaches, our research took us through the following progression. Figure 2.3.3 shows the convolution flow of two 16-bit numbers, in 4-bit segments. The letters A, B, C, D, E, F, G and H each represent 4 bits of a 16-bit number. We sum the partial products along each column; HD0 is the least significant 4 bits of the product while HD1 is the most significant 4 bits. The digital convolution is summarized as follows: first, flip (reverse) one of the digital functions; second, shift it along the time axis by one sample; third, multiply the corresponding values of the two digital functions; fourth, sum the products from step 3 to get one point of the digital convolution; finally, repeat steps 1-4 to obtain the digital convolution at all times that the functions overlap. For example, let X = [1 2 3 4 5] and v = [-1 5 3 -2 1].

Figure 2.3.3.1 convolution results

A discrete convolution of these two discrete signals equals: -1 3 10 15 21 33 10 -6 5. We used Matlab to check the results, which are shown in figure 2.3.3.1. For continuous functions, y(t) = x(t)*h(t), where the input x(t) and the impulse response h(t) are represented with a sufficiently small delta to make the result accurate. The results are shown in figure 2.3.3.2.

x= [-2*ones(1,400) zeros(1,1000) 3*ones(1,100)]


h = ones(1,300);
conv(x,-3,h,-2,0.01)   % appears to call a course-provided convolution routine taking start times and a time step, not MATLAB's built-in conv

Figure 2.3.3.2 convolution y[n] =x[n]*h[n]


Figure 2.3.3.3 convolution of x(t) and h(t)


High performance Digital Signal Processing chips have been widely employed to solve signal

processing problems. Many of these signal processing solutions can be implemented in a Field

Programmable Gate Array (FPGA) instead of a DSP chip. This is possible because the gate

densities available in FPGAs have increased rapidly within the last few years and now allow

fairly sophisticated DSP algorithms to be implemented within a single chip. One reported design implements the convolution in an FPGA. Its approach to calculating a finite number L of convolution samples requires approximately 3L + L(L+1)/2 clock cycles, plus address generation for the two data memories, which costs a great deal of access time. In that design, the result of each multiplication is extended by six extra overflow bits before it is added to the previous sum of products; this is done to prevent overflow, which is costly.

Depending on the application and desired quality (i.e. the width of the filter kernel), computing this weighted sum of neighboring pixels can require significant amounts of computation, thus suggesting a highly parallel implementation in special-purpose hardware. Another work discusses parameterized program generation of convolution filters in an FPGA for applications in image processing, including real-time video and desktop publishing. It shows an example of a 2-D filter pipeline assembled from a set of multipliers and adders, which are in turn generated from a canonical serial-parallel multiplier stage, and demonstrates a 3x3 convolution filter for video applications. The drawbacks of that approach are its high fan-in and the fact that, because of the pipeline delay, output pixels may be rewritten directly into the source image memory.

It is important to point out the emerging field of algorithm derivation and implementation, which could be used as a basis for future work. One study shows that no restrictions are imposed on the convolution length other than that it be composite, but points out that an FPGA implementation is left as future work. Breitzman shows the automatic derivation and implementation of fast convolution algorithms, and Arce-Nazario presents an automated methodology designed for the high-level partitioning of discrete signal transforms onto distributed hardware architectures.


To efficiently control the number of required multipliers, at the cost of a reasonable number of adders, a study was done on a hardware-efficient fast cyclic convolution algorithm. It shows that the I/O cost can be kept low and the throughput rate high; thus, it is much more efficient than previous cyclic convolution implementation methods. However, independently applying this algorithm to a prime-length DFT would still require a large amount of hardware. Some specific DFT designs remove the multiplication operations, but they require a large number of adders and RAM/ROM resources.

Another approach people use is to go through Matlab, which is used to automatically generate Verilog code for the hardware implementation of convolution algorithms. This automation is very efficient when the coefficients change. As mentioned in one such work, when implementing an FIR filter some inputs go through two consecutive subtraction operators; this optimization can be performed while the Verilog code is being automatically generated. In their implementations they used carry-save adders to accumulate consecutive additions, which are slow compared to other adders, as will be discussed in the next section. Note that the number of required additions depends on the order of the iterations. The iteration order for short convolutions should be 4x4, 3x3 and then 2x2, as this leads to the lowest implementation cost.

One research paper presents an alternative algorithm for calculating the convolution that requires less computation time. It is shown that CDMA receivers require a long time to acquire signals, mostly due to the use of expensive FFT-based convolvers in the acquisition process. The permutations can usually be stored in lookup tables. This type of implementation is not efficient, since it costs additional hardware for storage and additional time for retrieval.

2.4 Symmetric convolution

In mathematics, symmetric convolution is a special subset of convolution operations in which

the convolution kernel is symmetric across its zero point. Many common convolution-based

processes such as Gaussian blur and taking the derivative of a signal in frequency-space are

symmetric and this property can be exploited to make these convolutions easier to evaluate.


The convolution theorem states that a convolution in the real domain can be represented as a

point-wise multiplication across the frequency domain of a Fourier transform. Since sine and

cosine transforms are related transforms a modified version of the convolution theorem can be

applied, in which the concept of circular convolution is replaced with symmetric convolution.

Using these transforms to compute discrete symmetric convolutions is non-trivial since discrete

sine transforms (DSTs) and discrete cosine transforms (DCTs) can be counter-intuitively

incompatible for computing symmetric convolution, i.e. symmetric convolution can only be

computed between a fixed set of compatible transforms.

2.4.1 Advantages of symmetric convolution

There are a number of advantages to computing symmetric convolutions in DSTs and DCTs in

comparison with the more common circular convolution with the Fourier transform. Most

notably the implicit symmetry of the transforms involved is such that only data unable to be

inferred through symmetry is required. For instance using a DCT-II, a symmetric signal need

only have the positive half DCT-II transformed, since the frequency domain will implicitly

construct the mirrored data comprising the other half. This enables larger convolution kernels to

be used with the same cost as smaller kernels circularly convolved on the DFT. Also the

boundary conditions implicit in DSTs and DCTs create edge effects that are often more in

keeping with neighboring data than the periodic effects introduced by using the Fourier

transform.


CHAPTER 3: DESIGN OF HARDWARE MODEL

3.1 Convolution

Convolution is an important tool in data processing, in particular in digital signal and image processing. Many image processing operations such as scaling and rotation require re-sampling or convolution filtering for each pixel in the image. Digital images can be modified (through convolution) by neighborhood operations; these operations go beyond point-wise operations and include smoothing, sharpening, and edge detection. Convolution has many applications of great significance in discrete signal processing. It is usually difficult to deal with analog signals, hence signals are converted to digital form. Many approaches have been attempted to reduce the convolution processing time using hardware and software algorithms, but they are restricted to specific applications. The main problems in implementing and computing convolution are speed, area and power, which affect any DSP system. Speeding up convolution using a Hardware Description Language for design entry not only raises the level of abstraction, but also opens new possibilities for using programmable devices. Today, most DSPs suffer from limitations in available address space, or in the ability to interface with surrounding systems. The use of high-speed field programmable gate arrays (FPGAs) together with DSPs can often increase the system bandwidth by providing additional functionality to the general-purpose DSPs. In this project, a novel method for computing the linear convolution of two finite-length sequences is presented. A 4x4 convolution circuit can be instantiated to build larger ones. The method is similar to the multiplication of two decimal numbers, and it is this similarity that makes the method easy to learn and quick to compute.

3.2 Convolution in time domain


When the convolution of two signals is carried out with respect to time, it is referred to as convolution in the time domain; this project deals with convolution in the time domain. The convolution may be continuous or discrete: when it is performed on discrete-time signals it is called discrete-time convolution, and when it is performed on continuous-time signals it is called continuous-time convolution. Convolution in discrete and continuous time was described in the previous chapter.

3.3 Convolution in frequency domain

When two signals are convolved by operating in the frequency domain, the operation is referred to as convolution in the frequency domain. It can be proved that convolution in the time domain is equivalent to multiplication in the frequency domain.

Proof:

Let f, g belong to L1(R^n). Let F be the Fourier transform of f and G be the Fourier transform of g:

F(ν) = ∫ f(x) e^(−2πi x·ν) dx    and    G(ν) = ∫ g(x) e^(−2πi x·ν) dx,

where the dot between x and ν indicates the inner product of R^n and the integrals are taken over all of R^n. Let h be the convolution of f and g:

h(z) = ∫ f(x) g(z − x) dx

Now notice that

∫∫ |f(x) g(z − x)| dz dx = ∫ |f(x)| ( ∫ |g(z − x)| dz ) dx = ∫ |f(x)| ||g||_1 dx = ||f||_1 ||g||_1

Hence by Fubini's theorem we have that h belongs to L1(R^n), so its Fourier transform H is defined by the integral formula

H(ν) = ∫ h(z) e^(−2πi z·ν) dz = ∫∫ f(x) g(z − x) dx e^(−2πi z·ν) dz

Observe that |f(x) g(z − x) e^(−2πi z·ν)| = |f(x) g(z − x)|, and hence by the argument above we may apply Fubini's theorem again:

H(ν) = ∫ f(x) ( ∫ g(z − x) e^(−2πi z·ν) dz ) dx

Substitute y = z − x; then dy = dz, so:

H(ν) = ∫ f(x) ( ∫ g(y) e^(−2πi (y + x)·ν) dy ) dx = ( ∫ f(x) e^(−2πi x·ν) dx ) ( ∫ g(y) e^(−2πi y·ν) dy )

These two integrals are the definitions of F(ν) and G(ν), so:

H(ν) = F(ν) · G(ν)


Hence, it is proved that the convolution in time domain is equivalent to multiplication in frequency domain.
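A minimal numerical check of this result in Python/NumPy (discrete sequences of our own choosing stand in for f and g; both transforms are zero-padded so that the DFT product corresponds to the linear convolution):

import numpy as np

f = np.array([1.0, 2.0, -1.0, 0.5])
g = np.array([3.0, 0.0, 1.0])

N = len(f) + len(g) - 1
H_direct  = np.fft.fft(np.convolve(f, g), N)       # transform of the convolution
H_product = np.fft.fft(f, N) * np.fft.fft(g, N)    # product of the transforms

print(np.allclose(H_direct, H_product))            # True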

3.4 General implementation flow

The generalized implementation flow diagram of the project is represented as follows.


Figure 3.4 generalized implementation flow diagram

Initially the market research should be carried out which covers the previous version of the

design and the current requirements on the design. Based on this survey, the specification and the

architecture must be identified. Then the RTL modelling should be carried out in VERILOG


HDL with respect to the identified architecture. Once the RTL modelling is done, it should be

simulated and verified for all the cases. The functional verification should meet the intended

architecture and should pass all the test cases.

Once the functional verification is clear, the RTL model will be taken to the synthesis

process. Three operations are carried out in the synthesis process, namely:

Translate

Map

Place and Route

The developed RTL model will be translated into a mathematical equation format that is understandable to the tool. These translated equations will then be mapped to the library, that is, mapped to the hardware. Once the mapping is done, the gates are placed and routed. Before these processes, constraints can be given in order to optimize the design. Finally, the BIT MAP file, which holds the design information in binary format, will be generated and dumped into the FPGA board.

3.5 Implementation

In this project the implementation is carried out by first designing the individual blocks and then combining them into the final architecture. The individual blocks are shown in the block diagram given below:

3.5.1 Block diagram of proposed architecture

The block diagram of the proposed architecture is shown below:


Figure 3.5.1 block diagram of the proposed architecture

3.5.1.1 Multiplexer 4*1 and 8*1:

A multiplexer, sometimes referred to simply as a "mux", is a device that selects between a number of input signals. In its simplest form, a multiplexer has two signal inputs, one control input, and one output. More generally, a multiplexer is a device which selects any one of 2^n inputs and directs it to the output, depending on n select lines.


Figure 3.5.1.1.1 4*1 multiplexer

Figure 3.5.1.1.2 8*1 multiplexer

The higher order multiplexers can be implemented using lower order multiplexers. A 4*1 multiplexer can be implemented using three 2*1 multiplexers, and similarly an 8*1 multiplexer can be implemented using two 4*1 multiplexers together with one more 2*1 multiplexer, as sketched below.
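A behavioral sketch of this composition (written in Python for illustration, not the project's Verilog): a 2:1 selector is used three times to build the 4:1 multiplexer, and two 4:1 multiplexers plus one more 2:1 stage form the 8:1 multiplexer.

def mux2(a, b, s):
    # 2:1 multiplexer: output a when s = 0, b when s = 1
    return b if s else a

def mux4(d, s1, s0):
    # 4:1 multiplexer built from three 2:1 multiplexers (d has four inputs)
    return mux2(mux2(d[0], d[1], s0), mux2(d[2], d[3], s0), s1)

def mux8(d, s2, s1, s0):
    # 8:1 multiplexer built from two 4:1 multiplexers and one 2:1 multiplexer
    return mux2(mux4(d[0:4], s1, s0), mux4(d[4:8], s1, s0), s2)

print(mux8(list(range(8)), 1, 0, 1))   # select code 101 picks input 5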


3.5.1.2 Serial in parallel out block:

A serial-in/parallel-out shift register is similar to the serial-in/serial-out shift register in that it shifts data into internal storage elements and shifts data out at the serial-out (data-out) pin.

It is different in that it makes all the internal stages available as outputs. Therefore, a serial

in/parallel-out shift register converts data from serial format to parallel format. If four data bits

are shifted in by four clock pulses via a single wire at data-in, below, the data becomes available

simultaneously on the four Outputs QA to QD after the fourth clock pulse.

Figure 3.5.1.2.1 Serial in parallel out

The practical application of the serial-in/parallel-out shift register is to convert data from serial

format on a single wire to parallel format on multiple wires. Perhaps, we will illuminate four

LEDs (Light Emitting Diodes) with the four outputs (QA QB QC QD ).

Figure 3.5.1.2.2 Serial in parallel out details


The above details of the serial-in/parallel-out shift register are fairly simple. It looks like a serial-

in/ serial-out shift register with taps added to each stage output. Serial data shifts in at SI (Serial

Input). After a number of clocks equal to the number of stages, the first data bit in appears at SO

(QD) in the above figure. In general, there is no SO pin. The last stage (QD above) serves as SO

and is cascaded to the next package if it exists.

Figure 3.5.1.2.3 Serial in parallel out wave forms

The shift register has been cleared prior to any data by CLR', an active low signal, which clears

all type D Flip-Flops within the shift register. Note the serial data 1011 pattern presented at the

SI input. This data is synchronized with the clock CLK. This would be the case if it is being

shifted in from something like another shift register, for example, a parallel-in/ serial-out shift

register (not shown here). On the first clock at t1, the data 1 at SI is shifted from D to Q of the

first shift register stage. After t2 this first data bit is at QB. After t3 it is at QC. After t4 it is at QD.

Four clock pulses have shifted the first data bit all the way to the last stage QD. The second data

bit a 0 is at QC after the 4th clock. The third data bit a 1 is at QB. The fourth data bit another 1 is


at QA. Thus, the serial data input pattern 1011 is contained in (QD QC QB QA). It is now available

on the four outputs.

It will be available on the four outputs from just after clock t4 to just before t5. This parallel data must be used or stored between these two times, or it will be lost by shifting out of the QD stage on the following clocks t5 to t8, as shown above.
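A behavioral sketch of the shifting described above (Python, not the project's Verilog): each clock edge moves the serial bit into QA and every stage into the next, so after four clocks the serial pattern 1011 sits in the register with the first bit at QD.

def sipo_shift(stages, serial_in):
    # one clock edge: SI -> QA, QA -> QB, QB -> QC, QC -> QD
    qa, qb, qc, qd = stages
    return (serial_in, qa, qb, qc)

stages = (0, 0, 0, 0)                  # QA, QB, QC, QD after the active-low clear
for bit in (1, 0, 1, 1):               # serial pattern 1011, first bit first
    stages = sipo_shift(stages, bit)

qa, qb, qc, qd = stages
print((qd, qc, qb, qa))                # (1, 0, 1, 1): pattern 1011 read as QD QC QB QA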

3.5.1.3 Binary multiplier:

The binary multiplier used here is a 4-bit multiplier which takes two four bit inputs and

gives an 8-bit output.

Figure 3.5.1.3 binary multiplier

The binary multiplier employed for the convolution in the present project has the special characteristic that the internal carry is not forwarded to the next stage. The number of output bits obtained here is therefore only seven, because in a binary multiplier the MSB is nothing but the carry produced by the second MSB; since that carry is not forwarded, only seven bits are obtained at the output.


3.5.1.4 Register:

A circuit with flip-flops is considered a sequential circuit even in the absence of Combinational

logic. Circuits that include flip-flops are usually classified by the function they perform. Two

such circuits are registers and counters.

A Register is a group of flip-flops. Its basic function is to hold information within a

digital system so as to make it available to the logic units during the computing process.

However, a register may also have additional capabilities associated with it. It may have

combinational gates that perform certain data-processing tasks.

Figure 3.5.1.4.1 4 bit register

Various types of registers are available on the market. A simple 4-bit register is shown in Figure 3.5.1.4.1. The common clock input triggers all flip-flops, and the binary data available at the four inputs are transferred into the register. The clear input is useful for clearing the register to an all-0s output.

Registers capable of shifting their binary contents in one or both directions are known as shift registers. A unidirectional 4-bit shift register that uses only flip-flops is as follows:


Figure 3.5.1.4.2 Shift register


CHAPTER 4: RESULTS AND DISCUSSIONS

4.1 Introduction

The Convolution process and the developed architecture for the required functionality were

discussed in the previous chapters. Now this chapter deals with the simulation and synthesis

results of the Convolution process. Here the ModelSim tool is used to simulate the design and check its functionality. Once the functional verification is done, the design is taken into the Xilinx tool for the synthesis process.

Appropriate test cases have been identified in order to test the modeled convolution architecture. Based on the identified values, simulation results that describe the operation of the process have been obtained. This shows that the modeled design works properly as per its functionality.

4.2 Simulation Results

Figure 4.2.1 4:1 Multiplexer


In general a multiplexer has 2^n inputs, n selection lines and one output. Here we are using a 4:1 multiplexer, so it has 4 inputs, 2 selection lines and one output. Based on the selection lines an input is selected and passed to the output. Here, for carrying out the convolution, two multiplexer blocks are used. The above figure shows the simulation results of the 4:1 multiplexer.

SIPO

Figure 4.2.2 serial input and parallel output


In this block the input is the output of the multiplexer. The serial-in/parallel-out block takes the data from the multiplexer as its input, holds the values for up to four clock cycles, and converts the serial data into parallel form. The above figure shows the simulation results of the serial-in/parallel-out block.

BINARY MULTIPLIER

Figure 4.2.3 Binary multiplier


The binary multiplier performs the multiplication operation. Its input is the data obtained from the serial-in/parallel-out block, and it multiplies the values supplied by the serial-in/parallel-out blocks.

Multiplexer

Figure 4.2.4 8:1 Multiplexer and Register


The data from the binary multiplier is applied to the multiplexer. The multiplexer converts the parallel data into serial data, which is then stored in the register.

Top module

Figure 4.2.5 Convolution top modules

The top module shows the complete convolution process. The input is applied to the multiplexers. Based on the selection lines the data is selected, producing an output in each clock cycle. The output data from the multiplexer is applied to the serial-in/parallel-out block, where the data is converted from serial to parallel. The output of the serial-in/parallel-out block is connected to the binary multiplier, so the binary multiplier performs the multiplication


operation, and its output is converted from parallel back to serial. The data is then stored in the register.

4.3 Introduction to FPGA

FPGA stands for Field Programmable Gate Array; an FPGA consists of an array of logic modules, I/O modules and routing tracks (programmable interconnect). An FPGA can be configured by the end user to implement specific circuitry. Speeds were once limited to about 100 MHz, but at present they reach into the GHz range. The main applications are DSP, FPGA-based computers, logic emulation, ASIC and ASSP. FPGAs are mainly programmed using SRAM (Static Random Access Memory). SRAM is volatile, and the main advantage of SRAM programming technology is re-configurability. Issues in FPGA technology are the complexity of the logic element, clock support, I/O support and interconnections (routing).

In this work, the design of the convolution architecture is made using Verilog HDL and is synthesized on the Spartan 3E FPGA family through the XILINX ISE tool. This process includes the following:

Translate

Map

Place and Route

4.3.1 FPGA Flow

The basic implementation of design on FPGA has the following steps.

Design Entry

Logic Optimization

Technology Mapping

Placement

Routing

Programming Unit

Configured FPGA

The above list shows the basic steps involved in the implementation. The initial design entry may be Verilog HDL, a schematic, or a Boolean expression. The optimization of the Boolean expression will be carried out by considering area or speed.


Figure 4.3.1 Logic Block

In technology mapping, the optimized Boolean expressions are transformed into FPGA logic blocks, referred to as slices; here, area and delay optimization take place. During placement, algorithms are used to place each block in the FPGA array. Routing then assigns the programmable FPGA wire segments to establish connections among the FPGA blocks. The configuration of the final chip is produced in the programming unit.

4.4 Synthesis Result

The developed convolution design has been simulated and its functionality verified. Once the functional verification is done, the RTL model is taken through the synthesis process using the Xilinx ISE tool. In the synthesis process, the RTL model is converted to a gate-level netlist mapped to a specific technology library. Within the Spartan 3E family, many different devices are available in the Xilinx ISE tool. In order to synthesize this design, the device named "XC3S500E" has been chosen, with the package "FG320" and the speed grade "-4". The design was synthesized and its results analyzed as follows.


Synthesis Report:

Figure 4.4.1


RTL Schematic:

Figure 4.4.2


Figure 4.4.3

Technology Schematic

Figure 4.4.4


CHAPTER 5: LANGUAGES AND TOOLS

5.1 Verilog HDL

Verilog HDL is a Hardware Description Language (HDL). A Hardware Description

Language is a language used to describe a digital system, for example, a computer or a

component of a computer. One may describe a digital system at several levels. For example, an

HDL might describe the layout of the wires, resistors and transistors on an Integrated Circuit (IC)

chip, i. e., the switch level. Or, it might describe the logical gates and flip flops in a digital

system, i. e., the gate level. An even higher level describes the registers and the transfers of

vectors of information between registers. This is called the Register Transfer Level (RTL).

Verilog supports all of these levels. The industry is currently split on which is better. Many feel

that Verilog is easier to learn and use than VHDL.

Verilog was introduced in 1985 by Gateway Design System Corporation, now a part of

Cadence Design Systems, Inc.’s Systems Division.

                             Verilog HDL allows a hardware designer to describe designs at a high level of

abstraction such as at the architectural or behavioral level as well as the lower implementation

levels (i. e. , gate and switch levels) leading to Very Large Scale Integration (VLSI) Integrated

Circuits (IC) layouts and chip fabrication. A primary use of HDLs is the simulation of designs

before the designer must commit to fabrication.

5.2 Overview of VHDL:

As the size and the complexity of digital systems increase, more computer-aided design tools are introduced into the hardware design process. The early paper-and-pencil design methods have given way to sophisticated design entry, verification and automatic hardware generation tools. The newest addition to these design methodologies is the introduction of hardware description languages (HDLs). Actually, the use of such languages is not new; languages such as CDL, ISP and AHPL have been used for some years. However, their primary application has been the verification of a design's architecture. They do not have the capability to model designs with a high degree of accuracy; that is, their timing model is not precise and/or their language constructs imply a certain hardware structure. Newer languages such as VHDL have more universal timing models and imply no particular hardware structure.

Hardware description languages have two main applications: documenting a design and modeling it. Good documentation of a design helps to ensure design accuracy and design portability. Since HDL descriptions are supported by simulators, they can be used to validate a design. Prototyping of complicated systems is extremely expensive, and the goal of those concerned with the development of hardware languages is to replace this prototyping process with validation through simulation and silicon compilation.

Once an entity has been modeled, it needs to be validated by the VHDL system. A typical

VHDL system consists of an analyzer and a simulator. The analyzer reads in one or more design

units contained in a single file and compiles them into a design library after validating the syntax

and performing some static semantic checks. The design library is a place in the host

environment where compiled design units are stored.

The simulator simulates an entity, represented by an entity-architecture pair or by a

configuration, by reading in its compiled description from the design library & then performing

the following steps.

1. Elaboration

2. Initialization

3. Simulation

VHDL is an acronym for VHSIC Hardware Description Language (VHSIC is an acronym for Very High Speed Integrated Circuits). It is a hardware description language that can be used to model a digital system at many levels of abstraction, ranging from the algorithmic level to the gate level.

The complexity of a digital system being modeled could vary from that of simple gate to

a complete digital electronic system, or anything in between.

The digital system can also be described hierarchically. Timing can also be explicitly modeled in

the same description.

The VHDL language can be regarded as an integrated amalgamation of the

following languages.


Sequential language.

Concurrent language.

Net list language.

Timing specifications.

Waveform generation language.

Therefore, the language has constructs that enable you to express the concurrent or

sequential behavior of a digital system as an interconnection of components. All the  above

constructs may be combined to provide a comprehensive description of the system in a single

model.

The language not only defines the syntax but also defines very clear simulation semantics for each language construct. Therefore, models written in this language can be verified using a VHDL simulator. VHDL inherits many of its features, especially the sequential part, from the Ada programming language. Because VHDL provides an extensive range of modeling capabilities, it is often difficult to understand. Fortunately, it is possible to quickly assimilate a core subset of the language that is both easy and simple to understand without learning the more complex features. The complete language, however, has sufficient power to capture descriptions of anything from the most complex chips to complete electronic systems.

5.2.1 Features of VHDL:

The following are the major capabilities that the language provides along with the features that

differentiate it from other hardware description languages.

The language can be used as an exchange medium between chip vendors and CAD tool users. Different chip vendors can provide VHDL descriptions of their components to system designers, and CAD tool users can use it to capture the behavior of the design at a high level of abstraction for functional simulation.

The language supports hierarchy; that is, a digital system can be modeled as a set of interconnected components, and each component, in turn, can be modeled as a set of interconnected subcomponents.


The language is not technology specific, but is capable of supporting technology-specific features. It can also support various hardware technologies; for example, you may define new logic types and new components, and also specify technology-specific attributes. By being technology independent, the same model can be synthesized into different vendor libraries. It supports both synchronous and asynchronous timing models.

Various digital modeling techniques such as finite state machine descriptions, algorithmic

descriptions and Boolean equations can be modeled using the language.

Test benches can be written using the same language to test other VHDL models.  

5.3 Modelsim

ModelSim is a verification and simulation tool for VHDL, Verilog, SystemVerilog, and mixed

language designs.

5.3.1Basic Simulation Flow

The following diagram shows the basic steps for simulating a design in ModelSim.

Figure 5.3.1 Basic Simulation Flow - Overview Lab

In ModelSim, all designs are compiled into a library. You typically start a new simulation in

ModelSim by creating a working library called "work". "Work" is the library name used by the

compiler as the default destination for compiled design units.


Compiling Your Design

After creating the working library, you compile your design units into it. The ModelSim library format is compatible across all supported platforms. You can simulate your design on any platform without having to recompile your design.

Loading the Simulator with Your Design and Running the Simulation

With the design compiled, you load the simulator with your design by invoking the simulator on a top-level module (Verilog) or a configuration or entity/architecture pair (VHDL). Assuming the design loads successfully, the simulation time is set to zero, and you enter a run command to begin simulation.

Debugging Your Results

If you don’t get the results you expect, you can use ModelSim’s robust debugging

environment to track down the cause of the problem.

5.3.2Project Flow

A project is a collection mechanism for an HDL design under specification or test. Even though

you don’t have to use projects in ModelSim, they may ease interaction with the tool and are

useful for organizing files and specifying simulation settings.

The following diagram shows the basic steps for simulating a design within a ModelSim project.

As you can see, the flow is similar to the basic simulation flow. However, there are two

important differences:

You do not have to create a working library in the project flow; it is done for you


automatically.

Projects are persistent. In other words, they will open every time you invoke ModelSim

unless you specifically close them.

5.3.3 Multiple Library Flow

ModelSim uses libraries in two ways: 1) as a local working library that contains the compiled

version of your design; 2) as a resource library. The contents of your working library will

change as you update your design and recompile. A resource library is typically static and

serves as a parts source for your design. You can create your own resource libraries, or they

may be supplied by another design team or a third party (e.g., a silicon vendor).

You specify which resource libraries will be used when the design is compiled, and there are

rules to specify in which order they are searched. A common example of using both a working

library and a resource library is one where your gate-level design and testbench are compiled

into the working library, and the design references gate-level models in a separate resource

library. The diagram below shows the basic steps for simulating with multiple libraries.

Figure 5.3.3. Multiple Library Flow

5.4 Debugging Tools

ModelSim offers numerous tools for debugging and analyzing your design. Several of these

tools are covered in subsequent lessons, including:


Using projects

Working with multiple libraries

Setting breakpoints and stepping through the source code

Viewing waveforms and measuring time

Viewing and initializing memories

Creating stimulus with the Waveform Editor

Automating simulation

5.5 Basic Simulation

Figure 5.5. Basic Simulation Flow - Simulation Lab

5.5.1 Design Files for this Lesson

The sample design for this lesson is a simple 8-bit, binary up-counter with an associated

testbench. The pathnames are as follows:


Verilog – <install_dir>/examples/tutorials/verilog/basicSimulation/counter.v and tcounter.v

VHDL – <install_dir>/examples/tutorials/vhdl/basicSimulation/counter.vhd and tcounter.vhd

This lesson uses the Verilog files counter.v and tcounter.v. If you have a VHDL license, use

counter.vhd and tcounter.vhd instead. Or, if you have a mixed license, feel free to use the

Verilog testbench with the VHDL counter or vice versa.

5.5.2 Create the Working Design Library

Before you can simulate a design, you must first create a library and compile the source code

into that library.

1. Create a new directory and copy the design files for this lesson into it.

Start by creating a new directory for this exercise (in case other users will be working with these

lessons).

Verilog: Copy counter.v and tcounter.v files from

/<install_dir>/examples/tutorials/verilog/basicSimulation to the new directory.

VHDL: Copy counter.vhd and tcounter.vhd files from

/<install_dir>/examples/tutorials/vhdl/basicSimulation to the new directory.

2. Start ModelSim if necessary.

a. Type vsim at a UNIX shell prompt or use the ModelSim icon in Windows. Upon opening

ModelSim for the first time, you will see the Welcome to ModelSim dialog. Click Close.

b. Select File > Change Directory and change to the directory you created in step 1.

3. Create the working library.

a. Select File > New > Library.

This opens a dialog where you specify physical and logical names for the library (Figure 3-2).

You can create a new library or map to an existing library. We’ll be doing the former.


Figure 5.5.2.1 The Create a New Library Dialog

b. Type work in the Library Name field (if it isn’t already entered automatically).

c. Click OK.

ModelSim creates a directory called work and writes a specially-formatted file named _info into

that directory. The _info file must remain in the directory to distinguish it as a ModelSim library.

Do not edit the folder contents from your operating system; all changes should be made from

within ModelSim. ModelSim also adds the library to the list in the Workspace (Figure 3-3) and

records the library mapping for future reference in the ModelSim initialization file

(modelsim.ini).


Figure 5.5.2.2 work library in work space

When you pressed OK in step 3c above, the following was printed to the Transcript:

vlib work

vmap work work

These two lines are the command-line equivalents of the menu selections you made. Many

command-line equivalents will echo their menu-driven functions in this fashion.

5.5.3 Compile the Design

With the working library created, you are ready to compile your source files.

You can compile by using the menus and dialogs of the graphic interface, as in the Verilog


example below, or by entering a command at the ModelSim> prompt.

1. Compile counter.v and tcounter.v.

a. Select Compile > Compile. This opens the Compile Source Files dialog (Figure 3-4).

If the Compile menu option is not available, you probably have a project open. If so, close the

project by making the Workspace pane active and selecting File > Close from the menus.

b. Select both counter.v and tcounter.v modules from the Compile Source Files dialog and click Compile. The files are compiled into the work library.

c. When compile is finished, click Done.

Figure 5.5.3.1 Compile Source Files Dialog

2. View the compiled design units.

a. On the Library tab, click the ’+’ icon next to the work library and you will see two design units

(Figure 3-5). You can also see their types (Modules, Entities, etc.) and the path to the underlying

source files (scroll to the right if necessary).

b. Double-click test_counter to load the design.


You can also load the design by selecting Simulate > Start Simulation in the menu bar. This

opens the Start Simulation dialog. With the Design tab selected, click the ’+’ sign next to the

work library to see the counter and test_counter modules. Select

the test_counter module and click OK (Figure 3-6).

Figure 5.5.3.2 Loading Design with Start Simulation Dialog

When the design is loaded, you will see a new tab in the Workspace named sim that displays the

hierarchical structure of the design (Figure 3-7). You can navigate within the hierarchy by

clicking on any line with a ’+’ (expand) or ’-’ (contract) icon. You will also see a tab named

Files that displays all files included in the design.


Figure 5.3.3.3 Verilog Modules Compiled into work Library

5.5.4 Load the Design

1. Load the test_counter module into the simulator.

a. In the Workspace, click the ‘+’ sign next to the work library to show the files contained there.

Figure 5.3.4.1 Workspace sim Tab Displays Design Hierarchy


2. View design objects in the Objects pane.

a. Open the View menu and select Objects. The command line equivalent is: view objects

The Objects pane (Figure 3-8) shows the names and current values of data objects in the current

region (selected in the Workspace). Data objects include signals, nets, registers, constants and

variables not declared in a process, generics, parameters.

Figure 5.3.4.2 Object Pane Displays Design Objects

You may open other windows and panes with the View menu or with the view command; see the

ModelSim documentation on navigating the interface.

5.5.5 Run the Simulation

Now you will open the Wave window, add signals to it, then run the simulation.

1. Open the Wave debugging window.

a. Enter view wave at the command line

You can also use the View > Wave menu selection to open a Wave window.

The Wave window is one of several windows available for debugging. To see a list

of the other debugging windows, select the View menu. You may need to move or

resize the windows to your liking. Window panes within the Main window can be


zoomed to occupy the entire Main window or undocked to stand alone. For details,

see the ModelSim documentation on navigating the interface.

2. Add signals to the Wave window.

a. In the Workspace pane, select the sim tab.

b. Right-click test_counter to open a popup context menu.

c. Select Add > To Wave > All items in region (Figure 5.5.5.1).

All signals in the design are added to the Wave window.

Figure 5.5.5.1 Using the Popup Menu to Add Signals to the Wave Window
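The command-line equivalent of this menu selection is the add wave command (a sketch; the region path assumes test_counter is the top-level region shown on the sim tab):

add wave sim:/test_counter/*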

3. Run the simulation.

a. Click the Run icon in the Main or Wave window toolbar.

The simulation runs for 100 ns (the default simulation length) and waves are

drawn in the Wave window.

b. Enter run 500 at the VSIM> prompt in the Main window.


The simulation advances another 500 ns for a total of 600 ns (Figure 5.5.5.2).

Figure 5.5.5.2 Waves Drawn in the Wave Window

c. Click the Run -All icon on the Main or Wave window toolbar.

The simulation continues running until you execute a break command or it

hits a statement in your code (e.g., a Verilog $stop statement) that halts the

simulation.

d. Click the Break icon. The simulation stops running.
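The Run and Run -All buttons also have command-line equivalents (a sketch; run with a number advances the simulation by that amount of time, and run -all runs until a breakpoint or a $stop statement is reached):

run 100

run -all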

5.4 Xilinx design flow

The first step in implementing a design on an FPGA is defining the system specifications: the kinds

of inputs and outputs and the range of values the board must handle. The next step is the

architecture, which describes all the blocks in the design and the interconnections between them.

Each block in the architecture, along with its interconnections, is


modeled in either VHDL or Verilog, whichever is more convenient. All the blocks are then simulated

and their outputs are verified for correct functioning.

Figure 5.4 Xilinx Implementation Design Flow-Chart.

After simulation, the next step is synthesis. This step determines whether the design can

actually be implemented on the target FPGA. Synthesis converts the HDL code into vendor-specific

functional components. After synthesis, the RTL schematic and the technology schematic are

generated, along with estimates of the timing delays that will be present in the FPGA once the

design is implemented on it.

Place and route is the next step, in which the tool places all the components on the FPGA die and

routes the interconnections between them for optimum performance in terms of both area and speed.

In the post-place-and-route simulation step, the tool takes into account the delays that will be

present in the actual implementation on the FPGA: electrical loading effects, wiring delays, and

stray capacitances. Simulation is then repeated with these delays included. After

post-place-and-route simulation comes bit-file generation, in which the design is converted into a

bitstream used to configure the FPGA. The final step is


downloading the bit file onto the FPGA board, which is done by connecting the computer to the

board with a JTAG (Joint Test Action Group) cable, an IEEE-standard interface. The bit file

contains the whole design as placed on the FPGA die, and the outputs can then be observed, for

example on the board's LEDs. This step completes the process of implementing the design on an

FPGA.

5.4.1 Xilinx ISE 10.1 software

Xilinx ISE (Integrated Software Environment) 10.1 is Xilinx's design suite for describing a

digital circuit and implementing it on an FPGA. It is used here to design the application, verify

its functionality, and finally download the design onto a Spartan-3E FPGA device.

5.4.2 Xilinx ISE 10.1 software tools

SIMULATION : ISE (Integrated Software Environment) Simulator

SYNTHESIS, PLACE & ROUTE : XST (Xilinx Synthesis Technology) Synthesizer

5.4.3 Design steps using Xilinx ISE 10.1

1. Create an ISE project for the particular embedded system application.

2. Write the assembly code in Notepad or WordPad and generate the Verilog or VHDL module using the assembler.

3. Check the syntax of the design.

4. Create a Verilog test fixture for the design (a minimal sketch is given after this list).

5. Simulate the test bench (behavioral simulation) for functional verification of the design using the ISE simulator.

6. Synthesize and implement the top-level module using the XST synthesizer.
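Step 4 above calls for a Verilog test fixture. The listing below is a minimal sketch of such a fixture for the top module given in Chapter 8; the instance name, clock period, and stimulus values are illustrative assumptions rather than part of the original design.

module top_tb;
reg CLK, RST;
reg [1:0] l;
reg [2:0] l1;
reg signed [3:0] p0,p1,p2,p3,q0,q1,q2,q3;
wire signed [7:0] outfinal;

// Device under test: the 4x4 convolver top module from Chapter 8
top uut (.CLK(CLK), .RST(RST), .l(l), .l1(l1),
         .p0(p0), .p1(p1), .p2(p2), .p3(p3),
         .q0(q0), .q1(q1), .q2(q2), .q3(q3),
         .outfinal(outfinal));

// 10 ns clock (assumed period)
always #5 CLK = ~CLK;

initial begin
  CLK = 0; RST = 0;                            // hold the design in reset
  l = 2'b00; l1 = 3'b000;
  p0 = 4'd1; p1 = 4'd2; p2 = 4'd3; p3 = 4'd4;  // example input sequence
  q0 = 4'd1; q1 = 4'd1; q2 = 4'd1; q3 = 4'd1;  // example coefficient sequence
  #12 RST = 1;                                 // release reset

  // cycle the input-select lines so the samples are loaded into the SIPO buffers
  repeat (8) #10 l = l + 1;

  // step the output-select lines through the convolution terms on outfinal
  repeat (8) #10 l1 = l1 + 1;

  #50 $stop;
end
endmodule

Such a fixture can then be compiled and simulated in the ISE simulator (or in ModelSim) exactly as described in the earlier steps.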


CHAPTER 6: CONCLUSION

In this work, we presented an optimized implementation of convolution. The model is fine-tuned

for signal processing and is optimized for operation count, power, and area. To analyze the

proposed system accurately, we coded the design in the Verilog hardware description language and

synthesized it using Xilinx tools. As an illustrative example, we implemented a 4×4 convolver;

the same concept can be extended to the N×N case. The functionality of the convolver was tested

and verified successfully on a Xilinx Spartan-3E FPGA and with the design compiler. The proposed

circuit consumes only 5 mW, saves almost 35% of the area, and completes a computation in 20 ns,

which represents a power reduction of more than 50%. As FPGA technology matures and much larger

arrays become practical, techniques that allow the automatic generation of highly parallel

architectures will become central to high-performance computing. We have described some simple

techniques for generating convolution pipelines for image processing and other applications;

higher-level techniques and approaches are also needed. FPGAs permit restructurable processing,

and restructurable interconnects are also becoming available.


CHAPTER 7: BIBLIOGRAPHY

[1] John W. Pierre, “A Novel Method for Calculating the Convolution Sum of Two Finite Length Sequences,” IEEE Trans. Education, vol. 39, no. 1, 1996.

[2] W. W. Smith and J. M. Smith, Handbook of Real-Time Fast Fourier Transforms, IEEE Press, 1995, p. 28.

[3] R. G. Shoup, “Parameterized convolution filtering in a field programmable gate array,” in Selected Papers from the Oxford 1993 International Workshop on Field Programmable Logic and Applications on More FPGAs, Oxford, United Kingdom: Abingdon EE&CS Books, 1994, pp. 274–280.

[4] Iván Rodríguez and Marvi Teixeira, “Parallel Cyclic Convolution Based on Recursive Formulations of Block Pseudocirculant Matrices,” IEEE Trans. Signal Process., 2008.

[5] Thomas Oelsner, “Implementation of Data Convolution Algorithms in FPGAs,” QuickLogic Europe, http://www.quicklogic.com/images/appnote18.pdf

[6] Chao Cheng and Keshab K. Parhi, “Low-Cost Fast VLSI Algorithm for Discrete Fourier Transform,” IEEE Trans. Circuits Syst., vol. 54, 2007.

[7] J. I. Guo, C. M. Liu, and C. W. Jen, “The efficient memory-based VLSI array designs for DFT and DCT,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 37, no. 10, 1992, pp. 723–733.

[8] T. S. Chang, J. I. Guo, and C. W. Jen, “Hardware-efficient DFT designs with cyclic convolution and subexpression sharing,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 47, no. 9, 2000, pp. 886–892.

[9] C. Cheng and K. K. Parhi, “Hardware efficient fast DCT based on novel cyclic convolution structures,” IEEE Trans. Signal Process., vol. 54, no. 11, 2007, pp. 4419–4434.

[10] Chao Cheng and Keshab K. Parhi, “Hardware Efficient Fast Parallel FIR Filter Structures Based on Iterated Short Convolution,” IEEE Trans. Circuits Syst. I, vol. 51, no. 8, 2004, http://www.tc.umn.edu/~chen0867/ParallelFIR2004_TCASI.pdf

[11] Abdulqadir Alaqeeli and Janusz Starzyk, “Hardware Implementation for Fast Convolution with a PN Code Using Field Programmable Gate Array,” Ohio University, http://www.ent.ohiou.edu/~starzyk/network/Research/Papers/Recent%20conferences/Conv_FPGA_PN_code_SSST2001.pdf


CHAPTER 8: PROGRAM CODE

8.1 Top Module:

module top(CLK,RST,l,l1,p0,p1,p2,p3,q0,q1,q2,q3,outfinal);

input CLK,RST;
input [1:0] l;
input [2:0] l1;
wire signed load,load1;
input signed [3:0] p0,p1,p2,p3,q0,q1,q2,q3;
wire signed [7:0] r;
output signed [7:0] outfinal;

//output [7:0] r11,r12,r0,r1,r2,r3,r4,r5,r6,r7;
wire signed [7:0] r0,r1,r2,r3,r4,r5,r6,r7;   // r7 is left undriven; the eighth mux input is unused
wire signed [3:0] r11,r12;
//wire signed [1:0] l;
//wire signed [2:0] l1;
//wire signed en1,clock1;
//wire signed [3:0] parallel_out10,parallel_out11,parallel_out12,parallel_out13,
//                  parallel_out20,parallel_out21,parallel_out22,parallel_out23;
//wire z;
//wire signed [3:0] p [3:0];
//wire signed [3:0] q [3:0];
wire signed [3:0] DATA_OUT_1,DATA_OUT_2,DATA_OUT_3,DATA_OUT_4,
                  DATA_OUT_5,DATA_OUT_6,DATA_OUT_7,DATA_OUT_8;

// Input multiplexers serialize the two 4-sample input vectors
mux41 m1 (.CLK(CLK),.RST(RST),.a0(p0),.a1(p1),.a2(p2),.a3(p3),.s(l),.o1(r11),.load(load));
mux41 m2 (.CLK(CLK),.RST(RST),.a0(q0),.a1(q1),.a2(q2),.a3(q3),.s(l),.o1(r12),.load(load1));

// Serial-in/parallel-out memories buffer the serialized samples
memory_SIPO sp1 (.CLOCK(CLK),.load(load),.data_in(r11),.DATA_OUT_1(DATA_OUT_1),.DATA_OUT_2(DATA_OUT_2),.DATA_OUT_3(DATA_OUT_3),.DATA_OUT_4(DATA_OUT_4));
memory_SIPO sp2 (.CLOCK(CLK),.load(load1),.data_in(r12),.DATA_OUT_1(DATA_OUT_5),.DATA_OUT_2(DATA_OUT_6),.DATA_OUT_3(DATA_OUT_7),.DATA_OUT_4(DATA_OUT_8));

//memory sp1 (.CLK(CLK),.RST(RST),.serial_in(r11),.parallel_out0(parallel_out10),.parallel_out1(parallel_out11),.parallel_out2(parallel_out12),.parallel_out3(parallel_out13));
//memory sp2 (.CLK(CLK),.RST(RST),.serial_in(r12),.parallel_out0(parallel_out20),.parallel_out1(parallel_out21),.parallel_out2(parallel_out22),.parallel_out3(parallel_out23));

// Binary multiplier block computes the seven convolution terms
bm bm1 (.CLK(CLK),.RST(RST),.a0(DATA_OUT_1),.a1(DATA_OUT_2),.a2(DATA_OUT_3),.a3(DATA_OUT_4),.b0(DATA_OUT_5),.b1(DATA_OUT_6),.b2(DATA_OUT_7),.b3(DATA_OUT_8),.s0(r0),.s1(r1),.s2(r2),.s3(r3),.s4(r4),.s5(r5),.s6(r6));

// Output multiplexer selects one convolution term at a time
mux81 m3 (.CLK(CLK),.RST(RST),.a0(r0),.a1(r1),.a2(r2),.a3(r3),.a4(r4),.a5(r5),.a6(r6),.a7(r7),.s(l1),.o1(r));

// Output register
register ro1 (.CLK(CLK),.RST(RST),.r(r),.out(outfinal));

endmodule

8.2 Mux 4*1:
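// Registered 4-to-1 multiplexer: on each rising clock edge o1 takes the input selected
// by s, and load is asserted once RST has been released.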

module mux41 (CLK,RST,load,a0, a1, a2, a3, s, o1);

input CLK,RST;


input signed [3:0] a0, a1, a2, a3;

input [1:0] s;

output signed [3:0] o1;

reg signed [3:0] o1;

output reg load;

always @(posedge CLK )

begin

if(RST==1'b0)

begin

o1=4'bzzzz;

load=1'b0;

end

else

begin

case (s)

2'b00 : o1 = a0;

2'b01 : o1 = a1;

2'b10 : o1 = a2;

2'b11 : o1 = a3;

endcase

load=1'b1;

end

end

endmodule

8.3 SIPO:
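// Serial-in/parallel-out buffer: four successive data_in samples are written into the
// internal memory while load is high, then presented in parallel on DATA_OUT_1..DATA_OUT_4.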

module memory_SIPO(CLOCK,load,data_in,DATA_OUT_1,DATA_OUT_2,DATA_OUT_3,DATA_OUT_4);

//INPUTS


input CLOCK;

input load;

input signed [3:0] data_in;

//OUTPUTS

output signed [3:0] DATA_OUT_1,DATA_OUT_2,DATA_OUT_3,DATA_OUT_4;

//REGISTERS

reg [2:0] cntr;

integer i;

reg [2:0] cntr1;

reg signed [3:0] DATA_OUT_1,DATA_OUT_2,DATA_OUT_3,DATA_OUT_4;

//MEMORY

reg signed [3:0] m [3:0];

//WRITING INTO memory

always @(posedge CLOCK)

begin

if (!load)

begin

cntr<=3'b0;

cntr1<=3'b0;

DATA_OUT_1 <= 4'b0;

DATA_OUT_2 <= 4'b0;

DATA_OUT_3 <= 4'b0;

DATA_OUT_4 <= 4'b0;

for(i=0;i<=3;i=i+1)

m[i] <= 4'b0;

end

else if(cntr<=2'd3 && load)


begin

m[cntr] <= data_in;

cntr <= cntr + 1;

end

else

begin

DATA_OUT_1 <= m[0];

DATA_OUT_2 <= m[1];

DATA_OUT_3 <= m[2];

DATA_OUT_4 <= m[3];

cntr1<=cntr1+1;

end

end

endmodule

8.4 Binary multiplier:

module bm(CLK,RST,a0,a1,a2,a3,b0,b1,b2,b3,s0,s1,s2,s3,s4,s5,s6);

//input en;

input CLK,RST;

input signed [3:0] a0,a1,a2,a3,b0,b1,b2,b3;

output signed [7:0] s0,s1,s2,s3,s4,s5,s6;


reg signed [7:0] s0,s1,s2,s3,s4,s5,s6;

wire signed [7:0] so0,so1,so2,so3,so4,so5,so6;

wire signed [7:0]

s11,s12,s21,s22,s23,s31,s32,s33,s34,s41,s42,s43,s51,s52,s61;

//wire signed [15:0] st,x1,x2;

always @(posedge CLK )

begin

if(RST==1'b0)

begin

s0=8'bzzzzzzzz;

s1=8'bzzzzzzzz;

s2=8'bzzzzzzzz;

s3=8'bzzzzzzzz;

s4=8'bzzzzzzzz;

s5=8'bzzzzzzzz;

s6=8'bzzzzzzzz;

// s7=8'b00000000;

end

else

begin

s0=so0;

s1=so1;

s2=so2;

s3=so3;

s4=so4;

s5=so5;

s6=so6;

// s7=so7;

end

end
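// Each output so_k below sums the partial products a_i*b_j with i+j = k,
// i.e. the seven terms of the 4-point linear convolution of (a0..a3) with (b0..b3).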


assign so0 = a0*b0;

assign s11 = a1*b0;

assign s12 = a0*b1;

assign so1= s11 + s12;

assign s21 = a0*b2;

assign s22 = a1*b1;

assign s23 = a2*b0;

assign so2= s21 + s22 + s23 ;

assign s31=a0*b3;

assign s32=a1*b2;

assign s33=a2*b1;

assign s34=a3*b0;

assign so3 = s31 + s32 + s33 + s34;

assign s41 = a1*b3;

assign s42 = a2*b2;

assign s43 = a3*b1;

assign so4 = s41 + s42 + s43;

assign s51 = a2*b3;

assign s52 = a3*b2;

assign so5 = s51 + s52 ;

//assign x1<={8'b00000000,s51};


//assign x2<={8'b00000000,s52};

//assign st = x1 + x2;

assign s61=a3*b3;

assign so6=s61;

//assign so6 = s61 [7:0];

//assign so7= so6[15:8];

endmodule

8.5 Mux 8*1:

module mux81 (CLK,RST,a0, a1, a2, a3, a4, a5, a6, a7, s, o1);

input signed [7:0] a0, a1, a2, a3,a4, a5, a6, a7;

input CLK,RST;

input [2:0] s;

output signed [7:0] o1;

reg signed [7:0] o1;

always @(posedge CLK )

begin

if(RST==1'b0)

o1=8'bzzzzzzzz;

else

case (s)

3'b000 : o1 = a0;

3'b001 : o1 = a1;

3'b010 : o1 = a2;

3'b011 : o1 = a3;


3'b100 : o1 = a4;

3'b101 : o1 = a5;

3'b110 : o1 = a6;

3'b111 : o1 = a7;

endcase

end

endmodule

8.6 Register:
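// Output register: captures r on each rising clock edge and drives high impedance while RST is low.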

module register( CLK,RST,r,out);

input CLK,RST;

input [7:0] r;

output [7:0] out;

reg [7:0] out;

always@(posedge CLK )

begin

if(RST==1'b0)

begin

out<=8'bzzzzzzzz;

end

else

begin

out<=r;

end

end

endmodule


RESULTS:
