Final
Uploaded by sameera0627
CHAPTER 1: INTRODUCTION
1.1 Introduction
Convolution provides the mathematical framework for DSP. It is the single most important
technique in Digital Signal Processing. Convolution is a mathematical way of combining two
signals to form a third signal. Using the strategy of impulse decomposition, systems are
described by a signal called the impulse response. In signal processing, the impulse response, or
impulse response function (IRF), of a dynamic system is its output when presented with a brief
input signal, called an impulse. More generally, an impulse response refers to the reaction of any
dynamic system in response to some external change. Convolution has applications that include statistics,
computer vision, image and signal processing, electrical engineering, and differential equations.
1.2 Introduction to Convolution
One of the most important concepts in Fourier theory, and in crystallography, is that of a
convolution. Convolutions arise in many guises, as will be shown below. Because of a
mathematical property of the Fourier transform, referred to as the convolution theorem, it is
convenient to carry out calculations involving convolutions.
1.2.1 Convolution Definition
The convolution of ƒ and g is written ƒ∗g, using an asterisk or star. It is defined as the integral
of the product of the two functions after one is reversed and shifted. As such, it is a particular
kind of integral transform:
$$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$$
While the symbol t is used above, it need not represent the time domain. But in that context, the
convolution formula can be described as a weighted average of the function ƒ(τ) at the moment t
where the weighting is given by g(−τ) simply shifted by amount t. As t changes, the weighting
function emphasizes different parts of the input function.
More generally, if f and g are complex-valued functions on R^d, then their convolution may be
defined as the integral:
$$(f * g)(x) = \int_{\mathbb{R}^d} f(y)\, g(x - y)\, dy$$
1.3 Types of Convolution
There are two types of convolution. They are:
Linear convolution
Circular convolution
1.3.1 Linear convolution
Convolution combines two signals by integrating the product of one signal with a reversed and
shifted copy of the other. It has many applications in numerous areas of signal processing. The
convolution described above is linear convolution. The most popular application is the
determination of the output signal of a linear time-invariant system by convolving the input
signal with the impulse response of the system. Convolving two signals in the time domain is
equivalent to multiplying their Fourier transforms in the frequency domain.
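Mathematical formula:
The linear convolution of two continuous-time signals x(t) and h(t) is defined by
$$y(t) = x(t) * h(t) = \int_{-\infty}^{\infty} x(\tau)\, h(t - \tau)\, d\tau$$
For discrete-time signals x(n) and h(n), the integration is replaced by a summation:
$$y(n) = x(n) * h(n) = \sum_{k=-\infty}^{\infty} x(k)\, h(n - k)$$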
1.3.2 Circular convolution
The circular convolution of two aperiodic functions occurs when one of them is convolved in the
normal way with a periodic summation of the other function. It occurs naturally in digital signal
processing when DTFTs and inverse DTFTs are replaced by DFTs and inverse DFTs.
Equivalently, the continuous frequency domain is replaced by a discrete one. (See Circular
convolution theorem.)
For a periodic function xT(t), with period T, the convolution with another function, h(t), is also
periodic, and can be expressed in terms of integration over a finite interval as follows:
$$(x_T * h)(t) = \int_{-\infty}^{\infty} h(\tau)\, x_T(t - \tau)\, d\tau = \int_{t_0}^{t_0 + T} h_T(\tau)\, x_T(t - \tau)\, d\tau$$
where t0 is an arbitrary parameter, and hT(t) is a periodic summation of h, defined by:
$$h_T(t) = \sum_{k=-\infty}^{\infty} h(t - kT)$$
When xT(t) is expressed as the periodic summation of another function, x, this convolution is
sometimes referred to as a circular convolution of functions h and x.
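The difference between linear and circular convolution can be illustrated with a minimal Python sketch, assuming NumPy is available. The function name circular_convolve and the sample values are illustrative and do not come from the report. The sketch folds the linear result back onto an N-sample period, which is the discrete counterpart of the periodic summation described above.

```python
import numpy as np

def circular_convolve(x, h, N):
    """N-point circular convolution obtained by wrapping the linear result modulo N."""
    linear = np.convolve(x, h)           # length len(x) + len(h) - 1
    wrapped = np.zeros(N)
    for i, value in enumerate(linear):
        wrapped[i % N] += value          # samples past N fold back onto the start
    return wrapped

x = [1, 2, 3, 4]
h = [1, 1, 1]
short_period = circular_convolve(x, h, 4)   # period too short: the tail wraps around
long_period = circular_convolve(x, h, 6)    # period >= 4 + 3 - 1: equals linear convolution
print(short_period, long_period)
```

When N is at least len(x) + len(h) - 1, nothing wraps and the circular result coincides with the linear one.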
1.4 Properties of convolution
This section describes the properties of convolution. The properties of convolution are:
Commutative
Associative
Distributive
1.4.1 Commutative property:
The commutative property for convolution is expressed in mathematical form:
a[n] * b[n] = b[n] * a[n]
In words, the order in which two signals are convolved makes no difference; the results are identical.
1.4.2 Associative property:
The associative property describes the way to convolve more than two signals. Convolve two of
the signals to produce an intermediate signal, then convolve the intermediate signal with the third
signal. The associative property provides that the order of the convolutions doesn't matter. As an
equation:
(a[n] * b[n] ) * c[n] = a[n] * ( b[n] * c[n] )
The associative property is used in system theory to describe how cascaded systems behave.
Two or more systems are said to be in a cascade if the output of one system is used as the input
for the next system. From the associative property, the order of the systems can be rearranged
without changing the overall response of the cascade. Further, any number of cascaded systems
can be replaced with a single system. The impulse response of the replacement system is found
by convolving the impulse responses of all of the original systems.
1.4.3 Distributive property:
In equation form, the distributive property is written as:
a[n] * b[n] + a[n] * c[n] = a[n] * (b[n] + c[n])
The distributive property describes the operation of parallel systems with added outputs. Two
or more systems can share the same input, x[n] , and have their outputs added to produce y[n] .
The distributive property allows this combination of systems to be replaced with a single system,
having an impulse response equal to the sum of the impulse responses of the original systems.
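The three properties can be checked numerically on short integer sequences. This is a small sketch assuming NumPy; the sequences a, b and c are arbitrary illustrative values.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([0, 1, -1, 2])
c = np.array([4, 0, 5, 1])   # same length as b so that b + c is defined

# Commutative: a * b == b * a
commutative = np.array_equal(np.convolve(a, b), np.convolve(b, a))

# Associative: (a * b) * c == a * (b * c), the cascade rule
associative = np.array_equal(
    np.convolve(np.convolve(a, b), c),
    np.convolve(a, np.convolve(b, c)),
)

# Distributive: a * b + a * c == a * (b + c), the parallel rule
distributive = np.array_equal(
    np.convolve(a, b) + np.convolve(a, c),
    np.convolve(a, b + c),
)

print(commutative, associative, distributive)
```

Integer inputs are used so that the comparisons are exact rather than subject to floating-point rounding.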
1.5 Applications of Convolution
Convolution and related operations are found in many applications of engineering and mathematics. The following are the areas where convolution is applied.
In statistics, as noted above, a weighted moving average is a convolution.
In probability theory, the probability distribution of the sum of two independent random
variables is the convolution of their individual distributions.
In optics, many kinds of "blur" are described by convolutions. A shadow (e.g. the shadow on
the table when you hold your hand between the table and a light source) is the convolution of
the shape of the light source that is casting the shadow and the object whose shadow is being
cast. An out-of-focus photograph is the convolution of the sharp image with the shape of the
iris diaphragm. The photographic term for this is bokeh.
Similarly, in digital image processing, convolutional filtering plays an important role in many
important algorithms in edge detection and related processes.
In linear acoustics, an echo is the convolution of the original sound with a function
representing the various objects that are reflecting it.
In artificial reverberation (digital signal processing, pro audio), convolution is used to map
the impulse response of a real room on a digital audio signal (see previous and next point for
additional information).
In electrical engineering and other disciplines, the output (response) of a (stationary, or time-
or space-invariant) linear system is the convolution of the input (excitation) with the system's
response to an impulse or Dirac delta function. See LTI system theory and digital signal
processing.
In time-resolved fluorescence spectroscopy, the excitation signal can be treated as a chain of
delta pulses, and the measured fluorescence is a sum of exponential decays from each delta
pulse.
In physics, wherever there is a linear system with a "superposition principle", a convolution
operation makes an appearance.
In digital signal processing, frequency filtering can be simplified by convolving two
functions (data with a filter) in the time domain, which is equivalent to multiplying the data
with a filter in the frequency domain.
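As one concrete instance from the list above, the probability item can be sketched in a few lines of Python, assuming NumPy. The two fair six-sided dice are an illustrative choice, not an example from the report: the distribution of their sum is the convolution of the two individual distributions.

```python
import numpy as np

die = np.full(6, 1 / 6)             # P(face) for faces 1..6 of a fair die
total = np.convolve(die, die)       # P(sum) for sums 2..12
sums = np.arange(2, 13)
most_likely = int(sums[np.argmax(total)])

for s, p in zip(sums, total):
    print(s, round(float(p), 4))
```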
CHAPTER 2: LITERATURE REVIEW
2.1 Introduction
The most important operation performed on signals is linear filtering, which can be
performed by convolution. The reason that linear filtering is so important to signal processing is
that it solves many problems and is relatively simple to describe mathematically. In this chapter
we will be looking at convolution. Convolution helps to determine the effect a system has on an
input signal. It can be shown that a linear, time-invariant system is completely characterized by
its impulse response. Using the sampling property of the delta function for continuous time
signals and the unit sample for discrete time signals we can decompose a signal into an infinite
sum / integral of scaled and shifted impulses. By knowing how a system affects a single
impulse, and by understanding the way a signal is comprised of scaled and summed impulses, it
seems reasonable that it should be possible to scale and sum the impulse responses of a system in
order to determine what output signal will result from a particular input. This is precisely what
convolution does - convolution determines the system's output from knowledge of the input and
the system's impulse response.
2.2 Convolution - Discrete time
The idea of discrete-time convolution is exactly the same as that of continuous-time convolution.
For this reason, it may be useful to look at both versions to help your understanding of this
extremely important concept. Convolution is a very powerful tool in determining a system's
output from knowledge of an arbitrary input and the system's impulse response.
We know that any discrete-time signal can be represented by a summation of scaled and
shifted discrete-time impulses. Since we are assuming the system to be linear and time-invariant,
it stands to reason that an input signal comprised of the sum of scaled and shifted impulses
would give rise to an output comprised of a sum of scaled and shifted impulse responses. This is
exactly what occurs in convolution. For discrete time signals the convolution equation is given
by:
$$y[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[n - k]$$
Graphical Interpretation:
Reflection of h[k] resulting in h[-k]
Shifting of h[-k] resulting in h[n-k]
Element-wise multiplication of the sequences x[k] and h[n-k]
Summation of the product sequence x[k]h[n-k] resulting in the convolution value for y[n]
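The four steps of the graphical interpretation can be sketched directly in plain Python. The function name convolve_by_steps and the sample sequences are illustrative assumptions; both sequences are taken to be zero outside their listed samples.

```python
def convolve_by_steps(x, h):
    """Linear convolution y[n] = sum_k x[k] * h[n - k] for finite sequences."""
    n_out = len(x) + len(h) - 1
    y = []
    for n in range(n_out):
        total = 0
        for k in range(len(x)):
            j = n - k                   # index into h after reflection and a shift by n
            if 0 <= j < len(h):         # h is treated as zero outside its support
                total += x[k] * h[j]    # element-wise product, accumulated
        y.append(total)                 # the sum is the convolution value y[n]
    return y

result = convolve_by_steps([1, 2, 3], [1, 0, -1])
print(result)  # [1, 2, 2, -2, -3]
```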
2.2.1 Graphical illustration of convolution properties (Discrete - time)
A quick graphical example may help in demonstrating why convolution works.
Fig 2.2.1.1: A single impulse input yields the system's impulse response.
Fig 2.2.1.2: A scaled impulse input yields a scaled response, due to the scaling property of the
system's linearity.
Fig 2.2.1.3: We now use the time-invariance property of the system to show that a delayed input
results in an output of the same shape, only delayed by the same amount as the input.
Fig 2.2.1.4: We now use the additivity portion of the linearity property of the system to complete the picture. Since any discrete-time signal is just a sum of scaled and shifted discrete-time impulses, we can find the output from knowing the input and the impulse response.
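The argument illustrated by the four figures can be sketched as code, assuming NumPy: each input sample is treated as a scaled, delayed impulse, and the output is built by adding the correspondingly scaled and delayed copies of the impulse response. The function name and the sample values are illustrative.

```python
import numpy as np

def convolve_by_superposition(x, h):
    """Build the output as a sum of scaled, shifted impulse responses."""
    h = np.asarray(h, dtype=float)
    y = np.zeros(len(x) + len(h) - 1)
    for k, amplitude in enumerate(x):
        # x[k] is an impulse of height `amplitude` delayed by k samples.
        # Scaling (linearity) and delay (time invariance) give amplitude * h shifted by k;
        # additivity lets the contributions be summed.
        y[k:k + len(h)] += amplitude * h
    return y

x = [2, 0, -1]
h = [1, 0.5, 0.25]
y = convolve_by_superposition(x, h)
print(y)
```

This input-side view gives the same numbers as the reflect-and-shift view; only the order in which the products are accumulated differs.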
2.3 Convolution – Analog
In this section we examine convolution for continuous-time signals. This will result in the
convolution integral and its properties. These concepts are very important in engineering and
will make any engineer's life a lot easier if the time is spent now to truly understand what is
going on.
2.3.1 Derivation of the convolution integral
To begin this, it is necessary to state the assumptions we will be making. In this instance, the
only constraints on our system are that it be linear and time-invariant.
Brief Overview of Derivation Steps:
1. An impulse input leads to an impulse response output.
2. A shifted impulse input leads to a shifted impulse response output. This is due to the time-
invariance of the system.
3. We now scale the impulse input to get a scaled impulse output. This is using the scalar
multiplication property of linearity.
4. We can now "sum up" an infinite number of these scaled impulses to get a sum of an infinite
number of scaled impulse responses. This uses the additivity attribute of linearity.
5. Now we recognize that this infinite sum is nothing more than an integral, so we convert both
sides into integrals.
6. Recognizing that the input is the function f(t), we also recognize that the output is exactly the
convolution integral.
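The six steps can be written as a chain of input-output pairs. In this sketch the arrow denotes "produces the output" for the linear time-invariant system, and y(t) is a symbol introduced here for the output.

```latex
\begin{align*}
\delta(t) &\;\longrightarrow\; h(t)
  && \text{(impulse response)} \\
\delta(t-\tau) &\;\longrightarrow\; h(t-\tau)
  && \text{(time invariance)} \\
f(\tau)\,\delta(t-\tau) &\;\longrightarrow\; f(\tau)\,h(t-\tau)
  && \text{(scaling)} \\
\int_{-\infty}^{\infty} f(\tau)\,\delta(t-\tau)\,d\tau
  &\;\longrightarrow\; \int_{-\infty}^{\infty} f(\tau)\,h(t-\tau)\,d\tau
  && \text{(additivity)} \\
f(t) &\;\longrightarrow\; y(t) = \int_{-\infty}^{\infty} f(\tau)\,h(t-\tau)\,d\tau
  && \text{(sampling property)}
\end{align*}
```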
Fig 2.3.1.1: We begin with a system defined by its impulse response, h(t).
Fig 2.3.1.2: We then consider a shifted version of the input impulse. Due to the time invariance
of the system, we obtain a shifted version of the output impulse response.
Fig 2.3.1.3: Now we use the scaling part of linearity by scaling the system by a value, f(τ), that
is constant with respect to the system variable, t.
Fig 2.3.1.4: We can now use the additivity aspect of linearity to add an infinite number of these,
one for each possible τ. Since the sum over the continuous variable τ becomes an integral in the
limit, we end up with the integration known as the Convolution Integral. Using the sampling
property, we recognize the left-hand side simply as the input f(t).
2.3.2 Convolution Integral
As mentioned above, the convolution integral provides an easy mathematical way to express the
output of an LTI system based on an arbitrary signal, x(t), and the system's impulse response,
h(t). The convolution integral is expressed as
$$y(t) = \int_{-\infty}^{\infty} x(\tau)\, h(t - \tau)\, d\tau$$
Convolution is such an important tool that it is represented by the symbol *, and can be written
as
y(t) = x(t) * h(t)
By making a simple change of variables in the convolution integral, τ' = t − τ, we can easily
show that convolution is commutative:
x(t) * h(t) = h(t) * x(t)
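The change of variables can be written out in full. In this sketch σ is a new integration variable introduced for clarity, with σ = t − τ, so that dτ = −dσ and the limits of integration swap.

```latex
\begin{align*}
x(t) * h(t)
  &= \int_{-\infty}^{\infty} x(\tau)\, h(t-\tau)\, d\tau \\
  &= \int_{+\infty}^{-\infty} x(t-\sigma)\, h(\sigma)\, (-d\sigma)
     && (\sigma = t - \tau,\; d\tau = -d\sigma) \\
  &= \int_{-\infty}^{\infty} h(\sigma)\, x(t-\sigma)\, d\sigma \\
  &= h(t) * x(t)
\end{align*}
```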
2.3.3 Implementation of Convolution
Taking a closer look at the convolution integral, we find that we are multiplying the input signal
by the time-reversed impulse response and integrating. This will give us the value of the output
at one given value of t. If we then shift the time-reversed impulse response by a small amount,
we get the output for another value of t. Repeating this for every possible value of t yields the
total output function. While we would never actually do this computation by hand in this
fashion, it does provide us with some insight into what is actually happening. We find that we
are essentially reversing the impulse response function and sliding it across the input function,
integrating as we go. This method, referred to as the graphical method, provides us with a much
simpler way to solve for the output for simple (contrived) signals, while improving our intuition
for the more complex cases where we rely on computers. In fact Texas Instruments develops
Digital Signal Processors which have special instruction sets for computations such as
convolution.
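On a computer, the sliding-and-integrating procedure is usually approximated by sampling both signals and replacing the integral with a Riemann sum. The following is a minimal sketch assuming NumPy; the step size dt and the two unit rectangles are illustrative choices, not signals from the report.

```python
import numpy as np

dt = 0.001                          # sample spacing (assumed)
N = 1000                            # number of samples, covering 0 <= t < 1
x = np.ones(N)                      # unit-height rectangle of width 1
h = np.ones(N)                      # identical rectangle as the impulse response

# Each output sample sums x[k] * h[n - k]; multiplying by dt turns the sum
# into an approximation of the convolution integral.
y = dt * np.convolve(x, h)
t_out = dt * np.arange(len(y))      # output time axis, starting at t = 0
peak_time = t_out[np.argmax(y)]

print(len(y), y.max(), peak_time)
```

The exact convolution of two unit rectangles of width 1 is a triangle with peak value 1 at t = 1, which the sampled result approaches as dt shrinks.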
The main assumption of the consistency principle and the mutual correspondence principle
between continuous and digital transformations is that the signal is represented discretely
through shift sampling and reconstruction. An image convolution is a filtering step in which an
image is the input and a computed image is the output, with each sample of the output image
calculated by individually weighting and then constructively and/or destructively summing the
samples from some neighborhood of the input image. We implemented the algorithm shown
below, which is taken from the literature. It takes the two discrete finite-length sequences and
lines the columns up as in regular multiplication, but rather than carrying a number over to the
next column, it writes the number down in the same column. For example, suppose two discrete
finite-length sequences x[n] = {a1 a2 a3} and h[n] = {b1 b2 b3 b4} are convolved,
y[n] = x[n]*h[n], in a way that is similar to regular multiplication, as shown below in
Table 2.3.3.
Table 2.3.3
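The multiplication-like layout can be sketched in Python as follows. The row arrangement (one row of partial products per element of h, each row shifted one column further) is an assumption about how the table is organised, and the function name and numeric values are illustrative stand-ins for a1..a3 and b1..b4.

```python
def convolve_like_multiplication(x, h):
    """Convolve by writing shifted rows of partial products and summing columns without carries."""
    width = len(x) + len(h) - 1
    rows = []
    for shift, b in enumerate(h):
        row = [0] * width
        for i, a in enumerate(x):
            row[shift + i] = a * b      # partial product a_i * b_j placed in column i + j
        rows.append(row)
    # Each column is summed on its own; nothing is carried to the next column.
    y = [sum(column) for column in zip(*rows)]
    return rows, y

rows, y = convolve_like_multiplication([1, 2, 3], [4, 5, 6, 7])
for row in rows:
    print(row)
print(y)  # [4, 13, 28, 34, 32, 21]
```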
As we were evaluating possible design approaches to achieve high speed, our research took us
through the following progression. Figure 2.3.3 shows the convolution flow of two 16-bit
numbers, in 4-bit segments. The letters A, B, C, D, E, F, G, and H each represent 4 bits of a
16-bit number. We sum the partial products along each column; HD0 is the least significant (LS)
4 bits of the product while HD1 is the most significant (MS) 4 bits of the product. The digital
convolution is summarized as follows: first, flip (reverse) one of the digital functions; second,
shift it along the time axis by one sample; third, multiply the corresponding values of the two
digital functions; fourth, sum the products from step 3 to get one point of the digital
convolution; and finally, repeat steps 2-4 to obtain the digital convolution at all times that the
functions overlap. For example, let X = [1 2 3 4 5] and v = [-1 5 3 -2 1].
Figure 2.3.3.1 convolution results
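The example with X and v can also be cross-checked with a short Python sketch, assuming NumPy's convolve function as a stand-in for the Matlab check mentioned in the text.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5])
v = np.array([-1, 5, 3, -2, 1])

y = np.convolve(X, v)       # full linear convolution, length 5 + 5 - 1 = 9
print(y.tolist())           # [-1, 3, 10, 15, 21, 33, 10, -6, 5]
```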
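A discrete convolution of these two discrete signals equals: -1 3 10 15 21 33 10 -6 5. We used
Matlab to check the results, which are shown in Figure 2.3.3.1. For continuous functions,
y(t) = x(t)*h(t), where the input x(t) and the impulse response h(t) are sampled with a
sufficiently small step delta for the result to be accurate. The results are shown in
Figure 2.3.3.2.
dt = 0.01;                                        % sample spacing (the 0.01 in the original call)
x = [-2*ones(1,400) zeros(1,1000) 3*ones(1,100)]; % x(t), assumed to start at t = -3
h = ones(1,300);                                  % h(t), assumed to start at t = -2
y = dt*conv(x,h);                                 % built-in conv scaled by dt approximates the integral
t = (-3 + -2) + dt*(0:numel(y)-1);                % assumed time axis, starting at t = -5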
Figure 2.3.3.2 convolution y[n] =x[n]*h[n]
Figure 2.3.3.3 convolution of x(t) and h(t)
High performance Digital Signal Processing chips have been widely employed to solve signal
processing problems. Many of these signal processing solutions can be implemented in a Field
Programmable Gate Array (FPGA) instead of a DSP chip. This is possible because the gate
densities available in FPGAs have increased rapidly within the last few years and now allow
fairly sophisticated DSP algorithms to be implemented within a single chip. One reported design
implements the convolution in an FPGA. That approach, in calculating a finite number L of
convolution samples, requires approximately 3L+L(L+1)/2 clock cycles and addresses for the two
data memories, which costs considerable access time and resources. In that design, the result of
the multiplication is extended by six overflow bits before it is added to the previous sum of
products. This prevents overflow, which is costly.
Depending on the application and desired quality (i.e. the width of the filter kernel), computing
this weighted sum of neighboring pixels can require significant amounts of computation, thus
suggesting a highly parallel implementation in special-purpose hardware. Another work discusses
parameterized program generation of convolution filters in an FPGA for applications in image
processing, including real-time video and desktop publishing. The authors show an example of a
2-D filter pipeline assembled from a set of multipliers and adders, which are in turn generated
from a canonical serial-parallel multiplier stage. They show a 3x3 convolution filter for video
applications. The drawback of their design is a high fan-in and, because of the
pipeline delay, output pixels may be rewritten directly into the source image memory.
It is important to point out the emerging field of algorithm derivation and implementation, which
could be used as a basis for future work. One study shows that no restrictions are imposed on the
convolution length other than that it be composite, but its authors note that FPGA implementation
is left as future work. Breitzman shows the automatic derivation and implementation of fast
convolution algorithms, and Arce-Nazario presents an automated methodology designed for the
high-level partitioning of discrete signal transforms onto distributed hardware architectures.
To efficiently control the number of required multipliers, at the cost of a reasonable number of
adders, a study was done on a hardware efficient fast cyclic convolution algorithm. It shows the
I/O cost can be kept low and the throughput rate high. Thus, it is much more efficient than
previous cyclic convolution implementation methods. However, independently applying this
algorithm to prime-length DFTs will still incur a large hardware cost. Some specific DFT
designs remove the multiplication operations, but they require a large number of adders and
RAM/ROM resources.
Another approach is to go through Matlab, which is used to automatically generate
Verilog code for the hardware implementation of convolution algorithms. This automation is
very efficient when the coefficients change. As noted in work on implementing an FIR filter,
some inputs go through two consecutive subtraction operators. This optimization can
be done when the Verilog code is being automatically generated. Those implementations
used carry-save adders to accumulate consecutive additions, which is slow compared to using
other adders, as will be discussed in the next section. Note that the number of required additions
depends on the order of iterations. The iteration order for short convolutions should be 4x4,
3x3 and 2x2, as this will lead to the lowest implementation cost.
Another research paper presents a substitute algorithm for calculating the convolution that
requires less computation time. It is shown that CDMA receivers require a long time to acquire
the signals. This is mostly due to the use of expensive FFT-based convolvers in the acquisition
process. The permutations can usually be stored in lookup tables. This type of implementation is
not efficient, since it costs additional hardware for storage and additional time for retrieval.
2.4 Symmetric convolution
In mathematics, symmetric convolution is a special subset of convolution operations in which
the convolution kernel is symmetric across its zero point. Many common convolution-based
processes such as Gaussian blur and taking the derivative of a signal in frequency-space are
symmetric and this property can be exploited to make these convolutions easier to evaluate.
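One simple consequence of kernel symmetry can be sketched in Python, assuming NumPy. For a kernel that is symmetric about its centre, the reversal step of convolution has no effect, so convolution and cross-correlation give the same result. The smoothing kernel and the data values below are illustrative and are not taken from the report.

```python
import numpy as np

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])
kernel = np.array([1.0, 2.0, 1.0]) / 4.0            # symmetric about its centre tap

by_convolution = np.convolve(x, kernel)             # reverses the kernel before sliding
by_correlation = np.correlate(x, kernel, "full")    # slides the kernel without reversing

same = bool(np.allclose(by_convolution, by_correlation))
print(same)
```

This only illustrates the symmetry itself; the DCT and DST based evaluation discussed next is a separate technique.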
The convolution theorem states that a convolution in the real domain can be represented as a
point-wise multiplication across the frequency domain of a Fourier transform. Since sine and
cosine transforms are related transforms a modified version of the convolution theorem can be
applied, in which the concept of circular convolution is replaced with symmetric convolution.
Using these transforms to compute discrete symmetric convolutions is non-trivial since discrete
sine transforms (DSTs) and discrete cosine transforms (DCTs) can be counter-intuitively
incompatible for computing symmetric convolution, i.e. symmetric convolution can only be
computed between a fixed set of compatible transforms.
2.4.1 Advantages of symmetric convolution
There are a number of advantages to computing symmetric convolutions in DSTs and DCTs in
comparison with the more common circular convolution with the Fourier transform. Most
notably the implicit symmetry of the transforms involved is such that only data unable to be
inferred through symmetry is required. For instance using a DCT-II, a symmetric signal need
only have the positive half DCT-II transformed, since the frequency domain will implicitly
construct the mirrored data comprising the other half. This enables larger convolution kernels to
be used with the same cost as smaller kernels circularly convolved on the DFT. Also the
boundary conditions implicit in DSTs and DCTs create edge effects that are often more in
keeping with neighboring data than the periodic effects introduced by using the Fourier
transform.
CHAPTER 3: DESIGN OF HARDWARE MODEL
3.1 Convolution
Convolution is an important tool in data processing, in particular in digital signal and image
processing. Many image processing operations, such as scaling and rotation, require re-sampling
or convolution filtering for each pixel in the image. Digital images can be modified (through
convolution) by neighborhood operations; these operations go beyond point-wise operations and
include smoothing, sharpening, and edge detection. Convolution has many applications of great
significance in discrete signal processing. Analog signals are usually difficult to deal with, so
signals are converted to digital form. Many approaches have been attempted to reduce the
convolution processing time using hardware and software algorithms, but they are restricted to
specific applications. The main problems in implementing and computing convolution are speed,
area and power, which affect any DSP system. Using a Hardware Description Language for design
entry to speed up convolution not only raises the level of abstraction, but also opens new
possibilities for using programmable devices. Today, most DSPs suffer from limitations in
available address space or in the ability to interface with surrounding systems. The use of
high-speed field programmable gate arrays (FPGAs) together with DSPs can often increase the
system bandwidth by providing additional functionality to the general-purpose DSPs. In this
project, a novel method for computing the linear convolution of two finite-length sequences is
presented. A 4x4 convolution circuit can be instantiated to build larger ones. This method is
similar to the multiplication of two decimal numbers; this similarity makes the method easy to
learn and quick to compute.
3.2 Convolution in time domain
Convolution carried out on two signals in the time domain is referred to as convolution in the
time domain, and it is the form dealt with in this project. In the time domain, convolution can be
continuous or discrete: when it is performed on discrete-time signals it is called discrete-time
convolution, and when it is performed with respect to continuous time it is called
continuous-time convolution. Both were described in the previous chapter.
3.3 Convolution in frequency domain
When two signals are convolved in the frequency domain, the operation is called convolution in
the frequency domain. It can be proved that convolution in the time domain is equivalent to
multiplication in the frequency domain.
Proof:
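Let f, g belong to L1(Rn). Let F be the Fourier transform of f and G be the Fourier transform of g:
$$F(\nu) = \int_{\mathbb{R}^n} f(x)\, e^{-2\pi i\, x \cdot \nu}\, dx$$
$$G(\nu) = \int_{\mathbb{R}^n} g(x)\, e^{-2\pi i\, x \cdot \nu}\, dx$$
where the dot between x and ν indicates the inner product of Rn. Let h be the convolution of f and g:
$$h(z) = \int_{\mathbb{R}^n} f(x)\, g(z - x)\, dx$$
Now notice that
$$\iint |f(x)\, g(z - x)|\, dz\, dx = \int |f(x)| \int |g(z - x)|\, dz\, dx = \int |f(x)|\, \|g\|_1\, dx = \|f\|_1\, \|g\|_1$$
Hence by Fubini's theorem we have that h belongs to L1(Rn), so its Fourier transform H is defined by the integral formula
$$H(\nu) = \int_{\mathbb{R}^n} h(z)\, e^{-2\pi i\, z \cdot \nu}\, dz = \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} f(x)\, g(z - x)\, dx\; e^{-2\pi i\, z \cdot \nu}\, dz$$
Observe that |f(x) g(z − x) e^{−2πi z·ν}| = |f(x) g(z − x)|, and hence by the argument above we may apply Fubini's theorem again:
$$H(\nu) = \int_{\mathbb{R}^n} f(x) \left( \int_{\mathbb{R}^n} g(z - x)\, e^{-2\pi i\, z \cdot \nu}\, dz \right) dx$$
Substitute y = z − x; then dy = dz, so:
$$H(\nu) = \int_{\mathbb{R}^n} f(x) \left( \int_{\mathbb{R}^n} g(y)\, e^{-2\pi i\, (y + x) \cdot \nu}\, dy \right) dx = \int_{\mathbb{R}^n} f(x)\, e^{-2\pi i\, x \cdot \nu}\, dx \int_{\mathbb{R}^n} g(y)\, e^{-2\pi i\, y \cdot \nu}\, dy$$
These two integrals are the definitions of F(ν) and G(ν), so:
$$H(\nu) = F(\nu) \cdot G(\nu)$$
Hence, it is proved that convolution in the time domain is equivalent to multiplication in the frequency domain.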
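For finite discrete sequences the same statement can be checked numerically with the DFT. This is a minimal sketch assuming NumPy's fft module; the sequences f and g are illustrative. Both sequences are zero-padded to the full output length so that the circular convolution implied by the DFT matches the linear one.

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, -1.0, 2.0])
n = len(f) + len(g) - 1                 # pad length that avoids wrap-around

direct = np.convolve(f, g)              # convolution in the time domain

F = np.fft.fft(f, n)                    # zero-padded transforms
G = np.fft.fft(g, n)
via_fft = np.fft.ifft(F * G).real       # point-wise product, then inverse transform

print(direct)
print(np.round(via_fft, 10))
```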
3.4 General implementation flow
The generalized implementation flow diagram of the project is represented as follows.
Figure 3.4 generalized implementation flow diagram
Initially, market research should be carried out, covering the previous version of the
design and the current requirements on the design. Based on this survey, the specification and the
architecture must be identified. Then the RTL modelling should be carried out in Verilog
HDL with respect to the identified architecture. Once the RTL modelling is done, it should be
simulated and verified for all the cases. The functional verification should confirm the intended
architecture and should pass all the test cases.
Once the functional verification is complete, the RTL model is taken to the synthesis
process. Three operations are carried out in the synthesis process:
Translate
Map
Place and Route
The developed RTL model is translated to a mathematical equation format that the tool can
understand. These translated equations are then mapped to the library, that is, mapped to the
hardware. Once the mapping is done, the gates are placed and routed. Before these processes,
constraints can be given in order to optimize the design. Finally, the bitstream (BIT) file is
generated, which holds the design information in binary format and is downloaded to the FPGA
board.
3.5 Implementation
In this project the implementation is carried out by first designing the individual blocks and then
combining them into the final architecture. The individual blocks are shown in the block diagram
given below:
3.5.1 Block diagram of proposed architecture
The block diagram of the proposed architecture is shown below:
Figure 3.5.1 block diagram of the proposed architecture
3.5.1.1 Multiplexer 4*1 and 8*1:
A multiplexer, sometimes spelled "multiplexor" or simply called a "mux", is a device that selects
between a number of input signals. In its simplest form, a multiplexer has two signal
inputs, one control input, and one output.
More generally, a multiplexer selects one of 2^n inputs and directs it to the output,
depending on the n select lines.
Figure 3.5.1.1.1 4*1 multiplexer
Figure 3.5.1.1.2 8*1 multiplexer
Higher-order multiplexers can be implemented using lower-order multiplexers. A 4*1
multiplexer can be implemented using three 2*1 multiplexers (two to select within each input
pair and one to select between the pairs). Similarly, an 8*1 multiplexer can be implemented
using two 4*1 multiplexers and one 2*1 multiplexer.
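The hierarchical construction can be modelled behaviourally in Python. This is only a software sketch of the selection logic, not the project's HDL; the function names mux2, mux4 and mux8 and the argument order are illustrative assumptions.

```python
def mux2(d0, d1, s):
    """2*1 multiplexer: output d1 when the select line s is 1, otherwise d0."""
    return d1 if s else d0

def mux4(d, s1, s0):
    """4*1 multiplexer built from three 2*1 multiplexers."""
    low = mux2(d[0], d[1], s0)          # selects within inputs 0 and 1
    high = mux2(d[2], d[3], s0)         # selects within inputs 2 and 3
    return mux2(low, high, s1)          # selects between the two pairs

def mux8(d, s2, s1, s0):
    """8*1 multiplexer built from two 4*1 multiplexers and one 2*1 multiplexer."""
    return mux2(mux4(d[0:4], s1, s0), mux4(d[4:8], s1, s0), s2)

inputs = list(range(10, 18))            # eight distinguishable input values
selected = [mux8(inputs, (i >> 2) & 1, (i >> 1) & 1, i & 1) for i in range(8)]
print(selected)
```

Stepping the three select lines through 000 to 111 routes each input to the output in turn.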
3.5.1.2 Serial in parallel out block:
A serial-in/parallel-out shift register is similar to the serial-in/serial-out shift register in
that it shifts data into internal storage elements and shifts data out at the serial-out (data-out) pin.
It is different in that it makes all the internal stages available as outputs. Therefore, a
serial-in/parallel-out shift register converts data from serial format to parallel format. If four
data bits are shifted in by four clock pulses via a single wire at data-in, the data becomes available
simultaneously on the four outputs QA to QD after the fourth clock pulse.
Figure 3.5.1.2.1 Serial in parallel out
The practical application of the serial-in/parallel-out shift register is to convert data from serial
format on a single wire to parallel format on multiple wires. Perhaps, we will illuminate four
LEDs (Light Emitting Diodes) with the four outputs (QA QB QC QD ).
Figure 3.5.1.2.2 Serial in parallel out details
The above details of the serial-in/parallel-out shift register are fairly simple. It looks like a
serial-in/serial-out shift register with taps added to each stage output. Serial data shifts in at SI (Serial
Input). After a number of clocks equal to the number of stages, the first data bit in appears at SO
(QD) in the above figure. In general, there is no SO pin. The last stage (QD above) serves as SO
and is cascaded to the next package if it exists.
Figure 3.5.1.2.3 Serial in parallel out wave forms
The shift register has been cleared prior to any data by CLR', an active low signal, which clears
all type D Flip-Flops within the shift register. Note the serial data 1011 pattern presented at the
SI input. This data is synchronized with the clock CLK. This would be the case if it is being
shifted in from something like another shift register, for example, a parallel-in/ serial-out shift
register (not shown here). On the first clock at t1, the data 1 at SI is shifted from D to Q of the
first shift register stage. After t2 this first data bit is at QB. After t3 it is at QC. After t4 it is at QD.
Four clock pulses have shifted the first data bit all the way to the last stage QD. The second data
bit a 0 is at QC after the 4th clock. The third data bit a 1 is at QB. The fourth data bit another 1 is
at QA. Thus, the serial data input pattern 1011 is contained in (QD QC QB QA). It is now available
on the four outputs.
The data will be available on the four outputs from just after clock t4 to just before t5. This
parallel data must be used or stored between these two times, or it will be lost due to shifting out
of the QD stage on the following clocks t5 to t8, as shown above.
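The shifting sequence described above can be reproduced with a small behavioural model in Python. The class name and its methods are illustrative assumptions rather than part of the project's design; q[0] models QA and q[3] models QD.

```python
class SerialInParallelOut:
    """Behavioural model of a 4-stage serial-in/parallel-out shift register."""

    def __init__(self, stages=4):
        self.q = [0] * stages           # q[0] is QA (first stage), q[-1] is QD

    def clear(self):
        """Model of the active-low CLR' input: reset every stage to 0."""
        self.q = [0] * len(self.q)

    def clock(self, serial_in):
        """One clock edge: every stage takes the value of the stage before it."""
        self.q = [serial_in] + self.q[:-1]
        return list(self.q)

reg = SerialInParallelOut()
reg.clear()
for bit in [1, 0, 1, 1]:                # serial pattern 1011, first bit first
    outputs = reg.clock(bit)

qa, qb, qc, qd = outputs
print(qd, qc, qb, qa)                   # 1 0 1 1
```

Clocking the register four more times would push these bits out of the QD stage, which is why the parallel data has to be captured after the fourth clock.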
3.5.1.3 Binary multiplier:
The binary multiplier used here is a 4-bit multiplier which takes two four bit inputs and
gives an 8-bit output.
Figure 3.5.1.3 binary multiplier
The binary multiplier employed for convolution in the present project has a special
characteristic: the internal carry is not forwarded to the next stage. In an ordinary binary
multiplier the MSB of the result is simply the carry out of the second MSB, so when the carry is
not forwarded only seven outputs are obtained.
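One way to read the carry-free behavior is that each output is a column sum of partial products, which is exactly the convolution sum of two 4-sample sequences. The sketch below illustrates that reading under stated assumptions: the module name colsum4 is hypothetical, the port names and widths mirror the bm port list in Chapter 8, and the body is not claimed to be the project's implementation.

```verilog
// Hypothetical carry-free column-sum block (names and body are assumptions).
// Each output is the sum of one column of partial products; nothing is
// carried from one column into the next.
module colsum4 (
    input  wire signed [3:0] a0, a1, a2, a3,   // first 4-sample sequence
    input  wire signed [3:0] b0, b1, b2, b3,   // second 4-sample sequence
    output wire signed [7:0] s0, s1, s2, s3, s4, s5, s6
);
    assign s0 = a0*b0;
    assign s1 = a0*b1 + a1*b0;
    assign s2 = a0*b2 + a1*b1 + a2*b0;
    assign s3 = a0*b3 + a1*b2 + a2*b1 + a3*b0;
    assign s4 = a1*b3 + a2*b2 + a3*b1;
    assign s5 = a2*b3 + a3*b2;
    assign s6 = a3*b3;
endmodule
```

The 8-bit outputs follow the widths declared for bm. A sum of several 4-bit by 4-bit signed products can exceed the signed 8-bit range, so a wider output would be needed if the full input range must be supported.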
[Figure 3.5.1.3 block diagram: binary multiplier with outputs S0 to S7]
3.5.1.4 Register:
A circuit with flip-flops is considered a sequential circuit even in the absence of combinational
logic. Circuits that include flip-flops are usually classified by the function they perform. Two
such circuits are registers and counters.
A Register is a group of flip-flops. Its basic function is to hold information within a
digital system so as to make it available to the logic units during the computing process.
However, a register may also have additional capabilities associated with it. It may have
combinational gates that perform certain data-processing tasks.
Figure 3.5.1.4.1 4 bit register
Various types of registers are available on the market. A simple 4-bit register is shown in
Figure 3.5.1.4.1. The common clock input triggers all flip-flops, and the binary data available
at the four inputs are transferred into the register. The clear input is useful for clearing the
register to an all-0’s output.
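The register behavior described above can be sketched as follows. This is only an illustration: the module name reg4 and its port names are assumptions, and the clear is assumed to be synchronous and active high, which the text does not specify.

```verilog
// Hypothetical 4-bit register with clear (names and clear polarity are assumptions).
module reg4 (
    input  wire       clk,   // common clock that triggers all four flip-flops
    input  wire       clr,   // assumed active-high, synchronous clear
    input  wire [3:0] d,     // the four data inputs
    output reg  [3:0] q      // the stored value
);
    always @(posedge clk) begin
        if (clr)
            q <= 4'b0000;    // clear the register to all 0's
        else
            q <= d;          // transfer the inputs into the register
    end
endmodule
```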
Registers capable of shifting their binary contents in one or both directions are called shift
registers. A unidirectional 4-bit shift register that uses only flip-flops is as follows:
Figure 3.5.1.4.2 Shift register
CHAPTER 4: RESULTS AND DISCUSSIONS
4.1 Introduction
The convolution process and the architecture developed for the required functionality were
discussed in the previous chapters. This chapter deals with the simulation and synthesis
results of the convolution process. The ModelSim tool is used to simulate the design and check
its functionality. Once functional verification is done, the design is taken to the Xilinx tool
for the synthesis process.
Appropriate test cases have been identified in order to test the modeled convolution
architecture. Based on the identified values, simulation results that describe the operation of
the process have been obtained. They indicate that the modeled design works as per its intended
functionality.
4.2 Simulation Results
Figure 4.2.1 4:1 Multiplexer
In general, a multiplexer has 2^n inputs, n selection lines and one output. Here a 4:1
multiplexer is used, so it has 4 inputs, 2 selection lines and one output. The input chosen by
the selection lines is passed to the output. The convolution design uses two of these
multiplexer blocks, one for each input sequence. The above figure shows the simulation results
of the 4:1 multiplexer.
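The relation between the number of inputs and the number of selection lines can be illustrated with a parameterized multiplexer. This is a sketch under stated assumptions: the module name mux_pow2, the parameters N and W, and the packed input bus in_flat are hypothetical, and the block is purely combinational, unlike the clocked mux41 listed in Chapter 8.

```verilog
// Hypothetical 2^N-to-1 multiplexer (all names are assumptions).
module mux_pow2 #(
    parameter N = 2,                      // number of selection lines
    parameter W = 4                       // width of each data input
) (
    input  wire [(1<<N)*W-1:0] in_flat,   // 2^N inputs packed side by side
    input  wire [N-1:0]        sel,       // selection lines
    output wire [W-1:0]        out        // the selected input
);
    // The indexed part-select picks the W-bit slice chosen by sel.
    assign out = in_flat[sel*W +: W];
endmodule
```

For the 4:1 case, N = 2 and the four 4-bit inputs would be packed as {a3, a2, a1, a0}, so that sel = 0 selects a0.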
SIPO
Figure 4.2.2 serial input and parallel output
The input to this block is the output of the multiplexer. The serial input and parallel output
block takes the data from the multiplexer as its input, holds the values for up to four clock
cycles and converts the data from serial to parallel. The above figure shows the simulation
results of converting the serial input data into a parallel output.
BINARY MULTIPLIER
Figure 4.2.3 Binary multiplier
The binary multiplier performs the multiplication operation. Its inputs are the data obtained
from the two serial input and parallel output blocks.
Multiplexer
Figure 4.2.4 8:1 Multiplexer and Register
The data from the binary multiplier is applied to the multiplexer. The multiplexer converts the
parallel data into serial data, which is then stored in the register.
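The parallel-to-serial step can be sketched as a multiplexer whose selection lines step through the inputs, followed by a register. The sketch below is an assumption-laden illustration: the module name par2ser and its ports are hypothetical, the reset is assumed synchronous and active low, and an internal counter drives the selection, whereas in the project's top module the selection lines l1 are an external input.

```verilog
// Hypothetical parallel-to-serial stage (names and counter are assumptions).
module par2ser (
    input  wire              clk,
    input  wire              rst_n,   // assumed active-low, synchronous reset
    input  wire signed [7:0] d0, d1, d2, d3, d4, d5, d6, d7,  // parallel words
    output reg  signed [7:0] q        // one word per clock
);
    reg [2:0] sel;                    // steps through the eight inputs

    always @(posedge clk) begin
        if (!rst_n) begin
            sel <= 3'd0;
            q   <= 8'sd0;
        end else begin
            case (sel)                // 8:1 multiplexer feeding the register
                3'd0: q <= d0;
                3'd1: q <= d1;
                3'd2: q <= d2;
                3'd3: q <= d3;
                3'd4: q <= d4;
                3'd5: q <= d5;
                3'd6: q <= d6;
                3'd7: q <= d7;
            endcase
            sel <= sel + 3'd1;        // wraps from 7 back to 0
        end
    end
endmodule
```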
Top module
Figure 4.2.5 Convolution top modules
The top module shows the complete convolution process. The inputs are applied to the
multiplexers. Based on the selection lines, one input is selected and passed to the output in
each clock cycle. The output data from each multiplexer is applied to a serial input and
parallel output block, which converts the data from serial to parallel. The outputs of the
serial input and parallel output blocks are connected to the binary multiplier, which performs
the multiplication operation, and its output is converted from parallel to serial. The data is
then stored in the register.
4.3 Introduction to FPGA
FPGA stands for Field Programmable Gate Array, a device that contains an array of logic modules,
I/O modules and routing tracks (programmable interconnect). An FPGA can be configured by the end
user to implement specific circuitry. Earlier devices ran at speeds of up to 100 MHz, whereas
present devices reach the GHz range. The main applications are DSP, FPGA-based computers, logic
emulation, ASIC and ASSP.
FPGAs are programmed mainly using SRAM (Static Random Access Memory) technology. It is volatile,
and the main advantage of SRAM programming technology is reconfigurability. Issues in FPGA
technology are the complexity of the logic element, clock support, I/O support and
interconnections (routing).
In this work, the convolution design is written in Verilog HDL and synthesized for the
Spartan 3E FPGA family using the Xilinx ISE tool. This process includes the following steps:
Translate
Map
Place and Route
4.3.1 FPGA Flow
The basic implementation of design on FPGA has the following steps.
Design Entry
Logic Optimization
Technology Mapping
Placement
Routing
Programming Unit
Configured FPGA
The list above shows the basic steps involved in implementation. The initial design entry may
be in Verilog HDL, a schematic or a Boolean expression. The Boolean expression is then
optimized for area or speed.
Figure 4.3.1 Logic Block
In technology mapping, the optimized Boolean expression is transformed into FPGA logic
blocks, which are called slices. Area and delay optimization take place here.
During placement, algorithms are used to place each block in the FPGA array. Routing assigns the
programmable FPGA wire segments to establish connections among the FPGA blocks. The
configuration of the final chip is made in the programming unit.
4.4 Synthesis Result
The developed convolution design was simulated and its functionality verified. Once
functional verification is done, the RTL model is taken through the synthesis process using the
Xilinx ISE tool. In synthesis, the RTL model is converted to a gate-level netlist mapped
to a specific technology library. Many different devices of the Spartan 3E family are
available in the Xilinx ISE tool. To synthesize this design, the device “XC3S500E” was chosen,
with the package “FG320” and the device speed grade “-4”.
The design was synthesized and its results were analyzed as follows.
Synthesis Report:
Figure 4.4.1
RTL Schematic:
Figure 4.4.2
Figure 4.4.3
Technology Schematic
Figure 4.4.4
CHAPTER 5: LANGUAGES AND TOOLS
5.1 Verilog HDL
Verilog HDL is a Hardware Description Language (HDL). A Hardware Description
Language is a language used to describe a digital system, for example, a computer or a
component of a computer. One may describe a digital system at several levels. For example, an
HDL might describe the layout of the wires, resistors and transistors on an Integrated Circuit (IC)
chip, i.e., the switch level. Or, it might describe the logic gates and flip-flops in a digital
system, i.e., the gate level. An even higher level describes the registers and the transfers of
vectors of information between registers. This is called the Register Transfer Level (RTL).
Verilog supports all of these levels. The industry is currently split on whether Verilog or VHDL
is better. Many feel that Verilog is easier to learn and use than VHDL.
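The difference between description levels can be illustrated with a half adder written twice. This is a minimal sketch for illustration only; the module names half_adder_gate and half_adder_rtl are hypothetical and are not part of the project.

```verilog
// Gate level: the circuit is built from primitive gates.
module half_adder_gate (
    input  wire a,
    input  wire b,
    output wire sum,
    output wire carry
);
    xor g1 (sum, a, b);     // sum is the exclusive OR of the inputs
    and g2 (carry, a, b);   // carry is the AND of the inputs
endmodule

// Higher level: the same function described as an arithmetic transfer.
module half_adder_rtl (
    input  wire a,
    input  wire b,
    output wire sum,
    output wire carry
);
    assign {carry, sum} = a + b;   // 2-bit result of adding two 1-bit values
endmodule
```

Both modules describe the same logic; the second leaves the choice of gates to the synthesis tool.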
Verilog was introduced in 1985 by Gateway Design Automation, now a part of
Cadence Design Systems, Inc.’s Systems Division.
Verilog HDL allows a hardware designer to describe designs at a high level of
abstraction, such as the architectural or behavioral level, as well as at the lower
implementation levels (i.e., gate and switch levels) leading to Very Large Scale Integration
(VLSI) Integrated Circuit (IC) layouts and chip fabrication. A primary use of HDLs is the
simulation of designs before the designer must commit to fabrication.
5.2 Overview of VHDL:
As the size and complexity of digital systems increase, more computer-aided design
tools are introduced into the hardware design process. The early paper-and-pencil design methods
have given way to sophisticated design entry, verification and automatic hardware generation
tools. The newest addition to these design methodologies is the introduction of hardware
description languages (HDLs). The use of such languages is not new; languages such as CDL, ISP
and AHPL have been used for some years. However, their primary application has been the
verification of a design’s architecture. They do not have the capability to model a design with
a high degree of accuracy; that is, their timing model is not precise and/or their language
constructs imply a certain hardware structure. Newer languages such as VHDL have more universal
timing models and imply no particular hardware structure.
Hardware description languages have two main applications: documenting a design and
modeling it. Good documentation of a design helps to ensure design accuracy and design
portability. Since a simulator supports them, the model inherent in an HDL description can be
used to validate a design. Prototyping of complicated systems is extremely expensive, and the
goal of those concerned with the development of hardware languages is to replace this
prototyping process with validation through simulation and silicon compilation.
Once an entity has been modeled, it needs to be validated by the VHDL system. A typical
VHDL system consists of an analyzer and a simulator. The analyzer reads in one or more design
units contained in a single file and compiles them into a design library after validating the syntax
and performing some static semantic checks. The design library is a place in the host
environment where compiled design units are stored.
The simulator simulates an entity, represented by an entity-architecture pair or by a
configuration, by reading in its compiled description from the design library and then
performing the following steps.
1. Elaboration
2. Initialization
3. Simulation
VHDL is an acronym for VHSIC Hardware Description Language (VHSIC is an acronym
for Very High Speed Integrated Circuits). It is a hardware description language that can be used to
model a digital system at many levels of abstraction, ranging from the algorithmic level to the
gate level.
The complexity of a digital system being modeled could vary from that of simple gate to
a complete digital electronic system, or anything in between.
The digital system can also be described hierarchically. Timing can also be explicitly modeled in
the same description.
The VHDL language can be regarded as an integrated amalgamation of the
following languages.
Sequential language.
Concurrent language.
Net list language.
Timing specifications.
Waveform generation language.
Therefore, the language has constructs that enable you to express the concurrent or
sequential behavior of a digital system as an interconnection of components. All the above
constructs may be combined to provide a comprehensive description of the system in a single
model.
The language not only defines the syntax but also defines very clear simulation
semantics for each language construct. Therefore, models written in this language can be verified
using a VHDL simulator. It inherits many of its features, especially the sequential part, from the
Ada programming language. Because VHDL provides an extensive range of modeling
capabilities, it is often difficult to understand. Fortunately, it is possible to quickly
assimilate a core subset of the language that is both easy and simple to understand without
learning the more complex features. The complete language, however, has sufficient power to
capture the descriptions of everything from the most complex chips to complete electronic systems.
5.2.1 Features of VHDL:
The following are the major capabilities that the language provides along with the features that
differentiate it from other hardware description languages.
The language can be used as an exchange medium between chip vendors and CAD tool users.
Different chip vendors can provide VHDL descriptions of their components to system designers.
CAD tool users can use it to capture the behavior of the design at a high level of abstraction for
functional simulation.
The language supports hierarchy; that is, a digital system can be modeled as a set of
interconnected components, and each component, in turn, can be modeled as a set of interconnected
sub components.
The language is not technology specific, but is capable of supporting technology specific
features. It can also support various hardware technologies, for example you may define new
logic types and new components, also specify technology specific attributes. By being
technology independent, the same model can be synthesized into different vendor libraries. It
supports both synchronous and asynchronous timing models.
Various digital modeling techniques such as finite state machine descriptions, algorithmic
descriptions and Boolean equations can be modeled using the language.
Test benches can be written using the same language to test other VHDL models.
5.3 Modelsim
ModelSim is a verification and simulation tool for VHDL, Verilog, SystemVerilog, and mixed
language designs.
5.3.1 Basic Simulation Flow
The following diagram shows the basic steps for simulating a design in ModelSim.
Figure 5.3.1 Basic Simulation Flow - Overview Lab
In ModelSim, all designs are compiled into a library. You typically start a new simulation in
ModelSim by creating a working library called "work". "Work" is the library name used by the
compiler as the default destination for compiled design units.
Compiling Your Design
After creating the working library, you compile your design units into it. The ModelSim library
format is compatible across all supported platforms. You can simulate your design on any
platform without having to recompile your design.
Loading the Simulator with Your Design and Running the Simulation
With the design compiled, you load the simulator with your design by
invoking the simulator on a top-level module (Verilog) or a configuration or entity/architecture
pair (VHDL). Assuming the design loads successfully, the simulation time is set to zero, and you
enter a run command to begin simulation.
Debugging Your Results
If you don’t get the results you expect, you can use ModelSim’s robust debugging
environment to track down the cause of the problem.
5.3.2 Project Flow
A project is a collection mechanism for an HDL design under specification or test. Even though
you don’t have to use projects in ModelSim, they may ease interaction with the tool and are
useful for organizing files and specifying simulation settings.
The following diagram shows the basic steps for simulating a design within a ModelSim project.
As you can see, the flow is similar to the basic simulation flow. However, there are two
important differences:
You do not have to create a working library in the project flow; it is done for you
automatically.
Projects are persistent. In other words, they will open every time you invoke ModelSim
unless you specifically close them.
5.3.3 Multiple Library Flow
ModelSim uses libraries in two ways: 1) as a local working library that contains the compiled
version of your design; 2) as a resource library. The contents of your working library will
change as you update your design and recompile. A resource library is typically static and
serves as a parts source for your design. You can create your own resource libraries, or they
may be supplied by another design team or a third party (e.g., a silicon vendor).
You specify which resource libraries will be used when the design is compiled, and there are
rules to specify in which order they are searched. A common example of using both a working
library and a resource library is one where your gate-level design and testbench are compiled
into the working library, and the design references gate-level models in a separate resource
library. The diagram below shows the basic steps for simulating with multiple libraries.
Figure 5.3.3. Multiple Library Flow
5.4 Debugging Tools
ModelSim offers numerous tools for debugging and analyzing your design. Several of these
tools are covered in subsequent lessons, including:
Using projects
Working with multiple libraries
Setting breakpoints and stepping through the source code
Viewing waveforms and measuring time
Viewing and initializing memories
Creating stimulus with the Waveform Editor
Automating simulation
5.5 Basic Simulation
Figure 5.5. Basic Simulation Flow - Simulation Lab
5.5.1 Design Files for this Lesson
The sample design for this lesson is a simple 8-bit, binary up-counter with an associated
testbench. The pathnames are as follows:
Verilog – <install_dir>/examples/tutorials/verilog/basicSimulation/counter.v and tcounter.v
VHDL – <install_dir>/examples/tutorials/vhdl/basicSimulation/counter.vhd and tcounter.vhd
This lesson uses the Verilog files counter.v and tcounter.v. If you have a VHDL license, use
counter.vhd and tcounter.vhd instead. Or, if you have a mixed license, feel free to use the
Verilog testbench with the VHDL counter or vice versa.
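For readers without access to the tutorial files, an 8-bit binary up-counter and a small testbench might look like the sketch below. It is not the counter.v or tcounter.v shipped with ModelSim: the module names counter8 and counter8_tb, the port names, the reset polarity and the timing values are all assumptions chosen for illustration.

```verilog
`timescale 1ns/1ns

// Hypothetical 8-bit binary up-counter (not the ModelSim tutorial file).
module counter8 (
    input  wire       clk,
    input  wire       reset,   // assumed active-high, synchronous reset
    output reg  [7:0] count
);
    always @(posedge clk) begin
        if (reset)
            count <= 8'd0;
        else
            count <= count + 8'd1;   // wraps to 0 after 255
    end
endmodule

// Hypothetical testbench that drives the counter and then halts.
module counter8_tb;
    reg        clk   = 1'b0;
    reg        reset = 1'b1;
    wire [7:0] count;

    counter8 dut (.clk(clk), .reset(reset), .count(count));

    always #10 clk = ~clk;           // 20 ns clock period

    initial begin
        #25 reset = 1'b0;            // release reset after the first rising edge
        #200;
        $display("count = %0d", count);
        $stop;                       // halts the run and returns control to the simulator
    end
endmodule
```

Compiling both modules into the work library and loading counter8_tb would follow the same steps that the lesson describes for test_counter.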
5.5.2 Create the Working Design Library
Before you can simulate a design, you must first create a library and compile the source code
into that library.
1. Create a new directory and copy the design files for this lesson into it.
Start by creating a new directory for this exercise (in case other users will be working with these
lessons).
Verilog: Copy counter.v and tcounter.v files from
/<install_dir>/examples/tutorials/verilog/basicSimulation to the new directory.
VHDL: Copy counter.vhd and tcounter.vhd files from
/<install_dir>/examples/tutorials/vhdl/basicSimulation to the new directory.
2. Start ModelSim if necessary.
a. Type vsim at a UNIX shell prompt or use the ModelSim icon in Windows. Upon opening
ModelSim for the first time, you will see the Welcome to ModelSim dialog. Click Close.
b. Select File > Change Directory and change to the directory you created in step 1.
3. Create the working library.
a. Select File > New > Library.
This opens a dialog where you specify physical and logical names for the library (Figure 5.5.2.1).
You can create a new library or map to an existing library. We’ll be doing the former.
Figure 5.5.2.1 The Create a New Library Dialog
b. Type work in the Library Name field (if it isn’t already entered automatically).
c. Click OK.
ModelSim creates a directory called work and writes a specially-formatted file named _info into
that directory. The _info file must remain in the directory to distinguish it as a ModelSim library.
Do not edit the folder contents from your operating system; all changes should be made from
within ModelSim. ModelSim also adds the library to the list in the Workspace (Figure 5.5.2.2) and
records the library mapping for future reference in the ModelSim initialization file
(modelsim.ini).
Figure 5.5.2.2 work library in work space
When you pressed OK in step 3c above, the following was printed to the Transcript:
vlib work
vmap work work
These two lines are the command-line equivalents of the menu selections you made. Many
command-line equivalents will echo their menu-driven functions in this fashion.
5.5.3 Compile the Design
With the working library created, you are ready to compile your source files.
You can compile by using the menus and dialogs of the graphic interface, as in the Verilog
example below, or by entering a command at the ModelSim> prompt.
1. Compile counter.v and tcounter.v.
a. Select Compile > Compile. This opens the Compile Source Files dialog (Figure 5.5.3.1).
If the Compile menu option is not available, you probably have a project open. If so, close the
project by making the Workspace pane active and selecting File > Close from the menus.
b. Select both counter.v and tcounter.v modules from the Compile Source Files dialog and click
Compile. The files are compiled into the work library.
c. When compile is finished, click Done.
Figure 5.5.3.1 Compile Source Files Dialog
2. View the compiled design units.
a. On the Library tab, click the ’+’ icon next to the work library and you will see two design units
(Figure 5.3.3.3). You can also see their types (Modules, Entities, etc.) and the path to the underlying
source files (scroll to the right if necessary).
b. Double-click test_counter to load the design.
You can also load the design by selecting Simulate > Start Simulation in the menu bar. This
opens the Start Simulation dialog. With the Design tab selected, click the ’+’ sign next to the
work library to see the counter and test_counter modules. Select the test_counter module and
click OK (Figure 5.5.3.2).
Figure 5.5.3.2 Loading Design with Start Simulation Dialog
When the design is loaded, you will see a new tab in the Workspace named sim that displays the
hierarchical structure of the design (Figure 5.3.4.1). You can navigate within the hierarchy by
clicking on any line with a ’+’ (expand) or ’-’ (contract) icon. You will also see a tab named
Files that displays all files included in the design.
Figure 5.3.3.3 Verilog Modules Compiled into work Library
5.3.4 Load the Design
1. Load the test_counter module into the simulator.
a. In the Workspace, click the ‘+’ sign next to the work library to show the files contained there.
Figure 5.3.4.1 Workspace sim Tab Displays Design Hierarchy
2. View design objects in the Objects pane.
a. Open the View menu and select Objects. The command line equivalent is: view objects
The Objects pane (Figure 5.3.4.2) shows the names and current values of data objects in the current
region (selected in the Workspace). Data objects include signals, nets, registers, constants and
variables not declared in a process, generics, parameters.
Figure 5.3.4.2 Object Pane Displays Design Objects
You may open other windows and panes with the View menu or with the view command. See
Navigating the Interface.
5.3.5 Run the Simulation
Now you will open the Wave window, add signals to it, then run the simulation.
1. Open the Wave debugging window.
a. Enter view wave at the command line
You can also use the View > Wave menu selection to open a Wave window.
The Wave window is one of several windows available for debugging. To see a list
of the other debugging windows, select the View menu. You may need to move or
resize the windows to your liking. Window panes within the Main window can be
zoomed to occupy the entire Main window or undocked to stand alone. For details,
see Navigating the Interface.
2. Add signals to the Wave window.
a. In the Workspace pane, select the sim tab.
b. Right-click test_counter to open a popup context menu.
c. Select Add > To Wave > All items in region (Figure 5.3.5.1).
All signals in the design are added to the Wave window.
Figure 5.3.5.1 Using the Popup Menu to Add Signals to Wave Window
3. Run the simulation.
a. Click the Run icon in the Main or Wave window toolbar.
The simulation runs for 100 ns (the default simulation length) and waves are
drawn in the Wave window.
b. Enter run 500 at the VSIM> prompt in the Main window.
The simulation advances another 500 ns for a total of 600 ns (Figure 5.3.5.2).
Figure 5.3.5.2 Waves Drawn in Wave Window
c. Click the Run -All icon on the Main or Wave window toolbar.
The simulation continues running until you execute a break command or it
hits a statement in your code (e.g., a Verilog $stop statement) that halts the
simulation.
d. Click the Break icon. The simulation stops running.
5.4 Xilinx design flow
The first step in implementing a design on an FPGA is the system specification.
The specification defines the kinds of inputs and outputs and the range of values that the kit
can accept. After the system specification, the next step is
the architecture. The architecture describes the interconnections between all the blocks involved
in our design. Each block in the architecture, along with its interconnections, is
modeled in either VHDL or Verilog, depending on convenience. All these blocks are then simulated
and the outputs are verified for correct functioning.
Figure 5.4 Xilinx Implementation Design Flow-Chart.
After simulation, the next step is synthesis. This is a very important step in
knowing whether our design can be implemented on an FPGA kit or not. Synthesis converts our
HDL code into its functional components, which are vendor specific. After synthesis, the RTL
schematic and the technology schematic are generated, along with the timing delays. These
timing delays will be present in the FPGA if the design is implemented on it.
Place & Route is the next step, in which the tool places all the components on the FPGA die for
optimum performance in terms of both area and speed. We also see the interconnections that
will be made in this part of the implementation flow.
In the post place and route simulation step, the delays that will be present on the FPGA kit
are considered by the tool, and simulation is performed taking these delays into account.
Delays here mean electrical loading effects, wiring delays and stray capacitances. After post
place and route simulation comes generation of the bit-map file, which means converting the HDL
code into a bit stream that is used to configure the FPGA kit. A bit file is generated when this
step is performed. The final step is
downloading the bit-map file on to the FPGA board, which is done by connecting the computer to the
FPGA board with a JTAG (Joint Test Action Group) cable; JTAG is an IEEE standard.
The bit-map file contains the whole design, which is placed on the FPGA die, and the outputs can
now be observed from the FPGA LEDs. This step completes the whole process of implementing our
design on an FPGA.
5.4.1 Xilinx ISE 10.1 software
Xilinx ISE (Integrated Software Environment) 9.2i is software from Xilinx that
is used to design a digital circuit, verify its functionality and finally
download the design on to a Spartan-3E FPGA device.
5.4.2 Xilinx ISE 10.1 software tools
SIMULATION : ISE (Integrated Software Environment) Simulator
SYNTHESIS, PLACE & ROUTE : XST (Xilinx Synthesis Technology) Synthesizer
5.4.3 Design steps using Xilinx ISE 10.1
1 Create an ISE PROJECT for particular embedded system application.
2 Write the assembly code in Notepad or WordPad and generate the Verilog or VHDL module by
making use of the assembler.
3 Check syntax for the design.
4 Create verilog test fixture of the design.
5 Simulate the test bench waveform (BEHAVIORAL SIMULATION) for functional
verification of the design using ISE simulator.
6 Synthesize and implement the top level module using XST synthesizer.
CHAPTER 6: CONCLUSION
In this paper, we presented an optimized implementation of convolution. This particular model
has the advantage of being fine-tuned for signal processing, and the implementation is
optimized in terms of operation, power and area. To accurately analyze our
proposed system, we coded our design in the Verilog hardware description language and
synthesized it using Xilinx tools. We implemented an illustrative example, a 4X4 convolver.
Similarly, the presented concept can be extended to an NXN case. The functionality of the
convolver was tested and verified successfully on a Xilinx Spartan-3E FPGA and design compiler.
The proposed circuit uses only 5 mW, saves almost 35% area and takes 20 ns to complete.
This corresponds to a power reduction of more than 50%. As FPGA technology matures and much
larger arrays become practical, techniques that allow the automatic generation of highly parallel
architectures will become central to high performance computing. We have described some
simple techniques for generation of convolution pipelines for image processing and other
applications. Higher level techniques and approaches are also needed. FPGAs permit
restructurable processing, and restructurable interconnects are also becoming available.
CHAPTER 7: BIBLIOGRAPHY
[1] John W. Pierre, “A Novel Method for Calculating the Convolution Sum of Two Finite
Length Sequences”, IEEE transaction on education, VOL.39, NO. 1, 1996.
[2] W. W. Smith, J. M. Smith, “Handbook of Real-Time Fast Fourier Transforms”, IEEE Press,
1995, p. 28.
[3] R. G. Shoup, “Parameterized convolution filtering in a field programmable gate array,” in
selected papers from the Oxford 1993 international workshop on field programmable logic and
applications on More FPGAs. Oxford, United Kingdom: Abingdon EE&CS Books, 1994, pp.
274–280.
[4] Iván Rodríguez, Marvi Teixeira, “Parallel Cyclic Convolution Based on Recursive Formulations
of Block Pseudocirculant Matrices”, IEEE Transactions on Signal Processing, 2008.
[5] Thomas Oelsner ,“Implementation of Data Convolution Algorithms in FPGAs” , QuickLogic
Europe http://www.quicklogic.com/images/appnote18.pdf
[6] Chao Cheng , Keshab K. Parhi ,“Low-Cost Fast VLSI Algorithm for Discrete Fourier
Transform”, IEEE,. IEEE transaction on circuits and systems, VOL. 54, 2007
[7] J. I. Guo, C. M. Liu, and C. W. Jen, “The efficient memory-based VLSI array designs for
DFT and DCT,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 37, no. 10,
1992, pp. 723–733.
[8] T. S. Chang, J. I. Guo, and C. W. Jen, “Hardware-efficient DFT designs with cyclic
convolution and subexpression sharing”,IEEE Trans. Circuits Syst. II, Analog Digital Signal
Process., vol. 47, no. 9, 2000, pp. 886–892.
[9] C. Cheng and K. K. Parhi, “Hardware efficient fast DCT based on novel cyclic convolution
structures”, IEEE Trans. Signal Process., vol. 54, no.11, 2007, pp. 4419–4434.
[10] Chao Cheng , Keshab K. Parhi ”Hardware Efficient Fast Parallel FIR Filter Structures
Based on Iterated Short Convolution” IEEE, and, IEEE transaction on circuits and systems,
VOL. 51, NO. 8, 2004 http://www.tc.umn.edu/~chen0867/ParallelFIR2004_TCASI.pdf.
[11] Abdulqadir Alaqeeli, Janusz Starzyk, “Hardware Implementation for Fast Convolution with
a PN Code Using Field Programmable Gate”, Ohio University,
http://www.ent.ohiou.edu/~starzyk/network/Research/Papers/Recent%20conferences/
Conv_FPGA_PN_code_SSST2001.pdf.3483
CHAPTER 8: PROGRAM CODE
8.1 Code:
// Top level of the convolution datapath: two 4:1 input multiplexers feed two
// SIPO buffers, whose outputs drive the binary multiplier; an 8:1 multiplexer
// then serializes the results into the output register.
module top(CLK,RST,l,l1,p0,p1,p2,p3,q0,q1,q2,q3,outfinal);
input CLK,RST;
input [1:0] l;    // selection lines of the two 4:1 multiplexers
input [2:0] l1;   // selection lines of the 8:1 multiplexer
input signed [3:0] p0,p1,p2,p3,q0,q1,q2,q3;
output signed [7:0] outfinal;
wire load,load1;
wire signed [7:0] r;
//output [7:0] r11,r12,r0,r1,r2,r3,r4,r5,r6,r7;
wire signed [7:0] r0,r1,r2,r3,r4,r5,r6,r7;
wire signed [3:0] r11,r12;
//wire signed [1:0]l;
//wire signed [2:0] l1;
//wire signed en1,clock1;
//wire signed [3:0] parallel_out10,parallel_out11,parallel_out12,parallel_out13,
//                  parallel_out20,parallel_out21,parallel_out22,parallel_out23;
//wire z;
//wire signed [3:0] p [3:0];
//wire signed [3:0] q [3:0];
wire signed [3:0] DATA_OUT_1,DATA_OUT_2,DATA_OUT_3,DATA_OUT_4,
                  DATA_OUT_5,DATA_OUT_6,DATA_OUT_7,DATA_OUT_8;
// bm drives only s0..s6, so the eighth multiplexer input is tied to zero
// here instead of being left undriven.
assign r7 = 8'sd0;
mux41 m1 (.CLK(CLK),.RST(RST),.a0(p0),.a1(p1),.a2(p2),.a3(p3),
          .s(l),.o1(r11),.load(load));
mux41 m2 (.CLK(CLK),.RST(RST),.a0(q0),.a1(q1),.a2(q2),.a3(q3),
          .s(l),.o1(r12),.load(load1));
memory_SIPO sp1 (.CLOCK(CLK),.load(load),.data_in(r11),
                 .DATA_OUT_1(DATA_OUT_1),.DATA_OUT_2(DATA_OUT_2),
                 .DATA_OUT_3(DATA_OUT_3),.DATA_OUT_4(DATA_OUT_4));
memory_SIPO sp2 (.CLOCK(CLK),.load(load1),.data_in(r12),
                 .DATA_OUT_1(DATA_OUT_5),.DATA_OUT_2(DATA_OUT_6),
                 .DATA_OUT_3(DATA_OUT_7),.DATA_OUT_4(DATA_OUT_8));
//memory sp1 (.CLK(CLK),.RST(RST),.serial_in(r11),
//            .parallel_out0(parallel_out10),.parallel_out1(parallel_out11),
//            .parallel_out2(parallel_out12),.parallel_out3(parallel_out13));
//memory sp2 (.CLK(CLK),.RST(RST),.serial_in(r12),
//            .parallel_out0(parallel_out20),.parallel_out1(parallel_out21),
//            .parallel_out2(parallel_out22),.parallel_out3(parallel_out23));
bm bm1 (.CLK(CLK),.RST(RST),
        .a0(DATA_OUT_1),.a1(DATA_OUT_2),.a2(DATA_OUT_3),.a3(DATA_OUT_4),
        .b0(DATA_OUT_5),.b1(DATA_OUT_6),.b2(DATA_OUT_7),.b3(DATA_OUT_8),
        .s0(r0),.s1(r1),.s2(r2),.s3(r3),.s4(r4),.s5(r5),.s6(r6));
mux81 m3 (.CLK(CLK),.RST(RST),.a0(r0),.a1(r1),.a2(r2),.a3(r3),
          .a4(r4),.a5(r5),.a6(r6),.a7(r7),.s(l1),.o1(r));
register ro1(.CLK(CLK),.RST(RST),.r(r),.out(outfinal));
endmodule
8.2 Mux 4*1:
module mux41 (CLK,RST,load,a0, a1, a2, a3, s, o1);
input CLK,RST;
input signed [3:0] a0, a1, a2, a3;
input [1:0] s;
output signed [3:0] o1;
reg signed [3:0] o1;
output reg load;
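// Synchronous, active-low reset. Non-blocking assignments are used so that
// registers in other modules clocked on the same edge sample the previous
// value of o1 and load instead of racing with this block.
always @(posedge CLK)
begin
  if (RST == 1'b0)
  begin
    o1 <= 4'bzzzz;
    load <= 1'b0;
  end
  else
  begin
    case (s)
      2'b00 : o1 <= a0;
      2'b01 : o1 <= a1;
      2'b10 : o1 <= a2;
      2'b11 : o1 <= a3;
    endcase
    load <= 1'b1;
  end
end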
endmodule
8.3 SIPO:
module memory_SIPO(CLOCK,load,data_in,DATA_OUT_1,DATA_OUT_2,DATA_OUT_3,DATA_OUT_4);
//INPUTS
input CLOCK;
input load;
input signed [3:0] data_in;
//OUTPUTS
output signed [3:0] DATA_OUT_1,DATA_OUT_2,DATA_OUT_3,DATA_OUT_4;
//REGISTERS
reg [2:0] cntr;
integer i;
reg [2:0] cntr1;
reg signed [3:0] DATA_OUT_1,DATA_OUT_2,DATA_OUT_3,DATA_OUT_4;
//MEMORY
reg signed [3:0] m [3:0];
//WRITING INTO memory
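always @(posedge CLOCK)
begin
  // load low: clear the write counter, the buffer and the outputs
  if (!load)
  begin
    cntr <= 3'b0;
    cntr1 <= 3'b0;
    DATA_OUT_1 <= 4'b0;
    DATA_OUT_2 <= 4'b0;
    DATA_OUT_3 <= 4'b0;
    DATA_OUT_4 <= 4'b0;
    for (i = 0; i <= 3; i = i + 1)
      m[i] <= 4'b0;
  end
  // load high and buffer not yet full: store one sample per clock
  else if (cntr <= 3'd3 && load)
  begin
    m[cntr] <= data_in;
    cntr <= cntr + 1;
  end
  // buffer full: present all four stored samples in parallel
  else
  begin
    DATA_OUT_1 <= m[0];
    DATA_OUT_2 <= m[1];
    DATA_OUT_3 <= m[2];
    DATA_OUT_4 <= m[3];
    cntr1 <= cntr1 + 1;
  end
end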
endmodule
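The serial-in, parallel-out behaviour of memory_SIPO can be checked in isolation with a small testbench. The following is only a sketch: the module name memory_SIPO_tb, the 10-unit clock period and the sample values are assumptions chosen for illustration, and it assumes the memory_SIPO module from Section 8.3 is compiled alongside it. Inputs are changed on the falling edge so that they are stable when the rising edge samples them.

```verilog
// Hypothetical testbench (not part of the original design).
module memory_SIPO_tb;
  reg CLOCK, load;
  reg signed [3:0] data_in;
  wire signed [3:0] DATA_OUT_1, DATA_OUT_2, DATA_OUT_3, DATA_OUT_4;

  memory_SIPO dut (.CLOCK(CLOCK), .load(load), .data_in(data_in),
                   .DATA_OUT_1(DATA_OUT_1), .DATA_OUT_2(DATA_OUT_2),
                   .DATA_OUT_3(DATA_OUT_3), .DATA_OUT_4(DATA_OUT_4));

  always #5 CLOCK = ~CLOCK;

  initial begin
    CLOCK = 0; load = 0; data_in = 0;
    @(negedge CLOCK);                 // first rising edge saw load = 0: buffer cleared
    load = 1; data_in = 3;            // sample 0
    @(negedge CLOCK); data_in = -2;   // sample 1
    @(negedge CLOCK); data_in = 7;    // sample 2
    @(negedge CLOCK); data_in = -8;   // sample 3
    @(negedge CLOCK);                 // fourth sample has been stored
    @(negedge CLOCK);                 // outputs have been updated from the buffer
    $display("outputs: %0d %0d %0d %0d",
             DATA_OUT_1, DATA_OUT_2, DATA_OUT_3, DATA_OUT_4);
    if (DATA_OUT_1 !== 3 || DATA_OUT_2 !== -2 ||
        DATA_OUT_3 !== 7 || DATA_OUT_4 !== -8)
      $display("FAIL");
    else
      $display("PASS");
    $finish;
  end
endmodule
```

Four rising edges are needed to fill the buffer and one more to copy it to the outputs, which is why the check is made five clock cycles after load goes high.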
8.4 Binary multiplier:
module bm(CLK,RST,
a0,a1,a2,a3,
b0,b1,b2,b3,
s0,
s1,
s2,
s3,
s4,
s5,
s6);
//input en;
input CLK,RST;
input signed [3:0] a0,a1,a2,a3,b0,b1,b2,b3;
output signed [7:0] s0,s1,s2,s3,s4,s5,s6;
reg signed [7:0] s0,s1,s2,s3,s4,s5,s6;
wire signed [7:0] so0,so1,so2,so3,so4,so5,so6;
wire signed [7:0]
s11,s12,s21,s22,s23,s31,s32,s33,s34,s41,s42,s43,s51,s52,s61;
//wire signed [15:0] st,x1,x2;
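// Output registers with synchronous, active-low reset. Non-blocking
// assignments avoid a race with the 8:1 mux that samples s0..s6 on the
// same clock edge.
always @(posedge CLK)
begin
  if (RST == 1'b0)
  begin
    s0 <= 8'bzzzzzzzz;
    s1 <= 8'bzzzzzzzz;
    s2 <= 8'bzzzzzzzz;
    s3 <= 8'bzzzzzzzz;
    s4 <= 8'bzzzzzzzz;
    s5 <= 8'bzzzzzzzz;
    s6 <= 8'bzzzzzzzz;
    // s7=8'b00000000;
  end
  else
  begin
    s0 <= so0;
    s1 <= so1;
    s2 <= so2;
    s3 <= so3;
    s4 <= so4;
    s5 <= so5;
    s6 <= so6;
    // s7=so7;
  end
end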
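// Linear convolution of a0..a3 with b0..b3: so_k is the sum of a_i*b_j
// over all i + j = k.
// Note: each product of two 4-bit signed values fits in 8 bits, but a sum
// of several products can exceed the 8-bit signed range (-128..127), in
// which case the result wraps around.
assign so0 = a0*b0;
assign s11 = a1*b0;
assign s12 = a0*b1;
assign so1 = s11 + s12;
assign s21 = a0*b2;
assign s22 = a1*b1;
assign s23 = a2*b0;
assign so2 = s21 + s22 + s23;
assign s31 = a0*b3;
assign s32 = a1*b2;
assign s33 = a2*b1;
assign s34 = a3*b0;
assign so3 = s31 + s32 + s33 + s34;
assign s41 = a1*b3;
assign s42 = a2*b2;
assign s43 = a3*b1;
assign so4 = s41 + s42 + s43;
assign s51 = a2*b3;
assign s52 = a3*b2;
assign so5 = s51 + s52;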
//assign x1<={8'b00000000,s51};
//assign x2<={8'b00000000,s52};
//assign st = x1 + x2;
assign s61=a3*b3;
assign so6=s61;
//assign so6 = s61 [7:0];
//assign so7= so6[15:8];
endmodule
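The seven sums produced by bm are the linear convolution of the two four-sample inputs, so the module can be checked against a hand-computed result. The testbench below is a minimal sketch under stated assumptions: the module name bm_tb, the clock period and the test vectors a = {1, 2, 3, 4}, b = {1, -1, 2, 0} are illustrative choices, and the bm module from Section 8.4 is assumed to be compiled with it. For these inputs the expected convolution is {1, 1, 3, 5, 2, 8, 0}.

```verilog
// Hypothetical testbench (not part of the original design).
module bm_tb;
  reg CLK, RST;
  reg signed [3:0] a0, a1, a2, a3, b0, b1, b2, b3;
  wire signed [7:0] s0, s1, s2, s3, s4, s5, s6;

  bm dut (.CLK(CLK), .RST(RST),
          .a0(a0), .a1(a1), .a2(a2), .a3(a3),
          .b0(b0), .b1(b1), .b2(b2), .b3(b3),
          .s0(s0), .s1(s1), .s2(s2), .s3(s3),
          .s4(s4), .s5(s5), .s6(s6));

  always #5 CLK = ~CLK;

  initial begin
    CLK = 0; RST = 0;                 // hold reset (active low) for one rising edge
    a0 = 1; a1 = 2;  a2 = 3; a3 = 4;
    b0 = 1; b1 = -1; b2 = 2; b3 = 0;
    @(negedge CLK); RST = 1;          // release reset away from the rising edge
    @(negedge CLK);                   // one rising edge has registered the sums
    $display("s = %0d %0d %0d %0d %0d %0d %0d",
             s0, s1, s2, s3, s4, s5, s6);
    if (s0 !== 1 || s1 !== 1 || s2 !== 3 || s3 !== 5 ||
        s4 !== 2 || s5 !== 8 || s6 !== 0)
      $display("FAIL");
    else
      $display("PASS");
    $finish;
  end
endmodule
```

The expected values follow from multiplying the polynomials 1 + 2x + 3x^2 + 4x^3 and 1 - x + 2x^2, whose product is 1 + x + 3x^2 + 5x^3 + 2x^4 + 8x^5.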
8.5 Mux 8*1:
module mux81 (CLK,RST,a0, a1, a2, a3, a4, a5, a6, a7, s, o1);
input signed [7:0] a0, a1, a2, a3,a4, a5, a6, a7;
input CLK,RST;
input [2:0] s;
output signed [7:0] o1;
reg signed [7:0] o1;
// Synchronous, active-low reset; non-blocking assignments for a clocked block.
always @(posedge CLK)
begin
  if (RST == 1'b0)
    o1 <= 8'bzzzzzzzz;
  else
    case (s)
      3'b000 : o1 <= a0;
      3'b001 : o1 <= a1;
      3'b010 : o1 <= a2;
      3'b011 : o1 <= a3;
      3'b100 : o1 <= a4;
      3'b101 : o1 <= a5;
      3'b110 : o1 <= a6;
      3'b111 : o1 <= a7;
    endcase
end
endmodule
8.6 Register:
module register( CLK,RST,r,out);
input CLK,RST;
// Declared signed to match the signed r and outfinal nets in the top module.
input signed [7:0] r;
output signed [7:0] out;
reg signed [7:0] out;
always@(posedge CLK )
begin
if(RST==1'b0)
begin
out<=8'bzzzzzzzz;
end
else
begin
out<=r;
end
end
endmodule
RESULTS: