University of Calgary
PRISM: University of Calgary's Digital Repository
Graduate Studies Restricted Theses
1997
A comparison of bit-serial multipliers using VHDL
based logic synthesizers
Pokhrel, Khem Chandra
Pokhrel, K. C. (1997). A comparison of bit-serial multipliers using VHDL based logic synthesizers
(Unpublished master's thesis). University of Calgary, Calgary, AB. doi:10.11575/PRISM/21986
http://hdl.handle.net/1880/26790
master thesis
University of Calgary graduate students retain copyright ownership and moral rights for their
thesis. You may use this material in any way that is permitted by the Copyright Act or through
licensing that has been assigned to the document. For uses that are not allowable under
copyright legislation or licensing, you are required to seek permission.
Downloaded from PRISM: https://prism.ucalgary.ca
THE UNIVERSITY OF CALGARY
A Comparison of Bit-Serial Multipliers Using
VHDL Based Logic Synthesizers
A THESIS
SUBMITTED TO THE FACULTY OF GRADUATE STUDIES
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF MASTER OF SCIENCE
DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING
CALGARY, ALBERTA
JANUARY, 1997
National Library of Canada / Bibliothèque nationale du Canada
Acquisitions and Bibliographic Services / Acquisitions et services bibliographiques
395 Wellington Street, Ottawa ON K1A 0N4, Canada
The author has granted a non-exclusive licence allowing the National Library of Canada to
reproduce, loan, distribute or sell copies of his/her thesis by any means and in any form or
format, making this thesis available to interested persons.
The author retains ownership of the copyright in his/her thesis. Neither the thesis nor
substantial extracts from it may be printed or otherwise reproduced without the author's
permission.
ABSTRACT
Performance comparison of five multiplier algorithms is done using a VHDL synthesis
based design flow. Ordinary pipelined bit-serial, digit-serial, and parallel data
flow, and radix-n recoding are considered in the computational algorithms for the
multipliers. Design size, throughput, and area-time product are considered as the main
performance measures. Synopsys and Powerview tools are used for design synthesis.
Compatible VHDL is used for design description in Powerview and Synopsys.
Synthesized designs are implemented on a Xilinx 4010 field programmable gate array.
Implementation results are used for the performance comparison of the multiplier
algorithms. The VHDL based logic synthesis tools are compared based on the design
performances.
The digit-serial multiplier is found to perform the best among the considered algorithms.
Respectable results of the serial multipliers in general stress their usefulness and the
need to improve their computational throughput. Results also show a considerable
difference in performance of these VHDL synthesis tools.
ACKNOWLEDGEMENTS
I would like to express my sincere gratitude to Dr. G. S. Hope, whose constant support,
guidance, encouragement and constructive criticism has made this work possible.
I am also grateful to Dr. L. E. Turner for his invaluable guidance, encouragement
and suggestions for this project.
I appreciate the financial support provided by Nepal Engineering Education Project
which allowed me to go through the M.Sc. program.
I thank Warren Flaman, Technical Support, ECE Dept., for his help with the Xilinx
implementation board.
My sincere thanks to Dr. Soorya Kuloor for all his help in the Energy Management
Lab. I would like to thank Joseph Provine, Gokaraju Ramakrishna, Sridhar Krishnan,
and AN^ Das for their suggestions and help with formatting this thesis in LaTeX.
TABLE OF CONTENTS
APPROVAL PAGE
ACKNOWLEDGEMENTS
DEDICATION
TABLE OF CONTENTS
LIST OF TABLES
LIST OF ABBREVIATION
1. INTRODUCTION
  1.1 Background
  1.2 Present Trends : High Level Logic Synthesis
  1.3 Motivation
  1.4 Thesis Objectives
    1.4.1 Approach
  1.5 Thesis Organization
2. DIGITAL SYSTEM DESIGN APPROACHES
  2.1 Introduction
  2.2 Architectural Alternatives
  2.3 Serial Design Techniques
  2.4 Performance Measures
  2.5 Serial vs Parallel
  2.6 Radix-N Recoding
  2.7 Summary
3. MULTIPLIER DESIGN
  3.1 Introduction
  3.2 Multiplication Algorithm
  3.3 Functional Units
    3.3.1 Latch
      3.3.1.1 Latch2bit
    3.3.2 Serial Adder
  3.4 Ordinary Bit-Serial Multiplier
  3.5 Radix-4 Bit-Serial Multiplier
  3.6 Radix-8 Bit-Serial Multiplier
  3.7 Digit-Serial Multiplier
  3.8 Parallel-Array Multiplier
  3.9 Summary
4. DESIGN SYNTHESIS
  4.1 Introduction
  4.2 VHDL for Synthesis
    4.2.1 VHDL Constructs Support
  4.3 Viewlogic Tools
    4.3.1 Synthesis Criteria
  4.4 Synopsys Tools
    4.4.1 Constraints and Design Optimization
      4.4.1.1 Constraints
      4.4.1.2 Optimization
  4.5 Summary
5. DESIGN IMPLEMENTATION
  5.1 Introduction
  5.2 CMOS Logic
  5.3 ASIC Technologies and Programmable Devices
    5.3.1 FPGAs
    5.3.2 Standard cell and Full custom design
  5.4 XACT Tools
  5.5 Summary
6. RESULTS
  6.1 Objectives
  6.2 Implementation Results
  6.3 Discussion & Conclusions
7. CONCLUSION AND FUTURE WORK
  7.1 Conclusions
  7.2 Further Work
REFERENCES
APPENDIX
  A XC-4000 CLB
  B VHDL Description
  C Design Compiler Script
LIST OF TABLES
2.1 Non-Redundant Radix-N Recoding Scheme
2.2 Redundant Radix-4 Recoding
6.1 Results from Powerview Synthesizer and XACT tools
6.2 Results from Synopsys Synthesizer and XACT tools
LIST OF FIGURES
1.1 Synthesis Based Design Flow
2.1 Generic Architectures
2.2 Serial Data Flow
3.1 Typical Bit-Serial Multiplier Module
3.2 Latch
3.3 Latch2bit
3.4 Bit-Serial Carry Save Adder
3.5 Ordinary Bit-Serial Multiplier
3.6 Radix-4 Bit-Serial Multiplier
3.7 Radix-8 Bit-Serial Multiplier
3.8 Digit-Serial Multiplier
3.9 Parallel Array Multiplier
4.1 A VHDL Hardware Model
4.2 Design Flow with Viewlogic Tools
4.3 Design Flow with Synopsys Tools
5.1 The Acronym Tra
5.2 Implementation Flow in XACT tools
6.1 Design Size in Powerview tool
6.2 Throughput Rate in Powerview tool
6.3 Area Time Product in Powerview tool
6.4 Design Size in Synopsys tool
6.5 Throughput Rate in Synopsys tool
6.6 Area Time Product in Synopsys tool
6.7 Comparative Design Size
6.8 Comparative Throughput Rate
6.9 Comparative Area Time Product
A.1 XC4000-families CLB
A.2 Schematic Diagram
A.3 Simulation Results
LIST OF ABBREVIATION
ASIC      Application Specific Integrated Circuit
AT        Area Time
CAD       Computer Aided Design
CAE       Computer Aided Engineering
CLB       Configurable Logic Block
CMOS      Complementary Metal Oxide Semiconductor
CWL       Coefficient Word Length
DC        Design Compiler
DFIRST    Digit-serial implementation of FIRST
DRC       Design Rule Check
DTL       Diode Transistor Logic
DWL       Data Word Length
ECL       Emitter Coupled Logic
EDA       Electronic Design Automation
EEPROM    Electrically Erasable Programmable ROM
EPROM     Electrically Programmable ROM
FIRST     Fast Implementation of Real-time Signal Transforms
FPGA      Field Programmable Gate Array
FTGS      Full Timing Gate-level Simulation
GUI       Graphical User Interface
GVAN
HDL       Hardware Description Language
IC        Integrated Circuit
I/O       Input Output
LOGSIM    Logic Simulator
LSB       Least Significant Bit
LSD       Least Significant Digit
LSI       Large Scale Integration
MS        Most Significant
MSB       Most Significant Bit
MSD       Most Significant Digit
MSI       Medium Scale Integration
MUX       Multiplexor
PAL       Programmable Array Logic
PLD       Programmable Logic Device
PPR       Partition Placement & Route
PPSI      Partial Product Sum Input
PPSO      Partial Product Sum Output
PROM      Programmable ROM
RAM       Random Access Memory
RC        Resistance Capacitance
ROM       Read Only Memory
RTL       Register Transfer Level
SCL       Simulation Control Language
SRAM      Static RAM
SSI       Small Scale Integration
TTL       Transistor Transistor Logic
VHDL      VHSIC HDL
VHDLDBX   VHDL Debugger Simulator
VHSIC     Very High Speed IC
VSS       VHDL System Simulator
WPBS      Word Parallel Bit Serial
XACT      Xilinx Automated CAE Tools
XNF       Xilinx Net-list Format
CHAPTER 1
INTRODUCTION
1.1 Background
Computer aided high level digital circuit design process consists of successive
transformation from higher to lower levels of circuit description. In general a Computer
Aided Design (CAD) process contains different description domains bound together
by some relationship as defined by the software tools in question. The three major
description domains are :
• Behavioral domain,
• Structural domain and
• Physical domain.
Each domain can be described at several levels of abstraction :
• System level
• Architectural or Functional level
• Logic level and
• Physical Layout or Circuit level
The design may be converted from concept to architecture, to logic and memory,
to the circuit and hence to the physical layout. The efficiency of design conversion
depends on the consistency of a software tool in all three description domains and at
all relevant levels of abstraction. This is measured in terms of performance parameters
such as :
• Speed of operation,
• Chip area,
• Ease of test generation and testability.
1.2 Present Trends : High Level Logic Synthesis
Advances in Integrated Circuit technology have given us the ability to design and
manufacture large Integrated Circuits. But these advancements have increased the
number of parameters that a designer must deal with in order to realize a high-quality,
commercially viable design. Smaller geometries of semiconductor technologies, which
enabled larger and faster designs, have also significantly impacted the variables that
affect timing in designs [17], [33]. This has increased the number of design alternatives
that need to be considered.
The widely adopted schematic capture based Integrated Circuit (IC) design approach
of the eighties has limitations that increase with complexity of design. However,
with major advances in Electronic Design Automation, a relatively automated means
of high level design with Hardware Description Languages and Logic Synthesis has
emerged. The transition to the Hardware Description Language (HDL) based design
approach has enabled a substantial increase in productivity. Sometimes this is quantified
as the number of gates per engineer per day. HDL synthesis based methodology focuses
more on logic verification than on the detailed gate level implementation used in schematic
capture based design. This helps the designer verify his concepts in a short period of
time without going into net-list generation and layout stages.
At the heart of this transition to HDL-based design lies "Logic Synthesis". Synthesis
is the iterative process aimed at transforming functionality initially described in
HDL, to achieve an optimized technology-specific net-list. A typical synthesis process
includes both reading the source code and optimizing this code. Optimization is a
step in the synthesis process which ensures the best possible combination of library
cells to meet the functional, area and speed requirements of the target design.
Hardware Description Languages and Logic Synthesis have a profound impact
on the design process and the final outcome of the design. The emerging flavor in
HDL synthesis based design is Behavioral Synthesis Methodology. In this technique
the design specification and functionality are described in algorithmic constructs of
Very-high-speed-integrated-circuit Hardware Description Language (VHDL) or Verilog.
Other high level languages like C are also being used in design description
in hardware-software co-design techniques. These C codes are then converted into
behavioral VHDL or Verilog and synthesis is carried out. The motivations behind
Behavioral Synthesis Methodology are :
• Smaller Code : The Behavioral VHDL or Verilog code is about 10% of its Register
Transfer Level (RTL) structural code.
• Faster Simulation : The Behavioral Simulation is very fast compared to RTL
simulation.
• Shorter Design Cycle : Overall design time in this methodology is shorter.
A typical high level design flow based on synthesis involves the steps depicted in
Fig 1.1. It begins with a sketch of the functional specification of the design. This
includes information about speed, area, system Input Output (I/O), word-lengths,
control flow and data transfer, and the overall function of the design. This functionality
is then described in a Hardware Description Language as a behavioral, structural
or mixed code. This code is analyzed and simulated for its functional verification as
specified earlier. If the code meets the functional requirement it is then synthesized
or compiled into target technology library components. This step is key to the HDL
synthesis based design flow. Several factors involved in synthesis are described in
detail in chapter 4. A gate level net-list is produced as a result of synthesis. This is
simulated to verify the required functionality. After gate level simulation, the design
is transformed into the physical domain using floor-planning or place & route tools.
To achieve the optimal performance the actual delay and other circuit information
is supplied back from the place & route tool to the synthesizer and the design is
re-synthesized. As shown in Fig 1.1 the steps involved are usually iterative and are
carried out until the design meets the functional requirements and initial specifications.
This process has an impact not only on the final net-list generated, but also on
other issues including HDL coding styles (constructs), design partitioning,
net-integration and CAD methodology.
Figure 1.1. Synthesis Based Design Flow
There are various high level digital logic synthesis tools, which incorporate various
HDLs like VHDL, Verilog, Abel, FIRST, DFIRST, LOGSIM etc. The most commonly
used HDLs today are VHDL and Verilog. Different tools have different kinds of user
interface, design optimization and constraints setting techniques, and other design
time flexibility like technology transfer (from one Application Specific
Integrated Circuit (ASIC) technology to another). Among these, Synopsys and Viewlogic
synthesis tools are the most popular, while VHDL is the present day HDL standard.
Verilog is an emerging HDL.
With increasing complexities and multiple features of these software tools, the
general points of merit of the tool and HDL in use depend on its ability to :
1. facilitate modular design (to handle complex designs),
2. provide self documenting description,
3. facilitate parameters and constraints setting as required by the design,
4. incorporate better ASIC technologies for test implementation,
5. optimize the design,
6. provide design and architecture independent flexibility,
7. retarget a given design to new semiconductor technology, and
8. facilitate easy reuse of technology independent designs described in HDL.
1.3 Motivation
The HDL based high level digital design approach is an important research area in
Integrated Circuit technology. Many design tools involving HDL based logic synthesis
have been developed and used by IC designers. A study of these tools has revealed
the following points :
• In the quest of upgrading the Electronic Design Automation (EDA) tool capabilities
to realize the full potential of HDL based design, the whole process of
synthesis has become increasingly complex.
• Area and Speed of the resulting design has always played a key role in determining
the productivity of the tool. This is important when selecting a design tool
for smaller and faster circuits.
• Tests for the synthesizing capability of different systems of tools for the same
HDL based design have not been carried out sufficiently to compare their
performances.
• Some of the tools are architecture specific and some offer very little of the parameter
constraining features needed for optimized design.
Effective design methodology utilizing an optimal combination of design tools is
A lot of research work has been done [6], [7], [8], [10], [12] to investigate ways to increase
the computational throughput of bit-serial design. Various algorithms have emerged
as a result of these efforts. The importance of bit-serial multipliers for superior area-time
product compared to parallel array design [6], [12] is well known. With the advent of
better technologies and newer design methodologies it is worthwhile to reconsider
serial-multiplier designs.
These observations motivated this research project.
1.4 Thesis Objectives
The main objective of the research is to obtain a graphical comparison of the
performances of various multiplier algorithms. Design size and speed are considered
as performance measures because :
• They are the major performance parameters of a digital circuit which can give
us a standard comparison of the design algorithms and CAD tools in use. The
area of diffusion (of the device) and the speed of operation are also related with
the physical circuit parameters such as :
- Resistance Capacitance (RC) delay estimation,
- Power consumption,
- Packaging issues and
• In a good design approach, determination of correct operation and adequate
performance involves simulating the circuit at all appropriate design corners.
These corners are worst-power and worst-speed [l?], [BI, [XI, [28].
Thus it is evident that an area-time or space-speed comparison should be done. Also, the
general goal of this thesis is to compare how different VHDL based digital hardware
synthesizers perform in producing digital logic designs. This research focuses
on the two most popular HDL based (VHDL) logic synthesis tools, Viewlogic and Synopsys.
These tools are chosen because of their popularity, flexibility, and parameter
constraining facilities. The performance comparison is focused on the ability of
these tools to generate different digital circuits designed from different architectural
approaches. The different architectures considered here are parallel, bit-serial, and
digit-serial.
1.4.1 Approach
1. Bit-serial multipliers with different algorithms, and a parallel array multiplier,
are designed using VHDL description. These designs are simulated for their
functionality and used as the test benchmarks for the performance comparison.
2. These designs are implemented on a Xilinx Field Programmable Gate Array (FPGA).
Results from the hardware implementation are tabulated and compared.
3. Various multiplier performances are compared among themselves and across the
major design tools Viewlogic and Synopsys.
1.5 Thesis Organization
Chapter 2 gives a general overview of the parallel and serial design techniques. A
comparison of these techniques is presented. Pipelining of serial data paths, internal
radix-n recoding and performance measures are discussed.
Design details of the various multipliers used in this project are given in chapter 3. A
modular hierarchical development approach from basic functional units is stressed.
Chapter 4 presents the high level multiplier design synthesis using VHDL description.
Use of VHDL for synthesis is discussed. The design methodology and steps
involved with Viewlogic and Synopsys tools are given. Synthesis criteria and design
optimization capabilities of these tools are discussed.
Chapter 5 discusses different logic types. A brief overview of several implementation
styles, programmable logic and ASIC technologies is presented. The design
implementation flow for Xilinx Automated Computer-aided-engineering Tools (XACT) is
given.
Chapter 6 tabulates the results obtained from design implementation. Graphical
representations for various design performances are presented. Comparative analysis
of the results is given and some conclusions are drawn.
Overall conclusions are drawn and thesis contributions are presented in chapter 7.
Some suggestions for further work are also given.
CHAPTER 2
DIGITAL SYSTEM DESIGN APPROACHES
2.1 Introduction
The primary concern of a logic designer is to achieve specific real-time sample rates
using adequate but modest hardware. Algorithm and architecture choice limits system
speed and actual area of silicon implementation, because the arithmetic elements have
a fixed maximum operating speed according to the target technology used.
This chapter describes different approaches of digital systems design. Modifications
of the general serial design method to investigate ways to increase computational
throughput are given the main focus. Binary arithmetic, bit-sampling and modular
development are also discussed. Internal bit regrouping and its interpretation for
computational steps is discussed.
The characteristic features of parallel and serial approaches are summarized here.
A comparison of the parallel and serial design methods is presented. The performance
criterion of the resultant hardware is given in detail.
2.2 Architectural Alternatives
In general, the classification of implementation style of a digital signal processing
algorithm [1],[6] is governed by the number of bits processed simultaneously. The
various architectural alternatives are:
Fig 2.1 shows a generic bit-parallel and bit-serial architecture. In a bit-parallel design
all the input bits of the sample word are read in a single clock cycle, and the output bits
are produced together. In a bit-serial design, input bits arrive on a single wire over a
number of clock cycles and output bits are produced serially. Faster processing speed
in parallel designs is achieved at the cost of increased complexity of routing, propagation
delays within operators and at the expense of large chip area.
Digit-serial systems process more than one bit at a time. This is done by dividing
the N-bit word into X different digits of Y bits wide each, where N is X × Y. The main
objective behind this approach is to increase the operator speed by a factor greater
than the increase in its size. Similarly a design can be of mixed architecture type, depending
on the mode of data bits transmission to and from external units. A Word Parallel Bit Serial
architecture has a parallel interface to its external units, whereas it processes the data
word one bit at a time. Similarly, Word Serial Bit Parallel architectures transmit the
data word serially but the bits constituting the words are processed in parallel.
Figure 2.1. Generic Architectures
A digital design can be synchronous or asynchronous. In a synchronous system
the sequence and time of operation is controlled by means of a system wide clock. The
clock is key to the functional behavior of the synchronous system. The maximum
clock rate depends on the delays in the signal paths. In asynchronous designs, the
functional elements are driven by a sequence of operation and there is no global clock.
Only synchronous designs are considered in this thesis.
Some of the key architectural terminology [1],[8],[10] of digital systems are:
• Latency: This is the time required for the output bits to appear after the
corresponding input bits are read. A critical path in the design is the path of operators
with the largest total latency from a state input to state output.
• Pipelining: Pipelining refers to the alternate interleaving of combinational
logic and register elements in the structure. The result is passed from one stage
to the other. It helps in realizing higher bit rates by making the depth of the
combinational logic delay between pipeline registers small. Pipelining may seem
to increase the overall latency of the design, but for continuous time operation the
system can accept new input every clock cycle. This increases throughput. In a
pipelined structure a large number of algorithm iterations are processed
simultaneously, all at various stages of completion.
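The latency and throughput trade-off described above can be illustrated with a small numerical sketch. It is only a simplified model written for this discussion, not part of the thesis designs: the function name `pipeline_timing` and the stage delays are hypothetical, and register setup and clock-to-output overheads are ignored.

```python
def pipeline_timing(stage_delays_ns):
    """Compare one long combinational path with the same logic split into stages.

    stage_delays_ns: combinational delay of each stage in ns (hypothetical values).
    Register overheads are ignored in this simplified model.
    """
    if not stage_delays_ns:
        raise ValueError("at least one stage delay is required")
    # Without pipeline registers the clock must cover the whole path.
    unpipelined_period = sum(stage_delays_ns)
    # With a register after every stage the slowest stage sets the clock period.
    pipelined_period = max(stage_delays_ns)
    return {
        "unpipelined_latency_ns": unpipelined_period,
        "unpipelined_throughput_per_ns": 1.0 / unpipelined_period,
        "pipelined_latency_ns": len(stage_delays_ns) * pipelined_period,
        "pipelined_throughput_per_ns": 1.0 / pipelined_period,
    }


timing = pipeline_timing([2.0, 3.0, 5.0])
for name, value in timing.items():
    print(name, value)
```

With these assumed delays the pipelined version has a longer latency (three clock periods of the slowest stage) but accepts a new input every clock cycle, so its throughput is higher.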
2.3 Serial Design Techniques
Communication and computational strategies make a bit-serial system different
from its parallel counterpart. The following points describe design features [6],[7],[8]
of bit-serial multipliers.
• Smaller Size: Recurrence of the basic operation in a multiplication algorithm
allows shared hardware resources between the various operations. This leads to
area efficient implementation and high connectivity. In general they are N times
smaller than parallel counterparts. Also, the I/O pin requirement in a serial
design is very small.
• Higher Bit Rate: Pipelining the signal paths at bit or digit level results in
shorter propagation delays through the combinational logic in the operators.
Hence higher bit rate or faster clock implementation can be realized.
• Flexibility: For a given coefficient (multiplier) word length design, multiplier
size remains the same irrespective of the data (multiplicand) word length. This makes it
flexible for different data word length implementation. In the case of a parallel design,
size increases with both the coefficient and data word length.
• Simple Control: Data bits for each word take several clock cycles to propagate
along the given signal path. So, a control signal indicating the Least Significant
Bit or Digit arrival time is required in serial designs. This periodic signal and
different delayed versions of it serve the control and synchronization purpose of
the whole iteration.
• Modularity: Different types of serial architecture implementation can be done
by using basic functional units like adders, multiplexors, shift registers, delays,
and latches. This gives high modularity and easy design alterations.
• Rout-Ability: A serial architecture results in fewer signal nets or interconnects
which leads to better rout-ability of the design.
Fig 2.2 shows the serial communication aspects of the bit-serial and digit-serial (2-bit
digit) architectures along with the control signal. Multiplication is a Least Significant
Bit (LSB) first operation. A control pulse is coincident with the LS operand bit arrival
time. The control cycle or the iteration time is shorter in digit-serial data flow than
in bit-serial.
Figure 2.2. Serial Data Flow
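The serial data flow with its control pulse can be mimicked in software as a rough behavioral sketch. The function name `serial_stream` and the tuple layout are assumptions made for illustration; they do not correspond to signal names in the thesis designs.

```python
def serial_stream(words, n_bits, digit_bits=1):
    """Return one (control, digit) pair per clock cycle, least significant digit first.

    control is 1 only in the cycle where the least significant digit of a word arrives.
    digit_bits=1 models bit-serial flow; digit_bits=2 models a 2-bit digit-serial flow.
    """
    if n_bits % digit_bits != 0:
        raise ValueError("word length must be a multiple of the digit width")
    cycles_per_word = n_bits // digit_bits
    mask = (1 << digit_bits) - 1
    stream = []
    for word in words:
        if not 0 <= word < (1 << n_bits):
            raise ValueError("word does not fit in n_bits")
        for i in range(cycles_per_word):
            control = 1 if i == 0 else 0
            stream.append((control, (word >> (i * digit_bits)) & mask))
    return stream


bit_serial = serial_stream([0b0110], n_bits=4)
digit_serial = serial_stream([0b0110], n_bits=4, digit_bits=2)
print(bit_serial)
print(digit_serial)
```

For the same 4-bit word the bit-serial stream occupies four clock cycles and the 2-bit digit-serial stream occupies two, which is the shorter iteration time noted above.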
2.4 Performance Measures
Various parameters have been considered by designers [1],[6],[12],[U] to measure
the efficiency of processor architectures. These measures consider both the physical
area of implementation and the time required for the computation. For a synchronous
system the time measure is the number of clock cycles required to process the data bits
or the rate at which the clock can be run.
The performance of a pipelined synchronous system can be measured by Latency
and Throughput. Throughput is the rate at which input data samples are read and
output data samples are produced. This is of higher importance for signal processing
circuits and is also known as sample rate. It gives the speed measure of the design.
Throughput is proportional to the clock speed and inversely proportional to the
iteration period (the number of clock cycles per sample) of the circuits.
The maximum clock speed and hence throughput is determined by the depth of the
combinational circuits between pipelining registers.
Another typical measure is the area time product. This measure gives equal weight
to area and time. This measure is used in this thesis to compare the performance of
different architectural implementations of multipliers in both Viewlogic and Synopsys
platforms. The reciprocal of the area-time product is sometimes considered as an
efficiency measure of the silicon implementation of the computational algorithm in
terms of amount of throughput per unit area.
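As a hedged illustration of these measures, the sketch below computes the time per product, the area-time product and its reciprocal. The function names, the choice of CLB count as the area unit, and all numbers are assumptions made for this example; they are not results from the thesis.

```python
def time_per_product_us(cycles_per_product, clock_mhz):
    """Time to produce one product in microseconds."""
    if clock_mhz <= 0:
        raise ValueError("clock frequency must be positive")
    return cycles_per_product / clock_mhz


def area_time_product(area_clbs, cycles_per_product, clock_mhz):
    """Area-time product, here in CLB-microseconds (assumed units)."""
    return area_clbs * time_per_product_us(cycles_per_product, clock_mhz)


def efficiency(area_clbs, cycles_per_product, clock_mhz):
    """Reciprocal of the area-time product: throughput per unit area."""
    return 1.0 / area_time_product(area_clbs, cycles_per_product, clock_mhz)


# Hypothetical designs: a small, fast-clocked serial multiplier and a large,
# slow-clocked parallel one. The numbers are made up for illustration only.
serial_at = area_time_product(area_clbs=20, cycles_per_product=16, clock_mhz=10.0)
parallel_at = area_time_product(area_clbs=100, cycles_per_product=1, clock_mhz=2.0)
print(serial_at, parallel_at)
```

A lower area-time product (equivalently a higher efficiency) indicates a better trade-off under this equal-weight measure.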
2.5 Serial vs Parallel
The principal drawback of a serial multiplier is that a product can only be computed
in time proportional to N. In a parallel multiplier, an N-bit product can be
computed in time independent of word length. Serial multipliers also have a high ratio
of storage to logic. So, despite the fact that the serial approach results in smaller
size, high connectivity, and higher bit rate, they trail in high speed applications to
their parallel counterparts. A lot of research work has been done to investigate the
ways to increase the computational throughput of conventional bit-serial design
techniques [6], [7], [8], [12]. The digit-serial approach, twin-pipe architecture, and various
recoding techniques are some of the outcomes.
It is difficult to quantify the power consumption for different architectures. For
a Complementary Metal Oxide Semiconductor (CMOS) implementation the circuit
draws power only during logic transitions. For a given system, power consumption is
directly proportional to the clock speed. Serial designs run at higher clock speed but
in the circuit the logic transition depends on the bit patterns being processed.
The better area-time product of the serial designs over the parallel design is a major
motivation of this research project.
2.6 Radix-N Recoding
Several recoding schemes have been introduced in an attempt to reduce the size
of the multiplier units [7],[8]. The trend is to regroup the incoming coefficient bits in
pairs and interpret them in a way so that the unfavorable ratio of storage to logic is
decreased. Tables 2.1 and 2.2 show various recoding schemes.
In non-redundant radix-N recoding, as shown in Table 2.1, only X different bits are
required when N different combinations of the radix-N representation are used, whereas
X+1 different bits are used in the case of redundant radix-N recoding. Table 2.2 shows
a redundant radix-4 recoding scheme. All the bit groups are treated alike in
redundant recoding. The sign bit group is treated differently from the unsigned bit
groups in the case of the non-redundant recoding scheme.
As a simple case, in a two-bit digit-serial architecture the coefficient at each stage can
be recoded into radix-4. Similarly, radix-8 recoding helps in size reduction for a
three-bit digit-serial design.
For the computational steps in the algorithm, multiplication by a recoded value
of 4 is represented as two left shifts of the data bits. Similarly, the addition of two left
shifts, one left shift and the unshifted data bit represents multiplication by 7 (4 + 2 + 1).
Table 2.1. Non-Redundant Radix-N Recoding
Ordinary (Radix-2), 1 bit (k_i)
  Representation   Unsigned   Signed
  0                 0          0
  1                +1         -1
Radix-4, 2 bits (k_{2i+1} & k_{2i})
  Representation   Unsigned   Signed
  00                0          0
  01               +1         +1
  10               +2         -2
  11               +3         -1
Radix-8, 3 bits (k_{3i+2}, k_{3i+1} & k_{3i})
  Representation   Unsigned   Signed
  000               0          0
  001              +1         +1
  010              +2         +2
  011              +3         +3
  100              +4         -4
  101              +5         -3
  110              +6         -2
  111              +7         -1
Table 2.2. Redundant Radix-4 Recoding
Interpretation of a binary number into higher radices reduces the number of partial
products required to form a multiplication. This is shown in the following paragraphs.
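A two's complement binary multiplication of an N-bit coefficient 'A' and an M-bit data
word 'B' can be expressed as:
$$ P = A \cdot B $$
and
$$ A = -A_{N-1} 2^{N-1} + \sum_{i=0}^{N-2} A_i 2^i \qquad (2.1) $$
Multiplication based upon a non-redundant radix-4 interpretation of the coefficient
word is expressed as:
$$ A \cdot B = \left( -2A_{N-1} + A_{N-2} \right) B \, 4^{(N-2)/2} + \sum_{i=0}^{(N-4)/2} \left( 2A_{2i+1} + A_{2i} \right) B \, 4^i \qquad (2.2) $$
Only N/2 partial products are to be computed in Radix-4 representation. The
partial products are:
$$ \left( 2A_{2i+1} + A_{2i} \right) B \, 4^i \ \text{for } i \in [0, (N-4)/2] \quad \text{and} \quad \left( -2A_{N-1} + A_{N-2} \right) B \, 4^{(N-2)/2} \qquad (2.3) $$
Equation 2.2 can be shown to be equivalent to equation 2.1 by using $2^{2i} = 4^i$ and
rearranging terms.
Multiplication based upon a non-redundant radix-8 interpretation of the coefficient
word is expressed as:
$$ A \cdot B = \left( -4A_{N-1} + 2A_{N-2} + A_{N-3} \right) B \, 8^{(N-3)/3} + \sum_{i=0}^{(N-6)/3} \left( 4A_{3i+2} + 2A_{3i+1} + A_{3i} \right) B \, 8^i \qquad (2.4) $$
Only N/3 partial products are to be computed in this representation. These partial
products are:
$$ \left( 4A_{3i+2} + 2A_{3i+1} + A_{3i} \right) B \, 8^i \ \text{for } i \in [0, (N-6)/3] \quad \text{and} \quad \left( -4A_{N-1} + 2A_{N-2} + A_{N-3} \right) B \, 8^{(N-3)/3} \qquad (2.5) $$
As above, Radix-8 notation can be shown to be equivalent to the ordinary two's
complement notation of equation 2.1 by using $2^{3i} = 8^i$, $2^{2i} = 4^i$ and rearranging terms.
With increasing Radix-n representation the recoding logic requirement becomes more
complex. This reduces the advantages of reduced area, reduced latency and simplicity.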
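The equivalence between the ordinary two's complement value and its non-redundant radix-4 and radix-8 interpretations can be checked numerically. The following Python sketch is illustrative only: the helper names `to_bits`, `recode` and `multiply_recoded` are hypothetical, and the sketch assumes the coefficient word length is a multiple of the group size.

```python
def to_bits(value, n):
    """Two's complement bits of value, LSB first, n bits wide."""
    return [(value >> i) & 1 for i in range(n)]


def recode(value, n, group):
    """Non-redundant radix-2**group digits of an n-bit two's complement value.

    Every bit group is read as unsigned except the most significant one,
    whose top bit carries a negative weight.
    """
    assert n % group == 0
    bits = to_bits(value, n)
    digits = []
    for g in range(n // group):
        chunk = bits[g * group:(g + 1) * group]
        digit = sum(b << k for k, b in enumerate(chunk))
        if g == n // group - 1 and chunk[-1]:
            digit -= 1 << group  # sign group: top bit has negative weight
        digits.append(digit)
    return digits


def multiply_recoded(a, b, n, group):
    """Sum of the n/group partial products digit * b * radix**i."""
    radix = 1 << group
    return sum(d * b * radix ** i for i, d in enumerate(recode(a, n, group)))
```

For a 12-bit coefficient, `recode(a, 12, 2)` yields six digits and `recode(a, 12, 3)` yields four, matching the N/2 and N/3 partial product counts stated above.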
2.7 Summary
An introduction to serial and parallel architecture design approaches is given in this
chapter. Features of bit-serial multiplier design are presented. Performance measures
of synchronous systems are discussed. Implementation and advantages of the radix-n
recoding scheme for the coefficient word are given. A comparison of serial and parallel
designs is discussed.
CHAPTER 3
MULTIPLIER DESIGN
3.1 Introduction
This chapter describes multiplier design approaches. Details include the number
decomposition and algorithm development. A hierarchical design approach is used.
The modular approach gives the flexibility to incorporate different word
length implementations with ease. Each design is an assembly of general functional
units. Input bit flow and control signal management in the pipelined data path is
also discussed in detail.
Hardware resource sharing and time scheduling of the events within the
architecture play a critical role in the overall efficiency of the serial design. Various design
techniques lead to different area-time measures. The advantage of exploring various
design techniques within the serial design approach is that different designs may
be suitable for different real time (implementation) performance requirements. Five
designs are considered in this thesis:
• Ordinary Bit-Serial Multiplier
• Radix-4 Bit-Serial Multiplier
• Radix-8 Bit-Serial Multiplier
• Two-Bit-Digit Digit-Serial Multiplier
• Parallel Array Multiplier
All the designs have three main modules. A 12-bit coefficient multiplier design is
used to compare all the design techniques (cases). The number of data bits is
independent of the multiplier size in all the designs, as explained in section 2.3. However,
longer data words increase the latency of the design. The coefficient word length is
increased by increasing the number of middle modules.
Two's complement binary number representation is considered for all the multiplier
designs throughout this thesis. Only fixed point arithmetic is used in all the designs.
3.2 Multiplication Algorithm
Multiplication is an LSB first operation. The multiplication of Coefficient and Data
in serial fashion [1],[6],[7],[8] is presented in the following steps.
• Latch the serial coefficient bits input into the latches in succession.
• Regroup the latched bits according to the Radix-n recoding scheme in each stage.
• Form the partial product from the recoded values of the coefficient and the serial
data bits as they arrive.
• Add the partial products in each stage. Shift this sum by the required number of
clock cycles. Pass it to the next stage and concurrently sign extend the previous
partial product for the required number of answer bits.
• Repeat the above steps in each stage.
Fig 3.1 shows a typical bit-serial multiplier module. A linear array architecture of
such modules becomes the whole design.
[Figure 3.1: bit-serial multiplier module, including the Truncation and Sign Bit Extension block]
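The steps of section 3.2 can be sketched as a word-level Python model of the linear array for the ordinary (radix-2) case. This is a behavioural illustration under stated assumptions, not the VHDL description used in the thesis: the function name `serial_multiply` is hypothetical, Python integers stand in for the sign-extended serial words, and an arithmetic right shift stands in for the one-cycle time shift between stages.

```python
def serial_multiply(coeff, data, cwl):
    """Stage-by-stage model of LSB-first multiplication (radix-2 stages)."""
    # Step 1: latch the coefficient bits, LSB in stage 0, sign bit in the last stage.
    coeff_bits = [(coeff >> i) & 1 for i in range(cwl)]
    pps = 0          # partial product sum travelling down the array
    low_bits = []    # product bits that have already left the array
    for stage, bit in enumerate(coeff_bits):
        last = stage == cwl - 1
        weight = -bit if last else bit     # sign bit is worth -1 or 0
        pps += weight * data               # add this stage's partial product
        if not last:
            low_bits.append(pps & 1)       # the LSB is final, so it is emitted
            pps >>= 1                      # arithmetic shift keeps the sign extension
    product = pps << (cwl - 1)
    for i, b in enumerate(low_bits):
        product += b << i
    return product
```

Because only the low-order bit leaves each stage, the partial product sum that is passed on stays about as wide as the data word, which is why the data word length does not affect the size of the array.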
3.3 Functional Units
To achieve all the advantages of a modular design, the multipliers are designed as
a connection of general functional units. This provides flexibility in extending
the coefficient word length and also provides easy debugging and maintenance of the
VHDL code. The general functional units can be used in different multiplier designs as
a library unit. This section describes these functional units as pictorial representations
of their VHDL descriptions.
Latches are used to store the coefficient bits from the bit-serial or digit-serial paths
for the entire iteration period. A latch is composed of a two input multiplexor and a bit
delay (D flip-flop). Fig 3.2 shows a latch.
The select signal in the latch circuit is derived from the master control signal and
is one bit (or digit) time wide. This signal is refreshed for each iteration so that the
latched bit on the Dout pin is renewed. It is synchronized with the LSB or Least
Significant Digit (LSD) arrival time.
The delay unit used in the latch is also used for pipelining and matching the arrival
time of various signals in the design. A one bit delay unit is composed of a D type
flip-flop without clock enable or reset inputs. These modules are also used in the
truncation and sign bit extension block shown in Fig 3.1.
The latch2bit module is used in the truncation and sign bit extension block of the
two-bit digit digit-serial multipliers. Fig 3.3 shows a block representation of this unit.
Figure 3.2. Latch
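The latch behaviour described above (a two input multiplexor feeding a one-bit delay) can be sketched in Python as follows. The class name `Latch` and the active-high polarity of the select input are assumptions made for this illustration.

```python
class Latch:
    """Two-input multiplexor feeding a one-bit delay (D flip-flop)."""

    def __init__(self):
        self.q = 0  # value held on the Dout pin

    def clock(self, din, sel):
        """One clock edge: load din when sel is high, otherwise recirculate q."""
        self.q = din if sel else self.q
        return self.q


# The select pulse is one bit time wide and coincides with the bit to be captured.
latch = Latch()
held = [latch.clock(d, s) for d, s in zip([1, 0, 0, 1], [1, 0, 0, 0])]
```

After the first cycle the serial input keeps changing, but the latched bit stays on the output until the next select pulse refreshes it.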
3.3.2 Serial Adder
Serial adders are used in the design for the formation of the partial product sum.
Addition is an LSB first operation. A bit-serial adder is shown in Fig 3.4. It is made
from a two input multiplexor, a full adder and a delay unit.
During the first clock cycle the LSBs of both inputs are present on A and B.
During this period the control (sel) signal of the multiplexor is high, which sets the
carry input. The carry input is set to '0' at the beginning of summation and the LSB time
signal inhibits the carry resulting from the previous addition. The sum is produced
and the carry is delayed (or stored) for use in the next clock cycle with subsequent bits
of inputs A and B. This addition proceeds until all bits of the inputs A and B are
processed. The hardware size of this adder remains the same irrespective of the word
lengths of A and B.
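A minimal Python sketch of this adder is given below. The class name `BitSerialAdder` and the method signature are hypothetical; the sketch assumes the select signal is high only during the LSB cycle of each word.

```python
class BitSerialAdder:
    """One full adder plus a one-bit delay that stores the carry."""

    def __init__(self):
        self.carry = 0

    def clock(self, a, b, sel):
        """Process one bit pair; sel high marks the LSB and clears the carry input."""
        cin = 0 if sel else self.carry   # multiplexor on the carry input
        total = a + b + cin              # full adder
        self.carry = total >> 1          # delayed for the next clock cycle
        return total & 1


def add_words(adder, a_bits, b_bits):
    """Feed two equal-length LSB-first bit streams through the adder."""
    return [adder.clock(a, b, i == 0) for i, (a, b) in enumerate(zip(a_bits, b_bits))]


adder = BitSerialAdder()
first = add_words(adder, [1, 0, 1, 0], [0, 1, 1, 0])    # 5 + 6, LSB first
second = add_words(adder, [1, 1, 1, 1], [1, 0, 0, 0])   # 15 + 1 overflows 4 bits
third = add_words(adder, [1, 0, 0, 0], [1, 0, 0, 0])    # 1 + 1, stale carry inhibited
```

The third addition shows the role of the LSB time signal: the carry left over from the second word is not allowed to leak into the next one.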
Digit-serial adders are the same as the bit-serial adder but have as many full adders
in the loop as the number of bits in the digit. All the bits in a digit are processed in
parallel. The number of clock cycles required to process a word decreases with
increasing digit width. A wider digit, however, increases the propagation delay through
the adder chain, which necessitates a decreased clock rate.
A serial adder can be converted into an adder/subtracter by complementing one of
the input bits and setting the carry input high. The parallel array multiplier uses full
adders and ripple carry adders (of different bit widths) as functional modules. They
are defined as functions in a VHDL package and called to make the final design.
Figure 3.4. Bit-Serial Carry Save Adder
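The digit-serial adder and the conversion to an adder/subtracter can be sketched together in Python. The function name `digit_serial_addsub` and its parameters are hypothetical; the sketch assumes two's complement words whose width is a multiple of the digit width, and it returns the cycle count alongside the result to show how the digit width shortens the iteration.

```python
def digit_serial_addsub(a, b, word_bits, digit_bits, subtract=False):
    """Add or subtract two word_bits-wide two's complement values,
    processing digit_bits bits per clock cycle, least significant digit first."""
    assert word_bits % digit_bits == 0
    carry = 1 if subtract else 0           # carry input set high for subtraction
    result = 0
    cycles = word_bits // digit_bits
    for cycle in range(cycles):
        for k in range(digit_bits):        # full adders rippling within one digit
            pos = cycle * digit_bits + k
            a_bit = (a >> pos) & 1
            b_bit = (b >> pos) & 1
            if subtract:
                b_bit ^= 1                 # complement the B input bit
            total = a_bit + b_bit + carry
            result |= (total & 1) << pos
            carry = total >> 1             # carried into the next bit or digit
    if result >> (word_bits - 1):          # interpret as two's complement
        result -= 1 << word_bits
    return result, cycles
```

With `digit_bits=1` the function behaves like the bit-serial adder; with `digit_bits=2` an 8-bit word is processed in four cycles instead of eight, at the cost of a two-adder ripple path per cycle.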
3.4 Ordinary Bit-Serial Multiplier
This section describes the design and functioning of ordinary bit-serial multipliers
[8]. They are called ordinary because there is no internal regrouping (and
recoding) of the operand bits.
Fig 3.5 shows the three modules (first, middle and last) of the ordinary
bit-serial multiplier. The operand bits are serially fed to the array of these modules.
The total number of modules is equal to the coefficient word length. Word length is
increased by adding middle modules in the chain.
The coefficient bits are latched into each module starting with the LSB in the first
module to the sign bit (MSB) in the last module. Each coefficient bit is interpreted
either as '1' or '0' in all the modules but as '-1' or '0' in the last module. So, the
partial product of the coefficient bit (multiply by '1', '0' or '-1') and the incoming serial
data bits does not require an adder in any of the modules. The adder used in each module
is for the summation of the time shifted partial product from the lower bit order module
and the partial product of this stage. The very first module does not need a partial
product sum input (PPSI), so the full adder is absent. The last module has a serial
adder/subtracter to take care of a two's complement negative number (coefficient).
Control
The control aspect is simpler in a serial multiplier than in its parallel counterpart.
A single signal indicating the arrival of the LSB and various delayed versions of it serve
the control purpose. The control timing synchronizes the operand bits and partial
product input arrivals at each module. Pipelining and/or latency change the arrival time.
The latency of the Ordinary Bit-Serial Multiplier is generalized as [(2*CWL)-1] cycles,
where CWL means Coefficient Word Length.
3.5 Radix-4 Bit-Serial Multiplier
Application of radix-4 (non-redundant) recoding to a serial multiplier design is
presented in this section. The goal behind the application of internal recoding is to
reduce the hardware size of the multiplier unit as well as to reduce latency. The
general idea behind this design is taken from [6],[7],[8].
Fig 3.6 shows the radix-4 recoded multiplier unit. The total number of modules is
half the number of coefficient bits. The operand bits are fed serially to the array of
these modules. Two bits of the coefficient word are latched into each module starting
with the Least Significant two bits in the first module to the Most Significant two bits in
the last module.
• In the first module the partial product is generated by multiplying the serially incoming
data bits by the recoded values of coefficient bits (0,1,2,3). The serial adder is
for partial product generation with a recoded value of 3. This partial product is
right shifted by two bits and passed on to the next module.
• In the middle module partial products are generated as described for the first
module. The two serial adders are for partial product generation with a recoded
value of 3 and for the summation of this partial product with the time shifted
partial product from earlier modules.
• In the last module the two Most Significant (MS) bits are recoded as 0,1,-2,-1. The
serial adder/subtracter here is for partial product summation with the time shifted
partial product from the previous stage, because multiplication with 3 is not
required.
The Radix-4 recode logic block is similar in the first and middle modules. Handling
the sign bit in the last module makes it different. Adding a middle module increases
the coefficient word length by 2 bits.
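The module behaviour described above can be sketched as a word-level Python model. As before, this is an illustration under assumptions rather than the thesis design: the function name `radix4_serial_multiply` is hypothetical, Python integers stand in for sign-extended serial words, and the coefficient word length is assumed to be even.

```python
def radix4_serial_multiply(coeff, data, cwl):
    """Module-by-module model with two coefficient bits latched per module."""
    assert cwl % 2 == 0
    modules = cwl // 2
    pps = 0          # partial product sum passed from module to module
    low_bits = 0     # product bits that have already left the array
    for m in range(modules):
        pair = (coeff >> (2 * m)) & 3              # two latched coefficient bits
        last = m == modules - 1
        # The last module reads its pair as 0, 1, -2, -1; others as 0, 1, 2, 3.
        digit = pair - 4 if (last and pair >= 2) else pair
        if digit == 3:
            partial = data + (data << 1)           # adder forms 3x as x + 2x
        else:
            partial = digit * data                 # 0, x, 2x, -2x or -x
        pps += partial
        if not last:
            low_bits |= (pps & 3) << (2 * m)       # two finished product bits
            pps >>= 2                              # right shift by two bits
    return (pps << (2 * (modules - 1))) + low_bits
```

Only the recoded value 3 needs an addition to form the partial product, which is why the last module, whose digits are 0, 1, -2 and -1, can do without that adder.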
Control
In general the control scheme is the same as described in section 3.4. A single signal
indicates the arrival time of the LSB into the design and various delayed versions of it
control the complete iteration. A change is needed to modify the clock cycles to
process the recoded values of two coefficient bits in each module. This increases the
latency of the serial data and coefficient paths in each module by one clock cycle.
The total latency of the multiplier is reduced because two bits are processed in each
module.
Figure 3.6. Radix-4 Bit-Serial Multiplier
The latency of the radix-4 design is given as [(3*CWL/2)+1]. Guard bits are
required on the data word to prevent overflow in the partial product formed by the
recoded value of 3. Guard bits are needed only with data because the coefficient
does not directly participate in any addition. The comparison of this non-redundant
Radix-4 bit-serial design with the one in [7] is summarized below:
1. Only 2 guard bits in the data word are needed in this approach, whereas 3 guard
bits are used in [7].
2. In [7] the 3x signal is computed only once in the first module and rerouted
through pipeline delays to the middle modules. In our approach the 3x signal is
computed in each module (except the last).
3. In [7] a 4:1 multiplexor is used for internal recoding. Our approach uses behavioral
VHDL.
4. In both approaches the basic recoding scheme is the same and logical data shifts
are used to perform arithmetic multiplication because circuitry to implement
arithmetic shifts is costly.
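The need for guard bits on the data word can be checked with a small Python sketch. The helper names are hypothetical, and the sketch considers only the growth of the partial product formed by a single recoded digit; it does not model any further growth in the accumulated sum.

```python
def fits_signed(value, bits):
    """True if value is representable in a bits-wide two's complement word."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def guard_bits_needed(data_bits, factor):
    """Smallest number of extra sign bits so that factor * x never overflows."""
    extremes = (-(1 << (data_bits - 1)), (1 << (data_bits - 1)) - 1)
    guard = 0
    while not all(fits_signed(factor * x, data_bits + guard) for x in extremes):
        guard += 1
    return guard
```

For an 8-bit data word, `guard_bits_needed(8, 3)` evaluates to 2, in line with the two guard bits stated for the recoded value of 3.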
3.6 Radix-8 Bit-Serial Multiplier
Application of radix-8 (non-redundant) recoding to a serial multiplier design is
presented in this section. The objective is to study the effect of increasing radix
recoding on the resulting speed and area of the design.
Fig 3.7 shows the radix-8 recoded multiplier unit. The total number of modules
is one third of the number of coefficient bits. The operand bits are fed serially to the
array of these modules.
Three bits of the coefficient word are latched into each module. The three Least
Significant Bits are passed to the first module and the Most Significant three bits to
the last module.
• In the first and middle modules the partial product is generated by multiplying the
serially incoming data bits by the recoded values of coefficient bits (0,1,2,3,4,5,6,7).
The two serial adders in this module are for partial product generation with a
recoded value of 7. This partial product is right shifted by three bits and passed
on to the next module.
• In the middle module partial products are generated as done in the first module.
Two of the three serial adders in the middle module are for partial product
generation with a recoded value of 7. The third is for the summation of this
partial product with the time shifted partial product from earlier modules.
• In the last module the three MS bits are recoded as (0,1,2,3,-4,-3,-2,-1).
The Radix-8 recode logic block is similar in the first and middle modules. The last
module is different in the way the sign bit is processed. Each additional middle
module in the chain increases the coefficient word length by 3 bits.
Control
In general the control scheme is the same as described in section 3.5. A single signal
indicating the arrival time of the LSB into the design and various delayed versions
of it control the complete iteration. A design change is needed to accommodate the
clock cycles to process the recoded values of three coefficient bits. This increases the
latency of the serial data and coefficient paths by one clock cycle in each module. The
net latency of the multiplier is reduced as three bits are processed in each module.
The latency of the radix-8 design can be generalized as [(4*CWL/3)+1].
Figure 3.7. Radix-8 Bit-Serial Multiplier
3.7 Digit-Serial Multiplier
A two-bit-digit Digit-Serial Multiplier is discussed in this section. This design
is one of the test benches to observe the compromise between serial and parallel
approaches.
Fig 3.8 shows a 2-bit-digit digit-serial multiplier. The coefficient and data digits (2 bits)
are fed serially to the modules. Each coefficient digit is latched into these modules
starting from the LS digit in the first module to the MS digit in the last module.
Two bits of the latched coefficient digit are recoded into Radix-4 representation. The
iteration steps of this design are the same as the Radix-4 design presented in section
3.5 except that digit-serial adders are used for partial product generation. Also the
truncation and sign extension block in each module uses latch2bit modules described
in section 3.3.
The control scheme is the same as given in sections 3.5 and 3.6, with a single signal and
various delayed versions of it managing the synchronization and pipelining aspects.
It manipulates the two bits of operand words to be processed in parallel in each clock
cycle.
The latency of this design is given by [CWL + 1].
Figure 3.8. Digit-Serial Multiplier
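The latency expressions quoted in sections 3.4 to 3.7 can be collected in one short Python sketch for comparison. The function name `latency_cycles` and the design labels are hypothetical; the formulas themselves are the ones stated in the text, and the sketch assumes the coefficient word length is divisible by 2 for radix-4 and by 3 for radix-8.

```python
def latency_cycles(design, cwl):
    """Latency in clock cycles for a coefficient word length of cwl bits."""
    formulas = {
        "ordinary": lambda n: 2 * n - 1,          # [(2*CWL)-1]
        "radix4": lambda n: 3 * n // 2 + 1,       # [(3*CWL/2)+1], n even
        "radix8": lambda n: 4 * n // 3 + 1,       # [(4*CWL/3)+1], n multiple of 3
        "digit_serial_2bit": lambda n: n + 1,     # [CWL + 1]
    }
    return formulas[design](cwl)


# The 12-bit coefficient case used to compare the designs.
latencies = {name: latency_cycles(name, 12)
             for name in ("ordinary", "radix4", "radix8", "digit_serial_2bit")}
```

For the 12-bit coefficient used throughout the comparison these expressions give 23, 19, 17 and 13 cycles respectively.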
3.8 Parallel-Array Multiplier
The key features of the several bit-serial multipliers do not stand out if a full
parallel-array multiplier design is not compared with them. Also, the synthesizing ability of
the VHDL synthesis tool for an area intensive design approach is observed. For these
reasons a parallel-array multiplier design is discussed in this section.
Fig 3.9 shows a parallel-array multiplier. Both coefficient and data bits are fed
in parallel to the multiplier. In each row the partial product of the data bits and the ith
coefficient bit is formed. This is added with the partial product and carry from the
(i-1)th row in the array. The N bit sum is right shifted by one bit and sign extended.
The lower order product bit (ith) is extracted from each row as the LSB of the sum.
The N bit carry and higher N bits of the sign extended sum are passed to the (i+1)th
row in the array. The Nth row in the array takes care of two's complement format
by complementing the partial product according to the Most Significant Bit (MSB)
of the coefficient word. The N bit sum and higher (N-1) bits of the carry are given to the
(N+1)th row. Higher N bits of the product are formed by adding the previous N bit sum
and N-1 bit carry with the MSB of the coefficient word as the LSB of the carry input in the
(N+1)th row.
The latency of the parallel-array is a single clock cycle. Pipelining delays included
in the array increase the latency, but decrease the propagation delay through the
adder chain. This gives a higher clock rate. The parallel-array algorithm can be
implemented as a pure combinational circuit. Hence a separate control circuit is not
required to synchronize the events in the computation scheme.
3.9 Summary
The traditional two's complement binary multiplication algorithm is presented in
this chapter. Basic functional units required for hierarchical design development are
given. Detailed design descriptions for five multipliers are given with their control
requirements and features. Block diagrams for the VHDL design description concepts
for each multiplier design are given in this chapter.
Figure 3.9. Parallel Array Multiplier
CHAPTER 4
DESIGN SYNTHESIS
4.1 Introduction
The HDL design method has largely replaced schematic capture in digital design.
It has increased the productivity of a logic designer because it is:
• easier to deal with large and complex designs.
• simpler to reuse a design as they are more concise and readable.
• focused more on the logic verification than on the detailed gate-level
implementation.
HDL design methodology steps include the development of a design description,
validation of the description, synthesis, and final verification. The process of designing
hardware from a model that defines the way the hardware will operate is called
synthesis. HDL synthesis converts an abstract textual description of a design into a
gate level net list. Typical HDL synthesis consists of two stages.
• Translation: This is the bridge between two levels of abstraction, RTL and gate
level. A behavioral description can be translated into RTL level or synthesized
to gate level depending on the tools used.
• Optimization: This is a technology-specific design transformation to meet area
and speed requirements for the design. Optimization is to maximize or minimize
a definable performance characteristic during the design process.
HDL synthesis yield is measured in terms of the circuit quality with respect to a set
of design goals. It is essential to focus on the issues related to the final outcome. This
chapter discusses the VHDL design description and synthesis issues of Viewlogic and
Synopsys tools. The main objective is to address the differences in the various steps
incorporated in the HDL design methodology in these tools.
4.2 VHDL for Synthesis
VHDL has come a long way. It is one of the popular HDLs in use today. It was
first developed in 1982 by the US Department of Defense. It was recognized as a
standard HDL by the IEEE (IEEE 1076 standard) in 1987 and in 1993 [2],[3],[5],[21],[23].
VHDL is similar in style and syntax to modern programming languages, but includes
many hardware-specific constructs. VHDL is a strongly typed language. Fig 4.1 gives
a pictorial representation of a VHDL hardware model. A hardware model can be described
in different VHDL design description types. VHDL language constructs are divided
into three categories according to their level of abstraction.
• Behavioral: This category defines the functional or algorithmic aspects of the
design description, without reference to its actual internal structure. Such a
description consists of system outputs expressed as:
  - The functions of system inputs by using boolean equations.
  - The function of time and system inputs by using a sequential VHDL process.
• Data-flow: The interpretation of data as flowing through the design, from input
to output. It is defined in terms of a collection of data transformations, expressed
as concurrent VHDL statements.
• Structural: This description is closer to hardware. It is a VHDL model where
the functionality is described in terms of instantiation and interconnections
between sub-modules in the design hierarchy. An example of a structural VHDL
description is given in appendix B.
Figure 4.1. A VHDL Hardware Model
Various VHDL constructs work together to describe a design. They are:
• Entities: This defines the interface to other designs through port declarations.
• Architecture: The functional implementation of a design entity is defined in
the architecture. It includes different design pieces and constructs communicating
through the signals by concurrent statements. An entity can have several
architectural implementations to meet the requirements. Together the
entity/architecture pair represents a component. An architecture consists of the
following design pieces:
  - Declarations: A definition of the signals, constants, and components to be
  used to describe the design functionality.
  - Process: A group of sequentially executed statements makes a process. It
  defines an independent sequential process representing the behavior of some
  portion of the design. It is executed whenever an event occurs on any of the
  signals in its sensitivity list. Declaration of a process in an architecture is a
  concurrent statement.
  - Subprograms: They define algorithms for computing values or exhibiting
  behavior. They are used as computational resources by several architectures.
  Unlike processes they cannot directly read or write signals from the rest
  of the architecture. The communication is done through the subprogram's
  interface. There are two forms of subprograms:
    * Procedure: It is a subroutine which operates on all visible parameters
    and objects, and returns zero or more values through interface signals.
    Within an architecture a procedure is either a sequential procedure called
    from within a process, or a concurrent procedure instantiated as a
    concurrent statement.
    * Function: It is a routine that returns a single value directly. A function
    defines the return value, which is computed based on the values of the
    formal parameters.
  - Component Instantiation: It instantiates the components defined in the
  declaration part and connects their ports to other components and concurrent
  signals. This is a major construct for the structural description style.
• Blocks: It describes a portion of the hierarchy of the design in an architecture.
A block is a unit of module structure, with its own interface, connected to other
blocks or ports by signals.
• Configuration: The binding of a particular architecture to an entity to make
up a component in a design is described as configuration. This construct is used
to select the best (suitable) architectural implementation of an entity.
• Packages & Libraries: A VHDL Package is a collection of general constants,
data types, component declarations and subprograms that can be used by more
than one design. Each VHDL package is compiled into a logical VHDL library
name. These libraries are compiled VHDL codes mapped into a physical location on
the disk. A VHDL library may consist of several VHDL packages. They facilitate
a modular design approach, easy handling of a complex design and design reuse.
Predefined VHDL libraries and packages to be used in the present design are described
prior to the entity declaration. Each entity is defined by a particular architecture
which is composed of several VHDL constructs. All the concurrent statements within
an architecture compute their values at the same time and they coordinate by
communicating via signals. Irrespective of the internal functional implementation, the
interface signals enter and leave the design through the entity.
4.2.1 VHDL Constructs support
VHDL coding is the foundation for logic synthesis. Both Viewlogic and Synopsys
tools support most but not all IEEE standard VHDL constructs [19],[20],[21],[23]. The
support also differs for the simulation and synthesis phases of a design. Thus a VHDL
description that simulates correctly may not be synthesizable. Many VHDL constructs
useful for simulation are not relevant to synthesis and cannot be synthesized. The
list of IEEE standard VHDL construct support in Viewlogic and Synopsys tools can
be found in detail in [19],[21],[23].
In addition to the difference in VHDL constructs support, both tools have their
own guidelines for better synthesis results. A good coding style complying with these
guidelines generates efficient designs. Some of the issues regarding constructs support
and guidelines relevant to the designs involved in this research project are discussed
here.
Synopsys tools have special VHDL comments that can be used in the VHDL
description as compiler directives to direct the synthesis actions. This is not possible
with Viewlogic's VHDL description. Similarly, Synopsys tools have a better set of
synthesis attributes and constraints. The area and speed related performance constraints
for a design (max_delay) can be described directly in the VHDL description.
The synthesis tool translates Synopsys defined VHDL attributes as design constraints.
This capability serves two purposes:
• Optimization of a design can be controlled from within the VHDL description.
• The VHDL description can be used to document this important specification
information.
It is important to incorporate multi-architecture implementations of a design entity
during early stages for architectural tradeoffs. Configuration constructs are used to
bind an entity to an architecture in a hierarchical design. This construct is not
supported by Viewlogic, but Synopsys supports configuration for one top-level entity
with an architecture.
4.3 Viewlogic Tools
Powerview, the Unix version of Viewlogic tools, was used for synthesizing the VHDL
description into a net-list [20],[21]. Fig 4.2 gives an overall design flow with the tools
involved in each step. Pre-synthesis functional verification is done by analyzing the
VHDL description with the VHDL analyzer and simulating the analyzed file with ViewSim
tools. This is shown by path 2. The design synthesis path leading to post-synthesis
gate level simulation is shown by path 1. It starts with analyzing VHDL within the
ViewSynthesis tool. The ViewSynthesis tool is the core of the logic synthesis
process. It is a Graphical User Interface (GUI) with a command line interface. Operating
conditions, design constraints, target technology library and analyzed VHDL are the
inputs to the synthesis process. The resynthesis path is to reoptimize the design with
a different set of constraints until the design meets the requirements. After synthesizing
the design a gate level net-list file "design.wir" is generated. This is used by the VSM
simulation tool for gate level simulation and by the Viewdraw schematic representation
tool. This net-list output is also used for hardware implementation. The ViewSynthesis
tool does not support back annotated optimization with the physical circuit
parameters from a floor-planning or place & route tool as an input to the synthesis process. But
these parameters can be used to constrain the design to meet the requirements.
4.3.1 Synthesis Criteria
ViewSynthesis has three sub-components for controlling the implementation of
synthesis parameters. They are:
• Synthesis Criteria:
  - The design optimization in terms of area versus speed is done by assigning a
  number between 1 and 100 to the area/speed parameter. Number 1 means the
  design is optimized for minimum area.
  - The technique used by the synthesizer during optimization is specified by
  assigning a value to the logic type parameter. The design's logic type could
  either be Finite State Machine (small), data path (large) or mixed (default).
  - The target technology into which the design is to be mapped is specified by
  assigning a technology library name to the tech parameter.
  - The design speed related to different fabrication processes is specified by
  assigning either 0 (slow), 1 (typical) or 2 (fast) to the process parameter.
  - The physical operating points for the design are specified by assigning
  required values to the temperature and voltage parameters.
• Design Constraints: The following set of constraints helps in modeling the timing
behavior of the design during design optimization.
  - The ViewSynthesis tool uses the value assigned to the inputarrival parameter
  to model the number and type of gates in the path. It specifies the time
  value of signal arrival at different pins in the design.
  - The specification for signal required time at the output pins in the design is
  done by using the outputrequired parameter.
  - The drive strength requirement of an input pin is specified by the input-drive
  parameter. This determines the type and quantity of loads put on the signals
  by the synthesizer. They are used in delay calculation.
  - The output load of an output pin is constrained by assigning a value to
  the output load parameter.
  - The input load to an input pin is constrained to a maximum by using the
  maxinputload parameter. This load specification is used in delay calculations.
Figure 4.2. Design Flow with Viewlogic Tools
4.4 Synopsys Tools
Synopsys is a bigger CAD package than Viewlogic. It has plenty of tools and
features and a better Unix-like interface and GUI compared to Viewlogic. Only
logic synthesis related issues relevant to the design in this project are discussed in
this section [23]. Fig 4.3 gives an overall design flow with the tools involved in each
stage. Paths 2 and 2a in the diagram show the pre-synthesis simulation path for
behavioral verification. It includes analyzing VHDL code using the Graphical VHDL Analyzer
(GVAN) tool and simulating the analyzed design with the VHDL Debugger Simulator
(VHDLDBX) tool within the VHDL System Simulator (VSS) family of tools [23]. Path 1
shows the synthesis and design optimization path. VHDL System Simulator (VSS)
tools and simulation control language with test vector inputs are used to simulate
the functionality for both paths 1 and 2. The Synopsys Design Compiler Family
synthesis tool is used as the core of the synthesis process. In the synthesis path the design
is analyzed within the Design Compiler. It is then elaborated to check correct bus
sizes and produce an intermediate design. The analyze and elaborate steps are together
called the read design step. Design constraints, choice of target technology library, and
design environment settings are provided as inputs to the synthesis or design compile
process [3],[23]. Design environment settings refer to the values of voltage, temperature, and
silicon process for the library cells. Design constraints and optimization are discussed
in section 4.4.1. The Design Compiler Family has the FPGA Compiler, which targets
FPGA technologies. Since a Xilinx FPGA is used for hardware implementation of
the designs in this project, the FPGA Compiler tool is used for synthesizing designs in
Synopsys. The FPGA Compiler has a special algorithm that synthesizes and maps
directly to XC4000 devices. This gives an efficient means to implement high-level
architecture-independent designs in Field Programmable Gate Array devices.
After synthesizing the design several design files are generated. A synthesized
output in the Synopsys internal data base format ("design.db") is used for schematic
representation and future format conversion. The output in the VHDL gate level
net-list format ("gate-level-design.vhd") is used for gate level simulation through
path 2b. The synthesized VHDL net-list of target library cells is analyzed using the
GVAN tool and simulated with Full-Timing Gate-level Simulation (FTGS) VHDL
simulation models using VSS tools. The third output of the synthesis process is the
Synopsys Xilinx Net-list Format (XNF) file which is used by the place & route tool.
Synopsys supports back-annotated design reoptimization. The physical circuit
information and delays from the floor-planning or place & route tool, written in
Standard Delay Format (SDF), are supplied as a constraint to the design compiler.
This provides better and closer-to-implementation constraining, and recompiling of
the design for improved performance. This is shown in Fig 4.3 by the recompile
feedback path after place & route is done.
4.4.1 Constraints and Design Optimization
Constraints define the goals of the synthesis process in terms of measurable circuit
characteristics (area, timing). The optimization process or the design compilation
attempts to implement a combination of target library cells with design constraints
to meet the functional, area, and speed requirements of the design.
4.4.1.1 Constraints
The Synopsys logic synthesizer has two major types of design constraints.
• Optimization Constraints: They represent design goals and restrictions that
a designer wants but may not be crucial to the operation of the design. They
consist of timing constraints such as input delay, output delay, maximum area
and porosity. Maximum delay takes the highest precedence during the optimization
phase.
• Design Rule Constraints: They reflect technology specific restrictions that
must be met for a functional design. They include constraints like maximum
transition, maximum fan out and maximum capacitance.
Design Compiler first works on optimization constraints. When both timing and
area constraints are used it attempts to meet timing goals before area goals, because
timing always takes precedence over area. After optimization constraints are met
the design compiler works on design rule constraints. The tool tries to meet both
design rule and optimization constraints but gives emphasis to design rule constraints,
because they are requirements for a functional design. So the design compiler fixes
design rule violations even at the cost of violating optimization constraints. During
the compilation phase the Design Compiler tries several optimization moves. These
moves are accepted only if they decrease the cost of one parameter without increasing
the cost of more important parameters. In other words,
• An optimization move that improves the maximum delay parameter is always
accepted.
• An optimization move that improves power is accepted only if maximum and
minimum delay and minimum porosity parameters do not increase.
• An optimization move that improves area is accepted only if power and delay
costs do not increase.
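
The acceptance rule above can be illustrated with a small Python sketch. It is
not the Design Compiler algorithm itself; the cost names and their exact
ordering are assumptions made for illustration, following the order in which
the parameters are mentioned in the text.

```python
# Hypothetical cost names, ordered from most to least important (assumption).
PRIORITY = ["max_delay", "min_delay", "min_porosity", "power", "area"]


def accept_move(before, after, priority=PRIORITY):
    """Return True if the move lowers some cost without raising a more important one."""
    for i, name in enumerate(priority):
        if after[name] < before[name]:
            more_important = priority[:i]
            # Every higher-priority cost must stay the same or improve.
            if all(after[p] <= before[p] for p in more_important):
                return True
    return False


base = {"max_delay": 10, "min_delay": 0, "min_porosity": 0, "power": 5, "area": 100}

# Smaller area but higher power: rejected, because power outranks area.
worse_power = dict(base, area=90, power=6)

# Shorter maximum delay at the cost of area: accepted.
faster = dict(base, max_delay=9, area=120)
```

Under this sketch `accept_move(base, faster)` is accepted and
`accept_move(base, worse_power)` is rejected, which mirrors the precedence of
delay over power over area described above.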
The design compiler attempts to optimize the design to meet the constraints in
various phases of optimization. They are:
• I/O pad optimization
• Final Sequential Optimization
The first phase of gate-level optimization is to map the sequential cells to the cells
in the technology library. At this point the delay through the combinational logic is
not defined. After this phase the following optimization information is defined:
1. Location of the combinational logic clouds between sequential cells.
2. Timing constraints on the logic clouds required to meet the setup and hold
constraints on the sequential cells.
The combinational optimization phase transforms the logic level description of the
combinational logic in the design to a gate-level net-list. Two main steps of this phase
are:
1. Technology-independent optimization, which operates at the logic level. It
applies algebraic and boolean techniques to a set of logic equations. This step
reimplements the logic equations to meet the timing and area goals, but retains
the functionality of the original logic. The common techniques used in this step
are:
• Flattening: It removes all intermediate variables, resulting in a two-level
sum-of-products form.
• Structuring: It converts two-level logic equations to a multilevel structure
to meet the design constraints. This technique factors out common
subexpressions as intermediate variables, then substitutes these variables in
other logic equations where possible.
2. Technology-Dependent Optimization (Mapping): The output from the previous
step is used in this step. During mapping, components from the technology
library are selected to implement the logic structure. The initial logic structure
is rearranged locally to try different logic combinations, until those components
that meet the predefined design constraints are kept.
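
As an illustration of the flattening technique named in step 1, the following
Python sketch substitutes an intermediate variable into a sum-of-products
expression. The representation (a set of product terms, each a frozenset of
literal strings such as "a" or "~a") and the function names are assumptions
made for this example. It handles only positive uses of the intermediate
variable and is not the synthesizer's own algorithm.

```python
def and_terms(p, q):
    """AND two product terms; return None if the result contains both x and ~x."""
    merged = p | q
    for lit in merged:
        if not lit.startswith("~") and "~" + lit in merged:
            return None  # x AND ~x is always 0, so the term vanishes
    return merged


def flatten(sop, name, definition):
    """Replace each positive use of intermediate variable `name` in `sop`
    by its own sum-of-products `definition`, giving a two-level form."""
    result = set()
    for term in sop:
        if name in term:
            rest = term - {name}
            for d in definition:
                product = and_terms(rest, d)
                if product is not None:
                    result.add(product)
        else:
            result.add(term)
    return result


# t = a + b (intermediate variable);  f = t.c + ~a.d
t_def = {frozenset({"a"}), frozenset({"b"})}
f = {frozenset({"t", "c"}), frozenset({"~a", "d"})}

# Flattened: f = a.c + b.c + ~a.d
flat_f = flatten(f, "t", t_def)
```

Structuring works in the opposite direction: starting from `flat_f`, the common
factor `a + b` would be pulled out again as an intermediate variable when that
helps to meet the constraints.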
After a full mapping of combinational logic and an initial mapping of sequential
logic, I/O pads are inserted and mapped. In this phase, input and output buffers
are added to each port in the top-level design. The buffers are sized to meet the
port-to-port timing constraints when the delays through the core logic are known.
As I/O buffers consume significant current, the smallest I/O buffers that meet the
timing specification of the design are selected.
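
The buffer selection described above can be sketched in a few lines of Python.
The buffer library, its field names and all numbers are hypothetical and serve
only to show the rule "smallest buffer that still meets timing".

```python
def pick_io_buffer(buffers, core_delay_ns, required_ns):
    """Return the lowest-drive buffer whose delay still meets the
    port-to-port timing budget, or None if no buffer is fast enough."""
    candidates = [b for b in buffers
                  if core_delay_ns + b["delay_ns"] <= required_ns]
    return min(candidates, key=lambda b: b["drive_ma"], default=None)


# Hypothetical buffer library: stronger drive gives shorter delay
# but draws more current.
BUFFERS = [
    {"name": "small", "drive_ma": 2, "delay_ns": 6.0},
    {"name": "medium", "drive_ma": 8, "delay_ns": 4.0},
    {"name": "large", "drive_ma": 24, "delay_ns": 2.5},
]

choice = pick_io_buffer(BUFFERS, core_delay_ns=20.0, required_ns=25.0)
```

With a 20 ns core delay and a 25 ns requirement, the small buffer misses timing,
so the medium buffer is chosen even though the large one would also work.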
With accurate values for all delays through the I/O pads and combinational logic,
the design compiler replaces the initial estimate on sequential cell mapping in the
final sequential optimization phase. Complex sequential cells from the library can be
used to reduce area and delay. Design timing can be improved by choosing higher
performance sequential cells.
Localized adjusting is the final phase in gate-level optimization. It follows a set of
heuristic rules to make local optimizations to adjust area and delay.
4.5 Summary
An introduction to VHDL as a design description tool is given in this chapter. A
block diagram representation of the VHDL hardware model is presented. The design
synthesis flow for Synopsys and Powerview as VHDL synthesis tools is discussed.
The importance of the process of design optimization and parameter constraining to
achieve optimum results is discussed in this chapter.
CHAPTER 5
DESIGN IMPLEMENTATION
5.1 Introduction
The lowest level of design description is the physical domain. The physical domain
specifies how the structure of a semiconductor technology is built. This structure has
the required connectivity between physical blocks to implement the prescribed
behavior. Physical abstraction of the design functionality is an involved process. The
purpose of this chapter is to give an overview of present trends in physical
implementation of digital IC design and related semiconductor technologies. It is not
intended to cover details such as wafer processing, photomasking and other steps in
the fabrication process.
Logic families have come a long way with advancements in semiconductor
technologies [17], [NI, [El, [26], [33]. The phenomena that started with vacuum tubes,
diodes and switch-mode transistors has evolved into gate arrays, standard cells and
programmable logic devices (PLDs). The quest for smaller, faster, and low power
devices has emerged into today's CMOS and GaAs technologies. CMOS logic is the
most popular logic family in the semiconductor industry now. Section 5.2 describes
the CMOS logic's scorecard. There are several ways of implementing a CMOS system
design. An overview of these options and the implementation methodology used in
this thesis work is discussed in section 5.3.
5.2 CMOS Logic
CMOS technology is one option in a range of technologies available to the electronic
system designer. Other popular options include silicon bipolar technology, GaAs
technology.
Due to these new technologies older logic families like Diode Transistor Logic
(DTL), Resistor Transistor Logic (RTL), Emitter Coupled Logic (ECL), Transistor
Transistor Logic (TTL) are rarely used now. Among new technologies, GaAs
demonstrates the fastest gate speed. Bipolar technologies are not far behind, and
advanced CMOS technologies are comparable with bipolars. CMOS technologies in
general show the highest densities and lowest power per gate. CMOS technology is
adequate for analog circuits but better performing bipolar circuits may be
constructed. CMOS technologies are the cheapest to manufacture for high density
digital circuits with moderate analog requirements. Design costs are the cheapest for
CMOS due to the large investment already made in design tools and cell libraries. A
combination of CMOS and bipolar technologies called BiCMOS is emerging as a
popular technology, especially for mixed signal chips [17]. Though CMOS is not the
only choice, for an overwhelming percentage of today's electronic systems, it is the
technology of choice. It is worthwhile to know the advantages and disadvantages of a
technology type when making system level decisions for implementation. A brief
summary of the main CMOS attributes is presented below.
• Fully restored logic levels, i.e. output settles at VDD or VSS.
• Transition times - Rise and fall times are of the same order.
• Memories are implemented both densely and with low power dissipation.
• Transmission gates pass both logic levels well, allowing use of efficient, widely
used logic structures such as multiplexors, latches, and registers.
• Power Dissipation - Almost zero (only leakage) static power dissipation for fully
complementary circuits. Power is consumed only during logic transition.
• Precharging Characteristics - Both n-type and p-type devices are available for
precharging a bus to VDD and VSS. Nodes can be charged fully to VDD or
alternatively to VSS in short time.
• Power Supply - Voltage required to switch a gate is a fixed percentage of VDD.
Variable range is 1.5 to 15 volts.
• Packing Density - Requires 2n devices for n inputs for complementary static
logic. Less for dynamic logic forms.
• Layout - CMOS facilitates regular and easily automated layout styles.
Due to its dominance CMOS process density is of sub-micron level (a measure of
CMOS transistor geometry in processing technology). Complementary gates are
almost guaranteed to function correctly. The automated CAD packages available have
reached a point where the majority of systems can be implemented in highly
automated fashion. However, leading-edge products continue to push the technology
in terms of cost, density, speed and power. BiCMOS is an example, which is a
compromise between low power, high density of CMOS and high speed of bipolar
devices.
5.3 ASIC Technologies and Programmable Devices
The CMOS chip implementation has a wide variety of options providing the
trade-offs between design complexity, cost, speed-of-operation, implementation and
time to market. This section gives an overview of such implementation options and
the methodology used in this thesis. As the process of designing a system on silicon
is complicated, Very Large Scale Integration (VLSI) design aids have come up with
several CMOS technologies that can be automated into the CAD tool being used.
These ASIC technologies reduce the complexity, increase productivity and assure the
designer of a working product while providing some flexibilities. Fig 5.1 shows the
acronym tree of IC technologies.
Figure 5.1. Acronym tree of IC technologies (Full Custom; Standard Cell; MPGA;
field-programmable devices: Simple PLDs, e.g. PALs; Complex PLDs; FPGAs, e.g.
Xilinx, Actel; FPICs)
Programmability of the ASIC technology is a way to achieve wider use and
flexibility. Often, the performance of the design implemented in programmable
devices may not meet system goals and an alternative solution is required. This
prompts the need for custom implementation. But the reprogrammable features,
cheap & short prototyping time and automated design steps integrated with the CAD
tools make programmable devices popular for design implementation. The spectrum
of programmable devices in CMOS is divided into three areas.
• Devices with programmable logic structures. This class of programmable CMOS
devices is referred to as Programmable Array Logic (PALs) or PLDs. Generally
they are implemented as AND-OR plane devices, e.g. 22V10. These devices are
programmed by changing the characteristics of the switching element.
• Devices with programmable interconnect. These devices program the routing.
An example of this method is the Actel FPGAs, which use an element called
PLICE (Programmable Low-Impedance Circuit Element) or anti-fuse.
• Devices with reprogrammable gate arrays. These are the most popular devices.
They are discussed in section 5.3.1.
In general a programmable logic device consists of the following basic resources:
• Logic/Memory Blocks: Configuration of these blocks is based on lookup tables,
multiplexors, AND-OR planes, gates, or transistor pairs.
• I/O Blocks: Usually bidirectional and may incorporate latches, flip-flops, slew
rate control, pullup/pulldown.
• Interconnect: They provide efficient local and global connections. The focus is
to provide maximum flexibility with minimum delay and area.
• Dedicated, low-skew clock distribution networks.
The best balance of the above resources is the target of every programmable device.
The programming technologies incorporated by these devices include:
• Fusible Links: This technology is normally used in conjunction with a bipolar
process, where the device can sink the high current needed to blow normally
closed fuses. They are one-time programmable.
• Anti-fuse: A normally high resistance structure is changed permanently to a
low-resistance structure when a high programming voltage is applied. This is a
one-time programmable technique.
• Static Random Access Memory (SRAM) cells: In this method the interconnect
configuration is achieved by controlling transmission gates, Multiplexors
(MUXes) or pass transistors. The state that determines a given interconnect
pattern is held as an application program in static RAM cells distributed across
the device. This technique provides re-programmability.
• EPROM & EEPROM switches: A signal is pulled down using Electrically
Programmable Read Only Memory (EPROM) or Electrically Erasable Programmable
Read Only Memory (EEPROM) cells. This method is also reprogrammable.
A programmable gate array device consists of identical logic functions diffused in
a regular pattern, or array, in silicon. These blocks of simple logic function can
be interconnected as required by appropriate customization of one or more levels of
metalization. The regular layout permits the use of automated routing programs
that can translate a logic net-list into a chip layout. A significant advantage of the
re-programmable gate array is the ability to reconfigure the internals of a chip by
changing software (routing program). This flexibility is of considerable advantage in
a product that has to undergo field updates.
FPGA architectures are a good compromise between standard (fixed design vendor
based) and custom circuits. Hence this approach does not usually yield as high
a performance as full-custom solutions, nor is it as flexible in the range of circuit
complexities which can be accommodated. So, an FPGA implementation is generally
referred to as semi-custom design.
The Xilinx FPGA technology was chosen to implement the designs in this project
work, primarily because among several ASIC technologies, the Xilinx technology
library is the only one available for synthesis in Synopsys as well as Viewlogic tools
in the ECE Dept. The other factor is that the XACT tool is available for placement
and routing of the design. Also Xilinx is one of the most popular FPGAs in the market
today. Some of the other re-programmable FPGAs are Altera, Atmel, Algotronix,
AT&T, and one-time programmable FPGAs are Quicklogic, Actel, Cypress. A Xilinx
FPGA consists of a symmetrical array of Configurable Logic Blocks (CLBs) embedded
within a set of horizontal and vertical channels that contain routing which can
be customized to interconnect CLBs. The interconnect configuration is achieved by
turning on n-channel pass transistors. Static RAM cells are used to hold the state
that determines a given interconnect pattern. Each CLB consists of two 4 by 1 &
one 3 by 1 lookup function generators, and two flip-flops. Each input and output
on a CLB has a direct interconnect, which allows most local interconnection between
adjacent CLBs to take place. The global interconnect is achieved by programmable
switching matrices at the junction of horizontal and vertical routing channels. The
timing of a design implemented in a Xilinx FPGA is dependent on the basic CLB speed
and routing delay terms. Appendix A shows the block diagram of the Xilinx XC4000
series family CLB.
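
A lookup function generator of the kind mentioned above can be modeled as a
small stored truth table indexed by its inputs. The Python sketch below is only
an illustration under that assumption. The helper names and the choice of a
full-adder bit as the mapped function are hypothetical and do not reproduce the
mapping produced by the tools used in this thesis.

```python
def lut(truth_table, inputs):
    """Look up one output bit: the input bits form an index into the table."""
    index = 0
    for bit in inputs:  # the first input is the most significant index bit
        index = (index << 1) | (bit & 1)
    return truth_table[index]


def make_table(func, n_inputs):
    """Build the 2**n_inputs table contents for an arbitrary Boolean function."""
    table = []
    for index in range(2 ** n_inputs):
        bits = [(index >> (n_inputs - 1 - k)) & 1 for k in range(n_inputs)]
        table.append(func(*bits) & 1)
    return table


# Assumed example: one full-adder bit held in two 4-input generators
# (the fourth input is unused and tied to 0).
SUM_TABLE = make_table(lambda a, b, cin, _unused: a ^ b ^ cin, 4)
CARRY_TABLE = make_table(lambda a, b, cin, _unused: (a & b) | (cin & (a ^ b)), 4)


def full_adder_bit(a, b, cin):
    """Evaluate sum and carry purely by table lookup."""
    return lut(SUM_TABLE, [a, b, cin, 0]), lut(CARRY_TABLE, [a, b, cin, 0])
```

A 3-input generator is the same idea with an 8-entry table. Reprogramming the
device amounts to loading different table contents into the static RAM cells.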
5.3.2 Standard cell and Full custom design
Gate-array architectures standardize at the chip geometry level whereas standard
cells standardize at the logic or function level. That is, a specific design for each logic
gate or a logic function in a library can be created. This is the basis of cell based
or standard cell design. Library cells are created for the following general classes of
circuits:
• Small Scale Integration (SSI) logic: nand, nor, xor, inverters, buffers, registers
• Medium Scale Integration (MSI) logic: decoders, encoders, adders, comparators
• Datapath: Arithmetic Logic Units (ALUs), adders, shifters, bus extractors,
register files
• Memories: RAM, Read Only Memory (ROM), Content Addressable Memory
(CAM)
• System-level Blocks: microcontrollers, Universal Asynchronous Receiver
Transmitters (UARTs), Reduced Instruction Set Computer (RISC) cores, multipliers
Compared to gate-array, standard-cell implementation provides better density at the
cost of increased prototyping effort and increased design complexity. Standard-cell
design might result in better productivity because the functions do not need to be
designed. Often standard cells are available as a set of parameterizable modules.
A custom IC is individually designed for a particular requirement. Full-custom
implementation is the name given to the technique where the function and layout of
practically every transistor is optimized. This technique is programmed at the silicon
mask level. This results in higher density (reduced n), optimal choice of transistor
sizes and numbers and hence best performance among all implementations. The
principal disadvantage of full-custom logic is the large effort (initial cost, time)
required in design and testing, hence low productivity.
5.4 XACT Tools
The Xilinx-based core implementation tools called XACT are used in this project as
a place & route tool to implement designs into a Xilinx FPGA. They provide an
identical implementation approach to all the designs synthesized in Synopsys as well
as Powerview. This creates an opportunity to have a fair comparison of different
designs targeted in the same technology library. Fig 5.2 shows the steps involved in
the design implementation flow using the XACT tool. The WIR2XNF program in
Powerview converts the ViewDraw wir files into one or more Xilinx XNF files
(depending on the hierarchy of the design and synthesis method used). In Synopsys a
single file containing the hierarchical information of the design is written in XNF
format ("design.sxnf"). The SYN2XNF program from XACT tools converts this file to
a Xilinx XNF file. XNFMerge merges XNF files into one XNF file ("design.xff") with
flattened hierarchy. XNFPrep performs a Design Rule Check (DRC) to ensure that
no design rule error exists and removes unused or disabled logic from the design. The
output is a "design.xtf" file. Partition Placement & Route (PPR) uses the mapping
of the logic primitives to map them to CLBs. It then places the macros (CLBs &
I/O logic) onto the FPGA and routes the appropriate connections. PPR produces
an output file ("design.lca") containing placement and routing information for the
FPGA. XDelay is a static timing analyzer. It takes the PPR output and produces
the worst-case path delays for different path types. The MakeBits program produces
a bit stream ("design.bit") which can be downloaded to the FPGA chip for the physical
design verification.
Figure 5.2. Implementation Flow in XACT tools
The physical testing of the serial multiplier designs needs extra circuits for parallel
Some of the serial multipliers were designed with these extra circuits for testing
them in the Xilinx 4010pg191-6 prototype board for physical design verification. A
schematic diagram of such a setup for Radix-Q design is given in Fig A.2.
5.5 Summary
Features of CMOS as the most popular logic today are described in this chapter.
Different implementation styles for digital design are introduced. Programmable
features of ASIC technologies and their advantages are discussed. The choice of Xilinx
FPGA for design implementation in this thesis is explained. The design flow for XACT
as an implementation tool is described.
CHAPTER 6
RESULTS
6.1 Objectives
The main objective of this research is to compare different multiplier algorithms.
Bit-serial multipliers are the major focus of this thesis. The multiplier designs
presented in chapter 3 are designed using VHDL synthesis and implemented on a
Xilinx FPGA. Another goal of this research project is to compare the capability of two
VHDL based logic synthesizers (Synopsys & Viewlogic) using bit-serial multipliers. It
is recognized that a detailed comparison of these tools cannot be carried out based on
these designs alone. Also, as Synopsys CAD is so huge and loaded with many features
(Powerview tools cost only about 15-20% of the cost of Synopsys) the comparison may
have some bias. But a comparison based on the design performance is justifiable.
This chapter presents the implementation results of the multiplier designs.
Graphical representation of the performance measures discussed in chapter 2 for
various multipliers is also presented here. The comparison of design performances is
discussed. The VHDL synthesis tools are compared according to multiplier design
performances. Finally some conclusions are drawn.
6.2 Implementation Results
The Xilinx FPGA implementation of various multipliers is presented here. The
multiplier designs are compared among themselves in each tool and also across the two
tools used (Synopsys & Powerview). The performance measures considered here are
design space (area), sample rate (throughput) and Area-Time product as described
in section 2.4. All the multipliers are designed for 12-bit coefficient implementation.
A 12-bit coefficient provides a fair comparison of the five different multipliers
described in chapter 3 because radix-4, radix-8 and two-bit digit-serial designs can
represent a 12-bit coefficient as an exact multiple of multiplier modules. All the
designs are implemented in the Xilinx 4010pg191-6 part type. During the synthesis
phase similar constraints and operating conditions were considered in both tools, as
far as possible.
Tables 6.1 and 6.2 give the implementation results from the Powerview and Synopsys
synthesizers with XACT tools. The best results based upon compiled values on
maximum speed and minimum area and using different synthesis methodologies are
presented here. Speed optimization is focused on implementing the design in the
target technology library with the shortest delay or fastest clock rate, whereas area
optimization looks for a functional implementation of the design with the smallest
total cell area in the target library. The area column in tables 6.1 and 6.2 is taken
from the Xilinx FPGA implementation of the designs using XACT tools. It shows the
number of CLBs occupied by each design. Another result taken from the XACT tool is
the estimated maximum clock rate for each design and optimization mode. The clock
rate column also represents the worst case propagation delay for each design in these
tables.
The latency in these tables is calculated according to the design description given
in chapter 3 for 12-bit coefficient implementation. A bit-serial data flow can accept
a new set of inputs every Data Word Length (DWL) number of clock cycles. For
2-bit digit-serial data flow it is 1/2 of DWL number of clock cycles and for the
parallel array it is 1. This is given as iteration time in number of clock cycles in
tables 6.1 and 6.2. Sample rate or throughput is calculated from iteration time and
minimum clock period (1/clock rate). Similarly the Area Time (AT) product is
calculated as the product of area and the time per sample. Time per sample is the time
required for a complete sample. These calculations are explained at the bottom of each
table. The throughput results are plotted as the inverse of the calculated value (in
Fig 6.2, 6.5 and 6.8). This gives an opportunity to look for the shortest bar for the
best performance in the area, throughput and AT product plots. The following
discussion gives the graphical comparison of design performances based on the results
in these tables.
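
The calculations described above can be written out as a short Python sketch.
The function and key names are chosen for this illustration, and the numbers in
the usage line are placeholders, not values from tables 6.1 or 6.2.

```python
DWL = 12  # Data Word Length in bits, as used for all designs


def iteration_time(style, dwl=DWL):
    """Clock cycles between successive input samples for each data-flow style."""
    return {"bit-serial": dwl, "digit-serial-2": dwl // 2, "parallel": 1}[style]


def performance(area_clbs, clock_rate_mhz, iteration_cycles):
    """Derive sample rate and AT product from area, clock rate and iteration time."""
    min_clock_period_us = 1.0 / clock_rate_mhz           # period in microseconds
    time_per_sample_us = min_clock_period_us * iteration_cycles
    sample_rate_mhz = 1.0 / time_per_sample_us
    at_product = area_clbs * time_per_sample_us          # CLBs * microseconds
    return {
        "min_clock_period_us": min_clock_period_us,
        "time_per_sample_us": time_per_sample_us,
        "sample_rate_mhz": sample_rate_mhz,
        "at_product": at_product,
    }


# Placeholder inputs: 40 CLBs at a 20 MHz clock, bit-serial data flow.
example = performance(40, 20.0, iteration_time("bit-serial"))
```

With these placeholder inputs the time per sample is 12 clock periods of 0.05
microseconds each, so a smaller AT product can come from fewer CLBs, a faster
clock, or fewer cycles per sample.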
The plots shown in Figures 6.1 to 6.3 demonstrate the comparative performances
of the multiplier designs in the Powerview tool. The best results for area, sample
rate and AT product are considered from table 6.1 for these plots. These graphs are
plotted using normalized values. Results for the ordinary bit-serial design are
considered as a base unit and other design performances are scaled accordingly. The
order of designs on the x-axis is chosen arbitrarily. Fig 6.1 shows that increasing
radix-n recoding decreases the design size. Fig 6.2 is a plot of the inverse of
normalized sample rate (throughput) for different designs. It is seen that increasing
radix-n recoding suffers in throughput. The better throughput of radix-4 compared to
the ordinary design is due to shorter routing delays achieved by the place & route
tool. The digit-serial multiplier synthesized using Powerview tools has better
throughput than the array multiplier designed accordingly. Fig 6.3 shows that the
digit-serial multiplier has the best AT product. Though the radix-8 design has smaller
area compared to radix-4, its AT product suffers due to lower throughput or clock
rate. The AT product for the digit-serial design is about 3 times better than that for
the parallel array design.

Table-6.1 : Results from Powerview Synthesizer & XACT tools.
Columns: Area (# of CLBs), Clock Rate (MHz), Iteration Time (# of Clock Cycles),
Sample Rate (MHz), AT Product (Area * Time Per Sample), Latency (# of Clock Cycles)
Rows: ordinary bit-serial, Radix-4 Non-Redundant (Bit-Serial), Radix-8
Non-Redundant (Bit-Serial), Digit-Serial (2-bit Digit), parallel array
Minimum Clock Period = 1 / Clock Rate
Sample Rate = 1 / ((Iteration Time in # of Clock Cycles) * (Minimum Clock Period))
Time Per Sample = Minimum Clock Period * Iteration Time in # of Clock Cycles
AT Product = Area in # of CLBs * Time Per Sample
DWL = Data Word Length = 12 Bits

Table-6.2 : Results from Synopsys Synthesizer & XACT tools.
Columns: Area (# of CLBs), Clock Rate (MHz), Iteration Time (# of Clock Cycles),
Sample Rate (MHz), AT Product (Area * Time Per Sample), Latency (# of Clock Cycles)
Rows: ordinary bit-serial, Radix-4 Non-Redundant (Bit-Serial), Radix-8
Non-Redundant (Bit-Serial), Digit-Serial (2-bit Digit), parallel array
Minimum Clock Period = 1 / Clock Rate
Sample Rate = 1 / ((Iteration Time in # of Clock Cycles) * (Minimum Clock Period))
Time Per Sample = Minimum Clock Period * Iteration Time in # of Clock Cycles
AT Product = Area in # of CLBs * Time Per Sample
DWL = Data Word Length = 12 Bits
Figures 6.4 to 6.6 show the performance comparison of the multipliers in the
Synopsys tool. As explained above the best results are considered from table 6.2 and
normalized values are used for the plots. Fig 6.4 is identical in nature to Fig 6.1 and
the radix-8 recoded design has the smallest area. The normalized plot of the inverse
of design throughput in Fig 6.5 shows that radix-8 recoding suffers in throughput.
The radix-4 design gave better throughput than the ordinary design due to efficient
placement and short routing delays. The digit-serial design did not outperform the
parallel array multiplier in terms of throughput because of the better logic
optimization capability of the Synopsys synthesizer. The digit-serial design shows the
best AT product as given in Fig 6.6. The radix-8 recoded design shows close
performance in AT product with the radix-4 design, and the digit-serial design is only
about 2 times better than parallel. This is because of a better clock rate achieved by
the Synopsys synthesizer and also due to superior logic optimization in the case of the
array design. In general Fig 6.1 to 6.3 are similar to Fig 6.4 to 6.6 respectively.
Figure 6.1. Design Size in Powerview tool
Figures 6.7 to 6.9 give a comparison of design performances in Synopsys and
Powerview tools. They are plotted by considering the best results from maximum speed
and minimum area optimization from tables 6.1 and 6.2. The result for the ordinary
bit-serial design is considered as a base unit and all other results from both tools are
normalized (scaled) accordingly. These normalized values are plotted in Fig 6.7 to
6.9. The areas for the ordinary, radix-4, radix-8 and digit-serial designs are almost
similar in both tools, except for the array design, as shown in Fig 6.7. It demonstrates
that the Synopsys synthesizer performs better in logic optimization for large
combinational circuits. Fig 6.8 shows better throughput by the Synopsys tool compared
to the Powerview tool except for the digit-serial design. The throughput for the radix-8
and array designs is noticeably better. The different performance of the digit-serial
design is due to efficient placement & routing in Xilinx. Fig 6.9 shows a better AT
product for the designs synthesized in Synopsys tools except for the digit-serial design.
This is due to the better clock rate from speed optimization using ViewSynthesis tools.
Figure 6.2. Throughput Rate in Powerview tool
Figure 6.3. Area Time Product in Powerview tool
6.3 Discussion & Conclusions
The differences in performance of the multiplier designs presented in this chapter
are due to several reasons:
• Optimization Algorithm: In the case of Synopsys, the FPGA compiler was used,
which facilitates special compilation and optimization algorithms for Xilinx 4000
parts. Also, in general the Synopsys synthesizer attempts to improve speed when
area is being constrained and vice versa. The optimization algorithm gives higher
priority to timing constraints than to area whenever both of these constraints
are present. But when one of these parameters is constrained and met it makes
a move towards optimizing other constraints. The move is accepted only when
the primary constraint is not violated. This is clear from the close results for
area and speed optimization goals for a design in Synopsys as shown in table
6.2. The Powerview synthesizer seems to focus only on the main parameter being
constrained (area or speed) during optimization. There is a noticeable difference
between the results from area and speed optimization for a design in Powerview.
Powerview does not offer any special algorithms for targeting Xilinx 4000 parts.
The ViewSynthesis tool is used as a generic logic synthesizer and optimizer for
targeting the design to different technology libraries. Chapter 4 gives a detailed
discussion about design constraints and optimization in Synopsys and Powerview
tools.
The synthesis and optimization process in Synopsys has multiple features [3], [23].
It takes several compile runs (trial & error) with a combination of different
compile methodologies for the design to settle with optimal performance, whereas
the Powerview synthesis tool does not take as many compile runs.
Figure 6.4. Design Size in Synopsys tool
Figure 6.5. Throughput Rate in Synopsys tool
Figure 6.6. Area Time Product in Synopsys tool
• Technology Library: Though the basic target technology library used in both
tools is the same (Xilinx 4000 Series parts), there is a difference in the source for
these libraries. In the case of Powerview the unified library "XC4000.sml" was
used. The Synopsys tool in the department does not have a Xilinx FPGA library
of its own. So, a library for the Synopsys synthesizer provided by the XACT tool was
used. In this case the library was explicitly specified as X4010-6.
Figure 6.7. Comparative Design Size
• Implementation: A large percentage of the delay in programmable devices is
due to routing. The actual delays can only be obtained after place and route. The
post-synthesis timing analysis of the design is only a statistical approximation
of the actual delays, which uses wire load models specified as a constraint to the
synthesis process. The delays differ considerably for a custom implementation
where the designer has control over each transistor used in the design. Since a
common place & route tool is used in this thesis for both Synopsys and Powerview
synthesized designs, differences due to VHDL description and synthesis run can
Figure 6.8. Comparative Throughput Rate
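The acceptance rule described above under Optimization Algorithm (a move toward the secondary goal is kept only if the primary constraint is still met) can be sketched as follows. This illustrates the rule as stated in the text, not the actual algorithm inside the Synopsys tool; the function `accept_move`, the (delay, area) tuple layout and all numbers are hypothetical.

```python
def accept_move(current, candidate, max_delay):
    """Decide whether an optimization move is kept.

    `current` and `candidate` are (delay, area) pairs. Timing is treated as
    the primary constraint: a candidate that exceeds `max_delay` is rejected.
    Otherwise the candidate is accepted only if it reduces area.
    """
    cand_delay, cand_area = candidate
    _, cur_area = current
    if cand_delay > max_delay:      # primary (timing) constraint violated
        return False
    return cand_area < cur_area     # secondary goal: smaller area

# Hypothetical values: delay in ns, area in CLBs.
state = (28.0, 120)
kept = accept_move(state, (29.5, 110), max_delay=30.0)      # slower but legal, smaller
rejected = accept_move(state, (31.0, 100), max_delay=30.0)  # smaller but too slow
```

Under such a rule the area-optimized and speed-optimized runs tend to converge, which is consistent with the close results the text reports for Synopsys.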
Although the final result is affected by several steps involved in the
design flow, the following observations can be made from the implementation results
presented in this chapter:
• The digit-serial multiplier gives the best performance, close to that of the parallel
array in terms of throughput. This is an advantage considering the fact that
it is less than half the size of the parallel design. For a digit size of two, the
multiplier is only about 10% bigger than the ordinary bit-serial multiplier and
yet it has about twice the throughput. These are persuasive arguments for the
use of digit-serial multipliers.
• The digit-serial design has the best AT product. The digit-serial design
synthesized using Powerview has a 3 times better AT product than the parallel design.
• Use of radix-4 recoding in the digit-serial design improved its performance. A
35% difference between the radix-4 and the digit-serial design in terms of AT
product is better than a 7% difference without radix-4 recoding in the digit-serial
design done by P.J. Graumann (ECE Dept., U of C).
Figure 6.9. Comparative Area Time Product
• Similar results are obtained for the AT product of the different multiplier algorithms
when compared to the AT products for ripple carry operators compiled by the Parsifal
silicon compiler, as given by Hartley and Corbett [6].
• The non-redundant radix-4 design is 5% smaller than the one given by Primlani
and Meador [7]. Only two guard bits are required in the data word. Also the
propagation delay path is shorter in our approach. These estimates are obtained
by comparing them in the Synopsys VHDL synthesizer.
• Increasing the radix-n recoding lowers the gain in the AT product and speed
compared to the gain of the radix-4 design over the ordinary bit-serial
design.
• The multiplier performances are comparable to those by Russell & Hutchings [11] in
terms of the number of CLBs occupied in the Xilinx 4010 and throughput.
• The Synopsys synthesizer outperforms the Powerview synthesizer in the case
of logic optimization. The parallel array multiplier synthesized using Synopsys
is 14% smaller compared to Powerview. This is also confirmed by the 26%
smaller AT product for the Synopsys synthesized parallel design compared to
Powerview. This shows the performance of the better logic optimization algorithm.
• The Synopsys synthesizer resulted in better clock speed, and hence higher
throughput, and better AT product in general, as seen from the results.
• Synopsys tools have better automation of the design flow than Powerview tools. A
single script controls VHDL read-in, constraint setting, compile methodology
setting, pad assignments and XACT implementation steps. An example of such a
script is given in Appendix C.
• The marginally improved performance of Synopsys does not merit the additional
dollar value when considering the fact that Powerview is only about 15-20% of
the cost of Synopsys.
CHAPTER 7
CONCLUSION AND FUTURE WORK
Logic synthesis is an integral part of digital system design today. New design
methodologies and better CAD tools prompt the need to design, revise and redefine
existing systems. Rapidly growing semiconductor technology facilitates flexibility,
customization and better performance of the final product. The change from schematic
capture based design entry to HDL synthesis techniques has been adopted by every
logic designer due to the increased productivity. Standardization of HDLs has also
enhanced design reuse and portability. These observations encourage the use of HDL
description and synthesis tools in the design approach.
Multiplier circuits have traditionally been popular because of their extensive use in
Digital Signal Processing (DSP) algorithms. Due to its large silicon area, traditional
array multiplier usage is limited in an implementation where several such units are
required. Extensive research has been done to devise new algorithms to reduce the
size while not losing much computational throughput. The concepts of pipelining and
bit-serial design have been popular in the field of parallelism in processor architectures.
7.1 Conclusions
In this thesis the VHDL synthesis based design approach for digital systems is
used. Two very good HDL synthesis tools, Synopsys and Powerview, are used for
design capture, simulation, synthesis and implementation. This thesis discusses the
steps involved in the design flow in these tools.
Multipliers are used as benchmark designs. Four different serial multipliers and
a parallel multiplier are designed and implemented. The effects of radix-n recoding
schemes and the digit-serial approach in the multiplier algorithms are studied. These
multipliers are implemented in Xilinx, which is one of the leading FPGA technologies.
A comparative analysis of these multipliers is presented in this thesis. A comparison
of the two VHDL synthesis tools based upon the design performances is also presented
in this thesis.
The principal contribution of this thesis is a detailed comparison of five different
multiplier algorithms using VHDL synthesis based design. The main approaches
taken for the task are:
• Designing multipliers using VHDL description. Ordinary bit-serial, non-redundant
radix-4 bit-serial, non-redundant radix-8 bit-serial, digit-serial, and parallel array
multipliers are designed for a 12 bit coefficient.
• To synthesize the designs, two VHDL synthesis tools, Synopsys and Powerview,
are used.
• Design implementation is done in the Xilinx 4010pg191-6 part type. The physical
testing of some of these multipliers is done using a Xilinx 4010pg191-6 prototype
board.
• Design performance parameters such as area, throughput rate and AT product are
compared for these designs.
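To make the difference between the ordinary bit-serial and the digit-serial approach concrete, the following Python model multiplies two 12 bit words by consuming the data word one digit per clock cycle. It is a behavioural sketch only, under stated assumptions: operands are treated as unsigned (the thesis designs may handle signed words differently), the function name `serial_multiply` and its parameters are introduced here, and no recoding, pipelining or rounding is modelled.

```python
def serial_multiply(coeff, data, width=12, digit_size=1):
    """Shift-and-add multiply, consuming `digit_size` data bits per cycle.

    Returns (product, cycles). Unsigned operands are assumed.
    """
    if width % digit_size != 0:
        raise ValueError("width must be a multiple of digit_size")
    mask = (1 << digit_size) - 1
    product = 0
    cycles = 0
    for shift in range(0, width, digit_size):
        digit = (data >> shift) & mask        # next least-significant digit
        product += (coeff * digit) << shift   # add the weighted partial product
        cycles += 1                           # one digit is consumed per cycle
    return product, cycles

# Example operands chosen arbitrarily; both fit in 12 bits.
bit_serial = serial_multiply(2748, 1365, digit_size=1)
digit_serial = serial_multiply(2748, 1365, digit_size=2)
```

With a digit size of two the loop runs 6 times instead of 12 for the same product, which mirrors the roughly doubled throughput reported for the digit-serial design; in hardware each cycle would need more logic per step, which is one plausible source of the extra area.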
In relation to the objectives defined in Chapter 1 and the approach used for this
research work, this thesis presents:
• A study of area-speed trade-offs and parallelism in processor architectures using
multiplier designs. Bit-serial design with pipelining and radix-n recoding
techniques are stressed. A detailed comparison of the performance of the multiplier
algorithms studied is given in Section 6.3.
• A VHDL synthesis based design methodology for digital circuits.
• A detailed performance comparison of Xilinx 4010 FPGA implemented multiplier
algorithms.
• A comparison of two of the most popular VHDL synthesis based Electronic
Design Automation tools.
7.2 Further Work
Suggestions for some further work in relation to this thesis are given below:
• Variable CWL implementation for the multipliers can make them more flexible.
Also radix-n recoding techniques can be used for array multipliers. This needs
further investigation.
• HDL description plays a vital role in shaping up the final product. Verilog has
recently been standardized (IEEE 1364-1995). It is possible to implement the
multiplier designs using Verilog synthesis.
• Different ASIC technologies give different performances. Full custom or some
other FPGA implementation of the designs can be done to explore better results.
This needs target library support in the synthesis tool.
[1] P. Denyer and D. Renshaw, "VLSI Signal Processing: A Bit-Serial Approach",
Addison-Wesley Publishing Company, 1989.
[2] Ben Cohen, "VHDL Coding Styles and Methodology", Kluwer Academic
Publishers, 1995.
[3] P. Kurup and T. Abbasi, "Logic Synthesis Using Synopsys", Kluwer Academic
Publishers, 1995.
[4] Morris Mano, "Digital Logic and Computer Design", Prentice Hall, 1979.
[5] P.J. Ashenden, "The VHDL Cookbook", First edition, Dept. of Computer
Science, University of Adelaide, Australia.
[6] R. Hartley and P. Corbett, "Digit-Serial Processing Techniques", IEEE Trans.
on Circuits and Systems, Vol. 36, No. 6, pp. 707-719, June 1990.
[7] K.K. Primlani and J.L. Meador, "A Nonredundant-Radix-4 Serial Multiplier", IEEE
Journal of Solid-State Circuits, Vol. 24, No. 6, pp. 1729-1736, Dec. 1989.
[8] R.F. Lyon, "Two's Complement Pipeline Multipliers", IEEE Trans. on
Communications, pp. 418-425, April 1976.
[9] S.L. Freeny, "Special-Purpose Hardware for Digital Filtering", Proceedings of
the IEEE, Vol. 63, pp. 633-648, April 1975.
[10] P. Cappello, A. LaPaugh and K. Steiglitz, "Optimal Choice of Intermediate
Latching to Maximize Throughput in VLSI Circuits", IEEE Trans. on Acoustics,
Speech, and Signal Processing, Vol. ASSP-32, No. 1, pp. 28-33, Feb. 1984.
[11] R.J. Petersen and B.L. Hutchings, "An Assessment of the Suitability of FPGA-
Based Systems for Use in Digital Signal Processing", Brigham Young University,
Dept. of ECE.
[12] S.G. Smith, M.S. McGregor and P.B. Denyer, "Techniques to Increase the
Computational Throughput of Bit-Serial Architectures", IEEE Proc. on Intl. Conf.
on Acoustics, Speech and Signal Processing, pp. 543-546, Apr. 1987.
[13] P. Ienne and M.A. Viredaz, "Bit-Serial Multipliers and Squarers", IEEE Trans.
on Computers, Vol. 43, No. 12, pp. 1445-1450, Dec. 1994.
[14] L. Kuhnel and Hartmut Schmeck, "A Closer Look at VLSI Multiplication",
Integration - the VLSI Journal, Vol. 6, No. 3, pp. 345-359, Sept. 1988.
[15] R. Nagalla, "Synthesis of Digital Signal Processing Systems Using Pipelined
Bit-Serial Arithmetic", M.Sc. Thesis, University of Calgary, Nov 1991.
[16] R.F. Tinder, "Digital Engineering Design - A Modern Approach", Prentice Hall,
1991.
[17] N.H.E. Weste and K. Eshraghian, "Principles of CMOS VLSI Design - A System
Perspective", Addison-Wesley Publishing Company, 1993.
[18] "The Programmable Gate Array Data Book", XILINX Inc., 1994.
[19] Workview Plus, "Workview PLUS for Windows: VHDL Reference Manual",
Viewlogic Systems Inc., 1995.
[20] Viewlogic, "VHDL Designer User's Guide and Tutorial", Viewlogic Systems
Inc., 1992.
[21] Viewlogic, "VHDL Reference Manual for Synthesis", Viewlogic Systems Inc.,
1992.
[22] Viewlogic, "Viewsim/SD User's Guide", Viewlogic Systems Inc., 1992.
[23] Synopsys, "SYNOPSYS v3.4b Online Documentation", Synopsys Inc., 1996.
[24] E. Horbst, "Advances in CAD for VLSI - Volume 2: Logic Design and
Simulation", North-Holland, 1986.
[25] Arpad Barna, "VHSIC Technologies and Tradeoffs", John Wiley & Sons, 1981.
[26] Geoff Bostock, "Programmable Logic Devices: Technology and Application",
McGraw-Hill Book Company, 1988.
[27] Mark G. Sobel, "UNIX System V: A Practical Guide", Third edition, The
Benjamin/Cummings Publishing Company, Inc., 1995.
[28] L.J. Giacoletto, "Electronics Designers' Handbook", Second edition,
McGraw-Hill Book Company, 1977.
[29] A.P. Malvino, "Digital Computer Electronics: An Introduction to
Microcomputers", McGraw-Hill Book Company, 1983.
[30] M. Goossens, F. Mittelbach and A. Samarin, "The LaTeX Companion",
Addison-Wesley Publishing Company, 1994.
[31] "System Design and Rapid Prototyping Training Workbook", Canadian
Microelectronics Corporation, May 1996.
[32] D.E. Thomas and P. Moorby, "The Verilog Hardware Description Language",
Second Edition, Kluwer Academic Publishers, 1995.
[33] E.S. Yang, "Microelectronic Devices", McGraw-Hill, Inc., 1988.
APPENDIX
A XC-4000 CLB
Fig A.1 shows a simplified block diagram of the XC4000-series Configurable Logic
Block [18].
Figure A.1. XC4000-series CLB
B VHDL Description
The following VHDL code gives an example of design description at the highest
level of hierarchy. It is a structural description which uses concurrent component
instantiation. The behavioral description of each component is given in separate
files.
--------------------------------------
-- File        : "twelbitxilinx.vhd"
-- Description : A structural VHDL description for a 12 bit full product
--               Radix-4 non-redundant bit-serial multiplier. This is an
--               example showing top level design description.
-- Author      : Khem Pokhrel
--------------------------------------
library ieee, rad4lib, merolibrary;
use ieee.std_logic_1164.all;
use rad4lib.all;
use merolibrary.hardpack.all;

entity twelbitxilinx is
  port (coeffi, datai : in std_logic_vector(11 downto 0);
        clr, clock : in std_logic; c0, clatch : out std_logic;
        productout : out std_logic_vector(22 downto 0));
end twelbitxilinx;

architecture design of twelbitxilinx is
  signal n1, n2, n3, n4, n5, n6 : std_logic;
  -- Bit-serial multiplier core
  component twelbitfphard
    port (coeffi, datai, cntrli, clock : in std_logic;
          prodhigh, cntrlo : out std_logic;
          product : out std_logic_vector(10 downto 0));
  end component;
  -- Parallel to serial converter
  component partoser
    port (n : in std_logic_vector(11 downto 0);
          clk, ld : in std_logic; sout : out std_logic);
  end component;
  -- Control signal generator
  component controlgen
    port (clk, clear : in std_logic; c0, c1 : out std_logic);
  end component;
  -- Serial to parallel converter
  component sertopar
    port (clk, sin, sel : in std_logic;
          prod : out std_logic_vector(11 downto 0));
  end component;
begin
  comp1: partoser
    port map (n => coeffi, clk => clock, ld => n1, sout => n3);
  comp2: partoser
    port map (n => datai, clk => clock, ld => n1, sout => n4);
  comp3: controlgen
    port map (clk => clock, clear => clr, c0 => n1, c1 => n2);
  c0 <= n1;
  comp4: twelbitfphard
    port map (coeffi => n3, datai => n4, cntrli => n2,
              clock => clock, product => productout(10 downto 0),
              cntrlo => n6, prodhigh => n5);
  comp5: sertopar
    port map (clk => clock, sin => n5, sel => n6,
              prod => productout(22 downto 11));
  clatch <= n6;
end design;

-- The following configuration is for simulation
library ieee, rad4lib, merolibrary;
use ieee.std_logic_1164.all;
use rad4lib.all;
use merolibrary.hardpack.all;
configuration twelbitxilinxcfg of twelbitxilinx is
  for design
    for all: partoser use entity merolibrary.partoser(design);
    end for;
    for comp3: controlgen use entity merolibrary.controlgen(design);
    end for;
    for comp4: twelbitfphard use entity rad4lib.twelbitfphard(design);
    end for;
    for comp5: sertopar use entity merolibrary.sertopar(design);
    end for;
  end for;
end twelbitxilinxcfg;
C Design Compiler Script
The following is an example of a single script (collection of design flow commands)
used for analyzing VHDL, elaborating into intermediate Synopsys format, setting
constraints and compile methodology, pad allocation, synthesizing the design,
generating relevant reports and implementing into the Xilinx FPGA part. This demonstrates
the automated design flow in Synopsys tools.
/****************************************************/
/* File        : Rad4max-speed-script */
/* Description : Synopsys Design Compiler script for analyzing VHDL */
/*               files, constraints setting, synthesizing, pad allocation */
/*               and implementation. Used for FPGA compiler */
/* Author      : Khem Pokhrel */
/****************************************************/
/* Removing all the designs from the Design Analyzer */
remove_design -all
/* Defining a design library mapped to ./rad4lib directory */
define_design_lib rad4lib -path ./rad4lib
designer = "Khem Pokhrel"
company = "Dept. of Elec. Engg., U of C"
/* Analyzing the Top level Designs and putting the intermediate files */
/* in rad4lib. Default WORK is mapped to this library. */
analyze -format vhdl fmodfp.vhd -lib rad4lib
analyze -format vhdl midmodfp.vhd -lib rad4lib
analyze -format vhdl lasmodfp.vhd -lib rad4lib
analyze -format vhdl twelbitfp.vhd -lib rad4lib
/* Elaborating the Designs */
/* elaborate fmodfp -library rad4lib */
/* elaborate midmodfp -library rad4lib */
/* elaborate lasmodfp -library rad4lib */
/* Only the Top level design needs to be Elaborated */
elaborate twelbitfp -library rad4lib
/* Checking Design */
/* Uniquifying the multiple instances of subdesigns in the */
/* hierarchy of Top level design and un-grouping to remove hierarchy */
set_flatten
/* Set the synthesis design constraints. And compile for Fastest Design */
remove_clock -all
create_clock clock -period 30
/* Add pads to the design. Make sure the current design is the toplevel module. */
set_port_is_pad "*"
set_pad_type -slewrate HIGH all_outputs()
insert_pads
/* Setting Attributes to the PADs according to the XC4010pg191-6 prototype */
/* Implementation Board */
set_attribute coeffi "pad_location" -type string "B1"
set_attribute datai "pad_location" -type string "C1"
set_attribute cntrli "pad_location" -type string "D1"
set_attribute clock "pad_location" -type string "B2"
set_attribute coeffo "pad_location" -type string "F16"
set_attribute datao "pad_location" -type string "H16"
set_attribute cntrlo "pad_location" -type string "K16"
set_attribute prodhigh "pad_location" -type string "L16"
set_attribute "prodlow<5>" "pad_location" -type string "D18"
set_attribute "prodlow<4>" "pad_location" -type string "D17"
set_attribute "prodlow<3>" "pad_location" -type string "E18"
set_attribute "prodlow<2>" "pad_location" -type string "F18"
set_attribute "prodlow<1>" "pad_location" -type string "G18"
set_attribute "prodlow<0>" "pad_location" -type string "G17"
/* Check Design Before Compiling */
check_design
/* Compile the design with optimization across hierarchical boundaries */
compile -map_effort high -verify -verify_effort low -boundary_optimization
/* Write report files */
report_fpga > "./db/max-speed/twelbitfp.fpga"
report_timing > "./db/max-speed/twelbitfp.fpga-timing"
report_area > "./db/max-speed/twelbitfp.fpga-area"
/* Saving the design */
write -format db -hierarchy -output "./db/max-speed/twelbitfp.fpga.db"
/* Replacing cells with gate level equivalents */
replace_fpga - g r o u p ~ -group-tlus
/* Report gate level usage */
report_area > "./db/max-speed/twelbitfp.gate-area"
report_timing > " ./db/max-speed/[email protected]
/* Write the gate level design */
write -format db -hierarchy -output "./db/max-speed/twelbitfp.gate.db"
write -format vhdl -hierarchy -output "./gatesim/max-speed/twelbitfp.gate.vhd"
/* Set the part type */
set_attribute twelbitfp "part" -type string "4010pg191-6"
/* Optional attribute to remove the FPGA Compilers mapping to CLBs and IOBs */
/* from all levels. This removes the BLKNM parameters while writing XNF net-list. */
set_attribute find(design,"*") "xnfout_use_blknames" -type boolean FALSE
/* Keeping XNF in the Xilinx directory. */
write -format xnf -hierarchy -output " ./xilinx/max-spad/[email protected]"
cd xilinx/max-speed
sh xmake twelbitfp
/* Directing the max clock frequency info from XDELAY program to a file */
sh xdelay -w twelbitfp.lca > twelbitfp.dly
/* Writing long delay report in a file and writing design file with net delay */
sh xdelay -w -x -O twelbitfp.opt [email protected]
Figures A.2 and A.3 show the schematic generated after synthesis and the simulation
waveform for the non-redundant radix-4 design.
Figure A.2. Schematic Diagram