
University of Calgary

PRISM: University of Calgary's Digital Repository

Graduate Studies Restricted Theses

1997

A comparison of bit-serial multipliers using VHDL based logic synthesizers

Pokhrel, Khem Chandra

Pokhrel, K. C. (1997). A comparison of bit-serial multipliers using VHDL based logic synthesizers (Unpublished master's thesis). University of Calgary, Calgary, AB. doi:10.11575/PRISM/21986

http://hdl.handle.net/1880/26790

master thesis

University of Calgary graduate students retain copyright ownership and moral rights for their

thesis. You may use this material in any way that is permitted by the Copyright Act or through

licensing that has been assigned to the document. For uses that are not allowable under

copyright legislation or licensing, you are required to seek permission.

Downloaded from PRISM: https://prism.ucalgary.ca

THE UNIVERSITY OF CALGARY

A Comparison of Bit-Serial Multipliers Using

VHDL Based Logic Synthesizers

A THESIS

SUBMITTED TO THE FACULTY OF GRADUATE STUDIES

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE

DEGREE OF MASTER OF SCIENCE

DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

CALGARY, ALBERTA

JANUARY, 1997

National Library of Canada / Bibliothèque nationale du Canada

Acquisitions and Bibliographic Services, 395 Wellington Street, Ottawa ON K1A 0N4, Canada

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of his/her thesis by any means and in any form or format, making this thesis available to interested persons.

The author retains ownership of the copyright in his/her thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.


ABSTRACT

A performance comparison of five multiplier algorithms is done using a VHDL synthesis based design flow. Ordinary pipelined bit-serial, digit-serial, and parallel data flow, and radix-n recoding are considered in the computational algorithms for the multipliers. Design size, throughput, and area-time product are considered as the main performance measures. Synopsys and Powerview tools are used for design synthesis. Compatible VHDL is used for design description in Powerview and Synopsys. Synthesized designs are implemented on a Xilinx 4010 field programmable gate array. Implementation results are used for the performance comparison of the multiplier algorithms. The VHDL based logic synthesis tools are compared based on the design performances.

The digit-serial multiplier is found to perform the best among the considered algorithms. Respectable results of the serial multipliers in general stress their usefulness and the need to improve their computational throughput. Results also show a considerable difference in performance of these VHDL synthesis tools.

ACKNOWLEDGEMENTS

I would like to express my sincere gratitude to Dr. G. S. Hope, whose constant support, guidance, encouragement and constructive criticism has made this work possible.

I am also grateful to Dr. L. E. Turner for his invaluable guidance, encouragement and suggestions for this project.

I appreciate the financial support provided by Nepal Engineering Education Project which allowed me to go through the M.Sc. program.

I thank Warren Flaman, Technical Support ECE Dept., for his help with the Xilinx implementation board.

My sincere thanks to Dr. Soorya Kuloor for all his help in the Energy Management Lab. I would like to thank Joseph Provine, Gokaraju Ramakrishna, Sridhar Krishnan, and AN^ Das for their suggestions and help with formatting this thesis in LaTeX.

CONTENTS

APPROVAL PAGE
ACKNOWLEDGEMENTS
DEDICATION
TABLE OF CONTENTS
LIST OF TABLES
LIST OF ABBREVIATION

1. INTRODUCTION
   1.1 Background
   1.2 Present Trends : High Level Logic Synthesis
   1.3 Motivation
   1.4 Thesis Objectives
       1.4.1 Approach
   1.5 Thesis Organization

2. DIGITAL SYSTEM DESIGN APPROACHES
   2.1 Introduction
   2.2 Architectural Alternatives
   2.3 Serial Design Techniques
   2.4 Performance Measures
   2.5 Serial vs Parallel
   2.6 Radix-N Recoding
   2.7 Summary

3. MULTIPLIER DESIGN
   3.1 Introduction
   3.2 Multiplication Algorithm
   3.3 Functional Units
       3.3.1 Latch
             3.3.1.1 Latch2bit
       3.3.2 Serial Adder
   3.4 Ordinary Bit-Serial Multiplier
   3.5 Radix-4 Bit-Serial Multiplier
   3.6 Radix-8 Bit-Serial Multiplier
   3.7 Digit-Serial Multiplier
   3.8 Parallel-Array Multiplier
   3.9 Summary

4. DESIGN SYNTHESIS
   4.1 Introduction
   4.2 VHDL for Synthesis
       4.2.1 VHDL Constructs support
   4.3 Viewlogic Tools
       4.3.1 Synthesis Criteria
   4.4 Synopsys Tools
       4.4.1 Constraints and Design Optimization
             4.4.1.1 Constraints
             4.4.1.2 Optimization
   4.5 Summary

5. DESIGN IMPLEMENTATION
   5.1 Introduction
   5.2 CMOS Logic
   5.3 ASIC Technologies and Programmable Devices
       5.3.1 FPGAs
       5.3.2 Standard cell and Full custom design
   5.4 XACT Tools
   5.5 Summary

6. RESULTS
   6.1 Objectives
   6.2 Implementation Results
   6.3 Discussion & Conclusions

7. CONCLUSION AND FUTURE WORK
   7.1 Conclusions
   7.2 Further Work

REFERENCES

APPENDIX
   A XC-4000 CLB
   B VHDL Description
   C Design Compiler Script

LIST OF TABLES

2.1 Non-Redundant Radix-N Recoding Scheme
2.2 Redundant Radix-4 Recoding
6.1 Results from Powerview Synthesizer and XACT tools
6.2 Results from Synopsys Synthesizer and XACT tools

LIST OF FIGURES

1.1 Synthesis Based Design Flow
2.1 Generic Architectures
2.2 Serial Data Flow
    Typical Bit-Serial Multiplier Module
    Latch
    Latch2bit
    Bit-Serial Carry Save Adder
    Ordinary Bit-Serial Multiplier
    Radix-4 Bit-Serial Multiplier
    Radix-8 Bit-Serial Multiplier
    Digit-Serial Multiplier
    Parallel Array Multiplier
4.1 A VHDL Hardware Model
4.2 Design Flow with Viewlogic Tools
4.3 Design Flow with Synopsys Tools
5.1 The Acronym Tra
5.2 Implementation Flow in XACT tools
6.1 Design Size in Powerview tool
6.2 Throughput Rate in Powerview tool
6.3 Area Time Product in Powerview tool
6.4 Design Size in Synopsys tool
6.5 Throughput Rate in Synopsys tool
6.6 Area Time Product in Synopsys tool
6.7 Comparative Design Size
6.8 Comparative Throughput Rate
6.9 Comparative Area Time Product
A.1 XC4000-families CLB
A.2 Schematic Diagram
A.3 Simulation Results

LIST OF ABBREVIATIONS

ASIC      Application Specific Integrated Circuit
AT        Area Time
CAD       Computer Aided Design
CAE       Computer Aided Engineering
CLB       Configurable Logic Block
CMOS      Complementary Metal Oxide Semiconductor
CWL       Coefficient Word Length
DC        Design Compiler
DFIRST    Digit-serial implementation of FIRST
DRC       Design Rule Check
DTL       Diode Transistor Logic
DWL       Data Word Length
ECL       Emitter Coupled Logic
EDA       Electronic Design Automation
EEPROM    Electrically Erasable Programmable ROM
EPROM     Electrically Programmable ROM
FIRST     Fast Implementation of Real-time Signal Transforms
FPGA      Field Programmable Gate Array
FTGS      Full Timing Gate-level Simulation
GUI       Graphical User Interface
GVAN
HDL       Hardware Description Language
IC        Integrated Circuit
I/O       Input Output
LOGSIM    Logic Simulator
LSB       Least Significant Bit
LSD       Least Significant Digit
LSI       Large Scale Integration
MS        Most Significant
MSB       Most Significant Bit
MSD       Most Significant Digit
MSI       Medium Scale Integration
MUX       Multiplexor
PAL       Programmable Array Logic
PLD       Programmable Logic Device
PPR       Partition Placement & Route
PPSI      Partial Product Sum Input
PPSO      Partial Product Sum Output
PROM      Programmable ROM
RAM       Random Access Memory
RC        Resistance Capacitance
ROM       Read Only Memory
RTL       Register Transfer Level
SCL       Simulation Control Language
SRAM      Static RAM
SSI       Small Scale Integration
TTL       Transistor Transistor Logic
VHDL      VHSIC HDL
VHDLDBX   VHDL Debugger Simulator
VHSIC     Very High Speed IC
VSS       VHDL System Simulator
WPBS      Word Parallel Bit Serial
XACT      Xilinx Automated CAE Tools
XNF       Xilinx Net-list Format

CHAPTER 1

INTRODUCTION

1.1 Background

The computer aided high level digital circuit design process consists of successive transformations from higher to lower levels of circuit description. In general a Computer Aided Design (CAD) process contains different description domains bound together by some relationship as defined by the software tools in question. The three major description domains are:

• Behavioral domain,

• Structural domain and

• Physical domain.

Within these domains, the levels of abstraction are:

• System level

• Architectural or Functional level

• Logic level and

• Physical Layout or Circuit level

The design may be converted from concept to architecture, to logic and memory, to the circuit and hence to the physical layout. The efficiency of design conversion depends on the consistency of a software tool in all three description domains and at all relevant levels of abstraction. This is measured in terms of performance parameters such as:

• Speed of operation,

• Chip area,

• Ease of test generation and testability.

1.2 Present Trends : High Level Logic Synthesis

Advances in Integrated Circuit technology have given us the ability to design and manufacture large Integrated Circuits. But these advancements have increased the number of parameters that a designer must deal with in order to realize a high-quality, commercially viable design. Smaller geometries of semiconductor technologies, which enabled larger and faster designs, have also significantly impacted the variables that affect timing in designs [17], [33]. This has increased the number of design alternatives that need to be considered.

The widely adopted schematic capture based Integrated Circuit (IC) design approach of the eighties has limitations that increase with the complexity of the design. However, with major advances in Electronic Design Automation, a relatively automated means of high level design with Hardware Description Languages and Logic Synthesis has emerged. The transition to a Hardware Description Language (HDL) based design approach has enabled a substantial increase in productivity. Sometimes this is quantified as the number of gates per engineer per day. HDL synthesis based methodology focuses more on logic verification than on the detailed gate level implementation used in schematic capture based design. This helps the designer verify his concepts in a short period of time without going into net-list generation and layout stages.

At the heart of this transition to HDL-based design lies "Logic Synthesis". Synthesis is the iterative process aimed at transforming functionality initially described in HDL, to achieve an optimized technology-specific net-list. A typical synthesis process includes both reading the source code and optimizing this code. Optimization is a step in the synthesis process which ensures the best possible combination of library cells to meet the functional, area and speed requirements of the target design.

Hardware Description Languages and Logic Synthesis have a profound impact on the design process and the final outcome of the design. The emerging flavor in HDL synthesis based design is Behavioral Synthesis Methodology. In this technique the design specification and functionality are described in algorithmic constructs of Very-high-speed-integrated-circuit Hardware Description Language (VHDL) or Verilog. Other high level languages like C are also being used in design description in hardware-software co-design techniques. These C codes are then converted into behavioral VHDL or Verilog and synthesis is carried out. The motivations behind Behavioral Synthesis Methodology are:

• Smaller Code : The Behavioral VHDL or Verilog code is about 10% of its Register Transfer Level (RTL) structural code.

• Faster Simulation : The Behavioral Simulation is very fast compared to RTL simulation.

• Shorter Design Cycle : Overall design time in this methodology is shorter.

A typical high level design flow based on synthesis involves the steps depicted in Fig 1.1. It begins with a sketch of the functional specification of the design. This includes information about speed, area, system Input Output (I/O), word-lengths, control flow and data transfer, and the overall function of the design. This functionality is then described in a Hardware Description Language as a behavioral, structural or mixed code. This code is analyzed and simulated for its functional verification as specified earlier. If the code meets the functional requirement it is then synthesized or compiled into target technology library components. This step is key to the HDL synthesis based design flow. Several factors involved in synthesis are described in detail in chapter 4. A gate level net-list is produced as a result of synthesis. This is simulated to verify the required functionality. After gate level simulation, the design is transformed into the physical domain using floor-planning or place & route tools. To achieve the optimal performance, the actual delay and other circuit information is supplied back from the place & route tool to the synthesizer and the design is re-synthesized. As shown in Fig 1.1 the steps involved are usually iterative and are carried out until the design meets the functional requirements and initial specifications. This process has an impact not only on the final net-list generated, but also on other issues including HDL coding styles (constructs), design partitioning, net-integration and CAD methodology.

Figure 1.1. Synthesis Based Design Flow

There are various high level digital logic synthesis tools, which incorporate various HDLs like VHDL, Verilog, Abel, FIRST, DFIRST, LOGSIM etc. The most commonly used HDLs today are VHDL and Verilog. Different tools have different kinds of user interface, design optimization and constraints setting techniques, and other design time flexibility like technology transfer (from one Application Specific Integrated Circuit (ASIC) to another). Among these, Synopsys and Viewlogic synthesis tools are the most popular, while VHDL is the present day HDL standard. Verilog is an emerging HDL.

With the increasing complexities and multiple features of these software tools, the general points of merit of the tool and HDL in use depend on its ability to :

1. facilitate modular design (to handle complex designs),

2. provide self documenting description,

3. facilitate parameters and constraints setting as required by the design,

4. incorporate better ASIC technologies for test implementation,

5. optimize the design,

6. provide design and architecture independent flexibility,

7. retarget a given design to new semiconductor technology, and

8. facilitate easy reuse of technology independent designs described in HDL.

1.3 Motivation

The HDL based high level digital design approach is an important research area in Integrated Circuit technology. Many design tools involving HDL based logic synthesis have been developed and used by IC designers. A study of these tools has revealed the following points :

• In the quest of upgrading the Electronic Design Automation (EDA) tool capabilities to realize the full potential of HDL based design, the whole process of synthesis has become increasingly complex.

• Area and Speed of the resulting design has always played a key role in determining the productivity of the tool. This is important when selecting a design tool for smaller and faster circuits.

• Tests for the synthesizing capability of different systems of tools for the same HDL based design have not been carried out sufficiently to compare their performances.

• Some of the tools are architecture specific and some offer very little of the parameter constraining features needed for optimized design.

Effective design methodology utilizing an optimal combination of design tools is

A lot of research work has been done [6], [7], [8], [10], [12] to investigate ways to increase the computational throughput of bit-serial design. Various algorithms have emerged as a result of these efforts. The importance of bit-serial multipliers for their superior area-time product compared to parallel array design [6], [12] is well known. With the advent of better technologies and newer design methodologies it is worthwhile to reconsider serial-multiplier designs.

These observations motivated this research project.

1.4 Thesis Objectives

The main objective of the research is to obtain a graphical comparison of the performances of various multiplier algorithms. Design size and speed are considered as performance measures because:

• They are the major performance parameters of a digital circuit which can give us a standard comparison of the design algorithms and CAD tools in use. The area of diffusion (of the device) and the speed of operation are also related with the physical circuit parameters such as :

  - Resistance Capacitance (RC) delay estimation,

  - Power consumption,

  - Packaging issues and

• In a good design approach, determination of correct operation and adequate performance involves simulating the circuit at all appropriate design corners. These corners are worst-power and worst-speed [17], [BI, [XI, [28].

Thus it is evident that an area-time or space-speed comparison should be done. Also, the general goal of this thesis is to compare how different VHDL based digital hardware synthesizers perform in producing digital logic design. This research focuses on the two most popular HDL based (VHDL) logic synthesis tools, Viewlogic and Synopsys. These tools are chosen because of their popularity, flexibility, and parameter constraining facilities. The performance comparison is focused on the ability of these tools to generate different digital circuits designed from different architectural approaches. The different architectures considered here are parallel, bit-serial, and digit-serial.

1.4.1 Approach

1. Bit-serial multipliers with different algorithms, and a parallel array multiplier, are designed using VHDL description. These designs are simulated for their functionality and used as the test benchmarks for the performance comparison.

2. These designs are implemented on a Xilinx Field Programmable Gate Array (FPGA). Results from the hardware implementation are tabulated and compared.

3. Various multiplier performances are compared among themselves and across the major design tools Viewlogic and Synopsys.

1.5 Thesis Organization

Chapter 2 gives a general overview of the parallel and serial design techniques. A comparison of these techniques is presented. Pipelining of serial data paths, internal radix-n recoding and performance measures are discussed.

Design details of the various multipliers used in this project are given in chapter 3. A modular hierarchical development approach from basic functional units is stressed.

Chapter 4 presents the high level multiplier design synthesis using VHDL description. Use of VHDL for synthesis is discussed. The design methodology and steps involved with Viewlogic and Synopsys tools are given. Synthesis criteria and design optimization capabilities of these tools are discussed.

Chapter 5 discusses different logic types. A brief overview of several implementation styles, programmable logic and ASIC technologies is presented. The design implementation flow for Xilinx Automated Computer-aided-engineering Tools (XACT) is given.

Chapter 6 tabulates the results obtained from design implementation. Graphical representations for various design performances are presented. Comparative analysis of the results is given and some conclusions are drawn.

Overall conclusions are drawn and thesis contributions are presented in chapter 7. Some suggestions for further work are also given.

CHAPTER 2

DIGITAL SYSTEM DESIGN APPROACHES

2.1 Introduction

The primary concern of a logic designer is to achieve specific real-time sample rates using adequate but modest hardware. Algorithm & architecture choice limits system speed and actual area of silicon implementation, because the arithmetic elements have a fixed maximum operating speed according to the target technology used.

This chapter describes different approaches to digital systems design. Modifications of the general serial design method to investigate ways to increase computational throughput are given the main focus. Binary arithmetic, bit-sampling and modular development are also discussed. Internal bit regrouping and its interpretation for computational steps is discussed.

The characteristic features of parallel and serial approaches are summarized here. A comparison of the parallel and serial design methods is presented. The performance criterion of the resultant hardware is given in detail.

2.2 Architectural Alternatives

In general, the classification of the implementation style of a digital signal processing algorithm [1],[6] is governed by the number of bits processed simultaneously. The various architectural alternatives are described below.

Fig 2.1 shows a generic bit-parallel and bit-serial architecture. In a bit-parallel design all the input bits of the sample word are read in a single clock cycle, and the output bits are produced together. In a bit-serial design, input bits arrive on a single wire over a number of clock cycles and output bits are produced serially. The faster processing speed of parallel designs is achieved at the cost of increased routing complexity, propagation delays within operators and a large chip area.

Digit-serial systems process more than one bit at a time. This is done by dividing the N-bit word into X different digits, each Y bits wide, where N = X × Y. The main objective behind this approach is to increase the operator speed by a factor greater than its size. Similarly a design can be of mixed architecture type, depending on the mode of data bit transmission to and from external units. A Word Parallel Bit Serial architecture has a parallel interface to its external units, whereas it processes the data word one bit at a time. Similarly, Word Serial Bit Parallel architectures transmit the data word serially but the bits constituting the words are processed in parallel.
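The digit grouping described above can be sketched in a few lines of Python. This is only an illustrative model, assuming unsigned words and least-significant-digit-first ordering; the function names `split_into_digits` and `join_digits` are hypothetical and do not come from the thesis.

```python
def split_into_digits(word: int, n_bits: int, digit_bits: int) -> list[int]:
    """Split an unsigned n_bits-wide word into digits, least significant digit first."""
    if n_bits % digit_bits != 0:
        raise ValueError("word length N must be a multiple of the digit width Y")
    if not 0 <= word < (1 << n_bits):
        raise ValueError("word does not fit in n_bits")
    mask = (1 << digit_bits) - 1
    # X = N / Y digits, one digit transferred per clock cycle
    return [(word >> (i * digit_bits)) & mask for i in range(n_bits // digit_bits)]


def join_digits(digits: list[int], digit_bits: int) -> int:
    """Rebuild the word from its digits (inverse of split_into_digits)."""
    return sum(d << (i * digit_bits) for i, d in enumerate(digits))


# An 8-bit word sent as X = 4 digits of Y = 2 bits needs 4 clock cycles,
# compared with 8 cycles for a bit-serial transfer.
digits = split_into_digits(0b10110110, n_bits=8, digit_bits=2)
print(digits)  # [2, 1, 3, 2]
```

Setting `digit_bits=1` gives the bit-serial case and `digit_bits=n_bits` the bit-parallel case, which is one way to see the three styles as points on the same scale.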

Figure 2.1. Generic Architectures (bit-parallel multiplier with M-bit parallel input Bin; bit-serial multiplier with M-bit serial input Bin)

A digital design can be synchronous or asynchronous. In a synchronous system the sequence and time of operation are controlled by means of a system wide clock. The clock is key to the functional behavior of the synchronous system. The maximum clock rate depends on the delays in the signal paths. In asynchronous designs, the functional elements are driven by a sequence of operation and there is no global clock. Only synchronous designs are considered in this thesis.

Some of the key architectural terms [1],[8],[10] of digital systems are:

• Latency: This is the time required for the output bits to appear after the corresponding input bits are read. A critical path in the design is the path of operators with the largest total latency from a state input to a state output.

• Pipelining: Pipelining refers to the alternate interleaving of combinational logic and register elements in the structure. The result is passed from one stage to the other. It helps in realizing higher bit rates by making the depth of the combinational logic delay between pipeline registers small. Pipelining may seem to increase the overall latency of the design, but for continuous time operation the system can accept a new input every clock cycle. This increases throughput. In a pipelined structure a large number of algorithm iterations are processed simultaneously, all at various stages of completion.
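The trade-off between latency and throughput in the pipelining item can be illustrated with a small timing model. This is a sketch under simplifying assumptions: each stage has a fixed combinational delay, every pipeline register adds a fixed overhead, and the clock period is set by the slowest stage. The delay values are invented for illustration and are not measurements from the thesis.

```python
def unpipelined(stage_delays_ns):
    """All stages form one combinational path between two registers."""
    period = sum(stage_delays_ns)
    return {
        "clock_period_ns": period,
        "latency_ns": period,
        "throughput_msps": 1000.0 / period,  # one result per clock cycle
    }


def pipelined(stage_delays_ns, register_delay_ns):
    """A register after every stage: the clock is limited by the slowest stage."""
    period = max(stage_delays_ns) + register_delay_ns
    return {
        "clock_period_ns": period,
        "latency_ns": period * len(stage_delays_ns),
        "throughput_msps": 1000.0 / period,  # a new input is accepted every cycle
    }


# Hypothetical numbers: four 10 ns stages, 2 ns of register overhead per stage.
print(unpipelined([10, 10, 10, 10]))
print(pipelined([10, 10, 10, 10], register_delay_ns=2))
```

With these assumed numbers the latency grows from 40 ns to 48 ns, while the clock period drops from 40 ns to 12 ns, so the throughput rises by a factor of more than three.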

2.3 Serial Design Techniques

Communication and computational strategies make a bit-serial system different from its parallel counterpart. The following points describe design features [6],[7],[8] of bit-serial multipliers.

• Smaller Size: Recurrence of the basic operation in a multiplication algorithm allows hardware resources to be shared between the various operations. This leads to an area efficient implementation and high connectivity. In general they are N times smaller than their parallel counterparts. Also, the I/O pin requirement in a serial design is very small.

• Higher Bit Rate: Pipelining the signal paths at bit or digit level results in shorter propagation delays through the combinational logic in the operators. Hence a higher bit rate or faster clock implementation can be realized.

• Flexibility: For a given coefficient (multiplier) word length design, the multiplier size remains the same irrespective of the data (multiplicand) word length. This makes it flexible for different data word length implementations. In a parallel design, by contrast, size increases with both the coefficient and data word length.

• Simple Control: Data bits for each word take several clock cycles to propagate along the given signal path. So, a control signal indicating the Least Significant Bit or Digit arrival time is required in serial designs. This periodic signal and different delayed versions of it serve the control and synchronization purpose of the whole iteration.

• Modularity: Different types of serial architecture implementation can be done by using basic functional units like adders, multiplexors, shift registers, delays, and latches. This gives high modularity and easy design alterations.

• Rout-Ability: A serial architecture results in fewer signal nets or interconnects, which leads to better rout-ability of the design.

Fig 2.2 shows the serial communication aspects of the bit-serial and digit-serial (2-bit digit) architectures along with the control signal. Multiplication is a Least Significant Bit (LSB) first operation. A control pulse is coincident with the LS operand bit arrival time. The control cycle, or iteration time, is shorter in digit-serial data flow than in bit-serial.

Figure 2.2. Serial Data Flow
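The role of the LSB control pulse can be seen in a behavioral model of a bit-serial adder. The sketch below is an assumption-laden illustration in Python, not the thesis's VHDL: it models one full adder with a stored carry, processes words LSB first, and clears the carry whenever the control flag marks a new word. All function names are hypothetical.

```python
def to_serial(value, n_bits):
    """Return the bits of an unsigned value, least significant bit first."""
    return [(value >> i) & 1 for i in range(n_bits)]


def from_serial(bits):
    """Rebuild an unsigned value from LSB-first bits."""
    return sum(b << i for i, b in enumerate(bits))


def bit_serial_add(a_bits, b_bits, lsb_flags):
    """Add two LSB-first bit streams, one bit pair per clock cycle."""
    carry = 0
    out = []
    for a, b, lsb in zip(a_bits, b_bits, lsb_flags):
        if lsb:
            carry = 0  # control pulse: a new word starts, discard the old carry
        total = a + b + carry
        out.append(total & 1)  # sum bit leaves serially
        carry = total >> 1     # carry is stored for the next cycle
    return out


# Two 4-bit words sent back to back: (9 + 9) followed by (5 + 6).
n = 4
a_stream = to_serial(9, n) + to_serial(5, n)
b_stream = to_serial(9, n) + to_serial(6, n)
control = [1, 0, 0, 0] * 2  # pulse coincides with each LSB
result = bit_serial_add(a_stream, b_stream, control)
print(from_serial(result[:n]), from_serial(result[n:]))  # 2 11
```

The first sum overflows four bits (18 modulo 16 is 2). Without the control pulse, its leftover carry would leak into the next word and give 12 instead of 11.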

2.4 Performance Measures

Vasious parameters have been considaed by designers [l] ,[6] ,[12] ,[U] to meastue

the efficiency of pmcessor architectures. These measnres consider both the physical

a n a of implementation and the tirne reqomd for the computation. For a synchronous

18

system time marsurc is the numba of dock cycles qaind to process the dota bits

or the rate at which the dock can be run.

The @ormance of a pipeiined syncbronous systcm cm be m d by Latency

and Throaghput. Throaghput is the rate at which inpot data samples are nad and

output data samples aie produad. This is of higher importance for signal pmcessing

circuits and is also known as sample rate. It gives the speed mcasun of the design.

Throughput is proportional to the dock speed and iteration period of the circuits.

The ma)Eimum dock speed and hence throaghput is determioed by the depth of the

combinational circuits between pipelinhg registers.

Another typical measure is the area-time product. This measure gives equal weight
to area and time. This measure is used in this thesis to compare the performance of
different architectural implementations of multipliers in both Viewlogic and Synopsys
platforms. The reciprocal of the area-time product is sometimes considered as an
efficiency measure of the silicon implementation of the computational algorithm in
terms of amount of throughput per unit area.
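As a rough illustration of these measures, the sketch below computes throughput, the area-time product and its reciprocal. The function names and the numbers in the usage lines are hypothetical and are not results from this thesis; "time" is assumed to mean the time needed to process one sample.

```python
def throughput(clock_hz, iteration_cycles):
    """Samples per second when one sample is produced every iteration_cycles clocks."""
    return clock_hz / iteration_cycles


def area_time_product(area, clock_hz, iteration_cycles):
    """Area multiplied by the time needed to process one sample."""
    return area * iteration_cycles / clock_hz


def efficiency(area, clock_hz, iteration_cycles):
    """Reciprocal of the area-time product, i.e. throughput per unit area."""
    return 1.0 / area_time_product(area, clock_hz, iteration_cycles)


# Hypothetical design: 1000 area units, 50 MHz clock, 16 cycles per sample.
rate = throughput(50e6, 16)
at = area_time_product(1000, 50e6, 16)
eff = efficiency(1000, 50e6, 16)
```

Because both area and time enter as plain factors, halving either one improves the measure by the same amount, which is what "equal weight" means here.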

2.5 Serial vs Parallel

The principal drawback of a serial multiplier is that a product can only be
computed in time proportional to N. In a parallel multiplier, an N-bit product can be
computed in time independent of word length. Serial multipliers also have a high ratio
of storage to logic. So, despite the fact that the serial approach results in smaller
size, high connectivity, and higher bit rate, they yield in high speed applications to
their parallel counterparts. A lot of research work has been done to investigate
ways to increase the computational throughput of conventional bit-serial design
techniques [6],[7],[8],[12]. The digit-serial approach, twin-pipe architecture, and
various recoding techniques are some of the outcomes.

It is difficult to quantify the power consumption for different architectures. For
a Complementary Metal Oxide Semiconductor (CMOS) implementation the circuit
draws power only during logic transition. For a given system power consumption is
directly proportional to the clock speed. Serial designs run at higher clock speed but
in the circuit the logic transition depends on the bit patterns being processed.

The better Area-Time product of serial designs over parallel designs is a major
motivation of this research project.

2.6 Radix-N Recoding

Several recoding schemes have been introduced in an attempt to reduce the size
of the multiplier units [7],[8]. The trend is to regroup the incoming coefficient bits
in pairs and interpret them in a way that decreases the unfavorable ratio of storage
to logic. Tables 2.1 and 2.2 show various recoding schemes.

In non-redundant radix-N recoding, as shown in Table 2.1, only X different bits are
required where N different combinations of the radix-N representation are used,
whereas X+1 different bits are used in the case of redundant recoding. Table 2.2 shows
a redundant radix-4 recoding scheme. All the different bit groups are treated alike in
redundant recoding. The sign bit group is treated differently from the unsigned bit
groups in the case of the non-redundant recoding scheme.

As a simple case, in a two-bit-digit serial architecture the coefficient at each stage
can be recoded into radix-4. Similarly, radix-8 recoding helps in size reduction for a
three-bit-digit serial design.

For the computational steps in the algorithm, multiplication by a recoded value
of 4 is represented as two left shifts of the data bits. Similarly, addition of two left
shifts, one left shift and the data bit as it is represents multiplication by 7 (4 + 2 + 1).

Table 2.1. Non-Redundant Radix-N Recoding

  Ordinary (Radix-2), 1 bit
    Bit     Unsigned   Signed
    0          0          0
    1         +1         -1

  Radix-4, 2 bits
    Bits    Unsigned   Signed
    00         0          0
    01        +1         +1
    10        +2         -2
    11        +3         -1

  Radix-8, 3 bits
    Bits    Unsigned   Signed
    000        0          0
    001       +1         +1
    010       +2         +2
    011       +3         +3
    100       +4         -4
    101       +5         -3
    110       +6         -2
    111       +7         -1

Table 2.2. Redundant Radix-4 Recoding

Interpretation of a binary number into higher radices reduces the number of partial
products required to form a multiplication. This is shown in the following paragraphs.

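A two's complement binary multiplication of N-bit coefficient 'A' and M-bit data
word 'B' can be expressed as:

\[ A = -A_{N-1}\,2^{N-1} + \sum_{i=0}^{N-2} A_i\,2^{i} \]

and

\[ A \cdot B = -A_{N-1}\,B\,2^{N-1} + \sum_{i=0}^{N-2} A_i\,B\,2^{i} \qquad (2.1) \]

Multiplication based upon a non-redundant radix-4 interpretation of the coefficient
word is expressed as:

\[ A \cdot B = (-2A_{N-1} + A_{N-2})\,B\,4^{(N-2)/2} + \sum_{i=0}^{(N-4)/2} (2A_{2i+1} + A_{2i})\,B\,4^{i} \qquad (2.2) \]

Only N/2 partial products are to be computed in Radix-4 representation. The
partial products are:

\[ (2A_{2i+1} + A_{2i})\,B\,4^{i} \ \text{ for } i \in [0, (N-4)/2] \quad \text{and} \quad (-2A_{N-1} + A_{N-2})\,B\,4^{(N-2)/2} \]

Equation 2.2 can be shown to be equivalent to equation 2.1 by using
\(2^{2i} = 4^{i}\) and \(2^{2i+1} = 2 \cdot 4^{i}\) and rearranging terms.

Similarly, multiplication based upon a non-redundant radix-8 interpretation of the
coefficient word is expressed as:

\[ A \cdot B = (-4A_{N-1} + 2A_{N-2} + A_{N-3})\,B\,8^{(N-3)/3} + \sum_{i=0}^{(N-6)/3} (4A_{3i+2} + 2A_{3i+1} + A_{3i})\,B\,8^{i} \]

Only N/3 partial products are to be computed in this representation. These partial
products are:

\[ (4A_{3i+2} + 2A_{3i+1} + A_{3i})\,B\,8^{i} \ \text{ for } i \in [0, (N-6)/3] \quad \text{and} \quad (-4A_{N-1} + 2A_{N-2} + A_{N-3})\,B\,8^{(N-3)/3} \]

As above, Radix-8 notation can be shown to be equivalent to the ordinary two's
complement notation of equation 2.1 by using \(2^{3i} = 8^{i}\), \(2^{3i+1} = 2 \cdot 8^{i}\),
\(2^{3i+2} = 4 \cdot 8^{i}\) and rearranging terms.

With increasing Radix-n representation the recoding logic requirement becomes more
complex. This reduces the advantages of reduced area, reduced latency and simplicity.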
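The equivalence between the ordinary and the higher-radix interpretations can be checked numerically. The sketch below is an arithmetic model only, not the thesis hardware: the names `recode` and `multiply` are hypothetical, and the coefficient word length is assumed to be a multiple of the group size (1, 2 or 3 bits for radix-2, radix-4 and radix-8).

```python
def recode(coeff, n_bits, group):
    """Non-redundant radix-2**group digits of an n_bits two's complement word.

    Digits are returned least significant first. Every group is read as an
    unsigned value except the most significant one, which is read as signed.
    """
    if n_bits % group:
        raise ValueError("n_bits must be a multiple of group")
    pattern = coeff & ((1 << n_bits) - 1)
    mask = (1 << group) - 1
    digits = [(pattern >> s) & mask for s in range(0, n_bits, group)]
    if digits[-1] >= 1 << (group - 1):       # sign group: subtract 2**group
        digits[-1] -= 1 << group
    return digits


def multiply(coeff, data, n_bits, group):
    """Sum of the partial products digit * data * (2**group)**i."""
    radix = 1 << group
    return sum(d * data * radix ** i
               for i, d in enumerate(recode(coeff, n_bits, group)))


digits_r4 = recode(-5, 12, 2)   # six radix-4 digits for a 12-bit coefficient
digits_r8 = recode(-5, 12, 3)   # four radix-8 digits for the same coefficient
```

For a 12-bit coefficient the number of partial products drops from 12 to 6 (radix-4) and 4 (radix-8), at the price of digits such as 3 or 7 that need extra additions to form.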

2.7 Summary

An introduction to the serial and parallel architecture design approaches is given in
this chapter. Features of bit-serial multiplier design are presented. Performance
measures of synchronous systems are discussed. The implementation and advantages of
the radix-n recoding scheme for the coefficient word are given. A comparison of serial
and parallel designs is discussed.

CHAPTER 3

MULTIPLIER DESIGN

3.1 Introduction

This chapter describes multiplier design approaches. Details include the number
decomposition and algorithm development. A hierarchical design approach is
followed. The modular approach gives the flexibility to incorporate different word
length implementations with ease. Each design is an assembly of general functional
units. Input bit flow and control signal management in the pipelined data path are
also discussed in detail.

Hardware resource sharing and time scheduling of the events within the
architecture play a critical role in the overall efficiency of the serial design.
Various design techniques lead to different area-time measures. The advantage of
exploring various design techniques within the serial design approach is that
different designs may be suitable for different real time (implementation) performance
requirements. Five designs are considered in this thesis:

• Ordinary Bit-Serial Multiplier
• Radix-4 Bit-Serial Multiplier
• Radix-8 Bit-Serial Multiplier
• Two-Bit-Digit Digit-Serial Multiplier
• Parallel Array Multiplier

All the designs have three main modules. A 12-bit coefficient multiplier design is
used to compare all the design techniques (cases). The number of data bits is
independent of the size in all the designs as explained in section 2.3. However, longer
data words increase the latency of the design. The coefficient word length is increased
by increasing the number of middle modules.

Two's complement binary number representation is considered for all the multiplier
designs throughout this thesis. Only fixed point arithmetic is used in all the designs.

3.2 Multiplication Algorithm

Multiplication is a LSB first operation. The multiplication of Coefficient and Data
in serial fashion [1],[6],[7],[8] is presented in the following steps.

• Latch the serial coefficient bits input into the latches in succession.
• Regroup the latched bits according to the Radix-n recoding scheme in each stage.
  Form the partial product from the recoded values of the coefficient and the serial
  data bits as they arrive.
• Add the partial product in each stage. Shift this sum by the required number of
  clock cycles. Pass it to the next stage and concurrently sign extend the previous
  partial product for the required number of answer bits.
• Repeat the above steps in each stage.

Fig 3.1 shows a typical bit-serial multiplier module. A linear array architecture of
such modules becomes the whole design.

[Figure 3.1: typical bit-serial multiplier module, including the Truncation and Sign
Bit Extension block]

3.3 Functional Units

To achieve all the advantages of a modular design, the multipliers are designed as
a collection of general functional units. This provides flexibility in extending
the coefficient word length and also provides easy debugging and maintenance of the
VHDL code. The general functional units can be used in different multiplier designs as
a library unit. This section describes these functional units as pictorial
representations of their VHDL descriptions.

Latches are used to store the coefficient bits from the bit-serial or digit-serial
paths for the entire iteration period. A latch is composed of a two input multiplexor
and a bit delay (D flip-flop). Fig 3.2 shows a latch.

The select signal in the latch circuit is derived from the master control signal and
is one bit (or digit) time wide. This signal is refreshed for each iteration so that the
latched bit on the Dout pin is renewed. It is synchronized with the LSB or Least
Significant Digit (LSD) arrival time.

The delay unit used in the latch is also used for pipelining and matching the arrival
time of various signals in the design. A one bit delay unit is composed of a D type
flip-flop without clock enable or reset inputs. These modules are also used in the
truncation and sign bit extension block shown in Fig 3.1.

The latch2bit module is used in the truncation and sign bit extension block of the
two-bit-digit digit-serial multipliers. Fig 3.3 shows a block representation of this
unit.

Figure 3.2. Latch
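The latch behaviour described above (a two input multiplexor in front of a D flip-flop) can be modelled in a few lines. This is a behavioural sketch, not the thesis VHDL: the class name `Latch` is hypothetical, and it is assumed that a high select signal loads Din while a low select signal recirculates the stored bit.

```python
class Latch:
    """Two input multiplexor feeding a D flip-flop (behavioural model)."""

    def __init__(self):
        self.q = 0                       # flip-flop state, visible on Dout

    def clock(self, din, sel):
        """Advance one clock cycle and return the new Dout value."""
        self.q = din if sel else self.q  # sel=1 loads Din, sel=0 holds
        return self.q


latch = Latch()
serial_in = [1, 0, 0, 1]                 # bits arriving on Din
select = [1, 0, 0, 0]                    # one-bit-wide pulse at LSB time
held = [latch.clock(d, s) for d, s in zip(serial_in, select)]
```

The bit captured during the select pulse stays on Dout for the rest of the iteration, which is how a coefficient bit is kept available while the data bits stream past.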

3.3.2 Serial Adder

Serial adders are used in the design for the formation of the partial product sum.
Addition is a LSB first operation. A bit-serial adder is shown in Fig 3.4. It is made
from a two input multiplexor, a full adder and a delay unit.

During the first clock cycle the LSBs of both inputs are present on A and B.
During this period the control (sel) signal of the multiplexor is high, which sets the
carry input. The carry input is set to '0' at the beginning of summation and the LSB
time signal inhibits the carry resulting from the previous addition. The sum is produced
and the carry is delayed (or stored) for use in the next clock cycle with subsequent bits
of inputs A and B. This addition proceeds until all bits of the inputs A and B are
processed. The hardware size of this adder remains the same irrespective of the word
lengths of A and B.

Digit-serial adders are the same as bit-serial adders but have as many full adders
in the loop as the number of bits in the digit. All the bits in a digit are processed in
parallel. The number of clock cycles required to process a word decreases with
increasing digit width. A wider digit, however, increases the propagation delay through
the adder chain, which necessitates a decreased clock rate.

A serial adder can be converted into an adder/subtracter by complementing one of
the input bits and setting the carry input high. The parallel array multiplier uses full
adders and ripple carry adders (of different bit widths) as functional modules. They
are defined as functions in a VHDL package and called to make the final design.

Figure 3.4. Bit-Serial Carry Save Adder
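The serial addition described in this section can be sketched as follows. The model is an assumption-laden illustration rather than the thesis VHDL: the names `SerialAdder` and `add_words` are hypothetical, the word width is assumed to be a multiple of the digit width, and a digit width of 1 gives the bit-serial case.

```python
class SerialAdder:
    """LSB-first serial adder processing digit_bits bits per clock cycle."""

    def __init__(self, digit_bits=1):
        self.digit_bits = digit_bits
        self.carry = 0                   # carry held in the delay unit

    def clock(self, a, b, lsd):
        """Add one digit of each operand; lsd=1 marks the least significant digit."""
        cin = 0 if lsd else self.carry   # the LSD pulse inhibits the stale carry
        total = a + b + cin
        self.carry = total >> self.digit_bits
        return total & ((1 << self.digit_bits) - 1)


def add_words(x, y, width, digit_bits=1):
    """Add two width-bit two's complement words serially; result wraps to width bits."""
    adder = SerialAdder(digit_bits)
    mask = (1 << digit_bits) - 1
    result = 0
    for k in range(0, width, digit_bits):
        s = adder.clock((x >> k) & mask, (y >> k) & mask, lsd=(k == 0))
        result |= s << k
    if result >= 1 << (width - 1):       # reinterpret the bit pattern as signed
        result -= 1 << width
    return result


bit_serial_sum = add_words(5, -3, 8)                  # 8 clock cycles
digit_serial_sum = add_words(5, -3, 8, digit_bits=2)  # 4 clock cycles
```

The adder state is a single carry regardless of word length, which mirrors the remark that the hardware size does not depend on the word lengths of A and B.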

3.4 Ordinary Bit-Serial Multiplier

This section describes the design and functioning of ordinary bit-serial multipliers
[8]. They are called ordinary because there is no internal regrouping (and
recoding) of the operand bits.

Fig 3.5 shows the three modules (first, middle and last) of the ordinary bit-serial
multiplier. The operand bits are serially fed to the array of these modules.
The total number of modules is equal to the coefficient word length. Word length is
increased by adding middle modules in the chain.

The coefficient bits are latched into each module starting with the LSB in the first
module to the sign bit (MSB) in the last module. Each coefficient bit is interpreted
either as '1' or '0' in all the modules but as '-1' or '0' in the last module. So, the
partial product of a coefficient bit (multiply by '1', '0' or '-1') and the incoming
serial data bits does not require an adder in any of the modules. The adder used in each
module is for the summation of the time shifted partial product from the lower bit order
module and the partial product of this stage. The very first module does not need a
partial product sum input (PPSI), so the full adder is absent. The last module has a
serial adder/subtracter to take care of a two's complement negative number (coefficient).

Control

The control aspect is simpler in a serial multiplier than in its parallel counterpart.
A single signal indicating the arrival of the LSB and various delayed versions of it
serve the control purpose. The control timing synchronizes the operand bits and partial
product input arrivals at each module. Pipelining and/or latency change the arrival time.

The latency of the Ordinary Bit-Serial Multiplier is generalized as [(2*CWL)-1] cycles,
where CWL means Coefficient Word Length.

3.5 Radix-4 Bit-Serial Multiplier

Application of radix-4 (non-redundant) recoding to a serial multiplier design is
presented in this section. The goal behind the application of internal recoding is to
reduce the hardware size of the multiplier unit as well as to reduce latency. The
general idea behind this design is taken from [6],[7],[8].

Fig 3.6 shows the radix-4 recoded multiplier unit. The total number of modules is
half the number of coefficient bits. The operand bits are fed serially to the array of
these modules. Two bits of the coefficient word are latched into each module starting
with the Least Significant two bits in the first module to the Most Significant two bits
in the last module.

• The first module partial product is generated by multiplying the serially incoming
  data bits by the recoded values of the coefficient bits (0,1,2,3). The serial adder is
  for partial product generation with a recoded value of 3. This partial product is
  right shifted by two bits and passed on to the next module.
• The middle module partial products are generated as described for the first
  module. The two serial adders are for partial product generation with a recoded
  value of 3 and for the summation of this partial product with the time shifted
  partial product from earlier modules.
• The last module two Most Significant (MS) bits are recoded as 0,1,-2,-1. The
  serial adder/subtracter here is for partial product summation with the time shifted
  partial product from the previous stage, because multiplication with 3 is not
  required.

The Radix-4 recode logic block is similar in the first and middle modules. Handling
the sign bit in the last module makes it different. Adding a middle module increases
the coefficient word length by 2 bits.

Control

In general the control scheme is the same as described in section 3.4. A single signal
indicates the arrival time of the LSB into the design and various delayed versions of it
control the complete iteration. A change is needed to modify the clock cycles to
process the recoded values of two coefficient bits in each module. This increases the
latency of the serial data and coefficient paths in each module by one clock cycle.
The total latency of the multiplier is reduced because two bits are processed in each
module.

Figure 3.6. Radix-4 Bit-Serial Multiplier

The latency of the radix-4 design is given as [(3*CWL/2)+1]. Guard bits are
required on the data word to prevent overflow in the partial product formed by the
recoded value of 3. Guard bits are needed only with data because the coefficient
does not directly participate in any addition. The comparison of this non-redundant
Radix-4 bit-serial design with the one in [7] is summarized below:

1. Only 2 guard bits in the data word are needed in this approach whereas 3 guard
   bits are used in [7].
2. In [7] the 3x signal is computed only once in the first module and rerouted
   through pipeline delays to middle modules. In our approach the 3x signal is
   computed in each module (except the last).
3. In [7] a 4:1 multiplexor is used for internal recoding. Our approach uses
   behavioral VHDL.
4. In both approaches the basic recoding scheme is the same and logical data shifts
   are used to perform arithmetic multiplication because circuitry to implement
   arithmetic shifts is costly.
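The guard bit requirement for the recoded value of 3 can be checked with a short calculation. The sketch below is illustrative only (the helper names `fits` and `guard_bits_for_3x` are hypothetical); it searches for the smallest number of extra bits that lets 3x fit for every two's complement data word x of a given width.

```python
def fits(value, bits):
    """True if value is representable in a bits-wide two's complement word."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def guard_bits_for_3x(data_bits):
    """Smallest number of extra bits so that 3*x fits for every data_bits-wide x."""
    lo = -(1 << (data_bits - 1))         # most negative data value
    hi = (1 << (data_bits - 1)) - 1      # most positive data value
    guard = 0
    while not (fits(3 * lo, data_bits + guard) and fits(3 * hi, data_bits + guard)):
        guard += 1
    return guard


guard_8 = guard_bits_for_3x(8)
guard_16 = guard_bits_for_3x(16)
```

Since 3x is at most three times the largest magnitude, one extra bit (a factor of 2) is not enough while two extra bits (a factor of 4) always are, which agrees with the 2 guard bits quoted in item 1.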

3.6 Radix-8 Bit-Serial Multiplier

Application of radix-8 (non-redundant) recoding to a serial multiplier design is
presented in this section. The objective is to study the effect of increasing radix
recoding on the resulting speed and area of the design.

Fig 3.7 shows the radix-8 recoded multiplier unit. The total number of modules
is one third of the number of coefficient bits. The operand bits are fed serially to the
array of these modules.

Three bits of the coefficient word are latched into each module. The three Least
Significant Bits are passed to the first module and the Most Significant three bits to
the last module.

• In the first and middle modules the partial product is generated by multiplying the
  serially incoming data bits by the recoded values of the coefficient bits
  (0,1,2,3,4,5,6,7). The two serial adders in this module are for partial product
  generation with a recoded value of 7. This partial product is right shifted by three
  bits and passed on to the next module.
• In the middle module partial products are generated as done in the first module.
  Two of the three serial adders in the middle module are for partial product
  generation with a recoded value of 7. The third is for the summation of this
  partial product with the time shifted partial product from earlier modules.
• In the last module the three MS bits are recoded as (0,1,2,3,-4,-3,-2,-1).

The Radix-8 recode logic block is similar in the first and middle modules. The last
module is different in the way the sign bit is processed. Each additional middle
module in the chain increases the coefficient word length by 3 bits.

Figure 3.7. Radix-8 Bit-Serial Multiplier

Control

In general the control scheme is the same as described in section 3.5. A single signal
indicating the arrival time of the LSB into the design and various delayed versions
of it control the complete iteration. A design change is needed to accommodate the
clock cycles to process the recoded values of three coefficient bits. This increases the
latency of the serial data and coefficient paths by one clock cycle in each module. The
net latency of the multiplier is reduced as three bits are processed in each module.
The latency of the radix-8 design can be generalized as [(4*CWL/3)+1].

3.7 Digit-Serial Multiplier

A two-bit-digit Digit-Serial Multiplier is discussed in this section. This design
is one of the test benches to observe the compromise between serial and parallel
approaches.

Fig 3.8 shows a 2-bit-digit-serial multiplier. The coefficient and data digits (2 bits)
are fed serially to the modules. Each coefficient digit is latched into these modules
starting from the LS digit in the first module to the MS digit in the last module.
Two bits of the latched coefficient digit are recoded into Radix-4 representation. The
iteration steps of this design are the same as the Radix-4 design presented in section
3.5 except that digit-serial adders are used for partial product generation. Also the
truncation and sign extension block in each module uses latch2bit modules described
in section 3.3.

Figure 3.8. Digit-Serial Multiplier

The control scheme is the same as given in sections 3.5 and 3.6 with a single signal and
various delayed versions of it managing the synchronization and pipelining aspects.
It manipulates the two bits of the operand words to be processed in parallel in each
clock cycle.

The latency of this design is given by [CWL + 1].
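The latency expressions quoted in sections 3.4 to 3.7 can be compared side by side for a given coefficient word length. The helper below is only a convenience sketch (the function name is hypothetical); it assumes CWL is divisible by 2 and by 3 so that the radix-4 and radix-8 expressions give whole cycles.

```python
def latency_cycles(cwl):
    """Latency in clock cycles for a coefficient word length of cwl bits."""
    return {
        "ordinary bit-serial": 2 * cwl - 1,
        "radix-4 bit-serial": 3 * cwl // 2 + 1,
        "radix-8 bit-serial": 4 * cwl // 3 + 1,
        "two-bit-digit digit-serial": cwl + 1,
    }


latencies_12 = latency_cycles(12)   # the 12-bit coefficient case used for comparison
```

For the 12-bit coefficient used in the comparisons, the expressions evaluate to 23, 19, 17 and 13 cycles respectively.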

3.8 Parallel-Array Multiplier

The key features of the several bit-serial multipliers do not stand out if a full
parallel-array multiplier design is not compared with them. Also, the synthesizing
ability of the VHDL synthesis tool for an area intensive design approach is observed.
For these reasons a parallel-array multiplier design is discussed in this section.

Fig 3.9 shows a parallel-array multiplier. Both coefficient and data bits are fed
in parallel to the multiplier. In each row the partial product of the data bits and the
ith coefficient bit is formed. This is added to the partial product and carry from the
(i-1)th row in the array. The N bit sum is right shifted by one bit and sign extended.
The lower order product bit (ith) is extracted from each row as the LSB of the sum.
The N bit carry and higher N bits of the sign extended sum are passed to the (i+1)th
row in the array. The Nth row in the array takes care of the two's complement format
by complementing the partial product according to the Most Significant Bit (MSB)
of the coefficient word. The N bit sum and higher (N-1) bits of the carry are given to
the (N+1)th row. The higher N bits of the product are formed by adding the previous
N bit sum and N-1 bit carry with the MSB of the coefficient word as the LSB of the
carry input in the (N+1)th row.
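The row-by-row scheme described above (add a partial product, extract the LSB as a product bit, shift right with sign extension) can be modelled arithmetically. The sketch below is a simplification and not the thesis design: the function name `array_multiply` is hypothetical, the sum and carry words are merged into one Python integer, and the sign row is modelled as a subtraction rather than as complement plus carry-in.

```python
def array_multiply(coeff, data, n_bits):
    """Row-by-row model: add a partial product, emit the LSB, shift right.

    Simplification: sum and carry are merged into a single integer, so the
    carry-save wiring of the real array is not modelled.
    """
    pattern = coeff & ((1 << n_bits) - 1)    # coefficient bit pattern
    acc = 0
    low_bits = 0
    for i in range(n_bits):
        term = ((pattern >> i) & 1) * data   # partial product of row i
        if i == n_bits - 1:                  # sign row: subtract instead of add
            term = -term
        acc += term
        low_bits |= (acc & 1) << i           # i-th product bit leaves the array
        acc >>= 1                            # arithmetic shift keeps the sign
    return (acc << n_bits) | low_bits        # high part joined with the low bits


product = array_multiply(-3, 5, 4)
```

Each row therefore fixes one low order product bit, and the value left after the last row supplies the high order bits of the product.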

The latency of the parallel-array is a single clock cycle. Pipelining delays included
in the array increase the latency, but decrease the propagation delay through the
adder chain. This gives a higher clock rate. The parallel-array algorithm can be
implemented as a pure combinational circuit. Hence a separate control circuit is not
required to synchronize the events in the computation scheme.

3.9 Summary

The traditional two's complement binary multiplication algorithm is presented in
this chapter. Basic functional units required for hierarchical design development are
given. Detailed design descriptions for five multipliers are described with their control
requirements and features. Block diagrams for the VHDL design description concepts
for each multiplier design are given in this chapter.

Figure 3.9. Parallel Array Multiplier

CHAPTER 4

DESIGN SYNTHESIS

4.1 Introduction

The HDL design method has largely replaced schematic capture in digital design.
It has increased the productivity of a logic designer because it is:

• easier to deal with large and complex designs.
• simpler to reuse a design as they are more concise and readable.
• focused more on the logic verification than on the detailed gate-level
  implementation.

HDL design methodology steps include the development of a design description,
validation of the description, synthesis, and final verification. The process of designing
hardware from a model that defines the way the hardware will operate is called
synthesis. HDL synthesis converts an abstract textual description of a design into a
gate level net list. Typical HDL synthesis consists of two stages.

• Translation: This is the bridge between two levels of abstraction, RTL and gate
  level. A behavioral description can be translated into RTL level or synthesized
  to gate level depending on the tools used.
• Optimization: This is a technology-specific design transformation to meet area
  and speed requirements for the design. Optimization is to maximize or minimize
  a desirable performance characteristic during the design process.

HDL synthesis yield is measured in terms of the circuit quality with respect to a set
of design goals. It is essential to focus on the issues related to the final outcome. This
chapter discusses the VHDL design description and synthesis issues of Viewlogic and
Synopsys tools. The main objective is to address the differences in the various steps
incorporated in the HDL design methodology in these tools.

4.2 VHDL for Synthesis

VHDL has come a long way. It is one of the popular HDLs in use today. It was
first developed in 1982 by the US Department of Defense. It was recognized as a
standard HDL by the IEEE (IEEE 1076 standard) in 1987 and in 1993 [2],[3],[5],[21],[23].
VHDL is similar in style and syntax to modern programming languages, but includes
many hardware-specific constructs. Fig 4.1 gives a pictorial representation of a VHDL
hardware model. VHDL is a strongly typed language. A hardware model can be described
in different VHDL design description types. VHDL language constructs are divided
into three categories according to their level of abstraction.

• Behavioral: This category defines the functional or algorithmic aspects of the
  design description, without reference to its actual internal structure. Such a
  description consists of system outputs expressed as:
  - The functions of system inputs by using boolean equations.
  - The function of time and system inputs by using a sequential VHDL process.
• Data-flow: The interpretation of data as flowing through the design, from input
  to output. It is defined in terms of a collection of data transformations, expressed
  as concurrent VHDL statements.
• Structural: This description is closer to hardware. It is a VHDL model whose
  functionality is described in terms of instantiation and interconnections
  between sub-modules in the design hierarchy. An example of a structural VHDL
  description is given in appendix B.

Various VHDL constructs work together to describe a design. They are:

• Entities: This defines the interface to other designs through port declarations.
• Architecture: The functional implementation of a design entity is defined in the
  architecture. It includes different design pieces and constructs communicating
  through signals by concurrent statements. An entity can have several
  architectural implementations to meet the requirements. Together the
  entity/architecture pair represents a component. An architecture consists of the
  following design pieces:

Figure 4.1. A VHDL Hardware Model

  - Declarations: A definition of the signals, constants, and components to be
    used to describe the design functionality.
  - Process: A group of sequentially executed statements makes a process. It
    defines an independent sequential process representing the behavior of some
    portion of the design. It is executed whenever an event occurs on any of the
    signals in its sensitivity list. Declaration of a process in an architecture is a
    concurrent statement.
  - Subprograms: They define algorithms for computing values or exhibiting
    behavior. They are used as computational resources by several architectures.
    Unlike processes they cannot directly read or write signals from the rest
    of the architecture. The communication is done through the subprogram's
    interface. There are two forms of subprograms:
    * Procedure: It is a subroutine which operates on all visible parameters
      and objects, and returns zero or more values through interface signals.
      Within an architecture a procedure is either a sequential procedure called
      from within a process, or a concurrent procedure instantiated as a
      concurrent statement.
    * Function: It is a routine that returns a single value directly. A function
      defines the return value, which is computed based on the values of the
      formal parameters.
  - Component Instantiation: It instantiates the components defined in the
    declaration part and connects their ports to other components and concurrent
    signals. This is a major construct for the structural description style.

• Blocks: A block describes a portion of the hierarchy of the design in an architecture.
  A block is a unit of module structure, with its own interface, connected to other
  blocks or ports by signals.
• Configuration: The binding of a particular architecture to an entity to make
  up a component in a design is described as configuration. This construct is used
  to select the best (suitable) architectural implementation of an entity.
• Packages & Libraries: A VHDL Package is a collection of general constants,
  data types, component declarations and subprograms that can be used by more
  than one design. Each VHDL package is compiled into a logical VHDL library
  name. These libraries are compiled VHDL codes mapped into a physical location on
  the disk. A VHDL library may consist of several VHDL packages. They facilitate
  a modular design approach, easy handling of a complex design and design reuse.

Predefined VHDL libraries and packages to be used in the present design are described
prior to the entity declaration. Each entity is defined by a particular architecture
which is composed of several VHDL constructs. All the concurrent statements within
an architecture compute their values at the same time and they coordinate by
communicating via signals. Irrespective of the internal functional implementation the
interface signals enter and leave the design through the entity.

4.2.1 VHDL Constructs support

VHDL coding is the foundation for logic synthesis. Both Viewlogic and Synopsys
tools support most but not all IEEE standard VHDL constructs [19],[20],[21],[23]. The
support also differs for the simulation and synthesis phases of a design. Thus a VHDL
description that simulates correctly may not be synthesizable. Many VHDL constructs
useful for simulation are not relevant to synthesis and cannot be synthesized. The
list of IEEE standard VHDL construct support in Viewlogic and Synopsys tools can
be found in detail in [19],[21],[23].

In addition to the difference in VHDL constructs support, both tools have their
own guidelines for better synthesis results. A good coding style complying with these
guidelines generates efficient designs. Some of the issues regarding constructs support
and guidelines relevant to the designs involved in this research project are discussed
here.

Synopsys tools have special VHDL comments that can be used in the VHDL
description as compiler directives to direct the synthesis actions. This is not possible
with Viewlogic's VHDL description. Similarly, Synopsys tools have a better set of
synthesis attributes and constraints. The area and speed related performance constraints
for a design (e.g. max_delay) can be described directly in the VHDL description.
The synthesis tool translates Synopsys defined VHDL attributes as design constraints.
This capability serves two purposes:

• Optimization of a design can be controlled from within the VHDL description.
• The VHDL description can be used to document this important specification
  information.

It is important to incorporate multi-architecture implementation of a design entity
during early stages for architectural tradeoffs. Configuration constructs are used to
bind an entity to an architecture in a hierarchical design. This construct is not
supported by Viewlogic, but Synopsys supports configuration for one top-level entity
with an architecture.

4.3 Viewlogic Tools

Powerview, the Unix version of the Viewlogic tools, was used for synthesizing the VHDL
description into a net-list [20],[21]. Fig 4.2 gives an overall design flow with the tools
involved in each step. Presynthesis functional verification is done by analyzing the
VHDL description with the VHDL analyzer and simulating the analyzed file with the
ViewSim tools. This is shown by path 2. The design synthesis path leading to post
synthesis gate level simulation is shown by path 1. It starts with analyzing VHDL within
the ViewSynthesis tool. The ViewSynthesis tool is the core of the logic synthesis
process. It is a Graphical User Interface (GUI) with a command line interface. Operating
conditions, design constraints, target technology library and analyzed VHDL are the
inputs to the synthesis process. The resynthesis path is to reoptimize the design with a
different set of constraints until the design meets the requirements. After synthesizing
the design a gate level net-list file "design.wir" is generated. This is used by the VSM
simulation tool for gate level simulation and by the Viewdraw schematic representation
tool. This net-list output is also used for hardware implementation. The ViewSynthesis
tool does not support back annotated optimization with the physical circuit parameters
from a floor-planning or place & route tool as an input to the synthesis process. But
these parameters can be used to constrain the design to meet the requirements.

4.3.1 Synthesis Criteria

ViewSynthesis has three sub components for controlling synthesis parameters
implementation. They are:

• Synthesis Criteria:
  - The design optimization in terms of area versus speed is done by assigning a
    number between 1 and 100 to the area/speed parameter. Number 1 means the
    design is optimized for minimum area.

Figure 4.2. Design Flow with Viewlogic Tools

  - The technique used by the synthesizer during optimization is specified by
    assigning a value to the logic type parameter. The design's logic type could
    either be Finite State Machine (small), data path (large) or mixed (default).
  - The target technology into which the design is to be mapped is specified by
    assigning a technology library name to the tech parameter.
  - The design speed related to different fabrication processes is specified by
    assigning either 0 (slow), 1 (typical) or 2 (fast) to the process parameter.
  - The physical operating points for the design are specified by assigning
    required values to the temperature and voltage parameters.

• Design Constraints: The following set of constraints helps in modeling the timing
  behavior of the design during design optimization.
  - The ViewSynthesis tool uses the value assigned to the inputarrival parameter
    to model the number and type of gates in the path. It specifies the time
    value of signal arrival at different pins in the design.
  - The specification for signal required time at the output pins in the design is
    done by using the outputrequired parameter.
  - The drive strength requirement of an input pin is specified by the input-drive
    parameter. This determines the type and quantity of loads put on the signals
    by the synthesizer. They are used in delay calculation.
  - The output load of an output pin is constrained by assigning a value to
    the corresponding output load parameter.
  - The input load to an input pin is constrained to a maximum by using the
    maxinputload parameter. This load specification is used in delay
    calculations.

4.4 Synopsys Tools

Synopsys is a bigger CAD package than Viewlogic. It has plenty of tools and
features and a better Unix-like interface and GUI compared to Viewlogic. Only
logic synthesis related issues relevant to the design in this project are discussed in
this section [23]. Fig 4.3 gives an overall design flow with the tools involved in each
stage. Paths 2 and 2a in the diagram show the presynthesis simulation path for behavioral
verification. It includes analyzing VHDL code using the Graphical VHDL Analyzer
(GVAN) tool and simulating the analyzed design with the VHDL Debugger Simulator
(VHDLDBX) tool within the VHDL System Simulator (VSS) family of tools [23]. Path 1
shows the synthesis and design optimization path. VHDL System Simulator (VSS)
tools and the simulation control language with test vector inputs are used to simulate
the functionality for both paths 1 and 2.

The Synopsys Design Compiler Family synthesis tool is used as the core of the synthesis
process. In the synthesis path the design is analyzed within the Design Compiler. It is
then elaborated to check correct bus sizes and produce an intermediate design. The
analyze and elaborate steps are together called the read design step. Design constraints,
choice of target technology library and design environment settings are provided as inputs
to the synthesis or design compile process [3],[23]. Design environment settings refer to
the values of voltage, temperature, and silicon process for the library cells. Design
constraints and optimization are discussed in section 4.4.1. The Design Compiler Family
has FPGA Compiler which targets FPGA technologies. Since a Xilinx FPGA is used for
hardware implementation of the designs in this project, the FPGA Compiler tool is used
for synthesizing designs in Synopsys. The FPGA Compiler has a special algorithm that
synthesizes and maps directly to XC4000 devices. This gives an efficient means to
implement high-level architecture-independent designs in Field Programmable Gate Array
devices.

After synthesizing the design several design files are generated. A synthesized
output in the Synopsys internal data base format ("design.db") is used for schematic
representation and future format conversion. The output in the VHDL gate level net-list
format ("gate-level-design.vhd") is used for gate level simulation through path 2b.
The synthesized VHDL net-list of target library cells is analyzed using the GVAN tool and
simulated with Full Timing Gate-level Simulation (FTGS) VHDL simulation models
using VSS tools. The third output of the synthesis process is the Synopsys Xilinx
Net-list Format (XNF) file which is used by the place & route tool.

Synopsys supports back annotated design reoptimization. The physical circuit
information and delays from the floor-planning or place & route tool, written in Standard
Delay Format (SDF), are supplied as a constraint to the design compiler. This provides
better and closer to implementation constraining, and recompiling of the design for
improved performance. This is shown in Fig 4.3 by the recompile feedback path after
place & route is done.

4.4.1 Constraints and Design Optimization

Constraints define the goals of the synthesis process in terms of measurable circuit
characteristics (area, timing). The optimization process or the design compilation
attempts to implement a combination of target library cells with design constraints
to meet the functional, area and speed requirements of the design.

4.4.1.1 Constraints

The Synopsys logic synthesizer has two major types of design constraints.

• Optimization Constraints: They represent design goals and restrictions that
a designer wants but may not be crucial to the operation of the design. They
consist of timing constraints such as input delay, output delay, maximum area
and porosity. Maximum delay takes the highest precedence during the optimization
phase.

• Design Rule Constraints: They reflect technology specific restrictions that
must be met for a functional design. They include constraints like maximum
transition, maximum fan out and maximum capacitance.

Design Compiler first works on optimization constraints. When both timing and
area constraints are used it attempts to meet timing goals before area goals, because
timing always takes precedence over area. After optimization constraints are met
the design compiler works on design rule constraints. The tool tries to meet both
design rule and optimization constraints but gives emphasis to design rule constraints,
because they are requirements for a functional design. So the design compiler fixes
design rule violations even at the cost of violating optimization constraints. During
the compilation phase the Design Compiler tries several optimization moves. A
move is accepted only if it decreases the cost of one parameter without increasing
the cost of more important parameters. In other words,

• An optimization move that improves the maximum delay parameter is always accepted.

• An optimization move that improves power is accepted only if maximum and
minimum delay and minimum porosity parameters do not increase.

• An optimization move that improves area is accepted only if power and delay
costs do not increase.
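The acceptance rule above can be sketched as a priority-ordered cost comparison. This is only an illustration in Python and not the actual Design Compiler algorithm: the cost names, their ordering (inferred from the three rules listed) and the function accept_move are assumptions made for this example.

```python
# Hypothetical cost ordering, most important first, inferred from the rules above.
PRIORITY = ["max_delay", "min_delay", "porosity", "power", "area"]

def accept_move(before, after):
    """Accept a move if it lowers some cost without raising any more important cost."""
    for rank, name in enumerate(PRIORITY):
        improves = after[name] < before[name]
        higher_ok = all(after[p] <= before[p] for p in PRIORITY[:rank])
        if improves and higher_ok:
            return True
    return False

# Illustrative cost values only; they do not come from any synthesis run.
base = {"max_delay": 10, "min_delay": 2, "porosity": 1, "power": 5, "area": 40}
faster_but_bigger = dict(base, max_delay=9, area=45)
smaller_but_hungrier = dict(base, area=35, power=6)

print(accept_move(base, faster_but_bigger))     # True
print(accept_move(base, smaller_but_hungrier))  # False
```

The first move is accepted because maximum delay has no more important cost above it. The second is rejected because the area gain is paid for with an increase in power, which ranks higher in the assumed ordering.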

The design compiler attempts to optimize the design to meet the constraints in
various phases of optimization. They are:

• I/O pad optimization

• Final Sequential Optimization

The first phase of gate-level optimization is to map the sequential cells to the cells
in the technology library. At this point the delay through the combinational logic is
not defined. After this phase the following optimization information is defined:

1. Location of the combinational logic clouds between sequential cells.

2. Timing constraints on the logic clouds required to meet the setup and hold
constraints on the sequential cells.

The combinational optimization phase transforms the logic level description of the
combinational logic in the design to a gate-level net-list. The two main steps of this
phase are:

1. Technology-independent optimization, which operates at the logic level. It applies
algebraic and boolean techniques to a set of logic equations. This step
reimplements the logic equations to meet the timing and area goals, but retains
the functionality of the original logic. The common techniques used in this step
are:

• Flattening: It removes all intermediate variables, resulting in a two-level
sum-of-products form.

• Structuring: It converts two-level logic equations to a multilevel structure
to meet the design constraints. This technique factors out common subexpressions
as intermediate variables, then substitutes these variables in other
logic equations where possible.

2. Technology-Dependent Optimization (Mapping): The output from the previous
step is used in this step. During mapping, components from the technology
library are selected to implement the logic structure. The initial logic structure
is rearranged locally to try different logic combinations, and those components
that meet the predefined design constraints are kept.
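As a small illustration of flattening versus structuring, the Python sketch below writes the same two outputs once in two-level sum-of-products form and once with a shared intermediate variable, then checks exhaustively that both forms compute the same functions. The equations and function names are hypothetical and chosen only for this example; they are not taken from the multiplier designs.

```python
from itertools import product

# Flattened (two-level sum-of-products) form of two hypothetical outputs.
def f_flat(a, b, c, d, e):
    return (a and c) or (a and d) or (b and c) or (b and d)

def g_flat(a, b, c, d, e):
    return (a and e) or (b and e)

# Structured (multilevel) form: the common subexpression t = a + b is factored
# out as an intermediate variable and reused by both outputs.
def structured(a, b, c, d, e):
    t = a or b
    return (t and (c or d)), (t and e)

def equivalent():
    """Exhaustively compare both forms over all 2**5 input combinations."""
    for bits in product([False, True], repeat=5):
        f, g = structured(*bits)
        if bool(f) != bool(f_flat(*bits)) or bool(g) != bool(g_flat(*bits)):
            return False
    return True

print(equivalent())  # True
```

In this example the flattened equations use 12 literals while the structured ones use 7, at the price of one extra logic level through t. That is the kind of area versus delay trade-off the optimization step weighs.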

After a full mapping of combinational logic and an initial mapping of sequential
logic, I/O pads are inserted and mapped. In this phase, input and output buffers
are added to each port in the top-level design. The buffers are sized to meet the
port-to-port timing constraints when the delays through the core logic are known.
As I/O buffers consume significant current, the smallest I/O buffers that meet the
timing specification of the design are selected.

With accurate values for all delays through the I/O pads and combinational logic,
the design compiler replaces the initial estimate of the sequential cell mapping in the
final sequential optimization phase. Complex sequential cells from the library can be used
to reduce area and delay. Design timing can be improved by choosing higher performance
sequential cells.

Localized adjusting is the final phase in gate-level optimization. It follows a set of
heuristic rules to make local optimizations to adjust area and delay.

4.5 Summary

An introduction to VHDL as a design description tool is given in this chapter. A
block diagram representation of the VHDL hardware model is presented. The design
synthesis flow for Synopsys and Powerview as VHDL synthesis tools is discussed.
The importance of the process of design optimization and parameter constraining to
achieve optimum results is also discussed.

CHAPTER 5

DESIGN IMPLEMENTATION

5.1 Introduction

The lowest level of design description is the physical domain. The physical domain
specifies how the structure of a semiconductor technology is built. This structure has
the required connectivity between physical blocks to implement the prescribed behavior.
Physical abstraction of the design functionality is an involved process. The purpose
of this chapter is to give an overview of present trends in physical implementation of
digital IC design and related semiconductor technologies. It is not intended to cover
details such as wafer processing, photomasking and other steps in the fabrication process.

Logic families have come a long way with advancements in semiconductor technologies
[17], [NI, [El, [26], [33]. The phenomenon that started with vacuum tubes, diodes
and switch-mode transistors has evolved into gate arrays, standard cells and
programmable logic devices (PLDs). The quest for smaller, faster, and low power
devices has led to today's CMOS and GaAs technologies. CMOS logic is the
most popular logic family in the semiconductor industry now. Section 5.2 describes
the CMOS logic's scorecard. There are several ways of implementing a CMOS system
design. An overview of these options and the implementation methodology used in
this thesis work is discussed in section 5.3.

5.2 CMOS Logic

CMOS technology is one option in a range of technologies available to the electronic
system designer. Other popular options include silicon bipolar technology and GaAs
technology. Due to these new technologies, older logic families like Diode Transistor
Logic (DTL), Resistor Transistor Logic (RTL), Emitter Coupled Logic (ECL) and Transistor
Transistor Logic (TTL) are rarely used now. Among new technologies, GaAs demonstrates
the fastest gate speed. Bipolar technologies are not far behind, and advanced
CMOS technologies are comparable with bipolars. CMOS technologies in general show
the highest densities and lowest power per gate. CMOS technology is adequate for
analog circuits but better performing bipolar circuits may be constructed. CMOS
technologies are the cheapest to manufacture for high density digital circuits with
moderate analog requirements. Design costs are the cheapest for CMOS due to the
large investment already made in design tools and cell libraries. A combination of
CMOS and bipolar technologies called BiCMOS is emerging as a popular technology,
especially for mixed signal chips [17]. Though CMOS is not the only choice, for an
overwhelming percentage of today's electronic systems it is the technology of choice.
It is worthwhile to know the advantages and disadvantages of a technology type when
making system level decisions for implementation. A brief summary of the main CMOS
attributes is presented below.

• Fully restored logic levels, i.e. output settles at VDD or VSS.

• Transition times - Rise and fall times are of the same order.

• Memories are implemented both densely and with low power dissipation.

• Transmission gates pass both logic levels well, allowing use of efficient, widely
used logic structures such as multiplexors, latches, and registers.

• Power Dissipation - Almost zero (only leakage) static power dissipation for fully
complementary circuits. Power is consumed only during logic transition.

• Precharging Characteristics - Both n-type and p-type devices are available for
precharging a bus to VDD and VSS. Nodes can be charged fully to VDD or
alternatively to VSS in a short time.

• Power Supply - Voltage required to switch a gate is a fixed percentage of VDD.
Variable range is 1.5 to 15 volts.

• Packing Density - Requires 2n devices for n inputs for complementary static
logic. Less for dynamic logic forms.

• Layout - CMOS facilitates regular and easily automated layout styles.

Due to its dominance CMOS process density is at the sub-micron level (a measure of
CMOS transistor geometry in processing technology). Complementary gates are almost
guaranteed to function correctly. The automated CAD packages available have
reached a point where the majority of systems can be implemented in a highly automated
fashion. However, leading-edge products continue to push the technology in
terms of cost, density, speed and power. BiCMOS is an example, which is a compromise
between the low power and high density of CMOS and the high speed of bipolar devices.

5.3 ASIC Technologies and Programmable Devices

CMOS chip implementation has a wide variety of options providing trade-offs
between design complexity, cost, speed-of-operation, implementation and time
to market. This section gives an overview of such implementation options and the
methodology used in this thesis. As the process of designing a system on silicon
is complicated, Very Large Scale Integration (VLSI) design aids have come up with
several CMOS technologies that can be automated into the CAD tool being used.
These ASIC technologies reduce the complexity, increase productivity and assure the
designer of a working product while providing some flexibility. Fig 5.1 shows the
acronym tree of IC technologies.

Programmability of the ASIC technology is a way to achieve wider use and flexibility.
Often, the performance of the design implemented in programmable devices
may not meet system goals and an alternative solution is required. This prompts
the need for custom implementation. But the reprogrammable features, cheap &
short prototyping time and automated design steps integrated with the CAD tools
make programmable devices popular for design implementation. The spectrum of
programmable devices in CMOS is divided into three areas.

[Figure 5.1: tree of IC technologies with leaves Full Custom, Standard Cell, MPGA and,
under Field-programmable (PLDs), Simple PLDs (e.g. PALs), Complex PLDs, FPGAs
(e.g. Xilinx, Actel), FPICs]

• Devices with programmable logic structures. This class of programmable CMOS
devices is referred to as Programmable Array Logic (PALs) or PLDs. Generally
they are implemented as AND-OR plane devices, e.g. 22V10. These devices are
programmed by changing the characteristics of the switching element.

• Devices with programmable interconnect. These devices program the routing.
An example of this method is the Actel FPGAs, which use an element called
PLICE (Programmable Low-Impedance Circuit Element) or anti-fuse.

• Devices with reprogrammable gate arrays. These are the most popular devices.
They are discussed in section 5.3.1.

In general a programmable logic device consists of the following basic resources:

• Logic/Memory Blocks: Configuration of these blocks is based on lookup tables,
multiplexors, AND-OR planes, gates, or transistor pairs.

• I/O Blocks: Usually bidirectional and may incorporate latches, flip-flops, slew
rate control, pullup/pulldown.

• Interconnect: They provide efficient local and global connections. The focus is
to provide maximum flexibility with minimum delay and area.

• Dedicated, low-skew clock distribution networks.


The best balance of the above resources is the target of every programmable device.
The programming technologies incorporated by these devices include:

• Fusible Links: This technology is normally used in conjunction with a bipolar
process, where the device can sink the high current needed to blow normally
closed fuses. They are one-time programmable.

• Anti-fuse: A normally high resistance structure is changed permanently to a
low-resistance structure when a high programming voltage is applied. This is a
one-time programmable technique.

• Static Random Access Memory (SRAM) cells: In this method the interconnect
configuration is achieved by controlling transmission gates, Multiplexors
(MUXes) or pass transistors. The state that determines a given interconnect
pattern is held as an application program in static RAM cells distributed across
the device. This technique provides re-programmability.

• EPROM & EEPROM switches: A signal is pulled down using Electrically
Programmable Read Only Memory (EPROM) or Electrically Erasable Programmable
Read Only Memory (EEPROM) cells. This method is also reprogrammable.

A programmable gate array device consists of identical logic functions diffused in
a regular pattern, or array, in silicon. These blocks of simple logic functions can
be interconnected as required by appropriate customization of one or more levels of
metalization. The regular layout permits the use of automated routing programs
that can translate a logic net-list into a chip layout. A significant advantage of the
re-programmable gate array is the ability to reconfigure the internals of a chip by
changing software (routing program). This flexibility is of considerable advantage in
a product that has to undergo field updates.

FPGA architectures are a good compromise between standard (fixed design, vendor
based) and custom circuits. Hence this approach does not usually yield as high
a performance as full-custom solutions, nor is it as flexible in the range of circuit
complexities which can be accommodated. So, an FPGA implementation is generally
referred to as semi-custom design.

The Xilinx FPGA technology was chosen to implement the designs in this project
work, primarily because among several ASIC technologies, the Xilinx technology
library is the only one available for synthesis in Synopsys as well as Viewlogic tools in
the ECE Dept. The other factor is that the XACT tool is available for placement and
routing of the design. Also Xilinx is one of the most popular FPGAs in the market
today. Some of the other re-programmable FPGAs are Altera, Atmel, Algotronix and
AT&T, and one time programmable FPGAs are Quicklogic, Actel and Cypress.

A Xilinx FPGA consists of a symmetrical array of Configurable Logic Blocks (CLBs)
embedded within a set of horizontal and vertical channels that contain routing which can
be customized to interconnect CLBs. The interconnect configuration is achieved by
turning on n-channel pass transistors. Static RAM cells are used to hold the state
that determines a given interconnect pattern. Each CLB consists of two 4 by 1 &
one 3 by 1 lookup function generators, and two flip-flops. Each input and output
on a CLB has a direct interconnect, which allows most local interconnection between
adjacent CLBs to take place. The global interconnect is achieved by programmable
switching matrices at the junction of horizontal and vertical routing channels. The
timing of a design implemented in a Xilinx FPGA is dependent on the basic CLB speed
and routing delay terms. Appendix A shows the block diagram of the Xilinx XC4000
series family CLB.
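A lookup function generator can be modelled as a small truth table addressed by its inputs. The Python sketch below illustrates that idea for a 4-input and a 3-input table. It is a generic model under stated assumptions: the helper make_lut and the two example functions are hypothetical and do not reproduce the actual XC4000 CLB wiring or configuration format.

```python
def make_lut(truth_table_bits, num_inputs):
    """Return a function that looks up its output in a 2**num_inputs-entry table."""
    assert len(truth_table_bits) == 2 ** num_inputs

    def lut(*inputs):
        assert len(inputs) == num_inputs
        index = 0
        for bit in inputs:  # the first input is the most significant address bit
            index = (index << 1) | int(bool(bit))
        return truth_table_bits[index]

    return lut

# A 4-input table programmed as a 4-input XOR (parity) function ...
xor4 = make_lut([bin(i).count("1") % 2 for i in range(16)], 4)
# ... and a 3-input table programmed as a majority function.
maj3 = make_lut([1 if bin(i).count("1") >= 2 else 0 for i in range(8)], 3)

print(xor4(1, 0, 1, 1))  # 1
print(maj3(1, 0, 1))     # 1
```

Changing the stored table changes the logic function without changing the structure, which is what makes a static RAM based device re-programmable.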

5.3.2 Standard cell and Full custom design

Gate-array architectures standardize at the chip geometry level whereas standard
cells standardize at the logic or function level. That is, a specific design for each logic
gate or logic function in a library can be created. This is the basis of cell based
or standard cell design. Library cells are created for the following general classes of
circuits:

• Small Scale Integration (SSI) logic: nand, nor, xor, inverters, buffers, registers

• Medium Scale Integration (MSI) logic: decoders, encoders, adders, comparators

• Datapath: Arithmetic Logic Units (ALUs), adders, shifters, bus extractors,
register files

• Memories: RAM, Read Only Memory (ROM), Content Addressable Memory
(CAM)

• System-level Blocks: microcontrollers, Universal Asynchronous Receiver
Transmitters (UARTs), Reduced Instruction Set Computer (RISC) cores, multipliers

Compared to gate-array, standard-cell implementation provides better density at the
cost of increased prototype cost and increased design complexity. Standard-cell design
might result in better productivity because the functions do not need to be designed.
Often standard cells are available as a set of parameterizable modules.

A custom IC is individually designed for a particular requirement. Full-custom
implementation is the name given to the technique where the function and layout of
practically every transistor is optimized. This technique is programmed at the silicon
mask level. This results in higher density (reduced area), optimal choice of transistor
sizes and numbers and hence the best performance among all implementations. The
principal disadvantage of full-custom logic is the large effort (initial cost, time) required
in design and testing, hence low productivity.

5.4 XACT Tools

The Xilinx core implementation tools called XACT are used in this project as
a place & route tool to implement designs into a Xilinx FPGA. This provides an
identical implementation approach to all the designs synthesized in Synopsys as well as
Powerview. This creates an opportunity to have a fair comparison of different designs
targeted at the same technology library. Fig 5.2 shows the steps involved in the
design implementation flow using the XACT tools. The WIR2XNF program in Powerview
converts the ViewDraw wir files into one or more Xilinx XNF files (depending
on the hierarchy of the design and the synthesis method used). In Synopsys a single file
containing the hierarchical information of the design is written in XNF format
("design.sxnf"). The SYN2XNF program from the XACT tools converts this file to a Xilinx
XNF file. XNFMerge merges XNF files into one XNF file ("design.xff") with flattened
hierarchy. XNFPrep performs a Design Rule Check (DRC) to ensure
no design rule errors exist and removes unused or disabled logic from the design. The
output is a "design.xtf" file. Partition Placement & Route (PPR) uses the mapping
of the logic primitives to map them to CLBs. It then places the macros (CLBs &
I/O logic) onto the FPGA and routes the appropriate connections. PPR produces
an output file ("design.lca") containing placement and routing information for the
FPGA. XDelay is a static timing analyzer. It takes the PPR output and produces
the worst-case path delays for different path types. The MakeBits program produces a
bit stream ("design.bit") which can be downloaded to the FPGA chip for the physical
design verification.

Figure 5.2. Implementation Flow in XACT tools
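The Synopsys branch of this flow can be summarised as an ordered list of stages, each consuming what the previous one produced. The Python sketch below only records that ordering as described in the text. The artifact labels are descriptive placeholders chosen for this example, and nothing here invokes the real programs or reflects their command-line options.

```python
# (tool, input artifact, output artifact), in the order given in the text.
# Artifact labels are descriptive placeholders, not real file names.
SYNOPSYS_FLOW = [
    ("SYN2XNF",  "Synopsys XNF file",        "Xilinx XNF file"),
    ("XNFMerge", "Xilinx XNF file",          "merged XNF file"),
    ("XNFPrep",  "merged XNF file",          "checked design file"),
    ("PPR",      "checked design file",      "placed and routed design"),
    ("MakeBits", "placed and routed design", "bit stream"),
]

def is_chained(flow):
    """Check that every stage consumes the artifact produced by the previous stage."""
    return all(prev[2] == nxt[1] for prev, nxt in zip(flow, flow[1:]))

def tools_in_order(flow):
    """Return just the tool names, in execution order."""
    return [tool for tool, _, _ in flow]

print(is_chained(SYNOPSYS_FLOW))      # True
print(tools_in_order(SYNOPSYS_FLOW))
```

XDelay is left out of the chain because, as described above, it reads the PPR output to report worst-case path delays rather than producing an input for a later stage.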

The physical testing of the serial multiplier designs needs extra circuits for parallel [...]
Some of the serial multipliers were designed with these extra circuits for testing
them on a Xilinx 4010pg191-6 prototype board for physical design verification. A
schematic diagram of such a setup for the Radix-Q design is given in Fig A.2.

5.5 Summary

Features of CMOS as the most popular logic family today are described in this chapter.
Different implementation styles for digital design are introduced. Programmable
features of ASIC technologies and their advantages are discussed. The choice of a Xilinx
FPGA for design implementation in this thesis is explained. The design flow for XACT as
an implementation tool is described.

CHAPTER 6

RESULTS

6.1 Objectives

The main objective of this research is to compare different multiplier algorithms.
Bit-serial multipliers are the major focus of this thesis. The multiplier designs presented
in chapter 3 are designed using VHDL synthesis and implemented on a Xilinx FPGA.
Another goal of this research project is to compare the capability of two VHDL based
logic synthesizers (Synopsys & Viewlogic) using bit-serial multipliers. It is recognized
that a detailed comparison of these tools can not be carried out based on these designs
alone. Also, as Synopsys CAD is so huge and loaded with many features (Powerview
tools are only about 15-20% of the cost of Synopsys) the comparison may have some
bias. But a comparison based on the design performance is justifiable.

This chapter presents the implementation results of the multiplier designs. A graphical
representation of the performance measures discussed in chapter 2 for the various
multipliers is also presented here. The comparison of design performances is discussed.
The VHDL synthesis tools are compared according to multiplier design performances.
Finally some conclusions are drawn.

6.2 Implementation Results

The Xilinx FPGA implementation of the various multipliers is presented here. The
multiplier designs are compared among themselves in each tool and also across the two
tools used (Synopsys & Powerview). The performance measures considered here are
design size (area), sample rate (throughput) and Area-Time product as described
in section 2.4. All the multipliers are designed for 12-bit coefficient implementation.
A 12-bit coefficient provides a fair comparison of the five different multipliers described in
chapter 3 because radix-4, radix-8 and two-bit digit-serial designs can represent a 12-bit
coefficient as an exact multiple of multiplier modules. All the designs are implemented
in the Xilinx 4010pg191-6 part type. During the synthesis phase similar constraints and
operating conditions were considered in both tools, as far as possible.

Tables 6.1 and 6.2 give the implementation results from the Powerview and Synopsys
synthesizers with XACT tools. The best results based upon compiled values for
maximum speed and minimum area and using different synthesis methodologies are
presented here. Speed optimization is focused on implementing the design in the
target technology library with the shortest delay or fastest clock rate, whereas area
optimization looks for a functional implementation of the design with the smallest total
cell area in the target library. The area column in tables 6.1 and 6.2 is taken from the
Xilinx FPGA implementation of the designs using XACT tools. It shows the number
of CLBs occupied by each design. Another result taken from the XACT tool is the
estimated maximum clock rate for each design and optimization mode. The clock
rate column also represents the worst case propagation delay for each design in these
tables.


The latency in these tables is calculated according to the design description given
in chapter 3 for 12-bit coefficient implementation. A bit-serial data flow can accept
a new set of inputs every Data Word Length (DWL) number of clock cycles. For
2-bit digit-serial data flow it is 1/2 of DWL number of clock cycles and for the parallel
array it is 1. This is given as iteration time in number of clock cycles in tables 6.1
and 6.2. Sample rate or throughput is calculated from the iteration time and the minimum
clock period (1/clock rate). Similarly the Area Time (AT) product is calculated as the
product of area and the time per sample. Time per sample is the time required for a
complete sample. These calculations are explained at the bottom of each table. The
throughput results are plotted as the inverse of the calculated value (in Fig 6.2, 6.5 and
6.8). This gives an opportunity to look for the shortest bar for the best performance
in the area, throughput and AT product plots. The following discussion gives the graphical
comparison of design performances based on the results in these tables.
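The calculations described above can be written out as a short Python sketch. The formulas follow the definitions given in the text (minimum clock period = 1/clock rate, time per sample = minimum clock period times iteration time, AT product = area times time per sample). The function name and the numeric inputs are hypothetical and are not taken from tables 6.1 or 6.2.

```python
def performance(area_clbs, clock_rate_mhz, iteration_cycles):
    """Return (sample rate in MHz, time per sample in us, AT product in CLB*us)."""
    min_clock_period_us = 1.0 / clock_rate_mhz
    time_per_sample_us = min_clock_period_us * iteration_cycles
    sample_rate_mhz = 1.0 / time_per_sample_us
    at_product = area_clbs * time_per_sample_us
    return sample_rate_mhz, time_per_sample_us, at_product

DWL = 12  # data word length in bits, as stated in the text

# Hypothetical figures for illustration: 40 CLBs at a 24 MHz clock rate.
bit_serial = performance(area_clbs=40, clock_rate_mhz=24.0, iteration_cycles=DWL)
digit_serial = performance(area_clbs=40, clock_rate_mhz=24.0, iteration_cycles=DWL // 2)

print(bit_serial)
print(digit_serial)
```

With the same area and clock rate, halving the iteration time doubles the sample rate and halves the AT product, which is why a 2-bit digit-serial data flow can compete on AT product even if it occupies more CLBs.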

Table 6.1: Results from Powerview Synthesizer & XACT tools (surviving entries only).

Design                               Clock Rate (MHz)   Iteration Time (No. of Clock Cycles)
...                                  22.1               DWL
Radix-4 Non-Redundant (Bit-Serial)   25.7               DWL
Radix-8 Non-Redundant (Bit-Serial)   ...                ...
Digit-Serial (2-bit Digit)           ...                ...

Other surviving entry: Latency (No. of Clock Cycles) 23.

Table 6.2: Results from Synopsys Synthesizer & XACT tools (surviving entries only).

Columns: Sample Rate (MHz); AT Product (Area * Time Per Sample); Latency (No. of Clock Cycles).
Rows: ... (iteration time DWL); Radix-4 Non-Redundant (Bit-Serial) (iteration time DWL);
Radix-8 Non-Redundant (Bit-Serial); Digit-Serial (2-bit Digit).

Notes for both tables:
Minimum Clock Period = 1 / Clock Rate
Sample Rate = 1 / ((Iteration Time in # of Clock Cycles) * (Minimum Clock Period))
Time Per Sample = Minimum Clock Period * Iteration Time in # of Clock Cycles
AT Product = Area in # of CLBs * Time Per Sample
DWL = Data Word Length = 12 Bits

The plots shown in Figures 6.1 to 6.3 demonstrate the comparative performances
of the multiplier designs in the Powerview tool. The best results for area, sample
rate and AT product are taken from table 6.1 for these plots. These graphs are
plotted using normalized values. The result for the ordinary bit-serial design is considered
as a base unit and the other design performances are scaled accordingly. The order of
designs on the x-axis is chosen arbitrarily. Fig 6.1 shows that increasing radix-n recoding
decreases the design size. Fig 6.2 is a plot of the inverse of the normalized sample rate
(throughput) for different designs. It is seen that increasing radix-n recoding suffers
in throughput. The better throughput of radix-4 compared to the ordinary design
is due to shorter routing delays achieved by the place & route tool. The digit-serial
multiplier synthesized using Powerview tools has better throughput than the array
multiplier designed accordingly. Fig 6.3 shows that the digit-serial multiplier has the
best AT product. Though the radix-8 design has a smaller area compared to radix-4, its AT
product suffers due to lower throughput or clock rate. The AT product for the digit-serial
design is about 3 times better than that for the parallel array design.
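The normalization used for the throughput plots can be sketched as follows. This Python example assumes a dictionary of sample rates keyed by design name and scales the inverse of each rate so that the ordinary bit-serial design equals 1. The function name and the sample rates shown are hypothetical placeholders, not results from the tables.

```python
def normalized_inverse(sample_rates, base="ordinary"):
    """Scale 1/sample_rate so that the base design equals 1.0."""
    base_inverse = 1.0 / sample_rates[base]
    return {name: (1.0 / rate) / base_inverse for name, rate in sample_rates.items()}

# Hypothetical sample rates in MHz, for illustration only.
rates = {"ordinary": 2.0, "radix-4": 2.5, "radix-8": 1.6, "digit-serial": 4.0}
bars = normalized_inverse(rates)

# Because the inverse is plotted, the shortest bar marks the highest throughput.
best = min(bars, key=bars.get)
print(bars["ordinary"], best)
```

Plotting the inverse keeps the reading rule the same across the area, throughput and AT product figures: in each of them a shorter bar means a better design.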

Figures 6.4 to 6.6 show the performance comparison of the multipliers in the Synopsys
tool. As explained above the best results are taken from table 6.2 and
normalized values are used for the plots. Fig 6.4 is identical in nature to Fig 6.1 and
the radix-8 recoded design has the smallest area. The normalized plot of the inverse
of design throughput in Fig 6.5 shows that radix-8 recoding suffers in throughput.
The radix-4 design gave better throughput than the ordinary design due to efficient
placement and short routing delays. The digit-serial design did not outperform the parallel
array multiplier in terms of throughput because of the better logic optimization capability
of the Synopsys synthesizer. The digit-serial design shows the best AT product as
given in Fig 6.6. The radix-8 recoded design shows close performance in AT product
to the radix-4 design, and the digit-serial design is only about 2 times better than parallel.
This is because of a better clock rate achieved by the Synopsys synthesis and also
due to superior logic optimization in the case of the array design. In general Fig 6.1 to
6.3 are similar to Fig 6.4 to 6.6 respectively.

Figure 6.1. Design Size in Powerview tool

Figures 6.7 to 6.9 give a comparison of design performances in Synopsys and Powerview
tools. They are plotted by considering the best results from maximum speed
and minimum area optimization from Tables 6.1 and 6.2. The result for the ordinary
bit-serial design is considered as a base unit and all other results from both tools are
normalized (scaled) accordingly. These normalized values are plotted in Fig 6.7 to
6.9. The areas for the ordinary, radix-4, radix-8 and digit-serial designs are almost the same
in both tools, except for the array design, as shown in Fig 6.7. This demonstrates that
the Synopsys synthesizer performs better in logic optimization for large combinational
circuits. Fig 6.8 shows better throughput by the Synopsys tool compared to the Powerview
tool except for the digit-serial design. The throughput for the radix-8 and array designs is
noticeably better. The different performance of the digit-serial design is due to efficient
placement & routing in Xilinx. Fig 6.9 shows a better AT product for the designs
synthesized in Synopsys tools except for the digit-serial design. This is due to the better
clock rate from speed optimization using ViewSynthesis tools.

Figure 6.2. Throughput Rate in Powerview tool

Figure 6.3. Area Time Product in Powerview tool
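The normalization used for Fig 6.7 to 6.9 can be sketched as follows. This is a minimal illustration, not the plotting procedure used in this thesis: the area and time figures are made-up placeholders rather than values from Tables 6.1 and 6.2, the tool labels and the function name `normalize` are hypothetical, and taking one tool's ordinary bit-serial result as the base for both tools is an assumption.

```python
# Sketch of the base-unit normalization described above.
# All numbers are hypothetical placeholders, NOT results from Tables 6.1/6.2.


def normalize(results, base):
    """Scale every (area, time) result by the result stored under `base`.

    `results` maps a (tool, design) key to a pair:
    (area in CLBs, time per product in ns, i.e. inverse throughput).
    Returns the normalized area, time and area-time (AT) product per key.
    """
    base_area, base_time = results[base]
    normalized = {}
    for key, (area, time) in results.items():
        normalized[key] = {
            "area": area / base_area,
            "time": time / base_time,
            "at": (area * time) / (base_area * base_time),
        }
    return normalized


# Hypothetical inputs (assumed for illustration only).
results = {
    ("toolA", "ordinary"): (40, 1000.0),
    ("toolA", "digit-serial"): (44, 500.0),
    ("toolB", "ordinary"): (40, 1250.0),
}

scaled = normalize(results, base=("toolA", "ordinary"))
for key, values in scaled.items():
    print(key, values)
```

A normalized value below 1 means a smaller area, a shorter time per product, or a better AT product than the base design.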

6.3 Discussion & Conclusions

The differences in performance of the multiplier designs presented
in this chapter have several causes:

• Optimization Algorithm: In the case of Synopsys, FPGA Compiler was used,
which provides special compilation and optimization algorithms for Xilinx 4000
parts. Also, in general the Synopsys synthesizer attempts to improve speed when
area is being constrained and vice versa. The optimization algorithm gives higher
priority to timing constraints than to area whenever both of these constraints
are present. But when one of these parameters is constrained and met, it makes
a move towards optimizing other constraints. The move is accepted only when
the primary constraint is not violated. This is clear from the close results for
area and speed optimization goals for a design in Synopsys as shown in Table
6.2. The Powerview synthesizer seems to focus only on the main parameter being
constrained (area or speed) during optimization. There is a noticeable difference
between the results from area and speed optimization for a design in Powerview.
Powerview does not offer any special algorithms for targeting Xilinx 4000 parts.
The ViewSynthesis tool is used as a generic logic synthesizer and optimizer for
targeting the design to different technology libraries. Chapter 4 gives a detailed
discussion about design constraints and optimization in Synopsys and Powerview
tools.

The synthesis and optimization process in Synopsys has multiple features [3], [BI.
It takes several compile runs (trial & error) with a combination of different compile
methodologies for the design to settle with optimal performance. The Powerview
synthesis tool, in contrast, does not take as many compile runs.

Figure 6.4. Design Size in Synopsys tool

Figure 6.5. Throughput Rate in Synopsys tool

Figure 6.6. Area Time Product in Synopsys tool

• Technology Library: Though the basic target technology library used in both
tools is the same (Xilinx 4000 Series parts), there is a difference in the source for
these libraries. In the case of Powerview the unified library "XC4000.sml" was
used. The Synopsys tool in the department does not have a Xilinx FPGA library
of its own. So, a library for the Synopsys synthesizer provided by the XACT tool was
used. In this case the library was explicitly specified as X4010-6.

Figure 6.7. Comparative Design Size

• Implementation: A large percentage of the delay in programmable devices is
due to routing. The actual delays can only be obtained after place and route. The
post-synthesis timing analysis of the design is only a statistical approximation
of the actual delays, which uses wire load models specified as a constraint to the
synthesis process. The delays differ considerably for a custom implementation
where the designer has control over each transistor used in the design. Since a
common place & route tool is used in this thesis for both Synopsys and Powerview
synthesized designs, differences due to VHDL description and synthesis run can

Figure 6.8. Comparative Throughput Rate

In spite of the fact that the final result is affected by several steps involved in the
design flow, the following observations can be made from the implementation results
presented in this chapter:

• The digit-serial multiplier gives the best performance, close to that of the parallel
array in terms of throughput. This is an advantage considering the fact that
it is less than half the size of the parallel design. For a digit size of two, the
multiplier is only about 10% bigger than the ordinary bit-serial multiplier and
yet it has about twice the throughput. These are persuasive arguments for the
use of digit-serial multipliers.

Figure 6.9. Comparative Area Time Product

• The digit-serial design has the best AT product. The digit-serial design synthesized
using Powerview has a 3 times better AT product than the parallel design.

• Use of radix-4 recoding in the digit-serial design improved its performance. A
35% difference between the radix-4 and the digit-serial design in terms of AT
product is better than a 7% difference without radix-4 recoding in the digit-serial
design done by P.J. Graumann (ECE Dept., U of C).


• The AT products of the different multiplier algorithms are similar to the AT
products for ripple carry operators compiled by the Parsifal silicon compiler,
as given by Hartley and Corbett [6].

• The non-redundant radix-4 design is 5% smaller than the one given by Primlani
and Meador [7]. Only two guard bits are required in the data word. Also, the
propagation delay path is shorter in our approach. These estimates are obtained
by comparing them in the Synopsys VHDL synthesizer.

• Increasing the radix of the radix-n recoding lowers the gain in AT product and
speed, compared to the gain of the radix-4 design over the ordinary bit-serial
design.

• The multiplier performances are comparable to those by Russell & Hutchings [11] in
terms of the number of CLBs occupied in the Xilinx 4010 and throughput.

• The Synopsys synthesizer outperforms the Powerview synthesizer in the case
of logic optimization. The parallel array multiplier synthesized using Synopsys
is 14% smaller compared to Powerview. This is also confirmed by the 26%
smaller AT product for the Synopsys synthesized parallel design compared to
Powerview. This shows the performance of the better logic optimization algorithm.

• The Synopsys synthesizer resulted in better clock speed, and hence higher
throughput, and a better AT product in general, as seen from the results.


• Synopsys tools have better automation of design flow than Powerview tools. A
single script controls VHDL read-in, constraint setting, compile methodology
setting, pad assignments and XACT implementation steps. An example of such a
script is given in Appendix C.

• The marginally improved performance of Synopsys does not merit the additional
dollar value when considering the fact that Powerview is only about 15-20% of
the cost of Synopsys.
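The throughput argument for the digit-serial design can be illustrated with a small behavioural model. This is a sketch under simplifying assumptions, not the VHDL architecture used in this thesis: it handles unsigned operands only, ignores pipelining and radix-n recoding, and the function name `serial_multiply` and its parameters are hypothetical. It only shows why consuming two coefficient bits per cycle halves the number of cycles needed per product.

```python
# Behavioural sketch (assumed model, unsigned operands only): a shift-and-add
# multiplier that consumes `digit_bits` bits of the coefficient per clock cycle.


def serial_multiply(coeff, data, width=12, digit_bits=1):
    """Return (product, cycles) for a `width`-bit coefficient.

    digit_bits=1 models an ordinary bit-serial schedule,
    digit_bits=2 models a digit-serial schedule with a digit size of two.
    """
    if width % digit_bits != 0:
        raise ValueError("width must be a multiple of digit_bits")
    if not 0 <= coeff < (1 << width):
        raise ValueError("coefficient does not fit in the given width")

    digit_mask = (1 << digit_bits) - 1
    product = 0
    cycles = 0
    for shift in range(0, width, digit_bits):
        digit = (coeff >> shift) & digit_mask   # next digit, LSB first
        product += (digit * data) << shift      # accumulate weighted partial product
        cycles += 1                             # one digit is consumed per cycle
    return product, cycles


bit_product, bit_cycles = serial_multiply(2748, 1365, width=12, digit_bits=1)
digit_product, digit_cycles = serial_multiply(2748, 1365, width=12, digit_bits=2)
print(bit_cycles, digit_cycles)  # 12 cycles versus 6 cycles for a 12-bit coefficient
```

Both schedules produce the same product; only the cycle count changes, which is the source of the roughly doubled throughput at a digit size of two when the clock rate is comparable.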

CHAPTER 7

CONCLUSION AND FUTURE WORK

Logic synthesis is an integral part of digital system design today. New design
methodologies and better CAD tools prompt the need to design, revise and redefine
existing systems. Rapidly growing semiconductor technology facilitates flexibility,
customization and better performance of the final product. The change from schematic
capture based design entry to HDL synthesis techniques has been adopted by every
logic designer due to the increased productivity. Standardization of HDLs has also
enhanced design reuse and portability. These observations encourage the use of HDL
description and synthesis tools in the design approach.

Multiplier circuits have traditionally been popular because of their extensive use in
Digital Signal Processing (DSP) algorithms. Due to its large silicon area, traditional
array multiplier usage is limited in an implementation where several such units are
required. Extensive research has been done to devise new algorithms to reduce the
size while not losing much computational throughput. The concepts of pipelining and
bit-serial design have been popular in the field of parallelism in processor architectures.

7.1 Conclusions

In this thesis the VHDL synthesis based design approach for digital systems is
used. Two very good HDL synthesis tools, Synopsys and Powerview, are used for
design capture, simulation, synthesis and implementation. This thesis discusses the
steps involved in the design flow in these tools.

Multipliers are used as benchmark designs. Four different serial multipliers and
a parallel multiplier are designed and implemented. The effect of radix-n recoding
schemes and the digit-serial approach in the multiplier algorithms is studied. These
multipliers are implemented in Xilinx, which is one of the leading FPGA technologies.
A comparative analysis of these multipliers is presented in this thesis. A comparison
of the two VHDL synthesis tools based upon the design performances is also presented
in this thesis.

The principal contribution of this thesis is a detailed comparison of five different
multiplier algorithms using VHDL synthesis based design. The main approaches
taken for the task are:

• Designing multipliers using VHDL description. Ordinary bit-serial, non-redundant
radix-4 bit-serial, non-redundant radix-8 bit-serial, digit-serial, and parallel array
multipliers are designed for a 12-bit coefficient.

• To synthesize the designs, two VHDL synthesis tools, Synopsys and Powerview,
are used.

• Design implementation is done in the Xilinx 4010pg191-6 part type. The physical
testing of some of these multipliers is done using a Xilinx 4010pg191-6 prototype
board.

• Design performance parameters such as area, throughput rate and AT product are
compared for these designs.

In relation to the objectives defined in Chapter 1 and the approach used for this
research work, this thesis presents:

• A study of area-speed trade-off and parallelism in processor architectures using
multiplier designs. Bit-serial design with pipelining and radix-n recoding
techniques are stressed. A detailed comparison of the performance of the multiplier
algorithms studied is given in Section 6.3.

• A VHDL synthesis based design methodology for digital circuits.

• A detailed performance comparison of Xilinx 4010 FPGA implemented multiplier
algorithms.

• A comparison of two of the most popular VHDL synthesis based Electronic
Design Automation tools.

7.2 Further Work

Suggestions for some further work in relation to this thesis are given below:

• Variable CWL implementation for the multipliers can make them more flexible.
Also, radix-n recoding techniques can be used for array multipliers. This needs
further investigation.

• HDL description plays a vital role in shaping up the final product. Verilog has
recently been standardized (IEEE 1364-1995). It is possible to implement the
multiplier designs using Verilog synthesis.

• Different ASIC technologies give different performances. Full custom or some
other FPGA implementation of the designs can be done to explore better results.
This needs target library support in the synthesis tool.

[1] P. Denyer and D. Renshaw, "VLSI Signal Processing: A Bit-Serial Approach",
Addison-Wesley Publishing Company, 1989.

[2] Ben Cohen, "VHDL Coding Styles and Methodology", Kluwer Academic
Publishers, 1995.

[3] P. Kurup and T. Abbasi, "Logic Synthesis Using Synopsys", Kluwer Academic
Publishers, 1995.

[4] Morris Mano, "Digital Logic and Computer Design", Prentice Hall, 1979.

[5] P.J. Ashenden, "The VHDL Cookbook", First edition, Dept. of Computer
Science, University of Adelaide, Australia.

[6] R. Hartley and P. Corbett, "Digit-Serial Processing Techniques", IEEE Trans.
on Circuits and Systems, Vol-36, No. 6, pp.707-719, June 1990.

[7] K.K. Primlani and J.L. Meador, "A Nonredundant-Radix-4 Serial Multiplier", IEEE
Journal of Solid-State Circuits, Vol-24, No. 6, pp.1729-1736, Dec. 1989.

[8] R.F. Lyon, "Two's Complement Pipeline Multipliers", IEEE Trans. on
Communications, pp.418-425, April 1976.

[9] S.L. Freeny, "Special-Purpose Hardware for Digital Filtering", Proceedings of
the IEEE, Vol-63, pp.633-648, April 1975.

[10] P. Cappello, A. LaPaugh and K. Steiglitz, "Optimal Choice of Intermediate
Latching to Maximize Throughput in VLSI Circuits", IEEE Trans. on Acoustics,
Speech, and Signal Processing, vol.ASSP-32, No. 1, pp.28-33, Feb. 1984.


[11] R.J. Petersen and B.L. Hutchings, "An Assessment of the Suitability of FPGA-
Based Systems for Use in Digital Signal Processing", Brigham Young University,
Dept. of ECE.

[12] S.G. Smith, M.S. McGregor and P.B. Denyer, "Techniques to Increase the
Computational Throughput of Bit-Serial Architectures", IEEE Proc. on Intl. Conf.
on Acoustics, Speech and Signal Processing, pp.543-516, Apr. 1987.

[13] P. Ienne and M.A. Viredaz, "Bit-Serial Multipliers and Squarers", IEEE Trans.
on Computers, Vol. 43, No.12, pp.1445-1450, Dec. 1994.

[14] L. Kuhnel and Hartmut Schmeck, "A Closer Look at VLSI Multiplication",
Integration - the VLSI Journal, Vol.6, No.3, pp.345-359, Sept. 1988.

[15] R. Nagalla, "Synthesis of Digital Signal Processing Systems Using Pipelined
Bit-Serial Arithmetic", M.Sc. Thesis, University of Calgary, Nov 1991.

[16] R.F. Tinder, "Digital Engineering Design - A Modern Approach", Prentice Hall,
1991.

[17] N.H.E. Weste and K. Eshraghian, "Principles of CMOS VLSI Design - A System
Perspective", Addison-Wesley Publishing Company, 1993.

[18] "The Programmable Gate Array Data Book", XILINX Inc., 1994.

[19] Workview Plus, "Workview PLUS for Windows: VHDL Reference Manual",
Viewlogic Systems Inc., 1995.

[20] Viewlogic, "VHDL Designer User's Guide and Tutorial", Viewlogic Systems
Inc., 1992.

[21] Viewlogic, "VHDL Reference Manual for Synthesis", Viewlogic Systems Inc.,
1992.

[22] Viewlogic, "Viewsim/SD User's Guide", Viewlogic Systems Inc., 1992.

[23] Synopsys, "SYNOPSYS v3.4b Online Documentation", Synopsys Inc., 1996.

[24] E. Horbst, "Advances in CAD for VLSI - Volume 2: Logic Design and
Simulation", North-Holland, 1986.

[25] Arpad Barna, "VHSIC Technologies and Tradeoffs", John Wiley & Sons, 1981.

[26] Geoff Bostock, "Programmable Logic Devices: Technology and Application",
McGraw-Hill Book Company, 1988.

[27] Mark G. Sobell, "UNIX System V: A Practical Guide", Third edition, The
Benjamin/Cummings Publishing Company, Inc, 1995.

[28] L.J. Giacoletto, "Electronics Designers' Handbook", Second edition,
McGraw-Hill Book Company, 1977.

[29] A.P. Malvino, "Digital Computer Electronics: An Introduction to
Microcomputers", McGraw-Hill Book Company, 1983.

[30] M. Goossens, F. Mittelbach and A. Samarin, "The LaTeX Companion",
Addison-Wesley Publishing Company, 1994.

[31] "System Design and Rapid Prototyping Training Workbook", Canadian
Microelectronics Corporation, May 1996.

[32] D.E. Thomas and P. Moorby, "The Verilog Hardware Description Language",
Second Edition, Kluwer Academic Publishers, 1995.

[33] E.S. Yang, "Microelectronic Devices", McGraw-Hill, Inc., 1988.

APPENDIX

A XC-4000 CLB

Fig A.1 shows a simplified block diagram of the XC4000-series Configurable Logic
Block [18].

Figure A.1. XC4000-series CLB

B VHDL Description

The following VHDL code gives an example of design description at the highest
level of hierarchy. It is a structural description which uses concurrent component
instantiation. The behavioral description of each component is given in separate
files.

--------------------------------------------------------------------------
-- File        : "twelbitxilinx.vhd"
-- Description : A structural VHDL description for a 12 bit full product
--               Radix-4 non-redundant bit-serial multiplier. This is an
--               example showing top level design description.
-- Author      : Khem Pokhrel
--------------------------------------------------------------------------
library ieee, rad4lib, merolibrary;
use ieee.std_logic_1164.all;
use rad4lib.all;
use merolibrary.hardpack.all;

entity twelbitxilinx is
  port (coeffi, datai : in std_logic_vector(11 downto 0);
        clr, clock    : in std_logic;
        c0, clatch    : out std_logic;
        productout    : out std_logic_vector(22 downto 0));
end twelbitxilinx;

architecture design of twelbitxilinx is
  signal n1, n2, n3, n4, n5, n6 : std_logic;

  -- Radix-4 multiplier core (serial inputs, serial high product word)
  component twelbitfphard
    port (coeffi, datai, cntrli, clock : in std_logic;
          prodhigh, cntrlo : out std_logic;
          product : out std_logic_vector(10 downto 0));
  end component;

  -- Parallel-to-serial converter
  component partoser
    port (n : in std_logic_vector(11 downto 0);
          clk, ld : in std_logic;
          sout : out std_logic);
  end component;

  -- Control signal generator
  component controlgen
    port (clk, clear : in std_logic;
          c0, c1 : out std_logic);
  end component;

  -- Serial-to-parallel converter
  component sertopar
    port (clk, sin, sel : in std_logic;
          prod : out std_logic_vector(11 downto 0));
  end component;

begin
  comp1: partoser
    port map (n => coeffi, clk => clock, ld => n1, sout => n3);
  comp2: partoser
    port map (n => datai, clk => clock, ld => n1, sout => n4);
  comp3: controlgen
    port map (clk => clock, clear => clr, c0 => n1, c1 => n2);
  c0 <= n1;
  comp4: twelbitfphard
    port map (coeffi => n3, datai => n4, cntrli => n2,
              clock => clock, product => productout(10 downto 0),
              cntrlo => n6, prodhigh => n5);
  comp5: sertopar
    port map (clk => clock, sin => n5, sel => n6,
              prod => productout(22 downto 11));
  clatch <= n6;
end design;

-- The following configuration is for simulation
library ieee, rad4lib, merolibrary;
use ieee.std_logic_1164.all;
use rad4lib.all;
use merolibrary.hardpack.all;

configuration twelbitxilinxcfg of twelbitxilinx is
  for design
    for all: partoser use entity merolibrary.partoser(design);
    end for;
    for comp3: controlgen use entity merolibrary.controlgen(design);
    end for;
    for comp4: twelbitfphard use entity rad4lib.twelbitfphard(design);
    end for;
    for comp5: sertopar use entity merolibrary.sertopar(design);
    end for;
  end for;
end twelbitxilinxcfg;

C Design Compiler Script

The following is an example of a single script (collection of design flow commands)
used for analyzing VHDL, elaborating into intermediate Synopsys format, setting
constraints and compile methodology, pad allocation, synthesizing the design,
generating relevant reports and implementing into a Xilinx FPGA part. This demonstrates
the automated design flow in Synopsys tools.

/*****************************************************/
/* File        : Rad4max-speed-script */
/* Description : Synopsys Design Compiler script for analyzing VHDL */
/*               files, constraints setting, synthesizing, pad allocation */
/*               and implementation. Used for FPGA Compiler */
/* Author      : Khem Pokhrel */
/*****************************************************/

/* Removing all the designs from the Design Analyzer */
remove_design -all

/* Defining a design library mapped to ./rad4lib directory */
define_design_lib rad4lib -path ./rad4lib
designer = "Khem Pokhrel"
company = "Dept. of Elec. Engg., U of C"

/* Analyzing the Top level Designs and putting the intermediate files */
/* in rad4lib. Default WORK is mapped to this library. */
analyze -format vhdl fmodfp.vhd -lib rad4lib
analyze -format vhdl midmodfp.vhd -lib rad4lib
analyze -format vhdl lasmodfp.vhd -lib rad4lib
analyze -format vhdl twelbitfp.vhd -lib rad4lib

/* Elaborating the Designs */
/* elaborate fmodfp -library rad4lib */
/* elaborate midmodfp -library rad4lib */
/* elaborate lasmodfp -library rad4lib */
/* Only the Top level design needs to be Elaborated */
elaborate twelbitfp -library rad4lib

/* Checking Design */
/* Uniquifying the multiple instances of subdesigns in the */
/* hierarchy of Top level design and un-grouping to remove hierarchy */
set_flatten

/* Set the synthesis design constraints. And compile for Fastest Design */
remove_clock -all
create_clock clock -period 30

/* Add pads to the design. Make sure the current design is the toplevel module. */
set_port_is_pad "*"
set_pad_type -slewrate HIGH all_outputs()
insert_pads

/* Setting Attributes to the PADs according to the XC4010pg191-6 prototype */
/* Implementation Board */
set_attribute coeffi "pad_location" -type string "B1"
set_attribute datai "pad_location" -type string "C1"
set_attribute cntrli "pad_location" -type string "D1"
set_attribute clock "pad_location" -type string "B2"
set_attribute coeffo "pad_location" -type string "F16"
set_attribute datao "pad_location" -type string "H16"
set_attribute cntrlo "pad_location" -type string "K16"
set_attribute prodhigh "pad_location" -type string "L16"
set_attribute "prodlow<5>" "pad_location" -type string "D18"
set_attribute "prodlow<4>" "pad_location" -type string "D17"
set_attribute "prodlow<3>" "pad_location" -type string "E18"
set_attribute "prodlow<2>" "pad_location" -type string "F18"
set_attribute "prodlow<1>" "pad_location" -type string "G18"
set_attribute "prodlow<0>" "pad_location" -type string "G17"

/* Check Design Before Compiling */
check_design

/* Compile the design with optimization across hierarchical boundaries */
compile -map_effort high -verify -verify_effort low -boundary_optimization

/* Write report files */
report_fpga > "./db/max-speed/twelbitfp.fpga"
report_timing > "./db/max-speed/twelbitfp.fpga-timing"
report_area > "./db/max-speed/twelbitfp.fpga-area"

/* Saving the design */
write -format db -hierarchy -output "./db/max-speed/twelbitfp.fpga.db"

/* Replacing cells with gate level equivalents */
replace_fpga - g r o u p ~ -group-tlus

/* Report gate level usage */
report_area > "./db/max-speed/twelbitfp.gate-area"
report_timing > "./db/max-speed/[email protected]"

/* Write the gate level design */
write -format db -hierarchy -output "./db/max-speed/twelbitfp.gate.db"
write -format vhdl -hierarchy -output "./gstesh/max-speed/twelbitfp.gate.vhd"

/* Set the part type */
set_attribute twelbitfp "part" -type string "4010pg191-6"

/* Optional attribute to remove the FPGA Compiler's mapping to CLBs and IOBs */
/* from all levels. This removes the BLKNM parameters while writing XNF net-list. */
set_attribute find(design, "*") "xnfout_use_blknames" -type boolean FALSE

/* Keeping XNF in the Xilinx directory. */
write -format xnf -hierarchy -output "./xilinx/max-speed/[email protected]"
cd xilinx/max-speed
sh xmake twelbitfp

/* Directing the max clock frequency info from XDELAY program to a file */
sh xdelay -w twelbitfp.lca > twelbitfp.dly

/* Writing long delay report in a file and writing design file with net delay */

sh xdelay -w -x -O twelbitfp.opt [email protected]

Figures A.2 and A.3 show the schematic generated after synthesis and the simulation
waveform for the non-redundant radix-4 design.

Figure A.2. Schematic Diagram
