Design and Optimization Techniques of High Speed VLSI Circuit



  • Design and optimization techniques of high-speed VLSI circuits

    Marco Delaurenti

    PhD Dissertation

    December 1999

    Politecnico di Torino

    Advisor: Prof. Maurizio Zamboni

    Coordinator: Prof. Ivo Montrosset

  • Copyright © 1999 Marco Delaurenti

  • Writing comes more easily if you have something to say. (Sholem Asch)

    When I use a word, Humpty Dumpty said in rather a scornful tone, it means just what I choose it to mean, neither more nor less. (Lewis Carroll)

  • Acknowledgments

    First of all I would like to thank my advisor, Prof. M. Zamboni, together with Prof. G. Piccinini and Prof. G. Masera, for their invaluable help, and Prof. P. Civera for being a bridge toward the real world. Also many thanks to the VLSI LAB members at Politecnico di Torino, Italy: Mario for his input about the critical paths (no, I do not thank you for the jazz songs that you play all day long), Luca for the long discussions about books and movies (no, I haven't seen the last Kubrick movie), Andrea for his very good cocktails (especially the Negroni) and Danilo, because I forgot him every time we went to lunch. Thanks also to Max (for he gave me the root password), and to Yuan & Svensson for the invention of the TSPC. Special thanks, finally, to Mg, for her support and for having tolerated me till now.

  • CONTENTS

    Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix

    Part I CMOS Logic 1

    1. Introduction to CMOS logic . . . . . . . . . . . . . . . . . . . . . 3

    1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

    1.2 CMOS logic families . . . . . . . . . . . . . . . . . . . . . . . . 4

    1.2.1 Static logic families . . . . . . . . . . . . . . . . . . . . 5

    1.2.2 Dynamic logic families . . . . . . . . . . . . . . . . . . 6

    1.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

    Part II Circuit Modeling 13

    2. A simple model . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

    2.1 The Elmore model . . . . . . . . . . . . . . . . . . . . . . . . 16

    2.2 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

    3. A complex model . . . . . . . . . . . . . . . . . . . . . . . . . . 21

    3.1 The FAST model . . . . . . . . . . . . . . . . . . . . . . . . . . 22

    3.1.1 MOS equations . . . . . . . . . . . . . . . . . . . . . . 23

    3.1.2 Internal nodes approximation . . . . . . . . . . . . . . 24


    3.1.3 Body effect . . . . . . . . . . . . . . . . . . . . . . . . . 26

    3.2 Delay estimation . . . . . . . . . . . . . . . . . . . . . . . . . 31

    3.2.1 Equation solving . . . . . . . . . . . . . . . . . . . . . 32

    3.3 Power estimation . . . . . . . . . . . . . . . . . . . . . . . . . 36

    3.3.1 Switching energy . . . . . . . . . . . . . . . . . . . . . 36

    3.3.2 Short-circuit energy . . . . . . . . . . . . . . . . . . . 39

    3.3.3 Subthreshold energy . . . . . . . . . . . . . . . . . . 39

    3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

    3.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

    Part III Optimization 45

    4. Mathematical Optimization . . . . . . . . . . . . . . . . . . . . 47

    4.1 Optimization theory . . . . . . . . . . . . . . . . . . . . . . . 48

    4.1.1 Mono-objective optimization . . . . . . . . . . . . . . 49

    4.1.1.1 Unconstrained problem . . . . . . . . . . . . 51

    4.1.1.2 Constrained problem . . . . . . . . . . . . . 52

    Lagrange multiplier and Penalty functions . . 52

    4.1.2 Multi-objective optimization . . . . . . . . . . . . . . 54

    4.1.2.1 Unconstrained . . . . . . . . . . . . . . . . . 56

    4.1.2.2 Constrained . . . . . . . . . . . . . . . . . . 57

    Compromise solution . . . . . . . . . . . . . . 57

    4.2 Optimization Algorithms . . . . . . . . . . . . . . . . . . . . 58

    4.2.1 One-dimensional search techniques . . . . . . . . . . 59

    4.2.1.1 The section search . . . . . . . . . . . . . . . 59

    Dichotomic search . . . . . . . . . . . . . . . . . 59

    Fibonacci Search . . . . . . . . . . . . . . . . . 60


    The golden section search . . . . . . . . . . . . 60

    Convergence considerations . . . . . . . . . . . 61

    4.2.1.2 Parabolic interpolation . . . . . . . . . . . . 62

    Brent's rule . . . . . . . . . . . . . . . . . . . . . 62

    4.2.2 Multi-dimensional search . . . . . . . . . . . . . . . . 63

    4.2.2.1 The gradient direction: steepest (maximum) descent . . . 63

    4.2.2.2 The optimal gradient . . . . . . . . . . . . . 65

    Convergence considerations . . . . . . . . . . . 66

    4.2.3 The conjugate direction method . . . . . . . . . . . . 67

    4.2.3.1 The Fletcher-Reeves conjugate gradient algorithm . . . 68

    4.2.3.2 The Powell conjugate gradient algorithm . . 69

    4.2.4 The SLOP algorithm . . . . . . . . . . . . . . . . . . 70

    4.2.5 The simulated-annealing algorithm . . . . . . . . . . 72

    4.3 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

    5. Circuit Optimization . . . . . . . . . . . . . . . . . . . . . . . . 77

    5.1 Optimization targets . . . . . . . . . . . . . . . . . . . . . . . 78

    5.1.1 Circuit delay . . . . . . . . . . . . . . . . . . . . . . . . 79

    Critical Paths . . . . . . . . . . . . . . . . . . . 80

    5.1.1.1 Delay formula obtained by the Elmore model 84

    5.1.1.2 Delay measurement obtained by the FAST model and by HSPICE . . . 86

    5.1.2 Power consumption . . . . . . . . . . . . . . . . . . . 87

    5.1.3 Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

    5.2 Optimization examples . . . . . . . . . . . . . . . . . . . . . . 91

    5.2.1 Algorithm choice . . . . . . . . . . . . . . . . . . . . . 94


    5.2.2 Mono-objective optimizations . . . . . . . . . . . . . . 95

    5.2.2.1 Area . . . . . . . . . . . . . . . . . . . . . . . 95

    5.2.2.2 Power . . . . . . . . . . . . . . . . . . . . . . 96

    5.2.2.3 Delay . . . . . . . . . . . . . . . . . . . . . . 97

    5.2.3 Multi-objective optimizations . . . . . . . . . . . . . . 102

    5.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

    6. A CAD tool for optimization . . . . . . . . . . . . . . . . . . . . 107

    6.1 Logical description . . . . . . . . . . . . . . . . . . . . . . . . 107

    6.1.1 The optimization algorithm module (OAM) . . . . . . 107

    6.1.2 The function evaluation module (FEM) . . . . . . . . . 109

    6.1.3 Core engine . . . . . . . . . . . . . . . . . . . . . . . . 109

    6.2 Code implementation . . . . . . . . . . . . . . . . . . . . . . . 110

    6.2.1 The classes CircuitNetlist and Circuit . . . . . . . . . 110

    6.2.2 The class EvaluationAlgorithm . . . . . . . . . . . . . 112

    6.2.3 The class OptimizationAlgorithm . . . . . . . . . . . 113

    6.2.4 The critical path retrieving . . . . . . . . . . . . . . . 115

    6.2.5 The derived classes . . . . . . . . . . . . . . . . . . . . 116

    6.3 Program flows . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

    6.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

    7. Results and conclusions . . . . . . . . . . . . . . . . . . . . . . 121

    7.1 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

    7.1.1 Mono-objective vs. multi-objective . . . . . . . . . . . 122

    7.2 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

    7.3 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141


    Appendix 143

    A. Class graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

    B. Source code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

    B.1 Main functions . . . . . . . . . . . . . . . . . . . . . . . . . . 149

    B.2 Optimization algorithms . . . . . . . . . . . . . . . . . . . . . 208

    B.3 Simulators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216


  • LIST OF FIGURES

    1.1 Static and . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

    1.2 Pass-transistor logic xor . . . . . . . . . . . . . . . . . . . . . 6

    1.3 Domino typical gate . . . . . . . . . . . . . . . . . . . . . . . 7

    1.4 CVSL typical gate . . . . . . . . . . . . . . . . . . . . . . . . . 8

    1.5 C2MOS typical gate . . . . . . . . . . . . . . . . . . . . . . . . 9

    1.6 TSPC Latches . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

    2.1 RC MOS equivalence . . . . . . . . . . . . . . . . . . . . . . . 15

    2.2 RC chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

    2.3 RC single cell . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

    2.4 Elmore impulse response . . . . . . . . . . . . . . . . . . . . . 18

    3.1 Inverter voltages waveform . . . . . . . . . . . . . . . . . . . 23

    3.2 MOS chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

    3.3 Node voltages . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

    3.4 Voltage waveforms in the nMOS chain . . . . . . . . . . . . 27

    3.5 Voltage waveforms in the pMOS chain . . . . . . . . . . . . 28

    3.6 VDS and VGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

    3.7 MOSFET chain with static voltages . . . . . . . . . . . . . . . 30

    3.8 Threshold variation . . . . . . . . . . . . . . . . . . . . . . . . 31

    3.9 Delay comparison . . . . . . . . . . . . . . . . . . . . . . . . . 42

    3.10 Energy comparison . . . . . . . . . . . . . . . . . . . . . . . . 43


    4.1 Section search . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

    4.2 Minimization by Powell algorithm . . . . . . . . . . . . . . . 70

    4.3 Minimization by Powell algorithm . . . . . . . . . . . . . . . 71

    4.4 Minimization by SLOP algorithm . . . . . . . . . . . . . . . . 72

    4.5 Minimization by Simulated-annealing algorithm . . . . . . . 73

    4.6 Minimization by Simulated-annealing algorithm . . . . . . . 74

    5.1 Design flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

    5.2 Delay definition . . . . . . . . . . . . . . . . . . . . . . . . . . 79

    5.3 Critical paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

    5.4 Critical path tree . . . . . . . . . . . . . . . . . . . . . . . . . . 83

    5.5 Elmore delay . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

    5.6 Elmore delay . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

    5.7 HSPICE delay . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

    5.8 FAST delay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

    5.9 HSPICE Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

    5.10 CMOS Inverter . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

    5.11 TSPC Latches . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

    5.12 TSPC And gates . . . . . . . . . . . . . . . . . . . . . . . . . . 96

    5.13 TSPC Or gates . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

    5.14 Static and-or gate . . . . . . . . . . . . . . . . . . . . . . . . . 98

    5.15 Static parity gate . . . . . . . . . . . . . . . . . . . . . . . . . . 99

    5.16 Static full-adder . . . . . . . . . . . . . . . . . . . . . . . . . . 100

    5.17 TSPC full-adder (one stage) . . . . . . . . . . . . . . . . . . . 101

    6.1 Tool block diagram . . . . . . . . . . . . . . . . . . . . . . . . 108


    7.1 Comparison of 0.7 µm and 0.25 µm gates @ minimum technology width . . . 124

    7.2 Delay optimization of 0.7 µm gates . . . . . . . . . . . . . . 125

    7.3 Delay optimization of 0.25 µm gates . . . . . . . . . . . . . 126

    7.4 Technology comparison of delay optimization . . . . . . . . 127

    7.5 Several delay-power optimization policies of 0.7 µm gates . . 132

    7.6 Energy-dissipation variation (zoom of figure 7.5(b)) . . . . . 133

    7.7 Several delay-power optimization policies of 0.25 µm gates . 134

    7.8 Energy-dissipation variation (zoom of figure 7.7(b)) . . . . . 135

    7.9 Delay-power optimization (50%-50%) comparison of 0.7 µm and 0.25 µm gates . . . 136

    7.10 Delay and power trajectory during 4 different multi-objective optimizations for the and-or gate . . . 137

    7.11 Delay and power trajectory during 4 different multi-objective optimizations for the parity gate . . . 138

    7.12 Delay and power trajectory during 4 different multi-objective optimizations for the static full-adder . . . 139

    7.13 Delay and power trajectory during 4 different multi-objective optimizations for the dynamic full-adder . . . 140


  • LIST OF TABLES

    3.1 Mean Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

    3.2 Execution time . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

    4.1 Optimization algorithms . . . . . . . . . . . . . . . . . . . . . 75

    5.1 Basic gates: complexity . . . . . . . . . . . . . . . . . . . . . . 92

    5.2 Basic gates: pre-optimization delay, power consumption and area . . . 93

    5.3 Full-adder: delay optimization . . . . . . . . . . . . . . . . . 99

    5.4 Agreements of targets . . . . . . . . . . . . . . . . . . . . . . 103

    5.5 Full-adder: delay and power optimization . . . . . . . . . . 105

    5.6 Full-adder: optimizations comparison . . . . . . . . . . . . . 105

    7.1 Library gates list . . . . . . . . . . . . . . . . . . . . . . . . . . 122

    7.2 Delay and energy dissipation @ minimum width (HSPICE) . 123

    7.3 Delay decrease and energy increase (both relative) in a delay optimization . . . 128

    7.4 Elapsed time and total number of function evaluations for a full-delay optimization with HSPICE on an UltraSPARC 5 . . . 129

    7.5 Constrained delay optimization of a few 0.25 µm gates . . . 130

    7.6 Delay worsening and energy improvement between a full-delay optimization and a delay-power optimization . . . 133


  • Preface

    The design of a high-speed integrated circuit is a long and complex operation; nonetheless, the total time-to-market required from the idea to the silicon masks keeps shrinking. To help the designer along this long and winding road, several CAD tools are available. In the first step the only thing that exists is the description of the circuit behaviour (the idea); in the central steps of the design flow the designer knows only the logic function of each block composing the circuit, but ignores the technological realization of these blocks; in the last steps, finally, the designer knows exactly the technology implementation of every single gate of the circuit, and can compose the final layout with every gate. Ça va sans dire that CAD tools are nowadays of vital importance in the design flow; moreover, the quality of such tools strongly influences the quality of the final design.

    Among all the possible instruments, the optimization tools have a primary role in all the phases of a project, starting from the optimization at the higher levels and descending to the optimization made at the electrical level.

    This thesis focuses its efforts on developing new strategies and new techniques for the optimization made at the transistor-dimension level, that is, the one done by the cell-library engineer, and also on developing a CAD instrument to make this work as painless as possible.


  • Part I

    CMOS LOGIC

  • Chapter 1

    INTRODUCTION TO CMOS LOGIC

    THE optimization of VLSI circuits involves the optimization of single CMOS cells. This chapter briefly reports the basic CMOS logic families, with their pros and cons. The simple goal is to pick, among the static and dynamic logic families, the most appealing for use in VLSI circuits and, in some measure, the most actually used, and then apply to them the optimization techniques shown in the next chapters.

    1.1 Introduction

    We might ask: why optimize a single cell of a VLSI circuit, when design nowadays is shifting toward higher and higher levels?

    Some answers could be:

    The need for reusable library cells. This makes it easier to reuse the same library for different projects. It is a must nowadays, in order to reduce the total time-to-market.

    An optimized library makes the design at higher levels easier: floorplanning and routing can have relaxed constraints, since the gates behave better. It is possible to reduce the time spent repeating some critical steps like floorplanning or routing until all the specifications are met: these specifications are met earlier, since the cells globally behave better.

    The need for several equivalent libraries with different kinds of optimization. It is possible to have different libraries that have different


    specifications but are functionally equivalent, so that it is possible to create different versions of a project simply by substituting the basic library. It would be possible, for example, to have, of the same project, a version that runs at full speed and a version optimized for low power dissipation. This swapping of libraries does not involve the higher levels of design, for it is totally transparent to the designer during floorplanning or routing. Just before the layout production, during the cell mapping, it is possible to choose the library onto which the project will be mapped.

    These answers have led us to consider the appropriateness of producing a tool able to perform the optimization of a cell library, in a way appropriate for the designer. The goal is to produce some results showing that this optimization is worthwhile during a design cycle, and also to make the insertion of the tool into a design cycle as smooth as possible.

    In order to attain results that are related to a real production cycle, we have to choose some cells that are almost certainly present in a real library. For this purpose we introduce a very brief description of the most used CMOS logic families, and among them we choose the cells on which to develop and test the optimization framework.

    1.2 CMOS logic families

    The first basic distinction inside the CMOS logic families is between the static logics and the dynamic logics ([1]).

    Static logic: The static logic is a logic in which the functioning of the circuit is not synchronized by a global signal, namely the clock of the circuit. The output is solely a function of the inputs of the circuit, and it is asynchronous with respect to them. The timing of the circuit is defined exclusively by its internal delay.

    Dynamic logic: The dynamic logic is a logic in which the output is synchronized by a global signal, viz. the clock. The output is, then, a function both of the inputs of the circuit and of the clock signal, and the


    timing of the circuit is defined both by its internal delay and by the timing of the clock.

    Both the static and the dynamic logics comprise several logic families.

    1.2.1 Static logic families

    The principal static families are:

    Conventional static logic It is the logic normally referred to when speaking of static logic. A static circuit has the same number of NMOS and PMOS transistors, and the n and p branches are one the dual of the other. As an example see figure 1.1, which represents a static and gate.

    Fig. 1.1: Static and gate (inputs A and B, output OUT = A and B)

    It has two NMOS transistors connected in series and two PMOS connected in parallel.

    The static logic is quite fast, does not dissipate power in steady state and has a very good noise margin.

    Pseudo-NMOS It is an evolution of the now-surpassed NMOS logic. It is obtained by substituting the whole PMOS branch of a static logic with a single PMOS transistor whose gate is connected to ground. So this


    PMOS is always conducting and leads the output node to the high state. When the NMOS branch conducts as well, the output discharges, provided that the ratio between the NMOS and PMOS transistors is well designed.

    This logic is cited here only for historical reasons, since it is not so fast, it dissipates static power in steady state (when the output is in the low state) and it is sensitive to noise.

    Pass-logic The pass-logic is a relatively new logic and, for many digital designs, implementation in pass-transistor logic (PTL) has been shown to be superior in terms of area, timing, and power characteristics to static CMOS. As an example see figure 1.2.

    Fig. 1.2: Pass-transistor logic xor (inputs A, B and their complements, output OUT = A xor B)

    1.2.2 Dynamic logic families

    The principal dynamic families have a characteristic in common: every dynamic logic needs a pre-charge (or pre-discharge) transistor to lead some pre-charged nodes to a known state. This is done during the working phase known as the pre-charge phase or memory phase; during the other working phase, the evaluation phase, the output has a stable value¹.

    ¹ This brief introduction is limited to systems that have a single global clock, or one phase, intending here the word phase as a synonym of clock, and not, as above, as a synonym of working period. There are systems that have two, or even four, phases, but they are not introduced here. The basic functioning, however, remains the same.


    The principal dynamic logics are further divided into two sub-families, pipelined and non-pipelined. The first two below are non-pipelined, while the others are pipelined:

    Domino logic and NP Domino logic The typical domino gate is depicted in figure 1.3.

    Fig. 1.3: Typical domino gate (labels: CLOCK, NMOS block, INPUTs, OUT)

    During the pre-charge phase the clock is in its low state, so that the pre-charged node before the static inverter is high, and the output is low. During the evaluation phase the clock is high, so that the inputs of the n-block (which can perform any logical function) can discharge the pre-charged node and lead the output to the high state.

    We can cascade several of these gates, given that each gate has its own output inverter, and we can drive every gate with the same clock signal, given that the evaluation phase lasts the time necessary for all the gates to finish their input evaluation. This last fact explains why this is a non-pipelined logic: the output of every cell is available when the cell has finished its evaluation phase. Moreover this logic has a limited area occupancy, since it has a low number of PMOS transistors. On the other hand it is not possible to implement inverting structures and, as all the other dynamic logics, this logic is subject to the charge-sharing problem².



    A natural evolution of the domino logic is the N-P domino logic, or zipper logic. It consists of two typical cells: the one depicted in figure 1.3, and its dual, obtained by simply swapping the n-block with a p-block and the PMOS pre-charge transistor with an NMOS pre-discharge transistor, driven by the negated clock. This logic has a lower area occupancy, since there is no need for a static inverter, but it also has a lower speed, given the presence of PMOS transistors.

    Cascode voltage switch logic (CVSL) The CVSL is part of the large family of differential logics. It needs both the inputs and the negated inputs, and two complementary n-blocks that perform the logic function, as can be seen in figure 1.4.

    Fig. 1.4: Typical CVSL gate (complementary inputs; outputs OUT and its complement)

    It has the advantage of being quite fast, since the positive feedback of the two PMOS accelerates the switching of the gate, and it also has very good noise margins. Moreover it produces both the outputs and the

    ² The charge-sharing problem, or charge redistribution, is a problem that affects the dynamic logics. Basically, the charge stored in a precharged node during the memory phase does not remain fully stored in it. Think of a domino gate during the pre-charge phase, when the clock is low. If there is one input of the n-block that is high, then its corresponding transistor is conducting. The n-branch is still not conducting, since the clocked NMOS transistor is off, but some charge from the precharged node can flow to other nodes via the conducting transistors in the n-block. This redistribution of charge is simply a capacitive charge partition and leads the precharged node to a state lower than the high state.

    This problem can produce logic errors, and surely diminishes the noise margins of the cell.


    negated outputs without needing an inverter. As a drawback, it has a large area occupancy.

    C2MOS logic The typical C2MOS gate is shown in figure 1.5. It is basically a three-state gate: when the clock is in the low state, the output floats in the high-impedance state.

    Fig. 1.5: Typical C2MOS gate (clocked NMOS and PMOS blocks with the inputs, output OUT)

    It is principally used as a dynamic latch, as an interface between static logics and dynamic pipelined logics.

    NO RAce logic (NORA) The NORA logic, an acronym of no race, is an evolution of the N-P domino logic. The static inverter of the domino logic is substituted with a C2MOS inverter. This is the first of the pipelined logics, since the output of every gate is available only when the clock switches its state, and not before.

    Since the output stage of every cell is also dynamic (a C2MOS inverter), this logic is more subject to the charge-sharing problem than the domino logic is.


    True Single Phase Clock logic (TSPC) The final evolution of the NORA is the TSPC logic, or true single phase clock logic ([2]). The TSPC logic is an n-p logic, since each gate exists in an n-version and a p-version. For example, the n-latch and the p-latch are shown in figure 1.6.

    Fig. 1.6: TSPC latches: (a) type n; (b) type p

    The ultimate advantage of the TSPC logic is the presence of a single clock: thanks to its internal structure, the negated clock is not necessary.

    The TSPC logic is among the fastest dynamic families, and it is surely appealing for the very low number of transistors it employs.


    1.3 Conclusion

    After this very brief introduction to several CMOS families, we chose two different logics on which to apply the optimization techniques that are the object of this thesis. The criteria that drove us in choosing these families were both their diffusion in VLSI circuits and the presence of very good qualities, perhaps not yet fully exploited in the real production of circuits.

    For these reasons we have chosen to include in our library a few static gates (an and gate, an or gate, and a few more) and a few dynamic gates, in particular gates from the TSPC family. This family has shown good characteristics in terms of speed, area occupancy and power dissipation; it also has the very important feature of needing only a single clock.

    The complete list of the gates comprising the library can be found in table 7.1 (page 122), together with the schematic diagrams of their CMOS implementations.

  • Part II

    CIRCUIT MODELING

  • Chapter 2

    A SIMPLE MODEL

    THE first model applied to the calculation of the delay in MOS circuits is the Elmore model ([3]). It is a simple RC delay model, and it is the basis of a switch-level MOS model (figure 2.1): the generic MOS is represented, during the ON state, by its dynamic resistance between the drain pin and the source pin, plus the parasitic capacitances and resistances at the drain and source pins.

    Fig. 2.1: RC MOS equivalence (the MOS in the ON state replaced by resistances and capacitances at gate, drain and source)

    If this simple MOS model is valid, then the Elmore delay formula can be used in every structure containing MOSFETs. The Elmore formula is


    appealing for its simplicity and its ease of use; however, the accuracy of the formula worsens in the deep submicron domain, since the modeling of a MOS through its resistance is no longer valid.

    Since the use of the Elmore model is mostly limited to comparisons with other models, or to an introduction to delay modelling, section 2.1 presents here only the very basics of the Elmore model and section 2.2 draws the conclusions about the use of this model for VLSI circuits.

    2.1 The Elmore model

    The Elmore model, or the Elmore delay formula, can predict the delay of an RC chain like the one shown in figure 2.2.

    Fig. 2.2: RC chain (resistances R_i, capacitances C_i, node voltages V_i)

    In order to obtain the formula, let's start with a single RC cell, as shown in figure 2.3. We can express the voltage V_1(t) by means of a differential equation such as:

    C_0 \frac{dV_1}{dt} = \frac{V_0(t) - V_1(t)}{R_0}    (2.1)

    Integrating equation (2.1) for a step input V_0, we can write

    V_1(t) = V_0 \left[ 1 - e^{-\frac{t}{R_0 C_0}} \right].

    The time constant is \tau = R_0 C_0, and with t = \tau we obtain:


    Fig. 2.3: RC single cell (R_0, C_0, input V_0, output V_1)

    V_1(\tau) = 0.63\, V_0 .

    So the time t_D = \tau represents the 63% delay from V_0 to V_1. Extending the time-constant formula to the chain of figure 2.2, we obtain:

    t_D = \sum_{i=0}^{N} \left( \sum_{j=0}^{i} R_j \right) C_i .

    This delay is the input-to-output delay. When there is the need to know the delay between the input and one of the inner nodes, a more complex (semi-empirical) formula can be used; for example, with N = 2:

    t_1 = R_0 C_0 + q R_1 C_1            (delay from the input node to the first node)
    t_2 = R_0 C_0 + (R_0 + R_1) C_1      (delay from the input node to the output node)

    where q is:

    q = \begin{cases} \dfrac{R_0}{R_0 + R_1} & \text{if } R_1 \le 2 R_0 , \\ \dfrac{R_0 C_0}{R_0 C_0 + R_1 C_1} & \text{if } R_1 > 2 R_0 . \end{cases}


    The first case (with R_1 \le 2 R_0) is named strong coupling, while the second one is named weak coupling.
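    As a numerical illustration of the chain formula above, here is a minimal Python sketch; the function name and the example values are ours, not from the thesis:

        def elmore_delay(rs, cs):
            """Elmore delay of an RC chain: t_D = sum_i (sum_{j<=i} R_j) * C_i."""
            delay, r_upstream = 0.0, 0.0
            for r, c in zip(rs, cs):
                r_upstream += r           # total resistance from the input to node i
                delay += r_upstream * c   # each C_i sees all the upstream resistance
            return delay

        # Three identical cells with R = 1 kOhm and C = 10 fF each:
        print(elmore_delay([1e3] * 3, [10e-15] * 3))   # 6e-11 s, i.e. 60 ps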

    Given the unit impulse response h(t) (figure 2.4) of the output node of the RC tree, Elmore proposed to approximate the delay by the mean of h(t), considering h(t) as a distribution. The 50% delay is given by:

    Fig. 2.4: Elmore impulse response (h(t) versus t; m is the mean of the distribution)

    \int_0^{t_D} h(t)\,dt = 0.5

    while the original work of Elmore proposed:

    t_D = m = \int_0^{\infty} t\, h(t)\,dt

    with

    \int_0^{\infty} h(t)\,dt = 1 .


    This approximation is valid only when h(t) is a symmetrical distribution, as in figure 2.4, while in real cases the h(t) distribution is asymmetrical; however, in [4] it is proved that the Elmore approximation is an upper bound for the 50% delay even when the impulse response is not symmetrical and, furthermore, that the real delay asymptotically approaches the Elmore bound as the input signal rise (or fall) time increases.

    2.2 Conclusions

    The model shown in this chapter is quite appealing for the calculation of the delay in CMOS structures, but it becomes inaccurate as we go into the submicron domain, so its use should be limited to a first validation of an optimization algorithm, not to real production. In this regard, it is important to note that the delay functions obtained from the Elmore formula satisfy some properties useful in the optimization realm (for example equation (4.1), page 50): the Elmore model is therefore very useful for testing optimization algorithms.

  • Chapter 3

    A COMPLEX MODEL

    THE target of the model developed here is to offer limited estimation errors with respect to physical SPICE simulations and to improve the computation speed by more than one order of magnitude. This can be useful in optimization algorithms. Thus the aim of the model is to evaluate the delay and the power dissipation of CMOS structures.

    Several approaches have been used to evaluate the delays of CMOS structures: some models are derived from SPICE simulations by means of look-up tables [5]; some are analytical [6]; while others approximate the evaluation of the delay with step or ramp inputs [7, 8, 9, 10, 11].

    Regarding the power consumption, the main contributions are: switching power, short-circuit current and subthreshold conduction. The first one occurs during the charge and discharge of internal capacitances; the short-circuit current originates from the simultaneous conduction of the p and n networks and is dominated by the slope of the node voltages; the subthreshold currents are due to the weak-inversion conduction of the MOSFETs and become relevant when the power supply is scaled in sub-micron technologies.

    Most of the proposed power models use estimation algorithms not compatible with the delay analysis. The purpose of the FAST model is to combine delay and power evaluations in the same estimation procedure, allowing the simultaneous optimization of delay and power.


    Section 3.1 reports the theory behind the FAST model; in particular, 3.1.1 shows the MOS equations used in the model, 3.1.2 shows the internal node voltage approximation made by the model, and 3.1.3 explains how the threshold voltage variations are taken into account. Section 3.2 shows how the FAST model estimates the delay, and in particular 3.2.1 shows how the equations are solved; section 3.3 reports the method used for the calculation of the power consumption: 3.3.1 accounts for the switching power, 3.3.2 for the short-circuit power, and 3.3.3 for the subthreshold power. Finally, section 3.4 presents some results from the comparison of the model with HSPICE and section 3.5 draws some conclusions.

    3.1 The FAST model

    The low complexity and the accuracy that can be obtained by taking care of the phenomenon of carrier velocity saturation, which is dominant in submicron technologies, suggested the use of the classical charge-control analysis and of the gradual-channel approximation (Hodges model), described in 3.1.1.

    Estimation accuracy and low computational effort can be achieved by operating both on the waveforms of the internal signals and on topology considerations: in particular, all the waveforms in the circuit are approximated with linear ramps.

    By approximating the input waveform with a ramp, a strong simplification of the I(V) equations is obtained. Figure 3.1 shows the output voltage of an inverter driven by a ramp input. It can be noticed that a ramp can properly approximate the output voltage variation, especially in the central phase of the commutation. The increasing error on the tail of the switching does not significantly affect the delay and power estimation.

    The voltage ramp approximations are described in 3.1.2.


    Fig. 3.1: Inverter voltage waveforms (Vin, Vout and the model's ramp approximation; voltage in V versus time in ns)

    3.1.1 MOS equations

    The well-known equations for the MOS transistors (n-type and p-type) are [1]:

    below saturation:

    I_{DS_{n,p}} = \beta_{n,p} \left[ (V_{GS} - V_{T_{n,p}}) V_{DS} - \frac{V_{DS}^2}{2} \right]    (3.1)

    above saturation:

    I_{DS_{n,p}} = \frac{\beta_{n,p}}{2} \left[ V_{DSsat_{n,p}} \right]^2    (3.2)

    where \beta_{n,p} = \mu_{n,p} C_{ox} \frac{W}{L}, with \mu_{n,p} modified by the carrier velocity saturation effect:

    \mu_n = \frac{\mu_{n0}}{1 + \frac{V_{DS}}{L E_c}} \qquad \mu_p = \frac{\mu_{p0}}{1 - \frac{V_{DS}}{L E_c}}


    The saturation voltage (drain-source), not including the carrier velocity saturation effect, is given by the well-known formula:

    V_{DSsat_{n,p}} = V_{GS_{n,p}} - V_{T_{n,p}}

    while, considering the above-mentioned effect:

    V_{DSsat_{n,p}} = \pm V_c \left[ \sqrt{1 \pm \frac{2 (V_{GS_{n,p}} - V_{T_{n,p}})}{V_c}} - 1 \right]    (3.3)

    where the plus signs are for the nMOSFETs, the minus signs are for the pMOSFETs, and V_c = |E_c L|.
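    For clarity, a minimal Python sketch of the piecewise current model of equations (3.1)-(3.3) for an nMOS; the function name and the way \beta is degraded are our illustrative reading of the equations, not code from the thesis:

        import math

        def ids(vgs, vds, beta0, vt, vc):
            """Drain current of an nMOS per eqs. (3.1)-(3.3), with a simple
            mobility degradation for carrier velocity saturation (vc = |Ec*L|)."""
            if vgs <= vt:
                return 0.0                      # cut-off (subthreshold ignored here)
            beta = beta0 / (1.0 + vds / vc)     # velocity-saturated beta
            vds_sat = vc * (math.sqrt(1.0 + 2.0 * (vgs - vt) / vc) - 1.0)  # eq. (3.3)
            if vds < vds_sat:                   # below saturation, eq. (3.1)
                return beta * ((vgs - vt) * vds - vds ** 2 / 2.0)
            return 0.5 * beta * vds_sat ** 2    # above saturation, eq. (3.2)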

    3.1.2 Internal nodes approximation

    Fig. 3.2: MOS chain with proper numbering

    Let N be the number of nMOSFETs in the n-chain and P the number of pMOSFETs in the p-chain, and let's label the transistors in the chain


    from 1 to N or from 1 to P (figure 3.2). Let's assume that label 1 goes to the driving transistor (i.e. the nMOSFET with source connected to V_SS and the pMOSFET with source connected to V_DD), as in figure 3.2. This hypothesis is only for the development of the discussion; in our model any (but only one) transistor can be the driving transistor, that is, the transistor with a changing gate voltage.

    Notation 3.1. In the following equations the superscript index refers to the node number (with the variable i always for the nMOSFETs and j always for the pMOSFETs); the small-letter subscript indexes n and p refer, respectively, to nMOSFETs and pMOSFETs, both for the voltage variables and for the time variables; for the voltage variables the capital subscript indexes G and D refer to the gate node and the drain node, while the small-letter index d refers to the initial conditions of the drain nodes.

    So, for example, V^i_{Gn}(t) is the gate voltage at node i for the nMOSFETs (a function of time), and V^j_{dp} is the initial condition of the drain voltage at node j for the pMOSFETs.

    The voltage waveforms are shown in figure 3.4 and figure 3.5, with the hypothesis t^1_{0n} = t^2_{0n} = \dots = t^N_{0n} and t^1_{0p} = t^2_{0p} = \dots = t^P_{0p}; that is because we suppose that all the MOSFETs in a chain start conducting at the same time¹.

    We can write, referring to figures 3.4 and 3.5:

    V^1_{Gn}(t) = \begin{cases} 0 & t < 0 \\ \frac{V_{DD}}{\tau^1_{i_n}}\, t & 0 \le t < \tau^1_{i_n} \\ V_{DD} & \tau^1_{i_n} \le t \end{cases}    (3.4a)

    V^1_{Gp}(t) = \begin{cases} V_{DD} & t < 0 \\ V_{DD} - \frac{V_{DD}}{\tau^1_{i_p}}\, t & 0 \le t < \tau^1_{i_p} \\ 0 & \tau^1_{i_p} \le t \end{cases}    (3.4b)

    V^i_{Gn}(t) \Big|_{i=2,3,\dots,N} = V_{DD} \quad \forall t    (3.4c)

    ¹ This hypothesis is well supported by simulations.


    V^j_{Gp}(t) \Big|_{j=2,3,\dots,P} = V_{SS} \quad \forall t    (3.4d)

    V^i_{Dn}(t) \Big|_{i=1,2,\dots,N} = \begin{cases} V^i_{d_n} & t < t^i_{0_n} \\ V^i_{d_n} - V^i_{d_n} \dfrac{t - t^i_{0_n}}{\tau^i_{o_n} - t^i_{0_n}} & t^i_{0_n} \le t < \tau^i_{o_n} \\ V_{SS} & \tau^i_{o_n} \le t \end{cases}    (3.4e)

    V^j_{Dp}(t) \Big|_{j=1,2,\dots,P} = \begin{cases} V^j_{d_p} & t < t^j_{0_p} \\ V^j_{d_p} + (V_{DD} - V^j_{d_p}) \dfrac{t - t^j_{0_p}}{\tau^j_{o_p} - t^j_{0_p}} & t^j_{0_p} \le t < \tau^j_{o_p} \\ V_{DD} & \tau^j_{o_p} \le t \end{cases}    (3.4f)

    Fig. 3.3: The i-th and (i+1)-th MOSFETs with node voltages

    It is also possible to define \tau^i_{i_{n,p}} = \tau^{i-1}_{o_{n,p}} and the source voltage V^{i+1}_s = V^i_d, as shown in figure 3.3 for the i-th nMOS. The same is valid for the pMOSFETs.

    The starting levels V_{d_{n,p}} are determined with a static analysis, described in 3.1.3.
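    All the waveforms of equations (3.4a)-(3.4f) share the same shape; a tiny Python helper (ours, for illustration only) makes that explicit:

        def ramp(t, v_start, v_end, t0, t1):
            """Piecewise-linear waveform of eqs. (3.4a)-(3.4f): holds v_start up
            to t0, ramps linearly to v_end over [t0, t1], then holds v_end."""
            if t < t0:
                return v_start
            if t < t1:
                return v_start + (v_end - v_start) * (t - t0) / (t1 - t0)
            return v_end

        # e.g. the driving gate voltage of eq. (3.4a), rising 0 -> VDD in tau_i:
        vdd, tau_i = 5.0, 200e-12
        print(ramp(100e-12, 0.0, vdd, 0.0, tau_i))   # 2.5 V, halfway up the ramp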

    3.1.3 Body effect: threshold variation and its approximation

    It is known that a MOS transistor with a source-body voltage different from zero has its threshold voltage modified by the body effect; that


    Fig. 3.4: Voltage waveforms in the nMOS chain

    is, if V_sb \ne 0, with V_sb the source-body voltage (let's remember that for an nMOSFET V_b = V_SS and for a pMOSFET V_b = V_DD), then |V_{Th}|_{V_{sb} \ne 0} > |V_{Th}|_{V_{sb} = 0}. The initial conditions of the chain nodes are set by the initial condition on the output. So if the output node is discharging, then one (and only one) nMOSFET is switching from off to on. This means that all the other MOSFETs are already on and, while the starting voltage of the output node is V_DD, all the internal nodes have V_DD - V_{Tn} as starting voltage.

    With the notation of the previous paragraphs, the N-th (topmost) nMOS transistor has V^N_{s_n} = V_{DD} - V_{Tn}, with V_s the source potential and V_{Tn} the threshold voltage modified by the body effect. All the internal transistors have V^i_{d_n} = V^i_{s_n} = V_{DD} - V_{Tn}, while the first one has V^1_{d_n} = V_{DD} - V_{Tn} and V^1_{s_n} = 0.

    The threshold voltage variation as a function of V_sb is given by:

    \Delta V_{Tn} = \gamma \left( \sqrt{2 |\phi_p| + V_{sb}} - \sqrt{2 |\phi_p|} \right),

    with \gamma = \frac{\sqrt{2 \epsilon_s q N_a}}{C_{ox}} and \phi_p = \frac{kT}{q} \ln \left( \frac{N_a}{n_i} \right).


    Fig. 3.5: Voltage waveforms in the pMOS chain

    The source potential of the top transistor is

    V_s = V_{DD} - V_{Tn} ,

    and, if V_{Tn0} is the threshold voltage with V_{sb} = 0, then V_{Tn} = V_{Tn0} + \Delta V_{Tn} and we can solve for V_{sb}:

    V_{sb} = V_{DD} - V_{Tn0} + \gamma \sqrt{2 |\phi_p|} + \frac{\gamma^2}{2} - \frac{\gamma}{2} \sqrt{\gamma^2 + 4 \gamma \sqrt{2 |\phi_p|} + 8 |\phi_p| + 4 (V_{DD} - V_{Tn0})} \quad (> 0)

    We can find an analogous equation for the pMOSFETs, knowing that, for the pMOS chain depicted in figure 3.7(b), the drain potential of the last transistor is V^P_{d_p} = 0, while V^P_{s_p} = -V_{Tp}; for the middle transistors V^j_{d_p} = V^j_{s_p} = -V_{Tp}; and for the first (topmost) transistor V^1_{d_p} = -V_{Tp} and V^1_{s_p} = V_{DD}.

    The threshold voltage variation as a function of V_sb is again:


    Fig. 3.6: Drain-source (V_DS) and gate-source (V_GS) voltages of the i-th nMOS

    \Delta V_{Tp} = -\gamma \left( \sqrt{2 |\phi_p| - V_{sb}} - \sqrt{2 |\phi_p|} \right)

    (for pMOS transistors the threshold voltage is negative).

    Again, solving

    V_{sb} = -V_{DD} - V_{Tp} = -V_{DD} - V_{Tp0} - \Delta V_{Tp} ,

    where V_{Tp0} is the threshold voltage with V_{sb} = 0, we find:

    V_{sb} = \frac{\gamma}{2} \sqrt{\gamma^2 + 4 \gamma \sqrt{2 |\phi_p|} + 8 |\phi_p| + 4 (V_{DD} + V_{Tp0})} - \gamma \sqrt{2 |\phi_p|} - V_{DD} - V_{Tp0} - \frac{\gamma^2}{2} \quad (< 0)

    The threshold variation is approximated in the model by a linear approximation given by:


    Fig. 3.7: MOSFET chain with static voltages: (a) nMOSFET chain; (b) pMOSFET chain

    V_{Tn} = \alpha_n V_{sb} + \beta_n \qquad V_{Tp} = \alpha_p V_{sb} + \beta_p

    with \alpha_{n,p} and \beta_{n,p} constants:

    \alpha_n = \frac{V_{Tn} - V_{Tn0}}{V_{DD} - V_{Tn}} \qquad \beta_n = V_{Tn0}

    \alpha_p = \frac{V_{Tp} - V_{Tp0}}{V_{DD} + V_{Tp}} \qquad \beta_p = \frac{V_{Tp} \left( V_{DD} + V_{Tp0} \right)}{V_{DD} + V_{Tp}}


    Fig. 3.8: Threshold variation with V_sb (solid line) and its linear approximation (dashed line): (a) nMOSFET, V_{Tn}(V_{sb}); (b) pMOSFET, V_{Tp}(V_{sb})

    In figures 3.8(a) and 3.8(b) the actual threshold variation (of an nMOS transistor and a pMOS transistor) when a V_sb voltage is applied is compared with the linear approximation used in our model, for a 0.7 µm technology.

    The max error due to the linear approximation is limited to 7%.
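    A small Python sketch of the two curves of figure 3.8 for an nMOS; the numeric values are illustrative placeholders, not the 0.7 µm parameters of the thesis:

        import math

        VDD, VT0, GAMMA, PHI_P = 5.0, 0.8, 0.6, -0.35   # illustrative values

        def vt_body(vsb):
            """VTn(Vsb) = VTn0 + gamma*(sqrt(2|phi_p| + Vsb) - sqrt(2|phi_p|))."""
            return VT0 + GAMMA * (math.sqrt(2 * abs(PHI_P) + vsb)
                                  - math.sqrt(2 * abs(PHI_P)))

        # Self-consistent operating point Vsb* = VDD - VTn (topmost transistor),
        # found here by fixed-point iteration instead of the closed form above:
        vt = VT0
        for _ in range(30):
            vt = vt_body(VDD - vt)

        # Linear approximation through (0, VTn0) and the operating point:
        alpha_n, beta_n = (vt - VT0) / (VDD - vt), VT0
        vt_linear = lambda vsb: alpha_n * vsb + beta_n   # dashed line of fig. 3.8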

    3.2 Delay estimation

    The delay estimation of the structures reported in figure 3.2 implies the evaluation of \tau^i_{o_{n,p}} and t^i_{0_{n,p}} for each transistor in the chains.

    The currents in each transistor can be obtained from equations (3.1), (3.2) (page 23), with the voltages as functions of time defined in equations (3.4a)-(3.4f) (page 25). So we can calculate the quantity of charge at each node and thus apply the charge conservation law, i.e. at each node the total charge variation must be equal to zero:

    \Delta Q^i_n = 0 \qquad \Delta Q^j_p = 0 \qquad i = 1,2,\dots,N \ \text{and} \ j = 1,2,\dots,P    (3.5)

    The generic term \Delta Q^i_n is the sum of three elements, \Delta Q^i_n = Q^{i+1}_I - Q^i_I - Q^i_C, defined below:


    Q^{i+1}_I is the charge due to the (i+1)-th MOSFET placed above the i-th node:

    Q^{i+1}_I = \int_{t^{i+1}_{0_n}}^{t^{i+1}_{s_n}} I^{i+1}_{sat}(t)\,dt + \int_{t^{i+1}_{s_n}}^{\tau^{i+1}_{o_n}} I^{i+1}_{lin}(t)\,dt    (3.6a)

    which includes the contributions of the currents above and below saturation; t_s is the time at which the MOSFET switches from the saturation region to the linear region;

    Q^i_I is the charge due to the i-th MOSFET below the i-th node:

    Q^i_I = \int_{t^i_{0_n}}^{t^i_{s_n}} I^i_{sat}(t)\,dt + \int_{t^i_{s_n}}^{\tau^i_{o_n}} I^i_{lin}(t)\,dt    (3.6b)

    Q^i_C is the charge due to the discharging of the capacitor at the i-th node, C_i:

    Q^i_C = C_i V^i_{d_n} .    (3.6c)

    Similar equations apply for the pMOSFETs.

    For each circuit node, a charge conservation equation can be written.

    3.2.1 Equation solving

    Referring to the nMOS chain in figure 3.3, we can write at the output node N:

    Q^N_I = Q^N_C = C_N V^N_{d_n}    (3.7)

    because, neglecting the contribution of the pMOS chain above (if it exists), Q^{N+1}_I = 0.

    At the node N-1 we can write:

    \Delta Q^{N-1}_n = Q^N_I - Q^{N-1}_I - Q^{N-1}_C ,

    and combining with eq. (3.7) (page 32):

    \Delta Q^{N-1}_n = C_N V^N_{d_n} - Q^{N-1}_I - Q^{N-1}_C ,

    and so on:

    \Delta Q^{N-2}_n = C_N V^N_{d_n} + C_{N-1} V^{N-1}_{d_n} - Q^{N-2}_I - Q^{N-2}_C .

    More generally:

    Qin = N

    k=i+1

    CkVkdn QiI QiC

    = Nk=i

    CkVkdn QiI = 0

    Proceeding down to the first transistor, we obtain:

    \Delta Q^1_n = \sum_{k=1}^{N} C_k V^k_{d_n} - Q^1_I = 0 ;    (3.8)

    the same applies for the pMOSFETs.

    In order to solve the non-linear equation (3.8) one must substitute the definition of the current to calculate the charge Q, as in equations (3.6a), (3.6b) (page 32); moreover, one must substitute both the current calculated in the saturation region and the one calculated in the linear region, extending the integrals of the aforementioned equations to the proper extremes.

    Finally we must distinguish among several different cases, depending on the instant of time at which the transistor switches from the saturation region to the linear region. For example, the first transistor can switch


    between the two regions when the rising of the input has already finished, or on the contrary can switch when the input is still rising. All the possible cases are:

    t^1_0 \le t^1_s \le \tau^1_i \le \tau^1_o \qquad t^1_0 \le \tau^1_i \le t^1_s \le \tau^1_o
    t^1_s \le t^1_0 \le \tau^1_i \le \tau^1_o \qquad \tau^1_i \le t^1_0 \le t^1_s \le \tau^1_o
    t^1_s \le \tau^1_i \le t^1_0 \le \tau^1_o \qquad t^1_0 \le t^1_s \le \tau^1_o \le \tau^1_i
    t^1_s \le t^1_0 \le \tau^1_o \le \tau^1_i    (3.9)

    Evaluating all the possible cases, equation (3.8) becomes a non-linear equation in the variables t^1_s, t^1_0, \tau^1_o, \tau^1_i, with t^1_s, t^1_0, \tau^1_o as unknowns. A further step must be taken, with the purpose of eliminating all the variables but one. The real unknown is the time \tau^1_o, while all the other unknowns can be expressed as functions of \tau^1_o: in particular, the times t^1_s and t^1_0 can be calculated together, with the equation V_{DS} = V_{GS} - V_T* and with the equation that states the charge conservation at node 1 between time 0 and time t^1_0, similar to equation (3.5) (page 31), including the bootstrap effect due to the capacitive coupling between the gate and the drain of the first transistor. Both these equations are functions of t^1_s, t^1_0, \tau^1_o, \tau^1_i. In this way one has three equations in three unknowns, and by means of some approximate methods² it is possible to evaluate the three unknowns.

    This solution scheme ought to be repeated for all the seven cases shown in equation (3.9). Each case gives as a solution a triple t^1_s, t^1_0, \tau^1_o that is compatible with one and only one of the conditions expressed by these cases. Thus only one working condition is really selected, as can be expected.
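    To make the scheme concrete, here is a toy Python instance (our own construction, with illustrative values, not the thesis's solver): a single nMOS discharging a node capacitance, where the charge balance of eq. (3.8) is solved numerically for the output ramp time tau_o:

        from scipy.optimize import brentq

        C, VDD, BETA, VT = 50e-15, 5.0, 2e-4, 0.8   # illustrative values
        TAU_I = 200e-12                              # input rise time

        def injected_charge(tau_o, n=200):
            """Charge removed by the transistor while input and output are ramps
            (a numerical stand-in for the integrals of eqs. (3.6a)-(3.6b))."""
            q, dt = 0.0, tau_o / n
            for k in range(n):
                t = (k + 0.5) * dt
                vgs = min(VDD, VDD * t / TAU_I)            # input ramp, eq. (3.4a)
                vds = max(0.0, VDD * (1.0 - t / tau_o))    # output ramp, eq. (3.4e)
                if vgs > VT:
                    v = min(vds, vgs - VT)                 # linear vs. saturation
                    q += BETA * ((vgs - VT) * v - v ** 2 / 2.0) * dt
            return q

        # Charge conservation: the removed charge must equal C * VDD.
        tau_o = brentq(lambda t: injected_charge(t) - C * VDD, 1e-12, 10e-9)
        print(tau_o)   # estimated output fall time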

    Indeed, all of the previous solving scheme holds only if equation (3.6c) (page 32) applies, i.e. only if the capacitance at node i is not a function of the voltage at the same node. But the capacitance actually is a function of the voltage, in this manner:

    * Or, taking into account the carrier velocity saturation effect, equation (3.3) (page 24).
    ² The problem is always strictly non-linear.


    C_i = \frac{C_{i_j}}{\left( 1 + \frac{V_i}{\phi_b} \right)^{m_j}} + \frac{C_{i_p}}{\left( 1 + \frac{V_i}{\phi_b} \right)^{m_p}}    (3.10)

    where C_j and C_p are, respectively, a function of the area and a function of the perimeter of a junction, because the capacitance at node i is due to the parasitic capacitances of the transistors connected to this node.
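    Equation (3.10) translates directly into code; a one-function Python sketch (the default built-in potential phi_b is an illustrative placeholder):

        def node_cap(v, c_j, m_j, c_p, m_p, phi_b=0.7):
            """Voltage-dependent node capacitance of eq. (3.10): a junction-area
            term plus a junction-perimeter term, both bias-dependent."""
            return (c_j / (1.0 + v / phi_b) ** m_j
                    + c_p / (1.0 + v / phi_b) ** m_p)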

    If the capacitances at each node are functions of the voltage at the node itself, then one equation is no longer sufficient: one must write equations like equation (3.8) (page 33), one for each node, and then solve them with standard solving algorithms for non-linear equations. The only difference between the equations applied at the nodes above the first one and the first-node equation is that not all of the cases of equation (3.9) are possible: in particular, these conditions apply only when the transistor can pass from the saturation region to the linear region and, moreover, only when the input rising time \tau^1_i can assume whichever value. The passage from saturation to linearity can be made only by the first and the last transistors of the chain, as they are the only ones that can saturate³. But in the last transistor the time \tau^N_i is governed by \tau^N_i = \tau^{N-1}_o, thus giving only two possible cases:

    t^N_0 \le t^N_s \le \tau^N_i \le \tau^N_o \qquad t^N_0 \le \tau^N_i \le t^N_s \le \tau^N_o

    In order to make the algorithm convergent, two other fictitious cases must be included:

    t^N_0 \le t^N_s , \ \tau^N_o \le \tau^N_i \qquad t^N_s \le t^N_0 , \ \tau^N_o \le \tau^N_i

    These conditions can never occur in a real circuit, since they imply that the voltages at the source node and at the drain node of the last transistor

    ³ This is because they are the only ones that have a full voltage swing at some node, e.g. the gate node for the first one and the drain for the last one. All the transistors in the middle of the chain are prevented from saturating by the body effect, which makes the saturation condition V_{DS} = V_{GS} - V_T (or, better, equation (3.3), page 24) impossible.


    cross, making the transistor current flow in the inverse direction (see figure 3.6 for a visual explanation of the terms \tau_i and \tau_o and why their relative voltage waveforms cannot cross). Their inclusion helps in finding the real circuit conditions when solving equation (3.8) for each of these four cases: the solution of one of the fictitious cases gives only unknowns compatible with one of the real cases.

    All the other transistors, which cannot saturate during the switching from off to on, have only one possible working condition, again that the voltages at the source and drain nodes do not cross:

    \tau^j_i \le \tau^j_o \qquad j = 2, \dots, N-1

    Solving all the equations, one for each node, the unknowns \tau^j_o can be evaluated, thus giving an estimate of the voltage waveform at each node of the chain. The rising/falling time of the last node of the chain also gives the delay of the chain itself.

    3.3 Power consumption estimation

    3.3.1 Switching energy

    The contribution to the power dissipation due to the charge and discharge of the internal nodes of each MOSFET can be defined as the integral of the voltage across the MOSFET times the current flowing through it.

    Theorem 3.2. The switching energy in generic n-networks and p-networks can be written as:

    E_{sw_n} = \frac{1}{2} \sum_{i=1}^{N} C_i \left( V_i^2 - V_i'^2 \right)    (3.11)

    E_{sw_p} = \frac{1}{2} \sum_{j=1}^{P} C_j \left[ (V_{DD} - V_j)^2 - (V_{DD} - V_j')^2 \right]    (3.12)

    where C_i is the generic total capacitance of the i-th node and V_i, V_i' are, respectively, the initial and final values of the voltage swing at the same node.

  • 3.3. Power estimation 37

    Corollary 3.2.1. If the voltage swing of each node of the network is the full swing \Delta V = V_{DD} - 0, then equations (3.11), (3.12) can be written as:

    E_{sw_n} = \frac{1}{2} \sum_{i=1}^{N} C_i \, \Delta V^2    (3.13)

    E_{sw_p} = \frac{1}{2} \sum_{j=1}^{P} C_j \, \Delta V^2    (3.14)

    Proof of theorem 3.2. Since the internal voltages and currents are known from the delay analysis, the energy for the nMOS network can be written by summing all the contributions of the internal nodes (see figure 3.3):

    E_{sw_n} = \sum_{i=1}^{N} \int \left[ V^i_{Dn}(t) - V^{i-1}_{Dn}(t) \right] I^i_{Dn}(t)\,dt

    where the notation of figure 3.3 is adopted.

    This equation can be written in this way:

    E_{sw_n} = \int \left\{ V^N_{Dn}(t)\, I^N_{Dn}(t) + \sum_{i=1}^{N-1} V^i_{Dn}(t) \left[ I^i_{Dn}(t) - I^{i+1}_{Dn}(t) \right] \right\} dt    (3.15)

    It is possible to rewrite the previous equations by noting that, in general,

    I^{i+1}_{Dn} - I^i_{Dn} = C_i \frac{dV^i_{Dn}}{dt}

    and, in particular, if we neglect the current of the pMOS chain above the node N,

    I^N_{Dn} = -C_N \frac{dV^N_{Dn}}{dt} .

    Thus, for the n network it is possible to define the E_{sw_n} energy in the following way:

  • 38 Chapter 3. A complex model

    E_{sw_n} = -\sum_{i=1}^{N} C_i \int_{t_0}^{t_0'} V^i_{Dn} \frac{dV^i_{Dn}}{dt}\,dt = -\sum_{i=1}^{N} C_i \int_{V_i}^{V_i'} V^i_{Dn}\,dV^i_{Dn} = \frac{1}{2} \sum_{i=1}^{N} C_i \left( V_i^2 - V_i'^2 \right)

    If we evaluate the integrals only where their arguments are non-zero, then the first integral goes from t_0 = t^i_{0_n} to t_0' = \tau^i_{o_n}, so that the second integral goes from V_i = V^i_{Dn}(t^i_{0_n}) to V_i' = V^i_{Dn}(\tau^i_{o_n}). Since V^i_{Dn}(\tau^i_{o_n}) = 0, we have E_{sw_n} = \frac{1}{2} \sum_{i=1}^{N} C_i V_i^2, where V_i is the actual voltage swing at node i.

    The energy dissipated in the p network (E_{sw_p}) can be calculated with similar considerations, leading to:

    E_{sw_p} = \sum_{j=1}^{P} C_j \int_{t_0}^{t_0'} \left( V_{DD} - V^j_{Dp} \right) \frac{dV^j_{Dp}}{dt}\,dt = \sum_{j=1}^{P} C_j \int_{V_j}^{V_j'} \left( V_{DD} - V^j_{Dp} \right) dV^j_{Dp} = \frac{1}{2} \sum_j C_j \left[ (V_{DD} - V_j)^2 - (V_{DD} - V_j')^2 \right]

    Again, V_j = V^j_{Dp}(t^j_{0_p}) and V_j' = V^j_{Dp}(\tau^j_{o_p}), and in the same way V_j' = V_{DD}, so that E_{sw_p} = \frac{1}{2} \sum_{j=1}^{P} C_j (V_{DD} - V_j)^2, where (V_{DD} - V_j) is the voltage swing at node j.

    In equations (3.11) and (3.12) (page 36) the voltage variation of the capacitances must be included, obtaining expressions for E_{sw_{n,p}} slightly more complicated, but still in closed form.
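    A direct Python transcription of theorem 3.2 (ours, with constant capacitances; the example values are arbitrary):

        def switching_energy_n(caps, v_init, v_final):
            """Eq. (3.11): E_sw = 1/2 * sum_i C_i * (V_i^2 - V_i'^2) for the
            n-network; for the p-network use the VDD-referred swings of (3.12)."""
            return 0.5 * sum(c * (v0 ** 2 - v1 ** 2)
                             for c, v0, v1 in zip(caps, v_init, v_final))

        # Full-swing corollary 3.2.1: every node swings VDD -> 0:
        print(switching_energy_n([10e-15, 40e-15], [5.0, 5.0], [0.0, 0.0]))
        # 6.25e-13 J, i.e. 1/2 * (sum of C) * VDD^2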


    3.3.2 Short-circuit energy

    The short-circuit contribution (for an output falling transition) is given by:

    E_{sc} = \int_{t_0}^{\tau_o} V_{DD}\, I_D \,dt

    where I_D is the current flowing through the pMOSFET that has a changing gate voltage, during the output falling; of course all the pMOSFETs between this one and the output node must be on to have this contribution to the power dissipation. So, if we neglect the little discharging of the source voltage of this MOSFET, we can easily calculate the short-circuit energy by calculating the current flowing.

    A similar equation can be written for the nMOS network.

    Since voltage swings, internal currents and capacitances are known from the delay analysis, the power supply dissipation does not require additional computations.

    3.3.3 Subthreshold energy

    The subthreshold current in a MOSFET is given by ([12]):

    I_{DS_{subth}} = \mu_0 \frac{W}{L} \frac{kT}{q} Q(V_S) \left[ 1 - e^{-\frac{q V_{DS}}{kT}} \right]

    where

    Q(V_S) \simeq \frac{kT}{q} \sqrt{\frac{q \epsilon_s N_a}{|\phi_p|}} \; e^{\frac{q (V_G - V_T)}{\eta kT}}

    and

    \eta = 1 + \frac{1}{2 C_{ox}} \sqrt{\frac{\epsilon_s q N_a}{|\phi_p|}} .

    This current is proportional to the MOSFET width W but is usually


    negligible. However, with the scaling down of the dimensions, and hence of the threshold voltage, this current may become no longer negligible; moreover, with low V_G and higher V_D, the current becomes independent of V_G. While the short-circuit current is limited by the switching times of the circuit, the subthreshold current is not limited in time, so its dissipation can become comparable to the short-circuit dissipation.

    3.4 Results

    The circuit of figure 3.2 with 2 nMOS and 2 pMOS transistors (in a 0.7 µm technology) has been simulated using HSPICE (level 6) and the proposed model, for each combination of MOSFET widths from 1 µm to 100 µm. Figure 3.9 shows the comparison between the delay (defined as the 50% delay between an input rising ramp of 200 ps and an output falling ramp) calculated by the model and the delay simulated by HSPICE for each combination of widths between 5 µm and 30 µm; similarly, figure 3.10 shows the comparison between the energy dissipated by the circuit (during the output discharging) as calculated by the model and by HSPICE.

    Tab. 3.1: Mean error

                           Mean error   Max error   Min error
    Delay                      6.115%     12.985%      0.905%
    Energy dissipated          2.1%        6.3%        0.11%

    Tab. 3.2: Execution time

    HSPICE execution time   FAST execution time
    6384.3 sec.             188.91 sec.

    The errors between the proposed model and the HSPICE simulation are reported in table 3.1, while table 3.2 shows the corresponding execution times. These results are taken from the analysis of the circuit, varying the dimensions of the MOSFETs continuously from 1 µm to 100 µm.


    3.5 Conclusions

    The model of this chapter is suitable for the optimization applications of chapter 5. It is able to compute the delay and the power consumption of CMOS structures with good accuracy and a consistent speed-up with respect to the HSPICE simulation taken as a reference. In a real production design cycle, this model might be used for a first pre-optimization of some basic cells; then, in the last steps of the design flow, an optimization using a more accurate model for the delay (or power) evaluation must be used.

[Figure: two 3-D surface plots of delay [ps] versus W1 [µm] and W2 [µm], panels (a) FAST model and (b) HSPICE simulation.]

Fig. 3.9: Delay of the circuit of figure 3.2 with several combinations of W1 and W2.

[Figure: two 3-D surface plots of energy [fJ] versus W1 [µm] and W2 [µm], panels (a) FAST model and (b) HSPICE simulation.]

Fig. 3.10: Energy dissipated by the circuit of figure 3.2 with several combinations of W1 and W2.

Part III

    OPTIMIZATION

Chapter 4

MATHEMATICAL OPTIMIZATION

THE very basic theory of optimization is introduced here, in order to develop some optimization schemes, useful later for the optimization of real circuits. The theory of mono-objective optimization involves some properties and theorems regarding the minimum of functions, and hence the vanishing of the functions' first derivatives. These results can be extended (with some restrictions) to the case of multivariable functions, but when more than one function must be optimized simultaneously, a new theory must be introduced.

The goal of this introduction to mathematical optimization is twofold: the development of reliable algorithms, and the justification of some assumptions made in chapter 5 (page 77), especially for the multi-objective case.

Section 4.1 reports some foundations of mathematical optimization: in particular, 4.1.1 presents the theory of mono-objective optimization (unconstrained, 4.1.1.1, and constrained, 4.1.1.2), while 4.1.2 presents the theory of multi-objective optimization (unconstrained, 4.1.2.1, and constrained, 4.1.2.2). Section 4.2 reports the basic and most useful numerical algorithms for optimization purposes: 4.2.1 covers some one-dimensional search techniques, 4.2.2 some multi-dimensional search techniques, and 4.2.4, 4.2.5 some special algorithms. Some conclusions and summarized characteristics are reported in section 4.3.


    4.1 Optimization theory

Notation 4.1. In the following section, the function f is defined as $f : X \subseteq \mathbb{R}^p \to Y \subseteq \mathbb{R}$. X is called the decision space, and Y is called the criteria space.

Problem 4.2 (Unconstrained optimization). Given the function f that depends on one or more variables $x \in X$, the problem of optimizing f, in this context, amounts to finding:

$$\min_{x \in X} f(x)$$

this is also known as unconstrained optimization, since there are no constraints on the values the variables x may assume.

Unconstrained optimization is seldom applicable in the field of digital circuits, so constrained optimization is defined as:

    Problem 4.3 (Constrained optimization). Find

$$\min_{x \in X} f(x) \quad \text{subject to} \quad g_j(x) \le h_j, \quad j = 1, 2, \dots, m$$

where the m inequalities $g_j(x) \le h_j$ constitute the set of constraints of the optimization.

The function f is also called the objective of the optimization, or the cost function of the problem.

The above problems are classical optimization problems, or mono-objective problems. The multi-objective unconstrained optimization is defined as the problem of optimizing a vectorial function, so that the objective function is a vector of objective functions.

Notation 4.4. In the following (multi-objective optimization), the function f is defined as $f : X \subseteq \mathbb{R}^p \to Y \subseteq \mathbb{R}^n$, or $f = (f_1, f_2, \dots, f_n)$ with $f_i : X \subseteq \mathbb{R}^p \to Y_i \subseteq \mathbb{R}$.

Problem 4.5 (Unconstrained multi-objective optimization). Find

$$\min_{x \in X} f_i(x), \quad i = 1, 2, \dots, n$$


    where there are n objective functions.

    Finally, the multi-objective constrained optimization is defined as:

    Problem 4.6 (Constrained multi-objective optimization). Find

$$\min_{x \in X} f_i(x), \quad i = 1, 2, \dots, n \quad \text{subject to} \quad g_j(x) \le h_j, \quad j = 1, 2, \dots, m$$

    where there are n objective functions and m constraints.

Multi-objective optimization is a very complex problem, since the problem of finding the minimum of two or more functions is only apparently trivial: the set of independent variables $x_{min}$ that minimizes, let's say, the function $f_1$ does not necessarily minimize (and generally does not minimize) the other functions. So there must be a way to combine the minimum information among all the functions. The intuitive way, a linear combination, is somewhat problematic:

$$f_{tot}(x) = \sum_{i=1}^{n} \alpha_i\, f_i(x), \quad \alpha_i \in \mathbb{R},$$

because the functions $f_i$ may not be commensurable with each other. For example, if there is one function $f_j$ such that $f_j \gg f_i\ \forall i \ne j$, then this function dominates the total objective, giving false results for the optimization problem. This problem is illustrated in 4.1.2.

    4.1.1 Mono-objective optimization

Mono-objective optimization is the standard optimization problem, and is widely treated in the literature (see [13] for an introduction). With this preliminary statement, some results useful for finding a solution to problems 4.2 and 4.3 are reported here.

The existence of the minimum (at least one) is guaranteed by the Weierstrass theorem¹, but these minima can be local or global:

Definition 4.7 (Local Minimum). The point $x^\star \in X$ is a local (or relative) minimum of the function f iff $\exists \epsilon > 0 : f(x) \ge f(x^\star)\ \forall x \in X, |x - x^\star| < \epsilon$.

1 iff X is a compact set, as it is in this context


Definition 4.8 (Global Minimum). The point $x^\star \in X$ is a global (or absolute) minimum of the function f iff $f(x) \ge f(x^\star)\ \forall x \in X$.

Definition 4.9 (Feasible direction). $d \in \mathbb{R}^n$ is a feasible direction if $\exists \alpha^\star > 0 : x + \alpha d \in X,\ \forall \alpha : 0 \le \alpha \le \alpha^\star$.

In an intuitive manner, the concept of feasible direction is useful to solve the minimization problem: we search for all the directions along which the function f decreases.

Lemma 4.10 (First order necessary condition). If $x^\star \in X$ is a minimum of $f \in C^1$ then $\forall d \in \mathbb{R}^n$, where d is a feasible direction, $d^T \nabla f(x^\star) \ge 0$, where $(\cdot)^T(\cdot)$ has the usual definition of scalar product in the space $\mathbb{R}^n$.

Corollary 4.10.1. If $x^\star \in X$ is an internal point of X, then $d^T \nabla f(x^\star) = 0$.

Lemma 4.11 (Second order necessary condition). If $x^\star \in X$ is a minimum of $f \in C^2$ then $\forall d \in \mathbb{R}^n$, where d is a feasible direction,

i) $d^T \nabla f(x^\star) \ge 0$;
ii) if $d^T \nabla f(x^\star) = 0$ then $d^T \nabla^2 f(x^\star)\, d \ge 0$.

Corollary 4.11.1. If $x^\star \in X$ is an internal point of X, then

i) $d^T \nabla f(x^\star) = 0$;
ii) $d^T \nabla^2 f(x^\star)\, d \ge 0$.

The conditions of corollary 4.11.1 are necessary conditions for the existence of a (local) minimum, and become sufficient when the inequality in ii) holds strictly. In order to have some information about the existence of a global minimum, the theory of convex functions must be very briefly reported.

Definition 4.12 (Convex function). The function $f : X \to Y$, where X is a convex set², is convex if $\forall x_1, x_2 \in X$, $\forall \lambda : 0 \le \lambda \le 1$

$$f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2) \qquad (4.1)$$

2 A set $X \subseteq \mathbb{R}^n$ is convex if $\forall x, y \in X$ the segment [x, y] is totally contained in X


If in equation (4.1) the strict sign < applies, then the function is said to be strictly convex.

Another way to write equation (4.1) is:

Lemma 4.13. The function $f \in C^1 : X \to Y$ is convex over a convex set X if

$$f(y) \ge f(x) + \nabla f(x)^T (y - x), \quad \forall y, x \in X$$

or, if f is twice differentiable,

Lemma 4.14. The function $f \in C^2 : X \to Y$ is convex over a convex set X if

$$\nabla^2 f(x) \ge 0, \quad \forall x \in X$$

Convex functions are a very useful mathematical tool in this class of optimization problems, mainly because of the next two results:

Theorem 4.15. If $f : X \to Y$ is convex over a convex set X, the set A of the minima of the function is convex, and every local minimum is also a global minimum.

Theorem 4.16. If $f \in C^1 : X \to Y$ is convex over a convex set X, and if $\exists x^\star \in X : \forall x \in X\ \nabla f(x^\star)^T (x - x^\star) \ge 0$, then $x^\star$ is a global minimum of f over X.

Theorem 4.16 also implies that the conditions of lemma 4.10 and corollary 4.10.1 (first order conditions) are both necessary and sufficient for the existence of a global minimum.

    4.1.1.1 Unconstrained problem

All the previous results are, at least in theory, sufficient to solve problem 4.2. The theory of convex functions ensures the existence of a global minimum, while lemma 4.10, corollary 4.10.1, and theorem 4.16 suggest a method to find this minimum. We will see in 5.1 how these methods apply to real circuits, in which, for example, the functions' derivatives are not available.


    4.1.1.2 Constrained problem

The solution of problem 4.3 is slightly more complicated. The presence of constraints reduces the feasible set of independent variables that are solutions of the problem. So the solutions (i.e. the values of the independent variables that minimize the objective function) must be searched for in the set $C \subseteq X$ of points that satisfy all the constraints. The most important method for solving the minimization problem while taking into account the satisfaction of some constraints (and, incidentally, the method most useful for our real problem) is the method of the Lagrange multipliers (and its derivation, the method of the penalty functions).

Lagrange multipliers and penalty functions. The first method defines a Lagrangian function:

$$L(x, \lambda) = f(x) + \sum_{i=1}^{m} \lambda_i\, g_i(x) \qquad (4.2)$$

If we define $x^\star$ as the solution such that:

$$x^\star = \arg\min_{x \in X} f(x), \qquad g_i(x^\star) \le 0, \quad i = 1, 2, \dots, m,$$

then we can write the necessary Kuhn-Tucker conditions for the existence of the minimum:

$$\nabla_x L(x^\star, \lambda^\star) = 0 \qquad (4.3)$$
$$\nabla_\lambda L(x^\star, \lambda^\star) \le 0 \qquad (4.4)$$
$$(\lambda^\star)^T g(x^\star) = 0 \qquad (4.5)$$
$$\lambda^\star \ge 0 \qquad (4.6)$$

In order to find sufficient conditions, we define the saddle-point conditions:

Theorem 4.17. A point $(x^\star, \lambda^\star)$ with $\lambda^\star \ge 0$ is a saddle-point of the Lagrangian $L(x, \lambda)$ iff

i) $x^\star$ minimizes $L(x, \lambda^\star)$ over the whole X;

ii) $g_i(x^\star) \le 0, \quad i = 1, 2, \dots, m$;

iii) $\lambda_i^\star\, g_i(x^\star) = 0, \quad i = 1, 2, \dots, m$.

It can be proved that if the functions f, g are convex, even if not differentiable, then the saddle-point conditions are necessary and sufficient. Although these conditions must hold at the minimum, they are not very useful in determining the optimum point: the determination of the optimum by direct solution of these equations is rarely practicable.

A more feasible way is to convert the constrained problem into an unconstrained one, by defining the new objective function:

$$P(x, K) = f(x) + \sum_{i=1}^{m} K_i\, [g_i(x)]^2 \qquad (4.7)$$

The sum added to the objective function is called a penalty function, since it penalizes the objective function by adding positive quantities (recall that we want to minimize the cost function). The constants $K = [K_1, K_2, \dots, K_m]^T$ are positive weighting factors that define how strongly the i-th constraint must be satisfied, and that can also make the terms commensurable.

Whenever x is inside the feasible region, we can ignore the constraints, so a new objective function can be defined as:

$$P(x, K) = f(x) + \sum_{i=1}^{m} K_i\, [g_i(x)]^2\, u_i(g_i) \qquad (4.8)$$

where $u_i(g_i)$ is the usual step function:

$$u_i(g_i) = \begin{cases} 0 & \text{if } g_i(x) \le 0 \\ 1 & \text{if } g_i(x) > 0 \end{cases}$$

The introduction of the step function makes it possible to relate the penalty function defined in (4.8) to the Lagrangian function of (4.2) (page 52):

$$P(x, K) = L(x, \lambda)$$

if we let $\lambda_i = K_i\, g_i(x)\, u_i(g_i)$, so that all the previous results valid for the Lagrangian function are valid for the penalty function. Note that the solution $x^\star$ found by optimizing the penalty function $P(x, K)$ converges to $(x^\star, \lambda^\star)$, defined by the Kuhn-Tucker conditions, only in the limit $K \to \infty$.
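As a concrete illustration of the scheme, here is a minimal sketch in Python (NumPy and SciPy assumed available; the test problem and the name penalty_minimize are illustrative, not part of any tool described in this thesis). It minimizes the penalized objective of eq. (4.8) with an increasing K, using the derivative-free Nelder-Mead method for the inner unconstrained minimization, in the spirit of the gradient-free algorithms of section 4.2:

    import numpy as np
    from scipy.optimize import minimize

    def penalty_minimize(f, constraints, x0, k0=1.0, growth=10.0, outer=6):
        # Quadratic-penalty scheme of eq. (4.8): every g in `constraints`
        # encodes g(x) <= 0; penalizing only the violated constraints
        # plays the role of the step function u_i(g_i).
        x = np.asarray(x0, dtype=float)
        k = k0
        for _ in range(outer):
            def p(x, k=k):
                pen = sum(max(g(x), 0.0) ** 2 for g in constraints)
                return f(x) + k * pen
            x = minimize(p, x, method="Nelder-Mead").x
            k *= growth   # only K -> infinity recovers the exact optimum
        return x

    # Example: minimize (x - 2)^2 subject to x - 1 <= 0; the optimum is x = 1.
    print(penalty_minimize(lambda x: (x[0] - 2.0) ** 2,
                           [lambda x: x[0] - 1.0], x0=[0.0]))

Growing K gradually, instead of starting with a huge value, keeps the inner problems well conditioned while still approaching the constrained optimum.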

    4.1.2 Multi-objective optimization

Multi-objective optimization is not a standard problem in engineering, but it is quite common in economics ([14]). While in the mono-dimensional problem the concept of optimum as a minimum is quite clear and well defined (the idea of greater or lesser is intuitive with real numbers), with multi-objective (also multi-criteria) problems the concept of minimum is less intuitive. So we must define some order relation among the points of a multi-dimensional space.

Notation 4.18. Given $x, y \in \mathbb{R}^n$, define

$x = y$ iff $x_k = y_k$, $k = 1, 2, \dots, n$
$x \leqq y$ iff $x_k \le y_k$, $k = 1, 2, \dots, n$
$x \le y$ iff $x \leqq y$ and $x \ne y$ (so $\exists k : x_k < y_k$)
$x < y$ iff $x_k < y_k$, $k = 1, 2, \dots, n$

Notation 4.19. In the following section, the function f is defined as $f : X \to Y$, $X \subseteq \mathbb{R}^p$, $Y \subseteq \mathbb{R}^n$. X is called the decision space, while Y is called the criteria space.

Given two outcomes $y^1, y^2$ of the cost functions, $y^1 = f(x^1)$ and $y^2 = f(x^2)$, we must define which one is better: we indicate that $y^1$ is better than $y^2$ with $y^1 \succ y^2$, that $y^1$ is worse than $y^2$ with $y^1 \prec y^2$, and, finally, that $y^1$ is indifferent with respect to $y^2$ with $y^1 \sim y^2$.

In optimization theory, great importance attaches to the definition of Pareto point or Pareto preference:

Definition 4.20 (Pareto preference). Given $y^1, y^2 \in Y$, the Pareto preference is defined by

$y^1 \succ y^2$ iff $y^1 \le y^2$.

    A Pareto preference is intuitively guided by the relation lesser is better.

Definition 4.21 (Non-dominated and dominated set). If $\succ$ is a binary preference defined on Y, the dominated and the non-dominated sets with respect to $\{\succ\}$ are defined as:

$$N(\{\succ\}, Y) = \{y^0 \in Y \mid \nexists y \in Y : y \succ y^0\}$$
$$D(\{\succ\}, Y) = \{y^0 \in Y \mid \exists y \in Y : y \succ y^0\}$$

If $y^0 \in N(\{\succ\}, Y)$, $y^0$ is an N-point. Similarly, if $y^0 \in D(\{\succ\}, Y)$, $y^0$ is a D-point.

Definition 4.22 (Pareto optimum). $y \in Y$ is a Pareto optimum iff it is an N-point with respect to the Pareto preference.
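For a finite set of outcomes, the N-points can be extracted by brute force directly from definitions 4.20-4.22. A minimal Python sketch (NumPy assumed; pareto_points is an illustrative name, not a routine of this thesis):

    import numpy as np

    def pareto_points(Y):
        # Return the N-points (Pareto optima) of a finite outcome set Y,
        # an (m, n) array of m outcomes by n criteria, under the
        # "lesser is better" Pareto preference.
        Y = np.asarray(Y, dtype=float)
        keep = []
        for i, y0 in enumerate(Y):
            # y dominates y0 iff y <= y0 componentwise and y != y0.
            dominated = any(np.all(y <= y0) and np.any(y < y0)
                            for j, y in enumerate(Y) if j != i)
            if not dominated:
                keep.append(i)
        return Y[keep]

    # Example with two criteria (say, delay and energy of candidate sizings):
    print(pareto_points([[3.0, 1.0], [2.0, 2.0], [1.0, 3.0], [2.5, 2.5]]))
    # [2.5, 2.5] is dominated by [2.0, 2.0] and is discarded.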

We will now give two theorems that are fundamental for the solution of the multi-objective optimization problem; first we introduce the definition of convex cones in $\mathbb{R}^n$:

Notation 4.23 (Convex cones).

$$\Lambda^> = \{d \in \mathbb{R}^n \mid d > 0\}, \quad \Lambda^\ge = \{d \in \mathbb{R}^n \mid d \ge 0\}, \quad \Lambda^\geqq = \{d \in \mathbb{R}^n \mid d \geqq 0\}$$

Theorem 4.24. i) if $y^0 \in Y$ minimizes $\lambda^T y$ over Y for some $\lambda \in \Lambda^>$, then $y^0$ is an N-point;

ii) if $y^0 \in Y$ uniquely minimizes $\lambda^T y$ over Y for some $\lambda \in \Lambda^\ge$, then $y^0$ is an N-point.


Corollary 4.24.1. If Y is $\Lambda^\geqq$-convex, i.e. $Y + \Lambda^\geqq$ is a convex set, then a necessary condition for $y^0 \in Y$ to be an N-point is that it minimizes $\lambda^T y$ over Y for some $\lambda \in \Lambda^>$.

This very important theorem (and its corollary) states that if $y^0$ minimizes a linear weighted function $\lambda^T y$ (for some $\lambda$), then $y^0$ is a Pareto optimum. This reduces the problem from a multi-objective one to a mono-objective one, i.e. it is sufficient to minimize a linear weighted function of the cost functions. Note that:

$$\frac{\partial y_i}{\partial y_j} = -\frac{\lambda_j}{\lambda_i}$$

so the ratio $\lambda_j / \lambda_i$ is the trade-off exchanging a unit gain in the variable $y_j$ for a unit gain in the variable $y_i$. Finally, note that the theorem is valid for any shape of Y.
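Over a finite outcome set, the reduction suggested by theorem 4.24 is a one-line scalarization. A minimal Python sketch (NumPy assumed; the helper name is illustrative):

    import numpy as np

    def weighted_sum_optimum(Y, lam):
        # Theorem 4.24 i): any strictly positive weight vector lam turns
        # the multi-objective problem into the mono-objective problem
        # min over lam^T y, whose minimizer is an N-point.
        Y, lam = np.asarray(Y, dtype=float), np.asarray(lam, dtype=float)
        assert np.all(lam > 0)
        return Y[np.argmin(Y @ lam)]

    Y = [[3.0, 1.0], [2.0, 2.0], [1.0, 3.0]]
    # A large weight on the first criterion selects the outcome with small y1:
    print(weighted_sum_optimum(Y, [0.8, 0.2]))   # -> [1.0, 3.0]

Sweeping the weight vector traces different Pareto optima, which is how the weighted-sum method explores the trade-off curve.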

Theorem 4.25. A necessary and sufficient condition for $y^0 \in Y$ to be an N-point is that $\forall i = 1, 2, \dots, n$ there are $n - 1$ constants $\{h_j \mid j \ne i,\ j = 1, 2, \dots, n\}$ such that $y^0$ uniquely minimizes $y_i$ over $Y(\{h_j\}) = \{y \in Y \mid y_j \le h_j,\ j \ne i,\ j = 1, 2, \dots, n\}$.

Each constant $h_j$ can be seen as a constraint: so this theorem claims that a necessary and sufficient condition to be a Pareto optimum is to minimize one criterion (the i-th objective function) while satisfying the constraints on the remaining criteria. This is equivalent to saying that the multiple-criteria problem can be reduced to a single-criterion problem: minimize the function $y_i$ under the multiple constraints $y_j \le h_j$, $j \ne i$.

    4.1.2.1 Unconstrained

Given all the previous results, the solution of the unconstrained problem is within reach of the previous tools: we reduce the multi-objective problem to a mono-objective one. We will see in 5.1 how to apply these methods and which one is preferred.


    4.1.2.2 Constrained

Again, the solution is to reduce the problem from multi-objective to mono-objective. It is possible to combine the two previous methods, that is, to minimize a linear weighted function plus a sum of penalty functions; the only critical point is to ensure the same order of magnitude for each term of the sum, so that no single term dictates the result. A third way to solve an unconstrained problem (or a constrained one, but with some care) is the method of the compromise solution:

Compromise solution. Given problem 4.3, it is possible to define $y^\star$ as the ideal outcome of the cost function f(x) without any constraints, so that $y^\star = \inf_{x \in X} f(x)$; the compromise solution is defined as the minimum of the regret:

$$r(y) = y - y^\star;$$

typically, the $L_p$-norm (the distance between the actual solution and the ideal point) is used:

$$r(y) = r(y; p) = \left[\sum_{i=1}^{n} |y_i - y_i^\star|^p\right]^{\frac{1}{p}}.$$

Again, a weight can be associated with each term of the sum:

$$r(y; p, w) = \left[\sum_{i=1}^{n} w_i^p\, |y_i - y_i^\star|^p\right]^{\frac{1}{p}}.$$

Definition 4.26 (Compromise solution). The compromise solution with respect to the $L_p$-norm is the point $y^p \in Y$ that minimizes $r(y; p, w)$ over Y.

The compromise solution enjoys several properties, the most important being:

Property 4.27 (Pareto optimality). The compromise solution $y^p \in Y$ is an N-point, for $1 \le p < \infty$.
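A minimal sketch of the compromise solution over a finite outcome set (Python with NumPy assumed; the helper name is illustrative):

    import numpy as np

    def compromise_solution(Y, y_ideal, p=2, w=None):
        # Pick the outcome closest to the ideal point y* in the weighted
        # Lp-norm, i.e. minimize r(y; p, w); by property 4.27 the result
        # is an N-point for 1 <= p < infinity.
        Y = np.asarray(Y, dtype=float)
        y_ideal = np.asarray(y_ideal, dtype=float)
        w = np.ones(Y.shape[1]) if w is None else np.asarray(w, dtype=float)
        regret = np.sum((w * np.abs(Y - y_ideal)) ** p, axis=1) ** (1.0 / p)
        return Y[np.argmin(regret)]

    # Taking the per-criterion infimum as the ideal point:
    Y = np.array([[3.0, 1.0], [2.0, 2.0], [1.0, 3.0]])
    print(compromise_solution(Y, Y.min(axis=0)))   # -> [2.0, 2.0]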


When the ideal point is not known, one can use an approximation or even a constraint; in the latter case the more appropriate term is satisfying level. To point out the differences between constraints and satisfying levels, one must observe:

The constraints are, typically, inequality constraints: the solution must be as far below the specified constraints as possible. In terms of an $L_p$-norm, the solution must be as far as possible from the constraints, that is, the $L_p$-norm must not be minimized. So the method of the penalty function is the only one suitable for this kind of problem.

The satisfying levels are, typically, equality constraints: the solution must be as close as possible to the indicated levels, that is, the $L_p$-norm must be minimized. So the method of the compromise solution can be devised.

    4.2 Optimization Algorithms

This is a very concise survey of some algorithms used in the optimization of real circuits in the following chapters.

First, some one-dimensional (with respect to the decision space) algorithms are reported, then the multi-dimensional algorithms, some of which are based on the previous ones. Finally, some non-standard algorithms are reported, since they can be suitable for the application to digital circuits.

In the following we focus on the algorithms that do not require the evaluation of the gradient of the objective functions, or that approximate this gradient³, since (see 5.1) the functions available in real circuits are not known in a closed form and almost never are their derivatives.

3 Essentially with $\frac{\partial f}{\partial x_i}(x) \simeq \frac{f(x + \Delta x_i) - f(x)}{\Delta x_i}$
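A minimal sketch of the forward-difference approximation of footnote 3 (Python with NumPy assumed; the helper name is illustrative). It costs one extra function evaluation per decision variable, which matters when each evaluation is a circuit simulation:

    import numpy as np

    def numerical_gradient(f, x, dx=1e-6):
        # Forward difference: df/dx_i ~ (f(x + dx*e_i) - f(x)) / dx.
        x = np.asarray(x, dtype=float)
        f0 = f(x)
        g = np.zeros_like(x)
        for i in range(x.size):
            xp = x.copy()
            xp[i] += dx
            g[i] = (f(xp) - f0) / dx
        return g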


    4.2.1 One-dimensional search techniques

In order to find the minimum of a function $f : \mathbb{R} \to \mathbb{R}$, we first need to bracket it:

Definition 4.28 (Bracketing). To bracket a minimum means to find a triplet $a, b, c \in \mathbb{R}$, $a < b < c$, such that $f(b) < f(a)$ and $f(b) < f(c)$. This means that the minimum is in the interval (a, c).

We now show some algorithms that are among the most efficient in this field. First we introduce the family of sectioning algorithms, of which the golden section search is probably the most suitable for our uses. Then we introduce Brent's rule, a quadratic interpolation algorithm.

    4.2.1.1 The section search

The sectioning algorithms always apply the same policy: divide and conquer. The initial interval [a, c] is reduced at each iteration to a smaller interval, still bracketing the minimum $x^\star$. We thus obtain a series of nested intervals (see figure 4.1)

$$x^\star \in [a_n, c_n] \subset [a_{n-1}, c_{n-1}] \subset \dots \subset [a, c].$$

Dichotomic search. The simplest form of sectioning is the dichotomic search: at the first iteration the interval [a, c] is divided into two equal parts, [a, b] and [b, c], with $b = \frac{a + c}{2}$; then, choosing $\epsilon > 0$, we check whether $f(b - \epsilon) < f(b + \epsilon)$: in such a case we repeat the whole process with the new interval [a, b], otherwise we repeat with [b, c]. It can be proved ([13]) that this method requires 2k evaluations of the function f, where k is the number of iterations. Also, the final interval length $I_k = (c_k - a_k)$ satisfies

$$\lim_{k \to \infty} I_k = \epsilon,$$

where $I_0 = (c - a)$; so the residual uncertainty on the minimum $x^\star$ is $\epsilon$.


[Figure: the nested intervals $I_0 \supset I_1 \supset I_2$ produced by successive sectioning of the initial interval $[a_0, c_0] = [a, c]$.]

Fig. 4.1: Section search algorithm

Fibonacci search. A more sophisticated algorithm is the Fibonacci search, where at each iteration the length of the interval is chosen according to the Fibonacci rule: $I_{k-3} = I_{k-2} + I_{k-1}$. This method has the advantage that the uncertainty after n iterations is known a priori: defining the initial intervals $I_0 = I_1 = (c - a)$, then

$$I_k = \frac{I_1 + \epsilon f_{k-2}}{f_k}$$

where $f_i$ is the i-th number of the Fibonacci sequence. The number of function evaluations is again 2k, and the disadvantages of this method are that $\epsilon$ and n must be chosen a priori.

The golden section search. Given a triplet (a, b, c) that brackets the minimum, we choose a new point x that defines a new bracketing triplet (a, x, b) or (b, x, c) according to the rule:

$$\frac{x - b}{c - a} = 1 - 2\,\frac{b - a}{c - a}$$


This implies that $|b - a| = |x - c|$, and that at each iteration the interval is scaled by the same ratio $\tau$. Then we repeat the process with the new triplet. So the interval (a, c) is divided into two parts, a smaller and a larger one, and the ratio between the whole interval and the larger part is the same as the ratio between the larger and the smaller part, or in other words:

$$\frac{1}{\tau} = \frac{\tau}{1 - \tau},$$

giving for the positive solution

$$\tau = \frac{\sqrt{5} - 1}{2}.$$

This fraction is known as the golden mean or golden section, whose aesthetic properties date back to the ancient Pythagoreans.
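A minimal runnable sketch of the golden-section search (Python; the example function is arbitrary). It assumes a valid bracketing triplet in the sense of definition 4.28:

    def golden_section_search(f, a, b, c, tol=1e-8):
        # (a, b, c) bracket a minimum: a < b < c, f(b) < f(a), f(b) < f(c).
        tau = (5 ** 0.5 - 1) / 2            # the golden section, ~0.618
        fb = f(b)
        while c - a > tol:
            if (b - a) > (c - b):           # probe the larger sub-interval
                x = b - (1 - tau) * (b - a)
                fx = f(x)
                if fx < fb:
                    c, b, fb = b, x, fx     # new bracket (a, x, b)
                else:
                    a = x                   # new bracket (x, b, c)
            else:
                x = b + (1 - tau) * (c - b)
                fx = f(x)
                if fx < fb:
                    a, b, fb = b, x, fx     # new bracket (b, x, c)
                else:
                    c = x                   # new bracket (a, b, x)
        return b

    # Example: minimum of (t - 2)^2, bracketed by (0, 1, 4).
    print(golden_section_search(lambda t: (t - 2.0) ** 2, 0.0, 1.0, 4.0))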

Convergence considerations. All three previous methods have a linear convergence, since at each iteration the ratio between the interval containing $x^\star$ and the new smaller interval is:

$$0 \le \frac{I_{k+1}}{I_k} \le 1.$$

The asymptotic convergence rate is defined as

$$\lim_{k \to \infty} \frac{I_{k+1}}{I_k}.$$

For the dichotomic search, since $2I_{k+1} = I_k + \epsilon$, taking $\epsilon = 0$ we have

$$\lim_{k \to \infty} \frac{I_{k+1}}{I_k} = \frac{1}{2}.$$

For the Fibonacci search, first we must write the generic number of the Fibonacci sequence in a closed form:

$$f_k = \frac{1}{\sqrt{5}}\left[\left(\frac{1 + \sqrt{5}}{2}\right)^{k+1} - \left(\frac{1 - \sqrt{5}}{2}\right)^{k+1}\right];$$

then it can be proved that, taking $\epsilon = 0$:

$$\lim_{k \to \infty} \frac{I_{k+1}}{I_k} = \lim_{k \to \infty} \frac{f_k}{f_{k+1}} = \frac{\sqrt{5} - 1}{2}.$$

For the golden section search, as previously said, $\frac{I_{k+1}}{I_k} = \tau$, so

$$\lim_{k \to \infty} \frac{I_{k+1}}{I_k} = \tau = \frac{\sqrt{5} - 1}{2}.$$

Thus the asymptotic convergence rates of the Fibonacci and the golden-section search are identical.

    4.2.1.2 Parabolic interpolation

Given a triplet (a, b, c) that brackets a minimum, we approximate the objective function in the interval (a, c) with the parabola fitting the triplet. Then we find the minimum of this parabola with the formula (since we want the abscissa, the method is indeed an inverse parabolic interpolation):

$$x^\star = b - \frac{1}{2}\, \frac{(b - a)^2\,[f(b) - f(c)] - (b - c)^2\,[f(b) - f(a)]}{(b - a)\,[f(b) - f(c)] - (b - c)\,[f(b) - f(a)]}$$

This method is useful only when the function is quite smooth in the interval, but it has the advantage that the convergence is almost quadratic, and it is perfectly quadratic when the function to be optimized is a quadratic form.
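A sketch of a single inverse parabolic interpolation step (Python; names illustrative). On an exactly quadratic function one step lands on the minimum:

    def parabolic_step(f, a, b, c):
        # Abscissa of the vertex of the parabola through
        # (a, f(a)), (b, f(b)), (c, f(c)).
        fa, fb, fc = f(a), f(b), f(c)
        num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
        den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
        if den == 0.0:
            raise ZeroDivisionError("degenerate (collinear) triplet")
        return b - 0.5 * num / den

    print(parabolic_step(lambda t: (t - 2.0) ** 2, 0.0, 1.0, 4.0))   # -> 2.0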

Brent's rule. Brent's rule is a mix of the last two techniques: it uses the golden section when the function is not regular and switches to a parabolic interpolation when the function is sufficiently regular. In particular, it always tries a parabolic step first; when the parabolic step is useless, the method falls back to the golden section search.

    4.2.2 Multi-dimensional search

These algorithms search for the solution of the optimization problem in a multi-dimensional space. Again, first an algorithm with a convergence order of 1 is presented, then an algorithm with a quadratic order of convergence is shown.

All the algorithms presented here contain, as a sub-algorithm, a one-dimensional search.

    4.2.2.1 The gradient direction: steepest (maximum) descent

The method of the steepest descent chooses at each iteration a new point $x + dx$ in the decision space from the old point x, obviously such that:

$$f(x + dx) < f(x)$$

This new point must also be chosen such that the variation of the function f is as large as possible. In other words, if dl is the length of the step:

$$dl = \sqrt{\sum_{i=1}^{n} (dx_i)^2},$$

the steepest descent maximizes the rate of change df/dl.

The problem of minimizing f thus becomes the problem:

Problem 4.29 (Steepest descent).

$$\max \frac{df}{dl} = \max_{\{dx_i/dl\}} \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\, \frac{dx_i}{dl},$$

such that

$$dl = \sqrt{\sum_{i=1}^{n} (dx_i)^2}.$$


This problem can be solved with the Lagrange multipliers; from equations (4.3) and (4.4) (page 52) we can write:

$$\frac{dx_i}{dl} = -\frac{1}{2\lambda}\, \frac{\partial f}{\partial x_i},$$

with

$$\lambda = \frac{1}{2}\left[\sum_{i=1}^{n}\left(\frac{\partial f}{\partial x_i}\right)^{2}\right]^{\frac{1}{2}}.$$

This means:

$$\frac{dx_i}{dl}(x) = -\frac{\frac{\partial f}{\partial x_i}(x)}{\left[\sum_{i=1}^{n}\left(\frac{\partial f}{\partial x_i}(x)\right)^{2}\right]^{\frac{1}{2}}} \qquad (4.9)$$

The steepest descent algorithm chooses at each iteration a new point $x^{k+1}$ from the old point $x^k$ according to equation (4.9) (page 64):

$$x^{k+1} = x^k - dl\, \nabla f(x^k), \quad dl > 0$$

with dl chosen according to the desired convergence behaviour: if dl is small the algorithm will closely approximate the minimum, with slow convergence, while if dl is large the convergence is fast but the algorithm can oscillate near the minimum. Thus some method is necessary to reduce (or enlarge) the step dl at each iteration: large steps if we are far away from the minimum, small steps if we are close to it. The scheme for choosing the proper step can greatly affect the convergence of the algorithm. The best choice is the method of the optimal gradient.
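A fixed-step sketch of the update rule above (Python with NumPy assumed; the quadratic test function and its analytic gradient are illustrative). The step dl is exactly the knob discussed in the text: too large and the iterates oscillate, too small and convergence crawls:

    import numpy as np

    def steepest_descent(f, grad, x0, dl=0.04, tol=1e-8, max_iter=10000):
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) < tol:     # halting criterion
                break
            x = x - dl * g                  # x_{k+1} = x_k - dl * grad f(x_k)
        return x

    f = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2
    grad = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
    print(steepest_descent(f, grad, [1.0, 1.0]))   # -> ~[0, 0]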


    4.2.2.2 The optimal gradient

This algorithm simply calculates the step dl according to:

$$\min_{dl \in \mathbb{R}^+} f(x^k - dl\, \nabla f(x^k))$$

This is a one-dimensional optimization, and it is usually performed with one of the methods shown previously. Strictly speaking, the optimization of f is always a multidimensional one, since we descend along the gradient path, but inside this process there are many sub-optimization steps that find the optimal length of this descent.

If $f \in C^2$, that is, f is twice differentiable with continuous derivatives, then a closed form for the optimum step dl can be determined; we expand f in a Taylor series:

$$f(x^k + \Delta x) = f(x^k) + \left[\nabla f(x^k)\right]^T \Delta x + \frac{1}{2}\, \Delta x^T H(x^k)\, \Delta x,$$

where H(x) is the Hessian⁴ matrix of f.

Along the gradient direction:

$$\Delta x = -dl^k\, \nabla f(x^k).$$

Thus:

$$f(x^k - dl^k \nabla f(x^k)) = f(x^k) - dl^k \left[\nabla f(x^k)\right]^T \nabla f(x^k) + \frac{1}{2}\,(dl^k)^2 \left[\nabla f(x^k)\right]^T H(x^k)\, \nabla f(x^k)$$

$$\frac{df}{dl^k} = -\left[\nabla f(x^k)\right]^T \nabla f(x^k) + dl^k \left[\nabla f(x^k)\right]^T H(x^k)\, \nabla f(x^k) = 0 \qquad (4.10)$$

4 The Hessian matrix of a function $f(x_1, x_2, \dots, x_n)$ is defined as:

$$H(f) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$


and

$$dl^k = \frac{\left[\nabla f(x^k)\right]^T \nabla f(x^k)}{\left[\nabla f(x^k)\right]^T H(x^k)\, \nabla f(x^k)}. \qquad (4.11)$$
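For a quadratic form $f(x) = \frac{1}{2} x^T A x - b^T x$ the gradient is $Ax - b$ and the Hessian is A, so eq. (4.11) gives the step in closed form. A minimal Python sketch (NumPy assumed; names illustrative):

    import numpy as np

    def optimal_gradient(A, b, x0, tol=1e-10, max_iter=1000):
        # Optimal-gradient method on f(x) = x'Ax/2 - b'x, with A
        # symmetric positive definite; successive steps come out
        # orthogonal, as shown in the text below.
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = A @ x - b                   # grad f(x)
            if np.linalg.norm(g) < tol:
                break
            dl = g.dot(g) / g.dot(A @ g)    # eq. (4.11) with H = A
            x = x - dl * g
        return x

    A = np.array([[2.0, 0.0], [0.0, 20.0]])
    b = np.array([2.0, 20.0])
    print(optimal_gradient(A, b, [0.0, 0.0]))   # -> [1.0, 1.0]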

From $\frac{df}{dl^k}(x^{k+1}) = 0$, we can see that:

$$\left(\nabla f(x^k - dl^k \nabla f(x^k)),\, \nabla f(x^k)\right) = 0,$$

that is, $\nabla f(x^k)$ and $\nabla f(x^{k+1})$ are orthogonal, or, equivalently, $\Delta x^k$ and $\Delta x^{k+1}$ are orthogonal. This means that successive steps of the optimal gradient algorithm are orthogonal.

Convergence considerations. A general descent algorithm converges if:

$$\lim_{k \to \infty} \nabla f(x^k) = 0.$$

Property 4.30. The function f monotonically decreases along the (negative) gradient path.

Proof. From equation (4.9):

$$\frac{df}{dl} = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\, \frac{dx_i}{dl} = -\frac{\sum_{i=1}^{n}\left(\frac{\partial f}{\partial x_i}\right)^2}{\left[\sum_{i=1}^{n}\left(\frac{\partial f}{\partial x_i}\right)^2\right]^{\frac{1}{2}}} = -\left[\sum_{i=1}^{n}\left(\frac{\partial f}{\partial x_i}\right)^2\right]^{\frac{1}{2}} \qquad (4.12)$$

Thus $\frac{df}{dl} \le 0$, i.e. the function f decreases along the path dl.

Lemma 4.31. The convergence of a descent method along the gradient path cannot be obtained in a finite number of steps.


Proof. From equation (4.12) (page 66)

$$\frac{df}{dl} = -\left[\sum_{i=1}^{n}\left(\frac{\partial f}{\partial x_i}\right)^2\right]^{\frac{1}{2}}$$

but when x approaches the optimum $x^\star$, then

$$\lim_{x \to x^\star} \frac{\partial f}{\partial x_i}(x) = 0$$

so that

$$\lim_{x \to x^\star} \frac{df}{dl}(x) = 0,$$

meaning that the optimum is approached with a convergence rate that keeps decreasing.

For the optimal gradient method the convergence is only linear⁵ in $f(x^k)$, and a halting criterion for the algorithm could be:

$$f(x^k) - f(x^{k+1}) \le \epsilon;$$

alternatively, from the necessary condition $\nabla f(x) = 0$:

$$\max_i \left|\frac{\partial f}{\partial x_i}(x^k)\right| \le \epsilon \quad \text{or} \quad \sum_{i=1}^{n}\left(\frac{\partial f}{\partial x_i}(x^k)\right)^2 \le \epsilon$$

Finally, note that these methods, since they use local gradient information, find only a local minimum, and that the gradient algorithms are rather inefficient in the proximity of the optimum, due to the small step size.

    4.2.3 The conjugate direction method

Let $u, v \in X \subseteq \mathbb{R}^n$. They are said to be mutually orthogonal if $u^T v = 0$. Similarly, they are said to be mutually conjugate with respect to a matrix A if $u^T A v = 0$.

5 This means that $\lim_{k \to \infty} \frac{f(x^{k+1})}{f(x^k)} = a$, with $0 \le a \le 1$


Property 4.32. A set of n mutually conjugate vectors in $X \subseteq \mathbb{R}^n$ constitutes a basis for X.

The importance of a set of mutually conjugate vectors is stated by the following theorem:

Theorem 4.33. Every descent optimization method using mutually conjugate directions is quadratically convergent.

The concept of conjugate directions is important since, intuitively, a minimization attained along one of these directions does not perturb the minimization along the other directions.

4.2.3.1 The Fletcher-Reeves conjugate gradient algorithm

This algorithm calculates the mutually conjugate search directions with respect to the Hessian matrix of f directly from the function evaluations and the gradient evaluations, but without the direct evaluation of the Hessian of the function f.

Algorithm 4.34. Fletcher-Reeves conjugate gradient algorithm
Require: $x_0$ = starting point
1: repeat
2:   Compute $\nabla f(x_0)$ and $h_0 = -\nabla f(x_0)$
3:   for i = 1, ..., n do
4:     Replace $x_i = x_{i-1} + \lambda_{i-1} h_{i-1}$, where $\lambda_{i-1}$ minimizes $f(x_{i-1} + \lambda_{i-1} h_{i-1})$
5:     Compute $\nabla f(x_i)$
6:     if i < n then
7:       $h_i = -\nabla f(x_i) + \frac{\|\nabla f(x_i)\|^2}{\|\nabla f(x_{i-1})\|^2}\, h_{i-1}$
8:     end if
9:   end for
10:  $x_0 = x_n$
11: until halting criterion
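A runnable sketch of algorithm 4.34 (Python with NumPy and SciPy assumed; scipy.optimize.minimize_scalar, which by default implements a Brent-type search akin to section 4.2.1, performs the one-dimensional minimization of step 4; the test function and its gradient are illustrative):

    import numpy as np
    from scipy.optimize import minimize_scalar

    def fletcher_reeves(f, grad, x0, tol=1e-8, restarts=50):
        x = np.asarray(x0, dtype=float)
        n = x.size
        for _ in range(restarts):               # outer repeat, with x0 = xn
            g = grad(x)
            h = -g                              # h0 = -grad f(x0)
            for _ in range(n):
                lam = minimize_scalar(lambda a: f(x + a * h)).x
                x = x + lam * h
                g_new = grad(x)
                if np.linalg.norm(g_new) < tol:     # halting criterion
                    return x
                beta = g_new.dot(g_new) / g.dot(g)  # |grad_i|^2 / |grad_{i-1}|^2
                h = -g_new + beta * h
                g = g_new
        return x

    f = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2 + x[0] * x[1]
    grad = lambda x: np.array([2.0 * x[0] + x[1], 20.0 * x[1] + x[0]])
    print(fletcher_reeves(f, grad, [3.0, 2.0]))    # -> ~[0, 0]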

The quantity $\frac{\|\nabla f(x_i)\|^2}{\|\nabla f(x_{i-1})\|^2}\, h_{i-1}$ is added to the (negative) gradient at each iteration, and when f is a quadratic form (positive definite)