Post on 14-Dec-2015
RAPID Standard Cell Library Evaluation
by David Artz & Cory Krug
Oracle Labs, November 2011
Introduction
• Standard Cell (Logic, ECO, Power) Library Evaluation Comparison Criteria
• Performance
• Power
• Layout Architecture (routability, area, power rails, tapless, etc.)
• Features
• Drive Strengths
• Supported Views (CCS/NLDM, APL, DFT, etc.)
• Documentation, Support
<Vendor B> vs. <Vendor A> Standard Cell Comparisons
• Two libraries were compared, <Vendor B> and <Vendor A>. Both vendors offer what they call general purpose (or low power) variants built around a 9 track layout architecture, plus a high performance (12 track) variant.
• Both vendors supply the typical mix of combinational cells (simple & complex), sequential cells (latches/flops), I/Os, ECO cells, power management cells (header/footer switches, level shifters, isolation cells), etc. This is where the similarities end.
• <Vendor A> has a much richer library in terms of drive strengths, beta ratios, and device lengths, whereas <Vendor B> offers only a Vt mix.
<Vendor B> vs. <Vendor A> Richness (Vt, Leff, General Purpose and High Performance)
Category / Library / Leff
Vt: Low Std Hi | Low Std Hi | Low Std Hi
Standard Cell Libraries: P P P P P P
Power Management Kit: P P P P P P P
ECO Kit: P P

Category / Library / Leff
Vt: Low Std Hi | Low Std Hi | Low Std Hi
Standard Cell Libraries: P P P P P P P P
Power Management Kit: P P P P P P P
ECO Kit: P P

High Performance
High Density
SC9 SC9MC
40
SC12 SC12MC
40 50
40 40 50
<Vendor B>
<Vendor A>
<Vendor B> vs. <Vendor A> Richness (Functionality & Drive Strengths/Beta Ratios)
• For the purpose of comparison the standard cell libraries are categorized as follows:
Standard Cells
  Clocking
  Datapath
  Combinational
    Complex
    Simple
  Physical
  Storage
    Latch
    Flip-Flop
    Register File
    Scannable Flip-Flop
  Special
Power
  Voltage Island Switches
  Level Shifter/Isolation Cells
  Storage
ECO
<Vendor B> vs. <Vendor A> Richness (Functionality & Drive Strengths/Beta Ratios Cont.)
9 track standard cell library comparison
Each entry is functions/drive strengths, e.g., 6/20 indicates 6 functions, with 20 drive strengths total.

Category                        <Vendor A>  <Vendor B>  Descriptions
Standard Cells
  Clocking                      3/180       8/213       Special cells for clock distribution, e.g., balanced nand, clock gates, etc.
  Datapath                      11/126      17/126      Full and half adders, booth encoders, etc.
  Combinational, Complex        46/1197     52/699      And-Or-Inverts, inverted input(s) simple combinationals, etc.
  Combinational, Simple         19/942      21/489      Inv, buff, nand, nor, xor, etc.
  Physical Design               4/45        8/87        Antenna tie downs, decaps, filler cells
  Storage, Latch                12/162      16/252
  Storage, Flip-Flop            9/114       24/222      D Flip-flops
  Storage, Register File        4/36        0/0
  Storage, Scannable            18/222      28/375      Scannable versions of flip flops
  Special                       1/24        2/45        Bus holder, delay cells
Low Power Design
  Power Island Gates            6/48        6/20        Header/footer switches
  Interface Logic               10/114      4/48        Level shifters, isolation cells
  Retention Storage             24/303      18/150      Retention flops, etc.
Total                           167/3513    204/2726
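The totals in the comparison above (167/3513 for <Vendor A>, 204/2726 for <Vendor B>) can be turned into relative differences with a few lines of arithmetic. The sketch below is only a convenience check; the helper name `percent_more` is made up for illustration.

```python
# Totals from the 9 track comparison: (functions, drive strengths).
totals = {
    "<Vendor A>": (167, 3513),
    "<Vendor B>": (204, 2726),
}


def percent_more(larger: float, smaller: float) -> float:
    """Return how much bigger `larger` is than `smaller`, in percent."""
    return (larger / smaller - 1.0) * 100.0


functions_a, drives_a = totals["<Vendor A>"]
functions_b, drives_b = totals["<Vendor B>"]

# <Vendor B> has more functions; <Vendor A> has more drive strengths.
more_functions_b = percent_more(functions_b, functions_a)
more_drives_a = percent_more(drives_a, drives_b)

# Average number of drive strengths offered per function.
avg_drives_a = drives_a / functions_a
avg_drives_b = drives_b / functions_b

print(f"<Vendor B> functions vs <Vendor A>: +{more_functions_b:.0f}%")
print(f"<Vendor A> drive strengths vs <Vendor B>: +{more_drives_a:.0f}%")
print(f"Drive strengths per function: A={avg_drives_a:.1f}, B={avg_drives_b:.1f}")
```

The per-function average is not quoted on the slides; it is simply another way to read the same totals.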
<Vendor B> vs. <Vendor A> Richness (Functionality & Drive Strengths/Beta Ratios Cont.)
• The previous chart shows 22% more functions in the <Vendor B> library than in <Vendor A>. This can be misleading: in my opinion, many of these functions are of little use (e.g., many dubiously useful flavors of scannable flip-flops with both Q and QN outputs).
• Despite the richer feature set of the <Vendor B> library, the <Vendor A> library has 29% more drive strengths, which consist of differing beta ratios and finer drive granularity.
• Beta Ratios: device P/N sizing to adjust timing arc performance, e.g.
– Max finger size which fits in cell
– Minimize the average delay
– Equalize the delays
– Equalize the output slews (e.g., on clock cells)
– Minimize the maximum delay
– Minimize the delay for rising output
• <Vendor A> libraries use all of the above beta ratios (where they make sense). <Vendor B> uses only minimize(max(tplh, tphl)).
• The finer granularity and sizing allow optimization approaches to better fine-tune for power, performance, and area goals.
• The “multi-channel” libraries from <Vendor A> afford even more optimization opportunities for improving leakage and minimizing processing variance (especially important in clocking).
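To make the beta ratio objectives listed above concrete, here is a minimal sketch using a toy first-order delay model. Everything in it is an assumption for illustration: the model (delay inversely proportional to device width), the 2x mobility ratio, and the beta limits are hypothetical and are not taken from either vendor's characterization. Output slew equalization is left out because the toy model has no notion of slew.

```python
# Toy first-order model (an assumption, not a vendor characterization flow):
#   rise delay t_plh ~ MOBILITY_RATIO / Wp   (PMOS pull-up, assumed 2x weaker)
#   fall delay t_phl ~ 1 / Wn                (NMOS pull-down)
# with total width Wp + Wn fixed at 1 and beta = Wp / Wn.
MOBILITY_RATIO = 2.0           # hypothetical NMOS/PMOS drive ratio
BETA_MIN, BETA_MAX = 0.5, 4.0  # hypothetical limits, e.g. max finger size that fits
STEP = 0.01


def delays(beta):
    """Return (t_plh, t_phl) in arbitrary units for a given beta ratio."""
    wn = 1.0 / (1.0 + beta)
    wp = beta / (1.0 + beta)
    return MOBILITY_RATIO / wp, 1.0 / wn


# Sweep beta over the allowed range.
steps = int(round((BETA_MAX - BETA_MIN) / STEP))
betas = [BETA_MIN + STEP * i for i in range(steps + 1)]

# Each objective maps (t_plh, t_phl) to a cost to be minimized.
objectives = {
    "minimize average delay": lambda r, f: (r + f) / 2.0,
    "equalize delays": lambda r, f: abs(r - f),
    "minimize maximum delay": lambda r, f: max(r, f),
    "minimize rising-output delay": lambda r, f: r,
}

best_beta = {
    name: min(betas, key=lambda b, cost=cost: cost(*delays(b)))
    for name, cost in objectives.items()
}

for name, beta in best_beta.items():
    t_plh, t_phl = delays(beta)
    print(f"{name:30s} beta={beta:.2f}  t_plh={t_plh:.2f}  t_phl={t_phl:.2f}")
```

Even in this simplified model the objectives pick different sizings: the average-delay optimum lands near the square root of the mobility ratio, the equalize and min-max objectives land at the mobility ratio itself, and the rising-output objective runs to the upper beta limit.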
<Vendor B> vs. <Vendor A> Documentation
• <Vendor A> documentation is more readable and concise.
– Truth tables show “don’t care” conditions rather than explicitly listing out all permutations of input/output states.
– Detailed descriptions of the operating conditions and constraints over which the cells were characterized (e.g., surrounding dummy metal included at representative densities, etc.) are given.
– BKMs on routability, power strapping, etc. within commercial tools are documented.
– Gate level schematic diagrams are included, not just a cell symbol.
<Vendor B> vs. <Vendor A> Layout
• <Vendor B> cell pitch is 0.14um while <Vendor A> is 0.18um
• Power rails in <Vendor A> are M2 while <Vendor B> is classical M1. In my experience, M2 affords better IR drop robustness with little if any impact on routability.
• All library offerings come with a standard tech.lef defining BEOL for various stackups.
• Both libraries are tapless, allowing for back biasing to reduce leakage.
<Vendor B> vs. <Vendor A> Views
• Both vendors offer all typical library views (schematic symbols, place & route LEF, Verilog, pre & post SPICE decks, DFT, etc.)
• <Vendor A> has some pre-compiled views (e.g., Milkyway) whereas <Vendor B> does not.
• Timing & Power Views
– Synopsys .lib in NLDM and CCS are supplied. <Vendor B> libraries elicit warnings when checked with the semantic checker, <Vendor A>’s do not.
– The number of indices in the NLDM tables is the same, but interestingly <Vendor A> characterizes over a much broader range (e.g., on small inverters, a 50% wider range of input slews and 280% wider on loads) than <Vendor B>.
– <Vendor B> appears to characterize more robustly for power than <Vendor A>, e.g., internal nodal currents are captured for header/footer switches.
– APL (Apache Power Libraries) are supposed to be available from both vendors (note: it appears <Vendor A> only offers APL for 12 track libraries).
– When comparing the closest matching cells across libraries (functions, drive strength, PVT, input slew rate, output loading, etc.) the <Vendor B> cells appear to be on average 3% faster than <Vendor A>. I feel this is more of a characterization discrepancy (what conditions did <Vendor B> assume in the neighborhood used for characterizing these cells, was the input an ideal voltage source or a properly shaped waveform, was the output a passive cap or another representative DUT, etc.) than an actual difference in performance.
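The point about characterization range can be illustrated with a small sketch of how an NLDM-style table is typically read: a delay is looked up from a 2-D table indexed by input slew and output load, with interpolation between index points. The code below assumes plain bilinear interpolation and uses made-up index values and a made-up delay formula; it is not either vendor's data, and real timing tools may interpolate or extrapolate differently. The function also reports whether the query fell inside the characterized range, which is where a wider range helps.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Pick the axis segment (i, i+1) used for x; end segments are reused outside the axis."""
    i = bisect_right(axis, x) - 1
    i = max(0, min(i, len(axis) - 2))
    return i, i + 1


def nldm_lookup(slews, loads, table, slew, load):
    """Bilinear lookup in table[slew_index][load_index].

    Returns (value, in_range), where in_range is False when the query
    lies outside the characterized slew/load range (i.e. extrapolated).
    """
    i0, i1 = _bracket(slews, slew)
    j0, j1 = _bracket(loads, load)
    u = (slew - slews[i0]) / (slews[i1] - slews[i0])
    v = (load - loads[j0]) / (loads[j1] - loads[j0])
    value = (
        (1 - u) * (1 - v) * table[i0][j0]
        + (1 - u) * v * table[i0][j1]
        + u * (1 - v) * table[i1][j0]
        + u * v * table[i1][j1]
    )
    in_range = slews[0] <= slew <= slews[-1] and loads[0] <= load <= loads[-1]
    return value, in_range


# Hypothetical index points (ns, pF) and a made-up linear delay model
# used only to fill the table for this example.
slews = [0.01, 0.05, 0.20]
loads = [0.001, 0.004, 0.016]
table = [[0.02 + 0.5 * s + 10.0 * c for c in loads] for s in slews]

inside_value, inside_flag = nldm_lookup(slews, loads, table, 0.03, 0.0025)
outside_value, outside_flag = nldm_lookup(slews, loads, table, 0.30, 0.002)

print(f"inside range:  delay={inside_value:.4f} ns, characterized={inside_flag}")
print(f"outside range: delay={outside_value:.4f} ns, characterized={outside_flag}")
```

With the same number of index points, a library characterized over a wider slew and load range answers more queries from measured points and fewer from extrapolation, at the cost of coarser spacing between points.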
<Vendor B> vs. <Vendor A> Misc. Observations
• <Vendor B> offers thick gate oxide decoupling caps, <Vendor A> appears not to. Thicker oxides reduce leakage (Note: I’m dubious about any of the decaps’ frequency response to supply instantaneous current at our higher frequency goals).
• The power saving library from <Vendor B> (i.e., header/footer switches) appears to have more functionality in that it affords a pre-trickle charge phase signal before the final charge phase, thus supplying out-of-the-box finer control for ramp time of voltage islands.
<Vendor B> vs. <Vendor A> Synthesis Observations
[Figure: four synthesis plots of Area (um^2) vs. Speed (MHz), one each for 32-bit multiply, 16-bit multiply, 32-bit add, and 32-bit decode; series: TSMC and ARM]
<Vendor B> vs. <Vendor A> Recommendation
• <Vendor A> offers a superior library in terms of performance, functionality, power, and integration. We saw no area penalty despite the difference in cell pitch. This was shown through a systematic comparison of individual library elements as well as on synthesized representative blocks, where the <Vendor A> implementation (with all things being equal, e.g., wire load model, constraints, etc.) on average outperformed <Vendor B> by 3%-5%.
• The 9 track <Vendor A> library gives us good power savings and reasonable performance that should meet RAPID targets. The high performance library (12 track) could be used in functional units requiring higher performance (at the cost of power and area).