Literature Review
![Page 1: Literature Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/568144b9550346895db18115/html5/thumbnails/1.jpg)
Literature Review
Measuring the Gap Between FPGAs and ASICs
Ian Kuon, Jonathan Rose
University of Toronto
IEEE TCAD/ICAS, February 2007
Henry Chen
February 26, 2010
![Page 2: Literature Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/568144b9550346895db18115/html5/thumbnails/2.jpg)
Introduction
Trade-offs between FPGAs and standard-cell ASICs
– Decreased NRE, design time
– Increased silicon area, power; decreased performance
FPGA inefficiencies known and accepted, but largely unquantified
![Page 3: Literature Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/568144b9550346895db18115/html5/thumbnails/3.jpg)
Previous Comparisons
Jones et al. (1986): MPGAs to standard cells
– 1.5‒2.6x area, ~1.1x delay
– Estimates based on only 5 circuits
Brown et al. (1992): FPGAs to MPGAs
– 8‒12x area, ~3x delay
– Optimistic FPGA gate counting?
– Anecdotal evidence
– Doesn’t consider “hard” macros (multipliers, memories)
Combine for FPGAs to standard cells
– 12‒38x area, ~3.4x delay
– Dated; based on (questionable?) extractions
![Page 4: Literature Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/568144b9550346895db18115/html5/thumbnails/4.jpg)
Previous Comparisons (2000s)
Zuchowski et al. (2002): LUT to ASIC gate (0.25μm‒90nm)
– ~1/45 gate density, 12‒14x delay, ~500x dynamic power
– Unexplained process-dependent density/power variation
– Dependent on gates implemented per LUT
Wilton et al. (2005): Partial programmable replacement
– 88x area, 2x delay
– Single logic module
Compton & Hauck (2007): FPGA apps. to standard-cell
– Avg 7.2x area
– Scaled FPGA 0.15μm to 0.18μm standard-cell
![Page 5: Literature Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/568144b9550346895db18115/html5/thumbnails/5.jpg)
Methodology
Implement in both FPGA and standard-cell
– Altera Stratix II FPGA: TSMC 90nm multi-Vt, 1.2V
– Standard-cell: ST CMOS090 90nm, dual-Vt, 1.2V
Empirical results from 23 benchmarks
– Rejected if different synthesis tools resulted in >5% register count deviation
– Mix of logic, memory, DSP
Analyze gains from FPGA’s DSP and memory blocks
Exclude I/Os
Have device data from Altera
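The benchmark screening rule above can be sketched as a small check. This is only an illustration: the slides do not say how the deviation is normalized, so measuring it against the larger of the two counts is an assumption, and the function names are hypothetical.

```python
def register_deviation(fpga_regs: int, asic_regs: int) -> float:
    """Relative difference between two register counts.

    Assumption: the difference is measured against the larger count;
    the slides only state a ">5% register count deviation" threshold.
    """
    if fpga_regs < 0 or asic_regs < 0:
        raise ValueError("register counts must be non-negative")
    larger = max(fpga_regs, asic_regs)
    if larger == 0:
        return 0.0  # two empty designs trivially agree
    return abs(fpga_regs - asic_regs) / larger


def accept_benchmark(fpga_regs: int, asic_regs: int, threshold: float = 0.05) -> bool:
    """Keep a benchmark only if both synthesis flows inferred similar register counts."""
    return register_deviation(fpga_regs, asic_regs) <= threshold
```

A benchmark whose two flows disagree by more than the threshold is dropped, on the reasoning that the tools did not implement the same circuit.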
![Page 6: Literature Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/568144b9550346895db18115/html5/thumbnails/6.jpg)
Implementations
FPGA
– Altera-provided CAD flow
– Speed/area balanced optimization; optimize critical paths for performance, otherwise optimize area
– Automatic DSP, memory block inference
– Set to mimic effects of high resource utilization
ASIC
– Synopsys/Cadence synthesis/PAR flow
– Free to choose from high/standard-Vt cells
– Timing-driven placement; target 75‒85% utilization
– Emphasized performance in compiled memories
![Page 7: Literature Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/568144b9550346895db18115/html5/thumbnails/7.jpg)
Area Comparison
ASIC
– Post-PAR core area
– Include memory macros
FPGA
– Count only silicon area for used resources
– Include surrounding routing resources
– Count full block area even if only partially used
– Area data from Altera
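The FPGA area accounting described above might be sketched as follows. The `ResourceType` record, the function names, and any areas passed in are hypothetical placeholders; the real per-block areas came from Altera and are not given in the slides.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ResourceType:
    """One kind of FPGA block (logic cluster, memory, DSP). Hypothetical record."""
    name: str
    block_area: float    # silicon area of one whole block
    routing_area: float  # area of the routing surrounding one block


def fpga_used_area(used_blocks: Mapping[str, int],
                   catalog: Mapping[str, ResourceType]) -> float:
    """Sum the area of every block the design touches.

    A partially used block is charged at its full area plus its
    surrounding routing; untouched blocks contribute nothing.
    """
    total = 0.0
    for name, count in used_blocks.items():
        resource = catalog[name]
        total += count * (resource.block_area + resource.routing_area)
    return total


def area_gap(fpga_area: float, asic_core_area: float) -> float:
    """FPGA-to-ASIC area ratio, with the ASIC measured as post-PAR core area."""
    if asic_core_area <= 0:
        raise ValueError("ASIC core area must be positive")
    return fpga_area / asic_core_area
```

Charging whole blocks is what makes the FPGA estimate pessimistic: a design that uses a fraction of a memory block still pays for all of it.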
![Page 8: Literature Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/568144b9550346895db18115/html5/thumbnails/8.jpg)
Area Comparison Results
Logic only: 35x avg (17‒54x)
Logic + DSP: 25x avg (12‒58x)
Logic + Memory: 33x avg (19‒70x)
Logic + Memory + DSP: 18x avg (9.5‒26x)
![Page 9: Literature Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/568144b9550346895db18115/html5/thumbnails/9.jpg)
Impact of Hard Macros on Area
Smaller area penalty for designs using hard macros
– Hard macro close to ASIC implementation (plus programmable interface & routing)
![Page 10: Literature Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/568144b9550346895db18115/html5/thumbnails/10.jpg)
Area Comparison Caveats
Pessimistic FPGA area estimation; count full resource area even if only partially used (~5‒10% reduction)
ASIC density may decrease for larger designs, while FPGAs are designed to handle large designs
![Page 11: Literature Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/568144b9550346895db18115/html5/thumbnails/11.jpg)
Delay Comparison
Altera Quartus II / Synopsys PrimeTime SI
Static timing analysis to extract max. clock frequency
Compare for different FPGA speed grades
– FPGAs are binned for performance
– ASICs tend to be designed for worst-case
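Since static timing analysis yields a maximum clock frequency for each implementation, the delay gap follows directly from the two frequencies. A minimal sketch, with a hypothetical function name and made-up frequencies in the usage note:

```python
def delay_gap(asic_fmax_mhz: float, fpga_fmax_mhz: float) -> float:
    """FPGA-to-ASIC critical-path delay ratio from maximum clock frequencies."""
    if asic_fmax_mhz <= 0 or fpga_fmax_mhz <= 0:
        raise ValueError("clock frequencies must be positive")
    # Critical-path delay is 1 / f_max, so the delay ratio is the
    # inverse of the frequency ratio.
    return asic_fmax_mhz / fpga_fmax_mhz
```

For instance, an ASIC closing timing at 340 MHz against an FPGA at 100 MHz would give a 3.4x delay gap; these two frequencies are illustrative, not measurements from the paper.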
![Page 12: Literature Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/568144b9550346895db18115/html5/thumbnails/12.jpg)
Delay Comparison Results (Fastest Speed Grade)
Logic only: 3.4x avg (1.9‒5.0x)
Logic + DSP: 3.5x avg (2.4‒4.7x)
Logic + Memory: 3.5x avg (2.8‒4.3x)
Logic + Memory + DSP: 3.0x avg (2.6‒3.5x)
![Page 13: Literature Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/568144b9550346895db18115/html5/thumbnails/13.jpg)
Delay Comparison Results (Slowest Speed Grade)
Logic only: 4.6x avg (2.5‒6.7x)
Logic + DSP: 4.6x avg (3.0‒6.3x)
Logic + Memory: 4.8x avg (3.8‒5.7x)
Logic + Memory + DSP: 4.1x avg (3.8‒4.7x)
![Page 14: Literature Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/568144b9550346895db18115/html5/thumbnails/14.jpg)
Impact of Hard Macros on Delay
Almost no benefit; sometimes a penalty!
– Fixed positions in FPGA; extra routing to use
– Fixed architecture; some apps. may not use efficiently
![Page 15: Literature Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/568144b9550346895db18115/html5/thumbnails/15.jpg)
Power Comparison
Altera Quartus II Power Analyzer / Synopsys PrimePower
Compare power, not energy consumption
– FPGAs slower; need more time or parallelism
– Implement for highest speed possible
– Simulate at same operating frequency, voltage
Measure only core power
Assume constant toggle rates for all nets in design
– Meaningful test vectors not available for all designs
FPGA static power consumption scaled by used fraction
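The static power scaling can be sketched in a few lines. The slides say only "used fraction"; treating it as the fraction of device area occupied by the design is an assumption here, and the function name is hypothetical.

```python
def scaled_static_power(device_static_w: float,
                        used_area: float,
                        device_area: float) -> float:
    """Attribute a share of whole-device static power to one design.

    Assumption: the "used fraction" is used area over total device area.
    """
    if device_area <= 0:
        raise ValueError("device area must be positive")
    if not 0 <= used_area <= device_area:
        raise ValueError("used area must lie between 0 and the device area")
    return device_static_w * (used_area / device_area)
```

Without this scaling, a small benchmark on a large device would be charged for leakage in logic it never uses.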
![Page 16: Literature Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/568144b9550346895db18115/html5/thumbnails/16.jpg)
Power Comparison Results
Logic only: 14x avg (5.7‒52x)
Logic + DSP: 12x avg (7.5‒16x)
Logic + Memory: 14x avg (12‒16x)
Logic + Memory + DSP: 7.1x avg (5.3‒8.3x)
![Page 17: Literature Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/568144b9550346895db18115/html5/thumbnails/17.jpg)
Impact of Hard Macros on Power
Slight benefit; primarily from area savings?
– Less area and interconnect
![Page 18: Literature Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/568144b9550346895db18115/html5/thumbnails/18.jpg)
Power Consumption Caveats
May be disproportionate power in FPGA clock network
– “Overdesigned” for tested circuits
– Could have small incremental power increase
ASIC clock network would have to grow with designs
![Page 19: Literature Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/568144b9550346895db18115/html5/thumbnails/19.jpg)
Static Power Comparison
Unable to draw useful conclusions about static power
– 87x for typical silicon, typical temp. (25°C)
– 5.4x for worst-case silicon, worst-case temp. (85°C)
Had to scale worst-case silicon temp. characterization
Subthreshold leakage is process-dependent
– Little information on leakage estimate factors
– Different processes from different foundries
Some correlation between static power and area gap (correlation coefficient ~0.8)
– Hard macros likely reduced static power penalty
![Page 20: Literature Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/568144b9550346895db18115/html5/thumbnails/20.jpg)
Conclusions
Disparity hard to quantify; very application dependent
– Max/min gap ratio within a category averages 3x; ranges 1.3‒9.1x
All-LUT designs avg. 35x area, 3.4‒4.6x delay, 14x power
– 119x area, 47.6x power gap for equal performance (assuming ideal parallelization)
Hard macros reduce area and power, but have little performance benefit
– Avg. 18x area, 3‒4.1x delay, 7.1x power
– 54x area, 21.3x power for equal performance
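The figures in these conclusions can be reproduced from the results slides. The sketch below rests on two readings that are assumptions rather than statements from the slides: the 3x average and 1.3‒9.1x range are taken as the max/min gap ratio within each benchmark category (area, fastest-speed-grade delay, and power), and the equal-performance numbers are taken as the area or power gap multiplied by the fastest-grade delay gap.

```python
from statistics import mean

# (min, max) gap per benchmark category, copied from the results slides.
AREA = [(17, 54), (12, 58), (19, 70), (9.5, 26)]
DELAY_FASTEST = [(1.9, 5.0), (2.4, 4.7), (2.8, 4.3), (2.6, 3.5)]
POWER = [(5.7, 52), (7.5, 16), (12, 16), (5.3, 8.3)]


def spreads(ranges):
    """Ratio of the largest to the smallest gap within each category."""
    return [high / low for low, high in ranges]


def equal_performance(area_gap, power_gap, delay_gap):
    """Area and power gaps once the FPGA is replicated to match ASIC speed.

    Assumes ideal parallelization: a design that is delay_gap times
    slower needs delay_gap copies, each costing full area and power.
    """
    return area_gap * delay_gap, power_gap * delay_gap


all_spreads = spreads(AREA) + spreads(DELAY_FASTEST) + spreads(POWER)
print(f"spread: min {min(all_spreads):.1f}x, "
      f"max {max(all_spreads):.1f}x, mean {mean(all_spreads):.1f}x")
# prints: spread: min 1.3x, max 9.1x, mean 3.0x

lut_area, lut_power = equal_performance(35, 14, 3.4)    # all-LUT designs
hard_area, hard_power = equal_performance(18, 7.1, 3.0)  # with hard macros
print(f"all-LUT: {lut_area:.0f}x area, {lut_power:.1f}x power")
print(f"hard macros: {hard_area:.0f}x area, {hard_power:.1f}x power")
```

Under these readings the script recovers 119x and 47.6x for all-LUT designs and 54x and 21.3x with hard macros, matching the slide.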
![Page 21: Literature Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/568144b9550346895db18115/html5/thumbnails/21.jpg)
References
Jones, Jr., H. S., Nagle, P. R., Nguyen, H. T., “A Comparison of Standard Cell and Gate Array Implementations in a Common CAD System,” Proc. IEEE CICC, 1986, pp. 228‒232
Brown, S. D., Francis, R., Rose, J., Vranesic, Z., Field-Programmable Gate Arrays, Norwell, MA: Kluwer, 1992
Zuchowski, P. S., Reynolds, C. B., Grupp, R. J., Davis, S. G., Cremen, B., Troxel, B., “A Hybrid ASIC and FPGA Architecture,” Proc. ICCAD, Nov. 2002, pp. 187‒194
Wilton, S. J., Kafafi, N., Wu, J. C. H., Bozman, K. A., Aken’Ova, V., Saleh, R., “Design Considerations for Soft Embedded Programmable Logic Cores,” IEEE JSSC, vol. 40, no. 2, pp. 485‒497, Feb. 2005
Compton, K., Hauck, S., “Automatic Design of Area-Efficient Configurable ASIC Cores,” IEEE Trans. Comp., vol. 56, no. 5, pp. 662‒672, May 2007