Generating the Next Wave of Custom Chips
Borivoje Nikolić
Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
Borivoje Nikolić, Generating the Next Wave of Custom Chips
Diverse Driving Applications
• No single driving application
• Diversified set of applications and needs
  - Both clients and cloud
Machine learning
Cost of Developing New Products
“8,000 engineer-years to build Nvidia Xavier SoC” = 16,000,000 hours
Source: IBS
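The quoted figure implies about 2,000 working hours per engineer-year (16,000,000 / 8,000). A minimal sketch of that conversion follows; the constant and function names are illustrative and not from the source.

```python
# Assumption: 2,000 working hours per engineer-year, the rate implied by
# the slide's own numbers (16,000,000 hours / 8,000 engineer-years).
HOURS_PER_ENGINEER_YEAR = 2_000


def engineer_years_to_hours(engineer_years: float) -> float:
    """Convert engineer-years of effort into engineer-hours."""
    return engineer_years * HOURS_PER_ENGINEER_YEAR


def hours_to_engineer_years(hours: float) -> float:
    """Convert engineer-hours of effort into engineer-years."""
    return hours / HOURS_PER_ENGINEER_YEAR


xavier_hours = engineer_years_to_hours(8_000)
```

The same helper makes it easy to put the hour counts quoted for individual chips on the engineer-year scale.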
Key Issues Driving Cost
• Dearth of re-use is the dominant problem
  - Lots of IP is out there
  - But that IP is largely “black-box”, hard to extend/modify
  - Common modules are not commoditized
  - Value is in differentiation/specialization
• Approach: don’t deliver instances; capture designer methodology in generators!
  - Facilitates re-use via parameterization and incremental extension (of the generator, not the instance)
  - Apply the same to verification
  - Let’s generate systems!
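To make the generator-versus-instance distinction concrete, here is a minimal sketch of a parameterized generator that emits Verilog text. The adder, the `AdderParams` fields, and the module naming are all hypothetical; they only illustrate re-use through parameters and incremental extension of the generator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdderParams:
    """Hypothetical parameters for an illustrative adder generator."""
    width: int = 8
    registered: bool = False  # incremental extension: optional output register


def generate_adder(p: AdderParams) -> str:
    """Return Verilog text for one adder instance described by `p`."""
    name = f"adder_w{p.width}" + ("_reg" if p.registered else "")
    msb = p.width - 1
    ports = [f"input  [{msb}:0] a", f"input  [{msb}:0] b"]
    if p.registered:
        ports.insert(0, "input  clk")
        ports.append(f"output reg [{msb}:0] sum")
        body = "  always @(posedge clk) sum <= a + b;"
    else:
        ports.append(f"output [{msb}:0] sum")
        body = "  assign sum = a + b;"
    port_text = ",\n  ".join(ports)
    return f"module {name} (\n  {port_text}\n);\n{body}\nendmodule\n"


# One generator, many instances: change parameters, not the emitted RTL.
instances = {w: generate_adder(AdderParams(width=w)) for w in (8, 16, 32)}
pipelined = generate_adder(AdderParams(width=16, registered=True))
```

Adding the `registered` option improved the generator once, and every width benefits; editing a single emitted instance would not carry over.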
Generators Enable Specialization
Digital: Chisel3
• CHISEL: Constructing Hardware In a Scala Embedded Language
• Open-source hardware construction language
• Software library whose classes represent hardware primitives
  - Methods connect the classes together
  - So executing the software constructs a graph representing the RTL
• Compiles to FIRRTL
• Emits Verilog + collaterals
• v.3.1 is current
J. Bachrach, et al, DAC 2012
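The idea that executing software constructs a graph representing the RTL can be sketched in a few lines. This is a Python analogy only, not Chisel's API: the `Node` class, the `Input` and `mux` helpers, and the one-line-per-node output format are assumptions made for illustration.

```python
class Node:
    """A hardware primitive; each instance is one vertex of the circuit graph."""
    _count = 0

    def __init__(self, op, inputs=(), name=None):
        self.op = op
        self.inputs = tuple(inputs)
        Node._count += 1
        self.name = name or f"_t{Node._count}"

    # Methods connect primitives: calling them adds vertices and edges.
    def __add__(self, other):
        return Node("add", (self, other))

    def __and__(self, other):
        return Node("and", (self, other))


def Input(name):
    """Create a named input port."""
    return Node("input", name=name)


def mux(sel, a, b):
    """Create a multiplexer node driven by sel, a, and b."""
    return Node("mux", (sel, a, b))


def emit(root):
    """Walk the graph in dependency order and list one line per node."""
    lines, seen = [], set()

    def visit(n):
        if id(n) in seen:
            return
        seen.add(id(n))
        for i in n.inputs:
            visit(i)
        args = ", ".join(i.name for i in n.inputs)
        lines.append(f"{n.name} = {n.op}({args})")

    visit(root)
    return lines


# Running ordinary code builds the graph; nothing is simulated here.
a, b, sel = Input("a"), Input("b"), Input("sel")
out = mux(sel, a + b, a & b)
netlist = emit(out)
```

Because the circuit is built by ordinary code, loops, functions, and parameters of the host language become the means of parameterizing the hardware.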
Analog: BAG2
• Open-source Python-based framework allowing executable specification of design procedure
J. Crossley, et al, DAC 2013
E. Chang, et al, CICC 2018
[Figure: BAG2 circuit generator. Inputs: performance specs, technology files, design tools. Output: verified design instance.]
Generators Enable Agile Design
• Allows a small, integrated team to develop a design through a series of functional, yet incomplete prototypes
  - Design, verification and validation
• Improve the generator, not the instance!
[Figure: agile loop. A Chisel design generator emits Verilog, IP-XACT, and C for implementation (Genus and Innovus), validation (Incisive), and verification (VWB); changing specs and added features trigger generator improvement.]
Y. Lee, IEEE MICRO’16
S. Bailey, JSSC 10/19
Berkeley RISC-V ISA
www.riscv.org
• An open, license-free ISA
  - Runs GCC, LLVM, Linux distributions, …
  - RV32, RV64, and RV128 variants for 32b, 64b, and 128b address spaces
• Base ISA only ~40 integer instructions
  - Extensions provide full general-purpose ISA, including IEEE-754/2008 floating-point
• Designed for extension, customization
• Developed at UC Berkeley, maintained by RISC-V Foundation
• Most of cost of chip development is in software, so want to make sure software is reused across many chip designs
• Many research and commercial cores offered
  - Ariane/Pulpino
  - Rocket
  - Rocket is an example of a generator written in Chisel
RISC-V Rocket Chip Generator
• Parametrizable SoC generator written in Chisel
  - Processor core (Rocket), floating-point, cache, interconnect
  - Standardized co-processor interface, ROcket Custom Co-processor (ROCC)
  - http://github.com/chipsalliance/rocket-chip
CHIPS Alliance: a fund under the Linux Foundation
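[Figure: Rocket Chip block diagram. Tile 0 holds the Rocket core (branch prediction, scalar RF, Int, FPU), scalar instruction and data caches, an arbiter, and ROCC co-processors. A TileLink2 crossbar connects the tile to main/scratchpad memory, SPI, UART, JTAG, and a TileLink/AXI bridge.]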
Rocket Chip Customization
• Option 1: Change generator parameters
• Options:
  - Edit configs.scala
  - Add inclusive L2 cache (H. Cook, CARRV’19)
  - Cache size, organization
  - Number of cores
  - etc.
• Target FPGA or ASIC
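Changing generator parameters usually means composing small configuration fragments over a base configuration. The sketch below shows that pattern in Python; it is not the Rocket Chip configuration API, and every key and helper name in it is hypothetical.

```python
from collections import ChainMap

# Hypothetical parameter names; the real generator's keys differ.
BASE = {"n_cores": 1, "l1_kib": 16, "l1_ways": 4, "l2_kib": 0, "target": "asic"}


def with_cores(n):
    """Fragment that sets the number of cores."""
    return {"n_cores": n}


def with_inclusive_l2(kib, ways=8):
    """Fragment that adds an inclusive L2 cache of the given size."""
    return {"l2_kib": kib, "l2_ways": ways}


def with_target(target):
    """Fragment that selects the implementation target."""
    if target not in ("fpga", "asic"):
        raise ValueError(f"unknown target: {target}")
    return {"target": target}


def config(*fragments):
    """Compose fragments; earlier fragments override later ones and BASE."""
    return dict(ChainMap(*fragments, BASE))


dual_core_l2 = config(with_cores(2), with_inclusive_l2(512), with_target("fpga"))
```

Each design point is then a one-line composition, and a new option is added by writing one more fragment rather than by editing generated RTL.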
[Figure: Rocket Chip block diagram with the main/scratchpad memory replaced by a shared L2 cache on the TileLink2 crossbar; Tile 0, SPI, UART, JTAG, TileLink/AXI bridge, and ROCC co-processors are unchanged.]
Rocket Chip Customization
• Option 2: Develop custom circuits
• Example:
  - 4KB single p-well 8T SRAM macro for low voltage operation in 28FDSOI (B. Keller, et al, JSSC 7/2017)
• Or your own technology
  - Processor-to-DRAM photonic link (C. Sun, et al, Nature 2015)
Rocket Chip Customization
• Option 3: Develop a different RISC-V core
• ‘Standard’ Rocket core
  - 5-stage, in-order
• BOOMv2/v3
  - Out-of-order core
C. Celio, et al, IEEE MICRO 2019
A. Gonzalez, CARRV’19
[Figure: Rocket pipeline stages: fetch, decode, int. ex, mem, wb, with a floating-point path (fp rf, fp ex)]
B(R)OOM Test Chip
• BOOMv2 out-of-order core with cache resiliency
• 4 months to a tapeout
[Figure: die photo. Labels: 1MB L2 cache data array; D$; I$; L2 tag, RIT, PAT; BPD, BTB; iregfile; OoO core.]
Chip summary
Area            2 x 3 mm²
Technology      TSMC 28nm HPM
ISA             RISC-V RV64IMAFD with Sv39
Fetch width     2 instructions
Issue width     4 micro-ops
Regfile         6R3W (int), 3R2W (fp)
Exe Unit        ALU, Mul, Div, FMA, Load/Store
L1 I/D Cache    4-way, 16KB
L2 Cache        8-way, 1MB
P.-F. Chiu, et al, SSC-L 2019
Rocket Chip Customization
• Option 4: Develop an accelerator/co-processor
[Figure: decoupled vector accelerator. A scalar unit (scalar execution unit SXU, scalar memory unit SMU), a vector runahead unit (VRU), a master sequencer, and a 4 KB L1 vector instruction cache feed vector lanes 0 to N. Each lane has a sequencer/expander, a vector execution unit (VXU), and a vector memory unit (VMU). The lanes connect through an L1-to-L2 TileLink crossbar; the Rocket control processor communicates through the VCMDQ, VRCMDQ, FPREQQ, and FPRESPQ queues.]
[Figure: Rocket Chip block diagram with the vector unit in Tile 0: vector issue unit, crossbar, variable-precision FPU, vector instruction cache, vector RF, and four integer units, next to the Rocket core and its scalar caches.]
• 4-lane vector co-processor
• 4DP/8SP/16HP FP instructions
• Dense and sparse linear algebra
• Machine learning workloads
C. Schmidt, RISC-V Summit 12/2018
www.hwacha.org
Rocket Chip Customization
• Option 5: Develop a peripheral device
[Figure: Tile 0 and Tile 1 (Rocket core with 32kb scalar instruction and data caches; vector unit with 8kb vector instruction cache and 16kb vector RF) on a TileLink2 crossbar with a shared L2 cache. An MMIO manager connects SPI, UART, and the data and control crossbars to peripherals with SCRs: ADC, DAC, DSP chains, and high-speed serial.]
• Attach via a TileLink2 or AXI interface
• DSP accelerators, off-chip interfaces
• Analog peripheral interfaces
• Optional DMA
ChiselDSP
• Supports number-representation-agnostic generator design
  - Datatypes and associated operators can be real/complex, fixed/floating-point without rewriting any of the core generator code
$y[n] = \sum_{k} h[k]\, x[n-k]$
[Figure: FIR generator I/O type parameters: DspComplex, DspReal, FixedPoint, Interval]
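A number-representation-agnostic FIR can be sketched by writing the filter once against `+` and `*` only. This is a Python analogy of the idea rather than ChiselDSP code; the `fir` function and the zero-initial-condition choice are assumptions.

```python
from fractions import Fraction


def fir(h, x):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n - k], with x[m] = 0 for m < 0.

    The body uses only + and *, so any numeric type that defines them works.
    """
    y = []
    for n in range(len(x)):
        acc = None
        for k in range(min(len(h), n + 1)):
            term = h[k] * x[n - k]
            acc = term if acc is None else acc + term
        y.append(acc)
    return y


taps = [1, 2, 1]

# Same generator code, three number representations.
y_int = fir(taps, [1, 0, 0, 2])
y_exact = fir([Fraction(t, 4) for t in taps],
              [Fraction(1), Fraction(0), Fraction(0), Fraction(2)])
y_complex = fir(taps, [1j, 0, 0, 2])
```

Swapping integers for exact fractions or complex samples changes only the inputs, which mirrors how a typed hardware generator can change its datatype parameter without touching the filter structure.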
Adding ChiselDSP to Rocket Chip: DspBlock
• Basic building block of DSP functionality
  - Diplomatic interface
  - Streaming inputs and outputs (any number)
  - Optional memory interface
  - Control and status registers
• Maps to RocketChip
[Figure: DSP block with AXI-S streaming input and output, unpack and pack stages around the DSP core, CSR, and memory interface]
P. Rigge, et al, CARRV 2018.
FireSim
• Cycle-exact simulation of large SoCs on cloud FPGAs at 10s-100s of MHz
• Open-source: https://fires.im
• Targets:
  (1) Architecture evaluation
  (2) Validate application on a pre-Si SoC
• Example of (2): PageRank on Rocket+Hwacha (A. Amid, CARRV’19)
[Figure: SoC RTL, other RTL, SW models, network topology, and a full workload feed an automatically deployed, high-performance, distributed simulation]
S. Karandikar, ISCA ’18, IEEE Micro Top Picks ’18, CARRV ’19
Need Analog – BAG to the Rescue
• Schematic/Layout generator
  - Produces schematic/layout from structural parameters
• Measurement Manager
  - Simulates and computes performance specifications of a given circuit instance
• Design Script
  - Contains algorithm used to generate instance from top level specifications
  - Or use ML (K. Hakhamaneshi, DAC’19)
• Design Flow Example: Top Level Overview
E. Chang, CICC’18
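The three roles described for BAG (generator, measurement manager, design script) can be sketched as a simple search loop. This is not BAG's API: the function names are hypothetical and the "measurement" is a toy closed-form model standing in for circuit simulation.

```python
def generate(params):
    """Schematic/layout generator stand-in: structural parameters -> instance."""
    return {"width": params["width"]}


def measure(instance):
    """Measurement manager stand-in: a toy model instead of a simulator."""
    w = instance["width"]
    return {"delay_ps": 120.0 / w + 10.0, "power_uw": 5.0 * w}


def design(spec, max_width=64):
    """Design script: search structural parameters until the specs are met."""
    for width in range(1, max_width + 1):
        instance = generate({"width": width})
        perf = measure(instance)
        meets_delay = perf["delay_ps"] <= spec["delay_ps"]
        meets_power = perf["power_uw"] <= spec["power_uw"]
        if meets_delay and meets_power:
            return instance, perf
    raise ValueError("no instance meets the specification")


# Top-level specifications in, verified instance out.
instance, perf = design({"delay_ps": 30.0, "power_uw": 50.0})
```

Because the procedure is executable, re-running it with new specifications or a new technology model produces a new instance without manual redesign.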
Some Generators We’ve Built
• SAR ADC
• Switch-cap DAC
• Comparator
• R-ladder DAC
• Time-interleaved SAR ADC
• SerDes TX
• SerDes RX
Parametrization and Process Portability
[Figure: generated layouts of an ADC core, a SerDes RX datapath, and a SerDes RX core (variable taps). Process labels: ST 28nm FDSOI, GF 22nm FDX, TSMC 16nm, GF 45nm RF PDSOI.]
Chisel+BAG-Generated RISC-V Chips
[Figure: tapeout timeline, 2011 to 2019. Chips: Raven-1 to Raven-4; EOS14, EOS16, EOS18, EOS20, EOS22, EOS24; SWERVE; Hurricane-1, Hurricane-2; CRAFT-0; FFT2; CraftP1; EAGLE; EAGLEX; HYDRA; AI; ADCS; GPS; FADER. Design-flow labels: Verilog, Chisel, Chisel2, Chisel3 + BAG2, BAG.]
Raven, Hurricane: ST 28nm FDSOI; SWERVE: TSMC 28nm; EOS: IBM 45nm SOI; CRAFT: TSMC 16nm
Signal Analysis SoC in 16nm FFC
• Signal analysis SoC
  - 7 GS/s ADC
  - DSP chain: ADC cal, tuner, 136-tap FIR, 12-tap PFB, 128-pt FFT
  - Pattern generator, logic analyzer
  - AXI4 crossbar
  - RISC-V Rocket core
  - 4-lane vector unit
  - 8MB main memory
  - UART, serial
• Entire design in 14,000 hours (UCB, NGC, Cadence)
• Open source
  - https://github.com/ucb-art/craft2-chip
S. Bailey, et al, A-SSCC’18, JSSC 10/19
SoC Runs Applications…
[Figure: radar application mapped onto the SoC (ADC, DSP chain, vector processing). Signal path: ADC, tuner, FIR filter/decimator, polyphase filter, FFT, to CPU. Annotations on the numbered steps (1) to (6): 4 GS/s ADC with calibration LUT; signal x 1.375 GHz; low-pass at 150 MHz, decimate by 8; spectral leakage reduced; complex output unscrambled and squared on CPU. Plot labels: signal, noise.]
…And is Process Portable
[Figure: chip layouts in TSMC 16FFC and GLOBALFOUNDRIES 14LPP]
Port in 2,700 hours (<20% of original)
Sparse FFT Chip
• Sparse FFT
  - 3 subsampling 4GS/s ADCs, ÷25, ÷27, ÷32
  - 3 FFTs: 864-pt, 800-pt, 675-pt
  - Peeling decoder
  - Rocket core
A. Wang, et al, ESSCIRC’18, JSSC, 07/19
3,000 hours
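The relation between the subsampling ratios and the short FFT lengths can be checked with a few lines of arithmetic. The sketch assumes a common full-resolution length of 21,600 points, inferred from 27 x 800 = 32 x 675 = 21,600 rather than stated on the slide; the function names are illustrative.

```python
def subsampled_fft_sizes(n, factors):
    """FFT length seen by each branch when an n-point signal is subsampled."""
    for d in factors:
        if n % d:
            raise ValueError(f"{d} does not divide {n}")
    return [n // d for d in factors]


def alias_bins(f, n, factors):
    """Bin where a tone at full-resolution bin f lands in each short FFT.

    Subsampling by d turns exp(2j*pi*f*t/n) into exp(2j*pi*f*m/(n/d)),
    so the tone appears at f mod (n/d).
    """
    return [f % m for m in subsampled_fft_sizes(n, factors)]


N = 21_600              # assumption: common length, 27*800 == 32*675 == 21_600
FACTORS = (25, 27, 32)  # subsampling ratios from the slide

sizes = subsampled_fft_sizes(N, FACTORS)
bins = alias_bins(1_000, N, FACTORS)
```

Each branch sees the same tone at a different aliased bin, and a decoder can combine those observations to locate the tone in the full spectrum when the spectrum is sparse.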
SerDes Generator and Instance
• Generated a SerDes instance in TSMC 16nm @15 Gb/s
• All generated designs DRC and LVS clean without manual modifications
E. Chang, VLSI’18
ADC Generator Architecture
• CLK generator, SAR ADC array, retimer
• Laygo flow: ADC slice => array => CLK gen => retimer
• A lot of design options
  - Clock pulse-width, source follower, sampler topology, comparator topology, asynch. clock speed/configuration, body-biasing…
[Figure: time-interleaved ADC block diagram. CLK GEN (CLKP/CLKN inputs, p1/p2 phases, DCC) drives xN SAR ADC slices, each with VIN, ICLK, CapDAC array, comparator, SAR logic, asynch. clock generator, SARCLK, and VREF<2:0>. A retimer built from latches (L) outputs DOUT<n-1:0> and CLKOUT.]
Design Flow
• LayGo engine of BAG
TI-SAR ADC Instance in TSMC 16FFC
• 38.2-dB SNDR at 7GS/s, 45mW
• Generated analog design can compete with state-of-the-art custom designs!
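For comparing against other converters, the quoted SNDR, sample rate, and power can be turned into an effective number of bits and a Walden figure of merit using the standard formulas. The slide does not report these derived values; the sketch below only applies the textbook definitions, and the function names are illustrative.

```python
def enob(sndr_db):
    """Effective number of bits from SNDR, using ENOB = (SNDR - 1.76) / 6.02."""
    return (sndr_db - 1.76) / 6.02


def walden_fom_fj(power_w, sndr_db, fs_hz):
    """Walden figure of merit, P / (2**ENOB * fs), in fJ per conversion step."""
    return power_w / (2 ** enob(sndr_db) * fs_hz) * 1e15


# Numbers from the slide: 38.2 dB SNDR, 7 GS/s, 45 mW.
bits = enob(38.2)
fom = walden_fom_fj(45e-3, 38.2, 7e9)
```

With these inputs the converter resolves about 6 effective bits.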
Summary and What’s Next
• Generator-based approach facilitates re-use and agile execution
  - Just like other software ecosystems, opportunities for reuse increase as available libraries (generators) expand
  - The ecosystem is expanding: RISC-V, RocketChip, Chisel, BAG, FireSim
• We are working on generators for SoCs, multiprocessors, radars, radios, navigation, SerDes, data converters, PLLs, RF T/RX, …
  - And building the design flow generator (architecture->layout)
  - Possibly including Federation tools, H. Cook, CARRV’19
• Interested?
  - Chisel “bootcamp”: https://github.com/freechipsproject/generator-bootcamp
  - BAG “bootcamp”: https://github.com/ucb-art/BAG2_cds_ff_mpt
  - FireSim: https://fires.im
  - New ‘one-stop shop’ repository of all Berkeley designs coming up…
Acknowledgments
• Faculty: E. Alon, K. Asanovic, J. Bachrach, V. Stojanovic, D. Patterson
• Staff: B. Richards, C. Markley, J. Lawson, J. Dunn
• Students (current and past): Brian Zimmer, Yunsup Lee, Ben Keller, Paul Rigge, Angie Wang, Stevo Bailey, Eric Chang, Pi-Feng Chiu, Colin Schmidt, John Wright, Martin Cochet, Albert Ou, Howard Mao, Woorham Bae, Jaeduk Han, Andrew Waterman, Henry Cook, Christopher Celio, Adam Izraelevitz, Zhongkai Wang, Sean Huang, Zhaokai Liu, Sagar Karandikar, Alon Amid, Nathan Narevsky, Jaehwa Kwak, Donggyu Kim, David Biancolin, Jack Koenig,…
• DARPA CRAFT and PERFECT programs
• STMicroelectronics, TSMC chip donations, fabrication