Sudhakar Pamarti, Puneet Gupta, and Kang L. Wang
www.darpa.mil
Spintronic Stochastic Dataflow Computing
Problem: Computing Has Hit a Memory Wall
Stochastic Computing (SC) Reduces Memory Access
• Unique number representation: by the frequency of occurrence of “1”s in a random binary bitstream
• Highly compact stochastic computing (SC) MACs enable massive parallelization of compute, a.k.a. spatial unrolling
• Spatial unrolling greatly reduces scratchpad & DRAM access
• But prior art in SC struggles with large random bitstream generators
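The bitstream representation above can be illustrated with a short Python sketch. It assumes the common unipolar SC convention, in which a value in [0, 1] is the probability of a "1" and the product of two independent streams is their bitwise AND; the poster does not spell out the gate, and the names `encode`, `decode`, and `sc_multiply` are illustrative only.

```python
import random


def encode(value, length, rng):
    """Unipolar SC encoding: each bit is 1 with probability `value`."""
    return [1 if rng.random() < value else 0 for _ in range(length)]


def decode(bits):
    """Recover the encoded value as the fraction of 1s in the stream."""
    return sum(bits) / len(bits)


def sc_multiply(a_bits, b_bits):
    """Multiply two independent unipolar streams with a bitwise AND."""
    return [a & b for a, b in zip(a_bits, b_bits)]


rng = random.Random(0)          # seeded software RNG, a stand-in for hardware
a = encode(0.8, 10_000, rng)
b = encode(0.5, 10_000, rng)
product = decode(sc_multiply(a, b))
print(product)                  # close to 0.8 * 0.5, up to sampling noise
```

The multiplier here is one gate per bit, which is why SC MACs can be so small; the cost moves to generating the random streams.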
Voltage-Controlled Spintronics Enables Dense, Low Energy, CMOS-Compatible Memory (MeRAM) [1]
[Figure: Stochastic MAC versus conventional MAC]
4-bit stochastic MAC: 0.05 pJ/op, ~4 μm²; MUX: Y = P*A + (1-P)*B; multiplication: Y = A*B
4-bit fixed-point MAC: 0.08 pJ/op, ~200 μm²; built from a 4-bit multiplier and a 4-bit adder (full adders, FA, on bit pairs A0B0 to A3B3)
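The MUX relation Y = P*A + (1-P)*B from the stochastic MAC labels can be sketched in Python as follows. This is a minimal illustration assuming independent unipolar streams; `sc_mux`, `bernoulli_stream`, and `decode` are hypothetical helper names, and the seeded software RNG stands in for whatever hardware source generates the bits.

```python
import random


def bernoulli_stream(p, length, rng):
    """Random bitstream whose bits are 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def decode(bits):
    """Value of a unipolar stream: the fraction of 1s."""
    return sum(bits) / len(bits)


def sc_mux(a_bits, b_bits, select_bits):
    """Per bit, pass A when the select bit is 1, otherwise pass B."""
    return [a if s else b for a, b, s in zip(a_bits, b_bits, select_bits)]


rng = random.Random(1)
n = 10_000
a = bernoulli_stream(0.8, n, rng)
b = bernoulli_stream(0.2, n, rng)
select = bernoulli_stream(0.5, n, rng)   # P = 0.5
y = decode(sc_mux(a, b, select))
print(y)                                 # close to 0.5*0.8 + 0.5*0.2
```

Because the output is a weighted average, it stays within [0, 1] and needs no carry chain, unlike the full-adder structure of the fixed-point MAC.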
Distribution A: Approved for public release: distribution unlimited.
▪ Today’s compute units are large, requiring repeated access to memory for operands/results in data-intensive applications
▪ Off-chip memory (e.g., DRAM) incurs high communication costs, e.g., 100x more than compute
▪ On-compute die memory (e.g., SRAM) is not big enough for applications such as machine learning
▪ Memory wall problems are omnipresent: both in edge devices and even in the cloud
[Figure: MeRAM bit cell. Labels: VC-MTJ (resistance = RP or RAP), access transistor, read circuit]
Memory Type     Unit Cell Area (μm²)   Read Time (ns)   Read Bitline Energy (fJ)   Write Time (ns)   Write Bitline Energy (fJ)
MeRAM           0.01                   4                27.5                       <1                30
STT-MRAM        0.06                   1.3              209                        20                4500
SRAM (32nm)     0.17                   0.7              84                         <2                >100
eDRAM (45nm)    0.07                   1.4              64                         <2                >70
• Voltage-controlled magnetic tunnel junction (VC-MTJ)
• Switching time < 1 ns, voltage < 1 V, current < 10 μA, energy < 5 fJ
• Nonvolatile, endurance > 10^15, area < 20F²
• MeRAM array is 5x smaller, 3x more efficient (simulated)
Spintronics Random Num. Gen. Solves SC Problem
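[Figure: VC-MTJ random bit generation. Three panels show the free and fixed layers: at V=0 with energy barrier Eb, at V=Vb, and at V=0 again with energy barrier Eb. Label: Long Pulse]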
RNG Type   Energy/Bit (fJ)                          Area (µm²)   Latency (ns)   RNG Periodicity
           VCMA=90, T=400 °C   VCMA=130, T=550 °C
VC-MTJ     37                  18                   7            26             True random
LFSR       28 (single value)                        35           8              2048
• True, uncorrelated random bits from thermal noise
• Spintronics random number gen is 5x smaller than LFSR
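To make the periodicity contrast concrete, here is a small Python sketch of a Fibonacci LFSR and its cycle length. The poster does not say which LFSR was compared; the 11-bit configuration with taps (11, 9) is an assumption, chosen because a maximal-length 11-bit LFSR repeats after 2^11 - 1 = 2047 steps, close to the 2048 listed in the table. The function names are illustrative.

```python
def lfsr_step(state, taps, nbits):
    """Advance a Fibonacci LFSR by one shift.

    `taps` are 1-indexed bit positions XORed together to form the
    feedback bit, which is shifted in at the least significant end.
    """
    feedback = 0
    for tap in taps:
        feedback ^= (state >> (tap - 1)) & 1
    return ((state << 1) | feedback) & ((1 << nbits) - 1)


def lfsr_period(taps, nbits, seed=1):
    """Count shifts until the register returns to its nonzero seed."""
    state = lfsr_step(seed, taps, nbits)
    steps = 1
    while state != seed:
        state = lfsr_step(state, taps, nbits)
        steps += 1
    return steps


print(lfsr_period((4, 3), 4))     # small 4-bit example
print(lfsr_period((11, 9), 11))   # assumed 11-bit configuration
```

An LFSR is fully deterministic, so its bits repeat and streams drawn from shifted copies can be correlated; a thermal-noise source has no such period.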
Spintronics+SC is Great for ML on Edge Devices
• Compact SC MACs and dense, low energy, non-volatile MeRAM offer great latency & energy benefits
• SC is a purely digital approach that can approach the efficiency of analog neuromorphic systems
• Runtime energy-latency tradeoff on same hardware
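The poster does not detail how the runtime tradeoff is realized. One common mechanism in SC, assumed here purely for illustration, is bitstream length: a longer stream costs more cycles and energy but gives a more precise value. For independent bits, the standard error of the decoded value is sqrt(p(1-p)/N), which the Python sketch below evaluates; `stream_std_error` is a hypothetical helper name.

```python
import math


def stream_std_error(p, n_bits):
    """Standard error of the fraction of 1s in n_bits independent bits,
    each equal to 1 with probability p (binomial model)."""
    return math.sqrt(p * (1.0 - p) / n_bits)


# Longer streams reduce error, at the price of more cycles per operation.
for n_bits in (64, 256, 1024):
    print(n_bits, stream_std_error(0.5, n_bits))
```

Under this model, each 4x increase in stream length halves the error, so the same hardware can be run with short streams for speed or long streams for accuracy.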
References
[1] Grezes, C., et al. "Ultra-low switching energy and scaling in electric-field-controlled nanoscale magnetic tunnel junctions with high resistance-area product." Applied Physics Letters 2016; 108:012403.
[2] Wang, S., et al. "Hybrid VC-MTJ/CMOS non-volatile stochastic logic for efficient computing." Proceedings of the Conf. on Design, Automation & Test in Europe, March 2017, pp. 1442-1447.
[3] Chen, Y., et al. "Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks." IEEE Journal of Solid-State Circuits, vol. 52, no. 1, pp. 127-138, January 2017.
Progress So Far…
• Demonstrated MeRAM devices in lab [1] and MeRAM array benefits using circuit sims
• Developed scalable digital SC architecture for deep neural networks & verified in sims
• Offers 6.9x fewer scratchpad accesses & 410x lower energy-delay product (EDP) compared to iso-area Eyeriss [3]
What Will We Accomplish?
• Demonstrate MeRAM integrated with CMOS
• Develop a scalable SC+MeRAM architecture for Deep Neural Network (DNN) applications, e.g., AlexNet
• Achieve, in silicon, >200x improvement in EDP over prior art, with comparable inference accuracy
1101111101 (0.8)
1010101100 (0.5)
1000101100 (0.4)
0010010010 (0.3)
1010111110 (0.7)
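The five example bitstreams above can be checked directly: each value in parentheses is the fraction of 1s in the 10-bit stream. A minimal Python sketch (the `decode` helper name is illustrative):

```python
def decode(bits):
    """Value of a stochastic bitstream: the fraction of bits that are 1."""
    return bits.count("1") / len(bits)


# Bitstreams and labeled values exactly as listed above.
examples = {
    "1101111101": 0.8,
    "1010101100": 0.5,
    "1000101100": 0.4,
    "0010010010": 0.3,
    "1010111110": 0.7,
}

for bits, labeled in examples.items():
    print(bits, decode(bits), labeled)
```

Note that the position of the 1s does not matter, only their count, so many different streams encode the same value.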
Acknowledgment and Disclaimer
This material is based on research sponsored by Air Force Research Laboratory (AFRL) and Defense Advanced Research Projects Agency (DARPA) under agreement number FA8650-18-2-7867. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of AFRL and DARPA or the U.S. Government.
Sudhakar Pamarti, Puneet Gupta, and Kang L. Wang
University of California, Los Angeles
New Materials and Devices: Framework for Novel Compute (FRANC)