Transcript of “Energy-efficient Machine Learning in Silicon: A Communications-inspired Approach”
Naresh Shanbhag
Jack Kilby Professor of Electrical and Computer Engineering
www.shanbhag.ece.illinois.edu
University of Illinois at Urbana‐Champaign
On-device intelligence
• Cognitive, decision-making
• Continuous on-device learning
Maximize intelligence-per-unit-volume under stringent resource constraints on energy, storage, and computational capacity
How do we operate at the fundamental limits?
Claude Shannon, 1948: obtained fundamental limits for communication systems
50 years later: communication systems operate at the limits with capacity-achieving codes
• Pop quiz: What is the world’s most popular learning algorithm deployed “on-device” today?
• Answer: the least mean-square (LMS) algorithm (Widrow-Hoff, 1960).
  – Originally used to train the Adaptive Linear Neuron (ADALINE) & MADALINE
• Used in communication receivers since the mid-1960s
  – Channel estimation, echo/crosstalk cancellation, equalization (got a cell phone? you have LMS!)
• LMS is SGD applied to a linear combiner to minimize the mean-squared error (MSE)
• SGD is the workhorse of deep learning networks today
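The LMS update on a linear combiner can be sketched in a few lines. This is a minimal sketch, not code from the talk; the tap count, step size, and the channel in the usage below are illustrative assumptions:

```python
import numpy as np

def lms_equalizer(x, d, num_taps=4, mu=0.02):
    """Least mean-square (LMS) adaptive filter (Widrow-Hoff, 1960):
    SGD on a linear combiner minimizing the mean-squared error
    e[n] = d[n] - w^T x[n]."""
    w = np.zeros(num_taps)
    y = np.zeros(len(d))
    e = np.zeros(len(d))
    for n in range(num_taps - 1, len(d)):
        x_n = x[n - num_taps + 1:n + 1][::-1]  # regressor: x[n], x[n-1], ...
        y[n] = w @ x_n                          # linear-combiner output
        e[n] = d[n] - y[n]                      # instantaneous error
        w = w + mu * e[n] * x_n                 # stochastic-gradient step
    return w, y, e
```

For channel estimation, `d` is the observed channel output and `x` the known transmitted sequence; the taps `w` converge toward the channel impulse response, which is exactly the on-chip-learner role LMS plays in communication receivers.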
[ICML 2016 Tutorial]
• Design principles for energy-efficient, high-throughput, and capacity-achieving communication ICs with learning capabilities are well established → repurpose these for on-device intelligence
Ph.D. thesis ’93: 51.84 Mb/s very-high-speed DSL receivers (AT&T), with on-chip learners (adaptive equalizers)
12.5 Gb/s long-haul optical receivers (Intersymbol Comm.): Viterbi equalizer with on-chip learner (channel estimator)
Algorithms, architectures, integrated circuits
• Repurposing will get us a good baseline design and avoid reinvention, but the big question is:
How do we design intelligent platforms operating at the limits of energy efficiency, throughput, and information density?
[Figure: spin-torque-transfer device (improved spin): input and output magnets on a conducting channel separated by an insulating partition, with supply rails VDD/VSS and supply current Isupply]
The limiting behavior of nanofabrics is stochastic: lithography variations, emerging devices, probabilistic switching (RRAM) [Wong]
• NTV CMOS is sensitive to process variations; delay (spin), resistance (RRAM), and other physical parameters are random variables
• Sensing and memory substrates are low-SNR
• Randomness in nanofabrics is becoming visible at the limits of scaling, energy, and throughput → we need an efficient method to compensate for it
Shannon-inspired Statistical Error Compensation (SEC)
• Treat computation on a stochastic fabric as a noisy channel
• Leverage statistical estimation, detection, and inference techniques
[Figure: input x processed through noisy computational channels producing outputs y1 … yN, each characterized by an error probability mass function P_e(e)]
[Hegde & Shanbhag, IEEE Transactions on VLSI ’01; IEEE Journal of Solid-State Circuits ’04]
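The noisy-channel view can be made concrete by measuring the fabric’s error statistics: run the error-prone operation many times and estimate the error probability mass function P_e(e), which then drives the design of estimators and detectors. A minimal sketch with an illustrative bit-flip error model (the 20% error rate and the flipped bit positions are assumptions, not measured hardware behavior):

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

def noisy_add(a, b):
    """Toy stochastic fabric: an adder whose three low-order output
    bits occasionally flip (illustrative error model)."""
    y = a + b
    if rng.random() < 0.2:                    # 20% of operations err
        y ^= 1 << int(rng.integers(0, 3))     # flip one of bits 0..2
    return y

# Treat the adder as a noisy channel and measure its error PMF P_e(e)
pairs = [(int(a), int(b)) for a, b in rng.integers(0, 256, size=(20000, 2))]
errors = [noisy_add(a, b) - (a + b) for a, b in pairs]
counts = Counter(errors)
P_e = {e: n / len(errors) for e, n in counts.items()}
```

By construction, most of the probability mass sits at e = 0 and the nonzero errors are confined to ±1, ±2, ±4 — exactly the kind of structured error statistics a statistical detector can exploit.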
Statistical Error Compensation Techniques
• Algorithmic noise-tolerance (ANT)
• Stochastic sensor NOC (SSNOC)
• Soft NMR
• Likelihood processing
[ISLPED ’99; CICC ’01; JSSC ’04; TVLSI ’04; TVLSI ’08; JSSC ’13; TVLSI ’10; CICC ’11; TVLSI ’14; Trans. Computers ’12; Trans. on Multimedia ’13]
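As a concrete illustration of algorithmic noise-tolerance, here is a minimal sketch: an error-prone main block paired with a reliable low-precision estimator, fused by a threshold test. The function names, the error model (rare large additive errors), the 4-bit estimator, and the threshold are all illustrative assumptions, not the hardware described in the cited papers:

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_mac(w, x, error_rate=0.3):
    """Error-prone main block: a dot product whose result suffers a
    large additive error with probability `error_rate` (toy model)."""
    y = float(w @ x)
    if rng.random() < error_rate:
        y += float(rng.choice([-1.0, 1.0])) * 100.0  # rare, large error
    return y

def estimator_mac(w, x, bits=4):
    """Reliable low-precision estimator: same dot product with weights
    quantized to `bits` bits, so its error is small but bounded."""
    scale = 2 ** (bits - 1)
    wq = np.round(w * scale) / scale
    return float(wq @ x)

def ant(w, x, threshold=10.0):
    """ANT fusion: trust the main block unless it disagrees with the
    estimator by more than `threshold`; on disagreement, declare an
    error and output the estimator's result instead."""
    y_main = noisy_mac(w, x)
    y_est = estimator_mac(w, x)
    return y_main if abs(y_main - y_est) <= threshold else y_est
```

The main block is allowed to err often and catastrophically; the estimator only has to be close enough to flag those events, which is what lets ANT tolerate the high error rates quoted below.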
256-tap PN code detection filter in 180nm CMOS: 5.8× energy reduction; Pdet > 90% with error rates < 86% [Abdallah, Shanbhag, IEEE Journal of Solid-State Circuits, 2013]
Subthreshold ECG classifier in 45nm CMOS: 28% energy reduction (w.r.t. MEOP); Pdet > 95% with error rates < 58% [Kim, Shanbhag, et al., CICC 2012]
• Ability to handle high computational error rates has been demonstrated in prototype inference ICs → we can apply these techniques to machine learning algorithms
Systems on Nanoscale Information fabriCs (SONIC)
www.sonic-center.org
A systems-driven approach to extend Moore’s Law into the deep nanoscale regime by developing Shannon- and brain-inspired statistical information processing principles, architectures, and prototypes
Director: Naresh Shanbhag; Associate Director: Andrew Singer
[2013–’17] Illinois (LEAD), Berkeley, Stanford, UCSD, UCSB, Michigan, CMU, Princeton, Cornell, MIT
The New Game: deep learning on mobile platforms
• Cognitive, decision-making
• Continuous on-device learning
Maximize intelligence-per-unit-volume under stringent resource constraints on energy, storage, and computational capacity
J. von Neumann, Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components, Princeton University Press (1956):
“treatment of error is unsatisfactory and ad hoc … error should be treated as information has been, by the works of C. E. Shannon … The present treatment falls short of achieving this”
Thank You!