Transcript of “Energy-efficient Machine Learning in Silicon: A Communications-inspired Approach”
Naresh Shanbhag
Jack Kilby Professor of Electrical and Computer Engineering
www.shanbhag.ece.illinois.edu
University of Illinois at Urbana‐Champaign
On-device intelligence
• Cognitive, decision-making
• Continuous on-device learning
Maximize intelligence-per-unit-volume under stringent resource constraints on energy, storage, and computational capacity
How do we operate at the fundamental limits?
Claude Shannon, 1948: obtained fundamental limits for communication systems
50 years later: communication systems operate at the limits with capacity-achieving codes
• Pop quiz: What is the world’s most popular learning algorithm deployed “on-device” today?
• Answer: the least mean-square (LMS) algorithm (Widrow-Hoff, 1960).
  – Originally used to train the Adaptive Linear Neuron (ADALINE) & MADALINE
• Used in communication receivers since the mid-1960s
  – Channel estimation, echo/crosstalk cancellation, equalization (got a cell phone? you have LMS!)
• LMS is SGD applied to a linear combiner to minimize the mean-squared error (MSE)
• SGD is the workhorse of deep learning networks today
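The LMS update on a linear combiner can be sketched in a few lines. This is a minimal sketch, not code from the talk; the tap count, step size, and the channel in the usage below are illustrative assumptions:

```python
import numpy as np

def lms_equalizer(x, d, num_taps=4, mu=0.02):
    """Least mean-square (LMS) adaptive filter (Widrow-Hoff, 1960):
    SGD on a linear combiner minimizing the mean-squared error
    e[n] = d[n] - w^T x[n]."""
    w = np.zeros(num_taps)
    y = np.zeros(len(d))
    e = np.zeros(len(d))
    for n in range(num_taps - 1, len(d)):
        x_n = x[n - num_taps + 1:n + 1][::-1]  # regressor: x[n], x[n-1], ...
        y[n] = w @ x_n                          # linear-combiner output
        e[n] = d[n] - y[n]                      # instantaneous error
        w = w + mu * e[n] * x_n                 # stochastic-gradient step
    return w, y, e
```

For channel estimation, `d` is the observed channel output and `x` the known transmitted sequence; the taps `w` converge toward the channel impulse response, which is exactly the on-chip-learner role LMS plays in communication receivers.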
[ICML 2016 Tutorial]
• Design principles for energy-efficient, high-throughput, and capacity-achieving communication ICs with learning capabilities are well established → repurpose these for on-device intelligence
Ph.D. thesis ’93: 51.84 Mb/s very-high-speed DSL receivers (AT&T), with on-chip learners (adaptive equalizers)
12.5 Gb/s long-haul optical receivers (Intersymbol Comm.): Viterbi equalizer with on-chip learner (channel estimator)
Algorithms, architectures, integrated circuits
• Repurposing will get us a good baseline design and avoid reinvention, but the big question is:
How do we design intelligent platforms operating at the limits of energy efficiency, throughput, and information density?
[Figure: spin-torque-transfer device (improved spin): input and output magnets on a conducting channel separated by an insulating partition, with supply rails VDD/VSS and supply current Isupply]
The limiting behavior of nanofabrics is stochastic: lithography variations, emerging devices, probabilistic switching (RRAM) [Wong]
• NTV CMOS is sensitive to process variations; delay (spin), resistance (RRAM), and other physical parameters are random variables
• Sensing and memory substrates are low-SNR
• Randomness in nanofabrics is becoming visible at the limits of scaling, energy, and throughput → we need an efficient method to compensate for it
Shannon-inspired Statistical Error Compensation (SEC)
• Treat computation on a stochastic fabric as a noisy channel
• Leverage statistical estimation, detection, and inference techniques
[Figure: input x processed through noisy computational channels producing outputs y1 … yN, each characterized by an error probability mass function P_e(e)]
[Hegde & Shanbhag, IEEE Transactions on VLSI ’01; IEEE Journal of Solid-State Circuits ’04]
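The noisy-channel view can be made concrete by measuring the fabric’s error statistics: run the error-prone operation many times and estimate the error probability mass function P_e(e), which then drives the design of estimators and detectors. A minimal sketch with an illustrative bit-flip error model (the 20% error rate and the flipped bit positions are assumptions, not measured hardware behavior):

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

def noisy_add(a, b):
    """Toy stochastic fabric: an adder whose three low-order output
    bits occasionally flip (illustrative error model)."""
    y = a + b
    if rng.random() < 0.2:                    # 20% of operations err
        y ^= 1 << int(rng.integers(0, 3))     # flip one of bits 0..2
    return y

# Treat the adder as a noisy channel and measure its error PMF P_e(e)
pairs = [(int(a), int(b)) for a, b in rng.integers(0, 256, size=(20000, 2))]
errors = [noisy_add(a, b) - (a + b) for a, b in pairs]
counts = Counter(errors)
P_e = {e: n / len(errors) for e, n in counts.items()}
```

By construction, most of the probability mass sits at e = 0 and the nonzero errors are confined to ±1, ±2, ±4 — exactly the kind of structured error statistics a statistical detector can exploit.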
Statistical Error Compensation Techniques
• Algorithmic noise-tolerance (ANT)
• Stochastic sensor NOC (SSNOC)
• Soft NMR
• Likelihood processing
[ISLPED ’99; CICC ’01; JSSC ’04; TVLSI ’04; TVLSI ’08; JSSC ’13; TVLSI ’10; CICC ’11; TVLSI ’14; Trans. Computers ’12; Trans. on Multimedia ’13]
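As a concrete illustration of algorithmic noise-tolerance, here is a minimal sketch: an error-prone main block paired with a reliable low-precision estimator, fused by a threshold test. The function names, the error model (rare large additive errors), the 4-bit estimator, and the threshold are all illustrative assumptions, not the hardware described in the cited papers:

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_mac(w, x, error_rate=0.3):
    """Error-prone main block: a dot product whose result suffers a
    large additive error with probability `error_rate` (toy model)."""
    y = float(w @ x)
    if rng.random() < error_rate:
        y += float(rng.choice([-1.0, 1.0])) * 100.0  # rare, large error
    return y

def estimator_mac(w, x, bits=4):
    """Reliable low-precision estimator: same dot product with weights
    quantized to `bits` bits, so its error is small but bounded."""
    scale = 2 ** (bits - 1)
    wq = np.round(w * scale) / scale
    return float(wq @ x)

def ant(w, x, threshold=10.0):
    """ANT fusion: trust the main block unless it disagrees with the
    estimator by more than `threshold`; on disagreement, declare an
    error and output the estimator's result instead."""
    y_main = noisy_mac(w, x)
    y_est = estimator_mac(w, x)
    return y_main if abs(y_main - y_est) <= threshold else y_est
```

The main block is allowed to err often and catastrophically; the estimator only has to be close enough to flag those events, which is what lets ANT tolerate the high error rates quoted below.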
256-tap PN code detection filter in 180nm CMOS: 5.8× energy reduction; Pdet > 90% with error rates < 86% [Abdallah, Shanbhag, IEEE Journal of Solid-State Circuits, 2013]
Subthreshold ECG classifier in 45nm CMOS: 28% energy reduction (w.r.t. MEOP); Pdet > 95% with error rates < 58% [Kim, Shanbhag, et al., CICC 2012]
• Ability to handle high computational error rates has been demonstrated in prototype inference ICs → we can apply these techniques to machine learning algorithms
Systems on Nanoscale Information fabriCs (SONIC)
www.sonic-center.org
A systems-driven approach to extend Moore’s Law into the deep nanoscale regime by developing Shannon- and brain-inspired statistical information processing principles, architectures, and prototypes
Director: Naresh Shanbhag; Associate Director: Andrew Singer
[2013–’17] Illinois (LEAD), Berkeley, Stanford, UCSD, UCSB, Michigan, CMU, Princeton, Cornell, MIT
The New Game: deep learning on mobile platforms
• Cognitive, decision-making
• Continuous on-device learning
Maximize intelligence-per-unit-volume under stringent resource constraints on energy, storage, and computational capacity
J. von Neumann, Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components, Princeton University Press (1956):
“treatment of error is unsatisfactory and ad hoc … error should be treated as information has been, by the works of C. E. Shannon … The present treatment falls short of achieving this”
Thank You!