Benchmarking for Beyond-CMOS Devices in Boolean and ... · 26 Energy Delay @ 4-bit Weight 10% Noise...


Page 1

Benchmarking for Beyond-CMOS Devices in Boolean and Neuromorphic Circuits

Chenyun Pan and Azad Naeemi

Georgia Institute of Technology

School of Electrical and Computer Engineering

[email protected]

Page 2

Beyond-CMOS Devices and Boolean Circuit Benchmarking

Page 3

Magnetic Devices

[Figure: schematics of the magnetic devices; labels include FM, SHE material, +V/-V, Vin, and Vout.]

• Composite-Input Magnetoelectric-based Logic Technology (CoMET)
• Domain Wall Magnetic Logic (mLogic)
• All Spin Logic (ASL)
• Charge-Coupled Spin Logic (CSL)
• Magnetoelectric MTJ (MEMTJ)
• Spin Wave Devices (SWD)

N. Sharma, et al., Fourth Berkeley Symposium on Energy Efficient Electronic Systems (E3S), IEEE, 2015.

B. Behin-Aein, D. Datta, S. Salahuddin, and S. Datta, Nature Nanotechnology, vol. 5, pp. 266-270, 2010.

S. Datta, S. Salahuddin, and B. Behin-Aein, Applied Physics Letters, vol. 101, p. 252411, 2012.

D. Morris, D. Bromberg, J.-G. J. Zhu, and L. Pileggi, DAC, 2012, pp. 486-491.

S. Dutta, S.-C. Chang, N. Kani, D. E. Nikonov, S. Manipatruni, I. A. Young, et al., Scientific Reports, vol. 5, 2015.

M. Mankalale, et al., https://arxiv.org/abs/1611.09714

Page 4

Field Effect Devices

• TMDTFET
• GaNTFET
• ThinTFET
• BisFET
• ITFET
• vdWFET
• NCFET
• MITFET
• GpnJ

S. Salahuddin and S. Datta, Nano Letters, vol. 8, pp. 405-410, 2008.

S. Das, A. Prakash, R. Salazar, and J. Appenzeller, ACS Nano, vol. 8, pp. 1681-1689, 2014.

M. O. Li, D. Esseni, G. Snider, D. Jena, and H. G. Xing, Journal of Applied Physics, vol. 115, p. 074508, 2014.

L. Wenjun, S. Sharmin, et al., IEEE Journal on Exploratory Solid-State Computational Devices and Circuits, vol. 1, pp. 28-34, 2015.

C. Pan and A. Naeemi, 13th International Symposium on Quality Electronic Design (ISQED), 2012, pp. 262-269.

J. Son, S. Rajan, S. Stemmer, and S. J. Allen, Journal of Applied Physics, vol. 110, p. 084503, 2011.

L. Liu, Y. Lu, and J. Guo, IEEE Transactions on Electron Devices, vol. 60, pp. 4133-4139, 2013.

S. K. Banerjee, L. F. Register, E. Tutuc, D. Reddy, and A. H. MacDonald, IEEE Electron Device Letters, vol. 30, pp. 158-160, 2009.

Page 5

Boolean Benchmarking Circuit

• 32-bit arithmetic logic unit (ALU)

D. E. Nikonov and I. A. Young, "Benchmarking of beyond-CMOS exploratory devices for logic integrated circuits," IEEE Journal on Exploratory Solid-State Computational Devices and Circuits, vol. 1, pp. 3-11, 2015.

Page 6

Benchmarking Results for a 32-bit ALU

Page 7

Motivation

• It is difficult for beyond-CMOS devices to compete with CMOS in traditional Boolean circuits.

• We need to search for non-traditional circuits where beyond-CMOS devices can perform well, i.e., circuits that:
– have many applications;
– can be efficiently implemented with many devices;
– have well-defined boundaries and specs.

• Cellular neural networks (CeNNs) perform many tasks better than traditional circuits, e.g., image processing, associative memory, and target tracking.

Page 8

Conventional Charge-Based CeNN Implementation

Page 9

CeNN Dynamics and Implementation

CMOS Implementation with Op-Amp and OTA

L. Chua, et al., “Cellular neural networks: theory,” IEEE TCAS, vol. 35, pp. 1257-1272, 1988.

A. Trivedi, et al., IEEE TED, vol. 61, no. 11, 2015.

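$$C\,\frac{dx_{ij}}{dt} = -\frac{1}{R}\,x_{ij} + \sum_{kl \in S_{ij}} A_{ij,kl}\,y_{kl} + \sum_{kl \in S_{ij}} B_{ij,kl}\,u_{kl} + I_{ij}$$

where x_ij is the state of cell (i, j), y_kl and u_kl are the outputs and inputs of the cells in its neighborhood S_ij, A and B are the feedback and control templates, and I_ij is the bias.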
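A minimal numerical sketch of the state equation above, using forward-Euler integration on a small grid. Assumptions that are not taken from the slides: C = R = 1, a 3×3 neighborhood, cells outside the grid contribute zero, the standard piecewise-linear output y = 0.5(|x + 1| - |x - 1|), and the hypothetical helper names `simulate_cenn` and `neighborhood_sum`.

```python
import numpy as np


def cenn_output(x):
    """Standard piecewise-linear CeNN output y = 0.5 * (|x + 1| - |x - 1|)."""
    return 0.5 * (np.abs(x + 1.0) - np.abs(x - 1.0))


def neighborhood_sum(template, field):
    """out[i, j] = sum over k, l in {-1, 0, 1} of template[k+1, l+1] * field[i+k, j+l].

    Cells outside the grid are treated as 0 (an assumed fixed boundary).
    """
    field = np.asarray(field, dtype=float)
    padded = np.pad(field, 1, mode="constant", constant_values=0.0)
    rows, cols = field.shape
    out = np.zeros(field.shape)
    for dk in range(3):
        for dl in range(3):
            out += template[dk, dl] * padded[dk:dk + rows, dl:dl + cols]
    return out


def simulate_cenn(u, A, B, I, C=1.0, R=1.0, dt=0.05, steps=400):
    """Forward-Euler integration of
    C dx_ij/dt = -x_ij / R + sum_kl A_ij,kl y_kl + sum_kl B_ij,kl u_kl + I_ij,
    starting from x = 0; returns the final outputs y.
    """
    x = np.zeros(np.shape(u))
    feedforward = neighborhood_sum(B, u) + I      # input term is constant in time
    for _ in range(steps):
        y = cenn_output(x)
        dxdt = (-x / R + neighborhood_sum(A, y) + feedforward) / C
        x = x + dt * dxdt
    return cenn_output(x)


A = np.zeros((3, 3)); A[1, 1] = 2.0      # self-feedback only (illustrative)
B = np.zeros((3, 3)); B[1, 1] = 1.0      # each cell sees only its own input
u = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(simulate_cenn(u, A, B, I=0.0))
```

With this uncoupled, illustrative template each cell settles to the sign of its own input; templates derived for a real task (for example, associative memory) would replace these placeholders.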

Page 10

Digital CeNN Implementation

Assume n-bit weights with (n+3)-bit addition.

The final state is 1 bit.

[Figure: digital CeNN cell with blocks labeled n-bit AND2, (n+3)-bit adder, (n+3)-bit register, truncation, 1-bit state, n-bit weight registers, and a clocked counter + decoder for the weights.]
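The datapath above can be sketched at the bit level. This is a hypothetical model: it assumes two's-complement weights, neighbor states encoded as 0/1 so that each weight-state product reduces to an n-bit AND2, one weight per clock cycle selected by the counter + decoder, and a truncation step that keeps only the sign of the (n+3)-bit sum as the new 1-bit state. Under these assumptions, up to eight n-bit products fit in n+3 bits without overflow, since 8 × 2^(n-1) = 2^(n+2).

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def digital_cenn_update(weights, states, n):
    """One sequential update of a hypothetical n-bit digital CeNN neuron.

    weights: signed integers that fit in n bits, applied one per clock cycle
    states:  1-bit neighbor states (0 or 1)
    Returns (new_state, accumulated_sum).
    """
    acc_bits = n + 3
    mask_n = (1 << n) - 1
    acc = 0                                     # (n+3)-bit accumulator register
    for w, s in zip(weights, states):
        if not -(1 << (n - 1)) <= w < (1 << (n - 1)):
            raise ValueError(f"weight {w} does not fit in {n} bits")
        # AND2 gating: each of the n weight bits is ANDed with the state bit.
        gated = (w & mask_n) & (mask_n if s else 0)
        # Sign-extend the n-bit product and add it in the (n+3)-bit adder (wraps).
        acc = (acc + to_signed(gated, n)) & ((1 << acc_bits) - 1)
    total = to_signed(acc, acc_bits)
    # Truncation (assumed): only the sign survives as the new 1-bit state.
    return (1 if total >= 0 else 0), total
```

For example, with n = 4, eight weights of -8 accumulate to -64, which is exactly the most negative value a 7-bit register can hold.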

Page 11

How About a Spintronic CeNN?

Page 12

Spintronic CeNN Implementation

Using the All-Spin Logic (ASL) device as the building block

The output magnet switches faster as the input current increases, so it acts as an integrator.

[Figure: ASL neuron (spin diffusion from the input to the output magnet; labels Vdd, Ispin1, Icharge, Ispin2) and simulated magnetization my vs. time (0 to 3 ns) for Ic = 0.6, 0.8, and 1.0 mA.]

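$$\frac{d\hat{m}}{dt} = -\gamma\mu_0\,\hat{m}\times\vec{H}_{\mathrm{eff}} + \alpha\,\hat{m}\times\frac{d\hat{m}}{dt} + \frac{1}{qN_s}\,\hat{m}\times\left(\vec{I}_S\times\hat{m}\right)$$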
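To illustrate the integrator behavior, here is a minimal macrospin sketch of the magnetization equation above. It solves the equivalent explicit form (1 + α²) dm/dt = T + α m × T, with T = -m × H + m × (I_S × m), in normalized units. Assumptions that are not from the slides: uniaxial anisotropy along z is the only effective field (Hk = 1), the spin current is polarized along -z with normalized strength j = I_S/(qN_sγμ0Hk), time is in units of 1/(γμ0Hk), and thermal noise is ignored. In this simplified model the magnet switches only for roughly j > α, and a larger j gives a shorter switching time.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def llgs_rhs(m, alpha, j):
    """dm/dt in normalized units from (1 + alpha^2) dm/dt = T + alpha * m x T,
    with T = -m x h + m x (i_s x m)."""
    h = (0.0, 0.0, m[2])      # uniaxial anisotropy field along z (Hk = 1)
    i_s = (0.0, 0.0, -j)      # spin current polarized along -z (assumed)
    field_torque = tuple(-c for c in cross(m, h))
    spin_torque = cross(m, cross(i_s, m))
    t = tuple(a + b for a, b in zip(field_torque, spin_torque))
    m_x_t = cross(m, t)
    scale = 1.0 / (1.0 + alpha * alpha)
    return tuple(scale * (ti + alpha * ci) for ti, ci in zip(t, m_x_t))


def switching_time(j, alpha=0.1, theta0=0.1, dt=0.01, t_max=1000.0):
    """Normalized time for m_z to cross zero, starting near +z; None if no switch."""
    m = (math.sin(theta0), 0.0, math.cos(theta0))
    t = 0.0
    while t < t_max:
        dm = llgs_rhs(m, alpha, j)
        m = tuple(mi + dt * di for mi, di in zip(m, dm))   # forward Euler step
        norm = math.sqrt(sum(c * c for c in m))
        m = tuple(c / norm for c in m)                     # keep |m| = 1
        t += dt
        if m[2] < 0.0:
            return t
    return None


for j in (0.2, 0.3, 0.4):
    print(f"j = {j}: switching time = {switching_time(j):.1f}")
```

The switching time falls as j rises, which is the property that lets the output magnet integrate its input current.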

Page 13

Spintronic CeNN Implementation

The output magnet is sensed using two MTJs and one inverter.

The MTJs are matched to tolerate process variation.

[Figure: ASL neuron with MTJ read-out (labels +Vdd, ±Vdrive, Vin, Vout, Ispin1, Icharge, Ispin2) and Vout (V) vs. magnetization in the z direction, mz (-1 to 1), for tox = 1.9, 2.0, and 2.1 nm.]

Using the all-spin logic (ASL) device as the building block

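$$\frac{d\hat{m}}{dt} = -\gamma\mu_0\,\hat{m}\times\vec{H}_{\mathrm{eff}} + \alpha\,\hat{m}\times\frac{d\hat{m}}{dt} + \frac{1}{qN_s}\,\hat{m}\times\left(\vec{I}_S\times\hat{m}\right)$$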
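A minimal sketch of the read-out idea, assuming (this is not stated on the slide) that the two MTJs form a resistive divider with oppositely pinned reference layers and that each MTJ conductance varies with cos θ between its parallel and antiparallel values. The inverter and its load are ignored, and `divider_voltage` and `mtj_conductance` are hypothetical names.

```python
def mtj_conductance(cos_theta, r_p, tmr):
    """Angle-dependent MTJ conductance (simple cosine model, assumed).

    G(theta) = (G_P + G_AP)/2 + (G_P - G_AP)/2 * cos(theta), with R_AP = R_P * (1 + TMR).
    """
    g_p = 1.0 / r_p
    g_ap = 1.0 / (r_p * (1.0 + tmr))
    return 0.5 * (g_p + g_ap) + 0.5 * (g_p - g_ap) * cos_theta


def divider_voltage(mz, vdd, r_p, tmr):
    """Midpoint voltage of two series MTJs between vdd and ground.

    The top MTJ's reference layer points along +z and the bottom one along -z,
    so they see cos(theta) = +mz and -mz respectively (assumed topology).
    """
    g_top = mtj_conductance(mz, r_p, tmr)
    g_bot = mtj_conductance(-mz, r_p, tmr)
    # Current continuity at the midpoint: (vdd - v) * g_top = v * g_bot
    return vdd * g_top / (g_top + g_bot)
```

Only the ratio g_top/(g_top + g_bot) enters, so a variation that scales both matched MTJ resistances by the same factor cancels to first order, whereas a mismatch between the two MTJs would shift the midpoint.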

Page 14

Spintronic CeNN Implementation

Illustrations of a CeNN implemented with magnetic synapses and neurons

[Figure (a), (b): CeNN with magnetic synapses and neurons; labels include Vin,1 to Vin,n, ±Vdrive, Ispin1, Ispin2, Icharge, mz, Vout, +Vdd, and Gnd, with the layers free magnet, fixed magnet, oxide, and copper.]

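$$\frac{d\hat{m}}{dt} = -\gamma\mu_0\,\hat{m}\times\vec{H}_{\mathrm{eff}} + \alpha\,\hat{m}\times\frac{d\hat{m}}{dt} + \frac{1}{qN_s}\,\hat{m}\times\left(\vec{I}_S\times\hat{m}\right)$$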

Page 15

Associative Memory Application

[Figure (b)-(h): magnetization in the z direction, mz, vs. time (0 to 12 ns).]

Thermal noise has been included in all simulations.

Page 16

Other Spintronic Device Implementations

Spin Hall Effect

MTJs are used to implement synapses with various weights.

[Figure: spin-Hall-effect CeNN; labels include MTJs, Vin,1 to Vin,n, ±Vdrive, Vout, +Vdd, and Gnd.]
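One way to see how MTJ conductances can act as synaptic weights is to add up the synapse currents with Ohm's law. The mapping below is a hypothetical illustration, not the circuit on the slide: each weight magnitude |w| is realized as |w| unit MTJs of conductance g_unit in parallel, the sign of w selects the +Vdrive or -Vdrive rail, and each input is encoded as ±1 and flips the drive polarity seen by its synapse.

```python
def synapse_current(weights, inputs, g_unit, v_drive):
    """Total current into the neuron from MTJ synapses (hypothetical mapping).

    weights: signed integers; |w| is realized as |w| unit MTJs in parallel,
             and the sign of w selects the +Vdrive or -Vdrive rail.
    inputs:  +1 or -1, flipping the drive polarity seen by each synapse.
    """
    total = 0.0
    for w, x in zip(weights, inputs):
        if x not in (-1, 1):
            raise ValueError("inputs must be +1 or -1")
        conductance = abs(w) * g_unit               # parallel unit MTJs
        polarity = (1 if w >= 0 else -1) * x        # rail choice times input sign
        total += conductance * polarity * v_drive   # Ohm's law per synapse
    return total
```

Under this mapping the neuron current equals g_unit · Vdrive · Σ w_i x_i, i.e., a weighted sum whose resolution is set by how many distinct conductance levels the MTJs provide.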

Page 17

Other Spintronic Device Implementations

Domain Wall Motion

MTJs are used to implement synapses with various weights.

[Figure: domain-wall-motion CeNN; labels include MTJs, Vin,1 to Vin,n, ±Vdrive, Vout, +Vdd, and Gnd.]

Page 18

CeNN Implementations with Various Devices

[Figure: analog, spintronic (a, b), and digital CeNN implementations, plus spintronic CeNNs based on the spin Hall effect and domain wall motion; charge-based devices: ThinTFET, TMDTFET, GpnJ, NCFET, FEFET, MITFET, GaNTFET, and vdWFET.]

Page 19

Iso-Accuracy Benchmarking for Various CeNN Implementations

Page 20

Spintronic CeNN for Associative Memory Application

The recall rate depends on the random locations of the noisy pixels.

Page 21

Accuracy Analysis

Patterns 1-5 are associated with patterns 6-9 and 0 using the Hebbian learning algorithm.

Monte Carlo simulations are performed with 100 samples for each pattern.

Four CeNN implementations are simulated:
• Analog
• Digital
• Spin diffusion
• Domain wall
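The accuracy setup described above can be sketched in software. This is an idealized algorithmic model, not any of the four circuit implementations; the bipolar (±1) pattern encoding, the one-shot recall y = sign(W x) with sign(0) = +1, and the optional random-sparsity model of the synapse count are assumptions. The hetero-associative Hebbian weights are W = Σ_k y_k x_kᵀ, and each Monte Carlo trial flips a fixed fraction of input pixels at random locations.

```python
import numpy as np


def hebbian_weights(inputs, outputs):
    """Hetero-associative Hebbian rule: W = sum_k outer(y_k, x_k)."""
    return sum(np.outer(y, x) for x, y in zip(inputs, outputs))


def recall(W, x, mask=None):
    """One-shot recall y = sign(W x), with sign(0) taken as +1 (assumed)."""
    W_eff = W if mask is None else W * mask
    return np.where(W_eff @ x >= 0, 1, -1)


def recall_rate(inputs, outputs, noise, trials=100, synapses=None, seed=0):
    """Fraction of Monte Carlo trials in which the full target pattern is recalled.

    noise:    fraction of input pixels flipped at random locations in each trial
    synapses: if given, each output neuron keeps only this many randomly
              chosen input connections (a hypothetical sparsity model)
    """
    rng = np.random.default_rng(seed)
    W = hebbian_weights(inputs, outputs)
    n_out, n_in = W.shape
    mask = None
    if synapses is not None:
        mask = np.zeros((n_out, n_in))
        for i in range(n_out):
            mask[i, rng.choice(n_in, size=synapses, replace=False)] = 1.0
    n_flip = int(round(noise * n_in))
    hits = 0
    for x, y in zip(inputs, outputs):
        for _ in range(trials):
            noisy = np.array(x, copy=True)
            flips = rng.choice(n_in, size=n_flip, replace=False)
            noisy[flips] *= -1                  # random noisy pixel locations
            hits += np.array_equal(recall(W, noisy, mask), y)
    return hits / (trials * len(inputs))
```

For lists of input patterns `xs` and target patterns `ys`, `recall_rate(xs, ys, noise=0.10, trials=100)` estimates the recall rate at 10% input noise, and sweeping `synapses` gives a recall-rate-versus-synapse-count curve for this idealized model.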

Page 22

Iso-Accuracy/Recall Rate Analysis

[Figure: recall rate (%) vs. number of synapses (0 to 50) for the spin diffusion, analog, digital, and domain-wall CeNNs, with 3% and 15% input noise.]

Increasing the number of synapses per neuron increases the recall rate significantly.

For a target recall rate, the required number of synapses can be read from these curves.
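Reading the required synapse count off a recall-rate curve can be automated with a small helper. The sketch assumes the curve is given as (number of synapses, recall rate) pairs sorted by synapse count and that recall rises roughly monotonically; it returns the smallest count that meets the target, interpolating linearly between sampled points and rounding up. The function name is hypothetical and no data from the slides is included.

```python
import math


def required_synapses(curve, target):
    """Smallest synapse count whose (interpolated) recall rate reaches target.

    curve: list of (synapses, recall_rate) pairs sorted by synapses.
    Returns None if the target is never reached within the sampled range.
    """
    prev_s, prev_r = curve[0]
    if prev_r >= target:
        return prev_s
    for s, r in curve[1:]:
        if r >= target:
            # Linear interpolation between the two bracketing samples, rounded up.
            frac = (target - prev_r) / (r - prev_r)
            return math.ceil(prev_s + frac * (s - prev_s))
        prev_s, prev_r = s, r
    return None
```

Applying it to the curve for each noise level and implementation gives the kind of required-synapse comparison shown on the next slides.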

Page 23

Number of Required Synapses

More synapses are required for a larger noise level.

The analog CeNN requires fewer synapses than the other CeNNs.

Page 24

Number of Required Synapses

A 4-bit weight provides a better trade-off between the required number of synapses and the number of bits per synapse.

Page 25

Energy-Delay @ 4-bit Weight, 10% Noise, 90% Recall Rate

Spintronic devices can compete with charge-based devices for the CeNN application.

[Figure: energy (fJ) vs. delay (ps) for the Boolean circuit and the cellular neural network; devices: digital CMOS HP, digital CMOS LV, CMOS HP, CMOS LV, HomJTFET, HetJTFET, ThinTFET, GaNTFET, TMDTFET, FEFET, NCFET, MITFET, GpnJ-Vg2, GpnJ-Vg3, and vdWFET-BP; groups: CMOS, TFETs, ferroelectric, others, spintronics.]

Page 26

Energy-Delay @ 4-bit Weight, 10% Noise, 90% Recall Rate

Spintronic devices can compete with charge-based devices for the CeNN application.

[Figure: energy (fJ) vs. delay (ps) for the Boolean circuit and the cellular neural network; charge-based devices: digital CMOS HP, digital CMOS LV, CMOS HP, CMOS LV, GpnJ-Vg2, GpnJ-Vg3, HomJTFET, HetJTFET, ThinTFET, GaNTFET, TMDTFET, FEFET, NCFET, MITFET, and vdWFET-BP; spintronic CeNNs: SD-PMA, SD, SD-HA, SD-HAs, SHE, SHE-CC, SHE-YIG, and DW; groups: CMOS, TFETs, ferroelectric, others, spintronics.]

Page 27

Summary

– It is difficult for beyond-CMOS devices to compete with CMOS in Boolean applications.

– We need to search for novel circuits that fully utilize the potential of new technologies.

– Three types of CeNN implementations (analog, digital, and spintronic) are investigated and benchmarked with a variety of beyond-CMOS device technologies.

– Based on iso-accuracy analyses, spintronic devices show more promising results in the CeNN application than in conventional Boolean logic computation.

Page 28

Page 29

ASL Device Updates

All-spin logic devices based on Heusler alloys are added:

Substantially larger Ku values allow the use of smaller magnets and improve the switching delay.

The lower damping factor and higher polarization factor further improve the delay.

Parameters | Heusler Alloy | Improved Heusler Alloy with Small Ms
Magnet dimensions, Wm | 15 nm × 15 nm × 2 nm | 15 nm × 15 nm × 2 nm
Saturation magnetization, Ms | 400 × 10³ A/m | 100 × 10³ A/m
Crystal anisotropy constant, Ku | 2.60 × 10⁶ J/m³ | 2.60 × 10⁶ J/m³
Damping factor, α | 0.005 | 0.005
Polarization factor, P | 0.8 | 0.8
Magnetic resistivity, ρ | 120 µΩ·cm | 120 µΩ·cm
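The table values can be turned into two standard macrospin figures of merit: the thermal stability factor Δ = KuV/(kBT), and the spin count N_s = MsV/μB that appears in the 1/(qN_s) spin-torque term of the magnetization equation. A large Ku keeps Δ high even for small magnets, and a smaller Ms reduces N_s, so the same spin current exerts a larger torque. The sketch assumes T = 300 K and these simple expressions; it is a back-of-the-envelope check, not the benchmarking model.

```python
# Physical constants (SI, CODATA)
K_B = 1.380649e-23        # Boltzmann constant, J/K
MU_B = 9.2740100783e-24   # Bohr magneton, J/T


def magnet_volume(w_nm, l_nm, t_nm):
    """Volume of a rectangular magnet, converting nm^3 to m^3."""
    return w_nm * l_nm * t_nm * 1e-27


def thermal_stability(ku, volume, temperature=300.0):
    """Energy barrier over kT for a uniaxial magnet: Delta = Ku * V / (kB * T)."""
    return ku * volume / (K_B * temperature)


def number_of_spins(ms, volume):
    """N_s = Ms * V / mu_B, the spin count in the 1/(q N_s) torque term."""
    return ms * volume / MU_B


V = magnet_volume(15, 15, 2)
for label, ms in (("Heusler alloy", 400e3), ("Improved Heusler alloy", 100e3)):
    print(f"{label}: Delta = {thermal_stability(2.60e6, V):.0f}, "
          f"N_s = {number_of_spins(ms, V):.0f}")
```

Both alloys share the same Ku and volume, so Δ is unchanged, while the 4x smaller Ms of the improved alloy gives 4x fewer spins to switch.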

Page 30

Device Performance

Operation:

Multi-phase clocking ensures non-reciprocity.

The signal amplitude is restored at each stage.

Deterministic switching in the presence of thermal noise (compensation of demagnetization) [1].

Interference is used to implement a majority gate.

Non-volatile.

[1] S. Dutta, et al., Applied Physics Letters, 192404, 2015.

[2] S. Dutta, et al., IEEE Nano, 2016.

Updates:

More rigorous calculation of the ME voltage and delay.

The energy in the clock circuit and wires is accounted for.

Page 31

Reliable Operation Under Thermal Noise

The exchange-spring system changes the z axis from a high-energy point to a saddle point, which results in damped switching.

S. Dutta, D. Nikonov, S. Manipatruni, I. Young, and A. Naeemi, "Overcoming thermal noise in non-volatile spin wave logic," Scientific Reports, 2017.

Funded by Intel.

[Figure label: rare-event enhancement]

Page 32

Composite-Input Magnetoelectric-Based Logic Technology (CoMET)

• The delay is dominated by domain-wall nucleation and propagation.
• The energy is dominated by the switching energy of the transistors.

Original proposal from Intel: S. Chang et al., IEEE JxCDC, 2016.

Modified proposal from C-SPIN: M. G. Mankalale et al., IEEE JxCDC, 2017.

Page 33

ASL (All-Spin Logic)

Updates:

The spin-polarized current is captured more accurately by solving the spin diffusion equations with approximate boundary conditions.

Size effects at the nanoscale, such as surface scattering, are accounted for [1].

ASL using perpendicular magnets is added to the benchmarking.

[1] S. Rakheja, S.-C. Chang, and A. Naeemi, "Impact of dimensional scaling and size effects on spin transport in copper and aluminum interconnects," IEEE T-ED, Nov. 2013.

Major limiting factors:

Spin-transfer torque is not energy efficient.

A considerable fraction of the spins goes directly to ground.

Supply clocking is needed.

FM switching time.

Page 34

Updated Benchmarking Summary (Spintronic)

[Figure: energy (fJ) vs. delay (ps) for a 32-bit ALU; CMOS HP and CMOS LV vs. ASL, ASL-PMA, ASL-HA, ASL-HAs, CSL, CSL-CC, CSL-Comp, CSL-YIG, MEMTJs, MEMTJ-Pres, SWD, CoMET, and mLogic, with BCB 3.0 results marked; insets show CSL (FM, SHE material) and MEMTJ (FM 1, AFM, FM 2; inputs A, B, C) schematics.]

Page 35

Outline

Spintronic Devices

Field Effect Devices

Dynamic Logic and Pipelining

Interconnect Implications

Summary

Page 36

Non-Volatile SWD (Spin Wave Devices)

Write: VME = 0.1 V to 0

Read: VME = 0 to 0.1 V

VME = 0: in-plane magnet; VME = 0.1 V: out-of-plane magnet

[Figure label: state variable]

S. Dutta, et al., Scientific Reports, vol. 5, article no. 9861, 2015.

Page 37

Magnetoelectric MTJ Device (MEMTJ)

A two-phase clock for Vdd can be used to preset and compute.

No dedicated transistors or dipolar coupling are needed.

U.S. Patent Application 15/654,278.

[Figure: MEMTJ stack (FM 1, AFM, FM 2; inputs A, B, C; ±V; output Vout) and a layout with an AFM contact, +Vdd, +2Vdd, and Vout.]

MEMTJ-Pres

Needs multiple transistors to drive the next stage.

Very sensitive to oxide thickness.

Relies on dipolar coupling for input/output isolation.

N. Sharma, et al., IEEE DCAS, 2015, pp. 1-4.

Page 38

CSL Device Updates

• Include a copper collector to enhance the spin current.

• Separate the pull-up and pull-down networks to allow a smaller free magnet and magnets with the same magnetization on top.

• Use YIG to avoid dipolar coupling and further enhance the spin current.

[Figure: CSL, CSL-CC, CSL-Comp, and CSL-YIG structures; labels include FM, FM1, FM2, SHE material, copper, YIG, insulating oxide, tunneling oxide, exchange/dipole coupling, ±V, Vin, and Vout.]

Page 39

FAME Device Updates

The vdWFET is updated based on a new material, black phosphorus (BP), which has a large field-effect mobility and a highly anisotropic band structure.

Parameters | BP | WS2
Feature size F (nm) | 15 |
Gate length (nm) | 12.8 |
Contact resistance (Ω·μm) | 480 [1] |
Ion (µA/µm) | 33.11 [2] | 28.18 [2]
Ioff (µA/µm) | 1.5×10⁻³ [2,3] | 1.5×10⁻³ [2,3]
Gate capacitance (F/m) | 8.28×10⁻¹¹ [2] | 9.35×10⁻¹¹ [2]
Vdd (V) | 0.3 [2] | 0.3 [2]

[1] R. Kappera, et al., "Phase-engineered low-resistance contacts for ultrathin MoS2 transistors," Nature Materials, vol. 13, no. 12, pp. 1128-1134, 2014.

[2] S. S. Sylvia, et al., "Uniform Benchmarking of Low-Voltage van der Waals FETs," IEEE Journal on Exploratory Solid-State Computational Devices and Circuits, vol. 2, pp. 28-35, Dec. 2016.

[3] Benchmarking scripts: [Online]. Available: https://nanohub.org/tools/nribench/browser/trunk/src.

To be consistent with other devices, the actual ON current is adjusted based on a contact resistance of 160 Ω·μm.
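The table entries combine into the usual intrinsic delay τ = C_g·Vdd/I_on, the quantity that gate-capacitance-vs-ON-current plots like the ones that follow compare. This sketch assumes the gate capacitance is given per unit gate width (F/m) and the ON current per unit width (µA/µm), so the width cancels; it uses the table's ON currents before the contact-resistance adjustment and ignores parasitics, so it is only a rough comparison.

```python
def intrinsic_delay_ps(c_gate_f_per_m, vdd, ion_ua_per_um):
    """Intrinsic switching delay tau = C_g * Vdd / I_on, in picoseconds.

    Assumes C_g is per unit gate width (F/m) and I_on is per unit width (uA/um).
    """
    c_per_um = c_gate_f_per_m * 1e-6      # F/m -> F/um
    i_per_um = ion_ua_per_um * 1e-6       # uA/um -> A/um
    return c_per_um * vdd / i_per_um * 1e12


for name, c, ion in (("BP", 8.28e-11, 33.11), ("WS2", 9.35e-11, 28.18)):
    print(f"{name}: tau = {intrinsic_delay_ps(c, 0.3, ion):.2f} ps")
```

Under these assumptions the BP device comes out faster than WS2 at the same Vdd, because it has both a lower gate capacitance and a higher ON current.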

Page 40

Important Changes

Changes of Gate Capacitance vs. On Current

[Figure: gate capacitance (fF/µm) vs. ON current (µA/µm), BCB 3.0 vs. updated data; labeled points: vdWFET @ 0.5 V and vdWFET @ 0.3 V.]

Source: Roger Lake

Page 41

Devices from LEAST

Changes of Gate Capacitance vs. On Current

Source: Joe Nahas and Alan Seabaugh

[Figure: gate capacitance (fF/µm) vs. ON current (µA/µm), BCB 3.0 vs. updated data; labeled points: ThinFET @ 0.2 V, GaNTFET @ 0.2 V and @ 0.3 V, TMDTFET @ 0.25 V and @ 0.2 V, and NCFET @ 0.3 V.]

Page 42

NCFET Switching Time

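The switching time of the ferroelectric (FE) layer can limit the NCFET switching time.

In the single-domain approximation:

$$\rho\,\frac{\partial P}{\partial t} = -\frac{\partial F}{\partial P}$$

where P is the polarization, F is the free energy, and ρ is the viscosity coefficient.

Following up on BCB 3.0, we add 10 ps of additional delay to the NCFET to capture this.

[Figure: NCFET switching for increasing ρ (ρ ↑).]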
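To see why a larger viscosity slows FE switching, the single-domain equation above can be integrated with a Landau free energy F = aP² + bP⁴ - EP (a < 0, b > 0). The coefficients a = -1, b = 1 and the applied field E below are normalized, hypothetical values, and `lk_switching_time` is an illustrative name. Because ρ only rescales time in this equation, the switching time grows in proportion to ρ; below the coercive field (about 0.54 for these coefficients) the polarization never reverses.

```python
def lk_switching_time(rho, a=-1.0, b=1.0, e_field=1.0, dt=1e-3, t_max=100.0):
    """Time for P to go from the negative remanent state to P > 0.

    Forward-Euler integration of rho dP/dt = -dF/dP with
    F = a P^2 + b P^4 - E P (normalized, hypothetical coefficients).
    Returns None if P does not reverse within t_max.
    """
    p = -((-a) / (2.0 * b)) ** 0.5        # remanent state: dF/dP = 0 at E = 0
    t = 0.0
    while t < t_max:
        dfdp = 2.0 * a * p + 4.0 * b * p ** 3 - e_field
        p += -dt * dfdp / rho             # rho dP/dt = -dF/dP
        t += dt
        if p > 0.0:
            return t
    return None


for rho in (1.0, 2.0, 4.0):
    print(f"rho = {rho}: switching time = {lk_switching_time(rho):.2f}")
```

This linear dependence on ρ is one way to motivate treating the FE contribution as an added delay term rather than a change in the transistor's intrinsic delay.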

Page 43

Switching Delay Analysis

For conventional CMOS logic, each 2x increase in the number of cascaded stages doubles the delay:

T = t_int, T = 2 t_int, T = 4 t_int

Would this still be valid for FE devices?

T = t_int + t_FE, T = 2 t_int + t_FE, T = 4 t_int + t_FE
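If the FE expressions above hold, with the FE switching time added once per path rather than once per stage, the relative delay penalty shrinks as logic depth grows. The sketch below computes that fractional overhead; the function name is hypothetical and the inputs are arbitrary units chosen only for illustration.

```python
def ferroelectric_delay_overhead(stages, t_int, t_fe):
    """Fractional delay increase when a fixed FE switching time is added once.

    Conventional logic: T = stages * t_int.
    FE logic (slide hypothesis): T = stages * t_int + t_fe.
    """
    t_cmos = stages * t_int
    t_fe_logic = stages * t_int + t_fe
    return (t_fe_logic - t_cmos) / t_cmos


for n in (1, 2, 4, 8):
    print(n, f"{ferroelectric_delay_overhead(n, t_int=1.0, t_fe=1.0):.1%}")
```

With t_FE equal to one stage delay, the overhead drops from 100% for a single stage to 25% for four stages.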

Page 44

SPICE Simulation

Average pull-up and pull-down delay for conventional CMOS and ferroelectric FETs vs. the number of cascaded devices.

The added delay due to the ferroelectric switching time is almost constant for NCFET logic of varying complexity.

[Figure: simulated delays with the tFE offset marked; models: Verilog-A-based FE capacitor, BSIM 16 nm model, and ASU PTM models; A. Aziz et al., IEEE EDL, 2016.]

Page 45

INDEX (Graphene pn Junction)

Changes of Gate Capacitance vs. On Current

[Figure: gate capacitance (fF/µm) vs. ON current (µA/µm), BCB 3.0 vs. updated data; labeled points: GPNJ @ 0.4 V, GPNJ-Vg2 @ 0.3 V, and GPNJ-Vg3 @ 0.3 V.]

Page 46

Devices based on NDR

Input is taken at the rising edge of the clock.

The output stays the same as long as the clock is high, even if the input changes.

When the clock goes low, the gate loses its value.

One can think of each gate as a volatile latch with combined supply and clock.

A three-phase clock is used to process information and pass it to the next gates.

[Figure labels: Vclock, -Vclock, Vin, Vout]

Page 47

Clocking Schemes

Because the clock and supply are combined and the device is volatile, clocking poses a challenge.

When the clock goes low, the computed bits disappear.

To address this, BCB 3.0 assumed that bits must be recomputed until all bits are computed.

This means that the sum is computed 32 times, and only on the last pass are all bits accurate.

The effective dynamic energy is therefore 32x larger.

Page 48

Alternative Clocking (Clock and Hold)

Alternatively, we can hold the clock on to avoid switching energy at the cost of larger leakage energy.

This lowers the dynamic power dissipation by 32x but increases the leakage power several-fold. Only the last logic gates (XOR) are held to lock the output.
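The trade-off between the two clocking schemes reduces to a one-line energy comparison. The sketch assumes `e_dyn` is the dynamic energy of one evaluation of the adder and `e_leak` is the leakage energy under the recompute scheme; `leak_factor` stands in for the "several-fold" leakage increase and is a free parameter, not a value from the slides. All function names are hypothetical.

```python
def energy_recompute(e_dyn, e_leak, n_bits=32):
    """BCB 3.0 assumption: the volatile adder re-evaluates n_bits times."""
    return n_bits * e_dyn + e_leak


def energy_clock_and_hold(e_dyn, e_leak, leak_factor):
    """Clock and hold: evaluate once, but leakage grows by leak_factor."""
    return e_dyn + leak_factor * e_leak


def hold_is_better(e_dyn, e_leak, leak_factor, n_bits=32):
    """True when clock-and-hold uses less energy than repeated recomputation."""
    return energy_clock_and_hold(e_dyn, e_leak, leak_factor) < energy_recompute(
        e_dyn, e_leak, n_bits)
```

Rearranging the inequality, holding wins whenever (leak_factor - 1) · e_leak < (n_bits - 1) · e_dyn, so it favors devices whose dynamic energy dominates their leakage.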

Page 49

Devices from SWAN

[Figure: energy (fJ) vs. delay (ps) for a 32-bit ALU; CMOS HP and CMOS LV vs. ITFET and BisFET, with labeled variants BisFET-new clocking, ITFET-new clocking, ITFET with leakage, and BisFET with leakage, and BCB 3.0 results marked; groups: CMOS, TFETs, ferroelectric, others.]

Page 50

Updated Benchmarking Summary

Page 51

Outline

Spintronic Devices

Field Effect Devices

Dynamic Logic and Pipelining

Interconnect Implications

Summary

Page 52

Static vs. Dynamic Logic

Dynamic logic is used less in advanced technologies:

A large activity factor leads to excessive power dissipation.

A relatively large leakage current can mistakenly discharge the output node.

Very high clock frequencies are not practical.

[Figure: static CMOS vs. dynamic logic]

Page 53

Beyond-CMOS Devices for Dynamic Logic

Dynamic logic may become more attractive for some beyond-CMOS devices:

Slow ultra-low-power devices can be pipelined to improve their throughput without reaching the power cap.

Some devices have an inherent memory feature (they may not need to hold charge).

Some devices need to be clocked anyway (e.g., NDR and spintronic devices).

Steep-subthreshold devices have low leakage currents.

Slow devices do not reach the very high clock frequencies that are impractical.

Page 54

Pipelining with Dynamic Charge-Based Logic

To improve the throughput of low-power charge-based devices, dynamic logic circuits are used to enable deep pipelining.

A full adder using standard domino logic

[Figure: domino full adder with clock inputs (clk and its complement), inputs A, B, and C, and outputs Carry and Sum.]

Fang Lu and Henry Samueli, “A 200-MHz CMOS Pipelined Multiplier-Accumulator Using a Quasi-Domino Dynamic Full-Adder Cell Design,” IEEE JSSC, 1993.

Page 55

Pipelining with Dynamic Logic (Spintronic Devices)

Low-energy devices benefit from the dynamic logic circuit and achieve higher throughput density.

[Figure: power density (W/cm²) vs. throughput density (TIOPS/cm²) for a 32-bit adder; CMOS HP and CMOS LV vs. ASL, ASL-PMA, ASL-HA, ASL-HAs, CSL, CSL-CC, CSL-New, CSL-YIG, MEMTJ, MEMTJs, MEMTJ-Pres, mLogic, SWD, and CoMET.]

Dynamic logic pipelining: clock cycle = delay of a single full adder; 10 W/cm² power cap.

No deep pipelining: clock cycle = delay of the 32-bit adder; 10 W/cm² power cap.
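The two operating points can be compared with a simple first-order model: throughput density is one result per clock cycle per unit area, unless the resulting power density (energy per operation × throughput density) would exceed the 10 W/cm² cap, in which case throughput density is limited to P_cap/E_op. Deep pipelining shortens the cycle from the 32-bit adder delay to a single full-adder delay, so it helps until the cap binds, which happens later for low-energy devices. Latch and clock overheads and any area change from pipelining are ignored; this is an illustrative model with a hypothetical function name, not the benchmarking methodology.

```python
def throughput_density(energy_per_op, cycle_time, area_cm2, power_cap=10.0):
    """Throughput density (ops/s/cm^2) of a pipelined block under a power cap.

    energy_per_op in J, cycle_time in s, area_cm2 in cm^2, power_cap in W/cm^2.
    Assumes one result per clock cycle; if the resulting power density exceeds
    the cap, the clock is slowed until power density equals the cap.
    """
    uncapped = 1.0 / (cycle_time * area_cm2)       # ops / s / cm^2
    if energy_per_op * uncapped <= power_cap:       # power density in W/cm^2
        return uncapped
    return power_cap / energy_per_op
```

Evaluating it once with the full-adder delay and once with the 32-bit adder delay as `cycle_time` shows whether a given device is delay-limited or power-limited.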

Page 56

Pipelining with Dynamic Logic (Field Effect Devices)

Devices with low Vdd benefit from the dynamic logic circuit and achieve higher throughput density.

Dynamic logic pipelining: clock cycle = delay of a single full adder; 10 W/cm² power cap.

No deep pipelining: clock cycle = delay of the 32-bit adder; 10 W/cm² power cap.

Page 57

Outline

Spintronic Devices

Field Effect Devices

Dynamic Logic and Pipelining

Interconnect Implications

Summary

Page 58

Interconnects (Charge-Based Devices)

[Figure: energy (pJ) vs. delay (ps) for an inverter; CMOS HP, CMOS LV, HomJTFET, HetJTFET, ThinTFET, GaNTFET, TMDTFET, FEFET, NCFET, MITFET, ITFET, BisFET, GpnJ-Vg2, GpnJ-Vg3, and vdWFET-BP; groups: CMOS, TFETs, ferroelectric, others.]

Length = 100 μm, Optimal Repeaters, Activity Factor: 0.1

The extra ferroelectric switching time has a smaller impact on the overall interconnect delay.

EDP for an inverter ∝ C·tr²; EDP for an optimally repeated wire ∝ C·tr

Page 59

Interconnects (Spintronic)

• Interconnects using spintronic devices are slow in general.

• Voltage-controlled devices are more energy-efficient.

Length = 100 μm, Optimal Repeaters, Activity Factor: 0.1

Page 60

Span of Control (# Accessible Gates per Cycle)

• Field effect devices have roughly similar spans of control.

• The switching time of the FE layers increases the span of control.

• Spin current and spin waves decay rapidly.

Page 61

Transduction for Long Distance Communication

For long interconnects, it is much faster and more energy efficient to use transducers and communicate via electrical interconnects.

IEEE JxCDC, 2017. Funded by Intel.

Page 62

Noise Filtering

• Functional demonstration of a noise filtering application

• Thermal noise has been included in all simulations.

[Figure (a)-(f): magnetization components mz, mx, and my vs. time (0 to 4 ns).]