Ram MICRO Keynote rev3 - Ohio State University


Transcript of Ram MICRO Keynote rev3 - Ohio State University

Page 1: Ram MICRO Keynote rev3 - Ohio State University

High Performance Energy Efficient Near Threshold Circuits: Challenges and Opportunities

2012 MICRO Near Threshold Computing Workshop Keynote

December 2012

Ram K. Krishnamurthy
Senior Principal Engineer

Circuits Research Lab, Circuits & Systems Research, Intel Labs

Intel Corporation, Hillsboro, OR 97124, USA

Acknowledgements: Intel Circuits Research Lab, Vivek De, Rick Forand, Wen-Hann Wang, Shekhar Borkar, Greg Taylor, IPR Bangalore Design Lab, Stefan Rusu, Jim Held

Page 2: Ram MICRO Keynote rev3 - Ohio State University

Era of Tera-scale Computing: Teraflops of performance operating on Terabytes of data

[Figure: performance (KIPS, MIPS, GIPS, TIPS) vs. dataset size (Kilobytes, Megabytes, Gigabytes, Terabytes) across the single-core, multi-core and terascale eras. Workload labels: Text, Multi-Media, 3D & Video, Models. Applications: personal media creation and management; entertainment, learning and virtual travel; health; financial analytics; model-based apps (recognition, mining, synthesis).]

Page 3: Ram MICRO Keynote rev3 - Ohio State University

Tera-scale Platform Vision

[Figure: platform block diagram: cores with caches and last level cache slices on a scalable on-die interconnect fabric, together with special-purpose engines, integrated IO devices, integrated memory controllers to high bandwidth memory, off-die interconnect, and IO socket interconnect.]

Page 4: Ram MICRO Keynote rev3 - Ohio State University

Silicon Process Technology Innovation

[Figure: process technology roadmap with Hi-K and Tri-Gate milestones: 65nm (2005), 45nm (2007), 32nm (2009), 22nm (2011), 14nm (2013*), 10nm (2015*), 7nm (2017*), and beyond (2019+), grouped into manufacturing, development and research stages. *projected]

Process innovation leads to energy efficient performance and predictable 2-year technology cycles

Page 5: Ram MICRO Keynote rev3 - Ohio State University


22nm Performance and Energy Scaling

M. Bohr, Intel Developer Forum 2012

Page 6: Ram MICRO Keynote rev3 - Ohio State University

Silicon Integration Providing Greater End-User Value

• More transistors/area: enables substantial system-on-chip integration opportunities

Page 7: Ram MICRO Keynote rev3 - Ohio State University

Extreme Scale (Exa-Scale) Computing Research

2W – 100 GigaFLOPS; 20MW – ExaFLOPS

10-year goal: ~300X improvement in energy efficiency, equal to 20 pJ/FLOP at the system level

J. Rattner, ISCA 2012 Keynote
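The 20 pJ/FLOP figure follows directly from the two targets on this slide: energy per operation is power divided by throughput. A minimal Python check (the helper name `picojoules_per_flop` is ours, not from the talk):

```python
def picojoules_per_flop(power_watts: float, flops_per_second: float) -> float:
    """Energy per floating-point operation in pJ = power / throughput."""
    return power_watts / flops_per_second * 1e12


# Targets from the slide: 20 MW for 1 ExaFLOPS, 2 W for 100 GigaFLOPS.
exascale_pj = picojoules_per_flop(20e6, 1e18)
gigascale_pj = picojoules_per_flop(2.0, 100e9)

# The same efficiency expressed as GFLOPS/W: 1 / (energy per FLOP in joules).
gflops_per_watt = 1.0 / (exascale_pj * 1e-12) / 1e9

print(f"{exascale_pj:.1f} pJ/FLOP, {gigascale_pj:.1f} pJ/FLOP, {gflops_per_watt:.0f} GFLOPS/W")
```

Both targets work out to the same 20 pJ/FLOP, i.e. 50 GFLOPS/W; only the scale differs.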

Page 8: Ram MICRO Keynote rev3 - Ohio State University

Ultra Low Power Graphics/Video & Security Circuits

• DSP functions are highly throughput-oriented and amenable to parallelism/pipelining
⇒ Better power-performance optimization
⇒ Optimal partitioning of tasks between GP processor and dedicated engines

[Figure: energy efficiency (GOPS/W) of published designs (source: ISSCC), grouped as microprocessors, DSPs and dedicated HW. Data points include x86, P4, Itanium, Alpha, PPC and Sparc cores, Cell-SPE, SA, Hitachi, Fujitsu, KAIST and NEC DSPs, and MPEG2, encryption, MUD and 802.11a engines. Intel video motion estimation (ME), SIMD vector, SIMD permutation and AES encryption accelerators (Intel ISSCC, VLSI 2008-2012) are overlaid. 10x and 100x gaps are annotated along the flexibility vs. energy-efficiency trade-off: more flexible (microprocessors) to more efficient (dedicated HW).]

Dedicated HW: 10-100X higher performance/watt vs. GP cores

Page 9: Ram MICRO Keynote rev3 - Ohio State University

Specialized HW Accelerators for Exa-Scale

General purpose cores, special-purpose accelerators, interconnect fabric: efficient, adaptive, reconfigurable, resilient

Low-power general-purpose core

SP HW accelerators

Fixed function vs. limited programmability

Operation over wide supply voltage range (near-threshold to nominal)

Page 10: Ram MICRO Keynote rev3 - Ohio State University

NTV Operation & Energy Efficiency

[Figure, 65nm CMOS, 50°C, two panels vs. supply voltage (0.2-1.4V): (left) energy efficiency (GOPS/Watt) and active leakage power (mW), with a 9.6X energy efficiency gain peaking at 320mV and the subthreshold region marked; (right) maximum frequency (MHz) and total power (mW).]

Frequency reduces almost linearly first, then exponentially

Total power reduces by three to four orders of magnitude

Energy efficiency improves by one order of magnitude at NTV

Energy efficiency reduces in subthreshold operation

Leakage power reduces by two to three orders of magnitude

H. Kaul, R. Krishnamurthy et al, ISSCC 2008
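Why efficiency peaks near threshold rather than at the lowest voltage can be sketched with a toy model: dynamic energy per operation scales as C·V², while leakage energy per operation is leakage power times cycle time, and cycle time grows exponentially below threshold. Energy efficiency (operations per joule) is the reciprocal of energy per operation. The sketch assumes an EKV-style smooth drive-current model and entirely hypothetical parameters (`V_T`, `N_VTH`, `LEAK_RATIO`), swept over the slide's 0.2-1.4V axis range; it reproduces the trend only, not the measured 65nm data.

```python
import math

V_T = 0.35          # hypothetical threshold voltage (V)
N_VTH = 0.04        # hypothetical n*kT/q, sets the subthreshold slope (V)
LEAK_RATIO = 500.0  # hypothetical leaking width relative to switched capacitance


def on_current(v: float) -> float:
    """Normalized EKV-style drive current: ~exp((V - V_T)/N_VTH) below V_T, ~quadratic above."""
    return math.log1p(math.exp((v - V_T) / (2 * N_VTH))) ** 2


def energy_per_op(v: float) -> float:
    """Normalized energy per operation with switched capacitance C = 1.

    Dynamic energy is C*V^2. Leakage energy is leakage power times cycle time,
    where cycle time = C*V / I_on(V) and leakage current = LEAK_RATIO * I_on(0).
    """
    dynamic = v ** 2
    cycle_time = v / on_current(v)
    leakage_power = v * LEAK_RATIO * on_current(0.0)
    return dynamic + leakage_power * cycle_time


voltages = [round(0.20 + 0.01 * i, 2) for i in range(121)]  # 0.20 V to 1.40 V
v_opt = min(voltages, key=energy_per_op)
gain = energy_per_op(1.2) / energy_per_op(v_opt)
print(f"minimum-energy supply ~{v_opt:.2f} V, {gain:.1f}x lower energy/op than at 1.2 V")
```

With these made-up parameters the minimum-energy point lands slightly below `V_T`; raising `LEAK_RATIO` pushes it higher, toward nominal supply, which mirrors the slide's observation that efficiency falls again in deep subthreshold.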

Page 11: Ram MICRO Keynote rev3 - Ohio State University

NTV Across Technology Generations

[Figure, three panels of energy efficiency vs. supply voltage at 50°C, each marking the sub-threshold region: (a) 45nm CMOS: normalized energy efficiency of 32b multiply, 16b SIMD multiply and 72b add (1.1V nominal; Vhi/Vlo annotations 0.98, 0.87, 0.74, 0.59, 0.37, 0.15), 8X gain at 300mV; (b) 32nm CMOS reconfigurable fabric: energy efficiency (TOPS/W) and active leakage power (mW), 5.7x gain at 340mV (0.8mW annotated); (c) 22nm CMOS register file and permute crossbar: energy efficiency (GOPS/W) and leakage power (mW), 9x gain.]

H. Kaul et al., ISSCC 2009

A. Agarwal et al., ISSCC 2010

S. K. Hsu et al., ISSCC 2012

NTV operation improves energy efficiency across 45nm-22nm CMOS

Page 12: Ram MICRO Keynote rev3 - Ohio State University

NTV Opportunities for Wide Dynamic Range

T. Thakkar, Intel Developer Forum 2012

Page 13: Ram MICRO Keynote rev3 - Ohio State University

ATOM™ 32nm SOC V/F Islands

SoC integration of many unrelated functions in their own power ‘islands’.

[Figure: Medfield SoC block diagram with CPU, audio, south complex, security, 2D/3D graphics, image signal processor, display, video, clocks and PLLs, and I/O interfaces (GPIO, MIPI, DDR, eMMC, HDMI), each in its own power island.]

• On-die voltage regulation creates power ‘islands’ that can run at different voltage levels.

• Power management shuts off functional units.

• Voltage-frequency pairs: the CPU can run at several operating points, with its supply adjusted to reduce power while other functional blocks stay at constant voltage:

– lowest frequency: 100 - 600MHz

– medium frequency: 700 - 1500MHz

– burst frequency: 1600 – 2500MHz

• Off-chip drivers must support various voltage levels, whereas the controller logic is powered by a lower voltage:

– LPDDR: 1.25V

– MIPI-display: 1.25V

– HDMI-display: 3.3V

– SD cards: 2.85V

– GPIO: 1.25V, 1.80V

T. Thakkar, Medfield, Intel Developer Forum 2012
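A hypothetical governor policy over the three CPU frequency bands listed above: pick the lowest band that can meet a requested frequency. The policy, the function name `pick_operating_point`, and the handling of the gaps between bands are our assumptions for illustration, not Medfield's actual power-management firmware.

```python
from typing import List, Tuple

# CPU frequency bands from the slide, in MHz, ordered from lowest to highest.
BANDS: List[Tuple[str, int, int]] = [
    ("lowest", 100, 600),
    ("medium", 700, 1500),
    ("burst", 1600, 2500),
]


def pick_operating_point(demand_mhz: float) -> Tuple[str, float]:
    """Return (band, frequency) for the lowest band that can meet the demand.

    Hypothetical policy: requests below a band's floor run at that floor, and
    requests in the gap between bands (e.g. 650 MHz) move up to the next band.
    """
    for name, floor, ceiling in BANDS:
        if demand_mhz <= ceiling:
            return name, max(demand_mhz, floor)
    raise ValueError(f"{demand_mhz} MHz exceeds the burst ceiling")


for demand in (50, 300, 650, 2000):
    print(demand, pick_operating_point(demand))
```

Preferring the lowest adequate band keeps the CPU island at the lowest supply that meets demand, which is where the power saving of separate voltage islands comes from.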

Page 14: Ram MICRO Keynote rev3 - Ohio State University

NTV Opportunities for Converged Core


T. Piazza, Intel Developer Forum 2012

Page 15: Ram MICRO Keynote rev3 - Ohio State University

Impact of Variation on NTV

[Figure, two panels: (left) relative frequency vs. relative Vdd, showing the frequency spread (0-60%) caused by +/- 5% variation in Vdd or Vt widening as Vdd approaches threshold; (right) circuit vulnerability to 5% noise rising as Vdd scales from 1.0 toward threshold.]

5% variation in Vt or Vdd results in up to 50% variation in circuit performance

Alpha-power law, with velocity-saturation index α:

$$f \propto \frac{(V_{dd} - V_t)^{\alpha}}{V_{dd}}$$
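The alpha-power law explains the growing spread: as Vdd approaches Vt, the same Vt shift is a larger fraction of the gate overdrive (Vdd − Vt). The sketch uses hypothetical values (α = 1.3, Vt = 0.3 of nominal Vdd) and, like the law itself, is only meaningful above threshold; it shows the trend rather than the slide's measured 50% figure.

```python
ALPHA = 1.3  # hypothetical velocity-saturation index
VT = 0.30    # hypothetical threshold voltage, relative to nominal Vdd = 1.0


def rel_frequency(vdd: float, vt: float = VT, alpha: float = ALPHA) -> float:
    """Alpha-power-law frequency (up to a constant): (Vdd - Vt)^alpha / Vdd."""
    if vdd <= vt:
        raise ValueError("the alpha-power law only applies above threshold")
    return (vdd - vt) ** alpha / vdd


def slowdown_from_vt_shift(vdd: float, shift: float = 0.05) -> float:
    """Fractional frequency loss when Vt rises by `shift` (e.g. 5%) at a given Vdd."""
    nominal = rel_frequency(vdd)
    shifted = rel_frequency(vdd, vt=VT * (1 + shift))
    return 1.0 - shifted / nominal


for vdd in (1.0, 0.6, 0.4, 0.35):
    print(f"Vdd = {vdd:.2f}: {100 * slowdown_from_vt_shift(vdd):.0f}% slower")
```

With these values a 5% Vt increase costs roughly 3% in frequency at nominal Vdd but more than 30% at Vdd = 0.35.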

Page 16: Ram MICRO Keynote rev3 - Ohio State University

Variation Modeling & Measurements

[Figure, two panels: (left) 65nm CMOS typical die measurements: maximum frequency (MHz) vs. supply voltage, with frequency variation across 0-110°C of ±5% at nominal supply and ±2X at 320mV; (right) Monte-Carlo simulations, 65nm CMOS, 50°C: normalized frequency distribution across fast-slow dies, ±18% at 1.2V and ±2X at 320mV.]

Monte-Carlo simulations: 18% frequency spread at nominal supply, 2X spread at NTV

65nm CMOS measurements: 5% spread due to temperature at nominal supply, 2X spread at NTV

H. Kaul, R. Krishnamurthy et al, ISSCC 2008

Page 17: Ram MICRO Keynote rev3 - Ohio State University

Using Vdd to Compensate for Variation

• Adaptive Voltage Compensation for variation tolerance

• Adjust supply voltage to maintain constant performance

• ±50mV adjustment about 320mV:

⇒ Nominal 23MHz performance sustained across 0-110°C and fast-slow skews

[Figure, 65nm CMOS, 320mV typical die: frequency (MHz) vs. temperature (0, 50, 110°C) and vs. process skew (slow, typical, fast at 50°C), with the 23MHz target held by voltage compensation.]
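The compensation loop can be sketched as a search for the lowest supply, within the ±50mV window around 320mV, that still meets the 23MHz target. The per-die frequency model below (exponential in Vdd, one e-fold per 60mV, slow and fast dies at 0.5X and 2X of typical) is a hypothetical stand-in for the measured curves, chosen only to echo the ±2X spread at NTV; `compensate_vdd` and `die_model` are illustrative names.

```python
import math
from typing import Callable


def compensate_vdd(freq_at: Callable[[float], float], target_mhz: float,
                   v_nominal: float = 0.320, window: float = 0.050,
                   tol: float = 1e-4) -> float:
    """Lowest Vdd within v_nominal +/- window that meets target_mhz.

    Bisection; assumes freq_at(v) increases monotonically with v.
    """
    lo, hi = v_nominal - window, v_nominal + window
    if freq_at(hi) < target_mhz:
        raise ValueError("target frequency not reachable within the window")
    if freq_at(lo) >= target_mhz:
        return lo
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if freq_at(mid) >= target_mhz:
            hi = mid
        else:
            lo = mid
    return hi


def die_model(speed: float) -> Callable[[float], float]:
    """Hypothetical near-threshold die: 23 MHz * speed at 320 mV, one e-fold per 60 mV."""
    return lambda v: 23.0 * speed * math.exp((v - 0.320) / 0.060)


for label, speed in (("slow", 0.5), ("typical", 1.0), ("fast", 2.0)):
    v = compensate_vdd(die_model(speed), target_mhz=23.0)
    print(f"{label}: {v * 1000:.0f} mV")
```

Under these assumptions the slow die needs about +42mV and the fast die about -42mV, both inside the ±50mV budget.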

Page 18: Ram MICRO Keynote rev3 - Ohio State University

Subthreshold Leakage at NTV

[Figure: source-drain (SD) leakage power as a fraction of total power (0-60%) vs. technology node (45nm to 5nm) at 100%, 75%, 50% and 40% of Vdd, annotated with increasing variations.]

NTV operation reduces total power, improves energy efficiency

Subthreshold leakage power is a substantial portion of the total

Page 19: Ram MICRO Keynote rev3 - Ohio State University

Low Voltage SRAM and Register File

6T SRAM suffers from poor stability and yield at NTV

6T SRAM cell with larger transistors

8T/10T SRAM for improved stability and yield

[Figure: register file write cells with write bitlines wrbl/wrbl# and read bitline rdbl: conventional dual-ended (DE) write cell (write failure due to strong P and weak N) vs. dual-ended transmission gate (DETG) write cell.]

Variation tolerant register file for NTV

S. Hsu, R. Krishnamurthy et al, ISSCC 2012

Page 20: Ram MICRO Keynote rev3 - Ohio State University

Low Voltage Latches and Flip-flops

[Figure, two panels: (left) designing flip-flops for NTV: flip-flop schematic (D, Q, complementary clocks Ck) with upsized and non-minimum channel length devices annotated; (right) averaging with vector flip-flops, using shared min-sized clock drivers.]

Vmin improves by 175 mV

Hold time margin improves by 7 to 30%

Page 21: Ram MICRO Keynote rev3 - Ohio State University

Low Voltage Logic: Multiplexers & Gates

[Figure: designing multiplexers for NTV: one-hot 4:1 vs. encoded 4:1 multiplexer, with example "1"/"0" select and data values.]

Issue with transmission gates and logic gates at NTV: large off-current paths, weak on-current paths, body effect

Avoid series-connected transmission gates

Up to 3X reduction in worst-case static droop

Logic fan-in limited to a stack of 3

Page 22: Ram MICRO Keynote rev3 - Ohio State University

Low Voltage Level Converters

[Figure: level converter between a low-voltage circuit block (VCCLOW) and a high-voltage circuit block (VCCHIGH); the two-stage design adds an intermediate VCCMID CVSL stage.]

CVSL level converter: significant energy consumed in contention currents

Two-stage cascaded split-output level shifter: CVSL split into two stages to reduce contention current; decoupled output allows a smaller CVSL; 20% energy reduction

Ultra-low voltage split-output level shifter: output decoupled from CVSL; interrupts contention devices; Vmin improved by 125 mV

H. Kaul, R. Krishnamurthy et al, ISSCC 2009

Page 23: Ram MICRO Keynote rev3 - Ohio State University

Soft Errors and Reliability

[Figure, two panels: (left) impact of NTV on soft error rate: normalized sea-level SER per cell vs. voltage (0.5-2V) for 250nm, 180nm, 130nm, 90nm and 65nm; (right) SER relative to 130nm for memory and latches vs. technology (180nm to 32nm), assuming a 2X bit/latch count increase per generation.]

Soft error/bit reduces each generation

Soft error at the system level will continue to increase

Positive impact of NTV on reliability

Low V → lower E fields, low power → lower temperature

Device aging effects mitigated

Lower electro-migration related defects

Page 24: Ram MICRO Keynote rev3 - Ohio State University

NTV SIMD Permutation Engine

• SIMD permutation operations are key for maximizing vector datapath utilization in multimedia, graphics, and signal processing workloads

• The SIMD vector permutation engine performs a 2-dimensional shuffle: a register file for vertical shuffle and a permutation crossbar for horizontal shuffle

[Figure: 256b permutation engine organization: a 32x256b 3R1W register file built from two 32x128b banks of sixteen 32x8b 3R1W byte slices, each slice with its own 5:32 read address decoder (vertical shuffle), feeding a 256b permute crossbar of 32 x 32:1 byte multiplexers (horizontal shuffle) with 2:1 mode selection.]

[Figure: 22nm CMOS chip micrograph]

Process: 22nm CMOS
Nominal Vcc: 0.9V
Permute Xbar: 256b byte-wise any-to-any
Register File: 32x256b 3R/1W
Die Area: 0.048mm²
Pad Count: 30
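A behavioral sketch of the 2-dimensional shuffle, assuming each byte lane of the register file has its own read address (vertical shuffle) and the crossbar lets every output byte select any of the 32 input bytes (horizontal shuffle). Function names are hypothetical, and the model ignores the 3R1W port structure and the 2:1 mode muxes.

```python
from typing import List, Sequence

ENTRIES = 32  # register file entries
LANES = 32    # 256b datapath = 32 byte lanes


def vertical_shuffle(regfile: Sequence[Sequence[int]], row_sel: Sequence[int]) -> List[int]:
    """Byte lane j reads register-file entry row_sel[j] (independent per-lane read address)."""
    assert len(row_sel) == LANES and all(0 <= r < ENTRIES for r in row_sel)
    return [regfile[row_sel[j]][j] for j in range(LANES)]


def horizontal_shuffle(data: Sequence[int], perm: Sequence[int]) -> List[int]:
    """Byte-wise any-to-any permute: output byte i = input byte perm[i] (one 32:1 mux per output)."""
    assert len(data) == LANES and len(perm) == LANES
    assert all(0 <= p < LANES for p in perm)
    return [data[p] for p in perm]


def permute_2d(regfile: Sequence[Sequence[int]], row_sel: Sequence[int],
               perm: Sequence[int]) -> List[int]:
    """Vertical shuffle in the register file, then horizontal shuffle in the crossbar."""
    return horizontal_shuffle(vertical_shuffle(regfile, row_sel), perm)


# Example: entry r holds bytes (r * 32 + j) mod 256.
regfile = [[(r * LANES + j) % 256 for j in range(LANES)] for r in range(ENTRIES)]
row_sel = [j % 4 for j in range(LANES)]  # lanes rotate among entries 0-3
perm = list(reversed(range(LANES)))      # byte reversal
print(permute_2d(regfile, row_sel, perm)[:4])
```

Splitting the work this way means the crossbar only ever sees one 256b word, while the register file's per-lane addressing supplies bytes from different entries.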

Page 25: Ram MICRO Keynote rev3 - Ohio State University

Logic and Memory VMIN Circuit Optimizations

• Register file and logic VMIN circuit optimizations enable NTV operation

• Register file VMIN techniques: (i) clock-less static CMOS reads, (ii) dual-ended transmission gate (DETG) writes with shared P/N

• Logic VMIN techniques: (i) vector flip-flops, (ii) stacked min-delay buffers, (iii) shared gates to average min-sized transistor variation, (iv) ultra low voltage split-output level shifters

[Figure: 32x256b 3R/1W register file circuits (write bitlines WrBl/WrBl#, read local bitline RdLBL, flip-flops, level shifter LS) and 256b permute crossbar circuits (32:1 MUX with select sel#, output flip-flop).]

Page 26: Ram MICRO Keynote rev3 - Ohio State University

22nm Simulations/Measurements

• VMIN improvement: register file VMIN by 250mV and logic VMIN by 150mV

• Operation from nominal (1.8GHz, 0.9V, 50°C) down to sub-threshold (16.8MHz, 280mV)

• Wide supply voltage scalability across 280mV - 1.1V, increasing energy efficiency by 9x: industry’s first Tri-Gate NTV logic + memory circuits

[Figure, four panels, 22nm CMOS, 50°C: register file and permutation engine VMIN simulations; maximum frequency and total power measurements vs. supply voltage; energy efficiency (GOPS/W) and active leakage power measurements vs. supply voltage for the register file and permute crossbar, showing a 9x energy efficiency gain with peak-efficiency annotations of 585GOPS/W and 154GOPS/W and the sub-threshold region marked. 22nm VMIN simulations performed at 0°C-85°C with 3σ systematic and 6σ random variation.]

S. Hsu, R. Krishnamurthy et al, ISSCC 2012 & JSSC January 2013

Page 27: Ram MICRO Keynote rev3 - Ohio State University

NTV Variable Precision FPU

H. Kaul, R. Krishnamurthy et al, ISSCC 2012

Page 28: Ram MICRO Keynote rev3 - Ohio State University

Experimental NTV Processor

Technology: 32nm High-K Metal Gate
Interconnect: 1 Poly, 9 Metal (Cu)
Transistors: 6 Million (Core)
Core Area: 2mm²

[Figure: 1.1mm x 1.8mm die micrograph showing IA-32 core logic, scan, ROM, L1 instruction and data caches (L1$-I, L1$-D), and level shifters + clock spine; the die is mounted on a custom interposer in a 951-pin FCBGA package on a legacy Socket-7 motherboard.]

S. Jain et al., “A 280mV-to-1.2V Wide-Operating-Range IA-32 Processor in 32nm CMOS”, ISSCC 2012

Page 29: Ram MICRO Keynote rev3 - Ohio State University

NTV Design Methodology

[Figure, two panels: (left) nominal versus high Vt devices: normalized delay slowdown due to 6σ random variations vs. logic Vcc (0.2-1.1V), with a 76% annotation; (right) minimum/small sized devices: slowdown vs. device width (1X-5X) at 0.5V, with a 130% annotation.]

Page 30: Ram MICRO Keynote rev3 - Ohio State University

Power & Performance

[Figure, 32nm CMOS, 25°C: total power (mW, log scale) and frequency (MHz) vs. logic Vcc / memory Vcc across the subthreshold, NTV and super-threshold regions; memory Vcc is held at 0.55V for logic Vcc of 0.5V and below. Annotated operating points: 915MHz/737mW, 500MHz/174mW, 100MHz/17mW, 3MHz/2mW. Pie charts break total power into logic dynamic, logic leakage and memory leakage for each region.]

Page 31: Ram MICRO Keynote rev3 - Ohio State University

Wide Dynamic Range

[Figure: energy efficiency vs. supply voltage (zero to max), from low at both extremes to high in the NTV region; ~5x improvement demonstrated.]

Ultra-low Power (subthreshold): 280 mV, 3 MHz, 2 mW, 1500 MIPS/W

Energy Efficient (NTV): 0.45 V, 60 MHz, 10 mW, 5830 MIPS/W

High Performance (normal operating range): 1.2 V, 915 MHz, 737 mW, 1240 MIPS/W
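The MIPS/W entries can be cross-checked from frequency and power under the simplifying assumption (ours) of one instruction per cycle, so that MIPS equals MHz. This reproduces the 1500 and 1240 MIPS/W entries; the NTV point gives 6000 rather than 5830 MIPS/W, presumably because the 10 mW figure is rounded. The ratio 5830 / 1240 ≈ 4.7 is the "~5x" on the slide.

```python
# Operating points from the table above: (label, Vdd in V, frequency in MHz, power in mW).
OPERATING_POINTS = [
    ("ultra-low power", 0.28, 3.0, 2.0),
    ("energy efficient (NTV)", 0.45, 60.0, 10.0),
    ("high performance", 1.2, 915.0, 737.0),
]


def mips_per_watt(freq_mhz: float, power_mw: float, ipc: float = 1.0) -> float:
    """MIPS/W = (frequency in MHz * instructions per cycle) / (power in W)."""
    return freq_mhz * ipc / (power_mw / 1000.0)


for label, vdd, freq, power in OPERATING_POINTS:
    print(f"{label:24s} {vdd:.2f} V  {mips_per_watt(freq, power):7.0f} MIPS/W")
```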

Page 32: Ram MICRO Keynote rev3 - Ohio State University

NTV Parallelism: On-chip Interconnect

[Figure: relative compute energy and interconnect energy vs. technology node (45nm to 7nm), with reductions of 6X for compute and 1.6X for interconnect annotated.]

Compute energy reduces faster than global interconnect energy

For constant throughput, NTV demands more parallelism

This increases data movement at the system level

System-level optimization is required to determine the NTV operating point

[Figure: energy vs. supply voltage for compute and global interconnect.]
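A back-of-the-envelope version of the constant-throughput argument, borrowing the 32nm IA-32 operating points from the Wide Dynamic Range slide (915MHz/737mW nominal, 60MHz/10mW at NTV) purely for illustration. It assumes perfect parallel scaling and a hypothetical per-core interconnect power (`interconnect_mw_per_core`); as that data-movement cost grows, the NTV advantage shrinks, which is why the operating point has to be chosen at the system level.

```python
import math


def cores_for_throughput(target_mhz: float, core_mhz: float) -> int:
    """Cores needed so that cores * core_mhz >= target_mhz (assumes perfect parallel scaling)."""
    return math.ceil(target_mhz / core_mhz)


def parallel_power_mw(target_mhz: float, core_mhz: float, core_mw: float,
                      interconnect_mw_per_core: float = 0.0) -> float:
    """Total power of the NTV array; interconnect_mw_per_core is a hypothetical data-movement cost."""
    n = cores_for_throughput(target_mhz, core_mhz)
    return n * (core_mw + interconnect_mw_per_core)


# Match one 915 MHz / 737 mW core with NTV cores at 60 MHz / 10 mW each.
n = cores_for_throughput(915, 60)
print(n, parallel_power_mw(915, 60, 10))                               # compute only
print(n, parallel_power_mw(915, 60, 10, interconnect_mw_per_core=20))  # with data movement
```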

Page 33: Ram MICRO Keynote rev3 - Ohio State University


Circuit-Switched On-Chip Interconnects

M. Anders, R. Krishnamurthy et al, ISSCC 2010

Page 34: Ram MICRO Keynote rev3 - Ohio State University

Reconfigurable Fabric Array (RFA)

Reconfigurable fabric tightly coupled to processor pipeline

Reconfigurable fabric directly interfaced with memory

[Figure: fine-grain reconfigurable fabrics: block diagrams of cores (Core0, Core1), memory and reconfigurable fabric arrays (RFA) for the two organizations.]

Page 35: Ram MICRO Keynote rev3 - Ohio State University

[Figure: hybrid CLB: four 3-input LUTs with 4- and 5-input LUT merge, three 4b adders with carry outputs Cout0-Cout2, and an output MUX driving OutX, OutY, OutZ (4b each) and OutC (3b).]

32nm Hybrid CLB Design

● Optimized for arithmetic with support for random logic

● Four 3-input Look-Up Tables (LUTs)

● Three 4b adders

● 27 inputs, 15 outputs, 43 configuration bits

A. Agarwal, R. Krishnamurthy et al, ISSCC 2010
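The slide does not say how the 4- and 5-input LUT merge is wired; a common way to build a larger LUT from smaller ones is Shannon expansion, using the extra input to select between two LUT outputs with a 2:1 mux. The sketch assumes that scheme and an 8-bit truth-table configuration per 3-input LUT, with input `a` as the least significant index bit; `lut3` and `lut4` are hypothetical names.

```python
def lut3(config: int, a: int, b: int, c: int) -> int:
    """3-input LUT: bit (c<<2 | b<<1 | a) of the 8-bit config word is the output."""
    return (config >> (a | (b << 1) | (c << 2))) & 1


def lut4(config_lo: int, config_hi: int, a: int, b: int, c: int, d: int) -> int:
    """4-input LUT from two 3-input LUTs and a 2:1 mux on d (Shannon expansion)."""
    return lut3(config_hi, a, b, c) if d else lut3(config_lo, a, b, c)


XOR3 = 0x96   # odd-parity truth table: outputs 1 at indices 1, 2, 4, 7
XNOR3 = 0x69  # complement of XOR3

# 4-input parity: d = 0 selects XOR3(a, b, c), d = 1 selects its complement.
for d in (0, 1):
    print(d, [lut4(XOR3, XNOR3, i & 1, (i >> 1) & 1, (i >> 2) & 1, d) for i in range(8)])
```

A 5-input function follows the same pattern with two 4-input halves and one more mux level.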

Page 36: Ram MICRO Keynote rev3 - Ohio State University

32nm High-K/Metal-Gate CMOS Die Micrograph

Process: 32nm High-K Metal-Gate CMOS
Nominal Supply: 1.0V
Interconnect: 9 metal Cu
Number of CLBs: 6
Register File Array: 64-entry x 32b
Die Area: 0.076mm²
Number of Transistors: 110K
Pad Count: 30

[Figure: die micrograph with clock, I/O, control and reconfigurable fabric regions labeled.]

Page 37: Ram MICRO Keynote rev3 - Ohio State University

255mV (Sub-threshold) operation in 32nm CMOS technology

Industry’s first ultra-low-voltage reconfigurable accelerator

A. Agarwal, R. Krishnamurthy et al, ISSCC 2010

Page 38: Ram MICRO Keynote rev3 - Ohio State University

1.5mm³ Intraocular Pressure Monitor

● Continuous IOP monitoring

● Wireless communication

● Energy-autonomy

● Device components

● Solar cell

● Wireless transceiver

● Cap to digital converter

● Processor and memory

● Power delivery

● Thin-film Li battery

● MEMS capacitive sensor

● Biocompatible housing

Courtesy: Gregory Chen, U. Michigan

Page 39: Ram MICRO Keynote rev3 - Ohio State University

Summary

• Moore’s Law has fueled the worldwide technology revolution for over 40 years and will continue for at least another decade

– 0.7x transistor dimension scaling every two years

– Hi-K MG & Tri-Gate devices: significant energy-efficiency benefits

• Key challenges for Sub-22nm 1-100TOPS/Watt SOC platforms

– Special-purpose accelerators for graphics/video/media DSP

– Ultra-low-voltage/NTV operation with wide dynamic voltage range

– On-die reconfigurable logic fabrics/accelerators for flexible SOCs

• Energy-efficient SOC graphics/media processing:

– Reconfigurable SIMD vector permutation processor in 22nm

– NTV processor with wide dynamic range in 32nm CMOS

– Fine-grain reconfigurable logic array fabric in 32nm CMOS

• Ultra-low voltage (NTV) circuit design challenges & opportunities

– 5-10X higher energy efficiency (GOPS/W) vs. nominal supply operation
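The 0.7x dimension scaling per generation translates into roughly 2x transistor density per node, since area scales with the square of linear dimensions. A minimal sketch, assuming ideal scaling of every dimension (`density_gain` is an illustrative name):

```python
def density_gain(generations: int, linear_scale: float = 0.7) -> float:
    """Transistor density gain after `generations` nodes, each shrinking dimensions by linear_scale."""
    area_scale = linear_scale ** 2  # 0.7x linear -> ~0.49x area per transistor
    return (1.0 / area_scale) ** generations


# One node (~2 years) and five nodes (~10 years, "at least another decade").
print(f"{density_gain(1):.2f}x per node, {density_gain(5):.1f}x over five nodes")
```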

Page 40: Ram MICRO Keynote rev3 - Ohio State University

Legal Disclaimer

This presentation contains the general insights and opinions of Intel Corporation (Intel).

• This presentation is provided for informational purposes only and is not to be relied upon for any other purpose. Intel makes no representations or warranties regarding the accuracy or completeness of the information in this presentation. Intel accepts no duty to update this presentation based on more current information. Intel is not liable for any damages, direct or indirect, consequential or otherwise, that may arise, directly or indirectly, from the use or misuse of the information in this presentation. Intel retains all rights to the Presentation, including any patent, trademark, trade secret, copyright, trade dress, mask works, or any other intellectual property rights. The provision of the Presentation does not constitute the grant or license of any such rights by Intel. Provision of the Presentation shall not be construed to constitute advice or consultation. Intel does not provide, and is not providing, any technical, legal, regulatory or compliance advice. Nor does Intel make any representation or warranty with respect to the effectiveness of any information contained herein.

• Intel may make changes to specifications and product descriptions at any time, without notice.

• Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit Intel Performance Benchmark Limitations. Intel does not control or audit the design or implementation of third party benchmarks or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmarks are reported and confirm whether the referenced benchmarks are accurate and reflect performance of systems available for purchase.

• Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See www.intel.com/products/processor_number for details. Intel, processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.

• Intel Virtualization Technology requires a computer system with a processor, chipset, BIOS, virtual machine monitor (VMM) and applications enabled for virtualization technology. Functionality, performance or other virtualization technology benefits will vary depending on hardware and software configurations. Virtualization technology-enabled BIOS and VMM applications are currently in development.

• Intel® Turbo Boost Technology requires a Platform with a processor with Intel Turbo Boost Technology capability. Intel Turbo Boost Technology performance varies depending on hardware, software and overall system configuration. Check with your platform manufacturer on whether your system delivers Intel Turbo Boost Technology. For more information, see http://www.intel.com/technology/turboboost.

• No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) is a security technology under development by Intel and requires for operation a computer system with Intel® Virtualization Technology, an Intel® Trusted Execution Technology-enabled Intel processor, chipset, BIOS, Authenticated Code Modules, and an Intel or other Intel® Trusted Execution Technology compatible measured virtual machine monitor. In addition, Intel® Trusted Execution Technology requires the system to contain a TPMv1.2 as defined by the Trusted Computing Group and specific software for some uses. 64-bit computing on Intel architecture requires a computer system with a processor, chipset, BIOS, operating system, device drivers and applications enabled for Intel® 64 architecture. Performance will vary depending on your hardware and software configurations. Consult with your system vendor for more information.

• Intel, Intel Xeon, Intel Core microarchitecture, and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. © 2008 Standard Performance Evaluation Corporation (SPEC) logo is reprinted with permission.

• * Other names and brands may be claimed as the property of others.