RF-Interconnect for Communications On-Chip


Page 1: RF-Interconnect for Communications On-Chip

RF-Interconnect for Communications On-Chip

Frank Chang1, Jason Cong2, Adam Kaplan2, Mishali Naik2, Glenn Reinman2, Eran Socher1, Rocco Tam1

1 Department of Electrical Engineering

2 Department of Computer Science

Page 2: RF-Interconnect for Communications On-Chip

Current Trend in CMP - NoC

• 65nm CMOS 80-tile NoC

• 10X8 2D mesh network-on-chip running @ 4GHz

• Bisection bandwidth 256GB/s

• 1 TFLOPS @ 1V, about 98W

ISSCC 2007: An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS (Sriram Vangal et al., Intel)

Page 3: RF-Interconnect for Communications On-Chip

What is The Challenge?

• Cores would keep shrinking in size but maintain the same operation frequency (2~4GHz) due to thermal constraints

• More cores would be integrated on the same chip to achieve performance boost through parallelism

• Performance would be limited by the communication efficiency between cores and memories on- and off-chip

Page 4: RF-Interconnect for Communications On-Chip

The Scaling Trend

• Scaling reduces delay of logic gates but not wires

[Figure: Transistor and Wire Delay Trend in CMOS. Delay [ps] (0 to 100) versus Technology Node (180nm, 130nm, 90nm, 65nm, 45nm, 32nm); series: FO4, 1mm RC global wire, Repeated 1mm RC global wire]

Page 5: RF-Interconnect for Communications On-Chip

Traditional Interconnect

• Units communicate through a parallel bus using voltage signaling (charging and discharging the wire capacitance)

• Latency is RC limited (~L²)

• Using CMOS repeaters reduces latency (~L) but does not benefit from scaling

• Supply no longer scales due to leakage

• Baseband-only signaling requires extensive equalization

• Waste of broad bandwidth available from modern CMOS devices (fT>150GHz, fmax>250GHz)

[Figure: Signal Power versus frequency; labels: Baseband Signal, Available Bandwidth, fT]
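The ~L² versus ~L scaling above can be sketched numerically. This is a minimal illustration, assuming the common 0.38·r·c·L² approximation for the 50% delay of a distributed RC line; the per-millimeter resistance, capacitance, and repeater delay below are hypothetical values chosen for illustration, not figures from the slides.

```python
# Hypothetical wire and repeater parameters (illustrative only, not from the slides).
R_PER_MM = 1000.0      # wire resistance, ohm/mm (assumed)
C_PER_MM = 0.2e-12     # wire capacitance, F/mm (assumed)
T_REPEATER = 20e-12    # delay of one repeater stage, s (assumed)


def unrepeated_delay(length_mm: float) -> float:
    """50% delay of a distributed RC line: 0.38 * r * c * L^2 (quadratic in L)."""
    return 0.38 * R_PER_MM * C_PER_MM * length_mm ** 2


def repeated_delay(length_mm: float, segment_mm: float = 1.0) -> float:
    """Delay with a repeater every segment_mm: grows linearly with length."""
    segments = max(1, round(length_mm / segment_mm))
    per_segment = unrepeated_delay(length_mm / segments) + T_REPEATER
    return segments * per_segment


if __name__ == "__main__":
    for length in (1, 5, 10, 20):
        print(f"{length:>2} mm  unrepeated {unrepeated_delay(length) * 1e12:8.1f} ps"
              f"  repeated {repeated_delay(length) * 1e12:8.1f} ps")
```

Doubling the wire length quadruples the unrepeated delay but only doubles the repeated delay, which is the distinction the slide draws.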

Page 6: RF-Interconnect for Communications On-Chip

Major Interconnect Issues

• Latency is large across chip

• Bandwidth is RC limited (~1Gbps/wire)

• Communication pattern is fixed

• Energy consumption is high and not scalable (~10pJ/bit)

• Future microprocessors may encounter communication congestion and most of the energy will be spent on “talking” instead of computing

Page 7: RF-Interconnect for Communications On-Chip

How Can RF Help?

• EM waves travel at the (effective) speed of light (~10ps/mm)

• Carrier frequencies can be modulated by modern CMOS with high data rates

• Transmission lines on- or off-chip can guide the waves (RF modulated data) from the transmitter to receiver with recoverable attenuation
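As a rough worked example of the ~10ps/mm figure: the sketch below converts a link length into time of flight and into clock cycles. The 20mm length and 4GHz clock are illustrative inputs picked from numbers that appear elsewhere in the deck; the function names are hypothetical.

```python
PS_PER_MM = 10.0  # effective propagation delay quoted on the slide (~10 ps/mm)


def flight_time_ps(length_mm: float) -> float:
    """Time of flight along a transmission line of the given length."""
    return PS_PER_MM * length_mm


def flight_time_cycles(length_mm: float, clock_ghz: float) -> float:
    """Same time of flight expressed in clock cycles (1 GHz -> 1000 ps period)."""
    return flight_time_ps(length_mm) * clock_ghz / 1000.0


if __name__ == "__main__":
    print(flight_time_ps(20.0), "ps across 20 mm")
    print(flight_time_cycles(20.0, 4.0), "cycles at 4 GHz")
```

Under these assumptions a 20mm crossing takes 200ps of flight time, which is less than one 4GHz clock period; modulation and detection circuitry add to this and are not modeled here.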

Page 8: RF-Interconnect for Communications On-Chip

RF-Interconnect Concept

[Figure: block diagram of one RF-I link: data in, mixers driven by carrier f0 at both ends of a Transmission Line, an Output Buffer, and an LPF producing data out; insets plot Signal Power versus frequency for the input data, the transmitted signal around f0, and the output data]

• Data travels through transmission lines at the speed of light, with less dispersion across the band and less baseband interference

• Data rate is limited only by CMOS mixer modulation speed

Page 9: RF-Interconnect for Communications On-Chip

RF-I using Multi-band FDMA

[Figure: four data streams (Data1 to Data4) are mixed onto carriers f1, f2, f3, f4, share one Transmission Line, and are recovered by mixers and LPFs at the receiver; insets plot Signal Power versus frequency for each stream and for the combined spectrum with bands at f1 f2 f3 f4]

• More bands are used with same modulation speed at each band

• Higher aggregate data rates can be achieved on the same transmission line
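The multi-band idea can be illustrated with a small pure-Python simulation. This is a sketch under stated assumptions, not the circuit from the slides: bits are mapped to ±1 levels, each carrier completes an integer number of cycles per bit (so the bands are orthogonal over one bit period), and the low-pass filter is replaced by a per-bit average. The constants and function names are hypothetical.

```python
import math

SAMPLES_PER_BIT = 64          # assumed simulation resolution
BAND_CYCLES = (4, 8, 12, 16)  # assumed carrier cycles per bit for f1..f4


def carrier(cycles: int, n: int) -> float:
    """Sample n of a unit-amplitude carrier with `cycles` periods per bit."""
    return math.cos(2 * math.pi * cycles * n / SAMPLES_PER_BIT)


def transmit(streams):
    """Mix each bit stream onto its own carrier and sum them on one shared line."""
    n_bits = len(streams[0])
    line = [0.0] * (n_bits * SAMPLES_PER_BIT)
    for cycles, bits in zip(BAND_CYCLES, streams):
        for n in range(len(line)):
            level = 1.0 if bits[n // SAMPLES_PER_BIT] else -1.0
            line[n] += level * carrier(cycles, n)
    return line


def receive(line, cycles):
    """Mix with one carrier, then sum over each bit period (a crude low-pass filter)."""
    bits = []
    for start in range(0, len(line), SAMPLES_PER_BIT):
        acc = sum(line[n] * carrier(cycles, n)
                  for n in range(start, start + SAMPLES_PER_BIT))
        bits.append(1 if acc > 0 else 0)
    return bits


if __name__ == "__main__":
    data = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0], [0, 1, 0, 1]]
    line = transmit(data)
    for cycles, sent in zip(BAND_CYCLES, data):
        print(f"band with {cycles} cycles/bit recovered:", receive(line, cycles) == sent)
```

Each band keeps the same per-band bit rate, so with four bands the shared line carries four times the data of a single band.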

Page 10: RF-Interconnect for Communications On-Chip

3.6Gbps Multi-drop Multiband Bi-directional RF-I *

[Figure: four FDMA chips (FDMAchip1 to FDMAchip4) exchange Data-B and Data-R over a 10 cm FR4 Interconnect; panels show input data patterns, recovered data waveforms, and recovered data eye diagrams (scale labels: 0.15ns/div, 4ns/div, 100mV/div)]

Data-B: 1.8Gb/s PRBS through baseband

Data-R: 1.8Gb/s PRBS through RF-band

* World’s 1st Multiband RF-I, Ko & Chang, 2005 ISSCC

Page 11: RF-Interconnect for Communications On-Chip

Can We Implement RF-I in CMOS?

• Today’s RF-CMOS circuits are in the wireless communication “sweet spots” of 500MHz-5GHz

– Insufficient bandwidth for RF-I to be effective!

• Millimeter-wave CMOS circuits have been developed for 60GHz and recently for 324GHz bands

Page 12: RF-Interconnect for Communications On-Chip

CMOS 324GHz Generator

-76dBm before calibration

-46dBm after calibration

*Huang, Larocca and Chang, “324GHz CMOS Frequency Generator using Linear Superposition Technique,” pp. 476-477, 2008 ISSCC

Page 13: RF-Interconnect for Communications On-Chip

Frequency Generation in Multiband RF-Interconnect

[Figure: a Multi-Band Synthesizer takes a 10GHz input and supplies six carriers to 6 TX and 6 RX mixers sharing one Transmission Line (Output Buffer, Mixer, LPF); the Signal Spectrum spans 10GHz to 60GHz and carries Data1 through Data6]

f1 = 10GHz

f2 = 20GHz

f3 = 30GHz

f4 = 40GHz

f5 = 50GHz

f6 = 60GHz

Page 14: RF-Interconnect for Communications On-Chip

Simultaneous Sub-harmonic Injection Locked mm-Wave Frequency Generation

• Sub-harmonic injection-locked VCOs lock simultaneously to a single reference frequency

• Advantages:

– Eliminates PLLs

– Low power consumption

– Small area

[Figure: block diagram with Master VCO, Non-linear Harmonic Generator, and Slave VCOs]

Page 15: RF-Interconnect for Communications On-Chip

Sub-harmonic Injection Locked VCO*

• LC-based VCO core

• Differential pair for odd harmonic generation

• Single-ended even harmonic generation

• Injection locking to high harmonic within locking range of the VCO

[Figure: VCO schematic; labels: M1, M2, M3, M4, Ibias,VCO, Ibias,inj, Vinj (+/-), f, 3f, Out+, Out-, VCC 1V, Out buffer]

This Work*

Process: 90nm CMOS

Free Running Frequency: 29.3 GHz

Max Locking Range: 5.6 GHz

Locking Harmonics: 2nd, 4th, 6th, 8th; 3rd, 5th, 7th

Power: 4 mW

*Sai-Wang Tam, M.-C. Frank Chang, et al., "Simultaneous Sub-harmonic Injection-Locked mm-Wave Frequency Generators for Multi-band Communications in CMOS", IEEE RFIC Symposium, 2008

Page 16: RF-Interconnect for Communications On-Chip

RF-I using Amplitude shift-Key (ASK) Modulation

• TX: A transformer couples the VCO output to the ASK modulator, and a simple modulator generates the ASK RF signal.

• RX: A self-mixer performs envelope detection. A simple buffer and a Schmitt trigger then restore the signal to rail-to-rail swing.
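The TX/RX chain above can be sketched in a few lines of pure Python. This is an idealized illustration, not the authors' circuit: it assumes on-off keying (carrier fully off for a 0), models the self-mixer as squaring followed by a per-bit average, and uses hypothetical hysteresis thresholds for the Schmitt trigger. The 12 carrier cycles per bit correspond to a 60GHz carrier with 5Gbit/s data.

```python
import math

SAMPLES_PER_BIT = 64   # assumed simulation resolution
CYCLES_PER_BIT = 12    # 60 GHz carrier / 5 Gbit/s data


def ask_modulate(bits):
    """On-off keying (assumed): carrier present for a 1, absent for a 0."""
    return [
        (1.0 if bits[n // SAMPLES_PER_BIT] else 0.0)
        * math.cos(2 * math.pi * CYCLES_PER_BIT * n / SAMPLES_PER_BIT)
        for n in range(len(bits) * SAMPLES_PER_BIT)
    ]


def self_mix_envelope(signal):
    """Square the signal (self-mixing) and average per bit to get the envelope."""
    return [
        sum(s * s for s in signal[i:i + SAMPLES_PER_BIT]) / SAMPLES_PER_BIT
        for i in range(0, len(signal), SAMPLES_PER_BIT)
    ]


def schmitt_trigger(levels, low=0.15, high=0.35):
    """Threshold with hysteresis; the two thresholds are assumed values."""
    state, out = 0, []
    for v in levels:
        if v > high:
            state = 1
        elif v < low:
            state = 0
        out.append(state)
    return out


if __name__ == "__main__":
    sent = [1, 0, 0, 1, 1, 0, 1]
    received = schmitt_trigger(self_mix_envelope(ask_modulate(sent)))
    print(received == sent)
```

The receiver never needs a copy of the carrier, since squaring removes the carrier phase; this is consistent with the deck's later point that ASK RF-I needs no PLL or clock data recovery.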

Page 17: RF-Interconnect for Communications On-Chip

Differential Transmission Line

• Loss of 0.6-1.6 dB/mm

Differential TML

Page 18: RF-Interconnect for Communications On-Chip

RF-I using Amplitude Shift-Key (ASK) Modulation

[Figure: waveform panels: VCO output (60GHz), 5Gbit/s data input, ASK modulated signal, mixer output]

Page 19: RF-Interconnect for Communications On-Chip

3DIC ASK RF-I Tested at 11Gbps*

[Figure: output eye diagram and output versus input waveforms (scale labels: 10ps/div, 50mV/div, 500ps/div); die photo showing the coupling capacitor, TX in Layer 2, and RX in Layer 1]

*Gu and Chang, pp. 448-449, 2007 ISSCC (0.33pJ/bit)

Page 20: RF-Interconnect for Communications On-Chip

Single Channel ASK RF-I Performance Summary

• Simple Architecture: One TX VCO, One Mixer, One RX Buffer

• No synchronization circuits such as PLL or clock data recovery needed in ASK RF-I

• Can expand the same architecture to multi-band RF-I

Process IBM 90nm CMOS Digital Process

RF-Carrier Freq. 60GHz

Data Rate 5Gbit/s

Power TX:2mW RX: 3mW

Energy per bit 1pJ/Bit

Active Area 1300 µm2

Page 21: RF-Interconnect for Communications On-Chip


Future Trends in Multi-band ASK RF-I

Technology | # of Carriers | Data rate per carrier (Gb/s) | Total data rate per wire (Gb/s) | Power (mW) | Energy per bit (pJ) | Area (TX+RX) (mm2) | Area/Gbit (µm2/Gbit)
90nm | 3RF + 1BB | 5 | 20 | 20 | 1.00 | 0.022 | 1100
65nm | 4RF + 1BB | 6 | 30 | 25 | 0.83 | 0.0238 | 800
45nm | 5RF + 1BB | 7 | 42 | 30 | 0.71 | 0.0228 | 540
32nm | 6RF + 1BB | 8 | 56 | 35 | 0.63 | 0.0211 | 380
22nm | 7RF + 1BB | 9 | 72 | 40 | 0.56 | 0.0193 | 260
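The total data rate and energy-per-bit columns follow from the other columns by simple arithmetic. The sketch below recomputes them from the carrier counts, per-carrier rates, and power figures in the table; the function names are hypothetical helpers.

```python
# Rows copied from the table: (node, RF carriers, baseband carriers,
# data rate per carrier in Gb/s, power in mW).
ROWS = [
    ("90nm", 3, 1, 5, 20),
    ("65nm", 4, 1, 6, 25),
    ("45nm", 5, 1, 7, 30),
    ("32nm", 6, 1, 8, 35),
    ("22nm", 7, 1, 9, 40),
]


def total_rate_gbps(rf: int, bb: int, per_carrier: float) -> float:
    """Aggregate rate per wire: every carrier (RF and baseband) runs at the same rate."""
    return (rf + bb) * per_carrier


def energy_per_bit_pj(power_mw: float, rate_gbps: float) -> float:
    """mW divided by Gb/s gives pJ/bit."""
    return power_mw / rate_gbps


if __name__ == "__main__":
    for node, rf, bb, rate, power in ROWS:
        total = total_rate_gbps(rf, bb, rate)
        print(f"{node}: {total} Gb/s, {energy_per_bit_pj(power, total):.3f} pJ/bit")
```

For example, at 90nm four carriers at 5Gb/s give 20Gb/s, and 20mW over 20Gb/s is 1pJ/bit.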

Page 22: RF-Interconnect for Communications On-Chip


Interconnect Topology Comparison

[Figure: three charts versus Technology Node (90nm, 65nm, 45nm, 32nm, 22nm), each comparing Bus, RF-I, and Optical-I: 2cm Interconnect Data Rate Density [Gbps/um] (axis 0 to 14), 2cm Interconnect Energy [pJ/bit] (axis 0 to 25), and 2cm Interconnect Latency [ps] (axis 0 to 1600)]

• Comparison across process technology of…

– Traditional RC parallel bus

– RF-Interconnect

– Optical Interconnect

• As process technology scales toward 22nm…

– RF-I has lowest latency

– RF-I consumes least energy

– RF-I has highest data rate density

• RF-I is fully compatible with modern CMOS technology

Page 23: RF-Interconnect for Communications On-Chip

Advantages of RF-Interconnects

• Latency

• Bandwidth

• Energy

• Reconfigurability

Page 24: RF-Interconnect for Communications On-Chip

Example: RF-I for CMP NoC Design

• 10x10 mesh of 5-cycle pipelined routers

– NoC runs at 2GHz

– XY/YX routing

• 64 4GHz 3-wide processor cores containing

– 8KB L1 Data Cache

– 8KB L1 Instruction Cache

• 32 L2 Cache Banks

– 256KB each

– Organized as shared NUCA cache

• 4 Main Memory Interfaces

– Labeled with + in the figure

[Figure: floorplan of the 10x10 mesh, showing each router with its attached processor core, L2 cache bank, or main memory interface; legend follows]

R (square) = router

C (circle) = processor core

$ (diamond) = L2 cache bank

+ (plus) = main memory interface

Page 25: RF-Interconnect for Communications On-Chip

MORFIC: Mesh Overlaid with RF-InterConnect

• Shared Z-shaped RF waveguide

• Organized as 8 bidirectional shortcut links

• Each direction of each shortcut can transmit simultaneously over shared medium

• Router A can send a flit to other router A, B to B, … H to H in a single cycle

• Router labeled X cannot directly send to any router not labeled X

– E.g. Router B in upper left cannot send to router E in upper right directly

– However, B in upper left can send to B in upper right, and then north to E using normal mesh link

[Figure: PHYSICAL ORGANIZATION and LOGICAL ORGANIZATION of the shortcut links on the mesh, with the RF-enabled routers labeled in pairs A through H]
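The effect of a shortcut on hop count can be sketched with a breadth-first search over the mesh. This is an illustrative model only: the coordinates of the two B routers and of router E are hypothetical placements, each router is assumed to have at most one shortcut, and a shortcut is counted as a single hop.

```python
from collections import deque

N = 10  # 10x10 mesh, as in the example NoC


def neighbors(node, shortcuts):
    """Yield mesh neighbors of a router, plus its shortcut partner if it has one."""
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < N and 0 <= ny < N:
            yield (nx, ny)
    if node in shortcuts:
        yield shortcuts[node]


def hops(src, dst, shortcuts=None):
    """Breadth-first search: minimum number of link traversals from src to dst."""
    shortcuts = shortcuts or {}
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in neighbors(node, shortcuts):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("destination unreachable")


# Hypothetical placement of the two "B" routers and of router "E".
B_LEFT, B_RIGHT, E = (1, 8), (8, 8), (8, 9)
SHORTCUTS = {B_LEFT: B_RIGHT, B_RIGHT: B_LEFT}

if __name__ == "__main__":
    print("mesh only:", hops(B_LEFT, E))
    print("with shortcut:", hops(B_LEFT, E, SHORTCUTS))
```

With these assumed positions the route from the left B router to E drops from 8 mesh hops to 2: one shortcut hop to the other B router, then one normal mesh link.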

Page 26: RF-Interconnect for Communications On-Chip

MORFIC Results For 256B Total RF-I [HPCA’2008]

• 256B RF-I consumes 0.18% silicon overhead on 400mm2 die

– RF-I components: 0.13%, Router overhead: 0.05%

• Normalized Splash-2 Execution Time and Average Packet Latency Results

– Normalized to baseline mesh run-cycles/latency at 1

– Average 13% (max 18%) performance improvement

– Average 22% (max 24%) packet latency improvement

[Figure: two bar charts for 256B RF-I across fft, radix, water-sp, water-n^2, lu, ocean, barnes: Normalized Avg Packet Latency (axis 0.74 to 0.82) and Normalized Run Cycles (axis 0.75 to 0.95)]

Page 27: RF-Interconnect for Communications On-Chip

The Bad News … Most Interconnect Optimization Techniques May Not be Relevant …

• Performance-driven interconnect design based on distributed RC delay model. Jason Cong, Kwok-Shing Leung, and Dian Zhou, Design Automation Conference, 1993. Cited by 141

• Interconnect design for deep submicron ICs. J Cong, L He, KY Khoo, CK Koh, Z Pan, Proc. Int. Conf. on Computer Aided Design, 1997. Cited by 139

• Efficient algorithms for the minimum shortest path Steiner arborescence problem with applications to … Jason Cong, Andrew B. Kahng, and Kwok-Shing Leung, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 17, No. 1, January 1998. Cited by 127

• Buffer block planning for interconnect-driven floorplanning. J Cong, T Kong, DZ Pan, Proc. Int. Conf. Computer-Aided Design, 1999. Cited by 130

… (from Google Scholar)

Page 28: RF-Interconnect for Communications On-Chip

Good News -- Plenty of New Problems for Future PhD Students

• How many/which routers should be RF-enabled?

– How many RF-I ports should each router have?

– Dedicated or multiplexed with other ports?

• How much RF-I bandwidth to allocate?

– Total? Per communicating pair?

– Impacts active layer area consumed by RF-I components

• Which routing strategy to employ in presence of RF-I express channels?

• Dynamic or static allocation of frequency bands to sources/destinations

– Dynamic: requires arbitration overhead for channel assignment

– Static: may miss opportunity to match changing communication demand

• Support of multi-cast

Page 29: RF-Interconnect for Communications On-Chip

Example: Deadlock: To Avoid or Confront?

• South-Last Strategy [Ogras and Marculescu, 2006]

– Routes that can lead to circular buffer dependence are forbidden, which avoids deadlock

• Deadlock Detection & Recovery (DDR)

– Based on Duato and Pinkston’s theory [Duato and Pinkston 2001]

– If deadlock occurs, route all packets in the network on a spare virtual channel

– Use deadlock-free XY-routing

– Packets entering network after this point may be routed normally
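The recovery path relies on XY (dimension-order) routing, which is simple enough to sketch. The `xy_route` function below is a standard dimension-order route on a mesh; the `recover` helper and the `"spare_vc"` tag are hypothetical illustrations of re-routing in-flight packets, not the authors' implementation, and they assume recovery uses only normal mesh links.

```python
def xy_route(src, dst):
    """Dimension-order route: travel fully in X first, then in Y."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def recover(in_flight_packets):
    """Hypothetical recovery step: re-route each in-flight packet from its
    current position with XY routing on the spare virtual channel."""
    return {
        pkt: ("spare_vc", xy_route(pos, dst))
        for pkt, (pos, dst) in in_flight_packets.items()
    }


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
    print(recover({"p0": ((3, 3), (1, 3))}))
```

Because every packet finishes all of its X hops before taking any Y hop, the Y-to-X turns needed to close a cycle of buffer dependences never occur, which is why XY routing on a mesh is deadlock-free.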

Page 30: RF-Interconnect for Communications On-Chip

Deadlock Results

– South-Last strategy too restrictive

• Halves the average realizable performance

– Deadlock is best detected and recovered from when it occurs

• Detection happens reasonably quickly

• Performance during recovery no worse than baseline

Page 31: RF-Interconnect for Communications On-Chip

Example: RF-I Topology and Bandwidth Optimization

• For each channel

– Source and destination may be reconfigured via frequency-band reassignment

• Can assign variable # of channels to each source, destination pair (s,d)

– critical channels given more bandwidth

• A flexible means to reconfigure topology

[Figure: one PHYSICAL waveguide layout mapped to two different logical topologies, LOGICAL A and LOGICAL B]
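One way to picture assigning a variable number of channels per (s,d) pair is a greedy, demand-proportional allocation. This is a hypothetical heuristic for illustration only; the slides do not specify an allocation algorithm, and the router names and demand figures below are made up.

```python
def allocate_bands(demand, n_bands):
    """Greedy allocation (hypothetical heuristic): hand out bands one at a time,
    each to the (source, destination) pair with the highest demand per band
    it would then hold."""
    alloc = {pair: 0 for pair in demand}
    for _ in range(n_bands):
        best = max(demand, key=lambda p: demand[p] / (alloc[p] + 1))
        alloc[best] += 1
    return alloc


if __name__ == "__main__":
    # Made-up traffic demand per (source, destination) pair, arbitrary units.
    demand = {("A", "B"): 70, ("C", "D"): 25, ("E", "F"): 5}
    print(allocate_bands(demand, 6))
```

Re-running the allocation with a different demand table corresponds to re-assigning frequency bands, which changes the logical topology without touching the physical waveguide.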

Page 32: RF-Interconnect for Communications On-Chip

Variance In Communication Patterns

[Figure: Mpeg2Enc time varying behavior. Event count (log scale, 1 to 1,000,000) versus interval (250k cycles); series: L2 ACCESS, NW INJECT, BW STALL, FLITS SENT]

[Figure: WaterSpatial time varying behavior. Event count (log scale, 1 to 10,000,000) versus interval (250k cycles); series: L2 ACCESS, NW INJECT, BW STALL, FLITS SENT]

Page 33: RF-Interconnect for Communications On-Chip

Conclusions

• RF-I on CMOS is real

• RF-I is a very promising solution to global interconnect bottleneck

– Latency

– Bandwidth

– Energy

– Reconfigurability

• RF-I introduces many interesting physical and architecture design problems in NoC designs