Microsoft PowerPoint - PhD-defense


Page 1

Analysis and Implementation of Optoelectronic Network Routers

Ph.D. Defense by

Mongkol Raksapatcharawong

SMART Interconnects Group, Electrical Engineering - Systems Department

University of Southern California - LA

http://www.usc.edu/dept/ceng/pinkston/SMART.html

Date: September 25, 1998

Time: 12:00pm, EEB-108

Page 2

The Big Picture

The Problem: Network bandwidth is becoming a bottleneck. Interconnection networks must deliver sufficient bandwidth to keep pace with microprocessors.

The Unknowns: Performance issues, design issues, and technology issues.

Potential Solution: Optoelectronic network routers.

Optoelectronic technology increases physical bandwidth.

Advanced router architectures improve bandwidth utilization.

Page 3

Outline

■ Background and Motivation

■ Research Issues and Approach

■ Modeling Free-Space Optical k-ary n-cube Wormhole Networks

■ Design Issues of Optoelectronic Network Routers

■ Implementing Optoelectronic Network Routers

■ Conclusions and Future Work

Page 4

Problem

☛ Starvation for off-chip bandwidth.

☛ On-chip clock rates are doubling compared to off-chip clock rates.

☛ Processor-memory bandwidth is doubling.

☛ Possible solution: integrate processor and memory onto one chip (IRAM, Patterson 1995).

☛ Problem: this shifts the bandwidth problem to the network in multiprocessor systems.

[Figure: clock rate (MHz, log scale, 1 to 10000) vs. year (1980 to 2010). Series: Intel processor, Intel bus, SIA processor, SIA bus. Labeled processors: 8088, 80286, 80386, 80486, Pentium, Pentium Pro, Pentium II, Merced (IA-64).]

Page 5

Problem (cont'd)

[Figure: sustained bandwidth (GB/s, 0 to 60) vs. year (1998 to 2007). Series: multithreaded processor, single-threaded processor, available off-chip bandwidth.]

■ Prefetching and multithreading pipeline memory accesses and thread executions.

■ Both schemes generate more off-chip traffic.

■ Processor performance has increased much faster than memory performance.

■ Memory latency hiding/tolerating techniques are required.

A high-bandwidth network is required.

Page 6

Demand on Network Bandwidth

☛ Multiprocessor systems require a high-performance network.

☛ Network routers must be fast and provide high bandwidth.

An optoelectronic router can help mitigate the bandwidth problem.

Page 7

State-of-the-Art Network Routers

Router [reference, year] | On-chip/Off-chip clock rates (MHz) | Internal/External channel width (bits)
SGI Spider [Galles, 1996] | 100/200 (double-edge) | 80/20
Intel Teraflop [Carbonaro/Verhoorn, 1996] | 200/200 | 16/16
Cray T3E [Scott/Thorson, 1996] | 75/375 | 70/14
Reliable Router [Dally et al., 1994] | 100/100 (double-edge) | 32/23

■ All routers employ sophisticated architectural techniques, e.g., adaptive routing, pipelined functions, etc.

■ Network routers are mostly designed according to the available off-chip bandwidth and barely take advantage of state-of-the-art semiconductor technology.

Limited off-chip bandwidth limits the performance of network routers.

Page 8

State-of-the-Art Interconnects

Interconnect [reference, year] | Transmission rate (GHz) | Channel width (bits)
Equalized serial line [Dally, 1996] | 4 | 1
Bidirectional signaling [Haycock/Mooney, 1997] | 2.5 | 8
POLO [USC/HP, 1996] | 1 | 10
Optobus II [Motorola, 1995] | 0.8 | 10
ChEEtah [USC/Honeywell, 1997] | > 1.0 | 12

■ High-performance electrical interconnects suffer more from signal skew and jitter and usually operate in serial mode.

■ Optical interconnects suffer less from the same effects and operate over wider channels.

■ In exchange for higher performance, electrical interconnects require large and sophisticated transceiver circuits.

Optical interconnects show potentially better price/performance.
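As a rough comparison of the interconnects in the table, multiplying transmission rate by channel width gives a raw per-channel signaling capacity. The sketch below assumes one bit per line per cycle, treats ChEEtah's "> 1.0 GHz" as exactly 1.0 GHz (a lower bound), and ignores coding overhead and the sharing of wires between directions in bidirectional signaling; the names are illustrative only.

```python
"""Raw per-channel capacity of the interconnects listed in the table (illustrative)."""

# (transmission rate in GHz, channel width in bits), transcribed from the table.
# ChEEtah is listed as "> 1.0 GHz"; 1.0 is used here as a lower bound.
INTERCONNECTS = {
    "Equalized serial line": (4.0, 1),
    "Bidirectional signaling": (2.5, 8),
    "POLO": (1.0, 10),
    "Optobus II": (0.8, 10),
    "ChEEtah": (1.0, 12),
}


def raw_capacity_gbps(rate_ghz: float, width_bits: int) -> float:
    """Rate x width, assuming one bit per line per cycle and no coding overhead."""
    return rate_ghz * width_bits


if __name__ == "__main__":
    for name, (rate, width) in INTERCONNECTS.items():
        print(f"{name:<25} {raw_capacity_gbps(rate, width):5.1f} Gb/s")
```

Under these assumptions the serial electrical link reaches 4 Gb/s, while the 10- to 12-bit optical links reach 8 to 12 Gb/s at a quarter of that clock rate or less.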

Page 9

Transceiver Sizes Comparison

Equalized line transmitter [Dally 96]: size 550 µm x 900 µm (5525%), speed 4 GHz

I/O pad driver [Tanner Research 97]: size 80 µm x 112 µm (100%), speed ~200 MHz

Optoelectronic transmitter [Lucent 97]: size 17 µm x 11 µm (2.1%), speed 2.48 GHz

Optoelectronic receiver [Lucent 97]: size 17 µm x 13 µm (2.5%), speed 2.48 GHz

Sizes are based on 0.5 µm (CMOS-HP 14B) technology.
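The percentages in parentheses are consistent with each transceiver's area expressed relative to the I/O pad driver (80 µm x 112 µm = 8960 µm²). The short check below recomputes them under that reading; the dictionary and function names are illustrative.

```python
"""Recompute the relative transceiver areas quoted on the slide."""

# Footprints in micrometres (width, height), 0.5 um CMOS-HP 14B.
TRANSCEIVERS = {
    "Equalized line transmitter": (550, 900),
    "I/O pad driver": (80, 112),
    "Optoelectronic transmitter": (17, 11),
    "Optoelectronic receiver": (17, 13),
}
REFERENCE = "I/O pad driver"  # the slide's 100% entry


def relative_area_percent(name: str) -> float:
    """Area of `name` as a percentage of the reference pad driver's area."""
    w, h = TRANSCEIVERS[name]
    ref_w, ref_h = TRANSCEIVERS[REFERENCE]
    return 100.0 * (w * h) / (ref_w * ref_h)


if __name__ == "__main__":
    for name in TRANSCEIVERS:
        print(f"{name:<28} {relative_area_percent(name):8.1f}%")
```

Rounded, the results (about 5524.6%, 100%, 2.1%, and 2.5%) match the figures listed for each transceiver.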

Page 10

Previous Work in Complex Optoelectronic Chips

■ the AMOEBA switch chip by Krishnamoorthy et al., 1996

■ a 64-bit microprocessor core by Kiamilev et al., 1996

■ the Optical Multiprocessor Network Interface (OMNI) chip by Pinkston and Seelan, 1996

■ a 1kbit photonic page buffer by Krishnamoorthy et al., 1996

■ a 16kbit photonic page buffer by Kiamilev et al., 1997

■ a multiply-accumulate DSP core by Rozier et al., 1998

Previous work focused on design and implementation, not on the performance evaluation of complex optoelectronic chips in general.

Page 11

CMOS and SEED Technology Trends [SIA 97 and Krishnamoorthy 96]

Year of first shipment | 1999 | 2001 | 2003 | 2006 | 2009
Technology (µm) | 0.18 | 0.15 | 0.13 | 0.10 | 0.07
# Transistors (millions) | 6.2 | 10 | 18 | 39 | 84
On-chip/Off-chip clocks (MHz) | 1250/480 | 1500/785 | 2100/885 | 3500/1035 | 6000/1285
# Pin-outs required (pins) | 1570 | 2000 | 2400 | 3270 | 4400
# BGA package pin-outs (pins) | 1500 | 1800 | 2200 | 3000 | 4100
# SEEDs (per chip) | 8000 | 12000 | 20000 | 35000 | 47000
Bonding pad size (µm) | 9 | 8 | 7 | 5 | 4

Optoelectronic SEED technology shows the potential to sustain the increasing bandwidth requirement.
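Two quantities implied by the technology-trend table make the argument concrete: the shortfall of BGA pin-outs against the pin-outs required, and how many optical signals are available per BGA pin. The sketch below computes both. It counts SEED devices, so the parameter `seeds_per_signal` (an assumption, not a table value) can be set to 2 to roughly model dual-rail signaling, which uses a pair of SEEDs per signal.

```python
"""I/O headroom implied by the CMOS and SEED technology-trend table."""

YEARS = [1999, 2001, 2003, 2006, 2009]
PINS_REQUIRED = [1570, 2000, 2400, 3270, 4400]
BGA_PINS = [1500, 1800, 2200, 3000, 4100]
SEEDS_PER_CHIP = [8000, 12000, 20000, 35000, 47000]


def io_headroom(seeds_per_signal: int = 1):
    """Return (year, BGA shortfall in pins, optical signals per BGA pin) tuples.

    seeds_per_signal is an illustrative assumption: 1 counts raw devices,
    2 roughly models dual-rail signaling.
    """
    rows = []
    for year, need, bga, seeds in zip(YEARS, PINS_REQUIRED, BGA_PINS, SEEDS_PER_CHIP):
        shortfall = need - bga  # positive means BGA cannot meet the requirement
        optical_signals = seeds / seeds_per_signal
        rows.append((year, shortfall, optical_signals / bga))
    return rows


if __name__ == "__main__":
    for year, shortfall, ratio in io_headroom(seeds_per_signal=2):
        print(f"{year}: BGA short by {shortfall} pins; "
              f"{ratio:.1f} optical signals per BGA pin")
```

Every year in the table shows a BGA shortfall (70 to 300 pins), while even the dual-rail estimate leaves roughly 2.7 to 5.8 optical signals per BGA pin.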

Page 12

WARRP Router: Complexity and I/O Pin-out Requirement

☛ Electronic I/O (BGA packaging) is a limiting factor.

Network routers can benefit from the large number of I/O pin-outs provided by CMOS/SEED.

[Figure: number of transistors (millions, log scale, 0.01 to 100) vs. number of pin-outs (log scale, 10 to 10000). Series: 1-VC, 2-VC, 3-VC, with configuration labels 1D-4B-Uni, 1D-8B-Uni, 1D-16B-Uni, 2D-8B-Bi, 8D-16B-Bi, 8D-64B-Bi, and 8D-256B-Bi. Regions mark electronic-based pin-outs and CMOS/SEED-based pin-outs, with markers for BGA packaging in 1995, 2003, and 2009. Also plotted: commercial routers (ServerNet II, Intel Teraflop, Cray T3E, SGI Spider), WARRP II, Mosaic C (1992), and Mosaic (1987).]

Page 13

Proposed Solution

Optoelectronic Network Router based on the WARRP (Wormhole Adaptive Recovery-based Routing via Preemption) Architecture:

☛ dense optoelectronic I/O devices: provide design flexibility

☛ high-speed signaling: enables the design of high-performance network routers

☛ increased bandwidth: allows advanced network router architectures

The proposed solution is potentially advantageous in the development of next-generation network routers.

Page 14

Research Issues and Approach

Optoelectronic network routers:

☛ How do they benefit the multiprocessor network? Use an analytical model based on the widely employed k-ary n-cube class of networks.

☛ What issues are pertinent to the development of such routers? Use CAD tools and a semi-empirical model based on the WARRP router to identify the problems and evaluate the chips' performance.

☛ Can they be implemented? Implement the WARRP router through various optoelectronic integration technologies.

Page 15

Implementation Cost Model: Connection Capacity

■ Bisection width [Dally 90] is the number of connections crossing an imaginary plane that divides the system into two equal halves; it is useful for electrically interconnected systems.

■ Connection capacity [Mongkol & Pinkston 96] is introduced as the number of connections that can be established for a given imaging system; it is useful for 3-D free-space optical interconnects.

Page 16

Bisection Width and Connection Capacity of k-ary n-cube Networks

Bisection width: $B = 2Wnk^{n-1}$

Connection capacity: $C = Wnk^{n}$

where $n$ is the network dimension, $k$ is the network radix, and $W$ is the channel width.

Page 17

Bisection Width and Connection Capacity Comparison

Metric | System A: 16-node torus | System B: 8-node hypercube
Bisection width | 8 | 8
Connection capacity | 32 | 24

[Figure: Diffractive-Reflective Optical Interconnect (DROI), showing the mirror plane, the microlens-hologram plane, the bisection plane, and the optical signal path (only one row is shown).]

➨ A system with a connection capacity of 24 can implement only System B, even though both systems have the same bisection width.

Connection capacity is a more accurate implementation cost measure.
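The comparison can be reproduced from the connection-capacity formula $C = Wnk^{n}$ by treating the 16-node torus as a 4-ary 2-cube and the 8-node hypercube as a 2-ary 3-cube, with a channel width of $W = 1$ (an assumption that matches the numbers in the comparison table). The function names below are illustrative.

```python
"""Connection capacity C = W * n * k**n for k-ary n-cube networks."""


def connection_capacity(width: int, n: int, k: int) -> int:
    """Number of optical connections for channel width W, dimension n, radix k."""
    return width * n * k ** n


def fits(available_capacity: int, width: int, n: int, k: int) -> bool:
    """True if the imaging system's connection capacity can host the network."""
    return connection_capacity(width, n, k) <= available_capacity


# (radix k, dimension n) for the two example systems.
SYSTEMS = {
    "System A (16-node torus, 4-ary 2-cube)": (4, 2),
    "System B (8-node hypercube, 2-ary 3-cube)": (2, 3),
}

if __name__ == "__main__":
    for name, (k, n) in SYSTEMS.items():
        c = connection_capacity(1, n, k)
        print(f"{name}: C = {c}, fits in 24: {fits(24, 1, n, k)}")
```

With an available capacity of 24, System A needs 32 connections and does not fit, while System B needs exactly 24.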

Page 18

Network Latency for Wormhole Switched Networks

$T_{net} = D(t_r + t_s + t_w) + \max(t_s, t_w)\,L/W$

■ Effects of optoelectronic technology on network latency:

■ Dense I/O pin-outs affect the network topology ($D$) and the channel width ($W$); and

■ High-speed signaling reduces the propagation delay ($t_w$).

where $T_{net}$ is the low-load network latency, $D$ is the number of network hops from source to destination, $L$ is the data message length, $W$ is the channel width, $t_r$ is the routing time, $t_s$ is the data-through time, and $t_w$ is the wire delay time.
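A direct transcription of the latency equation makes the channel-width effect easy to see. The parameter values in the usage line are hypothetical, chosen only to illustrate the arithmetic (times in ns, L and W in bits); they are not taken from the thesis.

```python
"""Low-load wormhole latency: T_net = D*(t_r + t_s + t_w) + max(t_s, t_w) * L / W."""


def wormhole_latency(hops: int, length_bits: float, width_bits: float,
                     t_route: float, t_switch: float, t_wire: float) -> float:
    """Header delay over D hops plus serialization of the L/W flits."""
    per_hop = t_route + t_switch + t_wire
    serialization = max(t_switch, t_wire) * length_bits / width_bits
    return hops * per_hop + serialization


if __name__ == "__main__":
    # Hypothetical values: 4 hops, 128-bit message, 2 ns routing, 1 ns switch, 1 ns wire.
    for width in (16, 32):
        print(width, wormhole_latency(4, 128, width, 2.0, 1.0, 1.0))
```

Doubling W from 16 to 32 bits cuts the serialization term from 8 ns to 4 ns (24 ns to 20 ns in total) while the per-hop term is unchanged, which is how dense I/O, through a wider W, lowers latency.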

Page 19

Other Important Equations for Performance Evaluation

[Equations not recoverable from the transcript. Recoverable content: the maximum interconnection distance $R_{max}$ is given piecewise for k = 2, k = 4, and any other k, in terms of p, k, n, and θ; the connection capacity C is expressed in terms of the imaging-system area $A_{system}$ and $M_{2D}$; the channel widths $W_{optics}(n, k)$ and $W_{elec}(n, k)$ are expressed in terms of the available area, N, and k; and $A_{system}$ and θ are each given as functions F of imaging-system parameters.]

Page 20

Channel Cycle Time ($T_c$)

■ Assuming $T_c$ is determined by propagation delay:

■ $T_c = T_{o/e} + T_{e/o} + T_{prop}$, where $T_{o/e} + T_{e/o}$ is the conversion delay (technology dependent) and $T_{prop}$ is the propagation delay (topology dependent).

■ Conversion time is not a killer!

■ It is very important to have an efficient imaging system.

[Figure, left: channel cycle time $T_c$ (ns, 0.85 to 4.85) and maximum interconnection distance $R_{max}$ (cm, 0 to 75) vs. link efficiency η (received power, mW; 0.1 to 0.9), divided into a topology-dependent region and a technology-dependent region, with $T_{c\text{-}int}$ = 4 ns marked.]

[Figure, right: delay (ns, 0 to 3.5) of $T_{o/e}$, $T_{e/o}$, and $T_{prop}$ vs. link efficiency η (received power, mW; 0.1 to 0.9) and interconnection distance (m), with the crosspoint marked.]
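The cycle-time decomposition can be sketched as follows, computing $T_{prop}$ as the free-space path length divided by the speed of light. The conversion delays and distance in the example are hypothetical placeholders, not values from the thesis, and the free-space assumption ignores any refractive-index effects in the micro-optics.

```python
"""Channel cycle time T_c = T_o/e + T_e/o + T_prop for a free-space optical link."""

SPEED_OF_LIGHT_CM_PER_NS = 29.9792458  # vacuum speed of light in cm/ns


def propagation_delay_ns(distance_cm: float) -> float:
    """Free-space time of flight over the given path length."""
    return distance_cm / SPEED_OF_LIGHT_CM_PER_NS


def channel_cycle_time_ns(t_oe_ns: float, t_eo_ns: float, distance_cm: float) -> float:
    """Conversion delay (technology dependent) plus propagation delay (topology dependent)."""
    return t_oe_ns + t_eo_ns + propagation_delay_ns(distance_cm)


if __name__ == "__main__":
    # Hypothetical: 0.5 ns O/E, 0.3 ns E/O, 30 cm optical path.
    t_c = channel_cycle_time_ns(0.5, 0.3, 30.0)
    print(f"T_c = {t_c:.3f} ns, channel rate = {1.0 / t_c:.3f} GHz")
```

With these placeholder numbers the 30 cm path contributes about 1 ns, more than either conversion delay.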

Page 21

Performance Evaluation: Optics vs. Electronics (64-node system)

Parameters for the ELECTRICAL system:

chip area: 1 in²

PCB size: 12 x 12 in²

# of layers: 20

min. connection length (p): 1.5 in

Parameters for the OPTICAL system:

laser wavelength (λ): 850 nm

VCSEL beam radius: 5 µm

VCSEL output power: 1 mW

P-I-N detector size: 15 x 15 µm²

microlens diameter: 125 µm

link efficiency (η): ~63%

chip area: 1 cm²

interconnection area: 12 x 12 cm²

usable microlens area (A): 64 cm²

min. connection path (p): 1.5 cm

max. deflection angle (θmax): ~24°

Page 22

Channel Width and Network Latency

[Figure, left: channel width (bits/channel, 0 to 180) vs. dimension n (2 to 6), electrical vs. optics.]

[Figure, right: network latency (ns, 0 to 400) vs. dimension n (2 to 6), for electrical, optics, and optics (200 MHz).]

■ Optics could provide about an order of magnitude higher connectivity than electronics.

■ Optics still yields about twice the channel width of electronics; hence, network latency is lower.

■ Even if the channel cycle time is determined by the internal router delay, the wider channel still greatly benefits network latency (shown as optics (200 MHz)).

Page 23

Packaging Issues: Power Dissipation

[Figure, left: latency (ns, 0 to 90) vs. dimension n (2 to 6), for electrical, optics, and laser (low).]

[Figure, right: channel width (bits/channel, 0 to 600) vs. dimension n (2 to 6), for electrical, optics, and laser (low).]

■ Limited cooling capability reduces the achievable number of I/O pin-outs in optics.

■ Optics still yields lower network latency due to its faster achievable cycle time.

Page 24

Packaging and Device Tolerances

[Figure: transmitter (TX) and receiver (RX) illustrating lateral, longitudinal, and angular misalignment.]

■ Lateral misalignment: ∆Lat = 102 µm

■ Longitudinal misalignment: ∆Long = 230 µm

■ Angular misalignment: ∆θ = 0.044°

■ Wavelength variation: ∆λ = 0.8 nm

Page 25

Optoelectronic Network Routers: How Beneficial?

Multiprocessor networks can benefit from optoelectronic routers in two ways:

■ A large number of I/Os allows more design flexibility, i.e., a wide range of topologies is efficiently supported.

■ High-speed optical signaling unleashes the power of high-performance network routers by fully utilizing advanced semiconductor technology.

Given that:

■ Better packaging technology (including cooling techniques, micro-optic alignment techniques, etc.) and optoelectronic devices with more uniform characteristics are available.

■ The bottom line: optoelectronics and its related technologies are progressing at an impressive rate; hence, the above conclusions are becoming a near-term reality.

Page 26

Pixel-based vs. Core-based CMOS/SEED Designs

Pixel-based designs (e.g., the TRANSPAR chip, courtesy A. Sawchuk, USC):

☛ small (self-contained) circuitry

☛ implement simpler functions

☛ connections are local and regular

Core-based designs (e.g., the WARRP II chip, SMART group, USC):

☛ large (non-self-contained) circuitry

☛ implement complex functions

☛ connections are global and less regular

Design issues exist in implementing core-based designs!

Page 27

Core-based CMOS/SEED Design Issues

☛ A large number of SEED transceivers must be integrated with the CMOS core.

☛ CMOS I/O ports are not perfectly aligned with the SEED array.

☛ At least the top metal layer is reserved exclusively for SEED wiring to simplify CMOS/SEED integration.

☛ A space-invariant imaging system requires structured I/Os on the chip.

Wiring in core-based designs is a problem.

Consequences:

☛ Connections from transceivers to CMOS I/O ports and/or bonding pads are longer.

☛ Fewer wiring and area resources remain for CMOS circuitry, reducing transistor density.

☛ Critical paths may increase, reducing achievable on-chip clock rates.

Page 28

Solutions for the Wiring Problem

☛ Manual integration (simpler, more primitive method)

☛ The CMOS core and SEED array are designed separately and do not fully overlap (e.g., WARRP II and the 64-bit processor core [Kiamilev et al., MPPOI 96]).

(+) Compatible with CMOS CAD tools.

(−) Chip resources are hardly optimized, which seriously degrades chip performance.

(−) Impractical for large core-based designs.

Page 29

Core-based Designs using Manual Integration

The 1kbit Photonic Page-buffer chip [Krishnamoorthy et al., AO 96]

SRAM cells and datapath circuits with the SEED array on top

SEED transceivers are located on the periphery

(+) Simplifies the wiring problem

(+) Compatible with existing CAD tools

(−) Very long connections

(−) May increase critical paths

(−) Low chip area utilization

Page 30

Core-based Designs using Manual Integration (cont'd)

The 16kbit Photonic Page-buffer chip [Kiamilev et al., IJO 97]

[Figure: chip layout showing the SEED and receiver array, the SEED and transmitter array, SRAM cells, and datapath.]

CMOS circuits are placed on the periphery of the SEED array and corresponding transceivers

(+) Simplifies the wiring problem

(+) Compatible with existing CAD tools

(+) Reduces connection length

(+) Improves signal integrity

(−) Low chip area utilization

Page 31

Solutions for the Wiring Problem (cont'd)

☛ Automatic integration (under development)

☛ The CMOS core and SEED array are simultaneously optimized by a CAD tool.

(+) Higher chip performance can be achieved.

(+) Practical for large core-based designs.

(−) Requires optoelectronic-compatible CAD tools.

(−) Effects of long connections and lower transistor density still exist.

Automatic integration is the more efficient and preferred method.

Page 32

Core-based Design using Automated CAD Tools

The multiply-accumulate chip [Rozier et al., LEOS 98], designed using the EPOCH and EGGO CAD tools

CMOS circuits, SEED array, and SEED transceivers are fully overlapped

(+) Directly tackles the wiring problem

(+) Improves chip resource utilization

(+) Mitigates the effects of longer connections and lower transistor density

(−) Requires optoelectronic-compatible CAD tools

Page 33

Cost Estimation of Core-based CMOS/SEED Designs

Wiring utilization is determined by synthesis of the WARRP router using the EPOCH tool.

Inputs to the wiring cost model and the wiring capacity model:

SEED parameters: bonding pad size, number of SEEDs, and SEED pitches

Wiring parameters: number of available metal layers, signal types (single- or dual-rail), routing style, wiring utilization, and metal pitches

System-level parameter: interconnection pattern (optical imaging system constraints)

Output: the number of metal layers required by SEED wiring, which is used to:

☛ Estimate transistor density

☛ Estimate critical path length

☛ Estimate aggregate off-chip bandwidth

Page 34

SEED and Wiring Parameters

[Figure: SEED array layout in which each cell holds a SEED and its bonding pad of size P, with diode pitches X_pitch and Y_pitch and metal pitches M_X-pitch and M_Y-pitch.]

Where: D is the total number of SEED diodes, D_x is the number of SEEDs in the x-direction, D_y is the number of SEEDs in the y-direction, P is the bonding pad size, X_pitch is the diode pitch in the x-direction, Y_pitch is the diode pitch in the y-direction, M_X-pitch is the metal-layer pitch in the x-direction, and M_Y-pitch is the metal-layer pitch in the y-direction.

We need to find the wiring capacity provided by the space in the SEED array and the wiring cost required to connect all SEEDs.

Page 35

Wiring Capacity and Wiring Cost Models

Assumptions:

☛ Signals are dual-rail.

☛ Wiring is X-Y style and requires at least 2 metal layers.

☛ SEEDs and CMOS I/O ports are placed randomly (worst case).

Wiring capacity in the x- and y-directions; wiring cost in the x- and y-directions:

[Equations not recoverable from the transcript. Recoverable terms: the x-direction capacity C_X involves K_i, Y_pitch, M_X-pitch, D, and m; the y-direction capacity C_Y involves K_j, X_pitch − P, M_Y-pitch, D, and m; the wiring costs X_R and Y_R involve D together with D_x and D_y, respectively.]

K_i and K_j are the wiring utilizations of metal layers i and j; typical values are 65% to 75%.

Page 36

Performance Comparison between CMOS/SEED and CMOS Chips

[Figure, left: transistor density (%) available to CMOS/SEED chips (50 to 80) and number of metal layers required for SEED routing (1 to 5) vs. technology (0.18 to 0.07 µm).]

Year of first shipment | 1999 | 2001 | 2003 | 2006 | 2009
Technology (µm) | 0.18 | 0.15 | 0.13 | 0.10 | 0.07
# of metal layers required (x, y) | 1,1 | 1,1 | 2,1 | 2,2 | 2,2
Normalized transistor density | 0.778 | 0.778 | 0.645 | 0.592 | 0.675
Normalized on-chip clock | 0.768 | 0.768 | 0.737 | 0.706 | 0.706
Normalized aggregate bandwidth | 2.131 | 2.740 | 4.392 | 7.210 | 9.716

[Figure, right: bandwidth (GB/s, log scale, 100 to 100000) and number of I/Os (log scale, 1000 to 100000) vs. technology (0.18 to 0.07 µm). Series: Max BW (SEED), #I/Os (SEED), #I/Os (BGA).]
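The later claim that CMOS/SEED chips sacrifice at most about 40% of transistor density and 30% of on-chip clock rate can be checked against the normalized comparison table: the worst-case loss is one minus the smallest normalized value. The sketch below does that arithmetic; the variable names are illustrative.

```python
"""Worst-case CMOS/SEED penalties from the normalized comparison table."""

TECHNOLOGY_UM = [0.18, 0.15, 0.13, 0.10, 0.07]
NORM_DENSITY = [0.778, 0.778, 0.645, 0.592, 0.675]
NORM_CLOCK = [0.768, 0.768, 0.737, 0.706, 0.706]
NORM_BANDWIDTH = [2.131, 2.740, 4.392, 7.210, 9.716]


def worst_case_loss(normalized):
    """Largest fractional loss relative to pure CMOS (normalized value 1.0)."""
    return 1.0 - min(normalized)


if __name__ == "__main__":
    print(f"max density loss: {worst_case_loss(NORM_DENSITY):.1%}")
    print(f"max clock loss:   {worst_case_loss(NORM_CLOCK):.1%}")
    gain = dict(zip(TECHNOLOGY_UM, NORM_BANDWIDTH))
    print(f"aggregate bandwidth gain at 0.07 um: {gain[0.07]:.1f}x")
```

The density and clock penalties come out at roughly 41% and 29%, close to the rounded figures quoted in the conclusions, and they are paid alongside a 2.1x to 9.7x increase in normalized aggregate bandwidth.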

Page 37

Design Cost Estimation

Given that the design information is available:

■ Chip area can be estimated.

■ If the cost of the design is fixed, which configurations can be implemented?

■ To conclude, the model gives relevant information, not previously known, regarding optoelectronic implementations of complex chip designs.

The results can be used to validate that, even with the wiring problem, complex optoelectronic network routers can still be effectively implemented!

Page 38

Core-based CMOS/SEED Chips: Are They Effective?

Compared to pure-CMOS chips, CMOS/SEED chips:

■ sacrifice at most about 40% of transistor density and about 30% of on-chip clock rate in exchange for an order of magnitude more I/O pin-outs.

Given that:

■ optoelectronic-compatible CAD tools are available.

■ The bottom line: as transistors become cheaper over time, complex CMOS/SEED chips provide the valuable bandwidth critically needed by current and next-generation computer systems, at an acceptable cost.

Page 39

Fully Adaptive Wormhole Network Router*

[Figure: block diagram of the router. Input physical channels (optical) X+, X−, Y+, Y− enter through opto-electronic interfaces (OEI), with flow controllers (FC), input VC buffers (IB), and demultiplexers (DM) on the input side. A 5 x 6 crossbar switch forms the normal router (normal routing section), and a deadlock router with a deadlock buffer (DB) forms the deadlock routing section. Output VC buffers (OB), flow controllers (FC), multiplexers (MX), and electro-optic interfaces (EOI) drive the output physical channels (optical) X+, X−, Y+, Y−. Proc In and Proc Out connect the processing node. Internal flow control and external flow control are indicated.]

Legend: DM: Demultiplexer; MX: Multiplexer; FC: Flow Controller; IB: Input VC Buffers; OB: Output VC Buffers; DB: Deadlock Buffer; OEI: Opto-Electronic Interface; EOI: Electro-Optic Interface

*Shown is a 2-D torus-connected, fully adaptive, deadlock-recovery network router with 1 virtual channel.

Page 40

The WARRP Core: A Monolithic GaAs Network Router Core

Wormhole Adaptive Recovery-based Routing via Preemption (WARRP) core

■ NCIPT (ARPA) / MIT Optochip Project

■ Implements the core circuit of the deadlock-handling mechanisms (deadlock buffer, input/output buffers, arbitration logic, flow control logic).

■ Uses monolithic GaAs-based technology to implement both logic functions and optical I/O (LED and OPFET detector).

Technology: 0.6 µm Vitesse H-GaAs III process (ECL-compatible logic)

Die size: 2 mm x 1 mm

Complexity: ~1,400 transistors

# electrical I/Os: 27 signals

# optical I/Os: 12 single-ended signals

■ 1-bit-wide, 4-flit-deep buffers; ring topology. This is sufficient to demonstrate progressive deadlock recovery.

■ 5-state complex FSM controller with preemption prediction logic.

■ Operates at > 50 MHz (under SPICE).

■ Status: electrical and optical versions are available.

[Figure labels: LED, OPFET detector.]

Page 41

The WARRP II Router Chip

☛ Ring topology

☛ 4-bit-wide unidirectional channels

☛ 1 virtual channel with 2-flit-deep buffers

☛ Fully adaptive, deadlock-recovery routing

☛ The core circuitry requires ~10,000 transistors.

☛ Die size (core) is 0.836 x 0.822 mm² (0.687 mm²)

☛ 40 electrical I/Os and 20 dual-rail SEED I/Os

☛ CMOS HP14B process with 3 metal layers

☛ Operates at ~30 MHz (using IRSIM)

[Figure labels: WARRP II core circuitry; SEED modulator and driver circuits; SEED receiver and driver circuits.]

Page 42

Contributions

■ Explain the network bandwidth problem, which is becoming increasingly critical in multiprocessor systems.

■ Introduce the connection capacity concept and establish a cost and performance model based on it to analyze the performance of 3-D optical networks.

■ Identify the wiring problem in complex CMOS/SEED chip designs and, using a semi-empirical model, model the performance of such chips while incorporating the wiring problem.

■ Implement optoelectronic network router chips based on the WARRP router architecture using monolithic and hybrid optoelectronic/VLSI technologies.

■ Suggest advanced architectural techniques to improve network performance and network bandwidth utilization.

Page 43

Conclusions

■ Optoelectronic network routers not only increase network bandwidth but also facilitate the development of high-performance network routers.

■ Optoelectronic network routers are feasible, provided that packaging, device, and optoelectronic-compatible CAD tool technologies are effectively addressed.

■ An optoelectronic network router shows the potential to outperform its electronic counterpart in terms of available bandwidth and number of I/O pin-outs.

■ Internal network router architectures must also be reevaluated to maximize on-chip bandwidth and to avoid becoming a bottleneck in high-bandwidth interconnect environments.

Page 44

Future Work

Network performance and bandwidth utilization can be improved by incorporating advanced architectural techniques such as:

■ Efficient channel configurations.

■ Asynchronous token-based channel arbitration.

■ Flit-bundling transfer technique.

■ Delayed-buffer technique.