Firefly: Illuminating Future Network-on-Chip with Nanophotonics
Yan Pan, Prabhat Kumar, John Kim†, Gokhan Memik, Yu Zhang, Alok Choudhary
EECS Department, Northwestern University
Evanston, IL, USA
{panyan, prabhat-kumar, g-memik, yu-zhang, a-choudhary}@northwestern.edu
† CS Department, KAIST
Daejeon, [email protected]
Motivation Architecture of Firefly Evaluation Conclusion
ISCA 2009, Yan Pan
On-Chip Network Topologies
► Mesh: [MIT RAW] [TILE64] [Teraflops]
► C-Mesh: [Balfour’06] [Cianchetti’09]
► Crossbar: [Vantrease’08] [Kirman’06]
► Others: Torus [Shacham’07], Flattened Butterfly [Kim’07], Dragonfly [Kim’08], Hierarchical (Bus & Mesh) [Das’08], Clos [Joshi’09], Ring [Larrabee], ...
► Network-on-chip is critical for performance.
Signaling technologies
► Electrical signaling
  – Repeater insertion needed
  – Bandwidth density (up to 8 Gbps/μm) [Chang HPCA’08]
► Nanophotonics
  – Bandwidth density ~100 Gbps/μm! [Batten HOTI’08]
  – Generally distance-independent power consumption
  – Speed-of-light low latency
    • Propagation
    • Switching [Cianchetti ISCA’09]
Nanophotonic components
► Basic components
  – Off-chip laser source
  – Coupler
  – Waveguide
  – Resonant modulators
  – Resonant detectors (Ge-doped)
Resonant Rings
► Selective
  – Couple optical energy of a specific wavelength
► Resonance is set by
  – Radius r: baseline wavelength
  – Temperature t: manufacturing error correction
  – Carrier density d: fast tuning by charge injection
Putting it together
► Modulation & detection
  – 64 wavelengths DWDM
  – 3 ~ 5 μm waveguide pitch
  – 10 Gbps per link
  – ~100 Gbps/μm bandwidth density [Batten HOTI’08]
[Figure: two bit streams (11010101 and 10001011) modulated onto a shared waveguide and recovered at the detectors]
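The bandwidth-density figure can be checked with simple arithmetic. The sketch below assumes that the quoted 10 Gbps applies to each of the 64 wavelengths and that density is the aggregate rate of one waveguide divided by its pitch; the function name is illustrative and not from the slides.

```python
def bandwidth_density(wavelengths: int, gbps_per_wavelength: float, pitch_um: float) -> float:
    """Aggregate data rate of one waveguide divided by its pitch, in Gbps/um."""
    return wavelengths * gbps_per_wavelength / pitch_um


# Values from the slide: 64-wavelength DWDM, 10 Gbps each, 3 to 5 um pitch.
for pitch_um in (3.0, 5.0):
    density = bandwidth_density(64, 10.0, pitch_um)
    print(f"pitch {pitch_um} um: {density:.0f} Gbps/um")
```

Under these assumptions the result falls between roughly 128 and 213 Gbps/μm, the same order of magnitude as the quoted ~100 Gbps/μm.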
What’s the catch?
► Power cost
  – Ring heating
  – Laser power
  – E/O & O/E conversions
  – Distance insensitive
► For short links (2.5 mm)
  – Nanophotonics
  – Electrical
    • RC lines with repeater insertion
[Figure: per-bit energy (fJ/b, axis 0 to 700) for Nanophotonics vs. RC Line, broken down into Optical Components, Ring Heating, Laser, and Electrical; sources: [Batten HOTI’08] [Cheng ISCA’06]]
► For long links
  – Nanophotonics
    • Cost stays the same
  – Electrical
    • Cost increases
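The trade-off above implies a crossover length: below it the electrical link is cheaper per bit, above it the fixed optical cost wins. The sketch below assumes a linear energy model for the repeated RC wire. The two constants are placeholders chosen only to show the shape of the argument; they are not read from the slide's chart or from the cited papers.

```python
def electrical_energy_fj(length_mm: float, fj_per_bit_per_mm: float) -> float:
    """Repeated RC wire: per-bit energy grows linearly with length (assumed model)."""
    return fj_per_bit_per_mm * length_mm


def crossover_length_mm(optical_fj_per_bit: float, fj_per_bit_per_mm: float) -> float:
    """Link length beyond which the fixed optical cost is the cheaper option."""
    return optical_fj_per_bit / fj_per_bit_per_mm


# PLACEHOLDER numbers for illustration only; not taken from the slides.
OPTICAL_FJ_PER_BIT = 400.0
ELECTRICAL_FJ_PER_BIT_PER_MM = 50.0

for length_mm in (2.5, 10.0):
    electrical = electrical_energy_fj(length_mm, ELECTRICAL_FJ_PER_BIT_PER_MM)
    cheaper = "electrical" if electrical < OPTICAL_FJ_PER_BIT else "nanophotonic"
    print(f"{length_mm} mm: electrical {electrical} fJ/b, "
          f"optical {OPTICAL_FJ_PER_BIT} fJ/b, cheaper: {cheaper}")
```

With real per-bit figures substituted for the placeholders, the same two functions would give the link length at which a design should switch from electrical to nanophotonic signaling.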
Here is the idea ...
► Design an architecture that differentiates traffic.
  – Use electrical signaling for short links.
  – Use nanophotonics only for long-range traffic.
► What do we gain?
  – Low latency
  – High bandwidth density
  – High power efficiency
  – Localized arbitration
  – Scalability
Outline
► Motivation
► Architecture of Firefly
► Evaluation
► Conclusion
Layout View of 64-core Firefly
► Concentration
  – 4 cores share a router
  – 16 routers
[Figure: layout of 64 cores in 16 tiles; each tile holds cores P0 to P3 around one shared router R]
Layout View of 64-core Firefly
► Concentration
► Clusters
  – Electrically connected
  – Mesh topology
  – 4 routers per cluster
  – 4 clusters
[Figure: the 16 routers grouped into Cluster 0 (C0) to Cluster 3 (C3); routers are labeled CxRy, from C0R0 to C3R3]
Layout View of 64-core Firefly
► Concentration
► Clusters
► Assemblies
  – Routers from different clusters
  – Optically connected
  – Logical crossbars
[Figure: routers with the same index in each cluster grouped into an assembly, e.g. A0 = {C0R0, C1R0, C2R0, C3R0} and A1 = {C0R1, C1R1, C2R1, C3R1}]
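The concentration, cluster and assembly structure can be written down as a small mapping. The sketch below assumes a flat core numbering (consecutive core ids share a router, consecutive routers share a cluster); that numbering and the names `locate` and `assembly_members` are illustrative, not taken from the slides.

```python
from typing import NamedTuple

CORES_PER_ROUTER = 4      # concentration: 4 cores share a router
ROUTERS_PER_CLUSTER = 4   # 4 routers per cluster
CLUSTERS = 4              # 4 clusters, so 16 routers and 64 cores


class Location(NamedTuple):
    cluster: int  # x in CxRy
    router: int   # y in CxRy
    port: int     # which of the router's four cores


def locate(core_id: int) -> Location:
    """Map a flat core id to (cluster, router, port). The numbering is assumed."""
    total = CORES_PER_ROUTER * ROUTERS_PER_CLUSTER * CLUSTERS
    if not 0 <= core_id < total:
        raise ValueError(f"core id must be in [0, {total})")
    router_flat, port = divmod(core_id, CORES_PER_ROUTER)
    cluster, router = divmod(router_flat, ROUTERS_PER_CLUSTER)
    return Location(cluster, router, port)


def assembly_members(assembly: int) -> list[str]:
    """Routers with the same index across all clusters form one assembly."""
    return [f"C{cluster}R{assembly}" for cluster in range(CLUSTERS)]


print(locate(37))
print(assembly_members(0))
```

Because the assembly index equals the router index, a router can tell which optical crossbar it belongs to without any lookup table.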
Layout View of 64-core Firefly
► Clusters
  – Electrical CMESH
► Assemblies
  – Nanophotonic crossbars
► Efficient nanophotonic crossbars needed!
[Figure: clusters C0 to C3 (routers C0R0 to C3R3) linked by the nanophotonic crossbars of assemblies A0 to A3]
Nanophotonic crossbars
► Single-Write-Multiple-Read (SWMR) [Kirman’06] (CMXbar††)
  – Dedicated sending channel
  – Multicast in nature
  – Receiver compare & discard
  – High fan-out laser power
[Figure: SWMR crossbar; routers R0, R1, ..., RN-1 and data channels CH0, CH1, ..., CH(N-1), each w bits wide]
†† [Joshi NOCS’09]
Nanophotonic crossbars
► Multiple-Write-Single-Read (MWSR) [Vantrease’08] (DMXbar††)
  – Dedicated receiving channel
  – Demux to channel
  – Global arbitration needed!
[Figure: MWSR crossbar; routers R0, R1, ..., RN-1 and data channels CH0, CH1, ..., CH(N-1), each w bits wide]
†† [Joshi NOCS’09]
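The difference between the two crossbars comes down to who owns each channel. The sketch below is a simplified model, not a description of either cited design: it only counts how many routers try to write the same optical channel in one cycle and ignores buffering at the receiver. The names `CHANNEL_OWNER_INDEX` and `contended_channels` are illustrative.

```python
from collections import Counter

# Which end of a (source, destination) request owns the channel being written.
# SWMR: the sender owns the channel. MWSR: the receiver owns the channel.
CHANNEL_OWNER_INDEX = {"SWMR": 0, "MWSR": 1}


def contended_channels(requests, scheme):
    """Return the channels that more than one router wants to write this cycle.

    requests: list of (source_router, destination_router), at most one per source.
    scheme: "SWMR" or "MWSR".
    """
    owner = CHANNEL_OWNER_INDEX[scheme]
    writers = Counter(request[owner] for request in requests)
    return sorted(channel for channel, count in writers.items() if count > 1)


# Routers 0, 1 and 2 all send to router 3 in the same cycle.
requests = [(0, 3), (1, 3), (2, 3)]
print(contended_channels(requests, "SWMR"))  # each sender has its own channel
print(contended_channels(requests, "MWSR"))  # three writers share channel 3
```

In this model SWMR never has two writers on one channel, which is why it needs no arbitration among senders, while MWSR must pick a single writer for channel 3.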
Reservation-assisted SWMR
► Goal
  – Avoid global arbitration
  – Reduce power
► Proposed design
  – Reservation channels
    • Narrow
  – Multicast to reserve
    • Destination ID
    • Packet length
  – Uni-cast data packet
[Figure: R-SWMR crossbar; routers R0, R1, ..., RN-1; narrow reservation channels CH0a, CH1a, ..., CH(N-1)a, each log (Ns) bits wide; data channels CH0, CH1, ..., CH(N-1), each w bits wide]
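The reservation step can be sketched as a small behavioral model. This is an illustration under assumptions, not the authors' implementation: every receiver sees the narrow reservation flit (destination ID and packet length), but only the named destination turns on its detectors for that sender's data channel, and only for the announced number of flits. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    router_id: int
    # Flits still expected per sender channel; no entry means detectors are off.
    listening: dict[int, int] = field(default_factory=dict)

    def on_reservation(self, sender: int, dest: int, length: int) -> None:
        """Every receiver sees the reservation flit; only `dest` tunes in."""
        if dest == self.router_id:
            self.listening[sender] = length

    def on_data_flit(self, sender: int) -> bool:
        """Return True if this receiver detects the flit on the sender's channel."""
        remaining = self.listening.get(sender, 0)
        if remaining == 0:
            return False  # detectors off, so nothing to compare and discard
        if remaining == 1:
            del self.listening[sender]
        else:
            self.listening[sender] = remaining - 1
        return True


def send_packet(receivers, sender, dest, length):
    """Broadcast a reservation, then send `length` data flits; count reads per router."""
    for receiver in receivers:
        receiver.on_reservation(sender, dest, length)
    reads = {receiver.router_id: 0 for receiver in receivers}
    for _ in range(length):
        for receiver in receivers:
            reads[receiver.router_id] += receiver.on_data_flit(sender)
    return reads


receivers = [Receiver(i) for i in range(4)]
print(send_packet(receivers, sender=0, dest=2, length=3))
```

Only router 2 reads the three data flits, which is the sense in which the data packet becomes uni-cast even though the underlying channel is single-write-multiple-read.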
Router Microarchitecture
► Virtual-channel router
  – Added optical link ports and extra buffer.
  – Dedicated sending channel for all traffic.
  – Separate receiving channels from other clusters.
[Figure: virtual-channel router with routing computation, VC allocator, switch allocator and crossbar switch; inputs Inject (Input 1) to Input k, each with VC 1 to VC v; outputs Eject (Output 1) to Output k; a global output through an arbiter and E/O conversion; global inputs 1 to g through O/E conversion into an input buffer]
Routing
► Routing
  – Intra-cluster routing
  – Traversing optical link
[Figure: example route through routers C0R0, C5R0, C5R1, C5R2 and C5R3, with routing variants labeled FIREFLY_dest and FIREFLY_src]
[Figure: pipeline timing for head, body and tail flits, each showing the stage sequence RT LT LT LT LT LT OA RT LT RT LT RT LT RT; RB is marked for the head flit and "--" for body and tail; stage legend: RT, RB, LT, OA]
[Figure: router microarchitecture and R-SWMR crossbar diagrams repeated from the previous slides]
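The two routing steps can be sketched as a path computation. The sketch assumes the order listed on the slide (electrical hops inside the source cluster first, then one optical hop), a 2x2 mesh per cluster with router index = 2 * row + column, and column-first dimension-ordered routing inside the cluster. The mesh orientation and all function names are assumptions made for illustration.

```python
MESH_COLS = 2  # assumed 2x2 cluster mesh, router index = 2 * row + col


def intra_cluster_path(src_router: int, dst_router: int) -> list[int]:
    """Dimension-ordered path inside one cluster: fix the column, then the row."""
    row, col = divmod(src_router, MESH_COLS)
    dst_row, dst_col = divmod(dst_router, MESH_COLS)
    path = []
    while col != dst_col:
        col += 1 if dst_col > col else -1
        path.append(row * MESH_COLS + col)
    while row != dst_row:
        row += 1 if dst_row > row else -1
        path.append(row * MESH_COLS + col)
    return path


def route(src: tuple[int, int], dst: tuple[int, int]) -> list[tuple[str, str]]:
    """Return (link_type, next_router) hops from src=(cluster, router) to dst."""
    src_cluster, src_router = src
    dst_cluster, dst_router = dst
    # Electrical hops inside the source cluster, toward the router that shares
    # an assembly (same router index) with the destination router.
    hops = [("electrical", f"C{src_cluster}R{r}")
            for r in intra_cluster_path(src_router, dst_router)]
    if src_cluster != dst_cluster:
        # One nanophotonic hop across that assembly's crossbar.
        hops.append(("optical", f"C{dst_cluster}R{dst_router}"))
    return hops


print(route((0, 0), (1, 3)))
print(route((2, 1), (2, 0)))
```

In this model a packet crosses the nanophotonic crossbar at most once, and traffic that stays inside a cluster never touches an optical link.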
Firefly – another look
► Clusters
  – Short electrical links
  – Concentrated mesh
► Assemblies
  – Long nanophotonic links
  – Partitioned crossbars
► Benefits
  – Traffic locality
  – Reduced hardware
  – Localized arbitration
  – Distributed inter-cluster bandwidth
[Figure: logical view; each router (e.g. C0R0 to C0R3) serves four processors P; clusters C0 to C3 are linked through assemblies A0 to A3]
Evaluation Setup
► Cycle-accurate simulator (Booksim)
► Firefly vs. CMESH, Dragonfly† and OP_XBAR
► Synthetic traffic patterns and traces

Code Name | Signaling | Topology | Global Routing | Min #VC
CMESH | Electrical | Concentrated mesh | Dimension-ordered routing | 1
DFLY_MIN | Hybrid | Dragonfly topology mapped to on-chip network | Minimal routing, traversing nanophotonics at most once | 2
DFLY_VAL | Hybrid | Dragonfly topology mapped to on-chip network | Nonminimal routing, traversing nanophotonics up to twice | 3
OP_XBAR | Optical | All-optical crossbar using token-based global arbitration | Destination-based routing | 1
FIREFLY | Hybrid | Proposed hybrid architecture with multiple logical optical inter-cluster crossbars | Intra-cluster routing in the source cluster before traversing nanophotonics | 1

† [Kim et al., ISCA’08]
Load / Latency Curve
► Throughput
  – Up to 4.8x over OP_XBAR
  – At least +70% over Dragonfly
[Figure: four load/latency plots (a) to (d), latency (# cycles) vs. injection rate; panel titles "Bitcomp, 1-cycle" and "Uniform, 1-cycle"; annotations 4.8x and 70%]
Energy Breakdown
► Reduced hardware by partitioning
  – Reduced heating
► Throughput impact
► Locality
  – 34% energy reduction over OP_XBAR with locality
[Figure: average per-packet energy (nJ, axis 0 to 1.2) for CMESH, DFLY_MIN, DFLY_VAL, OP_XBAR and FIREFLY under Taper_L0.7D7 and Bitcomp traffic, broken down into Router / DEMUX, Electrical Link, Optical Link, Laser, and Ring Heating]
Technology Sensitivity
► α is the heating ratio and β is the laser ratio.
► Firefly favors traffic locality.
[Figure: sensitivity plots for bitcomp and taper_L0.7D7 traffic]
Conclusion
► Technology impacts architecture
  – New opportunities in nanophotonics
    • Low latency, high bandwidth density
  – Tailored architectures needed
► Firefly benefits from nanophotonics by providing
  – Power efficiency
    • Hybrid signaling
    • Partitioned R-SWMR crossbars
    • Reduced hardware/power
  – Scalability
    • Scalable inter-cluster bandwidth
    • Low-radix routers/crossbars