Indirect Adaptive Routing on Large Scale Interconnection Networks
description
Transcript of Indirect Adaptive Routing on Large Scale Interconnection Networks
1
Indirect Adaptive Routing on Large Scale Interconnection Networks
Nan Jiang, William J. Dally
Computer System Laboratory
Stanford University
John Kim
Korean Advanced Institute of Science and
Technology
2
Overview• Indirect adaptive routing (IAR)
– Allow adaptive routing decision to be based on local and remote congestion information
• Main contributions– Three new IAR algorithms for large scale networks– Steady state and transient performance evaluations– Impact of network configurations– Cost of implementation
3
Presentation Outline
• Background– The dragonfly network– Adaptive routing
• Indirect adaptive routing algorithms• Performance results • Implementation considerations
4
Group 1Group 0 Group 2 …
Global Network
…
… …
Local Network
Router 1 Router 2
The Dragonfly Network• High Radix Network
– High radix routers– Small network diameter
• Each router– Three types of channels– Directly connected to a few
other groups• Each group
– Organized by a local network – Large number of global
channels (GC)• Large network with a global
diameter of one
Router 0p0
p1
…
5
Routing on the Dragonfly• Minimal Routing (MIN)
1. Source local network2. Global network3. Destination local network
• Some Adversarial traffic congests the global channels– Each group i sends all packets
to group i+1
• Oblivious solution: Valiant’s Algorithm (VAL)– Poor performance on benign
traffic
Router 0
Group 1Group 0
…
Group 2 …
p0
p1
…
… …
Router 1 Router 2Congestion
6
Adaptive Routing• Choose between the MIN path
and a VAL path at the packet source [Singh'05]– Decision metric: path delay– Delay: product of path distance
and path queue depth
• Measuring path queue length is unrealistic
• Use local queues length to approximate path– Require stiff backpressure
SourceRouter
q0 q1
q2 q3
MIN GC
VALGC
Congestion
7
Adaptive Routing: Worst Case Traffic
0 0.1 0.2 0.3 0.4 0.5100
150
200
250
300
350
400
450
Throughput (Flit Injection Rate)
Pac
ket L
aten
cy (
Sim
ulat
ion
cycl
es)
Valiant’sMinimalAdaptive
8
Indirect Adaptive Routing• Improve routing decision through remote
congestion information • Previous method:
– Credit round trip [Kim et. al ISCA’08]• Three new methods:
– Reservation – Piggyback– Progressive
9
Credit Round Trip (CRT)• Delay the return of local
credits to the congested router
• Creates the illusion of stiffer backpressure
• Drawbacks– Remote congestion is still
inferred through local queues
– Information not up to date SourceRouter
Congestion
DelayedCredits
Credits
MIN GC
VALGC
[Kim et. al ISCA’08]
10
Reservation (RES)• Each global channel track
the number of incoming MIN packets
• Injected packets creates a reservation flit
• Routing decision based on the reservation outcome
• Drawbacks– Reservation flit flooding– Reservation delay
SourceRouter
Congestion
RESFlit
RESFailed
MIN GC
VALGC
11
Piggyback (PB)• Local congestion broadcast
– Piggybacking on each packet
– Send on idle channels• Congestion data
compression
• Drawbacks– Consumes extra bandwidth– Congestion information not
up to date (broadcast delay)
SourceRouter
Congestion
GCBusy
GCFree
MIN GC
VALGC
12
Progressive (PAR)• MIN routing decisions at
the source are not final• VAL decisions are final• Switch to VAL when
encountering congestion
• Draw backs– Need an additional virtual
channel to avoid deadlock– Add extra hops
SourceRouter
Congestion
MIN GC
VALGC
13
Experimental Setup
• Fully connected local and global networks– 33 groups– 1,056 nodes
• 10 cycle local channel latency• 100 cycle global channel latency• 10-flit packets
14
Steady State Traffic: Uniform Random
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9100
120
140
160
180
200
220
240
260
280
300
Throughput (Flit Injection Rate)
Pac
ket L
aten
cy (
Sim
ulat
ion
cycl
es)
PiggybackCredit Round TripProgressiveReservationMinimal
15
Steady State Traffic: Worst Case
0 0.1 0.2 0.3 0.4 0.5100
150
200
250
300
350
400
450
Throughput (Flit Injection Rate)
Pac
ket L
aten
cy (S
imul
atio
n cy
cles
)
PiggybackCredit Round TripProgressiveReservationValiant’s
16
Transient Traffic: Uniform Random to Worst Case
0 20 40 60 80 100100
200
300
400
500
Cycles After Transition
Pac
ket L
aten
cyAverage Packet Latency per Cycle - UR to WC
ProgressivePiggyback
0 20 40 60 80 1000
50
100
Cycles After Transition% o
f Pac
kets
Rou
ting
Non
min
imal
ly
% Packets Routing Non-minimally per Cycle - UR to WC
ProgressivePiggyback
17
Network Configuration Considerations
• Packet size– RES requires long packets to amortize reservation flit cost– Routing decision is done on per packet basis
• Channel latency– Affects information delay (CRT, PB)– Affects packet delay (PAR, RES)
• Network size– Affects information bandwidth overhead (RES, PB)
• Global diameter greater than one– Need to exchange congestion information on the global
network
18
Cost Considerations• Credit round trip
– Credit delay tracker for every local channel
• Reservation– Reservation counter for every global channel– Additional buffering at the injection port to store packets
waiting for reservation
• Piggyback– Global channel lookup table for every router– Increase in packet size
• Progressive– Extra virtual channel for deadlock avoidance
19
Conclusion• Three new indirect adaptive routing algorithms for large
scale networks• Performance and design evaluation of the algorithms
• Best Algorithm?– Piggyback performed the best under steady state traffic– Progressive responded fastest to transient changes
– Network configurations will affect some algorithm performance– Cost of implementation
20
Thank You!•Questions?
21
Adaptive Routing: Uniform Traffic
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9100
120
140
160
180
200
220
240
260
280
300
Throughput - Flit Injection Rate
Pac
ket L
aten
cy -
Sim
ulat
ion
cycl
es
VALMINAdaptive
22
Transient Traffic: Worst Case to Uniform Random
0 20 40 60 80 100150
200
250
300
350
Cycles After Transition
Pac
ket L
aten
cyAverage Packet Latency per Cycle - WC to UR
PARPB
0 20 40 60 80 1000
50
100
Cycles After Transition% o
f Pac
kets
Rou
ting
Non
min
imal
ly
% Packets Routing Non-minimally per Cycle - WC to UR
PARPB
23
Transient Traffic: Worst Case 1 to Worst Case 10
0 20 40 60 80 100200
300
400
500
Cycles After Transition
Pac
ket L
aten
cyAverage Packet Latency per Cycle - WC1 to WC10
PARPB
0 20 40 60 80 1000
50
100
Cycles After Transition% o
f Pac
kets
Rou
ting
Non
min
imal
ly
% Packets Routing Non-minimally per Cycle - WC1 to WC10
PARPB
24
1000 Random Permutation Traffic
200 3000
5
10
15
20
25
30PB
Packet Latency
% o
f 1K
Per
mut
atio
ns
200 3000
5
10
15
20
25
30CRT
Packet Latency%
of 1
K P
erm
utat
ions
200 3000
5
10
15
20
25
30PAR
Packet Latency
% o
f 1K
Per
mut
atio
ns
200 3000
5
10
15
20
25
30RES
Packet Latency
% o
f 1K
Per
mut
atio
ns
200 3000
5
10
15
20
25
30VAL
Packet Latency
% o
f 1K
Per
mut
atio
ns
25
Effect of Packet size on RES: Worst Case Traffic
0 0.1 0.2 0.3 0.4 0.50
50
100
150
200
250
300
350
400
450
500
550
Throughput - Flit Injection Rate
Late
ncy
- Sim
ulat
ion
cycl
es
1 Flit2 Flits4 Flits8 Flits
26
Large local network: Uniform Random
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90
50
100
150
200
250
300
350
400
Throughput - Flit Injection Rate
Pac
ket L
aten
cy -
Sim
ulat
ion
cycl
es
PBCRTMINPARRES
27
Large local network: Worst Case
0 0.1 0.2 0.3 0.4 0.50
100
200
300
400
500
600
Throughput - Flit Injection Rate
Pac
ket L
aten
cy -
Sim
ulat
ion
cycl
es
PBCRTPARRESVAL