Lecture 9: Flow Control - IIIpwp.gatech.edu/ece-tushar/wp-content/uploads/sites/175/2016/10/L… ·...

23
Lecture 9: Flow Control - III Tushar Krishna Assistant Professor School of Electrical and Computer Engineering Georgia Institute of Technology [email protected] ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Acknowledgment: Slides adapted from Univ of Toronto ECE 1749 H (N Jerger)

Transcript of Lecture 9: Flow Control - IIIpwp.gatech.edu/ece-tushar/wp-content/uploads/sites/175/2016/10/L… ·...

Page 1: Lecture 9: Flow Control - IIIpwp.gatech.edu/ece-tushar/wp-content/uploads/sites/175/2016/10/L… · Process Node 1 Node 2 t1 t2 N threshold reached Process t3 t4 t5 t6 t7 t8 N threshold

Lecture 9:Flow Control - IIITushar Krishna

Assistant ProfessorSchool of Electrical and Computer EngineeringGeorgia Institute of Technology

[email protected]

ECE 8823 A / CS 8803 - ICNInterconnection NetworksSpring 2017http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/

Acknowledgment: Slides adapted from Univ of Toronto ECE 1749 H (N Jerger)

Page 2: Lecture 9: Flow Control - IIIpwp.gatech.edu/ece-tushar/wp-content/uploads/sites/175/2016/10/L… · Process Node 1 Node 2 t1 t2 N threshold reached Process t3 t4 t5 t6 t7 t8 N threshold

February 13, 2017ICN | Spring 2017 | L09: Flow Control - III © Tushar Krishna, School of ECE, Georgia Tech

2

Designing a Flow Control Protocol: Managing Buffers and Contention

Page 3: Lecture 9: Flow Control - IIIpwp.gatech.edu/ece-tushar/wp-content/uploads/sites/175/2016/10/L… · Process Node 1 Node 2 t1 t2 N threshold reached Process t3 t4 t5 t6 t7 t8 N threshold

Flow Control Protocol: Arbitration

February 13, 2017ICN | Spring 2017 | L09: Flow Control - III © Tushar Krishna, School of ECE, Georgia Tech

3

Output to Core

Input from Core

?1. Who should use output link?

Ring

Page 4: Lecture 9: Flow Control - IIIpwp.gatech.edu/ece-tushar/wp-content/uploads/sites/175/2016/10/L… · Process Node 1 Node 2 t1 t2 N threshold reached Process t3 t4 t5 t6 t7 t8 N threshold

Flow Control Protocol: Backpressure

February 13, 2017ICN | Spring 2017 | L09: Flow Control - III © Tushar Krishna, School of ECE, Georgia Tech

4

New Flit

3. What should a flit do if its output is blocked?

Output to Core

Input from Core

Output to Core

Input from Core

Full

2. What to do with the other flit (from ring/core)

Page 5: Lecture 9: Flow Control - IIIpwp.gatech.edu/ece-tushar/wp-content/uploads/sites/175/2016/10/L… · Process Node 1 Node 2 t1 t2 N threshold reached Process t3 t4 t5 t6 t7 t8 N threshold

Backpressure Signaling Mechanisms¡On/Off Flow Control¡downstream router signals if it can receive or not

¡Credit-based Flow Control¡upstream router tracks the number of free buffers

available at the downstream router

February 13, 2017ICN | Spring 2017 | L09: Flow Control - III © Tushar Krishna, School of ECE, Georgia Tech

5

Page 6: Lecture 9: Flow Control - IIIpwp.gatech.edu/ece-tushar/wp-content/uploads/sites/175/2016/10/L… · Process Node 1 Node 2 t1 t2 N threshold reached Process t3 t4 t5 t6 t7 t8 N threshold

On/Off Flow Control¡Downstream router sends a 1-bit on/off if it can

receive or not¡Upstream router sends only when it sees on

¡Any potential challenge?¡Delay of on/off signal¡ By the time the on/off signal reaches upstream, there

might already be flits in flight¡Need to send the off signal once the number of buffers

reaches a threshold such that all potential in-flight flits have a free buffer

February 13, 2017ICN | Spring 2017 | L09: Flow Control - III © Tushar Krishna, School of ECE, Georgia Tech

6

Page 7: Lecture 9: Flow Control - IIIpwp.gatech.edu/ece-tushar/wp-content/uploads/sites/175/2016/10/L… · Process Node 1 Node 2 t1 t2 N threshold reached Process t3 t4 t5 t6 t7 t8 N threshold

On/Off Timeline with N buffers

February 13, 2017ICN | Spring 2017 | L09: Flow Control - III © Tushar Krishna, School of ECE, Georgia Tech

7

Process

Node 1 Node 2t1

t2

Nthreshold reached

Process

t3t4

t5

t6t7

t8

Nthreshold set to 3 to prevent flits departing Node 1 before t4 from

overflowing

Nthreshold +1 reached

Ntotal set so that Node 2 does not run out of flits to send between

t5 and t8

On/Off Delay = 2

Process Delay = 1

Flit Delay = 1

Page 8: Lecture 9: Flow Control - IIIpwp.gatech.edu/ece-tushar/wp-content/uploads/sites/175/2016/10/L… · Process Node 1 Node 2 t1 t2 N threshold reached Process t3 t4 t5 t6 t7 t8 N threshold

Backpressure Signaling Mechanisms¡On/Off Flow Control¡Pros¡ Low overhead: one-bit signal from downstream to

upstream node, only switches when threshold crossed¡Cons¡ Inefficient buffer utilization – have to design assuming

worst case of Nthreshold flights in flight

February 13, 2017ICN | Spring 2017 | L09: Flow Control - III © Tushar Krishna, School of ECE, Georgia Tech

8

Page 9: Lecture 9: Flow Control - IIIpwp.gatech.edu/ece-tushar/wp-content/uploads/sites/175/2016/10/L… · Process Node 1 Node 2 t1 t2 N threshold reached Process t3 t4 t5 t6 t7 t8 N threshold

Credit-based Flow Control¡Upstream router tracks the number of free

buffers available at the downstream router¡Upstream router sends only if credits > 0

¡When should credit be decremented at upstream router?¡When a flit is sent to the downstream router

¡When should credit be incremented at upstream router?¡When a flit leaves the downstream router

February 13, 2017ICN | Spring 2017 | L09: Flow Control - III © Tushar Krishna, School of ECE, Georgia Tech

9

Page 10: Lecture 9: Flow Control - IIIpwp.gatech.edu/ece-tushar/wp-content/uploads/sites/175/2016/10/L… · Process Node 1 Node 2 t1 t2 N threshold reached Process t3 t4 t5 t6 t7 t8 N threshold

Credit Timeline

February 13, 2017ICN | Spring 2017 | L09: Flow Control - III © Tushar Krishna, School of ECE, Georgia Tech

10

Node 1 Node 2

Process credit Process

Flitt4

Credits=1

Node 0Credits=4 Credits=2

Credits=0

Credits=5

Credits=1t5

t1

t2

t3

Credits=1Process credit

Page 11: Lecture 9: Flow Control - IIIpwp.gatech.edu/ece-tushar/wp-content/uploads/sites/175/2016/10/L… · Process Node 1 Node 2 t1 t2 N threshold reached Process t3 t4 t5 t6 t7 t8 N threshold

Backpressure Signaling Mechanisms¡On/Off Flow Control¡Pros¡ Low overhead: one-bit signal

¡Cons¡ Inefficient buffer utilization – have to design assuming

worst case of Nthreshold flights in flight

¡Credit Flow Control¡Pros¡ Each buffer fully utilized - an keep sending till credits

are zero (unlike on/off)¡Cons¡More signaling – need to signal upstream for every flit

February 13, 2017ICN | Spring 2017 | L09: Flow Control - III © Tushar Krishna, School of ECE, Georgia Tech

11

Page 12: Lecture 9: Flow Control - IIIpwp.gatech.edu/ece-tushar/wp-content/uploads/sites/175/2016/10/L… · Process Node 1 Node 2 t1 t2 N threshold reached Process t3 t4 t5 t6 t7 t8 N threshold

Backpressure and Buffer Sizing

February 13, 2017ICN | Spring 2017 | L09: Flow Control - III © Tushar Krishna, School of ECE, Georgia Tech

12

Node 1 Node 2

Process credit Process

Flitt4

Credits=1

Node 0Credits=4 Credits=2

Credits=0

Credits=5

Credits=1

Credits=1

t5

t1

t2

t3 Buffer Turnaround

Time or Credit Round

Trip Time

No flit can be sent into this buffer during this delay

Process credit

To prevent backpressure from limiting throughput, number of buffers >= turnaround time

Page 13: Lecture 9: Flow Control - IIIpwp.gatech.edu/ece-tushar/wp-content/uploads/sites/175/2016/10/L… · Process Node 1 Node 2 t1 t2 N threshold reached Process t3 t4 t5 t6 t7 t8 N threshold

Buffer Turnaround Delay

February 13, 2017ICN | Spring 2017 | L09: Flow Control - III © Tushar Krishna, School of ECE, Georgia Tech

13

Flit arrives at node 1 and uses buffer

Flit leaves node 1 and credit is sent

to node 0

Node 0 receives credit

Node 0 processes credit, freed buffer reallocated to new flit New flit leaves

Node 0 for Node 1

New flit arrives at Node 1 and reuses buffer

Actual buffer usage

Credit propagation

delay

Credit pipeline delay flit pipeline delay

flit propagation

delay

1 1 3 1

How many buffers needed?

How many buffers needed in on/off flow-control?

1+1+3+1 = 6

(off propogation + processing)6 + 2 = 8

Page 14: Lecture 9: Flow Control - IIIpwp.gatech.edu/ece-tushar/wp-content/uploads/sites/175/2016/10/L… · Process Node 1 Node 2 t1 t2 N threshold reached Process t3 t4 t5 t6 t7 t8 N threshold

But this is inefficient

February 13, 2017ICN | Spring 2017 | L09: Flow Control - III © Tushar Krishna, School of ECE, Georgia Tech

14

Flit arrives at node 1 and uses buffer

Flit leaves node 1 and credit is sent

to node 0

Node 0 receives credit

Node 0 processes credit, freed buffer reallocated to new flit New flit leaves

Node 0 for Node 1

New flit arrives at Node 1 and reuses buffer

Actual buffer usage

Credit propagation

delay

Credit pipeline delay flit pipeline delay

flit propagation

delay

1 1 3 1

Page 15: Lecture 9: Flow Control - IIIpwp.gatech.edu/ece-tushar/wp-content/uploads/sites/175/2016/10/L… · Process Node 1 Node 2 t1 t2 N threshold reached Process t3 t4 t5 t6 t7 t8 N threshold

Flit Reservation Flow Control (Peh et al., HPCA 2000)¡What is the key idea (and benefit)?

¡Why can’t we just do static scheduling?

February 13, 2017ICN | Spring 2017 | L09: Flow Control - III © Tushar Krishna, School of ECE, Georgia Tech

15

Page 16: Lecture 9: Flow Control - IIIpwp.gatech.edu/ece-tushar/wp-content/uploads/sites/175/2016/10/L… · Process Node 1 Node 2 t1 t2 N threshold reached Process t3 t4 t5 t6 t7 t8 N threshold

Conventional Virtual Channel Router

February 13, 2017ICN | Spring 2017 | L09: Flow Control - III © Tushar Krishna, School of ECE, Georgia Tech

16

Crossbar SwitchInput Buffers

VC 1VC 2

VC n

VA: VC AllocationInput VCs arbitrate for “output” VCs (Input VCs at next router)

SA: Switch AllocationInput ports arbitrate for output ports

Input Buffers

VC 0VC 1

VC n

BW VA ST LT

VC Allocator

SW Allocator

BW: Buffer Write

ST: Switch Traversal

LT: Link Traversal

RC: Route Compute

RC

Route Compute

SA BR

BR: Buffer Read

VN0

VN 1

FLIT

Page 17: Lecture 9: Flow Control - IIIpwp.gatech.edu/ece-tushar/wp-content/uploads/sites/175/2016/10/L… · Process Node 1 Node 2 t1 t2 N threshold reached Process t3 t4 t5 t6 t7 t8 N threshold

Router Microarchitecture¡Components¡Virtual Channel Buffers¡Routing Logic¡Allocation¡ Switch Allocation¡ VC Allocation

¡Crossbar Switch¡Link

February 13, 2017ICN | Spring 2017 | L09: Flow Control - III © Tushar Krishna, School of ECE, Georgia Tech

17

Crossbar SwitchInput Buffers

VC 1VC 2

VC n

Input Buffers

VC 0VC 1

VC n

VC Allocator

SW Allocator

Route Compute

VN0

VN 1

Page 18: Lecture 9: Flow Control - IIIpwp.gatech.edu/ece-tushar/wp-content/uploads/sites/175/2016/10/L… · Process Node 1 Node 2 t1 t2 N threshold reached Process t3 t4 t5 t6 t7 t8 N threshold

Flit Reservation Router

February 13, 2017ICN | Spring 2017 | L09: Flow Control - III © Tushar Krishna, School of ECE, Georgia Tech

18

No-load delay?1-cycle

Page 19: Lecture 9: Flow Control - IIIpwp.gatech.edu/ece-tushar/wp-content/uploads/sites/175/2016/10/L… · Process Node 1 Node 2 t1 t2 N threshold reached Process t3 t4 t5 t6 t7 t8 N threshold

Key Unit: I/O Reservation Tables

February 13, 2017ICN | Spring 2017 | L09: Flow Control - III © Tushar Krishna, School of ECE, Georgia Tech

19

Output Reservation Table Input Reservation Table

Credit Flow?

Page 20: Lecture 9: Flow Control - IIIpwp.gatech.edu/ece-tushar/wp-content/uploads/sites/175/2016/10/L… · Process Node 1 Node 2 t1 t2 N threshold reached Process t3 t4 t5 t6 t7 t8 N threshold

Design Details¡How is the crossbar switch driven?¡When is buffer ID allocated? Why?¡What if data arrives before output channel

allocated?¡Why could this happen?

¡What if 2 control flits for the same output port?¡Not discussed in paper. They probably allocate serially,

but this would require buffering¡Buffer organization – shared pool vs. per VC¡Head-of-line blocking?

¡Scheduling horizon?

February 13, 2017ICN | Spring 2017 | L09: Flow Control - III © Tushar Krishna, School of ECE, Georgia Tech

20

Page 21: Lecture 9: Flow Control - IIIpwp.gatech.edu/ece-tushar/wp-content/uploads/sites/175/2016/10/L… · Process Node 1 Node 2 t1 t2 N threshold reached Process t3 t4 t5 t6 t7 t8 N threshold

Overheads

February 13, 2017ICN | Spring 2017 | L09: Flow Control - III © Tushar Krishna, School of ECE, Georgia Tech

21

Page 22: Lecture 9: Flow Control - IIIpwp.gatech.edu/ece-tushar/wp-content/uploads/sites/175/2016/10/L… · Process Node 1 Node 2 t1 t2 N threshold reached Process t3 t4 t5 t6 t7 t8 N threshold

Evaluations

February 13, 2017ICN | Spring 2017 | L09: Flow Control - III © Tushar Krishna, School of ECE, Georgia Tech

22

FR6 has same throughput as FR16

What is the key benefit of the scheme?

Page 23: Lecture 9: Flow Control - IIIpwp.gatech.edu/ece-tushar/wp-content/uploads/sites/175/2016/10/L… · Process Node 1 Node 2 t1 t2 N threshold reached Process t3 t4 t5 t6 t7 t8 N threshold

Evaluation Methodology¡What if only single-flit packets

¡How representative is random traffic for real life applications¡Does that take away from the merits of the paper?

February 13, 2017ICN | Spring 2017 | L09: Flow Control - III © Tushar Krishna, School of ECE, Georgia Tech

23