Programmable Packet Scheduling - Stanford University · 2019-03-11
Programmable Packet Scheduling
Stephen Ibanez, Nick McKeown, Gordon Brebner, Anthony Dalleggio, Anirudh Sivaraman
2/7/2019
![Page 2: Programmable Packet Scheduling - Stanford University · 2019-03-11 · Weighted Fair Queueing (WFQ) ˃Bandwidth guarantees ˃Fully utilize capacity ˃Common Implementations: Deficit](https://reader034.fdocuments.in/reader034/viewer/2022042200/5ea02e796842616ae0446979/html5/thumbnails/2.jpg)
What is Packet Scheduling?
[Diagram: a 3 x 3 switch, with a scheduler at each output port.]
Is Packet Scheduling Important?
Web Search Workload
˃ Goal: Minimize Flow Completion Time
˃ Shortest Remaining Processing Time (SRPT)
[1] Alizadeh, Mohammad, et al. "pFabric: Minimal near-optimal datacenter transport." ACM SIGCOMM, 2013.
[Figure 8 from [1]: Web search workload, normalized FCT vs. load (0.2-0.8) for TCP-DropTail, DCTCP, PDQ, pFabric, and Ideal. Panels: (a) (0, 100KB] average; (b) (0, 100KB] 99th percentile; (c) (10MB, ∞) average. TCP-DropTail does not appear in part (b) because its performance is outside the plotted range, and the y-axis for part (c) has a different range than the other plots.]
[Figure 9 from [1]: Data mining workload, same panels as Figure 8. TCP-DropTail does not appear in part (b) because its performance is outside the plotted range.]
SRPT: >10x improvement in normalized FCT
Weighted Fair Queueing (WFQ)
˃ Bandwidth guarantees
˃ Fully utilize capacity
˃ Common Implementations:
Deficit Round Robin (DRR)
Start Time Fair Queueing (STFQ)
Stochastic Fair Queueing (SFQ)
˃ Not ideal for latency-sensitive traffic
[Diagram: WFQ sharing link bandwidth across traffic classes Web, Video, Big Data, and Backup with weights 0.4, 0.3, 0.1, and 0.2.]
Strict Priority
˃ For latency sensitive traffic
˃ Commonly supported today
˃ Problems:
Only up to 8 priority levels per output port
Starvation of low priority traffic
[Diagram: three strict-priority queues: Control (HI), Memcached (MED), Other (LOW).]
Windowed Strict Priority
˃ Goal: Prioritize but avoid starvation
˃ Over a time window of length T, serve at most N packets from class i if there are lower-priority packets waiting to be served
˃ Need to build a new switch ASIC for this
[Diagram: windowed strict priority over Control (HI), Memcached (MED), and Other (LOW) queues, with window length T and N = 3.]
Modern Programmable Switch
[Diagram: Programmable Parser → Programmable Ingress Pipeline → Traffic Manager (still fixed function) → Programmable Egress Pipeline → Programmable Deparser.]
Motivation: Be able to deploy new scheduling policies in production networks
Questions:
˃ Can we find a programmable abstraction for scheduling policies?
˃ Can our abstraction be efficiently implemented in hardware?
Packet Scheduler
[Diagram: scheduling logic in front of 100s of queues at 5 Tbps; one decision every 64B / 5 Tbps ≈ 100 ps.]
Observations:
1. Switches are great at making per-packet decisions
2. Virtually all scheduling policies can decide a packet's scheduling priority before it is queued
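The ~100 ps decision budget quoted above is simple arithmetic; a quick sketch to check it:

```python
# Back-of-the-envelope check of the decision budget: at 5 Tb/s, minimum-size
# 64-byte packets can arrive roughly every 100 ps.
link_rate_bps = 5e12        # aggregate switch capacity: 5 Tb/s
packet_bits = 64 * 8        # minimum-size 64-byte packet

decision_time_s = packet_bits / link_rate_bps
print(f"{decision_time_s * 1e12:.1f} ps per decision")  # 102.4 ps per decision
```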
Push-In-First-Out (PIFO) Model [1]
˃ Key constraint: packets can’t change rank after insert
˃ Clear separation of fixed and programmable logic
˃ Can implement virtually every scheduling policy that we care about today
[Diagram: programmable rank computation feeds a fixed PIFO holding ranks 0 3 6 7 8; a new packet with rank 4 is pushed into its sorted position; packets are always dequeued from the head.]
[1] A. Sivaraman, et al. "Programmable packet scheduling at line rate." ACM SIGCOMM 2016
PIFO Examples
FIFO: p.rank = now
[Diagram: PIFO holding ranks 0 1 2 3; the new packet is stamped with rank 4 (its arrival time) and joins the tail.]
Strict Priority: p.rank = p.tos
[Diagram: PIFO holding ranks 0 0 1 1; a new packet with rank 0 is pushed in ahead of the rank-1 packets.]
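The two examples above differ only in the rank function. A minimal software model of a PIFO with a pluggable rank computation (illustrative only; the names `PIFO`, `rank_fn`, and the dict-based packets are assumptions, not from the talk):

```python
import heapq
from itertools import count

class PIFO:
    """Push-In-First-Out queue: packets are inserted according to a
    programmable rank and always dequeued from the head (lowest rank)."""
    def __init__(self, rank_fn):
        self.rank_fn = rank_fn      # the programmable rank computation
        self._heap = []
        self._seq = count()         # FIFO tie-break among equal ranks

    def enqueue(self, pkt):
        heapq.heappush(self._heap, (self.rank_fn(pkt), next(self._seq), pkt))

    def dequeue(self):
        return heapq.heappop(self._heap)[2]

# FIFO: rank = arrival time (modeled with a monotonically increasing counter)
arrival = count()
fifo = PIFO(lambda p: next(arrival))

# Strict priority: rank = the packet's ToS field
prio = PIFO(lambda p: p["tos"])
prio.enqueue({"tos": 1, "id": "low"})
prio.enqueue({"tos": 0, "id": "high"})
print(prio.dequeue()["id"])  # high
```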
WFQ using PIFO
˃ If rank computations can utilize state …
˃ Start time fair queueing (STFQ)
Rank computation (STFQ):
    f = flow(p)
    p.start = max(T[f].finish, virtual_time)
    T[f].finish = p.start + p.len
    p.rank = p.start
[Diagram: the rank computation feeds a fixed PIFO holding ranks 0 1 2 3 5; a new packet with rank 4 is inserted in sorted order.]
Shortest Remaining Processing Time with PIFO
˃ Packet rank set by end host
Rank computation:
    f = flow(p)
    p.rank = f.rem_size
[Diagram: PIFO scheduler holding ranks 1 3 7 8 9.]
Fine-grained priorities
˃ Shortest Flow First (SFF)
˃ Least Slack Time First (LSTF)
˃ Earliest Deadline First (EDF)
˃ Shortest Remaining Processing Time (SRPT)
˃ Service Curve Earliest Deadline First (SCED)
˃ Least Attained Service (LAS)
Hierarchical Scheduling
˃ Hierarchical Packet Fair Queueing (HPFQ)
˃ Cannot be expressed with a single PIFO
[Diagram: a WFQ tree. The root WFQ splits 0.5/0.5 between a left WFQ over classes a and b (weights 0.01/0.99) and a right WFQ over classes x and y (weights 0.5/0.5).]
Slide credit: Anirudh Sivaraman
Hierarchical Scheduling
˃ Hierarchical Packet Fair Queueing (HPFQ)
[Diagram: a PIFO tree implementing HPFQ. PIFO-Red runs WFQ on a and b; PIFO-Blue runs WFQ on x and y; PIFO-root runs WFQ on Red and Blue, holding one entry (R or B) per enqueued packet.]
Slide credit: Anirudh Sivaraman
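A minimal sketch of the PIFO-tree idea: one PIFO per tree node, where dequeueing pops the root to pick a subtree and then pops that subtree's leaf. The rank arguments stand in for the per-node WFQ computations (hypothetical values, not the talk's implementation):

```python
import heapq

class PIFO:
    """Minimal rank-ordered queue used as a tree node."""
    def __init__(self):
        self._heap, self._seq = [], 0
    def push(self, rank, item):
        heapq.heappush(self._heap, (rank, self._seq, item))
        self._seq += 1
    def pop(self):
        return heapq.heappop(self._heap)[2]

# One leaf PIFO per subtree plus a root PIFO of subtree references.
# Dequeueing first pops the root to pick a subtree, then pops that
# subtree's leaf, so the relative order within a subtree can still change
# after enqueue -- exactly what a single PIFO cannot express.
root = PIFO()
leaves = {"Red": PIFO(), "Blue": PIFO()}

def enqueue(subtree, leaf_rank, root_rank, pkt):
    leaves[subtree].push(leaf_rank, pkt)
    root.push(root_rank, subtree)   # one root entry per enqueued packet

def dequeue():
    return leaves[root.pop()].pop()
```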
Getting Creative
˃ Punish heavy hitters
˃ Prioritize flows that have experienced the most queueing delay at previous hops
˃ WFQ with weights determined by buffer occupancy
Question 2:
Can our scheduling abstraction be efficiently implemented in hardware?
(My focus in this area)
Can we actually implement a PIFO?
˃ Observation: Don't need to perfectly sort all packets
Head packets are the most important – they will be scheduled soon
It's ok if there is some churn in the tail packets
˃ Key Features:
Head is a small & fast sorting element
Tail passes sorted packets to head at line rate (head can never go empty)
[Diagram: a PIFO split into a small sorted Head (ranks 0 1 3 3 4) and a larger Tail (ranks 5 6 6); new packets are compared against the head/tail boundary.]
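The head/tail split can be modeled in a few lines. This is a toy sketch under stated assumptions: packets are plain strings (so rank ties compare cleanly), the head is kept fully sorted, and the refill step sorts the tail wholesale, whereas the hardware design uses skip lists to hand packets to the head at line rate:

```python
import bisect

class HeadTailPIFO:
    """Toy model of the head/tail split: a small fully sorted head plus a
    larger, loosely ordered tail. Real hardware guarantees the head never
    runs dry; the wholesale sort on refill here is a shortcut."""
    HEAD_SIZE = 16

    def __init__(self):
        self.head = []              # (rank, pkt), kept sorted
        self.tail = []              # approximately ordered

    def enqueue(self, rank, pkt):
        if self.head and rank <= self.head[-1][0]:
            bisect.insort(self.head, (rank, pkt))
            if len(self.head) > self.HEAD_SIZE:
                self.tail.append(self.head.pop())   # demote largest rank
        else:
            self.tail.append((rank, pkt))

    def dequeue(self):
        if not self.head:                           # promote from the tail
            self.tail.sort()
            self.head = self.tail[:self.HEAD_SIZE]
            self.tail = self.tail[self.HEAD_SIZE:]
        return self.head.pop(0)[1]
```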
Can we actually implement a PIFO?
˃ Observation: Don't need to perfectly sort all packets
[Diagram: insertions go through a load balancer into parallel deterministic skip lists, each fronted by a register head; a selector picks the minimum-rank register head for removal.]
NetFPGA Implementation
˃ NetFPGA SUME (Virtex-7)
˃ 200 MHz
˃ 10G PIFO
˃ Head: register-based – 16 packets
˃ Tail: deterministic skip lists – 2K packets
˃ BRAM packet storage
[Diagram: classification, policing & drop policy steer packets into buffers 1..N; rank computation assigns each packet a rank before it enters the PIFO (holding ranks 0 3 6 7 8); packet storage holds the payloads.]
Main Takeaways
˃ Programmable abstraction for scheduling policies
˃ Can be efficiently implemented in hardware
˃ Expose scheduling abstraction to data-plane programmers
Questions?
Extra Slides
References
[1] Anirudh Sivaraman, Suvinay Subramanian, Mohammad Alizadeh, Sharad Chole, Shang-Tse Chuang, Anurag Agrawal, Hari Balakrishnan, Tom Edsall, Sachin Katti, Nick McKeown. "Programmable packet scheduling at line rate." Proceedings of the 2016 ACM SIGCOMM Conference https://cs.nyu.edu/~anirudh/pifo-sigcomm.pdf
Dealing with rank wrap
˃ Use larger keys (e.g. 56 bits, wraps every 2 years @ 1 GHz)
˃ Restart at rank=0 once PIFO is empty
˃ Use parallel PIFOs
Once close to wrap, switch to the alternate PIFO & use rank=0
Give the old PIFO strict priority until empty
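The 56-bit figure on this slide is easy to sanity-check: at one rank increment per clock cycle at 1 GHz (1 ns per cycle), the counter wraps after 2^56 ns:

```python
# A 56-bit rank incremented once per 1 ns clock cycle wraps after 2^56 ns,
# which is roughly 2.3 years -- consistent with the slide's claim.
SECONDS_PER_YEAR = 365 * 24 * 3600
wrap_seconds = 2**56 * 1e-9
print(wrap_seconds / SECONDS_PER_YEAR)  # ~2.28 years
```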
Traffic Manager
[Diagram: classification writes packet descriptors into a shared packet buffer; the scheduler pulls one descriptor per clock cycle (1 ns).]
Pipelined Dequeue Logic
[Diagram: classification logic steers packets into FIFOs 1-3, each holding a head packet "a" and a tail packet "b"; a two-stage dequeue pipeline operates on the pipeline state (1a, 2a, 3a), i.e. the current head packets.]
Pipelined Dequeue Logic
[Diagram: while the state (1a, 2a, 3a) is in flight, the pipeline speculatively computes the successor states (1b, 2a, 3a), (1a, 2b, 3a), and (1a, 2a, 3b): speculative execution.]
Pipelined Dequeue Logic
˃ Pros: provides a way to pipeline the dequeue logic, so the dequeue logic can be more complicated and hence more programmable
˃ Cons: complicated implementation; speculative execution requires extra resources and power, which increase dramatically with dequeue pipeline depth
[Diagram: when packet 1a is removed, the matching speculative state (1b, 2a, 3a) is committed and the others are discarded.]
Line Rate Sorting
˃ Don't need to sort all packets!
˃ Observation 1: Ranks increase within a flow
[Diagram: per-flow FIFO queues (Flows A-D), each holding packets in increasing rank order, feed a PIFO; only the head packet of each flow competes for the next slot.]
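Observation 1 is what makes a cheap per-flow scheduler possible: since ranks never decrease within a flow, it suffices to keep one FIFO per flow and compare only the head packets (a sketch; the names and data layout are illustrative):

```python
from collections import deque

flows = {}   # flow id -> FIFO of (rank, pkt); ranks never decrease per flow

def enqueue(flow_id, rank, pkt):
    flows.setdefault(flow_id, deque()).append((rank, pkt))

def dequeue():
    # Only the head of each non-empty per-flow FIFO can hold the global
    # minimum rank, so comparing the heads is enough (raises ValueError
    # when every FIFO is empty).
    best = min((f for f in flows if flows[f]), key=lambda f: flows[f][0][0])
    return flows[best].popleft()[1]
```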
Implementation Concern: Rank Computations
˃ Some rank computations require shared state between PIFO enqueue and dequeue
˃ Problem: state is local to a PISA pipeline stage and cannot be shared
Rank computation (STFQ):
    f = flow(p)
    p.start = max(T[f].finish, virtual_time)
    T[f].finish = p.start + p.len
    p.rank = p.start
On dequeue: virtual_time = p.rank
[Diagram: fixed PIFO holding ranks 0 1 2 3 5; the virtual_time state must be updated from the rank of each dequeued packet.]
Implementation Concern: Rank Computations
Trigger the pipeline on both enqueue and dequeue events:
    bool deq_trigger
    bit<16> deq_rank

    f = flow(p)
    if (deq_trigger):
        virtual_time = deq_rank
    if (enq_trigger):
        p.start = max(T[f].finish, virtual_time)
        T[f].finish = p.start + p.len
        p.rank = p.start
[Diagram: Enqueue and Dequeue events both trigger the rank-computation pipeline in front of the fixed PIFO.]
Beyond Packet Scheduling
˃ Event-driven packet processing
˃ Events in today's architectures: Ingress, Egress, Recirculation
˃ New events: Timer, Enqueue, Dequeue, Drop, Loop, Ingress-to-Egress, Control Plane, Link Status Change
˃ What can you do with events?
Derive congestion signals for AQM
Time-based state updates
Event-triggered network telemetry
Improved load balancing
Offload control-plane functionality to the data plane
Why should we care about packet scheduling?
˃ Lots of different types of traffic w/ different characteristics and requirements
˃ Network operators have a wide range of objectives
˃ Network devices are picking up more functionality
˃ WAN links are expensive → want to make the best use of them by prioritizing traffic
˃ Performance isolation for thousands of VMs per server
Benefits of programmable packet scheduling
˃ Benefits of an unambiguous way to define scheduling algorithms:
Portability
Formal verification and static analysis
Precise way to express customer needs
˃ Benefits of having a programmable scheduler:
Innovation: many possible algorithms can be expressed, versus the small menu to choose from today
Differentiation
Reliability
Network operators can fine-tune for performance
Enqueue Packet 1
[Diagram: p1 passes classification / policing / drop policy into a queue; per-queue rank logic (reading shared State) computes its rank, and the descriptor &p1 is inserted into the PIFO.]
Enqueue Packet 2
[Diagram: p2 is enqueued the same way; its descriptor &p2 is inserted into the PIFO in rank order.]
Enqueue Packet 3
[Diagram: p3 is enqueued; the PIFO now holds the descriptors of p1, p2, and p3 ordered by rank.]
Dequeue Packet 3
[Diagram: the head descriptor &p3 is popped from the PIFO; p3 is read from its queue and sent to the output port.]
Dequeue Packet 2
[Diagram: &p2 is popped next and p2 is transmitted.]
Dequeue Packet 1
[Diagram: &p1 is popped last and p1 is transmitted.]
PIFO Paper ASIC Design [1]
˃ Flow scheduler: choose amongst the head packet of each flow
˃ Rank store: store computed ranks for each flow in FIFO order
˃ PIFO blocks connected in a full mesh
˃ 64-port 10 Gbps shared-memory switch, 1 GHz
˃ 1000 flows, 64K packets
[PIFO block diagram: per-flow rank FIFOs in the rank store (SRAM) feed a flow scheduler (flip-flops) holding one entry per flow (e.g. C:3, B:1, A:0), ordered by increasing rank.]
NetFPGA Prototype (P4 Workshop Demo)
˃ Parallel deterministic skip lists and register-based sorting cache
˃ BRAM-based packet buffer
[Top-level PIFO block diagram: input packets go through classification and rank computation; descriptors with ranks enter the PIFO scheduler (a load balancer feeding parallel skip lists, each with a register cache, and a selector for removal), while packet payloads wait in per-queue buffers until their descriptor is scheduled to the output.]
Approximate pFabric
[Diagram: a two-level PIFO tree approximates pFabric: a root SRPT PIFO schedules across flows while per-flow FCFS PIFOs preserve arrival order within each flow. In the example, the final scheduling order (p1, p0, p2, p3) matches the pFabric scheduling order.]
Next Steps
˃ Support traffic shaping
˃ Formally understand what PIFOs can and can't express
˃ Language abstractions to program PIFO trees