CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees
Zefu Dai, Mark Jarvin and Jianwen Zhu
University of Toronto
23/4/19 University of Toronto 2
Background
Consumer electronics is part of everyday life!
[Figure: SoC, memory controller (Mem Contr.), DRAM]
Background
A portable media player SoC example
[Figure: portable media player SoC example; bandwidth labels as extracted: 6.4 9.6 1.2 164.8 0.09 31.0 156.7 94 MB/s; annotation: 1000x]
Callouts on the figure:
- "Give me 10 KB in 1 µs, please."
- "I want the data NOW!!!"
- "I can only supply a maximum of 6.4 GB every second."
Challenges
Simultaneously satisfy:
- Bandwidth requirements
- Latency requirements
Previous Work
QoS aware
- Bandwidth or latency is heuristically improved
QoS guaranteed
- Guaranteed minimum bandwidth and/or latency
Main Ideas
Start with the Bandwidth Guaranteed Prioritized Queuing (BGPQ) algorithm
- Bandwidth guarantee
Improve it using the Credit Borrow and Repay (CBR) mechanism
- Minimum latency guarantee
Bandwidth Guaranteed Prioritized Queuing
Combine the benefits of Priority Queuing and Weighted Fair Queuing
- Credit-based Weighted Fair Queuing
- Prioritized service for residual bandwidth allocation
Residual bandwidth:
- The bandwidth assigned to one user that is unused at a specific point in time
BGPQ Algorithm
Case 1: all queues are busy
- No residual bandwidth
- Acts as WFQ
[Figure: queues Q0, Q1, Q2 (bandwidth shares 50%, 20%, 30%), multiplexer, BGPQ scheduler, shared resource]
Initial state: everybody has a credit of zero (Q0 = 0.0, Q1 = 0.0, Q2 = 0.0).
Step 1: calculate the dynamic credit for each queue (Q0 = 0.5, Q1 = 0.2, Q2 = 0.3).
Step 2: turn on the switch box and transfer data from the granted queue.
Step 3: subtract 1 from the credit of the granted queue (Q0 = -0.5, Q1 = 0.2, Q2 = 0.3).
One scheduling cycle is done! Sum of credits = 0!
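The "sum of credits = 0" observation follows directly from the credit bookkeeping in one cycle. A short derivation, using notation that does not appear in the slides and is assumed here for illustration: $c_i$ is the credit of queue $i$ before the cycle, $c_i'$ the credit after it, $w_i$ its bandwidth share, and $g$ the granted queue. The bracket $[i = g]$ is 1 for the granted queue and 0 otherwise.

```latex
\begin{align*}
\sum_i w_i &= 1 \qquad (\text{here } 0.5 + 0.2 + 0.3), \\
c_i' &= c_i + w_i - [\,i = g\,], \\
\sum_i c_i' &= \sum_i c_i + \sum_i w_i - 1 = \sum_i c_i .
\end{align*}
```

Since all credits start at zero, the sum stays at zero after every cycle; in the example, -0.5 + 0.2 + 0.3 = 0. The same argument holds when an unused share is handed to another queue, because the total credit added per cycle is still 1.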
BGPQ Algorithm
Case 2: some queues are empty
- Has residual bandwidth
- Prioritized service on residual bandwidth
[Figure: queues Q0, Q1, Q2 (bandwidth shares 50%, 20%, 30%), multiplexer, BGPQ scheduler, shared resource; priority: Q0 > Q1 > Q2]
Before the new scheduling cycle: Q1 is empty (Q0 = -0.5, Q1 = 0.2, Q2 = 0.3).
Step 1: calculate a dynamic credit for each queue; the credit of an empty queue remains unchanged (Q0 = 0.0, Q1 = 0.2, Q2 = 0.6).
Step 2: allocate the residual bandwidth to the non-empty queue with the highest priority (Q0 = 0.2, Q1 = 0.2, Q2 = 0.6).
Step 3: transfer data from the granted queue.
Step 4: subtract 1 from the credit of the granted queue (Q0 = 0.2, Q1 = 0.2, Q2 = -0.4).
One scheduling cycle is done! Sum of credits = 0!
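The two BGPQ cases can be replayed in a few lines of Python. This is only a sketch of the steps as the slides describe them, not the authors' implementation: the function name `bgpq_cycle`, the use of exact fractions, and the rule that ties in credit are broken by priority are all assumptions made for illustration.

```python
from fractions import Fraction


def bgpq_cycle(credits, shares, busy, priority):
    """Run one BGPQ scheduling cycle; return (granted queue, new credits).

    credits, shares, busy are indexed by queue; priority lists queue
    indices from highest to lowest priority (assumed representation).
    """
    credits = list(credits)
    # Non-empty queues earn their share; the share of an empty queue
    # becomes residual bandwidth and its credit stays unchanged.
    residual = Fraction(0)
    for q, share in enumerate(shares):
        if busy[q]:
            credits[q] += share
        else:
            residual += share
    active = [q for q in priority if busy[q]]
    if not active:
        return None, credits
    # The residual goes to the non-empty queue with the highest priority.
    credits[active[0]] += residual
    # Grant the non-empty queue with the highest credit. max() keeps the
    # first maximum, so ties fall to the higher priority (an assumption).
    granted = max(active, key=lambda q: credits[q])
    # Charge the granted queue for one transfer.
    credits[granted] -= 1
    return granted, credits


shares = [Fraction(5, 10), Fraction(2, 10), Fraction(3, 10)]
priority = [0, 1, 2]

# Case 1: all queues busy. Q0 is granted; credits become -1/2, 1/5, 3/10.
granted1, credits1 = bgpq_cycle([Fraction(0)] * 3, shares, [True, True, True], priority)

# Case 2: Q1 is empty. Q2 is granted; credits become 1/5, 1/5, -2/5.
granted2, credits2 = bgpq_cycle(credits1, shares, [True, False, True], priority)
```

The credits after each call match the values shown for Case 1 and Case 2, and their sum is zero in both.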
BGPQ Advantages
BGPQ = WFQ + PQ
- Bandwidth guarantee
- Prioritized access to residual bandwidth
Low implementation cost:
- 3 adders for credit calculation
- 1 comparator tree to find the highest dynamic credit
BGPQ Disadvantage
Low-latency, low-bandwidth requirement class:
- No minimum latency guarantee
Minimum latency:
- No need to wait for any request that has lower priority
Latency Problem of BGPQ
[Figure panels: "Example" and "Optimal Scheduling"]
Credit Borrow and Repay Mechanism
Borrow
- Allow the low-latency requirement class to borrow the scheduling opportunity from other classes
Repay
- Return the credit later when convenient
CBR Mechanism
Case 3: Credit Borrow
- Maintain a debt queue (DebtQ) for Q0: a FIFO of borrowed IDs
[Figure: queues Q0, Q1, Q2 (bandwidth shares 10%, 20%, 70%), DebtQ, multiplexer, CBR scheduler, shared resource; priority: Q0 > Q1 > Q2]
Step 1: calculate the dynamic credits and allocate the residual bandwidth (Q0 = 0.3, Q1 = 0.0, Q2 = 0.7).
Step 2: re-assign the scheduling opportunity to Q0 and record the borrowed ID.
Step 3: transfer data.
Step 4: subtract 1 from the originally scheduled queue (Q0 = 0.3, Q1 = 0.0, Q2 = -0.3).
One scheduling cycle is done! Sum of credits = 0!
CBR Mechanism
Case 4: Credit Repay
- It is time to repay the credit
[Figure: queues Q0, Q1, Q2 (bandwidth shares 10%, 20%, 70%), DebtQ, multiplexer, CBR scheduler, shared resource; priority: Q0 > Q1 > Q2]
Initial state: Q0 is empty but has debt. It will ‘appear’ to be non-empty (Q0 = 0.3, Q1 = 0.0, Q2 = -0.3).
Step 1: calculate the dynamic credits and allocate the residual bandwidth (Q0 = 0.6, Q1 = 0.0, Q2 = 0.4).
Step 2: return the scheduling opportunity and clear the DebtQ.
Step 3: transfer data.
Step 4: subtract 1 from the scheduled queue (Q0 = -0.4, Q1 = 0.0, Q2 = 0.4).
One scheduling cycle is done! Sum of credits = 0!
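The borrow and repay cases can be sketched on top of the same credit bookkeeping. The class below is a minimal illustration under stated assumptions, not the authors' design: the name `CbrScheduler`, its fields, and the tie-break by priority are hypothetical. Repaying when the DebtQ is full is taken from the repay conditions listed later in the deck. The sketch does not model a lender that is itself empty at repay time.

```python
from collections import deque
from fractions import Fraction


class CbrScheduler:
    """Credit scheduler in which the top-priority queue may borrow slots (sketch)."""

    def __init__(self, shares, priority, debt_depth):
        self.shares = list(shares)        # bandwidth share per queue, summing to 1
        self.priority = list(priority)    # queue indices, highest priority first
        self.borrower = self.priority[0]  # Q0 in the slides
        self.credits = [Fraction(0)] * len(self.shares)
        self.debt = deque()               # DebtQ: FIFO of lender queue IDs
        self.debt_depth = debt_depth

    def cycle(self, busy):
        """Run one cycle; return (queue that transfers data, queue that is charged)."""
        b = self.borrower
        # An empty borrower that still has debt 'appears' non-empty.
        appears = list(busy)
        appears[b] = busy[b] or bool(self.debt)
        active = [q for q in self.priority if appears[q]]
        if not active:
            return None, None
        # Dynamic credits: active queues earn their share; the residual share
        # of inactive queues goes to the highest-priority active queue.
        residual = Fraction(0)
        for q, share in enumerate(self.shares):
            if appears[q]:
                self.credits[q] += share
            else:
                residual += share
        self.credits[active[0]] += residual
        # Highest credit wins; ties fall to the higher priority (assumption).
        scheduled = max(active, key=lambda q: self.credits[q])
        served = scheduled
        full = len(self.debt) >= self.debt_depth
        if scheduled != b and busy[b] and not full:
            # Borrow: the borrower takes the slot, the lender's ID is recorded.
            self.debt.append(scheduled)
            served = b
        elif scheduled == b and self.debt and (full or not busy[b]):
            # Repay: the borrower's own slot is handed to the oldest lender.
            served = self.debt.popleft()
        # The originally scheduled queue is the one that is charged.
        self.credits[scheduled] -= 1
        return served, scheduled


sched = CbrScheduler([Fraction(1, 10), Fraction(2, 10), Fraction(7, 10)], [0, 1, 2], debt_depth=16)
borrow = sched.cycle([True, False, True])   # Case 3: Q0 is served, Q2 is charged
repay = sched.cycle([False, False, True])   # Case 4: Q2 is served, Q0 is charged
```

After the first call the credits are 3/10, 0, -3/10 and the DebtQ holds Q2. After the second call they are -2/5, 0, 2/5 and the DebtQ is empty, as in Cases 3 and 4.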
CBR Mechanism
Minimum latency guarantee using CBR
- No need to wait for requests in other queues
Worst case: Q0 is not empty while the DebtQ is full
- No minimum latency guarantee in this case
Implementation in FPGA
CBR MPMC top-level diagram
- Number of ports configurable at instantiation time
- Priority and bandwidth programmable at run time
Implementation in FPGA
[Figures: credit calculation circuit; sorting network and CBR]
Implementation Cost
8-port CBR-MPMC with a 16-deep DebtQ
- Xilinx Virtex-5 XC5VLX50T
- Speedy DDR backend memory controller
Evaluation
Simulation framework
- Cycle-accurate C model of the MPMC
- Simple close-page DDR memory model
- Trace capturing and converting method
Evaluation
CPU workload trace file (from B. Jacob)
- Cache simulation on the standard SPEC2000 integer benchmark
Irregular and low bandwidth requirement: 0.4 memory transactions per 1k instructions.
Evaluation
Accelerator workload
- ALPBench suite of parallel multimedia applications
Periodically repeated access pattern, high bandwidth requirement: 18.3 memory transactions per 1k instructions.
Results
BGPQ scheduler
- Latency: number of clock cycles
- Bandwidth: number of memory transactions per 1k clock cycles
Results
CBR scheduler with a 16-deep DebtQ
Impact of DebtQ Size
Repay conditions:
- DebtQ is full
- Q0 is empty
[Figure: queues Q0, Q1, Q2 (bandwidth shares 10%, 20%, 70%; credits Q0 = 0.6, Q1 = 0.0, Q2 = 0.4), DebtQ, multiplexer, CBR scheduler, shared resource; priority: Q0 > Q1 > Q2]
When the DebtQ is full, the remaining requests in Q0 will not be served with the minimum latency guarantee!
Impact of DebtQ Size
How big is enough for the DebtQ?
- Determined by the instantaneous bandwidth requirement
Irregular access pattern means:
- Large range of DebtQ size requirement
Tradeoff
- Resource efficiency vs. performance
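The effect of the DebtQ depth on a burst of Q0 requests can be explored with a toy model. The sketch below is not the authors' cycle-accurate simulator. It assumes shares of 10%, 20%, 70%, priority Q0 > Q1 > Q2, Q1 and Q2 permanently busy, one transfer per scheduling cycle, and a repayment whenever Q0 is scheduled while the DebtQ is full. The function name `cycles_to_drain` is hypothetical.

```python
from collections import deque
from fractions import Fraction

SHARES = (Fraction(1, 10), Fraction(2, 10), Fraction(7, 10))


def cycles_to_drain(burst, debt_depth, shares=SHARES):
    """Count scheduling cycles until a burst of `burst` Q0 requests is served.

    Assumes Q1 and Q2 always have requests, so there is no residual bandwidth.
    """
    credits = [Fraction(0)] * 3
    debt = deque()              # DebtQ: FIFO of lender queue IDs
    pending, cycles = burst, 0
    while pending:
        cycles += 1
        for q in range(3):
            credits[q] += shares[q]
        # Highest credit wins; ties fall to the lower index, i.e. the
        # higher priority (an assumption).
        scheduled = max(range(3), key=lambda q: credits[q])
        served = scheduled
        full = len(debt) >= debt_depth
        if scheduled != 0 and not full:
            debt.append(scheduled)    # borrow the slot for Q0
            served = 0
        elif scheduled == 0 and debt and full:
            served = debt.popleft()   # DebtQ full: Q0's own slot repays a lender
        credits[scheduled] -= 1
        if served == 0:
            pending -= 1
    return cycles


shallow = cycles_to_drain(burst=4, debt_depth=2)
deep = cycles_to_drain(burst=4, debt_depth=16)
```

In this model a DebtQ at least as deep as the burst lets all four requests go out in four consecutive cycles. A shallower DebtQ fills up partway through the burst, and the remaining requests wait until debt has been repaid. This is the resource versus performance tradeoff named above.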
Results
Impact of debt queue size
Conclusions
The CBR scheduler can provide minimum bandwidth and latency guarantees
Low implementation cost and power consumption
We expect its successful use in a wide range of multimedia applications
Questions?