1 ECE 5900 spring 05 1
Memory Access Scheduling
ECE 5900 Computer Engineering Seminar
Ying Xu Mar 4, 2005
Instructor: Dr. Chigan
Outline
- Introduction
- Modern DRAM architecture
- Memory access scheduling
  - Structure of access scheduler
  - Scheduling policies
- Experimental results
  - First-ready scheduling
  - Aggressive reordering
- Conclusions
Introduction
- Memory chip bandwidth has increased dramatically
  - DDR2, SDRAM
- Media processors
  - Streaming memory reference patterns
  - Memory bandwidth bottleneck
Intro (contd)
- Pipelining memory accesses
  - Maximizes memory bandwidth
  - Sequential accesses to different rows of the same bank cannot be pipelined
- Memory access scheduling
  - Reorders memory operations
    - Bank precharge, row activation, column access
  - Memory references are completed out of order
Intro (contd)
Characteristics of DRAM architecture
- DRAMs are not truly random access devices
- 3-dimensional memories
  - Bank
  - Row
  - Column
- 3 operations
  - Bank precharge
  - Row activation
  - Column access
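As a minimal sketch of how the three operations relate to the state of a bank, the following Python snippet returns the operations a reference needs before its data can be transferred. The `Bank` class and `operations_for` function are hypothetical names introduced only for illustration, and timing is ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bank:
    # Row currently held in the row buffer; None means the bank is precharged.
    active_row: Optional[int] = None


def operations_for(bank: Bank, row: int) -> List[str]:
    """Return the DRAM operations needed to access `row` in `bank`."""
    if bank.active_row == row:
        # The wanted row is already active: only a column access is needed.
        return ["column access"]
    if bank.active_row is None:
        # The bank is precharged: activate the row, then access the column.
        return ["row activation", "column access"]
    # A different row is active: it must be closed first.
    return ["bank precharge", "row activation", "column access"]
```

A reference to the active row needs one operation, while a reference to a different row of the same bank needs all three, which is why the order of references matters.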
DRAM organization
Resource constraints of DRAMs
- DRAM resources
  - Internal banks
  - A single set of address lines
  - A single set of data lines
- Different operations place different demands on these resources
Bank state
Memory access scheduling
- Process of ordering DRAM operations
  - Subject to resource constraints
- Simplest policy: oldest pending reference first
  - Inefficient
  - The DRAM may not be ready for the oldest reference
  - Leaves available resources idle
- A more sophisticated scheduling algorithm is needed
Memory access scheduler structure
Memory access scheduling policies
Memory access scheduling algorithm
- Combination of the policies used by the precharge manager, row arbiter, column arbiter, and address arbiter
- The address arbiter decides which of the selected precharge, row, and column operations to perform
  - Choices: in-order, priority, precharge operation first, row operation first, column operation first
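The "operation first" choices of the address arbiter can be sketched as follows. This is only an illustration under assumptions: the `arbitrate` function, the policy strings, and the `(age_rank, kind)` tuple layout are hypothetical, and falling back to the oldest candidate when no operation of the preferred kind is available is an assumed tie-break, not something the slides specify.

```python
from typing import List, Optional, Tuple

# Each candidate is (age_rank, kind); a lower age_rank means an older reference.
Candidate = Tuple[int, str]

# Hypothetical policy names mapped to the operation kind they favour.
PREFERRED_KIND = {
    "precharge-first": "precharge",
    "row-first": "row",
    "column-first": "column",
}


def arbitrate(policy: str, candidates: List[Candidate]) -> Optional[Candidate]:
    """Pick one operation to place on the shared address lines."""
    if not candidates:
        return None
    preferred = PREFERRED_KIND[policy]
    favoured = [c for c in candidates if c[1] == preferred]
    # Oldest operation of the preferred kind, else (assumed) the oldest overall.
    return min(favoured or candidates)
```

For example, given one candidate of each kind, `arbitrate("row-first", ...)` selects the row operation even when the precharge candidate belongs to an older reference.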
Experimental setup
- Streaming media processors are preferred
  - Streams lack temporal locality
  - Stream transfer bandwidth drives processor performance
- The Imagine stream processor is simulated
  - Frequency: 500 MHz
  - DRAM frequency: 125 MHz
  - Peak system bandwidth: 2 GB/s
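As a rough sanity check on these figures, and assuming 1 GB = 10^9 bytes (an assumption, since the slide does not say), the peak bandwidth can be expressed in bytes per clock cycle:

```latex
\[
\frac{2 \times 10^{9}\ \text{B/s}}{125 \times 10^{6}\ \text{cycles/s}} = 16\ \text{B per DRAM cycle},
\qquad
\frac{2 \times 10^{9}\ \text{B/s}}{500 \times 10^{6}\ \text{cycles/s}} = 4\ \text{B per 500 MHz cycle}
\]
```

Sustaining the peak therefore means moving 16 bytes in every DRAM cycle, so any cycle spent waiting on a precharge or row activation with no other transfer in flight lowers the sustained bandwidth.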
Experimental setup (contd)
- Benchmarks and media processing applications
In-order scheduling
- In-order access scheduler
  - No access reordering
  - A column access is performed only for the oldest pending reference; the same holds for bank precharge and row activation
- Baseline
First-ready scheduling
- Uses the ordered priority scheme for all units
  - Schedules an operation for the oldest pending reference that satisfies the resource and timing constraints
- Benefits
  - Accesses targeting other banks can be performed while waiting for a precharge or row activation
  - Parallelism: multiple references in progress
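The difference between in-order and first-ready scheduling can be illustrated with a toy cycle-level model. Everything in it is an assumption made for illustration: the latencies `T_PRE`, `T_ACT`, and `T_COL` are invented, only one operation is issued per cycle, banks start precharged, and first-ready is simplified to "the oldest reference of each bank, tried in age order". It is not the simulator used in the paper.

```python
from typing import List, Optional, Tuple

# Hypothetical latencies in cycles, chosen only for illustration.
T_PRE, T_ACT, T_COL = 3, 3, 1


def simulate(refs: List[Tuple[int, int]], policy: str, n_banks: int = 2) -> int:
    """Return the cycles needed to complete all (bank, row) references."""
    pending = list(refs)
    active: List[Optional[int]] = [None] * n_banks  # active row per bank
    ready_at = [0] * n_banks                         # cycle when each bank is free
    cycle = 0
    while pending:
        if policy == "in-order":
            candidates = pending[:1]                 # only the oldest reference
        else:                                        # "first-ready"
            seen, candidates = set(), []
            for ref in pending:                      # oldest reference per bank
                if ref[0] not in seen:
                    seen.add(ref[0])
                    candidates.append(ref)
        for ref in candidates:
            bank, row = ref
            if ready_at[bank] > cycle:
                continue                             # bank busy, try a younger one
            if active[bank] == row:                  # column access completes it
                ready_at[bank] = cycle + T_COL
                pending.remove(ref)
            elif active[bank] is None:               # row activation
                active[bank] = row
                ready_at[bank] = cycle + T_ACT
            else:                                    # bank precharge
                active[bank] = None
                ready_at[bank] = cycle + T_PRE
            break                                    # one operation per cycle
        cycle += 1
    return max(cycle, max(ready_at))


refs = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(simulate(refs, "in-order"), simulate(refs, "first-ready"))
```

With these assumed timings, the four references that alternate between two banks take 22 cycles in order and 13 cycles with first-ready scheduling, because the precharge and activation of one bank overlap with work on the other.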
Experimental results
- Sustained memory bandwidth increased by about 79% on microbenchmarks
Experimental results
- Sustained bandwidth increased by about 17% on media processing applications
Experimental results
- Sustained memory bandwidth increased by about 40% on application traces
Aggressive reordering
- Drawback of first-ready scheduling
  - It precharges a bank whenever the oldest pending reference targets a row other than the active row, even if multiple references to the active row are still pending
- Aggressive reordering further increases sustained memory bandwidth
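The drawback can be made concrete by counting row activations for a single bank. The sketch below is illustrative only: `count_activations` and its `reorder` flag are hypothetical, timing is ignored, and "reordering" here simply means serving every pending reference to the active row before switching rows.

```python
from typing import List, Optional


def count_activations(rows: List[int], reorder: bool) -> int:
    """Count row activations needed to serve references to one bank."""
    pending = list(rows)
    active: Optional[int] = None   # None means no row is open yet
    activations = 0
    while pending:
        if reorder and active in pending:
            # Serve a pending reference to the open row, regardless of its age.
            pending.remove(active)
            continue
        oldest = pending.pop(0)
        if oldest != active:
            # Oldest reference targets another row: close the bank and reopen.
            activations += 1
            active = oldest
    return activations


rows = [0, 1, 0, 1, 0, 1]
print(count_activations(rows, reorder=False), count_activations(rows, reorder=True))
```

For a stream that alternates between rows 0 and 1, serving strictly by age activates a row for every reference (6 activations), while serving all references to the open row first needs only 2.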
Possible reordering scheduling algorithm policies
- Large range of possible memory access schedulers
- Four representative schedulers
Experimental results
- Bandwidth improved by 106-144% on benchmarks
Experimental results
- Bandwidth improved by 27-30% on media processing applications
Experimental results
- Bandwidth improved by 85-93% on application traces
Row-first policy vs. column-first policy
- Address arbiter
  - Row-first: always selects a row operation first
  - Column-first: always selects a column operation first
- Little difference across all benchmarks
  - Exception: FFT
- The difference has less to do with the scheduling algorithm than with the characteristics of the benchmark itself
  - FFT is the most sensitive to stream load latency
  - The col/open policy allows a store stream to delay load streams
Open or closed precharge policy?
- Closed precharge policy
  - Banks are precharged as soon as there are no pending references to the active row
- Open precharge policy
  - Banks are precharged only when there are no pending references to the active row and there are pending references to other rows of the same bank
- The difference between the open and closed precharge policies is slight
- Benchmarks with random access patterns prefer the closed precharge policy
  - Little reference locality
  - No benefit to keeping a row open
- FFT prefers the open precharge policy
  - Numerous accesses to each row
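The two precharge policies differ only in what happens once no pending reference targets the active row. A minimal sketch of that decision, using a hypothetical `should_precharge` function and policy strings that are not taken from the paper:

```python
from typing import List, Optional


def should_precharge(policy: str, active_row: Optional[int],
                     pending_rows: List[int]) -> bool:
    """Decide whether a bank should be precharged now.

    `pending_rows` holds the rows targeted by pending references to this bank.
    """
    if active_row is None:
        return False                 # bank is already precharged
    if active_row in pending_rows:
        return False                 # the open row still has work to do
    if policy == "closed":
        return True                  # close as soon as the row is unreferenced
    if policy == "open":
        return len(pending_rows) > 0  # close only if another row is waiting
    raise ValueError(f"unknown policy: {policy}")
```

With an empty reference queue, the closed policy precharges the bank immediately and the open policy leaves the row active, which pays off only if a later reference returns to the same row.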
Effect of bank buffer size
- Row/closed scheduling algorithm
Conclusions
- Memory access scheduling greatly increases bandwidth utilization
  - Buffering memory references
  - Accessing internal banks in parallel
  - Maximizing the number of column accesses per row access
- First-ready scheduling algorithm
  - 79% bandwidth improvement on microbenchmarks, 40% on application traces
- Aggressive reordering algorithm
  - 144% bandwidth improvement on benchmarks, 30% on media processing applications, 93% on application traces
Conclusions
- The closed precharge policy is preferred by most benchmarks
- Little difference in performance between the row-first and column-first policies
- For latency-sensitive applications, scheduling loads ahead of stores is preferred
- Banks are precharged as soon as the last column reference to an active row is completed
Paper reference
- Scott Rixner, William J. Dally, Ujval J. Kapasi, Peter Mattson, John D. Owens. Memory access scheduling. Proceedings of the 27th Annual International Symposium on Computer Architecture, ACM SIGARCH Computer Architecture News, Volume 28, Issue 2, May 2000.
Thank you!