08 – Address Generator Unit (AGU) - isy.liu.se
08 – Address Generator Unit (AGU) Oscar Gustafsson October 4, 2018 0
Today's lecture
• Memory subsystem
• Address Generator Unit (AGU)
Memory subsystem
• Applications may need from kilobytes to gigabytes of memory
• Having large amounts of memory on-chip is expensive
• Accessing memory is costly in terms of power
• Designing the memory subsystem is one of the main challenges when designing for low silicon area and low power
Memory issues
• Memory speed increases more slowly than logic speed
• Impact of memory size
  • Small memory: fast but area inefficient
  • Large memory: slow but area efficient
Comparison: Traditional SRAM vs ASIC memory block
• Traditional memory
  • Single tri-state data bus for read/write
  • Asynchronous operation
• ASIC memory block
  • Separate buses for data input and data output
  • Synchronous operation
Embedded SRAM overview
• Medium to large sized SRAM memories are almost always based on this architecture
• That is, large asynchronous memories should be avoided at all costs!
Best practices for memory usage in RTL code
• Use synchronous memories for all but the smallest memory blocks
• Register the data output as soon as possible
  • A little combinational logic could be ok, but avoid putting a multiplier unit here for example
• Some combinational logic before the inputs to the memory is ok
  • Beware: The physical layout of the chip can cause delays here that you don’t see in the RTL code
Best practices for memory usage in RTL code
• Deassert the enable signal to the memory when you are not reading or writing
  • This will save you a lot of power
ASIC memories
• ASIC synthesis tools are usually not capable of creating optimized memories
• Specialized memory compilers are used for this task
• Conclusion: You can’t implement large memories in VHDL or Verilog, you need to instantiate them
  • An inferred memory will be implemented using standard cells, which is very inefficient (10x larger than an instantiated memory and much slower)
Inferred vs instantiated memory
// Inferred memory
reg [31:0] mem [511:0];
always @(posedge clk) begin
  if (enable) begin
    if (write) begin
      mem[addr] <= writedata;
    end else begin
      data <= mem[addr];
    end
  end
end
// Instantiated memory
SPHS9gp_512x32m4d4_bL dm(
  .Q(data),
  .A(addr),
  .CK(clk),
  .D(writedata),
  .EN(enable),
  .WE(write));
Memory design in a core
• Memory is not located in the core
• Memory address generation is in the core
• Memory interface is in the core
• This makes it easier to change the memories used by a design where source code is not available.
Scratch pad vs Cache memories
• Scratch pad memory
  • Simpler, cheaper, and uses less power
  • More deterministic behavior
  • Suitable for embedded/DSP
  • May exist in separate address spaces
Scratch pad vs Cache memories
• Cache memory
  • Consumes more power
  • Cache misses cost extra cycles and make execution time uncertain
  • Suitable for general computing, general DSP
  • Global address space
  • Hides complexity
Selecting scratch pad vs cache memory
            | Low addressing complexity                     | High addressing complexity
Cache       | If you are lazy                               | Good candidate when aiming for quick TTM
Scratch pad | Good candidate when aiming for low power/cost | Difficult to implement and analyze
The cost of caches
• More silicon area (tag memory and memory area)
• More power (need to read both tag and memory area at the same time)
  • If low latency is desired, several ways in a set associative cache may be read simultaneously as well
• Higher verification cost
  • Many potential corner cases when dealing with cache misses
Address generation
• Regardless of whether cache or scratch pad memory is used, address generation must be efficient
• Key characteristic of most DSP applications:
  • Addressing is deterministic
Typical AGU location
Basic AGU functionality
[Liu2008]
AGU Example
Addressing mode             | Operation
Memory direct               | A ⇐ Immediate data
Address register + offset   | A ⇐ AR + Immediate data
Register indirect           | A ⇐ RF
Register + offset           | A ⇐ RF + Immediate data
Addr. reg. post increment   | A ⇐ AR; AR ⇐ AR + 1
Addr. reg. pre decrement    | AR ⇐ AR - 1; A ⇐ AR
Address register + register | A ⇐ RF + AR
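The addressing modes in this table can be sketched as a small behavioral model. This is only an illustration, not the hardware from the slides: the class name `Agu`, the mode strings, and the argument names `imm` and `rf` are hypothetical.

```python
class Agu:
    """Behavioral model of the addressing modes listed above (illustrative only)."""

    def __init__(self, ar=0):
        self.ar = ar  # address register AR

    def address(self, mode, imm=0, rf=0):
        """Return the address A for one access; update AR when the mode requires it."""
        if mode == "memory_direct":       # A <= immediate data
            return imm
        if mode == "ar_offset":           # A <= AR + immediate data
            return self.ar + imm
        if mode == "register_indirect":   # A <= RF
            return rf
        if mode == "register_offset":     # A <= RF + immediate data
            return rf + imm
        if mode == "post_increment":      # A <= AR; AR <= AR + 1
            a = self.ar
            self.ar += 1
            return a
        if mode == "pre_decrement":       # AR <= AR - 1; A <= AR
            self.ar -= 1
            return self.ar
        if mode == "ar_plus_register":    # A <= RF + AR
            return rf + self.ar
        raise ValueError(f"unknown addressing mode: {mode}")
```

Note the ordering difference: post increment uses the old AR as the address, while pre decrement updates AR first and then uses the new value.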
Modulo addressing
• Most general solution:
  • Address = TOP + AR % BUFFERSIZE
  • Modulo operation too expensive in AGU
    – (Unless BUFFERSIZE is a power of two.)
• More practical:
  • AR = AR + 1
  • If AR is more than BOT, correct by setting AR to TOP
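To see why the power-of-two case is cheap, here is a minimal sketch comparing the general formula with a bit-mask version. The function names are hypothetical; the point is only that `AR % BUFFERSIZE` reduces to keeping the low bits of AR when BUFFERSIZE is a power of two, so no divider is needed.

```python
def modulo_address(top, ar, buffersize):
    """General form from the slide: TOP + (AR % BUFFERSIZE)."""
    return top + ar % buffersize


def modulo_address_pow2(top, ar, buffersize):
    """Power-of-two special case: the modulo becomes a bit mask."""
    if buffersize <= 0 or buffersize & (buffersize - 1) != 0:
        raise ValueError("buffersize must be a power of two")
    return top + (ar & (buffersize - 1))
```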
AGU Example with Modulo Addressing
• Let us add modulo addressing to the AGU:
  • A = AR; AR = AR + 1
  • if (AR == BOT) AR = TOP;
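One possible reading of this post-increment mode is sketched below. It assumes that TOP is the first address of the buffer, that BOT is the last valid address, and that the wrap is triggered when the address just issued equals BOT. If BOT instead denotes the address one past the last element, the comparison would be made on the incremented value. The function name is hypothetical.

```python
def post_increment_modulo(ar, top, bot):
    """Return (A, next_AR) for one post-increment access with wrap-around."""
    a = ar                 # A = AR
    if ar == bot:          # the last element was just addressed: wrap to TOP
        next_ar = top
    else:
        next_ar = ar + 1   # normal post-increment
    return a, next_ar
```

An equality comparator is enough here because AR moves in steps of one and therefore always hits BOT exactly.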
AGU Example with Modulo Addressing
• What about post-decrement mode?
Modulo addressing - Post Decrement
• The programmer can exchange TOP and BOTTOM
• Alternative: Add hardware to select TOP and BOTTOM based on which addressing mode is used
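The first option (the programmer exchanges the two registers) can be sketched as follows. The assumption is that the hardware always compares against the BOT register and reloads from the TOP register, so for post-decrement the programmer loads the highest buffer address into the TOP register and the lowest into the BOT register. The function and parameter names are hypothetical.

```python
def post_decrement_modulo(ar, top_reg, bot_reg):
    """Return (A, next_AR) for one post-decrement access with wrap-around.

    top_reg holds the reload value and bot_reg the compare value, exactly as in
    the post-increment case; only the register contents are swapped.
    """
    a = ar
    next_ar = top_reg if ar == bot_reg else ar - 1
    return a, next_ar
```

For a buffer at addresses 4 to 7 this means calling it with `top_reg=7` and `bot_reg=4`, which walks 7, 6, 5, 4 and then wraps back to 7.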
Variable step size
• Sometimes it makes sense to use a larger step size than 1
  • In this case we can’t check for equality but must check for greater than or less than conditions
  • if (AR > BOTTOM) AR = AR - BUFFERSIZE
Variable step size
(Keepers and registers for BUFFERSIZE, STEPSIZE, and BOTTOM not shown.)
Note that STEPSIZE can’t be larger than BUFFERSIZE
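A minimal sketch of post-increment with a variable step, assuming BOTTOM is the last valid address of the buffer (the function name is hypothetical). It also shows why STEPSIZE must not exceed BUFFERSIZE: the incremented value can overshoot BOTTOM by at most STEPSIZE, so a single subtraction of BUFFERSIZE is only guaranteed to bring it back into the buffer under that condition.

```python
def post_increment_step(ar, stepsize, bottom, buffersize):
    """Return (A, next_AR) for a post-increment access with a step larger than 1."""
    if not 0 < stepsize <= buffersize:
        raise ValueError("STEPSIZE must be between 1 and BUFFERSIZE")
    a = ar
    next_ar = ar + stepsize
    if next_ar > bottom:           # overshoot: a "greater than" check, not equality
        next_ar -= buffersize      # one subtraction is enough since step <= size
    return a, next_ar
```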
Bit reversed addressing
• Important for FFTs and similar creatures
• Typical behavior from FFT-like transforms:
Input sample | Transformed sample | Output index (binary)
x[0]         | X[0]               | 000
x[1]         | X[4]               | 100
x[2]         | X[2]               | 010
x[3]         | X[6]               | 110
x[4]         | X[1]               | 001
x[5]         | X[5]               | 101
x[6]         | X[3]               | 011
x[7]         | X[7]               | 111
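The index column of this table can be reproduced with a short bit-reversal routine. This is a software sketch with a hypothetical function name; in hardware, bit reversal of a fixed width is just a rewiring of the counter bits.

```python
def bit_reverse(value, nbits):
    """Reverse the lowest nbits bits of value."""
    result = 0
    for _ in range(nbits):
        result = (result << 1) | (value & 1)  # move the current LSB into the result
        value >>= 1
    return result
```

For an 8-point transform, `[bit_reverse(i, 3) for i in range(8)]` gives the output indices 0, 4, 2, 6, 1, 5, 3, 7, matching the table.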
Bit reversed addressing
• Case 1: Buffer size and buffer location can be fixed by the AGU designer
• Case 2: Buffer size is fixed by the AGU designer, buffer location is arbitrary
  • Discussion break: How would you go about designing an AGU for case 1 or 2?
• Case 3: Buffer size and location are not fixed
Bit reversed addressing
• Case 1: Buffer size and buffer location can be fixed
  • Solution: If buffer size is 2^N, place the start of the buffer at an integer multiple of 2^N
  • ADDR = {FIXED_PART, BIT_REVERSE(ADDR_COUNTER_N_BITS)};
Bit reversed addressing
• Case 2: Buffer size is fixed, buffer location is arbitrary
  • Solution: Add an offset to the bit reversed content.
  • ADDR = BASE_REGISTER + BIT_REVERSE(ADDR_COUNTER_N_BITS);
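The difference between case 1 and case 2 can be sketched side by side. In case 1 the buffer is aligned, so the address is a plain concatenation of bit fields and needs no adder; in case 2 the base is arbitrary, so an adder is required. The function names are hypothetical, and the string-based `bit_reverse` is just a compact software stand-in for the rewiring done in hardware.

```python
def bit_reverse(value, nbits):
    """Reverse the lowest nbits bits of value (software stand-in for rewiring)."""
    return int(format(value & ((1 << nbits) - 1), f"0{nbits}b")[::-1], 2)


def case1_address(fixed_part, counter, nbits):
    """Case 1: ADDR = {FIXED_PART, BIT_REVERSE(counter)}, a pure concatenation."""
    return (fixed_part << nbits) | bit_reverse(counter, nbits)


def case2_address(base_register, counter, nbits):
    """Case 2: ADDR = BASE_REGISTER + BIT_REVERSE(counter), which needs an adder."""
    return base_register + bit_reverse(counter, nbits)
```

With `nbits=3`, `case1_address(5, i, 3)` walks a buffer starting at address 40, a multiple of 8, while `case2_address(37, i, 3)` works for the unaligned base 37.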
Bit reversed addressing
• Case 3: Buffer size and location are not fixed
  • The most programmer friendly solution. Can be done in several ways.
Other addressing modes: 2D
• For image processing, video coding, etc:
  • 2D addressing
    – ADDR = Base + X + Y*WIDTH
  • 2D addressing with wrap around
    – ADDR = Base + X % WIDTH + (Y % HEIGHT) * WIDTH
    – Note: You are in trouble if WIDTH and HEIGHT are not powers of two here! (Cf. texture access in GPUs)
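The two formulas above, plus the power-of-two shortcut that the note alludes to, can be sketched as follows (function names are hypothetical). With power-of-two dimensions, both modulo operations become bit masks, and the multiplication by WIDTH becomes a shift in hardware.

```python
def addr_2d(base, x, y, width):
    """Plain 2D addressing: Base + X + Y*WIDTH."""
    return base + x + y * width


def addr_2d_wrap(base, x, y, width, height):
    """2D addressing with wrap-around, general (expensive) form."""
    return base + (x % width) + (y % height) * width


def addr_2d_wrap_pow2(base, x, y, width, height):
    """Same result when WIDTH and HEIGHT are powers of two, using masks only."""
    for dim in (width, height):
        if dim <= 0 or dim & (dim - 1) != 0:
            raise ValueError("width and height must be powers of two")
    return base + (x & (width - 1)) + (y & (height - 1)) * width
```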
Other addressing modes: 2D
• 2D addressing with clamp to border
  • ADDR = Base + CLAMPW(X) + CLAMPH(Y)*WIDTH
  • function CLAMPW(X)
      if (X < 0) X = 0;
      if (X > WIDTH-1) X = WIDTH-1;
      return X;
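A runnable sketch of clamp-to-border addressing follows. The slide only defines CLAMPW; this sketch assumes CLAMPH works the same way with HEIGHT in place of WIDTH, and the function names are hypothetical.

```python
def clamp(value, limit):
    """Clamp value to the range 0 .. limit-1 (CLAMPW with limit=WIDTH)."""
    return max(0, min(value, limit - 1))


def addr_2d_clamp(base, x, y, width, height):
    """ADDR = Base + CLAMPW(X) + CLAMPH(Y)*WIDTH."""
    return base + clamp(x, width) + clamp(y, height) * width
```

Unlike wrap-around, clamping needs only comparators and multiplexers, so it does not depend on WIDTH and HEIGHT being powers of two.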
Why design for memory hierarchy
• Small memories are faster
  • Active data/program in small size memories
• Large memories are fairly area efficient
  • Volume storage using large size memories
• Very large memories are very area efficient
  • DRAM needs special considerations during manufacturing (e.g. special manufacturing processes)
Typical SoC Memory Hierarchy
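[Figure: typical SoC memory hierarchy. A DSP and an MCU each contain a register file (L1: RF), a program memory (PM), data memories (DM1 … DMn), a datapath and control path (DP+CP), and a DMA interface; accelerators contain PM, DMn, DP+CP, and a DMA interface. All are connected through the SoC bus (with its arbitration, routing, and control) and DMA to the main on-chip memory, the off-chip DRAM interface, and the nonvolatile memory interface.]
[Liu2008]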
Memory partition
• Requirements
  • The number of data items accessed simultaneously
  • Supporting access of different data types
  • Memory shutting down for low power
  • Overhead costs from memory peripheral
  • Critical path from memory peripheral
  • Limit of on chip memory size
Issues with off-chip memory
• Relatively low clock rate compared to on-chip
  • Will need clock domain crossing, possibly using asynchronous FIFOs
• High latency
  • Many clock cycles to send a command and receive a response
• Burst oriented
  • Theoretical bandwidth can only be reached by relatively large consecutive read/write transactions
Why burst reads from external memory?
• Procedure for reading from (DDR-)SDRAM memory
  • Read out one row (with around 2048 bits) from DRAM memory into flip-flops (slow)
  • Do a burst read from these flip-flops (fast)
  • Write back all bits into the DRAM memory
• Conclusion: If we have off-chip memory we should use burst reads if at all possible
Example: Image processing
• Typical organization of framebuffer memory
  – Linear addresses
Example: Image processing
• Fetching an 8x8 pixel block from main memory
  – Minimum transfer size: 16 pixels
  – Minimum alignment: 16 pixels
• Must load: 256 pixels
Rearranging memory content to save bandwidth
• Use a tile-based frame buffer instead of a linear one
Rearranging memory content to save bandwidth
• Fetching 8x8 block from memory
  – Minimum transfer size: 16 pixels
  – Minimum alignment: 16 pixels
• Must load: 144 pixels
  – Only 56% of the previous example!
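The 256 versus 144 pixel figures can be checked with a small calculation. The sketch below makes two assumptions that the extracted slides do not state: the image width is a multiple of 16 pixels (so every row starts on an aligned burst), and a tile is a 4x4 pixel square stored as one 16-pixel burst, which is the tile shape consistent with the 144 pixel figure. Function names are hypothetical.

```python
BURST = 16  # minimum transfer size and alignment in pixels (from the slides)


def linear_pixels_loaded(x0, block=8, burst=BURST):
    """Pixels fetched for a block x block region from a row-major frame buffer."""
    first = x0 // burst                # first aligned burst touched in each row
    last = (x0 + block - 1) // burst   # last aligned burst touched in each row
    return block * (last - first + 1) * burst


def tiled_pixels_loaded(x0, y0, block=8, tile=4):
    """Pixels fetched when every tile x tile square is stored as one burst."""
    tiles_x = (x0 + block - 1) // tile - x0 // tile + 1
    tiles_y = (y0 + block - 1) // tile - y0 // tile + 1
    return tiles_x * tiles_y * tile * tile
```

In the worst case for the linear layout, each of the 8 rows straddles two bursts, giving 8 x 2 x 16 = 256 pixels. In the worst case for the tiled layout, the block touches 3 x 3 tiles, giving 9 x 16 = 144 pixels, which is 56% of 256.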
Rearranging memory content to save bandwidth
• Extra important when using a wide memory bus and/or a cache
Other Memory related tricks
• Trading memory for AGU complexity
  • Look-up tables
  • Loop unrolling
• Using several memory banks to increase parallelism
Discussion break
• Memory subsystem suitable for image processing
• Requirements:
  • You should be able to read any 4 adjacent horizontal pixels in one clock cycle
  • You should be able to read any 4 adjacent vertical pixels in one clock cycle
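As one possible starting point for this discussion (not necessarily the answer intended in the lecture), the sketch below assumes four single-port memory banks and a skewed bank assignment, so that any four horizontally or vertically adjacent pixels fall into four different banks. The names `NUM_BANKS` and `bank_of` are hypothetical.

```python
NUM_BANKS = 4  # assumed: four single-port banks, one pixel read per bank per cycle


def bank_of(x, y):
    """Skewed assignment: each row is rotated by one bank relative to the row above."""
    return (x + y) % NUM_BANKS
```

With this mapping, moving one step in either x or y changes the bank number by one, so any run of four neighbors in either direction hits each bank exactly once. The address within each bank and the reordering of the four outputs still have to be generated by the AGU.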
Buses
• External memory
  • SDRAM, DDR-SDRAM, DDR2-SDRAM, etc
• SoC bus
  • AMBA, AXI, PLB, Wishbone, etc
  • You still need to worry about burst length, latency, etc
• See the TSEA44 course if you are interested in this!
DMA definition and specification
• DMA: Direct memory access
  • An external device independent of the core
  • Running load and store in parallel with DSP
  • DSP processor can do other things in parallel
• Requirements
  • Large bandwidth and low latency
  • Flexible and support different access patterns
  • For DSP: Multiple access is not so important
Data memory access policies
• Both DMA and DSP can access the data memory simultaneously
  • Requires a dual-port memory
• DSP has priority, DMA must use spare cycles to access DSP memory
  • Verifying the DMA controller gets more complicated
• DMA has priority, can stall DSP processor
  • Verifying the processor gets more complicated
• Ping-pong buffer
  • Verifying the software gets more complicated
Ping-pong buffers
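The idea behind ping-pong buffering is that two buffers alternate roles: the DMA fills one while the DSP processes the other, and the roles are swapped when both are done. A minimal software sketch, with a hypothetical class name and interface, could look like this:

```python
class PingPong:
    """Two buffers that alternate between DMA ownership and DSP ownership."""

    def __init__(self, size):
        self.buffers = [[0] * size, [0] * size]
        self.dma_index = 0  # index of the buffer currently owned by the DMA

    @property
    def dma_buffer(self):
        """Buffer the DMA may write to."""
        return self.buffers[self.dma_index]

    @property
    def dsp_buffer(self):
        """Buffer the DSP may read and process."""
        return self.buffers[1 - self.dma_index]

    def swap(self):
        """Exchange roles once the DMA transfer and the DSP processing are both done."""
        self.dma_index = 1 - self.dma_index
```

The software has to guarantee that `swap` is only called when neither side is still using its buffer, which is the verification burden mentioned on the data memory access policies slide.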
A simplified DMA architecture
[Liu2008]
A typical DMA request
• Start address in data memory
• Start address in off-chip memory
• Length
• Priority
  • Permission to stall DSP core
  • Priority over other users of off-chip memory
• Interrupt flag
  • Should the DSP core get an interrupt when the DMA is finished?
Scatter-gather DMA
• List of DMA requests:
  • Pointer to DMA request 1
  • Pointer to DMA request 2
  • Pointer to DMA request 3
  • End of list
• Each DMA request contains:
  • Start address in data memory
  • Start address in off-chip memory
  • Length
  • Priority
    – Permission to stall DSP core
    – Priority over other users of off-chip memory
  • Interrupt flag
    – Should the DSP core get an interrupt when the DMA is finished?
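The request fields and the request list can be sketched as a data structure plus a simple software model of the controller walking the list. All names are hypothetical, the transfer direction (off-chip memory to data memory) is an assumption, and the priority fields are stored but not acted on in this sketch.

```python
from dataclasses import dataclass


@dataclass
class DmaRequest:
    dm_start: int                      # start address in data memory
    offchip_start: int                 # start address in off-chip memory
    length: int                        # number of words to transfer
    may_stall_dsp: bool = False        # priority: permission to stall the DSP core
    offchip_priority: int = 0          # priority over other users of off-chip memory
    interrupt_when_done: bool = False  # interrupt flag


def run_scatter_gather(requests, offchip, data_memory):
    """Process the requests in order; return how many interrupts were raised."""
    interrupts = 0
    for req in requests:               # the end of the Python list acts as "End of list"
        for i in range(req.length):
            data_memory[req.dm_start + i] = offchip[req.offchip_start + i]
        if req.interrupt_when_done:
            interrupts += 1
    return interrupts
```

Setting the interrupt flag only on the last request lets the DSP core be notified once, after the whole chain of scattered blocks has been gathered.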