A parallel 'for' loop memory template for a high level synthesis compiler
Euromicro Conference on Digital System Design
Lille, France, 02/09/2010
Craig Moore, Wim Meeus, Harald Devos, and Dirk Stroobandt
30/06/2010 Craig Moore, DSD 02/09/2010 2
Outline
● High Level Synthesis
● Hardware Development
● External Memory
● Burst memory transfers
● Parallel For Loops
● Memory Template Overview
● Small Example
● Future Work
● Conclusions
High Level Synthesis (HLS) Missing Pieces
Memory Templates as Tools
● HDL Programmers have:
  ● Toolkit of memory designs
  ● Use the right tool for the job
  ● Manually adapt their designs
● HLS Compilers should:
  ● Have a toolkit of templates
  ● Adapt the template to the app
  ● Evaluate each template
  ● Suggest the best template
Basic Steps for any Algorithm
1) Read values from memory
2) Process each value
3) Store output in memory
for (int i = start; i < end; i++) {
    b[i] = func(a[i]);
}
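The three steps can be made explicit in a compilable form. This is only a minimal sketch: `func` here is a hypothetical stand-in (doubling its input) for whatever loop body the application provides, and `process_range` is an illustrative name that does not come from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical loop body: any function of a single input value would do. */
int func(int x) {
    return 2 * x;
}

/* Applies func to a[start..end-1] and stores the results in b. */
void process_range(const int *a, int *b, size_t start, size_t end) {
    assert(start <= end);
    for (size_t i = start; i < end; i++) {
        int value = a[i];          /* 1) read value from memory */
        int result = func(value);  /* 2) process the value */
        b[i] = result;             /* 3) store output in memory */
    }
}
```

Because each iteration touches only `a[i]` and `b[i]`, no iteration depends on another, which is the property the rest of the talk relies on.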
Implement on Hardware
External Memory for FPGAs
● A bottleneck
● Sequential in nature
● Number of values returned each cycle depends on bus width
● Each memory request requires a handshake
Adapting to the Bottleneck
● Stream values from memory
● Pre-fetch values
● Read/Write more than one value each clock cycle
● Store values locally to mask latency
● Reduce number of requests
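A toy cost model can show why fewer requests and wider transfers help. The model below is an assumption made for illustration, not something taken from the slides: every request pays a fixed handshake cost, and data then arrives at a fixed number of words per cycle. All names and numbers are hypothetical.

```c
#include <assert.h>

/* Ceiling division for positive integers. */
unsigned ceil_div(unsigned n, unsigned d) {
    assert(d > 0);
    return (n + d - 1) / d;
}

/*
 * Toy estimate of the cycles needed to move n_words words.
 * handshake_cycles:  assumed fixed cost paid once per request
 * words_per_cycle:   words delivered per cycle once data flows
 * words_per_request: words covered by one request (1 = no bursts)
 */
unsigned transfer_cycles(unsigned n_words, unsigned handshake_cycles,
                         unsigned words_per_cycle, unsigned words_per_request) {
    assert(words_per_request > 0);
    unsigned cycles = 0;
    unsigned remaining = n_words;
    while (remaining > 0) {
        unsigned chunk = remaining < words_per_request ? remaining
                                                       : words_per_request;
        cycles += handshake_cycles + ceil_div(chunk, words_per_cycle);
        remaining -= chunk;
    }
    return cycles;
}
```

Under these assumptions, moving 64 words one at a time with a 4-cycle handshake costs 320 cycles, while four requests of 16 words at 4 words per cycle cost 32.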
Burst Transfers
● Burst of consecutive memory operations
Read Transfer Start Address: 3
Transfer: 4
[Figure: read burst over cells numbered 0 to 6]
Write Transfer Start Address: 2
Transfer: 5
[Figure: write burst over cells numbered 0 to 6]
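The two burst examples can be modeled in software as copies of consecutive words. This sketch assumes that "Transfer" is the number of consecutive words moved and that the cells numbered 0 to 6 are word addresses; `burst_read` and `burst_write` are hypothetical names, not part of any real memory controller interface.

```c
#include <assert.h>
#include <stddef.h>

/* Models a read burst: copies `length` consecutive words, beginning at
   address `start`, from memory into a local buffer. */
void burst_read(const int *memory, size_t start, size_t length, int *local) {
    assert(memory != NULL && local != NULL);
    for (size_t k = 0; k < length; k++) {
        local[k] = memory[start + k];
    }
}

/* Models a write burst: copies `length` words from a local buffer into
   consecutive memory addresses beginning at `start`. */
void burst_write(int *memory, size_t start, size_t length, const int *local) {
    assert(memory != NULL && local != NULL);
    for (size_t k = 0; k < length; k++) {
        memory[start + k] = local[k];
    }
}
```

With a seven-word memory, a read at start address 3 with transfer 4 covers addresses 3 to 6, and a write at start address 2 with transfer 5 covers addresses 2 to 6. Each burst needs only one request.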
Parallel for Loop
● Each iteration is run in parallel
● No loop dependencies
  ● Loop Transformations to remove them
Example with Dependencies
for i = 1 to 4 {
    a(i) = a(i) + 1
    b(i) = a(i - 1) + a(i + 1)
}
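One possible transformation for this example is sketched below. In the original loop, iteration i reads a(i - 1) after the previous iteration has incremented it, and reads a(i + 1) before the next iteration increments it. Splitting the loop in two and keeping a copy of the old values removes both dependencies. The slides do not say which transformation the authors apply; this is just one way to do it, with illustrative function names.

```c
#include <assert.h>
#include <string.h>

#define FIRST 1
#define LAST 4
#define SIZE (LAST + 2)   /* indices 0..5 keep a[i - 1] and a[i + 1] in range */

/* Original loop: iteration i depends on the update made by iteration i - 1. */
void sequential(int a[SIZE], int b[SIZE]) {
    for (int i = FIRST; i <= LAST; i++) {
        a[i] = a[i] + 1;
        b[i] = a[i - 1] + a[i + 1];
    }
}

/* Transformed version: two loops whose iterations are all independent. */
void transformed(int a[SIZE], int b[SIZE]) {
    int old[SIZE];
    memcpy(old, a, sizeof old);           /* snapshot of the input values */
    for (int i = FIRST; i <= LAST; i++) {
        a[i] = old[i] + 1;                /* reads only the snapshot */
    }
    for (int i = FIRST; i <= LAST; i++) {
        b[i] = a[i - 1] + old[i + 1];     /* updated left, original right */
    }
}

/* Runs both versions on the same input and checks that they agree. */
void check_equivalence(void) {
    int a1[SIZE] = {10, 20, 30, 40, 50, 60}, b1[SIZE] = {0};
    int a2[SIZE] = {10, 20, 30, 40, 50, 60}, b2[SIZE] = {0};
    sequential(a1, b1);
    transformed(a2, b2);
    assert(memcmp(a1, a2, sizeof a1) == 0);
    assert(memcmp(b1, b2, sizeof b1) == 0);
}
```

The price of the transformation is the extra copy of `a`, which in hardware would mean additional local storage.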
Template Overview
Requests read bursts and controls execution of data paths; waits for the output buffer if it is full.
Non-pipelined loop bodies executing in parallel.
Manual Design
With enough values, performs write bursts.
Starts and stops execution
Controls access to memory, grants permission based on request (output buffer priority)
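The priority rule for memory access can be written as a small decision function. This is a sketch that assumes only two requesters, the input side issuing read bursts and the output buffer issuing write bursts; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { GRANT_NONE, GRANT_INPUT, GRANT_OUTPUT } grant_t;

/* Decides who may use the memory port; the output buffer wins when both
   sides request at the same time. */
grant_t arbitrate(bool input_requests, bool output_requests) {
    if (output_requests) {
        return GRANT_OUTPUT;
    }
    if (input_requests) {
        return GRANT_INPUT;
    }
    return GRANT_NONE;
}

/* Example: simultaneous requests are resolved in favor of the output buffer. */
void arbiter_example(void) {
    assert(arbitrate(true, true) == GRANT_OUTPUT);
    assert(arbitrate(true, false) == GRANT_INPUT);
}
```

Giving the output buffer priority is one way to keep it from filling up, which would otherwise stall the data paths.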
Byte-Enable Signal
● Multiple values for each memory transaction
● Tells which bytes to replace and preserve
[Figure: byte-enable example; legend: Ignore, Enable]
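The effect of a byte-enable signal on a write can be modeled as a masked merge. This sketch assumes a 32-bit bus with one enable bit per byte and bit 0 controlling the least significant byte; real memory interfaces may order the lanes differently. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Merges new_data into old_data. Bit k of byte_enable set means byte k is
   replaced; bit k clear means byte k is preserved. */
uint32_t write_with_byte_enable(uint32_t old_data, uint32_t new_data,
                                uint8_t byte_enable) {
    assert((byte_enable & 0xF0u) == 0);   /* a 32-bit word has 4 byte lanes */
    uint32_t mask = 0;
    for (int lane = 0; lane < 4; lane++) {
        if (byte_enable & (1u << lane)) {
            mask |= (uint32_t)0xFFu << (8 * lane);
        }
    }
    return (old_data & ~mask) | (new_data & mask);
}
```

For example, writing 0xAABBCCDD over 0x11223344 with enable bits 0 and 1 set leaves 0x1122CCDD in memory: the two low bytes are replaced and the two high bytes are preserved.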
Parametrized Template
Parameters
● Memory Bus Width = M
● Word Width = W
● Max Words = A = M / W
● Input FIFOs = X = C_X * A
● Iterations = Output FIFOs = N = C_N * X
● Burst Length
● Input FIFO Length
● Iteration Length
● Output FIFO Length
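The derived parameters follow from the bus width and word width by simple arithmetic. This sketch assumes that M is a whole multiple of W and that the two constants C (subscript X and subscript N on the slide, here `c_x` and `c_n`) are positive integer multipliers chosen by the designer. The struct and function names are illustrative only.

```c
#include <assert.h>

typedef struct {
    unsigned max_words;     /* A = M / W */
    unsigned input_fifos;   /* X = C_X * A */
    unsigned output_fifos;  /* N = C_N * X, also the number of iterations */
} template_sizes;

/* Derives the template sizes from bus width m, word width w (both in bits)
   and the two multipliers c_x and c_n. */
template_sizes derive_sizes(unsigned m, unsigned w, unsigned c_x, unsigned c_n) {
    assert(w > 0 && m % w == 0);
    template_sizes s;
    s.max_words = m / w;
    s.input_fifos = c_x * s.max_words;
    s.output_fifos = c_n * s.input_fifos;
    return s;
}
```

With a 64-bit bus and 16-bit words, A is 4; choosing `c_x` = 2 and `c_n` = 1 then gives 8 input FIFOs and 8 parallel iterations with 8 output FIFOs.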
Example – Reading Values
[Figure legend: Values in Memory, Values to be read, Byte enabled, Byte disabled, Values processed]
Example – Processing Values
Example – Writing Values
Future Work
● More templates for other parallel for loops
  ● Pipelined loop body
  ● Data reuse
● Compiler identifies parallel for loop
  ● No keywords
  ● Check for loop dependencies, and do loop transformations if required
● Compiler suggests best memory template
  ● Chosen based on performance estimate
  ● Design space exploration using templates
Conclusions
● HLS Tools don't create memory designs
● Manual memory designs can take days/weeks/months to complete
● Parametrized memory template designs are generated in seconds
● Easy to perform design space exploration using different parameter values and/or templates
Thank You!
Questions?
[email protected]
http://www.elis.ugent.be/~cmoore
Wim Meeus*, Harald Devos‡, and Dirk Stroobandt*
*{wim.meeus, dirk.stroobandt}@elis.ugent.be, ‡[email protected]