From Software to Circuits: High-Level Synthesis for FPGA-Based Processor/Accelerator Systems
Jason Anderson
Dept. of Electrical and Computer Engineering, University of Toronto
Tools to Tackle Big Data – Big Data Workshop, 3 July 2014
LegUp Research Team
• Andrew Canis, James Choi, Nazanin Calagar, Lanny Lian, Blair Fort
• Undergrad Researchers: Mathew Hall, Stefan Hadjis, Joy Chen
• Faculty: Stephen Brown and myself
• Industry Liaison: Tomasz Czajkowski, Altera
Computations in Two Ways
• Write software
• Design custom circuits
Design Methodology
• Write software
  – Easy
  – Flexibility comes at the cost of lower performance
• Design custom circuits
  – Efficient, low power
  – Need specialized knowledge
Hardware’s Potential
• Implementing computations in FPGA hardware can have speed/energy advantages over software:
  – Lithography simulation: 15X speed-up [Cong & Zou, TRETS’09]
  – Linear system solver: 2.2X speed-up, 5X more energy efficient [Zhang, Betz, Rose, TRETS’12]
  – Monte Carlo simulation for photodynamic therapy: 80X faster, 45X more energy efficient [Lo et al., J. Biomed Optics’09]
  – Options pricing: 4.6X faster, 25X more energy efficient [Tse, Thomas, Luk, TVLSI’12]
So Why Doesn’t Everybody Use Hardware?
• Hardware design is difficult and skills are rare:
  – Requires use of hardware description languages: Verilog and VHDL
    • Low level of abstraction (individual bits)
  – 10 software engineers for every hardware engineer*
• We need a CAD flow that simplifies hardware design for software engineers
*US Bureau of Labour Statistics 2012
A Solution
• High-Level Synthesis
  – Design circuits using software languages
  – From a software program, a high-level synthesis tool automatically “synthesizes” a circuit that performs the same computations as the program
  – Benefits of software programmability and hardware performance
LegUp High-Level Synthesis for FPGAs
• LegUp is a high-level synthesis tool we have been developing since 2009.
• Takes a C program as input and produces a circuit.
• 1000+ downloads of our tool since its first release in 2011.
• http://legup.eecg.toronto.edu
Why Use FPGAs to Implement Circuits?
• Building fully fabricated custom chips is hard
  – Very complex design process
  – Costs millions of dollars to prototype a chip
  – Takes 2-3 months to fabricate
  – Only done for high-volume applications or apps that require high speed or the lowest power
• Alternative: pre-fabricated, programmable chips
  – Field-Programmable Gate Arrays (FPGAs)
Field-Programmable Gate Arrays
• Pre-fabricated chip consists of an “array” of logic blocks surrounded by programmable interconnect
• Hardware “becomes” what you want by (electrically) programming the blocks and interconnect
[Figure: FPGA architecture: a grid of configurable logic blocks (CLBs) separated by channels of programmable interconnect, with columns of block RAM (SRAM blocks, e.g., 18 kbits) and hard IP blocks (common blocks: multiplier, DSP, processor, PCI, ADC, DLL)]
A Real FPGA – Altera Stratix III
FPGA Advantages over “Hard” Chips
• “Manufacture” takes seconds vs. months
• Design, test and manufacture: single-digit millions of dollars vs. tens of millions of dollars
• This gives:
  – Faster time-to-market for products
  – FPGA vendor handles difficult design & manufacture issues
  – FPGA vendor shares inventory risk across many customers
  – FPGA vendor does test
• Two largest FPGA vendors: Xilinx and Altera
FPGAs and High-Level Synthesis
• FPGAs mainly accessible to HW engineers
  – Vendors want to expand user base: make FPGAs usable as computing platforms
• Area/power/delay gap between HLS-generated HW and manually crafted HW
  – In custom Si, user must “pay” for area gap
  – Power/performance one of main reasons to go custom
• FPGAs likely the IC medium through which HLS goes “mainstream”
LegUp: Top-Level Vision
[Figure: program code is compiled by a C compiler and run on a self-profiling processor (MIPS/ARM). The profiling data (execution cycles, power, cache misses) identifies suggested program segments to target to HW; high-level synthesis turns these into hardened program segments in the FPGA fabric, and the altered SW binary calls the HW accelerators.]
Example program code:
int FIR(int ntaps, int sum) {
    int i;
    /* h[] and z[] are arrays declared elsewhere in the program */
    for (i = 0; i < ntaps; i++)
        sum += h[i] * z[i];
    return (sum);
}
....
LegUp: Key Features
• C to Verilog high-level synthesis
• Many benchmarks (incl. 12 CHStone)
• Automated verification tests
• Support for four different FPGAs:
  – Altera Cyclone II, Stratix IV, Cyclone IV, Cyclone V-SoC
• Open source, freely downloadable
How Does High-Level Synthesis Work?
Digital Circuits
• Example: you buy a “1 GHz processor”
  – 1 GHz = 1 nanosecond time steps
  – Some computation is done in each time step
[Figure: a time axis divided into 1ns steps]
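As a worked example of the relationship above, the length of one time step is the reciprocal of the clock frequency; the same formula gives the frequency implied by the 10ns steps used in the scheduling example later in this talk.

```latex
T = \frac{1}{f} = \frac{1}{1\,\text{GHz}} = \frac{1}{10^{9}\,\text{s}^{-1}} = 10^{-9}\,\text{s} = 1\,\text{ns}
\qquad
f = \frac{1}{T} = \frac{1}{10\,\text{ns}} = 10^{8}\,\text{Hz} = 100\,\text{MHz}
```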
Example Circuit
• Calculate A+B in one 1ns time step
• Store the computation after each step
• Calculate (A+B)*(C–D) – (E*F) in three 1ns time steps:
  – Step 1: A+B, C–D and E*F
  – Step 2: (A+B)*(C–D)
  – Step 3: (A+B)*(C–D) – (E*F)
Scheduling: Key Aspect of HLS
• How should the computations of a program be assigned to the hardware time steps?
C language snippet:
z = a+b;
x = c+d;
q = z+x;
q = q-2;
r = q*2;
Programs do not contain the notion of “time steps”. Here, we have:
• 3 add operations
• 1 subtract operation
• 1 multiplication operation
Scheduling
Questions:
• Which operations can be scheduled in the same time step?
• Which operations depend on others?
• If addition takes 5ns, subtraction takes 5ns and multiplication takes 10ns, how should the snippet be scheduled?
  – Target clock step length is 10ns
Scheduling
[Figure: the snippet scheduled into three 10ns time steps. The two independent additions (a+b and c+d) run as parallel operations in the same step; dependent 5ns operations are chained back to back within a single 10ns step; the 10ns multiplication (q*2) occupies its own step.]
HLS Challenges
• Performance of HLS-generated circuits is not as good as that of human-designed circuits
• However, HLS-generated circuits are already better than SW in many cases
• Much of our research is aimed at improving HLS quality
Loop Pipelining
Loop Pipelining
for (int i = 0; i < N; i++) {
    sum[i] = a + b + c + d;
}
[Figure: without pipelining, each iteration computes a + b in cycle 1, adds c in cycle 2 and adds d in cycle 3, using three adders]
• Cycles: 3N
• Adders: 3
• Utilization: 33%
Loop Pipelining
• Iteration i uses the three adders in cycles i+1, i+2 and i+3:
  – i=0: cycles 1, 2, 3
  – i=1: cycles 2, 3, 4
  – i=2: cycles 3, 4, 5
  – …
  – i=N-2: cycles N-1, N, N+1
  – i=N-1: cycles N, N+1, N+2
• Steady state: a new iteration starts every cycle and all three adders are busy
• Cycles: N+2 (~1 cycle per iteration)
• Adders: 3
• Utilization: 100% in steady state
Loop Pipelining
• Ideally, we could start a loop iteration every clock cycle
  – Initiation interval (II) = 1
• However,
  – Loops may have dependencies across iterations
  – There may be constraints on resources
    • e.g., only two memory accesses in a cycle
• Loop pipelining seeks to minimize II subject to constraints
Exploiting Spatial Parallelism
Motivation
• Speed benefits of HW arise from spatial parallelism
• Extracting parallelism from a sequential program is difficult
  – Auto-parallelizing compilers do not work well!
• Easier to start from parallel code
  – Pthreads/OpenMP can help!
Background: Programming Models
• Sequential: C/C++
• Massively parallel: CUDA/OpenCL
• Pthreads/OpenMP: standard API in C!
OpenMP example
#pragma omp parallel for num_threads(2) private(i)
for (i = 0; i < SIZE; i++) {
    output[i] = A_array[i]*B_array[i];
}
Pthread Example
#include <pthread.h>
#include <stddef.h>

#define SIZE 1024   /* example size */

int A_array[SIZE], B_array[SIZE], output[SIZE];

struct thread_data {
    int start;
    int end;
};

void *product(void *threadarg);

int main() {
    pthread_t thread1, thread2;
    struct thread_data data1, data2;

    /* Each thread handles one half of the arrays */
    data1.start = 0;       data1.end = SIZE/2;
    data2.start = SIZE/2;  data2.end = SIZE;

    pthread_create(&thread1, NULL, product, (void*)&data1);
    pthread_create(&thread2, NULL, product, (void*)&data2);

    pthread_join(thread1, NULL);
    pthread_join(thread2, NULL);
    return 0;
}

void *product(void *threadarg) {
    int i, startIdx, endIdx;
    struct thread_data* arg = (struct thread_data*) threadarg;
    startIdx = arg->start;
    endIdx = arg->end;

    for (i = startIdx; i < endIdx; i++) {
        output[i] = A_array[i]*B_array[i];
    }
    return NULL;
}
Pthreads vs OpenMP
• OpenMP provides an easy, implicit way to parallelize a section of code (e.g., loops)
• Pthreads require explicit thread forks/joins
• Pthreads can be more work but give more control to the programmer
• Pthreads can execute different functions in parallel
OpenMP/Pthreads Support in LegUp
• Allow Pthreads and OpenMP to be used to specify parallel hardware
• Automatically infer parallel-operating accelerators for the parallel-operating threads
• Permits easy exploration of a broad parallelization landscape
  – Including support for nested parallelism
Nested Parallelism
[Figure: Pthreads fork three threads running the functions add, sub and mult in parallel; each of these threads is in turn parallelized with OpenMP (OMP)]
Nested Parallelism
[Figure: system architecture: a processor and three accelerators (Accel 1, Accel 2, Accel 3) share an on-chip cache backed by off-chip memory; in a second version, the accelerators contain multiple numbered units and connect to a multi-ported cache]
Case Study
Computing the Mandelbrot Set
• Highly compute-bound application
• Each pixel is computed independently
• Fixed-point calculations
• Our target image: 128x128 pixels
Target Platform: Altera Cyclone V-SoC
• 28nm FPGA with an embedded dual-core ARM processor in the FPGA fabric
  – 800 MHz ARM with L1 + L2 caches
  – FPGA accelerators can access the ARM cache
[Figure: ARM processor alongside the Altera Cyclone V FPGA fabric]
Speed Performance Results
• 5.7X speed-up vs. ARM SW
[Figure: chart comparing ARM software with 1, 2, 4 and 8 HLS accelerators; axes show wall-clock time (ms) and MHz]
High-Level Synthesis for Big Data
• Seeking big data applications we can collaborate on and accelerate with HLS
• Ideal characteristics:
  – Compute bound (not I/O bound)
  – Integer or fixed point (not floating point)
  – Data parallel
• Please reach out to us
Summary
• LegUp is an open-source high-level synthesis tool being developed at the Univ. of Toronto
  – Targets a hybrid FPGA-based processor/accelerator system
  – Distribution includes many benchmark programs and other infrastructure
• Active development continues
  – Pthreads + OpenMP, debugging, memory architecture synthesis, improved HW quality
Questions?
legup.eecg.toronto.edu