Approximate Computing on FPGA using Neural Acceleration
Presented by: Mikkel Nielsen, Nirvedh Meshram, Shashank Gupta, Kenneth Siu
Approximate Computing
• Involves computations that do not need to be exact (there is tolerance to quality degradation)
• A neural network's (NN) speed can be exploited
• Optimizes performance and energy efficiency in exchange for accuracy
• We implement an NN accelerator that interacts with the CPU
• Useful in many computer vision and image processing applications, such as edge detection
Motivation
• Combine specialized logic (an accelerator) with approximate computing for enhanced performance and energy efficiency

Top Level System Design
Architecture Design of NPU

Top Level Diagram of NPU
Architecture and Features
• Total of 8 Processing Elements (PEs) in one Processing Unit (in the initial design)
• Weights needed for neural processing are loaded into the weight FIFO at configuration time
• A Scheduling Buffer is set up during the configuration phase and used to generate the control signals for the Input, Output, Sigmoid, and Accumulator FIFOs, PE input selection, and the Sigmoid Function Unit
• After configuration, inputs are loaded into the input FIFO (using the enqd instruction)
• Inputs and weights are 16-bit fixed-point values with 7 fractional bits. The NPU supports 32-bit integers and single-precision floating point; the input interface performs the required format conversion
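The format conversion on the input interface amounts to scaling by 2^7 and saturating to 16 bits. A minimal sketch of that conversion, assuming round-to-nearest and saturation on overflow (the slides do not specify the rounding mode, and the function names are illustrative):

```python
# Sketch of the float -> 16-bit fixed-point conversion (7 fractional bits)
# that the input interface performs. Rounding mode and saturation behavior
# are assumptions; the slides do not specify them.

FRAC_BITS = 7
SCALE = 1 << FRAC_BITS                     # 2^7 = 128
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1

def to_fixed(x: float) -> int:
    """Convert a float to a signed 16-bit fixed-point value (7 fractional bits)."""
    raw = round(x * SCALE)
    return max(INT16_MIN, min(INT16_MAX, raw))   # saturate on overflow

def to_float(q: int) -> float:
    """Convert a signed 16-bit fixed-point value back to a float."""
    return q / SCALE

print(to_fixed(1.5))     # 192
print(to_float(192))     # 1.5
print(to_fixed(300.0))   # saturates to 32767
```

With 7 fractional bits the resolution is 1/128 ≈ 0.0078, which bounds the quantization error each input and weight contributes.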
Architecture and Features
• Compute Unit: performs the multiplication and addition operations
• State Machine: controls and configures the NPU; stalls on insufficient input and pushes outputs to the FIFO
• Accumulator FIFO: stores intermediate results when the number of inputs exceeds the number of PEs
• Sigmoid Function Unit: the current NPU supports tan-sigmoid and linear activation functions
• Output FIFO: holds the output of the NPU
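Functionally, each neuron evaluation through this datapath is a multiply-accumulate followed by the selected activation. A behavioral sketch (not the RTL, and in floating point rather than the NPU's fixed-point format):

```python
import math

# Behavioral model of one neuron evaluation through the NPU datapath:
# the Compute Unit multiply-accumulates inputs against weights, then the
# Sigmoid Function Unit applies tan-sigmoid (tanh) or a linear pass-through.
# This models the arithmetic only, not the FIFO/scheduling machinery.

def neuron(inputs, weights, activation="tansig"):
    acc = sum(x * w for x, w in zip(inputs, weights))  # PE multiply-add
    if activation == "tansig":
        return math.tanh(acc)    # tan-sigmoid activation
    return acc                   # linear activation

print(neuron([1.0, 0.5], [0.25, -0.5], "linear"))  # 0.0
```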
Software for Configuration
• Weights can be generated through custom MATLAB code or through a compiler
• A Perl-based compiler expects the weights and the structure of the neural network as input
• The compiler then generates a sequence of instructions that are loaded into the NPU
• These instructions load values into the weight buffers as well as the scheduling buffer
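The slides do not show the compiler's instruction encoding, so the mnemonics below are hypothetical; this toy Python analogue of the Perl compiler just illustrates the flow from a network description to a configuration instruction stream:

```python
# Toy analogue of the configuration compiler. The real instruction
# encoding is not given in the slides; "SCHED" and "LDW" are invented
# mnemonics standing in for scheduling-buffer and weight-buffer loads.

def compile_npu_config(layer_sizes, weights):
    """layer_sizes e.g. [2, 3, 1]; weights is a flat list, one per connection."""
    n_edges = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    assert len(weights) == n_edges, "one weight per connection"
    # Configure the schedule for each layer-to-layer computation...
    program = [f"SCHED {a}x{b}" for a, b in zip(layer_sizes, layer_sizes[1:])]
    # ...then fill the weight buffers.
    program += [f"LDW {w}" for w in weights]
    return program

prog = compile_npu_config([2, 1], [0.5, -0.25])
print(prog)  # ['SCHED 2x1', 'LDW 0.5', 'LDW -0.25']
```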
Zedboard Implementation
• Used Vivado tools to set up the programmable logic and generate a bitstream
• The bitstream is deployed via the first-stage boot loader by wrapping it with boot files
• On Zedboard boot, the programmable logic is loaded with the design
• A driver interfaces C code with the programmable logic (NPU)
• For comparison, the same C code runs natively under Digilent Linux on the Zedboard's ARM core
Zedboard Challenges
• Configuring Vivado to generate the bitstream
• Debugging synthesis/implementation errors
• Creating an appropriate wrapper so the Zedboard does not crash on boot
Benchmarks
• Sobel edge detection: a good candidate for approximate computing
• Uses convolution with a 3x3 matrix to find edges
• Took 0.4 ms for a 512x512 image
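For reference, the exact computation the NN approximates is the standard Sobel operator: convolve the image with the 3x3 kernels Gx and Gy and combine the two gradients. A minimal sketch (pure Python, interior pixels only):

```python
import math

# Standard Sobel edge detection: the computation the NN approximates.
# Convolve with the horizontal (GX) and vertical (GY) 3x3 kernels and
# take the gradient magnitude at each interior pixel.

GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel(img):
    """img: 2-D list of grayscale values; returns gradient magnitudes
    (border pixels are left at 0)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(GX[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(GY[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = math.hypot(gx, gy)
    return out

# A vertical edge between columns of 0s and 255s gives a strong response.
img = [[0, 0, 255, 255]] * 4
print(sobel(img)[1][1])  # 1020.0
```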
AxBench Benchmarks
• Using AxBench, which utilizes a software NN (the FANN library)
• A hardware NN is needed to fully realize the efficiency gains
• Benchmarks were run both with and without the NN
| Benchmark  | NN (s)      | Original (s) | % Error    |
|------------|-------------|--------------|------------|
| fft        | 9.887461294 | 0.265407208  | 0.06606428 |
| inversek2j | 34.26521323 | 1.877137814  | 0.10299372 |
| jpeg       | 1.486351732 | 0.056334889  | 0.30025683 |
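The timings above make the case for hardware acceleration: the software NN is far slower than the original code even though its error is small. A quick calculation of the slowdown factors from the table:

```python
# Slowdown of the software (FANN) NN versus the original code, computed
# from the table above: the software NN is roughly 18-37x slower, which
# is why a hardware NN is needed to realize any efficiency gain.

times = {                  # benchmark: (NN seconds, original seconds)
    "fft":        (9.887461294, 0.265407208),
    "inversek2j": (34.26521323, 1.877137814),
    "jpeg":       (1.486351732, 0.056334889),
}

for name, (nn_s, orig_s) in times.items():
    print(f"{name}: software NN is {nn_s / orig_s:.1f}x slower")
```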
In Progress
1. Compare the performance against another Processing Unit with 16 PEs and check the speedup gains
2. Build an NPU with 2 Processing Units of 8 PEs each and again compare performance and speedup
3. Modify the scheduler to remove stalls due to unavailable data
4. Run more benchmarks
References
[1] H. Esmaeilzadeh, A. Sampson, L. Ceze, and D. Burger, "Neural acceleration for general-purpose approximate programs," in MICRO, 2012.
[2] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1. MIT Press, 1986, pp. 318-362.
[3] M. de Kruijf and K. Sankaralingam, "Exploring the synergy of emerging workloads and silicon reliability trends," in SELSE, 2009.
[4] T. Moreau, M. Wyse, J. Nelson, A. Sampson, H. Esmaeilzadeh, L. Ceze, and M. Oskin, "SNNAP: Approximate computing on programmable SoCs via neural acceleration," in HPCA, 2015, pp. 603-614.