Methodology to standardize the development of FPGA- based ...€¦ · Why is it expensive to...

Post on 18-Oct-2020

0 views 0 download

Transcript of Methodology to standardize the development of FPGA- based ...€¦ · Why is it expensive to...

Methodology to standardize the development of FPGA-based intelligent DAQ and processing systems on

heterogeneous platforms using OpenCL

M. Astrain1, M. Ruiz1, S. Esquembri1, A. Carpeño1, E. Barrera1, J. Vega2

1Instrumentation and Applied Acoustic Research Group, Universidad Politécnicade Madrid, Madrid, Spain

2Laboratorio Nacional de Fusión, CIEMAT, Madrid, Spain

miguel.astrain@i2a2.upm.es Slide 1

OUTLINE

• Context

• System Architecture Example

• OpenCL standard

• Development cycle

• Results

• Conclusions

miguel.astrain@i2a2.upm.es Slide 2

Context

miguel.astrain@i2a2.upm.es Slide 3

Context

Advantages of Data Acquisition (DAQ) systems based on FPGAs+ Flexibility to modify the design

+ Low latency

+ Deterministic behavior

+ High-performance processing

+ Parallelization

miguel.astrain@i2a2.upm.es Slide 4

But developing for FPGA is complex and time consuming (expensive …)

PXI Chassis MTCA ChassisFPGA Board

I/O

Interface

AMC + FPGA

I/O Interface

NI Linux DriverMTCA Driver

IRIO

NDS-CORE

Custom

IRIO Templates

NDS-EPICS

User Device Driver

Custom process

User customizationUser customization

Networks

Fast

Controller

Remote I/O

PCIe

Application

independant

Application

dependant

Hardware &

device

drivers

Software

SDN

, DA

N

Main control, Other systems, etc

Why is it expensive to develop for FPGAs

miguel.astrain@i2a2.upm.es Slide 5

Desing UsingLabVIEW/FFPA

Design UsingHDLsOnly Products

from OneManufacturer

IRIO-OpenCL

miguel.astrain@i2a2.upm.es Slide 6

PXI Chassis MTCA ChassisFPGA Board

I/O

Interface

AMC + FPGA

FMC

I/O

NI Linux Driver OpenCL RTE

IRIO

NDS-CORE

Custom

Kernels

IRIO Templates

NDS-EPICS

IRIO-OpenCL

IRIO-OpenCL

BSP

IRIO-OpenCL

KernelsCustom process

User customizationUser customization

Networks

Fast

Controller

Remote I/O

PCIe

Application

independant

Application

dependant

Hardware &

device

drivers

Software

SDN

, DA

N

Main control, Other systems, etcControl

System

Desing UsingOpenCL(80%)

HDL(20%)

Products frommultiple

Manufacturers

System Architecture Example

• PC + PCIe + MTCA + AMC + (PROCESSING) + (I/O)

miguel.astrain@i2a2.upm.es Slide 7

COMPUTER CODAC Core System 6.0

MCH carrier

PCIe

optical link

Device

AMC ARRIA10 FPGA

FMC DAQ Board

MTCA4 CHASSIS

HOST+

+ NDSv3

1

OpenCL: Basic overview

miguel.astrain@i2a2.upm.es Slide 8

High level language + COMPUTING MODEL • A host and multiple devices (CPU, GPU, FPGA).• Computation is divided into functions called Kernels.• There is one or several queues that send the Kernels to execute concurrently.• Memory organized in buffers/images and transfers are explicit.• Parallelization is a big focus.

OpenCL: Kernels

• Short: The part of OPENCL that goes into the FPGA and is allocated in the FPGA in partial reconfigurable partition.

• Kernels have access to all device memory layers.

• Key to performance is optimizing this memory usage.

• Parallelization is achieved in the form of a pipeline for FPGA.

miguel.astrain@i2a2.upm.es Slide 9

Intel RTE

HOST Interface

KernelPipeline

DDR

Host

JESDE204B

K_DAQ

K_GEN

K_TIMING

K_NFM

BSP

miguel.astrain@i2a2.upm.es Slide 10

OpenCL: BSP

• Short: The part of OPENCL that goes into the FPGA and is FIXED.

• Manages DDR memory.

• It interfaces with the host

• JESD204B interface

• Requires HDL to modify (usually given)

Intel RTE

HOST Interface

Kernelsmanagement

DDR

Ke

rne

lsHost

JESD204B

SPI

OpenCL: Host

• Short: The part of OPENCL that goes into the C++ drivers.

• Host sends commands and can set Global Memory (DDR) (SLOW!)

• Critical processes can be organized with a chain of pipes (HDL=AXI ST)

• Data can be gathered from I/O pins, but kernels are queued from host.

miguel.astrain@i2a2.upm.es Slide 11

Host

Transfer Transfer

I/O I/O

AXI ST/AVALON ST

Development cycle: OpenCL

miguel.astrain@i2a2.upm.es Slide 12

Three main scenarios for a new application:

1- New algorithm

2- FMC module

3- AMC + FPGA

JESD204B controller

(.qip)

Kernels Source

Code

Custom

(.cl)

OpenCL System IP

Files (.qip, .qsys)

Board Config Files(board_env.xml)(board_spec.xml)

Intel Quartus

Prime Design

Software

Board

Support

Package

Intel FPGA SDK

for OpenCL

Kernels Offline

Compiler (aoc)

FPGA bitstreamFixed partition

(.aocx)

HOST

Applicatio

n Code

(.cpp,.h)

System host

compiler

HOST Binary

HOST DESIGN FLOW

DEVICE DESIGN FLOW

KERNELS

FPGA bitstreamPartial

Reconfiguration(.aocx)

Intel

OpenCL

RTE

Intel FPGA SDK

for OpenCL

Kernels Offline

Compiler (aoc)

SPI controller

(.qip)

IRIO-OpenCL

Kernels

DAQ

GEN

(.cl)

Application

dependant

FMC moduledependant

Software

and tools

AMC FPGA

board

dependant

Board

Test

Kernel

(.cl)

Results

• Implementation: more on ID 494 !!

• Basic kernels FPGA resource utilization

miguel.astrain@i2a2.upm.es Slide 13

Under evaluation:o Timingo Triggeringo Routing

Functionality:DAQWFGProcessingRouting

• Totally integrated in EPICS using NDS

• Tested in CODAC CORE SYSTEM 6.0

Results

• Working example: fission chamber neutron flux measurement.

• Why this example:• The measurements benefit from high sampling rates.

• High data rate but simple operations (floating point).

• Logic inside the FPGA can classify different pulses

• Parallelization enables on-line comparison of different algorithms making it and intelligent system.

• Alternatively other algorithms based in machine learning techniques can be executed in parallel.

miguel.astrain@i2a2.upm.es Slide 14

Conclusions

• DAQ combined with OpenCL reduces development of high-performance processing.

• The DAQ and processing systems developed with OpenCL are less manufacturer dependent.

• OpenCL enables C-like development of FPGA with lots of OpenCLalgorithms examples.

• OpenCL handles data transfers and device interface, hardware abstraction.

• Combined with NDSv3 a modular solution was developed, abstracted from the control system. IRIO-OpenCL

• You only need to do the algorithm.

miguel.astrain@i2a2.upm.es Slide 15

Future Work

• Future work:• Implementation of Machine Learning Applications using machine learning and Deep

Learning Tools*.

• Expand IRIO-OpenCL functionalities

• Looking for more use cases (possible collaborations) using IRIO-OpenCL -> Contact us!

miguel.astrain@i2a2.upm.es Slide 16

* Dr Jesus Vega (484. Automatic recognition of plasma relevant events: implications for ITER) 14 may. 2019 9:20

Acknowledgements

This work was supported in part by the Spanish Ministry of Economyand Competitiveness, Projects Nº ENE2015-64914-C3-3-R andMadrid regional government (YEI fund), Grant Nº PEJD-2018-PRE/TIC-8571.

miguel.astrain@i2a2.upm.es Slide 17

The Intel® FPGA SDK for OpenCL™ is based on a published Khronos Specification.Altera, Arria, Intel, the Intel logo, Nios, Quartus and Stratix words and logos are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission of the Khronos Group™.*Other names and brands may be claimed as the property of others.

miguel.astrain@i2a2.upm.es Slide 18

Thank You! Questions?

Backup

miguel.astrain@i2a2.upm.es Slide 19

FPGA

FPGA

FPGA

HOST APPLICATION

Ke

rnel A

rgu

men

ts

12 bits ADC

(1 GS/s)

JESD204B

Pulse Counting

Kernel

DaqStartStop

SamplingRate

BlockSize

FIFO

IO PIPESAI0 (FC)

AI1

Thre

sh

old

s

(m

in,m

ax)

Wid

th

(min

,max)

ST

Ava

lon

Po

rt

CPU

FMC DAQ Device

Threshold

Parameters

Global Buffers

CODAC Core System 6.0

AO0 (FC)

AO1

FC

SAMPLES * 8

125MHz

FC

SAMPLES * 8

125MHz

FC

SAMPLES * 8

125MHz

Campbelling

Kernel

Current

Kernel

FIFO

FIFO

FIFO

PIPES

IRIO-

OpenCL

DAQ

Kernel

Results

Global Buffers

Re

sults

JESD204B

ST Avalon Port

IRIO-OpenCL

SPI Config

12 bits DAC

(1 GS/s)

IRIO-OpenCL

GEN

Kernel

Samples

Global Buffers

Sam

ple

s

PIPESFIFO

FC

SA

MP

LE

S *

4

25

0 M

Hz

NDSv3

Main control, Other systems, etc

PIPES

Low-jitter

Clock GeneratorCLKREF, SYSREF

SPI Master

ST Avalon Port

IRIO-OpenCL

SPI CONFIG

Kernel

IRIO-OpenCL

DAQ

IRIO-OpenCL

GEN

IRIO-OpenCL

NFM

IRIO-OpenCL

BSP

FPGA Logic

ARRIA 10

Reconfigurable

Kernels

Backup

miguel.astrain@i2a2.upm.es Slide 20

asynArrayInt8

asynFloat64

<reason>DAQ.CH1

NDS-Core and NDS-EPICS Libraries

asynDriver Module

<reason>UpdateRate

<reason>UpdateRate_RBV

<more PVs)……...

asynFloat32Array

<reason>DAQ.CH0

asynDriver

Management of NELM for waveforms

Other functionalities specific for HWA

<node>Firmware

<node>Health Monitoring

<PV>SamplingRate_RBV(VariableIn)

<node>Waveform Generation (WFG)

<PV>UpdateRate (DelegateOut)

<PV>UpdateRate_RBV(VariableiN)

<node>Data Acquisition (DAQ)

<PV>SamplingRate (DelegateOut)

<node>Device

<more PVs)……...

Hardware Device A: HWA

EPICS Database

HW Function: DAQ HW FuncTion: WFG

Kernel

Device Driver

(manufacturer)

AP

I

HW Function:TRG-CLK HW Function:Firmware

NDS-EPICS

EPICS-IOC

NDS-EPICS-Ext for HWA

X-Series

PXI6683

PTM-DAMC-1588-10F

PXIe

MTCA.4

FILTERA/D

CONVERTERD/A

CONVERTER FILTER

FILTER

100 MHz

PLL

DIVIDER

MEMORY

MUX

FILTERA/D

CONVERTERMUX D/A

CONVERTER

HW Function:Health Monitoring

Temp Power

NDS-CORE

FlexRIO+NI5761

Results

• OpenCL tools profiler screen

miguel.astrain@i2a2.upm.es Slide 21

• Graphs

Backup