PACAP de programmer un FPGA ?

Steven Derrien, Simon Rokicki
21 November 2016

INSA-EII-5A

Source: people.irisa.fr/Simon.Rokicki/files/Pacap-FPGA.pdf


Schedule

9h15 – 9h50 : FPGA technology basics

9h50 – 10h15 : Designing FPGAs with HDL

10h15 – 10h45 : Designing FPGAs with HLS

break

10h45 – 12h00 : Lab session 1

break

13h30 – 14h30 : Optimizations for HLS based designs

14h30 – 16h00 : Lab session 2

break

14h30 – 16h00 : Lab session 3

PACAP - FPGA 2


Principles of FPGA technology

Programmable logic blocks & interconnect
Designing for an FPGA
Evolutions and improvement in FPGA architecture
Heterogeneous system (FPGA + CPU)
FPGA market and application domains
Different types of FPGA accelerators


A basic FPGA architecture

[Figure: grid of logic blocks (L), connection blocks (C) and switch blocks (S), linked by horizontal and vertical routing channels made of wiring segments]

L = logic block
C = connection block
S = switch block

A matrix of logic blocks + programmable interconnect
A logic block is programmed to emulate a small logic function
Logic blocks are wired together to implement the full circuit


Example of logic block structure

[Figure: zoom from the FPGA fabric into a CLB, its two slices, and a LUT6 with its flip-flop]

Example based on the Xilinx Virtex 7 architecture

Logic block (CLB)

• Four 6-input LUTs
• Two flip-flops/LUT


LUT (Look-Up Table) Functionality

[Figure: logic circuits with inputs x1..x4 and output y, each implemented by a LUT holding its truth table]

Two example LUT4 configurations:

x1 x2 x3 x4 | y (first table) | y (second table)
 0  0  0  0 |        0        |        1
 0  0  0  1 |        1        |        1
 0  0  1  0 |        0        |        1
 0  0  1  1 |        0        |        1
 0  1  0  0 |        0        |        1
 0  1  0  1 |        1        |        1
 0  1  1  0 |        0        |        1
 0  1  1  1 |        1        |        1
 1  0  0  0 |        0        |        1
 1  0  0  1 |        1        |        1
 1  0  1  0 |        0        |        1
 1  0  1  1 |        0        |        1
 1  1  0  0 |        1        |        0
 1  1  0  1 |        1        |        0
 1  1  1  0 |        0        |        0
 1  1  1  1 |        0        |        0

• Look-Up tables used for logic implementation

• A LUT4 can implement any function of 4 inputs
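A LUT is nothing more than a small memory addressed by its inputs, which is why it can implement any function of those inputs. The sketch below is a minimal Python model of that idea, not a description of any vendor primitive: `make_lut4` is a hypothetical helper, and the bit ordering (x1 as most significant address bit) is an assumption chosen to match the row order of the truth tables on the slide.

```python
def make_lut4(truth_bits: str):
    """Return a 4-input Boolean function backed by a 16-entry table."""
    if len(truth_bits) != 16 or set(truth_bits) - {"0", "1"}:
        raise ValueError("a LUT4 needs exactly 16 configuration bits")
    table = [int(b) for b in truth_bits]

    def lut(x1: int, x2: int, x3: int, x4: int) -> int:
        # The inputs form the address; x1 is assumed to be the MSB.
        address = (x1 << 3) | (x2 << 2) | (x3 << 1) | x4
        return table[address]

    return lut


# Output columns as printed on the slide, rows ordered 0000 .. 1111.
lut_a = make_lut4("0100010101001100")
lut_b = make_lut4("1111111111110000")

# The second table equals NAND(x1, x2): x3 and x4 are simply ignored.
for address in range(16):
    x1, x2, x3, x4 = [(address >> shift) & 1 for shift in (3, 2, 1, 0)]
    assert lut_b(x1, x2, x3, x4) == 1 - (x1 & x2)
```

Changing the function only means changing the 16 configuration bits; the lookup hardware stays the same.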


Logic block for real (Virtex 7)

Specific features for building wide multiplexers

Fast carry propagation for adders, etc.

LUT6 can be used as 64x1 RAM

LUT6 can be decomposed as 2xLUT5
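The relation between one LUT6 and two LUT5s can be seen as a Shannon expansion: the 64-entry table splits into two 32-entry halves, and the sixth input selects between them. The following Python sketch illustrates only this logical equivalence; the function names are hypothetical, and the actual wiring of the Virtex 7 slice (shared inputs, second output) is not modeled.

```python
import random


def lut(table, inputs):
    """Look up `inputs` (most significant bit first) in a list of table bits."""
    address = 0
    for bit in inputs:
        address = (address << 1) | bit
    return table[address]


def lut6_from_two_lut5(table64, inputs6):
    """Evaluate a 6-input function with two 5-input tables and a 2:1 mux."""
    select, low5 = inputs6[0], inputs6[1:]
    lut5_when_0 = table64[:32]   # half of the table used when select == 0
    lut5_when_1 = table64[32:]   # half of the table used when select == 1
    return lut(lut5_when_1, low5) if select else lut(lut5_when_0, low5)


# Check the equivalence on an arbitrary 64-bit configuration.
random.seed(0)
table64 = [random.randint(0, 1) for _ in range(64)]
for address in range(64):
    inputs6 = [(address >> shift) & 1 for shift in range(5, -1, -1)]
    assert lut(table64, inputs6) == lut6_from_two_lut5(table64, inputs6)
```

The same 64-entry table also explains the 64x1 RAM mode: the six inputs act as the address of a 64-word, 1-bit memory.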


Programmable routing

Based on switch boxes and connection blocks
Configurable (depopulated) crossbars

In modern devices, the interconnect is more sophisticated
Wires spanning several logic blocks, special routing for clocks, etc.


External interface

[Figure: FPGA fabric surrounded by I/O blocks]

I/O pins and pin mapping are also configurable

Pins can be configured as input, output, or bidirectional

FPGA configuration is propagated serially through shift registers

Some FPGA pins are dedicated to the configuration process

Designing for an FPGA

architecture MLU_DATAFLOW of MLU is
  signal A1, B1, Y1   : STD_LOGIC;
  signal MUX_0, MUX_1 : STD_LOGIC;
  signal MUX_2, MUX_3 : STD_LOGIC;
begin
  -- Optional inversion of both inputs and of the output
  A1 <= A when (NEG_A = '0') else not A;
  B1 <= B when (NEG_B = '0') else not B;
  Y  <= Y1 when (NEG_Y = '0') else not Y1;

  -- The four candidate logic operations
  MUX_0 <= A1 and B1;
  MUX_1 <= A1 or B1;
  MUX_2 <= A1 xor B1;
  MUX_3 <= A1 xnor B1;

  -- L1 & L0 selects which operation drives Y1
  with (L1 & L0) select
    Y1 <= MUX_0 when "00",
          MUX_1 when "01",
          MUX_2 when "10",
          MUX_3 when others;
end MLU_DATAFLOW;

Logic synthesis: VHDL description → circuit netlist
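To make the behavior of the MLU listing concrete, here is a small Python reference model of the same dataflow. It is only a sketch for reasoning about the function: the `mlu` and `depends_on` names are hypothetical, and signals are modeled as plain 0/1 integers rather than STD_LOGIC values.

```python
from itertools import product


def mlu(a, b, neg_a, neg_b, neg_y, l1, l0):
    """Behavioral model of the MLU dataflow architecture (0/1 integers)."""
    a1 = a ^ neg_a                      # A1 <= A when NEG_A = '0' else not A
    b1 = b ^ neg_b                      # B1 <= B when NEG_B = '0' else not B
    candidates = {
        (0, 0): a1 & b1,                # MUX_0: and
        (0, 1): a1 | b1,                # MUX_1: or
        (1, 0): a1 ^ b1,                # MUX_2: xor
        (1, 1): 1 - (a1 ^ b1),          # MUX_3: xnor
    }
    y1 = candidates[(l1, l0)]
    return y1 ^ neg_y                   # Y <= Y1 when NEG_Y = '0' else not Y1


def depends_on(index):
    """True if flipping input `index` changes the output for some assignment."""
    for bits in product((0, 1), repeat=7):
        flipped = list(bits)
        flipped[index] ^= 1
        if mlu(*bits) != mlu(*flipped):
            return True
    return False


# The output depends on all 7 inputs, so it cannot fit in a single 6-input LUT.
assert all(depends_on(i) for i in range(7))
```

Since the function has seven relevant inputs, a synthesis tool targeting 6-input LUTs has to split it across several LUTs, which is the job of the mapping step that follows.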


Technology mapping

[Figure: the netlist is covered by LUTs (LUT0 to LUT5) and flip-flops (FF1, FF2)]


Placement and routing

Derive an actual FPGA configuration meeting constraints
Constraints in the form of achievable clock speed

During the lab you will realize that P&R can be time consuming.

For very large designs, P&R can take days …
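The clock-speed constraint boils down to a simple relation: the clock period cannot be shorter than the slowest register-to-register path left after placement and routing. The Python sketch below shows that arithmetic only; the path names and delays are made-up illustrative values, and effects such as setup time and clock skew are deliberately ignored.

```python
def max_frequency_mhz(path_delays_ns):
    """Highest clock frequency allowed by the slowest (critical) path."""
    critical_ns = max(path_delays_ns.values())
    return 1000.0 / critical_ns  # a 1 ns period corresponds to 1000 MHz


def meets_constraint(path_delays_ns, target_mhz):
    """True if the routed design can run at the requested clock frequency."""
    return max_frequency_mhz(path_delays_ns) >= target_mhz


# Hypothetical post-routing delays (logic + wires), in nanoseconds.
paths_ns = {
    "FF1 -> LUT0 -> FF2": 3.2,
    "FF1 -> LUT2 -> LUT5 -> FF2": 8.0,
    "FF2 -> LUT4 -> FF1": 5.0,
}

print(max_frequency_mhz(paths_ns))
print(meets_constraint(paths_ns, 100.0), meets_constraint(paths_ns, 150.0))
```

When the constraint is not met, the tools have to try other placements and routes to shorten the critical path, which is one reason P&R run times grow so quickly.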


Bitstream & device configuration

Configuration data is used by the FPGA at power-up

[Figure: a bitstream (0100101001011001010...) loaded into the FPGA fabric]

From the Place & Route results, we derive the configuration bitstream

The bitstream is then downloaded into the FPGA from flash memory or by a CPU.
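Since the configuration is pushed serially through shift registers, loading a bitstream can be pictured as shifting bits into a long chain of configuration cells, some of which hold LUT contents. The Python sketch below is a toy model under strong assumptions: the `ConfigChain` class, the 8-cell chain and the two 2-input LUTs are all hypothetical, and real bitstreams also carry headers, addresses and checksums that are not modeled.

```python
class ConfigChain:
    """A chain of 1-bit configuration cells loaded like a shift register."""

    def __init__(self, length):
        self.cells = [0] * length

    def shift_in(self, bit):
        # The new bit enters cell 0; every stored bit moves one cell further.
        self.cells = [bit] + self.cells[:-1]

    def load(self, bitstream):
        for char in bitstream:
            self.shift_in(int(char))

    def lut2(self, index, x1, x2):
        """Read 2-input LUT number `index`, stored in 4 consecutive cells."""
        return self.cells[4 * index + 2 * x1 + x2]


# Desired cell contents: LUT 0 = AND (0001), LUT 1 = XOR (0110).
desired = "0001" + "0110"

chain = ConfigChain(len(desired))
# The first bit sent travels farthest, so the stream is sent in reverse order.
chain.load(desired[::-1])

assert chain.cells == [int(c) for c in desired]
print(chain.lut2(0, 1, 1), chain.lut2(1, 1, 0))
```

The same mechanism explains why a device can be reconfigured at will: loading a different bitstream overwrites every cell in the chain.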


Programmable Logic Design Flow

[Figure: flow chart with design-iteration loops]

Design steps: design capture (HDL), logic synthesis, floorplanning, placement, routing, configuration
Verification steps: simulation, post-layout simulation, in situ testing (on field)
Abstraction levels: behavioral, structural, physical

Evolutions and improvement in FPGA architecture

Limits of LUT based FPGAs

Lack of sufficient on-chip storage
  Signal processing/wireless applications need to buffer data and/or results
  Network applications need to store many medium-sized tables

Poor/insufficient arithmetic performance
  Integer multiplication/accumulation (MAC) is a key metric for DSP

Integer multipliers built out of LUTs are too slow and costly to enable real-time signal processing applications

On-chip memory built out of LUTs and slice flip-flops is not sufficient to meet performance requirements


DSP blocks

Extend the FPGA architecture with arithmetic-oriented blocks
  Medium-sized hard-wired integer multipliers
  Fast accumulation, rounding, shifters, etc.

Example of the Virtex-5 DSP block

[Figure: DSP block with a 25-bit pre-adder, a 25x18 pipelined integer multiplier, a 48-bit wide ALU and a 17-bit shifter for scaling]

Somewhat similar structures are used in Altera devices
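The datapath of such a DSP block (pre-adder, multiplier, accumulator) can be mimicked in a few lines. The Python sketch below is a behavioral approximation only: the bit widths come from the slide, while the function names, the two's complement wrap-around behavior and the absence of pipelining or operating modes are simplifying assumptions.

```python
def wrap_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def dsp_mac(acc, a, d, b):
    """One multiply-accumulate step: acc + (a + d) * b with fixed widths."""
    pre = wrap_signed(a + d, 25)            # 25-bit pre-adder
    product = pre * wrap_signed(b, 18)      # 25x18 signed multiplier
    return wrap_signed(acc + product, 48)   # 48-bit accumulator


def symmetric_fir(sample_pairs, coefficients):
    """Symmetric FIR tap sum: paired samples share one coefficient."""
    acc = 0
    for (x_left, x_right), coeff in zip(sample_pairs, coefficients):
        acc = dsp_mac(acc, x_left, x_right, coeff)
    return acc


# (1 + 2) * 5 + (3 + 4) * 6
print(symmetric_fir([(1, 2), (3, 4)], [5, 6]))
```

The pre-adder is what makes symmetric filters cheap: two samples that share a coefficient cost a single multiplication.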


Embedded memory blocks

Hard-wired memory banks distributed in the FPGA
First blocks were 9 kbit blocks, current blocks are 36 kbits

[Figure: 36 Kb memory array with two ports; port A signals DIA, ADDRA, DOA, CLKA, WEA; port B signals DIB, ADDRB, DOB, CLKB, WEB]

Configurable width/depth (32k x 1 to 512 x 72)

Two read/write ports with distinct address ports.

Built-in logic to operate as FIFO buffer
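The configurable width/depth trades address bits for data bits within the same 36 Kb array. The Python sketch below estimates how many blocks a given memory needs; only the two endpoints (32k x 1 and 512 x 72) come from the slide, the intermediate aspect ratios are an assumption based on the usual Xilinx 36 Kb block RAM modes, and `blocks_needed` is a hypothetical helper rather than what a synthesis tool actually does.

```python
from math import ceil

# (depth, width) modes of one 36 Kb block; intermediate entries are assumed.
CONFIGS = [
    (32768, 1), (16384, 2), (8192, 4),
    (4096, 9), (2048, 18), (1024, 36), (512, 72),
]


def describe(depth, width):
    """Address width and storage actually used by one aspect ratio."""
    return {
        "address_bits": depth.bit_length() - 1,  # log2 of a power of two
        "bits_used": depth * width,
    }


def blocks_needed(depth, width):
    """Fewest blocks able to hold a depth x width memory in a single mode."""
    return min(
        ceil(depth / cfg_depth) * ceil(width / cfg_width)
        for cfg_depth, cfg_width in CONFIGS
    )


for depth, width in CONFIGS:
    print(depth, width, describe(depth, width))
```

For example, under these assumptions a 1024 x 32 buffer fits in one block, while a 4096 x 16 table needs two.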


State of the art FPGAs at a glance

[Table: maximum capability of three device families (Lowest Power and Cost; Industry's Best Price/Performance; Industry's Highest System Performance), compared by logic cells, block RAM, DSP slices, peak DSP performance, transceivers, transceiver performance, memory performance, I/O pins and I/O voltages]

Different capacity, performance and features

Device cost ranges from $5 to $20k …


FPGA trends

FPGA capacities evolve faster than Moore's Law dictates
  Very regular design eases optimized implementation tricks
  Multiple FPGA dies on a silicon interposer

[Figure: chart annotated 65%, 130%, 163%]

Heterogeneous system (FPGA + CPU)

System Level Integration

Older systems combined FPGA + CPU at PCB level
  Flexibility in CPU/DSP and FPGA choices
  CPU used mostly for UI or system-level management

Soft-core processors appeared in the early 2000s
  Processors built out of FPGA logic (LUT + DSP + EMB)
  Limited clock speed and low-performance micro-architecture
  Ex: NIOS2 (revamped MIPS R3000) reached 300 MIPS

Today, FPGAs integrate high-performance embedded CPUs
  ARM processors (A9 – A53) and/or PowerPC cores
  Intel Xeon-FPGA as a dual chip in the same package


The Zynq platform

[Figure: Zynq block diagram with two Cortex A9 cores, each with its MMU and L1 cache, a shared 256kb L2 cache, a memory controller to external memory (DDRAM), the virtual address space, and 1.2 GB/s links]

Cache coherent access to L2 with ACP port

Four non-coherent accesses to SDRAM

600 MHz dual-core Cortex A9 with NEON SIMD ISA


The Zybo board

Low-end Zynq-based system for academic use ($150).

• 28,000 logic cells
• 240 KB Block RAM
• 80 DSP slices
• 650 MHz dual-core Cortex A9
• DDR3 memory 512 MB x32 w/ 1050Mbps bandwidth

FPGA market and application domains

FPGA markets

Storage and networking are the main market drivers

Taken from http://www.radiantinsights.com/img/research/north-america-fpga-market.png


FPGAs vs. ASICs

ASIC NRE costs have risen dramatically over the years
FPGAs keep on improving in size, performance, cost

[Figure: total cost versus volume for Std. Cell (current, future) and FPGA (current, future), with the break-even point marked]

In 2009, 97% of new design starts targeted FPGAs

[source chipdesign, 2009]
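The break-even point of the cost curves follows from a linear model: total cost = NRE + unit cost x volume, so the two lines cross where the ASIC's extra NRE is paid back by its lower unit cost. The Python sketch below works this out; all dollar figures are invented for illustration and are not taken from the slide or its source.

```python
def total_cost(nre, unit_cost, volume):
    """Linear cost model: one-off engineering cost plus per-unit cost."""
    return nre + unit_cost * volume


def break_even_volume(nre_asic, unit_asic, nre_fpga, unit_fpga):
    """Volume at which the ASIC and FPGA total costs are equal."""
    if unit_fpga <= unit_asic:
        raise ValueError("no break-even: the FPGA unit cost is not higher")
    return (nre_asic - nre_fpga) / (unit_fpga - unit_asic)


# Hypothetical figures, in dollars.
NRE_ASIC, UNIT_ASIC = 2_000_000, 5
NRE_FPGA, UNIT_FPGA = 50_000, 55

volume = break_even_volume(NRE_ASIC, UNIT_ASIC, NRE_FPGA, UNIT_FPGA)
print(volume)
print(total_cost(NRE_ASIC, UNIT_ASIC, volume),
      total_cost(NRE_FPGA, UNIT_FPGA, volume))
```

Below that volume the FPGA is the cheaper option; rising ASIC NRE costs push the crossing point toward higher volumes, which is the trend the figure illustrates.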

Different types of FPGA accelerators

FPGA as throughput accelerators

FPGA accelerator = massively parallel processing
  10 TFLOPS announced for the Stratix 10 FPGA
  Even better for unconventional arithmetic (cryptography)

FPGAs do not necessarily perform better than GPUs
  The benefit of FPGAs is mostly the 10x-50x energy efficiency

[Figure: CPU, GPU and FPGA organizations compared (control, ALUs, cache, DRAM)]


FPGAs as latency accelerators

Example: key-value store (memcached)
  Large scale distributed key-value systems

[Figure: key to value lookup]
