PACAP: up to programming an FPGA?
Steven Derrien, Simon Rokicki
21 November 2016
INSA-EII-5A
Schedule
9:15 - 9:50 : FPGA technology basics
9:50 - 10:15 : Designing FPGAs with HDL
10:15 - 10:45 : Designing FPGAs with HLS
break
10:45 - 12:00 : Lab session 1
break
13:30 - 14:30 : Optimizations for HLS based designs
14:30 - 16:00 : Lab session 2
break
14:30 - 16:00 : Lab session 3
PACAP - FPGA 2
Principles of FPGA technology
• Programmable Logic blocks & Interconnect
• Designing for an FPGA
• Evolutions and improvement in FPGA architecture
• Heterogeneous system (FPGA + CPU)
• FPGA market and application domains
• Different types of FPGA accelerators
A basic FPGA architecture
[Figure: grid of logic blocks (L), connection blocks (C) and switch blocks (S), with horizontal and vertical routing channels made of wiring segments]
L = Logic Block, C = Connection Block, S = Switch Block
• A matrix of logic blocks + programmable interconnect
• A Logic Block is programmed to emulate small logic functions
• Logic Blocks are wired together to implement the full circuit
Example of logic block structure
[Figure: zoom from the FPGA array into a logic block (CLB) made of two slices, then into a slice built from LUT6s and flip-flops]
Example based on the Xilinx Virtex 7 architecture
Logic block (CLB): two slices, each with
• Four 6-input LUTs
• Two flip-flops/LUT
LUT (Look-Up Table) Functionality
[Figure: a four-input circuit (x1, x2, x3, x4 to y) and a two-input gate (x1, x2 to y), each replaced by a LUT holding its truth table]

x1 x2 x3 x4 | y (4-input circuit) | y (2-input gate)
 0  0  0  0 |          0          |        1
 0  0  0  1 |          1          |        1
 0  0  1  0 |          0          |        1
 0  0  1  1 |          0          |        1
 0  1  0  0 |          0          |        1
 0  1  0  1 |          1          |        1
 0  1  1  0 |          0          |        1
 0  1  1  1 |          1          |        1
 1  0  0  0 |          0          |        1
 1  0  0  1 |          1          |        1
 1  0  1  0 |          0          |        1
 1  0  1  1 |          0          |        1
 1  1  0  0 |          1          |        0
 1  1  0  1 |          1          |        0
 1  1  1  0 |          0          |        0
 1  1  1  1 |          0          |        0

• Look-Up tables used for logic implementation
• A LUT4 can implement any function of 4 inputs
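
To make the look-up idea concrete, here is a minimal behavioral sketch in Python (not part of the original slides). It assumes the truth-table strings 0100010101001100 and 1111111111110000 shown on this slide are listed for inputs 0000 to 1111 with x1 as the most significant bit; the helper name make_lut is hypothetical.

```python
from itertools import product


def make_lut(init_bits):
    """Return a function that behaves like a LUT programmed with init_bits.

    init_bits gives the output y for inputs 00..0, 00..1, ..., 11..1,
    with the first input (x1) as the most significant address bit.
    """
    n_inputs = len(init_bits).bit_length() - 1
    if len(init_bits) != 1 << n_inputs:
        raise ValueError("truth table length must be a power of two")
    table = [int(b) for b in init_bits]

    def lut(*inputs):
        if len(inputs) != n_inputs:
            raise ValueError("expected %d inputs" % n_inputs)
        # The inputs only form an address; the stored bit is the result.
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)
        return table[address]

    return lut


lut_a = make_lut("0100010101001100")  # four-input example
lut_b = make_lut("1111111111110000")  # example that only depends on x1, x2

# The second table is y = not (x1 and x2), whatever x3 and x4 are.
is_nand = all(
    lut_b(x1, x2, x3, x4) == 1 - (x1 & x2)
    for x1, x2, x3, x4 in product((0, 1), repeat=4)
)
```

Changing the function implemented by the block only requires changing the 16 stored bits, which is why any function of 4 inputs fits in a LUT4.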
A real logic block (Virtex 7)
[Figure: detailed slice schematic carrying the annotations below]
• Specific features for building wide multiplexers
• Fast carry propagation for adders, etc.
• LUT6 can be used as 64x1 RAM
• LUT6 can be decomposed as 2xLUT5
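
The last two annotations can be illustrated with a small Python sketch (an illustration under assumptions, not the actual slice wiring). It treats the 64 configuration bits of a LUT6 as a 64x1 memory and checks that splitting them into two 32-entry halves, selected by one input through a 2:1 multiplexer, gives the same result. The function names and the parity test function are hypothetical.

```python
from itertools import product


def lut(table, inputs):
    """Read a truth table at the address formed by inputs (MSB first)."""
    address = 0
    for bit in inputs:
        address = (address << 1) | bit
    return table[address]


def lut6_from_two_lut5(table64, inputs):
    """Evaluate a 6-input function with two LUT5s and a 2:1 multiplexer."""
    x_top, rest = inputs[0], inputs[1:]
    low_half = table64[:32]    # behavior when x_top = 0
    high_half = table64[32:]   # behavior when x_top = 1
    y0 = lut(low_half, rest)
    y1 = lut(high_half, rest)
    return y1 if x_top else y0


# Hypothetical 6-input function used only for the check: parity.
# The same 64-entry list can be read as the contents of a 64x1 RAM.
parity_table = [bin(address).count("1") & 1 for address in range(64)]

same_everywhere = all(
    lut6_from_two_lut5(parity_table, bits) == lut(parity_table, bits)
    for bits in product((0, 1), repeat=6)
)
```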
Programmable routing
[Figure: FPGA array of logic blocks (L), connection blocks (C) and switch blocks (S)]
Based on switch boxes and connection blocks
• Configurable (depopulated) crossbars
In modern devices, interconnect is more sophisticated
• Wires spanning several logic blocks, special routing for clocks, etc.
External interface
[Figure: FPGA array extended with connection and switch blocks at its periphery]
I/O pins and pin mapping are also configurable …
• Pins can be configured as input, output, or bidirectional
FPGA configuration is propagated serially through shift registers
• Some FPGA pins are dedicated to the configuration process
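
As a rough illustration of serial configuration, the Python sketch below models a chain of 1-bit configuration cells loaded one bit per clock, like a shift register. The class name ConfigChain and the 16-cell length are assumptions made for the example; real devices use vendor-specific configuration protocols.

```python
class ConfigChain:
    """Chain of 1-bit configuration cells loaded like a shift register."""

    def __init__(self, length):
        self.cells = [0] * length

    def shift_in(self, bit):
        # The new bit enters cell 0; every stored bit moves one cell further.
        self.cells = [bit & 1] + self.cells[:-1]

    def load(self, bitstream):
        # One bit per clock cycle on the (dedicated) configuration pin.
        for ch in bitstream:
            self.shift_in(int(ch))


chain = ConfigChain(16)
chain.load("0100010101001100")

# After 16 shifts, the first bit sent sits in the last cell of the chain.
stored_in_send_order = chain.cells[::-1]
```

With this model, loading N configuration bits takes N clock cycles, whatever the bits are used for (LUT contents, routing switches, pin modes).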
Principles of FPGA technology: Designing for an FPGA
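architecture MLU_DATAFLOW of MLU is
  -- Operands and result after optional inversion
  signal A1, B1, Y1 : STD_LOGIC;
  -- Candidate results, one per logic operation
  signal MUX_0, MUX_1 : STD_LOGIC;
  signal MUX_2, MUX_3 : STD_LOGIC;
  -- Operation select, declared so that the selector has an unambiguous type
  signal SEL : STD_LOGIC_VECTOR(1 downto 0);
begin
  -- Optional inversion of the inputs and of the output
  A1 <= A when (NEG_A = '0') else not A;
  B1 <= B when (NEG_B = '0') else not B;
  Y  <= Y1 when (NEG_Y = '0') else not Y1;

  -- The four logic operations are computed in parallel
  MUX_0 <= A1 and B1;
  MUX_1 <= A1 or B1;
  MUX_2 <= A1 xor B1;
  MUX_3 <= A1 xnor B1;

  -- 4:1 multiplexer driven by L1 and L0
  SEL <= L1 & L0;
  with SEL select
    Y1 <= MUX_0 when "00",
          MUX_1 when "01",
          MUX_2 when "10",
          MUX_3 when others;
end MLU_DATAFLOW;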
[Figure: Logic Synthesis turns the VHDL description into a circuit netlist]
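
A software reference model is a convenient way to check what the synthesized netlist must compute. The Python sketch below mirrors the MLU dataflow description, assuming every port is a single bit encoded as 0 or 1; the function name mlu and the argument order are choices made for this example.

```python
from itertools import product


def mlu(a, b, neg_a, neg_b, neg_y, l1, l0):
    """Bit-level model of the MLU dataflow architecture."""
    a1 = a ^ neg_a            # A1 <= A when NEG_A = '0' else not A
    b1 = b ^ neg_b            # B1 <= B when NEG_B = '0' else not B
    candidates = {
        (0, 0): a1 & b1,        # MUX_0
        (0, 1): a1 | b1,        # MUX_1
        (1, 0): a1 ^ b1,        # MUX_2
        (1, 1): 1 - (a1 ^ b1),  # MUX_3 (xnor)
    }
    y1 = candidates[(l1, l0)]
    return y1 ^ neg_y         # Y <= Y1 when NEG_Y = '0' else not Y1


# Full truth table of the circuit: 7 inputs, hence 128 rows.
truth_table = [mlu(*bits) for bits in product((0, 1), repeat=7)]
```

Since the function has 7 inputs, it cannot fit in a single 6-input LUT, so the tools have to split it over several LUTs.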
Technology mapping
[Figure: the netlist re-expressed with FPGA primitives (LUT0 to LUT5, flip-flops FF1 and FF2), shown next to the FPGA array]
Placement and routing
[Figure: FPGA array of logic, connection and switch blocks]
Derive an actual FPGA configuration meeting constraints
• Constraints in the form of achievable clock speed
During the lab you will realize that P&R can be time consuming.
For very large designs, P&R can take days …
Bitstream & device configuration
[Figure: bitstream 0100101001011001010 loaded into the FPGA array]
From the Place & Route results, we derive the configuration bitstream.
The bitstream is then downloaded into the FPGA from FLASH or by a CPU.
Configuration data is used by the FPGA at power-up.
Programmable Logic Design Flow
[Figure: flowchart. Steps: Design Capture (HDL), Logic Synthesis, Floorplanning, Placement, Routing, configuration, In situ testing, with Simulation and Post-Layout Simulation as checks and Design Iteration loops. Levels: Behavioral, Structural, Physical, On Field]
Principles of FPGA technology: Evolutions and improvement in FPGA architecture
Limits of LUT based FPGAs
Lack of sufficient on-chip storage
• Signal processing/wireless applications need to buffer data and/or results
• Network applications need to store many medium sized tables
Poor/insufficient arithmetic performance
• Integer multiplication/accumulation is a key metric for DSP
Integer multipliers built out of LUTs are too slow and costly to enable real-time signal processing applications
On-chip memory built out of LUTs and slice flip-flops is not sufficient to meet performance requirements
DSP blocks
Extend FPGA architecture with arithmetic oriented blocks
• Medium sized hard-wired integer multipliers
• Fast accumulation, rounding and shifters, etc.
Example of the Virtex-5 DSP block
[Figure: DSP block datapath with the components below]
• 25 bit pre-adder
• 25x18 pipelined integer multiplier
• 17 bit shifter for scaling
• 48 bit wide ALU
Somewhat similar structures used in Altera devices
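
The bit widths listed for the DSP block can be turned into a small functional sketch in Python: pre-add two operands on 25 bits, multiply by an 18-bit operand, accumulate on 48 bits. This is a simplified model under assumptions (signed two's-complement operands, no pipeline registers, no shifter or rounding modes); the names wrap_signed and dsp_mac are hypothetical.

```python
def wrap_signed(value, bits):
    """Wrap an integer to a two's-complement value on the given width."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >> (bits - 1) else value


def dsp_mac(acc, a, d, b):
    """One multiply-accumulate step: acc + (a + d) * b."""
    pre = wrap_signed(a + d, 25)            # 25 bit pre-adder
    product = pre * wrap_signed(b, 18)      # 25x18 multiplier
    return wrap_signed(acc + product, 48)   # 48 bit wide ALU


# Usage: dot product of three samples and three coefficients,
# with the second pre-adder operand tied to 0.
samples = [1, 2, 3]
coefficients = [4, 5, 6]
acc = 0
for x, c in zip(samples, coefficients):
    acc = dsp_mac(acc, x, 0, c)
```

A symmetric filter could feed two samples sharing one coefficient into the pre-adder, which halves the number of multiplications.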
Embedded memory blocks
Hard-wired memory banks distributed in the FPGA
• First blocks were 9 Kb blocks, current blocks are 36 Kb
[Figure: dual-port 36 Kb memory array. Port A signals: DIA, ADDRA, DOA, CLKA, WEA. Port B signals: DIB, ADDRB, DOB, CLKB, WEB]
• Configurable width/depth (32kx1 to 512x72)
• Two read/write ports with distinct address ports
• Built-in logic to operate as FIFO buffer
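
The width/depth trade-off is simple arithmetic, sketched below in Python. Only the two end points (32kx1 and 512x72) come from the slide; the intermediate aspect ratios are assumed here as the usual halving of depth for each doubling of width, with widths 9, 18, 36 and 72 including parity bits.

```python
# (depth, width) pairs. End points from the slide, others assumed.
ASPECT_RATIOS = [
    (32768, 1),
    (16384, 2),
    (8192, 4),
    (4096, 9),
    (2048, 18),
    (1024, 36),
    (512, 72),
]


def capacity_bits(depth, width):
    """Number of bits stored for a given configuration."""
    return depth * width


def address_bits(depth):
    """Number of address lines needed to reach every word."""
    return (depth - 1).bit_length()


summary = [
    (depth, width, capacity_bits(depth, width) // 1024, address_bits(depth))
    for depth, width in ASPECT_RATIOS
]
# Each entry reads: depth, width, capacity in Kb, address bits.
```

In this model the narrow configurations reach 32 Kb and the parity-wide ones reach the full 36 Kb, while the number of address bits shrinks from 15 to 9.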
State of the art FPGAs at a glance
[Table: FPGA families compared. Columns (family positioning): Lowest Power and Cost; Industry’s Best Price/Performance; Industry’s Highest System Performance. Rows (maximum capability): Logic Cells, Block RAM, DSP Slices, Peak DSP Perf., Transceivers, Transceiver Performance, Memory Performance, I/O Pins, I/O Voltages]
Different capacity, performance and features
Device cost ranges from $5 to $20k …
FPGA trends
FPGA capacities evolve faster than Moore’s Law dictates
• Very regular design eases optimized implementation tricks
• Multiple FPGA dies on a silicon interposer
[Figure: chart annotated with 65%, 130%, 163%]
Principles of FPGA technology: Heterogeneous system (FPGA + CPU)
System Level Integration
Older systems combined FPGA + CPU at PCB level
• Flexibility in CPU/DSP and FPGA choices
• CPU used mostly for UI or system level management
Processor soft-cores appeared in the early 2000s
• Processors built out of FPGA logic (LUT + DSP + EMB)
• Limited clock speed and low performance micro-architecture
• Ex: NIOS2 (revamped MIPS R3000) reached 300 MIPS
Today, FPGAs integrate high performance embedded CPUs
• ARM processors (A9 - A53) and/or PowerPC cores
• Intel Xeon-FPGA as a dual chip in the same package
The Zynq platform
[Figure: Zynq block diagram. Two Cortex A9 cores, each with an MMU and an L1 cache, share a 256kb L2 cache and a memory controller leading to external memory (DDRAM); the virtual address space and two links labeled 1.2 GB/s are also shown]
• 600 MHz dual core Cortex A9 with Neon SIMD ISA
• Cache coherent access to L2 through the ACP port
• Four non-coherent accesses to SDRAM
The Zybo board
Low end Zynq based system for academic use ($150).
• 28,000 logic cells
• 240 KB Block RAM
• 80 DSP slices
• 650 MHz dual-core Cortex A9
• DDR3 memory 512 MB x32 w/ 1050 Mbps bandwidth
Principles of FPGA technology: FPGA market and application domains
FPGA markets
Storage and networking are the main market drivers
Taken from http://www.radiantinsights.com/img/research/north-america-fpga-market.png
FPGAs vs. ASICs
ASIC NRE costs have risen dramatically over the years
FPGAs keep on improving in size, performance, cost
[Figure: total cost versus volume for Std. Cell and FPGA, current and future, with the break-even point marked]
In 2009, 97% of new design starts targeted FPGAs
[source chipdesign, 2009]
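
The break-even point on the cost curve follows from a linear cost model, total cost = NRE + unit cost x volume. The Python sketch below works it out; all the dollar figures are hypothetical values chosen for the example, not data from the slides.

```python
def total_cost(nre, unit_cost, volume):
    """Linear cost model: one-time NRE plus a per-device cost."""
    return nre + unit_cost * volume


def break_even_volume(nre_asic, unit_asic, nre_fpga, unit_fpga):
    """Volume at which the ASIC and FPGA total costs are equal."""
    return (nre_asic - nre_fpga) / (unit_fpga - unit_asic)


# Hypothetical figures, in dollars.
NRE_ASIC, UNIT_ASIC = 1_000_000, 5
NRE_FPGA, UNIT_FPGA = 100_000, 50

volume = break_even_volume(NRE_ASIC, UNIT_ASIC, NRE_FPGA, UNIT_FPGA)
cost_asic = total_cost(NRE_ASIC, UNIT_ASIC, volume)
cost_fpga = total_cost(NRE_FPGA, UNIT_FPGA, volume)
```

Below the break-even volume the FPGA is cheaper overall. Rising ASIC NRE costs and falling FPGA unit costs both push that volume higher.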
Principles of FPGA technology: Different types of FPGA accelerators
FPGA as throughput accelerators
FPGA accelerator = massively parallel processing
• 10 Tflops announced for the Stratix 10 FPGA
• Even better for unconventional arithmetic (cryptography)
FPGAs do not necessarily perform better than GPUs
• Benefit of FPGAs is mostly the 10x-50x energy efficiency
FPGAs as latency accelerators
[Figure: CPU, GPU and FPGA side by side, drawn with Control, ALU, Cache and DRAM blocks]
Example: key-value store (memcached)
• Large scale distributed key-value systems
[Figure: key/value lookup]