EE109 FPGAs and Memories
18.1
Unit 18
Field Programmable Gate Arrays (FPGAs)
Implementing Logic Functions with Memories
18.2
HARDWARE IMPLEMENTATION TARGETS
18.3
Processing Logic Approaches
• Recall HW/SW designs sit on a continuum
• Suppose I want to implement: F = (X+Y)*(A+B)
• Custom Hardware (Faster, Less Power)
– Logic that directly implements a specific task
– Example above may use separate adders and a multiplier unit
• General Purpose (GP) Processor/Microcontroller (Design Time, Cost)
– Logic designed to execute SW instructions
– Provides basic processing resources that are reused by each instruction
• What if I want to perform: (X*Y) + (A*B)
– What's easiest to redesign?
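[Figure: Custom HW Implementation. X and Y feed one adder, A and B feed a second adder, and the two sums feed a multiplier (*) that produces F]
[Figure: Computing System Continuum. Application Specific Hardware (no software) at one end, Processor Executing Software at the other; axis labels: Flexibility, Design Time; Performance; Cost]
[Figure: GP Proc. Implementation of (X+Y)*(A+B). Processor with + and * units and CPU control; the instruction store holds ADD T,X,Y / ADD S,A,B / MUL F,T,S; data in memory]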
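The "what's easiest to redesign?" question can be tried out in software. The sketch below is only an illustration, not a model of any real processor: the tuple format, the `run` helper, and the sample data values are assumptions made for this example. It executes the three-instruction program from the figure, then swaps in a second program for (X*Y) + (A*B) without touching the "hardware" (the `run` function).

```python
# Hypothetical encoding of the slide's program: (operation, destination, source 1, source 2).
PROGRAM = [("ADD", "T", "X", "Y"), ("ADD", "S", "A", "B"), ("MUL", "F", "T", "S")]

def run(program, data):
    """Execute each instruction in order on a copy of the data memory."""
    mem = dict(data)
    ops = {"ADD": lambda a, b: a + b, "MUL": lambda a, b: a * b}
    for op, dest, src1, src2 in program:
        mem[dest] = ops[op](mem[src1], mem[src2])
    return mem

DATA = {"X": 2, "Y": 3, "A": 4, "B": 5}  # sample values chosen for illustration
result = run(PROGRAM, DATA)
print(result["F"])  # (2+3)*(4+5) = 45

# Redesign for (X*Y) + (A*B): only the stored instructions change.
PROGRAM2 = [("MUL", "T", "X", "Y"), ("MUL", "S", "A", "B"), ("ADD", "F", "T", "S")]
result2 = run(PROGRAM2, DATA)
print(result2["F"])  # 2*3 + 4*5 = 26
```

With custom hardware the same change would mean rewiring two adders and a multiplier into two multipliers and an adder.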
18.4
Progression of HW Logic Density
• Our ability to design hardware components with greater numbers of gates/transistors has increased exponentially
• Small Scale Integrated (SSI) Circuits
– 1960’s and 1970’s
– A few gates on a chip (74LS00 has 4 NAND gates)
• Medium Scale Integrated (MSI) Circuits
– 1970’s
– Around a hundred gates per chip (4-bit adder)
• Large Scale Integrated (LSI) Circuits
• Very Large Scale Integrated (VLSI) Circuits
– 100’s of millions of gates
18.5
ASICs
• Application Specific Integrated Circuit (ASIC) is another name for a typical "chip"
• Computer engineers determine the gates and the interconnections that perform a specific task/application
– Start with high level "behavioral" description
– Use CAD software tools to refine that to logic gates
– Use CAD software tools to refine that to transistors and where each should be located on the surface of the chip and how they should be wired together
– From there the chip is fabricated and mass-produced
• Design process is expensive, and once fabricated the design cannot be changed (but it is fast and uses less power)
In an ASIC design, a unique chip will be manufactured that implements our design, at which point the HW design is fixed & cannot be changed (example: Pentium, etc.)
18.6
ASICs
18.7
Motivation for Reconfigurable Logic
• Could we get some of the benefits of both hardware (speed/power) AND software (flexible/reusable)?
• Yes…enter Field Programmable Gate Arrays (FPGAs)
– Have prebuilt, generic hardware constructs that can be configured and interconnected for one design and then reconfigured and interconnected later for another design
• Let's learn more about the secret ingredient to FPGAs…memories!
[Figure: Computing System Continuum. Application Specific Hardware (no software / custom chip) at one end, Microcontroller/Processor Executing Software at the other, and Reconfigurable Hardware (FPGAs) in between]
FPGAs have “logic resources” on them that we can configure to implement our specific design. We can then reconfigure them to implement another design.
18.8
Where are FPGAs Used
• Datacenters
– Bing search engine
– Real-time data analytics
– Compression and encryption
– High-frequency trading
• Robots and Rovers
– JPL and the Mars Rovers
• Telecom
• Aerospace
18.9
USING MEMORIES TO BUILD COMBINATIONAL CIRCUITS
18.10
MEMORY BASICS
Dimensions and Operations
18.11
Memories
• Memories store (write) and retrieve (read) data
– Read-Only Memories (ROM’s): Can only retrieve data (contents are initialized and then cannot be changed)
– Read-Write Memories (RWM’s): Can retrieve data and change the contents to store new data
18.12
ROM’s
• Memories are just tables of data with rows and columns
• When data is read, one entire row of data is read out
• The row to be read is selected by putting a binary number on the address inputs
[Figure: ROM with address inputs A2 A1 A0 and data outputs D3 D2 D1 D0]
Address | D3 D2 D1 D0
0       | 0  0  1  1
1       | 1  0  1  0
2       | 0  1  0  0
3       | 0  1  1  1
4       | 1  1  0  1
5       | 1  0  0  0
6       | 0  1  1  0
7       | 1  0  1  1
18.13
ROM’s
• Example
– Address = 4₁₀ = 100₂ is provided as input
– ROM outputs data in that row (1101 bin.)
[Figure: the same ROM with address inputs A2 A1 A0 = 1 0 0 (100₂ = 4₁₀); row 4 is output, so D3 D2 D1 D0 = 1 1 0 1]
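The read operation is just an indexed table access. A minimal sketch, assuming the ROM contents from this example are stored as one bit-string per row (the names `ROM` and `rom_read` are made up for illustration):

```python
# Contents of the example ROM, one string of D3 D2 D1 D0 per row (rows 0 to 7).
ROM = ["0011", "1010", "0100", "0111", "1101", "1000", "0110", "1011"]

def rom_read(a2, a1, a0):
    """Combine the address bits into a row number and return that entire row."""
    address = (a2 << 2) | (a1 << 1) | a0
    return ROM[address]

print(rom_read(1, 0, 0))  # address 100 (binary) = 4, so row 4 is output: 1101
```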
18.14
Memory Dimensions
• Memories are named by their dimensions:
– Rows x Columns
• n rows and m columns => n x m ROM
• n rows => log₂(n) address bits…or…2^k rows => k address bits
• m cols => m data outputs
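[Figure: generic ROM. In this figure n is the number of address inputs: A(n-1) … A1 A0 select one of the rows 0, 1, 2, …, 2^n − 2, 2^n − 1, and each row drives data outputs D(m-1) … D0]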
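The relationship between rows and address bits can be checked with a few lines of code. This is a sketch only; `address_bits` and `describe` are hypothetical helper names, and the row count is rounded up to the next power of two when it is not one already.

```python
def address_bits(rows):
    """Smallest k such that 2**k >= rows (k address bits select one of 2**k rows)."""
    return max(rows - 1, 0).bit_length()

def describe(rows, cols):
    """Summarize an n x m memory in terms of address inputs and data outputs."""
    return f"{rows}x{cols} memory: {address_bits(rows)} address bits, {cols} data outputs"

print(describe(8, 4))
print(describe(256, 8))
```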
18.15
RWM’s
• Writable memories provide a set of data inputs for write data (as opposed to the data outputs for read data)
• A control signal R/W (1=READ / 0 = WRITE) is provided to tell the memory what operation the user wants to perform
[Figure: 8x4 RWM with address inputs A2 A1 A0, data inputs DI3 DI2 DI1 DI0, data outputs DO3 DO2 DO1 DO0, and the R/W control input. Rows 0 to 7 hold 0011, 1010, 0100, 0111, 1101, 1000, 0110, 1011]
18.16
RWM’s
• Write example
– Address = 3₁₀ = 011₂
– DI = 12₁₀ = 1100₂
– R/W = 0 => Write op.
• Data in row 3 is overwritten with the new value of 1100₂.
[Figure: 8x4 RWM during the write. Address inputs A2 A1 A0 = 0 1 1, data inputs DI3 DI2 DI1 DI0 = 1 1 0 0, R/W = 0; row 3 changes from 0111 to 1100 and the data outputs DO3 DO2 DO1 DO0 are shown as ? ? ? ?]
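The write example can be mimicked with a small class. This is a behavioral sketch under assumptions: the class name `RWM` and its `access` method are invented here, and the data outputs are simply returned as `None` during a write, since the figure leaves them unspecified.

```python
class RWM:
    """Behavioral sketch of a read/write memory with an R/W control (1 = read, 0 = write)."""

    def __init__(self, contents):
        self.rows = list(contents)

    def access(self, address, rw, data_in=None):
        if rw == 0:
            self.rows[address] = data_in  # write: overwrite the addressed row
            return None                   # outputs are not meaningful in this sketch
        return self.rows[address]         # read: return the addressed row

mem = RWM([0b0011, 0b1010, 0b0100, 0b0111, 0b1101, 0b1000, 0b0110, 0b1011])
before = format(mem.access(0b011, rw=1), "04b")
mem.access(0b011, rw=0, data_in=0b1100)          # the write from the example
after = format(mem.access(0b011, rw=1), "04b")
print(before, "->", after)  # 0111 -> 1100
```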
18.17
USING MEMORIES TO BUILD COMBINATIONAL FUNCTIONS
Look-up tables…
18.18
Memories as Look-Up Tables
• One major application of memories in digital design is to use them as LUT’s (Look-Up Tables) to implement logic functions
– This is the core technology used by FPGAs (Field-Programmable Gate Arrays)
• Idea: Use a memory to hold the truth table of a function and feed the inputs of the function to the address inputs to "look-up" the answer
18.19
Implementing Functions w/ Memories
[Figure: an arbitrary logic function F(X, Y, Z) implemented with an 8x1 memory. X, Y, Z connect to address inputs A2, A1, A0 and F is read from data output D0]
X Y Z | F
0 0 0 | 1
0 0 1 | 0
0 1 0 | 1
0 1 1 | 1
1 0 0 | 0
1 0 1 | 0
1 1 0 | 0
1 1 1 | 1
The 8x1 memory holds the F column (rows 0 to 7: 1 0 1 1 0 0 0 1). Example: X,Y,Z = 1,0,1 selects row 5 and “looks up” the correct answer, F = 0.
Use a memory with the same dimensions as the 'output' side of the truth table.
It's almost TOO easy.
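The look-up idea can be sketched as follows. The list `F_COLUMN` is the output column of the truth table above; `lut_lookup` and the second function (`PARITY`, an odd-parity function chosen only to show reprogramming) are illustrative assumptions, not part of the original example.

```python
# Output column of the truth table, stored in rows 0 to 7 of an 8x1 memory.
F_COLUMN = [1, 0, 1, 1, 0, 0, 0, 1]

def lut_lookup(memory, x, y, z):
    """Feed the function inputs to the address inputs (X=A2, Y=A1, Z=A0) and read D0."""
    return memory[(x << 2) | (y << 1) | z]

print(lut_lookup(F_COLUMN, 1, 0, 1))  # row 5 -> 0

# Same lookup hardware, different contents: a hypothetical odd-parity function.
PARITY = [x ^ y ^ z for x in (0, 1) for y in (0, 1) for z in (0, 1)]
print(lut_lookup(PARITY, 1, 1, 1))    # 1
```

Nothing about `lut_lookup` changes between the two functions; only the stored bits do.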
18.20
Implementing Functions w/ Memories
[Figure: a multi-bit function (one's count) implemented with an 8x2 memory. X, Y, Z connect to address inputs A2, A1, A0; C is read from D1 and S from D0]
X Y Z | C S
0 0 0 | 0 0
0 0 1 | 0 1
0 1 0 | 0 1
0 1 1 | 1 0
1 0 0 | 0 1
1 0 1 | 1 0
1 1 0 | 1 0
1 1 1 | 1 1
The 8x2 memory holds the C S columns. Example: X,Y,Z = 1,0,1 selects row 5 and outputs C S = 1 0 (1+0+1 = 10₂).
Use a memory with the same dimensions as the 'output' side of the truth table.
It's almost TOO easy.
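For a multi-bit function, the memory contents can be generated rather than typed in. A minimal sketch, assuming each row is stored as a 2-character string "CS" (the helper name `ones_count_rows` is made up for this example):

```python
def ones_count_rows():
    """Build the 8x2 contents: row address = X Y Z, row data = number of 1s as C S."""
    rows = []
    for address in range(8):
        count = bin(address).count("1")    # how many of X, Y, Z are 1
        rows.append(format(count, "02b"))  # two output bits: C (D1) and S (D0)
    return rows

ONES_COUNT = ones_count_rows()
print(ONES_COUNT)
print(ONES_COUNT[0b101])  # X,Y,Z = 1,0,1 -> 10
```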
18.21
3-bit Squaring Circuit
• Q: What size memory would you use to build our 3-bit squaring circuit?
• A: 8x6 memory
• Q: What would you connect to the address inputs of the memory?
• A: A[2:0]
• Q: What bits would you program into row 5 of the memory?
• A: 011001 (i.e. 25 = 5²)
Memory Contents to build 3-bit Squaring Circuit
    Inputs     | Outputs
A | A2 A1 A0 | B5 B4 B3 B2 B1 B0 | B = A²
0 | 0  0  0  | 0  0  0  0  0  0  | 0
1 | 0  0  1  | 0  0  0  0  0  1  | 1
2 | 0  1  0  | 0  0  0  1  0  0  | 4
3 | 0  1  1  | 0  0  1  0  0  1  | 9
4 | 1  0  0  | 0  1  0  0  0  0  | 16
5 | 1  0  1  | 0  1  1  0  0  1  | 25
6 | 1  1  0  | 1  0  0  1  0  0  | 36
7 | 1  1  1  | 1  1  0  0  0  1  | 49
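The answers above can be double-checked by generating the memory contents. This is a sketch; `squaring_rom` is a hypothetical helper, and the output width is derived from the largest square that must be stored.

```python
def squaring_rom(input_bits=3):
    """Return the rows of a memory whose row A holds A squared, as bit-strings."""
    rows = 2 ** input_bits
    width = ((rows - 1) ** 2).bit_length()  # bits needed for the largest output (49 -> 6)
    return [format(a * a, f"0{width}b") for a in range(rows)]

SQUARES = squaring_rom()
print(len(SQUARES), "x", len(SQUARES[0]))  # 8 x 6
print(SQUARES[5])                          # 011001
```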
18.22
4x4 Multiplier Example
Determine the dimensions of the memory that would be necessary to implement a 4x4-bit unsigned multiplier with inputs X[3:0] and Y[3:0] and outputs P[??:0]
Question: How many bits are needed for P?
Question: What are the contents of the numbered rows?
Example:
X3X2X1X0=0010
Y3Y2Y1Y0=0001
P = X * Y = 2 * 1 = 2
= 00000010
[Figure: ROM with X3 X2 X1 X0 on address inputs A7 A6 A5 A4, Y3 Y2 Y1 Y0 on A3 A2 A1 A0, and data outputs P7 … P0]
Row | Address (X, Y) | P7 … P0  | Product
0   | 0000, 0000     | 00000000 | 0 * 0 = 0
2   | 0000, 0010     | 00000000 | 0 * 2 = 0
20  | 0001, 0100     | 00000100 | 1 * 4 = 4
39  | 0010, 0111     | 00001110 | 2 * 7 = 14
255 | 1111, 1111     | 11100001 | 15 * 15 = 225
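The numbered rows can be verified by building the whole table. A minimal sketch, assuming (as in the figure) that X occupies the upper four address bits and Y the lower four; the name `MULT_ROM` is invented for this example.

```python
# Row address = X[3:0] Y[3:0]; row data = X * Y as an 8-bit string (P7 ... P0).
MULT_ROM = [format((addr >> 4) * (addr & 0xF), "08b") for addr in range(256)]

for row in (0, 2, 20, 39, 255):
    x, y = row >> 4, row & 0xF
    print(f"row {row}: {x} * {y} = {MULT_ROM[row]}")
```

Eight output bits are enough because the largest product, 15 * 15 = 225, is less than 256.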
18.23
Implementing Functions w/ Memories
• To implement a function w/ n-variables and m outputs
• Just place the output truth table values in the memory
• Memory will have dimensions: 2^n rows and m columns
– Still does not scale terribly well (i.e. n-inputs requires memory w/ 2^n rows)
– But it is easy and since we can change the contents of memories it allows us to create "reconfigurable" logic
– This idea is at the heart of FPGAs
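To get a feel for the scaling problem, the sketch below totals the storage for a few table sizes. The first two cases come from earlier examples in this unit; the 16x16 multiplier is a hypothetical case added only to show the growth, and `memory_bits` is an illustrative helper name.

```python
def memory_bits(n_inputs, m_outputs):
    """Total storage, in bits, of a look-up table with 2**n rows and m columns."""
    return (2 ** n_inputs) * m_outputs

CASES = [("3-bit squarer", 3, 6), ("4x4 multiplier", 8, 8), ("16x16 multiplier", 32, 32)]
for name, n, m in CASES:
    print(f"{name}: {2 ** n} rows x {m} cols = {memory_bits(n, m)} bits")
```

Each extra input doubles the number of rows, which is why practical designs build large functions from many small LUTs rather than one large memory.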
18.24
FPGAS
18.25
Basis of FPGA’s
• Memories provide a universal way to implement any combinational logic function
– 2^n x m memory can implement a function of n-variables and m outputs
• If we use RWM (read/write memory) rather than ROM’s we can change what function the memory implements
• Memories are referred to as Look-up Tables (LUT’s)
[Figure: Full Adder Implementation. X, Y, Cin connect to address inputs A2, A1, A0 of an 8x2 memory; Cout and S are read from D1 and D0. Rows 0 to 7 hold 00, 01, 01, 10, 01, 10, 10, 11]
18.26
Configurable Logic Blocks (CLB’s)
• The memory allows for any combinational function
• Provided D-FF’s allow designs with sequential logic
– “Bypass” mux selects the pure combinational output of the LUT or the sequential/registered/D-FF output
• Blue boxes indicate configurable bits that control the operation and function of the logic
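[Figure: CLB. An 8x2 memory (any 3-input / 2-output combinational function) with address inputs A2 A1 A0 and outputs D1 D0; each output also feeds a D-FF clocked by CLK (FF’s if sequential logic needed), and a bypass mux with a configurable select bit chooses between the LUT output and the D-FF output]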
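A CLB of this shape can be modeled behaviorally in a few lines. This is a sketch under assumptions: the class and method names are invented, and the bypass select convention (1 = take the flip-flop output, 0 = take the LUT output directly) is chosen for illustration because the figure's encoding is not recoverable here.

```python
class CLB:
    """Behavioral sketch: one 8x2 LUT, two D flip-flops, and one bypass mux per output."""

    def __init__(self, lut_rows, registered):
        self.lut = list(lut_rows)      # configuration bits: 8 rows of (D1, D0)
        self.registered = registered   # configuration bits: (sel for D1, sel for D0)
        self.q = [0, 0]                # flip-flop state

    def lut_out(self, a2, a1, a0):
        return self.lut[(a2 << 2) | (a1 << 1) | a0]

    def outputs(self, a2, a1, a0):
        """Bypass mux: registered outputs come from the FFs, others from the LUT."""
        d = self.lut_out(a2, a1, a0)
        return tuple(self.q[i] if self.registered[i] else d[i] for i in range(2))

    def clock(self, a2, a1, a0):
        """Rising clock edge: each flip-flop captures its LUT output."""
        self.q = list(self.lut_out(a2, a1, a0))

FULL_ADDER = [(0, 0), (0, 1), (0, 1), (1, 0), (0, 1), (1, 0), (1, 0), (1, 1)]
comb = CLB(FULL_ADDER, registered=(0, 0))
print(comb.outputs(1, 1, 0))   # combinational Cout, S = (1, 0)

reg = CLB(FULL_ADDER, registered=(1, 1))
print(reg.outputs(1, 1, 1))    # (0, 0): flip-flops have not captured anything yet
reg.clock(1, 1, 1)
print(reg.outputs(0, 0, 0))    # (1, 1): registered result of the previous inputs
```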
18.27
Routing & Switch Matrices
• Inputs and outputs of neighboring CLB’s connect to a “switch matrix” (SM)
• Switch matrix is simply composed of muxes that allow us to “route” inputs and outputs to another CLB or further away
[Figure: 2x2 grid of switch matrices (SM), each surrounded by four CLBs; each CLB connects to its SM with groups of 3 and 2 wires]
18.28
Routing & Switch Matrices
• Suppose we want the connection shown in green and purple, what select values would be used?
[Figure: a switch matrix (SM) connected to four CLBs and to the N, E, S, and W neighboring SMs. Each routing mux has inputs labeled A through L and a 4-bit select; the worked selects shown are 11₁₀ = 1011₂ and 1₁₀ = 0001₂]
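The select values can be sketched as a simple label-to-index mapping. This assumes the mux inputs are numbered alphabetically starting from A = 0, which matches the values quoted elsewhere in this deck (A = 0000, B = 0001, D = 0011, E = 0100); the helper name `select_bits` is made up for illustration.

```python
LABELS = "ABCDEFGHIJKL"  # the 12 routing mux inputs; assumed to be numbered in this order

def select_bits(label):
    """Return the 4-bit select value that routes the given labeled input to the mux output."""
    return format(LABELS.index(label), "04b")

for label in ("A", "B", "D", "E", "L"):
    print(label, select_bits(label))
```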
18.29
Place and Route
• ASIC: Find where each gate should be placed on the chip and how to route the wires that connect to it
– Direct connections can be faster
• FPGA: Determine which LUT’s should be used and how to route through switch matrices
– Added delay to go through the routing muxes
[Figure: ASIC vs. FPGA. The FPGA is shown as a 2x2 grid of switch matrices (SM), each surrounded by four CLBs]
18.30
BA
L
LBA...
...
C
To / from
N SM
Switch
Matrix
(SM)
CLB
To / from E SM
A B
D
E
F
1110
01
11
CLB
CLB 1CLB 2
CLB 1
CLB 2
CLB 2
CLB 1
Exercise
• Find the configuration bits to build a 3-bit free-running (always enabled) counter
0
1
2
3
4
5
6
7
A0
A1
A2
D1 D0
8x2 Mem.
CLK
D
Q
CLK
D
Q
CLB
01 01
0
1
2
3
4
5
6
7
A0
A1
A2
D1 D0
8x2 Mem.
CLK
D
Q
CLK
D
Q
CLB
01 01
0 1
1 0
d d
d d
d d
d d
d d
d d
0 0
0 1
0 1
1 0
1 0
1 1
1 1
0 0
Q0
0
0
Co
Q1
Q2
0 111
Q1Q2 Q0Co
0
0 0 Q0
Co Q0
Co
Q1
Q2
Q1
Q2
Select to
choose Q0
(B input
label) = 0001
HA
3-bit Reg.
HA HA
1
Q0Q1Q2
Ci
Q1
Q2
Q0
Q0 Co Q0*(Q0+1)
0 0 1
1 1 0
Co
Q2 Q1 Co Q2* Q1*
0 0 0 0 0
0 0 1 0 1
0 1 0 0 1
0 1 1 1 0
1 0 0 1 0
1 0 1 1 1
1 1 0 1 1
1 1 1 0 0
Selects to
choose
A = 0000
D = 0011
E = 0100
3
4
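The configuration bits can be sanity-checked by simulating the two LUTs. This is a sketch under assumptions: the carry Co is taken straight from CLB 1's LUT (bypassing its flip-flop), while Q2*, Q1*, Q0* are captured by flip-flops on each clock; the names `CLB1_LUT`, `CLB2_LUT`, and `step` are invented for this example.

```python
# CLB 1: row index = Q0, row data = (Co, Q0*). Rows 2 to 7 are don't-cares and omitted.
CLB1_LUT = [(0, 1), (1, 0)]
# CLB 2: row index = Q2 Q1 Co, row data = (Q2*, Q1*).
CLB2_LUT = [(0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1), (0, 0)]

def step(q2, q1, q0):
    """One clock edge: look up the next state from the current flip-flop outputs."""
    co, q0_next = CLB1_LUT[q0]                                # Co used combinationally (assumption)
    q2_next, q1_next = CLB2_LUT[(q2 << 2) | (q1 << 1) | co]
    return q2_next, q1_next, q0_next

state = (0, 0, 0)
counts = []
for _ in range(9):
    counts.append(state[0] * 4 + state[1] * 2 + state[2])
    state = step(*state)
print(counts)  # [0, 1, 2, 3, 4, 5, 6, 7, 0]
```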
18.31
ASIC’s vs. FPGA’s
• ASIC’s
– Faster
– Handles Larger Designs
– More Expensive
– Less Flexible (Cannot be reconfigured to perform a new hardware function)
• FPGA’s
– Slower (extra logic to make it reconfigurable)
– Smaller Designs
– Less Expensive
– Extremely Flexible
18.32
Modern FPGA's
• SoC design (Xilinx Kintex [KU115])
– Quad-Core ARM cores
– DDR3 SDRAM Memory Interface
– ~800 I/O Pins
– ~15M gate equivalent FPGA fabric
• ~1M D-FFs + 552K LUTs
• 1968 dedicated DSP "slices" 18x18 multiply + adder