Post on 22-Dec-2015
1
On Controllers, Soft Connections, and Logical Topologies
Michael Pellauer, MIT CSAIL
Angshuman Parashar, Michael Adler, Joel Emer, Intel VSSAD
2
The Setup
(For both our HAsim simulator and the talk)
Virtex5 110t on HiTechGlobal PCIe accelerator
Future: FSB-based accelerators. Larrabee?
Use HAsim’s Remote-Request-Response (RRR)
Protocol of communication between SW/HW
Allows calls from one to the other
[Figure: Host processor and FPGA connected over PCIe, with example RRR calls: run program, dump stats, emulate instr, translate address]
3
The Problem of the Day
Just because you can talk doesn’t mean you have anything interesting to say!
We must control higher-level interactions between software and hardware
Example: “Dump Stats” command
Transmit requests intra-FPGA, aggregate responses
Future: think about multiple-FPGA setup
[Figure: a “dump stats” request reaches the FPGA through the PCIe interface and RRR; a Controller passes it on to modules such as Cache, BranchPred, …]
4
The HAsim Controller
Software sees it as… / Hardware sees it as…
[Figure: the Controller as seen from host software (over RRR) and from hardware modules. Services shown: run, pause, …, setParam, dump stats, enable events, debug, assertion fail]
Which modules use which service is very fluid
Different modules access different services
5
Problem: HDLs’ Inflexible Interfaces
Branch Predictor has a bug
Want to send some debug info to the Controller
Fundamental Problem: HDLs allow communication only up and down hierarchy
Verilog OOMRs are not an acceptable solution
Gets worse if we have alternative modules
[Figure: HW module instantiation hierarchy with modules Simulator, Core, Front End, Fetch, BranchPred, Controller, RRR, PCIe]
6
Our Solution: Soft Connections
Goal: “soften” rigid communication hierarchy
Users separately instantiate named endpoints
Can read and write as if they were half of a guarded FIFO (FI and FO)
Instantiator’s interface does not change
Bluespec standard ModuleCollect library
[Figure: mkSend “fet2dec” with send() connected to mkRecv “fet2dec” with recv(); the connection is added during the Bluespec static elaboration compiler phase]
7
Review: Static Elaboration Phase
Inline function calls and datatypes as combinational logic
Instantiate modules with specific parameters
Resolve polymorphism/overloading
[Figure: Software toolflow: source, Compile, .exe, then run w/params yields run1, run2, run3. Hardware toolflow: source, Elaborate w/params yields design1, design2, design3, each of which is then run (run1.1, run2.1, run3.1, …)]
8
Elaboration-Time Algorithm
let (sends, recvs) = getCollection() // Get from ModuleCollect
for each s in sends do
  let rs = matchByName(s.name, recvs)
  if rs == {} then
    if not s.optional then
      error("Unmatched Send: " + s.name)
    // an optional send with no receive is simply left unconnected
  else if rs == {r} then
    connect(s, r) // instantiate buffering
  else
    error("Multiple Receives connected to: " + s.name)
  recvs = recvs - rs // remove matched recvs
for each r in recvs do
  error("Unmatched Receive: " + r.name)
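A minimal Python sketch of the matching pass above, for readers who want to run it. It is not the Bluespec implementation: `Endpoint` and `match_connections` are hypothetical names, errors are collected in a list rather than aborting elaboration, and an optional send with no matching receive is assumed to be left unconnected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical stand-in for a collected send or receive endpoint."""
    name: str
    optional: bool = False


def match_connections(sends, recvs):
    """Pair each send with the single receive of the same name.

    Returns (connections, errors) so that every problem is reported.
    """
    connections, errors = [], []
    remaining = list(recvs)
    for s in sends:
        rs = [r for r in remaining if r.name == s.name]
        if not rs:
            # Assumption: an optional send may stay unconnected.
            if not s.optional:
                errors.append(f"Unmatched Send: {s.name}")
        elif len(rs) == 1:
            # A real tool would instantiate buffering here.
            connections.append((s, rs[0]))
        else:
            errors.append(f"Multiple Receives connected to: {s.name}")
        # Remove matched receives so they are not reported again below.
        remaining = [r for r in remaining if r.name != s.name]
    for r in remaining:
        errors.append(f"Unmatched Receive: {r.name}")
    return connections, errors


sends = [Endpoint("fet2dec"), Endpoint("debug_out", optional=True), Endpoint("dec2exe")]
recvs = [Endpoint("fet2dec"), Endpoint("start_prog")]
conns, errs = match_connections(sends, recvs)
```

Here "fet2dec" is connected, the optional "debug_out" is skipped without complaint, and the two leftover endpoints are reported.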
Open Question: Can we do this in SystemVerilog as well?
9
“Multicast” Connections
A one-to-many Send (broadcast)
[Figure: mkBcast “start_prog” with broadcast() feeding three standard receive modules, each mkRecv “start_prog” with recv()]
A many-to-one Recv (listener)
[Figure: three standard send modules, each mkSend “debug_out” with send(), feeding one mkListener “debug_out” with listen(), which receives ID + data]
(now multiple recvs are no longer an error)
10
Building 2-Way Communication
More complex abstractions from primitives
Client/Server
[Figure: mkClient “mem_load” (makeReq(), getResp()) connected to mkServer “mem_load” (getReq(), makeResp()) by a pair of normal send and recv]
“Multicast” Client/Server
[Figure: several standard Client modules, each mkClient “mem_load” (makeReq(), getResp()), sharing one mkServer “mem_load” (getReq(), makeResp()); messages carry ID + data]
[Figure: one mkClient “stats_count” (broadcastReq(), getResp()) connected to several standard Server modules, each mkServer “stats_count” (getReq(), makeResp()); messages carry ID + data]
11
Controller Services: Revisited
Which should get which type of soft connection?
Commands/Params:
  Receive from software, send to many modules
  One-to-Many Broadcast
  Can make a nice abstraction for local commands, params
Events/Stats:
  Receive from software, send to many modules, aggregate responses
  Many-to-one Client
Assertions/Debug:
  Receive from many modules, send to software
  Many-to-one Receive
12
Case Study: span
span(c) = number of instantiation boundaries crossed between sender and receiver
Roughly, the pain of changing a communication path
In HAsim, 118/217 connections are to/from Controller
We start to worry about the massive fan-in
[Figure: bar chart of Number of Connections (0 to 80) versus Span (0 to 9). Bar labels, in extracted order: 3, 33, 8, 18, 17, 15, 10, 74, 36, 4]
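One way to make the span metric concrete, as a hedged Python sketch. It assumes each endpoint is identified by its instance path from the top-level module, and that crossing a boundary means stepping one level up or down the hierarchy. The function name and the example paths are hypothetical; the nesting of the example modules is a guess, not taken from HAsim.

```python
def span(sender_path, receiver_path):
    """Count instantiation boundaries crossed between two module instances.

    Each path lists instance names from the top-level module down to the
    module that instantiates the endpoint.
    """
    common = 0
    for a, b in zip(sender_path, receiver_path):
        if a != b:
            break
        common += 1
    # Climb out of the sender's private ancestors, then descend to the receiver.
    return (len(sender_path) - common) + (len(receiver_path) - common)


# Hypothetical hierarchy, for illustration only.
branch_pred = ["simulator", "core", "front_end", "fetch", "branch_pred"]
fetch = ["simulator", "core", "front_end", "fetch"]
controller = ["simulator", "controller"]

pain_of_debug_path = span(branch_pred, controller)
```

Under these assumptions, a debug path from the branch predictor to the controller crosses five boundaries, while a parent-to-child connection crosses one.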
13
Logical Topology vs Physical Topology
We described the “logical” communication topology
Could be implemented with different physical topology
Could use Rings/Trees/Grids to offset massive fan-in
Implemented: Rings and Trees
So far no improvement over physical point-to-point
[Figure: send “foo” and recv “foo” endpoints attached to a ring of stations. Annotations: station has an address (#5) for “foo”; station has to know #5 means “foo”; this station doesn’t have #5; station routing tables made at elaboration; connection interface does not change!]
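A small Python model of the ring idea, under stated assumptions: each named receive endpoint gets a numeric address when the ring is built (standing in for elaboration), each station only knows the addresses it owns, and a message is forwarded station by station until an owner is found. `build_ring` and `send` are hypothetical helpers, not HAsim code.

```python
def build_ring(names_per_station):
    """Assign each endpoint name an address and build per-station tables.

    Returns (stations, addr_of). Each station is a dict mapping the
    addresses it owns to an inbox list.
    """
    addr_of = {}
    stations = []
    for names in names_per_station:
        table = {}
        for name in names:
            if name in addr_of:
                raise ValueError(f"duplicate endpoint name: {name}")
            addr_of[name] = len(addr_of)
            table[addr_of[name]] = []
        stations.append(table)
    return stations, addr_of


def send(stations, start, addr, payload):
    """Forward payload around the ring from station `start`; return hop count."""
    n = len(stations)
    for hops in range(n):
        station = stations[(start + hops) % n]
        if addr in station:
            station[addr].append(payload)
            return hops
    raise KeyError(f"no station owns address {addr}")


stations, addr_of = build_ring([["a"], [], ["foo"], ["bar"]])
hops = send(stations, 0, addr_of["foo"], "hello")
```

The sender only names "foo"; which station owns it, and how many hops away it is, stays hidden behind the connection interface.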
14
Take Aways
FPGA-as-accelerator model is rapidly maturing
The FPGA-as-raw-fabric model is not ideal
Something like HAsim’s Controller helps
Coordinates interaction between FPGA/SW
Need different Hardware-design techniques for FPGA accelerators
More flexibility needed: reconfigurations common
Soft Connections bring flexibility to interfaces
Make it easier to have a fluid set of modules which interact with the controller
Logical topology != Physical topology
Designer needs help with both
17
The Controller’s Services
Commands: Receive “start” or “pause” from software
  Controller distributes to all interested hardware modules
Params: Receive dynamic command line values
  Controller distributes to interested hardware modules
Events: Software can enable, disable
  Controller aggregates, sends to software
Stats: Software requests dump periodically
  Controller passes on request, aggregates responses
Assertions: Controller passes failures on to software
Debug: Controller passes info on to software
18
Making “Gateware” more like Software
Ultimately we want many distributed “services” throughout the FPGA talking to software
They communicate at different rates
It makes sense for the variable/rare services to share the same interconnect on the FPGA
Flexibility of communication == Easier development
Today: Development plan and issues
[Figure: services grouped by communication rate (Common, Variable, Rare): Loads/Stores, Events, Debugging Messages, Stats, Assertion failures]
19
Review: Soft Connections
Point-to-Point
[Figure: mkSend “fet2dec” with send() connected to mkRecv “fet2dec” with recv()]
Client/Server
[Figure: mkClient “funcp_fet” (makeReq(), getResp()) connected to mkServer “funcp_fet” (getReq(), makeResp())]
“Smart” Synthesis Boundaries
[Figure: module A instantiates mkB; a send “fet2dec” inside mkB is exposed through the outg ports of mkB, using try_xfer() and xfer_ack()]
Compiler Log: “Dangling Send fet2dec [3] {Inst}”
addDanglingSend(mkB.outg[3], “fet2dec”, “Inst”);
20
Proposed Primitive: One-To-Many
A “Broadcast” Send
[Figure: mkBcast “start_prog” with broadcast() feeding four standard receive modules, each mkRecv “start_prog” with recv()]
All rules and registers inserted during static elaboration
(don’t know how many receivers during instantiation)
rule when (all r == 1):
  all r <= 0
  q.deq()
rule when (r[0] == 0):
  try_xfer(q.first())
  if (ack) r[0] <= 1
rule when (r[1] == 0):
  try_xfer(q.first())
  if (ack) r[1] <= 1
rule when (r[2] == 0):
  try_xfer(q.first())
  if (ack) r[2] <= 1
rule when (r[3] == 0):
  try_xfer(q.first())
  if (ack) r[3] <= 1
Tougher alternative: many FIFOs
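The broadcast rules can be exercised with a small Python model. This is a sequential approximation, not the generated hardware: one call to `broadcast_cycle` stands for one cycle, the `done` list plays the role of the r registers, and `make_receiver` is a hypothetical receiver that refuses some transfers to imitate back-pressure.

```python
from collections import deque


def broadcast_cycle(q, done, receivers):
    """Run one cycle; receivers[i](x) returns True when the transfer is acked."""
    if not q:
        return
    if all(done):
        # Every receiver has taken the head element: retire it.
        q.popleft()
        for i in range(len(done)):
            done[i] = False
        return
    for i, try_xfer in enumerate(receivers):
        if not done[i] and try_xfer(q[0]):
            done[i] = True


def make_receiver(stall_every):
    """Hypothetical receiver that refuses every `stall_every`-th attempt."""
    got, attempts = [], [0]

    def try_xfer(x):
        attempts[0] += 1
        if attempts[0] % stall_every == 0:
            return False
        got.append(x)
        return True

    return try_xfer, got


pairs = [make_receiver(k) for k in (2, 3, 5)]
receivers = [p[0] for p in pairs]
q = deque([1, 2, 3])
done = [False] * len(receivers)
for _ in range(100):  # bounded so the example always terminates
    if not q:
        break
    broadcast_cycle(q, done, receivers)
```

Each receiver ends up with every element exactly once and in order, even though they stall at different times; the slowest receiver sets the pace.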
21
Proposed Primitive: Many-to-One
A “listener” receive
[Figure: four standard send modules, each mkSend “debug_out” with send(), feeding one mkListener “debug_out” with listen(), which yields ID + data]
All rules inserted during static elaboration
(don’t know IDs during instantiation)
rule when (q0.notEmpty):
  try_xfer(q0.first(), 0)
  if (ack) q0.deq()
rule when (q1.notEmpty):
  try_xfer(q1.first(), 1)
  if (ack) q1.deq()
rule when (q2.notEmpty):
  try_xfer(q2.first(), 2)
  if (ack) q2.deq()
rule when (q3.notEmpty):
  try_xfer(q3.first(), 3)
  if (ack) q3.deq()
Is a fairness guarantee needed?
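If fairness does turn out to matter, one option is round-robin arbitration among the sender queues. The Python sketch below illustrates that option only; the slides leave the question open, and the `Listener` class with its method names is hypothetical.

```python
from collections import deque


class Listener:
    """Merges several sender queues into one stream of (sender_id, data)."""

    def __init__(self, num_senders):
        self.queues = [deque() for _ in range(num_senders)]
        self.next_id = 0  # round-robin pointer: first queue to check next time

    def send(self, sender_id, data):
        self.queues[sender_id].append(data)

    def listen(self):
        """Return the next (sender_id, data) pair, or None if nothing is pending."""
        n = len(self.queues)
        for offset in range(n):
            i = (self.next_id + offset) % n
            if self.queues[i]:
                # Start after the winner next time so no sender is starved.
                self.next_id = (i + 1) % n
                return i, self.queues[i].popleft()
        return None


listener = Listener(3)
listener.send(0, "a0")
listener.send(0, "a1")
listener.send(2, "c0")
order = [listener.listen() for _ in range(4)]
```

Sender 2 gets a turn between the two messages from sender 0, and the ID attached to each message tells the listener where it came from.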
22
Proposed Primitive: Hub Servers
Hub Server, Distributed Clients
1 Many-to-One Connection
Reverse is many One-to-One connections
Remove the ID and send it to the appropriate destination
[Figure: standard Client modules, each mkClient “mem_load” (makeReq(), getResp()), connected to one mkHubServer “mem_load” (getReq(), makeResp()); requests arrive as ID + data]
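A Python sketch of the hub server structure described above, assuming one shared request queue tagged with client IDs and one response queue per client. The class and method names mirror the slide labels but are otherwise hypothetical, and the server behaviour (adding 1 to the request) is a placeholder.

```python
from collections import deque


class HubServer:
    def __init__(self, num_clients):
        # One many-to-one connection carrying (client_id, data).
        self.requests = deque()
        # The reverse direction: one one-to-one connection per client.
        self.responses = [deque() for _ in range(num_clients)]

    def make_req(self, client_id, data):  # called by a client
        self.requests.append((client_id, data))

    def get_req(self):  # called by the server
        return self.requests.popleft()

    def make_resp(self, client_id, data):  # called by the server
        # The ID selects the destination and is not forwarded with the data.
        self.responses[client_id].append(data)

    def get_resp(self, client_id):  # called by a client
        return self.responses[client_id].popleft()


hub = HubServer(num_clients=2)
hub.make_req(1, 0x40)
hub.make_req(0, 0x80)
while hub.requests:
    client_id, addr = hub.get_req()
    hub.make_resp(client_id, addr + 1)  # placeholder for real server work
```

Each client sees an ordinary client interface and receives only its own responses.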
23
Proposed Primitive: Hub Client
Hub Client, Distributed Servers
1 One-to-Many Connection
1 Many-to-One Connection
[Figure: one mkHubClient “stats_count” (broadcastReq(), getResp()) connected to several standard Server modules, each mkServer “stats_count” (getReq(), makeResp()); responses arrive as ID + data]
Ability to send to individuals as well?
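The hub client is the mirror image: one broadcast out, one tagged stream back. A minimal Python sketch, assuming each server can be modelled as a function from request to response; `HubClient` and the counter values are hypothetical.

```python
from collections import deque


class HubClient:
    def __init__(self, servers):
        self.servers = servers    # callables: request -> response
        self.responses = deque()  # many-to-one stream of (server_id, response)

    def broadcast_req(self, req):
        """Deliver the same request to every server (one-to-many)."""
        for server_id, serve in enumerate(self.servers):
            self.responses.append((server_id, serve(req)))

    def get_resp(self):
        """Return the next (server_id, response) pair."""
        return self.responses.popleft()


# Hypothetical stats counters held by two hardware modules.
servers = [lambda req: 7, lambda req: 3]
client = HubClient(servers)
client.broadcast_req("dump")
collected = [client.get_resp() for _ in servers]
total = sum(value for _, value in collected)
```

A stats dump then amounts to one broadcast followed by as many `get_resp()` calls as there are servers.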