ProtoFlex: FPGA-Accelerated Hybrid Simulator
Eric S. Chung, Eriko Nurvitadhi, James C. Hoe, Babak Falsafi, Ken Mai
Computer Architecture Lab at
Our work in this area has been funded in part by NSF, Intel, and IBM.
CMU/ECE/CALCM/HOE NSFNGS Workshop, March 2007, slide-2
Multiprocessor Emulation
• Need fast MP emulators to study future revolutionary changes in HW and SW
• Hardware concurrency of FPGA emulation can scale up multiprocessor simulation speed
• BUT, we also want full-system fidelity (OS and I/O support)
Can we really build a 1000-node fully detailed MP system, FPGA or not?
[Figure: full-system target; labels: CPUs, memory, PCI bus, Ethernet controller, graphics card, I/O MMU controller, disks, DMA controller, IRQ controller, terminal, SCSI controller, FPGA]
How to get full-system fidelity without building the full system?
Combining Simulators & FPGAs
• Simulators already provide full-system behaviors
  ⇒ why not just simulate infrequent behaviors (e.g., I/O devices)?
• Advantages
  - avoid implementing infrequent behaviors
    ⇒ simplify full-system emulator development
  - low impact on scalability and performance acceleration
[Figure: target components (CPUs, memory, SCSI disk, Ethernet) partitioned between an FPGA and a simulator]
Transplanting Hybrid Simulator
• 3 ways to map target object to hybrid-simulation host
  - Emulation-only
  - Simulation-only
  - Transplantable
• Transplantable objects
  - switch modes between FPGA & simulator hosts
  - complete behavior need not be implemented in FPGA
  - i.e., implement only the frequently used ISA subset in FPGA
[Figure: "target objects" (CPU, mem, I/O) in the target design mapped onto the FPGA and simulator hosts, with the mappings labeled 1, 2, 3]
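The transplantable mapping can be illustrated with a toy model. This is a minimal sketch, not the ProtoFlex implementation: the three-instruction ISA, `CpuState`, `fpga_execute`, `simulator_execute`, and `run_hybrid` are all hypothetical names made up for illustration. The "FPGA" host implements only the frequent operations; when it meets an operation it lacks, the architectural state is handed to the "simulator" host for that instruction and then handed back.

```python
from dataclasses import dataclass, field

# Hypothetical toy ISA: only this frequent subset exists on the "FPGA" host.
FPGA_OPS = {"addi", "add"}


@dataclass
class CpuState:
    """Architectural state that moves between hosts on a transplant."""
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 4)


def fpga_execute(state, instr):
    """Fast path: executes only the operations listed in FPGA_OPS."""
    op, *args = instr
    if op == "addi":
        rd, imm = args
        state.regs[rd] += imm
    elif op == "add":
        rd, ra, rb = args
        state.regs[rd] = state.regs[ra] + state.regs[rb]
    else:
        raise NotImplementedError(op)
    state.pc += 1


def simulator_execute(state, instr, console):
    """Slow path: stands in for a full-system simulator handling rare ops."""
    op, *args = instr
    if op == "io_write":
        (rs,) = args
        console.append(state.regs[rs])  # simulated I/O device
    else:
        raise ValueError(f"unknown op {op}")
    state.pc += 1


def run_hybrid(program):
    """Run on the FPGA model, transplanting to the simulator when needed."""
    state, console, transplants = CpuState(), [], 0
    while state.pc < len(program):
        instr = program[state.pc]
        if instr[0] in FPGA_OPS:
            fpga_execute(state, instr)
        else:
            transplants += 1  # state leaves the FPGA for one instruction
            simulator_execute(state, instr, console)
    return state, console, transplants


program = [("addi", 0, 5), ("addi", 1, 7), ("add", 2, 0, 1),
           ("io_write", 2), ("addi", 2, 1)]
state, console, transplants = run_hybrid(program)
print(state.regs, console, transplants)
```

In this toy run only the single `io_write` leaves the fast path, which is the point of the approach: rare behaviors cost a transplant, frequent ones stay on the FPGA.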
It Really Works
Xilinx XUP Virtex-II Pro 30 + Virtutech Simics (commercial simulator) = SUN 3800 Server (1x UltraSPARC III, Solaris 8)
[Figure: hybrid prototype; labels: BlueSPARC, embedded PowerPC, DDR memory, transplant & message interface, Ethernet, Simics UltraSPARC, simulated target devices]
1 graduate student in 6 months
How to build a 1K-node MP emulator, without building 1024 nodes?
How fast do you need to simulate?
In the uniprocessor world
• up to 100x slowdown usable for interactive SW research (e.g. Simics)
• 1k to 10k slowdown usable for design exploration (e.g. SimpleScalar)
[Figure: slowdown relative to real system (y-axis, 0 to 1000) versus # processor cores in the simulated system (x-axis, 1 to 1024), with curves for 10 MIPS, 100 MIPS, and 1000 MIPS simulators (aggregate throughput); annotations mark sim-outorder (< 1 MIPS) and Simics]
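The relationship the chart plots can be written down directly. The sketch below assumes each real core runs at about 1000 MIPS; that figure is not stated on the slide and is only a placeholder (`native_mips_per_core`), and both function names are hypothetical. Under that assumption, slowdown is the work the real system does per second divided by the simulator's aggregate throughput.

```python
def slowdown(num_cores, simulator_mips, native_mips_per_core=1000.0):
    """Slowdown relative to the real system.

    native_mips_per_core is an assumed speed for one real core,
    not a number taken from the slides.
    """
    return num_cores * native_mips_per_core / simulator_mips


def required_mips(num_cores, max_slowdown, native_mips_per_core=1000.0):
    """Aggregate simulator throughput needed to stay within max_slowdown."""
    return num_cores * native_mips_per_core / max_slowdown


# One core on a 10 MIPS simulator, then 1024 cores on a 1000 MIPS simulator.
print(slowdown(1, 10))
print(slowdown(1024, 1000))
# Throughput needed to keep a 1000-core target within a 1000x slowdown.
print(required_mips(1000, 1000))
```

Because the simulator's throughput is shared by all simulated cores, slowdown grows linearly with core count at a fixed aggregate MIPS.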
Different ways to simulate 1K cores
• Even for a 1K-node MP, only need 1000 to 10,000 MIPS (in aggregate) to do useful work
• The naïve approach
  - build a fast ISA core (estimate 100 MIPS per core)
  - physically replicate the core 1000 times
  ⇒ 10x to 100x faster than it needs to be
  ⇒ Why spend effort on performance I don't need?
• The better approach: think in terms of MIPS
  - build a 100-MIPS ISA core with a statically interleaved pipeline that can support multiple contexts
  - interleave 100 contexts per core to emulate a 1K-node system with just 10 physical cores
  ⇒ the parameters and the effort required can be tuned to make the emulator just fast enough and not more
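The sizing arithmetic on this slide can be checked with a few lines. This is a sketch of the reasoning only; `size_emulator` and its parameter names are hypothetical, and the 100 MIPS per-core figure is the slide's own estimate.

```python
import math


def size_emulator(num_nodes, target_aggregate_mips, mips_per_core=100.0):
    """Pick the number of physical cores from the throughput target,
    then spread the emulated nodes across them as interleaved contexts."""
    cores = math.ceil(target_aggregate_mips / mips_per_core)
    contexts_per_core = math.ceil(num_nodes / cores)
    # Aggregate throughput of the naive design: one physical core per node.
    naive_aggregate_mips = num_nodes * mips_per_core
    return cores, contexts_per_core, naive_aggregate_mips


cores, contexts, naive = size_emulator(num_nodes=1000, target_aggregate_mips=1000)
print(cores, contexts, naive)
```

With a 1000 MIPS target this gives 10 physical cores with 100 contexts each, while the naive design would deliver 100,000 MIPS, which is 10x to 100x more than the 1000 to 10,000 MIPS the slide says is needed.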
PROTOFLEXMP
• Build a 1000-MIPS simulator from 10s of FPGAs
  - maximize throughput per emulation engine to be shared by multiple interleaved contexts
  - multiplex a large number of emulated contexts onto a few emulation engines
Base the number of emulation engines you need on how much performance you need, and not on how many nodes you are emulating
[Figure: N-way target system (N CPUs and memory) mapped onto P-way FPGA emulation engines, P << N]
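One way to picture multiplexing N contexts onto P engines is a fixed round-robin schedule. The sketch below is an assumption about how static interleaving could be modeled, not a description of the ProtoFlex pipeline; `interleave` and the modulo placement of contexts on engines are made up for illustration.

```python
def interleave(num_contexts, num_engines, cycles):
    """Statically assign contexts to engines, then issue round-robin.

    Context c is placed on engine c % num_engines (an arbitrary choice).
    Each engine issues one instruction per cycle, rotating through
    its own contexts in a fixed order.
    """
    groups = [list(range(e, num_contexts, num_engines))
              for e in range(num_engines)]
    issued = [0] * num_contexts
    for cycle in range(cycles):
        for group in groups:
            if group:
                ctx = group[cycle % len(group)]
                issued[ctx] += 1
    return groups, issued


groups, issued = interleave(num_contexts=8, num_engines=2, cycles=8)
print(groups)
print(issued)
```

Total instructions issued depends only on the number of engines and cycles, not on the number of contexts, which is why the engine count can follow the performance target rather than the node count.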
Conclusions
• Technology to build a large-scale full-system multicore/multiprocessor simulator
  - Use hybrid transplantation to avoid a full-system construction effort
  - Use interleaved emulation cores to reduce physical system size and complexity
[Figure: full-system simulator host and two FPGA platforms (BEE2); labels: CPUs, memory, micro-transplant simulators, components hosted on FPGAs, MMU, DMA, graphics, NIC, SCSI, terminal]
ProtoFlex
Computer Architecture Lab (CALCM)
http://www.ece.cmu.edu/CALCM