ProtoFlex: FPGA-Accelerated Hybrid Simulator

Eric S. Chung, Eriko Nurvitadhi, James C. Hoe, Babak Falsafi, Ken Mai

Computer Architecture Lab at

Our work in this area has been funded in part by NSF, Intel, and IBM.


CMU/ECE/CALCM/HOE NSFNGS Workshop, March 2007, slide-2

Multiprocessor Emulation

• Need fast MP emulators to study future revolutionary changes in HW and SW
• Hardware concurrency of FPGA emulation can scale up multiprocessor simulation speed
• BUT, we also want full-system fidelity (OS and I/O support)

Can we really build a 1000-node fully detailed MP system, FPGA or not?

[Figure: a full system on an FPGA: CPUs, memory, PCI bus, Ethernet controller, graphics card, I/O MMU controller, DMA controller, IRQ controller, SCSI controller, disks, terminal]


How to get full-system fidelity without building the full system?


Combining Simulators & FPGAs

• Simulators already provide full-system behaviors
  ⇒ why not just simulate infrequent behaviors (e.g., I/O devices)?
• Advantages
  - avoid implementing infrequent behaviors
    ⇒ simplify full-system emulator development
  - low impact on scalability and performance acceleration

[Figure: a target system (CPUs, memory, SCSI disk, Ethernet) split between an FPGA and a simulator]


Transplanting Hybrid Simulator

• 3 ways to map a target object to the hybrid-simulation host
  - Emulation-only
  - Simulation-only
  - Transplantable
• Transplantable objects
  - switch modes between FPGA & simulator hosts
  - complete behavior need not be implemented in FPGA
  - i.e., implement only the frequently used ISA subset in FPGA
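The transplant idea above can be illustrated with a small Python sketch. It is not the ProtoFlex implementation: the three-instruction ISA, the choice of "mul" as the rare instruction, and the names CpuState, fpga_step, and simulator_step are all hypothetical, chosen only to show a CPU object that runs on the "FPGA" for the frequent subset and is handed to the "simulator" for everything else.

```python
from dataclasses import dataclass, field

# Hypothetical toy ISA. The "FPGA" host implements only the frequent subset;
# "mul" stands in for a rarely used instruction left to the simulator.
FPGA_OPS = {"add", "sub"}


@dataclass
class CpuState:
    """Architectural state that moves between hosts on a transplant."""
    pc: int = 0
    acc: int = 0
    hosts: list = field(default_factory=list)  # host that ran each instruction


def apply_op(state, op, arg):
    """Shared instruction semantics for this toy ISA."""
    if op == "add":
        state.acc += arg
    elif op == "sub":
        state.acc -= arg
    elif op == "mul":
        state.acc *= arg
    else:
        raise ValueError(f"unknown instruction: {op}")
    state.pc += 1


def fpga_step(state, op, arg):
    """Run one instruction on the FPGA model; False means 'not implemented'."""
    if op not in FPGA_OPS:
        return False
    apply_op(state, op, arg)
    state.hosts.append("fpga")
    return True


def simulator_step(state, op, arg):
    """Run one instruction on the software model, which covers the full ISA."""
    apply_op(state, op, arg)
    state.hosts.append("simulator")


def run(program):
    """Execute on the FPGA, transplanting to the simulator when needed."""
    state = CpuState()
    while state.pc < len(program):
        op, arg = program[state.pc]
        if not fpga_step(state, op, arg):
            # Transplant: hand the same state to the simulator for this
            # instruction, then resume on the FPGA at the next one.
            simulator_step(state, op, arg)
    return state


result = run([("add", 5), ("mul", 3), ("sub", 1)])
print(result.acc, result.hosts)
```

In this sketch only the single unsupported instruction runs on the simulator; how long a real transplanted object stays on the simulator before returning is a design choice the slides do not specify.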

[Figure: "target objects" (CPU, memory, I/O) in the target design mapped to the FPGA and the simulator, with the three mappings labeled 1, 2, 3]


It Really Works

Xilinx XUP Virtex-II Pro 30 + Virtutech Simics (commercial simulator) = SUN 3800 Server (1x UltraSPARC III, Solaris 8)

[Figure: Simics (UltraSPARC, simulated target devices) connected over Ethernet, through a transplant & message interface, to the FPGA board (BlueSPARC, embedded PowerPC, DDR memory)]

1 graduate student in 6 months


How to build a 1K-node MP emulator, without building 1024 nodes?


How fast do you need to simulate?

In the uniprocessor world
• up to 100x slowdown usable for interactive SW research (e.g. Simics)
• 1k to 10k slowdown usable for design exploration (e.g. SimpleScalar)

[Plot: slowdown relative to real system (0 to 1000) vs. # processor cores in the simulated system (1 to 1024), with curves for 10 MIPS, 100 MIPS, and 1000 MIPS simulators (aggregate throughput); Simics and sim-outorder (< 1 MIPS) are marked]


Different ways to simulate 1K cores

• Even for a 1K-node MP, only need 1000 to 10,000 MIPS (in aggregate) to do useful work
• The naïve approach
  - build a fast ISA core (estimate 100 MIPS per core)
  - physically replicate the core 1000 times
  ⇒ 10x to 100x faster than it needs to be
  ⇒ Why spend effort on performance I don't need?
• The better approach: think in terms of MIPS
  - build a 100-MIPS ISA core with a statically interleaved pipeline that can support multiple contexts
  - interleave 100 contexts per core to emulate a 1K-node system with just 10 physical cores
  ⇒ the parameters and the effort required can be tuned to make the emulator just fast enough and not more
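The arithmetic behind the two approaches can be checked with a short Python sketch. The numbers (1000 nodes, 100 MIPS per core, a 1000-MIPS aggregate target) come from the slide; the function names replicated and interleaved are illustrative only.

```python
import math


def replicated(nodes, core_mips):
    """Naive approach: one physical core per emulated node."""
    return {
        "physical_cores": nodes,
        "aggregate_mips": nodes * core_mips,
    }


def interleaved(nodes, core_mips, target_aggregate_mips):
    """Size the emulator by the throughput needed, not by the node count."""
    cores = math.ceil(target_aggregate_mips / core_mips)
    return {
        "physical_cores": cores,
        "contexts_per_core": math.ceil(nodes / cores),
        "aggregate_mips": cores * core_mips,
        "mips_per_context": cores * core_mips / nodes,
    }


naive = replicated(nodes=1000, core_mips=100)
better = interleaved(nodes=1000, core_mips=100, target_aggregate_mips=1000)

# 100,000 MIPS is 10x the 10,000-MIPS upper bound and 100x the 1000-MIPS
# lower bound quoted on the slide.
print(naive)
print(better)
```

Raising target_aggregate_mips to 10,000 in this sketch gives 100 physical cores with 10 contexts each, which is the sense in which the parameters can be tuned to be just fast enough.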


PROTOFLEXMP

• Build a 1000-MIPS simulator from 10s of FPGAs
  - maximize throughput per emulation engine to be shared by multiple interleaved contexts
  - multiplex a large number of emulated contexts onto a few emulation engines

Base the number of emulation engines you need on how much performance you need, and not on how many nodes you are emulating
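As a rough illustration of multiplexing N contexts onto P engines, the Python sketch below statically assigns contexts to engines and rotates each engine across its contexts. The modulo assignment, the one-instruction-per-engine-per-cycle rate, and the name simulate are assumptions made for the example, not details taken from the ProtoFlex design.

```python
def simulate(n_contexts, n_engines, cycles):
    """Count instructions retired per context under round-robin interleaving.

    Assumes each engine issues exactly one instruction per cycle and that
    context c is statically assigned to engine c % n_engines.
    """
    engines = [
        [c for c in range(n_contexts) if c % n_engines == e]
        for e in range(n_engines)
    ]
    retired = [0] * n_contexts
    for cycle in range(cycles):
        for contexts in engines:
            if contexts:
                # Static interleaving: each engine rotates through its
                # contexts in a fixed order.
                ctx = contexts[cycle % len(contexts)]
                retired[ctx] += 1
    return retired


retired = simulate(n_contexts=8, n_engines=2, cycles=12)
print(retired, sum(retired))
```

Under these assumptions aggregate throughput depends only on the number of engines and cycles, while the context count only sets how thinly that throughput is shared.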

[Figure: an N-way target system (N CPUs sharing memory) mapped onto P-way FPGA emulation engines, P << N]


Conclusions

• Technology to build a large-scale full-system multicore/multiprocessor simulator
  - Use hybrid transplantation to avoid a full-system construction effort
  - Use interleaved emulation cores to reduce physical system size and complexity

[Figure: a full-system simulator host and FPGA platforms (BEE2); labels: CPU, memory, micro-transplant simulators, components hosted on FPGAs, MMU, DMA, graphics, NIC, SCSI, terminal]


ProtoFlex
Computer Architecture Lab (CALCM)

http://www.ece.cmu.edu/CALCM