HAsim FPGA-Based Processor Models: Fast, Accurate and Flexible
Michael Adler, Elliott Fleming, Michael Pellauer, Joel Emer
2
Outline
• Problem & goals
• Basic model structure
• Modeling a pipelined microarchitecture
• Modeling memory hierarchies
• Modeling multiprocessors
• FPGA implementation details
3
Standard Scaling Problem Slide
• Single core targets: model performance scaled with processor speed
• Multi-core targets: problem size grows with each generation
• Solutions:
  – Reduce fidelity:
    • Shorter runs
    • Subset of available cores
    • Lightweight model
  – Structural simulator change:
    • Parallelize it
    • Find a new method
4
Dependence Problems in Parallel Software Models
Option 1: Target CPUs ➞ Simulator Threads
– Uncore causes dependence between simulator threads
– High performance models (e.g. Graphite) relax the dependence
Option 2: Target Pipeline Stages ➞ Simulator Threads
– Lots of data movement
– Cyclic pipelines impose complex dependence
[Figure labels: Fetch, Decode, Execute; Core 0, Core 1, Uncore]
5
Why is Hardware Difficult to Model in Software?
• Constant data movement through pipelines
• Many points of dependence between “parallel” regions
• Large, irregular memory footprint
• Difficult to vectorize
• Branchy
6
Software Model Compromises
• Speed: Detailed model
  – Slow
  – Studies limited by run-time (e.g. large cache replacement policy)
• Accuracy: Simplified model
  – Model writer makes decisions about fidelity, hoping not to affect predictions
  – Multi-core interactions remain difficult to parallelize
Find a new method?
7
FPGAs
• Share the same properties as the target machine
  – Abundant wires
  – Abundant parallelism
  – Abundant registers
• Obvious mapping of pipelines
• Already ubiquitous for RTL verification
• Fast
Detailed FPGA models are often faster than simple models!
8
Aggregate Simulator Throughput (Parsec Black-Scholes)
9
Classification of FPGA-Based Designs
10
Prototype
• Final RTL, mapped to a different technology
  – E.g. an ASIC emulated on an FPGA
• This is what most people imagine for FPGA-based models
Characteristics:
• Useful for verification before producing final hardware
  – Shorter debugging loop
  – Internal state is more visible than final hardware
  – Masks are expensive
• Too late to make big micro-architectural decisions
• Often too large to fit on a single FPGA
• Often too late or too slow to be useful for software development
11
Functional Emulator
• Model architectural semantics
• No prediction of run-time
Characteristics:
• Can be written faster than prototypes
• Potentially more FPGA-area efficient
  – Use FPGA-friendly structures (e.g. no big CAMs)
  – Multiplex functional pipelines (like SMT)
• Useful as a software development platform
• Not useful for microarchitectural research
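The idea of multiplexing one functional pipeline across several target contexts can be sketched in software. The Python below is only an illustration, not HAsim code: the toy "core" (an accumulator with a stride) and the names `make_context`, `step`, and `run_multiplexed` are assumptions made for the example. A single `step` implementation is time-shared round-robin across all target cores, so only the per-core state is replicated, not the logic.

```python
def make_context(stride):
    """Per-core architectural state (hypothetical toy core)."""
    return {"pc": 0, "acc": 0, "stride": stride}

def step(ctx):
    """The one shared pipeline implementation: advance a context by one instruction."""
    ctx["acc"] += ctx["stride"]
    ctx["pc"] += 1

def run_multiplexed(contexts, steps_per_context):
    """Round-robin the shared pipeline over every context, SMT style."""
    schedule = []
    for _ in range(steps_per_context):
        for core_id, ctx in enumerate(contexts):
            step(ctx)
            schedule.append(core_id)
    return schedule

# Three target cores share one pipeline implementation.
contexts = [make_context(stride) for stride in (1, 10, 100)]
schedule = run_multiplexed(contexts, steps_per_context=4)
```

Adding a target core here costs one more context record, which mirrors the area argument on the slide: state scales with the number of cores while the pipeline logic does not.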
12
Model
• Project metrics of interest (e.g. timing, power, reliability)
• Emulate functional behavior as needed to compute metrics
Characteristics:
• Metric may be computed algorithmically (even time)
• An extension of functional emulators: function + metrics
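The "function + metrics" split can be sketched in software. The Python below is a hypothetical illustration, not HAsim code: the three-opcode ISA, the latency table, and the names `run_functional` and `timing_model` are all assumptions made for the example. The functional part produces the architectural result; the timing part computes a cycle count algorithmically from the instruction trace.

```python
# Hypothetical toy ISA:
#   ("addi", rd, rs, imm)   ("mul", rd, rs1, rs2)   ("load", rd, addr)
# Assumed per-opcode latencies in target cycles (illustrative numbers only).
LATENCY = {"addi": 1, "mul": 3, "load": 10}

def run_functional(program, memory):
    """Functional emulator: architectural semantics only, no notion of time."""
    regs = [0] * 4
    trace = []
    for inst in program:
        op = inst[0]
        if op == "addi":
            _, rd, rs, imm = inst
            regs[rd] = regs[rs] + imm
        elif op == "mul":
            _, rd, rs1, rs2 = inst
            regs[rd] = regs[rs1] * regs[rs2]
        elif op == "load":
            _, rd, addr = inst
            regs[rd] = memory[addr]
        else:
            raise ValueError(f"unknown opcode {op!r}")
        trace.append(op)
    return regs, trace

def timing_model(trace):
    """Metric computed algorithmically: sum of assumed latencies (in-order, no overlap)."""
    return sum(LATENCY[op] for op in trace)

program = [("load", 0, 5), ("addi", 1, 0, 2), ("mul", 2, 0, 1)]
memory = {5: 7}
regs, trace = run_functional(program, memory)
cycles = timing_model(trace)
```

Because the timing model only consumes the trace, it can be swapped for a more detailed one without touching the functional emulator.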
13
Model Terminology
Modeling hardware on hardware leads to terminology confusion:
– Both have caches, pipelines, memories…
• Target machine means the microarchitecture being studied
• FPGA, functional-model and timing-model all refer to implementation details. (E.g. functional memory cache is an FPGA structure.)
• Host is the general purpose machine to which FPGAs are connected
14
Why isn’t everyone building timing models with FPGAs?
15
Fast, Accurate or Now?
[Figure labels: Accuracy, Development Time, Model Speed]
16
FPGA Picture is Different
[Figure labels: Accuracy, Development Time, Model Speed]
17
Reducing Development Time: Managing Complexity
How do I:
• Use FPGAs while focusing on my algorithm? HAsim, LEAP
• Model time? A-Ports
• Re-use components? Split functional / timing models, AWB
• Fit a large problem on FPGAs? Multiplexing, latency insensitivity, multiple FPGAs
[Figure label: Development Time]
18
STDIO on General Purpose Machines
FILE *f = fopen(path, "w");
const char *name = "Kenneth";
fprintf(f, "%s, what is the frequency?\n", name);
19
I/O in Hardware Description Languages (SystemVerilog)
integer f = $fopen(path, "w");
string name = "Kenneth";
$fwrite(f, "%s, what is the frequency?\n", name);
20
Nothing Comes from Nothing
FPGAs have:
• No standard physical device
• No standard device model
• No standard system interface
• No standard API
21
What Makes Hardware General Purpose?
The software!
• Compilers and library APIs make code “universal”
• Hardware standards (ACPI, PCIe) make OS development and compiler writing easier. Little impact on user programs.
• ISA matters if you want to avoid recompiling. ISA is part of the software API, along with standard libraries.
22
LEAP Platform
[Figure: LEAP platform block diagram with an FPGA side and a software side. Labels: Timing Partition; Functional Partition; Fetch, Decode, Exe; Platform Interface (STDIO, Scratchpad Memory, Control); Virtual Platform; RRR; Remote Memory; Channel; FPGA Physical Platform; Software Services (Streams, Memory, State, Emulate); Control; Software Physical Platform]
23
Hello World in LEAP
module [CONNECTED_MODULE] mkConnectedApplication ();
    STDIO#(Bit#(32)) stdio <- mkStdIO();
    let msg <- getGlobalStringUID("Hello, World!\n");

    Reg#(STATE) state <- mkReg(STATE_start);

    rule hello (state == STATE_start);
        stdio.printf(msg, List::nil);
        state <= STATE_finish;
    endrule
endmodule
24
Bluespec on One Foot
• Functional language derived from Haskell
• Generates Verilog
• Modules: the analog of C++ classes
  – May be polymorphic (types are abstract)
• Methods are the callable routines exposed by modules
  – Inlined statically at compile time into a calling rule
• Rules are:
  – Executed atomically
  – Guarded (predicated)
• Guard is both explicit (user specified) and implicit
• Implicit guards come from guards on methods called in a rule
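The rule semantics above (atomic execution, explicit and implicit guards) can be approximated in software. The Python below is a rough sketch under stated assumptions, not how the Bluespec compiler works: the names `Rule`, `Fifo`, and `run` are hypothetical, and the scheduler fires only one enabled rule per step, whereas real Bluespec hardware can fire several non-conflicting rules in the same clock cycle. The producer's guard combines an explicit condition (`count < 3`) with the implicit guard inherited from the FIFO method it calls (`can_enq`).

```python
class Fifo:
    """Bounded FIFO whose methods carry guards, loosely like a Bluespec FIFO."""
    def __init__(self, depth):
        self.depth = depth
        self.items = []

    def can_enq(self):
        return len(self.items) < self.depth

    def can_deq(self):
        return len(self.items) > 0

    def enq(self, x):
        assert self.can_enq()
        self.items.append(x)

    def deq(self):
        assert self.can_deq()
        return self.items.pop(0)

class Rule:
    def __init__(self, name, guard, action):
        self.name = name
        self.guard = guard    # zero-argument callable returning bool
        self.action = action  # zero-argument callable that updates state

def run(rules, max_steps):
    """Fire one enabled rule per step, so each rule runs atomically."""
    fired = []
    for _ in range(max_steps):
        ready = [r for r in rules if r.guard()]
        if not ready:
            break  # nothing can fire: the system is quiescent
        ready[0].action()
        fired.append(ready[0].name)
    return fired

state = {"count": 0, "out": []}
q = Fifo(depth=1)

def produce():
    q.enq(state["count"])
    state["count"] += 1

def consume():
    state["out"].append(q.deq())

rules = [
    # Explicit guard (count < 3) and implicit guard (q.can_enq) combined.
    Rule("produce", lambda: state["count"] < 3 and q.can_enq(), produce),
    # No explicit guard: only the implicit guard from q.deq.
    Rule("consume", lambda: q.can_deq(), consume),
]
fired = run(rules, max_steps=20)
```

With a one-entry FIFO the two rules are forced to alternate, which shows how implicit guards alone can sequence otherwise independent rules.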
25
Hello World in LEAP
[Same code as slide 23, shown with the callout: main()]
26
Hello World in LEAP
[Same code as slide 23, shown with the callout: Control Logic]
27
Hello World in LEAP
[Same code as slide 23, shown with the callout: STDIO]
28
LEAP Gives FPGAs Key “General Purpose” Properties
Virtual Platform
– I/O
– Virtual memory abstraction (scratchpads)
Topology
– Named channels (FIFOs) instead of hard-coded wires
– Host/FPGA remote procedure calls
– Automated mapping to multiple FPGAs
Debugging Aids
– Deadlock detection
– Automated scan chains
– User scan chains
29
LEAP Platform Users
• HAsim timing models
• Prototypes
  – SSD Functional Model
  – AirBlue wireless network stack
• Algorithmic accelerators
  – H.264 decoder
  – Matrix multiplication
  – …
30
Key Concept: Latency Insensitivity
31
Latency Insensitive Channel Semantics
• Guaranteed:
  – FIFO
  – Accurate
  – Always allow at least one message to be in flight
• Not guaranteed:
  – Latency
Why?
– Allows for replacement of algorithms – even to software
– Permits use of hierarchical memories (caches)
– Simplifies communication – especially off-chip
This is a common software strategy (pipes, TCP/IP, pthread mutex)
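The channel contract above can be sketched in software. The Python below is an illustration under stated assumptions, not LEAP code: the class `LIChannel`, the helper `transfer`, and the random delay model are hypothetical. The channel guarantees ordered, lossless delivery and room for at least one in-flight message, while the delivery latency is deliberately unspecified (here, a seeded random number of ticks).

```python
import random
from collections import deque

class LIChannel:
    """FIFO channel: ordered, lossless delivery with unspecified latency."""
    def __init__(self, capacity=1, max_latency=5, seed=0):
        assert capacity >= 1  # at least one message may always be in flight
        self.capacity = capacity
        self.max_latency = max_latency
        self.rng = random.Random(seed)
        self.in_flight = deque()  # entries are (ready_time, message)
        self.now = 0

    def can_send(self):
        return len(self.in_flight) < self.capacity

    def send(self, msg):
        assert self.can_send()
        delay = self.rng.randint(1, self.max_latency)
        self.in_flight.append((self.now + delay, msg))

    def can_recv(self):
        # Only the head may be delivered, which preserves FIFO order.
        return bool(self.in_flight) and self.in_flight[0][0] <= self.now

    def recv(self):
        assert self.can_recv()
        return self.in_flight.popleft()[1]

    def tick(self):
        self.now += 1

def transfer(messages, channel, max_ticks=10_000):
    """Push every message through the channel, polling instead of assuming a latency."""
    pending = deque(messages)
    received = []
    for _ in range(max_ticks):
        if pending and channel.can_send():
            channel.send(pending.popleft())
        if channel.can_recv():
            received.append(channel.recv())
        if not pending and not channel.in_flight:
            break
        channel.tick()
    return received

# The result is the same regardless of capacity or latency.
results = [
    transfer(list(range(10)), LIChannel(capacity=c, max_latency=m, seed=s))
    for c, m, s in [(1, 1, 0), (1, 5, 1), (4, 20, 2)]
]
```

The sender and receiver only ever test `can_send` and `can_recv`, so the channel implementation behind them (on-chip FIFO, off-chip link, software queue) can change without affecting correctness.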
32
Named Channels
• Name both endpoints of a FIFO
• Software builds the connection
• Replaces user’s hand-routed Verilog channels
• Automatically route, even across FPGAs
Common in software:
– Named ports in software timing models
– UUCP has been dead for a long time (for a reason)
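Name-based connection can be sketched as a small registry. The Python below is a hypothetical illustration, not the LEAP implementation: the names `ChannelRegistry`, `Link`, and the module and channel strings are assumptions made for the example. Modules declare endpoints by name only, and a separate build step pairs each sender with the receiver of the same name and reports anything left dangling.

```python
from collections import deque, namedtuple

# One built connection: who sends, who receives, and the FIFO between them.
Link = namedtuple("Link", ["src", "dst", "fifo"])

class ChannelRegistry:
    def __init__(self):
        self.senders = {}    # channel name -> sending module
        self.receivers = {}  # channel name -> receiving module

    def declare_send(self, name, module):
        if name in self.senders:
            raise ValueError(f"duplicate sender for {name!r}")
        self.senders[name] = module

    def declare_recv(self, name, module):
        if name in self.receivers:
            raise ValueError(f"duplicate receiver for {name!r}")
        self.receivers[name] = module

    def connect(self):
        """Pair endpoints by name; fail if any endpoint has no partner."""
        unmatched = set(self.senders) ^ set(self.receivers)
        if unmatched:
            raise ValueError(f"unmatched endpoints: {sorted(unmatched)}")
        return {
            name: Link(self.senders[name], self.receivers[name], deque())
            for name in self.senders
        }

# Modules only mention channel names, never each other.
registry = ChannelRegistry()
registry.declare_send("fetch_to_decode", "fetch")
registry.declare_recv("fetch_to_decode", "decode")
registry.declare_send("decode_to_exe", "decode")
registry.declare_recv("decode_to_exe", "exe")
links = registry.connect()
links["fetch_to_decode"].fifo.append("inst0")
```

Since the pairing happens in one place, a build tool is free to decide later whether a given link stays on one chip or crosses between FPGAs.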
33
34
Finally, an Explanation of our Project’s Name
LINC: Latency-Insensitive Named Channel
LEAP: LINC-based Environment for Application Programming
HAsim: Hardware-based micro-Architecture Simulator