Hasim
Joel Emer†‡
Michael Adler†, Artur Klauser†, Angshuman Parashar†, Michael Pellauer‡,
Murali Vijayaraghavan‡
†VSSAD, Intel
‡CSAIL, MIT
2007.05.14 Hasim2
Overview
• Goal
  – Produce compelling evidence for architecture ideas
• Requirements
  – Cycle-accurate simulation
  – Representative simulation length
  – Software development (often)
• Current approach
  – Mostly software simulation (10 kHz to 1 kHz)
• New approach
  – Build a performance model in an FPGA
2007.05.14 Hasim3
FPGA-based approaches
• Prototyping
  – Build a logically isomorphic representation of the design
• Modeling
  – Build a performance simulation in gates
• Hybrids
  – Build something that is partially a prototype and partially a model
2007.05.14 Hasim4
Recreate Asim in hardware
• Modularity
• Inter-module communication
• Functional/Timing Partitioning
• Modeling Utilities
2007.05.14 Hasim5
Why modularity?
• Speed of model development
• Shared components between products
• Reuse across generations
• Encourages isomorphism to design
• Improved fidelity
• Facilitates speed/fidelity trade-offs
• Architectural experimentation
• Factorial development and evaluations
• Sharing
2007.05.14 Hasim6
ASIM Module Hierarchy
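[Figure: module tree with S at the top, M, C, and N below it, the stages F, D, R, X, C, and W below those, and B at the bottom]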
2007.05.14 Hasim7
ASIM Module Selection
[Figure: the module tree (S; M, C, N; F, D, R, X, C, W) shown with several alternative B modules to select from]
2007.05.14 Hasim8
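Module Selection
[Figure: alternative modules at several levels of the tree: S modules, sets of M, C, and N modules, two F, D, R, X, C, W pipelines, and several B modules]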
2007.05.14 Hasim9
Module Replacement
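[Figure: the module tree (S; M, C, N; F, D, R, X, C, W; B) shown with candidate replacement modules, including several B modules and an X module]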
2007.05.14 Hasim10
(H)ASIM Module Hierarchy
2007.05.14 Hasim11
Communication
[Figure: communication paths among C, the stages F, D, R, X, C, and W, and N]
2007.05.14 Hasim12
Named connections
[Figure: modules S and D joined by a named connection with endpoints A-out and A-in]
2007.05.14 Hasim13
Model and FPGA Cycles
[Figure: Module A and Module B connected by a Port, with a timeline over FPGA cycles 1 to 8. Module A steps through 1.1, 1.2, 1.3, 2.1, 2.2; Module B steps through 1.1, 2.1, 2.2, 2.3. The timeline is drawn twice.]
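The way a port lets two modules advance through model cycles at different FPGA-cycle rates can be sketched in software. This is an illustration under assumptions, not HAsim code: the `Port` class, the costs of three and one FPGA cycles per model cycle, and the one-model-cycle port latency are all hypothetical. Module B starts a model cycle only when the message for that cycle is present on its input port.

```python
from collections import deque


class Port:
    """Hypothetical FIFO port whose latency is counted in model cycles."""

    def __init__(self, latency=1):
        # Pre-fill with `latency` empty messages so the receiver can run
        # its first `latency` model cycles before the sender produces data.
        self.queue = deque([None] * latency)

    def send(self, msg):
        self.queue.append(msg)

    def can_receive(self):
        return len(self.queue) > 0

    def receive(self):
        return self.queue.popleft()


def simulate(fpga_cycles, cost_a=3, cost_b=1, latency=1):
    """Return a list of (fpga_cycle, model_cycles_done_by_A, ..._by_B)."""
    port = Port(latency)
    done_a = done_b = 0   # model cycles completed by each module
    left_a = cost_a       # FPGA cycles left in A's current model cycle
    left_b = 0            # 0 means B is idle, waiting for a message
    trace = []
    for fpga in range(1, fpga_cycles + 1):
        # A has no input port in this sketch, so it never waits.
        left_a -= 1
        if left_a == 0:
            done_a += 1
            port.send(done_a)
            left_a = cost_a
        # B begins a model cycle only when its input message is available.
        if left_b == 0 and port.can_receive():
            port.receive()
            left_b = cost_b
        if left_b > 0:
            left_b -= 1
            if left_b == 0:
                done_b += 1
        trace.append((fpga, done_a, done_b))
    return trace
```

With these assumed costs, B idles for most FPGA cycles, and its model-cycle count never runs more than the port latency ahead of A's.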
2007.05.14 Hasim14
Functional/Timing Decomposition
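Functional Partition
• ISA semantics
• Platform semantics
Timing Partition
• Micro-architecture
[Figure: the Timing Partition sends requests such as Fetch(PC) to the Functional Partition, which returns results such as Instruction]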
• Simplifies timing model
• Amortizes functional model design effort over many models
• Can be pipelined for performance
• Can be FPGA-friendly design
• Can be split across hardware and software
2007.05.14 Hasim15
Execute@execute phases
Fetch instruction
Speculatively execute instruction
Read memory*
Speculatively write memory* (locally visible)
Commit or Abort instruction
Write memory* (globally visible)
* Optional depending on instruction type
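The phase list above can be sketched as a small function that builds the phase sequence for one instruction. This is only an illustration: the three flags are hypothetical stand-ins for "instruction type" and for the commit-or-abort outcome, and the rule that a write becomes globally visible only after a commit is an assumption read from the ordering of the list.

```python
def phases(reads_memory=False, writes_memory=False, commits=True):
    """Return the ordered phase names for a single instruction."""
    seq = ["fetch", "speculative execute"]
    if reads_memory:
        seq.append("read memory")
    if writes_memory:
        seq.append("speculative write memory (locally visible)")
    seq.append("commit" if commits else "abort")
    # Assumption: an aborted instruction never reaches the global write.
    if writes_memory and commits:
        seq.append("write memory (globally visible)")
    return seq
```

For example, `phases(writes_memory=True)` yields five phases ending in the globally visible write, while `phases(writes_memory=True, commits=False)` stops at the abort.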
2007.05.14 Hasim16
Execution in phases
F D X R C
F D X W C W
F D X C
Assertion: All data dependencies can be represented in these phases
F D X R A
F D X X C W
2007.05.14 Hasim17
HASim: Partitioning Overview
[Figure: the Timing Partition sits above a Functional Partition made of the stages Token Gen, Fet, Dec, Exe, Mem, LCom, and GCom, backed by Register State (RegFile) and Memory State]
2007.05.14 Hasim18
Common Infrastructure
• Modules
• Inter-module communication
• Statistics gathering
• Event logging
• Debug Tracing
• Simulation control
• …
2007.05.14 Hasim19
Bluespec (Asim-style) module
  module [HAsim_module] mkCache#() (Empty);
    // Requests arrive on "a2cache"; responses leave on "cache2a".
    Port#(Addr) req_port  <- mkRecvPort("a2cache");
    Port#(Bool) resp_port <- mkSendPort("cache2a");
    TagArray tagarray <- mkTagArray();

    // Each cycle: if a request is present, reply with hit or miss.
    rule cycle (True);
      Maybe#(Addr) mx = req_port.get();
      if (isValid(mx))
        resp_port.put(tagarray.lookup(validValue(mx)));
    endrule
  endmodule
2007.05.14 Hasim20
Bluespec (Asim-style) submodule
  module mkTagArray(TagArray);
    RegFile#(Bit#(12), Bit#(4)) tagArray <- mkRegFileFull(...);

    // Helper functions are declared before the method that uses them.
    function Bit#(4) getTag(Address x);
      return x[15:12];
    endfunction
    function Bit#(12) getIndex(Address x);
      return x[11:0];
    endfunction

    // Hit when the tag stored at the index matches the address tag.
    method Bool lookup(Bit#(16) a);
      return (tagArray.sub(getIndex(a)) == getTag(a));
    endmethod
  endmodule
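The address split used by the tag array (a 12-bit index from bits 11:0 and a 4-bit tag from bits 15:12 of a 16-bit address) can be checked with a short Python sketch. The `fill` method and the all-zero initial tags are assumptions added for illustration; the slide shows only the lookup path.

```python
INDEX_BITS = 12
TAG_BITS = 4


def get_index(addr):
    """Bits 11:0 of the address."""
    return addr & ((1 << INDEX_BITS) - 1)


def get_tag(addr):
    """Bits 15:12 of the address."""
    return (addr >> INDEX_BITS) & ((1 << TAG_BITS) - 1)


class TagArray:
    def __init__(self):
        # One 4-bit tag per index. Assumption: every tag starts at 0.
        self.tags = [0] * (1 << INDEX_BITS)

    def fill(self, addr):
        # Hypothetical helper (not on the slide): record the tag for addr.
        self.tags[get_index(addr)] = get_tag(addr)

    def lookup(self, addr):
        # Hit when the stored tag equals the tag of the address.
        return self.tags[get_index(addr)] == get_tag(addr)
```

Because there is no valid bit, in this sketch as on the slide, an address whose tag happens to equal the initial contents of its entry reports a hit before anything has been filled.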
2007.05.14 Hasim21
Support functions - stats
[Figure: three Modules, each with its own Stat Counter, all connected to one Stat Dumper]
  module mkCache#(...) (Empty);
    ...
    cache_hits <- mkStat(...);
    ...
    hit = tagarray.lookup(...);
    if (hit)
      cache_hits.increment();
    ...
  endmodule
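The pattern of per-module stat counters collected by a single dumper can be sketched in Python. The class names and the registration mechanism are assumptions made for illustration; the slide does not show how `mkStat` reaches the Stat Dumper.

```python
class StatDumper:
    """Collects every counter registered with it and reports their values."""

    def __init__(self):
        self.counters = []

    def register(self, counter):
        self.counters.append(counter)

    def dump(self):
        return {c.name: c.value for c in self.counters}


class StatCounter:
    """A named counter that registers itself with a dumper when created."""

    def __init__(self, name, dumper):
        self.name = name
        self.value = 0
        dumper.register(self)

    def increment(self):
        self.value += 1


def count_hits(lookups, dumper):
    """Increment a cache_hits counter once per hit in `lookups`."""
    cache_hits = StatCounter("cache_hits", dumper)
    for hit in lookups:
        if hit:
            cache_hits.increment()
    return cache_hits
```

A module only touches its own counter; reading the values back goes through the dumper, for example `dumper.dump()` after `count_hits([True, False, True], dumper)`.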
2007.05.14 Hasim22
2Dreams
2007.05.14 Hasim23
Support functions - events
[Figure: three Modules, each with its own Event Reg, all connected to one Event Dumper]
  module mkCache#(...) (Empty);
    ...
    cache_event <- mkEvent(...);
    ...
    hit = tagarray.lookup(...);
    cache_event.report(hit);
    ...
  endmodule
2007.05.14 Hasim24
Support functions – global controller
[Figure: three Modules, each with its own Controller, all connected to one Global Controller]
  module mkCache#(...) (Empty);
    ...
    ctrl <- mkCntrlr(...);
    ...
    rule (ctrl.run())
      ...
    endrule
  endmodule
2007.05.14 Hasim26
FPGA-based prototype
Prototyping Catch-22…
2007.05.14 Hasim27
Module Instantiation
[Figure: module tree rooted at U in which the F, D, R, X, C, W pipeline appears three times, alongside multiple M, C, and N modules]
2007.05.14 Hasim28
Factorial Coding/Experiments
[Figure: four copies of the module tree (S; M, C, N), one for each combination of the labels SC or RC with SM or RM: SC+SM, RC+SM, SC+RM, RC+RM]
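A factorial experiment over module variants amounts to enumerating every combination of choices. The sketch below assumes, as a reading of the figure labels, that SC and RC are alternative implementations of module C and that SM and RM are alternatives for module M; the function name and the dictionary layout are hypothetical.

```python
from itertools import product


def configurations(variants):
    """Return one dict per combination of module variants.

    `variants` maps a module slot name to its list of alternatives.
    """
    slots = sorted(variants)
    combos = product(*(variants[slot] for slot in slots))
    return [dict(zip(slots, combo)) for combo in combos]


# Assumed reading of the figure: two choices for C, two for M.
EXAMPLE_VARIANTS = {"C": ["SC", "RC"], "M": ["SM", "RM"]}
```

Calling `configurations(EXAMPLE_VARIANTS)` gives the four models of the 2x2 experiment; adding a third slot with two alternatives would double the count.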
2007.05.14 Hasim29
HAsim: Current status - models
• Simple RISC functional model operating
  – Simple RISC ISA
  – Pipelined multi-phase instruction execution
  – Supports speculative OOO design
    • Physical Reg File and ROB
    • Small physically addressed memory
    • Fast speculative rewinds
• Instruction-per-cycle (APE) model
  – Runs simple benchmarks on FPGA
• Five-stage pipeline
  – Supports branch mis-speculation
  – Runs simple benchmarks (in software simulation)
• X86 functional model architecture under development
2007.05.14 Hasim30
Connections Implement Ports
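[Figure: two views of the PM. "PM (Module Tree w. Connections)" shows modules with named connection endpoints foo, bar, and baz; "PM (Hardware Modules w. Wrappers)" shows the matching endpoints foo-foo, bar-bar, and baz-baz joined.]
Implemented via connections.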
2007.05.14 Hasim31
Timing Model Resources (Fast)
OOO, branch prediction, three functional units, 32KB 2-way set-associative ICache and DCache, iTLB, dTLB
• 2142 slices (15% of a 2VP30)
• 21 block RAMs (15% of a 2VP30)
Configurable cache model
• 32KB 4-way set-associative cache with 16B cache lines
  – 165 slices (1% of a 2VP30)
  – 17 block RAMs (12% of a 2VP30)
• 2MB 4-way set-associative cache with 64B cache lines
  – 140 slices (1% of a 2VP30)
  – 40 block RAMs (29% of a 2VP30)
Current FPGAs (4VFX140)
• 142,128 logic cells (63,168 slices)
• 552 block RAMs
• 2 PowerPCs