Page 1: Hasim Joel Emer †‡ Michael Adler †, Artur Klauser †, Angshuman Parashar †, Michael Pellauer ‡, Murali Vijayaraghavan ‡ † VSSAD Intel ‡ CSAIL MIT.

Hasim

Joel Emer†‡

Michael Adler†, Artur Klauser†, Angshuman Parashar†, Michael Pellauer‡,

Murali Vijayaraghavan‡

† VSSAD Intel

‡ CSAIL MIT

2007.05.14 Hasim2

Overview

• Goal
  – Produce compelling evidence for architecture ideas

• Requirements
  – Cycle accurate simulation
  – Representative simulation length
  – Software development (often)

• Current approach
  – Mostly software simulation (10 KHz to 1 KHz)

• New approach
  – Build a performance model in an FPGA

2007.05.14 Hasim3

FPGA-based approaches

• Prototyping
  – Build a logically isomorphic representation of the design

• Modeling
  – Build a performance simulation in gates

• Hybrids
  – Build something that is partially a prototype and partially a model

2007.05.14 Hasim4

Recreate Asim in hardware

• Modularity

• Inter-module communication

• Functional/Timing Partitioning

• Modeling Utilities

2007.05.14 Hasim5

Why modularity?

• Speed of model development

• Shared components between products

• Reuse across generations

• Encourages isomorphism to design

• Improved fidelity

• Facilitates speed/fidelity trade-offs

• Architectural experimentation

• Factorial development and evaluations

• Sharing

2007.05.14 Hasim6

ASIM Module Hierarchy

[Figure: module hierarchy tree; labels as extracted, top to bottom: S / MC N / D R X C WF / B]

2007.05.14 Hasim7

ASIM Module Selection

[Figure: module hierarchy tree (S / MC N / D R X C WF) shown alongside a set of B modules; labels as extracted]

2007.05.14 Hasim8

Module Selection

[Figure: two module trees, each with S / MC N / D R X C WF, shown alongside a set of B modules; labels as extracted]

2007.05.14 Hasim9

Module Replacement

[Figure: module hierarchy tree (S / MC N / D R X C WF) with a set of B modules and a separate X module; labels as extracted]

2007.05.14 Hasim10

(H)ASIM Module Hierarchy

2007.05.14 Hasim11

Communication

[Figure: communication between modules; labels as extracted: C / D R X C WF / N N]

2007.05.14 Hasim12

Named connections

[Figure: modules S and D with named connection endpoints A-out and A-in]

2007.05.14 Hasim13

Model and FPGA Cycles

[Figure: Module A and Module B connected by a Port; the timeline below appears twice on the slide, and its column alignment was lost in extraction]

FPGA cycle: 1 2 3 4 5 6 7 8

A: 1.1 1.2 1.3 2.1 2.2

B: 1.1 2.1 2.2 2.3
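The timeline above suggests that each module advances through model cycles at its own rate in FPGA cycles, with the port keeping the two in step. Below is a minimal software sketch of that idea, assuming a port behaves like a FIFO with a fixed latency in model cycles. The `Port` class, the `simulate` function, and the per-cycle costs are illustrative assumptions, not taken from the slides.

```python
from collections import deque


class Port:
    """FIFO link with a fixed latency measured in model cycles (assumed design)."""

    def __init__(self, latency=1):
        # Pre-fill with `latency` empty messages so the reader can start at once.
        self.queue = deque([None] * latency)

    def can_recv(self):
        return len(self.queue) > 0

    def recv(self):
        return self.queue.popleft()

    def send(self, msg):
        self.queue.append(msg)


def simulate(a_costs, b_costs, fpga_cycles):
    """Run producer A and consumer B; return what B received per model cycle.

    a_costs / b_costs give the FPGA cycles each module spends on successive
    model cycles (illustrative numbers, not taken from the slides).
    """
    port = Port(latency=1)
    received = []
    a_cycle, a_left = 0, a_costs[0]
    b_cycle, b_left = 0, b_costs[0]
    for _ in range(fpga_cycles):
        # Module A: finish the current model cycle, then send one message.
        if a_cycle < len(a_costs):
            a_left -= 1
            if a_left == 0:
                port.send(("A", a_cycle))
                a_cycle += 1
                if a_cycle < len(a_costs):
                    a_left = a_costs[a_cycle]
        # Module B: may only work on a model cycle when its input is available.
        if b_cycle < len(b_costs) and port.can_recv():
            b_left -= 1
            if b_left == 0:
                received.append((b_cycle, port.recv()))
                b_cycle += 1
                if b_cycle < len(b_costs):
                    b_left = b_costs[b_cycle]
    return received
```

In this sketch, changing how many FPGA cycles each model cycle costs changes when B makes progress, but not which message B sees in a given model cycle.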

2007.05.14 Hasim14

Functional/Timing Decomposition

[Figure: Timing Partition sends Fetch(PC) to the Functional Partition, which returns Instruction]

• Functional Partition
  – ISA semantics
  – Platform semantics

• Timing Partition
  – Micro-architecture

• Simplifies timing model

• Amortize functional model design effort over many models

• Can be pipelined for performance

• Can be FPGA-friendly design

• Can be split across hardware and software
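The split can be pictured as two cooperating objects: one answers what an instruction is, the other decides when it arrives. The sketch below is a toy illustration under that assumption. The class names, the dictionary-based instruction memory, and the fixed fetch latency are hypothetical and do not describe HAsim's actual interfaces.

```python
class FunctionalPartition:
    """Holds ISA semantics: here, just a toy instruction memory."""

    def __init__(self, program):
        self.program = program  # maps PC -> instruction text

    def fetch(self, pc):
        return self.program.get(pc, "nop")


class TimingPartition:
    """Holds micro-architecture timing: here, a fixed fetch latency."""

    def __init__(self, functional, fetch_latency=2):
        self.functional = functional
        self.fetch_latency = fetch_latency
        self.model_cycle = 0

    def fetch(self, pc):
        # The functional partition says WHAT was fetched;
        # the timing partition decides WHEN it arrives.
        inst = self.functional.fetch(pc)
        self.model_cycle += self.fetch_latency
        return inst, self.model_cycle
```

Because the timing side never interprets the instruction, the same functional partition could in principle be reused under a different timing model, which is the amortization the bullets describe.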

2007.05.14 Hasim15

Execute@execute phases

Fetch instruction

Speculatively execute instruction

Read memory*

Speculatively write memory* (locally visible)

Commit or Abort instruction

Write memory* (globally visible)

* Optional depending on instruction type
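The memory-related phases above can be sketched as a small software model. This is only an illustration that assumes each in-flight instruction carries a numeric token and that speculative stores sit in a private buffer until commit. The class and method names are hypothetical, and the fetch and execute phases are left out.

```python
class FunctionalModel:
    """Toy model of speculative (local) and committed (global) memory writes."""

    def __init__(self):
        self.memory = {}        # globally visible memory
        self.store_buffer = {}  # token -> (addr, value), locally visible only
        self.committed = set()

    def speculative_write(self, token, addr, value):
        self.store_buffer[token] = (addr, value)

    def read(self, token, addr):
        # Prefer the youngest speculative store from this or an older token.
        for t in sorted(self.store_buffer, reverse=True):
            a, v = self.store_buffer[t]
            if t <= token and a == addr:
                return v
        return self.memory.get(addr, 0)

    def commit(self, token):
        self.committed.add(token)

    def abort(self, token):
        # An aborted instruction's speculative store simply disappears.
        self.store_buffer.pop(token, None)

    def global_write(self, token):
        # Only committed stores become globally visible.
        if token in self.committed and token in self.store_buffer:
            addr, value = self.store_buffer.pop(token)
            self.memory[addr] = value
```

A store that is aborted never reaches `memory`, while a committed store becomes visible only once `global_write` is called for its token.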

2007.05.14 Hasim16

Execution in phases

F D X R C

F D X W C W

F D X C

Assertion: All data dependencies can be represented in these phases

F D X R A

F D X X C W

2007.05.14 Hasim17

HASim: Partitioning Overview

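[Figure: Timing Partition above a Functional Partition made of pipeline stages and shared state]

Functional Partition stages: Token Gen, Fet, Dec, Exe, Mem, LCom, GCom

Functional Partition state: Memory State, Register State, RegFile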

2007.05.14 Hasim18

Common Infrastructure

• Modules

• Inter-module communication

• Statistics gathering

• Event logging

• Debug Tracing

• Simulation control

• …

2007.05.14 Hasim19

Bluespec (Asim-style) module

module [HAsim_module] mkCache#() (Empty);

   Port#(Addr) req_port <- mkSendPort("a2cache");
   Port#(Bool) resp_port <- mkRecvPort("cache2a");

   TagArray tagarray <- mkTagArray();

   // Look up the request address, if any, and reply with the result
   rule cycle(True);
      Maybe#(Addr) mx = req_port.get();
      if (isValid(mx))
         resp_port.put(tagarray.lookup(validValue(mx)));
   endrule

endmodule

2007.05.14 Hasim20

Bluespec (Asim-style) submodule

module mkTagArray(TagArray);

   RegFile#(Bit#(12),Bit#(4)) tagArray <- mkRegFileFull(...);

   // Hit when the tag stored at the address's index equals the address's tag
   method Bool lookup(Bit#(16) a);
      return (tagArray.sub(getIndex(a)) == getTag(a));
   endmethod

   function Bit#(4) getTag(Address x);
      return x[15:12];
   endfunction

   function Bit#(12) getIndex(Address x);
      return x[11:0];
   endfunction

endmodule
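For readers who do not use Bluespec, the lookup above can be mirrored in software: a 16-bit address splits into a 4-bit tag (bits 15:12) and a 12-bit index (bits 11:0), and a hit means the stored tag at that index equals the address's tag. The sketch below follows that arithmetic. The `fill` helper is a hypothetical addition, not on the slide, included only so the example can show a hit.

```python
class TagArray:
    """Direct-mapped tag store for 16-bit addresses: 12-bit index, 4-bit tag."""

    def __init__(self):
        self.tags = [0] * (1 << 12)  # one 4-bit tag per index, initially zero

    @staticmethod
    def get_tag(addr):
        return (addr >> 12) & 0xF    # bits 15:12

    @staticmethod
    def get_index(addr):
        return addr & 0xFFF          # bits 11:0

    def lookup(self, addr):
        return self.tags[self.get_index(addr)] == self.get_tag(addr)

    def fill(self, addr):
        # Hypothetical helper (not on the slide) that installs a line's tag.
        self.tags[self.get_index(addr)] = self.get_tag(addr)
```

As on the slide, there is no valid bit, so an address whose tag happens to be zero reports a hit on an untouched entry.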

2007.05.14 Hasim21

Support functions - stats

[Figure: three Modules, each containing a Stat Counter, all connected to a single Stat Dumper]

module mkCache#(...) (Empty);
   ...
   cache_hits <- mkStat(...);
   ...
   hit = tagarray.lookup(...);
   if (hit) cache_hits.increment();
   ...
endmodule
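One way to picture the counter-and-dumper arrangement is a shared registry that every counter joins when it is created, so a single dumper can collect them all. The sketch below assumes that arrangement. The `Stat` class, the `dump_stats` function, and the list-based registry are illustrative names, not HAsim's API.

```python
class Stat:
    """A named counter that registers itself so a dumper can find it."""

    def __init__(self, name, registry):
        self.name = name
        self.value = 0
        registry.append(self)  # assumed: each counter registers with the dumper

    def increment(self):
        self.value += 1


def dump_stats(registry):
    """Stand-in for the Stat Dumper: collect every counter in one pass."""
    return {stat.name: stat.value for stat in registry}
```

A module only calls `increment()` and never needs to know how or when its counters are collected.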

2007.05.14 Hasim22

2Dreams

2007.05.14 Hasim23

Support functions - events

[Figure: three Modules, each containing an Event Reg, all connected to a single Event Dumper]

module mkCache#(...) (Empty);
   ...
   cache_event <- mkEvent(...);
   ...
   hit = tagarray.lookup(...);
   cache_event.report(hit);
   ...
endmodule
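Event logging differs from statistics in that each report is recorded individually rather than summed. The sketch below assumes each report is stamped with a model cycle and that the dumper can be switched off. Both details, and all names used, are assumptions made for illustration.

```python
class EventDumper:
    """Collects event records from all modules; can be disabled."""

    def __init__(self):
        self.enabled = True
        self.records = []


class Event:
    """A named event source that reports values to a shared dumper."""

    def __init__(self, name, dumper):
        self.name = name
        self.dumper = dumper

    def report(self, model_cycle, value):
        # Record (cycle, source, value) only while event logging is enabled.
        if self.dumper.enabled:
            self.dumper.records.append((model_cycle, self.name, value))
```

Records of this shape, one per cycle per module, are the kind of trace a visualization tool could consume.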

2007.05.14 Hasim24

Support functions – global controller

[Figure: three Modules, each containing a Controller, all connected to a single Global Controller]

module mkCache#(...) (Empty);
   ...
   ctrl <- mkCntrlr(...);
   ...
   rule (ctrl.run())
      ...
   endrule
endmodule
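The guarded rule above only fires while the controller says the simulation is running. A rough software analogue, assuming each module holds a local controller that mirrors one global run flag, is sketched below. All class names are hypothetical.

```python
class GlobalController:
    """Single point of simulation control (start/stop)."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False


class LocalController:
    """Per-module view of the global run state."""

    def __init__(self, global_ctrl):
        self.global_ctrl = global_ctrl

    def run(self):
        return self.global_ctrl.running


class Cache:
    def __init__(self, ctrl):
        self.ctrl = ctrl
        self.cycles = 0

    def cycle(self):
        # Mirrors a rule guarded by ctrl.run(): do nothing unless enabled.
        if self.ctrl.run():
            self.cycles += 1
```

Stopping the global controller pauses every module that guards its work this way, without the modules referring to each other.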

2007.05.14 Hasim26

FPGA-based prototype

Prototyping Catch-22…

2007.05.14 Hasim27

Module Instantiation

[Figure: module instantiation diagram; labels as extracted: U / MC NC / M / C, with three D R X C WF groups]

2007.05.14 Hasim28

Factorial Coding/Experiments

[Figure: four model configurations, each with S / MC N, one per pairing of SC or RC with SM or RM: SC+SM, RC+SM, SC+RM, RC+RM]
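A factorial experiment builds one model for every combination of module alternatives. The sketch below enumerates those combinations with `itertools.product`. The labels SC, RC, SM, and RM come from the slide, while the slot names "C" and "M" and the function name are assumptions made for the example.

```python
from itertools import product


def factorial_configs(alternatives):
    """Return one configuration per combination of module alternatives.

    `alternatives` maps a module slot to the list of implementations
    that can fill it.
    """
    slots = sorted(alternatives)
    choices = product(*(alternatives[slot] for slot in slots))
    return [dict(zip(slots, choice)) for choice in choices]


# Slot names "C" and "M" are hypothetical; the labels are as on the slide.
configs = factorial_configs({"C": ["SC", "RC"], "M": ["SM", "RM"]})
```

With two alternatives in each of two slots this gives four configurations. Adding a third alternative to one slot would give six without any change to the other modules.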

2007.05.14 Hasim29

HAsim: Current status - models

• Simple RISC functional model operating
  – Simple RISC ISA
  – Pipelined multi-phase instruction execution
  – Supports speculative OOO design

• Physical Reg File and ROB

• Small physically addressed memory

• Fast speculative rewinds

• Instruction-per-cycle (APE) model
  – Runs simple benchmarks on FPGA

• Five stage pipeline
  – Supports branch mis-speculation
  – Runs simple benchmarks (in software simulation)

• X86 functional model architecture under development

2007.05.14 Hasim30

Connections Implement Ports

[Figure: PM (Module Tree w. Connections), with connection endpoints named foo, bar, and baz, mapped onto PM (Hardware Modules w. Wrappers), where endpoints with the same name are joined]

Implemented via connections.
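Matching endpoints by name can be sketched as a small registry that pairs each sender with the receiver of the same name and rejects anything left dangling. This is only an illustration of the idea. The `Connections` class and its methods are hypothetical and do not reflect how HAsim elaborates connections in hardware.

```python
class Connections:
    """Pairs send and receive endpoints that share a name."""

    def __init__(self):
        self.senders = {}
        self.receivers = {}

    def send(self, name, module):
        self.senders[name] = module

    def recv(self, name, module):
        self.receivers[name] = module

    def connect(self):
        # Every name must appear exactly once on each side.
        unmatched = set(self.senders) ^ set(self.receivers)
        if unmatched:
            raise ValueError(f"dangling connections: {sorted(unmatched)}")
        return sorted(
            (name, self.senders[name], self.receivers[name])
            for name in self.senders
        )
```

Because pairing is done by name, a module can be moved elsewhere in the hierarchy without rewiring its parents, as long as its endpoint names stay the same.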

2007.05.14 Hasim31

Timing Model Resources (Fast)

OOO, branch prediction, three functional units, 32KB 2-way set associative ICache and DCache, iTLB, dTLB

• 2142 slices (15% of a 2VP30)

• 21 block RAMs (15% of a 2VP30)

Configurable cache model

• 32KB 4-way set associative cache with 16B cache-lines
  – 165 slices (1% of a 2VP30)
  – 17 block RAMs (12% of a 2VP30)

• 2MB 4-way set-associative cache with 64B cache-lines
  – 140 slices (1% of a 2VP30)
  – 40 block RAMs (29% of a 2VP30)

Current FPGAs (4VFX140)

• 142,128 slices

• 552 block RAMs

• 2 PowerPCs
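The 2VP30 percentages quoted above can be checked with simple arithmetic. The sketch below assumes device totals of 13,696 slices and 136 block RAMs for the 2VP30. Those totals are taken from memory of the vendor data sheet, not from the slides, and the percentages are truncated rather than rounded because that is what reproduces the slide's figures. The set-count helper applies the standard formula for a set-associative cache.

```python
# Assumed XC2VP30 totals (from the vendor data sheet, not from the slides).
XC2VP30 = {"slices": 13696, "block_rams": 136}


def utilization(used, total):
    """Percentage of a resource used, truncated to a whole number."""
    return 100 * used // total


def cache_sets(size_bytes, line_bytes, ways):
    """Number of sets in a set-associative cache."""
    return size_bytes // (line_bytes * ways)


timing_model = (
    utilization(2142, XC2VP30["slices"]),
    utilization(21, XC2VP30["block_rams"]),
)
small_cache = (
    utilization(165, XC2VP30["slices"]),
    utilization(17, XC2VP30["block_rams"]),
)
large_cache = (
    utilization(140, XC2VP30["slices"]),
    utilization(40, XC2VP30["block_rams"]),
)
```

Under these assumptions the 32KB cache has 512 sets and the 2MB cache has 8192 sets. The larger cache grows mainly in block RAM use while its slice count stays about the same.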