Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of...

28
Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011 L07-1 http:// csg.csail.mit.edu/6.375

Transcript of Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of...

Page 1: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

Modeling Processors

Arvind Computer Science & Artificial Intelligence LabMassachusetts Institute of Technology

February 22, 2011 L07-1http://csg.csail.mit.edu/6.375

Page 2: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

Instruction settypedef enum {R0;R1;R2;…;R31} RName;

An instruction set can be implemented using many different microarchitectures

typedef union tagged { struct {RName dst; RName src1; RName src2;} Add; struct {RName condR; RName addrR;} Bz; struct {RName dst; RName addrR;} Load; struct {RName valueR; RName addrR;} Store} Instr deriving(Bits, Eq);

typedef Bit#(32) Iaddress;typedef Bit#(32) Daddress;typedef Bit#(32) Value;

February 22, 2011 L07-2http://csg.csail.mit.edu/6.375

Page 3: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

Deriving Bits

To store datatypes in register, fifo, etc. we need to know how to represent them as bits (pack) and interpret their bit representation (unpack)Deriving annotation automatically generates the “pack” and “unpack” operations on the type (simple concatenation of bit representations of components)It is possible to customize the pack/unpack operations to any specific desired representation

typedef struct { … } Foo deriving (Bits);

February 22, 2011 L07-3http://csg.csail.mit.edu/6.375

Page 4: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

Tagged Unions: Bit Representation

00 dst src1 src2

01 condR addrR

10 dst addrR

11 dst imm

typedef union tagged { struct {RName dst; RName src1; RName src2;} Add; struct {RName condR; RName addrR;} Bz; struct {RName dst; RName addrR;} Load; struct {RName dst; Immediate imm;} AddImm;} Instr deriving(Bits, Eq);

Automatically derived representation; can be customized by the user written pack and unpack functions

February 22, 2011 L07-4http://csg.csail.mit.edu/6.375

Page 5: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

The Plan

Non-pipelined processor

Two-stage Inelastic pipeline

Two-stage Elastic pipeline – next lecture

Some understanding of simple processor pipelines is needed to follow this lecture

February 22, 2011 L07-5http://csg.csail.mit.edu/6.375

Page 6: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

Non-pipelined Processor

fetch & execute

pc

iMem dMem

rfCPU

module mkCPU#(Mem iMem, Mem dMem)(); Reg#(Iaddress) pc <- mkReg(0); RegFile#(RName, Bit#(32)) rf <- mkRegFileFull();

Instr instr = iMem.read(pc); Iaddress predIa = pc + 1; rule fetch_Execute ...

endmodule

February 22, 2011 L07-6http://csg.csail.mit.edu/6.375

Page 7: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

Non-pipelined processor rulerule fetch_Execute (True); case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}: begin

rf.upd(rd, rf[ra]+rf[rb]); pc <= predIa

end tagged Bz {condR:.rc,addrR:.ra}: begin

pc <= (rf[rc]==0) ? rf[ra] : predIa; end tagged Load {dest:.rd,addrR:.ra}: begin rf.upd(rd, dMem.read(rf[ra])); pc <= predIa; end tagged Store {valueR:.rv,addrR:.ra}: begin dMem.write(rf[ra],rf[rv]); pc <= predIa; end endcaseendrule

my syntaxrf[r] rf.sub(r)

Assume “magic memory”, i.e. responds to a read request in the same cycle and a write updates the memory at the end of the cycle

Pattern matching

February 22, 2011 L07-7http://csg.csail.mit.edu/6.375

Page 8: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

Register FileHow many read ports? Two

How many write ports? One

Concurrency properties? Must be able to do two reads and a

write concurrently The values produced must be as if

reads precede the write (if any)

February 22, 2011 L07-8http://csg.csail.mit.edu/6.375

Page 9: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

The Plan

Non-pipelined processor

Two-stage Inelastic pipeline

Two-stage Elastic pipeline

February 22, 2011 L07-9http://csg.csail.mit.edu/6.375

Page 10: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

Two-stage InelasticPipeline

fetch & decode

execute

buReg

time t0 t1 t2 t3 t4 t5 t6 t7 . . . .FDstage FD1 FD2 FD3 FD4 FD5

EXstage EX1 EX2 EX3 EX4 EX5

Actions to be performed in parallel every cycle: Fetch Action: Decodes the instruction at the current pc

and fetches operands from the register file and stores the result in buReg

Execute Action: Performs the action specified in buReg and updates the processor state (pc, rf, dMem)

pc rf dMem

rule InelasticPipeline2(True);fetchAction; executeAction; endrule

February 22, 2011 L07-10http://csg.csail.mit.edu/6.375

Page 11: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

Instructions & Templates

typedef union tagged { struct {RName dst; Value op1; Value op2} EAdd; struct {Value cond; Iaddress tAddr} EBz; struct {RName dst; Daddress addr} ELoad; struct {Value val; Daddress addr} EStore;} InstTemplate deriving(Eq, Bits);

typedef union tagged { struct {RName dst; RName src1; RName src2} Add; struct {RName condR; RName addrR} Bz; struct {RName dst; RName addrR} Load; struct {RName valueR; RName addrR} Store;} Instr deriving(Bits, Eq);

buReg contains instruction templates, i.e., decoded instructions

February 22, 2011 L07-11http://csg.csail.mit.edu/6.375

Page 12: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

Fetch & Decode ActionFills the buReg with a decoded instruction

function InstrTemplate newIt(Instr instr); case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}: return EAdd{dst:rd,op1:rf[ra],op2:rf[rb]}; tagged Bz {condR:.rc,addrR:.addr}: return EBz{cond:rf[rc],tAddr:rf[addr]}; tagged Load {dst:.rd,addrR:.addr}: return ELoad{dst:rd,addrR:rf[addr]}; tagged Store{valueR:.v,addrR:.addr}: return EStore{val:rf[v],addr:rf[addr]}; endcase endfunction

buReg <= newIt(instr);

February 22, 2011 L07-12http://csg.csail.mit.edu/6.375

Page 13: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

Execute Action: Reads buReg and modifies state (rf,dMem,pc)

case (buReg) matches tagged EAdd{dst:.rd,op1:.va,op2:.vb}: begin rf.upd(rd, va+vb); pc <= predIa; end tagged ELoad{dst:.rd,addr:.av}: begin rf.upd(rd, dMem.read(av)); pc <= predIa; end tagged EStore{val:.vv,addr:.av}: begin dMem.write(av, vv); pc <= predIa; end tagged EBz {cond:.cv,tAddr:.av}: if (cv != 0) then pc <= predIa; else begin pc <= av; Invalidate buReg endendcase What does this mean?

February 22, 2011 L07-13http://csg.csail.mit.edu/6.375

Page 14: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

Issues with buRegfetch & decode

execute

buReg

buReg may not always contain an instruction. Why?

start cycle Execute stage may kill the fetched instructions

because of branch misprediction Maybe type to the rescue

Can’t update buReg in two concurrent actions

fetchAction; executeAction Fold them together

pc rf dMem

February 22, 2011 L07-14http://csg.csail.mit.edu/6.375

Page 15: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

InelasticPipeline first attempt fetch &

decodeexecute

pc rfCPU

buRegrule SyncTwoStage (True); let instr = iMem.read(pc); let predIa = pc+1;

Action fetchAction = action buReg <= Valid newIt(instr); pc <= predIa; endaction;

case (buReg) matches each instruction execution calls fetchAction or puts Invalid in buReg …

endcaseendcase endrule

February 22, 2011 L07-15http://csg.csail.mit.edu/6.375

Page 16: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

Execute

case (buReg) matches tagged Valid .it: case (it) matches tagged EAdd{dst:.rd,op1:.va,op2:.vb}: begin rf.upd(rd, va+vb); fetchAction; end tagged ELoad{dst:.rd,addr:.av}: begin rf.upd(rd, dMem.read(av)); fetchAction; end tagged EStore{val:.vv,addr:.av}: begin dMem.write(av, vv); fetchAction; end tagged EBz {cond:.cv,tAddr:.av}: if (cv != 0) then fetchAction; else begin pc <= av; buReg <= Invalid; end endcase tagged Invalid: fetchAction; endcase

fetch & decode

execute

pc rfCPU

buReg

Not quite correct!

February 22, 2011 L07-16http://csg.csail.mit.edu/6.375

Page 17: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

Pipeline Hazardsfetch & decode

execute

buReg

time t0 t1 t2 t3 t4 t5 t6 t7 . . . .FDstage FD1 FD2 FD3 FD4 FD5

EXstage EX1 EX2 EX3 EX4 EX5

I1 Add(R1,R2,R3)I2 Add(R4,R1,R2)

I2 must be stalled until I1 updates the register file

pc rf dMem

time t0 t1 t2 t3 t4 t5 t6 t7 . . . .FDstage FD1 FD2 FD2 FD3 FD4 FD5

EXstage EX1 EX2 EX3 EX4 EX5

February 22, 2011 L07-17http://csg.csail.mit.edu/6.375

Page 18: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

Stall condition

Suppose the fetched instruction needs to read register r and the instruction in buReg is going to write in r then the Fetch unit must stallA function to find register r in an instruction template it

fetch & decode

execute

pc rfCPU

buReg

function Bool findf (RName r, InstrTemplate it); case (it) matches tagged EAdd{dst:.rd,op1:.v1,op2:.v2}:

return (r == rd); tagged EBz {cond:.c,tAddr:.a}:

return (False); tagged ELoad{dst:.rd,addr:.a}:

return (r == rd); tagged EStore{val:.v,addr:.a}:

return (False); endcase endfunction

February 22, 2011 L07-18http://csg.csail.mit.edu/6.375

Page 19: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

The Stall FunctionDecides if instruction instr should stall given the state of the buReg

function Bool stallFunc (Instr instr, Maybe#(InstTemplate) mit); case (mit) matches tagged Invalid: return False; tagged Valid .it: case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}: return (findf(ra,it) || findf(rb,it)); tagged Bz {condR:.rc,addrR:.addr}: return (findf(rc,it) || findf(addr,it)); tagged Load {dst:.rd,addrR:.addr}: return (findf(addr,it)); tagged Store {valueR:.v,addrR:.addr}:

return (findf(v,it) || findf(addr,it)); endcaseendfunction

February 22, 2011 L07-19http://csg.csail.mit.edu/6.375

Page 20: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

InelasticPipeline corrected

rule SyncTwoStage (True); let instr = iMem.read(pc); let predIa = pc+1;

Action fetchAction = action if stallFunc(instr, buReg) then buReg <=Invalid else begin buReg <= Valid newIt(instr); pc <= predIa; end endaction;

case (buReg) matches The execute rule (no change) endcaseendcase endrule

fetch & decode

execute

pc rfCPU

buReg

February 22, 2011 L07-20http://csg.csail.mit.edu/6.375

Page 21: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

BypassingAfter decoding the newIt function must read the new register values if available (i.e., the values that are still to be committed in the register file)

We pass the value being written to decoding action (the newIt function)

February 22, 2011 L07-21http://csg.csail.mit.edu/6.375

Page 22: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

Generation of bypass register valuerule inelasticProcessor2 (True);case (buReg) matches tagged Valid .it: case (it) matches tagged EAdd{dst:.rd,op1:.va,op2:.vb}: begin rf.upd(rd, va+vb); fetchAction; end tagged ELoad{dst:.rd,addr:.av}: begin rf.upd(rd, dMem.read(av)); fetchAction; end tagged EStore{val:.vv,addr:.av}: begin dMem.write(av, vv); fetchAction; end tagged EBz {cond:.cv,tAddr:.av}: if (cv != 0) then fetchAction; else begin pc <= av; buReg <= Invalid; end endcase tagged Invalid: fetchAction; endcase endrule

bypassable values

February 22, 2011 L07-22http://csg.csail.mit.edu/6.375

Page 23: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

Bypassing values to Fetchcase (buReg) matches tagged Valid .it: case (it) matches tagged EAdd{dst:.rd,op1:.va,op2:.vb}: begin rf.upd(rd, va+vb); fetchAction(Valid rd, va+vb); end tagged ELoad{dst:.rd,addr:.av}: begin rf.upd(rd, dMem.read(av)); fetchAction(Valid rd, dMem.read(av)); end tagged EStore{val:.vv,addr:.av}: begin dMem.write(av, vv); fetchAction(Invalid, ?); end tagged EBz {cond:.cv,tAddr:.av}: if (cv != 0) then fetchAction(Invalid, ?); else begin pc <= av; buReg <= Invalid; end endcase tagged Invalid: fetchAction(Invalid, ?); endcaseFebruary 22, 2011 L07-23http://csg.csail.mit.edu/6.375

Page 24: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

New fetchAction

function Action fetchAction(Maybe#(RName) mrd, Value val);

action if stallFunc(instr, buReg) then buReg <= Invalid; else begin buReg <= Valid newIt(mrd, val, instr); pc <= predIa; end endactionendfunction

February 22, 2011 L07-24http://csg.csail.mit.edu/6.375

Page 25: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

Updated newItfunction InstrTemplate newIt(Maybe#(RName) mrd,

Value val, Instr instr); let nrf(a)=(Valid a == mrd) ? val: rf.sub(a); case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}: return EAdd{dst:rd,op1:nrf(ra),op2:nrf(rb)}; tagged Bz {condR:.rc,addrR:.addr}: return EBz{cond:nrf(rc),tAddr:nrf(addr)}; tagged Load {dst:.rd,addrR:.addr}: return ELoad{dst:rd,addr:nrf(addr)}; tagged Store{valueR:.v,addrR:.addr}: return EStore{val:nrf(v),addr:nrf(addr)}; endcase endfunction

February 22, 2011 L07-25http://csg.csail.mit.edu/6.375

Page 26: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

Bypassing

Now that we’ve correctly bypassed data, we do not have to stall as often

The current stall function is correct, but inefficient Should not stall if the value is now

being bypassed

February 22, 2011 L07-26http://csg.csail.mit.edu/6.375

Page 27: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

The stall function for the Inelastic pipelinefunction Bool newStallFunc (Instr instr,

Reg#(Maybe#(InstTemplate)) buReg); case (buReg) matches tagged Invalid: return False; tagged Valid .it: case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}: return (findf(ra,it) || findf(rb,it)); …

Previously we stalled when ra matched the destination register of the instruction in the execute stage. Now we bypass that information when we read, so no stall is necessary.

return (false);

February 22, 2011 L07-27http://csg.csail.mit.edu/6.375

Page 28: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1 .

Inelastic Pipelines

Notoriously difficult to get right Imagine the cases to be analyzed if it was a

five stage pipeline

Difficult to refine for better clock timing

elastic pipelines

February 22, 2011 L07-28http://csg.csail.mit.edu/6.375