Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab.

23
Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology January 19, 2012 L9-1 http:// csg.csail.mit.edu/SNU

description

Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology. Three-Stage SMIPS. Register File. Epoch. stall?. PC. Execute. Decode. wbr. fr. +4. Data Memory. Inst Memory. - PowerPoint PPT Presentation

Transcript of Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab.

Page 1: Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab.

Realistic Memories and Caches

ArvindComputer Science & Artificial Intelligence Lab.Massachusetts Institute of Technology

January 19, 2012 L9-1http://csg.csail.mit.edu/SNU

Page 2: Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab.

Three-Stage SMIPS

PC

InstMemory

Decode

Register File

Execute

DataMemory

+4fr

Epoch

wbr

stall?

The use of magic memories makes

this design unrealistic

January 19, 2012 L9-2http://csg.csail.mit.edu/SNU

Page 3: Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab.

A Simple Memory Model

Reads and writes are always completed in one cycle

a Read can be done any time (i.e. combinational) If enabled, a Write is performed at the rising clock

edge(the write address and data must be stable at the clock edge)

MAGIC RAM

ReadData

WriteData

Address

WriteEnableClock

In a real DRAM the data will be available several cycles after the address is supplied

January 19, 2012 L9-3http://csg.csail.mit.edu/SNU

Page 4: Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab.

Memory Hierarchy

size: RegFile << SRAM << DRAM why? latency: RegFile << SRAM << DRAM why? bandwidth: on-chip >> off-chip why?

On a data access:hit (data Î fast memory) low latency accessmiss (data Ï fast memory) long latency access (DRAM)

Small,Fast Memory

SRAM

CPURegFile

Big, Slow MemoryDRAM

A B

holds frequently used data

January 19, 2012 L9-4http://csg.csail.mit.edu/SNU

Page 5: Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab.

Plan

What do simple caches look like

Incorporating caches in processor pipleline

January 19, 2012 L9-5http://csg.csail.mit.edu/SNU

Page 6: Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab.

Data Cache - Interface

interface DCache; method Action req(MemReq r); method ActionValue#(MemResp) resp;

method ActionValue#(MemReq) memReq; method Action memResp(MemResp r);endinterface

cache

req

guardresp

guard

memReq

guardmemResp

guard

Processor DRAM

hitQ

mReqQ

mRespQ

missReq

January 19, 2012 L9-6http://csg.csail.mit.edu/SNU

Page 7: Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab.

Direct-Mapped Cache

Tag Data Block V

=

Offset Tag Index

t k b

t

HIT Data Word or Byte

2k

lines

Block number Block offset

What is a bad reference pattern? Strided at size of cache

req address

January 19, 2012 L9-7http://csg.csail.mit.edu/SNU

Page 8: Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab.

Data Cache – code structuremodule mkDCache(DCache); ---state declarations; Vector#(Rows, Reg#(Bool)) vArray <-

replicateM(mkReg(False));…

rule doMiss … endrule; method Action req(MemReq r) … endmethod; method ActionValue#(MemResp) resp … endmethod; method ActionValue#(MemReq) memReq … endmethod; method Action memResp(MemResp r) … endmethod;endmodule

January 19, 2012 L9-8http://csg.csail.mit.edu/SNU

Page 9: Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab.

Data Cache state declarations

Vector#(Rows, Reg#(Bool)) vArray <- replicateM(mkReg(False));

Vector#(Rows, Reg#(Tag)) tagArray <- replicateM(mkRegU);

Vector#(Rows, Reg#(Data)) dataArray <- replicateM(mkRegU);

FIFOF#(MemReq) mReqQ <- mkUGFIFOF1; FIFOF#(MemResp) mRespQ <- mkUGFIFOF1;

PipeReg#(MemReq) hitQ <- mkPipeReg; Reg#(MemReq) missReq <- mkRegU; Reg#(Bit#(2)) status <- mkReg(0);

January 19, 2012 L9-9http://csg.csail.mit.edu/SNU

Page 10: Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab.

Data Cache processor-side methods method Action req(MemReq req) if (status==0); Index idx = truncate(req.addr>>2); Tag tag = truncateLSB(req.addr); Bool valid = vArray[idx]; Bool tagMatch = tagArray[idx]==tag; if(valid && tagMatch && hitQ.notFull) hitQ.enq(req); else begin missReq <= req; status <= 1; end endmethod

method ActionValue#(MemResp) resp if(hitQ.notEmpty && status==0); hitQ.deq; let r = hitQ.first; Index idx = truncate(r.addr>>2); if(r.op==St) dataArray[idx] <= r.data; return dataArray[idx]; endmethod

January 19, 2012 L9-10http://csg.csail.mit.edu/SNU

Page 11: Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab.

Data Cache memory-side methods method ActionValue#(MemReq) memReq if (mReqQ.notEmpty); mReqQ.deq; return mReqQ.first; endmethod

method Action memResp(MemResp res) if (mRespQ.notFull); mRespQ.enq(res); endmethod

January 19, 2012 L9-11http://csg.csail.mit.edu/SNU

Page 12: Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab.

Data CacheRule to process a cache-missrule doMiss (status!=0); Index idx = truncate(missReq.addr>>2); if(status==1 && mReqQ.notFull) begin if(vArray[idx]) mReqQ.enq( MemReq{op:St, addr:{tagArray[idx],idx,2'b00}, data:dataArray[idx]}); status <= 2; end

if(status==2 && mReqQ.notFull && (!vArray[idx] || mRespQ.notEmpty)) begin if(vArray[idx]) mRespQ.deq; mReqQ.enq(MemReq{op:Ld, addr:missReq.addr, data:?}); status <= 3; end

January 19, 2012 L9-12http://csg.csail.mit.edu/SNU

Page 13: Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab.

Data CacheRule to process a cache-miss rule doMiss (status!=0); … if(status==3 && mRespQ.notEmpty && hitQ.notFull) begin let data = mRespQ.first; mRespQ.deq;

Tag tag = truncateLSB(missReq.addr); vArray[idx] <= True; tagArray[idx] <= tag; dataArray[idx] <= data;

hitQ.enq(missReq); status <= 0; end endrule

January 19, 2012 L9-13http://csg.csail.mit.edu/SNU

Page 14: Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab.

Five-Stage SMIPS

PC

InstMemory

Decode

Register File

Execute

DataMemory

+4fr

Epoch

wbr

stall?

dr er

In this organization memory can take any

amount of time

January 19, 2012 L9-14http://csg.csail.mit.edu/SNU

Page 15: Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab.

Five-Stage SMIPS state elementsmodule mkProc(Proc); Reg#(Addr) pc <- mkRegU; Reg#(Bool) epoch <- mkRegU; RFile rf <- mkRFile;

Memory mem <- mkTwoPortedMemory; let iMem = mem.iport; let dMem = mem.dport;

PipeReg#(FBundle) fr <- mkPipeReg; PipeReg#(DBundle) dr <- mkPipeReg; PipeReg#(EBundle) er <- mkPipeReg; PipeReg#(WBBundle) wbr <- mkPipeReg; rule doProc; …

January 19, 2012 L9-15http://csg.csail.mit.edu/SNU

Page 16: Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab.

Five-Stage SMIPSinstruction fetch rule doProc; Bool iAcc = False; if(fr.notFull && iMem.notFull) begin iMem.req(MemReq{op:Ld, addr:pc, data:?}); iAcc = True; fr.enq(FBundle{pc:pc, epoch:epoch}); end

enque instruction fetch request

if the request can not be enqued then we must remember not to change the pc to pc+4 (iAcc)

January 19, 2012 L9-16http://csg.csail.mit.edu/SNU

Page 17: Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab.

if(fr.notEmpty && dr.notFull && iMem.notEmpty) begin let dInst = decode(iMem.resp); dr.enq(DBundle{pc:fr.first.pc, epoch:fr.first.epoch, dInst:dInst}); fr.deq; iMem.deq;end

Five-Stage SMIPSdecode

decode the fetched instruction

January 19, 2012 L9-17http://csg.csail.mit.edu/SNU

Page 18: Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab.

Addr redirPc = ?; Bool redirPCvalid = False;if(dr.notEmpty && er.notFull && (!memType(dr.first.dInst.iType) || dMem.notFull)) begin if(fr.first.epoch==epoch) begin Bool eStall = … Bool wbStall = … if(!eStall && !wbStall) begin let eInst = exec(dInst, …); if(memType(eInst.iType)) dMem.req(…); if(eInst.brTaken) begin redirPC … end; er.enq(EBundle{…}; dr.deq; end end else dr.deq; end

Five-Stage SMIPScode structure for killing wrongly fetched instructions

kill

successful

January 19, 2012 L9-18http://csg.csail.mit.edu/SNU

Page 19: Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab.

Addr redirPc = ?; Bool redirPCvalid = False;if(dr.notEmpty && er.notFull && (!memType(dr.first.dInst.iType) || dMem.notFull)) begin if(fr.first.epoch==epoch) begin let dInst = dr.first.dInst;

Bool eStall = er.notEmpty && er.first.rDstValid && ((dInst.rSrc1Valid && dInst.rSrc1==er.first.rDst) || (dInst.rSrc2Valid && dInst.rSrc2==er.first.rDst));

Bool wbStall = wbr.notEmpty && wbr.first.rDstValid && ((dInst.rSrc1Valid && dInst.rSrc1==wbr.first.rDst) || (dInst.rSrc2Valid && dInst.rSrc2==wbr.first.rDst));

Five-Stage SMIPSstall signal

January 19, 2012 L9-19http://csg.csail.mit.edu/SNU

Page 20: Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab.

if(!eStall && !wbStall) begin Data rVal1 = rf.rd1(dInst.rSrc1); Data rVal2 = rf.rd2(dInst.rSrc2); let eInst = exec(dInst, rVal1, rVal2, dr.first.pc); if(memType(eInst.iType)) dMem.req(MemReq{op:eInst.iType==Ld ? Ld : St, addr:eInst.addr, data:eInst.data}); if(eInst.brTaken) begin redirPC = eInst.addr; redirPCvalid = True; end er.enq(EBundle{iType:eInst.iType, rDst:eInst.rDst, data:eInst.data}); dr.deq; end end else dr.deq; end

Five-Stage SMIPSExecute if not stall

successful execution

January 19, 2012 L9-20http://csg.csail.mit.edu/SNU

Page 21: Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab.

if(er.notEmpty && wbr.notFull && (!memType(er.first.iType) || dMem.notEmpty)) begin wbr.enq(WBBundle{iType:er.first.iType, rDst:er.first.rDst, data:er.first.iType==Ld ? dMem.resp : er.first.data}); er.deq; if(dMem.notEmpty) dMem.deq; end

Five-Stage SMIPSexecute and memory responses to WB

January 19, 2012 L9-21http://csg.csail.mit.edu/SNU

Page 22: Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab.

if(wbr.notEmpty) begin if(regWriteType(wbr.first.iType)) rf.wr(wbr.first.rDst, wbr.first.data); wbr.deq; end

pc <= redirPCvalid ? redirPC : iAcc ? pc + 4 : pc; epoch <= redirPCvalid ? !epoch : epoch; endruleendmodule

Five-Stage SMIPSwriteback

January 19, 2012 L9-22http://csg.csail.mit.edu/SNU

Page 23: Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab.

SummaryLot of room for making errors verification and testing is essential

Memory systems or dealing with load latencies is a major aspect of computer design The 5-stage design presented here is

different from H&P and is much more realistic

next Branch prediction

January 19, 2012 L9-23http://csg.csail.mit.edu/SNU