CS152 Discussion Section
Transcript of CS152 Discussion Section
![Page 1: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/1.jpg)
CS152 Discussion Section
3/8/19
![Page 2: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/2.jpg)
Administrivia
● Midterm grading moving along on schedule
● Today’s discussion will introduce Lab 3
○ Won’t officially be assigned until after Lab 2!
○ Due April 8th (one month from today)
○ Relies on local simulation like Lab 1
○ Instead of talking about the actual lab procedure, we’ll take a higher-level look at what is actually being simulated
![Page 3: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/3.jpg)
Lab 3 Goals
● Take an out-of-order core microarchitecture that has many parameters
● Vary those parameters and see how the performance is affected
○ Branch predictor
○ ROB size
○ Number of physical registers
○ Et cetera
![Page 4: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/4.jpg)
The Berkeley Out-of-Order Machine (BOOM)
![Page 5: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/5.jpg)
BOOM Pipeline OverviewDefaultBoomConfig
![Page 6: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/6.jpg)
Type of Machine
Data in ROBUnified Physical Register File
BOOM
![Page 7: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/7.jpg)
Out-of-Order Design Space
![Page 8: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/8.jpg)
▪
▪
▪
![Page 9: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/9.jpg)
ld x1, (x3)addi x3, x1, #4sub x6, x7, x9add x3, x3, x6ld x6, (x1)add x6, x6, x3sd x6, (x1)ld x6, (x11)
ld P1, (Px)addi P2, P1, #4sub P3, Py, Pzadd P4, P2, P3ld P5, (P1)add P6, P5, P4sd P6, (P1)ld P7, (Pw)
![Page 10: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/10.jpg)
op p1 PR1 p2 PR2ex
use Rd PRdLPRdROB
ld x1, 0(x3)addi x3, x1, #4sub x6, x7, x6add x3, x3, x6ld x6, 0(x1)
Free ListP0P1P3P2P4
<x6>P5 <x7>P6 <x3>P7
P0
Pn
P1P2P3P4
Physical Regs
ppp
<x1>P8 p
x ld p P7 x1 P0
x5 P5x6 P6x7
x0 P8x1x2 P7x3x4
Rename Table
P0
P8
![Page 11: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/11.jpg)
![Page 12: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/12.jpg)
I-cacheFetch BufferIssueBufferFunc.Units
Arch.State
Execute
Decode
ResultBuffer Commit
PCFetch
Branchexecuted
Next fetch started
![Page 13: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/13.jpg)
▪
•
•
▪
•
••
![Page 14: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/14.jpg)
![Page 15: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/15.jpg)
![Page 16: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/16.jpg)
▪
▪
![Page 17: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/17.jpg)
▪▪
1.
2.
![Page 18: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/18.jpg)
![Page 19: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/19.jpg)
![Page 20: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/20.jpg)
![Page 21: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/21.jpg)
![Page 22: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/22.jpg)
▪▪▪▪
Fetch Decode
Execute
CommitReorder Buffer
Kill
Kill KillPC
Inject correct PCBranchPrediction
BranchResolution
Complete
![Page 23: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/23.jpg)
▪
▪
![Page 24: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/24.jpg)
More BOOM Detail
![Page 25: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/25.jpg)
Frontend OverviewRVC Memory
Interpretation Here
RVC Expansion Here
![Page 26: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/26.jpg)
Frontend Description
● Cycle 0○ Get the PC to pass into the BTB/BPD and I$○ Hash the GHR outside the “backing predictor”
● Cycle 1○ I$ is getting a line○ BTB makes a prediction on the PC○ 1st stage of “backing predictor”
■ Ex. GShare accesses counter tables
● Cycle 2○ 2nd stage of “backing predictor”
■ Ex. GShare puts counter output into queue
● Cycle 3○ BR decode I$ result (not “true” decode)○ BR check to update BTB and redirect○ Enqueue the Fetch Buffer and Fetch Target
Queue with instructions and BR info
● Fetch Buffer○ Connects Front-end to Back-end○ Passes Fetch Packet of instructions to the
Decode stage
● Fetch Target Queue○ Previously called the Branch Reorder Buffer
○ Holds branch prediction information used to update the BPD on commit, mispredictions, speculation
○ Dequeued on commit once all instructions in an entry are committed
● RVC in Cycle 3○ Puts a full Fetch Packet into the Fetch Buffer
depending on where instruction is aligned○ Used in Decoder to “expand” instructions
![Page 27: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/27.jpg)
Branch Predictor Pipeline
● GHR management is kept outside the “abstract” predictor○ Keeps track of global branch state○ Keep “snapshot” or copy of GHR per “fetch packet”
● Abstract Predictor○ TAGE○ GShare○ BaseOnly
■ Only use BIM from BTB to predict○ Random
■ Using LSFR to create random predictions○ Null predictor
■ Don’t predict anything
![Page 28: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/28.jpg)
GShare Example
![Page 29: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/29.jpg)
Backend
![Page 30: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/30.jpg)
Decode and Rename
● Simple Decode○ Thanks RISC-V!
○ RVC instructions are expanded here to go through the normal decode
● Rename○ Split into FP and INT Rename
■ Both are coupled by the Decode Width○ Map Table - maps logical registers (x0,...x31) to physical registers
■ Otherwise known as the “rename table”○ Busy Table - marks physical registers as “busy” until operands are ready○ Free List - running list of physical registers that are free
● Complete Map table snapshots are kept for each BR instruction in the ROB○ Can recover in a single cycle
![Page 31: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/31.jpg)
Dispatch and Issue
● Multiple Issue Queues for each type of micro-operation (uop)○ Memory
○ Integer (ALU)
○ Floating Point
○ Note: Now you can have a monolithic Memory + Integer issue queue
● Once operands are ready, then the uop is issued (this means reading the necessary registers and moving to the execution stage)
● Wakeup (ready) ports coming from the Exe Units are automatically wired to the issue queues
![Page 32: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/32.jpg)
ROB
● [What to add about the ROB
here?] Fairly basic...
![Page 33: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/33.jpg)
Execution Units● Execution Unit
○ Top level execution unit that shares read port and write ports
○ Can be composed of multiple “smaller” functional units
○ May reserve a dedicated RF write port, or share the long-latency writeback with memory
● Functional Unit○ Wraps external units into pipelined functional
units○ Auto-generates pipeline regs for uop metadata○ Auto-generates misspeculation kill logic Functional Unit Hierarchy
![Page 34: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/34.jpg)
2R2W IRF3R2W FRF
4R2W IRF3R2W FRF
8R4W IRF6R3W FRF
![Page 35: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/35.jpg)
Execution Units
Pipelined Functional Unit
![Page 36: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/36.jpg)
MemorySystem
![Page 37: CS152 Discussion Section](https://reader036.fdocuments.in/reader036/viewer/2022071604/62d0a4f17be1cb5c306f2d13/html5/thumbnails/37.jpg)
Memory System
● SAQ○ Holds store addresses○ Content Addressable Memory (CAM) to associatively search for conflicts○ Incoming loads must search for dependent stores
● SDQ○ Hold store data
● LAQ○ Holds load addresses○ CAM for conflict detection○ Incoming loads must search for earlier loads
● D$ Shim○ Wraps the underlying rocketchip L1 cache
● Rocket non-blocking L1 cache