COMP25212 CPU Multi Threading
• Learning Outcomes: to be able to:
– Describe the motivation for multithread support in CPU hardware
– Distinguish the benefits and implementations of coarse-grain, fine-grain and simultaneous multithreading
– Explain when multithreading is inappropriate
– Describe a multithreading implementation
– Estimate the performance of these implementations
– State the important assumptions of this performance model
Revision: Increasing CPU Performance
[Figure: pipeline of Fetch Logic, Decode Logic, Exec Logic, Mem Logic and Write Logic, with Inst Cache, Data Cache and Clock; labels a to f mark parts of the diagram.]
How can throughput be increased?
Increasing CPU Performance
a) By increasing clock frequency
b) By increasing Instructions per Clock
c) Minimising memory access impact – data cache
d) Maximising Inst issue rate – branch prediction
e) Maximising Inst issue rate – superscalar
f) Maximising pipeline utilisation – avoid instruction dependencies – out of order execution
g) (What does lengthening pipeline do?)
Increasing Program Parallelism
– Keep issuing instructions after branch?
– Keep processing instructions after cache miss?
– Process instructions in parallel?
– Write register while previous write pending?
• Where can we find additional independent instructions?
– In a different program!
Revision – Process States
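States: New; Ready (waiting for a CPU); Running on a CPU; Blocked (waiting for event); Terminated
Transitions:
• Ready → Running: dispatch (scheduler)
• Running → Blocked: needs to wait (e.g. I/O)
• Blocked → Ready: I/O occurs
• Running → Ready: pre-empted (e.g. timer)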
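The state diagram can be read as a small table-driven state machine. The sketch below is illustrative only: the event names are invented for this example, and the New → Ready ("admit") and Running → Terminated ("exit") edges are assumed, since the diagram does not label them.

```python
from enum import Enum


class State(Enum):
    NEW = "new"
    READY = "ready"            # waiting for a CPU
    RUNNING = "running"        # running on a CPU
    BLOCKED = "blocked"        # waiting for an event
    TERMINATED = "terminated"


# (current state, event) -> next state.
# "admit" and "exit" are assumed edges; the other four carry the slide's labels.
TRANSITIONS = {
    (State.NEW, "admit"): State.READY,
    (State.READY, "dispatch"): State.RUNNING,     # dispatch (scheduler)
    (State.RUNNING, "wait"): State.BLOCKED,       # needs to wait (e.g. I/O)
    (State.BLOCKED, "io_occurs"): State.READY,    # I/O occurs
    (State.RUNNING, "preempt"): State.READY,      # pre-empted (e.g. timer)
    (State.RUNNING, "exit"): State.TERMINATED,
}


def step(state, event):
    """Return the next state, or raise ValueError for an illegal transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} from {state.name}") from None


def run(events, state=State.NEW):
    """Apply a sequence of events, starting from the given state."""
    for event in events:
        state = step(state, event)
    return state


if __name__ == "__main__":
    life = ["admit", "dispatch", "wait", "io_occurs", "dispatch", "exit"]
    print(run(life).name)
```

Note that a Blocked process cannot be dispatched directly: it must first return to Ready when its event occurs.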
Revision – Process Control Block
• Process ID
• Process State
• PC
• Stack Pointer
• General Registers
• Memory Management Info
• Open File List, with positions
• Network Connections
• CPU time used
• Parent Process ID
Revision: CPU Switch
[Figure: timelines for Process P0, Process P1 and the Operating System, with these steps at each switch:]
Save state into PCB0
Load state from PCB1
Save state into PCB0
Load state from PCB1
What does CPU load on dispatch?
• Process ID
• Process State
• PC
• Stack Pointer
• General Registers
• Memory Management Info
• Open File List, with positions
• Network Connections
• CPU time used
• Parent Process ID
What does CPU need to store on deschedule?
• Process ID
• Process State
• PC
• Stack Pointer
• General Registers
• Memory Management Info
• Open File List, with positions
• Network Connections
• CPU time used
• Parent Process ID
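A minimal sketch of the "save state into PCB / load state from PCB" steps, assuming that the CPU-resident state is the PC, the stack pointer and the general registers. The class names, the register count of 8 and the state strings are hypothetical choices for this example; memory management info is left out for brevity, and the remaining PCB fields are assumed to stay with the OS and never be copied into the CPU.

```python
from dataclasses import dataclass, field

NUM_GPRS = 8  # hypothetical register count


@dataclass
class PCB:
    pid: int
    state: str = "ready"
    pc: int = 0
    sp: int = 0
    gprs: list = field(default_factory=lambda: [0] * NUM_GPRS)


@dataclass
class CPU:
    pc: int = 0
    sp: int = 0
    gprs: list = field(default_factory=lambda: [0] * NUM_GPRS)


def context_switch(cpu, old, new):
    """Deschedule `old` and dispatch `new` on `cpu`."""
    # Save state into the outgoing process's PCB.
    old.pc, old.sp, old.gprs = cpu.pc, cpu.sp, list(cpu.gprs)
    old.state = "ready"
    # Load state from the incoming process's PCB.
    cpu.pc, cpu.sp, cpu.gprs = new.pc, new.sp, list(new.gprs)
    new.state = "running"


if __name__ == "__main__":
    cpu = CPU(pc=100, sp=9000)
    p0 = PCB(pid=0, state="running")
    p1 = PCB(pid=1, pc=500, sp=7000)
    context_switch(cpu, p0, p1)   # P0 -> P1
    print(cpu.pc, p0.pc)          # 500 100
    context_switch(cpu, p1, p0)   # P1 -> P0
    print(cpu.pc, p1.pc)          # 100 500
```

Every copy in `context_switch` is work done by the operating system in software, which is the cost that hardware thread support aims to avoid.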
CPU Support for Multithreading
[Figure: the pipeline (Fetch, Decode, Exec, Mem and Write Logic, with Inst Cache and Data Cache) extended with per-thread state: PC A and PC B, GPRs A and GPRs B, and VA Mapping A and VA Mapping B feeding Address Translation.]
How Should OS View Extra Hardware Thread?
• A variety of solutions
• Simplest is probably to declare an extra CPU
• Need multiprocessor-aware OS
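One way to picture the extra hardware is as a second copy of the per-thread state (PC, GPRs, VA mapping) selected by an index, so that switching threads copies nothing. The sketch below is a toy model under that assumption; the class and method names, the 8 registers and the fixed 4-byte instruction size are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HardwareContext:
    """Per-thread state held in the CPU itself."""
    pc: int = 0
    gprs: list = field(default_factory=lambda: [0] * 8)
    va_mapping: dict = field(default_factory=dict)  # virtual page -> physical frame


class MultithreadedCore:
    def __init__(self, n_contexts=2):
        # Context 0 plays the role of thread A, context 1 of thread B.
        self.contexts = [HardwareContext() for _ in range(n_contexts)]
        self.active = 0

    def switch_thread(self):
        """Select the next hardware context; no state is saved or loaded."""
        self.active = (self.active + 1) % len(self.contexts)
        return self.active

    def fetch_address(self):
        """Return the active thread's PC and advance it by one instruction."""
        ctx = self.contexts[self.active]
        addr = ctx.pc
        ctx.pc += 4  # assumed fixed 4-byte instructions
        return addr

    def logical_cpus(self):
        """What the OS would see if each context is declared as a CPU."""
        return len(self.contexts)


if __name__ == "__main__":
    core = MultithreadedCore()
    core.contexts[0].pc = 0x100
    core.contexts[1].pc = 0x200
    print(hex(core.fetch_address()))  # 0x100 (thread A)
    core.switch_thread()
    print(hex(core.fetch_address()))  # 0x200 (thread B)
    core.switch_thread()
    print(hex(core.fetch_address()))  # 0x104 (thread A resumes)
```

Because each context keeps its own PC, thread A resumes exactly where it left off without any involvement from the OS.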
CPU Support for Multithreading
[Figure: the same pipeline with per-thread PC A/PC B, GPRs A/GPRs B and VA Mapping A/VA Mapping B.]
Design issue: when to switch threads
Coarse-Grain Multithreading
• Switch Thread on “expensive” operation:
– E.g. I-cache miss
– E.g. D-cache miss
• Some are easier than others!
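Switch Threads on Icache miss
Cycle     1    2    3         4    5    6    7
Inst a    IF   ID   EX        MEM  WB
Inst b         IF   ID        EX   MEM  WB
Inst c              IF MISS   -    -    -    -
Inst X                        IF   ID   EX   MEM
Inst Y                             IF   ID   EX
Inst Z                                  IF   ID
(Without the miss, Inst c would continue ID EX MEM WB and Inst d, e, f would follow in cycles 4, 5 and 6. On the miss, Inst X, Y, Z from the other thread are fetched instead.)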
Performance of Coarse Grain
• Assume (conservatively)
– 1 GHz clock (1 ns clock tick!), 20 ns memory ( = 20 clocks)
– 1 i-cache miss per 100 instructions
– 1 instruction per clock otherwise
• Then, time to execute 100 instructions without multithreading
– 100 + 20 clock cycles
– Inst per Clock = 100 / 120 = 0.83
• With multithreading: time to exec 100 instructions:
– 100 [+ 1]
– Inst per Clock = 100 / 101 = 0.99
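The arithmetic above can be checked with a few lines of Python. This is a sketch of the slide's model only; the function and parameter names are made up for this example. It also relies on assumptions the model leaves implicit, for instance that another thread is always ready to run and that a thread switch costs a single cycle.

```python
def ipc_without_multithreading(instructions=100, misses=1, miss_penalty=20):
    """Each i-cache miss stalls the pipeline for the full memory latency."""
    cycles = instructions + misses * miss_penalty
    return instructions / cycles


def ipc_coarse_grain(instructions=100, misses=1, switch_penalty=1):
    """Each i-cache miss costs only the thread-switch penalty,
    assuming another thread always has instructions ready."""
    cycles = instructions + misses * switch_penalty
    return instructions / cycles


if __name__ == "__main__":
    print(f"{ipc_without_multithreading():.2f}")  # 0.83
    print(f"{ipc_coarse_grain():.2f}")            # 0.99
```

Changing `miss_penalty` or `misses` shows how quickly the single-threaded figure degrades as memory gets relatively slower or misses become more frequent.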
Switch Threads on Dcache miss
Cycle     1    2    3    4       5     6     7
Inst a    IF   ID   EX   M-Miss  MISS  MISS  MISS
Inst b         IF   ID   EX      -     -     -
Inst c              IF   ID      -     -     -
Inst d                   IF      -     -     -
Inst X                           IF    ID    EX
Inst Y                                 IF    ID
(Inst b, c and d: abort these. Inst e and f are not fetched; Inst X and Y from the other thread are fetched instead.)
Performance: similar calculation (STATE ASSUMPTIONS!)
Where to restart after memory cycle? I suggest instruction “a” – why?
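A sketch of the "similar calculation" for D-cache misses follows. Every number specific to the D-cache case is an assumption made for this example and is not taken from the slides: 2 d-cache misses per 100 instructions, and 4 issue slots lost per switch (the missing instruction plus the three instructions fetched behind it, all of which are refetched later). The 20-clock memory latency and the base rate of 1 instruction per clock are reused from the coarse-grain performance model.

```python
def ipc(instructions, misses, cycles_lost_per_miss):
    """Instructions per clock when each miss wastes a fixed number of cycles."""
    return instructions / (instructions + misses * cycles_lost_per_miss)


MEMORY_LATENCY = 20        # clocks, from the coarse-grain performance model
DCACHE_MISSES_PER_100 = 2  # ASSUMPTION for this example
SWITCH_PENALTY = 4         # ASSUMPTION: slots for a, b, c, d are wasted

# Without multithreading, each miss stalls for the full memory latency.
no_mt = ipc(100, DCACHE_MISSES_PER_100, MEMORY_LATENCY)   # 100 / 140

# With coarse-grain multithreading, each miss costs the switch penalty,
# assuming another thread always has instructions ready to issue.
with_mt = ipc(100, DCACHE_MISSES_PER_100, SWITCH_PENALTY)  # 100 / 108

if __name__ == "__main__":
    print(f"without multithreading: {no_mt:.2f}")
    print(f"with multithreading:    {with_mt:.2f}")
```

The switch penalty is larger here than for an I-cache miss because the miss is only detected in the MEM stage, after later instructions have already entered the pipeline.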