COMP25212 CPU Multi Threading


Page 1: COMP25212  CPU Multi Threading

COMP25212 CPU Multi Threading

• Learning Outcomes: to be able to:

– Describe the motivation for multithreading support in CPU hardware

– Distinguish the benefits and implementations of coarse-grain, fine-grain and simultaneous multithreading

– Explain when multithreading is inappropriate

– Describe a multithreading implementation

– Estimate the performance of these implementations

– State the important assumptions of this performance model

Page 2: COMP25212  CPU Multi Threading

Revision: Increasing CPU Performance

[Slide diagram: five-stage pipeline – Fetch, Decode, Execute, Memory, Write – with an instruction cache, a data cache and a clock; labels a–f mark the techniques listed on the next slide.]

How can throughput be increased?

Page 3: COMP25212  CPU Multi Threading

Increasing CPU Performance

a) By increasing clock frequency

b) By increasing Instructions per Clock

c) Minimising memory access impact – data cache

d) Maximising inst issue rate – branch prediction

e) Maximising inst issue rate – superscalar

f) Maximising pipeline utilisation – avoid instruction dependencies – out-of-order execution

g) (What does lengthening the pipeline do?)

Page 4: COMP25212  CPU Multi Threading

Increasing Program Parallelism

– Keep issuing instructions after a branch?
– Keep processing instructions after a cache miss?
– Process instructions in parallel?
– Write a register while a previous write is pending?

• Where can we find additional independent instructions?
– In a different program!

Page 5: COMP25212  CPU Multi Threading

Revision – Process States

[Slide diagram: process state machine – New → Ready (waiting for a CPU) → Running on a CPU → Terminated; Running → Blocked (needs to wait, e.g. I/O), then Blocked → Ready when the I/O occurs; Running → Ready when pre-empted (e.g. timer); Ready → Running on dispatch (scheduler).]

Page 6: COMP25212  CPU Multi Threading

Revision – Process Control Block

• Process ID
• Process State
• PC
• Stack Pointer
• General Registers
• Memory Management Info
• Open File List, with positions
• Network Connections
• CPU time used
• Parent Process ID
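The PCB fields above can be sketched as a simple record. This is an illustrative model only, not any particular OS's layout; all field names and types are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PCB:
    """Illustrative Process Control Block (field names are hypothetical)."""
    pid: int
    state: str                 # e.g. "new", "ready", "running", "blocked", "terminated"
    pc: int = 0                # program counter
    stack_pointer: int = 0
    registers: list = field(default_factory=lambda: [0] * 16)  # general registers
    mm_info: dict = field(default_factory=dict)                # memory management info
    open_files: dict = field(default_factory=dict)             # file name -> position
    network_connections: list = field(default_factory=list)
    cpu_time_used: int = 0     # accumulated CPU time
    parent_pid: Optional[int] = None

p = PCB(pid=42, state="ready")
```

Note that only the first few fields (PC, stack pointer, registers, memory-management info) describe CPU state; the rest is bookkeeping the OS keeps but the CPU never loads – the point of the next three slides.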

Page 7: COMP25212  CPU Multi Threading

Revision: CPU Switch

[Slide diagram: process P0 is running; the operating system saves its state into PCB0 and loads state from PCB1; P1 runs; later the OS saves state into PCB1 and reloads state from PCB0; P0 resumes.]
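The save/load sequence in the diagram can be sketched as a toy model, where CPU state is reduced to a PC plus registers and the PCBs are plain dictionaries (all names and values here are illustrative assumptions):

```python
# Toy model of a context switch: the OS saves the outgoing process's
# CPU state into its PCB, then restores the incoming process's state.

cpu = {"pc": 0x1000, "registers": [1, 2, 3]}       # current CPU state (P0 running)
pcb0 = {"pc": None, "registers": None}             # PCB for process P0
pcb1 = {"pc": 0x2000, "registers": [9, 8, 7]}      # PCB for P1 (previously saved)

def context_switch(cpu, pcb_out, pcb_in):
    # Save state of the outgoing process into its PCB...
    pcb_out["pc"] = cpu["pc"]
    pcb_out["registers"] = list(cpu["registers"])
    # ...then load state of the incoming process from its PCB.
    cpu["pc"] = pcb_in["pc"]
    cpu["registers"] = list(pcb_in["registers"])

context_switch(cpu, pcb0, pcb1)   # P0 -> P1
context_switch(cpu, pcb1, pcb0)   # P1 -> P0: cpu holds P0's state again
```

In real hardware each save/load is many instructions (plus cache and TLB disturbance), which is precisely why a software context switch is too expensive to hide a 20 ns cache miss.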

Page 8: COMP25212  CPU Multi Threading

What does CPU load on dispatch?

• Process ID
• Process State
• PC
• Stack Pointer
• General Registers
• Memory Management Info
• Open File List, with positions
• Network Connections
• CPU time used
• Parent Process ID

Page 9: COMP25212  CPU Multi Threading

What does CPU need to store on deschedule?

• Process ID
• Process State
• PC
• Stack Pointer
• General Registers
• Memory Management Info
• Open File List, with positions
• Network Connections
• CPU time used
• Parent Process ID

Page 10: COMP25212  CPU Multi Threading

CPU Support for Multithreading

[Slide diagram: the five-stage pipeline – Fetch, Decode, Execute, Memory, Write – with instruction and data caches, now with replicated per-thread state: two program counters (PC-A, PC-B), two sets of general registers (GPRs-A, GPRs-B) and two virtual-address mappings (VA Mapping A, VA Mapping B) feeding the address-translation hardware.]

Page 11: COMP25212  CPU Multi Threading

How Should OS View Extra Hardware Thread?

• A variety of solutions

• Simplest is probably to declare an extra CPU

• Needs a multiprocessor-aware OS

Page 12: COMP25212  CPU Multi Threading

CPU Support for Multithreading

[Slide diagram: the same pipeline with replicated per-thread state – PC-A/PC-B, GPRs-A/GPRs-B, VA Mapping A/B and address translation.]

Design issue: when to switch threads

Page 13: COMP25212  CPU Multi Threading

Coarse-Grain Multithreading

• Switch thread on an “expensive” operation:
– E.g. I-cache miss
– E.g. D-cache miss

• Some are easier than others!

Page 14: COMP25212  CPU Multi Threading

Switch Threads on I-cache miss

Cycle:    1    2    3        4    5    6    7
Inst a    IF   ID   EX       MEM  WB
Inst b         IF   ID       EX   MEM  WB
Inst c              IF MISS  -    -    -    -
Inst X                       IF   ID   EX   MEM
Inst Y                            IF   ID   EX
Inst Z                                 IF   ID

(Thread A's inst c misses in the instruction cache; while the miss is serviced, thread B's instructions X, Y, Z issue in its place. Instructions d, e, f of thread A were never fetched, so nothing needs to be undone.)

Page 15: COMP25212  CPU Multi Threading

Performance of Coarse Grain

• Assume (conservatively)– 1GHz clock (1nS clock tick!), 20nS memory ( = 20 clocks)– 1 i-cache miss per 100 instructions– 1 instruction per clock otherwise

• Then, time to execute 100 instructions without multithreading– 100 + 20 clock cycles– Inst per Clock = 100 / 120 = 0.83.

• With multithreading: time to exec 100 instructions:– 100 [+ 1]– Inst per Clock = 100 / 101 = 0.99..
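The arithmetic above can be checked with a short script. This is a sketch of the slide's simple model, assuming the miss latency is fully hidden by the other thread and that a thread switch costs one cycle:

```python
# Coarse-grain multithreading performance model (slide's assumptions):
# 1 GHz clock, 20 ns memory -> 20-cycle miss penalty,
# 1 i-cache miss per 100 instructions, 1 instruction per clock otherwise.

INSTRUCTIONS = 100
MISS_PENALTY = 20      # cycles
SWITCH_COST = 1        # cycles to switch hardware threads (assumed)

# Without multithreading: each miss stalls the pipeline for the full penalty.
cycles_single = INSTRUCTIONS + MISS_PENALTY
ipc_single = INSTRUCTIONS / cycles_single      # 100 / 120

# With coarse-grain multithreading: the other thread hides the miss latency;
# we pay only the thread-switch overhead.
cycles_mt = INSTRUCTIONS + SWITCH_COST
ipc_mt = INSTRUCTIONS / cycles_mt              # 100 / 101

print(f"IPC without multithreading: {ipc_single:.2f}")   # 0.83
print(f"IPC with multithreading:    {ipc_mt:.2f}")       # 0.99
```

Note the key (and optimistic) assumption baked in: the other thread always has useful, miss-free work available for the full 20 cycles.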

Page 16: COMP25212  CPU Multi Threading

Switch Threads on D-cache miss

Cycle:    1    2    3    4       5     6     7
Inst a    IF   ID   EX   M-MISS  MISS  MISS  MISS
Inst b         IF   ID   EX      -     -     -
Inst c              IF   ID      -     -     -
Inst d                   IF      -     -     -
Inst X                           IF    ID    EX
Inst Y                                 IF    ID

(Abort these: the miss is only detected in inst a's MEM stage, so the younger thread-A instructions b, c and d already in the pipeline must be squashed; thread B's X and Y then issue.)

Performance: similar calculation (STATE ASSUMPTIONS!)

Where to restart after the memory cycle? I suggest instruction “a” – why?
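A sketch of the "similar calculation", with the assumptions stated explicitly. The numbers are mine, chosen to mirror the I-cache slide: 20-cycle miss penalty, 1 D-cache miss per 100 instructions, a 1-cycle switch cost, and 3 squashed in-flight instructions that must be re-executed when the thread resumes:

```python
# D-cache-miss version of the coarse-grain performance model.
# Assumptions (stated!): 1 GHz clock, 20-cycle miss penalty,
# 1 D-cache miss per 100 instructions, 1 instruction per clock otherwise,
# 1-cycle thread-switch cost, and 3 in-flight instructions squashed per
# miss (re-executed when the thread restarts at inst "a").

INSTRUCTIONS = 100
MISS_PENALTY = 20
SWITCH_COST = 1
SQUASHED = 3           # aborted instructions that are re-executed later

# Without multithreading: stall for the full miss penalty.
ipc_single = INSTRUCTIONS / (INSTRUCTIONS + MISS_PENALTY)

# With multithreading: the miss latency is hidden, but we pay the switch
# cost plus the wasted work of the squashed instructions.
cycles_mt = INSTRUCTIONS + SWITCH_COST + SQUASHED
ipc_mt = INSTRUCTIONS / cycles_mt              # 100 / 104

print(f"IPC without multithreading: {ipc_single:.2f}")   # 0.83
print(f"IPC with multithreading:    {ipc_mt:.2f}")       # 0.96
```

The D-cache case is therefore slightly worse than the I-cache case (≈0.96 vs ≈0.99 IPC here): the late miss detection means some pipeline work is thrown away, which is one answer to why this switch is "harder" than the I-cache one.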