COMP25212 CPU Multi Threading

Post on 30-Dec-2015


• Learning Outcomes: to be able to:
– Describe the motivation for multithreading support in CPU hardware
– Distinguish the benefits and implementations of coarse-grain, fine-grain and simultaneous multithreading
– Explain when multithreading is inappropriate
– Describe a multithreading implementation
– Estimate the performance of these implementations
– State the important assumptions of this performance model

Revision: Increasing CPU Performance

[Diagram: a clocked five-stage pipeline – Fetch Logic, Decode Logic, Exec Logic, Mem Logic, Write Logic – with the Inst Cache feeding the fetch stage and the Data Cache serving the memory stage; labels a–f mark the points addressed on the next slide.]

How can throughput be increased?

Increasing CPU Performance

a) By increasing clock frequency
b) By increasing Instructions per Clock
c) Minimising memory access impact – data cache
d) Maximising inst issue rate – branch prediction
e) Maximising inst issue rate – superscalar
f) Maximising pipeline utilisation – avoid instruction dependencies – out-of-order execution
g) (What does lengthening the pipeline do?)

Increasing Program Parallelism

– Keep issuing instructions after a branch?
– Keep processing instructions after a cache miss?
– Process instructions in parallel?
– Write a register while a previous write is pending?

• Where can we find additional independent instructions?
– In a different program!

Revision – Process States

[Diagram: process state machine – New → Ready (waiting for a CPU); Ready → Running on a CPU (dispatch by the scheduler); Running → Blocked (needs to wait, e.g. I/O); Blocked → Ready (I/O occurs); Running → Ready (pre-empted, e.g. timer); Running → Terminated.]

Revision – Process Control Block

• Process ID
• Process State
• PC
• Stack Pointer
• General Registers
• Memory Management Info
• Open File List, with positions
• Network Connections
• CPU time used
• Parent Process ID

Revision: CPU Switch

[Diagram: context switch between Process P0 and Process P1 via the Operating System – while P0 runs, an interrupt or system call enters the OS, which saves P0's state into PCB0 and loads P1's state from PCB1; P1 then runs until the OS saves its state into PCB1 and loads P0's state from PCB0, resuming P0.]

What does the CPU load on dispatch?

• Process ID
• Process State
• PC
• Stack Pointer
• General Registers
• Memory Management Info
• Open File List, with positions
• Network Connections
• CPU time used
• Parent Process ID

What does the CPU need to store on deschedule?

• Process ID
• Process State
• PC
• Stack Pointer
• General Registers
• Memory Management Info
• Open File List, with positions
• Network Connections
• CPU time used
• Parent Process ID

CPU Support for Multithreading

[Diagram: the five-stage pipeline (Fetch, Decode, Exec, Mem, Write logic) with shared Inst Cache and Data Cache, extended with duplicated per-thread state – two program counters (PCA, PCB), two sets of general-purpose registers (GPRsA, GPRsB), and two virtual-address mappings (VA MappingA, VA MappingB) feeding the Address Translation logic.]

How Should OS View Extra Hardware Thread?

• A variety of solutions

• Simplest is probably to declare an extra CPU

• Needs a multiprocessor-aware OS

CPU Support for Multithreading

[Diagram: the same duplicated-state pipeline as above – PCA/PCB, GPRsA/GPRsB, VA MappingA/VA MappingB over a shared pipeline and caches.]

Design issue: when to switch threads?

Coarse-Grain Multithreading

• Switch thread on an “expensive” operation:
– E.g. I-cache miss
– E.g. D-cache miss

• Some are easier than others!

Switch Threads on I-cache miss

Cycle:    1    2    3     4    5    6    7
Inst a    IF   ID   EX    MEM  WB
Inst b         IF   ID    EX   MEM  WB
Inst c              MISS  -    -    -    -
Inst X                    IF   ID   EX   MEM
Inst Y                         IF   ID   EX
Inst Z                              IF   ID

(On inst c's I-cache miss, fetch switches to thread B's instructions X, Y, Z; thread A's insts d, e, f were never fetched, so nothing already in the pipeline has to be aborted.)

Performance of Coarse Grain

• Assume (conservatively):
– 1 GHz clock (1 ns clock tick!), 20 ns memory (= 20 clocks)
– 1 I-cache miss per 100 instructions
– 1 instruction per clock otherwise

• Then, time to execute 100 instructions without multithreading:
– 100 + 20 clock cycles
– Inst per Clock = 100 / 120 = 0.83

• With multithreading, time to execute 100 instructions:
– 100 [+ 1] clock cycles
– Inst per Clock = 100 / 101 = 0.99

Switch Threads on D-cache miss

Cycle:    1    2    3    4       5     6     7
Inst a    IF   ID   EX   M-MISS  MISS  MISS  MISS
Inst b         IF   ID   EX      -     -     -    \
Inst c              IF   ID      -     -     -     } abort these
Inst d                   IF      -     -     -    /
Inst X                           IF    ID    EX
Inst Y                                 IF    ID

(Inst a's miss is only detected at its MEM stage, so insts b, c, d are already in the pipeline behind it and must be aborted before thread B's X and Y can issue.)

Performance: similar calculation (STATE ASSUMPTIONS!)

Where to restart after the memory cycle? I suggest instruction “a” – why?