Multi Processor

Transcript of Multi Processor

  • Multiprocessor Architecture: Basics

    Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

  • Multiprocessor Architecture

    Abstract models are (mostly) OK to understand algorithm correctness.

    To understand how concurrent algorithms perform, you need to understand something about multiprocessor architectures.

  • Pieces

    Processors
    Threads
    Interconnect
    Memory
    Caches

  • Design an Urban Messenger Service in 1980 Downtown Manhattan

    Should you use cars (1980 Buicks, 15 MPG) or bicycles (hire recent graduates)?

    Better to use bicycles.

  • Technology Changes

    Since 1980, car technology has changed enormously: better mileage (hybrid cars, 35 MPG), more reliable.

    Should you rethink your Manhattan messenger service?

  • Processors

    Cycle: fetch and execute one instruction.

    Cycle times change:
    1980: 10 million cycles/sec
    2005: 3,000 million cycles/sec

  • Computer Architecture

    Measure time in cycles; absolute cycle times change.

    Memory access: ~100s of cycles. Changes slowly, mostly gets worse.

  • Threads

    Execution of a sequential program. Software, not hardware.

    A processor can run a thread, then put it aside when:
    the thread does I/O
    the thread runs out of time

    Then it runs another thread.

  • Interconnect

    Bus: like a tiny Ethernet. Broadcast medium. Connects processors to memory and processors to processors.

    Network: tiny LAN. Mostly used on large machines.

  • Interconnect

    Interconnect is a finite resource.

    Processors can be delayed if others are consuming too much.

    Avoid algorithms that use too much bandwidth.

  • Analogy

    You work in an office. When you leave for lunch, someone else takes over your office. If you don't take a break, a security guard shows up and escorts you to the cafeteria. When you return, you may get a different office.

  • Processor and Memory are Far Apart

    (Diagram: processor connected to memory over the interconnect.)

  • Reading from Memory

    (Diagram sequence: the processor sends an address over the interconnect, waits many cycles, and the value comes back.)

  • Writing to Memory

    (Diagram sequence: the processor sends an address and value over the interconnect, waits many cycles, and receives an acknowledgment.)

  • Remote Spinning

    Thread waits for a bit in memory to change. Maybe it tried to dequeue from an empty buffer.

    Spins: repeatedly rereads the flag bit.

    Huge waste of interconnect bandwidth.
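    A minimal Java sketch of this spinning pattern (the class and field names are illustrative, not from the slides; volatile is used so the reread actually observes the change, a point the deck returns to at the end):

      class SpinExample {
          // Illustrative flag; a producer sets it when an item becomes available.
          volatile boolean itemAvailable = false;

          void awaitItem() {
              // Spin: repeatedly reread the flag bit until it changes.
              // Without a cache, every reread is a round trip over the interconnect.
              while (!itemAvailable) {
                  // busy-wait
              }
          }
      }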

  • Analogy

    In the days before the Internet, Alice is writing a paper on aardvarks. Her sources are in the university library: she requests a book by campus mail, the book arrives by return mail, and she sends it back when not in use. She spends a lot of time in the mail room.

  • Analogy II

    Alice buys:
    a desk in her office, to keep the books she is using now
    a bookcase in the hall, to keep the books she will need soon

  • Cache: Reading from Memory

    (Diagram sequence: the processor's read goes to the cache before crossing the interconnect to memory.)

  • Cache Hit

    (Diagram: the cache is asked whether it holds the address; it does, so the value is returned without using the interconnect.)

  • Cache Miss

    (Diagram: the cache does not hold the address, so the request goes over the interconnect to memory.)


  • Local Spinning

    With caches, spinning becomes practical.

    First time: load the flag bit into the cache.

    As long as it doesn't change: hit in the cache (no interconnect used).

    When it changes: one-time cost (see cache coherence below).
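    To make local spinning concrete, here is a hedged sketch in the test-and-test-and-set style (not from these slides): the inner loop only reads the flag, so it hits in the local cache and uses no interconnect until the lock holder writes.

      import java.util.concurrent.atomic.AtomicBoolean;

      // Illustrative test-and-test-and-set lock: the read-only inner loop spins
      // on the cached copy; interconnect traffic occurs only when the flag changes.
      class TTASLock {
          private final AtomicBoolean state = new AtomicBoolean(false);

          public void lock() {
              while (true) {
                  while (state.get()) {
                      // spin locally in the cache
                  }
                  if (!state.getAndSet(true)) {
                      return; // acquired
                  }
              }
          }

          public void unlock() {
              state.set(false);
          }
      }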

  • Granularity

    Caches operate at a larger granularity than a word.

    Cache line: fixed-size block containing the address.
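    A small sketch of what line granularity means, assuming 64-byte lines (a common size; the slides do not fix one): nearby addresses share a line and are fetched and evicted together.

      // Assumes 64-byte cache lines; illustrative only.
      class CacheLineMath {
          static final long LINE_SIZE = 64;

          static long lineIndex(long address)  { return address / LINE_SIZE; }
          static long lineOffset(long address) { return address % LINE_SIZE; }

          public static void main(String[] args) {
              // Addresses 1000 and 1010 fall in the same line (index 15),
              // so reading either one brings both into the cache.
              System.out.println(lineIndex(1000) + " " + lineIndex(1010)); // 15 15
          }
      }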


  • Hit Ratio

    Proportion of requests that hit in the cache.

    Measure of effectiveness of the caching mechanism.

    Depends on locality of the application.
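    A worked example of why the hit ratio matters (the numbers are hypothetical): average access time = hitRatio × cacheTime + (1 − hitRatio) × memoryTime.

      // Illustrative numbers: 2-cycle cache, 100-cycle memory, 90% hit ratio.
      class HitRatioExample {
          public static void main(String[] args) {
              double hitRatio = 0.9, cacheCycles = 2, memoryCycles = 100;
              double avg = hitRatio * cacheCycles + (1 - hitRatio) * memoryCycles;
              System.out.println("average access: " + avg + " cycles"); // 11.8
          }
      }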


  • L1 and L2 Caches

    L1: small and fast, 1 or 2 cycles, ~16-byte line.

  • L1 and L2 Caches

    L2: larger and slower, 10s of cycles, ~1K line size.

  • When a Cache Becomes Full

    Need to make room for a new entry by evicting an existing entry.

    Need a replacement policy, usually some kind of least-recently-used heuristic.
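    A minimal sketch of a least-recently-used policy, using Java's LinkedHashMap in access order (the capacity and types are illustrative; hardware caches implement cheaper approximations):

      import java.util.LinkedHashMap;
      import java.util.Map;

      // Illustrative LRU replacement: when full, evict the entry touched longest ago.
      class LruCache<K, V> extends LinkedHashMap<K, V> {
          private final int capacity;

          LruCache(int capacity) {
              super(capacity, 0.75f, true); // access order: get/put moves an entry to the back
              this.capacity = capacity;
          }

          @Override
          protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
              return size() > capacity; // evict the least recently used entry
          }
      }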

  • Fully Associative Cache

    Any line can be anywhere in the cache.

    Advantage: can replace any line.
    Disadvantage: hard to find lines.


  • K-way Set Associative Cache

    Each slot holds k lines.

    Advantage: pretty easy to find a line.
    Advantage: some choice in replacing a line.


  • Contention

    Good to avoid memory contention: it leaves processors idle and consumes interconnect bandwidth.


  • Cache Coherence

    Processors A and B both cache address x. A writes to x and updates its cache.

    How does B find out?

    Many cache coherence protocols in the literature.


  • MESI

    Modified: cached data has been modified and must be written back to memory.
    Exclusive: not modified; I have the only copy.
    Shared: not modified; may be cached elsewhere.
    Invalid: the cached line must not be used.
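    As a reading aid, the four states as a small Java enum (illustrative only; real protocols are implemented in hardware, tracked per cache line):

      enum MesiState {
          MODIFIED,   // only copy, dirty: must be written back to memory
          EXCLUSIVE,  // only copy, clean: matches memory
          SHARED,     // clean; other caches may hold copies
          INVALID     // contents must not be used
      }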


  • Write-Through Caches

    Immediately broadcast changes.

    Good: memory and caches always agree; more read hits, maybe.

    Bad: bus traffic on all writes, and most writes are to unshared data (for example, loop indexes).


  • Multicore Architectures

    The university president, alarmed by the fall in productivity, puts Alice, Bob, and Carol in the same corridor: private desks, shared bookcase.

    Contention costs go way down.


  • Multicore Architecture

    (Diagram: several cores, each with its own cache, share a bus to memory.)

  • Multicore

    Private L1 caches, shared L2 caches.

    Communication between same-chip processors is now very fast.

    Different-chip processors are still not so fast.

  • NUMA Architectures

    Alice and Bob transfer to NUMA State University.

    No centralized library: each office basement holds part of the library.


  • SMP vs NUMA

    (Diagram: SMP processors share one memory over a bus; NUMA processors each have a local memory.)

    SMP: symmetric multiprocessor
    NUMA: non-uniform memory access
    CC-NUMA: cache-coherent NUMA

  • Spinning Again

    NUMA without caches: OK if the variable is local, bad if remote.

    CC-NUMA: like SMP.

  • Relaxed Memory

    Remember the flag principle? Alice's and Bob's flag variables start out false.

    Alice writes true to her flag and reads Bob's.
    Bob writes true to his flag and reads Alice's.

    One must see the other's flag as true.
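    A hedged Java sketch of the flag protocol (names are illustrative). With plain fields, relaxed memory lets writes be buffered or reordered, so both threads may read false:

      // Illustrative sketch: with non-volatile fields, BOTH reads can return false,
      // violating the flag principle's expectation under relaxed memory.
      class FlagPrinciple {
          boolean aliceFlag = false;
          boolean bobFlag = false;

          void alice() {
              aliceFlag = true;
              boolean sawBob = bobFlag;     // can be false
          }

          void bob() {
              bobFlag = true;
              boolean sawAlice = aliceFlag; // can be false at the same time
          }
      }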

  • Not Necessarily So

    Sometimes the compiler reorders memory operations.

    This can improve cache performance and interconnect use, but it causes unexpected concurrent interactions.

  • Write Buffers

    (Diagram: a write is held in a per-processor write buffer before it reaches memory.)

    Absorbing: later writes to the same address replace earlier buffered ones.
    Batching: buffered writes are sent out to memory together.

  • Volatile

    In Java, if a variable is declared volatile, operations on it won't be reordered.

    Expensive, so use it only when needed.

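    Continuing the sketch from the relaxed-memory slide (names are illustrative): declaring the flags volatile prevents the reordering, so at least one thread must see the other's flag as true.

      // With volatile flags, the write and the following read are not reordered.
      class VolatileFlagPrinciple {
          volatile boolean aliceFlag = false;
          volatile boolean bobFlag = false;

          void alice() {
              aliceFlag = true;
              boolean sawBob = bobFlag;     // at least one of the two reads sees true
          }

          void bob() {
              bobFlag = true;
              boolean sawAlice = aliceFlag;
          }
      }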
  • This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.

    You are free:
    to Share: to copy, distribute and transmit the work
    to Remix: to adapt the work

    Under the following conditions:
    Attribution. You must attribute the work to The Art of Multiprocessor Programming (but not in any way that suggests that the authors endorse you or your use of the work).
    Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.

    For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to http://creativecommons.org/licenses/by-sa/3.0/.

    Any of the above conditions can be waived if you get permission from the copyright holder.

    Nothing in this license impairs or restricts the author's moral rights.