162 Consistency


  • 8/10/2019 162 Consistency


    CS 162

    Memory Consistency Models


    Reordering in Uniprocessors

    Memory operations are reordered to improve performance

    Hardware (e.g., store buffer, reorder buffer)

    Compiler (e.g., code motion, caching a value in a register)

    Behavior stays the same as long as dependences are respected

    Program order:  a1: St x;  a2: Ld y

    Reordered:      a2: Ld y;  a1: St x


    Reordering in Multiprocessors

    Counter-intuitive program behavior

    Initially x = y = 0

    P1              P2
    a1: x = 1;      b1: Ry = y;
    a2: y = 1;      b2: Rx = x;

    Possible outcomes

    (Rx=1, Ry=1)    e.g., a1, a2, b1, b2
    (Rx=1, Ry=0)    e.g., b1, a1, a2, b2
    (Rx=0, Ry=0)    e.g., b1, b2, a1, a2
    (Rx=0, Ry=1)    only after reordering, e.g., a2, b1, b2, a1

    Intuitively, y=1 implies x=1
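    The outcomes on this slide can be checked by brute force. The sketch below is only an illustration, not something from the slides: it runs every interleaving of P1 and P2 against a single shared memory, first with P1's stores in program order and then with them swapped (a hypothetical reordering of a1 and a2). The names `run`, `outcomes`, `in_order`, and `stores_swapped` are made up for this example.

```python
from itertools import permutations

# Litmus test from the slide; x = y = 0 initially.
# Each operation is (name, kind, variable, value-or-register).
P1 = [("a1", "st", "x", 1), ("a2", "st", "y", 1)]
P2 = [("b1", "ld", "y", "Ry"), ("b2", "ld", "x", "Rx")]


def run(schedule):
    """Execute operations one at a time against a single shared memory."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for _name, kind, var, arg in schedule:
        if kind == "st":
            mem[var] = arg
        else:
            regs[arg] = mem[var]
    return regs["Rx"], regs["Ry"]


def outcomes(p1, p2):
    """All (Rx, Ry) results over interleavings that keep p1 and p2 in the given order."""
    results = set()
    # Each distinct arrangement of [0, 0, 1, 1] says which processor steps next.
    for order in set(permutations([0, 0, 1, 1])):
        streams = [iter(p1), iter(p2)]
        results.add(run([next(streams[p]) for p in order]))
    return results


in_order = outcomes(P1, P2)               # program order respected
stores_swapped = outcomes(P1[::-1], P2)   # assumption: a2 is performed before a1

print(sorted(in_order))                   # outcomes without reordering
print(sorted(stores_swapped - in_order))  # outcomes that need the reordering
```

    With program order respected, only three (Rx, Ry) pairs appear; (Rx=0, Ry=1) shows up only once the two stores are allowed to swap.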


    Reordering in Multiprocessors

    Initially p = NULL, flag = false

    P1                  P2
    p = new A();        if (flag)
    flag = true;            a = p->var;

    flag is supposed to be set after p is allocated

    Counter-intuitive program behavior

    Lock-free algorithms, e.g., Dekker, Peterson


    Reordering in Multiprocessors

    Dekker Algorithm (mutual exclusion)

    Initially flag1 = flag2 = 0

    P1                          P2
    flag1 = 1;                  flag2 = 1;
    if (flag2 == 0)             if (flag1 == 0)
        critical section            critical section

    On P1, flag1 = 1 is a store (St flag1) and flag2 == 0 is a load (Ld flag2)

    After reordering the store and the load, both processors can read 0 and enter the critical section

    Counter-intuitive program behavior
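    One common way a store ends up behind a later load is a store buffer. The following is a minimal sketch under that assumption, not a model of any particular processor: each processor's store waits in a private buffer, so the other processor's load still reads 0 from memory. The function name and the `drain_before_load` switch (which stands in for a fence between the store and the load) are hypothetical.

```python
def dekker_with_store_buffers(drain_before_load):
    """Run the St-then-Ld sequence of both processors with per-processor store buffers.

    drain_before_load=True models a fence between the store and the load.
    Returns the set of processors that enter the critical section.
    """
    mem = {"flag1": 0, "flag2": 0}
    buffers = {"P1": [], "P2": []}          # pending (variable, value) stores

    def store(proc, var, value):
        buffers[proc].append((var, value))  # the store waits in the buffer

    def drain(proc):
        for var, value in buffers[proc]:
            mem[var] = value                # the store becomes globally visible
        buffers[proc].clear()

    def load(proc, var):
        # A load sees the processor's own buffered stores first, then memory.
        for buffered_var, value in reversed(buffers[proc]):
            if buffered_var == var:
                return value
        return mem[var]

    entered = set()
    store("P1", "flag1", 1)
    store("P2", "flag2", 1)
    if drain_before_load:
        drain("P1")
        drain("P2")
    if load("P1", "flag2") == 0:
        entered.add("P1")
    if load("P2", "flag1") == 0:
        entered.add("P2")
    drain("P1")
    drain("P2")
    return entered


print(dekker_with_store_buffers(drain_before_load=False))
print(dekker_with_store_buffers(drain_before_load=True))
```

    Without draining, both loads read 0 and both processors enter the critical section. With draining, each load sees the other processor's flag, so in this particular schedule neither enters; mutual exclusion is preserved.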


    Memory Consistency Models

    Specify the ordering of loads and stores to different memory locations

    Ld→Ld, Ld→St, St→Ld, St→St

    Contract between hardware, compiler, and programmer

    the hardware and compiler will not violate the ordering specified

    the programmer will not assume a stricter order than that of the model


    Memory Consistency Models

    Model                     Allowed Reordering    Commercial Architecture
    Sequential Consistency    None                  does not exist
    Total Store Ordering      St→Ld                 x86, SPARC
    Relaxed Memory Order      All                   ARM, PowerPC

    Performance: low (Sequential Consistency) to high (Relaxed Memory Order)

    Programmability: high (Sequential Consistency) to low (Relaxed Memory Order)

    Stronger models: stronger constraints, fewer memory reorderings, easier to reason about, lower performance
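    The table can be read as a lookup from model to the set of reorderings it permits. The sketch below encodes only what the table states; the short model names (`SC`, `TSO`, `RMO`) and the helper `may_reorder` are hypothetical, and real architectures have further details that a three-row table does not capture.

```python
# Every ordered pair of operation kinds to different memory locations.
ALL_PAIRS = {("Ld", "Ld"), ("Ld", "St"), ("St", "Ld"), ("St", "St")}

# A pair (first, second) in the set means: an earlier `first` and a later
# `second` may be reordered under that model.
ALLOWED_REORDERING = {
    "SC": set(),               # Sequential Consistency: none
    "TSO": {("St", "Ld")},     # Total Store Ordering: a load may pass an earlier store
    "RMO": set(ALL_PAIRS),     # Relaxed Memory Order: all
}


def may_reorder(model, first, second):
    """True if `model` lets an earlier `first` be reordered with a later `second`."""
    return (first, second) in ALLOWED_REORDERING[model]


# Dekker's entry sequence is a store followed by a load (St flag1; Ld flag2).
for model in ("SC", "TSO", "RMO"):
    print(model, may_reorder(model, "St", "Ld"))
```

    The loop shows why the Dekker sequence is safe only under Sequential Consistency: both weaker models allow its store and load to be reordered.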


    Cache Coherence vs. Memory Model

    Cache coherence ensures a consistent view of memory

    Guarantees that an update to memory by one processor will eventually be seen by other processors

    But how consistent?

    NO guarantee on when an update will be seen

    NO guarantee on the order in which updates will be seen


    Cache Coherence vs. Memory Model

    Initially A = B = 0

    P1          P2                      P3
    A = 1;      while (A != 1) ;        while (B != 1) ;
                B = 1;                  tmp = A;

    tmp = 1? or tmp = 0?


    Sequential Consistency (SC)

    Definition [Lamport]

    (1) the result of any execution is the same as if the operations of all processors were executed in some sequential order;

    (2) the operations of each individual processor appear in this sequence in the order specified by its program.

    [Diagram: processors P1, P2, P3, ..., Pn connected to a single MEMORY]

    Behaves as the repetition of:

    (1) pick a processor by any method (e.g., randomly)

    (2) the processor completes a load/store operation
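    The two-step repetition can be turned into a small simulator. The sketch below is an illustration only: it applies the repetition to the three-processor program from the earlier "Cache Coherence vs. Memory Model" slide (A = B = 0 initially), picking the next processor at random. The function `run_sc` and the step counters are hypothetical names.

```python
import random


def run_sc(seed):
    """One sequentially consistent execution of the three-processor example."""
    rng = random.Random(seed)
    mem = {"A": 0, "B": 0}
    pc = {"P1": 0, "P2": 0, "P3": 0}      # index of each processor's next step
    steps = {"P1": 1, "P2": 2, "P3": 2}   # number of steps in each program
    tmp = None

    while any(pc[p] < steps[p] for p in pc):
        # (1) Pick a processor by any method; here, uniformly at random.
        proc = rng.choice([p for p in pc if pc[p] < steps[p]])
        # (2) The processor completes exactly one load or store.
        if proc == "P1":
            mem["A"] = 1                  # A = 1
            pc[proc] += 1
        elif proc == "P2":
            if pc[proc] == 0:
                if mem["A"] == 1:         # while (A != 1) ; one load per turn
                    pc[proc] += 1
            else:
                mem["B"] = 1              # B = 1
                pc[proc] += 1
        else:
            if pc[proc] == 0:
                if mem["B"] == 1:         # while (B != 1) ; one load per turn
                    pc[proc] += 1
            else:
                tmp = mem["A"]            # tmp = A
                pc[proc] += 1
    return tmp


results = {run_sc(seed) for seed in range(200)}
print(results)
```

    Every run yields tmp = 1: under SC, P3 can only leave its loop after B = 1, which P2 writes only after it has seen A = 1.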


    SC Example

    P1              P2
    a1: x = 1;      b1: Ry = y;
    a2: y = 1;      b2: Rx = x;

    Interleavings that respect program order, with their outcomes (x = y = 0 initially)

    a1, a2, b1, b2    (Rx=1, Ry=1)
    a1, b1, a2, b2    (Rx=1, Ry=0)
    a1, b1, b2, a2    (Rx=1, Ry=0)
    b1, a1, a2, b2    (Rx=1, Ry=0)
    b1, a1, b2, a2    (Rx=1, Ry=0)
    b1, b2, a1, a2    (Rx=0, Ry=0)

    (Rx=0, Ry=1) is not an SC outcome


    Sequential Consistency (SC)

    Simple and intuitive

    consistent with programmers' intuition

    easy to reason about program behavior

    However, the simplicity comes at the cost of performance

    prevents aggressive compiler optimizations (e.g., load reordering, store reordering, caching a value in a register)

    constrains hardware utilization (e.g., store buffer)


    SC Violation

    P1              P2
    a1: x = 1       b1: R1 = y
    a2: y = 1       b2: R2 = x

    [Diagram: program order edges within each processor; conflict relation edges between accesses to the same variable on different processors]

    SC violation

    - A cycle formed by program orders and conflict orders [Shasha and Snir, 1988], e.g., (a2, b1, b2, a1, a2)

    - Executing in the order (a2, b1, b2, a1) will produce R1=1, R2=0, which is not an SC outcome

    Insert fences to break the cycle

    - a2 cannot be executed before a1
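    The cycle test can be sketched as ordinary cycle detection on a directed graph. This is an illustration under a stated assumption, not the analysis from Shasha and Snir: conflict edges are directed by the order in which the two conflicting accesses happened in one given execution. The function `find_cycle` and the edge lists are hypothetical names.

```python
def find_cycle(edges):
    """Return one cycle in a directed graph as a list of nodes, or None."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    state = {}   # node -> "active" (on the DFS path) or "done"
    path = []

    def visit(node):
        state[node] = "active"
        path.append(node)
        for succ in graph[node]:
            if state.get(succ) == "active":
                # Back edge: the cycle is the path from succ to here, closed.
                return path[path.index(succ):] + [succ]
            if succ not in state:
                cycle = visit(succ)
                if cycle:
                    return cycle
        state[node] = "done"
        path.pop()
        return None

    for node in graph:
        if node not in state:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


program_order = [("a1", "a2"), ("b1", "b2")]

# Execution order (a2, b1, b2, a1) from the slide:
# b1 reads the y written by a2; b2 reads x before a1 writes it.
reordered = find_cycle(program_order + [("a2", "b1"), ("b2", "a1")])

# Execution order (a1, a2, b1, b2): both loads come after both stores.
in_order = find_cycle(program_order + [("a2", "b1"), ("a1", "b2")])

print(reordered)
print(in_order)
```

    The first graph contains the slide's cycle (reported starting from a1 rather than a2); the second has no cycle, so that execution is sequentially consistent.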


    Conservativeness of Fences

    Inserted statically and conservatively

    No cycle is formed at runtime

    Example: a1 is in a conditional branch

    P1                  P2
    if (cond)           b1: St y
        a1: St x        Fence2
    Fence1              b2: Ld x
    a2: Ld y

    Example: p and q may point to the same memory location

    P1              P2
    a1: St *p       b1: St x
    Fence1          Fence2
    a2: Ld x        b2: Ld *q


    Address-aware Fences

    Consider memory locations accessed around fences at runtime

    Fences only take effect when there is a cycle about to happen


    Detect and Avoid Cycles

    [Diagram: Proc 1 executes a1: A1, Fence1, a2: A2; Proc 2 executes b1: B1, Fence2, b2: B2; edges c1 and c2? connect the two processors]

    How to detect c2 efficiently?


    Detect and Avoid Cycles

    [Diagram: same two-processor picture as on the previous slide, with a watchlist added]

    How to detect c2 efficiently?

    Collect a watchlist for each fence

    A completing memory operation checks the watchlist

    - bypass, if its address is not in the watchlist

    - stall, otherwise
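    The bypass-or-stall rule can be written down directly. The sketch below covers only that check; how a watchlist is collected is not described on the slide, so it is simply passed in here. The class `Fence`, its `check` method, and the example addresses are all hypothetical.

```python
class Fence:
    """Hypothetical model of an address-aware fence and its watchlist."""

    def __init__(self, watchlist):
        # Addresses whose access could complete a cycle (assumed to be given).
        self.watchlist = set(watchlist)

    def check(self, address):
        """Decide what a completing memory operation does at this fence."""
        return "stall" if address in self.watchlist else "bypass"


fence1 = Fence(watchlist={0x1000})
print(fence1.check(0x1000))   # address is on the watchlist
print(fence1.check(0x2000))   # address is not on the watchlist
```

    A traditional fence corresponds to stalling every operation regardless of address; here an operation is delayed only when its address is on the watchlist.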


    Performance: Execution Time

    Traditional fence (T) vs. Address-aware fence (A)

    Fence overhead becomes negligible


    Further Reading

    L. Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. Comput., 28(9):690-691, 1979.

    S. V. Adve and K. Gharachorloo. Shared memory consistency models: A tutorial. IEEE Computer, 29:66-76, 1995.

    D. Shasha and M. Snir. Efficient and correct execution of parallel programs that share memory. ACM Trans. Program. Lang. Syst., 10(2):282-312, 1988.

    D. J. Sorin, M. D. Hill, and D. A. Wood. A Primer on Memory Consistency and Cache Coherence. Synthesis Lectures on Computer Architecture, 2011.

    C. Lin, V. Nagarajan, and R. Gupta. Address-aware fences. ICS '13, pages 313-324, 2013.