OmniOrder: Directory-Based Conflict Serialization of...

OmniOrder: Directory-Based Conflict Serialization of Transactions. Xuehai Qian, Benjamin Sahelices, and Josep Torrellas. UC Berkeley, Universidad de Valladolid, University of Illinois, Urbana-Champaign. http://iacoma.cs.uiuc.edu/

Transcript of OmniOrder: Directory-Based Conflict Serialization of...

  • OmniOrder: Directory-Based Conflict Serialization of Transactions

    Xuehai Qian, Benjamin Sahelices, and Josep Torrellas

    UC Berkeley, Universidad de Valladolid, University of Illinois, Urbana-Champaign

    http://iacoma.cs.uiuc.edu/

  • Transaction (Atomic Block)

    • Types of transactions:
      • SW-demarcated transactions: fixed boundary
        • TM or compiler-generated
      • HW-generated transactions: dynamically built
        • Enforcing SC speculatively
    • Conflicting transactions need to be serialized
    • Conventional serialization mechanisms:
      • Squash: one of the transactions aborts
      • Stall: one of the transactions stalls

    [Figure: transactions T1 and T2 conflict on a wr/rd pair; the conflict is resolved by squash or by stall.]

  • Conflict Serialization

    • Serializes conflicting transactions without squash or stall
    • Commits conflicting transactions according to the order of dependences
    • Only squashes on a cyclic dependence

    [Figure: T1 and T2 conflict; both commit, following the commit order.]
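As a rough software model of what committing "according to the order of dependences" means, the sketch below derives a commit order from recorded dependences and reports a squash only when they form a cycle. It relies on Python's standard `graphlib` module (Python 3.9+). The function name `commit_order` and the transaction labels are illustrative assumptions, and a centralized sort like this is only a reference model, not the hardware mechanism presented in the talk.

```python
from graphlib import TopologicalSorter, CycleError

def commit_order(dependences):
    """Return a commit order for the given (src, dst) dependences,
    or None if they form a cycle and some transaction must squash."""
    sorter = TopologicalSorter()
    for src, dst in dependences:
        sorter.add(dst, src)  # dst must commit after src
    try:
        return list(sorter.static_order())
    except CycleError:
        return None

# T1 and T2 conflict, with T2 depending on T1: both commit, T1 first.
print(commit_order([("T1", "T2")]))
# A cyclic dependence cannot be serialized, so a squash is needed.
print(commit_order([("T1", "T2"), ("T2", "T1")]))
```

The point of the model is that a conflict alone never forces a squash; only an unorderable (cyclic) set of dependences does.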

  • Supporting Conflict Serialization

    • Record dependence ordering
    • Forward data (typically)
    • Detect dependence cycles
    • Four existing proposals:
      • Dependence-Aware TM (MICRO’08): snoopy-based
      • SONTM (MICRO’10): high communication overhead
      • BulkSMT (HPCA’12): only within a single SMT core
      • Wait-n-GoTM (ASPLOS’13): only SW-demarcated transactions

    No existing scheme can support conflict serialization of both SW and HW transactions in a directory-based protocol

  • Contribution: OmniOrder

    • A directory-based cache coherence protocol that supports conflict serialization of both SW and HW transactions
    • Key idea:
      • Keep only non-spec data in the caches
      • Keep history of spec updates at word granularity in a buffer
      • On line transfer due to cache coherence, include history of spec updates
    • Coherence protocol transitions are unmodified

  • Outline

    • Motivation
    • OmniOrder Design
    • Evaluation

  • OmniOrder Characteristics

    • Speculative state management is decoupled from coherence transitions
    • No centralized HW structure or operation
    • Minimum overhead if there is no conflict
    • When a transaction is squashed: only squash successors with same-word RAW
    • Modest complexity

  • Accesses within Transactions

    [Figure: three animation steps over processors P0, P1, P2 and the directory (Dir). The cache line keeps the non-speculative value v0 throughout.]

    • Step 1: P0 holds the line in state D (v0); P1 and P2 hold it in state I. P0 writes v1 (Wr:v1), which is recorded in the speculative update history as "P0: v1".
    • Step 2: P1 writes v2 (Wr:v2). The line moves to P1 (D: v0) and the history now holds "P0: v1" and "P1: v2": multiple history entries for a word. Commit order: P0, then P1.
    • Step 3: P2 reads (Rd). The line is shared (S: v0). On a future read, Dir will provide both data and update history.

    • Eager value propagation
    • Eager conflict detection
    • Record order of transactions
    • No extra messages or update history when there is no conflict
    • Original coherence protocol transitions are not affected
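The three steps above can be mimicked in a few lines of software. The sketch below is a minimal model under stated assumptions: the class `SpecWord`, its methods, and the `order` list are hypothetical names chosen for illustration, and the model tracks a single word while ignoring coherence states and the directory entirely.

```python
class SpecWord:
    """One memory word: a committed value plus its speculative update history."""

    def __init__(self, committed):
        self.committed = committed  # non-speculative value kept in the cache
        self.history = []           # (writer, value) entries, oldest first

    def spec_write(self, proc, value):
        """Append a speculative update; return the previous writer, if any."""
        pred = self.history[-1][0] if self.history else None
        self.history.append((proc, value))
        return pred if pred != proc else None

    def spec_read(self, proc):
        """Return (value, writer) as seen by a speculative read."""
        if self.history:
            writer, value = self.history[-1]
            return value, (writer if writer != proc else None)
        return self.committed, None

word = SpecWord("v0")
order = []                               # recorded (predecessor, successor) pairs

word.spec_write("P0", "v1")              # first speculative update, no predecessor
pred = word.spec_write("P1", "v2")       # second entry for the same word
order.append((pred, "P1"))               # P0 commits before P1
value, writer = word.spec_read("P2")     # eager value propagation: P2 sees v2
order.append((writer, "P2"))             # P1 commits before P2

print(value, order, word.committed)
```

Note that the committed value stays `v0` while the history grows, which is what lets the unmodified coherence protocol keep operating on non-speculative data.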

  • Transaction Commit and Squash

    [Figure: animation over P0, P1, P2 and Dir. The line holds the non-speculative value v0 and the update history holds "P0: v1" and "P1: v2"; Sq marks a squash and Cmt marks a commit.]

    • P0’s squash does not squash P1 or P2 because they did not read P0’s data
    • P1’s squash squashes P2 because P2 read P1’s data
    • Squashes never affect non-spec cache lines
    • Lazy value update: merge on commit
    • Final cache state (S: v2): contains P1’s update but not P0’s update

  • Transaction Commit and Squash

    • Each processor in a dependence chain forwards squash and commit signals to successors
    • Transaction commit: merge updates wherever they are
    • Transaction squash: purge updates wherever they are
    • Merges are in commit order; purges are in any order
    • Lazy value update is the key for efficient state recovery
    • Squashing a transaction simply involves purging update history entries
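Merge-on-commit and purge-on-squash can be illustrated with a small single-word model. This is a sketch under assumptions: `SpecWord`, `readers`, and the method names are hypothetical, signal forwarding between processors is reduced to a returned set of readers, and the caller is expected to invoke `commit` in commit order.

```python
class SpecWord:
    def __init__(self, committed):
        self.committed = committed
        self.history = []    # (writer, value) entries, in commit order
        self.readers = {}    # writer -> processors that read its value

    def spec_write(self, proc, value):
        self.history.append((proc, value))

    def spec_read(self, proc):
        if not self.history:
            return self.committed
        writer, value = self.history[-1]
        self.readers.setdefault(writer, set()).add(proc)
        return value

    def commit(self, proc):
        """Lazy value update: merge proc's entries, then drop them."""
        for writer, value in self.history:
            if writer == proc:
                self.committed = value
        self._purge(proc)

    def squash(self, proc):
        """Purge proc's entries; return the readers that must squash too."""
        self._purge(proc)
        return self.readers.pop(proc, set())

    def _purge(self, proc):
        self.history = [(w, v) for w, v in self.history if w != proc]

word = SpecWord("v0")
word.spec_write("P0", "v1")
word.spec_write("P1", "v2")
word.spec_read("P2")           # P2 reads P1's data

print(word.squash("P0"))       # empty set: nobody read P0's data
word.commit("P1")
print(word.committed)          # P1's update is merged, P0's is not
```

Because speculative values never overwrite `committed` before a commit, a squash needs no rollback of cache data; it only deletes history entries.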

  • Multi-word Cache Line

    • Keep update history at WORD granularity ➙ correct merges and purges
    • Record the commit order based on dependences at LINE granularity ➙ HW compatible with coherence protocol

    [Figure: words A and B share one cache line. P0 writes A (history "P0:v0"); P1 writes B (history "P1:v1") and the line moves from P0 to P1, giving commit order P0 then P1; P0 writes A again ("P0:v2") and the line moves back, giving commit order P1 then P0.]

    • Cycle: one of the transactions needs to squash, even if there is no conflict at word granularity
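The split between line-granularity ordering and word-granularity history can be sketched as follows. All names (`SpecLine`, `last_owner`, `has_two_way_cycle`) are assumptions made for this illustration, and the cycle check only covers the two-transaction case shown on the slide.

```python
class SpecLine:
    """A cache line: order is tracked per line, update history per word."""

    def __init__(self):
        self.history = {}        # word name -> [(writer, value), ...]
        self.last_owner = None   # last processor that obtained the line
        self.order = []          # (predecessor, successor) pairs, per line

    def spec_write(self, proc, word, value):
        # Line granularity: a change of owner records a dependence.
        if self.last_owner not in (None, proc):
            self.order.append((self.last_owner, proc))
        self.last_owner = proc
        # Word granularity: the value goes into that word's own history.
        self.history.setdefault(word, []).append((proc, value))

    def has_two_way_cycle(self):
        return any((b, a) in self.order for a, b in self.order)

line = SpecLine()
line.spec_write("P0", "A", "v0")
line.spec_write("P1", "B", "v1")   # order: P0 before P1
line.spec_write("P0", "A", "v2")   # order: P1 before P0, which closes a cycle

print(line.order, line.has_two_way_cycle())
print(line.history)                # no word was written by both processors
```

The per-word histories never mix P0's and P1's values, yet the line-level order still contains a cycle, which is the false-sharing cost the slide points out.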

  • Cycle Detection

    • OmniOrder uses a local, conservative cycle detection policy
    • A cycle is detected by a transaction if:
      • The transaction is the src of a dependence and the dst of another dependence
      • The src dependence occurs before the dst dependence
    • The transaction that detects the cycle squashes
      • It may trigger other squashes following the successors
    • The policy may produce false positives but is simple to implement

    [Figure: transactions T0 to T3 connected by numbered dependences labeled A to H.]
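The local rule above needs only one bit of state per transaction. The sketch below is one possible reading of it, with hypothetical names (`Txn`, `is_src`, `record_dependence`): a transaction that already has a successor and later gains a predecessor reports a cycle.

```python
class Txn:
    def __init__(self, name):
        self.name = name
        self.is_src = False  # True once some other transaction depends on this one

def record_dependence(src, dst):
    """Record src -> dst; return True if dst conservatively detects a cycle."""
    cycle = dst.is_src       # dst was already the src of an earlier dependence
    src.is_src = True
    return cycle

t0, t1, t2 = Txn("T0"), Txn("T1"), Txn("T2")
print(record_dependence(t0, t1))  # False
print(record_dependence(t1, t2))  # False: T1 became dst before it became src
print(record_dependence(t2, t0))  # True: T0 was src first, and the cycle is real

# False positive: the same rule fires on an acyclic chain built "backwards".
a, b, c = Txn("A"), Txn("B"), Txn("C")
print(record_dependence(b, c))    # False
print(record_dependence(a, b))    # True, although A -> B -> C has no cycle
```

The second scenario shows why the policy is called conservative: the check never inspects other transactions, so it cannot tell a true cycle from a chain that merely grew at its head.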

  • HW-Generated Dynamic Transaction

    OmniOrder handles HW-generated transactions like SW-demarcated ones

    [Figure: timeline of P0 and P1 running HW-generated transactions B0 and B1, with accesses rd x, rd z, wr y, and wr z.]

    • Enter a HW transaction to enforce SC
    • Start a HW transaction. It has to be committed according to dependence order

  • Also in the paper ...

    • Detailed specification of speculative reads and writes, commits and squashes
    • Hardware structures
    • Implementation issues

  • Outline

    • Motivation
    • OmniOrder Design
    • Evaluation

  • Evaluation Setup

    • Use simulations of multicores with up to 64 cores
      • Private L1 cache, shared and banked L2 cache with distributed directory
    • SW-demarcated transactions
      • Run STAMP benchmarks
      • Compare OmniOrder (OO) to Squash-On-Conflict (S)
    • HW-generated dynamic transactions
      • Run SPLASH, PARSEC and small concurrent algorithms
      • Compare OO to:
        • InvisiFence with Commit-On-Violation (IF_COV)
        • Release Consistency (RC)

  • SW-Demarcated Transactions

    [Figure: normalized execution time (0 to 1.0), broken down into Useful, Memory, and Squashed, for Bayes, Genome, Intruder, Kmeans, Labyrinth, Ssca2, Vacation, Yada, and the average (Avg), each under S and OO.]

    • OO reduces execution time by over 18% on average relative to squash-on-conflict
      • Reduces squash time by ordering transactions
      • Reduces memory time by avoiding squash-induced cache misses

  • HW-Generated Transactions (Apps)

    • OO performs similarly to RC and IF_COV for most apps because there are few conflicts

    [Figure (labeled SC): normalized execution time (0 to 1.25), broken down into Useful, Memory, and Squashed+Stall, for Barnes, Blackscholes, Cholesky, FFT, Fluidanimate, Fmm, LU, Ocean, Radiosity, Radix, Raytrace, Streamcluster, Swaptions, Volrend, Water-ns, and Water-sp, each under RC, IF_COV, and OO.]

  • HW-Generated Transactions (Conc. Algo.)

    • OO reduces execution time by over 15% on average
    • Concurrent algorithms have conflicts ➙ OO eliminates most of the squashes

    [Figure: normalized execution time (0 to 1.0), broken down into Useful, Memory, and Squashed+Stall, for Dekker, Peterson, Aharr, Harris, Lazylist, Moirbt, Moircas, Ms2, Msn, Mst, Snark, and the average (Avg), each under IF_COV and OO.]

  • Conclusion

    • OmniOrder: first directory-based cache coherence protocol that supports conflict serialization of both SW and HW transactions
    • Key idea:
      • Keep only non-spec data in the caches
      • Keep history of spec updates at word granularity in a buffer
      • On line transfer due to cache coherence, include history of spec updates
    • Coherence protocol transitions are unmodified
    • Reduction in execution time depends on frequency of conflicts:
      • Avg. reduction by 18% for SW transactions
      • Avg. reduction by 15% for HW transactions for apps with conflicts
