Calvin: Deterministic or Not? Free Will to Choose

Derek R. Hower, Polina Dudnik, Mark D. Hill, David A. Wood


Page 2: Executive Summary

• Determinism Valuable:
– Same inputs → same multithreaded execution
– Debugging, fault tolerance, security
• Performance Required:
– Slow & deterministic is not enough
• Propose: Calvin
– Leverages Total Store Order (TSO) in hardware to deterministically order memory operations
• Multiple modes w/o speculation
– Deterministic: 20% slowdown (vs. software 1-11X)
– Conventional: 8% slowdown

Determinism @ Good Performance

Page 3: Outline

• Motivation & Goals
• Model
• Implementation
• Evaluation
• Conclusion
• Related Work (optional)

Page 4: Want Deterministic Execution

Bug: unprotected account update

Two threads each run the same unsynchronized code on a shared account:

if (account >= sum)
    account -= sum;

Values of account shown for this run: 100, 0, 0, 0. One thread finishes its check and update before the other thread's check, so the second update is skipped.

Page 5: Want Deterministic Execution

Bug: unprotected account update

Same code, different interleaving. Values of account shown for this run: 100, 100, 0, -100. Both threads pass the check before either one updates, so both subtract and the account goes negative.
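The two runs above can be replayed in a few lines. This is only a sketch: the `run` helper, the schedule encoding, and the choice of 100 for both the starting balance and the amount are assumptions made for illustration, not something the slides specify beyond the values shown.

```python
def run(schedule, account=100, amount=100):
    """Replay one interleaving of two threads that each execute
    `if (account >= sum) account -= sum;` without synchronization.

    `schedule` is a list of (thread_id, step) pairs, where step is
    "check" (the if test) or "update" (the subtraction).
    """
    passed = {}  # per-thread outcome of the unprotected check
    for tid, step in schedule:
        if step == "check":
            passed[tid] = account >= amount
        elif step == "update" and passed.get(tid):
            account -= amount
    return account


# Thread 0 finishes before thread 1 starts.
serial = [(0, "check"), (0, "update"), (1, "check"), (1, "update")]
# Both threads pass the check before either one updates.
racy = [(0, "check"), (1, "check"), (0, "update"), (1, "update")]

print(run(serial))  # 0
print(run(racy))    # -100
```

The same inputs give two different final balances depending only on the interleaving, which is the nondeterminism the talk wants to remove.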

Page 6: Specific Goals

• Strong Determinism:
– Make no assumptions about program behavior
– Help debug racy programs
• Performance:
– Small enough overhead to be on all the time
• Compatibility:
– Complex speculative cores
– Non-speculative cores

[Figure labels: Strong Determinism, Performance, Compatibility]


Page 8: Calvin: The Big Picture

[Figure: Proc 0 and Proc 1 issue loads and stores (Load A, Load C, Store B, Store D, Load D, Store B, Store A, Load A) that are placed into a single memory order.]

Page 9: Recall Total Store Order (TSO)…

• TSO is a relaxed memory model
• Key point: write completion can be delayed

[Figure: animation of processor 0 executing ST A <- 1, R1 <- LD B, R2 <- LD A against the memory order; figure labels include "PC -> local buffering".]
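A minimal sketch of the delayed write completion that TSO allows, using the same three instructions as the slide. The `TsoCore` class, its method names, and the initial values of 0 are assumptions made for illustration; real hardware drains the store buffer on its own schedule.

```python
from collections import deque


class TsoCore:
    """Toy TSO core: stores wait in a FIFO buffer before reaching memory."""

    def __init__(self, memory):
        self.memory = memory
        self.store_buffer = deque()  # oldest store at the left

    def store(self, addr, value):
        # The store retires locally; other cores cannot see it yet.
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # A core always sees its own newest buffered store to addr.
        for buffered_addr, value in reversed(self.store_buffer):
            if buffered_addr == addr:
                return value
        return self.memory.get(addr, 0)

    def drain_one(self):
        # Write completion: the oldest store becomes globally visible.
        if self.store_buffer:
            addr, value = self.store_buffer.popleft()
            self.memory[addr] = value


memory = {"A": 0, "B": 0}
p0 = TsoCore(memory)
p0.store("A", 1)              # ST A <- 1
r1 = p0.load("B")             # R1 <- LD B, performed before the store completes
r2 = p0.load("A")             # R2 <- LD A, forwarded from the store buffer
visible_before = memory["A"]  # other cores still see the old value
p0.drain_one()
visible_after = memory["A"]
```

Here the load of B enters the memory order ahead of the earlier store to A, which is the reordering Calvin builds on.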

Page 10: Calvin Model: One Interleaving

1) all loads before all stores (execute)
2) all stores in processor order (publish)

[Figure: Proc 0 and Proc 1, each with a buffer. The loads and stores from the Big Picture slide (Load A, Load C, Store B, Store D, Load D, Store B, Store A, Load A) are arranged in the memory order so that loads fall in the execute phase and stores in the publish phase.]
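The two rules can be sketched as a single function that runs one execute phase and one publish phase. Everything here is illustrative: the op encoding, the `run_stratum` name, the example programs, and the reading of "processor order" as ascending processor id are assumptions. Loads forward from the processor's own buffered stores, as TSO requires.

```python
def run_stratum(memory, programs):
    """Run one execute phase, then one publish phase.

    programs[i] is processor i's list of ops:
      ("ld", reg, addr)   load addr into register reg
      ("st", addr, value) buffer a store of value to addr
    Returns one register dict per processor; memory is updated in place.
    """
    snapshot = dict(memory)  # what every load sees from other processors
    buffers, registers = [], []

    # Execute: loads read the snapshot (or the processor's own buffer),
    # and stores stay private, so processors cannot affect each other.
    for ops in programs:
        buf, regs = {}, {}
        for op in ops:
            if op[0] == "st":
                _, addr, value = op
                buf[addr] = value
            else:
                _, reg, addr = op
                regs[reg] = buf[addr] if addr in buf else snapshot.get(addr, 0)
        buffers.append(buf)
        registers.append(regs)

    # Publish: buffers are written out one processor at a time, by id.
    for buf in buffers:
        memory.update(buf)
    return registers


memory = {"A": 0, "B": 0}
programs = [
    [("st", "A", 1), ("ld", "r0", "B")],                   # processor 0
    [("st", "B", 2), ("ld", "r1", "A"), ("st", "A", 3)],   # processor 1
]
regs = run_stratum(memory, programs)
```

Because no load can observe another processor's store from the same phase, the register values and the final memory image depend only on the programs and the starting memory, not on timing.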

Page 11: Calvin Model: Reduce Scope

• Temporally divide multithreaded execution into global strata

[Figure: timeline for Processor 0 and Processor 1 divided into Stratum S and Stratum S + 1. Each stratum starts at "Begin Stratum", runs an execute phase (loads) followed by a publish phase (stores), and finishes at "End Stratum and Synchronize".]

Page 12: Stratum Termination Function (3 Modes)

1. Unbounded deterministic:
– determinism → architectural events only, e.g. instructions
– (#instructions == threshold) OR (synchronization)
2. Conventional:
– performance → reduce load imbalance, e.g. cycle count
– (#cycles == threshold) OR (synchronization)
3. Bounded deterministic:
– determinism → architectural events only, e.g. instructions
– (#instructions == threshold) OR (synchronization) OR (resource exhaustion)
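The three termination conditions can be written as one predicate. This is a sketch only: the `Mode` enum, the `stratum_ends` name, and its parameters are hypothetical, and the counters are assumed to restart from zero at the beginning of each stratum.

```python
from enum import Enum


class Mode(Enum):
    UNBOUNDED_DETERMINISTIC = 1
    CONVENTIONAL = 2
    BOUNDED_DETERMINISTIC = 3


def stratum_ends(mode, instructions, cycles, threshold,
                 at_synchronization, resources_exhausted):
    """Return True if the current stratum should end now."""
    if at_synchronization:
        return True  # every mode ends the stratum on synchronization
    if mode is Mode.CONVENTIONAL:
        # Cycle counts are not architectural, so this mode gives up
        # determinism in exchange for better load balance.
        return cycles == threshold
    if mode is Mode.BOUNDED_DETERMINISTIC and resources_exhausted:
        return True
    # Both deterministic modes count architectural events (instructions).
    return instructions == threshold
```

For example, `stratum_ends(Mode.UNBOUNDED_DETERMINISTIC, 10, 500, 1000, False, True)` is false, because the unbounded mode ignores resource exhaustion, while the same call with `Mode.BOUNDED_DETERMINISTIC` is true.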

Page 13: Outline

• Motivation & Goals
• Model
• Implementation
– Write Cache
– MIST Protocol
– Stratum Size Predictor
• Evaluation
• Conclusion
• Related Work (optional)

Page 14: Implementation: Overview

• Implementation Challenges:
– Stratification → load imbalance due to barriers
– Buffering → conventional store buffers do not scale
– Ordering → serial flush is sloooooooow
• Calvin-MIST Implementation:
– Store buffers → unordered write cache
– Load imbalance → Stratum Size Predictor (in paper)
– Fast flush → MIST coherence protocol

Page 15: Unordered Write Cache

• Behavior:
– drops program store ordering
– coalesces stores
– prohibits loads in publish phase
• Replacements/overflow:
1. End stratum
– Bounded Deterministic Mode
– Repeatable only on same HW
2. Log (TM-like)
– Unbounded Deterministic Mode
– Repeatable on any HW

[Figure: Proc 0 and Proc 1 perform loads (Load A, Load C, Load B, Load A) in the execute phase; their buffered stores (Store B, Store D, Store A, Store D) are published with an atomic flush.]
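A small sketch of the write cache behavior listed above, covering only the "end stratum" overflow policy. The `WriteCache` class and its methods are hypothetical, the cache is modeled as fully associative (the evaluated configuration is 64 entry, 8 way), and addresses stand in for whatever granularity the hardware tracks.

```python
class WriteCache:
    """Toy unordered write cache for one processor."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # addr -> latest value; program order is dropped

    def store(self, addr, value):
        """Buffer a store. Returns False on overflow, in which case the
        caller is expected to end the stratum and retry the store."""
        if addr not in self.entries and len(self.entries) >= self.capacity:
            return False
        self.entries[addr] = value  # repeated stores to addr coalesce
        return True

    def flush(self, memory):
        """Publish phase: write every buffered value to memory."""
        memory.update(self.entries)
        self.entries.clear()


memory = {}
cache = WriteCache(capacity=2)
ok_a1 = cache.store("A", 1)
ok_a2 = cache.store("A", 2)   # coalesces with the earlier store to A
ok_b = cache.store("B", 3)
ok_c = cache.store("C", 4)    # no free entry: overflow
cache.flush(memory)           # stratum ends, cache is published
ok_c_retry = cache.store("C", 4)
```

Ending the stratum on overflow ties stratum boundaries to the cache size, which is why the slide notes that this mode is repeatable only on the same hardware.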

Page 16: MIST Protocol

• Goal: speed up publish phase
– delayed "timebomb" invalidate (in paper)
– write caches flush in parallel

[Figure: Proc 0 and Proc 1 perform loads (Load A, Load C, Load B, Load A) in the execute phase and publish their stores (Store B, Store D, Store A, Store D) in the publish phase.]


Page 18: Evaluation Methodology

• Infrastructure
– Bochs
– GEMS
• Workloads
– Parsec
– Mantevo

Parameter          | Base               | Calvin-MIST
Cores              | 8, 2.0 GHz in-order pipelined (both)
Write Cache        | N/A                | 64 entry, 8 way
L1 Cache           | Private, Split L1 I&D, 32K 8-way, 1 cycle (both)
Coherence Protocol | Conventional MOESI | Multiple Writer MIST
Barrier            | N/A                | 16 cycle latency
L2 Cache           | Shared, 8MB, 16-way, 8 banks, 12 cycles (both)
Directory          | Distributed at the L2 banks (both)

Page 19: Unbounded Deterministic Mode

[Figure: bar chart of normalized execution time (y-axis 0 to 2.5); legend: UD, BD, C, publish. Annotations: "~20% slowdown", "fine-grained locking", "frequent overflow".]

Page 20: Bounded Deterministic Mode

[Figure: bar chart of normalized execution time (y-axis 0 to 2.5); legend: phase2, UD, BD, C, publish. Annotations: "~20%", "simpler HW", "better stratum size".]

Page 21: Conventional Mode

[Figure: bar chart of normalized execution time (y-axis 0 to 2.5); legend: log, phase2, UD, BD, C, publish. Annotations: "~8% slowdown", "bad stratum size".]


Page 23: Conclusion

• Determinism Valuable:
– Same inputs → same multithreaded execution
– Debugging, fault tolerance, security
• Performance Required:
– Uninteresting to be slow & deterministic
• Propose: Calvin
– Leverages TSO in hardware to deterministically order memory operations
• Multiple modes w/o speculation
– Deterministic: 20% slowdown
– Conventional: 8% slowdown

Determinism @ Good Performance


Page 25: Related Work

• DMP [Devietti, J. et al., ASPLOS '09]
– First hardware solution for strong determinism
– Good performance through TM-like speculation
– Calvin seeks good performance with less speculation (power?)
• Kendo [Olszewski, M. et al., ASPLOS '09]
– First software solution for weak determinism
– Good performance, but not as general (e.g., debugging data races)
– Calvin seeks good performance for strong determinism
• CoreDet [Bergan, T. et al., ASPLOS '10]
– First software solution for strong determinism
– Exploits relaxed model, e.g., TSO with software store buffer
– Performance left room for improvement
– Calvin implements similar ideas in hardware to be fast

Page 26: Questions?

Page 27: Backup Slides Follow

Page 28: Calvin Model

• Deterministically order memory operations within stratum
• All loads before all stores
• All stores are ordered by processor

[Figure: Stratum S with execute and publish phases against the memory order.
processor 0: ST A <- 1; R2 <- LD A; R1 <- LD B; ST A <- 2
processor 1: ST B <- 3; R0 <- LD A
Buffer contents shown: A = 1, A = 2, B = 3. Register values shown: R0 = 2, R1 = 1, R2 = 0.]

Page 29: Coherence Protocol

• Write-back protocol
• Allows parallel write cache flush
• Allows fast reader invalidate

# states       | MIST | MESI | MOESI
Stable @ L1    | 6    | 4    | 7
Transient @ L1 | 12   | 6    | 8
Stable @ L2    | 5    | 3    | 13
Transient @ L2 | 17   | 14   | 46
Total          | 40   | 27   | 74

Page 30: L1 Cache States

State | Meaning | Global Invariant
I     | Not Present/Invalid | 0 or more readers, 0 or more writers
S     | Read permission, no other writers in the system | 1 or more readers, 0 writers
M     | Write permission, didn't write in current stratum | 0 readers, 1 writer
Ts    | Read permission until the end of the stratum | 1 or more readers, 1 or more writers
Mw    | Write permission, wrote in current stratum | 0 readers, 1 writer
MMw   | Write permission until the end of the stratum | 2 or more writers, 0 or more readers

Page 31: Directory States

State | Meaning | Global Invariant | Valid Copy @
I     | Not Present/Invalid | 0 readers, 0 writers | Memory
S     | One or more readers | 1 or more readers, 0 writers | L2 Cache
M     | Only one writer | 0 or more readers, 1 writer | Processor
MM    | No readers/writers | 0 readers, 0 writers | L2 Cache
MS    | Multiple writers | 0 or more readers, 1 or more writers | L2 Cache

Page 32: Stratum Size Predictor

• Stratum Size Predictor:
– optimizes stratum size
– adapts to load imbalance
• Large stratum:
– reduces instruction mix variability
• Small stratum:
– adapts to synchronization

[Figure: Proc 0 and Proc 1 timelines.]

Page 33: Reader Self-Invalidation

[Figure: timeline (execute phase, then publish phase) for the L1 caches of Processor 0 and Processor 1 and for the L2 cache. Block B starts as Shared in both L1 caches and in the L2 cache; after a load (LD) and a store intent (ST Intent), later steps show B as Shared in one location and Modified in the others.]

Page 34: Predictor

[Figure: flowchart. Nodes: "Stratum Ends" (shown twice), "MemBar?", "C&BD: Overflow?", "Saturated?", "Decrement Predictor", "Increment Predictor", "Size*2", "Size/2". Edge labels: "Yes", "No", "Yes/Low", "Yes/High".]

Page 35: Predictor Helps Improve Performance

[Figure: bar chart of speedup (y-axis -0.1 to 0.15) per benchmark: beam, blck, bdtr, dedup, epetra, fluid, freq, hpccg, minimd, phpccg, ray, swap, vips, x264, mean; legend: C, BD, UD.]

Page 36: Write Cache Size Affects Performance

[Figure: bar chart of normalized execution time (y-axis 0 to 2.5); legend: log, phase2, 64E_8W, 32E_8W, 16E_8W.]

Page 37: Bottom Line

[Figure: bar chart of normalized execution time (y-axis 0 to 2.5); legend: log, phase2, UD, BD, C, publish; group label: Mantevo.]

Page 38: Calvin-MIST Operation

Page 39: Example Protocol Operation
Page 41: Atomic Operations

• Ensure that only one atomic operation executes per stratum
• Logically place the atomic operation at the end of the stratum
• Terminate stratum on atomic operation
• Execute both R and W parts of RMW as processor's last store
• Allows processors to communicate within a stratum

Page 42: Multi-Writer Example

[Figure: Core 1 and Core 2, each with an L1 cache and a write cache, above a shared L2 cache, shown during the execution phase and the publish phase. Message labels: FWD, FWD, ACK, NACK, ACK.]

Page 43: Atomic Operations

• TSO atomic ordering rules:
1) All previous loads and stores
2) Atomic (both load and store portion)
3) All subsequent loads and stores
• Calvin satisfies rules by:
1) Ending strata on atomics
2) Executing atomic op entirely in publish phase
3) Executing next instruction in next stratum

Page 44: Atomic Example

[Figure: Proc 0 and Proc 1 operations placed in the memory order: Load A, Load A, Store A, Store L, Load C, Store C, Store B, Load B, RMW L, Load A, Store C. One processor is marked "Stall".]

Page 45: Deterministic Input

• Program's repeatability depends on deterministic input
• Input:
– Use mechanisms from uniprocessor deterministic replay, e.g.:
  • ReVirt
  • VMware Replay
  • FDR
• Interrupts:
– Delivered only on strata boundaries
  • Makes for easy logging (e.g., <vector #, strata #>)
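Delivering interrupts only at strata boundaries means a log of <vector #, strata #> pairs is enough to replay them. A minimal sketch follows; the function names and the example vector numbers are hypothetical, and delivery is assumed to happen at the boundary that ends the logged stratum.

```python
from collections import defaultdict


def build_replay_schedule(log):
    """Group logged (vector, stratum) pairs by stratum number."""
    schedule = defaultdict(list)
    for vector, stratum in log:
        schedule[stratum].append(vector)
    return schedule


def replay(num_strata, log):
    """Re-deliver logged interrupts at the same strata boundaries."""
    schedule = build_replay_schedule(log)
    delivered = []
    for stratum in range(num_strata):
        # The execute and publish phases of this stratum would run here.
        for vector in schedule.get(stratum, []):
            delivered.append((stratum, vector))
    return delivered


log = [(32, 1), (14, 1), (33, 3)]  # recorded during the original run
delivered = replay(4, log)
```

No cycle count or instruction pointer needs to be recorded, because within a deterministic execution the stratum number already identifies a unique point in the program.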

Page 46: Conventional Mode Slowdown

• Sources:
– Barrier latency (16 cycle)
  • Results indicate 4 cycle barrier largely eliminates overhead
– Load imbalance
  • Especially in presence of fine-grained communication
– Slow inter-thread communication
  • Threads cannot communicate within a stratum

Page 47: With Average Stratum Size

[Figure: bar chart (y-axis 0 to 2.5) per benchmark: beam, blck, bdtr, dedup, epetra, fluid, freq, hpccg, minimd, phpccg, ray, swap, vips, x264, mean; each bar is annotated with its average stratum size; the mean bars are labeled 1.03, 1.16, and 1.17; legend: log, phase2, UD, BD, C.]