Improved Single Global Lock Fallback for Best-effort Hardware ...

36
Improved Single Global Lock Fallback for Besteffort Hardware Transactional Memory Irina Calciu Tatiana Shpeisman Gilles Pokam Maurice Herlihy TRANSACT 2014

Transcript of Improved Single Global Lock Fallback for Best-effort Hardware ...

Page 1: Improved Single Global Lock Fallback for Best-effort Hardware ...

Improved  Single  Global  Lock  Fallback  for  Best-­effort  Hardware  

Transactional  Memory

Irina  CalciuTatiana  ShpeismanGilles  PokamMaurice  Herlihy

TRANSACT   2014

Page 2: Improved Single Global Lock Fallback for Best-effort Hardware ...

Multicore  Performance  Scaling

2

Page 3: Improved Single Global Lock Fallback for Best-effort Hardware ...

Intel’s  Haswell TSX: RTM  &  HLE

3

Low  overhead  (cache  based)

IBM’s  Blue  Gene/Q  &  System  Z  &  Power  Architecture

Hardware  Transactional  Memory  (HTM)

Page 4: Improved Single Global Lock Fallback for Best-effort Hardware ...

Haswell RTM

if  (_xbegin()  ==  _XBEGIN_STARTED)

_xend()

Speculate  Execution

Speculate  Execution,  without  any  locks

Read  and  Write  Sets

4

Abort  on  memory  conflict

else

Abort  Handler

Page 5: Improved Single Global Lock Fallback for Best-effort Hardware ...

Haswell RTM

5

_xbegin()

_xend()

Read  X

Write  Y

Add  to  Read  Set

Add  to  Write  Set

_xbegin()

_xend()

Write  X

Write  YAdd  to  Write  Set

Make  the  change  to  Y  visibleCOMMIT

Add  to  Write  SetABORT

if  (_xbegin()  ==  _XBEGIN_STARTED)

_xend()

Speculate  Execution

Page 6: Improved Single Global Lock Fallback for Best-effort Hardware ...

Lock Elision

<HLE_Aquire_Prefix>  Lock(L)

<HLE_Release_Prefix>  Release(L)

Atomic  region  executed  as  a  transaction or  mutually  exclusive on  L

Execute  optimistically,  without  any  locks

Track  Read  and  Write  Sets

6

Abort  on  memory  conflict:  rollback  acquire  lock

Page 7: Improved Single Global Lock Fallback for Best-effort Hardware ...

[Anand Tech]7

Page 8: Improved Single Global Lock Fallback for Best-effort Hardware ...

Best-­effort

OverflowUnsupported  InstructionsInterrupts

Conflicts

8

Small  &  Medium  Transactions

Haswell RTM

Needs  software  fallback

Page 9: Improved Single Global Lock Fallback for Best-effort Hardware ...

Overview

• Best-effort Hardware Transactional Memory

• Lazy SGL

• Bloom Filter SGL

Description

Correctness

Results

9

Page 10: Improved Single Global Lock Fallback for Best-effort Hardware ...

Try_SPEC:Wait until Lock is freeTransactional_Read(Lock)If Lock is taken ABORTSpeculate critical sectionEnd speculation

Single Global Lock HyTM (simple and common)

10

EndHW txn

BeginHW txnRead L

Begin SW txn

Acquire L

Release LEnd

SW txn

On_ABORT:If try_lock(Lock)

Critical sectionRelease(Lock)

Else Try_SPEC

Does not abort!

Page 11: Improved Single Global Lock Fallback for Best-effort Hardware ...

Begin SW txn

Acquire L

Release LEnd

SW txn

BeginHW txnRead L

EndHW txn

(1)

BeginHW txnRead L

EndHW txn

(2)

BeginHW txnRead L

BeginHW txnRead L

EndHW txn

(3) EndHW txn

(4)

XX

X

X

Legend:X = ABORT

Single Global Lock HyTM (simple and common)

Time

11

Page 12: Improved Single Global Lock Fallback for Best-effort Hardware ...

Begin_HW_TXN (L)

End_HW_TXN (L)

CRITICAL  SECTION

Begin_HW_TXN (L)

End_HW_TXN (L)

CRITICAL  SECTION

Begin_HW_TXN (L)

End_HW_TXN (L)

CRITICAL  SECTION

Acquire(L)

Release(L)

CRITICAL  SECTION(SW  TXN)

Begin_HW_TXN (L)

End_HW_TXN (L)

CRITICAL  SECTION

Begin_HW_TXN (L)

End_HW_TXN (L)

CRITICAL  SECTION

Time

Thread  1 Thread  2

Execution   Time  1 12

Page 13: Improved Single Global Lock Fallback for Best-effort Hardware ...

Thread  1 Thread  2

Begin_HW_TXN (L)

End_HW_TXN (L)

CRITICAL  SECTION

Begin_HW_TXN (L)

End_HW_TXN (L)

CRITICAL  SECTION

Acquire(L)

Release(L)

CRITICAL  SECTION(SW  TXN)

Begin_HW_TXN (L)

End_HW_TXN (L)

CRITICAL  SECTION

Begin_HW_TXN (L)

End_HW_TXN (L)

CRITICAL  SECTION

Begin_HW_TXN (L)

End_HW_TXN (L)

CRITICAL  SECTION

Execution   Time  1

Time

Execution   Time  2

13

Page 14: Improved Single Global Lock Fallback for Best-effort Hardware ...

Try_SPEC:Speculate critical sectionTransactional_Read(Lock)If Lock is taken ABORTEnd speculation

Lazy SGL

1414

Begin SW txn

Acquire L

Release LEnd

SW txn

On_ABORT:If try_lock(Lock)

Critical sectionRelease(Lock)

Else Try_SPEC

Does not abort!

Read LEnd

HW txn

BeginHW txn

Page 15: Improved Single Global Lock Fallback for Best-effort Hardware ...

Begin SW txn

Acquire L

Release LEnd

SW txn

BeginHW txn

Read LEnd

HW txn(1)

BeginHW txn

Read LEnd

HW txn(2)

BeginHW txn

BeginHW txn

Read LEnd

HW txn(3)

Read LEnd

HW txn(4)

XX

Legend:X = ABORT

COMMIT

COMMIT

Lazy SGL

Time

15

Page 16: Improved Single Global Lock Fallback for Best-effort Hardware ...

Overview

• Best-effort Hardware Transactional Memory

• Lazy SGL

• Bloom Filter SGL

Description

Correctness

Results

16

Page 17: Improved Single Global Lock Fallback for Best-effort Hardware ...

Transactional Memory Correctness

Transaction 1SW

Transaction 2HW

Time

Order T2 AFTER T1

Order T2 BEFORE T1

COMMIT

COMMIT

17

Page 18: Improved Single Global Lock Fallback for Best-effort Hardware ...

Thread  1(SW)

Acquire Lock…

X = a

Release Lock

TXN_BEGIN

X = b…

TXN_END

Thread  2(HW)

Correct: a Actual: b

Time

Case  1:  HW  begins  SW  begins  HW  ends  SW  ends

X value:

Check Lock

ABORT

18

Page 19: Improved Single Global Lock Fallback for Best-effort Hardware ...

Acquire Lock…

X = a

Release Lock

TXN_BEGIN…

X = b…

TXN_END

Thread  1(SW)

Thread  2(HW)

Case  2:  SW  beginsHW  beginsHW  endsSW  ends

Correct: a Actual: b

Time

Check Lock

ABORT

X value:

19

Page 20: Improved Single Global Lock Fallback for Best-effort Hardware ...

Acquire Lock…

X = a…

Release Lock

TXN_BEGIN…

X = b…

TXN_END

Case  3:  SW  beginsHW  beginsSW  endsHW  ends

Thread  1(SW)

Thread  2(HW)

TimeX value:

Correct: b Actual: b

Check Lock

COMMIT

20

Page 21: Improved Single Global Lock Fallback for Best-effort Hardware ...

Acquire Lock…

X = a…

Release Lock

TXN_BEGIN

X = b…

TXN_END

Case  4:  HW  beginsSW  beginsSW  endsHW  ends

Thread  1(SW)

Thread  2(HW)

TimeX value:

Correct: b Actual: b

Check Lock

COMMIT21

Page 22: Improved Single Global Lock Fallback for Best-effort Hardware ...

22

Thread  1(SW)

X = 5; Y = 6Acquire Lock

…++X

++Y…

Release Lock

TXN_BEGIN

Z = 1/(Y-X)

TXN_END

Thread  2(HW)

Z = 1/0 !!!Time

Hardware  Sandboxing

Page 23: Improved Single Global Lock Fallback for Best-effort Hardware ...

Indirect  Jumps

Thread  1(SW)

X = 5; Y = 6Acquire Lock

…++X

++Y…

Release Lock

_xbegin

if (X == Y) *p = garbagep()

…if (lock) abort_xend

Thread  2(HW)

_xend

Time

23

Page 24: Improved Single Global Lock Fallback for Best-effort Hardware ...

Overview

• Best-effort Hardware Transactional Memory

• Lazy SGL

• Bloom Filter SGL

Description

Correctness

Results

24

Page 25: Improved Single Global Lock Fallback for Best-effort Hardware ...

0

0.5

1

1.5

2

2.5

3

3.5

1 2 4 8

Spee

dup

Threads

Ssca2  (small  txns)

00.51

1.52

2.53

3.54

1 2 4 8

Spee

dup

Threads

Labyrinth  (large  txns)

25

Intruder  (medium  txns)

0

0.5

1

1.5

2

2.5

3

1 2 4 8

Speedup

Threads

TL2SGLHLEE-­SGLL-­SGL

Better

Page 26: Improved Single Global Lock Fallback for Best-effort Hardware ...

Improved  Lock  Acquisition  Rate

26

Vacation  Low  (medium  txns)

Kmeans High  (small  txns)

Intruder  (medium  txns)

Labyrinth  (large  txns)

0

5

10

15

20

25

30

1 2 4 8

%  lock  acquisitions

Threads

0

10

20

30

40

50

60

70

1 2 4 8

%  lock  acquisitions

Threads

051015202530354045

1 2 4 8

%  lock  acquisitions

Threads

HLE

E-­SGL

L-­SGL

0

10

20

30

40

50

60

70

80

1 2 4 8

%  lock  acquisitions

Threads

HLE

E-­SGL

L-­SGL

Better

Page 27: Improved Single Global Lock Fallback for Best-effort Hardware ...

No  single  thread  overhead

27

Slowdown  relative  to  sequential  for  1  thread

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

Slowdown

TL2SGLHLEE-­SGLL-­SGL

Page 28: Improved Single Global Lock Fallback for Best-effort Hardware ...

Overview

• Best-effort Hardware Transactional Memory

• Lazy SGL

• Bloom Filter SGL

Description

Correctness

Results

28

Page 29: Improved Single Global Lock Fallback for Best-effort Hardware ...

Bloom Filters

• Efficient  probabilistic  data  structure  to  compute  fast  set  intersection  

• Can  admit  false  positives

• No  false  negatives

• Used  in  TM  for  Conflict  Detection

29

Page 30: Improved Single Global Lock Fallback for Best-effort Hardware ...

Begin SW txn

Acquire L

Release LEnd

SW txn

BeginHW txn

Read LEnd

HW txn(1)

BeginHW txn

Read LEnd

HW txn(2)

BeginHW txn

BeginHW txn

Read LEnd

HW txn(3)

Read LEnd

HW txn(4)

XX

Legend:X = ABORT

COMMIT

COMMIT

Lazy SGL

Time

30

Page 31: Improved Single Global Lock Fallback for Best-effort Hardware ...

Begin SW txn

Acquire L

Release LEnd

SW txn

BeginHW txn

Check BFEnd

HW txn(1)

BeginHW txn

Check BFEnd

HW txn(2)

BeginHW txn

BeginHW txn

Read LEnd

HW txn(3)

Read LEnd

HW txn(4)

Legend:X = ABORT

COMMIT

COMMIT

BF SGL

Time

31

Page 32: Improved Single Global Lock Fallback for Best-effort Hardware ...

Thread  1(SW)

Acquire Lock…

X = a

Release Lock

TXN_BEGIN

X = b…

TXN_END

Thread  2(HW)

Time

Case  1:  HW  begins  SW  begins  HW  ends  SW  ends

X value:a

Correct: a Actual: a

Check BF

If BFs intersect: ABORTElse: COMMIT

32

Page 33: Improved Single Global Lock Fallback for Best-effort Hardware ...

Acquire Lock…

X = a

Release Lock

TXN_BEGIN…

X = b…

TXN_END

Thread  1(SW)

Thread  2(HW)

Case  2:  SW  beginsHW  beginsHW  endsSW  ends

Correct: a Actual: b

TimeX value:

Check BF

If BFs intersect: ABORTElse: COMMIT 33

Page 34: Improved Single Global Lock Fallback for Best-effort Hardware ...

Conclusions

• HTMs  are  becoming  more  available

• Best-­effort  – need  software  fallback

• Eager  SGL• simple  and  fast  fallback,  • often  preferred  to  more  efficient  solutions

34

Page 35: Improved Single Global Lock Fallback for Best-effort Hardware ...

Conclusions

• Lazy  SGL  • as  simple  as  Eager  SGL• more  efficient

• Bloom  Filter  SGL  • more  accurate  conflict  detection• Slower

• Can  be  implemented  directly  in  hardware

35

Page 36: Improved Single Global Lock Fallback for Best-effort Hardware ...