Page 1:

Reduced Hardware NOrec: A Safe and Scalable Hybrid Transactional Memory

Alexander Matveev, Nir Shavit

MIT

Page 2:

Good: Hardware Transactional Memory (HTM)

Bad: The HTM is “best-effort”

HTM may always fail due to:
1. L1 cache capacity
2. Interrupt
3. Unsupported instruction

To ensure progress, we need a software fallback
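The retry-then-fallback pattern implied by a best-effort HTM can be sketched as follows. This is a minimal single-threaded simulation in Python, not real HTM code: `HtmAbort`, `run_with_fallback`, `make_flaky`, and the retry budget of 3 are hypothetical names and values chosen for illustration.

```python
class HtmAbort(Exception):
    """Raised by the simulated hardware path when a transaction fails."""


def run_with_fallback(htm_attempt, software_fallback, max_retries=3):
    """Try the best-effort hardware path, then fall back to software.

    Returns (result, path), where path is "htm" or "software".
    """
    for _ in range(max_retries):
        try:
            return htm_attempt(), "htm"
        except HtmAbort:
            continue  # e.g. cache capacity, interrupt, unsupported instruction
    # The hardware path gives no progress guarantee, so software must finish.
    return software_fallback(), "software"


def make_flaky(fail_times, value):
    """Hypothetical helper: an attempt that aborts fail_times times, then succeeds."""
    remaining = [fail_times]

    def attempt():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise HtmAbort("simulated abort")
        return value

    return attempt
```

An attempt that aborts fewer times than the retry budget completes on the hardware path; one that keeps aborting ends up on the software path, which is the only path that guarantees progress.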

Page 3:

A Possible Solution is: Lock Elision

Each thread replaces its Lock/Unlock pair with a hardware transaction:

Thread 1:
1. HTM Start
2. Read lock and check it is free
3. ... code …
4. HTM Commit

Thread 2:
1. HTM Start
2. Read lock and check it is free
3. ... code …
4. HTM Commit

No conflict: HTMs commit concurrently

Page 4:

A Possible Solution is: Lock Elision

Thread 1:
1. HTM Start
2. Read lock and check it is free
3. ... code …
4. ... CONFLICT … HTM Restart
Wait for Lock

Thread 2:
1. HTM Start
2. Read lock and check it is free
3. ... code …
4. ... CONFLICT … HTM Restart
Wait for Lock

Thread 3:
1. HTM Start
2. Read lock and check it is free
3. ... code … FAIL … HTM Restart
Then, in software:
1. Acquire Lock
2. ... code …
3. Release Lock

No concurrency between hardware and software

Page 5:

A Possible Solution is: Lock Elision

• Good:
– Simple: No need to instrument reads and writes
• Bad:
– Serial fallback: A software fallback grabs the global lock and aborts all hardware transactions
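The lock-elision behavior described on these slides can be sketched with a small simulation. It is a hedged model, not real HTM: real hardware aborts a subscribed transaction as soon as the lock word is written, while this sketch only detects the conflict at commit time. `ElidedLock`, `epoch`, and `aborts` are hypothetical names.

```python
class HtmAbort(Exception):
    """Raised when the simulated hardware transaction must restart."""


class ElidedLock:
    def __init__(self):
        self.held = False
        self.epoch = 0  # bumped on every acquire; stands in for the lock's cache line

    def htm_start(self):
        """Steps 1-2: start a transaction, read the lock and check it is free."""
        if self.held:
            raise HtmAbort("lock is held: wait for lock")
        return self.epoch  # snapshot of the subscribed lock word

    def htm_commit(self, snapshot):
        """Step 4: commit only if no fallback acquired the lock meanwhile."""
        if self.held or self.epoch != snapshot:
            raise HtmAbort("conflict on the lock word")

    def acquire(self):
        """Software fallback: taking the lock invalidates every elided transaction."""
        if self.held:
            raise RuntimeError("lock already held")
        self.held = True
        self.epoch += 1

    def release(self):
        self.held = False


def aborts(fn, *args):
    """Hypothetical helper: True if calling fn(*args) raises HtmAbort."""
    try:
        fn(*args)
    except HtmAbort:
        return True
    return False
```

Two transactions that start and commit while the lock stays free both succeed. Once a fallback calls `acquire()`, every transaction that started earlier fails at commit and new ones cannot start until `release()`, which is the serial-fallback cost listed under "Bad".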

Page 6:

Another Approach is: Hybrid Transactional Memory

Thread 1:
1. HTM Start
2. Read lock and check it is free
3. ... code …
4. ... more code …

Thread 2:
1. HTM Start
2. Read lock and check it is free
3. ... code …
4. ... more code

Thread 3:
1. HTM Start
2. Read lock and check it is free
3. ... code … FAIL … HTM Restart
Then, in software:
1. STM Start
2. ... code …
3. … more code …

STM and HTM execute concurrently

Page 7:

Another Approach is: Hybrid Transactional Memory

• Good:
– Hardware-Software Concurrency
• Bad:
– Complex:
1. Hard to coordinate hardware and software (our focus)
2. Hard to apply to code due to instrumentation (GCC C/C++ TM helps here a lot)

Page 8:

Hybrid TM History

• 2006: First Hybrid TM [Damron, Fedorova, Lev, Luchangco, Moir, Nussbaum]
– Key Idea: Use per-location metadata version-locks to coordinate hardware and software
• Bad:
– Hardware is slow: each read/write must read the version-lock and execute a branch condition check
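The per-access cost of version-lock metadata can be illustrated with a small simulation. This is a loose sketch of the idea only, not the published 2006 algorithm: `VersionLockedMemory`, `metadata_checks`, and the other names are hypothetical.

```python
class HtmAbort(Exception):
    """Raised when the simulated hardware transaction must abort."""


class VersionLockedMemory:
    def __init__(self, size):
        self.data = [0] * size
        # One version-lock per location, stored as two parallel arrays.
        self.version = [0] * size
        self.locked = [False] * size
        self.metadata_checks = 0  # extra work paid on the hardware path

    def _check(self, addr):
        # Every hardware access reads the version-lock and branches on it.
        self.metadata_checks += 1
        if self.locked[addr]:
            raise HtmAbort("location %d is locked by software" % addr)

    def hw_read(self, addr):
        self._check(addr)
        return self.data[addr]

    def hw_write(self, addr, value):
        self._check(addr)
        self.data[addr] = value
        self.version[addr] += 1  # lets software readers notice the change

    def sw_lock(self, addr):
        self.locked[addr] = True

    def sw_unlock(self, addr):
        self.version[addr] += 1
        self.locked[addr] = False


def aborts(fn, *args):
    """Hypothetical helper: True if calling fn(*args) raises HtmAbort."""
    try:
        fn(*args)
    except HtmAbort:
        return True
    return False
```

The counter grows by one for every hardware read or write, including accesses to locations that no software transaction ever touches. That fixed overhead is the slowdown the slide attributes to this design.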

Page 9:

Hybrid TM History

• 2007: Phased TM [Lev, Moir, Nussbaum]
– Key Idea: Use HTM mode or STM mode, but not HTM and STM at the same time
• Bad:
– Expensive to switch modes: a single fallback must stop all hardware transactions
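The mode-switch cost can be made concrete with a toy model. This is a sketch under simplifying assumptions (one global mode flag, thread ids as plain integers) and is not the published Phased TM design; `PhasedTM` and its methods are hypothetical names.

```python
class PhasedTM:
    def __init__(self):
        self.mode = "HTM"       # the whole system is in exactly one mode
        self.active_hw = set()  # ids of running hardware transactions
        self.aborted = []       # ids stopped by a mode switch

    def hw_begin(self, tid):
        """A hardware transaction may only start while in HTM mode."""
        if self.mode != "HTM":
            return False
        self.active_hw.add(tid)
        return True

    def hw_commit(self, tid):
        """Commit fails if a mode switch stopped this transaction."""
        if tid not in self.active_hw:
            return False
        self.active_hw.remove(tid)
        return True

    def switch_to_stm(self):
        """A single fallback flips the mode and stops all hardware transactions."""
        self.mode = "STM"
        self.aborted.extend(sorted(self.active_hw))
        self.active_hw.clear()

    def switch_to_htm(self):
        self.mode = "HTM"
```

One call to `switch_to_stm()` empties `active_hw` regardless of whether those transactions conflicted with the fallback, which is why a single slow transaction is expensive for everyone.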

Page 10:

Hybrid TM History

• 2011: Hybrid NOrec (state-of-the-art) [Dalessandro, Carouge, White, Lev, Moir, Scott, Spear]
– Key Idea: No metadata + global clock for coordination

Page 11:

Hybrid NOrec

• Good:
– No metadata: Efficient for low concurrency
• Bad:
– Limited scalability: too many aborts due to global clock updates
  • A software write must abort all hardware transactions
  • A hardware write must abort all software transactions

Page 12:

Hybrid NOrec

Fast-Path: Hardware
– Read clock
– Read X (pure)
– X = 4
– Update clock
– ABORT when a software transaction locks the clock

Slow-Path: Software
– Read clock
– Read X (verify clock)
– Lock clock
– Unlock clock
– RESTART when a hardware transaction updates the clock

Read X: check clock => changed => restart/revalidate
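The "check clock => changed => restart/revalidate" rule on the software slow path can be sketched as below. It is a single-threaded simulation that assumes value-based revalidation and omits the clock's lock bit; `Shared`, `SoftwareTx`, `hardware_commit`, and `restarts` are hypothetical names.

```python
class Restart(Exception):
    """The software transaction saw an inconsistent state and must restart."""


class Shared:
    def __init__(self, **values):
        self.mem = dict(values)
        self.clock = 0  # global clock; the lock bit is omitted in this sketch


def hardware_commit(shared, writes):
    """Simulated fast path: apply the writes and update the clock as one step."""
    shared.mem.update(writes)
    shared.clock += 1


class SoftwareTx:
    def __init__(self, shared):
        self.shared = shared
        self.snapshot = shared.clock  # Read clock
        self.read_set = {}            # address -> value observed
        self.revalidations = 0

    def read(self, addr):
        value = self.shared.mem[addr]
        while self.snapshot != self.shared.clock:  # check clock => changed
            self.snapshot = self._revalidate()
            value = self.shared.mem[addr]
        self.read_set[addr] = value
        return value

    def _revalidate(self):
        now = self.shared.clock
        self.revalidations += 1
        for addr, seen in self.read_set.items():
            if self.shared.mem[addr] != seen:
                raise Restart(addr)  # an earlier read is stale
        return now


def restarts(fn, *args):
    """Hypothetical helper: True if calling fn(*args) raises Restart."""
    try:
        fn(*args)
    except Restart:
        return True
    return False
```

Any hardware commit moves the clock, so the next software read pays for a revalidation even when the write touched an unrelated location. A write to a location already in the read set forces a restart.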

Page 13:

A Possible Solution

• 2011: Hybrid NOrec 2 [Riegel, Marlier, Nowack, Felber, Fetzer]
– Key Idea: Use non-speculative reads inside HTM to verify the global clock and avoid unnecessary aborts
• Bad:
– HTM of Intel and IBM has no support for non-speculative reads

Page 14:

A Recent Approach

• 2014: Invyswell Hybrid [Calciu, Gottschlich, Shpeisman, Pokam, Herlihy]
– Key Idea: Allow unsafe concurrency between hardware and software, and use the HTM sandboxing to detect and handle errors

Page 15:

Invyswell

Slow-Path: Software
– Lock clock (NO ABORT of the hardware fast path)
– X = 4 (NEW)
– Y = 8 (NEW)
– Unlock clock
– Update clock

Fast-Path: Hardware
– Read X (NEW)
– Read Y (OLD)
– Func(X, Y): Unsafe. Hopes HTM aborts in the FUTURE

Page 16:

Invyswell

• Good:
– Far fewer aborts than Hybrid NOrec
• Bad:
– Unfortunately, HTM sandboxing may miss errors, so a corrupted transaction may commit and crash the system
– This problem was shown in a recent work: “Pitfalls of Lazy Subscription” by [Dice, Harris, Kogan, Lev, Moir]

Page 17:

Our New Approach

• 2015: RH NOrec [Matveev, Shavit]
– Key Idea: Use a “mixed” fallback path that uses both software and short hardware transactions

Page 18:

RH NOrec

Mixed Slow-Path
– Lock clock
– HTM: X = 4 (HIDDEN), Y = 8 (HIDDEN). Writes are speculative (invisible)
– Unlock clock
– Update clock

Fast-Path: Hardware
– Read X (OLD)
– Read Y (OLD)
– Func(X, Y): Safe!

X and Y both OLD or both NEW, not a mix

For comparison, with a pure software slow path: X = 4 (NEW), Read X (NEW), Read Y (OLD), Func(X, Y): Unsafe, hopes HTM aborts, Y = 8 (NEW)
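The difference between in-place software writes and writes hidden inside a short hardware transaction can be shown with a small simulation. This is an illustrative sketch only: a dictionary stands in for memory, a buffered update stands in for an HTM commit, and `unsafe_writer`, `postfix_writer`, and `snapshot` are hypothetical names.

```python
def snapshot(mem):
    """What a concurrent hardware reader would pass to Func(X, Y)."""
    return (mem["X"], mem["Y"])


def unsafe_writer(mem, observe):
    """Software writes go to memory one by one, so a reader can see a mix."""
    mem["X"] = 4
    seen = observe(mem)  # a hardware reader runs between the two writes
    mem["Y"] = 8
    return seen


def postfix_writer(mem, observe):
    """Writes are buffered (speculative) and become visible together at commit."""
    buffered = {}
    buffered["X"] = 4
    seen = observe(mem)  # the reader still sees only OLD values
    buffered["Y"] = 8
    mem.update(buffered)  # simulated HTM commit: X and Y appear at once
    return seen


old = {"X": 0, "Y": 0}
print(unsafe_writer(dict(old), snapshot))   # (4, 0): X is NEW, Y is OLD
print(postfix_writer(dict(old), snapshot))  # (0, 0): both OLD
```

In the first case the reader observes X NEW with Y OLD, the combination that made Func(X, Y) unsafe. In the second case it can only observe both OLD (before the commit) or both NEW (after it).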

Page 19:

RH NOrec

• Key Point 1: Execute software writes in a short hardware transaction
– No need to abort hardware transactions
– Full safety
• In practice this works well
– Due to the 80:20 rule: a typical operation has 80% reads and 20% writes

Page 20:

RH NOrec

• Key Point 2: Execute a maximal amount of initial software reads in a read-only hardware transaction
– This defers the global clock read and significantly reduces software restarts/revalidations

Page 21:

Fast-Path: Hardware
– HTM start
– …reads/writes…
– Update clock
– HTM commit

Mixed Path
– Read clock
– … reads in software … (verifies clock)
– RESTART

Read some X: check clock => changed => restart/revalidate

Page 22:

Fast-Path: Hardware
– HTM start
– …reads/writes…
– Update clock
– HTM commit

Mixed Path
HTM Prefix:
– HTM start
– …reads in HTM… (pure/direct)
– Read clock
– HTM commit

NO ABORT

Page 23:

Fast-Path: Hardware (shown twice, once alongside each HTM part of the mixed path)
– HTM start
– …reads/writes…
– Update clock
– HTM commit

Mixed Path
HTM Prefix:
– HTM start
– …reads in HTM… (pure/direct)
– Read clock
– HTM commit
Software:
– …reads in software…
HTM Postfix:
– Lock clock
– HTM start
– …writes in HTM…
– HTM commit
– Unlock clock

NO ABORT
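The three stages of the mixed path (HTM prefix, software reads, HTM postfix) can be sketched as one function. This is a single-threaded simulation under stated assumptions, not the authors' implementation: hardware conflict detection is approximated by comparing values, the clock lock is a boolean, and all names (`Shared`, `mixed_slow_path`, `fast_path_commit`, the `during_*` hooks) are hypothetical.

```python
class Restart(Exception):
    """The current attempt must be retried."""


class Shared:
    def __init__(self, **values):
        self.mem = dict(values)
        self.clock = 0
        self.clock_locked = False


def fast_path_commit(shared, writes):
    """Simulated hardware fast path: reads/writes, Update clock, HTM commit."""
    if shared.clock_locked:
        raise Restart("clock is locked by a mixed slow path")
    shared.mem.update(writes)
    shared.clock += 1


def mixed_slow_path(shared, prefix_reads, software_reads, compute_writes,
                    during_prefix=None, during_software=None):
    # HTM Prefix: direct reads, with the clock read deferred to the end.
    values = {addr: shared.mem[addr] for addr in prefix_reads}
    if during_prefix:
        during_prefix(shared)  # a fast path commits while the prefix runs
    if any(shared.mem[a] != v for a, v in values.items()):
        raise Restart("HTM prefix aborted by a conflicting write")
    snapshot = shared.clock    # Read clock, then HTM commit

    # Software reads: each one verifies the clock.
    if during_software:
        during_software(shared)
    for addr in software_reads:
        values[addr] = shared.mem[addr]
        if shared.clock != snapshot:
            raise Restart("clock changed during software reads")

    # HTM Postfix: lock the clock, write speculatively, commit, unlock.
    if shared.clock != snapshot:
        raise Restart("clock changed before it could be locked")
    shared.clock_locked = True
    buffered = dict(compute_writes(values))  # invisible until commit
    shared.mem.update(buffered)              # simulated HTM commit
    shared.clock_locked = False
    shared.clock += 1                        # Update clock
    return values
```

A non-conflicting fast-path commit during the prefix does not restart the mixed path, because the clock has not been read yet. The same commit during the software reads changes the clock after the snapshot and forces a restart, which is why moving as many reads as possible into the prefix reduces restarts.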

Page 24:

Throughput on 8-core Intel (GCC C/C++)

[Figure: two throughput plots with x-axis ticks from 1 to 16, comparing Lock Elision, RH-NORec, TL2 and HY-NORec. Panels: Red-Black Tree (10K), 10% mutations; Red-Black Tree (10K), 40% mutations.]

Page 25:

[Figure: four plots with x-axis ticks from 1 to 16, comparing Lock Elision, RH-NORec, TL2, HY-NORec and NORec. Panels: Vacation Database (STAMP - Low); Intruder Detection (STAMP); Genome Sequencing (STAMP); SSCA2 (STAMP).]

Page 26:

Conclusion

• RH NOrec: a new Hybrid TM that is safe and scalable
• Key Idea: Use a “mixed” fallback path that uses two short hardware transactions:
1. HTM Prefix: Executes a maximal amount of initial reads, which defers the global clock read
2. HTM Postfix: Executes the software writes, which preserves safety and allows hardware-software concurrency

Page 27:

Thank You