Reactive Spin-locks: A Self-tuning Approach


Phuong Hoai Ha

Marina Papatriantafilou

Philippas Tsigas

I-SPAN ’05, Las Vegas, Dec. 7th – 9th, 2005


Outline

• Mutual exclusion
  – Overhead
  – Available reactive spin-locks

• New reactive spin-lock
  – Model
  – Algorithm
  – Evaluation

• Conclusions


Mutual exclusion

• Performance goals:
  – Low latency
  – Low contention
  – …

[Figure labels: Entry section · Critical section · Exit section · Noncritical section]

[Figure labels: Lock released · Requests issued · Arbitration · Lock sent to winner]


Spin-lock categories

• Arbitrating locks:
  – Determine in advance who the next lock-holder is, e.g. ticket-locks, queue-locks.
  – Advantages:
    • Prevent processors from causing bursts in network traffic and high contention on the lock.

• Non-arbitrating locks:
  – E.g. test-and-set locks
  – Advantages:
    • Exploit locality/cache
    • Tolerate failures in the entry section.


Arbitrating vs. non-arbitrating locks

[Figure: processors 1 to 6 attached to an interconnection network]


Available reactive spin-lock algorithms

• Drawbacks:
  – Their reactive schemes rely on
    • Fixed experimental thresholds
      – The thresholds frequently become inappropriate in variable and unpredictable environments like multiprogramming systems
      – E.g. ticket locks with proportional backoff, test-and-test-and-set locks with exponential backoff
    • Known probability distributions of some inputs
      – The assumption is not usually feasible.
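The two fixed-threshold backoff schemes named above can be sketched as follows. This is a minimal illustration, not code from the talk: the function names and the constants BASE, CAP and PER_HOLDER are hypothetical, and they stand for exactly the kind of hand-tuned values that the slide lists as a drawback.

```python
def exponential_backoff(failed_attempts: int, base: float, cap: float) -> float:
    """Delay before the next retry of a test-and-test-and-set lock.

    The delay doubles after every failed attempt and is clipped at `cap`.
    """
    return min(cap, base * (2 ** failed_attempts))


def proportional_backoff(my_ticket: int, now_serving: int, per_holder: float) -> float:
    """Delay for a ticket lock, proportional to the number of waiters ahead."""
    return (my_ticket - now_serving) * per_holder


# Hypothetical hand-tuned constants (arbitrary time units).
BASE, CAP, PER_HOLDER = 1.0, 64.0, 10.0

tts_delays = [exponential_backoff(k, BASE, CAP) for k in range(8)]
ticket_delay = proportional_backoff(my_ticket=7, now_serving=4, per_holder=PER_HOLDER)
```

Both functions depend on constants chosen offline, so a change in the workload (for instance under multiprogramming) leaves them mistuned until someone re-tunes them by hand.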


New reactive spin-lock algorithm

• Ideas
  – A non-arbitrating lock with an adaptive, sensible backoff delay.

• Advantages
  – Its reactive scheme is self-tuning
    • Neither experimentally tuned thresholds nor probability distributions of inputs are needed
  – It combines advantages of both arbitrating and non-arbitrating spin-lock categories.
    • It can exploit locality as well as reduce contention on the lock.


Find sensible backoff delay

• Need to optimize the trade-off between:
  – Latency
    • The interval between a lock release and the next lock acquisition
  – Contention on the lock
• This is an online problem.

[Figure labels: Load on the lock · delay = ?]


Reactive scheme

• Bounds for loads on the lock: $1 \le l_t \le P$
• During a load-rising phase:
  – Increase the delay only when the load on the lock is the highest so far.
  – When increasing the delay, increase it just enough to keep the competitive ratio $c = P - \frac{P-1}{P^{1/(P-1)}}$
• Similar for a load-dropping phase

• In each load-rising/load-dropping phase, the reactive scheme is competitive with competitive ratio $c = \Theta(\ln P)$
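To get a feel for how the ratio grows, the closed form can be evaluated numerically. The sketch below assumes the expression on this slide reads c = P - (P-1)/P^(1/(P-1)); the function name is illustrative. It prints c next to 1 + ln(P), which bounds it from above and shows the logarithmic growth.

```python
import math


def competitive_ratio(P: int) -> float:
    """c = P - (P - 1) / P**(1 / (P - 1)), defined for P >= 2."""
    if P < 2:
        raise ValueError("P must be at least 2")
    return P - (P - 1) / P ** (1.0 / (P - 1))


# Print c next to 1 + ln(P) for a few processor counts.
for P in (2, 4, 8, 16, 28):
    print(P, round(competitive_ratio(P), 3), round(1 + math.log(P), 3))
```

For P = 2 the expression gives exactly 1.5, and for larger P it stays below 1 + ln(P) because exp(-x) >= 1 - x with x = ln(P)/(P-1).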




Algorithm


• The algorithm guarantees mutual exclusion and non-livelock. Its space complexity is log(P).


Evaluation

• Benchmarks
  – Spark98 kernel: lmv
  – SPLASH-2 suite: Volrend and Radiosity

• Representatives:
  – Arbitrating: ticket lock with (tuned) proportional backoff
  – Non-arbitrating: test-and-test-and-set lock with (tuned) exponential backoff

• System
  – A ccNUMA SGI Origin2000 with 28 250 MHz MIPS R10000 processors.


Experimental results

[Figure: Spark98_Complete_Sgi2k_ExecTime. Execution time in ms (axis 0 to 1200) against #processors (1, 4, 8, 12, 16, 20, 24, 28) for the tts, ticket and reactive locks]


Experimental results (2)

[Figure: Volrend_Sgi2k_ExecTime. Execution time in ms (axis 0 to 1000) against #processors (4, 8, 12, 16, 20, 24, 28) for the tts, ticket and reactive locks]


Experimental results (3)

[Figure: Radiosity_Sgi2k_ExecTime. Execution time in ms (axis 0 to 16000) against #processors (4, 8, 12, 16, 20, 24, 28) for the tts, ticket and reactive locks]


Conclusions

• We have designed and implemented a new reactive spin-lock:
  – It is self-tuning.
  – It combines advantages of both arbitrating and non-arbitrating locks.
  – Its reactive scheme is competitive with $c = \Theta(\ln P)$.

• The lock automatically adjusts its backoff delay to a sensible value according to the load on the lock as well as the application.

Thanks for your attention!


Estimate delay bases

• Fairness
  – A fair lock helps a parallel application gain performance, since the application threads can execute their non-critical sections in parallel.
  – Definition:
    $\mathit{fairness}_t = \frac{\sum_i n_i}{N \cdot \max_i n_i}$, where $n_i$ is the number of lock acquisitions of processor $i$ in interval $t$ and $N$ is the number of processors

• Heuristic to estimate basel
  – [Formula: basel as a function of DoCS, a and b], where a, b are system documented constants and DoCS is the delay outside CS
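As a small worked example of the fairness measure, the sketch below assumes the definition on this slide reads fairness_t = (sum of n_i) / (N * max n_i), with n_i the number of lock acquisitions of processor i in interval t. The function name and the handling of an interval with no acquisitions are illustrative choices, not taken from the talk.

```python
def fairness(acquisitions):
    """fairness_t = sum(n_i) / (N * max(n_i)) for one interval t.

    `acquisitions` holds one count per processor, so N = len(acquisitions).
    """
    n_max = max(acquisitions)
    if n_max == 0:
        # Assumption of this sketch: nobody acquired the lock, so nobody was favoured.
        return 1.0
    return sum(acquisitions) / (len(acquisitions) * n_max)


even = fairness([5, 5, 5, 5])     # every processor gets the lock equally often
skewed = fairness([20, 0, 0, 0])  # one processor monopolises the lock
```

Under this reading the measure is 1 when all processors acquire the lock equally often and drops to 1/N when a single processor takes every acquisition.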


NUMA

• Another parameter that makes the problem harder is NUMA
  – Latency differs widely from one processor to another
  – E.g. ccNUMA SGI Origin2000


Model: An online problem

• A sequence of loads on the lock unfolds on the fly.
• When observing a load, the algorithm must decide how much its current backoff delay should be lengthened.
  – If it increases the delay too soon, it wastes time on a long delay when the lock becomes available.
  – If it does not increase the delay in time, it causes high contention on the lock.
• So it must increase the delay sensibly at high loads.

Goal is to maximize $\sum_t \mathit{delay}_t \cdot \mathit{load}_t$, where $\sum_t \mathit{delay}_t \le P$
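The online trade-off can be made concrete with a toy calculation. The sketch assumes the objective is the sum over t of delay_t * load_t under a total delay budget of P. The rule used here, spending half of the remaining budget at every new record-high load, is only a hypothetical placeholder to show the shape of an online policy; it is not the competitive scheme from the talk, and all names are illustrative.

```python
def objective(delays, loads):
    """Sum over t of delay_t * load_t."""
    return sum(d * l for d, l in zip(delays, loads))


def record_high_policy(loads, budget, fraction=0.5):
    """Spend `fraction` of the remaining budget whenever the load is the highest so far."""
    remaining = budget
    highest = float("-inf")
    delays = []
    for load in loads:
        if load > highest:
            highest = load
            spend = remaining * fraction
        else:
            spend = 0.0
        remaining -= spend
        delays.append(spend)
    return delays


loads = [2, 3, 5, 4, 8]           # loads observed one at a time
P = 8                             # total delay budget
online = record_high_policy(loads, budget=P)
online_value = objective(online, loads)
offline_best = P * max(loads)     # an offline player spends everything at the peak
```

Comparing online_value with offline_best shows the gap an online policy pays for not knowing future loads; the competitive ratio bounds that gap in the worst case.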


Algorithm

• LockType: <lock, counter>
• Initial delay = L.counter x basel
• The algorithm guarantees mutual exclusion and non-livelock. Its space complexity is log(P).

Acquire(Lock pL)
    L = FAA(pL.L, <1,1>)
    if L.lock then
        delay = ComputeDelay(L)
        cond = <1,0>
        do
            sleep(delay)
            L = pL.L
            if L.lock then
                delay = ComputeDelay(L)
                continue
            cond = FAA(pL.L, <1,0>)
        while cond.lock

Release(Lock pL)
    do
        L = pL.L
    while not CAS(pL.L, L, <0, L.counter-1>)
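The control flow of Acquire and Release can be tried out with a small emulation. This is a sketch under stated assumptions, not the authors' implementation: Python has no fetch-and-add or compare-and-swap, so the hypothetical AtomicPair class uses a guard mutex to stand in for one atomic <lock, counter> word; BASE_DELAY is a made-up stand-in for basel; and compute_delay only applies the "counter x basel" rule given for the initial delay, since the slides do not spell out ComputeDelay.

```python
import threading
import time


class AtomicPair:
    """Emulates one atomic word holding <lock, counter>."""

    def __init__(self):
        self._guard = threading.Lock()  # stands in for hardware atomicity
        self._value = (0, 0)            # (lock, counter)

    def load(self):
        with self._guard:
            return self._value

    def faa(self, d_lock, d_counter):
        """Fetch-and-add on both fields; returns the previous value."""
        with self._guard:
            old = self._value
            self._value = (old[0] + d_lock, old[1] + d_counter)
            return old

    def cas(self, expected, new):
        """Compare-and-swap; returns True when the swap happened."""
        with self._guard:
            if self._value != expected:
                return False
            self._value = new
            return True


BASE_DELAY = 1e-5  # hypothetical stand-in for basel, in seconds


def compute_delay(word):
    # Placeholder rule: counter x basel, as given for the initial delay.
    return word[1] * BASE_DELAY


def acquire(pl):
    old = pl.faa(1, 1)                  # announce ourselves and try to take the lock
    if old[0]:                          # lock field was non-zero: somebody holds it
        delay = compute_delay(old)
        while True:
            time.sleep(delay)           # back off without touching the lock word
            cur = pl.load()
            if cur[0]:                  # still held: recompute the delay, wait again
                delay = compute_delay(cur)
                continue
            if not pl.faa(1, 0)[0]:     # lock looked free: try to grab it
                break


def release(pl):
    while True:
        cur = pl.load()
        # Clear the lock field and drop our own contribution to the counter.
        if pl.cas(cur, (0, cur[1] - 1)):
            return
```

A caller brackets its critical section with acquire(pl) and release(pl) on a shared AtomicPair. Because a waiter only issues its second fetch-and-add after reading a zero lock field, most of the waiting happens in sleep rather than on the lock word.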
