Wait-Free Linked-Lists
Shahar Timnat, Anastasia Braginsky, Alex Kogan, Erez Petrank
Technion, Israel
Presented by Shahar Timnat
[Figure: the sorted list -∞ → 4 → 6 → 9 → +∞]
Our Contribution
• A fast, wait-free linked-list
• The first wait-free list fast enough to be used in practice
Agenda
• What is a wait-free linked-list?
• Related work and existing tools
• Wait-free linked-list design
• Performance
Concurrent Data Structures
• Allow several threads to read or modify the data structure simultaneously
• Increasing demand due to highly parallel systems
Progress Guarantees
• Obstruction free: a thread running in isolation makes progress
• Lock free: at least one of the running threads makes progress
• Wait free: every thread that gets the CPU makes progress
Wait-Free Algorithms
• Provide the strongest progress guarantee
• Always desirable, particularly in real-time systems
• Relatively rare
• Hard to design
• Typically slower
The Linked-List Interface
• Following the traditional choice: a sorted, list-based set of integers
• Operations: insert(int x); delete(int x); contains(int x);
[Figure: the sorted list -∞ → 4 → 6 → 9 → +∞, with sentinel nodes at both ends]
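To pin down the semantics of this interface, here is a minimal sequential sketch in Python of a sorted, list-based integer set with the two sentinel nodes from the figure. It is not thread-safe; it only fixes what insert, delete, and contains should return. `SortedListSet`, `Node`, and `keys()` are hypothetical names, and the -∞/+∞ sentinels are modeled with `math.inf`.

```python
import math


class Node:
    def __init__(self, key, nxt=None):
        self.key = key
        self.next = nxt


class SortedListSet:
    """Sequential sorted set of integers; no concurrency control at all."""

    def __init__(self):
        self.tail = Node(math.inf)               # +inf sentinel
        self.head = Node(-math.inf, self.tail)   # -inf sentinel

    def _search(self, x):
        # Return (pred, curr) with pred.key < x <= curr.key; stops at the +inf sentinel.
        pred, curr = self.head, self.head.next
        while curr.key < x:
            pred, curr = curr, curr.next
        return pred, curr

    def insert(self, x):
        pred, curr = self._search(x)
        if curr.key == x:
            return False                          # already present
        pred.next = Node(x, curr)
        return True

    def delete(self, x):
        pred, curr = self._search(x)
        if curr.key != x:
            return False                          # not present
        pred.next = curr.next
        return True

    def contains(self, x):
        _, curr = self._search(x)
        return curr.key == x

    def keys(self):
        out, n = [], self.head.next
        while n is not self.tail:
            out.append(n.key)
            n = n.next
        return out
```

Because both sentinels always exist, every search ends with a valid (pred, curr) pair and no operation needs a special case for an empty list or for the ends.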
Prior Wait-Free Lists
• Only universal constructions
• Non-scalable (by nature?)
• Achieve good complexity, but poor performance
• The state-of-the-art construction (Chuong, Ellen, Ramachandran) significantly underperforms our construction
[Figure: Our wait-free list versus a universal construction. Left panel: operations done in 2 seconds (millions) vs. threads, for WF and Universal. Right panel: ratio vs. threads.]
Linked-Lists with Progress Guarantees
• No practical wait-free linked-lists available
• Lock-free linked-lists exist
• Most notably: Harris's linked-list
Existing Lock-Free List (by Harris)
• Deletion in two steps
• Logical: mark the next field using a CAS
• Physical: remove the node
[Figure: a two-step deletion on the list 4 → 6 → 9]
Existing Lock-Free List (by Harris)
• Use the least significant bit of each next field as a mark bit
• The mark bit signals that a node is logically deleted
• A node's next field cannot be changed (the CAS will fail) once the node is logically deleted
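The two-step deletion can be sketched in Python as follows. Python exposes no hardware CAS or pointer tagging, so this sketch assumes a lock-emulated (reference, mark) pair, `AtomicMarkableRef`, in place of the marked least significant bit; it models the semantics, not lock-free performance. All class and method names are hypothetical.

```python
import math
import threading


class AtomicMarkableRef:
    """(reference, mark bit) pair updated only by compare-and-set.
    A lock makes each call atomic, emulating a CAS on a tagged pointer."""

    def __init__(self, ref, mark=False):
        self._ref, self._mark = ref, mark
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._ref, self._mark

    def compare_and_set(self, exp_ref, new_ref, exp_mark, new_mark):
        with self._lock:
            if self._ref is exp_ref and self._mark == exp_mark:
                self._ref, self._mark = new_ref, new_mark
                return True
            return False


class Node:
    def __init__(self, key, succ=None):
        self.key = key
        self.next = AtomicMarkableRef(succ)


class HarrisList:
    def __init__(self):
        self.head = Node(-math.inf, Node(math.inf))  # -inf and +inf sentinels

    def _find(self, key):
        """Return (pred, curr) with pred.key < key <= curr.key, snipping marked nodes."""
        while True:
            pred = self.head
            curr, _ = pred.next.get()
            while True:
                succ, marked = curr.next.get()
                if marked:
                    # curr is logically deleted: try to unlink it physically.
                    if not pred.next.compare_and_set(curr, succ, False, False):
                        break  # pred changed or was marked; restart from head
                    curr = succ
                elif curr.key >= key:
                    return pred, curr
                else:
                    pred, curr = curr, succ

    def insert(self, key):
        while True:
            pred, curr = self._find(key)
            if curr.key == key:
                return False
            # Fails if pred.next moved or pred itself was marked meanwhile.
            if pred.next.compare_and_set(curr, Node(key, curr), False, False):
                return True

    def delete(self, key):
        while True:
            pred, curr = self._find(key)
            if curr.key != key:
                return False
            succ, _ = curr.next.get()
            # Step 1 (logical): set the mark bit in curr's next field.
            if not curr.next.compare_and_set(succ, succ, False, True):
                continue  # curr.next changed or was already marked; retry
            # Step 2 (physical): unlink curr; if this CAS fails, a later _find snips it.
            pred.next.compare_and_set(curr, succ, False, False)
            return True

    def keys(self):
        """Unmarked keys in order (a consistent snapshot only when no thread runs)."""
        out = []
        curr, _ = self.head.next.get()
        while curr.key != math.inf:
            succ, marked = curr.next.get()
            if not marked:
                out.append(curr.key)
            curr = succ
        return out
```

Every CAS on a next field expects the mark to be false, so once a node is marked, no insertion can be linked after it; this is the property the slide relies on.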
Help Mechanism
• A common technique for achieving wait-freedom
• Each thread declares the operation it wants to perform in a designated state array
• Many threads may attempt to execute that operation
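The declare-then-help pattern can be sketched as a skeleton. This Python version assumes immutable operation descriptors in a state array of lock-emulated CAS cells, and it sidesteps the hard part by assuming the hypothetical `execute` callback is safe to run many times for the same operation (for a linked list that assumption is exactly what fails, as the following slides show). `OpDesc`, `AtomicRef`, and `HelpingExecutor` are hypothetical names.

```python
import threading
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class OpDesc:
    """Immutable state-array entry; its status changes by CASing in a new object."""
    op: str
    key: int
    pending: bool
    result: Any = None


class AtomicRef:
    """Reference cell with compare-and-set by identity, emulated with a lock."""

    def __init__(self, value: Optional[OpDesc] = None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def set(self, value):
        with self._lock:
            self._value = value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


class HelpingExecutor:
    def __init__(self, n_threads: int, execute: Callable[[str, int], Any]):
        self.state = [AtomicRef() for _ in range(n_threads)]  # one entry per thread
        self.execute = execute  # assumed safe to run many times per operation

    def help_one(self, tid: int) -> None:
        desc = self.state[tid].get()
        if desc is None or not desc.pending:
            return
        result = self.execute(desc.op, desc.key)
        # Only the first CAS against this pending descriptor succeeds.
        self.state[tid].compare_and_set(desc, OpDesc(desc.op, desc.key, False, result))

    def run(self, tid: int, op: str, key: int) -> Any:
        # Declare the operation; only the owner writes a pending entry to its slot.
        self.state[tid].set(OpDesc(op, key, True))
        # Help every declared operation, including our own.
        for other in range(len(self.state)):
            self.help_one(other)
        return self.state[tid].get().result
```

Each call helps a fixed number of entries, which is where the bounded, wait-free step count comes from; a stale helper's CAS fails because it compares descriptors by identity.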
Help Mechanism - Difficulties
• Multiple threads should be able to work concurrently on the same operation
• Many potential races
• Difficult to design
• Usually slower
Complication: Deletion Owning
• T1 and T2 both attempt delete(6)
• T1 and T2 both declare their operations in the state array
• T3 sees T1's declaration and tries to help it, while T4 helps T2
• If both helpers T3 and T4 “go to sleep” after the mark is done, which thread (T1 or T2) should return true, and which false?
[Figure: the sorted list -∞ → 4 → 6 → 9 → +∞]
Solution: Use a “Success Bit”
• Each node holds an extra “success bit” (initially 0)
• Potential owners compete to CAS it to 1 (no help in this part)
• Note that the node is deleted before it is decided which thread owns its deletion
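The ownership race can be sketched in isolation. In this Python sketch the mark bit is modeled as a plain boolean already set by some helper, and the success bit is a lock-emulated CAS cell; `CASCell`, `Node`, and `claim_deletion` are hypothetical names. Each potential owner calls `claim_deletion`; exactly one wins the 0 → 1 CAS.

```python
import threading


class CASCell:
    """Integer cell with compare-and-set, emulated with a lock."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


class Node:
    def __init__(self, key):
        self.key = key
        self.marked = False            # stands in for the mark bit in the next field
        self.success_bit = CASCell(0)  # initially 0


def claim_deletion(node):
    """Called by each potential owner once the node is known to be marked.

    The winner of the 0 -> 1 CAS returns True from delete(); every other
    owner returns False. Helpers never touch this bit.
    """
    if not node.marked:
        raise ValueError("ownership is decided only after the node is marked")
    return node.success_bit.compare_and_set(0, 1)
```

Because ownership is settled by a single CAS that no helper performs, it does not matter which helper did the marking or when it went to sleep.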
Helping an Insert Operation
• Search
• Direct
• Insert
• Report
[Figure: the list 4 → 6 → 9 and a new node 7; one CAS links 7 between 6 and 9, and a second CAS updates the state entry]
State entry before: Status: Pending, Operation: Insert, New node: 7
State entry after: Status: Success, Operation: Insert, New node: 7
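The four steps can be sketched as a helper routine. This Python version is deliberately naive: it assumes no deletions (so no mark bits), uses lock-emulated CAS cells, and reports failure whenever the key is already present, which is exactly the rule the following slides show to be unsafe. `Cell`, `Desc`, `help_insert`, and the other names are hypothetical.

```python
import math
import threading


class Cell:
    """Reference with compare-and-set by identity, emulated with a lock."""

    def __init__(self, ref=None):
        self._ref = ref
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._ref

    def set(self, ref):
        with self._lock:
            self._ref = ref

    def cas(self, expected, new):
        with self._lock:
            if self._ref is expected:
                self._ref = new
                return True
            return False


class Node:
    def __init__(self, key, succ=None):
        self.key, self.next = key, Cell(succ)


class Desc:
    """Immutable state-array entry: status, operation, new node."""

    def __init__(self, status, op, node):
        self.status, self.op, self.node = status, op, node


def make_list(*keys):
    head = Node(-math.inf, Node(math.inf))
    for k in sorted(keys, reverse=True):
        head.next.set(Node(k, head.next.get()))
    return head


def to_list(head):
    out, n = [], head.next.get()
    while n.key != math.inf:
        out.append(n.key)
        n = n.next.get()
    return out


def search(head, key):
    """Return (pred, curr) with pred.key < key <= curr.key."""
    pred, curr = head, head.next.get()
    while curr.key < key:
        pred, curr = curr, curr.next.get()
    return pred, curr


def help_insert(head, state, tid):
    """Help thread tid's pending insert. Naive: no race handling at all."""
    while True:
        desc = state[tid].get()
        if desc.status != "pending":
            return
        node = desc.node
        pred, curr = search(head, node.key)                       # 1. Search
        if curr.key == node.key:
            # 4. Report failure. NAIVE: curr may be desc.node itself.
            state[tid].cas(desc, Desc("failure", desc.op, node))
            return
        node.next.set(curr)                                       # 2. Direct
        if pred.next.cas(curr, node):                             # 3. Insert
            state[tid].cas(desc, Desc("success", desc.op, node))  # 4. Report
            return
        # pred.next changed concurrently: retry from the search.
```

Run by a single helper, this produces the transition in the figure (Pending to Success, list 4 → 6 → 7 → 9); the plain write in the Direct step and the failure rule are the two weak points the next slides examine.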
Incorrect Result Returned
Consider 2 threads helping insert(7):
T1 { found (6,9); node.next = &9; inserts new node; CAS(state[tid], s, success) }
T2 { found (6,7); CAS(state[tid], s, failure) }
[Figure: the list 4 → 6 → 9 and the new node 7]
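Incorrect Result Returned (2)
T1 { found (6,9); node.next = &9; inserts new node; CAS(→success) }
T2 { found (6,7); CAS(→failure) }
T3 { Delete(7); Insert(7) }
[Figure: after T1 inserts 7 the list is 4 → 6 → 7 → 9; T3 then deletes 7 and inserts a new node 7’]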
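Both schedules can be replayed deterministically, step by step, without threads. This Python sketch models each state-array entry as a `StateEntry` whose `cas` stands for CAS(state[tid], s, ...). It compares the naive rule ("key present → failure") with a hypothetical first fix, `"own-node"` ("report success if the node found is the operation's own node"). That fix is an illustrative assumption, not the paper's solution; the replay shows why the second scenario defeats it.

```python
import math


class Node:
    def __init__(self, key, nxt=None):
        self.key, self.next = key, nxt


class StateEntry:
    """One state-array entry; cas() models CAS(state[tid], s, ...)."""

    def __init__(self, status):
        self.status = status

    def cas(self, expected, new):
        if self.status == expected:
            self.status = new
            return True
        return False


def build(*keys):
    head = Node(-math.inf, Node(math.inf))
    for k in sorted(keys, reverse=True):
        head.next = Node(k, head.next)
    return head


def search(head, key):
    pred, curr = head, head.next
    while curr.key < key:
        pred, curr = curr, curr.next
    return pred, curr


def t2_report(rule, found, my_node):
    # "naive": key present means failure. "own-node": hypothetical first fix.
    if rule == "own-node" and found is my_node:
        return "success"
    return "failure"


def replay(rule, t3_runs):
    """Returns (final status of insert(7), whether T1's report CAS won)."""
    head, my_node, entry = build(4, 6, 9), Node(7), StateEntry("pending")
    # T1: found (6,9); node.next = &9; inserts the node; then sleeps before reporting.
    pred, curr = search(head, 7)
    my_node.next = curr
    pred.next = my_node
    if t3_runs:
        # T3: Delete(7) unlinks T1's node, then Insert(7) links a different node 7'.
        pred, curr = search(head, 7)
        pred.next = curr.next
        pred.next = Node(7, pred.next)
    # T2: found (6,7), the key is present, so it reports according to the rule.
    _, found = search(head, 7)
    entry.cas("pending", t2_report(rule, found, my_node))
    # T1 wakes up; its CAS fails because the entry is no longer pending.
    t1_won = entry.cas("pending", "success")
    return entry.status, t1_won
```

In every schedule T1 really did insert its node, so the correct answer is success. The naive rule returns failure in the first scenario; the own-node check repairs that case but still returns failure once T3 swaps in 7’.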
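Ill-timed Direct
Consider 2 threads helping insert(7):
T1 { found (6,9); node.next = &9 }
T2 { found (6,9); node.next = &9; inserts the new node; CAS(→success); ... Insert(8) (after 7) }
[Figure: the list 4 → 6 → 7 → 8 → 9 after T2 inserts 7 and a later Insert(8) links 8 after 7; T1's delayed node.next = &9 points 7 past 8]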
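This schedule can also be replayed deterministically. In the Python sketch below, T1 reads the new node's next field and then sleeps; T2 completes insert(7), and 8 is later inserted after 7. When T1's stale Direct step is a plain write, 8 disappears from the list. The `guarded` variant performs the Direct step as a CAS against the value T1 last saw; that guard, `cas_next`, and `replay` are hypothetical illustrations and are not claimed to be the paper's exact mechanism.

```python
import math


class Node:
    def __init__(self, key, nxt=None):
        self.key, self.next = key, nxt


def cas_next(node, expected, new):
    """Models a CAS on node.next (single-threaded replay, so no lock is needed)."""
    if node.next is expected:
        node.next = new
        return True
    return False


def keys(head):
    out, n = [], head.next
    while n.key != math.inf:
        out.append(n.key)
        n = n.next
    return out


def replay(guarded):
    n9 = Node(9, Node(math.inf))
    n6 = Node(6, n9)
    head = Node(-math.inf, Node(4, n6))
    seven = Node(7)
    # T1: search finds (6,9); it reads seven.next (still None), then sleeps.
    t1_curr, t1_seen = n9, seven.next
    # T2: search finds (6,9), directs 7 to 9, inserts it, reports success.
    seven.next = n9
    cas_next(n6, n9, seven)
    # ... then Insert(8) links 8 after 7.
    cas_next(seven, n9, Node(8, n9))
    # T1 wakes up and performs its stale Direct step.
    if guarded:
        cas_next(seven, t1_seen, t1_curr)  # fails: 7.next is no longer what T1 saw
    else:
        seven.next = t1_curr               # plain node.next = &9 drops 8
    return keys(head)
```

The unguarded replay ends with 4, 6, 7, 9 even though Insert(8) succeeded; the guarded replay keeps 8.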
More Races Exist
• Additional races were handled in both the delete and insert operations
• We constructed a formal proof of the algorithm's correctness
Main Invariant
• Each modification of a node's next field belongs to one of four categories:
  • Marking (changing the mark bit to true)
  • Snipping (removing a marked node)
  • Redirection (of an infant node)
  • Insertion (pointing a non-infant node to an infant node)
• Proof by induction, following the code line by line
Fast-Path-Slow-Path (Kogan and Petrank, PPoPP 2012)
• Each thread:
  • Tries to complete the operation without help
  • Asks for help only if it failed due to contention
• (Almost) as fast as the lock-free algorithm
• Gives the stronger wait-free guarantee
Fast-Path-Slow-Path
• Previously implemented for a queue
• Requires the wait-free algorithm and the lock-free one to work concurrently
• Our algorithm was carefully designed to allow fast-path-slow-path execution
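The control flow of fast-path-slow-path can be sketched as a small driver. In this Python sketch, `fast_attempt` stands for one pass of the lock-free algorithm (returning the result, or `None` when a CAS lost to contention) and `slow_path` stands for declaring the operation in the state array and completing it with help. The names, the `None` convention, and the threshold `MAX_FAST_ATTEMPTS` are hypothetical.

```python
from typing import Callable, Optional, Tuple

MAX_FAST_ATTEMPTS = 3  # hypothetical contention threshold


def fast_path_slow_path(
    fast_attempt: Callable[[], Optional[bool]],
    slow_path: Callable[[], bool],
    max_attempts: int = MAX_FAST_ATTEMPTS,
) -> Tuple[bool, str]:
    """Try the lock-free fast path a bounded number of times, then ask for help."""
    for _ in range(max_attempts):
        result = fast_attempt()
        if result is not None:
            return result, "fast"   # completed without help
    return slow_path(), "slow"      # contention: fall back to the helped path


def contended(failures: int, result: bool) -> Callable[[], Optional[bool]]:
    """Hypothetical fast-path stub that loses `failures` CASes, then completes."""
    remaining = [failures]

    def attempt() -> Optional[bool]:
        if remaining[0] > 0:
            remaining[0] -= 1
            return None
        return result

    return attempt
```

The bound on fast-path attempts is what preserves wait-freedom; the driver is only correct if, as the slide notes, operations on the fast and slow paths can run on the same list concurrently.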
Performance
• We measured our algorithm against Harris's lock-free algorithm
• We measured our algorithm using:
  • Immediate help
  • Deferred help
  • FPSP
Performance
• We report the results of a micro-benchmark:
  • 1024 possible keys, 512 on average
  • 60% contains, 20% insert, 20% delete
• Measured on:
  • Intel Xeon (8 concurrent threads)
  • Sun UltraSPARC (32 concurrent threads)
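The operation mix of this micro-benchmark can be sketched as a workload generator. This Python sketch drives a built-in `set` rather than a concurrent list and measures nothing about throughput; prefilling half of the key range is an assumption about the setup, and `pick_op`, `simulate`, and `MIX` are hypothetical names.

```python
import random
from collections import Counter

KEY_RANGE = 1024  # "1024 possible keys"
MIX = (("contains", 0.6), ("insert", 0.2), ("delete", 0.2))


def pick_op(rng):
    r, acc = rng.random(), 0.0
    for name, share in MIX:
        acc += share
        if r < acc:
            return name
    return MIX[-1][0]  # guard against floating-point round-off


def simulate(n_ops, seed=1):
    """Apply the mix to a plain set; returns (op counts, mean set size)."""
    rng = random.Random(seed)
    # Assumption: prefill half the key range so the set starts at 512 keys.
    live = set(rng.sample(range(KEY_RANGE), KEY_RANGE // 2))
    counts, size_sum = Counter(), 0
    for _ in range(n_ops):
        op, key = pick_op(rng), rng.randrange(KEY_RANGE)
        counts[op] += 1
        if op == "insert":
            live.add(key)
        elif op == "delete":
            live.discard(key)
        size_sum += len(live)
    return counts, size_sum / n_ops
```

With equal insert and delete rates over uniformly chosen keys, each key is present about half the time, so the set hovers around 512 keys.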
Performance
[Figure: operations (thousands) vs. number of threads for LF, FPSP, Deferred-Help, and Immed-Help; one panel for UltraSPARC T1 and one for Intel(R) Xeon(R)]
Performance
• When employing the FPSP technique together with our algorithm, the difference from the lock-free list is:
  • 0-2% on Intel(R) Xeon(R)
  • 9-11% on UltraSPARC
Conclusions
• We designed the first practical wait-free linked-list
• Performance measurements show our algorithm to work almost as fast as the lock-free list, while giving a stronger progress guarantee
• A formal correctness proof is available
Questions?