Lock free Algorithm for Multi-core architecture · 2016-07-06 · SDY 1.Introduction Background...
Transcript of Lock free Algorithm for Multi-core architecture · 2016-07-06 · SDY 1.Introduction Background...
SDY
Lock free Algorithm for Multi-core architectureLock free Algorithm for Multi-core architecture
SDY CorporationHiromasa Kanda
SDY
1.IntroductionBackground needed Multi-ThreadProblem of Multi-Thread programWhat's Lock free?
2.Lock free AlgorithmAtomic operationLock-free queueLock-free hash-map
3.Performance
4.Summary
ContentsContents
SDY
11.Introduction.IntroductionBackground needed Multi-ThreadBackground needed Multi-Thread
・Scheduling
・・Shared resource
・The speedup of a program using multiple processors in parallel computing is limited
・Parallelization Multi-threadMulti-thread
・Manage memory access and CPU clock・Limited CMOS scaling
What is needed in application?What is needed in application?
Amdahl's lawAmdahl's law
Problem of Multi-threadProblem of Multi-thread
Multi-core and SMT(HT)Multi-core and SMT(HT)
SDY
11.Introduction.Introduction
There was resources problem that share it when concurrent access multi-thread.
Problem of Multi-Thread program 1/2Problem of Multi-Thread program 1/2
Thread 1q.dequeue;
Thread 2q.dequeue();
queue q
① ②10
Criticalsession!!
dequeuedequeue
20
30
10 10
SDY
The traditional approach to multi-thread programming is using locks synchronize access to shared resources.Mutex,Semaphore
11.Introduction.IntroductionProblem of Multi-Thread program 2/2Problem of Multi-Thread program 2/2
Thread 1lock q;
q.dequeue();
unlock i
Thread 2
lock q;
q.dequeue();
unlock i
queue q
② ②'10
dequeue20
30④
①lock
③unlock
lock
lock falssewait
⑤unlock
dequeue10
20
SDY
Lock-free is "non-blocking" algorithm that is not broken value when access each thread at the same time.
11.Introduction.IntroductionWhat's Lock free?What's Lock free?
Thread 1q.dequeue();
q.enqueue(40);
Thread 2q.euqueue();
queue q
①②
③
10
Thread Safe!
enqueuedequeue
enqueue 20
30
Lock-free algorithm
40
10 20
SDY
2.Lock free Algorithm2.Lock free AlgorithmAtomic operationAtomic operation
"atomic operation" is built-ins that is no memory operand will be moved across the operation,either forward or backward.
・Test-and-Set operation TAS __sync_test_and_set(&p , a)
・Fetch-and-Add(Sub) operation __sync_fetch_and_add(&p , i )
・Compare-and-swap operation CAS __sync_compare_and_swap(&p , a , b )
SDY
2.Lock free Algorithm2.Lock free AlgorithmLock-free queue 1/10Lock-free queue 1/10
head pointer
Lock-free queue algorithm(Java Concurrent queue)
②
dummytail pointer
dummy
dummy nodenext
NULLNULL
node Anext
data ANULL
head pointerdummy
tail pointerA
dummy nodenext
NULLA
node Anext
data ANULL
enqueue
Head pointerA
tail pointerA
node Anext
data ANULL
dequeueHead pointer
dummytail pointer
A
dummy nodenext
NULLA
node Anext
data ANULL
①
②
① ←dummy
data A
SDY
2.Lock free Algorithm2.Lock free AlgorithmLock-free queue 2/10Lock-free queue 2/10Lock-free queue algorithm(Java Concurrent queue) enqueue(value)E01: node = new node ;E02: node–>value = value ;E03: node–>next.ptr = NULL ;E04: While(true) {E05: tail = TailPointer;E06: next = tail.ptr–>next;E07: if ( tail == TailPointer ) {E08: if (next.ptr == NULL ){E09: if ( CAS(&tail.ptr–>next, next, node) ) {E10: break ;E11: }E12: } else {E13: CAS(&TailPointer, tail, next.ptr) ; E14: }E15: }E16: }E17: CAS(&TailPointer, tail, node) ;
SDY
2.Lock free Algorithm2.Lock free AlgorithmLock-free queue 3/10Lock-free queue 3/10Lock-free queue using Java algorithmdequeue(value)D01: while(true){D02: head = HeadPointer ; D03: tail = TailPointer ;D04: next = head–>next ;D05: if ( head == HeadPointer ) { D06: if ( head.ptr == tail.ptr ) {D07: if ( next.ptr == NULL ) {D08: return FALSE ;D09: }D10: CAS(&TailPointer, tail, next.ptr) ;D11: } else {D12: value = next.ptr–>value ;D13: if ( CAS(&HeadPointer, head, next) ) { D14: break ;D15: }D16: }D17: }D18: }D19: delete head ;D20: return true ;
Critical SessionCritical Session
SDY
2.Lock free Algorithm2.Lock free AlgorithmLock-free queue 4/10Lock-free queue 4/10Non liner lock-free queue (using arrangement)
head pointer0
tail pointer0
0NULL
1NULL
2NULL
3NULL
4NULL
5NULL
6NULL
MNULL・・・
head pointer0
tail pointer2
0A
1B
2NULL
3NULL
4NULL
5NULL
6NULL
MNULL・・・
SDY
Lock-free queue 5/10Lock-free queue 5/10
head pointer
Lock-free queue new algorithm
②
①
dummytail pointer
dummy
dummy nodenext
NULLdummy
node Anext
data Adummy
head pointerdummy
tail pointerA
dummy nodenext
NULLA
node Anext
data Adummy
enqueue pattan1
Head pointerdummy
tail pointerA
dummy nodenext
NULLdummy
dequeue pattan1Head pointer
dummytail pointer
A
dummy nodenext
NULLA
node Anext
data Adummy
Node Anext
data Adummy
②
①
2.Lock free Algorithm2.Lock free Algorithm
③
SDY
Lock-free queue 6/10Lock-free queue 6/10Lock-free queue new algorithm
②
Head pointerB
tail pointerB
dummy nodenext
NULLB
dequeue pattan2Head pointer
dummytail pointer
B
dummy nodenext
NULLA
node Anext
data AB
Node Anext
data AB ①
2.Lock free Algorithm2.Lock free Algorithm
Node Bnext
data Bdummy
Node Bnext
data Bdummy
③
SDY
Lock-free queue 7/10Lock-free queue 7/10Lock-free queue new algorithm
Head pointerB
tail pointerB
dummy nodenext
NULLB
dequeue pattan3 (case of dequeue2)
2.Lock free Algorithm2.Lock free Algorithm
Node Bnext
data Bdummy
Head pointerdummy
tail pointerB
dummy nodenext
NULLdummy
Node Bnext
data Bdummy
Same result dequeue pattan1
①
②③
SDY
head pointer
①
dummytail pointer
A
dummy nodenext
NULLdummy
node Bnext
data Bdummy
head pointerdummy
tail pointerB
dummy nodenext
NULLB
node Bnext
data Bdummy
enqueue pattan2 (case of dequeue1)
②
Lock-free queue 8/10Lock-free queue 8/10Lock-free queue new algorithm
2.Lock free Algorithm2.Lock free Algorithm
①
node Cnext
data Cdummy
head pointerB
tail pointerC
dummy nodenext
NULLB
node Bnext
data BC
②
enqueue pattan3 (case of dequeue2)
Head pointerB
tail pointerB
dummy nodenext
NULLB
Node Bnext
data Bdummy
node Cnext
data Cdummy
Same result dequeue pattan1
Same result dequeue pattan2
SDY
2.Lock free Algorithm2.Lock free AlgorithmLock-free queue 9/10Lock-free queue 9/10Lock-free queue using new algorithmenqueue(value)E01: node = new node ;E02: node–>value = value ;E03: node–>next.ptr = dummy ;E04: while( true ) {E05: tail = TailPointer;E06: if ( CAS(&tail–>next,dummy,node) ) {E07: CAS(&tail,TailPointer,node) ;E08: break ;E09: } else {E10: next = tail->next;E11: if ( next != dummy ) {E12: CAS(&TailPointer, tail ,next ) ;E13: }E14: }E15: }
SDY
2.Lock free Algorithm2.Lock free AlgorithmLock-free queue 10/10Lock-free queue 10/10Lock-free queue using new algorithmdequeue(value)D01: while ( true ) {D02: head = HeadPointer ;D04: next = HeadPointer –>next ;D05: if ( head == HeadPointer ) { D06: if ( head == dummy ) {D07: if ( next == dummy ) return false ;D09: if ( CAS(HeadPointer, head, next–>next) ){D10: TAS(&dummy–>next,dummy) ;D11: value = next–>value ;D12: delete next ;D13: return true ;D14: }D15: } else {D16: if ( CAS(HeadPointer, head, next–>next) ) {D17: CAS(TailPointer, head , dummy) ;D18: value = head->value ;D19: delete head ;D20: return true;D21: }D22: }D23: }D24:}
SDY
Lock-free hash-mapLock-free hash-mapLock-free hash-map algorithm
2.Lock free Algorithm2.Lock free Algorithm
0NULL
1NULL
2NULL
3NULL
4NULL
5NULL
6NULL
MNULL・・・
NULL NULL NULL NULL NULL NULL NULL NULLkey
value
hash-map
0NULL
1NULL
2A
3NULL
4NULL
5NULL
6NULL
MNULL・・・
NULL NULL A NULL NULL NULL NULL NULLkey
value
hash-map
Key AValue AHash function
SDY
3.Performance3.Performance
1 5 10 50 100 200 300 500 700 10000
5000000000
10000000000
15000000000
20000000000
25000000000
30000000000
35000000000
40000000000
45000000000
0
5000000000
10000000000
15000000000
20000000000
25000000000
30000000000
35000000000
40000000000
45000000000
std queue(use mutex)lock-free queue(liner)lock-free queue(non-liner)
queue's performance
Thread num
RTDSC /thread num
SDY
3.Performance3.Performance
1 5 10 500
500000000
1000000000
1500000000
2000000000
2500000000
3000000000
0
500000000
1000000000
1500000000
2000000000
2500000000
3000000000
std queue(use mutex)lock-free queue(liner)lock-free queue(non-liner)
queue's performanceRTDSC /thread num
Thread num
SDY
4.Summary4.Summary・Possible simple coding on application
・Possible access faster than use mutex
・I will plans to create “list” and “liner hash-map”. Present,considering some method that is find(),delete().. base on liner queue.
See source forge website:
http://sourceforge.jp/projects/c-lockfree/