Automatic Loop Parallelization using STM
Uploaded by amr-abed
Outline
- Motivation
- Software Transactional Memory
- RingSTM
- STMlite
- Speculative Parallelization
- Fastpath
- SMTX (Spec-PS-DSWP)
Why STM?
- Lock-based
  - Deadlock
  - Priority inversion
  - Convoying
- Lock-free
  - Not easy to implement
  - CAS on multiple locations
- STM
  - Easy to implement, as in lock-based
  - Higher performance than lock-free
STM
- Each transaction performs an atomic task
- Transactions run concurrently
- To access shared memory
  - Write buffer
  - Undo log
- At end, validate reads
  - No conflict: commit
  - Conflict: abort and restart
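The cycle above (buffer writes, validate reads at the end, then commit or abort and restart) can be sketched in a few lines of C. This is a minimal single-threaded illustration, not any particular STM: the names `Cell`, `Tx`, `tx_read`, `tx_write`, and `tx_commit` are hypothetical, reads are validated with a per-location version counter (an assumption; the slide does not say how validation works), and a real implementation would have to make the commit step atomic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LOG 16

/* Hypothetical shared word; the version is bumped on every committed write. */
typedef struct { int value; unsigned version; } Cell;

typedef struct {
    Cell *read_cell[MAX_LOG];  unsigned read_version[MAX_LOG]; size_t n_reads;
    Cell *write_cell[MAX_LOG]; int write_value[MAX_LOG];       size_t n_writes;
} Tx;

void tx_begin(Tx *tx) { tx->n_reads = 0; tx->n_writes = 0; }

/* Writes go to the private write buffer, not to shared memory. */
void tx_write(Tx *tx, Cell *c, int v) {
    assert(tx->n_writes < MAX_LOG);
    tx->write_cell[tx->n_writes] = c;
    tx->write_value[tx->n_writes] = v;
    tx->n_writes++;
}

/* Reads see the transaction's own latest buffered write, else shared memory.
 * Reads from shared memory are logged with the version that was observed. */
int tx_read(Tx *tx, Cell *c) {
    for (size_t i = tx->n_writes; i-- > 0; )
        if (tx->write_cell[i] == c) return tx->write_value[i];
    assert(tx->n_reads < MAX_LOG);
    tx->read_cell[tx->n_reads] = c;
    tx->read_version[tx->n_reads] = c->version;
    tx->n_reads++;
    return c->value;
}

/* Validate the read log; on success write the buffer back (commit),
 * otherwise report a conflict so the caller can abort and restart. */
bool tx_commit(Tx *tx) {
    for (size_t i = 0; i < tx->n_reads; i++)
        if (tx->read_cell[i]->version != tx->read_version[i]) return false;
    for (size_t i = 0; i < tx->n_writes; i++) {
        tx->write_cell[i]->value = tx->write_value[i];
        tx->write_cell[i]->version++;
    }
    return true;
}
```

A caller would wrap `tx_begin` ... `tx_commit` in a retry loop and restart the transaction body whenever `tx_commit` returns false. The sketch uses the write-buffer option from the slide; an undo-log design would instead write in place and keep the old values for rollback.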
RingSTM
Motivation
Orec-based STM
- Location-based metadata
- TX writing to W locations: O(W) CAS operations
- Committing R/W TX: O(R+W) overhead
- All validation done in the critical section
RingSTM
- TX-based metadata
- TX writing to W locations: no CAS operations
- Committing R/W TX: single CAS operation
- Bloom filters used for validation
Bloom Filter
[Figure: a 16-bit Bloom filter, bits indexed 15 down to 0, initially all zero; each element is mapped to four bit positions by the hash functions]
- Inserting A: hash values 1, 7, 10, 13; bits 1, 7, 10, 13 are set
- Inserting B: hash values 2, 7, 9, 12; bits 1, 2, 7, 9, 10, 12, 13 are now set
- Searching for A: hash values 1, 7, 10, 13; all four bits are set, so A may be in the set
- Searching for C: hash values 3, 7, 10, 12; bit 3 is clear, so C is definitely not in the set
- Searching for D: hash values 2, 7, 10, 12; all four bits are set although D was never inserted (a false positive)
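The insert and search steps of the 16-bit example can be reproduced with a short C sketch. The slides give only the hash values for A, B, C, and D, not the hash functions, so the hypothetical helpers `bloom_insert` and `bloom_maybe_contains` below take the four bit positions directly as input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_HASHES 4

/* A 16-bit Bloom filter, one bit per position 0..15. */
typedef uint16_t Bloom;

/* Insert an element given the bit positions its hash functions produced. */
void bloom_insert(Bloom *f, const unsigned idx[NUM_HASHES]) {
    for (int i = 0; i < NUM_HASHES; i++) {
        assert(idx[i] < 16);
        *f |= (Bloom)(1u << idx[i]);
    }
}

/* False means "definitely absent"; true means "possibly present",
 * because other elements may have set the same bits. */
bool bloom_maybe_contains(Bloom f, const unsigned idx[NUM_HASHES]) {
    for (int i = 0; i < NUM_HASHES; i++)
        if (!(f & (1u << idx[i]))) return false;
    return true;
}
```

With A = {1, 7, 10, 13} and B = {2, 7, 9, 12} inserted, a query for C = {3, 7, 10, 12} returns false, while a query for D = {2, 7, 10, 12} returns true even though D was never inserted. That false positive is the price of the compact representation: a filter can report a conflict that did not happen, but it never misses one.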
New Transaction
[Figure: a ring of entries with timestamps 39 to 46, labeled Writing or Complete. The new transaction holds a write buffer, a write filter, and a read filter, and records start time 43.]
On Write
[Figure: the address is passed through the hash functions (hash values 1, 7, 10, 12 in the example) and the corresponding bits of the 16-bit write filter are set]
- Add the address to the write set (write filter)
- Add <address, value> to the write buffer
On Read
[Figure: the address is passed through the hash functions and looked up in the 16-bit write filter]
- Address is in the write set (hash values 1, 7, 10, 13 in the example): get the value from the write buffer
- Address is not in the write set (hash values 3, 7, 10, 12 in the example): get the value from memory
- Add the address to the read set (read filter)
- Check for conflicts
[Figure: ring entries 39 to 46; the read filter is checked against the ring, with the start time shown as 43 and then 44]
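The write and read steps can be sketched as follows. This is an illustration under stated assumptions rather than RingSTM's actual code: `memory` is a small array standing in for shared memory, addresses are array indices, the filters are 64-bit words with a single hypothetical hash (`addr % 64`) instead of the multi-hash 16-bit filters in the figures, and the conflict check against the ring is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_WORDS 256
#define BUF_CAP   32

static int memory[MEM_WORDS];            /* stand-in for shared memory */

typedef struct {
    uint64_t write_filter, read_filter;  /* compact write set and read set */
    size_t   addr[BUF_CAP];              /* write buffer: <address, value> */
    int      value[BUF_CAP];
    size_t   n_writes;
} RingTx;

/* Hypothetical single hash function: one filter bit per address. */
static uint64_t filter_bit(size_t addr) { return UINT64_C(1) << (addr % 64); }

void ring_tx_begin(RingTx *tx) {
    tx->write_filter = 0;
    tx->read_filter = 0;
    tx->n_writes = 0;
}

/* On write: add the address to the write filter, buffer <address, value>. */
void ring_tx_write(RingTx *tx, size_t addr, int v) {
    assert(addr < MEM_WORDS);
    tx->write_filter |= filter_bit(addr);
    for (size_t i = 0; i < tx->n_writes; i++)
        if (tx->addr[i] == addr) { tx->value[i] = v; return; }
    assert(tx->n_writes < BUF_CAP);
    tx->addr[tx->n_writes] = addr;
    tx->value[tx->n_writes] = v;
    tx->n_writes++;
}

/* On read: a write-filter hit sends us to the write buffer; a miss (or a
 * false positive that is not in the buffer) reads memory and records the
 * address in the read filter. */
int ring_tx_read(RingTx *tx, size_t addr) {
    assert(addr < MEM_WORDS);
    if (tx->write_filter & filter_bit(addr))
        for (size_t i = 0; i < tx->n_writes; i++)
            if (tx->addr[i] == addr) return tx->value[i];
    tx->read_filter |= filter_bit(addr);
    return memory[addr];
}
```

The filter lookup is only a fast pre-check: when it misses, the write buffer does not need to be searched at all, and when it hits, the buffer search settles whether the hit was real.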
On Commit
[Figure: the committing transaction (start time 43, with its write buffer, write filter, and read filter) next to the ring; a new entry 47 takes the place of the oldest entry 39]
- Add a new entry to the ring, and update the ring index
- Check for conflicting writers, using the write filter
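A single-threaded sketch of the commit step, assuming an 8-entry ring as in the figure. The names `Ring`, `ring_validate`, and `ring_commit` are hypothetical, filters are plain 64-bit words, and the one CAS on the ring index that a concurrent implementation would use is replaced here by an ordinary store. The final check for conflicting writers, which orders write-back against earlier entries that are still writing, is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8

typedef struct { uint64_t timestamp; uint64_t write_filter; } RingEntry;

typedef struct {
    RingEntry entry[RING_SIZE];  /* entry with timestamp t lives at t % RING_SIZE */
    uint64_t  newest;            /* ring index: timestamp of the newest entry */
} Ring;

/* A transaction is still valid if no entry newer than its start time has a
 * write filter that intersects its read filter. */
bool ring_validate(const Ring *r, uint64_t start_time, uint64_t read_filter) {
    assert(start_time <= r->newest);
    if (r->newest - start_time > RING_SIZE)
        return false;            /* needed entries were overwritten: give up */
    for (uint64_t t = start_time + 1; t <= r->newest; t++)
        if (r->entry[t % RING_SIZE].write_filter & read_filter)
            return false;
    return true;
}

/* Commit: validate, then publish the write filter as a new ring entry.
 * In a concurrent setting the index update would be a compare-and-swap. */
bool ring_commit(Ring *r, uint64_t start_time,
                 uint64_t read_filter, uint64_t write_filter) {
    if (!ring_validate(r, start_time, read_filter)) return false;
    if (write_filter == 0) return true;   /* read-only: nothing to publish */
    uint64_t t = r->newest + 1;
    r->entry[t % RING_SIZE] = (RingEntry){ t, write_filter };
    r->newest = t;
    return true;
}
```

With entries 39 to 46 in the ring and a start time of 43, validation inspects only entries 44, 45, and 46. A successful writer commit then stores entry 47 in the slot that held entry 39.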
STMlite
New Transaction
[Figure: a new transaction holds a write buffer, a start version, a commit version, a read signature, a write signature, and Abort? and Commit? flags. Alongside it the figure shows a global clock, a commit log, a pre-commit log, and a minSV log.]
Loop parallelization
- Non-speculative parallelization
  - DOALL
  - DOACROSS
  - DSWP
- Speculative parallelization
  - TLS
  - Spec-PS-DSWP
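The difference between a loop that qualifies for DOALL and one that does not comes down to loop-carried dependences. The two hypothetical C loops below illustrate it: in `scale_forward` every iteration touches only its own elements, so the iterations can run in any order (or in parallel), which `scale_backward` demonstrates by running them in reverse. In `prefix_sum` each iteration reads the value written by the previous one, so the loop needs DOACROSS-style synchronization, pipelining, or speculation instead.

```c
#include <assert.h>
#include <stddef.h>

/* DOALL candidate: iteration i reads a[i] and writes b[i], nothing else. */
void scale_forward(const int *a, int *b, size_t n) {
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) b[i] = 2 * a[i];
}

/* Same iterations in reverse order: the result is identical, which is what
 * makes it safe to hand the iterations to different cores. */
void scale_backward(const int *a, int *b, size_t n) {
    assert(a != NULL && b != NULL);
    for (size_t i = n; i-- > 0; ) b[i] = 2 * a[i];
}

/* Loop-carried dependence: iteration i reads a[i-1], written by iteration
 * i-1, so the iterations cannot simply be run independently. */
void prefix_sum(int *a, size_t n) {
    assert(a != NULL);
    for (size_t i = 1; i < n; i++) a[i] += a[i - 1];
}
```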
Fastpath
The Value Algorithm
- Slow mode: per-access instrumentation
- Consistency check
- Fast mode: un-instrumented speed
- No false conflicts
- Data forwarding
The Signature Algorithm
- Array of write signatures
- Update entry on each write
- Maintain read signature
- On transition, intersect sets
  - Even before!
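The signature bullets can be read as the following sketch. It rests on assumptions the slide does not spell out: each loop iteration owns one slot in the array of write signatures, a signature is a 64-bit word with a single hypothetical hash (`addr % 64`), and "intersect sets" means testing a read signature against the write signatures of a range of earlier iterations. The names `sig_note_write`, `sig_note_read`, and `sig_validate` are invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ITER 64

static uint64_t write_sig[MAX_ITER];   /* array of write signatures */

static uint64_t sig_bit(size_t addr) { return UINT64_C(1) << (addr % 64); }

/* Update the iteration's entry on each write. */
void sig_note_write(size_t iter, size_t addr) {
    assert(iter < MAX_ITER);
    write_sig[iter] |= sig_bit(addr);
}

/* Maintain the read signature on each read. */
void sig_note_read(uint64_t *read_sig, size_t addr) {
    *read_sig |= sig_bit(addr);
}

/* Intersect the read signature with the write signatures of iterations
 * first .. last-1; an empty intersection means no conflict was detected. */
bool sig_validate(uint64_t read_sig, size_t first, size_t last) {
    assert(first <= last && last <= MAX_ITER);
    for (size_t i = first; i < last; i++)
        if (write_sig[i] & read_sig) return false;
    return true;
}
```

Because different addresses can hash to the same bit, this check can report a conflict between unrelated addresses (for example addresses 8 and 72 under this hash). That is the trade-off against the value algorithm, which the slide credits with no false conflicts.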
SMTX (Spec-PS-DSWP)
Example code
A: while (node) {
B:   node = node->next;
C:   res = work(node);
D:   write(res);
   }
[Figure: dependence graph over statements A, B, C, and D, with edges labeled as control dependencies or data dependencies]
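Speculation
   while (TRUE) {
B:   node = node->next;
C:   res = work(node);
D:   write(res);
   }
[Figure: dependence graph over A, B, C, and D after the loop-exit test A is speculated, with edges labeled as control dependencies or data dependencies]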
Parallel Stage
[Figure: node C shown replicated, marking it as the parallel stage]
Pipelining
- Stage 1 (Sequential), B: node = node->next;
- Stage 2 (Parallel), C: res = work(node);
- Stage 3 (Sequential), D: write(res);
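The three-stage split can be sketched in C as three functions connected by arrays that stand in for the inter-stage queues. This is a sequential illustration only: the stages run one after another rather than on separate cores, `work` is a hypothetical stand-in (it squares the node's payload), the third stage copies results to an output array instead of calling the slide's `write`, and the traversal visits the head node first, unlike the slide's loop, which advances before working.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node { int payload; struct Node *next; } Node;

/* Hypothetical stand-in for work(): depends only on its own node. */
static int work(const Node *n) { return n->payload * n->payload; }

/* Stage 1 (sequential, statement B): the pointer chase is loop-carried,
 * so one thread walks the list and enqueues the nodes. */
size_t stage1_traverse(Node *head, Node **queue, size_t cap) {
    assert(queue != NULL);
    size_t n = 0;
    for (Node *p = head; p != NULL && n < cap; p = p->next) queue[n++] = p;
    return n;
}

/* Stage 2 (parallel, statement C): every slot is independent, so in a
 * parallel version this loop body is what the replicated workers run. */
void stage2_work(Node *const *queue, int *results, size_t n) {
    for (size_t i = 0; i < n; i++) results[i] = work(queue[i]);
}

/* Stage 3 (sequential, statement D): emit the results in iteration order. */
size_t stage3_write(const int *results, size_t n, int *out) {
    for (size_t i = 0; i < n; i++) out[i] = results[i];
    return n;
}
```

Only the first and last stages carry a dependence from one iteration to the next, which is why they stay sequential while the middle stage can be spread over several cores.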
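Execution
[Figure: execution timeline on cores 0 to 5 over time steps 0 to 5. Stage B runs iterations B0 to B5, stage C runs iterations C0 to C4, stage D runs iterations D0 to D2, and the timeline also shows the steps Try 0, Try 1, and Commit 0.]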
MTX creation
[Figure: a Main/Commit process and two workers (Worker 1, Worker 2), each with its own virtual address space and page table. The address spaces are related by copy-on-write, and the processes are connected by a communication channel used for commit.]
Memory access
[Figure: the same Main/Commit, Worker 1, and Worker 2 address spaces and page tables. Each worker has private pages created by copy-on-write; the communication channel carries data toward commit.]
Commit
[Figure: the same Main/Commit, Worker 1, and Worker 2 address spaces and page tables. The workers' private pages are committed through the communication channel; the slides show two alternative commit paths, joined by "or".]
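The roles in these figures (a worker with a private copy-on-write view of memory, and a channel that carries its writes to the committing process) can be imitated with plain POSIX calls. The sketch below is an analogy under assumptions, not SMTX itself: it uses `fork` to give the worker a copy-on-write address space and a `pipe` as the communication channel, and the names `shared_value` and `run_worker_demo` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_value = 10;   /* lives in the main/commit process */

/* Forks one worker, lets it write privately, and commits what it sends.
 * *before_commit receives the parent's view just before the commit.
 * Returns the committed value, or -1 if pipe or fork fails. */
int run_worker_demo(int *before_commit) {
    assert(before_commit != NULL);
    int fds[2];
    if (pipe(fds) != 0) return -1;

    pid_t pid = fork();          /* worker gets a copy-on-write address space */
    if (pid < 0) return -1;

    if (pid == 0) {              /* worker process */
        close(fds[0]);
        shared_value = 42;       /* lands on a private page, parent unaffected */
        if (write(fds[1], &shared_value, sizeof shared_value)
                != (ssize_t)sizeof shared_value)
            _exit(1);
        close(fds[1]);
        _exit(0);
    }

    close(fds[1]);               /* main/commit process */
    int received = 0;
    ssize_t got = read(fds[0], &received, sizeof received);
    close(fds[0]);
    waitpid(pid, NULL, 0);

    *before_commit = shared_value;            /* the worker's write is invisible */
    if (got == (ssize_t)sizeof received)
        shared_value = received;              /* commit the forwarded value */
    return shared_value;
}
```

In this analogy, discarding the worker's private pages (letting the process exit without forwarding anything) leaves the parent's memory untouched, which is what makes rollback cheap in a process-based design.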
Rollback
[Figure: the same Main/Commit, Worker 1, and Worker 2 address spaces and page tables, with the workers' private pages, the copy-on-write relation, and the communication channel.]