From Transactions to Dataflow and Back Again · 2018-01-04 · From Transactions to Dataflow and...
From Transactions to Dataflow and Back Again
Mikel Luján
Advanced Processor Technologies Group
University of Manchester
http://www.cs.manchester.ac.uk/apt
M. Ansari, C. Kotselidis, B. Khan, M. Horsnell, K. Jarvis, S. Khan, D. Goodman, C. Seaton, C. Kirkham, I. Watson & M. Luján
Multi-cores
APT Group and Manchester
[Diagram: research areas and staff of the Advanced Processor Technologies group]
Research areas: Asynchronous Systems, Networks-on-Chip, Sw/Hw/ML, Multi-Core Chips, Low-Power Systems, Neural Systems Engineering, GAELS, 3D VLSI, Virtualization
People: Mikel Luján, Gavin Brown, Ian Watson, Steve Furber, Jim Garside, Dave Lester, V. Pavlidis, A. Rawsthorne
Multi-cores == Terror Movie?
Business volume: Hardware $200K millions; Software $2K billions
Roadmap for today
• APT Group Intro & Need for SW/HW co-design
• Lee’s algorithm
  - Understand the problem
• Parallel implementations
  - Different choices
  - Lessons
• Transactional Memory
  - Basic concept
• Lee with transactions
  - Performance analysis
  - Improving performance
• Teraflux
Circuit Routing
Definitions
Grid: a three-dimensional array of cells
Layer: the combination of a conductive layer and a non-conductive one
Via: a connection between different layers
Cell: a point in the grid
Route: a set of contiguous cells that reach from the source cell to the destination cell
Obstacle: a cell (or set of cells) that cannot belong to any route
Problem definition
Input:
• Description of the board
• List of cell pairs (source, destination)
Output:
• List of routes
Program:
• Automatically generate the routes so that no two routes have cells in common, while offering the best “electrical properties”.
Lee’s algorithm
[Figure: source cell S and destination cell D]
What kind of routes can we guarantee to have found?
Example of routes: disallowed vs allowed
Lee’s algorithm (pseudo code)
Grid grid
for i in list of routes {
    expand (from source to destination)
    traceBack (from destination to source)
    cleanup(expansion)
}
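The three phases above can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation used in the talk: the grid is two-dimensional rather than three-dimensional, the cell encoding (`FREE`, `BLOCKED`, positive numbers for expansion distances) and the names `neighbours`, `expand`, `trace_back`, `cleanup` and `route_all` are all hypothetical, and source and destination cells are assumed to be distinct and initially free.

```python
from collections import deque

FREE, BLOCKED = 0, -1  # assumed encoding; positive values are expansion distances


def neighbours(grid, r, c):
    """Yield the in-bounds 4-connected neighbours of cell (r, c)."""
    for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]):
            yield nr, nc


def expand(grid, source, dest):
    """Breadth-first wavefront from source; writes distances into the shared grid."""
    grid[source[0]][source[1]] = 1
    touched, frontier = [source], deque([source])
    while frontier:
        r, c = frontier.popleft()
        if (r, c) == dest:
            break
        for nr, nc in neighbours(grid, r, c):
            if grid[nr][nc] == FREE:
                grid[nr][nc] = grid[r][c] + 1
                touched.append((nr, nc))
                frontier.append((nr, nc))
    return touched  # every cell the expansion wrote to


def trace_back(grid, source, dest):
    """Walk from dest to source along strictly decreasing distances."""
    route, cell = [dest], dest
    while cell != source:
        r, c = cell
        cell = next(n for n in neighbours(grid, r, c)
                    if grid[n[0]][n[1]] == grid[r][c] - 1)
        route.append(cell)
    return route[::-1]


def cleanup(grid, touched, route):
    """Erase the expansion, then mark the route cells as occupied."""
    for r, c in touched:
        grid[r][c] = FREE
    for r, c in route or []:
        grid[r][c] = BLOCKED


def route_all(grid, pairs):
    """Sequential loop: one route at a time on the shared grid."""
    routes = []
    for source, dest in pairs:
        touched = expand(grid, source, dest)
        reached = grid[dest[0]][dest[1]] > 0
        route = trace_back(grid, source, dest) if reached else None
        cleanup(grid, touched, route)
        routes.append(route)
    return routes
```

Note how `expand` writes to many cells of the shared grid that never end up on the route. That observation matters later, when the loop body becomes a critical section or a transaction.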
My turn – Parallel Lee
Grid grid
ListOfRoutes myRoutes // subset of routes
for my_i in myRoutes {
    acquire lock(grid)
    expand (from source to destination)
    traceBack (from destination to source)
    cleanup(expansion)
    release lock(grid)
}
Our turn – Towards Parallel Lee v2.0
Grid grid
for i in list of routes {
    expand (from source to destination)
    traceBack (from destination to source)
    cleanup(expansion)
}
Our turn – Parallel Lee v2.0
Grid grid
VectorOfLocks vector
SynchronizedQueueOfRoutes queue, queueForLongRoutes
while (thereAreMoreRoutes & IAmActive) {
    nextRoute (queue)
    determine to which grid partition route belongs // coordinates
    if route fits within partition {
        acquire lock(vector, coordinates for partition)
        expand (from source to destination)
        traceBack (from destination to source)
        cleanup(expansion)
        release lock(vector, coordinates for partition)
    } else {
        add route to queueForLongRoutes
    }
    // decide whether IAmActive still, grow partition & swap
    // queue and queueForLongRoutes
}
Your turn – Towards Parallel Lee v3.0
Grid grid
for i in list of routes {
    expand (from source to destination)
    traceBack (from destination to source)
    cleanup(expansionGrid)
}
A pause for reflection
Parallel programming -> easy/complex
Deadlock/livelock
Composing parallel libraries
Message passing vs. shared memory
Memory model (SC, relaxed)
Can we offer these abstractions to expert software developers? To high productivity ones?
Transactional Memory Hype – Big Promises
Composition
As easy to use as a single global lock
As efficient as fine grain locking
One transaction in databases?
ACID
• Atomicity: the property which guarantees that every operation has been performed or none at all (never halfway)
• Consistency: the property which guarantees that read and written values are coherent
• Isolation: the property which guarantees that one transaction will not be affected by another transaction
• Durability: the property which guarantees persistent data
Transactional Memory - Syntax
synchronized(foo) {
    x++;
    y++;
    z++;
}
atomic {
    x++;
    y++;
    z++;
}
Locks - Example
T1:
    synchronized(foo) {
        x++;
        y++;
        z++;
    }
T2:
    synchronized(foo) {
        x++;
        y++;
        z++;
    }
Locks – Example two
T1:
    synchronized(foo) {
        x++;
        y++;
        z++;
    }
T2:
    synchronized(foo) {
        a++;
        b++;
        c++;
    }
Transactional Memory – Example two
T1:
    atomic {
        x++;
        y++;
        z++;
    }
T2:
    atomic {
        a++;
        b++;
        c++;
    }
Sets and conflict detection
Tx1:
    atomic {
        x = y + z;
    }
Read set: {y, z}
Write set: {x}
Transaction Tx1 will have a conflict with another transaction executing in parallel IFF the intersection of the sets is not empty
Which ones?
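One common answer to the question of which sets must be intersected is sketched below: two transactions conflict when the write set of either one overlaps the read set or the write set of the other, while two reads of the same location never conflict. The dictionary layout and the transactions `tx2` and `tx3` are hypothetical, added only for illustration.

```python
def conflicts(tx_a, tx_b):
    """True if either transaction writes a location the other reads or writes."""
    a_hits_b = tx_a["write"] & (tx_b["read"] | tx_b["write"])
    b_hits_a = tx_b["write"] & (tx_a["read"] | tx_a["write"])
    return bool(a_hits_b or b_hits_a)


tx1 = {"read": {"y", "z"}, "write": {"x"}}  # x = y + z (from the slide)
tx2 = {"read": {"x"}, "write": {"w"}}       # hypothetical: w = x
tx3 = {"read": {"y"}, "write": {"v"}}       # hypothetical: v = y
```

Here `conflicts(tx1, tx2)` holds because `tx2` reads `x`, which `tx1` writes, whereas `tx1` and `tx3` only share the read of `y` and can run in parallel.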
Transactional Memory - Requirements
To be able to store the read set and the write set
To be able to compute the intersection of the sets
When a Tx executes optimistically -> to be able to restore the state of the program and computer architecture to the state before the transaction started
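A toy software TM can make these three requirements concrete. The sketch below is an assumption-laden illustration, not any of the systems named in this talk: it buffers writes privately (so restoring state on abort just means discarding the buffer), records the version of every location read, and validates those versions at commit time under a single lock. The class and function names are hypothetical, and it does not prevent a running transaction from observing an inconsistent snapshot before it reaches commit.

```python
import threading


class VersionedStore:
    """Shared memory: every location holds a (value, version) pair."""

    def __init__(self, initial):
        self.data = {key: (value, 0) for key, value in initial.items()}
        self.commit_lock = threading.Lock()


class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_set = {}   # location -> version observed on first read
        self.write_set = {}  # location -> buffered (speculative) value

    def read(self, key):
        if key in self.write_set:  # read our own speculative write
            return self.write_set[key]
        value, version = self.store.data[key]
        self.read_set.setdefault(key, version)
        return value

    def write(self, key, value):
        self.write_set[key] = value  # shared state is untouched until commit

    def commit(self):
        with self.store.commit_lock:
            # Validation: someone committed a write to something we read.
            for key, seen in self.read_set.items():
                if self.store.data[key][1] != seen:
                    return False  # abort: the buffered writes are simply dropped
            for key, value in self.write_set.items():
                version = self.store.data[key][1]
                self.store.data[key] = (value, version + 1)
            return True


def atomic(store, body):
    """Run body(tx) repeatedly until its transaction commits."""
    while True:
        tx = Transaction(store)
        result = body(tx)
        if tx.commit():
            return result
```

A call such as `atomic(store, lambda tx: tx.write("x", tx.read("y") + tx.read("z")))` then plays the role of the `atomic { x = y + z; }` block.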
TM Implementations (landscape)
Granularity
Conflict detection (eager vs. lazy)
Speculative state (write operations)
Software (DSTM2, RSTM, tinySTM, TL2, DiSTM, etc.)
Hardware (TCC, LogTM, Rock,…) & Haswell
Hybrid (Rock, Intel Research, Microsoft Research)
Lee’s algorithm (pseudo code)
Grid grid
for i in list of routes {
    expand (from source to destination)
    traceBack (from destination to source)
    cleanup(expansion)
}
Transactional Lee (pseudo code)
Grid grid
forall routes { // work queue
    atomic {
        expand (from source to destination)
        traceBack (from destination to source)
        cleanup(expansion)
    }
}
Can we improve it?
Privatization
Transactional Lee (privatization)
Grid grid
forall routes { // work queue
    atomic {
        Grid local
        expansion (from source to destination) // read global & write local
        traceBack (from destination to source) // read local & write global
        // NO: cleanup(expansion)
    }
}
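The effect of privatization on the body of the transaction can be sketched as follows. This is an illustrative Python version under assumptions (a two-dimensional grid, a hypothetical `FREE`/`BLOCKED` encoding, distinct source and destination cells that start free, and a made-up function name); the `atomic` wrapper itself is left out. The expansion only reads the shared grid and keeps its distances in a private dictionary, so the only shared writes are the cells of the final route and no cleanup phase is needed.

```python
from collections import deque

FREE, BLOCKED = 0, -1  # assumed cell encoding for the shared grid


def route_privatized(grid, source, dest):
    """Expand into private state; write the shared grid only during traceback."""
    local = {source: 1}  # private expansion state ("Grid local")
    frontier = deque([source])
    while frontier and dest not in local:
        r, c = frontier.popleft()
        for n in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = n
            in_bounds = 0 <= nr < len(grid) and 0 <= nc < len(grid[0])
            if in_bounds and n not in local and grid[nr][nc] == FREE:  # read global
                local[n] = local[(r, c)] + 1                           # write local
                frontier.append(n)
    if dest not in local:
        return None  # no route found; nothing shared was modified
    route, cell = [dest], dest
    while cell != source:
        r, c = cell
        cell = next(n for n in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))
                    if local.get(n) == local[cell] - 1)                # read local
        route.append(cell)
    for r, c in route:
        grid[r][c] = BLOCKED                                           # write global
    return route[::-1]
```

In a TM setting this shrinks the write set of each transaction from every expanded cell to just the route cells, which is the motivation given on the slide.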
We’ll look at the performance later
But, have we reached the optimum?
Routes: disallowed vs allowed
Transactional Lee (privatization)
Grid grid
forall routes { // work queue
    atomic {
        Grid local
        expansion (from source to destination) // read global & write local
        traceBack (from destination to source) // read local & write global
        // NO: cleanup(expansion)
    }
}
Transactional Lee (early release)
Grid grid
forall routes { // work queue
    atomic {
        Grid local
        expansion (from source to destination) // ER: read global & write local
        traceBack (from destination to source) // read local, compare with global &
                                               // write global
        // NO: cleanup(expansion)
    }
}
// We are not advocating for early release
Experiment: abstract TM
                   Lee-TM   Lee-TM privatization   Lee-TM early release
#Iterations           305                    227                     14
1st Iteration          79                    118                    697
Failed attempts     89534                  53838                    374
1506 routes
Routes sorted in increasing order
Algorithm tries to avoid “spaghetti” routes
Experiment: abs. TM (pending routes)
Experiment: abs. TM (#iterations vs. #processors)
Experiment: abs. TM (#executed transactions)
Experiment with DSTM2 on 8-core AMD
[Figure: execution time (s) against number of threads for Coarse, Medium, TM and TM-ER]
Experiment with our HardwareTM
Transactional Lee: a closer look (DSTM2)
[Figure: percentage of all transactions that were successful (committed), plotted against time]
Control (auto-tune) number of transactions
TM applications can exhibit different phases with different levels of parallelism
There is a relation between the number of transactions executing without conflicts and the amount of exploitable parallelism available in an application
[Figure: three panels (Constant, Periodic, Random), each plotting EP against Time]
How can we make it work?
Use TCR as an approximation to the amount of parallelism available
Transaction Commit Rate (TCR)
• NumCommittedTx/NumTotalTx (in a given period of time)
• If TCR is high -> allow more parallel executing transactions
• If TCR is low -> allow fewer parallel executing transactions
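The feedback rule above can be written as a small controller. This is only a sketch: the thresholds `low` and `high`, the step of one thread per sampling period and both function names are assumptions made for illustration, and the code is not claimed to match any specific policy evaluated in this talk.

```python
def transaction_commit_rate(committed, total):
    """TCR = NumCommittedTx / NumTotalTx over one sampling period."""
    return committed / total if total else 1.0


def adjust_active_threads(active, tcr, max_threads, low=0.3, high=0.7):
    """Allow one more concurrent transaction when TCR is high, one fewer when low."""
    if tcr > high:
        return min(active + 1, max_threads)
    if tcr < low:
        return max(active - 1, 1)
    return active
```

For instance, with 4 active threads and 9 commits out of 10 attempts in the last period, the controller would move to 5 threads; with 1 commit out of 10 it would drop to 3.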
Results (execution time improvement)
Contention Manager | Simple Adjust        | Exponential Interval | Exponential Adjust   | Exponential Combined | Average
                   | 1    2    4    8     | 1    2    4    8     | 1    2    4    8     | 1    2    4    8     |
Aggressive         | 0.94 1.24 0.94 1.06  | 0.92 1.13 1.00 1.07  | 1.01 1.25 1.07 1.10  | 1.08 1.18 1.03 1.04  | 1.07
Backoff            | 0.82 0.74 1.63 2.47  | 0.84 0.87 1.39 2.73  | 0.76 0.90 1.41 3.00  | 0.89 0.91 1.41 2.47  | 1.45
Eruption           | 0.72 1.14 1.12 1.42  | 0.82 1.13 1.03 1.39  | 0.81 1.21 0.95 1.49  | 0.83 1.21 0.93 1.52  | 1.11
Greedy             | 1.20 1.08 1.00 1.34  | 0.99 0.98 1.00 1.26  | 1.14 1.04 1.00 1.36  | 1.08 0.99 0.94 1.33  | 1.11
Karma              | 1.12 1.04 1.05 1.31  | 1.02 1.21 1.05 1.30  | 1.18 1.13 1.04 1.41  | 1.05 1.13 1.03 1.41  | 1.16
Kindergarten       | 1.12 1.18 0.99 1.06  | 1.13 1.07 0.91 1.02  | 1.30 1.22 0.99 1.05  | 1.35 1.14 0.99 1.01  | 1.10
Polka              | 0.96 1.23 0.97 1.08  | 1.01 1.03 0.94 1.09  | 1.07 1.09 1.08 1.24  | 1.04 1.02 0.92 1.14  | 1.06
Priority           | 1.32 1.09 1.05 0.98  | 1.13 0.95 1.04 0.98  | 1.21 1.08 1.04 1.00  | 1.23 1.05 1.04 0.98  | 1.07
Colour bands: < 0.9 | 0.9 - 1.0 | 1.0 - 1.1 | > 1.1
Results (improvement #used cores)
Contention Manager Resource utilization (%)
Aggressive 46
Backoff 82
Eruption 59
Greedy 57
Karma 53
Kindergarten 44
Polka 41
Priority 41
SimpleAdjust with 8 initial threads
Scheduling vs. Aborts: Example
• T1 and T2 execute concurrently
• T1 conflicts with T2
• T1 aborts
• T1 restarts (immediately)
• T1 conflicts with T2 again
• T1 aborts again
• T1 restarts (immediately)
• T1 conflicts with T2 again
• …
Steal-on-Abort
In general, it is difficult to predict the first conflict/abort
Once observed, it is simple to avoid the next conflict/abort
• Do not execute T1 & T2 concurrently
Steal-on-abort design:
• Automatically make scheduling decisions to avoid conflicts:
  - On abort, the transaction is stolen by the aborter
  - The aborted transaction is released after the stealer commits
• Additionally, attempt to improve performance:
  - The thread whose transaction is stolen obtains another transaction to execute. It may commit, improving performance.
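The scheduling idea can be sketched with two per-thread containers. Everything named here (`Worker`, `on_abort`, `on_commit`, the queue layout) is hypothetical, and the choice to release stolen transactions at the front of the stealer's own queue is an assumption; the slide only states that they are released after the stealer commits.

```python
from collections import deque


class Worker:
    """One thread's scheduling state."""

    def __init__(self, name, jobs):
        self.name = name
        self.queue = deque(jobs)  # transactions waiting to run on this thread
        self.stolen = []          # transactions held until our current one commits


def on_abort(aborted_tx, victim, aborter):
    """The aborter steals the aborted transaction so it cannot restart at once."""
    aborter.stolen.append(aborted_tx)
    # The victim thread moves on to other work, if it has any.
    return victim.queue.popleft() if victim.queue else None


def on_commit(worker):
    """After committing, release stolen transactions so they run after us."""
    worker.queue.extendleft(reversed(worker.stolen))
    worker.stolen.clear()
```

Because the stolen transaction ends up behind the transaction that aborted it, on the same thread, the two can no longer execute concurrently, which removes the repeated conflict shown in the previous example.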
Performance
[Figure: transactions/second against number of threads (1, 2, 4, 8), Non-SOA vs. SOA]
Wasted Resources
[Figure: wasted work (%) against number of threads (2, 4, 8), Non-SOA vs. SOA]
Transactional Memory
Transactional Memory is not a silver bullet. But it provides both:
• a concurrent programming abstraction which is much simpler than traditional techniques; and
• a more relaxed coherence semantics: program state must be coherent at the start and end of a transaction.
We are interested in Transactional Memory as a key component of a computational model
My personal baggage with Parallel Systems
Undergraduate: Shared Memory vs. Message Passing Programming
• Equivalent, pain developing and debugging, performance (memory allocator, cache coherence)
PhD: High Productivity for HPC
• Java and OO for Numerical Linear Algebra
• Recover lost performance with compilation techniques
• Advisor: John Gurd (Manchester Dataflow)
Sun Microsystems DARPA High Productivity Computing System project
• Runtime software for Petascale System (order of 10^6 hardware threads)
• PGAS, GUPS & Global address space vs. Cache Coherence
Transactional Memory in Manchester
• Software, Hardware, Distributed, Scheduling, Applications …
• Work with Ian Watson & Chris Kirkham (Manchester Dataflow)
Teraflux: my first project with Dataflow
I suppose it was unavoidable!
The TERAFLUX Project
Exploiting Dataflow Parallelism in Teradevice Computing
What is it about?
• Many-cores (1000+ cores or Teradevices)
• General purpose computing
• Dataflow (data driven execution)
• Reliability
Funded by the EU Seventh Framework
• University of Siena (co-ordinator)
• Barcelona Supercomputing Centre
• CAPS Enterprise
• Hewlett Packard
• INRIA
• Microsoft (Israel)
• THALES
• University of Cyprus
• University of Augsburg
• University of Manchester
What is the fuss with Dataflow?
Computation Model:
• Computation is described as a graph
• Edges describe unidirectional data dependencies
• Nodes represent computation (side-effect free computation)
• Execution is data driven
  - A node is “fired” once all its input data is ready
  - Parallel execution is natural: multiple nodes can execute in parallel as long as their input data is available
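The firing rule can be illustrated with a tiny interpreter. This is a sketch under assumptions: the graph representation (a dictionary from node name to a pure function and its input names) and the function `run_dataflow` are invented for this example, and nodes that are ready in the same round are executed one after another here even though they could run in parallel.

```python
def run_dataflow(nodes, inputs):
    """nodes: {name: (pure_function, [input names])}; inputs: {name: value}."""
    values = dict(inputs)
    pending = dict(nodes)
    firing_order = []
    while pending:
        # A node may fire once every one of its inputs has a value.
        ready = [name for name, (_, deps) in pending.items()
                 if all(dep in values for dep in deps)]
        if not ready:
            raise ValueError("graph has a cycle or a missing input")
        for name in ready:  # these nodes are mutually independent
            fn, deps = pending.pop(name)
            values[name] = fn(*(values[dep] for dep in deps))
            firing_order.append(name)
    return values, firing_order
```

With nodes `sum = a + b`, `prod = a * b` and `out = sum - prod`, the first two fire in the same round because both only need `a` and `b`, and `out` fires afterwards.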
Relation with pure procedures (side effect free computation, nothing shared),…
What was wrong with the Manchester Dataflow?
Google MapReduce on data-centres OSDI’04
61
62
Is Dataflow the silver bullet?
A flexible and efficient way of exploiting parallelism
Maybe its ‘time has come’ in the many core era
• Consider MapReduce, NLA, GPUs, FPGAs
But is it general purpose?
• It is certainly good at irregular (i.e. general purpose) parallelism where other approaches fail
• But a big weakness (stemming from its underlying side effect free connections) is an inability to deal well with shared mutable state
• Transactional memory provides a good mechanism for updating shared mutable state (Isolation and Atomicity)
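As a rough illustration of isolation and atomicity, here is a minimal optimistic software transactional memory sketched in Python. It is not MUTS or any TERAFLUX code; `TVar`, `Transaction`, `Conflict`, `atomically` and `transfer` are hypothetical names. Writes are buffered privately, and a commit succeeds only if nothing the transaction read has changed in the meantime. Otherwise the transaction is retried.

```python
import threading

_commit_lock = threading.Lock()  # serialises reads of (value, version) and commits


class TVar:
    """A transactional variable: a value plus a version number."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Conflict(Exception):
    """Raised when a transaction has observed data that has since changed."""


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version seen at first read
        self.writes = {}  # TVar -> buffered new value (invisible to others)

    def read(self, tvar):
        if tvar in self.writes:
            return self.writes[tvar]
        with _commit_lock:
            value, version = tvar.value, tvar.version
        if self.reads.setdefault(tvar, version) != version:
            raise Conflict()  # the variable changed since our first read
        return value

    def write(self, tvar, value):
        self.writes[tvar] = value

    def commit(self):
        with _commit_lock:
            # Validate: everything we read must be unchanged.
            for tvar, version in self.reads.items():
                if tvar.version != version:
                    raise Conflict()
            # Publish all buffered writes in one step.
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1


def atomically(body):
    """Run body(tx) until it commits without conflict."""
    while True:
        tx = Transaction()
        try:
            result = body(tx)
            tx.commit()
            return result
        except Conflict:
            continue  # discard buffered writes and retry


def transfer(src, dst, amount):
    def body(tx):
        tx.write(src, tx.read(src) - amount)
        tx.write(dst, tx.read(dst) + amount)
    atomically(body)


a, b = TVar(100), TVar(0)
threads = [threading.Thread(target=transfer, args=(a, b, 1)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Every transfer commits exactly once, so a.value is 50 and b.value is 50.
```

The sketch validates only at commit time and on repeated reads of the same variable, so a transaction that is doomed to abort may briefly see values from different commits. Production STMs add stronger guarantees and finer-grained locking.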
63
Dataflow plus Transactions
A major aim of the TERAFLUX project is to investigate the introduction of Transactional Memory into Dataflow
• Computational Model vs Programming Environment
• Hardware Support
• Fault-tolerance
• Applications
I’m just giving you
• a high level overview & motivation
• a description and perspective of work-in-progress in Manchester
Prototyping in Scala
Scala
• High Productivity Developers
• Combines functional programming with OO
We have extended Scala with Transactional syntax and have provided a Software Transactional Memory
• http://apt.cs.man.ac.uk/projects/TERAFLUX/MUTS
• Manchester University Transactions for Scala (MUTS)
We have implemented a new Dataflow library
We are investigating means of automatically generating dataflow execution, so that the developer does not create threads
• Reimplementation of the Scala parallel collection using dataflow plus transactions
• Analysis for Lee-TM of benefits of Dataflow plus transactions
We are investigating how a subset of Scala and the “right” type system can simplify the software development
64
Many-core Architecture in Manchester
Contributing to the memory model
Investigating how to simplify coherency & consistency by using the Dataflow and TM computational model
• No “traditional” cache coherence across the chip, but a globally accessible address space
Investigating how to scale hardware TM
• Can dataflow simplify the TM implementation?
Investigating relation between hardware Dataflow scheduler and hardware TM
How to simulate large many-cores? NoCs 2012
How to make TM compatible with the fault-tolerance mechanism proposed by our partners
MCTS for the game of Go and other applications
65
Summary
Dataflow plus Transactions seems to be a promising new approach to extend the power of the Dataflow model to include shared state
What is it about?
• Many-cores (1000+ cores or Teradevices)
• General purpose computing
• Dataflow (data driven execution)
• Reliability
DF+TM = efficient general purpose parallel computational model?
66
67
More Information
http://www.teraflux.eu
http://www.cs.manchester.ac.uk/apt/projects/TM
http://www.cs.wisc.edu/trans-memory/
Transactional Memory. Harris, Larus & Rajwar, 2010.