Concurrency: Examples for .NET
Responsive, performant, scalable algorithms
Three pillars of Concurrency
- Scalability (CPU): Parallel.For
- Responsiveness: Task, async/await
- Consistency: lock, Interlocked.*, Mutex/Event/Semaphore, Monitor
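As a quick illustration of the responsiveness pillar, a minimal async/await sketch (the names and the delay are mine, not from the deck):

```csharp
using System;
using System.Threading.Tasks;

class ResponsivenessDemo
{
    // Stand-in for a slow I/O call; awaiting it does not block the caller.
    static async Task<int> LoadAsync()
    {
        await Task.Delay(100);
        return 42;
    }

    static async Task Main()
    {
        var pending = LoadAsync();          // work starts here
        Console.WriteLine("still responsive while loading");
        Console.WriteLine(await pending);   // prints 42 once the work completes
    }
}
```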
Scalability
Which is fastest?

    var ints = new int[InnerLoop];
    var random = new Random();
    for (var inner = 0; inner < InnerLoop; ++inner)
    {
        ints[inner] = random.Next();
    }

    // ------------------------------------------------

    var ints = new int[InnerLoop];
    var random = new Random();
    Parallel.For(0, InnerLoop, i => ints[i] = random.Next());

SHARED STATE, race condition: the parallel version calls Next() on one shared Random instance from many threads, and Random is not thread-safe.
SHARED STATE, poor performance: that same shared instance is contended by every core, so the parallel version also runs slower than the serial loop.
Then and now

    Metric          VAX-11/750 ('80)   Today                 Improvement
    MHz             6                  3300                  550x
    Memory MB       2                  16384                 8192x
    Memory MB/s     13                 ~10000 R / ~2500 W    770x / 190x
    Memory nsec     225                70                    3x
    Memory cycles   1.4                210                   -150x
Speed of light is too slow
- 299,792,458 m/s is only ~0.09 m per clock cycle
- 99%: latency mitigation
- 1%: computation
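The 0.09 m figure falls straight out of the table's 3300 MHz clock; a quick check of the arithmetic (constants as assumed above):

```csharp
using System;
using System.Globalization;

class LightPerCycle
{
    static void Main()
    {
        const double speedOfLight = 299_792_458.0; // m/s
        const double clockHz = 3300e6;             // 3300 MHz, from the table
        var metersPerCycle = speedOfLight / clockHz;
        // ~0.09 m: light barely crosses a desk in one clock cycle.
        Console.WriteLine(metersPerCycle.ToString("F3", CultureInfo.InvariantCulture));
    }
}
```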
2 Core CPU
[Diagram: two cores, each with private L1 and L2 caches, sharing an L3 cache and RAM]
2 Core CPU – L1 Cache
[Diagram: the shared `new Random()` instance and the `new int[InnerLoop]` array end up in both cores' L1 caches]
4 Core CPU – L1 Cache
[Diagram: the same shared `new Random()` instance and `new int[InnerLoop]` array are now mirrored across four cores' L1 caches]
2x4 Core CPU
[Diagram: two 4-core sockets; each core has private L1 and L2 caches, each socket has a shared L3 cache, both connected to RAM]
Solution 1 – Locks
    var ints = new int[InnerLoop];
    var random = new Random();
    Parallel.For(
        0,
        InnerLoop,
        i =>
        {
            lock (ints)
            {
                ints[i] = random.Next();
            }
        });
Solution 2 – No sharing
    var ints = new int[InnerLoop];
    Parallel.For(
        0,
        InnerLoop,
        () => new Random(),
        (i, pls, random) =>
        {
            ints[i] = random.Next();
            return random;
        },
        random => { });
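On .NET 6 or later there is an even simpler "no sharing" variant: `Random.Shared` is a thread-safe instance, so no per-thread local state is needed. A sketch, with `InnerLoop` assumed as a constant:

```csharp
using System;
using System.Threading.Tasks;

class NoSharingShared
{
    static void Main()
    {
        const int InnerLoop = 1_000_000;
        var ints = new int[InnerLoop];
        // Random.Shared (.NET 6+) is safe to call from multiple threads.
        Parallel.For(0, InnerLoop, i => ints[i] = Random.Shared.Next(1, 100));
        Console.WriteLine(Array.TrueForAll(ints, x => x is >= 1 and < 100));
    }
}
```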
Parallel.For adds overhead
[Diagram: Parallel.For splits the index range recursively (Level0 → Level1 → Level2); each leaf task handles only a couple of elements, e.g. ints[0], ints[1]]
Solution 3 – Less overhead
    var ints = new int[InnerLoop];
    Parallel.For(
        0,
        InnerLoop / Modulus,
        () => new Random(),
        (i, pls, random) =>
        {
            var begin = i * Modulus;
            var end = begin + Modulus;
            for (var iter = begin; iter < end; ++iter)
            {
                ints[iter] = random.Next();
            }
            return random;
        },
        random => { });

    // ------------------------------------------------

    var ints = new int[InnerLoop];
    var random = new Random();
    for (var inner = 0; inner < InnerLoop; ++inner)
    {
        ints[inner] = random.Next();
    }
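The same chunking idea can also be expressed with Partitioner.Create, which hands each worker a contiguous range instead of computing chunk bounds from a Modulus; a sketch, not from the deck:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ChunkedFill
{
    static void Main()
    {
        const int InnerLoop = 1_000_000;
        var ints = new int[InnerLoop];
        // Partitioner.Create splits [0, InnerLoop) into contiguous ranges,
        // so each worker runs a tight serial loop over its own chunk.
        Parallel.ForEach(
            Partitioner.Create(0, InnerLoop),
            () => new Random(),
            (range, state, random) =>
            {
                for (var i = range.Item1; i < range.Item2; ++i)
                {
                    ints[i] = random.Next(1, int.MaxValue);
                }
                return random;
            },
            random => { });
        Console.WriteLine(Array.TrueForAll(ints, x => x > 0));
    }
}
```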
Solution 4 – Independent runs
    var tasks = Enumerable
        .Range(0, 8)
        .Select(i => Task.Factory.StartNew(
            () =>
            {
                var ints = new int[InnerLoop];
                var random = new Random();
                while (counter.CountDown())
                {
                    for (var inner = 0; inner < InnerLoop; ++inner)
                    {
                        ints[inner] = random.Next();
                    }
                }
            },
            TaskCreationOptions.LongRunning))
        .ToArray();
    Task.WaitAll(tasks);
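The deck never shows what `counter` is; one plausible implementation is a count-down guarded by Interlocked.Decrement (this class is my assumption, not the original):

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the deck's undefined `counter`:
// threads call CountDown() to claim one unit of work each.
class CountDownCounter
{
    int _remaining;
    public CountDownCounter(int count) => _remaining = count;
    public bool CountDown() => Interlocked.Decrement(ref _remaining) >= 0;
}

class Demo
{
    static void Main()
    {
        var counter = new CountDownCounter(3);
        var claimed = 0;
        while (counter.CountDown()) ++claimed;
        Console.WriteLine(claimed); // 3: exactly as many claims as the count
    }
}
```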
Parallel.For
- Only for CPU-bound problems
- Sharing is bad: it kills performance and causes race conditions and dead-locks
- Servers have natural concurrency: avoid Parallel.For
- Act like an engineer: measure before and after
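"Measure before and after" can be as simple as a Stopwatch harness with a warm-up run for the JIT (the helper name and sizes are mine):

```csharp
using System;
using System.Diagnostics;

class Measure
{
    // Times an action; the warm-up run absorbs JIT and cold-cache effects.
    static TimeSpan Time(Action action)
    {
        action();
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed;
    }

    static void Main()
    {
        var ints = new int[1_000_000];
        var elapsed = Time(() =>
        {
            var random = new Random();
            for (var i = 0; i < ints.Length; ++i) ints[i] = random.Next();
        });
        Console.WriteLine($"serial fill: {elapsed.TotalMilliseconds:F1} ms");
    }
}
```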
One more thing…
Mårten Rånge