Bringing Parallel ComputingParallel Computing to the mass · Microsoft PowerPoint - ALes enjeux du...
Transcript of Bringing Parallel ComputingParallel Computing to the mass · Microsoft PowerPoint - ALes enjeux du...
Bringing Parallel Computing to the massBringing Parallel Computing to the mass
− Parallel Programming
1000000
10000000
Pentium® 4 Architecture
10000
100000 Pentium® Pro Architecture
Pentium® 4 Architecture
Pentium® Architecture
Goal: 10 TIPS by
100
1000
MIP
S
Pentium® Architecture
486386
286
y2015
1
10
M 2868086
0,01
0,1
1970 1980 1990 2000 2010 2020Source: Intel Juin 2006
…but this only matters if developers embrace parallel computing!
32,768
2 048
OP
S
2,048
128Parallelism Opportunity
80X
GO
16
80X
2004 2006 2008 2010 2012 2015
If you haven’t done so already, now is the time to take a hard look at the design of your application determine whathard look at the design of your application, determine what operations are CPU-sensitive now or are likely to become so soon, and identify how those places could benefit fromsoon, and identify how those places could benefit from concurrency.”
H b S tt C hi i f h 200Herb Sutter, C++ Architect at Microsoft (March 2005)
So this is the challenge that we are focus on with ourSo this is the challenge that we are focus on with our partner .
“Enable people to do parallelism without the need to go deep and make parallelism accessible to a large number of software d l ”developers.”
Steve TexeiraSteve TexeiraProduct Unit ManagerParallel Developer ToolsParallel Developer ToolsMicrosoft Corporation
Microsoft® Parallel Computing InitiativeInitiative
The manycore is a disruption for the developer but it’s also a y p pgreat opportunity
It begins with DEVELOPERS
M k it i t dM k it i t dMake it easier to express and manage the correctness,
efficiency and maintainability of parallelism
Make it easier to express and manage the correctness,
efficiency and maintainability of parallelism y f p
on Microsoft platforms for developers of all skill levels
y f pon Microsoft platforms for
developers of all skill levels
Enable developers toEnable developers toEnable developers to express parallelism easily and focus on the problem to be
Enable developers to express parallelism easily and focus on the problem to be Improve theImprove the
Simplify the process of designing and testing parallel
Simplify the process of designing and testing parallel the problem to be
solvedthe problem to be
solvedImprove the
efficiency and scalability of parallel
applications
Improve the efficiency and
scalability of parallel applications
applicationsapplications
applicationsapplications
Tools, Programming Models, RuntimesProgramming ModelsTools
Parallel Pattern Task Parallel
Parallel LINQParallelDebugger Agents
LibraryLibraryLibrary Data St uc
ture
s
Profiler Concurrency
Library
NET Framework 4 Visual C++ 10Visual Studio
Task Scheduler
Concurrency Runtime
ThreadPoolructures D
ata
Stry
Analysis .NET Framework 4 Visual C++ 10StudioIDE
Resource Manager
Task Scheduler
Resource Manager
MPI Debugger
ThreadsOperating System UMS ThreadsWindows
Managed NativeKey:
Threads
Tooling
UMS ThreadsWindows
(and many overloads for the above)
void MatrixMult(void MatrixMult(int size, double** m1, double** m2, double** result)
{for (int i = 0; i < size; i++) {for (int i = 0; i < size; i++) {
for (int j = 0; j < size; j++) {result[i][j] = 0;for (int k = 0; k < size; k++) {for (int k = 0; k < size; k++) {
result[i][j] += m1[i][k] * m2[k][j];}
}}}
}
void MatrixMult(int size, double** m1, double** m2, double** result) {
int N = size;int N = size;int P = 2 * NUMPROCS; int Chunk = N / P;HANDLE hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);long counter = P;f (i t 0 P ) {for (int c = 0; c < P; c++) {
std::thread t ([&,c] {for (int i = c * Chunk;
i < (c + 1 == P ? N : (c + 1) * Chunk); i++) {for (int j = 0; j < size; j++) {o ( t j 0; j s e; j ) {result[i][j] = 0;for (int k = 0; k < size; k++) {
result[i][j] += m1[i][k] * m2[k][j];}
}}}if (InterlockedDecrement(counter) == 0)
SetEvent(hEvent);});
}}WaitForSingleObject(hEvent,INFINITE);CloseHandle(hEvent);
}
i i l S f i i h h d"Nontrivial Software written with threads, semaphores, and mutexes are incomprehensible
h d d h ld b d"to humans and cannot and should not be trusted"
Edward A LeeEdward A. LeeUC Berkeley
void MatrixMult(int size, double** m1, double** m2, double** result)
{{parallel_for(0, size, 1, [&](int i) {
for (int j = 0; j < size; j++) {result[i][j] = 0;f (i k 0 k i k ) {for (int k = 0; k < size; k++) {
result[i][j] += m1[i][k] * m2[k][j];}
}}});
}
IEnumerable<RaceCarDriver> drivers = ...;;var results = new List<RaceCarDriver>();foreach(var driver in drivers){{
if (driver.Name == queryName &&driver.Wins.Count >= queryWinCount)q y
{results.Add(driver);
}}}results.Sort((b1, b2) =>
b1.Age.CompareTo(b2.Age));
IEnumerable<RaceCarDriver> drivers = …;var results = new List<RaceCarDriver>();int partitionsCount = Environment.ProcessorCount;i i i C i i Cint remainingCount = partitionsCount;var enumerator = drivers.GetEnumerator();try {
using (var done = new ManualResetEvent(false)) {for(int i = 0; i < partitionsCount; i++) {p
ThreadPool.QueueUserWorkItem(delegate {while(true) {
RaceCarDriver driver;lock (enumerator) {
if (!enumerator MoveNext()) break;if (!enumerator.MoveNext()) break;driver = enumerator.Current;
}if (driver.Name == queryName &&
driver.Wins.Count >= queryWinCount) {lock(results) results.Add(driver);
}}if (Interlocked.Decrement(ref remainingCount) == 0) done.Set();
});});}done.WaitOne();results.Sort((b1, b2) => b1.Age.CompareTo(b2.Age));
}}}finally { if (enumerator is IDisposable) ((IDisposable)enumerator).Dispose(); }
var results = from driver in driverswhere driver.Name == queryName &&
driver.Wins.Count >= queryWinCount.AsParallel()driver.Wins.Count > queryWinCountorderby driver.Age ascendingselect driver;
()
Programming-Model-IndependentSchedulerScheduler
Resource Managerg
LocalWork-
S li
Local Work-
S li…
Lock-FreeGlobal Queue
Stealing Queue
Stealing Queue
Worker Thread 1
Worker Thread p
…
Task 6
Program Thread
Task 1Task 2
Task 3Task 5
Task 4