Overview Of Parallel Development - Ericnel

Post on 07-Nov-2014


VBUG Newcastle delivery 24th February 2009 by Eric Nelson


Overview of Parallel Development

Eric Nelson
http://geekswithblogs.net/iupdateable
http://blogs.msdn.com/goto100
http://twitter.com/ericnel

Agenda

Overview of what we are up to
Drill down into parallel programming for managed developers

Things I learnt...

We have a very large investment in parallel computing
We have “something for everyone”
  It is not all synced, it is sometimes overlapping
It is a big topic
  Managed vs native vs client vs server vs task vs data...
Even with the investment, design/code/test for parallel is far harder
  Locking, deadlocks, livelocks
It is about getting ready for the future
  Code today, run better tomorrow?
VS2010 CTP is not a great place for parallel
  Single core in guest
  Unsupported route to use Hyper-V
Easiest route to dabble: Microsoft Parallel Extensions June CTP for VS2008

Buying a new Processor

£100 - £300
2-3GHz
2 cores or 4
64-bit
Diagram labels: Core, Core

Buying a new Processor

£200 - £500
2-3GHz
4 cores with HT
64-bit
Diagram labels: Core (x4), QuickPath Interconnect, Memory Controller

Where will it all end?

Unisys ES7000 (7600R) used with kind permission of Mr Henk van der Valk, Unisys, NL

Was it a wise purchase?

Windows OS
App 1 | App 2 | ...
App 1
.NET CLR
.NET Framework
My Code

Was it a wise purchase?

Some environments scale to take advantage of additional CPU cores (mostly server-side)

A lot of code does not (mostly client-side)
  This code will see little benefit from future hardware advances

ASP.NET Web Forms/Services | WCF Services | WF Engine | ...
.NET ThreadPool or Custom Threading Strategy

What happened to “The Free Lunch”?

Bad sequential code will run faster on a faster processor

Just using parallel code is not enough
Bad parallel code WILL NOT run faster on more cores
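To make "bad parallel code" concrete, here is a minimal C++ sketch, not from the original deck. Both function names are hypothetical. The first version takes one shared lock for every addition, so the threads mostly queue behind each other. The second lets each thread work on private state and combines the results once. Both return the same answer, but only the second has a chance of scaling with cores.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// "Bad" parallel sum: every addition takes the same lock, so the threads
// spend most of their time waiting for the mutex instead of working.
std::uint64_t sum_with_shared_lock(const std::vector<int>& data, unsigned workers) {
    assert(workers > 0);
    std::uint64_t total = 0;
    std::mutex m;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            for (std::size_t i = w; i < data.size(); i += workers) {
                std::lock_guard<std::mutex> guard(m);
                total += data[i];
            }
        });
    }
    for (auto& t : pool) t.join();
    return total;
}

// Better: each thread accumulates privately; results are combined once.
std::uint64_t sum_with_local_totals(const std::vector<int>& data, unsigned workers) {
    assert(workers > 0);
    std::vector<std::uint64_t> partial(workers, 0);
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            std::uint64_t local = 0;
            for (std::size_t i = w; i < data.size(); i += workers) local += data[i];
            partial[w] = local;  // one write per thread, no lock needed
        });
    }
    for (auto& t : pool) t.join();
    std::uint64_t total = 0;
    for (auto p : partial) total += p;
    return total;
}
```

Timing the two on your own hardware is the interesting part; the sketch makes no claim about specific speedups.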

Applications Can Scale Well

Chart: parallel speedup (0 to 64) against cores (0 to 64)
Series: Production Fluid, Production Face, Production Cloth, Game Fluid, Game Rigid Body, Game Cloth, Marching Cubes, Sports Video Analysis, Video Cast Indexing, Home Video Editing, Text Indexing, Ray Tracing, Foreground Estimation, Human Body Tracker, Portfolio Management, Geometric Mean
Domains: Graphics Rendering, Physical Simulation, Vision, Data Mining, Analytics

What's The Problem?

Multithreaded programming is “hard” today
  Doable by only a subgroup of senior specialists
  Parallel patterns are not prevalent, well known, nor easy to implement
  So many potential problems: races, deadlocks, livelocks, lock convoys, cache coherency overheads, lost event notifications, broken serializability, priority inversion, and so on…
Businesses have little desire to “go deep”
  Best developers should focus on business value, not concurrency
  Need simple ways to allow all developers to write concurrent code
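As one illustration of the problems listed for this slide, here is a hedged C++17 sketch of the classic two-lock deadlock and a way to avoid it. The `Account` type and both functions are hypothetical and are not part of the original talk. If each thread locked its "from" account and then its "to" account by hand, two opposing transfers could wait on each other forever. `std::scoped_lock` acquires both mutexes using a deadlock-avoidance algorithm.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

struct Account {  // hypothetical type, for illustration only
    std::mutex m;
    long balance = 0;
};

// Taking from.m and then to.m manually can deadlock when another thread
// transfers in the opposite direction. std::scoped_lock locks both safely.
void transfer(Account& from, Account& to, long amount) {
    assert(&from != &to);
    std::scoped_lock lock(from.m, to.m);
    from.balance -= amount;
    to.balance += amount;
}

// Runs two threads that transfer in opposite directions and returns the
// combined balance, which should be unchanged.
long run_opposing_transfers(int rounds) {
    Account a, b;
    a.balance = 1000;
    b.balance = 1000;
    std::thread t1([&] { for (int i = 0; i < rounds; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < rounds; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

This covers only one item on the list; races, lock convoys and priority inversion each need their own care.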

void MatrixMult(int size, double** m1, double** m2, double** result)
{
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size; j++) {
            result[i][j] = 0;
            for (int k = 0; k < size; k++) {
                result[i][j] += m1[i][k] * m2[k][j];
            }
        }
    }
}

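#include <windows.h>
#include <thread>

// NUMPROCS is assumed to be defined elsewhere as the number of processors.
void MatrixMult(int size, double** m1, double** m2, double** result)
{
    int N = size;
    int P = 2 * NUMPROCS;       // number of chunks
    int Chunk = N / P;          // rows per chunk (static partitioning)
    HANDLE hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
    volatile LONG counter = P;  // chunks still running
    for (int c = 0; c < P; c++) {
        std::thread t([&, c] {
            // The last chunk also picks up any leftover rows.
            for (int i = c * Chunk; i < (c + 1 == P ? N : (c + 1) * Chunk); i++) {
                for (int j = 0; j < size; j++) {
                    result[i][j] = 0;
                    for (int k = 0; k < size; k++) {
                        result[i][j] += m1[i][k] * m2[k][j];
                    }
                }
            }
            // InterlockedDecrement takes a pointer; the last chunk to finish signals the event.
            if (InterlockedDecrement(&counter) == 0)
                SetEvent(hEvent);
        });
        t.detach();  // a std::thread must be joined or detached before it is destroyed
    }
    WaitForSingleObject(hEvent, INFINITE);
    CloseHandle(hEvent);
}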

Synchronization Knowledge

Error prone

Heavy synchronization

Static partitioning

Lack of thread reuse

Tricks

Lots of boilerplate
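For contrast with the hand-rolled version, here is a rough sketch of how the same multiply reads once the threading is hidden behind a loop helper. The `parallel_for` below is a hypothetical stand-in written on top of `std::thread`; it is not the Parallel Pattern Library function of the same name, and it still uses static partitioning with no thread reuse. Production libraries improve on both. The point is only that the boilerplate moves out of the algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper: runs body(i) for every i in [first, last), split
// into contiguous chunks across the available hardware threads.
void parallel_for(int first, int last, const std::function<void(int)>& body) {
    assert(first <= last);
    const int count = last - first;
    if (count == 0) return;
    const int hw = static_cast<int>(std::thread::hardware_concurrency());
    const int workers = std::max(1, std::min(count, hw));
    const int chunk = (count + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (int begin = first; begin < last; begin += chunk) {
        const int end = std::min(last, begin + chunk);
        pool.emplace_back([begin, end, &body] {
            for (int i = begin; i < end; ++i) body(i);
        });
    }
    for (auto& t : pool) t.join();
}

// Square matrix multiply; each row of the result is written by one thread only.
Matrix matrix_mult(const Matrix& m1, const Matrix& m2) {
    const int size = static_cast<int>(m1.size());
    Matrix result(size, std::vector<double>(size, 0.0));
    parallel_for(0, size, [&](int i) {
        for (int j = 0; j < size; ++j)
            for (int k = 0; k < size; ++k)
                result[i][j] += m1[i][k] * m2[k][j];
    });
    return result;
}
```

Because each iteration writes only its own row, no locks or events are needed in the algorithm itself.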

Microsoft Parallel Computing Technologies

• Robotics-based manufacturing assembly line
• Silverlight Olympics viewer

• Enterprise search, OLTP, collab
• Animation / CGI rendering
• Weather forecasting
• Seismic monitoring
• Oil exploration

• Automotive control system
• Internet-based photo services

• Ultrasound imaging equipment
• Media encode/decode
• Image processing/enhancement
• Data visualization

Task Concurrency

Data Parallelism

Distributed/Cloud Computing

Local Computing

CCR

Maestro

TPL / PPL

Cluster TPL

Cluster PLINQ

MPI / MPI.Net

WCF

Cluster SOA

WF

PLINQ

TPL / PPL

CDS

OpenMP

WF

Compute Shader

Visual Studio 2010: Tools / Programming Models / Runtimes

Diagram labels, by layer:
Integrated Tooling (Tools): Parallel Debugger, Profiler Concurrency Analysis
Programming Models: Task Parallel Library, PLINQ, Parallel Pattern Library, Agents Library, Data Structures
Libraries: Managed Library, Native Library
Concurrency Runtime: ThreadPool, Task Scheduler, Resource Manager
Operating System: Threads


Explicit Tasking Support

.NET 4.0 Task Parallel Library
  Task, TaskFactory
  Parallel.For
  Parallel.ForEach
  Parallel.Invoke
  Concurrent data structures

Visual Studio 2010 C++ Parallel Pattern Library
  task, task_group
  parallel_for
  parallel_for_each
  parallel_invoke
  Concurrent data structures
  Primitives for message passing
  User-mode locks
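The task idea behind both libraries can be sketched in portable C++ with `std::async`. This is an analogy under stated assumptions, not the TPL or PPL API: `parallel_sum` and its `grain` parameter are hypothetical names. One half of the range runs as a separate task while the calling thread handles the other half, and small ranges fall back to a plain sequential sum.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Sums data[begin, end). Ranges larger than `grain` are split in two:
// the left half runs as an asynchronous task, the right half runs on
// the calling thread, and the two results are added.
long long parallel_sum(const std::vector<int>& data, std::size_t begin,
                       std::size_t end, std::size_t grain = 1000) {
    assert(grain > 0 && begin <= end && end <= data.size());
    if (end - begin <= grain)
        return std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
    const std::size_t mid = begin + (end - begin) / 2;
    auto left = std::async(std::launch::async, [&data, begin, mid, grain] {
        return parallel_sum(data, begin, mid, grain);
    });
    const long long right = parallel_sum(data, mid, end, grain);
    return left.get() + right;
}
```

Note that `std::launch::async` may start a new thread per task, which is exactly the cost a task scheduler with a thread pool is designed to avoid.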

Task Parallel Library (TPL)


Task

No Threading to Threading to Tasks

CLR Thread Pool

Diagram labels: Program Thread, User Mode Scheduler, Global Queue, Worker Thread 1 ... Worker Thread p

CLR Thread Pool: Work-Stealing

Diagram labels: Program Thread, User Mode Scheduler For Tasks, Global Queue, Local Queues, Worker Thread 1 ... Worker Thread p, Task 1 to Task 6
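The local queues in the work-stealing picture can be sketched as a double-ended queue per worker. This is a simplified illustration, not the CLR implementation: the class name is hypothetical, tasks are reduced to integer ids, and a single mutex replaces the lock-free techniques a real scheduler would use. The owning worker pushes and pops at the back, so it tends to run the most recently queued work. An idle worker steals from the front, taking the oldest item.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <optional>

// Hypothetical per-worker queue; tasks are represented by integer ids.
class WorkStealingQueue {
public:
    // Called by the owning worker when it creates new work.
    void push(int task_id) {
        assert(task_id >= 0);  // ids are non-negative in this sketch
        std::lock_guard<std::mutex> guard(m_);
        tasks_.push_back(task_id);
    }
    // Called by the owning worker: newest task first (LIFO).
    std::optional<int> pop_local() {
        std::lock_guard<std::mutex> guard(m_);
        if (tasks_.empty()) return std::nullopt;
        const int id = tasks_.back();
        tasks_.pop_back();
        return id;
    }
    // Called by other, idle workers: oldest task first (FIFO).
    std::optional<int> steal() {
        std::lock_guard<std::mutex> guard(m_);
        if (tasks_.empty()) return std::nullopt;
        const int id = tasks_.front();
        tasks_.pop_front();
        return id;
    }
private:
    std::mutex m_;
    std::deque<int> tasks_;
};
```

Taking from opposite ends keeps the owner and the thief away from each other most of the time, which is the main attraction of the design.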

Debugger Support

Support both managed and native
1. Parallel Tasks
2. Parallel Stacks

Higher Level Constructs

Even with Task there are common patterns that build into higher level abstractions

The Parallel class
  Invoke, For, For<T>, ForEach

Care needs to be taken with state, ordering
  “This is not your Father’s for loop”
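The warning about state and ordering can be shown with a small C++ sketch, assuming a simple hand-written parallel loop rather than the .NET Parallel class. Both function names are hypothetical. When iterations append to one shared container, a lock is required and the output order depends on thread timing. When each iteration writes to its own slot, the shared state disappears and the results stay in index order.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Shared state: every iteration appends to the same vector, so a lock is
// needed and the order of the results is not guaranteed.
std::vector<int> squares_shared(int n, int workers) {
    assert(n >= 0 && workers > 0);
    std::vector<int> out;
    std::mutex m;
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w)
        pool.emplace_back([&, w] {
            for (int i = w; i < n; i += workers) {
                std::lock_guard<std::mutex> guard(m);
                out.push_back(i * i);
            }
        });
    for (auto& t : pool) t.join();
    return out;  // all squares are present, in an unspecified order
}

// No shared state: iteration i owns slot i, so no lock is needed and the
// results come back in index order.
std::vector<int> squares_indexed(int n, int workers) {
    assert(n >= 0 && workers > 0);
    std::vector<int> out(n);
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w)
        pool.emplace_back([&, w] {
            for (int i = w; i < n; i += workers) out[i] = i * i;
        });
    for (auto& t : pool) t.join();
    return out;
}
```

A sequential for loop would give the ordered result for free, which is why a parallel loop is not a drop-in replacement.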


Parallel

Parallel.ForEach
Parallel.Invoke

Declarative Data Parallelism

Parallel LINQ-to-Objects (PLINQ)
  Enables LINQ devs to leverage multiple cores
  Fully supports all .NET standard query operators
  Minimal impact to existing LINQ model

var q = from p in people
        where p.Name == queryInfo.Name &&
              p.State == queryInfo.State &&
              p.Year >= yearStart &&
              p.Year <= yearEnd
        orderby p.Year ascending
        select p;
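To show roughly what a data-parallel query engine has to do for a filter-and-sort query like the one above, here is a hand-written C++ analogue. It is a sketch under stated assumptions, not how PLINQ is implemented: the `Person` struct, the `query` function and the `chunks` parameter are all hypothetical. The input is split into chunks, each chunk is filtered as a separate task, and the matches are merged and ordered by year.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

struct Person {  // hypothetical record mirroring the fields in the query
    std::string name;
    std::string state;
    int year;
};

// Filters `people` in parallel chunks, then merges and sorts by year.
std::vector<Person> query(const std::vector<Person>& people, const std::string& name,
                          const std::string& state, int year_start, int year_end,
                          std::size_t chunks = 4) {
    assert(chunks > 0);
    const std::size_t step = (people.size() + chunks - 1) / chunks;
    std::vector<std::future<std::vector<Person>>> parts;
    for (std::size_t begin = 0; begin < people.size(); begin += step) {
        const std::size_t end = std::min(people.size(), begin + step);
        parts.push_back(std::async(std::launch::async, [&, begin, end] {
            std::vector<Person> hits;
            for (std::size_t i = begin; i < end; ++i) {
                const Person& p = people[i];
                if (p.name == name && p.state == state &&
                    p.year >= year_start && p.year <= year_end)
                    hits.push_back(p);
            }
            return hits;
        }));
    }
    std::vector<Person> result;
    for (auto& part : parts) {
        std::vector<Person> hits = part.get();
        result.insert(result.end(), hits.begin(), hits.end());
    }
    // Equivalent of "orderby p.Year ascending".
    std::stable_sort(result.begin(), result.end(),
                     [](const Person& a, const Person& b) { return a.year < b.year; });
    return result;
}
```

The declarative query hides all of this partitioning and merging, which is the appeal of the approach for LINQ developers.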


Parallel LINQ

What Next?

Download VS 2010 CTP
  Remember to set the clock back

Or download Parallel Extensions June CTP for VS2008
  Experiment with runtime and API

Team is working on Visual Studio 2010 beta
  Very open to feedback
  Join in the discussion forums
  http://blogs.msdn.com/pfxteam/

Parallel Computing Resources

Downloads, Binaries, Code, Forums, Blogs, Videos, Screencasts, Podcasts, Articles, Samples

http://msdn.com/concurrency

http://blogs.msdn.com/pfxteam/