Parallel programming patterns - Олександр Павлишак

Post on 10-May-2015


Description

Parallelism and concurrency are the direction in which programming technologies are heading now and will, without a doubt, keep heading in the future. Even entry-level computers now ship with multicore processors, which opens up opportunities for building fast, efficient applications with responsive interfaces. So even developers who have never dealt with concurrent code will find themselves programming with parallelism in mind more and more often. The talk covers parallel programming patterns, approaches to asynchronous programming, and the directions individual modern technologies are taking in the area of parallelism and asynchrony. Attendees will learn the main ways to organize parallel computations in desktop, web, and server applications, means of achieving a responsive GUI, and techniques for solving the problems that arise in concurrent programming.

Transcript of Parallel programming patterns - Олександр Павлишак

Parallel Programming Patterns

Audience: developers

Олександр Павлишак, 2011
pavlyshak@gmail.com

Contents

- The trend
- Basic terms
- Managing state
- Parallelism
- Tools

Yesterday

Today

Tomorrow

What is going on?

- CPU clock-speed growth has slowed down
- Due to physical limits
- Free lunch is over
- Software no longer gets faster by itself

Current trends

- Manycore, multicore
- GPGPU, GPU acceleration, heterogeneous computing
- Distributed computing, HPC

Basic concepts

- Concurrency
  - many interleaved threads of control
- Parallelism
  - same result, but faster
- Concurrency != Parallelism
  - it is not always necessary to care about concurrency while implementing parallelism
- Multithreading
- Asynchrony

Tasks

- CPU-bound
  - number crunching
- I/O-bound
  - network, disk

State

- Shared
  - accessible by more than one thread
  - sharing is transitive
- Private
  - used by a single thread only

Task-based program

Application

Tasks (CPU, I/O)

Runtime (queuing, scheduling)

Processors (threads, processes)

Managing state

Isolation

- Avoiding shared state
- Each thread has its own copy of state
- Examples:
  - process isolation
  - intraprocess isolation
  - isolation by convention

Immutability

- Multiple readers -- not a problem!
- All functions are pure
- Requires immutable collections
- The functional way: Haskell, F#, Lisp
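A minimal sketch of the idea, assuming the System.Collections.Immutable package: any number of tasks may read an immutable list concurrently, and "modifying" it produces a new list instead of mutating the shared one.

```csharp
using System;
using System.Collections.Immutable;
using System.Linq;
using System.Threading.Tasks;

var prices = ImmutableList.CreateRange(new[] { 10, 20, 30, 40 });

// Concurrent reads need no synchronization: the list never changes.
var sums = await Task.WhenAll(
    Task.Run(() => prices.Sum()),
    Task.Run(() => prices.Sum()));

// "Modification" returns a new list; readers of the old one are unaffected.
var updated = prices.Add(50);

Console.WriteLine(sums[0]);        // 100
Console.WriteLine(updated.Count);  // 5
Console.WriteLine(prices.Count);   // still 4
```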

Synchronization

- The only remaining way to deal with shared mutable state

- Kinds:
  - data synchronization
  - control synchronization

Data synchronization

- Why? To avoid race conditions and data corruption

- How? Mutual exclusion
  - data remains consistent
- Critical regions
  - locks, monitors, critical sections, spin locks
- Code-centered
  - rather than associated with data

Critical region

Thread 1:
// ...
lock (locker)
{
    // ...
    data.Operation();
    // ...
}
// ...

Thread 2 (blocks until Thread 1 releases the lock):
// ...
lock (locker)
{
    // ...
    data.Operation();
    // ...
}
// ...

Control synchronization

- To coordinate control flow
  - exchange data
  - orchestrate threads
- Waiting, notifications
  - spin waiting
  - events
  - alternative: continuations
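A small sketch of control synchronization via an event: one task prepares data and notifies, another waits for the notification before using it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var dataReady = new ManualResetEventSlim(false);
int[] data = null;

var producer = Task.Run(() =>
{
    data = new[] { 1, 2, 3 }; // prepare the data
    dataReady.Set();          // notify waiters
});

var consumer = Task.Run(() =>
{
    dataReady.Wait();                // block until notified
    Console.WriteLine(data.Length);  // safe to read now
});

Task.WaitAll(producer, consumer);
```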

Three ways to manage state

- Isolation: simple, loosely coupled, highly scalable, the right data structures, locality
- Immutability: avoids synchronization
- Synchronization: complex, runtime overheads, contention

- Prefer them in that order

Parallelism

Approaches to decomposing work

- Data parallelism
- Task parallelism
- Message-based parallelism

Data parallelism

How?

- Data is divided up among hardware processors
- The same operation is performed on the elements
- Optionally -- a final aggregation
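The three steps above can be sketched with the `Parallel.For` overload that keeps a per-thread partial result and aggregates at the end:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var items = new int[1000];
for (int i = 0; i < items.Length; i++) items[i] = i;

long total = 0;
Parallel.For(0, items.Length,
    () => 0L,                                   // per-thread partial sum
    (i, state, partial) => partial + items[i],  // same operation on each element
    partial => Interlocked.Add(ref total, partial)); // final aggregation

Console.WriteLine(total); // 499500
```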

Data parallelism

When?

- Large amounts of data
- The processing operation is costly
- or both

Data parallelism

Why?

- To achieve speedup

- For example, with GPU acceleration:
  - hours instead of days!

Data parallelism

Embarrassingly parallel problems
- parallelizable loops
- image processing

Non-embarrassingly parallel problems
- parallel QuickSort

Data parallelism

[Diagram: the data set split into chunks, processed by Thread 1 and Thread 2]

Data parallelism

Structured parallelism

- Well-defined begin and end points
- Examples:
  - CoBegin
  - ForAll

CoBegin

var firstDataset = new DataItem[1000];
var secondDataset = new DataItem[1000];
var thirdDataset = new DataItem[1000];

Parallel.Invoke(
    () => Process(firstDataset),
    () => Process(secondDataset),
    () => Process(thirdDataset));

Parallel For

var items = new DataItem[1000 * 1000];
// ...
Parallel.For(0, items.Length, i =>
{
    Process(items[i]);
});

Parallel ForEach

var tickers = GetNasdaqTickersStream();
Parallel.ForEach(tickers, ticker =>
{
    Process(ticker);
});

Striped Partitioning

[Diagram: striped partitioning -- alternating elements assigned to Thread 1 and Thread 2]
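A hand-rolled sketch of striping, under the assumption of P workers: worker t takes elements t, t + P, t + 2P, and so on, so every index is touched by exactly one worker.

```csharp
using System;
using System.Threading.Tasks;

var items = new int[10];
for (int i = 0; i < items.Length; i++) items[i] = i;

const int workers = 2;
var results = new int[items.Length];

var tasks = new Task[workers];
for (int t = 0; t < workers; t++)
{
    int stripe = t; // capture the stripe index for the lambda
    tasks[stripe] = Task.Run(() =>
    {
        // stride through the array: stripe, stripe + workers, ...
        for (int i = stripe; i < items.Length; i += workers)
            results[i] = items[i] * 2;
    });
}
Task.WaitAll(tasks);

Console.WriteLine(results[9]); // 18
```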

Iterate complex data structures

var tree = new TreeNode();
// ...
Parallel.ForEach(
    TraversePreOrder(tree),
    node =>
    {
        Process(node);
    });

Iterate complex data

[Diagram: tree nodes distributed between Thread 1 and Thread 2]

Declarative parallelism

var items = new DataItem[1000 * 1000];
// ...
var validItems =
    from item in items.AsParallel()
    let processedItem = Process(item)
    where processedItem.Property > 42
    select Convert(processedItem);

foreach (var item in validItems)
{
    // ...
}

Data parallelism

Challenges

- Partitioning
- Scheduling
- Ordering
- Merging
- Aggregation
- Concurrency hazards: data races, contention

Task parallelism

How?

- Programs are already functionally partitioned: statements, methods, etc.
- Run independent pieces in parallel
- Control synchronization
- State isolation

Task parallelism

Why?

- To achieve speedup

Task parallelism

Kinds:
- Structured
  - clear begin and end points
- Unstructured
  - often demands explicit synchronization

Fork/join

- Fork: launch tasks asynchronously
- Join: wait until they complete
- CoBegin, ForAll
- Recursive decomposition

Fork/join

[Diagram: sequential code forks into Task 1, Task 2, and Task 3, then joins back into sequential code]

Fork/join

Parallel.Invoke(
    () => LoadDataFromFile(),
    () => SavePreviousDataToDB(),
    () => RenewOtherDataFromWebService());

Fork/join

Task loadData = Task.Factory.StartNew(() =>
{
    // ...
});
Task saveAnotherDataToDB = Task.Factory.StartNew(() =>
{
    // ...
});
// ...
Task.WaitAll(loadData, saveAnotherDataToDB);
// ...

Fork/join

void Walk(TreeNode node)
{
    if (node == null) return; // guard: children of leaf nodes are null
    var tasks = new[]
    {
        Task.Factory.StartNew(() => Process(node.Value)),
        Task.Factory.StartNew(() => Walk(node.Left)),
        Task.Factory.StartNew(() => Walk(node.Right))
    };
    Task.WaitAll(tasks);
}

Fork/join recursive

[Diagram: recursive fork/join over a tree -- each node forks tasks for its value and for its Left and Right subtrees, then joins]

Dataflow parallelism: Futures

Task<DataItem[]> loadDataFuture = Task.Factory.StartNew(() =>
{
    // ...
    return LoadDataFromFile();
});

var dataIdentifier = SavePreviousDataToDB();
RenewOtherDataFromWebService(dataIdentifier);
// ...
DisplayDataToUser(loadDataFuture.Result);

Dataflow parallelism: Futures

[Diagrams: a sequential flow overlapped with one or more futures, each joining back in where its result is needed]

Continuations

[Diagram: tasks chained as continuations off a sequential flow]

Continuations

var loadData = Task.Factory.StartNew(() =>
{
    return LoadDataFromFile();
});

var writeToDB = loadData.ContinueWith(dataItems =>
{
    WriteToDatabase(dataItems.Result);
});

var reportToUser = writeToDB.ContinueWith(t =>
{
    // ...
});
reportToUser.Wait();

Producer/consumer pipeline

[Diagram: three stages -- reading produces lines, parsing turns lines into parsed lines, storing writes them to the DB; queues of lines and parsed lines connect the stages]

Producer/consumer

var lines = new BlockingCollection<string>();

Task.Factory.StartNew(() =>
{
    foreach (var line in File.ReadLines(...))
        lines.Add(line);
    lines.CompleteAdding();
});

Producer/consumer

var dataItems = new BlockingCollection<DataItem>();

Task.Factory.StartNew(() =>
{
    foreach (var line in lines.GetConsumingEnumerable())
        dataItems.Add(Parse(line));
    dataItems.CompleteAdding();
});

Producer/consumer

var dbTask = Task.Factory.StartNew(() =>
{
    foreach (var item in dataItems.GetConsumingEnumerable())
        WriteToDatabase(item);
});

dbTask.Wait();

Task parallelism

Challenges

- Scheduling
- Cancellation
- Exception handling
- Concurrency hazards: deadlocks, livelocks, priority inversions, etc.

Message based parallelism

- Accessing shared state vs. local state
  - no distinction, unfortunately
- Idea: encapsulate shared state changes into messages
- Async events
- Actors, agents
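A minimal agent sketch along these lines, built on BlockingCollection: the state is private to one message-processing loop, and the only way to change it is to send a message.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

var mailbox = new BlockingCollection<int>();
int state = 0; // private to the agent loop below

var agent = Task.Run(() =>
{
    // All mutations of the state happen on this one thread.
    foreach (var message in mailbox.GetConsumingEnumerable())
        state += message;
});

// Other threads communicate only by sending messages.
mailbox.Add(10);
mailbox.Add(32);
mailbox.CompleteAdding();
agent.Wait();

Console.WriteLine(state); // 42
```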

Tools

Concurrent data structures

- Concurrent queues, stacks, sets, lists
- Blocking collections
- Work-stealing queues
- Lock-free data structures
- Immutable data structures

Synchronization primitives

- Critical sections
- Monitors
- Auto- and manual-reset events
- Countdown events
- Mutexes
- Semaphores
- Timers
- RW locks
- Barriers
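One primitive from the list, sketched: CountdownEvent lets a coordinator block until N workers have each signalled completion.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

const int workers = 3;
var done = new CountdownEvent(workers);

for (int i = 0; i < workers; i++)
{
    Task.Run(() =>
    {
        // ... do a piece of work ...
        done.Signal(); // one worker finished
    });
}

done.Wait(); // blocks until all three workers have signalled
Console.WriteLine("all workers done");
```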

Thread local state

- A way to achieve isolation

var parser = new ThreadLocal<Parser>(
    () => CreateParser());

Parallel.ForEach(
    items,
    item => parser.Value.Parse(item));

Thread pools

ThreadPool.QueueUserWorkItem(_ =>
{
    // do some work
});

Async

Task.Factory.StartNew(() =>
    {
        // ...
        return LoadDataFromFile();
    })
    .ContinueWith(dataItems =>
    {
        WriteToDatabase(dataItems.Result);
    })
    .ContinueWith(t =>
    {
        // ...
    });

Async

var dataItems = await LoadDataFromFileAsync();

textBox.Text = dataItems.Count.ToString();

await WriteToDatabaseAsync(dataItems);

// continue work

Technologies

- TPL, PLINQ, C# async, TPL Dataflow
- PPL, Intel TBB, OpenMP
- CUDA, OpenCL, C++ AMP
- Actors, STM
- Many others

Summary

- Programming for many CPUs
- Concurrency != parallelism
- CPU-bound vs. I/O-bound tasks
- Private vs. shared state

Summary

- Managing state:
  - isolation
  - immutability
  - synchronization
    - data: mutual exclusion
    - control: notifications

Summary

- Parallelism:
  - data parallelism: scalable
  - task parallelism: less scalable
  - message-based parallelism

Summary

- Data parallelism
  - CoBegin
  - Parallel ForAll
  - Parallel ForEach
  - Parallel ForEach over complex data structures
  - Declarative data parallelism
- Challenges: partitioning, scheduling, ordering, merging, aggregation, concurrency hazards

Summary

- Task parallelism: structured, unstructured
- Fork/join
  - CoBegin
  - Recursive decomposition
- Futures
- Continuations
- Producer/consumer (pipelines)
- Challenges: scheduling, cancellation, exceptions, concurrency hazards

Summary

- Tools
  - compilers, libraries
  - concurrent data structures
  - synchronization primitives
  - thread-local state
  - thread pools
  - async invocations
  - ...

Q/A