Design Patterns for Parallel Programming


Page 1: Design Patterns for Parallel Programming

1

Design Patterns for Parallel Programming

Research work by: Kurt Keutzer, EECS, UC Berkeley; Tim Mattson, Intel Corporation; Michael Wrinn, Intel Corporation

Presented to faculty at the Tsinghua Multicore Workshop

Page 2: Design Patterns for Parallel Programming

2

Outline

A Programming Pattern Language:
• Motivation – patterns: why and what?
• Structural patterns
• Computational patterns
• Composing patterns: examples
• Concurrency patterns and PLPP
• Examples: pattern language case studies
• Examples: pattern language in Smoke
• Demo

Page 3: Design Patterns for Parallel Programming

Outline

• Motivation – what is the problem we are trying to solve?
• How do programmers think? Psychology meets computer science.
• Pattern Language for Parallel Programming (PLPP)
  • Toy problems showing some key PLPP patterns
  • A PLPP example (molecular dynamics)
• Expanding the effort: patterns for engineering parallel software
  • A survey of some key patterns
  • Case studies

3

Page 4: Design Patterns for Parallel Programming

Microprocessor trends

Many core processors are the "new normal":
• Intel Terascale research chip: 80 cores
• NVIDIA Tesla C1060: 240 cores
• IBM Cell: 1 CPU + 6 cores
• ATI RV770: 160 cores

3rd party names are the property of their owners.

Page 5: Design Patterns for Parallel Programming

5

The state of the field: a harsh assessment …

We have turned to multi-core chips not because of the success of our parallel software but because of our failure to continually increase CPU frequency.

The result is a fundamental and dangerous (for the computer industry) mismatch: parallel hardware is ubiquitous, while parallel software is rare.

The multi-core challenge: how do we make parallel software as ubiquitous as parallel hardware? This could be the greatest challenge ever faced by the computer industry.

Page 6: Design Patterns for Parallel Programming

We need to create a new generation of parallel programmers

Parallel programming is difficult, error prone, and accessible only to a small cadre of experts. Clearly we haven't been doing a very good job of educating programmers.

Consider the following: all known programmers are human beings. Hence, human psychology, not hardware design, should guide our work on this problem.

6

We need to focus on the needs of the programmer, not the needs of the computer

Page 7: Design Patterns for Parallel Programming

Outline

• Motivation – what is the problem we are trying to solve?
• How do programmers think? Psychology meets computer science.
• Pattern Language for Parallel Programming (PLPP)
  • Toy problems showing some key PLPP patterns
  • A PLPP example (molecular dynamics)
• Expanding the effort: patterns for engineering parallel software
  • A survey of some key patterns
  • Case studies

7

Page 8: Design Patterns for Parallel Programming

Cognitive psychology and human reasoning

Human beings are model builders:
• We build hierarchical complexes of mental models.
• We understand sensory input in terms of these models.
• When input conflicts with the models, we tend to believe the models.

Consider the following slide …

Page 9: Design Patterns for Parallel Programming
Page 10: Design Patterns for Parallel Programming

Why did most of you see motion?

Your brain's visual system contains a model that says this combination of geometric shapes and colors implies motion. Your brain believes the model, not what your eyes see.

To understand people and how we work, you need to understand the models we work with.

Page 11: Design Patterns for Parallel Programming

Programming and models

Programming is a process of successive refinement of a problem over a hierarchy of models. [Brooks83]

Each model represents the problem at a different level of abstraction: the top levels express the problem in the original problem domain, while the lower levels represent it in the computer's domain. The models are informal, but detailed enough to support simulation.

Page 12: Design Patterns for Parallel Programming

Model based reasoning in programming

Models and their domains:
• Specification model – problem specific: polygons, rays, molecules, etc.
• Programming model – OpenMP's fork/join, Actors
• Computation model – threads (shared memory), processes (shared nothing)
• Machine model (AKA cost model) – registers, ALUs, caches, interconnects, etc.

Page 13: Design Patterns for Parallel Programming

Programming process: getting started

The programmer starts by constructing a high level model from the specification. Successive refinement then uses some combination of the following techniques:
• The problem's state is defined in terms of objects belonging to abstract data types with meaning in the original problem domain.
• Key features of the solution, sometimes referred to as "beacons" [Wiedenbeck89], are identified and emphasized.

Page 14: Design Patterns for Parallel Programming

The programming process

• Programmers use an informal, internal notation based on the problem, mathematics, programmer experience, etc. Within a class of programming languages, the program generated is only weakly dependent on the language. [Robertson90] [Petre88]
• Programmers think about code in chunks or "plans". [Rist86] Low level plans code a specific operation (e.g. summing an array); high level or global plans relate to the overall program structure.

Page 15: Design Patterns for Parallel Programming

Programming process: strategy + opportunistic refinement

Common strategies are:
• Backwards goal chaining: start at the result and work backwards to generate subgoals. Continue recursively until plans emerge.
• Forward chaining: leap directly to plans when a problem is familiar. [Rist86]
• Opportunistic refinement: progress is made at multiple levels of abstraction, with effort focused on the most productive level. [Petre90]

Page 16: Design Patterns for Parallel Programming

Programming process: the role of testing

Programmers test the emerging solution throughout the programming process. Testing consists of two parts:
• Hypothesis generation: the programmer forms an idea of how a portion of the solution should behave.
• Simulation: the programmer runs a mental simulation of the solution within the problem models at the appropriate level of abstraction. [Guindom90]

Page 17: Design Patterns for Parallel Programming

From psychology to software

Hypothesis: a design pattern language provides the roadmap to apply results from the psychology of programming to software engineering:
• Design patterns capture the essence of plans.
• The structure of patterns in a pattern language should mirror the types of models programmers use.
• Connections between patterns must fit well with goal-chaining and opportunistic refinement.

Page 18: Design Patterns for Parallel Programming

References

[Brooks83] R. Brooks, "Towards a theory of the comprehension of computer programs", International Journal of Man-Machine Studies, vol. 18, pp. 543-554, 1983.

[Guindom90] R. Guindon, "Knowledge exploited by experts during software system design", International Journal of Man-Machine Studies, vol. 33, pp. 279-304, 1990.

[Hoc90] J.-M. Hoc, T.R.G. Green, R. Samurcay and D.J. Gilmore (eds.), Psychology of Programming, Academic Press Ltd., 1990.

[Petre88] M. Petre and R.L. Winder, "Issues governing the suitability of programming languages for programming tasks", People and Computers IV: Proceedings of HCI-88, Cambridge University Press, 1988.

[Petre90] M. Petre, "Expert Programmers and Programming Languages", in [Hoc90], p. 103, 1990.

[Rist86] R.S. Rist, "Plans in programming: definition, demonstration and development", in E. Soloway and S. Iyengar (eds.), Empirical Studies of Programmers, Norwood, NJ: Ablex, 1986.

[Rist90] R.S. Rist, "Variability in program design: the interaction of process with knowledge", International Journal of Man-Machine Studies, vol. 33, pp. 305-322, 1990.

[Robertson90] S.P. Robertson and C. Yu, "Common cognitive representations of program code across tasks and languages", International Journal of Man-Machine Studies, vol. 33, pp. 343-360, 1990.

[Wiedenbeck89] S. Wiedenbeck and J. Scholtz, "Beacons: a knowledge structure in program comprehension", in G. Salvendy and M.J. Smith (eds.), Designing and Using Human-Computer Interfaces and Knowledge-Based Systems, Amsterdam: Elsevier, 1989.

18

Page 19: Design Patterns for Parallel Programming

People, Patterns, and Frameworks

Application developer:
• Uses application design patterns (e.g. feature extraction) to design the application.
• Uses application frameworks (e.g. CBIR) to develop the application.

Application-framework developer:
• Uses programming design patterns (e.g. MapReduce) to design the application framework.
• Uses programming frameworks (e.g. MapReduce) to develop the application framework.

Page 20: Design Patterns for Parallel Programming

Eventually

1. Domain experts write end-user application programs using application patterns & frameworks.
2. Domain literate programming gurus (1% of the population) build the application frameworks.
3. Parallel programming gurus (1-10% of programmers) build the parallel patterns & programming frameworks.

The hope is for domain experts to create parallel code with little or no understanding of parallel programming, leaving hardcore "bare metal" efficiency layer programming to the parallel programming experts.

Page 21: Design Patterns for Parallel Programming

Today

The same stack applies: (1) domain experts writing end-user application programs with application patterns & frameworks, (2) domain literate programming gurus (1% of the population) building application frameworks, and (3) parallel programming gurus (1-10% of programmers) building parallel patterns & programming frameworks.

• For the foreseeable future, domain experts, application framework builders, and parallel programming gurus will all need to learn the entire stack.
• That's why you all need to be here today!

Page 22: Design Patterns for Parallel Programming

Definitions - 1

Design Patterns: “Each design pattern describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice.“ Page x, A Pattern Language, Christopher Alexander

Structural patterns: design patterns that provide solutions to problems associated with the development of program structure

Computational patterns: design patterns that provide solutions to recurrent computational problems

Page 23: Design Patterns for Parallel Programming

Definitions - 2

Library: The software implementation of a computational pattern (e.g. BLAS) or a particular sub-problem (e.g. matrix multiply)

Framework: An extensible software environment (e.g. Ruby on Rails) organized around a structural pattern (e.g. model-view-controller) that allows for programmer customization only in harmony with the structural pattern

Domain specific language: a programming language (e.g. Matlab) that provides language constructs that specifically support a particular application domain. The language may also supply library support for common computations in that domain (e.g. BLAS). If the language is restricted to maintain fidelity to a structure, and provides library support for common computations, then it encompasses a framework (e.g. NPClick).

Page 24: Design Patterns for Parallel Programming

Getting our software act together. First step: define a conceptual roadmap to guide our work.

[Figure: the 13 dwarves]

Page 25: Design Patterns for Parallel Programming

25

Alexander's Pattern Language

Christopher Alexander's approach to (civil) architecture: "Each pattern describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice." (Page x, A Pattern Language, Christopher Alexander)

Alexander's 253 (civil) architectural patterns range from the creation of cities (2. Distribution of Towns) to particular building problems (232. Roof Cap).

A pattern language is an organized way of tackling an architectural problem using patterns.

Main limitation: it's about civil, not software, architecture!

Page 26: Design Patterns for Parallel Programming

26

Alexander's Pattern Language (95-103)

Lay out the overall arrangement of a group of buildings: the height and number of these buildings, the entrances to the site, main parking areas, and lines of movement through the complex.

95. Building Complex

96. Number of Stories

97. Shielded Parking

98. Circulation Realms

99. Main Building

100. Pedestrian Street

101. Building Thoroughfare

102. Family of Entrances

103. Small Parking Lots

Page 27: Design Patterns for Parallel Programming

27

Family of Entrances (102)

May be part of Circulation Realms (98).

Conflict: when a person arrives in a complex of offices or services or workshops, or in a group of related houses, there is a good chance he will experience confusion unless the whole collection is laid out before him, so that he can see the entrance of the place where he is going.

Resolution: lay out the entrances to form a family. This means:
1) They form a group, are visible together, and each is visible from all the others.
2) They are all broadly similar, for instance all porches, or all gates in a wall, or all marked by a similar kind of doorway.

May contain Main Entrance (110), Entrance Transition (112), Entrance Room (130), Reception Welcomes You (149).

Page 28: Design Patterns for Parallel Programming

28

Family of Entrances

http://www.intbau.org/Images/Steele/Badran5a.jpg

Page 29: Design Patterns for Parallel Programming

29

Computational Patterns

The dwarfs from "The Berkeley View" (Asanovic et al.) form our key computational patterns.

Page 30: Design Patterns for Parallel Programming

30

Patterns for Parallel Programming

• PLPP is the first attempt to develop a complete pattern language for parallel software development.
• PLPP is a great model for a pattern language for parallel software.
• PLPP mined scientific applications that utilize a monolithic application style.
• PLPP doesn't help us much with horizontal composition.
• Much more useful to us than: Design Patterns: Elements of Reusable Object-Oriented Software, Gamma, Helm, Johnson & Vlissides, Addison-Wesley, 1995.

Page 31: Design Patterns for Parallel Programming

31

Structural programming patterns

In order to create more complex software it is necessary to compose programming patterns. For this purpose, it has been useful to induct a set of patterns known as "architectural styles". Examples:
• pipe and filter
• event based/event driven
• layered
• agent and repository/blackboard
• process control
• model-view-controller

Page 32: Design Patterns for Parallel Programming

Putting it all together…

[Figure: the 13 dwarves]

Page 33: Design Patterns for Parallel Programming

Elements of a Pattern Description

• Name
• Problem: classes of problems this pattern addresses
• Context: context in which this problem occurs
• Forces: trade-offs that crop up in this situation
• Solution: solution the pattern embodies
• Invariants: properties that must always be true for this pattern to work
• Examples
• Known uses
• Related patterns

33

Page 34: Design Patterns for Parallel Programming

34

Programming Pattern Language 1.0 (Keutzer & Mattson)

Applications

Productivity layer:

• Choose your high level structure – what is the structure of my application? (Guided expansion.) Structural patterns: Pipe-and-filter, Agent and Repository, Event based/implicit invocation, Process Control, Model-view controller, Iterator, Map reduce, Layered systems, Arbitrary Static Task Graph.

• Identify the key computational patterns – what are my key computations? (Guided instantiation.) Computational patterns: Graph Algorithms, Dynamic Programming, Dense Linear Algebra, Sparse Linear Algebra, Unstructured Grids, Structured Grids, Graphical models, Finite state machines, Backtrack Branch and Bound, N-Body methods, Circuits, Spectral Methods.

Efficiency layer:

• Choose your high level architecture. (Guided decomposition.) Task Decomposition ↔ Data Decomposition; Group Tasks; Order Groups; data sharing; data access.

• Refine the structure – what concurrent approach do I use? (Guided re-organization.) Pipeline, Discrete Event, Event Based, Divide and Conquer, Data Parallelism, Geometric Decomposition, Task Parallelism, Graph algorithms.

• Utilize Supporting Structures – how do I implement my concurrency? (Guided mapping.) Fork/Join, CSP, Master/worker, Loop Parallelism, BSP; Distributed Array, Shared-Data, Shared Queue, Shared Hash Table.

• Implementation methods – what are the building blocks of parallel programming? (Guided implementation.) Thread creation/destruction, Process creation/destruction, Message passing, Collective communication, Barriers, Mutex, Semaphores, Digital Circuits, Speculation, Transactional memory.

Page 35: Design Patterns for Parallel Programming

35

Architecting Parallel Software

• Identify the Software Structure
• Identify the Key Computations
• Decompose Tasks: group tasks, order tasks
• Decompose Data: identify data sharing, identify data access

Page 36: Design Patterns for Parallel Programming

36

Identify the SW Structure

Structural patterns:
• Pipe-and-Filter
• Agent-and-Repository
• Event-based coordination
• Iterator
• MapReduce
• Process Control
• Layered Systems

These define the structure of our software, but they do not describe what is computed.

Page 37: Design Patterns for Parallel Programming

37

Analogy: Layout of Factory Plant

Page 38: Design Patterns for Parallel Programming

38

Identify Key Computations

Computational patterns describe the key computations, but not how they are implemented.

Page 39: Design Patterns for Parallel Programming

39

Analogy: Machinery of the Factory

Page 40: Design Patterns for Parallel Programming

40

Architecting Parallel Software

Identify the Software Structure:
• Pipe-and-Filter
• Agent-and-Repository
• Event-based
• Bulk Synchronous
• MapReduce
• Layered Systems
• Arbitrary Task Graphs

Identify the Key Computations:
• Graph Algorithms
• Dynamic Programming
• Dense/Sparse Linear Algebra
• (Un)Structured Grids
• Graphical Models
• Finite State Machines
• Backtrack Branch-and-Bound
• N-Body Methods
• Circuits
• Spectral Methods

Decompose Tasks/Data: order tasks; identify data sharing and access.

Page 41: Design Patterns for Parallel Programming

41

Analogy: Architected Factory

Raises appropriate issues like scheduling, latency, throughput, workflow, resource management, capacity etc.

Page 42: Design Patterns for Parallel Programming

42

Outline

A Programming Pattern Language:
• Motivation – patterns: why and what?
• Structural patterns
• Computational patterns
• Composing patterns: examples
• Concurrency patterns and PLPP
• Examples: pattern language case studies
• Examples: pattern language in Smoke
• Demo

Page 43: Design Patterns for Parallel Programming

Inventory of Structural Patterns

1. pipe and filter

2. iterator

3. MapReduce

4. blackboard/agent and repository

5. process control

6. model-view controller

7. layered

8. event-based coordination

Page 44: Design Patterns for Parallel Programming

44

Elements of a structural pattern

Components are where the computation happens

Connectors are where the communication happens

A configuration is a graph of components (vertices) and connectors (edges)

A structural pattern may be described as a family of graphs.

Page 45: Design Patterns for Parallel Programming

45

Pattern 1: Pipe and Filter

• Filters embody computation
• Filters only see inputs and produce outputs
• Pipes embody communication
• May have feedback

[Figure: seven filters (Filter 1 … Filter 7) connected by pipes.]

Examples?

Page 46: Design Patterns for Parallel Programming

46

Examples of pipe and filter

Almost every large software program has a pipe and filter structure at the highest level: logic optimizers, image retrieval systems, compilers.
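As a toy illustration (not from the slides), a minimal pipe-and-filter sketch in C: the three filters are hypothetical compiler-like stages, and the call chain plays the role of the pipes carrying items between them.

    #include <stdio.h>

    typedef int item_t;                      /* token flowing through the pipes */

    /* Each filter only sees its input and produces an output. */
    static item_t parse(item_t x)    { return x * 2; }                 /* filter 1 */
    static item_t optimize(item_t x) { return x + 1; }                 /* filter 2 */
    static item_t emit(item_t x)     { printf("%d\n", x); return x; }  /* filter 3 */

    int main(void) {
        for (item_t i = 0; i < 5; i++)       /* the "pipe": the output of one */
            emit(optimize(parse(i)));        /* stage is the input of the next */
        return 0;
    }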

Page 47: Design Patterns for Parallel Programming

47

Pattern 2: Iterator Pattern

[Figure: set an initialization condition; iterate, performing a variety of functions asynchronously; synchronize the results of each iteration; exit when the exit condition is met.]

Examples?

Page 48: Design Patterns for Parallel Programming

48

Example of Iterator Pattern: Training a Classifier (SVM Training)

[Figure: the iterator structural pattern instantiated as: iterate { identify outlier; update surface } until all points are within acceptable error.]

Page 49: Design Patterns for Parallel Programming

49

Pattern 3: MapReduce

To us, it means:
• A map stage, where data is mapped onto independent computations
• A reduce stage, where the results of the map stage are summarized (i.e. reduced)

Examples?

Page 50: Design Patterns for Parallel Programming

50

Examples of Map Reduce

General structure: map a computation across distributed data sets; reduce the results to find the best/worst, maxima/minima.

Speech recognition:
• Map HMM computation to evaluate word match
• Reduce to find the most-likely word sequences

Support-vector machines (ML):
• Map to evaluate distance from the frontier
• Reduce to find the greatest outlier from the frontier
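A small OpenMP sketch of this map-then-reduce shape; score() is a hypothetical map function, and the reduction finds the maximum, in the spirit of the "greatest outlier" example above:

    #include <float.h>
    #include <omp.h>

    double score(double v) { return v * v; }   /* hypothetical map function */

    double map_reduce_max(const double *data, int n) {
        double best = -DBL_MAX;
        #pragma omp parallel for reduction(max : best)
        for (int i = 0; i < n; i++) {          /* map: independent computations */
            double s = score(data[i]);
            if (s > best) best = s;            /* reduce: summarize via max */
        }
        return best;
    }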

Page 51: Design Patterns for Parallel Programming

51

Pattern 4: Agent and Repository

Agent and repository (blackboard) structural pattern: agents cooperate on a shared medium to produce a result. Key elements:
• Blackboard: repository of the resulting creation that is shared by all agents (e.g. a circuit database)
• Agents: intelligent agents that act on the blackboard (e.g. optimizations)
• Manager: orchestrates the agents' access to the blackboard and the creation of the aggregate result (e.g. a scheduler)

[Figure: agents 1..4 around a shared repository/blackboard (i.e. a database).]

Examples?

Page 52: Design Patterns for Parallel Programming

52

Example: Compiler Optimization

Optimization of a software program:
• The intermediate representation of the program is stored in the repository.
• Individual agents have heuristics to optimize the program: constant folding, loop fusion, software pipelining, common-subexpression elimination, strength reduction, dead-code elimination.
• The manager orchestrates the access of the optimization agents to the program in the repository.
• The resulting program is left in the repository.

Page 53: Design Patterns for Parallel Programming

53

Example: Logic Optimization

Optimization of integrated circuits:
• The integrated circuit is stored in the repository (circuit database).
• Individual agents (timing optimization agents 1..N) have heuristics to optimize the circuitry of an integrated circuit.
• The manager orchestrates the access of the optimization agents to the circuit repository.
• The resulting optimized circuit is left in the repository.
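A minimal sketch of this structure, assuming OpenMP threads as agents and a lock-protected shared struct as the repository; try_optimize() is a hypothetical agent heuristic, and the lock stands in for the manager's orchestration:

    #include <omp.h>

    typedef struct { int circuit[1000]; int version; } repo_t;  /* blackboard */
    repo_t repo;
    omp_lock_t repo_lock;

    void try_optimize(repo_t *r, int agent_id) { r->version++; }  /* placeholder */

    void run_agents(int n_agents) {
        omp_init_lock(&repo_lock);
        #pragma omp parallel num_threads(n_agents)
        {
            int id = omp_get_thread_num();
            omp_set_lock(&repo_lock);      /* "manager": serialize repository access */
            try_optimize(&repo, id);       /* agent acts on the blackboard */
            omp_unset_lock(&repo_lock);
        }
        omp_destroy_lock(&repo_lock);
    }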

Page 54: Design Patterns for Parallel Programming

54

Pattern 5: Process Control

Process control elements:
• Process: underlying phenomena to be controlled/computed
• Actuator: task(s) affecting the process
• Sensor: task(s) which analyze the state of the process
• Controller: task which determines which actuators should be affected

[Figure: the controller reads input variables and controlled variables via sensors, applies control parameters, and drives manipulated variables into the process through actuators.]

Source: adapted from Shaw & Garlan 1996, pp. 27-31.

Examples?

Page 55: Design Patterns for Parallel Programming

55

Examples of Process Control

[Figure: the process control structural pattern applied to circuit optimization: a circuit controller takes user timing constraints, senses speed and power, and launches transformations subject to the timing constraints.]
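A minimal control-loop sketch in C; the toy process model, sensor, and actuator are assumptions standing in for a real plant:

    #include <stdio.h>

    static double process_state = 0.0;                 /* toy process model */
    static double read_sensor(void)   { return process_state; }
    static void   set_actuator(double u) { process_state += u; }

    void control_loop(double set_point, double gain, int steps) {
        for (int t = 0; t < steps; t++) {
            double measured = read_sensor();           /* sensor observes */
            double error = set_point - measured;       /* controller decides */
            set_actuator(gain * error);                /* actuator affects process */
            printf("t=%d y=%f\n", t, process_state);
        }
    }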

Page 56: Design Patterns for Parallel Programming

56

Pattern 6: Model-View-Controller

• Model: embodies the data and "intelligence" (aka business logic) of the system
• Controller: captures all user input and translates it into actions on the model
• View: renders the current state of the model for the user

Examples?

Page 57: Design Patterns for Parallel Programming

57

Example of Model-View Controller

[Figure: a control form (the controller) holds values a = 50%, b = 30%, c = 20%. The user updates the values and presses the 'View 1' button; the state change flows to the model; the controller selects a view; the view determines the model state and renders it (View 1 or View 2, each with a Back button).]

Extended from: Design Patterns: Elements of Reusable Object-Oriented Software

Page 58: Design Patterns for Parallel Programming

58

Pattern 7: Layered Systems

• Individual layers are big, but the interface between two adjacent layers is narrow.
• Non-adjacent layers cannot communicate directly.

Examples?

Page 59: Design Patterns for Parallel Programming

59

Example: ISO Network Protocol

Page 60: Design Patterns for Parallel Programming

60

Pattern 8: Event-based Systems

• Agents interact via events/signals in a medium.
• An event manager manages the events.
• Interaction among agents is dynamic: there are no fixed connections.

[Figure: agents communicating through a shared medium under an event manager.]

Examples?

Page 61: Design Patterns for Parallel Programming

61

Example: The Internet

• The Internet is the medium
• Computers (e.g. 128.0.0.56) are agents
• Signals are IP packets
• The control plane of the router is the event manager

Page 62: Design Patterns for Parallel Programming

62

Remember the Analogy: Layout of Factory Plant

We have only talked about structure. We haven’t described computation.

Page 63: Design Patterns for Parallel Programming

63

Architecting Parallel Software

• Identify the Software Structure
• Identify the Key Computations
• Decompose Tasks: group tasks, order tasks
• Decompose Data: identify data sharing, identify data access

Page 64: Design Patterns for Parallel Programming

64

Outline

A Programming Pattern Language:
• Motivation – patterns: why and what?
• Structural patterns
• Computational patterns
• Composing patterns: examples
• Concurrency patterns and PLPP
• Examples: design patterns in Smoke
• Demo

Page 65: Design Patterns for Parallel Programming

65

Computational Patterns

The dwarfs from "The Berkeley View" form our key computational patterns.

Page 66: Design Patterns for Parallel Programming

66

Problem and Context

Problem: in many situations the proper behavior of a system can naturally be described by a language of finite, or perhaps infinite, strings. The problem is to define a piece of software that distinguishes between valid input strings (associated with proper behavior) and invalid input strings (improper behavior). The system may have a set of pre-defined responses to proper input and another set of responses to improper input.

Context: as inputs arrive, a system must respond depending on the input. Alternatively, depending on inputs, changes in a process may be actuated. The proper response of the system may be to idle after receiving a sequence of inputs, or the input stream may be presumed to be infinite.

Page 67: Design Patterns for Parallel Programming

67

Solution: Finite State Machines

Transducer FSM: Huffman decoding

Symbol | Codeword
A      | 0
B      | 10
C      | 1100
D      | 1101
E      | 1110
F      | 1111

Input alphabet: {0, 1}. Output alphabet: {A … F}. States: a, b, c, d, e. Initial state: a. Transitions: ((a,0), (a,A)), ((a,1), (b,-)), …

[Figure: state diagram over states a-e with edges labeled input/output, e.g. 0/A, 1/-, 0/B, 0/C, 1/D, 0/E, 1/F.]

After: Multimedia Image and Video Processing, by Ling Guan, Jan Larsen
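The table above translates directly into code. Below is a minimal table-driven sketch of this transducer in C; the state encoding and the sample bit string are my own additions, with the transition table derived from the codewords:

    #include <stdio.h>
    #include <string.h>

    /* States a..e encoded as indices 0..4. next_state[s][bit] gives the
       successor; output[s][bit] gives the emitted symbol ('-' = none). */
    enum { Sa, Sb, Sc, Sd, Se };
    static const int  next_state[5][2] = {
        {Sa, Sb}, {Sa, Sc}, {Sd, Se}, {Sa, Sa}, {Sa, Sa} };
    static const char output[5][2] = {
        {'A', '-'}, {'B', '-'}, {'-', '-'}, {'C', 'D'}, {'E', 'F'} };

    int main(void) {
        const char *bits = "0101100";   /* "0" "10" "1100" -> A B C */
        int s = Sa;
        for (size_t i = 0; i < strlen(bits); i++) {
            int b = bits[i] - '0';
            if (output[s][b] != '-') putchar(output[s][b]);
            s = next_state[s][b];
        }
        putchar('\n');                  /* prints: ABC */
        return 0;
    }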

Page 68: Design Patterns for Parallel Programming

68

Example: Traffic Light State Machine

[Figure: a traffic-light controller driven by car sensors A and B. Transitions fire on conditions such as "IF A=1 AND B=0", "IF A=0 AND B=1", "always", and "otherwise". The clock beats every 4 seconds, so the light is yellow for 4 seconds.]

Page 69: Design Patterns for Parallel Programming

69

Problem and Context

Problem: in many problems the output is a simple logical function, or bit-wise permutation, of the input.

Context: a vector of Boolean values is applied as input, and another set, defined by combinational operators or "wiring", is given as output.

Page 70: Design Patterns for Parallel Programming

70

Solution: Circuits Describe desired function as a circuit

Page 71: Design Patterns for Parallel Programming

71

Example: Data Encryption Standard (DES)

[Figure: a DES chosen-plaintext circuit: round keys K1 … Kp-1, Kp feed a cascade of rounds; the output passes through a mask (new SP) and DP/EP test logic.]

Page 72: Design Patterns for Parallel Programming

72

Outline

A Programming Pattern Language:
• Motivation – patterns: why and what?
• Structural patterns
• Computational patterns
• Composing patterns: examples
• Concurrency patterns and PLPP
• Examples: pattern language case studies
• Examples: pattern language in Smoke
• Demo

Page 73: Design Patterns for Parallel Programming

73

Architecture of Logic Optimization

[Figure: logic optimization architected from graph algorithm patterns; tasks are grouped and ordered, and the data is decomposed.]

Page 74: Design Patterns for Parallel Programming

74

Architecting Speech Recognition

[Figure: large vocabulary continuous speech recognition as a pipe-and-filter composition: voice input flows through signal processing into an inference engine, which consults a recognition network and emits the most likely word sequence. The inference engine is an iterator over beam search iterations; the active state computation steps in each iteration combine MapReduce, dynamic programming, and graphical model patterns.]

Large Vocabulary Continuous Speech Recognition Poster: Chong, Yi. Work also to appear at Emerging Applications for Manycore Architecture.

Page 75: Design Patterns for Parallel Programming

75

CBIR Application Framework

[Figure: the content-based image retrieval loop: new images undergo feature extraction; a classifier is learned and exercised; results are presented, and user feedback plus chosen examples feed the next round.]

Catanzaro, Sundaram, Keutzer, "Fast SVM Training and Classification on Graphics Processors", ICML 2008

Page 76: Design Patterns for Parallel Programming

76

Feature Extraction

Image histograms are common to many feature extraction procedures, and are an important feature in their own right.

• Agent and Repository: each agent computes a local transform of the image, plus a local histogram.
• Results are combined in the repository, which contains the global histogram.

The data-dependent access patterns found when constructing histograms make them a natural fit for the agent-and-repository pattern.
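A sketch of this agent-and-repository histogram in C with OpenMP, assuming 8-bit pixels and 256 bins; each thread (agent) fills a private histogram and then merges it into the shared global one (the repository):

    #include <omp.h>
    #include <string.h>

    #define BINS 256

    void histogram(const unsigned char *pixels, long n, long global[BINS]) {
        memset(global, 0, BINS * sizeof(long));
        #pragma omp parallel
        {
            long local[BINS] = {0};                    /* agent-private result */
            #pragma omp for nowait
            for (long i = 0; i < n; i++)
                local[pixels[i]]++;                    /* data-dependent access */
            #pragma omp critical                       /* repository merge */
            for (int b = 0; b < BINS; b++)
                global[b] += local[b];
        }
    }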

Page 77: Design Patterns for Parallel Programming

77

Learn Classifier: SVM Training

[Figure: the bulk synchronous structural pattern: iterate { select working set, solve QP; update optimality conditions }, with each step implemented as MapReduce.]

Page 78: Design Patterns for Parallel Programming

78

Exercise Classifier: SVM Classification

[Figure: test data and support vectors (SV) flow through "compute dot products" (dense linear algebra) and "compute kernel values, sum & scale" (MapReduce) to produce the output.]

Page 79: Design Patterns for Parallel Programming

79

Outline

A Programming Pattern Language:
• Motivation – patterns: why and what?
• Structural patterns
• Computational patterns
• Composing patterns: examples
• Concurrency patterns and PLPP
• Examples: pattern language case studies
• Examples: pattern language in Smoke
• Demo

Page 80: Design Patterns for Parallel Programming

80

Recap: what we're trying to accomplish

Ultimately, we want domain-expert programmers to routinely create parallel programs. Domain-expert programmers:
• Are driven by the need to ship robust solutions on schedule.
• Value new features more than performance.
• Have little or no background in computer science or concurrency.
• Are finished with formal education … "on the job" learning in burst mode.

Note: we are not trying to tell concurrency experts how to write parallel programs:
• Expert parallel programmers already have what they need.
• HPC programmers want performance at all costs and will work as close to the hardware as needed to hit performance goals.

Page 81: Design Patterns for Parallel Programming

81

How do we support domain experts?

1. Domain experts write end-user application programs using application frameworks.
2. Domain literate programming gurus (1% of the population) build the application frameworks.
3. Parallel programming gurus (1-10% of programmers) build the parallel programming frameworks.

The hope is for domain experts to create parallel code with little or no understanding of parallel programming, leaving hardcore "bare metal" efficiency layer programming to the parallel programming experts.

Page 82: Design Patterns for Parallel Programming

82

How will we make this happen?

Software architecture systematically described in terms of a design pattern language: everyone, even the "gurus", needs direction.
• The "parallel programming gurus" need to know the variety of parallel programming patterns in use.
• The "domain literate programming gurus" need to know which application patterns to support.
• And we need to document these so everyone can work with them at the appropriate level of detail.

By expressing parallel programming in terms of design pattern languages:
• We provide this direction to the gurus.
• We document the frameworks and how they work.
• We lay out a roadmap to solving the parallel programming problem.

Page 83: Design Patterns for Parallel Programming

83

[Figure: the full Programming Pattern Language 1.0 diagram (Keutzer & Mattson), reproduced from page 34.]

Page 84: Design Patterns for Parallel Programming

84

[Figure: the same Programming Pattern Language 1.0 diagram, annotated.]

Top levels (structural and computational patterns): note that in many cases these pertain to good software practices, both serial and parallel.

The lower layers are all about parallel algorithms and how to turn them into code. Many books talk about how to use a particular programming language; we instead focus on how to use these languages and "think parallel".

Page 85: Design Patterns for Parallel Programming

85

PLPP's structure: four design spaces in parallel software development

Original problem → Decomposition (tasks, shared and local data) → Concurrency strategy (units of execution + new shared data for extracted dependencies) → Implementation & building blocks → corresponding source code. (In the figure the code is replicated once per unit of execution: every process runs the same program.)

Program SPMD_Emb_Par () {
   TYPE *tmp, *func();
   global_array Data(TYPE);
   global_array Res(TYPE);
   int Num = get_num_procs();
   int id = get_proc_id();
   if (id == 0) setup_problem(N, Data);
   for (int I = id; I < N; I = I + Num) {
      tmp = func(I, Data);
      Res.accumulate(tmp);
   }
}
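As a hedged illustration, here is a minimal, runnable MPI rendering of that SPMD pseudocode; func() is a placeholder for the real work, and a sum reduction stands in for Res.accumulate():

    #include <mpi.h>
    #include <stdio.h>

    #define N 1000
    static double func(int i) { return (double)i; }    /* hypothetical work */

    int main(int argc, char **argv) {
        int id, num;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &id);            /* my unit of execution */
        MPI_Comm_size(MPI_COMM_WORLD, &num);

        double part = 0.0, res = 0.0;
        for (int i = id; i < N; i += num)              /* cyclic distribution */
            part += func(i);
        MPI_Reduce(&part, &res, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (id == 0) printf("Res = %f\n", res);
        MPI_Finalize();
        return 0;
    }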

Page 86: Design Patterns for Parallel Programming

86

Decomposition (Finding Concurrency)

Start with a specification that solves the original problem; finish with the problem decomposed into tasks, shared data, and a partial ordering.

[Figure: Start → Decomposition Analysis (Task Decomposition, Data Decomposition) → Dependency Analysis (Group Tasks, Order Groups, Data Sharing) → Design Evaluation.]

(Roadmap: structural/computational decomposition → concurrency strategy → implementation strategy → parallel programming building blocks.)

Page 87: Design Patterns for Parallel Programming

87

Concurrency Strategy (Algorithm Structure)

Decision tree:
• Organize by tasks – linear? Task Parallelism; recursive? Divide and Conquer.
• Organize by data – linear? Geometric Decomposition; recursive? Recursive Data.
• Organize by flow of data – regular? Pipeline; irregular? Event Based Coordination.

Page 88: Design Patterns for Parallel Programming

88

Implementation Strategy (Supporting Structures)

High level constructs impacting the large scale organization of the source code.

• Program structure: SPMD, Master/Worker, Loop Parallelism, Fork/Join
• Data structures: Shared Data, Shared Queue, Distributed Array
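For example, the Loop Parallelism program structure sketched with OpenMP; the array scaling is a placeholder computation:

    #include <omp.h>

    void scale(double *a, int n, double s) {
        #pragma omp parallel for        /* iterations divided among threads */
        for (int i = 0; i < n; i++)
            a[i] *= s;
    }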

Page 89: Design Patterns for Parallel Programming

89

Parallel Programming Building Blocks

Low level constructs implementing specific mechanisms used in parallel computing. Examples in Java, OpenMP and MPI. These are not properly design patterns, but they are included to make the pattern language self-contained.

• UE* management: thread control, process control
• Synchronization: memory sync/fences, barriers, mutual exclusion
• Communications: message passing, collective communication, other communication

* UE = unit of execution
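A tiny OpenMP illustration of three of these building blocks together: thread creation, mutual exclusion, and a barrier (the shared counter is just a placeholder):

    #include <omp.h>
    #include <stdio.h>

    int shared_count = 0;

    int main(void) {
        #pragma omp parallel               /* UE management: create threads */
        {
            #pragma omp critical           /* mutual exclusion */
            shared_count++;
            #pragma omp barrier            /* synchronization point */
            #pragma omp single             /* one thread reports the result */
            printf("threads: %d\n", shared_count);
        }
        return 0;
    }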

Page 90: Design Patterns for Parallel Programming

90

Case Studies

• Simple examples: numerical integration (see the sketch after this list)
• Linear algebra: abstracting complexity in data structures
• Spectral methods: multi-level parallelism
• Branch and bound: managing recursive parallelism
• Molecular dynamics: a complete example from design to code
• Game engines: a complex example showing that there are multiple ways to parallelize a problem
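For the numerical integration case study, a standard way to illustrate the pattern stack is approximating pi by integrating 4/(1+x^2) over [0,1]; a sketch with an OpenMP parallel loop and reduction (the step count is arbitrary):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        const long n = 100000000;
        const double step = 1.0 / (double)n;
        double sum = 0.0;
        #pragma omp parallel for reduction(+ : sum)
        for (long i = 0; i < n; i++) {
            double x = (i + 0.5) * step;       /* midpoint rule */
            sum += 4.0 / (1.0 + x * x);
        }
        printf("pi ~= %.12f\n", sum * step);
        return 0;
    }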

Page 91: Design Patterns for Parallel Programming

91

The heart of a game is the "Game Engine"

[Figure: game engine modules: Input, Sim, Audio, Render, Animation, and Front End, exchanging data/state and control; assets come from media (disk, optical media) and data from the network. Each module keeps its game state in internally managed data structures.]

Source: conversations with developers at Electronic Arts

Page 92: Design Patterns for Parallel Programming

92

The heart of a game is the "Game Engine"

[Figure: the same game-engine diagram, with Sim expanded into Physics, Particle System, AI, Collision Detection, and a Time Loop.]

Sim is the integrator: it integrates from one time step to the next, calling update methods for the other modules inside a central time loop.

Source: conversations with developers at Electronic Arts

Page 93: Design Patterns for Parallel Programming

93

Finding concurrency: functional parallelism

• Combine modules into groups; assign one group to a thread.
• Asynchronous execution … groups interact through events.
• Coarse grained parallelism dominated by the flow of data between groups: the event based coordination pattern.

[Figure: the game-engine module diagram, partitioned into thread groups.]

Page 94: Design Patterns for Parallel Programming

94

The Finding Concurrency Design Space

Start with a specification that solves the original problem; finish with the problem decomposed into tasks, shared data, and a partial ordering.

[Figure: the Finding Concurrency diagram from page 86.]

Many cores need many concurrent tasks … functional parallelism just doesn't expose enough concurrency in this case. We need to go back and rethink the design.

Page 95: Design Patterns for Parallel Programming

95

More sophisticated parallelization strategy

• Decompose the computation into a pool of tasks: finer granularity exposes more concurrency.
• Work stealing is critical to support good load balancing.

[Figure: the game-engine module diagram decomposed into many fine-grained tasks.]

Page 96: Design Patterns for Parallel Programming

96

Parallel Execution Strategy: Task Parallelism Pattern

• Dependencies create an acyclic graph of tasks.
• Tasks are assigned priorities, with some labeled as optional.
• A pool of threads executes tasks based on when dependencies are met and on task priority.
• The challenge is scheduling based on bus usage, load balancing, and managing concurrency for correctness.

[Figure: the game-engine modules emitting tasks into a prioritized task graph.]
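A hedged sketch of such a dependency-driven task pool using OpenMP tasks; the four stage functions are stubs, the depend clauses encode the acyclic task graph, and scheduling details (priorities, work stealing) are left to the runtime, whereas a real engine, as noted above, would need a custom prioritized queue:

    #include <stdio.h>
    #include <omp.h>

    static void input_step(int *s)    { s[0] = 1; }        /* stub stages */
    static void ai_step(int *s)       { s[0] += 10; }
    static void physics_step(int *s)  { s[0] *= 2; }
    static void graphics_step(int *s) { printf("render %d\n", s[0]); }

    void frame(int *state) {
        #pragma omp parallel
        #pragma omp single                 /* one thread spawns the task graph */
        {
            #pragma omp task depend(out: state[0])
            input_step(state);
            #pragma omp task depend(inout: state[0])
            ai_step(state);
            #pragma omp task depend(inout: state[0])
            physics_step(state);
            #pragma omp task depend(in: state[0])
            graphics_step(state);
        }                                  /* implicit barrier: frame done */
    }

    int main(void) { int s = 0; frame(&s); return 0; }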

Page 97: Design Patterns for Parallel Programming

97

The Algorithm Structure Design Space

[Figure: the Algorithm Structure decision tree from page 87.]

Solution: a composition of two of these patterns:
• High level architecture … functional decomposition (event based coordination).
• Collections of tasks generated by modules (task parallelism).

Page 98: Design Patterns for Parallel Programming

98

The Supporting Structures Design Space

Program structure: SPMD, Master/Worker, Loop Parallelism, Fork/Join. Data structures: Shared Data, Shared Queue, Distributed Array.

• Top level functional decomposition with MPI.
• Manage task queues in a global pool (to support work stealing).
• A standard queue won't work; we will need to build more general data structures to support a prioritized queue and interrupt capabilities.

Page 99: Design Patterns for Parallel Programming

99

Architecture/Framework Implications

Top level structural pattern: event based, implicit invocation.

Inside each module: fine grained task decomposition; the master-worker pattern extended with work stealing, task priorities, and soft real time constraints (interrupt, update state, flush queues).

This would fit nicely into a framework … in fact it needs a framework, since the task queues would be very complicated to get right.

Page 100: Design Patterns for Parallel Programming

100

Outline

• The parallel computing landscape
• A Programming Pattern Language
  • Overview
  • Structural patterns
  • Computational patterns
  • Composing patterns: examples
  • Concurrency patterns and PLPP
  • Examples: simple case studies
• Applications in CAD
  • The CAD algorithm landscape
  • Parallel patterns in CAD: overview
  • Detailed case studies: architecting parallel CAD software

Page 101: Design Patterns for Parallel Programming

How does architecting SW help?

A good software architecture should provide:
• Composability: it allows you to build something complicated by composing simple pieces.
• Modularity: elements of the software are firewalled from each other.
• Locality: it is easy to identify what's happening where.
• It may provide a basis for building a more general framework (e.g. model-view-controller and Ruby on Rails).

Patterns help by:
• Clearly identifying the problem.
• Offering solutions that capture and embody a lot of shared experience with solving the problem.
• Offering a common vocabulary for this problem/solution process.
• Also capturing common pitfalls of the solutions in the pattern (e.g. the repository will always become the bottleneck in agent-and-repository).

101

Page 102: Design Patterns for Parallel Programming

Overall Summary

• Manycore microprocessors are on the horizon, and computationally intensive applications, like computer-aided design, will want to exploit them.
• Incremental methods will not scale to exploit >= 32 processors: use of threads (OpenMP, Win32, others), incremental parallelization of serial codes, use of threading tools and library support.
• We need to re-architect software.
• The key to re-architecting software for manycore is patterns and frameworks.
• We identified two key computational patterns for CAD: graph algorithms and backtracking/branch-and-bound.
• There is some software support, in the form of a framework, Cilk, for backtracking/branch-and-bound.
• The key to parallelizing graph algorithms is netlist/graph partitioning.
• We presented a number of approaches for effective graph partitioning.
• We aim to use this to produce a framework for graph algorithms.

Page 103: Design Patterns for Parallel Programming


103

Design Patterns in education: lessons from history?

Early OO days (1994):
• Perception: "Object oriented? Isn't that just an academic thing?"
• Usage: specialists only. The mainstream avoids it with indifference, anxiety, or worse.
• Performance: not so good.

Now:
• Perception: OO = programming. "Isn't this how it was always done?"
• Usage: cosmetically widespread, some key concepts actually deployed.
• Performance: so-so, masked by CPU advances until now.

Page 104: Design Patterns for Parallel Programming


104

Design Patterns in education: lessons from history?

Early PP days (now):
• Perception: "Parallel programming? Isn't that just an HPC thing?"
• Usage: specialists only. The mainstream avoids it with indifference, anxiety, or worse.
• Performance: very good, for the specialists.

Future (2010?):
• Perception: PP = programming. "Isn't this how it was always done?"
• Usage: cosmetically widespread, some key concepts actually deployed.
• Performance: broadly sufficient. Application domains greatly expanded.

Page 105: Design Patterns for Parallel Programming


Smoke Pattern Decomposition

Michael Anderson, Mark Murphy, Kurt Keutzer

Page 106: Design Patterns for Parallel Programming


106

Agenda

Characterize the architecture and computation of Smoke in terms of Our Pattern Language:
• What is an effective multi-threaded game architecture?
• Can the architecture and computation in Smoke be described in Our Pattern Language?
• Which components of Smoke can be easily sped up on manycore devices?

Games as a potential ParLab app:
• Where can we as grad students have the most impact?
• What else can manycore enable?

Page 107: Design Patterns for Parallel Programming


107

Our Pattern Language

A hierarchy of patterns that describe reusable components of software.

Page 108: Design Patterns for Parallel Programming


108

Agenda

Characterize the architecture and computation of Smoke in terms of Our Pattern Language:
• What is an effective multi-threaded game architecture?
• Can the architecture and computation in Smoke be described in Our Pattern Language?
• Which components of Smoke can be easily sped up on manycore devices?

Games as a potential ParLab app:
• Where can we as grad students have the most impact?
• What else can manycore enable?

Page 109: Design Patterns for Parallel Programming


109

Structural and Computational Patterns

First, let's talk about the structural and computational patterns we found in Smoke.

Page 110: Design Patterns for Parallel Programming


110

Top-Level: Architecture (Intro)

Smoke is composed of several subsystems; some (Physics, Graphics, …) are large independent engines.

[Figure: subsystems and their data: Input (input data), Physics (position, velocity, collision), AI (position, velocity, goal), Graphics (position, vertices, texture), Audio (position, SFX), with position and velocity flowing between them.]

Brad Werth, GDC 2009

Page 111: Design Patterns for Parallel Programming


111

Top-Level v1: Task Graph Pattern

• There exist fixed dependencies between subsystems.
• These can be modeled as an arbitrary task graph.
• Example, moving the zombie: Keyboard -> AI -> Physics -> Graphics.

[Figure: task graph over Input, AI, Physics, Effects, and Graphics.]

Page 112: Design Patterns for Parallel Programming


112

Top-Level v1: Iterator Pattern

• Iterates over consecutive frames.
• Data from the previous frame is used in the next.

[Figure: the subsystem task graph wrapped in a per-frame loop.]

Page 113: Design Patterns for Parallel Programming


113

Top-Level v2: Puppeteer

• Subsystems have private data but need access to other subsystems' data.
• We don't want to write N*(N-1) interfaces.
• A common use is to manage communication between multiple simulators, each with different data structures.

Page 114: Design Patterns for Parallel Programming


114

Task Graph (v1) vs. Puppeteer (v2)

• Subsystem modularity is important (swappable).
• We want to reduce the complexity of communication for a scalable multi-threaded design.
• This motivates the use of the puppeteer pattern instead of an arbitrary task graph for subsystem communication.

[Figure: Input, AI, Physics, Graphics, and Effects connected through interfaces to a framework with a change control manager.]

Brad Werth, GDC 2009

Page 115: Design Patterns for Parallel Programming


115

Agenda

Characterize the architecture and computation of Smoke in terms of Our Pattern Language:
• What is an effective multi-threaded game architecture?
• Can the architecture and computation in Smoke be described in Our Pattern Language?
• Which components of Smoke can be easily sped up on manycore devices?

Games as a potential ParLab app:
• Where can we as grad students have the most impact?
• What else can manycore enable?

Page 116: Design Patterns for Parallel Programming


116

Puppeteer Subsystem Patterns

Now that we've described the top-level structural patterns, let's look at the subsystems.

[Figure: the framework/change-control-manager diagram from the previous slide.]

Page 117: Design Patterns for Parallel Programming


117

Physics Subsystem

Pipe-and-filter structure.

[Figure: pipeline from the Havok User Guide.]

Page 118: Design Patterns for Parallel Programming


118

Physics Subsystem

Constraint Solver – Minimize penalties and forces in a system subject to constraints.

Dense Linear Algebra computational pattern
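For flavor, here is a hedged sketch (not Havok's solver): the active constraints yield a dense system A*lambda = b, which an iterative method such as Gauss-Seidel can solve.

#include <array>

constexpr int N = 3;  // number of active constraints (toy size)

// A few Gauss-Seidel sweeps over a dense N x N system; assumes a non-zero diagonal.
void gaussSeidel(const std::array<std::array<double, N>, N>& A,
                 const std::array<double, N>& b,
                 std::array<double, N>& x, int sweeps) {
    for (int s = 0; s < sweeps; ++s)
        for (int i = 0; i < N; ++i) {
            double sum = b[i];
            for (int j = 0; j < N; ++j)
                if (j != i) sum -= A[i][j] * x[j];
            x[i] = sum / A[i][i];
        }
}

int main() {
    std::array<std::array<double, N>, N> A = {{ {4, 1, 0}, {1, 4, 1}, {0, 1, 4} }};
    std::array<double, N> b = {1, 2, 3}, x = {0, 0, 0};
    gaussSeidel(A, b, x, 20);
}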


[Image: joint-angle constraint (> 180°), robbocode.com]

Page 119: Design Patterns for Parallel Programming


119

Physics Subsystem

Collision Detection – Interpolate paths and test if overlap occurs during the current timestep

N-body computational pattern
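A hedged sketch of the all-pairs core of this computation; a real engine prunes pairs with a broad phase first, and the end-of-step test below is a deliberately crude stand-in for path interpolation.

#include <utility>
#include <vector>

struct Body { float x, y, vx, vy, r; };

bool overlapDuringStep(const Body& a, const Body& b, float dt) {
    // crude interpolation: test positions at the end of the timestep only
    float ax = a.x + a.vx * dt, ay = a.y + a.vy * dt;
    float bx = b.x + b.vx * dt, by = b.y + b.vy * dt;
    float dx = ax - bx, dy = ay - by, rr = a.r + b.r;
    return dx * dx + dy * dy < rr * rr;
}

std::vector<std::pair<int, int>> collide(const std::vector<Body>& bodies, float dt) {
    std::vector<std::pair<int, int>> hits;
    for (size_t i = 0; i < bodies.size(); ++i)          // every pair: the N-body
        for (size_t j = i + 1; j < bodies.size(); ++j)  // interaction structure
            if (overlapDuringStep(bodies[i], bodies[j], dt))
                hits.push_back({(int)i, (int)j});
    return hits;
}

int main() {
    std::vector<Body> bodies = {{0, 0, 1, 0, 0.5f}, {1, 0, -1, 0, 0.5f}};
    auto hits = collide(bodies, 0.1f);
    (void)hits;
}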


Page 120: Design Patterns for Parallel Programming


120

AI Subsystem

Agent and Repository structural pattern

• Multiple agents access and modify shared data

• Examples: Version control systems, logic optimization, parallel SAT solvers


[Diagram: Agents 1 through N reading and writing a Shared Repository (i.e., a database)]
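A minimal sketch of the pattern, assuming a mutex-guarded repository and trivial agents; none of these names come from Smoke.

#include <mutex>
#include <thread>
#include <vector>

struct POI { float x, y; };

class Repository {                    // the shared repository
    std::mutex m_;
    std::vector<POI> pois_;
public:
    void publish(POI p)         { std::lock_guard<std::mutex> g(m_); pois_.push_back(p); }
    std::vector<POI> snapshot() { std::lock_guard<std::mutex> g(m_); return pois_; }
};

void agent(Repository& repo, int id) {
    auto seen = repo.snapshot();                     // read the shared state
    repo.publish({float(id), float(seen.size())});   // modify the shared state
}

int main() {
    Repository repo;
    std::vector<std::thread> agents;
    for (int i = 0; i < 4; ++i) agents.emplace_back(agent, std::ref(repo), i);
    for (auto& t : agents) t.join();
}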

Page 121: Design Patterns for Parallel Programming


121

AI Subsystem

Zombies and chickens update their locations in the POI repository

One writer (the zombie) and many readers (the chickens) reduce the burden on the repository controller

Agents (animals) are independent state machines
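One way to exploit the single-writer/many-readers split is a readers-writer lock; this sketch uses C++17's std::shared_mutex purely as an illustration, not as Smoke's actual mechanism.

#include <shared_mutex>
#include <utility>

struct POIStore {
    std::shared_mutex m;
    float zombieX = 0, zombieY = 0;

    void writeZombie(float x, float y) {       // the one writer takes an exclusive lock
        std::unique_lock<std::shared_mutex> lock(m);
        zombieX = x; zombieY = y;
    }
    std::pair<float, float> readZombie() {     // many readers share the lock,
        std::shared_lock<std::shared_mutex> lock(m);   // so reads never block reads
        return {zombieX, zombieY};
    }
};

int main() {
    POIStore store;
    store.writeZombie(1.0f, 2.0f);
    auto p = store.readZombie();
    (void)p;
}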

Brad Werth, GDC 2009

Page 122: Design Patterns for Parallel Programming


122

Effects Subsystem

Procedural Fire: Two particle simulators

• Visible fire

• Invisible “Heat” particles

N-body computational pattern

Hugh Smith, Intel

Page 123: Design Patterns for Parallel Programming


123

Parallel Implementation Patterns

We’ve described the structural and computational patterns. What about the lower layers?


Page 124: Design Patterns for Parallel Programming


124

Parallel Algorithm Strategy Patterns (1)

At the top level, subsystems can run concurrently

• Communication within a single frame is eliminated by double-buffering

Smoke exploits this task parallelism (subject of Lab 2)


[Diagram: Graphics, Physics, AI, and the other subsystems read Frame 1 data while writing Frame 2 data]
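A minimal sketch of the scheme with placeholder subsystems: each reads the previous frame's buffer and writes only its own section of the next buffer, so the two can run concurrently without locks.

#include <functional>
#include <thread>
#include <utility>

struct PhysicsState { int step = 0; };
struct AIState { int step = 0; };
struct FrameData { PhysicsState physics; AIState ai; };

void updatePhysics(const FrameData& in, PhysicsState& out) { out.step = in.physics.step + 1; }
void updateAI(const FrameData& in, AIState& out)           { out.step = in.ai.step + 1; }

int main() {
    FrameData bufA, bufB;
    FrameData* prev = &bufA;   // read-only during this frame
    FrameData* next = &bufB;   // write-only during this frame
    for (int frame = 0; frame < 100; ++frame) {
        std::thread physics(updatePhysics, std::cref(*prev), std::ref(next->physics));
        std::thread ai(updateAI, std::cref(*prev), std::ref(next->ai));
        physics.join();
        ai.join();
        std::swap(prev, next); // this frame's output feeds the next frame
    }
}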

Page 125: Design Patterns for Parallel Programming


125

Parallel Algorithm Strategy Patterns (2)

Procedural Fire is highly data parallel

Smoke spreads the computation over 8 threads (ideal hardware utilization)
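A sketch of the split, assuming a static partition of the particle array over 8 worker threads and a placeholder update rule; Smoke itself drives this through a tasking library rather than raw threads.

#include <algorithm>
#include <thread>
#include <vector>

struct Particle { float x, y, heat; };

void updateRange(std::vector<Particle>& ps, size_t begin, size_t end, float dt) {
    for (size_t i = begin; i < end; ++i)
        ps[i].y += ps[i].heat * dt;   // placeholder update rule
}

void updateAll(std::vector<Particle>& ps, float dt, unsigned nThreads = 8) {
    std::vector<std::thread> workers;
    size_t chunk = (ps.size() + nThreads - 1) / nThreads;
    for (unsigned t = 0; t < nThreads; ++t) {   // one contiguous chunk per thread
        size_t b = t * chunk, e = std::min(ps.size(), b + chunk);
        if (b < e) workers.emplace_back(updateRange, std::ref(ps), b, e, dt);
    }
    for (auto& w : workers) w.join();
}

int main() {
    std::vector<Particle> ps(10000, Particle{0, 0, 1});
    updateAll(ps, 0.016f);
}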


Hugh Smith, Intel

Brad Werth, GDC 2009

Page 126: Design Patterns for Parallel Programming


126

Parallel Algorithm Strategy Patterns (3)

AI subsystem exploits task parallelism among independent agents (chickens)


[Thread profiles before and after parallelizing the AI subsystem, showing Render and AI activity. Brad Werth, GDC 2009]

Page 127: Design Patterns for Parallel Programming


127

Implementation Strategy and Concurrent Execution Patterns

Task queue -> Thread pool

Intel Thread Building Blocks
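A minimal sketch of the combination using TBB's task_group: run() enqueues tasks onto TBB's internal queues, and its worker-thread pool executes them with work stealing. The task bodies are placeholders.

#include <tbb/task_group.h>

void updateSubsystemA() { /* ... */ }
void updateSubsystemB() { /* ... */ }

int main() {
    tbb::task_group tasks;
    tasks.run(updateSubsystemA);   // enqueue; runs on a pool thread
    tasks.run(updateSubsystemB);
    tasks.wait();                  // the caller helps execute until all tasks finish
}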


Brad Werth, GDC 2009

Page 128: Design Patterns for Parallel Programming


128

Agenda

Characterize the architecture and computation of Smoke in terms of Our Pattern Language

• What is an effective multi-threaded game architecture?

• Can the architecture and computation in Smoke be described in Our Pattern Language?

• Which components of Smoke can be easily sped up on manycore devices?

Games as a potential ParLab app

• Where can we as grad students have the most impact?

• What else can manycore enable?


Page 129: Design Patterns for Parallel Programming


129

What was done in the demo?


Top level

• Task parallelism among subsystems

Procedural Fire

• Data parallel particle simulator

AI Subsystem

• Task parallelism among agents

Page 130: Design Patterns for Parallel Programming


130

What else can be done? (1)


Add data parallel effects: rain, ragdoll corpses, more things breaking apart

• Similar to procedural fire

• Can easily be added without changing gameplay

• Objects may or may not be registered with the scene graph

[Image: particle simulator, nvidia.com]

Page 131: Design Patterns for Parallel Programming


131

What else can be done? (2)


Speed up and enhance physics subsystem

• Independent “simulation islands” can be executed in parallel

• Parallel constraint solver & collision detection

• Speedup allows for more detailed and realistic simulation
– More active independent objects in the simulation
– Bring in new algorithms from scientific computing

• Must ensure the worst-case computation can fit in a single frame

• Nvidia PhysX is working on this

Page 132: Design Patterns for Parallel Programming


132

What else can be done? (3)


Enable more complex AI

• Smoke’s AI is very simple. What is AI like in larger games?

• If AI is just state machines, we can add more

Add new interfaces

• Computer vision pose detection as input

• Overlaps well with our current research

• Microsoft Xbox – Project Natal

Xbox.com/projectnatal

Page 133: Design Patterns for Parallel Programming


133

Agenda

Characterize the architecture and computation of Smoke in terms of Our Pattern Language

• What is an effective multi-threaded game architecture?

• Can the architecture and computation in Smoke be described in Our Pattern Language?

• Which components of Smoke can be easily sped up on manycore devices?


Page 134: Design Patterns for Parallel Programming


134

Summary and conclusions

Characterize the architecture and computation of Smoke in terms of Our Pattern Language

• What is an effective multi-threaded game architecture?
– The Puppeteer pattern allows for modularity and for ease of development and evolution

• Can the architecture and computation in Smoke be described in Our Pattern Language?
– Yes, the description is natural

• Which components of Smoke can be easily sped up on manycore devices?
– Data-parallel effects, physics, AI, and new interfaces all show potential benefit
