Intel Slides on Parallelism

172
Intel ® Software College What Is Parallel Computing? Attempt to speed solution of a particular task by 1. Dividing task into sub-tasks 2. Executing sub-tasks simultaneously on multiple processors Successful attempts require both Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. 1 Recognizing Potential Parallelism Successful attempts require both 1. Understanding of where parallelism can be effective 2. Knowledge of how to design and implement good solutions

Transcript of Intel Slides on Parallelism

Page 1: Intel Slides on Parallelism

Intel® Software College

What Is Parallel Computing?

Attempt to speed solution of a particular task by

1. Dividing task into sub-tasks

2. Executing sub-tasks simultaneously on multiple processors

Successful attempts require both

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

1Recognizing Potential Parallelism

Successful attempts require both

1. Understanding of where parallelism can be effective

2. Knowledge of how to design and implement good solutions

Page 2: Intel Slides on Parallelism

Intel® Software College

Methodology

Study problem, sequential program, or code segment

Look for opportunities for parallelism

Try to keep all processors busy doing useful work

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

2Recognizing Potential Parallelism

Page 3: Intel Slides on Parallelism

Intel® Software College

Ways of Exploiting Parallelism

Domain decomposition

Task decomposition

Pipelining

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

3Recognizing Potential Parallelism

Page 4: Intel Slides on Parallelism

Intel® Software College

Domain Decomposition

First, decide how data elements should be divided among processors

Second, decide which tasks each processor should be doing

Example: Vector addition

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

4Recognizing Potential Parallelism

Page 5: Intel Slides on Parallelism

Intel® Software College

Domain Decomposition

Find the largest element of an array

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

5Recognizing Potential Parallelism

Page 6: Intel Slides on Parallelism

Intel® Software College

Domain Decomposition

Find the largest element of an array

CPU 0 CPU 1 CPU 2 CPU 3

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

6Recognizing Potential Parallelism

Page 7: Intel Slides on Parallelism

Intel® Software College

Domain Decomposition

Find the largest element of an array

CPU 0 CPU 1 CPU 2 CPU 3

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

7Recognizing Potential Parallelism

Page 8: Intel Slides on Parallelism

Intel® Software College

Domain Decomposition

Find the largest element of an array

CPU 0 CPU 1 CPU 2 CPU 3

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

8Recognizing Potential Parallelism

Page 9: Intel Slides on Parallelism

Intel® Software College

Domain Decomposition

Find the largest element of an array

CPU 0 CPU 1 CPU 2 CPU 3

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

9Recognizing Potential Parallelism

Page 10: Intel Slides on Parallelism

Intel® Software College

Domain Decomposition

Find the largest element of an array

CPU 0 CPU 1 CPU 2 CPU 3

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

10Recognizing Potential Parallelism

Page 11: Intel Slides on Parallelism

Intel® Software College

Domain Decomposition

Find the largest element of an array

CPU 0 CPU 1 CPU 2 CPU 3

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

11Recognizing Potential Parallelism

Page 12: Intel Slides on Parallelism

Intel® Software College

Domain Decomposition

Find the largest element of an array

CPU 0 CPU 1 CPU 2 CPU 3

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

12Recognizing Potential Parallelism

Page 13: Intel Slides on Parallelism

Intel® Software College

Domain Decomposition

Find the largest element of an array

CPU 0 CPU 1 CPU 2 CPU 3

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

13Recognizing Potential Parallelism

Page 14: Intel Slides on Parallelism

Intel® Software College

Domain Decomposition

Find the largest element of an array

CPU 0 CPU 1 CPU 2 CPU 3

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

14Recognizing Potential Parallelism

Page 15: Intel Slides on Parallelism

Intel® Software College

Domain Decomposition

Find the largest element of an array

CPU 0 CPU 1 CPU 2 CPU 3

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

15Recognizing Potential Parallelism

Page 16: Intel Slides on Parallelism

Intel® Software College

Domain Decomposition

Find the largest element of an array

CPU 0 CPU 1 CPU 2 CPU 3

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

16Recognizing Potential Parallelism

Page 17: Intel Slides on Parallelism

Intel® Software College

Task (Functional) Decomposition

First, divide tasks among processors

Second, decide which data elements are going to be accessed (read and/or written) by which processors

Example: Event-handler for GUI

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

17Recognizing Potential Parallelism

Page 18: Intel Slides on Parallelism

Intel® Software College

Task Decomposition

f()

r()q()

h()

g()

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

18Recognizing Potential Parallelism

s()

q()h()

Page 19: Intel Slides on Parallelism

Intel® Software College

Task Decomposition

f()

r()q()

h()

g()

CPU 0

CPU 2

CPU 1

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

19Recognizing Potential Parallelism

s()

q()h()

Page 20: Intel Slides on Parallelism

Intel® Software College

Task Decomposition

f()

r()q()

h()

g()

CPU 0

CPU 2

CPU 1

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

20Recognizing Potential Parallelism

s()

q()h()

Page 21: Intel Slides on Parallelism

Intel® Software College

Task Decomposition

f()

r()q()

h()

g()

CPU 0

CPU 2

CPU 1

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

21Recognizing Potential Parallelism

s()

q()h()

Page 22: Intel Slides on Parallelism

Intel® Software College

Task Decomposition

f()

r()q()

h()

g()

CPU 0

CPU 2

CPU 1

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

22Recognizing Potential Parallelism

s()

q()h()

Page 23: Intel Slides on Parallelism

Intel® Software College

Task Decomposition

f()

r()q()

h()

g()

CPU 0

CPU 2

CPU 1

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

23Recognizing Potential Parallelism

s()

q()h()

Page 24: Intel Slides on Parallelism

Intel® Software College

Pipelining

Special kind of task decomposition

“Assembly line” parallelism

Example: 3D rendering in computer graphics

RasterizeClipProjectModel

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

24Recognizing Potential Parallelism

Input Output

Page 25: Intel Slides on Parallelism

Intel® Software College

Processing One Data Set (Step 1)

RasterizeClipProjectModel

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

25Recognizing Potential Parallelism

Page 26: Intel Slides on Parallelism

Intel® Software College

Processing One Data Set (Step 2)

RasterizeClipProjectModel

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

26Recognizing Potential Parallelism

Page 27: Intel Slides on Parallelism

Intel® Software College

Processing One Data Set (Step 3)

RasterizeClipProjectModel

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

27Recognizing Potential Parallelism

Page 28: Intel Slides on Parallelism

Intel® Software College

Processing One Data Set (Step 4)

RasterizeClipProjectModel

The pipeline processes 1 data set in 4 steps

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

28Recognizing Potential Parallelism

Page 29: Intel Slides on Parallelism

Intel® Software College

Processing Two Data Sets (Step 1)

RasterizeClipProjectModel

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

29Recognizing Potential Parallelism

Page 30: Intel Slides on Parallelism

Intel® Software College

Processing Two Data Sets (Time 2)

RasterizeClipProjectModel

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

30Recognizing Potential Parallelism

Page 31: Intel Slides on Parallelism

Intel® Software College

Processing Two Data Sets (Step 3)

RasterizeClipProjectModel

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

31Recognizing Potential Parallelism

Page 32: Intel Slides on Parallelism

Intel® Software College

Processing Two Data Sets (Step 4)

RasterizeClipProjectModel

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

32Recognizing Potential Parallelism

Page 33: Intel Slides on Parallelism

Intel® Software College

Processing Two Data Sets (Step 5)

RasterizeClipProjectModel

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

33Recognizing Potential Parallelism

The pipeline processes 2 data sets in 5 steps

Page 34: Intel Slides on Parallelism

Intel® Software College

Pipelining Five Data Sets (Step 1)

Data set 0

Data set 1

Data set 2

CPU 0 CPU 1 CPU 2 CPU 3

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

34Recognizing Potential Parallelism

Data set 2

Data set 3

Data set 4

Page 35: Intel Slides on Parallelism

Intel® Software College

Pipelining Five Data Sets (Step 2)

Data set 0

Data set 1

Data set 2

CPU 0 CPU 1 CPU 2 CPU 3

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

35Recognizing Potential Parallelism

Data set 2

Data set 3

Data set 4

Page 36: Intel Slides on Parallelism

Intel® Software College

Pipelining Five Data Sets (Step 3)

Data set 0

Data set 1

Data set 2

CPU 0 CPU 1 CPU 2 CPU 3

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

36Recognizing Potential Parallelism

Data set 2

Data set 3

Data set 4

Page 37: Intel Slides on Parallelism

Intel® Software College

Pipelining Five Data Sets (Step 4)

Data set 0

Data set 1

Data set 2

CPU 0 CPU 1 CPU 2 CPU 3

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

37Recognizing Potential Parallelism

Data set 2

Data set 3

Data set 4

Page 38: Intel Slides on Parallelism

Intel® Software College

Pipelining Five Data Sets (Step 5)

Data set 0

Data set 1

Data set 2

CPU 0 CPU 1 CPU 2 CPU 3

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

38Recognizing Potential Parallelism

Data set 2

Data set 3

Data set 4

Page 39: Intel Slides on Parallelism

Intel® Software College

Pipelining Five Data Sets (Step 6)

Data set 0

Data set 1

Data set 2

CPU 0 CPU 1 CPU 2 CPU 3

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

39Recognizing Potential Parallelism

Data set 2

Data set 3

Data set 4

Page 40: Intel Slides on Parallelism

Intel® Software College

Pipelining Five Data Sets (Step 7)

Data set 0

Data set 1

Data set 2

CPU 0 CPU 1 CPU 2 CPU 3

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

40Recognizing Potential Parallelism

Data set 2

Data set 3

Data set 4

Page 41: Intel Slides on Parallelism

Intel® Software College

Pipelining Five Data Sets (Step 8)

Data set 0

Data set 1

Data set 2

CPU 0 CPU 1 CPU 2 CPU 3

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

41Recognizing Potential Parallelism

Data set 2

Data set 3

Data set 4

Page 42: Intel Slides on Parallelism

Intel® Software College

Dependence Graph

Graph = (nodes, arrows)

Node for each

Variable assignment (except index variables)

Constant

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

42Recognizing Potential Parallelism

Operator or function call

Arrows indicate use of variables and constants

Data flow

Control flow

Page 43: Intel Slides on Parallelism

Intel® Software College

Dependence Graph Example #1

for (i = 0; i < 3; i++)a[i] = b[i] / 2.0;

b[0] b[1] b[2]2 2 2

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

43Recognizing Potential Parallelism

a[0] a[1] a[2]

///

Page 44: Intel Slides on Parallelism

Intel® Software College

Dependence Graph Example #1

for (i = 0; i < 3; i++)a[i] = b[i] / 2.0;

b[0] b[1] b[2]2 2 2

Domain decompositionpossible

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

44Recognizing Potential Parallelism

a[0] a[1] a[2]

///

Page 45: Intel Slides on Parallelism

Intel® Software College

Dependence Graph Example #2

for (i = 1; i < 4; i++)a[i] = a[i-1] * b[i];

b[1] b[2] b[3]a[0]

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

45Recognizing Potential Parallelism

a[1] a[2] a[3]

***

Page 46: Intel Slides on Parallelism

Intel® Software College

Dependence Graph Example #2

for (i = 1; i < 4; i++)a[i] = a[i-1] * b[i];

b[1] b[2] b[3]a[0]

No domain decomposition

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

46Recognizing Potential Parallelism

a[1] a[2] a[3]

***

Page 47: Intel Slides on Parallelism

Intel® Software College

Dependence Graph Example #3

a = f(x, y, z);b = g(w, x);t = a + b;c = h(z);s = t / c;

x

f

w y z

g h

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

47Recognizing Potential Parallelism

ab

t

c

s

/

Page 48: Intel Slides on Parallelism

Intel® Software College

Dependence Graph Example #3

a = f(x, y, z);b = g(w, x);t = a + b;c = h(z);s = t / c;

x

f

w y z

g h

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

48Recognizing Potential Parallelism

ab

t

c

s

/

Taskdecompositionwith 3 CPUs.

Page 49: Intel Slides on Parallelism

Intel® Software College

Speculative Computation in a Turn-Based Strategy Game

Make moves for distantAI-controlled countries

in parallel

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

49Recognizing Potential Parallelism

Page 50: Intel Slides on Parallelism

Intel® Software College

Risk: Unexpected Interaction

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

50Recognizing Potential Parallelism

Page 51: Intel Slides on Parallelism

Intel® Software College

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

51Recognizing Potential Parallelism

Page 52: Intel Slides on Parallelism

Intel® Software College

Orange Cannot Move a Ship that Has Already Been Sunk by Green

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

52Recognizing Potential Parallelism

Page 53: Intel Slides on Parallelism

Intel® Software College

Solution: Reverse Time

Must be able to “undo” an erroneous, speculative computation

Analogous to what is done in hardware after incorrect branch prediction

Speculative computations typically do not have a big

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

53Recognizing Potential Parallelism

Speculative computations typically do not have a big payoff in parallel computing

Page 54: Intel Slides on Parallelism

Intel® Software College

Fork/Join Programming Model

When program begins execution, only master thread active

Master thread executes sequential portions of program

For parallel portions of program, master thread forks

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

54Shared-Memory Model and Threads

For parallel portions of program, master thread forks(creates or awakens) additional threads

At join (end of parallel section of code), extra threads are suspended or die

Page 55: Intel Slides on Parallelism

Intel® Software College

Relating Fork/Join to Code

for {

}

Sequential code

Parallel code

Sequential code

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

55Shared-Memory Model and Threads

for {

}

Sequential code

Parallel code

Sequential code

Page 56: Intel Slides on Parallelism

Intel® Software College

Incremental Parallelization

Sequential program a special case of threaded program

Programmers can add parallelism incrementally

Profile program execution

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

56Shared-Memory Model and Threads

Repeat

Choose best opportunity for parallelization

Transform sequential code into parallel code

Until further improvements not worth the effort

Page 57: Intel Slides on Parallelism

Intel® Software College

Utility of Threads

Threads are flexible enough to implement

Domain decomposition

Functional decomposition

Pipelining

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

57Shared-Memory Model and Threads

Pipelining

Page 58: Intel Slides on Parallelism

Intel® Software College

Domain Decomposition Using Threads

SharedMemory

Thread 0 Thread 2

f ( ) f ( )

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

58Shared-Memory Model and Threads

Memory

Thread 1

f ( )

Page 59: Intel Slides on Parallelism

Intel® Software College

SharedMemory

Functional Decomposition Using Threads

Thread 0 Thread 1

e ( ) f ( )

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

59Shared-Memory Model and Threads

g ( )h ( )

Page 60: Intel Slides on Parallelism

Intel® Software College

Pipelining Using Threads

Thread 0 Thread 2Thread 1

e ( ) f ( ) g ( )

Dataset 2

Dataset 4 Data

set 3 Data

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

60Shared-Memory Model and Threads

Shared Memory

Input Output

set 2

Data sets5, 6, ...

set 3 Dataset 1

Page 61: Intel Slides on Parallelism

Intel® Software College

Shared versus Private Variables

SharedVariables

PrivateVariables

PrivateVariables

Thread

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

61Shared-Memory Model and Threads

Variables VariablesVariables

Thread

Page 62: Intel Slides on Parallelism

Intel® Software College

The Threads Model

Thread

PrivateVariables

Thread

PrivateVariables

Thread

PrivateVariables

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

62Shared-Memory Model and Threads

Shared Variables

Page 63: Intel Slides on Parallelism

Intel® Software College

What Is OpenMP?

OpenMP is an API for parallel programming

First developed by the OpenMP Architecture Review Board (1997), now a standard

Designed for shared-memory multiprocessors

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

63Implementing Domain Decompositions

Set of compiler directives, library functions, and environment variables, but not a language

Can be used with C, C++, or Fortran

Based on fork/join model of threads

Page 64: Intel Slides on Parallelism

Intel® Software College

Strengths and Weaknesses of OpenMP

Strengths

Well-suited for domain decompositions

Available on Unix and Windows NT

Weaknesses

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

64Implementing Domain Decompositions

Weaknesses

Not well-tailored for functional decompositions

Compilers do not have to check for such errors as deadlocks and race conditions

Page 65: Intel Slides on Parallelism

Intel® Software College

Syntax of Compiler Directives

A C/C++ compiler directive is called a pragma

Pragmas are handled by the preprocessor

All OpenMP pragmas have the syntax:

#pragma omp <rest of pragma>

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

65Implementing Domain Decompositions

#pragma omp <rest of pragma>

Pragmas appear immediately before relevant construct

Page 66: Intel Slides on Parallelism

Intel® Software College

Pragma: parallel for

The compiler directive

#pragma omp parallel for

tells the compiler that the for loop which

immediately follows can be executed in parallel

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

66Implementing Domain Decompositions

The number of loop iterations must be computable at run time before loop executes

Loop must not contain a break, return, or exit

Loop must not contain a goto to a label outside loop

Page 67: Intel Slides on Parallelism

Intel® Software College

Example

int first, *marked, prime, size;

...

#pragma omp parallel for

for (i = first; i < size; i += prime)

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

67Implementing Domain Decompositions

for (i = first; i < size; i += prime)

marked[i] = 1;

Page 68: Intel Slides on Parallelism

Intel® Software College

Matching Threads with CPUs

Function omp_get_num_procs returns the number of physical processors available to the parallel program

int omp_get_num_procs (void);

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

68Implementing Domain Decompositions

Example:

int t;

...

t = omp_get_num_procs();

Page 69: Intel Slides on Parallelism

Intel® Software College

Matching Threads with CPUs (cont.)

Function omp_set_num_threads allows you to set the number of threads that should be active in parallel sections of code

void omp_set_num_threads (int t);

The function can be called with different arguments at different points in the program

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

69Implementing Domain Decompositions

at different points in the program

Example:

int t;

omp_set_num_threads (t);

Page 70: Intel Slides on Parallelism

Intel® Software College

Which Loop to Make Parallel?

main () {

int i, j, k;

float **a, **b;

...

for (k = 0; k < N; k++) Loop-carried dependences

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

70Implementing Domain Decompositions

for (k = 0; k < N; k++)

for (i = 0; i < N; i++)

for (j = 0; j < N; j++)

a[i][j] = MIN(a[i][j], a[i][k] + a[k][j]);

Loop-carried dependences

Can execute in parallel

Can execute in parallel

Page 71: Intel Slides on Parallelism

Intel® Software College

Grain Size

There is a fork/join for every instance of#pragma omp parallel for

for ( ) {

...

}

Since fork/join is a source of overhead, we want to

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

71Implementing Domain Decompositions

Since fork/join is a source of overhead, we want to maximize the amount of work done for each fork/join; i.e., the grain size

Hence we choose to make the middle loop parallel

Page 72: Intel Slides on Parallelism

Intel® Software College

Almost Right, but Not Quite

main () {

int i, j, k;

float **a, **b;

...

for (k = 0; k < N; k++)

Problem: j is a shared variable

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

72Implementing Domain Decompositions

for (k = 0; k < N; k++)

#pragma omp parallel for

for (i = 0; i < N; i++)

for (j = 0; j < N; j++)

a[i][j] = MIN(a[i][j], a[i][k] + a[k][j]);

Page 73: Intel Slides on Parallelism

Intel® Software College

Problem Solved with private Clause

main () {

int i, j, k;

float **a, **b;

...

for (k = 0; k < N; k++)

Tells compiler to makelisted variables private

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

73Implementing Domain Decompositions

for (k = 0; k < N; k++)

#pragma omp parallel for private (j)

for (i = 0; i < N; i++)

for (j = 0; j < N; j++)

a[i][j] = MIN(a[i][j], a[i][k] + a[k][j]);

listed variables private

Page 74: Intel Slides on Parallelism

Intel® Software College

Example

int i;

float *a, *b, *c, tmp;

...

for (i = 0; i < N; i++) {

tmp = a[i] / b[i];

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

74Implementing Domain Decompositions

tmp = a[i] / b[i];

c[i] = tmp * tmp;

}

Loop is perfectly parallelizable except for shared

variable “tmp”

Page 75: Intel Slides on Parallelism

Intel® Software College

Solution

int i;

float *a, *b, *c, tmp;

...

#pragma omp parallel for private (tmp)

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

75Implementing Domain Decompositions

for (i = 0; i < N; i++) {

tmp = a[i] / b[i];

c[i] = tmp * tmp;

}

Page 76: Intel Slides on Parallelism

Intel® Software College

More about Private Variables

Each thread has its own copy of the private variables

If j is declared private, then inside the for loop no thread can access the “other” j (the j in

shared memory)

No thread can use a previously defined value of j

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

76Implementing Domain Decompositions

No thread can use a previously defined value of j

No thread can assign a new value to the shared j

Private variables are undefined at loop entry and loop exit, reducing execution time

Page 77: Intel Slides on Parallelism

Intel® Software College

Clause: firstprivate

The firstprivate clause tells the compiler that the

private variable should inherit the value of the shared variable upon loop entry

The value is assigned once per thread, not once per loop iteration

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

77Implementing Domain Decompositions

Page 78: Intel Slides on Parallelism

Intel® Software College

Example

a[0] = 0.0;

for (i = 1; i < N; i++)

a[i] = alpha (i, a[i-1]);

#pragma omp parallel for firstprivate (a)

for (i = 0; i < N; i++) {

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

78Implementing Domain Decompositions

for (i = 0; i < N; i++) {

b[i] = beta (i, a[i]);

a[i] = gamma (i);

c[i] = delta (a[i], b[i]);

}

Page 79: Intel Slides on Parallelism

Intel® Software College

Clause: lastprivate

The lastprivate clause tells the compiler that the

value of the private variable after the sequentially last loop iteration should be assigned to the shared variable upon loop exit

In other words, when the thread responsible for the sequentially last loop iteration exits the loop,

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

79Implementing Domain Decompositions

sequentially last loop iteration exits the loop, its copy of the private variable is copied back to the shared variable

Page 80: Intel Slides on Parallelism

Intel® Software College

Example

#pragma omp parallel for lastprivate (x)

for (i = 0; i < N; i++) {

x = foo (i);

y[i] = bar(i, x);

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

80Implementing Domain Decompositions

y[i] = bar(i, x);

}

last_x = x;

Page 81: Intel Slides on Parallelism

Intel® Software College

Pragma: parallel

In the effort to increase grain size, sometimes the code that should be executed in parallel goes beyond a single for loop

The parallel pragma is used when a block of code

should be executed in parallel

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

81Implementing Domain Decompositions

Page 82: Intel Slides on Parallelism

Intel® Software College

Pragma: for

The for pragma is used inside a block of code already marked with the parallel pragma

It indicates a for loop whose iterations should be

divided among the active threads

There is a barrier synchronization of the threads at

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

82Implementing Domain Decompositions

There is a barrier synchronization of the threads at the end of the for loop

Page 83: Intel Slides on Parallelism

Intel® Software College

Pragma: single

The single pragma is used inside a parallel block of

code

It tells the compiler that only a single thread should execute the statement or block of code immediately following

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

83Implementing Domain Decompositions

Page 84: Intel Slides on Parallelism

Intel® Software College

Clause: nowait

The nowait clause tells the compiler that there is no

need for a barrier synchronization at the end of a parallel for loop or single block of code

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

84Implementing Domain Decompositions

Page 85: Intel Slides on Parallelism

Intel® Software College

Case: parallel, for, single Pragmas

for (i = 0; i < N; i++)

a[i] = alpha(i);

if (delta < 0.0) printf (“delta < 0.0\n”);

for (i = 0; i < N; i++)

b[i] = beta (i, delta);

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

85Implementing Domain Decompositions

b[i] = beta (i, delta);

Page 86: Intel Slides on Parallelism

Intel® Software College

Solution: parallel, for, single Pragma

#pragma omp parallel

{

#pragma omp for nowait

for (i = 0; i < N; i++)

a[i] = alpha(i);

#pragma omp single nowait

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

86Implementing Domain Decompositions

#pragma omp single nowait

if (delta < 0.0) printf (“delta < 0.0\n”);

#pragma omp for

for (i = 0; i < N; i++)

b[i] = beta (i, delta);

}

Page 87: Intel Slides on Parallelism

Intel® Software College

Extended Example

for (i = 0; i < m; i++) {

low = a[i];

high = b[i];

if (low > high) {

printf (“Exiting during iteration %d\n”, i);

break;

}

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

87Implementing Domain Decompositions

}

for (j = low; j < high; j++)

c[j] += alpha (i, j);

}

Page 88: Intel Slides on Parallelism

Intel® Software College

Extended Example

for (i = 0; i < m; i++) {

low = a[i];

high = b[i];

if (low > high) {

printf (“Exiting during iteration %d\n”, i);

break;

}

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

88Implementing Domain Decompositions

}

#pragma omp parallel for

for (j = low; j < high; j++)

c[j] += alpha (i, j);

}

Page 89: Intel Slides on Parallelism

Intel® Software College

Extended Example

#pragma omp parallel private (i, j, low, high)

for (i = 0; i < m; i++) {

low = a[i];

high = b[i];

if (low > high) {

printf (“Exiting during iteration %d\n”, i);

break;

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

89Implementing Domain Decompositions

break;

}

#pragma omp for nowait

for (j = low; j < high; j++)

c[j] += alpha (i, j);

}

Page 90: Intel Slides on Parallelism

Intel® Software College

Extended Example

#pragma omp parallel private (i, j, low, high)

for (i = 0; i < m; i++) {

low = a[i];

high = b[i];

if (low > high) {

#pragma omp single nowait

printf (“Exiting during iteration %d\n”, i);

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

90Implementing Domain Decompositions

printf (“Exiting during iteration %d\n”, i);

break;

}

#pragma omp for nowait

for (j = low; j < high; j++)

c[j] += alpha (i, j);

}

Page 91: Intel Slides on Parallelism

Intel® Software College

Potential Pitfall?

double area, pi, x;

int i, n;

...

area = 0.0;

for (i = 0; i < n; i++) {

x = (i + 0.5)/n;

area += 4.0/(1.0 + x*x);

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

91Congronting Race Conditions

area += 4.0/(1.0 + x*x);

}

pi = area / n;

What happens when we make the for loop parallel?

Page 92: Intel Slides on Parallelism

Intel® Software College

Race Condition

A race condition is nondeterministic behavior caused by the times at which two or more threads access a shared variable

For example, suppose both Thread A and Thread B are executing the statement

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

92Congronting Race Conditions

area += 4.0 / (1.0 + x*x);

Page 93: Intel Slides on Parallelism

Intel® Software College

Value of area Thread A Thread B

11.667

One Timing ⇒⇒⇒⇒ Correct Sum

+3.765

15.432

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

93Congronting Race Conditions

15.432

15.432

+ 3.563

18.995

Page 94: Intel Slides on Parallelism

Intel® Software College

Value of area Thread A Thread B

11.667

Another Timing ⇒⇒⇒⇒ Incorrect Sum

+3.765

11.667

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

94Congronting Race Conditions

11.667

15.432

+ 3.563

15.230

Page 95: Intel Slides on Parallelism

Intel® Software College

Another Race Condition Example

struct Node {

struct Node *next;

int data; };

struct List {

struct Node *head; }

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

95Congronting Race Conditions

void AddHead (struct List *list,

struct Node *node) {

node->next = list->head;

list->head = node;

}

Page 96: Intel Slides on Parallelism

Intel® Software College

Original Singly-Linked List

headdata

listnode_a

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

96Congronting Race Conditions

headdata

next

Page 97: Intel Slides on Parallelism

Intel® Software College

Thread 1 after Stmt. 1 of AddHead

headdata

listnode_a

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

97Congronting Race Conditions

headdata

next

data

next

node_b

Page 98: Intel Slides on Parallelism

Intel® Software College

Thread 2 Executes AddHead

headdata

listnode_a

data

next

node_c

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

98Congronting Race Conditions

headdata

next

data

next

node_b

Page 99: Intel Slides on Parallelism

Intel® Software College

Thread 1 After Stmt. 2 of AddHead

headdata

listnode_a

data

next

node_c

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

99Congronting Race Conditions

headdata

next

data

next

node_b

Page 100: Intel Slides on Parallelism

Intel® Software College

Why Race Conditions Are Nasty

Programs with race conditions exhibit nondeterministic behavior

Sometimes give correct result

Sometimes give erroneous result

Programs often work correctly on trivial data sets

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

100Congronting Race Conditions

Programs often work correctly on trivial data sets and small number of threads

Errors more likely to occur when number of threads and/or execution time increases

Hence debugging race conditions can be difficult

Page 101: Intel Slides on Parallelism

Intel® Software College

Mutual Exclusion

We can prevent the race conditions described earlier by ensuring that only one thread at a time references and updates shared variable or data structure

Mutual exclusion refers to a kind of synchronizationthat allows only a single thread or process

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

101Congronting Race Conditions

that allows only a single thread or process at a time to have access to a shared resource

Mutual exclusion is implemented using some form of locking

Page 102: Intel Slides on Parallelism

Intel® Software College

Do Flags Guarantee Mutual Exclusion?

int flag = 0;

void AddHead (struct List *list,

struct Node *node) {

while (flag != 0) /* wait */ ;

flag = 1;

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

102Congronting Race Conditions

flag = 1;

node->next = list->head;

list->head = node;

flag = 0;

}

Page 103: Intel Slides on Parallelism

Intel® Software College

Flags Don’t Guarantee Mutual Exclusion

int flag = 0;

void AddHead (struct List *list,

struct Node *node) {

while (flag != 0) /* wait */ ;

flag = 1;

Thread 1

0

flag

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

103Congronting Race Conditions

flag = 1;

node->next = list->head;

list->head = node;

flag = 0;

}

Page 104: Intel Slides on Parallelism

Intel® Software College

Flags Don’t Guarantee Mutual Exclusion

int flag = 0;

void AddHead (struct List *list,

struct Node *node) {

while (flag != 0) /* wait */ ;

flag = 1;

Thread 1

0

flag Thread 2

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

104Congronting Race Conditions

flag = 1;

node->next = list->head;

list->head = node;

flag = 0;

}

Page 105: Intel Slides on Parallelism

Intel® Software College

Flags Don’t Guarantee Mutual Exclusion

int flag = 0;

void AddHead (struct List *list,

struct Node *node) {

while (flag != 0) /* wait */ ;

flag = 1;

Thread 1

1

flag Thread 2

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

105Congronting Race Conditions

flag = 1;

node->next = list->head;

list->head = node;

flag = 0;

}

Page 106: Intel Slides on Parallelism

Intel® Software College

Flags Don’t Guarantee Mutual Exclusion

int flag = 0;

void AddHead (struct List *list,

struct Node *node) {

while (flag != 0) /* wait */ ;

flag = 1;

Thread 1

1

flag Thread 2

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

106Congronting Race Conditions

flag = 1;

node->next = list->head;

list->head = node;

flag = 0;

}

Page 107: Intel Slides on Parallelism

Intel® Software College

Flags Don’t Guarantee Mutual Exclusion

int flag = 0;

void AddHead (struct List *list,

struct Node *node) {

while (flag != 0) /* wait */ ;

flag = 1;

Thread 1

0

flag Thread 2

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

107Congronting Race Conditions

flag = 1;

node->next = list->head;

list->head = node;

flag = 0;

}

Page 108: Intel Slides on Parallelism

Intel® Software College

Flags Don’t Guarantee Mutual Exclusion

int flag = 0;

void AddHead (struct List *list,

struct Node *node) {

while (flag != 0) /* wait */ ;

flag = 1;

Thread 1

0

flag Thread 2

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

108Congronting Race Conditions

flag = 1;

node->next = list->head;

list->head = node;

flag = 0;

}

Page 109: Intel Slides on Parallelism

Intel® Software College

Locking Mechanism

The previous method failed because checking the value of flag and setting its value were two

distinct operations

We need some sort of atomic test-and-set

Operating system provides functions to do this

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

109Congronting Race Conditions

Operating system provides functions to do this

The generic term “lock” refers to a synchronization mechanism used to control access to shared resources

Page 110: Intel Slides on Parallelism

Intel® Software College

Critical Sections

A critical section is a portion of code that threads execute in a mutually exclusive fashion

The critical pragma in OpenMP immediately

precedes a statement or block representing a critical section

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

110Congronting Race Conditions

Good news: critical sections eliminate race conditions

Bad news: critical sections are executed sequentially

More bad news: you have to identify critical sections yourself

Page 111: Intel Slides on Parallelism

Intel® Software College

Reminder: Motivating Example

double area, pi, x;

int i, n;

...

area = 0.0;

for (i = 0; i < n; i++) {

x = (i + 0.5)/n;

area += 4.0/(1.0 + x*x);

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

111Congronting Race Conditions

area += 4.0/(1.0 + x*x);

}

pi = area / n;

Where is the critical section?

Page 112: Intel Slides on Parallelism

Intel® Software College

Solution #1

double area, pi, x;

int i, n;

...

area = 0.0;

#pragma omp parallel for private(x)

for (i = 0; i < n; i++) {

x = (i + 0.5)/n;

#pragma omp critical

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

112Congronting Race Conditions

#pragma omp critical

area += 4.0 / (1.0 + x*x);

}

pi = area / n;

This ensures area will end up with the correct value.How can we do better?

Page 113: Intel Slides on Parallelism

Intel® Software College

Solution #2

double area, pi, tmp, x;

int i, n;

...

area = 0.0;

#pragma omp parallel for private(x,tmp)

for (i = 0; i < n; i++) {

x = (i + 0.5)/n;

tmp = 4.0/(1.0 + x*x);

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

113Congronting Race Conditions

tmp = 4.0/(1.0 + x*x);

#pragma omp critical

area += tmp;

}

pi = area / n;

This reduces amount of time spent in critical section.How can we do better?

Page 114: Intel Slides on Parallelism

Intel® Software College

Solution #3

double area, pi, tmp, x;int i, n;...area = 0.0;#pragma omp parallel private(tmp){

tmp = 0.0;#pragma omp for private (x)

for (i = 0; i < n; i++) {x = (i + 0.5)/n;

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

114Congronting Race Conditions

x = (i + 0.5)/n;tmp += 4.0/(1.0 + x*x);

}#pragma omp critical

area += tmp;}pi = area / n;

Why is this better?

Page 115: Intel Slides on Parallelism

Intel® Software College

Reductions

Given associative binary operator ⊕ the expression

a1 ⊕ a2 ⊕ a3 ⊕ ... ⊕ an

is called a reduction

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

115Congronting Race Conditions

The π-finding program performs a sum-reduction

Page 116: Intel Slides on Parallelism

Intel® Software College

OpenMP reduction Clause

Reductions are so common that OpenMP provides a reduction clause for the parallel for pragma

Eliminates need for

Creating private variable

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

116Congronting Race Conditions

Dividing computation into accumulation of local answers that contribute to global result

Page 117: Intel Slides on Parallelism

Intel® Software College

Solution #4

double area, pi, x;

int i, n;

...

area = 0.0;

#pragma omp parallel for private(x) \

reduction(+:area)

for (i = 0; i < n; i++) {

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

117Congronting Race Conditions

for (i = 0; i < n; i++) {

x = (i + 0.5)/n;

area += 4.0/(1.0 + x*x);

}

pi = area / n;

Page 118: Intel Slides on Parallelism

Intel® Software College

Important: Lock Data, Not Code

Locks should be associated with data objects

Different data objects should have different locks

Suppose lock associated with critical section of code instead of data object

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

118Congronting Race Conditions

Mutual exclusion can be lost if same object manipulated by two different functions

Performance can be lost if two threads manipulating different objects attempt to execute same function

Page 119: Intel Slides on Parallelism

Intel® Software College

Example: Hash Table Creation

NULL

NULL

NULL

NULL

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

119Congronting Race Conditions

...

NULL

NULL

Page 120: Intel Slides on Parallelism

Intel® Software College

Locking Code: Inefficient

#pragma omp parallel for private (index)

for (i = 0; i < elements; i++) {

index = hash(element[i]);

#pragma omp critical

insert_element (element[i], index);

}

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

120Congronting Race Conditions

}

Page 121: Intel Slides on Parallelism

Intel® Software College

Locking Data: Efficient

/* Static variable */

omp_lock_t hash_lock[HASH_TABLE_SIZE];

/* Inside function ‘main’ */

for (i = 0; i < HASH_TABLE_SIZE; i++)

omp_init_lock(&hash_lock[i]);

Declaration

Initialization

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

121Congronting Race Conditions

void insert_element (ELEMENT e, int i)

{

omp_set_lock (&hash_lock[i]);

/* Code to insert element e */

omp_unset_lock (&hash_lock[i]);

}

Use

Page 122: Intel Slides on Parallelism

Intel® Software College

Locks Are Dangerous

Suppose a lock is used to guarantee mutually exclusive access to a shared variable

Imagine two threads, each with its own critical section

Thread A Thread B

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

122Congronting Race Conditions

Thread A Thread B

a += 5; b += 5;

b += 7; a += 7;

a += b; a += b;

a += 11; b += 11;

Page 123: Intel Slides on Parallelism

Intel® Software College

Faulty Implementation

Thread A Thread B

lock (lock_a); lock (lock_b);

a += 5; b += 5;

lock (lock_b); lock (lock_a);

b += 7; a += 7;

What happens ifthreads are atthis point at thesame time?

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

123Congronting Race Conditions

b += 7; a += 7;

a += b; a += b;

unlock (lock_b); unlock (lock_a);

a += 11; b += 11;

unlock (lock_a); unlock (lock_b);

Page 124: Intel Slides on Parallelism

Intel® Software College

Deadlock

A situation involving two or more threads (processes) in which no thread may proceed because each is waiting for a resource held by another

Can be represented by a resource allocation graph

sem_bwants held by

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

124Congronting Race Conditions

A graph of deadlock contains a cycle

Thread A Thread A

sem_b

sem_a

wants

wants

held by

held by

Page 125: Intel Slides on Parallelism

Intel® Software College

More on Deadlocks

A program exhibits a global deadlock if every thread is blocked

A program exhibits local deadlock if only some of the threads in the program are blocked

A deadlock is another example of a nondeterministic

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

125Congronting Race Conditions

A deadlock is another example of a nondeterministic behavior exhibited by a parallel program

Adding debugging output to detect source of deadlock can change timing and reduce chance of deadlock occurring

Page 126: Intel Slides on Parallelism

Intel® Software College

Four Conditions for Deadlock

Mutually exclusive access to a resource

Threads hold onto resources they have while they wait for additional resources

Resources cannot be taken away from threads

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

126Congronting Race Conditions

Cycle in resource allocation graph

Page 127: Intel Slides on Parallelism

Intel® Software College

Deadlock Prevention Strategies

Don’t allow mutually exclusive access to resource

Make resource shareable

Don’t allow threads to wait while holding resources

Only request resources when have none. That means only hold one resource at a time or request all resources at once.

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

127Congronting Race Conditions

request all resources at once.

Allow resources to be taken away from threads.

Allow preemption. Works for CPU and memory. Doesn’t work for locks.

Ensure no cycle in request allocation graph.

Rank resources. Threads must acquire resources in order.

Page 128: Intel Slides on Parallelism

Intel® Software College

Correct Implementation

Thread A Thread B

lock (lock_a); lock (lock_a);

a += 5; lock (lock_b);

lock (lock_b); b += 5;

b += 7; a += 7;

Threads must locklock_a before lock_b

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

128Congronting Race Conditions

b += 7; a += 7;

a += b; a += b;

unlock (lock_b); unlock (lock_a);

a += 11; b += 11;

unlock (lock_a); unlock (lock_b);

Page 129: Intel Slides on Parallelism

Intel® Software College

Another Problem with Locks

Every call to function lock should be matched with a call to unlock, representing the start and the

end of the critical section

A program may be syntactically correct (i.e., may compile) without having matching calls

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

129Congronting Race Conditions

A programmer may forget the unlock call or may pass the wrong argument to unlock

A thread that never releases a shared resource creates a deadlock

Page 130: Intel Slides on Parallelism

Intel® Software College

Case Study: The N Queens Problem

Is there a way to placeN queens on an N-by-Nchessboard such thatno queen threatensanother queen?

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

130Implementing Task Decompositions

Page 131: Intel Slides on Parallelism

Intel® Software College

A Solution to the 4 Queens Problem

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

131Implementing Task Decompositions

Page 132: Intel Slides on Parallelism

Intel® Software College

Exhaustive Search

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

132Implementing Task Decompositions

Page 133: Intel Slides on Parallelism

Intel® Software College

Design #1 for Parallel Search

Create threads to explore different parts of the search tree simultaneously

If a node has children

The thread creates child nodes

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

133Implementing Task Decompositions

The thread explores one child node itself

Thread creates a new thread for every other child node

Page 134: Intel Slides on Parallelism

Intel® Software College

Design #1 for Parallel Search

Thread W

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

134Implementing Task Decompositions

Thread W NewThread X

NewThread Y

NewThread Z

Page 135: Intel Slides on Parallelism

Intel® Software College

Pros and Cons of Design #1

Pros

Simple design, easy to implement

Balances work among threads

Cons

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

135Implementing Task Decompositions

Cons

Too many threads created

Lifetime of threads too short

Overhead costs too high

Page 136: Intel Slides on Parallelism

Intel® Software College

Design #2 for Parallel Search

One thread created for each subtree rooted at a particular depth

Each thread sequentially explores its subtree

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

136Implementing Task Decompositions

Page 137: Intel Slides on Parallelism

Intel® Software College

Design #2 in Action

Thread1

Thread2

Thread3

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

137Implementing Task Decompositions

Page 138: Intel Slides on Parallelism

Intel® Software College

Pros and Cons of Design #2

Pros

Thread creation/termination time minimized

Cons

Subtree sizes may vary dramatically

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

138Implementing Task Decompositions

Subtree sizes may vary dramatically

Some threads may finish long before others

Imbalanced workloads lower efficiency

Page 139: Intel Slides on Parallelism

Intel® Software College

Design #3 for Parallel Search

Main thread creates work pool—list of subtrees to explore

Main thread creates finite number of co-worker threads

Each subtree exploration is done by a single thread

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

139Implementing Task Decompositions

Each subtree exploration is done by a single thread

Inactive threads go to pool to get more work

Page 140: Intel Slides on Parallelism

Intel® Software College

Work Pool Analogy

More rows than workers

Each worker takes an unpicked row and picks the crop

After completing a row, the

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

140Implementing Task Decompositions

After completing a row, the worker takes another unpicked row

Process continues until all rows have been harvested

Page 141: Intel Slides on Parallelism

Intel® Software College

Design #3 in Action

Thread1

Thread2

Thread3

Thread3

Thread1

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

141Implementing Task Decompositions

Page 142: Intel Slides on Parallelism

Intel® Software College

Pros and Cons of Strategy #3

Pros

Thread creation/termination time minimized

Workload balance better than strategy #2

Cons

Threads need exclusive access to data

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

142Implementing Task Decompositions

Threads need exclusive access to data structure containing work to be done, a sequential component

Workload balance worse than strategy #1

Conclusion

Good compromise between designs 1 and 2

Page 143: Intel Slides on Parallelism

Intel® Software College

Implementing Strategy #3 for N Queens

Work pool consists of N boards representing N possible placements of queen on first row

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

143Implementing Task Decompositions

Page 144: Intel Slides on Parallelism

Intel® Software College

Parallel Program Design

One thread creates list of partially filled-in boards

Fork: Create one thread per CPU

Each thread repeatedly gets board from list, searches for solutions, and adds to solution count, until no more board on list

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

144Implementing Task Decompositions

no more board on list

Join: Occurs when list is empty

One thread prints number of solutions found

Page 145: Intel Slides on Parallelism

Intel® Software College

Search Tree Node Structure

/* The ‘board’ struct contains information about a

node in the search tree; i.e., partially filled-

in board. The work pool is a singly linked

list of ‘board’ structs. */

struct board {

int pieces; /* # of queens on board*/

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

145Implementing Task Decompositions

int pieces; /* # of queens on board*/

int places[MAX_N]; /* Queen’s pos in each row */

struct board *next; /* Next search tree node */

};

Page 146: Intel Slides on Parallelism

Intel® Software College

Key Code in main Function

struct board *stack;

...

stack = NULL;

for (i = 0; i < n; i++) {

initial=(struct board *)malloc(sizeof(struct board));

initial->pieces = 1;

initial->places[0] = i;

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

146Implementing Task Decompositions

initial->places[0] = i;

initial->next = stack;

stack = initial;

}

num_solutions = 0;

search_for_solutions (n, stack, &num_solutions);

printf ("The %d-queens puzzle has %d solutions\n", n,

num_solutions);

Page 147: Intel Slides on Parallelism

Intel® Software College

Insertion of OpenMP Code

struct board *stack;...stack = NULL;for (i = 0; i < n; i++) {initial=(struct board *)malloc(sizeof(struct board));initial->pieces = 1;initial->places[0] = i;initial->next = stack;stack = initial;

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

147Implementing Task Decompositions

stack = initial;}num_solutions = 0;

omp_set_num_threads (omp_get_num_procs());#pragma omp parallelsearch_for_solutions (n, stack, &num_solutions);printf ("The %d-queens puzzle has %d solutions\n", n,

num_solutions);

Page 148: Intel Slides on Parallelism

Intel® Software College

Original C Function to Get Work

void search_for_solutions (int n,

struct board *stack, int *num_solutions)

{

struct board *ptr;

void search (int, struct board *, int *);

while (stack != NULL) {

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

148Implementing Task Decompositions

while (stack != NULL) {

ptr = stack;

stack = stack->next;

search (n, ptr, num_solutions);

free (ptr);

}

}

Page 149: Intel Slides on Parallelism

Intel® Software College

C/OpenMP Function to Get Work

void search_for_solutions (int n,

struct board *stack, int *num_solutions)

{

struct board *ptr;

void search (int, struct board *, int *);

while (stack != NULL) {

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

149Implementing Task Decompositions

while (stack != NULL) {

#pragma omp critical{ ptr = stack; stack = stack->next; }search (n, ptr, num_solutions);

free (ptr);

}

}

Page 150: Intel Slides on Parallelism

Intel® Software College

Original C Search Function

void search (int n, struct board *ptr,int *num_solutions)

{int i;int no_threats (struct board *);

if (ptr->pieces == n) {(*num_solutions)++;

} else {

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

150Implementing Task Decompositions

} else {ptr->pieces++;for (i = 0; i < n; i++) {

ptr->places[ptr->pieces-1] = i;if (no_threats(ptr))

search (n, ptr, num_solutions);}ptr->pieces--;

}

}

Page 151: Intel Slides on Parallelism

Intel® Software College

C/OpenMP Search Function

void search (int n, struct board *ptr,int *num_solutions)

{int i;int no_threats (struct board *);

if (ptr->pieces == n) {

#pragma omp critical(*num_solutions)++;

} else {

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

151Implementing Task Decompositions

} else {ptr->pieces++;for (i = 0; i < n; i++) {

ptr->places[ptr->pieces-1] = i;if (no_threats(ptr))

search (n, ptr, num_solutions);}ptr->pieces--;

}}

Page 152: Intel Slides on Parallelism

Intel® Software College

Only One Problem: It Doesn’t Work!

OpenMP program throws an exception

Culprit: Variable stack

Heap

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

152Implementing Task Decompositions

stack

Page 153: Intel Slides on Parallelism

Intel® Software College

Problem Site

int main ()

{

struct board *stack;

...

#pragma omp parallelsearch_for_solutions

(n, stack, &num_solutions);

...

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

153Implementing Task Decompositions

...

}

void search_for_solutions (int n,

struct board *stack, int *num_solutions)

{

...

while (stack != NULL) ...

Page 154: Intel Slides on Parallelism

Intel® Software College

1. Both Threads Point to Top

stack stack

Thread 1 Thread 2

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

154Implementing Task Decompositions

Page 155: Intel Slides on Parallelism

Intel® Software College

2. Thread 1 Grabs First Element

stack

Thread 1 Thread 2

stack

ptr

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

155Implementing Task Decompositions

ptr

Page 156: Intel Slides on Parallelism

Intel® Software College

3. Error #1:Thread 2 grabs same element

Thread 1 Thread 2

stack

ptr

stack

ptr

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

156Implementing Task Decompositions

ptr ptr

Page 157: Intel Slides on Parallelism

Intel® Software College

4. Error #2:Thread 1 deletes element and thenThread 2’s stack ptr dangles

stack

Thread 1 Thread 2

stack

ptr

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

157Implementing Task Decompositions

ptr

?

Page 158: Intel Slides on Parallelism

Intel® Software College

Remedy 1: Make stack Static

Thread 1 Thread 2

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

158Implementing Task Decompositions

stack

Page 159: Intel Slides on Parallelism

Intel® Software College

Remedy 2: Use Indirection

stack stack

Thread 1 Thread 2

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

159Implementing Task Decompositions

stack

Page 160: Intel Slides on Parallelism

Intel® Software College

Corrected main Function

struct board *stack;...stack = NULL;for (i = 0; i < n; i++) {initial=(struct board *)malloc(sizeof(struct board));initial->pieces = 1;initial->places[0] = i;initial->next = stack;stack = initial;

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

160Implementing Task Decompositions

stack = initial;}num_solutions = 0;

omp_set_num_threads (omp_get_num_procs());#pragma omp parallelsearch_for_solutions (n, &stack, &num_solutions);printf ("The %d-queens puzzle has %d solutions\n", n,

num_solutions);

Page 161: Intel Slides on Parallelism

Intel® Software College

Corrected Stack Access Function

void search_for_solutions (int n,

struct board **stack, int *num_solutions){

struct board *ptr;

void search (int, struct board *, int *);

while (*stack != NULL) {

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

161Implementing Task Decompositions

while (*stack != NULL) {

#pragma omp critical{ ptr = *stack;*stack = (*stack)->next; }

search (n, ptr, num_solutions);

free (ptr);

}

}

Page 162: Intel Slides on Parallelism

Intel® Software College

Case Study: Fancy Web Browser

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

162Implementing Task Decompositions

Page 163: Intel Slides on Parallelism

Intel® Software College

Case Study: Fancy Web Browser

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

163Implementing Task Decompositions

You can see snapshot ofpage before deciding

whether to click on the link

Page 164: Intel Slides on Parallelism

Intel® Software College

C Code

page = retrieve_page (url);

find_links (page, &num_links, &link_url);

for (i = 0; i < num_links; i++)

snapshots[i].image = NULL;

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

164Implementing Task Decompositions

snapshots[i].image = NULL;

for (i = 0; i < num_links; i++)

generate_preview (&snapshots[i]);

display_page (page);

Page 165: Intel Slides on Parallelism

Intel® Software College

Pseudocode, Option A

Retrieve page

Identify links

Enter parallel region

Thread gets ID number (id)

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

165Implementing Task Decompositions

Thread gets ID number (id)

If id = 0 draw page

else fetch page & build snapshot image (id-1)

Exit parallel region

Page 166: Intel Slides on Parallelism

Intel® Software College

Timeline of Option A

Display page

Fetch page, create snapshot

Fetch, create

Fetch, create

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

166Implementing Task Decompositions

Retrieve page

Identify links

Enter parallel block

Display page

Barrier at end ofparallel block

Page 167: Intel Slides on Parallelism

Intel® Software College

C/OpenMP Code, Option A

page = retrieve_page (url);

find_links (page, &num_links, &link_url);

for (i = 0; i < num_links; i++)

snapshots[i].image = NULL;

omp_set_num_threads (num_links + 1);

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

167Implementing Task Decompositions

#pragma omp parallel private (id)

{

id = omp_get_thread_num();

if (id == 0) display_page (page);

else generate_preview (&snapshots[id-1]);

}

Page 168: Intel Slides on Parallelism

Intel® Software College

Pseudocode, Option B

Retrieve page

Identify links

Two activities happen in parallel

1. Draw page

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

168Implementing Task Decompositions

1. Draw page

2. For all links do in parallel

Fetch page and build snapshot image

Page 169: Intel Slides on Parallelism

Intel® Software College

Parallel Sections

#pragma omp parallel sections

{

<code block A>

#pragma omp section

Meaning: The followingblock contains sub-blocksthat may execute inparallel

Each block executed by one thread

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

169Implementing Task Decompositions

<code block B>

#pragma omp section

<code block C>

}

Dividers between sections

Page 170: Intel Slides on Parallelism

Intel® Software College

Nested Parallelism

We can use parallel sections to specify two different concurrent activities: drawing the Web page and creating the snapshots

We are using a for loop to create multiple snapshots; number of iterations is known only at run time

We would like to make for loop parallel

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

170Implementing Task Decompositions

We would like to make for loop parallel

OpenMP allows nested parallelism: a parallel region inside another parallel region

A thread entering a parallel region creates a newteam of threads to execute it

Page 171: Intel Slides on Parallelism

Intel® Software College

Timeline of Option B

Display page

Fetch page, create snapshot

Fetch, create

Fetch, create

Enterparallel for

Barrier

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

171Implementing Task Decompositions

Retrieve page

Identify links

Enter parallel sections

Display page

Barrier

Page 172: Intel Slides on Parallelism

Intel® Software College

C/OpenMP Code, Option B

page = retrieve_page (url);

find_links (page, &num_links, &link_url);

omp_set_num_threads (2);

#pragma omp parallel sections

{

display_page (page);

#pragma omp section

Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

172Implementing Task Decompositions

#pragma omp section

omp_set_num_threads (num_links);

#pragma omp parallel for

for (i = 0; i < num_links; i++)

generate_preview (&snapshots[i]);

}