OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was...

31
OpenMP Tasking Explained Ruud van der Pas 1 OpenMP Tasking Explained Ruud van der Pas Senior Principal So2ware Engineer SPARC Microelectronics Santa Clara, CA, USA SC’13 Talk at OpenMP Booth Wednesday, November 20, 2013

Transcript of OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was...

Page 1: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

1  

OpenMP  Tasking  Explained  Ruud  van  der  Pas  

Senior  Principal  So2ware  Engineer  SPARC  Microelectronics  

 Santa  Clara,  CA,  USA  SC’13  Talk  at  OpenMP  Booth  

Wednesday,  November  20,  2013  

Page 2: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

2  

What  Was  Missing  ?  

Page 3: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

3  

n Constructs worked well for many cases n But OpenMP’s Big Brother had to see everything

à Loops with a known length at run time

à Finite number of parallel sections

à .... n This didn’t work well with certain common problems

à Linked lists and recursive algorithms being the cases in

point

n Often, a solution was feasible, but ugly at best

Before  OpenMP  3.0  

Page 4: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

4  

Today’s  All  New  Episode  TASKING  

Page 5: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

5  

n Tasking was introduced in OpenMP 3.0 n Until then it was impossible to efficiently and easily

implement certain types of parallelism n The initial functionality was very simple by design

à The idea was (and still) is to augment tasking as we

collectively gain more insight and experience

n Note that tasks can be nested

à But not for the faint of heart

Tasking  in  OpenMP  

Page 6: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

6  

Thread  

Generate  tasks  

Thread  

Thread  

Thread  

Thread  

Execute  tasks  

The  Tasking  Concept  In  OpenMP  

Page 7: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

7  

Who  Does  What  And  When  ?  Developer  Use  a  pragma  to  specify  where  the  tasks  are  

(The  assump2on  is  that  all  tasks  can  be  executed  independently)  

OpenMP  runOme  system  •  When  a  thread  encounters  a  task  construct,  a  new  task  is  generated  

•  The  moment  of  execu2on  of  the  task  is  up  to  the  run2me  system  

•  Execu2on  can  either  be  immediate  or  delayed  •  Comple2on  of  a  task  can  be  enforced  through  task  synchronizaOon  

Page 8: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

8  

!$omp task

#pragma omp task Defines  a  task  

The  Tasking  Construct  

Page 9: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

9  

Task  SynchronizaOon  There  are  two  task  synchronizaOon  constructs  

#pragma  omp  barrier  

#pragma  omp  taskwait  

!$omp barrier

#pragma omp barrier

!$omp taskwait

#pragma omp taskwait

Page 10: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

10  

Task  CompleOon  

!$omp flush taskwait

#pragma omp taskwait

Explicitly  wait  on  the  compleKon  of  child  tasks:  

Page 11: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

11  

Tasking  Explained  By  Ways  Of  One  Example  

Page 12: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

12  

A  Simple  Plan  

Write  a  program  that  prints  either  “A  race  car”  or  “A  car  race”  and  maximize  the  parallelism  

Your  Task  for  Today:  

Page 13: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

13  

Tasking  Example/1  

#include <stdlib.h> #include <stdio.h> int main(int argc, char *argv[]) { printf("A "); printf("race "); printf("car "); printf("\n"); return(0); }

$ cc -fast hello.c $ ./a.out A race car $

What will this program print ?

Page 14: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

14  

#include <stdlib.h> #include <stdio.h> int main(int argc, char *argv[]) { #pragma omp parallel { printf("A "); printf("race "); printf("car "); } // End of parallel region printf("\n"); return(0); } What will this program print

using 2 threads ?

Tasking  Example/2  

Page 15: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

15  

$ cc -xopenmp -fast hello.c $ export OMP_NUM_THREADS=2 $ ./a.out A race car A race car

Note  that  this  program  could  (for  example)  also  print    “A  A  race  race  car  car”  or      “A  race  A  car  race  car”,  or    “A  race  A  race  car  car”,  or  

                           .....  But  I  have  not  observed  this  (yet)  

Tasking  Example/3  

Page 16: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

16  

#include <stdlib.h> #include <stdio.h> int main(int argc, char *argv[]) { #pragma omp parallel { #pragma omp single { printf("A "); printf("race "); printf("car "); } } // End of parallel region printf("\n"); return(0); }

What will this program print using 2 threads ?

Tasking  Example/4  

Page 17: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

17  

$ cc -xopenmp –fast hello.c $ export OMP_NUM_THREADS=2 $ ./a.out A race car

But of course now only 1 thread executes .......

Tasking  Example/5  

Page 18: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

18  

int main(int argc, char *argv[]) { #pragma omp parallel { #pragma omp single { printf(“A “); #pragma omp task {printf("race ");} #pragma omp task {printf("car ");} } } // End of parallel region printf("\n"); return(0); }

What will this program print using 2 threads ?

Tasking  Example/6  

Page 19: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

19  

$ cc -xopenmp -fast hello.c $ export OMP_NUM_THREADS=2 $ ./a.out A race car $ ./a.out A race car $ ./a.out A car race $

Tasks can be executed in arbitrary order

Tasking  Example/7  

Page 20: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

20  

Another  Simple  Plan  

Have  the  sentence  end  with  “is  fun  to  watch”  (hint:  use  a  print  statement)  

You  did  well  and  quickly,  so  here  is  a  final  task  to  do  

Page 21: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

21  

int main(int argc, char *argv[]) { #pragma omp parallel { #pragma omp single { printf(“A “); #pragma omp task {printf("race ");} #pragma omp task {printf("car ");} printf(“is fun to watch “); } } // End of parallel region printf("\n"); return(0); }

What will this program print using 2 threads ?

Tasking  Example/8  

Page 22: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

22  

$ cc -xopenmp -fast hello.c $ export OMP_NUM_THREADS=2 $ ./a.out A is fun to watch race car $ ./a.out A is fun to watch race car $ ./a.out A is fun to watch car race $

Tasks are executed at a task execution point

Tasking  Example/9  

Page 23: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

23  

int main(int argc, char *argv[]) { #pragma omp parallel { #pragma omp single { printf(“A “); #pragma omp task {printf("car ");} #pragma omp task {printf("race ");} #pragma omp taskwait printf(“is fun to watch “); } } // End of parallel region printf("\n");return(0); }

What will this program print using 2 threads ?

Tasking  Example/10  

Page 24: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

24  

$ cc -xopenmp -fast hello.c $ export OMP_NUM_THREADS=2 $ ./a.out $ A car race is fun to watch $ ./a.out A car race is fun to watch $ ./a.out A race car is fun to watch $

Tasks are executed first now

Tasking  Example/11  

Page 25: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

25  

Thank You And ..... Stay Tuned !

[email protected]

Page 26: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

26  

n As the computation progresses, the work performed per task may shrink

à Recursive algorithms are an example

à This is where the final clause may come handy

n The data environment can also grow too much

à This is why the mergeable clause has been added

More  About  Tasking  

Page 27: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

27  

Example  –  Linked  List/1   ........ while(my_pointer) { (void) do_independent_work (my_pointer); my_pointer = my_pointer->next ; } // End of while loop ........

Hard to do before tasking: First count number of iterations, then convert while loop to for loop

Page 28: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

28  

n Walking through the linked list is a serial process

à Scan each entry until the NULL pointer has been hit

n How do we create the tasks then ? n The idea is actually quite simple:

à Use the single construct : one thread generates the tasks

à All other threads execute the tasks as they become

available

Example  –  Linked  List/2  

Page 29: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

29  

Example  –  Linked  List/3  

my_pointer = listhead;!! #pragma omp parallel! {! #pragma omp single! {! while(my_pointer) {! #pragma omp task firstprivate(my_pointer)! {! (void) do_independent_work (my_pointer);! }! my_pointer = my_pointer->next ;! }! } // End of single! } // End of parallel region!

OpenMP Task is specified here (executed in parallel)

Page 30: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

30  

Example  –  Linked  List/4  

my_pointer = listhead;!! #pragma omp parallel! {! #pragma omp single nowait! {! while(my_pointer) {! #pragma omp task firstprivate(my_pointer)! {! (void) do_independent_work (my_pointer);! }! my_pointer = my_pointer->next ;! }! } // End of single - no implied barrier (nowait)! } // End of parallel region - implied barrier!

Can eliminate a barrier

Page 31: OpenMP’Tasking’Explained’ · OpenMP Tasking Explained Ruud van der Pas 5"! Tasking was introduced in OpenMP 3.0 ! Until then it was impossible to efficiently and easily implement

OpenMP Tasking Explained Ruud van der Pas

31  

n The depend clause to support task dependences

à Forces additional constraints on task scheduling

à Expressed through: list item(s) + dependence type

à Dependence types are: in, out and inout

n The taskgroup construct

à Specifies to wait on completion of child tasks and their

descendant tasks

à Note: taskwait only joins direct child tasks

Main  Tasking  Extensions  in  4.0