Sorting Things Out - OpenMP

36
OpenMP Booth – Sorting Things Out With Tasks Ruud van der Pas 1 Sorting Things Out Ruud van der Pas Distinguished Engineer SPARC Microelectronics Santa Clara, CA, USA SC’16 Talk at OpenMP Booth Tuesday, November 15, 2016 (with tasks)

Transcript of Sorting Things Out - OpenMP

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas1

SortingThingsOut

RuudvanderPasDistinguished EngineerSPARCMicroelectronics

SantaClara,CA,USASC’16Talkat OpenMPBoothTuesday,November15,2016

(withtasks)

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas2

TheDarkAgesOfOpenMPBigBrotherHadToKnowEverything

Andinadvance,(right)beforeexecutionForexample,thelooplength,numberof

parallelsections,etcGetshardwithmoredynamicproblemslikeprocessinglinkedlists,divideandconquer,

recursionAsolutionwasugly.Atbest

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas3

TaskingComesToTheRescue!

Andwewillshowyouhowitallworks

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas4

BUT!

Noformalterminology,definitions,etc

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas5

Ataskisachunkofindependentwork

Youguaranteedifferenttaskscanbeexecutedsimultaneously#pragmaomp task{“thisismytask”}

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas6

Theruntimesystemdecidesontheschedulingofthetasks

Atcertainpoints(implicitandexplicit),tasksareguaranteedtobecompleted

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas7

Forthosewholovetostudythefineprint,thefollowingadvice:

RTFM!Andthisiswhatitlookslike:

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas8

RTFM!

RecognizeTheFabulousMasters

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas9

Thread

Generatetasks

Thread

Thread

Thread

Thread

Executetasks

TheTaskingConceptInOpenMP

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas10

WhoDoesWhatAndWhen?You

Useapragmatospecifywherethetasksare(Theassumptionisthatalltaskscanbeexecutedindependently)

• Whenathreadencountersataskconstruct,anewtaskisgenerated

• Themomentofexecutionofthetaskisuptotheruntimesystem

• Executioncaneitherbeimmediateordelayed• Completionofataskcanbeenforcedthroughtasksynchronization

TheOpenMPruntimesystem

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas11

TaskingExplainedByWaysOfOneExample

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas12

ASimplePlan

Writeaprogramthatprintseither“Aracecar”or“Acarrace”andmaximizetheparallelism

YourTaskforToday:

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas13

TaskingExample/1

#include <stdlib.h>#include <stdio.h>

int main(int argc, char *argv[]) {

printf("A ");printf("race ");printf("car ");

printf("\n");return(0);

}

$ cc -fast hello.c$ ./a.outA race car$

What will this program print ?

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas14

#include <stdlib.h>#include <stdio.h>

int main(int argc, char *argv[]) {

#pragma omp parallel{

printf("A ");printf("race ");printf("car ");

} // End of parallel region

printf("\n");return(0);

} What will this program print using 2 threads ?

TaskingExample/2

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas15

$ cc -xopenmp -fast hello.c$ export OMP_NUM_THREADS=2$ ./a.outA race car A race car

Notethatthisprogramcould(forexample)alsoprint“AAraceracecarcar”or“AraceAcarracecar”,or“AraceAracecarcar”,or

.....ButIhavenotobservedthis(yet)

TaskingExample/3

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas16

#include <stdlib.h>#include <stdio.h>

int main(int argc, char *argv[]) {

#pragma omp parallel{

#pragma omp single{

printf("A ");printf("race ");printf("car ");

}} // End of parallel region

printf("\n");return(0);

}

What will this program print using 2 threads ?

TaskingExample/4

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas17

$ cc -xopenmp –fast hello.c$ export OMP_NUM_THREADS=2$ ./a.outA race car

But of course now only 1 thread executes .......

TaskingExample/5

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas18

int main(int argc, char *argv[]) {

#pragma omp parallel{

#pragma omp single{

printf(“A “);#pragma omp task{printf("race ");}

#pragma omp task{printf("car ");}

}} // End of parallel region

printf("\n");return(0);

}

What will this program print using 2 threads ?

TaskingExample/6

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas19

$ cc -xopenmp -fast hello.c$ export OMP_NUM_THREADS=2$ ./a.outA race car$ ./a.outA race car$ ./a.outA car race$

Tasks can be executed in arbitrary order

TaskingExample/7

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas20

AnotherSimplePlan

Havethesentenceendwith“isfuntowatch”(hint:useaprintstatement)

Youdidwellandquickly,sohereisafinaltasktodo

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas21

int main(int argc, char *argv[]) {

#pragma omp parallel{

#pragma omp single{

printf(“A “);#pragma omp task{printf("race ");}

#pragma omp task{printf("car ");}

printf(“is fun to watch “);}

} // End of parallel region

printf("\n");return(0);

}

What will this program print using 2 threads ?

TaskingExample/8

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas22

$ cc -xopenmp -fast hello.c$ export OMP_NUM_THREADS=2$ ./a.out

A is fun to watch race car$ ./a.out

A is fun to watch race car$ ./a.out

A is fun to watch car race$

Tasks are executed at a task execution point

TaskingExample/9

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas23

int main(int argc, char *argv[]) {

#pragma omp parallel{

#pragma omp single{

printf(“A “);#pragma omp task

{printf("car ");}#pragma omp task

{printf("race ");}#pragma omp taskwaitprintf(“is fun to watch “);

}} // End of parallel region

printf("\n");return(0);}

What will this program print using 2 threads ?

TaskingExample/10

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas24

$ cc -xopenmp -fast hello.c$ export OMP_NUM_THREADS=2$ ./a.out$ A car race is fun to watch $ ./a.outA car race is fun to watch$ ./a.outA race car is fun to watch$

Tasks are executed first now

TaskingExample/11

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas25

SortingThingsOut

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas26

TheQuicksortAlgorithmACommonlyUsedAlgorithmUsedForSorting

UsesadivideandconquerstrategyMainsteps:

Splitthearraythroughapivot,suchthat

Allelementstotheleftaresmaller

Allelementstotherightareequal,orgreater

Repeatforleftandrightpartuntildone

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas27

ASimpleExample/1

8 5 7 3 9 initialvalues

8 5 7 3 9 choosepivot, keepindex

8 5 7 3 9 swappivotandlastelement

8 5 9 3 7 scanarray,swapifsmaller

8 5 9 3 7 5<7 =>movetoposition0

5 8 9 3 7 3<7 =>movetoposition1

5 3 9 8 7 continue,but nothing found

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas28

ASimpleExample/2

5 3 9 8 7 restorepivot

5 3 7 8 9 pivotisinfinalposition

5 3

7

8 9

repeatforleftbranch

repeatforrightbranch

OpenMPtask OpenMPtask

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas29

TheRecursiveSequentialCode1 void Quicksort(int64_t *a, int64_t lo, int64_t hi)2 {3 if ( lo < hi ) {4 int64_t p = partitionArray(a, lo, hi);56 (void) Quicksort(a, lo, p - 1); // Left branch78 (void) Quicksort(a, p + 1, hi); // Right branch9 }

10 }

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas30

AndNowWithTasks1 void Quicksort(int64_t *a, int64_t lo, int64_t hi)2 {3 if ( lo < hi ) {4 int64_t p = partitionArray(a, lo, hi);56 #pragma omp task shared(a) firstprivate(lo,p)7 {(void) Quicksort(a, lo, p - 1);} // Left branch89 #pragma omp task shared(a) firstprivate(hi,p)

10 {(void) Quicksort(a, p + 1, hi);} // Right branch1112 #pragma omp taskwait13 }12 }

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas31

IncludingTheDriverPart1 #pragma omp parallel default(none) shared(a,nelements)2 {3 #pragma omp single nowait4 { (void) Quicksort(a, 0, nelements-1); }5 } // End of parallel region

1 void Quicksort(int64_t *a, int64_t lo, int64_t hi)2 {3 if ( lo < hi ) {4 int64_t p = partitionArray(a, lo, hi);56 #pragma omp task default(none) firstprivate(a,lo,p)7 {(void) Quicksort(a, lo, p - 1);} // Left branch89 #pragma omp task default(none) firstprivate(a,hi,p)

10 {(void) Quicksort(a, p + 1, hi);} // Right branch11 }12 }

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas32

FineTuningTheAlgorithmWhenthearraysectiongetstoosmall,itisbettertoswitchtothesequentialalgorithmMayalsoconsidertheuseoftheif-clauseplus

themergeable andfinalclauses

Someexperimentationisrecommended;-)

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas33

APerformanceExample*

30.4

15.0

7.64.1

2.1 1.4 0.9 0.7051015202530354045

0

5

10

15

20

25

30

1 2 4 8 16 32 64 128

Speedup

oversinglethread

Elap

sed>m

e(secon

ds)

NumberofOpenMPthreads

PerformanceoftheOpenMPquicksortalgorithm(40Melements)

Elapsed>me(s) Speedup

*) SPARC M7-8 server @ 4.1 GHz

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas34

SummaryBigBrotherDoesNotNeedToKnowEverything

Forcertaintypesofalgorithms

Taskingisideallysuitable

Optimalperformancemayrequiresomefinetuning

But.......Remember:

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas35

RTFM!

RecognizeTheFabulousMasters

OpenMPBooth– Sorting ThingsOutWith TasksRuudvanderPas36

Thank You And ..... Stay Tuned [email protected]