Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races...

32
Data races in Parallel and High Performance Compu7ng Simone Atzeni [email protected] February 3, 2016

Transcript of Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races...

Page 1: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

DataracesinParallelandHighPerformanceCompu7ng

[email protected]

February3,2016

Page 2: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

Whatisadatarace?

•  Adataraceoccurswhentwoormorethreads:– accessthesamememoryloca/on(i.e.sharedvariable)

– atleastonethreadwritesthememoryloca/on–  theaccessesareconcurrentandunsynchronized

•  Leadstonon-determinis7cbehavior(crashes,programexcep7ons,wrongresults,etc.)

•  Hardtofindwithtradi7onaldebuggingtools

2

Page 3: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

Whatisadatarace?

3

#pragma omp parallel num_threads(N)

sum = sum + 1;

void Thread1() { // Runs in Thread1.

sum = sum + 1; }

void ThreadN() { // Runs in ThreadN.

sum = sum + 1;

}

PThreads

OpenMP

Page 4: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

Whyisfindingdataracesimportant?

4

•  Dataraceareundefinedbehavior:–  Non-determinis7cresults–  Programerrorsorexcep7ons–  Cri7calconsequences

•  Dataracesaffectseveralfields:–  Filesystems–  Security–  Life-cri7calsystems

•  Compilerop7miza7oncanintroduceraces:–  “Benign”races(thereisnosuchthings!)–  Instruc7onreordering

•  Dataracepreven7on:–  Toomuchsynchroniza7onàbadperformanceanddeadlocks

Page 5: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

OpenMPDataRaces:Example1#pragma omp parallel for for (i = 1; i < N; i++) {

a[i] = 2.0 * i * (i - 1);

b[i] = a[i] - a[i - 1];

}

•  Solu7on Includeaccessestoarrayawithinacri7calsec7on.orRecomputethesecondespression.

5

Page 6: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

OpenMPDataRaces:Example1#pragma omp parallel for for (i = 1; i < N; i++) {

#pragma omp critical

{

a[i] = 2.0 * i * (i - 1);

b[i] = a[i] - a[i - 1];

} }

6

Page 7: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

OpenMPDataRaces:Example1#pragma omp parallel for for (i = 1; i < N; i++) {

a[i] = 2.0 * i * (i - 1);

b[i] = a[i] – 2.0 * (i – 1) * (i - 2);

}

7

Page 8: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

OpenMPDataRaces:Example2

•  Solu7on Removethenowaitclausefromfirstfor-loop.

#pragma omp parallel

{ #pragma omp for nowait

for (i = 1; i < N; i++) {

a[i] = 3.0 * i * (i + 1);

}

#pragma omp for for (i = 1; i < N; i++) {

b[i] = a[i] - a[i - 1];

}

}

}

8

Page 9: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

OpenMPDataRaces:Example2#pragma omp parallel

{ #pragma omp for for (i = 1; i < N; i++) {

a[i] = 3.0 * i * (i + 1);

}

#pragma omp for for (i = 1; i < N; i++) {

b[i] = a[i] - a[i - 1];

}

}

}

9

Page 10: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

OpenMPDataRaces:Example3#pragma omp parallel for for (i = 1; i < N; i++) { x = sqrt(b[i]) - 1; a[i] = x * x + 2 * x + 1; } }

•  Solu7on Addprivate(x)inomppragma.

10

Page 11: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

OpenMPDataRaces:Example3#pragma omp parallel for private(x) for (i = 1; i < N; i++) { x = sqrt(b[i]) - 1; a[i] = x * x + 2 * x + 1; } }

•  Solu7on Addprivate(x)inomppragma.

11

Page 12: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

OpenMPDataRaces:Example4#pragma omp parallel for for (i = 0; i < N; i++) { for (j = 0; j < M; j++) { a[i][j] = compute(i,j); } }

•  Solu7on Addprivate(x)inomppragma.

12

Page 13: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

OpenMPDataRaces:Example4#pragma omp parallel for private(j) for (i = 0; i < N; i++) { for (j = 0; j < M; j++) { a[i][j] = compute(i,j); } }

•  Solu7on Addprivate(x)inomppragma.

•  GoodPrac7ceUseclausedefault(none),explicitlysharevariableswithshared(…) 13

Page 14: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

OpenMPDataRaces:Example5#pragma omp parallel for

for (i = 1; i < N; i++) {

sum = sum + a[i]; } }

•  Solu7on Addreduction(+:sum) toomppragma.

14

Page 15: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

OpenMPDataRaces:Example5#pragma omp parallel for reduction(+:sum)

for (i = 1; i < N; i++) {

sum = sum + a[i]; } }

•  Solu7on Addreduction(+:sum) toomppragma.

15

Page 16: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

OpenMPDataRaces:Example6#pragma omp parallel for private(local)

{ #pragma omp master

init = 10; local = init }

•  Solu7onmaster constructdoesnothaveanimpliedbarrierbeferuse single

16

Page 17: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

OpenMPDataRaces:Example6#pragma omp parallel for private(local)

{ #pragma omp single

init = 10; local = init }

•  Solu7onmaster constructdoesnothaveanimpliedbarrierbeferuse single

17

Page 18: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

“Benign”dataracesinOpenMPint num_threads; #pragma omp parallel { num_threads = omp_get_num_threads(); }

•  ThisisundefinedbehaviorinC/C++.

•  Compilertransforma7onsonracycodemighttransforma“benignrace”inaverybadbehavior.

18

Page 19: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

“Benign”dataracesinOpenMP#pragma omp parallel {

foo = …; // Store a pointer to function.

num_threads = foo; // Spill from register into the stop variable.

… foo = num_threads; // Restore from the stop variable into a register.

call(foo) ...; // Call function pointed by foo.

num_threads = omp_get_num_threads();

}

•  Acompilertransforma7oncouldspillsometemp(i.e.func7onpointer)valueintonum_threads:–  Onethreadspillspointertowrite_file()func7on.–  Anotherthreadspillspointertolaunch_nuclear_missile() func7on.

•  Thefirstthreadrestoresitspointer,itwillgetthewrongpointertolaunch_nuclear_missile().

•  The“benign”dataracejustleadtoanaccidentalmissilelaunch.

“Benign”racesarebad,findandfixthemasanyothertypeofdatarace!1

1SeeBenigndataraces:whatcouldpossiblygowrong?byDmitryVyukov.19

Page 20: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

Real-WorldexampleatLawrenceLivermoreNa/onalLaboratory–  HYDRA–Largemul7physicsMPI/OpenMPapplica7on.–  Por7ngononeofthelargestsupercomputerintheworldSequoia.–  Non-determinis7ccrashesonathreadedversionofHyprelibrary.–  Abovecertainop7miza7onlevelsandcertainscales(8KMPIprocesses).

ArcherfindsdataracesonHYDRA(Hyprelibrary):–  ThreedataracesfoundinsidefairlycomplexOpenMPregion.–  Theracesare“benign”(samevaluewrifenmul7ple7mesinthesamememory

loca7on).–  Crashmanifestsonlyat“>O0”op7miza7onlevel.–  Thecompiler(IBMXL),assumingrace-freecodeforop7miza7ons,transforms

“benign”racesinharmfulraces.

“Benign”dataracesinOpenMP

20

Page 21: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

DataRaceDetec7onTechiniques

•  Sta7cAnalysis–  Reasonsaboutallinputs/interleavings–  Veryimprecise,manyfalseposi7vesandmissraces–  Veryscalableandfast(i.e.norun7meoverhead)

•  DynamicAnalysis–  Veryprecise,nofalseposi7ves–  Reportsracesonlyinbranchesoftheprogramsthatareactuallyexecuted

–  Veryhighrun7meandmemoryoverhead

21

Page 22: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

DataRaceDetec7onTools

•  NotmanyOpenMPracedetectorsoutthere!•  Commercialtools:–  IntelSta7cSecureAnalysis(sta7canalysis)–  IntelInspectorXE(dynamicanalysis)

– SunStudioData-RaceDetec7onTool(dynamicanalysis)

•  Open-sourcetools:– ThreadSani7zer(onlyPThreadsprograms,dynamicanalysis)

– Archer,basedonClang/LLVMandThreadSani7zer(sta7canddynamicanalysis)

22

Page 23: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

ThreadSani7zer:dataracedetector

•  Errorcheckingtoolfor– Memoryerrors–  Threadingerrors(Pthreads)

•  BasedonLLVM/Clang•  Run7meanalysis•  AvailableforLinux,WindowsandMac•  SupportsC,C++,andFortran•  Moreinfo:hfps://github.com/google/sani7zers

23

Page 24: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

ThreadSani7zer:UsageandLimita7ons•  Compiletheprogramwiththefollowingcommand:

–  clang –g –fsanitize=thread myprog.c –o myprog •  Run7mecheck

–  Errordetec7ononlyinsopwarebranchesthatareexecuted•  Lowrun7meoverhead

–  Roughly2x-20x–  DetectracesonlyinPThreadsapplica7ons–  Nofalseposi7ves

•  Compilerinstrumenta7on–  Slowercompila7onprocess(applydifferentpassesonthesourcecode

toiden7fyracefreeregionsofcode,instrumentsonlytherest),fasterandmorepreciseatrun7me

24

Page 25: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

5intvar;67voidThread1(){8var++;9}1011voidThread2(){12var++;13}

WARNING:ThreadSani7zer:dataraceWriteofsize4at0x0000014b2e90bythreadT2:#0Thread2race1.c:12:6Previouswriteofsize4at0x0000014b2e90bythreadT1:#0Thread1race1.c:8:6Loca7onisglobal'var'ofsize4

ThreadSani7zer:ResultSummary

25

Page 26: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

Archer:FeaturesandLimita7ons•  Sta7cAnalysis

–  OnlyforOpenMPprograms–  Excluderacefreeregionsandsequen7al

codefromrun7meanalysistoreduceoverhead•  Run7mecheck

–  Errordetec7ononlyinsopwarebranchesthatareexecuted•  Lowrun7meoverhead

–  Roughly2x-20x–  DetectracesinlargeOpenMPapplica7ons–  Nofalseposi7ves

•  Compilerinstrumenta7on–  Slowercompila7onprocess(applydifferentpassesonthesourcecode

toiden7fyracefreeregionsofcode,instrumentsonlytherest),fasterandmorepreciseatrun7me

26

Page 27: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

Archer:Usage–  Compiletheprogramwiththe–gcompilerflag

•  clang-archer myprog.c –o myprog –  RuntheprogramundercontrolofArcherRun7me

•  export OMP_NUM_THREADS=... ./myprog

•  Detectsproblemsonlyinsopwarebranchesthatareexecuted

–  Understandandcorrectthethreadingerrorsdetected–  Editthesourcecode–  Repeatun7lnoerrorsreported

27

Page 28: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

6#pragmaompparallel7{8for(inti=0;i<100;i++)9 a[i]=a[i]*a[i];10  }1112#pragmaompparallel13{14if(a<100){15#pragmaompcri7cal16a++;17}18}

Excludedfromthemrun7mecheckbecauseitisracefree.WARNING:ThreadSani7zer:dataraceReadofsize4at0x7fffffffdcdcbythreadT2:#0.omp_outlined.race.c:14(race+0x0000004a6dce)#1__kmp_invoke_microtask<null>(libomp_tsan.so)

Previouswriteofsize4at0x7fffffffdcdcbymainthread:#0.omp_outlined.race.c:16(race+0x0000004a6e2c)#1__kmp_invoke_microtask<null>(libomp_tsan.so)

Archer:ResultSummary

28

Page 29: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

Archer:OpenMPdataracedetector

•  Errorcheckingtoolfor– Memoryerrors–  Threadingerrors(OpenMP,Pthreads)

•  BasedonLLVM/ClangandThreadSani7zer•  Sta7candRun7meanalysis•  AvailableforLinux,WindowsandMac•  SupportsC,C++•  Moreinfo:hfps://github.com/PRUNER/archer

29

Page 30: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

Archer:FeaturesandLimita7ons•  Sta7cAnalysis

–  OnlyforOpenMPprograms–  Excluderacefreeregionsandsequen7al

codefromrun7meanalysistoreduceoverhead•  Run7mecheck

–  Errordetec7ononlyinsopwarebranchesthatareexecuted•  Lowrun7meoverhead

–  Roughly2x-20x–  DetectracesinlargeOpenMPapplica7ons–  Nofalseposi7ves

•  Compilerinstrumenta7on–  Slowercompila7onprocess(applydifferentpassesonthesourcecode

toiden7fyracefreeregionsofcode,instrumentsonlytherest),fasterandmorepreciseatrun7me

30

Page 31: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

Archer:Usage–  Compiletheprogramwiththe–gcompilerflag

•  clang-archer myprog.c –o myprog –  RuntheprogramundercontrolofArcherRun7me

•  export OMP_NUM_THREADS=... ./myprog

•  Detectsproblemsonlyinsopwarebranchesthatareexecuted

–  Understandandcorrectthethreadingerrorsdetected–  Editthesourcecode–  Repeatun7lnoerrorsreported

31

Page 32: Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races in Parallel and High Performance Compu7ng Simone Atzeni simone@cs.utah.edu February

6#pragmaompparallel7{8for(inti=0;i<100;i++)9 a[i]=a[i]*a[i];10  }1112#pragmaompparallel13{14if(a<100){15#pragmaompcri7cal16a++;17}18}

Excludedfromthemrun7mecheckbecauseitisracefree.WARNING:ThreadSani7zer:dataraceReadofsize4at0x7fffffffdcdcbythreadT2:#0.omp_outlined.race.c:14(race+0x0000004a6dce)#1__kmp_invoke_microtask<null>(libomp_tsan.so)

Previouswriteofsize4at0x7fffffffdcdcbymainthread:#0.omp_outlined.race.c:16(race+0x0000004a6e2c)#1__kmp_invoke_microtask<null>(libomp_tsan.so)

Archer:ResultSummary

32