Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races...
Transcript of Data races in Parallel and High Performance Compu7nghari/teaching/paralg/simone_archer.pdfData races...
Whatisadatarace?
• Adataraceoccurswhentwoormorethreads:– accessthesamememoryloca/on(i.e.sharedvariable)
– atleastonethreadwritesthememoryloca/on– theaccessesareconcurrentandunsynchronized
• Leadstonon-determinis7cbehavior(crashes,programexcep7ons,wrongresults,etc.)
• Hardtofindwithtradi7onaldebuggingtools
2
Whatisadatarace?
3
#pragma omp parallel num_threads(N)
sum = sum + 1;
void Thread1() { // Runs in Thread1.
sum = sum + 1; }
⋮
void ThreadN() { // Runs in ThreadN.
sum = sum + 1;
}
PThreads
OpenMP
Whyisfindingdataracesimportant?
4
• Dataraceareundefinedbehavior:– Non-determinis7cresults– Programerrorsorexcep7ons– Cri7calconsequences
• Dataracesaffectseveralfields:– Filesystems– Security– Life-cri7calsystems
• Compilerop7miza7oncanintroduceraces:– “Benign”races(thereisnosuchthings!)– Instruc7onreordering
• Dataracepreven7on:– Toomuchsynchroniza7onàbadperformanceanddeadlocks
OpenMPDataRaces:Example1#pragma omp parallel for for (i = 1; i < N; i++) {
a[i] = 2.0 * i * (i - 1);
b[i] = a[i] - a[i - 1];
}
• Solu7on Includeaccessestoarrayawithinacri7calsec7on.orRecomputethesecondespression.
5
OpenMPDataRaces:Example1#pragma omp parallel for for (i = 1; i < N; i++) {
#pragma omp critical
{
a[i] = 2.0 * i * (i - 1);
b[i] = a[i] - a[i - 1];
} }
6
OpenMPDataRaces:Example1#pragma omp parallel for for (i = 1; i < N; i++) {
a[i] = 2.0 * i * (i - 1);
b[i] = a[i] – 2.0 * (i – 1) * (i - 2);
}
7
OpenMPDataRaces:Example2
• Solu7on Removethenowaitclausefromfirstfor-loop.
#pragma omp parallel
{ #pragma omp for nowait
for (i = 1; i < N; i++) {
a[i] = 3.0 * i * (i + 1);
}
#pragma omp for for (i = 1; i < N; i++) {
b[i] = a[i] - a[i - 1];
}
}
}
8
OpenMPDataRaces:Example2#pragma omp parallel
{ #pragma omp for for (i = 1; i < N; i++) {
a[i] = 3.0 * i * (i + 1);
}
#pragma omp for for (i = 1; i < N; i++) {
b[i] = a[i] - a[i - 1];
}
}
}
9
OpenMPDataRaces:Example3#pragma omp parallel for for (i = 1; i < N; i++) { x = sqrt(b[i]) - 1; a[i] = x * x + 2 * x + 1; } }
• Solu7on Addprivate(x)inomppragma.
10
OpenMPDataRaces:Example3#pragma omp parallel for private(x) for (i = 1; i < N; i++) { x = sqrt(b[i]) - 1; a[i] = x * x + 2 * x + 1; } }
• Solu7on Addprivate(x)inomppragma.
11
OpenMPDataRaces:Example4#pragma omp parallel for for (i = 0; i < N; i++) { for (j = 0; j < M; j++) { a[i][j] = compute(i,j); } }
• Solu7on Addprivate(x)inomppragma.
12
OpenMPDataRaces:Example4#pragma omp parallel for private(j) for (i = 0; i < N; i++) { for (j = 0; j < M; j++) { a[i][j] = compute(i,j); } }
• Solu7on Addprivate(x)inomppragma.
• GoodPrac7ceUseclausedefault(none),explicitlysharevariableswithshared(…) 13
OpenMPDataRaces:Example5#pragma omp parallel for
for (i = 1; i < N; i++) {
sum = sum + a[i]; } }
• Solu7on Addreduction(+:sum) toomppragma.
14
OpenMPDataRaces:Example5#pragma omp parallel for reduction(+:sum)
for (i = 1; i < N; i++) {
sum = sum + a[i]; } }
• Solu7on Addreduction(+:sum) toomppragma.
15
OpenMPDataRaces:Example6#pragma omp parallel for private(local)
{ #pragma omp master
init = 10; local = init }
• Solu7onmaster constructdoesnothaveanimpliedbarrierbeferuse single
16
OpenMPDataRaces:Example6#pragma omp parallel for private(local)
{ #pragma omp single
init = 10; local = init }
• Solu7onmaster constructdoesnothaveanimpliedbarrierbeferuse single
17
“Benign”dataracesinOpenMPint num_threads; #pragma omp parallel { num_threads = omp_get_num_threads(); }
• ThisisundefinedbehaviorinC/C++.
• Compilertransforma7onsonracycodemighttransforma“benignrace”inaverybadbehavior.
18
“Benign”dataracesinOpenMP#pragma omp parallel {
foo = …; // Store a pointer to function.
num_threads = foo; // Spill from register into the stop variable.
… foo = num_threads; // Restore from the stop variable into a register.
call(foo) ...; // Call function pointed by foo.
num_threads = omp_get_num_threads();
}
• Acompilertransforma7oncouldspillsometemp(i.e.func7onpointer)valueintonum_threads:– Onethreadspillspointertowrite_file()func7on.– Anotherthreadspillspointertolaunch_nuclear_missile() func7on.
• Thefirstthreadrestoresitspointer,itwillgetthewrongpointertolaunch_nuclear_missile().
• The“benign”dataracejustleadtoanaccidentalmissilelaunch.
“Benign”racesarebad,findandfixthemasanyothertypeofdatarace!1
1SeeBenigndataraces:whatcouldpossiblygowrong?byDmitryVyukov.19
Real-WorldexampleatLawrenceLivermoreNa/onalLaboratory– HYDRA–Largemul7physicsMPI/OpenMPapplica7on.– Por7ngononeofthelargestsupercomputerintheworldSequoia.– Non-determinis7ccrashesonathreadedversionofHyprelibrary.– Abovecertainop7miza7onlevelsandcertainscales(8KMPIprocesses).
ArcherfindsdataracesonHYDRA(Hyprelibrary):– ThreedataracesfoundinsidefairlycomplexOpenMPregion.– Theracesare“benign”(samevaluewrifenmul7ple7mesinthesamememory
loca7on).– Crashmanifestsonlyat“>O0”op7miza7onlevel.– Thecompiler(IBMXL),assumingrace-freecodeforop7miza7ons,transforms
“benign”racesinharmfulraces.
“Benign”dataracesinOpenMP
20
DataRaceDetec7onTechiniques
• Sta7cAnalysis– Reasonsaboutallinputs/interleavings– Veryimprecise,manyfalseposi7vesandmissraces– Veryscalableandfast(i.e.norun7meoverhead)
• DynamicAnalysis– Veryprecise,nofalseposi7ves– Reportsracesonlyinbranchesoftheprogramsthatareactuallyexecuted
– Veryhighrun7meandmemoryoverhead
21
DataRaceDetec7onTools
• NotmanyOpenMPracedetectorsoutthere!• Commercialtools:– IntelSta7cSecureAnalysis(sta7canalysis)– IntelInspectorXE(dynamicanalysis)
– SunStudioData-RaceDetec7onTool(dynamicanalysis)
• Open-sourcetools:– ThreadSani7zer(onlyPThreadsprograms,dynamicanalysis)
– Archer,basedonClang/LLVMandThreadSani7zer(sta7canddynamicanalysis)
22
ThreadSani7zer:dataracedetector
• Errorcheckingtoolfor– Memoryerrors– Threadingerrors(Pthreads)
• BasedonLLVM/Clang• Run7meanalysis• AvailableforLinux,WindowsandMac• SupportsC,C++,andFortran• Moreinfo:hfps://github.com/google/sani7zers
23
ThreadSani7zer:UsageandLimita7ons• Compiletheprogramwiththefollowingcommand:
– clang –g –fsanitize=thread myprog.c –o myprog • Run7mecheck
– Errordetec7ononlyinsopwarebranchesthatareexecuted• Lowrun7meoverhead
– Roughly2x-20x– DetectracesonlyinPThreadsapplica7ons– Nofalseposi7ves
• Compilerinstrumenta7on– Slowercompila7onprocess(applydifferentpassesonthesourcecode
toiden7fyracefreeregionsofcode,instrumentsonlytherest),fasterandmorepreciseatrun7me
24
5intvar;67voidThread1(){8var++;9}1011voidThread2(){12var++;13}
WARNING:ThreadSani7zer:dataraceWriteofsize4at0x0000014b2e90bythreadT2:#0Thread2race1.c:12:6Previouswriteofsize4at0x0000014b2e90bythreadT1:#0Thread1race1.c:8:6Loca7onisglobal'var'ofsize4
ThreadSani7zer:ResultSummary
25
Archer:FeaturesandLimita7ons• Sta7cAnalysis
– OnlyforOpenMPprograms– Excluderacefreeregionsandsequen7al
codefromrun7meanalysistoreduceoverhead• Run7mecheck
– Errordetec7ononlyinsopwarebranchesthatareexecuted• Lowrun7meoverhead
– Roughly2x-20x– DetectracesinlargeOpenMPapplica7ons– Nofalseposi7ves
• Compilerinstrumenta7on– Slowercompila7onprocess(applydifferentpassesonthesourcecode
toiden7fyracefreeregionsofcode,instrumentsonlytherest),fasterandmorepreciseatrun7me
26
Archer:Usage– Compiletheprogramwiththe–gcompilerflag
• clang-archer myprog.c –o myprog – RuntheprogramundercontrolofArcherRun7me
• export OMP_NUM_THREADS=... ./myprog
• Detectsproblemsonlyinsopwarebranchesthatareexecuted
– Understandandcorrectthethreadingerrorsdetected– Editthesourcecode– Repeatun7lnoerrorsreported
27
6#pragmaompparallel7{8for(inti=0;i<100;i++)9 a[i]=a[i]*a[i];10 }1112#pragmaompparallel13{14if(a<100){15#pragmaompcri7cal16a++;17}18}
Excludedfromthemrun7mecheckbecauseitisracefree.WARNING:ThreadSani7zer:dataraceReadofsize4at0x7fffffffdcdcbythreadT2:#0.omp_outlined.race.c:14(race+0x0000004a6dce)#1__kmp_invoke_microtask<null>(libomp_tsan.so)
Previouswriteofsize4at0x7fffffffdcdcbymainthread:#0.omp_outlined.race.c:16(race+0x0000004a6e2c)#1__kmp_invoke_microtask<null>(libomp_tsan.so)
Archer:ResultSummary
28
Archer:OpenMPdataracedetector
• Errorcheckingtoolfor– Memoryerrors– Threadingerrors(OpenMP,Pthreads)
• BasedonLLVM/ClangandThreadSani7zer• Sta7candRun7meanalysis• AvailableforLinux,WindowsandMac• SupportsC,C++• Moreinfo:hfps://github.com/PRUNER/archer
29
Archer:FeaturesandLimita7ons• Sta7cAnalysis
– OnlyforOpenMPprograms– Excluderacefreeregionsandsequen7al
codefromrun7meanalysistoreduceoverhead• Run7mecheck
– Errordetec7ononlyinsopwarebranchesthatareexecuted• Lowrun7meoverhead
– Roughly2x-20x– DetectracesinlargeOpenMPapplica7ons– Nofalseposi7ves
• Compilerinstrumenta7on– Slowercompila7onprocess(applydifferentpassesonthesourcecode
toiden7fyracefreeregionsofcode,instrumentsonlytherest),fasterandmorepreciseatrun7me
30
Archer:Usage– Compiletheprogramwiththe–gcompilerflag
• clang-archer myprog.c –o myprog – RuntheprogramundercontrolofArcherRun7me
• export OMP_NUM_THREADS=... ./myprog
• Detectsproblemsonlyinsopwarebranchesthatareexecuted
– Understandandcorrectthethreadingerrorsdetected– Editthesourcecode– Repeatun7lnoerrorsreported
31
6#pragmaompparallel7{8for(inti=0;i<100;i++)9 a[i]=a[i]*a[i];10 }1112#pragmaompparallel13{14if(a<100){15#pragmaompcri7cal16a++;17}18}
Excludedfromthemrun7mecheckbecauseitisracefree.WARNING:ThreadSani7zer:dataraceReadofsize4at0x7fffffffdcdcbythreadT2:#0.omp_outlined.race.c:14(race+0x0000004a6dce)#1__kmp_invoke_microtask<null>(libomp_tsan.so)
Previouswriteofsize4at0x7fffffffdcdcbymainthread:#0.omp_outlined.race.c:16(race+0x0000004a6e2c)#1__kmp_invoke_microtask<null>(libomp_tsan.so)
Archer:ResultSummary
32