Advanced OpenMP Constructs, Tuning, and Tools at NERSC
Ahana Roy Choudhury¹, Yun (Helen) He², and Alice Koniges²
¹Computer & Information Sciences Department, University of Alabama at Birmingham
²NERSC, Lawrence Berkeley National Laboratory, Berkeley, CA
Questions and Challenges
• Which part of the code can be safely parallelized?
• How to detect and avoid data races?
• How to detect and avoid memory leaks?
• How can tools be used to get suggestions on variable scope and OpenMP compiler directives?
• How to detect top time-consuming loops?
• How to detect and remove false sharing?
• Ensuring desirable process and thread affinity
• Using Nested OpenMP
• False sharing occurs when threads on different processors attempt to modify variables that reside on the same cache line.
• It causes performance degradation due to coherence issues and should be avoided.
• Intel VTune Amplifier is a performance analysis tool that helps to analyze the algorithm and identify where and how applications can benefit from available hardware resources.
• To detect false sharing using VTune Amplifier, we wrote a code that causes false sharing, detected the problem and then resolved it using padding. We also noted that some compiler optimization choices remove false sharing.
Figure panels: Before False Sharing Removal, After False Sharing Removal
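The padding fix described above can be sketched in C. This is a minimal illustration and not the code used on the poster: the 64-byte cache-line size, the `MAX_THREADS` bound, and the function names `count_even_unpadded` and `count_even_padded` are all assumptions made for the example.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback so the sketch also builds without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
#endif

#define CACHE_LINE 64   /* assumed cache-line size in bytes */
#define MAX_THREADS 64  /* assumed upper bound on the team size */

/* One counter per cache line: the padding keeps neighbours apart. */
typedef struct {
    long value;
    char pad[CACHE_LINE - sizeof(long)];
} padded_count;

/* Prone to false sharing: adjacent per-thread counters share a line. */
long count_even_unpadded(const int *data, long n, int nthreads)
{
    assert(nthreads >= 1 && nthreads <= MAX_THREADS);
    long counts[MAX_THREADS] = {0};
    #pragma omp parallel num_threads(nthreads)
    {
        int tid = omp_get_thread_num();
        #pragma omp for
        for (long i = 0; i < n; i++)
            if (data[i] % 2 == 0)
                counts[tid]++;        /* neighbouring slots, same line */
    }
    long total = 0;
    for (int t = 0; t < nthreads; t++)
        total += counts[t];
    return total;
}

/* Same computation, each counter padded onto its own cache line. */
long count_even_padded(const int *data, long n, int nthreads)
{
    assert(nthreads >= 1 && nthreads <= MAX_THREADS);
    padded_count counts[MAX_THREADS];
    for (int t = 0; t < MAX_THREADS; t++)
        counts[t].value = 0;
    #pragma omp parallel num_threads(nthreads)
    {
        int tid = omp_get_thread_num();
        #pragma omp for
        for (long i = 0; i < n; i++)
            if (data[i] % 2 == 0)
                counts[tid].value++;  /* slots are CACHE_LINE bytes apart */
    }
    long total = 0;
    for (int t = 0; t < nthreads; t++)
        total += counts[t].value;
    return total;
}
```

Both functions return the same count; only the memory layout of the per-thread counters differs. With optimization enabled, a compiler may keep the counter in a register and hide the effect, which is consistent with the observation that some compiler optimization choices remove false sharing.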
Abstract
NERSC’s next generation supercomputer systems, e.g., Cori Phase 2 with KNL (Intel Knights Landing) architecture, have a large number of cores per node, so it is important to consider advanced OpenMP concepts in order to achieve optimal performance on such systems. In this project, we have written and implemented code snippets and used them to test a variety of tools for improving OpenMP performance and detailed their usage on various NERSC systems including KNL. We have explored the detection of issues such as false sharing and data races using tools and how they can be resolved. We have also explored advanced OpenMP concepts such as process and thread affinity and nested OpenMP.
Process and Thread Affinity
• Thread affinity binds each process or thread to run on a specific subset of processors, to take advantage of memory locality.
• Improper process/thread affinity could slow down code performance significantly.
Figure: Using the OMP_PROC_BIND environment variable
Conclusions
In order to achieve good performance in OpenMP codes, it is important to use advanced OpenMP concepts like process and thread affinity and nested OpenMP. It is equally crucial to avoid false sharing and over-subscription. We have explored how tools can be used to facilitate the process of tuning OpenMP codes as well as worked on thread and process affinity and nested OpenMP. Future work involves using nested OpenMP to speed up full applications.
Improving Performance of Sample Codes using Tools
Identify Top Time Consuming Loops using Survey Report
Insert Annotations for Parallel Regions and Recompile Code
Based on the Suitability Report, Modify the Code by Parallelizing the Loop
Use Suitability Report to Predict the Speed-up of the Application based on Annotations
Check Dependencies to Remove Data Sharing Problems
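Once a dependency check has flagged the shared variables in a hot loop, the parallelized loop might look like the following sketch. It is not taken from the poster's sample codes; the function name `sum_of_squares` and the temporary `tmp` are hypothetical, chosen to show one scratch variable that must be private and one accumulator that must be a reduction.

```c
#include <assert.h>

/* Returns x[0]^2 + ... + x[n-1]^2. */
double sum_of_squares(const double *x, long n)
{
    assert(n >= 0);
    double sum = 0.0;   /* accumulated by all threads: needs a reduction */
    double tmp;         /* per-iteration scratch value: must be private */

    /* Without private(tmp) and reduction(+:sum), threads would share
     * both variables and race on them. */
    #pragma omp parallel for private(tmp) reduction(+:sum)
    for (long i = 0; i < n; i++) {
        tmp = x[i] * x[i];
        sum += tmp;
    }
    return sum;
}
```

The pragma is ignored when the file is compiled without OpenMP support, so the same source serves as the serial reference when comparing measured speed-up against a tool's prediction.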
Nested OpenMP
Fork and Join Model
When to use Nested OpenMP
Thread Affinity for Nested OpenMP
OMP_NUM_THREADS=4,3
OMP_PROC_BIND=spread,close: Level 2 threads in the same team bind to threads on the same core.
OMP_PROC_BIND=spread,spread: Level 2 threads in the same team are evenly spread on threads on different cores.
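A two-level nested region with the spread,close binding can be sketched as follows. This is an illustrative example under stated assumptions, not the poster's test code: the function `row_sums_nested` and its row-major matrix layout are hypothetical, and the `proc_bind` clauses are used here as an in-source counterpart to setting OMP_PROC_BIND=spread,close in the environment.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sums each row of a rows x cols row-major matrix into out[r].
 * Level 1 parallelizes over rows, level 2 over the columns of a row. */
void row_sums_nested(const double *a, int rows, int cols, double *out)
{
    assert(rows >= 0 && cols >= 0);
#ifdef _OPENMP
    omp_set_max_active_levels(2);   /* allow two active parallel levels */
#endif
    /* Level 1: spread the outer threads across the node. */
    #pragma omp parallel for proc_bind(spread)
    for (int r = 0; r < rows; r++) {
        double s = 0.0;
        /* Level 2: keep each inner team close to its parent thread. */
        #pragma omp parallel for reduction(+:s) proc_bind(close)
        for (int c = 0; c < cols; c++)
            s += a[(long)r * cols + c];
        out[r] = s;
    }
}
```

When built with an OpenMP-enabled compiler (for example `gcc -fopenmp`) and run with OMP_NUM_THREADS=4,3, the outer loop would use up to 4 threads and each inner team up to 3. The team sizes are left to the environment on purpose, so the binding policy can be varied without recompiling.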
Scoping the loop
Resolve unresolved variables
Threading Design using Intel Advisor
• Intel Advisor helps to ensure that Fortran, C and C++ applications take full performance advantage of today's processors.
• Threading Advisor is a threading design and prototyping tool that helps to analyze, design, tune, and check threading design options without disrupting normal development.
Figure: Advisor Estimate and Actual Wall Clock Time. Time (s), 0 to 160, plotted against # of threads (2, 4, 8, 16, 32, 64) for two series, Estimated time and Actual time. Comparison between Advisor estimated and measured wall clock times. The % variation range is 3-15% and increases with increasing numbers of threads.
Parallelizing Codes using Cray Reveal
• Reveal is a tool developed by Cray that is part of the Cray Perftools software package.
• Helps to identify top time-consuming loops, dependencies and vectorization.
• Loop scope analysis provides variable scope and compiler directive suggestions for inserting OpenMP.
Threading and Memory Error Detection using Intel Inspector
• Intel Inspector is a dynamic memory and threading error checking tool for users developing serial and multithreaded applications.
Figure panels: Example for Detecting Data Race, Example for Detecting Memory Problems
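A data race of the kind a threading error checker reports, together with one common fix, can be sketched as below. The `histogram` function is a hypothetical example and is not the poster's test case. Removing the `atomic` directive would leave concurrent, unsynchronized updates to the same bin, which is the pattern such a tool flags.

```c
#include <assert.h>

/* Builds a histogram of the values that fall in [0, nbins).
 * Several threads may hit the same bin, so the update must be atomic. */
void histogram(const int *values, long n, long *bins, int nbins)
{
    assert(nbins > 0);
    for (int b = 0; b < nbins; b++)
        bins[b] = 0;

    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        int b = values[i];
        if (b < 0 || b >= nbins)
            continue;           /* ignore out-of-range values */
        #pragma omp atomic
        bins[b]++;              /* atomic update removes the race */
    }
}
```

An atomic update is the smallest change that removes the race. For heavily contended bins, per-thread histograms merged at the end are a frequent alternative, at the cost of extra memory.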
Improving Performance of Sample Codes using Tools (continued)
This line causes false sharing
To achieve more fine-grained thread parallelism:
• When the top level OpenMP loop does not use all available threads
• When multiple levels of OpenMP loops are not easily collapsed
• For certain computation intensive kernels
• For multi-threaded MKL (Intel Math Kernel Library)