Choosing Resources

Choosing Resources: How Much RAM? How Many CPUs? RCS Lunch & Learn Training Series Bob Freeman, PhD Director, Research Technology Operations HBS 3 October, 2017

Transcript of Choosing Resources

Page 1: Choosing Resources

Choosing Resources: How Much RAM? How Many CPUs?

RCS Lunch & Learn Training Series

Bob Freeman, PhD
Director, Research Technology Operations

HBS

3 October, 2017

Page 2: Choosing Resources

Overview
• Q&A
• Why?
• What's Going On?
• Choosing resources
• Job efficiency – are 4 cores needed?
• Less RAM + less CPU = more work

We're going to assume you've been doing work for a few weeks. No worries if not!

Page 3: Choosing Resources

A Closer Look…

Scheduler

Page 4: Choosing Resources

A Closer Look…

Storage

Compute

Remote Access Software

Page 5: Choosing Resources

A Closer Look… at Compute

Page 6: Choosing Resources

A Closer Look… at Compute

1. Start application…
2. Application launches…
3. Run terminal app…
4. Use bjobs command for job info…

Page 7: Choosing Resources

A Closer Look… at Compute

Scheduler

1. Run program from Application pull-down menu
2. Scheduler is contacted to request work be done (interactive program)
3. Scheduler finds compute node with resources free
4. Job is sent to compute node*
5. Compute node starts executing code while the display still appears on the login
6. Program continues to run, and its resources remain unavailable to others until the user explicitly quits

*only if resource limits aren't hit

Page 8: Choosing Resources

Finite Limits of the Grid…

32 cores * 8 boxes = 256 cores
each box = 256 GB RAM

resource limits: 12 cores, 80 GB RAM

Launch 2 x Stata-MP4-30g, 1 x Stata-MP4-20g MAX

Limits:
256 / 12 cores/person = at most 22 people
8 x (256 / 80 GB/person) = at most 24 people

Do talk to us if you need a temporary increase of your allocation
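The capacity arithmetic above can be sketched in a few lines of Python. This is only an illustration using the figures from the slide; the function names are hypothetical, and the RAM calculation assumes one person's memory allocation is not split across boxes. With whole-number division the core limit supports 21 people at a full 12 cores each (the slide rounds 21.3 up to 22).

```python
# Grid capacity figures taken from the slide.
BOXES = 8
CORES_PER_BOX = 32
RAM_PER_BOX_GB = 256

# Per-person resource limits taken from the slide.
CORE_LIMIT = 12
RAM_LIMIT_GB = 80


def max_people_by_cores(boxes=BOXES, cores_per_box=CORES_PER_BOX,
                        core_limit=CORE_LIMIT):
    """People who can each hold a full core allocation at the same time."""
    return (boxes * cores_per_box) // core_limit


def max_people_by_ram(boxes=BOXES, ram_per_box_gb=RAM_PER_BOX_GB,
                      ram_limit_gb=RAM_LIMIT_GB):
    """People who can each hold a full RAM allocation at the same time.

    Assumes a single person's RAM allocation stays on one box, so the
    per-box count is rounded down before multiplying by the box count.
    """
    return boxes * (ram_per_box_gb // ram_limit_gb)


if __name__ == "__main__":
    print("by cores:", max_people_by_cores())
    print("by RAM:  ", max_people_by_ram())
```

Either way, only a couple of dozen people can work at their full limits at once, which is why smaller requests leave room for more work.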

Page 9: Choosing Resources

Less RAM + less CPU = more work

Page 10: Choosing Resources

What are you working with?

32 cores * 8 boxes = 256 cores
each box = 256 GB RAM

resource limits: 12 cores, 80 GB RAM

Launch 2 x Stata-MP4-30g, 1 x Stata-MP4-20g MAX

Is it really worth it? Do you really need this? Do you need all that RAM? Do you need all that CPU?

Going beyond your resource limit causes PEND problems:

[10:44:46, rfreeman@rhrcscli01:~]$ bjobs -w
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
144795 rfreeman RUN interactive rhrcscli01 4*rhrcsnod08 /usr/local/bin/jobstarter_xstata-mp4.pl xstata-mp4 Mar 7 10:40
144796 rfreeman RUN interactive rhrcscli01 4*rhrcsnod07 /usr/local/bin/jobstarter_xstata-mp4.pl xstata-mp4 Mar 7 10:44
144797 rfreeman RUN interactive rhrcscli01 4*rhrcsnod07 /usr/local/bin/jobstarter_xstata-mp4.pl xstata-mp4 Mar 7 10:44
144798 rfreeman PEND interactive rhrcscli01 - /usr/local/bin/jobstarter_xstata-se.pl xstata-se Mar 7 10:44
[10:44:51, rfreeman@rhrcscli01:~]$
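To see why the fourth job is pending, one can add up the cores held by the running jobs. The following is a minimal sketch, not an RCS tool: it assumes the EXEC_HOST column has the form N*host for multi-core jobs and a bare host name for single-core jobs (as in the listings in these slides), and that multiple hosts would be separated by colons. The function name and the embedded sample text are illustrative only.

```python
# Sample text copied from the bjobs -w listing on this slide.
BJOBS_OUTPUT = """\
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
144795 rfreeman RUN interactive rhrcscli01 4*rhrcsnod08 /usr/local/bin/jobstarter_xstata-mp4.pl xstata-mp4 Mar 7 10:40
144796 rfreeman RUN interactive rhrcscli01 4*rhrcsnod07 /usr/local/bin/jobstarter_xstata-mp4.pl xstata-mp4 Mar 7 10:44
144797 rfreeman RUN interactive rhrcscli01 4*rhrcsnod07 /usr/local/bin/jobstarter_xstata-mp4.pl xstata-mp4 Mar 7 10:44
144798 rfreeman PEND interactive rhrcscli01 - /usr/local/bin/jobstarter_xstata-se.pl xstata-se Mar 7 10:44
"""

CORE_LIMIT = 12  # per-person core limit stated on the slide


def cores_in_use(bjobs_text):
    """Sum the cores held by jobs whose STAT column is RUN."""
    total = 0
    for line in bjobs_text.splitlines():
        fields = line.split()
        # Skip the header, blank lines, and jobs that are not running.
        if len(fields) < 6 or fields[2] != "RUN":
            continue
        # EXEC_HOST is assumed to look like "4*host" or just "host".
        for part in fields[5].split(":"):
            count, star, _host = part.partition("*")
            total += int(count) if star else 1
    return total


if __name__ == "__main__":
    used = cores_in_use(BJOBS_OUTPUT)
    print(f"{used} of {CORE_LIMIT} cores in use")
```

Here the three Stata-MP4 jobs already hold 4 cores each, which reaches the 12-core limit, so the next job waits in PEND even though it asks for a single core.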

Page 11: Choosing Resources

How much CPU are you using?

15 of 38 are > 10% efficiency

22 of 29 are interactives > 7 days

Page 12: Choosing Resources

How much RAM are you using?

10 of 38 are > 50% of RAM ask

Page 13: Choosing Resources

What are you working with?

32 cores * 8 boxes = 256 cores
each box = 256 GB RAM

resource limits: 12 cores, 80 GB RAM

Is it really worth it? Do you really need this? Do you need all that RAM? Do you need all that CPU?

Page 14: Choosing Resources

Methods for Selecting RAM

Several good starting points:

• More is not really better, since this is a shared resource
• Use fewer cores, perhaps 1, for interactive work, unless finishing work in a short period of time.
  • Interactive = computer is waiting for you
• Check your MAX MEM usage from past job history (as shown in a few slides), and select best-fit memory footprint
  • Give yourself 20% extra RAM for wiggle room!
• (Harder) Write custom LSF job submit commands to closely match memory usage
  • You'll need to do this if requiring RAM amounts > 30 GB*
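The "best-fit plus 20%" advice can be sketched as a small helper. This is an illustration under stated assumptions: the list of profile sizes is hypothetical (5, 20, and 30 GB merely echo application names that appear in these slides, such as R-5g and stata-mp4-30g), and the function name is made up for the example.

```python
# Hypothetical menu of memory profiles in GB; the real menu may differ.
PROFILES_GB = [5, 20, 30]


def best_fit_profile_gb(max_mem_mb, profiles_gb=PROFILES_GB, headroom=0.20):
    """Return the smallest profile (GB) that covers MAX MEM plus headroom.

    Returns None when even the largest profile is too small, which is the
    case where a custom LSF submission would be needed.
    """
    needed_mb = max_mem_mb * (1 + headroom)
    for size_gb in sorted(profiles_gb):
        if size_gb * 1024 >= needed_mb:
            return size_gb
    return None


if __name__ == "__main__":
    # A past job that peaked at 6000 MB needs about 7200 MB with headroom.
    print(best_fit_profile_gb(6000))
```

A job that peaked at 6000 MB would be steered to the 20 GB profile here, while one that peaked at 56 MB fits comfortably in the smallest.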

Page 15: Choosing Resources

Methods for Selecting RAM

If you don't really know where to start…

• Each language has commands that will give you the memory usage of your data while loaded (in memory):

Stata: . memory
grand total indicates used and allocated (!)
https://www.stata.com/manuals14/dmemory.pdf

MATLAB: the memory function is only available on Windows. On other platforms, use the monitor_memory_whos.m function to determine variable usage, and add 0.5 GB for application overhead.
https://www.mathworks.com/matlabcentral/answers/97560-how-can-i-monitor-how-much-memory-matlab-is-using

Python: guppy module for total program and object information (on Python 3, the module is provided by the guppy3 package):
from guppy import hpy
h = hpy()
print(h.heap())
Total size = 19909080 bytes.
https://www.pluralsight.com/blog/tutorials/how-to-profile-memory-usage-in-python

R: the mem_used() function of the pryr package can inform your variable usage; add 0.5 GB for application overhead.
http://adv-r.had.co.nz/memory.html

Page 16: Choosing Resources

Methods for Selecting RAM

If you don't really know where to start…

• Or, if not creating new data structures after reading in data file, try RAM footprint that is 10x the data file size. If creating new ones, try 20x to 30x.
• Or, try a large memory size (e.g. 20G), finish your work, and decrease the memory ask by checking the MAX MEM usage, and selecting best fit memory footprint next time
  • Give yourself 20% wiggle room!
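The file-size rule of thumb above translates directly into a small calculation. The sketch below is only an illustration of that heuristic; the function names are hypothetical and the multipliers are the ones quoted on the slide (10x, or 20x to 30x when new data structures are created).

```python
import os


def ram_range_from_size(size_bytes, creates_new_structures=False):
    """Return a (low, high) RAM guess in bytes from a data file size."""
    if creates_new_structures:
        return (20 * size_bytes, 30 * size_bytes)
    return (10 * size_bytes, 10 * size_bytes)


def ram_range_from_file(path, creates_new_structures=False):
    """Same heuristic, reading the size of an existing file on disk."""
    return ram_range_from_size(os.path.getsize(path), creates_new_structures)


if __name__ == "__main__":
    one_gb = 1024 ** 3
    low, high = ram_range_from_size(one_gb, creates_new_structures=True)
    print(f"try between {low / one_gb:.0f} and {high / one_gb:.0f} GB")
```

This is a starting guess only; the MAX MEM figure from a finished job is the better guide for the next run.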

# One Unix command to rule them all...

[11:06:32, rfreeman@rhrcscli01:~]$ bjobs -l | grep -E "Application|IDLE|MAX"

Job <144795>, User <rfreeman>, Project <XSTATA>, Application <stata-mp4-30g>, S
IDLE_FACTOR(cputime/runtime): 0.01
MAX MEM: 56 Mbytes; AVG MEM: 49 Mbytes
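The filtered output can also be read programmatically to turn MAX MEM into a suggested request for next time. This is a minimal sketch that assumes the "MAX MEM: N Mbytes" or "N Gbytes" wording seen in these slides; the function names and the embedded sample are illustrative and not part of LSF.

```python
import math
import re

# Sample text copied from the filtered bjobs -l output above.
SAMPLE = """Job <144795>, User <rfreeman>, Project <XSTATA>, Application <stata-mp4-30g>, S
IDLE_FACTOR(cputime/runtime): 0.01
MAX MEM: 56 Mbytes; AVG MEM: 49 Mbytes"""

# Only the units that appear in the slides are handled here.
UNIT_TO_MB = {"Mbytes": 1, "Gbytes": 1024}


def max_mem_mb(text):
    """Return the MAX MEM value in MB, or None if it is not present."""
    match = re.search(r"MAX MEM:\s*([\d.]+)\s*(Mbytes|Gbytes)", text)
    if match is None:
        return None
    return float(match.group(1)) * UNIT_TO_MB[match.group(2)]


def suggested_request_mb(text, headroom=0.20):
    """MAX MEM plus wiggle room, rounded up to a whole MB."""
    peak = max_mem_mb(text)
    if peak is None:
        return None
    return math.ceil(peak * (1 + headroom))


if __name__ == "__main__":
    print(max_mem_mb(SAMPLE), suggested_request_mb(SAMPLE))
```

For the job shown, a peak of 56 MB inside a 30 GB application profile suggests the request could be far smaller.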

Page 17: Choosing Resources

Looking at Job Information

Page 18: Choosing Resources

A Closer Look…

• Get job information
  • bjobs, bjobs (jobID), bjobs -w, bjobs -l
• Get historic information
  • bhist, bhist (jobID), bhist -l (jobID), bhist -S 2016/03/01,
• Kill a job
  • bkill (jobID), bkill -J (jobname)
• How busy is the cluster?
  • bjobs -u all, bqueues && bhosts

Page 19: Choosing Resources

For Current Jobs…

[16:57:27, rfreeman@rhrcscli01:~]$ bjobs
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
144767 rfreema RUN interactiv rhrcscli01 rhrcsnod02 *n/rstudio Mar 6 16:44

[16:57:30, rfreeman@rhrcscli01:~]$ bjobs -l 144767

Job <144767>, User <rfreeman>, Project <R>, Application <R-5g>, Status <RUN>, Queue <interactive>, Interactive mode, Command </usr/local/apps/R/rstudio/v0.98.493/bin/rstudio>

Mon Mar 6 16:44:29: Submitted from host <rhrcscli01>, CWD <$HOME>;
Mon Mar 6 16:44:29: Started on <rhrcsnod02>;
Mon Mar 6 16:57:34: Resource usage collected.

The CPU time used is 37 seconds.
IDLE_FACTOR(cputime/runtime): 0.04
MEM: 169 Mbytes; SWAP: 2.2 Gbytes; NTHREAD: 13
PGID: 12778; PIDs: 12778
PGID: 12779; PIDs: 12779 12781 12793

MEMORY USAGE:
MAX MEM: 169 Mbytes; AVG MEM: 159 Mbytes

SCHEDULING PARAMETERS:
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - - -
loadStop - - - - - - - - - - -

RESOURCE REQUIREMENT DETAILS:
Combined: select[type == local] order[r15s:pg] rusage[mem=5120.00]
Effective: select[type == local] order[r15s:pg] rusage[mem=5120.00]

Page 20: Choosing Resources

A Closer Look…

[16:59:29, rfreeman@rhrcscli01:~]$ bjobs
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
144769 rfreema RUN interactiv rhrcscli01 rhrcsnod02 *n/rstudio Mar 6 17:02

[17:09:08, rfreeman@rhrcscli01:~]$ bjobs -l 144769

Job <144769>, User <rfreeman>, Project <R>, Application <R-5g>, Status <RUN>, Queue <interactive>, Interactive mode, Command </usr/local/apps/R/rstudio/v0.98.493/bin/rstudio>

Mon Mar 6 17:02:25: Submitted from host <rhrcscli01>, CWD <$HOME>;
Mon Mar 6 17:02:25: Started on <rhrcsnod02>;
Mon Mar 6 17:08:43: Resource usage collected.

The CPU time used is 365 seconds.
IDLE_FACTOR(cputime/runtime): 0.90
MEM: 221 Mbytes; SWAP: 2.2 Gbytes; NTHREAD: 13
PGID: 19033; PIDs: 19033
PGID: 19034; PIDs: 19034 19036 19048

MEMORY USAGE:
MAX MEM: 221 Mbytes; AVG MEM: 204 Mbytes

Page 21: Choosing Resources

[17:16:34, rfreeman@rhrcscli01:~]$ bjobs -l 144769

Job <144769>, User <rfreeman>, Project <R>, Application <R-5g>, Status <RUN>, Queue <interactive>, Interactive mode, Command </usr/local/apps/R/rstudio/v0.98.493/bin/rstudio>

Mon Mar 6 17:02:25: Submitted from host <rhrcscli01>, CWD <$HOME>;
Mon Mar 6 17:02:25: Started on <rhrcsnod02>;
Mon Mar 6 20:44:00: Resource usage collected.

The CPU time used is 975 seconds.
IDLE_FACTOR(cputime/runtime): 0.07
MEM: 244 Mbytes; SWAP: 2.3 Gbytes; NTHREAD: 13
PGID: 19033; PIDs: 19033
PGID: 19034; PIDs: 19034 19036 19048

MEMORY USAGE:
MAX MEM: 246 Mbytes; AVG MEM: 239 Mbytes

SCHEDULING PARAMETERS:
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - - -
loadStop - - - - - - - - - - -

RESOURCE REQUIREMENT DETAILS:
Combined: select[type == local] order[r15s:pg] rusage[mem=5120.00]
Effective: select[type == local] order[r15s:pg] rusage[mem=5120.00]

3 Hours later…

Page 22: Choosing Resources

[17:16:34, rfreeman@rhrcscli01:~]$ bhist -a -S 2017/08/01,
Summary of time in seconds spent in various states:
JOBID USER JOB_NAME PEND PSUSP RUN USUSP SSUSP UNKWN TOTAL
155256 rfreema *in/bash 0 0 1121 0 0 0 1121
155948 rfreema *in/bash 0 0 21358 0 0 0 21358
157290 rfreema MATLAB 0 0 376 0 0 0 376
157396 rfreema *in/bash 0 0 30743 0 0 0 30743
157400 rfreema MATLAB 0 0 595 0 0 0 595
157520 rfreema MATLAB 0 0 1178 0 0 0 1178
157522 rfreema *rstudio 0 0 625 0 0 0 625
157523 rfreema *rstudio 0 0 21 0 0 0 21
157524 rfreema *rstudio 2 0 32 0 0 0 34

[10:58:03, rfreeman@rhrcscli02:~]$ bhist -l 157400

Job <157400>, Job Name <MATLAB>, User <rfreeman>, Project <MATLAB>, Application <matlab-5g>, Interactive pseudo-terminal mode, Command <matlab_2017a -r "LASTN = maxNumCompThreads(4);">

Wed Aug 23 12:08:23: Submitted from host <rhrcscli02>, to Queue <interactive>, CWD <$HOME>, 4 Processors Requested;
Wed Aug 23 12:08:23: Dispatched to 4 Hosts/Processors <4*rhrcsnod01>, Effective RES_REQ <select[type == local] order[r15s:pg] rusage[mem=5120.00] >;
Wed Aug 23 12:08:23: Starting (Pid 28541);
Wed Aug 23 12:18:18: Done successfully. The CPU time used is 142.2 seconds;
Wed Aug 23 12:18:18: Post job process done successfully;

MEMORY USAGE:
MAX MEM: 634 Mbytes; AVG MEM: 537 Mbytes

Summary of time in seconds spent in various states by Wed Aug 23 12:18:18
PEND PSUSP RUN USUSP SSUSP UNKWN TOTAL
0 0 595 0 0 0 595

For historical information…

Efficiency = CPU time / (RUN * # Processors Requested)
           = 142.2 / (595 * 4)
           = 0.06 --> 6%
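The efficiency formula above is easy to wrap in a helper for checking past jobs. This is a minimal sketch; the function name is hypothetical, and the inputs are the CPU time, RUN seconds, and processor count as reported by bhist -l.

```python
def cpu_efficiency(cpu_seconds, run_seconds, processors):
    """CPU time divided by (run time * processors requested)."""
    if run_seconds <= 0 or processors <= 0:
        raise ValueError("run_seconds and processors must be positive")
    return cpu_seconds / (run_seconds * processors)


if __name__ == "__main__":
    # Job 157400 from the slide: 142.2 CPU seconds, 595 s run, 4 processors.
    eff = cpu_efficiency(142.2, 595, 4)
    print(f"{eff:.2f} --> {eff:.0%}")
```

A value near 1.0 means the requested cores were busy for the whole run; a value near 0 means they mostly sat idle.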

Page 23: Choosing Resources

[10:58:27, rfreeman@rhrcscli02:~]$ bhist -l 157396

Job <157396>, User <rfreeman>, Project <default>, Interactive pseudo-terminal shell mode, Command </bin/bash>

Wed Aug 23 11:44:51: Submitted from host <rhrcscli01>, to Queue <interactive>, CWD <$HOME>, Requested Resources <rusage[mem=1000]>, Specified Hosts <rhrcsnod07>;

RUNLIMIT
1440.0 min of rhrcscli01
Wed Aug 23 11:44:51: Dispatched to <rhrcsnod07>, Effective RES_REQ <select[type == any] order[r15s:pg] rusage[mem=1000.00] >;
Wed Aug 23 11:44:51: Starting (Pid 21766);
Wed Aug 23 20:17:14: Exited by signal 9. The CPU time used is 191.2 seconds;
Wed Aug 23 20:17:14: Completed <exit>; TERM_EXTERNAL_SIGNAL: job killed by a signal external to LSF;

MEMORY USAGE:
MAX MEM: 4 Mbytes; AVG MEM: 3 Mbytes

Summary of time in seconds spent in various states by Wed Aug 23 20:17:14
PEND PSUSP RUN USUSP SSUSP UNKWN TOTAL
0 0 30743 0 0 0 30743

For historical information…

Efficiency = CPU time / (RUN * # Processors Requested)
           = 191.2 / (30743 * 1)
           ~= 0.01 --> ~1%

Resources consumed by low efficiency "work" account for > 70% of the grid usage. If not actively using the resources, exit programs to release those resources for others to use!

Page 24: Choosing Resources

Parallel Process &

Job Efficiency – Are More Cores Better? Faster?

Page 25: Choosing Resources

Serial vs Multicore Approaches

Traditionally, software has been written for serial computers
• To be run on a single computer having a single Central Processing Unit (CPU)
• Problem is broken into a discrete set of instructions
• Instructions are executed one after the other
• Only one instruction can be executed at any moment in time

Page 26: Choosing Resources

Serial vs Multicore Approaches

In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem:
• To be run using multiple CPUs
• A problem is broken into discrete parts (either by you or the application itself) that can be solved concurrently
• Each part is further broken down to a series of instructions
• Instructions from each part execute simultaneously on different CPUs

Page 27: Choosing Resources

Parallel Processing…

In order to run in parallel, programs (code) must be explicitly programmed to do so. Thus, requesting cores from the scheduler does not automagically parallelize your code!

#!/bin/bash
#
#BSUB -q normal # Queue to submit to (comma separated)
#BSUB -J frog_blast # Job name
#BSUB -n 8 # Number of cores
...
blastn -query seqs.fasta -db nt -out seqs.nt.blastn # WRONG!!
blastn -query seqs.fasta -db nt -out seqs.nt.blastn -num_threads $LSB_MAX_NUM_PROCESSORS # YES!!

Page 28: Choosing Resources

Multicore Options in R, Python, & MATLAB

• By default, R, Python, Perl, and MATLAB* are not multithreaded… so do not ask for or try to use more than 1 core/CPU!!
• For all these programs, you cannot use the drop-down GUI menus, and you must set the # of CPUs/cores dynamically! DO NOT USE STATIC VALUES!
• For R, you can use appropriate routines with R parallel
  • Now part of base-R
  • Includes R foreach, R doMC, or R snow
• For Python, you can use the multiprocessing library (or many others)
• For Perl, there's threads or Parallel::ForkManager
• MATLAB has parpool, and do not set the worker thread count in GUI settings

# R example (parallel.R)
library(doMC)
mclapply(seq_len(), run2, mc.cores = as.integer(Sys.getenv('LSB_MAX_NUM_PROCESSORS')))

bsub -q normal -n 4 -app R-5g R CMD BATCH parallel.R # custom submission command

# MATLAB example (parallel.m)
hPar = parpool( 'local' , str2num( getenv('LSB_MAX_NUM_PROCESSORS') ) );
…

matlab-5g -n4 parallel.m # uses command-line wrapper

See more info on our website at http://grid.rcs.hbs.org/parallel-processing
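For Python, the same pattern of reading the core count from the scheduler rather than hard-coding it might look like the sketch below. It is an illustration only: worker_count, run_parallel, and square are hypothetical names, and falling back to 1 worker when LSB_MAX_NUM_PROCESSORS is unset or invalid is an assumption made for this example.

```python
import os
from multiprocessing import Pool


def worker_count(env=None, default=1):
    """Read the number of cores granted by the scheduler.

    Falls back to `default` when LSB_MAX_NUM_PROCESSORS is missing or is
    not a positive integer (for example, when run outside the grid).
    """
    env = os.environ if env is None else env
    raw = env.get("LSB_MAX_NUM_PROCESSORS", "")
    try:
        count = int(raw)
    except ValueError:
        return default
    return count if count >= 1 else default


def square(x):
    """Placeholder for the real per-item work."""
    return x * x


def run_parallel(func, items, env=None):
    """Map func over items using as many workers as the scheduler granted."""
    with Pool(processes=worker_count(env)) as pool:
        return pool.map(func, items)


if __name__ == "__main__":
    print(run_parallel(square, range(8)))
```

Because the worker count comes from the environment, the same script adapts to whatever -n value was requested at submission time.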

Page 29: Choosing Resources

Stata/MP Performance Report Summary (1)

1 Summary

Stata/MP[1] is the version of Stata that is programmed to take full advantage of multicore and multiprocessor computers. It is exactly like Stata/SE in all ways except that it distributes many of Stata's most computationally demanding tasks across all the cores in your computer and thereby runs faster, much faster.

In a perfect world, software would run 2 times faster on 2 cores, 3 times faster on 3 cores, and so on. Stata/MP achieves about 75% efficiency. It runs 1.7 times faster on 2 cores, 2.4 times faster on 4 cores, and 3.2 times faster on 8 cores (see figure 1). Half the commands run faster than that. The other half run slower than the median speedup, and some of those commands are not sped up at all, either because they are inherently sequential (most time-series commands) or because they have not been parallelized (graphics, mixed).

In terms of evaluating average performance improvement, commands that take longer to run, such as estimation commands, are of greater importance. When estimation commands are taken as a group, Stata/MP achieves an even greater efficiency of approximately 85%. Taken at the median, estimation commands run 1.9 times faster on 2 cores, 3.1 times faster on 4 cores, and 4.1 times faster on 8 cores. Stata/MP supports up to 64 cores.

This paper provides a detailed report on the performance of Stata/MP. Command-by-command performance assessments are provided in section 8.

[Figure 1. Performance of Stata/MP. Speed on multiple cores relative to speed on a single core. Horizontal axis: number of cores (1, 2, 4, 8); vertical axis: speed relative to speed of single core (1, 2, 4, 8). Curves: theoretical upper bound, median performance (estimation), median performance (all commands), logistic regression, and lower bound (no improvement), with the possible performance region between the two bounds.]

1. Support for this effort was partially provided by the U.S. National Institutes of Health, National Institute on Aging grants 1R43AG019542-01A1, 2R44AG019542-02, and 5R44AG019542-03. We also thank Cornell Institute for Social and Economic Research (CISER) at Cornell University for graciously providing access to several highly parallel SMP platforms. CISER staff, in particular John Abowd, Kim Burlingame, Janet Heslop, and Lars Vilhuber, were exceptionally helpful in scheduling time and helping with configuration. The views expressed here do not necessarily reflect those of any of the parties thanked above.

Revision 3.0.1 30jan2016

Example: Stata Parallelization

Stata offers a 293-page report on its parallelization efforts. They are pretty impressive. However:

With multiple cores, one might expect to achieve the theoretical upper bound of doubling the speed by doubling the number of cores: 2 cores run twice as fast as 1, 4 run twice as fast as 2, and so on. However, there are three reasons why such perfect scalability cannot be expected: 1) some calculations have parts that cannot be partitioned into parallel processes; 2) even when there are parts that can be partitioned, determining how to partition them takes computer time; and 3) multicore/multiprocessor systems only duplicate processors and cores, not all the other system resources.
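The first of those three reasons can be illustrated with Amdahl's law, a standard textbook model that is not taken from these slides or from the Stata report. The sketch below assumes a fixed fraction of the work can be split perfectly across cores while the rest stays serial; it ignores partitioning overhead and shared resources (reasons 2 and 3), so real speedups would be lower still.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when only part of the work can run in parallel.

    parallel_fraction: share of single-core run time that can be split
    across cores (0.0 to 1.0). The remainder runs serially.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative fraction only; not a measured value for any program.
    for cores in (1, 2, 4, 8):
        print(cores, round(amdahl_speedup(0.8, cores), 2))
```

Even with most of the work parallelizable, each doubling of cores buys less than the one before, which is why asking for more cores is not automatically faster.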

Stata/MP achieved 75% efficiency overall and 85% efficiency among estimation commands.

Speed is more important for problems that are quantified as large in terms of the size of the dataset or some other aspect of the problem, such as the number of covariates. On large problems, Stata/MP with 2 cores runs half of Stata's commands at least 1.7 times faster than on a single core. With 4 cores, the same commands run at least 2.4 times faster than on a single core.

This parallelization benefit is mostly realized in batch mode… most of interactive Stata is waiting for user input (or left idle), as CPU efficiency is typically < 5% - 10%

Page 30: Choosing Resources

Important Points &

Troubleshooting

Page 31: Choosing Resources

Important Points

The HBS Compute Grid is a great start for using advanced computing resources (ACI). But there are things to remember:

• NB! This is expensive equipment and other people's research is at risk.
• This is a Level 3 & 4 HRCI environment. Please respect the security guidelines & be cautious with code you have not written.
• Job information and control only happens through the terminal.
  • Do not run! We run a Compute Grid: Unix Cheatsheet class in Sept!
• If problems, remember your friends bjobs and bhist (and your friends at RCS!)
• With project spaces, permissions can be a problem. See our online write-up
• If parallel processing, ensure that you use the correct commands:
  • Ask the scheduler for the CPUs
  • Tell your code via the LSB_MAX_NUM_PROCESSORS environment variable
• Do not run > 5 jobs that have heavy disk reading/writing into the same directory
• Talk to us about workflow and usage questions! Guidance is FREE!
• Perhaps FASRC's Odyssey might be more appropriate for the volume & shape of your work

Page 32: Choosing Resources

Basic Troubleshooting

Before seeking help, take some basic steps to ascertain what is going on with your job:

• Use bjobs -l to query details from LSF
  • Is your job waiting for space (Resources)?
  • Will your job ever run (Dependency)?
  • Is there an error code or message?
• If running interactive jobs, there are no log files, so you must use the bjobs & bhist commands
• Otherwise, check your log files
  • Was a *.log file generated? What does it say?
  • If using custom submission commands, you did specify both -o and -e, yes?
  • Message about Pre-emption, Timeout, or Failure?
  • The last error in the log is usually not the problem. The first one is!
• Did you request e-mail messages for your jobs with -u [email protected]?
• If you got a job report through email, did you look at this to understand the error?
• Is your job script formatted properly?
• Are you using the correct software paths? Possible software/library conflicts?

Page 33: Choosing Resources

Getting Help

RCS Website & Documentation -- only authoritative source
https://grid.rcs.hbs.org/

Submit a help request
[email protected]

Best way to help us to help you? Give us...
Description of problem
Additional info (login/batch? queue? JobIDs?)
Steps to Reproduce (1., 2., 3...)
Actual results
Expected results

Page 34: Choosing Resources

• Please talk to your peers, and…
• We wish you success in your research!

• http://intranet.hbs.edu/dept/research/• https://grid.rcs.hbs.org/• https://training.rcs.hbs.org/

• @hbs_rcs

Research Computing Services