Post on 16-May-2018
Building a SLURM Banking System
Authors: Sahil Hasan*, Harrison Kuo*, Cassie Zhang*, Quinn Dombrowski*, Steve Masover*, Patrick Schmitz*, Krishna Muriki(*,**), Yong Qin(*,**)
*Berkeley Research Computing, University of California, Berkeley
**High Performance Computing Services, Lawrence Berkeley National Laboratory
What is BRC?
• Berkeley Research Computing
• Delivers reliable, sustainable resources and services to meet the computational demands of research groups at UC Berkeley.
• Savio is a 400-node institutional HPC Linux cluster that forms the foundation of the BRC program.
• Number of research groups with Condo contributions: 22
• Number of research groups using free compute allowances: 195
• Number of unique users: 1120
State of SLURM Accounting
• Managed through sacct and sacctmgr
• Current BRC accounting infrastructure is limited and not easily integrated into myBRC
• Hard to create web applications that provide user interfaces for accounting using the SLURM API
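To illustrate why building on the command-line tools is awkward, here is a minimal sketch that parses pipe-delimited text of the kind printed by `sacct --parsable2 --noheader --format=JobID,User,Account,CPUTimeRAW` and totals core-hours per account. The sample rows, user names, and account names are made up, and the helper `parse_sacct` is hypothetical; it is not part of SLURM or of the BRC system.

```python
from collections import defaultdict

# Made-up sample in the pipe-delimited layout of `sacct --parsable2 --noheader`.
# Columns assumed here: JobID|User|Account|CPUTimeRAW (CPU time in seconds).
SAMPLE = """\
1001|alice|fc_demo|7200
1001.batch||fc_demo|7200
1002|bob|fc_demo|3600
1003|carol|ic_course|1800
"""

def parse_sacct(text):
    """Return core-hours per account, counting only top-level job rows."""
    usage = defaultdict(float)
    for line in text.splitlines():
        if not line.strip():
            continue
        job_id, user, account, cpu_seconds = line.split("|")
        if "." in job_id:  # skip job steps such as "1001.batch"
            continue
        usage[account] += int(cpu_seconds) / 3600.0
    return dict(usage)

usage_by_account = parse_sacct(SAMPLE)
```

Every web view that needs such totals would have to repeat this kind of text scraping, which is part of the motivation for a separate database and API.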
Why?
• Faculty get a compute allowance of 300K Service Units (1 SU = 1 core-hour)
• Faculty Computing Allowances (FCA) are usually formed for faculty research
  • FCA: ~10s of trusted users in a group
• Instruction Computing Allowances (ICA) are usually given to instructors
  • ICA: ~10-100s of potentially untrusted users
• Instructors/faculty want to control quotas, and our current infrastructure cannot support this
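As a worked example of the Service Unit arithmetic, the sketch below charges one job against the 300K allowance. The job size and the even ten-way split are hypothetical; only the 300K figure and the 1 SU = 1 core-hour rule come from the slide.

```python
FACULTY_ALLOWANCE_SU = 300_000  # from the slide: 300K Service Units

def service_units(cores, hours):
    """1 SU = 1 core-hour, so cost is cores multiplied by wall-clock hours."""
    return cores * hours

# Hypothetical job: 20 cores for 12 hours.
job_cost = service_units(20, 12)
remaining = FACULTY_ALLOWANCE_SU - job_cost

# Hypothetical even split of the allowance among 10 group members.
per_user_quota = FACULTY_ALLOWANCE_SU // 10
```

Per-user quotas of this kind are what instructors and faculty would like to set themselves.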
Goals
• Create an open source, plug-and-play accounting system
• Independent of the existing SLURM database
• Easily allow web applications to be built on top of the stack
Components
• A graphical Django-based dashboard that aids account owners in changing allocations, generating visualizations, etc.
• A database system separate from the existing SLURM infrastructure
• A REST API to protect the integrity of the database
• A system of plugins that enables easy sub-allocation of TRES-minutes and tracks burn-down of sub-allocations within an account
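One way to picture sub-allocation and burn-down tracking is the sketch below. It is an assumption-laden illustration rather than the project's code: the `Account` class, its method names, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Account:
    name: str
    allocation: int  # total TRES-minutes granted to the account
    sub_allocations: dict = field(default_factory=dict)  # user -> granted minutes
    used: dict = field(default_factory=dict)             # user -> consumed minutes

    def sub_allocate(self, user, minutes):
        """Grant a user a share; shares may not exceed the account total."""
        others = sum(v for u, v in self.sub_allocations.items() if u != user)
        if others + minutes > self.allocation:
            raise ValueError("sub-allocations exceed account allocation")
        self.sub_allocations[user] = minutes
        self.used.setdefault(user, 0)

    def charge(self, user, minutes):
        """Record usage against a user's sub-allocation."""
        if self.used[user] + minutes > self.sub_allocations[user]:
            raise ValueError("charge exceeds sub-allocation")
        self.used[user] += minutes

    def burndown(self, user):
        """Remaining minutes in the user's sub-allocation."""
        return self.sub_allocations[user] - self.used[user]

acct = Account("fc_demo", allocation=1000)
acct.sub_allocate("alice", 600)
acct.sub_allocate("bob", 400)
acct.charge("alice", 250)
```

Here `acct.burndown("alice")` reports what is left of one user's share, while the account total caps the sum of all shares.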
Database System
• Needed to reflect what was used in SLURM accounting
• Needed to have good associations between users, accounts, and jobs
• Customizable partitions, QOS, job status, etc.
• Uses MySQL
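The associations between users, accounts, and jobs could look roughly like the schema below. This is a guess at a plausible layout, not the project's schema: every table and column name is hypothetical, and Python's built-in `sqlite3` stands in for MySQL only so the sketch runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT UNIQUE, allowance REAL);
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
-- A user can belong to several accounts, each with its own quota.
CREATE TABLE memberships (
    user_id INTEGER REFERENCES users(id),
    account_id INTEGER REFERENCES accounts(id),
    quota REAL,
    PRIMARY KEY (user_id, account_id)
);
CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    account_id INTEGER REFERENCES accounts(id),
    partition_name TEXT, qos TEXT, status TEXT, su_used REAL
);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO accounts VALUES (1, 'fc_demo', 300000)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.execute("INSERT INTO memberships VALUES (1, 1, 30000)")
conn.execute(
    "INSERT INTO jobs VALUES (1001, 1, 1, 'savio', 'normal', 'completed', 240)"
)

def usage_for(user_name, account_name):
    """Total SUs a user has consumed under one account (0 if none)."""
    row = conn.execute(
        """SELECT COALESCE(SUM(j.su_used), 0)
           FROM jobs j
           JOIN users u ON u.id = j.user_id
           JOIN accounts a ON a.id = j.account_id
           WHERE u.name = ? AND a.name = ?""",
        (user_name, account_name),
    ).fetchone()
    return row[0]
```

Keeping the user-to-account link in its own table is one way to let a single user hold different quotas under different accounts.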
REST API
• Needed to support viewing quotas
• Needed to allow privileged users to edit other users’ quotas
• Three endpoints: /users, /accounts, and /jobs
• Provides a layer of convenience and security between the user/application and the database/logic/SLURM
• Uses the Django web framework
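The permission rule for quotas can be sketched without any web framework. The function below is not Django code and not the project's API; the `/users/<name>` path shape, the status codes, and the in-memory data are assumptions chosen to show the idea that anyone may view a quota but only privileged callers may change one.

```python
# In-memory stand-in for the database behind the API (all data hypothetical).
QUOTAS = {"alice": 30000, "bob": 30000}
PRIVILEGED = {"alice"}  # e.g. account owners

def handle(method, path, caller, body=None):
    """Return (status_code, payload) for a request to /users/<name>."""
    parts = path.strip("/").split("/")
    if len(parts) != 2 or parts[0] != "users" or parts[1] not in QUOTAS:
        return 404, {"error": "not found"}
    target = parts[1]
    if method == "GET":
        return 200, {"user": target, "quota": QUOTAS[target]}
    if method == "PUT":
        # Only privileged callers may edit a quota.
        if caller not in PRIVILEGED:
            return 403, {"error": "forbidden"}
        QUOTAS[target] = body["quota"]
        return 200, {"user": target, "quota": QUOTAS[target]}
    return 405, {"error": "method not allowed"}
```

For example, `handle("PUT", "/users/alice", "bob", {"quota": 1})` is refused, while the same request from `alice` against `/users/bob` succeeds.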
SLURM Plugin
• SLURM plugins perform web API calls to interact with the database
• Allows jobs run from the terminal to be registered in the database
• Split into two functions: a submit-time function and an epilogue
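A web API call of the kind a plugin might make to register a job is sketched below. The slides do not say which language the plugins use or what the request body looks like, so Python, the base URL, and the JSON field names are all assumptions; only the `/jobs` endpoint comes from the slides.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # hypothetical address of the banking API

def build_job_request(job_id, user, account, status):
    """Build (but do not send) a POST to the /jobs endpoint."""
    payload = {"jobid": job_id, "user": user, "account": account, "status": status}
    return urllib.request.Request(
        API_BASE + "/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_job_request(1001, "alice", "fc_demo", "hold")
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```

Routing registration through the API rather than writing to the database directly keeps the plugin on the same validated path as the web dashboard.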
Lifecycle of a Successful Job
[Flow diagram; stages and labels as recovered from the slide]
• Job submitted by user
• Submit-Time: check if job is less than allowance; job created with hold status
• Job run
• Epilogue: job updated with finish status; check SUs used; deducts user balance
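The lifecycle can be sketched as two functions, one per plugin stage. This is a simplified model under stated assumptions: the ordering of steps within each stage is inferred from the diagram labels, the cost estimate (cores times requested hours) is a guess at how "less than allowance" is evaluated, and all names and numbers are hypothetical.

```python
balances = {"alice": 1000.0}  # SUs remaining per user (hypothetical)
jobs = {}                     # job id -> record

def submit_time(job_id, user, cores, hours_requested):
    """Accept the job only if its maximum cost is less than the balance."""
    max_cost = cores * hours_requested
    if not max_cost < balances[user]:
        return False
    jobs[job_id] = {"user": user, "cores": cores, "status": "hold"}
    return True

def epilogue(job_id, hours_used):
    """Mark the job finished and deduct the SUs it actually consumed."""
    job = jobs[job_id]
    su_used = job["cores"] * hours_used
    job["status"] = "finished"
    job["su_used"] = su_used
    balances[job["user"]] -= su_used
    return su_used

accepted = submit_time(1, "alice", cores=20, hours_requested=12)   # cost 240
rejected = submit_time(2, "alice", cores=100, hours_requested=24)  # cost 2400
charged = epilogue(1, hours_used=10)
```

Checking the requested maximum at submit time and charging actual usage in the epilogue means a job that finishes early costs less than it reserved.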