Building a SLURM Banking System

Authors: Sahil Hasan*, Harrison Kuo*, Cassie Zhang*, Quinn Dombrowski*, Steve Masover*, Patrick Schmitz*, Krishna Muriki(*,**), Yong Qin(*,**)

* Berkeley Research Computing, University of California, Berkeley
** High Performance Computing Services, Lawrence Berkeley National Laboratory

- Overview -

What is BRC?

• Berkeley Research Computing

• Deliver reliable, sustainable resources and services to meet the computational demands of research groups at UC Berkeley.

• Savio is a 400-node institutional HPC Linux cluster that forms the foundation of the BRC program.

• Number of research groups with Condo contributions: 22

• Number of research groups using free compute allowances: 195

• Number of unique users: 1120

State of SLURM Accounting

• Managed through sacct and sacctmgr

• Current BRC accounting infrastructure is limited and not easily integrated into myBRC

• Hard to create web applications that provide user interfaces for accounting using the SLURM API

Why?

• Faculty get a compute allowance of 300K Service Units (1 SU = 1 core-hour)

• Faculty Computing Allowances (FCA) are usually formed for faculty research
  • FCA: ~10s of trusted users in a group

• Instruction Computing Allowances (ICA) are usually given to instructors
  • ICA: ~10s to 100s of potentially untrusted users

• Instructors and faculty want to control quotas, and our current infrastructure cannot support this
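
The allowance arithmetic above can be sketched in a few lines. This is only an illustration of the stated definition (1 SU = 1 core-hour) and the 300K allowance; the function names are hypothetical, and any weighting by partition or hardware type is deliberately left out because the slides do not describe one.

```python
# Sketch of Service Unit arithmetic, assuming only the slide's definition:
# 1 SU = 1 core-hour. Function names are hypothetical.

FACULTY_ALLOWANCE_SU = 300_000  # 300K SUs, as stated on the slide


def job_cost_su(cores: int, hours: float) -> float:
    """Cost of a job in SUs: cores multiplied by wall-clock hours."""
    if cores < 0 or hours < 0:
        raise ValueError("cores and hours must be non-negative")
    return cores * hours


def remaining_su(allowance: float, jobs) -> float:
    """Allowance left after charging every (cores, hours) pair in jobs."""
    return allowance - sum(job_cost_su(c, h) for c, h in jobs)


if __name__ == "__main__":
    # A 20-core job that runs 10 hours, then a 4-core job that runs 2.5 hours.
    print(remaining_su(FACULTY_ALLOWANCE_SU, [(20, 10), (4, 2.5)]))
```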

Goals

• Create an open-source, plug-and-play accounting system

• Independent of the existing SLURM database

• Easily allow web applications to be built on top of the stack

Overview

[Architecture diagram: SLURM plugins, API + Logic, myBRC Dashboard, Database]

Components

• A graphical Django-based dashboard that helps account owners change allocations, generate visualizations, etc.

• A database system separate from the existing SLURM infrastructure

• A REST API to protect the integrity of the database

• A system of plugins that enables easy sub-allocation of TRES-minutes and tracks burndown of sub-allocations within an account
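
To make the idea of sub-allocation and burndown concrete, here is a minimal sketch. It assumes an account holds a pool of TRES-minutes (TRES being SLURM's trackable resources), that an account owner grants per-user limits from that pool, and that each charge reduces the user's remaining share. The class and method names are hypothetical and are not taken from the BRC code.

```python
# Hypothetical sketch of sub-allocating TRES-minutes within one account
# and tracking burndown per user. Not the actual BRC plugin logic.
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SubAllocation:
    limit: float        # TRES-minutes granted to this user
    used: float = 0.0   # TRES-minutes consumed so far

    @property
    def remaining(self) -> float:
        return self.limit - self.used


@dataclass
class Account:
    total: float  # TRES-minutes available to the whole account
    subs: Dict[str, SubAllocation] = field(default_factory=dict)

    def allocate(self, user: str, minutes: float) -> None:
        """Set a user's limit; limits may not exceed the account total."""
        others = sum(s.limit for name, s in self.subs.items() if name != user)
        if others + minutes > self.total:
            raise ValueError("sub-allocations would exceed the account total")
        used = self.subs[user].used if user in self.subs else 0.0
        if minutes < used:
            raise ValueError("new limit is below what the user already used")
        self.subs[user] = SubAllocation(limit=minutes, used=used)

    def charge(self, user: str, minutes: float) -> None:
        """Burn down a user's sub-allocation by the given TRES-minutes."""
        sub = self.subs[user]
        if minutes > sub.remaining:
            raise ValueError("charge exceeds the user's remaining allocation")
        sub.used += minutes

    def burndown(self) -> Dict[str, float]:
        """Remaining TRES-minutes per user."""
        return {user: s.remaining for user, s in self.subs.items()}
```

For example, an account with 1000 TRES-minutes could grant 600 to one user and 400 to another; a further grant would be refused until an existing limit is lowered.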

- myBRC Dashboard UI -

Dashboard Integrated with BRC Website

Login

Notification Center

Statistics Visualization

Allocation Manager

Service Unit Calculator

- Database System -

Database System

• Needed to reflect what was used in SLURM accounting

• Needed to have good associations between users, accounts, and jobs

• Customizable partitions, QOS, job status, etc.

• Uses MySQL
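
The associations between users, accounts, and jobs described above could look roughly like the schema below. This is a sketch only: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, and every table and column name is an assumption rather than the actual BRC schema.

```python
# Hypothetical schema sketch: users, accounts, and jobs with associations.
# sqlite3 stands in for MySQL; all table and column names are assumptions.
import sqlite3

SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    username TEXT UNIQUE NOT NULL
);
CREATE TABLE accounts (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    allowance_su REAL NOT NULL
);
CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    slurm_job_id TEXT UNIQUE NOT NULL,
    user_id INTEGER NOT NULL REFERENCES users(id),
    account_id INTEGER NOT NULL REFERENCES accounts(id),
    partition_name TEXT,   -- customizable per site
    qos TEXT,              -- customizable per site
    status TEXT NOT NULL,  -- e.g. 'hold', 'finished'
    su_used REAL NOT NULL DEFAULT 0
);
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def su_used_by_user(conn: sqlite3.Connection, account_name: str) -> dict:
    """Total SUs charged per user within one account."""
    rows = conn.execute(
        """
        SELECT u.username, SUM(j.su_used)
        FROM jobs j
        JOIN users u ON u.id = j.user_id
        JOIN accounts a ON a.id = j.account_id
        WHERE a.name = ?
        GROUP BY u.username
        ORDER BY u.username
        """,
        (account_name,),
    )
    return dict(rows.fetchall())
```

Keeping jobs linked to both a user and an account is what lets a dashboard report usage per user within an account.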

- REST API -

REST API

• Needed to support viewing quotas

• Needed to allow privileged users to edit other users’ quotas

• Three endpoints: /users, /accounts, and /jobs

• Allows a layer of convenience and security between the user/application and the database/logic/SLURM

• Uses the Django web framework
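
The access rule described above (anyone can view quotas, only privileged users can edit them) can be illustrated with a small dispatcher. This is a plain-Python stand-in, not Django code: the `handle` function, the `Caller` type, the `/users/<name>` path shape, and the in-memory `QUOTAS` store are all assumptions made for the example, and only the /users endpoint is filled in.

```python
# Hypothetical, framework-free sketch of the quota rules behind /users.
# The real system uses Django; names and path shapes here are assumptions.
from dataclasses import dataclass
from typing import Optional, Tuple

ENDPOINTS = ("users", "accounts", "jobs")  # the three endpoints on the slide

QUOTAS = {"alice": 1000.0, "bob": 500.0}   # in-memory stand-in for the database


@dataclass
class Caller:
    username: str
    is_privileged: bool = False


def handle(method: str, path: str, caller: Caller,
           body: Optional[dict] = None) -> Tuple[int, dict]:
    """Return (status code, JSON-like body) for a request."""
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] not in ENDPOINTS:
        return 404, {"error": "unknown endpoint"}
    if parts[0] != "users" or len(parts) != 2:
        return 501, {"error": "not covered by this sketch"}
    target = parts[1]
    if target not in QUOTAS:
        return 404, {"error": "unknown user"}
    if method == "GET":
        # Viewing quotas is open to any caller in this sketch.
        return 200, {"username": target, "quota": QUOTAS[target]}
    if method == "PUT":
        # Assumption: only privileged callers may edit any quota.
        if not caller.is_privileged:
            return 403, {"error": "only privileged users may edit quotas"}
        quota = (body or {}).get("quota")
        if isinstance(quota, bool) or not isinstance(quota, (int, float)) or quota < 0:
            return 400, {"error": "quota must be a non-negative number"}
        QUOTAS[target] = float(quota)
        return 200, {"username": target, "quota": QUOTAS[target]}
    return 405, {"error": "method not allowed"}
```

Routing every change through one function like this is what lets the API validate input and check privileges before anything touches the database.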

- SLURM Plugin -

SLURM Plugin

• SLURM plugins perform web API calls to interact with the database

• Allows jobs run from the terminal to be registered in the database

• Split into two functions: a submit-time function and an epilogue
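
As an illustration of a plugin registering a job through the web API, the sketch below builds (but does not send) an HTTP request. It is written in Python for readability and does not show how the actual plugins are implemented. The base URL, the payload field names, and the `build_job_request` helper are hypothetical; only the /jobs endpoint name comes from the slides.

```python
# Hypothetical sketch: the kind of web API call a plugin might make to
# register a job. The URL and payload fields are assumptions.
import json
from urllib.request import Request

API_BASE = "https://mybrc.example.invalid/api"  # placeholder, not a real host


def build_job_request(job_id: str, user: str, account: str, status: str) -> Request:
    """Build a POST to the /jobs endpoint describing one job."""
    payload = {
        "job_id": job_id,
        "user": user,
        "account": account,
        "status": status,  # e.g. "hold" at submit time, "finished" in the epilogue
    }
    return Request(
        url=f"{API_BASE}/jobs/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_job_request("12345", "alice", "fc_demo", "hold")
    print(req.get_method(), req.full_url)
    # Sending would be urllib.request.urlopen(req); omitted because the
    # host above is a placeholder.
```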

Lifecycle of a Successful Job

[Flow diagram with two stages, Submit-Time and Epilogue. Boxes: Job submitted by user; Check if job is less than allowance; Job created with hold status; Job run; Check SUs used; Deducts user balance; Job updated with finish status.]
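
One way to read the lifecycle is sketched below. The slide does not make clear which stage each step belongs to, so the split is an assumption: here the submit-time function checks a worst-case cost (cores times the requested time limit) against the user's balance and records the job with a hold status, and the epilogue computes the SUs actually used, deducts them, and marks the job finished. The `Bank` and `Job` names are hypothetical.

```python
# Hypothetical sketch of the job lifecycle. Which steps happen at submit
# time versus in the epilogue is an assumption, not taken from the slides.
from dataclasses import dataclass
from typing import Dict


@dataclass
class Job:
    job_id: str
    user: str
    cores: int
    time_limit_hours: float
    status: str = "new"
    su_used: float = 0.0


class Bank:
    def __init__(self, balances: Dict[str, float]):
        self.balances = dict(balances)   # SUs remaining per user
        self.jobs: Dict[str, Job] = {}   # stand-in for the jobs table

    def submit_time(self, job: Job) -> bool:
        """Check the job against the allowance; record it on hold if it fits."""
        max_cost = job.cores * job.time_limit_hours  # 1 SU = 1 core-hour
        if max_cost > self.balances.get(job.user, 0.0):
            job.status = "rejected"
            return False
        job.status = "hold"
        self.jobs[job.job_id] = job
        return True

    def epilogue(self, job_id: str, elapsed_hours: float) -> float:
        """Check SUs used, deduct them, and mark the job finished."""
        job = self.jobs[job_id]
        job.su_used = job.cores * elapsed_hours
        self.balances[job.user] -= job.su_used
        job.status = "finished"
        return job.su_used
```

Checking the worst case at submit time means a running job cannot overdraw the balance, while charging in the epilogue means users pay only for the time actually used.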

Thank you