
EPCC, University of Edinburgh

DIRAC and SAFE

http://www.epcc.ed.ac.uk/

DIRAC requirements

• DIRAC serves a variety of different user communities.
  – These have different computational requirements, best served by different types of computer.
  – User communities are spread across many different institutions.
  – Resources are geographically distributed and run by multiple organisations.
  – Some of these resources are provided by existing services with existing procedures.

• Funding is limited.
  – Mostly only hardware was funded.
  – Need to provide the rest of the service as efficiently as possible.
  – Need to utilise existing infrastructure/processes where possible.
  – Avoid unnecessary complications.


Stakeholders

• DIRAC management
  – Need an overview of resource usage to inform allocation policy.
  – Need mechanisms to implement allocation policy.

• Research communities
  – Need resource usage information to manage the community science programme.
  – Need mechanisms to manage community membership.
  – Need mechanisms to manage community resources.

• Users
  – Need to be able to request accounts (frequently at remote institutions).
  – Need to access accounts remotely.
  – Want to get on with science without additional complications.


Level of integration

• Most requirements for integration are at the management level.

• Experience suggests a strong correlation between user communities and compute resource.
  – Communities will choose resources appropriate to their science.
  – Users will want to access the unique features of these resources.
  – Though projects may span resources, most individual users will probably stick to a single system.

• Global accounts, single-sign-on etc. not essential.


GRID?

• Computational grid not appropriate.
  – Grids are designed to provide uniform access to interchangeable resources. DIRAC resources are complementary, not interchangeable.
  – Provides a standard interface, but only to features common to all systems.

• Data grid may be more relevant.
  – Depends on the data handling requirements of user communities.
  – Need to gather more requirements.


SAFE design principles

• SAFE has been built to provide a single point of contact for users of national HPC services.
  – Role essentially that of the ITIL service desk.
  – Originally deployed for the HPCx service; currently used for the HECToR service. Also used for internal EPCC services.

• Provides a well-defined interface for service providers.
  – Tries to express all requests as standard tickets.
  – Supports multiple service providers with different support policies.

• Has to make very few technological assumptions.
  – Users can come from any academic institution. Can't assume much more than email and web.
  – We usually bid to run the service in parallel with hardware procurement. We have little say over hardware or system software and need to adapt SAFE quickly to provide the service if the bid is successful.


SAFE design principles II

• Has to be flexible rather than prescriptive.
  – Requirements have changed constantly over the 10 years of SAFE development.
  – Need to be able to quickly implement new reports or policies generated by RCs or policy panels.
  – Need to maintain access to old data even when the current system/policy has changed.
  – Need to be able to integrate new services into existing instances.
  – Need to be able to adapt tickets to meet the needs of service teams and underlying infrastructure.

• Controlling our own software gives us a great deal of flexibility.
  – We have built up an extensive toolbox to allow rapid implementation of new requirements.


What can SAFE offer DIRAC?

• Software already exists and is already managing the BG/Q service (minimal cost).

• It's designed to handle distributed user communities from many different institutions.
  – Many DIRAC users will already be familiar with it.

• It's designed to handle multiple service providers with different operating policies.

• While SAFE supports many features, sites only need to adopt those that fit their normal way of working.


SAFE as a service

• Can use the BG/Q SAFE to provide a service for the whole of DIRAC.
  – Host, install, maintain, modify where necessary.
  – Generates necessary reports and statistics for the whole of DIRAC.
  – Provides a single point to manage project membership, account creation etc.
  – Lightweight and non-intrusive integration with service providers.
      – Special handling to work within local policies.
      – Choice over which features are adopted.
  – Centralised service requires minimal changes to existing software and only needs O(N) interactions, not O(N^2).
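The O(N) versus O(N^2) point can be checked with a simple count. The sketch below assumes that an "interaction" means one integration link between two parties; that reading, and the function names, are illustrative assumptions rather than anything defined by SAFE. With a central service each of N sites needs one link to it, whereas direct site-to-site integration needs one link per pair of sites.

```python
def central_links(n_sites: int) -> int:
    """Links needed when every site talks only to one central service."""
    return n_sites


def pairwise_links(n_sites: int) -> int:
    """Links needed when every site integrates directly with every other site."""
    return n_sites * (n_sites - 1) // 2


if __name__ == "__main__":
    for n in (2, 5, 10):
        print(f"{n} sites: central={central_links(n)}, pairwise={pairwise_links(n)}")
```

The pairwise count grows quadratically, so the gap widens quickly as more service providers join.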


Account creation.

• Accounts requested via SAFE.
  – Sends request to project manager.
  – Once approved, raises ticket with service provider.
  – Default is to do this by email; XML available for scripts.

Hi Support,

This user has been authorised to have an account on one of our machines. Please create a new user account for them using the following information.

Task ID: 46067
Machine: hector
Username: demo
Email: [email protected]
User's Name: Dr Stephen P Booth
Consortium: z01 - USL
Project Group(s): z01
UID: 13535
GID: 1001

Thanks,
The SAF.

P.S. You can see the current pending queue by looking at https://www.hector.ac.uk/safe/servlet/SysAdminServlet

<SysAdmin>
  <Id>46067</Id>
  <Type>New User</Type>
  <Status>Pending</Status>
  <StarDate>2012-6-4 11:3:51</StarDate>
  <EndDate>0000-00-00 00:00:00</EndDate>
  <Machine>hector</Machine>
  <Project>
    <Code>z01</Code>
    <Name>USL</Name>
  </Project>
  <ProjectGroup>
    <Code>z01</Code>
    <GroupID>1001</GroupID>
  </ProjectGroup>
  <Account>
    <Name>demo</Name>
    <UID>13535</UID>
    <GID>1001</GID>
    <Groups>z01</Groups>
  </Account>
  <Person>
    <Name>
      <Title>Dr</Title>
      <Firstname>Stephen</Firstname>
      <Lastname>Booth</Lastname>
    </Name>
    <Email>[email protected]</Email>
  </Person>
</SysAdmin>
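A service provider's script could pull the fields it needs out of the XML form of the ticket. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the element names are taken from the sample ticket above, while the function name `parse_ticket` and the choice of returned keys are hypothetical. It does not contact SAFE; it only parses a ticket that has already been received.

```python
import xml.etree.ElementTree as ET

# Sample ticket copied from the slide (element names as shown there).
TICKET_XML = """<SysAdmin><Id>46067</Id><Type>New User</Type><Status>Pending</Status>
<Machine>hector</Machine>
<Project><Code>z01</Code><Name>USL</Name></Project>
<ProjectGroup><Code>z01</Code><GroupID>1001</GroupID></ProjectGroup>
<Account><Name>demo</Name><UID>13535</UID><GID>1001</GID><Groups>z01</Groups></Account>
<Person><Name><Title>Dr</Title><Firstname>Stephen</Firstname><Lastname>Booth</Lastname></Name>
<Email>[email protected]</Email></Person></SysAdmin>"""


def parse_ticket(xml_text: str) -> dict:
    """Extract the account-creation fields from a SysAdmin ticket."""
    root = ET.fromstring(xml_text)

    def text(path: str) -> str:
        # findtext returns None for a missing element; fail loudly instead.
        value = root.findtext(path)
        if value is None:
            raise KeyError(f"ticket is missing <{path}>")
        return value.strip()

    return {
        "id": int(text("Id")),
        "type": text("Type"),
        "status": text("Status"),
        "machine": text("Machine"),
        "project": text("Project/Code"),
        "username": text("Account/Name"),
        "uid": int(text("Account/UID")),
        "gid": int(text("Account/GID")),
        "groups": text("Account/Groups"),
    }


if __name__ == "__main__":
    print(parse_ticket(TICKET_XML))
```

A site would then feed these values into whatever local account-creation procedure it already uses.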


Completing tickets.

• Once the account is created, the site needs to notify SAFE via a web form.
  – Manually via browser or automatically via script.
  – Service provider can reject tickets.
  – Initial (one-shot?) password returned to SAFE for retrieval by user.
  – Similar mechanism possible for password resets.

• We can gather more information if needed.
  – IP address ranges have been requested.

• We can encode local policies on usernames and UID/GID ranges into SAFE.

• Or we can let the site choose UID/GID/username and return the values to SAFE when completing the ticket.
  – UID/GID only need to be managed centrally if supporting file-system cross mounts.
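As an illustration of what encoding a local UID policy might look like, the sketch below picks the lowest unused UID inside a site's permitted range. The range values and the function name `next_free_uid` are hypothetical; this is not SAFE's actual implementation, only one plausible shape for such a rule.

```python
def next_free_uid(used: set[int], lo: int, hi: int) -> int:
    """Return the lowest UID in [lo, hi] that is not already in use."""
    for uid in range(lo, hi + 1):
        if uid not in used:
            return uid
    raise ValueError(f"no free UID left in range {lo}-{hi}")


if __name__ == "__main__":
    # Hypothetical site policy: DIRAC accounts live in 13000-13999.
    already_used = {13000, 13001, 13003}
    print(next_free_uid(already_used, 13000, 13999))
```

If the site prefers to choose the UID itself, no such rule is needed centrally and the value is simply reported back when the ticket is completed.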


Accounting/Reports

• SAFE contains an extensive accounting sub-system.

• Accounting data is parsed into DB tables.
  – Do NOT mandate a fixed format; instead keep data close to raw format and define mappings to standard properties.
  – Easier to change system/policy without re-importing old data.
  – Easier to handle different service provider policies.
  – Single reports may combine data from multiple tables in different formats, provided reports are based on common properties.

• Service providers only need to provide DIRAC usage data in some convenient format.
  – Normally upload data daily.
  – Can also support storage accounting, though this does currently use a fixed format.
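The idea of keeping raw records and mapping them to standard properties can be sketched as follows. Everything here is hypothetical (the two site formats, the field names, and the property names `project` and `core_hours`); it only illustrates how one report can span tables in different formats as long as each table defines the same standard properties.

```python
# Raw usage records from two hypothetical providers, kept in their native layouts.
SITE_A = [{"user": "demo", "proj": "z01", "cores": 32, "wall_s": 3600}]
SITE_B = [{"login": "demo", "account": "z01", "cpu_hours": 12.5}]

# Per-table mappings from a raw record to the standard properties.
MAPPINGS = {
    "site_a": {
        "username": lambda r: r["user"],
        "project": lambda r: r["proj"],
        "core_hours": lambda r: r["cores"] * r["wall_s"] / 3600.0,
    },
    "site_b": {
        "username": lambda r: r["login"],
        "project": lambda r: r["account"],
        "core_hours": lambda r: r["cpu_hours"],
    },
}


def standard_view(table: str, record: dict) -> dict:
    """Evaluate every standard property for one raw record."""
    return {prop: fn(record) for prop, fn in MAPPINGS[table].items()}


def core_hours_by_project(tables: dict) -> dict:
    """Report combining all tables, written only against standard properties."""
    totals: dict = {}
    for name, records in tables.items():
        for rec in records:
            view = standard_view(name, rec)
            totals[view["project"]] = totals.get(view["project"], 0.0) + view["core_hours"]
    return totals


if __name__ == "__main__":
    print(core_hours_by_project({"site_a": SITE_A, "site_b": SITE_B}))
```

Because the raw records are never rewritten, a change of policy only means editing a mapping, not re-importing old data.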


Resource Management

• SAFE can provide more detailed resource management. Uses a 3-level model.

1. Project – Top level; corresponds to a grant of resources from the allocation panel, mostly internal to SAFE.

2. ProjectGroup – Internal project management grouping controlled by the project PI or designated managers through the web interface. These can be just compute budgets but may also correspond to unix groups if used to manage disk resources.

3. User – Individual user.

• Though this gives a lot of fine control to the PI/PM, it requires more integration with the service provider.
  – Sites can choose to use local resource management procedures instead.
  – Accounting does NOT depend on SAFE managing the resources.
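The 3-level model can be pictured as nested records. The sketch below is an assumption-laden illustration, not SAFE's schema: the class and field names are hypothetical, and the rule that group budgets may not exceed the project's allocation is one plausible policy chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    username: str


@dataclass
class ProjectGroup:
    code: str
    budget: float                      # compute budget granted by the PI
    used: float = 0.0
    members: list[User] = field(default_factory=list)

    def charge(self, amount: float) -> None:
        """Record usage against this group's budget."""
        if self.used + amount > self.budget:
            raise ValueError(f"group {self.code} would exceed its budget")
        self.used += amount


@dataclass
class Project:
    code: str
    allocation: float                  # grant from the allocation panel
    groups: list[ProjectGroup] = field(default_factory=list)

    def add_group(self, group: ProjectGroup) -> None:
        """Add a group, assuming group budgets must fit inside the allocation."""
        committed = sum(g.budget for g in self.groups)
        if committed + group.budget > self.allocation:
            raise ValueError(f"project {self.code} allocation exceeded")
        self.groups.append(group)


if __name__ == "__main__":
    project = Project(code="z01", allocation=1000.0)
    group = ProjectGroup(code="z01", budget=400.0, members=[User("demo")])
    project.add_group(group)
    group.charge(150.0)
    print(group.used, group.budget - group.used)
```

A site that keeps its own resource management would skip this layer entirely and only upload usage data.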

