EPCC, University of Edinburgh: DIRAC and SAFE
DIRAC requirements
• DIRAC serves a variety of different user communities.
  – These have different computational requirements, best served by different types of computer.
  – User communities are spread across many different institutions.
  – Resources are geographically distributed and run by multiple organisations.
  – Some of these resources are provided by existing services with existing procedures.
• Funding is limited.
  – Mostly only hardware was funded.
  – Need to provide the rest of the service as efficiently as possible.
  – Need to utilise existing infrastructure and processes where possible.
  – Avoid unnecessary complications.
Stakeholders
• DIRAC management
  – Need an overview of resource usage to inform allocation policy.
  – Need mechanisms to implement allocation policy.
• Research communities
  – Need resource usage information to manage the community science programme.
  – Need mechanisms to manage community membership.
  – Need mechanisms to manage community resources.
• Users
  – Need to be able to request accounts (frequently at remote institutions).
  – Need to access accounts remotely.
  – Want to get on with science without additional complications.
Level of integration
• Most requirements for integration are at the management level.
• Experience suggests a strong correlation between user communities and compute resource.
  – Communities will choose resources appropriate to their science.
  – Users will want to access the unique features of these resources.
  – Though projects may span resources, most individual users will probably stick to a single system.
• Global accounts, single sign-on etc. are not essential.
GRID?
• Computational grid not appropriate.
  – Grids are designed to provide uniform access to interchangeable resources. DIRAC resources are complementary, not interchangeable.
  – Provides a standard interface, but only to features common to all systems.
• Data grid may be more relevant.
  – Depends on the data handling requirements of user communities.
  – Need to gather more requirements.
SAFE design principles
• SAFE has been built to provide a single point of contact for users of national HPC services.
  – Role is essentially that of the ITIL service desk.
  – Originally deployed for the HPCx service, currently used for the HECToR service. Also used for internal EPCC services.
• Provides a well-defined interface for service providers.
  – Tries to express all requests as standard tickets.
  – Supports multiple service providers with different support policies.
• Has to make very few technological assumptions.
  – Users can come from any academic institution. Can't assume much more than email and web.
  – We usually bid to run the service in parallel with hardware procurement. We have little say over hardware or system software and need to adapt SAFE quickly to provide the service if the bid is successful.
SAFE design principles II
• Has to be flexible rather than prescriptive.
  – Requirements have changed constantly over the 10 years of SAFE development.
  – Need to be able to quickly implement new reports or policies generated by RCs or policy panels.
  – Need to maintain access to old data even when the current system/policy has changed.
  – Need to be able to integrate new services into existing instances.
  – Need to be able to adapt tickets to meet the needs of service teams and underlying infrastructure.
• Controlling our own software gives us a great deal of flexibility.
  – We have built up an extensive toolbox to allow rapid implementation of new requirements.
What can SAFE offer DIRAC?
• Software already exists and is already managing the BG/Q service (minimal cost).
• It is designed to handle distributed user communities from many different institutions.
  – Many DIRAC users will already be familiar with it.
• It is designed to handle multiple service providers with different operating policies.
• While SAFE supports many features, sites only need to adopt those that fit their normal way of working.
SAFE as a service
• Can use the BG/Q SAFE to provide a service for the whole of DIRAC.
  – Host, install, maintain, modify where necessary.
  – Generates the necessary reports and statistics for the whole of DIRAC.
  – Provides a single point to manage project membership, account creation etc.
  – Lightweight and non-intrusive integration with service providers.
  – Special handling to work within local policies.
  – Choice over which features are adopted.
  – A centralised service requires minimal changes to existing software and only needs O(N) interactions, not O(N^2).
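The O(N) versus O(N^2) point can be illustrated with a small count. This is only a sketch under one assumption that is not stated on the slide: without a central service, every pair of sites would need its own integration, while with a central service each site integrates once with the hub. The function names are hypothetical.

```python
def hub_links(n_sites: int) -> int:
    """Integrations needed when each site talks only to one central service."""
    return n_sites


def pairwise_links(n_sites: int) -> int:
    """Integrations needed when every pair of sites must be connected directly."""
    return n_sites * (n_sites - 1) // 2


# Compare the two approaches for a few illustrative site counts.
for n in (2, 5, 10):
    print(f"{n} sites: hub={hub_links(n)}, pairwise={pairwise_links(n)}")
```

The pairwise count grows quadratically, so each additional site makes the centralised approach comparatively cheaper.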
Account creation.
• Accounts requested via SAFE.
  – Sends request to project manager.
  – Once approved, raises a ticket with the service provider.
  – Default is to do this by email; XML is available for scripts.
Hi Support,

This user has been authorised to have an account on one of our machines. Please create a new user account for them using the following information.

Task ID: 46067
Machine: hector
Username: demo
Email: [email protected]
User's Name: Dr Stephen P Booth
Consortium: z01 - USL
Project Group(s): z01
UID: 13535
GID: 1001

Thanks,
The SAF.

P.S. You can see the current pending queue by looking at https://www.hector.ac.uk/safe/servlet/SysAdminServlet
<SysAdmin>
  <Id>46067</Id>
  <Type>New User</Type>
  <Status>Pending</Status>
  <StarDate>2012-6-4 11:3:51</StarDate>
  <EndDate>0000-00-00 00:00:00</EndDate>
  <Machine>hector</Machine>
  <Project>
    <Code>z01</Code>
    <Name>USL</Name>
  </Project>
  <ProjectGroup>
    <Code>z01</Code>
    <GroupID>1001</GroupID>
  </ProjectGroup>
  <Account>
    <Name>demo</Name>
    <UID>13535</UID>
    <GID>1001</GID>
    <Groups>z01</Groups>
  </Account>
  <Person>
    <Name>
      <Title>Dr</Title>
      <Firstname>Stephen</Firstname>
      <Lastname>Booth</Lastname>
    </Name>
    <Email>[email protected]</Email>
  </Person>
</SysAdmin>
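A service provider that prefers scripts to email could read the XML form of the ticket with a few lines of Python. The following is a minimal sketch, not EPCC's own tooling: the element names are taken from the example ticket above (trimmed here for brevity), while the function name, the returned dictionary keys and the `useradd` command line are illustrative assumptions. The command is only built as a list and never executed; a real site would substitute its own account creation procedure.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example ticket; only the fields used below are kept.
TICKET_XML = """<SysAdmin>
  <Id>46067</Id><Type>New User</Type><Status>Pending</Status>
  <Machine>hector</Machine>
  <Project><Code>z01</Code><Name>USL</Name></Project>
  <Account><Name>demo</Name><UID>13535</UID><GID>1001</GID><Groups>z01</Groups></Account>
</SysAdmin>"""


def parse_new_user_ticket(xml_text: str) -> dict:
    """Extract the fields a site needs to create an account (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    account = root.find("Account")
    return {
        "ticket_id": int(root.findtext("Id")),
        "type": root.findtext("Type"),
        "status": root.findtext("Status"),
        "machine": root.findtext("Machine"),
        "project": root.findtext("Project/Code"),
        "username": account.findtext("Name"),
        "uid": int(account.findtext("UID")),
        "gid": int(account.findtext("GID")),
    }


ticket = parse_new_user_ticket(TICKET_XML)

# Illustration only: what a site script might run. Not executed here.
command = ["useradd", "-u", str(ticket["uid"]), "-g", str(ticket["gid"]), ticket["username"]]
print(ticket)
print(" ".join(command))
```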
Completing tickets.
• Once the account is created, the service provider needs to notify SAFE via a web form.
  – Manually via browser or automatically via script.
  – Service provider can reject tickets.
  – Initial (one-shot?) password returned to SAFE for retrieval by the user.
  – Similar mechanism possible for password resets.
• We can gather more information if needed.
  – IP address ranges have been requested.
• We can encode local policies on usernames and UID/GID ranges into SAFE.
• Or we can let the site choose UID/GID/username and return the values to SAFE when completing the ticket.
  – UID/GID only need to be managed centrally if supporting file-system cross mounts.
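The two UID options above (central policy versus site-chosen values) can be sketched as follows. This is a hedged illustration, not SAFE's implementation: the `UidPolicy` class, its method names and the numeric range are all hypothetical. `allocate` models SAFE choosing a UID within a site's permitted range, and `record_site_choice` models a site returning its own value when completing a ticket.

```python
from dataclasses import dataclass, field


@dataclass
class UidPolicy:
    """Hypothetical per-site UID range policy."""
    low: int
    high: int
    used: set = field(default_factory=set)

    def allows(self, uid: int) -> bool:
        """True if the UID is inside the site's range and not already taken."""
        return self.low <= uid <= self.high and uid not in self.used

    def allocate(self) -> int:
        """Central choice: hand out the lowest free UID in the range."""
        for uid in range(self.low, self.high + 1):
            if uid not in self.used:
                self.used.add(uid)
                return uid
        raise RuntimeError("UID range exhausted")

    def record_site_choice(self, uid: int) -> None:
        """Site choice: accept a UID returned by the site if it fits the policy."""
        if not self.allows(uid):
            raise ValueError(f"UID {uid} violates site policy")
        self.used.add(uid)


# Illustrative range only; real ranges would come from each site.
policy = UidPolicy(low=13000, high=13002, used={13000})
central_uid = policy.allocate()      # chosen centrally
policy.record_site_choice(13002)     # reported back by the site
print(central_uid, sorted(policy.used))
```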
Accounting/Reports
• SAFE contains an extensive accounting sub-system.
• Accounting data is parsed into DB tables.
  – Do NOT mandate a fixed format; instead keep data close to the raw format and define mappings to standard properties.
  – Easier to change system/policy without re-importing old data.
  – Easier to handle different service provider policies.
  – Single reports may combine data from multiple tables in different formats, provided the reports are based on common properties.
• Service providers only need to provide DIRAC usage data in some convenient format.
  – Normally upload data daily.
  – Can also support storage accounting, though this does currently use a fixed format.
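The idea of keeping raw records and mapping them to standard properties can be sketched as below. This is an assumption-laden illustration rather than SAFE's actual schema: the two site formats, their field names, the property names (`user`, `project`, `core_hours`) and the helper functions are all invented for the example. Each table keeps its own columns, and a report only ever asks for the common properties.

```python
from collections import defaultdict

# Raw records stay close to each provider's own (hypothetical) format.
SITE_A_ROWS = [
    {"user": "demo", "proj": "z01", "cores": 32, "wall_secs": 7200},
    {"user": "alice", "proj": "z01", "cores": 64, "wall_secs": 3600},
]
SITE_B_ROWS = [
    {"login": "demo", "account": "z01", "node_hours": 3.0, "cores_per_node": 16},
]

# One mapping per table: standard property name -> function of the raw row.
MAPPINGS = {
    "site_a": {
        "user": lambda r: r["user"],
        "project": lambda r: r["proj"],
        "core_hours": lambda r: r["cores"] * r["wall_secs"] / 3600,
    },
    "site_b": {
        "user": lambda r: r["login"],
        "project": lambda r: r["account"],
        "core_hours": lambda r: r["node_hours"] * r["cores_per_node"],
    },
}


def property_value(table: str, row: dict, prop: str):
    """Evaluate a standard property on a raw row of the given table."""
    return MAPPINGS[table][prop](row)


def core_hours_by_project(tables: dict) -> dict:
    """A single report combining tables that are stored in different formats."""
    totals = defaultdict(float)
    for table, rows in tables.items():
        for row in rows:
            project = property_value(table, row, "project")
            totals[project] += property_value(table, row, "core_hours")
    return dict(totals)


report = core_hours_by_project({"site_a": SITE_A_ROWS, "site_b": SITE_B_ROWS})
print(report)
```

Because only the mappings know about raw column names, a change in charging policy can be expressed as a new mapping without re-importing old rows.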
Resource Management
• SAFE can provide more detailed resource management. Uses a 3-level model.
  1. Project – Top level; corresponds to a grant of resources from the allocation panel, mostly internal to SAFE.
  2. ProjectGroup – Internal project management grouping controlled by the project PI or designated managers through the web interface. These can be just compute budgets but may also correspond to unix groups if used to manage disk resources.
  3. User – Individual user.
• Though this gives a lot of fine control to the PI/PM, it requires more integration with the service provider.
  – Sites can choose to use local resource management procedures instead.
  – Accounting does NOT depend on SAFE managing the resources.
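The three-level model can be pictured with a few small classes. This is a sketch for illustration only: the class fields, the budget rules (a group budget cannot exceed what is left of the project allocation, and only group members can be charged) and the numbers are assumptions, not a description of how SAFE enforces budgets.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """Level 3: an individual user."""
    username: str


@dataclass
class ProjectGroup:
    """Level 2: a budget managed by the PI or designated managers."""
    code: str
    budget: float
    members: list = field(default_factory=list)
    used: float = 0.0

    def charge(self, user: User, amount: float) -> None:
        """Record usage against the group budget (hypothetical rules)."""
        if user not in self.members:
            raise PermissionError(f"{user.username} is not in group {self.code}")
        if self.used + amount > self.budget:
            raise ValueError(f"group {self.code} would exceed its budget")
        self.used += amount


@dataclass
class Project:
    """Level 1: a grant of resources from the allocation panel."""
    code: str
    allocation: float
    groups: dict = field(default_factory=dict)

    def add_group(self, code: str, budget: float) -> ProjectGroup:
        """Carve a group budget out of the remaining project allocation."""
        committed = sum(g.budget for g in self.groups.values())
        if committed + budget > self.allocation:
            raise ValueError("group budgets would exceed the project allocation")
        group = ProjectGroup(code, budget)
        self.groups[code] = group
        return group


# Illustrative values only.
project = Project("z01", allocation=1000.0)
group = project.add_group("z01-a", budget=600.0)
demo = User("demo")
group.members.append(demo)
group.charge(demo, 100.0)
print(group.used, group.budget - group.used)
```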