EGEE Infrastructure, Services, & Operations
INFSO-RI-508833
Enabling Grids for E-sciencE
www.eu-egee.org
Ian Bird, CERN IT, SA1 Activity Leader
1st EGEE User Forum, 2nd March 2006
Outline
• Introduction – history
• Middleware and Services
• Middleware distributions
• Operations
• User Support
• Access to resources & introducing new VOs
• What can you get from EGEE?
  – And what does it cost?
• From EGEE to EGEE-II
• Outlook
SA1 – Operations & Management 97%
SA2 – Network Services 3%
Introduction
History
• EGEE infrastructure (middleware distribution and operations) was built up by the LCG project during the 18 months prior to the start of EGEE
  – The LCG work formed the basic infrastructure of EGEE
  – The middleware distribution retained the LCG name (LCG-2.x), as it was expected to be replaced by gLite
  – Now the middleware distribution will evolve with additional or replacement services coming from gLite or elsewhere
• EGEE started in April 2004 with a running grid infrastructure
  – 40 sites, 3000 CPU
  – Basic operations
  – Developed certification and deployment process
• Now expanded to:
  – 200 sites, >20 000 CPU, 40 countries
  – Managed operations – stability of sites
  – >10 000 jobs / day sustained over the last year
[Charts: Sites; CPU; Jobs/day]
Middleware & Services
Grid middleware
• Middleware is the software and services that sit between the user application and the underlying computing and storage resources, providing uniform access to those resources.
• The Grid middleware services should:
  – Find convenient places for the application to be run
  – Optimise use of resources
  – Organise efficient access to data
  – Deal with authentication to the different sites that are used
  – Run the job & monitor progress
  – Recover from problems
  – Transfer the result back to the scientist
Middleware Distributions and Stacks
• Terminology:
  – EGEE deploys a middleware distribution
    • Drawn from various middleware products, stacks, etc.
    • Do not confuse the distribution with development projects or with software packages
    • Count on 6 months from software developer “release” to production deployment
  – The EGEE distribution:
    • Current production version labelled: LCG-2.7.0
    • Next version labelled: gLite-3.0
    • Name change intended to reduce confusion
• EGEE distribution contents:
  – LCG-2.7.0:
    • VDT – packaging Globus 2.4, Condor, MyProxy
    • EDG workload management
    • LCG components: BDII (information system), catalogue (LFC), DPM, data management libraries and CLI tools, monitoring tools
    • gLite: R-GMA, VOMS, FTS
  – gLite-3.0 (evolution of LCG-2.7.0):
    • Based on LCG-2.7.0, and
    • gLite workload management
    • Other gLite components (not in the distribution but provided as services): AMGA, Hydra, Fireman, gLite-IO
CAs, Authentication, Authorization
Authentication
• Use of GSI, X.509 certificates
  – Generally issued by national certification authorities
• Agreed network of trust:
  – International Grid Trust Federation (IGTF)
    • EUGridPMA (European Grid PMA)
    • APGridPMA (Asia-Pacific Grid PMA)
    • TAGPMA (The Americas Grid PMA)
  – All EGEE sites will usually trust all IGTF root CAs
Authorization
• Until LCG-2.7.0: via grid-map files only
• From LCG-2.7.0: using VOMS extended proxies
  – Call-outs to local authorization services
  – Integration with grid services under way – compute elements, storage systems
  – For some time authorization will be a mixture of call-outs and grid-map files, until all services understand extended proxies
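The mixed authorization mode on this slide, where a service uses VOMS attributes if it understands extended proxies and otherwise falls back to a grid-map file, can be sketched roughly as follows. This is only an illustration, not the actual middleware logic: the `Proxy` class, the `authorize` function, and all DNs, FQANs, and account names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Proxy:
    """A user's proxy credential, reduced to what this sketch needs."""
    subject_dn: str
    # VOMS attributes (FQANs); empty for a plain, non-extended proxy.
    fqans: List[str] = field(default_factory=list)


# Hypothetical site configuration (illustrative values only).
GRID_MAP: Dict[str, str] = {"/C=CH/O=Example/CN=Alice": "alice01"}
VOMS_POLICY: Dict[str, str] = {
    "/examplevo/Role=production": "exprod",
    "/examplevo": "exuser",
}


def authorize(proxy: Proxy, service_understands_voms: bool) -> Optional[str]:
    """Return a local account name, or None if access is denied."""
    if service_understands_voms:
        # Use the first VOMS attribute that the local policy recognises.
        for fqan in proxy.fqans:
            if fqan in VOMS_POLICY:
                return VOMS_POLICY[fqan]
    # Fallback: classic grid-map lookup by certificate subject.
    return GRID_MAP.get(proxy.subject_dn)
```

With this sketch, the same proxy maps to a role-based account on a VOMS-aware service and to the per-user grid-map account on a service that has not yet been integrated.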
Basic Services
Job Management
• Workload Management
  – Resource Broker
  – DLI/SI interface to catalogues for data-based scheduling
  – Bulk job submission (gLite-3.0)
  – DAGs (gLite-3.0)
  – Push/pull mode (pull untested – gLite-3.0)
• Compute Element (CE):
  – Globus/EDG/LCG; Condor_C (VO-based scheduling) in gLite-3.0
• Logging & Bookkeeping
• Local batch systems:
  – LSF, PBS, Condor, (Sun Grid Engine)
• Additional tools:
  – Ability to “peek” at stdout/stderr of running jobs
  – User job monitoring – look at the status (state, CPU time, etc.) of running jobs
Data Management
• File and replica catalogues (LFC)
  – Central or local (not distributed)
  – Replication via Oracle, or squid caches tested by LCG
  – Secure
• File Transfer Service (FTS)
  – Reliable data transfer
  – Uses gridftp or srmcopy as transport
• Storage Elements based on SRM interface
  – DPM: implements POSIX ACLs, VOMS roles/groups (gLite-3.0)
  – Other available SEs: dCache, Castor
  – Deprecated: “Classic SE” – basically just gridftp
• Metadata catalogue:
  – AMGA (gLite-3.0 – partial support)
• Secure keystore:
  – Hydra (gLite-3.0 – partial support)
• Utilities and IO libraries:
  – lcg-utils
  – GFAL – this is the SRM client library
  – gLiteIO – expect functionality to be replaced
Other services
Information system
• BDII (implementation of Globus MDS)
• GLUE schema
• Several tools to access information
• FCR site selection tool (see next slide)
Monitoring & Accounting
• R-GMA used as monitoring framework
• Aggregation of various sources of monitoring data
• Accounting: APEL package:
  – After-the-fact accounting
  – Uses GGF Usage Record as schema
  – Does not provide user-level data – but this is a legal/privacy issue, not a technical one!
Selecting resources
• Selecting resources:
  – Tool that uses dynamically updated data about sites
    • Site functional tests
  – VO can:
    • Select critical tests
    • White/black list sites
  – VO gets a customised set of “good” sites – a view in the information system
  – VO can add VO-specific tests
• Can be used by the RB or another workload management system to run on good/stable sites
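The selection logic described above (critical tests chosen by the VO, plus white and black lists) might look roughly like the sketch below. It is not the FCR tool's implementation: the function name `good_sites`, the data layout, the test names, and the rule that a blacklist entry overrides a whitelist entry are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List


def good_sites(
    test_results: Dict[str, Dict[str, bool]],
    critical_tests: Iterable[str],
    whitelist: Iterable[str] = (),
    blacklist: Iterable[str] = (),
) -> List[str]:
    """Return the VO's view of usable sites, sorted by name.

    test_results maps site name -> {test name: passed?}.
    A missing test result is treated as a failure.
    """
    critical = list(critical_tests)
    white, black = set(whitelist), set(blacklist)
    selected = []
    for site in sorted(test_results):
        if site in black:
            continue  # assumption: blacklisting always wins
        results = test_results[site]
        passed = all(results.get(test, False) for test in critical)
        if site in white or passed:
            selected.append(site)
    return selected


# Hypothetical site functional test results.
results = {
    "site-a": {"job-submit": True, "replica-mgmt": True},
    "site-b": {"job-submit": True, "replica-mgmt": False},
    "site-c": {"job-submit": False, "replica-mgmt": True},
}
view = good_sites(results, ["job-submit"], whitelist=["site-c"], blacklist=["site-b"])
```

A workload management system could then restrict matchmaking to the sites in `view`.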
Middleware distributions Deployment
Process to deployment
[Diagram. Labels: Integration; VDT/OSG; OMII-Europe; JRA1; SA3; …; Testing & Certification; Support, analysis, debugging; Production service; SA1; Pre-production service; Middleware providers; SA3; Certification activities SA3+SA1]
Release Process (simplified)
[Flow diagram. Recoverable labels, in reading order:
  – Inputs: Applications, RC, EIS, GIS, GDB, CICs, Developers; Bugs/Patches/Tasks in Savannah (assign and update cost)
  – Head of Deployment: prioritization & selection; list for next release (can be empty); components ready at cutoff
  – C&T: integration & first tests; internal releases
  – EIS: user-level install of client tools
  – C&T: full deployment on test clusters; functional/stress tests (~1 week)
  – Releases: Internal Client Release, Client Release, Service Release, Updates Release, Core Service Release]
Deployment process
[Flow diagram. Recoverable labels:
  – Release(s); Re-Certify (CIC); certification is run daily
  – Update User Guides (EIS); update Release Notes (GIS)
  – Documents: Release Notes, Installation Guides, User Guides
  – Deploy Client Releases (user space): GIS
  – Deploy Service Releases (optional): CICs, RCs
  – Deploy Major Releases (mandatory): ROCs, RCs; YAIM
  – Timing labels: every month; every month; every 3 months, on fixed dates!; at own pace]
Certification test bed
[Diagram. Components shown: RB, BDII, MDS, CE, SE, worker nodes (WNs), UI, RLS (MySQL and Oracle), Proxy; clusters Cluster_1 to Cluster_6; batch systems LSF and Condor; installation labels LCFGng and Lite install; CertTB]
Time to upgrade
• Time to upgrade ~constant (~2.5 sites/day)
• Takes a long time to upgrade the entire infrastructure
• Better now than it was – site functional tests and operational oversight
• Need to move away from doing full upgrades more than 1-2 times / year
  – But need to be able to deploy updates, new tools, security patches, etc.
[Chart label: LCG-2.6.0]
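A back-of-the-envelope calculation shows why full upgrades are costly. It combines the ~2.5 sites/day rate from this slide with the figure of 200 sites quoted in the History slide; treating the rate as constant and the upgrade as strictly sequential is an assumption, and the function name is hypothetical.

```python
def days_to_upgrade(n_sites: int, sites_per_day: float = 2.5) -> float:
    """Days needed to upgrade every site at a constant upgrade rate."""
    return n_sites / sites_per_day


days = days_to_upgrade(200)
print(f"{days:.0f} days for one full upgrade")  # prints "80 days for one full upgrade"
```

Under these assumptions, two full upgrades per year would leave the infrastructure in a mixed-version state for roughly 160 days of the year.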
Desired scenario
• Steady-state with:
  – Components delivered (as far as possible) independent of each other
  – Developed according to realistic schedules – not constrained by artificial release deadlines
  – Production service running stable, tested (certified) versions of services and tools
    • Major upgrades only 1 or 2 times per year
    • Potential for upgrading individual services
    • Client tools: new versions deployed as needed
    • Emphasis on reliability, stability, performance, backward compatibility, …
  – Pre-production service running new, but certified, versions of services
    • Anticipated as upgrades to production services (beta releases of next versions or new services)
    • Allowing reasonable-scale application testing and integration with new versions
  – Certification testbed running full regression, stress, and functional tests
    • Pre-requisite before moving to pre-production and production
• Software can be rejected (not working, not ready, …)
  – During testing/certification
  – During pre-production
• Net result must be that the production service is stable and as reliable as possible, and evolves incrementally and in a controlled way
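The staged path in this scenario (certification, then pre-production, then production, with rejection possible at either gate) can be expressed as a small sketch. The function name `release_path` and the two check callables are hypothetical stand-ins for the real test suites and pre-production experience.

```python
from typing import Callable

# A check takes a component name and reports whether it passed.
Check = Callable[[str], bool]


def release_path(component: str, certified: Check, preprod_ok: Check) -> str:
    """Walk one component through the gates in order."""
    # Gate 1: regression, stress, and functional tests on the certification testbed.
    if not certified(component):
        return "rejected during testing/certification"
    # Gate 2: application testing at reasonable scale on the pre-production service.
    if not preprod_ok(component):
        return "rejected during pre-production"
    return "deployed to production"
```

For example, `release_path("new-service", lambda c: True, lambda c: False)` returns `"rejected during pre-production"`, so nothing reaches production without passing both gates.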
Checklist for a new service
• User support procedures (GGUS)
  – Troubleshooting guides + FAQs
  – User guides
• Operations team training
  – Site admins
  – CIC personnel
  – GGUS personnel
• Monitoring
  – Service status reporting
  – Performance data
• Accounting
  – Usage data
• Service parameters
  – Scope – Global/Local/Regional
  – SLAs
  – Impact of service outage
  – Security implications
• Contact info
  – Developers
  – Support contact
  – Escalation procedure to developers
• Interoperation
  – Documented issues
• First-level support procedures
  – How to start/stop/restart service
  – How to check it’s up
  – Which logs are useful to send to CIC/developers, and where they are
• SFT tests
  – Client validation
  – Server validation
  – Procedure to analyse these
    • Error messages and likely causes
• Tools for CIC to spot problems
  – GIIS monitor validation rules (e.g. only one “global” component)
  – Definition of normal behaviour
    • Metrics
• CIC Dashboard
  – Alarms
• Deployment info
  – RPM list
  – Configuration details
  – Security audit
This is what it takes to make a reliable production service from a middleware component
Not much middleware is delivered with all this … yet
Operations
Grid Operations
• Services:
  – Production service
  – Pre-production service
  – Operational security – incident response
• Operation process includes:
  – Problem detection
  – Reporting
  – Problem solving
  – Escalation procedures
EGEE Operations Structure
• Operations Management Centre (OMC)
• Core Infrastructure Centres (CIC)
  – Manage daily grid operations – oversight, troubleshooting
    • “Operator on Duty”
  – Run infrastructure services
  – UK/I, Fr, It, CERN, Ru, Taipei
• Regional Operations Centres (ROC)
  – Front-line support for user and operations issues
  – Provide local knowledge and adaptations
  – One in each region – many distributed
• User Support Centre (GGUS)
  – At FZK: provides single point of contact (service desk) + portal
EGEE Operations Process
• Grid operator on duty
  – 6 teams working in weekly rotation
    • CERN, IN2P3, INFN, UK/I, Ru, Taipei
  – Crucial in improving site stability and management
• Operations coordination
  – Weekly operations meetings
  – Regular ROC, CIC managers meetings
  – Series of EGEE Operations Workshops
    • Nov 04, May 05, Sep 05, (June 06?)
• Geographically distributed responsibility for operations:
  – There is no “central” operation
  – Tools are developed/hosted at different sites:
    • GOC DB (RAL), SFT (CERN), GStat (Taipei), CIC Portal (Lyon)
• Procedures described in Operations Manual
  – Introducing new sites
  – Site downtime scheduling
  – Suspending a site
  – Escalation procedures
  – etc.
Operations tools: Dashboard
• Dashboard provides top-level view of problems:
  – Integrated view of monitoring tools (SFT, GStat) shows only failures and assigned tickets
  – Single tool for ticket creation and notification emails, with detailed problem categorisation and templates
  – Detailed site view with table of open tickets and links to monitoring results
  – Ticket browser highlighting expired tickets
[Screenshot labels: Test summary (SFT, GStat); GGUS ticket status; Problem categories; Sites list (reporting new problems)]
Developed and operated by CC-IN2P3: http://cic.in2p3.fr/
Operations/deployment support
[Diagram labels: Operations Coordination Centre; OSCT; JSPG; Regional Operations Centres; Resource Centres; Coordination, middleware deployment; Operational security coordination; 1st level support; 2nd level support]
JSPG: Joint Security Policy Group
OSCT: Operational Security Coordination Team
Operations support workflows
[Diagram labels: Grid Operator on-duty; OSCT; Regional Operations Centres; Resource Centres; 1st level support; 2nd level support]
• Monitoring shows a problem
• Operator submits a GGUS ticket against the ROC and cc’s the site. The ticket is followed until it is solved
• ROC and site work to resolve the problem
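The three workflow steps on this slide (monitoring detects a problem, the operator on duty opens a ticket against the ROC with the site in copy, and the ticket is followed until solved) can be modelled in a few lines. The `Ticket` class and both functions are hypothetical; this is not the GGUS data model or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Ticket:
    site: str
    assigned_to: str  # the ROC responsible for the site
    cc: List[str] = field(default_factory=list)
    status: str = "open"


def submit_ticket(site: str, roc: str) -> Ticket:
    """Operator on duty: ticket goes against the ROC, with the site in copy."""
    return Ticket(site=site, assigned_to=roc, cc=[site])


def follow_up(ticket: Ticket, monitoring_ok: bool) -> Ticket:
    """The ticket stays open until monitoring shows the problem is gone."""
    if monitoring_ok:
        ticket.status = "solved"
    return ticket
```

Keeping the ROC as the assignee while copying the site reflects the split described in the slide: the ROC owns the follow-up, and the site is informed and works with the ROC on the fix.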
Evolution of SFT metric
[Chart: daily values, July to November; series: available sites, available CPU; annotation: missing log data]
Security Policy
• Joint Security Policy Group
  – EGEE with strong input from OSG
  – Policy set: Security & Availability Policy; Usage Rules; Certification Authorities; Audit Requirements; Incident Response; User Registration & VO Management; Application Development & Network Admin Guide; VO Security
• Policy revisions
  – Grid Acceptable Use Policy (AUP)
    • https://edms.cern.ch/document/428036/
    • Common, general and simple AUP for all VO members using many Grid infrastructures
      • EGEE, OSG, SEE-GRID, DEISA, national Grids…
  – VO Security
    • https://edms.cern.ch/document/573348/
    • Responsibilities for VO managers and members
    • VO AUP to tie members to Grid AUP accepted at registration
  – Incident Handling and Response
    • https://edms.cern.ch/document/428035/
    • Defines basic communications paths
    • Defines requirements (MUSTs) for IR
      • reporting
      • response
      • protection of data
      • analysis
    • Not to replace or interfere with local response plans
Operational Security Coordination Team (OSCT)
– What it is not:
    • Not focused on middleware security architecture
    • Not focused on vulnerabilities (see Vulnerabilities Group)
– Focus on Incident Response Coordination
    • Assume it’s broken, how do we respond?
    • Planning and Tracking
– Focus on ‘Best Practice’
    • Advice
    • Monitoring
    • Analysis
– Coordinators for each EGEE ROC
    • plus OSG, LCG Tier 1 + Taipei
• OSCT membership: ROC security contacts
[Diagram: 3 strategies. Labels: Handbook; Incident Response; Policy; Procedures; Resources; Reference; Playbook; Security Service Challenge; SSC1 – Job Trace; SSC2 – Storage Audit; Infrastructure; Monitoring Tools; Infrastructure; Agents; Deployment]
Vulnerability Group
• Was set up last summer (CCLRC lead)
• Purpose: inform developers, operations, and site managers of vulnerabilities as they are identified, and encourage them to produce fixes or to reduce their impact
• Set up (private!) database of vulnerabilities
  – To inform sites and developers
• Urgent action: OSCT to manage
• After reaction time (45 days)
  – Vulnerability and risk analysis given to OSCT to define action – publication?
  – Will not publish vulnerabilities with no solution
• Intend to report progress and statistics on vulnerabilities by middleware component and response of developers
• Balance between open, responsible public disclosure and creating security issues with precipitous publication
• Following first experience in implementing this process, review of procedures under way, including need for appropriate risk analyses
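The handling rules on this slide can be read as a small decision procedure around the 45-day reaction time. The sketch below is one possible reading, not the group's documented process: the function name, its parameters, and the wording of the returned states are hypothetical; only the 45-day figure and the "no publication without a solution" rule come from the slide.

```python
from datetime import date, timedelta

REACTION_TIME = timedelta(days=45)  # reaction time quoted on the slide


def next_step(reported_on: date, today: date, has_solution: bool, urgent: bool = False) -> str:
    """Suggest what happens to a vulnerability entry on a given day."""
    if urgent:
        return "urgent action managed by OSCT"
    if today - reported_on < REACTION_TIME:
        # Still inside the reaction window: entry stays in the private database.
        return "private: sites and developers informed"
    if not has_solution:
        return "analysis passed to OSCT; not published (no solution)"
    return "analysis passed to OSCT; publication may be considered"
```

For instance, an entry reported on 1 January 2006 stays private on 31 January and reaches the OSCT decision point on 15 February, exactly 45 days later.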
User Support
Goals
• A single access point for support
• A portal with well-structured information and updated documentation
• Knowledgeable experts
• Correct, complete and responsive support
• Tools to help resolve problems
  – search engines
  – monitoring applications
  – resources status
• Examples, templates, specific distributions for software of interest
• Interface with other Grid support systems
• Connection with developers, deployment, operation teams
• Assistance during production use of the grid infrastructure
The Support Model: “Regional Support with Central Coordination”
The ROCs, VOs and other project-wide groups, such as the Core Infrastructure Centre (CIC), middleware groups (JRA), network groups (NA), and service groups (SA), are connected via a central integration platform provided by GGUS.
[Diagram: Central Application (GGUS) with web portal interface; TPM; regional support units (ROC 1 … ROC 10); user support units (VO Support); technical support units (Deployment Support, Middleware Support, Network Support, Operations Support)]
The GGUS System
GGUS Portal: user services
• Browseable tickets
• Search through solved tickets
• Useful links (Wiki FAQ)
• Broadcast tools
• Latest news
• GGUS search engine
• Updated documentation (Wiki FAQ)
[Diagram: support flow
  – User
  – First line support: TPM (Grid experts, GGUS supporters); VO-TPM (VO experts)
  – Second line support: VO Support Units; Middleware Support Units; Deployment Support Units; Operations Support; ROC Support Units; Network Support]
Performance statistics
• Tickets per submitter: CIC 144; GGUS user 137
• Average processing times (hh:mm:ss). Each chart shows two series, in this order: average time from ticket creation to ticket assignment; average time from ticket assignment to ticket solution
  – CMS tickets, September: 00:00:00; 140:01:52
  – CMS tickets, October: 0:18:35; 21:03:45
  – TPM, October: 0:01:34; 2:38:37
  – All ROCs, October: 1:35:16; 41:59:13
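The processing times on these charts are given as hh:mm:ss with hour counts above 24, so comparing them is easier after converting to seconds. The helper below is a small sketch; the function name is hypothetical, and reading the second value of each pair as the assignment-to-solution time is an assumption based on the legend order.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# October values, assumed to be the assignment-to-solution averages.
tpm_solution = to_seconds("2:38:37")
roc_solution = to_seconds("41:59:13")
ratio = roc_solution / tpm_solution
```

Here `ratio` expresses how much longer the ROC average is than the TPM average under that reading of the charts.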
A peak of 80 tickets per day has been reached.
[Bar chart: amount of tickets per support unit, November 2005: 315 tickets. Support units: Castor, Generic, Deployment, GlobalGridUserSupport, NetworkOperations, ROC_Asia/Pacific, ROC_CE, ROC_CERN, ROC_DE/CH, ROC_France, ROC_Italy, ROC_North, ROC_Russia, ROC_SE, ROC_SW, ROC_UK/Ireland, SecurityManagement, TPM, VO Support, Workload Management. Bar values in extraction order (not reliably matched to units): 15, 22, 1, 11, 18, 23, 34, 19, 46, 16, 13, 15, 26, 42, 4, 16, 2, 1]
New VOs; Access to Resources; Benefits & Costs
How new VOs find resources
Various possibilities:
1. Pilot applications:
  – Expectation that they have access to resources provided by many partners
    • For EGEE-II this is specified in TA
2. Applications reviewed and approved by EGAAP:
  – Negotiation via OAG to understand which ROCs/sites are willing to
    • Run services on behalf of the VO
    • Provide compute and/or storage resources
3. Other (self-supporting) applications
    • Own their own resources
    • Use EGEE infrastructure, operations, support
    • Many successful examples of such VOs
• 1 & 2:
  – Formal agreements (TA or MoU)
  – Should expect support via NA4 – but should also build up internal support teams
  – Expected to collaborate on improving the service – not just “users”
• 1, 2 & 3:
  – Full user and operations support
  – VOs need to provide support teams – some problems are application problems!
Negotiation
Operations Advisory Group (OAG)
• Brings together VOs and resource providers (ROCs)
• Negotiate for services and resources
• Should not always be an expectation of “free” resources
  – In future, applications should bring some resources with them
  – Computational and storage resources are not funded (!) by the project
EGEE – What can it deliver?
• A managed operation – providing a service:
  – A large number of sites of different sizes and capabilities
  – Developed operational procedures
    • Monitoring of the grid services providing access to resources
  – Operational security support; incident response coordination
  – Support services: user support, training, etc.
  – Building up considerable experience in grid-enabling a variety of different applications
  – Tools for monitoring of resources at a site … if required
• A new VO joining EGEE with a few sites:
  – Benefits from the operations and support – the VO sites can be monitored and supported as part of the infrastructure
  – Potentially access to other resources
  – It is a significant effort to set up a grid infrastructure from scratch
… and what does it cost?
• “The application VO buys into the EGEE model”
  – Actually not so restrictive now – supports many Linux flavours, IA64 (other teams have worked on AIX, SGI ports)
  – Simple installation of client software now (can be done on the fly)
  – Basic grid services are quite general, nothing really application-specific
• Some unresolved issues:
  – Commercial licensed software used by an application
  – Levels of privacy/security needed in some life-science applications
  – True interactivity
• … and of course, this is all new and rapidly evolving, with many problems still to be overcome
• VOs should:
  – Provide application support effort to help other VO users
  – Invest effort into helping improve the infrastructure and services – should not be simple “client – server” – rather a collaboration
Future
From EGEE to EGEE-II
• Simplify operations structure
  – ROCs absorb CIC roles – spread of expertise
• Introduce SA3
  – Integration, certification, distribution preparation
  – Emphasises focus on stability, reliability, performance rather than new features
  – Mechanism for integrating non-EGEE software – according to need
• Increased emphasis on
  – Platform support (OS, 64-bit, etc.)
  – Interoperability with other grids (international, regional, national, local, campus) and other middleware stacks (Unicore, ARC, …)
SA: 54% of total
• SA1 (operations): 86%
• SA2 (network): 3%
• SA3 (certification): 11%
Outlook
• LHC VOs must achieve reliable production and analysis in 2006
  – Will be making significant use of resources
• Consolidate and improve existing services. Focus on:
  – Reliability, robustness
  – Manageability
  – Performance, scalability
  – Evolution or replacement of services driven by needs of applications (or security/manageability)
• Expand grid operations
  – Spread expertise to ROCs
  – Collaboration with OSG, A-P
  – Start to negotiate SLAs
• New applications
  – Must bring resources – show commitment
  – Resource sharing and negotiation – must become streamlined
    • Will need a mechanism for cost/credit for use of resources
Summary
• EGEE Infrastructure – world’s largest multi-science production grid service
  – But does not exist in isolation: interoperability and interoperation are essential
• Significant improvements in reliability and stability over the last year
• Is in constant use for significant production work
  – Many VOs now use it as their primary resource
• Middleware distribution is
  – Consolidating existing and new services
  – Basis for evolution according to needs
• Shift from EGEE to EGEE-II
  – No major changes, but adjustments based on experience and anticipated evolution
  – Refine and improve processes