Grid Computing ESI 2011 - EPN Campus

Grid Computing ESI 2011, Markus Schulz, IT Grid Technology Group, CERN, WLCG



Overview
• Grid Computing
  – Definition, History, Fundamental Problems, Technology
  – EMI/gLite
• WLCG
  – Challenge
  – Infrastructure
  – Usage
• Grid Computing
  – Who?
  – How?
• What’s Next?

Focus
• Understanding concepts
• Understanding the current usage
• Understanding whether you can profit from grid computing
• Not much about projects, history, details…
• If you want practical examples:
  – https://edms.cern.ch/file/722398/1.4/gLite-3-UserGuide.pdf

What is a Computing Grid?
• There are many conflicting definitions
  – The term has been used for several years for marketing…
    • Marketing moved recently to “Cloud computing”
• Ian Foster and Carl Kesselman
  – “coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations.”
  – These are the people who started Globus, the first grid middleware project
• From the user’s perspective:
  – I want to be able to use computing resources as I need
  – I don’t care who owns resources, or where they are
  – Have to be secure
  – My programs have to run there
• The owners of computing resources (CPU cycles, storage, bandwidth)
  – My resources can be used by any authorized person (not for free)
  – Authorization is not tied to my administrative organization
• NO centralized control of resources or users

Other Problems?
• The world is a fairly heterogeneous place
  – Computing services are extremely heterogeneous
• Examples:
  – Batch systems (controlling the execution of your jobs)
    • LSF, PBS, TORQUE, Condor, Sun Grid Engine, BQS, …
    • Each comes with its own commands and status messages
  – Storage: Xroot, CASTOR, dCache, DPM, StoRM, …
  – Operating systems:
    • Windows, Linux (5 popular flavors), Solaris, MacOS, …
    • All come in several versions
  – Site managers
    • Highly experienced professionals
    • Scientists forced to do it (or volunteering)
    • Summer students doing it for 3 months…

What is a Virtual Organization?
• A Virtual Organization is a group of people that agree to share resources for solving a common problem
  – The members often belong to different organizations
  – The organizations are often in different countries
  – High Energy Physics collaborations are a good example

Fundamental Problems
• Provide security without central control
• Hide and manage heterogeneity
• Facilitate communication between users and providers
• Not only a technical problem!
• Grid Middleware is the software to address these problems

Short History
• 1998: The GRID by Ian Foster & Carl Kesselman
  – Made the idea popular
• 1998: Globus-1, first middleware widely available
  – Proof of concept
  – www.globus.org evolved to GT5 (2010)
• Since 1998, several hundred middleware solutions
• Open Grid Forum works towards standardization
  – Progress is slow…
• LHC experiments use:
  – gLite, ARC, OSG (Globus, VDT), AliEn

Software Approach
• Identify an AAA system that all can agree on
  – Authentication, Authorization, Auditing
  – That doesn’t require local user registration
  – That delegates “details” to the users (Virtual Organizations)
• Define and implement abstraction layers for resources
  – Computing, Storage, etc.
• Define and implement a way to announce your resources (Information System)
• Build high level services to optimize the usage
• Interface your applications to the system

gLite as an example

gLite middleware
• Data Services
  – Storage Element
  – File and Replica Catalog
  – Metadata Catalog
• Job Management Services
  – Computing Element
  – Worker Node
  – Workload Management
  – Job Provenance
• Security Services
  – Authorization
  – Authentication
• Information & Monitoring Services
  – Information System
  – Job Monitoring
  – Accounting
• Access Services
  – User Interface
  – API


The Big Picture

Security
• Authentication
  – Who you are
• Authorization
  – What you can do
• Auditing/Accounting
  – What you have done
• How to establish trust?


gLite The EGEE Middleware Distribution

Authentication
• Authentication is based on X.509 PKI (Public Key Infrastructure)
  – Certificate Authorities (CAs) issue (long-lived) certificates identifying individuals (much like a passport)
    • Commonly used in web browsers to authenticate to sites
  – Trust between CAs and sites is established (offline)
  – In order to reduce vulnerability, on the Grid user identification is done by using (short-lived) proxies of their certificates
• Short-Lived Credential Services (SLCS)
  – issue short-lived certificates or proxies to their local users
    • e.g. from Kerberos or from Shibboleth credentials
• Proxies can
  – Be delegated to a service such that it can act on the user’s behalf
  – Be stored in an external proxy store (MyProxy)
  – Be renewed (in case they are about to expire)
  – Include additional attributes (Authorization)

Public Key Based Security
• How to exchange secret keys?
  – 340 sites (global)
    • With hundreds of nodes each?
  – 200 user communities (non-local)
  – 10000 users (global)
• And keep them secret!!!
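The scale problem behind these numbers can be made concrete with a little arithmetic. The sketch below is an illustration only, not part of the original talk: it assumes that a purely symmetric scheme would need one shared secret per pair of parties, counts only sites and users (ignoring the hundreds of nodes per site), and uses hypothetical function names.

```python
from math import comb

def shared_secrets_needed(parties: int) -> int:
    """Symmetric scheme (assumption): one shared secret per pair of parties."""
    return comb(parties, 2)

def key_pairs_needed(parties: int) -> int:
    """Public-key scheme: one key pair per party; only the private half is secret."""
    return parties

# Figures taken from the slide; nodes per site are deliberately left out.
sites, users = 340, 10_000
parties = sites + users

print("pairwise shared secrets:", shared_secrets_needed(parties))
print("public/private key pairs:", key_pairs_needed(parties))
```

The pairwise count grows quadratically with the number of parties, while the number of key pairs grows linearly, which is one way to read the motivation for a public-key approach.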

Authorization
• VOMS is now a de-facto standard
  – Attribute Certificates provide users with additional capabilities defined by the VO
  – Allows group and role based authorization
  – Basis for the authorization process
• Authorization: currently via mapping to a local user on the resource (or ACLs)
  – glexec changes the local identity (based on suexec from Apache)
• Designing an authorization service with a common interface agreed with multiple partners
  – Uniform implementation of authorization in gLite services
  – Easier interoperability with other infrastructures
  – ARGUS
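The idea of mapping a grid user to a local account based on VO attributes can be sketched as follows. This is a simplified illustration and not the LCMAPS, GUMS or ARGUS implementation: the attribute strings, the pool names, and the `AccountPool` and `map_to_local_account` helpers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class AccountPool:
    """A pool of pre-created local accounts, e.g. atlas001 ... atlas050 (hypothetical)."""
    prefix: str
    size: int
    leases: Dict[str, str] = field(default_factory=dict)  # user DN -> local account

    def lease(self, user_dn: str) -> str:
        # A user who already holds a lease keeps the same local account.
        if user_dn not in self.leases:
            if len(self.leases) >= self.size:
                raise RuntimeError(f"pool {self.prefix!r} exhausted")
            self.leases[user_dn] = f"{self.prefix}{len(self.leases) + 1:03d}"
        return self.leases[user_dn]

def map_to_local_account(user_dn: str,
                         attributes: List[str],
                         rules: Dict[str, AccountPool]) -> Optional[str]:
    """Return a local account for the first attribute that has a rule, else None."""
    for attribute in attributes:      # attributes are tried in the order presented
        pool = rules.get(attribute)
        if pool is not None:
            return pool.lease(user_dn)
    return None                       # no rule matched: access is not authorized

rules = {
    "/atlas/Role=production": AccountPool("atlasprd", 2),
    "/atlas": AccountPool("atlas", 50),
}
print(map_to_local_account("/DC=ch/CN=alice", ["/atlas"], rules))
```

Note that no per-user registration is needed at the site in this sketch: the site only decides which VO groups and roles it accepts and which pool each one maps to.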

Security - overview
[Figure: security overview. Recoverable label: Certification Authority, 1 per country/region/lab]

Common AuthZ interface
[Diagram: services on OSG and EGEE/EGI sharing a common SAML-XACML authorization interface. Recoverable labels:]
• Common SAML-XACML library and SAML-XACML interface
• Site-central services: LCAS + LCMAPS (with L&L plug-ins, GPBox, LCMAPS plug-in) and GUMS (+ SAZ)
• Services shown: glexec (pilot job on Worker Node, both EGEE and OSG), edg-gk, edg-gridftp, gt4-interface, pre-WS GT4 gk, gridftp, opensshd, GT4 gatekeeper, gridftp, (opensshd), dCache (Prima + gPlazma: SAML-XACML), CREAM (LCAS + LCMAPS)
• Example exchange: SAML-XACML query, Q: map.user.to.some.pool, R: Oblg: user001, somegrp <other obligations>

Trust
• You can set up a Certification Authority and several Registration Authorities
  – Using openssl
  – 1h work
• No one will trust your certificates
• Trust is based on:
  – Common policies
  – Common infrastructure to follow up on security problems

Security groups
• Joint Security Policy Group:
  – Joint with WLCG, OSG, and others
  – Focus on policy issues
  – Strong input to e-IRG
• EUGridPMA
  – Pan-European trust federation of CAs
  – Included in IGTF (and was model for it)
  – Success: most grid projects now subscribe to the IGTF
• Grid Security Vulnerability Group
  – Looking at how to manage vulnerabilities
  – Risk analysis is fundamental
  – Balance between openness and security
• Operational Security Coordination Team
  – Main day-to-day operational work
  – Incident response and follow up
  – Members in all NGIs and sites
  – Frequent tests (Security Challenges)
[Figure labels, regional PMAs: TAGPMA (The Americas Grid PMA), EUGridPMA (European Grid PMA), APGridPMA (Asia-Pacific Grid PMA)]
[Figure labels, policy documents: Security & Availability Policy; Usage Rules; Certification Authorities; Audit Requirements; Incident Response; User Registration & VO Management; Application Development & Network Admin Guide; VO Security]

Computing Access
• Computing Elements (CE)
  – gateways to farms
• Workload Management
  – WMS/LB
  – Matches resources and requests
    • Including data location
  – Handles failures (resubmission)
  – Manages complex workflows
  – Tracks job status
[Diagram labels: UI, WMS, Site, CE, LFS, CPU]

Workload Management (compact)
[Diagram labels: Desktops; A few ~50 nodes; 1-20 per site; 1-24000 per site]


ECSAC'09 - Veli Lošinj, Croatia, 25-29 August 2009

Job Description Language
[
  Executable = "my_exe";
  StdOutput = "out";
  StdError = "err";
  Arguments = "a b c";
  InputSandbox = {"/home/giaco/my_exe"};
  OutputSandbox = {"out", "err"};
  Requirements = Member(
    other.GlueHostApplicationSoftwareRunTimeEnvironment,
    "ALICE3.07.01"
  );
  Rank = -other.GlueCEStateEstimatedResponseTime;
  RetryCount = 3;
]
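How the Requirements and Rank expressions drive matchmaking can be sketched in a few lines. This is a toy model under stated assumptions, not the WMS implementation: each Computing Element is a plain dictionary, and the keys `runtime_env` and `estimated_response_time` are hypothetical stand-ins for the GLUE attributes named in the job description.

```python
from typing import Dict, List, Optional

def meets_requirements(ce: Dict, required_tag: str) -> bool:
    """Requirements: the software tag must be published by the CE."""
    return required_tag in ce["runtime_env"]

def rank(ce: Dict) -> float:
    """Rank: negated estimated response time, so a shorter wait ranks higher."""
    return -ce["estimated_response_time"]

def select_ce(ces: List[Dict], required_tag: str) -> Optional[Dict]:
    """Filter by Requirements, then pick the candidate with the highest Rank."""
    candidates = [ce for ce in ces if meets_requirements(ce, required_tag)]
    return max(candidates, key=rank) if candidates else None

# Hypothetical information-system snapshot.
ces = [
    {"name": "ce-a", "runtime_env": ["ALICE3.07.01"], "estimated_response_time": 1200},
    {"name": "ce-b", "runtime_env": ["ALICE3.07.01"], "estimated_response_time": 300},
    {"name": "ce-c", "runtime_env": ["OTHER-1.0"], "estimated_response_time": 10},
]
best = select_ce(ces, "ALICE3.07.01")
print(best["name"] if best else "no matching CE")
```

In this toy snapshot the CE with the shortest queue is excluded because it does not publish the required software tag, which shows why filtering comes before ranking.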

Pilot Jobs
• All WLCG experiments use a form of pilot jobs
• They have given a number of advantages
  – Responsive scheduling
  – Knowledge of environment
• They have also faced some resistance
  – Security
  – Traceability
Markus Schulz - ESI 2011, 17/05/2011

Information System
• BDII = Yellow Pages (real time)
• Lightweight database
• LDAP protocol
• GLUE 1.3 (2) Schema
  – Describes resources and their state
  – Approx. 100 MB
  – Update every 2 min
• Several hundred instances

Data Management
• Storage Elements (SEs)
  – External interfaces based on SRM 2.2 and gridFTP
  – Many implementations:
    • CASTOR, StoRM, DPM, dCache, BeStMan, …
  – Many local interfaces:
    • POSIX, dcap, secure rfio, rfio, xrootd
• Catalogue: LFC (local and global)
• File Transfer Service (FTS)
• Data management clients: gfal/LCG-Utils
[Diagram labels: Site, CPU, SE, with interfaces rfio, xrootd, SRM, GridFTP]

Data Management
[Layer diagram. Recoverable labels: User Tools; VO Frameworks; Data Management: lcg_utils, FTS; GFAL; Vendor Specific APIs; Cataloging: (RLS), LFC; Storage: SRM, (Classic SE); Data transfer: gridftp, RFIO; Information System/Environment Variables]

General Storage Element
• Storage Resource Manager (SRM)
  – hides the storage system implementation (disk or tape)
  – handles authorization
  – translates SURLs (Storage URLs) to TURLs (Transfer URLs)
  – disk-based: DPM, dCache, …; tape-based: CASTOR, dCache
  – Mostly asynchronous
• File I/O: POSIX-like access from local nodes or the grid
  – GFAL (Grid File Access Layer)
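The SURL to TURL translation can be pictured with a small sketch. It is a simplified model, not the SRM protocol: the host names, paths, the `PHYSICAL_LOCATION` table and the `surl_to_turl` function are all hypothetical, and a real SRM also handles authorization, pinning and asynchronous requests.

```python
from typing import List
from urllib.parse import urlsplit

# Hypothetical name-server content: logical path -> (disk server, physical path).
PHYSICAL_LOCATION = {
    "/dpm/example.org/home/myvo/run1.root":
        ("disk03.example.org", "/data01/myvo/run1.root"),
}

# Transfer protocols this imaginary storage element offers.
SERVER_PROTOCOLS = {"gsiftp", "rfio"}

def surl_to_turl(surl: str, client_protocols: List[str]) -> str:
    """Translate a storage URL into a transfer URL on a concrete disk server."""
    parts = urlsplit(surl)
    if parts.scheme != "srm":
        raise ValueError(f"not a SURL: {surl}")
    host, physical_path = PHYSICAL_LOCATION[parts.path]
    # Pick the first protocol, in the client's order of preference, that the server offers.
    for protocol in client_protocols:
        if protocol in SERVER_PROTOCOLS:
            return f"{protocol}://{host}{physical_path}"
    raise LookupError("no common transfer protocol")

print(surl_to_turl("srm://se.example.org/dpm/example.org/home/myvo/run1.root",
                   ["xroot", "gsiftp"]))
```

The client never needs to know which disk server holds the file or how the storage is organized; it only names the file and the protocols it can speak.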

Approach to SRM
• An abstraction layer for storage and data access is necessary
  – Guiding principle: non-interference with local policies
• Providing all necessary user functionality and control
  – Data management
  – Data access
  – Storage management
  – Control:
    • Pinning files
    • Retention policy
    • Space management and reservation
  – Data transfers
• Grid enabled and based on current technology
  – Interface technology (gSOAP)
  – Security model (GSI security)
  – To integrate with the grid infrastructure


SRM basic and use cases tests


LCG “File” Catalog
• The LFC stores mappings between
  – Users’ file names
  – File locations on the Grid
• The LFC is accessible via
  – CLI, C API, Python interface, Perl interface
    • Supports sessions and bulk operations
  – Data Location Interface (DLI)
    • Web Service used for match making: given a GUID, returns physical file location
• ORACLE backend for high performance applications
  – Read-only replication support
[Diagram: LFC file names 1 … n point to one GUID (with an ACL), which points to file replicas 1 … m]
• These “Replicas” are “Copies”
• All files are “Write Once Read Many”
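The two-level mapping in the diagram (several file names to one GUID, one GUID to several replicas) can be sketched as a small in-memory structure. This is an illustration only and not the LFC API: the `ReplicaCatalog` class, its method names and the example names and URLs are hypothetical.

```python
import uuid
from typing import Dict, List

class ReplicaCatalog:
    """Toy catalogue: logical file names -> GUID -> replica locations."""

    def __init__(self) -> None:
        self._guid_by_lfn: Dict[str, str] = {}
        self._replicas_by_guid: Dict[str, List[str]] = {}

    def register(self, lfn: str, surl: str) -> str:
        """Create a new entry with its first replica and return the GUID."""
        guid = str(uuid.uuid4())
        self._guid_by_lfn[lfn] = guid
        self._replicas_by_guid[guid] = [surl]
        return guid

    def add_alias(self, guid: str, lfn: str) -> None:
        """A second user-visible name for the same file."""
        self._guid_by_lfn[lfn] = guid

    def add_replica(self, guid: str, surl: str) -> None:
        """Another copy of the same (write-once) file."""
        self._replicas_by_guid[guid].append(surl)

    def locate(self, guid: str) -> List[str]:
        """DLI-style lookup: given a GUID, return the physical locations."""
        return list(self._replicas_by_guid.get(guid, []))

    def replicas_of(self, lfn: str) -> List[str]:
        return self.locate(self._guid_by_lfn[lfn])

catalog = ReplicaCatalog()
guid = catalog.register("/grid/myvo/run1.root", "srm://se1.example.org/myvo/run1.root")
catalog.add_replica(guid, "srm://se2.example.org/myvo/run1.root")
catalog.add_alias(guid, "/grid/myvo/latest.root")
print(catalog.replicas_of("/grid/myvo/latest.root"))
```

Because files are write-once, replicas never need to be kept in sync after creation, which is what keeps a structure like this simple.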

DPM: user's point of view
• DPM Name Server (on the DPM head node)
  – Namespace
  – Authorization (role based ACLs)
  – Physical files location
• Disk Servers
  – Physical files
• Direct data transfer from/to disk server (no bottleneck)
• External transfers via gridFTP
• Local access: rfio, xroot, soon WebDAV and NFS 4.1
• Easy to deploy, easy to operate
[Diagram: clients (CLI, C API, SRM-enabled client, etc.) talk to the DPM head node and the DPM disk servers; namespace levels /dpm, /domain, /home, /vo; each file carries (uid, gid1, …)]

FTS: key points
• Scalable File Transfer Service
• Reliability
  – It handles the retries in case of storage / network failures
  – VO customizable retry logic
  – Service designed for high-availability deployment
• Security
  – All data is transferred securely using delegated credentials with SRM / gridFTP
  – Service audits all user / admin operations
• Service and performance
  – Service stability: it is designed to efficiently use the available storage and network resources without overloading them
  – Service recovery: integration of monitoring to detect service-level degradation
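Retry handling with a customizable policy can be sketched as below. This is a generic illustration and not FTS code: the `TransferError` class, the failure reasons and the policy signature are hypothetical, and a production service would also wait between attempts and log each one.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

class TransferError(Exception):
    """Hypothetical failure type carrying a short machine-readable reason."""
    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason

def default_policy(error: TransferError, attempt: int) -> bool:
    """Retry only failures that are likely to be transient."""
    return error.reason in {"network", "storage_busy"}

def transfer_with_retries(do_transfer: Callable[[], T],
                          max_attempts: int = 3,
                          should_retry: Callable[[TransferError, int], bool] = default_policy) -> T:
    """Run do_transfer, retrying while the policy allows and attempts remain."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return do_transfer()
        except TransferError as error:
            # Give up on the last attempt or when the policy refuses to retry.
            if attempt == max_attempts or not should_retry(error, attempt):
                raise
    raise AssertionError("unreachable")
```

A VO-specific policy would simply be a different `should_retry` function passed in by the caller, for example one that never retries after a checksum mismatch.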


Other Software in gLite
• Encrypted Data Storage
  – DICOM SE, HYDRA (distributed key store)
• Several grid enabled storage systems
• Meta Data Catalogues
  – AMGA
• Logging and Bookkeeping
  – Doing exactly this
• Accounting
  – APEL, DGAS
• ARGUS
  – Global/local authorization and policy system

gLite code base
• Distributed under an open source license.
• Main platform is Scientific Linux (recompiled RH EL).
• Many 3rd party dependencies
  – tomcat, log4*, gSOAP, ldap etc.
• ~20 FTEs, 80 people, 12 institutes (mostly academic)
• Geographically distributed, independent
  – Coding conventions, documentation, naming conventions
  – Testing and quality, dependency management

Summary Middleware
• Middleware allows you to:
  – Find resources
  – Access computing and storage
  – Manage workflows
  – Move data
  – Locate data
• Provides, without central control:
  – Security and accounting

WLCG: Worldwide LHC Computing Grid

CERN-IT-GT

August 2010 Openlab Summer Students

Overview
• An example for large scale scientific grid computing
• The LHC challenge
• Why grid computing?
• First full year with data


One of our data generators: ATLAS

150 million sensors deliver data …

… 40 million times per second


First events


1.25 GB/sec (ions)

Tier 0 at CERN: Acquisition, First pass processing, Storage & Distribution

8 October 2009

Presentation notes: The next two slides illustrate what happens to the data as it moves out from the experiments. CMS and ATLAS each produce data at the rate of one DVD's worth every 15 seconds or so, while the rates for LHCb and ALICE are somewhat lower. However, during the part of the year when the LHC accelerates lead ions rather than protons, ALICE (an experiment dedicated to this kind of physics) alone will produce data at over 1 gigabyte per second (1 DVD every 4 seconds). Initially the data is sent to the CERN Computer Centre, the Tier 0, for storage on tape. Storage also implies guardianship of the data for the long term: the lifetime of the LHC, at least 20 years. This is not passive guardianship; it requires migrating data to new technologies as they arrive. We need large-scale, sophisticated mass storage systems that not only manage the incoming data streams but also allow for evolution of technology (tapes and disks) without hindering access to the data. The Tier 0 centre provides the initial level of data processing: calibration of the detectors and the first reconstruction of the data.

Flow in and out of the center


Offline Requirements

[Timeline figure, 1994 to 2011. Milestones: LHC approved; ATLAS & CMS approved; ALICE approved; LHCb approved; LHC start]
ATLAS (or CMS) requirements for first year at design luminosity, as estimated at successive reviews:
• ATLAS & CMS CTP: 10^7 MIPS, 100 TB disk
• “Hoffmann” Review: 7x10^7 MIPS, 1,900 TB disk
• Computing TDRs: 55x10^7 MIPS (140 MSi2K), 70,000 TB disk
• Latest review: 627 kHS06 (156 MSi2K), 83,000 TB disk

Data and Algorithms
• HEP data are organized as Events (particle collisions)
• Simulation, Reconstruction and Analysis programs process “one Event at a time”
  – Events are fairly independent, so processing is trivially parallel
• Event processing programs are composed of a number of Algorithms selecting and transforming “raw” Event data into “processed” (reconstructed) Event data and statistics

Data formats:
• RAW (~2 MB/event): triggered events recorded by DAQ. Detector digitisation.
• ESD/RECO (~100 kB/event): reconstructed information. Pseudo-physical information: clusters, track candidates.
• AOD (~10 kB/event): analysis information. Physical information: transverse momentum, association of particles, jets, id of particles.
• TAG (~1 kB/event): classification information. Relevant information for fast event selection.
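To see what the per-event sizes mean in aggregate, the short calculation below multiplies them by a sample size. The per-event sizes are the approximate figures from the slide; the sample of one billion events is an assumption chosen only for illustration, and decimal units (1 TB = 10^12 bytes) are used.

```python
from typing import Dict

# Approximate per-event sizes from the slide, in bytes.
EVENT_SIZE_BYTES: Dict[str, int] = {
    "RAW": 2_000_000,
    "ESD/RECO": 100_000,
    "AOD": 10_000,
    "TAG": 1_000,
}

def volume_terabytes(n_events: int) -> Dict[str, float]:
    """Total size of each data format for n_events, in TB (10^12 bytes)."""
    return {fmt: n_events * size / 1e12 for fmt, size in EVENT_SIZE_BYTES.items()}

# Assumed sample size for illustration: one billion events.
for fmt, terabytes in volume_terabytes(1_000_000_000).items():
    print(f"{fmt:9s} {terabytes:10.1f} TB")
```

Each processing step shrinks the data by roughly an order of magnitude, which is why the smaller derived formats are the ones distributed widely for analysis.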

The LHC Computing Challenge
• Signal/Noise: 10^-13 (10^-9 offline)
• Data volume
  – High rate * large number of channels * 4 experiments
  – 15 PetaBytes of new data each year
• Compute power
  – Event complexity * Nb. events * thousands of users
  – 280 k of (today's) fastest CPU cores
  – 45 PB of disk storage
• Worldwide analysis & funding
  – Computing funding locally in major regions & countries
  – Efficient analysis everywhere
  – GRID technology

26 June 2009

The LHC Computing Challenge
[Figure: plots with logarithmic scale axes]

Why a grid?
• Both practical and political considerations point to a distributed computing solution
• The grid model is particularly well suited to collaboration
• LHC brings the community into uncharted territory in many domains, including computing
• Smaller centres (T2s) contribute ~50% of CPU

Architecture
• Tier-0 (CERN): (15%)
  – Data recording
  – Initial data reconstruction
  – Data distribution
• Tier-1 (11 centres): (40%)
  – Permanent storage
  – Re-processing
  – Analysis
• Tier-2 (~200 centres): (45%)
  – Simulation
  – End-user analysis

Presentation notes: The Tier 0 centre at CERN stores the primary copy of all the data. A second copy is distributed between the 11 so-called Tier 1 centres. These are large computer centres in different geographical regions of the world that also have a responsibility for long-term guardianship of the data. The data is sent from CERN to the Tier 1s in real time over dedicated network connections. To keep up with the data coming from the experiments, this transfer must be capable of running at around 1.3 GB/s continuously. This is equivalent to a full DVD every 3 seconds. The Tier 1 sites also provide the second level of data processing and produce data sets that can be used to perform the physics analysis. These data sets are sent from the Tier 1 sites to the around 130 Tier 2 sites. A Tier 2 is typically a university department or physics laboratory; Tier 2s are located all over the world in most of the countries that participate in the LHC experiments. Often, Tier 2s are associated with a Tier 1 site in their region. It is at the Tier 2s that the real physics analysis is performed.
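The transfer rate quoted in the notes can be checked with a short calculation. The 1.3 GB/s figure comes from the notes; the DVD capacity of 4.7 GB (single layer, decimal units) is an assumption, as are the function names.

```python
RATE_GB_PER_S = 1.3   # sustained Tier 0 to Tier 1 rate quoted in the notes
DVD_GB = 4.7          # assumption: single-layer DVD capacity

def seconds_per_dvd(rate_gb_per_s: float, dvd_gb: float = DVD_GB) -> float:
    """How long it takes to move one DVD's worth of data at the given rate."""
    return dvd_gb / rate_gb_per_s

def terabytes_per_day(rate_gb_per_s: float) -> float:
    """Daily volume at a continuous rate, in TB (1 TB = 1000 GB)."""
    return rate_gb_per_s * 86_400 / 1000

print(f"one DVD every {seconds_per_dvd(RATE_GB_PER_S):.1f} s")
print(f"{terabytes_per_day(RATE_GB_PER_S):.0f} TB per day")
```

Under these assumptions one DVD's worth of data leaves CERN roughly every 3.6 seconds, consistent with the rounded figure in the notes, and a day of continuous running moves a little over 110 TB.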
History
• 1999 - MONARC project
  – First LHC computing architecture: hierarchical distributed model
• 2000 - growing interest in grid technology
  – HEP community main driver in launching the DataGrid project
• 2001-2004 - EU DataGrid project
  – middleware & testbed for an operational grid
• 2002-2005 - LHC Computing Grid (LCG)
  – deploying the results of DataGrid to provide a production facility for LHC experiments
• 2004-2006 - EU EGEE project phase 1
  – starts from the LCG grid
  – shared production infrastructure
  – expanding to other communities and sciences
• 2006-2008 - EU EGEE project phase 2
  – expanding to other communities and sciences
  – Scale and stability
  – Interoperations/Interoperability
• 2008-2010 - EU EGEE project phase 3
  – More communities
  – Efficient operations
  – Less central coordination
• 2010-201x - EGI and EMI
  – Sustainable infrastructures based on National Grid Infrastructures
  – Decoupling of middleware development and infrastructure

WLCG Collaboration Status
Tier 0; 11 Tier 1s; 64 Tier 2 federations (124 Tier 2 sites)

Today we have 49 MoU signatories, representing 34 countries:

Australia, Austria, Belgium, Brazil, Canada, China, Czech Rep, Denmark, Estonia, Finland, France, Germany, Hungary, Italy, India, Israel, Japan, Rep. Korea, Netherlands, Norway, Pakistan, Poland, Portugal, Romania, Russia, Slovenia, Spain, Sweden, Switzerland, Taipei, Turkey, UK, Ukraine, USA.

[Map labels: CERN, Lyon/CCIN2P3, Barcelona/PIC, De-FZK, US-FNAL, Ca-TRIUMF, NDGF, US-BNL, UK-RAL, Taipei/ASGC, Amsterdam/NIKHEF-SARA, Bologna/CNAF]

26 June 2009, Ian Bird, CERN

Network
• CERN runs a commercial Internet Exchange Point (IXP)
• Dedicated 10 Gb/s optical links to Tier 1s
• Rest on the research networks, GEANT2 in Europe

Monitoring
[Annotated screenshot of GridMap. Link: http://gridmap.cern.ch]
• Metric selection for colour of rectangles
• Metric selection for size of rectangles
• Show SAM status
• Show GridView availability data
• Grid topology view (grouping)
• VO selection
• Overall Site or Site Service selection
• Drilldown into region by clicking on the title
• Context sensitive information
• Colour key
• Description of current view

EGI Infrastructure
• >270 VOs from several scientific domains
  – Astronomy & Astrophysics
  – Civil Protection
  – Computational Chemistry
  – Comp. Fluid Dynamics
  – Computer Science/Tools
  – Condensed Matter Physics
  – Earth Sciences
  – Fusion
  – High Energy Physics
  – Life Sciences
• Further applications joining all the time
  – Recently fishery (I-Marine)
• Applications have moved from testing to routine and daily usage

Presenter
Presentation Notes
At Feb review: 100 sites, 10K CPUs 1st gLite release foreseen for March’05 6 domains and

2. EGI_DS Review www.eu-egi.eu 66

NGIs in Europe

WLCG today

Usage

We have a working grid infrastructure
• With (still) adequate resources

Experiments use distributed models

Network traffic close to planned
• Highly reliable

Large numbers of individual users
• CMS ~800
• ATLAS ~1000
• LHCb/ALICE ~200

Gstat / installed capacity

Data taken live from the LDAP-based information system (BDII)
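
Because the BDII is an LDAP server, installed-capacity figures can in principle be read with any LDAP client. The following is a minimal sketch, not the Gstat implementation: it builds an `ldapsearch` command line and sums CPU counts from LDIF text. The host `bdii.example.org` and the sample LDIF records are made up for illustration; port 2170, the base `o=grid` and the GLUE 1.x attribute names are the commonly used conventions and are assumed here.

```python
import shlex


def bdii_query_command(host, port=2170, base="o=grid"):
    """Build an ldapsearch command listing logical CPU counts per subcluster."""
    return [
        "ldapsearch", "-x", "-LLL",
        "-H", f"ldap://{host}:{port}",
        "-b", base,
        "(objectClass=GlueSubCluster)",
        "GlueSubClusterUniqueID", "GlueSubClusterLogicalCPUs",
    ]


def sum_logical_cpus(ldif_text):
    """Sum every GlueSubClusterLogicalCPUs value found in LDIF output."""
    total = 0
    for line in ldif_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "GlueSubClusterLogicalCPUs":
            total += int(value.strip())
    return total


# Illustrative LDIF only; these records are invented, not real site data.
SAMPLE_LDIF = """\
dn: GlueSubClusterUniqueID=ce01.example.org,mds-vo-name=local,o=grid
GlueSubClusterUniqueID: ce01.example.org
GlueSubClusterLogicalCPUs: 512

dn: GlueSubClusterUniqueID=ce02.example.org,mds-vo-name=local,o=grid
GlueSubClusterUniqueID: ce02.example.org
GlueSubClusterLogicalCPUs: 1024
"""

# Show the command a user could run by hand (hypothetical host).
print(shlex.join(bdii_query_command("bdii.example.org")))
print("logical CPUs in sample:", sum_logical_cpus(SAMPLE_LDIF))
```

The command is only printed, not executed, so the sketch runs without network access or an LDAP client installed.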

First year of running - resources

CPUs 264k cores

Disk 158PB

Tape 126PB

CPU usage

A lot of work continues even when there’s no beam

About 1 million jobs/day

Accumulated CPU usage

>10 thousand years of CPU

CASTOR – CERN tape storage

LHC can produce 15 PB per year

• >5 GB/s to tape during heavy-ion (HI) running
• ~2 PB/month to tape during proton-proton (pp) running
• ~4 PB to tape in HI running
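
To put the tape figures on a common scale, the sketch below converts the quoted volumes into average rates. It is back-of-the-envelope arithmetic only, assuming decimal units (1 PB = 10^15 bytes), a 365-day year and a 30-day month; the 5 GB/s value is used as a sustained rate even though the slide quotes it as a peak-like ">5 GB/s".

```python
PB = 10**15  # decimal petabyte, in bytes (assumption)
GB = 10**9   # decimal gigabyte, in bytes (assumption)
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY


def average_rate_gb_per_s(volume_bytes, seconds):
    """Average throughput in GB/s needed to move volume_bytes in the given time."""
    return volume_bytes / seconds / GB


# 15 PB spread evenly over a whole year.
yearly_rate = average_rate_gb_per_s(15 * PB, SECONDS_PER_YEAR)

# 2 PB spread evenly over one month of pp running.
pp_rate = average_rate_gb_per_s(2 * PB, SECONDS_PER_MONTH)

# Days needed to write 4 PB if 5 GB/s were sustained continuously.
hi_days = (4 * PB) / (5 * GB) / SECONDS_PER_DAY

print(f"15 PB/year  ~ {yearly_rate:.2f} GB/s on average")
print(f"2 PB/month  ~ {pp_rate:.2f} GB/s on average")
print(f"4 PB at 5 GB/s takes ~ {hi_days:.1f} days")
```

The averages come out well below the heavy-ion figure, which is consistent with the slide presenting HI running as the demanding case for tape.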

ATLAS data throughput

Transfers typically managed and scheduled with FTS

SAM and availability

• Grid community puts a great effort into operations
• Infrastructure is continually monitored with active follow-up of issues

At the WLCG Management Board

To Grid or not to Grid?

Grid

• Distributed community (VO)
– Different organizations
– Distributed resources

• Longer-term project (> 2 years)
– With massive computing requirements (>> 100 PC nodes)

• Computing requires modest parallelization
– MPI is available on some sites, but not easy to use in a Grid

• Don’t expose middleware directly to end users
– Link from workflow management/portals
– Shield users from failures/complexity
– Distributed computing requires management of failures

• Join an existing infrastructure
– EGI is a good choice in Europe

• Use workflow management software from other VOs
– DIRAC, PanDA, gCube from D4Science …

• Get sufficient expertise …
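
The point that distributed computing requires management of failures, and that users should be shielded from them, can be illustrated with a retry wrapper of the kind a workflow layer might put around job submission. This is a minimal sketch, not code from any of the frameworks named above: `JobFailed`, `run_with_retries` and `make_flaky_job` are hypothetical names, and the flaky job merely simulates a site that fails a fixed number of times.

```python
import time


class JobFailed(Exception):
    """Raised by a (hypothetical) submission function when a job fails."""


def run_with_retries(submit, max_attempts=3, base_delay=0.0, sleep=time.sleep):
    """Call submit() until it succeeds or max_attempts is exhausted.

    Waits base_delay * 2**(attempt - 1) seconds between attempts
    (exponential backoff) and re-raises the last failure at the end.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return submit()
        except JobFailed as err:
            last_error = err
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    raise last_error


def make_flaky_job(failures_before_success):
    """Return a fake submit() that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def submit():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise JobFailed(f"attempt {state['calls']} failed")
        return f"done after {state['calls']} attempts"

    return submit


# The end user sees only the final result, not the two hidden failures.
print(run_with_retries(make_flaky_job(2)))
```

A real workflow system would typically also resubmit to a different site and record the failure for operators; the sketch keeps only the retry logic.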

Half Grid

• Distributed small community (< 100)
– Closely linked (same region or organization)
– Distributed resources

• Medium-term project (< 2 years)

• Join an existing VO (use their experience)

• Or:
– Link your resources via Condor
• http://www.cs.wisc.edu/condor/

• Or:
– Use cloud computing (OpenStack, OpenNebula, Amazon EC2 …)

• Or:
– Use volunteer computing (BOINC, as used by SETI@home)
– We interfaced gLite and BOINC … not much use by HEP

• You still need to invest, but you will see results faster

No Grid

• Local team
– Closely linked (same region or organization)
– Distributed resources

• Short- or medium-term project (< 2 years)

• Massively parallel processing or HPC needed

• If you choose to use the grid nevertheless …
– Understand the startup costs
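
The three checklists (Grid, Half Grid, No Grid) can be condensed into a rough rule of thumb. The function below is only a sketch of that reading: the thresholds (2 years, 100 nodes) are taken from the slides, while the order of the checks, the use of "> 100" for the slide's ">> 100", and the choice of "Half Grid" as the default for everything in between are assumptions of this example.

```python
def recommend(multi_organization, duration_years, nodes, needs_hpc=False):
    """Suggest 'Grid', 'Half Grid' or 'No Grid' from the slides' criteria.

    multi_organization: True if the community spans different organizations.
    duration_years:     expected project lifetime in years.
    nodes:              rough number of PC nodes required.
    needs_hpc:          True if massively parallel processing or HPC is needed.
    """
    # HPC-style workloads are listed under "No Grid".
    if needs_hpc:
        return "No Grid"
    # Long-lived, multi-organization projects with large needs fit the grid.
    if multi_organization and duration_years > 2 and nodes > 100:
        return "Grid"
    # Assumption: everything in between is treated as the middle option.
    return "Half Grid"


print(recommend(multi_organization=True, duration_years=5, nodes=1000))
print(recommend(multi_organization=False, duration_years=1, nodes=50))
print(recommend(multi_organization=False, duration_years=1, nodes=50, needs_hpc=True))
```

Real decisions also depend on available expertise and startup costs, which the slides stress and this sketch does not model.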

Future

• WANs are now very stable and provide excellent performance
– Move to a less hierarchical model
• Virtualization and Cloud Computing
• Adopting standards
• Integrating new technology

Data access: a WAN solution

• Data access over the WAN is now a possibility
• More efficient use of storage
• Greater job reliability
• Not necessarily more WAN traffic

• Can be combined with various caching strategies
• Can be quicker than pulling something locally from tape

• NFSv4.1 offers this possibility (WAN-optimised operations, parallelism)
• A global xrootd federation is being demonstrated by CMS
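
One of the simplest caching strategies that WAN access can be combined with is a read-through cache: serve a file from local disk if it is there, otherwise fetch it over the WAN and keep a copy. The sketch below illustrates only that idea; it is not how xrootd or NFSv4.1 implement caching. `ReadThroughCache`, `fake_remote_fetch` and the logical file name are hypothetical, and a dictionary stands in for the remote site.

```python
import hashlib
import os
import tempfile


class ReadThroughCache:
    """Serve files from a local cache directory, fetching remotely on a miss."""

    def __init__(self, cache_dir, fetch_remote):
        self.cache_dir = cache_dir
        self.fetch_remote = fetch_remote  # callable: logical name -> bytes
        self.hits = 0
        self.misses = 0

    def _local_path(self, name):
        # Hash the logical name so any name maps to a safe, unique file name.
        digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, digest)

    def read(self, name):
        path = self._local_path(name)
        if os.path.exists(path):
            self.hits += 1
            with open(path, "rb") as f:
                return f.read()
        # Cache miss: go over the WAN once, then keep a local copy.
        self.misses += 1
        data = self.fetch_remote(name)
        with open(path, "wb") as f:
            f.write(data)
        return data


# A dictionary stands in for the remote storage (illustrative data only).
REMOTE = {"/store/example/file.root": b"event data"}


def fake_remote_fetch(name):
    return REMOTE[name]


with tempfile.TemporaryDirectory() as cache_dir:
    cache = ReadThroughCache(cache_dir, fake_remote_fetch)
    cache.read("/store/example/file.root")  # miss: fetched remotely
    cache.read("/store/example/file.root")  # hit: served from local disk
    print("hits:", cache.hits, "misses:", cache.misses)
```

Repeated reads of the same file then cost no additional WAN traffic, which is one way the "not necessarily more WAN traffic" claim can hold.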

Virtualisation & cloud

Virtualisation is interesting in a number of domains

• Application Environment
– HEP applications are platform dependent
– Sites & laptops are varied

• Infrastructure Management

• Direct cloud use by LHC experiments
– Simulation
– Elasticity
– Reprocessing & analysis
– Data cost

Adoption of standards

• OGF, OASIS

• Storage Resource Manager (SRM)
– hides the storage system implementation (disk or active tape)
– handles authorisation
– Many implementations: DPM, dCache, StoRM, BeStMan, CASTOR

• GLUE 2.0 Information Schema

• Non-grid standards
– NFSv4.1
– SRM/https
– WebDAV
– HTTP
– SSL

• Why NFSv4.1?
– Simplicity: regular mount-point and real POSIX I/O
– Performance: pNFS (parallel NFS), clever protocols
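
The "regular mount-point and real POSIX I/O" argument means that application code needs no grid-specific client library: ordinary file calls work on the mounted path. The sketch below shows such code. The mount point `/mnt/grid` mentioned in the comment is hypothetical; a temporary directory stands in for it so the example runs anywhere.

```python
import os
import tempfile


def read_block(path, offset, size):
    """Read size bytes starting at offset using ordinary file I/O."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)


# On a real system the directory would be an NFSv4.1 mount such as
# /mnt/grid (hypothetical); here a temporary directory plays that role.
with tempfile.TemporaryDirectory() as mount_point:
    path = os.path.join(mount_point, "events.dat")
    with open(path, "wb") as f:
        f.write(b"0123456789")
    # Random access works exactly as on a local file system.
    print(read_block(path, offset=3, size=4))
```

Nothing in `read_block` knows where the data lives, which is the simplicity the slide refers to.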

EMI has embraced the adoption of standards, and many applications see the benefits
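
The two roles attributed to SRM on this slide, hiding the storage implementation and handling authorisation, follow a familiar interface pattern. The sketch below shows that pattern only; it is not the SRM protocol or any implementation's API. All class and method names (`StorageBackend`, `DiskBackend`, `TapeBackend`, `StorageManager`) are hypothetical, and in-memory dictionaries stand in for real storage.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Common interface that every storage system must provide."""

    @abstractmethod
    def fetch(self, name):
        """Return the contents of the named file as bytes."""


class DiskBackend(StorageBackend):
    def __init__(self, files):
        self.files = dict(files)

    def fetch(self, name):
        return self.files[name]


class TapeBackend(StorageBackend):
    def __init__(self, files):
        self.files = dict(files)
        self.staged = set()

    def fetch(self, name):
        # A tape system first stages the file to disk; callers never see this.
        self.staged.add(name)
        return self.files[name]


class StorageManager:
    """Front end: checks authorisation, then delegates to whatever backend it has."""

    def __init__(self, backend, allowed_users):
        self.backend = backend
        self.allowed_users = set(allowed_users)

    def get(self, user, name):
        if user not in self.allowed_users:
            raise PermissionError(f"{user} may not read {name}")
        return self.backend.fetch(name)


# The same client call works whether the data sits on disk or on tape.
for backend in (DiskBackend({"run1.dat": b"abc"}), TapeBackend({"run1.dat": b"abc"})):
    manager = StorageManager(backend, allowed_users={"alice"})
    print(type(backend).__name__, manager.get("alice", "run1.dat"))
```

Swapping one backend for another requires no change on the client side, which is the benefit the slide attributes to a common storage interface.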

Other future developments…

• WLCG and experiment frameworks require long-term planning
• Many projects are taking advantage of emerging technology
• An incomplete selection:
– Multicore: efficiency, especially in memory usage
– LHCONE: improved networking for Tier 2s
– NoSQL: performance improvements
– CernVM-FS: distribution of applications
– Authentication: user-friendly ways to authenticate
– … etc.

------BEGIN CERTIFICATE------MIIHmCCdaSDFpopiopjan242ASD2qrA2

Summary

• Grid computing and WLCG have proven themselves during the first year of LHC data-taking

• Grid computing works for our community and has a future

Overview

• If you want to use gLite, read the user guide:

• https://edms.cern.ch/document/722398/

• There is NO way around it
– Unless you are in an LHC experiment

Thank you

Extra Slides


EGEE-II INFSO-RI-031688

Enabling Grids for E-sciencE

www.eu-egee.org

EGEE and gLite are registered trademarks

European Middleware Initiative (EMI)


Primary Objectives

Consolidate
Consolidate the existing middleware distribution, simplifying services and components to make them more sustainable (including use of off-the-shelf and commercial components whenever possible)

Evolve
Evolve the middleware services/functionality following the requirements of infrastructure and communities, mainly focusing on operational, standardization and interoperability aspects

Support
Reactively and proactively maintain the middleware distribution to keep it in line with the growing infrastructure usage

EMI Overview - Kick-off Meeting, 26/05/2010

Partners (26)

Technical Areas

• Compute Services: A-REX, UAS-Compute, WMS, CREAM, MPI, etc.

• Data Services: dCache, StoRM, UAS-Data, DPM, LFC, FTS, Hydra, AMGA, etc.

• Security Services: UNICORE Gateway, UVOS/VOMS/VOMS-Admin, ARGUS, SLCS, glExec, Gridsite, Proxyrenewal, etc.

• Infrastructure Services: Logging and Bookkeeping, Messaging, accounting, monitoring, virtualization/clouds support, information systems and providers

Middleware – the EMI project

[Figure: the EMI project. Labels: Before EMI; 3 years; Standards; New technologies (clouds); Users and Infrastructure Requirements]

EMI Middleware

EMI services

[Figure: EMI services diagram. Legend: EGEE Maintained Components; External Components]

• General Services: LHC File Catalogue, Hydra, Workload Management Service, File Transfer Service, Logging & Bookkeeping Service
• Storage Element: DPM, dCache, StoRM
• Information Services: BDII
• Security Services: Virtual Organisation Membership Service, Authz. Service, SCAS, Proxy Server, LCAS & LCMAPS
• Compute Element: CREAM, LCG-CE, gLExec, BLAH, Worker Node
• User Access: User Interface

Not all EMI services are illustrated