
GridPP & The Grid

Who we are & what it is

Tony Doyle

Web: information sharing

• Invented at CERN by Tim Berners-Lee

[Chart: number of Internet hosts (millions) by year]

• Agreed protocols: HTTP, HTML, URLs

• Anyone can access information and post their own

• Quickly crossed over into public use

@Home Projects

• Use home PCs to run numerous calculations with dozens of variables

• Distributed computing projects, not a grid

• Examples: SETI@home, the BBC Climate Change Experiment, FightAIDS@home

Peer To Peer Networks

• No centralised database of files

• Legal problems with sharing copyrighted material

• Security problems

Grid: Resource Sharing

• Share more than information: data, computing power, applications

[Diagram: on a single computer, your program sits alongside Word/Excel, email/web and games, and the operating system manages the disks, CPU etc. On the Grid, your program goes from a User Interface machine to a Resource Broker, which uses the middleware to run it on remote CPU clusters and disk servers.]

• Middleware handles everything
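To make "middleware handles everything" concrete, here is a toy Python sketch of the matchmaking a Resource Broker performs. This is not GridPP or gLite code: the Site/Job classes, the dataset names and the capacity figures are invented for illustration (only the site names appear elsewhere in this talk). The point is simply that the broker, not the user, decides where the program runs.

# Toy matchmaking sketch: a 'broker' picks a site with free CPUs
# that also holds the dataset the job needs. Illustrative only.
from dataclasses import dataclass, field

@dataclass
class Site:
    name: str
    free_cpus: int
    datasets: set = field(default_factory=set)

@dataclass
class Job:
    name: str
    needs_dataset: str

def broker(job, sites):
    """Return the best matching site's name and reserve a CPU slot there."""
    candidates = [s for s in sites
                  if s.free_cpus > 0 and job.needs_dataset in s.datasets]
    if not candidates:
        raise RuntimeError("no matching site for " + job.name)
    best = max(candidates, key=lambda s: s.free_cpus)
    best.free_cpus -= 1   # the broker reserves a slot; the user never sees this
    return best.name

sites = [
    Site("RAL",      free_cpus=1000, datasets={"atlas-sim", "lhcb-raw"}),
    Site("Glasgow",  free_cpus=200,  datasets={"atlas-sim"}),
    Site("Imperial", free_cpus=0,    datasets={"cms-raw"}),
]

print(broker(Job("my-analysis", needs_dataset="atlas-sim"), sites))   # prints: RAL

In the real system the equivalent decision is driven by the Requirements expression in the user's JDL file (see the Job Preparation slide later) matched against information the sites publish.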

The Grid

Analogy with the Electricity Power Grid

• The power stations correspond to computing and data centres

• The distribution infrastructure corresponds to the fibre optics of the Internet

• Both offer the user a 'standard interface'

The CERN LHC

4 Large Experiments

The world’s most powerful particle accelerator - 2007

ALICE – heavy-ion collisions, to create quark-gluon plasmas; 50,000 particles in each collision

LHCb – to study the differences between matter and antimatter; will detect over 100 million b and b-bar mesons each year

ATLAS – general purpose: origin of mass, supersymmetry; 2,000 scientists from 34 countries

CMS – general purpose; 1,800 scientists from over 150 institutes

“One Grid to Rule Them All”?

The Experiments

Why do particle physicists need the Grid?

Example from LHC: starting from this event…

…we are looking for this “signature”

Selectivity: 1 in 10^13

Like looking for 1 person in a thousand world populations

Or for a needle in 20 million haystacks

Why do particle physicists need the Grid?

One year's data from the LHC would fill a stack of CDs 20 km high – taller than Concorde's cruising altitude (15 km) or Mont Blanc (4.8 km)

• 100 million electronic channels
• 800 million proton-proton interactions per second
• 0.0002 Higgs per second
• 10 PBytes of data a year (10 million GBytes = 14 million CDs)
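The slide's numbers hang together, as a quick back-of-the-envelope check shows. The CD capacity (~700 MB) and disc thickness (~1.2 mm) below are standard figures assumed for the check; they are not on the slide.

# Back-of-the-envelope check of the LHC data-volume slide (illustrative only).
data_per_year_gb = 10e6      # 10 PBytes = 10 million GBytes (from the slide)
cd_capacity_gb   = 0.7       # assumed: ~700 MB per CD
cd_thickness_m   = 1.2e-3    # assumed: ~1.2 mm per disc in a stack

cds = data_per_year_gb / cd_capacity_gb
print(f"CDs per year : {cds/1e6:.0f} million")               # ~14 million
print(f"Stack height : {cds*cd_thickness_m/1e3:.0f} km")     # ~17 km, i.e. of order 20 km

interactions_per_s = 800e6   # proton-proton interactions per second (from the slide)
higgs_per_s        = 2e-4    # 0.0002 Higgs per second (from the slide)
print(f"Selectivity  : 1 in {interactions_per_s/higgs_per_s:.0e}")  # ~4e12, consistent with 1 in 10^13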

Who else can use a Grid?

• Astronomers
• Healthcare professionals
• Bioinformatics
• Digital curation – to create digital libraries and museums

[Images: scanning, remote consultancy, optical, X-ray – digitise almost anything]

Who are GridPP?

• 19 UK universities plus CCLRC (RAL & Daresbury), funded by PPARC
• GridPP1 (2001-2004): "From Web to Grid"
• GridPP2 (2004-2007): "From Prototype to Production"
• Developed a working, highly functional Grid

What Have We Done So Far

• Simulated 46 million molecules for medical research in 5 weeks, which would have taken over 80 years on a single PC

• Reached transfer speeds of 1 Gigabyte per second in high speed networking tests from CERN – a DVD every 5 seconds

• BaBar experiment has simulated 500 million particle physics collisions on the UK Grid

• UK’s #1 producer of data for LHCb, ATLAS and CMS

Worldwide LHC Computing Grid

• GridPP is part of EGEE and LCG (currently the largest Grid in the world)

EGEE stats:

182 Sites

42 Countries

38,201 CPUs

9,145 TBytes Storage

Tier Structure

[Diagram: data flows from the detector through the online system and an offline farm into the CERN computer centre]

• Tier 0 – CERN computer centre
• Tier 1 – national centres (RAL in the UK; Italy, USA, France, Germany)
• Tier 2 – regional groups (ScotGrid, NorthGrid, SouthGrid, London)
• Tier 3 – institutes (e.g. Glasgow, Edinburgh, Durham)

UK Tier-1/A Centre (RAL)

• High quality data services
• National and international role
• UK focus for international Grid development
• Grid Operations Centre
• 1,000 dual CPUs, 200 TB disk, 220 TB tape (capacity 1 PB)

UK Tier-2 Centres

• ScotGrid: Durham, Edinburgh, Glasgow
• NorthGrid: Daresbury, Lancaster, Liverpool, Manchester, Sheffield
• SouthGrid: Birmingham, Bristol, Cambridge, Oxford, RAL PPD, Warwick
• London: Brunel, Imperial, QMUL, RHUL, UCL

What are the Grid challenges?

The Grid must:
• share data between thousands of scientists with multiple interests
• link major and minor computer centres
• ensure all data is accessible anywhere, anytime
• grow rapidly, yet remain reliable for more than a decade
• cope with the different management policies of different centres
• ensure data security
• be up and running routinely by 2007

Other Grids

• UK National Grid Service – the UK's core production computational and data Grid
• EGEE (Europe) – Enabling Grids for E-sciencE
• NorduGrid (Europe) – Grid research and development collaboration
• Open Science Grid (USA) – science applications from HEP to biochemistry

The Future

• Grow the LHC Grid
• Spread beyond science – healthcare, commercial uses, government, games

• Will it become part of everyday life?

Further Info

http://www.gridpp.ac.uk

Backups

“UK contributes to EGEE's battle with malaria”

BioMed VO: 1,107 successful jobs per day, 77% success rate

WISDOM (Wide In Silico Docking On Malaria)

This was the first biomedical data challenge for drug discovery; it ran on the EGEE grid production service from 11 July 2005 until 19 August 2005.

GridPP resources in the UK contributed ~100,000 kSI2k-hours from 9 sites

Number of Biomedical jobs processed by country

Normalised CPU hours contributed to thebiomedical VO for UK sites, July-August 2005

Is GridPP a Grid?

1. Coordinates resources that are not subject to centralized control

2. … using standard, open, general-purpose protocols and interfaces

3. … to deliver nontrivial qualities of service

1. YES. This is why development and maintenance of LCG is important.

2. YES. VDT (Globus/Condor-G) + EGEE (gLite) ~meet this requirement.

3. YES. LHC experiments data challenges over the summer of 2004.

http://www-fp.mcs.anl.gov/~foster/Articles/WhatIsTheGrid.pdf

http://agenda.cern.ch/fullAgenda.php?ida=a042133

Application Development

ATLAS, LHCb, CMS, BaBar (SLAC), SAMGrid (Fermilab), QCDGrid, PhenoGrid

Middleware Development

Configuration Management

Storage Interfaces

Network Monitoring

Security

Information Services

Grid Data Management


Storage Element

Basic File Transfer

Reliable File Transfer

Catalogue Services

Data Management tools

Compute Element

Workload Management

VO Agents

VO Membership Services

DataBase Services

Posix-like I/O

Application Software Installation Tools

Job Monitoring

Reliable Messaging

Information System

15 Baseline Services for a functional Grid

We rely upon gLite components

This middleware builds upon VDT (Globus and Condor) and meets the requirements of all the basic scientific use cases:

1. Purple (and amber) areas of the diagram are (almost) agreed as part of the shared generic middleware stack by each of the application areas.

2. Red areas are where generic middleware competes with application-specific software.

www.glite.org

gLite Middleware Stack

2005 Metrics and Quality Assurance

Metric                | Current status | Q2 2006 target value
Number of users       | ~ 1,000        | ≥ 3,000
Number of sites       | 120            | 50
Number of CPUs        | ~ 12,000       | 9,500 at month 15
Number of disciplines | 6              | ≥ 5
Multinational         | 24 countries   | ≥ 15 countries

LCG Service Challenges

[Timeline 2005-2008: service challenges SC2, SC3 and SC4, followed by LHC service operation; cosmics, first beams, first physics, then the full physics run]

• Jun 05 – Technical Design Report
• Sep 05 – SC3 service phase
• May 06 – SC4 service phase
• Sep 06 – initial LHC service in stable operation
• Apr 07 – LHC service commissioned

SC2 – reliable data transfer (disk-network-disk): 5 Tier-1s, aggregate 500 MB/sec sustained at CERN

SC3 – reliable base service: most Tier-1s, some Tier-2s; basic experiment software chain; grid data throughput 500 MB/sec, including mass storage (~25% of the nominal final throughput for the proton period)

SC4 – all Tier-1s, major Tier-2s; capable of supporting the full experiment software chain, including analysis; sustain nominal final grid data throughput

LHC Service in Operation – September 2006; ramp up to full operational capacity by April 2007; capable of handling twice the nominal data throughput

Status?: Exec2 Summary

• 2005 was the first full year of a Production Grid: the UK Tier-1 was the largest CPU provider on the LCG and by the end of the year the Tier-2s provided twice the CPU of the Tier-1.

• The Production Grid is considered to be functional and hence the focus is now on improving performance of the system, especially w.r.t. data storage and management.

• The GridPP2 Project is now approaching halfway and has met 40% of its original targets with 91% of the metrics within specification.

Grid Overview

Aim: by 2008 (a full year's data taking)
• CPU ~100 MSI2k (100,000 CPUs)
• Storage ~80 PB
• Involving >100 institutes worldwide
• Build on complex middleware being developed in advanced Grid technology projects, both in Europe (gLite) and in the USA (VDT)

1. Prototype went live in September 2003 in 12 countries

2. Extensively tested by the LHC experiments in September 2004

Some of the challenges for 2006

• File transfers – good initial progress, but some way still to go with testing (stressing reliability and performance); can only be done with participation of the experiments; distribution to other sites being planned

• Distributed VO services – plan agreed: T1 will sign off and then VO boxes may be deployed by T2s; but pilot services still to be deployed for ALICE, ATLAS, CMS and LHCb

• End-to-end testing of the T0-T1-T2 chain – MC production, reconstruction, distribution

• Full Tier-1 workload testing – recording, reprocessing, ESD distribution, analysis, Tier-2 support

• Understanding the "Analysis Facility" – batch analysis at T1 and T2; interactive analysis

• Startup scenarios – the schedule is known at a high level and defined for the Service Challenges; testing time ahead (in many ways)

Data Processing

[Figure: the data-processing chain against a time axis spanning 9 orders of magnitude, from 10^-9 s to 10^3 s (25 ns, 3 µs, ms, hour, year), moving from on-line to off-line processing, with data scales labelled giga, tera and petabit]

• LEVEL-1 trigger: hardwired processors (ASIC, FPGA), pipelined, massively parallel
• HIGH-LEVEL triggers: farms of processors
• Reconstruction & analysis: Tier 0/1/2 centres

Getting Started

http://ca.grid-support.ac.uk/

1. Get a digital certificate

2. Join a Virtual Organisation (VO) – for the LHC, join LCG and choose a VO

3. Get access to a local User Interface Machine (UI) and copy your files and certificate there

Authentication – who you are

http://lcg-registrar.cern.ch/

Authorisation – what you are allowed to do
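In practice, once the certificate is installed on the UI and the VO membership has been approved, the next step is usually to create a short-lived proxy credential from the certificate – for a VOMS-enabled VO this is typically done with voms-proxy-init – which the Grid tools then present on your behalf for both authentication (who you are) and authorisation (what you are allowed to do).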

Job Preparation

Prepare a file of Job Description Language (JDL):

############# athena.jdl #################
Executable = "athena.sh";
StdOutput = "athena.out";
StdError = "athena.err";
InputSandbox = {"athena.sh", "MyJobOptions.py", "MyAlg.cxx", "MyAlg.h", "MyAlg_entries.cxx", "MyAlg_load.cxx", "login_requirements", "requirements", "Makefile"};
OutputSandbox = {"athena.out", "athena.err", "ntuple.root", "histo.root", "CLIDDBout.txt"};
Requirements = Member("VO-atlas-release-10.0.4", other.GlueHostApplicationSoftwareRunTimeEnvironment);
################################################

In the JDL above:

• Executable and InputSandbox specify the input files: the script to run (athena.sh), the job options (MyJobOptions.py) and my C++ code (MyAlg.*)
• OutputSandbox lists the output files to be returned
• Requirements chooses the ATLAS software version
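A prepared JDL file like this would then typically be submitted from the UI with the middleware's workload-management command-line tools (for example edg-job-submit on LCG-2, or glite-wms-job-submit -a athena.jdl with the later gLite WMS, each with matching status and output-retrieval commands). The Resource Broker matches the Requirements expression against the information published by the sites and schedules the job on one that advertises the requested ATLAS release.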

Management: Mapping Grid Structures

[Diagram: GridPP management bodies mapped onto the Grid layers]

Layers:
• I. Experiment Layer
• II. Application Middleware
• III. Grid Middleware
• IV. Facilities and Fabrics

Bodies and areas:
• PMB (Project Management Board)
• Deployment Board – Tier-1/Tier-2, testbeds, rollout; service specification & provision
• User Board – requirements, application development, user feedback
• Middleware areas – metadata, workload, network, security, information & monitoring, storage

GridPP Status?

GridPP status (last night):

14 Sites

2,898 CPUs

124 TBytes storage