
Grid3: an Application Grid Laboratory for Science. Rob Gardner, University of Chicago, on behalf of the Grid3 project. CHEP ’04, Interlaken, September 28, 2004


Page 1:

1

Grid3: an Application Grid Laboratory for Science

Rob Gardner, University of Chicago, on behalf of the Grid3 project

CHEP ’04, Interlaken

September 28, 2004

Page 2:

2

Grid2003: an application grid laboratory

virtual data grid laboratory

virtual data research

end-to-end HENP applications

CERN LHC: US ATLAS testbeds & data challenges

CERN LHC: US CMS testbeds & data challenges

Grid3

Page 3:

3

Grid3 at a Glance

Grid environment built from core Globus and Condor middleware, as delivered through the Virtual Data Toolkit (VDT): GRAM, GridFTP, MDS, RLS, VDS

…equipped with VO and multi-VO security, monitoring, and operations services

…allowing federation with other Grids where possible, e.g. the CERN LHC Computing Grid (LCG): US ATLAS: GriPhyN VDS execution on LCG sites; US CMS: storage element interoperability (SRM/dCache)

Delivering the US LHC Data Challenges

Page 4:

4

Grid3 Design

Simple approach:

Sites consisting of a Computing Element (CE), a Storage Element (SE), and information and monitoring services

VO-level and multi-VO services: VO information services, operations (iGOC)

Minimal use of grid-wide systems: no centralized workload manager, replica or data management catalogs, or command-line interface; higher-level services are provided by the individual VOs

[Diagram: multiple sites, each with a CE and SE, served by per-VO services and the iGOC]
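The split above is mostly about where state lives: a site exposes only local services (CE, SE, information and monitoring), while anything stateful such as workload management or catalogs belongs to an individual VO. A minimal sketch of that division, with hypothetical field values, might look like the following.

    from dataclasses import dataclass, field

    @dataclass
    class Site:
        """A Grid3 site exposes only local services: CE, SE, info/monitoring."""
        name: str
        compute_element: str   # e.g. a GRAM gatekeeper contact (hypothetical value)
        storage_element: str   # e.g. a GridFTP/SRM endpoint (hypothetical value)
        info_service: str      # site info providers / GIIS

    @dataclass
    class VirtualOrganization:
        """Higher-level, stateful services (workload manager, catalogs) live
        with the VO rather than grid-wide (the 'minimal grid-wide systems' choice)."""
        name: str
        workload_manager: str
        replica_catalog: str
        sites: list[Site] = field(default_factory=list)

    # The only genuinely multi-VO pieces are the information collectors and
    # the operations center (iGOC); everything else federates per VO.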

Page 5:

5

Site Services and Installation

Goal is to install and configure with minimal human intervention: use Pacman and distributed software "caches"; register the site with VO- and Grid3-level services; set up accounts, application install areas, and working directories (see the sketch below)

[Diagram: a Grid3 site (Compute Element + Storage) installed with the command "pacman -get iVDGL:Grid3"; the VDT, VO services, GIIS registration, info providers (Grid3 schema), log management, and the $app and $tmp areas; a four-hour install and validate]
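The slide gives only the Pacman command and the outcome (a roughly four-hour install and validate), so the following is a rough sketch of what a scripted site install could look like; the registration and validation helpers are placeholders, not actual Grid3 tools.

    import os
    import subprocess

    GRID3_CACHE = "iVDGL:Grid3"   # Pacman cache named on the slide
    APP_DIR = "/grid3/app"        # hypothetical locations for the $app
    TMP_DIR = "/grid3/tmp"        # and $tmp areas

    def register_with_giis(site_name: str) -> None:
        # Placeholder: register the site's info providers (Grid3 schema)
        # with the top-level GIIS collector.
        print(f"registering {site_name} with the GIIS collector")

    def validate_site(site_name: str) -> None:
        # Placeholder for the install-and-validate checks.
        print(f"running validation jobs against {site_name}")

    def install_site(site_name: str) -> None:
        # Pull the VDT-based middleware and Grid3 configuration via Pacman.
        subprocess.run(["pacman", "-get", GRID3_CACHE], check=True)
        # Create the application install area and working directories
        # that VOs expect to find on the site.
        for d in (APP_DIR, TMP_DIR):
            os.makedirs(d, exist_ok=True)
        register_with_giis(site_name)
        validate_site(site_name)

    if __name__ == "__main__":
        install_site("example-site")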

Page 6:

6

Multi-VO Security Model

DOEGrids Certificate Authority; PPDG or iVDGL Registration Authority

Authorization service: VOMS

Each Grid3 site generates a Globus gridmap file with an authenticated SOAP query to each VO service (see the sketch below)

Site-specific adjustments or mappings

Group accounts to associate VOs with jobs

[Diagram: VOMS servers for iVDGL, US ATLAS, US CMS, LSC, SDSS, and BTeV feeding the Grid3 grid-map file at each site]
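Concretely, gridmap generation amounts to pulling the member DNs from each VO's service and rewriting the local grid-mapfile, applying site-specific overrides first and otherwise mapping each DN to that VO's group account. The sketch below assumes hypothetical VOMS URLs, account names, and a stubbed fetch function; the actual Grid3 tooling is not reproduced here.

    # Rebuild a Globus grid-mapfile from per-VO membership lists.
    # VOMS endpoints, the fetch function, and account names are illustrative.

    VO_SERVICES = {
        "usatlas": "https://voms.example.org/usatlas",   # hypothetical URLs
        "uscms":   "https://voms.example.org/uscms",
        "ivdgl":   "https://voms.example.org/ivdgl",
    }

    GROUP_ACCOUNTS = {"usatlas": "usatlas1", "uscms": "uscms1", "ivdgl": "ivdgl1"}

    # Site-specific adjustments: individual DNs mapped to local accounts.
    SITE_OVERRIDES = {"/DC=org/DC=doegrids/OU=People/CN=Example Admin": "siteadmin"}

    def fetch_member_dns(url: str) -> list[str]:
        # Placeholder for the authenticated SOAP query to the VO service.
        return []

    def build_gridmap() -> str:
        lines = []
        for vo, url in VO_SERVICES.items():
            for dn in fetch_member_dns(url):
                account = SITE_OVERRIDES.get(dn, GROUP_ACCOUNTS[vo])
                lines.append(f'"{dn}" {account}')
        return "\n".join(lines) + "\n"

    if __name__ == "__main__":
        with open("grid-mapfile", "w") as handle:
            handle.write(build_gridmap())

Group accounts keep the authorization coarse-grained, but they make it straightforward to associate a running job with its VO, which is the trade-off the slides describe.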

Page 7:

7

iVDGL Operations Center (iGOC)

Co-located with the Abilene NOC (Indianapolis)

Hosts and manages multi-VO services: top-level Ganglia and GIIS collectors; MonALISA web server and archival service; VOMS servers for iVDGL, BTeV, and SDSS; Site Catalog service; Pacman caches

Trouble ticket systems: phone (24 hr), web- and email-based collection and reporting; investigation and resolution of grid middleware problems at the level of ~30 contacts per week

Weekly operations meetings for troubleshooting

Page 8:

8

Service monitoring

Grid3 – a snapshot of sites

Sep 04: 30 sites, multi-VO; shared resources; ~3000 CPUs (shared)

Page 9:

9

Grid3 Monitoring Framework

c.f. M. Mambelli, B. Kim et al., #490

Page 10:

10

Monitors

Jobs by VO (ACDC)

Job Queues (MonALISA)

Data IO (MonALISA)

Metrics (MDViewer)

Page 11:

11

Use of Grid3 – led by US LHC

7 scientific applications and 3 CS demonstrators; a third HEP experiment and two biology experiments also participated

Over 100 users authorized to run on Grid3; application execution performed by dedicated individuals; typically a few users ran the applications from a particular experiment

Page 12:

12

US CMS Data Challenge DC04

CMS dedicated (red)

Opportunistic use of Grid3, non-CMS (blue)

Events produced vs. day

c.f. A. Fanfani, #497

Page 13:

14

Shared infrastructure, last 6 months

[Chart: CPU usage across the shared Grid3 sites over the last 6 months, with the CMS DC04 and ATLAS DC2 periods marked, through Sep 10]

Page 14:

15

ATLAS DC2 production on Grid3: a joint activity with LCG and NorduGrid

[Chart: number of validated ATLAS DC2 jobs per day, broken down by grid (LCG, NorduGrid, Grid3) and in total; G. Poulard, 9/21/04]

c.f. L. Goossens, #501 & O. Smirnova, #499

Page 15:

17

Beyond LHC applications…

Astrophysics and astronomy. LIGO/LSC: blind search for continuous gravitational waves. SDSS: maxBcg, cluster-finding package

Biochemical. SnB: bio-molecular program, analyses of X-ray diffraction data to find molecular structures. GADU/Gnare: genome analysis, compares protein sequences

Computer science. Supporting Ph.D. research on adaptive data placement and scheduling algorithms, and on mechanisms for policy information expression, use, and monitoring

Page 16:

18

Astrophysics: Sloan Sky Survey

Image stripes of the sky from telescope data sources: galaxy cluster finding, redshift analysis, weak-lensing effects

Analyze weighted images; increase sensitivity by 2 orders of magnitude with object detection and measurement code

Workflow: replicate sky-segment data to Grid3 sites; average, analyze, and send the output to Fermilab. 44,000 jobs, 30% complete
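As an illustration of the per-segment flow just described (replicate, average and analyze, ship the output to Fermilab), here is a hedged sketch; the site list, endpoint, and helper functions are invented for the example and do not reflect the actual SDSS tooling.

    # Hedged sketch of the per-segment SDSS workflow: replicate input images
    # to a Grid3 site, run the averaging/analysis step there, and return the
    # output to Fermilab.  Sites, endpoint, and helpers are invented.

    SITES = ["site-a.example.edu", "site-b.example.edu"]
    FNAL_OUTPUT = "gsiftp://fnal.example.gov/sdss/output"

    def replicate(segment: str, site: str) -> None:
        print(f"copying {segment} input images to {site}")

    def analyze(segment: str, site: str) -> str:
        print(f"averaging and analyzing {segment} on {site}")
        return f"{segment}.out"

    def ship_to_fermilab(output_file: str) -> None:
        print(f"transferring {output_file} to {FNAL_OUTPUT}")

    def process_segments(segments: list[str]) -> None:
        for i, segment in enumerate(segments):
            site = SITES[i % len(SITES)]   # trivial round-robin placement
            replicate(segment, site)
            ship_to_fermilab(analyze(segment, site))

    if __name__ == "__main__":
        process_segments(["stripe-10", "stripe-11"])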

Page 17:

19

Time period: May 1 – Sept. 1, 2004

Total number of jobs: 71,949

Total CPU time: 774 CPU-days

Average job runtime: 0.26 hr

SDSS Job Statistics on Grid3
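As a rough consistency check on these numbers: 71,949 jobs × 0.26 hr ≈ 18,700 CPU-hours ≈ 780 CPU-days, which matches the quoted 774 CPU-days to within the rounding of the average runtime.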

Page 18:

20

Structural Biology

SnB is a computer program based on Shake-and-Bake, in which:

A dual-space direct-methods procedure for determining molecular crystal structures from X-ray diffraction data is used.

Difficult molecular structures containing as many as 2000 unique non-H atoms have been solved in a routine fashion.

SnB has been routinely applied to jump-start the solution of large proteins, increasing the number of selenium atoms determined in Se-Met molecules from dozens to several hundred.

SnB is expected to play a vital role in the study of ribosomes and large macromolecular assemblies containing many different protein molecules and hundreds of heavy-atom sites.

Page 19:

21

Genomic Searches and Analysis

Searches for and finds new genomes on public databases (e.g. NCBI)

Each genome is composed of ~4k genes; each gene needs to be processed and characterized, each gene is handled by a separate process, and results are saved for future use

Also: BLAST protein sequences

[Diagram: GADU workflow]
User Interface: select genomes to run through the tools
PDB Acquisition: ftp to public databases (PDB); search for new or updated genomes, or exit
Smart Diff: compare the local directory with the PDB directory (new/updated vs. old)
Genome Upload: get new or updated genomes into the local directory; create info files for analyzing the genomes
Pre-HPC: select jobs to run; parse the info files
Submit to bio tools: submit genomes in parallel to BLAST, Pfam, Blocks
HPC Processing: Chiba City
Check Output: correct or error
Tool Grabber: parse data from output files
GenBank Grabber: parse data from annotation files
Oracle relational DB: Genome Integrated Database
User Notification: notify the user regarding updates

Example info file – Organism name: Corynebacterium_glutamicum; Version and GI number: NC_003450.1, GI:19551250; Definition: Corynebacterium glutamicum, complete genome; Sequence qty: 3456; Path to fasta file: /nfs/............; Tool: ChibaBlast

GADU: 250 processors; 3M sequences ID'd: bacterial, viral, vertebrate, mammal
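A compressed sketch of that pipeline's control flow: acquire new or updated genomes, fan each genome out to the bio tools in parallel, and load correct results into the integrated database. All helpers are stubs and the names are illustrative; the actual GADU implementation is not reproduced.

    # Compressed sketch of the GADU-style pipeline: acquire new/updated
    # genomes, run each through the bio tools in parallel, store correct
    # results.  All helpers are stubs with illustrative behaviour.

    from concurrent.futures import ThreadPoolExecutor

    BIO_TOOLS = ["blast", "pfam", "blocks"]   # tools named on the slide

    def acquire_updated_genomes() -> list[str]:
        # Placeholder for PDB acquisition plus the Smart Diff step.
        return ["NC_003450.1"]

    def run_tool(tool: str, genome: str) -> dict:
        # Placeholder for submitting one genome to one tool on the HPC resource.
        return {"tool": tool, "genome": genome, "status": "correct"}

    def store_result(result: dict) -> None:
        # Placeholder for the grabbers loading the integrated database.
        print("stored", result)

    def process(genomes: list[str]) -> None:
        with ThreadPoolExecutor() as pool:
            futures = [pool.submit(run_tool, t, g)
                       for g in genomes for t in BIO_TOOLS]
            for fut in futures:
                result = fut.result()
                if result["status"] == "correct":
                    store_result(result)

    if __name__ == "__main__":
        process(acquire_updated_genomes())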

Page 20:

22

Lessons (1)

Human interactions in grid building are costly

Keeping site requirements light led to heavy loads on gatekeeper hosts

A diverse set of sites made exchanging job requirements difficult

Single-point failures rarely happened; certificate revocation lists expiring happened twice

Configuration problems: Pacman helped, but we still spent enormous amounts of time diagnosing problems

Page 21:

23

Lessons (2)

Software updates were either relatively easy or extremely painful

Authorization: simple in Grid3, but coarse-grained

Troubleshooting: efficiency for submitted jobs was not as high as we would like; a complex system with many failure modes and points of failure; need fine-grained monitoring tools; need to improve at both the service level and the user level

Page 22:

24

Operations Experience

iGOC and the US ATLAS Tier1 (BNL) developed an operations response model in support of DC2

Tier1 center: core services; an "on-call" person always available; response protocol developed

iGOC: coordinates problem resolution for the Tier1 during "off hours"; trouble handling for non-ATLAS Grid3 sites; problems resolved at weekly iVDGL operations meetings

~600 trouble tickets (generic); ~20 ATLAS DC2 specific

Extensive use of email lists

Page 23:

25

Not major problems:

bringing sites into single-purpose grids

simple computational grids for highly portable applications

specific workflows as defined by today's JDL and/or DAG approaches

centralized, project-managed grids up to a particular scale (beyond that, yet to be seen)

Page 24:

26

Major problems: two perspectives

Site and service provider perspective: maintaining multiple "logical" grids with a given resource; maintaining robustness; long-term management; dynamic reconfiguration; platforms; complex resource-sharing policies (department, university, projects, collaborations) and user roles

Application developer perspective: the challenge of building integrated distributed systems; end-to-end debugging of jobs and understanding faults; common workload and data management systems developed separately for each VO

Page 25:

27

Grid3 is evolving into OSG

Main features/enhancements: Storage Resource Management; improved authorization service; added data management capabilities; improved monitoring and information services; service challenges and interoperability with other Grids

Timeline: the current Grid3 remains stable through 2004; service development continues; Grid3dev platform

c.f. R. Pordes, #192

Page 26:

28

Conclusions

Grid3 taught us many lessons about how to deploy and run a production grid

Breakthrough in the demonstrated use of "opportunistic" resources, enabled by grid technologies

Grid3 will be a critical resource for continued data challenges through 2004, and environment to learn how to operate and upgrade large scale production grids

Grid3 is evolving to OSG with enhanced capabilities

Page 27:

29

Acknowledgements

R. Pordes (Grid3 co-coordinator) and the rest of the Grid3 team, which did all the work! Site administrators, VO service administrators, application developers, developers and contributors, the iGOC team, and the project teams