CDF Grid Status
Stefan Stonjek
05-Jul-2005
13th GridPP meeting / Durham

Tue 05-Jul-2005, CDF Grid status report (Stefan Stonjek)

Outline

- SAM: Sequential Access via Metadata
  - file catalogue
  - metadata
- CAF: Central Analysis Farm
- JIM: Job Information and Monitoring
- Lessons learned
- Summary

CDF is running

- CDF is an experiment that is currently taking data
  - for a limited time
- Stable offline computing is a high priority
  - Limited resources for Grid development
  - Limited possibilities to introduce new software
  - New software is accepted if it provides new functionality
- CDF is using some Grid technology
  - Large parts of the software will remain non-Grid-aware
- We can learn from the experience gained at CDF

SAM

- SAM is currently used by DØ, CDF and MINOS
- SAM was originally developed for DØ
- SAM is used in production at CDF
  - Production output goes directly into SAM
- SAM is now the only supported data-handling system at CDF
  - Some users know how to circumvent SAM

SAM problems

- Performance problems with the db-servers
  - db-server = CORBA-to-SQL bridge
  - Large queries (many files) consume a lot of memory
  - Currently solved by running multiple db-server instances; this is not optimal
- Recovering from failed projects
  - A project covers many input files in many jobs
  - SAM “thinks” in terms of files
  - A job that reads several input files, writes one output file and crashes in the middle causes a problem
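
The recovery problem can be illustrated with a toy model. This is a hypothetical sketch, not SAM's interface: the class `FileBasedProject`, its methods and the per-file states are invented for illustration. It assumes each input file is recorded as consumed as soon as a job reads it, while the single output file is declared only when the job finishes cleanly.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FileBasedProject:
    """Hypothetical file-based bookkeeping: state is tracked per input file."""

    inputs: list[str]
    consumed: set[str] = field(default_factory=set)
    # Maps each declared output file to the input files it was made from.
    outputs: dict[str, list[str]] = field(default_factory=dict)

    def run_job(self, job_inputs: list[str], output: str,
                crash_before: int | None = None) -> None:
        """Mark inputs consumed one by one; declare the output only at the end.

        crash_before simulates a crash just before reading job_inputs[crash_before].
        """
        for i, name in enumerate(job_inputs):
            if crash_before is not None and i == crash_before:
                raise RuntimeError(f"job crashed before reading {name}")
            self.consumed.add(name)  # recorded immediately, file by file
        self.outputs[output] = list(job_inputs)  # recorded only on success

    def unconsumed(self) -> list[str]:
        """What a naive restart would process: files never marked consumed."""
        return [f for f in self.inputs if f not in self.consumed]

    def orphaned_inputs(self) -> set[str]:
        """Inputs marked consumed that never contributed to a declared output."""
        covered = {f for files in self.outputs.values() for f in files}
        return self.consumed - covered


project = FileBasedProject(inputs=["file1", "file2", "file3", "file4"])
project.run_job(["file1", "file2"], output="out1")
try:
    project.run_job(["file3", "file4"], output="out2", crash_before=1)
except RuntimeError:
    pass  # the job died after reading file3 but before writing out2
print(project.unconsumed())               # ['file4']
print(sorted(project.orphaned_inputs()))  # ['file3']
```

A restart that only processes unconsumed files would pick up `file4` and silently lose `file3`, whose events were read but never written anywhere.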

SAM points of failure

- SAM depends strongly on central services
- The database is a single point of failure
  - SAM writes to the database for every action
  - Possible solutions:
    - complete replication (with write access)
    - distributed database
  - No “off-the-shelf” solution
- The CORBA naming service is a single point of failure
  - Needed by every client to talk to the rest of the SAM universe
  - Possible solutions:
    - redundant naming service
    - distributed naming service
  - Not enough manpower
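
A redundant naming service usually means that clients know several replicas and try them in turn. The following is a minimal sketch of that client-side failover, assuming a hypothetical `lookup(endpoint, name)` callable that raises `ConnectionError` when a replica is unreachable; the endpoint and service names are invented, and this is not the CORBA Naming Service API.

```python
from typing import Callable, Sequence


def resolve_with_failover(
    name: str,
    endpoints: Sequence[str],
    lookup: Callable[[str, str], str],
) -> str:
    """Resolve `name` via the first naming-service replica that answers."""
    errors = []
    for endpoint in endpoints:
        try:
            return lookup(endpoint, name)
        except ConnectionError as exc:  # replica down: try the next one
            errors.append(f"{endpoint}: {exc}")
    raise ConnectionError("all naming-service replicas failed: " + "; ".join(errors))


# Usage with a fake lookup in which the primary replica is down.
def fake_lookup(endpoint: str, name: str) -> str:
    if endpoint == "ns-primary":
        raise ConnectionError("no route to host")
    return f"{name}@{endpoint}"


print(resolve_with_failover("dbserver", ["ns-primary", "ns-backup"], fake_lookup))
# dbserver@ns-backup
```

This only removes the client's dependence on one naming-service host; the replicas themselves still have to be kept consistent, which is where the manpower cost lies.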

SAM upload

- Tool to insert files into SAM from arbitrary nodes
- Important for the acceptance of SAM at CDF
- Intense use
  - Causes performance problems
  - Each client starts a thread in the db-server
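
Why intense use hurts can be seen in a thread-per-client model: the number of server threads grows one-for-one with the number of simultaneous clients. The sketch below is a hypothetical toy (the class `ThreadPerClientServer` is invented); it only counts threads and does not talk to SAM or a database. A barrier makes all clients connected at the same moment, so the peak count is deterministic.

```python
import threading


class ThreadPerClientServer:
    """Toy model of a server that starts one thread per client (hypothetical)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def _handle(self, barrier: threading.Barrier) -> None:
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        barrier.wait()  # all clients are connected at the same time
        with self._lock:
            self.active -= 1

    def serve(self, n_clients: int) -> int:
        """Handle n_clients simultaneous clients; return the peak thread count."""
        barrier = threading.Barrier(n_clients)
        threads = [threading.Thread(target=self._handle, args=(barrier,))
                   for _ in range(n_clients)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return self.peak


for n in (10, 100):
    print(n, "clients ->", ThreadPerClientServer().serve(n), "server threads")
```

Each of those threads also holds its own memory and database resources, which is why many concurrent uploads translate directly into db-server load.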

Metadata

- SAM selects files based on file metadata
- Two types of metadata
  - Physical file parameters (file size, checksum, etc.)
  - Physics file parameters (run and event numbers, event information, time, etc.)
- Only the schema for the physical file parameters is fixed
- The physics file parameter schema has to be dynamic (many changes required)
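
The split between a fixed physical schema and a dynamic physics schema could be modelled roughly as follows. This is a hypothetical sketch: the field names, the example keys `run`, `stream` and `new_flag`, and the `select` helper are illustrations, not SAM's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class PhysicalParams:
    """Fixed schema: every file has exactly these fields."""

    file_name: str
    file_size: int  # bytes
    checksum: str


@dataclass
class FileRecord:
    physical: PhysicalParams
    # Dynamic schema: new physics keys can appear without a schema change.
    physics: dict[str, Any] = field(default_factory=dict)


def select(records: list[FileRecord], **criteria: Any) -> list[str]:
    """Return names of files whose physics metadata match all criteria."""
    return [
        r.physical.file_name
        for r in records
        if all(r.physics.get(key) == value for key, value in criteria.items())
    ]


catalogue = [
    FileRecord(PhysicalParams("f1", 1_000_000, "aa11"), {"run": 1001, "stream": "B"}),
    FileRecord(PhysicalParams("f2", 2_000_000, "bb22"), {"run": 1001, "stream": "J"}),
    # A later file carries an extra physics key that older files lack.
    FileRecord(PhysicalParams("f3", 1_500_000, "cc33"),
               {"run": 1002, "stream": "B", "new_flag": True}),
]
print(select(catalogue, run=1001))       # ['f1', 'f2']
print(select(catalogue, stream="B"))     # ['f1', 'f3']
print(select(catalogue, new_flag=True))  # ['f3']
```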

Metadata (cont.)

- SAM uses a metadata query language
  - called “dimensions”
  - protects the user from SQL difficulties
  - protects the database from user mistakes
- It is therefore less flexible than plain SQL
  - requires constant adaptation to new requirements
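
The trade-off can be sketched with a toy translator from a restricted query string to parameterised SQL. Everything here is an assumption for illustration: the `key = value and key = value` syntax, the `files` table and the whitelist are invented and are not the real SAM dimensions grammar. Only whitelisted keys are accepted and values are bound as parameters, so users never write SQL; but anything the grammar does not cover, such as a range, is rejected until the translator is extended.

```python
import sqlite3

# Hypothetical whitelist: keys the toy language understands -> SQL columns.
ALLOWED_KEYS = {"run": "run_number", "stream": "stream"}


def dimensions_to_sql(query: str) -> tuple[str, list[str]]:
    """Translate 'key = value and key = value' into parameterised SQL."""
    clauses: list[str] = []
    params: list[str] = []
    for term in query.split(" and "):
        key, sep, value = (part.strip() for part in term.partition("="))
        if not sep or key not in ALLOWED_KEYS or not value:
            raise ValueError(f"unsupported term: {term!r}")
        clauses.append(f"{ALLOWED_KEYS[key]} = ?")
        params.append(value)  # bound as a parameter, never pasted into SQL
    sql = ("SELECT file_name FROM files WHERE " + " AND ".join(clauses)
           + " ORDER BY file_name")
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (file_name TEXT, run_number TEXT, stream TEXT)")
conn.executemany("INSERT INTO files VALUES (?, ?, ?)",
                 [("f1", "1001", "B"), ("f2", "1001", "J"), ("f3", "1002", "B")])
sql, params = dimensions_to_sql("run = 1001 and stream = B")
print([row[0] for row in conn.execute(sql, params)])  # ['f1']

try:
    dimensions_to_sql("run >= 1001")  # ranges are not part of the toy grammar
except ValueError as exc:
    print(exc)
```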

Lessons learned (SAM, metadata)

- Avoid single points of failure
  - Not new, but difficult with a database
- Keep as much information as possible local
  - Minimizes the impact of problems in the central database
- Need a flexible metadata query language
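
Keeping information local can be sketched as a read-through cache that falls back to the last known value when the central database is unreachable. In this hypothetical sketch, `LocalMetadataCache` and `fetch_central` are invented stand-ins for a central database query; it shows the idea only, not SAM's design, and it covers reads only. Writes, which SAM performs for every action, are the harder part.

```python
from typing import Callable, Dict


class LocalMetadataCache:
    """Read-through cache: use the central database when possible,
    fall back to the last locally stored copy when it is unreachable."""

    def __init__(self, fetch_central: Callable[[str], dict]) -> None:
        self._fetch = fetch_central
        self._local: Dict[str, dict] = {}

    def get(self, file_name: str) -> dict:
        try:
            record = self._fetch(file_name)
        except ConnectionError:
            if file_name in self._local:
                return self._local[file_name]  # central DB down: serve local copy
            raise  # nothing cached locally: the failure is visible to the caller
        self._local[file_name] = record  # refresh the local copy on success
        return record


# Usage with a fake central database that can be switched off.
central_up = True


def fetch_central(file_name: str) -> dict:
    if not central_up:
        raise ConnectionError("central database unavailable")
    return {"file_name": file_name, "file_size": 1234}


cache = LocalMetadataCache(fetch_central)
cache.get("f1")         # fetched centrally and stored locally
central_up = False
print(cache.get("f1"))  # {'file_name': 'f1', 'file_size': 1234}
```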

CAF

- CAF: Central (or CDF) Analysis Farm
  - Good sandbox technology
  - Good graphical job submission interface
  - Does job multiplication for the user
    - Submit once, execute multiple times
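
Job multiplication ("submit once, execute multiple times") can be sketched as expanding one submission into numbered sections that each receive a share of the input files. The `JobSection` type, the round-robin split and the numbering from 1 are assumptions for illustration, not CAF's actual mechanism.

```python
from dataclasses import dataclass


@dataclass
class JobSection:
    section: int  # 1-based section number
    command: str
    input_files: list[str]


def multiply_job(command: str, input_files: list[str], n_sections: int) -> list[JobSection]:
    """Expand one submission into n_sections jobs with a round-robin file split."""
    if n_sections < 1:
        raise ValueError("n_sections must be at least 1")
    return [
        JobSection(i + 1, command, input_files[i::n_sections])
        for i in range(n_sections)
    ]


for job in multiply_job("./analysis.sh", [f"file{k}" for k in range(1, 8)], 3):
    print(job.section, job.input_files)
# 1 ['file1', 'file4', 'file7']
# 2 ['file2', 'file5']
# 3 ['file3', 'file6']
```

The user submits a single description; the farm, not the user, takes care of producing and tracking the individual sections.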

CAF (cont.)

- Distributed CAF (DCAF)
  - Many sites around the world
  - In use for Monte Carlo production
  - Human-based resource brokering
- CondorCAF (glide-ins)
  - New CAF version uses Condor
  - Allows glide-ins
- GridCAF
  - “edg-*”-compatible job submission
  - CAF GUI submits to the Grid, no job multiplication

JIM

- JIM: Job Information and Monitoring
  - Together with SAM, the system that produces CDF Monte Carlo
  - Requires additional software to be installed on Grid sites
    - SAM
  - Small differences in resource advertising
  - Working towards interoperability between JIM and LCG Grid sites

Summary

- CDF is using some Grid tools
- LHC experiments can learn from CDF experience
  - SAM
    - central database
    - metadata
  - CAF
    - submission GUI
    - job multiplication