CDF Grid Status, Stefan Stonjek, 05-Jul-2005, 13th GridPP meeting / Durham.

CDF Grid Status
Stefan Stonjek
05-Jul-2005
13th GridPP meeting / Durham
Tue 05-Jul-2005, CDF Grid status report (Stefan Stonjek)

Outline
- SAM: Sequential Access via Metadata
  - file catalogue
  - metadata
- CAF: Central Analysis Farm
- JIM: Job Information and Monitoring
- Lessons learned
- Summary

CDF is running
- CDF is an experiment currently taking data
  - For a limited time
- Stable offline computing is high priority
  - Limited resources for Grid development
  - Limited possibilities to introduce new software
  - New software is accepted if it provides new functionality
- CDF is using some Grid technology
  - Large parts of the software will stay non-Grid aware
- We can learn from the experience gained at CDF

SAM
- SAM is currently used by DØ, CDF and MINOS
  - SAM was originally developed for DØ
- SAM is used in production at CDF
  - Production output is going directly into SAM
- SAM is now the only supported data-handling system at CDF
  - Some users know how to circumvent SAM

SAM problems
- Performance problems with db-servers
  - db-server = CORBA to SQL bridge
  - Large queries (many files) consume much memory
  - Currently solved by creating multiple db-server instances; this is not optimal
- Recovering from failed projects
  - A project covers many input files in many jobs
  - SAM “thinks” file based
  - Several input files, one output file and a crash in the middle cause a problem

SAM points of failure
- SAM strongly depends on central services
- The database is a single point of failure
  - SAM writes to the database for every action
  - To solve the problem
    - complete replication (with write access)
    - distributed database
  - No “off-the-shelf” solution
- The CORBA naming service is a single point of failure
  - Needed by every client to talk to the rest of the SAM universe
  - To solve the problem
    - redundant naming service
    - distributed naming service
  - Not enough manpower
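
The "redundant naming service" idea can be illustrated from the client side. The following is a minimal sketch, not SAM code: it assumes a client that holds a list of naming-service replicas and tries them in order. The function `lookup`, the host names and the exception class are all hypothetical stand-ins for the real CORBA call.

```python
from typing import Callable, List, Sequence


class AllReplicasFailed(Exception):
    """Raised when no naming-service replica answered."""


def resolve_with_failover(
    name: str,
    replicas: Sequence[str],
    lookup: Callable[[str, str], str],
) -> str:
    """Try each replica in order and return the first successful answer.

    `lookup(replica, name)` is a hypothetical stand-in for the real
    naming-service call; it is assumed to raise ConnectionError when
    a replica is unreachable.
    """
    errors: List[str] = []
    for replica in replicas:
        try:
            return lookup(replica, name)
        except ConnectionError as exc:
            errors.append(f"{replica}: {exc}")
    raise AllReplicasFailed("; ".join(errors))


# Toy lookup: the first replica is down, the second one answers.
def fake_lookup(replica: str, name: str) -> str:
    if replica == "ns1.example.org":
        raise ConnectionError("unreachable")
    return f"{name}@{replica}"


result = resolve_with_failover(
    "db-server", ["ns1.example.org", "ns2.example.org"], fake_lookup
)
```

With a single replica in the list this reduces to the situation on the slide: one unreachable service stops every client.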

SAM upload
- Tool to insert files into SAM from arbitrary nodes
- Important for the acceptance of SAM at CDF
- Intense use
  - Causes performance problems
  - Each client starts a thread in the db-server

Metadata
- SAM selects files based upon file metadata
- Two types of metadata
  - Physical file parameters (file size, checksum etc.)
  - Physics file parameters (run and event numbers, event information, time etc.)
- Only the schema for physical file parameters is fixed
- The schema for physics file parameters has to be dynamic (many changes required)
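
The split between a fixed and a dynamic schema can be pictured as a record with two parts. This is a sketch under stated assumptions, not the SAM schema: the class names, field names and example values are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class PhysicalParams:
    """Fixed schema: every file carries exactly these fields."""
    file_name: str
    size_bytes: int
    checksum: str


@dataclass
class FileMetadata:
    physical: PhysicalParams
    # Dynamic schema: free-form key/value pairs, so new physics
    # parameters can be added without changing the class.
    physics: Dict[str, Any] = field(default_factory=dict)


meta = FileMetadata(
    physical=PhysicalParams("example.root", 1024, "abc123"),
    physics={"run_number": 1000, "first_event": 1, "last_event": 500},
)

# A new physics parameter added later needs no schema change.
meta.physics["trigger"] = "hypothetical-trigger"
```

The price of the dynamic part is that nothing checks the keys, which is one reason such a schema needs frequent attention.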

Metadata (cont.)
- SAM uses a metadata query language
  - Called “dimensions”
  - Protects the user from SQL difficulties
  - Protects the database from user mistakes
- Therefore less flexible than plain SQL
- Requires constant adaptation to new requirements
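
The trade-off on this slide (protection versus flexibility) shows up in even a very small restricted query language. The sketch below is not the SAM "dimensions" syntax; the grammar, the whitelisted names and the table layout are assumptions made up for the example. It compiles a query to parameterized SQL and rejects anything outside the whitelist.

```python
import sqlite3
from typing import List, Tuple

# Whitelisted query names mapped to column names (all hypothetical).
ALLOWED_COLUMNS = {"run_number": "run_number", "file_size": "size_bytes"}
ALLOWED_OPS = {"=", "<", ">", "<=", ">="}


def compile_query(query: str) -> Tuple[str, List[int]]:
    """Translate 'name op value [and name op value ...]' into SQL."""
    clauses: List[str] = []
    params: List[int] = []
    for term in query.split(" and "):
        parts = term.split()
        if len(parts) != 3:
            raise ValueError(f"malformed term: {term!r}")
        name, op, value = parts
        if name not in ALLOWED_COLUMNS or op not in ALLOWED_OPS:
            raise ValueError(f"unsupported term: {term!r}")
        # Values never enter the SQL text; they are bound as parameters.
        clauses.append(f"{ALLOWED_COLUMNS[name]} {op} ?")
        params.append(int(value))
    sql = "SELECT file_name FROM files WHERE " + " AND ".join(clauses)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (file_name TEXT, run_number INTEGER, size_bytes INTEGER)"
)
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [("a.root", 100, 10), ("b.root", 200, 20), ("c.root", 300, 30)],
)

sql, params = compile_query("run_number >= 200 and file_size < 30")
matches = [row[0] for row in conn.execute(sql, params)]
```

Every new kind of question (a new column, an `or`, a string comparison) requires a change to the compiler, which is the "constant adaptation" cost named above.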

Lessons learned (SAM, metadata)
- Avoid single points of failure
  - Not new, but difficult with a database
- Keep as much information as possible local
  - Minimizes the impact of problems in the central database
- Need a flexible metadata query language
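
One way to read "keep as much information as possible local" is a client that remembers the last answer from the central service and falls back to it during an outage. The sketch below assumes exactly that; the class `CachedCatalogue`, the callable `central` and the file paths are hypothetical and do not describe how SAM stations actually cache.

```python
from typing import Callable, Dict


class CachedCatalogue:
    """Answer lookups centrally, falling back to a local copy on failure."""

    def __init__(self, central_lookup: Callable[[str], str]) -> None:
        self._central_lookup = central_lookup
        self._local: Dict[str, str] = {}

    def locate(self, file_name: str) -> str:
        try:
            location = self._central_lookup(file_name)
        except ConnectionError:
            # Central service is down: use the last known answer, if any.
            if file_name in self._local:
                return self._local[file_name]
            raise
        self._local[file_name] = location
        return location


# Toy central service that can be switched off.
state = {"up": True}


def central(file_name: str) -> str:
    if not state["up"]:
        raise ConnectionError("central database unavailable")
    return f"/data/{file_name}"


catalogue = CachedCatalogue(central)
first = catalogue.locate("a.root")   # answered centrally, then cached
state["up"] = False
second = catalogue.locate("a.root")  # answered from the local copy
```

Files that were never looked up before the outage still fail, so this limits the impact of a central problem rather than removing it.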

CAF
- CAF: Central (or CDF) Analysis Farm
- Good sandbox technology
- Good graphical job submission interface
- Does job multiplication for the user
  - Submit once, execute multiple times
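
"Submit once, execute multiple times" amounts to expanding one submission into numbered sections. The following sketch shows the idea only; the `{section}` placeholder, the script name and the `JobSection` class are assumptions for illustration and not the CAF interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class JobSection:
    section: int
    command: str


def multiply_job(command_template: str, n_sections: int) -> List[JobSection]:
    """Expand one submission into n_sections jobs.

    The template is assumed to contain a '{section}' placeholder that
    is replaced by the section number, counting from 1.
    """
    if n_sections < 1:
        raise ValueError("n_sections must be at least 1")
    return [
        JobSection(i, command_template.format(section=i))
        for i in range(1, n_sections + 1)
    ]


jobs = multiply_job("./run_analysis.sh --section {section}", 3)
commands = [job.command for job in jobs]
```

Each section can then use its number to pick its own share of the input or its own random seed, so the user writes the job once.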

CAF (cont.)
- Distributed CAF (DCAF)
  - Many sites around the world
  - In use for Monte-Carlo production
  - Human-based resource brokering
- CondorCAF (Glide-ins)
  - New CAF version uses Condor
  - Allows Glide-ins
- GridCAF
  - “edg-*” compatible job submission
  - CAF-GUI submits to the grid, no job multiplication

JIM
- JIM: Job Information and Monitoring
- Together with SAM, the system which produces CDF Monte-Carlo
- Requires additional software to be installed on Grid sites
  - SAM
  - Small differences in resource advertising
- Working towards interoperability between JIM and LCG-Grid sites

Summary
- CDF is using some Grid tools
- LHC experiments can learn from CDF experience
  - SAM
    - central database
    - metadata
  - CAF
    - submission GUI
    - job multiplication