9 February 2000, CHEP2000, Paper 368
CDF Data Handling: Resource Management and Tests
E. Buckley-Geer, S. Lammel, F. Ratnikov, T. Watts
• Hardware and Resources
• Organization of Data
• User View of Access to Data
• Batch queues
• Disk Management
• Tests
Hardware Resources; Organization of Data.
• Mixed flavor unix cluster (CPU resource).
• Fibre channel disk arrays, currently attached to each node of the cluster (disk resource).
• Tape drives and robot tape library (tape drive resource). Drives are connected directly to each node.
• This talk concentrates on the resources used while reading data.
• Datasets, filesets, files of 1GB.
• Datasets: raw, primary, secondary…
• Tapes store a group of filesets.
• Associations in Datafile Catalog (see Paper 367).
User View
[Figure: dh2_fig1.fig]
User View of Access to Data
• Batch queues to manage cpu cycles
• Access data only from disk, not tape.
• Staging jobs in parallel.
• Disk inventory manager package for shared disk space.
• Batch queues to manage tape drives.
[Figure: DH_user_dim.eps]
Batch Queues
• LSF (Platform Computing) proposed.
• Fairshare scheduling.
• Combined quotas across queues desirable.
• CPU queues for analysis jobs:
• Allocate CPU cycles by group, by user, by special project.
• I/O queues for staging jobs: input, output, event pick.
• 1 tape drive per queue slot.
• I/O job CPU use is proportional to data volume.
• Allocate drives and data volume by group, user, project.
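The fairshare idea above can be illustrated with a minimal sketch. The priority formula, the function names, and the group names are assumptions made for illustration only; this is not LSF's actual dynamic-priority calculation.

```python
def fairshare_priority(shares, used):
    """Assumed toy formula: priority falls as a group consumes its allocation."""
    return shares / (1.0 + used)


def pick_next(groups):
    """groups maps a group name to (shares, usage so far).

    Returns the group whose next job should be dispatched.
    """
    return max(groups, key=lambda g: fairshare_priority(*groups[g]))


# Two hypothetical groups with equal shares; groupA has already used 9 units.
queue_state = {"groupA": (10, 9.0), "groupB": (10, 0.0)}
next_group = pick_next(queue_state)
```

The same bookkeeping could in principle be applied to an I/O queue by counting staged data volume or drive time as the usage, since each I/O queue slot corresponds to one tape drive.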
Disk Management
• By fileset (reduce bookkeeping overhead)
• Allow static filesets for important datasets
• Filesets remain on disk until space is needed.
• Use-reservation prevents deletion of fileset.
• Delete algorithm looks at frequency of use and time since last use-reservation.
• Allocate space algorithm uses quotas by group and user.
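A minimal sketch of how such a delete algorithm might pick filesets is given below. The `Fileset` fields, the scoring formula, and the function names are hypothetical; the slides state only that frequency of use and time since the last use-reservation are considered, and that reserved or static filesets are not deleted. Quota checks by group and user are left out.

```python
from dataclasses import dataclass


@dataclass
class Fileset:
    name: str
    size_gb: float
    use_count: int = 0         # number of past use-reservations
    last_release: float = 0.0  # time the last use-reservation was released
    reservations: int = 0      # active use-reservations
    static: bool = False       # pinned on disk for an important dataset


def eviction_score(fs, now):
    """Assumed formula: long-idle, rarely used filesets score highest."""
    return (now - fs.last_release) / (1 + fs.use_count)


def choose_victims(filesets, needed_gb, now):
    """Return names of filesets to delete so that needed_gb is freed.

    Returns an empty list if the unreserved, non-static filesets cannot
    free enough space.
    """
    candidates = [f for f in filesets if f.reservations == 0 and not f.static]
    candidates.sort(key=lambda f: eviction_score(f, now), reverse=True)
    victims, freed = [], 0.0
    for f in candidates:
        if freed >= needed_gb:
            break
        victims.append(f.name)
        freed += f.size_gb
    return victims if freed >= needed_gb else []
```

In this sketch a fileset with an active use-reservation is never a candidate, which mirrors the rule that a use-reservation prevents deletion.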
User Job and Disk Management
• User gives dataset
• Dataset converted to list of filesets
• Stager manages list and returns next fileset when asked.
Stager Part of User Job:
• Maintains small buffer of use-reservations to keep ahead of analysis job
• Adds use-reservations for filesets on disk or spawns input staging jobs to maintain buffer
• Releases use-reservations when fileset processed
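The stager behaviour described above can be sketched as follows. The class layout, the method names, and the `stage` callback are assumptions for illustration; in particular, staging is modelled here as a synchronous call, whereas the slides describe separate input staging jobs submitted to an I/O queue.

```python
from collections import deque


class Stager:
    def __init__(self, filesets, on_disk, stage, buffer_size=2):
        self.pending = deque(filesets)  # filesets not yet reserved
        self.on_disk = on_disk          # set of filesets currently on disk
        self.stage = stage              # hypothetical "spawn staging job" callback
        self.buffer_size = buffer_size
        self.buffer = deque()           # reserved ahead of the analysis job
        self.in_use = set()             # handed to the analysis job

    def _top_up(self):
        """Keep the buffer of use-reservations full while filesets remain."""
        while len(self.buffer) < self.buffer_size and self.pending:
            fs = self.pending.popleft()
            if fs not in self.on_disk:
                self.stage(fs)          # stand-in for an input staging job
                self.on_disk.add(fs)
            self.buffer.append(fs)      # holding a use-reservation

    def next_fileset(self):
        """Return the next fileset to process, or None when done."""
        self._top_up()
        if not self.buffer:
            return None
        fs = self.buffer.popleft()
        self.in_use.add(fs)
        self._top_up()                  # stay ahead of the analysis job
        return fs

    def release(self, fs):
        """Drop the use-reservation once the fileset has been processed."""
        self.in_use.discard(fs)
```

A user job would call `next_fileset()` in a loop, process the returned fileset, and call `release()` before asking for the next one.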
Effects of Disk Management
• Job processes filesets on disk first (different orders, different times)
• Multiple jobs using same fileset share staging jobs
• Fast analysis job gets multiple staging jobs
• Only a fraction of a dataset is present on disk at one time (conserves disk space).
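Two of these effects can be shown in a short sketch: processing filesets already on disk first, and letting several jobs share one staging job for the same fileset. The function and class names are hypothetical and the bookkeeping is deliberately simplified.

```python
def processing_order(filesets, on_disk):
    """Filesets already on disk come first; the original order is otherwise kept."""
    # sorted() is stable, and False (on disk) sorts before True (needs staging).
    return sorted(filesets, key=lambda fs: fs not in on_disk)


class StagingRegistry:
    """Tracks in-flight staging jobs so that requests for one fileset are shared."""

    def __init__(self):
        self.waiting = {}  # fileset -> list of job ids waiting for it
        self.spawned = 0   # number of staging jobs actually started

    def request(self, fileset, job_id):
        if fileset not in self.waiting:
            self.waiting[fileset] = []
            self.spawned += 1  # only the first request starts a staging job
        self.waiting[fileset].append(job_id)


order = processing_order(["a", "b", "c", "d"], on_disk={"b", "d"})

registry = StagingRegistry()
registry.request("fs1", "jobA")
registry.request("fs1", "jobB")  # shares the staging job started for jobA
registry.request("fs2", "jobA")
```

Because each job reorders its fileset list according to what is on disk when it runs, two jobs on the same dataset may process the filesets in different orders.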
Prototype Tests
• Set of basic queues on workstation (LSF)
• Basic staging software
• Simulated analysis jobs which process dummy data
• Set of big and small dummy datasets
• Basic CDF Data Catalog software with contents for this simulation
• Purpose is to test ideas on resource management, and evaluate how analysis jobs interact in a resource limited environment.
Prototype Scaled Down Environment
• Single cpu workstation, b0ib04
• Staging disk 9 GB
• Filesets of size 0.5 GB
• 4 small datasets @ 1 GB, i.e. roughly 10% of disk
• 4 large datasets @ 10 GB, i.e. roughly 100% of disk
• 2 cpu queues, short & long
• Analysis jobs with variable cpu time
• 4 execution slots for each cpu queue
• 2 simulated tape drives (2 slots in io queue)
• 1 real tape drive in Emass robot
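The sizes above imply some simple arithmetic, worked through below. The variable and function names are made up for illustration; the numbers are those quoted on the slide.

```python
DISK_GB = 9.0
FILESET_GB = 0.5
SMALL_DATASET_GB = 1.0
LARGE_DATASET_GB = 10.0


def filesets_in(size_gb, fileset_gb=FILESET_GB):
    """Number of filesets needed to hold size_gb of data."""
    return int(round(size_gb / fileset_gb))


disk_capacity = filesets_in(DISK_GB)         # filesets that fit on the staging disk
small_dataset = filesets_in(SMALL_DATASET_GB)
large_dataset = filesets_in(LARGE_DATASET_GB)
large_fits = large_dataset <= disk_capacity  # can a whole large dataset be resident?

small_fraction = round(SMALL_DATASET_GB / DISK_GB, 2)
large_fraction = round(LARGE_DATASET_GB / DISK_GB, 2)
```

With these figures the staging disk holds 18 filesets, a small dataset is 2 filesets, and a large dataset is 20 filesets, so a large dataset slightly exceeds the disk and can only be processed a fraction at a time.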
Simulation Scenarios
Purpose:
• Investigate effect of patterns of use by collaboration (CDF “spin” jobs, repetitive small dataset jobs)
• Exercise data access features
Choosing scenarios:
• A. Short vs long job competition
• B. Several jobs using same big dataset (CDF “spin” jobs)
• C. Competition for tape drives and disk space
Some scenarios studied:
• One long job vs a stream of short jobs
• Three long jobs on same dataset, see figure
• Ten long jobs on same dataset
• Mixed set of different long jobs and users (6 jobs, 6 users, 4 datasets).
• Stream of short jobs vs 4 different long jobs
• The disk allowed 4 different big datasets to be processed together, as expected for this simulation.
• Extra staging jobs for the streams of short jobs occurred when expected (when contesting against 4 or more different big datasets).
Trial 45: Three Long Jobs
[Figure: KaleidaGraph plot]
Trial 29: Stream of shorts vs 1 long, 1 short
Conclusions from Prototype Tests
• DIM/Stager worked well.
• Stager functions appropriate, simple.
• Gave guidance for full implementation (client/server structure, cleanup, admin functions)
• Limited test of LSF (batch queues) worked well.
Mock Data Challenge 1
• During December 99 and January 00, CDF successfully tested the movement of MC simulated data from the online Level 3 trigger farm of processors to the tape library, and through the offline reconstruction farm back to the tape library.
• Many sub-groups were involved.
• The resource management methods discussed here were implemented and used but will not be stressed until the rate tests of Challenge 2 in Spring 2000.
Summary
• Resource management methods were explained.
• Prototype tests were extolled.
• Full implementation of methods is underway. More tests to come.
• CDF Engineering run occurs in August 00.