
9 February 2000, CHEP2000, Paper 368

CDF Data Handling: Resource Management and Tests

E. Buckley-Geer, S. Lammel, F. Ratnikov, T. Watts

• Hardware and Resources

• Organization of Data

• User View of Access to Data

• Batch queues

• Disk Management

• Tests


Hardware Resources; Organization of Data.

• Mixed-flavor Unix cluster (CPU resource).

• Fibre Channel disk arrays, currently attached to each node of the cluster (disk resource).

• Tape drives and robotic tape library (tape drive resource). Drives are connected directly to each node.

• This talk concentrates on the resources used while reading data.

• Data are organized into datasets, filesets, and files of 1 GB.

• Datasets: raw, primary, secondary…

• Each tape stores a group of filesets.

• Associations in Datafile Catalog (see Paper 367).
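The organization above can be pictured as a small data model. The following is only an illustrative sketch: the class names, fields, and the `tapes_for_dataset` helper are assumptions made for this example and are not taken from the CDF Datafile Catalog.

```python
from dataclasses import dataclass, field
from typing import List

FILE_SIZE_GB = 1.0  # nominal file size quoted on the slide


@dataclass
class Fileset:
    name: str
    n_files: int  # number of ~1 GB files grouped in this fileset

    @property
    def size_gb(self) -> float:
        return self.n_files * FILE_SIZE_GB


@dataclass
class Dataset:
    name: str
    kind: str  # e.g. "raw", "primary", "secondary"
    filesets: List[Fileset] = field(default_factory=list)


@dataclass
class Tape:
    label: str
    filesets: List[Fileset] = field(default_factory=list)


def tapes_for_dataset(dataset: Dataset, tapes: List[Tape]) -> List[str]:
    """Return labels of tapes that hold at least one fileset of the dataset."""
    wanted = {fs.name for fs in dataset.filesets}
    return [t.label for t in tapes if any(fs.name in wanted for fs in t.filesets)]
```

A lookup of this kind is what lets a dataset name be turned into the tapes that must be mounted to stage it.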


[Figure: User View (dh2_fig1.fig)]


User View of Access to Data

• Batch queues to manage CPU cycles.

• Jobs access data only from disk, not from tape.

• Staging jobs run in parallel.

• Disk inventory manager (DIM) package for shared disk space.

• Batch queues to manage tape drives.

[Figure: DH_user_dim.eps]


Batch Queues

• LSF (Platform Computing) proposed.

• Fairshare scheduling.

• Combined quotas across queues desirable.

• CPU queues for analysis jobs:

• Allocate CPU cycles by group, by user, by special project.

• I/O queues for staging jobs: input, output, event pick.

• 1 tape drive per queue slot.

• I/O job CPU use is proportional to data volume.

• Allocate drives and data volume by group, user, project.
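The idea of fairshare allocation by group, user, or project can be illustrated with a toy scheduler. This is a minimal sketch under stated assumptions: it is not LSF's dynamic priority formula, and the names `next_owner` and `simulate` are hypothetical. The "cost" charged per slot could stand for CPU time on a CPU queue or for data volume on an I/O queue.

```python
def next_owner(shares, usage):
    """Pick the owner whose consumed resource is smallest relative to its share."""
    return min(shares, key=lambda owner: usage.get(owner, 0.0) / shares[owner])


def simulate(shares, n_slots, cost=1.0):
    """Hand out n_slots one at a time and charge 'cost' units per slot."""
    usage = {owner: 0.0 for owner in shares}
    order = []
    for _ in range(n_slots):
        owner = next_owner(shares, usage)
        usage[owner] += cost
        order.append(owner)
    return order, usage


# Hypothetical shares: groupA is entitled to twice as much as groupB.
order, usage = simulate({"groupA": 2, "groupB": 1}, n_slots=6)
print(order)
print(usage)
```

Over many slots the consumed resource converges to the ratio of the shares, which is the behaviour the allocation bullets ask for.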


Disk Management

• By fileset (reduce bookkeeping overhead)

• Allow static filesets for important datasets

• Filesets remain on disk until space is needed.

• Use-reservation prevents deletion of fileset.

• Delete algorithm looks at frequency of use and time since last use-reservation.

• Allocate space algorithm uses quotas by group and user.
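A deletion policy of the kind described above might be sketched as follows. The scoring function is an assumption chosen for illustration (idle time divided by use count); the slides only say that frequency of use and time since the last use-reservation are considered. All names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DiskFileset:
    name: str
    size_gb: float
    use_count: int         # how many times the fileset has been reserved
    last_reserved: float   # time of the most recent use-reservation
    reservations: int = 0  # currently active use-reservations
    static: bool = False   # pinned on disk for an important dataset


def choose_victims(filesets, needed_gb, now):
    """Return names of filesets to delete so that needed_gb is freed.

    Filesets with an active use-reservation and static filesets are never
    candidates. Returns an empty list if enough space cannot be freed.
    """
    candidates = [f for f in filesets if f.reservations == 0 and not f.static]
    # Assumed score: long-idle, rarely used filesets are deleted first.
    candidates.sort(
        key=lambda f: (now - f.last_reserved) / (1 + f.use_count),
        reverse=True,
    )
    victims, freed = [], 0.0
    for f in candidates:
        if freed >= needed_gb:
            break
        victims.append(f.name)
        freed += f.size_gb
    return victims if freed >= needed_gb else []
```

Quota checks by group and user would sit in front of a call like this, deciding whether the requester is allowed to trigger the deletion at all.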


User Job and Disk Management

• User specifies a dataset.

• Dataset is converted to a list of filesets.

• Stager manages list and returns next fileset when asked.

Stager Part of User Job:

• Maintains small buffer of use-reservations to keep ahead of analysis job

• Adds use-reservations for filesets on disk or spawns input staging jobs to maintain buffer

• Releases use-reservations when fileset processed
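The stager behaviour listed above can be sketched in a few lines. This is a simplified illustration, not the CDF stager: staging is treated as instantaneous, filesets are handed out in list order, and the class and attribute names are assumptions.

```python
from collections import deque


class Stager:
    def __init__(self, filesets, on_disk, buffer_size=2):
        self.pending = deque(filesets)  # filesets not yet reserved
        self.on_disk = set(on_disk)     # filesets currently on the staging disk
        self.buffer_size = buffer_size  # use-reservations kept ahead of the job
        self.reserved = deque()         # filesets holding a use-reservation
        self.staging_requests = []      # filesets for which staging was spawned
        self.released = []              # filesets whose reservation was released
        self.current = None

    def _fill_buffer(self):
        while self.pending and len(self.reserved) < self.buffer_size:
            fs = self.pending.popleft()
            if fs not in self.on_disk:
                # Stand-in for submitting an input staging job to the I/O queue.
                self.staging_requests.append(fs)
                self.on_disk.add(fs)
            self.reserved.append(fs)  # add a use-reservation

    def next_fileset(self):
        """Release the processed fileset and return the next one, or None."""
        if self.current is not None:
            self.released.append(self.current)
            self.current = None
        self._fill_buffer()
        if not self.reserved:
            return None
        self.current = self.reserved.popleft()
        self._fill_buffer()  # keep the buffer ahead of the analysis job
        return self.current
```

An analysis job would simply loop on `next_fileset()` until it returns `None`, leaving all reservation and staging decisions to the stager.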


Effects of Disk Management

• Job processes filesets on disk first (different orders, different times)

• Multiple jobs using same fileset share staging jobs

• Fast analysis job gets multiple staging jobs

• Only a fraction of a dataset is present on disk at one time (conserves disk space).
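Two of these effects, processing on-disk filesets first and sharing staging jobs between analysis jobs, can be illustrated with a short sketch. The function and class names are hypothetical and the bookkeeping is deliberately minimal.

```python
def processing_order(filesets, on_disk):
    """Filesets already on disk come first; the rest keep their original order."""
    # sorted() is stable, and False sorts before True.
    return sorted(filesets, key=lambda fs: fs not in on_disk)


class StagingTable:
    """Tracks one staging job per fileset, shared by every job that wants it."""

    def __init__(self):
        self.waiting = {}  # fileset -> list of job ids waiting for it

    def request(self, fileset, job_id):
        """Register interest; return True only if a new staging job is needed."""
        is_new = fileset not in self.waiting
        self.waiting.setdefault(fileset, []).append(job_id)
        return is_new
```

Because the order depends on what happens to be on disk when a job starts, two jobs on the same dataset can see the filesets in different orders.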


Prototype Tests

• Set of basic queues on workstation (LSF)

• Basic staging software

• Simulated analysis jobs which process dummy data

• Set of big and small dummy datasets

• Basic CDF Data Catalog software with contents for this simulation

• Purpose is to test ideas on resource management, and evaluate how analysis jobs interact in a resource limited environment.


Prototype Scaled Down Environment

• Single-CPU workstation, b0ib04

• Staging disk of 9 GB

• Filesets of size 0.5 GB

• 4 small datasets of 1 GB each, i.e. about 10% of the disk

• 4 large datasets of 10 GB each, i.e. about 100% of the disk

• 2 CPU queues, short and long

• Analysis jobs with variable CPU time

• 4 execution slots for each CPU queue

• 2 simulated tape drives (2 slots in the I/O queue)

• 1 real tape drive in Emass robot
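The numbers above can be checked with a few lines of arithmetic. The variable names are chosen for this example only. Note that the slide's 10% and 100% are round figures: 1 GB is about 11% of a 9 GB disk, and a 10 GB dataset is slightly larger than the disk.

```python
DISK_GB = 9.0
FILESET_GB = 0.5
SMALL_DATASET_GB = 1.0
LARGE_DATASET_GB = 10.0


def filesets_in(size_gb, fileset_gb=FILESET_GB):
    """Number of filesets of fileset_gb needed to hold size_gb."""
    return int(round(size_gb / fileset_gb))


disk_capacity = filesets_in(DISK_GB)   # 18 filesets fit on the staging disk
small = filesets_in(SMALL_DATASET_GB)  # 2 filesets per small dataset
large = filesets_in(LARGE_DATASET_GB)  # 20 filesets per large dataset

small_fraction = SMALL_DATASET_GB / DISK_GB  # about 0.11
large_fraction = LARGE_DATASET_GB / DISK_GB  # about 1.11

print(disk_capacity, small, large)
print(round(small_fraction, 2), round(large_fraction, 2))
```

Since a single large dataset does not fit on the disk, several large jobs can only run together if each keeps just a fraction of its dataset staged at any time.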


Simulation Scenarios

Purpose:

• Investigate effect of patterns of use by collaboration (CDF “spin” jobs, repetitive small dataset jobs)

• Exercise data access features

Choosing scenarios:

• A. Short vs long job competition

• B. Several jobs using same big dataset (CDF “spin” jobs)

• C. Competition for tape drives and disk space


Some scenarios studied:

• One long job vs a stream of short jobs

• Three long jobs on same dataset, see figure

• Ten long jobs on same dataset

• Mixed set of different long jobs and users (6 jobs, 6 users, 4 datasets).

• Stream of short jobs vs 4 different long jobs

• The disk allowed 4 different big datasets to be processed together, as expected for this simulation.

• Extra staging jobs for the streams of short jobs occurred when expected (when competing against 4 or more different big datasets).


Trial 45: Three Long Jobs

[Figure: KaleidaGraph plot for Trial 45]


Trial 29: Stream of short jobs vs 1 long, 1 short


Conclusions from Prototype Tests

• DIM (disk inventory manager) and Stager worked well.

• Stager functions were appropriate and simple.

• Gave guidance for full implementation (client/server structure, cleanup, admin functions)

• Limited test of LSF (batch queues) worked well.


Mock Data Challenge 1

• During December 1999 and January 2000, CDF successfully tested the movement of Monte Carlo (MC) simulated data from the online Level 3 trigger processor farm to the tape library, and through the offline reconstruction farm back to the tape library.

• Many sub-groups were involved.

• The resource management methods discussed here were implemented and used but will not be stressed until the rate tests of Challenge 2 in Spring 2000.


Summary

• Resource management methods were explained.

• Prototype tests were extolled.

• Full implementation of methods is underway. More tests to come.

• The CDF Engineering run takes place in August 2000.