The GSI Mass Storage for Experiment Data

DVEE-Palaver GSI Darmstadt

Feb. 15, 2005

Horst Göringer, GSI Darmstadt

[email protected]

Horst Göringer GSI DVEE Palaver 15.2.2005 2

Overview

• different views
• current status
• latest enhancements:
  - write cache
  - on-line connection to DAQ
• future plans
• conclusions

GSI Mass Storage System

Gsi mass STORagE system

gstore

gstore: storage view

[Diagram. Three tiers, as far as recoverable:]
central tape: ATL
central disk: read cache (RetrievePool, StagePool, ...), write cache (ArchivePool, DAQPool, ...)
clients: tsmcli client, RFIO client, DAQ client (client-side labels: disk, memory, memory)

gstore: hardware view

3 automatic tape libraries (ATL):

(1) IBM 3494 (AIX)

8 tape drives IBM 3590 (14 MByte/s)

ca. 2300 volumes (47 TByte, 13 TByte backup)

1 data mover (adsmsv1)

access via adsmcli, RFIO read

read cache 1.1 TByte

StagePool, RetrievePool

gstore: hardware view

(2) StorageTek L700 (Windows 2000)

8 tape drives LTO2 ULTRIUM (35 MByte/s)

ca. 170 volumes (32 TByte)

8 data movers (gsidmxx), connected via SAN

access via tsmcli, RFIO

read cache 2.5 TByte

StagePool, RetrievePool

write cache

ArchivePool: 0.28 TByte

DAQPool: 0.28 TByte

gstore: hardware view

(3) StorageTek L700 (Windows 2000)

4 tape drives LTO1 ULTRIUM (15 MByte/s)

ca. 80 volumes (10 TByte):

backup copy of 'irrecoverable' archives ...raw

mainly for backup of user data (~ 30 TByte)

gstore: software view

2 major components:

• TSM (Tivoli Storage Manager), commercial

handles tape drives and robots

database

• GSI software (~ 80,000 lines of code)

C, sockets, threads

- interface to user (tsmcli / adsmcli, RFIO)

- interface to TSM (TSM API client)

- cache administration

gstore user view: tsmcli

tsmcli subcommands:

archive      file*   archive   path
retrieve     file*   archive   path
query        file*   archive   path*
stage        file*   archive   path
delete       file    archive   path
ws_query     file*   archive   path
pool_query   pool*

*: any combination of wildcard characters (*, ?) allowed

soon: file may contain list of files (with wildcard chars)
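
The wildcard rule for file arguments can be illustrated with a short Python sketch. It uses the standard `fnmatch` module, where `*` matches any run of characters and `?` exactly one. Whether tsmcli matches names in exactly this way is an assumption, and the file names below are made up for illustration.

```python
from fnmatch import fnmatchcase

def match_files(pattern, names):
    """Return the names matching a pattern built from '*' and '?'."""
    # Caveat: fnmatch additionally treats '[...]' as a character class,
    # which the slide does not mention.
    return [n for n in names if fnmatchcase(n, pattern)]

# Hypothetical archive contents, for illustration only.
archived = ["run001.lmd", "run002.lmd", "run010.lmd", "calib.dat"]

print(match_files("run00?.lmd", archived))
print(match_files("*.lmd", archived))
```

The first pattern selects only `run001.lmd` and `run002.lmd`; the second selects all three `.lmd` files.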

gstore user view: RFIO

rfio_[f]open

rfio_[f]read

rfio_[f]write

rfio_[f]close

rfio_[f]stat

rfio_lseek

GSI extensions (for on-line DAQ connection):

rfio_[f]endfile

rfio_[f]newfile

gstore server view: query

[Diagram. Components shown: client, entry server, write cache server, read cache server, TSM server, three databases (DB)]

gstore server view: archive to cache

[Diagram. Components shown: client, entry server, write cache server, read cache server, TSM server, three databases (DB), data mover i (of n), mover server, write cache]

gstore server view: archive from cache

[Diagram. Components shown: write cache server, TSM server, two databases (DB), data mover i (of n), archive server, write cache, TSM Stor. Agent, SAN, tape]

gstore server view: retrieve from tape

[Diagram. Components shown: client, entry server, write cache server, read cache server, TSM server, three databases (DB), data mover i (of n), mover server, read cache, TSM Stor. Agent, SAN, tape]

server view: retrieve from write cache

[Diagram. Components shown: client, entry server, write cache server, read cache server, TSM server, three databases (DB), data mover i and data mover j, two mover servers, read cache, write cache]

gstore: overall server view

[Diagram. Components shown: client, entry server, write cache server, read cache server, TSM server, three databases (DB), data mover i (of n), mover server, archive server, cache server, read cache, write cache, TSM Stor. Agent, SAN, tapes (tape, tape, ...)]

server view: gstore design concepts

• strict separation of control and data flow
• no bottleneck for data
• scalable in
  capacity (tape and disk)
  I/O bandwidth
• hardware independent (as long as TSM supports it)
• platform independent
• unique name space
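
The first design concept, separating control flow from data flow, can be sketched in Python as follows. This is only an illustration under stated assumptions: the class names `EntryServer` and `DataMover` borrow the slide vocabulary, but the methods, the selection policy, and the paths are hypothetical and do not reflect the actual gstore implementation.

```python
class DataMover:
    """Holds file contents; the only component that touches payload bytes."""
    def __init__(self, name):
        self.name = name
        self.files = {}

    def write(self, path, payload):
        self.files[path] = payload


class EntryServer:
    """Control path only: answers 'which data mover?' and sees no payload."""
    def __init__(self, movers):
        self.movers = movers
        self.bytes_seen = 0  # stays 0: no data flows through this server

    def select_mover(self, path):
        # Placeholder policy (an assumption): mover holding the fewest files.
        return min(self.movers, key=lambda m: len(m.files))


def archive(entry, path, payload):
    mover = entry.select_mover(path)   # control flow: small request
    mover.write(path, payload)         # data flow: client -> data mover
    return mover.name


movers = [DataMover("dm1"), DataMover("dm2")]
entry = EntryServer(movers)
print(archive(entry, "/exp/run001", b"x" * 1024))
print(archive(entry, "/exp/run002", b"y" * 1024))
```

Because the payload never passes through the entry server, adding data movers adds bandwidth without turning the control component into a bottleneck.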

server view: cache administration

• multithreaded servers for read and write cache
• each with own metadata DB
• main tasks:
  - lock/unlock files
  - select data movers and file systems
  - collect current information on disk space
    soon: data mover and disk load -> load balancing
  - trigger asynchronous archiving
  - disk cleaning
• several disk pools with different attributes:
  StagePool, RetrievePool, ArchivePool, DAQPool, ...
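
Two of the tasks listed above, locking files and disk cleaning, can be sketched for a single pool as follows. The slides do not say which files the cleaner removes first; evicting unlocked files in least-recently-used order is an assumption made here, and the class `CacheAdmin` with all its methods is hypothetical.

```python
import threading

class CacheAdmin:
    """Toy metadata store for one disk pool (hypothetical, not gstore code)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.files = {}                 # path -> [size, last_access, lock_count]
        self.clock = 0                  # logical time for access ordering
        self.mutex = threading.Lock()   # the servers are multithreaded

    def used(self):
        return sum(size for size, _, _ in self.files.values())

    def add(self, path, size):
        with self.mutex:
            self.clock += 1
            self.files[path] = [size, self.clock, 0]

    def lock(self, path):
        with self.mutex:
            self.clock += 1
            entry = self.files[path]
            entry[1] = self.clock       # locking counts as an access
            entry[2] += 1

    def unlock(self, path):
        with self.mutex:
            self.files[path][2] -= 1

    def clean(self, needed):
        """Evict unlocked files, oldest access first, until 'needed' fits."""
        with self.mutex:
            candidates = sorted(
                (p for p, e in self.files.items() if e[2] == 0),
                key=lambda p: self.files[p][1])
            removed = []
            for path in candidates:
                if self.capacity - self.used() >= needed:
                    break
                del self.files[path]
                removed.append(path)
            return removed


pool = CacheAdmin(capacity=100)
pool.add("a", 40)
pool.add("b", 40)
pool.lock("a")                  # "a" is in use and must survive cleaning
print(pool.clean(needed=50))
```

In this run only the unlocked file `b` is removed, even though `a` is older, because locked files are never candidates for cleaning.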

usage profile: batch farm

batch farm: ~120 dual-processor nodes
=> highly parallel mass storage access (read and write)

• read requests:
  'good' user: stages all files beforehand, uses wildcard chars
  'bad' user: reads lots of single files from tape
  'bad' system: stage disk/DM crashes during analysis
• write requests:
  via write cache
  distribute as uniformly as possible

usage profile: experiment DAQ

• several continuous data streams from DAQ
• keep same DM during life time of data stream
• only via RFIO
• GSI extensions necessary:
  rfio_[f]endfile, rfio_[f]newfile
• disks emptied faster than filled:
  network -> disk: ~10 MByte/s
  disk -> tape: ~30 MByte/s
  => time to stage for on-line analysis
• enough disk buffer necessary in case of problems
  (robot, TSM, ...)
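
The buffer requirement can be made concrete with a small calculation. The rates are the ones quoted above and the pool size is the 0.28 TByte DAQPool from the hardware view; treating this as a single stream that has the whole pool to itself, and using decimal units, are assumptions.

```python
MB = 1e6          # assumption: decimal units
TB = 1e12

fill_rate = 10 * MB      # network -> disk, from the slide
drain_rate = 30 * MB     # disk -> tape, from the slide
pool_size = 0.28 * TB    # DAQPool size from the hardware view

def hours_until_full(pool_bytes, fill):
    """Time a stream can keep writing while nothing is moved to tape."""
    return pool_bytes / fill / 3600

def hours_to_empty(backlog_bytes, fill, drain):
    """Time to clear a backlog once tape works again and the stream continues."""
    return backlog_bytes / (drain - fill) / 3600

print(round(hours_until_full(pool_size, fill_rate), 1))
print(round(hours_to_empty(pool_size, fill_rate, drain_rate), 1))
```

Under these assumptions the pool bridges an outage of roughly 7.8 hours, and a full pool is drained in about 3.9 hours once archiving resumes while the stream keeps running.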

current plans: new hardware

more and safer disks:
• write cache: all RAID
  4 TByte (ArchivePool, DAQPool)
• read cache: +7.5 TByte new RAID
  StagePool, RetrievePool,
  new pools, e.g. with longer file life time
• 5 new data movers

new fail-safe entry server:
• hosts query server, cache administration servers
  -> query performance!
• take-over in case of host failure
• metadata DBs mirrored on 2nd host

current plans: merge tsmcli / adsmcli

new command gstore:
• replaces tsmcli and adsmcli
• unique name space (already available)
• users need not care in which robot data reside
• new archive: policy computing center

brief excursion: future of IBM 3494?

• still heavily used
• rather full
• hardware highly reliable
• should be decided this year!

usage IBM 3494 (AIX)

brief excursion: future of IBM 3494?

2 extreme options (and more in between):
• no further investment
  use as long as possible
  in a few years: move data to other robot
• upgrade tape drives and connect to SAN
  3590 (~30 GB, 14 MB/s) -> 3592 (300 GB, 40 MB/s)
  new media: => 700 TByte capacity
  access with available data movers via SAN
  new fail-safe TSM server (Linux?)
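
The 700 TByte figure is consistent with simple arithmetic on numbers given elsewhere in the talk. The sketch below assumes that all of the ca. 2300 volumes of the IBM 3494 would be replaced by 3592 media; the slides do not state this explicitly.

```python
volumes = 2300            # "ca. 2300 volumes" in the IBM 3494
gb_3590 = 30              # ~30 GB per 3590 cartridge
gb_3592 = 300             # 300 GB per 3592 cartridge

def capacity_tbyte(n_volumes, gb_per_volume):
    """Nominal library capacity in decimal TByte."""
    return n_volumes * gb_per_volume / 1000

print(capacity_tbyte(volumes, gb_3590))   # nominal capacity with 3590 media
print(capacity_tbyte(volumes, gb_3592))   # nominal capacity with 3592 media
print(round(40 / 14, 1))                  # drive speed-up factor
```

This gives 69 TByte nominal for the present media (against 47 + 13 = 60 TByte actually stored) and 690 TByte after the upgrade, which rounds to the quoted 700 TByte.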

current plans: load balancing

• acquire current info on no. of read/write processes
  for each disk, data mover, pool
• new write request:
  select resource with lowest load
• new read request:
  avoid 'hot spots'
  -> create additional instances of stage file
• new option '-randomize' for stage/retrieve
  distribute equally to different data movers / disks
  split into n (parallel) jobs
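
The two selection rules above might look as follows in a minimal Python sketch. It is one possible reading of the plan, not the gstore implementation: the load metric (number of active processes per resource), the function names, and the round-robin dealing behind `randomize` are all assumptions.

```python
import random

def least_loaded(load):
    """Pick the resource with the fewest active read/write processes."""
    return min(load, key=load.get)

def randomize(files, n_jobs, seed=None):
    """Shuffle the files and deal them round-robin into n_jobs lists."""
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_jobs] for i in range(n_jobs)]

# Hypothetical load table: data mover -> number of active processes.
load = {"dm1": 4, "dm2": 1, "dm3": 3}
target = least_loaded(load)
load[target] += 1              # account for the newly placed request
print(target)

jobs = randomize([f"file{i:02d}" for i in range(10)], n_jobs=3, seed=1)
print([len(j) for j in jobs])
```

Here the write request goes to `dm2`, and the ten files are split into three jobs of 4, 3 and 3 files, each of which could then be sent to a different data mover or disk.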

current plans: new org. of DMs

• Linux platform

more familiar environment (shell scripts, Unix commands, ...)

case sensitive file names

current mainstream OS for experiment data processing (DV)

• '2nd level' data movers

no SAN connection

disks filled via ('1st level') DMs with SAN connection

for stage pools with guaranteed life time of files

current plans: new org. of DMs

• integration of selected group file servers

as '2nd level' data movers

disk space (logically) reserved for owners

pool policy according to owners

many advantages:

no NFS => much faster I/O

files physically distributed over several servers

load balancing of gstore

disk cleaning

disadvantages:

only for exp. data, access via gstore interface

current plans: user interface

• a large number of user requests:
  - longer file names
  - option to rename files
  - more specific return codes
  - ...
• program code consolidation
• improved error recovery after HW failures
• support for successor of AliEn
• GRID support
  - gstore as Storage Element (SE)
  - Storage Resource Manager (SRM)
    -> new functionalities, e.g. reserve resources

Conclusions

• GSI concept for mass storage successfully verified
• hardware and platform independent
• scalable in capacity and bandwidth to keep up with
  - requirements of future batch farm(s)
  - data rates of future experiments
• gstore able to manage very different usage profiles
• but still a lot of work ...
  to fully realize all the plans discussed