Operation of the CERN Managed Storage environment; current status and future directions (CHEP 2004 / Interlaken)
Operation of the CERN Managed Storage environment; current status and future directions
CHEP 2004 / Interlaken
Data Services team: Vladimír Bahyl, Hugo Caçote, Charles Curran, Jan van Eldik, David Hughes, Gordon Lee, Tony Osborne, Tim Smith
2004/09/29 CERN Managed Storage: [email protected] 2 of 18
Managed Storage Dream
Free to open… instant access
Any time later… unbounded recall
Find the exact same coins… goods integrity
Managed Storage Reality
Maintain + upgrade, innovate + technology refresh
Ageing equipment, escalating requirements
Dynamic store / Active Data Management (Tape Store + Disk Cache)
CASTOR Service
CERN Managed Storage: stage servers / disk caches + tape stores, tied together by the CASTOR servers (reliability, uniformity, automation; redundancy, scalability)
CASTOR Grid Service: GridFTP servers + SRM service (new service; scalability)
42 stagers / disk caches; 370 disk servers; 6,700 spinning disks
70 tape servers; 35,000 tapes
A highly distributed system
CASTOR Service: running experiments
CDR for NA48, COMPASS, nTOF; experiment peaks of 120 MB/s
Combined average of 10 TB/day; sustained 10 MB/s per dedicated 9940B drive
A record 1.5 PB in 2004; pseudo-online analysis
Experiments in the analysis phase: LEP and fixed target
LHC experiments in construction: data production / analysis (Tier0/1 operations); test-beam CDR
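The per-day and per-drive figures above are consistent with each other, as a quick sanity check shows (the rates are from the slide; the decimal units and the derived drive count are illustrative assumptions):

```python
# Cross-check of the CDR figures: a 10 TB/day average versus a
# sustained 10 MB/s per dedicated 9940B drive.
import math

DAILY_VOLUME_MB = 10 * 1_000_000   # 10 TB/day in MB (decimal units, an assumption)
SECONDS_PER_DAY = 86_400
PER_DRIVE_MB_S = 10                # sustained rate per 9940B drive (from the slide)

aggregate_mb_s = DAILY_VOLUME_MB / SECONDS_PER_DAY
drives_needed = math.ceil(aggregate_mb_s / PER_DRIVE_MB_S)

print(f"{aggregate_mb_s:.0f} MB/s aggregate -> {drives_needed} drives busy on average")
# roughly 116 MB/s, i.e. about a dozen drives writing continuously
```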
Quattor-ising
Motivation: scale (see G. Cancio's talk); uniformity, manageability, automation
Configuration description (into CDB): HW and SW; nodes and services
Reinstallation: quiescing a server ≠ draining a client! Gigabit-card gymnastics; BIOS upgrades for PXE
Eliminate peculiarities from CASTOR nodes: switch misconfigurations, firmware upgrades, ext2 -> ext3
Result: manageable servers
LEMON-ising
Lemon agent everywhere: Linux box monitoring and alarms; automatic HW static checks
Adding CASTOR-server-specific service monitoring
HW monitoring: temperatures, voltages, fans, etc.; lm_sensors -> IPMI (see tape section)
Disk errors: SMART; smartmontools auto checks; predictive monitoring
Tape drive errors: SMART
Result: uniformly monitored servers
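The predictive disk monitoring that smartmontools data enables can be sketched roughly as follows: parse `smartctl -A` attribute rows and flag disks whose raw error counters exceed a threshold. The attribute names are standard SMART attributes; the threshold and the sample output are illustrative.

```python
# Minimal sketch of a predictive SMART check: flag disks whose
# reallocated/pending sector counts exceed an illustrative threshold.

REALLOC_THRESHOLD = 50  # illustrative; tune per disk model

def failing_attributes(smartctl_output: str, threshold: int = REALLOC_THRESHOLD):
    """Return SMART attributes whose raw value exceeds the threshold."""
    flagged = []
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit():
            name, raw = fields[1], fields[9]
            if name in ("Reallocated_Sector_Ct", "Current_Pending_Sector") and raw.isdigit():
                if int(raw) > threshold:
                    flagged.append((name, int(raw)))
    return flagged

sample = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       120
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
"""
print(failing_attributes(sample))  # only the reallocated-sector count is over threshold
```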
Warranties
[Chart: number of disk servers (0 to 500) in and out of warranty, Jan 2000 to Jan 2009, across six hardware generations (0th to 5th): COGESTRA 450 and 500 MHz; ELONEX 500 MHz to 2.4 GHz; JTT 1 and 1.1 GHz; TECH 800 MHz; unbranded 2.8 GHz]
Disk Replacement
10 months before the case was agreed: head instabilities
4 weeks to execute: 1224 disks exchanged (= 18%), and the cages
[Chart: % broken mirrors per month, Dec 2003 to Sep 2004, on 0th-generation hardware; climbing towards 4.5%, an unacceptably high failure rate; 1224 disks replaced]
Disk Storage Developments
Disk configurations / file systems: HW RAID-1 / ext3 -> HW RAID-5 + SW RAID-0 / XFS
IPMI: HW health monitoring + remote access; remote reset + power-on/off (independent of the OS); serial console redirection over LAN
LEAF: hardware and state management
Next generations (see H. Meinhard's talk): 360 TB SATA-in-a-box; 140 TB external SATA disk arrays
New CASTOR stager (see J.-D. Durand's talk)
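The remote operations IPMI gives the operators can be expressed as ipmitool invocations; `chassis power` and `sol activate` are standard ipmitool subcommands, while the hostname below is a hypothetical placeholder.

```python
# Sketch of OS-independent remote management via IPMI, built as
# ipmitool command lines (credentials/host are placeholders).

def ipmi_cmd(host: str, user: str, action: str) -> list:
    """Build an ipmitool command line for a power action or console session."""
    base = ["ipmitool", "-I", "lanplus", "-H", host, "-U", user]
    if action in ("status", "on", "off", "cycle", "reset"):
        return base + ["chassis", "power", action]  # works even when the OS is hung
    if action == "console":
        return base + ["sol", "activate"]           # serial console redirected over LAN
    raise ValueError(f"unknown action: {action}")

print(" ".join(ipmi_cmd("diskserver042.example.cern.ch", "admin", "reset")))
```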
Tape Service
70 tape servers (Linux); (mostly) single FibreChannel-attached drives
2 symmetric robotic installations, 5 x STK 9310 silos in each
Drives: 50 x 9940B; 20 x 9840; 14 x 3590; 6 x LTO; 4 x 9940A
Media: 22884 x 9940B (bulk physics); 8149 x 9840 (fast access); 8639 x 3590 (backup)
Chasing Instabilities
Tape server temperatures?
Media Migration
Technology generations: migrate data to avoid obsolescence and reliability issues in drives (1986: 3480/3490; 1995: Redwood; 2001: 9940)
Financial: capacity gain within sub-generations
2004/09/29 CERN Managed Storage: [email protected] 14 of 18
1% of A tapes unreadable on B drives (drive head tolerances) – keep A drives
Media Migration
[Chart: number of tapes (0 to 18,000) per quarter, 2001 to 2004, showing the migration from 9940A format (60 GB, 12 MB/s) to 9940B (200 GB, 30 MB/s)]
Replace A drives by B drives: capacity, performance, reliability
Migrating A to B format took 9 months and 25% of the B-drive resources
Tape Service Developments
Removing tails… tracking of all tape errors (18 months)
Retiring of problematic media; proactive retiring of heavily used media (>5000 mounts); repack on new media
Checksums: populated when writing to tape; verified when loading back to disk
Drive testing: commodity LTO-2; high-end IBM 3592 / STK-NG
New technology: SL8500 library / Indigo
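The checksum scheme above (compute at write time, verify at read time) can be sketched as follows; Adler-32 via zlib is an illustrative choice, and the store/verify interface is hypothetical, not CASTOR's actual API.

```python
import zlib

# Sketch of write-time checksum population and read-time verification:
# checksum the stream while writing to tape, store the value, recompute
# and compare when the file is staged back to disk.

def checksum(data: bytes) -> int:
    """Adler-32 over the file contents, computed in chunks as if streaming."""
    value = zlib.adler32(b"")
    for i in range(0, len(data), 4096):          # stream in fixed-size chunks
        value = zlib.adler32(data[i:i + 4096], value)
    return value

def verify_on_recall(data: bytes, stored: int) -> bool:
    """On staging back to disk, recompute and compare with the stored value."""
    return checksum(data) == stored

payload = b"event data written to tape" * 1000
stored = checksum(payload)                       # populated at write time
assert verify_on_recall(payload, stored)         # verified at read time
assert not verify_on_recall(payload + b"\x00", stored)  # corruption is detected
```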
CASTOR Central Servers
Combined Oracle DB and application-daemons node; assorted helper applications distributed (historically) across ageing nodes
Front-end / back-end split
FE: load-balanced application servers; eliminates interference with the DB; load distribution, overload localisation
BE: (developing) clustered DB; reliability, security
GRID Data Management
GridFTP + SRM servers (formerly): standalone, experiment-dedicated; hard to intervene on; not scalable
New load-balanced, shared 6-node service: castorgrid.cern.ch
DNS hacks for Globus reverse-lookup issues; SRM modifications to support operation behind the load balancer
GridFTP standalone client
Retire ftp and bbftp access to CASTOR
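The round-robin idea behind a DNS load-balanced alias such as castorgrid.cern.ch can be sketched as successive lookups handing out member nodes in rotation; the node names below are hypothetical placeholders, not the real member hostnames.

```python
import itertools

# Sketch of round-robin distribution across a 6-node load-balanced
# service: each "lookup" returns the next member in rotation.

NODES = [f"castorgrid{i:02d}.example.cern.ch" for i in range(1, 7)]
_rotation = itertools.cycle(NODES)

def resolve_alias() -> str:
    """Return the next node in rotation, as a round-robin DNS alias would."""
    return next(_rotation)

first_round = [resolve_alias() for _ in NODES]
assert sorted(first_round) == sorted(NODES)  # every node gets a turn
```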
Conclusions
Stabilising HW and SW
Automation
Monitoring and control
Reactive -> Proactive Data Management