March 6, 2009Tofigh Azemoon1 Real-time Data Access Monitoring in Distributed, Multi Petabyte Systems...

21
March 6, 2009 March 6, 2009 Tofigh Azemoon Tofigh Azemoon 1 Real-time Data Access Monitoring Real-time Data Access Monitoring in Distributed, Multi Petabyte in Distributed, Multi Petabyte Systems Systems Tofigh Azemoon Tofigh Azemoon Jacek Becla Jacek Becla Andrew Hanushevsky Andrew Hanushevsky Massimiliano Turri Massimiliano Turri SLAC National Accelerator Laboratory SLAC National Accelerator Laboratory

Transcript of March 6, 2009Tofigh Azemoon1 Real-time Data Access Monitoring in Distributed, Multi Petabyte Systems...

Page 1: March 6, 2009Tofigh Azemoon1 Real-time Data Access Monitoring in Distributed, Multi Petabyte Systems Tofigh Azemoon Jacek Becla Andrew Hanushevsky Massimiliano.

March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 11

Real-time Data Access Monitoring Real-time Data Access Monitoring in Distributed, Multi Petabyte in Distributed, Multi Petabyte

SystemsSystems

Tofigh AzemoonTofigh Azemoon

Jacek BeclaJacek Becla

Andrew Hanushevsky Andrew Hanushevsky

Massimiliano TurriMassimiliano Turri

SLAC National Accelerator LaboratorySLAC National Accelerator Laboratory

Page 2: March 6, 2009Tofigh Azemoon1 Real-time Data Access Monitoring in Distributed, Multi Petabyte Systems Tofigh Azemoon Jacek Becla Andrew Hanushevsky Massimiliano.

March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 22

Soon a typical running HEP experiment will look like Soon a typical running HEP experiment will look like thisthis

100s of users100s of users

10s of thousands of batch nodes10s of thousands of batch nodes

1000s of data servers1000s of data servers

10s of millions of files10s of millions of filesContainingContaining

10s of PB of data10s of PB of data++

geographically distributed geographically distributed crossing many time zones.crossing many time zones.

Page 3: March 6, 2009Tofigh Azemoon1 Real-time Data Access Monitoring in Distributed, Multi Petabyte Systems Tofigh Azemoon Jacek Becla Andrew Hanushevsky Massimiliano.

March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 33

Mission StatementMission Statement

Provide real time overall view of Provide real time overall view of system performancesystem performance

Respond to detailed queriesRespond to detailed queries to identify bottle necks to identify bottle necks

to optimize the systemto optimize the system

to aid in planning system to aid in planning system expansionexpansion

Page 4: March 6, 2009Tofigh Azemoon1 Real-time Data Access Monitoring in Distributed, Multi Petabyte Systems Tofigh Azemoon Jacek Becla Andrew Hanushevsky Massimiliano.

March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 44

The SLAC ¼The SLAC ¼PBPB “kan” “kan” ClusterCluster

ClientsClients

kan001 kan002 kan003 kan004 kan059

bbr-rdr03 bbr-rdr04

Data ServersData Servers( 320 TB)( 320 TB)

ManagersManagers

Page 5: March 6, 2009Tofigh Azemoon1 Real-time Data Access Monitoring in Distributed, Multi Petabyte Systems Tofigh Azemoon Jacek Becla Andrew Hanushevsky Massimiliano.

March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 55

Monitored ObjectsMonitored Objects

client

file

job

session

user

type_2

type_1

server

Page 6: March 6, 2009Tofigh Azemoon1 Real-time Data Access Monitoring in Distributed, Multi Petabyte Systems Tofigh Azemoon Jacek Becla Andrew Hanushevsky Massimiliano.

March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 66

File classes to monitor aggregate valuesFile classes to monitor aggregate values

for groups of files for groups of files BaBar Examples:BaBar Examples:

type_1 ( dataType )type_1 ( dataType ) /store/store//PRPR//R22/AllEvents/0006/70/22.0.3/AllEvents_00067045_22.0.3V03.02E.rootR22/AllEvents/0006/70/22.0.3/AllEvents_00067045_22.0.3V03.02E.root /store//store/SPSP/R22/000998/200406/22.0.3/SP_000998_068468.01.root /R22/000998/200406/22.0.3/SP_000998_068468.01.root

/store/store//PRskimsPRskims//R22/22.1.1cR22/22.1.1c//IsrIncExcIsrIncExc//79/79/IsrIncExc_57978.01.rootIsrIncExc_57978.01.root

/store/store//SPskimsSPskims//R22/22.1.1cR22/22.1.1c//Tau1NTau1N//001235/200212/001235/200212/Tau1N_001235_49553.01.rootTau1N_001235_49553.01.root

type_2 (skims)type_2 (skims)

File path File path getFileType getFileType ( ( type_1 valuetype_1 value, , type_2 valuetype_2 value))

Page 7: March 6, 2009Tofigh Azemoon1 Real-time Data Access Monitoring in Distributed, Multi Petabyte Systems Tofigh Azemoon Jacek Becla Andrew Hanushevsky Massimiliano.

March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 77

Xrootd ServerXrootd Server

Highly scalable serverHighly scalable server Posix like access to filesPosix like access to files Load balancingLoad balancing Transparent recovery from Transparent recovery from

server crashesserver crashes Fault tolerantFault tolerant Very low latencyVery low latency

Page 8: March 6, 2009Tofigh Azemoon1 Real-time Data Access Monitoring in Distributed, Multi Petabyte Systems Tofigh Azemoon Jacek Becla Andrew Hanushevsky Massimiliano.

March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 88

Monitoring Implementation in Monitoring Implementation in xrootdxrootd

Minimal impact Minimal impact on client requestson client requests

Robustness in Robustness in multimode failuremultimode failure

Precision & Precision & specificity of specificity of collected datacollected data

Real time Real time scalabilityscalability

Use UDP datagramsUse UDP datagrams Data servers Data servers

insulated from insulated from monitoring. Butmonitoring. But

Packets can get lostPackets can get lost

Outsource client Outsource client serializationserialization

Low bounded Low bounded resource usageresource usage

Use of time bucketsUse of time buckets

Page 9: March 6, 2009Tofigh Azemoon1 Real-time Data Access Monitoring in Distributed, Multi Petabyte Systems Tofigh Azemoon Jacek Becla Andrew Hanushevsky Massimiliano.

March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 99

•Start SessionsessionId, user, PId, client, server, start T

•StagingstageId, user, PId, client, file path, stage T, duration, server

•Open FilefileId, user, PId, client, server, file path, open T

•Close File fileId, bytes read, bytes written, close T•End Session

sessionId, duration, end T + Xrootd restart time for each server

RT

data

Page 10: March 6, 2009Tofigh Azemoon1 Real-time Data Access Monitoring in Distributed, Multi Petabyte Systems Tofigh Azemoon Jacek Becla Andrew Hanushevsky Massimiliano.

March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 1010

Single Site MonitoringSingle Site Monitoring

Page 11: March 6, 2009Tofigh Azemoon1 Real-time Data Access Monitoring in Distributed, Multi Petabyte Systems Tofigh Azemoon Jacek Becla Andrew Hanushevsky Massimiliano.

March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 1111

Multi-site MonitoringMulti-site Monitoring

Page 12: March 6, 2009Tofigh Azemoon1 Real-time Data Access Monitoring in Distributed, Multi Petabyte Systems Tofigh Azemoon Jacek Becla Andrew Hanushevsky Massimiliano.

March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 1212

COMPONENTSCOMPONENTS

Collector/Decoder (C+Collector/Decoder (C++)+)

MySQL database (5.0)MySQL database (5.0) Database Applications Database Applications

(Perl, Perl DBI)(Perl, Perl DBI) CreateCreate Load Load PreparePrepare UpgradeUpgrade ReloadReload BackupBackup

Web application ( JSP3)Web application ( JSP3) DB access via JDBCDB access via JDBC

Data servers Data servers xrootd enabledxrootd enabled

Database ServerDatabase Server Hosting DB & Hosting DB &

running DB running DB applicationapplication

Web ServerWeb Server TomcatTomcat

For security reasons DB For security reasons DB & Web servers on & Web servers on different hostsdifferent hosts

Page 13: March 6, 2009Tofigh Azemoon1 Real-time Data Access Monitoring in Distributed, Multi Petabyte Systems Tofigh Azemoon Jacek Becla Andrew Hanushevsky Massimiliano.

March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 1313

Configuration FileConfiguration File dbName: xrdmon_kan_v005dbName: xrdmon_kan_v005 MySQLUser: xrdmonMySQLUser: xrdmon webUser: readerwebUser: reader MySQLSocket: /tmp/mysql.sockMySQLSocket: /tmp/mysql.sock baseDir: /u1/xrdmon/allSitesbaseDir: /u1/xrdmon/allSites ctrPort: 9931ctrPort: 9931 thisSite: SLACthisSite: SLAC fileType: dataType 100fileType: dataType 100 fileType: skim 500fileType: skim 500

Page 14: March 6, 2009Tofigh Azemoon1 Real-time Data Access Monitoring in Distributed, Multi Petabyte Systems Tofigh Azemoon Jacek Becla Andrew Hanushevsky Massimiliano.

March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 1414

site: 1 SLAC PST8PDT 2005-06-13 site: 1 SLAC PST8PDT 2005-06-13

00:00:0000:00:00 site: 2 RAL WET 2005-08-08 10:14:00site: 2 RAL WET 2005-08-08 10:14:00 site: 2 CCIN2P3 CET 2006-10-16 00:00:00site: 2 CCIN2P3 CET 2006-10-16 00:00:00 site: 3 CNAF CET 2006-12-18 00:00:00site: 3 CNAF CET 2006-12-18 00:00:00 site: 3 GRIDKA CET 2006-10-16 00:00:00site: 3 GRIDKA CET 2006-10-16 00:00:00 site: 3 UVIC PST8PDT 2007-05-04 21:00:00site: 3 UVIC PST8PDT 2007-05-04 21:00:00 backupInt: SLAC 1 DAYbackupInt: SLAC 1 DAY backupIntDef: 1 DAYbackupIntDef: 1 DAY

Page 15: March 6, 2009Tofigh Azemoon1 Real-time Data Access Monitoring in Distributed, Multi Petabyte Systems Tofigh Azemoon Jacek Becla Andrew Hanushevsky Massimiliano.

March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 1515

fileCloseWaitTime: 10 MINUTEfileCloseWaitTime: 10 MINUTE maxJobIdleTime: 15 MINUTEmaxJobIdleTime: 15 MINUTE maxSessionIdleTime: 12 HOURmaxSessionIdleTime: 12 HOUR maxConnectTime: 70 DAYmaxConnectTime: 70 DAY closeFileInt: 15 MINUTEcloseFileInt: 15 MINUTE closeIdleSessionInt: 1 HOURcloseIdleSessionInt: 1 HOUR closeLongSessionInt: 1 DAYcloseLongSessionInt: 1 DAY nTopPerfRows: 20nTopPerfRows: 20 yearlyStats: ONyearlyStats: ON allYearsStats: OFFallYearsStats: OFF

Page 16: March 6, 2009Tofigh Azemoon1 Real-time Data Access Monitoring in Distributed, Multi Petabyte Systems Tofigh Azemoon Jacek Becla Andrew Hanushevsky Massimiliano.

March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 1616

Basic ViewBasic View

users unique files

jobs all files

Page 17: March 6, 2009Tofigh Azemoon1 Real-time Data Access Monitoring in Distributed, Multi Petabyte Systems Tofigh Azemoon Jacek Becla Andrew Hanushevsky Massimiliano.

March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 1717

Top Top Performers Performers

TableTable

Page 18: March 6, 2009Tofigh Azemoon1 Real-time Data Access Monitoring in Distributed, Multi Petabyte Systems Tofigh Azemoon Jacek Becla Andrew Hanushevsky Massimiliano.

March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 1818

User InformationUser Information

Page 19: March 6, 2009Tofigh Azemoon1 Real-time Data Access Monitoring in Distributed, Multi Petabyte Systems Tofigh Azemoon Jacek Becla Andrew Hanushevsky Massimiliano.

March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 1919

Skim InformationSkim Information

Page 20: March 6, 2009Tofigh Azemoon1 Real-time Data Access Monitoring in Distributed, Multi Petabyte Systems Tofigh Azemoon Jacek Becla Andrew Hanushevsky Massimiliano.

March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 2020

Statistics (BaBar, September 2007)Statistics (BaBar, September 2007)

DB size: > 40 GBDB size: > 40 GB # tables: > 200# tables: > 200

many with 10’s of millions rowsmany with 10’s of millions rows Largest table > 132,000,000 rowsLargest table > 132,000,000 rows

# jobs recorded > 30,000,000# jobs recorded > 30,000,000 At peak timesAt peak times

over 100 concurrent usersover 100 concurrent users running 10’s of thousands of jobsrunning 10’s of thousands of jobs

Page 21: March 6, 2009Tofigh Azemoon1 Real-time Data Access Monitoring in Distributed, Multi Petabyte Systems Tofigh Azemoon Jacek Becla Andrew Hanushevsky Massimiliano.

March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 2121

Future Developments & ExpansionsFuture Developments & Expansions

DB and RT Data BackupDB and RT Data Backup File and User FilteringFile and User Filtering Staging MonitoringStaging Monitoring

Never enough disk to hold entire data sampleNever enough disk to hold entire data sample Disk uses power even when files are not Disk uses power even when files are not

accessedaccessed Fraction of Data AccessedFraction of Data Accessed

For each file typeFor each file type In specific time intervalsIn specific time intervals … …

Multi Experiment MonitoringMulti Experiment Monitoring Many experiments sharing computing resourcesMany experiments sharing computing resources