March 6, 2009Tofigh Azemoon1 Real-time Data Access Monitoring in Distributed, Multi Petabyte Systems...
-
Upload
nathan-anderson -
Category
Documents
-
view
214 -
download
0
Transcript of March 6, 2009Tofigh Azemoon1 Real-time Data Access Monitoring in Distributed, Multi Petabyte Systems...
March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 11
Real-time Data Access Monitoring Real-time Data Access Monitoring in Distributed, Multi Petabyte in Distributed, Multi Petabyte
SystemsSystems
Tofigh AzemoonTofigh Azemoon
Jacek BeclaJacek Becla
Andrew Hanushevsky Andrew Hanushevsky
Massimiliano TurriMassimiliano Turri
SLAC National Accelerator LaboratorySLAC National Accelerator Laboratory
March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 22
Soon a typical running HEP experiment will look like Soon a typical running HEP experiment will look like thisthis
100s of users100s of users
10s of thousands of batch nodes10s of thousands of batch nodes
1000s of data servers1000s of data servers
10s of millions of files10s of millions of filesContainingContaining
10s of PB of data10s of PB of data++
geographically distributed geographically distributed crossing many time zones.crossing many time zones.
March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 33
Mission StatementMission Statement
Provide real time overall view of Provide real time overall view of system performancesystem performance
Respond to detailed queriesRespond to detailed queries to identify bottle necks to identify bottle necks
to optimize the systemto optimize the system
to aid in planning system to aid in planning system expansionexpansion
March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 44
The SLAC ¼The SLAC ¼PBPB “kan” “kan” ClusterCluster
ClientsClients
kan001 kan002 kan003 kan004 kan059
bbr-rdr03 bbr-rdr04
Data ServersData Servers( 320 TB)( 320 TB)
ManagersManagers
March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 55
Monitored ObjectsMonitored Objects
client
file
job
session
user
type_2
type_1
server
March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 66
File classes to monitor aggregate valuesFile classes to monitor aggregate values
for groups of files for groups of files BaBar Examples:BaBar Examples:
type_1 ( dataType )type_1 ( dataType ) /store/store//PRPR//R22/AllEvents/0006/70/22.0.3/AllEvents_00067045_22.0.3V03.02E.rootR22/AllEvents/0006/70/22.0.3/AllEvents_00067045_22.0.3V03.02E.root /store//store/SPSP/R22/000998/200406/22.0.3/SP_000998_068468.01.root /R22/000998/200406/22.0.3/SP_000998_068468.01.root
/store/store//PRskimsPRskims//R22/22.1.1cR22/22.1.1c//IsrIncExcIsrIncExc//79/79/IsrIncExc_57978.01.rootIsrIncExc_57978.01.root
/store/store//SPskimsSPskims//R22/22.1.1cR22/22.1.1c//Tau1NTau1N//001235/200212/001235/200212/Tau1N_001235_49553.01.rootTau1N_001235_49553.01.root
type_2 (skims)type_2 (skims)
File path File path getFileType getFileType ( ( type_1 valuetype_1 value, , type_2 valuetype_2 value))
March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 77
Xrootd ServerXrootd Server
Highly scalable serverHighly scalable server Posix like access to filesPosix like access to files Load balancingLoad balancing Transparent recovery from Transparent recovery from
server crashesserver crashes Fault tolerantFault tolerant Very low latencyVery low latency
March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 88
Monitoring Implementation in Monitoring Implementation in xrootdxrootd
Minimal impact Minimal impact on client requestson client requests
Robustness in Robustness in multimode failuremultimode failure
Precision & Precision & specificity of specificity of collected datacollected data
Real time Real time scalabilityscalability
Use UDP datagramsUse UDP datagrams Data servers Data servers
insulated from insulated from monitoring. Butmonitoring. But
Packets can get lostPackets can get lost
Outsource client Outsource client serializationserialization
Low bounded Low bounded resource usageresource usage
Use of time bucketsUse of time buckets
March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 99
•Start SessionsessionId, user, PId, client, server, start T
•StagingstageId, user, PId, client, file path, stage T, duration, server
•Open FilefileId, user, PId, client, server, file path, open T
•Close File fileId, bytes read, bytes written, close T•End Session
sessionId, duration, end T + Xrootd restart time for each server
RT
data
March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 1010
Single Site MonitoringSingle Site Monitoring
March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 1111
Multi-site MonitoringMulti-site Monitoring
March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 1212
COMPONENTSCOMPONENTS
Collector/Decoder (C+Collector/Decoder (C++)+)
MySQL database (5.0)MySQL database (5.0) Database Applications Database Applications
(Perl, Perl DBI)(Perl, Perl DBI) CreateCreate Load Load PreparePrepare UpgradeUpgrade ReloadReload BackupBackup
Web application ( JSP3)Web application ( JSP3) DB access via JDBCDB access via JDBC
Data servers Data servers xrootd enabledxrootd enabled
Database ServerDatabase Server Hosting DB & Hosting DB &
running DB running DB applicationapplication
Web ServerWeb Server TomcatTomcat
For security reasons DB For security reasons DB & Web servers on & Web servers on different hostsdifferent hosts
March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 1313
Configuration FileConfiguration File dbName: xrdmon_kan_v005dbName: xrdmon_kan_v005 MySQLUser: xrdmonMySQLUser: xrdmon webUser: readerwebUser: reader MySQLSocket: /tmp/mysql.sockMySQLSocket: /tmp/mysql.sock baseDir: /u1/xrdmon/allSitesbaseDir: /u1/xrdmon/allSites ctrPort: 9931ctrPort: 9931 thisSite: SLACthisSite: SLAC fileType: dataType 100fileType: dataType 100 fileType: skim 500fileType: skim 500
March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 1414
site: 1 SLAC PST8PDT 2005-06-13 site: 1 SLAC PST8PDT 2005-06-13
00:00:0000:00:00 site: 2 RAL WET 2005-08-08 10:14:00site: 2 RAL WET 2005-08-08 10:14:00 site: 2 CCIN2P3 CET 2006-10-16 00:00:00site: 2 CCIN2P3 CET 2006-10-16 00:00:00 site: 3 CNAF CET 2006-12-18 00:00:00site: 3 CNAF CET 2006-12-18 00:00:00 site: 3 GRIDKA CET 2006-10-16 00:00:00site: 3 GRIDKA CET 2006-10-16 00:00:00 site: 3 UVIC PST8PDT 2007-05-04 21:00:00site: 3 UVIC PST8PDT 2007-05-04 21:00:00 backupInt: SLAC 1 DAYbackupInt: SLAC 1 DAY backupIntDef: 1 DAYbackupIntDef: 1 DAY
March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 1515
fileCloseWaitTime: 10 MINUTEfileCloseWaitTime: 10 MINUTE maxJobIdleTime: 15 MINUTEmaxJobIdleTime: 15 MINUTE maxSessionIdleTime: 12 HOURmaxSessionIdleTime: 12 HOUR maxConnectTime: 70 DAYmaxConnectTime: 70 DAY closeFileInt: 15 MINUTEcloseFileInt: 15 MINUTE closeIdleSessionInt: 1 HOURcloseIdleSessionInt: 1 HOUR closeLongSessionInt: 1 DAYcloseLongSessionInt: 1 DAY nTopPerfRows: 20nTopPerfRows: 20 yearlyStats: ONyearlyStats: ON allYearsStats: OFFallYearsStats: OFF
March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 1616
Basic ViewBasic View
users unique files
jobs all files
March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 1717
Top Top Performers Performers
TableTable
March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 1818
User InformationUser Information
March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 1919
Skim InformationSkim Information
March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 2020
Statistics (BaBar, September 2007)Statistics (BaBar, September 2007)
DB size: > 40 GBDB size: > 40 GB # tables: > 200# tables: > 200
many with 10’s of millions rowsmany with 10’s of millions rows Largest table > 132,000,000 rowsLargest table > 132,000,000 rows
# jobs recorded > 30,000,000# jobs recorded > 30,000,000 At peak timesAt peak times
over 100 concurrent usersover 100 concurrent users running 10’s of thousands of jobsrunning 10’s of thousands of jobs
March 6, 2009 March 6, 2009 Tofigh AzemoonTofigh Azemoon 2121
Future Developments & ExpansionsFuture Developments & Expansions
DB and RT Data BackupDB and RT Data Backup File and User FilteringFile and User Filtering Staging MonitoringStaging Monitoring
Never enough disk to hold entire data sampleNever enough disk to hold entire data sample Disk uses power even when files are not Disk uses power even when files are not
accessedaccessed Fraction of Data AccessedFraction of Data Accessed
For each file typeFor each file type In specific time intervalsIn specific time intervals … …
Multi Experiment MonitoringMulti Experiment Monitoring Many experiments sharing computing resourcesMany experiments sharing computing resources