Deploying Server-side File System Monitoring at NERSC · Late Breaking News Acknowledgements and...
Deploying Server-side File System Monitoring at NERSC
Andrew Uselton
The Franklin Cray XT4
Cerebro
The Lustre Monitoring Tool
The Lustre Dashboard
Data Analysis
Monitoring Specific Tests or Intervals
Data Mining for Average and Aggregate Behavior
A Simple Model for I/O
Poisson Distributions
Franklin’s Actual Distribution
Late Breaking News
Acknowledgements and References
Deploying Server-side File System Monitoring at NERSC
Cray Users Group Proceedings, May 7, 2009
Andrew Uselton
National Energy Research Scientific Computing Center
Lawrence Berkeley National Lab
Contents
1 The Franklin Cray XT4
    Cerebro
    The Lustre Monitoring Tool
    The Lustre Dashboard
2 Data Analysis
    Monitoring Specific Tests or Intervals
    Data Mining for Average and Aggregate Behavior
3 A Simple Model for I/O
    Poisson Distributions
    Franklin’s Actual Distribution
    Late Breaking News
    Acknowledgements and References
Monitoring the I/O Subsystem
[Diagram of the I/O subsystem. Labels: CN, MDS, OSS, OST, switch, fc, RAID, Net, network 10.0, network NERSC, Cerebro/LMT, Liberty, Almanack]
Cerebro
[Diagram of the Cerebro deployment. Labels: OSS, OST, /usr/lib/cerebro/*, cerebro_metric_lmt_mds.so, cerebro_metric_lmt_ost.so, cerebro_metric_lmt_oss.so, cerebro_monitor_lmt.so]
LMT
[Diagram of LMT data sources. Labels: OSS, OST, /proc/meminfo, /proc/stat, /proc/fs/lustre/obdfilter/*/, stats, uuid, filesfree, filestotal, kbytesfree, kbytestotal, numrefs]
An OSS Tuple
Cerebro Protocol Version; Host Name; CPU Utilization; Memory Utilization
1.0;nid04187;4.990020;39.303989
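The tuple above can be unpacked with a few lines of Python. This is a minimal sketch, assuming the fields are always semicolon-separated and arrive in the order listed on the slide; `OssSample` and `parse_oss_tuple` are hypothetical names used for illustration and are not part of Cerebro or LMT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OssSample:
    """One OSS observation, with fields in the order shown on the slide."""
    protocol_version: str
    host_name: str
    cpu_utilization: float
    memory_utilization: float


def parse_oss_tuple(line: str) -> OssSample:
    """Split a semicolon-separated OSS tuple into named, typed fields."""
    fields = line.strip().split(";")
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    version, host, cpu, mem = fields
    return OssSample(version, host, float(cpu), float(mem))


# The example tuple from the slide.
sample = parse_oss_tuple("1.0;nid04187;4.990020;39.303989")
print(sample.host_name, sample.cpu_utilization, sample.memory_utilization)
```

Keeping the protocol version as a string avoids treating "1.0" as a number to compare numerically, which would be an assumption about how versions are ordered.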
OST Data Values
Cerebro Protocol Version
Host Name
UUID
Bytes Read
Bytes Written
Kbytes Free
Kbytes Used
Inodes Free
Inodes Used
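The OST fields listed above could be handled the same way as the OSS tuple. The sketch below assumes the OST values are also semicolon-separated and arrive in the listed order, which the slide does not state. The sample line is invented purely for illustration (it is not real Franklin data), and `OST_FIELDS`, `parse_ost_values`, and `fraction_used` are hypothetical names.

```python
# Field names taken from the slide, in the order listed there.
OST_FIELDS = (
    "protocol_version", "host_name", "uuid",
    "bytes_read", "bytes_written",
    "kbytes_free", "kbytes_used",
    "inodes_free", "inodes_used",
)


def parse_ost_values(line: str) -> dict:
    """Map a semicolon-separated OST record onto the slide's field names."""
    parts = line.strip().split(";")
    if len(parts) != len(OST_FIELDS):
        raise ValueError(f"expected {len(OST_FIELDS)} fields, got {len(parts)}")
    record = dict(zip(OST_FIELDS, parts))
    # Everything after the UUID is treated as an integer counter or size.
    for name in OST_FIELDS[3:]:
        record[name] = int(record[name])
    return record


def fraction_used(record: dict) -> float:
    """Fraction of the OST's space in use, from the free and used Kbytes."""
    total = record["kbytes_free"] + record["kbytes_used"]
    return record["kbytes_used"] / total if total else 0.0


# Hypothetical record: the values are made up, not taken from the talk.
HYPOTHETICAL_LINE = "1.0;nid00000;example-OST0000_UUID;1048576;2097152;750000;250000;900;100"
record = parse_ost_values(HYPOTHETICAL_LINE)
print(record["uuid"], fraction_used(record))
```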
MDS Operations
mysql> select * from OPERATION_INFO;
OPERATION_NAME   UNITS    OPERATION_NAME     UNITS
req_waittime     usec     mds_getattr_lock   usec
req_qdepth       reqs     mds_close          usec
req_active       reqs     mds_reint          usec
reqbuf_avail     bufs     mds_readpage       usec
ost_reply        usec     mds_connect        usec
ost_getattr      usec     mds_disconnect     usec
ost_setattr      usec     mds_getstatus      usec
ost_read         bytes    mds_statfs         usec
ost_write        bytes    mds_pin            usec
ost_create       usec     mds_unpin          usec
ost_destroy      usec     mds_sync           usec
ost_get_info     usec     mds_done_writing   usec
ost_connect      usec     mds_set_info       usec
ost_disconnect   usec     mds_quotacheck     usec
ost_punch        usec     mds_quotactl       usec
ost_open         usec     mds_getxattr       usec
ost_close        usec     mds_setxattr       usec
ost_statfs       usec     ldlm_enqueue       usec
...
The Lustre Dashboard
Four IOR Tests
[Figure: Aggregate OST rates from 2008-07-28 22:45:00. x-axis: Time (PDT), 22:45 to 23:18; y-axis: Data Rate (MB/s), 0 to 12000; series: read rate, write rate]
24 Hours of LMT Data
[Figure: x-axis: Time (PDT), 08:00 through 06:00; y-axis: Data Rate (MB/s), 0 to 10000; series: read rate, write rate]
Daily Averages
[Figure: Average daily rates. x-axis: Time (PDT), 07/01 to 03/01; y-axis: Data Rate (GB/s), 0 to 4.5; series: Read, Write]
http://en.wikipedia.org/wiki/Poisson_distribution:
• $f_\lambda(k) = \dfrac{\lambda^k e^{-\lambda}}{k!}$
• $C(m) = N \times f_\lambda(\mathrm{int}(m/M))$
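The two formulas above translate directly into code. The following is a minimal sketch, assuming m and M are expressed in the same unit (MB here); the parameter values are the ones quoted on the λ = 2 plot in this deck (lambda = 2, M = 125MB, N = 250M). The function names `poisson_pmf` and `expected_count` are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """f_lambda(k) = lam**k * exp(-lam) / k!, evaluated in log space
    so that large k does not overflow the factorial."""
    if k < 0 or lam <= 0:
        raise ValueError("k must be >= 0 and lam must be > 0")
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def expected_count(m: float, M: float, N: float, lam: float) -> float:
    """C(m) = N * f_lambda(int(m / M)): the modeled number of intervals,
    out of N, in which m units of data were transferred."""
    if m < 0:
        raise ValueError("m must be non-negative")
    return N * poisson_pmf(int(m / M), lam)


# Parameters from the lambda = 2 plot: M = 125 MB, N = 250 million.
LAM, M_MB, N_OBS = 2.0, 125.0, 250e6
for m_mb in (0.0, 250.0, 500.0, 1000.0, 2000.0):
    print(f"m = {m_mb:6.0f} MB  ->  C(m) = {expected_count(m_mb, M_MB, N_OBS, LAM):.3g}")
```

Because `int(m / M)` truncates, every m within the same M-sized bucket gets the same count, so the modeled curve is a step function of m.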
Poisson Distribution: λ = 2
[Figure: Poisson distribution, lambda = 2, M = 125MB, N = 250M. x-axis: m, the amount of data transferred during a 5 second interval, 0 to 2.0 GB; y-axis: count, log scale, 10 to 100 M]
Poisson Distribution: λ = 20
[Figure: Poisson distribution, lambda = 20, M = 40MB, N = 250M. x-axis: m, the amount of data transferred during a 5 second interval, 0 to 2.0 GB; y-axis: count, log scale, 10 to 100 M]
250 M LMT Observations
[Figure: Distribution of LMT observed rates. x-axis: MB, 0 to 2500; y-axis: Count, log scale, 10 to 10 M; series: read, write]
Two weeks of recent observations
[Figure: Distribution of LMT observed rates. x-axis: MB, 0 to 2500; y-axis: Count, log scale, 10 to 1 M; series: read, write]
I would like to acknowledge and thank:
Al Chu: the author of Cerebro.
Herb Wartens: the author of the Lustre Monitoring Tool plug-ins.
Both work at Lawrence Livermore National Lab, which supported the development of these tools. Both were very generous with their time as I deployed the software on Franklin.
The software is available from:
Both applications are open source and available from Sourceforge.
Cerebro http://sourceforge.net/projects/cerebro
LMT http://sourceforge.net/projects/lmt/
If you would like hints and encouragement with getting this software deployed, contact me:
Andrew Uselton ([email protected])
If you get results from your deployment that you would like to share, please do so.