Descriptive Data Analysis of File Transfer Data
Sudarshan Srinivasan
Victor Hazlewood
Gregory D. Peterson
Objective
· Understand the GridFTP transfer log data collected at NICS.
· Analyze the data and identify areas of potential improvement.
· Perform predictive analysis to improve efficiency.
· Apply the knowledge gained to other XSEDE service providers.
NICS GridFTP Infrastructure
GridFTP Logging
· GridFTP data transfer protocol, version 5.2.2.
· Two types of logging: "usage" logging and "log_transfer" logging (enabled in 5.2.2).
· Prior to 5.2.2, endpoint IP addresses were recorded as 0.0.0.0.
· Thanks to the Globus folks for fixing this bug!
Transfer Logs
· NICS uses a PostgreSQL database for storing transfer log data.
· Two new tables: n_gridftp_usage and n_gridftp_usage_detail.
· n_gridftp_usage: quick lookup of aggregate monthly GridFTP usage information.
· n_gridftp_usage_detail: detailed record of each data transfer.
· Log data includes: starttime, endtime, nbytes, user, filename, source and destination end points.
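As a minimal sketch of how the two tables could relate, the Python example below uses the standard-library sqlite3 module as a stand-in for PostgreSQL. The column names follow the fields listed above, but the column types, the `username` spelling (chosen because `user` is reserved in PostgreSQL), and the aggregate columns of n_gridftp_usage are assumptions, not the actual NICS schema.

```python
import sqlite3

# Hypothetical schema: column names follow the fields listed on the slide,
# but types and aggregate columns are assumptions, not the NICS schema.
SCHEMA = """
CREATE TABLE n_gridftp_usage_detail (
    starttime TEXT NOT NULL,    -- 'YYYY-MM-DD HH:MM:SS'
    endtime   TEXT NOT NULL,
    nbytes    INTEGER NOT NULL,
    username  TEXT,
    filename  TEXT,
    source    TEXT,             -- source endpoint
    dest      TEXT              -- destination endpoint
);
CREATE TABLE n_gridftp_usage (
    month         TEXT PRIMARY KEY,  -- 'YYYY-MM'
    num_transfers INTEGER NOT NULL,
    total_bytes   INTEGER NOT NULL
);
"""

def create_tables(conn):
    """Create both tables on an open connection."""
    conn.executescript(SCHEMA)

def rebuild_monthly_usage(conn):
    """Recompute the monthly aggregate table from the per-transfer detail rows."""
    conn.execute("DELETE FROM n_gridftp_usage")
    conn.execute(
        """
        INSERT INTO n_gridftp_usage (month, num_transfers, total_bytes)
        SELECT substr(starttime, 1, 7), COUNT(*), SUM(nbytes)
        FROM n_gridftp_usage_detail
        GROUP BY substr(starttime, 1, 7)
        """
    )
    conn.commit()
```

Deriving the monthly table entirely from the detail table keeps the two consistent: it can simply be rebuilt after each monthly load instead of being updated incrementally.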
Log Data Collection
· Log files from each GridFTP server are copied to a central NFS location.
· Each month, a processing script checks the log files for errors in the log entries.
· A second script then loads the log files into the database tables.
· We chose transfer log data for the year 2013 for this analysis.
Sample transfer log entry:
DATE=20130401132041.657463 HOST=datamover1.nics.utk.edu PROG=globus-gridftp-server NL_EVNT=FTP_INFO START=20130401132041.534646 USER=username NBYTES=1048576 VOLUME=/ STREAMS=1 STRIPS=1 DEST=[192.249.6.164] TYPE=RETR CODE=226
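Each log entry is a sequence of KEY=VALUE fields, so parsing and basic error checks are straightforward. The sketch below is a hypothetical Python version of such a check, not the NICS processing script: it treats DATE as the transfer end time and START as the start time (consistent with the sample, where DATE is slightly later than START), and its required fields and validation rules are assumptions. Rejecting the 0.0.0.0 placeholder reflects the pre-5.2.2 logging bug.

```python
import re
from datetime import datetime

# Assumed minimum set of fields a record needs before it is loaded.
REQUIRED_FIELDS = ("DATE", "START", "USER", "NBYTES", "DEST", "TYPE", "CODE")

# GridFTP timestamps look like YYYYMMDDHHMMSS.ffffff (14 digits, then microseconds).
_TIMESTAMP = re.compile(r"\d{14}\.\d{1,6}")

def parse_gridftp_timestamp(value):
    """Parse a timestamp, rejecting values with the wrong number of digits."""
    if not _TIMESTAMP.fullmatch(value):
        raise ValueError(f"malformed timestamp: {value!r}")
    return datetime.strptime(value, "%Y%m%d%H%M%S.%f")

def parse_log_line(line):
    """Split one transfer log line into a dict of KEY=VALUE fields."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields

def validate_record(fields):
    """Return a list of problems found in a parsed record (empty if it looks valid)."""
    missing = [f"missing {key}" for key in REQUIRED_FIELDS if key not in fields]
    if missing:
        return missing
    problems = []
    try:
        start = parse_gridftp_timestamp(fields["START"])
        end = parse_gridftp_timestamp(fields["DATE"])  # assumed to be the end time
    except ValueError:
        return ["malformed timestamp"]
    if end < start:
        problems.append("end time precedes start time")
    if not fields["NBYTES"].isdigit():
        problems.append("non-numeric NBYTES")
    if fields["DEST"].strip("[]") == "0.0.0.0":
        problems.append("placeholder endpoint address 0.0.0.0")
    return problems
```

The strict 14-digit check matters: a timestamp missing one digit can otherwise still be accepted by `strptime` and silently parsed as the wrong date.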
Log Data Analysis
· Two variables were identified: number of transfers and total amount of data transferred.
· Data transfer rate computed from starttime, endtime, and nbytes.
· Monthly visual comparison of data coming into and going out of NICS, across all endpoints.
· Intra XSEDE site comparison: number of transfers and amount of data transferred into and out of NICS.
· Bucketing of transfers by transfer size (ts).
· The R statistical computing language was used to plot all histograms and graphs.
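The transfer rate follows directly from starttime, endtime, and nbytes. A minimal Python sketch, assuming rates are reported in decimal gigabits per second (10^9 bits) and that transfers whose start and end timestamps coincide are left without a rate rather than given an infinite one:

```python
from datetime import datetime

def transfer_rate_gbps(start, end, nbytes):
    """Average transfer rate in gigabits per second (1 Gb = 10**9 bits).

    Returns None when the interval is zero or negative, since no
    meaningful rate can be computed for it.
    """
    seconds = (end - start).total_seconds()
    if seconds <= 0:
        return None
    return nbytes * 8 / seconds / 1e9

# Example with the times and size from the sample log entry:
# 1,048,576 bytes between 13:20:41.534646 and 13:20:41.657463.
sample_rate = transfer_rate_gbps(
    datetime(2013, 4, 1, 13, 20, 41, 534646),
    datetime(2013, 4, 1, 13, 20, 41, 657463),
    1048576,
)
```

For that sample entry the rate is roughly 0.068 Gbps; for such small files, per-transfer overhead rather than link bandwidth tends to dominate the measured rate.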
Basic Statistics for the year 2013
Type                               Quantity
Total transfers                    67,160,380
Average transfers per month        5,596,698
File transfers, ts > 64 GB         813 (0.001%)
File transfers, 1 MB < ts < 64 GB  19,374,549 (28.85%)
File transfers, ts < 1 MB          47,785,018 (71.15%)
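The three buckets above can be reproduced with a simple size classifier. In this Python sketch, MB and GB are taken as binary units (2^20 and 2^30 bytes), and a transfer of exactly 1 MB or exactly 64 GB, which the table's strict inequalities leave unassigned, is placed in the lower bucket; both choices are assumptions.

```python
from collections import Counter

MB = 2**20  # assumption: binary megabyte
GB = 2**30  # assumption: binary gigabyte

def size_bucket(nbytes):
    """Classify a transfer by size (ts) into the table's three buckets."""
    if nbytes <= MB:
        return "small"   # ts < 1 MB (exactly 1 MB placed here by assumption)
    if nbytes <= 64 * GB:
        return "medium"  # 1 MB < ts < 64 GB (exactly 64 GB placed here by assumption)
    return "large"       # ts > 64 GB

def bucket_counts(sizes):
    """Count transfers per bucket from an iterable of byte counts."""
    return Counter(size_bucket(n) for n in sizes)

def percentages(counts):
    """Convert per-bucket counts to percentages of the total."""
    total = sum(counts.values())
    return {bucket: 100.0 * c / total for bucket, c in counts.items()}
```

Feeding the table's own counts into `percentages` gives 71.15% and 28.85% for the small and medium buckets after rounding, and about 0.0012% for the large bucket.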
Number of transfers and amount transferred for the year 2013
[Figure: number of transfers per month (in millions; total = 83.54 million) and total amount transferred per month (in TB; total = 1235.7), with the mean indicated; x-axis: month.]
Percentage of transfers vs Transfer size for the year 2013
Total transfers: 67,160,380
[Figure: percentage of transfers (y-axis) by transfer size, ts (x-axis).]
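The bin widths used for this histogram are not stated. One common choice for sizes spanning many orders of magnitude is power-of-two bins; the Python sketch below computes the percentage of transfers per such bin, as an assumption about how the plot could be binned (the original plots were produced in R).

```python
from collections import Counter

def log2_size_histogram(sizes):
    """Percentage of transfers per power-of-two size bin.

    Bin k holds transfers with 2**k <= nbytes < 2**(k+1); zero-byte
    transfers get their own bin, keyed -1 (a hypothetical convention).
    """
    # For n > 0, n.bit_length() - 1 equals floor(log2(n)) without float error.
    bins = Counter(n.bit_length() - 1 if n > 0 else -1 for n in sizes)
    total = sum(bins.values())
    return {k: 100.0 * c / total for k, c in sorted(bins.items())}
```

Using integer `bit_length` instead of `math.log2` avoids floating-point rounding placing an exact power of two in the wrong bin.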
Transfer speed for top 500 transfers with transfer size > 1 GB
[Figure: transfer speed in Gbps (y-axis) by month (x-axis).]
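How the top 500 transfers were ranked is not stated. Assuming they are the 500 fastest transfers larger than 1 GB in each month, with a rate already computed per transfer, the selection could be sketched in Python as follows; the (month, nbytes, rate) record layout and the binary gigabyte are assumptions.

```python
import heapq
from collections import defaultdict

GB = 2**30  # assumption: binary gigabyte for the "> 1 GB" filter

def top_rates_by_month(transfers, n=500, min_bytes=GB):
    """For each month, return the n highest rates among transfers above min_bytes.

    `transfers` is an iterable of hypothetical (month, nbytes, rate_gbps) tuples.
    """
    by_month = defaultdict(list)
    for month, nbytes, rate in transfers:
        if nbytes > min_bytes:
            by_month[month].append(rate)
    # heapq.nlargest avoids fully sorting each month's list of rates.
    return {m: heapq.nlargest(n, rates) for m, rates in sorted(by_month.items())}
```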
Monthly comparison between number of transfers coming into and going out of NICS for the year 2013
[Figure: total number of transfers (in millions) by month, incoming vs. outgoing.]
Monthly comparison between total amount of data coming into and going out of NICS for the year 2013
[Figure: total amount of data moved (in TB) by month, incoming vs. outgoing.]
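One way to separate incoming from outgoing data is the TYPE field of each log entry. In FTP, RETR sends a file from the server to the other side and STOR receives a file from it, so for logs written by NICS servers, RETR can be read as data leaving NICS and STOR as data arriving. The Python sketch below aggregates per month and direction under that reading; the record layout, the binary terabyte, and skipping all other command types are assumptions.

```python
from collections import defaultdict

# FTP semantics: RETR sends a file from the server, STOR receives one.
# For logs written by NICS servers: RETR = leaving NICS, STOR = arriving.
DIRECTION = {"RETR": "out", "STOR": "in"}

TB = 2**40  # assumption: binary terabyte

def monthly_in_out(records):
    """Aggregate transfer count and TB moved per (month, direction).

    `records` yields hypothetical (month, ftype, nbytes) tuples; entries whose
    type is neither RETR nor STOR are skipped.
    """
    counts = defaultdict(int)
    volume = defaultdict(int)
    for month, ftype, nbytes in records:
        direction = DIRECTION.get(ftype)
        if direction is None:
            continue
        counts[(month, direction)] += 1
        volume[(month, direction)] += nbytes
    return {key: (counts[key], volume[key] / TB) for key in sorted(counts)}
```

The same pass yields both quantities plotted on these two slides: the number of transfers and the amount of data in each direction.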
Transfer data buckets for November 2013
[Figure, four panels, each plotting percentage of transfers (y-axis) against transfer size, ts (x-axis):
(a) All transfers for November 2013. Total transfers: 2,181,157
(b) All transfers for November 2013, ts < 1 MB. Total transfers: 749,747
(c) All transfers for November 2013, 1 MB < ts < 64 GB. Total transfers: 1,431,385
(d) All transfers for November 2013, ts > 64 GB. Total transfers: 25]
Intra XSEDE Sites and Abbreviations
Site Name                                                                        Abbreviation
Texas Advanced Computing Center                                                  TACC
Pittsburgh Supercomputing Center                                                 PSC
San Diego Supercomputer Center                                                   SDSC
National Institute for Computational Sciences / Georgia Institute of Technology  NICS/GaTech
Indiana University                                                               IU
Open Science Grid                                                                OSG
National Center for Atmospheric Research                                         NCAR
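Attributing a transfer to one of these sites requires mapping its remote endpoint to a site. Log entries record endpoints as IP addresses (for example DEST=[192.249.6.164]), which would first need to be resolved, for instance with a reverse DNS lookup via Python's `socket.gethostbyaddr`. The sketch below covers only the second step, matching a hostname against domain suffixes; the suffix table is hypothetical and is not the mapping used in this analysis.

```python
# Hypothetical suffix table for illustration only; the actual endpoint-to-site
# mapping used in the analysis is not described on the slides.
SITE_SUFFIXES = {
    "tacc.utexas.edu": "TACC",
    "psc.edu": "PSC",
    "sdsc.edu": "SDSC",
    "gatech.edu": "NICS/GaTech",
    "iu.edu": "IU",
    "indiana.edu": "IU",
    "opensciencegrid.org": "OSG",
    "ucar.edu": "NCAR",
}

def site_for_host(hostname):
    """Return the site abbreviation for a hostname, or None if no suffix matches."""
    host = hostname.lower().rstrip(".")
    for suffix, site in SITE_SUFFIXES.items():
        # Match whole domain labels so that e.g. 'notpsc.edu' does not count as PSC.
        if host == suffix or host.endswith("." + suffix):
            return site
    return None
```

Transfers whose endpoints map to None would fall outside the intra XSEDE comparison and count only toward the "from everywhere" totals.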
Intra XSEDE site data coming into NICS
[Figure: number of transfers (in thousands) and total amount transferred (in TB) by month, per site: TACC, PSC, SDSC, NICS/GaTech, IU, OSG, NCAR.]
Intra XSEDE site data going out of NICS
[Figure: number of transfers (in thousands) and total amount transferred (in TB) by month, per site: TACC, PSC, SDSC, NICS/GaTech, IU, OSG, NCAR.]
Intra XSEDE site data coming into and going out of NICS together
[Figure: number of transfers (in thousands) and total amount transferred (in TB) by month, per site: TACC, PSC, SDSC, NICS/GaTech, IU, OSG, NCAR.]
Future Work
· Currently in progress:
– Moving from a PostgreSQL database to loading the data entirely in memory on a separate machine.
– Using Apache Spark for fast large-scale data processing.
– Combining SQL, streaming, and complex analytics.
– Using advanced data mining and machine learning algorithms from Python libraries.
· Next steps:
– Analyze transfer data together with job data, filesystem data, and archive data.
– Visualize data flow within the XSEDE network on a geographical map.
Thank You!
Questions?