dCache Locality Performance Testing


Page 1: dCache  Locality Performance Testing

dCache Locality Performance Testing

Sarah Williams, Indiana University

2013-09-13

Page 2: dCache  Locality Performance Testing

Test Cases

• Downloading files via xrdcp
  – 5 MB and 2 GB dummy files, full of 0’s

• Directly accessing files via readDirect
  – The test file is NTUP_SMWZ.01120689._000089.root.1, a fairly typical user input file of size 4.6 GB.
  – The tests read 10%-90% of the file, in 10% increments.
  – readDirect does not let us control the bytes read, only the number of events read. I ran a series of tests to find the number of events read which would correspond to 10-90% of bytes read.
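The event-count calibration described above can be sketched as follows. This is a hypothetical reconstruction, not the author's actual procedure: the sample (events read, bytes read) pairs below are illustrative, and linear interpolation between trial runs is simply one plausible way to map a target byte fraction back to an event count.

```python
# Hypothetical sketch of the calibration behind ref_points.txt: given a few
# measured (events read, bytes read) samples from trial readDirect runs,
# interpolate the number of events that corresponds to a target fraction of
# the file's bytes. All sample numbers here are illustrative.

FILE_SIZE = 4.6 * 1024**3  # the ~4.6 GB NTUP_SMWZ test file

# (events read, bytes read) pairs, as would be measured by trial runs
samples = [(0, 0), (5000, 0.9e9), (10000, 1.9e9), (20000, 4.0e9), (24000, FILE_SIZE)]

def events_for_byte_fraction(fraction, samples):
    """Linearly interpolate the event count for a target byte fraction."""
    target = fraction * FILE_SIZE
    for (e0, b0), (e1, b1) in zip(samples, samples[1:]):
        if b0 <= target <= b1:
            return int(e0 + (e1 - e0) * (target - b0) / (b1 - b0))
    return samples[-1][0]

# Build a ref_points-style table for 10%-90% of bytes, in 10% increments
ref_points = {p: events_for_byte_fraction(p / 100, samples) for p in range(10, 100, 10)}
```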

Page 3: dCache  Locality Performance Testing

Test Conditions

• The test clients are worker nodes temporarily removed from the cluster. They are running only the OS (Scientific Linux 6.4), Condor in an offline mode, and the tests themselves.

• The tests were run serially and single-threaded.

• The servers are our production servers. The tests are spread across 10 servers.

• The cluster was running a normal production load. The LAN and WAN connections showed typical usage. Tests from times of non-typical conditions were discarded.

Page 4: dCache  Locality Performance Testing

Network conditions

[Diagram: test worker nodes connected to LAN storage at IU, with remote storage servers at UC]

Clients ran at IU, with local storage at IU and remote storage at UC. Note this route does not use LHCONE. Near-future network improvements are not shown.

[root@iut2-c200 ~]# traceroute uct2-s20.uchicago.edu
traceroute to uct2-s20.uchicago.edu (128.135.158.170), 30 hops max, 60 byte packets
 1  149.165.225.254 (149.165.225.254)  20.328 ms  20.325 ms  9.372 ms
 2  et-10-0-0.2012.rtr.ictc.indiana.gigapop.net (149.165.254.249)  0.448 ms  0.491 ms  0.484 ms
 3  149.165.227.22 (149.165.227.22)  5.087 ms  5.196 ms  5.273 ms
 4  10.4.247.230 (10.4.247.230)  5.229 ms  5.396 ms  5.547 ms
 5  uct2-s20.uchicago.edu (128.135.158.170)  5.323 ms  5.347 ms  5.220 ms

Page 5: dCache  Locality Performance Testing

Test Code

• FAX end-user tutorial code base:
  – https://github.com/ivukotic/Tutorial/
  – readDirect opens a ROOT file and reads the specified percentage of events from it.

• Locality-caching specific code:
  – https://github.com/DHTC-Tools/ATLAS/tree/master/Performance%20Tests/Locality%20Caching
  – test_local.sh is run directly on the worker node, and runs the local disk test with memory flushing. The line that executes ‘releaseFileCache’ can be commented out to do non-flushed tests.
  – test_direct.sh, test_hit.sh and test_miss.sh are run on the management host and ssh to the test hosts to run the tests.
  – ref_points.txt contains the percentages of events read needed to get the correct percentages of bytes read for readDirect.
  – releaseFileCache removes a specific file from the Linux page cache.
  – pooltest*.txt contain the basenames of the files used for the tests. The directories are hardcoded in the scripts. The filename is also the name of the data server the file resides on, e.g. ‘uct2-16_1’.
  – testshosts.txt contains the hostnames of the worker nodes to use for testing.
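The actual releaseFileCache tool lives in the repository above and its source is not shown in the slides. As an illustration only, a minimal Python equivalent of a "remove this file from the Linux page cache" utility can be built on posix_fadvise with the POSIX_FADV_DONTNEED hint, which is the standard mechanism for evicting a single file's clean pages:

```python
import os

def release_file_cache(path):
    """Ask the kernel to drop a file's pages from the Linux page cache.

    A minimal sketch of what a releaseFileCache-style utility can do;
    not the actual tool used in these tests.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)  # flush any dirty pages so DONTNEED can evict them
        # offset 0, length 0 means "the whole file"; the kernel is advised
        # the data will not be needed and evicts its clean page-cache pages
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
```

Unlike dropping all caches via /proc/sys/vm/drop_caches, this targets only the test file, so the rest of the node's cache state is undisturbed between runs.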

Page 6: dCache  Locality Performance Testing

Small file downloads

In these small-file tests, the optimal strategy is no caching. You can see in the chart below that cache-hit downloads take about 700 ms longer than non-cached downloads. This is caused by the small overhead caching imparts on the central dCache manager, which must look up which data servers a client is permitted to read and check whether the requested file is on one of those servers.

[Chart: 5 MB file download times per iteration — cache hit, cache miss, no caching; y-axis: download time in seconds]

[Chart: 5 MB file download times, cache hit only; y-axis: download time in seconds]

Page 7: dCache  Locality Performance Testing

Larger files show the advantages of caching in an environment where files will be reused. Note that the time to download a file on a cache miss is roughly the sum of a non-cached download and a cache-hit download. Cached performance is much more consistent than non-cached.

Large file downloads

[Chart: 2 GB file download times per iteration — cache hit, cache miss, no caching; y-axis: download time in seconds]

[Chart: 2 GB file download times, cache hit with linear fit; y-axis: download time in seconds]
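The reuse argument can be put in back-of-the-envelope form using the additive relationship observed above (miss ≈ non-cached download + cache-hit download). The times below are illustrative values roughly read off the 2 GB chart, not exact measurements:

```python
# Back-of-the-envelope model of when caching pays off for a reused file.
# t_nocache and t_hit are illustrative, roughly chart-scale values.

t_nocache = 90.0            # seconds per non-cached (WAN) 2 GB download
t_hit = 40.0                # seconds per cache-hit (LAN) download
t_miss = t_nocache + t_hit  # observed: miss ~ no-cache + hit

def total_time(n_reads, cached):
    """Total download time if the same file is fetched n_reads times."""
    if cached:
        # first read misses and populates the cache, the rest hit
        return t_miss + (n_reads - 1) * t_hit
    return n_reads * t_nocache
```

For a single read, caching only adds the miss penalty; with any reuse, each repeat read saves the LAN/WAN speed difference, so the cached strategy quickly wins.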

Page 8: dCache  Locality Performance Testing

The test file was pre-loaded to local disk, and the tests were then run on it sequentially. Due to memory caching in the Linux kernel, there was no actual disk IO during these tests. This is not necessarily a realistic test case: in production there would be N jobs competing for memory, and it would be impossible for them all to keep their input files in memory. A more realistic case would be to remove the file from memory between tests.

Local disk (memory cached) tests

Read speed in MB/s:

            X5660   E5440   AMD 2350   AMD 2218
  Average   85.89   74.85   43.09      53.13
  Std dev    2.94    2.40    2.61       4.03

[Chart: Times for Local Memory-Cached Tests — E5440 (with linear fit) and AMD 2218; x-axis: percent of file read, y-axis: walltime in seconds]

Page 9: dCache  Locality Performance Testing

The test file was pre-loaded to local disk, and then the tests were run on it sequentially. Between each test, the releaseFileCache utility was used to remove the test file from memory. The tests were narrowed down to the two faster processors, the X5660 and E5440, to simplify testing.

Local disk (memory flushed) tests

[Chart: Times for Local Disk (memory flushed) Tests — X5660 and E5440, each with a linear fit; x-axis: percent of file read, y-axis: walltime in seconds]

Read speed in MB/s:

            X5660   E5440
  Average   50.54   42.41
  Std dev    0.99    0.82

Page 10: dCache  Locality Performance Testing

WAN reads (caching disabled)

WAN tests are vulnerable to network conditions on inter-campus links, making them more variable than LAN tests.

Read speed in MB/s:

            X5660   E5440
  Average   22.58   22.63
  Std dev    4.15    1.65

[Chart: Times for Remote Read tests (caching off) — X5660 (with linear fit) and E5440; x-axis: percent of file read, y-axis: walltime in seconds]

Page 11: dCache  Locality Performance Testing

Cache Hits

Reads over the LAN show more consistency than WAN tests.

[Chart: Times for Cache Hit Tests — X5660 and E5440, each with a linear fit; x-axis: percent of file read, y-axis: walltime in seconds]

Read speed in MB/s:

            X5660   E5440
  Average   47.45   44.77
  Std dev    2.02    1.41

Page 12: dCache  Locality Performance Testing

Cache Misses

[Chart: Times for Cache Miss Tests — X5660 and E5440, each with a linear fit; x-axis: percent of file read, y-axis: walltime in seconds]

Cache miss test results are equivalent to cache hit results plus 90 s. We can infer that it takes 90 s to transfer the file from remote storage into local storage, at about 52 MB/s. Standard deviations are higher due to the inclusion of the inter-site link, the remote storage server and the local storage server, all of which are also serving the active cluster.

Read speed in MB/s:

            X5660   E5440
  Average   16.24   15.41
  Std dev    6.38    5.87
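The ~52 MB/s figure follows directly from the file size and the observed 90 s offset (taking 1 GB = 1024 MB):

```python
# Sanity check of the inference above: moving the 4.6 GB test file from
# remote into local storage in about 90 s implies roughly 52 MB/s.
file_size_mb = 4.6 * 1024          # ~4710 MB
transfer_time_s = 90.0             # cache-miss offset relative to a hit
transfer_speed = file_size_mb / transfer_time_s   # ~52 MB/s
```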

Page 13: dCache  Locality Performance Testing

Direct comparison of caching strategies

No caching is the optimal strategy when less than 75% of the file is read. When more than 75% is read, caching becomes optimal.

[Chart: Comparison of Caching Strategies on X5660 — cache hit and local disk, each with a linear fit; x-axis: percent of file read, y-axis: walltime in seconds]

[Chart: Comparison of Download Speed of Caching Strategies on X5660 — speed in MB/s]

Page 14: dCache  Locality Performance Testing

Conclusions

• Caching is less effective for very small files and for jobs that read only part of the file

• Caching is most effective for medium to large files, when the entire file is downloaded or when the file is reused

• Next steps:
  – Turn on locality caching for production nodes and monitor. Monday 9/16?