
Tom Byrne, 12th November 2014


Ceph – status update and xrootd testing

Alastair Dewhurst, Tom Byrne



• On 15th October we gave an overview talk on plans for Ceph at the RAL Tier 1.

• Will aim to provide updates on progress made, focusing on the xrootd deployment and testing.

• Current Ceph cluster has 7 nodes using 2013-generation hardware.



S3 gateway

• At the last meeting we had the S3 gateway on a virtual machine:

• Hope to have firewall holes + X.509 authentication working by next week.

• S3 gateway ‘does its own thing’ with files, which means it is difficult to use with other plugins.

• Will investigate writing our own WebDAV gateway.



CERN plugins

• CERN have four plugins based on XRootD for


• radosfs (implements files & directories in RADOS)

• xrootd-rados-oss (interfacing radosfs as OSS plug-in)

• xrootd-diamond-ofs (adding checksumming & TPC)

• xrootd-auth-change-id (adding NFS server style authentication to xrootd)

• Our work has been on the xrootd-diamond-ofs

• Setup instructions can be found:



Xrootd deployment

• Used RPMs provided on the wiki to set up XRootD.


• Had to set up a cache tier because the plugin currently doesn’t work directly with erasure-coded pools.

• This is because the file is opened and then appended to; CERN are working on patching it to work with EC.

• There are two pools:

• Data and Meta-Data



Cache Tier

• Cache Tier is using mostly default settings

• 3 replicas of the data

• Will create a ‘cold’ erasure coded copy instantly

• LRU algorithm to clean up data.

• We would prefer not to use a Cache Tier and to have direct access to the erasure-coded pool.

• It would be possible to have a ~10% Cache Tier in front of the storage.

• We believe an erasure-coded pool should work well, as we are not appending to files.
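
The LRU clean-up mentioned above can be illustrated with a minimal sketch. This is not Ceph's cache-tier implementation, which is more involved; the class name, the capacity measured in object count, and the object names are all hypothetical and only show the least-recently-used idea.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache: tracks object names only, not their data."""

    def __init__(self, capacity):
        self.capacity = capacity      # hypothetical limit, counted in objects
        self.objects = OrderedDict()  # oldest access first, newest last

    def access(self, name):
        """Record a read or write; return the names evicted to make room."""
        self.objects[name] = True
        self.objects.move_to_end(name)  # mark as most recently used
        evicted = []
        while len(self.objects) > self.capacity:
            oldest, _ = self.objects.popitem(last=False)
            evicted.append(oldest)
        return evicted


cache = LRUCache(capacity=2)
history = [cache.access(name) for name in ["a", "b", "a", "c"]]
print(history)
```

In this toy run, object "b" is evicted when "c" arrives, because "a" was touched more recently.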



Diamond data

• Plugin splits file into chunks which are stored with a GUID in Ceph:

• Makes it hard to manage files and write other plugins.


[root@gdss540 ~]# rados -p diamond-data ls | grep 774b1a83-14d0-4fb9-a6c0-10e36c32febf | sort
774b1a83-14d0-4fb9-a6c0-10e36c32febf
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000001
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000002
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000003
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000004
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000005
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000006
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000007
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000008
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000009
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000a
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000b
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000c
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000d
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000e
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000f
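
Managing files stored this way means regrouping the chunk objects by GUID. The sketch below shows one way to do that from a list of object names, assuming the naming seen in the listing: a bare GUID, or a GUID followed by "//" and an 8-digit hexadecimal index. Treating the bare GUID object as index 0 is an assumption, and the function name is hypothetical.

```python
from collections import defaultdict


def group_chunks(object_names):
    """Group object names by file GUID and order each file's chunks."""
    files = defaultdict(list)
    for name in object_names:
        # Names look like "<guid>" or "<guid>//<8 hex digits>".
        guid, sep, suffix = name.partition("//")
        # Assumption: the object without a suffix is the first chunk.
        index = int(suffix, 16) if sep else 0
        files[guid].append((index, name))
    return {
        guid: [name for _, name in sorted(chunks)]
        for guid, chunks in files.items()
    }


guid = "774b1a83-14d0-4fb9-a6c0-10e36c32febf"
# Rebuild the 16 names from the listing, deliberately out of order.
listing = [f"{guid}//{i:08x}" for i in range(15, 0, -1)] + [guid]
ordered = group_chunks(listing)[guid]
print(len(ordered), ordered[0], ordered[-1])
```

Parsing the suffix as hexadecimal keeps the numeric order correct regardless of how the names were sorted or listed.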


Diamond meta-data


Testing

• Have tried commands from:

• UI (using xrootd v3.3.6)

• Node (using xrootd v4.0.4)

• Can copy files in and out:


[root@gdss540 ~]# xrdcp ./ivukotic\:group.test.hc.NTUP_SMWZ.root root://gdss541//root/ivukotic:group.test.hc.NTUP_SMWZ.root.1
[760.2MB/760.2MB][100%][==================================================][95.03MB/s]

[root@gdss540 ~]# xrdcp root://gdss541//root/ivukotic:group.test.hc.NTUP_SMWZ.root /ivukotic\:group.test.hc.NTUP_SMWZ.root
[760.2MB/760.2MB][100%][==================================================][58.48MB/s]
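
The two progress lines can be turned into approximate transfer times by dividing the file size by the reported rate. The sketch below assumes the progress-line layout shown in the output above and that the final rate is an average over the whole copy; the regular expression and function name are illustrative, not part of xrdcp.

```python
import re

# Matches e.g. "[760.2MB/760.2MB][100%][=====][95.03MB/s]".
PROGRESS = re.compile(
    r"\[(?P<done>[\d.]+)MB/(?P<total>[\d.]+)MB\]"
    r"\[(?P<pct>\d+)%\]"
    r"\[[^\]]*\]"
    r"\[(?P<rate>[\d.]+)MB/s\]"
)


def transfer_seconds(progress_line):
    """Estimate copy duration as total size divided by reported rate."""
    match = PROGRESS.search(progress_line)
    if match is None:
        raise ValueError("not an xrdcp progress line")
    return float(match["total"]) / float(match["rate"])


bar = "=" * 50
upload = f"[760.2MB/760.2MB][100%][{bar}][95.03MB/s]"
download = f"[760.2MB/760.2MB][100%][{bar}][58.48MB/s]"
print(round(transfer_seconds(upload), 1), round(transfer_seconds(download), 1))
```

Under these assumptions, the copy into Ceph took about 8 seconds and the copy back out about 13 seconds.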



• Can create directories with UNIX style permissions.

• Setup is “Fragile” – frequently need to restart xrootd.

• Dies when doing “ls -l”


xrdfs gdss541 mkdir "/atlas/?owner=10763&group=1307"

[root@gdss540 ~]# xrdfs gdss541 ls /atlas/
/atlas/ivukotic:group.test.hc.NTUP_SMWZ.root
/atlas/test
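
The mkdir call above passes the owner and group as a query string on the path. A small sketch of building that argument programmatically follows; the helper name is hypothetical, and the owner/group parameters are taken from the command on this slide rather than from any documented interface.

```python
from urllib.parse import urlencode


def mkdir_command(host, path, owner, group):
    """Build the xrdfs argument list for a mkdir with owner/group set."""
    query = urlencode({"owner": owner, "group": group})
    # Returned as a list so it can be passed to subprocess.run without
    # a shell, which avoids quoting the "?" and "&" characters.
    return ["xrdfs", host, "mkdir", f"{path}?{query}"]


cmd = mkdir_command("gdss541", "/atlas/", 10763, 1307)
print(cmd)
```

The block only builds the command; running it requires an xrootd server configured as in this deployment.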


Direct Read

• Code from Wahid:

• git clone https://[email protected]/reps/FAX

• Wanted to try 4 tests:

• Read 10% of the file and use 30MB cache

• Read 100% of the file and use 30MB cache

• Read 10% of the file and use 100MB cache – CRASHED!

• Read 100% of the file and use 100MB cache – CRASHED!


30MB Cache              1st       2nd       3rd       Average
100%  CPU Time /s       31.13     31.13     30.5      30.92
100%  Disk IO MB/s      112.654   112.951   113.094
10%   CPU Time /s       15.9      16.35     16.04
10%   Disk IO MB/s      110.737   112.13    112.056   111.641
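
The averages can be cross-checked from the individual runs. The sketch below assumes that, in the two rows with only three values, those values are the 1st to 3rd runs and the average is what is missing; the averages it computes for those rows are derived here, not taken from the slides.

```python
from statistics import mean

# Values from the 30MB cache tests; keys are (fraction read, metric).
runs = {
    ("100%", "CPU time /s"): [31.13, 31.13, 30.5],
    ("100%", "Disk IO MB/s"): [112.654, 112.951, 113.094],  # assumed runs 1-3
    ("10%", "CPU time /s"): [15.9, 16.35, 16.04],           # assumed runs 1-3
    ("10%", "Disk IO MB/s"): [110.737, 112.13, 112.056],
}

averages = {key: round(mean(values), 3) for key, values in runs.items()}
for (fraction, metric), value in averages.items():
    print(f"{fraction:>4}  {metric:<13} {value}")
```

The two averages given on the slide (30.92 and 111.641) are reproduced by this calculation.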


Future plans

• 3 threads of development:

• Get simplified xrootd to work.

• Look into GridFTP gateway – have spoken to Brian Bockelman, who has made an equivalent for HDFS.

• Look into WebDAV gateway – instructions to get started are on the Ceph wiki, and we will speak to the DPM developers.

• Need to start looking at xattr

• We have procured a Mac mini for future Calamari builds.




• We got the S3 gateway to work, but it wasn’t quite what we wanted.

• Testing Diamond plugin with help from CERN. Do not need all the features.

• Question: Why do all the plugins create their own data formats?

• If we go with an object store we will have to write our own plugins, but this does not appear to be an impossible task.