Tom Byrne, 12th November 2014
Ceph – status update and xrootd testing
Alastair Dewhurst, Tom Byrne
Introduction
• On 15th October we gave an overview talk on plans for Ceph at the RAL Tier 1.
• We aim to provide updates on progress, focusing on the xrootd deployment and testing.
• The current Ceph cluster has 7 nodes using 2013-generation hardware.
S3 gateway
• At the last meeting we had the S3 gateway running on a virtual machine.
• We hope to have firewall holes + X.509 authentication working by next week.
• The S3 gateway ‘does its own thing’ with files, which makes it difficult to use with other plugins.
• We will investigate writing our own WebDAV gateway.
CERN plugins
• CERN have four plugins based on XRootD for Ceph:
• radosfs (impl. file & directories in rados)
• xrootd-rados-oss (interfacing radosfs as OSS plug-in)
• xrootd-diamond-ofs (adding checksumming & TPC)
• xrootd-auth-change-id (adding NFS server style authentication to xrootd)
• Our work has been on xrootd-diamond-ofs.
• Setup instructions can be found at: https://github.com/cern-eos/eos-diamond/wiki
Xrootd deployment
• Used the RPMs provided on the wiki to set up the XRootD gateway.
• Had to set up a cache tier because the plugin currently doesn’t work directly with erasure coded pools.
• This is because the file is opened and then appended to. CERN are working on patching it to work with EC.
• There are two pools:
• Data and Meta-Data
Cache Tier
• Cache Tier is using mostly default settings
• 3 replicas of the data
• Will create a ‘cold’ erasure coded copy instantly
• LRU algorithm to clean up data.
• We would prefer not to use a Cache Tier and have direct access to Erasure coded pool
• It would be possible to have a ~10% Cache Tier in front of the storage.
• We believe Erasure coded pool should work well as we are not appending to files.
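As an illustration of the LRU clean-up mentioned above, here is a minimal Python sketch of least-recently-used eviction. It is a generic textbook LRU, not Ceph's actual cache-tier agent; the class name `LRUCache`, the object names, and the capacity of 2 are hypothetical and chosen only to keep the example short.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache: tracks object names only, not their data."""

    def __init__(self, capacity):
        self.capacity = capacity      # maximum number of objects kept "hot"
        self._items = OrderedDict()   # ordered from least to most recently used
        self.evicted = []             # names pushed out of the cache, in order

    def access(self, name):
        """Record a read or write of `name`, evicting the oldest entry if full."""
        if name in self._items:
            self._items.move_to_end(name)   # mark as most recently used
            return
        self._items[name] = True
        if len(self._items) > self.capacity:
            oldest, _ = self._items.popitem(last=False)
            self.evicted.append(oldest)

    def contents(self):
        return list(self._items)


cache = LRUCache(capacity=2)
for obj in ["a", "b", "a", "c"]:
    cache.access(obj)

print("in cache:", cache.contents())
print("evicted:", cache.evicted)
```

In this toy run, "b" is the least recently used object when "c" arrives, so it is the one removed from the hot set.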
Diamond data
• Plugin splits file into chunks which are stored with a GUID in Ceph:
• Makes it hard to manage files and write other plugins.
[root@gdss540 ~]# rados -p diamond-data ls | grep 774b1a83-14d0-4fb9-a6c0-10e36c32febf | sort
774b1a83-14d0-4fb9-a6c0-10e36c32febf
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000001
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000002
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000003
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000004
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000005
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000006
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000007
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000008
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000009
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000a
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000b
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000c
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000d
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000e
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000f
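To make the chunk naming concrete, the following Python sketch groups object names from such a listing back into per-file chunk lists. It assumes, based only on the listing above, that the first object carries the bare GUID and later chunks are named `<GUID>//<8 hex digits>`; the function name `group_chunks` is hypothetical and the plugin's real on-disk format may differ.

```python
import re
from collections import defaultdict

# Assumed naming scheme, inferred from the listing:
#   <GUID>            first object of the file (treated here as chunk 0)
#   <GUID>//<8 hex>   later chunks, index written in hexadecimal
OBJECT_NAME = re.compile(
    r"^(?P<guid>[0-9a-f-]{36})(?://(?P<index>[0-9a-f]{8}))?$"
)


def group_chunks(object_names):
    """Map each GUID to the sorted list of chunk indices seen for it."""
    files = defaultdict(list)
    for name in object_names:
        match = OBJECT_NAME.match(name.strip())
        if match is None:
            continue  # not a data object under the assumed scheme
        index = match.group("index")
        files[match.group("guid")].append(int(index, 16) if index else 0)
    return {guid: sorted(indices) for guid, indices in files.items()}


guid = "774b1a83-14d0-4fb9-a6c0-10e36c32febf"
# Rebuild the 16 names shown in the listing (bare GUID plus 0x1..0xf).
listing = [guid] + [f"{guid}//{i:08x}" for i in range(1, 16)]

chunks = group_chunks(listing)
print(len(chunks[guid]), "objects for", guid)
```

A tool like this would be the first step of any external file management, which is part of why the opaque naming makes writing other plugins harder.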
Diamond meta-data
https://indico.cern.ch/event/305441/session/5/contribution/37/material/slides/0.pdf
Testing
• Have tried commands from:
• UI (using xrootd v3.3.6)
• Node (using xrootd v4.0.4)
• Can copy files in and out:
[root@gdss540 ~]# xrdcp ./ivukotic\:group.test.hc.NTUP_SMWZ.root root://gdss541//root/ivukotic:group.test.hc.NTUP_SMWZ.root.1
[760.2MB/760.2MB][100%][==================================================][95.03MB/s]
[root@gdss540 ~]# xrdcp root://gdss541//root/ivukotic:group.test.hc.NTUP_SMWZ.root /ivukotic\:group.test.hc.NTUP_SMWZ.root
[760.2MB/760.2MB][100%][==================================================][58.48MB/s]
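The rates printed by xrdcp can be turned back into approximate transfer times. The short Python calculation below assumes the reported MB/s figure is the average over the whole 760.2 MB transfer; the labels "copy in" and "copy out" are ours.

```python
# File size and rates copied from the xrdcp progress lines above.
size_mb = 760.2
rates_mb_per_s = {"copy in": 95.03, "copy out": 58.48}

# Assumption: each rate is the average over the full transfer.
durations = {label: size_mb / rate for label, rate in rates_mb_per_s.items()}

for label, seconds in durations.items():
    print(f"{label}: about {seconds:.1f} s")

# Ratio of write rate to read rate seen in this single test.
ratio = rates_mb_per_s["copy in"] / rates_mb_per_s["copy out"]
print(f"copy in was about {ratio:.2f}x the copy out rate")
```

Under that assumption the copy in took about 8 s and the copy out about 13 s. These are single measurements, so they indicate that transfers work rather than giving a benchmark.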
“Filesystem”
• Can create directories with UNIX style permissions.
• The setup is “fragile”: we frequently need to restart xrootd.
• It dies when doing “ls -l”.
xrdfs gdss541 mkdir "/atlas/?owner=10763&group=1307"
[root@gdss540 ~]# xrdfs gdss541 ls /atlas/
/atlas/ivukotic:group.test.hc.NTUP_SMWZ.root
/atlas/test
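The mkdir command above passes ownership as a query string appended to the path. As a rough illustration of how such an argument decomposes, this Python sketch splits it into the directory path and its key/value pairs. The helper name `split_opaque` is hypothetical, and reading `owner` and `group` as numeric UNIX uid and gid is our assumption; the slides do not show how the plugin itself parses the string.

```python
from urllib.parse import parse_qs, urlsplit


def split_opaque(path_with_opaque):
    """Split '/dir/?key=value&...' into the path and a dict of parameters."""
    parts = urlsplit(path_with_opaque)
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return parts.path, params


path, params = split_opaque("/atlas/?owner=10763&group=1307")

# Assumption: both values are numeric IDs (uid and gid).
owner_id = int(params["owner"])
group_id = int(params["group"])
print(path, owner_id, group_id)
```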
Direct Read
• Code from Wahid:
• git clone https://[email protected]/reps/FAX
• Wanted to try 4 tests:
• Read 10% of the file and use 30MB cache
• Read 100% of the file and use 30MB cache
• Read 10% of the file and use 100MB cache – CRASHED!
• Read 100% of the file and use 100MB cache – CRASHED!
30MB Cache            1st       2nd       3rd       Average
100%  CPU Time /s     31.13     31.13     30.5      30.92
100%  Disk IO MB/s    112.654   112.951   113.094   112.8997
10%   CPU Time /s     15.9      16.35     16.04     16.09667
10%   Disk IO MB/s    110.737   112.13    112.056   111.641
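The Average column can be checked by recomputing it from the three runs. The Python sketch below does this with the values copied from the table; the dictionary layout and key names are ours, and the final ratio is a derived figure that does not appear on the slide.

```python
from statistics import mean

# Three runs per row, copied from the 30MB cache table above.
runs = {
    ("100%", "CPU time /s"): [31.13, 31.13, 30.5],
    ("100%", "Disk IO MB/s"): [112.654, 112.951, 113.094],
    ("10%", "CPU time /s"): [15.9, 16.35, 16.04],
    ("10%", "Disk IO MB/s"): [110.737, 112.13, 112.056],
}

averages = {key: mean(values) for key, values in runs.items()}

for (fraction, metric), avg in averages.items():
    print(f"{fraction:>4}  {metric:<13} {avg:.4f}")

# Derived: CPU time of the 10% read relative to the 100% read.
cpu_ratio = averages[("10%", "CPU time /s")] / averages[("100%", "CPU time /s")]
print(f"10% read uses {cpu_ratio:.0%} of the CPU time of the 100% read")
```

The recomputed averages agree with the table. Reading 10% of the file took roughly half the CPU time of reading all of it, while the disk IO rate stayed close to 112 MB/s in both cases.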
Future plans
• 3 threads of development:
• Get simplified xrootd to work.
• Look into a GridFTP gateway: we have spoken to Brian Bockelman, who has made an equivalent for HDFS.
• Look into a WebDAV gateway: there are instructions to get started on the Ceph wiki, and we will speak to the DPM developers.
• Need to start looking at xattr.
• We have procured a Mac mini for future Calamari builds.
Summary
• We got the S3 gateway to work, but it wasn’t quite what we wanted.
• We are testing the Diamond plugin with help from CERN. We do not need all of its features.
• Question: Why do all the plugins create their own data formats?
• If we go with an object store we will have to write our own plugins, but this does not appear to be an impossible task.