Federating Grid and Cloud Storage in EUDAT
Transcript of Federating Grid and Cloud Storage in EUDAT
Shaun de Witt (STFC), Maciej Brzeźniak (PSNC), Martin Hellmich (CERN)
Federating Grid and Cloud Storage in EUDAT
International Symposium on Grids and Clouds 2014, 23-28 March 2014
Agenda
• Introduction
• …
• …
• …
• Test results
• Future work
3rd EUDAT Technical meeting in Bologna 7th February 2013
Introduction
• We present and analyze the results of Grid and Cloud Storage integration
• In EUDAT we used:
  – iRODS as the Grid storage federation mechanism
  – OpenStack Swift as a scalable object storage solution
• Scope:
  – Proof of concept
  – Pilot OpenStack Swift installation at PSNC
  – Production iRODS servers at PSNC (Poznań) and EPCC (Edinburgh)
EUDAT project introduction
• Pan-European data storage & management infrastructure
• Long-term data preservation:
  • Storage safety and availability: replication, integrity control
  • Data accessibility: visibility, the possibility to refer to data over the years
• Partners: data centres & communities
EUDAT challenges:
• Federate heterogeneous data management systems:
  • dCache, AFS, DMF, GPFS, SAM-FS
  • file systems, HSMs, file servers
  • object storage systems (!)
  while ensuring:
  • performance, scalability
  • data safety, durability, HA, fail-over
  • unique access, federation transparency
  • flexibility (rule engine)
• Implement the core services:
  • safe and long-term storage: B2SAFE
  • efficient analysis: B2STAGE
  • easy deposit & sharing: B2SHARE
  • data & metadata exploration: B2FIND
[Picture: various storage systems federated under iRODS]
EUDAT CDI domain of registered data:
Grid – Cloud storage integration
• Need to integrate Grids and Cloud/Object storage:
  • Grids get another cost-effective, scalable backend
  • many institutions and initiatives are testing object storage or already using it in production
  • most cloud storage is built on the object storage concept
  • object storage solutions have limited support for federation, which is well addressed in Grids
• In EUDAT we integrated:
  • an object storage system: OpenStack Swift
  • iRODS servers and federations
Context: Object Storage concept
• The concept enables building low-cost, scalable, efficient storage:
  • within a data centre
  • in DR / distributed configurations
• Reliability thanks to redundancy of components:
  • many cost-efficient storage servers with disk drives (12-60 HDD/SSD each)
  • typical (cheap) networking: 1/10 Gbit Ethernet
• Limitations of traditional approaches:
  • high investment and maintenance costs
  • vendor lock-in, closed architectures, limited scalability
  • slower adoption of new technologies than in the commodity market
Context: Object Storage importance
• Many institutions and initiatives (DCs, NRENs, companies, R&D projects) are testing object storage or already using it in production, including:
  • open source / private cloud:
    • OpenStack Swift
    • Ceph / RADOS
    • Sheepdog, Scality…
  • commercial:
    • Amazon S3, RackSpace Cloud Files…
    • MS Azure Blob Storage…
• Most promising open source: OpenStack Swift & Ceph
Object Storage: architectures
[Diagram: OpenStack Swift — user apps send uploads/downloads through a load balancer to several proxy nodes, which route requests to several storage nodes]
[Diagram: Ceph — apps, hosts/VMs and clients reach RADOS via librados, RadosGW, RBD or CephFS; the RADOS cluster comprises MDS, MON and OSD daemons (MDS.1…MDS.n, MON.1…MON.n, OSD.1…OSD.n)]
Object Storage: concepts
[Figures: the OpenStack Swift ring (source: The Riak Project) and Ceph's CRUSH map (source: http://www.sebastien-han.fr/blog/2012/12/07/ceph-2-speed-storage-with-crush/)]
• No metadata lookups, no metadata DB: data placement/location is computed!
• Swift ring: represents the space of all possible computed hash values divided into equivalent parts (partitions); partitions are spread across storage nodes
• Ceph CRUSH map: a list of storage devices, a failure-domain hierarchy (e.g., device, host, rack, row, room) and rules for traversing the hierarchy when storing data
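The "no metadata lookup" idea can be illustrated with a toy ring: the hash space is divided into a fixed number of partitions, each partition is assigned to a storage node, and an object's location is computed from its name alone. This is a simplified sketch under assumed parameters, not Swift's actual implementation (Swift additionally uses replica assignment, zones and a power-of-two partition count):

```python
import hashlib

# Toy ring: 2**16 partitions spread round-robin over storage nodes.
# Simplified illustration of the Swift ring, not Swift's actual code.
PART_COUNT = 2 ** 16
NODES = ["storage-1", "storage-2", "storage-3", "storage-4", "storage-5"]

# Partition -> node table, built once and shared with every proxy node.
ring = {part: NODES[part % len(NODES)] for part in range(PART_COUNT)}

def partition_for(obj_name: str) -> int:
    """Compute the partition from the object name alone -- no DB lookup."""
    digest = hashlib.md5(obj_name.encode()).hexdigest()
    return int(digest, 16) % PART_COUNT

def node_for(obj_name: str) -> str:
    """Any proxy holding the ring can locate the object deterministically."""
    return ring[partition_for(obj_name)]

# The same name always maps to the same node, on every proxy:
assert node_for("account/container/file.dat") == node_for("account/container/file.dat")
```

Because placement is pure computation over a shared table, proxies never consult a central metadata database, which is what makes both Swift and Ceph scale horizontally.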
Grid – Cloud storage integration
• Most cloud/object storage solutions expose:
  • the S3 interface
  • other native interfaces: OpenStack Swift API; Ceph: RADOS
• S3 (by Amazon) is the de facto standard in cloud storage:
  • many petabytes, global systems
  • vendors use it (e.g. Dropbox) or provide it
  • large take-up
• Similar concepts:
  • CDMI: Cloud Data Management Interface, a SNIA standard with few implementations: http://www.snia.org/cdmi
  • Nimbus.IO: https://nimbus.io
  • MS Azure Blob Storage: http://www.windowsazure.com/en-us/manage/services/storage/
  • RackSpace Cloud Files: www.rackspace.com/cloud/files/
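Part of S3's wide take-up is its simple, stateless request authorization. As a hedged sketch (classic AWS Signature Version 2, as used by S3-compatible stores at the time; the x-amz-* header canonicalization is omitted for brevity, and the credentials below are made up):

```python
import base64
import hashlib
import hmac

def s3_signature_v2(secret_key: str, verb: str, content_md5: str,
                    content_type: str, date: str, resource: str) -> str:
    """Compute a classic AWS Signature V2 for an S3-style request.

    The server repeats the same computation with its copy of the secret
    key and compares signatures -- no session state is needed.
    """
    string_to_sign = "\n".join([verb, content_md5, content_type, date, resource])
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

# Made-up example credentials and request (not real keys):
sig = s3_signature_v2("EXAMPLE_SECRET_KEY", "GET", "", "",
                      "Tue, 25 Mar 2014 12:00:00 +0000", "/mybucket/myobject")
# The client would then send: Authorization: AWS EXAMPLE_ACCESS_KEY:<sig>
```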
S3 and S3-like in commercial systems
• S3 re-sellers:
  • lots of services, including Dropbox
• Services similar to the S3 concept:
  • Nimbus.IO: https://nimbus.io
  • MS Azure Blob Storage: http://www.windowsazure.com/en-us/manage/services/storage/
  • RackSpace Cloud Files: www.rackspace.com/cloud/files/
• S3 implementations "in the hardware":
  • Xyratex
  • Amplidata
Why build PRIVATE S3-like storage?
• Features / benefits:
  • reliable storage on top of commodity hardware
  • the user remains in control of the data
  • easy scalability: possible to grow the system by adding resources and redistributing data in a non-disruptive way
• Open-source software solutions and standards available:
  • e.g. OpenStack Swift: OpenStack native API and S3 API
  • other S3-enabled storage: e.g. RADOS
  • CDMI: Cloud Data Management Interface
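Why growing the system can be non-disruptive follows from the partition scheme: objects hash to fixed partitions, so adding a node only migrates whole partitions, and only as many as are needed to rebalance. A toy sketch (an assumed simplification of ring rebalancing, not Swift's actual algorithm):

```python
# Sketch: objects hash to fixed partitions, so adding a node moves only
# whole partitions -- object names are never re-hashed (illustration only).
PART_COUNT = 64

def assign(nodes):
    """Spread partitions round-robin over the given nodes."""
    return {part: nodes[part % len(nodes)] for part in range(PART_COUNT)}

def rebalance(ring, nodes):
    """Move the minimum number of partitions onto a new node set."""
    target = PART_COUNT // len(nodes)
    counts = {n: 0 for n in nodes}
    new_ring = {}
    # Keep a partition where it is while its node is under target...
    for part, node in ring.items():
        if node in counts and counts[node] < target:
            new_ring[part] = node
            counts[node] += 1
    # ...and hand the remainder to under-filled (e.g. newly added) nodes.
    for part in ring:
        if part not in new_ring:
            node = min(counts, key=counts.get)
            new_ring[part] = node
            counts[node] += 1
    return new_ring

old = assign(["n1", "n2", "n3"])
new = rebalance(old, ["n1", "n2", "n3", "n4"])
moved = sum(1 for p in old if old[p] != new[p])
# Only about a quarter of the partitions migrate; the rest stay put,
# so the system keeps serving data while the new node fills up.
```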
Why federate iRODS with S3/OpenStack?
• Some communities have data stored in OpenStack:
  • VPH is building a reliable storage cloud on top of OpenStack Swift within the p-medicine project (together with PSNC)
• These data should be available to EUDAT:
  • data staging: Cloud -> EUDAT -> PRACE HPC and back
  • data replication: Cloud -> EUDAT -> other back-end storage
• We could apply the rule engine to data in the cloud and assign PIDs
• We were asked to consider cloud storage:
  • from EUDAT's 1st-year review report:
EUDAT’s iRODS federation
VPH case analysis:
[Diagram: S3/OSS and iRODS clients ingest and access data through iRODS servers; one iRODS server uses an S3 driver (speaking the S3/OSS APIs to the cloud storage system), others use different storage drivers; at registration, EUDAT's PID service assigns PIDs; data is staged to an HPC system and replicated to other back-end storage]
Our 7.2 project
• Purpose:
  • to examine the existing iRODS-S3 driver
  • (possibly) to improve it / provide another one
• Steps / status:
  • 1st stage:
    • play with what is there: done for OpenStack/S3 + iRODS
    • examine functionality
    • evaluate scalability: found some issues already
  • Follow-up:
    • try to improve the existing S3 driver (functionality, performance)
    • implement a native OpenStack driver?
    • get in touch with the iRODS developers
iRODS-OpenStack tests
TEST SETUP:
• iRODS server:
  • cloud as a compound resource
  • disk cache in front of it
• OpenStack Swift:
  • 3 proxies, 1 with S3
  • 5 storage nodes
  • extensive functionality and performance tests
• Amazon S3:
  • only limited functionality tests
[Diagram: iRODS server(s) talking to OpenStack Swift via the S3/OpenStack APIs and to Amazon S3 via the S3 API]
iRODS-OpenStack tests
TEST RESULTS:
• S3 vs native OSS overhead:
  • upload: ~0%
  • download: ~8%
• iRODS overhead:
  • upload: ~19%
  • download:
    • from compound S3: ~0%
    • cached: 230% SPEEDUP (cache resources faster than S3)
iRODS-OpenStack tests
Conclusions and future plans:
• Conclusions:
  • performance-wise, iRODS does not add much overhead for files <2 GB
  • problems arise for files >2 GB: no support for multipart upload in the iRODS-S3 driver, which prevents iRODS from storing files >2 GB in clouds
  • some functional limits (e.g. the imv problem)
  • using iRODS to federate S3 clouds at large scale would require improving the existing driver or developing a new one
• Future plans:
  • test the integration with VPH's cloud using the existing driver
  • ask SAF to support the driver development
  • get in touch with the iRODS developers to ensure the sustainability of our work
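The >2 GB limitation stems from pushing each file as a single S3 PUT; S3's multipart upload avoids it by splitting the object into independently uploaded parts (each at least 5 MiB except the last, at most 10,000 parts) that the server reassembles. A hedged sketch of the part-splitting arithmetic an improved driver would need (illustration only, not actual driver code):

```python
# Part-boundary arithmetic for an S3 multipart upload (sketch).
MIN_PART = 5 * 1024 * 1024     # S3 minimum part size (except the last part)
MAX_PARTS = 10_000             # S3 limit on parts per upload

def part_ranges(file_size: int, part_size: int = 64 * 1024 * 1024):
    """Yield (part_number, offset, length) tuples covering the file."""
    if part_size < MIN_PART:
        raise ValueError("part size below S3's 5 MiB minimum")
    if (file_size + part_size - 1) // part_size > MAX_PARTS:
        raise ValueError("file would need more than 10,000 parts")
    number, offset = 1, 0
    while offset < file_size:
        length = min(part_size, file_size - offset)
        yield number, offset, length
        number += 1
        offset += length

# A 3 GiB file -- too big for the old driver's single PUT -- becomes
# 48 parts of 64 MiB, each small enough to upload (and retry) separately.
parts = list(part_ranges(3 * 1024 ** 3))
```

Each part could then be PUT with its own request, and a final "complete multipart upload" call would stitch the parts together server-side, which also sidesteps any 32-bit size limits in the transfer path.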
EUDAT’s iRODS federation
Object storage on top of iRODS?
[Diagram: an S3 driver exposes an S3 API on top of an iRODS server; S3/OSS clients ingest and access data through it, while iRODS clients use the iRODS API; further iRODS servers with other storage drivers federate additional storage systems]
Problems:
• Data organisation mapping: filesystem vs objects; big files vs fragments
• Identity mapping: S3 keys/accounts vs X.509?
• Out of scope of EUDAT? A lot of work would be needed