REST VOL for HSDS and HDF Sharded Data Storage · 5 • Convenience of having one file vs lots of...
Transcript of REST VOL for HSDS and HDF Sharded Data Storage · 5 • Convenience of having one file vs lots of...
-
Proprietary and Confidential. Copyright 2018, The HDF Group.
REST VOL for HSDS and HDF Sharded Data Storage
John Readey
-
2
• Sharded Data Storage• REST VOL
• Direct Access
Overview
-
3
Instead of managing HDF5 objects (datasets, groups, chunks) within a POSIX file store them as separate files (or as objects within an object storage system such as S3)
Sharded data concept
For meta data (datasets and groups), a self-descriptive format such as JSON can be used
For chunks, store as binary objects for efficiency
-
4
• Limit maximum size of any object
• -> Object storage systems typically don’t support partial writes, so large objects
are inefficient to update
• Supporting parallelism is easier
• -> no file locking needed
• No need to manage free space, key-value mappings, etc
• -> storage systems have gotten pretty good at doing this for you
• No need to worry about system crash leaving you with a corrupted HDF5 files
• -> worse case you lose one object, with object storage not even that
Why a sharded data format?
-
5
• Convenience of having one file vs lots of small files
• Maybe not as important given tooling to abstract this from the user
• E.g. hstouch, hscopy, hsrm tools
• Filesystems have trouble dealing with large number of files (particularly within one directory)
• Not sure this is a problem with modern Linux filesystems
• Certainly not an issue with object storage systems
Case against sharded storage
-
6HSDS shard schema example
root_obj_id/group.jsonobj1_id/
group.jsonobj2_id/
dataset.json0_00_1
obj3_id/dataset.json0_0_20_0_3
Observations:• Metadata is stored as JSON• Chunk data stored as binary blobs• Self-explanatory• One HDF5 file can translate to lots of
objects• Flat hierarchy – supports HDF5
multilinking• Can limit maximum size of an object• Can be used with Posix or object
storageSchema is documented here: https://github.com/HDFGroup/hsds/blob/master/docs/design/obj_store_schema/obj_store_schema_v2.md
https://github.com/HDFGroup/hsds/blob/master/docs/design/obj_store_schema/obj_store_schema_v2.md
-
7
• It can be useful to divvy up the objects within an HDF5 domain into roughly equal size collections – for instance we have n workers and we’d like to perform some
action on the domain
• CRUSH algorithm approach: hash key and take modulo of number of workers
• Decentralized, no book keeping required
Storage Partitioning
-
8
• The tool “hsload” will convert an HDF5 file to the sharded format (using HSDS)• Conversely, the “hsget” tool will take the shaded format and reconstruct the HDF5
file
• Data is preserved after a round trip
Conversion from HDF5 files to sharded files
-
9HDF5 file linking
• Converting large HDF5 files (or a large collection of files) to the sharded format is time consuming and effectively doubles the storage requirements
• Rather than converting the entire file to the HDF Schema, just the metadata can be imported (typically
-
10REST VOL Plugin
• The HDF5 VOL architecture is a plugin layer for HDF5• Public API stays the same, but different back ends can be implemented
• REST VOL substitutes REST API requests for file i/o actions
• C/Fortran applications should be able to run with minor tweaks• Downloadable from: https://github.com/HDFGroup/vol-rest
For HDF5 1.12, use the hdf5_1_12_update branch
https://github.com/HDFGroup/vol-rest
-
11Features not yet supported
• *Need new VOL api?
Feature HSDS h5pyd RESTVOLObject reference J J LRegion Reference L L LFill Value L J LVirtual Datasets L L LVariable Length Datatypes J J LDataset SQL Query* J J L
-
12REST VOL Wishlist
Interesting things that would be nice to have:• Support multi-threaded clients
• Support asynchronous API
• REST VOL activation based on file prefix – e.g. “hdf5://”• Paginated read/write for large dataset operations
• Retry logic for HTTP timeouts, Service Unavailable• Support for other languages: Java, .Net, R, etc.
-
13Pros and cons of running a service
• Accessing a sharded data store via a service (HSDS) is nice:• Server mediates access to the storage system
• Server can speed things up by caching recently accessed data
• Only the data the client needs needs to be transmitted outside the data center• HSDS running on a large server or cluster can provide more processing capacity
than a client might have • Unless it’s not:
• Don’t want to bother setting up, running service
• Challenge to scale capacity of service to clients
-
14Direct Access Project
Provide equivalent functionality of HSDS in a library • SN code would run in a sub-process• DN code would run in one or more sub-processes (e.g.
based on number of cores) • Sub-processes would directly access storage system• Communication between parent processes and sub-
processes would be http via localhost• Sub-processes shutdown when last file is closed• The same HSDS storage schema would be used• Can switch between direct access and server as
needed
-
15Diect Access System Diagram
-
16Direct Access VOL plugin
• For C/C++ apps, the direct access model could be implemented as a VOL connector• Other than launching the sub-processes the VOL would work in the same way as the
REST VOL, so it probably makes sense to include this functionality in the REST VOL
rather than create a new VOL• With direct access HDF5 lib + REST VOL enables sharded data as an alternative to
the HDF5 file format• Enables multi-threading
• Cloud optimized storage
• Crash-proof
-
17Questions?
-
18Try it out!
Get the software here:
• HSDS: https://github.com/HDFGroup/hsds• H5pyd: https://github.com/HDFGroup/h5pyd• REST VOL: https://github.com/HDFGroup/vol-rest• REST API documentation:
https://github.com/HDFGroup/hdf-rest-api• Example programs:
https://github.com/HDFGroup/hdflab_examples
https://github.com/HDFGroup/hsdshttps://github.com/HDFGroup/h5pydhttps://github.com/HDFGroup/vol-resthttps://github.com/HDFGroup/hdflab_examples