storage for tiers space for...

34
Campaign Storage storage for tiers space for everything Peter Braam Co-founder & CEO Campaign Storage, LLC 2017-05 campaignstorage.com

Transcript of storage for tiers space for...

Page 1: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Campaign Storage

storage for tiersspace for everything

Peter Braam

Co-founder & CEO

Campaign Storage, LLC

2017-05

campaignstorage.com

Page 2: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Contents

• Brief overview of the system• Creating and updating policy databases• Data management API’s

5/7/17 Campaign Storage LLC 2

Page 3: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Thank you

The reviewers of our paper asked quite a few insightful questions

Thank you.

5/7/17 Campaign Storage LLC 3

Page 4: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Campaign StorageInvented at LANLBeing productized at Campaign Storage

5/7/17 Campaign Storage LLC 4

Page 5: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

CPU or GPU packages

CPU cores

HighBandwidth

Memory

NVRAMe.g. XPOINT,

PCM, STTRAM

RAM

FLASH DISK

BW Cost $/ (GB/s) $10 (CPU included!) $10 $200 $2K $30K

Capacity Cost $/GB $ $8 $0.3 $0.05 $0.01

Node BW (GB/sec) 1 TB/s 100 GB/s 20 GB/s 5 GB/s

Cluster BW (TB/sec)

1 PB/s 100 TB/s 5 TB/s 100 GB/s 10’s GB/s

Software Language level Language levelHDF5 / DAOS

DDN IMECray Data Warp

Parallel FSCampaign Storage

ArchiveCampaign

5/7/17 Campaign Storage LLC 5

TAPE

Page 6: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Campaign Storage - a new tier

5/7/17 Campaign Storage LLC 6

Parallel File System

Archive

Parallel File System

Archive

Burst Buffer

Campaign Storage

Cloud

decreasingemphasis

High BW, high $$$Decreasing capacities

Old World New World

TB/sec

10 GB/sec

100 GB/seclarge, reliable

archive stage

Page 7: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

HPC Cluster A

Simulation Cluster20PF

Burst Buffer5 PB & 5 TB/s

HPCD & Viz Cluster HDFS

HPC Cluster B

HPCDCluster20PF

Burst Buffer5 PB & 5 TB/s

Lust

reFS

1 T

B/s

Campaign Storage

Campaign StorageMover Nodes

Campaign StorageMetadata Repository

Campaign StorageMover Nodes

Parallel Staging & Archiving

Campaign StorageObject Repository

Search & Data Management

Campaign Storage

5/7/17 Campaign Storage LLC 7

File System Interface

Optional other tools:• Policy managers (e.g. Robinhood)• Workflow managers (e.g. Irods)

customer infrastructure

Page 8: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Campaign Storage

It is …A file system - staging and archivingBuilt from low cost HW but:

• Industry standard object stores• Existing metadata stores

High integrityHigh capacity, ultra scalableNot highest BW or lowest latency

• 10-100x higher than archives• 10x lower than PFS

It is not …General purpose file system

• Wait … these don’t exist actually

Using object stores has problems• Data mover support takes effort• We will ease that pain

5/7/17 Campaign Storage LLC 8

Page 9: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Implementation - modules

5/7/17 Campaign Storage LLC 9

OS with VFS and Fuse

MarFS

Object Storage Metadata FS

Data Movers

HSM – Lustre / DMAPIMPI

Enterprise NASgridftp

Management

Analytics & SearchMigrationContainers

Page 10: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Campaign Storage - deployment

5/7/17 Campaign Storage LLC 10

Campaign Storage

Campaign StorageMover Nodes

Campaign Storage

MetadataRepository

Campaign StorageMover Nodes

Campaign StorageObject

Repository

Search & Data Management

Object Repository

Disk object stores- Commercial & OSS

Archival object stores- Black Pearl

Full POSIX objects- Stored in metadata FS

Metadata Repository

Some nearly POSIX distributed FS with EA’s• Lustre / ZFS• GPFS

Move & Manage

Nodes: 1-100’s - Mount MarFS & other FS

Mover software- Software on mover node

Management- Search analytics in MarFS- 3rd party movement- Containers

deploy

Page 11: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Policy Databases

5/7/17 Campaign Storage LLC 11

Page 12: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Traditional approach

Database with a record for each fileFound in HPSS, Robinhood, DMF etc

Used forUnderstanding what is in the file system

which files are old, recent, big, belong to group, on deviceAssist in automatic (“policy”) or manual data management Typically histogram ranges are computed from search results

5/7/17 Campaign Storage LLC 12

Page 13: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Challenges

ChallengesPerformance – both ingest and queries

queries on 100M file database can take minutesScalability

Requires significant RAM (e.g. 30% of DB size)Handling more than 1B files is very difficult presently

Never 100% in syncAdds load to premium storage

5/7/17 Campaign Storage LLC 13

Page 14: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Approaches

Horizontally scaling key value storeLANL is exploring this

A variety of proprietary approaches – e.g. Komprise

Histogram analyticsMaintaining aggregate data has it own challenges:

e.g. How to measure the change in size of a fileVery few changelogs record old size

5/7/17 Campaign Storage LLC 14

Page 15: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Analytics - subtree search

Every directory has histogram recording properties of its subtree• encode: #files, #bytes in subtree have a property?• Limited granularity, limited relational algebra

• Store perhaps ~100,000 properties per directory

Examples:• Quota in subtree? User/group database for subtree?• What fileservers contain files?• Geospatial information in file?• (file type, size, access time) tuples

• Allows limited relational algebraNot a new idea. Can be added to ZFS & Lustre

5/7/17 Campaign Storage LLC 15

Page 16: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

5/7/17 Campaign Storage LLC 16

Include e.g. linked list of subdirectories and database of parents of files link count > 1

Page 17: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Iterate over subdirectories

5/7/17 Campaign Storage LLC 17

HistoDB

prv nxt

Subdir 1

dir

Subdir 3

HistoDB

prv nxt

Subdir 2

HistoDB

prv nxt

HistoDB

Page 18: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Key properties

Generate initially from a scan, then update with changelogsmathematically provehisto(changelog ○ FS1) = histo_update(changelog) + histo(FS1)

Additive property: histograms can be added, either increase count or add new barshisto(dir) = sum histo(subdirs) + contributions(files in dir)this is Merkl tree property – graft subtrees with simple addition

Keep 100% consistent with snapshotsSpace consumption on par with policy database with 100K histogram buckets

5/7/17 Campaign Storage LLC 18

Page 19: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Inserting subtrees

5/7/17 Campaign Storage LLC 19

HistoDB

subtree

/

/a

/a/b

HistoDB

subtree

HistoDB

HistoDB

HistoDB

+

+

+

=

Page 20: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Evaluation

A single histogram lookup may provide the overview that a policy search provided

But

A histogram approach may has insufficient data for efficient general searches. Adapting histograms can be costly – how common is this?

5/7/17 Campaign Storage LLC 20

Page 21: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Missing Storage API’s

5/7/17 Campaign Storage LLC 21

Page 22: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Reflect on Storage Software

Since 1980’s a utility has been added “afs” “bfs” “cfs” … “zfs”implements a set of non-standardized featuresfile sets, data layout, ACL’s

ACL’s and extended attributes became part of POSIX in 2000’s

Storage software almost always centers around batch data operations:caches do this inside the OSutilities do this – rsync, zip, cloud software does this – dropboxcontainers do this - Docker

5/7/17 Campaign Storage LLC 22

Page 23: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Lack of standardized API’s

Unnecessarily complicated software

Not portable, locked in to a platform

5/7/17 Campaign Storage LLC 23

Page 24: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Example - data movement across many files

• Objective store batches of files• New concept: file level I/O vectorization

• Includes server driven ordering• Packing small files into one object• Cache flushes

5/7/17 Campaign Storage LLC 24

int copy_file_range(copy_range *r, uint count, int flags)

struct copy_range {int source_fd;int dest_fd;off_t source_offset;off_t dest_offset;size_t length;

}

Page 25: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Extending the API - alternatives

In some areas concepts must be defineddata layoutsub-sets and subtrees of file systems (very similar to “mount”)

DB world solved this problem – SQL as a domain specific languageA file level data management solution could build on:

asynchronous data and metadata API’sbatch / transaction boundariesintelligent processing

Possibly a better approach than more API callsevidence is seen in SQFSCK

New problems will keep appearing, e.g. doing this in clusters

5/7/17 Campaign Storage LLC 25

Page 26: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Thank you

5/7/17 Campaign Storage LLC 26

Page 27: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Metadata Movement

5/7/17 Campaign Storage LLC 27

Page 28: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Batch metadata handling

Well studied problem, not easily productized

Several sides to the problem1. scale out the server side – data layout2. bulk communication

in many cases this utilizes replay of operations3. tree requires linking subtrees and subsets

Conflicting demands between latency & throughput

5/7/17 Campaign Storage LLC 28

Page 29: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Role of containers

Fundamentally Unlikelydifferent tiers perform data movement at similar granularity

Containers are a must-have

5/7/17 Campaign Storage LLC 29

Page 30: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Example Container Functionality

5/7/17 Campaign Storage LLC 30

Layer 1

Base layer ZFS file system

ZFS snapshot

ZFS clone ZFS snapshot

ZFS cloneContainer layer Serializeddifferential

Serializeddifferential

Analytics differential

Containeranalytics

ZFS Pool

Application interface Implementation Slower tier interface

Page 31: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Containers as distributed namespace

Requires being able to locate the containerLocation database: a subtree resides on a node

Performance will scale well as long as containers can be large enoughFragmented vs. co-located metadataLocal node performance x #nodes

Related to STT trees, not identical. CMU published a series of papers on this.

5/7/17 Campaign Storage LLC 31

Page 32: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Other approaches / key unsolved problems

Other approaches:Peer to peer metadata protocolsLANL scaled them to 1B file creates / sec (in an experiment)

Allow conflicts

Distributed namespace consistencyAn “epoch” approach tracking dependent updates should workThere is little understanding of fragmented vs contiguous MD

5/7/17 Campaign Storage LLC 32

Page 33: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Conclusions

Campaign Storage: bulk data store, archive – focus on data movement

Massive data handling at file level is importantAmazon introduced S3FS, Dropbox and Gdrive rule

Search, batch metadata movement key ingredients

Richer API’s or a DSL could create a better eco system

campaignstorage.com

5/7/17 Campaign Storage LLC 33

Page 34: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder

Thank you

5/7/17 Campaign Storage LLC 34