Syndicate: Software-defined Wide-area Storage Jude Nelson Princeton University

Page 1: Syndicate: Software-defined Wide-area Storage · Syndicate: Software-defined Wide-area Storage. Jude Nelson Princeton University. Background

Syndicate: Software-defined Wide-area Storage

Jude Nelson
Princeton University

Background

CCI*DIBBS NSF #1541318
Princeton University + University of Arizona
− OpenCloud + CyVerse (iPlant)

Next-generation storage system
− Coming online this year
− Seeking community input and advice

Outline

Problem Formulation
What is Syndicate?
Sample Applications
UI/UX
Status

The Good: Lots of Data Sources!

[Figure labels: My Site; University; Corporate Lab; Legacy Data Stores (×2); Cloud Storage; CDNs + Bulk xfer; Public Datasets]

The Bad: Lots of Data Flows

The Ugly: Storage Reintegration

[Figure labels: Driver (×6); Workflow logic]

Drivers are only the beginning...
− Consistency
− Confidentiality
− Formatting
− Fault tolerance
− Access control
− Retention
− Authentication
− ...etc...

Each workflow implements a built-in bespoke storage system!

Prior Work

iRODS
− Intra-site programmable storage

Parrot Virtual FS
− Driver layer for legacy services

CernVM FS
− Wide-area
− End-to-end guarantees
− Read-only

Syndicate: Programmable Storage

[Figure labels: Driver (×6); Workflow (×3); Composable, reusable storage programs; Stable API; Workflow-specific I/O pipeline]

Why Syndicate?

Spans multiple sites and services
− End-to-end authenticity
− End-to-end correctness
− No central points of trust

Minimizes operational costs
− Isolates, composes reusable storage logic
− Reprogrammable fabric → Immutable workflows
− Self-managing (SDN-like)
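
One common way to get end-to-end authenticity across untrusted intermediaries is for the reader to check every chunk against a hash list obtained from a trusted source. The sketch below illustrates that general idea only; the slides do not say this is how Syndicate does it, and `make_manifest` and `read_verified` are hypothetical names.

```python
import hashlib
from typing import Dict, List


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_manifest(chunks: List[bytes]) -> List[str]:
    """Writer side: record one hash per chunk (hypothetical manifest format)."""
    return [sha256(c) for c in chunks]


def read_verified(manifest: List[str], cache: Dict[int, bytes]) -> bytes:
    """Reader side: accept a cached chunk only if its hash matches the manifest."""
    out = []
    for i, expected in enumerate(manifest):
        chunk = cache[i]  # fetched from an untrusted cache or CDN
        if sha256(chunk) != expected:
            raise ValueError(f"chunk {i} failed verification")
        out.append(chunk)
    return b"".join(out)


chunks = [b"hello ", b"world"]
manifest = make_manifest(chunks)

good = read_verified(manifest, {0: b"hello ", 1: b"world"})

try:
    read_verified(manifest, {0: b"hello ", 1: b"w0rld"})  # tampered chunk
    tamper_detected = False
except ValueError:
    tamper_detected = True
```

With this shape, the cache never needs to be trusted: only the manifest has to come from an authenticated source.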

Syndicate Programming Model

Storage Programs
− UNIX-y data plane
− I/O flow: typed byte stream
− Composition: 1-to-1, 1-to-N, N-to-1

Gateways
− A storage program’s “process”
− Stable workflow interface

Syndicate
− The “shell” for gateways
−

[Figure labels: Workflow; Cloud Storage; Dataset]
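
The three composition modes on this slide can be illustrated with a minimal Python sketch. This is not Syndicate's API: `Chunk`, `pipe`, `fan_out`, and `fan_in` are hypothetical names, and a typed byte stream is modelled simply as an iterator of (type tag, bytes) records.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass(frozen=True)
class Chunk:
    """One record of a typed byte stream (hypothetical model)."""
    kind: str   # type tag, e.g. "raw" or "upper"
    data: bytes


Stage = Callable[[Iterator[Chunk]], Iterator[Chunk]]


def upper(stream: Iterator[Chunk]) -> Iterator[Chunk]:
    for c in stream:
        yield Chunk("upper", c.data.upper())


def reverse(stream: Iterator[Chunk]) -> Iterator[Chunk]:
    for c in stream:
        yield Chunk(c.kind, c.data[::-1])


def pipe(*stages: Stage) -> Stage:
    """1-to-1: chain stages like a UNIX pipeline."""
    def run(stream: Iterator[Chunk]) -> Iterator[Chunk]:
        for stage in stages:
            stream = stage(stream)
        return stream
    return run


def fan_out(stream: Iterator[Chunk], stages: List[Stage]) -> List[List[Chunk]]:
    """1-to-N: feed the same input to every stage."""
    chunks = list(stream)  # buffer so each stage sees the full input
    return [list(stage(iter(chunks))) for stage in stages]


def fan_in(streams: Iterable[Iterable[Chunk]]) -> Iterator[Chunk]:
    """N-to-1: concatenate several streams into one."""
    for s in streams:
        yield from s


source = [Chunk("raw", b"abc"), Chunk("raw", b"def")]
piped = list(pipe(upper, reverse)(iter(source)))
branches = fan_out(iter(source), [upper, reverse])
merged = list(fan_in(branches))
```

Generators keep each stage lazy, which is roughly what a UNIX pipe gives you; `fan_out` buffers its input only because a Python iterator can be consumed once.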

Syndicate Usage

Volume
− Tagged filesystem abstraction
− Set of cooperating gateways
− Workflow-specific data-plane behavior

Users
− Own, control, and run gateways
− Volume owner: controls admission
−
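
The relationship between volumes, gateways, and owners can be sketched as a small data model. The class and field names here are assumptions made for illustration, not Syndicate's actual types; the only rule taken from the slide is that the volume owner controls admission.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Gateway:
    name: str
    owner: str  # the user who owns, controls, and runs this gateway


@dataclass
class Volume:
    name: str
    owner: str
    gateways: Dict[str, Gateway] = field(default_factory=dict)

    def admit(self, requester: str, gateway: Gateway) -> None:
        """Add a gateway to the volume; only the volume owner may do this."""
        if requester != self.owner:
            raise PermissionError(f"{requester} does not own volume {self.name}")
        self.gateways[gateway.name] = gateway


vol = Volume(name="genomes", owner="alice")
vol.admit("alice", Gateway(name="bob-laptop", owner="bob"))

try:
    vol.admit("bob", Gateway(name="rogue", owner="bob"))
    rejected = False
except PermissionError:
    rejected = True
```

Note that the gateway's owner and the volume's owner are separate fields: a user can run a gateway in a volume they do not own, once admitted.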

Real-world Volume (1)

[Figure labels: Hadoop; iRODS; CDN; Gateway (×4); Indexing; Encrypted writes; HDFS interface; RESTful interface; Stage data; Encrypted reads]

Real-world Volume (2)

[Figure labels: Laptop; VM (×4); Amazon S3; GenBank; Akamai CDN; Coordinate writes; R/W; R/O; Cached chunks; CyVerse Atmosphere; OpenCloud]

Spanning Multiple Networks

Global control plane
− Membership; configuration; I/O pipeline construction
− Metadata Service (MS) in Google AppEngine

Blockstack (USENIX ATC 2016)
− Public LDAP-like DB
− Control plane trust anchor
− All nodes independently construct the same DB
− DB journal embedded in a PoW blockchain

No central points of trust!
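
The idea that every node builds the same DB from a shared journal can be sketched as deterministic replay. The operation format and the rules below (first registration wins, updates require an existing name) are illustrative assumptions, not Blockstack's actual protocol.

```python
import hashlib
import json
from typing import Dict, List, Tuple

Op = Tuple[str, str, str]  # (operation, name, value); hypothetical format


def replay(journal: List[Op]) -> Dict[str, str]:
    """Apply journal entries in order using fixed, deterministic rules."""
    db: Dict[str, str] = {}
    for op, name, value in journal:
        if op == "register" and name not in db:
            db[name] = value  # first registration wins
        elif op == "update" and name in db:
            db[name] = value
        # anything else is ignored, identically on every node
    return db


def fingerprint(db: Dict[str, str]) -> str:
    """Hash a canonical encoding so nodes can compare their DB state."""
    return hashlib.sha256(json.dumps(db, sort_keys=True).encode()).hexdigest()


journal = [
    ("register", "alice", "key-A"),
    ("register", "alice", "key-X"),  # ignored: name already taken
    ("update", "alice", "key-B"),
    ("update", "bob", "key-C"),      # ignored: bob was never registered
]

node1 = replay(journal)
node2 = replay(list(journal))  # a second node reading the same journal
```

Because the journal order is fixed and the rules are deterministic, the nodes agree on the result without consulting each other or a central server.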

User Experience

1) PI makes user accounts
2) Users make volumes
3) Volume owners make and assign gateways
4) Users point client at volume owners

− Client looks up volume owners in Blockstack
− Client discovers accessible volumes
− Client configures and runs gateways
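
The three client steps can be sketched with in-memory stand-ins. Everything here is hypothetical: `OWNER_RECORDS` plays the role of Blockstack, `VOLUMES_BY_OWNER` plays the role of the Metadata Service, the URL is a placeholder, and "running" a gateway is reduced to returning a description string.

```python
from typing import Dict, List

# Hypothetical stand-in for Blockstack: owner name -> owner record.
OWNER_RECORDS: Dict[str, Dict[str, str]] = {
    "alice": {"ms_url": "https://ms.example.org"},
}

# Hypothetical stand-in for the Metadata Service: owner -> volumes.
VOLUMES_BY_OWNER: Dict[str, List[dict]] = {
    "alice": [
        {"name": "genomes", "members": ["bob", "carol"]},
        {"name": "private", "members": ["alice"]},
    ],
}


def lookup_owner(owner: str) -> Dict[str, str]:
    """Step 1: look up a volume owner."""
    return OWNER_RECORDS[owner]


def discover_volumes(owner: str, user: str) -> List[str]:
    """Step 2: list the owner's volumes that this user may access."""
    return [v["name"] for v in VOLUMES_BY_OWNER.get(owner, [])
            if user in v["members"]]


def start_gateways(user: str, owners: List[str]) -> List[str]:
    """Step 3: configure and 'run' one gateway per accessible volume."""
    started = []
    for owner in owners:
        record = lookup_owner(owner)
        for volume in discover_volumes(owner, user):
            started.append(f"{user}@{volume} via {record['ms_url']}")
    return started


running = start_gateways("bob", ["alice"])
```

The user supplies only the list of volume owners; everything else is discovered.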

Operator Experience

1) Bake Syndicate into VM images
2) Run site-local Blockstack server
3) Run Syndicate MS in Google AppEngine
4)
5)
6)
7) (optional) Run gateways on users’ behalf

MS is untrusted
− Helps gateway discovery
− Authentication through Blockstack

System Status

Driver support
− Amazon S3, Google Drive, Box.net, Dropbox, …
− GenBank, M-Lab, iRODS, local disk, …
− FUSE, Node.js, HDFS, shell programs, …

Blockstack in production since 2015
− https://github.com/blockstack

Syndicate is alpha
− Usable, with quirks
− https://github.com/syndicate-storage

Thank you!

Questions?