Openstack Swift - IBM Research People and Projects Swift Object Store Cloud ... OpenStack Rackspace...

Page 1:

© 2012 IBM Corporation

OpenStack Swift

Object Store Cloud, built from the ground up

David Hadas

Swift ATC

HRL

[email protected]

Page 2:

Object Store Cloud Services

PUT/GET/DELETE

Expectations:

• Huge Capacity (Scale)

• Always Available + Safe

• High Performance

• More PUTs than GETs

• Low Cost per TB

Page 3:

High Level: Object Store vs. File System

Object Store

• Interface:

– Web based: GET/PUT/DELETE

– RESTful: Stateless

– Metadata

• Synchronization

– Eventual Consistency

– No Distributed Locking

• Software Defined Storage

– Commodity hardware

– Designed to Fault but never Fail

– Built to auto-recover by design

• Features

– Basic services that scale (KISS)

– SW extendible with web interfaces

File System

• Interface:

– POSIX: Open, Seek/Read/Write, Close

– Stateful

• Synchronization

– Always consistent

– Uses Distributed Locking

• Hardware and Software

– Best of breed hardware

– Designed not to fault

– Admin controlled recovery

• Features

– Abundant enterprise features built into the products

Page 4:

Object Store Cloud Options

• Amazon S3

– Proprietary API

• Rackspace Cloud Files (Swift)

– Swift API

• Microsoft Blob Storage

– Proprietary API

• Google Cloud Storage

– Proprietary API

• HP Cloud Object Store (Swift)

– Swift API

• OpenStack Object Services

– Written in Python

- Apache license

– Interfaces:

- Native Swift 1.0

- S3, CDMI interfaces

Page 5:

Historical Perspective

[Figure: timeline, 2006 and 2008 to 2013]

• 2006: Amazon S3 launched

• S3 holds 14 billion objects

• Rackspace Cloud Files V1: distributed storage, centralized metadata; hard to scale while growing rapidly

• Rackspace designs ‘Swift’

• Rackspace Cloud Files V2 (‘Swift’): distributed storage and metadata

• Rackspace deployment of Swift: 3 data centers; migrated all customers

• OpenStack: Rackspace contributes Swift, NASA contributes Nova

• OpenStack releases: Austin, Bexar, Cactus, Diablo, Essex, Folsom, Grizzly

• OpenStack Foundation

Page 6:

OpenStack - The Cloud Operating System

• 550K lines of code

• 300K downloads

• 550 developers

• 7000 individuals from 100 countries

• 850 organizations

• $10M funding

• 3000 participants in the last summit

Page 7:

Swift

• A massively scalable redundant storage system

– Replicates objects to disk drives spread as far apart as possible

• Scales horizontally (adding new servers)

– Known to work with

- ‘Thousands of servers’

- ‘Petabytes of data’

• Designed to contain a high rate of failures

– Inexpensive commodity hard drives and servers can be used

– Maintains replication level following faults

• Open source software

Page 8:

Swift General Layout

• Swift Cluster

– Proxy Nodes

- Smart

– Storage Nodes

- Simple

– (Can be the same node)

• Object GET/PUT

– Web Clients use HTTP

– Approach a Proxy Node

– The Proxy node maps to storage devices

- Located on Storage Nodes

– Request is propagated to Storage Node

• A Cut-through technology

– Proxy and Storage Nodes never store the entire object in memory

[Figure: a client PUT arrives at the Proxy Nodes, which forward it to the Storage Nodes]
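The cut-through idea can be illustrated with a minimal Python sketch. It is not Swift's proxy code: the function name `stream_copy`, the 64 KiB chunk size and the in-memory sinks standing in for Storage Nodes are all assumptions made for illustration. The point is only that each chunk is forwarded as soon as it is read, so the whole object is never buffered.

```python
import io

CHUNK_SIZE = 64 * 1024  # assumed chunk size, not taken from the slides


def stream_copy(source, sinks, chunk_size=CHUNK_SIZE):
    """Forward `source` to every sink chunk by chunk; return bytes forwarded."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # end of the incoming body
            break
        for sink in sinks:  # one sink per replica
            sink.write(chunk)
        total += len(chunk)
    return total


# Hypothetical stand-ins: the client body and three Storage Node connections.
body = io.BytesIO(b"x" * 200_000)
replicas = [io.BytesIO() for _ in range(3)]
sent = stream_copy(body, replicas)
```

At any moment the function holds at most one chunk in memory, regardless of the object size.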

Page 9:

Swift API Examples

• GET MyAccount/MyContainer/MyObject

– Responder provides the object in the body + metadata in HTTP headers

• PUT MyAccount/MyContainer

– Requester indicates container metadata in HTTP headers => Responder indicates 201

• POST MyAccount/MyContainer

– Requester indicates container metadata in HTTP headers => Responder indicates 204

• PUT MyAccount/MyContainer/MyObject

– Requester indicates object content in the message body + metadata in HTTP headers

– Responder indicates 201

• GET MyAccount/MyContainer

– Responder provides the list of objects in the body + metadata in HTTP headers

• DELETE MyAccount/MyContainer/MyObject

– Responder indicates 204
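The requests listed above can be sketched with Python's standard library. Nothing is sent over the network here; the requests are only constructed. The endpoint `http://swift.example.com/v1` and the helper `swift_request` are hypothetical, and a real deployment would also require an authentication token header, which is omitted.

```python
from urllib.request import Request

ENDPOINT = "http://swift.example.com/v1"  # hypothetical endpoint


def swift_request(method, account, container=None, obj=None,
                  body=None, headers=None):
    """Build (but do not send) a request for account/container/object."""
    parts = [p for p in (account, container, obj) if p]
    url = ENDPOINT + "/" + "/".join(parts)
    return Request(url, data=body, headers=headers or {}, method=method)


# Create a container, with container metadata carried in headers.
create_container = swift_request(
    "PUT", "MyAccount", "MyContainer",
    headers={"X-Container-Meta-Owner": "alice"})

# Upload an object: content in the body, metadata in headers.
upload = swift_request(
    "PUT", "MyAccount", "MyContainer", "MyObject",
    body=b"hello", headers={"X-Object-Meta-Color": "blue"})

# List the container, then delete the object.
listing = swift_request("GET", "MyAccount", "MyContainer")
delete = swift_request("DELETE", "MyAccount", "MyContainer", "MyObject")
```

Passing any of these to `urllib.request.urlopen` would issue the call against a real cluster.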

Page 10:

Principles of Operation - Mapping

• Mapping must allow us to stay:

– Distributed

– Scalable

– Load Balanced

– Highly Available

• Fortunately, “All problems in computer science can be solved by another level of indirection” (David Wheeler)

• Double mapping to allow virtualization of disk space

– Objects are stored in virtual partitions

– Partitions are stored in “devices”

- Typically a device is a local hard drive in a server

- Device can be anything

- Allowing another level of indirection

Page 11:

Mapping Implementation

• Consistent Hashing concepts

– Only about K/n keys are remapped when a device is added or removed (K keys, n devices)

– Evenly distributed load during changes

• Mapping of Objects to Partitions

– Partition determined by MD5(Fully Qualified Object Name)

– Uses the Partition-Power highest bits of the MD5 result

– Can elastically grow and shrink

• Mapping Partitions to devices

– Using a ‘Ring’ which maps each partition to D devices (i.e. D replicas)

– The Ring is consistent, allowing addition or removal of devices

- Used mainly to grow the cluster

[Figure: MD5(‘FQ Object Name’) selects Partition 3; the Ring maps Partition 3 to devices 5, 2, 14; the Device Table maps ID 12 to IP:PORT 1.2.3.4:6010; device sde1 at 1.2.3.4:6010 holds partitions 2, 3, 8, 21]
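The two mapping steps can be sketched in Python under stated assumptions. The partition power of 5, the 16-device cluster and the toy ring layout (`build_toy_ring`) are illustrative only; a real ring is built to balance partitions across devices and zones, and real deployments also salt the hashed name. Only the "highest bits of MD5" rule comes from the slide.

```python
import hashlib
import struct

PART_POWER = 5    # assumed: 2**5 = 32 partitions
REPLICAS = 3      # D in the slides
NUM_DEVICES = 16  # hypothetical cluster size


def partition_for(fq_name, part_power=PART_POWER):
    """Take the `part_power` highest bits of MD5(fq_name)."""
    digest = hashlib.md5(fq_name.encode("utf-8")).digest()
    top32 = struct.unpack(">I", digest[:4])[0]  # first 32 bits, big-endian
    return top32 >> (32 - part_power)


def build_toy_ring(part_power=PART_POWER, replicas=REPLICAS,
                   devices=NUM_DEVICES):
    """Toy ring: one list of `replicas` distinct device ids per partition."""
    return [[(part + r * 5) % devices for r in range(replicas)]
            for part in range(2 ** part_power)]


ring = build_toy_ring()
name = "/MyAccount/MyContainer/MyObject"
part = partition_for(name)   # first mapping: object to partition
devices = ring[part]         # second mapping: partition to D devices
```

Because the partition is a prefix of the hash, raising the partition power by one splits every partition in two: `partition_for(name, 6) >> 1` equals `partition_for(name, 5)`.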

Page 12:

Principles of Operation - Eventual Consistency

• Faults are assumed

– Disk faults

– Server faults

• Service continues normally during faults

– Continue servicing both PUTs and GETs

– Continue offering the same level of replication

• Distributed, on-going recovery

– Handoff replicas are created following the loss of a replica

• Always strive for consistency

– Replicas look for the way ‘back home’
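How a handoff replica finds its way ‘back home’ can be sketched as a toy replication pass. This is a simplified model, not Swift's replicator: devices are plain dictionaries, `replicate_device` and `is_up` are hypothetical names, and copies are pushed unconditionally rather than compared by signature or timestamp.

```python
def replicate_device(local_id, devices, ring, is_up):
    """Push every locally held partition to its primary devices.

    A partition held only as a handoff (local device is not a primary)
    is removed locally once all primaries have received it.
    """
    local = devices[local_id]
    for part in list(local):
        primaries = ring[part]
        pushed_everywhere = True
        for dev_id in primaries:
            if dev_id == local_id:
                continue
            if not is_up(dev_id):
                pushed_everywhere = False  # retry on a later pass
                continue
            devices[dev_id].setdefault(part, {}).update(local[part])
        if local_id not in primaries and pushed_everywhere:
            del local[part]  # the handoff copy is no longer needed


# Partition 3 belongs on devices 5, 2 and 14; device 7 holds a handoff copy.
ring = {3: [5, 2, 14]}
devices = {5: {3: {"obj": b"v1"}}, 2: {3: {"obj": b"v1"}},
           14: {}, 7: {3: {"obj": b"v1"}}}
down = {14}

replicate_device(7, devices, ring, lambda d: d not in down)
kept_while_down = 3 in devices[7]  # device 14 is still down

down.clear()                       # device 14 recovers
replicate_device(7, devices, ring, lambda d: d not in down)
```

After the second pass device 14 holds the object and the handoff copy on device 7 is gone.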

Page 13:

Eventual Consistency Implementation

• PUT/POST/DELETE Object: Add/Update an object at D replicas

– Respond after RoundDown(D/2 + 1) replicas succeed

- On failure: place on a handoff device

- Any device may serve as a handoff for any object

– Update the container

- On failure: Queue in async_pending for later update

– …Delete marks an object for deletion

• GET Object: Get one of the replicas

– Would only try the D replicas at the Ring

• Replicator Daemon

– Crawl local devices

– Replicate locally found replicas to their designated place

- Delete the local copy if it is a handoff and replication succeeded

- Signatures for objects + partitions used for efficiency and speed

• Updater Daemon

– Crawl local devices looking for async_pending

– Updates container about an object

• Auditor Daemon

– Crawl local devices

– Audit the object signatures

• Expirer Daemon

– Deletes cold objects
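The write rule above, respond after RoundDown(D/2 + 1) successes and fall back to handoff devices on failure, can be sketched as follows. The helper names `quorum` and `write_replicas`, the device ids and the `is_up` callback are hypothetical; the sketch only counts which devices would accept the write and does not model the container update or async_pending.

```python
def quorum(replicas):
    """RoundDown(D/2 + 1): 2 of 3, 3 of 4, 3 of 5."""
    return replicas // 2 + 1


def write_replicas(primaries, handoffs, is_up):
    """Return (success, devices written to) for one object write."""
    written = []
    spare = iter(handoffs)  # each handoff device is used at most once
    for dev in primaries:
        if is_up(dev):
            written.append(dev)
            continue
        for candidate in spare:  # primary unreachable: try a handoff
            if is_up(candidate):
                written.append(candidate)
                break
    return len(written) >= quorum(len(primaries)), written


primaries, handoffs = [5, 2, 14], [7, 9]

# One primary down: the write lands on a handoff and still succeeds.
ok, placed = write_replicas(primaries, handoffs, lambda d: d != 14)

# Only one device reachable: quorum is not met.
failed, partial = write_replicas(primaries, handoffs, lambda d: d == 5)
```

With D = 3 the client gets its response once two replicas are written, while the third is left to the replicator.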

Page 14:

Principles of Operation - Containers and Accounts

• Containers and Accounts are Objects

– The content of the object is the list of items it contains

– GET /myaccount/mycontainer retrieves the list of objects

– PUT /myaccount/mycontainer creates a container

– PUT /myaccount/mycontainer/myobject adds an object to a container

Page 15:

Containers and Accounts - Implementation

• The Container/Account is an SQLite3 DB.

– Today used for

• Separate Rings

– Allow storing each of the following on separate devices (e.g. SSDs for containers/accounts):

- Objects

- Containers

- Accounts

• Containers and Accounts have their own set of eventual consistency daemons:

– Replicators

– Updaters

– Auditors

– Expirer
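A container kept as an SQLite3 database can be sketched with Python's built-in sqlite3 module. The schema here (an `object` table with `name`, `size` and a `deleted` flag) and the helper names are assumptions for illustration, not Swift's actual schema; the `deleted` flag mirrors the earlier point that a delete only marks an object for deletion.

```python
import sqlite3


def open_container_db():
    """Create an in-memory container DB with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE object ("
        "name TEXT PRIMARY KEY, size INTEGER, deleted INTEGER DEFAULT 0)")
    return conn


def put_object_row(conn, name, size):
    """Record that an object was added to (or replaced in) the container."""
    conn.execute(
        "INSERT OR REPLACE INTO object (name, size, deleted) VALUES (?, ?, 0)",
        (name, size))


def delete_object_row(conn, name):
    """Mark the object as deleted instead of removing the row."""
    conn.execute("UPDATE object SET deleted = 1 WHERE name = ?", (name,))


def list_objects(conn):
    """What a GET on the container would list: live objects, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM object WHERE deleted = 0 ORDER BY name")
    return [row[0] for row in rows]


db = open_container_db()
for obj_name, obj_size in [("c.txt", 30), ("a.txt", 10), ("b.txt", 20)]:
    put_object_row(db, obj_name, obj_size)
delete_object_row(db, "b.txt")
names = list_objects(db)
```

Keeping the listing in a small database is what makes it practical to place containers and accounts on their own ring and faster devices.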

Page 16:

Swift Architecture

[Figure: a Client sends a PUT for ‘John/videos/bungee2.mpg’ through a Load Balancer to one of three Swift Proxy nodes, each with Extensions; the proxy applies MD5 and the Device Table (ID 12, IP:PORT 1.2.3.4:6010) to reach device sde1 on a Swift Storage Node (XFS Node), which runs the Object Server, Container Server and Account Server with Extensions, plus the Replicator, Updater, Auditor and Expirer]

• Extensions: Authentication, Authorization, Logging, CORS, CDMI and S3 APIs

• Scale-out architecture

• Commodity hardware
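One way to picture the Extensions boxes is as wrappers around the request handler. The sketch below assumes a WSGI-style middleware chain; the slides do not say how extensions are plugged in, and `logging_middleware` and `proxy_app` are hypothetical stand-ins rather than Swift components.

```python
def logging_middleware(app, log):
    """Wrap a WSGI app so that every request is recorded before handling."""
    def wrapped(environ, start_response):
        log.append((environ["REQUEST_METHOD"], environ["PATH_INFO"]))
        return app(environ, start_response)
    return wrapped


def proxy_app(environ, start_response):
    """Stand-in for the proxy: acknowledge every request with 201."""
    start_response("201 Created", [("Content-Length", "0")])
    return [b""]


log = []
pipeline = logging_middleware(proxy_app, log)

# Drive the pipeline by hand with a minimal, hypothetical request environ.
statuses = []
body = pipeline(
    {"REQUEST_METHOD": "PUT", "PATH_INFO": "/John/videos/bungee2.mpg"},
    lambda status, headers: statuses.append(status))
```

Authentication, authorization or an alternative API could be layered the same way, each wrapper seeing the request before the one beneath it.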

Page 17:

IBM Participation – Create Enterprise Grade Swift

• Contributions to Open Source

– Done:

- CDMI Support

- Swift as a request processor of Apache

– Ongoing:

- Account ACLs

- Account Listing

- Enable Ring Size to Grow

- Swift Crawler

– Future:

- Object ACLs

- Metadata Extensibility

- Undelete
