OpenStack Object Storage (Swift) Essentials - Sample Chapter
Openstack Swift - IBM Research People and Projects Swift Object Store Cloud ... OpenStack Rackspace...
Transcript of Openstack Swift - IBM Research People and Projects Swift Object Store Cloud ... OpenStack Rackspace...
© 2012 IBM Corporation
Page:1SWIFTSWIFT
OpenstackOpenstack SwiftSwift
Object Store Cloud Object Store Cloud
built from the grounds upbuilt from the grounds up
David David HadasHadas
Swift ATCSwift ATC
HRLHRL
[email protected]@il.ibm.com
© 2012 IBM Corporation
Page:2SWIFTSWIFT
Object Store Cloud Services
PUT/GET/DELETE
Expectations:
• Huge Capacity (Scale)
• Always Available + Safe
• High Performance
• More PUTs than GETs
• Low Cost per TB
© 2012 IBM Corporation
Page:3SWIFTSWIFT
High Level: Object Store vs. File System
� Interface:
– Web based: GET/PUT/DELETE
– RESTful: Stateless
– Metadata
� Synchronization
– Eventual Consistency
– No Distributed Locking
� Software Defined Storage
– Commodity hardware
– Designed to Fault but never Fail
– Built to auto-recover by design
� Features
– Basic services that scale (KISS)
– SW extendible with web interfaces
� Interface:
– Posix:
Open,Seek/Read/Write,Close
– Stateful
� Synchronization
– Always consistent
– Uses Distributed Locking
� Hardware and Software
– Best of breed hardware
– Designed not to fault
– Admin controlled recovery
� Features
– Abundant enterprise features built into the products
Object Store File System
© 2012 IBM Corporation
Page:4SWIFTSWIFT
Object Store Cloud Options
� Amazon S3
– Proprietary API
� Rackspace Cloud Files (Swift)
– Swift API
� Microsoft Blob Storage
– Proprietary API
� Google Cloud Storage
– Proprietary API
� HP Cloud Object Store (Swift)
– Swift API
� OpenStack Object Services
– Written in Python
- Apache license
– Interfaces:
- Native Swift 1.0
- S3, CDMI interfaces
© 2012 IBM Corporation
Page:5SWIFTSWIFT
Historical Perspective
OpenStackRackspace contributes Swift
NASA contributes Nova
Rackspace Cloud Files V1Distributed Storage
Centralized Metadata
Hard to scale – while growing
rapidly
Rackspacedesigns Swift’
Aug
Jun
Rackspace Cloud Files V2 (‘Swift’)Distributed Storage and Metadata
Jul
Oct
Dec
Bexar
Feb
Apr
Sep
Au
sti
n
Cactu
s
Dia
blo
Apr
Sep
Essex
Fo
lso
m
Apr
Gri
zzly
Rackspace deployment of Swift3 Data Centers
Migrated all customers
Jan
Jul
2008 2009 2010 2011 2012 2013
OpenStackFoundation
20
06
Amazon S3 Launched
S3 holds 14 Billions Objects
© 2012 IBM Corporation
Page:6SWIFTSWIFT
OpenStack - The Cloud operating System
�550K lines of code
�300K Downloads
�550 Developers
�7000 Individuals from 100 Countries
�850 Organizations
�10M$ Funding
�3000 participants in the last summit
© 2012 IBM Corporation
Page:7SWIFTSWIFT
Swift
A massively scalable redundant storage system
– Replicate objects to disk drives spread as far apart as possible
� Scales horizontally (adding new servers)
– Known to work with
- ‘Thousands of servers’
- ‘Petabytes of data’
� Designed to contain high rate of failures
– Inexpensive commodity hard drives and servers can be used
– Maintains replication level following faults
� An Open Source Software
© 2012 IBM Corporation
Page:8SWIFTSWIFT
Swift General Layout
� Swift Cluster
– Proxy Nodes
- Smart
– Storage Nodes
- Simple
– (Can be the same node)
� Object GET/PUT
– Web Clients use HTTP
– Approach a Proxy Node
– The Proxy node maps to storage devices
- Located on Storage Nodes
– Request is propagated to Storage Node
� A Cut-through technology
– Proxy and Storage Nodes never store the entire object in memory
PUTProxy NodesProxy Nodes
Storage NodesStorage Nodes
© 2012 IBM Corporation
Page:9SWIFTSWIFT
Swift API Examples
� GET MyAccount/MyContainer/MyObject
– Responder provides the object in the body + metadata in http headers
� PUT MyAccount/MyContainer
– Requester indicates container metadata in http headers => Responder indicates 201
� POST MyAccount/MyContainer
– Requester indicates container metadata in http headers => Responder indicates 200
� PUT MyAccount/MyContainer/MyObject
– Requester indicates object content in the message body + metadata in http headers
– Responder indicates 201
� GET MyAccount/MyContainer
– Responder provides the list of objects in the body + metadata in http headers
� DELETE MyAccount/MyContainer/MyObject
– Responder indicates 200
© 2012 IBM Corporation
Page:10SWIFTSWIFT
Principles of Operation - Mapping
� Mapping must allow us to stay:– Distributed
– Scalable
– Load Balanced
– Highly Available
� Fortunately, “All problems in computer science can be solved by another level of indirection (David Wheeler)“
� Double mapping to allow virtualization of Disk space– Objects are stored in virtual partitions
– Partitions are stored in “devices”
- Typically a device is a local hard drive in a server
- Device can be anything
- Allowing another level of indirection
© 2012 IBM Corporation
Page:11SWIFTSWIFT
Mapping Implementation
� Consistent Hashing concepts
– Only K/n keys are remapped
– Evenly distributed load during changes
� Mapping of Objects to Partitions
– Partition determined by MD5(Fully Qualified Object
Name)
– Uses Partition-Power highest bits of the MD5 result
– Can elastically grow and shrink
� Mapping Partitions to devices
– Using a ‘Ring’ which maps each partition to D devices
(i.e. D replications)
– The Ring is consistent allowing addition of removal of
devices
- Used mainly to grow the cluster
Devices:
5, 2, 14
MD5(‘FQ Object Name’)
Partition 3
Device TableID IP:PORT12 1.2.3.4:6010
…
sde1 partitions
2, 3, 8, 21
Device
sde1
1.2.3.4:6010
© 2012 IBM Corporation
Page:12SWIFTSWIFT
Principles of Operation - Eventual Consistency
� Faults are assumed – Disk faults
– Server faults
� Service continues normally during faults– Continue servicing both PUTs and GETs
– Continue offering the same level of replication
� Distributed, on-going recovery– Handoff replicas are created following lost of replica
� Always strive for consistency– Replicas look for the way ‘back home’
© 2012 IBM Corporation
Page:13SWIFTSWIFT
Eventual Consistency Implementation
� PUT/POST/DELETE ObjectAdd/Update an object at D replicas
– Respond after RoundDown(D/2 +1)
- On failure: place on a handoff device
- Any device may serve as a handoff for any object
– Update the container
- On failure: Queue in async_pending for later update
– …Delete marks an object for deletion
� GET Object Get one of the replicas
– Would only try the D replicas at the Ring
� Replicator Daemon– Crawl local devices
– Replicate locally found replicas to its designated place
- Delete successful and handoff
- Signatures for objects + partitions used for efficiency and speed
� Updater Daemon– Crawl local devices looking for
async_pending
– Updates container about an object
� Auditor Daemon– Crawl local devices
– Audit the object signatures
� Expirer Daemon– Deletes cold objects
© 2012 IBM Corporation
Page:14SWIFTSWIFT
Principles of Operation - Containers and Accounts
�Containers and Accounts are Objects
– The content of the object is the list of items it contains
– GET /myaccount/mycontainer retrieves the list of objects
– PUT /myaccount/mycontainer creates a container
– PUT /myaccount/mycontainer/myobject adds an object to a container
© 2012 IBM Corporation
Page:15SWIFTSWIFT
Containers and Accounts - Implementation
� The Container/Account is an SQLite3 DB.
– Today used for
� Separate Rings
– Allow store on separate devices (SSDs for containers/accounts):
- Objects
- Containers
- Accounts
� Containers and Accounts have their own set of eventual consistency daemons:
– Replicators
– Updaters
– Auditors
– Expirer
© 2012 IBM Corporation
Page:16SWIFTSWIFT
Swift Architecture
MD5
Swift
Proxy
Swift
Proxy
Swift
Proxy
Load Balancer
Client
PUT
PUT
PUT
Extensions Extensions Extensions
SwiftStorage Node
XFS Node
Object ServerContainer ServerAccount Server
Extensions
Authentication
Authorization
Logging
CORS
CDMI and S3 APIs
Scale-out architecture
Commodity hardware
ReplicatorUpdaterAuditorExpirer
‘John/videos/bungee2.mpg’
Device TableID IP:PORT12 1.2.3.4:6010
…
Device
sde1
© 2012 IBM Corporation
Page:17SWIFTSWIFT
IBM Participation – Create Enterprise Grade Swift
� Contributions to Open Source– Done:
- CDMI Support
- Swift as a request processor of Apache
– On going:
- Account ACLs
- Account Listing
- Enable Ring Size to Grow
- Swift Crawler
– Future
- Object ACLs
- Metadata Extendibility
- Undelete
© 2012 IBM Corporation
Page:18SWIFTSWIFT