Sheepdog: yet another all in-one storage for openstack

25
Sheepdog: Yet Another All-In-On e Storage For Openstack Openstack Hong Kong Summit Liu Yuan 2013.11.8

Transcript of Sheepdog: yet another all in-one storage for openstack

Page 1: Sheepdog: yet another all in-one storage for openstack

Sheepdog: Yet Another All-In-One Storage For Openstack

Openstack Hong Kong Summit Liu Yuan 2013.11.8

Page 2: Sheepdog: yet another all in-one storage for openstack

Who I Am

• Active contributor to various open source projects such as Sheepdog, QEMU, Linux Kernel, Xen, Openstack, etc.

• Primary top contributor of Sheepdog project and co-maintains it with Kazutaka Morita from NTT Japan since 2011.12

• Technically lead the storage projects based on Sheepdog for internal uses of www.taobao.com

• Contacts–Email: [email protected]–Micro Blog: @淘泰来

Page 3: Sheepdog: yet another all in-one storage for openstack

Agenda

Introduction - Sheepdog Overview

Exploration - Sheepdog Internals

Openstack - Sheepdog Goal

Roadmap - Features From The Future

Industry - How Industry Use Sheepdog

Page 4: Sheepdog: yet another all in-one storage for openstack

Sheepdog Overview

Introduction

Page 5: Sheepdog: yet another all in-one storage for openstack

What Is Sheepdog

• Distributed Object Storage System In User Space–Manage Disks and Nodes•Aggregate the capacity and the power (IOPS + throughput)•Hide the failure of hardware•Dynamically grow or shrink the scale–Manage Data•Provide redundancy mechanisms (replication and erasure code) for high-availability•Secure the data with auto-healing and auto-rebalanced mechanisms–Provide Services•Virtual volume for QEMU VM, iSCSI TGT (Perfectly supported by upstream)•RESTful container (Openstack Swift and Amazon S3 Compatible, in progress)•Storage for Openstack Cinder, Glance, Nova (Available for Havana)

Page 6: Sheepdog: yet another all in-one storage for openstack

Sheepdog Architecture

Gateway

Store

1TB 1TB

1TB

Gateway

Store

1TB 1TB

2TB

Gateway

Store

1TB 2TB

X

Private Hash Ring

Global Consistent Hash Ring and P2P Node event: global rebalanceDisk event: local rebalance

No meta servers!Zookeeper: membership management and message queue

4TB

sheep daemon

dog admin tool

Hot-plugged Auto unplugged on EIO

Page 7: Sheepdog: yet another all in-one storage for openstack

Why Sheepdog

• Minimal assumptions of underlying kernel and file system–Any type of file systems that support extended attribute(xattr)–Only require kernel version >= 2.6.32

• Full of features–snapshot, clone, incremental backup, cluster-wide snapshot, discard, etc.–User-defined replication/erasure code scheme on VDI(Virtual Disk Image) basis–Auto node/disk management

• Easy to set up the cluster with thousands of nodes –Single daemon can manage unlimited number of disks in one node as efficient as RAID0–as many as 6k+ for a single cluster

• Small–Fast and very small memory footprint (less than 50 MB even when busy)–Easy to hack and maintain, 35K lines of code in C as of now

Page 8: Sheepdog: yet another all in-one storage for openstack

Sheepdog Internals

Exploration

Page 9: Sheepdog: yet another all in-one storage for openstack

Sheepdog In A Nutshell

/dev/vda

/dev/vdaShared PersistentCache

Gateway 1TB

1TB

1TB

1TB

2TB

Journal

Net

DataReplicated orErasure coded

WeightedNodes & Disks

256GSSD 4G

Store

Store

Page 10: Sheepdog: yet another all in-one storage for openstack

Sheepdog Volume

• Copy-On-Write Snapshot –Disk-only snapshot, disk & memory snapshot –Live snapshot, offline snapshot–Rollback(tree structure), clone –Incremental backup–Instant operation, only create 4M inode object

• Push many logics into client -> simple and fast code !–Only 4 opcodes for store, read/write/create/remove, snapshot is done by QEMU blo

ck driver or dog–Requests serialization is not handled by Sheepdog but client–Inode object is treated the same as data object

Page 11: Sheepdog: yet another all in-one storage for openstack

Gateway - Request Engine

1TB 1TB 2TB 2TB 4TB

Store

Req Queue

Node Ring

CacheReq Manager

RouteRetry

Net

Sheep

Sheep

Sheep

Sheep X

X

Degraded toGW-only On EIO

Retry OnError(Timeout, EIO)

Return Only AllSucceed

ConcurrentHandling

SocketCache

Page 12: Sheepdog: yet another all in-one storage for openstack

Store - Data Engine

1TB 1TB 2TB X 2TB

Disk Ring

Journal

Disk Manager

Net

Sheep

Sheep

Sheep

SheepAuto unplugged on EIO

1. Fake network err to ask GW retry2. Update disk ring3. Start local data rebalance

GW

Page 13: Sheepdog: yet another all in-one storage for openstack

Redundancy Scheme

Sheep Sheep Sheep

Full Replication

Sheep Sheep Sheep Sheep Sheep Sheep

Erasure Coding

Parity

Page 14: Sheepdog: yet another all in-one storage for openstack

Erasure Coding Over Full Replication

• Advantages–Far less storage overhead–Rumors breaking•Better R/W performance•Support random R/W•Can run VM Images !

• Disadvantages–Generate more traffic for recovery•(X + Y)/(Y + 1) times data ( Suppose X data, Y parity strips)– 2 times data for 4:2 compared to 3 full copies

0

20

40

60

80

100

Read Wri te

6 Nodes wi th 1Gb NI C

MB/s Repl i cati on (3 Copy)

Erasure (4: 2)

Page 15: Sheepdog: yet another all in-one storage for openstack

Recovery - Redundancy Repair & Data Rebalance

Node Ring

Recovery Manager

X

Net

Sleep Queue

Req Queue

Sheep

If lost, read or rebuild from other copy

SheepMigrate to other nodefor rebalance

Update VersionSchedule

Page 16: Sheepdog: yet another all in-one storage for openstack

Recovery Cont.

• Eager Recovery As Default–Allow users to stop it and do manual recovery temporarily

• Node/Disk Events Handling–Disk event and node event share the same algorithm•Handle mixed node and disk events nicely

–Subsequent event will supersede previous one•Handle group join/leave of disks and nodes gracefully

• Recovery Handling Transparently to the Client–Put requests for objects being recovered on sleep queue and wake it up later–Serve the request directly if object is right there in the store

Page 17: Sheepdog: yet another all in-one storage for openstack

Farm - Cluster Wide Snapshot

Sheep

Sheep

Sheep

Sheep

Net

Dedicated Backup Storage

Slicer

Hash Dedupper

128K

Meta DataGenerator

• Incremental backup• Up to 50% dedup ratio• Compression doesn't help

Think of Sheepdog On Sheepdog ? Yeah!

Page 18: Sheepdog: yet another all in-one storage for openstack

Sheepdog Goal

Openstack

Page 19: Sheepdog: yet another all in-one storage for openstack

Openstack Storage Components

• Cinder - Block Storage–Support since day 1

• Glance - Image Storage–Support merged at Havana version

• Nova - Ephemeral Storage–Not yet started

• Swift - Object Storage–Swift API compatible In progress

• Final Goal - Unified Storage–Copy-On-Write anywhere ?–Data dedup ?

Sheep Sheep Sheep Sheep

Cinder Glance

Unified Storage

NovaSwift

Page 20: Sheepdog: yet another all in-one storage for openstack

Features From The Future

Roadmap

Page 21: Sheepdog: yet another all in-one storage for openstack

Look Into The Future

• RESTful Container–Plans to be Openstack Swift API compatible first, coming soon

• Hyper Volume–256PB Volume, coming soon

• Geo-Replication• Sheepdog On Sheepdog

–Storage for cluster wide snapshot

• Slow Disk & Broken Disk Detecter–Deal with dead D state process hang because of broken disk in massive deplo

yment

Page 22: Sheepdog: yet another all in-one storage for openstack

How Industry Use Sheepdog

Industry

Page 23: Sheepdog: yet another all in-one storage for openstack

Sheepdog In Taobao & NTT

SD VM SD VM

SD VM SD VM

VM running inside Sheepdog Clusterfor test & dev at

Taobao

SD SD

SD SD

SD

SD

HTTP

Ongoing project with 10k+ ARM nodesfor cold data at

Taobao

SD SD

SD SD

SD

SD

LUN device pool

Sheepdog cluster runas iSCSI TGT backend

storage at NTT

Page 24: Sheepdog: yet another all in-one storage for openstack

Other Users In Production

Any more users I don't know ?

Page 25: Sheepdog: yet another all in-one storage for openstack

Go Sheepdog !

Q & A

Try me outgit clone git://github.com/sheepdog/sheepdog.git

Homepage http://sheepdog.github.io/sheepdog/