
A Novel Data Placement Model for Highly-Available Storage Systems

Rama, Microsoft Research

joint work with John MacCormick, Nick Murphy, Kunal Talwar, Udi Wieder, Junfeng Yang, and Lidong Zhou

Introduction

Kinesis:

Framework for placing data and replicas in a data-center storage system

Three Design Principles:

Structured organization

Freedom of choice

Scattered placement

Kinesis Design => Structured Organization

Segmentation: Divide storage servers into k segments

Each segment is an independent hash-based system

Failure Isolation: Servers with shared components are placed in the same segment

r replicas per item, each on a different segment

Reduces impact of correlated failures

Kinesis Design => Freedom of Choice

Balanced Resource Usage:

Multiple-choice paradigm

Write: r out of k choices

Read: 1 out of r choices

Kinesis Design => Scattered Data Placement

Independent hash functions

Each server stores different set of items

Parallel Recovery

Spread recovery load among multiple servers uniformly

Recover faster
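Taken together, the three design principles boil down to a small amount of placement logic. The sketch below is illustrative only: the SHA-1 salting, the byte-count load metric, and the in-memory replica map are assumptions made for this example, not the system's implementation. Each of the k segments hashes an item with its own independent hash function to produce one candidate server; a write stores replicas on the r least-loaded candidates (r out of k choices), and a read fetches from the least-loaded of the r replica holders (1 out of r choices).

```python
import hashlib

# Illustrative sketch (not the paper's code): Kinesis-style placement with
# k segments, r replicas, an independent hash function per segment, and
# least-loaded choice on both the write and the read path.

K, R = 7, 3                 # k segments, r replicas per item, i.e. Kinesis(7,3)
SERVERS_PER_SEGMENT = 10    # hypothetical segment size

def candidate(segment, key):
    """Independent hash per segment: salt the key with the segment id."""
    digest = hashlib.sha1(f"{segment}:{key}".encode()).hexdigest()
    return int(digest, 16) % SERVERS_PER_SEGMENT    # candidate server index

# load[s][i] = bytes stored on server i of segment s (illustrative load metric)
load = [[0] * SERVERS_PER_SEGMENT for _ in range(K)]
replicas = {}               # key -> list of (segment, server) holding a replica

def write(key, size):
    # Write: r out of k choices. One candidate per segment; keep the r least loaded.
    cands = sorted((load[s][candidate(s, key)], s, candidate(s, key)) for s in range(K))
    for _, s, srv in cands[:R]:
        load[s][srv] += size                        # replicas land on r different segments
    replicas[key] = [(s, srv) for _, s, srv in cands[:R]]

def read(key):
    # Read: 1 out of r choices. Fetch from the least-loaded replica holder.
    return min(replicas[key], key=lambda loc: load[loc[0]][loc[1]])
```

Because every item draws a fresh set of k candidates, two servers rarely share many items, which is what makes the scattered placement and parallel recovery possible.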

Motivation => Data Placement Models

Hash-based:

Data is stored on a server determined by a hash function

Server identified by a local computation during reads

Low overhead

Limited control in placing data items and replicas

Directory-based:

Any server can store any data

A directory provides the server to fetch data from

Expensive to maintain a globally-consistent directory in a large-scale system

Can place data items carefully to maintain load balance, avoid correlated failures, etc.

Kinesis => Advantages

Enables hash-based storage systems to have advantages of directory-based systems

Near-optimal load balance

Tolerance to shared-component failures

Freedom from problems induced by correlated replica placement

No bottlenecks induced by directory maintenance

Avoids slow data/replica placement algorithms

Evaluation

Theory

Simulations

Experiments

Theory => The “Balls and Bins” Problem

Load balancing is often modeled by the task of throwing balls (items) into bins (servers)

Single-Choice Paradigm:

Throw m balls into n bins:

Pick a bin uniformly at random

Insert the ball into the bin

Max load (with high prob.): m/n + Θ(√(m log n / n))

Theory => Multiple-Choice Paradigm

Throw m balls into n bins:

Pick d bins uniformly at random (d ≥ 2)

Insert the ball into the least-loaded of the d bins

Max load (with high prob.): m/n + log log n / log d + O(1)

Excess load is independent of m, the number of balls! [BCSV00]
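A quick simulation makes the contrast concrete. This is a self-contained sketch; the values of m, n, and d are illustrative and not from the talk. It measures the excess load (max load minus the average m/n) under single-choice and two-choice placement:

```python
import random

def max_excess(m, n, d, seed=0):
    """Throw m balls into n bins; each ball samples d bins uniformly at
    random and lands in the least-loaded one. Returns max load - m/n."""
    rng = random.Random(seed)
    bins = [0] * n
    for _ in range(m):
        choices = [rng.randrange(n) for _ in range(d)]
        target = min(choices, key=lambda b: bins[b])
        bins[target] += 1
    return max(bins) - m / n

# The excess load grows with m for d = 1, but stays roughly constant
# (about log log n / log d) for d >= 2, independent of m.
n = 1000
for m in (10_000, 100_000, 1_000_000):
    print(m, round(max_excess(m, n, d=1), 1), round(max_excess(m, n, d=2), 1))
```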

Theory vs. Practice

Storage load vs. network load: [M91]

Network load is transient, unlike storage load

Non-uniform sampling: [V99],[W07]

Servers chosen based on consistent (or linear) hashing

Replication: [MMRWYZ08]

Choose r out of k servers instead of 1 out of d bins

Heterogeneity:

Variable-sized servers [W07]

Variable-sized items [TW07]

Theory => Planned Expansions

Adding new, empty servers into the system

Creates sudden, large imbalances in load: empty servers vs. filled servers

Eventual storage balance: do subsequent inserts/writes fill up the new servers?

Eventual network load balance: are new items distributed between old and new servers? New items are often more popular than old items

Theory => Planned Expansions

Expanding a linear-hashing system:

[Figure: the hash space from 0 to 2^b, shown as it is split across N = 2, N = 4, and N = 6 servers]
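For readers unfamiliar with linear hashing, the following sketch shows how a hash value can be mapped to one of N servers when N is not a power of two. The address computation here is the textbook scheme, used as an assumption about this system rather than code taken from it; servers whose ranges have already been split own half-sized ranges, which is exactly the imbalance analyzed on the next slide.

```python
# Textbook-style linear hashing address computation (an illustrative
# assumption, not the system's code). Servers that have already been split
# own half-sized hash ranges; the rest still own full-sized ranges.

def linear_hash_server(h, num_servers):
    """Map a hash value h to one of num_servers servers."""
    level = num_servers.bit_length() - 1      # largest power of two <= num_servers
    base = 1 << level                         # 2^level
    split_pointer = num_servers - base        # how many ranges have been split so far
    server = h % base
    if server < split_pointer:                # this range was split: use the finer hash
        server = h % (base << 1)
    return server

# Example: with num_servers = 6, servers 0, 1, 4, 5 own half-sized ranges,
# while servers 2 and 3 still own full-sized ranges.
```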

Theory => Planned Expansions

Before Expansion:

Assume there are k·n servers in total, n on each of the k segments

Storage is evenly balanced

Expansion:

Add α n new servers to each segment

After expansion:

R = (1 - α) n servers, each with twice the load of the L = 2 α n servers (new servers + split servers)

Theory => Planned Expansions

Relative Load Distribution R_L:

Ratio of the expected number of replicas inserted on L servers to the expected number of replicas inserted on all servers

R_L > 1 => eventual storage balance!

R_L < 1 => no eventual storage balance

Theorem: R_L > 1 if k ≥ 2r [MMRWYZ08]
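The theorem can be sanity-checked numerically under a simplified model of the expansion. This model is a reconstruction made for illustration, not the talk's exact definitions: a segment's candidate is an L server with probability α (L servers own half-sized hash ranges), a write sends its r replicas to the r least-loaded candidates so L candidates are used before R candidates, and R_L is read as the per-server relative load of L servers versus all servers.

```python
from math import comb

def relative_load_L(k, r, alpha):
    """Per-server relative load of the L servers under a simplified model
    (an illustrative assumption, not the paper's exact definition):
      - each segment's candidate is an L server with probability alpha,
      - a write places its r replicas on the r least-loaded candidates,
        so L candidates are chosen before R candidates.
    Expected replicas landing on L servers is E[min(X, r)], X ~ Binomial(k, alpha);
    L servers make up a 2*alpha/(1+alpha) fraction of all servers."""
    e_min = sum(min(x, r) * comb(k, x) * alpha**x * (1 - alpha)**(k - x)
                for x in range(k + 1))
    share_of_servers = 2 * alpha / (1 + alpha)
    return (e_min / r) / share_of_servers

# Illustrative check of "R_L > 1 if k >= 2r": with r = 3 and a 20% expansion,
# k = 6 and k = 7 come out above 1, while k = 4 falls below 1.
for k in (4, 6, 7, 10):
    print(k, round(relative_load_L(k, r=3, alpha=0.2), 2))
```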

Evaluation

Theory

Simulations

Experiments

Simulations => Overview

Real-World Traces:

Compare with Chain (see the sketch below): one segment

single-choice paradigm

chained replica placement

E.g. PAST, CFS, Boxwood, Petal

Trace    Num Files    Total Size   Type
MSNBC    30,656       2.33 TB      Read only
Plan-9   1.9 Million  173 GB       Read/write

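For reference, here is a hedged sketch of how the Chain baseline can be read (one segment, single-choice, chained replica placement): a single hash picks one server, and the replicas go on that server and its successors, with no load-based choice at write or read time. The successor-based placement and the hash salt are assumptions made for the example.

```python
import hashlib

# Illustrative sketch of a Chain-style baseline: one segment, a single hash
# choice, and replicas chained onto the next r - 1 servers. This is an
# assumed reading of "chained replica placement", not the talk's code.

def chain_placement(key, num_servers, r=3):
    digest = hashlib.sha1(key.encode()).hexdigest()
    first = int(digest, 16) % num_servers                    # single choice
    return [(first + i) % num_servers for i in range(r)]     # chained successors

print(chain_placement("object-42", num_servers=14))          # three consecutive server indices
```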

Simulations => Load Balance

[Two plots, MSNBC trace and Plan-9 trace: storage imbalance (Max - Avg, in GB) versus number of objects (MSNBC) and number of operations in millions (Plan-9), comparing Chain(3) and Kinesis(7,3)]

Simulations => System Provisioning

Percentage overprovisioning required:

K(5): 45%, K(6): 35%, K(7): 29%, K(8): 26%, K(9): 23%, K(10): 20%, Chain: 52%

Simulations => User Experienced Delays

[Plot: 90th-percentile queuing delay (sec) versus number of servers, comparing Chain(3) and Kinesis(7,3)]

Simulations => Read Load vs. Write Load

Read load: short-term resource consumption (network bandwidth, computation, disk bandwidth); directly impacts user experience

Write load: short- and long-term resource consumption (storage space)

Solution (Kinesis S+L): choose short term over long term

Storage balance is restored when transient resources are not bottlenecked
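One way to read the S+L policy as code is a conditional ranking: order candidate servers primarily by their transient (queue) load when transient resources are the bottleneck, and primarily by storage otherwise. The sketch below is a hypothetical rendering of that idea; the threshold, the load metrics, and the tie-breaking rule are assumptions, not parameters from the talk.

```python
# Hypothetical sketch of a Kinesis S+L style write policy: prefer short-term
# (queue) balance while transient resources are bottlenecked, and let storage
# balance be restored when they are not. Threshold and metrics are illustrative.

def choose_replicas(candidates, r, queue_threshold=4):
    """candidates: list of (server_id, queued_requests, stored_bytes).
    Returns the r servers to write to."""
    busy = any(queued > queue_threshold for _, queued, _ in candidates)
    if busy:
        key = lambda c: (c[1], c[2])   # queue first, storage as tie-breaker
    else:
        key = lambda c: (c[2], c[1])   # storage first, queue as tie-breaker
    return [server for server, _, _ in sorted(candidates, key=key)[:r]]

# Example: under heavy load the shorter queues win; under light load the
# emptier disks win.
cands = [("s1", 0, 900), ("s2", 1, 100), ("s3", 0, 500), ("s4", 6, 50)]
print(choose_replicas(cands, r=3))     # -> ['s3', 's1', 's2'] (queues dominate)
```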

Simulations => Read Load vs. Write Load

[Plot: queuing delay imbalance (Max - Avg, in ms) versus update-to-read ratio (%), comparing Chain, Kinesis S, Kinesis S+L, and Kinesis L]

Simulations => Read Load vs. Write Load

[Plot: storage imbalance (%) and request rate (Gb/s) over a 24-hour period, comparing Kinesis S+L and Kinesis S against the request rate]

Evaluation

Theory

Simulations

Experiments

Kinesis => Prototype Implementation

Storage system for variable-sized objects

Read/Write interface

Storage servers

Linear hashing system

Read, Append, Exists, Size (storage), Load (queued requests)

Failure recovery for primary replicas

The primary for an item is the highest-numbered server holding a replica

Kinesis => Prototype Implementation

Front end

Two-step read protocol (see the sketch after this list):

1) Query the k candidate servers

2) Fetch from the least-loaded server with a replica

Versioned updates with copy-on-write semantics

RPC-based communication with servers

Not implemented:

Failure detector

Failure-consistent updates
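A minimal sketch of the front end's two-step read, assuming hypothetical RPC helpers query_load(server, key) and fetch(server, key); the names and signatures are stand-ins for the prototype's RPC-based communication, not its actual API.

```python
# Minimal sketch of the two-step read protocol. The RPC helpers and their
# signatures are hypothetical stand-ins, not the prototype's API.

def two_step_read(servers, key, query_load, fetch):
    """servers: the k candidate servers for `key` (one per segment).
    query_load(server, key) -> (has_replica, queued_requests)    # step 1
    fetch(server, key) -> bytes                                  # step 2
    """
    # Step 1: ask every candidate whether it holds a replica and how busy it is.
    replies = [(server, *query_load(server, key)) for server in servers]
    holders = [(queued, server) for server, has_replica, queued in replies if has_replica]
    if not holders:
        raise KeyError(key)
    # Step 2: fetch from the least-loaded server that actually has a replica.
    _, best = min(holders)
    return fetch(best, key)
```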

Kinesis => Experiments

15 node LAN test-bed

14 storage servers and 1 front end

MSNBC trace of 6 hours duration

5000 files, 170 GB total size, and 200,000 reads

Failure induced at 3 hours

Experiments => Kinesis vs. Chain

[Bar chart: per-server values for servers 1-14, comparing Kinesis(7,3) and Chain(3); one series is tightly clustered (roughly 11-13) while the other is more skewed (roughly 10-18)]

Average Latency: Kinesis 1.73 sec, Chain 3 sec

Experiments => Failure Recovery

[Bar chart: recovery across servers 1-14 for Kinesis(7,3) and Chain(3)]

Total Recovery Time: Kinesis 17 min (12 servers), Chain 44 min (5 servers)

Related Work

Prior work:

Hashing: Archipelago, OceanStore, PAST, CFS…

Two-choice paradigm: common load balancing technique

Parallel recovery: Chain Replication [RS04]

Random distribution: Ceph [WBML06]

Our contributions:

Putting them together in the context of storage systems

Extending theoretical results to the new design

Demonstrating the power of these simple design principles

Summary and Conclusions

Kinesis: A data/replica placement framework for LAN storage systems

Structured organization, freedom-of-choice, and scattered distribution

Simple and easy to implement, yet quite powerful!

Reduces infrastructure cost, tolerates correlated failures, and quickly restores lost replicas

Soon to appear in ACM Transactions on Storage, 2008. Download: http://research.microsoft.com/users/rama

Experiments => Kinesis vs. Chain

[Plot: average latency (sec) over the first 3 hours of the trace, comparing Chain(3) and Kinesis(7,3)]

Experiments => Kinesis vs. Chain

[Plot: server load imbalance (Max - Avg) and total request rate over the first 3 hours, comparing Chain(3) and Kinesis(7,3)]