Membase Introduction

membase.org: The Simple, Fast, Elastic NoSQL Database Membase, Inc. Matt Ingenthron [email protected]

description

Introduction to Membase, presented at SD Forum Cloud SIG on Oct. 26, 2010.

Transcript of Membase Introduction

Page 1: Membase Introduction

membase.org: The Simple, Fast, Elastic NoSQL Database

Membase, Inc.

Matt Ingenthron [email protected]

Page 2: Membase Introduction

Membase is an Open Source distributed, key-value database management system optimized for storing data behind interactive web applications.

All aspects of membase are simple, fast and elastic by design.


Page 3: Membase Introduction

Value

Image courtesy http://www.flickr.com/photos/vintagedept/3617706196/


Page 4: Membase Introduction

Simple

Image courtesy http://www.flickr.com/photos/brenda-starr/3509344100/sizes/m/in/photostream/


Page 5: Membase Introduction

Simple

Image courtesy http://www.flickr.com/photos/brenda-starr/3509344100/sizes/m/in/photostream/

(with a replica)

Page 6: Membase Introduction

Fast


• Original use case: speed up access to authoritative data as a distributed hashtable

• Must be at least as fast as a highly tuned DBMS

• Designed for modern datacenter substrate
– Designed for VM and cloud deployments

Page 7: Membase Introduction

Elastic

• Add nodes without losing access to data

• Maintain consistency when accessing data
– membase is a CP type system

• Scale linearly by just adding more nodes


Page 8: Membase Introduction

Before: Application scales linearly, data hits wall

Application Scales Out: Just add more commodity web servers

Database Scales Up: Get a bigger, more complex server


Page 9: Membase Introduction

Membase is a distributed database


Membase Servers

In the data center

Web application server

Application user

On the administrator console

Page 10: Membase Introduction

Built-in Memcached Caching Layer


[Figure: two configurations, labeled "Memcached Mode" and "Membase Mode", each showing Memcached and Membase Database]

Fact: Membase development team has also contributed over half of the code to the Memcached project.

Page 11: Membase Introduction

Proven at small, and extra large scale

Leading cloud service (PaaS) provider
• Over 65,000 hosted applications
• Over 2,000 users to date
• Membase Server serving over 3,000 Heroku customers

Social game leader: FarmVille, Mafia Wars, Café World
• Over 230 million monthly users
• Membase Server is the 500,000 ops-per-second database behind FarmVille and Café World

Page 12: Membase Introduction

After: Data layer scales like application logic layer

Data layer now scales with linear cost and constant performance.

Application Scales Out: Just add more commodity web servers

Database Scales Out: Just add more commodity data servers

Scaling out flattens the cost and performance curves.

Membase Servers

Page 13: Membase Introduction

Who?


Page 14: Membase Introduction

Fault-tolerant memcached Cluster at NHN, the biggest web portal in Korea

Page 15: Membase Introduction

What is Project Arcus?

• Memcached
– Common protocol across PHP, Java, C applications
• Moxi (Memcached proxy) based
• In-house automatic fault-detection and failover solution
• Collectd-based monitoring
• Proxy and cache server administration UI
• Private cloud service

Page 16: Membase Introduction

Previous Deployments

• A few individual memcached installations
• Problems
– No fault-tolerance
  • Hardware failures are common (heat, network switch failure, etc.)
– No automatic scalability
  • To add / remove a memcached server, they need to rebuild code, distribute, and restart all clients

Page 17: Membase Introduction

Today

• Memcached clusters
– Fault-tolerance transparent to clients
  • Consistent hashing in moxi (memcached proxy)
– Cache As A Service (CaaS)
  • All major services in NHN started using cache
  • Multitenancy across cache services
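The consistent hashing mentioned above can be illustrated with a small sketch. This is a toy ring, not moxi's actual implementation: the class name, the server names, the use of SHA-256, and the 100 points per server are all assumptions made for illustration. It shows the property that matters for fault tolerance and scaling: when a server is added, only the keys that land on the new server change owner.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Toy consistent-hash ring (illustrative only, not moxi's code)."""

    def __init__(self, servers, points_per_server=100):
        # Each server is placed on the ring at several pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(points_per_server)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        # SHA-256 is used here only to spread values evenly over the ring.
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def server_for(self, key):
        # Owner is the first ring point clockwise from the key's hash.
        index = bisect.bisect_right(self._points, self._hash(key))
        return self._ring[index % len(self._ring)][1]


# Hypothetical server names and keys.
keys = [f"user:{n}" for n in range(2000)]
three = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
four = ConsistentHashRing(["cache-a", "cache-b", "cache-c", "cache-d"])

moved = [k for k in keys if three.server_for(k) != four.server_for(k)]
fraction = len(moved) / len(keys)
print(f"{fraction:.0%} of keys changed owner after adding cache-d")
```

With plain `hash(key) % number_of_servers`, most keys would change owner when the server list changes, which is the client-rebuild problem described on the "Previous Deployments" slide.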

Page 18: Membase Introduction

Performance impact

Performance
• Throughput: x16.6
• Response time: x10

DB Load
• 50 %
• 34 %

Page 19: Membase Introduction

Membase-Cloudera Partnership

“AOL serves more than 5 billion impressions per day from our ad serving platforms, and any incremental improvement in processing time translates to huge benefits in our ability to more effectively serve the ads needed to meet our contractual commitments. Traditional databases like MySQL lack the scalability required to support our goal of five milliseconds per read/write. Creating user profiles with Hadoop, then serving them from Membase, reduces profile read and write access to under a millisecond, leaving the bulk of the processing time budget for improved targeting and customization.”

Pero Subasic, Chief Architect, AOL

Page 20: Membase Introduction

Membase-Cloudera Partnership

Joint development of bi-directional software integration between Membase and Hadoop
• Membase NodeCode Module streaming interface to Cloudera Distribution for Hadoop via Flume interface
• Sqoop-derived command line utility for bi-directional batch movement of data between Membase and Cloudera Distribution for Hadoop

Joint marketing and sales of integrated distributed OLTP-OLAP solution
• Membase: the distributed OLTP solution
• Cloudera: the distributed OLAP solution

Cloudera to distribute integration

Page 21: Membase Introduction

Customer use case – Ad targeting

[Figure: ad targeting data flow in three numbered steps; labels: events; profiles, campaigns; profiles, real time campaign statistics]

40 milliseconds to come up with an answer.

Page 22: Membase Introduction


Demo

Page 23: Membase Introduction
Page 24: Membase Introduction

The Guts

Photo Courtesy http://www.flickr.com/photos/pellis/76804760/


Page 25: Membase Introduction

Clustering

• Underlying cluster functionality based on Erlang OTP

• Have a custom, vector clock based way of storing and propagating...
– Cluster topology
– vBucket mapping

• Collect statistics from many nodes of the cluster
– Identify hot keys, resource utilization
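For readers unfamiliar with vector clocks, here is a minimal generic sketch of the idea. It is not Membase's custom implementation (which lives in the Erlang cluster layer and is not shown in the slides); the function names and node names are hypothetical. Each node keeps a counter per node, and two versions of a piece of configuration can be ordered only if one clock dominates the other.

```python
def increment(clock, node):
    """Return a copy of the clock with this node's counter advanced."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a, b):
    """Combine two clocks by taking the per-node maximum."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a, b):
    """Classify clock a relative to b: before, after, equal, or concurrent."""
    nodes = a.keys() | b.keys()
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"  # conflicting updates; needs resolution
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"


# Two nodes update the same starting configuration independently.
base = increment({}, "node1")
on_node1 = increment(base, "node1")
on_node2 = increment(base, "node2")

print(compare(base, on_node1))
print(compare(on_node1, on_node2))
print(merge(on_node1, on_node2))
```

A "concurrent" result is the interesting case: neither update saw the other, so the cluster has to reconcile them rather than simply keep the newer one.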

Page 26: Membase Introduction
Page 27: Membase Introduction
Page 28: Membase Introduction
Page 29: Membase Introduction
Page 30: Membase Introduction

vBucket mapping
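The slide content for vBucket mapping did not survive extraction, so the following is only a sketch of the general idea under stated assumptions: keys hash to a fixed number of vBuckets, and a separate table maps each vBucket to a server. The vBucket count of 1024, the CRC32 hash, the round-robin initial layout, and the node names are assumptions for illustration, not details confirmed by this deck.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed value for illustration


def vbucket_for(key, num_vbuckets=NUM_VBUCKETS):
    """Hash a key to a vBucket id; this never depends on the server list."""
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def build_map(servers, num_vbuckets=NUM_VBUCKETS):
    """Assign vBuckets to servers round-robin (hypothetical layout)."""
    return [servers[v % len(servers)] for v in range(num_vbuckets)]


def server_for(key, vbucket_map):
    """Look up the server that currently owns the key's vBucket."""
    return vbucket_map[vbucket_for(key, len(vbucket_map))]


vbucket_map = build_map(["node-a:11210", "node-b:11210"])
key = "user:42"
vb = vbucket_for(key)
owner_before = server_for(key, vbucket_map)

# Moving one vBucket to a new node is a single map update;
# the key still hashes to the same vBucket.
vbucket_map[vb] = "node-c:11210"
owner_after = server_for(key, vbucket_map)
print(vb, owner_before, owner_after)
```

The design point is the indirection: because the key-to-vBucket step is fixed, adding or removing nodes only changes the vBucket-to-server table that is propagated through the cluster.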


Page 31: Membase Introduction

TAP

• A generic, scalable method of streaming mutations from a given server
– As data operations arrive, they can be sent to arbitrary TAP receivers

• Leverages the existing memcached engine interface, and the non-blocking IO interfaces to send data

• Three modes of operation

[Figure: three modes of operation; labels: Working set, Data, Mutations]
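The core idea of TAP, streaming each mutation to whoever has registered interest, can be sketched in a few lines. This is an in-process toy, not the TAP wire protocol or the memcached engine interface: the `TapSource` class, the event tuples, and the `backfill` flag are hypothetical names chosen for illustration, and the sketch does not attempt to reproduce the three modes of operation from the slide.

```python
class TapSource:
    """Toy key-value store that streams its mutations to receivers."""

    def __init__(self):
        self._data = {}
        self._receivers = []

    def register(self, receiver, backfill=False):
        # Optionally replay existing items before streaming new mutations.
        if backfill:
            for key, value in self._data.items():
                receiver(("set", key, value))
        self._receivers.append(receiver)

    def set(self, key, value):
        self._data[key] = value
        self._publish(("set", key, value))

    def delete(self, key):
        self._data.pop(key, None)
        self._publish(("delete", key, None))

    def _publish(self, event):
        # Every registered receiver sees every mutation, in order.
        for receiver in self._receivers:
            receiver(event)


source = TapSource()
source.set("a", 1)

replica = {}


def apply_to_replica(event):
    op, key, value = event
    if op == "set":
        replica[key] = value
    else:
        replica.pop(key, None)


log = []
source.register(apply_to_replica, backfill=True)  # e.g. a replica
source.register(log.append)                       # e.g. an audit stream

source.set("b", 2)
source.delete("a")
print(replica, log)
```

The two receivers here stand in for the kinds of consumers a mutation stream enables, such as replication or export to another system; the real mechanism sends the events over the network using non-blocking IO.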

Page 32: Membase Introduction

Disk > Memory

Bucket Configuration

mem_high_wat

mem_low_wat

memory quota

The dataset may have many infrequently accessed items. However, memcached's behavior (LRU) is different from what is wanted with membase.

Still, most traditional RDBMS implementations are not 100% right for us either. The speed of a miss is very, very important.
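How a high and a low watermark might interact when the dataset is larger than memory can be sketched as follows. This is a simplified model, not Membase's engine: the class, the dictionary standing in for disk, and the oldest-first ejection order are assumptions for illustration (the slide only says that memcached's LRU behavior is not what is wanted, without stating the actual policy). Every item is kept on "disk"; once memory use passes `mem_high_wat`, values are ejected from memory until use drops to `mem_low_wat`.

```python
class WatermarkedStore:
    """Toy store: all items on 'disk', a subset resident in memory."""

    def __init__(self, mem_high_wat, mem_low_wat):
        self.mem_high_wat = mem_high_wat
        self.mem_low_wat = mem_low_wat
        self.disk = {}    # stand-in for persistent storage
        self.memory = {}  # resident values only
        self.mem_used = 0

    def set(self, key, value):
        self.disk[key] = value
        if key in self.memory:
            self.mem_used -= len(self.memory[key])
        self.memory[key] = value
        self.mem_used += len(value)
        if self.mem_used > self.mem_high_wat:
            self._eject_until_low_watermark()

    def _eject_until_low_watermark(self):
        # Placeholder policy: eject the oldest-inserted values first.
        for key in list(self.memory):
            if self.mem_used <= self.mem_low_wat:
                break
            self.mem_used -= len(self.memory.pop(key))

    def get(self, key):
        if key in self.memory:
            return self.memory[key]  # fast path: value is resident
        return self.disk.get(key)    # slow path: a miss goes to disk


store = WatermarkedStore(mem_high_wat=30, mem_low_wat=20)
for i in range(4):
    store.set(f"k{i}", b"x" * 10)

print(sorted(store.memory), store.mem_used, store.get("k0"))
```

Ejecting down to a lower watermark, rather than stopping just under the high one, avoids triggering ejection again on the very next write. The slow path in `get` is the "speed of a miss" that the slide calls out as very important.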

Page 33: Membase Introduction

Clients, nodes and other nodes

[Diagram labels]
• Node components: ns_server, moxi, membase (memcached + membase engine), vbucketmigrator
• Client: port 11211, memcached operations
• moxi + Client: port 11210, memcached operations
• REST/comet: cluster topology and vbucket map
• vbucketmigrator: TAP, memcached operations with tap commands
• memcached operations

Page 34: Membase Introduction