Building RightScale's Globally Distributed Datastore - RightScale Compute 2013

april25-26 sanfrancisco

cloud success starts here

Building RightScale’s Globally Distributed Datastore

Josep M. Blanquer, Chief Architect



Page 2

#RightscaleCompute

In this talk…

• Intro
• Data Taxonomy
• Data Storage Design
• Scale, HA and DR considerations
• Conclusion

Page 3

Intro: Expectations and scope

What this is and what is not
• IS a talk about:
  • how RightScale has designed and implemented its backing datastores
  • …for a few of the most representative internal systems
  • …with the rationale behind it
• IS NOT a talk about:
  • RightScale’s overall architecture
  • Nodes or hosts; it’s about Systems
  • RightScale’s data modeling

Page 4

Intro: Tools and Technologies
• RightScale uses a mix of RDBMS and NoSQL technologies:
  • MySQL, Cassandra and S3 (for backups and archiving)
• Transactionality:
  • MySQL: strong ACID properties
  • Cassandra: no Atomicity, eventually Consistent, some Isolation, Durable
• Availability:
  • MySQL: async replication, Master-SlaveN or Master-Master
  • Cassandra: distributed, master-less, highly replicated (multi-DC)
• Queryability:
  • MySQL: extremely flexible at adding indexes and changing the data model
  • Cassandra: more difficult to change the querying patterns
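Cassandra’s consistency is tunable per request. As a rough illustration of the "eventually Consistent" entry above (this is the general Dynamo-style quorum rule, not something the talk itself states), a read is guaranteed to overlap the latest acknowledged write only when the read and write replica counts satisfy R + W > N. A minimal sketch; the function names are hypothetical:

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of N replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n: int, r: int, w: int) -> bool:
    """True when every set of r replicas must intersect every set of w replicas."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


n = 3
print(reads_see_latest_write(n, 1, 1))                  # False: ONE/ONE may read stale data
print(reads_see_latest_write(n, quorum(n), quorum(n)))  # True: QUORUM/QUORUM overlap (2 + 2 > 3)
```

Lower consistency levels keep requests available while some replicas are down, at the cost of possibly stale reads; MySQL’s single master avoids that question for writes but makes the master a availability bottleneck.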

Page 5

Taxonomy of RightScale’s Data

Representative systems with different data semantics:

• Global Objects
• Marketplace Assets
• Dashboard Objects
• Audits
• Tags
• Recent Events
• Cloud Polling Data
• Routing Data
• Monitoring/Syslog

Page 6

Taxonomy of RightScale’s Data

Global Objects and Marketplace Assets: common across accounts
• Users
• Plans
• Settings
• MultiCloud Marketplace: Published Assets, Sharing Groups, …

Page 7

Taxonomy of RightScale’s Data

Dashboard Objects, Audits, Tags and Recent Events: private to each account
• Deployments
• Imported assets
• Alert Specifications
• Server Inputs
• Audit
• Tags
• User Events
• …

Page 8

Taxonomy of RightScale’s Data

Cloud Polling Data: private to each account
• Cloud resource states (cache)
• Cloud credentials

Page 9

Taxonomy of RightScale’s Data

Routing Data: private to each account
• Instance agents location
• Core agents location
• Agent action registry
• …

Page 10

Taxonomy of RightScale’s Data

Monitoring/Syslog: private to each account
• Collected metric data
• Collected syslog data
• …

Page 11

Taxonomy of RightScale’s Data

[Diagram: the same representative systems, from Global Objects to Monitoring/Syslog, arranged between Users and Instances]

Who uses the data?
• Users, through the Dashboard/API
• Instances, from the Cloud

Data Placement: data close to the Users vs. data close to the Cloud

Page 12

Taxonomy of RightScale’s Data

[Diagram: the same representative systems arranged between X-Account and Account]

Which data do we need?
• Data for all accounts
• Data for a single account

Data Scope and Containment: data shared between accounts vs. data required within the scope of a single account

Page 13

Taxonomy of RightScale’s Data

[Diagram: the same representative systems plotted on two axes, Users vs. Instances and X-acct vs. Account]

• Who uses the data? Proximity to User vs. Cloud
• Which data do we need? Scope of data available

• Close to cloud resources, account-shardable* data
• Close to user, account-shardable data
• Close to user, globally accessible data
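The two questions work like a small decision table. The sketch below encodes it, assuming (from how the later slides place these systems) that Global Objects and Marketplace Assets are globally accessible, the dashboard-side systems are account-shardable and close to users, and the cloud-facing systems are account-shardable and close to cloud resources. The "cluster" and "island" labels anticipate the Clusters and Islands of the following slides; all class and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Proximity(Enum):
    USER = "close to user"
    CLOUD = "close to cloud resources"


class Scope(Enum):
    GLOBAL = "globally accessible"
    ACCOUNT = "account-shardable"


@dataclass(frozen=True)
class DataSystem:
    name: str
    proximity: Proximity  # who uses the data?
    scope: Scope          # which data do we need?


def placement(system: DataSystem) -> str:
    """Map the two taxonomy answers onto a deployment unit."""
    if system.scope is Scope.GLOBAL:
        return "global"   # shared across accounts, one copy for everyone
    if system.proximity is Proximity.USER:
        return "cluster"  # sharded by account group, near the Dashboard/API
    return "island"       # sharded by cloud, next to the instances


# Assumed classification, inferred from the later slides.
SYSTEMS: List[DataSystem] = [
    DataSystem("Global Objects", Proximity.USER, Scope.GLOBAL),
    DataSystem("Marketplace Assets", Proximity.USER, Scope.GLOBAL),
    DataSystem("Dashboard Objects", Proximity.USER, Scope.ACCOUNT),
    DataSystem("Audits", Proximity.USER, Scope.ACCOUNT),
    DataSystem("Tags", Proximity.USER, Scope.ACCOUNT),
    DataSystem("Recent Events", Proximity.USER, Scope.ACCOUNT),
    DataSystem("Cloud Polling Data", Proximity.CLOUD, Scope.ACCOUNT),
    DataSystem("Routing Data", Proximity.CLOUD, Scope.ACCOUNT),
    DataSystem("Monitoring/Syslog", Proximity.CLOUD, Scope.ACCOUNT),
]

for s in SYSTEMS:
    print(f"{s.name:<20} {placement(s)}")
```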

Page 14

[Diagram: the data placement axes, Users vs. Instances and Account vs. X-Account]

Page 15

[Diagram: the data placement axes (Users vs. Instances, Account vs. X-Account), with the "global" store in the X-Account region]

Custom replication

Why custom? More control:
• Multiple sources
• Individual columns
• Apply transformations
• Smart re-sync features

Global: MySQL
• ACID semantics
• Master-Slave replication
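To make the "more control" points concrete, here is a minimal sketch of column-level replication with a transformation step and a checksum-based re-sync that rewrites only rows that drifted. It is not RightScale’s replicator: the row layout, the `resync` helper and the e-mail lowercasing transform are hypothetical.

```python
import hashlib
import json
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, object]
Transform = Optional[Callable[[Row], Row]]


def project(row: Row, columns: Iterable[str], transform: Transform = None) -> Row:
    """Keep only the replicated columns, then apply an optional transformation."""
    out = {c: row[c] for c in columns if c in row}
    return transform(out) if transform else out


def checksum(row: Row) -> str:
    """Stable digest of a row; across sites only these digests need to be compared."""
    payload = json.dumps(row, sort_keys=True, default=str).encode()
    return hashlib.sha256(payload).hexdigest()


def resync(source: Dict[int, Row], replica: Dict[int, Row], columns: List[str],
           transform: Transform = None) -> List[int]:
    """Rewrite only drifted or missing rows, delete rows gone from the source,
    and return the ids that were touched."""
    touched: List[int] = []
    for row_id, row in source.items():
        wanted = project(row, columns, transform)
        current = replica.get(row_id)
        if current is None or checksum(current) != checksum(wanted):
            replica[row_id] = wanted
            touched.append(row_id)
    for row_id in list(replica):
        if row_id not in source:
            del replica[row_id]
            touched.append(row_id)
    return touched


def lowercase_email(row: Row) -> Row:
    return {**row, "email": str(row["email"]).lower()}


users = {1: {"id": 1, "email": "A@Example.com", "password_hash": "x"}}
replica: Dict[int, Row] = {}
print(resync(users, replica, ["id", "email"], lowercase_email))  # [1]
print(replica)  # {1: {'id': 1, 'email': 'a@example.com'}}
```

Note that `password_hash` never leaves the source: choosing individual columns is what lets a global table feed replicas that should only see part of it.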

Page 16

[Diagram: the data placement axes showing the global, dash, events, tags, audit and S3 stores]

Dashboard: MySQL
• ACID semantics
• Master-SlaveN replication
• Slave reads
• Rows tagged by account

Other systems: Cassandra
• Simpler key-value access
• Great scalability
• Great replica control
• High write availability
• Time-to-live expiration as cache
• Rows tagged by account

Data archive: S3
• Low read rate
• Globally accessible
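Two of the Cassandra points above, rows tagged by account and time-to-live expiration as a cache, can be modelled in a few lines. This is a toy in-memory stand-in, not a Cassandra client: in Cassandra itself the TTL is set per write (for example CQL’s `USING TTL`), and the `TTLStore` class and its methods are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Every row key carries the account it belongs to ("rows tagged by account").
Key = Tuple[int, str]


class TTLStore:
    """Toy key-value store with account-tagged rows and per-write TTLs."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._rows: Dict[Key, Tuple[object, Optional[float]]] = {}

    def put(self, account_id: int, key: str, value: object,
            ttl_seconds: Optional[float] = None) -> None:
        """Store a value; with a TTL it expires like a cache entry."""
        expires = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._rows[(account_id, key)] = (value, expires)

    def get(self, account_id: int, key: str) -> Optional[object]:
        entry = self._rows.get((account_id, key))
        if entry is None:
            return None
        value, expires = entry
        if expires is not None and self._clock() >= expires:
            del self._rows[(account_id, key)]  # expired rows are dropped lazily
            return None
        return value

    def rows_for_account(self, account_id: int) -> Dict[str, object]:
        """All live rows of one account, e.g. to move that account elsewhere."""
        result: Dict[str, object] = {}
        for acct, key in list(self._rows):
            if acct == account_id:
                value = self.get(acct, key)
                if value is not None:
                    result[key] = value
        return result


now = [0.0]  # fake clock so expiry is deterministic
store = TTLStore(clock=lambda: now[0])
store.put(42, "event:1", "server launched", ttl_seconds=60)
store.put(42, "tag:env", "prod")
now[0] = 61.0
print(store.get(42, "event:1"))  # None: the event aged out
```

Tagging every row with its account is also what later makes whole accounts movable between clusters.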

Page 17

[Diagram: the data placement map, with the dash, events, tags and audit stores repeated alongside global and S3]

So we can horizontally scale our dashboard by partitioning objects based on account groups: Clusters

Page 18

[Diagram: RightScale Accounts divided into Account Sets (Account Set 1, Account Set 2, …), each homed on a Cluster (Cluster 1, Cluster 3, …, Cluster N) with its own dash, events, tags, audit and S3 stores; clusters shown in US East, EU and Japan]

Features:
• 1 cluster: N accounts
• 1 account: 1 home
• Migratable accounts

Benefits:
• Great horizontal growth
• Better failure isolation
• Independent scale
• Load rebalancing
• Versionable code
• Differentiated service
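The cluster features (N accounts per cluster, one home per account, migratable accounts) amount to a directory that maps each account to exactly one cluster. A minimal sketch, assuming a single in-memory directory; the `ClusterDirectory` class and the cluster names are hypothetical:

```python
from typing import Dict, Set


class ClusterDirectory:
    """Each account has exactly one home cluster; a cluster hosts many accounts."""

    def __init__(self, clusters: Set[str]) -> None:
        self.clusters = set(clusters)
        self._home: Dict[int, str] = {}

    def assign(self, account_id: int, cluster: str) -> None:
        if cluster not in self.clusters:
            raise KeyError(f"unknown cluster {cluster!r}")
        if account_id in self._home:
            raise ValueError(f"account {account_id} already has a home; use migrate()")
        self._home[account_id] = cluster

    def home_of(self, account_id: int) -> str:
        return self._home[account_id]

    def accounts_in(self, cluster: str) -> Set[int]:
        return {a for a, c in self._home.items() if c == cluster}

    def migrate(self, account_id: int, target: str) -> str:
        """Point the account at a new home and return the old one. A real migration
        would first copy the account-tagged rows, then flip this pointer."""
        if target not in self.clusters:
            raise KeyError(f"unknown cluster {target!r}")
        old = self._home[account_id]
        self._home[account_id] = target
        return old


directory = ClusterDirectory({"cluster-1", "cluster-2"})
for account, cluster in [(1, "cluster-1"), (2, "cluster-1"), (3, "cluster-2")]:
    directory.assign(account, cluster)
directory.migrate(2, "cluster-2")  # rebalance load off cluster-1
print(directory.home_of(2))  # cluster-2
```

Keeping the account as the unit of placement is what turns load rebalancing into a copy of one account’s rows plus a pointer change.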

Page 19

[Diagram: the data placement map, adding the routing, polling and monitor stores to global, dash, events, tags, audit and S3]

Page 20

[Diagram: the data placement map, with the routing, polling and monitor stores repeated per cloud]

And we can partition our cloud objects based on the cloud that an account’s instances run on: Islands

Page 21

[Diagram: Island 1, Island 2, …, Island N, each with its own routing, polling and monitor services co-located with the resources of Cloud 1, Cloud 2, …, Cloud N, where the account’s instances run]

Services co-located with resources

Page 22

Features:
• 1 instance: 1 home island
• 1 island can serve N clouds
• Core agents: global data

Benefits:
• Close to cloud resources
• Good failure isolation
  • As good as the cloud
• Good scale: global replicas across Cassandra DCs
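Islands invert the cluster idea: placement follows the cloud an instance runs on rather than its account group. A minimal sketch, assuming each cloud is served by exactly one island (the slide only states that one island can serve N clouds); the island and cloud names are hypothetical:

```python
from typing import Dict, List


def build_cloud_index(islands: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert island -> clouds into cloud -> island (one island can serve N clouds)."""
    index: Dict[str, str] = {}
    for island, clouds in islands.items():
        for cloud in clouds:
            if cloud in index:  # assumption: a cloud has exactly one serving island
                raise ValueError(f"cloud {cloud!r} claimed by {index[cloud]} and {island}")
            index[cloud] = island
    return index


def home_island(instance_cloud: str, cloud_index: Dict[str, str]) -> str:
    """1 instance: 1 home island, namely the island serving the instance's cloud."""
    return cloud_index[instance_cloud]


islands = {
    "island-1": ["aws-us-east", "aws-us-west"],
    "island-2": ["azure"],
    "island-3": ["private-cloud"],
}
index = build_cloud_index(islands)
print(home_island("aws-us-west", index))  # island-1
```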

[Diagram: two sets of Islands 1 to N, each island with its own routing, polling and monitor stores]

Polling Clouds: MySQL
• Master-Slave replication
• Can port to NoSQL easily
• Mostly a resource cache
• But cloud-partitionable

Monitoring: Custom
• Replicated files
• Backup to S3
• Archive to S3

Routing: Cassandra
• Simpler key-value access
• Very high availability
• Great scalability
• Great replica control
• Plus cross-DC replication*
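For the cross-DC replication noted for routing data, the replica arithmetic is easy to check. The sketch assumes Cassandra’s `NetworkTopologyStrategy` (which sets a replication factor per data center) and requests served at `LOCAL_QUORUM`; the DC names and factors are hypothetical:

```python
from typing import Dict


def local_quorum(rf: int) -> int:
    """Replicas that must answer in the local DC for LOCAL_QUORUM: floor(rf / 2) + 1."""
    return rf // 2 + 1


def tolerated_failures(dc_replication: Dict[str, int]) -> Dict[str, int]:
    """Per DC: replicas of a row that can be down while LOCAL_QUORUM still succeeds there."""
    return {dc: rf - local_quorum(rf) for dc, rf in dc_replication.items()}


def total_replicas(dc_replication: Dict[str, int]) -> int:
    """Copies of each row across all DCs."""
    return sum(dc_replication.values())


routing_rf = {"us-east": 3, "eu-west": 3, "ap-northeast": 3}  # hypothetical DCs
print(total_replicas(routing_rf))      # 9
print(tolerated_failures(routing_rf))  # {'us-east': 1, 'eu-west': 1, 'ap-northeast': 1}
```

Because `LOCAL_QUORUM` only waits on replicas in the coordinator’s DC, an unreachable remote DC does not block local reads and writes; the remote copies catch up later through hinted handoff or repair.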

Page 23

[Diagram: Clusters (Cluster 1, Cluster 3, …, Cluster N) and Islands (Island 1, Island 2, …, Island N) spread across different geographies (US East, West EU, Japan) and different clouds (Azure, AWS East, Private)]

What if the cloud a cluster is deployed on fails?

What if the cloud an island is deployed on fails?

Page 24

[Diagram: the clusters and islands across US East, West EU, Japan and Azure, AWS East, Private, with each cluster paired with a full replica]

Sister Clusters

Features:
• Each master has an extra remote slave
• Each cluster in a pair is a DC replica of the other’s local ring

At Disaster Recovery time:
• Apps are told to start serving an extra shard
• No need to provision more infrastructure to recover (best avoided anyway, since everybody is in the same boat)
• New resources can be allocated over time to help offload existing ones
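The failover step, apps being told to serve an extra shard, can be sketched as a serving map over sister pairs. This is an illustration under assumptions, not RightScale’s tooling: the `SisterPairs` class and cluster names are hypothetical, and it assumes the sister already holds a full, current replica, as the slide describes.

```python
from typing import Dict, Set


class SisterPairs:
    """Which cluster serves which account shard, for clusters paired as full replicas."""

    def __init__(self, pairs: Dict[str, str]) -> None:
        # Symmetric pairing: if a's sister is b, then b's sister is a.
        self.sister: Dict[str, str] = {}
        for a, b in pairs.items():
            self.sister[a] = b
            self.sister[b] = a
        # Normal operation: every cluster serves only its own shard.
        self.serving: Dict[str, Set[str]] = {c: {c} for c in self.sister}

    def fail_over(self, failed: str) -> str:
        """Tell the failed cluster's sister to serve its shard(s) too.
        No new hosts are provisioned: the sister already holds the replica."""
        if failed not in self.serving:
            raise ValueError(f"{failed} is not currently serving")
        survivor = self.sister[failed]
        if survivor not in self.serving:
            raise RuntimeError(f"sister {survivor} is down too; new capacity is needed")
        self.serving[survivor] |= self.serving.pop(failed)
        return survivor

    def server_for(self, shard: str) -> str:
        for cluster, shards in self.serving.items():
            if shard in shards:
                return cluster
        raise LookupError(f"no cluster is serving shard {shard!r}")


pairs = SisterPairs({"cluster-1": "cluster-2", "cluster-3": "cluster-4"})
pairs.fail_over("cluster-1")
print(pairs.server_for("cluster-1"))  # cluster-2
```

The design trade-off is spare capacity: each sister must be able to absorb its partner’s load, in exchange for not competing for new infrastructure while a whole region is down.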

Page 25

Conclusions
• Shown that RightScale uses multiple database technologies:
  • RDBMS: MySQL for the ACID semantics and ‘queryability’
    • Using a Master to N-Slaves for RO scale and quick failure recovery
    • And ReadOnly Provisioning: to increase RO availability and scale remote systems
  • NoSQL: Cassandra for Availability and Scalability
    • For higher Read/Write availability within a cluster
    • For fully replicated regions across the globe (for Read/Write!)
• Shown how RightScale uses them in different techniques:
  • It partitions resource data into Islands based on cloud proximity
    • Can achieve in-cloud polling, and keep monitoring/syslog data storage next to instances
    • Can provide routing availability, colocated with instances for any world region
  • It partitions core data into Clusters based on account groups
    • To scale the core horizontally and independently, and achieve account isolation/differentiation
    • Enhances fault isolation: assigning accounts to Clusters deployed away from their cloud resources
  • It maintains cluster pairs (sister sites)
    • To recover from full cloud region failures
    • It doesn’t require massive amounts of new resources to recover

Page 26

Questions?