Building Cloud Services with Riak - erlang-factory.com€¦ · Building Cloud Services with Riak...

28
Building Cloud Services with Riak Andy Gross, @argv0 Principal Architect, Basho Technologies Erlang Factory SF 2012 Monday, April 2, 12

Transcript of Building Cloud Services with Riak - erlang-factory.com€¦ · Building Cloud Services with Riak...

Building Cloud Services with RiakAndy Gross, @argv0Principal Architect, Basho TechnologiesErlang Factory SF 2012

Monday, April 2, 12

What is a cloud service?

Web scale (seriously)

highly available

globally distributed

horizontally scalable

operationally simple

Monday, April 2, 12

Riak is Web Scale

Web scale (seriously)

highly available

globally distributed

horizontally scalable

operationally simple

Monday, April 2, 12

NoSQL Complexity

Quorum controls: R, W, DW, PW, PR

Backend choices

Unfamiliar query model

Client libraries sometimes immature

Difficult to sell!

Monday, April 2, 12

Use Riak as foundation for simpler, “vertical” services (like cloud storage).

Monday, April 2, 12

Vertical Services

Abstract away NoSQL complexity

Provide simpler APIs

Have fewer configuration knobs

Have existing client libraries (sometimes)

Easier to sell!

Monday, April 2, 12

Riak Service Patterns

Riak Core Application

Stateless Proxy

Clustered Proxy

Monday, April 2, 12

Riak Extension Points

foo_vnode foo_ring|node_watcher

foo_db

foo_fsm foo_consolefoo_api

Riak Core

commit hooks custom kv backends

custom hash functions

Monday, April 2, 12

Riak Core Application

Usually consists of a custom vnode implementation and client API

Usually requires FSM for coordination with vnodes

Example:

https://github.com/jbrisbin/misultin-riak-core-vnode-dispatcher

Monday, April 2, 12

Stateless Proxy

Implemented as client of Riak KV

Deployed in separate VM

Uses Riak KV for all state storage

Proxies have no knowledge of each other

Scale independently from Riak

Monday, April 2, 12

Clustered Proxy

Use Riak Core at proxy layer for:

clustering

load balancing

distribution of proxy state

Monday, April 2, 12

Example:

Monday, April 2, 12

Riak CS

Announced on Tuesday

S3-compatible cloud storage backed by Riak

Follows “Stateless Proxy” pattern

Riak CS

Riak CS

Riak CS

RiakEDS

RiakEDS

RiakEDS

Monday, April 2, 12

Riak CS Overview

Implements S3 API via webmachine

Large files come in through API, 1+ Riak Objects written:

manifest: file metadata

chunks: statically sized chunks of large file

Monday, April 2, 12

What worked well?

Monday, April 2, 12

Riak KV, Riak Core!

No code modifications required to build Riak CS

Resulting service inherits all of Riak’s webscale

Monday, April 2, 12

Tools

Erlang

Rebar

Quickcheck

Webmachine

Other Basho Open Source projects

Monday, April 2, 12

Process

Started as a prototype

Iterated quickly with a beta customer

Shipped frequently

Small, close team, grown slowly

Monday, April 2, 12

What sucked?

Monday, April 2, 12

Connection Pooling

Just as hard as caching and naming

# incoming connections > # connection capacity of cluster

Started with naive approach

Outsourced to proxy software

Wrote proper connection pool

Monday, April 2, 12

Conflict Resolution Is Hard

Implementation of conflict-handling code can be very tricky

Required for high availability

CRDTs may help

QuickCheck saves the day, as always

Monday, April 2, 12

Lack Of Strong Consistency

Some S3 operations need to be atomic

Riak can’t do this

Implemented a stopgap solution with less-than-ideal availability properties

Monday, April 2, 12

Customer Environments

Everything besides Riak and Riak CS

Software != Service

Planning

Provisioning

Deployment

Monitoring

Monday, April 2, 12

What may suck soon?

Monday, April 2, 12

Large Erlang Clusters

~150 works well in Riak deployments

Not sure how much further we can go

Approaches:

shard among many Riak clusters

investigate new distribution protocol

Monday, April 2, 12

Storage Costs

3x replication per datacenter gets expensive

Erasure coding is a possibility

Smarter global replication

notion of “home cluster” with 3 copies, others hav e 1 or 2

Monday, April 2, 12

Conclusions

Riak makes a perfect foundation for large scale internet services

Basho will make more of these

Lots of work to do on the environments riak/riak cs runs in

Monday, April 2, 12

Questions?

Monday, April 2, 12