Riak at The NYC Cloud Computing Meetup Group

A Walk Down NOSQL Lane in the Cloud Part 2: Riak NYC Cloud Computing Group, March 2011 Alexander Sicular @siculars

An in-depth look at the NoSQL product Riak.

Page 1: Riak at The NYC Cloud Computing Meetup Group

A Walk Down NOSQL Lane in the Cloud

Part 2: Riak

NYC Cloud Computing Group, March 2011

Alexander Sicular @siculars

Page 2: Riak at The NYC Cloud Computing Meetup Group

Who is this blowhard?

Columbia University pays my mortgage

For the better part of a decade in Medical Informatics

Am not shilling for any of these companies

Am not a computer scientist

Am a computer science enthusiast particularly in the area of Informatics

Page 3: Riak at The NYC Cloud Computing Meetup Group

Riak, eh?

Dynamo inspired

Homogeneous

Single key-space

Distributed

Replicated

Predictable scalability

Page 5: Riak at The NYC Cloud Computing Meetup Group

CAP Theorem

http://en.wikipedia.org/wiki/CAP_theorem

Consistency

Availability

Partition tolerance

http://guide.couchdb.org/draft/consistency.html

Pick two?

Riak says: pick two at a time.

Page 6: Riak at The NYC Cloud Computing Meetup Group

Homogeneous

Every node is the same

Any node can service any request

Nodes gossip on their own port

Page 7: Riak at The NYC Cloud Computing Meetup Group

One Ring to Rule Them

Single 160-bit key space

Huh?

No Sharding!
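A minimal sketch of why a single 160-bit key space removes the need for manual sharding: hash each bucket/key pair with SHA-1 (160 bits) and cut the hash range into equal partitions. The function name `partitionFor` and the `bucket/key` string form are assumptions for illustration; Riak hashes its own internal encoding of the pair, so real partition numbers will differ.

```javascript
const crypto = require("crypto");

// Size of the SHA-1 key space: 2^160 positions on the ring.
const RING_TOP = 2n ** 160n;

// Map a bucket/key pair to one of `ringSize` equal partitions.
// Assumption: hashing the string "bucket/key" stands in for Riak's
// internal encoding. ringSize is expected to be a power of two.
function partitionFor(bucket, key, ringSize = 64) {
  const digest = crypto
    .createHash("sha1")
    .update(`${bucket}/${key}`)
    .digest("hex");
  const position = BigInt("0x" + digest); // 0 .. 2^160 - 1
  const partitionWidth = RING_TOP / BigInt(ringSize);
  return Number(position / partitionWidth); // 0 .. ringSize - 1
}

console.log(partitionFor("myBucket", "myKey"));
```

Every node can run the same calculation, so any node can work out where a key lives without consulting a shard map.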

Page 8: Riak at The NYC Cloud Computing Meetup Group

Distributed (!= replicated)

riak is not sharded

vnodes = units of distribution

vnodes != physical nodes (pnodes)

vnodes map to pnodes

data is distributed at the vnode level

★Considerations:

-must plan maximum ring size

-think about number of vnodes per pnode

-generally no fewer than 10 vnodes per pnode
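The planning considerations above come down to simple arithmetic, sketched here. Round-robin assignment is an assumption made to keep the example short; it is not presented as Riak's actual claim algorithm, and the node names are hypothetical.

```javascript
// Assign each vnode (ring partition) to a physical node in round-robin order.
// Assumption: round-robin is a simplification of how vnodes are claimed.
function assignVnodes(ringSize, pnodes) {
  const owners = [];
  for (let vnode = 0; vnode < ringSize; vnode++) {
    owners.push(pnodes[vnode % pnodes.length]);
  }
  return owners;
}

// Count how many vnodes each physical node owns.
function vnodesPerPnode(owners) {
  const counts = {};
  for (const pnode of owners) counts[pnode] = (counts[pnode] || 0) + 1;
  return counts;
}

// A 64-partition ring on 5 nodes gives each node 12 or 13 vnodes.
// The same ring on 8 nodes gives 8 each, below the guideline of 10.
const counts = vnodesPerPnode(assignVnodes(64, ["a", "b", "c", "d", "e"]));
console.log(counts);
```

Because the ring size is fixed when the cluster is created, it caps how many physical nodes the cluster can grow to while staying above the guideline.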

Page 9: Riak at The NYC Cloud Computing Meetup Group

Conflict Resolution

Vector Clocks

ancestry / divergence maintained

automatic or manual resolution

★ Considerations:

X-Riak-ClientId,

X-Riak-Vclock

allow_mult
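How vector clocks separate ancestry from divergence can be sketched with plain objects. This is a toy model: the clocks Riak returns in X-Riak-Vclock are opaque encoded values, and the function names and client ids here are assumptions for illustration.

```javascript
// A vector clock here is a plain object: { clientId: counter, ... }.

// Record one update made by `clientId`.
function increment(clock, clientId) {
  return { ...clock, [clientId]: (clock[clientId] || 0) + 1 };
}

// True if clock `a` has seen every update that clock `b` has seen.
function descends(a, b) {
  return Object.keys(b).every((id) => (a[id] || 0) >= b[id]);
}

// Classify two versions of the same object.
function compare(a, b) {
  const aDescendsB = descends(a, b);
  const bDescendsA = descends(b, a);
  if (aDescendsB && bDescendsA) return "equal";
  if (aDescendsB) return "a is newer";
  if (bDescendsA) return "b is newer";
  return "conflict"; // divergent histories: neither version wins on its own
}

const base = increment({}, "alice");
const fromAlice = increment(base, "alice");
const fromBob = increment(base, "bob");
console.log(compare(fromAlice, base));    // one version descends from the other
console.log(compare(fromAlice, fromBob)); // both edited the same ancestor
```

In the "conflict" case the two versions have to be reconciled, either automatically or by the application, which is the choice the slide's considerations refer to.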

Page 10: Riak at The NYC Cloud Computing Meetup Group

Replicated (!= distributed)

configurable replication values (“N”)

configurable consistency and availability values at read and write time

- read

- write

- durable write
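The trade-off between these values can be sketched as classic quorum arithmetic. This is a simplified model that assumes strict quorums and ignores failure-handling details; the function name `quorumProfile` is hypothetical.

```javascript
// n = replicas per object, r = replicas that must answer a read,
// w = replicas that must acknowledge a write.
function quorumProfile(n, r, w) {
  return {
    // Under strict quorums, read and write sets overlap when r + w > n.
    readsOverlapWrites: r + w > n,
    // Replicas that can be unavailable while the request still succeeds.
    readFailuresTolerated: n - r,
    writeFailuresTolerated: n - w,
  };
}

console.log(quorumProfile(3, 2, 2)); // leans toward consistency
console.log(quorumProfile(3, 1, 1)); // leans toward availability
```

Lower values favor availability and latency, and higher values favor consistency, which is how the two can be traded off per request.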

Page 11: Riak at The NYC Cloud Computing Meetup Group

Predictable Scalability

How much performance per node?

Scale in both directions

> bin/riak-admin

> Usage: riak-admin { join | leave | backup | restore | test | status | reip | js_reload | wait-for-service | ringready | transfers }

Page 12: Riak at The NYC Cloud Computing Meetup Group

Data Agnostic

schemaless

data objects may be of any type

binary, text (json, xml)

use content types

>curl -v -d 'this is a test' -H "Content-Type: text/plain" \
http://127.0.0.1:8098/riak/testBucket/testKey

Page 13: Riak at The NYC Cloud Computing Meetup Group

Extra Goodies

Erlang

http://www.pragprog.com/titles/jaerlang/programming-erlang

Code Architecture

basho_bench

Multiple backends

bitcask, innodb, mem

Page 14: Riak at The NYC Cloud Computing Meetup Group

Code architecture

Highly modularized

riak_core

riak_kv

bitcask

erlang_js

http://bitbucket.org/basho

Page 15: Riak at The NYC Cloud Computing Meetup Group

basho_bench

Performance profiling

highly customizable

pretty pictures

key/value store generalized

https://wiki.basho.com/display/RIAK/Benchmarking+with+Basho+Bench

http://pics.livejournal.com/demmonoid/pic/00001sa7

Page 16: Riak at The NYC Cloud Computing Meetup Group

Bitcask

Riak’s default disk backend

Append-only ("write only") log

Heavy updates will grow your footprint

- Look into compaction/merging settings

Keys are cached in memory with disk offsets

https://spreadsheets.google.com/ccc?key=0Ak4OBkABJPsxdEowYXc2akxnYU9xNkJmbmZscnhaTFE&hl=en&authkey=CMHw8tYO
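A toy model of the points above, not Bitcask's actual file format: every write is appended to a log, an in-memory map holds each key's offset, and a merge step reclaims space taken by stale values. The class name `ToyLogStore` and its methods are hypothetical.

```javascript
// Toy append-only store: every put appends to the log, and an in-memory
// "keydir" maps each key to the offset and size of its latest value.
class ToyLogStore {
  constructor() {
    this.log = Buffer.alloc(0); // stands in for the data file on disk
    this.keydir = new Map();    // key -> { offset, size }
  }

  put(key, value) {
    const bytes = Buffer.from(value, "utf8");
    this.keydir.set(key, { offset: this.log.length, size: bytes.length });
    this.log = Buffer.concat([this.log, bytes]); // older values stay in the log
  }

  get(key) {
    const entry = this.keydir.get(key);
    if (!entry) return undefined;
    return this.log.toString("utf8", entry.offset, entry.offset + entry.size);
  }

  // Merge: rewrite the log keeping only the live value for each key.
  merge() {
    const live = [...this.keydir.keys()].map((key) => [key, this.get(key)]);
    this.log = Buffer.alloc(0);
    this.keydir.clear();
    for (const [key, value] of live) this.put(key, value);
  }
}

const store = new ToyLogStore();
store.put("k", "version-1");
store.put("k", "version-2");
console.log(store.get("k"), store.log.length); // footprint includes the stale value
store.merge();
console.log(store.get("k"), store.log.length); // footprint shrinks after the merge
```

The sketch also shows why the key set has to fit in memory: a read is one map lookup plus one positioned read from the log.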

Page 18: Riak at The NYC Cloud Computing Meetup Group

Ok sounds good. How do I get it?

>git|hg clone http://bitbucket.org/basho/riak

>cd riak

>make all && make rel

OR if you’re on a mac:

>brew install riak

Page 19: Riak at The NYC Cloud Computing Meetup Group

Ok sounds good. How do I get it?

>git|hg clone http://bitbucket.org/basho/riak_search

>cd riak_search

>make all && make rel

OR if you’re on a mac:

>brew install riak-search

Page 20: Riak at The NYC Cloud Computing Meetup Group

What does that get me?

Fully functional

Self contained (<3)

Default configuration

-64 vnodes, “riak” cookie, N = 3

Page 21: Riak at The NYC Cloud Computing Meetup Group

Work... like so.

Config files

http://wiki.basho.com/display/RIAK/Configuration+Files

app.config

-ring_creation_size

vm.args

-name, -settings

Page 22: Riak at The NYC Cloud Computing Meetup Group

Fire it up

> bin/riak

> Usage: riak {start|stop|restart|reboot|ping|console|attach}

> bin/riak start

Page 23: Riak at The NYC Cloud Computing Meetup Group

Do Stuff!

GET:

> curl -v http://127.0.0.1:8098/ping

> curl -v http://127.0.0.1:8098/stats

> curl -v http://127.0.0.1:8098/riak/myBucket

> curl -v http://127.0.0.1:8098/riak/myBucket/myKey

PUT:

> curl -v -X PUT -H "Content-Type: application/json" -d '{"backend": "ets"}' http://127.0.0.1:8098/riak/myBucket

> curl -v -X PUT -d 'test key' http://127.0.0.1:8098/riak/myBucket/myKey

POST:

> curl -v -X POST -d 'autogen key' http://127.0.0.1:8098/riak/myBucket
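The same requests can be issued from Node.js with the built-in `http` module. This is a sketch under the assumption that a node is listening on 127.0.0.1:8098 as in the curl examples; the helper names `riakRequestOptions` and `riakRequest` are hypothetical.

```javascript
const http = require("http");

// Build request options matching the URLs used in the curl examples.
// Assumption: host and port are the local defaults shown on the slide.
function riakRequestOptions(method, bucket, key, contentType) {
  const path =
    "/riak/" + encodeURIComponent(bucket) +
    (key ? "/" + encodeURIComponent(key) : "");
  const headers = contentType ? { "Content-Type": contentType } : {};
  return { host: "127.0.0.1", port: 8098, method, path, headers };
}

// Send a request and collect the response; resolves with { status, body }.
function riakRequest(options, body) {
  return new Promise((resolve, reject) => {
    const req = http.request(options, (res) => {
      let data = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => (data += chunk));
      res.on("end", () => resolve({ status: res.statusCode, body: data }));
    });
    req.on("error", reject);
    if (body !== undefined) req.write(body);
    req.end();
  });
}

// Usage (needs a running node, so it is left commented out):
// riakRequest(riakRequestOptions("PUT", "myBucket", "myKey", "text/plain"), "test key")
//   .then(() => riakRequest(riakRequestOptions("GET", "myBucket", "myKey")))
//   .then((res) => console.log(res.status, res.body));
```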

Page 24: Riak at The NYC Cloud Computing Meetup Group

Links

Lightweight Graphing

Practical limitations re. number of links per object

Unidirectional object linking

relationship modeling (one to one, one to many)

Returns “Content-Type: multipart/mixed;”

- Library needs to be multipart aware

- nodejs, formidable
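Over HTTP, links travel in a Link header on the object, with the relationship name carried in a `riaktag` parameter. The sketch below builds such a header value; the helper name `linkHeader` and the bucket, key, and tag names are hypothetical, and the `/riak` prefix is assumed to match the URLs used elsewhere in this deck.

```javascript
// Build a Link header value pointing at one or more other objects.
// Assumption: the "/riak" URL prefix matches the curl examples in this deck.
function linkHeader(links) {
  return links
    .map(({ bucket, key, tag }) => `</riak/${bucket}/${key}>; riaktag="${tag}"`)
    .join(", ");
}

console.log(
  linkHeader([
    { bucket: "people", key: "bob", tag: "friend" },
    { bucket: "people", key: "carol", tag: "coworker" },
  ])
);
```

Each link points one way only, so modeling a two-way relationship means storing a link on both objects.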

Page 25: Riak at The NYC Cloud Computing Meetup Group

Link Walking

First level depth

>curl http://localhost:8098/riak/myBucket/myKey/_,_,_

Via Map/Reduce

>curl -X POST -H "content-type:application/json" \
http://localhost:8098/mapred --data @-
{"inputs":[["myBucket","myKey"]],"query":[{"link":{}},{"map":{"language":"javascript","source":"function(v){ return [v]; }"}}]}
^D

N level depth

>curl http://localhost:8098/riak/myBucket/myKey/_,_,_/_,_,_

More Info:

http://blog.basho.com/2010/02/24/link-walking-by-example/

http://wiki.basho.com/display/RIAK/Links

http://wiki.basho.com/display/RIAK/REST+API#RESTAPI-Linkwalking
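Each `_,_,_` segment in the URLs above is a bucket, tag, keep triple, with `_` acting as a wildcard. A small sketch that builds such paths follows; the helper names `walkStep` and `linkWalkPath` and the tag `friend` are hypothetical.

```javascript
// One link-walk step is a "bucket,tag,keep" triple; "_" means "any/default".
function walkStep({ bucket = "_", tag = "_", keep = "_" } = {}) {
  return [bucket, tag, keep].join(",");
}

// Build the path for walking one or more levels of links from an object.
function linkWalkPath(bucket, key, steps) {
  return ["/riak", bucket, key, ...steps.map((step) => walkStep(step))].join("/");
}

console.log(linkWalkPath("myBucket", "myKey", [{}]));     // first level
console.log(linkWalkPath("myBucket", "myKey", [{}, {}])); // two levels
console.log(linkWalkPath("myBucket", "myKey", [{ tag: "friend", keep: "1" }]));
```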

Page 26: Riak at The NYC Cloud Computing Meetup Group

Map/Reduce

Functions written in either Erlang or JavaScript

Map is distributed to where the data lives

Reduce is run on the node coordinating the M/R

Erlang > JavaScript

Tweak JavaScript settings in app.config

Page 27: Riak at The NYC Cloud Computing Meetup Group

M/R in Riak

An input to start from

bucket

list of keys / keyfilter

★ keys > bucket

possible link phase

one or more map phases

(many) possible reduce phase(s)

function(v, keydata, args) {
  if (v.values) {
    var ret = [], o = {};
    o = Riak.mapValuesJson(v)[0];
    o.lastModifiedParsed = Date.parse(v["values"][0]["metadata"]["X-Riak-Last-Modified"]);
    o.key = v["key"];
    ret.push(o);
    return ret;
  } else {
    return [];
  }
};

Map = SQL Select/Where clause

Reduce = SQL Aggregates (SUM, COUNT, GROUP BY)
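The SQL analogy can be tried locally without a cluster. The sketch below runs a map step and a reduce step over an in-memory array; the sample records and function names are hypothetical, and the simulation makes a single reduce pass, so it does not model a reduce function being applied to partial results.

```javascript
// Sample records standing in for objects stored in a bucket (hypothetical data).
const posts = [
  { key: "p1", author: "ann", words: 120 },
  { key: "p2", author: "bob", words: 80 },
  { key: "p3", author: "ann", words: 200 },
];

// Map phase: runs once per object and returns a list of zero or more results.
// Here it behaves like: SELECT author, words ... WHERE words >= 100.
function mapLongPosts(post) {
  return post.words >= 100 ? [{ author: post.author, words: post.words }] : [];
}

// Reduce phase: receives the list of map outputs and returns a list.
// Here it behaves like: SUM(words) ... GROUP BY author.
function reduceWordsByAuthor(rows) {
  const totals = {};
  for (const row of rows) {
    totals[row.author] = (totals[row.author] || 0) + row.words;
  }
  return [totals];
}

const mapped = posts.flatMap(mapLongPosts);
const reduced = reduceWordsByAuthor(mapped);
console.log(reduced);
```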

Page 28: Riak at The NYC Cloud Computing Meetup Group

Pre/Post Commit Hooks

Pre Commit

JavaScript or Erlang

Validation

Modify data

Kill writes

Post Commit

Erlang

Indexing

Messaging
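A pre-commit hook that validates data and kills bad writes might look like the sketch below. The object shape (`values[0].data`) and the `{ fail: reason }` return value are assumptions about the JavaScript hook convention, and the function name `validateJson` is hypothetical.

```javascript
// Pre-commit hook sketch: reject writes whose body is not valid JSON.
// Assumptions: the hook receives the object being written, with the raw
// body at object.values[0].data, and signals rejection by returning
// { fail: reason } instead of the object.
function validateJson(object) {
  try {
    JSON.parse(object.values[0].data);
    return object; // accept the write unchanged
  } catch (e) {
    return { fail: "Invalid JSON: " + e.message }; // kill the write
  }
}

console.log(validateJson({ values: [{ data: '{"title": "hello"}' }] }));
console.log(validateJson({ values: [{ data: "{oops" }] }));
```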

Page 29: Riak at The NYC Cloud Computing Meetup Group

Chief complaints

No index

No native sort

No increment

No native data structures

Page 31: Riak at The NYC Cloud Computing Meetup Group

Riak Search... more

uses a modified bitcask backend called merge_index

enabled on a per bucket basis

access via http and command line

Page 32: Riak at The NYC Cloud Computing Meetup Group

Riak-JS

NodeJS Riak module

Written in CoffeeScript

HTTP and Protobuf

Customizable via “meta” options

http://riakjs.org

Page 33: Riak at The NYC Cloud Computing Meetup Group

Code demo

nodejs

riak-js

redis

simple post site

tags

json data passing

Page 34: Riak at The NYC Cloud Computing Meetup Group

JavaScript Map

var map = function(v, keydata, args) {
  if (v.values) {
    var ret = [], o = {};
    o = Riak.mapValuesJson(v)[0];
    o.key = v["key"]; // put the key in the returned data object
    // parse the date string to a number so the reduce phase can sort on it
    o.lastModified = Date.parse(v["values"][0]["metadata"]["X-Riak-Last-Modified"]);
    ret.push(o);
    return ret;
  } else {
    return [];
  }
};

Page 35: Riak at The NYC Cloud Computing Meetup Group

JavaScript Reduce

var sortInt = function (data, args) {
  var sortBy = (typeof args === "undefined" || args === null) ? undefined : args.field;
  var desc = ((typeof args === "undefined" || args === null) ? undefined : args.order) === 'desc';
  data.sort(function (a, b) {
    if (desc) {
      // swap the operands to reverse the sort order
      var _ref = [b, a];
      a = _ref[0];
      b = _ref[1];
    }
    return a[sortBy] - b[sortBy];
  });
  return data;
};
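The reduce function can be exercised on a plain array before it is sent to a cluster. The condensed copy of `sortInt` and the sample rows below are for illustration only. Note that the comparator subtracts the two field values, so the field being sorted must hold numbers (for example timestamps from `Date.parse`), not date strings.

```javascript
// Condensed local copy of the reduce function, same behavior.
function sortInt(data, args) {
  const sortBy = args ? args.field : undefined;
  const desc = (args ? args.order : undefined) === "desc";
  return data.sort((a, b) =>
    desc ? b[sortBy] - a[sortBy] : a[sortBy] - b[sortBy]
  );
}

// Hypothetical map output with numeric timestamps.
const rows = [
  { key: "a", lastModified: 20 },
  { key: "b", lastModified: 30 },
  { key: "c", lastModified: 10 },
];

// sort() works in place, so pass a copy to keep `rows` unchanged.
const newestFirst = sortInt(rows.slice(), { field: "lastModified", order: "desc" });
console.log(newestFirst.map((row) => row.key));
```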

Page 36: Riak at The NYC Cloud Computing Meetup Group

Putting it all together

riak
  .add("bucket")
  // map function
  .map(map)
  // reduce function
  .reduce(sortInt, { field: "lastModified", order: "desc" })
  .run(function(err, response) {
    if (err) {
      // send out an error if there is one
      res.simpleJSON(400, { errortxt: 'mapreduce gone bad :(' });
    } else {
      // otherwise send the data back...
      res.simpleJSON(200, { response: response });
    }
  });

Page 37: Riak at The NYC Cloud Computing Meetup Group

Hybrid architectures are the future!

Use tools like Redis to augment shortcomings!

Page 39: Riak at The NYC Cloud Computing Meetup Group

Google

Look Ma!

No exact counts!

Page 40: Riak at The NYC Cloud Computing Meetup Group

Twitter

No Pagination!

No Totals!

Page 41: Riak at The NYC Cloud Computing Meetup Group

Questions?

NYC Cloud Computing Group, March 2011

Alexander Sicular @siculars