Riak at The NYC Cloud Computing Meetup Group

Post on 08-May-2015

description

An in-depth look at the NoSQL product Riak.

Transcript of Riak at The NYC Cloud Computing Meetup Group

A Walk Down NOSQL Lane in the Cloud

Part 2: Riak

NYC Cloud Computing Group, March 2011

Alexander Sicular, @siculars

Who is this blowhard?

Columbia University pays my mortgage

For the better part of a decade in Medical Informatics

Am not shilling for any of these companies

Am not a computer scientist

Am a computer science enthusiast particularly in the area of Informatics

Riak, eh?

Dynamo inspired

Homogeneous

Single key-space

Distributed

Replicated

Predictable scalability

CAP Theorem

http://en.wikipedia.org/wiki/CAP_theorem

Consistency

Availability

Partition tolerance

http://guide.couchdb.org/draft/consistency.html

Pick two?

Riak says: pick two at a time.

Homogeneous

Every node is the same

Any node can service any request

Nodes gossip on their own port

One Ring to Rule Them

Single 160-bit key space

Huh?

No Sharding!

Distributed (!= replicated)

riak is not sharded

vnodes = units of distribution

vnodes != physical nodes (pnodes)

vnodes map to pnodes

data is distributed at the vnode level

★ Considerations:

- must plan maximum ring size

- think about number of vnodes per pnode

- generally no fewer than 10 vnodes per pnode
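The ring and vnode ideas above can be sketched in a few lines of JavaScript. This is an illustration only, not Riak's own code: it assumes a SHA-1 hash over a plain "bucket/key" string (Riak hashes an Erlang-encoded bucket/key pair), a ring size that is a power of two, and a simple round-robin hand-out of vnodes to physical nodes. All function names are hypothetical.

```javascript
const crypto = require("crypto");

// The key space is the range of a SHA-1 hash: 2^160 positions.
const RING_SIZE = 2n ** 160n;

// Hash a bucket/key pair onto the ring (simplified: hashes the string "bucket/key").
function ringPosition(bucket, key) {
  const hex = crypto.createHash("sha1").update(bucket + "/" + key).digest("hex");
  return BigInt("0x" + hex);
}

// Split the ring into equal partitions, one per vnode, and find the one holding a position.
// Assumes numPartitions is a power of two so the ring divides evenly.
function partitionFor(position, numPartitions) {
  const partitionSize = RING_SIZE / BigInt(numPartitions);
  return Number(position / partitionSize);
}

// Hand vnodes out to physical nodes round-robin (an assumed, simplified strategy).
function assignVnodes(numPartitions, pnodes) {
  const owners = [];
  for (let i = 0; i < numPartitions; i++) {
    owners.push(pnodes[i % pnodes.length]);
  }
  return owners;
}

// Example: 64 vnodes spread over 4 physical nodes gives 16 vnodes per pnode.
const owners = assignVnodes(64, ["node1", "node2", "node3", "node4"]);
const partition = partitionFor(ringPosition("myBucket", "myKey"), 64);
console.log("myBucket/myKey -> vnode", partition, "on", owners[partition]);
```

Because keys map to vnodes rather than to machines, adding a physical node only changes which pnode owns some vnodes; the key-to-vnode mapping stays fixed, which is why the ring size has to be planned up front.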

Conflict Resolution

Vector Clocks

ancestry / divergence maintained

automatic or manual resolution

★ Considerations:

X-Riak-ClientId,

X-Riak-Vclock

allow_mult
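A rough sketch of how vector clocks let a store tell "newer" from "divergent". Here a clock is assumed to be a plain object mapping a client id to a counter; the value Riak actually passes in X-Riak-Vclock is an opaque encoded string, so this models the idea rather than the wire format. Function names are hypothetical.

```javascript
// A vector clock here is a plain object: { clientId: counter }.
function increment(clock, clientId) {
  const next = Object.assign({}, clock);
  next[clientId] = (next[clientId] || 0) + 1;
  return next;
}

// True when clock a has seen every update that clock b has seen.
function descends(a, b) {
  return Object.keys(b).every((id) => (a[id] || 0) >= b[id]);
}

// Classify the relationship between two versions of an object.
function compare(a, b) {
  const aDescendsB = descends(a, b);
  const bDescendsA = descends(b, a);
  if (aDescendsB && bDescendsA) return "equal";
  if (aDescendsB) return "a is newer";
  if (bDescendsA) return "b is newer";
  return "siblings"; // divergent history: needs automatic or manual resolution
}

const base = increment({}, "clientA");    // { clientA: 1 }
const fromA = increment(base, "clientA"); // { clientA: 2 }
const fromB = increment(base, "clientB"); // { clientA: 1, clientB: 1 }
console.log(compare(fromA, base));  // a is newer
console.log(compare(fromA, fromB)); // siblings
```

The "siblings" case is the one allow_mult controls: when it is enabled, both versions are kept and handed back for the client to resolve.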

Replicated (!= distributed)

configurable replication values (“N”)

configurable consistency and availability values at read and write time

- read

- write

- durable write
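The trade-off behind these tunables can be worked through with simple arithmetic. The sketch below assumes a strict quorum model, which is a simplification of what a running cluster does, and the function name is hypothetical. In the HTTP API the values can be passed per request as the r, w and dw query parameters.

```javascript
// n: replicas stored; r: replicas that must answer a read;
// w: replicas that must acknowledge a write.
function describeQuorum(n, r, w) {
  if (r < 1 || w < 1 || r > n || w > n) {
    throw new RangeError("r and w must be between 1 and n");
  }
  return {
    // Read and write sets overlap, so a read touches a replica holding the latest write.
    readSeesLatestWrite: r + w > n,
    // How many replicas can be unavailable while the operation still succeeds.
    readTolerates: n - r,
    writeTolerates: n - w,
  };
}

console.log(describeQuorum(3, 2, 2)); // favors consistency
console.log(describeQuorum(3, 1, 1)); // favors availability and latency
```

With N = 3, choosing R = 2 and W = 2 makes reads and writes overlap at the cost of tolerating only one unavailable replica; R = 1 and W = 1 tolerates two but gives up the overlap.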

Predictable Scalability

How much performance per node?

Scale in both directions

> bin/riak-admin

> Usage: riak-admin { join | leave | backup | restore | test | status | reip | js_reload | wait-for-service | ringready | transfers }

Data Agnostic

schemaless

data objects may be of any type

binary, text (json, xml)

use content types

> curl -v -d 'this is a test' -H "Content-Type: text/plain" http://127.0.0.1:8098/riak/testBucket/testKey

Extra Goodies

Erlang

http://www.pragprog.com/titles/jaerlang/programming-erlang

Code Architecture

basho_bench

Multiple backends

bitcask, innodb, mem

Code architecture

Highly modularized

riak_core

riak_kv

bitcask

erlang_js

http://bitbucket.org/basho

basho_bench

Performance profiling

highly customizable

pretty pictures

key/value store generalized

https://wiki.basho.com/display/RIAK/Benchmarking+with+Basho+Bench

http://pics.livejournal.com/demmonoid/pic/00001sa7

Bitcask

Riak’s default disk backend

Write Only Log

Heavy updates will grow your footprint

- Look into compaction/merging settings

Keys are cached in memory with disk offsets

https://spreadsheets.google.com/ccc?key=0Ak4OBkABJPsxdEowYXc2akxnYU9xNkJmbmZscnhaTFE&hl=en&authkey=CMHw8tYO
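A toy, in-memory model of the log-plus-key-directory design described above. It is not Bitcask's file format or API: an array stands in for the data file, a Map stands in for the in-memory key directory, and the class and method names are hypothetical.

```javascript
class TinyLogStore {
  constructor() {
    this.log = [];           // stands in for the append-only data file
    this.keydir = new Map(); // key -> offset of the latest entry for that key
  }

  // Writes never overwrite in place; they append and repoint the key.
  put(key, value) {
    this.log.push({ key: key, value: value });
    this.keydir.set(key, this.log.length - 1);
  }

  // Reads are one in-memory lookup plus one fetch at a known offset.
  get(key) {
    const offset = this.keydir.get(key);
    return offset === undefined ? undefined : this.log[offset].value;
  }

  // Merge/compaction: copy only the live entries into a fresh log.
  merge() {
    const live = [];
    const keydir = new Map();
    for (const [key, offset] of this.keydir) {
      live.push(this.log[offset]);
      keydir.set(key, live.length - 1);
    }
    this.log = live;
    this.keydir = keydir;
  }
}

const store = new TinyLogStore();
store.put("a", 1);
store.put("a", 2); // the update leaves a dead entry behind
store.put("b", 3);
console.log(store.get("a"), store.log.length); // 2 3
store.merge();
console.log(store.get("a"), store.log.length); // 2 2
```

The dead entry left by the second put is the footprint growth that heavy updates cause, and merge is what reclaims it.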

Ok sounds good. How do I get it?

>git|hg clone http://bitbucket.org/basho/riak

>cd riak

>make all && make rel

OR if you’re on a mac:

>brew install riak

Ok sounds good. How do I get it?

>git|hg clone http://bitbucket.org/basho/riak_search

>cd riak_search

>make all && make rel

OR if you’re on a mac:

>brew install riak-search

What does that get me?

Fully functional

Self contained (<3)

Default configuration

-64 vnodes, “riak” cookie, N = 3

Work... like so.

Config files

http://wiki.basho.com/display/RIAK/Configuration+Files

app.config

-ring_creation_size

vm.args

-name, -settings

Fire it up

> bin/riak

> Usage: riak {start|stop|restart|reboot|ping|console|attach}

> bin/riak start

GET:

> curl -v http://127.0.0.1:8098/ping

> curl -v http://127.0.0.1:8098/stats

> curl -v http://127.0.0.1:8098/riak/myBucket

> curl -v http://127.0.0.1:8098/riak/myBucket/myKey

Do Stuff!

PUT:

> curl -v -X PUT -H "Content-Type: application/json" -d '{"props": {"backend": "ets"}}' http://127.0.0.1:8098/riak/myBucket

> curl -v -X PUT -d 'test key' http://127.0.0.1:8098/riak/myBucket/myKey

> curl -v -X POST -d 'autogen key' http://127.0.0.1:8098/riak/myBucket
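The same requests can be issued from Node.js with the built-in http module. A minimal sketch, assuming the local node and the /riak/<bucket>/<key> URL layout from the curl examples; riakRequest and send are hypothetical helper names, not part of any Riak client.

```javascript
const http = require("http");

// Build http.request options for the /riak/<bucket>/<key> URLs used above.
// Host and port are the local node from the curl examples.
function riakRequest(method, bucket, key, contentType) {
  const path = "/riak/" + encodeURIComponent(bucket) +
    (key ? "/" + encodeURIComponent(key) : "");
  const options = { host: "127.0.0.1", port: 8098, method: method, path: path, headers: {} };
  if (contentType) options.headers["Content-Type"] = contentType;
  return options;
}

// Send a request and hand the status code and body to a callback.
function send(options, body, callback) {
  const req = http.request(options, (res) => {
    let data = "";
    res.setEncoding("utf8");
    res.on("data", (chunk) => { data += chunk; });
    res.on("end", () => callback(null, res.statusCode, data));
  });
  req.on("error", callback);
  if (body) req.write(body);
  req.end();
}

const put = riakRequest("PUT", "myBucket", "myKey", "text/plain");
console.log(put.method, put.path);
// With a node running locally:
// send(put, "test key", (err, status) => console.log(err || status));
```

Leaving the key out and using POST mirrors the last curl call, where the server generates the key.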

Links

Lightweight Graphing

Practical limitations re. number of links per object

Unidirectional object linking

relationship modeling (one to one, one to many)

Returns “Content-Type: multipart/mixed;”

- Library needs to be multipart aware

- nodejs, formidable

Link Walking

First level depth

> curl http://localhost:8098/riak/myBucket/myKey/_,_,_

Via Map/Reduce

> curl -X POST -H "content-type:application/json" http://localhost:8098/mapred --data @-
{"inputs":[["myBucket","myKey"]],"query":[{"link":{}},{"map":{"language":"javascript","source":"function(v){ return [v]; }"}}]}
^D

N level depth

> curl http://localhost:8098/riak/myBucket/myKey/_,_,_/_,_,_

More Info:

http://blog.basho.com/2010/02/24/link-walking-by-example/

http://wiki.basho.com/display/RIAK/Links

http://wiki.basho.com/display/RIAK/REST+API#RESTAPI-Linkwalking
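Each path segment in a link-walk URL is a bucket,tag,keep triple, with "_" as the wildcard. A small sketch of a builder for such paths follows; the function name and the "friend" tag are made up for illustration.

```javascript
// Each step is { bucket, tag, keep }; a missing field becomes the "_" wildcard.
function linkWalkPath(bucket, key, steps) {
  const phases = steps.map((step) =>
    [step.bucket, step.tag, step.keep]
      .map((part) => (part === undefined ? "_" : encodeURIComponent(String(part))))
      .join(",")
  );
  return ["/riak", encodeURIComponent(bucket), encodeURIComponent(key)]
    .concat(phases)
    .join("/");
}

// First level depth, everything wildcarded.
console.log(linkWalkPath("myBucket", "myKey", [{}]));
// Two levels: follow only "friend" links first and keep those results, then follow any link.
console.log(linkWalkPath("myBucket", "myKey", [{ tag: "friend", keep: 1 }, {}]));
```

One step per level of depth is all that changes between the first-level and N-level curl calls shown earlier.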

Map/Reduce

Functions written in either Erlang or JavaScript

Map is distributed to where the data lives

Reduce is run on the node coordinating the M/R

Erlang > JavaScript

Tweak JavaScript settings in app.config

M/R in Riak

An input to start from

bucket

list of keys / keyfilter

★ keys > bucket

possible link phase

one or more map phases

(many) possible reduce phase(s)

function(v, keydata, args) {
  if (v.values) {
    var ret = [], o = {};
    o = Riak.mapValuesJson(v)[0];
    o.lastModifiedParsed = Date.parse(v["values"][0]["metadata"]["X-Riak-Last-Modified"]);
    o.key = v["key"];
    ret.push(o);
    return ret;
  } else {
    return [];
  }
};

Map = SQL Select/Where clause

Reduce = SQL Aggregates (SUM, COUNT, GROUP BY)
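The SQL analogy can be tried locally without a cluster. The sketch below runs a map and a reduce function over an in-memory array; the sample posts, field names and function names are invented for illustration, and a real job would be submitted to Riak rather than run like this.

```javascript
// Invented sample data standing in for objects stored in a bucket.
const posts = [
  { key: "p1", author: "alice", tags: ["riak"], votes: 3 },
  { key: "p2", author: "bob", tags: ["redis"], votes: 5 },
  { key: "p3", author: "alice", tags: ["riak", "nodejs"], votes: 4 },
  { key: "p4", author: "bob", tags: ["riak"], votes: 1 },
];

// Map: runs once per object and emits zero or more values.
// Roughly: SELECT author, votes FROM posts WHERE 'riak' IN tags
function mapRiakPosts(post) {
  return post.tags.includes("riak") ? [{ author: post.author, votes: post.votes }] : [];
}

// Reduce: folds the mapped values together.
// Roughly: SELECT author, SUM(votes) ... GROUP BY author
// Input and output share one shape, so it is safe to run again on its own output.
function reduceVotesByAuthor(values) {
  const totals = {};
  values.forEach((v) => {
    totals[v.author] = (totals[v.author] || 0) + v.votes;
  });
  return Object.keys(totals).map((author) => ({ author: author, votes: totals[author] }));
}

const mapped = [].concat(...posts.map(mapRiakPosts));
const reduced = reduceVotesByAuthor(mapped);
console.log(reduced); // [ { author: 'alice', votes: 7 }, { author: 'bob', votes: 1 } ]
```

Keeping the reduce output in the same shape as its input matters because a reduce phase may be applied to partial results more than once.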

Pre/Post Commit Hooks

Pre Commit

JavaScript or Erlang

Validation

Modify data

Kill writes

Post Commit

Erlang

Indexing

Messaging
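A sketch of what a JavaScript pre-commit validation hook might look like. It assumes the hook receives the object in the same JSON shape the map functions see (values[0].data holding the raw body) and that returning the object accepts the write while returning an object with a "fail" message rejects it. The "title" rule and the function name are hypothetical.

```javascript
// Pre-commit hook sketch: reject writes whose value is not valid JSON
// or lacks a "title" field (a made-up rule for a simple post site).
function precommitRequireTitle(object) {
  var raw = object.values[0].data;
  var doc;
  try {
    doc = JSON.parse(raw);
  } catch (e) {
    return { fail: "value is not valid JSON" }; // kill the write
  }
  if (!doc || typeof doc.title !== "string" || doc.title.length === 0) {
    return { fail: "missing title" }; // kill the write
  }
  return object; // accept the write unchanged
}

// Local check with hand-built objects in the assumed shape.
var good = { bucket: "posts", key: "p1", values: [{ metadata: {}, data: '{"title":"hello"}' }] };
var bad = { bucket: "posts", key: "p2", values: [{ metadata: {}, data: "not json" }] };
console.log(precommitRequireTitle(good) === good); // true
console.log(precommitRequireTitle(bad));
```

A hook like this covers the "Validation" and "Kill writes" bullets; modifying data would mean changing the object before returning it.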

Chief complaints

No index

No native sort

No increment

No native data structures

Riak Search... more

uses a modified bitcask backend called merge_index

enabled on a per bucket basis

access via http and command line

Riak-JS

NodeJS Riak module

Written in CoffeeScript

HTTP and Protobuf

Customizable via “meta” options

http://riakjs.org

Code demo

nodejs

riak-js

redis

simple post site

tags

json data passing

JavaScript Map

var map = function(v, keydata, args) {
  if (v.values) {
    var ret = [], o = {};
    o = Riak.mapValuesJson(v)[0];
    o.key = v["key"]; // put the key in the returned data object
    // parse the date header to a number so the reduce phase can sort on it
    o.lastModified = Date.parse(v["values"][0]["metadata"]["X-Riak-Last-Modified"]);
    ret.push(o);
    return ret;
  } else {
    return [];
  }
};

JavaScript Reduce

var sortInt = function(data, args) {
  var sortBy = (typeof args === "undefined" || args === null) ? undefined : args.field;
  var desc = ((typeof args === "undefined" || args === null) ? undefined : args.order) === 'desc';
  data.sort(function(a, b) {
    if (desc) {
      var _ref = [b, a];
      a = _ref[0];
      b = _ref[1];
    }
    return a[sortBy] - b[sortBy];
  });
  return data;
};

Putting it all together

riak
  .add("bucket")
  // map function
  .map(map)
  // reduce function
  .reduce(sortInt, { field: "lastModified", order: "desc" })
  .run(function(err, response) {
    // send out an error if there is one, and stop here
    if (err) return res.simpleJSON(400, { errortxt: 'mapreduce gone bad :(' });
    // otherwise send the data back...
    res.simpleJSON(200, { response: response });
  });

Hybrid architectures are the future!

Use tools like Redis to augment shortcomings!

Google

Look Ma!

No exact counts!

Twitter

No Pagination!

No Totals!

Questions?
