Posted on 08-May-2015
A Walk Down NOSQL Lane in the Cloud
Part 2: Riak
NYC Cloud Computing Group, March 2011
Alexander Sicular, @siculars
Who is this blowhard?
Columbia University pays my mortgage
For the better part of a decade in Medical Informatics
Am not shilling for any of these companies
Am not a computer scientist
Am a computer science enthusiast particularly in the area of Informatics
Riak, eh?
Dynamo inspired
Homogeneous
Single key-space
Distributed
Replicated
Predictable scalability
Origins
Amazon’s Dynamo
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
Akamai
http://www.basho.com/bios.html
Show me your friends...
Paramount Home Video
CAP Theorem
http://en.wikipedia.org/wiki/CAP_theorem
Consistency
Availability
Partition tolerance
http://guide.couchdb.org/draft/consistency.html
Pick two?
Riak says: pick two at a time.
Homogeneous
Every node is the same
Any node can service any request
Nodes gossip on their own port
One Ring to Rule Them
Single 160-bit key space
Huh?
No Sharding!
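The single 160-bit key space can be sketched in a few lines of Node.js. This is a simplified illustration, not Riak's code: real Riak SHA-1 hashes the Erlang-encoded {Bucket, Key} term, whereas this sketch hashes the string `bucket/key`, and the helper names (`ringPosition`, `partitionIndex`, `preferenceList`) are hypothetical. The ring is cut into equal partitions, and a key's N replicas go to its own partition and the next N - 1 around the ring.

```javascript
const crypto = require("crypto");

// SHA-1 digests are 160 bits, so positions on the ring run from 0 to 2^160 - 1.
const RING_TOP = 2n ** 160n;

// ASSUMPTION: hash "bucket/key"; real Riak hashes term_to_binary({Bucket, Key}).
function ringPosition(bucket, key) {
  const hex = crypto.createHash("sha1").update(`${bucket}/${key}`).digest("hex");
  return BigInt("0x" + hex);
}

// The ring is divided into ringSize equal partitions (one vnode each);
// this returns the partition whose range contains the position.
function partitionIndex(position, ringSize) {
  const width = RING_TOP / BigInt(ringSize);
  return Number(position / width);
}

// The key's partition plus the next n - 1 partitions clockwise hold the replicas.
function preferenceList(bucket, key, ringSize, n) {
  const first = partitionIndex(ringPosition(bucket, key), ringSize);
  const list = [];
  for (let i = 0; i < n; i++) {
    list.push((first + i) % ringSize); // wrap around the ring
  }
  return list;
}
```

For example, `preferenceList("myBucket", "myKey", 64, 3)` names the three partitions of a 64-partition ring that would hold the N = 3 copies. Nothing here is a shard boundary chosen by hand; placement falls out of the hash.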
Distributed (!= replicated)
riak is not sharded
vnodes = units of distribution
vnodes != physical nodes (pnodes)
vnodes map to pnodes
data is distributed at the vnode level
★Considerations:
- must plan maximum ring size (it is fixed when the ring is created)
- think about number of vnodes per pnode
- generally no fewer than 10 vnodes per pnode
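To see why ring size and vnodes per pnode need planning, here is a hypothetical round-robin assignment of partitions to physical nodes. Riak's actual claim algorithm is more careful; `claim` and `checkPlan` are illustrative names, and the threshold of 10 is the rule of thumb from the slide.

```javascript
// Hypothetical round-robin claim: partition p goes to nodes[p % nodes.length].
// Returns a Map from node name to the list of partitions it owns.
function claim(ringSize, nodes) {
  const owners = new Map(nodes.map((n) => [n, []]));
  for (let p = 0; p < ringSize; p++) {
    owners.get(nodes[p % nodes.length]).push(p);
  }
  return owners;
}

// Check a planned cluster size against the "no fewer than N vnodes per pnode" rule.
function checkPlan(ringSize, nodeCount, minPerNode = 10) {
  const perNode = ringSize / nodeCount;
  return { perNode, ok: perNode >= minPerNode };
}
```

With the default 64-partition ring, the rule of thumb holds up to six physical nodes (64 / 6 ≈ 10.7) and fails at seven (≈ 9.1), so a cluster expected to grow needs a larger ring from the start. Plain round-robin can also put the last and first partitions on the same node where the ring wraps around, one reason the real claim logic is more involved.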
Conflict Resolution
Vector Clocks
ancestry / divergence maintained
automatic or manual resolution
★ Considerations:
X-Riak-ClientId
X-Riak-Vclock
allow_mult
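The vector clock idea can be sketched with plain objects that map a client ID to a counter. This is a toy under stated assumptions: Riak's real vclocks also carry timestamps, get pruned, and travel as an opaque base64 blob in X-Riak-Vclock, so clients should echo that header back rather than build clocks themselves. All function names here are hypothetical.

```javascript
// A vector clock is modeled as { clientId: counter }.

// Record one more update by clientId (returns a new clock).
function increment(vc, clientId) {
  return { ...vc, [clientId]: (vc[clientId] || 0) + 1 };
}

// a descends from b if a has seen every update that b has seen.
function descends(a, b) {
  return Object.keys(b).every((id) => (a[id] || 0) >= b[id]);
}

// "equal", "a" (a is newer), "b" (b is newer) or "concurrent" (divergent).
function compare(a, b) {
  const ab = descends(a, b);
  const ba = descends(b, a);
  if (ab && ba) return "equal";
  if (ab) return "a";
  if (ba) return "b";
  return "concurrent";
}

// Pointwise maximum: a clock that descends from both inputs.
function merge(a, b) {
  const out = { ...a };
  for (const [id, n] of Object.entries(b)) {
    out[id] = Math.max(out[id] || 0, n);
  }
  return out;
}
```

When two clients update the same ancestor independently, `compare` reports "concurrent". With allow_mult set to true, Riak keeps both values as siblings for the client to resolve manually; otherwise Riak resolves automatically, roughly keeping the most recent value by timestamp. A client that resolves siblings writes the chosen value back with the clock it read, which descends from both.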
Replicated (!= distributed)
configurable replication values (“N”)
configurable consistency and availability values at read and write time
- read (R)
- write (W)
- durable write (DW)
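The N/R/W trade-off is easy to compute. A sketch, assuming Riak's default of a majority quorum (`Math.floor(N / 2) + 1`) and the `r`, `w` and `dw` query parameters of the `/riak` HTTP interface; `quorum`, `readsSeeLatestWrite` and `riakUrl` are hypothetical helper names.

```javascript
// Default quorum: a majority of the N replicas.
function quorum(n) {
  return Math.floor(n / 2) + 1;
}

// If R + W > N, every read quorum shares at least one replica with the last write quorum.
function readsSeeLatestWrite(n, r, w) {
  return r + w > n;
}

// Build a /riak URL carrying per-request r / w / dw values as query parameters.
function riakUrl(host, bucket, key, params) {
  const path = `/riak/${encodeURIComponent(bucket)}/${encodeURIComponent(key)}`;
  const url = new URL(path, host);
  for (const [name, value] of Object.entries(params)) {
    url.searchParams.set(name, String(value));
  }
  return url.toString();
}
```

With N = 3, the default quorum is 2, and R = W = 2 gives overlapping reads and writes, while R = W = 1 favors availability and latency. The overlap holds when the primary replicas are reachable; during failures, fallback nodes may accept writes (hinted handoff), so treat R + W > N as a strong tendency rather than a guarantee.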
Predictable Scalability
How much performance per node?
Scale in both directions
> bin/riak-admin
> Usage: riak-admin { join | leave | backup | restore | test | status | reip | js_reload | wait-for-service | ringready | transfers }
Data Agnostic
schemaless
data objects may be of any type
binary, text (json, xml)
use content types
> curl -v -d 'this is a test' -H "Content-Type: text/plain" http://127.0.0.1:8098/riak/testBucket/testKey
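The same store can be done from Node.js with only the built-in `http` module. The content type travels with the value and comes back on GET, so Riak never has to parse the payload. A sketch: `storeOptions` and `store` are hypothetical helpers, and the host and port assume the default local HTTP listener used in the curl line.

```javascript
const http = require("http");

// Describe a PUT of one value; the Content-Type header is stored with the object.
function storeOptions(bucket, key, contentType) {
  return {
    host: "127.0.0.1",
    port: 8098,
    method: "PUT",
    path: `/riak/${encodeURIComponent(bucket)}/${encodeURIComponent(key)}`,
    headers: { "Content-Type": contentType },
  };
}

// Send the request; callback(err, statusCode) once Riak answers.
function store(bucket, key, body, contentType, callback) {
  const req = http.request(storeOptions(bucket, key, contentType), (res) => {
    res.resume(); // drain and discard the response body
    callback(null, res.statusCode);
  });
  req.on("error", callback);
  req.end(body);
}
```

For example, `store("testBucket", "testKey", "this is a test", "text/plain", cb)` mirrors the curl call, and `store("testBucket", "doc1", JSON.stringify({ a: 1 }), "application/json", cb)` stores JSON the same way.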
Extra Goodies
Erlang
http://www.pragprog.com/titles/jaerlang/programming-erlang
Code Architecture
basho_bench
Multiple backends
bitcask, innodb, mem
Code architecture
Highly modularized
riak_core
riak_kv
bitcask
erlang_js
http://bitbucket.org/basho
basho_bench
Performance profiling
highly customizable
pretty pictures
key/value store generalized
https://wiki.basho.com/display/RIAK/Benchmarking+with+Basho+Bench
http://pics.livejournal.com/demmonoid/pic/00001sa7
Bitcask
Riak’s default disk backend
Append-only log
Heavy updates will grow your footprint
- Look into compaction/merging settings
All keys are kept in memory (the "keydir") with their disk offsets
https://spreadsheets.google.com/ccc?key=0Ak4OBkABJPsxdEowYXc2akxnYU9xNkJmbmZscnhaTFE&hl=en&authkey=CMHw8tYO
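Bitcask's design fits in a toy class: an append-only log plus an in-memory keydir pointing at the newest entry for each key. This in-memory sketch only illustrates the idea (the real Bitcask writes CRC-checked records to disk files and merges them in the background); `ToyBitcask` is a hypothetical name.

```javascript
class ToyBitcask {
  constructor() {
    this.log = [];           // append-only: entries are never changed in place
    this.keydir = new Map(); // key -> index of the newest entry for that key
  }

  put(key, value) {
    this.log.push({ key, value });
    this.keydir.set(key, this.log.length - 1);
  }

  get(key) {
    const offset = this.keydir.get(key);
    return offset === undefined ? undefined : this.log[offset].value;
  }

  // Deletes are appended as tombstones; the old values stay until a merge.
  delete(key) {
    this.log.push({ key, tombstone: true });
    this.keydir.delete(key);
  }

  // Merge (compaction): rewrite only the live, newest entry for each key.
  merge() {
    const fresh = [];
    const dir = new Map();
    for (const [key, offset] of this.keydir) {
      dir.set(key, fresh.length);
      fresh.push(this.log[offset]);
    }
    this.log = fresh;
    this.keydir = dir;
  }
}
```

Three writes to one key leave three log entries, which is why heavy updates grow the footprint until a merge rewrites only the live values. Every key lives in the keydir, so reads cost one lookup plus one seek, and the key set must fit in RAM.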
Speak my language?
HTTP
http://wiki.basho.com/display/RIAK/REST+API
Protocol Buffers
http://wiki.basho.com/display/RIAK/PBC+API
Native Erlang
http://wiki.basho.com/display/RIAK/Erlang+Client+PBC
http://www.zazzle.com/speak_to_me_in_tagalog_tshirt-235376204895796392
Ok sounds good. How do I get it?
>git|hg clone http://bitbucket.org/basho/riak
>cd riak
>make all && make rel
OR if you’re on a mac:
>brew install riak
Ok sounds good. How do I get it?
>git|hg clone http://bitbucket.org/basho/riak_search
>cd riak_search
>make all && make rel
OR if you’re on a mac:
>brew install riak-search
What does that get me?
Fully functional
Self contained (<3)
Default configuration
-64 vnodes, “riak” cookie, N = 3
Work... like so.
Config files
http://wiki.basho.com/display/RIAK/Configuration+Files
app.config
-ring_creation_size
vm.args
-name, -settings
Fire it up
> bin/riak
> Usage: riak {start|stop|restart|reboot|ping|console|attach}
> bin/riak start
GET:
> curl -v http://127.0.0.1:8098/ping
> curl -v http://127.0.0.1:8098/stats
> curl -v http://127.0.0.1:8098/riak/myBucket
> curl -v http://127.0.0.1:8098/riak/myBucket/myKey
Do Stuff!
PUT:
> curl -v -X PUT -H "Content-Type: application/json" -d '{"props": {"backend": "ets"}}' http://127.0.0.1:8098/riak/myBucket
> curl -v -X PUT -H "Content-Type: text/plain" -d 'test key' http://127.0.0.1:8098/riak/myBucket/myKey
> curl -v -X POST -H "Content-Type: text/plain" -d 'autogen key' http://127.0.0.1:8098/riak/myBucket
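When a value is POSTed to a bucket without a key, Riak generates one and returns it in the Location header of the 201 response (of the form `/riak/myBucket/<generated key>`). A Node.js sketch of reading it back; `postWithoutKey` and `keyFromLocation` are hypothetical helpers, and the host and port assume the defaults used above.

```javascript
const http = require("http");

// Split "/riak/<bucket>/<key>" into its parts.
function keyFromLocation(location) {
  const parts = location.split("/").filter((p) => p.length > 0);
  if (parts.length !== 3 || parts[0] !== "riak") {
    throw new Error(`unexpected Location header: ${location}`);
  }
  return { bucket: decodeURIComponent(parts[1]), key: decodeURIComponent(parts[2]) };
}

// POST a value with no key; callback(err, { bucket, key }) with the generated key.
function postWithoutKey(bucket, body, contentType, callback) {
  const req = http.request(
    {
      host: "127.0.0.1",
      port: 8098,
      method: "POST",
      path: `/riak/${encodeURIComponent(bucket)}`,
      headers: { "Content-Type": contentType },
    },
    (res) => {
      res.resume(); // body is not needed; the key is in the header
      if (!res.headers.location) return callback(new Error(`no Location (status ${res.statusCode})`));
      callback(null, keyFromLocation(res.headers.location));
    }
  );
  req.on("error", callback);
  req.end(body);
}
```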
Links
Lightweight Graphing
Practical limitations re. number of links per object
Unidirectional object linking
relationship modeling (one to one, one to many)
Returns “Content-Type: multipart/mixed;”
- Library needs to be multipart aware
- nodejs, formidable
Link Walking
First level depth
> curl http://localhost:8098/riak/myBucket/myKey/_,_,_
Via Map/Reduce
> curl -X POST -H "Content-Type: application/json" \
    http://localhost:8098/mapred --data @-
{"inputs":[["myBucket","myKey"]],"query":[{"link":{}},{"map":{"language":"javascript","source":"function(v){ return [v]; }"}}]}
^D
N level depth
> curl http://localhost:8098/riak/myBucket/myKey/_,_,_/_,_,_
More Info:
http://blog.basho.com/2010/02/24/link-walking-by-example/
http://wiki.basho.com/display/RIAK/Links
http://wiki.basho.com/display/RIAK/REST+API#RESTAPI-Linkwalking
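Each link-walking step has the form `bucket,tag,keep`, with `_` as a wildcard, and the links themselves are stored on objects as `Link` headers carrying a `riaktag`. A sketch of building both strings; `linkWalkPath` and `linkHeader` are hypothetical helpers.

```javascript
// steps: array of { bucket, tag, keep }; any missing field becomes the "_" wildcard.
function linkWalkPath(startBucket, startKey, steps) {
  const specs = steps.map(({ bucket = "_", tag = "_", keep = "_" }) =>
    [bucket, tag, keep].join(",")
  );
  return `/riak/${startBucket}/${startKey}/${specs.join("/")}`;
}

// Header value that links an object to /riak/<bucket>/<key> with a tag.
function linkHeader(bucket, key, tag) {
  return `</riak/${bucket}/${key}>; riaktag="${tag}"`;
}
```

`linkWalkPath("myBucket", "myKey", [{}, {}])` reproduces the two-level walk above, and a step such as `{ bucket: "people", tag: "friend", keep: 1 }` narrows a level to one bucket and tag while keeping its results. The header from `linkHeader` would be sent with a PUT to attach the link.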
Map/Reduce
Functions written in either Erlang or JavaScript
Map is distributed to where the data lives
Reduce is run on the node coordinating the M/R
Erlang > JavaScript
Tweak JavaScript settings in app.config
M/R in Riak
An input to start from
bucket
list of keys / keyfilter
★ explicit keys > whole bucket (a bucket input has to list every key)
possible link phase
one or more map phases
(many) possible reduce phase(s)
function(v, keydata, args) {
  if (v.values) {
    var ret = [], o = {};
    o = Riak.mapValuesJson(v)[0];
    o.lastModifiedParsed = Date.parse(v["values"][0]["metadata"]["X-Riak-Last-Modified"]);
    o.key = v["key"];
    ret.push(o);
    return ret;
  } else {
    return [];
  }
}
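A job for the `/mapred` endpoint is a JSON object with `inputs` and a `query` list of phases. A small builder, assuming JavaScript phases are passed inline as `source` strings; the helper names are hypothetical, and passing explicit [bucket, key] pairs follows the advice above to prefer keys over a whole bucket.

```javascript
// inputs: a bucket name, or (preferably) an array of [bucket, key] pairs.
function mapredJob(inputs, phases) {
  return JSON.stringify({ inputs, query: phases });
}

// Link phase; "_" matches any bucket or tag.
const linkPhase = (bucket = "_", tag = "_") => ({ link: { bucket, tag } });

// JavaScript map / reduce phases; fn.toString() ships the function's source text.
const jsMap = (fn, keep = false) => ({
  map: { language: "javascript", source: fn.toString(), keep },
});
const jsReduce = (fn, keep = true) => ({
  reduce: { language: "javascript", source: fn.toString(), keep },
});
```

The resulting string is what the curl example above pipes into `/mapred` on stdin; `keep` marks which phases' results are returned to the client.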
Map = SQL Select/Where clause
Reduce = SQL Aggregates (SUM, COUNT, GROUP BY)
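The SQL analogy can be tried locally without Riak. This sketch treats map as SELECT/WHERE (emit a [tag, 1] pair per tag) and reduce as COUNT ... GROUP BY. Because Riak may call a reduce function several times, feeding it its own earlier output mixed with new values, the reduce here accepts both shapes and always returns a list. `mapTags`, `reduceCounts`, `runLocal` and the post objects are hypothetical.

```javascript
// Map ~ SELECT ... WHERE: one [tag, 1] pair per tag on the post.
function mapTags(post) {
  return post.tags.map((t) => [t, 1]);
}

// Reduce ~ COUNT ... GROUP BY tag. Accepts raw [tag, n] pairs and earlier
// {tag: count} results, so re-reducing partial output gives the same answer.
function reduceCounts(values) {
  const counts = {};
  for (const v of values) {
    if (Array.isArray(v)) {
      counts[v[0]] = (counts[v[0]] || 0) + v[1];
    } else {
      for (const [tag, n] of Object.entries(v)) counts[tag] = (counts[tag] || 0) + n;
    }
  }
  return [counts];
}

// Run map over every object, concatenate the results, then reduce once.
function runLocal(objects, map, reduce) {
  const mapped = objects.map(map).reduce((acc, xs) => acc.concat(xs), []);
  return reduce(mapped);
}
```

`runLocal([{ tags: ["riak", "nosql"] }, { tags: ["riak"] }], mapTags, reduceCounts)` counts riak twice and nosql once, and reducing in two stages gives the same totals.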
Pre/Post Commit Hooks
Pre Commit
JavaScript or Erlang
Validation
Modify data
Kill writes
Post Commit
Erlang
Indexing
Messaging
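A JavaScript pre-commit hook receives the object about to be written and either returns it (possibly modified) to let the write proceed or returns `{fail: "message"}` to kill it. A validation sketch, assuming a JSON post with a `title` field; `validatePost` is a hypothetical name, and it would be registered through the bucket's `precommit` property after being loaded from the JavaScript source directory configured in app.config (js_source_dir).

```javascript
// Hypothetical pre-commit hook: reject non-JSON posts and posts without a title.
function validatePost(object) {
  let post;
  try {
    post = JSON.parse(object.values[0].data);
  } catch (e) {
    return { fail: "Post is not JSON" }; // kills the write
  }
  if (typeof post.title !== "string" || post.title.trim() === "") {
    return { fail: "Post needs a title" }; // kills the write
  }
  // Returning the object lets the write proceed; a hook could also modify
  // object.values[0].data here before returning it.
  return object;
}
```

Post-commit hooks run after the write and, per the slide, must be written in Erlang, so indexing or messaging side effects cannot be sketched this way.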
Chief complaints
No index
No native sort
No increment
No native data structures
Riak Search
Betalicious
Superset of Riak
Full text search
http://wiki.basho.com/display/RIAK/Riak+Search
http://www.slideshare.net/rklophaus/riak-search-erlang-factory-london-2010
http://www.seowebworx.co.uk/
Riak Search... more
uses a modified bitcask backend called merge_index
enabled on a per bucket basis
access via http and command line
Riak-JS
NodeJS Riak module
Written in CoffeeScript
HTTP and Protobuf
Customizable via “meta” options
http://riakjs.org
Code demo
nodejs
riak-js
redis
simple post site
tags
json data passing
JavaScript Map
var map = function(v, keydata, args) {
  if (v.values) {
    var ret = [], o = {};
    o = Riak.mapValuesJson(v)[0];
    o.key = v["key"]; // put the key in the returned data object
    // parse the HTTP date into milliseconds so the reduce can sort numerically
    o.lastModified = Date.parse(v["values"][0]["metadata"]["X-Riak-Last-Modified"]);
    ret.push(o);
    return ret;
  } else {
    return [];
  }
};
JavaScript Reduce
var sortInt = function(data, args) {
  // field to sort on and sort direction, both optional
  var sortBy = (typeof args === "undefined" || args === null) ? undefined : args.field;
  var desc = ((typeof args === "undefined" || args === null) ? undefined : args.order) === 'desc';
  data.sort(function(a, b) {
    if (desc) {
      // swap the operands to reverse the order
      var _ref = [b, a];
      a = _ref[0];
      b = _ref[1];
    }
    return a[sortBy] - b[sortBy];
  });
  return data;
};
Putting it all together
riak
  .add("bucket")
  // map function
  .map(map)
  // reduce function
  .reduce(sortInt, { field: "lastModified", order: "desc" })
  .run(function(err, response) {
    // send out an error if there is one, and stop there
    if (err) return res.simpleJSON(400, { errortxt: 'mapreduce gone bad :(' });
    // otherwise send the data back...
    res.simpleJSON(200, { response: response });
  });
Hybrid architectures are the future!
Use tools like Redis to augment shortcomings!
1,456,023 Or “A Lot”
At scale, precision does not matter in practice.
http://photography.nationalgeographic.com/photography/enlarge/okavango-cape-buffalo_pod_image.html
Google
Look Ma!
No exact counts!
No Pagination!
No Totals!
Questions?