Intravert Server side processing for Cassandra

Transcript of Intravert Server side processing for Cassandra

1. Before we get into the heavy stuff, let's imagine hacking around with C* for a bit...

2. You run a large video website

    CREATE TABLE videos (
      videoid uuid,
      videoname varchar,
      username varchar,
      description varchar,
      tags varchar,
      upload_date timestamp,
      PRIMARY KEY (videoid, videoname)
    );

    INSERT INTO videos (videoid, videoname, username, description, tags, upload_date)
    VALUES (99051fe9-6a9c-46c2-b949-38ef78858dd0, 'My funny cat', 'ctodd',
            'My cat likes to play the piano! So funny.', 'cats,piano,lol',
            '2012-06-01 08:00:00');

3. You have a bajillion users

    CREATE TABLE users (
      username varchar,
      firstname varchar,
      lastname varchar,
      email varchar,
      password varchar,
      created_date timestamp,
      PRIMARY KEY (username)
    );

    INSERT INTO users (username, firstname, lastname, email, password, created_date)
    VALUES ('tcodd', 'Ted', 'Codd', '[email protected]',
            '5f4dcc3b5aa765d61d8327deb882cf99', '2011-06-01 08:00:00');

4. You can query up a storm

    SELECT firstname, lastname FROM users WHERE username = 'tcodd';

     firstname | lastname
    -----------+----------
     Ted       | Codd

    SELECT * FROM videos
    WHERE videoid = b3a76c6b-7c7f-4af6-964f-803a9283c401 AND videoname > 'N';

     videoid                              | videoname               | description                                          | tags           | upload_date              | username
    --------------------------------------+-------------------------+------------------------------------------------------+----------------+--------------------------+----------
     b3a76c6b-7c7f-4af6-964f-803a9283c401 | Now my dog plays piano! | My dog learned to play the piano because of the cat. | dogs,piano,lol | 2012-08-30 16:50:00+0000 | ctodd

5. That's great! Then you ask yourself...

6. Questions you start asking
  • Can I slice a slice (or sub query)?
  • Can I do advanced where clauses?
  • Can I union two slices server side?
  • Can I join data from two tables without two request/response round trips?
  • What about procedures?
  • Can I write functions or aggregation functions?

7. Let's look at the APIs we have
  http://www.slideshare.net/aaronmorton/apachecon-nafeb2013

8. But none of those APIs do what I want, and it seems simple enough to do...

9. IntraVert joins the party at the API layer

10. Why not just do it client side?
  • Move processing close to data
  • Idea borrowed from Hadoop
  • Doing work close to the source can result in:
    • Less network IO
    • Less memory spent encoding/decoding throwaway data
    • New storage and access paradigms

11. Vert.x + Cassandra
  • What is Vert.x?
    • Distributed event bus which spans the server and even penetrates into the client side for effortless real-time web applications
  • What are the cool features?
    • Asynchronous
    • Hot re-loadable modules
    • Modules can be written in Groovy, Ruby, Java, JavaScript
  http://vertx.io

12. Transport, payload, and batching

13. HTTP transport
  • HTTP is easy to use on firewalled networks
  • Easy to secure
  • Easy to compress
  • The de facto way to do everything anyway
  • IntraVert attempts to limit round trips, not provide a terse binary format

14. JSON payload
  • Simple nested types like list, map, String
  • Request is composed of N operations
  • Each operation has a configurable timeout
  • Again, IntraVert attempts to limit round trips, not provide a terse message format

15. Why not use lightning-fast transport and serialization library X?
  • X's language/code gen issues
  • You probably cannot tcpdump X
  • Net admins don't like 90 jars for health checks
  • IntraVert attempts to limit round trips:
    • Prepared statements
    • Server-side filtering
    • Other cool stuff

16. Sample request and response

  Request:

    {"e": [
      { "type": "SETKEYSPACE",
        "op": { "keyspace": "myks" } },
      { "type": "SETCOLUMNFAMILY",
        "op": { "columnfamily": "mycf" } },
      { "type": "SLICE",
        "op": { "rowkey": "beers",
                "start": "Allagash",
                "end": "Sierra Nevada",
                "size": 9 } }
    ]}

  Response:

    { "exception": null,
      "exceptionId": null,
      "opsRes": {
        "0": "OK",
        "1": "OK",
        "2": [ { "name": "Founders",
                 "value": "Breakfast Stout" } ]
      }
    }

17. Server-side filter

18. Imagine your data looks like...

    { "rowkey": "beers", "name": "Allagash", "value": "Allagash Tripel" }
    { "rowkey": "beers", "name": "Founders", "value": "Breakfast Stout" }
    { "rowkey": "beers", "name": "DogfishHead", "value": "Hellhound IPA" }

19. Application requirement
  • User request wishes to know which beers are Breakfast Stout(s)
  • Common solutions:
    • Write a copy of the data sorted by type
    • Request all the data and parse on the client side

20. Using an IntraVert filter
  • Send a function to the server
  • Function is applied to subsequent get or slice operations
  • Only results of the filter are returned to the client

21. Defining a filter: JavaScript
  • Syntax to create a filter

    { "type": "CREATEFILTER",
      "op": { "name": "stouts",
              "spec": "javascript",
              "value": "function(row) { if (row['value'] == 'Breakfast Stout') return row; else return null; }" }
    },

22. Defining a filter: Groovy/Java
  • We can define a Groovy closure or Java filter

    { "type": "CREATEFILTER",
      "op": { "name": "stouts",
              "spec": "groovy",
              "value": "{ row -> if (row['value'] == 'Breakfast Stout') return row else return null }" }
    },

23. Filter flow

24. Common filter use cases
  • Transform data
  • Prune columns/rows like a where clause
  • Extract data from complex fields (JSON, XML, protobuf, etc.)

25. Some light relief

26. Server-side multi-processor

27. It's the cure for your Redis envy

28. Imagine your data looks like...

    { row key:1, name:a, val... }      { row key:4, name:a, val... }
    { row key:1, name:b, val... }      { row key:4, name:z, val... }

29. Application requirements
  • User wishes to intersect the column names of two slices/queries
  • Common solutions:
    • Pull all results to the client and apply the intersection there

30. Server-side MultiProcessor
  • Send a class that implements the MultiProcessor interface to the server

    public List multiProcess(Map input, Map params);

  • Do one or more get/slice operations as input
  • Invoke MultiProcessor on input

31. Multi-processor flow

32. Multi-processor use cases
  • Union of N slices
  • Intersection of N slices
  • Some join scenarios

33. Fat client becomes the Phat client

34. Imagine you want to insert this data
  • User wishes to enter this event for multiple column families

    09/10/2011 11:12:13 App=Yahoo Platform=iOS OS=4.3.4 Device=iPad2,1 Resolution=768x1024 Events videoPlayPercent=38 Taste=great

  http://www.slideshare.net/charmalloc/jsteincassandranyc2011

35. Inserting the data

    aggregateColumnNames("AppPlatformOSVersionDeviceResolution") = "app+platform+osversion+device+resolution#"

    def ccAppPlatformOSVersionDeviceResolution(c: (String) => Unit) = {
      c(aggregateColumnNames("AppPlatformOSVersionDeviceResolution") + app + p(platform) + p(osversion) + p(device) + p(resolution))
    }

    aggregateKeys(KEYSPACE \ "ByMonth") = month   //201109
    aggregateKeys(KEYSPACE \ "ByDay") = day       //20110910
    aggregateKeys(KEYSPACE \ "ByHour") = hour     //2011091012
    aggregateKeys(KEYSPACE \ "ByMinute") = minute //201109101213

    def r(columnName: String): Unit = {
      aggregateKeys.foreach { tuple: (ColumnFamily, String) => {
        val (columnFamily, row) = tuple
        if (row != null && row.size > 0)
          rows add (columnFamily -> row has columnName inc) //increment the counter
      }}
    }

    ccAppPlatformOSVersionDeviceResolution(r)

  http://www.slideshare.net/charmalloc/jsteincassandranyc2011

36. Solution
  • Send the data once and compute the N permutations on the server side

    public void process(JsonObject request, JsonObject state, JsonObject response, EventBus eb) {
      // Read the parameters sent once by the client.
      JsonObject params = request.getObject("mpparams");
      String uid = (String) params.getString("userid");
      String fname = (String) params.getString("fname");
      String lname = (String) params.getString("lname");
      String city = (String) params.getString("city");
      // Build one row mutation keyed by user id and add a column per field.
      RowMutation rm = new RowMutation("myks", IntraService.byteBufferForObject(uid));
      QueryPath qp = new QueryPath("users", null, IntraService.byteBufferForObject("fname"));
      rm.add(qp, IntraService.byteBufferForObject(fname), System.nanoTime());
      QueryPath qp2 = new QueryPath("users", null, IntraService.byteBufferForObject("lname"));
      rm.add(qp2, IntraService.byteBufferForObject(lname), System.nanoTime());
      ...
      try {
        StorageProxy.mutate(mutations, ConsistencyLevel.ONE);
      } catch (WriteTimeoutException | UnavailableException | OverloadedException e) {
        e.printStackTrace();
        response.putString("status", "FAILED");
        return; // do not overwrite the FAILED status below
      }
      response.putString("status", "OK");
    }

37. Service processor flow

38. IntraVert status
  • Still pre 1.0
  • Good docs: https://github.com/zznate/intravert-ug/wiki/_pages
  • Functional equivalent to Thrift (most features ported)
  • CQL support
  • Virgil (coming soon)
  • HBase-like scanners (coming soon)

39. Hack at it
  https://github.com/zznate/intravert-ug

40. Questions?
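
The filter and multi-processor slides describe behaviour that is easy to try locally. The following is a minimal sketch in plain JavaScript, not IntraVert code: `applyFilter` and `intersectColumnNames` are hypothetical helper names invented for illustration, and the sketch only assumes what the slides state, namely that a filter returns either the row or null, and that a multi-processor can intersect the column names of two slices. The sample rows and the `stouts` function body are taken from the deck.

```javascript
// Sample rows from the "Imagine your data looks like..." slide.
const rows = [
  { rowkey: "beers", name: "Allagash", value: "Allagash Tripel" },
  { rowkey: "beers", name: "Founders", value: "Breakfast Stout" },
  { rowkey: "beers", name: "DogfishHead", value: "Hellhound IPA" },
];

// The "stouts" filter body from the CREATEFILTER slide.
const stouts = function (row) {
  if (row["value"] == "Breakfast Stout") return row;
  else return null;
};

// Hypothetical helper (not IntraVert's API): run the filter over each row
// of a slice and keep only the rows for which it returns a non-null result.
function applyFilter(filter, sliceRows) {
  const kept = [];
  for (const row of sliceRows) {
    const result = filter(row);
    if (result !== null && result !== undefined) kept.push(result);
  }
  return kept;
}

// Hypothetical helper (not IntraVert's API): intersect the column names of
// two slices, which is the multi-processor use case from the slides.
function intersectColumnNames(sliceA, sliceB) {
  const namesInB = new Set(sliceB.map((col) => col.name));
  return sliceA.map((col) => col.name).filter((n) => namesInB.has(n));
}

// Only the matching row would travel back to the client.
const filtered = applyFilter(stouts, rows);
console.log(JSON.stringify(filtered));

// Column names for row keys 1 and 4 from the multi-processor slide.
const slice1 = [{ name: "a" }, { name: "b" }];
const slice4 = [{ name: "a" }, { name: "z" }];
console.log(JSON.stringify(intersectColumnNames(slice1, slice4)));
```

Run on the client, both helpers need every row shipped over the network first. The point of the deck is that the same logic, executed next to the data, returns one beer row and one column name instead of five rows.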