Developing node-mdb: a Node.js - based clone of SimpleDB

Post on 15-Jan-2015


Talk given at the London Ajax Users Group, June 14 2011


Developing node-mdb

SimpleDB emulation using Node.js and GT.M

Rob Tweed
M/Gateway Developments Ltd

http://www.mgateway.com
Twitter: @rtweed

Could you translate that title?

• SimpleDB:
  – Amazon’s NoSQL cloud database
• Node.js:
  – evented server-side JavaScript (using V8)
• GT.M:
  – Open source global-storage based NoSQL database
• node-mdb:
  – Open source emulation of SimpleDB

SimpleDB

• Amazon’s cloud database
  – Pay as you go
• Secure HTTP interface
• Schema-free NoSQL database
• Spreadsheet-like database model
  – Domains (= tables)
    • Items (= rows)
      – Attributes (= cells)
        » Values (1+ per attribute allowed)
• SQL-like query API
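As a rough illustration of the spreadsheet-like model, a domain can be pictured as nested JavaScript objects. This is only a sketch: the item names, attribute names and values below are invented for illustration and are not taken from SimpleDB or node-mdb.

```javascript
// Hypothetical data: a 'books' domain pictured as nested objects.
// Domain -> items (rows) -> attributes (cells) -> one or more values.
const books = {
  item1: { title: ['First Book'], author: ['A. Writer', 'B. Editor'] },
  item2: { title: ['Second Book'] } // schema-free: this item has no 'author'
};

// Return every value stored for one attribute of one item
// (an empty array if the item or attribute does not exist).
function getValues(domain, itemName, attributeName) {
  const item = domain[itemName] || {};
  return item[attributeName] || [];
}

console.log(getValues(books, 'item1', 'author'));
console.log(getValues(books, 'item2', 'author'));
```

The second call returns an empty list rather than failing, which mirrors the point that items in one domain need not share the same attributes.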

Why emulate SimpleDB?

• Because I could!

• Kind of cool project

Why emulate SimpleDB?

• To provide a free, locally-available database that behaves identically to SimpleDB
  – Lots of off-the-shelf clients available
    • Standalone
      – Bolso
      – Mindscape’s SimpleDB Management Tools
    • Language-specific clients
      – boto (Python)
      – Official AWS clients for Java, .Net
      – Node.js
      – etc…

Why emulate SimpleDB?

• To perform local tests prior to committing to production on SimpleDB

• To provide a live, local backup database

• A SimpleDB database for private clouds

• To provide an immediately-consistent SimpleDB database
  – SimpleDB is “eventually consistent”

Why the GT.M database?

• I’m familiar with it
• Free Open Source NoSQL database
• Schema-free
• “Globals”:
  – Sparse persistent multi-dimensional arrays
• Hierarchical database
• Completely dynamic storage
  – No pre-declaration or specification needed
• Result: trivial to model SimpleDB in globals
• node-mdb: Good way to demonstrate the capabilities of the otherwise little-known GT.M
• More info – Google:
  – “GT.M database”
  – “universalnosql”

Why write it using Node.js?

• M/DB originally written in late 2008
  – Implemented using GT.M’s native scripting language (M)
  – Apache + m_apache gateway to GT.M for HTTP interface
• I’ve been working with Node.js for about a year now
  – Rewriting M/DB in JavaScript would make it more widely interesting and comprehensible
• Some performance issues reported with M/DB when being pushed hard

Why Node.js?

• Conclusion:
  – Re-implementing M/DB using Node.js should provide better performance and scalability
  – Fewer moving parts:
    • Apache + m_apache + GT.M / multi-threaded
    • Node.js + GT.M as child processes / single-thread
  – Cool Node.js project to attempt
  – Great example of non-trivial use of Node.js + database

How does SimpleDB work?

[Diagram: an incoming SDB HTTP request reaches the HTTP server; the request is authenticated (HMAC-SHA) using the security key ID and secret key; the API action is executed against the SimpleDB database (copies 1, 2 … n); an HTTP response is generated and returned as the outgoing SDB HTTP response, carrying an error, or success and/or data/results.]

Node.js can emulate all this

[Diagram: the request flow again: incoming SDB HTTP request, HTTP server, authenticate request (HMAC-SHA, security key ID and secret key), execute API action, generate HTTP response, SimpleDB database copies 1, 2 … n, outgoing SDB HTTP response (error, or success and/or data/results).]

GT.M can emulate this

[Diagram: the request flow with a single SimpleDB database copy: incoming SDB HTTP request, HTTP server, authenticate request (security key ID and secret key), execute API action, generate HTTP response, outgoing SDB HTTP response (error, or success and/or data/results).]

Node.js characteristics

• Single threaded process

• Event loop

• Non-blocking I/O
  – Asynchronous calls to functions that handle I/O
  – Event-driven call-back functions when function completes
    • Data fetched
    • Data saved
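A small sketch of what non-blocking means in practice: the call returns at once, and the call-back fires on a later turn of the event loop. The fetchData() function is a hypothetical stand-in for a real I/O call, not part of node-mdb.

```javascript
const order = [];

// Hypothetical stand-in for an I/O call: it returns immediately and
// invokes the call-back on a later turn of the event loop.
function fetchData(callback) {
  setImmediate(function () {
    callback(null, 'data fetched');
  });
}

order.push('before call');
fetchData(function (error, result) {
  order.push(result);
  console.log(order.join(' -> '));
});
order.push('after call'); // runs before the call-back fires
```

The line after the fetchData() call is recorded before the fetched data, because the single thread carries on while the "I/O" is pending.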

Result: deeply nested call-backs

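[Diagram: the request stages, each nested inside the previous one’s call-back: HTTP server, authenticate request (security key ID and secret key), execute API action, generate HTTP response (error, or success and/or data/results).]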

Flattening the call-back nesting

[Diagram: chain of separately defined functions: http server, processSDBRequest(), executeAPI(), sendResponse()]

http.createServer(function(req,res) {..}

var processSDBRequest = function() {…};

var executeAPI = function() {…};
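The flattening idea can be sketched with a few named stages that each hand over to the next. All function names and the fake asynchronous steps below are assumptions made for this illustration; they are not node-mdb’s actual functions.

```javascript
// Hypothetical asynchronous steps; each hands its result to a call-back.
function authenticate(request, callback) {
  setImmediate(function () { callback(null, { user: request.user }); });
}
function runAction(session, callback) {
  setImmediate(function () { callback(null, 'done for ' + session.user); });
}

// Each stage is a named function that calls the next one, so the
// code reads top to bottom instead of nesting ever deeper.
function handleRequest(request, done) {
  authenticate(request, function (error, session) {
    if (error) return done(error);
    executeStage(session, done);
  });
}

function executeStage(session, done) {
  runAction(session, function (error, result) {
    if (error) return done(error);
    respondStage(result, done);
  });
}

function respondStage(result, done) {
  done(null, 'response: ' + result);
}

handleRequest({ user: 'rob' }, function (error, response) {
  console.log(error || response);
});
```

No stage contains more than one level of call-back, however many stages the request passes through.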

Node.js HTTP Server

http.createServer(function(request, response) {
  request.content = '';
  request.on("data", function(chunk) {
    request.content += chunk;
  });
  request.on("end", function(){
    var SDB = {
      startTime: new Date().getTime(),
      request: request,
      response: response
    };
    var urlObj = url.parse(request.url, true);
    if (request.method === 'POST') {
      SDB.nvps = parseContent(request.content);
    }
    else {
      SDB.nvps = urlObj.query;
    }
    var uri = urlObj.pathname;
    if ((uri.indexOf(sdbURLPattern) !== -1)||(uri.indexOf(mdbURLPattern) !== -1)) {
      processSDBRequest(SDB);
    }
    else {
      var uriString = 'http://' + request.headers.host + request.url;
      var error = {code:'InvalidURI', message: 'The URI ' + uriString + ' is not valid',status:400};
      returnError(SDB, error);
    }
  });
}).listen(httpPort);

processSDBRequest()

var processSDBRequest = function(SDB) {
  var accessKeyId = SDB.nvps.AWSAccessKeyId;
  if (!accessKeyId) {
    var error = {code:'AuthMissingFailure', message: 'AWS was not able to authenticate the request: access credentials are missing',status:403};
    returnError(SDB, error);
  }
  else {
    MDB.getGlobal('MDBUAF', ['keys', accessKeyId], function (error, results) {
      if (!error) {
        if (results.value !== '') {
          accessKey[accessKeyId] = results.value;
          validateSDBRequest(SDB, results.value);
        }
        else {
          var error = {code:'AuthMissingFailure', message: 'AWS was not able to authenticate the request: access credentials are missing',status:403};
          returnError(SDB, error);
        }
      }
    });
  }
};

validateSDBRequest()

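var validateSDBRequest = function(SDB, secretKey) {
  // Note: crypto.createHmac() takes OpenSSL digest names such as 'sha256',
  // so this SimpleDB-style name may need mapping before it reaches digest().
  var type = 'HmacSHA256';
  var stringToSign = createStringToSign(SDB, true);
  var hash = digest(stringToSign, secretKey, type);
  if (hash === SDB.nvps.Signature) {
    processSDBAction(SDB);
  }
  else {
    errorResponse('SignatureDoesNotMatch', SDB);
  }
};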

stringToSign()

POST{lf}192.168.1.134:8081{lf}/{lf}AWSAccessKeyId=rob&Action=ListDomains&MaxNumberOfDomains=100&SignatureMethod=HmacSHA1&SignatureVersion=2&Timestamp=2011-06-06T22%3A39%3A30%2B00%3A00&Version=2009-04-15

i.e. reconstruct the same string that the SDB client used to sign the request,

then use rob’s secret key to sign it:

digest()

var crypto = require("crypto");

var digest = function(string, secretKey, type) {
  var hmac = crypto.createHmac(type, secretKey);
  hmac.update(string);
  return hmac.digest('base64');
};
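Putting the two steps together, the following is a minimal sketch of how a version 2 string to sign could be rebuilt and signed, assuming the usual layout of method, lower-cased host, path and sorted, percent-encoded name/value pairs (minus the Signature itself). The helper names buildStringToSign(), rfc3986Encode() and sign(), the digestNames table and the secret key are all hypothetical; this is not the createStringToSign() used by node-mdb.

```javascript
const crypto = require('crypto');

// Percent-encode per RFC 3986: encodeURIComponent leaves !'()* unescaped.
function rfc3986Encode(value) {
  return encodeURIComponent(value).replace(/[!'()*]/g, function (c) {
    return '%' + c.charCodeAt(0).toString(16).toUpperCase();
  });
}

// Hypothetical helper: method, host, path and the sorted, encoded
// name/value pairs (without Signature), joined by line feeds.
function buildStringToSign(method, host, path, nvps) {
  const pairs = Object.keys(nvps)
    .filter(function (name) { return name !== 'Signature'; })
    .sort() // default sort compares code units, i.e. byte order for ASCII names
    .map(function (name) {
      return rfc3986Encode(name) + '=' + rfc3986Encode(nvps[name]);
    });
  return [method, host.toLowerCase(), path, pairs.join('&')].join('\n');
}

// Map SimpleDB SignatureMethod names to the digest names crypto expects.
const digestNames = { HmacSHA1: 'sha1', HmacSHA256: 'sha256' };

function sign(stringToSign, secretKey, signatureMethod) {
  return crypto.createHmac(digestNames[signatureMethod], secretKey)
    .update(stringToSign)
    .digest('base64');
}

const nvps = {
  Version: '2009-04-15',
  Action: 'ListDomains',
  AWSAccessKeyId: 'rob',
  MaxNumberOfDomains: '100',
  SignatureMethod: 'HmacSHA1',
  SignatureVersion: '2',
  Timestamp: '2011-06-06T22:39:30+00:00'
};

const stringToSign = buildStringToSign('POST', '192.168.1.134:8081', '/', nvps);
console.log(stringToSign);
console.log(sign(stringToSign, 'not-a-real-secret', nvps.SignatureMethod));
```

The server accepts the request only if the signature it computes this way equals the Signature value the client sent, which is why both sides must build exactly the same string.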

Ready to execute an API!

[Diagram: the request flow, now at the execute API action stage: incoming SDB HTTP request, HTTP server, authenticate request (security key ID and secret key), execute API action, generate HTTP response, SimpleDB database copies 1, 2 … n, outgoing SDB HTTP response (error, or success and/or data/results).]

SimpleDB APIs (Actions)

• CreateDomain
• ListDomains
• DeleteDomain
• PutAttributes (BatchPutAttributes)
• GetAttributes
• DeleteAttributes (BatchDeleteAttributes)
• Select
• DomainMetadata

Accessing the GT.M Database

• Accessed via node-mwire
  – TCP-based wire protocol
  – Extension of Redis protocol
  – Adapted redis-node module

• APIs allow you to set/get/delete/edit Globals

GT.M Globals

• Globals = unit of persistent storage
  – Schema-free
  – Hierarchically structured
  – Sparse
  – Dynamic
  – “persistent associative array”

GT.M Globals

• A Global has:
  – A name
  – 0, 1 or more subscripts
  – String value

globalName[subscript1,subscript2,..subscriptn]=value
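To make the shape of a global concrete, here is a toy, in-memory model in JavaScript: nodes are addressed by a list of subscripts, hold string values, and exist only once they are set. It is purely illustrative; createGlobal() and its methods are invented here and are not the node-mwire API, and nothing in it is persistent.

```javascript
// A toy, in-memory stand-in for one global. Each node is addressed by
// a list of subscripts and holds a string value.
function createGlobal() {
  const nodes = new Map();
  const keyOf = function (subscripts) { return JSON.stringify(subscripts); };
  return {
    set: function (subscripts, value) {
      nodes.set(keyOf(subscripts), String(value));
    },
    // Reading a node that was never set yields an empty string here.
    get: function (subscripts) {
      const key = keyOf(subscripts);
      return nodes.has(key) ? nodes.get(key) : '';
    },
    // Sparse: only nodes that were explicitly set are counted.
    size: function () { return nodes.size; }
  };
}

const mdbGlobal = createGlobal();
mdbGlobal.set(['rob', 'domains', 1, 'name'], 'books');
mdbGlobal.set(['rob', 'domainIndex', 'books', 1], '');

console.log(mdbGlobal.get(['rob', 'domains', 1, 'name']));
console.log(mdbGlobal.size());
```

No structure had to be declared before the first set() call, which is the sense in which globals are schema-free and dynamic.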

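SDB Domain in Globals

CreateDomain: AWSAccessKeyId = ‘rob’, DomainName = ‘books’

[Diagram: tree of the MDB global, equivalent to these nodes]

MDB[‘rob’,‘domains’,1,‘name’]=‘books’
MDB[‘rob’,‘domains’,1,‘created’]=1304956337618
MDB[‘rob’,‘domains’,1,‘modified’]=1304956337618
MDB[‘rob’,‘domains’,2,‘name’]=‘accounts’
MDB[‘rob’,‘domains’,2,‘created’]=1304956337423
MDB[‘rob’,‘domains’,2,‘modified’]=1304956337423
MDB[‘rob’,‘domainIndex’,‘books’,1]=‘’
MDB[‘rob’,‘domainIndex’,‘accounts’,2]=‘’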

Multiple Domains in Globals

[Diagram: tree of the MDB global for ‘rob’: domain 1 (‘books’) with its ‘name’, ‘created’ (1304956337618) and ‘modified’ (1304956337618) nodes and its ‘domainIndex’ entry, plus a node labelled 2.]

Creating a new domain (1)

increment()

[Diagram: tree of the MDB global for ‘rob’ with two domains: 1 (‘books’, created and modified 1304956337618) and 2 (‘accounts’, created and modified 1304956337423), each with a ‘domainIndex’ entry.]

Creating a new domain (2)

setGlobal()

Key Node.js async patterns for db I/O

• Dependent pattern:
  – Can’t set the global nodes until the value of the increment() is returned
• Parallel pattern:
  – Global nodes can be created in parallel
  – No interdependence
  – BUT:
    • Need to know when they’re all completed

[Diagram: tree of the MDB global for ‘rob’: the ‘domains’ node with domain 1 (‘books’, created and modified 1304956337618) and a node labelled 2.]

Dependent pattern

MDB.increment([accessKeyId, 'domains'], 1, function (error, results) {
  var id = results.value;
  //….now create the other global nodes inside callback
});

[Diagram: IncrBy step shown on the MDB global tree for ‘rob’: the ‘domains’ node with domain 1 (‘books’, created and modified 1304956337618) and a node labelled 2.]


Parallel Pattern (semaphore)

var count = 0;
MDB.setGlobal([accessKeyId, 'domains', id, 'name'], domainName, function (error, results) {
  count++;
  if (count === 4) sendCreateDomainResponse(count, SDB);
});
MDB.setGlobal([accessKeyId, 'domains', id, 'created'], now, function (error, results) {
  count++;
  if (count === 4) sendCreateDomainResponse(count, SDB);
});
MDB.setGlobal([accessKeyId, 'domains', id, 'modified'], now, function (error, results) {
  count++;
  if (count === 4) sendCreateDomainResponse(count, SDB);
});
MDB.setGlobal([accessKeyId, 'domainIndex', nameIndex, id], '', function (error, results) {
  count++;
  if (count === 4) sendCreateDomainResponse(count, SDB);
});
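The counting logic repeated in each call-back above could be factored into one helper. The sketch below is one possible way to do that, not how node-mdb does it: runInParallel(), setNode() and the in-memory store are all hypothetical names invented for this example.

```javascript
// Hypothetical asynchronous write, standing in for a database call.
const store = {};
function setNode(path, value, callback) {
  setImmediate(function () {
    store[path.join('/')] = value;
    callback(null);
  });
}

// Start every task at once; call done() exactly once, either when the
// last call-back has fired or as soon as one task reports an error.
function runInParallel(tasks, done) {
  let remaining = tasks.length;
  let failed = false;
  if (remaining === 0) return done(null);
  tasks.forEach(function (task) {
    task(function (error) {
      if (failed) return;
      if (error) { failed = true; return done(error); }
      remaining--;
      if (remaining === 0) done(null);
    });
  });
}

const id = 2;
const now = Date.now();
runInParallel([
  function (cb) { setNode(['rob', 'domains', id, 'name'], 'accounts', cb); },
  function (cb) { setNode(['rob', 'domains', id, 'created'], now, cb); },
  function (cb) { setNode(['rob', 'domains', id, 'modified'], now, cb); },
  function (cb) { setNode(['rob', 'domainIndex', 'accounts', id], '', cb); }
], function (error) {
  console.log(error ? 'failed' : 'nodes written: ' + Object.keys(store).length);
});
```

Counting down from the number of tasks avoids hard-coding the figure 4, and the error branch means a failed write is reported rather than leaving the request waiting.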

[Diagram: tree of the MDB global for ‘rob’ with two domains: 1 (‘books’, created and modified 1304956337618) and 2 (‘accounts’, created and modified 1304956337423), each with a ‘domainIndex’ entry.]

New domain nodes created

Send CreateDomain Response

[Diagram: the request flow, now at the generate HTTP response stage: incoming SDB HTTP request, HTTP server, authenticate request (security key ID and secret key), execute API action, generate HTTP response, SimpleDB database copies 1, 2 … n, outgoing SDB HTTP response (error, or success and/or data/results).]

CreateDomain Response

<?xml version="1.0"?>
<CreateDomainResponse xmlns="http://sdb.amazonaws.com/doc/2009-04-15/">
  <ResponseMetadata>
    <RequestID>e4e9fa45-f9dc-4e5b-8f0a-777acce6505e</RequestID>
    <BoxUsage>0.0020000000</BoxUsage>
  </ResponseMetadata>
</CreateDomainResponse>

var okResponse = function(SDB) {
  var nvps = SDB.nvps;
  var xml = responseStart({action: nvps.Action, version: nvps.Version});
  xml = xml + responseEnd(nvps.Action, SDB.startTime, false);
  responseHeader(200, SDB.response);
  SDB.response.write(xml);
  SDB.response.end();
};

Node.js HTTP Server Response

http.createServer(function(request, response) {
  //…numerous call-backs deep:
  response.writeHead(status, {
    "Server": "Amazon SimpleDB",
    "Content-Type": "text/xml",
    "Date": dateNow.toUTCString()
  });
  response.write('<?xml version="1.0"?>\n');
  response.write(xml);
  response.end();
});

Entire request/response SDB round-trip completed

Demo using Bolso

• List Domains

• Create Domain

• Add an item (row) and some attributes (columns + cells)

Node.js Gotchas

• Async programming is not immediately intuitive!

• Loops
  – Calling functions that use call-backs inside a for..in loop will go horribly wrong!
• Understanding closures
  – How externally-defined variables can be used inside call-back functions
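One common way the loop gotcha shows up, sketched with invented names (saveLater() stands in for any asynchronous call): every call-back created in the loop shares the same loop variable, and by the time they run the loop has already finished.

```javascript
const attributes = { title: 'x', author: 'y', year: 'z' };
const seenWithVar = [];
const seenWithWrapper = [];

// Hypothetical asynchronous call standing in for a database write.
function saveLater(callback) { setImmediate(callback); }

// Gotcha: 'var key' is a single variable shared by every call-back, so
// when the call-backs finally run it already holds the last key.
for (var key in attributes) {
  saveLater(function () { seenWithVar.push(key); });
}

// One fix: hand each key to a function, so that every call-back
// closes over its own parameter instead of the shared loop variable.
Object.keys(attributes).forEach(function (attributeName) {
  saveLater(function () { seenWithWrapper.push(attributeName); });
});

setTimeout(function () {
  console.log(seenWithVar);     // the last key, three times
  console.log(seenWithWrapper); // each key once
}, 10);
```

The same closure rule explains both results: a call-back sees the variable itself, not the value it had when the call-back was created.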

Example

• BatchPutAttributes
  – Intuitively a for..in loop around PutAttributes
  – Had to be serialised
    • Completion of one PutAttributes calls the next
  – Copy state of SDB object and use for..in?
    • var SDBx = SDB;
    • SDBx is a pointer to SDB, not a clone of it!
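A minimal sketch of the serialised approach, in which the completion call-back of one write starts the next and each write gets its own copy of the shared state. The names batchPut() and putAttributes() and the shape of the state object are assumptions for illustration, not node-mdb’s implementation.

```javascript
// Hypothetical single-item write; records which item it was asked to store.
const written = [];
function putAttributes(state, callback) {
  setImmediate(function () {
    written.push(state.itemName);
    callback(null);
  });
}

// Process items one at a time: the completion call-back of one
// putAttributes() call starts the next.
function batchPut(sharedState, itemNames, done) {
  let index = 0;
  function next(error) {
    if (error) return done(error);
    if (index === itemNames.length) return done(null);
    // Shallow copy, so setting itemName does not alter sharedState.
    // A plain 'var copy = sharedState' would only be a second reference.
    const copy = Object.assign({}, sharedState);
    copy.itemName = itemNames[index++];
    putAttributes(copy, next);
  }
  next(null);
}

const SDB = { domainName: 'books' };
batchPut(SDB, ['item1', 'item2', 'item3'], function (error) {
  console.log(error || written.join(', '));
  console.log('itemName' in SDB); // the shared object was left untouched
});
```

Object.assign() copies only the top level, so nested objects would still be shared; a deeper copy is needed if the per-item state holds nested data.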

Conclusions

• node-mdb is now nearly complete
• Only BatchDeleteAttributes not implemented
• Other APIs emulate SimpleDB 100%
• Free Open Source
  – https://github.com/robtweed/node-mdb
  – Give it a try!
  – Use mdb.js for examples to build your own Node.js database applications
• Check out GT.M!

• Follow me on Twitter at @rtweed

• Slides: http://www.mgateway.com/node-mdb-pres.html