Mashing the data

20
Mashing the Data Real-Time replication from MySQL to Google Cloud Datastore

Transcript of Mashing the data

Page 1: Mashing the data

Mashing the DataReal-Time replication from

MySQL to Google Cloud Datastore

Page 2: Mashing the data

Ingredients● MySQL● NodeJS● ZongJi● Google Cloud Datastore

Page 3: Mashing the data

There are two types of DBAs:1) DBAs that do backups

2) DBAs that will do backups

Page 4: Mashing the data

MySQL● Most used Open source DB - second place overall after Oracle (but almost

equal)*● Since 1995● Currently at version 5.7 (5.7.16 in Oct’16)● Several forks - MariaDB, Percona● Several storage engines, most used is InnoDB ● NDB Cluster and Master-Master Replication for HA

* According to http://db-engines.com/en/ranking

Page 5: Mashing the data

A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"

Page 6: Mashing the data

MySQL replication● Master - Slave(s)● Slaves can be Masters in their turn (Master->Slave->Slave->...->Slave)

○ log_slave_updates

● Only data modifying queries are logged (Create, Update, Delete; not Reads)

● 2 ½ types of replication○ Statement Based (SBR) -> binary log records queries (UPDATE … SET ..) which are then

replayed on slave

○ Row Based (RBR) -> binary log records directly the values of the affected row before and after the change is applied

○ Mixed -> binary log records a mix of SBR and RBR (default is SBR, but for certain statements + storage engine used, the log is automatically switched to row-based)

Page 7: Mashing the data

Q: Why do you never ask SQL people to help you move your furniture?

A: They sometimes drop the table

Page 8: Mashing the data

MySQL replication (cont’d)● SBR is good when changes affect lots of rows (as for e.g. 1k modified rows

we only send a few bytes across the wire)● SBR has problems when there are inconsistencies between master and

slave or when queries are not deterministic (e.g. UPDATE … SET … LIMIT 100)

● RBR is good in maintaining a better consistency (as every changed row is replicated)

● RBR can be problematic when many rows are changed with a single statement (lots of traffic over the network)

Page 9: Mashing the data
Page 10: Mashing the data

Google Cloud Datastore

Page 11: Mashing the data

What is GCD● NoSQL document database● Automatic scaling● High performance● Flexible storage

Page 12: Mashing the data

GCD (cont’d)● Balance of strong and eventual consistency

○ entity lookups by key and ancestor queries always receive strongly consistent data○ Other queries are eventually consistent

● Encryption at rest○ encrypts all data before it is written to disk

● Querying of data through GQL○ Similar with “classic” SQL; e.g. SELECT * FROM myKind WHERE myProp >= 100 AND

myProp < 200 or SELECT * FROM myKind ORDER BY myProp DESC LIMIT 100

● By default all properties are indexed, supports composite indexes (a bit more work to enable them though)

Page 13: Mashing the data

Our Setup

Page 14: Mashing the data

Setup

MySQL Master

MySQL Slave

SBR NodeJS App

RBR

Google Cloud Datastore

Google Cloud Node modules

Page 15: Mashing the data

Details about NodeJS App● Uses ZongJi (https://github.com/nevill/zongji - MySQL binlog listener)

var ZongJi = require('zongji');

var zongji = new ZongJi(config.database);

zongji.on('binlog',function (evt) {doSomething('binlog',evt)})

zongji.on('query', function(evt) {doSomething('query',evt)})

zongji.on('writerows',function(evt) {doSomething('insert',evt)})

zongji.on('updaterows', function(evt) {doSomething('update',evt)})

zongji.on('deleterows', function(evt) {doSomething('delete',evt)})

Page 16: Mashing the data

NodeJS (cont’d)zongji.start({

startAtEnd: true,

includeSchema: {yourDBhere":true,"yourOtherDBHere":true},//config.monitor,

includeEvents: [ 'tablemap', 'writerows', 'updaterows', 'deleterows' , 'query','rotate']

});

var doSomething = function(type, event) {

//event has a rows attribute containing every modified row

//it also has a tableMap containing table metadata (most important - table name)

}

Page 17: Mashing the data

NodeJS (last one, I promise)var sendToDataStore = function(namespace,idfldname,row) {

var k = datastore.key([namespace, row[idfldname]]);

datastore.save({key:k,data:row} ,function(err,res){

if(err) console.log("ERROR",err)

else console.log("OK",JSON.stringify(res))

});

}

Page 18: Mashing the data

Demo Time

Page 19: Mashing the data

In case the demo does not work

Page 20: Mashing the data

Thank you!