OrientDB distributed architecture 1.1

Distributed architecture with a Multi-Master approach

Available in version 1.0 (planned for December 2011)

www.orientechnologies.com Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License

rev 1.1

http://www.orientechnologies.com/

http://creativecommons.org/licenses/by-nd/3.0/

Where is the previous OrientDB Master/Slave architecture?




After the first tests we decided to throw away the old Master-Slave architecture because it was against the OrientDB philosophy:

it doesn't scale

and

it's hard to configure properly



So what's next?

We've re-designed the entire distributed architecture to get it working as Multi-Master*, to be released in version 1.0 (December 2011)

*http://en.wikipedia.org/wiki/Multi-master_replication




In the Multi-Master architecture

any node can read/write to the database

this scales horizontally

adding nodes is straightforward

Say wow!




...but

you have to fight with

conflicts



Fortunately we found some smart ways to resolve conflicts without falling into a

Blood Bath




The actors

Leader Node: Only one Leader per cluster. It checks the other nodes and notifies the other Peer Nodes of changes. It can be any server node in the cluster, usually the first to start.

Peer Node: Any server node in the cluster. It has a permanent connection to the Leader Node.

Client: Clients are connected to Server Nodes, no matter whether Leader or Peer.

Database: Where the data is stored.

Synchronous mode replication: The server node propagates changes, waits for the response from the remote server, then sends the ACK to the client.

Asynchronous mode replication: The server node propagates changes and sends the ACK to the client without waiting for the response from the remote server.




How is the cluster of nodes composed and managed?




Cluster auto-discovering

At start up each Server Node sends an IP Multicast message to discover whether a Leader Node is available, in order to join its cluster. If one is available, the Leader Node connects to the new node, which becomes a Peer Node; otherwise the new node becomes the Leader Node.

[Diagram: Server #1 (Leader) and Server #2 (Peer), each hosting several databases]
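The start-up decision described on this slide can be sketched as follows. This is only an illustration that simulates the discovery outcome in memory: no real multicast traffic is sent, and the names `decide_role` and `start_cluster` are hypothetical, not part of OrientDB.

```python
from typing import Dict, List, Optional


def decide_role(leader_address: Optional[str]) -> str:
    """Role taken by a starting node.

    leader_address is the Leader that answered the discovery message,
    or None if nobody answered (assumption: a timeout means "no Leader").
    """
    return "PEER" if leader_address is not None else "LEADER"


def start_cluster(node_addresses: List[str]) -> Dict[str, str]:
    """Start the nodes one after another and record the role of each."""
    leader: Optional[str] = None
    roles: Dict[str, str] = {}
    for address in node_addresses:
        role = decide_role(leader)
        roles[address] = role
        if role == "LEADER":
            leader = address  # later nodes will discover this Leader
    return roles


roles = start_cluster(["192.168.0.10:2424", "192.168.0.11:2424"])
print(roles)
```

The first node finds no Leader and takes that role; every node started afterwards joins as a Peer.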




One Leader, Multiple Peers

The first node to start is always the Leader, but in case of failure any other node can be elected. The Leader Node polls all the servers to verify their status and alerts all the Peer Nodes at every change in the cluster composition.

[Diagram: Server #1 (Leader), Server #2 (Peer) and Server #3 (Peer), each hosting several databases]




Asymmetric clustering

Each database can be clustered across multiple server nodes. Databases can be moved across servers. The replication strategy has per database/server granularity. This means you could have Server #2 replicating database B asynchronously to Server #3 and database A synchronously to Server #1.

[Diagram: databases A, B and C distributed across Server #1 (Leader), Server #2 (Peer) and Server #3 (Peer)]
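To make the per database/server granularity concrete, here is a minimal sketch of how such a replication table could be modelled. The data structure and the names `Mode`, `replication` and `targets` are assumptions made for illustration; the slides do not show the real configuration format.

```python
from enum import Enum
from typing import Dict, Tuple


class Mode(Enum):
    SYNCH = "synch"
    ASYNCH = "asynch"


# (source server, database) -> {target server: replication mode}
# Hypothetical layout mirroring the example in the text.
replication: Dict[Tuple[str, str], Dict[str, Mode]] = {
    ("Server #2", "B"): {"Server #3": Mode.ASYNCH},
    ("Server #2", "A"): {"Server #1": Mode.SYNCH},
}


def targets(source: str, database: str) -> Dict[str, Mode]:
    """Servers that receive the changes of one database from one source."""
    return replication.get((source, database), {})


print(targets("Server #2", "B"))
print(targets("Server #2", "A"))
```

Keying the table by both server and database is what lets the same server replicate one database synchronously and another asynchronously.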




Distributed configuration

The cluster configuration is broadcast from the Leader Node to all the Peer Nodes. Peer Nodes broadcast it to all their connected clients. Everybody knows who has the database.

[Diagram: Server #1 (Leader), Server #2 (Peer) and Server #3 (Peer) with Client #1, Client #2 and Client #3]




Security

To join a cluster, a Server Node must be configured with the cluster name and password.
Broadcast messages are encrypted using the password.
The password doesn't cross the network: it's stored in the configuration file.


[Diagram: Server #2 (Peer) joins the cluster of Server #1 (Leader) ONLY if it knows the name and password]



Leader election

Each Peer Node continuously checks its connection with the Leader Node.
If the connection is lost, it tries to elect itself as the new Leader Node.
A split network is resolved using a simple algorithm.


[Diagram: Server #1 at 192.168.0.10:2424 (Leader) and Server #2 at 192.168.10.27:2424 (Leader)]

Server #1 takes the leadership because it has the lower ID.
ID = <ip-address>:<port>
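The "lower ID wins" rule can be sketched in a few lines. The slides do not say how two IDs are compared, so this example assumes IPv4 addresses compared numerically octet by octet, then by port; `node_key` and `elect_leader` are hypothetical names.

```python
from typing import List, Tuple


def node_key(node_id: str) -> Tuple[int, ...]:
    """Turn '<ip-address>:<port>' into a numerically comparable tuple.

    Assumption: IPv4 only, ordered by octets and then by port.
    """
    ip, port = node_id.rsplit(":", 1)
    return tuple(int(octet) for octet in ip.split(".")) + (int(port),)


def elect_leader(candidates: List[str]) -> str:
    """Among nodes that each believe they are Leader, keep the lowest ID."""
    return min(candidates, key=node_key)


leader = elect_leader(["192.168.10.27:2424", "192.168.0.10:2424"])
print(leader)
```

With the two addresses from the slide, the node at 192.168.0.10:2424 (Server #1) keeps the leadership, matching the example.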



Multiple clusters

Multiple separate clusters can coexist in the same network.
Clusters can't see each other: they are separate boxes.
What identifies a cluster is name + password.

Cluster 'A', password 'aaa': Server #1 (Leader), Server #2 (Peer), Server #3 (Peer)

Cluster 'B', password 'bbb': Server #1 (Leader), Server #2 (Peer), Server #3 (Peer)



Fail-over

Clients know about the other nodes, so they transparently switch to good servers. No error is sent to the client app.
Running transactions will be repeated transparently too (v1.2).

[Diagram: Client #1 to Client #4 connected to Server #1 (DB-1) and Server #2 (DB-2)]
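A rough sketch of the client-side fail-over idea: the client holds the list of known servers and silently moves to the next one when a server is unreachable. Servers are modelled as plain callables; `ServerDown`, `FailoverClient` and the two demo servers are hypothetical and do not reflect the real OrientDB client API.

```python
from typing import Callable, List


class ServerDown(Exception):
    """Raised by a server stand-in that cannot be reached."""


class FailoverClient:
    def __init__(self, servers: List[Callable[[str], str]]):
        # The list of servers the client learned from the cluster configuration.
        self.servers = list(servers)

    def execute(self, request: str) -> str:
        last_error = None
        for server in self.servers:
            try:
                return server(request)  # first reachable server answers
            except ServerDown as error:
                last_error = error      # hidden from the application
        # Only when every known server failed does the application see an error.
        raise ServerDown("no server available") from last_error


def server_1(request: str) -> str:
    raise ServerDown("Server #1 is unreachable")


def server_2(request: str) -> str:
    return f"Server #2 handled: {request}"


client = FailoverClient([server_1, server_2])
answer = client.execute("read #10:20")
print(answer)
```

The application only sees the answer from Server #2; the failure of Server #1 never surfaces.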




How does the replication work?




Synchronous Replication

Guarantees that the two databases are always consistent.
More expensive than asynchronous because the First Server waits for the Second Server's answer before sending the ACK back to the client. After the ACK the Client can be sure the data is stored in multiple nodes at the same time.

[Diagram: Server #1 with DB-1 and Server #2 with DB-2]



Synchronous Replication steps

1) Client #1 sends an update record request to Server #1
2) Server #1 updates the record in DB-1
3) Server #1 propagates the update to Server #2
4) Server #2 updates the record in DB-2
5) Server #2 sends back OK to Server #1
6) Server #1 sends back OK to Client #1
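The six synchronous steps can be traced in a small in-memory sketch. It only models the ordering of the steps (the client gets its OK after the replica has answered); the `Server` class and its methods are hypothetical and there is no networking involved.

```python
from typing import Dict, Optional


class Server:
    def __init__(self, name: str):
        self.name = name
        self.db: Dict[str, dict] = {}
        self.replica: Optional["Server"] = None

    def apply(self, rid: str, record: dict) -> str:
        """Write the record to the local database and acknowledge."""
        self.db[rid] = dict(record)
        return "OK"

    def update(self, rid: str, record: dict) -> str:
        """Handle a client request (step 1) in synchronous mode."""
        self.apply(rid, record)                        # step 2: local update
        if self.replica is not None:
            answer = self.replica.apply(rid, record)   # steps 3 to 5: propagate and wait
            if answer != "OK":
                raise RuntimeError(f"{self.replica.name} rejected {rid}")
        return "OK"                                    # step 6: ACK to the client


server_1 = Server("Server #1")
server_2 = Server("Server #2")
server_1.replica = server_2

ack = server_1.update("#10:20", {"name": "Rome"})
print(ack, server_1.db, server_2.db)
```

Because `update` returns only after the replica has applied the change, both databases already hold the record when the client receives the OK.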



Asynchronous Replication

Changes are propagated without waiting for the answer.
The two databases could be inconsistent for a few ms. For this reason it's called “Eventually Consistent”.
It's much less expensive than synchronous replication.

[Diagram: Server #1 with DB-1 and Server #2 with DB-2]



Asynchronous Replication steps

(4a and 4b are executed in parallel)

4a) Sends back OK to Client #1
4b) Update record to DB-2

[Diagram: Client #1, Server #1 with DB-1 and Server #2 with DB-2]



Error Management

During replication the Second Server could get an error due to a conflict (the record was modified at the same moment by another client) or an I/O problem. In this case the error is logged to disk to be fixed later.

4) Sends back OK to Client #1
5) Update record to DB-2
6) Log the error to the Synch Log

[Diagram: Client #1, Server #1 with DB-1, Server #2 with DB-2 and the Synch Log]
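The asynchronous flow with error logging might look like the sketch below: the client is acknowledged immediately, a background worker propagates the change, and a failed propagation is appended to a synch log instead of being reported to the client. The class `AsyncServer`, the in-memory list used as synch log and the simulated I/O failure are all assumptions made for illustration.

```python
import queue
import threading
from typing import Dict, Iterable, List, Tuple


class AsyncServer:
    def __init__(self, replica_db: Dict[str, dict], failing_rids: Iterable[str] = ()):
        self.db: Dict[str, dict] = {}
        self.replica_db = replica_db
        self.failing_rids = set(failing_rids)      # simulated replica failures
        self.synch_log: List[Tuple[str, str]] = []
        self.pending: "queue.Queue" = queue.Queue()
        threading.Thread(target=self._replicate, daemon=True).start()

    def update(self, rid: str, record: dict) -> str:
        self.db[rid] = dict(record)                # local update
        self.pending.put((rid, dict(record)))      # propagate in the background
        return "OK"                                # ACK without waiting for the replica

    def _replicate(self) -> None:
        while True:
            rid, record = self.pending.get()
            try:
                if rid in self.failing_rids:
                    raise OSError(f"cannot write {rid}")
                self.replica_db[rid] = record      # update on the second database
            except OSError as error:
                self.synch_log.append((rid, str(error)))  # logged to be fixed later
            finally:
                self.pending.task_done()


db_2: Dict[str, dict] = {}
server = AsyncServer(db_2, failing_rids={"#10:21"})
ack_a = server.update("#10:20", {"name": "Rome"})
ack_b = server.update("#10:21", {"name": "London"})
server.pending.join()  # wait only so the demo can inspect the result
print(ack_a, ack_b, db_2, server.synch_log)
```

Both updates are acknowledged to the client; the second one never reaches the replica and ends up in the synch log instead.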



Conflict Management

During replication conflicts could happen if two clients are updating the same record at the same time.
The conflict resolution strategy can be plugged in by providing implementations of the OConflictResolver interface.

[Diagram: Server #2 with DB-2 and a pluggable Conflict Strategy]



Conflict Management: Default strategy

The default implementation merges the records: if the same fields were changed, the oldest document wins and the newest is written into the Synch Log.

[Diagram: Server #2 with DB-2, the Default Conflict Strategy and the Synch Log]
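As an illustration of the default merge rule, here is a minimal sketch. It is not the OConflictResolver interface, whose signature the slides do not show; `merge_versions` is a hypothetical helper. It also assumes that each version carries a timestamp, and it approximates "the same fields were changed" as "a field present in both versions with different values".

```python
from typing import Dict, List, Tuple

Version = Tuple[int, Dict[str, object]]  # (timestamp, fields); assumed representation


def merge_versions(first: Version, second: Version, synch_log: List[Version]) -> Dict[str, object]:
    """Merge two versions of a record; the oldest wins on clashing fields."""
    oldest, newest = sorted([first, second], key=lambda version: version[0])
    oldest_fields, newest_fields = oldest[1], newest[1]

    # Start from the newest fields, then let the oldest overwrite shared ones.
    merged = {**newest_fields, **oldest_fields}

    clashing = [name for name, value in oldest_fields.items()
                if name in newest_fields and newest_fields[name] != value]
    if clashing:
        synch_log.append(newest)  # keep the losing document for later review
    return merged


log: List[Version] = []
older = (1, {"name": "Rome", "population": 2800000})
newer = (2, {"name": "Roma", "country": "Italy"})
merged = merge_versions(older, newer, log)
print(merged, log)
```

Fields touched by only one side survive the merge, while the clashing `name` field keeps the value of the oldest document and the newest document is kept in the log.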



Manual control of conflicts, like SVN/GIT tools




Display the diff of 2 databases
> compare database db1 db2

Copy a record across databases
> copy record #10:20@db1 to #10:20@db2

Copy an entire cluster across databases
> copy cluster city@db1 to city@db2

Merge two records across databases
> merge records #10:20@db1 #10:20@db2 to #10:[email protected]



How are nodes re-aligned once up again after a failure, shutdown or network problem?




During replication all operations are logged using a unique op-id with the format <node>#<serial>

[Diagram: the Client updates a record; Server #1 (DB-1) and Server #2 (DB-2) each write the same entry to their Operation Log, Op-id: 192.168.0.10:2424#123232]




On restart the node asks the Leader which servers it has to synchronize with.

The op-ids are used to find out which operations were missed.

[Diagram: Server #1 (DB-1) and Server #2 (DB-2) with their Operation Logs, showing Op-id: 192.168.1.11:2424#9569 and Op-id: 192.168.0.10:2424#123232]
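One way to picture how op-ids reveal missed operations is the sketch below. It assumes that serials grow monotonically per node, which the slides suggest but do not state; the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_op_id(op_id: str) -> Tuple[str, int]:
    """Split '<node>#<serial>' into its node and numeric serial."""
    node, serial = op_id.rsplit("#", 1)
    return node, int(serial)


def last_serials(op_log: List[str]) -> Dict[str, int]:
    """Highest serial seen for each originating node."""
    latest: Dict[str, int] = {}
    for op_id in op_log:
        node, serial = parse_op_id(op_id)
        latest[node] = max(serial, latest.get(node, -1))
    return latest


def missed_operations(local_log: List[str], remote_log: List[str]) -> List[str]:
    """Operations in the remote log that the restarted node has not applied."""
    seen = last_serials(local_log)
    missed = []
    for op_id in remote_log:
        node, serial = parse_op_id(op_id)
        if serial > seen.get(node, -1):
            missed.append(op_id)
    return missed


local_log = ["192.168.0.10:2424#123231"]
remote_log = [
    "192.168.0.10:2424#123231",
    "192.168.0.10:2424#123232",
    "192.168.1.11:2424#9569",
]
to_apply = missed_operations(local_log, remote_log)
print(to_apply)
```

The restarted node would then replay only the operations in `to_apply`, in log order.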




To be consistent or not to be, that is the question




Always consistent: use it as a Master-Slave

Server #1, Master, read + write
Read/Write. All changes go to this server, avoiding conflicts.

Server #2, Synch Slave, read only
Read only, consistent. Leave it as a replica. Since it's always aligned, it's the best candidate as new master if Server #1 is unavailable. Perfect for Analysis, Business Intelligence and Reports.

One-way only.

[Diagram: two Clients connected to the servers]



Read-only scaling using many asynchronous replicas

Server #1, Master, read + write
Read/Write. All changes go to this server, avoiding conflicts.

Server #2, Synch Slave, read only

Server #3 to Server #N, Asynch Slaves, read only
Read only, eventually consistent. Replication cost close to zero.

[Diagram: two Clients connected to the servers]



Read/Write scaling: Multi master + handling conflicts

Server #1, Master, read + write
Server #2, Master, read + write
Server #3, Master, read + write

[Diagram: Clients connected to each of the three masters]



Read/Write scaling + sharding: Multi master, no conflict! :-)

Server USA, Master, read + write: a Client writes on customers_usa
Server CHI, Master, read + write: a Client writes on customers_china
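Why sharding removes conflicts can be seen in a tiny routing sketch: every cluster of records has exactly one master that accepts its writes, so two masters never update the same record. The routing table and the function `route_write` are hypothetical; only the server and cluster names come from the slide.

```python
from typing import Dict, List, Tuple

# cluster name -> the only master that accepts writes for it (assumed mapping)
SHARDS: Dict[str, str] = {
    "customers_usa": "Server USA",
    "customers_china": "Server CHI",
}

# Writes received by each master, kept in memory for the demo.
writes: Dict[str, List[Tuple[str, dict]]] = {"Server USA": [], "Server CHI": []}


def route_write(cluster: str, record: dict) -> str:
    """Send a write to the single master that owns the cluster."""
    owner = SHARDS[cluster]
    writes[owner].append((cluster, record))
    return owner


first = route_write("customers_usa", {"name": "Alice"})
second = route_write("customers_china", {"name": "Li"})
print(first, second)
```

Since each cluster has a single writer, the conflict strategy never has to run for these writes.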



Multi-Master + Sharding =

big scale in high-availability and no conflicts



NuvolaBase.com (beta)

The first Graph Database on the Cloud

always available
a few seconds to set it up
use it from Web & Mobile apps




Luca Garulli

Author of the OrientDB and Roma <Meta> Framework Open Source projects,
Member of JSR#12 (jdo 1.0) and JSR#243 (jdo 2.0)

CEO at Nuvola Base Ltd

www.twitter.com/lgarulli

@London, UK and @Rome, Italy



