OrientDB distributed architecture 1.1
Uploaded by luca-garulli
Transcript of OrientDB distributed architecture 1.1
Distributed architecture with a Multi-Master approach
Available in version 1.0 (planned for December 2011)
www.orientechnologies.com Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License Page 1 of 41
rev 1.1
Where is the previous OrientDB Master/Slave architecture?
After the first tests we decided to throw away the old Master-Slave architecture because it was against the OrientDB philosophy:
it doesn't scale
and
it's hard to configure properly
So what's next?
We've redesigned the entire distributed architecture to work as Multi-Master*, to be released in version 1.0 (December 2011)
*http://en.wikipedia.org/wiki/Multi-master_replication
In the Multi-Master architecture
any node can read/write to the database
this scales out horizontally
adding nodes is straightforward
Say wow!
...but
you have to fight with
conflicts
Fortunately we found some smart ways to resolve conflicts without
falling into a
Blood Bath
The actors
Leader Node: only one per cluster. Checks the other nodes and notifies changes to the Peer Nodes. Can be any server node in the cluster, usually the first to start.
Peer Node: any server node in the cluster. Has a permanent connection to the Leader Node.
Client: clients are connected to Server Nodes, no matter whether Leader or Peer.
Database: where the data is stored.
Synchronous mode replication: the server node propagates changes and waits for the response from the remote server, then sends the ACK to the client.
Asynchronous mode replication: the server node propagates changes and sends the ACK to the client without waiting for the response from the remote server.
How is the cluster of nodes composed and managed?
Cluster auto-discovering
At start-up each Server Node broadcasts an IP Multicast message to discover whether a Leader Node is available, in order to join the cluster. If one is available, the Leader Node connects to the new node, which becomes a Peer Node; otherwise the new node becomes the Leader Node.
[Diagram: Server #1 (Leader) and Server #2 (Peer), each hosting several databases]
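The start-up rule above can be sketched as a small simulation. This is only an illustration, not OrientDB code: the `Cluster` class and the `start_node` method are hypothetical names, and the IP Multicast discovery message is replaced by a plain in-memory lookup.

```python
from typing import List, Optional


class Cluster:
    """In-memory stand-in for the network reached by the discovery message."""

    def __init__(self) -> None:
        self.leader: Optional[str] = None
        self.peers: List[str] = []

    def start_node(self, node_id: str) -> str:
        """Return the role taken by the starting node: 'LEADER' or 'PEER'."""
        if self.leader is None:
            # Nobody answered the discovery message: become the Leader.
            self.leader = node_id
            return "LEADER"
        # A Leader answered: join its cluster as a Peer.
        self.peers.append(node_id)
        return "PEER"


cluster = Cluster()
roles = [cluster.start_node("server1"), cluster.start_node("server2")]
```

The first node to start finds no Leader and takes the role itself; every later node becomes a Peer.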
One Leader, Multiple Peers
The first node to start is always the Leader, but in case of failure any other node can be elected. The Leader Node polls all the servers to verify their status and alerts all the Peer Nodes at every change in the cluster composition.
[Diagram: Server #1 (Leader), Server #2 (Peer) and Server #3 (Peer), each hosting several databases]
Asymmetric clustering
Each database can be clustered across multiple server nodes. Databases can be moved across servers. The replication strategy has per-database/per-server granularity. This means Server #2 could replicate database B asynchronously to Server #3 and database A synchronously to Server #1.
[Diagram: databases A, B and C distributed asymmetrically across Server #1 (Leader), Server #2 (Peer) and Server #3 (Peer)]
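One way to picture the per-database/per-server granularity is a lookup table keyed by source server and database. The layout below is purely hypothetical (it is not OrientDB's configuration format); the entries mirror the example given in the text.

```python
from typing import Dict, List, Tuple

# Hypothetical table: (source server, database) -> list of (target server, mode).
REPLICATION: Dict[Tuple[str, str], List[Tuple[str, str]]] = {
    ("server2", "B"): [("server3", "asynch")],
    ("server2", "A"): [("server1", "synch")],
}


def targets(server: str, database: str) -> List[Tuple[str, str]]:
    """Return where `server` replicates `database`, and in which mode."""
    return REPLICATION.get((server, database), [])
```

A pair that is absent from the table simply has no replication target.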
Distributed configuration
The cluster configuration is broadcast from the Leader Node to all the Peer Nodes. Peer Nodes broadcast it to all their connected clients. Everybody knows who has the database.
[Diagram: Server #1 (Leader), Server #2 (Peer) and Server #3 (Peer) with Client #1, Client #2 and Client #3 connected]
Security
To join a cluster, a Server Node has to be configured with the cluster name and password.
Broadcast messages are encrypted using the password.
The password doesn't cross the network: it's stored in the configuration file.
[Diagram: Server #2 (Peer) can join the cluster of Server #1 (Leader) ONLY if it knows the name and password]
Leader election
Each Peer Node continuously checks the connection with the Leader Node.
If the connection is lost, the Peer Node tries to elect itself as the new Leader Node.
A split network is resolved using a simple algorithm.
[Diagram: Server #1 (192.168.0.10:2424) and Server #2 (192.168.10.27:2424) are both Leaders]
Server #1 takes the leadership because it has the lower ID.
ID = <ip-address>:<port>
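The "lower ID wins" rule can be sketched as follows. The slides do not say how two IDs are ordered, so this sketch assumes IPv4 addresses compared octet by octet as numbers, then by port; `node_key` and `resolve_leader` are hypothetical helper names.

```python
from typing import Iterable, Tuple


def node_key(node_id: str) -> Tuple[Tuple[int, ...], int]:
    """Split '<ip-address>:<port>' into a numerically comparable key (IPv4 assumed)."""
    host, port = node_id.rsplit(":", 1)
    return tuple(int(octet) for octet in host.split(".")), int(port)


def resolve_leader(candidates: Iterable[str]) -> str:
    """Among the nodes that claim leadership, keep the one with the lowest ID."""
    return min(candidates, key=node_key)


winner = resolve_leader(["192.168.10.27:2424", "192.168.0.10:2424"])
```

With the two addresses from the slide, `winner` is `192.168.0.10:2424`, matching the statement that Server #1 takes the leadership.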
Multiple clusters
Multiple separate clusters can coexist in the same network.
Clusters can't see each other: they are separate boxes.
What identifies a cluster is name + password.
[Diagram: Cluster 'A', password 'aaa', with Server #1 (Leader), Server #2 (Peer) and Server #3 (Peer); Cluster 'B', password 'bbb', with Server #1 (Leader), Server #2 (Peer) and Server #3 (Peer)]
Fail-over
Clients know about the other nodes, so they transparently switch to good servers. No error is sent to the client app.
Running transactions will be repeated transparently too (v1.2).
[Diagram: Client #1 to Client #4 connected to Server #1 (DB-1) and Server #2 (DB-2)]
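A client-side fail-over loop could look roughly like this. It is a sketch under assumptions: the client already holds the list of servers (as the distributed configuration slide suggests), a failed server shows up as a `ConnectionError`, and `execute_with_failover` is a hypothetical name, not an OrientDB API.

```python
from typing import Callable, Sequence


def execute_with_failover(servers: Sequence[str], operation: Callable[[str], str]) -> str:
    """Run `operation` on the first reachable server, hiding failures from the caller."""
    last_error = None
    for server in servers:
        try:
            return operation(server)
        except ConnectionError as exc:
            # This server is down: silently try the next known node.
            last_error = exc
    raise ConnectionError("no server available") from last_error


def demo_operation(server: str) -> str:
    """Pretend that server1 is unreachable."""
    if server == "server1":
        raise ConnectionError("server1 is down")
    return "OK from " + server


result = execute_with_failover(["server1", "server2"], demo_operation)
```

The application only sees an error when every known server has failed.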
How does replication work?
Synchronous Replication
Guarantees that the two databases are always consistent.
More expensive than asynchronous replication because the first server waits for the second server's answer before sending the ACK back to the client. After the ACK, the client can be sure the data has been stored on multiple nodes.
[Diagram: Server #1 (DB-1) replicating to Server #2 (DB-2)]
Synchronous Replication steps
1) Client #1 sends an update record request to Server #1
2) Server #1 updates the record in DB-1
3) Server #1 propagates the update to Server #2
4) Server #2 updates the record in DB-2
5) Server #2 sends back OK to Server #1
6) Server #1 sends back OK to Client #1
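The six steps can be traced in a few lines of code. This is a toy model, not the OrientDB implementation: `Server`, `apply` and `update_sync` are hypothetical names, and each database is a plain dictionary.

```python
from typing import Dict, List


class Server:
    """Toy server node holding one database as a dictionary."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.db: Dict[str, str] = {}
        self.replicas: List["Server"] = []

    def apply(self, rid: str, value: str) -> str:
        """Steps 4-5: update the local record and answer OK to the sender."""
        self.db[rid] = value
        return "OK"

    def update_sync(self, rid: str, value: str) -> str:
        """Steps 1-6 as seen from the server that received the client request."""
        self.db[rid] = value                     # step 2: update the local database
        for replica in self.replicas:            # step 3: propagate the update
            if replica.apply(rid, value) != "OK":
                raise RuntimeError("replication failed on " + replica.name)
        return "OK"                              # step 6: ACK only after every answer


server1, server2 = Server("server1"), Server("server2")
server1.replicas.append(server2)
ack = server1.update_sync("#10:20", "new value")
```

Because the ACK is returned only after the loop, the client knows both copies are aligned when it receives `OK`.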
Asynchronous Replication
Changes are propagated without waiting for the answer.
The two databases could be inconsistent for a few ms. For this reason it's called “Eventually Consistent”.
It's much less expensive than synchronous replication.
Asynchronous Replication steps
(4a and 4b are executed in parallel)
1) Client #1 sends an update record request to Server #1
2) Server #1 updates the record in DB-1
3) Server #1 propagates the update to Server #2
4a) Server #1 sends back OK to Client #1
4b) Server #2 updates the record in DB-2
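The asynchronous flow can be approximated with a queue and a background thread. This is a sketch under assumptions, not OrientDB code: `AsyncReplicator` is a hypothetical class, the two databases are dictionaries in the same process, and `wait_until_aligned` exists only so the example can show the moment when both copies match.

```python
import queue
import threading
from typing import Dict


class AsyncReplicator:
    """Toy model: `local` plays DB-1 on Server #1, `remote` plays DB-2 on Server #2."""

    def __init__(self) -> None:
        self.local: Dict[str, str] = {}
        self.remote: Dict[str, str] = {}
        self._pending: "queue.Queue" = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def _drain(self) -> None:
        """Background worker standing in for Server #2."""
        while True:
            rid, value = self._pending.get()
            self.remote[rid] = value             # step 4b: update record in DB-2
            self._pending.task_done()

    def update(self, rid: str, value: str) -> str:
        self.local[rid] = value                  # step 2: update record in DB-1
        self._pending.put((rid, value))          # step 3: propagate the update
        return "OK"                              # step 4a: ACK without waiting

    def wait_until_aligned(self) -> None:
        """Block until every propagated change has been applied remotely."""
        self._pending.join()


replicator = AsyncReplicator()
ack = replicator.update("#10:20", "new value")
replicator.wait_until_aligned()
```

Between `update` returning and the worker applying the change, the two dictionaries can differ, which is the short inconsistency window the slide describes.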
Error Management
During replication the second server could get an error due to a conflict (the record was modified at the same moment by another client) or an I/O problem. In this case the error is logged to disk to be fixed later.
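1) Client #1 sends an update record request to Server #1
2) Server #1 updates the record in DB-1
3) Server #1 propagates the update to Server #2
4) Server #1 sends back OK to Client #1
5) Server #2 tries to update the record in DB-2
6) Server #2 logs the error to the Synch Log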
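Steps 5 and 6 can be sketched as a guarded update on the second server. The conflict check used here (a per-record version number) is an assumption made for the example, and `ConflictError`, `apply_on_replica` and `propagate` are hypothetical names; the Synch Log is a plain list instead of a file on disk.

```python
from typing import Dict, List, Tuple


class ConflictError(Exception):
    """Raised when the record changed on the replica in the meantime."""


def apply_on_replica(db: Dict[str, Tuple[int, str]], rid: str,
                     value: str, base_version: int) -> None:
    """Step 5: update the record only if it is still at the expected version."""
    version, _ = db.get(rid, (0, ""))
    if version != base_version:
        raise ConflictError("%s is at version %d, expected %d" % (rid, version, base_version))
    db[rid] = (version + 1, value)


def propagate(db: Dict[str, Tuple[int, str]], synch_log: List[dict],
              rid: str, value: str, base_version: int) -> None:
    """Steps 5-6: try the update and log the failure so it can be fixed later."""
    try:
        apply_on_replica(db, rid, value, base_version)
    except (ConflictError, OSError) as exc:
        synch_log.append({"rid": rid, "value": value, "error": str(exc)})


db2 = {"#10:20": (3, "a")}
synch_log: List[dict] = []
propagate(db2, synch_log, "#10:20", "b", base_version=2)   # stale version: logged
propagate(db2, synch_log, "#10:20", "c", base_version=3)   # up to date: applied
```

The client has already received its OK at step 4, so the failure never reaches it; the log entry is what allows the record to be repaired afterwards.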
Conflict Management
During replication, conflicts can happen if two clients update the same record at the same time.
The conflict resolution strategy can be plugged in by providing implementations of the OConflictResolver interface.
[Diagram: Server #2 applies the Conflict Strategy to DB-2]
Conflict Management: default strategy
The default implementation merges the records: if the same fields were changed, the oldest document wins and the newest is written to the Synch Log.
[Diagram: Server #2 with the Default Conflict Strategy and the Synch Log]
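The merge rule can be illustrated on plain dictionaries. This is a simplified reading of the slide, not the actual default resolver and not the OConflictResolver signature: documents are dictionaries, a field counts as "changed by both" when the two documents hold different values for it, and `merge_documents` is a hypothetical name.

```python
from typing import Dict, List


def merge_documents(oldest: Dict[str, object], newest: Dict[str, object],
                    synch_log: List[Dict[str, object]]) -> Dict[str, object]:
    """Merge two versions of a record; on clashing fields the oldest document wins."""
    clashing = [field for field in oldest
                if field in newest and oldest[field] != newest[field]]
    merged = dict(newest)      # start from the newest document...
    merged.update(oldest)      # ...then let the oldest override shared fields
    if clashing:
        # The losing document is kept in the Synch Log for manual review.
        synch_log.append(newest)
    return merged


log: List[Dict[str, object]] = []
older = {"name": "Luca", "city": "Rome"}
newer = {"name": "Luca", "city": "London", "age": 30}
merged = merge_documents(older, newer, log)
```

Fields that appear in only one of the documents survive the merge; only the clashing field `city` is decided in favour of the oldest document.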
Manual control of conflicts
like SVN/GIT tools
Display the diff of 2 databases
> compare database db1 db2
Copy a record across databases
> copy record #10:20@db1 to #10:20@db2
Copy an entire cluster across databases
> copy cluster city@db1 to city@db2
Merge two records across databases
> merge records #10:20@db1 #10:20@db2 to #10:[email protected]
How are nodes re-aligned once they are up again after a failure, shutdown or network problem?
During replication all operations are logged using a unique op-id with the format <node>#<serial>
[Diagram: the Client updates a record; Server #1 (DB-1) and Server #2 (DB-2) both write Op-id 192.168.0.10:2424#123232 to their Operation Log]
On restart the node asks the Leader which servers to synchronize with.
Op-ids are used to find the missed operations.
[Diagram: Operation Logs of Server #1 (DB-1) and Server #2 (DB-2), with Op-ids 192.168.1.11:2424#9569 and 192.168.0.10:2424#123232]
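Finding the missed operations from op-ids might work along these lines. The sketch assumes that serials grow monotonically per node and that the restarting node remembers the last serial it applied for each node; `parse_op_id` and `missed_operations` are hypothetical helpers.

```python
from typing import Dict, List, Tuple


def parse_op_id(op_id: str) -> Tuple[str, int]:
    """Split '<node>#<serial>' into the node ID and the numeric serial."""
    node, serial = op_id.rsplit("#", 1)
    return node, int(serial)


def missed_operations(operation_log: List[str], last_seen: Dict[str, int]) -> List[str]:
    """Return the op-ids in `operation_log` newer than what the node already applied."""
    missed = []
    for op_id in operation_log:
        node, serial = parse_op_id(op_id)
        if serial > last_seen.get(node, -1):
            missed.append(op_id)
    return missed


remote_log = [
    "192.168.0.10:2424#123231",
    "192.168.0.10:2424#123232",
    "192.168.1.11:2424#9569",
]
last_seen = {"192.168.0.10:2424": 123231, "192.168.1.11:2424": 9569}
to_replay = missed_operations(remote_log, last_seen)
```

Here the restarting node only has to replay `192.168.0.10:2424#123232`.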
To be consistent or not to be,
that is the question
Always consistent: use it as a Master-Slave
Server #1, Master, read + write: all changes go to this server, avoiding conflicts.
Server #2, Synch Slave, read only: consistent. Leave it as a replica. Since it's always aligned, it's the best candidate as new master if Server #1 is unavailable.
[Diagram: Clients connected to Server #1 and Server #2]
Read-only scaling using many asynchronous replicas
Perfect for Analysis, Business Intelligence and Reports. One-way only.
Server #1, Master, read + write: all changes go to this server, avoiding conflicts.
Server #2, Synch Slave, read only.
Server #3 to Server #N, Asynch Slaves, read only: eventually consistent. Replication cost close to zero.
[Diagram: Clients connected to the servers]
Read/Write scaling: Multi master + handling conflicts
Server #1, Master, read + write
Server #2, Master, read + write
Server #3, Master, read + write
[Diagram: Clients connected to each of the three Masters]
Read/Write scaling + sharding: Multi master, no conflict! :-)
Server USA, Master, read + write: the client writes on customers_usa
Server CHI, Master, read + write: the client writes on customers_china
[Diagram: Server USA and Server CHI with customers_usa and customers_china]
Multi-Master + Sharding =
big scale in high-availability and no conflicts
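Why sharding removes conflicts can be shown with a routing table: if every shard has exactly one writer, two masters never update the same record. The `OWNER` map and `route_write` function are hypothetical names used only for this sketch.

```python
from typing import Dict

# Hypothetical ownership map: each shard is written on one master only.
OWNER: Dict[str, str] = {
    "customers_usa": "Server USA",
    "customers_china": "Server CHI",
}


def route_write(shard: str) -> str:
    """Return the only master that accepts writes for this shard."""
    return OWNER[shard]


usa_writer = route_write("customers_usa")
china_writer = route_write("customers_china")
```

Both servers stay masters, but for different data, so replication between them never has to reconcile concurrent writes to the same record.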
NuvolaBase.com (beta)
The first Graph Database on the Cloud
always available
a few seconds to set it up
use it from Web & Mobile apps
Luca Garulli
Author of OrientDB and Roma <Meta> Framework Open Source projects,
Member of JSR#12 (jdo 1.0) and JSR#243 (jdo 2.0)
CEO at Nuvola Base Ltd
www.twitter.com/lgarulli
@London, UK and @Rome, Italy