Orchestrator on Raft: internals, benefits and considerations
Shlomi Noach, GitHub
FOSDEM 2018
About me
• @github/database-infrastructure
• Author of orchestrator, gh-ost, freno, ccql and others.
• Blog at http://openark.org
• @ShlomiNoach
Agenda
• Raft overview
• Why orchestrator/raft
• orchestrator/raft implementation and nuances
• HA, fencing
• Service discovery
• Considerations
Raft
• Consensus algorithm
• Quorum based
• In-order replication log
• Delivery, lag
• Snapshots
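To make "quorum based" concrete: a raft group of n nodes needs a strict majority to elect a leader or commit a log entry. The arithmetic can be sketched in Python as follows. The function names are made up for this illustration and are not part of any raft library.

```python
def quorum_size(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n."""
    if n < 1:
        raise ValueError("cluster must have at least one node")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many nodes can be lost while a majority still remains."""
    return n - quorum_size(n)


for n in (3, 5):
    # prints "3 2 1" then "5 3 2"
    print(n, quorum_size(n), tolerated_failures(n))
```

Note that an even-sized group tolerates no more failures than the odd-sized group one node smaller, which is one reason odd sizes such as 3 or 5 are the common choice.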
HashiCorp raft
• golang raft implementation
• Used by Consul
• Recently hit 1.0.0
• github.com/hashicorp/raft
orchestrator
• MySQL high availability solution and replication topology manager
• Developed at GitHub
• Apache 2 license
• github.com/github/orchestrator
Why orchestrator/raft
• Remove MySQL backend dependency
• DC fencing
And then good things happened that were not planned:
• Better cross-DC deployments
• DC-local KV control
• Kubernetes friendly
orchestrator/raft
• n orchestrator nodes form a raft cluster
• Each node has its own, dedicated backend database (MySQL or SQLite)
• All nodes probe the topologies
• All nodes run failure detection
• Only the leader runs failure recoveries
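The split between detection (every node) and recovery (leader only) can be modeled with a toy Python class. This is a sketch under simplifying assumptions, not orchestrator's actual code: the class, its fields and its methods are all hypothetical, and "recovery" is reduced to recording the cluster name.

```python
from dataclasses import dataclass, field


@dataclass
class OrchestratorNode:
    name: str
    is_leader: bool = False
    detected: list = field(default_factory=list)   # failures this node has seen
    recovered: list = field(default_factory=list)  # failures this node acted on

    def on_probe_failure(self, cluster: str) -> None:
        # Every node records the detection, leader or not.
        if cluster not in self.detected:
            self.detected.append(cluster)
        self.maybe_recover()

    def maybe_recover(self) -> None:
        # Only the leader acts on detected failures.
        if not self.is_leader:
            return
        for cluster in self.detected:
            if cluster not in self.recovered:
                self.recovered.append(cluster)

    def become_leader(self) -> None:
        # A freshly elected leader can act on what it already detected.
        self.is_leader = True
        self.maybe_recover()


nodes = [
    OrchestratorNode("o1"),
    OrchestratorNode("o2", is_leader=True),
    OrchestratorNode("o3"),
]
for node in nodes:
    node.on_probe_failure("main-cluster")
```

Because followers keep their own detection state, a node that later gains leadership does not have to start probing from scratch before it can recover.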
Implementation & deployment @ GitHub
• One node per DC
• 1 second raft polling interval
• step-down
• raft-yield
• SQLite-backed log store
• MySQL backend (SQLite backend use case in the works)
A high availability scenario
o2 is leader of a 3-node orchestrator/raft setup
[Diagram: orchestrator nodes o1, o2, o3]
Injecting failure
master: killall -9 mysqld
o2 detects failure. About to recover, but…
Injecting 2nd failure
o2: DROP DATABASE orchestrator;
o2 freaks out. 5 seconds later it steps down
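One way to picture the step-down is a self health check with a grace period. The slide only states that o2 steps down about 5 seconds after losing its backend; the mechanism below is an assumption made for illustration, and the class and method names are hypothetical.

```python
class HealthMonitor:
    """Tracks backend health and decides when a leader should step down."""

    def __init__(self, grace_seconds: float = 5.0):
        self.grace_seconds = grace_seconds
        self.unhealthy_since = None  # time of the first failed check, if any

    def record_check(self, backend_ok: bool, now: float) -> None:
        if backend_ok:
            # Any successful check resets the clock.
            self.unhealthy_since = None
        elif self.unhealthy_since is None:
            self.unhealthy_since = now

    def should_step_down(self, now: float) -> bool:
        if self.unhealthy_since is None:
            return False
        return now - self.unhealthy_since >= self.grace_seconds
```

Passing `now` in explicitly, rather than reading the clock inside the class, keeps the sketch deterministic and easy to test.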
orchestrator recovery
o1 grabs leadership
MySQL recovery
o1 detected failure even before stepping up as leader.
o1, now leader, kicks off recovery and fails over the MySQL master
orchestrator self health tests
Meanwhile, o2 panics and bails out.
puppet
Some time later, puppet kicks orchestrator service back on o2.
orchestrator startup
orchestrator service on o2 bootstraps, creates orchestrator schema and tables.
Joining raft cluster
o2 recovers from raft snapshot, acquires raft log from an active node, rejoins the group
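The rejoin sequence (restore a snapshot, then catch up from the log) can be sketched as follows. This is a generic illustration of snapshot-plus-log recovery, not the HashiCorp raft API: `Snapshot`, `LogEntry` and `rebuild_state` are hypothetical names, and the state is reduced to a plain key/value dict.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    last_index: int  # index of the last log entry folded into the snapshot
    state: dict      # full state as of last_index


@dataclass
class LogEntry:
    index: int
    key: str
    value: str


def rebuild_state(snapshot: Snapshot, log: list) -> tuple:
    """Restore the snapshot, then apply newer log entries in index order."""
    state = dict(snapshot.state)  # copy, so the snapshot itself is untouched
    applied = snapshot.last_index
    for entry in sorted(log, key=lambda e: e.index):
        if entry.index <= applied:
            continue  # already reflected in the snapshot
        state[entry.key] = entry.value
        applied = entry.index
    return state, applied
```

Skipping entries at or below the snapshot index is what keeps the restore from applying the same change twice.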
Grabbing leadership
Some time later, o2 grabs leadership
DC fencing
• Assume this 3 DC setup
• One orchestrator node in each DC
• Master and a few replicas in DC2
• What happens if DC2 gets network partitioned?
• i.e. no network traffic in or out of DC2
[Diagram: DC1, DC2, DC3]
DC fencing
• From the point of view of DC2 servers, and in particular from the point of view of DC2’s orchestrator node:
• Master and replicas are fine.
• DC1 and DC3 servers are all dead.
• No need for failover.
• However, DC2’s orchestrator is not part of a quorum, hence not the leader. It doesn’t call the shots.
DC fencing
• In the eyes of either DC1’s or DC3’s orchestrator:
• All DC2 servers, including the master, are dead.
• There is a need for failover.
• DC1’s and DC3’s orchestrator nodes form a quorum. One of them will become the leader.
• The leader will initiate failover.
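The two points of view can be put side by side in a small Python sketch. It assumes one orchestrator node per DC, as in the setup described here, so "DCs I can reach" doubles as "raft peers I can reach". The function and the field names in its result are hypothetical.

```python
def view_from(node_dc, reachable_dcs, master_dc, total_nodes=3):
    """What one orchestrator node concludes during a network partition.

    reachable_dcs includes the node's own DC.
    """
    sees_master = master_dc in reachable_dcs
    has_quorum = len(reachable_dcs) >= total_nodes // 2 + 1
    return {
        "node": node_dc,
        "thinks_failover_needed": not sees_master,
        "has_quorum": has_quorum,
        # Acting requires both: a dead-looking master and a seat in the quorum.
        "may_fail_over_if_leader": has_quorum and not sees_master,
    }


# DC2 is cut off; the master lives in DC2.
dc2_view = view_from("DC2", {"DC2"}, master_dc="DC2")
dc1_view = view_from("DC1", {"DC1", "DC3"}, master_dc="DC2")
```

DC2's node sees a healthy master but has no quorum; DC1's node sees a dead master and shares a quorum with DC3, so only that side can act.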
DC fencing
• A potential failover result: the new master is from DC3.
[Diagram: DC1, DC2, DC3]
orchestrator/raft & consul
• orchestrator is Consul-aware
• Upon failover orchestrator updates Consul KV with identity of promoted master
• Consul @ GitHub is DC-local, no replication between Consul setups
• orchestrator nodes update Consul locally in each DC
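A toy model of the DC-local update: each orchestrator node writes the promoted master's identity into the KV store of its own DC only. `LocalKV` is an in-memory stand-in for a Consul KV store, the key layout is a hypothetical example, and how each node learns of the failover is outside this sketch.

```python
class LocalKV:
    """In-memory stand-in for one DC's Consul KV store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def apply_failover(node_dc, kv_by_dc, cluster, new_master):
    """Run on each orchestrator node once it learns of the failover."""
    key = f"mysql/master/{cluster}"  # hypothetical key layout
    # Write only to the store in this node's own DC; no cross-DC replication.
    kv_by_dc[node_dc].put(key, new_master)


stores = {dc: LocalKV() for dc in ("DC1", "DC2", "DC3")}
for dc in stores:
    apply_failover(dc, stores, "main-cluster", "db-dc3-01")
```

Until a given DC's node has applied the change, that DC's KV store still holds the old value, so clients in different DCs may briefly see different masters.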
Considerations, watch out for
• Eventual consistency is not always your best friend
• What happens if, upon replay of raft log, you hit two failovers for the same cluster?
• NOW() and other time-based assumptions
• Reapplying snapshot/log upon startup
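One possible way to guard against these pitfalls, sketched in Python: carry the event time inside the log entry instead of reading the clock when the entry is applied, and make the apply step safe to repeat. This is an illustration of the general idea, not a description of how orchestrator handles it. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverEvent:
    cluster: str
    new_master: str
    recorded_at: float  # set once when the event is written, carried in the log


def replay(events):
    """Fold a log of failover events into {cluster: latest event}."""
    current = {}
    for ev in events:
        seen = current.get(ev.cluster)
        # Compare against the timestamp stored in the entry,
        # never the local clock at replay time.
        if seen is None or ev.recorded_at >= seen.recorded_at:
            current[ev.cluster] = ev
    return current


log = [
    FailoverEvent("main-cluster", "db2", 100.0),
    FailoverEvent("main-cluster", "db3", 200.0),
]
state = replay(log)
```

With this shape, two failovers for the same cluster resolve to the later one, and replaying the same log again on startup yields the same state.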
orchestrator/raft roadmap
• Kubernetes
• ClusterIP-based configuration in progress
• Already container-friendly via auto-reprovisioning of nodes via Raft