2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the...

53
How people build soware ! MySQL Infrastructure Testing Automation @ GitHub Jonah Berquist, Tom Krouper GitHub Percona Live 2018 1 !

Transcript of 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the...

Page 1: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

MySQL Infrastructure Testing Automation @ GitHubJonah Berquist, Tom Krouper GitHub

Percona Live 2018

1

!

Page 2: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

Agenda• Intros • MySQL @ GitHub • Backup/restores • Schema migrations • Failovers

2

!

Page 3: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

About Tom• Sr. Infrastructure Engineer • Member of the Database Infrastructure Team • Working with MySQL since 2003 (MySQL 4.0 release era) • Worked on MySQL at Twitter, Booking, and Box previous to

GitHub. Several other places too.

https://github.com/tomkrouper https://twitter.com/@CaptainEyesight

3

!

Page 4: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

About Jonah• Infrastructure Engineering Manager • Member of the Database Infrastructure team • Proud manager of 5 lovely team members

https://github.com/jonahberquist https://twitter.com/@hashtagjonah

4

!

Page 5: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software! 5

• The world’s largest Octocat t-shirt and stickers store • And plush Octocats • And hoodies • And software development platform

GitHub

Page 6: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

MySQL at GitHub• GitHub stores repositories in git, and uses MySQL

as the backend database for all related metadata. • We run a few (growing number of) clusters, totaling

over 100 MySQL servers. • The setup isn’t very large but very busy.

6

!

Page 7: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

MySQL at GitHub• Our MySQL servers must be available, responsive

and in good state • GitHub has 99.95% SLA • Availability issues must be handled quickly, as

automatically as possible.

7

!

Page 8: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

Backups

8

Page 9: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

Your dataIt’s important

9

!

Page 10: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

Backups• xtrabackup • On busy clusters, dedicated backup servers. • Backups from replicas in each DC • We monitor for number of “success” events in past

24-ish hours, per cluster.

10

!

Page 11: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

Restores• Something bad happened and you need that data • Building a new host • Rebuilding a broken one • All the time!

11

!

Page 12: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

Restores - the old way• Dedicated restore servers. • One per cluster. • Continuously restores, catches up with replication,

restores, catches up with replication, restores, … • Sending a “success” event at the end of each cycle. • We monitor for number of “success” events in past

24-ish hours, per cluster.

12

!

Page 13: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software! 13

!

!

!

!

!

production replicas

auto-restore replica

master

!

auto-restore replicas

""""""

backup replica

Page 14: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

Restores - the new way• Database-class servers in kubernetes. • Data not persistent. • Database cluster agnostic. • Continuously restores, catches up with replication,

restores, catches up with replication, restores, … • Sending a “success” event at the end of each cycle. • We monitor for number of “success” events in past

24-ish hours, per cluster.

14

!

Page 15: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software! 15

!

! !

!

!

auto-restore replicas on k8s

""""""

!

! !

!

!

""""""

Page 16: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software! 16

!

!

!

!

!

Auto-restore!

Picks a backup from cluster A

""""""

!

! !

!

!

""""""

Page 17: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software! 17

!

!

!

!

!

Auto-restore!

starts replicating from cluster A

""""""

!

! !

!

!

""""""

Page 18: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software! 18

!

!

!

!

!

!

replication catches up

""""""

!

! !

!

!

""""""

!

Page 19: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

!

! !

!

!

moves on to backup of cluster B

""""""

!

!

!

!

!

Auto-restore!

""""""

Page 20: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

!

! !

!

!

replicates from cluster B

""""""

!

!

!

!

!

Auto-restore!

""""""

Page 21: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

!

! !

!

!

replication catches up

""""""

!

!

!

!

!

!

""""""

!

Page 22: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

!

! !

!

!

auto-restore replica not always running

""""""

!

! !

!

!

""""""

Page 23: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

Restores• New host provisioning uses same flow as restore. • A human may kick a restore/reclone manually.

• This can grab the latest, or really any backup we have

• We can also restore from another running host.

23

!

Page 24: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

Restore failure• A specific backup/restore may fail because

computers. • No reason for panic.

• Previous backup/restores proven to be working • At most we lose time

• Lack of successful restore for a cluster in the last ~24 hours is an issue to be investigated

24

!

Page 25: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

Restore: delayed replica• One delayed replica per cluster

• Lagging at 4 hours

25

!

Page 26: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

Backup/restore: logical• We routinely run a logical backup of all individual

tables (independently) • We can load a specific table from a specific logical

backup, onto a non-production server • No need for DBA. Table allocated in a developer’s

space. • Operation is audited.

26

!

Page 27: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

Schema migrations

27

Page 28: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

Is your data correct?The data you see is merely a ghost of your original data

28

!

Page 29: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

gh-ost• Young. 1yr old. • In production at GitHub since born. • Software • Bugs • Development • Bugs

29

Page 30: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

gh-ost• Overview

30

Page 31: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

!

31

Synchronous triggers based migration

pt-online-schema-change oak-online-alter-table LHM

!# #original table ghost table

$insert

delete

update

replace

delete

replace

Page 32: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

!

32

!# #original table ghost table

$insert

delete

updateno triggers

%binary log

Triggerless, binlog based migration

gh-ost

Page 33: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

!

33

• Binary logs can be read from anywhere • gh-ost prefers connecting to a replica, offloading work from master

• gh-ost controls the entire data flow • It can truly throttle, suspending all writes on the migrated server

• gh-ost writes are decoupled from the master workload • Write concurrency on master turns irrelevant

• gh-ost’s design is to issue all writes sequentially • Completely avoiding locking contention • Migrated server only sees a single connection issuing writes • Migration algorithm simplified

Binlog based design implications

Page 34: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

!

34

Binlog based migration, utilize replica

!# #$

%

!# #

master

replica

Page 35: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

gh-ost testing• gh-ost works perfectly well on our data • Tested, re-tested, and tested again • Full coverage of production tables

35

Page 36: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

gh-ost testing servers• Dedicated servers that run continuous tests

36

Page 37: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software! 37

!

!

!%

!

!

production replicas

testing replica

master

!

gh-ost testing replicas

!

!

!%

!

!

production replicas

testing replica

master

!

Page 38: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

gh-ost testing• Trivial ENGINE=INNODB migration • Stop replication • Cut-over, cut-back • Checksum both tables, compare • Checksum failure: stop the world, alert • Success/failure: event • Drop ghost table • Catch up • Next table

38

Page 39: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

gh-ost development cycle• Work on branch

.deploy gh-ost/mybranch to prod/mysql_role=ghost_testing • Let continuous tests run • Depending on nature of change, observe hours/days/more. • Merge • Tests run regardless of deployed branch

39

Page 40: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

Failovers

40

Page 41: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

MySQL setup @ GitHub• Plain-old single writer master-replicas • Semi-sync • Cross DC, multiple data centers • 5.7, RBR • Servers with special roles: production replica,

backup, migration-test, analytics, … • 2-3 tiers of replication • Occasional cluster split (functional sharding) • Very dynamic, always changing

41

!

Page 42: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

Points of failure• Master failure, sev1 • Intermediate masters failure

42

!

! !

!!!

! !!

!

Page 43: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

orchestrator• Topology discovery • Refactoring • Failovers for masters and intermediate masters • Open source, Apache 2 license • github.com/github/orchestrator

43

!

Page 44: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

orchestrator failovers @ GitHub• Automated master & intermediate master failovers

for all clusters. • On failover, runs GitHub-specific hooks

• Grabbing VIP/DNS • Updating server role • Kicking services (e.g. pt-heartbeat) • Notifying chat • Running puppet

44

!

Page 45: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

Testing cluster• Dedicated testing cluster in production • Does not take production traffic

• “load-test” traffic • Resembles a production topology:

• OS, MySQL Versions • Data centers • Server roles • DNS • Proxy

• Used for many of our deployment tests

45

!

Page 46: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

Failover testing• Multiple times per day:

• Setup the cluster in desired topology layout • Inject failure (kill/block/reject) • Wait, expect recovery • Check topology:

• Expect new master, correct DNS changes, replica capacity, …

• Restore old master from backup • (an implicit backup/restore test)

• “success/failure” event

46

!

Page 47: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

Failover in production• We expect < 30s failover • Normal case is 10-13s • Intermediate master failover has low impact on

subset of users, depending on cluster/DC/server • Master failover implies outage • Planned master switchover takes a few seconds

47

!

Page 48: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

A moment of reflection

48

Page 49: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

What builds trust in failovers?A testing environment?

49

!

Page 50: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

Chaos testing in production• First steps into regular testing • Manual • Supported by our peers • Learning, understanding impact

50

!

Page 51: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

Tests that go wrong• Many things can go wrong

• Corrupt replication • Invalidated servers • Unassigned DNS

• Cleanups

51

!

Page 52: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

Conclusion• Backup & restore • Failovers • Schema migrations

52

Page 53: 2018 PLSC GitHub MySQL Testing · • GitHub stores repositories in git, and uses MySQL as the backend database for all related metadata. • We run a few (growing number of) clusters,

How people build software!

Thank you!Questions?

Jonah Berquist @hashtagjonah Tom Krouper @CaptainEyesight

53

!