1 / 191 - DataOps Barcelona | Databases · GR is a plugin for MySQL, made by MySQL and packaged...

Post on 20-May-2020

11 views 0 download

Transcript of 1 / 191 - DataOps Barcelona | Databases · GR is a plugin for MySQL, made by MySQL and packaged...

1 / 191

2 / 1912 / 191

 

Safe Harbor StatementThe following is intended to outline our general product direction. It is intended for information purpose only, and may not beincorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied up inmaking purchasing decisions. The development, release and timing of any features or functionality described for Oracle´s productremains at the sole discretion of Oracle.

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

3 / 191

about.me/lefred

Who am I ?

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

4 / 191

Frédéric Descamps@lefredMySQL EvangelistManaging MySQL since 3.23devops believerliving in Belgium 🇧🇪https://lefred.be

 

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

5 / 191

Group Replication: heart of MySQL InnoDB Cluster

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

6 / 191

Group Replication: heart of MySQL InnoDB Cluster

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

7 / 191

MySQL Group Replication

but what is it ?!?

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

8 / 191

GR is a plugin for MySQL, made by MySQL and packaged withMySQLGR is an implementation of Replicated Database StateMachine theoryPaxos based protocolGR allows to write on all Group Members (cluster nodes)simultaneously while retaining consistencyGR implements conflict detection and resolutionGR allows automatic distributed recovery

Automatic failoverAutomatic membership configuration

Adding/removing membersNetwork partitions, failures

Prevents data lossSupported on all MySQL platforms !!

Linux, Windows, Solaris, OSX, FreeBSDCompletely Open Source - GPL ! No license required to haveHigh Availability

MySQL Group Replication

but what is it ?!?

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

9 / 191

Group Replication is nice, but how does it work ?

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

10 / 191

Group Replication is nice, but how does it work ?

it´s just ...

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

11 / 191

Group Replication is nice, but how does it work ?

it´s just ...

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

12 / 191

Group Replication is nice, but how does it work ?

it´s just ...

... no, in fact the writesets replication is synchronous and then certification and apply of the changes are local to each nodes andasynchronous.

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

13 / 191

Group Replication is nice, but how does it work ?

it´s just ...

... no, in fact the writesets replication is synchronous and then certification and apply of the changes are local to each nodes andasynchronous.

not that easy to understand... right ? As a picture is worth a 1000 words, let´s illustrate this...

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

14 / 191

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

15 / 191

MySQL Group Replication

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

16 / 191

MySQL Group Replication

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

17 / 191

MySQL Group Replication

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

18 / 191

MySQL Group Replication

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

19 / 191

MySQL Group Replication

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

20 / 191

MySQL Group Replication

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

21 / 191

MySQL Group Replication

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

22 / 191

MySQL Group Replication

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

23 / 191

MySQL Group Replication

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

24 / 191

MySQL Group Replication

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

25 / 191

MySQL Group Replication

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

26 / 191

MySQL Group Replication

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

27 / 191

MySQL Group Replication (full transaction)

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

28 / 191

MySQL Group Replication (full transaction)

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

29 / 191

MySQL Group Replication (full transaction)

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

30 / 191

MySQL Group Replication (full transaction)

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

31 / 191

MySQL Group Replication (full transaction)

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

32 / 191

MySQL Group Replication (full transaction)

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

33 / 191

MySQL Group Replication (full transaction)

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

34 / 191

MySQL Group Replication (full transaction)

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

35 / 191

MySQL Group Replication (full transaction)

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

36 / 191

MySQL Group Communication System (GCS)MySQL Xcom protocol

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

37 / 191

MySQL Group Communication System (GCS)MySQL Xcom protocolReplicated Database State Machine

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

38 / 191

MySQL Group Communication System (GCS)MySQL Xcom protocolReplicated Database State MachinePaxos based protocol (a variant of Mencius)

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

39 / 191

MySQL Group Communication System (GCS)MySQL Xcom protocolReplicated Database State MachinePaxos based protocol (a variant of Mencius)its task: deliver messages across the distributed system:

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

40 / 191

MySQL Group Communication System (GCS)MySQL Xcom protocolReplicated Database State MachinePaxos based protocol (a variant of Mencius)its task: deliver messages across the distributed system:

atomically

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

41 / 191

MySQL Group Communication System (GCS)MySQL Xcom protocolReplicated Database State MachinePaxos based protocol (a variant of Mencius)its task: deliver messages across the distributed system:

atomicallyin Total Order (Writes)

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

42 / 191

MySQL Group Communication System (GCS)MySQL Xcom protocolReplicated Database State MachinePaxos based protocol (a variant of Mencius)its task: deliver messages across the distributed system:

atomicallyin Total Order (Writes)

MySQL Group Replication receives the Ordered 'tickets' from this GCS subsystem.

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

43 / 191

Architecture, Stack, Core, ...

MySQL Group Replication

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

44 / 191

MySQL Group Replication - Architecture

Node Types:

R: Traffic routers/proxies: MySQL Router, HAProxy, ProxySQL...M: mysqld nodes participating in MySQL Group Replication

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

45 / 191

MySQL Group Replication - Stack

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

46 / 191

Server calls into the plugin through a generic interface

Most server internals are hidden from the plugin

Plugin interacts with the server through a generic interface

Replication plugin determines the fate of the commitoperation through a well defined server interfaceThe plugin makes use of the relay log infrastructure toinject changes in the receiving server

MySQL Group Replication - Core

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

47 / 191

Maintaining distributed execution contextDetecting and Resolving conflictsHandling distributed recovery

Detect membership changesDonate state if neededCollect state if needed

Proposing transactions to other membersReceiving and handling transactions from other membersDeciding the ultimate fate of transactions

commit or rollback

MySQL Group Replication - GR Plugin

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

48 / 191

The communication API (and binding) is responsible for:

Abstracting the underlying group communication system implementation fromthe plugin itselfMapping the interface to a specific group communication system implementation

The Group Communication System engine:

Paxos implementation (Similar to Paxos Mencius)Building block to provide distributed agreement between servers

MySQL Group Replication - GCS

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

49 / 191

Total Order

GTID generation

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

50 / 191

MySQL Group Replication uses MySQL replication framework bydesign:

binary logs

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

51 / 191

MySQL Group Replication uses MySQL replication framework bydesign:

binary logs

relay logs

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

52 / 191

MySQL Group Replication uses MySQL replication framework bydesign:

binary logs

relay logs

GTIDs: Global Transaction IDs

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

53 / 191

How does Group Replication handle GTIDs ?There are two ways of generating GTIDs:

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

54 / 191

How does Group Replication handle GTIDs ?There are two ways of generating GTIDs:

AUTOMATIC: the transaction is assigned with an automatically generated id during commit. Where regular replication uses thesource server UUID, on Group Replication, the group name is used.

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

55 / 191

How does Group Replication handle GTIDs ?There are two ways of generating GTIDs:

AUTOMATIC: the transaction is assigned with an automatically generated id during commit. Where regular replication uses thesource server UUID, on Group Replication, the group name is used.

ASSIGNED: the user assigns manually a GTID through SET GTID_NEXT to the transaction. This is common to any replicationformat and the id is assigned before the transaction starts.

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

56 / 191

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

57 / 191

Group Replication : Total Order Delivery - GTID

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

58 / 191

Group Replication : Total Order Delivery - GTID

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

59 / 191

Group Replication : Total Order Delivery - GTID

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

60 / 191

Group Replication : Total Order Delivery - GTID

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

61 / 191

Group Replication : Total Order Delivery - GTID

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

62 / 191

Group Replication : Total Order Delivery - GTID

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

63 / 191

Group Replication : Total Order Delivery - GTID

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

64 / 191

Group Replication : Total Order Delivery - GTID

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

65 / 191

Group Replication : Total Order Delivery - GTID

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

66 / 191

Group Replication : Total Order Delivery - GTID

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

67 / 191

Group Replication : Total Order Delivery - GTID

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

68 / 191

Group Replication : Total Order Delivery - GTID

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

69 / 191

Group Replication : Total Order Delivery - GTID

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

70 / 191

Group Replication : Total Order Delivery - GTID

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

71 / 191

Group Replication : Total Order Delivery - GTID

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

72 / 191

Group Replication : Total Order Delivery - GTID

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

73 / 191

Group Replication : Total Order Delivery - GTID

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

74 / 191

Group Replication : Total Order Delivery - GTID

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

75 / 191

Group Replication : Total Order Delivery - GTID

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

76 / 191

Group Replication : GTIDThe previous example was not totally in sync with reality. In fact, a writer allocates a block of GTID and when we have multiplewrites (multi-primary mode) all writers will use GTID sequence numbers in their allocated block.

The size of the block is defined by group_replication_gtid_assignment_block_size (default to 1M)

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

77 / 191

Group Replication : GTIDExample:

Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-355

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

78 / 191

Group Replication : GTIDExample:

Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-355

New write on an other node:

Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-355,1000354

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

79 / 191

Group Replication : GTIDExample:

Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-355

New write on an other node:

Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-355,1000354

Let's write on the third node:

Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-355:1000354:2000354

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

80 / 191

Group Replication : GTIDExample:

Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-355

New write on an other node:

Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-355,1000354

Let's write on the third node:

Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-355:1000354:2000354

And writing back on the first one:

Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-356:1000354:2000354

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

81 / 191

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

82 / 191

done !

Return from Commit

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

83 / 191

Group Replication: return from commitAsynchronous Replication:

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

84 / 191

Group Replication: return from commit (2)Semi-Sync Replication:

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

85 / 191

(*): eventual

Group Replication: return from commit (3)Group Replication (*):

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

86 / 191

(*): before

Group Replication: return from commit (4)Group Replication (*):

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

87 / 191

(*): after

Group Replication: return from commit (5)Group Replication (*):

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

88 / 191

Does this mean we can have a distant node and always let it ack later?

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

89 / 191

Does this mean we can have a distant node and always let it ack later?

NO!

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

90 / 191

Does this mean we can have a distant node and always let it ack later?

NO!

Because the system has to wait for the noop (single skip message) from the “distant” node where latency is higher

The size of the GCS consensus messages window can be get and set from UDF functions:group_replication_get_write_concurrency(), group_replication_set_write_concurrency()

mysql> select group_replication_get_write_concurrency();+-------------------------------------------+| group_replication_get_write_concurrency() |+-------------------------------------------+| 10 |+-------------------------------------------+

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

91 / 191

Event HorizonGCS Write Consensus Concurrency

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

92 / 191

Event HorizonGCS Write Consensus Concurrency

group replication write concurrency

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

93 / 191

Event HorizonGCS Write Consensus Concurrency

group replication write concurrency

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

94 / 191

Event HorizonGCS Write Consensus Concurrency

group replication write concurrency

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

95 / 191

Event HorizonGCS Write Consensus Concurrency

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

96 / 191

Event HorizonGCS Write Consensus Concurrency

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

97 / 191

Event HorizonGCS Write Consensus Concurrency

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

98 / 191

Event HorizonGCS Write Consensus Concurrency

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

99 / 191

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

100 / 191

conflict

Optimistic Locking

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

101 / 191

Group Replication : Optimistic LockingGroup Replication uses optimistic locking

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

102 / 191

Group Replication : Optimistic LockingGroup Replication uses optimistic locking

during a transaction, local (InnoDB) locking happens

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

103 / 191

Group Replication : Optimistic LockingGroup Replication uses optimistic locking

during a transaction, local (InnoDB) locking happensoptimistically assumes there will be no conflicts across nodes (no communication between nodes necessary)

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

104 / 191

Group Replication : Optimistic LockingGroup Replication uses optimistic locking

during a transaction, local (InnoDB) locking happensoptimistically assumes there will be no conflicts across nodes (no communication between nodes necessary)cluster-wide conflict resolution happens only at COMMIT, during certification

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

105 / 191

Group Replication : Optimistic LockingGroup Replication uses optimistic locking

during a transaction, local (InnoDB) locking happensoptimistically assumes there will be no conflicts across nodes (no communication between nodes necessary)cluster-wide conflict resolution happens only at COMMIT, during certification

Let´s first have a look at the traditional locking to compare.

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

106 / 191

Traditional locking

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

107 / 191

Traditional locking

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

108 / 191

Traditional locking

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

109 / 191

Traditional locking

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

110 / 191

Traditional locking

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

111 / 191

Traditional locking

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

112 / 191

Optimistic Locking

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

113 / 191

Optimistic Locking

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

114 / 191

Optimistic Locking

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

115 / 191

Optimistic Locking

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

116 / 191

Optimistic Locking

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

117 / 191

Optimistic Locking

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

118 / 191

Optimistic Locking

The system returns error 149 as certification failed:

ERROR 1180 (HY000): Got error 149 during COMMIT

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

119 / 191

Optimistic Locking (2)The conflict detection can happen at two different stage:

Only one of the two transactions was sent to the Group and the other one is still running (local).

If both transaction were sent to the Group at almost the same time and both reached GCS/XCOM.

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

120 / 191

Optimistic Locking (3)

Only one transaction was sent to GCS and the other conflicting one is still local:

In this case, it's the high priority transaction mechanism of InnoDB that kills the local one:

If you try any statement in the transaction's session you will see:

ERROR: 1213: Deadlock found when trying to get lock; try restarting transaction

If you try to commit the transaction you will see:

ERROR: 1180: Got error 149 - 'Lock deadlock; Retry transaction' during COMMIT

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

121 / 191

Optimistic Locking (4)

Both transactions were sent to GCS:

The second transaction (the conflicting one) will return:

ERROR 3101 (40000): Plugin instructed the server to rollback the current transaction.

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

122 / 191

Such conflicts happen only when using multi-primary group ! 

not totally true in MySQL < 8.0.13 when failover happens

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

123 / 191

Drawbacks of optimistic lockinghaving a first-committer-wins system means conflicts will more likely happen when writing on multiple members with:

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

124 / 191

Drawbacks of optimistic lockinghaving a first-committer-wins system means conflicts will more likely happen when writing on multiple members with:

large transactions

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

125 / 191

Drawbacks of optimistic lockinghaving a first-committer-wins system means conflicts will more likely happen when writing on multiple members with:

large transactions

long running transactions

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

126 / 191

Drawbacks of optimistic lockinghaving a first-committer-wins system means conflicts will more likely happen when writing on multiple members with:

large transactions

long running transactions

hotspot records

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

127 / 191

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

128 / 191

Configurable Consistency Guarantees

Consistency Levels

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

129 / 191

mysql> show variables like 'group_replication_consistency';+-------------------------------+----------+| Variable_name | Value |+-------------------------------+----------+| group_replication_consistency | EVENTUAL |+-------------------------------+----------+

 

 

Consistency: EVENTUAL (default)By default, there is no synchronization point for the transactions, when you perform a write on a node, if you immediately read thesame data on another node, it is eventually there.

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

130 / 191

mysql> show variables like 'group_replication_consistency';+-------------------------------+----------+| Variable_name | Value |+-------------------------------+----------+| group_replication_consistency | EVENTUAL |+-------------------------------+----------+

 

 

Consistency: EVENTUAL (default)By default, there is no synchronization point for the transactions, when you perform a write on a node, if you immediately read thesame data on another node, it is eventually there.

Since MySQL 8.0.16, we have to possibility to set thesynchronization point at read or at write or both (globally or for asession).

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

131 / 191

Consistency: BEFORE (READ)

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

132 / 191

Consistency: AFTER (WRITE)

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

133 / 191

Consistency: BEFORE_AND_AFTER

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

134 / 191

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

135 / 191

can the transaction be committed ?

Certification

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

136 / 191

CertificationCertification is the process that only needs to answer the following unique question:

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

137 / 191

CertificationCertification is the process that only needs to answer the following unique question:

can the write (transaction) be committed ?

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

138 / 191

CertificationCertification is the process that only needs to answer the following unique question:

can the write (transaction) be committed ?based on yet to be applied transactions

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

139 / 191

CertificationCertification is the process that only needs to answer the following unique question:

can the write (transaction) be committed ?based on yet to be applied transactionssuch conflicts must come for other members/nodes

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

140 / 191

CertificationCertification is the process that only needs to answer the following unique question:

can the write (transaction) be committed ?based on yet to be applied transactionssuch conflicts must come for other members/nodes

happens on every member/node and is deterministic

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

141 / 191

CertificationCertification is the process that only needs to answer the following unique question:

can the write (transaction) be committed ?based on yet to be applied transactionssuch conflicts must come for other members/nodes

happens on every member/node and is deterministicresults are not reported to the group (does not require a new communication step)

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

142 / 191

CertificationCertification is the process that only needs to answer the following unique question:

can the write (transaction) be committed ?based on yet to be applied transactionssuch conflicts must come for other members/nodes

happens on every member/node and is deterministicresults are not reported to the group (does not require a new communication step)

pass: commit/queue to appy

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

143 / 191

CertificationCertification is the process that only needs to answer the following unique question:

can the write (transaction) be committed ?based on yet to be applied transactionssuch conflicts must come for other members/nodes

happens on every member/node and is deterministicresults are not reported to the group (does not require a new communication step)

pass: commit/queue to appyfail: rollback/drop the transaction

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

144 / 191

CertificationCertification is the process that only needs to answer the following unique question:

can the write (transaction) be committed ?based on yet to be applied transactionssuch conflicts must come for other members/nodes

happens on every member/node and is deterministicresults are not reported to the group (does not require a new communication step)

pass: commit/queue to appyfail: rollback/drop the transaction

serialized by the total order in GCS/XCOM + GTID

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

145 / 191

CertificationCertification is the process that only needs to answer the following unique question:

can the write (transaction) be committed ?based on yet to be applied transactionssuch conflicts must come for other members/nodes

happens on every member/node and is deterministicresults are not reported to the group (does not require a new communication step)

pass: commit/queue to appyfail: rollback/drop the transaction

serialized by the total order in GCS/XCOM + GTIDcost is based on trx size (# rows & # keys)

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

146 / 191

terminology

Write vs Writeset

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

147 / 191

Let's illustrate a table:

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

148 / 191

Now let's make a change

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

149 / 191

and at commit time:

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

150 / 191

WritesetsContain the hash for the rows PKs that are changed and in some cases the hashes of foreign keys or others dependencies thatneed to be captured (e.g. non NULL UKs). Writesets are gathered during transaction execution.

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

151 / 191

WritesetsContain the hash for the rows PKs that are changed and in some cases the hashes of foreign keys or others dependencies thatneed to be captured (e.g. non NULL UKs). Writesets are gathered during transaction execution.

WritesCalled also write values, refers to the actual changes. Write values are also gathered during transaction execution.

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

152 / 191

Writeset - examples+-------+-----------+------+-----+---------+-------+| Field | Type | Null | Key | Default | Extra |+-------+-----------+------+-----+---------+-------+| id | binary(1) | NO | PRI | NULL | || name | binary(2) | YES | | NULL | |+-------+-----------+------+-----+---------+-------+

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

153 / 191

Writeset - examples+-------+-----------+------+-----+---------+-------+| Field | Type | Null | Key | Default | Extra |+-------+-----------+------+-----+---------+-------+| id | binary(1) | NO | PRI | NULL | || name | binary(2) | YES | | NULL | |+-------+-----------+------+-----+---------+-------+

mysql> insert into t2 values (1,2);

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

154 / 191

Writeset - examples+-------+-----------+------+-----+---------+-------+| Field | Type | Null | Key | Default | Extra |+-------+-----------+------+-----+---------+-------+| id | binary(1) | NO | PRI | NULL | || name | binary(2) | YES | | NULL | |+-------+-----------+------+-----+---------+-------+

mysql> insert into t2 values (1,2);

pke: PRIMARY | test | t2 | 1 | 1 hash: 11853456929268668462

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

155 / 191

Writeset - examples+-------+-----------+------+-----+---------+-------+| Field | Type | Null | Key | Default | Extra |+-------+-----------+------+-----+---------+-------+| id | binary(1) | NO | PRI | NULL | || name | binary(2) | YES | | NULL | |+-------+-----------+------+-----+---------+-------+

mysql> insert into t2 values (1,2);

pke: PRIMARY | test | t2 | 1 | 1 hash: 11853456929268668462

mysql> update t2 set name=3 where id=1;

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

156 / 191

Writeset - examples+-------+-----------+------+-----+---------+-------+| Field | Type | Null | Key | Default | Extra |+-------+-----------+------+-----+---------+-------+| id | binary(1) | NO | PRI | NULL | || name | binary(2) | YES | | NULL | |+-------+-----------+------+-----+---------+-------+

mysql> insert into t2 values (1,2);

pke: PRIMARY | test | t2 | 1 | 1 hash: 11853456929268668462

mysql> update t2 set name=3 where id=1;

pke: PRIMARY | test | t2 | 1 | 1 hash: 10002085147685770725pke: PRIMARY | test | t2 | 1 | 1 hash: 10002085147685770725

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

157 / 191

Writeset - examples (2)+-------+-----------+------+-----+---------+-------+| Field | Type | Null | Key | Default | Extra |+-------+-----------+------+-----+---------+-------+| id | binary(1) | NO | PRI | NULL | || name | binary(2) | YES | UNI | NULL | || name2 | binary(1) | YES | | NULL | |+-------+-----------+------+-----+---------+-------+

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

158 / 191

Writeset - examples (2)+-------+-----------+------+-----+---------+-------+| Field | Type | Null | Key | Default | Extra |+-------+-----------+------+-----+---------+-------+| id | binary(1) | NO | PRI | NULL | || name | binary(2) | YES | UNI | NULL | || name2 | binary(1) | YES | | NULL | |+-------+-----------+------+-----+---------+-------+

mysql> insert into t3 values (1,2,3);

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

159 / 191

Writeset - examples (2)+-------+-----------+------+-----+---------+-------+| Field | Type | Null | Key | Default | Extra |+-------+-----------+------+-----+---------+-------+| id | binary(1) | NO | PRI | NULL | || name | binary(2) | YES | UNI | NULL | || name2 | binary(1) | YES | | NULL | |+-------+-----------+------+-----+---------+-------+

mysql> insert into t3 values (1,2,3);

pke: PRIMARY | test |t3 | 1 | 1 hash: 79134815725924853pke: name | test |t3 | 2 hash: 11034644986657565827

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

160 / 191

Writeset - examples (2)+-------+-----------+------+-----+---------+-------+| Field | Type | Null | Key | Default | Extra |+-------+-----------+------+-----+---------+-------+| id | binary(1) | NO | PRI | NULL | || name | binary(2) | YES | UNI | NULL | || name2 | binary(1) | YES | | NULL | |+-------+-----------+------+-----+---------+-------+

mysql> insert into t3 values (1,2,3);

pke: PRIMARY | test |t3 | 1 | 1 hash: 79134815725924853pke: name | test |t3 | 2 hash: 11034644986657565827

mysql> update t3 set name=3 where id=1;

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

161 / 191

Writeset - examples (2)+-------+-----------+------+-----+---------+-------+| Field | Type | Null | Key | Default | Extra |+-------+-----------+------+-----+---------+-------+| id | binary(1) | NO | PRI | NULL | || name | binary(2) | YES | UNI | NULL | || name2 | binary(1) | YES | | NULL | |+-------+-----------+------+-----+---------+-------+

mysql> insert into t3 values (1,2,3);

pke: PRIMARY | test |t3 | 1 | 1 hash: 79134815725924853pke: name | test |t3 | 2 hash: 11034644986657565827

mysql> update t3 set name=3 where id=1;

pke: PRIMARY | test | t3 | 1 | 1 hash: 79134815725924853pke: name | test | t3 | 3 hash: 18082071075512932388pke: PRIMARY | test | t3 | 1 | 1 hash: 79134815725924853pke: name | test | t3 | 2 hash: 11034644986657565827

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

162 / 191

Writeset - examples (2)+-------+-----------+------+-----+---------+-------+| Field | Type | Null | Key | Default | Extra |+-------+-----------+------+-----+---------+-------+| id | binary(1) | NO | PRI | NULL | || name | binary(2) | YES | UNI | NULL | || name2 | binary(1) | YES | | NULL | |+-------+-----------+------+-----+---------+-------+

mysql> insert into t3 values (1,2,3);

pke: PRIMARY | test |t3 | 1 | 1 hash: 79134815725924853pke: name | test |t3 | 2 hash: 11034644986657565827

mysql> update t3 set name=3 where id=1;

pke: PRIMARY | test | t3 | 1 | 1 hash: 79134815725924853pke: name | test | t3 | 3 hash: 18082071075512932388pke: PRIMARY | test | t3 | 1 | 1 hash: 79134815725924853pke: name | test | t3 | 2 hash: 11034644986657565827

[after image][before image]

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

163 / 191

Certification

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

164 / 191

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

165 / 191

Enhanced support for large transactions

Message Fragmentation

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

166 / 191

Fragmenting an outgoing message

(1). If the message size exceeds the maximum size that the user allows(group_replication_communication_max_message_size), the member fragments the message into chunksthat do not exceed the maximum size.

(2). The member broadcasts each chunk to the group, i.e. forwards each chunk individually to XCom.

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

167 / 191

Reassembling an incoming message - first fragment

(2). The members conclude that the incoming message is actually a fragment of a bigger message.

(3). The members buffer the incoming fragment because they conclude the fragment is a chunk of a still-incomplete message.(Fragments contain the necessary metadata to reach this conclusion.)

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

168 / 191

Reassembling an incoming message - second fragment

(5). Same as step 3.

(6). Same as step 4.

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

169 / 191

Reassembling an incoming message - last fragment

1. The members conclude that the incoming message is actually a fragment of a bigger message.

2. The members conclude that the incoming fragment is the last chunk missing, reassemble the original, whole message, andprocess it.

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

170 / 191

Houston we have a problem !

Flow Control

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

171 / 191

Flow ControlIn Group Replication, every member send statistics about its queues (applier queue and certification queue) to the other members.Then every node decide to slow down or not if they realize that one node reached the threshold for one of the queue.

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

172 / 191

Flow ControlIn Group Replication, every member send statistics about its queues (applier queue and certification queue) to the other members.Then every node decide to slow down or not if they realize that one node reached the threshold for one of the queue.

So when group_replication_�ow_control_mode is set to QUOTA on the node seeing that one of the othermembers of the cluster is lagging behind (threshold reached), it will throttle the write operations to the a quota that is calculatedbased on the number of transactions applied in the last second, and then it is reduced below that by subtracting the “over thequota” messages from the last period.

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

173 / 191

Flow ControlIn Group Replication, every member send statistics about its queues (applier queue and certification queue) to the other members.Then every node decide to slow down or not if they realize that one node reached the threshold for one of the queue.

So when group_replication_�ow_control_mode is set to QUOTA on the node seeing that one of the othermembers of the cluster is lagging behind (threshold reached), it will throttle the write operations to the a quota that is calculatedbased on the number of transactions applied in the last second, and then it is reduced below that by subtracting the “over thequota” messages from the last period.

This mean that the threshold is NOT decided on the node being slow, but the node writing a transaction checks its threshold flowcontrol values and compare them to the statistics from the other nodes to decide to throttle or not.

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

174 / 191

Flow Control - on writer

>quota

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

175 / 191

Flow Control - on all members

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

176 / 191

Flow Control - configuration variablesAs in MySQL 8.0.16:

+-----------------------------------------------------+-------+| Variable_name | Value |+-----------------------------------------------------+-------+| group_replication_�ow_control_applier_threshold | 25000 || group_replication_�ow_control_certi�er_threshold | 25000 || group_replication_�ow_control_hold_percent | 10 || group_replication_�ow_control_max_quota | 0 || group_replication_�ow_control_member_quota_percent | 0 || group_replication_�ow_control_min_quota | 0 || group_replication_�ow_control_min_recovery_quota | 0 || group_replication_�ow_control_mode | QUOTA || group_replication_�ow_control_period | 1 || group_replication_�ow_control_release_percent | 50 |+-----------------------------------------------------+-------+

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

177 / 191

transaction's lifecycle in Group Replication

Summary

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

178 / 191

begin;

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

179 / 191

begin;update table1 set c = 999 where id =2;

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

180 / 191

begin;update table1 set c = 999 where id =2;update table1set b = "eee"where id = 3;

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

181 / 191

begin;update table1 set c = 999 where id =2;update table1set b = "eee"where id = 3;

commit;

clie

nt b

lock

s on

com

mit.

..

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

182 / 191

begin;update table1 set c = 999 where id =2;update table1set b = "eee"where id = 3;

commit;

clie

nt b

lock

s on

com

mit.

..

writesets+ gtid_event+ write values

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

183 / 191

begin;update table1 set c = 999 where id =2;update table1set b = "eee"where id = 3;

commit;

clie

nt b

lock

s on

com

mit.

..

writesets+ gtid_event+ write values

certify

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

184 / 191

begin;update table1 set c = 999 where id =2;update table1set b = "eee"where id = 3;

commit;

clie

nt b

lock

s on

com

mit.

..

writesets+ gtid_event+ write values

certifycertify

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

185 / 191

begin;update table1 set c = 999 where id =2;update table1set b = "eee"where id = 3;

commit;

clie

nt b

lock

s on

com

mit.

..

writesets+ gtid_event+ write values

certifycertify

certify

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

186 / 191

begin;update table1 set c = 999 where id =2;update table1set b = "eee"where id = 3;

commit;

commit finalized

writesets+ gtid_event+ write values

certifycertify

certify

+ GTID

binlog

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

187 / 191

begin;update table1 set c = 999 where id =2;update table1set b = "eee"where id = 3;

commit;

commit finalized

writesets+ gtid_event+ write values

certifycertify

certify

+ GTID

binlog

+ GTID

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

188 / 191

begin;update table1 set c = 999 where id =2;update table1set b = "eee"where id = 3;

commit;

commit finalized

writesets+ gtid_event+ write values

certifycertify

certify

+ GTID

binlog

+ GTID

+ GTIDrelaylog

relaylog

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

189 / 191

begin;update table1 set c = 999 where id =2;update table1set b = "eee"where id = 3;

commit;

commit finalized

writesets+ gtid_event+ write values

certifycertify

certify

+ GTID

binlog

+ GTID

+ GTIDrelaylog

relaylog

binlog

binlog

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

190 / 191

Thank you !

Any Questions ?

Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.

191 / 191