Introduction to TokuDB v7.5 and Read Free Replication


description

TokuDB v7.5 introduced Read Free Replication, allowing MySQL slaves to run with virtually no read IO. This presentation discusses how Fractal Tree indexes work, what they enable in TokuDB, and how they allow TokuDB to uniquely offer this replication innovation.

Transcript of Introduction to TokuDB v7.5 and Read Free Replication

Page 1: Introduction to TokuDB v7.5 and Read Free Replication

What’s New in TokuDB 7.5

Read Free Replication and more!

Tim Callaghan, [email protected]

@tmcallaghan

Page 2: Introduction to TokuDB v7.5 and Read Free Replication

Company

• Two high-performance database solutions for big data
  • NoSQL: TokuMX™ for MongoDB
  • NewSQL: TokuDB® for MySQL, MariaDB & Percona Server

• Radical new storage for larger-than-RAM datasets
  • Fractal Tree® indexing technology
  • Data science research at M.I.T., Rutgers, Stony Brook

• Open source
  • Example: Red Hat (Linux) & Canonical (Ubuntu)

Page 3: Introduction to TokuDB v7.5 and Read Free Replication

Current Customers / Big Data Innovators

Page 4: Introduction to TokuDB v7.5 and Read Free Replication

Ever seen this?

IO Utilization Graph, write performance is IO limited

Page 5: Introduction to TokuDB v7.5 and Read Free Replication

Agenda

• What is a Fractal Tree index?
  • No Read Free Replication without them!

• What Fractal Tree indexes enable in MySQL, MariaDB, and Percona Server
  • TokuDB!

• What’s new in TokuDB 7.5
  • Read Free Replication and more

• Q+A

Page 6: Introduction to TokuDB v7.5 and Read Free Replication

Indexing 101:

B-trees and Fractal Tree Indexes

Page 7: Introduction to TokuDB v7.5 and Read Free Replication

B-trees

Page 8: Introduction to TokuDB v7.5 and Read Free Replication

B-tree Overview - vocabulary

Internal Nodes - Path to data

Leaf Nodes - Actual Data - Sorted

Diagram labels: Pointers, Pivots

Page 9: Introduction to TokuDB v7.5 and Read Free Replication

B-tree Overview - example

Root node (pivot): 22

Internal nodes (pivots): 10 | 99

Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99]

• Pivot Rule is >=
• “numbers” are keys, stored value is row data

Page 10: Introduction to TokuDB v7.5 and Read Free Replication

B-tree Overview - search

Root node (pivot): 22

Internal nodes (pivots): 10 | 99

Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99]

“Find 25”

Page 11: Introduction to TokuDB v7.5 and Read Free Replication

B-tree Overview - insert

Root node (pivot): 22

Internal nodes (pivots): 10 | 99

Leaf nodes: [2, 3, 4] [10, 15, 20] [22, 25] [99]

“Insert 15”
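The search and insert walks on these two slides can be sketched in a few lines of Python. This is a toy model, not TokuDB or InnoDB code: the tuple/list node layout and the names `TREE`, `find_leaf`, `find`, and `insert` are assumptions made for illustration, and node splitting on overflow is left out.

```python
from bisect import bisect_right, insort

# Internal node = (pivots, children); leaf = sorted list of keys (row data omitted).
TREE = (
    [22],
    [
        ([10], [[2, 3, 4], [10, 20]]),
        ([99], [[22, 25], [99]]),
    ],
)

def find_leaf(node, key):
    """Follow pivots from the root down to the leaf that owns `key`."""
    while isinstance(node, tuple):
        pivots, children = node
        # Pivot rule is >=: a key equal to a pivot goes to the child on its right.
        node = children[bisect_right(pivots, key)]
    return node

def find(tree, key):
    """Return True if `key` is stored in the tree."""
    return key in find_leaf(tree, key)

def insert(tree, key):
    """Place `key` in its leaf, keeping the leaf sorted (no split handling)."""
    leaf = find_leaf(tree, key)
    if key not in leaf:
        insort(leaf, key)

print(find(TREE, 25))        # "Find 25"
insert(TREE, 15)             # "Insert 15"
print(find_leaf(TREE, 15))   # the leaf that now holds 15
```

Both operations must reach a leaf before they can finish, which is why a B-tree insert costs a leaf read once the leaves no longer fit in RAM.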

Page 12: Introduction to TokuDB v7.5 and Read Free Replication

B-tree Overview - performance

Root node (RAM): 22

Internal nodes (RAM): 10 | 99

Leaf nodes (DISK): [2, 3, 4] [10, 20] [22, 25] [99]

Performance is IO limited when data > RAM: one IO is needed for each insert/update

(actually it’s one IO for every index on the table)
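As a rough back-of-the-envelope version of that claim, the ceiling on insert rate is the disk's random IO rate divided by the number of indexes. The function name and the numbers below are hypothetical, chosen only to show the arithmetic, and the model ignores caching of hot leaf nodes.

```python
def max_inserts_per_second(disk_iops, index_count):
    """Upper bound when every index touched by an insert costs one random IO."""
    return disk_iops / index_count

# Hypothetical disk and schema, for illustration only.
print(max_inserts_per_second(disk_iops=200, index_count=4))
```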

Page 13: Introduction to TokuDB v7.5 and Read Free Replication

Fractal Tree indexes

Page 14: Introduction to TokuDB v7.5 and Read Free Replication

Fractal Tree indexes

Similar to B-trees
• store data in leaf nodes
• use index key for ordering

Different than B-trees
• message buffers
• big nodes (4MB vs. ~16KB)

All internal nodes have message buffers

As buffers overflow, they cascade down the tree

Messages are eventually applied to leaf nodes

Page 15: Introduction to TokuDB v7.5 and Read Free Replication

Doesn’t InnoDB Have Buffers?

InnoDB Change Buffer

• It sure does!
  • No buffer for the primary key index
  • One buffer for each secondary index

• http://dev.mysql.com/doc/refman/5.5/en/innodb-performance-change_buffering.html

Page 16: Introduction to TokuDB v7.5 and Read Free Replication

InnoDB Buffers Help (for a while)

Page 17: Introduction to TokuDB v7.5 and Read Free Replication

InnoDB Buffers Help (for a while)

• Buffering allows for IO amortization (> 1 operation per IO)

• When data gets large enough the single buffer can’t help (blue = red)

Page 18: Introduction to TokuDB v7.5 and Read Free Replication

Fractal Tree Indexes - sample data

Root node (pivot): 25

Internal nodes (pivots): 10 | 99

Leaf nodes: [2,3,4] [10,20] [22,25] [99]

Looks a lot like a B-tree!

Page 19: Introduction to TokuDB v7.5 and Read Free Replication

Fractal Tree Indexes - insert

insert 15;

Root node (pivot): 25

Internal nodes (pivots): 10 | 99

Leaf nodes: [2,3,4] [10,20] [22,25] [99]

Message buffer: insert (15)

• search operations must consider messages along the way
• messages cascade down the tree as buffers fill up
• they are eventually applied to the leaf nodes, hundreds or thousands of operations for a single IO
• CPU and cache are conserved as important data is not ejected
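The buffering behaviour in these bullets can be illustrated with a minimal sketch. It is not the TokuDB implementation: the class `BufferedTree`, its single buffered level, the tiny `buffer_capacity`, and the "flush everything at once" policy are all simplifying assumptions, and the pivots are a flattened, hypothetical version of the sample tree.

```python
from bisect import bisect_right

class BufferedTree:
    """Toy tree: one internal node with a message buffer above sorted leaves."""

    def __init__(self, pivots, leaves, buffer_capacity=4):
        self.pivots = pivots                  # len(leaves) - 1 pivots, rule is >=
        self.leaves = leaves                  # sorted lists of keys
        self.buffer = []                      # pending ("insert", key) messages
        self.buffer_capacity = buffer_capacity
        self.leaf_ios = 0                     # leaves touched by flushes

    def insert(self, key):
        # An insert is only a message in the buffer; no leaf is read here.
        self.buffer.append(("insert", key))
        if len(self.buffer) > self.buffer_capacity:
            self.flush()

    def flush(self):
        # Group pending messages by target leaf, then apply each group at once.
        by_leaf = {}
        for _op, key in self.buffer:
            by_leaf.setdefault(bisect_right(self.pivots, key), []).append(key)
        for index, keys in by_leaf.items():
            self.leaves[index] = sorted(set(self.leaves[index]) | set(keys))
            self.leaf_ios += 1
        self.buffer.clear()

    def contains(self, key):
        # Searches must consider buffered messages along the way.
        if any(k == key for _op, k in self.buffer):
            return True
        return key in self.leaves[bisect_right(self.pivots, key)]

tree = BufferedTree([10, 22, 99], [[2, 3, 4], [10, 20], [22, 25], [99]])
tree.insert(15)
print(tree.contains(15), tree.leaf_ios)   # visible to searches, no leaf IO yet
for key in (11, 12, 13, 14):
    tree.insert(key)                      # the fifth message overflows the buffer
print(tree.leaves[1], tree.leaf_ios)      # five inserts applied with one leaf IO
```

With real 4MB nodes the same amortization covers hundreds or thousands of messages per leaf IO rather than five.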

Page 20: Introduction to TokuDB v7.5 and Read Free Replication

Fractal Tree Indexes - other operations

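Root node (pivot): 25

Internal nodes (pivots): 10 | 99

Leaf nodes: [2,3,4] [10,20] [22,25] [99]

Buffered messages:
• add_column(c4 bigint)
• delete(99), increment(22,+5)
• ...
• insert (100), delete(8), delete(2), insert (8)

Lots of operations can be messages!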
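To make the idea of "operations as messages" concrete, here is a small sketch that replays a list of messages against leaf data. The tuple encoding, the `apply_messages` helper, the row values, and the message order are hypothetical; only the operation names come from the slide, and schema messages such as add_column are left out.

```python
def apply_messages(rows, messages):
    """Apply buffered messages, oldest first, to a copy of the leaf rows."""
    rows = dict(rows)
    for message in messages:
        op, key = message[0], message[1]
        if op == "insert":
            rows[key] = message[2]
        elif op == "delete":
            rows.pop(key, None)          # deleting a missing key is a no-op
        elif op == "increment":
            if key in rows:
                rows[key] += message[2]
        else:
            raise ValueError(f"unknown message type: {op}")
    return rows

leaf_rows = {2: 0, 22: 10, 99: 0}        # key -> stored value (hypothetical)
pending = [
    ("insert", 8, 1),
    ("delete", 2),
    ("delete", 8),
    ("insert", 100, 7),
    ("increment", 22, 5),
    ("delete", 99),
]
print(apply_messages(leaf_rows, pending))
```

None of these messages needs the current row to be read when it is issued; the work is deferred until the message reaches the leaf.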

Page 21: Introduction to TokuDB v7.5 and Read Free Replication

MySQL/MariaDB/Percona Server + Fractal Tree Indexes = TokuDB

Page 22: Introduction to TokuDB v7.5 and Read Free Replication

What is TokuDB?

• Transactional MySQL Storage Engine - think InnoDB
  • Available for MySQL 5.5 and MariaDB 5.5
  • Percona Server 5.6 and MariaDB 10.0 too
• ACID and MVCC
• Free/OSS Community Edition
  • http://github.com/Tokutek/ft-engine
• Enterprise Edition
  • Commercial support + hot backup

Performance + Compression + Agility

Page 23: Introduction to TokuDB v7.5 and Read Free Replication

TokuDB Performance

Warning - Benchmarks Ahead!

Page 24: Introduction to TokuDB v7.5 and Read Free Replication

Indexed Insertion Performance

* old numbers, now > 25K/sec

Page 25: Introduction to TokuDB v7.5 and Read Free Replication

Sysbench Performance (> RAM)

The fastest IO is the one you never have to do (compression)

Page 26: Introduction to TokuDB v7.5 and Read Free Replication

TokuDB Compression

Page 27: Introduction to TokuDB v7.5 and Read Free Replication

Compression + IO Reduction

• Server was at 90% IO utilization with InnoDB, 10% IO utilization with TokuDB

Page 28: Introduction to TokuDB v7.5 and Read Free Replication

Compression Performance

• InnoDB performance is severely impacted by compression
• Compression “misses” are costly

*iiBench workload

Page 29: Introduction to TokuDB v7.5 and Read Free Replication

Compression Achieved

• InnoDB compresses 16K blocks, TokuDB is 64K (or more)
• InnoDB requires fixed on-disk size, TokuDB is flexible

*log style data

Page 30: Introduction to TokuDB v7.5 and Read Free Replication

TokuDB Agility

Page 31: Introduction to TokuDB v7.5 and Read Free Replication

Maintenance Windows?

Page 32: Introduction to TokuDB v7.5 and Read Free Replication

Schema Changes Without Downtime?

• In TokuDB, column add/drop/expand is instant
  • “it’s just a message” – Fractal Tree index
• No need for helper tools
  • MySQL 5.6 or Percona Tools
  • Operation is still expensive (table rewrite)
• No need to apply the change on a slave, switch it with the master, and repeat

Page 33: Introduction to TokuDB v7.5 and Read Free Replication

TokuDB 7.5

New Features

Page 34: Introduction to TokuDB v7.5 and Read Free Replication

TokuDB 7.5 – Small Stuff

• Updated MySQL and MariaDB to 5.5.39
• Allow XA transactions to skip fsync() in prepare phase
  • XA means a multi-statement transaction that includes TokuDB and another XA engine (InnoDB)
  • Community Contribution by Bohu TANG
• Hot backup now supports multiple directories
  • datadir plus log_bin, tokudb_data_dir, tokudb_log_dir
• Additional bulk fetch – this is not small!
  • Was just “select *”
  • Now includes “insert into select …”, “replace into select …”, “insert ignore select …”, “insert into select … on duplicate key update …”, and “delete from select …”

Page 35: Introduction to TokuDB v7.5 and Read Free Replication

Brief Overview of MySQL Replication

Page 36: Introduction to TokuDB v7.5 and Read Free Replication

MySQL Replication - Modes

MySQL supports three replication modes

• Statement Based
  • SQL statements are logged and replayed on slaves
    – "insert into foo values (1,1);"
  • Good for when statement affects a lot of rows
    – "insert into foo select * from bar;"
• Row Based
  • Before and after images of affected rows are logged and replayed on slaves
  • foo : before (1,1) after (1,2)
• Mixed
  • Statement based unless it is determined to be unsafe, at which point row based
  • "update foo set c1=5 limit 5;"
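One way to see why a statement such as "update foo set c1=5 limit 5;" is treated as unsafe is to replay it on a slave that scans rows in a different order. The sketch below is a toy model, not MySQL behaviour in detail: the dictionaries, the explicit scan orders, and the `update_limit` helper are hypothetical stand-ins for tables and the storage engine's row order.

```python
def update_limit(rows, scan_order, limit, value):
    """Replay 'update ... limit N': touch the first N rows in this server's order."""
    touched = scan_order[:limit]
    for pk in touched:
        rows[pk] = value
    return touched

master = {pk: 0 for pk in range(1, 9)}      # pk -> c1
slave_statement = dict(master)
slave_row = dict(master)

master_order = [1, 2, 3, 4, 5, 6, 7, 8]
slave_order = [8, 7, 6, 5, 4, 3, 2, 1]      # same rows, different scan order

# The master executes the statement and records which rows changed.
touched = update_limit(master, master_order, limit=5, value=5)

# Statement-based replay re-executes the SQL and may pick different rows.
update_limit(slave_statement, slave_order, limit=5, value=5)

# Row-based replay applies the after images of exactly the rows the master changed.
for pk in touched:
    slave_row[pk] = master[pk]

print(slave_statement == master, slave_row == master)
```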

Page 37: Introduction to TokuDB v7.5 and Read Free Replication

MySQL Replication – Read Only

Setting the slave's read_only=1
• Puts the slave in "read only" mode
• Except that users with the SUPER privilege are allowed to insert/update/delete
  – This can break RFR!

Page 38: Introduction to TokuDB v7.5 and Read Free Replication

MySQL Replication – Slave Apply

Simple (hand wavy) overview of the slave process
• Read a replication event from the binary log
• If "statement"
  • Execute the statement on the slave
• If "row"
  • Insert, just write the row
  • Delete/Update, lookup the row
    – If row doesn’t exist, stop replication

Page 39: Introduction to TokuDB v7.5 and Read Free Replication

MySQL Replication – Slave Lag

In MySQL 5.5, replication is single threaded
• Masters support concurrency, slaves do not
• Causes slaves to "lag" behind the master

Improvements exist
• MySQL 5.6 supports multi-threaded slaves (per database)
• MariaDB 10.0 has its own parallel replication mechanism

All of these will work with TokuDB's Read Free Replication

Page 40: Introduction to TokuDB v7.5 and Read Free Replication

TokuDB Read Free Replication

Page 41: Introduction to TokuDB v7.5 and Read Free Replication

Read Free Replication - Requirements

• What is required on the master?
  • binlog_format=ROW
• What is required on the slave?
  • read_only=1
  • tokudb_rpl_unique_checks=0 and/or tokudb_rpl_lookup_rows=0
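As a small aid, the settings listed above can be rendered as option-file fragments. The variable names and values come from the slide; placing them under a `[mysqld]` section of the server's option file is an assumption about how they would be deployed, and the helper `render_mysqld_section` is hypothetical.

```python
# Values taken from the requirements slide.
MASTER_SETTINGS = {"binlog_format": "ROW"}
SLAVE_SETTINGS = {
    "read_only": 1,
    "tokudb_rpl_unique_checks": 0,
    "tokudb_rpl_lookup_rows": 0,
}

def render_mysqld_section(settings):
    """Render name=value lines under a [mysqld] header (assumed placement)."""
    lines = ["[mysqld]"]
    lines += [f"{name}={value}" for name, value in settings.items()]
    return "\n".join(lines)

print(render_mysqld_section(MASTER_SETTINGS))
print(render_mysqld_section(SLAVE_SETTINGS))
```

The slide says "and/or", so either of the two tokudb_rpl_* settings can be left at its default if only one optimization is wanted.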

Page 42: Introduction to TokuDB v7.5 and Read Free Replication

RFR Optimization #1 – Skip Unique Checks

• tokudb_rpl_unique_checks=0
• Why is it OK?
  • The master already performed the uniqueness check
• Why can't InnoDB skip unique checks?
  • It could, but…
  • InnoDB doesn't support change buffering on the PK
  • So, the row must be read for maintenance
  • Since it is then in memory, there is little to be gained for skipping the check

Page 43: Introduction to TokuDB v7.5 and Read Free Replication

RFR Optimization #2 – Skip Read/Modify/Write

• tokudb_rpl_lookup_rows=0
• Why is it OK?
  • If RBR, master provided before/after row images
• Why can't InnoDB skip read/modify/write?
  • InnoDB doesn't support change buffering on the PK
  • So, the row must be read for maintenance
• Why can TokuDB skip read/modify/write?
  • Everything necessary is in the binary log
  • Simple message injection
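The effect of the two optimizations on slave reads can be sketched as follows. This is a toy model under stated assumptions: the event dictionaries and the `apply_row_events` function are hypothetical, the keyword flags merely mirror the names of tokudb_rpl_unique_checks and tokudb_rpl_lookup_rows, and a plain dict stands in for the table where TokuDB would inject messages into Fractal Tree buffers.

```python
def apply_row_events(events, table, unique_checks=True, lookup_rows=True):
    """Apply row-based events to `table` and return how many row reads were needed."""
    reads = 0
    for event in events:
        pk = event["pk"]
        if event["op"] == "insert":
            if unique_checks:
                reads += 1                     # read to check the key is unused
                if pk in table:
                    raise KeyError(f"duplicate key {pk}")
            table[pk] = event["after"]
        else:
            if lookup_rows:
                reads += 1                     # read before modify/delete
                if pk not in table:
                    raise LookupError(f"row {pk} not found, replication stops")
            if event["op"] == "delete":
                table.pop(pk, None)
            else:                              # update: the after image is enough
                table[pk] = event["after"]
    return reads

events = [
    {"op": "insert", "pk": 3, "after": (3, 3)},
    {"op": "update", "pk": 1, "after": (1, 2)},
    {"op": "delete", "pk": 2},
]

classic = {1: (1, 1), 2: (2, 2)}
read_free = dict(classic)

print(apply_row_events(events, classic))
print(apply_row_events(events, read_free, unique_checks=False, lookup_rows=False))
print(classic == read_free)
```

Skipping the reads is only safe when nothing but replication writes to the slave, which is why read_only=1 is part of the requirements.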

Page 44: Introduction to TokuDB v7.5 and Read Free Replication

Read Free Replication – Sysbench Benchmark

No lag

Enabled Read Free Replication

Page 45: Introduction to TokuDB v7.5 and Read Free Replication

Read Free Replication – Sysbench Benchmark

Additional Read Capacity

Enabled Read Free Replication

Page 46: Introduction to TokuDB v7.5 and Read Free Replication

Read Free Replication – Sysbench Benchmark

Mostly fsync()

No reads!

Enabled Read Free Replication

Page 47: Introduction to TokuDB v7.5 and Read Free Replication

Read Free Replication - Ideas

#1, scale your reads
• HA is nice, but don’t we also want to scale our reads?

[Diagram: Master and Slave IO under the replicated workload plus readers, comparing No RFR with RFR]

Page 48: Introduction to TokuDB v7.5 and Read Free Replication

Read Free Replication - Ideas

#2, shared slaves

[Diagram: with No RFR, Master1 replicates to Slave1 and Master2 replicates to Slave2; with RFR, Master1 and Master2 both replicate to Slave1+2 (1 machine, 2 mysqlds)]

Page 49: Introduction to TokuDB v7.5 and Read Free Replication

Read Free Replication - Ideas

#3, high IO master (flash/SSD), low IO slave (SAS/SATA)

Master Slave

RFR

Page 50: Introduction to TokuDB v7.5 and Read Free Replication

Can we do even more?

• Yes, reduce fsync() calls on slaves - writes
  • This is the current choke point for RFR slaves
• In 5.5, master.info and relay-log.info are just files
  • Each needs fsync() for crash safety
• In 5.6, these files can be InnoDB tables
  • 3 fsync() operations are now 1
  • However, this becomes an XA transaction when TokuDB tables are in use
  • Even more fsync() calls
• Should be able to convert these to TokuDB

Page 51: Introduction to TokuDB v7.5 and Read Free Replication

TokuDB Resources

• Website @ www.tokutek.com

• Documentation @ docs.tokutek.com/tokudb

• Community @ tokudb-user Google Group

• Tokutek Blogs @ www.tokutek.com/tokuview

Page 52: Introduction to TokuDB v7.5 and Read Free Replication

Thank you for attending!

Enter questions into the chat box

Contact us: [email protected]

Thank you!