Introduction to TokuDB v7.5 and Read Free Replication
What’s New in TokuDB 7.5
Read Free Replication and more!
Tim Callaghan, [email protected]
@tmcallaghan
Company
• Two high-performance database solutions for big data
  • NoSQL: TokuMX™ for MongoDB
  • NewSQL: TokuDB® for MySQL, MariaDB & Percona Server
• Radical new storage for larger-than-RAM datasets
  • Fractal Tree® indexing technology
  • Data science research at M.I.T., Rutgers, Stony Brook
• Open source
  • Example: Red Hat (Linux) & Canonical (Ubuntu)
Current Customers / Big Data Innovators
Ever seen this?
IO Utilization Graph, write performance is IO limited
Agenda
• What is a Fractal Tree index?
  • No Read Free Replication without them!
• What Fractal Tree indexes enable in MySQL, MariaDB, and Percona Server
  • TokuDB!
• What’s new in TokuDB 7.5
  • Read Free Replication and more
• Q+A
Indexing 101:
B-trees and Fractal Tree Indexes
B-trees
B-tree Overview - vocabulary
Internal Nodes - Path to data
Leaf Nodes - Actual Data - Sorted
Pointers
Pivots
B-tree Overview - example
Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99]
• Pivot Rule is >=
• “numbers” are keys, stored value is row data
B-tree Overview - search
Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99]
“Find 25”
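The search on this slide can be sketched in a few lines of Python. This is only an illustration of the ">=" pivot rule on the example tree; the tuple/list encoding and the `find` helper are assumptions made for the sketch, not how any storage engine lays out its nodes.

```python
from bisect import bisect_right

# Internal nodes are (pivots, children) tuples; leaves are sorted lists of keys.
# The keys come from the example slide; the encoding itself is hypothetical.
TREE = (
    [22],
    [
        ([10], [[2, 3, 4], [10, 20]]),
        ([99], [[22, 25], [99]]),
    ],
)


def find(node, key, path=None):
    """Return (found, path), where path lists the pivots and leaf visited."""
    path = [] if path is None else path
    if isinstance(node, list):  # leaf node: sorted keys
        path.append(node)
        return key in node, path
    pivots, children = node
    path.append(pivots)
    # Pivot rule is >=: a key equal to a pivot goes to the child on its right.
    child = children[bisect_right(pivots, key)]
    return find(child, key, path)


found, path = find(TREE, 25)
print(found, path)
```

For "Find 25" the walk visits the root pivot 22, then the internal pivot 99, then the leaf holding 22 and 25.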
B-tree Overview - insert
Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10, 15, 20] [22, 25] [99]
“Insert 15”
Figure labels, top to bottom: RAM, RAM, DISK
B-tree Overview - performance
Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99]
Performance is IO limited when data > RAM: one IO is needed for each insert/update
(actually, it’s one IO for every index on the table)
Fractal Tree indexes
similar to B-trees
• store data in leaf nodes
• use index key for ordering
different than B-trees
• message buffers
• big nodes (4MB vs. ~16KB)
All internal nodes have message buffers
As buffers overflow, they cascade down the tree
Messages are eventually applied to leaf nodes
Doesn’t InnoDB Have Buffers?
InnoDB Change Buffer
• It sure does!
• No buffer for the primary key index
• One buffer for each secondary index
• http://dev.mysql.com/doc/refman/5.5/en/innodb-performance-change_buffering.html
InnoDB Buffers Help (for a while)
• Buffering allows for IO amortization (> 1 operation per IO)
• When data gets large enough the single buffer can’t help (blue = red)
Fractal Tree Indexes - sample data
Root: 25
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99]
Looks a lot like a B-tree!
insert 15;
Fractal Tree Indexes - insert
Root: 25
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99]
insert (15)
• search operations must consider messages along the way
• messages cascade down the tree as buffers fill up
• they are eventually applied to the leaf nodes, hundreds or thousands of operations for a single IO
• CPU and cache are conserved as important data is not ejected
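A toy model may help make the buffering idea concrete. The sketch below is a deliberately simplified assumption: it has a single root buffer sitting directly over the leaves, whereas the slides describe buffers on every internal node with messages cascading level by level. The class name, `buffer_limit`, and the `leaf_writes` counter are all hypothetical.

```python
from bisect import bisect_right


class BufferedTree:
    """Toy two-level tree: one root with a message buffer over sorted leaves."""

    def __init__(self, pivots, leaves, buffer_limit=4):
        self.pivots = pivots                      # e.g. [10, 22, 99]
        self.leaves = [set(leaf) for leaf in leaves]
        self.buffer = []                          # pending (op, key) messages
        self.buffer_limit = buffer_limit          # hypothetical, tiny on purpose
        self.leaf_writes = 0                      # stand-in for disk IOs

    def _send(self, op, key):
        self.buffer.append((op, key))
        if len(self.buffer) >= self.buffer_limit:
            self.flush()

    def insert(self, key):
        self._send("insert", key)

    def delete(self, key):
        self._send("delete", key)

    def flush(self):
        """Apply buffered messages, oldest first, one write per touched leaf."""
        touched = set()
        for op, key in self.buffer:
            i = bisect_right(self.pivots, key)
            if op == "insert":
                self.leaves[i].add(key)
            else:
                self.leaves[i].discard(key)
            touched.add(i)
        self.leaf_writes += len(touched)
        self.buffer.clear()

    def contains(self, key):
        # Searches must consider pending messages: the newest one wins.
        for op, k in reversed(self.buffer):
            if k == key:
                return op == "insert"
        return key in self.leaves[bisect_right(self.pivots, key)]


tree = BufferedTree([10, 22, 99], [[2, 3, 4], [10, 20], [22, 25], [99]])
tree.insert(15)
print(tree.contains(15), tree.leaf_writes)
```

In this model the insert of 15 is visible to searches immediately but costs no leaf write until the buffer fills, at which point several messages bound for the same leaf are applied with a single write.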
Fractal Tree Indexes - other operations
Root: 25
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99]
Buffered messages:
add_column(c4 bigint)
delete(99), increment(22,+5)
...
insert(100), delete(8), delete(2), insert(8)
Lots of operations can be messages!
MySQL/MariaDB/Percona Server + Fractal Tree Indexes = TokuDB
What is TokuDB?
• Transactional MySQL Storage Engine - think InnoDB
• Available for MySQL 5.5 and MariaDB 5.5
  • Percona Server 5.6 and MariaDB 10.0 too
• ACID and MVCC
• Free/OSS Community Edition
  • http://github.com/Tokutek/ft-engine
• Enterprise Edition
  • Commercial support + hot backup
Performance + Compression + Agility
TokuDB Performance
Warning - Benchmarks Ahead!
Indexed Insertion Performance
* old numbers, now > 25K/sec
Sysbench Performance (> RAM)
The fastest IO is the one you never have to do (compression)
TokuDB Compression
Compression + IO Reduction
• Server was at 90% IO utilization with InnoDB, 10% IO utilization with TokuDB
Compression Performance
• InnoDB performance is severely impacted by compression
• Compression “misses” are costly
*iiBench workload
Compression Achieved
• InnoDB compresses 16K blocks, TokuDB is 64K (or more)
• InnoDB requires fixed on-disk size, TokuDB is flexible
*log style data
TokuDB Agility
Maintenance Windows?
Schema Changes Without Downtime?
• In TokuDB, column add/drop/expand is instant
  • “it’s just a message” – Fractal Tree index
• No need for helper tools
  • MySQL 5.6 or Percona Tools
  • Operation is still expensive (table rewrite)
• Or, no need to change on slave, then switch with master and repeat
TokuDB 7.5
New Features
TokuDB 7.5 – Small Stuff
• Updated MySQL and MariaDB to 5.5.39
• Allow XA transactions to skip fsync() in prepare phase
  • XA means a multi-statement transaction that includes TokuDB and another XA engine (InnoDB)
  • Community Contribution by Bohu TANG
• Hot backup now supports multiple directories
  • datadir plus log_bin, tokudb_data_dir, tokudb_log_dir
• Additional bulk fetch – this is not small!
  • Was just “select *”
  • Now includes “insert into select …”, “replace into select …”, “insert ignore select …”, “insert into select … on duplicate key update …”, and “delete from select …”
Brief Overview of MySQL Replication
MySQL Replication - Modes
MySQL supports three replication modes
• Statement Based
  • SQL statements are logged and replayed on slaves
    – "insert into foo values (1,1);"
  • Good for when statement affects a lot of rows
    – "insert into foo select * from bar;"
• Row Based
  • Before and after images of affected rows are logged and replayed on slaves
    – foo : before (1,1) after (1,2)
• Mixed
  • Statement based unless it is determined to be unsafe, at which point row based
    – "update foo set c1=5 limit 5;"
MySQL Replication – Read Only
Setting the slave's read_only=1
• Puts the slave in "read only" mode
• Except that users with the SUPER privilege are allowed to insert/update/delete
  – This can break RFR!
MySQL Replication – Slave Apply
Simple (hand wavy) overview of the slave process
• Read a replication event from the binary log
• If "statement"
  • Execute the statement on the slave
• If "row"
  • Insert, just write the row
  • Delete/Update, lookup the row
  • If row doesn’t exist, stop replication
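The hand-wavy apply steps above can be written out as a short Python sketch. The event dictionaries, the `apply_event` function, and the `ReplicationStopped` exception are hypothetical stand-ins for illustration; they do not reflect the real binary log format or MySQL's internal code.

```python
class ReplicationStopped(Exception):
    """Raised when a row event refers to a row the slave does not have."""


def apply_event(table, event, execute_statement):
    """Apply one replication event to `table`, a dict of primary key -> row."""
    if event["type"] == "statement":
        execute_statement(event["sql"])       # replay the SQL as-is
        return
    op = event["op"]
    if op == "insert":
        table[event["pk"]] = event["after"]   # insert: just write the row
        return
    # Delete/update: the row is looked up before it is changed.
    if event["pk"] not in table:
        raise ReplicationStopped(f"row {event['pk']!r} not found")
    if op == "delete":
        del table[event["pk"]]
    else:
        table[event["pk"]] = event["after"]


foo = {}
executed = []
apply_event(foo, {"type": "row", "op": "insert", "pk": 1, "after": (1, 1)}, executed.append)
apply_event(foo, {"type": "row", "op": "update", "pk": 1, "after": (1, 2)}, executed.append)
print(foo)
```

The lookup in the delete/update branch is the read that matters for the rest of the talk: on a larger-than-RAM table it is the step that turns into disk IO.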
MySQL Replication – Slave Lag
In MySQL 5.5, replication is single threaded
• Masters support concurrency, slaves do not
• Causes slaves to "lag" behind the master
Improvements exist
• MySQL 5.6 supports multi-threaded slaves (per database)
• MariaDB 10.0 has its own parallel replication mechanism
All of these will work with TokuDB's Read Free Replication
TokuDB Read Free Replication
Read Free Replication - Requirements
• What is required on the master?
  • binlog_format=ROW
• What is required on the slave?
  • read_only=1
  • tokudb_rpl_unique_checks=0 and/or tokudb_rpl_lookup_rows=0
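As a quick way to reason about the checklist, the requirements can be expressed as a small Python check over two dictionaries of server variables. The `rfr_problems` helper is hypothetical and is not part of TokuDB or MySQL. How the dictionaries get populated is left open, and treating a missing tokudb_rpl_* variable as "1" (check enabled) is an assumption of this sketch.

```python
def rfr_problems(master_vars, slave_vars):
    """Return a list of reasons Read Free Replication would not apply."""
    problems = []
    if str(master_vars.get("binlog_format", "")).upper() != "ROW":
        problems.append("master: binlog_format must be ROW")
    if str(slave_vars.get("read_only", "0")).upper() not in ("1", "ON"):
        problems.append("slave: read_only must be 1")
    # At least one of the two slave-side skips has to be switched on (set to 0).
    skips = [
        str(slave_vars.get(name, "1")).upper() in ("0", "OFF")
        for name in ("tokudb_rpl_unique_checks", "tokudb_rpl_lookup_rows")
    ]
    if not any(skips):
        problems.append(
            "slave: set tokudb_rpl_unique_checks=0 and/or tokudb_rpl_lookup_rows=0"
        )
    return problems


print(rfr_problems({"binlog_format": "ROW"},
                   {"read_only": 1, "tokudb_rpl_lookup_rows": 0}))
```

An empty list means all three conditions from the slide are met; each string in a non-empty list names one condition that is not.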
RFR Optimization #1 – Skip Unique Checks
• tokudb_rpl_unique_checks=0
• Why is it OK?
  • The master already performed the uniqueness check
• Why can't InnoDB skip unique checks?
  • It could, but…
  • InnoDB doesn't support change buffering on the PK
  • So, the row must be read for maintenance
  • Since it is then in memory, there is little to be gained by skipping the check
RFR Optimization #2 – Skip Read/Modify/Write
• tokudb_rpl_lookup_rows=0
• Why is it OK?
  • If RBR, master provided before/after row images
• Why can't InnoDB skip read/modify/write?
  • InnoDB doesn't support change buffering on the PK
  • So, the row must be read for maintenance
• Why can TokuDB skip read/modify/write?
  • Everything necessary is in the binary log
  • Simple message injection
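To illustrate what the two optimizations save, here is a toy Python model that counts point reads while applying row events. It is a sketch under stated assumptions: `CountingTable` and `apply_row_events` are hypothetical, the `lookup_rows` and `unique_checks` arguments merely mirror the tokudb_rpl_* variables by name, and a dictionary write stands in for injecting a message into a Fractal Tree index.

```python
class CountingTable:
    """Dict-backed table that counts point reads, standing in for disk lookups."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def lookup(self, pk):
        self.reads += 1
        return self.rows.get(pk)


def apply_row_events(table, events, lookup_rows=True, unique_checks=True):
    """Apply (op, pk, after) row events and return the number of reads done."""
    for op, pk, after in events:
        if op == "insert":
            if unique_checks and table.lookup(pk) is not None:
                raise ValueError(f"duplicate key {pk!r}")
            table.rows[pk] = after
        else:  # update or delete
            if lookup_rows and table.lookup(pk) is None:
                raise LookupError(f"row {pk!r} not found")
            if op == "delete":
                table.rows.pop(pk, None)
            else:
                table.rows[pk] = after      # the after image comes from the log
    return table.reads


events = [("insert", 3, (3, 1)), ("update", 1, (1, 2)), ("delete", 2, None)]
start = {1: (1, 1), 2: (2, 1)}
print(apply_row_events(CountingTable(start), events))
print(apply_row_events(CountingTable(start), events,
                       lookup_rows=False, unique_checks=False))
```

Both runs end with the same rows, but the second performs no reads at all. That only holds because the master already validated every event, which is also why a writer with SUPER privilege on the slave can break the scheme.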
Read Free Replication – Sysbench Benchmark
No lag
Enabled Read Free Replication
Read Free Replication – Sysbench Benchmark
Additional Read Capacity
Enabled Read Free Replication
Read Free Replication – Sysbench Benchmark
Mostly fsync()
No reads!
Enabled Read Free Replication
Read Free Replication - Ideas
#1, scale your reads
• HA is nice, but don’t we also want to scale our reads?
[Figure: Master and Slave IO with Workload and Readers, "No RFR" vs. "RFR"]
Read Free Replication - Ideas
#2, shared slaves
[Figure: IO per server. "No RFR": Master1, Slave1, Master2, Slave2. "RFR": Master1, Master2, Slave1+2 (1 machine, 2 mysqlds)]
Read Free Replication - Ideas
#3, high IO master (flash/SSD), low IO slave (SAS/SATA)
[Figure: Master and Slave, with RFR]
Can we do even more?
• Yes, reduce fsync() calls on slaves - writes
  • This is the current choke point for RFR slaves
• In 5.5, master.info and relay-log.info are just files
  • Each needs fsync() for crash safety
• In 5.6, these files can be InnoDB tables
  • 3 fsync() operations are now 1
  • However, this becomes an XA transaction when TokuDB tables are in use
  • Even more fsync() calls
• Should be able to convert these to TokuDB
TokuDB Resources
• Website @ www.tokutek.com
• Documentation @ docs.tokutek.com/tokudb
• Community @ tokudb-user Google Group
• Tokutek Blogs @ www.tokutek.com/tokuview