Marko Mäkelä - InnoDB

46
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | MySQL 5.7 InnoDB—What’s New Sunny Bains – [email protected] Senior Engineering Manager Copyright © 2014, Oracle and/or its affiliates. All rights reserved. Marko Mäkelä [email protected] Senior Principal Software Engineer

description

Marko Mäkelä - InnoDB, Tech Tour Helsinki 2014

Transcript of Marko Mäkelä - InnoDB

Page 1: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

MySQL 5.7InnoDB—What’s NewSunny Bains – [email protected]

Senior Engineering Manager

Copyright © 2014, Oracle and/or its affiliates. All rights reserved.

Marko Mäkelä [email protected]

Senior Principal Software Engineer

Page 2: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Page 3: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Program Agenda

Performance

Features

Download & Blogs

Page 4: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Performance : Transactions

Transaction poolFixed chunks of 4MB each

sizeof(trx_t) reduced from 1144 to 712 bytes

Ordered on address, locality of reference

Improves performance of read-write transaction list scans

Reduces malloc()/free() overhead

Page 5: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Performance : Transactions

Transaction life cycleAll transactions are considered as read-only by default

Read only transaction start/commit mutex free

No application changes required

Read views are cached

Read view created iff a RW transaction started since the last snapshot

Reduce contention when implicit → explicit row lock conversion is done

Page 6: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Performance : Transactions

Transaction life cycleHigh priority transactions (Replication GCS)

Cannot be rolled back

Can jump the record lock queue – prioritized

Currently not visible to end users

Allows rollback of arbitrary user transactions internally• If purge blocked due to long running transactions

Page 7: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Performance : Memcached Plugin

8.0 16.0 32.0 64.0 128.0 256.0 512.0 1024.00

200,000

400,000

600,000

800,000

1,000,000

1,200,000

MySQL 5.7 vs 5.6 - InnoDB & Memcached

MySQL 5.7

MySQL 5.6

Connections

Queries per Second

Page 8: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Performance : Sysbench OLTP Read-Only

Page 9: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Performance : Sysbench OLTP Read-Write

Page 10: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Performance : Temporary Table Optimizations

DDL changesNot stored in the data dictionary – lower mutex contention

Special shared temporary tablespace – lower IO overhead

Compressed tables done the old way, separate .ibd file

Tablespace recreated on startup

Page 11: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Performance : Temporary Table Optimizations

DML changesSpecial UNDO logs that are not redo logged

Undo logging required for rollback to savepoint

Changes to the temporary tablespace are not redo logged

No fsyncs() on the temporary tablespace

Configuration variablestemp_data_file_path := same format as the system tablespace

e.g., ibtmp1:12M:autoextend – default setting

Page 12: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Performance : Temporary Tables Benchmarks

5.6 5.70

100

200

300

400

500

600

700

Temporary table CREATE/DROP

Version

Se

con

ds

Page 13: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Performance : Temporary Tables Benchmarks

5.6 5.70

100

200

300

400

500

600

700

Insert 5M rows

Versions

Se

con

ds

Page 14: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Performance : Temporary Tables Benchmarks

5.6 5,70

500

1000

1500

2000

2500

Delete 5M rows

Version

Se

con

ds

Page 15: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Performance : Temporary Tables Benchmarks

5.6 5.70

500

1000

1500

2000

2500

Update 5M rows

Version

Se

con

ds

Page 16: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Performance : Buffer pool improvements

Use atomics for page reference countingBug#68079 - INNODB DOES NOT SCALE WELL ON 12 CORE

Flush list traversalFix flush and LRU list rescanning

Previously after flushing a page we would start from the tail again

Extra work and increased mutex contention

Page 17: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Performance : Buffer pool improvements

Multithreaded flushing5.6 introduced a separate thread for flushing

5.7 allows multiple threads

--innodb-page-cleaners := 1..64 – default is 1

Page 18: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Performance : Redo log

Better concurrency and faster recoveryRemove unused code

Fix read on write issue – pad the log buffer before writing to disk

Refactor and clean up the mini-transaction code

Optimize mutex acquire/release during log checkpoint

Write data file names to the redo log

No need to open clean *.ibd files on crash recovery

Page 19: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Performance : Memcached Plugin

Leverages the read-only transaction optimizationsFixed several bottlenecks in the Memcache and the plugin code

1.1 Million GET/s

Limiting factors were:• The network• Memcached client

Page 20: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Performance : Memcache Benchmarks

Page 21: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Performance : index->lock

Increased concurrency, improved performanceVery complex fix

Previously entire index X latched for tree structure modification

B-tree internal nodes not latched before fix

New SX lock mode – compatible with S lock mode

Increases concurrency e.g., index->lock(SX), reads can proceed

Page 22: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Performance : DDL & Truncate

Truncate table is now atomicPreviously DROP + CREATE

ID mismatch or .ibd missing If crash after DROP but before CREATE

More schema-only ALTER TABLE supportedRename index

VARCHAR extension

Faster ALTER TABLEBug#17657223 EXCESSIVE TEMPORARY FILE USAGE IN ALTER TABLE

WL#7277 Bulk Load for Index Creation

Page 23: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Features : Partitions

Native PartitioningReduced memory overhead

Allows us to easily add• Foreign key support• Full text index support

Makes it easier to plan for a parallel query infra-structure

Page 24: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Features : PartitionsNative Partitioning memory overhead improvement — labs

Example Table with 8K partitionsCREATE TABLE `t1` (

`a` int(10) unsigned NOT NULL AUTO_INCREMENT,`b` varchar(1024) DEFAULT NULL, PRIMARY KEY (`a`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1 PARTITION BY HASH (a) PARTITIONS 8192;

Memory overhead comparison

One open instance uses 49 % less memory (111 MB vs 218 MB)

Ten open instances take 90 % less memory (113 MB vs 1166 MB)

More work to be done – stay tuned!

Page 25: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Features : PartitionsImport/Export support

Importing a single partition# If the table doesn't already exist, create it

mysql> CREATE TABLE partitioned_table <same as the source>;

# Discard the tablespaces for the partitions to be restored

mysql> ALTER TABLE partitioned_table DISCARD PARTITION p1,p4 TABLESPACE;

# Copy the tablespace files

$ cp /path/to/backup/db-name/partitioned_table#P#p{1,4}.{ibd,cfg} /path/to/mysql-datadir/db-name/

# Import the tablespaces

mysql> ALTER TABLE partitioned_table IMPORT PARTITION p1,p4 TABLESPACE;

Page 26: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Features : Partitions

DMLIndex condition push down

Limited HANDLER support for partitionsCREATE TABLE t (a int, b int, KEY (a, b)) PARTITION BY HASH (b) PARTITIONS 2;

HANDLER t READ a = (1, 2);

HANDLER t READ a NEXT;

Page 27: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Features : Tablespace management – labs

TablespacesSQL syntax for explicit tablespace management

Replaces legacy --innodb-file-per-table usage

• CREATE TABLESPACE Logs ADD DATAFILE 'log01.ibd';• CREATE TABLE http_req(c1 varchar) TABLESPACE=Logs ;• ALTER TABLE some_table TABLESPACE=Logs;• DROP TABLESPACE Logs; -- must be empty

Page 28: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Features : Tablespace management – labs

TablespacesSQL syntax for explicit tablespace management

Replaces legacy --innodb-file-per-table usage

• CREATE TABLESPACE Logs ADD DATAFILE 'log01.ibd';• CREATE TABLE http_req(c1 varchar) TABLESPACE=Logs ;• ALTER TABLE some_table TABLESPACE=Logs;• DROP TABLESPACE Logs; -- must be empty

Page 29: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Features : Buffer Pool

Dynamic buffer pool size re-sizeDone in a separate thread

--innodb_buffer_pool_chunk_size – resize done in chunk size

Example:

SET GLOBAL innodb_buffer_pool_size=402653184;

Page 30: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Features : UNDO

UNDO Log Space ManagementRequires separate UNDO tablespaces to work

• --innodb-undo_log_truncate=on (default off)• --innodb-max_undo_log_size – default 1G• --innodb-purge-rseg-truncate-frequency – default 128 - advanced

Page 31: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Features : Larger Page Sizes - labs

32K and 64K Page SizesLonger records can be stored “in-page”

Better compression with the new transparent page compression

Page 32: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Features : Memcached Multiple Get - labs

MemcachedMultiple get

Gets around Memcached client protocol bottlenecks• Shorter query string• Fetch multiple keys in range• Extension to “traditional” Memcached

Page 33: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Features : GIS

Spatial indexImplemented as an R-Tree

Supports all MySQL geometric types

Currently only 2D supported

Supports transactions & MVCC

Uses predicate locking to avoid phantom reads

Page 34: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Features : GIS

R-TreeMulti-dimension spatial data search

Queries more like:• Find object “within”, “intersects” or “touches” another object• MySQL geometric types

• POINT, LINESTRING, POLYGON, MULTIPOINT,• MULTILINESTRING, MULTIPOLYGON, GEOMETRY

Page 35: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Features : Mutexes

Flexible mutexesMix and match mutex types in the code – build time option only

Can use futex on Linux instead of condition variables

Futex eliminates “thundering herd” problem

Not enabled by default, build with -DMUTEX_TYPE=”futex” from source

Page 36: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Features : Character Sets

Adds the MySQL character set gb18030 (Chinese encoding for Unicode)

Supports the China National Standard GB 18030 character set.

The new associated collations are gb18030_bin and gb18030_chinese_ci.

Page 37: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Features : Full Text Search

Support for external parser

For tokenizing the document and the query

Example:

CREATE TABLE t1 (

id INT AUTO_INCREMENT PRIMARY KEY,

doc CHAR(255), FULLTEXT INDEX (doc) WITH PARSER my_parser) ENGINE=InnoDB;

ALTER TABLE articles ADD FULLTEXT INDEX (body) WITH PARSER my_parser;

CREATE FULLTEXT INDEX ft_index ON articles(body) WITH PARSER my_parser;

Page 38: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Features : Sandisk/FusionIO Atomic Writes

No new configuration variables – may change in GASystem wide settingDisables the doublewrite buffer if the system tablespace is on NVMFSMore to come

Page 39: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Features : Transparent PageIO Compression

Proof of concept patch from FusionIO – currently Linux onlyRequires sparse file support : NVMFS, XFS, EXT4, ZFS & NTFSLinux 2.6.39+ added PUNCH HOLE supportCan co-exist with current ROW_FORMAT=COMPRESSED tablesWorks on all table types, including the system tablespace

Page 40: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Features : Transparent PageIO Compression

Page 41: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Zip compression Tested and tried, works well enough

Complicates buffer pool code

Special page format required

No IO layer changes

Algorithm supported - Zlib

Can't compress system tablespace

Can't compress UNDO tablespace

Features : Zip vs Page IO compression

PageIO compression Requires OS/FS support

Simple

Works with all file types, system tablespaces

Potential fragmentation issues

NVMFS doesn't suffer from fragmentation

Adds to the cost of IO

Current algorithms are tuned to existing assumptions

Requires multi-threaded flushing

Easy to add new algorithms – Zlib, LZ4 etc.

Page 42: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Features : PageIO Compression Benchmark

FusionIO – 25G BP – maxid 50 Million 64 Requesters - Linkbench

Normal PageIO compression Current compression0

10000

20000

30000

40000

50000

60000

70000

Size & Operations per/sec

Ops/sec

Size

Siz

e a

nd

op

era

tion

s p

er

se

c

Page 43: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Features : Transparent PageIO Compression

Configuration options--innodb-compression-algorithm := 0,1,2

Where: 0 – None, 1 – Zlib and 2 – LZ4

--innodb-compression-level

--innodb-compression-punch-hole := boolean

--innodb-read-async := boolean

--innodb-read-block-size := boolean

Page IO Compression

Page 44: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Features : Miscellaneous

Implement update_time for InnoDB tablesImprove select count(*) performance by using handler::records();Improve recovery, redo log tablespace meta data changes

Page 45: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Download & Blogs

http://labs.mysql.com

http://dev.mysql.com/downloads/mysql/

http://mysqlserverteam.com/

Page 46: Marko Mäkelä - InnoDB

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Thank You!

Copyright © 2014, Oracle and/or its affiliates. All rights reserved.