Marko Mäkelä - InnoDB

Post on 06-Jul-2015


Marko Mäkelä - InnoDB, Tech Tour Helsinki 2014


Copyright © 2014, Oracle and/or its affiliates. All rights reserved.

MySQL 5.7 InnoDB: What’s New

Sunny Bains – sunny.bains@oracle.com

Senior Engineering Manager


Marko Mäkelä marko.makela@oracle.com

Senior Principal Software Engineer


The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.


Program Agenda

Performance

Features

Download & Blogs


Performance : Transactions

Transaction pool

Fixed chunks of 4MB each

sizeof(trx_t) reduced from 1144 to 712 bytes

Ordered on address, locality of reference

Improves performance of read-write transaction list scans

Reduces malloc()/free() overhead
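As a rough illustration of the figures on this slide, the sketch below computes how many transaction objects fit into one 4MB chunk at the old and the new object size. It assumes a chunk holds nothing but `trx_t` objects (no per-chunk header or alignment padding), which is a simplification; the function name is hypothetical and not taken from the InnoDB source.

```python
CHUNK_BYTES = 4 * 1024 * 1024  # one pool chunk, as stated on the slide


def objects_per_chunk(object_size, chunk_bytes=CHUNK_BYTES):
    """Whole objects that fit in one chunk, ignoring headers and padding."""
    return chunk_bytes // object_size


if __name__ == "__main__":
    # Sizes of trx_t before and after the reduction quoted on the slide.
    for label, size in (("before", 1144), ("after", 712)):
        print(label, size, "bytes ->", objects_per_chunk(size), "objects per chunk")
```

Under these assumptions the smaller object packs roughly 60% more transactions into the same chunk, which is consistent with the locality-of-reference argument made above.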


Performance : Transactions

Transaction life cycle

All transactions are considered read-only by default

Read-only transaction start/commit is mutex free

No application changes required

Read views are cached

Read view created iff a RW transaction started since the last snapshot

Reduce contention when implicit → explicit row lock conversion is done
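The read view caching rule above ("created iff a RW transaction started since the last snapshot") can be sketched as a toy model. This is not InnoDB code: the class, its counter, and the dictionary used as a "view" are all hypothetical stand-ins chosen to show the reuse condition only.

```python
class ReadViewCache:
    """Toy model: reuse the last snapshot unless a read-write trx started since."""

    def __init__(self):
        self.rw_started = 0      # number of read-write transactions started so far
        self.cached_view = None  # last snapshot handed out
        self.cached_at = -1      # value of rw_started when that snapshot was built
        self.views_created = 0   # how many snapshots were actually built

    def start_rw_transaction(self):
        self.rw_started += 1

    def open_read_view(self):
        # Build a new view only if a read-write transaction started since the
        # cached one was created; otherwise hand out the cached view again.
        if self.cached_view is None or self.cached_at != self.rw_started:
            self.views_created += 1
            self.cached_view = {"id": self.views_created,
                                "rw_seen": self.rw_started}
            self.cached_at = self.rw_started
        return self.cached_view
```

In a read-mostly workload almost every call to `open_read_view()` in this model returns the cached object, which is the saving the slide describes.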


Performance : Transactions

Transaction life cycle

High priority transactions (Replication GCS)

Cannot be rolled back

Can jump the record lock queue – prioritized

Currently not visible to end users

Allows rollback of arbitrary user transactions internally
• If purge blocked due to long running transactions
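To make "jump the record lock queue" concrete, here is a minimal sketch of a wait queue in which high priority requests are placed ahead of ordinary waiters but behind earlier high priority ones. The slides do not describe the real queueing policy, so the class, its methods, and the ordering rule are assumptions for illustration only.

```python
class RecordLockQueue:
    """Toy wait queue: high priority waiters always precede normal ones."""

    def __init__(self):
        self._waiters = []  # list of (transaction name, is_high_priority)

    def enqueue(self, trx, high_priority=False):
        if high_priority:
            # High priority entries always form a prefix of the queue, so
            # their count is the position right after the last of them.
            pos = sum(1 for _, hp in self._waiters if hp)
            self._waiters.insert(pos, (trx, True))
        else:
            self._waiters.append((trx, False))

    def order(self):
        return [trx for trx, _ in self._waiters]
```

For example, enqueueing `a`, `b`, then a high priority `repl1` yields the order `repl1, a, b`.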


Performance : Memcached Plugin

[Chart: MySQL 5.7 vs 5.6 - InnoDB & Memcached. Series: MySQL 5.7, MySQL 5.6. X axis: Connections (8 to 1024). Y axis: Queries per Second (0 to 1,200,000).]


Performance : Sysbench OLTP Read-Only


Performance : Sysbench OLTP Read-Write


Performance : Temporary Table Optimizations

DDL changes

Not stored in the data dictionary – lower mutex contention

Special shared temporary tablespace – lower IO overhead

Compressed temporary tables are handled the old way, each in a separate .ibd file

Tablespace recreated on startup


Performance : Temporary Table Optimizations

DML changes

Special UNDO logs that are not redo logged

Undo logging required for rollback to savepoint

Changes to the temporary tablespace are not redo logged

No fsync() calls on the temporary tablespace

Configuration variables

innodb_temp_data_file_path := same format as the system tablespace

e.g., ibtmp1:12M:autoextend – default setting
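The `name:size:autoextend` value format quoted above can be illustrated with a small parser. This is only a sketch: it handles a single file specification with an optional K/M/G suffix and an optional `autoextend` attribute, and it ignores other forms the server may accept (for example several files in one value or a maximum size). The function name and the returned dictionary layout are hypothetical.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_data_file_spec(spec):
    """Parse one 'name:size[:autoextend]' specification into a dict."""
    parts = spec.split(":")
    if len(parts) < 2:
        raise ValueError("expected name:size[:autoextend], got %r" % spec)
    name, size_text = parts[0], parts[1]
    unit = size_text[-1].upper()
    if unit in UNITS:
        size_bytes = int(size_text[:-1]) * UNITS[unit]
    else:
        size_bytes = int(size_text)  # plain byte count, no suffix
    return {
        "name": name,
        "size_bytes": size_bytes,
        "autoextend": "autoextend" in parts[2:],
    }


if __name__ == "__main__":
    print(parse_data_file_spec("ibtmp1:12M:autoextend"))
```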


Performance : Temporary Tables Benchmarks

[Chart: Temporary table CREATE/DROP. X axis: Version (5.6, 5.7). Y axis: Seconds (0 to 700).]


Performance : Temporary Tables Benchmarks

[Chart: Insert 5M rows. X axis: Versions (5.6, 5.7). Y axis: Seconds (0 to 700).]


Performance : Temporary Tables Benchmarks

[Chart: Delete 5M rows. X axis: Version (5.6, 5.7). Y axis: Seconds (0 to 2500).]


Performance : Temporary Tables Benchmarks

[Chart: Update 5M rows. X axis: Version (5.6, 5.7). Y axis: Seconds (0 to 2500).]


Performance : Buffer pool improvements

Use atomics for page reference counting

Bug#68079 - INNODB DOES NOT SCALE WELL ON 12 CORE

Flush list traversal

Fix flush and LRU list rescanning

Previously after flushing a page we would start from the tail again

Extra work and increased mutex contention
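The cost of restarting from the tail after every flush can be seen in a toy model that simply counts list node visits. It is a deliberately simplified sketch: the list is a Python list ordered from tail to head, `True` marks a page that can be flushed right now, and mutexes and concurrent modification are ignored. Both function names are hypothetical.

```python
def visits_with_restart(pages):
    """Count node visits when every flush restarts the scan from the tail."""
    remaining = list(pages)
    visits = 0
    while True:
        for i, flushable in enumerate(remaining):
            visits += 1
            if flushable:
                del remaining[i]  # the flushed page leaves the list
                break             # old behaviour: start again from the tail
        else:
            return visits         # a full scan found nothing left to flush


def visits_single_pass(pages):
    """Count node visits when the scan continues from where it stopped."""
    return len(pages)


if __name__ == "__main__":
    pages = [False, False, True, True, True]
    print(visits_with_restart(pages), visits_single_pass(pages))
```

In this model the pages that cannot be flushed are revisited on every restart, which corresponds to the "extra work" named on the slide.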


Performance : Buffer pool improvements

Multithreaded flushing

5.6 introduced a separate thread for flushing

5.7 allows multiple threads

--innodb-page-cleaners := 1..64 – default is 1


Performance : Redo log

Better concurrency and faster recovery

Remove unused code

Fix read-on-write issue – pad the log buffer before writing to disk

Refactor and clean up the mini-transaction code

Optimize mutex acquire/release during log checkpoint

Write data file names to the redo log

No need to open clean *.ibd files on crash recovery


Performance : Memcached Plugin

Leverages the read-only transaction optimizations

Fixed several bottlenecks in Memcached and the plugin code

1.1 Million GET/s

Limiting factors were:
• The network
• Memcached client


Performance : Memcache Benchmarks


Performance : index->lock

Increased concurrency, improved performance

Very complex fix

Previously the entire index was X-latched for tree structure modifications

B-tree internal nodes were not latched before the fix

New SX lock mode – compatible with S lock mode

Increases concurrency: e.g., while index->lock is held in SX mode, reads can proceed
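The compatibility of the three lock modes can be written down as a small table. The slide only states that SX is compatible with S; the remaining entries (SX conflicts with SX and X, X conflicts with everything) reflect the usual definition of such a mode and should be read as an assumption here. The names `COMPATIBLE` and `can_grant` are hypothetical.

```python
# (held mode, requested mode) -> may both be held at the same time?
COMPATIBLE = {
    ("S", "S"): True,   ("S", "SX"): True,   ("S", "X"): False,
    ("SX", "S"): True,  ("SX", "SX"): False, ("SX", "X"): False,
    ("X", "S"): False,  ("X", "SX"): False,  ("X", "X"): False,
}


def can_grant(requested, held_modes):
    """True if `requested` is compatible with every mode currently held."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)


if __name__ == "__main__":
    # A structure modification holding SX does not block readers taking S.
    print(can_grant("S", ["SX"]))
    # A second structure modification has to wait.
    print(can_grant("SX", ["SX"]))
```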


Performance : DDL & Truncate

Truncate table is now atomic

Previously DROP + CREATE

ID mismatch or missing .ibd file if the server crashed after DROP but before CREATE

More schema-only ALTER TABLE operations supported

Rename index

VARCHAR extension

Faster ALTER TABLE

Bug#17657223 EXCESSIVE TEMPORARY FILE USAGE IN ALTER TABLE

WL#7277 Bulk Load for Index Creation


Features : Partitions

Native Partitioning

Reduced memory overhead

Allows us to easily add
• Foreign key support
• Full text index support

Makes it easier to plan for a parallel query infrastructure


Features : Partitions

Native Partitioning memory overhead improvement – labs

Example table with 8K partitions

CREATE TABLE `t1` (
  `a` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `b` varchar(1024) DEFAULT NULL,
  PRIMARY KEY (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
PARTITION BY HASH (a) PARTITIONS 8192;

Memory overhead comparison

One open instance uses 49 % less memory (111 MB vs 218 MB)

Ten open instances take 90 % less memory (113 MB vs 1166 MB)

More work to be done – stay tuned!
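The percentages in the memory overhead comparison follow directly from the quoted sizes, and the same numbers also give the approximate cost of each additional open instance. The short check below uses only the figures from the slide; the helper names are hypothetical, and the assumption that the extra cost is spread evenly over the nine additional instances is mine.

```python
def reduction_percent(before_mb, after_mb):
    """Relative saving, in percent, when going from before_mb to after_mb."""
    return 100.0 * (before_mb - after_mb) / before_mb


def extra_per_instance(one_mb, many_mb, instances):
    """Average extra memory per additional open instance, in MB."""
    return (many_mb - one_mb) / (instances - 1)


if __name__ == "__main__":
    print(round(reduction_percent(218, 111)))   # one open instance
    print(round(reduction_percent(1166, 113)))  # ten open instances
    print(extra_per_instance(218, 1166, 10))    # old code, MB per extra instance
    print(extra_per_instance(111, 113, 10))     # native partitioning
```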


Features : Partitions

Import/Export support

Importing a single partition

# If the table doesn't already exist, create it

mysql> CREATE TABLE partitioned_table <same as the source>;

# Discard the tablespaces for the partitions to be restored

mysql> ALTER TABLE partitioned_table DISCARD PARTITION p1,p4 TABLESPACE;

# Copy the tablespace files

$ cp /path/to/backup/db-name/partitioned_table#P#p{1,4}.{ibd,cfg} /path/to/mysql-datadir/db-name/

# Import the tablespaces

mysql> ALTER TABLE partitioned_table IMPORT PARTITION p1,p4 TABLESPACE;
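The shell brace pattern in the copy step expands to one `.ibd` and one `.cfg` file per partition. The sketch below spells out that expansion, following the `table#P#partition.extension` naming shown in the `cp` command above; the helper name is hypothetical, and it only builds names without touching any files.

```python
def partition_files(table, partitions, extensions=("ibd", "cfg")):
    """File names to copy for the given partitions of one table."""
    return [
        "%s#P#%s.%s" % (table, partition, ext)
        for partition in partitions
        for ext in extensions
    ]


if __name__ == "__main__":
    for name in partition_files("partitioned_table", ["p1", "p4"]):
        print(name)
```

Listing the names explicitly can be useful for checking that every expected file is present in the backup before running the IMPORT statement.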


Features : Partitions

DML

Index condition pushdown

Limited HANDLER support for partitions

CREATE TABLE t (a int, b int, KEY (a, b)) PARTITION BY HASH (b) PARTITIONS 2;

HANDLER t OPEN;

HANDLER t READ a = (1, 2);

HANDLER t READ a NEXT;

HANDLER t CLOSE;


Features : Tablespace management – labs

Tablespaces

SQL syntax for explicit tablespace management

Replaces legacy --innodb-file-per-table usage

• CREATE TABLESPACE Logs ADD DATAFILE 'log01.ibd';
• CREATE TABLE http_req(c1 varchar) TABLESPACE=Logs;
• ALTER TABLE some_table TABLESPACE=Logs;
• DROP TABLESPACE Logs; -- must be empty



Features : Buffer Pool

Dynamic buffer pool resize

Done in a separate thread

--innodb_buffer_pool_chunk_size – the resize is done in units of the chunk size

Example:

SET GLOBAL innodb_buffer_pool_size=402653184;
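Because the resize happens in whole chunks, a requested size that is not a multiple of the chunk size has to be adjusted. The sketch below rounds the request up to the next multiple of chunk size times the number of buffer pool instances. The 128MB default chunk size, the role of the instance count, and the rounding direction are assumptions based on my understanding of MySQL 5.7 and are not stated on the slide; the function name is hypothetical.

```python
MB = 1024 * 1024


def adjusted_buffer_pool_size(requested, chunk_size=128 * MB, instances=1):
    """Round a requested size up to a multiple of chunk_size * instances."""
    unit = chunk_size * instances
    chunks = -(-requested // unit)  # ceiling division
    return max(chunks, 1) * unit


if __name__ == "__main__":
    # The value from the example is exactly three chunks of 128MB.
    print(adjusted_buffer_pool_size(402653184) // MB, "MB")
```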


Features : UNDO

UNDO Log Space Management

Requires separate UNDO tablespaces to work

• --innodb-undo-log-truncate=on (default off)
• --innodb-max-undo-log-size – default 1G
• --innodb-purge-rseg-truncate-frequency – default 128 (advanced)


Features : Larger Page Sizes - labs

32K and 64K Page Sizes

Longer records can be stored “in-page”

Better compression with the new transparent page compression


Features : Memcached Multiple Get - labs

Memcached

Multiple get

Gets around Memcached client protocol bottlenecks
• Shorter query string
• Fetch multiple keys in range
• Extension to “traditional” Memcached


Features : GIS

Spatial index

Implemented as an R-Tree

Supports all MySQL geometric types

Currently only 2D supported

Supports transactions & MVCC

Uses predicate locking to avoid phantom reads


Features : GIS

R-Tree

Multi-dimension spatial data search

Queries more like:
• Find object “within”, “intersects” or “touches” another object

MySQL geometric types
• POINT, LINESTRING, POLYGON, MULTIPOINT, MULTILINESTRING, MULTIPOLYGON, GEOMETRY
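An R-tree answers queries such as “within” and “intersects” by first comparing minimum bounding rectangles (MBRs). The sketch below shows those two rectangle tests in 2D. It is a generic illustration of the idea, not InnoDB code; the `MBR` tuple and both function names are hypothetical, and real geometries would need an exact check after this rectangle filter.

```python
from collections import namedtuple

# Axis-aligned minimum bounding rectangle in two dimensions.
MBR = namedtuple("MBR", "xmin ymin xmax ymax")


def intersects(a, b):
    """True if the two rectangles share at least one point."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax and
            a.ymin <= b.ymax and b.ymin <= a.ymax)


def within(a, b):
    """True if rectangle a lies entirely inside rectangle b."""
    return (b.xmin <= a.xmin and a.xmax <= b.xmax and
            b.ymin <= a.ymin and a.ymax <= b.ymax)


if __name__ == "__main__":
    print(intersects(MBR(0, 0, 2, 2), MBR(1, 1, 3, 3)))
    print(within(MBR(1, 1, 2, 2), MBR(0, 0, 3, 3)))
```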


Features : Mutexes

Flexible mutexes

Mix and match mutex types in the code – build time option only

Can use futex on Linux instead of condition variables

Futex eliminates “thundering herd” problem

Not enabled by default, build with -DMUTEX_TYPE="futex" from source


Features : Character Sets

Adds the MySQL character set gb18030 (Chinese encoding for Unicode)

Supports the China National Standard GB 18030 character set.

The new associated collations are gb18030_bin and gb18030_chinese_ci.


Features : Full Text Search

Support for external parser

For tokenizing the document and the query

Example:

CREATE TABLE t1 (
  id INT AUTO_INCREMENT PRIMARY KEY,
  doc CHAR(255),
  FULLTEXT INDEX (doc) WITH PARSER my_parser
) ENGINE=InnoDB;

ALTER TABLE articles ADD FULLTEXT INDEX (body) WITH PARSER my_parser;

CREATE FULLTEXT INDEX ft_index ON articles(body) WITH PARSER my_parser;


Features : Sandisk/FusionIO Atomic Writes

No new configuration variables – may change in GA

System wide setting

Disables the doublewrite buffer if the system tablespace is on NVMFS

More to come


Features : Transparent PageIO Compression

Proof of concept patch from FusionIO – currently Linux only

Requires sparse file support : NVMFS, XFS, EXT4, ZFS & NTFS

Linux 2.6.39+ added PUNCH HOLE support

Can co-exist with current ROW_FORMAT=COMPRESSED tables

Works on all table types, including the system tablespace
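The space saving from hole punching can be estimated with a short sketch: compress a page, round the result up to whole file system blocks, and treat the rest of the page as the hole. The 16K page size, the 4K block size, and the fallback to an uncompressed write when compression does not help are assumptions for illustration; any header the real on-disk format may add is ignored, and the function names are hypothetical.

```python
import zlib

PAGE_SIZE = 16 * 1024  # assumed InnoDB page size
FS_BLOCK = 4 * 1024    # assumed file system block size


def on_disk_bytes(page, level=6):
    """Bytes still occupied on disk after compressing and punching a hole."""
    compressed = zlib.compress(page, level)
    if len(compressed) >= len(page):
        return len(page)  # incompressible: written as is
    blocks = -(-len(compressed) // FS_BLOCK)  # ceiling division
    return blocks * FS_BLOCK


def punched_bytes(page):
    """Bytes of the page that could be released back to the file system."""
    return len(page) - on_disk_bytes(page)


if __name__ == "__main__":
    empty_page = bytes(PAGE_SIZE)  # all zeros, compresses extremely well
    print(on_disk_bytes(empty_page), punched_bytes(empty_page))
```

Since only whole blocks can be released, a page saves nothing unless its compressed form is at least one file system block smaller than the page itself.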


Features : Transparent PageIO Compression


Features : Zip vs Page IO compression

Zip compression

Tested and tried, works well enough

Complicates buffer pool code

Special page format required

No IO layer changes

Algorithm supported - Zlib

Can't compress system tablespace

Can't compress UNDO tablespace

PageIO compression

Requires OS/FS support

Simple

Works with all file types, system tablespaces

Potential fragmentation issues

NVMFS doesn't suffer from fragmentation

Adds to the cost of IO

Current algorithms are tuned to existing assumptions

Requires multi-threaded flushing

Easy to add new algorithms – Zlib, LZ4 etc.


Features : PageIO Compression Benchmark

FusionIO – 25G BP – maxid 50 Million 64 Requesters - Linkbench

[Chart: Size & Operations per/sec. Categories: Normal, PageIO compression, Current compression. Series: Ops/sec, Size. Y axis: Size and operations per sec (0 to 70000).]


Features : Transparent PageIO Compression

Configuration options

--innodb-compression-algorithm := 0,1,2

Where: 0 – None, 1 – Zlib and 2 – LZ4

--innodb-compression-level

--innodb-compression-punch-hole := boolean

--innodb-read-async := boolean

--innodb-read-block-size := boolean

Page IO Compression


Features : Miscellaneous

Implement update_time for InnoDB tables

Improve select count(*) performance by using handler::records();

Improve recovery, redo log tablespace meta data changes


Download & Blogs

http://labs.mysql.com

http://dev.mysql.com/downloads/mysql/

http://mysqlserverteam.com/


Thank You!
