InnoDB Magic

33
g414-inno Embedded InnoDB,Voldemort, St8, and a Few Other Tidbits Sunny Gleason

description

 

Transcript of InnoDB Magic

Page 1: InnoDB Magic

g414-innoEmbedded InnoDB, Voldemort, St8,

and a Few Other Tidbits

Sunny Gleason

Page 2: InnoDB Magic

What’s in this Preso

• What is InnoDB?

• Relation to MySQL & Other Products

• InnoDB Model

• g414-inno: a Java Access Library for InnoDB

Page 3: InnoDB Magic

What else is in this Preso

• Creating a Voldemort Storage Engine with Embedded InnoDB

• St8: A REST-based Storage Server

• Faban Benchmark Results

Page 4: InnoDB Magic

What is InnoDB?

• High-Performance “guts” of MySQL

• Finely Tuned B-Tree Storage Engine

• MVCC Transactional Store a la Jim Gray (“Transactional Processing Systems”)

• Available Stand-Alone as Embedded InnoDB (stagnant) or HailDB (drizzle)

Page 5: InnoDB Magic

Relation to MySQL

• One of many MySQL storage engines

• Transactional, in contrast to MYISAM

• Well-known, Bullet-Proof Backup, Failure & Recovery Modes

• Advanced Buffer Pool Management (adaptive hash index, tunable LRU)

• Online Backup Support (Xtrabackup / Hot)

Page 6: InnoDB Magic

Other Products

• Tokyo BDB, Oracle BDB & BDB-JE

• Schema-Free (No Structure / Data Types)

• Lower Concurrency (fewer writers)

• Performance Degradation in Larger DBs

• (TODO: quantify performance gap - in meantime, see Dynamo & Voldemort)

Page 7: InnoDB Magic

InnoDB Model (Logical)• Database == Tablespace

• Tablespace has Table(s) and Log(s)

• Table has columns (rich datatypes)

• Tables have a PRIMARY clustered index

• Tables may have SECONDARY indexes

• Row == Tuple

• Tuples are stored / clustered by index sort

• Secondary index stores full Primary Key

Page 8: InnoDB Magic

InnoDB Model (Txns)• Everything uses a Transaction

• Isolation Level: Serialized, Read Committed, Read Uncommitted

• Locks: Shared (Read-only), Exclusive (Read/Write)

• Cursors provide access to tables: Lookup by index, Iteration / Traversal

• Secondary index contains partial Tuples

• Secondary cursor can access primary (full tuple)

Page 9: InnoDB Magic

InnoDB Model (Physical)

• Tablespace is a collection of pages (16K)

• Pages organized as a B-Tree: infimum & supremum keys, pointers to children

• Pages contain row or index tuple data, or blob overflow data

• Pages written to log first and flushed to tablespace based on ‘sync’ policy

Page 10: InnoDB Magic

Physical Considerations

• New pages requested from OS in extend_size increments

• OS Assigns space from file system / partition “free list”

• Temporal Locality (pages close together)

• Spatial Locality / Fragmentation from Updates

• Prefer “narrow” rows / indexes: faster scan, keeps working set in-memory

• Secondary “covering” indexes can save primary index access

Page 11: InnoDB Magic

!""#$%&$'(')'*+&,-.+*

!"#$%$ &!'()

!"#$%&'$()*%#+(,%

!"#$%"&'(&#&

(!)#!*"&%+

-"!./'0($('012%,$32"

*++,-.%$"'()

/0 !""*(,-.!'$-/$%-#&,'$

0!,( .!'$1

0.%2 .!'$1

3"(*'*41

!"1$%#5,3..$%5

*source: http://www.mysqlconf.com/mysql2009/public/schedule/detail/7052

Page 12: InnoDB Magic

!""#$%&'()*+,-(.+,

!"#$%#

&$'($%#

!"#$%#

!"#$%# !"#$%#

)%*$"#$%#*+*,-*.)'$/

!"#$%#

01"*23

456

72$83*9

4588*.52%#$1

72$83*.52%#$1/

72$83*: 72$83*%

456

;)'$

456

456

456 456

<$)=*%53$*/$'($%#

0)>8$/.)?$

4588>)?@*/$'($%#

A5%B8$)=*%53$*/$'($%#

456456

*source: http://www.mysqlconf.com/mysql2009/public/schedule/detail/7052

Page 13: InnoDB Magic

!""#$%&'#()

!"#$%&'()*+, **

#+,-./#(012,

-./0-12$3"456-./0-12$3"456

!"#$%&'(&%'''')%*'+,''''!$--'./%''''0-&'./%1'''$2"%3-$45.67"'./%'88'09"-&'26-:"1

#+,-./#(012,

* *7890/:-2$3"4567890/:-2$3"456

!"#$%&'(

*source: http://www.mysqlconf.com/mysql2009/public/schedule/detail/7052

Page 14: InnoDB Magic

!""#$%&!"'()(*&+ ,-./0-1

!$020&-#3*&0-(&*2#-('&."&24(&%+2-((&5(06&"#'(*&#6&0&758*2(-('&."'()!%+2-((&.*&#-90".:('&;1&<-./0-1&=(1&#-&"#"+"855&8".>8(&=(1&#6&20;5(?&.6&'(6."('@&(5*(?&0"&."2(-"05&7#58/"&3.24&A+;12(&BCDE!$&.*&0''('F

!"#$%&'&()*'+,-'./0&.1

+2(&3

!!

"#$%&#'()*+,-

*source: http://www.mysqlconf.com/mysql2009/public/schedule/detail/7052

Page 15: InnoDB Magic

!""#$%&!"'()(*&+ ,(-#"'./0

! ,(-#"'./0&1"'()&%+2/((&3(.4&"#'(*&-#"2.1"5&4#/&(.-6&7(0&8.39(5&26(&:/1;./0&7(0*&#4&26(&-#//(*:#"'1"<&/#=*5&9*('&2#&.--(**&-39*2(/1"<&1"'()&2#&#>2.1"&26(&'.2.

!"#$%&'&()*'+,-'./0&.1

+2(&3

!"#$%&'&()*'+,-'./0&.1

+2(&3

4&!52(-'./+2(&3

67%'&&/"&-8/25(&$9/!52%-+2+2:/(-%-

67%'&&/"&-8/25(&$9/!52%-+2+2: ;<$

4&!52(-'./+2(&3

67%'&&/"&-8/25(&$9/!52%-+2+2: ;<$

,(-#"'./0 !"#$%

*source: http://www.mysqlconf.com/mysql2009/public/schedule/detail/7052

Page 16: InnoDB Magic

How can we use InnoDB?

• Download Embedded InnoDB or HailDB

• Use C-API for access to InnoDB tables

• Innostore: Erlang library for InnoDB access (from Basho’s Riak NoSQL project)

• g414-inno: Open-Source Java access library for Embedded InnoDB

Page 17: InnoDB Magic

g414-inno Foundations

• Uses JNA (Java Native Access): Like JNI, but doesn’t provoke (as much) insanity

• JNAerate: creates thin Java Class wrapper from a C-based header file (innodb.h)

• But, complex C API’s are super ugly in Java

• Need to clean that up a bit...

Page 18: InnoDB Magic

g414-inno Library

• Provides a more Object-Oriented API to mask all of the JNA “Pointer” madness

• Transaction Objects, Cursors, Table Builder, Tuple Builder, Datatype Validation

• Java Enum Types for ‘int’ enums in C API

• inTransaction() templates (like Spring, JDBI)

• Contains sanity checks to prevent common errors (mostly C API order of operations)

Page 19: InnoDB Magic

Use Case: Voldemort• Voldemort: High-Performance Key-Value

Store (Amazon Dynamo clone)

• Nokia: good results with Voldemort on MySQL with InnoDB

• Typical features of DB (network connectivity, SQL language) not really necessary

• Thought: why bother with DB layer?The g414-inno project is born ...

Page 20: InnoDB Magic

Voldemort Storage Engines

• Trivial to integrate new persistence mechanisms with Voldemort

• 2 Classes: Config & Storage Engine

• Trivial InnoDB Table:key_ VARBINARY(200) NOT NULLversion_ VARBINARY(200) NOT NULLvalue_ BLOBPRIMARY KEY (key_, version_)

• 3 Operations: put(k, v), get(k), delete(k)

• Complication: k is Versioned<Key>

Page 21: InnoDB Magic

V Storage Engine: put• put(byte[] key, byte[] version, byte[] value)

• Start transaction, open table cursor

• Create search tuple for key

• Cursor.find(key)

• Foreach row matching key if row.version is below, delete row if row.version is above, throw exception

• Cursor.insert(key, version, value)

Page 22: InnoDB Magic

V Storage Engine: get• get(byte[] key, byte[] version)

• Start transaction, open table cursor

• Create search tuple for key

• Cursor.find(key)

• Foreach row matching key add to results

• Return results

Page 23: InnoDB Magic

V Storage Engine: delete

• delete(byte[] key)

• Start transaction, open table cursor

• Create search tuple for key

• Cursor.find(key)

• Foreach row matching key delete row

Page 24: InnoDB Magic

V Storage Engine: TODO

• Perform Benchmarks (in EC2, local)

• Tuning / Optimization

• Clarify licenses (GPLv2 + Apache == ouch)

• Organize & streamline distribution

Page 25: InnoDB Magic

St8

• Simple, Open Source REST-based Storage Server

• Wraps InnoDB with thin “but pleasant” HTTP API

• Custom Tables using JSON table definitions

• Natural, JSON-based access to tables: CRUD, Index-based Query & Iteration

• Under the hood: Jetty, Jersey, Guice, Jackson, g414-inno, Embedded InnoDB

Page 26: InnoDB Magic

St8 Table Def{"columns":[ {"name":"key1","type":"INT","length":4}, {"name":"key2","type":"VARCHAR","length":50}, {"name":"val","type":"BLOB","length":0}],"indexes":[ { "name":"PRIMARY", "clustered":true,"unique":true, "indexColumns":[{"name":"key1"}] }, { "name":"key2", "clustered":false,"unique":false, "indexColumns":[{"name":"key2"}] }]}

Page 27: InnoDB Magic

St8 Interface• Operations for Table Management: create,

describe, delete, truncate

• Operations for Data Management: Create, Retrieve, Update, Delete

• Influences g414-inno design: template methods for inTransaction(), insert, update, insertOrUpdate, delete, load

• Coming Soon: Query & Iteration APIs

Page 28: InnoDB Magic

St8: Sample Requests

SIMPLE GET:

curl "http://localhost:8080/d/atable;key1=123"

INSERT:

curl -X PUT "http://localhost:8080/d/atable;key1=123;key2=ABC;val=AVERYLONGDATA"

UPDATE:

curl -X POST "http://localhost:8080/d/atable;key1=123;key2=CDE;val=NEWDATA"

DELETE:

curl -X DELETE "http://localhost:8080/d/atable;key1=123"

Page 29: InnoDB Magic

g414-inno: Faban Benchmark

• Row: 4-byte Key, 4096-byte value

• Insert Sequential, Random

• Single disk, 3-disk RAID 0, SSD

• TODO: Concurrent Benchmarks, Mixed Read/Write

Page 30: InnoDB Magic

Benchmark Results

0

5

10

15

20

InsertSeq InsertRnd SelectRnd

Embedded InnoDB Latency (ms)

Single Disk (OS X 1) 3-Disk Raid 0 (OS X 1) SSD (OS X 2)

Single-Threaded Benchmarks InsertSeq InsertRnd SelectRndSingle Disk (OS X 1) 9.0 9.3 163-Disk Raid 0 (OS X 1) 0.47 1.4 5.2SSD (OS X 2) 0.51 1.2 0.71

Page 31: InnoDB Magic

Next Steps / Future Work

• Finish St8: Queries & Iteration, Benchmark

• Package / Qualify Voldemort Storage Engine

• Integrate with Xtrabackup (hot backup)

• Integrate with Sqoop (hadoop export)

• Explore more advanced App-Level Replication Support

Page 32: InnoDB Magic

Questions?

• Thank you for listening!