Best storage engine for MySQL
DeepDB for MySQL® Overview (July 2014)
The World We Live In…
• According to IDC, the Database software market has a CAGR of 34.2%
• Wal-Mart generates 1 million new database records every hour
• Chevron generates data at a rate of 2TB/day!
• According to the Data Warehousing Institute 46% of companies plan to replace their existing data warehousing platforms
• Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone.
MySQL Challenges
• Performance degrades as table sizes get larger
– Limitations of the underlying computer science
• Highly indexed schemas negatively impact performance
– More indexes help query performance but hurt transactions
• Poor performance with complex queries
– Many table joins
• Data loading times are slow due to poor concurrency
– Table locking and single-threaded operations
• Backup time and performance impact
– Big databases are slow to back up and affect system performance
Technology Limitations
Most relational databases use traditional B+ Trees, which have architectural limitations that become apparent with large data sets or heavy indexing
Cache Ahead Summary Index Tree
Derived from the classic B+ Tree
Embedded statistics and other meta-data in the nodes improve both tree navigation and indexing
Branch node segments can vary in size based on actual data values
Summary nodes provide a mechanism to navigate extremely large tables by minimizing the number of branches walked
Wider trees with embedded meta-data enhance search and modification operations
CASI Tree Instantiations:
• A CASI Tree exists in both memory and on disk for each table and index
• The structure of the Tree on disk and in memory are different
• The (re)organization of the Tree on disk happens asynchronously from the one in memory, based on adaptive algorithms, to yield improved disk I/O and CPU concurrency
Diagram: CASI Tree structure (a root node fanning out to summary nodes and branch nodes)
CASI Tree Benefits
CONSTANT TIME INDEXING
Lightning fast indexing at extreme scale
SEGMENTED COLUMN STORE
Accelerates analytic operations and data management at scale
STREAMING I/O
Maximizes disk throughput with highly efficient use of IOPS
EXTREME CONCURRENCY
Minimizes locks and wait states to maximize CPU throughput
INTELLIGENT CACHING
Uses adaptive segment sizes and summaries to eliminate many disk reads
BUILT FOR THE CLOUD
Adaptive configuration and continuous optimization eliminate scheduled downtime
CASI Tree Principles:
• Always try to append data to the file (i.e. don't seek; use the current seek position)
• Read data sequentially (i.e. don't seek; use the current seek position for the next sequence of reads)
• Continually re-write and reorder data so that the two principles above are met
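DeepDB itself is closed source, so its file format is not public, but the append/sequential-read principles above can be illustrated with a minimal append-only log sketch in Python. The class and file names here are hypothetical stand-ins, not DeepDB's actual structures:

```python
class AppendOnlyLog:
    """Minimal append-only log: writes always go to the end of the file
    and reads scan forward sequentially, so neither operation ever has
    to seek mid-stream."""

    def __init__(self, path):
        # Mode "ab" pins the write position to the end of the file
        self.path = path
        self.writer = open(path, "ab")

    def append(self, record: bytes) -> int:
        """Append a length-prefixed record; returns its file offset."""
        offset = self.writer.tell()
        self.writer.write(len(record).to_bytes(4, "big") + record)
        self.writer.flush()
        return offset

    def scan(self):
        """Read all records in one sequential pass (no seeks between reads)."""
        with open(self.path, "rb") as f:
            while header := f.read(4):
                yield f.read(int.from_bytes(header, "big"))

log = AppendOnlyLog("records.log")
log.append(b"row-1")
log.append(b"row-2")
print(list(log.scan()))  # on a fresh file: [b'row-1', b'row-2']
```

The third principle (continual background re-writing so appends and sequential reads stay possible) is the part this toy omits; a real engine would compact and reorder these files asynchronously.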
Constant Time Indexing
Minimizes index cost, enabling high-performance heavily indexed tables
• Different data structures on disk and in memory
• All work is performed in constant time, eliminating the need for periodic flushing
Streaming File I/O (no memory-map page size limitations)
In Memory: Enhanced B+ Tree
• Optimized for ‘wide’ nodes with accelerated operations
• Stores index summaries to achieve great scale while maximizing cache effectiveness
• Values are stored independently of the tree
• Tree rebalancing occurs only in memory – no impact on data stored on disk
• No fixed page/block sizes
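As a rough illustration of the "values are stored independently of the tree" point above, the sketch below keeps a sorted key index separate from an append-only value store, so reordering (or "rebalancing") the index never rewrites the values themselves. This is a simplified stand-in using a flat sorted list, not DeepDB's actual node layout:

```python
import bisect

class KeyIndex:
    """Sorted key index holding only (key, value_ref) pairs.
    Values live in a separate append-only store, so reorganizing
    the index moves small entries, never the values."""

    def __init__(self):
        self.keys = []     # kept sorted; stand-in for a wide index node
        self.refs = []     # parallel offsets into the value store
        self.values = []   # stand-in for the independent value store

    def insert(self, key, value):
        ref = len(self.values)           # values are only ever appended
        self.values.append(value)
        i = bisect.bisect_left(self.keys, key)
        self.keys.insert(i, key)         # only the index is reordered
        self.refs.insert(i, ref)

    def get(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[self.refs[i]]
        return None

idx = KeyIndex()
idx.insert(5, "five"); idx.insert(1, "one"); idx.insert(3, "three")
print(idx.keys)    # [1, 3, 5] -- index reordered on each insert
print(idx.values)  # ['five', 'one', 'three'] -- value store untouched
```

Because values never move, the on-disk representation can stay append-only while the in-memory index rebalances freely, which is the separation the slide describes.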
On Disk: Segmented Column Store
• Highly optimized for on-disk read/write access
• Never requires operational/in-place rebalancing
• All previous database states are available
• Efficiently supports variable size keys, values and ‘point reads’
• Utilizes segmented column store technology for indexes and columns
Key Benefits: Increases maximum practical table sizes and improves analytic performance by allowing for more indexing
Segmented Column Store
Structure of the index files for the database
– Provides the functional capabilities of a column store
– Simultaneously read- and write-optimized
– Instantaneous database start up/shut down
– Columns are updated in tandem with value changes
– Consistent performance and latency; optimized in real time
– Columns consist of variable-length segments
– Each segment is a block of ordered keys, references to rows, and meta-data
– Changes to the key space require only delta updates
Optimized for real-time analytics
– Embedded statistical data in each segment
– Allows for heavy indexing to improve query performance
– Enables continuous transactional data feed
Suited for high levels of compression
– Compact representation of keys with summarization
– Flexible segment and delta compression
Key Benefits: Excellent compression facilities and improved query performance. Supports continuous streaming backups with snapshots
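The "embedded statistical data in each segment" idea can be sketched as follows: each segment of ordered keys carries min/max summaries, and a range query consults only the summaries to skip whole segments without reading their contents. A toy illustration under assumed names, not DeepDB's on-disk format:

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    """A block of ordered keys plus embedded statistics (per-segment
    meta-data): min/max let range queries skip whole segments."""
    keys: list
    lo: int = field(init=False)
    hi: int = field(init=False)

    def __post_init__(self):
        self.lo, self.hi = self.keys[0], self.keys[-1]

def range_query(segments, lo, hi):
    """Scan only segments whose [lo, hi] summary overlaps the query."""
    hits, segments_read = [], 0
    for seg in segments:
        if seg.hi < lo or seg.lo > hi:
            continue                      # skipped via summary alone
        segments_read += 1                # only now is the segment read
        hits.extend(k for k in seg.keys if lo <= k <= hi)
    return hits, segments_read

segs = [Segment(list(range(0, 100))),
        Segment(list(range(100, 200))),
        Segment(list(range(200, 300)))]
hits, read = range_query(segs, 150, 160)
print(len(hits), read)  # 11 1 -- two of three segments skipped
```

The same per-segment statistics also explain the compression claim: ordered keys within a known [lo, hi] range compress well with delta encoding.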
Streaming I/O
• Massively optimized, delivering near wire-speed throughput
• Append-only file structures virtually eliminate disk seeks
• Concurrent operations for updates in memory & on disk
• Optimizations for SSD, HDD, and in-memory-only operation
• Minimizes I/O wait states
Diagram: DeepDB data streams (streaming transactional state logging and streaming indexing)
Key Benefits: Achieves near-SSD performance with magnetic HDDs. Extends the life expectancy of SSDs with built-in wear leveling and no write amplification
Extreme Concurrency
Running the Sysbench test on a 32-CPU-core system with 32 attached clients:
• DeepDB utilizes ~100% of available system resources to complete the test
– Load time: 23.96s | Test time: 5.82s | Transaction rate: 15k/sec
• InnoDB strands system resources and takes longer to complete the test
– Load time: 8m 59s | Test time: 54.09s | Transaction rate: 1.4k/sec
Key Benefits: Database operations take full advantage of all allocated system resources, dramatically improving system performance
Intelligent Caching
• Adaptive algorithms manage cache usage
– Dynamically sized data segments
– Point-read capable: no page operations
• In-memory compression
– Maximizes cache effectiveness
– Adaptive operation manages compression vs. performance
• Summary indexing reduces cache 'thrashing'
– Only pull in the data that is relevant
– No need to pull 'pages' into cache
Key Benefits: Improves overall system performance by staying in cache more often than standard MySQL
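To illustrate how summary indexing can reduce cache thrashing, the toy cache below keeps only per-segment (lo, hi) summaries in memory and pulls a full segment from "disk" only when a summary says the key could be there. All names and structures are hypothetical sketches of the technique, not DeepDB's cache:

```python
class SummaryCache:
    """Toy summary-guided cache: point reads consult in-memory
    (lo, hi) summaries and load at most one relevant segment,
    rather than pulling fixed-size pages."""

    def __init__(self, on_disk_segments):
        self.disk = on_disk_segments                  # {seg_id: sorted keys}
        self.summaries = {sid: (keys[0], keys[-1])    # always in memory
                          for sid, keys in on_disk_segments.items()}
        self.cache = {}
        self.disk_reads = 0

    def contains(self, key):
        for sid, (lo, hi) in self.summaries.items():
            if lo <= key <= hi:                       # summary says "maybe"
                if sid not in self.cache:             # pull only this segment
                    self.cache[sid] = self.disk[sid]
                    self.disk_reads += 1
                return key in self.cache[sid]
        return False            # answered from summaries alone, no disk read

disk = {0: list(range(0, 100)), 1: list(range(200, 300))}
cache = SummaryCache(disk)
print(cache.contains(42), cache.disk_reads)   # True 1
print(cache.contains(55), cache.disk_reads)   # True 1  (served from cache)
print(cache.contains(150), cache.disk_reads)  # False 1 (summaries rule it out)
```

The last lookup shows the key property: a miss that falls between segments is answered without touching disk at all, which is what lets summaries keep the working set small.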
Built for the Cloud
• Designed for easy deployments with virtually no configuration required in most cases
• No off-line operations– Continuous defragmentation & optimization– No downtime for scheduled maintenance
• Linear performance and consistent low latency
• Instantaneous startup and shutdown
• No performance degradation due to B+ Tree rebalancing or log flushing
Key Benefits: Rapid deployment with almost no configuration and no off-line maintenance operations. Delivers greatly enhanced performance when using network-based storage
DeepDB for MySQL
A storage engine that breaks through current performance and scaling limitations
– Easy-to-install plugin replacement for the InnoDB storage engine
– Requires no application or schema changes
– Scales up performance of existing systems
– Increases practical data sizes and complexity
– Billions of rows with high index densities
– High-performance index creation/maintenance
– High-performance ACID transactions with consistently low latency
– Reduced query latencies
Deployment stack:
• Runs on: Bare metal | Virtualized | Cloud
• OS: CentOS | RHEL | Ubuntu
• Storage engines: InnoDB | DeepDB
• MySQL | Apache Server
• PHP | Perl | Python | Etc.
• Application examples: WordPress | SugarCRM | Drupal
Benefits The Entire Data Lifecycle
Load - Delimited files - Dump files
Operate - Transactions - Compress
Analyze - Replicate - Query
Protect - Backup - Recover
DeepDB
Provides enhanced scaling and performance across a broad set of use cases
Compatible with all existing MySQL applications and tool chains
Designed to fully leverage today's powerful computing systems
Optimized for deployment in the cloud with adaptive behavior and on-line maintenance
Data Loading
DeepDB reduces data loading times by 20x or more
Whether you are loading delimited files or restoring MySQL dump files, DeepDB can dramatically reduce your load times
DeepDB's data loading advantage can be seen in both dedicated bare-metal and cloud-based deployments
Transactional Performance
Use Cases (all tests performed on MySQL 5.5) – MySQL with DeepDB vs. MySQL with InnoDB (improvement):
• Streaming Data Test (Machine-to-Machine) – iiBench maximum transactions/second with single index: 3.795M/sec vs. 217k/sec (17x)
• Transactional Workload Test (Financial) – Sysbench transaction rate: 15,083/sec vs. 1,381/sec (11x)
• Complex Transactional Test (e-Commerce) – DBT-2 transaction rate using HDD: 205,184/min vs. 15,086/min (13.6x)
• Social Media Transactional Test (Twitter) – iiBench with 250M rows, 7 indexes w/ composite keys:
– Database creation: 15 minutes vs. 24 hours (96x)
– First query from cold start: 50 seconds vs. 5.5 minutes (6.6x)
– Second query from cold start: 1 second vs. 240 seconds (240x)
– Disk storage footprint (uncompressed): 29GB vs. 50GB (42% smaller)
Advantage in the Cloud
Reduces Disk Size Requirements
Cut Your Query Times in Half
DeepDB improves query speed by 1.5 to 2 times as measured by the DBT3 benchmark
Protect Your Data
• DeepDB's architecture eliminates potential data integrity problems, and patent-pending error recovery completes in just seconds
– No updates in place
– No memory map
• Unique data structures support real-time and continuous streaming backups to ensure data is always protected
– Append-only files provide natural incremental backups
DeepDB ensures your data is continually backed up and available
DeepDB Advantages
The Ultimate MySQL Storage Engine
• 50% Smaller Data Footprint: reduces compressed or uncompressed data to less than half the size of InnoDB
• 5x-10x: improvement in ACID transactional throughput
• Plug-in Replacement for InnoDB: install DeepDB without any changes to existing MySQL applications
• HDD=SSD: increases effective HDD throughput to near-SSD levels and extends SSD life up to 10x
• 1B+ Rows: provides high-performance support for very large tables
• Run Queries Twice as Fast: summary indexing techniques enable ultra-low-latency queries
• Real-Time Backups: create streaming backups with snapshotting
• Low Latency Replicas: efficiently scale out analytics and read-heavy workloads
• 20x Faster Data Loading: concurrent operations and I/O optimizations reduce load times
Try DeepDB yourself!
http://deep.is/downloads/
Thank You!