Compression in Open Source Databases
Peter Zaitsev CEO, Percona Percona Technical Webinars January 27th, 2016
About the Talk
A bit of History
Approaches to Data Compression
What some of the popular systems implement
Let's Define the Term
Compression: any technique to make data size smaller
A bit of History
Early Computers were too slow to compress data in Software
Hardware Compression (e.g., Tape)
Compression first appears for non-performance-critical data
We did not need it much for space…
Welcome to the modern age
Data Growth outpaces HDD improvements
Powerful CPUs
Flash
Cloud
Data we store now
Exponential Data Size Growth
Powerful CPUs
High Performance
Multiple Cores
Compression and Decompression Performance
• From https://github.com/inikep/lzbench
Compressor         Compress      Decompress    Ratio
memcpy             8368 MB/s     8406 MB/s     100%
Brotli level 2     66 MB/s       207 MB/s      45%
LZ4                487 MB/s      2452 MB/s     62%
LZ4fast level 17   964 MB/s      3112 MB/s     74%
Snappy             326 MB/s      1147 MB/s     62%
LZMA level 2       10 MB/s       37 MB/s       39%
Zlib level 1       39 MB/s       201 MB/s      49%
Zstd level 1       249 MB/s     537 MB/s      49%
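The numbers above come from lzbench, a C benchmark. A rough sketch of the same kind of measurement can be made with Python's standard zlib module; the payload here is illustrative, and absolute numbers will differ by machine and data:

```python
import time
import zlib

# Repetitive, log-like payload; real ratios depend heavily on the data.
data = b"2016-01-27 12:00:00 INFO request served in 12ms\n" * 20000

def bench(level):
    """Return (compress seconds, decompress seconds, size ratio)."""
    t0 = time.perf_counter()
    packed = zlib.compress(data, level)
    t1 = time.perf_counter()
    zlib.decompress(packed)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, len(packed) / len(data)

for level in (1, 6, 9):
    c, d, r = bench(level)
    print(f"zlib level {level}: compress {c * 1e3:.1f} ms, "
          f"decompress {d * 1e3:.1f} ms, ratio {r:.1%}")
```

As in the table, higher levels trade compression speed for ratio, while decompression speed stays comparatively flat.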
Flash (Solid State)
Disk space is more costly than on HDDs
Write Endurance is expensive
Want to write less data
Decent at handling fragmentation
Cloud
Pay for Space
Pay for IOPS
More limited Storage Performance
Network Performance may be limited
Data we store in Databases
• Text
• JSON
• XML
• Time Series Data
• Log Files
Modern Data Compresses Well!
COMPRESSION BASICS
An introduction to the ways of making your data smaller
Lossy and Lossless
Databases generally use Lossless Compression
Lossy compression is done at the application level
Some ways of getting data smaller
Layout Optimizations
“Encoding”
Dictionary Compression
Block Compression
Layout Optimizations
Column Store versus Row Store
Hybrid Formats
Variable Block Sizes
Encoding
Depends on Data Type and Domain
Delta Encoding, Run Length Encoding (RLE)
Can be faster than read of uncompressed data
UTF8 (strings) and VLQ (Integers)
Index Prefix Compression
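A minimal sketch of two of the encodings named above, delta encoding followed by run-length encoding, applied to evenly spaced time-series timestamps (the sample data is illustrative):

```python
from itertools import groupby

def delta_encode(values):
    """Keep the first value, then store successive differences."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def rle_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

# Timestamps sampled every 10 seconds: the deltas are constant,
# so RLE reduces nine deltas to a single pair.
timestamps = list(range(1453900000, 1453900100, 10))
deltas = delta_encode(timestamps)
print(deltas[:4])               # [1453900000, 10, 10, 10]
print(rle_encode(deltas[1:]))   # [(10, 9)]
```

Because decoding is just additions, reading such data back can indeed be faster than reading the uncompressed form, as the slide notes.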
Dictionary Compression
Replacing frequent values with Dictionary Pointers
Kind of like the STL string
ENUM type in MySQL
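A minimal sketch of dictionary encoding for a low-cardinality column, in the spirit of MySQL's ENUM type (the `dict_encode` helper and sample column are illustrative):

```python
def dict_encode(column):
    """Replace each value with a small integer code into a dictionary."""
    dictionary = {}
    codes = []
    for value in column:
        # setdefault assigns the next free code on first sight of a value.
        codes.append(dictionary.setdefault(value, len(dictionary)))
    return dictionary, codes

# A low-cardinality column: few distinct values repeated many times.
statuses = ["active", "active", "deleted", "active", "pending", "deleted"]
dictionary, codes = dict_encode(statuses)
print(dictionary)   # {'active': 0, 'deleted': 1, 'pending': 2}
print(codes)        # [0, 0, 1, 0, 2, 1]
```

Each row now stores a small integer instead of a repeated string, and the dictionary is stored once.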
Block Compression
Compress a “block” of data so it is smaller for storage
Finding Patterns in Data and efficiently encoding them
Many Algorithms Exist: Snappy, Zlib, LZ4, LZMA
Block Compression Details
Compression rate highly depends on data
Compression rate depends on block size
Speed depends on block size and data
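Both dependencies can be seen with a small sketch using stdlib zlib: compressing the same payloads block by block at different block sizes (the payloads are illustrative; per-block overhead penalizes small blocks, and random data barely shrinks at any size):

```python
import os
import zlib

def ratio(data, block_size):
    """Compress data block by block; return compressed size / original size."""
    compressed = sum(
        len(zlib.compress(data[i:i + block_size]))
        for i in range(0, len(data), block_size)
    )
    return compressed / len(data)

text = b"user=alice action=login status=ok\n" * 8192   # repetitive, log-like
noise = os.urandom(len(text))                          # incompressible

for size in (1024, 16384, 262144):
    print(f"{size:>6} B blocks: text {ratio(text, size):.1%}, "
          f"noise {ratio(noise, size):.1%}")
```

Larger blocks give the compressor more repeated patterns to exploit, so the text ratio improves with block size, while the random payload stays near 100% regardless.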
Block Size Dependence (by Leif Walsh)
There is no one-size-fits-all
Typically, the Compression Algorithm can be selected
Often with additional settings
WHERE AND HOW
Where do we compress data, and how do we do it
Where to Compress Data
In Memory?
In the Database Data Store?
As Part of the File System?
Storage Hardware?
Application?
Compression in Memory
Reduce the amount of memory needed for the same working set
Reduce IO for a fixed amount of Memory
Typically an in-Memory Performance Hit
Encoding/Dictionary Compression are a good fit
Database Data Store
Reduce Database Size on Disk
Works with all file systems and storage
With the OS cache, can be used as an In-Memory compression variant
Dealing with fragmentation is a common issue
Compression on File System Level
Works with all Databases/Storage Engines
Performance Impact can be significant
Logical Space on disk is not reduced
Example: ZFS
Compression on Storage Hardware
Hardware Dependent
Does not reduce space on disk
Can result in Performance Gains rather than free space (SSD)
Can become a choke point
By Application
No Database Support needed
Reduce Database Load and Network Traffic
Application may know more about the data
More Complexity
Give up many DBMS features (search, indexing)
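A sketch of the trade-off, assuming a hypothetical application that compresses a JSON document with stdlib zlib before storing it as an opaque BLOB (the document shape is illustrative):

```python
import json
import zlib

# Hypothetical document an application might store as an opaque BLOB.
doc = {"user": "alice",
       "events": [{"type": "login", "ts": 1453900000 + i} for i in range(100)]}

# Compress before sending to the database: less storage and network
# traffic, but the database can no longer search or index the payload.
raw = json.dumps(doc).encode("utf-8")
blob = zlib.compress(raw)

# The application must decompress on every read.
restored = json.loads(zlib.decompress(blob).decode("utf-8"))
assert restored == doc
print(f"{len(raw)} bytes raw, {len(blob)} bytes compressed")
```

The database only ever sees `blob`, which is exactly why search and indexing are given up.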
DESIGN CONSIDERATIONS
What makes a database system do well with compression
The Goal
Minimize Negative Impact on User Operations (Reads and Writes)
Design Principles
Fast Decompression
Compression in Background
Parallel Compression/Decompression
Reduce the need for Re-Compression on Update
Choosing Block Size
Large Blocks
• Most efficient for compression
• Bulky Reads/Writes
Small Blocks
• Fastest to Decompress
• Best for point lookups
IMPLEMENTATION EXAMPLES
What Database Systems Really Do with Compression
MySQL “Packed” MyISAM
Compress table “offline” with myisampack
Table Becomes Read Only
A variety of compression methods is used
Only data is compressed, not indexes
Note: MyISAM supports index prefix compression for all indexes
MySQL Archive Storage Engine
Does not support indexes
Essentially a file of rows with sequential access
Uses zlib compression
InnoDB Table Compression
Available Since MySQL 5.1
Pages compressed using zlib
Compressed page target (1K, 4K, 8K) has to be set
Both Compressed and Uncompressed pages can be cached in Buffer Pool
Per-Page “log” of changes to avoid recompression
Externally Stored BLOBs are compressed as a single entity
InnoDB Transparent Page Compression
Available in MySQL 5.7
Zlib and LZ4 Compression
Compresses pages as they are written to disk
Free space on the page is given back using “hole punching”
Originally designed to work with FusionIO NVMFS
Can cause problems for current filesystems due to the very high number of holes
Disk usage (Linkbench data set, by Sunny Bains)
Performance on Fast SSD (FusionIO NVMFS)
Results on Slower SSD (Intel 730*2, EXT4)
Fractal Trees Compression
Available as Storage Engine for MySQL and MongoDB
Can use many compression libraries
Tunable Compression Block Size
Reduces Re-Compression due to message buffering
Can get a lot of compression
MongoDB WiredTiger Storage Engine
The Engine has many compression settings
Indexes use Index Prefix Compression
Data Pages can be compressed using zlib, LZ4, or Snappy
Compression Size (results by Asya Kamsky)
Compression in RocksDB
RocksDB – LSM Based Storage Engine for MongoDB and MySQL
LSM works very well with compression
Supports zlib, LZ4, and bzip2 compression
Can use different compression methods for different Levels in the LSM
Compression results from Mike Kania
PostgreSQL
Uses compression by default with TOAST
Applies to Strings and BLOBs of 2KB (default) or longer
Unlike InnoDB, External Storage is not required for Compression
Recommended to use File System compression (e.g., ZFS) if more Compression is Desired
Summary
Compression is Important in the Modern Age
Consider it for your system
Many different techniques are used to make data smaller by databases
Compression support is rapidly changing and improving
www.percona.com
Percona Live Data Performance Conference
• April 18-21 in Santa Clara, CA at the Santa Clara Convention Center
• Register with code “WebinarPL” to receive 15% off at registration
• MySQL, NoSQL, Data in the Cloud
www.perconalive.com