CompressioninOpenSource Databases* - Percona...Abit*of*History**...

51
Compression in Open Source Databases Peter Zaitsev CEO, Percona Percona Technical Webinars January 27 th , 2016

Transcript of CompressioninOpenSource Databases* - Percona...Abit*of*History**...

Page 1: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Compression  in  Open  Source  Databases  

Peter  Zaitsev  CEO,  Percona  Percona  Technical  Webinars  January  27th,  2016    

Page 2: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

About  the  Talk  

A  bit  of  the  History  

Approaches  to  Data  Compression  

What  some  of  the  popular  systems  implement    

2  

Page 3: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Lets  Define  The  Term  

Compression  -­‐    Any  Technique  to  make  data  size  smaller    

3  

Page 4: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

A  bit  of  History    

Early  Computers  were  too  slow  to  compress  data  in  SoMware  

Hardware  Compression  (ie  Tape)  

Compression  first  appears  for  non  performance  criQcal  data  

4  

Page 5: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

We  did  not  need  it  much  for  space…  5  

Page 6: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Welcome  to  the  modern  age  

Data  Growth  outpaces  HDD  improvements  

Powerful  CPUs   Flash  

Cloud   Data  we  store  now    

6  

Page 7: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

ExponenCal  Data  Size  Growth  7  

Page 8: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Powerful  CPUs  

High  Performance    

MulQple  Cores  

8  

Page 9: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Compression  and  Decompression  Performance  

•  From  hZps://github.com/inikep/lzbench  

9  

Compressor   Compress   Decompress   RaCo  

memcpy   8368  MB/sec   8406  MB/sec   100%  

Brotli  level  2   66  MB/sec   207  MB/sec   45%  

Lz4   487  MB/sec   2452  MB/sec   62%  

Lz4fast  level  17   964  MB/sec   3112  MB/sec   74%  

Snappy   326  MB/sec   1147  MB/s   62%  

Lzma  Level  2   10  MB/s   37  MB/s   39%  

Zlib  level  1   39  MB/s   201  MB/s   49%  

Zstd  level  1   249  MB/s   537  MB/s   49%  

Page 10: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Flash  (Solid  State)  

Disk  space  is  more  costly  than  for  HDDs  

Write  Endurance  is  expensive  

Want  to  write  less  data  

Decent  at  handling  fragmentaQon  

10  

Page 11: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Cloud  

Pay  for  Space   Pay  for  IOPS  

More  limited  Storage  

Performance  

Network  Performance  may  be  limited  

11  

Page 12: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Data  we  store  in  Databases  

• Text  • JSON    • XML  • Time  Series  Data  • Log  Files  

Modern  Data  

Compresses  Well!  

12  

Page 13: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

COMPRESSION  BASICS  IntroducQon  into  a  ways  of  making  your  data  smaller  

13  

Page 14: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Lossy  and  Lossless  

Database  generally  use  Lossless  Compression  

Lossy  compression  done  on  the  applicaQon  level  

14  

Page 15: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Some  ways  of  geQng  data  smaller  

Layout  OpQmizaQons   “Encoding”  

DicQonary  Compression  

Block  Compression  

15  

Page 16: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Layout  OpCmizaCons  

Column  Store  versus  Row  Store  

Hybrid  Formats  

Variable  Block  Sizes  

16  

Page 17: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Encoding  

Depends  on  Data  Type  and  Domain  

Delta  Encoding,  Run  Length  Encoding  (RLE)  

Can  be  faster  than  read  of  uncompressed  data  

UTF8  (strings)  and    VLQ  (Integers)  

Index  Prefix  Compression    

17  

Page 18: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

DicConary  Compression  

Replacing  frequent  values  with  DicQonary  

Pointers  

Kind  of  like  STL  String  

ENUM  type  in  MySQL    

18  

Page 19: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Block  Compression  

Compress  “block”  of  data  so  it  is  smaller  for  storage  

Finding  PaZerns  in  Data  and  Efficiently  encoding  them  

Many  Algorithms  Exist:  Snappy,  Zlib,  LZ4,  LZMA  

19  

Page 20: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Block  Compression  Details  

Compression  rate  highly  depends  on  data  

Compression  rate  depends  on  block  size  

Speed  depends  on  block  size  and  data  

20  

Page 21: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Block  Size  Dependence  (by  Leif  Walsh)  21  

Page 22: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

There  is  no  one  size  fits  all  

Typically  Compression  Algorithm  can  be  selected  

OMen  with  addiQonal  seongs  

22  

Page 23: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

WHERE  AND  HOW  Where  do  we  compress  data  and  how  do  we  do  that  

23  

Page 24: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Where  to  Compress  Data    

In  Memory  ?  

In  the  Database  Data  Store  ?  

As  Part  of  File  System  ?  

Storage  Hardware  ?  

ApplicaQon  ?  

24  

Page 25: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Compression  in  Memory  

Reduce  amount  of  memory  needed  for  same  working  

set  

Reduce  IO  for  Fixed  amount  of  

Memory  

Typically  in-­‐Memory  

Performance  Hit  

Encoding/DicQonary  

Compression  are  good  fit  

25  

Page 26: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Database  Data  Store  

Reduce  Database  Size  on  Disk  

Works  with  all  file  systems  and  storage  

With  OS  cache  can  be  used  as  In-­‐

Memory  compression  variant  

Dealing  with  fragmentaQon  is  common  issue  

26  

Page 27: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Compression  on  File  System  Level  

Works  with  all  Databases/

Storage  Engines  

Performance  Impact  can  be  significant  

Logical  Space  on  disk  is  not  reduced  

ZFS  

27  

Page 28: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Compression  on  Storage  Hardware  

Hardware  Dependent  

Does  not  reduce  space  on  disk  

Can  result  in  Performance  

Gains  rather  than  free  space  (SSD)  

Can  become  a  choke  point  

28  

Page 29: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

By  ApplicaCon  

No  Database  Support  needed  

Reduce  Database  Load  and  Network  Traffic  

ApplicaQon  may  know  more  about  data  

More  Complexity  

Give  up  many  DBMS  features  (search,  index)  

29  

Page 30: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

DESIGN  CONSIDERATIONS  What  makes  database  system  to  do  well  with  compression  

30  

Page 31: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

The  Goal  

Minimize  NegaQve  Impact  for  User  OperaQons  (Reads  and  Writes)  

31  

Page 32: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Design  Principles  

Fast  Decompression  

Compression  in  Background  

Parallel  Compression/Decompression  

Reduce  need  of  Re-­‐Compression  

on  Update  

32  

Page 33: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Choosing  Block  Size  

Large  Blocks  

• Most  efficient  for  compression    

• Bulky  Read  Writes  

Small  Blocks  

• Fastest  to  Decompress  

• Best  for  point  lookups  

33  

Page 34: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

IMPLEMENTATION  EXAMPLES  What  Database  systems  Really  do  with  Compression  

34  

Page 35: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

MySQL  “Packed”  MyISAM  

Compress  table  “offline”  with  myisampack  

Table  Becomes  Read  Only  

Variety  of  compression  methods  are  used  

Only  data  is  compressed,  not  indexes  

Note  MyISAM  support  index  prefix  compression  for  all  indexes  

35  

Page 36: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

MySQL  Archive  Storage  Engine  

Does  not  support  indexes    

EssenQally  file  of  rows  with  sequenQal  access  

Uses  zlib  compression    

36  

Page 37: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Innodb  Table  Compression  

Available  Since  MySQL  5.1  

Pages  compressed  using  zlib  

Compressed  page  target  (1K,  4K,  8K)  has  to  be  set    

Both  Compressed  and  Uncompressed  pages  can  be  cached  in  Buffer  Pool  

Per  Page  “log”  of  changes  to  avoid  recompression  

Extenrally  Stored  BLOBs  are  compressed  as  single  enQty  

37  

Page 38: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Innodb  Transparent  Page  Compression  

Available  in  MySQL  5.7  

Zib  and  LZ4  Compression  

Compresses  pages  as  they  are  wriZen  to  disk  

Free  space  on  the  page  is  given  back  using  “hole  punching”  

Originally  designed  to  work  with  FusionIO  NVMFS  

Can  cause  problems  for  current  filesystem  due  to  very  high  hole  number  

38  

Page 39: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Disk  usage  (Linkbench  data  set  by  Sunny  Bains)  39  

Page 40: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Performance  on  Fast  SSD  (FusionIO  NVMFS)  40  

Page 41: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Results  on  Slower  SSD  (Intel  730*2,  EXT4)  41  

Page 42: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Fractal  Trees  Compression  

Available  as  Storage  Engine  for  MySQL  and  MongoDB  

Can  use  many  compression  libraries    

Tunable  Compression  Block  Size  

Reduce  Re-­‐Compression  due  to  message  buffering  

42  

Page 43: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Can  get  a  lot  of  compression  43  

Page 44: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

MongoDB  WiredTiger  Storage  Engine  

Engine  Has  many  compression  seongs    

Indexes  are  using  Index  Prefix  Compression  

Data  Pages  can  be  compressed  using  zlib,lz4  or  Snappy  

44  

Page 45: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Compression  Size  (results  by  Asya  Kamsky)  45  

Page 46: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Compression  in  RocksDB  

RocksDB  –  LSM  Based  Storage  Engine  for  MongoDB  and  MySQL  

LSM  works  very  well  with  compression  

Supports,  zlib,  lz4,  bzip2  compression  

Can  use  different  compression  methods  for  different  Levels  in  LSM  

46  

Page 47: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Compression  results  from  Mike  Kania  47  

Page 48: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

PostgreSQL  

Uses  compression  by  default  with  TOAST  

2KB  (default)  or  longer  Strings,  BlOBs  

Unlike  Innodb  External  Storage  is  not  required  for  Compression  

Recommended  to  use  File  system  compression  ie  ZFS  if  Compression  is  Desired  

48  

Page 49: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

Summary  

Compression  is  Important  in  Modern  Age  

Consider  it  for  your  system  

Many  different  techniques  are  used  to  make  data  smaller  by  databases  

Compression  support  is  rapidly  changing  and  improving  

49  

Page 50: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

www.percona.com      

Percona  Live  Data  Performance  Conference  

• April  18-­‐21  in  Santa  Clara,  CA  at  the  Santa  Clara  ConvenQon  Center  

• Register  with  code  “WebinarPL”  to  receive  15%  off  at  registraQon  

• MySQL,  NoSQL,  Data  in  the  Cloud  

 

www.perconalive.com  

Page 51: CompressioninOpenSource Databases* - Percona...Abit*of*History** Early%Computers%were%too%slow%to%compress% datain%SoMware% Hardware%Compression%(ie%Tape)% Compression%firstappears%for%non%

51  

Thank You! Peter  Zaitsev  

[email protected]  hZps://www.linkedin.com/in/peterzaitsev