Performance evaluation of fast integer compression techniques over tables
Ikhtear Md. Sharif Bhuyan
Supervisors: Hazel Webb, Daniel Lemire, Owen Kaser
©Ikhtear Md. Sharif Bhuyan
Overview
• Introduction
• Compression in databases and issues
• Objectives
• Experimental Results
• Conclusion
• Future Work
12/4/2013 Performance evaluation of fast integer compression techniques over tables 2
Query processing
[Figure: storage hierarchy involved in query processing: Disk, RAM, Cache, Processor]
Compression in databases
• Reduce storage
• Improve query processing speed
• Save I/O bandwidth
• Improve performance of I/O-bound operations
Selecting compression in databases
• Must be lossless
• Trade-off between compression ratio and the speed of compression and decompression
Objectives
• Examining and comparing the performance of patched schemes against other methods with respect to compression ratio, compression speed and decompression speed.
• Assessing the effect of different factors, such as row order.
Column-oriented database system
ID Name
104543 Peter
203456 Sam
234321 Maria

Row-oriented database (each record is stored together):
104543 Peter
203456 Sam
234321 Maria

Column-oriented database (each column is stored together):
104543 203456 234321
Peter Sam Maria
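To make the two layouts concrete, here is a minimal Python sketch that turns row-oriented records into one list per column. The names `rows` and `to_columns` are hypothetical and only illustrate the idea; they are not taken from the slides.

```python
# Row-oriented layout: each record (ID, Name) is stored together.
rows = [(104543, "Peter"), (203456, "Sam"), (234321, "Maria")]


def to_columns(records):
    """Transpose a list of records into one list per column (column-oriented layout)."""
    return [list(column) for column in zip(*records)]


ids, names = to_columns(rows)
print(ids)    # [104543, 203456, 234321]
print(names)  # ['Peter', 'Sam', 'Maria']
```

In the column layout, the ID column becomes a contiguous array of integers, which is the kind of input the integer compression schemes in this talk operate on.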
Compression Algorithm
• Variable-length output
  o Byte-oriented compression: integers are coded in units of bytes, e.g., Variable-Byte.
  o Block-based compression: these schemes take a fixed number of input integers and output a variable number of bytes, e.g., FOR, NewPFD, FastPFOR.
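A minimal sketch of byte-oriented coding follows. It assumes one common Variable-Byte convention: 7 payload bits per byte, least-significant group first, with the high bit set on the last byte of each integer. The function names are hypothetical, and this is an illustration rather than the JavaFastPFOR implementation.

```python
def vbyte_encode(values):
    """Encode non-negative integers, 7 payload bits per byte, low-order group first."""
    out = bytearray()
    for v in values:
        if v < 0:
            raise ValueError("Variable-Byte expects non-negative integers")
        while v >= 128:
            out.append(v & 0x7F)   # 7 low bits; high bit clear = more bytes follow
            v >>= 7
        out.append(v | 0x80)       # last byte of this integer: high bit set
    return bytes(out)


def vbyte_decode(data):
    """Decode a byte string produced by vbyte_encode."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:            # stop bit: the integer is complete
            values.append(current)
            current, shift = 0, 0
        else:
            shift += 7
    return values


nums = [5, 127, 128, 300, 16384]
encoded = vbyte_encode(nums)
print(len(encoded), vbyte_decode(encoded) == nums)  # 9 True
```

Because every integer occupies at least one whole byte, Variable-Byte can never go below 8 bits per integer, however small the values are.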
Compression Algorithm (Contd …)
• Fixed-length output: each step takes a variable number of integers and produces a compressed form of those integers using a fixed number of bits as a unit, e.g., Simple9.
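The sketch below illustrates fixed-length output with a Simple9-style packer. It assumes the usual layout of a 4-bit selector plus 28 data bits per 32-bit word, with the nine (count, width) configurations listed in `SELECTORS`. It is a simplified greedy illustration in which Python integers stand in for 32-bit words, not a tuned implementation.

```python
# Nine Simple9 configurations: (integers per word, bits per integer); count * bits <= 28.
SELECTORS = [(28, 1), (14, 2), (9, 3), (7, 4), (5, 5), (4, 7), (3, 9), (2, 14), (1, 28)]


def simple9_encode(values):
    """Greedily pack values into 32-bit words: 4-bit selector + 28 data bits."""
    words, i = [], 0
    while i < len(values):
        for selector, (count, bits) in enumerate(SELECTORS):
            chunk = values[i:i + count]
            # Use this selector only if `count` values remain and all fit in `bits` bits.
            if len(chunk) == count and all(0 <= v < (1 << bits) for v in chunk):
                word = selector << 28
                for k, v in enumerate(chunk):
                    word |= v << (k * bits)
                words.append(word)
                i += count
                break
        else:
            raise ValueError(f"value {values[i]} does not fit in 28 bits")
    return words


def simple9_decode(words):
    """Unpack each word according to the selector in its top 4 bits."""
    out = []
    for word in words:
        count, bits = SELECTORS[word >> 28]
        mask = (1 << bits) - 1
        out.extend((word >> (k * bits)) & mask for k in range(count))
    return out


data = [3, 5, 100, 1000]
packed = simple9_encode(data)
print(len(packed), simple9_decode(packed) == data)  # 2 True
```

Each output unit is exactly 32 bits, while the number of integers it holds varies from 1 to 28.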
Binary packing
• Original sequence: 67 78 85 96 98
• The numbers range from 67 to 98, so each is stored as its offset from 67; the largest offset, 31, fits in 5 bits.
• Compressed sequence: 0 11 18 29 31
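The example can be reproduced with a short sketch of frame-of-reference binary packing: subtract the block minimum and store every offset with the same bit width. Python's arbitrary-precision `int` stands in for the packed bit buffer here. This is an illustrative simplification, since real codecs pack into fixed-size machine words. The function names are hypothetical.

```python
def pack_block(values):
    """Store a block as (base, width, count, bits), each offset using `width` bits."""
    base = min(values)
    offsets = [v - base for v in values]          # e.g. 67 78 85 96 98 -> 0 11 18 29 31
    width = max(offsets).bit_length()             # bits needed for the largest offset
    bits = 0
    for k, offset in enumerate(offsets):
        bits |= offset << (k * width)             # place offset k at bit position k*width
    return base, width, len(values), bits


def unpack_block(base, width, count, bits):
    """Recover the original values by extracting each offset and adding base back."""
    mask = (1 << width) - 1
    return [base + ((bits >> (k * width)) & mask) for k in range(count)]


block = [67, 78, 85, 96, 98]
base, width, count, bits = pack_block(block)
print(base, width, count)                          # 67 5 5
print(unpack_block(base, width, count, bits))      # [67, 78, 85, 96, 98]
```

Five offsets at 5 bits each need 25 bits of payload instead of 5 × 32 = 160 bits, plus a small header holding the base and the width.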
Patched Compression
• Original sequence (binary): 11 1 10 … 11 11 11111 10 11
• The exception is 11111.
• Bit width for non-exceptional values b = 2; maximum number of bits: 5; number of exceptions: 1; position of the exception: 125.
• Compressed sequence: 11 1 10 … 11 11 11 10 11
• In the compressed sequence, the exception's slot holds only b = 2 bits (11); the exception is stored separately, together with its position, and patched back in during decompression.
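The patching idea can be sketched as follows. The sketch assumes the variant suggested by the example, where each slot keeps the low b bits of its value and each exception stores its remaining high bits together with its position. The function names are hypothetical and the caller chooses b; schemes such as NewPFD and OptPFD choose b per block according to their own rules.

```python
def patched_encode(values, b):
    """Split each value into its low b bits and, for exceptions, its high bits."""
    mask = (1 << b) - 1
    low_bits = [v & mask for v in values]                 # every slot uses b bits
    # Exceptions: values that need more than b bits; keep (position, high bits).
    exceptions = [(pos, v >> b) for pos, v in enumerate(values) if v >> b]
    return b, low_bits, exceptions


def patched_decode(b, low_bits, exceptions):
    """Rebuild the values by patching the stored high bits back into their slots."""
    values = list(low_bits)
    for pos, high in exceptions:
        values[pos] |= high << b
    return values


seq = [0b11, 0b1, 0b10, 0b11, 0b11, 0b11111, 0b10, 0b11]
b, low_bits, exceptions = patched_encode(seq, 2)
print(low_bits)    # [3, 1, 2, 3, 3, 3, 2, 3]
print(exceptions)  # [(5, 7)]
```

The exception 11111 keeps 11 in its slot, and its high bits 111 (7) are recorded with its position so that decoding restores the original value.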
Synthetic data experiments
• Compression ratio
[Figure: compression ratio on clustered data]
Synthetic data experiments (Contd …)
• Compression ratio
[Figure: compression ratio on uniform data]
Synthetic data experiments (Contd …)
• Decompression speed
[Figure: decompression speed on clustered data]
Synthetic data experiments (Contd …)
[Figure: uniform data]
Real Data Sets
• Census-Income
• Census1881
• Star Schema Benchmark
Column-wise compressed size
[Figure: column-wise compressed size for the frequency-coded Census1881 file; panels: Original, Shuffled]
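The slides do not define the frequency-coded file. Assuming it means that each distinct value in a column is replaced by its rank in descending order of frequency (the most frequent value coded as 0), that preprocessing step could be sketched as below. The function name and the tie-breaking rule are hypothetical.

```python
from collections import Counter


def frequency_code(column):
    """Replace each value by its frequency rank (0 = most frequent).

    Ties are broken by first appearance in the column (an illustrative choice).
    """
    counts = Counter(column)
    first_seen = {}
    for i, value in enumerate(column):
        first_seen.setdefault(value, i)
    ranked = sorted(counts, key=lambda v: (-counts[v], first_seen[v]))
    code = {value: rank for rank, value in enumerate(ranked)}
    return [code[value] for value in column]


print(frequency_code(["b", "a", "b", "c", "b", "a"]))  # [0, 1, 0, 2, 0, 1]
```

Under this coding, frequent values map to small integers, which bit-packing schemes can store in few bits.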
Column-wise compressed size (Contd …)
[Figure: column-wise compressed size for the frequency-coded Census1881 file; panels: sorted by the high-cardinality column (column 1), sorted by the low-cardinality column (column 3)]
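A rough way to see why row order matters is to sort the rows by one column and re-estimate the packed size. The sketch below assumes frame-of-reference binary packing with one bit width per block, as on the Binary packing slide. The block size of 128, the helper names and the toy table are hypothetical, and per-block headers are ignored.

```python
def bits_per_int(column, block=128):
    """Estimate bits/int for per-block frame-of-reference packing (headers ignored)."""
    if not column:
        return 0.0
    total = 0
    for start in range(0, len(column), block):
        chunk = column[start:start + block]
        width = (max(chunk) - min(chunk)).bit_length()   # bits per offset in this block
        total += width * len(chunk)
    return total / len(column)


def sort_rows_by(rows, key_col):
    """Sort rows lexicographically, comparing column key_col first, then the others."""
    order = [key_col] + [c for c in range(len(rows[0])) if c != key_col]
    return sorted(rows, key=lambda row: [row[c] for c in order])


# Hypothetical table: column 0 has 7 distinct values, column 1 has 3.
table = [(i % 7, i % 3) for i in range(1000)]
before = bits_per_int([row[0] for row in table])
after = bits_per_int([row[0] for row in sort_rows_by(table, 0)])
print(before, after)
```

Sorting groups equal values into runs, so most blocks need zero or one bit per offset. This is a toy illustration of the effect, not a reproduction of the Census1881 measurements.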
Column-wise compression speed
[Figure: column-wise compression speed for the frequency-coded Census1881 file]
Column-wise compression speed (Contd …)
[Figure: column-wise compression speed for the frequency-coded Census1881 file]
Column-wise decompression speed
[Figure: column-wise decompression speed for the frequency-coded Census1881 file]
Column-wise decompression speed (Contd …)
[Figure: column-wise decompression speed for the frequency-coded Census1881 file]
Effect of Row Order
[Figure: histogram of compressed size (bits/int)]
Conclusion
• Sorting columns results in smaller compressed sizes.
• Sorted columns can be compressed and decompressed faster than columns in shuffled order.
• The choice of compression scheme depends on the nature of the database (OLTP/OLAP) and on its requirements for storage and data access speed.
Future Work
• Incorporating a query engine to assess real-world performance.
• Comparing schemes on processor-level metrics.
• Using multiple threads in the compression algorithms.
• Querying data in compressed form.
Thank You
Backup
Key Issues
• Data access latency
The time between sending a request and locating the data on disk so that processing can start.
• Disk bandwidth
The amount of data that can be transferred from the disk per second.
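To illustrate how latency and bandwidth interact with compression, here is a deliberately simple cost model. It is an assumption, not the model used in the experiments: reading a column costs the latency plus the transfer time, and compressed data adds a decoding cost that does not overlap with I/O. All parameter values in the example are hypothetical.

```python
def scan_time(n_ints, bits_per_int, latency_s, bandwidth_bytes_per_s, decode_mis=None):
    """Seconds to read n_ints integers from disk under a simple sequential model.

    decode_mis is the decompression speed in millions of integers per second
    (None means the data is stored uncompressed). I/O and decoding are
    assumed not to overlap.
    """
    transfer_s = n_ints * bits_per_int / 8 / bandwidth_bytes_per_s
    decode_s = n_ints / (decode_mis * 1e6) if decode_mis else 0.0
    return latency_s + transfer_s + decode_s


# Hypothetical numbers: 100 million integers, 10 ms latency, 100 MB/s disk.
plain = scan_time(100_000_000, 32, 0.01, 100e6)
packed = scan_time(100_000_000, 12, 0.01, 100e6, decode_mis=700)
print(f"uncompressed: {plain:.2f} s, compressed: {packed:.2f} s")  # 4.01 s vs 1.65 s
```

In this model compression pays off whenever the transfer time saved exceeds the decoding time added, which is why fast decompression matters most for I/O-bound workloads.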
Experimental Setup
• Hardware
  o CPU: Intel Core i5-2400
  o RAM: 8 GB
  o Cache: 6 MB L3
  o Memory clock speed: 1333 MHz
• Software
  o Java SDK version 1.7.0
  o https://github.com/lemire/JavaFastPFOR
  o Single-threaded
• More info
  o http://hdl.handle.net/1882/45703
Compressed Size

| Coding scheme | Original | Shuffled | Sorted by high-cardinality column | Sorted by low-cardinality column |
|---|---|---|---|---|
| Variable-Byte | 15.00 | 15.00 | 15.00 | 15.00 |
| Binary Packing | 11.37 | 11.42 | 11.15 | 11.37 |
| NewPFD | 13.06 | 13.19 | 12.32 | 13.14 |
| OptPFD | 11.84 | 11.85 | 11.80 | 11.80 |
| FastPFOR | 11.27 | 11.29 | 11.06 | 11.24 |
| Simple9 | 15.75 | 15.90 | 15.72 | 15.84 |

Result of compression (bits per integer) on SSB with the frequency-coded file
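The table reports bits per integer. Assuming the uncompressed baseline is one 32-bit integer per value (an assumption; the slides do not state the baseline), bits per integer and the corresponding compression ratio relate as follows, where $S$ is the compressed size in bytes and $n$ the number of integers:

$$\text{bits/int} = \frac{8S}{n}, \qquad \text{compression ratio} = \frac{32}{\text{bits/int}}$$

For example, FastPFOR on the original row order uses 11.27 bits/int, a ratio of about $32/11.27 \approx 2.84$, while Variable-Byte's 15.00 bits/int corresponds to about $32/15.00 \approx 2.13$.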
Compression Speed

| Coding scheme | Original | Shuffled | Sorted by high-cardinality column | Sorted by low-cardinality column |
|---|---|---|---|---|
| Variable-Byte | 33 | 31 | 33 | 31 |
| Binary Packing | 729 | 711 | 746 | 732 |
| NewPFD | 52 | 36 | 40 | 34 |
| OptPFD | 6 | 3 | 5 | 4 |
| FastPFOR | 104 | 76 | 89 | 84 |
| Simple9 | 78 | 60 | 69 | 64 |

Result of compression speed in mis (millions of integers per second) on Census1881 with the frequency-coded file
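Speeds in these tables are given in mis (millions of integers per second). A minimal timing sketch of that unit follows. `measure_mis` is a hypothetical helper that accepts any encoder or decoder as a callable and keeps the best of several runs, which is one common way to reduce timing noise (in Java, warm-up runs additionally let the JIT compiler optimize hot code). It is not the benchmark harness used for these results.

```python
import time


def measure_mis(codec_fn, data, repeats=5):
    """Best-of-`repeats` speed of codec_fn(data), in millions of integers per second."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        codec_fn(data)
        best = min(best, time.perf_counter() - start)
    if best <= 0:
        raise ValueError("timer too coarse for this input; use more data")
    return len(data) / best / 1e6


# Example with a stand-in "codec" that just copies the integers.
numbers = list(range(1_000_000))
print(f"{measure_mis(list, numbers):.1f} mis")
```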
Decompression Speed

| Coding scheme | Original | Shuffled | Sorted by high-cardinality column | Sorted by low-cardinality column |
|---|---|---|---|---|
| Variable-Byte | 165 | 197 | 214 | 186 |
| Binary Packing | 1151 | 1089 | 1151 | 1135 |
| NewPFD | 709 | 615 | 729 | 689 |
| OptPFD | 421 | 357 | 482 | 381 |
| FastPFOR | 776 | 707 | 763 | 730 |
| Simple9 | 488 | 377 | 447 | 398 |

Result of decompression speed in mis (millions of integers per second) on Census1881 with the frequency-coded file
Column-wise compressed size
[Figure: column-wise compressed size for the frequency-coded Census1881 file; panels: Original, Shuffled]
Column-wise compressed size
[Figure: column-wise compressed size for the frequency-coded Census1881 file; panels: sorted by the high-cardinality column (column 1), sorted by the low-cardinality column (column 3)]
Column-wise compression speed
[Figure: column-wise compression speed for the frequency-coded Census1881 file]
Column-wise decompression speed
[Figure: column-wise decompression speed for the frequency-coded Census1881 file]
Effect of CPU family on compression speed
[Figure: compression speed (mis) on different processors]
Effect of CPU family on decompression speed
[Figure: decompression speed (mis) on different processors]