Albis: High-Performance File Format for Big Data Systems
Animesh Trivedi, Patrick Stuedi, Jonas Pfefferle, Adrian Schuepbach, Bernard Metzler,
IBM Research, Zurich
2018 USENIX Annual Technical Conference
Relational Data Processing Stack in the Cloud
Relational Engines
File Formats
Distributed Storage
One of the most popular data processing paradigms
- Data organized in tables
- Analyzed using a DSL like SQL
- Integrity protected using variants
But unlike classical RDBMS systems, they don’t manage their own storage
Back to the Future - It is 2010
Relational Engines
File Formats
Hardware
Disks connected over 1/10 Gbps network
The I/O Revolution
2-3 orders of magnitude performance improvements:
- latency: from msecs to μsecs
- bandwidth: from MBps to GBps
- IOPS: from 100s to 100K
The Impact of the Revolution
[Figure: Hadoop NameNode and Hadoop DataNodes serving a benchmark client over 100 Gbps; 3.1 GB/s x 4 = 12.4 GB/s]
Micro-benchmark*
16 cores in parallel, reading the TPC-DS data set. What is the bandwidth?
Why a micro-benchmark? To decouple from the SQL engine
*https://github.com/animeshtrivedi/fileformat-benchmarks
File format...
Goodput vs. throughput: formats like JSON bloat data by up to 10x, so we decouple the amount of data from how it is stored
None of the modern file formats delivered performance close to the hardware
[Figure: micro-benchmark bandwidth per file format; reference lines at 100 Gbps and 74.9 Gbps (HDFS/NVMe)]
The Outdated Assumptions and Impact
End-host assumptions
Distributed systems assumptions
Language/runtimes assumptions
1. CPU is fast, I/O is slow
- trade CPU for I/O
- compression, encoding
But why now? CPU core speed is stalled, but …

| | 1 Gbps | HDD | 100 Gbps | Flash |
|---|---|---|---|---|
| Bandwidth | 117 MB/s | 140 MB/s | 12.5 GB/s | 3.1 GB/s |
| cycle/unit | 38,400 | 10,957 | 360 | 495 |
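
The cycle/unit row can be approximated with a short calculation. The sketch below assumes a 3 GHz core, a 1,500-byte packet as the network unit, and a 512-byte sector as the storage unit. None of these values appear on the slide, so treat them as assumptions; `cycles_per_unit` is an illustrative name.

```python
# Rough budget of CPU cycles available per unit of I/O.
# Assumptions (not on the slide): 3 GHz core, 1,500-byte network packet,
# 512-byte storage sector.
CPU_HZ = 3e9
PACKET_BYTES = 1500
SECTOR_BYTES = 512


def cycles_per_unit(bandwidth_bytes_per_sec, unit_bytes, cpu_hz=CPU_HZ):
    """Cycles a core can spend on one unit before the next one arrives."""
    seconds_per_unit = unit_bytes / bandwidth_bytes_per_sec
    return seconds_per_unit * cpu_hz


devices = {
    "1 Gbps": (117e6, PACKET_BYTES),
    "HDD": (140e6, SECTOR_BYTES),
    "100 Gbps": (12.5e9, PACKET_BYTES),
    "Flash": (3.1e9, SECTOR_BYTES),
}

budget = {name: cycles_per_unit(bw, unit) for name, (bw, unit) in devices.items()}

for name, cycles in budget.items():
    print(f"{name:>8}: {cycles:,.0f} cycles/unit")
```

Under these assumptions the results land within about 1% of the values in the table. The drop from tens of thousands of cycles to a few hundred is what makes CPU-heavy compression and encoding hard to afford.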
2. Avoid slow, random small I/O
- preference for large block scans
But this leads to bad CPU cache performance
[Figure: columns C0 to C7 in a block of 128 MB to 1 GB; cache size?]
Bounded by the poor cache/IPC performance
Bounded by the number of instructions/row
3. Remote I/O is slow
- pack data/metadata together
- schedule tasks on local blocks
But now that network and storage are super fast, why still pack all data in a single block and try to co-schedule tasks?
[Figure: data and compute placement]
4. Metadata lookups are slow
- decrease the number of lookups by decreasing the number of files/directories
RAMCloud and Crail can do 10 million lookups/sec. Does this design still make sense?
[Figure: Client, Metadata Server, Data; a "Where is data?" lookup followed by data access]
5. Disregard for the runtime environment:
- group encoded/decoded
- heavy object pressure
- independent layers, no shared object
- materialize all objects
Binary / raw data
Runtime row binary data
Can we reset all assumptions and start from scratch for modern high-performance I/O devices?
“Deliver the full hardware performance”
Albis
http://www.fotocommunity.de/photo/albiskette-chfleischli/39086845
Albis
● Albis - a file format to store relational tables for read-heavy analytics workloads
● Supports all basic primitive types with data and schema
○ nested schemas are flattened and data is stored in the leaves
● Three fundamental design decisions:
1. avoid CPU pressure, i.e., no encoding, compression, etc.
2. simple data/metadata management on the distributed storage
3. carefully managed runtime - simple row/column storage with a binary API
Table Storage Logic
Int double byte[ ] char float[ ]
00 01 02 03 04
10 11 12 13 14
20 21 22 23 24
30 31 32 33 34
40 41 42 43 44
The table is split into row groups (rows 0-1 and rows 2-4) and column groups (columns 0-1, column 2, and columns 3-4):

00 01 | 02 | 03 04
10 11 | 12 | 13 14
------+----+------
20 21 | 22 | 23 24
30 31 | 32 | 33 34
40 41 | 42 | 43 44
[Figure: each row group / column group chunk is labeled RG0CG0, RG0CG1, RG0CG2, RG1CG0, RG1CG1, RG1CG2]
- If there is only 1 column group: row store
- If there are ‘n’ column groups: column store
table0
  RG0: CG0 CG1 CG2
  RG1: CG0 CG1 CG2
  schema
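
The partitioning can be sketched in a few lines. This is a minimal illustration, not the Albis implementation: the `partition` function, the `table0/RG<i>/CG<j>` path naming, and the left-to-right numbering of column groups are assumptions based on the labels in the figure.

```python
def partition(rows, column_groups, row_group_sizes, table="table0"):
    """Split `rows` into (row group, column group) chunks keyed by a path.

    column_groups: list of lists of column indexes, e.g. [[0, 1], [2], [3, 4]]
    row_group_sizes: number of rows in each row group, e.g. [2, 3]
    """
    assert sum(row_group_sizes) == len(rows), "row groups must cover the table"
    chunks = {}
    start = 0
    for rg, size in enumerate(row_group_sizes):
        group_rows = rows[start:start + size]
        for cg, cols in enumerate(column_groups):
            # Hypothetical path naming, modeled on the RGxCGy labels.
            path = f"{table}/RG{rg}/CG{cg}"
            # Each chunk keeps the rows restricted to this column group.
            chunks[path] = [[row[c] for c in cols] for row in group_rows]
        start += size
    return chunks


# The 5x5 example: cell "rc" sits at row r, column c.
table = [[f"{r}{c}" for c in range(5)] for r in range(5)]
layout = partition(table, [[0, 1], [2], [3, 4]], [2, 3])

for path, chunk in layout.items():
    print(path, chunk)

# A single column group over a single row group degenerates to a row store.
row_store = partition(table, [[0, 1, 2, 3, 4]], [5])
```

Choosing one column group per column gives the column-store extreme instead; everything in between is a hybrid layout.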
Row Storage Format
How is a single row of data stored in these files?
Row layout: null bitmap | complete row size | fixed-field area | variable-field area
- Null bitmap: marks null column values
- Fixed-field area: variable-size columns hold a ptr (offset + size) to their data
- Variable-field area: byte[ ] ..., float[ ] ...
Schema of { int, double, byte[ ], char, float[ ] }:
  1 byte bitmap (because there are 5 columns)
+ 4 byte size
+ 4 byte (int)
+ 8 byte (double)
+ 8 byte (offset + size, ptr)
+ 1 byte (char)
+ 8 byte (offset + size, ptr)
= 34 bytes + variable area.
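
The same arithmetic can be written as a small helper. This is a sketch only: the type names, the `fixed_part_bytes` function, and the rule of one bitmap bit per column rounded up to whole bytes are assumptions chosen to match the example above.

```python
# Widths in bytes for fixed-size types, as used in the slide's example.
FIXED_WIDTH = {"int": 4, "double": 8, "char": 1}
# Variable-size types store an (offset + size) pointer in the fixed-field area.
VARIABLE_TYPES = {"byte[]", "float[]"}
POINTER_BYTES = 8
ROW_SIZE_BYTES = 4


def fixed_part_bytes(schema):
    """Bytes used by a row before its variable-field area."""
    # Assumption: one bit per column, rounded up to whole bytes.
    bitmap_bytes = (len(schema) + 7) // 8
    field_bytes = sum(
        POINTER_BYTES if t in VARIABLE_TYPES else FIXED_WIDTH[t] for t in schema
    )
    return bitmap_bytes + ROW_SIZE_BYTES + field_bytes


schema = ["int", "double", "byte[]", "char", "float[]"]
print(fixed_part_bytes(schema))  # 34
```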
Writing Rows
- The writer object fills a segment buffer (e.g., 1 MB)
- Segments are written to the HDFS data file
- Min, max, and distribution statistics are written to the HDFS metadata file and used to implement filters
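
A toy version of this write path shows how per-segment statistics enable filtering. Everything here is hypothetical: `SegmentWriter`, `segments_to_read`, and the in-memory lists standing in for the data and metadata files are illustrative names, and only min/max statistics on one key column are tracked.

```python
class SegmentWriter:
    """Toy writer: buffers encoded rows and tracks min/max of one column."""

    def __init__(self, segment_bytes=1 << 20):
        self.segment_bytes = segment_bytes
        self.buffer = bytearray()
        self.lo = None
        self.hi = None
        self.data_file = []      # stands in for the data file
        self.metadata_file = []  # stands in for the metadata file

    def write(self, key, payload: bytes):
        # Flush first if this row would overflow the current segment.
        if self.buffer and len(self.buffer) + len(payload) > self.segment_bytes:
            self.flush()
        self.buffer += payload
        self.lo = key if self.lo is None else min(self.lo, key)
        self.hi = key if self.hi is None else max(self.hi, key)

    def flush(self):
        if not self.buffer:
            return
        self.data_file.append(bytes(self.buffer))
        self.metadata_file.append(
            {"min": self.lo, "max": self.hi, "bytes": len(self.buffer)}
        )
        self.buffer = bytearray()
        self.lo = self.hi = None


def segments_to_read(metadata, low, high):
    """Indexes of segments whose [min, max] range overlaps [low, high]."""
    return [i for i, m in enumerate(metadata)
            if m["max"] >= low and m["min"] <= high]


writer = SegmentWriter(segment_bytes=32)  # tiny segments for the demo
for key in range(10):
    writer.write(key, key.to_bytes(8, "little"))
writer.flush()

print(writer.metadata_file)
print(segments_to_read(writer.metadata_file, 5, 6))
```

A reader that only needs keys 5 to 6 can skip every segment whose min/max range does not overlap the predicate, without touching the data file for those segments.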
Reading Rows
1. Read schema file
2. Check projection to figure out which files to read
   a. Complete CGs
   b. Partial CGs
3. Evaluate filters to skip segments
4. Materialize values
   a. Skip value materialization in partial CG reads
[Figure: row data with columns 1 to 5]
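
Steps 2 and 4 can be sketched as follows. This is an illustration under assumptions, not the Albis reader: `plan_read`, `materialize`, and the column names `a` to `e` are hypothetical, and the column-group contents would normally come from the schema file read in step 1.

```python
def plan_read(column_groups, projection):
    """Classify column groups for a projection.

    column_groups: dict mapping a column-group name to its column names
    projection: iterable of column names the query needs
    """
    wanted = set(projection)
    plan = {"complete": [], "partial": [], "skipped": []}
    for name, columns in column_groups.items():
        hit = wanted & set(columns)
        if not hit:
            plan["skipped"].append(name)    # file is never opened
        elif hit == set(columns):
            plan["complete"].append(name)   # every column is needed
        else:
            plan["partial"].append(name)    # read rows, materialize a subset
    return plan


def materialize(row, columns, projection):
    """Return only the projected values of a row read from one column group."""
    wanted = set(projection)
    return {c: v for c, v in zip(columns, row) if c in wanted}


groups = {"CG0": ["a", "b"], "CG1": ["c"], "CG2": ["d", "e"]}
projection = ["a", "b", "d"]

plan = plan_read(groups, projection)
print(plan)
print(materialize(("x", 1.5), groups["CG2"], projection))
```

In a partial column group the reader still fetches the row bytes, but it only materializes the projected values, which is where the object-allocation savings come from.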
More Details in the Paper
● How to evolve schema? Adding and removing columns
● How to evolve data? Adding and removing rows
● How to process Albis files in a relational data processing engine?
● Concerns regarding data imbalance or re-grouping?
● ...
Evaluation
All experiments run on a 4-node cluster with a 100 Gbps network and flash devices
Dataset: TPC-DS tables with a scale factor of 100 (~100 GB of data)
Three fundamental questions
● Does Albis deliver better performance for micro-benchmarks?
● Does micro-benchmark performance translate to better workload performance?
● What is the performance and space trade-off in Albis?
Microbenchmark Performance - Revised
[Figure: micro-benchmark bandwidth per file format, now including Albis; reference lines at 100 Gbps and 74.9 Gbps (HDFS/NVMe)]
Albis delivers 1.9 - 21.3x performance improvements over other formats
Spark/SQL TPC-DS Performance
TPC-DS dataset, scale factor = 100
[Figure: Y axis: CDF of queries; X axis: percentage performance gains]
Albis delivers up to 3x performance gains for TPC-DS queries
Space vs. Performance Trade-off
| | None | Snappy | Gzip | zlib |
|---|---|---|---|---|
| Parquet | 58.6 GB, 12.5 Gbps | 44.3 GB, 9.4 Gbps | 33.8 GB, 8.3 Gbps | N/A |
| ORC | 72.0 GB, 19.1 Gbps | 47.6 GB, 17.8 Gbps | N/A | 36.8 GB, 13.0 Gbps |
| Albis | 94.5 GB, 59.9 Gbps | N/A | N/A | N/A |
Albis inflates data by 1.3 - 2.7x, but gives 3.4 - 7.2x performance gains
What would it take to deliver 100 Gbps?
[Figure: micro-benchmark bandwidth with added bars for Albis + Crail and Albis + Crail + NoObjs; reference lines at 100 Gbps and 74.9 Gbps (HDFS/NVMe); annotations: JVM object overheads, data density]
Apache Crail (Incubating) - A High-Performance Distributed Data Store, http://crail.incubator.apache.org/
Albis can deliver performance within 10% of hardware
Albis - Summary
● Albis - a high-performance file format for storing relational data
○ Open-source address: https://github.com/zrlio/albis
● Motivation: in the presence of new network and storage devices, it is time to revise basic assumptions
○ no compression or encoding
○ simple data and metadata design
○ efficient object management with a binary API
● A revised software stack leads to significant performance improvements
○ demonstrated it for the file format
○ very active research field - OS designs (Arrakis, IX), networking and storage stacks
Notice
IBM is a trademark of International Business Machines Corporation, registered in many jurisdictions worldwide. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Other products and service names might be trademarks of IBM or other companies.
Backup
Microarchitectural Analysis
| | Parquet | ORC | Arrow | Albis | Gains |
|---|---|---|---|---|---|
| Instructions per row | 6.6K | 4.9K | 1.9K | 1.6K | 1.2 - 4.1x |
| Cache misses per row | 9.2 | 4.6 | 5.1 | 3.0 | 1.7 - 3.0x |
| Nanoseconds per row | 105.3 | 63.9 | 31.2 | 20.8 | 1.5 - 5.0x |