Splice Machine: Architecture of an Open Source RDBMS powered by HBase and Spark
![Page 1: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/1.jpg)
Architecture of an Open Source RDBMS powered by
HBase and Spark
January 12, 2017
Spark Meetup Barcelona
Daniel Gómez Ferro
![Page 2: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/2.jpg)
▪ Introduction to Splice Machine
▪ 1.x: the need for Spark
▪ 2.0: Spark introduction, challenges and wins
▪ Future
2
Agenda
![Page 3: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/3.jpg)
▪ Splice Machine
▪ Distributed database company
▪ Open source
▪ VC-backed
▪ Offices in San Francisco and St Louis (MO)
3
Who are we?
![Page 4: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/4.jpg)
4
What do we do?
The Open Source RDBMS Powered By Hadoop & Spark
SQL, Scale Out, Speed
▪ ANSI SQL: No retraining or rewrites for SQL-based analysts, reports, and applications
▪ ¼ the Cost: Scales out on commodity hardware
▪ Transactions: Ensure reliable updates across multiple rows
▪ Mixed Workloads: Simultaneously support OLTP and OLAP workloads
▪ Elastic: Increase scale in just a few minutes
▪ 10x Faster: Leverages Spark in-memory technology
![Page 5: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/5.jpg)
Timeline
5
Where are we?
▪ Founded: 2012
▪ v1.0: Nov 2014
▪ v1.5: Oct 2015
▪ v2.0 (Open Source, Spark integration): Jul 2016
▪ v2.5: Feb? 2017
![Page 6: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/6.jpg)
1.x: the need for Spark
![Page 7: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/7.jpg)
▪ SQL Parser
▪ SQL Planner
▪ SQL Optimizer
▪ Data storage
▪ Execution Engine
7
RDBMS building blocks
![Page 8: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/8.jpg)
▪ SQL Parser
▪ SQL Planner
▪ SQL Optimizer
▪ Data storage
▪ Execution Engine
8
RDBMS building blocks
Apache Derby
Apache HBase
![Page 9: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/9.jpg)
▪ Distributed Sorted Map
  ▪ Keys and Values are byte[]
  ▪ Lexicographically sorted (like a dictionary)
    ▪ [] < [0x00, 0x00] < [0x01] < [0x01, 0x00]
  ▪ Key range is partitioned/sharded in ‘regions’
▪ Operations
  ▪ get(byte[] key) -> byte[]
  ▪ put(byte[] key, byte[] value)
  ▪ scan(byte[] start, byte[] stop) -> scanner
▪ Extensions
  ▪ Coprocessors
9
HBase Introduction
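The data model on this slide can be sketched on a single machine. The following is a minimal Python illustration, not HBase code: the class name `SortedByteMap` is hypothetical, and the half-open scan range (start inclusive, stop exclusive) is an assumption carried over from how HBase scans behave by default.

```python
import bisect


class SortedByteMap:
    """Toy single-node stand-in for a sorted map with byte-string keys."""

    def __init__(self):
        self._keys = []      # kept sorted; Python compares bytes lexicographically
        self._values = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key: bytes):
        return self._values.get(key)

    def scan(self, start: bytes, stop: bytes):
        """Yield (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


# The ordering example from the slide, expressed with Python bytes.
assert b"" < b"\x00\x00" < b"\x01" < b"\x01\x00"
```

Because keys are kept in sorted order, a range scan is two binary searches plus a sequential read, which is the property primary keys and indexes can exploit.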
![Page 10: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/10.jpg)
10
HBase Architecture
HDFS
![Page 11: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/11.jpg)
▪ Sits on top of HDFS
  ▪ Distributed append-only Filesystem
  ▪ Files are immutable once closed
11
HBase challenges
![Page 12: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/12.jpg)
▪ Sits on top of HDFS
  ▪ Distributed append-only Filesystem
  ▪ Files are immutable once closed
▪ Log-Structured Merge Tree
  ▪ Multi level storage
  ▪ Inserts are buffered in memory
  ▪ Flush this buffer to disk, writing indexed, sorted files
12
HBase challenges
In-memory buffer [a; m]
File 1 [a; p]
File 2 [f; z]
![Page 13: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/13.jpg)
▪ Sits on top of HDFS
  ▪ Distributed append-only Filesystem
  ▪ Files are immutable once closed
▪ Log-Structured Merge Tree
  ▪ Multi level storage
  ▪ Inserts are buffered in memory
  ▪ Flush this buffer to disk, writing indexed, sorted files
13
HBase challenges
In-memory buffer []
File 3 [a; m]
File 1 [a; p]
File 2 [f; z]
![Page 14: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/14.jpg)
14
HBase Architecture
HDFS
![Page 15: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/15.jpg)
15
HBase Architecture
▪ HRegion
  ▪ MemStore
  ▪ One or more HFiles
  ▪ HLog
▪ Writes
  ▪ Add it to the MemStore
  ▪ Write it to the HLog
  ▪ When the MemStore gets big enough
    ▪ Flush: dump the MemStore into a new HFile
▪ Reads
  ▪ In parallel from the MemStore and all HFiles
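The write, flush and read steps listed above can be sketched as a toy model. This is an illustration only, not HBase code: `ToyRegion` and `flush_threshold` are hypothetical names, the threshold counts entries rather than bytes, and the read path checks sources one after another instead of in parallel.

```python
class ToyRegion:
    """Toy model of the write/flush/read path; not HBase code."""

    def __init__(self, flush_threshold=3):
        self.memstore = {}      # in-memory buffer of recent writes
        self.hfiles = []        # immutable sorted files, oldest first
        self.hlog = []          # append-only log of every write
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.hlog.append((key, value))   # durability: log the write
        self.memstore[key] = value       # visibility: buffer it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Dump the buffer as a new sorted, immutable file and start a fresh buffer.
        self.hfiles.append(tuple(sorted(self.memstore.items())))
        self.memstore = {}

    def get(self, key):
        # Newest data wins: check the MemStore, then files from newest to oldest.
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            for k, v in hfile:
                if k == key:
                    return v
        return None
```

Since flushed files are never modified, an update to an existing key simply lands in a newer source, and the read path has to consult every source to find the latest version.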
![Page 16: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/16.jpg)
▪ We reused several Derby components
  ▪ JDBC driver
  ▪ SQL Parser/Planner/Optimizer
  ▪ In-memory data formats
  ▪ Bytecode generation
▪ Developed some custom solutions
  ▪ TEMP table for transient data (joins, aggregates, etc.)
  ▪ Task framework (using HBase’s coprocessors)
  ▪ Connection pooling
▪ Switched Derby’s datastore for HBase
  ▪ Primary Keys and Indexes make use of HBase’s sorting order
▪ Removing Derby’s assumptions about running on a single machine...
16
Derby - HBase integration
![Page 17: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/17.jpg)
▪ Great for
  ▪ Operational workloads
  ▪ Replacing non-scalable RDBMS solutions
▪ SQL support
  ▪ SQL 99, Indexes, Triggers, Foreign Keys, cost based optimizer...
17
Splice Machine 1.x
![Page 18: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/18.jpg)
▪ Great for
  ▪ Operational workloads
  ▪ Replacing non-scalable RDBMS solutions
▪ SQL support
  ▪ SQL 99, Indexes, Triggers, Foreign Keys, cost based optimizer...
▪ But...
  ▪ Struggled on analytical queries
  ▪ HBase’s compactions created instabilities
  ▪ Minimum latency was too high (due to Task Framework)
18
Splice Machine 1.x
![Page 20: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/20.jpg)
2.0: Spark introduction, challenges and wins
![Page 21: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/21.jpg)
▪ Challenging but natural
  ▪ Matched tree of database operators with RDD transformations
21
Spark Integration
Database operator -> RDD transformation
▪ Aggregate -> reduceByKey()
▪ Join -> join()
▪ Scan + Restriction -> newAPIHadoopRDD() + filter()
▪ Scan -> newAPIHadoopRDD()
![Page 22: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/22.jpg)
22
▪ Abstracted away the Spark API
▪ Two implementations
  ▪ In-memory using Guava’s FluentIterable APIs
  ▪ Distributed using Spark
▪ SQL operations have a single implementation
▪ In-memory use case:
  ▪ OLTP workloads
  ▪ Very low latency
  ▪ Bring data in, perform computation locally
  ▪ Anti-pattern in distributed systems, but it works
Unified API
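The idea of writing each SQL operation once against an engine-neutral interface can be sketched as follows. Everything here is hypothetical and for illustration only: the `DataSet` interface, the `LocalDataSet` class and the `total_by_customer` plan are invented names, and Python stands in for the Java code of the real system. A distributed implementation would delegate the same four methods to the corresponding RDD transformations (`filter`, `join`, `reduceByKey`, `collect`).

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class DataSet(ABC):
    """Hypothetical engine-neutral interface that SQL operators target."""

    @abstractmethod
    def filter(self, predicate): ...

    @abstractmethod
    def join(self, other): ...

    @abstractmethod
    def reduce_by_key(self, func): ...

    @abstractmethod
    def collect(self): ...


class LocalDataSet(DataSet):
    """In-memory implementation over (key, value) pairs."""

    def __init__(self, pairs):
        self._pairs = list(pairs)

    def filter(self, predicate):
        return LocalDataSet(p for p in self._pairs if predicate(p))

    def join(self, other):
        # Hash join: index the other side by key, then probe with our pairs.
        index = defaultdict(list)
        for k, w in other.collect():
            index[k].append(w)
        return LocalDataSet(
            (k, (v, w)) for k, v in self._pairs for w in index.get(k, ())
        )

    def reduce_by_key(self, func):
        acc = {}
        for k, v in self._pairs:
            acc[k] = func(acc[k], v) if k in acc else v
        return LocalDataSet(acc.items())

    def collect(self):
        return list(self._pairs)


def total_by_customer(orders: DataSet, customers: DataSet) -> DataSet:
    """Scan -> restriction -> join -> aggregate, written once against DataSet."""
    big = orders.filter(lambda kv: kv[1] >= 10)          # Scan + Restriction
    joined = big.join(customers)                         # Join
    return joined.reduce_by_key(                         # Aggregate
        lambda a, b: (a[0] + b[0], a[1])
    )
```

The plan function never mentions an engine, so choosing between local and distributed execution becomes a matter of which `DataSet` objects are passed in.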
![Page 23: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/23.jpg)
▪ Got rid of TEMP table
  ▪ Spark maintains temporary data in memory
▪ Got rid of Task Framework
  ▪ Spark performs the same job, less complexity
▪ Resource isolation
  ▪ HBase and Spark in separate processes
  ▪ Analytical queries have less impact on HBase stability
23
Spark Integration Benefits
![Page 24: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/24.jpg)
▪ Many serialization boundaries: poor performance
  ▪ HDFS -> HBase -> Spark -> Client
▪ Task granularity
  ▪ Too coarse
▪ Multiple Spark contexts
  ▪ One per HRegionServer
▪ Derby legacy issues
  ▪ Custom datatypes
  ▪ Thread context assumptions
24
Integration Problems
![Page 25: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/25.jpg)
▪ Remove serialization boundaries
▪ Hybrid scanners:
  ▪ Custom InputFormat that reads HFiles directly from HDFS into Spark
  ▪ Merges those values with a fast scanner on the MemStore
  ▪ Most data: HDFS -> Spark
  ▪ Small part: HBase (in-memory) -> Spark
▪ Requires some hooks in HBase
  ▪ Compactions remove HFiles, Spark might be reading them
  ▪ Flushes add HFiles
▪ Much better read performance
25
Solving: serialization boundaries
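The merge step of a hybrid scanner can be illustrated with a small sketch. It assumes each source (the MemStore and each HFile) yields key-sorted pairs and that sources are listed newest first; `merged_scan` is a hypothetical name, and timestamps, deletes and the HBase hooks mentioned above are left out.

```python
import heapq


def merged_scan(memstore_rows, hfile_rows_list):
    """Merge sorted (key, value) sources; the first source listed wins on ties.

    memstore_rows: sorted pairs from the in-memory buffer (newest data).
    hfile_rows_list: sorted pair lists, ordered newest file first.
    """
    sources = [memstore_rows] + list(hfile_rows_list)
    # Tag each row with its source rank so that, for equal keys, the newest
    # source sorts first in the merged stream.
    tagged = [
        [(key, rank, value) for key, value in rows]
        for rank, rows in enumerate(sources)
    ]
    last_key = object()   # sentinel that never equals a real key
    for key, _rank, value in heapq.merge(*tagged):
        if key != last_key:          # keep only the newest version of each key
            yield key, value
            last_key = key
```

For example, with a MemStore holding a fresh value for key `a` and an older HFile holding a stale one, the merged stream returns only the fresh value, in overall key order.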
![Page 26: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/26.jpg)
▪ Increase task granularity
▪ HTableInputFormat default is:
  ▪ 1 region = 1 partition
  ▪ Each region could be 1 GB or more
▪ SpliceInputFormat subdivides regions into blocks (default 32 MB)
  ▪ Better parallelism
  ▪ Better performance
▪ This also needs hooks in HBase (coprocessors)
26
Solving: task granularity
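Subdividing a region into roughly fixed-size pieces can be sketched as below. This is an assumption-laden illustration, not the SpliceInputFormat logic: `subdivide_region` and the `block_index` input (a sorted list of first keys with sizes) are hypothetical, standing in for whatever size information the HBase-side hooks expose.

```python
def subdivide_region(block_index, target_bytes=32 * 1024 * 1024):
    """Group consecutive blocks of one region into splits of about target_bytes.

    block_index: sorted list of (first_key, size_in_bytes) entries.
    Returns a list of (start_key, stop_key) ranges; a stop_key of None means
    "to the end of the region".
    """
    splits = []
    start_key = None
    accumulated = 0
    for first_key, size in block_index:
        if start_key is None:
            start_key = first_key
        elif accumulated >= target_bytes:
            # Current split is full: close it at this block's first key.
            splits.append((start_key, first_key))
            start_key, accumulated = first_key, 0
        accumulated += size
    if start_key is not None:
        splits.append((start_key, None))
    return splits
```

Under these assumptions a 1 GB region with the 32 MB default would yield about 32 partitions instead of one, which is where the extra parallelism comes from.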
![Page 27: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/27.jpg)
▪ Single shared Spark context
▪ JobServer wasn’t good enough
  ▪ It would become a bottleneck, results would go through it
▪ Custom JobServer (called OLAPServer)
  ▪ Single Spark context on this server
  ▪ Currently colocated with the HMaster (fault tolerant for free)
  ▪ Makes Spark jobs stream results directly to the client
  ▪ Runs several partitions in parallel
  ▪ Starts streaming as soon as there’s data
27
Solving: multiple Spark contexts
![Page 28: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/28.jpg)
28
JobServer vs OLAPServer
Timeline (steps in time order):
▪ JobServer: Start partition 1 -> Next row -> Next row -> … -> End partition 1, send -> Start partition 2 -> Next row -> Next row -> … -> End partition 2, send
▪ Client: Run partition 1 -> Get results -> Run partition 2 -> Get results
While a partition runs, the client is blocked waiting for more data
![Page 29: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/29.jpg)
29
JobServer vs OLAPServer
Timeline (steps in time order):
▪ JobServer: Start partition 1 -> Next row -> Next row -> … -> End partition 1, send -> Start partition 2 -> Next row -> Next row -> … -> End partition 2, send
▪ Client (JobServer): Run partition 1 -> Get results -> Run partition 2 -> Get results
▪ OLAPServer: Start partition 1, 2, 3 -> Get and send row -> Get and send row -> … -> End partition 1 -> Start partition 4 -> Get and send row -> Get and send row -> … -> End partition 2, 3 -> Start partition 5, 6
▪ Client (OLAPServer): Run partition 1, 2, 3 -> Get result -> Get result -> Run partition 4 -> Get result -> Get result -> Run partition 5, 6
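The difference between the two timelines can be simulated in a few lines. This is a single-threaded sketch under stated assumptions: the function names are hypothetical, round-robin iteration stands in for tasks that really run in parallel, and a counter stands in for elapsed time.

```python
def job_server_style(partitions):
    """Compute one partition at a time; send it only once it is complete."""
    for partition in partitions:
        batch = [row for row in partition]   # client is blocked during this
        yield from batch


def olap_server_style(partitions, parallelism=3):
    """Advance several partitions at once and forward each row immediately.

    Round-robin over iterators stands in for tasks running in parallel.
    """
    pending = [iter(p) for p in partitions]
    running = [pending.pop(0) for _ in range(min(parallelism, len(pending)))]
    while running:
        still_running = []
        for task in running:
            try:
                yield next(task)             # row is streamed as soon as it exists
                still_running.append(task)
            except StopIteration:
                if pending:                  # a slot freed up: start next partition
                    still_running.append(pending.pop(0))
        running = still_running


def counting_partition(rows, counter):
    """Wrap a partition so each computed row adds one unit of work to counter."""
    for row in rows:
        counter[0] += 1
        yield row
```

Wrapping partitions in `counting_partition` and asking each generator for its first row shows the effect: the batch style has to compute the whole first partition before the client sees anything, while the streaming style delivers a row after a single unit of work.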
![Page 30: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/30.jpg)
30
Single shared Spark context
![Page 31: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/31.jpg)
▪ Custom datatypes:
  ▪ Custom Kryo serializers for Derby objects
▪ Thread contexts
  ▪ Not completely solved
  ▪ Use TaskContext.addTaskCompletionListener() to clean up after ourselves
  ▪ Still finding resource leaks from time to time...
31
Solving: Derby legacy issues
![Page 32: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/32.jpg)
▪ HBase compactions in Spark:
  ▪ HBase compactions can be expensive
    ▪ Reading and writing lots of data
    ▪ If they happen in the HBase JVM they can kill OLTP performance
  ▪ We made it possible to run them in Spark
    ▪ Maintaining data locality
    ▪ Scheduled among other jobs
    ▪ Fallback to HBase if the Spark scheduler doesn’t have resources
32
Other Spark goodies
![Page 33: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/33.jpg)
33
Other Spark goodies
▪ Integration with Spark Streaming:
▪ We can ingest data directly from Spark Streaming
▪ Easy to write to Splice Machine from Kafka through it
![Page 34: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/34.jpg)
Future work and roadmap
![Page 35: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/35.jpg)
35
▪ Move to DataFrame APIs
  ▪ Catalyst optimizer
  ▪ Whole stage code generation (better than Derby’s codegen)
  ▪ Already transitioned some operations
  ▪ Requires good UnsafeRow support
▪ UnsafeRow
  ▪ Compact in-memory representation
    ▪ Rows are serialized in a contiguous block of memory
  ▪ Better memory management
  ▪ Less GC time
Future Spark work
![Page 36: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/36.jpg)
36
Future Spark work
Current
▪ Row1 -> DataDes[]
  ▪ SQLDate
  ▪ SomeOtherField
  ▪ SQLInteger
  ▪ SQLInteger
UnsafeRow
▪ MemoryBlock: byte[]
  ▪ Row1: 0, 100
  ▪ Row2: 100, 50
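The contrast in this diagram, one object per field versus rows packed into a shared byte array, can be sketched as follows. The `RowBlock` class is hypothetical, and reading the slide's number pairs as (offset, length) into the memory block is an assumption. The real UnsafeRow layout is richer (it also tracks nulls and variable-length fields); this sketch handles only fixed-width 64-bit integers.

```python
import struct


class RowBlock:
    """Rows packed back to back in one bytearray; a row is (offset, length)."""

    def __init__(self):
        self.memory = bytearray()
        self.rows = []                       # list of (offset, length)

    def append(self, *ints):
        encoded = struct.pack(f"<{len(ints)}q", *ints)   # 8 bytes per field
        self.rows.append((len(self.memory), len(encoded)))
        self.memory += encoded

    def field(self, row_index, field_index):
        offset, length = self.rows[row_index]
        if not 0 <= field_index < length // 8:
            raise IndexError("field out of range")
        (value,) = struct.unpack_from("<q", self.memory, offset + 8 * field_index)
        return value
```

With this layout the garbage collector sees one large buffer rather than one object per field, which is the intuition behind the "less GC time" point.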
![Page 37: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/37.jpg)
37
▪ Columnar storage format
▪ We already have ‘Pinned’ tables:
  ▪ Create Parquet snapshot of table
  ▪ Get columnar access
  ▪ Good for read-only data
▪ Planning on maintaining dual representation
  ▪ Row-oriented for recently written values
  ▪ Column-oriented for historical data
  ▪ Merge those on the fly
Future Spark work
![Page 38: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/38.jpg)
38
▪ Better Spark shell integration
▪ Our SparkContext resides in the OLAPServer
▪ Getting data to a Spark shell incurs a serialization boundary
  ▪ From Splice’s SparkContext to the shell context
▪ We want to achieve transparent conversion
  ▪ ResultSet -> DataFrame
Future Spark work
![Page 39: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/39.jpg)
39
▪ Performance increases across the board
  ▪ TPCC, TPCH, Backup/Restore, ODBC driver…
▪ Incremental backup
▪ Native PL/SQL support (in Beta)
  ▪ No excuses left for not migrating those Oracle databases
▪ Client load balancing/failover
  ▪ Via HAProxy
▪ Statistics improvements
  ▪ Histograms, sketching libraries
▪ RDD caching (pinning)
2.5 Roadmap
![Page 40: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/40.jpg)
Thank You!
We are hiring :-)
![Page 41: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/41.jpg)
Open Source: github.com/splicemachine
Community: community.splicemachine.com
Slack channel: tinyurl.com/SMslack
![Page 42: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/42.jpg)
42
![Page 43: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/43.jpg)
TPCH 100 Load Throughput
![Page 44: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/44.jpg)
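TPCH 100 Query Times (seconds)

| Query | Phoenix | Trafodion | Splice Machine |
|---|---|---|---|
| 1 | 395 | TRAFODION-2237 | 99 |
| 2 | PHOENIX-3322 | 516 | 44 |
| 3 | PHOENIX-3322 | TRAFODION-2237 | 126 |
| 4 | PHOENIX-3322 | TBD | 133 |
| 5 | PHOENIX-3322 | TBD | 192 |
| 6 | 74 | 3178 | 38 |
| 7 | PHOENIX-3322 | 4442 | 220 |
| 8 | PHOENIX-3322 | TRAFODION-2239 | 620 |
| 9 | PHOENIX-3322 | 941 | 273 |
| 10 | PHOENIX-3322 | TRAFODION-2241 | 101 |
| 11 | PHOENIX-3317 | 463 | 56 |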
![Page 45: Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark](https://reader030.fdocuments.in/reader030/viewer/2022020314/58ef66c91a28ab6a518b45d5/html5/thumbnails/45.jpg)
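TPCH 100 Query Times (seconds)

| Query | Phoenix | Trafodion | Splice Machine |
|---|---|---|---|
| 12 | 379 | TBD | 85 |
| 13 | PHOENIX-3318 | TBD | 71 |
| 14 | PHOENIX-3322 | TBD | 50 |
| 15 | PHOENIX-3319 | TBD | 102 |
| 16 | PHOENIX-3322 | TBD | 33 |
| 17 | PHOENIX-3322 | TBD | 929 |
| 18 | PHOENIX-3322 | TBD | SPLICE-34 |
| 19 | PHOENIX-3322 | TBD | 57 |
| 20 | PHOENIX-3320 | TBD | SPLICE-410 |
| 21 | PHOENIX-3321 | TBD | 479 |
| 22 | PHOENIX-3322 | TBD | 219 |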