CROSS Symposium 2017

SkyhookDB: leveraging programmable storage toward DB elasticity
CROSS Incubator Project
PI: Jeff LeFevre ([email protected])
Larger single machine route

● Static allocation / coarse upgrade path
● Storage variants
○ RAID, SAN, EBS
● Benefit: simple architecture
● Challenges
○ Single DB app does all processing
○ Scaling (**TidalScale)

[Diagram: one DB application over its DB storage]

Examples: MySQL, PostgreSQL, MS SQL Server, etc.
Classical MPP route

● Static allocation / coarse upgrade path
● Storage variants
○ RAID, NAS, EBS
○ Specialized file systems
● Benefit: parallelism
● Challenges:
○ Vendor lock-in
○ Complexity of multi-node coordination
○ $$

[Diagram: multiple coordinated DB application + DB storage nodes]

Examples: Teradata, Oracle RAC, IBM, Vertica; newer: Redshift, Postgres-XL
“Hybrid” MPP route

● Static allocation / coarse upgrade path
● Data distributed among multiple independent DB instances
● Benefits:
○ Simpler than classical MPP
● Challenges:
○ Multi-shard transactions difficult
○ Data still tied to node

[Diagram: multiple independent DB application + DB storage nodes]

Examples: Citus Data (Postgres), Greenplum, Postgres FDW + PMPP, other sharded databases
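In these sharded designs, each row is routed to one independent DB instance by a distribution key, which is also why multi-shard transactions are hard: a write touching two keys may span two independent databases. A minimal sketch of hash-based routing (toy code, names hypothetical, not any vendor's implementation):

```python
import hashlib

NUM_SHARDS = 4  # one independent DB instance per shard

def shard_for(key: str) -> int:
    """Route a row to a shard by hashing its distribution key."""
    digest = hashlib.sha1(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Rows with the same distribution key always land on the same shard,
# so single-shard transactions stay local to one DB instance...
assert shard_for("order:42") == shard_for("order:42")

# ...but a transaction over two different keys may hit two shards,
# requiring cross-database coordination (e.g., two-phase commit).
a, b = shard_for("order:42"), shard_for("order:7")
```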
Many other approaches...

• AWS Redshift Spectrum (ParAccel) – MPP over local storage + query S3 data in place
• Apache Kudu (Cloudera) – Hadoop ecosystem; Impala, Spark
• Snowflake Computing – dynamic MPP over S3 buckets in AWS
• OpenStack Swift “Storlets” project (Scoop)
Growing your database: SkyhookDB

[Diagram: a DB application with a single CPU over plain DB storage, versus a DB application whose storage layer (the Ceph Distributed Object Store) contains many CPUs]
Elasticity: SkyhookDB

[Diagram: a single PostgreSQL node over the Ceph Distributed Object Store; the pool of storage-side CPUs grows and shrinks with the store]
SkyhookDB Guiding Principles

• Leverage and extend storage capabilities
– Processing++ {elasticity*, indexing, stats, local optimizations}
• Single DB node architecture
– Avoid the complexity of MPP, still get the parallelism
• Inherit trust by building only upon trusted software
– PostgreSQL, Ceph
• Keep the approach generic enough for other applications
– Allows for application-specific customizations

*marketing benefits
Key features enabling elasticity

1. Intrinsic elasticity of the storage system
– Distributed storage systems are dynamic!
2. Repartition data dynamically
– Subdivide data partitions for elasticity and load balancing/skew
3. Scale-out I/O and compute
– Remote data processing + metadata management
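Feature 2 can be pictured as taking a hot partition and subdividing its rows into smaller sub-partitions that can be placed on different storage nodes. A toy sketch of such a split (illustrative only, not SkyhookDB code):

```python
def split_partition(rows, ways=2):
    """Subdivide one partition's rows into `ways` sub-partitions
    round-robin, so each can be placed on a different OSD to
    spread load or absorb skew."""
    parts = [[] for _ in range(ways)]
    for i, row in enumerate(rows):
        parts[i % ways].append(row)
    return parts

rows = list(range(10))
left, right = split_partition(rows)
assert sorted(left + right) == rows  # no rows lost or duplicated
```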
SkyhookDB Architecture

[Diagram]
Client: DBMS application + PostgreSQL Foreign Data Wrapper (repos: skyhook-db, skyhook-fdw)
Ceph Extensions (repo: skyhook-kraken), exposed through the Storage Access Layer API
Ceph Distributed Object Store: servers, each running an OS and an Object Storage Device holding a set of objects
Object Storage Device (OSD): Ceph

● OSDs store and manage objects
○ Database table partitions
● Objects are extensible
○ Remote filter, projection
● OSDs have an embedded KV store (RocksDB)
○ Query-able metadata
● ALL data, metadata, and indexes stored in objects

[Diagram: a Ceph OSD containing RocksDB and a set of objects]
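"Objects are extensible" refers to Ceph object classes: custom read/write methods registered with the OSD that execute next to the data. The effect of a remote filter plus projection can be sketched as follows (a toy model of the idea, not the Ceph object-class API):

```python
# Each object holds one table partition as a list of row dicts.
partition = [
    {"l_orderkey": 1, "l_extendedprice": 90000.0, "l_comment": "furiously"},
    {"l_orderkey": 2, "l_extendedprice": 50000.0, "l_comment": "quietly"},
]

def object_read(rows, predicate, projection):
    """Runs inside the storage server: only matching rows, and only
    the requested columns, cross the network back to the client."""
    return [{c: r[c] for c in projection} for r in rows if predicate(r)]

out = object_read(partition,
                  lambda r: r["l_extendedprice"] > 71000.0,
                  ["l_orderkey"])
assert out == [{"l_orderkey": 1}]
```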
Design Details

[Diagram: a DB table's rows are divided into N partitions stored as object-1 ... object-n, spread across Ceph OSDs; each OSD pairs object storage with RocksDB]

Ceph OSD query-able metadata:
● Index values
● Counts
● Min/max/size
● Stats, etc.
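Because each OSD keeps per-object metadata such as min/max values in its embedded KV store, a range query can skip objects that cannot contain matching rows. A toy sketch of that pruning (metadata layout hypothetical):

```python
# Per-object metadata as it might be kept in the OSD's KV store.
object_meta = {
    "object-1": {"min": 1000.0, "max": 60000.0},
    "object-2": {"min": 65000.0, "max": 104000.0},
}

def objects_to_scan(meta, lo):
    """For a predicate `col > lo`, return only the objects whose
    max value could possibly satisfy it; the rest are never read."""
    return [name for name, m in meta.items() if m["max"] > lo]

# For l_extendedprice > 71000.0, object-1 (max 60000) is skipped.
assert objects_to_scan(object_meta, 71000.0) == ["object-2"]
```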
DB-specific customizations

• Object functionality
– SQL processing (pushdowns)
– Complex functions (application specific)
– Repartition/move (copy partial rows)
• Collection-level functionality
– Indexing at object or collection level
– IO batching, request reordering (local state)
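The "IO batching, request reordering" idea can be pictured as grouping pending reads by target object, with offsets sorted, so each object is read in one sequential pass rather than many scattered ones. A toy sketch (illustrative only):

```python
from collections import defaultdict

def batch_requests(requests):
    """Group pending (object, offset) reads by object and sort the
    offsets, so each object is serviced in one ordered pass."""
    by_obj = defaultdict(list)
    for obj, off in requests:
        by_obj[obj].append(off)
    return {obj: sorted(offs) for obj, offs in by_obj.items()}

reqs = [("object-2", 512), ("object-1", 0), ("object-2", 128)]
assert batch_requests(reqs) == {"object-1": [0], "object-2": [128, 512]}
```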
Breadth-first approach

• Tested prototype functionality at scale
• Try to ensure scalability benefits are clear
– IO, CPU
– Collection-level optimizations possible
– Server-local runtime optimizations
• Many experiments for each aspect
– Pushdown processing, indexing, repartition/split
But Does it Scale?

• Experiments across many dimensions
– Data set size
– Number of machines
– Number of objects per dataset
– Repartition + new query time
– Index creation per object
– Number of concurrent users (not yet tested)
Experimental Setup

• Dataset:
– TPC-H lineitem table, 1 billion rows, ~140 GB
– Ceph: 10K objects, ~4 MB each (each with index)
• Queries:
– Range query, regex query (LIKE), point query
• Machines:
– CloudLab: 20 cores, 500 GB SSDs, 10 GbE
• Parallelism: 12/node IO dispatch
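"12/node IO dispatch" means the client keeps up to 12 object reads in flight per storage node. A minimal thread-pool sketch of that dispatch pattern (names hypothetical, not SkyhookDB code):

```python
from concurrent.futures import ThreadPoolExecutor

DISPATCH_PER_NODE = 12  # concurrent in-flight reads per storage node

def read_object(name):
    """Stand-in for a remote object read with pushdown processing."""
    return f"rows-from-{name}"

objects = [f"object-{i}" for i in range(100)]
with ThreadPoolExecutor(max_workers=DISPATCH_PER_NODE) as pool:
    # map() preserves input order while the pool bounds concurrency.
    results = list(pool.map(read_object, objects))

assert len(results) == 100
assert results[0] == "rows-from-object-0"
```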
Queries

Qa: Count query with 10% selectivity:
SELECT count(*) FROM lineitem1B WHERE l_extendedprice > 71000.0

Qb: Range query with 10% selectivity:
SELECT * FROM lineitem1B WHERE l_extendedprice > 71000.0

Qd: Point query (unique row), issued with and without index:
SELECT l_extendedprice FROM lineitem1B WHERE l_orderkey=5 AND l_linenumber=3

Qf: Regex query with 10% selectivity:
SELECT * FROM lineitem1B WHERE l_comment ILIKE '%uriously%'
Parallel Query Plan in Postgres

EXPLAIN SELECT l_extendedprice FROM lineitem1B WHERE l_orderkey=5 AND l_linenumber=3;

                                     QUERY PLAN
------------------------------------------------------------------------------------
 Gather  (cost=1000.00..21420167.15 rows=439 width=8)
   Workers Planned: 9
   ->  Parallel Seq Scan on lineitem1bssd  (cost=0.00..21419123.25 rows=49 width=8)
         Filter: ((l_orderkey = 5) AND (l_linenumber = 3))
Initial comparison with single-node Postgres (selectivity=10%)

[Figure: two plots of query time vs. number of storage servers (OSDs); left panel: filter l_extendedprice > 71000; right panel: regex l_comment LIKE '%uriously%']
What’s Next and Challenges

• More functionality:
– Joins in storage (critical for analytics)
– Multi-shard transactions (several options here)
• Collection-level optimizations
– IO batching and indexing
• Application-specific customizations
– Physics or genomics queries?
• Cost-based elasticity policies
– Minimize time or impact to grow/shrink
Acknowledgements

• Noah Watkins
• CROSS Industry Board members
• Other SRL/CROSS: Ivo, Carlos, Shel, Michael
• More info:
– https://sites.google.com/site/skyhookdb/
– https://cross.ucsc.edu
– [email protected]