CS 839: Design the Next-Generation Database Lecture 22:...

33
Xiangyao Yu 4/9/2020 CS 839: Design the Next-Generation Database Lecture 22: Snowflake 1

Transcript of CS 839: Design the Next-Generation Database Lecture 22:...

Page 1: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

Xiangyao Yu4/9/2020

CS 839: Design the Next-Generation DatabaseLecture 22: Snowflake

1

Page 2: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

Announcements

2

Course project• Submission deadline: Apr. 23• Peer review: Apr. 23 – Apr. 30• Presentation: Apr. 28 & 30• Submission deadline: May 4

Will create google sheet for presentation signup

Page 3: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

Discussion Highlights

3

Optimal design that combines the advantages? • Athena with instances pre-running• Hybrid instance store and S3; decide caching based on the workload• High-quality code compilers• Heterogeneous system that combines all the existing systems together

Optimization opportunities for serverless databases?• Optimize resource sharing among users (e.g., cache, computation)• SW/HW codesign • Heterogeneous hardware and storage (e.g., different function on different hardware)• Scale computation and storage on demand• Keep instances pre-warmed to reduce cold starts

Cloud databases benefit from new hardware?• Using GPU• SmartSSD• RDMA and SmartNIC (e.g., shared cache in SSD, computation offloading)• Persistent memory to improve bandwidth and aid fast restarts

Page 4: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

Today’s Paper

4SIGMOD 2016

Page 5: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

On-Premises vs. Cloud

5

On-premises• Fixed and limited hardware

resources

Cloud• Virtually infinite computation & storage• Pay-as-you-go

CPU

Mem

HDD

CPU

Mem

HDD

CPU

Mem

HDD

CPU

Mem

HDD

CPU

Mem

HDD

CPU

Mem

HDD

… …… …

Page 6: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

Shared Nothing – Advantages

6

CPU

Mem

VM

HDD

CPU

Mem

VM

HDD

CPU

Mem

VM

HDD

CPU

Mem

VM

HDD

Scalability: horizontal scaling• Scales well for star-schema

queriesDimension Table

Fact Table

Page 7: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

Shared Nothing – Disadvantages

7

Workload A Workload B

More CPU intensive Less CPU intensive

CPU

Mem

VM

HDD

CPU

Mem

VM

HDD

CPU

Mem

VM

HDD

CPU

Mem

VM

HDD

Heterogeneous workload

Page 8: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

Shared Nothing – Disadvantages

8

Heterogeneous workloadMembership changes• Add a node: data redistribution

CPU

Mem

VM

HDD

CPU

Mem

VM

HDD

CPU

Mem

VM

HDD

CPU

Mem

VM

HDD

CPU

Mem

VM

HDD

Page 9: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

Shared Nothing – Disadvantages

9

Heterogeneous workloadMembership changes• Add a node: data redistribution• Delete a node: fault tolerance

CPU

Mem

VM

HDD

CPU

Mem

VM

HDD

CPU

Mem

VM

HDD

CPU

Mem

VM

HDD

CPU

Mem

VM

HDD

Page 10: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

Shared Nothing – Disadvantages

10

Heterogeneous workloadMembership changesOnline upgrade• Similar to membership change

CPU

Mem

VM

HDD

CPU

Mem

VM

HDD

CPU

Mem

VM

HDD

CPU

Mem

VM

HDD

Page 11: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

Web User Interface

Serverless (similar to Athena) 11

Page 12: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

Multi-Cluster Shared-Data Architecture

12

Control layer

Compute layer

Storage layer

Page 13: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

Architecture – Storage

13

Data format: PAX

Data horizontally partitioned into immutable files (~16MB)• An update = remove and add an entire file• Queries download file headers and columns they are interested in

Intermediate data spilling to S3

Page 14: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

Architecture – Virtual Warehouse

14

T-Shirt sizes: XS to 4XL

Elasticity and Isolation• Created, destroyed, or resized at any point (may shutdown all VWs)• User may create multiple VWs for multiple queries

Workload A Workload B

More CPU intensive Less CPU intensive

Large VW Small VW

Page 15: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

Architecture – Virtual Warehouse

15

Local caching• S3 data can be cached in local memory or disk

CPU CPU CPU

HDD HDD HDD HDD HDD

Page 16: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

Architecture – Virtual Warehouse

16

Local caching• S3 data can be cached in local memory or disk

CPU CPU CPU

HDD HDD HDD HDD HDD

Consistent hashing• When the hash table (n keys and

m slots) is resized, only n/m keys need to be remapped

Page 17: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

Architecture – Virtual Warehouse

17

Local caching• S3 data can be cached in local memory or disk

CPU CPU CPU

HDD HDD HDD HDD HDD

CPUConsistent hashing• When the hash table (n keys and

m slots) is resized, only n/m keys need to be remapped

Page 18: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

Architecture – Virtual Warehouse

18

Local caching• S3 data can be cached in local memory or disk

CPU CPU CPU

HDD HDD HDD HDD HDD

CPUConsistent hashing• When the hash table (n keys and

m slots) is resized, only n/m keys need to be remapped• When a VW is resized, no data

shuffle required; rely on LRU to replace cache content

Page 19: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

Architecture – Virtual Warehouse

19

Local caching• S3 data can be cached in local memory or disk

CPU CPU CPU

HDD HDD HDD HDD HDD

CPUConsistent hashing• When the hash table (n keys and

m slots) is resized, only n/m keys need to be remapped• When a VW is resized, no data

shuffle required; rely on LRU to replace cache content

File stealing to tolerate skew

Page 20: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

Architecture – Virtual Warehouse

20

Execution engine• Columnar: SIMD, compression• Vectorized: process a group of elements at a time• Push-based

Page 21: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

Architecture – Cloud Services

21

Multi-tenant layer shared across multiple users

Query optimizationConcurrency control • Isolation: snapshot isolation (SI)• S3 data is immutable, update entire files with MVCC• Versioned snapshots used for time traveling

Pruning• Snowflake has no index (same in Athena, Presto, Hive, etc)• Min-max based pruning: store min and max values for a data block

Page 22: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

High Availability and Fault Tolerance

22

Stateless services

Page 23: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

High Availability and Fault Tolerance

23

Replicated metadata

Page 24: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

High Availability and Fault Tolerance

24

One node failure in VW • Re-execute with failed node

immediately replaced• Re-execute with reduced

number of nodesWhole AZ failure

• Re-execute by re-provisioning a new VW

Hot-standby nodes

Page 25: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

High Availability and Fault Tolerance

25

S3 is highly available and durable

Page 26: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

Online Upgrade

26

Deploy new versions of services and VWs

Page 27: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

Semi-Structured Data

27

Extensible Markup Language (XML) JavaScript Object Notation(JSON)

Page 28: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

Extract-Transform-Load (ETL)

Transform (e.g., converting to column format) adds latency to the system

28

Page 29: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

ETL vs. ELT

29Picture from https://aws.amazon.com/blogs/big-data/etl-and-elt-design-patterns-for-lake-house-architecture-using-amazon-redshift-part-1/

Page 30: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

Optimization for Semi-Structured DataAutomatic type inferenceHybrid columnar format• Frequently paths are detected, projected out, and stored in separate

columns in table file (typed and compressed)• Collect metadata on these columns for optimization (e.g., pruning)

30

Page 31: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

SummarySnowflake vs shared nothing • Heterogeneous workload• Membership changes

Snowflake vs. Redshift (Spectrum)

Snowflake vs. Athena

Snowflake vs. Presto/Hive/Vertica

31

Page 32: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

Snowflake – Q/A Storage system better than S3 (e.g., allow updates)Row store for transaction processing?Server-side cursor? Min-max based pruning replacing indices?Other systems similar to Snowflake? Pay-as-you-go?Push vs. pull?Pruning requires sorting?Snowflake autoscaling compute based on demand?

32

Page 33: CS 839: Design the Next-Generation Database Lecture 22: Snowflakepages.cs.wisc.edu/~yxy/cs839-s20/slides/L22.pdf · 2020. 4. 9. · Snowflake –Q/A Storage system better than S3

Group DiscussionHow far away is Snowflake from the “optimal design” that you discussed last time?• High-quality code compilers• Athena with instances pre-running• Hybrid instance store and S3; decide caching based on the workload• Heterogeneous system that combines all the existing systems together

Can you come up with a nice way of combining cloud data warehousing (e.g., Snowflake) with cloud transaction processing (e.g., Aurora)?

33