The Snowflake Elastic Data Warehouse · 2016-06-02
Who We Are
•Founded 2012
•Mission: Build an enterprise data warehouse as a cloud service
•HQ in downtown San Mateo
•130+ employees, ~50 engs (and hiring!)
Our Product
•The Snowflake Elastic Data Warehouse
  •Multi-tenant, transactional, secure, highly scalable, elastic
  •Designed from scratch for the cloud
  •Built to provide a true service experience
•Runs in the Amazon cloud (AWS)
•Millions of queries per day over petabytes of data
•100+ active customers, growing fast
Motivation
Some history
•Late 2012…
•SQL-on-Hadoop is all the hype…
•Redshift isn’t around yet...
•Let’s not look around. Let’s look up...
What is that Cloud thing?
Cloud: Your Next Computer
•New computing platform
•New operating system
•Elasticity in multiple dimensions
•Infinite* scalability
•SaaS delivery model
•The data hub for the world
Cloud and Databases?
•Can it work?
  •Sure! Let’s deploy MySQL on EC2!
•Can it work well?
  •Elasticity?
  •Resilient to hardware failures?
  •Easy to use?
•Hmmm....
Shared-nothing Architecture
•Tables are horizontally partitioned across nodes
•Scales well for star-schema queries
•Requires a lot of tuning
•Dominant architecture in data warehousing
  •Teradata, Vertica, Netezza…
The Perils of Coupling
•Shared-nothing couples compute and storage resources
•Elastic?
  •Resizing requires redistributing data
  •System often unavailable
  •Cannot disable unused resources → no pay-per-use
  •Impossible to provision correctly
•Homogeneous resources vs. heterogeneous workload
  •Bulk loading, reporting, exploratory analysis
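The "resizing requires redistributing data" point can be illustrated with a toy model. This is a minimal sketch, assuming rows are placed on nodes by a stable hash of the key modulo the node count; that placement scheme and the function names `node_for` and `fraction_moved` are assumptions for illustration, not something the slides specify.

```python
import hashlib

def node_for(key: str, node_count: int) -> int:
    """Assign a row key to a node by stable hash modulo node count (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count

def fraction_moved(keys, old_nodes: int, new_nodes: int) -> float:
    """Fraction of rows whose owning node changes when the cluster is resized."""
    moved = sum(1 for k in keys if node_for(k, old_nodes) != node_for(k, new_nodes))
    return moved / len(keys)

keys = [f"row-{i}" for i in range(10_000)]
print(f"4 -> 5 nodes: {fraction_moved(keys, 4, 5):.0%} of rows move")
```

Under this scheme a key keeps its node only when its hash gives the same remainder for both node counts, so adding a single node to a four-node cluster relocates most rows. When storage is decoupled from compute, the same resize changes which nodes read the data, not where the data lives.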
Our Vision for a Cloud Data Warehouse
•Data warehouse as a service
  •No infrastructure to manage, no knobs to tune
•Multidimensional elasticity
  •On-demand scalability: data, queries, users
•All business data
  •Native support for relational + semi-structured data
Architecture
Multi-cluster Shared-data Architecture
•All data in one place
•Independently scale every layer
•Every virtual warehouse can access all data
[Figure: Cloud Services (authentication & access control, infrastructure manager, optimizer, transaction manager, security, metadata) reached via REST (JDBC/ODBC/Python); four Virtual Warehouses, each with its own cache; Data Storage]
Data Storage Layer
•Stores table data and query results
•Uses Amazon S3
  •Object store (key-value) with HTTP(S) interface
  •High availability, extreme durability (11 nines)
•Some important differences w.r.t. local disks
  •Performance (sure…)
  •No update-in-place, objects must be written in full
•S3-optimized file format and concurrency control
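The "no update-in-place" constraint shapes how table files have to be handled. The sketch below uses a toy in-memory `ObjectStore` (a hypothetical stand-in, not the S3 API and not Snowflake's file format) to show the resulting pattern: a change to a file is expressed as writing a complete new object, while the old one stays as it was.

```python
import uuid

class ObjectStore:
    """Toy stand-in for an S3-like store: whole-object put/get, no partial writes."""
    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        key = uuid.uuid4().hex          # every write creates a brand-new object
        self._objects[key] = bytes(data)
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]

def rewrite_with_change(store: ObjectStore, key: str, old: bytes, new: bytes) -> str:
    """'Update' a file by reading it, changing it in memory, and writing a new object."""
    updated = store.get(key).replace(old, new)
    return store.put(updated)           # the original object is left untouched

store = ObjectStore()
v1 = store.put(b"id=1,city=SF\nid=2,city=NYC\n")
v2 = rewrite_with_change(store, v1, b"city=NYC", b"city=LA")
```

After this runs, both `v1` and `v2` are readable. Deciding which of them belongs to the current table is a metadata question, which is one reason concurrency control has to be designed around the object store.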
Other Data
•S3 also used for temp data and query results
  •Arbitrarily large queries, never run out of disk space
  •Retrieve and reuse previous query results
•Metadata stored in a transactional key-value store (not S3)
  •Mapping of S3 objects to tables
  •Optimizer statistics, lock tables, transaction logs etc.
  •Part of Cloud Services layer (see later)
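A rough picture of the "mapping of S3 objects to tables": the sketch below keeps, for each table, a history of versions, each version being a tuple of object keys. The `Catalog` class and its methods are hypothetical, and the slides do not say how the real metadata store is organized. The point is only that, with immutable objects, reading an older version or creating a copy of a table can be a metadata operation.

```python
class Catalog:
    """Toy metadata store: table name -> list of versions, each a tuple of object keys."""
    def __init__(self):
        self._versions = {}

    def commit(self, table: str, object_keys) -> int:
        """Record a new table version and return its version number."""
        history = self._versions.setdefault(table, [])
        history.append(tuple(object_keys))
        return len(history) - 1

    def files(self, table: str, version: int = -1):
        """Object keys for a table, latest version by default."""
        return self._versions[table][version]

    def clone(self, source: str, target: str) -> None:
        """Create a new table that points at the same objects; no data is copied."""
        self._versions[target] = [self.files(source)]

catalog = Catalog()
catalog.commit("orders", ["obj-a", "obj-b"])
catalog.commit("orders", ["obj-a", "obj-c"])   # obj-b replaced by obj-c
catalog.clone("orders", "orders_dev")
```

Here `catalog.files("orders", 0)` still resolves to the first set of objects, and `orders_dev` shares objects with `orders` without any object being rewritten.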
Virtual Warehouse
•Cluster of EC2 instances
•Pure compute resources
  •Created, destroyed, resized on demand
  •Users may run multiple VWs at the same time
  •Shared data access with isolated performance
  •Users may shut down all VWs when they have nothing to run
•Worker nodes are ephemeral
•Each worker node maintains local table cache
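The local table cache can be sketched as a small read-through cache in front of remote storage. The least-recently-used eviction policy, the `LocalCache` class, and the `fetch` callable are all assumptions for illustration; the slides only state that each worker keeps a local cache and that workers are ephemeral.

```python
from collections import OrderedDict

class LocalCache:
    """Toy least-recently-used cache of remote objects on a worker node."""
    def __init__(self, fetch, capacity: int):
        self._fetch = fetch              # callable that reads from remote storage
        self._capacity = capacity
        self._entries = OrderedDict()
        self.remote_reads = 0

    def get(self, key: str) -> bytes:
        if key in self._entries:
            self._entries.move_to_end(key)      # mark as most recently used
            return self._entries[key]
        self.remote_reads += 1
        data = self._fetch(key)
        self._entries[key] = data
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)   # evict least recently used
        return data

remote = {"f1": b"aaa", "f2": b"bbb", "f3": b"ccc"}
cache = LocalCache(remote.__getitem__, capacity=2)
for key in ["f1", "f2", "f1", "f3", "f2"]:
    cache.get(key)
```

Because the cache holds only copies of data that also lives in remote storage, losing a worker loses no data; a replacement node simply starts cold and pays for extra remote reads.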
Cloud Services
•Collection of services
  •Access control, query optimizer, transaction manager etc.
•Multi-tenant and always on
•Replicated for availability and scalability
•Hard state stored in transactional key-value store
•Standard interfaces and feature-rich web UI
•Focus on ease-of-use and service experience
Feature highlights
Multi-dimensional elasticity
•Elastic scaling for
•Storage
•Compute
•Concurrency
•All thanks to decoupling of storage and compute!
[Figure labels: Biz Dev, Sales, Finance, ETL & Data Loading, Test/Dev, Marketing, Databases]
Elastic Storage
•S3: Low-cost, fully replicated, secure and resilient
•Infinite* capacity
•Pay for space/time you use
•All data available to everyone
  •Full transactional consistency
•Requires elastic processing engine
Elastic compute and concurrency
•Optimize Virtual Warehouses for workloads
  •Small VW for continuous loading
  •X-Large VW for once-a-week report
•Optimize for concurrent use
  •Different VWs for different users
  •Access to the same data, no performance interference
  •Automatic scaling for high-concurrency scenarios
•Pay for what you use
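The economics behind "pay for what you use" can be worked through with simple arithmetic. The sketch below assumes perfectly linear speedup and a flat price per node-hour; both are idealizations, and the workload size and price are made-up numbers, not Snowflake pricing.

```python
def run_cost(nodes: int, single_node_hours: float, price_per_node_hour: float):
    """Return (wall-clock hours, total cost) assuming perfectly linear speedup."""
    wall_clock = single_node_hours / nodes
    return wall_clock, nodes * wall_clock * price_per_node_hour

for nodes in (4, 16, 64):
    hours, cost = run_cost(nodes, single_node_hours=64.0, price_per_node_hour=2.0)
    print(f"{nodes:>2} nodes: {hours:5.2f} h, cost {cost:.2f}")
```

Under these assumptions the node count cancels out of the cost, so a larger warehouse buys a shorter wait at no extra charge, provided it can be shut down as soon as the job finishes. Real workloads rarely scale perfectly, so this is an upper bound on the benefit.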
New usage scenarios
•“Cheaper than walking to the DBA”
  •Asking DBA for permission takes 10 minutes.
  •Time => Money => Compute (if elastic!)
•“It’s like a Porsche for the weekend”
  •“I use a 64-node machine for my weekly report!”
•No more: “Don’t run queries! We’re loading new data!”
  •No resource/performance interference. No data marts!
•“No tuning, it just works”
  •“I lost 20 pounds and reduced smoking”
Other features
•Multi-AZ deployment
•Continuous availability
•Always up-to-date
•Security (SOC-2, HIPAA)
•Federated authentication & MFA
•Access control
•Automated backup
•Automated scalability
•Time travel
•Instant cloning
•Optimized semi-structured storage and processing
•Matching relational performance
•JavaScript UDFs
•ODBC, JDBC, NodeJS, Python, R, Spark, …
•Tableau, Informatica, Looker…
Lessons learned
•Decoupling storage and compute a game changer for users
  •Maps onto cloud very well
  •Allows a novel multi-cluster, shared-data architecture
  •Fewer data silos and easier data access
  •More flexible use scenarios
  •Scale costs for different layers independently
•Semi-structured extensions were a bigger hit than expected
•SaaS model helps both users and us
•Users love “no tuning” aspect
Ongoing Challenges
•SaaS and multi-tenancy remain biggest challenges
  •Hundreds of concurrent users, some of whom do weird things
  •Metadata layer is becoming huge
  •Failure handling
•Security
  •There is more to running a secure service than “encrypt everything”
•Lots of work left to do
  •SQL functionality and performance improvements
  •Self-service model