SNOWFLAKE BEST PRACTICES · 2019-12-01 · SUSPEND/RESUME Auto Suspend/Resume • On-demand,...

© 2019 Snowflake Inc. All Rights Reserved

SNOWFLAKE BEST PRACTICES

LOUIS LEE, SALES ENGINEER

CLIVE ASTBURY, REGIONAL SALES ENGINEERING MANAGER



AGENDA


Virtual Warehouse Management

Cost Management

Network Security Policies

User Authentication

Role Management

Snowflake Community


VIRTUAL WAREHOUSE MANAGEMENT


Considerations
• Key SLAs and challenges with meeting SLAs
• Data load and transformation workloads
• Reporting, ad hoc analysis, and data science workloads
• Cost management

Topics
• Sizes and approach to right-sizing
• Scaling up vs. scaling out
• Automating suspend/resume, sizing, and multi-cluster scale-out


WAREHOUSE SIZES

Size       Servers per Cluster   Credits per Hour   Notes
X-Small    1                     1                  Default size when created using CREATE WAREHOUSE.
Small      2                     2
Medium     4                     4
Large      8                     8
X-Large    16                    16                 Default size for warehouses created in the web UI.
2X-Large   32                    32
3X-Large   64                    64
4X-Large   128                   128
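The credit rates in the table translate directly into cost. The sketch below is a minimal model, assuming per-second billing with a 60-second minimum each time a warehouse starts or resumes, and billing per running cluster; `credits_for_run` is a hypothetical helper, not part of any Snowflake API.

```python
# Credits per hour for a single-cluster warehouse, copied from the table above.
CREDITS_PER_HOUR = {
    "X-Small": 1, "Small": 2, "Medium": 4, "Large": 8,
    "X-Large": 16, "2X-Large": 32, "3X-Large": 64, "4X-Large": 128,
}

MINIMUM_BILLED_SECONDS = 60  # minimum charge each time the warehouse starts/resumes


def credits_for_run(size: str, seconds: float, clusters: int = 1) -> float:
    """Credits consumed by one continuous run (start/resume until suspend).

    Modeled as per-second, per-cluster billing with a 60-second minimum.
    """
    billed_seconds = max(MINIMUM_BILLED_SECONDS, seconds)
    return CREDITS_PER_HOUR[size] * clusters * billed_seconds / 3600


print(credits_for_run("X-Small", 3600))  # 1.0: one hour on the smallest size
print(credits_for_run("Large", 450))     # 1.0: 7.5 minutes on 8 servers
print(credits_for_run("Medium", 10))     # a 10-second run is billed as 60 seconds
```

The second call shows why larger sizes are not automatically more expensive: the same credit buys 8x the servers for one eighth of the time.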


SCALE UP - LOADING 1BN RECORDS

Doubling the number of servers halves the run-time...

… but you pay per-server, per-second of compute...

… so you can get your answer 8x faster for the same cost.
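The "8x faster for the same cost" statement follows from idealized arithmetic: each size step doubles both the servers and the hourly credit rate, so if the load parallelizes perfectly (an assumption; real loads scale less than linearly), run-time falls exactly as fast as the rate rises. A minimal sketch, where the 8-hour X-Small baseline is a hypothetical number rather than a measurement from the slide and `scale_up` is a hypothetical helper:

```python
# Servers (= credits per hour) for each size, from the sizes table.
SERVERS = {"X-Small": 1, "Small": 2, "Medium": 4, "Large": 8, "X-Large": 16}


def scale_up(baseline_hours: float, size: str) -> tuple[float, float]:
    """Return (run_time_hours, credits) for a job measured on X-Small.

    Assumes perfect linear scaling (doubling servers halves run-time) and
    ignores the 60-second minimum, which only matters for very short runs.
    """
    servers = SERVERS[size]
    run_time = baseline_hours / servers
    credits = servers * run_time  # 1 credit per server per hour
    return run_time, credits


for size in SERVERS:
    hours, credits = scale_up(8.0, size)
    print(f"{size:<8} {hours:5.2f} h {credits:5.1f} credits")
```

Under these assumptions X-Small takes 8 hours and Large takes 1 hour, and both consume 8 credits.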

SCALE OUT - MULTI-CLUSTER WAREHOUSES


• Scale up (single X-Large cluster): 4x increase in servers

• Scale out (multi-cluster): 4x increase in servers (at peak load)

• Both are 16 servers, in different configurations

• Multi-cluster gives better results, and is also half the cost of the X-Large single cluster
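One way to see why the multi-cluster configuration can cost less is to compare credits over a day whose peak is short. The sketch assumes the comparison is a Medium multi-cluster warehouse (up to 4 clusters of 4 servers) against a single X-Large (16 servers), and it uses a hypothetical hourly concurrency profile. The `min_clusters`/`max_clusters` arguments mirror the warehouse's MIN_CLUSTER_COUNT and MAX_CLUSTER_COUNT settings, but the cluster-count rule is a simplification: Snowflake's actual scaling policies start and stop clusters based on query load and queuing.

```python
import math

MEDIUM_CREDITS_PER_HOUR = 4    # 4 servers per cluster (sizes table)
XLARGE_CREDITS_PER_HOUR = 16   # 16 servers in one cluster (sizes table)


def multi_cluster_credits(concurrency: list[int], queries_per_cluster: int,
                          min_clusters: int = 1, max_clusters: int = 4) -> int:
    """Credits for a Medium multi-cluster warehouse over hourly samples.

    Simplification: each hour runs just enough clusters for that hour's
    concurrency, clamped to [min_clusters, max_clusters].
    """
    total = 0
    for queries in concurrency:
        needed = math.ceil(queries / queries_per_cluster)
        clusters = max(min_clusters, min(max_clusters, needed))
        total += clusters * MEDIUM_CREDITS_PER_HOUR
    return total


def single_cluster_credits(hours: int) -> int:
    """An X-Large cluster sized for the peak bills 16 credits every hour."""
    return hours * XLARGE_CREDITS_PER_HOUR


# Hypothetical concurrent queries per hour over an 8-hour day.
demand = [5, 8, 12, 30, 32, 15, 6, 4]
print(multi_cluster_credits(demand, queries_per_cluster=8))  # 64
print(single_cluster_credits(len(demand)))                   # 128
```

Both configurations reach 16 servers at peak, but the multi-cluster warehouse only pays for the extra clusters during the two busy hours.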

ALL TOGETHER - SCALE, ELASTICITY, COST

[Figure: the same workload run on S, M, and MM warehouse configurations, plotted against time]

All three examples contain the same amount of work.

Using scale up and scale out, total run-time is significantly reduced.

You pay per-server, per-second, so they all cost the same.


AUTOMATING SUSPEND/RESUME

Auto Suspend/Resume
• On-demand, end-user workloads
• The auto-suspend idle-time setting should take data caching into account: suspending a warehouse drops its local data cache, so suspending too aggressively can slow the first queries after it resumes

Programmatic Suspend/Resume
• Scheduled jobs where process orchestration is controlled
• Programmatically resume at the start of processing and suspend at the end of processing to avoid idle-time costs
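For scheduled jobs, the resume-at-start / suspend-at-end pattern can be wrapped so that the suspend always runs, even when a step fails. A minimal sketch, assuming `execute` is any callable that runs one SQL statement (for example a DB-API cursor's `execute` method) and that the warehouse name is a trusted identifier; `warehouse_running` and `ETL_WH` are hypothetical names. For on-demand workloads the equivalent is configuration rather than code: the warehouse's `AUTO_SUSPEND` (in seconds) and `AUTO_RESUME = TRUE` settings.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def warehouse_running(execute: Callable[[str], object], warehouse: str) -> Iterator[None]:
    """Resume `warehouse` for the duration of a job, then suspend it.

    The name is interpolated directly into SQL, so it must be a trusted
    identifier, never user input.
    """
    execute(f"ALTER WAREHOUSE {warehouse} RESUME IF SUSPENDED")
    try:
        yield
    finally:
        # Runs on success and on failure, so the warehouse is not left idling.
        execute(f"ALTER WAREHOUSE {warehouse} SUSPEND")


# Usage with a stand-in executor that just records the statements.
executed: list[str] = []
with warehouse_running(executed.append, "ETL_WH"):
    executed.append("-- load and transform steps run here")
print(executed)
```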


COST MANAGEMENT


Considerations
• Compute Costs
• Storage Costs
• Service Costs
• Data Transfer (Egress) Costs
• Monitoring & Alerting

Topics
● Resources Incurring Costs
● Compute
  ○ Viewing Usage
  ○ Resource Monitors
● Storage
  ○ Time Travel & Fail-Safe
  ○ Viewing Usage
● Services
  ○ Non-warehouse compute


RESOURCES INCURRING COSTS

[Diagram: resources in an account that incur costs: Virtual Warehouses; Databases; Schemas; Tables (Permanent, Temp/Transient); Materialized Views; Automatic Clustering; Pipes; Stages (Internal); Service; Cross-Region Extract Egress. Legend: Compute Costs, Storage Costs, Service Costs, Pass-through Costs]


RESOURCE MONITOR

• Align with team-by-team warehouse separation for granular cost governance

• Set at the account level if specific virtual warehouse quotas are not needed

• Leverage tiered triggers with escalating actions (e.g., Notify > Notify > Suspend)

• Enable notifications using the ACCOUNTADMIN role and set an e-mail address
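The tiered-trigger recommendation maps onto the `CREATE RESOURCE MONITOR ... TRIGGERS ON <n> PERCENT DO <action>` syntax. The sketch below builds such a statement and models which actions a given credit usage would have fired; the monitor name, quota, and thresholds are hypothetical, and `fired_actions` is a simplification rather than Snowflake's internal evaluation logic. The monitor is then attached with `ALTER WAREHOUSE <warehouse> SET RESOURCE_MONITOR = <monitor>` for per-team quotas, or `ALTER ACCOUNT SET RESOURCE_MONITOR = <monitor>` at the account level.

```python
from dataclasses import dataclass

# SUSPEND lets running queries finish; SUSPEND_IMMEDIATE cancels them.
VALID_ACTIONS = {"NOTIFY", "SUSPEND", "SUSPEND_IMMEDIATE"}


@dataclass(frozen=True)
class Trigger:
    percent: int   # percentage of the credit quota
    action: str

    def __post_init__(self) -> None:
        if self.action not in VALID_ACTIONS:
            raise ValueError(f"unknown action: {self.action}")


def resource_monitor_ddl(name: str, credit_quota: int, triggers: list[Trigger]) -> str:
    """Build a CREATE RESOURCE MONITOR statement with tiered triggers."""
    clauses = " ".join(f"ON {t.percent} PERCENT DO {t.action}" for t in triggers)
    return f"CREATE RESOURCE MONITOR {name} WITH CREDIT_QUOTA = {credit_quota} TRIGGERS {clauses}"


def fired_actions(credits_used: float, credit_quota: int, triggers: list[Trigger]) -> list[str]:
    """Actions whose thresholds have been reached in the current interval (model only)."""
    used_pct = 100.0 * credits_used / credit_quota
    return [t.action for t in triggers if used_pct >= t.percent]


# Hypothetical monitor for one team's warehouse: Notify > Notify > Suspend.
tiers = [Trigger(75, "NOTIFY"), Trigger(90, "NOTIFY"), Trigger(100, "SUSPEND")]
print(resource_monitor_ddl("analytics_team_rm", 500, tiers))
print(fired_actions(460, 500, tiers))  # ['NOTIFY', 'NOTIFY'] at 92% of quota
```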


STORAGE FUNDAMENTALS


TABLE TYPES

Temporary: Tied to an individual session and persists only for the duration of the session. Used for storing non-permanent, transitory data (e.g., ETL data, session-specific data).

Transient: Specifically designed for transitory data that needs to be maintained beyond each session (in contrast to temporary tables), but does not need the same level of data protection and recovery provided by permanent tables.

Permanent: Designed for data that requires the highest level of data protection and recovery, with both a Time Travel and a Fail-Safe period; this is the default table type.

              Temporary   Transient   Permanent
Time Travel   Yes         Yes         Yes
Fail-Safe     No          No          Yes
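The matrix can be turned into arithmetic: the window during which changed or dropped data still exists is the table's Time Travel retention plus any Fail-Safe period. A minimal sketch, assuming Time Travel of up to 1 day for temporary and transient tables, up to 90 days for permanent tables on Enterprise edition or higher, and a 7-day Fail-Safe for permanent tables only; `recoverable_window` is a hypothetical helper.

```python
# Time Travel limits (days) per table type. PERMANENT allows up to 90 days
# only on Enterprise edition or higher; Standard edition allows up to 1.
MAX_TIME_TRAVEL_DAYS = {"TEMPORARY": 1, "TRANSIENT": 1, "PERMANENT": 90}

# Fail-Safe (days) follows Time Travel and applies to permanent tables only;
# recovery from Fail-Safe goes through Snowflake Support, not the user.
FAIL_SAFE_DAYS = {"TEMPORARY": 0, "TRANSIENT": 0, "PERMANENT": 7}


def recoverable_window(table_type: str, retention_days: int) -> int:
    """Days after a change during which historical data still exists
    (Time Travel retention plus Fail-Safe). A temporary table's history
    also disappears as soon as its session ends."""
    limit = MAX_TIME_TRAVEL_DAYS[table_type]
    if not 0 <= retention_days <= limit:
        raise ValueError(f"{table_type} tables allow 0-{limit} days of Time Travel")
    return retention_days + FAIL_SAFE_DAYS[table_type]


print(recoverable_window("PERMANENT", 1))  # 8
print(recoverable_window("TRANSIENT", 1))  # 1
```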


TIME TRAVEL STORAGE

• High churn can be detected with a ratio such as TIME_TRAVEL_BYTES / ACTIVE_BYTES from the TABLE_STORAGE_METRICS view

• For Enterprise edition (or higher), the retention period can be up to 90 days; verify the retention period on all large or high-churn tables

• Reduce the retention period if data can be regenerated/reloaded and the time/effort to do so is within acceptable boundaries/SLAs

• Use periodic zero-copy cloning (snapshots) instead of Time Travel to provide a longer retention period at discrete points in time (daily, weekly, etc.)

Areas Of Focus
• Dimensional Tables
• Persistent Staging Areas
• Materialized Relationships, Derivations, Other Business Rules
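The churn ratio can be computed directly from TABLE_STORAGE_METRICS results. A minimal sketch: `CHURN_QUERY` selects real columns of the ACCOUNT_USAGE view, while `high_churn_tables`, the threshold of 1.0 (as much Time Travel data as active data), and the sample rows are hypothetical stand-ins for query output.

```python
import math

# Columns from the TABLE_STORAGE_METRICS view in the ACCOUNT_USAGE schema.
CHURN_QUERY = """
SELECT table_catalog, table_schema, table_name, active_bytes, time_travel_bytes
FROM snowflake.account_usage.table_storage_metrics
"""


def churn_ratio(active_bytes: int, time_travel_bytes: int) -> float:
    """TIME_TRAVEL_BYTES / ACTIVE_BYTES; infinite when only history remains."""
    if active_bytes == 0:
        return math.inf if time_travel_bytes > 0 else 0.0
    return time_travel_bytes / active_bytes


def high_churn_tables(rows, threshold: float = 1.0) -> list[tuple[str, float]]:
    """Return (fully qualified name, ratio) for tables at or above `threshold`,
    highest churn first. `rows` are tuples in CHURN_QUERY column order."""
    flagged = []
    for catalog, schema, table, active, time_travel in rows:
        ratio = churn_ratio(active, time_travel)
        if ratio >= threshold:
            flagged.append((f"{catalog}.{schema}.{table}", ratio))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical result rows standing in for the query output.
sample = [
    ("DWH", "MAIN", "DIM_CUSTOMER", 100, 350),
    ("DWH", "MAIN", "FACT_SALES", 1_000, 50),
    ("DWH", "STAGE", "LOAD_ORDERS", 0, 20),
]
print(high_churn_tables(sample))
# [('DWH.STAGE.LOAD_ORDERS', inf), ('DWH.MAIN.DIM_CUSTOMER', 3.5)]
```

Flagged tables are candidates for a shorter retention period (for example `ALTER TABLE <table> SET DATA_RETENTION_TIME_IN_DAYS = 1`) when their data can be reloaded within SLA.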


FAIL-SAFE STORAGE

• Permanent tables follow the full Continuous Data Protection (CDP) lifecycle; temp/transient tables NEVER use Fail-Safe

• Utilize temp tables for session-specific intermediate results in complex data processing workflows

• Temporary tables are dropped (and their storage released) as soon as the session ends

• Utilize transient tables for staging where frequent truncate/reload operations occur

• Consider designating databases/schemas as transient to simplify table creation

Areas Of Focus
• Staging Tables
• Intermediate Result Tables
• Work Areas for Developers, Analysts & Data Scientists
• Reporting Tool Materialized Results
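The guidance above amounts to a small decision rule for choosing a table type. A minimal sketch of that rule, where `table_type_for`, `create_table_sql`, and `stg_orders` are hypothetical names; the alternative the slide mentions, `CREATE TRANSIENT SCHEMA <name>`, makes every table created in that schema transient without repeating the keyword.

```python
def table_type_for(session_scoped: bool, needs_fail_safe: bool) -> str:
    """Pick a table type following the Fail-Safe guidance."""
    if session_scoped and needs_fail_safe:
        raise ValueError("temporary tables never have Fail-Safe")
    if session_scoped:
        return "TEMPORARY"   # intermediate results, dropped when the session ends
    if not needs_fail_safe:
        return "TRANSIENT"   # staging, truncate/reload, work areas
    return "PERMANENT"       # full Time Travel + Fail-Safe (the default)


def create_table_sql(name: str, columns: str, *, session_scoped: bool,
                     needs_fail_safe: bool) -> str:
    """Build a CREATE TABLE statement with the matching table-type keyword."""
    kind = table_type_for(session_scoped, needs_fail_safe)
    keyword = "TABLE" if kind == "PERMANENT" else f"{kind} TABLE"
    return f"CREATE {keyword} {name} ({columns})"


print(create_table_sql("stg_orders", "id INT, payload VARIANT",
                       session_scoped=False, needs_fail_safe=False))
# CREATE TRANSIENT TABLE stg_orders (id INT, payload VARIANT)
```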


NETWORK SECURITY


LAYERED SECURITY

1. Network (Authenticate Connection): to restrict access to a specified IP address/range; optionally, to restrict access via a secure private network

2. Account (Authenticate User): to authenticate users using a password, multi-factor authentication, or single sign-on

3. Object (Authorization): to restrict access to specific databases, schemas, tables, views, etc., using roles and privileges

4. Data (Encryption): to protect customer data using AES 256-bit encryption and periodic re-keying


Considerations
• IP Whitelisting & Blacklisting
• Public Internet Exposure

Topics
• Network Security Policies
• AWS/Azure PrivateLink


NETWORK SECURITY POLICIES

• Managed by the ACCOUNTADMIN or SECURITYADMIN role

• Only one network policy object can be active at any one time

• Supports IPv4 addresses & CIDR notation

• Maintain consistency with other enterprise application network security policies

• The connectivity test plan should include all networks (e.g., internal, VPN, etc.)

• Utilize IP ranges rather than IP lists whenever possible (e.g., 192.168.1.0/24)

• Blocked IPs are enforced first and require careful consideration when they overlap an allowed IP range (e.g., blocking 0.0.0.0/0 blocks all IPs)
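The "blocked first, then allowed" rule can be checked offline before a policy is activated, which helps with the connectivity test plan. A minimal model using Python's standard `ipaddress` module; `is_allowed` and the policy name `corp_policy` are hypothetical, while the DDL in the comment uses Snowflake's `CREATE NETWORK POLICY` parameters.

```python
import ipaddress

# Corresponding Snowflake DDL (policy name is hypothetical):
#   CREATE NETWORK POLICY corp_policy
#     ALLOWED_IP_LIST = ('192.168.1.0/24')
#     BLOCKED_IP_LIST = ('192.168.1.99');
#   ALTER ACCOUNT SET NETWORK_POLICY = corp_policy;


def _parse(entries: list[str]) -> list[ipaddress.IPv4Network]:
    # A bare address such as '192.168.1.99' becomes a single-host /32 network.
    return [ipaddress.IPv4Network(entry, strict=False) for entry in entries]


def is_allowed(client_ip: str, allowed: list[str], blocked: list[str]) -> bool:
    """Blocked entries are checked first; otherwise the IP must be in an allowed range."""
    ip = ipaddress.IPv4Address(client_ip)
    if any(ip in net for net in _parse(blocked)):
        return False
    return any(ip in net for net in _parse(allowed))


allowed = ["192.168.1.0/24"]
blocked = ["192.168.1.99"]
print(is_allowed("192.168.1.10", allowed, blocked))  # True
print(is_allowed("192.168.1.99", allowed, blocked))  # False: blocked wins
print(is_allowed("10.0.0.5", allowed, blocked))      # False: not in an allowed range
```

Running every address range from the test plan (internal, VPN, and so on) through a check like this before activating the policy reduces the risk of locking users out.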


AWS/AZURE PRIVATELINK


[Diagram: AWS/Azure PrivateLink connectivity; labels include AWS QuickSight]


USER AUTHENTICATION


Considerations
• Multi-Factor Authentication
• Federated Authentication
• User Group Scenarios
• Service Account Scenarios

Topics
• Multi-Factor Authentication
• Federated Authentication & SSO
• OAuth


MULTI-FACTOR AUTHENTICATION

• Provides increased login security for users connecting to Snowflake

• Powered by Duo Security, which is managed by Snowflake

• Users can self-enroll

• Strongly recommend requiring MFA for all users with the ACCOUNTADMIN role

• A Duo-generated passcode can be used when connecting through Python, SnowSQL, JDBC, or ODBC


FEDERATED AUTH & SSO

• Enables user SSO (single sign-on) through federated authentication

• Browser-based SSO supports most SAML 2.0-compliant identity providers (e.g., Google, Azure, OneLogin, PingOne)

• Native support for Okta and Microsoft ADFS

• Browser-based SSO can be used in combination with MFA


OAUTH

• Open-standard protocol (OAuth 2.0) that gives supported clients authorized access to Snowflake without sharing or storing user login credentials

• Supports Tableau Desktop/Server/Online and custom clients configured by your organization

• Supports OAuth with AWS PrivateLink

• ACCOUNTADMIN and SECURITYADMIN are blocked roles by default, but can be enabled by Snowflake Support

• Currently, only the user's default role is authorized (or PUBLIC if no default role is set)


ROLE MANAGEMENT


Considerations
• Administrators
• Developers & DevOps Flow
• End-Users
• Service Accounts

Risks
• Inappropriate or Overly Restrictive Access
• Lack of Extensibility & Control
• Burdensome Maintenance
• Future Rework & Reconfiguration


SYSTEM-DEFINED ROLES

ACCOUNTADMIN: Owns the Snowflake account and can operate on all objects in the account, view and manage Snowflake billing and credit data, and stop any running SQL statements

SECURITYADMIN: Primary role for managing users, custom roles, and object access (grants)

SYSADMIN: Primary role for creating and managing objects (e.g., warehouses, databases, tables) and administering object access through custom roles

PUBLIC: Pseudo-role that is automatically granted to every user and every role in your account
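The system-defined roles form a hierarchy: ACCOUNTADMIN is granted SECURITYADMIN and SYSADMIN, and every role inherits PUBLIC. A minimal model of how privileges flow through such a hierarchy, covering only the four roles listed above plus one hypothetical custom role, `ANALYST`, granted to SYSADMIN (the common practice, so that SYSADMIN and ACCOUNTADMIN can manage objects the custom role owns); `effective_roles` is a hypothetical helper.

```python
# Each key has the listed roles granted to it and inherits their privileges.
SYSTEM_ROLE_GRANTS: dict[str, set[str]] = {
    "ACCOUNTADMIN": {"SECURITYADMIN", "SYSADMIN"},
    "SECURITYADMIN": set(),
    "SYSADMIN": set(),
    "PUBLIC": set(),
}


def effective_roles(role: str, grants: dict[str, set[str]]) -> set[str]:
    """Return `role` plus every role it inherits from, directly or indirectly.

    PUBLIC is always included because it is granted to every role.
    """
    seen = {role, "PUBLIC"}
    stack = [role]
    while stack:
        for child in grants.get(stack.pop(), set()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


# Hypothetical custom role, granted to SYSADMIN.
grants = {**SYSTEM_ROLE_GRANTS, "SYSADMIN": {"ANALYST"}, "ANALYST": set()}
print(sorted(effective_roles("ACCOUNTADMIN", grants)))
# ['ACCOUNTADMIN', 'ANALYST', 'PUBLIC', 'SECURITYADMIN', 'SYSADMIN']
print(sorted(effective_roles("SECURITYADMIN", grants)))
# ['PUBLIC', 'SECURITYADMIN']
```

A custom role left outside this tree (not granted to SYSADMIN) would not appear under ACCOUNTADMIN, which is one source of the "lack of extensibility & control" risk listed earlier.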


EXAMPLE

Functional Roles
● Analyst Team Lead
● Junior Analyst

Analyst Team Lead
● Has all (CRUD) access to a working schema
● Read access to the main schema

Junior Analyst
● Limited to read access to the main schema

Both roles share access to a Virtual Warehouse

READ ONLY PATTERN: SOLUTION

[Diagram: users OBRIAN and WSMITH; roles Analyst Team Lead and Junior Analyst with Usage on a shared Virtual Warehouse; database DWH with schemas "Working Area" and "Main"; Select on tables]
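The example translates into a short grant script. A minimal sketch that generates the statements; the identifiers `MAIN`, `WORKING_AREA`, `ANALYST_WH`, `ANALYST_TEAM_LEAD`, and `JUNIOR_ANALYST` are hypothetical stand-ins for the names in the diagram (only `DWH` comes from the slide), and `read_only_pattern` is a hypothetical helper. Users would then be assigned with `GRANT ROLE <role> TO USER <user>`.

```python
def read_only_pattern(database: str, main_schema: str, working_schema: str,
                      warehouse: str, lead_role: str, junior_role: str) -> list[str]:
    """GRANT statements for the Analyst Team Lead / Junior Analyst example."""
    main = f"{database}.{main_schema}"
    working = f"{database}.{working_schema}"
    statements: list[str] = []
    for role in (lead_role, junior_role):
        statements += [
            f"CREATE ROLE IF NOT EXISTS {role}",
            f"GRANT ROLE {role} TO ROLE SYSADMIN",  # keep custom roles under SYSADMIN
            f"GRANT USAGE ON WAREHOUSE {warehouse} TO ROLE {role}",  # shared warehouse
            f"GRANT USAGE ON DATABASE {database} TO ROLE {role}",
            f"GRANT USAGE ON SCHEMA {main} TO ROLE {role}",
            f"GRANT SELECT ON ALL TABLES IN SCHEMA {main} TO ROLE {role}",  # read main
        ]
    # Only the team lead can create tables and modify data in the working schema.
    statements += [
        f"GRANT USAGE, CREATE TABLE ON SCHEMA {working} TO ROLE {lead_role}",
        f"GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA {working} "
        f"TO ROLE {lead_role}",
    ]
    return statements


for statement in read_only_pattern("DWH", "MAIN", "WORKING_AREA", "ANALYST_WH",
                                   "ANALYST_TEAM_LEAD", "JUNIOR_ANALYST"):
    print(statement + ";")
```

`ON ALL TABLES` covers only tables that already exist; tables created later need future grants, one of the other considerations that follow.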


OTHER CONSIDERATIONS

• Naming Convention
• Future Grants
• Viewing Granted Roles & Privileges
• Managed Access Schema

Naming Convention
• Establish and use a consistent naming convention across the entire account

Future Grants
• Allows defining a role with an initial set of privileges on new objects of a certain type (e.g., tables or views) within a schema or database (pr-preview)

Viewing Granted Roles & Privileges
• SHOW GRANTS TO USER <user>;
• SHOW GRANTS TO ROLE <role>;
• SHOW GRANTS OF ROLE <role>;
• Query INFORMATION_SCHEMA

Managed Access Schema
• Centralizes grant management to the schema owner or a role with the MANAGE GRANTS privilege
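Future grants and managed access schemas combine naturally: the schema owner keeps control of grants, and reader roles automatically get access to tables and views created later. A minimal sketch that generates the statements; `managed_schema_setup` and the role and schema names in the test are hypothetical, and running `USE ROLE` assumes the current user has been granted the owner role.

```python
def managed_schema_setup(database: str, schema: str, owner_role: str,
                         reader_roles: list[str]) -> list[str]:
    """Statements for a managed access schema with future grants for readers."""
    fq = f"{database}.{schema}"
    statements = [
        f"USE ROLE {owner_role}",  # the role that creates the schema owns it
        # In a managed access schema, only the schema owner (or a role with
        # MANAGE GRANTS) can grant privileges on the objects inside it.
        f"CREATE SCHEMA IF NOT EXISTS {fq} WITH MANAGED ACCESS",
    ]
    for role in reader_roles:
        statements += [
            f"GRANT USAGE ON SCHEMA {fq} TO ROLE {role}",
            # Future grants apply automatically to tables/views created later.
            f"GRANT SELECT ON FUTURE TABLES IN SCHEMA {fq} TO ROLE {role}",
            f"GRANT SELECT ON FUTURE VIEWS IN SCHEMA {fq} TO ROLE {role}",
        ]
    statements.append(f"SHOW FUTURE GRANTS IN SCHEMA {fq}")  # verify the result
    return statements


for statement in managed_schema_setup("DWH", "MAIN", "SYSADMIN", ["JUNIOR_ANALYST"]):
    print(statement + ";")
```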


SNOWFLAKE COMMUNITY


Snowflake Community
• We are moving our forum to Stack Overflow
• Use the existing forum for Snowflake account-related questions
• Everything else will remain the same with Snowflake Community

Stack Overflow
• Technical Q&A
• Use the “[snowflake]” tag
• Include relevant information like error messages


Questions?


Thank You