AWS Segment XO Group Joint webinar

47
Loading and Analyzing Behavioral Data in Amazon Redshift Presented by Segment, AWS & XO Group Inc. March 3, 2015

Transcript of AWS Segment XO Group Joint webinar

Page 1: AWS Segment XO Group Joint webinar

Loading and Analyzing Behavioral Data in Amazon Redshift

Presented by Segment, AWS & XO Group Inc.

March 3, 2015

Page 2: AWS Segment XO Group Joint webinar

Jon Hawkins Dir. of Analytics & SEO

XO Group Inc.

Scott Ward Solutions Architect

AWS

Peter Reinhardt CEO

Segment

Today’s Speakers

Page 3: AWS Segment XO Group Joint webinar

Today’s agenda

•  Amazon Redshift Architecture

•  Segment SQL Quick Demo

•  XO Group’s Story

Page 4: AWS Segment XO Group Joint webinar

Amazon Redshift

Page 5: AWS Segment XO Group Joint webinar

Fast, simple, petabyte-scale data warehousing for less than $1,000/TB/Year

Page 6: AWS Segment XO Group Joint webinar

Amazon Redshift is Easy to Use

•  Provision in minutes

•  Monitor query performance

•  Point and click resize

•  Built in security

•  Automatic backups

Page 7: AWS Segment XO Group Joint webinar

Amazon Redshift Architecture •  Leader Node

–  SQL endpoint –  Stores metadata –  Coordinates query execution

•  Compute Nodes –  Local, columnar storage –  Execute queries in parallel –  Load, backup, restore via Amazon S3 –  Parallel load from Amazon DynamoDB, Amazon

EMR, Amazon S3, HDFS/SSH

•  Two hardware platforms –  Optimized for data processing –  DW1: HDD; scale from 2TB to 1.6PB –  DW2: SSD; scale from 160GB to 256TB

10 GigE (HPC)

Ingestion Backup Restore

SQL Clients/BI Tools

128GB RAM

16TB disk

16 cores

Amazon S3

JDBC/ODBC

128GB RAM

16TB disk

16 cores Compute Node

128GB RAM

16TB disk

16 cores Compute Node

128GB RAM

16TB disk

16 cores Compute Node

Leader Node

Page 8: AWS Segment XO Group Joint webinar

•  Column storage

•  Data compression

•  Zone maps

•  Direct-attached storage •  With row storage you do unnecessary I/O

•  To get total amount, you have to read everything

ID Age State Amount

123 20 CA 500

345 25 WA 250

678 40 FL 125

957 37 WA 375

Amazon Redshift Dramatically Reduces I/O

Page 9: AWS Segment XO Group Joint webinar

•  With column storage, you only read the data you need

ID Age State Amount

123 20 CA 500

345 25 WA 250

678 40 FL 125

957 37 WA 375

•  Column storage

•  Data compression

•  Zone maps

•  Direct-attached storage

Amazon Redshift Dramatically Reduces I/O

Page 10: AWS Segment XO Group Joint webinar

analyze compression listing; Table | Column | Encoding ---------+----------------+---------- listing | listid | delta listing | sellerid | delta32k listing | eventid | delta32k listing | dateid | bytedict listing | numtickets | bytedict listing | priceperticket | delta32k listing | totalprice | mostly32 listing | listtime | raw

•  Column storage

•  Data compression

•  Zone maps

•  Direct-attached storage •  COPY compresses automatically

•  You can analyze and override

•  More performance, less cost

Amazon Redshift Dramatically Reduces I/O

Page 11: AWS Segment XO Group Joint webinar

•  Column storage

•  Data compression

•  Zone maps

•  Direct-attached storage

10 | 13 | 14 | 26 |… … | 100 | 245 | 324

375 | 393 | 417…

… 512 | 549 | 623

637 | 712 | 809 …

… | 834 | 921 | 959

10

324

375

623

637

959

•  Track the minimum and maximum value for each block

•  Skip over blocks that don’t contain relevant data

Amazon Redshift Dramatically Reduces I/O

Page 12: AWS Segment XO Group Joint webinar

•  Column storage

•  Data compression

•  Zone maps

•  Direct-attached storage

128 GB RAM

16 cores

16 TB disk

DW.HS1.8XL:

•  > 2 GB/s scan rate

•  Optimized for data processing

•  High disk density

16 GB RAM 2 cores

2 TB disk

DW.HS1.XL:

Amazon Redshift Dramatically Reduces I/O

Page 13: AWS Segment XO Group Joint webinar

•  Query

•  Load

•  Backup/Restore

•  Resize

Amazon Redshift Parallelizes and Distributes Everything

Page 14: AWS Segment XO Group Joint webinar

Amazon S3/DynamoDB

128GB RAM

16TB disk

16 cores Compute Node

128GB RAM

16TB disk

16 cores Compute Node

128GB RAM

16TB disk

16 cores Compute Node

•  Query

•  Load

•  Backup/Restore

•  Resize

•  Parallel load from Amazon DynamoDB, Amazon EMR, Amazon S3, HDFS/SSH

•  Kinesis integration

•  Data automatically distributed and sorted according to DDL

•  Scales linearly with number of nodes

Amazon Redshift Parallelizes and Distributes Everything

Page 15: AWS Segment XO Group Joint webinar

Amazon S3

128GB RAM

16TB disk

16 cores Compute Node

128GB RAM

16TB disk

16 cores Compute Node

128GB RAM

16TB disk

16 cores Compute Node

•  Query

•  Load

•  Backup/Restore

•  Resize

•  Backups to Amazon S3 are automatic, continuous and incremental

•  Backup your cluster to a second region

•  Configurable system snapshot retention period; take user snapshots on-demand

•  Streaming restores enable you to resume querying faster

Amazon Redshift Parallelizes and Distributes Everything

Page 16: AWS Segment XO Group Joint webinar

SQL Clients/BI Tools

128GB RAM

48TB disk

16 cores Compute Node

128GB RAM

48TB disk

16 cores Compute Node

128GB RAM

48TB disk

16 cores Compute Node

128GB RAM

48TB disk

16 cores Leader Node

128GB RAM

48TB disk

16 cores Compute Node

128GB RAM

48TB disk

16 cores Compute Node

128GB RAM

48TB disk

16 cores Compute Node

128GB RAM

48TB disk

16 cores Compute Node

128GB RAM

48TB disk

16 cores Leader Node

•  Query

•  Load

•  Backup/Restore

•  Resize

•  Add/remove nodes or change node type while remaining online

•  Provision a new cluster and copy data in parallel from node to node

•  Only charged for source cluster until SQL endpoint has automatically been switched over via DNS

Amazon Redshift Parallelizes and Distributes Everything

Page 17: AWS Segment XO Group Joint webinar

•  SSL to secure data in transit

•  Encryption to secure data at rest –  AES-256; hardware accelerated

–  All blocks on disks and in Amazon S3 encrypted

–  HSM/CloudHSM

•  No direct access to compute nodes

•  Amazon VPC support

10 GigE (HPC)

Ingestion Backup Restore

SQL Clients/BI Tools

128GB RAM

16TB disk

16 cores

128GB RAM

16TB disk

16 cores

128GB RAM

16TB disk

16 cores

128GB RAM

16TB disk

16 cores

Amazon S3 / Amazon DynamoDB

Customer VPC

Internal Security Group

JDBC/ODBC

Leader Node

Compute Node

Compute Node

Compute Node

Amazon Redshift Has Security Built In

Page 18: AWS Segment XO Group Joint webinar

Segment SQL

Page 19: AWS Segment XO Group Joint webinar

A single hub to store, transform, and send your customer data.

Page 20: AWS Segment XO Group Joint webinar

All these tools run on three types of data.

IDENTIFY

Who are your users?

TRACK

What are they doing?

PAGE

Where are they?

Page 21: AWS Segment XO Group Joint webinar

analytics.track({ 'Added Product' , { name: 'Monopoly: 3rd Edition', price: 18.99, quantity: 3});

12345

Replace all your tags and tracking with a single API.

Page 22: AWS Segment XO Group Joint webinar

A single hub to store, transform, and send your customer data.

Page 23: AWS Segment XO Group Joint webinar

That same data can flow effortlessly (for you) into

a petabyte scale data warehouse.

Page 24: AWS Segment XO Group Joint webinar

Segment SQL

All of your data in a hosted, schematized Amazon Redshift instance.

Page 25: AWS Segment XO Group Joint webinar

Segment SQL

Query user, page, and event tables.

Page 26: AWS Segment XO Group Joint webinar

Segment SQL Partners

Page 27: AWS Segment XO Group Joint webinar

XO Group Inc.

Page 28: AWS Segment XO Group Joint webinar

XO Group Inc.

Page 29: AWS Segment XO Group Joint webinar

Our Analytics Journey

Page 30: AWS Segment XO Group Joint webinar

XO Group Inc. + Segment

Individual product teams wanted isolated access to their own analytics.

Segment + Mixpanel + Customer.io + Optimizely + Uservoice

Page 31: AWS Segment XO Group Joint webinar

XO Group Inc. + Segment

Still needed a solution to connect Segment data from multiple products and platforms into a single view.

Segment SQL + Mode Analytics

Page 32: AWS Segment XO Group Joint webinar

Version support.

Sharing functionality.

Membership analysis.

3 Case Studies

Page 33: AWS Segment XO Group Joint webinar

Membership Analysis

Existing database didn’t give any context into how a user became a member in the app.

Couldn’t analyze which features were driving membership upgrades, marketing attribution, or cross-product pollination.

SELECT userID, source,

reason FROM xogrp.signed_up

1234

Page 34: AWS Segment XO Group Joint webinar

Isolated Views

Page 35: AWS Segment XO Group Joint webinar

Membership Analysis

App A App B

Page 36: AWS Segment XO Group Joint webinar

Membership Analysis

A desire to save our content drives our membership.

Experiencing certain apps before downloading a second significantly improved LTV.

We made it easier to save and organize our content.

We optimized promotion strategy to advertise the stickiest apps first to current members.

Page 37: AWS Segment XO Group Joint webinar

Share Functionality

Facebook & Twitter drove the most inbound traffic.

So, we guessed they’d also be the most popular sharing options.

SELECT title, shareOption,

quantity FROM xogrp.shared_article

1234

Page 38: AWS Segment XO Group Joint webinar

Share Functionality

Page 39: AWS Segment XO Group Joint webinar

More considerate “Share” options

SMS a dress from desktop or mobile web browser.

One click email yourself the details of a venue.

Page 40: AWS Segment XO Group Joint webinar

Version Support

Couldn’t answer very simple question about which roll up browser versions, devices, and operating systems customers were using.

Caused a lot of debate over which versions to support, which test devices to buy.

Page 41: AWS Segment XO Group Joint webinar

Version Support

Chrome 40 + Chrome 40 + Chrome 40 = No, just no!

Page 42: AWS Segment XO Group Joint webinar
Page 43: AWS Segment XO Group Joint webinar
Page 44: AWS Segment XO Group Joint webinar

Version Support

Now we can easily see what browser versions and OS’s are most popular.

Product and engineering teams can make data-driven decisions on which versions to support.

Page 45: AWS Segment XO Group Joint webinar
Page 46: AWS Segment XO Group Joint webinar

Set up your tracking and analytics database to answer any question.

Save valuable engineering time with prebuilt software.

Listen to each team’s needs and create the analytics view they want.

Takeaways

Page 47: AWS Segment XO Group Joint webinar

Thank you!

Schedule a demo today.

segment.com/contact/sql