Partner Webinar Presentation (Transcript): AWS, Pebble & Treasure Data


Data Science at Pebble: Analyzing Data to Make Smarter Watches

June 2, 2015

Today’s speakers

Scott Ward

Solutions Architect

Amazon Web Services

Kiyoto Tamura

Head of Marketing

Treasure Data

Susan Holcomb

Head of Analytics

Pebble

Data at Pebble

What is Pebble?

• Customizable smart watch with crowd-pleasing history

• $10.3MM on Kickstarter with first product

• In March, $20MM on Kickstarter with new product

Pebble Data Team: Then vs. Now

One year ago…

No data team

No analytics infrastructure

Barely any data

Barely any insights

Today…

5-person team (& growing!)

Scalable analytics infrastructure via Treasure Data

~60MM records per day

New product influenced by data insights

Data Science Workflow

Define the problem

Acquire the data

Fit the model

(the work: defining the problem and acquiring the data; the hype: fitting the model)

Pebble’s First Problem

How should we measure product success?

Engagement Definition

• How can we tell someone likes the watch?
  – Button presses?
  – Apps downloaded / launched?
  – Minimized SW bugs?
  – A crazy formula combining these?

• Simplest: They are wearing the watch
  – Use accelerometer

Accessing Data

• 60 MM records per day
• Scheduled jobs in TD to post-process & aggregate data (a sketch of such a query follows below)
• Ad hoc queries in TD to explore data (Presto, Hive)
• Dashboards
• Standardized output
• Process: ~30 queries to get one result
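To make the scheduled-job step concrete, here is a minimal sketch of the kind of daily aggregation query one might schedule in Treasure Data (Presto syntax). The table and column names (accel_events, device_id, magnitude) and the threshold value are illustrative assumptions, not Pebble's actual schema.

  -- Minimal sketch (assumed schema): daily count of devices showing
  -- above-threshold accelerometer activity.
  SELECT
    TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'PDT') AS dt,
    COUNT(DISTINCT device_id)                 AS active_devices
  FROM accel_events                            -- hypothetical table name
  WHERE TD_TIME_RANGE(time, '2015-05-01', '2015-06-01', 'PDT')
    AND magnitude > 0.05                       -- hypothetical noise threshold
  GROUP BY 1
  ORDER BY dt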

Accelerometer noise threshold

• Accelerometer picks up gestures, net motion (so we can enable cool features)

• Sensitive enough to pick up vibrations of a passing train

• Goal: Determine threshold for noise so we can assess when watch is really in use

Accelerometer noise threshold

First result: ???

Raising the threshold: peaks shift left, the spike remains; backlight data matches the original threshold!

Further validated by survey of users
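A hedged sketch of the kind of ad hoc Presto query behind a comparison like this, assuming hypothetical accel_events and backlight_events tables (those names, the columns, and the threshold are illustrative, not Pebble's real schema):

  -- Sketch (assumed schema): per-device minutes of above-threshold accelerometer
  -- activity vs. minutes with backlight activations, for one day.
  WITH accel AS (
    SELECT device_id,
           COUNT(DISTINCT TD_TIME_FORMAT(time, 'yyyy-MM-dd HH:mm', 'PDT')) AS accel_minutes
    FROM accel_events
    WHERE TD_TIME_RANGE(time, '2015-05-01', '2015-05-02', 'PDT')
      AND magnitude > 0.05          -- candidate noise threshold (hypothetical)
    GROUP BY device_id
  ),
  backlight AS (
    SELECT device_id,
           COUNT(DISTINCT TD_TIME_FORMAT(time, 'yyyy-MM-dd HH:mm', 'PDT')) AS backlight_minutes
    FROM backlight_events
    WHERE TD_TIME_RANGE(time, '2015-05-01', '2015-05-02', 'PDT')
    GROUP BY device_id
  )
  SELECT a.device_id, a.accel_minutes, b.backlight_minutes
  FROM accel a
  JOIN backlight b ON a.device_id = b.device_id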

Why this worked

• Rapid, repeated ad hoc querying lets you get an intuitive picture of the data (a sketch follows after this list)
  – What is the range?
  – Where are the errors?
  – Where are the inflection points?

• Few analytics infrastructure tools optimize for this
  – Too focused on standardized reporting
  – Want to sell you a black box that spits out “insights”
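For instance, a first-pass profiling query of that sort might look like the following Presto-style sketch (accel_events and magnitude are again hypothetical names used for illustration):

  -- Sketch (assumed schema): quick profile of one day of accelerometer readings
  -- to see the range, obvious errors, and rough shape of the distribution.
  SELECT
    COUNT(*)                           AS total_rows,
    COUNT_IF(magnitude IS NULL)        AS null_rows,
    COUNT_IF(magnitude < 0)            AS negative_rows,
    MIN(magnitude)                     AS min_mag,
    APPROX_PERCENTILE(magnitude, 0.5)  AS median_mag,
    APPROX_PERCENTILE(magnitude, 0.99) AS p99_mag,
    MAX(magnitude)                     AS max_mag
  FROM accel_events
  WHERE TD_TIME_RANGE(time, '2015-05-01', '2015-05-02', 'PDT')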

Problems 2-n

• Building scalable reporting system

• Delivering insights that shaped interface for new product

• Discovering signals on user attrition

• Designing models to segment use cases

• Analyzing dozens of product elements to improve product experience

thanks <3

Product Overview

Kiyoto Tamura, Director of Developer Relations

Event Data is Everywhere…

Smartphones, Websites, Home Automation, Wearable Devices, Connected Vehicles


{
  "timestamp": "2015-05-22T13:50:00-0600",
  "event": "tap",
  "object": "button_32",
  "user": {
    "name": "Luca",
    "email": "luca@treasuredata.com",
    "twitter": "luckymethod"
  }
}

Connecting the (big) data dots is hard

credit: Matt Turck @ FirstMark Capital

We provide a simple solution

Ingest Analyze Distribute

and more…

• Streaming or Batch ingestion (or both) with Treasure Agent and Embulk

• Don’t worry about changing the way you send data; Treasure Data handles it all

• 99.99% uptime; our team takes care of running the show so you don’t have to

• Query all your data using SQL, no schema required (see the sketch after this list)

• Control Treasure Data through our Console, our Command Line Interface or Luigi-TD for complex automated data pipelines

• Choose Hive or Presto

• Run machine learning at scale with Hivemall

• Expansive collection of export plugins: send data to Google Docs, Tableau, Excel, PostgreSQL…

• Connect your favorite BI tool

• Fine-grained user access control to your data
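To illustrate the “SQL, no schema required” point, a query over tap events shaped like the JSON example earlier might look like this Presto-style sketch (the events table name and column mapping are assumptions for illustration):

  -- Sketch (assumed schema): taps per UI object for one day, assuming events like
  -- the earlier JSON example land in a TD table named "events".
  SELECT object, COUNT(*) AS taps
  FROM events
  WHERE TD_TIME_RANGE(time, '2015-05-22', '2015-05-23', 'PDT')
    AND event = 'tap'
  GROUP BY object
  ORDER BY taps DESC
  LIMIT 20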

Why is Treasure Data better?

Ingest Analyze Distribute

Our growing customer base

Commerce, Technology, Gaming, Media & Ad Tech, Energy Company, IoT

Treasure Data on AWS

EC2
• API Servers (c3.2xlarge)
• Hadoop workers (c3.8xlarge)
• Generic workers (c3.4xlarge)

S3
• Powers our schema-free, columnar store
• 50 billion events/day
• No capacity planning needed!

RDS
• Both MySQL & PostgreSQL
• Reduced ops cost
• No dedicated devops for 2.5 years

Amazon Relational Database Service (RDS)

Amazon RDS is a fully managed relational DB service that is:
  – Simple to deploy
  – Easy to scale
  – Reliable
  – Cost-effective

Ease of deployment and patching

Push-button scalability

Choice of DB Engines

Automated backups

User snapshots and cloning

Monitoring and automatic host replacement


Amazon RDS for Aurora (Preview)

Amazon RDS - Multi-Availability Zone Configuration

• Configure your RDS environment for high availability and DR

• Primary database running in one Availability Zone with Standby in another

• On failover due to an unhealthy RDS instance or Availability Zone, the endpoint’s DNS record is automatically switched to the standby

[Architecture diagram: RDS Multi-Availability Zone Architecture. Web and App tiers in Auto Scaling groups span Availability Zone #1 and Availability Zone #2, with the primary RDS instance in one zone and the standby in the other.]

Amazon RDS - Read Replicas


[Diagram: read replicas across Region #1 and Region #2]


Questions?

Treasure Data: Kiyoto Tamura

@kiyototamura

treasuredata.com

Pebble: Susan Holcomb

getpebble.com

AWS: Scott Ward

aws.amazon.com

Contact us to learn more