Partner webinar presentation aws pebble_treasure_data

Data Science at Pebble Analyzing Data to Make Smarter Watches June 2, 2015

Transcript of Partner webinar presentation aws pebble_treasure_data

Page 1: Partner webinar presentation aws pebble_treasure_data

Data Science at Pebble: Analyzing Data to Make Smarter Watches

June 2, 2015

Page 2: Partner webinar presentation aws pebble_treasure_data

Today’s speakers

Scott Ward

Solutions Architect

Amazon Web Services

Kiyoto Tamura

Head of Marketing

Treasure Data

Susan Holcomb

Head of Analytics

Pebble

Page 3: Partner webinar presentation aws pebble_treasure_data

Data at Pebble

Page 4: Partner webinar presentation aws pebble_treasure_data

What is Pebble?

• Customizable smart watch with crowd-pleasing history

• $10.3MM on Kickstarter with first product

• In March, $20MM on Kickstarter with new product

Page 5: Partner webinar presentation aws pebble_treasure_data

Pebble Data Team: Then vs. Now

One year ago…

No data team

No analytics infrastructure

Barely any data

Barely any insights

Today…

5-person team (& growing!)

Scalable analytics infrastructure via Treasure Data

~60MM records per day

New product influenced by data insights

Page 6: Partner webinar presentation aws pebble_treasure_data

Data Science Workflow

Define the problem

Acquire the data

Fit the model

the work the hype

Page 7: Partner webinar presentation aws pebble_treasure_data

Pebble’s First Problem

How should we measure product success?

Page 8: Partner webinar presentation aws pebble_treasure_data

Engagement Definition

• How can we tell someone likes the watch?
  – Button presses?
  – Apps downloaded / launched?
  – Minimized SW bugs?
  – A crazy formula combining these?

• Simplest: They are wearing the watch
  – Use accelerometer

Page 9: Partner webinar presentation aws pebble_treasure_data

Accessing Data

~60MM records per day

Scheduled jobs in TD to post-process & aggregate data

Ad hoc queries in TD to explore data (Presto, Hive)

Dashboards

Standardized output

Process: ~30 queries to get one result
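As a rough illustration of what a scheduled post-processing job might produce, the sketch below rolls raw records up into per-device daily counts. The field names (`device_id`, `timestamp`) and the function `daily_rollup` are hypothetical; the slides do not show Pebble's schema or the actual jobs run in Treasure Data, where this step would be a Hive or Presto query rather than Python.

```python
from collections import defaultdict
from datetime import datetime


def daily_rollup(records):
    """Count records per (day, device).

    `records` is an iterable of dicts with hypothetical keys
    'device_id' and 'timestamp' (ISO 8601, no UTC offset).
    """
    counts = defaultdict(int)
    for record in records:
        day = datetime.fromisoformat(record["timestamp"]).date().isoformat()
        counts[(day, record["device_id"])] += 1
    return dict(counts)


# Made-up sample data, for illustration only.
sample_records = [
    {"device_id": "a", "timestamp": "2015-06-01T08:00:00"},
    {"device_id": "a", "timestamp": "2015-06-01T21:30:00"},
    {"device_id": "b", "timestamp": "2015-06-02T09:15:00"},
]
rollup = daily_rollup(sample_records)
```

A small, stable table like `rollup` is the kind of standardized output a dashboard can read, while the raw records stay available for ad hoc exploration.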

Page 10: Partner webinar presentation aws pebble_treasure_data

Accelerometer noise threshold

• Accelerometer picks up gestures, net motion (so we can enable cool features)

• Sensitive enough to pick up vibrations of passing train

• Goal: Determine threshold for noise so we can assess when watch is really in use
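The thresholding idea can be sketched as follows. This is a minimal illustration that assumes one motion magnitude per minute; the function names, the units, the threshold value, and the `min_active_minutes` cutoff are all assumptions, not Pebble's actual definitions.

```python
from typing import Iterable


def minutes_in_use(motion_per_minute: Iterable[float], noise_threshold: float) -> int:
    """Count the minutes whose motion magnitude exceeds the noise threshold."""
    return sum(1 for motion in motion_per_minute if motion > noise_threshold)


def is_worn(motion_per_minute: Iterable[float], noise_threshold: float,
            min_active_minutes: int = 30) -> bool:
    """Treat the watch as worn if enough minutes rise above the noise floor.

    `min_active_minutes` is a hypothetical cutoff chosen for illustration.
    """
    return minutes_in_use(motion_per_minute, noise_threshold) >= min_active_minutes


# Made-up readings: small values stand in for ambient vibration
# (for example a passing train), large values for real wrist motion.
readings = [0.0, 0.2, 0.1, 5.0, 7.5, 0.3]
active = minutes_in_use(readings, noise_threshold=1.0)  # 2 minutes above threshold
```

Everything at or below the threshold is counted as noise, so the choice of threshold directly decides how much "use" is measured.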

Page 11: Partner webinar presentation aws pebble_treasure_data

Accelerometer noise threshold

Page 12: Partner webinar presentation aws pebble_treasure_data

First result

???

Page 13: Partner webinar presentation aws pebble_treasure_data

Raising the threshold

peaks shift left; spike remains

backlight data matches original threshold!!

Further validated by survey of users
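The kind of threshold sweep described here can be sketched as below. The slides do not say which quantity was plotted, so this assumes a histogram of "hours with above-threshold motion per day"; the data layout and the names `active_hours_histogram` and `sweep` are hypothetical.

```python
from collections import Counter


def active_hours_histogram(days, threshold):
    """Histogram of active hours per day.

    `days` is a list of days, each a list of 24 hourly motion values
    (a made-up layout). Returns Counter {active_hours: number_of_days}.
    """
    histogram = Counter()
    for hourly_motion in days:
        active_hours = sum(1 for motion in hourly_motion if motion > threshold)
        histogram[active_hours] += 1
    return histogram


def sweep(days, thresholds):
    """Recompute the histogram for each candidate threshold."""
    return {t: active_hours_histogram(days, t) for t in thresholds}


# Made-up sample: three watch-days of hourly motion values.
sample_days = [
    [0.5] * 8 + [3.0] * 16,              # clearly worn for 16 hours
    [0.5] * 24,                          # only low-level noise all day
    [3.0] * 10 + [1.5] * 6 + [0.5] * 8,  # 6 borderline hours
]
histograms = sweep(sample_days, thresholds=[1.0, 2.0])
```

In this toy data, raising the threshold from 1.0 to 2.0 moves the third day from 16 active hours down to 10, while the other two days stay put. That is the sort of shift one would look for when rerunning the same query at several thresholds.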

Page 14: Partner webinar presentation aws pebble_treasure_data

Why this worked

• Rapid, repeated ad hoc querying lets you get an intuitive picture of the data
  – What is the range?
  – Where are the errors?
  – Where are the inflection points?

• Few analytics infrastructure tools optimize for this
  – Too focused on standardized reporting
  – Want to sell you black box that spits out “insights”

Page 15: Partner webinar presentation aws pebble_treasure_data

Problems 2-n

• Building scalable reporting system

• Delivering insights that shaped interface for new product

• Discovering signals on user attrition

• Designing models to segment use cases

• Analyzing dozens of product elements to improve product experience

Page 16: Partner webinar presentation aws pebble_treasure_data

thanks <3

Page 17: Partner webinar presentation aws pebble_treasure_data

Product Overview

Kiyoto Tamura

Director of Developer Relations

Page 18: Partner webinar presentation aws pebble_treasure_data

Event Data is Everywhere…

Smartphones Websites Home Automation

Wearable Devices

Connected Vehicles

Page 19: Partner webinar presentation aws pebble_treasure_data

Event Data is Everywhere…

Smartphones Websites Home Automation

Wearable Devices

Connected Vehicles

{
  "timestamp": "2015-05-22T13:50:00-0600",
  "event": "tap",
  "object": "button_32",
  "user": {
    "name": "Luca",
    "email": "[email protected]",
    "twitter": "luckymethod"
  }
}
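To make the shape of such an event concrete, here is a small sketch that parses a similar JSON event and flattens the nested `user` object into dotted column names. The `flatten` helper and the dotted naming are illustrative assumptions, not a description of how Treasure Data stores events; the email field is left out because it is redacted in the slide.

```python
import json


def flatten(event, prefix=""):
    """Flatten nested dicts into a single-level dict with dotted keys."""
    flat = {}
    for key, value in event.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


raw_event = (
    '{"timestamp": "2015-05-22T13:50:00-0600", "event": "tap", '
    '"object": "button_32", '
    '"user": {"name": "Luca", "twitter": "luckymethod"}}'
)
row = flatten(json.loads(raw_event))
# row now has the keys: timestamp, event, object, user.name, user.twitter
```

Because each event carries its own keys, a new field can appear in later events without any change to the parsing code.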

Page 20: Partner webinar presentation aws pebble_treasure_data

Connecting the (big) data dots is hard

credit: Matt Turck @ FirstMark Capital

Page 21: Partner webinar presentation aws pebble_treasure_data

We provide a simple solution

Ingest Analyze Distribute

and more…

Page 22: Partner webinar presentation aws pebble_treasure_data

• Streaming or Batch ingestion (or both) with Treasure Agent and Embulk

• Don’t worry about changing the way you send data; Treasure Data handles it all

• 99.99% uptime: our team takes care of running the show so you don’t have to

• Query all your data using SQL, no schema required

• Control Treasure Data through our Console, our Command Line Interface or Luigi-TD for complex automated data pipelines

• Choose Hive or Presto

• Run machine learning at scale with Hivemall

• Expansive collection of export plugins: send data to Google Docs, Tableau, Excel, PostgreSQL…

• Connect your favorite BI tool

• Fine-grained user access control to your data

Why is Treasure Data better?

Ingest Analyze Distribute

Page 23: Partner webinar presentation aws pebble_treasure_data

Commerce, Technology, Gaming, Media & Ad Tech

Our growing customer base

Energy Company

IoT

Page 24: Partner webinar presentation aws pebble_treasure_data

Treasure Data on AWS

EC2

• API Servers (c3.2xlarge)

• Hadoop workers (c3.8xlarge)

• Generic workers (c3.4xlarge)

S3

• Powers our schema-free, columnar store

• 50 billion events/day

• No capacity planning needed!

RDS

• Both MySQL & PostgreSQL

• Reduced ops cost

• No dedicated devops for 2.5 years

Page 25: Partner webinar presentation aws pebble_treasure_data
Page 26: Partner webinar presentation aws pebble_treasure_data
Page 27: Partner webinar presentation aws pebble_treasure_data

Amazon Relational Database Service (RDS)

Amazon RDS is a fully managed relational DB service that is:
  – Simple to deploy
  – Easy to scale
  – Reliable
  – Cost-effective

Ease of deployment and patching

Push-button scalability

Choice of DB Engines

Automated backups

User snapshots and cloning

Monitoring and automatic host replacement

PostgreSQL

Amazon RDS for Aurora (Preview)

Page 28: Partner webinar presentation aws pebble_treasure_data

Amazon RDS - Multi-Availability Zone Configuration

• Configure your RDS environment for high availability and DR

• Primary database running in one Availability Zone with Standby in another

• On failover (unhealthy RDS instance or Availability Zone), the endpoint’s DNS record is repointed to the Standby
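Because failover works through DNS, a client generally needs to reconnect by hostname rather than hold on to a resolved IP address. The sketch below shows one way to retry a connection; `connect_with_retry`, the injected `connect` callable, and the endpoint name are hypothetical, and a real application would pass its database driver's connect function.

```python
import time


def connect_with_retry(connect, endpoint, attempts=5, delay_seconds=0.0):
    """Call connect(endpoint) until it succeeds or attempts run out.

    The hostname (not a cached IP address) is passed on every attempt,
    so each new connection can pick up a repointed DNS record.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return connect(endpoint)
        except ConnectionError as exc:
            last_error = exc
            time.sleep(delay_seconds)
    raise last_error


# Stand-in for a database driver: fails twice, then succeeds,
# imitating a short outage while the standby takes over.
calls = []


def fake_connect(host):
    calls.append(host)
    if len(calls) < 3:
        raise ConnectionError("primary unavailable")
    return "connected to " + host


connection = connect_with_retry(fake_connect, "mydb.example.internal")
```

In practice the delay between attempts would be non-zero, and the operating system's or runtime's DNS cache lifetime would also need to be short enough for the new record to be seen.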

Page 29: Partner webinar presentation aws pebble_treasure_data

RDS Multi-Availability Zone Architecture

[Diagram. Labels: Availability Zone #1, Availability Zone #2, Web Tier, App Tier, Auto Scaling group, RDPGW]

Page 30: Partner webinar presentation aws pebble_treasure_data

Amazon RDS - Read Replicas


Region #1 Region #2

Page 31: Partner webinar presentation aws pebble_treasure_data


Page 32: Partner webinar presentation aws pebble_treasure_data

Questions?

Treasure Data: Kiyoto Tamura

@kiyototamura

treasuredata.com

Pebble: Susan Holcomb

getpebble.com

AWS: Scott Ward

aws.amazon.com

Contact us to learn more