Dataiku - google cloud platform roadshow - october 2013

Transcript of Dataiku - google cloud platform roadshow - october 2013

Page 1: Dataiku  - google cloud platform roadshow - october 2013

Data Science Studio

19 customers

Founded in January 2013

Data Science For Everyone

Page 2: Dataiku  - google cloud platform roadshow - october 2013

(big) data(s) + machine learning + for practical applications = Data Science

Page 3: Dataiku  - google cloud platform roadshow - october 2013

The Project

(c) Dataiku 2013 - Confidential

Hal Alowne, BI Manager, Dim’s Private Showroom

Dim Sum, CEO & Founder, Dim’s Private Showroom
Medium-size e-commerce
•  $100M revenue
•  1 Data Analyst

Big Guys: $10B+ revenue, 100+ Data Scientists

Hey Hal! We need a big data platform, like the big guys! Let’s just do as they do!

Page 4: Dataiku  - google cloud platform roadshow - october 2013

Hal’s Wish #1: Global Customer Value Funnel

SEO

NewsLetter

Display Retargeting Display AdWords

Marketplace Direct Sales

Delivery

View Basket

Support Returns

$

$ $ $

Orders

Page 5: Dataiku  - google cloud platform roadshow - october 2013

Hal’s Wish #2: Why do people drop their baskets?

Basket

Payment refused

Credit refused

Cheaper elsewhere?

Delivery costs?

Waiting for Christmas?

ACTION

Page 6: Dataiku  - google cloud platform roadshow - october 2013

Hal’s Wish #3: Which products to put on top?

Original: most popular on top

Better: machine learning score (age/discount/margin…)

Advanced: machine learning score + personalization
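The step from a popularity sort to a score built from product attributes can be illustrated with a small sketch. This is not the model from the talk: it is a hand-weighted linear score standing in for a learned one, and the `Product` fields, the weights, the reading of "age" as days since listing, and the sample catalog are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    views: int       # popularity signal (hypothetical field)
    age_days: int    # days since listing (assumed meaning of "age")
    discount: float  # fraction between 0.0 and 1.0
    margin: float    # fraction between 0.0 and 1.0


# Hypothetical weights; a real system would learn these from order data.
WEIGHTS = {"freshness": 0.3, "discount": 0.3, "margin": 0.4}


def score(product, max_age_days=365):
    """Weighted sum of freshness, discount and margin."""
    freshness = max(0.0, 1.0 - product.age_days / max_age_days)
    return (WEIGHTS["freshness"] * freshness
            + WEIGHTS["discount"] * product.discount
            + WEIGHTS["margin"] * product.margin)


def rank_by_popularity(products):
    """The 'original' ordering: most viewed first."""
    return sorted(products, key=lambda p: p.views, reverse=True)


def rank_by_score(products):
    """The 'better' ordering: highest attribute score first."""
    return sorted(products, key=score, reverse=True)


# Made-up sample catalog, for illustration only.
catalog = [
    Product("old bestseller", views=9000, age_days=300, discount=0.0, margin=0.10),
    Product("new arrival", views=1200, age_days=10, discount=0.20, margin=0.35),
    Product("clearance item", views=4000, age_days=200, discount=0.50, margin=0.05),
]

popularity_order = [p.name for p in rank_by_popularity(catalog)]
score_order = [p.name for p in rank_by_score(catalog)]
```

With these made-up numbers the two orderings are reversed: the popularity sort keeps the oldest, lowest-margin product on top, while the score favours the recent, higher-margin one. Personalization would add per-customer features to the same scoring function.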

Page 7: Dataiku  - google cloud platform roadshow - october 2013

Why is it so complicated?

Page 8: Dataiku  - google cloud platform roadshow - october 2013

Partner Data Spaghetti

Mailing Partner

DMP Partner

Mail Optimizer

Retargeter

Market Data Providers

Social Networks

Page 9: Dataiku  - google cloud platform roadshow - october 2013

Databases Are Full

1 TB BI Database

20 TB BI Database

Any new computing job takes > 1 day

NEED FOR SCALE

Page 10: Dataiku  - google cloud platform roadshow - october 2013

Architecture Bingo

BI Real-Time Batch Real Real-Time

Simple Queries

Statistics

Machine Learning

Hive

Pig

Spark

MongoDB

ElasticSearch

Cascading

R

Page 11: Dataiku  - google cloud platform roadshow - october 2013

Hadoop Ceph

Sphere Cassandra Spark

Scikit-Learn

Mahout WEKA

MLBase

RapidMiner

Pandas D3 Crossfilter

InfiniDB LucidDB

Impala

Elastic Search SOLR

MongoDB Riak

Membase

Pig Hive Cascading Talend

Machine Learning Mystery Land

Scalability Central

NoSQL-Slavia

SQL Columnar Republic

Visualization County

Data Cleanup Wasteland

Statistician Old House

R

Page 12: Dataiku  - google cloud platform roadshow - october 2013

Hal’s Bingo!

HADOOP Google Cloud Platform Dataiku

Page 13: Dataiku  - google cloud platform roadshow - october 2013

Dataiku Open Source Web Tracker (WT1)
•  Apache License
•  JavaScript & IO
•  Write directly to Google Cloud Storage
•  Full Java, easy to deploy

Step 1: Get Your Own Data

Quiet at night; autoscale during the summer and winter sales

Page 14: Dataiku  - google cloud platform roadshow - october 2013

Step 2: Mix All Your Data

4 VMs on GCE

Tracking Data

Internal Data

Partner Data

Data Science Studio Pig Hive

HADOOP

auto-sync to BigQuery
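As a rough illustration of what "mixing" the three sources means, the sketch below merges tracking, internal, and partner records into one profile per customer. The slide does this with Pig and Hive on Hadoop; this is only a small in-memory Python stand-in, and the `customer_id` key, the field names, and the sample records are hypothetical.

```python
from collections import defaultdict


def mix_sources(tracking_events, internal_orders, partner_records):
    """Combine the three sources into one profile per customer_id."""
    profiles = defaultdict(
        lambda: {"page_views": 0, "revenue": 0.0, "partner_segments": []}
    )
    # Tracking data: one record per page view.
    for event in tracking_events:
        profiles[event["customer_id"]]["page_views"] += 1
    # Internal data: one record per order.
    for order in internal_orders:
        profiles[order["customer_id"]]["revenue"] += order["amount"]
    # Partner data: one record per segment assigned by a partner.
    for record in partner_records:
        profiles[record["customer_id"]]["partner_segments"].append(record["segment"])
    return dict(profiles)


# Made-up sample records, for illustration only.
tracking = [
    {"customer_id": "c1", "page": "/home"},
    {"customer_id": "c1", "page": "/basket"},
    {"customer_id": "c2", "page": "/home"},
]
orders = [{"customer_id": "c1", "amount": 59.0}]
partners = [{"customer_id": "c2", "segment": "newsletter-reader"}]

profiles = mix_sources(tracking, orders, partners)
```

A customer missing from one source still gets a profile with default values, which is usually what a downstream report or model needs.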

Page 15: Dataiku  - google cloud platform roadshow - october 2013

Step 3: Mine Your Data

Built-in Predictive Models

Advanced Ad Hoc Models (R or Python)

Shared Web Based Data Mining Platform

Page 16: Dataiku  - google cloud platform roadshow - october 2013

•  January
   ◦  Choose partner / set up the architecture
•  February
   ◦  Initial deployment: 4 TB
   ◦  Replace BI
•  May
   ◦  New applications (SEO, …)
•  September
   ◦  Scale deployment to 15 TB
   ◦  Integrate all channels

Typical Project Calendar

Page 17: Dataiku  - google cloud platform roadshow - october 2013

•  Enhance daily report availability
   ◦  Previous architecture: between H+17 and H+26 (!)
   ◦  Hadoop on GCE: between H+3 and H+7
•  +21% email channel optimization
•  SEO plan optimization
•  And a dozen BI-style “apps”

Some Successes for the Project

Page 18: Dataiku  - google cloud platform roadshow - october 2013

Thank you!

Follow us on Twitter @dataiku

Ask any big data question [email protected]