Dataiku - From Big Data To Machine Learning

1 Dataiku 05/09/2022

description

This presentation was given to CIOs to raise their awareness of big data in practical terms and of the new uses of machine learning and analytics.

Transcript of Dataiku - From Big Data To Machine Learning

Page 1: Dataiku - From Big Data To Machine Learning

1Dataiku04/10/2023

Page 2

Hi!

Current life: CEO, Dataiku

Tweet about this: @dataiku @club_dsi_gun

Past life: Criteo, IsCool Entertainment, Exalead

Florian Douetteau

Available on SlideShare: http://www.slideshare.net/Dataiku

Goals today:
• Concrete feedback on data analytics projects
• Data teams in practice and key technologies
• Motivate you to start a data science project

Allergic to slide decks? Check: https://github.com/dataiku

Page 3

Dataiku

Dataiku: an open source platform to help you build your data lab

Page 4

Motivation

Page 5

Collocation

Big Apple

Big Mama

Big Data

Collocation: a familiar grouping of words, especially words that habitually appear together and thereby convey meaning by association.

Page 6

“Big” Data in 1999

struct Element { Key key; void* stat_data; } …

• Optimized C data structures
• Perfect hashing
• HP-UNIX servers, 4 GB RAM
• 100 GB of data
• Web crawler: socket reuse, HTTP 0.9

1 Month

Page 7

• Hadoop
• Java / Pig / Hive / Scala / Clojure / …
• A dozen NoSQL data stores
• MPP databases
• Real-time

Big Data in 2013

1 Hour

Page 8

Data Analytics: The Stakes

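[Chart: data volume and business value of analytics projects by sector and year: Web Search (1999), Logistics (2004), Banking CRM (2008), Web Search (2010), Social Gaming (2011), Online Advertising (2012), E-Commerce (2013). Volumes shown range from 1 TB to 1000 TB, values from $10M to $1B, some marked "?"]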

Page 9

Meet Hal Alowne

Dataiku - Data Tuesday

Big Guys:
• $10B+ revenue
• 100M+ customers
• 100+ data scientists

Hal Alowne, BI Manager, Dim’s Private Showroom

“Hey Hal! We need a big data platform like the big guys. Let’s just do as they do!”

European e-commerce website:
• $100M revenue
• 1 million customers
• 1 data analyst (Hal himself)

Dim Sum, CEO & Founder, Dim’s Private Showroom

Big Data Copycat Project

Page 10

Technology is complex

Regions on the map: Machine Learning Mystery Land, Scalability Central, NoSQL-Slavia, SQL Columnar Republic, Visualization County, Data Clean Wasteland, Statistician Old House

Tools on the map: Hadoop, Ceph, Sphere, Cassandra, Spark, Scikit-Learn, Mahout, WEKA, MLBase, RapidMiner, Pandas, D3, Crossfilter, InfiniDB, LucidDB, Impala, ElasticSearch, SOLR, MongoDB, Riak, Membase, Pig, Hive, Cascading, Talend, R

Page 11

Statistics and machine learning are complex!

Try to understand myself

Page 12

(Some Book you might want to read)

Page 13

Plumbing is not complex (but it is difficult)

Data sources:
• Implicit user data (views, searches…)
• Content data (title, categories, price, …)
• Explicit user data (clicks, purchases, …)
• User information (location, graph…)

Data volumes shown: 500 TB, 50 TB, 1 TB, 200 GB

Derived data: transformation matrix, transformation predictor, per-user stats, per-content stats, user similarity, rank predictor, content similarity
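Content similarity is commonly derived from which users interacted with which items. A minimal sketch, assuming a hypothetical in-memory list of (user, item) events and cosine similarity on binary user-interaction vectors; the slide does not say which similarity measure was used, and at the volumes shown here the same computation would run as a distributed job rather than in a single process.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def content_similarity(events):
    """Cosine similarity between items, based on the users who interacted with them.

    events: iterable of (user_id, item_id) pairs (hypothetical input format).
    Returns {(item_a, item_b): score} with item_a < item_b and score in (0, 1].
    """
    users_by_item = defaultdict(set)
    for user, item in events:
        users_by_item[item].add(user)

    scores = {}
    for a, b in combinations(sorted(users_by_item), 2):
        shared = len(users_by_item[a] & users_by_item[b])
        if shared:
            # Binary vectors: dot product = shared users, norm = sqrt(number of users).
            scores[(a, b)] = shared / sqrt(len(users_by_item[a]) * len(users_by_item[b]))
    return scores


events = [("u1", "shoes"), ("u1", "socks"), ("u2", "shoes"),
          ("u2", "socks"), ("u3", "shoes"), ("u3", "hat")]
print(content_similarity(events))
```

Pairs with no shared user get no entry, which keeps the output sparse when the catalog is large.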

Page 14

MERIT = TIME + ROI

Apps: targeted newsletter, recommender systems, adapted products / promotions

TIME: 6 months. ROI: apps.

Build a lab in 6 months (rather than 18 months)

The usual path:
• Find the right people (6 months?)
• Choose the technology (6 months?)
• Make it work (6 months?)

The faster path: build the lab (6 months)
• Train people
• Reuse working patterns

Deploy apps that actually deliver value

Timeline: 2013, 2014

Page 15

The Problem

It’s utterly complex and unreasonable

Page 16

Our Goal: change his perspective on data science projects

(sorry, we couldn’t find a picture of Hal smiling)

Page 17

Agenda
• Why and for what?
  ◦ Business theory
  ◦ Concrete projects
• How to organize people and projects?
  ◦ How to start
  ◦ A dedicated team?
• What technologies?
  ◦ Machine learning
  ◦ Architecture

Page 18

Embodiment of Knowledge

Find your core business advantage

Page 19

Product success is driven by quality!

Margin / Customer Value / Traffic / Acquisition

Example: launching an app on the App Store

Page 20

As you continue growing…

• Margin on new customers might decline…
• Margin on new features might decline…

Is your business really scalable?

Page 21

Existing Customers Profiles

Existing Product Assets

Existing Specific Business Model

And your KNOWLEDGE of it

Where is your core business advantage ?

Page 22

Data-Driven Business: What Is Your Value?

Number of Customers

Customer Knowledge, which increases over time with:
• time spent in your app
• user relationships (network effect)
• partner / other app interactions

Your Value

Figures shown on the slide: 1,409,540; $1.03; $2.57; $4,081,710,239; 2,534,123

Page 23

Data Impact: Not All Businesses Are Equal

Sectors compared: online advertising, telecommunications, insurance

Criteria: ability to acquire, margin, new services, overall

Labels on the chart: subscription market (three times), infrastructure driver, selling data, risk / price optimization

Page 24

From Theory To Practice

Concrete Projects

Page 25

What should be free in the application?

How to optimize conversion?

How to plan and create a business model?

Main pain point: how to plan and optimize pricing in the application?

Freemium Application

Page 26

Example (Freemium Application): Freemium Model Optimization

Business model / User clusters / Simulation

Optimized pricing: margin +23%

Business planning capability: from 1 month to 9 months

R + Python + InfiniDB, on-premise, 1 TB dataset, 5-week project

Page 27

The business intelligence stack has scalability and maintenance issues

The back office implements business rules that are being challenged

Existing infrastructure cannot cope with per-user information

Main pain point: 23 hours 52 minutes to compute business intelligence aggregates for one day

Large E-Retailer

Page 28

• Relieve their current DWH and accelerate the production of some aggregates / KPIs

• Be the backbone of a new personalized user experience on their website: more recommendations, more profiling, etc.

• Train existing staff in machine learning and segmentation

1 h 12 min to compute the aggregates, available every morning

New home page personalization deployed in a few weeks

Hadoop cluster (24 cores), Google Compute Engine, Python + R + Vertica, 12 TB dataset, 6-week project

Large E-Retailer : The Datalab

Page 29

BI performed directly on production databases

New reports required the CTO’s direct involvement in design and implementation

Each photo tag manually validated and completed

Large Photo Bank

Main pain point: no visibility into new users’ behaviour

Page 30

Implementing a cloud-based data lab to:

• centralize all available data, previously scattered between SQL DB and file systems,

• improve web tracking granularity to enhance customer knowledge via behavior modeling and segmentation,

• create content-based recommendation engines with keywords clustering and association.

Large Photo Bank : The Datalab

R + Vertica + Hadoop, Amazon Web Services, 8-week project

Automated content filtering and recommendation

Page 31

Large set of manually crafted linguistic resources for interpreting users’ queries

New brands, rare terms… hard to maintain

Large Online Directory

Main pain point: maintaining very large ontological knowledge sets, with more than 100k concepts

Page 32

Analyze clicks, query rephrasings and navigation to detect queries that require specific processing

Gather web and external data to enrich the existing index

Train the team in Hadoop and machine learning

Continuous relevance monitoring

Automated enrichment: 2x productivity

Hadoop (48 cores), Python, on-premise, 10-week project

Large Online Directory: The Data Lab

Page 33

Launch a marketing campaign

After a few days, PREDICT from user behaviour:
◦ total ARPU for users after 3 months
◦ efficiency of the campaign
◦ continue or not?

Example (E-Application): Marketing Campaign Prediction
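The prediction step can be illustrated with the simplest possible model. A minimal sketch, assuming an ordinary least-squares line fitted on past campaigns, from ARPU after a few days to ARPU after 3 months; the function name `fit_line` and all numbers are hypothetical, and a real model would use richer behavioural features.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


# Hypothetical past campaigns: ARPU after 3 days vs ARPU after 3 months
early_arpu = [0.10, 0.20, 0.30, 0.40]
three_month_arpu = [0.50, 0.90, 1.30, 1.70]
slope, intercept = fit_line(early_arpu, three_month_arpu)

# New campaign, observed for a few days only
predicted = slope * 0.25 + intercept
print(f"predicted 3-month ARPU: {predicted:.2f}")  # predicted 3-month ARPU: 1.10
```

Comparing the predicted 3-month ARPU with the cost of acquiring a user then gives a first continue/stop signal for the campaign.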

Page 34

• A very large community
• Some mid-size communities
• Lots of small clusters (mostly 2 players)

Correlation:
◦ between community size and engagement / virality

Meaningful patterns:
◦ 2 players / family / group

What is the minimum number of friends a player needs in the application to get additional engagement?

Example (Social Gaming): Social Gaming Communities
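Community sizes like the ones above can be obtained by treating friendships as graph edges and counting connected components. A minimal union-find sketch, assuming a hypothetical list of (player, player) friendship pairs; the slide does not say how its communities were actually computed.

```python
from collections import Counter


def community_sizes(friendships):
    """Return the sizes of the connected components, largest first."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in friendships:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two communities

    sizes = Counter(find(player) for player in list(parent))
    return sorted(sizes.values(), reverse=True)


pairs = [("ann", "bob"), ("bob", "cid"), ("cid", "dan"),
         ("eve", "fay"), ("gus", "hal")]
print(community_sizes(pairs))  # [4, 2, 2]
```

Joining each player's community size with engagement metrics is then enough to look for the correlation between community size and engagement.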

Page 35

Agenda
• What do others do?
  ◦ Concrete projects
• How to organize people and projects?
  ◦ How to start
  ◦ A dedicated team?
• What technologies?
  ◦ Machine learning
  ◦ Architecture

Page 36

First Steps

Page 37

An A/B test (or the equivalent for your business) is the first step towards a “data-driven” mindset

No advanced analytics required; some existing tools can help

Changing a button color: +21%

(1) Be Data Driven
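Before acting on a lift such as the +21% above, it is worth checking that it is not noise. A minimal sketch of a two-sided two-proportion z-test using only the standard library; the visitor and conversion counts are hypothetical, chosen to give a 21% lift, and are not the data behind the slide.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (relative lift of B over A, z statistic, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # conversion rate under "no difference"
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    p_value = 2 * (1 - (1 + erf(abs(z) / sqrt(2))) / 2)
    return (p_b - p_a) / p_a, z, p_value


# Hypothetical test: old button color (A) vs new button color (B)
lift, z, p = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=605, n_b=10_000)
print(f"lift={lift:+.0%} z={z:.2f} p={p:.4f}")
```

With these counts the difference is well beyond what chance would explain; with a tenth of the traffic, the same +21% would not be conclusive.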

Page 38

People Microsoft Excel

(2) Use Excel

Page 39

Data Team Data Tools

(3) Build a team

The business expert who knows maths

The analyst who reveals patterns

The enthusiastic coding guy

Page 40

data lab (n.): a small group with all the required expertise, including business-minded people, machine learning knowledge and the right technology

A proven organization used by successful data-driven companies over the past few years (eBay, LinkedIn, Walmart…)

TEAM + TOOLS = LAB

Page 41

Organization

Uses: targeted campaigns, price optimization, personalized experience, quality assurance, workload and yield management, user feedback (A/B tests), continuous improvement

Data

Roles: product designers, business & marketing, engineers, user voice

Page 42

Short-term focus / long-term drive:
• Business people: optimize margin, … / create new business revenue streams
• Marketing people: optimize click ratio / brand awareness and impact
• IT people: make IT work / clean and efficient architecture
• Data people: get stats right, make predictions / create data-driven features

It’s just a new team

Page 43

Super Intern

How easily could you integrate a new smart guy and give him any data and any computing power he needs to enhance your product?

Page 44

Agenda
• What do others do?
  ◦ Concrete projects
• How to organize people and projects?
  ◦ How to start
  ◦ A dedicated team?
• What technologies?
  ◦ Machine learning
  ◦ Architecture

Page 45

An oversimplified view of big data architecture

Architecture Patterns

Page 46

Database Business Layer Application

Page 47

(What it really looks like)

Page 48

What kind of scale?

Database Business Layer Application

Or

Data Science App

Or ?

Page 49

What kind of interaction ?

Database Business Layer Application

Data Science App

Page 50

Classic Columnar Architecture

Some Data

Some Place To Pour It In

Some Tool To Do Some Maths And Graphs

Page 51

Classic Columnar Architecture

Lots of Data: web tracking logs, raw server logs, order / product / customer data, Facebook info, open data (weather, currency…)

Some Place To Pour It In

Some Tool To Do Some Maths And Graphs

Page 52

The Corinthian Architecture

Lots of Data

Some Place To Perform Rapid Calculations

Some Tools To Do Some Maths And Charts

Some Place To Pour It In And Clean / Prepare It

Page 53

Data Storage And Preparation

Large scale: Hadoop cluster, Cassandra, MPP SQL columnar databases

Medium/large scale: Couchbase, MongoDB, …

Selection drivers: volume, scalability

Page 54

Calculations

Classic databases: PostgreSQL, MySQL, …

MPP SQL databases: Vertica, Vectorwise, InfiniDB, Greenplum HD, …

New Hadoop databases: Impala

Selection drivers: speed (interactivity), expressivity

Page 55

The Corinthian Architecture

Lots of Data

Some Place To Perform Rapid Calculations

Some Tools To Do Some Maths And Charts

Some Place To Pour It In And Clean / Prepare It

Statistics

Cohorts

Regressions

Bar Charts For Marketing

Nice Infographics For Your Company Board

Page 56

The Corinthian Architecture

Lots of Data

Some Database To Perform Rapid Calculations

Some Tools To Do Some Maths, Some Others To Do Some Charts

Some Place To Pour It In And Clean / Prepare It

Page 57

Statistical Tools

Open source: IPython, RStudio

Commercial: RapidMiner, SAS, Revolution R

Selection drivers: existing know-how, scalability

Page 58

What is a statistical tool ?

Interact and explore data

Some stats capabilities

Some Graph Capabilities

Page 59

Visualization Tools

Commercial: Spotfire, Tableau, QlikView

SaaS: BIME, Chartio

HTML5 / ad hoc: D3, Graphviz

Selection drivers: number of contributors / readers, scalability

Page 60

The “One Database Won’t Do It All” Problem

Lots of Data

Some Database To Perform Rapid Calculations

Some Tools To Do Some Maths, Some Others To Do Some Charts

Some Place To Pour It In And Clean / Prepare It

JOIN / Aggregate

Rapid Group By Computations

Direct access from production to the computed results, etc.

Page 61

The Roman Social Forum

Lots of Data

Some Database To Perform Rapid Calculations, And Some Database For Graphs

Some Tools To Do Some Maths, Some Others To Do Some Charts

Some Place To Pour It In And Clean / Prepare It

Page 62

Graph

Databases: Neo4j, Titan, OrientDB, InfiniteGraph

Analytics / visualization: Gephi

Selection drivers: scalability, what algorithms?, licensing constraints

Page 63

The Key Value Store

Lots of Data

Some Database To Perform Rapid Calculations, And Some Database For Graphs, And Some Distributed Key Value Store

Some Tools To Do Some Maths, Some Others To Do Some Charts

Some Place To Pour It In And Clean / Prepare It

Page 64

NoSQL

Search: Solr, Elasticsearch

Document: MongoDB, CouchDB

Key-value: Redis, HBase

Selection drivers: durability / availability…, performance, ease of use and API, indexing

Page 65

Action requires Prediction

Lots of Data

Some Database To Perform Rapid Calculations, And Some Database For Graphs, And Some Distributed Key Value Store

Some Tools To Do Some Maths, Some Others To Do Some Charts

Some Place To Pour It In And Clean / Prepare It

Draw A Line For The Future

What are my real user groups?

Should I launch a discount offer or not? To everybody or to specific users only?

Page 66

The Medieval Fairy Land

Lots of Data

Some Tools To Do Some Maths, Some Others To Do Some Charts, And Some MACHINE LEARNING

Some Place To Pour It In And Clean / Prepare It

Some Database To Perform Rapid Calculations, And Some Database For Graphs, And Some Distributed Key Value Store

Page 67

Predictions

Java: Mahout (Hadoop), WEKA

Python: scikit-learn, PyML

R

Commercial: KXEN, SAS, SPSS…

Selection drivers: scalability, black box / white box?, data management integration

Page 68

Can be fun

Machine Learning

Page 69

Classes of Machine Learning Problems

Exploratory Data Analysis ◦ Identifying and visualizing key patterns and correlations within the dataset

Unsupervised Learning ◦ Creating groups of similar observations sharing the same patterns (aka clustering, segmentation)

Supervised Learning ◦ Modeling a variable using independent features (aka scoring, predictive modeling, classification)

Time Series Forecasting ◦ Predicting a time-dependent variable using its own history, and sometimes other covariates (variables)

Graph Analysis ◦ Analyzing relationships between a set of “nodes” linked by “edges”

Association / Sequence Mining ◦ Identifying frequently associated items within transaction / event databases, sometimes ordered over time

And many more…
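Time series forecasting can be illustrated with the simplest autoregressive model, AR(1): y[t] = c + phi * y[t-1], fitted by least squares on consecutive pairs of the series. A minimal sketch with a hypothetical monthly revenue series; the ARMA and ARIMA families add moving-average terms and differencing on top of this idea.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]. Returns (c, phi)."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    phi = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
           / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - phi * mean_x, phi


def forecast(series, steps):
    """Iterate the fitted AR(1) model to forecast the next `steps` values."""
    c, phi = fit_ar1(series)
    last, out = series[-1], []
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out


revenue = [100.0, 110.0, 121.0, 133.1, 146.41]  # hypothetical: +10% per month
print([round(v, 2) for v in forecast(revenue, 2)])  # [161.05, 177.16]
```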

Page 70

Mapping ML to Business Questions

Class: sample business questions

Exploratory Data Analysis: What does my dataset look like? What are the key correlations in my data?

Unsupervised Learning: Can I create groups of users who share the same purchasing behavior? The same navigation behavior?

Supervised Learning: Which users are likely to click on ad X? Which users are likely to convert to paying users? Who is going to leave my service? What is the profile of the users who do X?

Time Series Forecasting: What is the forecast of my revenue next month? Given the weather forecast, can I also forecast my sales? Product sales forecast (for overbooking)

Graph Analysis: Can I identify influencers in my user community? Can I recommend new friends to my users?

Association & Sequence Mining: Which products are frequently bought together? What is the typical navigation path on my website?
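“Which products are frequently bought together?” can be answered, in its simplest form, by counting how often each pair of products appears in the same basket (its support). A minimal sketch, assuming hypothetical baskets and a hypothetical 50% minimum support; Apriori extends the same counting to larger itemsets and prunes candidates whose subsets are infrequent.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return {(item_a, item_b): support} for pairs found in >= min_support of baskets."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each unordered pair is counted once per basket.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "jam"],
]
print(frequent_pairs(baskets, min_support=0.5))
# {('bread', 'butter'): 0.5, ('butter', 'jam'): 0.5}
```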

Page 71

Machine Learning Methods Detailed

Analytical task / ML task: sample algorithms (shape of dataset)

Exploratory Data Analysis
• Univariate analysis: distribution, frequencies, histogram, boxplots, fit tests… (N obs., 1 row per obs. × P features)
• Bivariate analysis: scatterplots, correlations (Pearson, Spearman), GLM, chi-square… (N obs., 1 row per obs. × P features)
• Multivariate analysis: principal component analysis, multidimensional scaling, correspondence analysis, factor analysis… (N obs., 1 row per obs. × P features)

“Oriented” Data Analysis
• Unsupervised learning: K-means, K-medoids, hierarchical clustering, Gaussian mixture models, mean shift, DBSCAN, spectral clustering… (N obs., 1 row per obs. × P features)
• Supervised learning: linear & logistic regression, decision trees, neural networks, SVM, naïve Bayes, K-NN, random forests… (N obs., 1 row per obs. × P features)
• Time series forecasting: ARMA, VARMAX, ARIMA… (time series; rows: time periods, columns: measures)
• Graph analysis: centrality (closeness, betweenness, PageRank, HITS), modularity (Louvain)… (node and edge lists, + attributes)
• Associations & sequences: frequent itemsets, Apriori, market basket… ((timestamped) events or transactions)
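PageRank, listed under graph analysis, is one way to look for influencers: a node scores highly when it is linked from other high-scoring nodes. A minimal power-iteration sketch, assuming a hypothetical directed “follows” graph stored as an adjacency dict, a damping factor of 0.85 and a fixed number of iterations instead of a convergence test.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank. graph: {node: [nodes it links to]}."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1 - damping) / n for node in nodes}  # random jump share
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node (no outgoing links): spread its score uniformly.
                for t in nodes:
                    new_rank[t] += damping * rank[node] / n
        rank = new_rank
    return rank


follows = {"ann": ["cid"], "bob": ["cid"], "cid": ["ann"], "dan": ["cid"]}
scores = pagerank(follows)
print(max(scores, key=scores.get))  # cid
```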

Page 72

Cluster a dataset into K buckets by assigning each point to its “closest” center, then moving each center to the mean of its bucket

Unsupervised Method: K-Means
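A minimal sketch of K-means (Lloyd's algorithm) on 2-D points, assuming squared Euclidean distance, a fixed number of iterations and, for simplicity, the first K points as initial centers; library implementations such as scikit-learn's `KMeans` use smarter initialization (k-means++) and stop on convergence.

```python
def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, iterations=10):
    """Lloyd's algorithm on 2-D points. Returns (centers, labels)."""
    centers = [points[i] for i in range(k)]  # naive initialization: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its closest center.
        for i, point in enumerate(points):
            labels[i] = min(range(k), key=lambda c: squared_distance(point, centers[c]))
        # Update step: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return centers, labels


points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (2.0, 1.0), (9.5, 8.0)]
centers, labels = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```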

Page 73

Predict the color of a point depending on the colors of its K closest neighbours

Supervised Method: K-Nearest Neighbours
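The “color of a point” example can be sketched directly. A minimal sketch, assuming 2-D points, Euclidean distance (`math.dist`, Python 3.8+) and a plain majority vote among the K nearest neighbours; on a tied vote, `Counter.most_common` returns the label encountered first among the sorted neighbours.

```python
from collections import Counter
from math import dist


def knn_predict(labeled_points, query, k=3):
    """Predict a label for `query` from its k nearest labeled points."""
    nearest = sorted(labeled_points, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


training = [((1, 1), "red"), ((1, 2), "red"), ((2, 1), "red"),
            ((8, 8), "blue"), ((8, 9), "blue"), ((9, 8), "blue")]
print(knn_predict(training, (2, 2)))  # red
```

There is no training step: the whole labeled dataset is kept and scanned at prediction time, which is why K-NN is simple to set up but costly on large datasets.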

Page 74

Find the most “significant” input variable and split value

Split the dataset recursively

Supervised Method: Decision Tree
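Finding the “most significant” variable and split value is usually made concrete with an impurity measure. A minimal sketch of that single step, assuming Gini impurity (the criterion used by CART) and numeric features, on hypothetical data; applying `best_split` again to each side until the leaves are pure or small enough yields the tree.

```python
def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(rows, labels):
    """Return (feature index, threshold, weighted impurity) of the best binary split.

    rows: numeric feature vectors; a row goes left when row[f] <= threshold.
    """
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[f] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # everything on one side: not a real split
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best


# Hypothetical users: [age, visits per week] -> bought or not
rows = [[25, 1], [32, 5], [47, 6], [51, 2], [23, 7], [38, 4]]
labels = ["no", "yes", "yes", "no", "yes", "yes"]
print(best_split(rows, labels))  # (1, 2, 0.0)
```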

Page 75

Several Paths to Machine Learning

[Flowchart: choosing a method, starting from an analytical dataset]

• “I just want to explore” → Exploratory Data Analysis: data viz, correlation analysis, GLM, parametric and non-parametric statistical tests, PCA…, CA…, MDS… Decision points: I’m looking variable by variable, or by pairs? All my variables are numeric? I have a distance matrix?

• “I’m looking for clusters” → Unsupervised Learning: partitioning (K-means…), GMM…, HCA…, DP GMM, K-means + gap | silhouette | …, affinity propagation, mean shift…, 2-step clustering. Decision points: I know how many groups to look for? Small dataset (<<1K)? Medium dataset (<<100K)? I can sample?

• “I want to predict a variable” → Supervised Learning*: if I value interpretability, generalized linear model or simple decision tree; if not only interpretability, also support vector machines, neural networks, K-nearest neighbors, ensembles (random forest, gradient boosted trees), MARS, generalized additive model

* Methods generally working for both classification & regression

Page 76

Questions?

Take Away ◦ There are new ways to perform data analytics that are within your reach and can bring business value

Some Additional Resources ◦ Open Source Projects

Dataiku Cloud Transport Client: http://dctc.io

Dataiku Web Tracker: https://github.com/dataiku/wt1

◦ Our Technical Blog: http://www.dataiku.com/blog