Dato Keynote

The ML pipeline circa 2013

Data ML

Algorithm

My curve is better than your curve

Write a paper

Retail

Movie Distribution

Music

Advertising

Networking

Search

Taxis

Dating

Legal Advice

Human Resources

Coupons

Campaigning

Real Estate

Wearables

CRM

Disruptive companies differentiated by

INTELLIGENT APPLICATIONS

using

Machine Learning

Dato’s mission is to accelerate the creation of

intelligent applications

by making sophisticated machine learning

as easy as “Hello world!”

•  Released 3 products

•  More than 10,000 downloads

GraphLab Create, Dato Distributed, Dato Predictive Services

Since last year…

Our customers…

Demo: Intelligent application (Gift for Julia)

Systems Elastic, scalable

People Data scientist

Challenge today: Path from inspiration to production

Production

Prototyping

Inspiration

Scale

Sophisticated ML Production

Sophisticated ML is impractical

• Hard to match algo to app
• Algos trapped in paper

Scaling is costly

• Rewrite algo from scratch
• Expensive infrastructure

Deployment: more costly infrastructure & time

• Build custom services & API
• Model quality deteriorates

Deploy Service

Slow & expensive process

Sophisticated ML is impractical

ML development today

Inspiration for Intelligent Application

Data

Top down solution would be easiest

Read data

Extract text

Create features

Choose model

Tune parameter

Forced to go bottom up

Try again

And again

but not possible:

Application is innovative →

no black box solution available

Fine approach if it’s 2013 & I’m obsessed with

“my curve is better than your curve” (i.e., yet another solution for same old problem)

or not primarily focused on accelerating creation of intelligent applications

Inspiration for Intelligent Application

Data

If in 5 years all applications are intelligent, ML needs:

Start from relevant, high-level, sophisticated ML building blocks

Don’t waste time on boring stuff, like parameter search, or worry about specialized ML knowledge, like SGD

Quickly write code: combine, blend,

understand, adapt, improve, optimize

ML done differently,

Let’s see

how…

Demo: Building an intelligent application with GraphLab Create (Restaurant recommender)

High-level ML toolkits: get started with 4 lines of code, then modify, blend, add yours…

Recommender, image search, sentiment analysis, data matching, auto tagging, churn predictor, object detector, product sentiment, click prediction, fraud detection, user segmentation, data completion, anomaly detection, document clustering, forecasting, search ranking, summarization, …

import graphlab as gl
data = gl.SFrame.read_csv('my_data.csv')
model = gl.recommender.create(data,
                              user_id='user', item_id='movie', target='rating')
recommendations = model.recommend(k=5)
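A minimal sketch of the kind of work a recommender toolkit does behind a call like this, assuming an item-similarity approach (cosine similarity between items over the users who rated them). It is plain Python for illustration only, not GraphLab Create's implementation; the function names and the toy ratings are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def fit_item_similarity(ratings):
    """Cosine similarity between items from (user, item, rating) triples."""
    by_item = defaultdict(dict)
    for user, item, rating in ratings:
        by_item[item][user] = rating

    norms = {
        item: sqrt(sum(r * r for r in users.values()))
        for item, users in by_item.items()
    }
    sims = defaultdict(dict)
    for a in by_item:
        for b in by_item:
            if a == b:
                continue
            common = set(by_item[a]) & set(by_item[b])
            if not common:
                continue  # no shared raters, so no evidence of similarity
            dot = sum(by_item[a][u] * by_item[b][u] for u in common)
            sims[a][b] = dot / (norms[a] * norms[b])
    return sims


def recommend(ratings, sims, user, k=5):
    """Score unseen items by similarity to the items the user already rated."""
    seen = {item: r for u, item, r in ratings if u == user}
    scores = defaultdict(float)
    for item, rating in seen.items():
        for other, sim in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += sim * rating
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]


# Toy data in the same (user, item, rating) shape as the slide's columns.
ratings = [
    ("ann", "A", 5), ("ann", "B", 4),
    ("bob", "A", 5), ("bob", "B", 5), ("bob", "C", 1),
    ("dan", "B", 5), ("dan", "D", 5),
]
sims = fit_item_similarity(ratings)
recs = recommend(ratings, sims, "ann", k=5)
```

The point of a high-level toolkit is that this loop, plus model choice and scaling, stays hidden until you want to open it up and modify it.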

Sophisticated machine learning made easy Create Intelligence Accelerants

High-level ML toolkits

AutoML

tune params, model selection,…

→ so you can focus on creative parts

Reusable features

transferrable feature engineering

→ accuracy with less data & less effort
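To make the "tune params" part of AutoML concrete, here is a small sketch of a grid search with a held-out validation set. The model (one-dimensional ridge regression), the synthetic data, and every name in the code are assumptions chosen to keep the example self-contained; the slides do not say which search strategy the product uses.

```python
import random


def fit_ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge regression: w = sum(x*y) / (sum(x*x) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(w, xs, ys):
    """Mean squared error of the prediction w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search(train, valid, grid):
    """Fit on train for each candidate, keep the one with lowest validation error."""
    best = None
    for lam in grid:
        w = fit_ridge_1d(train[0], train[1], lam)
        err = mse(w, valid[0], valid[1])
        if best is None or err < best[0]:
            best = (err, lam, w)
    return best


# Synthetic data: y = 3x + noise (hypothetical, for illustration only).
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(200)]
ys = [3.0 * x + rng.gauss(0, 0.1) for x in xs]

split = int(0.8 * len(xs))  # 80% train, 20% validation
train = (xs[:split], ys[:split])
valid = (xs[split:], ys[split:])

grid = [0.0, 0.1, 1.0, 10.0, 100.0]
best_err, best_lam, best_w = grid_search(train, valid, grid)
```

Automating this loop is what frees the practitioner to spend time on the creative parts rather than on the search itself.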

Makes ML hard

Understand & scale

complex models

Feature engineering

Need for lots of

labeled data

Very hard! Usually: Simple models & lots of feature engineering

Krishna’s talk tomorrow @9:10am: auto feature engineering

Next: Transfer learning can provide complex models with less work & less data

Modeling challenge Data challenge

Representation challenge

Example: Deep learning in computer vision

(or the deep devil is in the deep details)

Image features
•  Features = local detectors
   o  Combined to make prediction
   o  (in reality, features are more low-level)

Face!

Eye

Eye

Nose

Mouth

Many hand-created features exist…

Computer vision features: SIFT, Spin image, HoG, RIFT, Textons, GLOH

Slide credit: Honglak Lee

Standard image classification approach

Input

Computer vision features: SIFT, Spin image, HoG, RIFT

Extract features

Use simple classifier, e.g., logistic regression, SVMs

Car?

Many hand-created features exist…

… but very painful to design

Deep neural networks implicitly learn features

Each layer learns features, at different levels of abstraction

Y. LeCun, M.A. Ranzato

Deep Learning = Learning Hierarchical Representations

It's deep if it has more than one stage of non-linear feature transformation

Low-Level Feature → Mid-Level Feature → High-Level Feature → Trainable Classifier

Feature visualization of convolutional net trained on ImageNet from [Zeiler & Fergus 2013]

Color & edge detectors

Geometric detectors

Car-specific detectors

Deep learning has yielded exciting accuracy, e.g., Krizhevsky et al. won 2012 ImageNet competition impressively

Huge gain

Challenges of deep learning

Deep learning workflow

Lots of labeled data

Training set

Validation set

80%

20%

Learn deep neural net

model

Validate

Many tricks needed to work well…

Different types of layers, connections,… needed for high accuracy

Krizhevsky et al. ‘12

GraphLab Create adds deep features

Deep learning + Transfer learning

Change image classification approach?

Input

Computer vision features: SIFT, Spin image, HoG, RIFT

Extract features

Use simple classifier, e.g., logistic regression, SVMs

Car?

Can we learn features from data, even when

we don’t have data or time?

Transfer learning: Use data from one domain to help learn on another

Lots of data:

Learn neural net

Great accuracy on cat vs. dog

Some data:

Neural net as feature extractor

+

Simple classifier

Great accuracy on 101

categories

Old idea, explored for deep learning by Donahue et al. ’14

What’s learned in a neural net

Neural net trained for Task 1: cat vs. dog

Very specific to Task 1: should be ignored for other tasks

More generic: can be used as feature extractor

Transfer learning in more detail…

Neural net trained for Task 1: cat vs. dog

Very specific to Task 1: should be ignored for other tasks

More generic: can be used as feature extractor

Keep weights fixed!

For Task 2, predicting 101 categories, learn only end part

Use simple classifier e.g., logistic regression, SVMs

Class?

Transfer learning with deep features

Training set

Validation set

80%

20%

Learn simple model

Some labeled data

Extract features

with neural net trained on different

task

Validate

Deploy in production
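The workflow above (frozen feature extractor, 80/20 split, simple classifier on top) can be sketched in plain Python. Everything here is an assumption made for illustration: the "pretrained" layer is a tiny hand-set ReLU layer standing in for the generic early layers of a real network, the data is synthetic 2-D points, and the classifier is logistic regression trained by SGD.

```python
import math
import random

# Stand-in for generic layers learned on Task 1. These weights stay fixed.
FROZEN_W = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]


def extract_features(x):
    """Frozen layer: ReLU(W x). Never updated while training on Task 2."""
    return [max(0.0, w0 * x[0] + w1 * x[1]) for w0, w1 in FROZEN_W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, epochs=50, lr=0.5):
    """Learn only the end part: a logistic regression over the deep features."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            g = y - p  # gradient of the log-likelihood w.r.t. the logit
            w = [wi + lr * g * fi for wi, fi in zip(w, f)]
            b += lr * g
    return w, b


def accuracy(w, b, features, labels):
    hits = 0
    for f, y in zip(features, labels):
        pred = 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0
        hits += int(pred == y)
    return hits / len(labels)


# Task 2: "some labeled data" (synthetic; label is 1 when x0 + x1 > 0).
rng = random.Random(0)
points, labels = [], []
while len(points) < 200:
    x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    if abs(x[0] + x[1]) < 0.2:
        continue  # keep a margin so the toy task is cleanly separable
    points.append(x)
    labels.append(1 if x[0] + x[1] > 0 else 0)

feats = [extract_features(x) for x in points]
split = int(0.8 * len(feats))  # 80% training set, 20% validation set
train_x, train_y = feats[:split], labels[:split]
valid_x, valid_y = feats[split:], labels[split:]

w, b = train_logistic(train_x, train_y)
val_acc = accuracy(w, b, valid_x, valid_y)
```

Because only the small final model is trained, this needs far less labeled data and compute than learning the whole network from scratch.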

Deep learning tutorial tomorrow, 4pm!

Demo: The power of deep features, a.k.a., transfer learning (Shoes, please)

How general are deep features?

Talk by founder, Jason Gates, tomorrow 9:40am

GraphLab Create includes easy to use, deep learning on multi-GPUs

graphlab.deeplearning.create(data, target='label')

Deep learning in 1 line of code

You can also open the box and add your own layers:

Average Pooling Layer, Convolution Layer, Dropout Layer, Flatten Layer, Full Connection Layer, Max Pooling Layer, Rectified Linear Layer, Sigmoid Layer, SoftMax Layer, SoftPlus Layer, Sum Pooling Layer, Tanh Layer

[Figure: Digit recognition benchmark. Test error (0.60% to 0.85%) vs. hours (0 to 15). H2O.ai: 10 machines/80 cores. GraphLab Create: 4 min on 4 GPUs.]

GraphLab Create for intelligent applications

High-level ML toolkits (4 lines of code gets you started)

deep learning, recommender, product reviews, data matching, sentiment, image search, churn,

click prediction, customer segmentation, fraud detection,…

Auto Feature Engineering (automate, achieve high accuracy)
•  deep & reusable features
•  data transformation pipelines
•  kernels & hashing, encodings

AutoML (automate to focus on creativity)
•  parameter search
•  model selection
•  algorithm selection
•  distributed

Tables, graphs, text, images

Scalable viz for TBs of data

Including Matplotlib at scale

Anthony Goldbloom Founder & CEO

Debora Donato Sr. Director of Personalization & Principal Data Scientist

Native Advertising – The opportunity of making ads valuable

For the users

For the publishers

Bad advertising does not work for anybody

The data:
•  400k raw html pages containing:
   o  text, images, links, and well, everything web pages have

The task:
•  predict which pages are organic and which are sponsored advertising

When:
•  starts August 1!

The Prize
•  Fame!!!
•  Knowledge!!!
•  $10,000

A lot of effort in Kaggle competitions involves running many experiments…

…can get slow ☹

SFrame ❤ all ML tools

SGraph

Sophisticated machine learning made scalable Data Structures to Create Intelligence

Data frames user movie rating

When you choose a data frame,

have your application in mind

SFrame is optimized for ML

ML has specific data access patterns,

we make them fast, really fast (Columnar transformations,

creating new features, iterations,…)

… Same code

user movie rating

SFrame: Scalable data frame optimized for ML
•  Never run out of memory
•  Sharded, compressed, out-of-core, columnar
•  Arbitrary lambda transformations, joins,… from Python

Talk tomorrow with details: Yucheng @11am

Large data on one machine?

Limited RAM → Must use disk (out-of-core computation)

Opportunity for Out-of-Core ML

Capacity    Throughput
1 TB        0.5 GB/s
10 TB       0.1 GB/s
0.1 TB      1 GB/s

Fast, but significantly limits data size

Opportunity for big data on 1 machine

For sequential reads only! Random access very slow

Out-of-core ML opportunity is huge

Usual design → Lots of random access → Slow

Design to maximize sequential access for

ML algo patterns

GraphChi: early example

SFrame: data frame for ML
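A minimal sketch of the sequential-access idea, assuming the data sits in a CSV file on disk: the file is read front to back exactly once and only a running total is held in memory, so the dataset can be larger than RAM. The file layout and function names are hypothetical, and this says nothing about how SFrame itself is implemented.

```python
import csv
import os
import tempfile


def stream_rows(path):
    """Yield one row at a time; the file is only ever read sequentially."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield row


def column_mean(path, column):
    """One sequential pass, constant memory, no random access."""
    total = 0.0
    count = 0
    for row in stream_rows(path):
        total += float(row[column])
        count += 1
    return total / count if count else float("nan")


# Demo on a small temporary file standing in for a much larger one.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ratings.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["user", "movie", "rating"])
        for i in range(1000):
            writer.writerow([i % 10, i % 7, i % 5 + 1])
    mean_rating = column_mean(path, "rating")
```

Many ML passes (computing statistics, creating features, SGD epochs) can be phrased as scans like this one, which is what makes designing for sequential access attractive.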

Demo: 10TBs of data on one machine!

SFrame ❤ all ML

scikit-learn is awesome, but...

[Figure: runtime (s), 0 to 4000, vs. millions of rows, 0 to 400, on the Airline Delay Dataset with SGDLinearClassifier. Series: scikit-learn + Numpy. Annotations: "Out of RAM"; "Numpy in memory only".]

Demo: 10TBs of data on one machine redux

Numpy Automatically Backed by SFrames → Scale many Python packages (scikit-learn, scipy,…)

import graphlab.numpy
Scalable numpy activation successful

[Figure: runtime (s), 0 to 4000, vs. millions of rows, 0 to 400, on the Airline Delay Dataset with SGDLinearClassifier. Series: GraphLab Create + scikit-learn + Numpy; scikit-learn + Numpy. Annotation: "Out of RAM".]

Caveats apply

- Scales most memory-bound sklearn algorithms

- Sequential access highly preferred for performance

ML is not just about tables

ML pipelines combine multiple data types

Raw Wikipedia

XML

Hyperlinks PageRank Top 20 Pages

Title PR Text

Table

Title Body Topic Model

(LDA) Word Topics

Word Topic

Term-Doc Graph

SGraph

Graph processing & analytics

Out-of-core & scalable

Neighborhoods, paths, graph algos, community detection,

label propagation, ML on graphs, viz, …

Backed by SFrame

Performance of SGraph

Connected components in Twitter graph (41 million nodes, 1.4 billion edges)

GraphLab Create: 70 sec
GraphX: 251 sec
Giraph: 200 sec
Spark: 2,128 sec

[Figure: bar chart, runtime axis 0 to 2250 sec. Labels: SGraph, 16 machines, 1 machine.]

Source(s): Gonzalez et al. (OSDI 2014)
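For readers unfamiliar with the benchmark task, here is a small sketch of connected components over an edge list. It uses union-find, chosen here for brevity; the slides do not say which algorithm SGraph or the other systems use, and the toy graph is hypothetical. The edges are consumed in a single sequential scan.

```python
def connected_components(num_nodes, edges):
    """Label every node with the smallest node id in its component."""
    parent = list(range(num_nodes))

    def find(x):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:  # one pass over the edge list
        ru, rv = find(u), find(v)
        if ru != rv:
            # Attach the larger root under the smaller one, so each root
            # is always the minimum node id of its component.
            parent[max(ru, rv)] = min(ru, rv)

    return [find(i) for i in range(num_nodes)]


# Toy graph: {0, 1, 2} and {3, 4} are connected; node 5 is isolated.
edges = [(0, 1), (1, 2), (3, 4)]
labels = connected_components(6, edges)
num_components = len(set(labels))
```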

Pagerank on Common Crawl Graph 3.5 billion Nodes and 128 billion Edges

[Figure: minutes per iteration (axis 0 to 10) on 1 machine with 16 CPUs and 1 SSD.]
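The "per iteration" unit in this benchmark refers to one PageRank update over all edges. A minimal sketch of that iteration in plain Python follows, assuming the common formulation with a damping factor of 0.85 and rank from dangling nodes spread uniformly; the toy graph and names are hypothetical, and this is not SGraph's implementation.

```python
def pagerank(num_nodes, edges, damping=0.85, iterations=50):
    """Power iteration; each iteration is one sequential scan of the edges."""
    out_degree = [0] * num_nodes
    for src, _ in edges:
        out_degree[src] += 1

    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(iterations):
        incoming = [0.0] * num_nodes
        for src, dst in edges:
            incoming[dst] += rank[src] / out_degree[src]
        # Rank held by nodes without out-links is redistributed uniformly.
        dangling = sum(rank[i] for i in range(num_nodes) if out_degree[i] == 0)
        base = (1.0 - damping) / num_nodes + damping * dangling / num_nodes
        rank = [base + damping * r for r in incoming]
    return rank


# Toy graph: 0 -> 1 -> 2 -> 0 is a cycle, and 3 links into it.
edges = [(0, 1), (1, 2), (2, 0), (3, 0)]
ranks = pagerank(4, edges)
```

Each iteration touches every edge once, which is why edges processed per second is a natural throughput measure for this workload.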

We ❤ open source

SFrame & SGraph

Optimized out-of-core

computation for ML

High Performance
1 machine can handle:
•  TBs of data
•  100s of billions of edges

Optimized for ML
•  Columnar transformation
•  Create features
•  Iterators
•  Filter, join, group-by, aggregate
•  User-defined functions
•  Easily extended through SDK

Tables, graphs, text, images

Open-source ❤

BSD license

(August)

Distributed machine learning

Your big data infrastructure

(cloud, hadoop, spark,..)

Sophisticated machine learning made distributed Create Intelligence on Huge Data

Pagerank on Common Crawl Graph 3.5 billion Nodes and 128 billion Edges

[Figure: minutes per iteration (axis 0 to 10), 1 machine (16 CPUs) vs. 16 machines (256 CPUs). Annotation: 45 secs/iteration, 3B edges/sec.]

Criteo Terabyte Click Prediction

4.4 Billion Rows 13 Features

½ TB of data

[Figure: runtime (axis 0 to 4000) vs. number of machines (0 to 16). Annotated points: 3630s and 225s.]

Same code, distributed ML

Single machine ML code:

import graphlab as gl
data = gl.SFrame.read_csv('s3://…')
model = gl.classifier.create(data, target='click')

c = gl.deploy.ec2_cluster.load('s3://…')
gl.set_distributed_execution_environment(c)

c = gl.deploy.hadoop_cluster.load('hdfs://…')
c = gl.deploy.spark_cluster.load('hdfs://…')
…

Dato machine learning platform

Inspiration

Scale

Sophisticated ML

Optimized for ML performance, for any data size, on any infrastructure

AutoML

GraphLab Create

ML Toolkits

Canvas

Reusable Features

Job Mgmt

Distributed Engine

Distributed ML Dato Distributed

SGraph

Create Engine

SFrame GraphLab Create

Machine Learning in Production

Deployment

Easily serve live predictions

Deployment Engineers

Deploying ML models

Data Scientists

Exciting new deep learning model.

How long is this going to take?!

REST API! I will be done today.

It’s accurate!

Dato Predictive Services
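To illustrate what "serve live predictions over a REST API" involves, here is a small standard-library sketch. It is not Dato Predictive Services and does not use its API; the `predict` function, its weights, and the port are hypothetical stand-ins for a trained model and a real deployment.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Stand-in model: a hypothetical linear scorer over named features."""
    weights = {"age": 0.01, "clicks": 0.2}
    score = sum(weights.get(k, 0.0) * float(v) for k, v in features.items())
    return {"score": score}


def handle_request(body):
    """Turn a raw JSON request body into (HTTP status, JSON response bytes)."""
    try:
        features = json.loads(body)
        if not isinstance(features, dict):
            raise ValueError("request body must be a JSON object")
        return 200, json.dumps(predict(features)).encode()
    except (ValueError, TypeError):
        error = {"error": "expected a JSON object of numeric features"}
        return 400, json.dumps(error).encode()


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_request(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Blocks forever; call this explicitly to start the service."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()


# The request logic can be exercised without starting the server.
status, payload = handle_request(b'{"age": 30, "clicks": 2}')
```

Even this toy leaves out what makes production hard: scaling, caching, monitoring, and swapping models safely, which is the gap a managed prediction service is meant to close.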

Choosing between deployed models

Machine Learning in Production

Evaluation

Monitoring

Deployment

Management

Easily serve live predictions

Measuring quality of deployed models

Tracking model operations

Talk tomorrow with details: Alice & Rajat @1:45pm

Inspiration

Scale

Production Deploy Service


AutoML

GraphLab Create

ML Toolkits

Canvas

Reusable Features REST Client Model Mgmt

Dato Predictive Services

Robust, Elastic

Direct

Job Mgmt

Distributed Engine


SGraph

Create Engine


Sophisticated ML

Creation of intelligent applications, faster & cheaper

My curve is better than your curve

INTELLIGENT APPLICATIONS

are disrupting markets

Phase transition of machine learning

Accelerate this process

> pip install graphlab-create

@guestrin
