Dato Keynote


Transcript of Dato Keynote

Page 1: Dato Keynote
Page 2: Dato Keynote

The ML pipeline circa 2013

Data ML

Algorithm

My curve is better than your curve

Write a paper

Page 3: Dato Keynote
Page 4: Dato Keynote

Retail

Movie Distribution

Music

Advertising

Networking

Search

Taxis

Dating

Legal Advice

Human Resources

Coupons

Campaigning

Real Estate

Wearables

CRM

Disruptive companies differentiated by INTELLIGENT APPLICATIONS using Machine Learning

Page 5: Dato Keynote

Dato’s mission is to accelerate the creation of intelligent applications by making sophisticated machine learning as easy as “Hello world!”

Page 6: Dato Keynote

Since last year…

•  Released 3 products: GraphLab Create, Dato Distributed, Dato Predictive Services

•  More than 10,000 downloads

Page 7: Dato Keynote

Since last year…

Our customers…

Page 8: Dato Keynote

Demo: Intelligent application (Gift for Julia)

Page 9: Dato Keynote

Systems Elastic, scalable

People Data scientist

Challenge today: Path from inspiration to production

Production

Prototyping

Inspiration

Scale

Sophisticated ML Production

Sophisticated ML is impractical

• Hard to match algo to app
• Algos trapped in paper

Scaling is costly

• Rewrite algo from scratch
• Expensive infrastructure

Deployment: more costly infrastructure & time

• Build custom services & API
• Model quality deteriorates

Deploy Service

Slow & expensive process

Page 10: Dato Keynote

Sophisticated ML is impractical

Page 11: Dato Keynote

ML development today

Inspiration for Intelligent Application

Data

Top down solution would be easiest

Read data

Extract text

Create features

Choose model

Tune parameter

Forced to go bottoms up

Try again

And again

but not possible: Application is innovative → no black box solution available

Fine approach if it’s 2013 & I’m obsessed with “my curve is better than your curve” (i.e., yet another solution for same old problem), or not primarily focused on accelerating creation of intelligent applications

Page 12: Dato Keynote

Inspiration for Intelligent Application

Data

If all applications are to be intelligent in 5 years, ML needs to:

•  Start from relevant, high-level, sophisticated ML building blocks

•  Not waste time on boring stuff, like parameter search, or worry about specialized ML knowledge, like SGD

•  Quickly write code: combine, blend, understand, adapt, improve, optimize

Read data

Extract text

Create features

Choose model

Tune parameter

Forced to go bottoms up

Try again

And again

ML done differently. Let’s see how…

Page 13: Dato Keynote

Demo: Building an intelligent application with GraphLab Create (Restaurant recommender)

Page 14: Dato Keynote

High-level ML toolkits: get started with 4 lines of code, then modify, blend, add yours…

Recommender, Image search, Sentiment analysis, Data matching, Auto tagging, Churn predictor, Object detector, Product sentiment, Click prediction, Fraud detection, User segmentation, Data completion, Anomaly detection, Document clustering, Forecasting, Search ranking, Summarization, …

import graphlab as gl
data = gl.SFrame.read_csv('my_data.csv')
model = gl.recommender.create(data, user_id='user', item_id='movie', target='rating')
recommendations = model.recommend(k=5)
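To give a feel for what a recommender toolkit does with (user, item, rating) data, here is a minimal pure-Python sketch of an item-similarity recommender. It is an illustration only and not GraphLab Create's implementation; the function names `item_similarity` and `recommend` and the scoring rule (cosine similarity weighted by the user's own ratings) are assumptions made for this example.

```python
from collections import defaultdict
from math import sqrt


def item_similarity(ratings):
    """Cosine similarity between items, from (user, item, rating) triples."""
    by_item = defaultdict(dict)
    for user, item, rating in ratings:
        by_item[item][user] = rating
    sims = defaultdict(dict)
    for a in by_item:
        for b in by_item:
            if a == b:
                continue
            common = set(by_item[a]) & set(by_item[b])
            if not common:
                continue  # no user rated both items
            dot = sum(by_item[a][u] * by_item[b][u] for u in common)
            norm_a = sqrt(sum(r * r for r in by_item[a].values()))
            norm_b = sqrt(sum(r * r for r in by_item[b].values()))
            sims[a][b] = dot / (norm_a * norm_b)
    return sims


def recommend(ratings, user, k=5):
    """Top-k unseen items, scored by similarity to items the user rated."""
    sims = item_similarity(ratings)
    seen = {item: r for u, item, r in ratings if u == user}
    scores = defaultdict(float)
    for item, rating in seen.items():
        for other, sim in sims[item].items():
            if other not in seen:
                scores[other] += sim * rating
    # Highest score first; ties broken by item name for determinism.
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]
```

Calling `recommend(ratings, "ann", k=5)` on a small list of triples returns up to five item names the user has not rated yet.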

Page 15: Dato Keynote

Sophisticated machine learning made easy Create Intelligence Accelerants

High-level ML toolkits

AutoML

tune params, model selection,…

→ so you can focus on creative parts

Reusable features

transferable feature engineering

→ accuracy with less data & less effort

Page 16: Dato Keynote

Makes ML hard

Modeling challenge: Understand & scale complex models

Representation challenge: Feature engineering

Data challenge: Need for lots of labeled data

Very hard! Usually: Simple models & lots of feature engineering

Krishna’s talk tomorrow @9:10am: auto feature engineering

Next: Transfer learning can provide complex models with less work & less data

Page 17: Dato Keynote

Example: Deep learning in computer vision

(or the deep devil is in the deep details)

Page 18: Dato Keynote

Image features

•  Features = local detectors
   o  Combined to make prediction
   o  (in reality, features are more low-level)

Face!

Eye

Eye

Nose

Mouth

Page 19: Dato Keynote

Many hand-created features exist…

Computer vision features: SIFT, Spin image, HoG, RIFT, Textons, GLOH

Slide credit: Honglak Lee

Page 20: Dato Keynote

Standard image classification approach

Input

Extract features

Computer vision features: SIFT, Spin image, HoG, RIFT, Textons, GLOH (slide credit: Honglak Lee)

Use simple classifier, e.g., logistic regression, SVMs

Car?

Page 21: Dato Keynote

Many hand-created features exist…

Computer vision features: SIFT, Spin image, HoG, RIFT, Textons, GLOH

Slide credit: Honglak Lee

… but very painful to design

Page 22: Dato Keynote

Deep neural networks implicitly learn features

Each layer learns features, at different levels of abstraction

Deep Learning = Learning Hierarchical Representations (slide credit: Y. LeCun, M.A. Ranzato)

It's deep if it has more than one stage of non-linear feature transformation

Low-Level Feature → Mid-Level Feature → High-Level Feature → Trainable Classifier

Feature visualization of convolutional net trained on ImageNet from [Zeiler & Fergus 2013]

Color & edge detectors

Geometric detectors

Car-specific detectors

Page 23: Dato Keynote

Deep learning has yielded exciting accuracy, e.g., Krizhevsky et al. won 2012 ImageNet competition impressively

Huge gain

Page 24: Dato Keynote

Challenges of deep learning

Page 25: Dato Keynote

Deep learning workflow

Lots of labeled data → Training set (80%) + Validation set (20%)

Training set → Learn deep neural net model → Validate (on validation set)
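The 80%/20% split in this workflow can be sketched in a few lines of plain Python. This is a minimal illustration, not the toolkit's own API; the function name `train_validation_split` and its `seed` parameter are assumptions for this example.

```python
import random


def train_validation_split(rows, train_fraction=0.8, seed=0):
    """Shuffle the rows, then cut them into a training and a validation set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

The model is fit on the first list only; the second list is held back to check accuracy on data the model has not seen.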

Page 26: Dato Keynote

Many tricks needed to work well…

Different types of layers, connections,… needed for high accuracy

Krizhevsky et al. ‘12

Page 27: Dato Keynote

GraphLab Create adds deep features

Deep learning + Transfer learning

Page 28: Dato Keynote

Change image classification approach?

Input

Extract features

Computer vision features: SIFT, Spin image, HoG, RIFT, Textons, GLOH (slide credit: Honglak Lee)

Use simple classifier, e.g., logistic regression, SVMs

Car?

Can we learn features from data, even when we don’t have data or time?

Page 29: Dato Keynote

Transfer learning: Use data from one domain to help learn on another

Lots of data: Learn neural net → Great accuracy on cat vs. dog

Some data: Neural net as feature extractor + Simple classifier → Great accuracy on 101 categories

Old idea, explored for deep learning by Donahue et al. ’14

Page 30: Dato Keynote

What’s learned in a neural net

Neural net trained for Task 1: cat vs. dog

•  Very specific to Task 1: should be ignored for other tasks
•  More generic: can be used as feature extractor

Page 31: Dato Keynote

Transfer learning in more detail…

Neural net trained for Task 1: cat vs. dog

•  Very specific to Task 1: should be ignored for other tasks
•  More generic: can be used as feature extractor. Keep weights fixed!

For Task 2, predicting 101 categories, learn only end part

Use simple classifier, e.g., logistic regression, SVMs

Class?
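The idea of keeping the generic weights fixed and learning only the end part can be sketched in plain Python. This is a toy illustration under stated assumptions: `FROZEN_WEIGHTS` is a hypothetical stand-in for layers learned on Task 1 (a real net would supply them), and the "simple classifier" is a small logistic regression trained by stochastic gradient descent.

```python
import math

# Hypothetical stand-in for the generic layers learned on Task 1.
# In practice these weights would come from a net trained on lots of data.
FROZEN_WEIGHTS = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]


def extract_features(x):
    """Frozen layer: ReLU(W x). These weights are never updated."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
            for row in FROZEN_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_simple_classifier(examples, epochs=200, lr=0.5):
    """Learn only the end part: logistic regression on the frozen features."""
    weights, bias = [0.0] * len(FROZEN_WEIGHTS), 0.0
    feats = [(extract_features(x), y) for x, y in examples]
    for _ in range(epochs):
        for f, y in feats:
            err = sigmoid(sum(w * fi for w, fi in zip(weights, f)) + bias) - y
            weights = [w - lr * err * fi for w, fi in zip(weights, f)]
            bias -= lr * err
    return weights, bias


def predict(model, x):
    weights, bias = model
    f = extract_features(x)
    score = sum(w * fi for w, fi in zip(weights, f)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0
```

Only `weights` and `bias` of the end classifier change during training, which is why a small amount of labeled Task 2 data can be enough.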

Page 32: Dato Keynote

Transfer learning with deep features

Some labeled data → Extract features with neural net trained on different task → Training set (80%) + Validation set (20%)

Training set → Learn simple model → Validate (on validation set) → Deploy in production

Deep learning tutorial tomorrow, 4pm!

Page 33: Dato Keynote

Demo: The power of deep features, a.k.a., transfer learning (Shoes, please)

Page 34: Dato Keynote

How general are deep features?

Talk by founder, Jason Gates, tomorrow 9:40am

Page 35: Dato Keynote

GraphLab Create includes easy-to-use deep learning on multiple GPUs

Deep learning tutorial tomorrow, 4pm!

graphlab.deeplearning.create(data, target='label')

Deep learning in 1 line of code

You can also open the box and add your own layers:

•  Average Pooling Layer
•  Convolution Layer
•  Dropout Layer
•  Flatten Layer
•  Full Connection Layer
•  Max Pooling Layer
•  Rectified Linear Layer
•  Sigmoid Layer
•  SoftMax Layer
•  SoftPlus Layer
•  Sum Pooling Layer
•  Tanh Layer

Page 36: Dato Keynote

Digit recognition benchmark

[Plot: test error, 0.60% to 0.85%, vs. hours, 0 to 15]

H2O.ai: 10 machines/80 cores

GraphLab Create: 4 min on 4 GPUs

Page 37: Dato Keynote

GraphLab Create for intelligent applications

High-level ML toolkits (4 lines of code gets you started)

•  deep learning, recommender, product reviews, data matching, sentiment, image search, churn, click prediction, customer segmentation, fraud detection,…

Auto Feature Engineering (automate, achieve high accuracy)

•  deep & reusable features
•  data transformation pipelines
•  kernels & hashing, encodings

AutoML (automate to focus on creativity)

•  parameter search
•  model selection
•  algorithm selection
•  distributed

Tables, graphs, text, images

Scalable viz for TBs of data

Including Matplotlib at scale

Page 38: Dato Keynote

Anthony Goldbloom, Founder & CEO

Debora Donato, Sr. Director of Personalization & Principal Data Scientist

Page 39: Dato Keynote

Native Advertising – The opportunity of making ads valuable

For the users

For the publishers

Page 40: Dato Keynote

Bad advertising does not work for anybody

Page 41: Dato Keynote

The data:
•  400k raw HTML pages containing:
   o  text, images, links, and well, everything web pages have

The task:
•  predict which pages are organic and which are sponsored advertising

When:
•  starts August 1!

The Prize
•  Fame!!!
•  Knowledge!!!
•  $10,000

Page 42: Dato Keynote

A lot of effort in Kaggle competitions involves running many experiments…

…can get slow ☹

Page 43: Dato Keynote

SFrame ❤ all ML tools

SGraph

Sophisticated machine learning made scalable

Data Structures to Create Intelligence

Page 44: Dato Keynote

Data frames (example columns: user, movie, rating)

When you choose a data frame, have your application in mind

SFrame is optimized for ML

ML has specific data access patterns; we make them fast, really fast (columnar transformations, creating new features, iterations,…)

Page 45: Dato Keynote

… Same code

user movie rating

SFrame: Scalable data frame optimized for ML

•  Never run out of memory
•  Sharded, compressed, out-of-core, columnar
•  Arbitrary lambda transformations, joins,… from Python

Talk tomorrow with details: Yucheng @11am

Large data on one machine?

Limited RAM → Must use disk (out-of-core computation)

Page 46: Dato Keynote

Opportunity for Out-of-Core ML

Capacity    Throughput
0.1 TB      1 GB/s
1 TB        0.5 GB/s
10 TB       0.1 GB/s

Fast, but significantly limits data size

Opportunity for big data on 1 machine

For sequential reads only! Random access very slow

Out-of-core ML opportunity is huge

Usual design → Lots of random access → Slow

Design to maximize sequential access for ML algo patterns

•  GraphChi: early example
•  SFrame: data frame for ML
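The point about sequential access can be illustrated with a single front-to-back pass over a file on disk. The sketch below is a minimal example, not how SFrame works internally; the function name `streaming_mean` and the CSV layout are assumptions. It keeps only two numbers in memory, so the file can be far larger than RAM.

```python
import csv


def streaming_mean(path, column):
    """Mean of one numeric column using a single sequential pass over the file."""
    total, count = 0.0, 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):  # rows are read one at a time, in order
            total += float(row[column])
            count += 1
    return total / count if count else float("nan")
```

The same pattern (read in order, update a small running state) also fits many ML building blocks, such as computing feature statistics or one epoch of stochastic gradient descent.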

Page 47: Dato Keynote

Demo: 10TBs of data on one machine!

Page 48: Dato Keynote

SFrame ❤ all ML

Page 49: Dato Keynote

scikit-learn is awesome, but...

[Plot: runtime (s), 0 to 4000, vs. millions of rows, 0 to 400. Airline Delay Dataset, SGDLinearClassifier. scikit-learn + Numpy: out of RAM (Numpy in memory only)]

Page 50: Dato Keynote

Demo: 10TBs of data on one machine redux

Page 51: Dato Keynote

Numpy automatically backed by SFrames → Scale many Python packages (scikit-learn, scipy,…)

import graphlab.numpy
Scalable numpy activation successful

[Plot: runtime (s), 0 to 4000, vs. millions of rows, 0 to 400. Airline Delay Dataset, SGDLinearClassifier. Series: scikit-learn + Numpy (out of RAM); GraphLab Create + scikit-learn + Numpy]

Caveats apply

- Scales most memory-bound sklearn algorithms

- Sequential access highly preferred for performance

Page 52: Dato Keynote

ML is not just about tables

Page 53: Dato Keynote

ML pipelines combine multiple data types

Raw Wikipedia (XML)

Hyperlinks → PageRank → Top 20 Pages (table: Title, PR)

Text Table (Title, Body) → Term-Doc Graph → Topic Model (LDA) → Word Topics (table: Word, Topic)

Page 54: Dato Keynote

SGraph

Graph processing & analytics

Out-of-core & scalable

Neighborhoods, paths, graph algos, community detection, label propagation, ML on graphs, viz, …

Backed by SFrame

Page 55: Dato Keynote

Performance of SGraph

Connected components in Twitter graph

Twitter: 41 million nodes, 1.4 billion edges

System                      Machines    Runtime
GraphLab Create (SGraph)    1           70 sec
GraphX                      16          251 sec
Giraph                      16          200 sec
Spark                       16          2,128 sec

Source(s): Gonzalez et al. (OSDI 2014)
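For readers unfamiliar with the connected components task benchmarked here, a minimal single-machine sketch using union-find follows. It is not SGraph's algorithm (the slides do not describe it); the function name `connected_components` and the integer node ids are assumptions for this example.

```python
def connected_components(num_nodes, edges):
    """Label each node 0..num_nodes-1 with the smallest node id in its component."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:  # one sequential pass over the edge list
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            # Always keep the smaller id as the root, so labels are stable.
            parent[max(root_u, root_v)] = min(root_u, root_v)
    return [find(x) for x in range(num_nodes)]
```

Note that the edge list is only read in order, which matches the sequential-access pattern favored for out-of-core processing.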

Page 56: Dato Keynote

PageRank on Common Crawl Graph: 3.5 billion nodes and 128 billion edges

[Bar chart: minutes per iteration (axis 0 to 10) for 1 machine: 16 CPUs, 1 SSD]
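Each "iteration" in this benchmark is one sweep over all edges. A minimal sketch of the standard PageRank power iteration is shown below for reference; it is a textbook version, not the SGraph implementation, and the function name, the damping factor of 0.85, and the handling of nodes without out-links are assumptions for this example.

```python
def pagerank(edges, num_nodes, damping=0.85, iterations=20):
    """Textbook PageRank by power iteration over a list of (src, dst) edges."""
    out_degree = [0] * num_nodes
    for src, _ in edges:
        out_degree[src] += 1
    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[n] for n in range(num_nodes) if out_degree[n] == 0)
        base = (1.0 - damping) / num_nodes + damping * dangling / num_nodes
        new_rank = [base] * num_nodes
        for src, dst in edges:  # one sequential pass over the edges per iteration
            new_rank[dst] += damping * rank[src] / out_degree[src]
        rank = new_rank
    return rank
```

The cost per iteration is proportional to the number of edges, which is why the slide reports minutes per iteration.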

Page 57: Dato Keynote

We ❤ open source

Page 58: Dato Keynote

SFrame & SGraph: Optimized out-of-core computation for ML

High Performance

1 machine can handle:
•  TBs of data
•  100s of billions of edges

Optimized for ML

•  Columnar transformation
•  Create features
•  Iterators
•  Filter, join, group-by, aggregate
•  User-defined functions
•  Easily extended through SDK

Tables, graphs, text, images

Open-source ❤ BSD license (August)

Page 59: Dato Keynote

Distributed machine learning

Your big data infrastructure

(cloud, Hadoop, Spark, …)

Sophisticated machine learning made distributed

Create Intelligence on Huge Data

Page 60: Dato Keynote

PageRank on Common Crawl Graph: 3.5 billion nodes and 128 billion edges

[Bar chart: minutes per iteration (axis 0 to 10) for 1 machine (16 CPUs) vs. 16 machines (256 CPUs)]

45 secs/iteration, 3B edges/sec
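As a quick sanity check, the two quoted figures are consistent with the graph size: 128 billion edges processed in 45 seconds is about 2.8 billion edges per second, which rounds to the 3B on the slide. The helper name below is made up for this example.

```python
def edges_per_second(num_edges, seconds_per_iteration):
    """Throughput of one PageRank iteration, assuming every edge is visited once."""
    return num_edges / seconds_per_iteration


# Figures from the slide: 128 billion edges, 45 seconds per iteration.
throughput = edges_per_second(128e9, 45)
print(f"{throughput / 1e9:.2f} billion edges/sec")
```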

Page 61: Dato Keynote

Criteo Terabyte Click Prediction

4.4 billion rows, 13 features, ½ TB of data

[Plot: runtime (s), 0 to 4000, vs. #machines, 0 to 16; labeled points: 3630s and 225s]
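The two labeled runtimes suggest close to linear scaling. The sketch below assumes (this is not stated in the extracted text) that 3630s is the single-machine runtime and 225s is the runtime on 16 machines; the function name is made up for this example.

```python
def speedup(baseline_seconds, scaled_seconds):
    """How many times faster the scaled-out run is than the baseline run."""
    return baseline_seconds / scaled_seconds


# Assumption: 3630s on 1 machine, 225s on 16 machines.
factor = speedup(3630, 225)
efficiency = factor / 16  # fraction of ideal 16x scaling actually achieved
print(f"speedup: {factor:.1f}x, parallel efficiency: {efficiency:.2f}")
```

Under that assumption the speedup is roughly 16x on 16 machines, i.e. about linear.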

Page 62: Dato Keynote

Same code, distributed ML

Single machine ML code:

import graphlab as gl
data = gl.SFrame.read_csv('s3://…')
model = gl.classifier.create(data, target='click')

c = gl.deploy.ec2_cluster.load('s3://…')
gl.set_distributed_execution_environment(c)

c = gl.deploy.hadoop_cluster.load('hdfs://…')
c = gl.deploy.spark_cluster.load('hdfs://…')
…

Page 63: Dato Keynote

Dato machine learning platform

Inspiration

Scale

Sophisticated ML

Optimized for ML performance, for any data size, on any infrastructure

AutoML

GraphLab Create

ML Toolkits

Canvas

Reusable Features

Job Mgmt

Distributed Engine

Distributed ML Dato Distributed

SGraph

Create Engine

SFrame GraphLab Create

Machine Learning In Production

Page 64: Dato Keynote

Machine Learning in Production

Deployment

Easily serve live predictions

Page 65: Dato Keynote

Deployment Engineers

Deploying ML models

Data Scientists

Exciting new deep learning model.

How long is this going to take?!

REST API! I will be done today.

It’s accurate!

Dato Predictive Services

Page 66: Dato Keynote

Machine Learning in Production

•  Deployment: Easily serve live predictions
•  Evaluation: Measuring quality of deployed models
•  Monitoring: Tracking model operations
•  Management: Choosing between deployed models

Talk tomorrow with details: Alice & Rajat @1:45pm

Page 67: Dato Keynote

Evaluation

Monitoring

Deployment

Management

Inspiration

Scale

Sophisticated ML

Optimized for ML performance, for any data size, on any infrastructure

AutoML

GraphLab Create

ML Toolkits

Canvas

Reusable Features

Job Mgmt

Distributed Engine

Distributed ML Dato Distributed

SGraph

Create Engine

SFrame GraphLab Create

Dato machine learning platform

Page 68: Dato Keynote

Dato machine learning platform

Inspiration

Scale

Production Deploy Service

Optimized for ML performance, for any data size, on any infrastructure

AutoML

GraphLab Create

ML Toolkits

Canvas

Reusable Features REST Client Model Mgmt

Dato Predictive Services

Robust, Elastic

Direct

Job Mgmt

Distributed Engine

Distributed ML Dato Distributed

SGraph

Create Engine

SFrame GraphLab Create

Sophisticated ML

Create intelligent applications faster & cheaper

Page 69: Dato Keynote

My curve is better than your curve

INTELLIGENT APPLICATIONS

are disrupting markets

Phase transition of machine learning

Accelerate this process

> pip install graphlab-create

[email protected] @guestrin