The Challenges of Bringing Machine Learning to the Masses

Alice Zheng and Sethu Raman, GraphLab Inc.
NIPS workshop on Software Engineering for Machine Learning
December 13, 2014

Self introduction

ML Research

“Accessible ML”

The need for accessible ML
• So much potential in ML
• Everyone trying to make sense of their data
• ML is transforming lives and industries: personalized medicine, internet search, social networks, advertising, etc.
• But success is unattainable to most

Building a predictive app

Was using 217 business rules, hoping world doesn’t change

Have an inspiring idea to reinvent their business

Key pains: Hiring Talent

Shortfall in data-savvy workers needed to make sense out of big data by 2018 [McKinsey 2011]

Noisy Space of Tools

“Data scientists use a variety of tools, across different programming languages… require a lot of context-switching… affects productivity and impedes reproducibility.”
Ben Lorica, Data Analysis: Just one component of the Data Science workflow

Building a predictive app

• Feature engineering
• Model definition
• Training, evaluation
• Deployment, monitoring

Pure ML is not enough
• Building a predictive application involves much more than just building ML models
  • System engineering: data storage, computation infrastructure, networking…
  • Data Science: problem definition, data cleaning, feature engineering
  • Software development: turn prototype model into bullet-proof production code
  • Operations engineering: deploy and monitor app
  • …

Pain points
• What are the right features?
• What model should I use?
• How do I train it?
• How do I set the tuning parameters?
• Do I even have the right data?
• Ok, I have a working prototype, now

Pain points
• Increase in data size or decrease in latency requires complete rewrite of code and new toolset
  • GB: R/scikit-learn/Matlab
  • TB-PB: Hadoop/Mahout/Spark
• Many forms of data and data structures
  • Images, text, speech, logs
  • Dense lists, sparse dictionaries, time series
  • Tables, graphs, matrices, tensors
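To make the "dense lists, sparse dictionaries" point concrete, here is a minimal sketch in plain Python of the same data held in both forms. The helper names `dense_to_sparse` and `sparse_to_dense` are hypothetical and only for illustration; they are not taken from any of the tools named in the slides.

```python
def dense_to_sparse(values, default=0):
    """Keep only the entries that differ from the default value."""
    return {i: v for i, v in enumerate(values) if v != default}


def sparse_to_dense(entries, length, default=0):
    """Expand an {index: value} dictionary back into a list of the given length."""
    return [entries.get(i, default) for i in range(length)]


dense = [0, 0, 3, 0, 5]
sparse = dense_to_sparse(dense)                  # {2: 3, 4: 5}
restored = sparse_to_dense(sparse, len(dense))   # same contents as `dense`
```

Code written against one representation usually has to be adapted to consume the other, which is part of the switching cost the slide describes.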

The need for an ML platform
• Minimize tool/code switching, maximize performance (speed/accuracy/scale)
  • Graceful transition from small to large dataset sizes
  • Flexible, interoperable data types
• Minimize complexity
  • System-agnostic
  • Simple API
  • Auto-tune parameters
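As one possible reading of "Simple API" plus "Auto-tune parameters", the sketch below hides the choice of a tuning parameter behind a single call. It assumes a toy 1-D k-nearest-neighbour classifier and a small held-out validation set; the function names, the candidate values, and the data are all made up for illustration and do not come from the talk.

```python
def knn_predict(train, x, k):
    """Majority label (0 or 1) among the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(train, valid, k):
    """Fraction of validation points classified correctly with this k."""
    hits = sum(knn_predict(train, x, k) == y for x, y in valid)
    return hits / len(valid)


def auto_tune(train, valid, candidates=(1, 3, 5)):
    """Return the candidate k with the best validation accuracy."""
    return max(candidates, key=lambda k: accuracy(train, valid, k))


# Toy (feature, label) pairs, invented for this example.
train = [(0.0, 0), (1.0, 0), (2.0, 0), (2.2, 1), (3.0, 1), (4.0, 1), (5.0, 1)]
valid = [(0.5, 0), (1.8, 0), (2.1, 0), (3.5, 1), (4.5, 1)]

best_k = auto_tune(train, valid)
```

The caller never picks `k` directly; the platform searches a small candidate set and returns a "good enough" setting.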

The parallel to databases
• What’s an example of a mega-successful platform for data operations?
  • Databases!
  • SQL, Oracle, NoSQL, …
• What lessons can we bring in from the database world?

Database engine components

• Storage engine
• Query execution
• Query optimizer
• Storage

Complex but self-contained, has clean API, only changes when there’s new hardware.

Complex bag of tricks, no formalism, constantly changing to adapt to data, query, disk characteristics.

ML engine components
• Feature engineering
• Model definition
• Training, evaluation

Bags of tricks, expert knowledge, experience, lots of trial and error

Advances in databases
• Reasonable abstraction: relational DB
• Hardware speedups
• Pragmatic software implementation
Successful platform
• Take-away lesson: fast computation engine + “good enough” execution plan

To advance ML platforms
• ML will be end-user friendly when the platform is clever enough to handle less-than-optimal directions from the user
• What needs to happen?
  • The complexity needs to be automated and wrapped away with neat interfaces between components
  • Fast components, “good enough” directions

GraphLab
• Started as a research project at CMU in 2009
• Now a Seattle-based startup

The GraphLab Create™ Solution
• Flexible, interoperable data types
  • SArray + SFrame + SGraph inter-translatable
  • Dense list, sparse array, image, text, tables, graphs
• Graceful transition between data sizes
  • SFrame: memory to disk to distributed
• One environment, many substrates
  • Python front-end
  • Localhost, cluster, Hadoop, EC2
• End-to-end
  • Data ingestion + feature engineering + model building + deployment in a single environment
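The "memory to disk" step can be illustrated with a generic streaming computation. This is a sketch of the general out-of-core idea using only the Python standard library. It is not how SFrame is implemented, and the file contents and the `streaming_mean` helper are invented for the example.

```python
import csv
import os
import tempfile


def streaming_mean(path, column):
    """Average a numeric column while holding only one row in memory at a time."""
    total, count = 0.0, 0
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            total += float(row[column])
            count += 1
    return total / count if count else float("nan")


# Write a tiny demonstration file, then read it back row by row.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    writer = csv.writer(tmp)
    writer.writerow(["user", "rating"])
    writer.writerows([["a", 4], ["b", 2], ["c", 3]])
    demo_path = tmp.name

mean_rating = streaming_mean(demo_path, "rating")
os.remove(demo_path)
```

Because the function never materializes the whole table, the same call works whether the file holds three rows or far more than fits in memory; only the running time changes.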

GraphLab Create ML Toolkits

• Business Task: Recommender, Target, Social Match, … (Developers)
• Machine Learning Task: Regression, Classification, Data Matching, … (Savvy Dev & Data Sci.)
• Algorithms & SDK: SVM, Matrix Factorization, LDA, … (ML experts)

GLC SDK example
• Task: fill in missing value in an array using previous value
• Existing solution:
  • E.g., use Pandas, a Python library providing in-memory dataframes
• Problem:
  • Given, say, 25M rows and 50 cols, takes forever to even load the data
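To pin down the task itself, here is a minimal plain-Python sketch of "fill in missing value using previous value". It only illustrates the intended semantics on a small list and makes no claim about performance at the 25M-row scale mentioned above. The function name `fill_forward` is hypothetical.

```python
def fill_forward(values):
    """Replace each None with the most recent non-missing value."""
    filled = []
    # Start from the first element; a leading None therefore stays None.
    last_value = values[0] if values else None
    for value in values:
        if value is not None:
            last_value = value
        filled.append(last_value)
    return filled


print(fill_forward([1, 2, 3, None, 6]))  # [1, 2, 3, 3, 6]
```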

GLC SDK solution
> cat fill.cpp
#include <flexible_type/flexible_type.hpp>
#include <unity/lib/toolkit_function_macros.hpp>
#include <unity/lib/gl_sarray.hpp>

using namespace graphlab;

// Forward fill: replace each missing element with the last value seen.
gl_sarray fill(gl_sarray sa) {
  gl_sarray_writer writer(sa.dtype(), 1);
  flexible_type last_value = sa[0];
  for (const auto &elem: sa.range_iterator()) {
    if (elem != FLEX_UNDEFINED) last_value = elem;
    writer.write(last_value, 0);
  }
  return writer.close();
}

BEGIN_FUNCTION_REGISTRATION
REGISTER_FUNCTION(fill, "sa");
END_FUNCTION_REGISTRATION

GLC SDK solution

> cat Makefile
all: fill.so

fill.so: fill.cpp
	g++ -std=c++11 $^ -I graphlab -I ~/graphlab-dev/deps -shared -fPIC -o $@ -O3

> python
>>> import graphlab as gl
>>> gl.ext_import('fill.so', 'example')
>>> sa = gl.SArray([1, 2, 3, None, 6])
>>> print gl.extensions.example.fill.fill(sa)
[1, 2, 3, 3, 6]

Join the revolution!
• Research methods to make the following efficient and automatic:
  • Feature engineering
  • Model selection
  • Model debugging
  • Problem formulation (??)
• Develop novel algorithms on top of our SDK
  • Backed by scalable, flexible typed data structures
  • Automatic Python wrappers
  • Make them available to many other people
• We’re hiring! jobs@graphlab.com
