Sri Ambati – CEO, 0xdata at MLconf ATL

H2O.ai

Open Source Machine Learning for Intelligent Applications

H2O.ai Machine Intelligence

Time is the only non-renewable resource

Speed Matters!

Law of Large Numbers

Sampling

Per Node

2M Row ingest/sec

50M Row Regression/sec

750M Row Aggregates / sec

On Premise, On / Off Hadoop, On EC2

H2O Prediction Engine

ensembles

Deep learning

Cl

Nano Fast Scoring Engine

Memory Manager Columnar Compression

Query Processor R-engine

In-Mem Map Reduce

Distributed fork/join

HDFS S3 SQL NoSQL

SDK / API

Infrastructure

Parallelism

Data Parallel: Chunking Express!

Algorithm Parallel: Parallel Code blocks

Math Parallelism: ADMM, HogWild

Distribution

Zero-Serialization: endian wars have ended
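The data-parallel chunking idea above can be sketched in a few lines. This is only an illustration under stated assumptions, not H2O's implementation: the function names (`chunked`, `map_chunk`, `combine`, `parallel_mean_var`) are hypothetical, and a thread pool stands in for the nodes of a cluster. Each chunk is summarized independently (map), and the partial summaries are merged with an associative operation (reduce).

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def chunked(values, chunk_size):
    """Split a column into fixed-size chunks, the unit of data parallelism."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def map_chunk(chunk):
    """Map step: summarize one chunk as (count, sum, sum of squares)."""
    return (len(chunk), sum(chunk), sum(x * x for x in chunk))


def combine(a, b):
    """Reduce step: merge two partial summaries (associative, so order is free)."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def parallel_mean_var(values, chunk_size=4, workers=4):
    """Mean and population variance of a non-empty column, chunk by chunk."""
    chunks = chunked(values, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    n, s, ss = reduce(combine, partials)
    mean = s / n
    return mean, ss / n - mean * mean


column = [float(x) for x in range(1, 11)]
mean, var = parallel_mean_var(column)
print(mean, var)
```

Because `combine` is associative, the partial results can be merged in any grouping, which is what lets the same aggregate run as a tree of fork/join tasks.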

Scalable Machine Learning for Smarter Applications

H2O.ai

Programmable Internet

Programmable Devices

AdSense Sense

Correlation Causality

Sensors, Devices

Semi-structured data. json. High velocity. High dimensions.

Events. Signals. TimeSeries

Streaming Data

Historical Data

Scoring from prediction

Anomaly and Outliers Detection

Clustering / Unsupervised Learning
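One way to read the historical-data / streaming-data split is: fit a baseline on historical data, then score each streaming event against it. The sketch below assumes a simple z-score rule with a threshold of 3; the slides do not say which detector is used, and the names `fit_baseline` and `score_stream` are hypothetical.

```python
from statistics import mean, stdev


def fit_baseline(historical):
    """Learn the baseline (mean, sample standard deviation) from historical data."""
    return mean(historical), stdev(historical)


def score_stream(stream, baseline, threshold=3.0):
    """Score events one at a time; flag those far from the baseline."""
    mu, sigma = baseline
    for value in stream:
        z = (value - mu) / sigma
        yield value, z, abs(z) > threshold


historical = [20.0, 21.0, 19.0, 20.0, 22.0, 18.0, 20.0, 21.0, 19.0, 20.0]
baseline = fit_baseline(historical)
flags = [is_anomaly for _, _, is_anomaly in score_stream([20.5, 35.0, 19.5], baseline)]
print(flags)
```

The generator consumes events lazily, so the same scoring loop works on an unbounded stream.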

https://developer.nest.com/documentation/api-reference/devices

Take Models to Production in Java

Onset of Rita

Common ensemble techniques

Bayesian Classifiers: ensembles of all hypotheses in the hypothesis space.

Bagging: each model votes with equal weight; each model is trained on a randomly drawn subset of the training set.

Boosting: incrementally build an ensemble by training each new model to emphasize the training instances that earlier models misclassified.
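A minimal sketch of bagging as described above: each model is trained on a randomly drawn bootstrap sample, and every model votes with equal weight. The base learner here is a one-feature threshold "stump" chosen purely for brevity; the function names are hypothetical, and nothing here reflects how H2O builds its ensembles.

```python
import random
from collections import Counter


def train_stump(sample):
    """Fit a threshold classifier that predicts 1 when x > threshold."""
    best_threshold, best_errors = None, None
    for threshold, _ in sample:
        errors = sum((x > threshold) != bool(y) for x, y in sample)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagging_fit(data, n_models=25, seed=0):
    """Train each model on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(data) for _ in data]
        models.append(train_stump(bootstrap))
    return models


def bagging_predict(models, x):
    """Every model casts one equally weighted vote; the majority wins."""
    votes = Counter(int(x > threshold) for threshold in models)
    return votes.most_common(1)[0][0]


data = [(x, int(x > 5)) for x in range(11)]  # toy data: label is 1 when x > 5
models = bagging_fit(data)
print(bagging_predict(models, 9), bagging_predict(models, 1))
```

Boosting differs in the training loop rather than the vote: models are built one after another, and each new model is fitted with more weight on the examples the current ensemble gets wrong.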

Gradient Boosting Machine

Variable Importance Comparison

Random Forest, 50 trees

Gradient Boosting Machine, 50 trees

Generalized Linear Modeling – Variable Importance

GLM, Elastic Net (Binomial): Categorical expansion on Age

GLM, Elastic Net (Binomial)
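For reference, the elastic net objective for a binomial GLM is commonly written as below. The symbols are not taken from the slides and are assumptions of this note: $N$ observations $(x_i, y_i)$ with $y_i \in \{0, 1\}$, intercept $\beta_0$, coefficients $\beta$, penalty strength $\lambda \ge 0$, and mixing parameter $\alpha \in [0, 1]$ ($\alpha = 1$ gives the lasso, $\alpha = 0$ gives ridge).

```latex
\min_{\beta_0,\,\beta}\;
  -\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_i\,(\beta_0 + x_i^{\top}\beta)
      - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
  \;+\; \lambda\Big[\, \alpha\,\lVert\beta\rVert_1
      + \tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 \Big]
```

With standardized predictors, the magnitudes of the fitted coefficients are one common basis for a GLM variable-importance ranking; whether the plots in this deck were produced that way is not stated.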

Variable Importance Comparison

Deep Learning (Tanh / 4-layer)

Deep Learning (Tanh / 3-layer)

Every generation needs to invent its math.

Our data, our tools!

Power-Law

Code is incomplete without Community!

Open Source Matters!

Community

Committers: 30

Meetups: 90 in 12 months

Coverage

Conference Speakers

Curriculum: Stanford, MIT, CSU, SUNY, SJSU, Purdue

Data Driven Decision Making is hard!

Courage Matters!

Thanks Courtney, Nick & MLconf for bringing us to ATL

Sparkling Water Application Life Cycle

Sparkling App

jar file

Spark Master

spark-submit

Spark Worker

(1) User submits App to Spark cluster Master node

(2) App distributed to Spark cluster Worker nodes

(3) Spark Executor JVMs start for App

(4) H2O instance starts within each Executor JVM

(5) App’s Scala main program runs

Sparkling Water Cluster

Spark Executor JVM

H2O (4)

Spark Executor JVM

Sparkling Water Data Distribution

Sparkling Water Cluster

Spark Executor JVM

Data Source (e.g. HDFS) (1)

(1) Use Spark SQL to read data into a Spark RDD

(2) Convert Spark RDD to H2O RDD; H2O RDD is column-based and highly compressed

(Not shown) Run modeling and prediction workflows with H2O

(3) Convert H2O RDD (e.g. predictions) back to Spark RDD
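To make "column-based and highly compressed" concrete, the sketch below pivots row records into columns and dictionary-encodes a repeated string column, then converts back to rows. It is an illustration only: dictionary encoding is one common columnar technique chosen here as an assumption, the function names are hypothetical, and this is neither the Sparkling Water API nor H2O's actual storage format.

```python
def rows_to_columns(rows, names):
    """Pivot row tuples into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def dictionary_encode(column):
    """Replace repeated values with small integer codes plus a lookup table."""
    levels = sorted(set(column))
    index = {value: code for code, value in enumerate(levels)}
    return levels, [index[value] for value in column]


def dictionary_decode(levels, codes):
    """Invert dictionary_encode."""
    return [levels[code] for code in codes]


def columns_to_rows(columns, names):
    """Pivot columns back into row tuples."""
    return list(zip(*(columns[name] for name in names)))


rows = [("ATL", 1), ("NYC", 0), ("ATL", 0), ("ATL", 1)]
names = ["city", "clicked"]

columns = rows_to_columns(rows, names)               # step (2): rows to columns
levels, codes = dictionary_encode(columns["city"])   # compress the string column
restored = columns_to_rows(                          # step (3): back to rows
    {"city": dictionary_decode(levels, codes), "clicked": columns["clicked"]},
    names,
)
print(levels, codes, restored == rows)
```

Storing each column contiguously means values of one type sit together, which is what makes per-column encodings like this effective.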

H2O RDD

Spark Executor JVM

Spark RDD

Hadoop MR

Standalone YARN H2O in MR

Hortonworks, Cloudera, MapR, Intel

H2O – The Killer-App for Spark

Sparkling Water

HDFS=DATA

MLlib, H2O, SQL

H2O RDD

In-Memory: Big Data, Columnar

ML: 100x faster Algos

R: CRAN, API, fast engine

API: Spark API, Java MM

Community: Devs, Data Science

examples

Fraud / No-fraud: 1/1000 unbalanced

Click-Stream: Browse / Click / Buy

Propensity Models: Merchants-to-Users

Lifetime Value of Customer

Pricing Engines
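For the 1/1000 fraud example, one common way to handle the imbalance is to weight each class inversely to its frequency so that both classes contribute equally to the training loss. The slides do not say how the imbalance is handled, so this is only a sketch of that heuristic; `balanced_class_weights` is a hypothetical helper.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


labels = [1] + [0] * 999  # 1 fraud case per 1000 transactions
weights = balanced_class_weights(labels)
print(weights)
```

With these weights the single fraud row carries as much total weight as the 999 non-fraud rows combined. Resampling (oversampling the rare class or undersampling the common one) is an alternative with a similar effect.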
