H2O 0xdata MLconf

Transcript of H2O 0xdata MLconf

Page 1: H2O 0xdata MLconf

H2O.ai

Open Source Machine Learning for Intelligent Applications

H2O.ai Machine Intelligence

Page 2: H2O 0xdata MLconf

Time is the only non-renewable resource

Speed Matters!

Page 3: H2O 0xdata MLconf

Law of Large Numbers

Sampling

Page 4: H2O 0xdata MLconf

Data scientists & Analysts will not write Java MapReduce

Page 5: H2O 0xdata MLconf

Per Node

2M Row ingest/sec

50M Row Regression/sec

750M Row Aggregates/sec

On Premise, On / Off Hadoop, On EC2

Tableau, R, JSON, Scala, Java, Python, Excel

SDK / API

H2O Prediction Engine

Regression, Classify, Trees, Boosting, Forests, Solvers, Gradients, Ensembles, Deep learning, Cluster

Nano Fast Scoring Engine

Memory Manager, Columnar Compression

Query Processor, R-engine

In-Mem Map Reduce, Distributed fork/join

HDFS, S3, SQL, NoSQL

Page 6: H2O 0xdata MLconf

Infrastructure

Parallelism

Data Parallel: Chunking Express!

Algorithm Parallel: Parallel Code blocks

Math Parallelism: ADMM, HogWild

Distribution

Zero-Serialization: endian wars have ended
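
The data-parallel idea on this slide can be illustrated with a small sketch: rows are cut into chunks, each chunk is summarized on its own, and the partial results are combined. This is only an illustration under stated assumptions, not H2O's implementation; the function names (`split_into_chunks`, `map_chunk`, `reduce_partials`, `parallel_mean`) are hypothetical, and a thread pool stands in for a cluster of nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(rows, chunk_size):
    """Cut a list of numbers into consecutive chunks of at most chunk_size rows."""
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


def map_chunk(chunk):
    """Map step: summarize one chunk as (sum, count), independent of other chunks."""
    return (sum(chunk), len(chunk))


def reduce_partials(a, b):
    """Reduce step: combine two partial (sum, count) summaries."""
    return (a[0] + b[0], a[1] + b[1])


def parallel_mean(rows, chunk_size=4, workers=4):
    """Compute the mean by mapping over chunks concurrently, then reducing."""
    if not rows:
        raise ValueError("rows must not be empty")
    chunks = split_into_chunks(rows, chunk_size)
    # Threads keep the sketch short; a real system would spread chunks
    # across cores or machines (assumption, not shown in the slides).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    total, count = reduce(reduce_partials, partials, (0, 0))
    return total / count
```

Because each chunk only produces a small (sum, count) pair, the reduce step moves very little data, which is the point of chunked, data-parallel computation.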

Page 7: H2O 0xdata MLconf

Scalable Machine Learning for Smarter Applications

Page 8: H2O 0xdata MLconf

Programmable Internet

Page 9: H2O 0xdata MLconf

Programmable Devices

Page 10: H2O 0xdata MLconf

AdSense Sense

Page 11: H2O 0xdata MLconf

Correlation Causality

Page 12: H2O 0xdata MLconf

Data

Sensors, Devices

Semi-structured data. JSON. High velocity. High dimensions.

Events. Signals. Time series

Page 13: H2O 0xdata MLconf

Streaming Data

Scoring from prediction

Anomaly and Outliers Detection

Unsupervised Learning

Historical Data

Page 14: H2O 0xdata MLconf

Streaming Data

Anomaly and Outliers Detection

Historical Data

model

Scoring from prediction
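
The flow on this slide (historical data produces a model, the model scores streaming data, unusual scores are flagged) can be sketched in a few lines. This is a minimal illustration that assumes a simple z-score model over a single numeric signal; it is not the method used in the talk, and the names `fit_model`, `score`, and `detect_anomalies` are hypothetical.

```python
from statistics import mean, stdev


def fit_model(historical):
    """Learn a tiny 'model' (mean and sample standard deviation) from history."""
    return {"mean": mean(historical), "std": stdev(historical)}


def score(model, value):
    """Score one streaming value as its distance from the mean, in std units.

    Assumes the historical data was not constant (std > 0).
    """
    return abs(value - model["mean"]) / model["std"]


def detect_anomalies(model, stream, threshold=3.0):
    """Yield (value, score) for streaming values whose score exceeds threshold."""
    for value in stream:
        s = score(model, value)
        if s > threshold:
            yield value, s


# Usage: train once on historical readings, then score values as they arrive.
historical = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]
model = fit_model(historical)
flagged = [value for value, _ in detect_anomalies(model, [10, 11, 25, 9])]
```

The model is fitted offline and only the cheap scoring step runs on the stream, which mirrors the split between historical and streaming data on the slide.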

Page 15: H2O 0xdata MLconf

Streaming Data

Clustering / Unsupervised Learning

Historical Data

model

Scoring from prediction

Page 16: H2O 0xdata MLconf

https://developer.nest.com/documentation/api-reference/devices

Page 17: H2O 0xdata MLconf

Take Models to Production in Java

Page 18: H2O 0xdata MLconf

Onset of Rita

Page 19: H2O 0xdata MLconf

Common ensemble techniques

Bayesian Classifiers: Ensembles of all hypotheses in the hypothesis space.

Bagging: Each model votes with equal weight. Bagging trains each model on a randomly drawn subset of the training set.

Boosting: Incrementally builds an ensemble, training each new model to emphasize the instances that earlier models misclassified.
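
The difference between bagging and boosting can be made concrete with a small sketch. It assumes one-dimensional inputs, labels of +1/-1, and decision stumps (single-threshold rules) as the base model; the boosting variant shown is AdaBoost, chosen here for illustration and not named in the slides. All function names are hypothetical.

```python
import math
import random


def fit_stump(xs, ys, weights):
    """Pick the threshold/sign rule with the lowest weighted error on 1-D data."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if (sign if x >= t else -sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best  # (weighted error, threshold, sign)


def stump_predict(stump, x):
    """Predict +1/-1: 'sign' at or above the threshold, the opposite below it."""
    _, t, sign = stump
    return sign if x >= t else -sign


def bagging(xs, ys, n_models=25, seed=0):
    """Train each stump on a bootstrap sample; every stump gets an equal vote."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        stumps.append(fit_stump(bx, by, [1.0 / n] * n))
    return lambda x: 1 if sum(stump_predict(s, x) for s in stumps) >= 0 else -1


def boosting(xs, ys, n_rounds=10):
    """AdaBoost: reweight the data so each new stump focuses on earlier mistakes."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(xs, ys, weights)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid log(0) and division by 0
        alpha = 0.5 * math.log((1 - err) / err)     # vote weight of this stump
        ensemble.append((alpha, stump))
        # Misclassified points gain weight, correctly classified points lose it.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return lambda x: 1 if sum(a * stump_predict(s, x) for a, s in ensemble) >= 0 else -1
```

In bagging the models are independent and vote equally; in boosting each model depends on the errors of the previous ones and votes with its own weight `alpha`. For example, with `xs = [1, 2, 3, 4, 5, 6]` and `ys = [1, 1, -1, -1, 1, 1]` no single stump fits the data, while `boosting(xs, ys, n_rounds=3)` combines three stumps that together classify all six points correctly.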

Page 20: H2O 0xdata MLconf

Page 21: H2O 0xdata MLconf

Page 22: H2O 0xdata MLconf

Gradient Boosting Machine

Page 23: H2O 0xdata MLconf

Page 24: H2O 0xdata MLconf

Page 25: H2O 0xdata MLconf

Variable Importance Comparison

Random Forest, 50 trees

Gradient Boosting Machine, 50 trees

Page 26: H2O 0xdata MLconf

Generalized Linear Modeling – Variable Importance

GLM, Elastic Net (Binomial), categorical expansion on Age

GLM, Elastic Net (Binomial)

Page 27: H2O 0xdata MLconf

Variable Importance Comparison

Deep Learning (Tanh / 4-layer)

Deep Learning (Tanh / 3-layer)

Page 28: H2O 0xdata MLconf

Every generation needs to invent its math.

Our data, our tools!

Page 29: H2O 0xdata MLconf

Power-Law

Page 30: H2O 0xdata MLconf

Code is incomplete without Community!

Open Source Matters!

Page 31: H2O 0xdata MLconf
Page 32: H2O 0xdata MLconf

Community

Committers 30

Meetups 90

in 12 months

Coverage

Conference Speakers

Curriculum: Stanford, MIT, CSU, SUNY, SJSU, Purdue

Page 33: H2O 0xdata MLconf

Data Driven Decision Making is hard!

Courage Matters!

Page 34: H2O 0xdata MLconf

Winning customer trust not just quarters!

Mindset matters!

Page 35: H2O 0xdata MLconf

Thanks Courtney, Nick & MLConf for bringing us to ATL

Page 36: H2O 0xdata MLconf

Sparkling Water Application Life Cycle

[Diagram: Sparkling App jar file; spark-submit; Spark Master JVM; three Spark Worker JVMs; Sparkling Water Cluster of three Spark Executor JVMs, each running H2O]

(1) User submits App to Spark cluster Master node

(2) App distributed to Spark cluster Worker nodes

(3) Spark Executor JVMs start for App

(4) H2O instance starts within each Executor JVM

(5) App’s Scala main program runs

Page 37: H2O 0xdata MLconf

Sparkling Water Data Distribution

[Diagram: Data Source (e.g. HDFS); Sparkling Water Cluster of Spark Executor JVMs running H2O; Spark RDD; H2O RDD]

(1) Use Spark SQL to read data into a Spark RDD

(2) Convert Spark RDD to H2O RDD; H2O RDD is column-based and highly compressed

(Not shown) Run modeling and prediction workflows with H2O

(3) Convert H2O RDD (e.g. predictions) back to Spark RDD
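
Step (2) above describes the H2O RDD as column-based and highly compressed. The sketch below illustrates why a column layout compresses well: once rows are transposed into columns, repeated values within a column can be stored as runs. It uses run-length encoding purely as an example and makes no claim about H2O's actual storage format; all function names are hypothetical.

```python
def rows_to_columns(rows, names):
    """Transpose row tuples into a dict of column name -> list of values."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Compress one column into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def run_length_decode(runs):
    """Expand [value, run_length] pairs back into the original column."""
    return [v for v, n in runs for _ in range(n)]


def columns_to_rows(columns, names):
    """Transpose columns back into row tuples (the direction taken in step 3)."""
    return list(zip(*(columns[name] for name in names)))


# Usage: a row-oriented table becomes per-column lists that compress independently.
names = ["country", "clicked"]
rows = [("US", 1), ("US", 1), ("US", 0), ("DE", 0)]
columns = rows_to_columns(rows, names)
compressed = {name: run_length_encode(col) for name, col in columns.items()}
```

Each column holds values of one type with many repeats, so a per-column scheme such as this one tends to shrink the data far more than compressing whole rows would.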

Page 38: H2O 0xdata MLconf

Standalone: H2O, HDFS

YARN: H2O, YARN, HDFS

H2O in MR: Hadoop MR, H2O, HDFS

HortonWorks, Cloudera, MapR, Intel

Page 39: H2O 0xdata MLconf

H2O – The Killer-App for Spark

Sparkling Water

HDFS=DATA

MLlib, H2O, SQL, H2O RDD

In-Memory: Big Data, Columnar

ML: 100x faster Algos

R: CRAN, API, fast engine

API: Spark API, Java MM

Community: Devs, Data Science

Page 40: H2O 0xdata MLconf

examples

Page 41: H2O 0xdata MLconf
Page 42: H2O 0xdata MLconf

Fraud / No-fraud: 1/1000 unbalanced

Click-Stream: Browse / Click / Buy

Page 43: H2O 0xdata MLconf

Propensity Models: Merchants-to-Users

Lifetime Value of Customer

Pricing Engines