H2O for IoT - Jo-Fai (Joe) Chow, H2O

32
H2O for Internet of Things Jo-fai (Joe) Chow Data Scientist [email protected] @matlabulous Data Science Milan Politecnico di Milano 10 th October, 2016

Transcript of H2O for IoT - Jo-Fai (Joe) Chow, H2O

Page 1: H2O for IoT - Jo-Fai (Joe) Chow, H2O

H2O for Internet of Things

Jo-fai (Joe) Chow

Data Scientist

[email protected]

@matlabulous

Data Science Milan

Politecnico di Milano

10th October, 2016

Page 2: H2O for IoT - Jo-Fai (Joe) Chow, H2O

Agenda

• First Talk (25 mins)o About H2O.aio Demo

• A Simple Classification Task• H2O’s Web Interface

o Why H2O?• Our Community• Our Customers

o What’s Next?• New H2O Features

• Second Talk (25 mins)o H2O for IoT

• Predictive Maintenance• Anomaly Detection• H2O’s R Interface

• Third Talk (25 mins)o Deep Watero Demo

• H2O + mxnet on GPU• H2O’s Python Interface

2

Page 3: H2O for IoT - Jo-Fai (Joe) Chow, H2O

Data and Code

• Please go to bit.ly/h2o_milan_1• subfolders

o iot_use_case_1

o iot_use_case_2

3

Page 4: H2O for IoT - Jo-Fai (Joe) Chow, H2O

Use Case 1

Predictive Maintenance

Page 5: H2O for IoT - Jo-Fai (Joe) Chow, H2O

Data for Use Case 1: SECOM

5https://archive.ics.uci.edu/ml/datasets/SECOM

Page 6: H2O for IoT - Jo-Fai (Joe) Chow, H2O

6

We want to predict fails in the future.

Page 7: H2O for IoT - Jo-Fai (Joe) Chow, H2O

The ML Problem – Pass/Fai l

• Inputs

o 591 features

• Output

o Classification• -1 = pass

• 1 = fail

• Size: 1567 Samples

7

Page 8: H2O for IoT - Jo-Fai (Joe) Chow, H2O

8

Features (Numeric)

ID (excluded from modeling)

Page 9: H2O for IoT - Jo-Fai (Joe) Chow, H2O

9

Features (Numeric)

Response (Classification)-1 (Pass) or 1 (Fail)

Page 10: H2O for IoT - Jo-Fai (Joe) Chow, H2O

Use Case 1: Predictive Maintenance

Step 1: R Packages

Page 11: H2O for IoT - Jo-Fai (Joe) Chow, H2O

step_01_instal l_packages.R

11

Package ‘h2o’

Page 12: H2O for IoT - Jo-Fai (Joe) Chow, H2O

Use Case 1: Predictive Maintenance

Step 2: Exploratory Analysis

Page 13: H2O for IoT - Jo-Fai (Joe) Chow, H2O

step_02_exploratory_analysis.R

13

Importing SECOM data

Optional (different ways to import data)

Page 14: H2O for IoT - Jo-Fai (Joe) Chow, H2O

step_02_exploratory_analysis.R

14

Basic exploratory analysis

Convert -1 and 1 to categorical value

Note: Imbalance dataset (only 104 fails)

Page 15: H2O for IoT - Jo-Fai (Joe) Chow, H2O

Use H2O Flow ( localhost:54321)

15

Page 16: H2O for IoT - Jo-Fai (Joe) Chow, H2O

Use Case 1: Predictive Maintenance

Step 3: Building & Evaluating Models

Page 17: H2O for IoT - Jo-Fai (Joe) Chow, H2O

step_03_basic_models.R

17

Define features & target

Page 18: H2O for IoT - Jo-Fai (Joe) Chow, H2O

step_03_basic_models.R

18

Split data with a random seed

Classification 1 samples ≈ 7%

Page 19: H2O for IoT - Jo-Fai (Joe) Chow, H2O

step_03_basic_models.R

19

Train H2O models with default values

H2O automatically ignores

Columns with constant values

Page 20: H2O for IoT - Jo-Fai (Joe) Chow, H2O

step_03_basic_models.R

20

summary(model_xxx)

Page 21: H2O for IoT - Jo-Fai (Joe) Chow, H2O

Use H2O Flow ( localhost:54321)

21

Page 22: H2O for IoT - Jo-Fai (Joe) Chow, H2O

step_03_basic_models.R

22

Evaluate models with

test data

Page 23: H2O for IoT - Jo-Fai (Joe) Chow, H2O

Advanced Procedures

• Step 4 – Manual Tuning

• Step 5 – Early Stopping

• Step 6 – Grid Search

• Step 7 – Stacking Models (“h2oEnsemble”)

• Step 8 – Saving/Loading Models

• Please try them out later (bit.ly/h2o_milan_1)

23

Page 24: H2O for IoT - Jo-Fai (Joe) Chow, H2O

Use Case 2:

Anomaly Detection

Page 25: H2O for IoT - Jo-Fai (Joe) Chow, H2O

Anomaly (Outl ier) Detection

• Definition

o Identification of items, events or observations which do not conform to an expected pattern or other items in a dataset.

• Use Cases

o Bank Fraud

o Monitoring Manufacturing Lines

o Machine Learning• Separate dataset and

build different models

25Photo credit: www.dbta.com

Page 26: H2O for IoT - Jo-Fai (Joe) Chow, H2O

Deep Autoencoder for Anomaly Detect ion

• Consider the following three-layer neural network with one hidden layer and the same number of input neurons (features) as output neurons.

• The loss function is the mean squared error (MSE) between the input and the output. Hence, the network is forced to learn the identity via a nonlinear, reduced representation of the original data.o e.g. High MSE = potential outliers

• Such an algorithm is called a deep autoencoder.

26https://github.com/h2oai/h2o-training-book/blob/master/hands-on_training/anomaly_detection.md

Page 27: H2O for IoT - Jo-Fai (Joe) Chow, H2O

MNIST Example – The Good Ones

27Samples with Low Mean Squared Error (MSE)

Page 28: H2O for IoT - Jo-Fai (Joe) Chow, H2O

MNIST Example – The Bad Ones

28Samples with High Mean Squared Error (MSE)

Page 29: H2O for IoT - Jo-Fai (Joe) Chow, H2O

MNIST Example – The Ugly Ones

29Samples with Highest Mean Squared Error (MSE)

Page 30: H2O for IoT - Jo-Fai (Joe) Chow, H2O

use_case_2_anomaly_detection.R

30

Page 31: H2O for IoT - Jo-Fai (Joe) Chow, H2O

use_case_2_anomaly_detection.R

31

Define your own cut-off point

Build a Deep Autoencoder

Look at the MSE

Define cut-off Outliers identified

Page 32: H2O for IoT - Jo-Fai (Joe) Chow, H2O

End of Second Talk – Thanks!

32

• Data Science Milan

• Gianmario Spacagna

• Politecnico di Milano

• Resourceso bit.ly/h2o_milan_1

o www.h2o.ai

o docs.h2o.ai

• Contacto [email protected]

o @matlabulous

o github.com/woobe