Hank Roark, Data Scientist, H2O at MLconf ATL - 9/18/15


H2O.ai Machine Intelligence

ML is the new SQL Prediction is the new Search

Machine Learning for the Sensored Internet of Things

Hank Roark   [email protected]   @hankroark

Who am I?

▪ Data Scientist & Hacker @ H2O.ai
▪ Lecturer in Systems Thinking, University of Illinois at Urbana-Champaign
▪ John Deere: Research, Software Product Development, High Tech Ventures
▪ Lots of time dealing with data off of machines, equipment, satellites, radar, hand sampling, and so on.
▪ Geospatial and temporal / time series data, almost all from sensors.
▪ Previously at startups and consulting (Red Sky Interactive, Nuforia, NetExplorer, Perot Systems, a few of my own)
▪ Systems Design & Management, MIT
▪ Physics, Georgia Tech

IoT Data Comes From Lots of Places, Much of it from Sensors

The data is going to be huge, so get ready

Wow, how big is a brontobyte?

This much data will require a fast OODA loop

Many of these models will then be used in control systems

Image courtesy http://www.telecom-cloud.net/wp-content/uploads/2015/05/Screen-Shot-2015-05-27-at-3.51.47-PM.png

Machine Prognostics Use Case

Predicting turbofan remaining useful life from sensor data

Jupyter notebook @ https://goo.gl/G2zx3o

Many more tips and tricks
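
For readers who have not opened the notebook, here is a minimal sketch of one common way to build a remaining-useful-life target from run-to-failure sensor logs: each unit's last recorded cycle is treated as its failure point, and the label is the number of cycles left before it. The record layout (`unit`, `cycle`) and the function name `add_rul` are assumptions made for illustration; they are not taken from the notebook.

```python
from collections import defaultdict


def add_rul(rows):
    """Add a remaining-useful-life label to each record.

    rows: list of dicts with 'unit' and 'cycle' keys (hypothetical layout).
    Assumes every unit was run to failure, so its last cycle is the failure point.
    """
    # Find the last observed cycle for each unit.
    last_cycle = defaultdict(int)
    for r in rows:
        last_cycle[r["unit"]] = max(last_cycle[r["unit"]], r["cycle"])
    # RUL = cycles remaining until that unit's last observed cycle.
    return [dict(r, rul=last_cycle[r["unit"]] - r["cycle"]) for r in rows]


rows = [
    {"unit": 1, "cycle": 1}, {"unit": 1, "cycle": 2}, {"unit": 1, "cycle": 3},
    {"unit": 2, "cycle": 1}, {"unit": 2, "cycle": 2},
]
labeled = add_rul(rows)
print([r["rul"] for r in labeled])  # [2, 1, 0, 1, 0]
```

The label only makes sense for units observed all the way to failure; units still running would need a different (censored) treatment.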

Key takeaways for modeling the sensored IoT

• Some sort of signal processing is usually helpful, but can introduce bias
    • Smoothers, filters, frequency domain, interpolation, LOWESS, ... , feature engineering
• Validation strategy is important
    • Easy for a model to memorize the data due to autocorrelation
• Sometimes the simplest things work
    • Treat each observation independently; use time and location as data elements
• Uncertainty is the name of the game
    • Methods that will report out probabilities are often required (not shown here)
• The data can be big, get ready, it'll be a great ride
    • Scalable tools like H2O will help you model the coming brontobytes of data
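
Two of these takeaways can be illustrated with a small, hedged sketch in plain Python. The first function is a trailing moving average, one of the simplest smoothers; it uses only past and current readings, and like any smoother it can bias the signal. The second holds out whole units (machines) for validation, so that autocorrelated readings from the same unit never land on both sides of the split. The function names, the `unit_ids` layout, and the default parameters are assumptions for illustration, not the method used in the talk or an H2O API.

```python
import random


def moving_average(values, window=3):
    """Trailing moving average over the current point and up to window-1 earlier points."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def split_by_unit(unit_ids, holdout_fraction=0.25, seed=0):
    """Return (train_idx, valid_idx) with every unit entirely on one side.

    unit_ids: one unit identifier per observation (hypothetical layout).
    """
    units = sorted(set(unit_ids))
    random.Random(seed).shuffle(units)
    n_holdout = max(1, round(len(units) * holdout_fraction))
    holdout = set(units[:n_holdout])
    train_idx = [i for i, u in enumerate(unit_ids) if u not in holdout]
    valid_idx = [i for i, u in enumerate(unit_ids) if u in holdout]
    return train_idx, valid_idx


signal = [1.0, 3.0, 2.0, 6.0]
smoothed = moving_average(signal, window=2)
print(smoothed)  # [1.0, 2.0, 2.5, 4.0]

unit_ids = [1, 1, 2, 2, 3, 3, 4, 4]
train_idx, valid_idx = split_by_unit(unit_ids)
train_units = {unit_ids[i] for i in train_idx}
valid_units = {unit_ids[i] for i in valid_idx}
print(train_units.isdisjoint(valid_units))  # True
```

A random row-level split would scatter neighboring readings from the same machine across train and validation, which tends to make validation scores look better than they would on a machine the model has never seen.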