Project "Deep Water"

95
Project “Deep Water” H 2 O’s Integration with TensorFlow Jo-fai (Joe) Chow Data Scientist [email protected] @matlabulous TensorFlow Paris Meetup 30 th November, 2016

Transcript of Project "Deep Water"

Page 1: Project "Deep Water"

Project “Deep Water”H2O’s Integration with TensorFlow

Jo-fai (Joe) ChowData [email protected]

@matlabulous

TensorFlow Paris Meetup30th November, 2016

Page 2: Project "Deep Water"

2

About Me• Civil (Water) Engineer• 2010 – 2015• Consultant (UK)

• Utilities• Asset Management• Constrained Optimization

• Industrial PhD (UK)• Infrastructure Design Optimization• Machine Learning +

Water Engineering• Discovered H2O in 2014

• Data Scientist• From 2015• Virgin Media (UK)• Domino Data Lab (Silicon Valley)• H2O.ai (Silicon Valley)

Page 3: Project "Deep Water"

3

Agenda• Introduction• About TensorFlow• TensorFlow Use Cases• About H2O.ai

• Project Deep Water• Motivation• Benefits• H2O + TensorFlow Live Demo

• Conclusions

Page 4: Project "Deep Water"

4

About TensorFlow

Page 5: Project "Deep Water"

5

About TensorFlow• Open source machine learning framework

by Google• Python / C++ API• TensorBoard

• Data Flow Graph Visualization

• Multi CPU / GPU• v0.8+ distributed machines support

• Multi devices support• desktop, server and Android devices

• Image, audio and NLP applications• HUGE Community• Support for Spark, Windows …

Page 6: Project "Deep Water"

6

TensorFlow Wrappers• TFLearn – Simplified interface• keras – TensorFlow + Theano• tensorflow.rb – Ruby wrapper• TensorFlow.jl – Julia wrapper• TensorFlow for R – R wrapper

• … and many more!• See:

github.com/jtoy/awesome-tensorflow

Page 7: Project "Deep Water"

7

TensorFlow Use CasesVery Cool TensorFlow Use CasesSome of the cool things you can do with TensorFlow

Page 8: Project "Deep Water"

8playground.tensorflow.org

Page 9: Project "Deep Water"

9

Neural Style Transfer in TensorFlow• Neural Style

• “… a technique to train a deep neural network to separate artistic style from image structure, and combine the style of one image with the structure of another”

• Original Paper• “A Neural Algorithm of Artistic Style

• TensorFlow Implementation • [Link]

Page 10: Project "Deep Water"

10

Sorting Cucumbers• Problem

• Sorting cucumbers is a laborious process.

• In a Japanese farm, the farmer’s wife can spend up to eight hours a day sorting cucumbers during peak harvesting period.

• Solution• Farmer’s son (Makoto Koike) used

TensorFlow, Arduino and Raspberry Pi to create an automatic cucumber sorting system.

Page 11: Project "Deep Water"

11

Sorting Cucumbers• Classification Problem• Input: cucumber photos (side, top,

bottom)• Output: one of nine classes

• Google’s Blog Post [Link]• YouTube Video [Link]

Page 12: Project "Deep Water"

12

Page 13: Project "Deep Water"

13

Page 14: Project "Deep Water"

14

Page 15: Project "Deep Water"

15

Of coursethere are more TensorFlow use casesThe key message here is …

Page 16: Project "Deep Water"

16

TensorFlowdemocratizesthe power of deep learning

Page 17: Project "Deep Water"

17

About H2O.aiWhat exactly is H2O?

Page 18: Project "Deep Water"

18

Company OverviewFounded 2011 Venture-backed, debuted in 2012

Products • H2O Open Source In-Memory AI Prediction Engine• Sparkling Water• Steam

Mission Operationalize Data Science, and provide a platform for users to build beautiful data products

Team 70 employees • Distributed Systems Engineers doing Machine Learning• World-class visualization designers

Headquarters Mountain View, CA

Page 19: Project "Deep Water"

Bring AI To Business Empower Transformation

Financial Services, Insurance and Healthcare as Our Vertical Focus

Community as Our Foundation

H2O is an open source platformempowering business transformation

Page 20: Project "Deep Water"

20

Users In Various Verticals Adore H2O

Financial Insurance MarketingTelecom Healthcare

Page 21: Project "Deep Water"

21

www.h2o.ai/customers

Page 22: Project "Deep Water"

Cloud Integration

Big Data Ecosystem

H2O.ai Makes A Difference as an AI PlatformOpen Source Flexible Interface

Scalability and Performance GPU EnablementRapid Model Deployment

Smart and Fast Algorithms

H2O Flow• 100% open source

• Highly portable models deployed in Java (POJO) and Model Object Optimized (MOJO)

• Automated and streamlined scoring service deployment with Rest API

• Distributed In-Memory Computing Platform

• Distributed Algorithms• Fine-Grain MapReduce

Page 23: Project "Deep Water"

23

1-Jan-15 1-Jul-15 1-Jan-16 1-Oct-160

10000

20000

30000

40000

50000

60000

70000# H2O Users

H2O Community Growth Tremendous Momentum Globally

65,000+ users globally (Sept 2016)

• 65,000+ users from ~8,000 companies in 140 countries. Top 5 from:

Large User Circle

* DATA FROM GOOGLE ANALYTICS EMBEDDED IN THE END USER PRODUCT

1-Jan-15 1-Jul-15 1-Jan-16 1-Oct-160

100020003000400050006000700080009000

# Companies Using H2O ~8,000+ companies(Sept 2016)

+127%

+60%

Page 24: Project "Deep Water"

24

H2O Community SupportGoogle forum – h2osteam community.h2o.ai

Please try

Page 25: Project "Deep Water"

25

H2O for Kaggle Competitions

Page 26: Project "Deep Water"

26

H2O for Academic Research

http://www.sciencedirect.com/science/article/pii/S0377221716308657

https://arxiv.org/abs/1509.01199

Page 27: Project "Deep Water"

27

H2Odemocratizesartificial intelligence & big data science

Page 28: Project "Deep Water"

28

Our Open Source Products100% Open Source. Big Data Science for Everyone!

Page 29: Project "Deep Water"

29

H2O.ai Offers AI Open Source PlatformProduct Suite to Operationalize Data Science with Visual Intelligence

In-Memory, Distributed Machine Learning

Algorithms with Speed and Accuracy

H2O Integration with Spark. Best Machine Learning on

Spark.

100% Open Source

Visual Intelligence and UX Framework For Data Interpretation and Story Telling on top of Beautiful Data Products

Operationalize and Streamline Model Building, Training and Deployment

Automatically and Elastically

State-of-the-art Deep Learning on GPUs with TensorFlow, MXNet or Caffe with the ease of use of H2O

Deep Water

Page 30: Project "Deep Water"

30

H2O.ai Offers AI Open Source PlatformProduct Suite to Operationalize Data Science with Visual Intelligence

In-Memory, Distributed Machine Learning

Algorithms with Speed and Accuracy

H2O Integration with Spark. Best Machine Learning on

Spark.

100% Open Source

Visual Intelligence and UX Framework For Data Interpretation and Story Telling on top of Beautiful Data Products

Operationalize and Streamline Model Building, Training and Deployment

Automatically and Elastically

State-of-the-art Deep Learning on GPUs with TensorFlow, MXNet or Caffe with the ease of use of H2O

Deep Water

Page 31: Project "Deep Water"

31

HDFS

S3

NFS

DistributedIn-Memory

Load Data

Loss-lessCompression

H2O Compute Engine

Production Scoring Environment

Exploratory &Descriptive

Analysis

Feature Engineering &

Selection

Supervised &Unsupervised

Modeling

ModelEvaluation &

Selection

Predict

Data & ModelStorage

Model Export:Plain Old Java Object

YourImagination

Data Prep Export:Plain Old Java Object

Local

SQL

High Level Architecture

Page 32: Project "Deep Water"

32

Supervised Learning

• Generalized Linear Models: Binomial, Gaussian, Gamma, Poisson and Tweedie

• Naïve Bayes

Statistical Analysis

Ensembles

• Distributed Random Forest: Classification or regression models

• Gradient Boosting Machine: Produces an ensemble of decision trees with increasing refined approximations

Deep Neural Networks

• Deep learning: Create multi-layer feed forward neural networks starting with an input layer followed by multiple layers of nonlinear transformations

Algorithms OverviewUnsupervised Learning

• K-means: Partitions observations into k clusters/groups of the same spatial size. Automatically detect optimal k

Clustering

Dimensionality Reduction

• Principal Component Analysis: Linearly transforms correlated variables to independent components

• Generalized Low Rank Models: extend the idea of PCA to handle arbitrary data consisting of numerical, Boolean, categorical, and missing data

Anomaly Detection

• Autoencoders: Find outliers using a nonlinear dimensionality reduction using deep learning

Page 33: Project "Deep Water"

33

Distributed Algorithms

• Foundation for In-Memory Distributed Algorithm Calculation - Distributed Data Frames and columnar compression

• All algorithms are distributed in H2O: GBM, GLM, DRF, Deep Learning and more. Fine-grained map-reduce iterations.

• Only enterprise-grade, open-source distributed algorithms in the market

User Benefits

Advantageous Foundation

• “Out-of-box” functionalities for all algorithms (NO MORE SCRIPTING) and uniform interface across all languages: R, Python, Java

• Designed for all sizes of data sets, especially large data• Highly optimized Java code for model exports• In-house expertise for all algorithms

Parallel Parse into Distributed Rows

Fine Grain Map Reduce Illustration: Scalable Distributed Histogram Calculation for GBM

Foun

datio

n fo

r Dist

ribut

ed A

lgor

ithm

s

Page 34: Project "Deep Water"

34

H2O Deep Learning in Action

Page 35: Project "Deep Water"

35

H2O in ActionQuick Demo (5 minutes)

Page 36: Project "Deep Water"

36

Simple Demo – Iris

Page 37: Project "Deep Water"

37

H2O + R Please try

Page 38: Project "Deep Water"

38

H2O Flow (Web)

Page 39: Project "Deep Water"

39

H2O Flow

Page 40: Project "Deep Water"

40

Key Learning Resources• Help Documentations• docs.h2o.ai

• Meetups• bit.ly/h2o_meetups

• YouTube Channel• bit.ly/h2o_youtube

Page 41: Project "Deep Water"

41

H2O.ai Offers AI Open Source PlatformProduct Suite to Operationalize Data Science with Visual Intelligence

In-Memory, Distributed Machine Learning

Algorithms with Speed and Accuracy

H2O Integration with Spark. Best Machine Learning on

Spark.

100% Open Source

Visual Intelligence and UX Framework For Data Interpretation and Story Telling on top of Beautiful Data Products

Operationalize and Streamline Model Building, Training and Deployment

Automatically and Elastically

State-of-the-art Deep Learning on GPUs with TensorFlow, MXNet or Caffe with the ease of use of H2O

Deep Water

Page 42: Project "Deep Water"

42

Both TensorFlow and H2O are widely used

Page 43: Project "Deep Water"

43

TensorFlow democratizes the power of deep learning.

H2O democratizes artificial intelligence & big data science.

There are other open source libraries like MXNet and Caffe too.Let’s have a party, this will be fun!

Page 44: Project "Deep Water"

44

Deep Water Next-Gen Distributed Deep Learning with H2O

H2O integrates with existing GPU backends for significant performance gains

One Interface - GPU Enabled - Significant Performance GainsInherits All H2O Properties in Scalability, Ease of Use and Deployment

Recurrent Neural Networksenabling natural language processing, sequences, time series, and more

Convolutional Neural Networks enabling Image, video, speech recognition

Hybrid Neural Network Architectures enabling speech to text translation, image captioning, scene parsing and more

Deep Water

Page 45: Project "Deep Water"

Deep Water Architecture

Node 1 Node N

ScalaSpark

H2OJava

Execution Engine

TensorFlow/mxnet/CaffeC++

GPU CPU

TensorFlow/mxnet/CaffeC++

GPU CPU

RPC

R/Py/Flow/Scala clientREST APIWeb server

H2OJava

Execution Engine

grpc/MPI/RDMA

ScalaSpark

45

Page 46: Project "Deep Water"

46

Using H2O Flow to train Deep Water Model

Page 47: Project "Deep Water"

47

Same H2O R/Python Interface

Page 48: Project "Deep Water"

48

H2O + TensorFlowLive Demo

Page 49: Project "Deep Water"

49

Deep Water H2O + TensorFlow Demo• H2O + TensorFlow• Dataset – Cat/Dog/Mouse• TensorFlow as GPU backend• Train a LeNet (CNN) model• Interfaces

• Python (Jupyter Notebook)• Web (H2O Flow)

• Code and Data• github.com/h2oai/deepwater

Page 50: Project "Deep Water"

50

Data – Cat/Dog/Mouse Images

Page 51: Project "Deep Water"

51

Data – CSV

Page 53: Project "Deep Water"

53

Deep Water in ActionQuick Demo (5 minutes)

Page 54: Project "Deep Water"

54

Page 55: Project "Deep Water"

55

Page 56: Project "Deep Water"

56

Page 57: Project "Deep Water"

57

Page 58: Project "Deep Water"

58

Using GPU for training

Page 59: Project "Deep Water"

59

Page 60: Project "Deep Water"

60

Saving the custom network structure as a file

Page 61: Project "Deep Water"

61

Creating the custom network structure with size = 28x28 and channels = 3

Page 62: Project "Deep Water"

62

Specifying the custom network structure for training

Page 63: Project "Deep Water"

63

H2O + TensorFlowNow try it with H2O Flow

Page 64: Project "Deep Water"

64

Want to try Deep Water?• Build it• github.com/h2oai/deepwater• Ubuntu 16.04• CUDA 8• cuDNN 5 • …

• Pre-built Amazon Machine Images (AMIs)• Info to be confirmed

Page 65: Project "Deep Water"

65

H2O.ai Offers AI Open Source PlatformProduct Suite to Operationalize Data Science with Visual Intelligence

In-Memory, Distributed Machine Learning

Algorithms with Speed and Accuracy

H2O Integration with Spark. Best Machine Learning on

Spark.

100% Open Source

Visual Intelligence and UX Framework For Data Interpretation and Story Telling on top of Beautiful Data Products

Operationalize and Streamline Model Building, Training and Deployment

Automatically and Elastically

State-of-the-art Deep Learning on GPUs with TensorFlow, MXNet or Caffe with the ease of use of H2O

Deep Water

Page 66: Project "Deep Water"

66

Want to find out more about Sparkling Water and Steam?

• R Addicts Paris• Tomorrow 6:30pm• Three H2O Talks:• Introduction to H2O

• Demos: H2O + R + Steam• Auto Machine Learning using H2O• Sparkling Water 2.0

Page 67: Project "Deep Water"

67

Conclusions

Page 68: Project "Deep Water"

68

Project “Deep Water”• H2O + TensorFlow• a powerful combination of two widely

used open source machine learning libraries.

• All Goodies from H2O• inherits all H2O properties in scalability,

ease of use and deployment.

• Unified Interface• allows users to build, stack and deploy

deep learning models from different libraries efficiently.

• 100% Open Source• the party will get bigger!

Page 69: Project "Deep Water"

69

Deep Water Roadmap (Q4 2016)

Page 70: Project "Deep Water"

70

H2O’s Mission

Making Machine Learning Accessible to EveryonePhoto credit: Virgin Media

Page 71: Project "Deep Water"

71

Deep Water – Current Contributors

Page 72: Project "Deep Water"

72

Merci beaucoup!• Organizers & Sponsors

• Jiqiong, Natalia & Renat• Dailymotion

• Code, Slides & Documents• bit.ly/h2o_meetups• bit.ly/h2o_deepwater• docs.h2o.ai

• Contact• [email protected]• @matlabulous• github.com/woobe

Page 73: Project "Deep Water"

73

Extra Slides (Iris Demo)

Page 74: Project "Deep Water"

74

Page 75: Project "Deep Water"

75

Page 76: Project "Deep Water"

76

Page 77: Project "Deep Water"

77

Page 78: Project "Deep Water"

78

Page 79: Project "Deep Water"

79

Page 80: Project "Deep Water"

80

Page 81: Project "Deep Water"

81

Page 82: Project "Deep Water"

82

Page 83: Project "Deep Water"

83

Page 84: Project "Deep Water"

84

Extra Slides (Deep Water Demo)

Page 85: Project "Deep Water"

85

Page 86: Project "Deep Water"

86

Page 87: Project "Deep Water"

87

Page 88: Project "Deep Water"

88

Page 89: Project "Deep Water"

89

Page 90: Project "Deep Water"

90

Page 91: Project "Deep Water"

91

Page 92: Project "Deep Water"

92

Page 93: Project "Deep Water"

93

Page 94: Project "Deep Water"

94

Page 95: Project "Deep Water"

95