Post on 14-Jul-2015
H2O - MORE THAN WATER
What is H2O? (water, duh!)
It is ALSO an open-source, parallel processing engine for machine learning.
What makes H2O different?
Cutting-edge algorithms + parallel architecture + ease-of-use
= Happy Data Scientists / Analysts
TEAM @ H2O.AI
16,000 commits
H2O World Conference 2014
COMMUNITY REACH
120 meetups in 2014
11,000 installations
2,000 corporations
First Friday Hack-A-Thons
TRY IT!
Don’t take my word for it… www.h2o.ai
Simple Instructions
1. cd to the download location
2. Unzip the H2O file
3. java -jar h2o.jar
4. Point your browser to: localhost:54321
GUI
R
SUPERVISED LEARNING
Deep Learning Applications on Labeled Data
SUPERVISED LEARNING
What is it?
Methods that infer a function from labeled training data. Key task: Predicting ________ . (Insert your task here)
Examples of supervised learning tasks:
1. Classification Tasks - benign / malignant tumor
2. Regression Tasks - predicting future stock market prices
3. Image Recognition - highlighting faces in pictures
SUPERVISED ALGORITHMS
Statistical Analysis
• Generalized Linear Models: Binomial, Gaussian, Gamma, Poisson, and Tweedie
• Cox Proportional Hazards Models
• Naïve Bayes
Ensembles
• Distributed Random Forest: classification or regression models
• Gradient Boosting Machine: produces an ensemble of decision trees with increasingly refined approximations
Deep Neural Networks
• Deep Learning: creates multi-layer feed-forward neural networks, starting with an input layer followed by multiple layers of nonlinear transformations
VERY HOT subject area & our topic today!
WHY NEURAL NETS?
[Figure: linear classification vs. non-linear classification, comparing error]
NEURAL NETS + H2O
Neurons activate each other via weighted sums.
[Figure: feed-forward network; inputs x1, x2, x3 connect to hidden features h1, h2, h3, which connect to outputs y1, y2]
Activation functions H2O supports: Tanh, Rectifier, Maxout
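The weighted-sum description and the three activations can be sketched in plain Python (a toy illustration, not H2O's implementation):

```python
import math

# Hedged sketch of the three activation functions listed above; the
# function names and the toy neuron are illustrative, not H2O's API.
def tanh(x):
    return math.tanh(x)

def rectifier(x):
    # ReLU: passes positive inputs, zeroes out negative ones
    return max(0.0, x)

def maxout(pieces):
    # Maxout: takes the max over a group of linear pieces
    return max(pieces)

def neuron(inputs, weights, bias, activation):
    # A neuron activates on the weighted sum of its inputs plus a bias.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)
```

The rectifier is what gives the "DReD Net" later in the talk its name.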
FINDING THE HIGGS-BOSON
Task: Can we identify the Higgs-Boson particle vs. background noise using ‘low-level’ machine-generated data?
Live Demo!
CERN Lab
FIGHTING CRIME IN CHICAGO
Spark + H2O
OPEN CITY, OPEN DATA
“…my kind of town” - F. Sinatra
~4.6 million rows of crimes since 2001, updated weekly
External data sources to join with the Crime Data: Weather Data? U.S. Census Data?
ML WORKFLOW
1. Collect datasets (Crime + Weather + Census)
2. Do some feature extraction (e.g. dates, times)
3. Join Crime Data + Weather Data + Census Data
4. Build deep learning model to predict arrest / no arrest made
GOAL: For a given crime, predict if an arrest is more / less likely to be made!
SPARK SQL + H2O RDD
3-table join using Spark SQL
Convert the joined table to an H2O RDD
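The 3-table join can be illustrated in plain Python (the talk performs it with Spark SQL; the field names and join keys below are illustrative assumptions):

```python
# Toy stand-in for the Spark SQL 3-table join: crimes joined to weather
# by date and to census data by police beat. All field names are assumptions.
crimes = [
    {"date": "2015-01-03", "beat": 11, "arrest": True},
    {"date": "2015-01-04", "beat": 99, "arrest": False},  # no census match
]
weather = {"2015-01-03": {"temp_f": 28}, "2015-01-04": {"temp_f": 31}}
census = {11: {"median_income": 41000}}

def join_tables(crimes, weather, census):
    joined = []
    for c in crimes:
        w = weather.get(c["date"])
        s = census.get(c["beat"])
        if w is not None and s is not None:  # inner-join semantics
            joined.append({**c, **w, **s})
    return joined

rows = join_tables(crimes, weather, census)
```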
H2O DEEP LEARNING
Can do grid search over many parameters!
HOW’D WE DO?
nice!
~ 10 mins
MODEL BUILDING + TUNING
DReD Net = Deep Rectifier w/ Dropout Neural Net
[Figure: inputs feed rectifier hidden layers with dropped-out units (marked X); output predicts Arrest]
Tuning knobs: epochs, hidden layers, regularization
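A grid search over those knobs just scores every combination and keeps the best. A minimal sketch, where `train_and_score` is a stand-in for actually training a DReD net and reporting a validation metric:

```python
from itertools import product

# Hedged sketch of grid search over the tuning knobs named above.
# The grid values and the scoring function are illustrative assumptions.
grid = {
    "epochs": [10, 50],
    "hidden": [[200, 200], [512, 512, 512]],
    "l1": [0.0, 1e-5],  # L1 regularization strength
}

def train_and_score(params):
    # Stand-in scoring function so the sketch runs end to end;
    # in the talk this would be model training + validation AUC.
    return 0.70 + 0.01 * len(params["hidden"]) - params["l1"]

candidates = [dict(zip(grid, combo)) for combo in product(*grid.values())]
best = max(candidates, key=train_and_score)
```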
UNSUPERVISED LEARNING
Deep Learning Applications on Non-Labeled Data
UNSUPERVISED LEARNING
What is it?
Methods that uncover the general structure of input data where no prediction is needed.
Examples of unsupervised learning tasks:
1. Clustering - discovering customer segments
2. Topic Extraction - what topics are people tweeting about?
3. Information Retrieval - IBM Watson: question + answer
4. Anomaly Detection - detecting irregular heartbeats
NO CURATION NEEDED!
UNSUPERVISED ALGORITHMS
Clustering
• K-means: partitions observations into k clusters/groups of the same spatial size
Dimensionality Reduction
• Principal Component Analysis: linearly transforms correlated variables into independent components
Anomaly Detection
• Autoencoders: find outliers using nonlinear dimensionality reduction via deep learning
AUTOENCODER + H2O
Information flows from input, through the hidden features, to the output.
[Figure: inputs x1–x4 pass through a smaller hidden-feature layer and are reconstructed as outputs x1–x4]
Dogs, Dogs and Dogs
ANOMALY DETECTION OF VINTAGE YEAR BORDEAUX WINE
BORDEAUX WINE
Largest wine-growing region in France
700+ million bottles of wine produced / year!
Some years better than others: Great ($$$) vs. Typical ($)
Last Great years: 2010, 2009, 2005, 2000
GREAT VS. TYPICAL VINTAGE?
Question: Can we study weather patterns in Bordeaux leading up to harvest to identify ‘anomalous’ weather years that correlate with Great ($$$) vs. Typical ($) vintages?
The Bordeaux Dataset (1952 - 2014, yearly)
• Amount of winter rain (Oct > Apr of harvest year)
• Average summer temp (Apr > Sept of harvest year)
• Rain during harvest (Aug > Sept)
• Years since last Great Vintage
AUTOENCODER + ANOMALY DETECTION
Goal: ‘en primeur of en primeur’ - can we use weather patterns to identify anomalous years that indicate great vintage quality?
ML Workflow:
1) Train an autoencoder to learn the ‘typical’ vintage weather pattern
2) Append ‘great’ vintage year weather data to the original dataset
3) IF the great vintage year weather data does NOT match the learned weather pattern, the autoencoder will produce a high reconstruction error (MSE)
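The reconstruction-error rule in step 3 can be sketched like this. The "autoencoder" below is a stand-in that reconstructs every input as the typical mean pattern, and the feature values are made up for illustration:

```python
# Toy sketch of the MSE anomaly rule: a model trained only on typical
# years reconstructs typical inputs well, so a high MSE flags an
# anomalous (possibly great) vintage. All numbers are illustrative.
def mse(x, x_hat):
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def reconstruct(x, typical_mean):
    # Stand-in autoencoder: pulls every input toward the typical pattern.
    return typical_mean

typical_mean = [0.5, 0.5, 0.5, 0.5]     # normalized weather features
typical_year = [0.52, 0.48, 0.50, 0.50]
great_year = [0.9, 0.1, 0.8, 0.2]

errors = {
    name: mse(x, reconstruct(x, typical_mean))
    for name, x in [("typical", typical_year), ("great", great_year)]
}
```

With the talk's threshold, a year whose MSE exceeds 0.10 would be flagged as anomalous.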
RESULTS (MSE > 0.10)
[Figure: reconstruction MSE by vintage year; years above the 0.10 threshold: 1961, 1982, 1989, 1990, 2000, 2005, 2009, 2010]
2014 BORDEAUX??
[Figure: reconstruction MSE for 2013 and 2014; where does 2014 fall?]
DEEP AUTOENCODERS + K-MEANS EXAMPLE
CYCLING + __________
Problem: New and experienced cyclists have questions about cycling + ______ (given topic). Let’s build a question-and-answer system to help cyclists with their health-related questions!
ML Workflow:
1) Scrape thousands of article titles from the internet about cycling, cycling tips, cycling health, etc. from various sources
2) Build Bag-of-Words Dataset on article titles corpus
3) Reduce # of dimensions via deep autoencoder
4) Extract ‘last layer’ of deep features and cluster using k-means
5) Inspect Results!
BAG-OF-WORDS
Build a dataset of cycling-related article titles from various sources:
Article Title: “The Basics of Exercise Nutrition”
→ lower-case, remove ‘stopwords’, remove punctuation → “basics exercise nutrition”
→ [ 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, …, 0 ] (1s in the ‘basics’, ‘exercise’, ‘nutrition’ positions)
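The title-to-vector step can be sketched in a few lines of Python; the stopword list and vocabulary below are small illustrative assumptions, not the talk's actual corpus:

```python
import string

# Hedged sketch of the bag-of-words step: lower-case, strip punctuation,
# drop stopwords, then mark which vocabulary words appear in the title.
STOPWORDS = {"the", "of", "a", "an", "and", "in", "to"}

def tokenize(title):
    title = title.lower()                                    # lower-case
    title = title.translate(str.maketrans("", "", string.punctuation))
    return [w for w in title.split() if w not in STOPWORDS]  # drop stopwords

def bag_of_words(title, vocabulary):
    words = set(tokenize(title))
    return [1 if v in words else 0 for v in vocabulary]

vocab = ["basics", "bike", "exercise", "nutrition", "tire"]
vec = bag_of_words("The Basics of Exercise Nutrition", vocab)
```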
DIMENSIONALITY REDUCTION
Use a deep autoencoder to reduce the # of features (~2,700 words!)
Encoder: 2,700 words → 500 hidden features → 250 → 125 → 50
Decoder: 50 → 125 → 250 → 500 hidden features → 2,700 words
Example input: “The Basics of Exercise Nutrition”
K-MEANS CLUSTERING
For each article, extract the ‘last’ layer of the autoencoder (50 deep features):
“The Basics of Exercise Nutrition” → DF1 -0.09330833, DF2 0.167881429, DF3 -0.234307408, DF4 0.247723639, DF5 -0.067700267, DF6 -0.094107866, …
K-Means Clustering
Inputs: the extracted 50 deep features for each cycling-related article
K = 50 clusters, chosen after a grid search of values
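The clustering step itself is plain k-means: assign each point to its nearest center, recompute centers, repeat. A minimal sketch on tiny 2-D points (the talk clusters 50-D deep features with K = 50; everything below is a toy illustration):

```python
import random

# Hedged, minimal k-means sketch of the clustering step above.
def dist2(a, b):
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # initialize from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                     # assign to the nearest center
            j = min(range(k), key=lambda j: dist2(p, centers[j]))
            clusters[j].append(p)
        for j, c in enumerate(clusters):     # recompute centers as means
            if c:
                centers[j] = tuple(sum(x) / len(c) for x in zip(*c))
    return centers, clusters

points = [(0, 0), (0.1, 0.2), (5, 5), (5.1, 4.9)]
centers, clusters = kmeans(points, k=2)
```

Articles whose deep-feature vectors land in the same cluster are treated as similar, which is what the inspection slides that follow rely on.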
RESULT: CYCLING + A.I.
Now we inspect the clusters!
Test Article Title: Fluid & Carbohydrate Ingestion Improve Performance During 1 Hour of Intense Exercise
Result: Clustered w/ 17 other titles (out of ~5,700)
Top 5 similar titles within the cluster:
Caffeine ingestion does not alter performance during a 100-km cycling time-trial performance
Immuno-endocrine response to cycling following ingestion of caffeine and carbohydrate
Metabolism and performance following carbohydrate ingestion late in exercise
Increases in cycling performance in response to caffeine ingestion are repeatable
Fluid ingestion does not influence intense 1-h exercise performance in a mild environment
HOW TO GET FASTER?
Test Article Title: Muscle Coordination is Key to Power Output & Mechanical Efficiency of Limb Movements
Result: Clustered w/ 29 other titles (out of ~5,700)
Top 5 similar titles within the cluster:
Muscle fibre type efficiency and mechanical optima affect freely chosen pedal rate during cycling.
Standard mechanical energy analyses do not correlate with muscle work in cycling.
The influence of body position on leg kinematics and muscle recruitment during cycling.
Influence of repeated sprint training on pulmonary O2 uptake and muscle deoxygenation kinetics in humans
Influence of pedaling rate on muscle mechanical energy in low power recumbent pedaling using forward dynamic simulations
WHAT’S NEXT??
Build smarter apps!!
alex@h2o.ai
github.com/h2oai
Hack with us!!
HIGGS-BOSON PARTICLE
How did our Deep Neural Net do??
BEST Low-Level AUC: 0.73