Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks at H2OWorld2014
![Page 1: Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks at H2OWorld2014](https://reader034.fdocuments.in/reader034/viewer/2022052316/55a200e11a28ab911b8b465e/html5/thumbnails/1.jpg)
Unsupervised Learning in H2O
H2O World 2014
A. Tellez, S. Subramanian, L. Tashkevych, T. Nguyen
![Page 2: Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks at H2OWorld2014](https://reader034.fdocuments.in/reader034/viewer/2022052316/55a200e11a28ab911b8b465e/html5/thumbnails/2.jpg)
What is Unsupervised Learning?
Unsupervised Learning: Learning the internal structure of the data when there is no target to predict.
Supervised learning depends on curated (labeled) data, whereas unsupervised learning requires no ‘answer book’.
Common Unsupervised Learning Approaches:
Clustering: k-means, mixture models, affinity propagation
Dimensionality Reduction: PCA, Autoencoders
Hidden Markov Models
Topic Extraction: NMF, LDA
![Page 3: Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks at H2OWorld2014](https://reader034.fdocuments.in/reader034/viewer/2022052316/55a200e11a28ab911b8b465e/html5/thumbnails/3.jpg)
Example: Single Malt Scotch
Single Malt Scotch: A whiskey made at one particular distillery from a mash that uses only malted grain (barley).
Must be aged at least 3 years in oak casks
Many famous distilleries are located in the northern regions of Scotland
![Page 4: Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks at H2OWorld2014](https://reader034.fdocuments.in/reader034/viewer/2022052316/55a200e11a28ab911b8b465e/html5/thumbnails/4.jpg)
Single Malt Dataset
The Single Malt Whiskey Dataset
85 distilleries from northern Scotland
12 descriptor features
E.g. Sweetness, Smoky, Tobacco, Honey, Spicy, Malty, etc
Each descriptor rated from 0 (weak) to 4 (strong)
Dataset kindly provided here*.
How can we use our knowledge of unsupervised learning to learn more about single malt whiskeys?
Can we build a whiskey recommendation engine based on whiskeys we already like?
* Dataset Source: https://www.mathstat.strath.ac.uk/outreach/nessie/nessie_whisky.html
![Page 5: Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks at H2OWorld2014](https://reader034.fdocuments.in/reader034/viewer/2022052316/55a200e11a28ab911b8b465e/html5/thumbnails/5.jpg)
Dimension Reduction + K-Means
First, let’s reduce the 12 features to a lower dimensional space using Principal Component Analysis…
…7 principal components explain 85% of the variance in the dataset
Then, let’s use k-means clustering to determine the unique groups in the new PCA’d dataset
Grid Search shows that 11 clusters are appropriate
Pipe out the result and attach the original distillery labels to see which whiskeys cluster with each other, all using H2O!
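The clustering step above can be sketched in plain Python. This is a minimal illustration of Lloyd's k-means with a small scan over k, not the H2O implementation used in the talk. The distillery names and 2-D scores are made-up placeholders standing in for the PCA'd dataset, and `kmeans` and `within_ss` are hypothetical helper names.

```python
import math

def kmeans(points, k, iters=20):
    """Lloyd's algorithm; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def within_ss(points, labels, centroids):
    """Total within-cluster sum of squared distances (lower means tighter clusters)."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

# Placeholder data: hypothetical names and made-up 2-D PCA scores.
names = ["distillery_A", "distillery_B", "distillery_C",
         "distillery_D", "distillery_E", "distillery_F"]
scores = [(0.0, 0.1), (4.0, 4.1), (0.2, 0.0), (3.9, 4.0), (0.1, 0.2), (4.1, 3.9)]

# Scan a few values of k and compare within-cluster sums of squares.
for k in (1, 2, 3):
    labels, centroids = kmeans(scores, k)
    print(k, round(within_ss(scores, labels, centroids), 3))

# Attach the original labels to the cluster assignments for k = 2.
labels, centroids = kmeans(scores, 2)
assignments = dict(zip(names, labels))
print(assignments)
```

Seeding the centroids with the first k points keeps the sketch reproducible; a real run would normally use random or k-means++ style initialization and repeat the scan over a wider range of k.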
![Page 6: Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks at H2OWorld2014](https://reader034.fdocuments.in/reader034/viewer/2022052316/55a200e11a28ab911b8b465e/html5/thumbnails/6.jpg)
Model Results
I ENJOY:
OTHER WHISKEYS THAT CLUSTER WITH THESE:
![Page 7: Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks at H2OWorld2014](https://reader034.fdocuments.in/reader034/viewer/2022052316/55a200e11a28ab911b8b465e/html5/thumbnails/7.jpg)
Model Results Cont’d.
SOME OF YOU MAY LIKE:
OTHER WHISKEYS IN THE SAME CLUSTER:
![Page 8: Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks at H2OWorld2014](https://reader034.fdocuments.in/reader034/viewer/2022052316/55a200e11a28ab911b8b465e/html5/thumbnails/8.jpg)
Example: Feeling like ramen?
Burning question: You like Japanese ramen. Where can you go for dinner tonight if you want ramen around Mountain View?
![Page 9: Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks at H2OWorld2014](https://reader034.fdocuments.in/reader034/viewer/2022052316/55a200e11a28ab911b8b465e/html5/thumbnails/9.jpg)
Ramen Yelp Dataset
Harvested all the known ramen shops around Mountain View and built our Yelp dataset:
![Page 10: Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks at H2OWorld2014](https://reader034.fdocuments.in/reader034/viewer/2022052316/55a200e11a28ab911b8b465e/html5/thumbnails/10.jpg)
Step 1: PCA
85% of the cumulative variance in the dataset is explained by the first 2 PCs
[Figure: PCA plot; axis label: Second PC]
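Choosing how many principal components to keep comes down to a cumulative-variance check. The sketch below assumes the per-component variances (eigenvalues) are already available and sorted from largest to smallest. `components_needed` is a hypothetical helper, and the numbers are made up for illustration, not taken from the ramen dataset.

```python
def components_needed(variances, target=0.85):
    """Smallest number of leading components whose cumulative share of the
    total variance reaches `target`. `variances` must be sorted descending."""
    total = sum(variances)
    running = 0.0
    for count, v in enumerate(variances, start=1):
        running += v
        if running / total >= target:
            return count
    return len(variances)

# Made-up component variances, for illustration only.
example_variances = [6.0, 3.0, 0.6, 0.4]
print(components_needed(example_variances))
```

With these placeholder values the first component covers 60% of the variance and the first two cover 90%, so two components clear the 85% target.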
![Page 11: Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks at H2OWorld2014](https://reader034.fdocuments.in/reader034/viewer/2022052316/55a200e11a28ab911b8b465e/html5/thumbnails/11.jpg)
Step 2: K-Means
Grid Search shows 4 clusters on PCA’d dataset
I really like this ramen joint:
I’m thinking these places for dinner tonight:
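Once every shop has a cluster label, the recommendation itself is a simple lookup: find the cluster of the place you like and list the other members. A minimal sketch follows, with hypothetical shop names and made-up cluster labels; `same_cluster` is an illustrative helper, not part of H2O.

```python
def same_cluster(names, labels, liked):
    """Return every other item that shares a cluster with `liked`."""
    target = labels[names.index(liked)]
    return [n for n, lab in zip(names, labels) if lab == target and n != liked]

# Hypothetical shops and made-up cluster assignments.
shops = ["shop_a", "shop_b", "shop_c", "shop_d", "shop_e"]
clusters = [0, 2, 0, 1, 0]

suggestions = same_cluster(shops, clusters, "shop_a")
print(suggestions)
```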
![Page 12: Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks at H2OWorld2014](https://reader034.fdocuments.in/reader034/viewer/2022052316/55a200e11a28ab911b8b465e/html5/thumbnails/12.jpg)
Example: Bordeaux Wines
Bordeaux is the largest wine growing region in France
700 million bottles of wine (red + white) produced annually
Some years are better than others: Great ($$$) vs. Typical ($)
Last Great Years: 2010, 2009, 2005, 2000
![Page 13: Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks at H2OWorld2014](https://reader034.fdocuments.in/reader034/viewer/2022052316/55a200e11a28ab911b8b465e/html5/thumbnails/13.jpg)
Buying Bordeaux ‘en primeur’
While wine is still barreled, purchasers can ‘invest’ in the wine before bottling and official public release
Advantage: Wines may be considerably cheaper during ‘en primeur’ period than official release.
Great Years: 2000, ’05, ’09
![Page 14: Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks at H2OWorld2014](https://reader034.fdocuments.in/reader034/viewer/2022052316/55a200e11a28ab911b8b465e/html5/thumbnails/14.jpg)
Red Obsession Trailer
[Video: 3-minute movie trailer for Red Obsession, shown during the talk; not included with the slides due to file size.]
![Page 15: Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks at H2OWorld2014](https://reader034.fdocuments.in/reader034/viewer/2022052316/55a200e11a28ab911b8b465e/html5/thumbnails/15.jpg)
Great Vintage vs. Typical Vintage
Question: Can we study the weather patterns in Bordeaux leading up to harvest to identify ‘anomalous’ weather years that correlate with Great Vintage vs. Typical Vintage?
The Bordeaux Dataset (1952 – 2014) : Yearly data that measures:
Winter Rain (October – March of harvest year)
Average Summer Temp (April – September of harvest year)
Harvest Rain (August – September of harvest year)
![Page 16: Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks at H2OWorld2014](https://reader034.fdocuments.in/reader034/viewer/2022052316/55a200e11a28ab911b8b465e/html5/thumbnails/16.jpg)
Autoencoder + Anomaly Detection
In Steps:
Train an autoencoder model to learn Typical Vintage year weather patterns
Append Great Vintage year weather data to original dataset.
If the Great Vintage year weather data does not match the learned weather pattern, the autoencoder will produce a high reconstruction error (MSE).
‘en primeur’ of ‘en primeur’: Can we use weather patterns to identify anomalous years which may be indicative of Great Vintage quality?
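The steps above can be illustrated with a deliberately tiny stand-in. This sketch swaps the H2O autoencoder for a tied-weight linear autoencoder with one hidden unit, written in plain Python, so it shows the idea rather than the talk's actual model. The rows are synthetic standardized values, not the Bordeaux data; `year_X` and `year_Y` are hypothetical labels; and the 0.25 cutoff mirrors the MSE > 0.25 threshold named on the results slide.

```python
def train_autoencoder(rows, lr=0.02, epochs=200):
    """Tied-weight linear autoencoder with one hidden unit:
    encode h = w . x, decode x_hat = w * h, trained by SGD on squared error."""
    n = len(rows[0])
    w = [1.0 / (i + 2) for i in range(n)]  # fixed start keeps the run deterministic
    for _ in range(epochs):
        for x in rows:
            h = sum(wi * xi for wi, xi in zip(w, x))       # encode
            r = [xi - wi * h for wi, xi in zip(w, x)]      # residual x - x_hat
            wr = sum(wi * ri for wi, ri in zip(w, r))
            # Gradient-descent step on the squared reconstruction error.
            w = [wi + 2 * lr * (h * ri + wr * xi) for wi, ri, xi in zip(w, r, x)]
    return w

def reconstruction_mse(w, x):
    """Mean squared error between x and its reconstruction."""
    h = sum(wi * xi for wi, xi in zip(w, x))
    return sum((xi - wi * h) ** 2 for wi, xi in zip(w, x)) / len(x)

# Synthetic "typical" rows: three standardized features that move together.
typical = [[1.0, 0.9, 1.1], [-1.0, -1.1, -0.9], [0.5, 0.6, 0.4], [-0.5, -0.4, -0.6]]
w = train_autoencoder(typical)

# Hypothetical candidate years: one breaks the learned pattern, one follows it.
candidates = {"year_X": [1.5, -1.5, 0.0], "year_Y": [0.8, 0.9, 0.7]}
threshold = 0.25
flagged = [year for year, x in candidates.items()
           if reconstruction_mse(w, x) > threshold]
print(flagged)
```

Because the model only ever sees rows where the three features rise and fall together, a row that breaks that pattern cannot be rebuilt from the single hidden value and its reconstruction error stands out.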
![Page 17: Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks at H2OWorld2014](https://reader034.fdocuments.in/reader034/viewer/2022052316/55a200e11a28ab911b8b465e/html5/thumbnails/17.jpg)
Autoencoder Results (MSE > 0.25)
[Figure: years with reconstruction error above the threshold; y-axis: Mean Square Error]
Labeled years: 1961 V, 1989 V, 1990 V, 2000 V, 2003 NV*, 2005 NV, 2009 V, 2010 V, 2011 NV*
![Page 18: Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks at H2OWorld2014](https://reader034.fdocuments.in/reader034/viewer/2022052316/55a200e11a28ab911b8b465e/html5/thumbnails/18.jpg)
2014 Bordeaux? You Decide!
[Figure labels: 2014 ??, 2013 NV]
![Page 19: Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks at H2OWorld2014](https://reader034.fdocuments.in/reader034/viewer/2022052316/55a200e11a28ab911b8b465e/html5/thumbnails/19.jpg)
Thank You!
What single malt whiskeys do you like?
Our GitHub repository links to the original whiskey dataset and the PCA + K-Means cluster assignments
Add to your Netflix: Red Obsession (2013), Somm (2012)
github.com/LenaTash/RH_MachineLearning
All work in this presentation was done using H2O (Thanks, Sri!)
Questions + Comments?