QCon London H2O Driverless AI
An AI that creates AI!
Marios Michailidis
Background
• Competitive data scientist at H2O.ai
• PhD in ensemble methods at UCL
• Former Kaggle #1 – 120+ competitions
[Diagram: Data Integration + Data Quality and Transformation → Modeling Table (Features, Target) → Model Building → Model]
Challenges in the Machine Learning Workflow
Weeks or even Months per Model Optimization
Highly Iterative Process
• Insight – Visualization
• Cross Validation
• Feature Engineering
• Model Selection
• Hyper Parameter Optimization
• Feature Selection
• Ensemble
• Understanding/Interpreting the results
• Deploy/Productionize
Confidential4
Visualizations
• Outlier detection
• Correlation graph
• Heat map
Animal  L.Encoding  Frequency  Is_Dog?  Is_Cat?  Is_Fish?  Cost  Avg. cost
Dog     2           3          1        0        0         100   108.33
Dog     2           3          1        0        0         150   108.33
Cat     1           4          0        1        0         300   293.75
Fish    3           3          0        0        1         50    38.33
Cat     1           4          0        1        0         250   293.75
Dog     2           3          1        0        0         75    108.33
Fish    3           3          0        0        1         25    38.33
Cat     1           4          0        1        0         400   293.75
Cat     1           4          0        1        0         225   293.75
Fish    3           3          0        0        1         40    38.33

Animal  Avg. cost
Cat     293.75
Dog     108.33
Fish    38.33
Feature engineering: Categorical features
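The four encodings on this slide can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the variable names are illustrative and not taken from any H2O API. The alphabetical label order (Cat=1, Dog=2, Fish=3) is an assumption that happens to match the slide.

```python
from collections import Counter, defaultdict

animals = ["Dog", "Dog", "Cat", "Fish", "Cat", "Dog", "Fish", "Cat", "Cat", "Fish"]
cost = [100, 150, 300, 50, 250, 75, 25, 400, 225, 40]

# Label encoding: assumed alphabetical order, giving Cat=1, Dog=2, Fish=3.
labels = {name: i for i, name in enumerate(sorted(set(animals)), start=1)}
label_encoded = [labels[a] for a in animals]

# Frequency encoding: replace each category by how often it occurs.
counts = Counter(animals)
frequency_encoded = [counts[a] for a in animals]

# One-hot encoding: one 0/1 column per category (Is_Dog?, Is_Cat?, Is_Fish?).
categories = ["Dog", "Cat", "Fish"]
one_hot = [[int(a == c) for c in categories] for a in animals]

# Target (mean) encoding: replace each category by its average cost.
totals = defaultdict(float)
for a, c in zip(animals, cost):
    totals[a] += c
avg_cost = {a: totals[a] / counts[a] for a in counts}
target_encoded = [round(avg_cost[a], 2) for a in animals]
```

In practice the target encoding is usually computed out-of-fold so that a row's own target value does not leak into its feature.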
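Bins: 40-51, 52-56, 57-77, 78+ (plus a "missing" bin when a value is absent)

Age  Binned  Age (with missing)  Mean-imputed  Binned (with missing)
40   40-51   40                  40            40-51
45   40-51   45                  45            40-51
45   40-51   45                  45            40-51
45   40-51   45                  45            40-51
49   40-51   missing             61.14         missing
51   40-51   51                  51            40-51
56   52-56   56                  56            52-56
61   57-77   61                  61            57-77
62   57-77   62                  62            57-77
71   57-77   71                  71            57-77
73   57-77   73                  73            57-77
74   57-77   74                  74            57-77
77   57-77   77                  77            57-77
78   78+     78                  78            78+
78   78+     78                  78            78+

Mean=61.14 (mean of the 14 observed ages, used to fill the missing value)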
Feature engineering: Numerical features
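Binning and missing-value handling for the Age column could look like the following. This is a sketch in plain Python; `bin_age` is a hypothetical helper, and the bin edges are read off the slide.

```python
# Age column from the slide, with the fifth value missing.
ages = [40, 45, 45, 45, None, 51, 56, 61, 62, 71, 73, 74, 77, 78, 78]

def bin_age(age):
    """Map an age to the slide's bins; missing values get their own bin."""
    if age is None:
        return "missing"
    if age <= 51:
        return "40-51"
    if age <= 56:
        return "52-56"
    if age <= 77:
        return "57-77"
    return "78+"

# Option 1: treat "missing" as a category of its own.
binned = [bin_age(a) for a in ages]

# Option 2: fill the gap with the mean of the observed values.
observed = [a for a in ages if a is not None]
mean_age = round(sum(observed) / len(observed), 2)
imputed = [mean_age if a is None else a for a in ages]
```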
Age  Income  Age * Income  Age + Income
40   40      1600          80
45   14      630           59
45   100     4500          145
45   33      1485          78
49   20      980           69
51   44      2244          95
56   65      3640          121
61   80      4880          141
62   55      3410          117
71   15      1065          86
73   18      1314          91
74   33      2442          107
77   32      2464          109
78   48      3744          126
78   41      3198          119
Animal  color  Animal_color
Dog     brown  Dog_brown
Dog     white  Dog_white
Cat     black  Cat_black
Fish    gold   Fish_gold
Cat     mixed  Cat_mixed
Dog     black  Dog_black
Fish    white  Fish_white
Cat     white  Cat_white
Cat     black  Cat_black
Fish    brown  Fish_brown

Age  Animal  Derived (average age per animal)
13   Dog     11.67
12   Dog     11.67
15   Cat     16
3    Fish    3.67
16   Cat     16
10   Dog     11.67
6    Fish    3.67
20   Cat     16
13   Cat     16
2    Fish    3.67

Animal  Age
Dog     11.67
Cat     16
Fish    3.67
Feature engineering: Interactions
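The three kinds of interaction shown here (numeric × numeric, categorical × categorical, numeric grouped by categorical) can be sketched as follows. This is plain Python, and all variable names are illustrative.

```python
from collections import defaultdict

# Numeric x numeric: product and sum of two columns.
age = [40, 45, 45, 45, 49, 51, 56, 61, 62, 71, 73, 74, 77, 78, 78]
income = [40, 14, 100, 33, 20, 44, 65, 80, 55, 15, 18, 33, 32, 48, 41]
age_times_income = [a * i for a, i in zip(age, income)]
age_plus_income = [a + i for a, i in zip(age, income)]

# Categorical x categorical: concatenate two columns into one new category.
animal = ["Dog", "Dog", "Cat", "Fish", "Cat", "Dog", "Fish", "Cat", "Cat", "Fish"]
color = ["brown", "white", "black", "gold", "mixed", "black", "white", "white", "black", "brown"]
animal_color = [f"{a}_{c}" for a, c in zip(animal, color)]

# Numeric x categorical: average of a numeric column within each category.
pet_age = [13, 12, 15, 3, 16, 10, 6, 20, 13, 2]
groups = defaultdict(list)
for a, years in zip(animal, pet_age):
    groups[a].append(years)
mean_age = {a: sum(v) / len(v) for a, v in groups.items()}
derived = [round(mean_age[a], 2) for a in animal]
```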
Text: "This dog is cute! I love dogs"

Bag of words:
dog  this  is  jedi  love  cute  emperor  bingo  !  I
2    1     1   0     1     1     0        0      1  1
• SVD
• Word2vec

word  f1    f2   f3   f4    f5
dog   0.8   0.2  0.1  -0.1  -0.4
this  -0.2  0    0.1  0.8   0.3
love  0.1   0.2  0.7  -0.5  0.1
cute  0.2   0.2  0.8  -0.4  0
King – Man + Woman = Queen
Feature engineering: Text
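A bag-of-words count like the one on this slide can be sketched in plain Python. The fixed vocabulary and the tiny `STEMS` lookup (which maps "dogs" to "dog" so the count for "dog" is 2) are assumptions made for illustration. A real pipeline would use a proper tokenizer and stemmer.

```python
import re
from collections import Counter

# Fixed vocabulary from the slide, lowercased so "I" becomes "i".
VOCABULARY = ["dog", "this", "is", "jedi", "love", "cute", "emperor", "bingo", "!", "i"]

# Hypothetical stand-in for a stemmer: only the one plural needed here.
STEMS = {"dogs": "dog"}

def bag_of_words(text):
    """Count how often each vocabulary term occurs in the text."""
    tokens = re.findall(r"[a-z]+|!", text.lower())
    counts = Counter(STEMS.get(t, t) for t in tokens)
    return [counts[term] for term in VOCABULARY]

vector = bag_of_words("This dog is cute! I love dogs")
```

Words outside the vocabulary are ignored, and vocabulary terms that do not occur ("jedi", "emperor", "bingo") get a count of 0.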
Date       Day  Month  Year  Weekday  Weeknum
1/1/2018   1    1      2018  2        1
2/1/2018   2    1      2018  3        1
3/1/2018   3    1      2018  4        1
4/1/2018   4    1      2018  5        1
5/1/2018   5    1      2018  6        1
6/1/2018   6    1      2018  7        1
7/1/2018   7    1      2018  1        2
8/1/2018   8    1      2018  2        2
9/1/2018   9    1      2018  3        2
10/1/2018  10   1      2018  4        2

Date       Sales  Lag1  Lag2  MA2
1/1/2018   100    -     -     -
2/1/2018   150    100   -     100
3/1/2018   160    150   100   125
4/1/2018   200    160   150   155
5/1/2018   210    200   160   180
6/1/2018   150    210   200   205
7/1/2018   160    150   210   180
8/1/2018   120    160   150   155
9/1/2018   80     120   160   140
10/1/2018  70     80    120   100
Feature engineering: Time series
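The date parts, lags and moving average above can be derived as in this sketch. It assumes the dates are day/month/year, that weekdays are numbered Sunday=1 to Saturday=7, and that weeks start on Sunday, because those conventions reproduce the slide's numbers. MA2 is taken as the mean of the previous two sales values (Lag1 and Lag2), using whichever of them exist.

```python
from datetime import date

dates = [date(2018, 1, d) for d in range(1, 11)]
sales = [100, 150, 160, 200, 210, 150, 160, 120, 80, 70]

# Calendar features.
day = [d.day for d in dates]
month = [d.month for d in dates]
year = [d.year for d in dates]
weekday = [d.isoweekday() % 7 + 1 for d in dates]          # Sunday=1 ... Saturday=7
week_number = [int(d.strftime("%U")) + 1 for d in dates]   # Sunday-based weeks

# Lag features: the value one and two steps in the past.
lag1 = [None] + sales[:-1]
lag2 = [None, None] + sales[:-2]

def mean_of_known(values):
    """Average the values that exist; None if nothing is known yet."""
    known = [v for v in values if v is not None]
    return sum(known) / len(known) if known else None

# Moving average over the two previous observations.
ma2 = [mean_of_known(pair) for pair in zip(lag1, lag2)]
```

Building the moving average from lagged values only keeps the current row's sales figure out of its own features.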
Common open source packages used in ML
• LIBLINEAR
• LIBFFM
• LIBSVM
• Optimizing maximum depth
• Tree expansion (loss vs depth)
• Learning rate
• Number of trees

[Figure: tree diagram with nodes numbered 1, 2, 3, 3, 2]
Hyper parameter tuning
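A tuning loop over the parameters listed on this slide can be sketched as an exhaustive grid search. Everything here is illustrative: the parameter names and candidate values are assumptions, and `toy_score` is a made-up stand-in for a real cross-validated score from a gradient boosting library.

```python
import itertools

# Hypothetical search space for a gradient-boosted tree model.
search_space = {
    "max_depth": [3, 6, 9],
    "learning_rate": [0.01, 0.1],
    "n_trees": [100, 500],
}

def grid(space):
    """Yield every combination of parameter values as a dict."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

def tune(space, evaluate):
    """Return the parameter combination with the highest score."""
    return max(grid(space), key=evaluate)

def toy_score(params):
    """Stand-in for a validation score; peaks at depth 6, rate 0.1, 500 trees."""
    return (-abs(params["max_depth"] - 6)
            - abs(params["learning_rate"] - 0.1)
            - abs(params["n_trees"] - 500) / 1000)

best = tune(search_space, toy_score)
```

With a real model, `evaluate` would train with the given parameters and return a validation metric. Random or guided search is often used instead of a full grid when the space is large.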
Copyright 2018 H2O.ai Inc. All Rights Reserved.
Validation approach
[Diagram labels: Train data, Past, Future]

K=4: the train data is split into 4 folds. For each fold, train on the other three folds and predict the held-out fold. The pred column starts at 0.00 and is filled in one fold at a time.

x0    x1    x2    x3    y  Fold  pred
0.94  0.27  0.80  0.34  1  1     0.96
0.02  0.22  0.17  0.84  0  1     0.03
0.83  0.11  0.23  0.42  1  2     0.90
0.74  0.26  0.03  0.41  0  2     0.12
0.08  0.29  0.76  0.37  0  3     0.03
0.71  0.76  0.43  0.95  1  3     0.77
0.08  0.72  0.97  0.04  0  4     0.18
0.84  0.79  0.89  0.05  1  4     0.91
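The fold-by-fold filling of the pred column can be sketched as an out-of-fold prediction loop. The 1-nearest-neighbour model (`fit_1nn`) is only a stand-in chosen to keep the example dependency-free, so its predictions will not match the slide's numbers. Contiguous folds of equal size are assumed.

```python
import math

X = [
    [0.94, 0.27, 0.80, 0.34], [0.02, 0.22, 0.17, 0.84],
    [0.83, 0.11, 0.23, 0.42], [0.74, 0.26, 0.03, 0.41],
    [0.08, 0.29, 0.76, 0.37], [0.71, 0.76, 0.43, 0.95],
    [0.08, 0.72, 0.97, 0.04], [0.84, 0.79, 0.89, 0.05],
]
y = [1, 0, 1, 0, 0, 1, 0, 1]

def fit_1nn(train_X, train_y):
    """Toy model: predict the label of the closest training row."""
    def predict(row):
        nearest = min(range(len(train_X)), key=lambda i: math.dist(row, train_X[i]))
        return float(train_y[nearest])
    return predict

def out_of_fold_predictions(X, y, k, fit):
    """Predict every row with a model that never saw that row in training."""
    n = len(X)
    fold_size = n // k                      # assumes n is divisible by k
    oof = [0.0] * n                         # starts at 0.00, filled fold by fold
    for fold in range(k):
        held_out = set(range(fold * fold_size, (fold + 1) * fold_size))
        train_idx = [i for i in range(n) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in held_out:
            oof[i] = model(X[i])
    return oof

oof = out_of_fold_predictions(X, y, k=4, fit=fit_1nn)
```

For time-ordered data the folds would instead be cut along the time axis, so that each model is trained on the past and validated on the future.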
Genetic algorithm approach
x1    x2    x3    x4    y
0.14  0.69  0.01  0.71  1
0.22  0.44  0.45  0.69  1
0.12  0.35  0.51  0.23  0
0.22  0.42  0.79  0.60  1
0.93  0.82  0.72  0.50  1
0.32  0.58  0.28  0.22  0
0.95  0.59  0.68  0.09  1
0.34  0.58  0.35  0.81  0
0.05  0.80  0.28  0.86  1
0.23  0.49  0.63  0.03  0
0.05  0.34  0.53  0.73  1
0.74  0.02  0.33  0.56  0

Iteration 1/10: X% accuracy
Feature  Importance
x4       1
x2       0.5
x3       0.2
x1       0.01

Tuned

Iteration 2/10 (+ Random Transformation: sqrt(x2)): Z% accuracy
Feature   Importance
x2+x4     1
x4        0.4
sqrt(x2)  0.3
x3        0.2
x2        0.05
Feature ranking based on permutations
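Train (fit):
0.5   0.52  0.69  0.2   1
0.18  0.94  0.57  0.68  0
0.67  0.6   0.76  0.91  1
0.25  0.68  0.57  0.07  1
0.83  0.07  0.76  0.56  0
0.49  0.92  0.64  0.02  0

Validation (score), with one feature column shown before and after shuffling:
Other features    Original  Shuffled  y
0.17  0.7   0.57  0.41      0.02      1
0.96  0.75  0.59  0.56      0.56      1
0.96  0.14  0.09  0.21      0.21      0
0.61  0.38  0.07  0.61      0.41      0
0.07  0.13  0.87  0.26      0.61      1
0.66  0.27  0.48  0.02      0.26      1

• Fit on train and score on validation: 80% accuracy
• Shuffle one feature column in validation and score again: 70% accuracy (drop)
• The 10-point drop in accuracy is how important that feature is
• Bring the column back to normal
• Repeat for all features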
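The shuffle-and-rescore procedure can be sketched as a short function. The rule-based `toy_model` and the small validation set below are invented for illustration (they are not the slide's data). Any fitted model exposed as a `predict(row)` callable would fit the same interface.

```python
import random

def accuracy(predict, X, y):
    """Fraction of rows where the prediction equals the target."""
    return sum(predict(row) == target for row, target in zip(X, y)) / len(y)

def permutation_importance(predict, X_val, y_val, seed=0):
    """Accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X_val, y_val)
    importances = []
    for j in range(len(X_val[0])):
        column = [row[j] for row in X_val]
        rng.shuffle(column)
        # Build a shuffled copy; X_val itself is untouched ("back to normal").
        shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X_val, column)]
        importances.append(baseline - accuracy(predict, shuffled, y_val))
    return importances

# Hypothetical fitted model that only looks at the first feature.
def toy_model(row):
    return int(row[0] > 0.5)

X_val = [[0.1, 0.9], [0.8, 0.2], [0.7, 0.7], [0.3, 0.4], [0.9, 0.1], [0.2, 0.6]]
y_val = [0, 1, 1, 0, 1, 0]

importances = permutation_importance(toy_model, X_val, y_val)
```

Because `toy_model` ignores the second feature, shuffling it cannot change the score, so its importance is exactly zero. A single shuffle is noisy on small data, so averaging over several shuffles is a common refinement.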
Stacking
Consider datasets A, B, C. Target variable (y) is known for A and B.

A
x0    x1    x2    xn    y
0.17  0.25  0.93  0.79  1
0.35  0.61  0.93  0.57  0
0.44  0.59  0.56  0.46  0
0.37  0.43  0.74  0.28  1
0.96  0.07  0.57  0.01  1

B
x0    x1    x2    xn    y
0.89  0.72  0.50  0.66  0
0.58  0.71  0.92  0.27  1
0.10  0.35  0.27  0.37  0
0.47  0.68  0.30  0.98  0
0.39  0.53  0.59  0.18  1

C
x0    x1    x2    xn    y
0.29  0.77  0.05  0.09  ?
0.38  0.66  0.42  0.91  ?
0.72  0.66  0.92  0.11  ?
0.70  0.37  0.91  0.17  ?
0.59  0.98  0.93  0.65  ?

• Train algorithm 0 on A and make predictions for B and C and save to B1, C1
• Train algorithm 1 on A and make predictions for B and C and save to B1, C1
• Train algorithm 2 on A and make predictions for B and C and save to B1, C1
• Train algorithm 3 on B1 and make predictions for C1

B1
pred0  pred1  pred2  y
0.24   0.72   0.70   0
0.95   0.25   0.22   1
0.64   0.80   0.96   0
0.89   0.58   0.52   0
0.11   0.20   0.93   1

C1
pred0  pred1  pred2  y
0.50   0.50   0.39   ?
0.62   0.59   0.46   ?
0.22   0.31   0.54   ?
0.90   0.47   0.09   ?
0.20   0.09   0.61   ?

Preds3
0.45
0.23
0.99
0.34
0.05
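The A/B/C procedure can be sketched end to end with toy learners. The two algorithms used here (a nearest-centroid scorer and k-nearest neighbours) are stand-ins chosen so the example needs only the standard library. They are not the algorithms behind the slide's numbers, so the outputs will differ.

```python
import math

X_a = [[0.17, 0.25, 0.93, 0.79], [0.35, 0.61, 0.93, 0.57], [0.44, 0.59, 0.56, 0.46],
       [0.37, 0.43, 0.74, 0.28], [0.96, 0.07, 0.57, 0.01]]
y_a = [1, 0, 0, 1, 1]
X_b = [[0.89, 0.72, 0.50, 0.66], [0.58, 0.71, 0.92, 0.27], [0.10, 0.35, 0.27, 0.37],
       [0.47, 0.68, 0.30, 0.98], [0.39, 0.53, 0.59, 0.18]]
y_b = [0, 1, 0, 0, 1]
X_c = [[0.29, 0.77, 0.05, 0.09], [0.38, 0.66, 0.42, 0.91], [0.72, 0.66, 0.92, 0.11],
       [0.70, 0.37, 0.91, 0.17], [0.59, 0.98, 0.93, 0.65]]

def fit_centroid(X, y):
    """Score in [0, 1]: how much closer a row is to the class-1 centroid."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0 = centroid([r for r, t in zip(X, y) if t == 0])
    c1 = centroid([r for r, t in zip(X, y) if t == 1])
    def predict(row):
        d0, d1 = math.dist(row, c0), math.dist(row, c1)
        return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5
    return predict

def fit_knn(k):
    """Return a fit function for k-nearest neighbours (mean label of neighbours)."""
    def fit(X, y):
        def predict(row):
            nearest = sorted(zip(X, y), key=lambda pair: math.dist(row, pair[0]))[:k]
            return sum(label for _, label in nearest) / k
        return predict
    return fit

# Level 1: train algorithms 0..2 on A, predict B and C, save to B1 and C1.
base_algorithms = [fit_centroid, fit_knn(1), fit_knn(3)]
models = [fit(X_a, y_a) for fit in base_algorithms]
B1 = [[model(row) for model in models] for row in X_b]
C1 = [[model(row) for model in models] for row in X_c]

# Level 2: train algorithm 3 on B1 (y is known for B), predict C1.
meta_model = fit_centroid(B1, y_b)
preds3 = [meta_model(row) for row in C1]
```

The meta model only ever sees predictions made on data the base models were not trained on, which is what keeps the second level from overfitting to the first.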
“The ability to explain or to present in understandable terms to a human.”
– Finale Doshi-Velez and Been Kim. “Towards a rigorous science of interpretable machine learning.” arXiv preprint. 2017. https://arxiv.org/pdf/1702.08608.pdf
FAT*: https://www.fatml.org/resources/principles-for-accountable-algorithms
XAI: https://www.darpa.mil/program/explainable-artificial-intelligence
Machine Learning Interpretability?
Interpretability and accuracy trade-off
Linear Models
Exact explanations for approximate models.
Machine Learning
Approximate explanations for exact models.
• Decision Tree Surrogate Models
• Partial Dependence and Individual Conditional Expectation
https://github.com/jphall663/interpretable_machine_learning_with_python
Machine Learning Interpretability examples
H2O.ai Product Suite

Open Source
• In-memory, distributed machine learning algorithms with H2O Flow GUI
• H2O AI open source engine integration with Spark
• Lightning fast machine learning on GPUs
• 100% open source – Apache V2 licensed
• Built for data scientists – interface using R, Python on H2O Flow (interactive notebook interface)
• Enterprise support subscriptions

Automatic feature engineering, machine learning and interpretability
• Enterprise software
• Built for domain users, analysts and data scientists – GUI-based interface for end-to-end data science
• Fully automated machine learning from ingest to deployment
H2O is the open source leader in AI.
Democratize AI for Everyone.
AI for good.
Inputs
• Some input data
• A target variable, e.g. "Will there be a default?"
• An objective (or a success metric): AUC, Accuracy, Precision…
• Some allocated resources: time, hardware

Outputs
• Insight and visualization
• Feature Engineering
• Predictions
• Model Interpretability
H2O Driverless AI process
Example input data:
x1    x2    x3    x4    y
0.14  0.69  0.01  0.71  1
0.22  0.44  0.45  0.69  1
0.12  0.35  0.51  0.23  0
0.22  0.42  0.79  0.60  1
0.93  0.82  0.72  0.50  1
0.32  0.58  0.28  0.22  0
0.95  0.59  0.68  0.09  1
0.34  0.58  0.35  0.81  0
0.05  0.80  0.28  0.86  1
0.23  0.49  0.63  0.03  0
0.05  0.34  0.53  0.73  1
0.74  0.02  0.33  0.56  0
Scoring Pipeline
• Python
• MOJO (Java)
Demo
Thank you
[email protected]/mariosmichailidis/
@StackNet_
https://www.facebook.com/StackNet/