Automatic Generation of Neural Network Architectures Using a Genetic Algorithm
Wolf-Guido Bolick
Gießen, xx.xx.2017
Why should one use predictive models?

Typical tasks:
• (Off-)target prediction
• Virtual screening
• Creation of focused libraries

Prediction power and performance keep increasing thanks to:
• Improved methodologies
• New approaches
• Moore's law

Available data sources:
• In-house data accumulates
• Publicly available data: ChEMBL, ChEBI, …

Time needed: a prediction takes seconds, whereas an experiment needs weeks.
In silico Prediction

The observed performance (accuracy, kappa, …) of a predictor depends on:
• Training data
• Test data
• Preprocessing of the data (e.g. type/length of fingerprints, substructures, …)
• Methodology: SVM, Random Forest, …, Neural Networks
What is a Neural Network (NN)?

Activation functions transform input values into output values for each neuron.
Millions of unique combinations are possible.
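To make this concrete, here is a minimal sketch (my own illustration, not code from the talk) of a single neuron applying an activation function to its weighted inputs:

```python
import numpy as np

def sigmoid(x):
    """Squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def neuron(inputs, weights, bias, activation=sigmoid):
    """One neuron: weighted sum of its inputs, then an activation function."""
    return activation(np.dot(inputs, weights) + bias)

# Three input values flowing into a single neuron:
x = np.array([0.2, -0.5, 1.0])
w = np.array([0.4, 0.1, -0.7])
print(neuron(x, w, bias=0.1))
```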
NN Architectures & Hyperparameters

NN architecture:
• Layer type
• Number of layers
• Neurons per layer
• Activation functions

Training parameters:
• Optimizer
• Learning rate
• Weight decay
• Batch size
• Loss function
• …

Together, architecture and training parameters form the hyperparameters.
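The optimizer and loss names appearing later in the deck (sgd, adam, mse, …) match the Keras library, so one plausible way to turn such a hyperparameter set into a trainable network is sketched below; the dictionary layout and helper are my own assumptions, not the author's code:

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

def build_model(params, input_dim):
    """Builds a binary classifier whose hidden layers are fully
    described by the hyperparameter set `params`."""
    model = Sequential()
    model.add(Dense(params["layers"][0][1], activation=params["layers"][0][2],
                    input_dim=input_dim))
    for kind, value, *rest in params["layers"][1:]:
        if kind == "Dense":
            model.add(Dense(value, activation=rest[0]))
        else:  # "Dropout"
            model.add(Dropout(value))
    model.add(Dense(1, activation="sigmoid"))  # output: active (1) / inactive (0)
    model.compile(optimizer=params["optimizer"], loss=params["loss"])
    return model

# A hypothetical entity; the layout of this dictionary is an assumption.
params = {
    "layers": [("Dense", 128, "relu"), ("Dropout", 0.25), ("Dense", 64, "tanh")],
    "optimizer": "adam",
    "loss": "mse",
}
model = build_model(params, input_dim=1024)  # 1024-bit fingerprint as input
```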
Optimization of Hyperparameters

• Experts: hyperparameters derived from literature & experience
• Lucky people: hyperparameter search within promising parameter regions
• Everyone:
  • Random search (Bergstra et al. 2012)
  • Grid search (Larochelle et al. 2007)
  • Probability-based algorithms (Brochu et al. 2010; Bergstra et al. 2011)
  • Directed random search (e.g. genetic algorithms)
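As a concrete illustration of the "everyone" approaches, random search simply samples every hyperparameter independently and keeps the best candidate; a minimal sketch with a placeholder search space and scoring function:

```python
import random

# Hypothetical search space; the values are placeholders for illustration.
space = {
    "learning_rate": [0.05, 0.1, 0.5, 1.0],
    "batch_size": [16, 32, 64],
    "activation": ["relu", "tanh", "sigmoid"],
}

def random_search(evaluate, n_trials=50):
    """Samples every hyperparameter independently and keeps the best trial."""
    best, best_score = None, float("-inf")
    for _ in range(n_trials):
        candidate = {k: random.choice(v) for k, v in space.items()}
        score = evaluate(candidate)  # e.g. validation kappa of a trained model
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```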
What is a Genetic Algorithm?
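In code, the classic GA cycle of evaluating, selecting, crossing over, and mutating entities could look like the following sketch (using the drop-worst-50% strategy mentioned later in the deck; all names are illustrative):

```python
import random

def evolve(population, evaluate, space, generations=50, mutation_rate=0.05):
    """Generic GA cycle: score all entities, drop the worst half,
    then refill the population with mutated crossovers of the survivors."""
    for _ in range(generations):
        ranked = sorted(population, key=evaluate, reverse=True)
        survivors = ranked[: len(ranked) // 2]                 # drop-worst-50%
        children = []
        while len(survivors) + len(children) < len(population):
            a, b = random.sample(survivors, 2)
            child = {k: random.choice((a[k], b[k])) for k in a}  # crossover
            for gene, options in space.items():                  # mutation
                if random.random() < mutation_rate:
                    child[gene] = random.choice(options)
            children.append(child)
        population = survivors + children
    return max(population, key=evaluate)
```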
Validation Strategies

Goals:
• Use as much data as possible for training
• Get a realistic picture of the performance

5-fold cross-validation:
• Every compound is represented in 4 of 5 models
• Hyperparameter optimization increases the performance on the validation sets
• Is the resulting performance still trustworthy?

5-fold nested cross-validation (25 models):
• Every compound is represented in 16 of 25 models
• Increased computational requirements
• 5 separate hyperparameter optimizations increase the performance on the validation sets
• Final performances are evaluated on the corresponding outer-loop test sets
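A sketch of such a 5-fold nested cross-validation using scikit-learn's KFold for the index splitting; train_model and score are hypothetical helpers standing in for the actual training pipeline:

```python
from sklearn.model_selection import KFold

def nested_cv(X, y, train_model, score, k=5):
    """5-fold nested CV: 5 outer x 5 inner = 25 models. Each compound sits
    in 4 of 5 outer training folds and, within each of those, in 4 of 5
    inner training folds, so it contributes to 16 of the 25 models."""
    outer = KFold(n_splits=k, shuffle=True, random_state=0)
    models, outer_scores = [], []
    for train_idx, test_idx in outer.split(X):
        inner = KFold(n_splits=k, shuffle=True, random_state=1)
        for fit_idx, val_idx in inner.split(train_idx):
            # Hyperparameters are optimized against the inner validation fold;
            # train_model and score are hypothetical helpers for this sketch.
            model = train_model(X[train_idx[fit_idx]], y[train_idx[fit_idx]],
                                X[train_idx[val_idx]], y[train_idx[val_idx]])
            models.append(model)
            # The final, trustworthy performance comes from the outer test fold.
            outer_scores.append(score(model, X[test_idx], y[test_idx]))
    return models, outer_scores
```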
Training of a NN

1. Get a job (a hyperparameter set) from the job server
2. Repeat for all training/test sets:
   2.1 Build a NN based on the hyperparameters
   2.2 Train the NN using a training set; a balanced-batch generator maintains the same active/inactive ratio within each batch, and training stops early when the mean validation loss of a sliding window (15 epochs) does not improve for 100 epochs
   2.3 Evaluate the best state (center of the best window) on the validation set, using Cohen's kappa as the metric

Cohen's kappa relates the observed agreement between labels and predictions to the agreement expected from two random observers: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the chance agreement.
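The balanced-batch generator could be implemented along these lines (a minimal sketch under my own assumptions, not the author's implementation):

```python
import numpy as np

def balanced_batches(X, y, batch_size):
    """Endlessly yields batches that preserve the dataset's overall
    active/inactive ratio by sampling each class proportionally."""
    actives = np.where(y == 1)[0]
    inactives = np.where(y == 0)[0]
    n_act = max(1, round(batch_size * len(actives) / len(y)))
    while True:  # Keras' fit_generator consumes an endless generator
        a = np.random.choice(actives, n_act, replace=False)
        b = np.random.choice(inactives, batch_size - n_act, replace=False)
        idx = np.concatenate([a, b])
        np.random.shuffle(idx)
        yield X[idx], y[idx]
```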
So many parameters…

Genetic algorithm:
• Population size: 100
• Workers: 10
• Fingerprint size: 1024
• SMARTS patterns: 826
• Evolution strategy: drop-worst-50%

Mutation settings:
• Default: mutation rate 5%, mutation strength 1, crossing-over rate 30%
• Increased: mutation rate 10%, mutation strength 2, crossing-over rate 30%

Training:
• Optimizer: sgd, rmsprop, adagrad, adadelta, adam, adamax, nadam
• Loss functions: mae, mse, msle
• Learning rate: 0.05, 0.1, 0.5, 1.0
• Weight decay: 0.0, 1e-7, 5e-7
• Momentum: 0.0, 0.1, …, 0.9
• Nesterov: 0, 1
• Batch size: 5%, 6%, …, 20%

Architecture:
• Layers: 1-4
• Layer types: Dense, Dropout
• Neurons: 32, 64, …, 512
• Dropout ratio: 5%, 10%, …, 90%
• Activation functions: linear, sigmoid, hard-sigmoid, softmax, relu, tanh
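Written out as code, this search space is simply a set of option lists for the GA to sample from; the values below are copied from the slide, while the dictionary layout and the step sizes noted in the comments are assumptions:

```python
# Values are taken from the slide; the dictionary layout and the step sizes
# noted in the comments are assumptions for illustration.
SEARCH_SPACE = {
    "optimizer": ["sgd", "rmsprop", "adagrad", "adadelta", "adam", "adamax", "nadam"],
    "loss": ["mae", "mse", "msle"],
    "learning_rate": [0.05, 0.1, 0.5, 1.0],
    "weight_decay": [0.0, 1e-7, 5e-7],
    "momentum": [round(0.1 * i, 1) for i in range(10)],    # 0.0, 0.1, ..., 0.9
    "nesterov": [0, 1],
    "batch_size": [i / 100 for i in range(5, 21)],         # 5%, 6%, ..., 20% of the data
    "n_layers": [1, 2, 3, 4],
    "layer_type": ["Dense", "Dropout"],
    "neurons": [32 * i for i in range(1, 17)],             # 32, 64, ..., 512 (step of 32 assumed)
    "dropout_ratio": [i / 100 for i in range(5, 95, 5)],   # 5%, 10%, ..., 90%
    "activation": ["linear", "sigmoid", "hard_sigmoid",    # Keras spelling of hard-sigmoid
                   "softmax", "relu", "tanh"],
}
```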
Datasets

              hERG          Micronucleus test
Compounds     6999          798
Actives       3205 (46%)    263 (33%)
Inactives     3794 (54%)    535 (67%)

Binary classification: inactive = 0, active = 1
Found NN-Hyperparameters
Improvement of NNs while running the GA

• The initial population already starts with inner-kappa values of ~0.6 in all splits
• The GA improves the performance of the best entities even further (red line)
• Mutations can produce poorly performing entities (blue line) right up to the last generation
Novelty of Architectures

• The proportion of new entities in the population decreases over the runtime of the GA
• A higher mutation rate (red line) enlarges the space the GA can search
Influence of Hyperparameters

Example axis label: "1_activation (344)", where "1" denotes the first hidden layer, "activation" the activation function of that layer, and "(344)" the number of contributing pairs.

• Contributing pairs differ only in the shown parameter
• The boxplots are based on the absolute difference of the inner-kappa values of each contributing pair
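Expressed as code, collecting the contributing pairs for one parameter could look like this sketch (the entity layout with a params dict and a kappa score is an assumption):

```python
def contributing_pairs(entities, parameter):
    """Collects |inner-kappa difference| for every pair of entities that
    differ ONLY in `parameter`; these values feed one boxplot."""
    diffs = []
    for i, a in enumerate(entities):
        for b in entities[i + 1:]:
            same_elsewhere = all(a["params"][k] == b["params"][k]
                                 for k in a["params"] if k != parameter)
            if same_elsewhere and a["params"][parameter] != b["params"][parameter]:
                diffs.append(abs(a["kappa"] - b["kappa"]))
    return diffs
```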
User-Interface
Conclusion

• Implemented an algorithm that creates a consensus model using 5-fold nested cross-validation (see the sketch after this slide)
• Each compound is represented in 16 of 25 NNs
• A calculation takes 8-14 hours (e.g. over night) on a GTX cluster
• The GA improves the already high kappa values of the NNs even further
• Kappa values of the final NN models are mostly above 0.5 ("moderate" according to Landis & Koch 1977)

Further steps:
• Possibility to use chemical descriptors and multiple fingerprints
• Option to create multi-class models (more classes than just 0 and 1) and regression models
• (Polishing up and writing a paper)
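One common way to form such a consensus model is to average the predictions of the individual networks; whether the author averages or votes is not stated, so this is only a sketch:

```python
import numpy as np

def consensus_predict(models, X, threshold=0.5):
    """Averages the activity probabilities of all networks from the nested
    CV and thresholds the mean to a binary active/inactive call."""
    probs = np.mean([m.predict(X).ravel() for m in models], axis=0)
    return (probs >= threshold).astype(int)
```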
Images designed by Macrovector - Freepik.com
Implementation of the GA
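As a sketch of one piece of such an implementation, mutation with the rate and strength settings from the parameter slide might look like this (mutation strength is interpreted here as the number of steps a value may move within its ordered option list, which is my assumption):

```python
import random

def mutate(entity, space, rate=0.05, strength=1):
    """Mutates each gene with probability `rate`. `strength` is interpreted
    here as how many positions the value may move within its ordered option
    list; the slides do not define it, so this is an assumption."""
    child = dict(entity)
    for gene, options in space.items():
        if random.random() < rate:
            pos = options.index(child[gene])
            step = random.choice([-strength, strength])
            child[gene] = options[max(0, min(len(options) - 1, pos + step))]
    return child
```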