Automatic Generation of Neural Network Architectures Using a Genetic Algorithm
Wolf-Guido Bolick
Gießen, xx.xx.2017
Why should one use predictive models?

Typical tasks:
• (Off-)target prediction
• Virtual screening
• Creation of focused libraries

Prediction power and performance keep increasing thanks to:
• Improved methodologies
• New approaches
• Moore's law

Available data sources:
• In-house data accumulates
• Publicly available data: ChEMBL, ChEBI, …

Time needed: a prediction takes seconds, whereas an experiment needs weeks.
In silico Prediction

The observed performance (accuracy, kappa, …) of a predictor depends on:
• Training data
• Test data
• Preprocessing of the data (e.g. type/length of fingerprints, substructures, …)
• Methodology: SVM, Random Forest, …, Neural Networks
What is a Neural Network (NN)?

Activation functions transform input values into output values for each neuron.
Millions of unique combinations are possible.
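To make this concrete, here is a minimal sketch (my own illustration, not code from the talk) of a single neuron applying an activation function to its weighted inputs:

```python
import numpy as np

def sigmoid(x):
    """Squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def neuron(inputs, weights, bias, activation=sigmoid):
    """One neuron: weighted sum of its inputs, then an activation function."""
    return activation(np.dot(inputs, weights) + bias)

# Three input values flowing into a single neuron:
x = np.array([0.2, -0.5, 1.0])
w = np.array([0.4, 0.1, -0.7])
print(neuron(x, w, bias=0.1))
```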
NN Architectures & Hyperparameters

NN architecture:
• Layer type
• Number of layers
• Neurons per layer
• Activation functions

Training parameters:
• Optimizer
• Learning rate
• Weight decay
• Batch size
• Loss function
• …

Together, architecture and training parameters form the hyperparameters.
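The optimizer and loss names appearing later in the deck (sgd, adam, mse, …) match the Keras library, so one plausible way to turn such a hyperparameter set into a trainable network is sketched below; the dictionary layout and helper are my own assumptions, not the author's code:

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

def build_model(params, input_dim):
    """Builds a binary classifier whose hidden layers are fully
    described by the hyperparameter set `params`."""
    model = Sequential()
    model.add(Dense(params["layers"][0][1], activation=params["layers"][0][2],
                    input_dim=input_dim))
    for kind, value, *rest in params["layers"][1:]:
        if kind == "Dense":
            model.add(Dense(value, activation=rest[0]))
        else:  # "Dropout"
            model.add(Dropout(value))
    model.add(Dense(1, activation="sigmoid"))  # output: active (1) / inactive (0)
    model.compile(optimizer=params["optimizer"], loss=params["loss"])
    return model

# A hypothetical entity; the layout of this dictionary is an assumption.
params = {
    "layers": [("Dense", 128, "relu"), ("Dropout", 0.25), ("Dense", 64, "tanh")],
    "optimizer": "adam",
    "loss": "mse",
}
model = build_model(params, input_dim=1024)  # 1024-bit fingerprint as input
```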
Optimization of Hyperparameters

• Experts: hyperparameters derived from literature & experience
• Lucky people: hyperparameter search within promising parameter regions
• Everyone:
  • Random search (Bergstra et al. 2012)
  • Grid search (Larochelle et al. 2007)
  • Probability-based algorithms (Brochu et al. 2010; Bergstra et al. 2011)
  • Directed random search (e.g. genetic algorithms)
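As a concrete illustration of the "everyone" approaches, random search simply samples every hyperparameter independently and keeps the best candidate; a minimal sketch with a placeholder search space and scoring function:

```python
import random

# Hypothetical search space; the values are placeholders for illustration.
space = {
    "learning_rate": [0.05, 0.1, 0.5, 1.0],
    "batch_size": [16, 32, 64],
    "activation": ["relu", "tanh", "sigmoid"],
}

def random_search(evaluate, n_trials=50):
    """Samples every hyperparameter independently and keeps the best trial."""
    best, best_score = None, float("-inf")
    for _ in range(n_trials):
        candidate = {k: random.choice(v) for k, v in space.items()}
        score = evaluate(candidate)  # e.g. validation kappa of a trained model
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```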
What is a Genetic Algorithm?
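In code, the classic GA cycle of evaluating, selecting, crossing over, and mutating entities could look like the following sketch (using the drop-worst-50% strategy mentioned later in the deck; all names are illustrative):

```python
import random

def evolve(population, evaluate, space, generations=50, mutation_rate=0.05):
    """Generic GA cycle: score all entities, drop the worst half,
    then refill the population with mutated crossovers of the survivors."""
    for _ in range(generations):
        ranked = sorted(population, key=evaluate, reverse=True)
        survivors = ranked[: len(ranked) // 2]                 # drop-worst-50%
        children = []
        while len(survivors) + len(children) < len(population):
            a, b = random.sample(survivors, 2)
            child = {k: random.choice((a[k], b[k])) for k in a}  # crossover
            for gene, options in space.items():                  # mutation
                if random.random() < mutation_rate:
                    child[gene] = random.choice(options)
            children.append(child)
        population = survivors + children
    return max(population, key=evaluate)
```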
Validation Strategies

Goals:
• Use as much data as possible for training
• Get a realistic picture of the performance

5-fold cross-validation:
• Every compound is represented in 4 of 5 models
• Hyperparameter optimization increases the performance on the validation sets
• Is the resulting performance still trustworthy?

5-fold nested cross-validation (25 models):
• Every compound is represented in 16 of 25 models
• Increased computational requirements
• 5 separate hyperparameter optimizations increase the performance on the validation sets
• Final performances are evaluated on the corresponding outer-loop test sets
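A sketch of such a 5-fold nested cross-validation using scikit-learn's KFold for the index splitting; train_model and score are hypothetical helpers standing in for the actual training pipeline:

```python
from sklearn.model_selection import KFold

def nested_cv(X, y, train_model, score, k=5):
    """5-fold nested CV: 5 outer x 5 inner = 25 models. Each compound sits
    in 4 of 5 outer training folds and, within each of those, in 4 of 5
    inner training folds, so it contributes to 16 of the 25 models."""
    outer = KFold(n_splits=k, shuffle=True, random_state=0)
    models, outer_scores = [], []
    for train_idx, test_idx in outer.split(X):
        inner = KFold(n_splits=k, shuffle=True, random_state=1)
        for fit_idx, val_idx in inner.split(train_idx):
            # Hyperparameters are optimized against the inner validation fold;
            # train_model and score are hypothetical helpers for this sketch.
            model = train_model(X[train_idx[fit_idx]], y[train_idx[fit_idx]],
                                X[train_idx[val_idx]], y[train_idx[val_idx]])
            models.append(model)
            # The final, trustworthy performance comes from the outer test fold.
            outer_scores.append(score(model, X[test_idx], y[test_idx]))
    return models, outer_scores
```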
Training of a NN

1. Get a job (a hyperparameter set) from the job server
2. Repeat for all training/test sets:
   2.1 Build a NN based on the hyperparameters
   2.2 Train the NN using a training set; a balanced-batch generator maintains the same active/inactive ratio within each batch, and training stops early when the mean validation loss of a sliding window (15 epochs) does not improve for 100 epochs
   2.3 Evaluate the best state (center of the best window) on the validation set, using Cohen's kappa as the metric

Cohen's kappa relates the observed agreement between labels and predictions to the agreement expected from two random observers: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the chance agreement.
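The balanced-batch generator could be implemented along these lines (a minimal sketch under my own assumptions, not the author's implementation):

```python
import numpy as np

def balanced_batches(X, y, batch_size):
    """Endlessly yields batches that preserve the dataset's overall
    active/inactive ratio by sampling each class proportionally."""
    actives = np.where(y == 1)[0]
    inactives = np.where(y == 0)[0]
    n_act = max(1, round(batch_size * len(actives) / len(y)))
    while True:  # Keras' fit_generator consumes an endless generator
        a = np.random.choice(actives, n_act, replace=False)
        b = np.random.choice(inactives, batch_size - n_act, replace=False)
        idx = np.concatenate([a, b])
        np.random.shuffle(idx)
        yield X[idx], y[idx]
```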
So many parameters…

Genetic algorithm:
• Population size: 100
• Workers: 10
• Fingerprint size: 1024
• SMARTS patterns: 826
• Evolution strategy: drop-worst-50%

Mutation settings:
• Default: mutation rate 5%, mutation strength 1, crossing-over rate 30%
• Increased: mutation rate 10%, mutation strength 2, crossing-over rate 30%

Training:
• Optimizer: sgd, rmsprop, adagrad, adadelta, adam, adamax, nadam
• Loss functions: mae, mse, msle
• Learning rate: 0.05, 0.1, 0.5, 1.0
• Weight decay: 0.0, 1e-7, 5e-7
• Momentum: 0.0, 0.1, …, 0.9
• Nesterov: 0, 1
• Batch size: 5%, 6%, …, 20%

Architecture:
• Layers: 1-4
• Layer types: Dense, Dropout
• Neurons: 32, 64, …, 512
• Dropout ratio: 5%, 10%, …, 90%
• Activation functions: linear, sigmoid, hard-sigmoid, softmax, relu, tanh
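Written out as code, this search space is simply a set of option lists for the GA to sample from; the values below are copied from the slide, while the dictionary layout and the step sizes noted in the comments are assumptions:

```python
# Values are taken from the slide; the dictionary layout and the step sizes
# noted in the comments are assumptions for illustration.
SEARCH_SPACE = {
    "optimizer": ["sgd", "rmsprop", "adagrad", "adadelta", "adam", "adamax", "nadam"],
    "loss": ["mae", "mse", "msle"],
    "learning_rate": [0.05, 0.1, 0.5, 1.0],
    "weight_decay": [0.0, 1e-7, 5e-7],
    "momentum": [round(0.1 * i, 1) for i in range(10)],    # 0.0, 0.1, ..., 0.9
    "nesterov": [0, 1],
    "batch_size": [i / 100 for i in range(5, 21)],         # 5%, 6%, ..., 20% of the data
    "n_layers": [1, 2, 3, 4],
    "layer_type": ["Dense", "Dropout"],
    "neurons": [32 * i for i in range(1, 17)],             # 32, 64, ..., 512 (step of 32 assumed)
    "dropout_ratio": [i / 100 for i in range(5, 95, 5)],   # 5%, 10%, ..., 90%
    "activation": ["linear", "sigmoid", "hard_sigmoid",    # Keras spelling of hard-sigmoid
                   "softmax", "relu", "tanh"],
}
```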
Datasets

              hERG          Micronucleus test
Compounds     6999          798
Actives       3205 (46%)    263 (33%)
Inactives     3794 (54%)    535 (67%)

Binary classification: inactive = 0, active = 1
Found NN-Hyperparameters
Improvement of NNs while running the GA

• The initial population already starts with inner-kappa values of ~0.6 in all splits
• The GA improves the performance of the best entities even further (red line)
• Mutations can produce poorly performing entities (blue line) right up to the last generation
Novelty of Architectures

• The proportion of new entities in the population decreases over the runtime of the GA
• A higher mutation rate (red line) enlarges the space the GA can search
Influence of Hyperparameters

Example axis label: "1_activation (344)", where "1" denotes the first hidden layer, "activation" the activation function of that layer, and "(344)" the number of contributing pairs.

• Contributing pairs differ only in the shown parameter
• The boxplots are based on the absolute difference of the inner-kappa values of each contributing pair
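Expressed as code, collecting the contributing pairs for one parameter could look like this sketch (the entity layout with a params dict and a kappa score is an assumption):

```python
def contributing_pairs(entities, parameter):
    """Collects |inner-kappa difference| for every pair of entities that
    differ ONLY in `parameter`; these values feed one boxplot."""
    diffs = []
    for i, a in enumerate(entities):
        for b in entities[i + 1:]:
            same_elsewhere = all(a["params"][k] == b["params"][k]
                                 for k in a["params"] if k != parameter)
            if same_elsewhere and a["params"][parameter] != b["params"][parameter]:
                diffs.append(abs(a["kappa"] - b["kappa"]))
    return diffs
```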
User-Interface
Conclusion

• Implemented an algorithm that creates a consensus model using 5-fold nested cross-validation (see the sketch after this slide)
• Each compound is represented in 16 of 25 NNs
• A calculation takes 8-14 hours (e.g. over night) on a GTX cluster
• The GA improves the already high kappa values of the NNs even further
• Kappa values of the final NN models are mostly above 0.5 ("moderate" according to Landis & Koch 1977)

Further steps:
• Possibility to use chemical descriptors and multiple fingerprints
• Option to create multi-class models (more classes than just 0 and 1) and regression models
• (Polishing up and writing a paper)
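One common way to form such a consensus model is to average the predictions of the individual networks; whether the author averages or votes is not stated, so this is only a sketch:

```python
import numpy as np

def consensus_predict(models, X, threshold=0.5):
    """Averages the activity probabilities of all networks from the nested
    CV and thresholds the mean to a binary active/inactive call."""
    probs = np.mean([m.predict(X).ravel() for m in models], axis=0)
    return (probs >= threshold).astype(int)
```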
Images designed by Macrovector - Freepik.com
Implementation of the GA
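As a sketch of one piece of such an implementation, mutation with the rate and strength settings from the parameter slide might look like this (mutation strength is interpreted here as the number of steps a value may move within its ordered option list, which is my assumption):

```python
import random

def mutate(entity, space, rate=0.05, strength=1):
    """Mutates each gene with probability `rate`. `strength` is interpreted
    here as how many positions the value may move within its ordered option
    list; the slides do not define it, so this is an assumption."""
    child = dict(entity)
    for gene, options in space.items():
        if random.random() < rate:
            pos = options.index(child[gene])
            step = random.choice([-strength, strength])
            child[gene] = options[max(0, min(len(options) - 1, pos + step))]
    return child
```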