Silesian University of Technology and Future Processing

Evolutionary hyper-parameter selection for deep neural networks

Jakub Nalepa

Silesian University of Technology, Gliwice, Poland
Future Processing, Gliwice, Poland

Machine Learning Meets Quantum Computation (QIPLSIGML), Krakow, Poland, April 26, 2018
Outline

- Introduction
  - About me
  - On deep neural networks
  - The problem of hyper-parameter selection for DNNs
  - Automatic hyper-parameter selection – state of the art
- Evolving hyper-parameters of deep neural networks
- What is next?
My research interests

- Evolutionary algorithms
- Machine learning
- Deep learning
- Image analysis
- Medical imaging
- Complex optimization problems

At their intersection: evolutionary deep learning.
Deep neural networks in the wild
Segmentation of medical images

High-uptake lesions from CT scans
K. Pawełczyk et al.: Towards Detecting High-Uptake Lesions from Lung CT Scans Using Deep Learning, ICIAP 2017.

Retinal segmentation
P. Liskowski and K. Krawiec: Segmenting Retinal Blood Vessels with Deep Neural Networks, IEEE Trans. Med. Imag., vol. 35, no. 11, 2016.

Brain segmentation
A. de Brebisson and G. Montana: Deep Neural Networks for Anatomical Brain Segmentation, IEEE CVPR, 2015.
Image colorization

[Figure: grayscale input, deep colorization result, and ground truth]

R. Zhang, P. Isola, A. A. Efros: Colorful Image Colorization, ECCV 2016.
Object detection and recognition
D. Erhan et al.: Scalable Object Detection using Deep Neural Networks, CVPR 2014.
- Text classification
- Temporal and time-series analysis
- Self-driving cars
- Voice generation
- Music composition
- Real-time analysis of behaviors
- Translation
- Speech recognition
- Language modeling
- Document summarization
- ...
How to deploy a deep neural network?

1. Design a topology
2. Select hyper-parameter values
3. Train the network
The problem of hyper-parameter selection for DNNs
Main obstacles:
- Increasingly hard as models get more complex
- More dependent on experts to fine-tune the models
Hyper-parameter selection as an optimization problem

\[
\lambda^{*} = \arg\min_{\lambda} \mathcal{L}(T; M) = \arg\min_{\lambda} f(\lambda; A, T, V, \mathcal{L}),
\]

where:
- f denotes the objective function,
- λ is a set of hyper-parameters,
- L(T; M) is the loss function for model M on the training set T,
- M is constructed by a learning algorithm A trained on T and validated on V.
Main obstacles:
- The objective function f(λ) is very expensive to compute (each evaluation trains a DNN; see the sketch below)
- The number of hyper-parameters can be really large
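To make the formulation concrete, here is a minimal Python sketch of a single evaluation of f: build the model that the learning algorithm A produces for hyper-parameters λ, train it on T, and score it on V. The `build_model` factory and the data arguments are illustrative placeholders, not the talk's actual code; note that f is here the validation accuracy to be maximized, i.e., the counterpart of minimizing the loss in the formula above.

```python
import numpy as np

def objective(lam, build_model, X_train, y_train, X_val, y_val):
    """One evaluation of f(lambda; A, T, V, L).

    `build_model` is a hypothetical factory mapping a hyper-parameter
    vector to an object with fit/predict methods (the algorithm A)."""
    model = build_model(lam)                      # A, configured by lambda
    model.fit(X_train, y_train)                   # train on T
    predictions = model.predict(X_val)            # validate on V
    return float(np.mean(predictions == y_val))   # accuracy; maximized by the search
```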
Automated Hyper-Parameter Selection

- Model-free:
  - Grid search
  - Random search
- Model-based:
  - Bayesian optimization (e.g., TPE, Spearmint)
  - Non-probabilistic (e.g., RBF surrogate model)
  - Evolutionary algorithms (e.g., CMA-ES, PSO)

Minimal sketches of the two model-free baselines follow below.
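A hedged sketch of grid search and random search over a box-constrained hyper-parameter space. Here `objective` takes only λ (data and model are assumed bound beforehand, e.g., via `functools.partial`), which is an assumption of this sketch:

```python
import itertools
import numpy as np

def grid_search(objective, grid_axes):
    """Exhaustively evaluate every combination of the per-dimension grids."""
    best_lam, best_score = None, -np.inf
    for lam in itertools.product(*grid_axes):
        score = objective(np.asarray(lam))
        if score > best_score:
            best_lam, best_score = np.asarray(lam), score
    return best_lam, best_score

def random_search(objective, b_l, b_u, n_samples, seed=0):
    """Evaluate n_samples points drawn uniformly from the box [b_l, b_u]."""
    rng = np.random.default_rng(seed)
    best_lam, best_score = None, -np.inf
    for _ in range(n_samples):
        lam = rng.uniform(b_l, b_u)
        score = objective(lam)
        if score > best_score:
            best_lam, best_score = lam, score
    return best_lam, best_score
```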
Particle swarm for hyper-parameter optimization in DNNs

P. Ribalta Lorenzo, J. Nalepa, et al.: Particle Swarm Optimization for Hyper-Parameter Selection in Deep Neural Networks, Proceedings of the 2017 Annual Conference on Genetic and Evolutionary Computation, GECCO 2017, pp. 481-488, DOI: 10.1145/3071178.3071208, ACM, 2017.

P. Ribalta Lorenzo, J. Nalepa, et al.: Hyper-parameter Selection in Deep Neural Networks Using Parallel Particle Swarm Optimization, Proceedings of the 2017 Annual Conference on Genetic and Evolutionary Computation, GECCO 2017, pp. 1864-1871, DOI: 10.1145/3067695.3084211, ACM, 2017.
Particle swarm optimization for DNNs

Swarm initialization: randomly sample s vectors λ ∈ R^k from U(b_l, b_u).

Particle velocity update:
\[
v_i \leftarrow \omega v_i + \varphi_p r_p (\lambda_i^{*} - \lambda_i) + \varphi_g r_g (\lambda^{S} - \lambda_i)
\]
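In NumPy (the talk's implementation language), these two steps might look as follows. This is a sketch, not the original code; the inertia and acceleration coefficients ω, φ_p, φ_g below are illustrative defaults, not values reported in the talk:

```python
import numpy as np

rng = np.random.default_rng(42)

def init_swarm(s, k, b_l, b_u):
    """Sample s particle positions uniformly from the box [b_l, b_u]^k."""
    return rng.uniform(b_l, b_u, size=(s, k))

def update_velocity(v_i, lam_i, lam_best_i, lam_swarm,
                    omega=0.7, phi_p=1.5, phi_g=1.5):
    """One velocity update: inertia, plus a cognitive pull towards the
    particle's best position and a social pull towards the swarm's best."""
    r_p = rng.random(lam_i.shape)   # fresh random factors per update
    r_g = rng.random(lam_i.shape)
    return (omega * v_i
            + phi_p * r_p * (lam_best_i - lam_i)
            + phi_g * r_g * (lam_swarm - lam_i))
```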
Swarm evolution

 1: while g ≤ G_max do
 2:     for i in 0, ..., s do
 3:         Update velocity v_i
 4:         λ_i ← λ_i + v_i
 5:         if f(λ_i) > f(λ*_i) then      ▷ Improved particle's best
 6:             λ*_i ← λ_i
 7:         if f(λ*_i) > f(λ^S) then      ▷ Improved swarm's best
 8:             λ^S ← λ*_i
 9:     if ‖λ^S − λ^S_prev‖ < δ then      ▷ No movement
10:         return λ^S
11:     if f(λ^S) − f(λ^S_prev) < ε then  ▷ No improvement
12:         return λ^S
13:     g ← g + 1
14:     λ^S_prev ← λ^S
15: return λ^S
Main advantages:
- PSO is independent of the underlying topology
- PSO is inherently parallelizable (see the sketch below)
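Putting the pieces together, a self-contained sketch of the whole evolution loop with the slide's two stopping criteria. The coefficients, the box clipping, and the overall code are assumptions of this sketch, not the original implementation:

```python
import numpy as np

def pso(objective, b_l, b_u, s=10, g_max=50, delta=1e-6, eps=1e-6,
        omega=0.7, phi_p=1.5, phi_g=1.5, seed=0):
    """Maximize `objective` over the box [b_l, b_u] with s particles."""
    rng = np.random.default_rng(seed)
    b_l, b_u = np.asarray(b_l, float), np.asarray(b_u, float)
    k = b_l.size
    lam = rng.uniform(b_l, b_u, size=(s, k))            # positions
    v = np.zeros((s, k))                                # velocities
    lam_best = lam.copy()                               # per-particle bests
    f_best = np.array([objective(x) for x in lam])
    best = int(np.argmax(f_best))
    lam_S, f_S = lam_best[best].copy(), f_best[best]    # swarm's best
    lam_prev, f_prev = lam_S.copy(), f_S

    for g in range(g_max):
        for i in range(s):
            r_p, r_g = rng.random(k), rng.random(k)
            v[i] = (omega * v[i] + phi_p * r_p * (lam_best[i] - lam[i])
                    + phi_g * r_g * (lam_S - lam[i]))
            # Clipping to the box is a common practical addition
            # (not part of the slide's pseudocode).
            lam[i] = np.clip(lam[i] + v[i], b_l, b_u)
            f_i = objective(lam[i])
            if f_i > f_best[i]:                         # improved particle's best
                lam_best[i], f_best[i] = lam[i].copy(), f_i
                if f_i > f_S:                           # improved swarm's best
                    lam_S, f_S = lam[i].copy(), f_i
        if np.linalg.norm(lam_S - lam_prev) < delta:    # no movement
            return lam_S
        if f_S - f_prev < eps:                          # no improvement
            return lam_S
        lam_prev, f_prev = lam_S.copy(), f_S
    return lam_S
```

With `objective` bound to the DNN fitness sketched earlier (e.g., via `functools.partial`), each particle's evaluation within a generation is independent of the others, which is what makes the inner loop embarrassingly parallel.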
Experiments
Implementation

Setups:
- Intel Xeon E5-2698 v3 (40M cache, 2.30 GHz) with 128 GB of RAM and an NVIDIA Tesla K80 GPU (24 GB GDDR5)
- Intel i7-6850K (15M cache, 3.80 GHz) with 32 GB of RAM and an NVIDIA Titan X (Pascal) GPU (12 GB GDDR5X)

Implementation:
- Implemented in Python using NumPy
- DNNs were trained using Keras with the TensorFlow backend over CUDA 8.0 and cuDNN 5.1

Setting:
- Objective function: multi-class classification accuracy over Ψ
- 10-fold cross-validation, where |T| = 9|V|
- An archive caches already-calculated positions, so no hyper-parameter vector is evaluated twice (a sketch follows below)
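Since every position evaluation means training a DNN, the archive simply memoizes f over visited positions. A minimal sketch, assuming positions are rounded before hashing; the exact keying scheme is not specified on the slide:

```python
import functools
import numpy as np

def archived(objective, decimals=3):
    """Wrap an expensive objective with an archive of evaluated positions."""
    archive = {}

    @functools.wraps(objective)
    def wrapper(lam):
        key = tuple(np.round(lam, decimals))  # hashable, rounded position
        if key not in archive:                # train/evaluate only unseen positions
            archive[key] = objective(lam)
        return archive[key]

    return wrapper
```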
Datasets

- MNIST: 70,000 grayscale images (28 × 28 × 1 pixels) divided into 10 classes (~7,000 images per class)
- CIFAR-10: 60,000 color images (32 × 32 × 3 pixels) divided into 10 classes (6,000 images per class)
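Both datasets ship with Keras, so loading them is a one-liner each via the standard keras.datasets API (the counts above include the predefined train/test splits):

```python
from tensorflow.keras import datasets

# MNIST: 60,000 train + 10,000 test grayscale digits, 28x28
(x_train, y_train), (x_test, y_test) = datasets.mnist.load_data()

# CIFAR-10: 50,000 train + 10,000 test color images, 32x32x3
(cx_train, cy_train), (cx_test, cy_test) = datasets.cifar10.load_data()

print(x_train.shape, cx_train.shape)  # (60000, 28, 28) (50000, 32, 32, 3)
```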
Experimental architecture – SimpleNet

[Figure: SimpleNet layout — pooling layer P0 and convolutional layers C0, C1, C2 arranged into blocks (Block 1, Block 2); P – pooling, C – convolutional]
Architectures and parametrization

SimpleNet-Nk:
- N: number of blocks (Convolution + Max Pooling)
- k: number of convolutions prepended to the network

Layer parameters:

Layer type         Parameters                         Values
Convolutional (C)  Receptive field size (s_F × s_F)   s_F ≥ 2
                   No. of receptive fields (n)        n ≥ 1
Max Pooling (P)    Stride size (ℓ)                    ℓ ≥ 2
                   Receptive field size (s_P)         s_P ≥ 2

Boundary values:

Layer  b_l                 b_u
C_n    {n = 1, s_F = 2}    {n = 16, s_F = 8}
P_n    {s_P = 2, ℓ = 2}    {s_P = 4, ℓ = 4}

A sketch of decoding such a hyper-parameter vector into a network follows below.
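A hedged Keras sketch of how a SimpleNet-Nk might be assembled from a flat hyper-parameter vector. The slide only fixes the block structure and parameter ranges; the decoding order of λ, the activations, and the dense softmax head are assumptions for illustration:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_simplenet(lam, input_shape=(28, 28, 1), n_classes=10,
                    n_blocks=1, k_prepended=0):
    """Assemble a SimpleNet-Nk from a flat hyper-parameter vector `lam`.

    Assumed encoding: (n, s_F) per convolution, (s_P, stride) per pooling."""
    model = keras.Sequential([keras.Input(shape=input_shape)])
    idx = 0
    for _ in range(k_prepended):        # k convolutions prepended to the network
        n, s_f = int(lam[idx]), int(lam[idx + 1]); idx += 2
        model.add(layers.Conv2D(n, s_f, padding="same", activation="relu"))
    for _ in range(n_blocks):           # N blocks: convolution + max pooling
        n, s_f = int(lam[idx]), int(lam[idx + 1]); idx += 2
        s_p, stride = int(lam[idx]), int(lam[idx + 1]); idx += 2
        model.add(layers.Conv2D(n, s_f, padding="same", activation="relu"))
        model.add(layers.MaxPooling2D(pool_size=s_p, strides=stride,
                                      padding="same"))
    model.add(layers.Flatten())         # classifier head (assumed)
    model.add(layers.Dense(n_classes, activation="softmax"))
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Example: SimpleNet-1 with lambda = (C0_n, C0_s_F, P0_s_P, P0_stride)
model = build_simplenet([16, 8, 4, 4])
```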
Influence of the swarm size (MNIST, SimpleNet-1)

Algorithm      s   Time (sec.)  Positions  g_s  Acc. on Ψ
Grid search    —   87,356       1,008      —    0.9897
Random search  —   39,906       400        —    0.9897
PSO            4   934          14         14   0.9852
PSO            10  2,091        29         20   0.9864
PSO            16  13,892       49         23   0.9871
Influence of the swarm size (MNIST, SimpleNet-1)

[Figure: min/avg/max accuracy on Ψ for swarm sizes s = 4, 10, and 16; accuracy axis spanning 0.90–1.00]
Influence of the swarm size (MNIST, SimpleNet-1)

[Figure: best positions found by PSO (s = 4, 10, 16) and by grid search (best and others) across the four hyper-parameters C0_n, C0_s_F, P0_s_P, P0_ℓ, with axes spanning their boundary values (1–16, 2–8, 2–4, 2–4)]
Incrementing SimpleNet (CIFAR-10)

[Figure: accuracy on Ψ (0.1–0.6) versus PSO evolution time (0–2 hours) for SimpleNet-1, SimpleNet-11, SimpleNet-12, SimpleNet-13, and SimpleNet-2]
Optimizing existing DNN topology (LeNet-4, MNIST)

[Figure: per-execution optimization time and average optimization time (in hours, 0–4) over 10 executions, together with min/avg/max accuracy on Ψ (0.96–1)]
Optimizing existing DNN topology (LeNet-4, MNIST)

Classifier                          Error rate (%)
Pairwise linear classifier          7.6
Convolutional Clustering            1.4
SimpleNet-1, s = 4                  1.13
SimpleNet-1, s = 10                 1.12
LeNet-4                             1.1
SimpleNet-1, s = 16                 1.08
Product of stumps on Haar features  0.87
Boosted LeNet-4                     0.7
LeNet-4 with PSO                    0.66
K-NN with non-linear deformation    0.52
NiN                                 0.47
Maxout Networks                     0.45
DSN                                 0.39
R-CNN                               0.31
MultiColumn DNN                     0.23
Conclusions

- PSO surpasses human expertise when optimizing DNNs
- Augmenting minimal DNNs and optimizing them with PSO can be an effective tool for learning challenging datasets
- PSO is independent of the underlying DNN topology
Future (current) work

Evolution of DNNs:
- P. Ribalta Lorenzo and J. Nalepa: Memetic Evolution of Deep Neural Networks, GECCO 2018.
- K. Pawełczyk, M. Kawulok, and J. Nalepa: Genetically-Trained Deep Neural Networks, GECCO Companion 2018.

- Evolving deep neural networks for real-life data
- Lightweight fitness functions
- Understanding the internals of deep neural networks
- Hands-free design of robust deep neural networks
Silesian University of Technology and Future Processing

Evolutionary hyper-parameter selection for deep neural networks

Jakub Nalepa

Silesian University of Technology, Gliwice, Poland
Future Processing, Gliwice, Poland

Thank you!

Machine Learning Meets Quantum Computation (QIPLSIGML), Krakow, Poland, April 26, 2018