Silesian University of Technology and Future Processing

Evolutionary hyper-parameter selection for deep neural networks

Jakub Nalepa

Silesian University of Technology, Gliwice, Poland
Future Processing, Gliwice, Poland

Machine Learning Meets Quantum Computation (QIPLSIGML), Krakow, Poland, April 26, 2018
Outline

- Introduction
  - About me
  - On deep neural networks
  - The problem of hyper-parameter selection for DNNs
  - Automatic hyper-parameter selection – state of the art
- Evolving hyper-parameters of deep neural networks
- What is next?
My research interests

- Evolutionary algorithms
- Machine learning
- Deep learning
- Image analysis
- Medical imaging
- Complex optimization problems

At their intersection: evolutionary deep learning.
Deep neural networks in the wild
Segmentation of medical images

High-uptake lesions from CT scans
K. Pawełczyk et al.: Towards Detecting High-Uptake Lesions from Lung CT Scans Using Deep Learning, ICIAP 2017.

Retinal segmentation
P. Liskowski and K. Krawiec: Segmenting Retinal Blood Vessels with Deep Neural Networks, IEEE Trans. Med. Imag., vol. 35, no. 11, 2016.

Brain segmentation
A. de Brebisson and G. Montana: Deep Neural Networks for Anatomical Brain Segmentation, IEEE CVPR, 2015.
Image colorization

[Figure: grayscale input, deep colorization result, and ground truth]

R. Zhang, P. Isola, A. A. Efros: Colorful Image Colorization, ECCV 2016.
Object detection and recognition
D. Erhan et al.: Scalable Object Detection using Deep Neural Networks, CVPR 2014.
- Text classification
- Temporal and time-series analysis
- Self-driving cars
- Voice generation
- Music composition
- Real-time analysis of behaviors
- Translation
- Speech recognition
- Language modeling
- Document summarization
- ...
How to deploy a deep neural network?

1. Design a topology
2. Select hyper-parameter values
3. Train the network
The problem of hyper-parameter selection for DNNs
Main obstacles:
- Increasingly hard as models get more complex
- More dependent on experts to fine-tune the models
Hyper-parameter selection as an optimization problem

\[
\lambda^{*} = \arg\min_{\lambda} \mathcal{L}(T; M) = \arg\min_{\lambda} f(\lambda; A, T, V, \mathcal{L}),
\]

where:
- f denotes the objective function,
- λ is a set of hyper-parameters,
- L(T; M) is the loss function for model M on the training set T,
- M is constructed by a learning algorithm A trained on T and validated on V.
Main obstacles:
- The objective function f(λ) is very expensive to compute (each evaluation trains a DNN; see the sketch below)
- The number of hyper-parameters can be really large
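To make the formulation concrete, here is a minimal Python sketch of a single evaluation of f: build the model that the learning algorithm A produces for hyper-parameters λ, train it on T, and score it on V. The `build_model` factory and the data arguments are illustrative placeholders, not the talk's actual code; note that f is here the validation accuracy to be maximized, i.e., the counterpart of minimizing the loss in the formula above.

```python
import numpy as np

def objective(lam, build_model, X_train, y_train, X_val, y_val):
    """One evaluation of f(lambda; A, T, V, L).

    `build_model` is a hypothetical factory mapping a hyper-parameter
    vector to an object with fit/predict methods (the algorithm A)."""
    model = build_model(lam)                      # A, configured by lambda
    model.fit(X_train, y_train)                   # train on T
    predictions = model.predict(X_val)            # validate on V
    return float(np.mean(predictions == y_val))   # accuracy; maximized by the search
```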
Automated Hyper-Parameter Selection

- Model-free:
  - Grid search
  - Random search
- Model-based:
  - Bayesian optimization (e.g., TPE, Spearmint)
  - Non-probabilistic (e.g., RBF surrogate model)
  - Evolutionary algorithms (e.g., CMA-ES, PSO)

Minimal sketches of the two model-free baselines follow below.
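A hedged sketch of grid search and random search over a box-constrained hyper-parameter space. Here `objective` takes only λ (data and model are assumed bound beforehand, e.g., via `functools.partial`), which is an assumption of this sketch:

```python
import itertools
import numpy as np

def grid_search(objective, grid_axes):
    """Exhaustively evaluate every combination of the per-dimension grids."""
    best_lam, best_score = None, -np.inf
    for lam in itertools.product(*grid_axes):
        score = objective(np.asarray(lam))
        if score > best_score:
            best_lam, best_score = np.asarray(lam), score
    return best_lam, best_score

def random_search(objective, b_l, b_u, n_samples, seed=0):
    """Evaluate n_samples points drawn uniformly from the box [b_l, b_u]."""
    rng = np.random.default_rng(seed)
    best_lam, best_score = None, -np.inf
    for _ in range(n_samples):
        lam = rng.uniform(b_l, b_u)
        score = objective(lam)
        if score > best_score:
            best_lam, best_score = lam, score
    return best_lam, best_score
```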
Particle swarm for hyper-parameter optimization in DNNs

P. Ribalta Lorenzo, J. Nalepa, et al.: Particle Swarm Optimization for Hyper-Parameter Selection in Deep Neural Networks, Proceedings of the 2017 Annual Conference on Genetic and Evolutionary Computation, GECCO 2017, pp. 481-488, DOI: 10.1145/3071178.3071208, ACM, 2017.

P. Ribalta Lorenzo, J. Nalepa, et al.: Hyper-parameter Selection in Deep Neural Networks Using Parallel Particle Swarm Optimization, Proceedings of the 2017 Annual Conference on Genetic and Evolutionary Computation, GECCO 2017, pp. 1864-1871, DOI: 10.1145/3067695.3084211, ACM, 2017.
Particle swarm optimization for DNNs

Swarm initialization: randomly sample s vectors λ ∈ R^k from U(b_l, b_u).

Particle velocity update:
\[
v_i \leftarrow \omega v_i + \varphi_p r_p (\lambda_i^{*} - \lambda_i) + \varphi_g r_g (\lambda^{S} - \lambda_i)
\]
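In NumPy (the talk's implementation language), these two steps might look as follows. This is a sketch, not the original code; the inertia and acceleration coefficients ω, φ_p, φ_g below are illustrative defaults, not values reported in the talk:

```python
import numpy as np

rng = np.random.default_rng(42)

def init_swarm(s, k, b_l, b_u):
    """Sample s particle positions uniformly from the box [b_l, b_u]^k."""
    return rng.uniform(b_l, b_u, size=(s, k))

def update_velocity(v_i, lam_i, lam_best_i, lam_swarm,
                    omega=0.7, phi_p=1.5, phi_g=1.5):
    """One velocity update: inertia, plus a cognitive pull towards the
    particle's best position and a social pull towards the swarm's best."""
    r_p = rng.random(lam_i.shape)   # fresh random factors per update
    r_g = rng.random(lam_i.shape)
    return (omega * v_i
            + phi_p * r_p * (lam_best_i - lam_i)
            + phi_g * r_g * (lam_swarm - lam_i))
```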
Swarm evolution

 1: while g ≤ G_max do
 2:     for i in 0, ..., s do
 3:         Update velocity v_i
 4:         λ_i ← λ_i + v_i
 5:         if f(λ_i) > f(λ*_i) then      ▷ Improved particle's best
 6:             λ*_i ← λ_i
 7:         if f(λ*_i) > f(λ^S) then      ▷ Improved swarm's best
 8:             λ^S ← λ*_i
 9:     if ‖λ^S − λ^S_prev‖ < δ then      ▷ No movement
10:         return λ^S
11:     if f(λ^S) − f(λ^S_prev) < ε then  ▷ No improvement
12:         return λ^S
13:     g ← g + 1
14:     λ^S_prev ← λ^S
15: return λ^S
Main advantages:
- PSO is independent of the underlying topology
- PSO is inherently parallelizable (see the sketch below)
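Putting the pieces together, a self-contained sketch of the whole evolution loop with the slide's two stopping criteria. The coefficients, the box clipping, and the overall code are assumptions of this sketch, not the original implementation:

```python
import numpy as np

def pso(objective, b_l, b_u, s=10, g_max=50, delta=1e-6, eps=1e-6,
        omega=0.7, phi_p=1.5, phi_g=1.5, seed=0):
    """Maximize `objective` over the box [b_l, b_u] with s particles."""
    rng = np.random.default_rng(seed)
    b_l, b_u = np.asarray(b_l, float), np.asarray(b_u, float)
    k = b_l.size
    lam = rng.uniform(b_l, b_u, size=(s, k))            # positions
    v = np.zeros((s, k))                                # velocities
    lam_best = lam.copy()                               # per-particle bests
    f_best = np.array([objective(x) for x in lam])
    best = int(np.argmax(f_best))
    lam_S, f_S = lam_best[best].copy(), f_best[best]    # swarm's best
    lam_prev, f_prev = lam_S.copy(), f_S

    for g in range(g_max):
        for i in range(s):
            r_p, r_g = rng.random(k), rng.random(k)
            v[i] = (omega * v[i] + phi_p * r_p * (lam_best[i] - lam[i])
                    + phi_g * r_g * (lam_S - lam[i]))
            # Clipping to the box is a common practical addition
            # (not part of the slide's pseudocode).
            lam[i] = np.clip(lam[i] + v[i], b_l, b_u)
            f_i = objective(lam[i])
            if f_i > f_best[i]:                         # improved particle's best
                lam_best[i], f_best[i] = lam[i].copy(), f_i
                if f_i > f_S:                           # improved swarm's best
                    lam_S, f_S = lam[i].copy(), f_i
        if np.linalg.norm(lam_S - lam_prev) < delta:    # no movement
            return lam_S
        if f_S - f_prev < eps:                          # no improvement
            return lam_S
        lam_prev, f_prev = lam_S.copy(), f_S
    return lam_S
```

With `objective` bound to the DNN fitness sketched earlier (e.g., via `functools.partial`), each particle's evaluation within a generation is independent of the others, which is what makes the inner loop embarrassingly parallel.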
Experiments
Implementation

Setups:
- Intel Xeon E5-2698 v3 (40M cache, 2.30 GHz) with 128 GB of RAM and an NVIDIA Tesla K80 GPU (24 GB GDDR5)
- Intel i7-6850K (15M cache, 3.80 GHz) with 32 GB of RAM and an NVIDIA Titan X (Pascal) GPU (12 GB GDDR5X)

Implementation:
- Implemented in Python using NumPy
- DNNs were trained using Keras with the TensorFlow backend over CUDA 8.0 and cuDNN 5.1

Setting:
- Objective function: multi-class classification accuracy over Ψ
- 10-fold cross-validation, where |T| = 9|V|
- An archive caches already-calculated positions, so no hyper-parameter vector is evaluated twice (a sketch follows below)
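Since every position evaluation means training a DNN, the archive simply memoizes f over visited positions. A minimal sketch, assuming positions are rounded before hashing; the exact keying scheme is not specified on the slide:

```python
import functools
import numpy as np

def archived(objective, decimals=3):
    """Wrap an expensive objective with an archive of evaluated positions."""
    archive = {}

    @functools.wraps(objective)
    def wrapper(lam):
        key = tuple(np.round(lam, decimals))  # hashable, rounded position
        if key not in archive:                # train/evaluate only unseen positions
            archive[key] = objective(lam)
        return archive[key]

    return wrapper
```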
Datasets

- MNIST: 70,000 grayscale images (28 × 28 × 1 pixels) divided into 10 classes (~7,000 images per class)
- CIFAR-10: 60,000 color images (32 × 32 × 3 pixels) divided into 10 classes (6,000 images per class)
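Both datasets ship with Keras, so loading them is a one-liner each via the standard keras.datasets API (the counts above include the predefined train/test splits):

```python
from tensorflow.keras import datasets

# MNIST: 60,000 train + 10,000 test grayscale digits, 28x28
(x_train, y_train), (x_test, y_test) = datasets.mnist.load_data()

# CIFAR-10: 50,000 train + 10,000 test color images, 32x32x3
(cx_train, cy_train), (cx_test, cy_test) = datasets.cifar10.load_data()

print(x_train.shape, cx_train.shape)  # (60000, 28, 28) (50000, 32, 32, 3)
```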
Experimental architecture – SimpleNet

[Figure: SimpleNet layout — pooling layer P0 and convolutional layers C0, C1, C2 arranged into blocks (Block 1, Block 2); P – pooling, C – convolutional]
Architectures and parametrization

SimpleNet-Nk:
- N: number of blocks (Convolution + Max Pooling)
- k: number of convolutions prepended to the network

Layer parameters:

Layer type         Parameters                         Values
Convolutional (C)  Receptive field size (s_F × s_F)   s_F ≥ 2
                   No. of receptive fields (n)        n ≥ 1
Max Pooling (P)    Stride size (ℓ)                    ℓ ≥ 2
                   Receptive field size (s_P)         s_P ≥ 2

Boundary values:

Layer  b_l                 b_u
C_n    {n = 1, s_F = 2}    {n = 16, s_F = 8}
P_n    {s_P = 2, ℓ = 2}    {s_P = 4, ℓ = 4}

A sketch of decoding such a hyper-parameter vector into a network follows below.
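A hedged Keras sketch of how a SimpleNet-Nk might be assembled from a flat hyper-parameter vector. The slide only fixes the block structure and parameter ranges; the decoding order of λ, the activations, and the dense softmax head are assumptions for illustration:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_simplenet(lam, input_shape=(28, 28, 1), n_classes=10,
                    n_blocks=1, k_prepended=0):
    """Assemble a SimpleNet-Nk from a flat hyper-parameter vector `lam`.

    Assumed encoding: (n, s_F) per convolution, (s_P, stride) per pooling."""
    model = keras.Sequential([keras.Input(shape=input_shape)])
    idx = 0
    for _ in range(k_prepended):        # k convolutions prepended to the network
        n, s_f = int(lam[idx]), int(lam[idx + 1]); idx += 2
        model.add(layers.Conv2D(n, s_f, padding="same", activation="relu"))
    for _ in range(n_blocks):           # N blocks: convolution + max pooling
        n, s_f = int(lam[idx]), int(lam[idx + 1]); idx += 2
        s_p, stride = int(lam[idx]), int(lam[idx + 1]); idx += 2
        model.add(layers.Conv2D(n, s_f, padding="same", activation="relu"))
        model.add(layers.MaxPooling2D(pool_size=s_p, strides=stride,
                                      padding="same"))
    model.add(layers.Flatten())         # classifier head (assumed)
    model.add(layers.Dense(n_classes, activation="softmax"))
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Example: SimpleNet-1 with lambda = (C0_n, C0_s_F, P0_s_P, P0_stride)
model = build_simplenet([16, 8, 4, 4])
```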
Influence of the swarm size (MNIST, SimpleNet-1)

Algorithm      s   Time (sec.)  Positions  g_s  Acc. on Ψ
Grid search    —   87,356       1,008      —    0.9897
Random search  —   39,906       400        —    0.9897
PSO            4   934          14         14   0.9852
PSO            10  2,091        29         20   0.9864
PSO            16  13,892       49         23   0.9871
Influence of the swarm size (MNIST, SimpleNet-1)

[Figure: min/avg/max accuracy on Ψ for swarm sizes s = 4, 10, and 16; accuracy axis spanning 0.90–1.00]
Influence of the swarm size (MNIST, SimpleNet-1)

[Figure: best positions found by PSO (s = 4, 10, 16) and by grid search (best and others) across the four hyper-parameters C0_n, C0_s_F, P0_s_P, P0_ℓ, with axes spanning their boundary values (1–16, 2–8, 2–4, 2–4)]
Incrementing SimpleNet (CIFAR-10)

[Figure: accuracy on Ψ (0.1–0.6) versus PSO evolution time (0–2 hours) for SimpleNet-1, SimpleNet-11, SimpleNet-12, SimpleNet-13, and SimpleNet-2]
Optimizing existing DNN topology (LeNet-4, MNIST)

[Figure: per-execution optimization time and average optimization time (in hours, 0–4) over 10 executions, together with min/avg/max accuracy on Ψ (0.96–1)]
Optimizing existing DNN topology (LeNet-4, MNIST)

Classifier                          Error rate (%)
Pairwise linear classifier          7.6
Convolutional Clustering            1.4
SimpleNet-1, s = 4                  1.13
SimpleNet-1, s = 10                 1.12
LeNet-4                             1.1
SimpleNet-1, s = 16                 1.08
Product of stumps on Haar features  0.87
Boosted LeNet-4                     0.7
LeNet-4 with PSO                    0.66
K-NN with non-linear deformation    0.52
NiN                                 0.47
Maxout Networks                     0.45
DSN                                 0.39
R-CNN                               0.31
MultiColumn DNN                     0.23
Conclusions

- PSO surpasses human expertise when optimizing DNNs
- Augmenting minimal DNNs and optimizing them with PSO can be an effective tool for learning challenging datasets
- PSO is independent of the underlying DNN topology
Future (current) work

Evolution of DNNs:
- P. Ribalta Lorenzo and J. Nalepa: Memetic Evolution of Deep Neural Networks, GECCO 2018.
- K. Pawełczyk, M. Kawulok, and J. Nalepa: Genetically-Trained Deep Neural Networks, GECCO Companion 2018.

- Evolving deep neural networks for real-life data
- Lightweight fitness functions
- Understanding the internals of deep neural networks
- Hands-free design of robust deep neural networks
Silesian University of Technology and Future Processing

Evolutionary hyper-parameter selection for deep neural networks

Jakub Nalepa

Silesian University of Technology, Gliwice, Poland
Future Processing, Gliwice, Poland

Thank you!

Machine Learning Meets Quantum Computation (QIPLSIGML), Krakow, Poland, April 26, 2018