An Evolutionary Method for Training Autoencoders for Deep Learning Networks


Transcript of An Evolutionary Method for Training Autoencoders for Deep Learning Networks

Page 1: Title slide

An Evolutionary Method for Training Autoencoders for Deep Learning Networks
Master's Thesis Defense
Sean Lander, Master's Candidate
Advisor: Yi Shang
University of Missouri, Department of Computer Science
University of Missouri, Informatics Institute

Page 2: Agenda

- Overview
- Background and Related Work
- Methods
- Performance and Testing
- Results
- Conclusion and Future Work

Page 3: Agenda - Overview

Page 4: Overview - Deep Learning classification/reconstruction

- Since 2006, Deep Learning Networks (DLNs) have changed the landscape of classification problems
- Strong ability to create and utilize abstract features
- Easily lends itself to GPU and distributed systems
- Does not require labeled data - VERY IMPORTANT
- Can be used for feature reduction and classification

Page 5: Overview - Problem and proposed solution

- Problems with DLNs:
  - Costly to train with large data sets or high feature spaces
  - Local minima are systemic in Artificial Neural Networks
  - Hyper-parameters must be hand-selected
- Proposed solutions:
  - An evolutionary approach with a local search phase
    - Increased chance of finding the global minimum
    - Optimizes structure based on abstracted features
  - Data partitions based on population size (large data only)
    - Reduced training time
    - Reduced chance of overfitting

Page 6: Agenda - Background and Related Work

Page 7: Background - Perceptrons

- Started with the Perceptron in the late 1950s
- Only capable of linear separability
- Failed on XOR (demonstrated below)
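
A minimal sketch (not from the slides) of that failure: the standard perceptron learning rule cycles forever on XOR because no single line separates the two classes.

```python
# A single perceptron cannot learn XOR: the classes are not linearly
# separable, so the learning rule never converges to zero error.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])                    # XOR labels

w, b, lr = np.zeros(2), 0.0, 0.1
for epoch in range(100):                      # perceptron learning rule
    for xi, yi in zip(X, y):
        pred = int(w @ xi + b > 0)
        w += lr * (yi - pred) * xi            # update only on mistakes
        b += lr * (yi - pred)

preds = (X @ w + b > 0).astype(int)
print(preds, "accuracy:", (preds == y).mean())  # never reaches 1.0
```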

Page 8: Background - Artificial Neural Networks (ANNs)

- ANNs went out of favor until the Multilayer Perceptron (MLP) was introduced
  - Pro: non-linear classification
  - Con: time-consuming to train
- Advance in training: backpropagation
  - Increased training speeds
  - Limited to shallow networks
  - Error propagation diminishes as the number of layers increases

Page 9: Background - Backpropagation using gradient descent

- Proposed in 1986, based on classification error
- Given m training samples, each sample's error is calculated individually
- The total error is then computed over all m training samples (standard forms reconstructed below)
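
The equations on this slide were images and did not survive transcription. A standard squared-error formulation consistent with the surrounding text (a reconstruction, not recovered from the slide) is:

```latex
% Per-sample error for training sample (x^{(i)}, y^{(i)}):
J\bigl(W,b;\,x^{(i)},y^{(i)}\bigr) = \tfrac{1}{2}\,\bigl\lVert h_{W,b}\bigl(x^{(i)}\bigr) - y^{(i)} \bigr\rVert^{2}

% Total error over all m training samples:
J(W,b) = \frac{1}{m}\sum_{i=1}^{m} J\bigl(W,b;\,x^{(i)},y^{(i)}\bigr)
```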

Page 10: Background - Deep Learning Networks (DLNs)

- Allow deep networks with multiple layers
- Layers are pre-trained using unlabeled data
- Layers are "stacked" and fine-tuned (sketched below)
- Minimizes error degradation for deep neural networks (many layers)
- Still costly to train
- Hyper-parameters are selected manually
- Reaches a local, not global, minimum
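
A minimal sketch of that layer-wise pretraining, assuming sigmoid units, squared error, untied weights, and no biases; an illustration of the idea, not the thesis implementation.

```python
# Greedy layer-wise pretraining: train an autoencoder on the data,
# then train the next one on its hidden codes, and so on.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=50, seed=0):
    """One autoencoder trained by plain gradient descent; returns encoder."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], n_hidden))   # encoder
    W2 = rng.normal(0, 0.5, (n_hidden, X.shape[1]))   # decoder
    for _ in range(epochs):
        H = sigmoid(X @ W1)                   # hidden code
        Xr = sigmoid(H @ W2)                  # reconstruction
        d2 = (Xr - X) * Xr * (1 - Xr)         # output-layer delta
        d1 = (d2 @ W2.T) * H * (1 - H)        # hidden-layer delta
        W2 -= lr * H.T @ d2 / len(X)
        W1 -= lr * X.T @ d1 / len(X)
    return W1

X = np.random.default_rng(1).random((100, 64))        # toy unlabeled data
weights, data = [], X
for size in (32, 16):                         # two stacked layers
    W = train_autoencoder(data, size)
    weights.append(W)
    data = sigmoid(data @ W)                  # feed codes to the next layer
# `weights` now initializes a deep network for supervised fine-tuning.
```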

Page 11: Background - Autoencoders for reconstruction

- Autoencoders can be used for feature reduction and clustering
- "Classification error" is the ability to reconstruct the sample input
- Abstracted features (the output from the hidden layer) can be used to replace raw input for other techniques

Page 12: Related Work - Evolutionary and genetic ANNs

- First use of Genetic Algorithms (GAs) in 1989
  - A two-layer ANN on a small data set
  - Tested multiple types of chromosomal encodings and mutation types
- The late 1990s and early 2000s introduced other techniques:
  - Multi-level mutations and mutation priority
  - Addition of a local search in each generation
  - Inclusion of hyper-parameters as part of the mutation
  - The issue of competing conventions starts to appear: two ANNs produce the same results by sharing the same nodes in a permuted order

Page 13: Related Work - Hyper-parameter selection for DLNs

- Most of the work explored newer technologies and methods such as GPU and distributed (MapReduce) training
- Improved variants of backpropagation, such as Conjugate Gradient and Limited-Memory BFGS, were tested under different conditions
- Most conclusions pointed toward manual parameter selection via trial and error

Page 14: Agenda - Methods

Page 15: Method 1 - Evolutionary Autoencoder (EvoAE)

- IDEA: an autoencoder's power is in its feature abstraction, the hidden node output
- Training many AEs will create more potential abstracted features
- The best AEs will contain the best features
- Joining these features should create a better AE (see the sketch below)
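
One plausible way to express that joining step in code (an assumption for illustration; the thesis encoding may differ): treat each hidden feature as a column of the encoder matrix and let crossover mix columns from two parents.

```python
# Hypothetical crossover: the child inherits each hidden feature
# (an encoder column) from one of its two parents at random.
import numpy as np

def crossover(W_a, W_b, rng):
    take_b = rng.random(W_a.shape[1]) < 0.5   # one coin flip per feature
    child = W_a.copy()
    child[:, take_b] = W_b[:, take_b]
    return child

rng = np.random.default_rng(0)
parent_a = rng.normal(0, 0.5, (64, 32))       # two trained 64->32 encoders
parent_b = rng.normal(0, 0.5, (64, 32))
child = crossover(parent_a, parent_b, rng)    # mixes features of both
```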

Page 16: Method 1 - Evolutionary Autoencoder (EvoAE)

[Figure: the EvoAE cycle on a population of autoencoders (x -> h -> x'). Stages shown: initialization, local search, crossover, and mutation, with hidden nodes (A1-A4, B1-B3, C2) exchanged between individuals.]

Page 17: Method 1A - Distributed learning and mini-batches

- Training time of the generic EvoAE grows linearly with the size of the population
- ANN training time increases drastically with data size
- To combat this, mini-batches can be used: each AE is trained against one batch, then updated (sketched below)
- Batch size << total data size
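
A small sketch of mini-batching with illustrative helper names (not the thesis code):

```python
# Mini-batch updates: each step sees a small random slice of the data
# rather than the full set (batch_size << len(X)).
import numpy as np

def minibatches(X, batch_size, rng):
    idx = rng.permutation(len(X))             # shuffle once per pass
    for start in range(0, len(X), batch_size):
        yield X[idx[start:start + batch_size]]

X = np.random.default_rng(0).random((6000, 784))
for batch in minibatches(X, 100, np.random.default_rng(1)):
    pass  # one training update per autoencoder on `batch` goes here
```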

Page 18: Method 1A - Distributed learning and mini-batches

- EvoAE lends itself to distributed systems
- Data storage becomes an issue, since each node must duplicate the data

Per-generation pipeline, run over Batch 1 ... Batch N (a code sketch follows):

- Train: forward propagation, backpropagation
- Rank: calculate error, sort
- GA: crossover, mutate
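
A compact sketch of that Train -> Rank -> GA cycle using simplified tied-weight autoencoders and stand-in operators; everything here is an assumption for illustration, not the thesis implementation.

```python
# One EvoAE generation: Train (elided) -> Rank -> GA.
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def recon_error(W, X):
    """Tied-weight reconstruction error: encode with W, decode with W.T."""
    H = sigmoid(X @ W)
    return np.mean((sigmoid(H @ W.T) - X) ** 2)

def crossover(W_a, W_b, rng):                 # as in the earlier sketch
    take_b = rng.random(W_a.shape[1]) < 0.5
    child = W_a.copy()
    child[:, take_b] = W_b[:, take_b]
    return child

def mutate(W, rng, rate=0.1):                 # jitter a fraction of weights
    mask = rng.random(W.shape) < rate
    return W + mask * rng.normal(0, 0.1, W.shape)

def one_generation(pop, X_val, rng):
    # Train: each AE's local-search phase on its own batch would run here.
    pop = sorted(pop, key=lambda W: recon_error(W, X_val))   # Rank
    keep = pop[:len(pop) // 2]                # GA: best half survives
    kids = [mutate(crossover(keep[rng.integers(len(keep))],
                             keep[rng.integers(len(keep))], rng), rng)
            for _ in range(len(pop) - len(keep))]
    return keep + kids

rng = np.random.default_rng(0)
X_val = rng.random((50, 64))                  # toy validation data
pop = [rng.normal(0, 0.5, (64, 32)) for _ in range(6)]
pop = one_generation(pop, X_val, rng)
```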

Page 19: Method 2 - EvoAE Evo-batches

- IDEA: when data is large, small batches can be representative
- Prevents overfitting, as the nodes being trained are almost always introduced to new data
- Scales well with large amounts of data, even when parallel training is not possible
- Works well on limited-memory systems: increasing the population size reduces the data per batch
- Trains large populations quickly, at a cost equivalent to training a single autoencoder with traditional methods

Page 20: Method 2 - EvoAE Evo-batches

[Figure: the original data is split into partitions Data A-Data D, one per population member; each generation runs local search, crossover, and mutation, and the partition assignment rotates so every autoencoder eventually trains on every slice.]
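
A sketch of that partitioning (helper names are hypothetical; the rotation scheme is inferred from the figure):

```python
# Evo-batches: one data partition per population member, with the
# assignment rotating each generation.
import numpy as np

X = np.random.default_rng(0).random((6000, 784))
pop_size = 30
parts = np.array_split(X, pop_size)           # one slice per autoencoder

for gen in range(3):
    # AE i trains on partition (i + gen) % pop_size this generation,
    # so each AE keeps meeting data it has not yet seen.
    batches = [parts[(i + gen) % pop_size] for i in range(pop_size)]
```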

Page 21: Agenda - Performance and Testing

Page 22: Performance and Testing - Hardware and testing parameters

- Lenovo Y500 laptop
- Intel i7, 3rd generation, 2.4 GHz
- 12 GB RAM
- All weights randomly initialized to N(0, 0.5)

Per-dataset parameters:

Parameter        Wine   Iris   Heart Disease   MNIST
Hidden Size      32     32     12              200
Hidden Std Dev   NULL   NULL   NULL            80
Hidden +/-       16     16     6               NULL
Mutation Rate    0.1    0.1    0.1             0.1

Parameter defaults:

Learning Rate    0.1
Momentum         2
Weight Decay     0.003
Population Size  30
Generations      50
Epochs/Gen       20
Train/Validate   80/20
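
The same settings as a merged run configuration (names are illustrative only, not the thesis code); per-dataset values override the shared defaults:

```python
# Defaults plus per-dataset overrides, merged into one configuration.
DEFAULTS = dict(learning_rate=0.1, momentum=2, weight_decay=0.003,
                population_size=30, generations=50, epochs_per_gen=20,
                train_validate=(80, 20), mutation_rate=0.1)
DATASETS = {
    "wine":          dict(hidden_size=32,  hidden_plus_minus=16),
    "iris":          dict(hidden_size=32,  hidden_plus_minus=16),
    "heart_disease": dict(hidden_size=12,  hidden_plus_minus=6),
    "mnist":         dict(hidden_size=200, hidden_std_dev=80),
}
config = {**DEFAULTS, **DATASETS["wine"]}     # one experiment's settings
```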

Page 23: Performance and Testing - Baseline

- The baseline is a single AE with 30 random initializations
- Two learning rates create two baseline measurements:
  - Base learning rate
  - Base learning rate * 0.1

Page 24: Performance and Testing - Data partitioning

- Three data partitioning methods were used:
  - Full data
  - Mini-batch
  - Evo-batch

Page 25: Performance and Testing - Post-training configurations

- Post-training was run in the following ways:
  - Full data (All)
  - Batch data (Batch)
  - None
- All result sets below use the Evo-batch configuration

Page 26: Agenda - Results

Page 27: Results - Parameters review

Parameter        Wine   MNIST
Hidden Size      32     200
Hidden Std Dev   NULL   80
Hidden +/-       16     NULL
Mutation Rate    0.1    0.1

Parameter defaults:

Learning Rate    0.1
Momentum         2
Weight Decay     0.003
Population Size  30
Generations      50
Epochs/Gen       20
Train/Validate   80/20

Page 28: Results - Datasets

- UCI Wine dataset: 178 samples, 13 features, 3 classes
- Reduced MNIST dataset: 6000/1000 and 24k/6k training/testing samples, 784 features, 10 classes (digits 0-9)

Page 29: Results - Small datasets: UCI Wine

[Results chart. Parameters: Hidden Size 32, Hidden Std Dev NULL, Hidden +/- 16, Mutation Rate 0.1]

Page 30: Results - Small datasets: UCI Wine

- Best error-to-speed: Baseline 1
- Best overall error: Full data All
- Full data is fast on small-scale data
- Evo-batch and mini-batch are not good on small-scale data

Page 31: Results - Small datasets: MNIST 6k/1k

[Results chart. Parameters: Hidden Size 200, Hidden Std Dev 80, Hidden +/- NULL, Mutation Rate 0.1]

Page 32: Results - Small datasets: MNIST 6k/1k

- Best error-to-time: Mini-batch None
- Best overall error: Mini-batch Batch
- Full data slows exponentially on large-scale data
- Evo-batch and mini-batch stay close to baseline speed

Page 33: Results - Medium datasets: MNIST 24k/6k

[Results chart. Parameters: Hidden Size 200, Hidden Std Dev 80, Hidden +/- NULL, Mutation Rate 0.1]

Page 34: Results - Medium datasets: MNIST 24k/6k

- Best error-to-time: Evo-batch None
- Best overall error: Evo-batch Batch or Mini-batch Batch
- Full data was too slow to run on this dataset
- EvoAE with a population of 30 trains as quickly as a single baseline AE when using Evo-batch

Page 35: Agenda - Conclusion and Future Work

Page 36: Conclusions - Good for large problems

- Traditional methods are still the preferred choice for small and toy problems
- EvoAE with Evo-batch produces effective and efficient feature reduction given a large volume of data
- EvoAE is robust against poorly chosen hyper-parameters, specifically the learning rate

Page 37: Future Work

- Immediate goals:
  - Transition to a distributed system, MapReduce-based or otherwise
  - Harness GPU technology for increased speeds (~50% in some cases)
- Long-term goals:
  - Open the system for use by novices and non-programmers
  - Make the system easy to use and transparent to the user, for both modification and training purposes

Page 38: Thank you

Page 39: Background - Backpropagation with weight decay

- The cost is prone to overfitting, so a weight decay term λ is added
- The new cost is then used to update the weights and biases, given some learning rate α (standard forms reconstructed below)
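
The equations here were also lost in transcription; the standard L2 weight-decay cost and gradient-descent update they most likely showed (a reconstruction, stated as an assumption) are:

```latex
% Squared-error cost with L2 weight decay (lambda):
J(W,b) = \frac{1}{m}\sum_{i=1}^{m} \tfrac{1}{2}\,\bigl\lVert h_{W,b}\bigl(x^{(i)}\bigr) - y^{(i)} \bigr\rVert^{2}
       + \frac{\lambda}{2}\sum_{l} \bigl\lVert W^{(l)} \bigr\rVert_{2}^{2}

% Gradient-descent update with learning rate alpha:
W^{(l)} \leftarrow W^{(l)} - \alpha\,\frac{\partial J(W,b)}{\partial W^{(l)}}
```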

Page 40: Background - Conjugate Gradient Descent

- Plain gradient descent can become stuck in a loop, so a momentum term β is added
- This adds memory to the update equation, as previous updates are reused (reconstructed below)
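
A momentum update consistent with this description (a reconstruction; the slide's own equation did not survive):

```latex
% Momentum: each step remembers a fraction beta of the previous step.
\Delta W_{t} = -\,\alpha\,\frac{\partial J(W,b)}{\partial W} + \beta\,\Delta W_{t-1},
\qquad
W_{t+1} = W_{t} + \Delta W_{t}
```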

Page 41: Background - Architecture and hyper-parameters

- Architecture and hyper-parameter selection is usually done through trial and error
- Manually optimized and updated by hand
- Dynamic learning rates can be implemented to correct for sub-optimal learning rate selection (one simple rule sketched below)
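
One simple dynamic learning-rate rule, shown for illustration (an assumption, not the thesis method): grow the step while the error falls, cut it when the error rises.

```python
# "Bold driver" style adjustment: reward progress, punish overshooting.
def adjust_lr(lr, prev_error, error, up=1.05, down=0.5):
    return lr * (up if error < prev_error else down)

lr, prev = 0.1, float("inf")
for error in [0.9, 0.7, 0.6, 0.65, 0.5]:      # mock per-epoch errors
    lr = adjust_lr(lr, prev, error)
    prev = error
print(round(lr, 4))                           # 0.0608
```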

Page 42: Results - Small datasets: UCI Iris

- The UCI Iris dataset has 150 samples, 4 features, and 3 classes
- Best error-to-speed: Baseline 1
- Best overall error: Full data None

Parameters: Hidden Size 32, Hidden Std Dev NULL, Hidden +/- 16, Mutation Rate 0.1

Page 43: Results - Small datasets: UCI Heart Disease

- The UCI Heart Disease dataset has 297 samples, 13 features, and 5 classes
- Best error-to-time: Baseline 1
- Best overall error: Full data None

Parameters: Hidden Size 12, Hidden Std Dev NULL, Hidden +/- 6, Mutation Rate 0.1