Neural Networks Tutorial (transcript of lecture slides)

Page 1

CSC411/2515 Fall 2016

Neural Networks Tutorial

Lluís Castrejón Oct. 2016

Slides adapted from Yujia Li’s tutorial and Prof. Zemel’s lecture notes.

Page 2

Overfitting

• The training data contains information about the regularities in the mapping from input to output. But it also contains noise:
  – The target values may be unreliable.
  – There is sampling error. There will be accidental regularities just because of the particular training cases that were chosen.

• When we fit the model, it cannot tell which regularities are real and which are caused by sampling error:
  – So it fits both kinds of regularity.
  – If the model is very flexible it can model the sampling error really well. This is a disaster.

Page 3

Overfitting

Picture credit: Chris Bishop. Pattern Recognition and Machine Learning. Ch.1.1.

Page 4

Preventing overfitting

• Use a model that has the right capacity:
  – enough to model the true regularities
  – not enough to also model the spurious regularities (assuming they are weaker)

• Standard ways to limit the capacity of a neural net:
  – Limit the number of hidden units.
  – Limit the size of the weights.
  – Stop the learning before it has time to overfit.

Page 5

Limiting the size of the weights

Weight decay involves adding an extra term to the cost function that penalizes the squared weights:

C = E + (λ/2) Σ_i w_i²

∂C/∂w_i = ∂E/∂w_i + λ w_i

When ∂C/∂w_i = 0, w_i = -(1/λ) ∂E/∂w_i

– Keeps weights small unless they have big error derivatives.
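The penalized cost and its gradient can be checked numerically. This is an illustrative NumPy sketch, not code from the slides; the function names are made up:

```python
import numpy as np

def cost_with_decay(E, w, lam):
    """Penalized cost: C = E + (lam / 2) * sum_i w_i^2."""
    return E + 0.5 * lam * np.sum(w ** 2)

def grad_with_decay(dE_dw, w, lam):
    """Gradient of C: dC/dw_i = dE/dw_i + lam * w_i."""
    return dE_dw + lam * w

# At a minimum of C the gradient vanishes, so w_i = -(1/lam) * dE/dw_i:
# a weight stays large only where its error derivative is large.
w = np.array([1.0, -2.0])
print(cost_with_decay(3.0, w, lam=0.1))                    # 3.25
print(grad_with_decay(np.array([0.5, 0.5]), w, lam=0.1))   # [0.6 0.3]
```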

Page 6

The effect of weight decay

• It prevents the network from using weights that it does not need:
  – This can often improve generalization a lot.
  – It helps to stop it from fitting the sampling error.
  – It makes a smoother model in which the output changes more slowly as the input changes.

• But if the network has two very similar inputs, it prefers to put half the weight on each rather than all the weight on one → other form of weight decay?

[Figure: two similar inputs carrying weight w/2 each, versus one input with weight w and the other with 0.]
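The preference for splitting the weight follows directly from the squared penalty; a quick arithmetic check (plain Python, illustrative only):

```python
def l2_penalty(weights, lam=1.0):
    # (lam / 2) * sum of squared weights
    return 0.5 * lam * sum(w * w for w in weights)

# Two very similar inputs: half the weight on each is cheaper under
# the squared penalty than all the weight on one.
split  = l2_penalty([0.5, 0.5])   # w/2 on each (w = 1) -> 0.25
single = l2_penalty([1.0, 0.0])   # all on one input    -> 0.5
assert split < single
# An L1 penalty (sum of |w_i|) would score both layouts the same,
# which is one answer to the "other form of weight decay?" question.
```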

Page 7

Deciding how much to restrict the capacity

• How do we decide which limit to use and how strong to make the limit?
  – If we use the test data, we get an unfairly optimistic prediction of the error rate we would get on new test data.
  – Suppose we compared a set of models that gave random results. The best one on a particular dataset would do better than chance, but it won't do better than chance on another test set.

• So use a separate validation set to do model selection.

Page 8

Using a validation set

• Divide the total dataset into three subsets:
  – Training data is used for learning the parameters of the model.
  – Validation data is not used for learning, but is used for deciding what type of model and what amount of regularization works best.
  – Test data is used to get a final, unbiased estimate of how well the network works. We expect this estimate to be worse than on the validation data.

• We could then re-divide the total dataset to get another unbiased estimate of the true error rate.
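The three-way division can be sketched as follows; the split fractions and the function name are illustrative assumptions, not from the slides:

```python
import random

def three_way_split(examples, frac_train=0.7, frac_val=0.15, seed=0):
    """Shuffle and divide examples into train / validation / test subsets."""
    rng = random.Random(seed)       # fixed seed for a reproducible split
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * frac_train)
    n_val = int(n * frac_val)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # remainder is the test set
    return train, val, test
```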

Page 9

Preventing overfitting by early stopping

• If we have lots of data and a big model, it's very expensive to keep re-training it with different amounts of weight decay.

• It is much cheaper to start with very small weights and let them grow until the performance on the validation set starts getting worse.

• The capacity of the model is limited because the weights have not had time to grow big.
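The early-stopping recipe above can be sketched as a generic training loop; `step`, `val_error`, and `patience` are illustrative names, not from the slides:

```python
def train_with_early_stopping(step, val_error, max_epochs=100, patience=5):
    """Call step() to train one epoch and val_error() to measure
    validation error; stop once the validation error has not improved
    for `patience` consecutive epochs."""
    best, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        step()
        err = val_error()
        if err < best:
            best, best_epoch = err, epoch   # new best; keep going
        elif epoch - best_epoch >= patience:
            break                           # validation error got worse
    return best, best_epoch
```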

Page 10

Why early stopping works

• When the weights are very small, every hidden unit is in its linear range.
  – So a net with a large layer of hidden units is linear.
  – It has no more capacity than a linear net in which the inputs are directly connected to the outputs!

• As the weights grow, the hidden units start using their non-linear ranges, so the capacity grows.

[Figure: a net with inputs connected directly to outputs.]
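The claim that small weights keep hidden units in their linear range is easy to check numerically for a tanh unit (a sketch; the slide does not name a specific activation):

```python
import math

# Near 0, tanh(x) ~ x, so a unit with a tiny pre-activation is
# effectively linear.
small = 0.01
print(abs(math.tanh(small) - small))   # ~3.3e-07: essentially linear

# With a larger pre-activation the non-linearity is substantial.
large = 2.0
print(abs(math.tanh(large) - large))   # ~1.04: far from linear
```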

Page 11

LeNet

• Yann LeCun and others developed a really good recognizer for handwritten digits by using backpropagation in a feedforward net with:
  – Many hidden layers
  – Many pools of replicated units in each layer
  – Averaging of the outputs of nearby replicated units
  – A wide net that can cope with several characters at once, even if they overlap

• Demo of LeNet

Page 12

Recognizing digits

Hand-written digit recognition network:
  – 7291 training examples, 2007 test examples
  – Both contain ambiguous and misclassified examples
  – Input pre-processed (segmented, normalized)

• 16x16 gray-level input in [-1, 1], 10 outputs

Page 13

LeNet: Summary

Main ideas:
• Local → global processing
• Retain coarse position information

Main technique: weight sharing – units arranged in feature maps.

Connections: 1256 units, 64,660 connections, 9760 free parameters.

Results: 0.14% error (train), 5.0% error (test)
vs. a 3-layer net with 40 hidden units: 1.6% (train), 8.1% (test)

Page 14

The 82 errors made by LeNet5

Notice that most of the errors are cases that people find quite easy. The human error rate is probably 20 to 30 errors.

Page 15

A brute force approach

• LeNet uses knowledge about the invariances to design:
  – the network architecture
  – or the weight constraints
  – or the types of feature

• But it's much simpler to incorporate knowledge of invariances by just creating extra training data:
  – For each training image, produce new training data by applying all of the transformations we want to be insensitive to.
  – Then train a large, dumb net on a fast computer.
  – This works surprisingly well.
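The extra-training-data idea can be sketched generically; in practice the transforms would be image shifts, rotations, and deformations, and every name here is illustrative:

```python
def augment(images, transforms):
    """For each image, keep the original and add one transformed
    copy per transformation we want the net to be insensitive to."""
    out = []
    for img in images:
        out.append(img)
        for t in transforms:
            out.append(t(img))
    return out

# Toy 'images' as tuples of pixel values, toy shift transforms:
shift_up = lambda img: tuple(v + 1 for v in img)
shift_down = lambda img: tuple(v - 1 for v in img)
data = augment([(0, 0), (3, 3)], [shift_up, shift_down])
# each original image yields itself plus one copy per transform
```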

Page 16


Page 17

Making backpropagation work for recognizing digits

• Use the standard viewing transformations, and local deformation fields, to get lots of data.

• Use many, globally connected hidden layers and learn for a very long time.
  – This requires a GPU board or a large cluster.

• Use the appropriate error measure for multi-class categorization:
  – Cross-entropy, with softmax activation.

• This approach can get 35 errors on MNIST!
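Cross-entropy with a softmax output can be sketched in a few lines of pure Python (illustrative only, not code from the slides):

```python
import math

def softmax(z):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[target])

scores = [2.0, 1.0, 0.1]
probs = softmax(scores)                  # probabilities summing to 1
loss = cross_entropy(probs, target=0)    # small when class 0 is favored
```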

Page 18

Fabricating training data

Good generalization requires lots of training data, including examples from all relevant input regions.

The solution improves if good data can be constructed. Example: ALVINN.

Page 19

ALVINN: simulating training examples

On-the-fly training: the current video camera image is the input, the current steering direction is the target.

But: this over-trains on the same inputs, and gives no experience going off-road.

Method: generate new examples by shifting the images.

Replace the 10 lowest-error and 5 random training examples with 15 new ones.

Key: the relation between input and output is known!

Page 20

Neural Net Demos

Scene recognition - Places MIT

Digit recognition

Neural Nets Playground

Neural Style Transfer