CSC411/2515 Fall 2016
Neural Networks Tutorial
Lluís Castrejón Oct. 2016
Slides adapted from Yujia Li’s tutorial and Prof. Zemel’s lecture notes.
Overfitting

• The training data contains information about the regularities in the mapping from input to output. But it also contains noise:
  – The target values may be unreliable.
  – There is sampling error. There will be accidental regularities just because of the particular training cases that were chosen.

• When we fit the model, it cannot tell which regularities are real and which are caused by sampling error.
  – So it fits both kinds of regularity.
  – If the model is very flexible it can model the sampling error really well. This is a disaster.
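The sampling-error point can be seen in a tiny experiment (a hedged sketch with numpy; the sine target, noise level, and polynomial degrees are illustrative choices, not from the slides): a very flexible model drives training error to nearly zero by fitting the noise, while test error gets worse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a simple underlying function y = sin(2*pi*x).
x_train = rng.uniform(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.3, 10)
x_test = rng.uniform(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.3, 100)

def fit_poly(degree):
    # Least-squares polynomial fit; higher degree = more capacity.
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

for d in (1, 3, 9):
    tr, te = fit_poly(d)
    print(f"degree {d}: train MSE {tr:.4f}, test MSE {te:.4f}")
```

The degree-9 polynomial can pass through all 10 noisy training points, so its training error collapses even though it has modeled the sampling error rather than the true regularity.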
Overfitting
Picture credit: Chris Bishop. Pattern Recognition and Machine Learning. Ch.1.1.
Preventing overfitting

• Use a model that has the right capacity:
  – enough to model the true regularities
  – not enough to also model the spurious regularities (assuming they are weaker)

• Standard ways to limit the capacity of a neural net:
  – Limit the number of hidden units.
  – Limit the size of the weights.
  – Stop the learning before it has time to overfit.
Limiting the size of the weights

Weight decay involves adding an extra term to the cost function that penalizes the squared weights:

$$C = E + \frac{\lambda}{2} \sum_i w_i^2$$

so the gradient picks up a term that pulls each weight toward zero:

$$\frac{\partial C}{\partial w_i} = \frac{\partial E}{\partial w_i} + \lambda w_i$$

When $\frac{\partial C}{\partial w_i} = 0$, the weight settles at

$$w_i = -\frac{1}{\lambda}\frac{\partial E}{\partial w_i}$$

– This keeps weights small unless they have big error derivatives.
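A minimal numerical check of the weight-decay gradient (a sketch; the quadratic error E, its target t, and λ = 0.1 are illustrative assumptions introduced here):

```python
import numpy as np

lam = 0.1  # weight-decay strength (lambda)
w = np.array([2.0, -1.0, 0.5])

# Illustrative error: E = 0.5 * ||w - t||^2, so dE/dw = w - t.
t = np.array([1.0, 1.0, 1.0])

def grad_C(w):
    dE_dw = w - t
    # Gradient of C = E + (lam/2) * sum(w_i^2) is dE/dw + lam * w.
    return dE_dw + lam * w

# Gradient descent on the penalized cost.
for _ in range(1000):
    w = w - 0.1 * grad_C(w)

# At the minimum dC/dw = 0, i.e. w = -(1/lam) * dE/dw = t / (1 + lam).
print(w)  # each component converges to 1 / 1.1 ≈ 0.909
```

The fixed point t / (1 + λ) shows the shrinkage effect: each weight is pulled below the value that minimizes E alone, by an amount set by λ.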
The effect of weight decay

• It prevents the network from using weights that it does not need:
  – This can often improve generalization a lot.
  – It helps to stop it from fitting the sampling error.
  – It makes a smoother model in which the output changes more slowly as the input changes.

• But, if the network has two very similar inputs it prefers to put half the weight on each rather than all the weight on one → other form of weight decay?

  w/2  w/2   vs.   w  0
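The preference for spreading weight follows directly from the squared penalty; an L1 penalty (one candidate for the "other form of weight decay") is indifferent between the two configurations, which is why it tends to concentrate weight instead. A quick check, using an illustrative w = 1.0:

```python
w = 1.0

split = [w / 2, w / 2]    # half the weight on each of two similar inputs
concentrated = [w, 0.0]   # all the weight on one input

l2 = lambda ws: sum(x ** 2 for x in ws)   # squared (weight-decay) penalty
l1 = lambda ws: sum(abs(x) for x in ws)   # absolute-value penalty

print(l2(split), l2(concentrated))  # 0.5 vs 1.0: L2 prefers the split
print(l1(split), l1(concentrated))  # 1.0 vs 1.0: L1 is indifferent
```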
Deciding how much to restrict the capacity

• How do we decide which limit to use and how strong to make the limit?
  – If we use the test data, we get an unfairly optimistic prediction of the error rate we would get on new test data.
  – Suppose we compared a set of models that gave random results; the best one on a particular dataset would do better than chance, but it won't do better than chance on another test set.

• So use a separate validation set to do model selection.
Using a validation set

• Divide the total dataset into three subsets:
  – Training data is used for learning the parameters of the model.
  – Validation data is not used for learning, but is used for deciding what type of model and what amount of regularization works best.
  – Test data is used to get a final, unbiased estimate of how well the network works. We expect this estimate to be worse than on the validation data.

• We could then re-divide the total dataset to get another unbiased estimate of the true error rate.
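The three-way split can be sketched like this (the 60/20/20 proportions are a common convention, not specified in the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
indices = rng.permutation(n)  # shuffle so each subset is representative

# 60% training / 20% validation / 20% test.
train_idx = indices[:600]
val_idx = indices[600:800]
test_idx = indices[800:]

# Model selection looks only at val_idx; test_idx is touched once at the end.
print(len(train_idx), len(val_idx), len(test_idx))  # 600 200 200
```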
Preventing overfitting by early stopping

• If we have lots of data and a big model, it's very expensive to keep re-training it with different amounts of weight decay.

• It is much cheaper to start with very small weights and let them grow until the performance on the validation set starts getting worse.

• The capacity of the model is limited because the weights have not had time to grow big.
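Early stopping can be sketched as a training loop that watches validation error (a hedged sketch: the tiny polynomial regression task, the learning rate, and the `patience` threshold are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny noisy regression task; degree-8 features give excess capacity.
x = rng.uniform(-1, 1, 40)
y = x + rng.normal(0, 0.2, 40)
X = np.vander(x, 9)                 # polynomial features, 9 parameters
X_tr, y_tr = X[:20], y[:20]
X_val, y_val = X[20:], y[20:]

w = np.zeros(9)                     # start with very small (zero) weights
lr, patience = 0.01, 50
best_val, best_w, bad_steps = np.inf, w.copy(), 0

for step in range(20000):
    grad = X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
    w -= lr * grad
    val_loss = np.mean((X_val @ w - y_val) ** 2)
    if val_loss < best_val:
        best_val, best_w, bad_steps = val_loss, w.copy(), 0
    else:
        bad_steps += 1
        if bad_steps >= patience:   # validation error stopped improving
            break

print(step, best_val)
```

The weights grow gradually from zero, so stopping when validation error stalls caps how large (and how non-linear) the model can get.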
Why early stopping works

• When the weights are very small, every hidden unit is in its linear range.
  – So a net with a large layer of hidden units is linear.
  – It has no more capacity than a linear net in which the inputs are directly connected to the outputs!

• As the weights grow, the hidden units start using their non-linear ranges, so the capacity grows.

[Figure: a linear net with inputs connected directly to outputs]
LeNet

• Yann LeCun and others developed a really good recognizer for handwritten digits by using backpropagation in a feedforward net with:
  – Many hidden layers
  – Many pools of replicated units in each layer.
  – Averaging the outputs of nearby replicated units.
  – A wide net that can cope with several characters at once even if they overlap.

• Demo of LeNet
Recognizing digits

Hand-written digit recognition network:
– 7291 training examples, 2007 test examples
– Both contain ambiguous and misclassified examples
– Input pre-processed (segmented, normalized)

• 16x16 gray level in [-1, 1], 10 outputs
LeNet: Summary

Main ideas:
• Local → global processing
• Retain coarse position info

Main technique: weight sharing – units arranged in feature maps

Connections: 1256 units, 64,660 connections, 9760 free parameters

Results: 0.14% error (train), 5.0% (test)
vs. 3-layer net with 40 hidden units: 1.6% (train), 8.1% (test)
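Weight sharing means one small set of weights (a kernel) is reused at every position of a feature map, which is how LeNet gets far fewer free parameters than connections. A minimal numpy sketch (the 16x16 image and 3x3 kernel sizes are illustrative, not LeNet's actual dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.normal(size=(16, 16))
kernel = rng.normal(size=(3, 3))  # the same 9 shared weights everywhere

# "Valid" convolution (implemented as cross-correlation): every unit in
# the 14x14 feature map uses the identical kernel weights.
fmap = np.empty((14, 14))
for i in range(14):
    for j in range(14):
        fmap[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

# 14*14 = 196 units and 196*9 connections, but only 9 free parameters.
print(fmap.shape, kernel.size)
```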
The 82 errors made by LeNet5

Notice that most of the errors are cases that people find quite easy. The human error rate is probably 20 to 30 errors.
A brute force approach

• LeNet uses knowledge about the invariances to design:
  – the network architecture
  – or the weight constraints
  – or the types of feature

• But it's much simpler to incorporate knowledge of invariances by just creating extra training data:
  – for each training image, produce new training data by applying all of the transformations we want to be insensitive to
  – Then train a large, dumb net on a fast computer.
  – This works surprisingly well.
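Creating extra training data from transformations can be sketched as follows (the `np.roll` shifts are an illustrative stand-in for the full set of transformations one might want to be insensitive to):

```python
import numpy as np

def augment(image):
    # Generate shifted copies of the image; since we want the net to be
    # insensitive to small translations, each copy keeps the same label.
    shifts = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
    return [np.roll(image, s, axis=(0, 1)) for s in shifts]

image = np.arange(16.0).reshape(4, 4)  # toy "training image"
augmented = augment(image)
print(len(augmented))  # 5 training images from 1
```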
Making backpropagation work for recognizing digits

• Use the standard viewing transformations, and local deformation fields to get lots of data.

• Use many, globally connected hidden layers and learn for a very long time.
  – This requires a GPU board or a large cluster.

• Use the appropriate error measure for multi-class categorization:
  – Cross-entropy, with softmax activation.

• This approach can get 35 errors on MNIST!
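The recommended error measure, cross-entropy on a softmax output, can be written directly (a sketch; the 3-class logits and targets are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(z, target_class):
    # Negative log-probability assigned to the correct class.
    return -np.log(softmax(z)[target_class])

logits = np.array([2.0, 0.5, -1.0])  # illustrative net outputs, 3 classes
print(cross_entropy(logits, 0))      # small loss: class 0 is favored
print(cross_entropy(logits, 2))      # large loss: class 2 is unlikely
```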
Fabricating training data

Good generalization requires lots of training data, including examples from all relevant input regions.

We can improve the solution if good data can be constructed. Example: ALVINN.
ALVINN: simulating training examples

On-the-fly training: the current video camera image is the input, the current steering direction is the target.

But: the network over-trains on the same inputs, and gets no experience going off-road.

Method: generate new examples by shifting images.

Replace 10 low-error & 5 random training examples with 15 new ones.

Key: the relation between input and output is known!
Neural Net Demos
Scene recognition - Places MIT
Digit recognition
Neural Nets Playground
Neural Style Transfer