Gerald Friedland, http://www.gerald-friedland.org
Gerald Friedland (UC Berkeley)
Experimental Design for Machine Learning
Paper, Demo, etc.: https://tfmeter.icsi.berkeley.edu
Commercial tool: http://brainome.ai
About me…
▪ Adjunct Faculty, UC Berkeley
▪ Data Scientist at National Lab
▪ Started work in Machine Learning in 2001
Start of this work: A simple question
▪ How much money (CPU time, memory, I/O) do I need to budget for my deep learning experiment?
▪ State of the art: No answer. For example, ImageNet models vary significantly:
▪ AlexNet: 238 MB model, 2.27 Bn Ops
▪ DarkNet: 28 MB model, 0.96 Bn Ops
▪ VGG-16: 528 MB model, 30.94 Bn Ops
Source: https://pjreddie.com/darknet/imagenet/
A game…
▪ Continue the sequence:
▪ 2, 4, 6, 8, …
▪ 6, 5, 1, 4, …
▪ What is the next number?
▪ 100000 (sequence 1)
▪ 100000 (sequence 2)
▪ Why?
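One way to see why any continuation can be defended: a finite prefix never pins down the rule that generated it. The sketch below is an illustration we added, not code from the talk; the helper `lagrange_eval` is hypothetical. It uses exact Lagrange interpolation to show that the simplest rule through 2, 4, 6, 8 predicts 10, while a degree-4 polynomial reproduces all four observations and then yields 100000.

```python
from fractions import Fraction


def lagrange_eval(xs, ys, x):
    """Evaluate, at x, the unique polynomial of degree < len(xs) through the points (xs, ys)."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= Fraction(x - xj, xi - xj)  # exact rational arithmetic, no rounding
        total += term
    return total


positions = [1, 2, 3, 4]
seq1 = [2, 4, 6, 8]

# The lowest-degree rule through the four observations (here: 2n) predicts 10.
print(lagrange_eval(positions, seq1, 5))  # 10

# Declaring 100000 to be the fifth term gives a degree-4 polynomial that still
# reproduces every observed term exactly, so it is an equally consistent "rule".
rule = (positions + [5], seq1 + [100000])
print([int(lagrange_eval(*rule, p)) for p in positions])  # [2, 4, 6, 8]
print(lagrange_eval(*rule, 5))  # 100000
```

The same construction works for sequence 2 (6, 5, 1, 4): without further assumptions, the data alone cannot rule out any next number.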
The Scientific Method
Data Science: The Science of Automating the Scientific Method
The Scientific Method: Practical (traditional)
$E = mc^2$
The Scientific Method: Practical (new)
$E = mc^2$
Thought Framework: Machine Learning
▪ Intelligence: The ability to adapt (Binet and Simon, 1904)
▪ Machine learning adapts a finite state machine M to an unknown function $f(\vec{x})$ based on observations.
▪ Input: n rows of observations (instances) in a table with header $(x_1, x_2, \dots, x_m, f(\vec{x}))$, where $f(\vec{x})$ is a column with labels we call the target function.
▪ Output: State machine M that maps a point: $(x_1, x_2, \dots, x_m) \implies f(\vec{x})$
Thought Framework: Machine Learning
▪ Assume $x_i \in \mathbb{R}$, $f(\vec{x}) \in \{0,1\}$ (binary classifier)
▪ Question: How many state transitions does M need to model the training data?
Refresher: Memory Arithmetic
• Information is reduction of uncertainty: $H = -\log_2 P = -\log_2 \frac{1}{\#\text{states}} = \log_2 \#\text{states}$, measured in bits.
• Information: $\log_2 \#\text{states}$ (positive bits). Uncertainty: $\log_2 P = \log_2 \frac{1}{\#\text{states}}$ (negative bits).
• If states are not equiprobable, Shannon entropy provides a tighter bound. Math: assumptions needed! (infinity, distribution). Engineering: estimate using binning.
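The arithmetic above can be checked in a few lines. This is a minimal sketch with hypothetical helper names: `bits_for_states` computes $\log_2 \#\text{states}$, and `binned_entropy` is one simple way (equal-width histogram bins, an assumption on our part) to do the "estimate using binning" step. For non-equiprobable bins the estimate falls below $\log_2$ of the number of bins, which is the tighter bound the slide mentions.

```python
import math
from collections import Counter


def bits_for_states(num_states):
    """Information (bits) needed to single out one of num_states equiprobable states."""
    return math.log2(num_states)


def binned_entropy(values, num_bins):
    """Estimate Shannon entropy (bits) of real-valued samples with equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0  # all samples equal: avoid division by zero
    # Map each sample to a bin index; the maximum value goes into the last bin.
    counts = Counter(min(int((v - lo) / width), num_bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


print(bits_for_states(8))               # 3.0
print(binned_entropy([0, 1, 2, 3], 4))  # 2.0, uniform over 4 bins = log2(4)
print(binned_entropy([0, 0, 0, 1], 2))  # ~0.811, below log2(2) = 1.0 (bins not equiprobable)
```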
Thought Framework: Machine Learning
▪ Assume $x_i \in \mathbb{R}$, $f(\vec{x}) \in \{0,1\}$ (binary classifier)
▪ Question: How many state transitions does M need to model the training data?
▪ Maximally: #rows (lookup table)
▪ Minimally: ? (Kolmogorov complexity)
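The "maximally #rows" case corresponds to a machine that stores every observation verbatim. A minimal sketch, with hypothetical helper names: one dictionary entry (state transition) per distinct training row, and a fixed default for anything unseen, since nothing was generalized.

```python
def build_lookup_table(rows):
    """Memorize (x_vector, label) pairs verbatim: one stored transition per distinct row."""
    table = {}
    for x, label in rows:
        table[tuple(x)] = label
    return table


def predict(table, x, default=0):
    """Look up a memorized label; unseen inputs get a fixed default (nothing was generalized)."""
    return table.get(tuple(x), default)


training = [((0.1, 2.0), 1), ((0.3, -1.0), 0), ((5.0, 5.0), 1)]
table = build_lookup_table(training)
print(len(table))                                        # 3 stored transitions for 3 rows
print(all(predict(table, x) == y for x, y in training))  # True: training data is reproduced
```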
Thought Framework: Machine Learning
▪ Intellectual Capacity: The number of unique target functions a machine learner is able to represent (as a function of the number of model parameters).
▪ Memory Equivalent Capacity (MEC): A machine learner's intellectual capacity is memory-equivalent to N bits when the machine learner is able to represent all $2^N$ binary labeling functions of N uniformly random inputs.
▪ At MEC or higher, M is able to memorize all possible state transitions from the input to the output.
This Talk: Main trick
• If we deduce nothing from data, the only thing we can do is memorize the observations verbatim.
• Using as many parameters as needed for memorization is therefore an indicator that the machine learner did not deduce anything (overfitting).
• Reducing parameters below memorization capacity will, in the best case, make the machine learner forget what's not relevant with regard to the target function: generalization.
Memorization is worst-case generalization
Generalization in Machine Learning
$G = \frac{\#\text{correctly classified instances}}{\text{Memory Equivalent Capacity}}$ [bits/bit]
Memorization is worst-case generalization. For binary classifiers:
• $G < 1 \Rightarrow$ M needs more training/data (not even memorizing)
• $G = 1 \Rightarrow$ M is memorizing = overfitting
• $1 < G < G_{\mathrm{MEM}} \Rightarrow$ M could be implementing a lossless compression (and still overfit)
• $G > G_{\mathrm{MEM}} \Rightarrow$ M is generalizing (no chance for overfitting)
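The G measure and its regimes translate directly into code. This is a sketch under stated assumptions: the helper names are hypothetical, the numbers in the usage lines are made up for illustration, and because the text surviving here does not define $G_{\mathrm{MEM}}$, the caller passes it in as `g_mem`. The slide leaves $G = G_{\mathrm{MEM}}$ unassigned; the sketch groups it with "generalizing" by assumption.

```python
def generalization(correct, mec_bits):
    """G = #correctly classified instances / Memory Equivalent Capacity (bits/bit, binary classifier)."""
    return correct / mec_bits


def interpret(g, g_mem):
    """Map G to the regimes listed above; g_mem (G_MEM) must be supplied by the caller."""
    if g < 1:
        return "needs more training/data"
    if g == 1:
        return "memorizing (overfitting)"
    if g < g_mem:
        return "possibly lossless compression (may still overfit)"
    return "generalizing"  # assumption: G == G_MEM is grouped with this case


g = generalization(correct=960, mec_bits=120)  # hypothetical numbers
print(g)                        # 8.0
print(interpret(g, g_mem=4.0))  # generalizing (with a hypothetical G_MEM of 4.0)
```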
Generalization in Machine Learning
$G = \frac{\#\text{correctly classified instances}}{\text{Memory Equivalent Capacity}}$ [bits/bit]
Advantages of this definition:
• Keeps the current approach with training/validation/benchmark sets.
• No i.i.d. requirement for train/test sets: the only requirement is that input points are distinct!
• No distributional assumptions.
How do we calculate the Memory Equivalent Capacity?
• Binary decision tree: depth of tree (if perfect).
• Neural network (remainder of talk)
• Random forest: TBD
• SVM: TBD
• k-NN: TBD
• GMMs: TBD
Machine Learning as an Engineering Discipline
• Supervised machine learners have a Memory Equivalent Capacity in bits that is computable and measurable.
• Artificial neural networks with gating functions (Sigmoid, ReLU, etc.) have
  • a capacity upper limit that can be determined analytically using 4 principles,
  • an effective capacity that can be measured on actual implementations.
• Predicting and measuring capacity allows for task-independent optimization of a concrete network architecture, learning algorithm, convergence tricks, etc.
• The capacity requirement can be approximately predicted given the input data and ground truth.
Repeat: The Perceptron
Source: Wikipedia
Physical interpretation: Energy threshold
Repeat: Activation Functions (too many)
Source: Wikipedia
Activation functions approximate the sharp decision boundary.
How many binary functions can one model using a single perceptron?
Source: R. Rojas, Intro to Neural Networks
Example: Boolean Functions
Source: R. Rojas, Intro to Neural Networks
• $2^{2^v}$ possible labelings of v boolean variables
• $2^{2^v}$ labelings of $2^v$ points
• For v = 2, all but 2 functions work: XOR, NXOR
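The v = 2 claim can be checked by brute force. This sketch swaps in the classic perceptron learning rule (with bias) as the fitting procedure and counts a labeling as unrepresentable if no epoch within `max_epochs` classifies all four points correctly. By the perceptron convergence theorem, every linearly separable labeling here converges well within that budget, so only the two non-separable ones are reported. Function names are hypothetical.

```python
from itertools import product


def perceptron_fits(points, labels, max_epochs=1000):
    """Perceptron learning rule with bias; True if some epoch classifies every point correctly."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            target = 1 if y else -1
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if target * activation <= 0:  # misclassified (or on the boundary): update
                w = [wi + target * xi for wi, xi in zip(w, x)]
                b += target
                mistakes += 1
        if mistakes == 0:
            return True
    return False  # no convergence within the budget: treated as "not representable"


v = 2
points = list(product([0, 1], repeat=v))               # the 2^v boolean input points
labelings = list(product([0, 1], repeat=len(points)))  # all 2^(2^v) labelings
print(len(labelings))                                   # 16
unreachable = [lab for lab in labelings if not perceptron_fits(points, lab)]
print(unreachable)  # [(0, 1, 1, 0), (1, 0, 0, 1)]: XOR and XNOR
```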
Machine Learning as an Encoder/Decoder
[Figure: Sender (data, labels) → Encoder (learning method) → Channel (weights) → Decoder (neural network) → Receiver (labels'). Ideally the whole chain is the identity; any difference between labels and labels' is information loss.]
Source: D. MacKay: Information Theory, Inference and Learning
Main trick: Let the machine learner label random points!
Critical Points: Perceptron (Cover, MacKay)
N = K: VC dimension (for points in random position)
N = 2K: Cover/MacKay information capacity
(K: number of weights, N: number of points)
Source: D. MacKay: Information Theory, Inference and Learning
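Both critical points follow from Cover's function-counting theorem: N points in general position admit $T(N, K) = 2\sum_{k=0}^{K-1}\binom{N-1}{k}$ labelings realizable by a perceptron with K weights (bias counted as a weight on a constant input). The sketch below, with hypothetical function names, evaluates the fraction $T(N, K) / 2^N$: it is exactly 1 up to N = K and exactly 1/2 at N = 2K.

```python
from math import comb


def cover_count(n_points, k_weights):
    """Cover's count of labelings of N points in general position realizable with K weights."""
    return 2 * sum(comb(n_points - 1, k) for k in range(k_weights))


def separable_fraction(n_points, k_weights):
    """Fraction of all 2^N labelings that a K-weight perceptron can realize."""
    return cover_count(n_points, k_weights) / 2 ** n_points


K = 3
print(separable_fraction(K, K))      # 1.0: at N = K every labeling works
print(separable_fraction(2 * K, K))  # 0.5: at N = 2K half of the labelings work
print(cover_count(4, 3))             # 14
```

With K = 3 (two inputs plus bias) and N = 4 points, the count is 14 of 16, which agrees with "all but 2 functions work" for v = 2 on the Boolean functions slide.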
From a Perceptron to Perceptron Networks
Source: Wikipedia
Careful: Other Architectures
[Figure: Example solutions to XOR with a typical MLP and with a shortcut network.]
Source: R. Rojas, Intro to Neural Networks
Solution: Calculate in bits!
Assume: $y_i, x_i \in \{0,1\}$, $x_i$ uniformly distributed.
n bits of memory: $f(x_1, \dots, x_n) = x_1, \dots, x_n$ (identity function).
Machine learner:
binary classifier: $f(x_1, \dots, x_n) = y_1$
multi-class/regression: $f(x_1, \dots, x_n) = y_1, \dots, y_m$
Memory Equivalent Capacity: The number of configurations of uniformly distributed $x_1, \dots, x_n$ that a machine learner can guarantee to label correctly.
Memory Equivalent Capacity for Neural Networks
1) The output of a perceptron is maximally 1 bit.
2) The maximum memory capacity of a perceptron is the number of parameters (including bias) in bits. (MacKay 2003)
3) The maximum memory capacity of perceptrons in parallel is additive. (MacKay 2003 speculative, Friedland and Krell 2017)
4) The maximum memory capacity of a layer of perceptrons depending on a previous layer of perceptrons is limited by the maximum output (in bits) of the previous layer. (Data Processing Inequality, Tishby 2012)
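Examples: How many bits of maximal capacity?
[Figure: Left, a single perceptron with inputs x1, x2, weights w1, w2 and bias b: 3 bits. Right, a network with inputs x1, x2, a hidden layer of two perceptrons and one output perceptron: $2 \cdot 3 + \min(2, 3) = 8$ bits.]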
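Examples: How many bits of maximal capacity?
[Figure: A network with inputs x1, x2, two hidden layers of two perceptrons each, and one output perceptron (weights w1 to w10, biases b1 to b5): $2 \cdot 3 + \min(2, 2 \cdot 3) + \min(2, 3) = 10$ bits.]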
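The fully connected examples follow a mechanical rule built from the four principles: a layer's raw capacity is its parameter count (weights plus one bias per perceptron, added over parallel perceptrons), and every layer after the first is capped by the number of bits the previous layer can output (one per perceptron). A minimal sketch of that rule, with a hypothetical function name; it covers plain MLPs only, not shortcut/ResNet connections like the next example.

```python
def mlp_capacity_upper_bound(n_inputs, layer_sizes):
    """Upper bound on Memory Equivalent Capacity (bits) of a fully connected MLP.

    layer_sizes lists the number of perceptrons per layer, ending with the output layer.
    """
    total = 0
    fan_in = n_inputs
    for depth, size in enumerate(layer_sizes):
        params = size * (fan_in + 1)  # weights + one bias per perceptron (principles 2 and 3)
        if depth == 0:
            total += params  # first layer sees the raw input
        else:
            total += min(fan_in, params)  # capped by the previous layer's output bits (principles 1 and 4)
        fan_in = size
    return total


print(mlp_capacity_upper_bound(2, [1]))        # 3
print(mlp_capacity_upper_bound(2, [2, 1]))     # 8  = 2*3 + min(2, 3)
print(mlp_capacity_upper_bound(2, [2, 2, 1]))  # 10 = 2*3 + min(2, 2*3) + min(2, 3)
```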
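Examples: How many bits of maximal capacity?
[Figure: Shortcut or ResNet network with inputs x1, x2: a hidden perceptron (weights w1, w2, bias b1) and an output perceptron that receives x1, x2 and the hidden output (weights w3, w4, w5, bias b2).]
3 bits + 4 bits = 7 bits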
Characteristic Curve of a Theoretical 3-Layer MLP
Characteristic Curve of an Actual 3-Layer MLP
Python scikit-learn, 3-Layer MLP
Predicting Capacity Requirements
Given data and labels: How much actual capacity do I need to memorize the function?
Idea:
1) Worst case: Let's build a memorization network where only the biases are trained.
2) Expected case: How much parameter reduction can (exponential) training buy us?
Predicting Maximum Memory Equivalent Capacity
[Figure: “Dumb” network: inputs x1, x2, …; fixed weights of 1 and +/-1; trained threshold biases b1, b2, ….]
Runtime: O(n log n)
Predicting Expected Minimum Memory Equivalent Capacity
Dumb network:
• Highly inefficient.
• Potentially not 100% accurate (hash collisions).
• We can assume that training the weights (and biases) reaches 100% accuracy while reducing parameters.
Expected reduction: Exponential! n thresholds should be representable with $\log_2 n$ weights and biases (search tree!).
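A rough sketch of the "dumb" network idea as we read the figure, not the exact algorithm of the nntailoring repository: project every row to one number (unit weights, an assumption taken from the weights of 1 drawn in the figure), sort, and place a threshold wherever the label changes. Sorting dominates, giving the O(n log n) runtime. The worst case trains one bias per threshold; the expected case uses the $\log_2 n$ search-tree argument. Rows whose projections tie but carry different labels are the "hash collisions" that cannot be separated. Function names are hypothetical.

```python
import math


def count_thresholds(rows, labels, weights=None):
    """Project rows to one number, sort, and count label changes along the sorted order."""
    if weights is None:
        weights = [1.0] * len(rows[0])  # assumption: unit weights, as drawn in the figure
    projected = sorted(
        (sum(w * x for w, x in zip(weights, row)), label)
        for row, label in zip(rows, labels)
    )  # O(n log n)
    thresholds = 0
    current = projected[0][1]
    for _, label in projected[1:]:
        if label != current:  # each class boundary needs one threshold (a trained bias)
            thresholds += 1
            current = label
    return thresholds


def capacity_estimates(thresholds):
    """Rough (worst case, expected) sizes: one bias per threshold vs. log2 of that count."""
    expected = math.log2(thresholds) if thresholds > 0 else 0.0
    return thresholds, expected


rows = [[x] for x in range(9)]
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0]  # alternating labels: worst case for thresholds
print(capacity_estimates(count_thresholds(rows, labels)))  # (8, 3.0)
```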
Empirical Results
All results repeatable at: https://github.com/fractor/nntailoring
Training
▪ Everything we did so far assumes perfect training, that is, training that is guaranteed to reach the global minimum error.
▪ Perfect training requires exponential time.
▪ Imperfect training means the Memory Equivalent Capacity is effectively reduced.
▪ How to measure that?
From Memorization to Generalization
Good news:
• Real-world data is not random.
• The information capacity of a perceptron is usually > 1 bit per parameter (Cover, MacKay).
This means we should be able to use fewer parameters than predicted by memory capacity calculations.
Memorization is worst-case generalization.
Suggested Engineering Process for Generalization
• Start at the approximate expected capacity.
• Train to > 98% accuracy. If impossible, increase parameters.
• Retrain iteratively with decreased capacity while testing against the validation set. You should see a decrease in training accuracy with an increase in validation set accuracy.
• Stop at the minimum capacity for the best held-out set accuracy.
Best-case scenario: As parameters are reduced, the neural network fails to memorize only the insignificant (noise) bits.
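The process above is essentially a search loop. A minimal sketch, assuming a hypothetical callback `train_and_eval(params)` that trains a model with the given parameter budget and returns `(train_acc, val_acc)`; the toy callback at the end is made up purely to exercise the loop. Ties in validation accuracy favor the smaller budget, matching "stop at minimum capacity".

```python
def tailor_capacity(train_and_eval, start_params, step,
                    min_params=1, max_params=10**6, target_train_acc=0.98):
    """Grow until training accuracy exceeds the target, then shrink and keep the best validation result."""
    params = start_params
    train_acc, val_acc = train_and_eval(params)
    # Grow the budget until the model can (nearly) memorize the training set.
    while train_acc <= target_train_acc and params + step <= max_params:
        params += step
        train_acc, val_acc = train_and_eval(params)
    best_params, best_val = params, val_acc
    # Shrink step by step; ">=" prefers the smaller budget on ties.
    while params - step >= min_params:
        params -= step
        train_acc, val_acc = train_and_eval(params)
        if val_acc >= best_val:
            best_params, best_val = params, val_acc
    return best_params, best_val


def toy_train_and_eval(params):
    """Hypothetical stand-in: training accuracy grows with size, validation peaks at 60."""
    return min(1.0, params / 100), 0.9 - abs(params - 60) / 1000


print(tailor_capacity(toy_train_and_eval, start_params=80, step=10))  # (60, 0.9)
```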
Generalization Process: Expected Curve
Overcapacity Machine Learning: Issues
▪ Waste of money, energy, and time. Bad for the environment.
▪ The fewer the parameters => the better the generalization rule => the higher the adaptation per parameter => the higher the chance that an unseen instance is predicted correctly.
▪ Fewer parameters give a higher chance of explainability (Occam's Razor). See: G. Friedland, A. Metere: “Machine Learning for Science”, UQ SciML Workshop, Los Angeles, June 2018.
Reminder: Occam's Razor
Among competing hypotheses, the one with the fewest assumptions should be selected.
For each accepted explanation of a phenomenon, there may be an extremely large, perhaps even incomprehensible, number of possible and more complex alternatives, because one can always burden failing explanations with ad hoc hypotheses to prevent them from being falsified; therefore, simpler theories are preferable to more complex ones because they are more testable. (Wikipedia, Sep. 2017)
General Generalization
$G = \frac{\#\text{correctly classified instances}}{\#\text{instances that can be memorized}}$
▪ Binary classifier (repeat):
$G = \frac{\#\text{correctly classified instances}}{\text{Memory Equivalent Capacity}}$ [bits/bit]
▪ Multi-class/regression:
Non-Statistical Definition (Literature)
Informally: When do two different inputs lead to the same machine learner output?
That is, which bits can be ignored in the comparison?
Statistical equivalent: How many bits per bit can be ignored on average (see G measure).
http://tfmeter.icsi.berkeley.edu
Demo: Experimental Design for TensorFlow