Experimental Design for Machine Learning on Multimedia Data

Lecture 5

Dr. Gerald Friedland, [email protected]

Website: http://www.icsi.berkeley.edu/~fractor/fall2019/

Discuss Homework

Project proposal deadline: October 11th, 2019.

Email project proposals to me and Rishi.

Project Questions 1-4 (Data & Problem Inspection)

1) What is the variable the machine learner should predict? What is the required accuracy for success? What impact will adversarial examples have?

2) How much data do we have to train the prediction of the variable? Are the classes balanced? How many modalities could be exploited in the data? Is there temporal information? How much noise are we expecting? Do you expect bias?

3) How well is the data annotated (anecdotally)? What is the annotator agreement (measured)?

4) Given questions 1-3: Are we reducing information (pattern matching) or do we need to infer information (statistical machine learner)? As a consequence, what seems the best choice for the type of machine learner per modality?
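
Question 3 asks for a measured annotator agreement but does not name a measure. One common choice, assumed here purely for illustration, is Cohen's kappa for two annotators. The function name and the sample labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations of four items by two annotators.
kappa = cohens_kappa([1, 1, 0, 0], [1, 1, 0, 1])
print(kappa)
```

A kappa near 1 indicates agreement well above chance, while a value near 0 indicates agreement no better than chance.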

Project Questions 5-8 (Training for Generalization)

5) Estimate the memory equivalent capacity needed for the machine learner of your choice. What is the expected generalization? What does the progression look like: Is there enough data?

6) Train your machine learner for accuracy at memory equivalent capacity. Can you reach near 100% memorization? If not, why (diagnose)?

7) Train your machine learner for generalization: Plot the accuracy/capacity curve. What is the expected accuracy and generalization ratio at the point you decided to stop? Do you need to try a different machine learner (if so, redo from 5)? Should you extract features (if so, redo from 5)?

8) How well did your generalization prediction hold on the independent test data? Explain results. How confident are you in the results?

Project Questions 9-10 (Finishing Touch)

9) How do you combine the models of the modalities? Explain your choice. How confident are you in the combination results (i.e., does it make sense to combine)?

10) What are the final combined results of the system? Are the experiments documented and repeatable (if not, please make sure they are, even for bad results)? Are the experiments reproducible (speculate)?

Generic Project Workflow for Accuracy

[Figure: workflow diagram with three stages. Training: Development Data, Machine Learning, Statistical Models. Testing: Test Data, Apply Models, Results. Evaluation: Ground Truth, Error Metric, Accuracy Scores.]

Generic Project Workflow for Generalization

Supervised Machine Learning Engineering Process: Maximizing the Chance for Generalization / Minimizing Adversarial Examples

[Figure: flowchart. The edges are not recoverable from the extraction; the node labels are listed below in extraction order.]

Data preparation: Labelled Data; Check Annotator Agreement; High Enough? (Too low / Undetermined / Yes); Annotate Again; Annotate Redundantly; "Clean" Data; Classes Balanced? (Yes / No); Subsample to Balance Classes.

Choice of approach: Committed to a Specific ML Approach? (Yes / No); Estimate Best Approach using Capacity Estimators; Approach with Lowest Capacity Estimate wins; Estimate Generalization Progression; Good Generalization Progression? (Yes / No); Acquire More Labelled Data; Hard Decision.

Training: Split into Train and Test Data; Run Capacity Estimator on Training Data; Train with Training Data at Memory Capacity; High Accuracy? (Yes / No); Reduce Machine Learner Capacity; Train on Training Data, Test on Test Data; Use Model from Previous Iteration; Start Debugging; Already tried many different ML approaches? (Yes / No).

Outcomes: Training Machine Learner; High Accuracy, Small Capacity? Yes: Congrats!; High Accuracy only with Large Capacity: Insufficient Labelled Data; Low Accuracy even with Large Capacity, aka "Overfitting": Mismatched Representation Function(s); Distrust Estimators.

Gerald Friedland, v0.4 Jan 2nd, 2019 [email protected]

Conclusions so far

▪ The lower limit of generalization is memorization. That is, the upper limit for the size of a machine learner is its memory capacity.
▪ The memory capacity is measurable in bits.
▪ Using a machine learner that is over capacity is a waste of resources and increases the risk of failure!
▪ Alchemy converted into chemistry by measuring: It's time to convert guessing and checking in Machine Learning into science! Let's call it data science?
▪ To ditto:
  ▪ Non-binary classifiers, regression
  ▪ Convolutional networks, other machine learners
  ▪ Re-thinking training
  ▪ Explainable adversarial examples

Predicting Capacity Requirements

Given data and labels: How much actual capacity do I need to memorize the function?

Theoretical answer: What is the minimum description length of the table representing the function f (that is, Shannon entropy)?

Practical answer:
1) Worst case: Let's build a neural network where only the biases are trained.
2) Expected case: How much parameter reduction can (exponential) training buy us?
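
As a rough illustration of the theoretical answer, the sketch below computes the Shannon entropy of the label column and multiplies it by the number of rows. Treating the labels as independent draws is an assumption made here, so the result is only an estimate of the bits needed to memorize the table, not a true minimum description length. The function names are hypothetical.

```python
import math
from collections import Counter


def label_entropy_bits(labels):
    """Shannon entropy of the label distribution, in bits per label."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def table_bits_estimate(labels):
    """Estimated bits to memorize the label column (assumes independent labels)."""
    return len(labels) * label_entropy_bits(labels)


# Hypothetical balanced binary labels: about 1 bit per row.
print(table_bits_estimate([0, 1, 0, 1]))
```

A table whose labels are all identical comes out at zero bits under this estimate, which matches the intuition that a constant function needs no capacity to memorize.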

Predicting Maximum Memory Capacity

[Figure: "Dumb" Network. Inputs x1, x2, ...; edge labels 0, 1, +1, -1, +/-1; biases b1, b2, ..., bmxn.]

Runtime: O(n log n)

Predicting Memory Capacity

Dumb Network:
• Highly inefficient.
• Potentially not 100% accurate (hash collisions).
• We can assume that training weights (and biases) gets 100% accuracy while reducing parameters.

Expected Reduction: Exponential! n thresholds should be representable with log2 n weights and biases (search tree!).
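
The worst-case and expected-case counts can be sketched as follows. This is a minimal sketch under stated assumptions, not necessarily the procedure in the course tools: each row is projected onto the sum of its features (one simple choice assumed here), rows are sorted by that value, and every label change along the sorted order counts as one threshold. The `+ 1` inside the logarithm is also an assumption, made so that a single threshold still costs one bit. The function names are hypothetical.

```python
import math


def count_thresholds(rows, labels):
    """Count label changes after sorting rows by the sum of their features."""
    # Sorting dominates the cost, giving O(n log n) for n rows.
    projected = sorted(
        ((sum(row), label) for row, label in zip(rows, labels)),
        key=lambda pair: pair[0],
    )
    thresholds = 0
    for (_, previous), (_, current) in zip(projected, projected[1:]):
        if previous != current:
            thresholds += 1
    return thresholds


def expected_bits(thresholds):
    """Expected case: logarithmic in the number of thresholds (assumed +1)."""
    return math.log2(thresholds + 1)


# Hypothetical example: the XOR truth table.
rows = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 1, 1, 0]
n_thresholds = count_thresholds(rows, labels)
print(n_thresholds, expected_bits(n_thresholds))
```

In the worst case one bias is spent per threshold, while the expected case grows only logarithmically, which is the exponential reduction the slide refers to.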

Empirical Results

All results repeatable at: https://github.com/fractor/nntailoring

From Memorization to Generalization

Good news:
• Real-world data is not random.
• The information capacity of a perceptron is usually > 1 bit per parameter (Cover, MacKay).

This means we should be able to use fewer parameters than predicted by memory capacity calculations.

Memorization is worst-case generalization.

Suggested Engineering Process for Generalization

• Start at approximate expected capacity.
• Train to > 98% accuracy. If impossible, increase parameters.
• Retrain iteratively with decreased capacity while testing against validation set. Should see: decrease in training accuracy with increase in validation set accuracy.
• Stop at minimum capacity for best held-out set accuracy.

Best case scenario: As parameters are reduced, neural network fails to memorize only the insignificant (noise) bits.
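
The retrain-and-reduce loop can be sketched as below. This is an illustration under assumptions: `train_and_score` is a hypothetical callback standing in for whatever training code is used, halving the capacity at each step is an arbitrary schedule chosen here, and the accuracy numbers in the usage example are made up.

```python
def sweep_capacities(train_and_score, start_capacity, min_capacity=1):
    """Retrain at successively halved capacities and record both accuracies."""
    results = []
    capacity = start_capacity
    while capacity >= min_capacity and capacity > 0:
        train_acc, val_acc = train_and_score(capacity)
        results.append((capacity, train_acc, val_acc))
        capacity //= 2  # assumed schedule: halve the capacity each round
    return results


def pick_capacity(results):
    """Smallest capacity that reaches the best validation accuracy."""
    best_val = max(val_acc for _, _, val_acc in results)
    return min(cap for cap, _, val_acc in results if val_acc == best_val)


# Made-up scores standing in for real training runs.
fake_scores = {
    64: (1.00, 0.80),
    32: (0.99, 0.84),
    16: (0.95, 0.88),
    8: (0.93, 0.88),
    4: (0.80, 0.75),
}
results = sweep_capacities(lambda c: fake_scores[c], start_capacity=64, min_capacity=4)
print(pick_capacity(results))
```

In the made-up numbers, training accuracy falls while validation accuracy rises as capacity shrinks, and the selection stops at the smallest capacity that still achieves the best held-out accuracy.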

Generalization Process: Expected Curve

Overcapacity Machine Learning: Issues

▪ Waste of money, energy, and time. Bad for environment.

▪ Unseen (redundant bits) are a necessary condition for adversarial examples. See: B. Li, G. Friedland, J. Wang, R. Jia, C. Spanos, D. Song: "One Bit Matters: Explaining Adversarial Examples as the Abuse of Redundancy", submitted to ICLR 2019.

▪ Fewer parameters give a higher chance for explainability (Occam's Razor). See: G. Friedland, A. Metere: "Machine Learning for Science", UQ SciML Workshop, Los Angeles, June 2018.

Reminder: Occam's Razor

Among competing hypotheses, the one with the fewest assumptions should be selected.

For each accepted explanation of a phenomenon, there may be an extremely large, perhaps even incomprehensible, number of possible and more complex alternatives, because one can always burden failing explanations with ad hoc hypotheses to prevent them from being falsified; therefore, simpler theories are preferable to more complex ones because they are more testable. (Wikipedia, Sep. 2017)

Demo time!

▪ Intro to tools on github
▪ T(n,k) calculation
▪ Capacity estimation given table
▪ Capacity progression
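
The slides do not define T(n,k). Assuming it refers to the function-counting quantity from Cover and MacKay, cited earlier in the lecture (the number of labelings of n points in general position that a perceptron with k parameters can realize), a minimal sketch is:

```python
import math


def T(n, k):
    """Assumed definition: T(n, k) = 2 * sum_{i=0}^{k-1} C(n-1, i), for n >= 1."""
    return 2 * sum(math.comb(n - 1, i) for i in range(k))


# 4 points with 3 parameters (2 weights plus a bias): 14 of the 16 labelings.
print(T(4, 3))
# With n <= k every labeling is realizable, so T(n, k) = 2**n.
print(T(3, 3))
```

Under this definition, the two labelings missing from T(4, 3) correspond to XOR and its complement, which a single perceptron cannot separate.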