Transcript of Stats 5 Seminar: Machine Learning (smyth/courses/stats5/onlineslides/...) Feb 20: John Brock, Cylance, Inc...
Stats 5 Seminar: Machine Learning
Winter 2018
Professor Padhraic Smyth, Departments of Computer Science and Statistics, University of California, Irvine
P. Smyth, Stats 5: Data Science Seminar, Winter 2018
Class Organization
• Meet weekly for a 40-minute seminar with 5-10 minutes of discussion
• 8 topics (with guest speakers), weeks 2 through 9
– You are encouraged to ask questions during and after the talks
• Intro and wrap-up talks in weeks 1 and 10
• Class website is at www.ics.uci.edu/~smyth/courses/stats5
– Slides and related materials will be posted during the quarter
Schedule of Lectures

Date | Speaker | Department or Organization | Topic
Jan 9 | Padhraic Smyth | Computer Science | Introduction to Data Science
Jan 16 | Padhraic Smyth | Computer Science | Classification Algorithms in Machine Learning
Jan 23 | Michael Carey | Computer Science | Databases and Data Management
Jan 30 | Sameer Singh | Computer Science | Statistical Natural Language Processing
Feb 6 | Zhaoxia Yu | Statistics | An Introduction to Cluster Analysis
Feb 13 | Erik Sudderth | Computer Science | Computer Vision and Machine Learning
Feb 20 | John Brock | Cylance, Inc. | Data Science and Cyber Security
Feb 27 | Video Lecture (Kate Crawford) | Microsoft Research and NYU | Bias in Machine Learning
Mar 6 | Matt Harding | Economics | Data Science in Economics and Finance
Mar 13 | Padhraic Smyth | Computer Science | Review: Past and Future of Data Science
Submission of Review Forms (Weeks 2 to 10)
• Submit review forms for lectures 2 through 10
• Available at http://www.ics.uci.edu/~smyth/courses/stats5/Forms/
• Review forms will be available online at the start of each class
– A few relatively short questions based on the lecture that day
– Needs to be submitted to EEE by 12:15 for each lecture
– Bring your laptop or other device
• Requirements to pass the class
– Attend and submit a review form for at least 8 lectures for weeks 2 through 10 (allowed to miss one if you need to for some reason)
• No final exam: pass/fail based on attendance and review forms
Outline of Today's Topic
• What is machine learning?
• Classification algorithms
• Examples from image and sequence classification
• Conclusions and discussion
[Acknowledgement to Professor Alex Ihler for various slides and figures in this lecture]
What is Machine Learning?
Machine Learning (ML)
• Learning models from data
• Making predictions (or decisions)
• Getting better with experience (data)
• Problems whose solutions are "hard to describe"
Types of Machine Learning Problems
• Supervised learning
– "Labeled" training data
– Every example has a desired target value (a "known answer")
– Reward predictions close to the target; penalize predictions with large errors
– Classification: a discrete-valued prediction
– Regression: a continuous-valued prediction
Types of Machine Learning Problems
• Supervised learning
– "Labeled" training data
– Every example has a desired target value (a "best answer")
– Reward prediction being close to the target
– Classification: a discrete-valued prediction
– Regression: a continuous-valued prediction
– Recommender systems
[Figure: a sparse users-by-movies matrix of ratings from 1 to 5, with many missing entries and a "?" marking a rating to be predicted]
Types of Machine Learning Problems
• Supervised learning
– Training data has labels or target values
• Unsupervised learning
– Training data has no labels or target values
– Interested in discovering natural structure in the data
– Often used in exploration of data, e.g., in science, in business
– Examples:
• Clustering customers or medical patients into groups
• Discovering a numerical representation of words or movies
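To make the clustering example concrete, here is a toy sketch of grouping customers into k = 2 clusters with a few rounds of the classic k-means updates; the spending numbers and the choice of k-means are illustrative assumptions, not part of the lecture.

```python
# A toy sketch of the clustering idea: group customers by annual spend
# (made-up numbers) into k=2 clusters via a few k-means-style updates.
spend = [10, 12, 11, 90, 95, 88]   # two natural groups: low and high spenders
centers = [spend[0], spend[-1]]    # crude initial guesses for the two centers

for _ in range(10):
    # Assign each point to its nearest center, then move each center to the
    # mean of the points assigned to it.
    groups = [[], []]
    for s in spend:
        groups[0 if abs(s - centers[0]) <= abs(s - centers[1]) else 1].append(s)
    centers = [sum(g) / len(g) for g in groups]

print(sorted(groups[0]), sorted(groups[1]), centers)
```

On this data the two groups separate immediately and the centers settle at the group means.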
Data in 2 Dimensions with 5 Clusters
See the lecture by Prof. Zhaoxia Yu later this quarter on clustering algorithms
Embeddings of Words as Vectors
From: https://www.mathworks.com/help/examples/textanalytics/
Figure from Koren, Bell, Volinsky, IEEE Computer, 2009
Types of Machine Learning Problems
• Supervised learning
• Unsupervised learning
• Reinforcement learning
– Algorithm gets indirect feedback on its progress (rather than correct/incorrect)
– E.g., a program learning to play chess, or Go, or a video game
– E.g., an autonomous vehicle learning how to navigate a city
– Mathematical models for delayed reward, credit assignment, explore/exploit
Classification using Supervised Learning
Learning a Classification Model

Training Data

Patient ID | Zip Code | Age | ... | Test Score | Diagnosis
18261 | 92697 | 55 | ... | 83 | 1
42356 | 92697 | 19 | ... | 99 | 1
00219 | 90001 | 35 | ... | 21 | 0
83726 | 24351 | 0 | ... | 35 | 0

The learning algorithm learns a function that takes the values on the left to predict the value (the diagnosis) on the right.
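As a minimal sketch of what "learning a function" from this table could look like, here is a hypothetical one-feature threshold rule on the Test Score column, with the threshold chosen to minimize training error. A real learner would use all the features; this simplification is mine, not the lecture's.

```python
# A minimal illustration of learning a function from the training table:
# a one-feature threshold rule on Test Score, chosen to minimize error
# on the four training examples shown above.
train = [(83, 1), (99, 1), (21, 0), (35, 0)]  # (test score, diagnosis)

def errors(threshold):
    # Predict diagnosis 1 when score >= threshold; count training mistakes.
    return sum((score >= threshold) != bool(label) for score, label in train)

# Try each observed score as a candidate threshold; keep the best one.
best = min((s for s, _ in train), key=errors)
print(best, errors(best))  # a perfect split exists between 35 and 83
```

The rule "predict 1 if Test Score >= 83" makes zero mistakes on this tiny training set.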
Making Predictions with a Classification Model

Training Data

Patient ID | Zip Code | Age | ... | Test Score | Diagnosis
18261 | 92697 | 55 | ... | 83 | 1
42356 | 92697 | 19 | ... | 99 | 1
00219 | 90001 | 35 | ... | 21 | 0
83726 | 24351 | 0 | ... | 35 | 0

Test Data

12837 | 92697 | 40 | ... | 70 | ??
72623 | 92697 | 32 | ... | 44 | ??

We can then use the model to make predictions when the target values are unknown.
[Scatter plot, AGE (x-axis, 0-90) vs. MONTHLY INCOME (y-axis, 0-14000): each dot is a 2-dimensional point representing one person = [AGE, MONTHLY INCOME]]
[Scatter plot, AGE vs. MONTHLY INCOME: blue dots = good loans, red dots = bad loans; two candidate separating lines are drawn, labeled "Good boundary?" and "Better boundary?"]
[Scatter plot, AGE vs. MONTHLY INCOME: a much more complex boundary, but perhaps overfitting to noise?]
Basic Concepts
• The curve represents a classifier (a model, a predictor)
– Points on one side of the line get classified as one class
– Points on the other side get classified as the other class
– Once we know the curve we can take new points and classify them
• The curve is represented internally by a set of coefficients
– These are also known as "parameters" or "weights"
• The algorithm systematically adjusts the coefficients on training data to reduce the error as much as it can
• This process of finding the weights is known as "learning a model"
• Foundational ideas are from statistics and optimization
[Scatter plot, AGE vs. MONTHLY INCOME: a line showing the initial guess for the coefficients (not very good, high error)]
[Scatter plot, AGE vs. MONTHLY INCOME: the initial guess for the coefficients (not very good, high error) alongside the final solution for the coefficients (much better, low error)]
[3D scatter plot with axes AGE, MONTHLY INCOME, and ASSETS]
Now each dot is a 3-dimensional point representing one person = [AGE, MONTHLY INCOME, ASSETS]
Our boundary line will now become a plane
How Does this Work in Practice?
• We use computer algorithms to search for the best line or curve
• These search algorithms are quite simple:
1. Start with an initial random guess for the coefficients
2. Change the coefficients slightly to reduce the error (can use calculus to do this)
3. Move to the new coefficients
4. Keep repeating until "convergence"
• This search can be done in 10, 100, 1000, or 1 million "dimensions"... with 10s of millions of examples
• This search process is at the core of machine learning algorithms
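The four steps above can be sketched in a few lines. This toy version uses random perturbations instead of calculus to "change the coefficients slightly"; the data, the line model y = a*x + b, and the step sizes are all invented for illustration.

```python
import random

# A sketch of the search described above: start from a random guess for the
# coefficients of a line y = a*x + b, repeatedly try a small random change,
# and keep the change only if it reduces the error on the training data.
random.seed(0)
data = [(x, 2 * x + 1) for x in range(10)]   # toy points on the line y = 2x + 1

def error(a, b):
    # Sum of squared prediction errors over the toy training data.
    return sum((a * x + b - y) ** 2 for x, y in data)

a, b = random.random(), random.random()       # 1. initial random guess
for _ in range(20000):                        # 4. keep repeating
    da, db = random.gauss(0, 0.01), random.gauss(0, 0.01)
    if error(a + da, b + db) < error(a, b):   # 2. does the small change help?
        a, b = a + da, b + db                 # 3. move to the new coefficients

print(round(a, 2), round(b, 2))  # close to the true values 2 and 1
```

Calculus-based methods (gradient descent, discussed later in the lecture) find the improving direction directly instead of guessing it.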
Key Points
• We represent our training data as points in a multi-dimensional space
– How do we obtain the labels for the data points?
• We want to find a boundary curve that can separate points into two classes
• The curves are represented by sets of coefficients (or weights)
• Machine learning algorithms use search (or optimization) to automatically find the coefficients with the lowest error on the training data
If the Model is too Complex it can Overfit
[Figure: four panels of the same (x, y) data, fit by curves of increasing complexity, labeled "Data", "Too simple?", "About right?", and "Too complex?"]
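The panels above can be mimicked numerically. In this sketch the 1-D data (roughly y = 2x plus noise), the three models, and all the numbers are invented: a constant is too simple, a least-squares line is about right, and a polynomial that passes exactly through every training point is too complex.

```python
# Under- vs. overfitting on made-up 1-D data (roughly y = 2x + noise).
train = [(0, 0.1), (1, 2.3), (2, 3.8), (3, 6.2), (4, 7.9)]

def mse(predict, data):
    # Mean squared error of a prediction function on a dataset.
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

mean_y = sum(y for _, y in train) / len(train)

def constant(x):          # too simple: ignore x entirely
    return mean_y

# About right: least-squares line y = a*x + b (closed form for one feature).
n = len(train)
sx = sum(x for x, _ in train)
sy = sum(y for _, y in train)
sxx = sum(x * x for x, _ in train)
sxy = sum(x * y for x, y in train)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n

def line(x):
    return a * x + b

def interp(x):            # too complex: Lagrange polynomial through all points
    total = 0.0
    for i, (xi, yi) in enumerate(train):
        w = 1.0
        for j, (xj, _) in enumerate(train):
            if j != i:
                w *= (x - xj) / (xi - xj)
        total += yi * w
    return total

for name, f in [("constant", constant), ("line", line), ("interpolating", interp)]:
    print(name, "training error:", round(mse(f, train), 4))
# Training error falls to zero as the model grows more complex, but the
# zero-error interpolating curve is fitting the noise, not the trend.
```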
Neural Network Classifiers
Machine Learning Notation

Features x: e.g., pixel inputs (usually a multidimensional vector)
Targets y: e.g., the true label for an image: "cat" or "no cat"
Predictions ŷ: e.g., the model's prediction given the inputs, e.g., "cat"
Error e(y, ŷ): e.g., e = 0 if the prediction matches the target, 1 otherwise
Parameters θ: e.g., the weights/coefficients specifying the model
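The notation above, made concrete for a single hypothetical example (the feature values and labels here are made up):

```python
# One example in the notation above.
x = [0.0, 0.5, 1.0]          # features: e.g., three pixel intensities
y = "cat"                    # target: the true label
y_hat = "cat"                # prediction: the model's output for x

def e(y, y_hat):
    # 0/1 error: 0 if the prediction matches the target, 1 otherwise.
    return 0 if y == y_hat else 1

print(e(y, y_hat), e("cat", "no cat"))  # 0 1
```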
Example: A Simple Linear Model
[Diagram: inputs x1, x2, x3 and a constant +1 node, each connected by an arrow to the output f(x)]
The machine learning algorithm will learn a weight for each arrow in the diagram.
This is a simple model: one weight per input.
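In code, the model in the diagram is just a weighted sum: one weight per input arrow plus a weight for the constant +1 node (the bias). The weight values below are made up for illustration; learning would adjust them.

```python
# The simple linear model: f(x) = w1*x1 + w2*x2 + w3*x3 + b.
def f(x, w, b):
    # Weighted sum of the inputs plus the bias term (the "+1" node's weight).
    return sum(wi * xi for wi, xi in zip(w, x)) + b

print(round(f([1.0, 2.0, 3.0], [0.5, -0.2, 0.1], 0.3), 2))  # 0.5 - 0.4 + 0.3 + 0.3 = 0.7
```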
A Simple Neural Network
[Diagram: inputs x1, x2, x3 and +1 feed into a hidden layer of three units, whose outputs combine into the output f(x)]
Here the model learns 3 different functions and then combines the outputs of the 3 to make a prediction.
This is more complex and has more parameters than the simple model.
Deep Learning: Models with More Hidden Layers
[Diagram: inputs x1, x2, x3 and +1 feed into Hidden Layer 1, then Hidden Layer 2, then the output f(x)]
We can build on this idea to create "deep models" with many hidden layers.
Very flexible and complex functions.
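A forward pass through such a network is just repeated "weighted sum, then nonlinearity". This sketch uses a sigmoid nonlinearity (the slides don't specify one) and made-up weights; learning would adjust the weights, as before.

```python
import math

# A forward pass through a tiny network with two hidden layers, matching the
# picture above. Every weight and bias here is invented for illustration.
def layer(inputs, weights, biases):
    # One unit per (weight row, bias): sigmoid(w . inputs + b).
    return [1 / (1 + math.exp(-(sum(w * x for w, x in zip(row, inputs)) + b)))
            for row, b in zip(weights, biases)]

x = [1.0, 0.5, -0.5]                                            # three inputs
h1 = layer(x, [[0.2, -0.1, 0.4], [0.3, 0.3, 0.3]], [0.0, 0.1])  # hidden layer 1
h2 = layer(h1, [[0.5, -0.5], [0.1, 0.9]], [0.0, 0.0])           # hidden layer 2
y = layer(h2, [[1.0, -1.0]], [0.0])[0]                          # output f(x)
print(round(y, 3))
```

Stacking more `layer` calls gives deeper, more flexible functions, at the cost of more parameters to learn.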
Example of a Network for Image Recognition
Figure from http://parse.ele.tue.nl/
Mathematically this is just a function (a complicated one)
A Brief History of Neural Networks...
• The Perceptron Era: 1950s and 60s
– Great optimism with perceptrons (linear models)...
– ...until Minsky, 1969: perceptrons had limited representation power
– Hard problems require hidden layers... but there was no training algorithm
• The Backpropagation Era: Late 1980s to mid-90s
– Invention of backpropagation: training of models with hidden layers
– Wild enthusiasm (in the US at least)... NIPS conference, funding, etc.
– Mid-1990s: enthusiasm dies out: training deep NNs is hard
• The Deep Learning Era: 2010-present
– 3rd wave of neural network enthusiasm
– What happened since the mid-90s?
• Much larger datasets
• Much greater computational power
• Fast optimization techniques
Learning via Gradient Descent
Finding Good Parameters
• We want to find parameters θ which minimize our error...
• Think of a cost "surface": the error residual for that θ...
Gradient Descent
• How to change θ to improve J(θ)?
• Choose a direction in which J(θ) is decreasing
Gradient Descent
• How to change θ to improve J(θ)?
• Choose a direction in which J(θ) is decreasing
• The derivative tells us which way that is:
– Positive => J is increasing
– Negative => J is decreasing
Gradient Descent in More Dimensions
• Gradient vector
• Indicates the direction of steepest ascent (negative = steepest descent)
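In standard notation (consistent with the slides' J and θ; the step size α is an added symbol not named on the slides), the gradient vector and the descent update it suggests are:

```latex
\nabla J(\theta) = \left( \frac{\partial J}{\partial \theta_1}, \ldots, \frac{\partial J}{\partial \theta_n} \right),
\qquad
\theta \leftarrow \theta - \alpha \, \nabla J(\theta)
```

Subtracting the gradient moves θ in the direction of steepest descent, shrinking J(θ) for a small enough step size α.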
Comments on Gradient Descent
• Simple and general algorithm
– Usable in a broad variety of models
• Local minima
– Sensitive to the starting point
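The sensitivity to the starting point can be shown on a tiny example. The cost function below is invented for illustration: it is a 1-D curve with two local minima, and plain gradient descent lands in a different one depending on where it starts.

```python
# Gradient descent on a made-up 1-D cost with two minima,
# J(theta) = theta^4 - 2*theta^2 + 0.5*theta.
def grad(theta):
    # Derivative of J: 4*theta^3 - 4*theta + 0.5.
    return 4 * theta**3 - 4 * theta + 0.5

def descend(theta, step=0.01, iters=2000):
    # Repeatedly move against the gradient from the given starting point.
    for _ in range(iters):
        theta -= step * grad(theta)
    return theta

# Two starting points, two different answers: one negative minimum,
# one positive minimum.
print(round(descend(-2.0), 3), round(descend(2.0), 3))
```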
Image Classification Examples
Example: Classifying Handwritten Digits
What the data looks like to the human eye
Inputs: pixel values from each image
Output: 10 possible classes (0, 1, ..., 9)
Pixel Inputs Represented Numerically
From https://www.tensorflow.org/get_started/mnist/beginners
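The same idea in miniature: a tiny grayscale "image" as a grid of numbers, flattened into the feature vector x a classifier consumes. The 3x3 image and its values are made up; a real digit image would be larger (e.g., 28x28 gives 784 features).

```python
# A tiny 3x3 grayscale "image" with intensities in [0, 1], flattened into a
# feature vector. The values sketch a faint vertical stroke, like a tiny "1".
image = [
    [0.0, 0.8, 0.0],
    [0.0, 0.9, 0.0],
    [0.0, 0.7, 0.0],
]

x = [pixel for row in image for pixel in row]  # flatten 3x3 -> 9 features
print(len(x), x[:3])  # 9 [0.0, 0.8, 0.0]
```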
Example: Classifying Handwritten Digits
Classification accuracy has gone from 93% to 99.9% in the past 10 years
Examples of Errors Made by the Neural Network Classifier
Image from http://neuralnetworksanddeeplearning.com/chap6.html
Human label ("truth")
Label predicted by the classifier
Russakovsky et al., ImageNet Large Scale Visual Recognition Challenge, 2015
Deep network architecture for the GoogLeNet network, 27 layers
Training data: inputs x = raw pixel values; labels y = values from 1 to 1000
Trained on millions of images
How is the network structure determined? Essentially trial-and-error (expensive!)
Figure from Kevin Murphy, Google, 2016
Figure from Krizhevsky, Sutskever, Hinton, 2012
Figure from Lee et al., ICML 2009
Sequence Prediction Examples
Learning by Predicting What's Next
• Examples
– Predict the next word a person will type or speak, given the words up to this point
– Predict the value of the Dow Jones tomorrow afternoon, given its history
• We can use the same general methodologies as before
– The model now uses past data to predict the next event
• Applications
– Speech recognition
– Auto-suggest in human typing
– Machine translation
– Consumer modeling
– Chatbots
– ... and more
Example: Predicting the Next Character
Figure from http://cs.stanford.edu/people/karpathy/recurrentjs/
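A crude stand-in for next-character prediction: count, in a tiny made-up training string, which character most often follows each character, and predict that. The recurrent networks in these examples are far more powerful; this count-based bigram model is only a sketch of the task itself.

```python
from collections import Counter, defaultdict

# Count which character follows each character in a tiny training text,
# then predict the most frequent successor.
text = "hello hello help"
follows = defaultdict(Counter)
for a, b in zip(text, text[1:]):
    follows[a][b] += 1

def predict_next(ch):
    # Most frequent successor of ch in the training text.
    return follows[ch].most_common(1)[0][0]

print(predict_next("h"))  # 'e' always follows 'h' in the training text
```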
Example: Predicting Characters with a Recurrent Network
Figure from http://cs.stanford.edu/people/karpathy/recurrentjs/
Output from a Model Learned on Shakespeare
KING LEAR: O, if you were a feeble sight, the courtesy of your law, Your sight and several breath, will wear the gods With his heads, and my hands are wonder'd at the deeds, So drop upon your lordship's head, and your opinion Shall be against your honour.
Second Senator: They are away this miseries, produced upon my soul, Breaking and strongly should be buried, when I perish The earth and thoughts of many states.
DUKE VINCENTIO: Well, your wit is in the care of side and that.
Examples from "The Unreasonable Effectiveness of Recurrent Neural Networks", Andrej Karpathy, blog, http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Output from a Model Learned on Cooking Recipes
From https://gist.github.com/nylki/1efbaa36635956d35bcc
Output from a Model Learned on Source Code
Examples from "The Unreasonable Effectiveness of Recurrent Neural Networks", Andrej Karpathy, blog, http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Output from a Model Learned on Mathematics Papers
Examples from "The Unreasonable Effectiveness of Recurrent Neural Networks", Andrej Karpathy, blog, http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Output from a Model Learned from US Presidential Speeches
From https://medium.com/@samim/
Limitations of Classification Algorithms
A Deep Neural Network for Image Recognition
From Nguyen, Yosinski, Clune, CVPR 2015
A Deep Neural Network for Image Recognition
[Figure panels: "Images used for Training" and "New Images"]
From Nguyen, Yosinski, Clune, CVPR 2015
Schedule of Lectures

Date | Speaker | Department or Organization | Topic
Jan 9 | Padhraic Smyth | Computer Science | Introduction to Data Science
Jan 16 | Padhraic Smyth | Computer Science | Machine Learning
Jan 23 | Michael Carey | Computer Science | Databases and Data Management
Jan 30 | Sameer Singh | Computer Science | Statistical Natural Language Processing
Feb 6 | Zhaoxia Yu | Statistics | An Introduction to Cluster Analysis
Feb 13 | Erik Sudderth | Computer Science | Computer Vision and Machine Learning
Feb 20 | John Brock | Cylance, Inc. | Data Science and Cyber Security
Feb 27 | Video Lecture (Kate Crawford) | Microsoft Research and NYU | Bias in Machine Learning
Mar 6 | Matt Harding | Economics | Data Science in Economics and Finance
Mar 13 | Padhraic Smyth | Computer Science | Review: Past and Future of Data Science