Transcript of Stats 5 Seminar: Machine Learning (smyth/courses/stats5/onlineslides/...) Feb 20: John Brock, Cylance, Inc...
Stats 5 Seminar: Machine Learning
Winter 2018
Professor Padhraic Smyth, Departments of Computer Science and Statistics, University of California, Irvine
P. Smyth, Stats 5: Data Science Seminar, Winter 2018
Class Organization
• Meet weekly for a 40-minute seminar with 5-10 minutes of discussion
• 8 topics (with guest speakers), weeks 2 through 9
– You are encouraged to ask questions during and after the talks
• Intro and wrap-up talks in weeks 1 and 10
• Class website is at www.ics.uci.edu/~smyth/courses/stats5
– Slides and related materials will be posted during the quarter
Schedule of Lectures

Date | Speaker | Department or Organization | Topic
Jan 9 | Padhraic Smyth | Computer Science | Introduction to Data Science
Jan 16 | Padhraic Smyth | Computer Science | Classification Algorithms in Machine Learning
Jan 23 | Michael Carey | Computer Science | Databases and Data Management
Jan 30 | Sameer Singh | Computer Science | Statistical Natural Language Processing
Feb 6 | Zhaoxia Yu | Statistics | An Introduction to Cluster Analysis
Feb 13 | Erik Sudderth | Computer Science | Computer Vision and Machine Learning
Feb 20 | John Brock | Cylance, Inc. | Data Science and Cyber Security
Feb 27 | Video Lecture (Kate Crawford) | Microsoft Research and NYU | Bias in Machine Learning
Mar 6 | Matt Harding | Economics | Data Science in Economics and Finance
Mar 13 | Padhraic Smyth | Computer Science | Review: Past and Future of Data Science
Submission of Review Forms (Weeks 2 to 10)
• Submit review forms for lectures 2 through 10
• Available at http://www.ics.uci.edu/~smyth/courses/stats5/Forms/
• Review forms will be available online at the start of each class
– A few relatively short questions based on the lecture that day
– Needs to be submitted to EEE by 12:15 for each lecture
– Bring your laptop or other device
• Requirements to pass the class
– Attend and submit a review form for at least 8 lectures for weeks 2 through 10 (allowed to miss one if you need to for some reason)
• No final exam: pass/fail based on attendance and review forms
Outline of Today's Topic
• What is machine learning?
• Classification algorithms
• Examples from image and sequence classification
• Conclusions and discussion
[Acknowledgement to Professor Alex Ihler for various slides and figures in this lecture]
What is Machine Learning?
Machine Learning (ML)
• Learning models from data
• Making predictions (or decisions)
• Getting better with experience (data)
• Problems whose solutions are "hard to describe"
Types of Machine Learning Problems
• Supervised learning
– "Labeled" training data
– Every example has a desired target value (a "known answer")
– Reward predictions close to the target; penalize predictions with large errors
– Classification: a discrete-valued prediction
– Regression: a continuous-valued prediction
Types of Machine Learning Problems
• Supervised learning
– "Labeled" training data
– Every example has a desired target value (a "best answer")
– Reward prediction being close to the target
– Classification: a discrete-valued prediction
– Regression: a continuous-valued prediction
– Recommender systems
[Figure: a sparse users-by-movies matrix of ratings from 1 to 5, with many missing entries and a "?" marking a rating to be predicted]
Types of Machine Learning Problems
• Supervised learning
– Training data has labels or target values
• Unsupervised learning
– Training data has no labels or target values
– Interested in discovering natural structure in the data
– Often used in exploration of data, e.g., in science, in business
– Examples:
• Clustering customers or medical patients into groups
• Discovering a numerical representation of words or movies
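To make the clustering example concrete, here is a toy sketch of grouping customers into k = 2 clusters with a few rounds of the classic k-means updates; the spending numbers and the choice of k-means are illustrative assumptions, not part of the lecture.

```python
# A toy sketch of the clustering idea: group customers by annual spend
# (made-up numbers) into k=2 clusters via a few k-means-style updates.
spend = [10, 12, 11, 90, 95, 88]   # two natural groups: low and high spenders
centers = [spend[0], spend[-1]]    # crude initial guesses for the two centers

for _ in range(10):
    # Assign each point to its nearest center, then move each center to the
    # mean of the points assigned to it.
    groups = [[], []]
    for s in spend:
        groups[0 if abs(s - centers[0]) <= abs(s - centers[1]) else 1].append(s)
    centers = [sum(g) / len(g) for g in groups]

print(sorted(groups[0]), sorted(groups[1]), centers)
```

On this data the two groups separate immediately and the centers settle at the group means.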
Data in 2 Dimensions with 5 Clusters
See the lecture by Prof. Zhaoxia Yu later this quarter on clustering algorithms
Embeddings of Words as Vectors
From: https://www.mathworks.com/help/examples/textanalytics/
Figure from Koren, Bell, Volinsky, IEEE Computer, 2009
Types of Machine Learning Problems
• Supervised learning
• Unsupervised learning
• Reinforcement learning
– Algorithm gets indirect feedback on its progress (rather than correct/incorrect)
– E.g., a program learning to play chess, or Go, or a video game
– E.g., an autonomous vehicle learning how to navigate a city
– Mathematical models for delayed reward, credit assignment, explore/exploit
Classification using Supervised Learning
Learning a Classification Model

Training Data

Patient ID | Zip Code | Age | ... | Test Score | Diagnosis
18261 | 92697 | 55 | ... | 83 | 1
42356 | 92697 | 19 | ... | 99 | 1
00219 | 90001 | 35 | ... | 21 | 0
83726 | 24351 | 0 | ... | 35 | 0

The learning algorithm learns a function that takes the values on the left to predict the value (the diagnosis) on the right.
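As a minimal sketch of what "learning a function" from this table could look like, here is a hypothetical one-feature threshold rule on the Test Score column, with the threshold chosen to minimize training error. A real learner would use all the features; this simplification is mine, not the lecture's.

```python
# A minimal illustration of learning a function from the training table:
# a one-feature threshold rule on Test Score, chosen to minimize error
# on the four training examples shown above.
train = [(83, 1), (99, 1), (21, 0), (35, 0)]  # (test score, diagnosis)

def errors(threshold):
    # Predict diagnosis 1 when score >= threshold; count training mistakes.
    return sum((score >= threshold) != bool(label) for score, label in train)

# Try each observed score as a candidate threshold; keep the best one.
best = min((s for s, _ in train), key=errors)
print(best, errors(best))  # a perfect split exists between 35 and 83
```

The rule "predict 1 if Test Score >= 83" makes zero mistakes on this tiny training set.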
Making Predictions with a Classification Model

Training Data

Patient ID | Zip Code | Age | ... | Test Score | Diagnosis
18261 | 92697 | 55 | ... | 83 | 1
42356 | 92697 | 19 | ... | 99 | 1
00219 | 90001 | 35 | ... | 21 | 0
83726 | 24351 | 0 | ... | 35 | 0

Test Data

12837 | 92697 | 40 | ... | 70 | ??
72623 | 92697 | 32 | ... | 44 | ??

We can then use the model to make predictions when the target values are unknown.
[Scatter plot, AGE (x-axis, 0-90) vs. MONTHLY INCOME (y-axis, 0-14000): each dot is a 2-dimensional point representing one person = [AGE, MONTHLY INCOME]]
[Scatter plot, AGE vs. MONTHLY INCOME: blue dots = good loans, red dots = bad loans; two candidate separating lines are drawn, labeled "Good boundary?" and "Better boundary?"]
[Scatter plot, AGE vs. MONTHLY INCOME: a much more complex boundary, but perhaps overfitting to noise?]
Basic Concepts
• The curve represents a classifier (a model, a predictor)
– Points on one side of the line get classified as one class
– Points on the other side get classified as the other class
– Once we know the curve we can take new points and classify them
• The curve is represented internally by a set of coefficients
– These are also known as "parameters" or "weights"
• The algorithm systematically adjusts the coefficients on training data to reduce the error as much as it can
• This process of finding the weights is known as "learning a model"
• Foundational ideas are from statistics and optimization
[Scatter plot, AGE vs. MONTHLY INCOME: a line showing the initial guess for the coefficients (not very good, high error)]
[Scatter plot, AGE vs. MONTHLY INCOME: the initial guess for the coefficients (not very good, high error) alongside the final solution for the coefficients (much better, low error)]
[3D scatter plot with axes AGE, MONTHLY INCOME, and ASSETS]
Now each dot is a 3-dimensional point representing one person = [AGE, MONTHLY INCOME, ASSETS]
Our boundary line will now become a plane
How Does this Work in Practice?
• We use computer algorithms to search for the best line or curve
• These search algorithms are quite simple:
1. Start with an initial random guess for the coefficients
2. Change the coefficients slightly to reduce the error (can use calculus to do this)
3. Move to the new coefficients
4. Keep repeating until "convergence"
• This search can be done in 10, 100, 1000, or 1 million "dimensions"... with 10s of millions of examples
• This search process is at the core of machine learning algorithms
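The four steps above can be sketched in a few lines. This toy version uses random perturbations instead of calculus to "change the coefficients slightly"; the data, the line model y = a*x + b, and the step sizes are all invented for illustration.

```python
import random

# A sketch of the search described above: start from a random guess for the
# coefficients of a line y = a*x + b, repeatedly try a small random change,
# and keep the change only if it reduces the error on the training data.
random.seed(0)
data = [(x, 2 * x + 1) for x in range(10)]   # toy points on the line y = 2x + 1

def error(a, b):
    # Sum of squared prediction errors over the toy training data.
    return sum((a * x + b - y) ** 2 for x, y in data)

a, b = random.random(), random.random()       # 1. initial random guess
for _ in range(20000):                        # 4. keep repeating
    da, db = random.gauss(0, 0.01), random.gauss(0, 0.01)
    if error(a + da, b + db) < error(a, b):   # 2. does the small change help?
        a, b = a + da, b + db                 # 3. move to the new coefficients

print(round(a, 2), round(b, 2))  # close to the true values 2 and 1
```

Calculus-based methods (gradient descent, discussed later in the lecture) find the improving direction directly instead of guessing it.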
Key Points
• We represent our training data as points in a multi-dimensional space
– How do we obtain the labels for the data points?
• We want to find a boundary curve that can separate points into two classes
• The curves are represented by sets of coefficients (or weights)
• Machine learning algorithms use search (or optimization) to automatically find the coefficients with the lowest error on the training data
If the Model is too Complex it can Overfit
[Figure: four panels of the same (x, y) data, fit by curves of increasing complexity, labeled "Data", "Too simple?", "About right?", and "Too complex?"]
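The panels above can be mimicked numerically. In this sketch the 1-D data (roughly y = 2x plus noise), the three models, and all the numbers are invented: a constant is too simple, a least-squares line is about right, and a polynomial that passes exactly through every training point is too complex.

```python
# Under- vs. overfitting on made-up 1-D data (roughly y = 2x + noise).
train = [(0, 0.1), (1, 2.3), (2, 3.8), (3, 6.2), (4, 7.9)]

def mse(predict, data):
    # Mean squared error of a prediction function on a dataset.
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

mean_y = sum(y for _, y in train) / len(train)

def constant(x):          # too simple: ignore x entirely
    return mean_y

# About right: least-squares line y = a*x + b (closed form for one feature).
n = len(train)
sx = sum(x for x, _ in train)
sy = sum(y for _, y in train)
sxx = sum(x * x for x, _ in train)
sxy = sum(x * y for x, y in train)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n

def line(x):
    return a * x + b

def interp(x):            # too complex: Lagrange polynomial through all points
    total = 0.0
    for i, (xi, yi) in enumerate(train):
        w = 1.0
        for j, (xj, _) in enumerate(train):
            if j != i:
                w *= (x - xj) / (xi - xj)
        total += yi * w
    return total

for name, f in [("constant", constant), ("line", line), ("interpolating", interp)]:
    print(name, "training error:", round(mse(f, train), 4))
# Training error falls to zero as the model grows more complex, but the
# zero-error interpolating curve is fitting the noise, not the trend.
```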
Neural Network Classifiers
Machine Learning Notation

Features x: e.g., pixel inputs (usually a multidimensional vector)
Targets y: e.g., the true label for an image: "cat" or "no cat"
Predictions ŷ: e.g., the model's prediction given the inputs, e.g., "cat"
Error e(y, ŷ): e.g., e = 0 if the prediction matches the target, 1 otherwise
Parameters θ: e.g., the weights/coefficients specifying the model
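The notation above, made concrete for a single hypothetical example (the feature values and labels here are made up):

```python
# One example in the notation above.
x = [0.0, 0.5, 1.0]          # features: e.g., three pixel intensities
y = "cat"                    # target: the true label
y_hat = "cat"                # prediction: the model's output for x

def e(y, y_hat):
    # 0/1 error: 0 if the prediction matches the target, 1 otherwise.
    return 0 if y == y_hat else 1

print(e(y, y_hat), e("cat", "no cat"))  # 0 1
```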
Example: A Simple Linear Model
[Diagram: inputs x1, x2, x3 and a constant +1 node, each connected by an arrow to the output f(x)]
The machine learning algorithm will learn a weight for each arrow in the diagram.
This is a simple model: one weight per input.
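In code, the model in the diagram is just a weighted sum: one weight per input arrow plus a weight for the constant +1 node (the bias). The weight values below are made up for illustration; learning would adjust them.

```python
# The simple linear model: f(x) = w1*x1 + w2*x2 + w3*x3 + b.
def f(x, w, b):
    # Weighted sum of the inputs plus the bias term (the "+1" node's weight).
    return sum(wi * xi for wi, xi in zip(w, x)) + b

print(round(f([1.0, 2.0, 3.0], [0.5, -0.2, 0.1], 0.3), 2))  # 0.5 - 0.4 + 0.3 + 0.3 = 0.7
```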
A Simple Neural Network
[Diagram: inputs x1, x2, x3 and +1 feed into a hidden layer of three units, whose outputs combine into the output f(x)]
Here the model learns 3 different functions and then combines the outputs of the 3 to make a prediction.
This is more complex and has more parameters than the simple model.
Deep Learning: Models with More Hidden Layers
[Diagram: inputs x1, x2, x3 and +1 feed into Hidden Layer 1, then Hidden Layer 2, then the output f(x)]
We can build on this idea to create "deep models" with many hidden layers.
Very flexible and complex functions.
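A forward pass through such a network is just repeated "weighted sum, then nonlinearity". This sketch uses a sigmoid nonlinearity (the slides don't specify one) and made-up weights; learning would adjust the weights, as before.

```python
import math

# A forward pass through a tiny network with two hidden layers, matching the
# picture above. Every weight and bias here is invented for illustration.
def layer(inputs, weights, biases):
    # One unit per (weight row, bias): sigmoid(w . inputs + b).
    return [1 / (1 + math.exp(-(sum(w * x for w, x in zip(row, inputs)) + b)))
            for row, b in zip(weights, biases)]

x = [1.0, 0.5, -0.5]                                            # three inputs
h1 = layer(x, [[0.2, -0.1, 0.4], [0.3, 0.3, 0.3]], [0.0, 0.1])  # hidden layer 1
h2 = layer(h1, [[0.5, -0.5], [0.1, 0.9]], [0.0, 0.0])           # hidden layer 2
y = layer(h2, [[1.0, -1.0]], [0.0])[0]                          # output f(x)
print(round(y, 3))
```

Stacking more `layer` calls gives deeper, more flexible functions, at the cost of more parameters to learn.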
Example of a Network for Image Recognition
Figure from http://parse.ele.tue.nl/
Mathematically this is just a function (a complicated one)
A Brief History of Neural Networks...
• The Perceptron Era: 1950s and 60s
– Great optimism with perceptrons (linear models)...
– ...until Minsky, 1969: perceptrons had limited representation power
– Hard problems require hidden layers... but there was no training algorithm
• The Backpropagation Era: Late 1980s to mid-90s
– Invention of backpropagation: training of models with hidden layers
– Wild enthusiasm (in the US at least)... NIPS conference, funding, etc.
– Mid-1990s: enthusiasm dies out: training deep NNs is hard
• The Deep Learning Era: 2010-present
– 3rd wave of neural network enthusiasm
– What happened since the mid-90s?
• Much larger datasets
• Much greater computational power
• Fast optimization techniques
Learning via Gradient Descent
Finding Good Parameters
• We want to find parameters θ which minimize our error...
• Think of a cost "surface": the error residual for that θ...
Gradient Descent
• How to change θ to improve J(θ)?
• Choose a direction in which J(θ) is decreasing
Gradient Descent
• How to change θ to improve J(θ)?
• Choose a direction in which J(θ) is decreasing
• The derivative tells us which way that is:
– Positive => J is increasing
– Negative => J is decreasing
Gradient Descent in More Dimensions
• Gradient vector
• Indicates the direction of steepest ascent (negative = steepest descent)
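In standard notation (consistent with the slides' J and θ; the step size α is an added symbol not named on the slides), the gradient vector and the descent update it suggests are:

```latex
\nabla J(\theta) = \left( \frac{\partial J}{\partial \theta_1}, \ldots, \frac{\partial J}{\partial \theta_n} \right),
\qquad
\theta \leftarrow \theta - \alpha \, \nabla J(\theta)
```

Subtracting the gradient moves θ in the direction of steepest descent, shrinking J(θ) for a small enough step size α.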
Comments on Gradient Descent
• Simple and general algorithm
– Usable in a broad variety of models
• Local minima
– Sensitive to the starting point
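The sensitivity to the starting point can be shown on a tiny example. The cost function below is invented for illustration: it is a 1-D curve with two local minima, and plain gradient descent lands in a different one depending on where it starts.

```python
# Gradient descent on a made-up 1-D cost with two minima,
# J(theta) = theta^4 - 2*theta^2 + 0.5*theta.
def grad(theta):
    # Derivative of J: 4*theta^3 - 4*theta + 0.5.
    return 4 * theta**3 - 4 * theta + 0.5

def descend(theta, step=0.01, iters=2000):
    # Repeatedly move against the gradient from the given starting point.
    for _ in range(iters):
        theta -= step * grad(theta)
    return theta

# Two starting points, two different answers: one negative minimum,
# one positive minimum.
print(round(descend(-2.0), 3), round(descend(2.0), 3))
```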
Image Classification Examples
Example: Classifying Handwritten Digits
What the data looks like to the human eye
Inputs: pixel values from each image
Output: 10 possible classes (0, 1, ..., 9)
Pixel Inputs Represented Numerically
From https://www.tensorflow.org/get_started/mnist/beginners
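The same idea in miniature: a tiny grayscale "image" as a grid of numbers, flattened into the feature vector x a classifier consumes. The 3x3 image and its values are made up; a real digit image would be larger (e.g., 28x28 gives 784 features).

```python
# A tiny 3x3 grayscale "image" with intensities in [0, 1], flattened into a
# feature vector. The values sketch a faint vertical stroke, like a tiny "1".
image = [
    [0.0, 0.8, 0.0],
    [0.0, 0.9, 0.0],
    [0.0, 0.7, 0.0],
]

x = [pixel for row in image for pixel in row]  # flatten 3x3 -> 9 features
print(len(x), x[:3])  # 9 [0.0, 0.8, 0.0]
```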
Example: Classifying Handwritten Digits
Classification accuracy has gone from 93% to 99.9% in the past 10 years
Examples of Errors Made by the Neural Network Classifier
Image from http://neuralnetworksanddeeplearning.com/chap6.html
Human label ("truth")
Label predicted by the classifier
Russakovsky et al., ImageNet Large Scale Visual Recognition Challenge, 2015
Deep network architecture for the GoogLeNet network, 27 layers
Training data: inputs x = raw pixel values; labels y = values from 1 to 1000
Trained on millions of images
How is the network structure determined? Essentially trial-and-error (expensive!)
Figure from Kevin Murphy, Google, 2016
Figure from Krizhevsky, Sutskever, Hinton, 2012
Figure from Lee et al., ICML 2009
Sequence Prediction Examples
Learning by Predicting What's Next
• Examples
– Predict the next word a person will type or speak, given the words up to this point
– Predict the value of the Dow Jones tomorrow afternoon, given its history
• We can use the same general methodologies as before
– The model now uses past data to predict the next event
• Applications
– Speech recognition
– Auto-suggest in human typing
– Machine translation
– Consumer modeling
– Chatbots
– ... and more
Example: Predicting the Next Character
Figure from http://cs.stanford.edu/people/karpathy/recurrentjs/
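A crude stand-in for next-character prediction: count, in a tiny made-up training string, which character most often follows each character, and predict that. The recurrent networks in these examples are far more powerful; this count-based bigram model is only a sketch of the task itself.

```python
from collections import Counter, defaultdict

# Count which character follows each character in a tiny training text,
# then predict the most frequent successor.
text = "hello hello help"
follows = defaultdict(Counter)
for a, b in zip(text, text[1:]):
    follows[a][b] += 1

def predict_next(ch):
    # Most frequent successor of ch in the training text.
    return follows[ch].most_common(1)[0][0]

print(predict_next("h"))  # 'e' always follows 'h' in the training text
```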
Example: Predicting Characters with a Recurrent Network
Figure from http://cs.stanford.edu/people/karpathy/recurrentjs/
Output from a Model Learned on Shakespeare
KING LEAR: O, if you were a feeble sight, the courtesy of your law, Your sight and several breath, will wear the gods With his heads, and my hands are wonder'd at the deeds, So drop upon your lordship's head, and your opinion Shall be against your honour.
Second Senator: They are away this miseries, produced upon my soul, Breaking and strongly should be buried, when I perish The earth and thoughts of many states.
DUKE VINCENTIO: Well, your wit is in the care of side and that.
Examples from "The Unreasonable Effectiveness of Recurrent Neural Networks", Andrej Karpathy, blog, http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Output from a Model Learned on Cooking Recipes
From https://gist.github.com/nylki/1efbaa36635956d35bcc
Output from a Model Learned on Source Code
Examples from "The Unreasonable Effectiveness of Recurrent Neural Networks", Andrej Karpathy, blog, http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Output from a Model Learned on Mathematics Papers
Examples from "The Unreasonable Effectiveness of Recurrent Neural Networks", Andrej Karpathy, blog, http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Output from a Model Learned from US Presidential Speeches
From https://medium.com/@samim/
Limitations of Classification Algorithms
A Deep Neural Network for Image Recognition
From Nguyen, Yosinski, Clune, CVPR 2015
A Deep Neural Network for Image Recognition
[Figure panels: "Images used for Training" and "New Images"]
From Nguyen, Yosinski, Clune, CVPR 2015
Schedule of Lectures

Date | Speaker | Department or Organization | Topic
Jan 9 | Padhraic Smyth | Computer Science | Introduction to Data Science
Jan 16 | Padhraic Smyth | Computer Science | Machine Learning
Jan 23 | Michael Carey | Computer Science | Databases and Data Management
Jan 30 | Sameer Singh | Computer Science | Statistical Natural Language Processing
Feb 6 | Zhaoxia Yu | Statistics | An Introduction to Cluster Analysis
Feb 13 | Erik Sudderth | Computer Science | Computer Vision and Machine Learning
Feb 20 | John Brock | Cylance, Inc. | Data Science and Cyber Security
Feb 27 | Video Lecture (Kate Crawford) | Microsoft Research and NYU | Bias in Machine Learning
Mar 6 | Matt Harding | Economics | Data Science in Economics and Finance
Mar 13 | Padhraic Smyth | Computer Science | Review: Past and Future of Data Science