Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples §...

32
Supervised Learning Robot Image Credit: Viktoriya Sukhanova © 123RF.com These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution. Please send comments and corrections to Eric.

Transcript of Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples §...

Page 1: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

SupervisedLearning

RobotImageCredit:Viktoriya Sukhanova©123RF.com

TheseslideswereassembledbyEricEaton,withgratefulacknowledgementofthemanyotherswhomadetheircoursematerialsfreelyavailableonline.Feelfreetoreuseoradapttheseslidesforyourownacademicpurposes,providedthatyouincludeproperattribution.PleasesendcommentsandcorrectionstoEric.

Page 2: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

TheBadgesGame

Background:• Pre-registeredattendeesatthe1994MachineLearningConferencereceivedanamebadgelabeledwitha"+"or"-"

• Thelabelisbasedonly uponthename• Thereare294examples(210positiveand84negative)

Whatfunctionwasusedtogeneratethe+/- labeling?

+NaokiAbe - EricBaum

Page 3: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

TrainingData

3

+NaokiAbe- Myriam Abramson+DavidW.Aha+KamalM.Ali- EricAllender+DanaAngluin- Chidanand Apte+MinoruAsada+LarsAsker+Javed Aslam+JoseL.Balcazar- CristinaBaroglio

+PeterBartlett- EricBaum+Welton Becket- Shai Ben-David+GeorgeBerg+NeilBerkman+Malini Bhandaru+Bir Bhanu+Reinhard Blasig- Avrim Blum- AnselmBlumer+JustinBoyan

+CarlaE.Brodley+NaderBshouty- WrayBuntine- Andrey Burago+TomBylander+BillByrne- ClaireCardie+JohnCase+JasonCatlett- PhilipChan- Zhixiang Chen- ChrisDarken

Page 4: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or
Page 5: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

TestData

5

?Shivani Agarwal?ChrisCallison-Burch?EricEaton?PeterStone?MatthewTaylor

Page 6: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

LabeledTestData

6

- Shivani Agarwal- ChrisCallison-Burch- EricEaton+PeterStone+MatthewTaylor

Page 7: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

WhatisLearning?• TheBadgesGameisanexampleofakeylearningprotocol:supervisedlearning

• Firstquestion:Areyousureyougotit?Why?• Issues:–Whichproblemwaseasier: predictionormodeling?– Representation– Problemsetting– BackgroundKnowledge–Whendidlearningtakeplace?

Algorithm:canyouwriteaprogramthattakesthisdataasinputandpredictsthelabelforyourname?

7

Page 8: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

Output

y∈YAnitemy

drawnfromanoutputspaceY

Input

x∈XAnitemx

drawnfromaninputspaceX

Systemy =f(x)

SupervisedLearning

• Weconsidersystemsthatapplyanunknownfunctionf()toinputitemsxandreturnanoutputy =f(x).

8

Page 9: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

Output

y∈YAnitemy

drawnfromanoutputspaceY

Input

x∈XAnitemx

drawnfromaninputspaceX

Systemy =f(x)

SupervisedLearning

• In(supervised)machinelearning,ourgoalistolearnafunctionh()fromexamplesthatapproximatesf()

9

Page 10: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

Output

y∈Y

Anitemydrawnfromalabel

spaceY

Input

x∈X

AnitemxdrawnfromaninstancespaceX

LearnedModely=h(x)

Supervisedlearning

10

Targetfunctiony=f(x)

y = h(x)

Page 11: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

Supervisedlearning:Training

• GivethelearnerexamplesinD train

• Thelearnerreturnsamodelh(x)11

LabeledTrainingDataD train

(x1,y1)(x2,y2)…

(xN,yN)

Learnedmodelh(x)

LearningAlgorithm

Canyousuggestotherlearningprotocols?

h(x)isthemodelwe’lluseinourapplication

Page 12: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

FunctionApproximationProblemSetting• Setofpossibleinstances• Setofpossiblelabels• Unknowntargetfunction• Setoffunctionhypotheses

Input:Trainingexamplesofunknowntargetfunctionf

Output:Hypothesisthatbestapproximatesf

XY

f : X ! YH = {h | h : X ! Y}

h 2 H

BasedonslidebyTomMitchell

{hxi, yii}ni=1 = {hx1, y1i , . . . , hxn, yni}

Page 13: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

SampleDataset• ColumnsdenotefeaturesXi

• Rowsdenotelabeledinstances• Classlabeldenoteswhetheratennisgamewasplayed

hxi, yii

hxi, yii

Page 14: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

Supervisedlearning:Testing

• Reservesomelabeleddatafortesting14

LabeledTestData

D test

(x’1,y’1)(x’2,y’2)

…(x’M,y’M)

Page 15: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

Supervisedlearning:Testing

LabeledTestData

D test

(x’1,y’1)(x’2,y’2)

…(x’M,y’M)

TestLabelsY test

y’1y’2...y’M

RawTestDataX test

x’1x’2….x’M

15

Page 16: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

TestLabelsY test

y’1y’2...y’M

RawTestDataX test

x’1x’2….x’M

Supervisedlearning:Testing• Applythemodeltotherawtestdata• Evaluatebycomparingpredictedlabelsagainstthetestlabels

16

Learnedmodelh(x)

PredictedLabelsh(X test)h(x’1)h(x’2)….

h(x’M)

Canyouuse thetestdataotherwise?

Page 17: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

SupervisedLearning:Examples

§ Diseasediagnosis§ x:Propertiesofpatient(symptoms,labtests)§ f:Disease(ormaybe:recommendedtherapy)

§ Part-of-Speechtagging§ x:AnEnglishsentence(e.g.,Thecanwillrust)§ f:Thepartofspeechofawordinthesentence

§ Facerecognition§ x:Bitmappictureofperson’sface§ f:Nametheperson(ormaybe:apropertyof)

§ AutomaticSteering§ x:Bitmappictureofroadsurfaceinfrontofcar§ f:Degreestoturnthesteeringwheel

17

Manyproblemsthatdonotseemlikeclassificationproblemscanbedecomposedintoclassificationproblems.

Page 18: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

KeyIssuesinMachineLearning• Modeling

– Howtoformulateapplicationproblemsasmachinelearningproblems?– Howtorepresentthedata?– LearningProtocols(whereisthedata&labelscomingfrom?)

• Representation– Whatfunctions shouldwelearn(hypothesisspaces)?– Howtomaprawinput toaninstancespace?– Anyrigorouswaytofindthese?Anygeneralapproach?

• Algorithms– Whataregoodalgorithms?– Howdowedefinesuccess?– Generalizationvs.overfitting– Thecomputationalproblem

18

Page 19: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

Usingsupervisedlearning

§ Whatisourinstancespace?§ Whatkindoffeaturesareweusing?

§ Whatisourlabelspace?§ Whatkindoflearningtaskarewedealingwith?

§ Whatisourhypothesisspace?§ Whatkindoffunctions(models)arewelearning?

§ Whatlearningalgorithmdoweuse?§ Howdowelearnthemodelfromthelabeleddata?

§ Whatisourlossfunction/evaluationmetric?§ Howdowemeasuresuccess?Whatdriveslearning?

19

Page 20: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

Output

y∈YAnitemy

drawnfromalabelspaceY

Input

x∈XAnitemx

drawnfromaninstancespaceX

LearnedModelh(x)

1.TheinstancespaceX

• DesigninganappropriateinstancespaceX iscrucialforhowwellwecanpredicty.

20

Page 21: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

1.TheinstancespaceX§ Whenweapplymachinelearningtoatask,wefirst

needtodefinetheinstancespaceX.§ Instancesx∈ X aredefinedbyfeatures:

§ Booleanfeatures:§ Isthereafoldernamedafterthesender?§ Doesthisemailcontainstheword‘class’?§ Doesthisemailcontainstheword‘waiting’?§ Doesthisemailcontainstheword‘class’andtheword‘waiting’?

§ Numericalfeatures:§ Howoftendoes‘learning’occurinthisemail?§ Whatlongisemail?§ HowmanyemailshaveIseenfromthissenderoverthelastday/week/month?

§ Bagoftokens§ Justlistallthetokens intheinput 21

Doesitaddanything?

Page 22: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

What’sX fortheBadgesgame?

§ Possiblefeatures:§ Gender§ Name’scountry-of-origin§ Lengthoftheirfirstorlastname§ Doesthenamecontainletter‘x’?§ Howmanyvowelsdoestheirnamecontain?§ Isthen-th letteravowel?§ Doesthenamehavethesamenumberofvowelsandconsonants?

22

Page 23: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

X asavectorspace

§ X isanN-dimensionalvectorspace(e.g.<N)§ Eachdimension=onefeature.

§ Eachx isafeaturevector(hencetheboldfacex).§ Thinkofx =[x1 …xN]asapointinX :

23x1

x2

Page 24: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

Goodfeaturesareessential§ Thechoiceoffeaturesiscrucial forhowwellataskcanbelearned

§ Inmanyapplicationareas(language,vision,etc.),alotofworkgoesintodesigningsuitablefeatures

§ Thisrequiresdomainexpertise

§ Thinkaboutthebadgesgame– whatifyouwerefocusingonvisualfeatures?

§ Wecan’tteachyouwhatspecificfeaturestouseforyourtask§ Butwewilltouchonsomegeneralprinciples

24

Page 25: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

Output

y∈YAnitemy

drawnfromalabelspaceY

Input

x∈XAnitemx

drawnfromaninstancespaceX

LearnedModelh(x)

2.ThelabelspaceY

• ThelabelspaceY determineswhatkind ofsupervisedlearningtask wearedealingwith

25

Page 26: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

SupervisedlearningtasksI

§ Outputlabelsy∈Y arecategorical:§ Binaryclassification:Twopossiblelabels§ Multi-classclassification:kpossiblelabels

§ Outputlabelsy∈Y arestructuredobjects (sequencesoflabels,parsetrees,etc.)

§ Structurelearning

26

Page 27: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

SupervisedlearningtasksII

§ Outputlabelsy∈Y arenumerical:§ Regression(linear/polynomial):

§ Labelsarecontinuous-valued§ Learnalinear/polynomialfunctionf(x)

§ Ranking:§ Labelsareordinal§ Learnanorderingf(x1)>f(x2)overinput

27

Page 28: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

Output

y∈YAnitemy

drawnfromalabelspaceY

Input

x∈XAnitemx

drawnfromaninstancespaceX

LearnedModelh(x)

3.Themodelh(x)

• Weneedtochoosewhatkind ofmodelwewanttolearn

28

Page 29: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

ALearningProblem

29

y = f (x1, x2, x3, x4)Unknownfunction

x1x2x3x4

Example x1 x2 x3 x4 y1 0 0 1 0 0

3 0 0 1 1 14 1 0 0 1 15 0 1 1 0 06 1 1 0 0 07 0 1 0 1 0

2 0 1 0 0 0Canyoulearnthis

function?Whatisit?

Page 30: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

HypothesisSpaceCompleteIgnorance:Thereare216 =65536possiblefunctionsoverfourinputfeatures.

Wecan’tfigureoutwhichoneiscorrectuntilwe’veseeneverypossibleinput-outputpair.

Afterobservingsevenexampleswestillhave29 possibilitiesfor f

IsLearningPossible?

30

Example x1 x2 x3 x4 y

16 1 1 1 1 ?

1 0 0 0 0 ?

1 0 0 0 ?

1 0 1 1 ?1 1 0 0 01 1 0 1 ?

1 0 1 0 ?1 0 0 1 1

0 1 0 0 00 1 0 1 00 1 1 0 00 1 1 1 ?

0 0 1 1 10 0 1 0 0

2 0 0 0 1 ?

1 1 1 0 ?

q Thereare|Y||X| possiblefunctionsf(x)fromtheinstancespaceX tothelabelspaceY.

q Learnerstypicallyconsideronlyasubset ofthefunctionsfromX toY,calledthehypothesisspaceH .H⊆|Y||X|

Page 31: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

GeneralstrategiesforMachineLearning

§ Developflexiblehypothesisspaces:§ Decisiontrees,neuralnetworks,nestedcollections.§ Constrainingthehypothesisspaceisdonealgorithmically

§ Developrepresentationlanguagesforrestrictedclassesoffunctions:§ Servetolimittheexpressivityofthetargetmodels§ E.g.,Functionalrepresentation(n-of-m);Grammars;linearfunctions;stochasticmodels;

§ Getflexibilitybyaugmentingthefeaturespace§ Ineithercase:

§ Developalgorithmsforfindingahypothesisinourhypothesisspace,thatfitsthedata

§ Andhopethattheywillgeneralizewell

34

Page 32: Supervised Learning - Penn Engineering · 2019. 1. 22. · Supervised Learning : Examples § Disease diagnosis § x: Properties of patient (symptoms, lab tests) § f : Disease (or

KeyIssuesinMachineLearning• Modeling

– Howtoformulateapplicationproblemsasmachinelearningproblems?– Howtorepresentthedata?– LearningProtocols(whereisthedata&labelscomingfrom?)

• Representation– Whatfunctions shouldwelearn(hypothesisspaces)?– Howtomaprawinput toaninstancespace?– Anyrigorouswaytofindthese?Anygeneralapproach?

• Algorithms– Whataregoodalgorithms?– Howdowedefinesuccess?– Generalizationvs.overfitting– Thecomputationalproblem

35