Machine Learning for Intelligent Systems Instructors: Nika Haghtalab (this time) and Thorsten Joachims Lecture 3: Supervised Learning and Decision Trees Reading: UML 18-18.2


Page 1:

Machine Learning for Intelligent Systems

Instructors: Nika Haghtalab (this time) and Thorsten Joachims

Lecture 3: Supervised Learning and Decision Trees

Reading: UML 18-18.2

Page 2:

Placement Exam

[Chart of placement exam results (axis values 82-94) across Calculus, Optimization, Probability, and Linear Algebra.]

Page 3:

Example: Apple Harvest Festival. Learn to classify tasty apples based on experience.
→ Training set: apples you have tasted in the past, their features, and whether they were tasty.

→ You see an apple: would you buy it?

Apple  Farm  Color  Size    Firmness  Tasty?
#1     A     Red    Medium  Soft      No
#2     A     Green  Small   Crunchy   No
#3     A     Red    Medium  Soft      No
#4     B     Red    Large   Crunchy   Yes
#5     B     Green  Large   Crunchy   Yes
#6     B     Green  Small   Crunchy   No
#7     A     Red    Small   Soft      No
#8     B     Red    Small   Crunchy   Yes
#9     B     Green  Medium  Soft      No
#10    A     Red    Small   Crunchy   Yes
#11    A     Red    Medium  Crunchy   ?

(Farm, Color, Size, and Firmness are the features; Tasty? is the label.)

Page 4:

Supervised Learning

Instance Space:

Instance space X including feature representation. E.g., X = {A, B} × {red, green} × {large, medium, small} × {crunchy, soft}.

Target Attributes (Labels): A set Y of labels. E.g., Y = {Tasty, Not Tasty} or Y = {Yes, No}.

Hidden target function: An unknown function f: X → Y, e.g., how an apple's features correlate with its tastiness in real life.

Training Data: A set of labeled pairs (x, f(x)) ∈ X × Y that we have seen before, e.g., we have tasted an (A, red, large, crunchy) apple before that was Tasty.

Informal statement of our main goal: Given a large enough number of training examples, learn a hypothesis h: X → Y that approximates f(⋅).

Page 5:

Hypothesis Space

Hypothesis space: The set H of functions we consider when looking for h: X → Y.

Apple  Farm  Color  Size    Firmness  Tasty?
#1     A     Red    Medium  Soft      No
#2     A     Green  Small   Crunchy   No
#3     A     Red    Medium  Soft      No
#4     B     Red    Large   Crunchy   Yes
#5     B     Green  Large   Crunchy   Yes
#6     B     Green  Small   Crunchy   No
#7     A     Red    Small   Soft      No
#8     B     Red    Small   Crunchy   Yes
#9     B     Green  Medium  Soft      No
#10    A     Red    Small   Crunchy   Yes

Example: All hypotheses that are ANDs of feature-values: (Farm = whatever) ∧ (color = red) ∧ (size = whatever) ∧ (firmness = Crunchy).

Page 6:

Consistency

A function h: X → Y is consistent with a set of labeled training examples S if and only if h(x) = y for all (x, y) ∈ S.


Example: Is (Farm = whatever) ∧ (color = red) ∧ (size = whatever) ∧ (firmness = Crunchy) consistent with the table of apples?

Is it consistent with the apples that came from farm A?

Apple  Farm  Color  Size    Firmness  Tasty?
#1     A     Red    Medium  Soft      No
#2     A     Green  Small   Crunchy   No
#3     A     Red    Medium  Soft      No
#7     A     Red    Small   Soft      No
#10    A     Red    Small   Crunchy   Yes
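Both questions above can be checked mechanically. A minimal Python sketch (the data comes from the table; the tuple encoding and function names are my own):

```python
# Checking h = (color = Red) AND (firmness = Crunchy) against the apple table.

# Training data as (farm, color, size, firmness, tasty) tuples.
APPLES = [
    ("A", "Red",   "Medium", "Soft",    "No"),
    ("A", "Green", "Small",  "Crunchy", "No"),
    ("A", "Red",   "Medium", "Soft",    "No"),
    ("B", "Red",   "Large",  "Crunchy", "Yes"),
    ("B", "Green", "Large",  "Crunchy", "Yes"),
    ("B", "Green", "Small",  "Crunchy", "No"),
    ("A", "Red",   "Small",  "Soft",    "No"),
    ("B", "Red",   "Small",  "Crunchy", "Yes"),
    ("B", "Green", "Medium", "Soft",    "No"),
    ("A", "Red",   "Small",  "Crunchy", "Yes"),
]

def h(x):
    """AND-of-feature-values hypothesis: color = Red and firmness = Crunchy."""
    farm, color, size, firmness = x
    return "Yes" if color == "Red" and firmness == "Crunchy" else "No"

def is_consistent(h, examples):
    """h is consistent with S iff h(x) = y for every (x, y) in S."""
    return all(h(row[:4]) == row[4] for row in examples)

print(is_consistent(h, APPLES))                              # False: apple #5 is a counterexample
print(is_consistent(h, [r for r in APPLES if r[0] == "A"]))  # True on the farm-A apples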

Page 7:

Inductive Learning

More formal statement of our main goal: Given a large enough number of training examples and a hypothesis space H, learn a hypothesis h ∈ H that approximates f(⋅).

Our hope: There is an h ∈ H that approximates f.

Our strategy: Our training examples are representative of reality:
→ We have seen many examples.
→ They were selected without any bias.
→ So any hypothesis that is consistent with the training data would be accurate on unobserved instances.

Find h ∈ H that is consistent with S.

h(x) = f(x) for almost all x ∈ X

Page 8:

Using Consistency in Learning
Idea: On a training set S, return any h ∈ H that is consistent with it.

Version Space: The version space VS(H, S) is the subset of functions from H that are consistent with S.

Apple  Farm  Color  Size    Firmness  Tasty?
#1     A     Red    Medium  Soft      No
#2     A     Green  Small   Crunchy   No
#3     A     Red    Medium  Soft      No
#7     A     Red    Small   Soft      No
#10    A     Red    Small   Crunchy   Yes

Example: For the following set of training examples S, and H that is all hypotheses that are ANDs of feature-value settings, what is VS(H, S)?

Page 9:

Computing the Version Space

List-then-Eliminate Algorithm
• Initialize VS = H.
• For each training example (x, y) ∈ S,
  o remove from VS any h such that h(x) ≠ y.
• Output VS.

At a high level: list all hypotheses in H, then remove a hypothesis whenever it is inconsistent with some example in S.

Takeaway: The algorithm works, but
→ keeping track of the version space is time- and space-consuming,
→ we only need one consistent hypothesis,
→ so build a consistent hypothesis directly.
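List-then-Eliminate can be sketched in a few lines of Python, assuming the AND-of-feature-values hypothesis space and the farm-A training set from the previous slide (the wildcard encoding and variable names are illustrative, not from the course):

```python
from itertools import product

# Feature values, in a fixed order: farm, color, size, firmness.
FEATURES = [["A", "B"], ["Red", "Green"], ["Large", "Medium", "Small"],
            ["Crunchy", "Soft"]]
WILDCARD = "*"  # stands for "whatever" in the slides

# Farm-A training examples: ((farm, color, size, firmness), label).
S = [(("A", "Red", "Medium", "Soft"), "No"),
     (("A", "Green", "Small", "Crunchy"), "No"),
     (("A", "Red", "Medium", "Soft"), "No"),
     (("A", "Red", "Small", "Soft"), "No"),
     (("A", "Red", "Small", "Crunchy"), "Yes")]

def predict(h, x):
    """h has one constraint per feature; '*' matches any value. Predict Yes iff all match."""
    return "Yes" if all(c == WILDCARD or c == v for c, v in zip(h, x)) else "No"

# List: every AND of feature-value settings (3 * 3 * 4 * 3 = 108 hypotheses).
H = list(product(*[[WILDCARD] + vals for vals in FEATURES]))

# Eliminate: keep only the hypotheses consistent with every example in S.
VS = [h for h in H if all(predict(h, x) == y for x, y in S)]
print(len(VS))  # 4: every surviving hypothesis fixes color = Red and firmness = Crunchy
```

Enumerating all 108 hypotheses is cheap here, but the blow-up for richer hypothesis spaces is exactly the inefficiency the takeaway points at.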

Page 10:

Decision Trees

[Decision tree figure:
Firmness?
  soft → No
  crunchy → Color?
    red → Yes
    green → Size?
      small → No
      medium → No
      large → Yes]

Internal nodes: Test one feature, e.g., color.

Branches from an internal node: Possible values of the feature that's being tested, e.g., {red, green}.

Leaf nodes: The label of instances whose features correspond to the root-to-leaf path, e.g., Yes or No.

Page 11:

Apple  Farm  Color  Size    Firmness  Tasty?
#1     A     Red    Medium  Soft      No
#2     A     Green  Small   Crunchy   No
#3     A     Red    Medium  Soft      No
#4     B     Red    Large   Crunchy   Yes
#5     B     Green  Large   Crunchy   Yes
#6     B     Green  Small   Crunchy   No
#7     A     Red    Small   Soft      No
#8     B     Red    Small   Crunchy   Yes
#9     B     Green  Medium  Soft      No
#10    A     Red    Small   Crunchy   Yes

Using a Decision Tree
Let function f be the following decision tree.
1. What is f((B, Green, Large, Crunchy))?
2. What is f((B, Red, Large, Soft))?
3. What is f((A, Green, Small, Crunchy))?

[Decision tree figure (the same tree as on the previous slide):
Firmness?
  soft → No
  crunchy → Color?
    red → Yes
    green → Size?
      small → No
      medium → No
      large → Yes]

Is f consistent with the training set in the table?
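One way to answer these queries is to encode the tree and walk it root-to-leaf; a sketch (the nested-tuple encoding is my own, not the course's notation):

```python
# The lecture's decision tree: a leaf is a label string, an internal node is
# a (feature, branches) pair. Instances are (farm, color, size, firmness).
TREE = ("firmness", {
    "Soft": "No",
    "Crunchy": ("color", {
        "Red": "Yes",
        "Green": ("size", {
            "Small": "No",
            "Medium": "No",
            "Large": "Yes",
        }),
    }),
})

FEATURE_INDEX = {"farm": 0, "color": 1, "size": 2, "firmness": 3}

def classify(tree, x):
    """Walk from the root to a leaf, branching on the tested feature's value."""
    while not isinstance(tree, str):      # internal node: (feature, branches)
        feature, branches = tree
        tree = branches[x[FEATURE_INDEX[feature]]]
    return tree                           # leaf node: the label

print(classify(TREE, ("B", "Green", "Large", "Crunchy")))  # Yes
print(classify(TREE, ("B", "Red", "Large", "Soft")))       # No
print(classify(TREE, ("A", "Green", "Small", "Crunchy")))  # No
```

Note that the tree never looks at the Farm feature: the root-to-leaf path is determined by Firmness, Color, and Size alone.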

Page 12:

Making a Tree

Example: What is the decision tree corresponding to the functions
1. (firmness = Crunchy)?
2. (firmness = Crunchy) ∧ (color = Red)?
3. (color = Red) ∨ (size = Large)?
4. (firmness = Crunchy) ∧ ((color = Red) ∨ (size = Large))?

Page 13:

Top-Down Induction of Decision Trees
Idea:
• Don't use List-then-Eliminate: too inefficient.
• Instead, grow a good decision tree to start with.
→ We grow from the root to the leaves.
→ Repeatedly take an existing leaf that does not include a definitive label and replace it with an internal node.

TD-IDT(S, y):
• If all examples in S have the same label y → make a leaf with label y.
• Else:
  → Pick feature A.
  → For each value a_v of A, make a child node such that S_v = {(x, y) ∈ S : feature A of x has value a_v}.
  → Make a tree with A as the root and TD-IDT(S_v, y) as subtrees.
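The pseudocode above can be sketched in Python. Note that "pick feature A" is left unspecified on this slide, so this sketch simply takes the first unused feature (a placeholder choice; the split criteria come later in the lecture):

```python
from collections import Counter

def td_idt(S, features):
    """Grow a tree top-down. S is a list of (x, y) pairs where x is a dict of
    feature values; returns a leaf label (str) or an internal node (feature, subtrees)."""
    labels = [y for _, y in S]
    if len(set(labels)) == 1:        # all examples in S have the same label: make a leaf
        return labels[0]
    if not features:                 # no feature left to split on: majority label
        return Counter(labels).most_common(1)[0][0]
    A, rest = features[0], features[1:]   # "pick feature A" (placeholder: first unused)
    subtrees = {a: td_idt([(x, y) for x, y in S if x[A] == a], rest)
                for a in {x[A] for x, _ in S}}
    return (A, subtrees)

def classify(tree, x):
    """Follow the branches matching x until a leaf label is reached."""
    while not isinstance(tree, str):
        A, subtrees = tree
        tree = subtrees[x[A]]
    return tree

# The apple training set, with features as dicts.
ROWS = [("A","Red","Medium","Soft","No"), ("A","Green","Small","Crunchy","No"),
        ("A","Red","Medium","Soft","No"), ("B","Red","Large","Crunchy","Yes"),
        ("B","Green","Large","Crunchy","Yes"), ("B","Green","Small","Crunchy","No"),
        ("A","Red","Small","Soft","No"), ("B","Red","Small","Crunchy","Yes"),
        ("B","Green","Medium","Soft","No"), ("A","Red","Small","Crunchy","Yes")]
APPLES = [({"farm": f, "color": c, "size": s, "firmness": fi}, y)
          for f, c, s, fi, y in ROWS]

tree = td_idt(APPLES, ["farm", "color", "size", "firmness"])
print(all(classify(tree, x) == y for x, y in APPLES))  # True: the grown tree is consistent
```

Whichever order features are picked in, the recursion stops as soon as every example under a node shares one label, so the resulting tree is consistent with S (provided no two examples have identical features but different labels).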

Page 14:

Grow a consistent DT

Apple  Farm  Color  Size    Firmness  Tasty?
#1     A     Red    Medium  Soft      No
#2     A     Green  Small   Crunchy   No
#3     A     Red    Medium  Soft      No
#4     B     Red    Large   Crunchy   Yes
#5     B     Green  Large   Crunchy   Yes
#6     B     Green  Small   Crunchy   No
#7     A     Red    Small   Soft      No
#8     B     Red    Small   Crunchy   Yes
#9     B     Green  Medium  Soft      No
#10    A     Red    Small   Crunchy   Yes

TD-IDT(S, y):
• If all examples in S have the same label y → make a leaf with label y.
• Else:
  → Pick feature A.
  → For each value a_v of A, make a child node such that S_v = {(x, y) ∈ S : feature A of x has value a_v}.
  → Make a tree with A as the root and TD-IDT(S_v, y) as subtrees.

Page 15:

Which Split?
How should we pick feature A for splitting?
→ How does the prediction improve after the split?
→ At each node, take the number of positive and negative instances, e.g., (4 Yes, 2 No).
→ If we were to stop, it is best to predict the majority label, e.g., Yes.
→ How do we achieve a more definite prediction?

[Three candidate splits of the root (4Y, 6N):
Firmness? crunchy → (4Y, 2N), predict Yes; soft → (0Y, 4N), predict No.
Size? large → (2Y, 0N), predict Yes; medium → (0Y, 3N), predict No; small → (2Y, 3N), predict No.
Farm? A → (1Y, 4N), predict No; B → (3Y, 2N), predict Yes.]

Page 16:

Which Split?

[Figure repeated: the three candidate splits of the root (4Y, 6N):
Firmness? crunchy → (4Y, 2N), predict Yes; soft → (0Y, 4N), predict No.
Size? large → (2Y, 0N), predict Yes; medium → (0Y, 3N), predict No; small → (2Y, 3N), predict No.
Farm? A → (1Y, 4N), predict No; B → (3Y, 2N), predict Yes.]

Different ways to measure this:
• Attribute that minimizes the number of errors:
  • Before split: Err(S) = size of the minority label.
  • After split: Err(S|A) = Σ_a Err(S_a)
• Attribute that minimizes entropy (maximizes Information Gain):
  • Before split: H(S) = −Σ_x p(x) log p(x)
  • After split: I(S|A) = Σ_a p(S_a) H(S_a)
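Both measures can be computed for the three candidate splits shown above; a sketch using the (Yes, No) counts from the figure (function names are mine):

```python
from math import log2

def entropy(pos, neg):
    """H(S) = -sum p(x) log2 p(x) over the two labels; 0*log(0) is taken as 0."""
    total = pos + neg
    return -sum((c / total) * log2(c / total) for c in (pos, neg) if c)

def err(pos, neg):
    """Err(S): size of the minority label."""
    return min(pos, neg)

# (Yes, No) counts in each child node, read off the figure; |S| = 10 = (4Y, 6N).
splits = {"Firmness": [(4, 2), (0, 4)],
          "Size": [(2, 0), (0, 3), (2, 3)],
          "Farm": [(1, 4), (3, 2)]}

for name, children in splits.items():
    e = sum(err(p, q) for p, q in children)                    # Err(S|A)
    i = sum((p + q) / 10 * entropy(p, q) for p, q in children) # I(S|A)
    print(f"{name}: Err(S|A) = {e}, I(S|A) = {i:.3f}")
# Firmness and Size tie on error (2 each), but Size yields the lowest entropy.
```

This illustrates why entropy can be a finer-grained criterion than error count: it distinguishes between splits that the error measure scores identically.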

Page 17:

Example: Text Classification
• Task: Learn a rule that classifies Reuters business news.
  • Class +: "Corporate Acquisitions"
  • Class −: other articles
  • 2000 training instances
• Representation:
  • Boolean attributes, indicating the presence of a keyword in the article
  • 9947 such keywords (more accurately, word "stems")

LAROCHE STARTS BID FOR NECO SHARES

Investor David F. La Roche of North Kingstown, R.I., said he is offering to purchase 170,000 common shares of NECO Enterprises Inc at 26 dlrs each. He said the successful completion of the offer, plus shares he already owns, would give him 50.5 pct of NECO's 962,016 common shares. La Roche said he may buy more, and possible all NECO shares. He said the offer and withdrawal rights will expire at 1630 EST/2130 gmt, March 30, 1987.

SALANT CORP 1ST QTR FEB 28 NET

Oper shr profit seven cts vs loss 12 cts. Oper net profit 216,000 vs loss 401,000. Sales 21.4 mln vs 24.9 mln. NOTE: Current year net excludes 142,000 dlr tax credit. Company operating in Chapter 11 bankruptcy.

(The first article is labeled +, the second −.)
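The boolean-keyword representation can be sketched as follows. This is a toy keyword list, not the course's 9947 stems, and no stemming is done, so plural forms like "shares" would not match "share" in this exact-word version:

```python
import re

# Toy keyword list (illustrative; the course used 9947 word stems).
KEYWORDS = ["share", "stake", "takeover", "debenture", "vs", "export", "rate", "file"]

def boolean_features(article, keywords=KEYWORDS):
    """One boolean attribute per keyword: is that word present in the article?
    (Exact word match only; a real stemmer would also match e.g. 'shares'.)"""
    words = set(re.findall(r"[a-z]+", article.lower()))
    return {w: (w in words) for w in keywords}

x = boolean_features("He said the offer would give him 50.5 pct of NECO's shares "
                     "after the takeover.")
print(x["takeover"], x["share"])  # True False (no stemming: 'shares' != 'share')
```

Each article thus becomes a fixed-length boolean vector, which is exactly the kind of discrete-feature instance the decision-tree learner above can split on.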

Page 18:

Decision Tree for "Corporate Acq."
• vs = 1: −
• vs = 0:
• | export = 1: …
• | export = 0:
• | | rate = 1:
• | | | stake = 1: +
• | | | stake = 0:
• | | | | debenture = 1: +
• | | | | debenture = 0:
• | | | | | takeover = 1: +
• | | | | | takeover = 0:
• | | | | | | file = 0: −
• | | | | | | file = 1:
• | | | | | | | share = 1: +
• | | | | | | | share = 0: −
… and many more

Learned tree:
• has 437 nodes
• is consistent

Accuracy of learned tree:
• 11% error rate

Note: word stems expanded for improved readability.