Today’s Topics
• HW0 due 11:55pm tonight and no later than next Tuesday
• HW1 out on class home page; discussion page in Moodle
• Please do not use company-specific discussion forums for cs540 – use Moodle (or request we add Piazza)
• Everyone gets 10 free late days this term (but at most 5 per HW)
• Learning from Labeled Examples
• Supervised Learning and Venn Diagrams
• Simple ML Algo: k-Nearest Neighbors
– Read Section 18.8.1 of textbook and Wikipedia article(s) linked to class home page
• Tuning Parameters
• Some “ML Commandments”
9/15/15 CS 540 - Fall 2015 (© Jude Shavlik), Lecture 3, Week 2
Where We Are
• Have selected ‘concept’ to learn
• Have chosen features to represent examples
• Have created at least 100 labeled examples
• Next: learn a ‘model’ that can predict output for NEW examples
Learning from Labeled Examples
[Figure: a group of positive examples, a group of negative examples, and one unlabeled example: category of this example?]
Concept: Solid Red Circle in a (Regular?) Polygon
What about?
• Figures on left side of page
• Figures drawn before 5pm 2/2/89
• <etc>
Concept Learning
Learning systems differ in how they represent concepts
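[Figure: training examples are fed to different learning algorithms, each producing a different concept representation]
• Backpropagation → Neural Net
• ID3, C4.5, CART → Decision Tree
• AQ, FOIL, Aleph → Rules (e.g., Φ ← X ∧ Y, Φ ← Z)
• SVMs → Weighted Sum (e.g., If 5x1 + 9x2 – 3x3 > 12 Then +)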
Recall: Feature Space
If examples are described in terms of values of features, they can be plotted as points in an N-dimensional space
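[Figure: 3-D feature space with axes Size, Color, and Weight; an unlabeled example “?” is plotted at Size = Big, Weight = 2500, Color = Gray]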
A “concept” is then a (possibly disjoint) volume in this space
Supervised Learning and Venn Diagrams
Concept = A or B (i.e., a disjunctive concept)
Examples = labeled points in feature space
Concept = a label for regions of feature space
[Figure: Venn diagram of regions A and B drawn in feature space, with positive (+) and negative (-) examples plotted as points]
Brief Introduction to Logic
• Conjunctive Concept
Color(?obj1, red) ∧ Size(?obj1, large)    (∧ is “and”)
• Disjunctive Concept
Color(?obj2, blue) ∨ Size(?obj2, small)    (∨ is “or”)
• More formally, a “concept” is of the form
∀x ∀y ∀z F(x, y, z) → Member(x, Class1)    (x ranges over instances)
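As a rough illustration of the two concept types above, each can be written as a Boolean test on an object's features. This is only a sketch: the dictionary keys `color` and `size` and the sample objects are assumptions made for the example, not part of the lecture.

```python
def conjunctive_concept(obj):
    """Color(obj, red) AND Size(obj, large): both tests must hold."""
    return obj["color"] == "red" and obj["size"] == "large"

def disjunctive_concept(obj):
    """Color(obj, blue) OR Size(obj, small): either test suffices."""
    return obj["color"] == "blue" or obj["size"] == "small"

# Hypothetical objects, described by feature values.
big_red = {"color": "red", "size": "large"}
small_red = {"color": "red", "size": "small"}

print(conjunctive_concept(big_red))    # True
print(conjunctive_concept(small_red))  # False
print(disjunctive_concept(small_red))  # True
print(disjunctive_concept(big_red))    # False
```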
Logical Symbols
∧ and
∨ or
¬ not
→ implies
↔ equivalent
∀ for all
∃ there exists
Induction vs. Deduction
Deduction
– compute what logically follows
– if we know P(Mary) is true and ∀x P(x) → Q(x), we can deduce Q(Mary)
Induction
– if we observe P(1), P(2), …, P(100), we can induce ∀x P(x)
– might be wrong
Which does supervised ML do?
Nearest-Neighbor Algorithms (aka exemplar models, instance-based learning, case-based learning) – Section 18.8.1 of textbook
• Learning ≈ memorize training examples
• Problem solving = find most similar example in memory; output its category
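[Figure: Venn diagram of + and - training examples with a test example marked “?”, alongside the corresponding “Voronoi diagram”, in which each cell contains all points closest to the labeled example at its center]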
Nearest Neighbors: Basic Algorithm
• Find the K nearest neighbors to test-set example
• Or find all ex’s within radius R
• Combine their ‘votes’
– Most common category
– Average value (real-valued prediction)
– Can also weight votes by distance
– Lots of variations on basic theme
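The voting step can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation: Euclidean distance, the function names, the `1 / (distance + eps)` weighting, and the toy data are all assumptions chosen for the example.

```python
import math
from collections import Counter

def euclidean(u, v):
    """Straight-line distance between two equal-length numeric feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def k_nearest(train, query, k, distance=euclidean):
    """Return the k (features, label) pairs in train closest to query."""
    return sorted(train, key=lambda ex: distance(ex[0], query))[:k]

def knn_classify(train, query, k=3):
    """Unweighted vote: most common category among the k nearest neighbors."""
    votes = Counter(label for _, label in k_nearest(train, query, k))
    return votes.most_common(1)[0][0]

def weighted_knn_classify(train, query, k=3, eps=1e-9):
    """Distance-weighted vote: closer neighbors count for more."""
    votes = Counter()
    for features, label in k_nearest(train, query, k):
        votes[label] += 1.0 / (euclidean(features, query) + eps)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training set with two categories.
train = [((1.0, 1.0), "+"), ((1.2, 0.8), "+"), ((5.0, 5.0), "-"),
         ((5.5, 4.5), "-"), ((4.8, 5.2), "-")]

print(knn_classify(train, (1.1, 0.9), k=3))           # +
print(weighted_knn_classify(train, (5.0, 4.9), k=3))  # -
```

Because the vote is a count over labels, the same code handles more than two categories unchanged. The radius-R variant would replace `k_nearest` with a filter on distance.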
Simple Example: 1-NN
(1-NN ≡ one nearest neighbor)
Training Set
1. a=0, b=0, c=1 +
2. a=0, b=0, c=0 -
3. a=1, b=1, c=1 -
Test Example
a=0, b=1, c=0 ?
“Hamming Distance” (# of different bits)
Ex 1 = 2
Ex 2 = 1
Ex 3 = 2
So output -
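The worked example above can be checked mechanically. The sketch below encodes each example as a tuple of bits (a, b, c); the variable and function names are chosen for illustration.

```python
def hamming(u, v):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(a != b for a, b in zip(u, v))

# Training set from the slide: (a, b, c) bits and the category.
training_set = [
    ((0, 0, 1), "+"),
    ((0, 0, 0), "-"),
    ((1, 1, 1), "-"),
]
test_example = (0, 1, 0)

# Distance from the test example to each training example, in order.
distances = [hamming(features, test_example) for features, _ in training_set]

# 1-NN: output the category of the single closest training example.
nearest_index = distances.index(min(distances))
prediction = training_set[nearest_index][1]

print(distances)   # [2, 1, 2]
print(prediction)  # -
```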
Some Common Jargon
Classification: learning a discrete-valued function (discrete outputs)
Regression: learning a real-valued function (real outputs)
(inputs can be real valued in both cases)
k-NN easily extended to regression tasks (and to multi-category classification) – HOW?
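One common answer for the regression case is to replace the vote with an average of the neighbors' outputs. The sketch below assumes squared Euclidean distance and an unweighted mean; the function name and toy data are illustrative only.

```python
def knn_regress(train, query, k=3):
    """Predict a real value as the mean output of the k nearest training examples.

    train is a list of (features, output) pairs with numeric features.
    """
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    neighbors = sorted(train, key=lambda ex: sq_dist(ex[0], query))[:k]
    return sum(output for _, output in neighbors) / len(neighbors)

# Hypothetical 1-D data: the three nearest neighbors of 2.1 are 2.0, 3.0, and 1.0.
train = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]

print(knn_regress(train, (2.1,), k=3))  # 20.0
```

For multi-category classification no change to the algorithm is needed: a most-common-category vote works for any number of categories.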
Sample Experimental Results (see UC-Irvine archive for more)
Testset Correctness
Testbed            1-NN   D-Trees   Neural Nets
Wisconsin Cancer   98%    95%       96%
Heart Disease      78%    76%       ?
Tumor              37%    38%       ?      (Why so low?)
Appendicitis       83%    85%       86%
Simple algorithm works quite well!
Doing Well by doing Poorly
• You say: “Bad news, my testset accuracy is only 1%” (on a two-category task)
• I say: “That is great news!”
• Why?
• If you NEGATE your predictions, you’ll have 99% accuracy!
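A small sketch makes the point concrete. It assumes a two-category task with Boolean labels and a hypothetical classifier that is right on exactly 1 of 100 test examples; all data here is made up for illustration.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Hypothetical test set: 100 examples with Boolean categories.
labels = [True] * 50 + [False] * 50

# A classifier that is wrong everywhere except on the first example.
predictions = [not y for y in labels]
predictions[0] = labels[0]

# Negating every prediction turns each error into a correct answer and vice versa.
negated = [not p for p in predictions]

print(accuracy(predictions, labels))  # 0.01
print(accuracy(negated, labels))      # 0.99
```

The trick relies on there being exactly two categories; with more categories, knowing a prediction is wrong does not say which category is right.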
Doing Poorly by Doing Well
• You say: “Good news, my testset accuracy is 95%” (on a two-category task)
• I say: “That is bad news!”
• Why might that be?
• Because (let’s assume) the most common output value occurs 99% of the time!
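The comparison point here is the trivial classifier that always guesses the most common category. The sketch below uses made-up label lists with the 99%/1% split assumed in the slide; the function name is illustrative.

```python
from collections import Counter

def majority_baseline(train_labels):
    """Return a predictor that ignores its input and always guesses
    the most common category seen in the training labels."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda _example: most_common

# Hypothetical skewed data: one category covers 99% of the examples.
train_labels = ["-"] * 99 + ["+"]
test_labels = ["-"] * 99 + ["+"]

predict = majority_baseline(train_labels)
baseline_accuracy = sum(predict(None) == y for y in test_labels) / len(test_labels)

print(baseline_accuracy)  # 0.99
```

A learned model reporting 95% on this task is doing worse than the baseline, which is why accuracy should be read against the category distribution.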
Parameter Tuning (First Visit)
Algo: Collect K nearest neighbors, combine their outputs
What should K be?
– It is problem (i.e., testbed) dependent
– Can use tuning sets to select good setting for K
[Figure: plot of tuning-set error rate (y-axis) versus K = 1, 2, 3, 4, 5 (x-axis). Shouldn’t really “connect the dots” (Why?)]
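Selecting K with a tuning set might look like the sketch below. Everything specific in it is an assumption for illustration: the synthetic two-cluster data, the 60/30/30 split, the candidate values of K, and the function names. The test set is touched only once, after K has been fixed.

```python
import random
from collections import Counter

def knn_classify(train, query, k):
    """Majority vote among the k training examples nearest to query."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    neighbors = sorted(train, key=lambda ex: sq_dist(ex[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def error_rate(train, examples, k):
    """Fraction of examples that k-NN (using train as memory) gets wrong."""
    wrong = sum(knn_classify(train, x, k) != y for x, y in examples)
    return wrong / len(examples)

def choose_k(train, tune, candidates=(1, 3, 5, 7, 9)):
    """Return the candidate K with the lowest tuning-set error rate."""
    return min(candidates, key=lambda k: error_rate(train, tune, k))

def sample(center, label, n):
    """Draw n noisy 2-D points around (center, center)."""
    return [((random.gauss(center, 1.0), random.gauss(center, 1.0)), label)
            for _ in range(n)]

# Synthetic two-category data, split into train / tune / test.
random.seed(0)
data = sample(0.0, "-", 60) + sample(2.0, "+", 60)
random.shuffle(data)
train, tune, test = data[:60], data[60:90], data[90:]

best_k = choose_k(train, tune)
test_error = error_rate(train, test, best_k)
print("K chosen on tuning set:", best_k)
print("Test-set error with that K:", test_error)
```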
Why Not Use the TEST Set to Select Good Parameters?
A 2002 paper in Nature (a major, major journal) needed to be corrected due to “training on the testing set”
Original report: 95% accuracy (5% error rate)
Corrected report (which still is buggy): 73% accuracy (27% error rate)
Error rate increased over 400%!!!
This is, unfortunately, a very common error
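Taking the two error rates quoted above, and assuming the increase is measured relative to the originally reported 5% error rate, the arithmetic behind "over 400%" works out as:

```latex
\[
\frac{27\% - 5\%}{5\%} \;=\; \frac{22}{5} \;=\; 4.4 \;=\; 440\%
\]
```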
Some ML “Commandments”
• Let the data decide
– ‘Internalize’ (i.e., tune) parameters
• Scaling up by dumbing down
– Don’t ignore simple algo’s, such as
  • Always guessing most common category in the training set
  • Find best SINGLE feature
– Clever ideas do not imply better results
• Generalize, don’t memorize
– Accuracy on held-aside data is our focus
• Never train on the test examples!
– Commonly violated, alas
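The "best SINGLE feature" baseline can be sketched as follows: for each feature, predict the majority category for each of its values, and keep the feature that gets the most training examples right. This is one plausible reading of the bullet, not necessarily the exact procedure intended in the lecture; the feature names and toy examples are assumptions.

```python
from collections import Counter, defaultdict

def best_single_feature(examples):
    """Find the one discrete feature that best predicts the category.

    examples is a list of (feature_dict, label) pairs.
    Returns (feature name, {value: predicted label}, training accuracy).
    """
    best = None
    for name in examples[0][0]:
        # Count categories separately for each value of this feature.
        by_value = defaultdict(Counter)
        for features, label in examples:
            by_value[features[name]][label] += 1
        # Rule: for each value, predict its majority category.
        rule = {v: counts.most_common(1)[0][0] for v, counts in by_value.items()}
        correct = sum(rule[f[name]] == y for f, y in examples)
        if best is None or correct > best[2]:
            best = (name, rule, correct)
    name, rule, correct = best
    return name, rule, correct / len(examples)

# Hypothetical training set with two discrete features.
examples = [
    ({"color": "red", "size": "big"}, "+"),
    ({"color": "red", "size": "small"}, "+"),
    ({"color": "blue", "size": "big"}, "-"),
    ({"color": "blue", "size": "small"}, "-"),
    ({"color": "red", "size": "big"}, "+"),
    ({"color": "blue", "size": "big"}, "+"),
]

feature, rule, train_accuracy = best_single_feature(examples)
print(feature, rule)  # color {'red': '+', 'blue': '-'}
```

The accuracy returned here is measured on the training examples; in line with the commandments above, the number to report would come from held-aside data.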