
Today’s Topics

• HW0 due 11:55pm tonight and no later than next Tuesday

• HW1 out on class home page; discussion page in Moodle

• Please do not use company-specific discussion forums for cs540 – use Moodle (or request we add Piazza)

• Everyone gets 10 free late days this term (but at most 5 per HW)

• Learning from Labeled Examples

• Supervised Learning and Venn Diagrams

• Simple ML Algo: k-Nearest Neighbors

– Read Section 18.8.1 of textbook and Wikipedia article(s) linked to class home page

• Tuning Parameters

• Some “ML Commandments”

9/15/15 CS 540 - Fall 2015 (© Jude Shavlik), Lecture 3, Week 2


Where We Are

• Have selected ‘concept’ to learn
• Have chosen features to represent examples
• Have created at least 100 labeled examples

• Next: learn a ‘model’ that can predict output for NEW examples



Learning from Labeled Examples

[Figure: a set of positive examples and a set of negative examples (figures on a page), plus a new figure whose category must be predicted]

Concept: Solid Red Circle in a (Regular?) Polygon

What about? Figures on left side of page? Figures drawn before 5pm 2/2/89? <etc>



Concept Learning

Learning systems differ in how they represent concepts. The same training examples can be fed to different algorithms, each producing a different kind of model:

• Backpropagation → Neural Net
• ID3, C4.5, CART → Decision Tree
• AQ, FOIL, Aleph → Rules
• SVMs → Weighted Sum (e.g., If 5x1 + 9x2 – 3x3 > 12 Then +)
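The last row can be made concrete. A minimal sketch (the rule and weights come from the slide; the function name is mine) of how a learned weighted-sum concept classifies an example:

```python
# A weighted-sum concept: output '+' when 5*x1 + 9*x2 - 3*x3 > 12 (rule from the slide)
def weighted_sum_classify(x1, x2, x3):
    score = 5 * x1 + 9 * x2 - 3 * x3
    return "+" if score > 12 else "-"

print(weighted_sum_classify(2, 1, 0))  # 5*2 + 9*1 - 3*0 = 19 > 12, so '+'
print(weighted_sum_classify(1, 0, 2))  # 5*1 + 9*0 - 3*2 = -1 <= 12, so '-'
```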



Recall: Feature Space

If examples are described in terms of values of features, they can be plotted as points in an N-dimensional space.

[Figure: a 3-D feature space with axes Size, Color, and Weight; an unlabeled example (Size = Big, Color = Gray, Weight = 2500) is plotted as a “?”]

A “concept” is then a (possibly disjoint) volume in this space
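A minimal sketch of the “points in feature space” idea, assuming a made-up numeric encoding for the categorical features (the encoding is mine, not from the slides):

```python
import numpy as np

# Hypothetical encoding: Size (0=small, 1=big), Color (0=gray, 1=red), Weight in lbs
examples = np.array([
    [1, 0, 2500],   # the "?" point from the figure: Big, Gray, 2500
    [0, 1,  120],
    [1, 1, 3100],
])
# Each row is one example = one point in a 3-dimensional feature space
print(examples.shape)  # (3, 3): three examples, three features
```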



Supervised Learning and Venn Diagrams

Venn Diagram: Concept = A or B (ie, a disjunctive concept)

Feature Space: Examples = labeled points in feature space; Concept = a label for regions of feat. space

[Figure: left, a Venn diagram of overlapping sets A and B with ‘+’ marks inside A ∪ B and ‘–’ marks outside; right, the corresponding feature space scattered with labeled points]



Brief Introduction to Logic

• Conjunctive Concept: Color(?obj1, red) ∧ Size(?obj1, large)

• Disjunctive Concept: Color(?obj2, blue) ∨ Size(?obj2, small)

• More formally, a “concept” is of the form
  ∀x ∀y ∀z F(x, y, z) → Member(x, Class1)
  (x, y, and z range over all instances)



Logical Symbols

∧  and
∨  or
¬  not
→  implies
↔  equivalent
∀  for all
∃  there exists


Induction vs. Deduction

• Deduction – compute what logically follows
  – if we know P(Mary) is true and ∀x P(x) → Q(x), we can deduce Q(Mary)

• Induction – if we observe P(1), P(2), …, P(100), we can induce ∀x P(x)
  – might be wrong

Which does supervised ML do?



Nearest-Neighbor Algorithms
(aka exemplar models, instance-based learning, case-based learning) – Section 18.8.1 of textbook

• Learning ≈ memorize training examples

• Problem solving = find most similar example in memory; output its category

[Figure: left, a Venn-style view with ‘+’ and ‘–’ training points and a ‘?’ query point; right, a “Voronoi Diagram” – each cell is all points closest to the labeled example in its center]


Nearest Neighbors: Basic Algorithm

• Find the K nearest neighbors to the test-set example
• Or find all examples within radius R
• Combine their ‘votes’ (a minimal sketch follows)
  – Most common category
  – Average value (real-valued prediction)
  – Can also weight votes by distance
  – Lots of variations on the basic theme
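A minimal sketch of the basic algorithm (the function names, the Euclidean-distance choice, and the toy data are mine, not from the slides):

```python
import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, query, k=3):
    """Output the most common category among the k nearest training examples."""
    dists = np.linalg.norm(train_X - query, axis=1)   # distance to every example
    nearest = np.argsort(dists)[:k]                   # indices of the k closest
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]                 # majority vote

# Toy 2-D data: two '-' points near the origin, two '+' points near (1, 1)
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = ["-", "-", "+", "+"]
print(knn_classify(X, y, np.array([0.8, 0.9]), k=3))  # '+'
```

Weighting votes by distance would replace the plain Counter tally with weights such as 1/(distance + ε).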



Simple Example: 1-NN
(1-NN ≡ one nearest neighbor)

Training Set
1. a=0, b=0, c=1  +
2. a=0, b=0, c=0  -
3. a=1, b=1, c=1  -

Test Example
a=0, b=1, c=0  ?

“Hamming Distance” (# of different bits)
Ex 1 = 2, Ex 2 = 1, Ex 3 = 2

Ex 2 is the single nearest neighbor and is labeled -, so output -
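A minimal sketch of this worked example (the bit-vectors mirror the table above; the helper name is mine):

```python
def hamming(u, v):
    """Number of positions where two bit-vectors differ."""
    return sum(a != b for a, b in zip(u, v))

train = [((0, 0, 1), "+"),   # Ex 1
         ((0, 0, 0), "-"),   # Ex 2
         ((1, 1, 1), "-")]   # Ex 3
test = (0, 1, 0)

dists = [(hamming(x, test), label) for x, label in train]
print(dists)            # [(2, '+'), (1, '-'), (2, '-')]
print(min(dists)[1])    # '-' : Ex 2 is the single nearest neighbor
```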


Some Common Jargon

Classification – learning a discrete-valued function (discrete outputs)

Regression – learning a real-valued function (real outputs)

(inputs can be real-valued in both cases)

k-NN is easily extended to regression tasks (and to multi-category classification) – HOW? (one standard answer is sketched below)
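A minimal sketch of one standard answer (not from the slides): for regression, replace the majority vote with the average of the neighbors’ output values.

```python
import numpy as np

def knn_regress(train_X, train_y, query, k=3):
    """Predict a real value: the mean of the k nearest neighbors' values."""
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return np.mean([train_y[i] for i in nearest])   # average, not majority vote

X = np.array([[1.0], [2.0], [3.0], [10.0]])
y = [1.5, 2.5, 3.5, 11.0]
print(knn_regress(X, y, np.array([2.2]), k=3))      # mean of 1.5, 2.5, 3.5 = 2.5
```

Multi-category classification needs no change at all: the majority vote already handles more than two labels.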


Sample Experimental Results (see UC-Irvine archive for more)

Testset Correctness

Testbed            1-NN   D-Trees   Neural Nets
Wisconsin Cancer   98%    95%       96%
Heart Disease      78%    76%       ?
Tumor              37%    38%       ?     (Why so low?)
Appendicitis       83%    85%       86%

Simple algorithm works quite well!


Doing Well by Doing Poorly

• You say: “Bad news, my testset accuracy is only 1%” (on a two-category task)

• I say: “That is great news!”

• Why?

• If you NEGATE your predictions,you’ll have 99% accuracy!


Doing Poorly by Doing Well

• You say: “Good news, my testset accuracy is 95%” (on a two-category task)

• I say: “That is bad news!”

• Why might that be?

• Because (let’s assume) the most common output value occurs 99% of the time!


Parameter Tuning (First Visit)

Algo: collect the K nearest neighbors, combine their outputs

What should K be?
– It is problem (ie, testbed) dependent
– Can use tuning sets to select a good setting for K (sketched below)

[Figure: plot of Tuning Set Error Rate versus K = 1, 2, 3, 4, 5; shouldn’t really “connect the dots” (Why?)]
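A minimal sketch of tuning-set selection of K (the data, the split, and all names are mine; the classifier is the earlier k-NN sketch):

```python
import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, query, k):
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def error_rate(train_X, train_y, tune_X, tune_y, k):
    """Fraction of TUNING-set examples that k-NN mislabels."""
    wrong = sum(knn_classify(train_X, train_y, x, k) != y
                for x, y in zip(tune_X, tune_y))
    return wrong / len(tune_y)

# Made-up labeled data split into TRAIN and TUNE (the TEST set stays untouched)
train_X = np.array([[0, 0], [0, 1], [1, 0], [2, 2], [2, 3], [3, 2]])
train_y = ["-", "-", "-", "+", "+", "+"]
tune_X = np.array([[0.5, 0.5], [2.5, 2.5]])
tune_y = ["-", "+"]

# Pick the K whose tuning-set error rate is lowest
best_k = min(range(1, 6),
             key=lambda k: error_rate(train_X, train_y, tune_X, tune_y, k))
print(best_k)
```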


Why Not Use the TEST Set to Select Good Parameters?

A 2002 paper in Nature (a major, major journal) needed to be corrected due to “training on the testing set”

Original report: 95% accuracy (5% error rate)

Corrected report (which still is buggy): 73% accuracy (27% error rate)

Error rate increased over 400%!!!

This is, unfortunately, a very common error


Some ML “Commandments”

• Let the data decide
  – ‘Internalize’ (ie, tune) parameters

• Scaling up by dummying down
  – Don’t ignore simple algo’s, such as
    • always guessing the most common category in the training set (sketched below)
    • finding the best SINGLE feature
  – Clever ideas do not imply better results

• Generalize, don’t memorize
  – Accuracy on held-aside data is our focus

• Never train on the test examples!
  – Commonly violated, alas

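A minimal sketch of the first simple baseline above, the majority-class guesser (names and data are mine):

```python
from collections import Counter

def majority_baseline(train_labels):
    """Always guess the most common category in the TRAINING set."""
    guess = Counter(train_labels).most_common(1)[0][0]
    return lambda example: guess     # the prediction ignores the example entirely

train_labels = ["-", "-", "-", "-", "+"]
predict = majority_baseline(train_labels)
print(predict("any example"))  # '-' : right ~80% of the time on data with this label mix
```

Any clever algorithm should at least beat this baseline on held-aside data.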