
Complexity; Nearest Neighbors

PROF XIAOHUI XIE, SPRING 2019

CS 273P Machine Learning and Data Mining

Slides courtesy of Alex Ihler

Machine Learning

Complexity and Overfitting

Nearest Neighbors

K-Nearest Neighbors

Inductive bias
• “Extend” observed data to unobserved examples

– “Interpolate” / “extrapolate”

• What kinds of functions to expect? Prefer these (“bias”)
  – Usually, let data pull us away from assumptions only with evidence!


Overfitting and complexity

[Figure: data plotted as y vs. x.]

Overfitting and complexity
Simple model: Y = aX + b + e


Overfitting and complexity
Y = high-order polynomial in X (complex model)



How Overfitting affects Prediction

[Figure: predictive error vs. model complexity. Error on training data decreases as complexity grows; error on test data is U-shaped. The ideal range for model complexity lies between underfitting (too simple) and overfitting (too complex).]

Training and Test Data

• Several candidate learning algorithms or models, each of which can be fit to data and used for prediction
• How can we decide which is best?

[Diagram: Data split into Training Data and Test Data.]

Approach 1: Split into train and test data

• Learn parameters of each model from training data
• Evaluate all models on test data, and pick best performer

Problem:
• Over-estimates test performance (“lucky” model)
• Learning algorithms should never have access to test data

Training, Validation, and Test Data

• Several candidate learning algorithms or models, each of which can be fit to data and used for prediction
• How can we decide which is best?

Approach 2: Reserve some data for validation

[Diagram: Data split into Training Data, Validation, and Test Data.]

• Learn parameters of each model from training data
• Evaluate models on validation data, pick best performer
• Reserve test data to benchmark chosen model

Problem:
• Wasteful of training data (learning can’t use validation)
• May bias selection towards overly simple models
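As a concrete illustration of Approach 2, a minimal NumPy sketch (my own, not from the slides); the 60/20/20 split fractions and the polynomial-degree candidates are illustrative assumptions:

import numpy as np

def split_data(X, y, fractions=(0.6, 0.2, 0.2), seed=0):
    # Shuffle and split into train / validation / test sets.
    # The 60/20/20 fractions are an illustrative assumption.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_tr = int(fractions[0] * len(X))
    n_va = int(fractions[1] * len(X))
    tr, va, te = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])

# Example: pick a polynomial degree (the candidate "models") by validation error.
X = np.linspace(0, 10, 200)
y = 2.0 * X + 1.0 + np.random.default_rng(1).normal(0.0, 2.0, size=X.shape)
train, val, test = split_data(X, y)

best = None
for degree in (1, 3, 9):                                  # candidate models (assumed)
    coeffs = np.polyfit(train[0], train[1], degree)       # learn parameters on training data
    val_mse = np.mean((np.polyval(coeffs, val[0]) - val[1]) ** 2)  # evaluate on validation
    if best is None or val_mse < best[1]:
        best = (degree, val_mse, coeffs)

test_mse = np.mean((np.polyval(best[2], test[0]) - test[1]) ** 2)  # benchmark once on test
print("chosen degree:", best[0], " validation MSE:", round(best[1], 2), " test MSE:", round(test_mse, 2))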

Cross-Validation
• Divide training data into K equal-sized folds
• Train on K-1 folds, evaluate on remainder
• Pick model with best average performance across K trials

How many folds?

• Bias: Too few, and effective training dataset much smaller
• Variance: Too many, and test performance estimates noisy
• Cost: Must run training algorithm once per fold (parallelizable)
• Practical rule of thumb: 5-fold or 10-fold cross-validation
• Theoretically troubled: Leave-one-out cross-validation, K=N

[Diagram: training data divided into Fold 1, Fold 2, Fold 3, Fold 4; in each trial a different fold serves as validation data, with the test data held out separately.]
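A compact K-fold cross-validation sketch in NumPy (an illustration, not the course's reference code); the `fit` and `error` callables and the degree-3 polynomial example are assumed placeholders for any model:

import numpy as np

def cross_validation_error(X, y, fit, error, K=5, seed=0):
    # Average validation error over K folds.
    # `fit(X_train, y_train)` must return a predictor f(X) -> yhat;
    # `error(yhat, y)` must return a scalar loss (both are assumed callables).
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), K)   # K roughly equal-sized folds
    errs = []
    for k in range(K):
        val = folds[k]                                   # hold out fold k
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        f = fit(X[train], y[train])                      # train on the other K-1 folds
        errs.append(error(f(X[val]), y[val]))            # evaluate on the held-out fold
    return float(np.mean(errs))

# Example usage with an (assumed) degree-3 polynomial model:
X = np.linspace(0, 10, 100)
y = np.sin(X) + np.random.default_rng(1).normal(0.0, 0.2, size=X.shape)
fit = lambda Xt, yt: (lambda Xq: np.polyval(np.polyfit(Xt, yt, 3), Xq))
mse = lambda yhat, ytrue: np.mean((yhat - ytrue) ** 2)
print("5-fold CV MSE:", cross_validation_error(X, y, fit, mse, K=5))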

Competitions
• Training data
  – Used to build your model(s)
• Validation data
  – Used to assess, select among, or combine models
  – Personal validation; leaderboard; …
• Test data
  – Used to estimate “real world” performance

Machine Learning

Complexity and Overfitting

Nearest Neighbors

K-Nearest Neighbors

How does machine learning work?

[Diagram: a program (“Learner”), characterized by some “parameters” θ, is a procedure (using θ) that outputs a prediction from features (x). To “train”, a learning algorithm scores performance against the feedback / target values (y) using a cost function and changes θ to improve performance; to “predict”, the learner maps new features to outputs.]

Notation
● Features x
● Targets y
● Predictions ŷ = f(x ; θ)
● Parameters θ

Regression; Scatter plots

• Suggests a relationship between x and y
• Regression: given new observed x(new), estimate y(new)

[Figure: scatter plot of Target y vs. Feature x; a new input x(new) is marked and its target y(new) = ? must be estimated.]

Nearest neighbor regression

Find training datum x(i) closest to x(new); predict y(i)

[Figure: the same scatter plot of Target y vs. Feature x; the training point nearest to x(new) supplies the prediction y(new).]

“Predictor”: Given new features: find nearest example; return its value.

Nearest neighbor regression

• Find training datum x(i) closest to x(new); predict y(i)
• Defines an (implicit) function f(x)
• “Form” is piecewise constant

“Predictor”: Given new features: find nearest example; return its value.

[Figure: the resulting piecewise-constant prediction function, Target y vs. Feature x.]
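A minimal 1-nearest-neighbor regression sketch in NumPy, matching the “Predictor” box above (my own illustration; the tiny dataset is made up):

import numpy as np

def nn_regress(X_train, y_train, x_new):
    # Return the target value of the training point closest to x_new.
    dists = np.abs(X_train - x_new)      # 1-D feature: absolute distance
    return y_train[np.argmin(dists)]     # value of the nearest example

# Tiny usage example with made-up data:
X_train = np.array([1.0, 4.0, 7.0, 12.0])
y_train = np.array([3.0, 8.0, 20.0, 35.0])
print(nn_regress(X_train, y_train, x_new=6.0))   # -> 20.0 (nearest training x is 7)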

Nearest neighbor classifier

[Figure: two-class training data, labeled 0 and 1, in the (X1, X2) plane, with a query point “?” to be classified.]

“Predictor”: Given new features: find nearest example; return its value.

Nearest neighbor classifier

[Figure: the same two-class data in the (X1, X2) plane with the query point “?”.]

“Predictor”: Given new features: find nearest example; return its value.

“Closest” training x? Typically Euclidean distance: D(x(i), x(new)) = ( Σ_d (x_d(i) − x_d(new))² )^(1/2)
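As a quick illustration (my own sketch, not from the slides), the Euclidean distances from a query point to all training examples can be computed in one vectorized NumPy expression:

import numpy as np

X_train = np.array([[1.0, 2.0], [3.0, 1.0], [0.5, 4.0]])   # made-up training features
x_new = np.array([2.0, 2.0])                               # query point

# Euclidean distance from x_new to every training example (broadcasting)
dists = np.sqrt(np.sum((X_train - x_new) ** 2, axis=1))
print(dists)             # [1.     1.414...  2.5   ]
print(np.argmin(dists))  # index of the nearest training example -> 0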

Nearest neighbor classifier

[Figure: the (X1, X2) plane partitioned into all points where we decide 1 and all points where we decide 0, separated by the decision boundary.]

Nearest neighbor classifier

[Figure: Voronoi diagram of the training points in the (X1, X2) plane; the nearest-neighbor rule gives a piecewise linear boundary between Class 0 and Class 1.]

Nearest neighbor classifier

[Figure: Voronoi regions of the two-class training data in the (X1, X2) plane.]

Voronoi diagram: each datum is assigned to a region, in which all points are closer to it than to any other datum.

Decision boundary: those edges across which the decision (class of nearest training datum) changes.

Nearest Nbr: piecewise linear boundary

More Data Points

[Figure: a larger two-class training set, labeled 1 and 2, in the (X1, X2) plane.]

More Complex Decision Boundary

[Figure: the larger two-class training set in the (X1, X2) plane with its more complex, piecewise linear decision boundary.]

In general: the nearest-neighbor classifier produces piecewise linear decision boundaries.

Issue: Neighbor Distance & Dimension

For m = 1000 data points distributed uniformly in the unit hypercube, the distance to the nearest neighbor grows quickly with the dimension d: in high dimensions the “nearest” neighbor is nearly as far away as a typical point, so distance-based prediction becomes less informative.
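A small experiment (my own sketch, not from the slides) that estimates the average nearest-neighbor distance for m = 1000 uniform points in the d-dimensional unit hypercube:

import numpy as np

def mean_nn_distance(m=1000, d=2, seed=0):
    # Average distance from each of m uniform points in the d-dimensional
    # unit hypercube to its nearest neighbor.
    rng = np.random.default_rng(seed)
    X = rng.random((m, d))
    sq = (X ** 2).sum(axis=1)
    # Pairwise squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(D2, np.inf)                     # ignore each point's distance to itself
    return float(np.sqrt(np.maximum(D2, 0.0)).min(axis=1).mean())

for d in (1, 2, 5, 10, 100):
    print(d, round(mean_nn_distance(d=d), 3))
# The average nearest-neighbor distance grows with the dimension d,
# so in high dimensions the "nearest" training point is not very near.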

Machine Learning

Complexity and Overfitting

Nearest Neighbors

K-Nearest Neighbors

K-Nearest Neighbor (kNN) Classifier
Find the k-nearest neighbors to x in the data:

• i.e., rank the feature vectors according to Euclidean distance

• select the k vectors which have the smallest distance to x

Regression

• Usually just average the y-values of the k closest training examples

Classification

• ranking yields k feature vectors and a set of k class labels
• pick the class label which is most common in this set (“vote”)
• classify x as belonging to this class
• Note: for two-class problems, if k is odd (k = 1, 3, 5, …) there will never be any “ties”; otherwise, just use (any) tie-breaking rule

“Training” is trivial: just use the training data as a lookup table, and search it to classify a new datum. Can be computationally expensive: must find nearest neighbors within a potentially large training set.
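A short kNN classification sketch in NumPy implementing the ranking-and-voting procedure above (an illustration, not the course's reference code; the toy dataset is made up):

import numpy as np

def knn_classify(X_train, y_train, x_new, k=3):
    # Classify x_new by majority vote among its k nearest training examples.
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))   # Euclidean distances
    nearest = np.argsort(dists)[:k]                         # indices of the k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                        # most common class ("vote")

# Usage with a tiny made-up two-class dataset:
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_classify(X_train, y_train, np.array([0.5, 0.5]), k=3))  # -> 0
print(knn_classify(X_train, y_train, np.array([5.5, 5.5]), k=3))  # -> 1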

kNN Decision Boundary
• Piecewise linear decision boundary
• Increasing k “simplifies” decision boundary
  – Majority voting means less emphasis on individual points

[Figures: kNN decision boundaries for K = 1, 3, 5, 7, and 25.]

Error rates and K

[Figure: predictive error vs. K (# neighbors). Error on training data is zero at K = 1 (the training data have been memorized); error on test data reaches its minimum at some intermediate, best value of K.]

Complexity & Overfitting
• Complex model predicts all training points well
• Doesn’t generalize to new data points
• k = 1 : perfect memorization of examples (complex)
• k = m : always predict majority class in dataset (simple)
• Can select k using validation data, etc.

[Figure: model complexity as a function of K (# neighbors): small K is too complex, larger K is simpler.]
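A self-contained sketch (my own illustration) of selecting k with held-out validation data, as the slide suggests; the candidate k values and the toy data are assumptions:

import numpy as np

def knn_predict(X_tr, y_tr, x, k):
    # Majority vote among the k nearest training points (Euclidean distance).
    nearest = np.argsort(np.sqrt(((X_tr - x) ** 2).sum(axis=1)))[:k]
    labels, counts = np.unique(y_tr[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def select_k(X_tr, y_tr, X_va, y_va, candidates=(1, 3, 5, 7, 25)):
    # Return the k with the lowest error rate on the validation set.
    errors = {}
    for k in candidates:
        preds = np.array([knn_predict(X_tr, y_tr, x, k) for x in X_va])
        errors[k] = float(np.mean(preds != y_va))
    return min(errors, key=errors.get), errors

# Usage with small made-up two-class data:
rng = np.random.default_rng(0)
X_tr = rng.random((60, 2)); y_tr = (X_tr[:, 0] > 0.5).astype(int)
X_va = rng.random((20, 2)); y_va = (X_va[:, 0] > 0.5).astype(int)
best_k, errs = select_k(X_tr, y_tr, X_va, y_va)
print("best k:", best_k, " validation error rates:", errs)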

K-Nearest Neighbor (kNN) Classifier: Theoretical Considerations
• as k increases
  – we are averaging over more neighbors
  – the effective decision boundary is more “smooth”
• as m increases, the optimal k value tends to increase

Extensions of the Nearest Neighbor classifier
• Weighted distances
  – e.g., some features may be more important; others may be irrelevant
• Fast search techniques (indexing) to find k-nearest points in d-space
• Weighted average / voting based on distance
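A brief sketch of the “weighted average / voting based on distance” extension for regression (my own illustration; the inverse-distance weighting is one common, assumed choice):

import numpy as np

def weighted_knn_regress(X_train, y_train, x_new, k=3, eps=1e-12):
    # Predict y as a weighted average of the k nearest targets,
    # with weights proportional to 1 / distance (an assumed weighting scheme).
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    w = 1.0 / (dists[nearest] + eps)          # closer neighbors get larger weight
    return float(np.sum(w * y_train[nearest]) / np.sum(w))

# Usage with made-up one-feature data (rows are examples):
X_train = np.array([[1.0], [2.0], [4.0], [8.0]])
y_train = np.array([1.0, 2.0, 4.0, 8.0])
print(weighted_knn_regress(X_train, y_train, np.array([3.0]), k=3))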

Summary
• K-nearest neighbor models
  – Classification (vote)
  – Regression (average or weighted average)
• Piecewise linear decision boundary
  – How to calculate
• Test data and overfitting
  – Model “complexity” for kNN
  – Use validation data to estimate test error rates & select k

Upcoming

● Sign up on Piazza
  ○ All course announcements will be made through Piazza! You may miss important announcements if you are not on Piazza.
● Homework 1 will be up soon.
  ○ Meanwhile, get familiar with Python, Numpy, Matplotlib

Acknowledgement

Based on slides by Alex Ihler & Sameer Singh