
Learning: Nearest Neighbor

Artificial Intelligence

CMSC 25000

January 31, 2002

Agenda

• Machine learning: Introduction
• Nearest neighbor techniques
  – Applications: Robotic motion, credit rating
• Efficient implementations:
  – k-d trees, parallelism
• Extensions: K-nearest neighbor
• Limitations:
  – Distance, dimensions, & irrelevant attributes

Machine Learning

• Learning: Acquiring, from past inputs and their values, a function that maps new inputs to values
• Learn concepts, classifications, values
  – Identify regularities in data

Machine Learning Examples

• Pronunciation:
  – Spelling of word => sounds
• Speech recognition:
  – Acoustic signals => sentences
• Robot arm manipulation:
  – Target => torques
• Credit rating:
  – Financial data => loan qualification

Machine Learning Characterization

• Distinctions:
  – Are output values known for any inputs?
• Supervised vs. unsupervised learning
  – Supervised: training consists of inputs + true output values
    » E.g. letters + pronunciation
  – Unsupervised: training consists only of inputs
    » E.g. letters only

• Course studies supervised methods

Machine Learning Characterization

• Distinctions:
  – Are output values discrete or continuous?
• Discrete: “Classification”
  – E.g. qualified/unqualified for a loan application
• Continuous: “Regression”
  – E.g. torques for robot arm motion

• Characteristic of task

Machine Learning Characterization

• Distinctions:
  – What form of function is learned?

• Also called “inductive bias”

• Graphically, decision boundary

• E.g. single linear separator
  – Rectangular boundaries – ID trees
  – Voronoi spaces, etc.

[Figure: + and − examples separated by a decision boundary]

Machine Learning Functions

• Problem: Can the representation effectively model the class to be learned?

• Motivates selection of learning algorithm

[Figure: + examples and − examples separable by a single straight line but not by axis-parallel rectangles]

For this function, a linear discriminant is GREAT!
Rectangular boundaries (e.g. ID trees) are TERRIBLE!

Pick the right representation!

Machine Learning Features

• Inputs:
  – E.g. words, acoustic measurements, financial data
  – Vectors of features:
    • E.g. word as letters – ‘cat’: L1 = c; L2 = a; L3 = t
    • Financial data: F1 = # late payments/yr (integer)
    • F2 = ratio of income to expense (real)
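As a concrete illustration of those feature vectors, a minimal sketch in Python (the dictionary encoding is just one convenient choice; the credit values shown match instance C from the table later in the lecture):

# Feature-vector encodings for two of the examples above.
word_features = {'L1': 'c', 'L2': 'a', 'L3': 't'}   # the word 'cat' as letter features
credit_features = {'F1': 5,    # number of late payments per year (integer)
                   'F2': 0.7}  # ratio of income to expense (real)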

Machine Learning Features

• Questions:
  – Which features should be used?
  – How should they relate to each other?
• Issue 1: How do we define relations in feature space if features have different scales?
  – Solution: Scaling/normalization
• Issue 2: Which features are important?
  – If two instances differ only in an irrelevant feature, the difference should be ignored

Complexity & Generalization

• Goal: Predict values accurately on new inputs

• Problem:
  – Train on sample data
  – Can make an arbitrarily complex model to fit it
  – BUT it will probably perform badly on NEW data
• Strategy:
  – Limit the complexity of the model (e.g. degree of the equation)
  – Split training and validation sets
    • Hold out data to check for overfitting (sketched below)
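A minimal sketch of that hold-out strategy, assuming the labeled data is simply a Python list; the function name and the 20% split are illustrative choices, not from the slides:

import random

def split_train_validation(data, validation_fraction=0.2, seed=0):
    # Shuffle a copy of the labeled data, then hold out a fraction of it.
    # A model that fits the training split far better than the validation
    # split is probably too complex (overfitting).
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]   # (training set, validation set)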

Nearest Neighbor

• Memory- or case-based learning
• Supervised method: Training
  – Record labeled instances and feature-value vectors
• For each new, unlabeled instance:
  – Identify the “nearest” labeled instance
  – Assign the same label (sketched in code below)

• Consistency heuristic: Assume that a property is the same as that of the nearest reference case.
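A minimal sketch of training and prediction as just described, assuming instances are stored as (feature_vector, label) pairs and using plain, unscaled Euclidean distance (scaling is discussed later):

import math

def euclidean(x, y):
    # Unscaled Euclidean distance between two equal-length feature vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def nearest_neighbor_label(query, training_data):
    # "Training" is just storing the (features, label) pairs; prediction
    # finds the closest stored instance and copies its label.
    features, label = min(training_data, key=lambda pair: euclidean(query, pair[0]))
    return label

# e.g. nearest_neighbor_label([1.0, 2.0], [([0.0, 0.0], 'A'), ([1.0, 2.5], 'B')]) -> 'B'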

Nearest Neighbor Example

• Problem: Robot arm motion
  – Difficult to model analytically
    • Kinematic equations
      – Relate joint angles and manipulator positions
    • Dynamics equations
      – Relate motor torques to joint angles
  – Difficult to achieve good results modeling robotic or human arms

• Many factors & measurements

Nearest Neighbor Example

• Solution:
  – Move the robot arm around
  – Record parameters and trajectory segments
    • Table: torques, positions, velocities, squared velocities, velocity products, accelerations
  – To follow a new path:
    • Break it into segments

• Find closest segments in table

• Get those torques (interpolate as necessary)

Nearest Neighbor Example

• Issue: Big table
  – First time with a new trajectory:

• “Closest” isn’t close

• Table is sparse - few entries

• Solution: Practice
  – As we attempt the trajectory, fill in more of the table
    • After a few attempts, the nearest entries are very close

Nearest Neighbor Example II

• Credit rating:
  – Classifier: Good / Poor
  – Features:
    • L = # late payments/yr
    • R = Income/Expenses

Name   L    R     G/P
A       0   1.20   G
B      25   0.40   P
C       5   0.70   G
D      20   0.80   P
E      30   0.85   P
F      11   1.20   G
G       7   1.15   G
H      15   0.80   P

Nearest Neighbor Example II

Name   L    R     G/P
A       0   1.20   G
B      25   0.40   P
C       5   0.70   G
D      20   0.80   P
E      30   0.85   P
F      11   1.20   G
G       7   1.15   G
H      15   0.80   P

[Plot: training instances A–H plotted by their L values (0–30) and R values (up to about 1.2)]

Nearest Neighbor Example II

[Plot: training instances A–H plotted by L and R, with new instances H, I, and J added]

New, unlabeled instances and their nearest-neighbor labels:

Name   L    R     G/P
H       6   1.15   G
I      22   0.45   P
J      15   1.20   ??

Distance measure:
  d = sqrt( (L1 - L2)^2 + [sqrt(10) * (R1 - R2)]^2 )
  – Scaled distance
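As a check on that formula, a small sketch that applies the scaled distance to the training table and labels a query by its nearest neighbor; the training values come from the table above, and the query shown is instance I:

import math

# Training instances from the credit table: name -> (L, R, label)
TRAINING = {
    'A': (0, 1.20, 'G'), 'B': (25, 0.40, 'P'), 'C': (5, 0.70, 'G'),
    'D': (20, 0.80, 'P'), 'E': (30, 0.85, 'P'), 'F': (11, 1.20, 'G'),
    'G': (7, 1.15, 'G'), 'H': (15, 0.80, 'P'),
}

def scaled_distance(l1, r1, l2, r2):
    # sqrt((L1 - L2)^2 + [sqrt(10) * (R1 - R2)]^2): differences in R are up-weighted
    return math.sqrt((l1 - l2) ** 2 + 10 * (r1 - r2) ** 2)

def classify(l, r):
    nearest = min(TRAINING, key=lambda name: scaled_distance(l, r, *TRAINING[name][:2]))
    return TRAINING[nearest][2]

print(classify(22, 0.45))   # query I: its nearest training instance is labeled 'P'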

Efficient Implementations

• Classification cost:
  – Find nearest neighbor: O(n)

• Compute distance between unknown and all instances

• Compare distances

– Problematic for large data sets

• Alternative:
  – Use binary search to reduce to O(log n)

Efficient Implementation: K-D Trees

• Divide instances into sets based on features
  – Binary branching: e.g. > value
  – 2^d leaves with split paths of depth d; if 2^d = n, then d = O(log n)
  – To split cases into sets (sketched in code below):
    • If there is one element in the set, stop
    • Otherwise pick a feature to split on
      – Find the average position of the two middle objects on that dimension
        » Split the remaining objects based on that average position
        » Recursively split the subsets
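A minimal sketch of that construction, assuming instances are (feature_vector, label) pairs; cycling through the dimensions to pick the splitting feature is one simple convention the slides leave open:

def build_kd_tree(instances, depth=0):
    # Stop when a set has a single instance (or only duplicates we cannot split).
    if len(instances) <= 1:
        return {'leaf': instances}
    d = depth % len(instances[0][0])          # feature (dimension) to split on
    ordered = sorted(instances, key=lambda inst: inst[0][d])
    mid = len(ordered) // 2
    # Average position of the two middle objects on that dimension.
    threshold = (ordered[mid - 1][0][d] + ordered[mid][0][d]) / 2.0
    left = [inst for inst in ordered if inst[0][d] <= threshold]
    right = [inst for inst in ordered if inst[0][d] > threshold]
    if not left or not right:                 # every value equal on this feature
        return {'leaf': ordered}
    return {'feature': d, 'threshold': threshold,
            'left': build_kd_tree(left, depth + 1),
            'right': build_kd_tree(right, depth + 1)}

# e.g. tree = build_kd_tree([((0, 1.2), 'G'), ((25, 0.4), 'P'), ((15, 0.8), 'P')])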

K-D Trees: Classification

[Decision tree diagram: the root tests R > 0.825; its No branch tests L > 17.5 and its Yes branch tests L > 9; the next level tests R > 0.6, R > 0.75, R > 1.025, and R > 1.175; each leaf is labeled Good or Poor]
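Classification descends such a tree exactly as in the diagram: compare the query's value for the node's feature with the threshold and follow that branch until a leaf is reached. A sketch against the tree structure built above; note that this greedy descent, like the diagram, does not backtrack to check neighboring regions:

def classify_kd(tree, point):
    # Follow threshold tests down to a leaf, then copy the stored label.
    while 'leaf' not in tree:
        branch = 'right' if point[tree['feature']] > tree['threshold'] else 'left'
        tree = tree[branch]
    return tree['leaf'][0][1]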

Efficient Implementation: Parallel Hardware

• Classification cost:
  – # distance computations
    • Constant time with O(n) processors
  – Cost of finding the closest:
    • Compute pairwise minimums, successively (sketched below)

• O(log n) time
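A small sequential simulation of that reduction; each pass of the loop corresponds to one parallel round, so n distances shrink to a single minimum in O(log n) rounds (the function name is illustrative):

def tournament_min(values):
    # Repeatedly take pairwise minimums; one loop iteration = one parallel round.
    values = list(values)
    while len(values) > 1:
        nxt = [min(values[i], values[i + 1]) for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:                   # an odd element advances to the next round
            nxt.append(values[-1])
        values = nxt
    return values[0]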

Nearest Neighbor: Issues

• Prediction can be expensive if many features

• Affected by classification noise and feature noise
  – One entry can change the prediction
• Definition of the distance metric
  – How to combine different features

• Different types, ranges of values

• Sensitive to feature selection

Nearest Neighbor Analysis

• Problem:
  – Ambiguous labeling, training noise
• Solution:
  – K-nearest neighbors
    • Not just the single nearest instance
    • Compare to the K nearest neighbors
      – Label according to the majority of the K (sketched below)
• What should K be?
  – Often 3; K can also be chosen by training
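A minimal K-nearest-neighbor sketch, using the same (feature_vector, label) representation as the earlier nearest-neighbor sketch, with a majority vote over the K closest instances:

import math
from collections import Counter

def knn_label(query, training_data, k=3):
    # Sort the stored instances by distance, keep the k closest, and vote.
    def euclidean(x, y):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    neighbors = sorted(training_data, key=lambda pair: euclidean(query, pair[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]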

Nearest Neighbor: Analysis

• Issue:
  – What is a good distance metric?
  – How should features be combined?
• Strategy:
  – (Typically weighted) Euclidean distance
  – Feature scaling: normalization
• Good starting point (sketched below):
  – (Feature - Feature_mean) / Feature_standard_deviation
  – Rescales all values: centered on 0 with standard deviation 1
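A sketch of that normalization for a list of numeric feature vectors; population standard deviation is used here, and a constant feature is left unscaled to avoid dividing by zero:

import statistics

def normalize(vectors):
    # Rescale each feature to (value - mean) / standard_deviation so every
    # feature is centered on 0 with standard deviation 1.
    columns = list(zip(*vectors))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(vec, means, stds)] for vec in vectors]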

Nearest Neighbor: Analysis

• Issue:
  – What features should we use?
    • E.g. credit rating: many possible features
      – Tax bracket, debt burden, retirement savings, etc.
  – Nearest neighbor uses ALL of them
  – Irrelevant feature(s) could mislead

• Fundamental problem with nearest neighbor

Nearest Neighbor: Advantages

• Fast training:
  – Just record the feature vector / output value pairs
• Can model a wide variety of functions
  – Complex decision boundaries
  – Weak inductive bias

• Very generally applicable

Summary

• Machine learning:
  – Acquire a function from input features to values

• Based on prior training instances

– Supervised vs. unsupervised learning
  • Classification and regression
– Inductive bias:
  • Representation of the function to learn

• Complexity, Generalization, & Validation

Summary: Nearest Neighbor

• Nearest neighbor:
  – Training: record input vectors + output values
  – Prediction: use the closest training instance to label new data

• Efficient implementations

• Pros: fast training, very general, little bias

• Cons: distance metric (scaling), sensitivity to noise & extraneous features