Nearest Neighbor & Information Retrieval Search Artificial Intelligence CMSC 25000 January 29, 2004.
Learning: Nearest Neighbor Artificial Intelligence CMSC 25000 January 31, 2002.
Agenda
• Machine learning: Introduction
• Nearest neighbor techniques
– Applications: Robotic motion, Credit rating
• Efficient implementations:
– k-d trees, parallelism
• Extensions: K-nearest neighbor
• Limitations:
– Distance, dimensions, & irrelevant attributes
Machine Learning
• Learning: Acquiring a function from inputs to values, based on past input-value pairs, that can be applied to new inputs.
• Learn concepts, classifications, values
– Identify regularities in data
Machine Learning Examples
• Pronunciation:
– Spelling of word => sounds
• Speech recognition:
– Acoustic signals => sentences
• Robot arm manipulation:
– Target => torques
• Credit rating:
– Financial data => loan qualification
Machine Learning Characterization
• Distinctions:
– Are output values known for any inputs?
• Supervised vs unsupervised learning
– Supervised: training consists of inputs + true output value
» E.g. letters + pronunciation
– Unsupervised: training consists only of inputs
» E.g. letters only
• This course studies supervised methods
Machine Learning Characterization
• Distinctions:
– Are output values discrete or continuous?
• Discrete: “Classification”
– E.g. Qualified/Unqualified for a loan application
• Continuous: “Regression”
– E.g. Torques for robot arm motion
• Characteristic of the task
Machine Learning Characterization
• Distinctions:
– What form of function is learned?
• Also called “inductive bias”
• Graphically, the decision boundary
• E.g. Single linear separator
– Rectangular boundaries - ID trees
– Voronoi spaces, etc.
[Figure: a linear separator dividing + examples from - examples]
Machine Learning Functions
• Problem: Can the representation effectively model the class to be learned?
• Motivates selection of learning algorithm
[Figure: + examples above a horizontal band of - examples]
For this function, a linear discriminant is GREAT!
Rectangular boundaries (e.g. ID trees): TERRIBLE!
Pick the right representation!
Machine Learning Features
• Inputs:
– E.g. words, acoustic measurements, financial data
– Vectors of features:
• E.g. word: letters
– ‘cat’: L1 = c; L2 = a; L3 = t
• Financial data:
– F1 = # late payments/yr : Integer
– F2 = Ratio of income to expense : Real
Machine Learning Features
• Questions:
– Which features should be used?
– How should they relate to each other?
• Issue 1: How do we define distance in feature space if features have different scales?
– Solution: Scaling/normalization
• Issue 2: Which features are important?
– If instances differ only in an irrelevant feature, it should be ignored
Complexity & Generalization
• Goal: Predict values accurately on new inputs
• Problem:
– Train on sample data
– Can make an arbitrarily complex model to fit it
– BUT, it will probably perform badly on NEW data
• Strategy:
– Limit complexity of model (e.g. degree of equation)
– Split training and validation sets
• Hold out data to check for overfitting
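The hold-out strategy above can be sketched as follows; a minimal illustration assuming examples arrive as a plain list (function and parameter names are ours, not the lecture's):

```python
import random

def train_validation_split(examples, validation_fraction=0.3, seed=0):
    """Shuffle the examples and hold out a fraction for validation."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    n_val = int(len(examples) * validation_fraction)
    # Held-out validation data is never used for fitting; it only
    # checks whether the model overfits the training set.
    return examples[n_val:], examples[:n_val]  # (training, validation)
```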
Nearest Neighbor
• Memory- or case-based learning
• Supervised method: Training
– Record labeled instances and feature-value vectors
• For each new, unlabeled instance:
– Identify “nearest” labeled instance
– Assign same label
• Consistency heuristic: Assume that a property is the same as that of the nearest reference case.
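A minimal sketch of the method above: "training" just records labeled instances, and classification returns the label of the closest one (Euclidean distance is assumed here; the names are illustrative):

```python
import math

def train(instances):
    """'Training' is simply recording (feature_vector, label) pairs."""
    return list(instances)

def nearest_neighbor_label(memory, query):
    """Label the query with the label of the nearest stored instance."""
    def distance(x, y):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    features, label = min(memory, key=lambda inst: distance(inst[0], query))
    return label
```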
Nearest Neighbor Example
• Problem: Robot arm motion
– Difficult to model analytically
• Kinematic equations
– Relate joint angles and manipulator positions
• Dynamics equations
– Relate motor torques to joint angles
– Difficult to achieve good results modeling robotic or human arms
• Many factors & measurements
Nearest Neighbor Example
• Solution:
– Move robot arm around
– Record parameters and trajectory segment
• Table: torques, positions, velocities, squared velocities, velocity products, accelerations
– To follow a new path:
• Break it into segments
• Find closest segments in table
• Get those torques (interpolate as necessary)
Nearest Neighbor Example
• Issue: Big table
– First time with a new trajectory:
• “Closest” isn’t close
• Table is sparse - few entries
• Solution: Practice
– As arm attempts trajectory, fill in more of table
• After a few attempts, very close
Nearest Neighbor Example II
• Credit Rating:
– Classifier: Good / Poor
– Features:
• L = # late payments/yr
• R = Income/Expenses

Name  L   R     G/P
A     0   1.2   G
B     25  0.4   P
C     5   0.7   G
D     20  0.8   P
E     30  0.85  P
F     11  1.2   G
G     7   1.15  G
H     15  0.8   P
Nearest Neighbor Example II
[Figure: labeled instances A-H from the table above plotted in (L, R) space, L from 0 to 30, R from 0 to 1.2]
Nearest Neighbor Example II
New, unlabeled instances:

Name  L   R     G/P
H     6   1.15  ?
I     22  0.45  ?
J     15  1.2   ?

[Figure: new instances H, I, J plotted among the labeled instances in (L, R) space]
Distance measure:
sqrt((L1 - L2)^2 + (sqrt(10) * (R1 - R2))^2)
- Scaled distance
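The scaled metric above, applied to the credit-rating data from the slides, can be sketched as (function names are ours):

```python
import math

# Labeled instances from the credit-rating slide: ((L, R), Good/Poor)
LABELED = [
    ((0, 1.2), 'G'), ((25, 0.4), 'P'), ((5, 0.7), 'G'), ((20, 0.8), 'P'),
    ((30, 0.85), 'P'), ((11, 1.2), 'G'), ((7, 1.15), 'G'), ((15, 0.8), 'P'),
]

def scaled_distance(p, q):
    """The slide's metric: sqrt((L1-L2)^2 + (sqrt(10)*(R1-R2))^2)."""
    (l1, r1), (l2, r2) = p, q
    return math.sqrt((l1 - l2) ** 2 + (math.sqrt(10) * (r1 - r2)) ** 2)

def classify(query):
    """Assign the label of the nearest labeled instance."""
    features, label = min(LABELED, key=lambda inst: scaled_distance(inst[0], query))
    return label
```

With this metric, the new instance at (6, 1.15) falls nearest G and is labeled Good, while (22, 0.45) falls nearest D and is labeled Poor.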
Efficient Implementations
• Classification cost:
– Find nearest neighbor: O(n)
• Compute distance between unknown and all instances
• Compare distances
– Problematic for large data sets
• Alternative:
– Use binary search to reduce to O(log n)
Efficient Implementation: K-D Trees
• Divide instances into sets based on features
– Binary branching: E.g. feature > value?
– 2^d leaves with split-path depth d; 2^d = n gives d = O(log n)
– To split cases into sets:
• If there is one element in the set, stop
• Otherwise pick a feature to split on
– Find the average position of the two middle objects on that dimension
» Split remaining objects based on that average position
» Recursively split the subsets
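The construction above can be sketched as follows, using the credit data from the slides. One assumption: since the slide only says "pick a feature to split on", this sketch cycles through features by depth, so the resulting tree may differ from the one drawn on the next slide (which splits on R first):

```python
# Labeled credit instances from the slides: ((L, R), Good/Poor)
CREDIT = [
    ((0, 1.2), 'G'), ((25, 0.4), 'P'), ((5, 0.7), 'G'), ((20, 0.8), 'P'),
    ((30, 0.85), 'P'), ((11, 1.2), 'G'), ((7, 1.15), 'G'), ((15, 0.8), 'P'),
]

def build_kdtree(points, depth=0):
    """Recursively split instances; a leaf holds a single label."""
    if len(points) == 1:
        return points[0][1]
    # Assumption: choose the split feature by cycling through dimensions.
    axis = depth % len(points[0][0])
    pts = sorted(points, key=lambda p: p[0][axis])
    mid = len(pts) // 2
    # Split at the average position of the two middle objects.
    threshold = (pts[mid - 1][0][axis] + pts[mid][0][axis]) / 2
    return (axis, threshold,
            build_kdtree(pts[:mid], depth + 1),
            build_kdtree(pts[mid:], depth + 1))

def kdtree_classify(tree, x):
    """Descend one root-to-leaf path: O(depth) = O(log n) comparisons."""
    while isinstance(tree, tuple):
        axis, threshold, left, right = tree
        tree = right if x[axis] > threshold else left
    return tree
```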
K-D Trees: Classification
R > 0.825?
  No: L > 17.5?
    No: R > 0.75?
      No: Good    Yes: Poor
    Yes: R > 0.6?
      No: Poor    Yes: Poor
  Yes: L > 9?
    No: R > 1.175?
      No: Good    Yes: Good
    Yes: R > 1.025?
      No: Poor    Yes: Good
Efficient Implementation:Parallel Hardware
• Classification cost:
– # distance computations
• Constant time with O(n) processors
– Cost of finding closest:
• Compute pairwise minimums, successively
• O(log n) time
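The pairwise-minimum reduction above can be sketched sequentially; each loop iteration stands in for one parallel round, so with a processor per pair only O(log n) rounds are needed (the function name is ours):

```python
def tree_min(values):
    """Pairwise-minimum reduction: each round halves the candidates."""
    values = list(values)
    while len(values) > 1:
        # One "parallel round": compare disjoint pairs simultaneously.
        paired = [min(values[i], values[i + 1])
                  for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:          # odd element passes through unchanged
            paired.append(values[-1])
        values = paired
    return values[0]
```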
Nearest Neighbor: Issues
• Prediction can be expensive if many features
• Affected by classification noise and feature noise
– One entry can change prediction
• Definition of distance metric– How to combine different features
• Different types, ranges of values
• Sensitive to feature selection
Nearest Neighbor Analysis
• Problem:
– Ambiguous labeling, training noise
• Solution:
– K-nearest neighbors
• Not just the single nearest instance
• Compare to the K nearest neighbors
– Label according to majority of the K
• What should K be?
– Often 3; can also be trained
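The majority-vote extension above, sketched minimally (Euclidean distance and the function name are our assumptions):

```python
import math
from collections import Counter

def knn_label(memory, query, k=3):
    """Label by majority vote among the k nearest stored instances."""
    def distance(x, y):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    # Sort all stored instances by distance, keep the k closest.
    ranked = sorted(memory, key=lambda inst: distance(inst[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

With k = 3, a single mislabeled or noisy neighbor is outvoted by the other two.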
Nearest Neighbor: Analysis
• Issues:
– What is a good distance metric?
– How should features be combined?
• Strategy:
– (Typically weighted) Euclidean distance
– Feature scaling: Normalization
• Good starting point:
– (Feature - Feature_mean) / Feature_standard_deviation
– Rescales all values - centered on 0 with std_dev 1
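The normalization above, (feature - mean) / standard_deviation, can be sketched per feature column (the function name is ours; population standard deviation is assumed):

```python
import math

def standardize(columns):
    """Rescale each feature column to mean 0, standard deviation 1."""
    scaled = []
    for col in columns:
        mean = sum(col) / len(col)
        # Population standard deviation of this feature column.
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        scaled.append([(v - mean) / std for v in col])
    return scaled
```

After this rescaling, a unit step along any feature axis represents one standard deviation, so no single feature dominates the distance.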
Nearest Neighbor: Analysis
• Issue:
– What features should we use?
• E.g. Credit rating: Many possible features
– Tax bracket, debt burden, retirement savings, etc.
• Nearest neighbor uses ALL features
– Irrelevant feature(s) could mislead
• Fundamental problem with nearest neighbor
Nearest Neighbor: Advantages
• Fast training:
– Just record feature vector - output value pairs
• Can model a wide variety of functions
– Complex decision boundaries
– Weak inductive bias
• Very generally applicable
Summary
• Machine learning:
– Acquire a function from input features to values
• Based on prior training instances
– Supervised vs Unsupervised learning
• Classification and Regression
– Inductive bias:
• Representation of the function to learn
• Complexity, Generalization, & Validation