Know Thy Neighbor: An Introduction to Scikit-learn and K-NN

Portia Burton, Portland Data Science Group, March 25, 2014


Page 1: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN

Know Thy Neighbor: An Introduction to Scikit-learn and K-NN

Portia Burton
Portland Data Science Group
March 25, 2014

Page 2: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN

What We Will Cover Today

1. Define what machine learning is
2. Go over scikit-learn
3. Explain k-Nearest Neighbor
4. Demo of scikit-learn and k-Nearest Neighbor

Page 3: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN

What Is Machine Learning

• The art of creating predictive models

• Uses input data to make predictions

• Enables computers to find patterns in data

Page 6: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN

Scikit-Learn

Page 7: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN

What is scikit-learn?

• Python machine learning package

• Built on NumPy, SciPy, and matplotlib
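As a rough illustration of what using the package looks like, here is a minimal sketch of the fit/predict pattern that scikit-learn estimators share. The dataset (the bundled iris data), the 25% hold-out split, and `n_neighbors=5` are assumptions chosen for the example, not choices taken from the talk.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small labeled dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples to check predictions on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# n_neighbors=5 is an arbitrary choice for this sketch.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)             # learn from the labeled samples
predictions = model.predict(X_test)     # predict labels for new samples
accuracy = model.score(X_test, y_test)  # fraction of correct predictions
print(f"accuracy on held-out data: {accuracy:.2f}")
```

Other scikit-learn estimators follow the same `fit` then `predict` shape, so swapping in a different model usually changes only the line that constructs it.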


Page 9: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN

k-NN

• k-Nearest Neighbor algorithm
  – One of the simplest machine learning algorithms
  – k is a constant chosen up front: the number of neighbors consulted for each prediction

Page 10: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN

Basic Information about KNN

• It is a lazy algorithm: it doesn't generalize from the training data until it is queried with a new data point
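The "lazy" behavior can be sketched in plain Python. The class below is a toy written for illustration (the name `LazyKNN`, the Euclidean distance, and the sample points are all assumptions, and this is not how scikit-learn implements it): `fit` only stores the data, and all the real work happens when a prediction is requested.

```python
import math
from collections import Counter


class LazyKNN:
    """Toy k-NN classifier: illustrative only, not scikit-learn's implementation."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, samples, labels):
        # "Training" only memorizes the data; no model is built here.
        self.samples = list(samples)
        self.labels = list(labels)
        return self

    def predict_one(self, query):
        # All the work happens at query time: measure the distance to every
        # stored sample, keep the k closest, and take a majority vote.
        distances = [
            (math.dist(query, sample), label)
            for sample, label in zip(self.samples, self.labels)
        ]
        distances.sort(key=lambda pair: pair[0])
        nearest_labels = [label for _, label in distances[: self.k]]
        return Counter(nearest_labels).most_common(1)[0][0]


# Made-up 2-D points: two near the origin, three near (5, 5).
points = [(0, 0), (1, 0), (5, 5), (6, 5), (5, 6)]
labels = ["a", "a", "b", "b", "b"]
knn = LazyKNN(k=3).fit(points, labels)
print(knn.predict_one((0.5, 0.5)))  # two of the three nearest points are "a"
print(knn.predict_one((5.5, 5.5)))  # all three nearest points are "b"
```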

Page 11: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN

Supervised vs. Unsupervised Learning

Page 12: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN

Supervised Learning

When your samples are labeled

Page 13: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN

Example: Spam Filters

Page 14: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN

Unsupervised Learning

The given instances are not labeled, and the categories are determined from the data itself
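To make the distinction concrete, here is a small sketch using scikit-learn's neighbors module. The six one-dimensional samples and the "ham"/"spam" labels are invented for illustration. In the supervised case the model is given labels and predicts them; in the unsupervised case no labels exist, so all we can ask is which samples lie close together.

```python
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors

# Made-up 1-D samples forming two loose groups.
X = [[1.0], [1.2], [0.8], [9.0], [9.5], [8.7]]

# Supervised: every sample comes with a label, and the model learns to predict it.
y = ["ham", "ham", "ham", "spam", "spam", "spam"]
classifier = KNeighborsClassifier(n_neighbors=3).fit(X, y)
predicted = classifier.predict([[1.1], [9.2]])
print(predicted)

# Unsupervised: no labels are given; we can only ask which samples are nearby.
finder = NearestNeighbors(n_neighbors=2).fit(X)
distances, indices = finder.kneighbors([[1.1]])
print(indices)  # row positions in X of the two samples closest to 1.1
```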

Page 15: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN

How k-NN works


Page 17: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN

What Can KNN Be Used For

• Classification

• Regression
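As a sketch of two supervised uses of k-NN in scikit-learn, the example below fits a `KNeighborsClassifier` and a `KNeighborsRegressor` on the same inputs. The training values and `n_neighbors=3` are made up for illustration. The classifier takes a majority vote among the neighbors; the regressor, with its default uniform weights, averages their target values.

```python
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Made-up 1-D training inputs.
X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]

# Classification: predict a category by majority vote among the k neighbors.
classes = ["low", "low", "low", "high", "high", "high"]
classifier = KNeighborsClassifier(n_neighbors=3).fit(X, classes)
predicted_class = classifier.predict([[2.5]])[0]
print(predicted_class)

# Regression: predict a number by averaging the targets of the k neighbors.
targets = [10.0, 20.0, 30.0, 100.0, 110.0, 120.0]
regressor = KNeighborsRegressor(n_neighbors=3).fit(X, targets)
predicted_value = regressor.predict([[2.5]])[0]
print(predicted_value)  # mean of the targets at x = 1.0, 2.0, and 3.0
```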

Page 18: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN

Downsides of KNN

• Because there is minimal training, predicting on new data is expensive: each new point must be compared against the stored training data

• Correlation can be falsely high (individual data points can be given too much weight)
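The prediction-time cost can be seen by counting distance computations in a brute-force neighbor search. This is an illustrative sketch in plain Python; the helper names and the 1,000-point training set are assumptions made for the example.

```python
import math

calls = {"count": 0}


def counted_distance(a, b):
    """Euclidean distance that also counts how often it is called."""
    calls["count"] += 1
    return math.dist(a, b)


def nearest_neighbors(train, query, k):
    # Brute-force search: one distance computation per stored training point.
    ranked = sorted(train, key=lambda point: counted_distance(point, query))
    return ranked[:k]


# Made-up training set of 1,000 points on a line.
train = [(float(i), 0.0) for i in range(1000)]
neighbors = nearest_neighbors(train, (3.2, 0.0), k=3)
print(neighbors)
print(calls["count"])  # one call per training point for this single query
```

Every additional query repeats the full pass over the training set, so the cost grows with both the number of stored points and the number of queries.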

Page 19: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN

Alternatives to brute-force kNN search

• KDTree

• BallTree
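In scikit-learn, KDTree and BallTree are available as neighbor-search strategies through the `algorithm` parameter of the k-NN estimators. The sketch below, on randomly generated data invented for the example, fits the same classifier with `"brute"`, `"kd_tree"`, and `"ball_tree"` and checks that the predictions agree. The tree structures change how the neighbors are found, not which neighbors are found.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up data: 200 random 2-D points labeled by which side of a line they fall on.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
queries = rng.normal(size=(20, 2))

predictions = {}
for algorithm in ("brute", "kd_tree", "ball_tree"):
    model = KNeighborsClassifier(n_neighbors=5, algorithm=algorithm).fit(X, y)
    predictions[algorithm] = model.predict(queries)

# Compare the tree-based searches against the brute-force baseline.
same_results = all(
    np.array_equal(predictions["brute"], predictions[name])
    for name in ("kd_tree", "ball_tree")
)
print(same_results)
```

Which strategy is fastest depends on the number of samples and features; leaving `algorithm` at its default of `"auto"` lets scikit-learn pick one.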
