![Page 1: Machine Learning for Language Technology, Lecture 9: Perceptron (santini.se/teaching/ml/2014/Lecture09_Perceptron.pdf)](https://reader035.fdocuments.in/reader035/viewer/2022062311/5fd76bf8df96383d662ac7d9/html5/thumbnails/1.jpg)
Machine Learning for Language Technology Lecture 9: Perceptron
Marina Santini, Department of Linguistics and Philology, Uppsala University, Uppsala, Sweden
Autumn 2014
Acknowledgement: Thanks to Prof. Joakim Nivre for course design and materials
Inputs and Outputs
Feature Representation
Features and Classes
Examples (i)
Examples (ii)
Block Feature Vectors
Representation
Linear Classifiers: Repetition & Extension
Linear classifiers (atomic classes)
• Assumption: data must be linearly separable
Perceptron
Perceptron (i)
Perceptron Learning Algorithm
Separability and Margin (i)
Separability and Margin (ii)
• Given a training instance, let Ȳ_t ("Y bar t") be the set of incorrect labels for that instance, that is, the set of all labels minus the correct label for that instance.
• Then we say that a training set is separable with a margin gamma (γ) if there exists a weight vector w with a fixed norm (i.e. 1) such that, for every training instance, the score of the correct label under w minus the score of every incorrect label is at least gamma.
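The definition above can be checked numerically. The following is a minimal sketch, not the lecture's own code: it assumes the weight vector is stored as a matrix `W` with one row per class (which is equivalent to using block feature vectors), and the names `margin`, `W`, `X` and `y` are hypothetical.

```python
import numpy as np

def margin(W, X, y):
    """Smallest gap between the correct-label score and the best incorrect-label score.

    W: (num_classes, num_features) weights, one row per class (assumed layout).
    X: (n, num_features) inputs; y: (n,) integer labels.
    """
    W = W / np.linalg.norm(W)              # rescale so the whole weight vector has norm 1
    scores = X @ W.T                       # scores[t, c] = score of class c for instance t
    idx = np.arange(len(y))
    correct = scores[idx, y]               # score of the correct label
    wrong = scores.copy()
    wrong[idx, y] = -np.inf                # mask the correct label
    best_wrong = wrong.max(axis=1)         # best-scoring incorrect label
    return float((correct - best_wrong).min())

X = np.array([[2.0, 0.0], [0.0, 2.0]])
y = np.array([0, 1])
print(margin(np.eye(2), X, y))             # positive: these weights separate the data
```

A positive return value means the training set is separated by `W` with that margin; a value of zero or below means this particular `W` does not separate it.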
Separability and Margin (iii)
• IMPORTANT: for every training instance, the score that we get for the correct label when we use the weight vector w minus the score of every incorrect label is at least a certain margin gamma (γ). That is, the margin γ is the smallest difference, over the training set, between the score of the right class and the score of the best incorrect class.
• The larger the weights, the larger the norm. We want the norm of w to be 1 (normalization); otherwise the margin could be increased simply by scaling up the weights.
• There are different ways of measuring the length/magnitude of a vector, and they are known as norms. The Euclidean norm (or L2 norm) says: take all the values of the weight vector, square them, sum them up, then take the square root.
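As a small illustration of the Euclidean norm and of normalizing a weight vector to norm 1 (a sketch only; the function names `l2_norm` and `normalize` are made up for this example):

```python
import math

def l2_norm(w):
    """Euclidean (L2) norm: square every value, sum the squares, take the square root."""
    return math.sqrt(sum(v * v for v in w))

def normalize(w):
    """Rescale w so that its L2 norm is 1 (undefined for the all-zero vector)."""
    n = l2_norm(w)
    if n == 0:
        raise ValueError("cannot normalize the zero vector")
    return [v / n for v in w]

print(l2_norm([3.0, 4.0]))    # 5.0
print(normalize([3.0, 4.0]))  # direction is unchanged, length becomes 1
```

Normalizing changes the length of w but not which class gets the highest score, so it does not change the classifier's predictions.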
Perceptron
Perceptron Learning Algorithm
Main Theorem
Perceptron Theorem
• For any training set that is separable with some margin, we can prove that the number of mistakes made during training, if we keep iterating over the training set, is bounded by a quantity that depends on the size of the margin (see proofs in the Appendix, slides Lecture 3).
• R depends on the norm of the largest difference you can have between feature vectors. The larger R is, the more spread out the data and the more mistakes we can potentially make. Conversely, the larger gamma is, the smaller the bound on the number of mistakes.
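To make the roles of R and gamma concrete, here is a sketch that evaluates the bound in its usual textbook form, R² / γ². That form is an assumption here, since the formula itself is on the slide image and not in the transcribed text. The sketch also assumes block feature vectors, so the difference between the feature vectors of two labels for the same input x has norm √2 · ‖x‖; `mistake_bound` is a hypothetical name.

```python
import numpy as np

def mistake_bound(X, gamma):
    """Upper bound on training mistakes, assuming the standard form R^2 / gamma^2.

    R is taken as the largest norm of a difference between the block feature
    vectors of two labels for the same input, i.e. sqrt(2) * max ||x|| (assumption).
    """
    if gamma <= 0:
        raise ValueError("the bound only applies to a positive margin")
    R = np.sqrt(2.0) * np.linalg.norm(X, axis=1).max()
    return float((R / gamma) ** 2)

X = np.array([[2.0, 0.0], [0.0, 2.0]])
print(mistake_bound(X, gamma=np.sqrt(2.0)))  # bound for margin sqrt(2)
print(mistake_bound(X, gamma=0.5))           # smaller margin, larger bound
```

Scaling the data up increases R and therefore the bound, while a larger margin decreases it, which matches the two observations in the bullets above.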
Summary
Basically…
.... if it is possible to find such a weight vector for some positive margin gamma, then the training set is separable.
So... if the training set is separable, the Perceptron will eventually find a weight vector that separates the data. The time it takes depends on the properties of the data. But after a finite number of iterations, the error on the training set will converge to 0.
However... although we find a perfect weight vector for separating the training data, it might be the case that the classifier does not generalize well (do you remember the difference between empirical error and generalization error?).
So, with the Perceptron, we have a fixed norm (= 1) and a variable margin (> 0).
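The convergence behaviour described above can be observed with a small multiclass perceptron. This is a minimal sketch under stated assumptions, not the algorithm exactly as shown on the slides: the weights are kept as one row per class, ties are broken in favour of the lowest class index, and `train_perceptron` and `max_epochs` are names chosen for this example.

```python
import numpy as np

def train_perceptron(X, y, num_classes, max_epochs=100):
    """Iterate over the training set until an epoch passes with no mistakes."""
    W = np.zeros((num_classes, X.shape[1]))   # one weight row per class
    mistakes = 0
    for _ in range(max_epochs):
        errors = 0
        for x_t, y_t in zip(X, y):
            y_hat = int(np.argmax(W @ x_t))   # highest-scoring label under current weights
            if y_hat != y_t:
                W[y_t] += x_t                 # promote the correct label
                W[y_hat] -= x_t               # demote the wrongly predicted label
                errors += 1
        mistakes += errors
        if errors == 0:                       # training error is 0: converged
            break
    return W, mistakes

X = np.array([[2.0, 0.0], [0.0, 2.0], [3.0, 1.0], [1.0, 3.0]])
y = np.array([0, 1, 0, 1])
W, mistakes = train_perceptron(X, y, num_classes=2)
print(mistakes, np.argmax(X @ W.T, axis=1))
```

On separable data such as this toy set the loop stops after finitely many updates. On non-separable data it would only stop at `max_epochs`, which is why the cap is there.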
Appendix: Proofs and Derivations