Linear Classification


Transcript of Linear Classification

  • 1. Linear Classification Machine Learning; Wed Apr 23, 2008

  • 2. Motivation — Problem: our goal is to classify input vectors x into one of k classes. Similar to regression, but the output variable is discrete. In linear classification the input space is split by (hyper-)planes, each with an assigned class.
  • 3.–6. Activation function — A non-linear function f assigns a class: y(x) = f(wᵀx + w₀). Due to f, the model is not linear in the weights; the decision surfaces, however, are linear in w and x.
  • 7.–8. Discriminant functions — A simple linear discriminant function: y(x) = wᵀx + w₀.
  • 9. Least-squares training — Encode the target vectors as bit vectors (one-of-k); classify with the discriminant functions.
  • 10. Least squares is appropriate for Gaussian distributions, but has major problems with discrete targets.
  • 11.–12. Fisher's linear discriminant — Consider the classification as a projection; the approach is to maximize the distance between the projected class means. But notice the large overlap in the histograms: the variance in the projection is larger than it needs to be.
  • 13. Maximize the difference in means and minimize the within-class variance: J(w) = (m₂ − m₁)² / (s₁² + s₂²).
  • 14.–18. Probabilistic models — Wake up, clever ideas here! Three steps: modeling, fitting, and making choices.
  • 19. Approach: define the conditional distribution and make the decision based on the sigmoid activation σ(a) = 1 / (1 + e⁻ᵃ).
  • 20.–21. A particularly simple expression results in the Gaussian case.
  • 22.–23. Maximum likelihood estimation — Assume the observations are i.i.d.
  • 24.–25. Logistic regression — We can also directly express the class probability as a sigmoid (without implicitly assuming an underlying Gaussian): p(C₁|x) = σ(wᵀx). The likelihood: p(t|w) = ∏ₙ yₙ^(tₙ) (1 − yₙ)^(1−tₙ), with yₙ = σ(wᵀxₙ).
  • 26. We can maximize the log likelihood.
  • 27.–28. Here we estimate only M weights, not M for each mean plus O(M²) for the covariance as in the Gaussian approach. We are not assuming anything about the underlying densities (we only assume we can separate them linearly).
  • 29. Summary
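The logistic-regression training sketched above — maximizing the log likelihood of the sigmoid class probability — can be illustrated with a short gradient-ascent loop. A minimal pure-Python sketch; the 1-D toy data, learning rate, and epoch count are illustrative assumptions, not from the lecture:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def fit_logistic(xs, ts, lr=0.1, epochs=2000):
    """Fit weights (w0, w) of p(C1|x) = sigmoid(w*x + w0) by gradient
    ascent on the log likelihood sum_n t_n*log(y_n) + (1-t_n)*log(1-y_n).
    The gradient per point is (t_n - y_n) times the input."""
    w0, w = 0.0, 0.0
    for _ in range(epochs):
        g0 = gw = 0.0
        for x, t in zip(xs, ts):
            err = t - sigmoid(w * x + w0)  # t_n - y_n
            g0 += err
            gw += err * x
        w0 += lr * g0
        w += lr * gw
    return w0, w

# toy 1-D data: class 0 clustered around x = -2, class 1 around x = +2
xs = [-2.5, -2.0, -1.5, 1.5, 2.0, 2.5]
ts = [0, 0, 0, 1, 1, 1]
w0, w = fit_logistic(xs, ts)
p = sigmoid(w * 2.0 + w0)  # class-1 probability for a point near the class-1 cluster
```

Note the point made on slide 27: only the weights (here two numbers) are estimated, with no model of the class densities themselves.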

  • Classification models
    • Linear decision surfaces
    • Geometric approach
      • Maximizing the distance between class means while minimizing within-class variance
    • Probabilistic approach
      • Sigmoid functions
      • Logistic regression
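The geometric approach in the summary (Fisher's criterion from slide 13) has a closed-form solution, w ∝ S_W⁻¹(m₂ − m₁), where S_W is the total within-class scatter matrix. A minimal pure-Python sketch for 2-D data; the toy clusters are illustrative assumptions:

```python
def mean(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]

def scatter(points, m):
    # 2x2 within-class scatter: sum_n (x_n - m)(x_n - m)^T
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(c1, c2):
    """Direction w maximizing (m2 - m1)^2 / (s1^2 + s2^2): w = S_W^{-1}(m2 - m1)."""
    m1, m2 = mean(c1), mean(c2)
    s1, s2 = scatter(c1, m1), scatter(c2, m2)
    sw = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[ sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det,  sw[0][0] / det]]
    dm = [m2[0] - m1[0], m2[1] - m1[1]]
    return [inv[0][0] * dm[0] + inv[0][1] * dm[1],
            inv[1][0] * dm[0] + inv[1][1] * dm[1]]

# two toy 2-D clusters
c1 = [(1, 2), (2, 3), (3, 3), (2, 1)]
c2 = [(6, 5), (7, 8), (8, 6), (7, 7)]
w = fisher_direction(c1, c2)
proj1 = [w[0] * x + w[1] * y for (x, y) in c1]
proj2 = [w[0] * x + w[1] * y for (x, y) in c2]
# the projected classes should overlap less than a projection onto m2 - m1 alone
```

Weighting the mean difference by S_W⁻¹ is what shrinks the histogram overlap noted on slide 12: directions with large within-class spread are down-weighted.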