Classification: Perceptron · 2020. 6. 29.
Classification: Perceptron
Prof. Seungchul Lee
Industrial AI Lab.
![Page 2: Classification: Perceptron · 2020. 6. 29. · •Perceptron finds one of the many possible hyperplanes separating the data if one exists •Of the many possible choices, which one](https://reader035.fdocuments.in/reader035/viewer/2022081615/5fd7610b7c2cec7ce6681143/html5/thumbnails/2.jpg)
Classification
• Where 𝑦 is a discrete value
  – Develop a classification algorithm to determine which class a new input should fall into
• We will learn
  – Perceptron
  – Logistic regression
• To find a classification boundary
Perceptron
• For input 𝑥 = (𝑥1, ⋯ , 𝑥𝑑)ᵀ, the 'attributes of a customer'
• Weights 𝜔 = (𝜔1, ⋯ , 𝜔𝑑)ᵀ
• [Figure: Data → Features → Classification]
Perceptron
• Introduce an artificial coordinate 𝑥0 = 1
• In vector form, the perceptron implements ℎ(𝑥) = sign(𝜔ᵀ𝑥)
• Let's see the geometric meaning of the perceptron
𝝎
• If 𝑝 and 𝑞 are both on the decision line, then 𝜔ᵀ𝑝 = 𝜔ᵀ𝑞 = 0, so 𝜔ᵀ(𝑝 − 𝑞) = 0
• Since 𝑝 − 𝑞 is an arbitrary vector along the line, 𝜔 is normal (perpendicular) to the decision line
Signed Distance from a Line: 𝒉
• Sign with respect to a line: 𝑔(𝑥) > 0 on one side, 𝑔(𝑥) < 0 on the other
• The signed distance of a point 𝑥 from the line 𝑔(𝑥) = 𝜔ᵀ𝑥 = 0 is ℎ = 𝑔(𝑥)/‖𝜔‖
How to Find 𝝎
• All data in class 1 (𝑦 = 1): 𝑔(𝑥) > 0
• All data in class 0 (𝑦 = −1): 𝑔(𝑥) < 0
Perceptron Algorithm
• The perceptron implements ℎ(𝑥) = sign(𝜔ᵀ𝑥)
• Given the training set (𝑥1, 𝑦1), ⋯ , (𝑥𝑁, 𝑦𝑁):
  1) pick a misclassified point, i.e., one with sign(𝜔ᵀ𝑥𝑛) ≠ 𝑦𝑛
  2) and update the weight vector: 𝜔 ← 𝜔 + 𝑦𝑛𝑥𝑛
Iterations of Perceptron
1. Randomly assign 𝜔
2. One iteration of the PLA (perceptron learning algorithm): 𝜔 ← 𝜔 + 𝑦𝑥, where (𝑥, 𝑦) is a misclassified training point
3. At iteration 𝑡 = 1, 2, 3, ⋯ , pick a misclassified point from the training set
4. And run a PLA iteration on it
5. That's it!
Diagram of Perceptron
• The perceptron will later be shown to be the basic unit of neural networks and deep learning
Scikit-Learn for Perceptron
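A minimal sketch of how scikit-learn's `Perceptron` could be used here (the data is assumed, not the lecture's exact example):

```python
# Minimal sketch (assumed data): train scikit-learn's Perceptron
# on two linearly separable Gaussian clusters
import numpy as np
from sklearn.linear_model import Perceptron

rng = np.random.default_rng(0)
X0 = rng.normal(loc=[-2, -2], scale=0.5, size=(50, 2))  # class 0 cluster
X1 = rng.normal(loc=[2, 2], scale=0.5, size=(50, 2))    # class 1 cluster
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

clf = Perceptron(max_iter=1000, tol=None)
clf.fit(X, y)
print(clf.coef_, clf.intercept_)  # learned weights and bias of the separating line
print(clf.score(X, y))            # training accuracy
```

Since the clusters are separable, the perceptron convergence theorem guarantees the training accuracy reaches 1.0.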
Perceptron Algorithm in Python
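A from-scratch sketch of the PLA iteration described above, with assumed two-cluster data and labels in {−1, +1}:

```python
# Minimal sketch (assumed data): perceptron learning algorithm from scratch
import numpy as np

rng = np.random.default_rng(1)
X0 = rng.normal(loc=[-2, -2], scale=0.5, size=(50, 2))
X1 = rng.normal(loc=[2, 2], scale=0.5, size=(50, 2))
X = np.hstack([np.ones((100, 1)), np.vstack([X0, X1])])  # artificial coordinate x0 = 1
y = np.array([-1] * 50 + [1] * 50)

w = rng.normal(size=3)                       # 1. randomly assign omega
for _ in range(1000):
    mis = np.where(np.sign(X @ w) != y)[0]   # indices of misclassified points
    if len(mis) == 0:                        # converged: all points classified correctly
        break
    i = rng.choice(mis)                      # pick a misclassified point
    w = w + y[i] * X[i]                      # PLA update: w <- w + y x
print(np.all(np.sign(X @ w) == y))
```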
The Best Hyperplane Separator?
• The perceptron finds one of the many possible hyperplanes separating the data, if one exists
• Of the many possible choices, which one is the best?
• Utilize distance information from all data samples
  – We will see this formally when we discuss logistic regression
Classification: Logistic Regression
Classification: Logistic Regression
• Perceptron: makes use of the sign of the data
• Logistic regression: makes use of the distance of the data
• Logistic regression is a classification algorithm
  – don't be confused by its name
• To find a classification boundary
Using Distances
Using all Distances
• Basic idea: find the decision boundary (hyperplane) 𝑔(𝑥) = 𝜔ᵀ𝑥 = 0 that maximizes ∏𝑖 ℎ𝑖 → optimization
  – By the inequality of arithmetic and geometric means,
    (𝑥1 + 𝑥2 + ⋯ + 𝑥𝑚)/𝑚 ≥ (𝑥1 𝑥2 ⋯ 𝑥𝑚)^(1/𝑚),
    and equality holds if and only if 𝑥1 = 𝑥2 = ⋯ = 𝑥𝑚
Using all Distances
• Roughly speaking, maximizing ∏𝑖 ℎ𝑖 tends to position the hyperplane in the middle of the two classes
Sigmoid Function
• We link, or squeeze, (−∞, +∞) to (0, 1) for several reasons
• 𝜎(𝑧) = 1/(1 + 𝑒^(−𝑧)) is the sigmoid function, or the logistic function
  – The logistic function always generates a value between 0 and 1
  – It crosses 0.5 at the origin, then flattens out
• [Figure: the logistic function vs. the step function]
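The squeezing behavior can be checked directly with a few evaluations:

```python
# Minimal sketch: the logistic (sigmoid) function squeezes (-inf, +inf) into (0, 1)
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))    # 0.5 at the origin
print(sigmoid(10.0))   # approaches 1 for large positive z
print(sigmoid(-10.0))  # approaches 0 for large negative z
```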
Sigmoid Function
• Benefits of mapping via the logistic function
  – Monotonic: gives the same or a similar optimization solution
  – Continuous and differentiable: good for gradient-descent optimization
  – Probability or confidence: the output can be interpreted as a probability
    · 𝜎(𝜔ᵀ𝑥): probability that the label is 1
    · 1 − 𝜎(𝜔ᵀ𝑥): probability that the label is 0
Goal: We Need to Fit 𝜔 to our Data
• For a single data point (𝑥, 𝑦) with parameters 𝜔:
  𝑃(𝑦 = 1 | 𝑥 ; 𝜔) = 𝜎(𝜔ᵀ𝑥), 𝑃(𝑦 = 0 | 𝑥 ; 𝜔) = 1 − 𝜎(𝜔ᵀ𝑥)
• It can be compactly written as
  𝑃(𝑦 | 𝑥 ; 𝜔) = 𝜎(𝜔ᵀ𝑥)^𝑦 (1 − 𝜎(𝜔ᵀ𝑥))^(1−𝑦)
Scikit-Learn for Logistic Regression
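A minimal sketch of `LogisticRegression` in scikit-learn (the data is assumed, not the lecture's exact example):

```python
# Minimal sketch (assumed data): scikit-learn's LogisticRegression
# on two Gaussian clusters
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X0 = rng.normal(loc=[-2, -2], scale=0.7, size=(50, 2))
X1 = rng.normal(loc=[2, 2], scale=0.7, size=(50, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression()
clf.fit(X, y)
print(clf.coef_, clf.intercept_)  # omega of the decision boundary
print(clf.predict_proba(X[:1]))   # sigmoid outputs: [P(y=0), P(y=1)]
```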
Multiclass Classification
• Generalization to more than 2 classes is straightforward
  – one vs. all (one vs. rest)
  – one vs. one
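The one-vs-rest scheme can be made explicit with scikit-learn's `OneVsRestClassifier` (scikit-learn also handles multiclass on its own); the three-cluster data here is assumed for illustration:

```python
# Minimal sketch (assumed data): one-vs-rest logistic regression for 3 classes
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(3)
centers = [(-3, 0), (3, 0), (0, 3)]
X = np.vstack([rng.normal(loc=c, scale=0.6, size=(40, 2)) for c in centers])
y = np.repeat([0, 1, 2], 40)

clf = OneVsRestClassifier(LogisticRegression())
clf.fit(X, y)
print(len(clf.estimators_))  # one binary classifier per class
print(clf.score(X, y))
```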
Classifying Non-linearly Separable Data
• Consider a binary classification problem
  – each example is represented by a single feature 𝑥
  – no linear separator exists for this data
Classifying Non-linearly Separable Data
• Again, each example is represented by a single feature 𝑥, and no linear separator exists
• Now map each example as 𝑥 → (𝑥, 𝑥²)
• The data becomes linearly separable in the new representation
• Linear in the new representation = nonlinear in the old representation
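A minimal sketch of the 1-D case, with an assumed dataset where one class sits between the other's points, so no single threshold on 𝑥 separates them:

```python
# Minimal sketch (assumed 1-D data): not linearly separable in x,
# but separable after the feature map x -> (x, x^2)
import numpy as np
from sklearn.linear_model import Perceptron

x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.array([1, 1, -1, -1, -1, 1, 1])  # class -1 in the middle, class 1 outside

Z = np.column_stack([x, x ** 2])        # new representation (x, x^2)
clf = Perceptron(max_iter=1000, tol=None)
clf.fit(Z, y)
print(clf.score(Z, y))                  # separable in the new space
```

In the (𝑥, 𝑥²) plane, the middle class has small 𝑥² and the outer class large 𝑥², so a horizontal line separates them.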
Classifying Non-linearly Separable Data
• Let's look at another example
  – each example is defined by two features: 𝑥 = (𝑥1, 𝑥2)
  – no linear separator exists for this data
• Now map each example as 𝑥 = (𝑥1, 𝑥2) → 𝑧 = (𝑥1², √2 𝑥1𝑥2, 𝑥2²)
  – each example now has three features (derived from the old representation)
• The data becomes linearly separable in the new representation
Kernel
• Often we want to capture nonlinear patterns in the data
  – nonlinear regression: the input-output relationship may not be linear
  – nonlinear classification: classes may not be separable by a linear boundary
• Linear models (e.g., linear regression, linear SVM) are just not rich enough
  – solution: map the data to higher dimensions where it exhibits linear patterns
  – then apply the linear model in the new input feature space
  – mapping = changing the feature representation
• Kernels: make linear models work in nonlinear settings
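For the quadratic feature map φ(𝑥) = (𝑥1², √2 𝑥1𝑥2, 𝑥2²), the inner product in the mapped space equals (𝑥ᵀ𝑧)² in the original space, so the mapping never needs to be computed explicitly; this identity is what kernels exploit. A minimal numeric check (the points are arbitrary):

```python
# Minimal sketch: the kernel identity phi(x) . phi(z) = (x^T z)^2
import numpy as np

def phi(x):
    # explicit quadratic feature map (x1^2, sqrt(2) x1 x2, x2^2)
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])
print(phi(x) @ phi(z))   # inner product after the explicit mapping: 121.0
print((x @ z) ** 2)      # kernel k(x, z) = (x^T z)^2: also 121.0
```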
Nonlinear Classification
https://www.youtube.com/watch?v=3liCbRZPrZA