Revisiting Logistic Regression & Naïve Logistic Regression...
Transcript of Revisiting Logistic Regression & Naïve Logistic Regression...
![Page 1: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/1.jpg)
Revisiting Revisiting
Logistic Regression & Naïve Logistic Regression & Naïve BayesBayes
Aarti SinghAarti Singh
Machine Learning 10-701/15-781
Jan 27, 2010
![Page 2: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/2.jpg)
Generative and Discriminative
Classifiers
Training classifiers involves learning a mapping f: X -> Y, or P(Y|X)
Generative classifiers (e.g. Naïve Bayes)
• Assume some functional form for P(X,Y) (or P(X|Y) and P(Y))
• Estimate parameters of P(X|Y), P(Y) directly from training data
2
• Estimate parameters of P(X|Y), P(Y) directly from training data
• Use Bayes rule to calculate P(Y|X)
Discriminative classifiers (e.g. Logistic Regression)
• Assume some functional form for P(Y|X)
• Estimate parameters of P(Y|X) directly from training data
![Page 3: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/3.jpg)
Logistic Regression
Assumes the following functional form for P(Y|X):
Alternatively,
3
Alternatively,
(Linear Decision Boundary)
DOES NOT require any conditional independence assumptions
![Page 4: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/4.jpg)
Connection to Gaussian Naïve Bayes
There are several distributions that can lead to a linear decision boundary.
As another example, consider a generative model:
Exponential family
4
Observe that Gaussian is a special case
![Page 5: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/5.jpg)
Connection to Gaussian Naïve Bayes
5
Constant term First-order term
Special case: P(X|Y=y) ~ Gaussian( µy,Σy) where Σ0 = Σ1 (cij,0 = cij,1)
Conditionally independent cij,y = 0 , i ≠ j
(Gaussian Naïve Bayes)
![Page 6: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/6.jpg)
Generative vs Discriminative
Given infinite data (asymptotically),
If conditional independence assumption holds,
Discriminative and generative perform similar.
6
If conditional independence assumption does NOT holds,
Discriminative outperforms generative.
![Page 7: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/7.jpg)
Generative vs Discriminative
Given finite data (n data points, p features),
Ng-Jordan
paper
7
Naïve Bayes (generative) requires n = O(log p) to converge to its
asymptotic error, whereas Logistic regression (discriminative)
requires n = O(p).
Why? “Independent class conditional densities”
* smaller classes are easier to learn
* parameter estimates not coupled – each parameter is learnt
independently, not jointly, from training data.
![Page 8: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/8.jpg)
Naïve Bayes vs Logistic Regression
Verdict
Both learn a linear decision boundary.
Naïve Bayes makes more restrictive assumptions
8
Naïve Bayes makes more restrictive assumptions
and has higher asymptotic error,
BUT
converges faster to its less accurate asymptotic
error.
![Page 9: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/9.jpg)
Experimental Comparison (Ng-Jordan’01)
UCI Machine Learning Repository 15 datasets, 8 continuous features, 7 discrete features
9Logistic RegressionNaïve Bayes
More in
Paper…
![Page 10: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/10.jpg)
Classification so far … (Recap)Classification so far … (Recap)
10
![Page 11: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/11.jpg)
Tax Fraud Detection
Diagnosing sickle
cell anemia
Anemic cell
Healthy cell
Features, X Labels, Y
Classification Tasks
Tax Fraud Detection
Web ClassificationSports
Science
News
Predict squirrel hill
residentDrive to CMU, Rachel’s fan,
Shop at SH Giant Eagle
Resident
Not resident11
![Page 12: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/12.jpg)
Goal:
Classification
Sports
12
Sports
Science
News
Features, X Labels, Y
Probability of Error
![Page 13: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/13.jpg)
Classification
Optimal predictor:(Bayes classifier)
13
Depends on unknown distribution
![Page 14: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/14.jpg)
Classification algorithms
Independent and identically distributed
However, we can learn a good prediction rule from training data
Learning algorithm
So far …
14
Decision Trees
K-Nearest Neighbor
Naïve Bayes
Logistic Regression
![Page 15: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/15.jpg)
Linear RegressionLinear Regression
Aarti SinghAarti Singh
Machine Learning 10-701/15-781
Jan 27, 2010
![Page 16: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/16.jpg)
Discrete to Continuous Labels
Sports
Science
News
Classification
Anemic cell
Healthy cell
16
Regression
Stock Market
PredictionY = ?
X = Feb01
X = Document Y = Topic X = Cell Image Y = Diagnosis
![Page 17: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/17.jpg)
Regression Tasks
Weather Prediction
Y = Temp
17
X = 7 pm
Estimating
Contamination
X = new location
Y = sensor reading
![Page 18: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/18.jpg)
Supervised Learning
Sports
Goal:
18
Sports
Science
News
Classification: Regression:
Probability of Error Mean Squared Error
Y = ?
X = Feb01
![Page 19: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/19.jpg)
Regression
Optimal predictor:(Conditional Mean)
Dropping subscripts
for notational convenience
19
for notational convenience
![Page 20: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/20.jpg)
Regression
Optimal predictor:(Conditional Mean)
20
Depends on unknown distribution
Intuition: Signal plus (zero-mean) Noise model
![Page 21: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/21.jpg)
Regression algorithms
Learning algorithm
Linear Regression
21
Linear Regression
Lasso, Ridge regression (Regularized Linear Regression)
Nonlinear Regression
Kernel Regression
Regression Trees, Splines, Wavelet estimators, …
Empirical Risk Minimizer:
Empirical mean
![Page 22: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/22.jpg)
Linear Regression
- Class of Linear functions
Least Squares Estimator
22
ββββ1 - intercept
ββββ2 = slopeUni-variate case:
Multi-variate case: 1
where ,
![Page 23: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/23.jpg)
Least Squares Estimator
23
![Page 24: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/24.jpg)
Least Squares Estimator
24
![Page 25: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/25.jpg)
Normal Equations
If is invertible,
p xp p x1 p x1
25
When is invertible ? (Homework 2)
Recall: Full rank matrices are invertible. What is rank of ?
What if is not invertible ? (Homework 2)
Regularization (later)
![Page 26: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/26.jpg)
Geometric Interpretation
Difference in prediction on training set:
26
is the orthogonal projection of
onto the linear subspace spanned by the
columns of
0
![Page 27: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/27.jpg)
Revisiting Gradient Descent
Even when is invertible, might be computationally expensive if A is huge.
Gradient Descent
27
Initialize:
Update:
0 if =
Stop: when some criterion met e.g. fixed # iterations, or < ε.
Gradient Descent
![Page 28: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/28.jpg)
Effect of step-size α
28
Large α => Fast convergence but larger residual error
Also possible oscillations
Small α => Slow convergence but small residual error
![Page 29: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/29.jpg)
When does Gradient Descent
succeed?View of the algorithm is myopic.
29
Guaranteed to converge to local minima if Converges as in jth direction
Convergence depends on eigenvalue spread
http://demonstrations.wolfram.comhttp://www.ce.berkeley.edu/~bayen/
![Page 30: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/30.jpg)
Least Squares and MLE
Intuition: Signal plus (zero-mean) Noise model
30
Least Square Estimate is same as Maximum Likelihood Estimate under a Gaussian model !
log likelihood
![Page 31: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/31.jpg)
Regularized Least Squares and MAP
What if is not invertible ?
log likelihood log prior
31Prior belief that β is Gaussian with zero-mean biases solution to “small” β
I) Gaussian Prior
0
Ridge Regression
Closed form: HW
![Page 32: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/32.jpg)
Regularized Least Squares and MAP
What if is not invertible ?
log likelihood log prior
32Prior belief that β is Laplace with zero-mean biases solution to “small” β
Lasso
Closed form: HW
II) Laplace Prior
![Page 33: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/33.jpg)
Ridge Regression vs Lasso
Ridge Regression: Lasso: HOT!Ideally l0 penalty,
but optimization
becomes non-convex
βs with constant J(β)
(level sets of J(β))
33
Lasso (l1 penalty) results in sparse solutions – vector with more zero coordinates
Good for high-dimensional problems – don’t have to store all coordinates!
βs with
constant
l1 norm
βs with
constant
l0 norm
(level sets of J(β))
βs with
constant
l2 norm
β2
β1
![Page 34: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/34.jpg)
Beyond Linear Regression
Polynomial regression
Regression with nonlinear features/basis functions
34
Kernel regression - Local/Weighted regression
Regression trees – Spatially adaptive regression
h
![Page 35: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/35.jpg)
Polynomial Regression
Univariate case:
where ,
35
Nonlinear features
Weight ofeach feature
![Page 36: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/36.jpg)
Nonlinear Regression
Fourier Basis Wavelet Basis
Nonlinear features/basis functionsBasis coefficients
36
Good representation for oscillatory functions Good representation for functions
localized at multiple scales
![Page 37: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/37.jpg)
Local Regression
Nonlinear features/basis functionsBasis coefficients
37
Globally supported
basis functions
(polynomial, fourier)
will not yield a good
representation
![Page 38: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/38.jpg)
Local Regression
Nonlinear features/basis functionsBasis coefficients
38
Globally supported
basis functions
(polynomial, fourier)
will not yield a good
representation
![Page 39: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/39.jpg)
Kernel Regression (Local)
Weighted Least Squares
Weigh each training point based on distance to test point
39
K – Kernel
h – Bandwidth of kernel
![Page 40: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/40.jpg)
Nadaraya-Watson Kernel Regression
constant
40
![Page 41: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/41.jpg)
Nadaraya-Watson Kernel Regression
constant
41
h
Recall: NN classifierAverage <-> majority vote
with box-car kernel
Sum of Ys in h ball around X#pts in h ball around X
![Page 42: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/42.jpg)
Choice of Bandwidth
hShould depend on n, # training data
(determines variance)
Should depend on smoothness of
function
42
Large Bandwidth – average more data points, reduce noise
Small Bandwidth – less smoothing, more accurate fit
(Lower variance)
(Lower bias)
Bias – Variance tradeoff : More to come in later lectures
function
(determines bias)
![Page 43: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/43.jpg)
Spatially adaptive regression
h
h
43
If function smoothness varies spatially, we want to allow bandwidth h to
depend on X
Local polynomials, splines, wavelets, regression trees …
![Page 44: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/44.jpg)
Regression trees
Num Children?
≥ 2 < 2
Binary Decision Tree
44
Average (fit a constant ) on the leaves
![Page 45: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/45.jpg)
Regression trees
h
h
Quad Decision Tree
45
- Polynomial fit on each leaf
If , then split
Else stop Compare residual error with and without split
![Page 46: Revisiting Logistic Regression & Naïve Logistic Regression ...epxing/Class/10701-10s/Lecture/lecture6.pdf · Connection to Gaussian Naïve Bayes There are several distributions that](https://reader036.fdocuments.in/reader036/viewer/2022081612/5f67e6e64b474f7ba94fbac3/html5/thumbnails/46.jpg)
Summary
Discriminative vs Generative Classifiers
- Naïve Bayes vs Logistic Regression
Regression
- Linear Regression
Least Squares Estimator
46
Least Squares Estimator
Normal Equations
Gradient Descent
Geometric Interpretation
Probabilistic Interpretation (connection to MLE)
- Regularized Linear Regression (connection to MAP)
Ridge Regression, Lasso
- Polynomial Regression, Basis (Fourier, Wavelet) Estimators
- Kernel Regression (Localized)
- Regression Trees