Lecture Slides for Introduction to Machine Learning 2e
ETHEM ALPAYDIN © The MIT Press, 2010
[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e
Nonparametric Estimation

- Parametric (single global model), semiparametric (small number of local models)
- Nonparametric: Similar inputs have similar outputs
- Functions (pdf, discriminant, regression) change smoothly
- Keep the training data; "let the data speak for itself"
- Given x, find a small number of closest training instances and interpolate from these
- Aka lazy/memory-based/case-based/instance-based learning
Density Estimation

- Given the training set $X = \{x^t\}_{t=1}^{N}$ drawn iid from $p(x)$
- Divide the data into bins of size h
- Histogram:
  $\hat{p}(x) = \frac{\#\{x^t \text{ in the same bin as } x\}}{Nh}$
- Naive estimator:
  $\hat{p}(x) = \frac{\#\{x - h < x^t \le x + h\}}{2Nh}$
- or equivalently
  $\hat{p}(x) = \frac{1}{Nh} \sum_{t=1}^{N} w\!\left(\frac{x - x^t}{h}\right)$, where $w(u) = \begin{cases} 1/2 & \text{if } |u| < 1 \\ 0 & \text{otherwise} \end{cases}$
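A minimal NumPy sketch of the naive estimator, assuming a 1-D sample; the function name and toy data are illustrative, not from the slides:

import numpy as np

def naive_estimate(x, sample, h):
    """Naive density estimate: the fraction of training points within
    h of x, normalized by the window width 2h."""
    return np.sum(np.abs(x - sample) < h) / (2 * len(sample) * h)

# Toy example: sample drawn iid from a standard normal.
rng = np.random.default_rng(0)
sample = rng.standard_normal(1000)
print(naive_estimate(0.0, sample, h=0.5))  # should be near p(0) ~ 0.399

The estimate is piecewise constant in x (it jumps whenever a training point enters or leaves the window), which motivates the smooth kernels of the next slide.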
Kernel Estimator

- Kernel function, e.g., Gaussian kernel:
  $K(u) = \frac{1}{\sqrt{2\pi}} \exp\!\left[-\frac{u^2}{2}\right]$
- Kernel estimator (Parzen windows):
  $\hat{p}(x) = \frac{1}{Nh} \sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right)$
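The same sketch with the hard window replaced by the Gaussian kernel (again a toy illustration, not the book's code):

import numpy as np

def parzen_estimate(x, sample, h):
    """Parzen window estimate: p_hat(x) = (1/(N h)) * sum_t K((x - x^t)/h)
    with a Gaussian kernel, so every point contributes a smooth bump."""
    u = (x - sample) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return K.sum() / (len(sample) * h)

rng = np.random.default_rng(0)
sample = rng.standard_normal(1000)
print(parzen_estimate(0.0, sample, h=0.3))  # smooth estimate near 0.399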
k-Nearest Neighbor Estimator

- Instead of fixing the bin width h and counting the number of instances, fix the number of instances (neighbors) k and compute the resulting bin width
- $d_k(x)$: distance to the kth closest training instance to x
- $\hat{p}(x) = \frac{k}{2N d_k(x)}$
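A corresponding sketch (same toy setup as above; names are mine):

import numpy as np

def knn_density(x, sample, k):
    """k-NN density estimate: p_hat(x) = k / (2 N d_k(x)), where d_k(x)
    is the distance to the k-th closest training instance."""
    d_k = np.sort(np.abs(x - sample))[k - 1]
    return k / (2 * len(sample) * d_k)

rng = np.random.default_rng(0)
sample = rng.standard_normal(1000)
print(knn_density(0.0, sample, k=25))  # window width adapts to local density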
Multivariate Data

- Kernel density estimator:
  $\hat{p}(\mathbf{x}) = \frac{1}{Nh^d} \sum_{t=1}^{N} K\!\left(\frac{\mathbf{x} - \mathbf{x}^t}{h}\right)$
- Multivariate Gaussian kernel:
  - spheric: $K(\mathbf{u}) = \left(\frac{1}{\sqrt{2\pi}}\right)^{d} \exp\!\left[-\frac{\|\mathbf{u}\|^2}{2}\right]$
  - ellipsoid: $K(\mathbf{u}) = \frac{1}{(2\pi)^{d/2} |\mathbf{S}|^{1/2}} \exp\!\left[-\frac{1}{2} \mathbf{u}^T \mathbf{S}^{-1} \mathbf{u}\right]$
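A sketch of the spheric case in d dimensions (assumes a NumPy array of shape (N, d); names are mine):

import numpy as np

def multivariate_parzen(x, sample, h):
    """Multivariate Parzen estimate with a spheric Gaussian kernel:
    p_hat(x) = (1/(N h^d)) * sum_t K((x - x^t)/h)."""
    N, d = sample.shape
    u = (x - sample) / h                    # (N, d) scaled differences
    K = np.exp(-0.5 * np.sum(u**2, axis=1)) / (2 * np.pi)**(d / 2)
    return K.sum() / (N * h**d)

rng = np.random.default_rng(0)
sample = rng.standard_normal((1000, 2))     # 2-D standard normal
print(multivariate_parzen(np.zeros(2), sample, h=0.4))  # near 1/(2*pi) ~ 0.159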
Nonparametric Classification

- Estimate p(x|Ci) and use Bayes' rule
- Kernel estimator ($r_i^t = 1$ if $\mathbf{x}^t \in C_i$, 0 otherwise):
  $\hat{p}(\mathbf{x}|C_i) = \frac{1}{N_i h^d} \sum_{t=1}^{N} K\!\left(\frac{\mathbf{x} - \mathbf{x}^t}{h}\right) r_i^t$, $\hat{P}(C_i) = \frac{N_i}{N}$
  $g_i(\mathbf{x}) = \hat{p}(\mathbf{x}|C_i)\hat{P}(C_i) = \frac{1}{Nh^d} \sum_{t=1}^{N} K\!\left(\frac{\mathbf{x} - \mathbf{x}^t}{h}\right) r_i^t$
- k-NN estimator, with $k_i$ the number of the k nearest neighbors belonging to $C_i$ and $V^k(\mathbf{x})$ the volume of the hypersphere of radius $d_k(\mathbf{x})$ centered at $\mathbf{x}$:
  $\hat{p}(\mathbf{x}|C_i) = \frac{k_i}{N_i V^k(\mathbf{x})}$
  $\hat{P}(C_i|\mathbf{x}) = \frac{\hat{p}(\mathbf{x}|C_i)\hat{P}(C_i)}{\hat{p}(\mathbf{x})} = \frac{k_i}{k}$
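The k-NN rule reduces to majority voting among the k nearest neighbors; a small sketch with Euclidean distance (toy data and names are mine):

import numpy as np

def knn_classify(x, X, y, k):
    """k-NN classifier: P_hat(C_i | x) = k_i / k, so predict the class
    with the most votes among the k nearest training instances."""
    nearest = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
    return np.argmax(np.bincount(y[nearest]))

# Toy 2-D data: two Gaussian blobs with labels 0 and 1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(knn_classify(np.array([2.5, 2.5]), X, y, k=5))  # expected: 1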
Condensed Nearest Neighbor

- Time/space complexity of k-NN is O(N)
- Find a subset Z of X that is small and is accurate in classifying X (Hart, 1968):
  $E'(Z|X) = E(X|Z) + \lambda |Z|$
  where $E(X|Z)$ is the error of the 1-NN classifier on X using only Z, and $\lambda|Z|$ penalizes the size of the stored subset
Condensed Nearest Neighbor

- Incremental algorithm: Add an instance if needed — pass over X, classify each instance with 1-NN using the current subset Z, add the instance to Z only if it is misclassified, and repeat until a full pass adds nothing. See the sketch below.
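A sketch of this greedy condensing loop (1-NN with Euclidean distance assumed; helper names are mine):

import numpy as np

def condense(X, y):
    """Hart's condensed nearest neighbor: keep only the instances of X
    that 1-NN needs in order to classify all of X correctly."""
    Z, labels = [X[0]], [y[0]]            # seed with the first instance
    changed = True
    while changed:                         # repeat until a pass adds nothing
        changed = False
        for x, t in zip(X, y):
            nearest = np.argmin(np.linalg.norm(np.array(Z) - x, axis=1))
            if labels[nearest] != t:       # misclassified: x is needed
                Z.append(x)
                labels.append(t)
                changed = True
    return np.array(Z), np.array(labels)

Because the pass is greedy, the subset found depends on the order in which instances are visited; the minimal consistent subset is not unique.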
Nonparametric Regression

- Aka smoothing models
- Regressogram:
  $\hat{g}(x) = \frac{\sum_{t=1}^{N} b(x, x^t)\, r^t}{\sum_{t=1}^{N} b(x, x^t)}$
  where $b(x, x^t) = \begin{cases} 1 & \text{if } x^t \text{ is in the same bin with } x \\ 0 & \text{otherwise} \end{cases}$
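A sketch of the regressogram on noisy 1-D data (bin origin and toy data are my choices):

import numpy as np

def regressogram(x, X, r, h, origin=0.0):
    """Regressogram: average the labels r^t of the training points whose
    x^t falls in the same bin of width h as x."""
    same_bin = np.floor((X - origin) / h) == np.floor((x - origin) / h)
    return r[same_bin].mean() if same_bin.any() else np.nan

rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, 200)
r = np.sin(X) + rng.normal(0, 0.1, 200)      # noisy sin(x)
print(regressogram(np.pi / 2, X, r, h=0.5))  # near sin(pi/2) = 1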
Running Mean/Kernel Smoother

- Running mean smoother:
  $\hat{g}(x) = \frac{\sum_{t=1}^{N} w\!\left(\frac{x - x^t}{h}\right) r^t}{\sum_{t=1}^{N} w\!\left(\frac{x - x^t}{h}\right)}$
  where $w(u) = \begin{cases} 1 & \text{if } |u| < 1 \\ 0 & \text{otherwise} \end{cases}$
- Running line smoother
- Kernel smoother:
  $\hat{g}(x) = \frac{\sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right) r^t}{\sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right)}$
  where K(·) is Gaussian
- Additive models (Hastie and Tibshirani, 1990)
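A sketch of the kernel smoother on the same toy data as the regressogram above; the kernel's normalizing constant cancels in the ratio, so it is left unnormalized:

import numpy as np

def kernel_smoother(x, X, r, h):
    """Kernel smoother: weighted average of labels with Gaussian weights
    K((x - x^t)/h) that decay smoothly with distance from x."""
    K = np.exp(-0.5 * ((x - X) / h)**2)
    return np.sum(K * r) / np.sum(K)

rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, 200)
r = np.sin(X) + rng.normal(0, 0.1, 200)
print(kernel_smoother(np.pi / 2, X, r, h=0.3))  # near 1, smoother than bins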
How to Choose k or h?

- When k or h is small, single instances matter; bias is small, variance is large (undersmoothing): high complexity
- As k or h increases, we average over more instances; variance decreases but bias increases (oversmoothing): low complexity
- Cross-validation is used to fine-tune k or h, as sketched below.
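One way to run that cross-validation, reusing the kernel smoother from the previous sketch (the fold scheme, bandwidth grid, and toy data are my choices):

import numpy as np

def kernel_smoother(x, X, r, h):
    K = np.exp(-0.5 * ((x - X) / h)**2)
    return np.sum(K * r) / np.sum(K)

def cv_mse(X, r, h, n_folds=5):
    """k-fold cross-validated squared error of the kernel smoother."""
    fold = np.arange(len(X)) % n_folds
    errs = []
    for f in range(n_folds):
        tr, va = fold != f, fold == f
        pred = np.array([kernel_smoother(x, X[tr], r[tr], h) for x in X[va]])
        errs.append(np.mean((pred - r[va])**2))
    return np.mean(errs)

rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, 200)
r = np.sin(X) + rng.normal(0, 0.1, 200)
for h in [0.05, 0.1, 0.3, 1.0, 3.0]:   # too small: variance; too large: bias
    print(h, round(cv_mse(X, r, h), 4))

Pick the h with the lowest validation error; the same recipe applies to k in k-NN.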