INTRODUCTION TO MACHINE LEARNING, 3RD EDITION
ETHEM ALPAYDIN
© The MIT Press, 2014
http://www.cmpe.boun.edu.tr/~ethem/i2ml3e
Lecture Slides for
CHAPTER 12: LOCAL MODELS
Introduction
Divide the input space into local regions and learn simple (constant/linear) models in each patch
Unsupervised: Competitive, online clustering
Supervised: Radial-basis functions, mixture of experts
Competitive Learning
$$E\left(\{m_i\}_{i=1}^{k} \mid \mathcal{X}\right) = \sum_t \sum_i b_i^t \,\|x^t - m_i\|^2$$
$$b_i^t = \begin{cases} 1 & \text{if } \|x^t - m_i\| = \min_l \|x^t - m_l\| \\ 0 & \text{otherwise} \end{cases}$$
Batch k-means: $m_i = \dfrac{\sum_t b_i^t\, x^t}{\sum_t b_i^t}$
Online k-means: $\Delta m_{ij} = \eta\, b_i^t\, (x_j^t - m_{ij})$
Winner-take-all network
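A minimal NumPy sketch of the online k-means rule above: for each sample, only the winning center moves toward it. The learning rate, epoch count, and random initialization are illustrative choices, not from the slides.

```python
import numpy as np

def online_kmeans(X, k, eta=0.1, epochs=10, seed=0):
    """Online k-means: for each sample, move only the winning center."""
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=k, replace=False)].astype(float)  # initial centers
    for _ in range(epochs):
        for x in rng.permutation(X):
            i = np.argmin(np.linalg.norm(x - m, axis=1))  # winner: b_i^t = 1
            m[i] += eta * (x - m[i])                      # Delta m_i = eta (x^t - m_i)
    return m

# Toy usage: two well-separated blobs.
X = np.vstack([np.random.randn(50, 2) + 3.0, np.random.randn(50, 2) - 3.0])
print(online_kmeans(X, k=2))
```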
Adaptive Resonance Theory
Incremental; add a new cluster if not covered; defined by vigilance, ρ
$$b_i^t = \|x^t - m_i\| = \min_{l=1}^{k} \|x^t - m_l\|$$
$$\begin{cases} m_{k+1} \leftarrow x^t & \text{if } b_i^t > \rho \\ \Delta m_i = \eta\,(x^t - m_i) & \text{otherwise} \end{cases}$$
(Carpenter and Grossberg, 1988)
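A rough sketch of this incremental rule: if the nearest center is farther than the vigilance ρ, the sample starts a new cluster; otherwise the winner is updated. The function name and default parameters are mine, for illustration only.

```python
import numpy as np

def art_cluster(X, rho=1.0, eta=0.5):
    """Incremental clustering: vigilance rho decides when to add a new cluster."""
    centers = [X[0].astype(float)]                 # start with the first sample
    for x in X[1:]:
        dists = [np.linalg.norm(x - m) for m in centers]
        i = int(np.argmin(dists))
        if dists[i] > rho:                         # not covered: new cluster at x
            centers.append(x.astype(float))
        else:                                      # covered: update the winner
            centers[i] += eta * (x - centers[i])
    return np.array(centers)
```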
Self-Organizing Maps
Units have a neighborhood defined; $m_i$ is "between" $m_{i-1}$ and $m_{i+1}$, and are all updated together
One-dim map: (Kohonen, 1990)
$$\Delta m_l = \eta\, e(l, i)\,(x^t - m_l)$$
$$e(l, i) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[-\frac{(l-i)^2}{2\sigma^2}\right]$$
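A sketch of the one-dimensional SOM update with the Gaussian neighborhood $e(l, i)$ above; the map size, learning rate, and neighborhood width are illustrative, and I keep σ fixed rather than shrinking it over time.

```python
import numpy as np

def som_1d(X, n_units=10, eta=0.2, sigma=2.0, epochs=20, seed=0):
    """One-dimensional SOM: the winner and its map neighbors move toward the input."""
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=n_units, replace=False)].astype(float)
    idx = np.arange(n_units)
    for _ in range(epochs):
        for x in rng.permutation(X):
            i = np.argmin(np.linalg.norm(x - m, axis=1))     # winning unit i
            e = np.exp(-(idx - i) ** 2 / (2 * sigma ** 2))   # neighborhood e(l, i)
            m += eta * e[:, None] * (x - m)                  # Delta m_l
    return m
```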
Radial-Basis Functions
Locally-tuned units:
$$p_h^t = \exp\!\left[-\frac{\|x^t - m_h\|^2}{2 s_h^2}\right]$$
$$y^t = \sum_{h=1}^{H} w_h\, p_h^t + w_0$$
Local vs Distributed Representation
Training RBF
Hybrid learning:
First layer centers and spreads: Unsupervised k-means
Second layer weights: Supervised gradient-descent
Fully supervised
(Broomhead and Lowe, 1988; Moody and Darken, 1989)
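A sketch of the hybrid scheme: centers from batch k-means, spreads from within-cluster distances (a common heuristic), and second-layer weights from a closed-form least-squares fit, which I use here in place of the gradient descent mentioned on the slide.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Batch k-means for the first-layer centers."""
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        b = np.argmin(np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2), axis=1)
        for h in range(k):
            if np.any(b == h):
                m[h] = X[b == h].mean(axis=0)
    return m, b

def train_rbf(X, r, H=5):
    """Hybrid training: unsupervised centers/spreads, then a linear solve for weights."""
    m, b = kmeans(X, H)
    # Spread of each unit: mean distance of its members to the center (heuristic).
    s = np.array([np.linalg.norm(X[b == h] - m[h], axis=1).mean() if np.any(b == h) else 1.0
                  for h in range(H)])
    s = np.maximum(s, 1e-6)                         # guard against zero spread
    P = np.exp(-np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2) ** 2 / (2 * s ** 2))
    P = np.hstack([P, np.ones((len(X), 1))])        # extra column for the bias w_0
    w, *_ = np.linalg.lstsq(P, r, rcond=None)       # second-layer weights
    return m, s, w

def rbf_predict(X, m, s, w):
    P = np.exp(-np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2) ** 2 / (2 * s ** 2))
    return np.hstack([P, np.ones((len(X), 1))]) @ w
```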
Regression
$$E\left(\{m_h, s_h, w_{ih}\}_{i,h} \mid \mathcal{X}\right) = \frac{1}{2}\sum_t \sum_i \left[r_i^t - y_i^t\right]^2$$
$$y_i^t = \sum_{h=1}^{H} w_{ih}\, p_h^t + w_{i0}$$
$$\Delta w_{ih} = \eta \sum_t \left(r_i^t - y_i^t\right) p_h^t$$
$$\Delta m_{hj} = \eta \sum_t \left[\sum_i \left(r_i^t - y_i^t\right) w_{ih}\right] p_h^t\, \frac{x_j^t - m_{hj}}{s_h^2}$$
$$\Delta s_h = \eta \sum_t \left[\sum_i \left(r_i^t - y_i^t\right) w_{ih}\right] p_h^t\, \frac{\|x^t - m_h\|^2}{s_h^3}$$
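A sketch of one batch step of the fully supervised updates above, for a single output; the learning rate and the assumption that all parameters are float arrays are mine.

```python
import numpy as np

def rbf_grad_step(X, r, m, s, w, w0, eta=0.01):
    """One batch gradient step for a single-output RBF regressor."""
    diff = X[:, None, :] - m[None, :, :]                       # (N, H, D)
    d2 = (diff ** 2).sum(axis=2)                               # ||x^t - m_h||^2
    p = np.exp(-d2 / (2 * s ** 2))                             # p_h^t, (N, H)
    y = p @ w + w0                                             # y^t
    err = r - y                                                # r^t - y^t
    dw = eta * p.T @ err                                       # Delta w_h
    dw0 = eta * err.sum()                                      # Delta w_0
    dm = eta * np.einsum('t,h,th,thj->hj', err, w, p, diff) / s[:, None] ** 2  # Delta m_hj
    ds = eta * (err[:, None] * w * p * d2).sum(axis=0) / s ** 3                # Delta s_h
    return m + dm, s + ds, w + dw, w0 + dw0
```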
Classification
$$E\left(\{m_h, s_h, w_{ih}\}_{i,h} \mid \mathcal{X}\right) = -\sum_t \sum_i r_i^t \log y_i^t$$
$$y_i^t = \frac{\exp\!\left[\sum_h w_{ih}\, p_h^t + w_{i0}\right]}{\sum_k \exp\!\left[\sum_h w_{kh}\, p_h^t + w_{k0}\right]}$$
Rules and Exceptions
$$y^t = \underbrace{\sum_{h=1}^{H} w_h\, p_h^t}_{\text{exceptions}} + \underbrace{v^T x^t + v_0}_{\text{default rule}}$$
Rule-Based Knowledge
IF $(x_1 \approx a)$ AND $(x_2 \approx b)$ OR $(x_3 \approx c)$ THEN $y \approx 0.1$
$$p_1 = \exp\!\left[-\frac{(x_1 - a)^2}{2 s_1^2}\right]\exp\!\left[-\frac{(x_2 - b)^2}{2 s_2^2}\right] \text{ with } w_1 = 0.1$$
$$p_2 = \exp\!\left[-\frac{(x_3 - c)^2}{2 s_3^2}\right] \text{ with } w_2 = 0.1$$
Incorporation of prior knowledge (before training)
Rule extraction (after training) (Tresp et al., 1997)
Fuzzy membership functions and fuzzy rules
Normalized Basis Functions
$$g_h^t = \frac{p_h^t}{\sum_{l=1}^{H} p_l^t} = \frac{\exp\!\left[-\|x^t - m_h\|^2 / 2 s_h^2\right]}{\sum_l \exp\!\left[-\|x^t - m_l\|^2 / 2 s_l^2\right]}$$
$$y_i^t = \sum_{h=1}^{H} w_{ih}\, g_h^t$$
$$\Delta w_{ih} = \eta \sum_t \left(r_i^t - y_i^t\right) g_h^t$$
$$\Delta m_{hj} = \eta \sum_t \sum_i \left(r_i^t - y_i^t\right)\left(w_{ih} - y_i^t\right) g_h^t\, \frac{x_j^t - m_{hj}}{s_h^2}$$
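A small sketch of the normalized activations $g_h^t$; the max-subtraction for numerical stability is my addition and does not change the result.

```python
import numpy as np

def normalized_basis(X, m, s):
    """g_h^t: Gaussian responses normalized to sum to 1 over the H units."""
    d2 = np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2) ** 2   # (N, H)
    a = -d2 / (2 * s ** 2)                                            # log p_h^t
    a -= a.max(axis=1, keepdims=True)                                 # stability only
    g = np.exp(a)
    return g / g.sum(axis=1, keepdims=True)

# With second-layer weights W of shape (K, H): Y = normalized_basis(X, m, s) @ W.T
```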
Competitive Basis Functions
Mixture model:
$$p(r^t \mid x^t) = \sum_{h=1}^{H} p(h \mid x^t)\, p(r^t \mid h, x^t)$$
$$g_h^t \equiv p(h \mid x^t) = \frac{a_h\, p_h(x^t)}{\sum_l a_l\, p_l(x^t)} = \frac{\exp\!\left[-\|x^t - m_h\|^2 / 2 s_h^2\right]}{\sum_l \exp\!\left[-\|x^t - m_l\|^2 / 2 s_l^2\right]}$$
Regression
$$p\left(r^t \mid x^t\right) = \prod_i \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[-\frac{\left(r_i^t - y_i^t\right)^2}{2\sigma^2}\right]$$
$$\mathcal{L}\left(\{m_h, s_h, w_{ih}\}_{i,h} \mid \mathcal{X}\right) = \sum_t \log \sum_h g_h^t \exp\!\left[-\frac{1}{2}\sum_i \left(r_i^t - y_{ih}^t\right)^2\right]$$
$$f_h^t = \frac{g_h^t \exp\!\left[-\frac{1}{2}\sum_i \left(r_i^t - y_{ih}^t\right)^2\right]}{\sum_l g_l^t \exp\!\left[-\frac{1}{2}\sum_i \left(r_i^t - y_{il}^t\right)^2\right]}$$
$$\Delta w_{ih} = \eta \sum_t \left(r_i^t - y_{ih}^t\right) f_h^t$$
$$\Delta m_{hj} = \eta \sum_t \left(f_h^t - g_h^t\right) \frac{x_j^t - m_{hj}}{s_h^2}$$
where $y_{ih}^t = w_{ih}$ is the constant fit.
Classification
$$\mathcal{L}\left(\{m_h, s_h, w_{ih}\}_{i,h} \mid \mathcal{X}\right) = \sum_t \log \sum_h g_h^t \exp\!\left[\sum_i r_i^t \log y_{ih}^t\right]$$
$$y_{ih}^t = \frac{\exp w_{ih}}{\sum_k \exp w_{kh}}$$
EM for RBF (Supervised EM)
E-step:
$$f_h^t = p\left(h \mid r^t, x^t\right)$$
M-step:
$$m_h = \frac{\sum_t f_h^t\, x^t}{\sum_t f_h^t} \qquad s_h^2 = \frac{\sum_t f_h^t\, \|x^t - m_h\|^2}{\sum_t f_h^t} \qquad w_{ih} = \frac{\sum_t f_h^t\, r_i^t}{\sum_t f_h^t}$$
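A sketch of one supervised-EM iteration for the regression case with constant local fits $w_{ih}$: responsibilities $f_h^t$ in the E-step, responsibility-weighted averages in the M-step. The array shapes and the unit-variance noise assumption are mine.

```python
import numpy as np

def em_rbf_step(X, r, m, s, w):
    """One supervised-EM step; r is (N, K), w is (K, H) constant local fits."""
    # E-step: f_h^t proportional to p(h | x^t) * p(r^t | h, x^t); shared factors cancel.
    d2 = np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2) ** 2        # (N, H)
    log_g = -d2 / (2 * s ** 2)
    log_pr = -0.5 * ((r[:, None, :] - w.T[None, :, :]) ** 2).sum(axis=2)   # unit-variance noise
    a = log_g + log_pr
    a -= a.max(axis=1, keepdims=True)
    f = np.exp(a)
    f /= f.sum(axis=1, keepdims=True)                                      # f_h^t
    # M-step: responsibility-weighted averages.
    Fh = f.sum(axis=0)                                                     # sum_t f_h^t
    m_new = (f.T @ X) / Fh[:, None]
    s2 = (f * np.linalg.norm(X[:, None, :] - m_new[None, :, :], axis=2) ** 2).sum(axis=0) / Fh
    w_new = (r.T @ f) / Fh
    return m_new, np.sqrt(s2), w_new
```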
Learning Vector Quantization
H units per class prelabeled (Kohonen, 1990)
Given x, $m_i$ is the closest:
$$\Delta m_i = \begin{cases} \eta\,(x^t - m_i) & \text{if } \mathrm{label}(x^t) = \mathrm{label}(m_i) \\ -\eta\,(x^t - m_i) & \text{otherwise} \end{cases}$$
[Figure: a sample x with the closest prototype $m_i$ and another prototype $m_j$]
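A sketch of the LVQ rule: the closest prototype is attracted when the labels agree and repelled otherwise. The prototypes and their labels are assumed to be supplied already (H per class); names and defaults are mine.

```python
import numpy as np

def lvq_train(X, y, prototypes, proto_labels, eta=0.05, epochs=20, seed=0):
    """LVQ: attract the closest prototype if labels match, repel it otherwise."""
    rng = np.random.default_rng(seed)
    m = prototypes.astype(float)
    for _ in range(epochs):
        for t in rng.permutation(len(X)):
            i = np.argmin(np.linalg.norm(X[t] - m, axis=1))   # closest prototype m_i
            sign = 1.0 if proto_labels[i] == y[t] else -1.0
            m[i] += sign * eta * (X[t] - m[i])
    return m
```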
Mixture of Experts
In RBF, each local fit is a constant, $w_{ih}$, the second-layer weight.
In MoE, each local fit is a linear function of x, a local expert:
$$w_{ih}^t = v_{ih}^T x^t$$
(Jacobs et al., 1991)
MoE as Models Combined
Radial gating:
$$g_h^t = \frac{\exp\!\left[-\|x^t - m_h\|^2 / 2 s_h^2\right]}{\sum_l \exp\!\left[-\|x^t - m_l\|^2 / 2 s_l^2\right]}$$
Softmax gating:
$$g_h^t = \frac{\exp\!\left[m_h^T x^t\right]}{\sum_l \exp\!\left[m_l^T x^t\right]}$$
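A sketch of the MoE forward pass with softmax gating and linear experts, i.e. $w_{ih}^t = v_{ih}^T x^t$ combined with the gating weights $g_h^t$; the array shapes and names are my choices.

```python
import numpy as np

def moe_predict(X, V, M):
    """Mixture of experts with softmax gating.

    X: (N, D) inputs; V: (H, K, D) expert weights, w_ih^t = V[h, i] @ x^t;
    M: (H, D) gating weights for g_h^t = softmax(m_h^T x^t).
    """
    scores = X @ M.T                                  # (N, H)
    scores -= scores.max(axis=1, keepdims=True)       # numerical stability
    g = np.exp(scores)
    g /= g.sum(axis=1, keepdims=True)                 # gating g_h^t
    y_local = np.einsum('hkd,nd->nhk', V, X)          # local linear fits w_ih^t
    return np.einsum('nh,nhk->nk', g, y_local)        # y_i^t = sum_h g_h^t w_ih^t
```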
Cooperative MoE
Regression:
$$E\left(\{m_h, s_h, v_{ih}\}_{i,h} \mid \mathcal{X}\right) = \frac{1}{2}\sum_t \sum_i \left[r_i^t - y_i^t\right]^2 \qquad y_i^t = \sum_h g_h^t\, w_{ih}^t = \sum_h g_h^t\, v_{ih}^T x^t$$
$$\Delta v_{ih} = \eta \sum_t \left(r_i^t - y_i^t\right) g_h^t\, x^t$$
$$\Delta m_{hj} = \eta \sum_t \sum_i \left(r_i^t - y_i^t\right)\left(w_{ih}^t - y_i^t\right) g_h^t\, \frac{x_j^t - m_{hj}}{s_h^2}$$
Competitive MoE: Regression
$$\mathcal{L}\left(\{m_h, s_h, v_{ih}\}_{i,h} \mid \mathcal{X}\right) = \sum_t \log \sum_h g_h^t \exp\!\left[-\frac{1}{2}\sum_i \left(r_i^t - y_{ih}^t\right)^2\right] \qquad y_{ih}^t = v_{ih}^T x^t$$
$$\Delta v_{ih} = \eta \sum_t \left(r_i^t - y_{ih}^t\right) f_h^t\, x^t$$
$$\Delta m_h = \eta \sum_t \left(f_h^t - g_h^t\right) x^t$$
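A sketch of one batch update implementing the competitive MoE regression rules above (compute $f_h^t$, then the $\Delta v$ and $\Delta m$ steps), assuming softmax gating; the learning rate and array shapes are mine.

```python
import numpy as np

def competitive_moe_step(X, r, V, M, eta=0.01):
    """One batch update for competitive MoE regression (softmax gating)."""
    scores = X @ M.T
    scores -= scores.max(axis=1, keepdims=True)
    g = np.exp(scores)
    g /= g.sum(axis=1, keepdims=True)                       # g_h^t, (N, H)
    y = np.einsum('hkd,nd->nhk', V, X)                      # y_ih^t = v_ih^T x^t
    err = r[:, None, :] - y                                 # (N, H, K)
    a = np.log(g + 1e-12) - 0.5 * (err ** 2).sum(axis=2)    # log g_h^t - (1/2) sum_i err^2
    a -= a.max(axis=1, keepdims=True)
    f = np.exp(a)
    f /= f.sum(axis=1, keepdims=True)                       # responsibilities f_h^t
    dV = eta * np.einsum('nh,nhk,nd->hkd', f, err, X)       # Delta v_ih
    dM = eta * (f - g).T @ X                                # Delta m_h
    return V + dV, M + dM
```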
Competitive MoE: Classification
$$\mathcal{L}\left(\{m_h, s_h, v_{ih}\}_{i,h} \mid \mathcal{X}\right) = \sum_t \log \sum_h g_h^t \exp\!\left[\sum_i r_i^t \log y_{ih}^t\right]$$
$$y_{ih}^t = \frac{\exp\!\left[v_{ih}^T x^t\right]}{\sum_k \exp\!\left[v_{kh}^T x^t\right]}$$
Hierarchical Mixture of Experts
Tree of MoE where each MoE is an expert in a higher-level MoE
Soft decision tree: Takes a weighted (gating) average of all leaves (experts), as opposed to using a single path and a single leaf
Can be trained using EM (Jordan and Jacobs, 1994)