INTRODUCTION TO MACHINE LEARNING, 3RD EDITION
ETHEM ALPAYDIN
© The MIT Press, 2014
http://www.cmpe.boun.edu.tr/~ethem/i2ml3e
Lecture Slides for
CHAPTER 12: LOCAL MODELS
Introduction
Divide the input space into local regions and learn simple (constant/linear) models in each patch
Unsupervised: Competitive, online clustering
Supervised: Radial-basis functions, mixture of experts
Competitive Learning
$$E\left(\{m_i\}_{i=1}^{k} \mid \mathcal{X}\right) = \sum_t \sum_i b_i^t \,\|x^t - m_i\|^2$$
$$b_i^t = \begin{cases} 1 & \text{if } \|x^t - m_i\| = \min_l \|x^t - m_l\| \\ 0 & \text{otherwise} \end{cases}$$
Batch k-means: $m_i = \dfrac{\sum_t b_i^t\, x^t}{\sum_t b_i^t}$
Online k-means: $\Delta m_{ij} = \eta\, b_i^t\, (x_j^t - m_{ij})$
Winner-take-all network
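A minimal NumPy sketch of the online k-means rule above: for each sample, only the winning center moves toward it. The learning rate, epoch count, and random initialization are illustrative choices, not from the slides.

```python
import numpy as np

def online_kmeans(X, k, eta=0.1, epochs=10, seed=0):
    """Online k-means: for each sample, move only the winning center."""
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=k, replace=False)].astype(float)  # initial centers
    for _ in range(epochs):
        for x in rng.permutation(X):
            i = np.argmin(np.linalg.norm(x - m, axis=1))  # winner: b_i^t = 1
            m[i] += eta * (x - m[i])                      # Delta m_i = eta (x^t - m_i)
    return m

# Toy usage: two well-separated blobs.
X = np.vstack([np.random.randn(50, 2) + 3.0, np.random.randn(50, 2) - 3.0])
print(online_kmeans(X, k=2))
```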
Adaptive Resonance Theory
Incremental; add a new cluster if not covered; defined by vigilance, ρ
$$b_i^t = \|x^t - m_i\| = \min_{l=1}^{k} \|x^t - m_l\|$$
$$\begin{cases} m_{k+1} \leftarrow x^t & \text{if } b_i^t > \rho \\ \Delta m_i = \eta\,(x^t - m_i) & \text{otherwise} \end{cases}$$
(Carpenter and Grossberg, 1988)
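A rough sketch of this incremental rule: if the nearest center is farther than the vigilance ρ, the sample starts a new cluster; otherwise the winner is updated. The function name and default parameters are mine, for illustration only.

```python
import numpy as np

def art_cluster(X, rho=1.0, eta=0.5):
    """Incremental clustering: vigilance rho decides when to add a new cluster."""
    centers = [X[0].astype(float)]                 # start with the first sample
    for x in X[1:]:
        dists = [np.linalg.norm(x - m) for m in centers]
        i = int(np.argmin(dists))
        if dists[i] > rho:                         # not covered: new cluster at x
            centers.append(x.astype(float))
        else:                                      # covered: update the winner
            centers[i] += eta * (x - centers[i])
    return np.array(centers)
```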
Self-Organizing Maps
Units have a neighborhood defined; $m_i$ is "between" $m_{i-1}$ and $m_{i+1}$, and are all updated together
One-dim map: (Kohonen, 1990)
$$\Delta m_l = \eta\, e(l, i)\,(x^t - m_l)$$
$$e(l, i) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[-\frac{(l-i)^2}{2\sigma^2}\right]$$
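A sketch of the one-dimensional SOM update with the Gaussian neighborhood $e(l, i)$ above; the map size, learning rate, and neighborhood width are illustrative, and I keep σ fixed rather than shrinking it over time.

```python
import numpy as np

def som_1d(X, n_units=10, eta=0.2, sigma=2.0, epochs=20, seed=0):
    """One-dimensional SOM: the winner and its map neighbors move toward the input."""
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=n_units, replace=False)].astype(float)
    idx = np.arange(n_units)
    for _ in range(epochs):
        for x in rng.permutation(X):
            i = np.argmin(np.linalg.norm(x - m, axis=1))     # winning unit i
            e = np.exp(-(idx - i) ** 2 / (2 * sigma ** 2))   # neighborhood e(l, i)
            m += eta * e[:, None] * (x - m)                  # Delta m_l
    return m
```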
Radial-Basis Functions
Locally-tuned units:
$$p_h^t = \exp\!\left[-\frac{\|x^t - m_h\|^2}{2 s_h^2}\right]$$
$$y^t = \sum_{h=1}^{H} w_h\, p_h^t + w_0$$
Local vs Distributed Representation
Training RBF
Hybrid learning:
First layer centers and spreads: Unsupervised k-means
Second layer weights: Supervised gradient-descent
Fully supervised
(Broomhead and Lowe, 1988; Moody and Darken, 1989)
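A sketch of the hybrid scheme: centers from batch k-means, spreads from within-cluster distances (a common heuristic), and second-layer weights from a closed-form least-squares fit, which I use here in place of the gradient descent mentioned on the slide.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Batch k-means for the first-layer centers."""
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        b = np.argmin(np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2), axis=1)
        for h in range(k):
            if np.any(b == h):
                m[h] = X[b == h].mean(axis=0)
    return m, b

def train_rbf(X, r, H=5):
    """Hybrid training: unsupervised centers/spreads, then a linear solve for weights."""
    m, b = kmeans(X, H)
    # Spread of each unit: mean distance of its members to the center (heuristic).
    s = np.array([np.linalg.norm(X[b == h] - m[h], axis=1).mean() if np.any(b == h) else 1.0
                  for h in range(H)])
    s = np.maximum(s, 1e-6)                         # guard against zero spread
    P = np.exp(-np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2) ** 2 / (2 * s ** 2))
    P = np.hstack([P, np.ones((len(X), 1))])        # extra column for the bias w_0
    w, *_ = np.linalg.lstsq(P, r, rcond=None)       # second-layer weights
    return m, s, w

def rbf_predict(X, m, s, w):
    P = np.exp(-np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2) ** 2 / (2 * s ** 2))
    return np.hstack([P, np.ones((len(X), 1))]) @ w
```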
Regression
$$E\left(\{m_h, s_h, w_{ih}\}_{i,h} \mid \mathcal{X}\right) = \frac{1}{2}\sum_t \sum_i \left[r_i^t - y_i^t\right]^2$$
$$y_i^t = \sum_{h=1}^{H} w_{ih}\, p_h^t + w_{i0}$$
$$\Delta w_{ih} = \eta \sum_t \left(r_i^t - y_i^t\right) p_h^t$$
$$\Delta m_{hj} = \eta \sum_t \left[\sum_i \left(r_i^t - y_i^t\right) w_{ih}\right] p_h^t\, \frac{x_j^t - m_{hj}}{s_h^2}$$
$$\Delta s_h = \eta \sum_t \left[\sum_i \left(r_i^t - y_i^t\right) w_{ih}\right] p_h^t\, \frac{\|x^t - m_h\|^2}{s_h^3}$$
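A sketch of one batch step of the fully supervised updates above, for a single output; the learning rate and the assumption that all parameters are float arrays are mine.

```python
import numpy as np

def rbf_grad_step(X, r, m, s, w, w0, eta=0.01):
    """One batch gradient step for a single-output RBF regressor."""
    diff = X[:, None, :] - m[None, :, :]                       # (N, H, D)
    d2 = (diff ** 2).sum(axis=2)                               # ||x^t - m_h||^2
    p = np.exp(-d2 / (2 * s ** 2))                             # p_h^t, (N, H)
    y = p @ w + w0                                             # y^t
    err = r - y                                                # r^t - y^t
    dw = eta * p.T @ err                                       # Delta w_h
    dw0 = eta * err.sum()                                      # Delta w_0
    dm = eta * np.einsum('t,h,th,thj->hj', err, w, p, diff) / s[:, None] ** 2  # Delta m_hj
    ds = eta * (err[:, None] * w * p * d2).sum(axis=0) / s ** 3                # Delta s_h
    return m + dm, s + ds, w + dw, w0 + dw0
```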
Classification
$$E\left(\{m_h, s_h, w_{ih}\}_{i,h} \mid \mathcal{X}\right) = -\sum_t \sum_i r_i^t \log y_i^t$$
$$y_i^t = \frac{\exp\!\left[\sum_h w_{ih}\, p_h^t + w_{i0}\right]}{\sum_k \exp\!\left[\sum_h w_{kh}\, p_h^t + w_{k0}\right]}$$
Rules and Exceptions
$$y^t = \underbrace{\sum_{h=1}^{H} w_h\, p_h^t}_{\text{exceptions}} + \underbrace{v^T x^t + v_0}_{\text{default rule}}$$
Rule-Based Knowledge
IF $(x_1 \approx a)$ AND $(x_2 \approx b)$ OR $(x_3 \approx c)$ THEN $y \approx 0.1$
$$p_1 = \exp\!\left[-\frac{(x_1 - a)^2}{2 s_1^2}\right]\exp\!\left[-\frac{(x_2 - b)^2}{2 s_2^2}\right] \text{ with } w_1 = 0.1$$
$$p_2 = \exp\!\left[-\frac{(x_3 - c)^2}{2 s_3^2}\right] \text{ with } w_2 = 0.1$$
Incorporation of prior knowledge (before training)
Rule extraction (after training) (Tresp et al., 1997)
Fuzzy membership functions and fuzzy rules
Normalized Basis Functions
$$g_h^t = \frac{p_h^t}{\sum_{l=1}^{H} p_l^t} = \frac{\exp\!\left[-\|x^t - m_h\|^2 / 2 s_h^2\right]}{\sum_l \exp\!\left[-\|x^t - m_l\|^2 / 2 s_l^2\right]}$$
$$y_i^t = \sum_{h=1}^{H} w_{ih}\, g_h^t$$
$$\Delta w_{ih} = \eta \sum_t \left(r_i^t - y_i^t\right) g_h^t$$
$$\Delta m_{hj} = \eta \sum_t \sum_i \left(r_i^t - y_i^t\right)\left(w_{ih} - y_i^t\right) g_h^t\, \frac{x_j^t - m_{hj}}{s_h^2}$$
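A small sketch of the normalized activations $g_h^t$; the max-subtraction for numerical stability is my addition and does not change the result.

```python
import numpy as np

def normalized_basis(X, m, s):
    """g_h^t: Gaussian responses normalized to sum to 1 over the H units."""
    d2 = np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2) ** 2   # (N, H)
    a = -d2 / (2 * s ** 2)                                            # log p_h^t
    a -= a.max(axis=1, keepdims=True)                                 # stability only
    g = np.exp(a)
    return g / g.sum(axis=1, keepdims=True)

# With second-layer weights W of shape (K, H): Y = normalized_basis(X, m, s) @ W.T
```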
Competitive Basis Functions
Mixture model:
$$p(r^t \mid x^t) = \sum_{h=1}^{H} p(h \mid x^t)\, p(r^t \mid h, x^t)$$
$$g_h^t \equiv p(h \mid x^t) = \frac{a_h\, p_h(x^t)}{\sum_l a_l\, p_l(x^t)} = \frac{\exp\!\left[-\|x^t - m_h\|^2 / 2 s_h^2\right]}{\sum_l \exp\!\left[-\|x^t - m_l\|^2 / 2 s_l^2\right]}$$
Regression
$$p\left(r^t \mid x^t\right) = \prod_i \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[-\frac{\left(r_i^t - y_i^t\right)^2}{2\sigma^2}\right]$$
$$\mathcal{L}\left(\{m_h, s_h, w_{ih}\}_{i,h} \mid \mathcal{X}\right) = \sum_t \log \sum_h g_h^t \exp\!\left[-\frac{1}{2}\sum_i \left(r_i^t - y_{ih}^t\right)^2\right]$$
$$f_h^t = \frac{g_h^t \exp\!\left[-\frac{1}{2}\sum_i \left(r_i^t - y_{ih}^t\right)^2\right]}{\sum_l g_l^t \exp\!\left[-\frac{1}{2}\sum_i \left(r_i^t - y_{il}^t\right)^2\right]}$$
$$\Delta w_{ih} = \eta \sum_t \left(r_i^t - y_{ih}^t\right) f_h^t$$
$$\Delta m_{hj} = \eta \sum_t \left(f_h^t - g_h^t\right) \frac{x_j^t - m_{hj}}{s_h^2}$$
where $y_{ih}^t = w_{ih}$ is the constant fit.
Classification
$$\mathcal{L}\left(\{m_h, s_h, w_{ih}\}_{i,h} \mid \mathcal{X}\right) = \sum_t \log \sum_h g_h^t \exp\!\left[\sum_i r_i^t \log y_{ih}^t\right]$$
$$y_{ih}^t = \frac{\exp w_{ih}}{\sum_k \exp w_{kh}}$$
EM for RBF (Supervised EM)
E-step:
$$f_h^t = p\left(h \mid r^t, x^t\right)$$
M-step:
$$m_h = \frac{\sum_t f_h^t\, x^t}{\sum_t f_h^t} \qquad s_h^2 = \frac{\sum_t f_h^t\, \|x^t - m_h\|^2}{\sum_t f_h^t} \qquad w_{ih} = \frac{\sum_t f_h^t\, r_i^t}{\sum_t f_h^t}$$
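A sketch of one supervised-EM iteration for the regression case with constant local fits $w_{ih}$: responsibilities $f_h^t$ in the E-step, responsibility-weighted averages in the M-step. The array shapes and the unit-variance noise assumption are mine.

```python
import numpy as np

def em_rbf_step(X, r, m, s, w):
    """One supervised-EM step; r is (N, K), w is (K, H) constant local fits."""
    # E-step: f_h^t proportional to p(h | x^t) * p(r^t | h, x^t); shared factors cancel.
    d2 = np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2) ** 2        # (N, H)
    log_g = -d2 / (2 * s ** 2)
    log_pr = -0.5 * ((r[:, None, :] - w.T[None, :, :]) ** 2).sum(axis=2)   # unit-variance noise
    a = log_g + log_pr
    a -= a.max(axis=1, keepdims=True)
    f = np.exp(a)
    f /= f.sum(axis=1, keepdims=True)                                      # f_h^t
    # M-step: responsibility-weighted averages.
    Fh = f.sum(axis=0)                                                     # sum_t f_h^t
    m_new = (f.T @ X) / Fh[:, None]
    s2 = (f * np.linalg.norm(X[:, None, :] - m_new[None, :, :], axis=2) ** 2).sum(axis=0) / Fh
    w_new = (r.T @ f) / Fh
    return m_new, np.sqrt(s2), w_new
```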
Learning Vector Quantization
H units per class prelabeled (Kohonen, 1990)
Given x, $m_i$ is the closest:
$$\Delta m_i = \begin{cases} \eta\,(x^t - m_i) & \text{if } \mathrm{label}(x^t) = \mathrm{label}(m_i) \\ -\eta\,(x^t - m_i) & \text{otherwise} \end{cases}$$
[Figure: a sample x with the closest prototype $m_i$ and another prototype $m_j$]
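A sketch of the LVQ rule: the closest prototype is attracted when the labels agree and repelled otherwise. The prototypes and their labels are assumed to be supplied already (H per class); names and defaults are mine.

```python
import numpy as np

def lvq_train(X, y, prototypes, proto_labels, eta=0.05, epochs=20, seed=0):
    """LVQ: attract the closest prototype if labels match, repel it otherwise."""
    rng = np.random.default_rng(seed)
    m = prototypes.astype(float)
    for _ in range(epochs):
        for t in rng.permutation(len(X)):
            i = np.argmin(np.linalg.norm(X[t] - m, axis=1))   # closest prototype m_i
            sign = 1.0 if proto_labels[i] == y[t] else -1.0
            m[i] += sign * eta * (X[t] - m[i])
    return m
```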
Mixture of Experts
In RBF, each local fit is a constant, $w_{ih}$, the second-layer weight.
In MoE, each local fit is a linear function of x, a local expert:
$$w_{ih}^t = v_{ih}^T x^t$$
(Jacobs et al., 1991)
MoE as Models Combined
Radial gating:
$$g_h^t = \frac{\exp\!\left[-\|x^t - m_h\|^2 / 2 s_h^2\right]}{\sum_l \exp\!\left[-\|x^t - m_l\|^2 / 2 s_l^2\right]}$$
Softmax gating:
$$g_h^t = \frac{\exp\!\left[m_h^T x^t\right]}{\sum_l \exp\!\left[m_l^T x^t\right]}$$
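A sketch of the MoE forward pass with softmax gating and linear experts, i.e. $w_{ih}^t = v_{ih}^T x^t$ combined with the gating weights $g_h^t$; the array shapes and names are my choices.

```python
import numpy as np

def moe_predict(X, V, M):
    """Mixture of experts with softmax gating.

    X: (N, D) inputs; V: (H, K, D) expert weights, w_ih^t = V[h, i] @ x^t;
    M: (H, D) gating weights for g_h^t = softmax(m_h^T x^t).
    """
    scores = X @ M.T                                  # (N, H)
    scores -= scores.max(axis=1, keepdims=True)       # numerical stability
    g = np.exp(scores)
    g /= g.sum(axis=1, keepdims=True)                 # gating g_h^t
    y_local = np.einsum('hkd,nd->nhk', V, X)          # local linear fits w_ih^t
    return np.einsum('nh,nhk->nk', g, y_local)        # y_i^t = sum_h g_h^t w_ih^t
```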
Cooperative MoE
Regression:
$$E\left(\{m_h, s_h, v_{ih}\}_{i,h} \mid \mathcal{X}\right) = \frac{1}{2}\sum_t \sum_i \left[r_i^t - y_i^t\right]^2 \qquad y_i^t = \sum_h g_h^t\, w_{ih}^t = \sum_h g_h^t\, v_{ih}^T x^t$$
$$\Delta v_{ih} = \eta \sum_t \left(r_i^t - y_i^t\right) g_h^t\, x^t$$
$$\Delta m_{hj} = \eta \sum_t \sum_i \left(r_i^t - y_i^t\right)\left(w_{ih}^t - y_i^t\right) g_h^t\, \frac{x_j^t - m_{hj}}{s_h^2}$$
Competitive MoE: Regression
$$\mathcal{L}\left(\{m_h, s_h, v_{ih}\}_{i,h} \mid \mathcal{X}\right) = \sum_t \log \sum_h g_h^t \exp\!\left[-\frac{1}{2}\sum_i \left(r_i^t - y_{ih}^t\right)^2\right] \qquad y_{ih}^t = v_{ih}^T x^t$$
$$\Delta v_{ih} = \eta \sum_t \left(r_i^t - y_{ih}^t\right) f_h^t\, x^t$$
$$\Delta m_h = \eta \sum_t \left(f_h^t - g_h^t\right) x^t$$
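A sketch of one batch update implementing the competitive MoE regression rules above (compute $f_h^t$, then the $\Delta v$ and $\Delta m$ steps), assuming softmax gating; the learning rate and array shapes are mine.

```python
import numpy as np

def competitive_moe_step(X, r, V, M, eta=0.01):
    """One batch update for competitive MoE regression (softmax gating)."""
    scores = X @ M.T
    scores -= scores.max(axis=1, keepdims=True)
    g = np.exp(scores)
    g /= g.sum(axis=1, keepdims=True)                       # g_h^t, (N, H)
    y = np.einsum('hkd,nd->nhk', V, X)                      # y_ih^t = v_ih^T x^t
    err = r[:, None, :] - y                                 # (N, H, K)
    a = np.log(g + 1e-12) - 0.5 * (err ** 2).sum(axis=2)    # log g_h^t - (1/2) sum_i err^2
    a -= a.max(axis=1, keepdims=True)
    f = np.exp(a)
    f /= f.sum(axis=1, keepdims=True)                       # responsibilities f_h^t
    dV = eta * np.einsum('nh,nhk,nd->hkd', f, err, X)       # Delta v_ih
    dM = eta * (f - g).T @ X                                # Delta m_h
    return V + dV, M + dM
```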
Competitive MoE: Classification
$$\mathcal{L}\left(\{m_h, s_h, v_{ih}\}_{i,h} \mid \mathcal{X}\right) = \sum_t \log \sum_h g_h^t \exp\!\left[\sum_i r_i^t \log y_{ih}^t\right]$$
$$y_{ih}^t = \frac{\exp\!\left[v_{ih}^T x^t\right]}{\sum_k \exp\!\left[v_{kh}^T x^t\right]}$$
Hierarchical Mixture of Experts
Tree of MoE where each MoE is an expert in a higher-level MoE
Soft decision tree: Takes a weighted (gating) average of all leaves (experts), as opposed to using a single path and a single leaf
Can be trained using EM (Jordan and Jacobs, 1994)