
Classification
Learning From Data, Chapter 8
Seoul National University

Outline
- 8.1 Statistical Learning Theory Formulation
- 8.2 Classical Formulation
- 8.3 Methods for Classification
- 8.4 Summary

Classification
- Input sample x = (x1, x2, …, xd) is classified to one (and only one) of J groups.
- Concerned with the relationship between the class-membership label and the feature vector.
- Goal is to estimate the mapping x → y (decision rule) using labeled training data (xi, yi).

Two-class problem
- Output of the system
  - Takes on values y ∈ {0, 1}.
- Learning machine
  - Implements a set of indicator functions f(x, ω).
- Loss function (illustrated in the sketch below)

    L(y, f(x, ω)) = 0 if y = f(x, ω),  1 if y ≠ f(x, ω)
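To make the 0/1 loss concrete, the following minimal Python sketch (function names are illustrative, not from the slides) evaluates the empirical misclassification risk of a fixed decision rule on a toy sample:

    import numpy as np

    def zero_one_loss(y, y_hat):
        """0/1 loss: 1 for a misclassification, 0 otherwise."""
        return (y != y_hat).astype(float)

    def empirical_risk(f, X, y):
        """Average 0/1 loss of decision rule f over the training sample."""
        return zero_one_loss(y, f(X)).mean()

    # Toy decision rule: threshold the first input coordinate.
    f = lambda X: (X[:, 0] > 0).astype(int)
    X = np.array([[0.5, 1.0], [-0.2, 0.3], [1.5, -1.0], [-0.7, 0.8]])
    y = np.array([1, 0, 1, 1])
    print(empirical_risk(f, X, y))  # -> 0.25 (one of four points misclassified)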

Risk Functional
- Learning is the problem of finding the function f(x, ω0) that minimizes the probability of misclassification using only the training data.

    R(ω) = ∫ L(y, f(x, ω)) p(x, y) dx dy

- Implementation of methods using SRM requires:
  - Specification of a nested structure on a set of indicator approximating functions.
  - Minimization of the empirical risk (misclassification error) for a given element of a structure.
  - Estimation of prediction risk.

Classical formulation of the classification
- Conditional density, prior probability, and posterior probability, with parametric density estimates

    p(x | y = 0) = p(x, α*),  p(x | y = 1) = p(x, β*)

- Decision rule (a plug-in sketch follows)
  - Applies the ERM inductive principle indirectly, to first estimate the densities.
  - Conceptually flawed: estimates the decision boundary via density estimation.

    f(x) = 0 if p(x, α*) P(y = 0) > p(x, β*) P(y = 1),  1 otherwise
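A minimal sketch of the plug-in decision rule above, under the added assumption of spherical Gaussian class-conditional densities (so p(x, α*) and p(x, β*) are fit by sample means and variances); all names are illustrative:

    import numpy as np

    def fit_gaussian(X):
        """Parametric density estimate: sample mean and (spherical) variance."""
        return X.mean(axis=0), X.var() + 1e-9

    def log_density(x, mu, var):
        d = x.shape[-1]
        return -0.5 * (np.sum((x - mu) ** 2, axis=-1) / var + d * np.log(2 * np.pi * var))

    rng = np.random.default_rng(0)
    X0 = rng.normal(loc=-1.0, size=(100, 2))   # class 0 sample
    X1 = rng.normal(loc=+1.0, size=(100, 2))   # class 1 sample
    (mu0, v0), (mu1, v1) = fit_gaussian(X0), fit_gaussian(X1)
    P0, P1 = 0.5, 0.5                          # estimated priors

    def decide(x):
        """f(x) = 0 iff p(x, alpha*) P(y=0) > p(x, beta*) P(y=1), else 1."""
        return np.where(log_density(x, mu0, v0) + np.log(P0) >
                        log_density(x, mu1, v1) + np.log(P1), 0, 1)

    print(decide(np.array([[-0.8, -0.5], [1.2, 0.9]])))  # expected: [0 1]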

8.1 Statistical Learning Theory Formulation
- Goal is to estimate an indicator function or decision boundary f(x, ω0).
- Use the SRM inductive principle.
- Constructive methods should select an element Sm of a structure

    S1 ⊂ S2 ⊂ … ⊂ Sm ⊂ …

  and an indicator function f(x, ω0m) within it, where hm, the VC dimension of Sm, satisfies h1 ≤ h2 ≤ … ≤ hm ≤ …
- Bound on the prediction risk of f(x, ω0m):

    R(ω0m) ≤ Remp(ω0m) + Φ(n / hm)

SLT Formulation (Cont'd)
- Two strategies for minimizing the bound
  - Keep the confidence interval fixed and minimize the empirical risk.
    - Includes all statistical and neural network methods using dictionary representation.
    - With high-dimensional data, first project the data onto a low-dimensional subspace (i.e., m features) and then perform modeling in this subspace.
  - Keep the value of the empirical risk fixed (small) and minimize the confidence interval.
    - Requires a special structure such that the value of the empirical risk is kept small for all approximating functions.

SLT Formulation (Cont'd)
- Binary classification problem
  - Goal is to minimize the misclassification error.
- Perceptron algorithm (a sketch follows this list)
  - Simple optimization procedure for finding f(x, ω*) providing zero misclassification error if the training data are linearly separable.
  - Indicator function is …
  - Prerequisite …
  - Weight update rule …
  - When the data are not separable and/or the optimal decision boundary is nonlinear, does not provide an optimal solution.
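The slide leaves the indicator function, prerequisite, and update rule as formulas; the sketch below uses the standard perceptron forms f(x, w) = I(w·x + b > 0) and w ← w + η(y - ŷ)x, which are assumptions about what the slide shows rather than a quotation of it:

    import numpy as np

    def perceptron(X, y, lr=1.0, epochs=100):
        """Standard perceptron; reaches zero training error
        only if the data are linearly separable (cf. the slide)."""
        w = np.zeros(X.shape[1])
        b = 0.0
        for _ in range(epochs):
            errors = 0
            for xi, yi in zip(X, y):
                y_hat = float(np.dot(w, xi) + b > 0)   # indicator function
                if y_hat != yi:                        # update only on mistakes
                    w += lr * (yi - y_hat) * xi
                    b += lr * (yi - y_hat)
                    errors += 1
            if errors == 0:                            # separable: stop early
                break
        return w, b

    # Linearly separable toy data: class determined by x1 + x2 > 0.
    X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -0.5]])
    y = np.array([1.0, 1.0, 0.0, 0.0])
    w, b = perceptron(X, y)
    print(((X @ w + b > 0).astype(float) == y).all())  # -> True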

SLT Formulation (Cont'd)
- MLP classifier
  - Can form flexible nonlinear decision boundaries.
  - Approximates the indicator function by a well-behaved sigmoid function.
- Empirical risk functional
  - Sigmoid activations of hidden units enable the construction of a flexible nonlinear decision boundary.
  - Output sigmoid approximates the discontinuous indicator function.
- In neural networks, a common procedure for classification decisions is to use a sigmoid output.
  - Output of the trained network is interpreted as an estimate of the posterior probability (sketched below):

    f(x) = I[ s(g(x, w*, V*)) − θ ],   s(g(x, w*, V*)) = P̂(y = 1 | x)
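A minimal sketch of reading a trained sigmoid output as a posterior estimate. For brevity it trains a single sigmoid unit (no hidden layer) by gradient descent on cross-entropy, so it illustrates the interpretation s(g(x, w*)) ≈ P̂(y = 1 | x) rather than a full MLP:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_sigmoid_unit(X, y, lr=0.5, epochs=2000):
        """Gradient descent on cross-entropy for a single sigmoid output unit."""
        w = np.zeros(X.shape[1])
        b = 0.0
        for _ in range(epochs):
            p = sigmoid(X @ w + b)          # network output in (0, 1)
            grad = p - y                    # d(cross-entropy)/d(logit)
            w -= lr * X.T @ grad / len(y)
            b -= lr * grad.mean()
        return w, b

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-1, 1, (200, 2)), rng.normal(1, 1, (200, 2))])
    y = np.r_[np.zeros(200), np.ones(200)]
    w, b = train_sigmoid_unit(X, y)
    posterior = sigmoid(X @ w + b)          # interpreted as P-hat(y=1 | x)
    f = (posterior > 0.5).astype(float)     # thresholding gives the indicator f(x)
    print(round((f == y).mean(), 2))        # training accuracy, roughly 0.9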

SLT Formulation (Cont'd)
- General prescription for implementing constructive methods
  (1) Specify a (flexible) class of approximating functions for constructing a (nonlinear) decision boundary.
  (2) Choose a nonlinear optimization method for selecting the best function from class (1), i.e., the function providing the smallest empirical risk.
  (3) Select a continuous error functional suitable for the optimization method chosen in (2).
  (4) Select the best predictive model from the class of functions (1) using the first strategy for minimizing the SLT bound.

SLT Formulation (Cont'd)
- Classification methods use a continuous error functional that only approximates the true one.
- A classification method using such an approximation will be successful only if minimizing the error functional selected in (3) also minimizes the true empirical risk.

8.2 Classical Formulation
- Based on statistical decision theory
  - Provides the foundation for constructing optimal decision rules minimizing risk.
  - Strictly applies only when all distributions are known.
- Concerned with constructing decision rules.
- Estimate the required distributions from the data and use them within the framework of statistical decision theory.

Classical Formulation (Cont'd)
- When x is not observed …
- When x is observed …

Classical Formulation (Cont'd)
- Misclassification with unequal costs, where Cij is the cost of deciding class i when the true class is j, and Ri is the region where the rule decides class i (a numeric sketch follows)
  - The expected cost is

      q = C10 P(y = 0) ∫R1 p(x | y = 0) dx + C01 P(y = 1) ∫R0 p(x | y = 1) dx

  - The overall risk is

      R = Σi ∫Ri [ Σj Cij P(y = j) p(x | y = j) ] dx

  - Risk is minimized if each region Ri is defined such that x ∈ Ri whenever

      Σj Cij P(y = j) p(x | y = j) < Σj Ckj P(y = j) p(x | y = j)  for all k ≠ i

  - If C10 = C01 = 1 and C00 = C11 = 0, this reduces to the usual minimum-misclassification-error decision rule.
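A numeric sketch of the unequal-cost rule: the costs and posteriors below are invented for illustration, and the decision picks the class with the smaller conditional risk Σj Cij P(y = j | x):

    import numpy as np

    # Cost matrix C[i, j]: cost of deciding class i when the true class is j.
    # C[0,0] = C[1,1] = 0; missing a true class-1 case (deciding 0) costs 5x more.
    C = np.array([[0.0, 5.0],
                  [1.0, 0.0]])

    def decide(posterior):
        """Pick the class i minimizing the conditional risk sum_j C[i,j] P(y=j|x)."""
        risk = C @ posterior          # risk[i] = sum_j C[i,j] * P(y=j|x)
        return int(np.argmin(risk))

    # With P(y=1|x) = 0.3 the minimum-error rule would pick class 0,
    # but the 5:1 cost ratio shifts the decision to class 1:
    print(decide(np.array([0.7, 0.3])))   # -> 1  (risks: [1.5, 0.7])
    print(decide(np.array([0.9, 0.1])))   # -> 0  (risks: [0.5, 0.9])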

Posterior distributions
- Two strategies
  - Estimate the prior probabilities and class-conditional densities and plug them into Bayes' rule.
  - Estimate posterior densities directly, using training data from all the classes.
- Two approaches for the two strategies
  - Parametric methods.
  - Adaptive flexible methods.

Posterior… (Cont'd)
- Direct Estimation
  - Estimation of posterior densities can be done using regression methods.
  - For the two-class case …
- Parametric Regression
  - Linear regression can be used for classification for non-Gaussian distributions.
  - Fisher linear discriminant provides an approximation for the posterior probability; the estimate is biased, yet it can still provide an accurate classification rule (a sketch follows this list).
  - May provide a poor decision boundary.
- Adaptive Regression
  - Accurate regression does not guarantee a low classification risk.
  - Use the data to estimate the conditional expectation.
  - For two-class problems …
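A minimal numpy sketch of the Fisher linear discriminant mentioned above, using the textbook direction w = Sw^{-1}(m1 - m0) with a midpoint threshold (the exact estimator on the slides may differ):

    import numpy as np

    def fisher_discriminant(X0, X1):
        """Fisher direction w = Sw^{-1} (m1 - m0) with a midpoint threshold."""
        m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
        S0 = (X0 - m0).T @ (X0 - m0)
        S1 = (X1 - m1).T @ (X1 - m1)
        w = np.linalg.solve(S0 + S1, m1 - m0)      # pooled within-class scatter
        threshold = w @ (m0 + m1) / 2.0            # midpoint of projected means
        return w, threshold

    rng = np.random.default_rng(2)
    X0 = rng.normal(-1, 1, (150, 3))
    X1 = rng.normal(+1, 1, (150, 3))
    w, t = fisher_discriminant(X0, X1)
    pred = (np.vstack([X0, X1]) @ w > t).astype(int)
    truth = np.r_[np.zeros(150, int), np.ones(150, int)]
    print(round((pred == truth).mean(), 2))        # well-separated data: ~0.95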

8.3 Methods for Classification
- 8.3.1 Regression-Based Methods
  - Methods based on continuous numerical optimization can be cast in the form of multiple-response regression.
  - Most popular approach to classification.
- 8.3.2 Tree-Based Methods
  - Based on a greedy optimization strategy.
  - CART is an example.
- 8.3.3 Nearest-Neighbor and Prototype Methods
  - Local methods for classification.
  - Estimate the decision boundary locally.
  - Nearest-neighbor classification, Kohonen's learning vector quantization.

MLP Classifiers
- Use 1-of-J output encoding and sigmoid output units.
- Practical hints and implementation issues (a sketch of the first two hints follows these slides)
  - Pre-scaling of input variables
    - Scale input data to the range [-0.5, 0.5].
    - Typically pre-scaled to zero mean, unit variance.
    - Helps to avoid premature saturation and speeds up training.
  - Alternative target output values
    - Training outputs are set to the values 0.1 and 0.9.
    - Avoids long training times and extremely large weights during training.
  - Initialization
    - Network parameters are initialized to small random values.
    - Choice of the initialization range has a subtle regularization effect.
  - Stopping rules
    - During training: training should proceed as long as the decreasing continuous loss function reduces the empirical misclassification error.
    - Early stopping: used as a form of complexity control (model selection).

- Multiple local minima
  - Main factor complicating empirical risk minimization as well as model selection.
  - Use the misclassification error rather than squared-error loss during model selection.
- Learning rate and momentum term
  - Affect the local minima found by backpropagation training.
  - Optimal choice of these is problem-dependent.
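A short sketch of the first two practical hints above (zero-mean/unit-variance pre-scaling and 0.1/0.9 target values); the exact recipe on the slides may differ:

    import numpy as np

    def standardize(X):
        """Scale each input variable to zero mean, unit variance."""
        mu, sigma = X.mean(axis=0), X.std(axis=0) + 1e-12
        return (X - mu) / sigma, mu, sigma          # keep mu/sigma for test data

    def soft_targets(y):
        """Replace 0/1 targets with 0.1/0.9 to avoid driving weights to extremes."""
        return np.where(y == 1, 0.9, 0.1)

    X = np.array([[100.0, 0.001], [120.0, 0.003], [90.0, 0.002]])
    Xs, mu, sigma = standardize(X)
    print(Xs.mean(axis=0).round(6), Xs.std(axis=0).round(6))  # -> ~[0. 0.] [1. 1.]
    print(soft_targets(np.array([0, 1, 1])))                  # -> [0.1 0.9 0.9]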

RBF Classifiers
- Use multiple-output regression to build a decision boundary.
- Form of discriminant functions …

RBF… (Cont'd)
- Characteristics
  - Implements local decision boundaries.
  - For fixed values of the basis function parameters, the output weights wik are estimated via linear least squares (see the sketch below).
  - Complexity can be determined by the number of basis functions m.
  - Possible to use resampling techniques to estimate prediction risk and perform model selection.
  - Typically uses normalized basis functions.
    - Allows the classifier to be interpreted as a type of density mixture model.
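A sketch of the least-squares step noted above: with the basis-function centers and width held fixed, the output weights are obtained by linear least squares. Centers are picked by random sampling purely for brevity, and the row normalization gives the mixture-model reading from the slide:

    import numpy as np

    def rbf_design(X, centers, width):
        """Normalized Gaussian basis functions evaluated at each input."""
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        G = np.exp(-d2 / (2 * width ** 2))
        return G / G.sum(axis=1, keepdims=True)     # normalized basis functions

    rng = np.random.default_rng(3)
    X = np.vstack([rng.normal(-1, 0.5, (100, 2)), rng.normal(1, 0.5, (100, 2))])
    Y = np.r_[np.zeros(100), np.ones(100)]
    centers = X[rng.choice(len(X), size=10, replace=False)]   # fixed basis params
    G = rbf_design(X, centers, width=1.0)
    w, *_ = np.linalg.lstsq(G, Y, rcond=None)       # linear least squares for w_ik
    pred = (rbf_design(X, centers, 1.0) @ w > 0.5).astype(int)
    print(round((pred == Y).mean(), 2))             # well-separated toy data: ~1.0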

Tree-Based Methods
- Adaptively split the input space into disjoint regions in order to construct a decision boundary.
  - Based on a greedy optimization procedure.
  - The splitting process can be represented as a binary tree.
- Following the growth of the tree, pruning occurs as a form of model selection.
  - The growing-and-pruning strategy provides better classification accuracy than just growing alone.

Tree-Based Methods
- The pruning criterion provides an estimate of the prediction risk, while the growing criterion roughly reflects empirical risk.
- The resulting classifier has a binary tree representation.

CART
- Popular approach to constructing a binary-tree-based classifier.
- Greedy search employs a recursive partitioning strategy.
- Cost function
  - Measures node impurity.
  - Gives a measurement of how homogeneous a node t is with respect to the class labels of the training data in the region of node t.

CART (Cont'd)
- Criteria to be met
  - Q is at its maximum only for probabilities (1/J, …, 1/J).
  - Q is at its minimum only for probabilities (1, 0, …, 0), (0, 1, 0, …, 0), …, (0, 0, …, 0, 1).
  - Q is a symmetric function of its arguments.

    Q(t) = Q( p(1 | t), p(2 | t), …, p(J | t) ),   p(j | t) = nj(t) / n(t),   p(t) = n(t) / n

CART (Cont'd)
- Examples (sketched below)
  - Gini and entropy functions are used in practical implementations.
  - Gini and entropy functions do not measure the classification risk directly.
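A sketch of the two impurity functions named above; both satisfy the three criteria of the previous slide (maximum at the uniform distribution, minimum at point masses, symmetry):

    import numpy as np

    def gini(p):
        """Gini impurity Q(t) = 1 - sum_j p(j|t)^2."""
        p = np.asarray(p, float)
        return 1.0 - np.sum(p ** 2)

    def entropy(p):
        """Entropy impurity Q(t) = -sum_j p(j|t) log p(j|t)  (0 log 0 := 0)."""
        p = np.asarray(p, float)
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    uniform = [1/3, 1/3, 1/3]     # maximally impure node (J = 3)
    pure    = [1.0, 0.0, 0.0]     # pure node
    print(gini(uniform), gini(pure))        # -> 0.666..., 0.0
    print(entropy(uniform), entropy(pure))  # -> 1.098..., 0.0 (log 3 in nats)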

CART (Cont'd)
- Two difficulties when using the empirical misclassification cost
  - There are cases where the misclassification cost does not decrease for any candidate split, leading to early halting in a poor local minimum.
  - The misclassification cost does not favor splits that tend to provide a lower misclassification cost in future splits.

CART (Cont'd)
- Decrease in impurity
  - The variable xk and the split point v are selected to maximize the decrease in node impurity (a sketch of this search follows these slides).

CART (Cont'd)
- Pruning
  - Performed after growing is completed.
  - Implements model selection.
  - Based on minimizing penalized empirical risk.
  - Performed in a greedy search strategy.
  - Every pair of sibling leaf nodes is recombined in order to find a pair that reduces the following:

      Rpen = Remp + λ |T|

    where Remp is the misclassification rate for the training data and |T| is the number of terminal nodes.
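A sketch of the split search described two slides back: for a single candidate variable, scan the knot points v and keep the split maximizing the decrease in impurity ΔQ = Q(t) - pL Q(tL) - pR Q(tR), here with the Gini function; names are illustrative:

    import numpy as np

    def gini_from_labels(y):
        _, counts = np.unique(y, return_counts=True)
        p = counts / len(y)
        return 1.0 - np.sum(p ** 2)

    def best_split(x, y):
        """Exhaustive search over split points v on one variable x_k."""
        parent = gini_from_labels(y)
        best_v, best_gain = None, 0.0
        for v in np.unique(x)[:-1]:                 # candidate knot points
            left, right = y[x <= v], y[x > v]
            p_left = len(left) / len(y)
            gain = parent - (p_left * gini_from_labels(left)
                             + (1 - p_left) * gini_from_labels(right))
            if gain > best_gain:
                best_v, best_gain = v, gain
        return best_v, best_gain

    x = np.array([0.1, 0.4, 0.35, 0.8, 0.9, 0.75])
    y = np.array([0, 0, 0, 1, 1, 1])
    print(best_split(x, y))   # -> (0.4, 0.5): a clean split, impurity 0.5 -> 0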

CART (Cont'd)
- Procedure
  - Initialization
    - The root node consists of the whole input space.
    - Estimate the proportion of the classes via p(j | t = 0) = nj(0) / n.
  - Tree growing
    - Repeat until the stopping criterion is satisfied:
      - Perform an exhaustive search over all valid nodes in the tree, all split variables, and all valid knot points.
      - Incorporate into the tree the daughters that result in the largest decrease in impurity, using the Gini or entropy cost function.

CART (Cont'd)
- Tree Pruning
  - Repeat until no more pruning occurs:
    - Perform an exhaustive search over all sibling leaf nodes in the tree, measuring the change in the model selection criterion.
    - Delete the pair that leads to the largest decrease of the model selection criterion.
- Disadvantages
  - Sensitive to coordinate rotations
    - Because CART partitions the space into axis-oriented subregions.
    - Can be alleviated by splitting on linear combinations of input variables.

Nearest-Neighbor and Prototype Methods
- Goal of local methods for classification
  - Construction of local decision boundaries.
- In the SLT view, local methods for classification follow the framework of local risk minimization.
- In classical decision theory, local methods are interpreted as local posterior density estimation.

Nearest-Neighbor Classification (a minimal sketch follows)
- Classifies an object based on the class of the k data points nearest to the estimation point x0.
- Nearness is most commonly measured using the Euclidean distance metric in x-space.
- The local decision rule is constructed using the procedure of local risk minimization.
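A minimal sketch of the k-nearest-neighbor rule just described: Euclidean distance in x-space and a majority vote among the k nearest training points:

    import numpy as np

    def knn_classify(X_train, y_train, x0, k=3):
        """Majority vote of the k training points nearest to x0 (Euclidean)."""
        dist = np.linalg.norm(X_train - x0, axis=1)
        nearest = np.argsort(dist)[:k]
        votes = y_train[nearest]
        return np.bincount(votes).argmax()

    X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.2], [1.1, 0.8]])
    y_train = np.array([0, 0, 1, 1, 1])
    print(knn_classify(X_train, y_train, np.array([0.1, 0.0])))  # -> 0
    print(knn_classify(X_train, y_train, np.array([1.0, 1.1])))  # -> 1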

Nearest… (Cont'd)
- Example: two-class problem …
- Empirical risk …

Nearest… (Cont'd)
- Reasons for the success
  - Practical problems often have a low intrinsic dimensionality even though they may have many input variables.
  - The effect of the curse of dimensionality is not as severe, due to the nature of the classification problem.
  - Accurate estimates of conditional probabilities are not necessary for accurate classification.

Learning Vector Quantization (LVQ)
- Outline
  - Use vector quantization methods to determine initial locations of m prototype vectors.
  - Assign class labels to these prototypes.
  - Adjust the locations using a heuristic strategy.
    - Reduce the empirical misclassification risk.
  - LVQ1, LVQ2, LVQ3 by Kohonen.

LVQ (Cont'd)
- LVQ1 (a sketch follows)
  - A stochastic approximation method.
  - Given (x(k), y(k)): a data point; cj(k): prototype centers; and prototype labels wj.
  - Determine the nearest prototype center to the data point.
  - Update the location of the nearest prototype:
    - If correctly classified …
    - Else …
  - k = k + 1
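A sketch of LVQ1 as outlined above. The concrete update (move the nearest prototype toward a correctly classified point, away otherwise, with a fixed learning rate) follows Kohonen's standard LVQ1 and stands in for the slide's unreproduced formulas:

    import numpy as np

    def lvq1(X, y, centers, labels, lr=0.1, epochs=20):
        """Kohonen's LVQ1: adjust prototype centers c_j with fixed labels w_j."""
        c = centers.copy()
        for _ in range(epochs):
            for x_k, y_k in zip(X, y):
                j = np.argmin(np.linalg.norm(c - x_k, axis=1))  # nearest prototype
                if labels[j] == y_k:
                    c[j] += lr * (x_k - c[j])   # correctly classified: move toward
                else:
                    c[j] -= lr * (x_k - c[j])   # misclassified: move away
        return c

    rng = np.random.default_rng(4)
    X = np.vstack([rng.normal(-1, 0.4, (50, 2)), rng.normal(1, 0.4, (50, 2))])
    y = np.r_[np.zeros(50, int), np.ones(50, int)]
    centers = X[[0, 50]].astype(float)          # one initial prototype per class
    labels = np.array([0, 1])
    c = lvq1(X, y, centers, labels)
    pred = labels[np.argmin(np.linalg.norm(X[:, None] - c[None], axis=2), axis=1)]
    print(round((pred == y).mean(), 2))         # -> close to 1.0 on this toy data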

Empirical Comparisons (1)
- Mixture of Gaussians (Ripley, 1994)

Empirical Comparisons (2)
- Linearly Separable Problem
  - Training data sets are generated according to the distribution x ∼ N(0, I), x ∈ ℜ10.
  - Ten training sets are generated, and each data set contains 200 samples.
  - Prediction risk is estimated for each individual classifier using a large test set (2000 samples).

Empirical Comparisons (3)
- Waveform Data (a generating sketch follows)
  - First used in Breiman et al. (1984).
  - 21 input variables that correspond to 21 discrete time samples taken from a randomly generated waveform.
  - The waveform is generated using a random linear combination of two out of three possible component waveforms:

      Class 1:  xij = ui h1(j) + (1 − ui) h2(j) + εij
      Class 2:  xij = ui h1(j) + (1 − ui) h3(j) + εij
      Class 3:  xij = ui h2(j) + (1 − ui) h3(j) + εij

  - Ten training sets are generated, and each data set contains 300 samples.
  - Prediction risk is estimated using a large test set (2000 samples).
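A sketch generating the waveform data described above. The triangular components h1, h2, h3 (peaks at j = 11, 15, 7) and u ∼ Uniform(0, 1), ε ∼ N(0, 1) follow the usual specification in Breiman et al. (1984); treat those details as assumptions rather than as read off the slides:

    import numpy as np

    def component_waveforms():
        """Triangular components: h1 peaks at j=11, h2 at j=15, h3 at j=7."""
        j = np.arange(1, 22)                        # 21 discrete time samples
        h1 = np.maximum(6 - np.abs(j - 11), 0)
        h2 = np.maximum(6 - np.abs(j - 15), 0)
        h3 = np.maximum(6 - np.abs(j - 7), 0)
        return h1, h2, h3

    def make_waveform_data(n, rng):
        """x_ij = u_i h_a(j) + (1 - u_i) h_b(j) + eps_ij, with (a, b) per class."""
        h1, h2, h3 = component_waveforms()
        pairs = [(h1, h2), (h1, h3), (h2, h3)]      # classes 1, 2, 3
        cls = rng.integers(0, 3, size=n)
        u = rng.uniform(0, 1, size=n)
        eps = rng.normal(0, 1, size=(n, 21))
        X = np.array([u[i] * pairs[c][0] + (1 - u[i]) * pairs[c][1]
                      for i, c in enumerate(cls)]) + eps
        return X, cls + 1

    rng = np.random.default_rng(5)
    X, y = make_waveform_data(300, rng)             # one training set of 300 samples
    print(X.shape, np.bincount(y)[1:])              # -> (300, 21) and class counts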

Summary
- Understanding classification methods requires a clear separation between the conceptual procedure, based on the SRM inductive principle, and its technical implementation.
- Conceptual procedure
  - Two necessary things
    - Minimize the empirical classification error (via nonlinear optimization).
    - Accurately estimate the future classification error (model selection).

Summary (Cont'd)
- Technical implementation
  - Complicated by the discontinuous misclassification error functional.
    - Prevents direct minimization of the empirical risk.
  - All practical methods use a suitable continuous loss function providing an approximation for the misclassification error.
  - In model selection, one should use the classification error loss.
- Accurate estimation of posterior probabilities is not necessary for accurate classification.

    "Good probability estimates are not necessary for good classification; similarly, low classification error does not imply that the corresponding class probabilities are being estimated (even remotely) accurately." - Friedman (1997)

Summary (Cont'd)
- SLT's explanation for the empirical evidence that simple methods behave well
  - Simple classification methods may not require nonlinear optimization, so the empirical classification error is minimized directly.
  - Often simple methods provide the same level of empirical misclassification error in the minimization step as more complex methods.
    - No need to use more complex methods.
  - Classification problems are inherently less sensitive than regression to optimal model selection.