Perceptron, Kern-Trick und Support Vector Machine · PD Dr. Martin Stetter, Siemens AG 1...

PD Dr. Martin Stetter, Siemens AG 1

Perceptron, Kern-Trick und Support Vector Machine

• Das Perceptron (Wiederholung)

• Kernel-Klassifikation

• Large-Margin Klassifikatoren

• Support Vector Machine

Klassifikation: Perceptron

Wiederholung: Der Kernel-Trick

Klassifikation: Kernel-Trick

Kernel-Klassifikation: Ein illustrierendes Beispiel

• Projektion des X-OR Problems in einen 3-D

Eigenschafts-Raum („feature space“).

Def: )2,,(),,(),(: 2122

2132121 xxxxzzzxx ==→φφ x

• Erfolgreiche lineare Klassifikation im

transformierten Raum: z.B.

)( by T +Θ= zv21),2,1,1( −=−= bv

• Beob. 1: Separierende Hyperebene in 3-D

entspricht nichtlinearer Klassifikation (hier:

separierender Ellipse) in 2-D. Denn:

Vxxxxzv TTT

vvxxvxvxv =⎟⎟⎠

⎞⎜⎜⎝

⎛=++=

orthogonale Eigenvektoren⇒= VVT 2,1, == iiii uVu λ

∑∑∑ ===⇒=i iij

Tiji ji

i ii const 2

,αλααα VuuVxxux

11 λ 21 λ

Ellipsengleichung mit Hauptachsen.2 consti ii =∑ αλ 21, uu

X-OR Problem in 3-D

PD Dr. Martin Stetter, Siemens AG 3Klassifikation: Kernel-Trick

• Beob. 3: Die Klassifikation läßt sich allein durch Berechnung des Kernels durchführen

)),(( xwky Θ=

Kernel Trick:

Aus jedem Algorithmus, der durch Skalarprodukte formuliert ist, läßt sich das SP durch einen

positiv definiten Kernel ersetzen und so ein alternativer Algorithmus formulieren.

Kernel-Klassifikation:

• Definiere einen Kernel

• Formuliere das Klassifikationsproblem unter Verwendung des Kernels:

),( xx ′k

))((,21),1,1(),(: 2 byb T +Θ=−=−== xwwwv φ

-- Hohe VC-Dim (Modellmächtigkeit) durch hohe Dimension von

-- Trotzdem effiziente Berechnung durch Kernel

-- Für das X-OR Problem (s.o.):

)(xφ),()()( xxxx ′=′ kTφφ

• Beob 2: Skalarprodukt im Feature-Raum läßt sich einfach im Originalraum (2-D)

berechnen. Betrachte dazu die Abb: )(),( xzwv φφ ==

heißt Kernel (Kern). heißt Kernel-Abbildung),( xx ′k )(xφ),()()(2)()( 22

2211221122

21 xwxwxwzv kxwxwxwxwxwxw TTT ≡=+=++==⇒ φφ

Im Fall des obigen Beispiels :

„Large-Margin“-Klassifikatoren

Klassifikation: Large Margin

Betrachte wieder linear separable Probleme

ObdA: Gehe über zu }1,1{}1,0{ −∈→∈ yy

Separierende „Hyperebene“

1)( −=my

1ˆ +=y

0=+⋅ bxw

1ˆ −=yw

• Beobachtung: Es existieren viele Lösungen (w,b) mit

Mmby mTm ...,1,0)( )()( =∀>+⋅xw

),sgn( )()( by mTm +⋅= xw also

• Idee: Wähle diejenige Lösung mit maximalem

Abstand zu den Datenpunkten: „Large Margin“

Warum?: Eindeutige Lösung, robusteste Lösung, kleinste VC Dimension (Generalisierung)

effizient lösbar (quadratisches Programm), führt zu Support Vector Machine

• Kanonische Form des Klassifikators

bb ′=′=µµ1

: wwalso mit

Linear separables Problem: Lösg (w‘,b‘) Mmby mTm ...,1,0,0)( )()( =∀>⇒≥>′+⋅′ µxw∃

Mmby mTm ...,11)( )()( =∀≥+⋅⇒ xw„=1“

=> Margin

• Margin eines Hyperebenen-Klassifikators:

}0||{||minmin)(mg )( =+−= bTm

mxwxxw

1)(min

minmin)(mg

=+=−=

In der kanonischen Form gilt:

• Large Margin:

-- Minimiere ||w||2

-- aber behalte kanonische Form bei: Also2

1minarg www b

LM = unter den Randbedingungen

Mmby mTm ...,1,0)1)(( )()( =∀≤−+⋅−⇒ xw

=> Optimierungsproblem mit

Rdbed: Lagrange-Fkt

Def. Margin:

)(min )(m

)(ˆ )1()1( xxw −= Td

Intuition: Large-Margin Ebene durch die

nächstliegenden Datenpunkte bestimmt

=−+⋅−= ∑=

1),,( )()(

)(2bybL mTm

m xwwáw α Sattelpunkt 0)( ≥mα

• Lösung des Optimierungsproblems bedingt:

-- KKT Bedingung: Mmby mTmm ,...,1,0)1)(( )()()( ==−+xwα=> Entweder liegt am margin: 0,1)( )()()( >⇒=+ mmTm by αxw)(mx erlaubt

=> Oder 0)( =mα => Datenpunkt trägt nichts zum Parametervektor w bei

= Support-Vektor)(mx

Support-Vektor-Machine

Support-Vektoren

)(0),,( mmM

m ybL xwáww ∑

=⇒=∂∂ α => w = Linearkomb. der Trainingsdaten--

-- 00),,( )(

)( =⇒=∂∂ ∑

m ybLb

-- Support-Vektoren liegen auf beiden Seiten der Ebene

-- Gesuchter Parametervektor ist eine

Linearkombination von Supportvektoren!

ist vollständig durch die Support-Vektoren bestimmt

• Duales Problem:

-- Eliminiere w und b durch Gleichungen für α )()(

)( mmM

m y xw ∑=

= α 0)(

)( =∑=

nTmnmnm yy1,

)(),()()()()(2

1xxw αα

∑∑∑∑====

+−−=−+−M

mTmmmTmM

m ybyby1

)()()()()(

)( )1)(( αααα

nTmnmnm yy1,

)(),()()()()( xxαα 0=

Duales Problemmin

)()()()()()()( =−=⇒ ∑∑ nm

nTmnmnm

m yyW xxá ααα

Mit RB: 0,,...,1,0 )()()( ==≥ ∑ m

mm yMm αα

-- Löse „duales“ Optimierungsproblem bezüglich α : Quadratisches Programm

-- Die Entscheidungsfunktion wird zu

)sgn()(ˆ)sgn()(ˆ1

)()()( byxybyM

TmmmT +=→+⋅= ∑ =xxxwx α

• Support-Vector-Machine und Kernel-Trick :

-- Beobachtung: Das duale Problem läßt sich rein durch Skalarprodukte formulieren

-- Kerneltrick anwendbar!!

• Behandlung verrauschter Probleme: Soft-Margin Klassifikatoren

-- Erlaube gelegentliche Verletzung des Margin

-- Lerne „Lockerungsvariablen“ mit0)( ≥mξmin,1)(

)()()()( =−≥+ ∑ =

mmmTm by ξξ aberxw

Klassifikation: Support-Vector Machine

-- Die Entscheidungsfunktion wird zu

Duales Problemmin),(

)()()()()()()( =−= ∑∑ nm

nmnmnm

m kyyW xxá ααα

Mit RB: 0,,...,1,0 )()()( ==≥ ∑ m

mm yMm αα

)),(sgn()(ˆ1

)()()( bkyxyM

mmm += ∑ =xxα

-- Bei nicht-separablen Problemen, löse:

single voxeltime series

Intensity BOLD signal

Beispiel: Klassifikation von Denkzuständen aus fMRI Bildern

Each fMRI volume is considered a feature vector:

• extremely high dimensional (196994 voxels after the mask),

• noise and sparse data

( ) { }nttxt vii ,...,1),(, ==X i=1,2 (class label: task or control)v=1,…, number of voxels

fMRI Brain volumes and feature vectors

Experiment Design I: Face Matching

Number of Scans(volumes):

Instruction for Control: Press button when image appears (images are abstract)Instruction for Face task : Press button when faces are identical No button press required when faces are different

Task: Face matchingControl

2 7 2 2 2 2 27 7 7 7 7 92

Number of Scans(volumes):

2 7 2 2 2 2 27 7 7 7 7 92

Instruction

Instruction for Control: Press button when image appears (images are abstract)Instruction for Location task: Press button when location of abstract images is same No button press required for different location

Task: Location matchingControlInstruction

Experiment Design I I: Location Matching

Machine Learning Algorithm

.fMRI volumes

from condition1

fMRI volumesfrom condition2

fMRI vectors (volumes)from a new subject

Machine Learning Algorithm

condition 1 or condition 2

Training

Volume withthe most

discriminativebrain regions

Training classifiers to predict the subject’s cognitive state, given their fMRI activity at a single timeinstant.

f: single fMRI volume -> Cognitive State (condition 1 vs. condition 2)

Objective

A brain with only two voxels

( )11 tX ( )31 tX( )22 tX ( )42 tX

voxel 1

voxel 2

( )11 tX

( )31 tX( )22 tX

( )42 tX

Support vectors

classification threshold

margin

classification vector w

A brain with only two voxels: Projection onto classification vector

( )11 tX ( )31 tX( )22 tX ( )42 tX

voxel 2

voxel 1

Projection onto classification vector:

Results: Healthy Subjects

Face task x Control task

Leave-one-out test: Healthy SubjectsFace task x Control task

leave the first subject out

leave the second subject out

leave the third subject out

leave the fourth subject out

leave the fifth subject out

Difference Vector

Face task x Control task

GFi: inferior frontal gyrus BA 47

FG: fusiform gyrus GTs: superior temporal gyrus BA 22

Lpi: Inferior parietal lobule BA 40

GFM: Middle frontal gyrus BA 9

Vector W

Face task x Control taskfusiform gyrus

GTm: middle temporal gyrus BA 22

Cuneus - BA 18

GFM: Middle frontal gyrus BA 9

Error rate

0,300,32Face task X Location task

0,230,27Location task X Control task

0,180,16Face task X Control task

PCA & SVMPCA & FLD

Results: Leave one-out-test

Perceptron, Kern-Trick und Support Vector Machine · PD Dr. Martin Stetter, Siemens AG 1...

Documents

Transcript of Perceptron, Kern-Trick und Support Vector Machine · PD Dr. Martin Stetter, Siemens AG 1...

Winnow vs perceptron

Perceptron (actual lecture) · 2020. 4. 21. · Perceptron (actual lecture) Tuesday, April 21, 2020 13:15 Perceptron Page 1 . Perceptron Page 2 . Perceptron Page 3 . Perceptron Page

Perceptron - SNU · •Hypothesis space of a perceptron (representation power of a perceptron) •The case where the threshold perceptron return 1 • defines a hyperplane in the

The Perceptron - dbs.ifi.lmu.de · A linear classi er can be realized by a Perceptron, a single formalized neuron! 11. The Perceptron: A Learning Machine The Perceptron was the rst

Perceptron Learning Rules

Perceptron (neural network)

ANN 3- Perceptron

Multi Layer Perceptron

Perceptron C++

Winow vs Perceptron

Perceptron Classiﬁcation lineaire)´

Multilayer Perceptron perceptron.pdf · Multilayer Perceptron ... input x belongs to C 1. Perceptron is cosmetically similar to logistic ... Learning Boolean XOR A simple perceptron

Trick Best Trick

Multiple Layer Perceptron ...

September 23, 2010Neural Networks Lecture 6: Perceptron Learning 1 Refresher: Perceptron Training Algorithm Algorithm Perceptron; Start with a randomly.

Perceptron Learning

Jaringan Perceptron

Linear Classification: The Perceptron - Penn Engineeringcis519/fall2017/...Improving the Perceptron • The Perceptron produces many θ‘s during training • The standard Perceptron

IA Perceptron

The Kernel Trick - Linguistics · The Kernel Trick There are a lot of ... many applications for which one would want to use the kernel trick with an ordinary perceptron rather than