MLP and RBFN as Classifiers


  • 8/13/2019 MLP and RBFN as Classifiers

    1/28

    MLP and RBFN as Classifiers

    Dr. Rajesh B. Ghongade, Professor, Vishwakarma Institute of Information Technology, Pune 411048

  • 2/28

    Agenda

    Pattern Classifiers
    Classifier performance metrics
    Case Study: Two-Class QRS classification with MLP
    Data Pre-processing
    MLP Classifier MATLAB DEMO
    Case Study: Two-Class QRS classification with RBFN
    RBFN Classifier MATLAB DEMO

  • 3/28

    Pattern Classifiers

    Pattern recognition is ultimately used for classification of a pattern.

    Identify the relevant features of the pattern from the original information, and then use a feature extractor to measure them.

    These measurements are then passed to a classifier, which performs the actual classification, i.e., determines to which of the existing classes the pattern belongs.

    Here we assume the existence of natural grouping, i.e., we have some a priori knowledge about the classes and the data.

    For example, we may know the exact or approximate number of classes and the correct classification of some given patterns, which are called the training patterns.

    This type of information, together with the type of the features, may suggest which classifier to apply for a given application.

  • 4/28

    Parametric and Nonparametric Classifiers

    A classifier is called a parametric classifier if the discriminant functions have a well-defined mathematical functional form (e.g., Gaussian) that depends on a set of parameters (mean and variance).

    In nonparametric classifiers, there is no assumed functional form for the discriminants. Nonparametric classification is driven solely by the data. These methods require a great deal of data for acceptable performance, but they are free from assumptions about the shape of the discriminant functions (or data distributions) that may be erroneous.

  • 5/28

    Minimum Distance Classifiers

    If the training patterns seem to form clusters, we often use classifiers which use distance functions for classification.

    If each class is represented by a single prototype called the cluster center, we can use a minimum distance classifier to classify a new pattern.

    A similar modified classifier is used if every class consists of several clusters.

    The nearest neighbor classifier classifies a new pattern by measuring its distances from the training patterns and choosing the class to which the nearest neighbor belongs.

    Sometimes the a priori information is the exact or approximate number of classes c. Each training pattern is in one of these classes, but its specific classification is not known. In this case we use algorithms that determine the cluster (class) centers iteratively by minimizing some performance index; a new pattern is then classified using a minimum distance classifier.

  • 6/28

    Statistical Classifiers

    Many times the training patterns of various classes overlap, for example when they originate from some statistical distributions.

    In this case a statistical approach is appropriate, particularly when the various distribution functions of the classes are known.

    A statistical classifier must also evaluate the risk associated with every classification, which measures the probability of misclassification.

    The Bayes classifier, based on Bayes' formula from probability theory, minimizes the total expected risk.

  • 7/28

    Fuzzy Classifiers

    Quite often classification is performed with some degree of uncertainty.

    Either the classification outcome itself may be in doubt, or the classified pattern x may belong in some degree to more than one class.

    We thus naturally introduce fuzzy classification, where a pattern is a member of every class with some grade of membership between 0 and 1.

    For such a situation the crisp k-Means algorithm is generalized and replaced by the Fuzzy k-Means; after the cluster centers are determined, each incoming pattern is given a final set of grades of membership which determine the degrees of its classification in the various clusters.

    These are nonparametric classifiers.

  • 8/28

    Artificial Neural Networks

    The neural net approach assumes, as the other approaches before, that a set of training patterns and their correct classifications is given.

    The architecture of the net, which includes an input layer, an output layer and hidden layers, may be very complex.

    It is characterized by a set of weights and an activation function, which determine how any information (input signal) is transmitted to the output layer.

    The neural net is trained on the training patterns and adjusts the weights until the correct classifications are obtained.

    It is then used to classify arbitrary unknown patterns.

    There are several popular neural net classifiers, like the multilayered perceptron (MLP), radial basis function neural nets (RBFN), self-organizing feature maps (SOFM) and the support vector machine (SVM).

    These belong to the semi-parametric classifier type.

  • 9/28

    PatternClassifiers

  • 10/28

    Classifier Performance Metrics: The Confusion Matrix

    The confusion matrix is a table where the true classification is compared with the output of the classifier.

    Let us assume that the true classification is the row and the classifier output is the column.

    The classification of each sample (specified by a column) is added to the row of the true classification.

    A perfect classification provides a confusion matrix that has only the diagonal populated; all the other entries are zero.

    The classification error is the sum of off-diagonal entries divided by the total number of samples.

    TP: True positive
    TN: True negative
    FP: False positive
    FN: False negative
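The deck's demos are in MATLAB; purely as an illustration, the row/column bookkeeping described above can be sketched in Python (function names are ours, not from the deck):

```python
def confusion_matrix(true_labels, predicted, n_classes):
    """Rows = true class, columns = classifier output, as on the slide."""
    M = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted):
        M[t][p] += 1
    return M

def classification_error(M):
    """Sum of off-diagonal entries divided by the total number of samples."""
    n = len(M)
    total = sum(sum(row) for row in M)
    off_diag = sum(M[i][j] for i in range(n) for j in range(n) if i != j)
    return off_diag / total

# toy example: 6 beats, two classes (0 = normal, 1 = abnormal)
M = confusion_matrix([0, 0, 1, 1, 1, 0], [0, 1, 1, 1, 0, 0], 2)
err = classification_error(M)
```

A perfect classifier would leave only `M[0][0]` and `M[1][1]` nonzero, giving an error of 0.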

  • 11/28

    Classifier Performance Metrics

    1. Sensitivity (Se) is the fraction of abnormal ECG beats that are correctly detected among all the abnormal ECG beats.

    2. Specificity (SPE) is the fraction of normal ECG beats that are correctly classified among all the normal ECG beats.

    3. Positive predictivity is the fraction of real abnormal ECG beats among all detected beats.

    4. False Positive Rate is the fraction of all normal ECG beats that are not rejected.
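Taking abnormal as the positive class, the four rates above (plus the classification rate defined on the next slide) follow directly from the confusion-matrix counts. A hedged Python sketch, using the standard formulas that match the slide's wording:

```python
def binary_metrics(tp, tn, fp, fn):
    """Two-class rates; abnormal beats are the positive class."""
    se  = tp / (tp + fn)                  # sensitivity: detected abnormal / all abnormal
    spe = tn / (tn + fp)                  # specificity: correct normal / all normal
    pp  = tp / (tp + fp)                  # positive predictivity: real abnormal / all detected
    fpr = fp / (fp + tn)                  # false positive rate: normal beats not rejected
    cr  = (tp + tn) / (tp + tn + fp + fn) # classification rate: all correct / all beats
    return se, spe, pp, fpr, cr

# hypothetical counts for illustration
se, spe, pp, fpr, cr = binary_metrics(tp=90, tn=85, fp=15, fn=10)
```

Note that FPR = 1 - SPE, so a classifier tuned for high specificity automatically has a low false positive rate.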

  • 12/28

    Classifier Performance Metrics (continued)

    5. Classification rate (CR) is the fraction of all correctly classified ECG beats, normal or abnormal, among all the ECG beats.

    6. Mean squared error (MSE) is a measure used only as the stopping criterion while training the ANN.

    7. Percentage average accuracy is the total accuracy of the classifier.

    8. Training time is the CPU time required for training an ANN, described in terms of time per epoch per total exemplars, in seconds.

    9. Preprocessing time is the CPU time required for generating the transform part of the feature vector, in seconds.

    10. Resources consumed for the ANN topology is the sum of weights and biases for the first layer and the second layer, also called the adjustable or free parameters of the network.

  • 13/28

    Receiver Operating Characteristics (ROC)

    The receiver operating characteristic is a metric used to check the quality of classifiers.

    For each class of a classifier, ROC applies threshold values across the interval [0,1] to the outputs.

    For each threshold, two values are calculated: the True Positive Ratio (the number of one-target outputs greater than or equal to the threshold, divided by the number of one targets), and the False Positive Ratio (the number of zero-target outputs greater than or equal to the threshold, divided by the number of zero targets).

    ROC gives us insight into the classifier performance, especially in the high-sensitivity and high-selectivity region.
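The threshold sweep described above can be sketched as follows (Python for illustration, not the deck's MATLAB demo; the trapezoidal AUC estimate is our addition):

```python
def roc_points(outputs, targets, n_thresholds=101):
    """Sweep thresholds across [0, 1]; at each threshold compute the
    False Positive Ratio and True Positive Ratio over the zero targets
    and one targets respectively, as on the slide."""
    ones  = [o for o, t in zip(outputs, targets) if t == 1]
    zeros = [o for o, t in zip(outputs, targets) if t == 0]
    pts = []
    for i in range(n_thresholds):
        th = i / (n_thresholds - 1)
        tpr = sum(o >= th for o in ones) / len(ones)
        fpr = sum(o >= th for o in zeros) / len(zeros)
        pts.append((fpr, tpr))
    return pts

def auc(points):
    """Trapezoidal area under the ROC curve; 1.0 for an ideal classifier."""
    pts = sorted(points)
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

# toy outputs that separate the classes perfectly
pts = roc_points([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0])
```

For this perfectly separated toy example the curve passes through (0, 1) and the area under it is 1, matching the ideal-classifier statement on the next slide.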

  • 14/28

    A typical ROC

    For an ideal classifier, the area under the ROC curve = 1.

  • 15/28

    Case Study: Two-Class QRS classification with MLP

    Problem Statement: Design a system to correctly classify extracted QRS complexes into TWO classes, NORMAL and ABNORMAL, using MLP.

    [Figure: example ABNORMAL and NORMAL QRS complexes]

  • 16/28

    Data Pre-processing

    Mean-adjust the data to remove the DC component.

    We can use all 180 points for training, but it poses a computational overhead.

    Using features for training minimizes the response of the ANN to noise present in the signal.

    Training time of the ANN reduces.

    Overall accuracy of the ANN improves.

    Generalization of the network improves.

    Feature extraction in the transform domain can use the Discrete Cosine Transform (DCT).

    One-hot encoding technique is used for the target.

  • 17/28

    Selection of significant components

    Metrics for component selection:

    Components that retain 99.5% of the signal energy

    Percent Root Mean Difference (PRD)
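A minimal Python sketch of this selection step, under two assumptions: an orthonormal DCT-II (so coefficient energy equals signal energy), and the usual PRD formula, PRD = 100·sqrt(Σ(x − x̂)² / Σx²). The toy beat below is ours; the deck's real data are 180-point QRS complexes.

```python
import math

def dct(x):
    """Orthonormal DCT-II: Parseval holds, so coefficient energy = signal energy."""
    N = len(x)
    out = []
    for k in range(N):
        s = sum(x[n] * math.cos(math.pi * k * (2*n + 1) / (2*N)) for n in range(N))
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(scale * s)
    return out

def idct(X):
    """Inverse of the orthonormal DCT-II above (a DCT-III)."""
    N = len(X)
    return [sum(X[k] * (math.sqrt(1.0/N) if k == 0 else math.sqrt(2.0/N))
                * math.cos(math.pi * k * (2*n + 1) / (2*N)) for k in range(N))
            for n in range(N)]

def prd(x, x_hat):
    """Percent root-mean-square difference between signal and reconstruction."""
    num = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    den = sum(a * a for a in x)
    return 100.0 * math.sqrt(num / den)

# toy 180-point "beat": mean-adjust, transform, keep the first K coefficients
beat = [math.sin(2*math.pi*n/180) + 0.3*math.sin(6*math.pi*n/180) for n in range(180)]
beat = [v - sum(beat)/len(beat) for v in beat]   # remove DC component
X = dct(beat)
K = 30
energy = sum(c*c for c in X)
retained = 100.0 * sum(c*c for c in X[:K]) / energy  # % energy in first K coeffs
x_hat = idct(X[:K] + [0.0] * (len(X) - K))           # reconstruction from K coeffs
```

On the deck's actual beats this procedure gives the figures on the next slides: 30 DCT components retain 99.86% of the signal energy at PRD = 0.5343%.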

  • 18/28

    Discrete Cosine Transform

    Thirty DCT components contribute 99.86% of the signal energy.

    PRD (30 components) = 0.5343%

  • 19/28

    Discrete Cosine Transform: selection of significant coefficients

    Effect of truncating DCT coefficients:

    Sr.  Transform  Coefficients  PRD (%)  Signal Energy (%)
    1    DCT        5            9.8228   35.81
    2    DCT       15            5.4096   80.53
    3    DCT       30            0.5343   99.86
    4    DCT       40            0.3134   99.93

  • 20/28

    Dataset Creation

    It is always desirable that we have an equal number of exemplars from each class for the training dataset.

    This prevents favoring any class during training.

    If the numbers of exemplars are unequal, we have to deskew the classifier decision.

    Deskewing simply scales the output according to the probability of the input classes.

    Data randomization before training is a must; otherwise repetitive training by the same class may not allow the network to converge. Remember that the error gradient is important for achieving the global minimum, if it exists!
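The balance-then-randomize advice above can be sketched as follows (Python for illustration; the function name and fixed seed are ours):

```python
import random

def make_balanced_shuffled(normal, abnormal, seed=7):
    """Take an equal number of exemplars from each class, attach labels
    (0 = normal, 1 = abnormal), and shuffle so training never sees long
    runs of a single class."""
    n = min(len(normal), len(abnormal))          # equal exemplars per class
    data = [(x, 0) for x in normal[:n]] + [(x, 1) for x in abnormal[:n]]
    random.Random(seed).shuffle(data)            # randomize presentation order
    return data

# toy feature vectors standing in for QRS exemplars
dataset = make_balanced_shuffled(list(range(10)), list(range(100, 108)))
```

A fixed seed keeps the shuffle reproducible between training runs, which makes comparisons across network configurations fair.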

  • 21/28

    Partition the data into THREE disjoint sets:

    Training set
    Cross-validation set
    Testing set

    Before the data is presented to the net for training we have to normalize the data into the range [-1, 1]; this helps in faster network learning.

    Amplitude and Offset are given as:

    To normalize data:

    To denormalize data:
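The slide's Amplitude and Offset formulas were figures that did not survive extraction, so the exact form is not recoverable here. A common choice, shown below as an assumption, is the linear map x_norm = A·x + O with A = 2/(max − min) and O = −(max + min)/(max − min), which sends [min, max] to [-1, 1] and is exactly invertible for denormalization:

```python
def norm_params(x):
    """Amplitude A and offset O mapping [min(x), max(x)] -> [-1, 1].
    (Assumed form; the deck's own formulas were in an image.)"""
    lo, hi = min(x), max(x)
    A = 2.0 / (hi - lo)
    O = -(hi + lo) / (hi - lo)
    return A, O

def normalize(x, A, O):
    """x_norm = A*x + O, elementwise."""
    return [A * v + O for v in x]

def denormalize(y, A, O):
    """Invert the map: x = (x_norm - O) / A."""
    return [(v - O) / A for v in y]

# example: a signal spanning [0, 10] maps onto [-1, 1]
A, O = norm_params([0.0, 5.0, 10.0])
y = normalize([0.0, 5.0, 10.0], A, O)
```

In practice A and O are computed from the training set only and reused on the validation and test sets, so no test-set information leaks into training.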

  • 22/28

    Methodology

  • 23/28

  • 24/28

    MLP Classifier MATLAB DEMO

  • 25/28

  • 26/28

  • 27/28

    RBFN Classifier MATLAB DEMO

  • 28/28

    Thank You!