S7_extraFeatureSelection
7/29/2019 S7_extraFeatureSelection
S7 Extra: Feature Selection
Shawndra Hill
Spring 2013
TR 1:30-3pm and 3-4:30
-
Feature Selection

Step 1:
  Use domain knowledge to guide you whenever possible
Step 2:
  Visualize attributes
  Remove attributes with no values or too many missing values
  Check for obvious outliers and remove them
Step 3:
  Construct new attributes (if it makes sense)
  Combine attributes
  Normalize numeric attributes (for regression, Naive Bayes, NN; see http://www.tufts.edu/~gdallal/regtrans.htm)
  Create binary attributes from nominal attributes
Step 4:
  Select the best subset of attributes for the problem

IF IN DOUBT, CHOOSE A METHOD THAT DOES THE FEATURE SELECTION FOR YOU (for example, decision trees)
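Two of the Step 3 transformations can be sketched in a few lines. This is a minimal illustration, not the course's own code: the helper names `zscore` and `one_hot` and the toy columns are hypothetical, and z-scoring is only one of several ways to normalize a numeric attribute.

```python
from statistics import mean, pstdev


def zscore(values):
    """Rescale a numeric attribute to mean 0 and standard deviation 1."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        # A constant attribute carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def one_hot(values):
    """Create one 0/1 attribute per distinct value of a nominal attribute."""
    levels = sorted(set(values))
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical toy data, not taken from the slides.
income = [30.0, 50.0, 70.0]
color = ["red", "blue", "red"]

income_z = zscore(income)
color_bin = one_hot(color)
```

Here `color_bin` holds one binary column for "blue" and one for "red", which is the form most regression and distance-based methods expect.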
-
The Basics

Basic Ideas
  Usually faced with the problem of selecting a subset of possible predictors
  Have to balance conflicting objectives
    Want to include all variables that have legitimate predictive skill
    Want to exclude all extraneous variables that fit only sample-specific noise
      Reduce predictive skill
      Increase standard errors of regression coefficients, classification, etc.
  Ideally would be able to determine a single best subset of predictors to include
    But no single definition of "best"
    Different algorithms will produce different "best" subsets
    Problems magnified by correlation among predictors
-
Feature Selection

Ranking
  By some objective (for example, information gain)
Subset
  Algorithms (see next slide)
  Wrapper (try subset within the context of the algorithm you know you are going to use)
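Ranking by information gain can be sketched as follows for nominal attributes and a nominal class label. The function names and the toy table are hypothetical, and numeric attributes would first need to be discretized, which this sketch does not attempt.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(attribute, labels):
    """Entropy of the labels minus the weighted entropy after splitting."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_by_information_gain(table, labels):
    """Return (attribute name, gain) pairs, highest gain first."""
    scores = {name: information_gain(col, labels) for name, col in table.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data, not taken from the slides.
table = {
    "outlook": ["sun", "sun", "rain", "rain"],
    "windy": ["y", "n", "y", "n"],
}
labels = ["no", "no", "yes", "yes"]

ranking = rank_by_information_gain(table, labels)
```

In this toy table "outlook" separates the classes perfectly and "windy" tells us nothing, so "outlook" ranks first. A ranking scores each attribute on its own, so it cannot see redundancy between attributes; that is what the subset methods address.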
-
Feature Selection Algorithms

All possible subsets
  Only feasible with a small number of potential predictors (maybe 10 or less)
  Then can use one or more of the possible numerical criteria to find the overall best
Forward stepwise regression
  Start with no predictors
  First include the predictor with the highest correlation with the response
  In subsequent steps add predictors with the highest partial correlation with the response, controlling for variables already in the equation
  Stop when a numerical criterion signals a maximum (minimum)
  Sometimes eliminate variables when the t value gets too small
  Only possible method for very large predictor pools
  Local optimization at each step, no guarantee of finding the overall optimum
Backward elimination
  Start with all predictors in the equation
  Remove the predictor with the smallest t value
  Continue until a numerical criterion signals a maximum (minimum)
  Often produces a different final model than the forward stepwise method
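The forward procedure can be sketched in plain Python. This is an illustration under stated assumptions, not the slides' implementation: the stopping rule (`min_partial_corr`, a hypothetical threshold on the absolute partial correlation) stands in for whatever numerical criterion is actually used, and the later "eliminate variables when the t value gets too small" step is omitted. Partial correlations are obtained by removing from the response and from every remaining candidate the part explained by the predictors already chosen.

```python
from math import sqrt


def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def remove_component(v, basis):
    """Return v minus its projection onto basis (the residual of v)."""
    coef = dot(v, basis) / dot(basis, basis)
    return [x - coef * b for x, b in zip(v, basis)]


def forward_select(predictors, response, min_partial_corr=0.3):
    """Greedy forward selection by absolute partial correlation.

    min_partial_corr is a hypothetical stopping rule chosen for this sketch.
    """
    y = center(response)
    total = dot(y, y)
    resid = {name: center(col) for name, col in predictors.items()}
    selected = []
    # Stop early once the response is (numerically) fully explained.
    while resid and dot(y, y) > 1e-12 * total:
        best_name, best_r = None, 0.0
        for name, x in resid.items():
            xx = dot(x, x)
            if xx <= 1e-12:
                continue  # candidate is collinear with chosen predictors
            r = abs(dot(x, y)) / sqrt(xx * dot(y, y))
            if r > best_r:
                best_name, best_r = name, r
        if best_name is None or best_r < min_partial_corr:
            break
        chosen = resid.pop(best_name)
        selected.append(best_name)
        # Control for the new predictor in the response and in all candidates.
        y = remove_component(y, chosen)
        resid = {n: remove_component(x, chosen) for n, x in resid.items()}
    return selected


# Hypothetical toy data: the response depends on x1 and x2 only.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
x3 = [2.0, 1.0, 1.0, 2.0, 2.0, 1.0]
y = [3 * a + 2 * b for a, b in zip(x1, x2)]

chosen = forward_select({"x1": x1, "x2": x2, "x3": x3}, y)
```

Note that x2 has a weak raw correlation with the response and only becomes the clear next choice once x1 is controlled for, which is why the later steps use partial rather than raw correlation. Each step is still a local choice, so the result is not guaranteed to be the best overall subset.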
-
Multicollinearity (regression)

The degree of correlation between Xs.
A high degree of multicollinearity produces unacceptable uncertainty (large variance) in regression coefficient estimates (i.e., large sampling variation):
  Imprecise estimates of slopes; even the signs of the coefficients may be misleading.
  t-tests that fail to reveal significant factors. The analysis of variance for the overall model may show a highly significant fit when, paradoxically, the tests for individual predictors are non-significant.
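One common way to put a number on this is the variance inflation factor (VIF), which the slide does not name and is brought in here as an assumption. For a model with exactly two predictors, the sampling variance of each slope is multiplied by 1 / (1 - r^2), where r is the correlation between the two predictors. The sketch below covers only that two-predictor case, with hypothetical toy data.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length numeric lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    ca = [x - ma for x in a]
    cb = [x - mb for x in b]
    num = sum(x * y for x, y in zip(ca, cb))
    den = sqrt(sum(x * x for x in ca) * sum(y * y for y in cb))
    return num / den


def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has exactly two predictors."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical toy data, not taken from the slides.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2_near_copy = [1.1, 1.9, 3.2, 3.8, 5.0]   # almost the same as x1
x2_unrelated = [1.0, -1.0, 0.0, -1.0, 1.0]  # uncorrelated with x1

vif_high = vif_two_predictors(x1, x2_near_copy)
vif_low = vif_two_predictors(x1, x2_unrelated)
```

With the near-copy predictor the slope variances are inflated by a factor of roughly a hundred, while the uncorrelated predictor leaves them unchanged (a factor of 1).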