THE PORTUGUESE BANK’S DIRECT MARKETING CAMPAIGN
GOAL
To predict if the client will subscribe to the bank’s term deposit through the campaign based on calls.
ROADMAP
• Understanding the data
• Exploratory Data Analysis
• Feature Selection and Engineering
• Choosing the model
• Evaluation metrics
THE FIRST LOOK
20 Features
• Number of rows = 41188
• Categorical Features = 10
• Numerical Features = 10
• Few unknown values
Class Imbalance in target
INSIGHTS FROM DATA
Preferable contact type
Frequency of subscription depends on job title, as Admin and Technical roles are stable roles.
Month seems to be an important feature, as the data is evenly distributed. One-hot encoding is highly likely to be used for it.
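As an illustration of the one-hot encoding mentioned for the month column, here is a minimal sketch using pandas. The toy rows are made up; only the column name `month` comes from the slides.

```python
import pandas as pd

# Toy data (made up); the real column would come from the campaign dataset.
df = pd.DataFrame({"month": ["may", "jun", "may", "nov"]})

# One indicator column per month value; dtype=int gives 0/1 instead of booleans.
encoded = pd.get_dummies(df, columns=["month"], prefix="month", dtype=int)
print(encoded)
```

Each distinct month becomes its own 0/1 column, so the model does not read an artificial ordering into the month labels.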
CORRELATION
Numerical versus Numerical
Categorical (housing) versus Categorical

Categorical Columns   P-Value
job                   0.0900
y                     0.0583
marital               0.0442
education             0.0118
default               0.0103
day_of_week           0.0012
poutcome              0.0000
month                 0.0000
contact               0.0000
loan                  0.0000

Numerical versus Categorical
Heat Map, Crosstab and Chi-Squared method to identify correlations between different variables
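The crosstab plus chi-squared step for a pair of categorical columns can be sketched as follows. This is an illustration, not the presenters' code: the rows are made up, and `scipy.stats.chi2_contingency` is assumed as the test implementation.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Toy data (made up for illustration): housing loan status versus contact type.
df = pd.DataFrame({
    "housing": ["yes", "no", "yes", "no", "yes", "no", "yes", "no"] * 25,
    "contact": ["cellular", "cellular", "telephone", "cellular",
                "cellular", "telephone", "cellular", "telephone"] * 25,
})

# Contingency table of counts for the two categorical columns.
table = pd.crosstab(df["housing"], df["contact"])

# Chi-squared test of independence on the contingency table.
chi2, p_value, dof, expected = chi2_contingency(table)
print(table)
print(f"chi2={chi2:.3f}, p-value={p_value:.4f}, dof={dof}")
```

A small p-value suggests the two columns are not independent; repeating the test for housing against each other categorical column would give a table of p-values like the one above.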
FEATURE SELECTION & ENGINEERING
• Based on Random Forest Estimator and Decision Tree
• Feature Importance predicted
• Top 7 common features are selected from both the methods
• Bins created on the columns age, campaign.
• Handling Outliers
• Scaled the column euribor3m using its min and max (min-max scaling)
• Label encoding applied to all the categorical variables
• Missing Values Handled
• Oversampling for the Imbalanced Class through random oversampling and SMOTE
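Several of the engineering steps above can be sketched with pandas and scikit-learn. Everything specific here is an assumption for illustration: the toy rows, the bin edges, and the choice of `sklearn.utils.resample` for random oversampling. Outlier and missing-value handling are not shown, and SMOTE (available as `imblearn.over_sampling.SMOTE` in the separate imbalanced-learn package) is left out to keep the sketch dependency-light.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.utils import resample

# Toy frame; values are made up for illustration.
df = pd.DataFrame({
    "age": [23, 35, 47, 59, 31, 64, 28, 52],
    "campaign": [1, 2, 1, 6, 3, 1, 12, 2],
    "euribor3m": [4.857, 1.313, 4.962, 0.714, 4.120, 1.250, 4.860, 0.880],
    "job": ["admin.", "technician", "services", "admin.",
            "retired", "technician", "admin.", "services"],
    "y": ["no", "no", "no", "yes", "no", "no", "no", "yes"],
})

# Bin age and campaign into ordinal buckets (hypothetical bin edges).
df["age_bin"] = pd.cut(df["age"], bins=[0, 30, 45, 60, 120], labels=False)
df["campaign_bin"] = pd.cut(df["campaign"], bins=[0, 1, 3, 100], labels=False)

# Min-max scale euribor3m into the range [0, 1].
df["euribor3m"] = MinMaxScaler().fit_transform(df[["euribor3m"]]).ravel()

# Label-encode categorical columns as integers ("no" -> 0, "yes" -> 1 for y).
for col in ["job", "y"]:
    df[col] = LabelEncoder().fit_transform(df[col])

# Random oversampling: draw minority rows with replacement until the
# minority class is as large as the majority class.
majority = df[df["y"] == 0]
minority = df[df["y"] == 1]
minority_up = resample(minority, replace=True,
                       n_samples=len(majority), random_state=0)
balanced = pd.concat([majority, minority_up], ignore_index=True)
print(balanced["y"].value_counts())
```

In practice, oversampling would be applied to the training portion only, so that duplicated minority rows do not leak into the test set.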
• age
• euribor3m
• job
• campaign
• education
• day_of_week
• marital, housing
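One possible reading of "top common features from both methods" is to rank features by importance under each estimator and keep those that appear in both top-k lists. The sketch below follows that reading on synthetic data; the feature names `f0`..`f11`, the helper `top_k`, and the hyperparameters are placeholders, not taken from the slides.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded campaign data (placeholder feature names).
X_arr, y = make_classification(n_samples=500, n_features=12,
                               n_informative=5, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(12)])

def top_k(model, X, y, k):
    """Fit the model and return the names of its k most important features."""
    model.fit(X, y)
    importances = pd.Series(model.feature_importances_, index=X.columns)
    return set(importances.nlargest(k).index)

k = 7
rf_top = top_k(RandomForestClassifier(n_estimators=100, random_state=0), X, y, k)
dt_top = top_k(DecisionTreeClassifier(random_state=0), X, y, k)

# Keep only the features that both estimators rank in their top k.
common = sorted(rf_top & dt_top)
print(common)
```

Note that the intersection can contain fewer than k names when the two rankings disagree, so k may need to be raised to end up with a fixed number of shared features.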
RESULTS
Data split (80–20) randomly to train and test the algorithms.
Adding the column ‘Duration’ to the model increases the efficiency by 12%, but the column is not used to predict the subscribers, as duration is not known before the call is performed.
FEATURES: 07

ALGORITHM             RECALL   PRECISION   AUC ROC
LOGISTIC REGRESSION   70%      23.8%       70.9%
RANDOM FORESTS        35.2%    28.8%       62.1%
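For reference, the three metrics in the table can be computed with scikit-learn as sketched below. The data is synthetic, so the printed numbers will not match the table. The stratified split and `class_weight="balanced"` are assumptions made for this illustration; the slides describe a random split and oversampling instead.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for the 7 selected features.
X, y = make_classification(n_samples=2000, n_features=7, n_informative=4,
                           weights=[0.89, 0.11], random_state=0)

# 80-20 split; stratify keeps the class ratio the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# class_weight="balanced" is one way to counter the class imbalance.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)               # hard labels for recall/precision
y_score = model.predict_proba(X_test)[:, 1]  # positive-class probability for AUC

recall = recall_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_score)
print(f"recall={recall:.3f}  precision={precision:.3f}  AUC ROC={auc:.3f}")
```

Recall is the share of actual subscribers the model finds, precision is the share of predicted subscribers who actually subscribe, and AUC ROC summarizes ranking quality across all thresholds.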
ANY QUESTIONS?