Risk Based Loan Approval Framework


A typical use case in the auto lending industry: how a predictive risk model is built and deployed to assist in loan approval decisions.


RISK BASED APPROVAL FRAMEWORK: Auto Loans, Dec 2013

CONTENTS
- Business Problem
- Methodology & Process
- How Does the Model Get Deployed: A 30,000-Foot View
- Where Else Will the Lender Use the Models? Do Other Industries Use This Framework Too?
- References for Reading Materials
Intended for Knowledge Sharing only

BUSINESS PROBLEM: RISK-BASED APPROVAL/PRICING FRAMEWORK

How the business sees it:
1. What are the chances of non-repayment?*
2. If it happens, how much money will go bad?
3. How much will I ultimately recover if I repossess and sell off the vehicle?
Note: *Non-repayment is defined as a payment delayed by more than 180 days past its due date.

How statisticians see it:
[Figure: three numbered panels (1, 2, 3); content not included in the transcript.]

How analysts see it:
1. Probability of non-repayment (PD, probability of default)
2. Estimated $ amount exposed at non-repayment (EAD, exposure at default)
3. Loss post recovery (LGD, loss given default)

HOW IS IT DONE?
The first step is to convert the business problem into an analytical framework (label and inputs). This is followed by:
- Data Preparation: hypotheses (important drivers and their expected relationship with the label); missing-value and capping treatment
- Dimensionality Reduction: bivariate analysis (type and strength of each relationship); multivariate analysis (VIF and CI, similar to PCA)
- Modeling & Analysis: model building on the development sample (identification of statistically significant drivers, overall fit and accuracy)
- Validation: model rebuilding on the validation sample (stability of drivers, fit of the model and accuracy)
- Recommendations & Implementation Strategy: framing of actionable recommendations and impact analysis
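Lenders commonly combine the three analyst quantities into an expected loss per loan, EL = PD × EAD × LGD. The deck does not spell out this combination. Treat the following Python sketch as an illustration of that standard convention only. The function name `expected_loss` and the applicant's numbers are hypothetical.

```python
def expected_loss(prob_np: float, ead: float, lgd: float) -> float:
    """Expected dollar loss on one loan, assuming EL = PD x EAD x LGD.

    prob_np -- probability of non-repayment (PD), between 0 and 1
    ead     -- dollars exposed when non-repayment happens (EAD)
    lgd     -- share of that exposure lost after repossession and resale (LGD), 0 to 1
    """
    if not 0.0 <= prob_np <= 1.0 or not 0.0 <= lgd <= 1.0:
        raise ValueError("prob_np and lgd must lie in [0, 1]")
    if ead < 0:
        raise ValueError("ead must be non-negative")
    return prob_np * ead * lgd


# Hypothetical applicant: 8% chance of non-repayment, $15,000 exposed,
# 40% of which is lost after the vehicle is repossessed and sold.
print(round(expected_loss(0.08, 15_000, 0.40), 2))  # 480.0
```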

HOWEVER, IT SHOULD BE PRECEDED BY SEGMENTATION
Customers need to be bucketed into homogeneous segments to normalize for the inherent variation between different types of customers, products, etc.
[Figure: segmentation grid by loan term (1 year, 2 year), credit score band (least, mid and high score range) and vehicle class (low-end models, mid-range models, luxury brands), with cells assigned to segments numbered 1 to 5.]

TRANSLATE INTO ANALYTICAL FRAMEWORK
A model is a mathematical relationship between a target (label) variable and the predictor (input) variables. Here, non-repayment is the target/label and the application information provides the predictors/input variables:
Non-repayment = f(application data such as Credit Score, % Monthly Payment to Income, etc.)
We build models on a historical sample, i.e., one where we have both the application data and what happened to that application later over the loan term.

Predictors/Input Variables (customer info at the time of application)
Appl_ID   Crd_sc   %Pymt_Inc
1         750      10%
2         500      70%
3         650      25%

Target/Labels (non-repayment info over the loan term)
Appl_ID   NP_Flag   When
1         No        -
2         Yes       5th Month
3         No        -

Modeling Data = Predictors/Input Variables + Target/Labels
Appl_ID   Crd_sc   %Pymt_Inc   NP_Flag   When
1         750      10%         No        -
2         500      70%         Yes       5th Month
3         650      25%         No        -
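The modeling data is built by joining the predictors table and the target table on Appl_ID. The Python sketch below shows that join for the three example applications.

- It uses plain dictionaries, so no third-party library is needed.
- The field names follow the example tables.
- Storing "When" as a month number is an assumed encoding.
- The numeric `target` column (1 = non-repayment) is also an assumed encoding.

```python
# Predictors captured at application time (the three example applications above).
predictors = {
    1: {"Crd_sc": 750, "%Pymt_Inc": 0.10},
    2: {"Crd_sc": 500, "%Pymt_Inc": 0.70},
    3: {"Crd_sc": 650, "%Pymt_Inc": 0.25},
}

# Outcomes observed later over the loan term; "When" is the month of non-repayment.
outcomes = {
    1: {"NP_Flag": "No", "When": None},
    2: {"NP_Flag": "Yes", "When": 5},
    3: {"NP_Flag": "No", "When": None},
}


def build_modeling_data(predictors, outcomes):
    """Inner-join predictors and outcomes on Appl_ID and add a 0/1 target column."""
    rows = []
    for appl_id in sorted(predictors.keys() & outcomes.keys()):
        row = {"Appl_ID": appl_id, **predictors[appl_id], **outcomes[appl_id]}
        row["target"] = 1 if row["NP_Flag"] == "Yes" else 0  # 1 = non-repayment
        rows.append(row)
    return rows


for row in build_modeling_data(predictors, outcomes):
    print(row)
```

Applications present in only one of the two tables are dropped by the inner join.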

DATA CREATION: PREDICTOR VARIABLES & HYPOTHESES
The expected relationship with non-repayment is shown in parentheses. +ve means higher values are expected to raise the chance of non-repayment; -ve means they are expected to lower it.

BUREAU DATA
  Absolute values: Credit Score (-ve); Payment to Income Ratio (+ve); Debt to Income Ratio (+ve); #Inquiries in the last quarter and last 12 months (+ve); Total Outstanding Loan (+ve); Bankruptcy, Non-repayments, Charge-offs, etc. (+ve)
  Deviations in slope and level: Trend, Shocks, etc. (-ve/+ve)
LOAN DETAILS
  Absolute values: Total Loan Requested; Term of the loan (depends: -ve/+ve); Make/Model/Model Year of the Car (depends on market demand for the make/model); Past relationship with the Lender (-ve); New/Used Car (New = -ve)
DEMOGRAPHIC DETAILS
  Absolute values: Home Owner/Renter, #Dependents, Gender, Marital Status, Age, Occupation, Education, Profession (depends on the variable)
MACROECONOMIC DATA
  Absolute values: GDP, Household Savings Ratio, Fuel Prices, Unemployment Rate, Interest Rates, etc. (depends on the variable)
  Deviation: Trend, Shocks, etc. (depends on the variable)
GEO DATA
  Absolute values: City, State, Region Cluster, Local Competition Data, Dealership-level factors, etc. (depends on the variable)
TRANSACTIONS DATA
  Absolute values: Monthly Payments, #Payments made, #Non-repayments, Time to CO (charge-off), Amount of Non-repayment, Recovery Rate, etc. (depends on the variable)
  Deviation: Trend, Shocks, etc. (depends on the variable)
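The deck does not define its "deviations in slope and level (trend, shocks)" features precisely. One plausible reading, sketched below purely as an assumption, summarizes a monthly series (for example a bureau balance or a macroeconomic indicator) in two ways:

- Trend: the series' least-squares slope.
- Shock: how far the latest value sits from its earlier level, measured in standard deviations.

Both function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def trend_slope(series):
    """Least-squares slope of a series regressed on its time index 0, 1, 2, ..."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    t_bar, y_bar = (n - 1) / 2, mean(series)
    num = sum((t - t_bar) * (y - y_bar) for t, y in enumerate(series))
    den = sum((t - t_bar) ** 2 for t in range(n))
    return num / den


def level_shock(series):
    """How far the latest value sits from the earlier level, in standard deviations."""
    if len(series) < 3:
        raise ValueError("need at least two historical points plus the latest value")
    history, latest = series[:-1], series[-1]
    gap = latest - mean(history)
    spread = pstdev(history)
    if spread == 0:
        # Flat history: any move at all is an unbounded shock in these units.
        return 0.0 if gap == 0 else math.copysign(math.inf, gap)
    return gap / spread


# Hypothetical monthly balances for one applicant.
print(trend_slope([1000, 1100, 1200, 1300, 1400]))  # 100.0
print(level_shock([10, 12, 10, 12, 20]))            # 9.0
```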

DATA PREPARATION: CAPPING & MISSING VALUE TREATMENT
Capping treatment is necessary to remove the effect of extreme or nonsensical values that are very different from the rest of the population.
[Sample data table with columns No., Pyoffflg, Prin0105, Loanamt, Term, Fixed, Agnsttr, Bbctrad, Nummortt, Rvoptbal and Numminq; callouts highlight missing Numminq observations and unrealistic values.]
Missing value treatment is the imputation of missing values for the affected variables. It is mandatory: if missing values are left unattended, the entire record is excluded from modeling.

DIMENSIONALITY REDUCTION: BIVARIATE ANALYSIS
Bivariate analysis explores the nature and degree of the relationship between each independent (predictor) variable and the dependent (target) variable.
Rank Plots: check whether a predictor variable is related to the target variable.
Steps:
1. Sort the population by the predictor variable's values.
2. Split it into groups with an equal number of observations, generally ten groups (deciles).
3. Compute the average of the target variable in each group.
4. Check whether the average target value trends from the top group to the bottom group.
Dummy = (predictor value …

Multivariate analysis:
Variance Inflation Factor (VIF): $VIF_j = 1/(1 - R_j^2)$, where $R_j^2$ is the R-square from regressing predictor $j$ on all the other predictors. Cut-offs used vary from 2 to 10.
Condition Index (CI): the square root of the ratio of the highest eigenvalue ($\lambda_{max}$) to each individual eigenvalue ($\lambda_i$), i.e. $CI_i = \sqrt{\lambda_{max}/\lambda_i}$. Cut-offs used vary from 13 to 30. CI is very similar to Principal Component Analysis (PCA), as both rest on an eigenvalue decomposition of the predictors.

GENERALIZED LINEAR MODELS: SAMPLE VIF/CI OUTPUT
The REG Procedure
Model: MODEL1
Dependent Variable: NP_Flag
Number of Observations Read: 40162
Number of Observations Used: 40162

Analysis of Variance
Source            DF      Sum of Squares   Mean Square   F Value   Pr > F
Model             12      610.91533        50.90961      219.02    …
Error             40149   9332.36401       0.23244
Corrected Total   40161   9943.27934

Root MSE          0.48212
Dependent Mean    0.5492
Coeff Var         87.78642

Parameter Estimates (variables shown: Intercept, Credit_Score, %Down_Pymt_to_Loan, %Mnthly_Pymt_to_Loan, each with DF 1)
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   1    1.24953              0.20693          6.04      …

[Collinearity diagnostics table (Number, Eigenvalue, …): values not recoverable from the transcript.]

A predictive model is a mathematical relationship between the predictors and the target:
$\log(\text{odds}) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$, where $\text{odds} = p/(1-p)$ and $p$ is the probability of non-repayment.
SAS procedure: PROC LOGISTIC (with various link functions)

HOW DO WE KNOW IF THE MODEL WORKS?
For logistic models, the following metrics are used as performance diagnostics:
- Concordance/Discordance: an overall indicator of the model's prediction accuracy. Form pairs of observations with different outcomes (one bad, i.e. non-repaying, and one good). Then check the % of pairs in which the bad observation is given a higher predicted probability than the good one.
- Rank Order: a similar test in a more structured format. Steps:
  1. Sorting: sort the population by predicted probability.
  2. Deciling: bucket it into ten groups, each holding 10% of the population in sorted order.
  3. Check the % of non-repayers in each decile.
  4. Capturing: ideally, the % of bad accounts should be highest in the top deciles and lowest in the bottom deciles, so the top deciles capture most of the non-repayers.
- Gains Chart: a graphical representation of the model's capturing compared with random bucketing.
- Akaike Information Criterion (AIC): helps select the most parsimonious regression model, i.e., maximum information capture with the fewest predictors.
These come in addition to the usual checks on coefficient signs and statistical significance, and on whether the model also holds on the validation sample.

SAMPLE MODEL OUTPUT
Type 3 Analysis of Effects (columns: Effect, DF, Wald Chi-Square)
Effects listed: APPLICATION_PRIM_CB_, %Down_Pymt_to_Loan, %Mnthly_Pymt_to_Loan
First row: DF 2, Wald Chi-Square 14.5230 …
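The capping and missing-value treatment described under DATA PREPARATION can be sketched in Python for a single numeric variable. The specific rules here are common choices assumed for illustration; the deck does not state which rules were used:

- Missing values are imputed with the median.
- Values are capped at the 1st and 99th percentiles by default.

`percentile` and `treat_variable` are hypothetical helper names.

```python
from statistics import median


def percentile(sorted_vals, p):
    """Linear-interpolation percentile (p from 0 to 100) of an already sorted list."""
    k = (len(sorted_vals) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def treat_variable(values, lower_pct=1, upper_pct=99):
    """Impute missing values (None) with the median, then cap at percentile bounds.

    The median and both bounds are computed from the non-missing values only.
    """
    present = sorted(v for v in values if v is not None)
    if not present:
        raise ValueError("variable has no non-missing values")
    fill = median(present)
    low, high = percentile(present, lower_pct), percentile(present, upper_pct)
    return [min(max(fill if v is None else v, low), high) for v in values]


# Hypothetical column with one missing entry and one unrealistic value;
# a tight upper bound is used so the capping is visible on six rows.
print(treat_variable([2, 3, None, 4, 5, 999], lower_pct=0, upper_pct=75))
```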
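Two checks in the deck use the same sort-and-bucket logic, so one Python helper can serve both:

- The rank-plot steps under BIVARIATE ANALYSIS (pass a predictor's values).
- The rank-order and capturing checks under model diagnostics (pass predicted probabilities).

This is a sketch: the name `decile_table` and its output fields are hypothetical, and ties in the score keep their input order.

```python
def decile_table(scores, target, groups=10, descending=True):
    """Sort observations by score, split them into equal-count groups, and report
    each group's average target (bad rate) and the cumulative share of bads captured.

    scores -- a predictor's values (rank plot) or predicted probabilities (rank order)
    target -- 1 for non-repayment, 0 otherwise
    """
    if len(scores) != len(target) or len(scores) < groups:
        raise ValueError("need equal-length inputs with at least one row per group")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=descending)
    n, total_bads = len(order), sum(target)
    rows, captured = [], 0
    for g in range(groups):
        members = order[g * n // groups:(g + 1) * n // groups]
        bads = sum(target[i] for i in members)
        captured += bads
        rows.append({
            "group": g + 1,
            "count": len(members),
            "bad_rate": bads / len(members),
            "cum_capture": captured / total_bads if total_bads else 0.0,
        })
    return rows


# Hypothetical predicted probabilities and outcomes, split into 5 groups of 2.
probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
bad = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
for row in decile_table(probs, bad, groups=5):
    print(row)
```

For a gains chart, plot `cum_capture` against the cumulative population share (`group / groups`). Random bucketing would fall along the diagonal.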
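The VIF and CI checks can both be computed from the predictors' correlation matrix:

- Each VIF is a diagonal element of the inverse correlation matrix, which equals 1/(1 − R_j²).
- Each CI is √(λmax/λi) over that matrix's eigenvalues.

Using the correlation matrix (intercept adjusted out) is one convention among several. For example, PROC REG's COLLIN option works on a scaled cross-product matrix that includes the intercept, so its numbers will differ.

This sketch sticks to the Python standard library, so the eigenvalues come from a small Jacobi-rotation routine. All function names are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of equal-length numeric columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        m = sum(col) / n
        centered.append([x - m for x in col])
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def _rotate(a, p, q, c, s):
    """Return J^T A J for the plane rotation J acting on indices p and q."""
    k = len(a)
    j = [[1.0 if r == col else 0.0 for col in range(k)] for r in range(k)]
    j[p][p] = j[q][q] = c
    j[p][q], j[q][p] = s, -s
    aj = [[sum(a[r][m] * j[m][col] for m in range(k)) for col in range(k)] for r in range(k)]
    return [[sum(j[m][r] * aj[m][col] for m in range(k)) for col in range(k)] for r in range(k)]


def eigenvalues_symmetric(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues (ascending) of a symmetric matrix via cyclic Jacobi rotations."""
    a = [list(row) for row in a]
    k = len(a)
    for _ in range(max_sweeps):
        if all(abs(a[p][q]) < tol for p in range(k) for q in range(k) if p != q):
            break
        for p in range(k - 1):
            for q in range(p + 1, k):
                if abs(a[p][q]) < tol:
                    continue
                # Angle that zeroes a[p][q]: tan(2*theta) = 2*a_pq / (a_qq - a_pp).
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                a = _rotate(a, p, q, math.cos(theta), math.sin(theta))
    return sorted(a[i][i] for i in range(k))


def vif_and_ci(columns):
    """Return (VIF per predictor, condition index per eigenvalue in ascending order)."""
    r = correlation_matrix(columns)
    r_inv = invert(r)
    vifs = [r_inv[i][i] for i in range(len(r))]
    eigs = eigenvalues_symmetric(r)
    cis = [math.sqrt(eigs[-1] / lam) for lam in eigs]
    return vifs, cis


# Two hypothetical predictors with correlation 0.8.
vifs, cis = vif_and_ci([[1, 2, 3, 4, 5], [2, 1, 4, 3, 5]])
print([round(v, 3) for v in vifs])  # [2.778, 2.778]
print([round(c, 3) for c in cis])   # [3.0, 1.0]
```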
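The log-odds equation converts to a predicted probability of non-repayment by inverting the logit link: p = 1/(1 + e^(−z)), with z = β0 + β1X1 + β2X2.

The coefficients below are hypothetical. They were chosen only so that their signs match the hypotheses in the predictor table: a higher credit score lowers risk, and a higher payment-to-income ratio raises it. PROC LOGISTIC would estimate the real values.

```python
import math


def predicted_probability(intercept, coefs, x):
    """Probability from a logistic model: log(p / (1 - p)) = b0 + sum(b_i * x_i)."""
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    # Evaluate the sigmoid in a form that cannot overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical fitted model: log(odds) = 4.0 - 0.01 * Crd_sc + 3.0 * Pymt_Inc
b0, b = 4.0, [-0.01, 3.0]
for crd_sc, pymt_inc in [(750, 0.10), (500, 0.70), (650, 0.25)]:
    print(crd_sc, pymt_inc, round(predicted_probability(b0, b, [crd_sc, pymt_inc]), 3))
```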
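The concordance check can be sketched in Python, assuming 1 marks a non-repayer. The sketch differs from the slide's wording in two ways:

- It evaluates every bad/good pair rather than a random subset.
- It reports tied pairs separately, as logistic-regression output commonly does.

Adding half the tied share to the concordant share gives the c statistic, which equals the area under the ROC curve. The function name is hypothetical.

```python
def concordance(probs, target):
    """Shares of concordant, discordant and tied (bad, good) pairs, plus the c statistic."""
    bads = [p for p, y in zip(probs, target) if y == 1]
    goods = [p for p, y in zip(probs, target) if y == 0]
    if not bads or not goods:
        raise ValueError("need at least one bad and one good observation")
    concordant = discordant = tied = 0
    for pb in bads:
        for pg in goods:
            if pb > pg:
                concordant += 1   # bad account ranked riskier: correct ordering
            elif pb < pg:
                discordant += 1   # good account ranked riskier: wrong ordering
            else:
                tied += 1
    total = len(bads) * len(goods)
    return {
        "concordant": concordant / total,
        "discordant": discordant / total,
        "tied": tied / total,
        "c": (concordant + 0.5 * tied) / total,
    }


# Hypothetical predicted probabilities for two bad and two good accounts.
print(concordance([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))
```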
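AIC trades fit against model size: AIC = 2k − 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood. Among candidate models, the lower AIC is preferred.

The Python sketch below applies this to a binary non-repayment model. Hypothetical predicted probabilities stand in for fitted values, and the parameter counts are assumed.

```python
import math


def bernoulli_log_likelihood(probs, target):
    """Log-likelihood of 0/1 outcomes given predicted probabilities of a 1."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p) for p, y in zip(probs, target))


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


target = [1, 0, 1, 0, 0, 1]
# Hypothetical fitted probabilities from two candidate models.
intercept_only = [0.5] * len(target)              # 1 parameter
with_predictors = [0.8, 0.2, 0.7, 0.3, 0.1, 0.9]  # assumed 3 parameters
print(aic(bernoulli_log_likelihood(intercept_only, target), 1))
print(aic(bernoulli_log_likelihood(with_predictors, target), 3))
```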