Credit Risk Modelling

Credit Risk Modeling Arnar Ingi Einarsson Kongens Lyngby 2008 IMM-PHD-2008-1


Credit Risk Modeling

Arnar Ingi Einarsson

Kongens Lyngby 2008
IMM-PHD-2008-1

Technical University of Denmark
Informatics and Mathematical Modelling
Building 321, DK-2800 Kongens Lyngby, Denmark
Phone +45 45253351, [email protected]: ISSN 0909-3192

Summary

The credit assessment made by corporate banks has been evolving in recent years. Credit assessments have evolved from being the subjective assessment of the bank's credit experts to being more mathematically grounded. Banks are increasingly opening their eyes to the pressing need for comprehensive modeling of credit risk. The financial crisis of 2008 is certain to further increase the need for good modeling procedures. In this thesis the modeling framework for credit assessment models is constructed. Different modeling procedures are tried, leading to the conclusion that logistic regression is the most suitable framework for credit rating models. Analyzing the performance of different link functions for the logistic regression leads to the conclusion that the complementary log-log link is the most suitable for modeling the default event.

Validation of credit rating models lacks a single numeric measure that summarizes model performance. A solution to this problem is suggested by using principal component representatives of a few discriminatory power indicators. With a single measure of model performance, model development becomes a much more efficient process. The same goes for variable selection. The data used in the modeling process are not as extensive as would be the case for many banks.
A resampling process is introduced that is useful for getting stable estimates of model performance for a relatively small data set.

Preface

This thesis was prepared at Informatics and Mathematical Modelling, the Technical University of Denmark, in partial fulfillment of the requirements for acquiring the Master of Science in Engineering.

The project was carried out in the period from October 1st 2007 to October 1st 2008.

The subject of the thesis is the statistical aspect of credit risk modeling.

Lyngby, October 2008
Arnar Ingi Einarsson

Acknowledgements

I thank my supervisors Professor Henrik Madsen and Jesper Colliander Kristensen for their guidance throughout this project.

I would also like to thank my family, my girlfriend Hrund for her moral support, my older son Halli for his patience and my new-born son Almar for his inspiration and for allowing me some sleep.

Contents

Summary
Preface
Acknowledgements
1 Introduction
    1.1 Background
    1.2 Aim of Thesis
    1.3 Outline of Thesis
2 Credit Modeling Framework
    2.1 Definition of Credit Concepts
    2.2 Subprime Mortgage Crisis
    2.3 Development Process of Credit Rating Models
3 Commonly Used Credit Assessment Models
    3.1 Heuristic Models
    3.2 Statistical Models
    3.3 Causal Models
    3.4 Hybrid Form Models
    3.5 Performance of Credit Risk Models
4 Data Resources
    4.1 Data dimensions
    4.2 Quantitative key figures
    4.3 Qualitative figures
    4.4 Customer factors
    4.5 Other factors and figures
    4.6 Exploratory data analysis
5 The Modeling Toolbox
    5.1 General Linear Models
    5.2 Generalized Linear Models
    5.3 Discriminant Analysis
    5.4 k-Nearest Neighbors
    5.5 CART, a tree-based Method
    5.6 Principal Component Analysis
6 Validation Methods
    6.1 Discriminatory Power
    6.2 Relative frequencies and Cumulative frequencies
    6.3 ROC curves
    6.4 Measures of Discriminatory Power
    6.5 Discussion
7 Modeling Results
    7.1 General Results
    7.2 Principal Component Analysis
    7.3 Resampling Iterations
    7.4 Performance of Individual Variables
    7.5 Performance of Multivariate Models
    7.6 Addition of Variables
    7.7 Discriminant Analysis
    7.8 Link functions
8 Conclusion
    8.1 Summary of Results
    8.2 Further work
A Credit Pricing Modeling
    A.1 Modeling of Loss Distribution
B Additional Modeling Results
    B.1 Detailed Performance of Multivariate Models
    B.2 Additional Principal Component Analysis
    B.3 Unsuccessful Modeling
C Programming
    C.1 The R Language
    C.2 R code

Chapter 1
Introduction

1.1 Background

Banking is built on the idea of profiting by loaning money to those who are in need of money. Banks then collect interest on the payments which the borrower makes in order to pay back the money borrowed. The likely event that some borrowers will default on their loans, that is, fail to make their payments, results in a financial loss for the bank.

In the application process for new loans, banks assess the potential borrower's creditworthiness. As a measure of creditworthiness, an assessment is made of the probability of default of the potential borrower. The risk that the credit assessment of the borrower is too modest is called credit risk. Credit risk modeling is quite an active research field. Before the milestone of Altman [2], credit risk on corporate loans was based on the subjective analysis of the credit experts of financial institutes.

Probability of default is a key figure in the daily operation of any credit institute, as it is used as a measure of credit risk in both internal and external reporting.

The credit risk assessments made by banks are commonly referred to as credit rating models. In this thesis various statistical methods are used as modeling procedures for credit rating models.

1.2 Aim of Thesis

This thesis is done in co-operation with a corporate bank, which supplied the necessary data resources. The aim of the thesis is to see whether logistic regression can outperform the current heuristic credit rating model used in the co-operating corporate bank. The current model is called Rating Model Corporate (RMC) and is described further in section 4.5.1.
This was the only clear aim in the beginning, but further goals were added as the work on the thesis proceeded.

First, some variables that were not used in RMC but were still available are tested. Then an attempt was made to model credit default with different mathematical procedures. An effort was also made to combine some of those methods with logistic regression. Since discriminant analysis has seen extensive use in credit modeling, the performance of discriminant analysis was documented for comparison.

Validation of credit ratings is hard compared to regular modeling, since there is no true or observed rating that can be compared with the predicted credit rating to measure the prediction error. There are some validation methods available, but no single measure can be used to make a clear-cut decision on whether one model is better than another. It is thus necessary to consider numerous measures simultaneously to draw a conclusion on model performance. This has a clear disadvantage, as it might be debatable whether one model is better than another. In order to address this problem, an attempt was made to combine the available measures into a single measure.

As missing values are frequent in many of the modeling variables, some thought is given to how that particular problem could be solved. The problem of a small data sample is also dealt with.

The general purpose of this thesis is to inform the reader on how credit rating models can be constructed. Special emphasis is placed on the practical methods that a bank in the corporate banking sector could make use of in the development process of a new credit rating model.

1.3 Outline of Thesis

Credit risk modeling is a wide field. In this thesis an attempt is made to shed light on the many different subjects of credit risk modeling. Chapters 2 and 6 provide the fundamental understanding of credit risk modeling. The structure of the thesis is as follows.

Chapter 2: Credit Modeling Framework. Introduces the basic concepts of credit risk modeling. Furthermore, a discussion of the ongoing financial crisis is given.
Finally, a detailed description of the modeling process is given.

Chapter 3: Commonly Used Credit Assessment Models. Gives a brief introduction to the different types and the performance of commonly used credit assessment models.

Chapter 4: Data Resources. Gives a quite detailed description of the data used in the analysis. The data were supplied by a co-operating corporate bank.

Chapter 5: The Modeling Toolbox. Gives a full discussion of the mathematical procedures that were used in the model development.

Chapter 6: Validation Methods. Introduces the large selection of validation methods, as validation is a fundamental part of credit risk modeling.

Chapter 7: Modeling Results. The main findings are presented. The performance of different mathematical procedures is listed. Furthermore, the performance of variables is discussed.

Chapter 8: Conclusion. Concludes the thesis and includes a section about further work.

Appendix A: Credit Pricing Models. Introduces a practical method to estimate the loss distribution. The estimation of the loss distribution can be used to extend the credit rating model to a credit pricing model.

Appendix B: Additional Modeling Results. Some modeling results that were considered less important are presented.

Appendix C: Programming. Includes an introduction to R, the programming language used.

Chapter 2
Credit Modeling Framework

In order to get a better feel for the credit modeling framework, there are some important concepts and measures that are worth considering. It is also worth considering the need for credit modeling and the important role of the international legislation on banking supervision, called Basel II.

In Section 2.1 the most important concepts of the credit modeling framework are defined. The definitions are partly adapted from the detailed discussion in Ong [26] and Alexander and Sheedy [1]. Section 2.2 discusses the ongoing financial crisis, which is partly due to poor credit ratings, and finally the model development process is introduced in Section 2.3.

2.1 Definition of Credit Concepts

The major activity of most banks¹ is to raise principal by loaning money to those who are in need of money.
They then collect interest on the payments made by the borrower in order to pay back the principal borrowed. When borrowers fail to make their payments, they are said to have defaulted on their promise of repayment. A more formal definition of default is obtained from the Basel II legislation [6]. A firm² is defined as a default firm if either or both of the following scenarios have taken place.

    I - The credit institution considers that the obligor is unlikely to pay its credit obligations to the credit institution in full, without recourse by the credit institution to actions such as realizing security (if held).

    II - The obligor is past due more than 90 days on any material credit obligation to the banking group. Overdrafts will be considered as being past due once the customer has breached an advised limit or been advised of a limit smaller than current outstandings.

¹ By the term bank we also refer to any financial institute giving credit.

The first of the two rather formal definitions states that a firm is in default if the bank believes it will not receive its debt in full without demanding ownership of the collateral³ taken. The second scenario is simpler, as it states that if the borrower has not paid some promised payment which was due 90 days ago, the borrower is considered to have defaulted on its payment. The sentence regarding overdrafts⁴ can be interpreted as covering the case where the borrower makes a transaction breaking the advised limit, or is struggling to lower its limit, thus making the bank fear that it will not receive its payment.

It is important to note the difference between the three terms insolvency, bankruptcy and default. The three terms are frequently used in the literature as if they were the same thing. In order to avoid confusion, the three terms are explained here. The term insolvency refers to a borrower that is unable to pay its debt, whereas a borrower that has defaulted on its debt is either unwilling or unable to pay its debt. To complicate matters even further, insolvency is often used for the situation when liabilities exceed assets, but such firms might still be profitable and thus be able to pay all their debts.
Bankruptcy is a legal finding that results in court supervision over the financial affairs of a borrower that is either insolvent or in default. It is important to note that a borrower that has defaulted can come back from being defaulted by settling the debt. That might be done by adding collateral or by getting alternative funding. Furthermore, as will be seen later when considering loss given default, the event of a default does not necessarily result in a financial loss for the bank.

When potential borrowers apply for a loan at a bank, the bank will evaluate the creditworthiness of the potential borrower. This is an assessment of whether the borrower can pay the principal and interest when due. The risk that arises from the uncertainty of the credit assessment, especially that it is too modest, is called credit risk. According to the Basel Handbook [26], credit risk is the major risk to which banks are exposed, since making loans is the primary activity of most banks. A formal definition of credit risk is given by Zenios [35] as

    The risk of an unkept payment promise due to default of an obligor (counter-party, issuer or borrower) or due to adverse price movements of an asset caused by an upgrading or downgrading of the credit quality of an obligor that brings into question their ability to make future payments.

² A firm is any business entity such as a corporation, partnership or sole trader.

³ Collateral is an asset of the borrower that becomes the lender's if the borrower defaults on the loan.

⁴ Overdraft is a type of loan meant to cover a firm's short term cash need. It generally has an upper bound, and interest is paid on the outstanding balance of the overdraft loan.

The creditworthiness may decline over time, due to bad management or some external factors, such as rising inflation⁵, weaker exchange rates⁶, increased competition or volatility in asset value.

The credit risk can be generalized with the following equation

\[ \text{Credit Risk} = \max\{\text{Actual Loss} - \text{Expected Loss},\ 0\} \]

where the actual loss is the observed financial loss. Credit risk is thus the risk that the actual loss is larger than the expected loss.
Expected loss is an estimate, and the credit risk can be considered the risk that the actual loss is considerably larger than the expected loss. The expected loss can be divided into further components as follows

\[ \text{Expected Loss} = \text{Probability of Default} \times \text{Exposure at Default} \times \text{Loss Given Default} \]

An explanation of each of these components is adapted from Ong [26].

Probability of Default (PD) is the expected probability that a borrower will default on the debt before its maturity⁷. PD is generally estimated by reviewing the historical default record of other loans with similar characteristics. PD is generally defined as the default probability of a borrower over a one year period. As PDs are generally small numbers, they are usually transformed to a risk grade or risk rating to make them more readable.

Exposure at Default (EAD) is the amount that the borrower legally owes the bank. It may not be the entire amount of the funds the bank has granted the borrower. For instance, a borrower with an overdraft, under which outstandings go up and down depending on the borrower's cash flow needs, could fail at a point when not all of the funds have been drawn down. EAD is simply the exact amount the borrower owes at the time of default and can easily be estimated at any time as the current exposure. The current exposure is the current outstanding debt minus a discounted value of the collateral. The discounted value of the collateral is meant to represent the actual value of the collateral.

⁵ Inflation is an economic term for the general increase in the price level of goods and services.

⁶ Exchange rates describe the relation between two currencies, specifying how much one currency is worth in terms of the other.

⁷ Maturity refers to the final payment date of a loan, at which point all remaining interest and principal is due to be paid.

Loss Given Default (LGD) is the percentage of EAD that the bank actually loses. Banks like to protect themselves and frequently do so by taking collateral or by holding credit derivatives⁸ as a securitization. Borrowers may even have a guarantor who will adopt the debt if the borrower defaults, in which case the LGD takes the value zero.
The mirror image of LGD, the recovery rate given default, is frequently used in the literature; together the two account for the full amount owed by the borrower at the time of default, the EAD. Loss given default is simply the expected percentage of loss on the funds provided to the borrower. Altman et al. [4] report empirical evidence that observed default rates and LGDs are positively correlated. From this observation it is possible to conclude that banks are successful in protecting themselves when default rates are moderate, but fail to do so when high default rates are observed.

Expected Loss (EL) can be seen as the average of historically observed losses. EL can also be estimated using estimates of the three components in equation (2.1).

\[ \text{EL} = \text{PD} \times \text{EAD} \times \text{LGD} \qquad (2.1) \]

EL estimates partly determine the bank's capital requirement. Capital requirements, that is, the amount of money that the bank has to keep available, are determined by financial authorities and are based on common capital ratios⁹. The capital requirements are, though, usually substantially higher than EL, as they have to cover all types of risk that the bank is exposed to, such as market, liquidity, systemic and operational risks¹⁰, or simply all risks that might result in a solvency crisis for the bank. Un-expected Loss (UEL) is defined in Alexander and Sheedy [1] with respect to a certain Value at Risk (VaR) quantile and the probability distribution of the portfolio's loss.

⁸ Credit derivatives are bilateral contracts between a buyer and seller, under which the seller sells protection against the credit risk of an underlying bond, loan or other financial asset.

⁹ Tier I, Tier II, leverage ratio, common stockholders' equity.

¹⁰ Market risk: the risk of unexpected changes in prices or interest or exchange rates.
Liquidity risk: the risk that the costs of adjusting financial positions will increase substantially or that a firm will lose access to financing.
Systemic risk: the risk of breakdown in market-wide liquidity or chain-reaction default.
Operational risk: the risk of fraud, systems failures, trading errors, and many other internal organizational risks.
The VaR quantile can be seen as an estimate of the maximum loss. The VaR quantile is defined mathematically as \( \Pr[\text{Loss} \le \text{VaR}_\alpha] = \alpha \), where \( \alpha \) is generally chosen as a high quantile, 99% to 99.9%. For a certain VaR quantile the UEL can be defined as

\[ \text{UEL} = \text{VaR}_\alpha - \text{EL} \]

The name un-expected loss is somewhat confusing, as the value rather states how much incremental loss could be expected in a worst case scenario. Further discussion on how to obtain an estimate of EL, VaR and UEL can be seen in Appendix A.

One of the primary objectives of this thesis is to consider how to obtain the best possible estimate of the probability of default of specific borrowers. It is therefore worth considering the purpose of acquiring the best possible estimate of PDs. The PDs are reported as a measure of risk both to the bank's executive board and to the financial supervisory authorities. The duty of the financial supervisory authority is to monitor the banks' financial undertakings and to ensure that banks have reliable banking procedures. The financial supervisory authority determines the banks' capital requirements. As banks like to minimize their capital requirements, it is of great value to show that credit risk is successfully modeled.

Expected loss and capital requirements, along with the PDs, are the main factors in deciding the interest rate for each borrower. As most borrowers will look for the best offer on the market, it is vital to have a good rating model. In a competitive market, banks will loan at increasingly lower interest rates. Thus some of them might default, and as banks loan to other banks, that might cause a chain reaction.

Banking legislation

If a chain of banks or a major bank were to default, it would have catastrophic consequences for any economic system. As banks loan to each other, the operations of banks are highly integrated. Strong commercial banks are the driving force in the economic growth of any country, as they make funds available for investors. Realizing this, the central bank governors of the G10 nations¹¹ founded the Basel Committee on Banking Supervision in 1974.
The aim of this committee is, according to its website [8]:

    The Basel Committee on Banking Supervision provides a forum for regular cooperation on banking supervisory matters. Its objective is to enhance understanding of key supervisory issues and improve the quality of banking supervision worldwide. It seeks to do so by exchanging information on national supervisory issues, approaches and techniques, with a view to promoting common understanding. At times, the Committee uses this common understanding to develop guidelines and supervisory standards in areas where they are considered desirable. In this regard, the Committee is best known for its international standards on capital adequacy; the Core Principles for Effective Banking Supervision; and the Concordat on cross-border banking supervision.

¹¹ The twelve member states of G10 are: Belgium, Netherlands, Canada, Sweden, France, Switzerland, Germany, United Kingdom, Italy, United States, Japan and Luxembourg.

The Basel Committee published an accord called Basel II in 2004, which is meant to create international standards that banking regulators can use when creating regulations about how much capital banks need to hold in order to stay solvent in the face of credit and operational risks.

More specifically, the aim of the Basel II regulations is, according to Ong [26], to quantify and separate operational risk from credit risk and to ensure that capital allocation is more risk sensitive. In other words, Basel II sets a guideline for how banks' in-house estimation of the loss parameters, probability of default (PD), loss given default (LGD) and exposure at default (EAD), should be done. As banks need the regulator's approval, these guidelines ensure that banks hold sufficient capital to cover the risk that the bank exposes itself to through its lending and investment practices. These international standards should protect the international financial system from problems that might arise should a major bank or a series of banks collapse.

Credit Modeling

The Basel II accord introduces good practices for internal based rating systems as another option to using ratings obtained from credit rating agencies.
Credit rating agencies rate firms, countries and financial instruments based on their credit risk. The largest and amongst the most cited agencies are Moody's, Standard & Poor's and Fitch Ratings. Internal based rating systems have the advantage over the rating agencies that there is additional information available inside the bank, such as credit history and the credit experts' valuation. Internal based ratings can be obtained for all borrowers, whereas the rating agencies' ratings might be missing for some potential borrowers. Furthermore, rating agencies only publicly report the risk grades of larger firms, whereas there is a price to view their ratings for small and medium sized firms.

There are two different types of credit models that should not be confused with each other. One is credit rating models and the other is credit pricing models. There is a fundamental difference between the two, as credit rating models are used to model PDs, whereas pricing models consider combinations of PDs, EADs and LGDs to model the EL. A graphical representation of the two models can be seen in Figure 2.1.

Figure 2.1: Systematic overview of Credit Assessment Models.

In this thesis credit rating models are the main concern, as they are of more practical use and can be used to get estimates of EL. By estimating the EL, the same result as for credit pricing models is obtained. Reconsider the relationship between the risk components in equation (2.1).

The PDs are obtained from the credit rating model, and the EAD is easily estimated as the current exposure. An estimate of LGD can be found by collecting historical data on LGD; an example of an LGD distribution can be seen in Figure 2.2. The average, which lies around 40%, does not represent the distribution well. A more sophisticated procedure would be to model the event of loss or no loss with some classification procedure, e.g. logistic regression, and then use the left part of the empirical distribution to model those classified as no loss and the right part for those classified as loss. The averages of each side of the distribution could be used. It would, though, be even better to treat LGD as a stochastic variable and consider it to be independent of PD.
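As a rough illustration of how the risk components of equation (2.1) combine, the following Python sketch computes the current exposure (outstanding debt minus discounted collateral, as defined earlier in this section) and the resulting expected loss for a single loan. It is not taken from the thesis, which lists R as its programming language; the function names, the `haircut` parameter used to discount the collateral, the floor at zero and all figures for the example borrower are assumptions made for illustration only.

```python
def current_exposure(outstanding_debt, collateral_value, haircut):
    """Current exposure: outstanding debt minus the discounted collateral value.

    `haircut` is the assumed fraction by which the collateral is discounted.
    Flooring the result at zero is an assumption of this sketch.
    """
    if not 0.0 <= haircut <= 1.0:
        raise ValueError("haircut must lie in [0, 1]")
    discounted_collateral = collateral_value * (1.0 - haircut)
    return max(outstanding_debt - discounted_collateral, 0.0)


def expected_loss(prob_default, ead, lgd):
    """Expected loss as in equation (2.1): EL = PD * EAD * LGD."""
    if not (0.0 <= prob_default <= 1.0 and 0.0 <= lgd <= 1.0):
        raise ValueError("prob_default and lgd must lie in [0, 1]")
    return prob_default * ead * lgd


# Hypothetical borrower: all numbers below are made up for illustration.
ead = current_exposure(outstanding_debt=1_000_000.0,
                       collateral_value=400_000.0,
                       haircut=0.25)
el = expected_loss(prob_default=0.02, ead=ead, lgd=0.40)
print(f"EAD = {ead:,.0f}, EL = {el:,.0f}")
```

For this made-up borrower the exposure is 700,000 and the expected loss is about 5,600. Using a single average LGD of 40% here is the simple approach that the text itself cautions against; a two-part or stochastic LGD would replace the constant `lgd` argument.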
It is generally seen in practice that LGDs are assumed independent of PDs; Altman et al. [4] point out that the commercial credit pricing models¹² use LGD either as a constant or as a stochastic variable independent of PD. When estimates of PDs, EADs and LGDs have been obtained, they can be used to estimate the EL. A practical procedure to estimate the expected loss is introduced in Appendix A.

Figure 2.2: Example of an empirical distribution of Loss Given Default (LGD). (Histogram of LGD; horizontal axis: LGD [%], vertical axis: relative frequency.)

¹² These value-at-risk (VaR) models include J.P. Morgan's CreditMetrics®, McKinsey's CreditPortfolioView®, Credit Suisse Financial Products' CreditRisk+®, KMV's PortfolioManager®, and Kamakura's Risk Manager®.

2.2 Subprime Mortgage Crisis

It is important to recognize the influence of macroeconomics¹³ on observed default frequencies. By comparing the average default rates reported by Altman et al. [4] with reports of recent recessions¹⁴, a clear and simple relationship can be seen. Wikipedia [33] reports a recession in the early 1990s and in the early 2000s, and Altman et al. [4] report default rates higher than 10% in 1990, 1991, 2001 and 2002, whereas frequently observed default rates are between 1% and 2%. The relationship is that high default rates are observed during and after recessions.

In their 2006 paper, Altman et al. [4] argue that there was a type of credit bubble on the rise, causing seemingly highly distressed firms to remain non-bankrupt when, in more normal periods, many of these firms would have defaulted. Their words could be understood to mean that too much credit had been given to distressed firms, which would result in greater losses when that credit bubble collapsed. With the financial crisis of 2008, that credit bubble is certain to have burst.
This might result in high default rates and significant losses for corporate banks in the next year or two; only time will tell.

The financial crisis of 2008 is directly related to the subprime mortgage crisis, while high oil and commodity prices have increased inflation, which has induced further crisis situations. A brief discussion of the subprime mortgage crisis and its causes, adapted from Maslakovic [22], follows.

The subprime mortgage crisis is an ongoing worldwide economic problem, resulting in liquidity issues in the global banking system. The crisis began with the bursting of the U.S. housing bubble in late 2006, resulting in high default rates on subprime and other adjustable rate mortgages (ARM). The term subprime refers to higher-risk borrowers, that is, borrowers with lower income or lesser credit history than prime borrowers. Subprime lending has been a major contributor to the increase in home ownership in the U.S. in recent years. The easily obtained mortgages, combined with the assumption that housing prices would keep rising after a long-term upward trend, encouraged subprime borrowers to take mortgage loans. As interest rates went up, and once housing prices started to drop moderately in 2006 and 2007 in many parts of the U.S., defaults and foreclosure activity increased dramatically.

¹³ Macroeconomics is the field of economics that considers the performance and behavior of a national or regional economy as a whole. Macroeconomists try to model the structure of national income/output, consumption, inflation, interest rates and unemployment rates, amongst others. Macro- refers to large scale whereas micro- refers to small scale.

¹⁴ A recession is a contraction phase of the business cycle. Recession is generally defined as a period of negative growth in real gross domestic product (GDP) for two or more consecutive quarters. A sustained recession is referred to as a depression.

The mortgage lenders were the first to be affected as borrowers defaulted, but major banks and other financial institutions around the world were hurt as well.
The reason for their pain was a financial engineering tool called securitization, where rights to the mortgage payments are passed on via mortgage-backed securities (MBS) and collateralized debt obligations (CDO). Corporate, individual and institutional investors holding MBS or CDO faced significant losses as the value of the underlying mortgage assets declined. The stock prices of those firms reporting great losses caused by their involvement in MBS or CDO fell drastically.

The widespread dispersion of credit risk through CDOs and MBSs and the unclear effect on financial institutions caused lenders to reduce lending activity or to make loans at higher interest rates. Similarly, the ability of corporations to obtain funds through the issuance of commercial paper was affected. This aspect of the crisis is consistent with what is called a credit crunch. The general crisis caused stock markets to decline significantly in many countries. The liquidity concerns drove central banks around the world to take action to provide funds to member banks, to encourage the lending of funds to worthy borrowers and to re-invigorate the commercial paper markets.

The credit crunch has cooled the world economic system, as fewer and more expensive loans decrease the investments of businesses and consumers. The major contributors to the subprime mortgage crisis were poor lending practices and mispricing of credit risk. Credit rating agencies have been criticized for giving CDOs and MBSs based on subprime mortgage loans much higher ratings than they should have, thus encouraging investors to buy into these securities. Critics claim that conflicts of interest were involved, as rating agencies are paid by the firms that organize and sell the debt to investors, such as investment banks.
The market for mortgages had previously been dominated by government sponsored agencies with stricter rating criteria.

In the financial crisis, which has been especially hard for financial institutes around the world, the words of the prominent Cambridge economist John Maynard Keynes have never been more appropriate, as he observed in 1931 during the Great Depression:

    A sound banker, alas, is not one who foresees danger and avoids it, but one who, when he is ruined, is ruined in a conventional way along with his fellows, so that no one can really blame him.

2.3 Development Process of Credit Rating Models

In this section the development process of credit rating models is introduced. Figure 2.3 shows a systematic overview of the credit modeling process. The rectangular boxes in Figure 2.3 represent processes, whereas the boxes with sloped sides represent numerical information. As can be seen from Figure 2.3, there are quite a few processes inside the credit rating modeling process. The figure shows the journey from the original data to the model performance information.

Figure 2.3: Systematic overview of the Credit Rating Modeling Process.

The data used are records from the co-operating bank's database, and they are the same data as used in Rating Model Corporate (RMC). The data, which are discussed in full in Chapter 4, can be categorized as shown at the top of Figure 2.3.

The data go through a certain cleaning process. A firm that is not observed in two successive years is either a new customer or a retiring one, and is thus removed from the data set. Observations with missing values are also removed from the data set.

When the data have been cleansed they are referred to as complete, and they are then split into training and validation sets. The total data are split approximately as follows: 50% is used as a training set, 25% as a validation set and 25% as a test set:

    Training (50%) | Validation (25%) | Test (25%)

The training set is used to fit the model and the validation set is used to estimate the prediction error for model selection.
In order to account for the small sample of data, in particular of bad cases, the process of splitting, fitting, transformation and validation is performed repeatedly.

The test set is then used to assess the generalization error of the final model chosen. The training and validation sets, together called the modeling sets, are randomly chosen from the 2005, 2006 and 2007 data, whereas the test set is the 2008 data. The repeated splitting of the modeling sets is done by choosing a random sample without replacement such that the training set is 2/3 and the validation set is 1/3 of the modeling set.

In the early stages of the modeling process it was observed that different seedings of the split into training and validation sets gave considerably different results. In order to accommodate this problem, a resampling process is performed and the average performance over N samples is considered for variable selection. In order to ensure that the same N samples are used in the resampling process, the following procedure is performed:

- First a random number, called the seed, is selected, e.g. 2345.

- From the seed a set of random numbers, called a seeding pool, is generated. The modeling sample is then split into the training and validation sets using an identity from the seeding pool.

- After the split into training and validation sets, the default rate of each of the two sets is calculated. If the difference in default rates is more than 10%, that particular split is rejected, and a new split is tried with a new identity from the seeding pool, until an appropriate pair of training and validation sets is obtained.

An example of the different performances for different splits for RMC and a logistic regression model can be seen in Figure 2.4. The figure shows the clear need for the resampling process. This can be seen by considering the splits in iterations 1 and 50 respectively. For iteration 1 the RMC would have been preferred to the LR model.
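The seeding procedure listed above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the thesis code (the thesis lists R as its programming language): the function names, the use of Python's `random` module, the `max_tries` guard and the made-up modeling set are hypothetical, and the 10% criterion is read here as a relative difference between the two default rates, which the text does not state explicitly.

```python
import random


def default_rate(rows):
    """Share of observations whose default flag equals one."""
    return sum(row["default"] for row in rows) / len(rows)


def split_modeling_set(rows, split_id):
    """Split into training (2/3) and validation (1/3) without replacement."""
    shuffled = list(rows)
    random.Random(split_id).shuffle(shuffled)  # split_id fixes the permutation
    n_train = round(2 * len(shuffled) / 3)
    return shuffled[:n_train], shuffled[n_train:]


def resample(rows, seed, n_samples, tolerance=0.10, max_tries=10_000):
    """Return n_samples reproducible (split_id, training, validation) triples."""
    pool = random.Random(seed)  # the seed determines the whole seeding pool
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == n_samples:
            break
        split_id = pool.randrange(2**31)  # next identity from the pool
        training, validation = split_modeling_set(rows, split_id)
        rate_t, rate_v = default_rate(training), default_rate(validation)
        # Assumption: "more than 10%" is read as a relative difference.
        if abs(rate_t - rate_v) <= tolerance * max(rate_t, rate_v):
            accepted.append((split_id, training, validation))
    if len(accepted) < n_samples:
        raise RuntimeError("not enough acceptable splits found")
    return accepted


# Hypothetical modeling set: 600 firms, 60 of them defaulted (made-up data).
modeling_set = [{"firm": i, "default": int(i < 60)} for i in range(600)]
splits = resample(modeling_set, seed=2345, n_samples=5)
```

Because both the pool and each split are driven by seeded generators, calling `resample` again with the same seed returns the same N splits, which is the property the procedure is designed to guarantee when competing models are compared.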
The opposite conclusion would have been reached if the split of iteration 50 had been considered.

[Figure 2.4 plot: "Performance Comparison"; x-axis: Iteration (0 to 50); y-axis: PCA.stat; series: LR Model, RMC]

Figure 2.4: Comparison of the performance of a logistic regression model and RMC. The performances have been ordered in such a way that the performance of the LR model is in increasing order.

The datasets consist of creditworthiness data and the variable of whether the firm has defaulted a year later. The default variable is given the value one if the firm has defaulted and the value zero otherwise.

When the training and validation sets have been properly constructed, the modeling is performed. The modeling refers to the process of constructing a model that can predict whether a borrower will default on their loan, using previous information on similar firms. The proposed model is fitted using the data of the training set and then a prediction is made for the validation set. If logistic regression (footnote 15) is used as a modeling method then the predicted values will lie on the interval [0,1] and can be interpreted as probabilities of default (PD). Generally when one is modeling some event or non-event the predicted values are rounded to one for event and to zero for non-event. There is a problem with this, as the fitted values depend largely on the ratio of zeros to ones in the training sample. That is, when there are a lot of zeros compared to ones in the training set, which is the case for credit default data, the predicted values will be small. Those probabilities can be interpreted as the probability of default of an individual firm. An example of computed probabilities can be seen in Figure 2.5.

[Figure 2.5 plot: "Histogram of Probability of Default"; x-axis: Prob. Default (0.00 to 0.30); y-axis: Frequency (0 to 600)]

Figure 2.5: Example of an empirical distribution of probabilities of default (PD).

From Figure 2.5 it is apparent that the largest PD is considerably below 0.5 and thus all the fitted values would get the value zero if they were rounded to binary numbers. This is the main reason why ordinary classification and validation methods do not work on credit default data. The observed probabilities of default are small numbers and thus not easily interpreted. Hence, to enhance readability, the default probabilities are transformed to risk ratings. Rating Model Corporate has 12 possible ratings and the same transformation to a risk rating scale was used for the proposed models, in order to ensure comparability. The transformation from PDs to risk ratings is summarized in Table 2.1.

Footnote 15: Logistic regression is a modeling procedure that is specialized for modeling when the dependent variable is either one or zero. Logistic regression is introduced in Section 3.2.2 and a more detailed discussion can be found in Section 5.2.2.

PD-interval            Rating
[ 0.00%;   0.11%[      12
[ 0.11%;   0.17%[      11
[ 0.17%;   0.26%[      10
[ 0.26%;   0.41%[       9
[ 0.41%;   0.64%[       8
[ 0.64%;   0.99%[       7
[ 0.99%;   1.54%[       6
[ 1.54%;   2.40%[       5
[ 2.40%;   3.73%[       4
[ 3.73%;   5.80%[       3
[ 5.80%;   9.01%[       2
[ 9.01%; 100.00%]       1

Table 2.1: Probabilities of default (PD) are transformed to the relative risk rating.

It is apparent from Table 2.1 that the PD-intervals are very different in size. It is also apparent that low PDs, representing a good borrower, are transformed to a high risk rating. An example of a risk rating distribution can be seen in Figure 2.6. When the ratings have been observed it is possible to validate the results. That is done by computing the discriminatory power (footnote 16) of the observed ratings. The discriminatory power indicators are then compared to the indicators calculated for RMC in the specific validation set. The model performance is concluded from the discriminatory power indicators. Numerous discriminatory power methods are presented in Section 6.4.
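The transformation of Table 2.1 can be written as a simple lookup. The following Python sketch is only an illustration: the names UPPER_BOUNDS and pd_to_rating are hypothetical, and the PDs are assumed to be given as probabilities rather than percentages. It also shows why rounding small PDs to a binary prediction is uninformative.

```python
from bisect import bisect_right

# Upper bounds of the PD-intervals of ratings 12, 11, ..., 2 in Table 2.1,
# written as probabilities rather than percentages.
UPPER_BOUNDS = [0.0011, 0.0017, 0.0026, 0.0041, 0.0064, 0.0099,
                0.0154, 0.0240, 0.0373, 0.0580, 0.0901]


def pd_to_rating(prob):
    """Map a probability of default in [0, 1] to a rating from 12 (best) to 1."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("the probability of default must lie in [0, 1]")
    # bisect_right counts the bounds that are <= prob, which matches the
    # left-closed, right-open intervals of Table 2.1.
    return 12 - bisect_right(UPPER_BOUNDS, prob)


probs = [0.0005, 0.0011, 0.01, 0.05, 0.30]
print([round(p) for p in probs])         # naive rounding: every firm becomes 0
print([pd_to_rating(p) for p in probs])  # ratings 12, 11, 6, 3 and 1
```

A binary search over the interval bounds keeps the mapping in one place, so a different rating scale would only require a different list of bounds.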
Important information can be drawn from visual representations of the model performance, such as the relative and cumulative frequencies of the good and bad cases, respectively, and the respective ROC curve, which are all introduced in Sections 6.2 and 6.3. Visual comparison is not made when the modeling is performed on numerous modeling sets, that is when the resampling process is used.

Footnote 16: The term discriminatory power refers to the fundamental ability to differentiate between good and bad cases and is introduced in Section 6.1.

[Figure 2.6 plot: "Histogram of Predicted Ratings"; x-axis: Rating Class (2 to 12); y-axis: Relative Frequency (0.00 to 0.15)]

Figure 2.6: Example of a risk rating distribution, when the PDs have been transformed to risk ratings.

From the model performance it is possible to assess different variables and modeling procedures. The results can be seen in Section 7.

Chapter 3

Commonly Used Credit Assessment Models

In this chapter, credit assessment models commonly used in practice are presented. First their general functionality and application are introduced, followed by a light discussion of current research in the field. The credit assessment models are used to rate borrowers based on their creditworthiness and they can be grouped as seen in Figure 3.1. The three main groups are heuristic, statistical and causal models. In practice, combinations of heuristic models and either of the other two types are frequently used and referred to as hybrid models. The discussion here is adapted from Datschetzky et al. [13] (footnote 1), which should be consulted for a more detailed discussion.

Heuristic models are discussed in Section 3.1. A brief introduction to statistical models is given in Section 3.2, with a more detailed discussion in Chapter 5. In Section 3.3 models based on option pricing theory and cash flow simulation are introduced, and finally hybrid form models are introduced in Section 3.4.

Footnote 1: Chapter 3

Figure 3.1: Systematic overview of Credit Assessment Models.

3.1 Heuristic Models

Heuristic models attempt to use past experience to evaluate the future creditworthiness of a potential borrower.
Credit experts choose relevant creditworthiness factors and their weights, based on their experience. The significance of the factors is not necessarily estimated and their weights are not necessarily optimized.

3.1.1 Classic Rating Questionnaires

In classic rating questionnaires the credit institution's credit experts define clearly answerable questions regarding factors relevant to creditworthiness and assign a fixed number of points to specific answers. Generally, the higher the point score the better the credit rating will be. This type of model is frequently observed in the public sector, where the questionnaire is filled out by a representative of the credit institute. Examples of questions for a public sector customer might be sex, age, marital status and income.

3.1.2 Qualitative Systems

In qualitative systems the information categories relevant to creditworthiness are defined by credit experts, but in contrast to questionnaires, qualitative systems do not assign a fixed value to each factor. Instead, a representative of the credit institute evaluates the applicant for each factor. This might be done with grades, and the final assessment would then be a weighted or simple average of all grades. The grading system needs to be well documented in order to get similar ratings from different credit institute representatives.

In practice, credit institutions have used these procedures frequently, especially in the corporate customer segment. Improvements in data availability along with advances in statistics have reduced the use of qualitative systems.

3.1.3 Expert Systems

Expert systems are software solutions which aim to recreate human problem solving abilities. The system uses data and rules selected by credit experts in order to produce its expert evaluation.

Altman and Saunders [3] report that bankers tend to be overly pessimistic about credit risk and that multivariate credit-scoring systems tend to outperform such expert systems.

3.1.4 Fuzzy Logic Systems

Fuzzy logic systems can be seen as a special case of expert systems with the additional ability of fuzzy logic.
In a fuzzy logic system, specific values entered for creditworthiness criteria are not allocated to a single categorical term, e.g. high or low; rather they are assigned multiple values. As an example consider an expert system that rates firms with a return on equity of 15% or more as good and firms with a return on equity of less than 15% as poor. It is not in line with human decision-making behavior to have such sharp decision boundaries, as it is not sensible to rate a firm with a return on equity of 14.9% as poor and a firm with a return on equity of 15% as good. By introducing a linguistic variable as seen in Figure 3.2, a firm having a return on equity of 5% would be considered 100% poor and a firm having a return on equity of 25% would be considered 100% good. A firm with a return on equity of 15% would be considered 50% poor and 50% good. These linguistic variables are used in a computer based evaluation based on the experience of credit experts. The Deutsche Bundesbank uses discriminant analysis as its main modeling procedure, with an error rate of 18.7%; after introducing a fuzzy logic system the error rate dropped to 16%.

[Figure 3.2 plot: x-axis: Return on equity (%) (0 to 30); y-axis: 0 to 1; curves: Poor, Good]

Figure 3.2: Example of a Linguistic Variable.

3.2 Statistical Models

Statistical models rely on empirical data suggested by credit experts as predictors of creditworthiness, while heuristic models rely purely on the subjective experience of credit experts. In order to get good predictions from statistical models large empirical datasets are required. The traditional methods of discriminant analysis and logistic regression are discussed in Sections 3.2.1 and 3.2.2, respectively. More advanced methods for modeling credit risk are then discussed in Section 3.2.3.

3.2.1 Discriminant Analysis

In 1968, Altman [2] introduced his Z-score formula for predicting bankruptcy; this was the first attempt to predict bankruptcy by using financial ratios. To form the Z-score formula, Altman used linear multivariate discriminant analysis, with the original data sample consisting of 66 firms.
Half of the firms had filed for bankruptcy.

Altman proposed the following Z-score formula

$$Z = 0.012 X_1 + 0.014 X_2 + 0.033 X_3 + 0.006 X_4 + 0.999 X_5 \tag{3.1}$$

where

X1 = Working Capital / Total Assets.
     Measures net liquid assets in relation to the size of the company.
X2 = Retained Earnings / Total Assets.
     Measures profitability that reflects the company's age.
X3 = Earnings Before Interest and Taxes / Total Assets.
     Measures operating efficiency apart from tax and leveraging factors.
X4 = Market Value of Equity / Book Value of Total Debt.
     Measures how much the firm's market value can decline before it becomes insolvent.
X5 = Sales / Total Assets.
     Standard measure for turnover, which varies greatly from industry to industry.

In this form of the formula X1 to X4 are entered as percentages and X5 as a plain ratio.

All the values except the Market Value of Equity, in X4, can be found directly from the firm's financial statements. The weights of the original Z-score were based on data from publicly held manufacturers with assets greater than $1 million, but have since been modified for private manufacturing, non-manufacturing and service companies. The discrimination of the Z-score model can be summarized as follows

2.99 < Z-score            Firms having low probability of default
1.81 < Z-score < 2.99     Firms in the intermediate (grey) zone
Z-score < 1.81            Firms having high probability of default

Advances in computing capacity have made discriminant analysis (DA) a popular tool for credit assessment. The general objective of multivariate discriminant analysis is to distinguish between default and non-default borrowers with the help of several independent creditworthiness figures. Linear discriminant functions are frequently used in practice and can be given a simple explanation as a weighted linear combination of indicators. The discriminant score is

$$D = w_0 + w_1 X_1 + w_2 X_2 + \ldots + w_k X_k \tag{3.2}$$

The main advantage of DA, compared to other classification procedures, is that the individual weights show the contribution of each explanatory variable. The result of the linear function is then also easy to interpret: when a low Z-score is observed it represents a poor loan applicant.

The downside to DA is that it requires the explanatory variables to be normally distributed. Another prerequisite is that the explanatory variables are required to have the same variance for the groups to be discriminated.
In practice this is however often thought to be less significant and thus often disregarded.

Discriminant analysis is given a more detailed mathematical discussion in Section 5.3.

3.2.2 Logistic Regression

Another popular tool for credit assessment is logistic regression. Logistic regression uses as a dependent variable a binary variable that takes the value one if a borrower defaulted in the observation period and zero otherwise. The independent variables are all parameters potentially relevant to credit risk. Logistic regression is discussed further and in more detail in Section 5.2.2. A logistic regression is often represented using the logit link function as

$$p(X) = \frac{1}{1 + \exp[-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_k X_k)]} \tag{3.3}$$

where p(X) is the probability of default given the k input variables X. Logistic regression has several advantages over DA. It does not require the input variables to be normally distributed and thus qualitative creditworthiness characteristics can be taken into account. Secondly, the results of logistic regression can be interpreted directly as the probability of default. According to Datschetzky et al. [13] logistic regression has seen more widespread use both in academic research and in practice in recent years. This can be attributed to the flexibility in data handling and more readable results compared to discriminant analysis.

3.2.3 Other Statistical and Machine Learning Methods

In this section a short introduction is given to other methods which can be grouped under the heading of statistical and machine learning methods. As computing advanced, new methods were tried as credit assessment methods. Those include

- Recursive Partitioning Algorithm (RPA)
- k-Nearest Neighbor Algorithm (kNN)
- Support Vector Machine (SVM)
- Neural Networks (NN)

A brief introduction of those methods follows.

Recursive Partitioning Algorithm (RPA)

The Recursive Partitioning Algorithm (RPA) is a data mining method that employs decision trees and can be used for a variety of business and scientific applications. In a study by Frydman et al. [16] RPA was found to outperform discriminant analysis in most original sample and holdout comparisons.
Interestingly it was also observed that additional information was derived by using both RPA and discriminant analysis results. This method is also known as classification and regression trees (CART) and is given a more detailed introduction under that name in Section 5.5.

k-Nearest Neighbor Algorithm (kNN)

The k-Nearest Neighbor Algorithm is a non-parametric method that considers the average of the dependent variable of the k observations that are most similar to a new observation. It is introduced in Section 5.4.

Support Vector Machine (SVM)

The Support Vector Machine is a method closely related to discriminant analysis in which an optimal nonlinear boundary is constructed. This rather complex method is given a brief introduction in Section 5.3.3.

Neural Networks (NN)

Neural networks use information technology in an attempt to simulate the complicated way in which the human brain processes information. Without going into too much detail on how the human brain works, neural networks can be thought of as multi-stage information processing. In each stage hidden correlations among the explanatory variables are identified, making the processing a black box model (footnote 2). Neural networks can process any form of information, which makes them especially well suited to form good rating models. Combining black box modeling and a large set of information, NN generally show high levels of discriminatory power. However, the black box nature of NN results in great acceptance problems. Altman et al. [5] concluded that the neural network approach did not materially improve upon the linear discriminant structure.

3.2.4 Hazard Regression

Hazard regression (footnote 3) considers time until failure, which is default in the case of credit modeling. Lando [21] refers to hazard regression as the most natural statistical framework to analyze survival data, but as Altman and Saunders [3] point out, a financial institute would need a portfolio of some 20,000-30,000 firms to develop very stable estimates of default probabilities. Very few financial institutes worldwide come even remotely close to having this number of potential borrowers.
The Robert Morris Associates, Philadelphia, PA, USA, have though initiated a project to develop a shared national database, among larger banks, of historic mortality loss rates on loans. Rating agencies have adopted and modified the mortality approach and utilize it in their structured financial instrument analysis, according to Altman and Saunders [3].

Footnote 2: A black box model is a model where the internal structure of the model is not viewable.
Footnote 3: Hazard regression is also called survival analysis in the literature.

3.3 Causal Models

Causal models in credit assessment procedures use the analytics of financial theory to estimate creditworthiness. These kinds of models differ from statistical models in that they do not rely on empirical datasets.

3.3.1 Option Pricing Models

The revolutionary work of Black and Scholes (1973) and Merton (1974) formed the basis of option pricing theory. The theory, which was originally used to price options (footnote 4), can also be used to value default risk on the basis of individual transactions. Option pricing models can be constructed without using a comprehensive default history; however, they require data on the economic value of assets, debt and equity, and especially on volatilities. The main idea behind the option pricing model is that credit default occurs when the economic value of the borrower's assets falls below the economic value of the debt.

The data required make it impossible to use option pricing models in the public sector, and obtaining the data needed for the corporate sector is not without its problems; it is for example difficult in many cases to assess the economic value of assets.

3.3.2 Cash Flow Models

Cash flow models are simulation models of future cash flow arising from the assets being financed and are thus especially well suited for credit assessment in specialized lending transactions.
Thus the transaction itself is rated, not the potential borrower, and the result would thus be referred to as a transaction rating. Cash flow models can be viewed as a variation of the option pricing model where the economic value of the firm is calculated on the basis of expected future cash flow.

3.3.3 Fixed Income Portfolio Analysis

Since the pioneering work of Markowitz, 1959, portfolio theory has been applied to common stock data. The theory could just as well be applied to the fixed income area involving corporate and government bonds and even banks' portfolios of loans. Even though portfolio theory could be a useful tool for financial institutes, widespread use of the theory has not been seen according to Altman and Saunders [3]. Portfolio theory lays out how rational investors will use diversification to optimize their portfolio. The traditional objective of portfolio theory is to maximize return for a given level of risk, and it can also be used for guidance on how to price risky assets. Portfolio theory could be applied to a bank's portfolio to price new loan applicants, by determining interest rates, after calculating their probability of default (PD), which is their risk measure.

Footnote 4: A financial instrument that gives the right, but not the obligation, to engage in a future transaction on some underlying security.

3.4 Hybrid Form Models

The models discussed in previous sections are rarely used in their pure form. Heuristic models are often used in collaboration with statistical or causal models. Even though statistical and causal models are generally seen as better rating procedures, the inclusion of credit experts' knowledge generally improves ratings. In addition, not all statistical models are capable of processing qualitative information, e.g. discriminant analysis, or they require a large dataset to produce significant results.

The use of credit experts' knowledge also improves user acceptance.

There are four main architectures to combine the qualitative data with the quantitative data.

- Horizontal linking of model types. Both qualitative and quantitative data are used as input in the rating machine.
- Overrides. Here the rating obtained from either a statistical or a causal model is altered by the credit expert. This should only be done for few firms and only if it is considered necessary. Excessive use of overrides may indicate a lack of user acceptance or a lack of understanding of the rating model.
- Knock Out Criteria. Here the credit experts set some predefined rules, which have to be fulfilled before a credit assessment is made. This can for example mean that some specific risky sectors are not considered as possible customers.
- Special Rules. Here the credit experts set some predefined rules. The rules can take almost any form and regard every aspect of the modeling procedure. An example of such a rule would be that start-up firms could not get higher ratings than some predefined rating.

All or some of these architectures could be observed in hybrid models.

3.5 Performance of Credit Risk Models

In order to summarize the general performance of the models in this chapter, the performance of some of the models can be seen in Table 3.1. Datschetzky et al. [13] (footnote 5) report a list of Gini coefficient (footnote 6) values obtained in practice for different types of rating models.

Model                                                                 Gini coefficient
Univariate models                                                     In general, good individual indicators can reach 30-40%. Special indicators may reach approx. 55% in selected samples.
Classic rating questionnaire / qualitative systems                    Frequently below 50%.
Option pricing models                                                 Greater than 55% for exchange-listed companies.
Multivariate models (discriminant analysis and logistic regression)   Practical models with quantitative indicators reach approximately 60-70%.
Multivariate models with quantitative and qualitative factors         Practical models reach approximately 70-80%.
Neural networks                                                       Up to 80% in heavily cleansed samples; however, in practice this value is hardly attainable.

Table 3.1: Typical values obtained in practice for the Gini coefficient as a measure of discriminatory power.

As can be seen in Table 3.1, multivariate models generally outperform option pricing models by quite a margin.
The importance of qualitative factors as modeling variables is also clear. Neural networks have also been shown to produce great performance, but the high complexity of the rating procedure makes neural networks a less attractive option.

In the study of Yu et al. [34] highly evolved neural networks were compared with logistic regression, a simple artificial neural network (ANN) and a support vector machine (SVM). The study also compared a fuzzy support vector machine (Fuzzy SVM). The study was performed on detailed information of 60 corporations, of which 30 were insolvent. The results reported in Table 3.2 (footnote 7) show that logistic regression has the worst performance of all the single modeling procedures, whereas SVM performs best of the single modeling procedures. By introducing fuzzy logic to the SVM the performance improves. The multi-stage reliability-based neural network ensemble learning models all show similar performance and outperform the single and hybrid form models significantly.

Footnote 5: pp. 109
Footnote 6: The Gini coefficient ranges from zero to one, one being optimal. The Gini coefficient is introduced in Section 6.4.

Category   Model               Rule       Average Hit Rate (%)
Single     LogR                           70.77 [5.96]
           ANN                            73.63 [7.29]
           SVM                            77.84 [5.82]
Hybrid     Fuzzy SVM                      79.00 [5.65]
Ensemble   Voting-based        Majority   81.63 [7.33]
           Reliability-based   Maximum    84.14 [5.69]
                               Minimum    85.01 [5.73]
                               Median     84.25 [5.86]
                               Mean       85.09 [5.68]
                               Product    85.87 [6.59]

Table 3.2: Results of a comprehensive study of Yu et al. [34], with emphasis on neural networks. The figures in brackets are the standard deviations.

Galindo and Tamayo [17] conducted an extensive comparative study of different statistical and machine learning classification methods on a mortgage loan dataset. Their findings for a training sample of 2,000 records are summarized in Table 3.3.
The results show that CART decision-tree models provide the best estimation for default with an average hit rate of 91.69%. Neural networks provided the second best results with an average hit rate of 89.00%. The K-Nearest Neighbor algorithm had an average hit rate of 85.05%. These results outperformed a logistic regression model using the probit link function, which attained an average hit rate of 84.87%. Although the results are for mortgage loan data, it is clear that the performance of logistic regression models can be surpassed.

Model                Average Hit Rate (%)
CART                 91.69
Neural Networks      89.00
K-Nearest Neighbor   85.05
Probit               84.87

Table 3.3: Performance of different statistical and machine learning classification methods on a mortgage loan dataset.

Footnote 7: Total Hit Rate = (number of correct classifications) / (number of evaluation samples)

Current studies

Credit crises in the 70s and 80s fueled research in the field, resulting in great improvements in observed default rates. High default rates in the early 90s and at the beginning of the new millennium have ensured that credit risk modeling is still an active research field. In the light of the financial crisis of 2008, research in the field is sure to continue. Most of the current research is highly evolved and well beyond the scope of this thesis and is thus just given a brief discussion.

Even though it is not very practical for most financial institutes, much current research is focused on option pricing models. Lando [21] introduces intensity modeling as the most exciting research area in the field. Intensity models can be explained in a naive way as a mixture of hazard regression and standard pricing machinery. The objective of intensity models is not to get the probability of default but to build better models for credit spreads and default intensities. The mathematics of intensity models is highly evolved and one should refer to Lando [21] for a complete discussion of the topic.

The subject of credit pricing has also been subject to extensive research, especially as credit derivatives have seen more common use.
The use of macroeconomic variables is seen as material for prospective studies.

The discussion here on credit assessment models is rather limited. For further reading one could consult Altman and Saunders [3] and Altman et al. [4] for a discussion of the development in credit modeling, and Datschetzky et al. [13] for a good overview of models used in practice. Lando [21] then gives a good overview of current research in the field, along with an extensive list of references.

Chapter 4

Data Resources

The times we live in are sometimes referred to as the information age, as the technical breakthrough of commercial computers has made information recording an easier task. Along with increased information it has also made computations more efficient, furthering advances in practical mathematical modeling.

In the development of a statistical credit rating model the quality of the data used in the model development is of great importance. Especially important is the information on the few firms that have defaulted on their liabilities.

In this chapter the data made available by the co-operating corporate bank are presented. This chapter is partly influenced by the co-operating bank's in-house paper Credit [11]. Section 4.1 introduces data dimensionality and discusses data processing. Quantitative and qualitative figures are introduced in Sections 4.2 and 4.3, respectively. Customer factors are introduced in Section 4.4 and other factors and figures are introduced in Section 4.5. Finally, some preliminary data analysis is performed in Section 4.6.

4.1 Data dimensions

The data used in the modeling process are the data used in the co-operating corporate bank's current credit rating model, which is called Rating Model Corporate (RMC) and is introduced in Section 4.5.1. The available data can be grouped according to their identity into the following groups

- Quantitative
- Qualitative
- Customer factors
- Other factors and figures

Rating Model Corporate is a heuristic model and was developed in 2004. Therefore, the first raw data are from 2004, as can be seen in Table 4.1.
In order to validate the performance of the credit rating model the dependent variable, which is whether the firm has defaulted on its obligations a year after it was rated, is needed. In order to construct datasets that are suitable for validation, firms that are not observed in two successive years, and are thus either a new customer or a retiring one, are removed from the dataset. The first validation was done in 2005 and from Table 4.1 it can be seen that the observations of the constructed 2005 dataset are noticeably fewer than those of the raw datasets of 2004 and 2005, due to the exclusion of new or retiring customers. The constructed datasets are the datasets on which the co-operating bank would perform its validation; they are however not suitable for modeling purposes. The reason is that there are missing values in the constructed datasets.

By removing missing values from the constructed dataset a complete dataset is obtained. It is complete in the sense that there are equally many observations for all variables. The problem with removing missing values is that a large proportion of the data is thrown away, as can be seen in Table 4.1. Some variables have more missing values than others, and excluding some of the variables with many missing values would result in a larger modeling dataset.

When the data have been cleansed they are split into training and validation sets. The total data will be split approximately as follows: 50% will be used as a training set, 25% as a validation set and 25% as a test set:

Training | Validation | Test

Data Set            Rows   Columns
Raw Data
- 2008              4063   2
- 2007              4125   29
- 2006              4237   29
- 2005              4262   29
- 2004              4521   29
Constructed Data
- 2008              3600   29
- 2007              3599   29
- 2006              3586   29
- 2005              3788   29
Complete Data
- 2008              2365   29
- 2007              2751   29
- 2006              2728   29
- 2005              2717   29

Table 4.1: Summary of data dimensions and usable observations.

The training set is used to fit the model and the validation set is used to estimate the prediction error for model selection. In order to account for the small sample of data, that is of bad cases, the process of splitting, fitting and validation is performed repeatedly.
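The seeded resampling procedure of Section 2.3 (a seed, a seeding pool and a check on the default rates of each split) can be sketched in Python as follows. This is a sketch under stated assumptions and not the implementation used in this work: the function names are hypothetical, Python's random module stands in for the random number generator, the 10% limit is read as a relative difference with respect to the training set default rate, and the toy modeling set of 300 observations with 30 defaults is invented for illustration.

```python
import random


def default_rate(sample):
    """Share of defaulted observations (default == 1) in a sample."""
    return sum(obs["default"] for obs in sample) / len(sample)


def split_once(modeling_set, split_seed, train_fraction=2 / 3):
    """Split into training and validation sets, sampling without replacement."""
    rng = random.Random(split_seed)
    shuffled = rng.sample(modeling_set, len(modeling_set))
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def resample_splits(modeling_set, seed=2345, n_samples=5,
                    tolerance=0.10, max_tries=10_000):
    """Return n_samples reproducible (training, validation) splits."""
    pool = random.Random(seed)  # source of the seeding pool
    splits = []
    for _ in range(max_tries):
        if len(splits) == n_samples:
            break
        split_seed = pool.randrange(2**32)  # next identity from the pool
        train, valid = split_once(modeling_set, split_seed)
        rate_train, rate_valid = default_rate(train), default_rate(valid)
        # Assumption: the difference in default rates is measured relative
        # to the training set default rate. Splits outside the tolerance
        # are rejected and the next identity from the pool is tried.
        if rate_train > 0 and abs(rate_train - rate_valid) <= tolerance * rate_train:
            splits.append((train, valid))
    if len(splits) < n_samples:
        raise RuntimeError("not enough acceptable splits were found")
    return splits


# Toy modeling set: 300 observations of which 30 are defaults.
data = [{"id": i, "default": 1 if i < 30 else 0} for i in range(300)]
for train, valid in resample_splits(data):
    print(len(train), len(valid), default_rate(train), default_rate(valid))
```

Because every split is derived from the single seed, calling resample_splits again with the same seed reproduces exactly the same N splits, which is what makes the averaged performance of different candidate models comparable.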
The average performance of the repeated evaluations is then considered in the model development.

The test set is then used to assess the generalization error of the final model chosen. The training and validation sets, together called modeling sets, are randomly chosen sets from the 2005, 2006 and 2007 datasets, whereas the test set is the 2008 dataset. The repeated splitting of the modeling sets is done by choosing a random sample without replacement such that the training set is 2/3 and the validation set is 1/3 of the modeling set.

To see how the co-operating bank's portfolio is concentrated between sectors the portfolio is split up into five main sectors, those are:

- Real estate
- Trade
- Production
- Service
- Transport

The portfolio is split according to an in-house procedure largely based on Danish legislation called the Danish Industrial Classification 2003 (DB03), which is based on EU legislation. To view how the portfolio is divided between sectors, the total number of observations in the complete dataset and the respective percentage of each sector can be seen in Table 4.2. Table 4.2 also shows the number of defaulted observations in each sector and the relative default rate.

Sector        Observations [%]   Default observations [%]   Default rate (%)
Real Estate   2295 [ 28.0]        21 [ 15.2]                0.92
Trade         1153 [ 14.1]        11 [  8.0]                0.95
Production    3181 [ 38.8]        82 [ 59.4]                2.58
Service       1348 [ 16.5]        21 [ 15.2]                1.56
Transport      219 [  2.7]         3 [  2.2]                1.37
All           8196 [100.0]       138 [100.0]                1.68

Table 4.2: Summary of the portfolio's concentration between sectors and sector-wise default rates.

By analyzing Table 4.2 it is apparent that the production sector is the largest and has the highest default rate. On the other hand the trade and real estate sectors have rather low default rates.

It is difficult to generalize about what default rate can be considered normal, but some assumptions can be made by considering the average default rates of the period 1982-2006 in the U.S. reported by Altman et al. [4]. Most of the observations there are between one and two percent, which might be considered normal default rates.
There are not as many observations between two and five percent, which can then be considered high default rates, and percentages above five can be considered very high.

4.2 Quantitative key figures

As a quantitative measure of creditworthiness financial ratios are used. A financial ratio is a ratio of selected values on a firm's financial statements (footnote 1). Financial ratios can be used to quantify many different aspects of a firm's financial performance and allow for comparison between firms in the same business sector. Furthermore, financial ratios can be used to compare a firm to its sector average and to consider its variation over time. Financial ratios can vary greatly between sectors and can be categorized by which aspect of business they describe. The categories are as follows.

- Liquidity ratios measure the firm's availability of cash to pay debt.
- Leverage ratios measure the firm's ability to repay long-term debt.
- Profitability ratios measure the firm's use of its assets and control of its expenses to generate an acceptable rate of return.
- Activity ratios measure how quickly a firm converts non-cash assets to cash assets.
- Market ratios measure investor response to owning a company's stock and also the cost of issuing stock.

Only the first four categories of these ratios are used to measure a firm's creditworthiness, as the market ratios are mostly used in the financial markets. The discussion here and in the following sections on financial ratios is largely adapted from Credit [11] and Bodie et al. [9].

As the values used to calculate the financial ratios are obtained from firms' financial statements, it is only possible to calculate financial ratios when a firm has published its financial statements. This produces two kinds of problems: firstly, new firms do not have financial statements, and secondly, new data are only available once a year.

Mathematically, financial ratios will be referred to by the Greek letter alpha, α. Financial ratios are also referred to as key figures or key ratios both in this work and in the literature. The summary statistics and figures are obtained by using the complete datasets.

Footnote 1: Financial statements are reports which provide an overview of a firm's financial condition in both the short and long term.
Financial statements are usually reported annually and split into two main parts, first the balance sheet and second the income statement. The balance sheet reports current assets, liabilities and equity, while the income statement reports the income, expenses and the profit/loss of the reporting period.

4.2.1 Liquidity Ratio

The liquidity ratio is a financial ratio that is used as a measure of liquidity. The term liquidity refers to how easily an asset can be converted to cash. The liquidity ratio in equation (4.1) consists of current assets (footnote 2) divided by current liabilities (footnote 3) and is thus often referred to as the current ratio. The liquidity ratio is considered to measure to some degree whether or not a firm has enough resources to pay its debts over the next 12 months.

$$\alpha_{\text{liquidity}} = \frac{\text{Current assets}}{\text{Current liabilities}} \tag{4.1}$$

The liquidity ratio can also be seen as an indicator of the firm's ability to avoid insolvency in the short run and should thus be a good indicator of creditworthiness. By considering the components of equation (4.1), it can be seen that a large positive value of the current ratio can be seen as a positive indicator of creditworthiness. The case where the current liabilities are zero is considered a positive indicator of creditworthiness, and the liquidity ratio is then given the extreme value 1000. In Table 4.3 the summary statistics of the liquidity ratio can be seen for all sectors and each individual sector.

Statistics   All Sectors   Real Estate   Trade    Production   Service   Transport
Min.          0.65           0.09         0.01     0.65         0.00      0.00
1st Qu.       0.83           0.14         0.94     0.83         0.53      0.47
Median        1.11           0.62         1.19     1.11         0.97      0.69
Mean          1.26           2.31         1.53     1.26         1.57      0.86
3rd Qu.       1.46           1.58         1.60     1.46         1.48      0.99
Max.         25.64         275.50        37.21    25.64        91.80     10.54
ev(1000)      0.95%          2.48%        0.78%    0.22%        0.37%     0.0%

Table 4.3: Summary statistics of the liquidity ratio, without the 1000 values. The rate of observed extreme values, ev(1000), is also listed for each sector.

As can be seen in Table 4.3, by looking at the median and first quartile, the real estate sector has the lowest liquidity ratio. The transport sector also has low liquidity ratios.
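The treatment of zero current liabilities in equation (4.1) and the ev(1000) row of Table 4.3 can be illustrated with a small Python sketch. The function names and the four example firms are hypothetical and only serve to show how the extreme value is assigned and then excluded from the summary statistics.

```python
from statistics import median

EXTREME_VALUE = 1000.0  # assigned when the current liabilities are zero


def liquidity_ratio(current_assets, current_liabilities):
    """Current assets divided by current liabilities, as in equation (4.1)."""
    if current_liabilities == 0:
        return EXTREME_VALUE
    return current_assets / current_liabilities


def summarize(ratios):
    """Median without the extreme values, and the rate of extreme values."""
    # Note: a genuine ratio of exactly 1000 would also be treated as extreme.
    regular = [r for r in ratios if r != EXTREME_VALUE]
    return {
        "median": median(regular),
        "ev_rate": (len(ratios) - len(regular)) / len(ratios),
    }


# Hypothetical (current assets, current liabilities) pairs.
firms = [(150.0, 100.0), (50.0, 0.0), (40.0, 80.0), (300.0, 150.0)]
ratios = [liquidity_ratio(assets, liabilities) for assets, liabilities in firms]
print(ratios)             # [1.5, 1000.0, 0.5, 2.0]
print(summarize(ratios))  # {'median': 1.5, 'ev_rate': 0.25}
```

Keeping the extreme value out of the summary statistics, as done in Table 4.3, prevents a handful of firms without current liabilities from dominating the mean and the maximum.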
The liquidity ratio for all sectors and each individual sector can be seen in Figure 4.1.

² Current assets are cash and other assets expected to be converted to cash, sold, or consumed within a year.
³ Current liabilities are liabilities that are reasonably expected to be liquidated within a year. They usually include, amongst others, wages, accounts, taxes, short-term debt and the proportion of long-term debt to be paid this year.

The liquidity ratio will simply be referred to as the liquidity, as it measures the firm's ability to liquidate its current assets by turning them into cash. It is worth noting, though, that it is just a measure of liquidity, as the book value of assets might be considerably different from their actual value. Mathematically, the liquidity will be referred to as $\alpha_l$.

4.2.2 Debt ratio

The Debt ratio is a key figure consisting of net interest bearing debt divided by the earnings before interest, taxes, depreciation and amortization (EBITDA)⁴. The Debt ratio can be calculated using equation (4.2), where the figures are obtainable from the firm's financial statement.

\[
\frac{\text{Debt}}{\text{EBITDA}} = \frac{\text{Net interest bearing debt}}{\text{Operating profit/loss} + \text{Depreciation/Amortization}} \tag{4.2}
\]

Here the net interest bearing debt can be calculated from the firm's financial statement and equation (4.3).

\[
\begin{aligned}
\text{Net interest bearing debt} ={} & \text{Subordinated loan capital} + \text{Long-term liabilities} \\
& + \text{Current liabilities to mortgage banks} + \text{Current bank liabilities} \\
& + \text{Current liabilities to group} + \text{Current liabilities to owner, etc.} \\
& - \text{Liquid funds} - \text{Securities} - \text{Group debt} \\
& - \text{Outstanding accounts from owner, etc.}
\end{aligned}
\tag{4.3}
\]

The Debt ratio is a measure of the pay-back period, as it indicates how long it would take to pay back all liabilities with the current operating profit. The longer the payback period, the greater the risk, and thus small ratios indicate that the firm is in a good financial position. As both debt and EBITDA can be negative, some precautions have to be taken, since a negative ratio can have two different meanings. In the case where the debt is negative, this is a positive thing and the ratio should thus be overwritten as zero or a negative number to indicate positive creditworthiness.
In the case where the EBITDA is negative or zero, the ratio should be overwritten as a large number to indicate poor creditworthiness. In the original dataset these figures are -1000 and 1000, respectively. In the case when both values are negative, they are assigned the resulting positive value, even though negative debt can be considered a much more positive thing.

⁴ Amortization is the write-off of intangible assets and depreciation is the wear and tear of tangible assets.

[Figure 4.1: six histogram panels (All Sectors, Real Estate, Trade, Production, Service, Transport); x-axis: Liquidity, y-axis: Density.]

Figure 4.1: Histogram of the liquidity ratio for all sectors and each individual sector. The figures show a refined scale of this key figure for the complete dataset.

The overwritten values have to be carefully selected in order to prevent the regression from becoming unstable. Histograms of the Debt ratio for all sectors and each individual sector can be seen in Figure 4.2. The ±1000 values make it hard to see the distribution of the other figures and are thus not shown. As can be seen in Figure 4.2, the debt ratio differs between sectors. In particular, the ratio is on average larger for the real estate sector than for the other sectors. In order to get an even better view of this key figure, summary values for all sectors and each individual sector can be seen in Table 4.4.

Statistics   All Sectors   Real Estate   Trade    Production   Service   Transport
Min.         0.01          0.00          0.01     0.01         0.00      0.24
1st Qu.      1.64          4.95          2.18     1.64         2.18      1.95
Median       3.14          7.67          4.00     3.14         3.93      3.27
Mean         5.87          11.56         6.62     5.87         6.78      5.87
3rd Qu.      5.21          11.42         6.59     5.21         6.90      5.16
Max.         469.90        454.70        601.00   469.90       162.40    157.10
ev(1000)     6.73%         6.58%         6.50%    6.41%        8.61%     2.74%
ev(-1000)    5.17%         4.23%         4.16%    5.28%        7.79%     2.74%

Table 4.4: Summary of Debt/EBITDA, for all sectors and each individual sector, without figures outside the ±1000 range. The rate of the extreme values ev(1000) and ev(-1000) for each sector is also listed.

From Table 4.4 it is clear that the real estate sector has a considerably larger Debt ratio than the other sectors, which are all rather equal. The inconsistency between sectors has to be considered before modeling. Mathematically, the Debt ratio will be referred to as $\alpha_d$.

4.2.3 Return on Total Assets

The Return On total Assets (ROA) percentage shows how profitable a company's assets are in generating revenue. The total assets are approximated as the average of this year's total assets and last year's total assets, which are the assets that formed the operating profit/loss. Return On total Assets is a measure of profitability and can be calculated using equation (4.4) and the relevant components from the firm's financial statements.

\[
\mathrm{ROA} = \frac{\text{Operating profit/loss}}{\tfrac{1}{2}\left(\text{Balance sheet}_{0} + \text{Balance sheet}_{-1}\right)} \tag{4.4}
\]

[Figure 4.2: six histogram panels (All Sectors, Real Estate, Trade, Production, Service, Transport); x-axis: Debt/EBITDA, y-axis: Density.]

Figure 4.2: Histograms of Debt/EBITDA for all sectors and each individual sector, in a refined scale. The ±1000 values are not shown.

In equation (4.4) the balance sheets⁵ have the subscripts zero and minus one, which refer to the current and last year's assets, respectively. For firms that only have the current balance sheet, that value is used instead of the average of the current and last year's assets. Return on assets gives an indication of the capital intensity of the firm, which differs between sectors. Firms that have undergone large investments will generally have lower return on assets.
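The overwrite rules for Debt/EBITDA in Section 4.2.2 and the fallbacks for equation (4.4) can be sketched together in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, ROA is multiplied by 100 because the text speaks of a percentage, a zero EBITDA combined with negative debt is treated as the poor case (the text does not cover that combination), and a missing balance sheet is mapped to -100 as the text states for start-up firms.

```python
GOOD_DEBT = -1000.0  # overwrite value for negative debt (positive creditworthiness)
BAD_DEBT = 1000.0    # overwrite value for negative or zero EBITDA
NO_BALANCE = -100.0  # ROA value given to firms without a balance sheet


def debt_ratio(net_debt, ebitda):
    """Equation (4.2) with the overwrite rules of Section 4.2.2."""
    if net_debt < 0 and ebitda < 0:
        # Both negative: the resulting positive ratio is kept, as in the text.
        return net_debt / ebitda
    if ebitda <= 0:
        # Assumption: EBITDA == 0 with negative debt also ends up here.
        return BAD_DEBT
    if net_debt < 0:
        return GOOD_DEBT
    return net_debt / ebitda


def return_on_assets(operating_profit, assets_now, assets_prev=None):
    """Equation (4.4) in percent, with the fallbacks described in the text."""
    if not assets_now:
        # No (or zero) balance sheet, e.g. a start-up firm.
        return NO_BALANCE
    if assets_prev is None:
        average_assets = assets_now  # only the current balance sheet exists
    else:
        average_assets = 0.5 * (assets_now + assets_prev)
    return 100.0 * operating_profit / average_assets


print(debt_ratio(50.0, 10.0), debt_ratio(-50.0, 10.0), debt_ratio(50.0, -10.0))
print(return_on_assets(10.0, 100.0, 300.0), return_on_assets(10.0, None))
```

Handling the sign combinations explicitly, before dividing, is what keeps a negative ratio from being read as good news when it was produced by a negative EBITDA.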
Start-up firms do not have a balance sheet and are thus given the poor creditworthiness value -100. By taking a look at the histograms of the ROA in Figure 4.3, it is clear that the transport sector and especially the real estate sector have a quite different distribution compared to the other sectors.

Statistics   All Sectors   Real Estate   Trade     Production   Service   Transport
Min.         -104.10       -100.00       -100.00   -104.10      -100.00   -100.00
1st Qu.      3.17          3.13          3.93      3.17         2.69      3.78
Median       7.43          5.67          7.71      7.43         6.67      6.97
Mean         1.15          2.30          4.67      1.15         3.06      2.23
3rd Qu.      12.60         8.23          13.12     12.60        11.44     9.76
Max.         93.05         203.30        104.50    93.05        105.50    31.55
ev(-100)     6.49%         5.01%         5.90%     8.20%        5.86%     4.11%

Table 4.5: Summary of Return On total Assets

As can be seen from Table 4.5, the ROA differs significantly between sectors. The mean values might be misleading and it is better to consider the median value and the first and third quartiles. It can be seen that the transport and real estate sectors do not have as high ROA as the others, which can partly be explained by the large investments made by many real estate sector firms. It is also observable that the first quartile of the service sector is considerably lower than the others, indicating a heavier negative tail than the other sectors.

4.2.4 Solvency ratio

Solvency can also be described as the ability of a firm to meet its long-term fixed expenses and to accomplish long-term expansion and growth. The Solvency ratio, which is also often referred to as the equity ratio, consists of the shareholders' equity⁶ divided by the balance sheet, both obtainable from the firm's financial statement.

\[
\text{Solvency} = \frac{\text{Shareholders' equity}}{\text{Balance sheet}} \tag{4.5}
\]

⁵ Balance sheet = Total Assets = Total Liabilities + Shareholders' Equity
⁶ Equity = Total Assets - Total Liabilities.
Equity is defined in Section ??.

[Figure 4.3: six histogram panels (All Sectors, Real Estate, Trade, Production, Service, Transport); x-axis: Return, y-axis: Density.]

Figure 4.3: Histograms of the Return On total Assets for all sectors and each individual sector.

The balance sheet can be considered as either the total assets or the sum of total liabilities and shareholders' equity. By considering the balance sheet to be the sum of total liabilities and shareholders' equity, the solvency ratio describes to what degree the shareholders' equity is funding the firm. The solvency ratio is a percentage and ideally lies in the interval [0%, 100%]. The higher the solvency ratio, the better the firm is doing financially.

By viewing Table 4.6 it can be seen that the minimum values are large negative figures. This occurs when the valuation placed on assets does not exceed liabilities, so that negative equity exists. In the case when the balance sheet is zero, as is the case for newly started firms, the Solvency ratio is given the extremely negative creditworthiness value of -100.

Statistics   All Sectors   Real Estate   Trade     Production   Service   Transport
ev(-100)     3.64%         4.23%         4.86%     3.02%        3.12%     3.20%
Min.         -138.10       -133.40       -100.00   -138.10      -234.40   -100.00
1st Qu.      14.59         10.06         13.59     14.59        13.01     13.30
Median       24.27         22.00         22.52     24.27        24.82     18.23
Mean         23.19         22.72         20.87     23.19        25.09     18.30
3rd Qu.      34.96         38.01         34.95     34.96        41.11     27.79
Max.         99.57         100.00        100.00    99.57        100.00    83.48

Table 4.6: Summary statistics of the Solvency ratio

To get a better view of the distribution of the Solvency ratio, histograms of the solvency ratio can be seen in Figure 4.4. As can be seen in Figure 4.4, the distribution is mainly on the positive side of zero. The transport and real estate sectors look quite different compared to the other sectors. By considering the median value and the first and third quartiles, it is observable that the trade and production sectors are quite similar.
The real estate and service sectors are tailed towards 100, while the real estate sector is also tailed towards zero.

4.2.5 Discussion

Firms that have just started business do not have any financial statements from which to construct the quantitative key figures. In order to assess the creditworthiness of a start-up firm there are two possibilities. One is to build a separate start-up model and the other is to adapt the start-up firms to the rating model.

There is one other thing worth noting regarding financial ratios, and that is that they are constructed on values that are called book values and might be far from the actual market value. The book value of liabilities is subject to less uncertainty, but might be subject to some uncertainty in interest and exchange rates, that is, if the firm holds debt that carries adjustable rates or is in foreign currencies, respectively. As the equity is calculated as the difference between the total assets and total liabilities, the equity value might be far from the actual market value. This fact results in some deterioration of the predictive power of the financial ratios.

[Figure 4.4: six histogram panels (All Sectors, Real Estate, Trade, Production, Service, Transport); x-axis: Solvency, y-axis: Density.]

Figure 4.4: Histograms of the Solvency ratio for all sectors and each individual sector.

4.2.6 Scaled Key Figures

By considering the key figures in the previous sections, it is clear that there are two problematic situations. First, it is difficult to decide what values should be assigned in the cases when the actual ratio is nonsense, and second, there is the difference between sectors. The predictive power of the key figures would be poor, especially for some sectors, if they were used without correcting them for each sector. An article by Altman and Saunders [3] reports that sector relative financial ratios, rather than simple firm specific financial ratios, are better predictors of corporate default.
It is stated that, in general, the sector relative financial ratio model outperformed the simple firm specific model.

The key figures have been scaled by the co-operating bank for use in their Rating Model Corporate (RMC). The scaling process is performed in such a way that the scaled key figures are on a continuous scale from 1 to 7, where 1 indicates a bad situation and 7 indicates a good situation. In the cases when the actual ratios are nonsense, they are assigned the value 1 if they are to represent poor creditworthiness and 7 if they are to represent positive creditworthiness. After the simple firm specific financial ratios have been scaled to correct them for each sector, they are referred to as scores. Since they have been adjusted for their sector, it is of no interest to consider each sector separately.

Histograms of the scaled quantitative factors along with the default variable and RMC's ratings can be seen in Figure 4.5. In the same figure one can see the Spearman's rank correlation⁷ and dotplots of the scaled key figures. The Spearman's rank correlation is used as an alternative to the Pearson's correlation, as it is a non-parametric procedure and thus does not need any distributional assumptions. In Figure 4.5 it can be seen that there is some correlation between the scaled key figures, especially between the debt and return scores and between the liquidity and solvency scores.

⁷ Correlation is a numerical measure of how related two variables are. Correlation coefficients range from minus one to one, where one means that the two variables move perfectly together and minus one that they move perfectly in opposite directions. If the correlation coefficient is zero, there is no such relation between the two variables (no linear relation for Pearson's correlation, no monotone relation for Spearman's).

Mathematically, the scaled key figures will be referred to as the Greek letter alpha with a tilde sign above it, $\tilde{\alpha}$.

4.3 Qualitative figures

In the credit application process, credit experts rate the potential borrower in six different aspects, reflecting the firm's position in that particular field.
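Since Spearman's rank correlation is used for both the scaled key figures and the qualitative ratings, a small self-contained Python sketch may help to show what is being computed: the observations are replaced by their ranks (ties share the average rank) and Pearson's correlation is then taken on the ranks. The function names and the toy data are hypothetical, and the thesis does not state which software was used for its figures.

```python
import math


def average_ranks(values):
    """Rank the values from 1..n; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores on the 1-7 scale for five firms
debt_score = [1.0, 2.5, 4.0, 5.5, 7.0]
return_score = [1.2, 1.9, 3.0, 6.1, 6.8]
print(spearman(debt_score, return_score))
```

Because only the ordering of the observations enters the calculation, any monotone rescaling of a score leaves the coefficient unchanged, which is why no distributional assumption is needed.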
The fields that make up the qualitative figures are the following.

- Management and strategy
- Sector stability and prospects
- Market position
- Staff situation, production facilities and asset assessment
- Financial risk and management
- Refunding

The customer chief handling the loan application rates the potential borrower in each field. The qualitative ratings are on a discrete scale from 1 to 7, where 1 indicates a bad situation and 7 indicates a good situation. Those ratings then need to be accepted by administrators in the credit department of the bank. It is possible to reject each individual factor if it is not relevant to a firm.

In order to get a better feel for the qualitative factors, a dotplot can be seen in Figure 4.6, where red dots are defaulted firms and black dots are solvent firms. In the same figure one can see the Spearman's rank correlation and histograms of the qualitative factors. From Figure 4.6 it is clear that the qualitative factors are considerably correlated. It is also noticeable that red dots appear more often in the lower left corner of the dotplots, indicating that qualitative factors have some predictive power.

For example, new firms do not have earlier min or max ratings, so if those variables are to be used for modeling purposes it would result in smaller datasets. For the qualitative figures there are quite a few cases where one of the six values is missing, and in order to save the observation from being omitted it would be possible to consider the principal component⁸ representatives of the qualitative figures.

[Figure 4.5: scatterplot matrix of DEFAULT, RATING, DEBT_SCORE, LIQUIDITY_SCORE, RETURN_SCORE and SOLVENCY_SCORE.]

Figure 4.5: Dotplot for all the scaled quantitative factors along with the default variable and RMC ratings, where red dots are defaulted firms and black dots are solvent firms. In the lower triangle the correlations of the variables can be seen, and on the diagonal their respective histograms.

[Figure 4.6: scatterplot matrix of MANAGEMENT, STABILITY, POSITION, SITUATION, REFUNDING and RISK.]

Figure 4.6: Dotplot for all the qualitative factors, where red dots are defaulted firms and black dots are solvent firms. In the lower triangle there are the correlations of the qualitative factors, and on the diagonal there are histograms of them.

In mathematical notation the qualitative figures will be referred to as the Greek letter phi, $\phi$.

4.4 Customer factors

The customer factors that are listed in Table 4.7 are those available in the data, as they are used in Rating Model Corporate.

Customer Factor                                  Factor level
Is there an accountant's annotation              Yes, an outright reservation
in the financial statements?                     Yes, a supplementary remark
                                                 No
Has the company failed to perform                Yes, within the past year
its obligation to FIH?                           Yes, within the past 12-24 months
                                                 No
Is the company listed on the                     No
stock exchange?                                  Yes, but the shares are not OMXC20-listed
                                                 Yes, and the shares are OMXC20-listed
Age of company with current                      Up to and including 24 months
operation?                                       From 25 months and up to and including 60 months
                                                 From 61 months and older

Table 4.7: Customer factors used in the Corporate Model.

As can be seen from Table 4.7, the customer factors all have three levels, where the most negative ones are in the highest row and they get more positive as they get lower. The stock exchange listed firms are unlikely to have any predictive power, as there are very few stock exchange listed firms in the portfolio and, furthermore, being stock exchange listed is not an indicator of a more likely default event; on the contrary. The stock exchange listed firms can thus only be used as a heuristic variable, giving stock exchange listed firms a higher rating than estimated.

⁸ The principal component analysis method is presented in Section 5.6.
The reason for this is that stock exchange listed firms have an active market for their shares and can go to the market when in need of money by offering more shares.

Mathematically, the customer factors will be referred to as the Greek letter gamma, $\gamma$.

4.5 Other factors and figures

In this section some of the factors and figures that are not part of the qualitative figures, quantitative figures or customer factors are presented.

4.5.1 Rating Model Corporate

The rating model used by FIH today is called Rating Model Corporate. As it is a rather delicate industrial secret it will just be briefly introduced. The model is a heuristic⁹ model which uses the variables presented in the previous sections. A systematic overview of the proceedings of Rating Model Corporate can be seen in Figure 4.7.

A weighted average of the scaled qualitative factors and a weighted average of the quantitative key figures are weighted together to get an initial score. Customer factors are then added to the model score, which is then used in an exponential formula in order to get an estimated PD. The PDs are then mapped to the final score, which is in the range 1-12. There are also several special rules. The weighted average makes it easy to handle missing values. The performance of RMC can be seen in Section 7.5.

⁹ A heuristic is a problem solving method. Heuristics are non-conventional strategies to solve a problem. Heuristics can be seen as simple rules, educated guesses or intuitive judgments.

Figure 4.7: Systematic overview of Rating Model Corporate.

4.5.2 KOB Ratings

KOB Score is a rating from the Danish department of the firm Experian, which is an international rating agency and is Denmark's largest credit rating agency. The correlation of KOB ratings and Rating Model Corporate is around 0.6, so it can be assumed that the two ratings agree only partly. The KOB rating is on the scale 0 to 100, where 0 is the worst and 100 is the best. So if the rating is low then the creditworthiness is also low.
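The handling of KOB ratings described in this section, that is the risk bands of Table 4.8 and the numeric replacement of ratings such as B50, can be sketched in Python as follows. The function names are hypothetical, integer scores are assumed, and a bare "B" without a number is deliberately not handled because the text does not say how it was treated.

```python
import bisect

# Upper bounds and risk labels of the numeric bands in Table 4.8.
UPPER_BOUNDS = [14, 33, 49, 69, 80, 100]
RISK_LABELS = ["Very high", "High", "Moderate", "Normal", "Low", "Very low"]


def kob_numeric(rating):
    """Turn a raw KOB rating into a number on the 0-100 scale.

    Ratings of the form "B<number>" are replaced as described in the text:
    B50 and higher -> 20, lower than B50 -> 10.
    A bare "B" raises ValueError (not covered by the text).
    """
    text = str(rating).strip().upper()
    if text.startswith("B"):
        return 20 if int(text[1:]) >= 50 else 10
    return int(text)


def kob_risk(score):
    """Look up the risk label of Table 4.8 for an integer score in 0..100."""
    if not 0 <= score <= 100:
        raise ValueError(f"KOB score out of range: {score}")
    # bisect_left returns the index of the first band whose upper bound >= score.
    return RISK_LABELS[bisect.bisect_left(UPPER_BOUNDS, score)]


for raw in ["B50", "B30", 72, 100]:
    score = kob_numeric(raw)
    print(raw, score, kob_risk(score))
```

Note that under this replacement a B50 rating ends up in the "High" band and a lower B rating in the "Very high" band, which is a direct consequence of the values 20 and 10 chosen in the text.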
The KOB rating is a weighted conclusion, where the economical factors have the highest weight, but there are also other factors that are taken into consideration. These factors can have positive or negative effects and can change the ratings given in Table 7.16. There are some complications regarding the KOB score, as there are some firms that are rated B followed by some number, e.g. B50. In order to solve that, all firms rated B50 and higher were given the numeric value 20 and all firms having ratings lower than B50 were given the numeric value 10.

KOB Credit Rating   Risk
B                   Very High/Unknown
0-14                Very high
15-33               High
34-49               Moderate
50-69               Normal
70-80               Low
81-100              Very low

Table 4.8: Creditworthiness of credit rating of the KOB model.

Factors in the KOB Model       Weight
Master data                    25%
- Business sector
- Age
- Number of Employees
Economical data                50%
- Solvency
- Return on Equity
- Liquidity
- Net results
- Equity
Other data                     25%
- Payment History
- Accountant's Annotation
- Qualitative measure

Table 4.9: Creditworthiness of credit rating of the KOB model.

4.5.3 Other figures

In the datasets generated from the bank's database there are a few other factors and figures that have not been mentioned earlier. They are the following.

- Lowest Earlier Rating
- Highest Earlier Rating
- Guarantor Rating
- Subjective Rating
- Firm's Identity Number
- Default
- Equity

These figures and factors are now given a brief introduction. In mathematical notation these figures will be referred to as the Greek letter sigma, $\sigma$, with the first letter of the figure as a subscript.

Lowest and Highest Earlier Ratings

Lowest and highest earlier ratings are the minimum and maximum rating the firm has had over the last twelve months. Earlier ratings should only be taken into consideration with the utmost care. When earlier values are used for modeling purposes they are often referred to as having a memory. Including a variable with a memory could undermine the robustness of the other variables.

Guarantor Rating

Guarantor Rating is the rating of the guarantor.
A firm is said to have a guarantor if some other firm is ready to take over the debt should the borrower default on it.

Subjective Rating

Credit experts can give their subjective opinion on what the final credit rating should be. Credit experts are only supposed to give this subjective rating if, in their opinion, some external factors are influencing the firm's creditworthiness.

Firm's Identity Number

Each firm has an identity number that is used to obtain matching information between different datasets.

Default

The dependent variable is a binary variable stating whether the firm has fulfilled its obligations or not. A formal and much more detailed description can be seen in Section 2.

Equity

The shareholders' equity is the difference between the total assets and total debt. Should all the firm's assets be sold and all liabilities settled, then the shareholders would receive the difference, called equity.

4.6 Exploratory data analysis

The relative and cumulative frequencies and the relative ROC curve of the 2005 and 2006 data can be seen in Figures 4.8 and 4.9. The complete datasets were used to form Figures 4.8 and 4.9. The default frequency of the datasets can be seen in Figure 4.8, and it is interesting to see that there is quite some difference between years. Likewise, it is interesting to see the difference between the distributions of the bad cases. There are also considerably better results for the 2006 dataset compared to the 2005 dataset.

4.6.1 Variable Discussion

The number of variables used in this analysis is quite limited. It is thus worth concluding with a few words on variable selection for the development of a new credit rating model.

Chen et al. [10] list 28 variables for modeling credit default and discuss their predictive power, using support vector machines as a modeling procedure. Behr and Güttler [7] report quite a few interesting points on variable selection for a logistic regression.
Another interesting thing is that their research is performed with a dataset of ten times the size of the data available for this research.

For a logistic regression it might improve the model performance if the model variable age were measured as a continuous variable. By using CART analysis it could then be possible to obtain information on the age interval at which firms are most vulnerable to solvency problems.

Payment history of firms is likely to be a good source of information. Firms that always make their payments on time can be seen as firms that are not subject to cash flow problems. On the other hand, firms that make their payments late but escape default should be documented and used as early warning indicators.

2 4 6 8 10 120.00.20.4Rating Cl