SEPTEMBER 10, 2007

POWER AND LEVEL VALIDATION OF MOODY’S KMV EDF CREDIT MEASURES IN NORTH AMERICA, EUROPE, AND ASIA

MODELING METHODOLOGY

ABSTRACT

In this paper, we validate the performance of Moody’s KMV EDF credit measures in its timeliness of default prediction, ability to discriminate good firms from bad firms, and accuracy of levels in three regions: North America, Europe, and Asia. We focus on the period 1996–2006 for most of our tests. Wherever possible, we compare the performance to that of other popular alternatives, such as agency ratings, Moody’s KMV RiskCalc ® EDF credit measures, Altman’s Z-Scores, and a simpler version of the Merton model. We find that EDF credit measures perform consistently well across different time horizons, and different subsamples based on firm size and credit quality. Our tests indicate that EDF credit measures provide a very useful measure of credit risk that can be applied throughout the world.

AUTHORS

Irina Korablev

Douglas Dwyer

  • 2

    Copyright 2007, Moody's KMV Company. All rights reserved. Credit Monitor, CreditEdge, CreditEdge Plus, CreditMark, DealAnalyzer, EDFCalc, Private Firm Model, Portfolio Preprocessor, GCorr, the Moody's KMV logo, Moody's KMV Financial Analyst, Moody's KMV LossCalc, Moody's KMV Portfolio Manager, Moody's KMV Risk Advisor, Moody's KMV RiskCalc, RiskAnalyst, Expected Default Frequency, and EDF are trademarks owned by MIS Quality Management Corp. and used under license by Moody's KMV Company.

    Published by: Moody's KMV Company

    To Learn More

    Please contact your Moody's KMV client representative, visit us online at www.moodyskmv.com, contact Moody's KMV via e-mail at [email protected], or call us at:

    NORTH AND SOUTH AMERICA, NEW ZEALAND AND AUSTRALIA, CALL: 1 866 321 MKMV (6568) or 415 874 6000

    EUROPE, THE MIDDLE EAST, AFRICA AND INDIA, CALL: 44 20 7280 8300

    FROM ASIA CALL: 813 3218 1160

  • TABLE OF CONTENTS

    1 INTRODUCTION

    2 CREDIT RISK ASSESSMENT APPROACHES
      2.1 Moody's KMV EDF Credit Measures
      2.2 Agency Ratings
      2.3 Moody's KMV RiskCalc EDF Credit Measures
      2.4 Merton's Structural Model
      2.5 Altman's Z-Score

    3 EMPIRICAL METHODOLOGY
      3.1 Timely Default Prediction
      3.2 Default Predictive Power
      3.3 Level Validation with Default Data
        3.3.1 Interpreting the Analytical Outputs for Level Validation
      3.4 Level Validation with CDS Data
      3.5 Median EDF by Rating Category across Regions

    4 EMPIRICAL RESULTS
      4.1 North America
        4.1.1 Data
        4.1.2 Timely Default Prediction: U.S.
        4.1.3 Default Predictive Power: U.S.
        4.1.4 Accuracy of Levels: U.S.
        4.1.5 Timely Default Prediction: Outside the U.S.
        4.1.6 Default Predictive Power: Outside the U.S.
        4.1.7 Accuracy of Levels: Outside the U.S.
        4.1.8 Conclusion
      4.2 Europe
        4.2.1 Diversity in Bankruptcy Mechanisms and Creditor Protection
        4.2.2 Data
        4.2.3 Timely Default Prediction
        4.2.4 Default Predictive Power
        4.2.5 Level Validation with Default Data
        4.2.6 Level Validation with CDS Data
        4.2.7 Conclusion
      4.3 Asia
        4.3.1 Data
        4.3.2 Timely Default Prediction
        4.3.3 Default Predictive Power
        4.3.4 Level Validation
        4.3.5 Conclusion
      4.4 Median EDF by Rating Category across Regions

    5 CONCLUSION

    APPENDIX B: SUMMARY OF ACCURACY RATIOS FOR EDF CREDIT MEASURES AND AGENCY RATINGS BY YEAR

  • 5

    1 INTRODUCTION

    The new Basel Capital Accord states: "The methodology for assigning credit assessments must be rigorous, systematic, and subject to some form of validation based on historical experience." There are two important components to this validation process: the ability to predict defaults and the accuracy of the default predictive measure.

    The first criterion implies that a credit measure should be dynamic enough to be a meaningful and timely signal of deteriorating credit quality or an impending credit event. In this regard, the Basel Accord states: "Assessments must be subject to ongoing review and responsive to changes in financial condition. Before being recognized by supervisors, an assessment methodology for each market segment, including rigorous back-testing, must have been established for at least one year." This also means that the credit assessment technology should have the ability to distinguish between defaulters and non-defaulters. It should not allow defaulters to enter the sample while trying to create a sample of good quality firms (Type I Error). Conversely, it should not exclude good quality firms from the sample while trying to exclude potential defaulters (Type II Error).

    The second criterion is focused on the accuracy of the credit assessment measure so that it can be useful to banks and other financial institutions in their efforts toward risk measurement, valuation, and capital allocation. The Basel Accord states: "Banks must have a robust system in place to validate the accuracy and consistency of rating systems processes, and the estimation of PDs (Probabilities of Default)."

    The objective of this document is to compare the performance, based on the above validation criteria, of EDF credit measures with some of the other popular credit assessment approaches. The popular approaches that we consider are the following:

    - Agency ratings
    - RiskCalc U.S. v3.1 private firm model
    - A simple Merton structural model
    - Altman's Z-Score

    In this paper we present our test results for three regions: North America, Europe, and Asia. The rest of the paper is organized as follows: Section 2 briefly discusses the credit assessment approaches that we consider in our paper. Section 3 highlights the empirical methodology we follow to compare the approaches. Section 4 presents the results of our tests by region and interprets the economic meaningfulness of these results.[1] Section 5 concludes the paper.

    2 CREDIT RISK ASSESSMENT APPROACHES

    The credit risk assessment approaches considered in this paper are:

    - Moody's KMV EDF credit measures
    - Agency ratings
    - Moody's KMV RiskCalc private firm model
    - Merton's structural model
    - Altman's Z-Scores[2]

    In the following subsections we briefly discuss each of the approaches.

    [1] Section 4.1 presents the results for North America, Section 4.2 presents the results for Europe, and Section 4.3 presents the results for Asia.

    [2] For reasons explained in the next two sections, not all the approaches can be subjected to tests on all the criteria. We try to include as many of these approaches as possible in our test of each criterion.

  • 6

    2.1 Moody's KMV EDF Credit Measures

    The structural view on credit risk was first made commercially viable with the introduction of the Vasicek-Kealhofer (VK) model. This model offers a rich framework that treats equity as a perpetual down-and-out option on the underlying assets of the firm. This framework incorporates five different classes of liabilities: short-term liabilities, long-term liabilities, convertible debt, preferred shares, and common shares. To overcome the regular problems encountered by structural models due to the assumption of normality, the VK model uses an empirical mapping based on actual default data to get the default probabilities, known as EDF credit measures and offered by Moody's KMV.[3] Volatility is estimated through a Bayesian approach that combines a comparables analysis with an iterative approach.

    EDF credit measures are the outputs of Moody's KMV Credit Monitor and CreditEdge applications. An EDF credit measure is a quantitative measure of credit quality. More specifically, an EDF credit measure is an estimate of the physical probability of default for a given firm. For an overview of the EDF credit measure, see Crosbie and Bohn (2003).

    In 2007, Moody's KMV released EDF 8.0, which refines the mapping of the Distance-to-Default to the EDF credit measure using a much larger default database observed over a longer time period. Details of the new model enhancement can be found in Dwyer and Qu (2007).

    The EDF estimates are now bounded between 0.01% (for an EDF value of 0.01) and 35% (for an EDF value of 35). Moody's KMV offers a term structure of EDF credit measures for 1 to 10 years and an extrapolation scheme to get shorter-term EDF credit measures. The risk-free rate used in the calculation of EDF credit measures is now updated monthly.

    2.2 Agency Ratings

    Moody's Investors Service, Standard and Poor's Corporation, and other well-known rating agencies around the world have been assigning credit ratings to major borrowers for decades. These are ordinal measures of credit quality (i.e., they help rank firms by their quality of credit). These ratings have established international credibility because of the long history of rating agencies and the extensive testing of their relative performance.

    2.3 Moody's KMV RiskCalc EDF Credit Measures

    Moody's KMV RiskCalc is designed to calculate EDF credit measures for private companies. Private companies are typically smaller than public companies and are not required to file financial statements with the SEC.

    The RiskCalc model incorporates aspects of both the structural, market-based approach, in the form of industry-level distance-to-default measures, and the localized financial statement-based approach. While it incorporates equity market information at the aggregate level, RiskCalc does not take advantage of the equity information of the specific company.

    We used the RiskCalc v3.1 U.S. model to obtain RiskCalc EDF credit measures for the set of publicly traded companies. Comparing public firm EDF credit measures to RiskCalc EDF credit measures computed on public firms represents an out-of-universe test of RiskCalc.

    2.4 Merton's Structural Model

    The Merton model of risky debt is the original structural model of credit risk, and perhaps the most significant contribution to the area of quantitative credit risk research. This model assumes that equity is a call option on the value of the assets of the firm. From this insight, the value of debt can be derived based on the observed equity value. The default event is modeled as the firm's asset value falling below a threshold level (i.e., the default barrier). Given the default barrier and the asset value parameters, the probability of default can be estimated for various horizons. A detailed description of this model can be found in most standard finance textbooks.[4]

    [3] See Eom, Helwege, and Huang (2003) for details of the discussion.

    [4] See, for example, Hull (1999).

  • 7

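    For our specific tests, the model has been implemented as:

    $$\text{Default Point}_{i,\text{Merton}} = \text{Short Term Liabilities} + 0.5 \times \text{Long Term Liabilities}$$

    The default probability for a firm i for a time horizon t is computed as:

    $$PD_i = \Phi\left(-\,\frac{\ln\left(\dfrac{AVL_i}{\text{Default Point}_{i,\text{Merton}}}\right) + \left(\mu_i - 0.5\,\sigma_i^2\right)t}{\sigma_i\sqrt{t}}\right) \tag{1}$$

    $$\sigma_i = \sigma_i^{\text{equity}}\,\frac{EVL_i}{AVL_i}\left(\frac{\partial EVL_i}{\partial AVL_i}\right)^{-1} \tag{2}$$

    $$EVL_i = AVL_i\,\Phi(d_1) - \text{Default Point}_{i,\text{Merton}}\,e^{-rt}\,\Phi(d_2), \qquad d_1 = \frac{\ln\left(\dfrac{AVL_i}{\text{Default Point}_{i,\text{Merton}}}\right) + \left(r + 0.5\,\sigma_i^2\right)t}{\sigma_i\sqrt{t}}, \qquad d_2 = d_1 - \sigma_i\sqrt{t} \tag{3}$$

    $\sigma_i$, $\sigma_i^{\text{equity}}$, $AVL_i$, and $EVL_i$ are the asset volatility, equity volatility, asset value, and equity value of firm i, respectively. $\Phi(x)$ is the cumulative normal distribution function. $\mu_i$ is the drift rate for the asset returns of firm i, while $r$ is the riskless rate of return. $\sigma_i^{\text{equity}}$ is computed as the standard deviation of three years of weekly equity returns for company i. Asset value $AVL_i$ is computed by solving equations (2) and (3) simultaneously.[5]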
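    As an illustration of how equations (1) through (3) fit together, the following Python sketch solves (2) and (3) for the asset value and asset volatility and then evaluates (1). It is a minimal sketch under stated assumptions, not the implementation used for the tests in this paper: the function names, the alternating bisection and fixed-point solver, the starting guess, and all input values are hypothetical, and the bracketing step assumes a non-negative riskless rate.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def default_point(short_term_liabilities, long_term_liabilities):
    """Default point: short-term liabilities plus half of long-term liabilities."""
    return short_term_liabilities + 0.5 * long_term_liabilities


def d1_d2(avl, sigma, dp, r, t):
    """The d1 and d2 terms of equation (3)."""
    d1 = (math.log(avl / dp) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    return d1, d1 - sigma * math.sqrt(t)


def equity_value(avl, sigma, dp, r, t):
    """Equation (3): equity valued as a call option on the firm's assets."""
    d1, d2 = d1_d2(avl, sigma, dp, r, t)
    return avl * norm_cdf(d1) - dp * math.exp(-r * t) * norm_cdf(d2)


def solve_assets(evl, sigma_equity, dp, r, t, tol=1e-10, max_rounds=200):
    """Solve equations (2) and (3) for asset value and asset volatility.

    Hypothetical solver: alternate between inverting (3) for the asset value
    by bisection and updating the asset volatility from (2). Assumes r >= 0.
    """
    sigma = sigma_equity * evl / (evl + dp)  # starting guess (an assumption)
    avl = evl + dp
    for _round in range(max_rounds):
        # With r >= 0, the asset value that reproduces EVL lies in [EVL, EVL + DP].
        lo, hi = evl, evl + dp
        for _step in range(100):
            mid = 0.5 * (lo + hi)
            if equity_value(mid, sigma, dp, r, t) < evl:
                lo = mid
            else:
                hi = mid
        avl = 0.5 * (lo + hi)
        # Equation (2), using dEVL/dAVL = Phi(d1) from equation (3).
        d1 = d1_d2(avl, sigma, dp, r, t)[0]
        new_sigma = sigma_equity * (evl / avl) / norm_cdf(d1)
        converged = abs(new_sigma - sigma) < tol
        sigma = new_sigma
        if converged:
            break
    return avl, sigma


def merton_pd(avl, sigma, dp, mu, t):
    """Equation (1): probability that assets end below the default point."""
    dd = (math.log(avl / dp) + (mu - 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    return norm_cdf(-dd)


# Made-up inputs, for illustration only.
dp = default_point(40.0, 40.0)
avl, sigma = solve_assets(evl=40.0, sigma_equity=0.5, dp=dp, r=0.05, t=1.0)
pd_one_year = merton_pd(avl, sigma, dp, mu=0.08, t=1.0)
print(f"asset value={avl:.2f}, asset volatility={sigma:.4f}, PD={pd_one_year:.4%}")
```

    The derivative in equation (2) is taken to be $\Phi(d_1)$, which follows from differentiating equation (3) with respect to the asset value.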

    2.5 Altman's Z-Score

    Altman's Z-Score came as a response to the need for identifying the financial health of any business based on observable accounting and market ratios. The original measure was developed in 1968 by Edward Altman, and the Z-Score is now available in various forms. We chose the public firm form, which includes market capitalization in the leverage ratio, and calculated Z-Scores as follows:

    [5] In contrast to solving two equations in two unknowns, for EDF credit measures we use an iterative approach to solve for empirical volatility, which is then combined with modeled volatility in a Bayesian fashion.

  • 8

    $$Z = X_1 + X_2 + X_3 + X_4 + X_5 \tag{4}$$

    where

    $X_1 = 1.2 \times \dfrac{\text{Current Liabilities}}{\text{Book Asset Value}}$ is the ratio of Current Liabilities to Total Assets;

    $X_2 = 1.4 \times \dfrac{\text{Retained Earnings}}{\text{Book Asset Value}}$ is the Profitability Ratio;

    $X_3 = 3.3 \times \dfrac{\text{Operating Income before Depreciation}}{\text{Book Asset Value}}$ is the ratio of EBITDA to Total Assets;

    $X_4 = 0.6 \times \dfrac{\text{Market Capitalization}}{\text{Book Value of Liabilities}}$ is the ratio of Market Value of Equity to Book Value of Liabilities; and

    $X_5 = \dfrac{\text{Sales}}{\text{Book Asset Value}}$ is the ratio of Sales to Total Assets.

    The calculation typically produces a Z-Score between -5 and 10, with a higher Z-Score implying better credit quality and a lower chance of bankruptcy. Z-Scores are not interpreted directly as default probabilities; they work as ordinal measures of financial health. Therefore, they cannot be used directly for valuation, quantitative risk assessment, or capital allocation purposes.
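    The Z-Score calculation in equation (4) can be sketched in a few lines of Python. This is a hedged illustration only: the function name, argument names, and input figures are hypothetical. The weights and ratios follow equation (4) as given in this paper; note that Altman's original 1968 specification defines the first ratio as working capital to total assets.

```python
def z_score(current_liabilities, retained_earnings,
            operating_income_before_depreciation, market_capitalization,
            book_liabilities, sales, book_assets):
    """Z-Score following equation (4) of this paper (hypothetical helper)."""
    x1 = 1.2 * current_liabilities / book_assets
    x2 = 1.4 * retained_earnings / book_assets
    x3 = 3.3 * operating_income_before_depreciation / book_assets
    x4 = 0.6 * market_capitalization / book_liabilities
    x5 = sales / book_assets
    return x1 + x2 + x3 + x4 + x5


# Made-up balance sheet figures (same currency units), for illustration only.
example = z_score(
    current_liabilities=20.0,
    retained_earnings=30.0,
    operating_income_before_depreciation=15.0,
    market_capitalization=80.0,
    book_liabilities=50.0,
    sales=120.0,
    book_assets=100.0,
)
print(f"Z-Score = {example:.3f}")
```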

    3 EMPIRICAL METHODOLOGY

    In this section, we describe the methodology we chose for the tests of each criterion.

    3.1 Timely Default Prediction

    Timeliness measures how many months before an impending credit event EDF credit measures signal deteriorating credit quality. To test timeliness, we create a sample of defaulted firms, retaining monthly observations from 24 months prior to default up to 12 months after default. We compute the median EDF credit measure and the median Moody's rating for each number of months to default, then overlay and compare the two series.

    For testing timeliness against ratings, we use the Moody's rating. To ensure that the measure has stood the test of time (and, where relevant, of rating grade and firm size), we also provide the analysis, wherever possible, for subsets of the data based on time period:

    - 1996–2000
    - 2001 and beyond
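    The timeliness calculation described above amounts to grouping observations by months to default and taking a median within each group. A minimal Python sketch follows, assuming each observation is a pair of (months relative to default, value), with negative months meaning before default. The function name, data layout, and sample values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_by_months_to_default(observations, start=-24, end=12):
    """Median value for each month relative to default.

    observations: iterable of (months_to_default, value) pairs, where
    months_to_default is negative before default and positive after.
    Only months within [start, end] are retained.
    """
    buckets = defaultdict(list)
    for months_to_default, value in observations:
        if start <= months_to_default <= end:
            buckets[months_to_default].append(value)
    return {month: median(values) for month, values in sorted(buckets.items())}


# Made-up EDF values (in percent) for three defaulted firms, for illustration only.
sample = [
    (-24, 1.0), (-24, 3.0), (-24, 2.0),
    (-12, 4.0), (-12, 6.0), (-12, 9.0),
    (-1, 10.0), (-1, 20.0), (-1, 35.0),
]
for month, value in median_by_months_to_default(sample).items():
    print(month, value)
```

    The same function can be applied to numerically coded ratings to produce the second series for the overlay.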

  • 9

    3.2 Default Predictive Power

    While a default predictive measure can be timely in warning of impending defaults, it may not be effective in distinguishing a good firm from a bad firm. The calibration of the model may be on the conservative side, inflating the default probability of all suspect names, some of which might not be genuinely distressed. In this case, even though one could claim that the model performed well in predicting impending defaults, it would be fairly mediocre in its ability to distinguish good firms from bad firms. One of the essential features of a good model is that it should be sophisticated enough to differentiate bad (genuinely distressed) firms from good firms that merely trigger false alarms. There are two well-known approaches to testing a model for its power:

    - Cumulative Accuracy Profile (CAP), with its output known as the Accuracy Ratio (AR).
    - Receiver Operating Characteristic (ROC), with its output known as the Area Under Curve (AUC).

    Typically, the larger the Accuracy Ratio or Area Under Curve, the better the model. In the extreme cases, a totally random model that bears no information on impending defaults has AR = 0 and AUC = 0.5, while a perfect model has AR = AUC = 1. The two approaches are equivalent, with AR = 2 × AUC − 1. A more detailed discussion can be found in Appendix A.

    In this article, we use the Cumulative Accuracy Profile approach and provide AR as our output. We compare EDF credit measures to:

    - Ratings
    - RiskCalc EDF credit measure
    - Simple Merton model
    - Altman's equity-based Z-Score

    3.3 Level Validation with Default Data

    The level validation of EDF credit measures verifies how well the model's predicted default rates track realized default rates. We employ the methodology described in Bohn, Arora, and Korablev (2005), which was first developed in Kurbat and Korablev (2002). The procedure can be summarized in four steps:

    1. Using a Monte Carlo technique, we simulate asset value movements based on a single-factor Gaussian model to capture correlated defaults.

    2. We determine the default or non-default state of each firm based on the level of its EDF credit measure and each simulation outcome.

    3. We compare the actual default rate to the median, 10th percentile, and 90th percentile of the simulated distribution.

    4. We compute the probability of observing a default rate less than or equal to the realized default rate given the model and the correlation coefficient.
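    The four steps above can be sketched in Python as follows. This is a simplified illustration, not the procedure of the cited papers: it assumes a single correlation parameter `rho` shared by all firms, a default threshold equal to the inverse normal of each firm's EDF, and a simple nearest-rank percentile. All names and input values are hypothetical.

```python
import random
from statistics import NormalDist


def simulate_default_rates(edfs, rho, n_sims=2000, seed=0):
    """Steps 1 and 2: simulate portfolio default rates under a single-factor
    Gaussian model. Firm i defaults when
    sqrt(rho) * shock + sqrt(1 - rho) * noise_i < inverse_normal(EDF_i)."""
    rng = random.Random(seed)
    thresholds = [NormalDist().inv_cdf(p) for p in edfs]
    systematic, idiosyncratic = rho ** 0.5, (1.0 - rho) ** 0.5
    rates = []
    for _ in range(n_sims):
        shock = rng.gauss(0.0, 1.0)  # aggregate shock shared by all firms
        defaults = sum(
            1 for threshold in thresholds
            if systematic * shock + idiosyncratic * rng.gauss(0.0, 1.0) < threshold
        )
        rates.append(defaults / len(thresholds))
    return sorted(rates)


def percentile(sorted_values, q):
    """Step 3: nearest-rank percentile of an already sorted list (0 <= q <= 1)."""
    index = int(round(q * (len(sorted_values) - 1)))
    return sorted_values[index]


def p_value(sorted_rates, realized_rate):
    """Step 4: share of simulated default rates at or below the realized rate."""
    return sum(1 for r in sorted_rates if r <= realized_rate) / len(sorted_rates)


# Made-up portfolio: 200 firms, each with a 2% EDF, and an assumed correlation.
rates = simulate_default_rates([0.02] * 200, rho=0.2)
realized = 0.03  # hypothetical realized default rate
print("10th percentile:", percentile(rates, 0.10))
print("median:", percentile(rates, 0.50))
print("90th percentile:", percentile(rates, 0.90))
print("mean:", sum(rates) / len(rates))
print("P-value of realized rate:", p_value(rates, realized))
```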

    We extend this methodology by using Bayesian methods to compute the posterior distribution of the aggregate shock given the realized default rate, the model, and the correlation coefficient. The extension to the original methodology is developed in Dwyer (2007).

    3.3.1 Interpreting the Analytical Outputs for Level Validation

    We create two graphs as output of the level validation test. Figure 1 is an illustrative example of the first: a comparison of the median predicted (by simulation) default rate with the realized default rate. The black line is the median predicted default rate, and the red line is the actual default rate. Fifty percent of the time the actual default rate should be above (or below) the median. The blue line is the mean predicted default rate. Because correlated defaults skew the distribution of default rates to the right, the mean exceeds the median, so most of the time the actual default rate should be below the average predicted default rate. The two gray lines bound the prediction interval, which represents the range of variability that is expected in the realized default rates given the EDF values and the assumed correlation model. This prediction interval implies that eighty percent of the time the realized default rate should lie between the 10th and the 90th percentiles.[6]

  • 10

    FIGURE 1 Illustrative example of the level validation output. Comparison of median predicted default rate and realized default rate.

    [6] This prediction interval differs from the concept of a confidence interval. An x% confidence interval is a random interval for which the probability of it holding the true value of a parameter is x%. In our context here, an x% prediction interval has the interpretation that x% of the time the realized default rate will be within this range given the EDF levels and the correlation model.

  • 11

    FIGURE 2 Illustrative example of the level validation output. Posterior distribution of the aggregate shock and P-value of the actual default rate

    The figure depicts the posterior distribution of the aggregate shock, derived given the realized default rate, the model, and the correlation coefficient. We also compute the P-value of the actual default rate, which is the probability of observing a default rate at or below the actual default rate. This P-value is shown as a blue line.
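    To make the idea of a posterior distribution for the aggregate shock concrete, the Python sketch below works through a deliberately simplified case. It is not a reproduction of the method in Dwyer (2007): it assumes a homogeneous portfolio (every firm has the same default probability), a standard normal prior on the shock, a single-factor Gaussian model in which a negative shock is adverse, and a grid approximation of the integrals. All names and inputs are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def conditional_pd(pd, rho, shock):
    """Default probability conditional on the aggregate shock
    (single-factor Gaussian model; a negative shock raises defaults)."""
    threshold = STD_NORMAL.inv_cdf(pd)
    return STD_NORMAL.cdf((threshold - math.sqrt(rho) * shock) / math.sqrt(1.0 - rho))


def binom_pmf(k, n, p):
    """Probability of exactly k defaults among n independent firms."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def shock_grid(n_points=801, width=6.0):
    """Evenly spaced grid of shock values on [-width, width]."""
    step = 2.0 * width / (n_points - 1)
    return [-width + i * step for i in range(n_points)]


def posterior_of_shock(k, n, pd, rho):
    """Grid posterior of the shock given k defaults out of n firms:
    proportional to normal prior times binomial likelihood."""
    grid = shock_grid()
    weights = [STD_NORMAL.pdf(z) * binom_pmf(k, n, conditional_pd(pd, rho, z)) for z in grid]
    total = sum(weights)
    return grid, [w / total for w in weights]


def posterior_median(grid, probs):
    """Smallest grid value at which the cumulative posterior reaches 0.5."""
    cumulative = 0.0
    for z, w in zip(grid, probs):
        cumulative += w
        if cumulative >= 0.5:
            return z
    return grid[-1]


def p_value(k, n, pd, rho):
    """Probability of observing k or fewer defaults, averaged over the prior."""
    grid = shock_grid()
    prior = [STD_NORMAL.pdf(z) for z in grid]
    total = sum(prior)
    return sum(
        (w / total) * sum(binom_pmf(j, n, conditional_pd(pd, rho, z)) for j in range(k + 1))
        for z, w in zip(grid, prior)
    )


# Made-up inputs: 500 firms with a 2% default probability, 25 realized defaults.
grid, probs = posterior_of_shock(k=25, n=500, pd=0.02, rho=0.2)
print("posterior median shock:", posterior_median(grid, probs))
print("P-value:", p_value(25, 500, 0.02, 0.2))
```

    In this setup, a realized default count well above the expected count pulls the posterior toward negative (adverse) shocks, and a count well below it pulls the posterior toward positive shocks.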

    3.4 Level Validation with CDS Data

    This test analyzes the level bias in European EDF credit measures relative to U.S. EDF credit measures. The rationale for the test is the assumption that similar risks should command similar premiums in the U.S. and Europe.

    We compare the median, 25th percentile, and 75th percentile CDS spread levels of the two regions, U.S. and Europe, across EDF-implied rating groups. The same EDF category should have the same aggregate median CDS spread in both regions. We used Mark-It composite CDS data from January 2003 to December 2006. The Europe region is defined by the following currencies: Euro, Austrian Schilling, Belgian Franc, Swiss Franc, Czech Koruna, Deutsche Mark, Danish Krone, Spanish Peseta, Finnish Markka, French Franc, Greek Drachma, Hungarian Forint, and British Pound. The U.S. region is defined by the U.S. dollar.
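    The comparison described above reduces to computing quartiles of CDS spreads within each combination of EDF-implied rating group and region. A minimal Python sketch follows, assuming observations arrive as (region, rating group, spread in basis points) tuples. The function name, the rating labels, the spread values, and the choice of the "inclusive" quantile method are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import quantiles


def spread_quartiles(observations):
    """25th percentile, median, and 75th percentile of CDS spreads for each
    (rating_group, region) pair. Each group needs at least two observations.

    observations: iterable of (region, rating_group, spread_bps) tuples.
    """
    groups = defaultdict(list)
    for region, rating_group, spread_bps in observations:
        groups[(rating_group, region)].append(spread_bps)
    return {
        key: tuple(quantiles(spreads, n=4, method="inclusive"))
        for key, spreads in groups.items()
    }


# Made-up spreads for one EDF-implied rating group, for illustration only.
observations = (
    [("US", "Baa", s) for s in (10, 20, 30, 40, 50)]
    + [("Europe", "Baa", s) for s in (12, 22, 32, 42, 52)]
)
for (rating_group, region), (q25, q50, q75) in sorted(spread_quartiles(observations).items()):
    print(rating_group, region, q25, q50, q75)
```

    Comparing the medians for the same rating group across the two regions then gives a direct read on any level bias.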

    3.5 Median EDF by Rating Category across Regions

    We calculate and compare median EDF credit measures for North American non-financial companies, Asian-Pacific non-financial companies, European non-financial companies, and global financial companies by several rating categories. In the absence of other measures of credit risk, such as spreads or defaults, a comparison with ratings provides a sanity check on the rank ordering of risk produced by the EDF credit measure and on the comparability of EDF levels across geographies.

    Figure 2 annotations: the P-value measures the probability of observing a default rate at or lower than the actual default rate; the marked median is the median value of the aggregate shock given the actual default rate.

  • 12

    4 EMPIRICAL RESULTS

    In this section, we describe empirical results.

    4.1 North America

    In this section, we describe empirical results obtained in North America. Results are separated into U.S. companies and North American companies that are headquartered outside of the U.S. The latter are predominantly headquartered in Canada, Bermuda, and the Cayman Islands.

    4.1.1 Data

    We start with all U.S. firms that had publicly traded equity during 1996–2006, unless otherwise specified. We restrict the sample to non-financial firms with more than $30 million in size.[7] For level validation, we impose a stricter minimum size of $300 million.

    We also present results for comparable North American firms that are outside of the U.S. (Canada, Bermuda, Cayman Islands, Bahamas, Belize, Panama, Virgin Islands, and Netherlands Antilles). Table 1 shows the countries and the number of firm-months in each country that constitute the North American module in Credit Monitor and CreditEdge. Outside of the U.S., the largest countries are Canada, Bermuda, and the Cayman Islands.

    TABLE 1 Countries in the North American Database

    Country                 Number of Observations (firm-months)
    Netherlands Antilles    776
    Bahamas                 440
    Belize                  85
    Bermuda                 3,552
    Canada                  153,971
    Cayman Islands          975
    Panama                  245
    USA                     1,127,452
    Virgin Islands          491

    For all comparisons against ratings, we used Moody's ratings.

    Defaults are based on the Moody's KMV Default database and include missed payments, distressed exchanges, and insolvency proceedings. The defaults have been collected on a daily basis for more than ten years using a variety of printed and online sources.[8] By the end of 2006, we had about 7,900 public defaults worldwide. About 5,600 defaults were from North America.

    [7] Size is measured by the sales of the firm for non-financial firms. Wherever the firm's total sales number was not available, we used the book asset value of the firm. This number was further adjusted for inflation across years by converting it to a common denomination using a deflation adjustor calculated internally at Moody's KMV.

    [8] To collect defaults, we use numerous printed and online sources from around the world on a daily basis. We use government filings, government agency sources, company announcements, news services, specialized default news sources, and even sources within financial institutions to ensure to the greatest extent possible that we find all defaults. We also keep evidence in electronic format so that content can be easily verified. As a result, Moody's KMV has the most extensive default database for public firms.

  • 13

    4.1.2 Timely Default Prediction: U.S.

    In this section, we compare the performance of EDF credit measures against agency ratings in their ability to provide timely warning of defaults. Figure 3 demonstrates how the median EDF credit measure (represented by the solid black line) starts rising 24 months before the actual default, while the median Moody's rating stays flat until 13 months before default and then shows a steep rise about 5 months before default. In that sense, the EDF credit measure leads the rating by about 11 months. This is helped by the fact that the EDF credit measure is continuous, so one sees a steady and continuous rise in the aggregate. Ratings, on the other hand, are discrete, so one sees a step-like function with flat stretches, implying that this measure does not instantaneously pick up the most recently available information.

    To test the robustness of the results, we further divided our data into two subperiods:

    - 1996–2000
    - 2001–2006

    The period 1996–2000 is shown in the left panel of Figure 4, and the period 2001–2006 is shown in the right panel of Figure 4. Both EDF credit measures and ratings start at a higher level 24 months prior to default in the latter half of the sample. EDF credit measures continued to lead the agency rating in each subperiod, indicating that EDF credit measures indeed provide a more timely warning of impending defaults.

    FIGURE 3 Comparison of median agency ratings with Moody's KMV EDF values for rated defaulted firms in the U.S. from 2 years before default to 1 year after default between 1996 and 2006. Annotation: the EDF measure leads the rating by 11 months.

  • 14

    FIGURE 4 Comparison of median agency ratings with Moody's KMV EDF values for rated defaulted firms in the U.S. from 2 years before default to 1 year after default for subsamples: 1996–2000 (left panel) and 2001–2006 (right panel)

    4.1.3 Default Predictive Power: U.S.

    In this section, we compare the performance of EDF credit measures against agency ratings, Z-Scores, and a simple Merton model in their ability to discriminate between good and bad firms. Our test statistic is the Accuracy Ratio as defined earlier. We also show the plots of Cumulative Accuracy Profiles of these measures for various subsamples selected using different horizons and size filters.

    EDF Credit Measure vs. Agency Rating

    Figure 5 shows the performance of EDF credit measures against ratings over the entire sample period of 1996–2006. By design, this test is restricted to the sample of rated firms only. The EDF credit measure clearly performs better than ratings over the entire sample period, with Accuracy Ratios of 0.88 and 0.75, respectively.

    To ensure that the measure is robust in its performance across time, we divide our sample into two subsets of data based on time periods:

    - 1996–2000
    - 2001–2006

    We also provide the analysis by three different size categories:

    - Size is greater than $30 million
    - Size is between $30 million and $300 million
    - Size is greater than $300 million

  • 15

    FIGURE 5 Cumulative Accuracy Profile (CAP) curves comparing Moody's KMV EDF credit measures and agency ratings for U.S. non-financial companies between 1996 and 2006. The Accuracy Ratios for the EDF measure and agency rating are 0.88 and 0.75, respectively.

    Table 2 shows the results for the subsamples. We find that the EDF credit measure substantially outperforms ratings in all categories, with an Accuracy Ratio that is higher by at least 0.12 (12 percentage points).

    TABLE 2 Accuracy Ratios by category for EDF Credit Measures and agency ratings for U.S. non-financial companies

    Date / Size                         EDF Credit Measure   Ratings
    1996–2006                           0.88                 0.75
    1996–2000                           0.87                 0.75
    2001–2006                           0.88                 0.75
    1996–2006, Size > $30 Million       0.88                 0.75
    1996–2006, Size $30–$300 Million    0.75                 0.57
    1996–2006, Size > $300 Million      0.89                 0.76


    We also calculated Accuracy Ratios at horizons longer than one year. The results are presented in Table 3. EDF credit measures have more discriminatory power than agency ratings at all horizons, but the difference is smaller at longer horizons.

    TABLE 3 Accuracy Ratios of one- to five-year EDF credit measures and agency ratings for U.S. non-financial companies between 1991 and 2006

    Horizon                          EDF Credit Measure   Ratings   Number of Observations   Number of Defaults
    One-year EDF credit measure      0.88                 0.76      2031                     354
    Two-year EDF credit measure      0.81                 0.73      1926                     374
    Three-year EDF credit measure    0.77                 0.71      1917                     385
    Four-year EDF credit measure     0.72                 0.70      1892                     400
    Five-year EDF credit measure     0.69                 0.68      1850                     404

    The Accuracy Ratios (AR) for both the EDF credit measure and agency rating decrease with horizon, and the difference between the two ARs narrows at longer horizons.

    Figure 6 and Figure 7 present the Accuracy Ratios for the EDF credit measure and agency rating by year at one- and five-year horizons, respectively.⁹ For each year, we used the EDF credit measure as of the last market day of the prior year to predict default during the next one or five years.

    At a one-year horizon, the EDF credit measure has better discriminatory power than agency rating in all years except 1996, which had the fewest defaults. At a five-year horizon, the EDF credit measure also outperforms agency rating in all years except 2000.

    9 The numbers underlying Figures 6 and 7 are summarized in Tables 15 and 16 of Appendix B.

    FIGURE 6 Accuracy Ratios for EDF credit measures and agency ratings for U.S. non-financial companies by year at the one-year horizon. (Vertical axis: Accuracy Ratio, 0.50 to 1.00; horizontal axis: year, 1996 to 2006; series: 1-Year EDF Credit Measure and Agency Rating.)

    FIGURE 7 Accuracy Ratios for EDF Credit Measures and agency ratings for U.S. non-financial companies by year at the five-year horizon. (Vertical axis: Accuracy Ratio, 0.50 to 1.00; horizontal axis: year, 1991 to 2002; series: 5-Year EDF Credit Measure and Agency Rating.)

    EDF Credit Measure vs. Merton Default Probability and Z-Score

    In this section we compare the performance of EDF credit measures to the Merton model's implied default probabilities and Z-Scores as described in Section 2. The sample period is 1996 to 2006. Unlike rated firms, which are usually larger and higher profile, some unrated firms can be very small and their defaults can go unnoticed. In some cases, informal negotiations or bailouts avert the default. These cases are likely to contaminate our results. Therefore, we filtered out very small firms (size < 30 million dollars) from our sample.¹⁰ For the entire period 1996–2006, the results are shown in Figure 8. The results are based on a joint sample in which the Z-Score, the Merton default probability, and the EDF credit measure are all non-missing.

    We find that the EDF credit measure substantially outperforms Merton default probability and Z-Score in terms of their ability to discriminate good firms from bad firms with their Accuracy Ratios at 0.82, 0.72, and 0.66 respectively. We further divide the sample into subsets of sizes 30 million dollars to 300 million dollars, and 300 million dollars and above. In both cases, the EDF credit measure outperforms the Merton model and Z-Score, as shown in Table 4.

    Once again, as a robustness check, we compared the performance of the three measures across the time periods 1996–2000 and 2001–2006. The results are shown in Table 4. As expected, our results are fairly robust, with EDF credit measures outperforming Merton default probabilities and Z-Scores in both periods.

    FIGURE 8 Cumulative Accuracy Performance (CAP) curves comparing Moody's KMV EDF credit measures, Merton default probability and Z-Scores for U.S. non-financial companies between 1996 and 2006. The Accuracy Ratios for EDF measure, Merton Default Probability and Z-Score are 0.82, 0.72 and 0.66 respectively.

    10 Size is measured by the sales of the firm for non-financial firms. Whenever the firm's total sales number was not available, we used the book asset value of the firm. This number was further adjusted for inflation across years by converting it to a common denomination using a deflation adjustor calculated internally at Moody's KMV.


    TABLE 4 Summary of Accuracy Ratios across various size buckets and time horizons for EDF credit measure, Merton default probability, and Z-Score for U.S. non-financial companies

    Date / Size                         EDF Credit Measure   Z-Score   Merton Default Probability
    1996–2006, Size > $30 Million       0.82                 0.66      0.72
    1996–2000, Size > $30 Million       0.82                 0.66      0.73
    2001–2006, Size > $30 Million       0.82                 0.67      0.71
    1996–2006, Size $30–$300 Million    0.76                 0.65      0.67
    1996–2006, Size > $300 Million      0.88                 0.66      0.77

    EDF Credit Measure vs. RiskCalc EDF Credit Measure

    In this section we compare the performance of EDF credit measures to RiskCalc EDF credit measures calculated for public firms as described in Section 2. The sample period is 1996–2006. As before, we filtered out very small firms (size < 30 million dollars) from our sample.¹¹ For the entire period 1996–2006, the results are shown in Figure 9.

    We find that EDF credit measures have more discriminatory power than RiskCalc EDF credit measures, which we expected because RiskCalc does not incorporate firm-specific equity market information. Their Accuracy Ratios are at 0.82 and 0.68 respectively. We further divided the sample into subsets of sizes of 30 million dollars to 300 million dollars, and 300 million dollars and above. In both cases, EDF credit measures outperform RiskCalc EDF credit measures, as shown in Table 5. Both measures perform better for larger firms.

    Once again, as a robustness check, we compared the performance of the two measures across the time periods 1996–2000 and 2001–2006. The results are presented in Table 5. As expected, our results are fairly robust, with the EDF credit measures outperforming the RiskCalc EDF credit measures in both periods. The Accuracy Ratio of the EDF credit measure is higher in the second period, while the Accuracy Ratio of the RiskCalc EDF credit measure stays the same.

    11 Size is measured by the sales of the firm for non-financial firms. Whenever the firm's total sales number was not available, we used the book asset value of the firm. This number was further adjusted for inflation across years by converting it to a common denomination using a deflation adjustor calculated internally at Moody's KMV.


    FIGURE 9 Cumulative Accuracy Performance (CAP) curves comparing Moody's KMV EDF credit measures and RiskCalc EDF credit measures between 1996 and 2006 for U.S. non-financial companies. The Accuracy Ratios for EDF measure and RiskCalc EDF measure are 0.82 and 0.68 respectively.


    TABLE 5 Summary of Accuracy Ratios for EDF Credit Measures and RiskCalc EDF Credit Measures for U.S. non-financial companies by different size buckets and time periods

    Date / Size                         EDF Credit Measure   RiskCalc EDF Credit Measure
    1996–2006, Size > $30 Million       0.82                 0.68
    1996–2000, Size > $30 Million       0.81                 0.68
    2001–2006, Size > $30 Million       0.83                 0.68
    1996–2006, Size $30–$300 Million    0.76                 0.64
    1996–2006, Size > $300 Million      0.89                 0.72

    The EDF credit measure effectively discriminates between good and bad credits. It performed better than the Z-Score, the RiskCalc EDF credit measure applied to public firms, and a simple implementation of the Merton model. It leads agency ratings in anticipating defaults, and it performs well across multiple cuts of the data and multiple horizons.

    4.1.4 Accuracy of Levels – U.S.

    The test for this criterion draws on the methodology used by Kurbat and Korablev (2002) and Bohn, Arora and Korablev (2005), which is described in Section 3. We also extended this methodology by using Bayesian methods to compute the posterior distribution of the aggregate shock given the realized default rate, the model, and the correlation coefficient, as described in Dwyer (2007).

    The alternative credit risk measures either cannot be directly interpreted as physical default probabilities or do not provide a framework that accounts for the underlying correlations between assets. Therefore, they cannot be compared against EDF credit measures in the level test.¹² In addition, defaults of smaller firms may be hidden or missing, as explained in Kurbat and Korablev (2002). Therefore, consistent with that paper, we restrict this test to firms with size of 300 million dollars and above.

    We first present results broken down by coarser levels of the EDF credit measure, then repeat the analysis for narrower ranges of the measure.

    Results for Firms with EDF Values Below 35%

    In the previous validation studies (Kurbat and Korablev (2002), Bohn, Arora and Korablev (2004)), the test was performed on the EDF 7.1 model, which was capped at 20%. In that case the predicted number of defaults was likely to underestimate the realized number of defaults due to the truncation effect. Therefore we divided our sample into two: EDF credit measures less than 20% and EDF credit measures equal to 20%.

    One of the main features of the EDF 8.0 model is the new cap of 35%. We can now expect the truncation effect to lessen or even disappear. Nevertheless, to be consistent with the previous studies, we split the sample into two: firms with EDF values less than 35% (3500 bps) and firms with EDF values equal to 35%. The comparison for the sample of firms with EDF values less than 35% is shown in Figure 10. The left panel of Figure 10 displays the mean and median predicted (by simulation) default rates and the actual default rate for EDF values below 35%, along with the 80% confidence set for the predicted default rate. We used an asset correlation of 0.19 to simulate defaults in each year. The right panel of Figure 10 presents the posterior distribution of the aggregate shock given the actual default rate, together with the P-value of the actual default rate, which is the probability of observing a default rate at or below the actual default rate.

    12 The exception is the Merton model, but the default probabilities it implies are too low, so it would usually underestimate the number of defaults.

    The predicted default rate tracks the realized default rate very well. In every year, the realized default rate falls within the 80% confidence set for the predicted default rate (see Table 6). The most notable deviation is 2003, which was an uncharacteristically good year for the economy and produced substantially fewer defaults than predicted. To explain the low default rate in 2003, we estimate that the U.S. economy received a positive 0.84 standard deviation shock relative to market expectations. Such a positive shock is consistent with the high returns on the S&P 500 observed during that year. The P-values of the realized default rate range from 21% to 75%, which is within the sampling variability that would be expected.

    FIGURE 10 Comparison of median predicted default rate with the realized default rate, 1991–2006. The sample was restricted to U.S. firms larger than 300 million dollars and EDF credit measure less than 35%. We used an asset correlation of 0.19 to simulate defaults in each year. On the left panel, the gray lines represent the 80% prediction interval for the realized default rate, the black line is the median predicted default rate, the blue line is the mean predicted default rate, and the red dotted line is the realized default rate. The right panel shows the aggregate shock distribution and P-values. The dark black line, rm50, is the median of the posterior distribution of the aggregate shock; the grey lines, rm10 and rm90, are the 10th and 90th percentiles; the blue line is the P-value of the actual default rate, which is the probability of observing a default rate at or below the actual default rate.

    We summarize the numbers that underlie Figure 10 in Tables 6 and 7. Table 6 contains the number of firms, the number of defaults, and the median and mean predicted default rate per year, as well as the 10th and 90th percentiles of the predicted default rate. The table shows that the correlation effect makes the distribution of default rates asymmetric: most of the probability mass lies at low default rates, with a long tail of high default rates, so the median lies below the mean. If we ignored this effect and simply took the mean default rate of the sample, we would have grossly over-predicted the realized default rate. Table 7 contains the median aggregate shock, the 10th and 90th percentiles of the aggregate shock, and the P-value by year.
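    The effect of correlation on the distribution of default rates can be sketched with a small simulation. The Python code below is a minimal illustration, not the procedure used in this paper: it assumes a one-factor Gaussian model in which each firm's asset return is a weighted sum of an aggregate shock and a firm-specific shock, and a firm defaults when its return falls below the threshold implied by its EDF value. The portfolio of EDF values, the realized rate, and all function names are hypothetical; only the asset correlation of 0.19 comes from the text.

```python
import random
from statistics import NormalDist, mean, median

STD_NORMAL = NormalDist()


def simulate_default_rates(pds, rho, n_trials, seed=0):
    """Simulate yearly default rates under an assumed one-factor Gaussian model."""
    rng = random.Random(seed)
    thresholds = [STD_NORMAL.inv_cdf(p) for p in pds]
    w_common = rho ** 0.5
    w_idio = (1.0 - rho) ** 0.5
    rates = []
    for _ in range(n_trials):
        # Aggregate shock shared by all firms; positive means good news (assumed sign).
        shock = rng.gauss(0.0, 1.0)
        defaults = sum(
            1 for t in thresholds
            if w_common * shock + w_idio * rng.gauss(0.0, 1.0) < t
        )
        rates.append(defaults / len(pds))
    return rates


def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted list, with q in [0, 1]."""
    index = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[index]


# Hypothetical portfolio of 200 firms with EDF values between 0.1% and 10%.
hypothetical_pds = [0.001] * 120 + [0.01] * 50 + [0.05] * 20 + [0.10] * 10
rates = sorted(simulate_default_rates(hypothetical_pds, rho=0.19, n_trials=5000))

mean_rate = mean(rates)
median_rate = median(rates)
low, high = percentile(rates, 0.10), percentile(rates, 0.90)  # 80% prediction interval

# Share of simulated rates at or below a hypothetical realized default rate.
realized_rate = 0.005
p_value = sum(r <= realized_rate for r in rates) / len(rates)
```

    In this sketch the simulated mean stays close to the average EDF value of the portfolio, while the median falls below it, which is the asymmetry visible in Table 6.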


    TABLE 6 Comparison of mean and median predicted default rate with the realized default rate between 1991 and 2006

    Year   Mean Predicted   Median Predicted   Realized       10th         90th         Firms   Defaults
           Default Rate     Default Rate       Default Rate   Percentile   Percentile
    1991   2.3%             1.7%               2.5%           0.5%         4.9%         1554    39
    1992   1.4%             1.0%               1.0%           0.2%         3.2%         1549    15
    1993   1.3%             0.9%               0.9%           0.2%         2.9%         1639    15
    1994   1.1%             0.7%               0.6%           0.1%         2.5%         1775    10
    1995   1.2%             0.7%               0.9%           0.1%         2.6%         1847    16
    1996   1.2%             0.8%               0.9%           0.2%         2.8%         1906    17
    1997   1.2%             0.8%               0.8%           0.2%         2.6%         2054    17
    1998   1.1%             0.7%               0.9%           0.1%         2.4%         2114    20
    1999   1.8%             1.2%               1.0%           0.3%         3.9%         2106    22
    2000   2.6%             1.9%               1.9%           0.5%         5.5%         2042    38
    2001   3.6%             2.8%               2.7%           0.8%         7.3%         1783    48
    2002   2.5%             1.9%               1.8%           0.5%         5.4%         1707    31
    2003   3.0%             2.3%               1.0%           0.6%         6.2%         1635    16
    2004   1.2%             0.8%               0.7%           0.2%         2.8%         1699    12
    2005   0.8%             0.5%               1.0%           0.1%         1.9%         1806    18
    2006   0.7%             0.4%               0.2%           0.1%         1.5%         1835    4

    The sample was restricted to U.S. firms larger than 300 million dollars with EDF credit measures less than 35%.
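    The tabulated values can be checked mechanically. The short Python sketch below copies the realized default rate and the 10th and 90th percentiles from Table 6 (rounded to one decimal, as printed) and lists the years in which the realized rate lies outside the 80% interval; the variable and function names are hypothetical.

```python
# Rows copied from Table 6: (year, realized %, 10th percentile %, 90th percentile %).
TABLE_6 = [
    (1991, 2.5, 0.5, 4.9), (1992, 1.0, 0.2, 3.2), (1993, 0.9, 0.2, 2.9),
    (1994, 0.6, 0.1, 2.5), (1995, 0.9, 0.1, 2.6), (1996, 0.9, 0.2, 2.8),
    (1997, 0.8, 0.2, 2.6), (1998, 0.9, 0.1, 2.4), (1999, 1.0, 0.3, 3.9),
    (2000, 1.9, 0.5, 5.5), (2001, 2.7, 0.8, 7.3), (2002, 1.8, 0.5, 5.4),
    (2003, 1.0, 0.6, 6.2), (2004, 0.7, 0.2, 2.8), (2005, 1.0, 0.1, 1.9),
    (2006, 0.2, 0.1, 1.5),
]


def years_outside_interval(rows):
    """Years whose realized default rate lies outside [10th, 90th percentile]."""
    return [year for year, realized, p10, p90 in rows if not p10 <= realized <= p90]


outside = years_outside_interval(TABLE_6)
```

    With the printed values, no year falls outside the interval, although 2003 sits close to the lower bound.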


    TABLE 7 Summary table of aggregate shock and year-wise probability of realizing the actual number of defaults between 1991 and 2006

    Year   10th Percentile   Median Aggregate Shock   90th Percentile   Probability of Actual Defaults or Fewer
    1991   -0.64             -0.42                    -0.19             68.7%
    1992   -0.28             0.03                     0.34              51.7%
    1993   -0.37             -0.07                    0.23              57.9%
    1994   -0.16             0.19                     0.52              47.3%
    1995   -0.42             -0.13                    0.15              58.9%
    1996   -0.35             -0.06                    0.23              54.7%
    1997   -0.35             -0.06                    0.22              57.7%
    1998   -0.52             -0.25                    0.01              64.2%
    1999   -0.11             0.15                     0.40              47.7%
    2000   -0.20             0.02                     0.23              51.3%
    2001   -0.17             0.04                     0.24              49.5%
    2002   -0.21             0.03                     0.26              52.0%
    2003   0.54              0.84                     1.13              21.3%
    2004   -0.20             0.13                     0.45              51.4%
    2005   -0.86             -0.58                    -0.31             74.5%
    2006   -0.02             0.46                     0.92              45.1%

    The sample was restricted to U.S. firms larger than 300 million dollars with EDF credit measures less than 35%.

    Results for Firms with EDF Values Equal to 35%

    Figure 11 shows the median predicted and actual default rates for firms with EDF credit measures of 35%. We used 0.181 as the asset correlation for pairs of firms in each year to simulate defaults. The companies in this sample are, on average, somewhat less correlated with each other than the set of firms with EDF credit measures of less than 35%. The realized default rate ranges from 10% to 67% (see Table 8). The high default rate in 1998 is indicative of a large negative shock, which is shown in Figure 11 along with the P-values of the realized default rate. The P-values range from 8% to 93%, which is within the sampling variability that would be expected over a 15-year period.


    FIGURE 11 Comparison of median predicted default rate with the realized default rate, 1991–2006. The sample was restricted to U.S. firms larger than 300 million dollars and EDF credit measure equal to 35%. We used an asset correlation of 0.181 to simulate defaults in each year. On the left panel, the gray lines represent the 80% prediction interval for the realized default rate, the black line is the median predicted default rate, the blue line is the mean predicted default rate, and the red dotted line is the realized default rate. The right panel shows the aggregate shock distribution and P-values. The dark black line, rm50, is the median of the posterior distribution of the aggregate shock; the grey lines, rm10 and rm90, are the 10th and 90th percentiles; the blue line is the P-value of the actual default rate, which is the probability of observing a default rate at or below the actual default rate.

    We summarize the numbers that underlie Figure 11 in Tables 8 and 9. Table 8 contains the number of firms, the number of defaults, and the median and mean predicted default rate per year, as well as the 10th and 90th percentiles of the predicted default rate. As in Table 6, the correlation effect makes the distribution of default rates asymmetric, with the median predicted default rate below the mean. Table 9 contains the median aggregate shock, the 10th and 90th percentiles of the aggregate shock, and the P-value by year.
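    To make the aggregate-shock calculation concrete, the following Python sketch approximates a posterior distribution of the shock on a grid. It is an illustration under stated assumptions rather than the method of Dwyer (2007): it assumes a standard normal prior on the shock, a one-factor Gaussian model for the default probability conditional on the shock, and a binomial likelihood for a group of firms that all share the same EDF value. The inputs are the 1998 figures for the 35% group (15 firms, 10 defaults, asset correlation 0.181); the function names and grid settings are hypothetical.

```python
from math import comb
from statistics import NormalDist

STD_NORMAL = NormalDist()


def conditional_pd(pd, rho, shock):
    """Default probability given the aggregate shock (assumed one-factor Gaussian model)."""
    z = (STD_NORMAL.inv_cdf(pd) - rho ** 0.5 * shock) / (1.0 - rho) ** 0.5
    return STD_NORMAL.cdf(z)


def shock_posterior_quantiles(n_firms, n_defaults, pd, rho, probs=(0.10, 0.50, 0.90)):
    """Grid approximation of posterior quantiles of the aggregate shock.

    Assumes a standard normal prior and a binomial likelihood for a homogeneous
    group of firms; both are illustrative assumptions.
    """
    grid = [-6.0 + 0.001 * i for i in range(12001)]  # shocks from -6 to +6
    weights = []
    for shock in grid:
        p = conditional_pd(pd, rho, shock)
        likelihood = (
            comb(n_firms, n_defaults)
            * p ** n_defaults
            * (1.0 - p) ** (n_firms - n_defaults)
        )
        weights.append(STD_NORMAL.pdf(shock) * likelihood)
    total = sum(weights)
    targets = sorted(probs)
    quantiles = []
    cumulative = 0.0
    for shock, weight in zip(grid, weights):
        cumulative += weight / total
        while targets and cumulative >= targets[0]:
            quantiles.append(shock)
            targets.pop(0)
    return quantiles


# 1998 inputs for the 35% group: 15 firms, 10 defaults, EDF 35%, correlation 0.181.
q10, q50, q90 = shock_posterior_quantiles(15, 10, pd=0.35, rho=0.181)
```

    Under these assumptions, a realized default rate well above 35% pulls the posterior toward negative shocks, so the median shock for 1998 is negative, in line with the direction reported in Table 9.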


    TABLE 8 Comparison of mean and median predicted default rate with the realized default rate between 1991 and 2006

    Year   Mean Predicted   Median Predicted   Realized       10th         90th         Firms   Defaults
           Default Rate     Default Rate       Default Rate   Percentile   Percentile
    1991   35.0%            33.4%              40.0%          12.7%        59.7%        30      12
    1992   35.0%            33.4%              24.0%          12.3%        60.3%        25      6
    1993   35.0%            33.5%              33.3%          11.8%        60.9%        21      7
    1994   35.0%            33.5%              11.8%          11.2%        61.9%        17      2
    1995   35.0%            33.7%              15.4%          10.4%        63.5%        13      2
    1996   35.0%            33.6%              12.5%          11.0%        62.2%        16      2
    1997   35.0%            34.2%              11.1%          9.2%         67.1%        9       1
    1998   35.0%            33.6%              66.7%          10.8%        62.6%        15      10
    1999   35.0%            33.4%              35.1%          13.1%        59.2%        37      13
    2000   35.0%            33.5%              38.3%          13.6%        58.8%        47      18
    2001   35.0%            33.5%              40.2%          14.5%        57.8%        107     43
    2002   35.0%            33.5%              41.0%          13.9%        58.4%        61      25
    2003   35.0%            33.5%              38.0%          14.1%        58.2%        71      27
    2004   35.0%            33.4%              23.1%          12.4%        60.1%        26      6
    2005   35.0%            33.5%              10.0%          11.7%        61.1%        20      2
    2006   35.0%            33.5%              16.7%          11.4%        61.6%        18      3

    The sample was restricted to U.S. firms larger than 300 million dollars and EDF credit measure equal to 35%.


    TABLE 9 Summary table of aggregate shock and year-wise probability of realizing the actual number of defaults between 1991 and 2006

    Year   10th Percentile   Median Aggregate Shock   90th Percentile   Probability of Actual Defaults or Fewer
    1991   -0.86             -0.30                    0.24              63.1%
    1992   -0.21             0.44                     1.06              30.6%
    1993   -0.65             0.00                     0.63              50.0%
    1994   0.13              0.94                     1.76              10.8%
    1995   -0.14             0.69                     1.54              17.0%
    1996   0.07              0.88                     1.71              12.0%
    1997   -0.21             0.72                     1.68              12.4%
    1998   -1.96             -1.23                    -0.53             92.8%
    1999   -0.60             -0.08                    0.42              53.8%
    2000   -0.71             -0.24                    0.22              60.2%
    2001   -0.68             -0.35                    -0.04             64.4%
    2002   -0.79             -0.38                    0.03              65.7%
    2003   -0.63             -0.24                    0.15              60.0%
    2004   -0.15             0.49                     1.11              28.7%
    2005   0.30              1.08                     1.89              8.0%
    2006   -0.03             0.73                     1.48              17.9%

    The sample was restricted to U.S. firms larger than 300 million dollars with EDF credit measures equal to 35%.

    Results by EDF Subgroups

    To test the robustness of our results, we further divide the sample of firms with EDF values less than 35% into smaller groups. The EDF buckets we used, along with the correlation used for default simulation in each bucket, are presented in Table 10.

    TABLE 10 EDF buckets

    Stratum   EDF Range (%)   Correlation   Number of Firms
    1         0.01–5          0.191         22887
    2         5–12            0.177         1264
    3         12–35           0.192         442

    Figures 12, 13, and 14 show the median, the mean, and the prediction interval for the realized default rate, together with the actual default rate, for EDF values in the ranges [0.01, 5), [5, 12), and [12, 35), respectively. These figures show that while the predicted and realized default rates can deviate from each other in certain years, there is no substantial bias in their levels over the long run. In general, the two levels track each other very well, and the realized default rates fall within the prediction interval. Year 2003 was an uncharacteristically good year for the economy, leading to a substantially lower number of defaults in two of the three subgroups.

    FIGURE 12 Comparison of median predicted default rate with the realized default rate, 1991–2006. The sample was restricted to U.S. firms larger than 300 million dollars and EDF credit measure between 0.01% and 5%. We used an asset correlation of 0.191 to simulate defaults in each year. On the left panel, the gray lines represent the 80% prediction interval for the realized default rate, the black line is the median predicted default rate, the blue line is the mean predicted default rate, and the red dotted line is the realized default rate. The right panel shows the aggregate shock distribution and P-values. The dark black line, rm50, is the median of the posterior distribution of the aggregate shock; the grey lines, rm10 and rm90, are the 10th and 90th percentiles; the blue line is the P-value of the actual default rate, which is the probability of observing a default rate at or below the actual default rate.

    FIGURE 13 Comparison of median predicted default rate with the realized default rate, 1991–2006. The sample was restricted to U.S. firms larger than 300 million dollars and EDF credit measure between 5% and 12%. We used an asset correlation of 0.177 to simulate defaults in each year. On the left panel, the gray lines represent the 80% prediction interval for the realized default rate, the black line is the median predicted default rate, the blue line is the mean predicted default rate, and the red dotted line is the realized default rate. The right panel shows the posterior distribution of the aggregate shock and P-values. The dark black line, rm50, is the median of the posterior distribution of the aggregate shock; the grey lines, rm10 and rm90, are the 10th and 90th percentiles; the blue line is the P-value of the actual default rate, which is the probability of observing a default rate at or below the actual default rate.

    FIGURE 14 Comparison of median predicted default rate with the realized default rate, 1991–2006. The sample was restricted to U.S. firms larger than 300 million dollars and EDF credit measure between 12% and 34.99%. We used an asset correlation of 0.192 to simulate defaults in each year. On the left panel, the gray lines represent the 80% prediction interval for the realized default rate, the black line is the median predicted default rate, the blue line is the mean predicted default rate, and the red dotted line is the realized default rate. The right panel shows the aggregate shock distribution and P-values. The dark black line, rm50, is the median of the posterior distribution of the aggregate shock; the grey lines, rm10 and rm90, are the 10th and 90th percentiles; the blue line is the P-value of the actual default rate, which is the probability of observing a default rate at or below the actual default rate.

    4.1.5 Timely Default Prediction – Outside the U.S.

    The timeliness test outside the U.S. produces results very similar to those in the U.S. The median EDF credit measure starts rising 24 months before the actual default, while the median rating moves from B2 to B3 18 months before default, then stays flat until six months before default, at which point it deteriorates sharply. EDF credit measures clearly lead ratings.


    FIGURE 15 Comparison of median agency ratings with Moody's KMV EDF values for defaulted firms from two years before default to one year after default for North American companies outside the U.S. and sample period between 1996 and 2006

    4.1.6 Default Predictive Power – Outside the U.S.

    In this section, we compare Moody's KMV EDF credit measures against Z-Scores and a simple Merton model in their ability to discriminate between good and bad firms. We do not perform a power test against agency ratings because of the small number of rated defaults outside the U.S.

    EDF Credit Measure vs. Merton Default Probability and Z-Score

    In this section we compare the performance of EDF credit measures to the Merton model and Z-Scores as described in Section 2. The sample period is 1996–2006. We filtered out very small firms (size < 30 million dollars) from our sample, as we did for U.S. companies.¹³ Results for the entire period 1996–2006 are shown in Figure 16. The results are based on a joint sample of Z-Scores, Merton default probabilities, and EDF credit measures: all three values must be non-missing for a firm to be included.

    We find that the EDF credit measure discriminates good firms from bad firms more effectively than the Merton default probability and the Z-Score, with Accuracy Ratios of 0.78, 0.70 and 0.65, respectively. Because of the sample size, we do not divide the sample into subsamples as we did for the U.S.

    13 Size is measured by the sales of the firm for non-financial firms. Whenever the firm's total sales number was not available, we used the book asset value of the firm. This number was further adjusted for inflation across years by converting it to a common denomination using a deflation adjustor calculated internally at Moody's KMV.


    FIGURE 16 Cumulative Accuracy Performance (CAP) curves comparing EDF credit measures, Merton Default Probability and Z-Scores between 1996 and 2006 for North American companies outside the U.S. The Accuracy Ratios for the EDF credit measure, Merton Default Probability and Z-Score are 0.78, 0.70 and 0.65 respectively.

    4.1.7 Accuracy of Levels – Outside the U.S.

    Figure 17 presents the level validation results for the sample of firms with EDF credit measures below 35%. The left panel of Figure 17 displays the mean and median predicted default rates and the actual default rate, as well as the 80% confidence set for the predicted default rate. We used an asset correlation of 0.19 to simulate defaults in each year. The right panel of Figure 17 displays the posterior distribution of the aggregate shock given the actual default rate, together with the P-value of the actual default rate, which is the probability of observing a default rate at or below the actual default rate.

    The predicted default rate tracks the realized default rate very well, and the realized default rate fluctuates around the median predicted default rate. In all years except 1991, the realized default rate falls within the confidence set. Year 1991 was a good year, leading to a lower number of defaults. To explain the low default rate in 1991, we estimate that the U.S. economy received a positive 0.82 standard deviation shock relative to market expectations. The P-values are between 8% and 75%, which is in the range we expect over a 15-year period.


    FIGURE 17 Comparison of median predicted default rate with the realized default rate, 1991–2006. The sample was restricted to North American firms outside the U.S. larger than 300 million dollars and EDF credit measure less than 35%. We used an asset correlation of 0.19 to simulate defaults in each year. On the left panel, the gray lines represent the 80% prediction interval for the realized default rate, the black line is the median predicted default rate, the blue line is the mean predicted default rate, and the red dotted line is the realized default rate. The right panel shows the posterior distribution of the aggregate shock and P-values. The dark black line, rm50, is the median of the aggregate shock; the grey lines, rm10 and rm90, are the 10th and 90th percentiles of the aggregate shock; the blue line is the P-value of the actual default rate, which is the probability of observing a default rate at or below the actual default rate.

    4.1.8 Conclusion

    Results obtained for the North American sample show that the EDF credit measure leads the agency rating in timely default prediction. The EDF credit measure also outperforms the alternative measures in its ability to discriminate good firms from bad firms over time and across various subsamples of the data. We also showed that the model-predicted default rates track realized default rates well, and that the model works well not only in the U.S. but also in North America excluding the U.S.

    4.2 Europe

    In this section, we describe the results obtained in Europe.

    4.2.1 Diversity in Bankruptcy Mechanisms and Creditor Protection

    Bankruptcy mechanisms can differ between regions. For example, Davydenko and Franks (2005) found that while the British bankruptcy mechanism is designed to be extremely creditor friendly, the French system is geared toward protecting a business as a going concern even at the expense of its creditors.¹⁴ While interpreting the validation results, it is important to understand the impact of these mechanisms on the outcome of the model. For example, if a system is too creditor-friendly, the creditors can pressure the firm at the slightest hint of distress. This action may cause a firm to file for bankruptcy sooner, although the recovery for creditors may be higher. On the other hand, if the system is too geared toward protecting a firm, the creditors may not be allowed to take a firm to court even if it is in severe distress.

    14 A brief description of the similarities and differences among British, French, and German bankruptcy mechanisms is provided in Korablev (2005).


    A second characteristic is the nature of debt in an economy. A creditor-debtor relationship might be close (as in Japan), or at arm's length (as in the U.S.). If the creditors are few and have a close relationship with the debtor, they are more likely to evaluate the long-term potential of the debtor before taking it into bankruptcy. If the creditors are scattered, there is a higher likelihood of a free-rider problem, leading to a forced bankruptcy even if the debtor may have some long-term positive potential.

    In general, we see an equal contribution of non-bankruptcy defaults and bankruptcies in North America, while the European cases of distress are dominated by bankruptcies, as shown in Figure 18.¹⁵ This may be influenced by two factors. First, in many economies within Europe, debt is held more closely than in the U.S., making it more likely that firms enter private renegotiations of debt and avoid default during a liquidity crunch. Second, many defaults may not be covered by the media, and are in that sense hidden. These two factors should not apply to larger firms, because their debt is usually widely held and they are followed more closely by the media.

    Figure 19 compares the percentage representation of default cases in Europe and North America by size over the period 1996–2006. Defaults as a fraction of total distress cases are substantially smaller in Europe for small and mid-sized firms. Larger firms, however, have more comparable default behavior across Europe and North America. This suggests that the model validation is more reliable on the sample of large companies, because the data on actual defaults are of better quality.

    North America Europe

    FIGURE 18 Percentage representation of defaults and bankruptcies in North American

    and European Markets between 1996 and 2006

    North America Europe

    FIGURE 19 Default events as a percentage of all distress cases across three size buckets

    between 1996 and 2006

    15

    The following events constitute non-bankruptcy defaults: missed interest or principle payment, distressed extension of a loan, distressed exchange offer, delay in paying substantial portion of trade debt, and government takeover of financial institution to prevent market collapse.


  • 34

    The success of a model relies on the ability of the inputs to take regional nuances into account. A model whose inputs are not universal in concept may have more difficulty capturing the differences in characteristics of the system in which it is being implemented. As long as the economic fundamentals of a model are universal in nature, it is not necessary to interpret its output differently across different regions. For the Moody’s KMV EDF model, one of the main drivers is asset value, which is inferred from the equity value and an underlying structural framework. The model should work well for data from individual regions and for data pooled across them because the equity markets take into account the regional differences.

    The extent to which different equity markets accurately reflect firm value and volatility has implications for the power and the level performances of the model. In fact, even if a model is powerful in discriminating defaulters from non-defaulters in different regions, but is off in its level performance, the aggregation of data across regions will make the model seem less powerful. For example, if a distance-to-default (DD) of 2 corresponds to an EDF credit measure of 5% in the U.K., but 2% in France, then an aggregation of data would incorrectly suggest that both a U.K. firm and a French firm with a DD of 2 correspond to the same rank in our test. In that sense, a default predictive power test on a dataset aggregated across different regions essentially tests a joint hypothesis that the model is powerful and that the DD-to-EDF mapping is similar across different regions. It could be the case that the model might be powerful in two regions separately, but may appear less powerful if the data are aggregated.
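
    The pooling effect described above can be illustrated with a small numerical sketch. The DD values, default flags, and region labels below are invented for illustration and are not taken from the validation sample; the Accuracy Ratio is computed with the standard identity AR = 2 x AUC - 1. In each hypothetical region, DD separates defaulters from survivors perfectly, yet ranking the pooled sample by DD gives a lower Accuracy Ratio because the same DD carries a different default risk in each region.

```python
from itertools import product


def accuracy_ratio(firms):
    """Accuracy Ratio from (dd, defaulted) pairs, where a lower DD means riskier.

    Uses AR = 2 * AUC - 1, with AUC the share of (defaulter, survivor) pairs
    in which the defaulter has the lower DD (ties count one half).
    """
    defaulters = [dd for dd, defaulted in firms if defaulted]
    survivors = [dd for dd, defaulted in firms if not defaulted]
    score = 0.0
    for d, s in product(defaulters, survivors):
        if d < s:
            score += 1.0
        elif d == s:
            score += 0.5
    auc = score / (len(defaulters) * len(survivors))
    return 2.0 * auc - 1.0


# Hypothetical data: in region "UK" firms default at a higher DD than in "FR",
# i.e. the same DD maps to a higher default probability in the UK.
uk = [(2.0, True), (3.0, True), (4.0, False), (5.0, False)]
fr = [(0.5, True), (1.0, True), (1.5, False), (2.5, False)]

ar_uk = accuracy_ratio(uk)
ar_fr = accuracy_ratio(fr)
ar_pooled = accuracy_ratio(uk + fr)
print(ar_uk, ar_fr, ar_pooled)  # 1.0 1.0 0.625
```

    In this toy setting the drop in the pooled Accuracy Ratio comes entirely from the difference in levels between the two regions, not from any loss of ranking ability within a region.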

    Similarly, while testing for levels, one could imagine that the model had specified levels in two regions incorrectly, overestimating the default rate in one region and underestimating it in the other. However, it may work well on the aggregated dataset. Therefore, a reasonable level performance on aggregated data is a necessary, but not a sufficient, test for the level performance of the model in each region. Unfortunately, there is an insufficient number of defaults available to perform a reliable level test in each subregion of Europe.

    4.2.2 Data

    We start with all European firms that have publicly traded equity between 1996 and 2006. The sample was then restricted to non-financial firms with more than $30 million in size to avoid the missing and hidden default problem (see footnote 16). For level validation, we imposed a further restriction of $300 million in size.

    16

    Following our practice in North America, size is measured by the sales of the firm for non-financial firms. Whenever the firm’s total sales number was not available, we used the book asset value of the firm. This number was further adjusted for inflation across years by converting the numbers to a common denomination using the appropriate consumer price index and exchange rate.

  • 35

    TABLE 11 Number of companies by country in the European Module of Credit Monitor and CreditEdge

    Country          Country Code   Size >= $30 Million   Size >= $300 Million
    Austria          AUT            112                   60
    Belgium          BEL            134                   77
    Switzerland      CHE            232                   153
    Czech Republic   CZE            75                    30
    Germany          DEU            796                   394
    Denmark          DNK            171                   77
    Spain            ESP            177                   124
    Finland          FIN            153                   81
    France           FRA            897                   413
    Great Britain    GBR            1959                  788
    Greece           GRC            256                   66
    Hungary          HUN            37                    15
    Ireland          IRL            76                    37
    Iceland          ISL            7                     5
    Israel           ISR            136                   48
    Italy            ITA            301                   172
    Luxembourg       LUX            31                    23
    Netherlands      NLD            238                   152
    Norway           NOR            232                   90
    Poland           POL            118                   43
    Portugal         PRT            90                    37
    Russia           RUS            61                    57
    Slovakia         SVK            16                    4
    Slovenia         SVN            8                     7
    Sweden           SWE            311                   134
    Turkey           TUR            165                   55

    We also present the level validation results for the subsample of countries that have more than 100 companies with size of at least $300 million. These countries are Switzerland, Germany, Spain, France, Great Britain, Italy, Netherlands, and Sweden. The number of firms by country and size is shown in Table 11.

  • 36

    Defaults are based on the Moody’s KMV Default database and include missed payments, distressed exchanges, and insolvency proceedings (see footnote 17). For all comparisons against agency ratings, we used Moody’s ratings.

    4.2.3 Timely Default Prediction

    In this section, we compare the performance of EDF credit measures against agency ratings in their ability to predict defaults in a timely manner, following the methodology described in Section 3.1. We create a sample of defaulted firms, retaining monthly observations from 24 months prior to default until 10 months after default. We included only those observations that had a non-missing history of EDF credit measures and ratings 24 months prior to default. We compute the median EDF credit measure and the median rating by months to default and overlay the two series.
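
    The event-time aggregation described above can be sketched as follows. This is a minimal illustration, not the procedure of Section 3.1: the function name, the tuple layout, and the sample values are assumptions, and only the EDF series is handled (the median rating would need a numeric rating scale, which is omitted here).

```python
from collections import defaultdict
from statistics import median


def median_edf_by_event_month(observations, start=-24, end=10):
    """Median EDF by months relative to default.

    observations: iterable of (firm_id, months_to_default, edf) tuples, where
    months_to_default is negative before default and positive afterwards.
    Firms without an observation at `start` are dropped, mirroring the
    requirement of a non-missing history 24 months before default.
    """
    observations = list(observations)
    has_history = {firm for firm, month, _ in observations if month == start}
    by_month = defaultdict(list)
    for firm, month, edf in observations:
        if firm in has_history and start <= month <= end:
            by_month[month].append(edf)
    return {month: median(values) for month, values in sorted(by_month.items())}


# Hypothetical observations (EDF in percent); firm "C" lacks month -24 data.
sample = [
    ("A", -24, 0.5), ("A", -12, 2.0), ("A", 0, 20.0),
    ("B", -24, 1.5), ("B", -12, 4.0), ("B", 0, 35.0),
    ("C", -12, 9.0), ("C", 0, 30.0),
]
profile = median_edf_by_event_month(sample)
print(profile)  # {-24: 1.0, -12: 3.0, 0: 27.5}
```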

    Figure 20 demonstrates that in the event of default, EDF credit measures become elevated 11 months before ratings. Ratings move later and more abruptly, giving the most signal in the last nine months.

    FIGURE 20 Median agency ratings and Moody’s KMV EDF values for rated defaulted firms in Europe from 24 months before default to 10 months after default between 1996 and 2006

    4.2.4 Default Predictive Power

    EDF credit measures outperform simple Merton model implied default probabilities and Z-Scores in their ability to discriminate between defaulters and non-defaulters, as Figure 21 shows. The Accuracy Ratios for the EDF credit measure, Merton default probability, and Z-Score are 0.79, 0.70, and 0.61, respectively.
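
    For readers who want to reproduce this kind of statistic on their own data, the sketch below computes an Accuracy Ratio from a CAP curve: the area between the CAP and the random (diagonal) line, divided by the same area for a perfect model. The function names and the eight-firm sample are illustrative assumptions, and tied scores are not treated specially.

```python
def cap_curve(scores, defaulted):
    """CAP points (x, y): share of firms reviewed vs. share of defaults captured.

    scores: risk scores where higher means riskier (e.g. a default probability).
    defaulted: matching booleans. Ties in score are not treated specially here.
    """
    ranked = sorted(zip(scores, defaulted), key=lambda pair: pair[0], reverse=True)
    n_firms = len(ranked)
    n_defaults = sum(1 for _, d in ranked if d)
    points = [(0.0, 0.0)]
    captured = 0
    for i, (_, d) in enumerate(ranked, start=1):
        captured += 1 if d else 0
        points.append((i / n_firms, captured / n_defaults))
    return points


def accuracy_ratio_from_cap(scores, defaulted):
    """Area between the CAP and the random line, relative to a perfect model."""
    points = cap_curve(scores, defaulted)
    # Trapezoidal area under the CAP curve.
    area = sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
    default_rate = sum(1 for d in defaulted if d) / len(defaulted)
    # A perfect model captures all defaults within the first `default_rate` firms.
    perfect_area = 1.0 - default_rate / 2.0
    return (area - 0.5) / (perfect_area - 0.5)


# Hypothetical scores for eight firms, three of which defaulted.
scores = [0.20, 0.15, 0.10, 0.08, 0.05, 0.03, 0.02, 0.01]
defaulted = [True, True, False, True, False, False, False, False]
ar = accuracy_ratio_from_cap(scores, defaulted)
print(round(ar, 4))  # 0.8667
```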

    17

    To collect defaults, we use numerous printed and online sources from around the world on a daily basis. We use government filings, government agency sources, company announcements, news services, specialized default news sources, and even sources within financial institutions to ensure, to the greatest extent possible, that we find all defaults. We also keep evidence in electronic format so that content can be easily verified. As a result, Moody’s KMV has the most extensive default database for public firms.

  • 37

    We divide the sample into subsets of sizes $30 million to $300 million, and $300 million and above. In both cases the EDF credit measure outperforms the Merton model implied default probability and Z-Score, as shown in Table 12. All the measures improve for larger firms.

    As a robustness check, we compared the performance of the three measures across the time periods 1996–2000 and 2001–2006. The results, presented in Table 12, illustrate that the EDF credit measure outperforms the Merton model and Z-Score in both periods. The EDF credit measure and Merton default probability perform better in the 1996–2000 period, while the Z-Score has a higher Accuracy Ratio in the second period.

    FIGURE 21 Cumulative Accuracy Profile (CAP) curves comparing Moody’s KMV EDF credit measures, Z-Scores and Merton default probabilities for European non-financial firms between 1996 and 2006. The Accuracy Ratios for EDF measure, Z-Score and Merton default probability are 0.79, 0.61 and 0.70, respectively.

    We summarize our findings in this section in Table 12. The results clearly show that the EDF credit measure in Europe outperforms the other popular alternatives in its ability to discriminate good firms from bad firms at a 1-year horizon.

  • 38

    TABLE 12 Summary of Accuracy Ratios, across various size buckets and time periods for European non-financial firms

    Period and Size                     EDF Credit Measure   Z-Score   Simple Merton Model
    1996–2006, Size > $30 Million       0.79                 0.61      0.70
    1996–2000, Size > $30 Million       0.79                 0.53      0.71
    2001–2006, Size > $30 Million       0.78                 0.64      0.64
    1996–2006, Size $30–$300 Million    0.75                 0.60      0.64
    1996–2006, Size > $300 Million      0.83                 0.65      0.77

    4.2.5 Level Validation with Default Data

    To validate the accuracy of levels we followed the methodology described in Section 3.3.

    Results for the Whole Sample

    Figure 22 shows the level validation results for the sample of European firms with size greater than $300 million. The left panel of Figure 22 displays the mean and median predicted default rates and the actual default rate, along with the 80% prediction interval for the default rate. We used an asset correlation of 0.25 to simulate defaults in each year. The right panel of Figure 22 presents the posterior distribution for the aggregate shock given the actual default rate, together with the P-value of the actual default rate, which is the probability of observing a default rate at or below the actual default rate.

    The predicted default rate tracks the realized default rate well. There are exceptions, however, during times of systematic shock. For example, 2002 was a year when the markets crashed and there was an unexpectedly high number of defaults compared to what the model predicted. We estimated the shock at -0.29 standard deviations. Similarly, 2003 was an uncharacteristically good year for the economy, leading to a substantially lower number of defaults. The graph of the aggregate shocks shows that in 2003 the economy experienced a positive shock of 1.02 standard deviations, which led to that low default rate.
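
    The sketch below illustrates how a prediction interval and a P-value of this kind can be obtained by simulation. It assumes a one-factor Gaussian model with a common asset correlation, which is a simplification chosen for illustration and not necessarily the exact procedure of Section 3.3; the portfolio, the realized default rate, and all function and variable names are hypothetical.

```python
import random
from statistics import NormalDist, mean, median

STD_NORMAL = NormalDist()


def simulate_default_rates(pds, asset_corr=0.25, n_sims=2000, seed=0):
    """Simulated portfolio default rates under a one-factor Gaussian model.

    Firm i defaults when sqrt(rho) * rm + sqrt(1 - rho) * eps_i falls below
    the threshold inv_cdf(pd_i); rm is the aggregate shock shared by all firms.
    """
    rng = random.Random(seed)
    thresholds = [STD_NORMAL.inv_cdf(pd) for pd in pds]
    load_m = asset_corr ** 0.5
    load_e = (1.0 - asset_corr) ** 0.5
    rates = []
    for _ in range(n_sims):
        rm = rng.gauss(0.0, 1.0)
        # Equivalent to drawing eps_i: each firm defaults with its default
        # probability conditional on the aggregate shock rm.
        defaults = sum(
            rng.random() < STD_NORMAL.cdf((t - load_m * rm) / load_e)
            for t in thresholds
        )
        rates.append(defaults / len(pds))
    return rates


# Hypothetical portfolio: 300 firms, each with a 1% one-year default probability.
rates = sorted(simulate_default_rates([0.01] * 300))
p10, p90 = rates[int(0.10 * len(rates))], rates[int(0.90 * len(rates))]
realized = 0.0066  # hypothetical realized default rate
p_value = sum(r <= realized for r in rates) / len(rates)
print(f"mean={mean(rates):.4f} median={median(rates):.4f} "
      f"p10={p10:.4f} p90={p90:.4f} p_value={p_value:.2f}")
```

    With correlated defaults the simulated distribution is strongly asymmetric, so the mean sits well above the median; this is why the realized default rate is compared with the median and the prediction interval rather than with the mean alone.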

    The results show that all realized default rates fall within the prediction interval. The P-values of the realized default rate range from 17% to 65%, which is within the sampling variability that would be expected.

  • 39

    FIGURE 22 Comparison of median predicted default rate with the realized default rate, 1996–2006. The sample was restricted to European non-financial firms larger than 300 million dollars. We used an asset correlation of 0.25 to simulate defaults in each year. On the left panel, the gray lines represent the 80% prediction interval for the realized default rate, the black line is the median predicted default rate, the blue line is the mean predicted default rate, and the red dotted line is the realized default rate. The right panel shows the aggregate shock distribution and P-values. The dark black line, rm50, is the median of the posterior distribution of the aggregate shock; the gray lines, rm10 and rm90, are the 10th and 90th percentiles; the blue line is the P-value of the actual default rate, which is the probability of observing a default rate at or below the actual default rate.

    We summarize the numbers that underlie Figure 22 in Tables 13 and 14. Table 13 contains the number of firms, the number of defaults, the median and mean predicted default rates per year, as well as the 10th and 90th percentiles of the predicted default rate. We find that the mean predicted default rates are much larger than the median predicted default rates, indicating that the correlation effect makes the distribution of default rates strongly asymmetric: most of the probability mass lies at low default rates, with a long tail of high default rates. If we had ignored this effect and simply taken the mean default rate of the sample, we would have falsely concluded that the model overpredicts defaults. Table 14 contains the median aggregate shock, the 10th and 90th percentiles of the aggregate shock, and the P-value by year.

  • 40

    TABLE 13 Comparison of mean and median predicted number of defaults with the realized number of defaults between 1996 and 2006

    Year   Mean Predicted   Median Predicted   Realized       10th         90th         Firms   Defaults
           Default Rate     Default Rate       Default Rate   Percentile   Percentile
    1996   0.87%            0.40%              0.44%          0.00%        2.00%        1596    7
    1997   0.90%            0.50%              0.37%          0.00%        2.10%        1610    6
    1998   0.67%            0.30%              0.38%          0.00%        1.40%        1588    6
    1999   0.98%            0.50%              0.35%          0.00%        2.20%        1692    6
    2000   0.92%            0.50%              0.38%          0.00%        2.10%        1580    6
    2001   1.34%            0.80%              0.88%          0.10%        3.10%        1360    12
    2002   1.92%            1.20%              1.70%          0.20%        4.40%        1409    24
    2003   3.09%            2.20%              0.66%          0.50%        6.80%        1513    10
    2004   1.65%            1.00%              0.77%          0.20%        3.80%        1563    12
    2005   1.14%            0.70%              0.31%          0.10%        2.60%        1627    5
    2006   0.47%            0.20%              0.06%          0.00%        0.80%        1607    1

    The sample was restricted to European firms larger than 300 million dollars.
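
    As a quick arithmetic check on the table above, the realized default rate in each year should equal the number of defaults divided by the number of firms. The snippet below recomputes it from the reported firm and default counts; the variable and function names are illustrative.

```python
# (year, firms, defaults, realized default rate in percent) as reported above.
TABLE_13 = [
    (1996, 1596, 7, 0.44),
    (1997, 1610, 6, 0.37),
    (1998, 1588, 6, 0.38),
    (1999, 1692, 6, 0.35),
    (2000, 1580, 6, 0.38),
    (2001, 1360, 12, 0.88),
    (2002, 1409, 24, 1.70),
    (2003, 1513, 10, 0.66),
    (2004, 1563, 12, 0.77),
    (2005, 1627, 5, 0.31),
    (2006, 1607, 1, 0.06),
]


def realized_rate_pct(firms, defaults):
    """Realized default rate in percent."""
    return 100.0 * defaults / firms


# Years whose reported rate differs from defaults / firms by more than rounding.
mismatches = [
    year for year, firms, defaults, reported in TABLE_13
    if abs(realized_rate_pct(firms, defaults) - reported) > 0.005
]
print(mismatches)  # []
```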

    TABLE 14 Summary of aggregate shock and year-wise probability of realizing the actual number of defaults between 1996 and 2006

    Year   10th Percentile   Median Aggregate Shock   90th Percentile   Probability of having actual defaults or even lower
    1996   -0.30             0.04                     0.37              56.09%
    1997   -0.16             0.21                     0.57              47.61%
    1998   -0.44             -0.07                    0.28              60.46%
    1999   -0.08             0.28                     0.62              46.15%
    2000   -0.18             0.18                     0.53              48.31%
    2001   -0.37             -0.08                    0.21              55.76%
    2002   -0.52             -0.29                    -0.06             64.51%
    2003   0.70              1.02                     1.32              17.02%
    2004   -0.04             0.26                     0.54              42.95%
    2005   0.15              0.54                     0.92              38.48%
    2006   -0.05             0.57                     1.17              49.24%

    The sample was restricted to the European firms larger than 300 million dollars.

  • 41

    Results for Countries having at Least 100 Companies with Size Greater than $300 Million

    We restricted the sample to the countries that have at least 100 companies with size greater than $300 million. These countries tend to have larger equity markets. For these companies, the predicted default rate tracks the realized default rate very well as shown in Figure 23. The relatively low default rate in year 2003 is indicative of a large positive shock. The P-values of the realized default rate range from 20% to 69%, which is within the sampling variability that would be expected.

    FIGURE 23 Comparison of median predicted default rate with the realized default rate, 1996–2006. The sample was restricted to European non-financial firms larger than 300 million dollars from the following countries: Switzerland, Germany, Spain, France, Great Britain, Italy, Netherlands, and Sweden. We used an asset correlation of 0.25 to simulate defaults in each year. On the left panel, the gray lines represent the 80% prediction interval for the realized default rate, the black line is the median predicted default rate, the blue line is the mean predicted default rate, and the red dotted line is the realized default rate. The right panel shows the aggregate shock distribution and P-values. The dark black line, rm50, is the median of the posterior distribution of the aggregate shock; the gray lines, rm10 and rm90, are the 10th and 90th percentiles; the blue line is the P-value of the actual default rate, which is the probability of observing a default rate at or below the actual default rate.

    4.2.6 Level Validation with CDS Data

    The number of defaults observed for larger firms in Europe was lower than in North America, making the test somewhat less powerful than in North America. Therefore, we present another, indirect validation of EDF credit measures in Europe. This test analyzes the level bias in the European EDF credit measure relative to that of the U.S. EDF credit measure. The rationale for the test is the assumption that similar risks command similar premia in the U.S. and Europe. So, if we subdivide the firms by EDF category, then the same EDF categories should have the same aggregate median spreads in CDS markets across the two regions.

    For example, if EDF levels in Europe substantially overstated the level of default risk in Europe relative to North America, then if we were to compare a European firm to a North American firm with a comparable EDF, the European firm on average would have a substantially lower CDS spread. Conversely, if there were no such systematic bias between EDF credit measures in North America versus Europe, then the median spread should be approximately the same.
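
    A minimal sketch of this comparison follows. The quotes, bucket labels, and function names are hypothetical; the point is only to show the grouping by region and EDF-implied rating and the relative gap between regional medians. Under the reasoning above, a clearly negative gap in a bucket would suggest that European EDF levels overstate risk relative to the U.S.

```python
from collections import defaultdict
from statistics import median


def median_spread_by_bucket(quotes):
    """Median CDS spread (basis points) per (region, EDF-implied rating)."""
    grouped = defaultdict(list)
    for region, rating, spread_bp in quotes:
        grouped[(region, rating)].append(spread_bp)
    return {key: median(values) for key, values in grouped.items()}


def relative_gap(medians, rating, region_a="Europe", region_b="U.S."):
    """Median spread in region_a relative to region_b, minus one."""
    return medians[(region_a, rating)] / medians[(region_b, rating)] - 1.0


# Hypothetical quotes: (region, EDF-implied rating, CDS spread in bp).
quotes = [
    ("Europe", "A", 28.0), ("Europe", "A", 35.0), ("Europe", "A", 41.0),
    ("U.S.", "A", 30.0), ("U.S.", "A", 35.0), ("U.S.", "A", 44.0),
    ("Europe", "Baa", 60.0), ("Europe", "Baa", 72.0), ("Europe", "Baa", 90.0),
    ("U.S.", "Baa", 66.0), ("U.S.", "Baa", 80.0), ("U.S.", "Baa", 95.0),
]
medians = median_spread_by_bucket(quotes)
for rating in ("A", "Baa"):
    print(rating, round(relative_gap(medians, rating), 3))
```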

    In Figure 24, we compare the median, 25th, and 75th percentile CDS spreads for the Aa-and-above and A EDF-implied rating categories. The median spreads as well as the 25th and 75th percentiles over time are comparable in the U.S. and Europe, thereby indicating no relative bias in EDF levels in Europe compared with those in the U.S. We also tried this for Baa, Ba, B and

  • 42

    Caa EDF-implied rating categories and found comparable results (see footnote 12). The results are shown in Figures 25 and 26, respectively. There was some overlap in the underlying names in the two currencies. However, our findings are robust to using a completely non-overlapping sample as well. The sub-investment-grade names can be affected by liquidity risk, which can differ across regions, thereby making the test less reliable.

    FIGURE 24 Comparison of CDS spreads in the U.S. and Europe for Aa and above and A EDF-implied rating categories

    Blue lines represent 25th, median and 75th percentile of the CDS spread in Europe and red lines are similar data for the U.S.

    12

    The category Aaa is not shown because there were very few observations for CDS contracts in this category.


  • 43

    FIGURE 25 Comparison of CDS spreads in the U.S. and Europe for Baa and Ba EDF-implied rating categories

    Blue lines represent 25th, median and 75th percentile of the CDS spread in Europe and red lines are similar data for the U.S.

    FIGURE 26 Comparison of CDS spreads in the U.S. and Europe for B and Caa EDF implied rating categories

    Blue lines represent 25th, median and 75th percentile of the CDS spread in Europe and red lines are similar data for the U.S.

    4.2.7 Conclusion

    We showed that in Europe, EDF credit measures lead agency ratings in timely default prediction. EDF credit measures lead other alternative measures in their ability to discriminate good firms from bad firms over time and across various


  • 44

    subsections of the data. Model-predicted default rates track realized default rates well and CDS spreads are similar to those in the U.S. for the same EDF-implied rating categories.

    4.3 Asia

    In this section, we describe the results obtained in Asia.

    4.3.1 Data

    We start with all Asian firms that have publicly traded equity from 1996 to 2006. We restrict the sample to non-financial firms with more than $30 million in size (unless otherwise specified) to account for hidden or missing defaults (see footnote 18). Defaults are based on the Moody’s KMV Default database and include missed payments, distressed exchanges, and insolvency proceedings.

    Table 15 shows the number of companies by country in the Asian module of Credit Monitor and CreditEdge for two size categories: above $30 million and above $300 million.

    We decided to exclude some countries from level validation:

    - China, because the default definition under government intervention is not clear
    - Australia and New Zealand, because they belong to the Pacific region
    - Japan, because it has a different economic structure and a hidden default problem
    - Pakistan and Sri Lanka, because they have a small number of companies

    The remaining countries have the most comprehensive default coverage. These countries are Hong Kong, India, Indonesia, Korea, Malaysia, Philippines, Singapore, Thailand, and Taiwan. We ran power and level validation tests separately for Japan.

    18

    Size is measured by the sales of the firm for non-financial firms. Whenever the firm’s total sales number was not available, we used the book asset value of the firm. This number was further adjusted for inflation across years by converting the numbers to a common denomination using a deflation adjustor calculated internally at Moody’s KMV.

  • 45

    TABLE 15 Number of companies in Asian Module of Credit Monitor and CreditEdge by country and size

    Country       Country Code   Size >= $30 Million   Size >= $300 Million
    Australia     AUS            844                   258
    China         CHN            1357                  385
    Hong Kong     HKG            695                   231
    Indonesia     IDN            200                   64
    India         IND            567                   174
    Japan         JPN            3955                  2274
    Korea         KOR            899                   377
    Sri Lanka     LKA            19                    1
    Malaysia      MYS            643                   135
    New Zealand   NZL            101                   47
    Pakistan      PAK            83                    24
    Philippines   PHL            96                    25
    Singapore     SGP            494                   129
    Thailand      THA            360                   79
    Taiwan        TWN            1196                  310

    4.3.2 Timely Default Prediction

    In this section, we compare the performance of EDF credit measures against agency ratings in their ability to predict defaults in a timely manner, following the methodology described in Section 3.1. We create a sample of defaulted firms, retaining monthly observations from 24 months prior to default until 10 months after default. We included only those observations that had a non-missing history of EDF values and ratings 24 months prior to default. We compute the median EDF credit measure and the median rating by months to default and overlay the two series.

    Figure 27 demonstrates that in the event of default, EDF credit measures become elevated 10 months before ratings.

  • 46

    FIGURE 27 Median agency ratings and Moody’s KMV EDF values for all rated defaulted firms in Asia from 24 months before default to 10 months after default between 1996 and 2006. EDF values are displayed on log scale.

    4.3.3 Default Predictive Power

    The EDF credit measure has more discriminatory power than the Z-Score and the Merton default probability in Hong Kong, India, Indonesia, Korea, Malaysia, Philippines, Singapore, Thailand, and Taiwan, as can be seen in Figure 28. The Accuracy Ratio for the EDF credit measure is 0.67. In contrast to the power tests performed in North America and Europe, the Z-Score outperforms the simple Merton model implied default probability in its ability to discriminate between bad and good firms, with Accuracy Ratios of 0.57 and 0.56, respectively.

  • 47

    FIGURE 28 Cumulative Accuracy Profile (CAP) curves comparing Moody’s KMV EDF credit measures and Z-Scores for Asian non-financial companies between 10/2001 and 12/2006. The Accuracy Ratios for EDF measure, Z-Score and Merton Default Probabilities are 0.67, 0.57 and 0.56, respectively.

    The EDF credit measure also has more discriminatory power than the Z-Score and the Merton default probability in Japan. Consistent with the results in the other nine countries, the Z-Score has a higher Accuracy Ratio than the Merton default probability. CAP curves are presented in Figure 29. The Accuracy Ratios of the EDF credit measure, Z-Score, and Merton default probability are 0.89, 0.79, and 0.77, respectively.

  • 48

    FIGURE 29 CAP curves comparing Moody’s KMV EDF credit measures and Z-Scores for Japanese non-financial companies between 10/2001 and 12/2006. The Accuracy Ratios for the EDF credit measure, Z-Score, and Merton Default Probabilities are 0.89, 0.79 and 0.77, respectively.

    4.3.4 Level Validation

    Figure 30 shows the level validation results for the sample of Asian firms (Hong Kong, India, Indonesia, Korea, Malaysia, Philippines, Singapore, Thailand, and Taiwan) with size greater than $300 million. The left panel of Figure 30 displays the mean and median predicted default rates and the actual default rate, along with the 80% prediction interval for the predicted default rate. We used an asset correlation of 0.25 to simulate defaults in each year. The right panel of Figure 30 presents the posterior distribution for the aggregate shock given the actual default rate, together with the P-value of the actual default rate, which is the probability of observing a default rate at or below the actual default rate.

    Collection of default data in Asia is more difficult than in the U.S. and Europe because of language barriers, poor reporting of default events, and government intervention to prevent company collapse, which often goes unreported. We could expect the underprediction of defaults in 1996, 1997, and 1998 because of the severe Asian financial crisis. The overprediction of defaults in 2001 and 2002 may reflect market uncertainties regarding the Asian recovery while Europe and North America were in recessions. The P-values of the realized default rate range from 11% to 87%, which is within the sampling variability that would be expected.