
Analyzing the Use of the Through-The-Cycle Methodology in US Bank Ratings 1986-2009

Steven Jonker

s1387006

Master Thesis

January 27, 2010

Supervisor: Prof. Dr. L.J.R. Scholtens

Second supervisor: Dr. H. Gonenc

MSc Finance

Faculty of Economics & Business

University of Groningen


ANALYZING THE USE OF THE THROUGH-THE-CYCLE METHODOLOGY IN US BANK RATINGS 1986-2009

STEVEN D.B. JONKER 1

ABSTRACT

My thesis is the first to analyze and quantify the use of the through-the-cycle (T-T-C) methodology in agency ratings of banks. I set up two credit-scoring models: one that predicts agency ratings in an ordered logit setting and one that predicts bank defaults in a logit setting. The dataset consists of 8,185 observations of US banks over the period 1986-2009. The relative stability of agency ratings and the serial correlation in rating downgrades provide strong evidence of the use of the T-T-C methodology in bank ratings. Furthermore, my findings suggest that rating agencies implicitly employ a three-year default horizon in their rating models of banks. This default horizon is shorter than that evidenced in the literature for agency ratings of non-financial companies, in line with the empirical irregularities which indicate that the effects of the T-T-C methodology may be less pronounced in bank ratings.

JEL classification:

C35; G21; G24

Keywords:

Bank ratings; Through-the-cycle methodology; Rating agencies; Ordered logit

1 Contact information: Oudezijds Voorburgwal 212-I, 1012 GJ Amsterdam, The Netherlands, or via e-mail: s.d.b.jonker@student.rug.nl. I would like to thank my parents, girlfriend, friends and my supervisor Prof. Dr. Scholtens for their comments and support.


I. INTRODUCTION

“There are two superpowers in the world today in my opinion. There’s the United States and there’s Moody’s Bond Rating Service. The United States can destroy you by dropping bombs, and Moody’s can destroy you by downgrading your bonds. And believe me, it’s not clear sometimes who’s more powerful.”

This statement by writer Thomas Friedman (1996) reveals the important role of the credit rating industry2 in today's world. Originally, a credit rating agency3 was above all a publisher of credit opinions, the world's shortest editorials (Husisian, 1990). Over the last two decades, however, the influence of rating agencies has grown considerably, stimulated inter alia by the Basel II Accord, which further increased the reliance on agency ratings in the financial system (Basel Committee on Banking Supervision, 2001). As a consequence, the rating practices of agencies have been increasingly criticized.4 Often, the criticism is set off by the inability of agencies to warn investors of imminent corporate default in times of crisis (Cinquegrana, 2009).

Examples are the Asian Financial Crisis in 1997, where downgrades occurred during the crisis rather than before it (IMF, 1999), and the collapse of Enron in 2001, where Enron's investment-grade rating, maintained up until four days before it declared bankruptcy, triggered investigations into the rating practices of agencies (US Senate, 2002). As a result, many financial market participants believe that rating agencies are passive in adjusting their ratings to new information on a company's deteriorating creditworthiness (Gonzalez et al., 2004). In other words, agency ratings are perceived to be relatively stable. Furthermore, surveys performed by academics show that market participants would prefer more accurate and timely ratings over stable ratings (Ellis, 1998; Association for Financial Professionals, 2004; Cantor et al., 2007). In the aftermath of the 2007-2009 financial crisis, a prominent hedge-fund manager said: "Most of the companies that have run into trouble during the financial crisis were or still are AAA rated, including AIG, Fannie Mae, Freddie Mac, (…) and General Electric."5 This comment suggests that the issue of rating stability has also been relevant during the most recent crisis, in which predominantly banks have defaulted.

The purpose of my thesis is to obtain more insight into how rating agencies arrive at stable ratings that indicate the creditworthiness of banks. It has become widely known that the perceived stability of ratings is created deliberately by rating agencies, as they use a through-the-cycle (T-T-C) methodology in their rating models (Basel Committee on Banking Supervision, 2000). This methodology implies assessing a company's default probability in the downside scenario of the business cycle, so that the company's rating tends to exhibit stability over the course of the business cycle. By contrast, the point-in-time (P-I-T) approach creates ratings based on a company's creditworthiness under the current (economic) conditions. Figure I provides a graphical explanation of both approaches.

2 A credit rating is defined as “an assessment of how likely an issuer (i.e. a company) is to make timely payments on a financial obligation”, thereby indicating the creditworthiness of the company and implicitly its default probability (IOSCO, 2003). For a historic overview of the credit rating industry, see Cantor and Packer (1995) and Sylla (2001).

3 In this thesis, a credit rating agency refers to the three major agencies: Fitch, Moody's and Standard & Poor's (S&P). New raters have entered the market recently, but the three major agencies still clearly dominate the credit rating industry with a combined market share of 98% (SEC, 2008).

4 The criticism focuses on the agencies’ disclosure practices, potential conflict of interest, anticompetitive practices, and diligence and competence (Frost, 2006).

5 Passage cited from David Einhorn's speech at an investment conference (MarketWatch, 2009). Einhorn is well-known for foreseeing the September 2008 bankruptcy of Lehman Brothers, having been a short seller of Lehman's stock since July 2007.


Figure I: Point-In-Time versus Through-The-Cycle

The figure shows the different rating perspectives of the point-in-time (P-I-T) and the through-the-cycle (T-T-C) approaches and has been adapted from Borio et al. (2001). The bold line in the upper panel displays the distance to default of a company. A high distance to default indicates a low default probability and vice versa. The distance to default varies over time, presumably following the course of the business cycle. P-I-T ratings closely follow the trend of the distance to default, as shown by the bold line in the lower panel (in which higher ratings involve a lower default probability). The dashed line in the upper panel represents an arbitrary distance to default that rating agencies set for a company in the downside scenario, reflecting the company's permanent or fundamental creditworthiness. Generally, this distance will be small (implying a higher default probability) because of the relatively difficult economic conditions. The bold line in the lower panel reveals that T-T-C ratings are based on this downside-scenario distance to default and therefore exhibit stability through the trend of the business cycle.

[Two-panel figure. Upper panel: an arbitrary distance measure plotted over time, showing the distance to default and the downside-scenario distance. Lower panel: the corresponding P-I-T and T-T-C ratings plotted over the same time axis.]
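To make the contrast concrete, the following minimal sketch (in Python, with an illustrative cyclical distance-to-default series and hypothetical rating cutoffs of my own choosing, not taken from the thesis) reproduces the logic of Figure I: a P-I-T rating tracks the current distance to default, while a T-T-C rating is anchored to the downside-scenario distance and therefore stays flat over the cycle.

```python
import numpy as np

T = 80                                   # quarters
t = np.arange(T)
# Illustrative distance to default: a business-cycle swing around a trend.
dtd = 6 + 3 * np.sin(2 * np.pi * t / 40)
downside_dtd = np.full(T, 4.0)           # assumed downside-scenario distance

cutoffs = np.array([2, 4, 6, 8])         # hypothetical rating-class boundaries

def to_rating(distance):
    """Map a distance to default to an ordinal rating class (1 = worst)."""
    return np.searchsorted(cutoffs, distance) + 1

pit_rating = to_rating(dtd)              # follows the cycle
ttc_rating = to_rating(downside_dtd)     # stable: based on the downside scenario

print("P-I-T ratings:", pit_rating[:10])
print("T-T-C ratings:", ttc_rating[:10])
```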

But why do rating agencies choose to create T-T-C ratings? According to Cantor and Mann (2007), agencies respond to two conflicting needs of financial market participants: (1) rating accuracy, which refers to a high performance of ratings in predicting defaults correctly and timely; and (2) rating stability, which involves a moderate frequency and magnitude of rating changes. While the desire for accurate ratings seems intuitive, the rationale for stable ratings is evidenced by Löffler (2004a), who finds that rating stability prevents excessive transaction costs for investors. In the opposite case of continuous rating adjustments, more transactions are triggered by portfolio governance rules, which force an investment manager to sell off an asset after its rating is downgraded. Continuous rating adjustments are likely to include many rating reversals, creating the excessive transaction costs. Furthermore, stable ratings benefit the economy as a whole, since credit decisions of moneylenders often rely on agency ratings. Rating stability over the business cycle therefore prevents overlending during economic booms due to overly optimistic ratings and underlending during recessions because of overly pessimistic ratings (Catarineu-Rabell et al., 2005). Interestingly, stable ratings are also in the interest of agencies themselves. Stable ratings make frequent rating reversals due to incorrect rating assessments less likely, which preserves the agencies' reputation (Frost, 2006). To balance the needs of rating accuracy and stability, agencies affirm that they use the T-T-C methodology; e.g. Moody's states that its ratings "are intended to be accurate and stable measures of relative credit risk, as determined by each issuer's relative fundamental creditworthiness and without reference to explicit time horizons" (Cantor and Mann, 2003). Similarly, S&P (2006) maintains that its ratings are meant to be stable by employing an indefinable time horizon in the future and only adjusting to permanent changes in a company's creditworthiness. Despite these insights into the agencies' intentions, the exact measures that agencies employ in the T-T-C methodology to create stable ratings remain unknown to the outside world.

In response to the aforementioned criticism of agencies, both EU and US authorities aim to increase transparency by introducing regulations in the hitherto self-regulated credit rating industry.6 In light of this push for transparency, a more detailed disclosure of the T-T-C methodology is nevertheless unlikely, since the exact rating models that agencies use to arrive at their ratings are proprietary information (Smith and Walter, 2002). The aim of my research is therefore to find the measures that rating agencies appear to set in the T-T-C methodology to create stable ratings.

In this thesis I analyze the agencies' use of the T-T-C methodology in the rating models of banks. In this analysis, I conceptualize the methodology as the employment of a default horizon and a migration policy. The default horizon refers to the time period for which ratings reflect the default probability of a bank, whereas the migration policy contains the measures that determine how agencies adjust their ratings after new information about a bank's creditworthiness becomes available. My research objective is twofold: (1) to verify that agencies use the T-T-C methodology in bank ratings; and (2) to identify the default horizon and the migration policy which agencies implicitly set for bank ratings. To the best of my knowledge, this thesis is the first to analyze the use of the T-T-C methodology in bank ratings by quantifying the methodology. The empirical results of my research extend the understanding of T-T-C bank ratings and are relevant to financial market participants, who may consider the implicit time horizon and migration policy when they utilize bank ratings. Moreover, the most recent financial crisis of 2007-2009 saw many bank defaults. Examining ratings for the banking industry may provide interesting results given the specific characteristics of bank ratings that emerge from earlier studies. Empirical evidence suggests that bank ratings are less stable than the ratings of non-financial companies (Nickell et al., 2000) and that the serial correlation in rating downgrades is only significant in higher rating classes (Lando and Skødeberg, 2002). The effects of the T-T-C methodology in bank ratings therefore seem to be less pronounced. In my research I use a dataset of US banks that combines their quarterly S&P credit rating with accounting ratios, market information and two proxies over the period 1986-2009. My study thereby also adds new empirical evidence to the literature on the basis of an up-to-date dataset.

Following the methodology of Altman and Rijken (2004, 2006, 2007), I set up and compare two credit-scoring models. First, the rating model predicts agency ratings in an ordered logit regression setting. Second, the default model has the occurrence of a bank default event within various default horizons as the dependent variable. I estimate this model in a standard logit regression. Both models use the same independent variables, but the default horizon of the default model is altered by including only default events that occur within a specific time period. First, I verify the use of the T-T-C methodology in bank ratings by investigating the relative stability of agency ratings compared to P-I-T ratings in rating migration distributions. Further evidence of the T-T-C methodology is found in the average rating migrations of agency ratings, where I observe serial correlation in rating downgrades. Subsequently, I detect the default horizon that agencies appear to employ by comparing the relative weights of the predictors in the rating model and in the default models of different default horizons. I improve upon the relative weight calculation of Altman and Rijken (2004, 2006) by using the method of relative weights for logistic regressions designed by Tonidandel and LeBreton (2009). My findings suggest that rating agencies implicitly employ a default horizon of three years in their ratings of banks. To identify the migration policy, I adjust the rating model to account for different migration policies by including a threshold value and an adjustment fraction. The threshold value expresses that rating agencies only change the rating of a bank if the change in creditworthiness is of substantial magnitude. If a rating change is triggered, the agencies only partly adjust the rating to the new rating level, by the value of the adjustment fraction. I construct implied ratings from the adjusted models and compare the average rating migrations of these implied ratings with those of agency ratings. However, the effects of the different migration policies are inconclusive, so that I am not able to identify a specific migration policy that agencies might employ. All in all, my results on rating stability and serial correlation provide strong evidence of the use of the T-T-C methodology in agency ratings of banks, whereby the agencies implicitly employ a three-year default horizon.

6 One of the main conclusions of a post-Enron report by the Securities and Exchange Commission (SEC) is that with more transparency the marketplace will form a better understanding of ratings (SEC, 2003). As a result, the Credit Rating Agency Reform Act of 2006 obligates agencies to disclose their ratings procedures and methodologies (US Senate, 2006). Moreover, the EU recently approved a proposal that requires agencies to present their assumptions and limitations extensively and to publish a transparency report annually (EU, 2009).

The remainder of this thesis proceeds as follows. Section II reviews the literature, connects this thesis to the relevant studies, and presents the hypotheses. Section III sets out the methodology. Section IV describes the data and provides descriptive statistics. Section V discusses the results. Finally, Section VI concludes.

II. REVIEW OF THE LITERATURE

This thesis connects to several strands of literature. First, this section describes how my research builds on the extensive academic work that uses credit-scoring models to explain agency ratings and company defaults. In addition, I combine two types of implied ratings that emerge from the literature. Furthermore, this section discusses how my thesis relates to the stream of literature presenting empirical evidence that verifies the T-T-C methodology; in particular, it is most closely related to those papers that specifically examine bank ratings. The empirical results place this thesis both in the field of these verification studies and in the developing area of studies that quantify the measures by which rating agencies employ the T-T-C methodology. At the end of this section I present my hypotheses.

Academic researchers have, as I do in this thesis, proposed numerous models that attempt to explain a company's creditworthiness. These are often called credit-scoring models (see Altman and Saunders, 1998, for an overview). The present thesis discusses two credit-scoring models: one that evaluates agency ratings in an ordered logit setting and one that explains bank defaults in a logit regression setting.7 My work therefore falls into two different branches of the literature on credit-scoring models. The first branch contains the models of agency rating determination and prediction that have been developed ever since the inception of ratings. In these rating models the agency rating is the dependent variable. In early studies, such as those by Horrigan (1966) and West (1970), each rating class is assigned an ordinal number.

A regression of these numbers is then performed on a set of independent variables which are believed to have explanatory power over the creditworthiness of companies, and hence over their credit ratings. In the literature (e.g. Kamstra et al., 2001) three well-established procedures emerge to estimate rating models: ordinary least squares (OLS) regression, discriminant analysis, and ordered probit/logit models. However, OLS does not account for the ordinal and discrete nature of agency ratings. Furthermore, discriminant analysis underperforms since it cannot cope with the non-normality of independent variables (Lennox, 1999). For these reasons I conclude that the ordered probit/logit technique is superior to OLS and discriminant analysis, in line with the conclusion of Ederington (1985), who compares the three methods. In its use of the ordered logit technique, the present thesis relates to a number of previous studies on agency ratings in this branch (e.g. Deyoung et al., 2001; Kamstra et al., 2001; Shawn Strother, 2009). In other related studies (Poon et al., 1999; Godlewski, 2004; Caporale et al., 2009) the ordered logit model is used specifically for bank ratings. The second branch of the literature on credit-scoring models consists of the default models. Instead of employing agency ratings as the measure of creditworthiness, these models have the default occurrence of companies as the dependent variable. They thereby attempt to predict the default probability of a company directly.8 The logit model serves as a common method to estimate these default models. Previous applications of the logit model which relate to my work include early research by Ohlson (1980) and later studies by Johnsen and Melicher (1994), Carey and Hrycay (2001) and Shumway (2001). The logit model setting also facilitates the explanation of bank failures in particular (Martin, 1977; Estrella et al., 2000; Kolari et al., 2002). Interestingly, two specific characteristics of banks are found to be relevant in the research into both bank ratings and bank defaults. Evidence by Wheelock and Wilson (2000) documents that recently established banks are associated with a higher default probability. This age effect diminishes after the first few years of a bank's existence. Rime (2005) reports that banks which are expected to be "too big to fail" enjoy a rating bonus of several notches. He shows that a bank's size has a significant, positive effect on the bank's creditworthiness.

With credit-scoring models it is possible to construct implied ratings: ratings that are not officially issued, but that are based on any set of creditworthiness indicators. The outcome of a credit-scoring model is a credit score, which is then converted into an implied rating on the agency rating scale.

7 See Cramer (2003) for an extensive explanation of the (ordered) logit model and its applications in economics.

8 The informed reader may notice that agency ratings are nothing more than an indirect measure of creditworthiness, since the assignment of rating classes is based on the ordering of companies' default probabilities. Nevertheless, a distinction between rating models and default models is still relevant, as their methodology differs due to the ordinal nature of ratings.


Depending on the kind of independent variables in the credit-scoring models, I distinguish two different types of implied ratings: (1) those based on accounting ratios and (2) market-implied ratings. The former type was pioneered by Beaver (1966) and Altman (1968), and for banks by Meyer and Pifer (1970). More recent studies still use this method (Blume et al., 1998; Amato and Furfine, 2004). A specific set of accounting ratios to measure the default probability of banks has been developed by the Federal Reserve (Deyoung et al., 2001): the CAMELS system. In general, the technique for accounting-based ratings involves choosing, as independent variables, a set of key accounting ratios which predict a company's default probability. These ratios are based on the financial statements of a company. Market-implied ratings, on the other hand, have various sources, as they can be based on bond spreads, information from the equity market, or spreads of credit default swaps (cf. Breger et al., 2003; Kealhofer, 2003; Fitch, 2007). The value of market-implied ratings has recently been recognized by the rating agencies themselves (Cantor and Mann, 2003). As a result, the agencies regularly report implied ratings along with their own agency ratings (Kou and Varotto, 2008). This validates the set-up of the paper by Altman and Rijken (2004), who use both accounting and market information to create implied ratings, which they compare to agency ratings. I use the same combination in this thesis. Altman and Rijken's approach is unique in the literature, in the sense that they combine both a rating model and a default model with independent variables based on accounting as well as market information. The aforementioned papers in the two branches generally use only one of these features. The purpose of this set-up is consequently not only to predict ratings or explain defaults, but to benchmark agency ratings against the implied ratings. This makes it possible to verify the use of the T-T-C methodology in agency ratings and to quantify the measures that the rating agencies implicitly employ in this methodology.

It is widely acknowledged in the literature that rating agencies use a T-T-C methodology to arrive at their ratings, because of the empirical irregularities in agency ratings that have been evidenced by researchers (see for example Gordy and Howells, 2006). I recognize three particular characteristics of agency ratings for both financial and non-financial companies: (1) the relative stability of ratings, (2) the serial correlation in rating downgrades, and (3) the procyclicality of ratings. First, empirical studies (Altman and Kao, 1992; Kealhofer et al., 1998) find that agency ratings are relatively stable compared to P-I-T ratings, i.e. ratings based on a company's current condition in the current state of the business cycle. A direct result of employing the T-T-C methodology in ratings is indeed that they change less frequently. In particular, the literature finds that agency ratings of banks also exhibit stability compared to point-in-time ratings (Curry et al., 2008). However, when an industry variable is included, the volatility of rating migrations is substantially higher for banks than for industrials (Nickell et al., 2000). Thus, bank ratings are less stable.

Second, other evidence by Altman and Kao (1992) and Lando and Skødeberg (2002) documents serial correlation in rating downgrades, so that if an initial rating change is a downgrade, it is likely to be followed by another downgrade. This finding is the result of the step-wise approach that agencies take in adjusting ratings, as induced by the T-T-C methodology. For bank ratings, an analysis of the Lando and Skødeberg (2002) results reveals that the serial correlation in rating downgrades is only found significant in higher rating classes. With respect to the third issue, the results of the research on rating procyclicality are mixed. Amato and Furfine (2004) find that in general ratings are not procyclical, i.e. their level is not influenced by changes in the business cycle. However, in restricted samples of their dataset they do detect procyclicality, as also reported by other scholars (Nickell et al., 2000; Bangia et al., 2002). Furthermore, Pagratis and Stringa (2009) provide significant evidence that bank ratings exhibit procyclicality, because an economic recession following a period of credit boom implies lower bank ratings. The debate on whether ratings are procyclical is still ongoing. I argue that ratings do not necessarily have to be fully independent of the economic cycle when the aim of rating agencies is to create T-T-C ratings: using the T-T-C methodology does not yet imply that agencies are actually able to achieve the absence of procyclicality. All in all, the empirical irregularities of rating stability and serial correlation are clearly ascribed to the use of the T-T-C methodology by rating agencies (see also Löffler, 2005), while the issue of rating procyclicality remains inconclusive for the time being. Bank ratings are found to be relatively less stable and to exhibit less serial correlation than ratings of non-financials. These empirical irregularities of bank ratings thus indicate that the effects of using the T-T-C methodology may be less pronounced in bank ratings.9 Hence, to verify the use of the T-T-C methodology in bank ratings I test whether the bank ratings in my dataset exhibit the characteristics of rating stability and serial correlation. Similar to Altman and Kao (1992) and Lando and Skødeberg (2002), I evaluate the rating migration distributions and the average rating migrations with the purpose of verifying the use of the T-T-C methodology. Related verification studies include Carey and Hrycay (2001) and Löffler (2004b), who are the first to explicitly refer to the T-T-C methodology as the explanation for the empirical irregularities. Recent verification studies are Mizen and Tsoukas (2009) and Shawn Strother and Tibbs (2009), who observe that a simple scoring model based on seven financial ratios captures information about a firm's creditworthiness that is only slowly reflected in S&P ratings.

Thus, due to the verification studies, consensus exists in the literature over the adoption of the T-T-C methodology by rating agencies. However, the implications of using this methodology for the rating process, i.e. the exact measures that agencies employ to create T-T-C ratings, are less clear. In two papers, Altman and Rijken (2004, 2006) are the first to go beyond a verification of the methodology by modeling a quantification of the measures that rating agencies appear to employ in their ratings as a result of the T-T-C methodology. Altman and Rijken (2004) identify a first measure by comparing a rating model with default models of different default horizons. The default horizon refers to the time period for which ratings reflect the default probability of a company. The comparison between the models is based on the relative weights of the predictors in the agency model and all default models. Altman and Rijken conceptualize that in the rating model the relative weights of the predictors are determined by the weight that rating agencies attach to these predictors in their rating models. Therefore, the default model with the specific default horizon that best matches the rating model in relative weights identifies the default horizon of agency ratings. Altman and Rijken's (2004) findings suggest that rating agencies implicitly employ a default horizon of six years. Altman and Rijken present the migration policy of the agencies as the second measure that is implicitly employed in the T-T-C methodology. This policy is defined as the rules that agencies follow when they change their ratings after new information about a company's creditworthiness becomes available. The migration policy of agencies is identified by finding a best match in rating migration distributions and average rating migrations between the agency ratings and implied ratings that incorporate a range of different migration policies. Altman and Rijken thereby derive that ratings are only changed when the P-I-T rating prediction differs by at least 1.25 notch steps10 from the agency rating. This is conceptualized as the threshold value of rating migrations. Furthermore, they find that when a rating adjustment is triggered, the degree to which the rating is corrected depends on the adjustment fraction. This value amounts to 75 percent of the difference between the previous agency rating and the new implied rating from the default model. In their conclusion, Altman and Rijken (2004) describe these quantitative results as a long-term default horizon and a prudent migration policy which the rating agencies appear to employ to arrive at T-T-C ratings. In a later study, Altman and Rijken (2006) confirm their findings for the time period 1981-2002. They also show that rating outlooks and rating watch lists partly compensate for the stabilizing effect of the T-T-C methodology, as these measures are able to enhance rating accuracy (Altman and Rijken, 2007). The methodology in this thesis most closely follows, but also extends, that of Altman and Rijken (2004). The empirical results are therefore best compared to those of Altman and Rijken.

9 I believe a probable explanation for the different characteristics of bank ratings is the asset opaqueness hypothesis of Morgan (2002), in which he states that a bank is relatively more opaque than a non-financial firm. Furthermore, the business model of banks is by nature more risky than that of firms in non-financial sectors, since banks may change their composition of assets and liabilities overnight (expert interviews by author, 2009). Moreover, Elton et al. (2001) find higher default probabilities for banks than for industrials. These larger uncertainties and default risks enforce more frequent rating adjustments as information about a bank's creditworthiness becomes clear over time.

My empirical evidence is hereby the first to be directly compared to their results, and it additionally provides insight into the effects of examining ratings from a new and different industry sector. Nevertheless, this thesis differs from Altman and Rijken in three ways. In the first place, all three studies by Altman and Rijken use a dataset of non-financial US corporate issuer ratings. In contrast to these studies, my dataset includes only banks. Altman and Rijken may have excluded financial companies from their research because no universal set of financial ratios exists that has the same credit risk implications for financial as for non-financial firms. This problem exists, as also noted by other academics such as Carey and Hrycay (2001), due to the very different nature of the business models of financial and non-financial companies. Since my dataset only includes financial companies and excludes non-financial companies, it is sufficient to use a set of independent variables which has relevance in explaining only a bank's creditworthiness. Second, I utilize an up-to-date dataset that starts five years later than that of Altman and Rijken (2004) but ends in 2009. Third, I use a new method to calculate the relative weights of the predictors. The calculation method of Altman and Rijken (2004) potentially biases the relative weight measure due to correlation between the predictors. As a result, the relative weight value may not reflect the genuine weight of a predictor, but may be influenced by other predictors. This thesis is therefore the first to use the technique of Tonidandel and LeBreton (2009), based on the theory of relative weights by Johnson (2000). This method is specifically designed for logistic regressions and accounts for the collinearity of predictors.

10 One notch step denotes a rating migration from the previous rating class to the nearest rating class on the upper or lower side, depending on the direction of the migration.

Several researchers build on the research of Altman and Rijken (2004, 2006). I discuss three relevant papers that further address the quantification of the T-T-C methodology and relate to my thesis. First, Kou and Varotto (2008) construct implied ratings from bond spreads and conclude that these ratings can predict agency rating changes up to six months before the announcement date. In other words, the critical slowness of rating agencies in adjusting ratings to new levels of creditworthiness is also detected by comparing agency ratings with implied ratings based on this source of market information. Like the present thesis, the Kou and Varotto paper is a verification study that provides evidence of the use of the T-T-C methodology in agency ratings. Their choice of bond spreads as the sole basis for the construction of implied ratings differs from my approach, in which I choose a combination of both accounting ratios and market information. In the second paper, Löffler (2008) does not construct implied ratings, but directly compares agency ratings with (trends in) expected default frequencies based on the option-theoretic approach of Merton. His results suggest that ratings are capable of detecting the long-term trends in the expected default frequencies. Thus, agencies really possess the ability to rate T-T-C by focusing on the permanent component of a company's creditworthiness. My thesis connects to Löffler's results by demonstrating that this focus of agencies on the permanent component implicitly means that they employ a specific default horizon. Moreover, the evidence presented by Löffler rules out the possibility that the results of Altman and Rijken (2004, 2006) are derived from simple backward-looking smoothing in the statistical models. On the contrary, their results, which indicate that ratings have information power over longer time horizons, show that ratings possess a forward-looking ability. The third paper, by Posch (2006), builds on the concept of Altman and Rijken's (2004) threshold value. He also conceptualizes that agency ratings are not adjusted to market-implied changes in creditworthiness until such a change becomes large enough to exceed a certain threshold value. In contrast to the paper of Altman and Rijken, Posch uses an econometric friction model with variables based on expected default frequencies. His results are of interest as he reports a threshold value of about two notches, which is larger than the value found by Altman and Rijken (2004). Intriguingly, he also concludes that the threshold values differ depending on the direction of the rating action, i.e. whether it is an up- or downgrade. The disagreement in results between the two studies probably stems from the different datasets which they use, as the time period and geographic span are not comparable. In relation to the work of Posch (2006), the results of my research may provide more clarification with respect to the threshold value, since Posch includes financial companies in his dataset whereas Altman and Rijken (2004) exclude them. Collectively, the results of the three discussed papers provide additional evidence of, and further insight into, the use of the T-T-C methodology by rating agencies.

To summarize, my thesis adds value to the literature in the following respects: (a) to my knowledge, this thesis is the first attempt to quantify the implicit measures that rating agencies employ in the T-T-C methodology for bank ratings in particular; (b) this thesis provides an updated study, since it uses a dataset which includes the bank defaults during the 2007-2009 financial crisis; and (c) this thesis introduces into the modeling of credit ratings a new method of calculating the relative weights of logit regression predictors, using the technique of Tonidandel and LeBreton (2009).

In line with other verification studies (e.g. Carey and Hrycay, 2001; Löffler, 2004b), my research starts with verifying the use of the T-T-C methodology in bank ratings, which consists of detecting rating stability (Altman and Kao, 1992) and serial correlation (Lando and Skødeberg, 2002). For this purpose, I test the following hypothesis:

- H1: Agency ratings of banks are relatively stable compared to P-I-T ratings and exhibit serial correlation in rating downgrades.

Empirical evidence shows that bank ratings are less stable than the ratings of non-financial companies (Nickell et al., 2000) and that serial correlation in rating downgrades is only significant in the higher rating classes of bank ratings (Lando and Skødeberg, 2002). These particular characteristics indicate that the effects of the T-T-C methodology may be less pronounced in bank ratings than in ratings of non-financials. To achieve T-T-C ratings to this lesser degree, rating agencies would have to employ a shorter default horizon and a less restrictive migration policy for bank ratings, which would make agency ratings somewhat more similar to P-I-T ratings in terms of rating stability. Hence, compared to the values found for ratings of non-financials by Altman and Rijken (2004), the default horizon in bank ratings should be shorter than six years. Also, the agencies' migration policy for bank ratings should have a lower threshold value and a higher adjustment fraction than the migration policy for ratings of non-financials. Based on the literature, I therefore state the following hypotheses:

- H2: Rating agencies employ an implicit default horizon in their bank rating models which is shorter than the six years identified for ratings of non-financials by Altman and Rijken (2004).

- H3: Rating agencies employ an implicit migration policy in their bank rating models which has a lower threshold value and a higher adjustment fraction than found for ratings of non-financials by Altman and Rijken (2004).

III. METHODOLOGY

This section starts by discussing the structure and the statistical setting of the two credit-scoring models. Subsequently, I elaborate on the set of independent variables included in both models. Thereafter, the construction of implied ratings is described, as well as the analysis of rating migration distributions that leads to the verification of the T-T-C methodology in bank ratings. Next, I explain how comparing the relative weights of the predictors in both models leads to the identification of the implicit default horizon. Lastly, I demonstrate how comparing the average migrations of ratings adjusted for different migration policies with those of agency ratings may reveal the agencies' migration policy.

In line with the methodology of Altman and Rijken (2004), I set up two credit-scoring models: an agency rating (AR) prediction model and a default (DF) prediction model. The models are adapted from those of Altman and Rijken. Both models intend to explain the default probability of a bank, but employ a different regressand to account for this measure. In the AR model the dependent variable is the credit rating assigned to a bank by the agency. The dependent variable in the DF model is the occurrence of a default event of a bank for various time horizons.

For the AR model, I utilize an ordered logit regression to accommodate the ratings’ discrete values and the ordinal scale of ratings. The model is structured as follows:

AR_{i,t} = α + β X_{i,t} + ε_{i,t}    (Equation 1)

This equation estimates the agency rating score AR_{i,t} for bank i in quarter t. For bank i, X_{i,t} is a set of key accounting ratios, a market-based measure and two proxies for the age and size of the bank in quarter t. The error terms are given by ε_{i,t}. If the homoskedasticity assumption on the errors is violated, Huber-White standard errors are used. The association between the agency rating score AR_{i,t} and the actual agency rating y_{i,t} is specified by:

y_{i,t} = k    if B_{k-1} < AR_{i,t} ≤ B_k    (Equation 2)

In this equation I define that if the agency rating score falls between the boundaries B_{k-1} and B_k, the rating is in rating class k and is therefore equal to the actual rating y_{i,t}. The agency ratings are combined into seven different classes; this procedure is described in Section IV, which discusses the data. The rating classes are numbered following the approach of Horrigan (1966) and Amato and Furfine (2004): the rating class with the highest default probability is assigned the number 1, the next rating class the number 2, and so on, up to the highest number for the rating class with the lowest default probability. Kisgen (2006) also follows this approach, and I use his tests of investment-grade/speculative-grade ratings to check the robustness of the coefficients. More details about the robustness checks are provided in Section IV. The cumulative logistic function F, which gives the probability that y_{i,t} equals k, is modeled as follows:

P(y_{i,t} = k) = F(B_k - AR_{i,t}) - F(B_{k-1} - AR_{i,t})    (Equation 3)

In the ordered logit regression model, the coefficients α, β and Bk are estimated by a standard maximum likelihood procedure so that the predicted rating of the model best matches the actual rating.
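As an illustration of this estimation step, the following sketch fits an ordered logit model with statsmodels. The DataFrame `panel`, its column names and the file name are hypothetical stand-ins for the thesis dataset, not the actual data.

```python
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

# Hypothetical quarterly bank panel; rating_class is ordinal (1 = worst, 7 = best).
panel = pd.read_csv("bank_panel.csv")  # assumed file layout
predictors = ["eq_ta", "lossres_ta", "netincome_ta",
              "cash_ta", "mv_tl", "age", "size"]

# Ordered logit (Equations 1-3): the boundaries B_k and the slopes beta are
# estimated jointly by maximum likelihood.
rating = panel["rating_class"].astype(pd.CategoricalDtype(ordered=True))
ar_model = OrderedModel(rating, panel[predictors], distr="logit")
ar_fit = ar_model.fit(method="bfgs", disp=False)
print(ar_fit.summary())
```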

The DF model is structured as a standard logit regression model:

DF_{i,t} = α + β X_{i,t} + ε_{i,t}    (Equation 4)

In this equation, DF_{i,t} is the default score of bank i in quarter t, which is directly related to the default probability. Furthermore, X_{i,t} is the same set of independent variables as in Equation 1, for bank i in quarter t. The error terms are given by ε_{i,t}. Once again, Huber-White standard errors are used if the homoskedasticity assumption on the errors is violated. A maximum likelihood procedure is used to estimate the coefficients α and β. The association between the default score DF_{i,t} and the default probability p_{i,t} is specified by:

E(p_{i,t}) = 1 / (1 + exp(-DF_{i,t}))    (Equation 5)

This equation includes the default probability p_{i,t} of a bank, which is defined as 0 if the bank's default occurs within the default horizon. I define the default event of a bank as the moment at which the agency rating of the bank becomes the default rating D. For observations that do not default, p_{i,t} equals 1. The default horizon determines the time span of the DF model, which restricts the time period in which banks may default or survive. The default horizon can be altered by choosing one up to multiple years. For example, setting a one-year default horizon implies that p_{i,t} is set to 0 for all observations of banks that default within one year; all other observations are assigned the binary value 1.
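A minimal sketch of how such horizon-dependent labels could be built from a quarterly panel follows. The column names (`bank_id`, `quarter`, `default_quarter`) are hypothetical, and the convention p = 0 for default within the horizon, p = 1 otherwise, follows Equation 5.

```python
import pandas as pd

def horizon_labels(panel: pd.DataFrame, horizon_years: int) -> pd.Series:
    """Binary survival label per Equation 5: 0 if the bank's default
    (rating D) occurs within `horizon_years` of the observation quarter,
    1 otherwise. `default_quarter` is NaT for banks that never default."""
    horizon = pd.DateOffset(years=horizon_years)
    defaults_within = (
        panel["default_quarter"].notna()
        & (panel["default_quarter"] > panel["quarter"])
        & (panel["default_quarter"] <= panel["quarter"] + horizon)
    )
    return (~defaults_within).astype(int)

# Example: one label column (and hence one DF model) per default horizon.
# panel = pd.read_csv("bank_panel.csv", parse_dates=["quarter", "default_quarter"])
# for h in range(1, 7):
#     panel[f"p_{h}y"] = horizon_labels(panel, h)
```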

Both the AR model and the DF model include the same set of independent variables, which are assumed to be relevant in predicting the default probability of banks. This set consists of key accounting ratios, a measure from the equity market, and two proxies for the bank's age and size. To start with, the accounting ratios are based on the widely used CAMELS system. Regulators such as the Federal Reserve work with the bank stress indicators of CAMELS, which is an acronym for the following assessment categories: capital adequacy, asset quality, management, earnings, liquidity, and sensitivity to market risk. I do not use the last category, so that in fact I use the old CAMEL system.11 The employment of similar categories by rating agencies validates my use of the CAMEL system; see for example the S&P approach to rating banks (S&P, 2004). Furthermore, an equity market measure is included to reflect the market's opinion on the default probability of a bank. The last two independent variables are proxies for the age and size of the bank, which the literature deems relevant predictors of bank defaults (Wheelock and Wilson, 2000; Rime, 2005). My procedure for selecting the variables that account for the CAMEL categories and the other measures is based on Löffler and Posch (2007) and is as follows. First, I compile a set of variables for all banks in the dataset that may potentially represent one of the above CAMEL categories or the other default prediction measures. Second, I examine in the dataset each variable's distribution and the univariate relationship of each variable to default probability. Third, for each category and measure I choose two or three variables which are available for a substantial number of observations in the dataset and for which the univariate logit regressions have high explanatory power, as measured by a common goodness-of-fit measure, the Pseudo R-squared of the regression. Fourth, I run multivariate logit regressions in which each category and measure is alternately represented by one of the chosen variables. The ultimate set of variables is then chosen by considering the highest Pseudo R-squared of the multivariate logit regressions. Also, I test the joint significance of the removed variables by performing likelihood ratio tests on the regressions. This ensures that I do not lose explanatory power by removing variables that may jointly explain defaults even though they are individually insignificant. Appendix I presents an overview of the variables that have been considered in the procedure, as well as the variable definitions. It turns out to be difficult to find a variable that represents management, the CAMEL category that measures the quality of the bank's management. Efforts to cover this category by including cost efficiency variables fail, as these variables prove to be consistently insignificant and to diminish the explanatory power of the multivariate regressions.12 I therefore exclude the management category. Figure II displays the ultimate set of independent variables, and the sketch below illustrates the screening procedure.

11 In 1997, a sixth component (a bank's sensitivity to market risk) was added to the CAMEL system, so that the acronym changed to CAMELS. However, the bulk of the academic literature still uses CAMEL, since it includes pre-1997 data. This is also the reason why I do not include this variable in my credit-scoring models. Furthermore, an easy measure of a bank's sensitivity to market risk is not available. See the explanation by Lopez (1999) for more information on CAMEL(S) ratings.
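The sketch below illustrates the screening idea with statsmodels, continuing with the hypothetical `panel` from the earlier sketch: univariate logits ranked by Pseudo R-squared, followed by a likelihood ratio test of the removed variables' joint significance. The candidate variable names and the kept set are hypothetical placeholders.

```python
import statsmodels.api as sm
from scipy import stats

# Univariate screening: Pseudo R-squared of the default label on each candidate.
candidates = ["eq_ta", "tier1_ratio", "lossres_ta", "npl_ta"]  # hypothetical
pseudo_r2 = {}
for var in candidates:
    fit = sm.Logit(panel["p_1y"], sm.add_constant(panel[[var]])).fit(disp=False)
    pseudo_r2[var] = fit.prsquared
print(sorted(pseudo_r2.items(), key=lambda kv: -kv[1]))

# Likelihood ratio test: are the removed variables jointly significant?
full = sm.Logit(panel["p_1y"], sm.add_constant(panel[candidates])).fit(disp=False)
kept = ["eq_ta", "lossres_ta"]                                  # hypothetical choice
restricted = sm.Logit(panel["p_1y"], sm.add_constant(panel[kept])).fit(disp=False)
lr_stat = 2 * (full.llf - restricted.llf)
df = len(candidates) - len(kept)
print("LR test p-value:", stats.chi2.sf(lr_stat, df))
```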

Figure II: Overview of the Set of Independent Variables

The figure presents the description of the independent variables included in the AR and DF models. Their exact definitions and their incorporation in Equations 1 and 4 are shown in Equation 6.

- Capital adequacy (CAMEL): Total shareholders' equity / Total assets; expected sign: positive
- Asset quality (CAMEL): Reserves for loan and asset losses / Total assets; expected sign: negative
- Earnings (CAMEL): Return on assets; expected sign: positive
- Liquidity (CAMEL): Cash and cash equivalents / Total assets; expected sign: positive
- Market leverage (market-based): Market value / Total liabilities; expected sign: positive
- Bank age (proxy): Age of the bank since first rating; expected sign: positive
- Bank size (proxy): Market value / Total market value of the NASDAQ Financial-100; expected sign: positive

The expected sign refers to the relationship of the variable with the dependent variable.

The following is a brief discussion of the economic rationale behind the expected signs of the variable coefficients in the AR and DF model estimations. Essentially, this expected sign reflects the relationship of the independent variable with default probability.13 The first variable, representing the capital adequacy category, captures the solvency of a bank. A larger amount of equity may cover future losses and hence decreases the default probability, so I expect a positive coefficient. The asset quality of a bank is indicated by the bank's reserves to cover future asset losses. A bank is likely to carry high reserves when it foresees substantial asset losses in the future; for this reason I expect a negative sign on the asset quality coefficient. Profitability indicates a bank's ability to achieve a decent return with its business model, and thereby its strength to stay in business. Thus the earnings variable, as measured by the return on assets, is expected to have a positive coefficient in the models. The fourth variable denotes the cash position of a bank to measure its liquidity. Complementary to solvency as an indicator of default probability in the long term, a higher liquidity variable indicates the ease with which a bank may finance its operations in the short term. Consequently I expect a positive sign on the liquidity coefficient: the higher the liquid assets, the less likely it is that the bank will experience default in the near future. The market-based variable reflects the willingness of investors to invest in the bank relative to its leverage, as measured by the total liabilities of the bank. The market's opinion is based on the confidence of investors and their actual expectations of the bank's future profitability. Thus, a greater market value, reflecting larger investor confidence, means a lower default probability, so a positive coefficient is expected. Lastly, the age and size of a bank are proxies for bank characteristics that exhibit a relation with its default probability. The literature shows that recently established banks experience a higher default probability in their first years of operation (Wheelock and Wilson, 2000). Also, large banks are simply "too big to fail", since their bankruptcy would have a major negative impact on the whole economy (Rime, 2005). I therefore expect positive signs for the coefficients of the age and size proxies.

12 My guess is that the information content of the cost efficiency variables is already covered by the earnings variables. I was not able to find any other ratio that might represent the management category.

13 Please note that in both the AR and the DF model a lower default probability of a bank implies a higher outcome: respectively, a higher rating class k and a higher binary outcome of 1 for non-default occurrence. Therefore, if an independent variable has a positive relationship with the dependent variable, an increase in the variable results in a lower default probability.

Based on the described procedure and the chosen independent variables in Figure II, the exact definition of the set X_{i,t} of independent variables and its incorporation in Equations 1 and 4 is as follows:

AR_{i,t}, DF_{i,t} = α + β_1 (EQ/TA)_{i,t} + β_2 (LOSSRES/TA)_{i,t} + β_3 (NETINCOME/TA)_{i,t} + β_4 ((CASH + CASHEQ)/TA)_{i,t} + β_5 (MV/TL)_{i,t} + β_6 AGE_{i,t} + β_7 SIZE_{i,t} + ε_{i,t}    (Equation 6)

In this equation, for every bank i in quarter t, EQ is the total shareholders' equity from the balance sheet, LOSSRES is the reserves for loan and asset losses from the balance sheet, NETINCOME is the net income from the income statement, CASH is the cash item and CASHEQ the cash equivalents (i.e. the other liquid assets) from the balance sheet, and MV is the bank's market value. The first four variables are all divided by TA, the total assets of the bank. The market-based measure is divided by TL, the total liabilities of the bank. For the proxies, AGE is the number of years that the bank has been included in the dataset, and SIZE is the market value of the bank divided by the total market value of the NASDAQ Financial-100 Index, which represents the 100 largest financial securities listed on the NASDAQ Stock Market.

Apart from analyzing the coefficient estimates of the predictors in the credit-scoring models, I also construct implied ratings from the AR model and from the DF models of various default horizons. For this purpose, I calculate the credit score of each observation in quarter t by multiplying the values of the independent variables of the involved bank by the coefficient estimates of one of the models. The result is a credit score for each observation in the specific quarter. At the end of each quarter t, these credit scores are ranked. The score of each observation is converted into a rating in such a way that the resulting frequencies in the different rating classes equal the original frequencies in the rating classes of the agency ratings. For example, in each quarter the observations with the lowest credit scores are assigned the lowest implied rating y^implied_{i,t}, so that the number of observations in this rating class equals the number of observations with the lowest possible agency rating y_{i,t} in that quarter. In this way implied ratings can be constructed for each of the credit-scoring models. A rating migration distribution is then created for each credit-scoring model by counting all migrations and non-migrations of these ratings between consecutive quarters. The distribution specifies the direction (i.e. an upgrade, stay or downgrade) and the size (i.e. the number of notch steps) proportionally for all rating migrations. In line with Altman and Kao (1992), I may now observe in the rating migration distributions whether bank ratings are more stable than the ratings based on the default models, i.e. the P-I-T ratings.
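A minimal sketch of this quarterly rank-matching step, again using the hypothetical `panel` columns from the earlier sketches:

```python
import numpy as np
import pandas as pd

def implied_ratings(scores: pd.Series, agency: pd.Series) -> pd.Series:
    """Within one quarter, map ranked credit scores to rating classes so that
    each implied class has the same frequency as the agency rating classes."""
    ranks = np.argsort(np.argsort(scores.to_numpy()))      # 0..n-1, low = worst
    class_counts = agency.value_counts().sort_index()      # classes 1..7
    # Class assigned to each rank: repeat each class by its agency frequency.
    rank_to_class = np.repeat(class_counts.index.to_numpy(),
                              class_counts.to_numpy())
    return pd.Series(rank_to_class[ranks], index=scores.index)

# Applied quarter by quarter:
# panel["implied"] = (panel.groupby("quarter", group_keys=False)
#                          .apply(lambda g: implied_ratings(g["score"],
#                                                           g["rating_class"])))
```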

Subsequently, I discuss the procedure that leads to the identification of the rating agencies’ default horizon. By altering the default horizon of the DF model in Equation 5, I obtain estimations of the DF model for various default horizons. In all these estimations and the AR model estimation I am interested in revealing the relative weight of every predictor by considering the contribution that each variable makes to the total variance in the dependent variable. The relative weight RW of each predictor gives this insight and is calculated in all models as follows:

RW_r = |β_r σ_r| / Σ_{s=1}^{n} |β_s σ_s|    (Equation 7)

In this formula, β_r is the coefficient estimate for predictor r, and σ_r is the standard deviation of the same predictor in the pooled sample. Their product is divided by the sum of the products of the coefficient estimate and the standard deviation over all n predictors in the model. However, this relative weight measure does not account for the correlation between predictors. It is therefore possible that the measure includes not only the stand-alone weight of the predictor, but also noise created by its collinearity with other predictors. To overcome this problem, I also calculate the relative weights of the predictors with the technique of Tonidandel and LeBreton (2009). This method is specifically designed for logistic regressions and accounts for the collinearity of the independent variables.14 Their technique is based on Johnson's (2000) relative weight procedure, which creates a new set of independent variables that are orthogonal representations of the original predictors. I follow the method as described by Tonidandel and LeBreton, which incorporates fully standardized coefficients in this procedure. First, I create orthogonal approximations of the original predictors and standardize the newly created variables. Thereafter I obtain the coefficients linking the original predictors to these standardized variables. I also obtain the unstandardized coefficients linking the standardized orthogonal predictors to the dependent variable of each model, and I compute the Pseudo R-squared value by regressing the observed values of the dependent variable on its predicted values. All these elements are then combined to obtain the relative weight of each predictor: the unstandardized coefficient is multiplied by the standard deviation of the predictor and the Pseudo R-squared value, and divided by the standard deviation of the predicted values of the dependent variable. The relative weights of all predictors in each model then sum to the model's Pseudo R-squared. After conversion into percentages, the relative weights are expressed as the proportion of predictable variance that is accounted for by each predictor.

14 Johnson (2000) and Tonidandel and LeBreton (2009) extensively describe the technique for obtaining relative weights in logistic regression. Due to space considerations, I do not repeat this exercise and only briefly discuss the procedure. For more explanation, I refer the reader to the original texts.
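As a sketch, the simple Equation 7 weights can be computed directly from a fitted logit model; the collinearity-adjusted Tonidandel-LeBreton weights require the orthogonalization described above and are not reproduced here. The fitted model `full` and the DataFrame `panel` are the hypothetical objects from the earlier sketches.

```python
import numpy as np

def equation7_weights(fit, X):
    """Relative weights per Equation 7: |beta_r * sigma_r| normalized so that
    the weights sum to one (the constant term is excluded)."""
    betas = fit.params.drop("const")
    sigmas = X[betas.index].std()      # pooled-sample standard deviations
    raw = np.abs(betas * sigmas)
    return raw / raw.sum()

# rw_ar = equation7_weights(ar_fit, panel)   # one weight vector per model,
# rw_1y = equation7_weights(full, panel)     # i.e. per default horizon
```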

Having obtained the relative weights of the predictors in both the AR model and the DF models of various default horizons, I now test the association in relative weights between each DF model and the AR model. The association is calculated by means of the chi-square statistic, a measure that may uncover the strength of association between two statistical series. The purpose is to find the DF model whose specific default horizon is most associated with the AR model, meaning that the relative weights of its predictors most closely mirror the relative weights of the same predictors in the AR model. The contribution of the predictors in this DF model then best matches the value that rating agencies attach to these predictors in their rating model, as represented by the AR model. In this way, the specific default horizon of the best-matching DF model illuminates the default horizon that rating agencies employ in their rating models of banks.
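One plausible reading of this comparison, sketched below: treat the AR-model weights as the expected series, compute a chi-square-style distance for each default horizon, and pick the horizon with the smallest statistic. The dictionaries are hypothetical placeholders, and the interpretation (smaller statistic = stronger association) is an assumption.

```python
def chi_square_distance(observed, expected):
    """Chi-square-style statistic between two relative-weight series
    (assumed interpretation; smaller = closer match)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# rw_ar: AR-model weights; rw_df: {horizon_in_years: weight vector} (hypothetical).
# best_horizon = min(rw_df, key=lambda h: chi_square_distance(rw_df[h], rw_ar))
```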

Next is the identification of the migration policy that rating agencies implicitly employ in the rating of banks. Following Altman and Rijken (2007), the quantitative effects of a rating agency's migration policy can be observed by conceptualizing that rating agencies employ a threshold value TH and an adjustment fraction AF in their rating models. The threshold value reflects that the rating of a bank is not changed as long as the creditworthiness level of that bank, i.e. its credit score, stays within a specific interval [-TH, +TH]. In other words, a rating is only adjusted to a new creditworthiness level of the bank when the change in this level exceeds the threshold value. If it does, the adjustment fraction in turn recognizes that rating agencies adjust their ratings only partway to the new creditworthiness level: they change their rating by the value of the adjustment fraction. The modeling of these factors involves two steps.15

First, the credit scores AR_{i,t} that follow from the AR model are modified to credit scores AR^mp_{i,t}, i.e. credit scores that incorporate a particular migration policy. For each bank i, AR^mp_{i,0} is equal to AR_{i,0}. Thereafter AR^mp_{i,t} is derived as follows:

AR^mp_{i,t} = AR^mp_{i,t-1}    if |AR_{i,t} - AR^mp_{i,t-1}| < γ_{N,t} · TH    (Equation 8)

This equation specifies that the modified credit score AR^mp_{i,t} stays equal to the previous value AR^mp_{i,t-1} as long as the credit score AR_{i,t} stays within the interval defined by the threshold value TH. The latter is expressed in notches, so that a scaling factor γ_{N,t} is required to convert the credit scores to a notch scale. Appendix II discusses the derivation of this scaling factor; in brief, it corresponds to the slope of the regression line that matches the rating classes with the credit scores. If the change in credit score exceeds the threshold value in Equation 8, AR^mp_{i,t} is adjusted. However, the value of the adjustment fraction AF then determines the new value of AR^mp_{i,t}:

15 Equations 8 and 9 are adjusted versions of those in Altman and Rijken (2007). For a detailed explanation of how to account for a migration policy in credit rating scores, see Appendix A of the study by Altman and Rijken.
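A minimal sketch of this two-step rule follows. The partial-adjustment step is my assumption based on the description above (the text's Equation 9 itself is not reproduced in this excerpt): when the threshold is exceeded, the score moves by the fraction AF of the gap between the new score and the previous modified score; the scaling factor gamma is taken as given.

```python
def apply_migration_policy(scores, th, af, gamma):
    """Modify one bank's quarterly credit-score path per the migration policy.
    scores: quarterly AR scores for one bank; th: threshold in notches;
    af: adjustment fraction; gamma: score-to-notch scaling factor.
    The partial-adjustment step is an assumed form of Equation 9."""
    modified = [scores[0]]                      # AR_mp(i,0) = AR(i,0)
    for s in scores[1:]:
        prev = modified[-1]
        if abs(s - prev) < gamma * th:          # Equation 8: inside threshold
            modified.append(prev)               # score (and rating) unchanged
        else:                                   # assumed Equation 9
            modified.append(prev + af * (s - prev))
    return modified

# Example with Altman and Rijken's non-financial values TH = 1.25, AF = 0.75
# and an illustrative gamma of 1.0:
# path = apply_migration_policy([2.0, 2.1, 3.9, 4.0], th=1.25, af=0.75, gamma=1.0)
```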
