Bankruptcy Probability Prediction for US Firms in the High-Tech Industry


by Irina Colomeet, University of Groningen

Thesis: 21st of June, 2013

Abstract

This paper presents a bankruptcy prediction analysis of US listed firms, and separately of the high-tech industry, using two decision tree algorithms: Classification and Regression Trees (CART) and Chi-squared Automatic Interaction Detector (CHAID). The most critical predictors of bankruptcy are identified using factor analysis. A comparison of the algorithms reveals that CART marginally outperforms CHAID when no factor analysis is performed beforehand, whereas CHAID shows superior results when using the predictors extracted through factor analysis. These results are valid for the entire sample; further investigation of the high-tech subsample is necessary. The CART algorithm also has a higher accuracy rate when no factor analysis is performed.

Key words: Bankruptcy prediction, Model selection, Decision tree classification, Artificial intelligence. JEL classification: C14, C45, G33


1. Introduction

In today’s competitive environment, research into bankruptcy is a critical task for all stakeholders of a company. The financial crisis has emphasized the importance of reliable and robust bankruptcy models. The high-tech industry has survived the financial turmoil better than other industries.² Despite its resilience, stable growth and overall financial performance, the financial viability of high-tech firms remains a concern. Economic risk, increased competition, and industry consolidation are current threats to this sector. Bankruptcy prediction models should therefore become a key part of the decision-making process in this sector of the economy, as the current situation leaves little room for error. Moreover, a new technology bubble might be knocking on the door,³ as there are many similarities between the dot-com bubble and today’s high-tech hype.

The economy as we know it today has become dependent on high technology, as it is built upon information. Although the progress of the global economy has its benefits, the downside is that the old models for bankruptcy analysis are based on specific balance-sheet data which may not be appropriate for today’s way of doing business. Thus, the main objectives of this paper are to (1) determine the most critical predictors of bankruptcy in the case of the high-tech industry, (2) use financial and non-financial ratios to enhance the accuracy of financial distress prediction, (3) adopt decision tree techniques to construct a bankruptcy prediction model based both on the predictors selected through factor analysis and on all the predictors used in the study, and (4) employ a statistical method for determining classification errors in order to compare the degree of accuracy with the artificial intelligence approach. As early signalling of financial distress is essential for bankruptcy prevention, the ultimate goal of this paper is to aid stakeholders in preventing bankruptcy.

This study differs from previous studies in its focus on high-tech firms. There is barely any other sector of the economy that is more dynamic than the high-tech sector. Rapid growth, frequent trend changes in recent years and its specific characteristics (i.e. high concentration, high added value of outputs, knowledge intensity, high R&D expenses, high risks, etc.) make the industry unique. It therefore requires further investigation to determine whether it needs a sector-specific bankruptcy prediction model.

There are several factors justifying this study. The major factor is that the financial structure of high-tech firms typically differs from that of firms in other, more established industries. Firms in this sector of the economy are mainly funded with equity, which may contribute to risk taking during the research and development (R&D) stage of the firm. Another important factor to consider is the volatility of the industry itself, which may appear exciting to investors at first glance, but can pose major threats in the future. Investments in the high-tech industry can be compared to a gamble, as the company may be the next Apple or Microsoft, or it may disappear at the R&D stage. A further factor that should be taken into account when analysing firms from the high-tech industry is the effect of the economy. The high-tech industry is a procyclical industry with high systematic risk. The development of firms in this industry is highly dependent on the economic cycle, as the evolution of technology and society is mainly based on the high-tech industry.

² Technology firms in the recession, Here we go again, The Economist, 15th January 2009, http://www.economist.com/node/12936523

³ http://www.forbes.com/sites/greatspeculations/2013/05/02/five-stocks-to-own-for-the-next-technology-boom/

This study focuses on the following research questions: (1) Do the predictors of bankruptcy for the high-tech industry differ from the general predictors (selected for the economy as a whole)? (2) Does the decision tree method itself provide higher prediction accuracy by being better able to select the bankruptcy prediction variables?

In order to answer these research questions, the study will be conducted on US listed companies that filed for bankruptcy in 2009. Relevant variables will be extracted using factor analysis, which will be applied to the entire sample and to the high-tech industry subsample separately in order to test whether there are any differences in the selected predictors of bankruptcy. Two classification algorithms will be compared, namely Classification and Regression Trees (CART) and Chi-squared Automatic Interaction Detector (CHAID), based on both their predictive performance and their accuracy. The classification algorithms will be applied to the sample of all US listed companies which filed for bankruptcy in 2009 and separately to the high-tech industry. The total sample and the high-tech subsample are analysed using both the entire set of variables and the variables extracted through factor analysis.


2. Literature review

Business failure prediction has become a major research area in corporate finance over the past years. A variety of methods have been developed to predict bankruptcy; these methods mainly focus on financial and accounting ratios. The methodological approaches used in bankruptcy prediction, as described by Min and Jeong (2009), can be broadly classified into statistical methods and artificial intelligence methods. The first group includes the most popular models due to their ease of use and clear interpretability, such as discriminant analysis (pioneering study by Beaver (1966)) and binary choice models (i.e. logit and probit). These models are fully parametric. The second group comprises the artificial intelligence models, which range from neural networks and genetic algorithms to classification and regression trees. In contrast with the first group, the common feature of these models is that they are all non-parametric.

The most widely applied methods are the classical statistical methods, e.g. univariate analysis, risk index models, multivariate discriminant analysis (MDA) (Altman, 1968) and logistic regression. MDA is by far the most dominant in this area, followed by logistic regression (Li, Sun and Wu, 2010).

Altman (1968) considers solvency, liquidity and profitability ratios (i.e. Working capital to Total assets, Retained earnings to Total assets, Earnings before interest and tax (EBIT) to Total assets, Market value of equity to Book value of total liabilities, and Sales to Total assets) to be the most relevant variables. He developed a multiple discriminant analysis by combining the ratios in a multivariate linear framework and calculated a so-called Z-score as a measure to predict bankruptcy. The initial test of the model showed 72% accuracy in predicting bankruptcy two years before the event. However, the model has its shortcomings. These are primarily driven by its underlying assumptions: multivariate normally distributed independent variables, equal variance-covariance matrices across healthy and bankrupt companies, prior probability of failure and misclassification costs, and the absence of multicollinearity (Balcaen and Ooghe, 2006). In the real world these assumptions are usually not satisfied, which results in an inappropriate application of the model and in results that are not suitable for generalization.
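For illustration, the five ratios listed above can be combined into a Z-score as in the following minimal sketch. The weights are those of the commonly reported form of the Altman (1968) model; the function name and the input figures are hypothetical and are not taken from the sample of this study.

```python
def altman_z_score(working_capital, retained_earnings, ebit,
                   market_value_equity, sales,
                   total_assets, total_liabilities):
    """Z-score in its commonly reported form (assumed weights, see lead-in)."""
    x1 = working_capital / total_assets            # liquidity
    x2 = retained_earnings / total_assets          # cumulative profitability
    x3 = ebit / total_assets                       # operating profitability
    x4 = market_value_equity / total_liabilities   # solvency
    x5 = sales / total_assets                      # asset turnover
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


# Hypothetical firm, figures in millions of USD.
z = altman_z_score(working_capital=200, retained_earnings=300, ebit=100,
                   market_value_equity=600, sales=1500,
                   total_assets=1000, total_liabilities=500)
print(round(z, 2))
```

A lower score indicates a higher likelihood of distress; the cut-off values used to classify firms are not part of this sketch.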

The logistic regression developed by Ohlson (1980) has major advantages over previously used models, as it does not require assumptions regarding a priori probabilities. He compared the results of his study to three other bankruptcy prediction studies, i.e. Altman and McGough (1974), Moyer (1977) and Altman, Haldeman and Narayanan (1977), and concluded that specific factors such as company size, financial structure, performance, and current liquidity contribute considerably to predicting bankruptcy one year before the event. He also concluded that the predictive power of any model depends upon when the information (financial data) is assumed to be available. Therefore, a significant improvement in predictive power requires additional predictors. In addition, evaluating the predictive performance of the models based on misclassification errors, he showed that multivariate discriminant analysis and logistic regression are essentially equivalent predictive models.

Given that bankruptcy prediction is important to a wide range of users (e.g. creditors, managers, suppliers, shareholders, employees, rating agencies, etc.), more powerful and accurate bankruptcy prediction models were developed, such as a dynamic logit (hazard) model (Shumway, 2001; Campbell, Hilscher, and Szilagyi, 2008), a mixed logit model (Jones and Hensher, 2004), an error component logit model (Jones and Hensher, 2007), a latent class multinomial logit model (Jones and Hensher, 2007a) and a nested logit model (Jones and Hensher, 2007b). Li, Lee, Zhou, and Sun (2011) enhanced the analytical performance of the logit model by combining a random subspace approach with a binary logit model, which accounts for the different decision opinions of agents, in order to improve the forecasting of corporate failure in China. Using 30 financial variables as initial predictors, they concluded that a random subspace binary logit (RSBL) model performed much better than all classical statistical models (e.g. MDA, logit model, and probit model) in terms of mean accuracy. Additionally, the classical statistical methods appeared to be inferior to RSBL because they produced higher error rates. All of these variations of the logit model have been shown to have higher explanatory and statistical power than the standard logit model. However, the logit model has several shortcomings, which arise when its underlying assumptions are not met, such as homogeneity of data (Lee, Chiu, Chou and Lu, 2006), and from its sensitivity to multicollinearity (Doumpos and Zopounidis, 1999). Therefore, other techniques (e.g. probit models) are applied to overcome this problem, yet they are less popular because they are more computationally intensive (Dimitras, Zanakis and Zopounidis, 1996).

Artificial intelligence models have played a key role in bankruptcy prediction since 1990 (Li, Sun and Wu, 2010). They have become popular among researchers and practitioners because they seldom require the assumptions on which classical statistical methods are based. Li, Sun and Wu (2010) compared five models for bankruptcy prediction: classical statistical models (i.e. MDA and Logit) and the most popular data mining techniques (i.e. k nearest neighbours (kNN), support vector machine (SVM) and CART). Based on the predictors selected by the stepwise method of MDA (i.e. Total Asset Turnover, Asset-Liability Ratio, Total Asset Growth Rate, and Earnings per Share), they concluded that, for short-term business failure prediction of Chinese listed firms, the CART method outperforms both classical statistical methods (Logit and MDA) at least at the 5% significance level, has superior predictive power compared to kNN at the 10% significance level, and marginally outperforms the SVM method.


Several methods have become more popular in recent years, such as kNN, SVM and decision trees. kNN was first applied to bankruptcy prediction in the 1990s by Jo, Han and Lee (1997), who used case-based reasoning with kNN at its heart. In that study, two different methods of variable selection (i.e. t-test and stepwise selection) were applied to 51 potential predictors of business failure. Three different time periods were used in order to test the prediction accuracy of three models: MDA, case-based forecasting and neural networks. The results showed that the neural network model significantly outperforms the other two models, with a hit ratio of 83.79% compared to 82.22% for MDA and 81.52% for case-based forecasting. In addition, they concluded that the case-based forecasting system is not suitable for bankruptcy prediction. Moreover, as suggested by Li, Sun and Wu (2010), the kNN model has several drawbacks: (1) no model is constructed in the kNN algorithm, which makes bankruptcy prediction very time-consuming; and (2) the number of nearest neighbours often has to be selected empirically.

An alternative to both the classical statistical methods and kNN is the support vector machine (SVM), which is based on data mining (i.e. classification mining). The model is suitable in cases where dependent and independent variables exhibit complex non-linear relationships. Lacerda and Moro (2008) determined the best predictors of default for Portuguese firms using three models (i.e. discriminant analysis, logit and SVM). They used 47 indicators covering two types of information: accounting measures (e.g. profitability, leverage, capital structure, liquidity, activity dynamics over time and size) and non-accounting measures (e.g. firm age, number of employees). They concluded that SVM has a higher rate of accuracy than the other two models, based on the median accuracy ratio estimated on bootstrapped samples. This can be explained by the fact that the method allowed the use of strong predictors (i.e. interest coverage ratio, average cost of financial debt, cash and cash equivalents to total assets ratio, net profit margin, logarithm of historical sales, and logarithm of total assets as a measure of company size) that display a non-linear relationship with the probability of default. This was not possible in the case of logit or discriminant analysis. However, Wei, Li and Chen (2007) consider SVM to be sensitive to outliers and noise, and its computational complexity to be high due to the iterative procedures. Li, Sun and Wu (2010) also criticize the model for the difficulty of selecting appropriate kernel functions, its inability to identify the relative importance of variables, and the difficulty practitioners have in interpreting its results.

Decision tree analysis is one of the common data mining methodologies that provide both a classification and a prediction function simultaneously (Chang and Chen, 2009), and it has had many successful applications to real-world problems (Kumar and Ravi, 2007). This methodology can also build models with both numerical and categorical data. Several studies have concluded that decision tree classification models (Li et al. (2010): CART; Chen (2011): CART and CHAID) deliver superior results in predicting financial distress compared to other models such as MDA, Logit, kNN and SVM (Li et al., 2010) and logistic regression (Chen, 2011).

As suggested by Li et al. (2010), the decision tree methods CART and CHAID present several advantages over other bankruptcy prediction models, more specifically: (1) ease of interpretation of the predictive results, which is superior to MDA, Logit and SVM; (2) the ability to generate if-then rules for bankruptcy prediction, superior to MDA, Logit, kNN and SVM; (3) invariance to monotonic transformations of the explanatory features used for bankruptcy prediction, superior to kNN and SVM; (4) effectiveness in modelling complex relationships between independent and dependent variables (e.g. net income and interest coverage display a non-linear relationship with the probability of default) without strong model assumptions, unlike MDA and Logit; (5) the ability to identify significant independent features by itself in the process of bankruptcy prediction, superior to all the methods mentioned above; (6) no parameters to be selected and optimized in the training process, superior to kNN and SVM; (7) ease and robustness of construction for firms’ bankruptcy prediction without a long training or testing process, superior to kNN and SVM. Given these advantages, decision tree methods are highly applicable to predicting bankruptcies with real-life data.

Chen (2011) applied factor analysis to an initial set of 37 ratios in order to extract suitable variables for Taiwanese listed firms, using a matched sample of companies over the period from January 2000 until May 2007. Based on the factor analysis, Chen (2011) selected 12 variables⁸ to be used as the input vector for the four methods applied (i.e. C5.0, CART, CHAID and logistic regression). The results revealed that the decision tree models produce more accurate results the closer the time to actual company failure. Chen also concluded that the predictive performance of logistic regression is more strongly influenced by factor analysis than that of the decision trees. While the decision tree approach obtains better prediction accuracy than logistic regression in developing a financial distress prediction model, the accuracy rate of logistic regression is higher in the longer run. Therefore the decision tree approach is suitable for predicting business failure in the short run, whereas logistic regression is more appropriate for the longer run.

The hypothesis about the impact of the age of the firm on the ability of the models to predict bankruptcy was tested by Pompe and Bilderbeek (2005). They concluded that all the ratios investigated are predictors to a certain extent and have very similar predictive power over a time span of 5 years. Moreover, they noticed that it is more difficult to predict bankruptcy for young firms than for more mature ones.

Financial ratios, as is well documented, differ from one industry to another. Many studies focus on a particular industry, such as the financial industry (Estrella, Park and Peristiani, 2000) and the health system (Coyane, Singh and Smith, 2008). It is important to determine bankruptcy predictors for the high-tech industry, which has been a niche industry in bankruptcy prediction studies. There is still room for research, given that the relevance of the general model for a specific industry and a particular time frame has not yet been proven. Therefore, this paper proposes bankruptcy predictors for high-tech companies, comparing two decision tree classification algorithms (i.e. CART and CHAID).

⁸ Earnings per Share Ratio, Return on Assets (ROA), Return on Equity (ROE), Cash Flow Ratio, Cash Flow to Total Debt Ratio, Current Ratio, Acid-Test Ratio, Gearing Ratio, Debt to Equity Ratio, Debt Equity Ratio.

Based on the research objective of this paper and previous research in the field (Chen, 2011; Brezigar-Masten and Masten, 2012) the following hypotheses emerged:

H1: There is no significant difference between predictive performance of CART and CHAID methods for bankruptcy prediction of US listed companies.

H1a: There is no significant difference between predictive performance of CART and CHAID methods for bankruptcy prediction of US listed high-tech companies.

H2a: There is no significant difference between the accuracy of CART and CHAID methods in predicting bankruptcy for US listed companies.


3. Methodology

3.1 The contribution of this research

This study contributes to demonstrating the applicability of decision tree classification algorithms (i.e. CART and CHAID) to bankruptcy prediction in the high-tech industry, based on the research methodology of Chen (2011). In order to meet the objective of determining the most critical predictors of bankruptcy for the high-tech industry, two different approaches to building the decision trees will be used. The first is to construct a tree (i.e. CART and CHAID) based entirely on the optimal predictor set selected by a filter approach using factor analysis. The alternative is to construct a tree with all available predictors, without prior factor analysis. The predictive performance and prediction accuracy of both approaches will ultimately be compared.

Together with the predictors of bankruptcy suggested by the literature (see Table 3, page 18), this study includes additional predictors of its own, more specifically cash return on capital invested, R&D to sales, capital expenditures to sales, cash flow to capital expenditures, the operating cash flow ratio and cash flow to short-term debt. These variables were chosen for their relevance to the high-tech industry. For example, the cash return on capital invested is a cash flow metric that measures the cash profits of a company as a proportion of the funding required to generate them (i.e. common shares, preferred shares and long-term debt). Given that high-tech firms usually have a longer R&D phase (i.e. development stage), which requires large initial investments with delayed potential future cash flows, this metric is of particular importance for both equity holders and creditors of the company. R&D expense to sales measures the R&D intensity of the firm. This metric is useful in assessing whether R&D expenses are more sensitive to the firm’s financial health than sales. Angelmar (1985) concluded that, in industries with high cost and uncertainty of R&D outcomes and no barriers to new entrants, concentration is accompanied by a significant increase in research and development. The high-tech industry falls exactly into this category, especially in recent years, when industry consolidation could be observed.

The following two ratios relate to the company’s capital expenditures. The capital expenditure to sales ratio measures the percentage of sales that is reinvested. This ratio shows whether the company still has the opportunity to grow and indicates its future outlook (i.e. if the company is doing well and has a positive outlook, it will increase its capital expenditures to maintain or increase its competitive advantage in the market). It is especially important for the high-tech industry, where the availability of new investment opportunities is vital because technology quickly becomes outdated. The second ratio is cash flow to capital expenditures, a metric that shows the firm’s ability to purchase long-term assets using the cash flow generated from operations. The company will grow if it has the financial ability to invest; otherwise a typical high-tech company will struggle to survive due to the lack of new investment opportunities.

The company’s liquidity in the short run is measured by the operating cash flow ratio, which shows how well current liabilities are covered by the cash flow from operations. This is a better indicator of liquidity than other ratios because, first, it uses cash flow (i.e. transactions that involve an actual transfer of money) and, second, it uses only the cash flow generated from the operations of the company, i.e. its core activity. The last ratio proposed by this study is the short-term debt coverage ratio (i.e. cash flow to short-term debt), which measures the ability of the company to meet its current obligations (short-term borrowings and the current portion of long-term debt) and shows whether the company has enough cash flow available to expand its business or only enough to repay the debt that comes due.
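The six additional predictors can be computed directly from financial statement items, as in the following minimal sketch. The field names and figures are hypothetical, and the use of operating cash flow as the numerator of every cash flow ratio (including cash return on capital invested) is an assumption made for illustration; the exact definitions used in the study may differ.

```python
def high_tech_ratios(f):
    """Compute the six additional ratios from a dict of statement items."""
    ocf = f["operating_cash_flow"]
    # Funding required to generate the cash profits.
    invested_capital = (f["common_equity"] + f["preferred_equity"]
                        + f["long_term_debt"])
    # Current obligations: short-term borrowings plus current portion of LTD.
    short_term_debt = (f["short_term_borrowings"]
                       + f["current_portion_long_term_debt"])
    return {
        "cash_return_on_capital_invested": ocf / invested_capital,
        "rd_to_sales": f["rd_expense"] / f["sales"],
        "capex_to_sales": f["capital_expenditures"] / f["sales"],
        "cash_flow_to_capex": ocf / f["capital_expenditures"],
        "operating_cash_flow_ratio": ocf / f["current_liabilities"],
        "cash_flow_to_short_term_debt": ocf / short_term_debt,
    }


# Hypothetical firm, figures in millions of USD.
firm = {
    "operating_cash_flow": 200, "sales": 1000, "rd_expense": 150,
    "capital_expenditures": 80, "current_liabilities": 400,
    "short_term_borrowings": 50, "current_portion_long_term_debt": 50,
    "common_equity": 500, "preferred_equity": 100, "long_term_debt": 400,
}
ratios = high_tech_ratios(firm)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```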

3.2 Research design

The research objective is to identify the most critical predictors of bankruptcy and to compare the predictive performance of different algorithms (i.e. CART and CHAID) for US bankrupt listed companies, paying specific attention to the high-tech industry. The selection of predictors is a very important phase in all bankruptcy prediction studies, yet no particular theory has been generally accepted. The majority of previous research has used a stepwise procedure (e.g. with MDA or Logit) to select the variables. This kind of procedure is not statistically rigorous: as it is sensitive to the order of the input variables, it does not generate a unique solution. In an attempt to overcome this problem, some authors (e.g. Chen, 2011) use factor analysis to select the appropriate variables.

In the initial phase, the original database will be pre-processed (i.e. cleaned and transformed), and the appropriate predictors will be selected by means of factor analysis. In the next phase, the selected variables will first be used in the decision tree algorithms, and then the whole set of variables will be analysed using the same decision tree methods. Both analyses will be performed in order to test whether factor analysis improves the predictive performance and accuracy of the decision tree algorithms. In the last phase, the decision tree algorithms are compared with each other with respect to their predictive power and predictive accuracy.

3.3 Factor analysis

Factor analysis is a method for describing the variability among observed, correlated variables in terms of a potentially lower number of unobservable variables, which are called factors. Factor analysis searches for joint variation in response to unobservable latent variables. The observed variables are modelled as a linear combination of the potential factors plus an error term. Thus, the information gathered about the interdependencies between observed variables can be used to reduce the number of variables in the dataset. The steps of the factor analysis are depicted in the figure below.

Fig. 1. Steps of the Factor Analysis: (1) determine the meaningful factors; (2) rotation of the factor axes.

The basic model of factor analysis is based on the following two assumptions:

1. The error terms $e_i$ are independently distributed, with zero mean and finite variance: $E(e_i) = 0$ and $\mathrm{Var}(e_i) = \sigma_i^2$.

2. The latent factors $F_j$ are independent of one another and of the error terms, and are such that $E(F_j) = 0$ and $\mathrm{Var}(F_j) = 1$.

It is assumed that the variable (i.e. predictor of bankruptcy) $Y_i$ is a linear function of the independent latent factors and an error term, and can be written as:

$$Y_i = \beta_{i0} + \sum_{j} \beta_{ij} F_j + e_i \qquad (1)$$

Where:

$Y_i$ – predictor i of the bankruptcy;

$\beta_{i0}$ – constant;

$\beta_{ij}$ – loading of factor $F_j$;

$F_j$ – unobservable factor;

$e_i$ – error.

The variance of $Y_i$ is calculated using the following formula:

$$\mathrm{Var}(Y_i) = \sum_{j} \beta_{ij}^2 \, \mathrm{Var}(F_j) + (1)^2 \, \mathrm{Var}(e_i) = \sum_{j} \beta_{ij}^2 + \sigma_i^2 \qquad (2)$$

Where:

$\sum_{j} \beta_{ij}^2$ – communality¹²;

$\sigma_i^2$ – predictor-specific variance.

¹² The communality is the sum of the squared loadings of a particular variable for all factors, and it measures the percent of variance in a given variable explained by all factors jointly.

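Therefore the variance of the predictor consists of two parts. The communality of the variable, $\sum_{j} \beta_{ij}^2$, is the part that is explained by the common factors $F_j$. The specific variance, $\sigma_i^2$, is the part of the variance of the predictor $Y_i$ that is not accounted for by the common factors. If the factors are perfect predictors of the variables (i.e. bankruptcy predictors), then $e_i = 0$ always, and $\sigma_i^2 = 0$. To derive the covariance of any two observable variables, $Y_i$ and $Y_k$, the two variables can be written as:

$$Y_i = \beta_{i0} + \sum_{j} \beta_{ij} F_j + (1) e_i + (0) e_k \qquad (3)$$

$$Y_k = \beta_{k0} + \sum_{j} \beta_{kj} F_j + (0) e_i + (1) e_k \qquad (4)$$

In this manner the covariance of any two observable variables is as follows:

$$\mathrm{Cov}(Y_i, Y_k) = \sum_{j} \beta_{ij} \beta_{kj} \, \mathrm{Var}(F_j) + (1)(0) \, \mathrm{Var}(e_i) + (0)(1) \, \mathrm{Var}(e_k) = \sum_{j} \beta_{ij} \beta_{kj} \qquad (5)$$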
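The variance and covariance implied by the factor model can be checked numerically. The following minimal sketch assumes two factors with unit variance; the loadings and the specific variance are hypothetical values chosen only for illustration.

```python
def communality(loadings):
    """Sum of squared loadings of one variable over all factors."""
    return sum(b * b for b in loadings)


def model_variance(loadings, specific_variance):
    """Variance of a variable: communality plus specific variance."""
    return communality(loadings) + specific_variance


def model_covariance(loadings_i, loadings_k):
    """Covariance of two variables: sum of products of their loadings."""
    return sum(bi * bk for bi, bk in zip(loadings_i, loadings_k))


# Hypothetical loadings of two predictors on two factors.
loadings_y1 = [0.8, 0.3]
loadings_y2 = [0.2, 0.9]

h1 = communality(loadings_y1)                       # 0.64 + 0.09
var_y1 = model_variance(loadings_y1, 0.27)          # communality + 0.27
cov_12 = model_covariance(loadings_y1, loadings_y2) # 0.16 + 0.27
print(round(h1, 2), round(var_y1, 2), round(cov_12, 2))
```

The error terms do not contribute to the covariance because they are independent of each other and of the factors.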

The factor loadings will be determined using principal component analysis (PCA), as it is the most widely used method for determining a first set of loadings (Tryfos, 1996). The idea behind the method is that it seeks values of the factor loadings that bring the estimate of the total communality as close as possible to the total observed variance. The covariances are ignored in this case.

The communality is the part of the variance of the variable that is explained by the factors. The higher the communality, the more successful the factor model is in explaining the variable. The PCA method determines the values of the estimated loadings $b_{ij}$ that bring the total communality as close as possible to $\sum_{i} S_i^2$, the sum of the observed variances of the variables (see Table 1). The estimated loadings $b_{ij}$ are different from the theoretical loadings $\beta_{ij}$.

Table 1

Elements of principal component method

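Variable | Observed variance | Theoretical factor loadings | Theoretical communality
$Y_i$ | $S_i^2$ | $\beta_{ij}$ | $\sum_{j} \beta_{ij}^2$
Total | $\sum_{i} S_i^2$ | | $\sum_{i} \sum_{j} \beta_{ij}^2$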

The estimated factor loadings $b_{ij}$ determine the coefficients of correlation between the variable $Y_i$ and the factor $F_j$. The sum of the squared loadings on the factor $F_j$, $\sum_{i} b_{ij}^2$, can be interpreted as the contribution of the factor $F_j$ in explaining the sum of the observed variances.
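The principal component method can be illustrated on a small example. The sketch below is not the SPSS® procedure used in this study: it extracts only the first component of a hypothetical correlation matrix, and it uses power iteration (a simple technique chosen here to avoid external libraries) to find the largest eigenvalue and its eigenvector. The loadings are then the eigenvector scaled by the square root of the eigenvalue.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def first_principal_component(corr, iterations=200):
    """Largest eigenvalue and unit eigenvector via power iteration.

    Assumes the start vector is not orthogonal to the leading eigenvector.
    """
    vector = [1.0] + [0.0] * (len(corr) - 1)
    for _ in range(iterations):
        product = mat_vec(corr, vector)
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    eigenvalue = sum(x * y for x, y in zip(vector, mat_vec(corr, vector)))
    return eigenvalue, vector


# Hypothetical correlation matrix of three standardized predictors.
corr = [[1.0, 0.5, 0.5],
        [0.5, 1.0, 0.5],
        [0.5, 0.5, 1.0]]

eigenvalue, vector = first_principal_component(corr)
loadings = [math.sqrt(eigenvalue) * x for x in vector]
communalities = [b * b for b in loadings]
# Contribution of the factor: sum of squared loadings over total variance.
explained_share = sum(communalities) / len(corr)
print(round(eigenvalue, 3), [round(b, 3) for b in loadings])
```

In this example the sum of the squared loadings on the factor equals its eigenvalue, so the factor explains two thirds of the total variance of the three standardized variables.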

Having the total communality approximate the sum of the observed variances as closely as possible makes sense only when the predictors are measured in the same units, which is not the case for all the predictors. The variables are therefore standardized before applying the extraction method. This is accomplished by using formula (6); by construction, however, variables are standardized in the statistical software used for this analysis (i.e. SPSS®).

$$Z_{ij} = \frac{Y_{ij} - \bar{Y}_i}{S_i} \qquad (6)$$

Where:

$Z_{ij}$ – standardized observation j of the variable i;

$Y_{ij}$ – observation j of the variable i;

$\bar{Y}_i$ – mean of the variable i;

$S_i$ – standard deviation of the variable i.
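The standardization can be sketched as follows. The observations are hypothetical, and the use of the sample standard deviation (with n - 1 in the denominator) is an assumption; statistical packages may differ in this choice.

```python
import statistics


def standardize(observations):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.mean(observations)
    std_dev = statistics.stdev(observations)  # assumption: n - 1 denominator
    return [(y - mean) / std_dev for y in observations]


# Hypothetical observations of one ratio across five firms.
ratio_values = [0.10, 0.20, 0.30, 0.40, 0.50]
z_values = standardize(ratio_values)
print([round(z, 3) for z in z_values])
```

After standardization every variable has zero mean and unit variance, so variables measured in different units contribute equally to the total observed variance.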

The factor selection method applied in the current analysis is based on the study of Chen (2011), who used Kaiser’s criterion, which treats components with eigenvalues greater than 1 as common factors, together with a communality greater than 0.8, in order to obtain suitable factors.
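The two selection thresholds can be applied as in the following minimal sketch. The eigenvalues, variable names and communalities are hypothetical and serve only to show the filtering step.

```python
def kaiser_factors(eigenvalues, threshold=1.0):
    """Indices of components whose eigenvalue exceeds the threshold."""
    return [index for index, value in enumerate(eigenvalues)
            if value > threshold]


def well_explained(communalities, threshold=0.8):
    """Names of variables whose communality exceeds the threshold."""
    return [name for name, value in communalities.items()
            if value > threshold]


# Hypothetical output of a principal component extraction.
eigenvalues = [3.1, 1.4, 0.9, 0.6]
communalities = {"ROA": 0.91, "Current ratio": 0.85, "R&D to sales": 0.62}

retained_factors = kaiser_factors(eigenvalues)
retained_variables = well_explained(communalities)
print(retained_factors, retained_variables)
```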

If the first factor solution does not provide the hypothesised structure of loadings, a rotation is applied to find another set of loadings that fits the observations equally well but is easier to interpret. This study will therefore use the Varimax criterion for rotation, an orthogonal rotation method which seeks the rotated factor loadings that maximize the variance of the squared loadings for each factor. The goal of the rotation is to make some of the loadings as large as possible, and the others as small as possible, in absolute value. This method encourages the selection of factors that are related to few variables and discourages factors related to all variables.
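For two factors, the rotation can be illustrated with a simple search over the rotation angle. The sketch below is not the algorithm implemented in SPSS®: it evaluates the raw Varimax criterion (without Kaiser normalization) on a grid of angles and keeps the best one. The unrotated loadings are hypothetical.

```python
import math


def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    n_vars = len(loadings)
    total = 0.0
    for j in range(len(loadings[0])):
        squared = [row[j] ** 2 for row in loadings]
        mean_sq = sum(squared) / n_vars
        total += sum((s - mean_sq) ** 2 for s in squared) / n_vars
    return total


def rotate(loadings, angle):
    """Orthogonal rotation of a two-factor loading matrix."""
    c, s = math.cos(angle), math.sin(angle)
    return [[a * c + b * s, -a * s + b * c] for a, b in loadings]


def varimax_two_factors(loadings, steps=9000):
    """Grid search over [0, pi/2); the criterion repeats every pi/2."""
    best_angle, best_value = 0.0, varimax_criterion(loadings)
    for step in range(1, steps):
        angle = (math.pi / 2) * step / steps
        value = varimax_criterion(rotate(loadings, angle))
        if value > best_value:
            best_angle, best_value = angle, value
    return rotate(loadings, best_angle), best_angle


# Hypothetical unrotated loadings of four predictors on two factors.
unrotated = [[0.7, 0.5], [0.8, 0.4], [0.6, -0.6], [0.7, -0.5]]
rotated, angle = varimax_two_factors(unrotated)
for row in rotated:
    print([round(value, 3) for value in row])
```

Because the rotation is orthogonal, the communality of each variable is unchanged; only the distribution of the loadings across the two factors changes.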

To summarize, this study uses the SPSS® statistical software package to perform factor analysis using the PCA extraction method and Varimax rotation to extract bankruptcy predictors.

3.4 Decision tree algorithms

The decision tree algorithm is a non-parametric learning method that produces different types of classification and regression trees. The algorithm is used to predict the membership of objects in the classes of a dependent variable based on one or more predictor variables. The advantages of the algorithm, mentioned in the literature review, are the reason why it has become one of the most popular data mining algorithms used for bankruptcy prediction (Li et al., 2010). Two of the major decision tree algorithms (Chen, 2011) will be used in this study: CART and CHAID. Both algorithms analyse the observations through sequential procedures and are discussed below.

3.4.1. CART algorithm


The CART algorithm was introduced by Breiman, Friedman, Olshen and Stone (1984). The algorithm generates a binary decision tree, which is built using a splitting rule, i.e. a rule that determines how the sample is split into its component classes. At each parent node the data are divided into two child nodes so as to ensure maximum homogeneity of the observations within each child node. The splitting algorithm of decision trees based on the CART method is as follows:

Fig. 2. Splitting algorithm of the classification tree on the example of the CART method, where $t_P$, $t_L$, $t_R$ are the parent, left and right nodes, $P_L$ is the probability of the left node, $P_R$ is the probability of the right node, $x_j$ is variable $j$, and $x_j^R$ is the best splitting value of variable $x_j$.

The maximum homogeneity of the child nodes is defined by the impurity function $i(t)$, described by the Gini index that is discussed further in this section. Given that the impurity of the parent node $t_P$ is constant for any possible split $x_j \le x_j^R$, $j = 1, \dots, M$, the maximum homogeneity of the left and right child nodes, $t_L$ and $t_R$, is achieved by maximizing the change of the impurity function $\Delta i(t)$:

$$\Delta i(t) = i(t_P) - E[i(t_c)] \qquad (7)$$

Where $t_c$ denotes the left and right nodes (child nodes) of the parent node $t_P$. Assuming that $P_L$ and $P_R$ are the probabilities of the left and right nodes, the following equation can be derived:

$$\Delta i(t) = i(t_P) - P_L\, i(t_L) - P_R\, i(t_R) \qquad (8)$$

Consequently, at each node the CART algorithm solves the following maximization problem:

$$\underset{x_j \le x_j^R,\; j = 1, \dots, M}{\arg\max} \left[ i(t_P) - P_L\, i(t_L) - P_R\, i(t_R) \right] \qquad (9)$$

The above equation implies that CART searches through all possible values of all variables included in the study to find the best split $x_j \le x_j^R$, i.e. the one that maximizes the change of the impurity measure $\Delta i(t)$.

It is important to define the impurity function i(t). In this study the Gini splitting rule suggested by Breiman et al. (1984) is used, because it is computationally less intensive than other node impurity measures, e.g. Twoing or ordered Twoing, while giving comparable results. The Gini index of the impurity of the node can be expressed as follows:

$$G(t) = 1 - p(t)^2 - (1 - p(t))^2 \qquad (10)$$

Where:

G(t) – Gini index;

p(t) – relative frequency of the first class in the node.

The Gini index reaches the value of zero only when all the objects in the node are homogeneous, i.e. when only one category of companies (bankrupt or active) is present in the node. It reaches its maximum of 0.5 when both categories are equally represented in the node.
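The split search and the Gini index can be sketched for a single predictor as follows. This is a minimal illustration under simplifying assumptions (two classes coded 0/1, one variable, thresholds taken from the observed values); the function names and the data are hypothetical, and CART itself repeats this search over all variables.

```python
def gini(labels):
    """Gini index G(t) = 1 - p^2 - (1 - p)^2 for a two-class node (labels are 0/1)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p ** 2 - (1.0 - p) ** 2


def best_split(values, labels):
    """Find the threshold c of the split x <= c with the largest impurity decrease."""
    parent = gini(labels)
    n = len(labels)
    best_gain, best_threshold = 0.0, None
    # The largest observed value is skipped: it would leave the right node empty.
    for c in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= c]
        right = [y for x, y in zip(values, labels) if x > c]
        p_left, p_right = len(left) / n, len(right) / n
        # Impurity decrease: i(t_P) - P_L * i(t_L) - P_R * i(t_R)
        gain = parent - p_left * gini(left) - p_right * gini(right)
        if gain > best_gain:
            best_gain, best_threshold = gain, c
    return best_threshold, best_gain


# Hypothetical data: a debt ratio and a bankruptcy flag (1 = bankrupt).
debt_to_assets = [0.10, 0.20, 0.35, 0.60, 0.80, 1.10]
bankrupt = [0, 0, 0, 1, 1, 1]
threshold, gain = best_split(debt_to_assets, bankrupt)
```

In this made-up example the split at 0.35 separates the two classes perfectly, so both child nodes have a Gini index of zero and the impurity decrease equals the parent impurity.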

To assess the predictive accuracy of the method on unseen data, split-sample validation is used with a split ratio of 70:30 (70% training sample and 30% testing sample), as suggested by Li et al. (2011). Observations are assigned to the training and testing samples purely at random, and the assignment is made automatically by the statistical software.
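A random 70:30 split can be sketched as below. The statistical software performs this step internally, so the function name `split_sample`, the fixed seed and the rounding of the cut-off are assumptions made only for illustration.

```python
import random


def split_sample(observations, train_share=0.7, seed=0):
    """Randomly assign observations to a training and a testing sample."""
    rng = random.Random(seed)      # fixed seed only to make the sketch repeatable
    shuffled = list(observations)  # copy, so the input order is left untouched
    rng.shuffle(shuffled)
    cut = round(train_share * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical row ids for 109 bankrupt and 109 matched active firms.
firms = list(range(218))
train, test = split_sample(firms)
```

With 218 firms this gives 153 training and 65 testing observations.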

3.4.2 CHAID algorithm

The CHAID algorithm was originally proposed by Kass (1980). It is not based on any probabilistic distribution, but exclusively on the chi-square goodness of fit test, and it creates multi-way splits (i.e. two or more child nodes). At each node, the CHAID algorithm selects the predictor that provides the best possible split of the node. The selection is made by comparing the p-values associated with each predictor: the algorithm selects the predictor with the smallest p-value (i.e. the most significant one) and compares it with the user-specified alpha level (here α = 5%). If the p-value is less than or equal to the alpha level, the node is split using the selected predictor; otherwise the node is not split and is considered a terminal node. In this way the CHAID method repeatedly splits the observations of a parent node into two or more child nodes across the entire sample.

In this study, due to the small sample size, the likelihood ratio is used as the chi-square statistic. It is more robust than Pearson’s chi-square statistic and more appropriate for small samples. The observed and expected frequencies of each category of the dependent variable are used to calculate the likelihood ratio statistic. To split the nodes, the CHAID algorithm needs a p-value for each pairing of a predictor variable with the dependent variable. Let X denote a predictor variable and Y the dependent variable. The statistic underlying the (unadjusted) p-value is computed with the following formula:

$$G^2 = 2 \sum_{j=1}^{J} \sum_{i=1}^{I} n_{ij} \ln\!\left( \frac{n_{ij}}{\hat{m}_{ij}} \right) \qquad (11)$$

Where $I$ is the number of categories of $X$ (predictor variable), $J$ is the number of classes of the dependent variable $Y$, $n_{ij}$ is the observed frequency of cell $(i, j)$, counted over every case $n$, $D$ is the relevant data needed to calculate the p-value and $\hat{m}_{ij}$ is the estimated cell frequency for the cell $(x_n = i, y_n = j)$, estimated as follows:

$$\hat{m}_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n_{\cdot\cdot}} \qquad (12)$$

Where:

$$n_{i\cdot} = \sum_{j=1}^{J} n_{ij}, \qquad n_{\cdot j} = \sum_{i=1}^{I} n_{ij}, \qquad n_{\cdot\cdot} = \sum_{j=1}^{J} \sum_{i=1}^{I} n_{ij} \qquad (13)$$

The corresponding p-value is given by $p = \Pr(\chi^2_d > G^2)$, where $\chi^2_d$ follows a chi-squared distribution with $d$ degrees of freedom, $d = (J-1)(I-1)$.
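The likelihood ratio statistic and the split decision can be sketched as follows. This is a minimal illustration and not the software's implementation: the contingency table is hypothetical, empty cells are assumed to contribute zero, and the closed-form p-value used here is valid only for one degree of freedom (a 2 x 2 table); larger tables need a general chi-squared distribution function.

```python
import math


def likelihood_ratio_statistic(table):
    """G^2 = 2 * sum_ij n_ij * ln(n_ij / m_ij) for an I x J table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            if n_ij > 0:  # assumption: an empty cell contributes nothing
                m_ij = row_totals[i] * col_totals[j] / total  # expected frequency
                g2 += 2.0 * n_ij * math.log(n_ij / m_ij)
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return g2, dof


def p_value_one_dof(g2):
    """Pr(chi2_1 > g2); this closed form holds only for d = 1."""
    return math.erfc(math.sqrt(g2 / 2.0))


# Hypothetical 2 x 2 table: rows = predictor category, columns = (active, bankrupt).
observed = [[30, 10], [12, 28]]
g2, dof = likelihood_ratio_statistic(observed)
p = p_value_one_dof(g2)
alpha = 0.05
split_node = p <= alpha  # split only if the predictor is significant
```

For this made-up table the p-value is far below the 5% alpha level, so the node would be split on the predictor.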

3.5 Algorithm prediction accuracy

In order to test how accurately the models predict bankruptcy, three error measures are calculated: Type I Error Rate, Type II Error Rate, and Total Error Rate. They are based on the following classification table proposed by Chen (2011).

Table 2

The relationship between Error Types

| Classification: Observed (rows) / Predicted (columns) | Active | Bankrupt | Sum |
|---|---|---|---|
| Active | Y1 | Y2 | Y3 |
| Bankrupt | Y4 | Y5 | Y6 |
| Sum | Y7 | Y8 | Y9 |

Where Y1, Y2, ... , Y9 are the numbers of companies classified under each category. The Type I Error Rate is the rate at which the model fails to classify an active company as active. The Type II Error Rate is the rate at which the model fails to classify a bankrupt company as bankrupt. The Total Error Rate combines the Type I and Type II errors. Each error rate is computed from the cell counts in Table 2.
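A minimal sketch of how the three rates could be computed from the cells of Table 2. The exact formulas are an assumption derived from the verbal definitions above: each rate is taken as a simple share of misclassified firms, and the Total Error Rate as the share of all misclassified firms in the whole sample. The counts in the example are hypothetical.

```python
def error_rates(y1, y2, y4, y5):
    """Error rates from the cells of Table 2 (rows = observed, columns = predicted).

    Assumed definitions, not confirmed by the source:
      Type I  = Y2 / Y3, Type II = Y4 / Y6, Total = (Y2 + Y4) / Y9.
    """
    y3 = y1 + y2            # observed active firms
    y6 = y4 + y5            # observed bankrupt firms
    y9 = y3 + y6            # all firms
    type_1 = y2 / y3        # active firms predicted as bankrupt
    type_2 = y4 / y6        # bankrupt firms predicted as active
    total = (y2 + y4) / y9  # all misclassified firms
    return type_1, type_2, total


# Hypothetical classification counts for a testing sample of 66 firms.
type_1, type_2, total = error_rates(y1=28, y2=5, y4=7, y5=26)
```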


4. Data

The analysis was performed on all US listed companies that went bankrupt in 2009 and on the subsample of high-tech companies among them, based on data for the year 2007 (i.e. two years prior to bankruptcy). The reasons for using 2007 data to predict 2009 bankruptcies are: (1) it is the optimal year for bankruptcy prediction, since one year before the bankruptcy (i.e. 2008) the financial statements themselves already reveal the financial health of the company, whereas earlier years may not yet contain enough information to discriminate between bankrupt and active companies with sufficient accuracy; (2) previous studies (Altman, 1968; Campbell et al., 2008; Jo et al., 2011; Chen, 2011) also used financial data two years prior to the bankruptcy event; (3) in the current study, the year 2007 contains more information on bankrupt companies than the year 2008, which helped retain a sufficient number of bankrupt companies for the study. The data used in this paper originate from three sources: the Compustat and Bloomberg databases and the online resources of the United States Courts (Public Access to Court Electronic Records).

Based on the information available on the website of the United States Courts13, there were 60,837 business bankruptcy filings in 2009, of which 41,962 were Chapter 7 filings, 13,683 Chapter 11 filings and 136 Chapter 15 filings. Of all these bankruptcy filings, only 198 came from listed companies, regardless of the chapter filed. According to the Bloomberg database, only 211 of the filings under the above chapters (listed and unlisted companies together, of which 57 were listed) came from high-tech firms (including tire manufacturers). Therefore, the majority of bankruptcy filings come from unlisted companies and only 0.3% from listed companies, among which high-tech filings are strongly represented (i.e. 28.8%). According to the Bloomberg database, 2009 is one of the years with the highest rate of bankruptcy filings in the high-tech industry since the dot-com bubble. High-tech companies were classified according to the Industry Classification Benchmark as the technology industry14, extended with the following sectors: aerospace and defence, automobiles and parts, automobile manufacturers (excluding tire manufacturers), biotechnology, electronic and electrical equipment, and telecommunications. After filtering to eliminate tire manufacturers and companies with insufficient data, and selecting only listed companies, the sample for the study comprises 109 bankrupt firms, of which 42 are high-tech companies. The financial sector was excluded from the sample entirely, as the financial structure and the reporting basis of these companies differ significantly from those of industrial companies (Ohlson, 1980).

Based on the North America Industry Classification System, the sector structure of the bankrupt companies in the sample (see Fig. 3 and Table A1) is quite dissimilar to the structure of US GDP by industry (see Fig. A1). One of the reasons is the exclusion of the financial sector15 from the analysis.

13 http://www.uscourts.gov/uscourts/Statistics/BankruptcyStatistics/BankruptcyFilings/2009/1209_f2.pdf

14 Computer services, Internet, Software, Electronic Office Equipment, Semiconductors, Telecommunication Equipment.

15


Fig. 3. Structure of the bankrupt companies based on the North America Industry Classification System. Sectors named in the legend: Mining, Quarrying, and Oil and Gas Extraction; Construction; Manufacturing; Wholesale Trade; Retail Trade; Transportation and Warehousing; Information; Real Estate and Rental and Leasing; Professional, Scientific, and Technical Services; Administrative and Support and Waste Management and Remediation Services. Segment values: 12, 2, 58, 1, 8, 3, 11, 2, 7, 1, 1, 1, 2.

Based on the subsector membership of the companies, the structure of the high-tech sample shows that four subsectors have the highest frequency: telecommunications with 7 companies, auto parts & equipment with 6 companies, and biotechnology and semiconductors with 5 companies each. Internet, with only one company, is the subsector with the lowest presence, see Fig. 4.

Fig. 4. Subsector structure of the high-tech bankrupt companies

The variables considered as candidates for inclusion in the analysis were selected on the basis of previous research on bankruptcy prediction. As a starting point, the most relevant bankruptcy prediction studies and the studies that used a large number of predictors were included. The related research carried out by Altman (1968), Ohlson (1980), Jo, Han and Lee (1997), Lacerda and Moro (2008), Chen (2011), and Li, Lee, Zhou, and Sun (2011) suggested variables for bankruptcy prediction. Additionally, several ratios proposed by this study were added; they are marked with (S) in the list of variables, see Table 3. The set of 43 variables was grouped into five major categories: profitability ratios (10), financial structure ratios (10), activity ratios (11), liquidity ratios (11) and non-financial factors (1). The composition of each category is presented below:


Table 3

List of the variables proposed to be used in this study, based on the previous work of Altman (1968), Ohlson (1980), Jo, Han and Lee (1997), Lacerda and Moro (2008), Chen (2011), Li, Lee, Zhou, and Sun (2011) and several variables proposed by the current study marked with S.

Columns: Variable set (43); Altman (1968); Ohlson (1980); Jo, Han and Lee (1997); Lacerda and Moro (2008); Chen (2011); Li et al. (2011); Current Study (S)

Profitability Ratios (10)

Return on Assets

Return on Equity

Earnings per Share  

Price to Book Ratio 

Dividend Payout Ratio 

Pretax Margin 

Gross Profit Margin   

Net Profit Margin   

Gross Return on Assets  

Cash Return on Capital Invested S

Financial Structure Ratios (10)

Debt to Assets  

Interest Coverage Ratio    

Debt to Equity  *

Book value per share 

Current Assets to Total Assets   

Fixed Assets to Total Assets  

Gross Margin to Total Assets 

Current Debt Ratio 

% ST Debt in Total Debt 

Equity to Fixed Assets 

Activity Ratios (11)

Total Assets Turnover     

Fixed Assets Turnover   

Inventory Turnover

Accounts Payable/Sales

COGS/Sales

R&D/Sales S

Equity Turnover

Working Capital Turnover  

CAPEX to Sales S

Cash Flow to Sales 

Cash Flow to CAPEX S

Liquidity Ratios (11)

Current Ratio    

Quick Ratio  

Acid Test Ratio 

Cash Ratio   

Working Capital to Total Assets    

Operating Cash Flow Ratio S

Cash Flow to LT Debt 

Cash Flow to Total Debt  

Cash Flow to ST Debt  

Average Cost of Debt S

Cash to Total Assets 

Non-financial factors (1)

Number of Employees 

* Equity to debt Ratio


The analysis starts by examining the predictors of default. Descriptive statistics for both bankrupt and active companies across the US economy are presented in Table 4. The subsample of bankrupt companies consists of 109 companies that filed for bankruptcy in 2009, regardless of the chapter of filing. The subsample of active companies consists of companies matched to the bankrupt ones on two criteria: (1) total assets for the year 2007 and (2) NAICS classification.
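One way such a matched sample could be built is sketched below. The text names only the two matching criteria, so the rule used here (nearest total assets within the same NAICS sector, matching without replacement) is an assumption, and the firm names and figures are hypothetical.

```python
def match_controls(bankrupt, active):
    """Pair each bankrupt firm with the closest unused active firm of the same sector.

    Firms are (name, naics_sector, total_assets_2007) tuples. Assumed rule:
    same NAICS sector, smallest absolute difference in total assets,
    each active firm used at most once.
    """
    available = list(active)
    pairs = []
    for name, naics, assets in bankrupt:
        candidates = [f for f in available if f[1] == naics]
        if not candidates:
            continue  # no peer in this sector; the firm stays unmatched
        peer = min(candidates, key=lambda f: abs(f[2] - assets))
        available.remove(peer)  # match without replacement
        pairs.append((name, peer[0]))
    return pairs


# Hypothetical firms (name, NAICS sector, total assets).
bankrupt_firms = [("B1", "33", 150.0), ("B2", "51", 40.0)]
active_firms = [("A1", "33", 900.0), ("A2", "33", 170.0), ("A3", "51", 35.0)]
pairs = match_controls(bankrupt_firms, active_firms)
```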


Table 4

Summary statistics for the US economy-wide bankrupt listed firms (109 firms) and matched non-bankrupt listed firms (109 firms), based on the pre-processed (i.e. cleaned and transformed) data for the year 2007. The bankrupt subsample consists of US listed firms that filed for bankruptcy in 2009. The non-bankrupt companies are matched on two criteria: Total Assets for the year 2007 and the NAICS classification. N is the number of observations that contain the variable. P10% and P90% are the 10th and 90th percentiles. IPR is the inter-percentile range (P90% - P10%). The data used are financial and non-financial factors for the year 2007.

| Variable | Bankrupt N | Bankrupt Mean | Bankrupt Median | Bankrupt P10% | Bankrupt P90% | Bankrupt IPR | Non-bankrupt N | Non-bankrupt Mean | Non-bankrupt Median | Non-bankrupt P10% | Non-bankrupt P90% | Non-bankrupt IPR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Company Size | | | | | | | | | | | | |
| Total Assets | 109 | 2089.35 | 156.21 | 7.83 | 1938.17 | 1930.35 | 109 | 2095.13 | 185.39 | 7.08 | 1858.62 | 1851.54 |
| Profitability Ratios | | | | | | | | | | | | |
| ROA | 109 | -0.453 | -0.109 | -1.374 | 0.030 | 1.404 | 109 | -0.317 | 0.007 | -0.957 | 0.180 | 1.137 |
| ROE | 109 | -0.292 | -0.071 | -2.279 | 1.167 | 3.446 | 109 | -0.287 | 0.050 | -0.993 | 0.511 | 1.504 |
| EPS | 100 | -0.696 | -0.340 | -2.428 | 0.138 | 2.566 | 108 | 0.252 | 0.045 | -1.117 | 2.542 | 3.659 |
| Price to Book Ratio | 75 | 4.968 | 0.651 | -1.175 | 5.225 | 6.400 | 66 | 5.500 | 1.953 | -0.002 | 7.309 | 7.310 |
| Dividend Payout Ratio | 85 | 0.046 | 0.000 | -0.083 | 0.000 | 0.083 | 109 | 0.181 | 0.000 | -0.001 | 0.363 | 0.363 |
| Pretax Margin | 109 | -4.012 | -0.127 | -7.519 | 0.025 | 7.544 | 109 | -7.388 | 0.012 | -1.561 | 0.224 | 1.785 |
| Gross Margin | 109 | -7.040 | 0.246 | -0.905 | 0.639 | 1.544 | 109 | -0.054 | 0.366 | 0.061 | 0.713 | 0.652 |
| Net Profit Margin | 109 | -4.561 | -0.108 | -7.460 | 0.027 | 7.487 | 109 | -7.433 | 0.010 | -1.561 | 0.204 | 1.765 |
| Gross Return on Assets | 108 | -0.302 | -0.067 | -0.936 | 0.072 | 1.008 | 108 | -0.194 | 0.026 | -0.793 | 0.164 | 0.957 |
| CROCI | 109 | -1.174 | 0.054 | -1.961 | 0.751 | 2.712 | 108 | 0.009 | 0.159 | -0.831 | 0.790 | 1.622 |
| Financial Structure Ratios | | | | | | | | | | | | |
| Debt to Assets | 108 | 0.515 | 0.360 | 0.006 | 1.087 | 1.081 | 108 | 0.251 | 0.158 | 0.000 | 0.508 | 0.508 |
| Interest Coverage Ratio | 108 | -162.997 | -0.853 | -29.882 | 1.737 | 31.620 | 95 | -141.280 | 0.831 | -21.747 | 57.074 | 78.821 |
| Debt to Equity | 108 | 14.905 | 0.372 | -2.405 | 3.455 | 5.861 | 108 | 0.364 | 0.134 | -0.006 | 1.329 | 1.335 |
| Book value per share | 104 | 2.022 | 0.342 | -0.832 | 8.752 | 9.584 | 109 | 474.823 | 3.868 | -0.019 | 19.809 | 19.828 |
| Current Assets to Total Assets | 108 | 6.260 | 0.425 | 0.096 | 0.833 | 0.737 | 109 | 0.500 | 0.495 | 0.131 | 0.840 | 0.710 |
| Fixed Assets to Total Assets | 108 | 0.281 | 0.184 | 0.034 | 0.719 | 0.684 | 109 | 0.277 | 0.165 | 0.014 | 0.762 | 0.748 |
| Gross Margin to Total Assets | 109 | -0.027 | 0.001 | -0.009 | 0.032 | 0.042 | 109 | -0.020 | 0.002 | 0.000 | 0.045 | 0.045 |
| Current Debt Ratio | 107 | 0.506 | 0.309 | 0.089 | 0.797 | 0.708 | 109 | 0.381 | 0.228 | 0.091 | 0.671 | 0.579 |
| % ST Debt in Total Debt | 109 | 0.274 | 0.073 | 0.000 | 0.995 | 0.995 | 108 | 0.262 | 0.069 | 0.000 | 1.000 | 1.000 |
| Equity to Fixed Assets | 109 | 0.921 | 0.809 | -4.341 | 8.488 | 12.829 | 107 | 9.889 | 2.194 | 0.041 | 34.830 | 34.789 |
| Activity Ratios | | | | | | | | | | | | |
| Total Assets Turnover | 108 | 1.010 | 0.791 | 0.050 | 2.106 | 2.056 | 109 | 0.972 | 0.760 | 0.069 | 2.055 | 1.986 |
| Fixed Assets Turnover | 109 | 11.268 | 4.614 | 0.087 | 24.141 | 24.053 | 109 | 13.804 | 4.785 | 0.245 | 30.107 | 29.863 |
| Inventory Turnover | 108 | 17.250 | 6.276 | 0.000 | 44.789 | 44.789 | 108 | 22.322 | 6.487 | 0.000 | 42.735 | 42.735 |
| Accounts Payable/Sales | 109 | 0.736 | 0.108 | 0.028 | 0.762 | 0.734 | 109 | 0.281 | 0.082 | 0.019 | 0.354 | 0.335 |
| COGS/Sales | 109 | 7.994 | 0.701 | 0.268 | 1.905 | 1.638 | 109 | 1.017 | 0.619 | 0.158 | 0.912 | 0.754 |
| R&D/Sales | 71 | 0.338 | 0.038 | 0.000 | 0.665 | 0.665 | 67 | 6.282 | 0.050 | 0.000 | 1.538 | 1.538 |
| Equity Turnover | 108 | 15.620 | 0.564 | -4.993 | 9.384 | 14.376 | 109 | 2.027 | 1.162 | 0.000 | 4.651 | 4.651 |
| Working Capital Turnover | 107 | 1.422 | 1.904 | -11.221 | 17.411 | 28.633 | 109 | 6.185 | 2.469 | -1.571 | 14.195 | 15.765 |
| CAPEX to Sales | 109 | 2.969 | 0.033 | 0.003 | 2.433 | 2.430 | 108 | 0.226 | 0.045 | 0.005 | 0.601 | 0.596 |
| Cash Flow to Sales | 109 | 5.341 | 0.001 | -0.314 | 0.777 | 1.091 | 108 | 2.460 | 0.007 | -0.166 | 0.386 | 0.552 |
| Cash Flow to CAPEX | 109 | 2.358 | 0.049 | -6.504 | 7.564 | 14.068 | 108 | 2.984 | 0.134 | -5.546 | 13.723 | 19.269 |
| Liquidity Ratios | | | | | | | | | | | | |
| Current Ratio | 108 | 1.909 | 1.344 | 0.116 | 4.370 | 4.254 | 109 | 2.844 | 1.895 | 0.335 | 4.903 | 4.568 |
| Quick Ratio | 107 | 10.958 | 0.944 | 0.124 | 4.561 | 4.437 | 108 | 2.380 | 1.397 | 0.332 | 4.299 | 3.967 |
| Acid Test Ratio | 104 | 1.378 | 0.855 | 0.137 | 3.265 | 3.128 | 109 | 2.208 | 1.245 | 0.234 | 4.109 | 3.875 |
| Cash Ratio | 100 | 0.763 | 0.218 | 0.010 | 1.625 | 1.615 | 109 | 0.998 | 0.547 | 0.028 | 1.863 | 1.835 |
| Working Capital to Total Assets | 108 | -0.040 | 0.133 | -0.429 | 0.434 | 0.863 | 109 | 0.119 | 0.192 | -0.290 | 0.602 | 0.892 |
| Operating Cash Flow Ratio | 107 | -0.527 | -0.082 | -2.031 | 0.597 | 2.628 | 108 | 0.116 | 0.192 | -1.230 | 1.428 | 2.658 |
| Cash Flow to LT Debt | 109 | -160.617 | 0.000 | -0.605 | 0.887 | 1.492 | 107 | 4.877 | 0.000 | -0.526 | 2.437 | 2.963 |
| Cash Flow to Total Debt | 109 | -10.698 | 0.002 | -0.459 | 0.978 | 1.437 | 107 | 2.060 | 0.000 | -0.363 | 1.550 | 1.913 |
| Cash Flow to ST Debt | 109 | 142.344 | 0.000 | -2.044 | 16.334 | 18.378 | 108 | 40.091 | 0.000 | -1.087 | 32.209 | 33.296 |
| Average Cost of Debt | 106 | 0.211 | 0.065 | 0.000 | 0.169 | 0.169 | 97 | 0.071 | 0.051 | 0.000 | 0.150 | 0.150 |
| Cash to Total Assets | 101 | 0.119 | 0.071 | 0.004 | 0.346 | 0.342 | 109 | 0.174 | 0.096 | 0.013 | 0.455 | 0.443 |
| Non-financial factors | | | | | | | | | | | | |


Summary statistics for both high-tech active and bankrupt US firms are presented in Table 5. The subsample of high-tech bankrupt firms consists of 42 companies that filed for bankruptcy in 2009 regardless of the type of filing. The subsample is matched with 42 active high-tech firms based on (1) total assets for the year 2007 and (2) NAICS classification.

Analogous to the data for the entire economy, several variables are highly skewed for the high-tech companies: the debt to equity ratio, interest coverage ratio, quick ratio and number of employees for the bankrupt high-tech companies, and the interest coverage ratio for the active high-tech companies, see Table 5. The dividend pay-out ratio, inventory turnover, equity turnover, working capital turnover and number of employees show higher dispersion for the bankrupt companies, whereas the interest coverage ratio, book value per share, equity to fixed assets and cash flow to short-term debt are more dispersed for the active companies. The discussion below focuses on the ratios that have not been discussed previously. The dispersion of the dividend pay-out ratio shows, on the one hand, that some companies rely on internally generated funds during bad times, as the pecking order theory suggests, and, on the other hand, that other companies prefer to pay out dividends. The latter is triggered by the debt overhang problem. A high inventory turnover ratio is good up to a certain level (i.e. the optimal level for the company), but very high ratios, as in the case of the bankrupt companies, may indicate that the company is running out of certain items of inventory and is therefore losing sales to competitors. The level of inventory should therefore be reasonable for the company to maintain its profitability. For the variable “number of employees”, a large difference between active and bankrupt firms may be caused by several factors: (1) labour-intensive companies go bankrupt more often than capital-intensive companies; (2) bankrupt companies are less efficient and therefore employ more personnel. However, due to the small sample size, the differences in variables between active and bankrupt companies need to be treated with caution, as the data may contain outliers that can lead to spurious results.


Table 5

Summary statistics for the high-tech bankrupt firms (42 firms) and matched non-bankrupt high-tech firms (42 firms), based on the pre-processed (i.e. cleaned and transformed) data for the year 2007. The bankrupt subsample consists of US listed high-tech firms that filed for bankruptcy in 2009. The non-bankrupt companies are matched on two criteria: Total Assets for the year 2007 and the NAICS classification. N is the number of observations that contain the variable. P10% and P90% are the 10th and 90th percentiles. IPR is the inter-percentile range (P90% - P10%). The data used are financial and non-financial factors for the year 2007.

| Variable | Bankrupt N | Bankrupt Mean | Bankrupt Median | Bankrupt P10% | Bankrupt P90% | Bankrupt IPR | Non-bankrupt N | Non-bankrupt Mean | Non-bankrupt Median | Non-bankrupt P10% | Non-bankrupt P90% | Non-bankrupt IPR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Company Size | | | | | | | | | | | | |
| Total Assets | 42 | 4815.5 | 152.14 | 8.13 | 7505.5 | 7497.37 | 42 | 1863.05 | 101.97 | 4.62 | 3092.4 | 3087.73 |
| Profitability Ratios | | | | | | | | | | | | |
| ROA | 42 | -0.363 | -0.144 | -1.215 | 0.037 | 1.253 | 42 | -0.379 | -0.014 | -1.235 | 0.145 | 1.380 |
| ROE | 42 | -0.337 | -0.011 | -3.295 | 1.345 | 4.640 | 42 | 0.119 | 0.082 | -0.873 | 0.693 | 1.566 |
| EPS | 33 | 0.543 | -0.010 | -1.414 | 0.510 | 1.924 | 42 | -0.190 | -0.025 | -2.078 | 2.248 | 4.326 |
| Price to Book Ratio | 32 | 9.826 | 0.399 | -0.930 | 8.642 | 9.572 | 25 | 1.740 | 1.504 | -0.049 | 4.667 | 4.716 |
| Dividend Payout Ratio | 21 | -0.063 | 0.000 | -0.244 | 0.000 | 0.244 | 42 | 0.059 | 0.000 | 0.000 | 0.146 | 0.146 |
| Pretax Margin | 42 | 0.453 | -0.064 | -1.810 | 0.076 | 1.885 | 42 | -9.909 | -0.012 | -2.766 | 0.235 | 3.001 |
| Gross Margin | 42 | -17.151 | 0.246 | -0.059 | 0.638 | 0.696 | 42 | 0.208 | 0.346 | 0.018 | 0.654 | 0.636 |
| Net Profit Margin | 42 | -1.036 | -0.089 | -1.810 | 0.055 | 1.865 | 42 | -9.937 | -0.008 | -2.759 | 0.203 | 2.961 |
| Gross Return on Assets | 41 | -0.227 | -0.071 | -0.939 | 0.073 | 1.012 | 41 | -0.181 | 0.008 | -1.082 | 0.159 | 1.240 |
| CROCI | 42 | -2.683 | 0.147 | -3.577 | 0.695 | 4.272 | 41 | 0.293 | 0.147 | -0.783 | 0.661 | 1.445 |
| Financial Structure Ratios | | | | | | | | | | | | |
| Debt to Assets | 41 | 0.534 | 0.347 | 0.016 | 1.406 | 1.390 | 41 | 0.261 | 0.143 | 0.000 | 0.682 | 0.682 |
| Interest Coverage Ratio | 41 | 6.811 | -0.720 | -9.762 | 2.595 | 12.358 | 35 | -364.831 | 0.892 | -26.999 | 158.279 | 185.278 |
| Debt to Equity | 41 | 39.446 | 0.153 | -2.347 | 3.493 | 5.840 | 41 | 0.184 | 0.046 | -0.955 | 2.482 | 3.437 |
| Book value per share | 40 | 2.185 | 0.010 | -0.137 | 10.541 | 10.678 | 42 | 3.756 | 1.754 | -0.545 | 17.227 | 17.772 |
| Current Assets to Total Assets | 42 | 15.422 | 0.477 | 0.126 | 0.866 | 0.740 | 42 | 0.526 | 0.565 | 0.191 | 0.802 | 0.611 |
| Fixed Assets to Total Assets | 41 | 0.199 | 0.105 | 0.022 | 0.421 | 0.399 | 42 | 0.199 | 0.125 | 0.013 | 0.525 | 0.512 |
| Gross Margin to Total Assets | 42 | -0.052 | 0.001 | 0.000 | 0.057 | 0.057 | 42 | 0.010 | 0.002 | 0.000 | 0.025 | 0.025 |
| Current Debt Ratio | 41 | 0.505 | 0.353 | 0.118 | 1.024 | 0.906 | 42 | 0.427 | 0.236 | 0.112 | 1.259 | 1.147 |
| % ST Debt in Total Debt | 42 | 0.316 | 0.126 | 0.000 | 1.000 | 1.000 | 41 | 0.358 | 0.127 | 0.000 | 1.000 | 1.000 |
| Equity to Fixed Assets | 42 | 1.284 | 0.751 | -10.748 | 11.270 | 22.018 | 42 | 10.832 | 2.746 | -3.214 | 40.975 | 44.189 |
| Activity Ratios | | | | | | | | | | | | |
| Total Assets Turnover | 41 | 1.119 | 1.026 | 0.272 | 1.948 | 1.677 | 42 | 0.927 | 0.741 | 0.237 | 1.615 | 1.378 |
| Fixed Assets Turnover | 42 | 15.856 | 7.214 | 0.590 | 38.002 | 37.412 | 42 | 14.826 | 6.469 | 0.934 | 38.581 | 37.647 |
| Inventory Turnover | 41 | 13.935 | 5.469 | 0.000 | 38.641 | 38.641 | 42 | 11.497 | 6.151 | 0.000 | 24.850 | 24.850 |
| Accounts Payable/Sales | 42 | 0.182 | 0.154 | 0.031 | 0.309 | 0.278 | 42 | 0.261 | 0.092 | 0.025 | 0.572 | 0.548 |
| COGS/Sales | 42 | 18.103 | 0.743 | 0.147 | 1.059 | 0.912 | 42 | 0.768 | 0.643 | 0.304 | 0.935 | 0.631 |
| R&D/Sales | 40 | 0.343 | 0.042 | 0.000 | 0.282 | 0.282 | 35 | 8.798 | 0.057 | 0.000 | 1.035 | 1.035 |
| Net Sales % of Working Capital | 41 | 11.485 | 3.563 | -10.130 | 20.291 | 30.421 | 42 | 4.397 | 2.444 | -1.653 | 13.051 | 14.704 |
| Equity Turnover | 41 | 39.422 | 1.398 | -7.436 | 13.555 | 20.991 | 42 | 0.874 | 1.022 | -1.679 | 3.677 | 5.356 |
| Working Capital Turnover | 41 | 11.485 | 3.563 | -10.130 | 20.291 | 30.421 | 42 | 4.397 | 2.444 | -1.653 | 13.051 | 14.704 |
| CAPEX to Sales | 42 | 0.052 | 0.024 | 0.002 | 0.197 | 0.196 | 41 | 0.111 | 0.029 | 0.005 | 0.299 | 0.294 |
| Cash Flow to Sales | 42 | -0.139 | 0.019 | -0.241 | 0.738 | 0.979 | 41 | 0.995 | 0.019 | -0.141 | 0.926 | 1.067 |
| Cash Flow to CAPEX | 42 | -5.301 | 1.128 | -28.336 | 8.757 | 37.093 | 41 | 8.906 | 0.342 | -7.116 | 32.877 | 39.993 |
| Liquidity Ratios | | | | | | | | | | | | |
| Current Ratio | 42 | 1.528 | 1.186 | 0.002 | 3.719 | 3.717 | 42 | 2.540 | 1.975 | 0.244 | 5.791 | 5.547 |
| Quick Ratio | 41 | 25.771 | 0.913 | 0.121 | 5.033 | 4.912 | 42 | 2.040 | 1.334 | 0.228 | 5.225 | 4.997 |
| Acid Test Ratio | 38 | 1.160 | 0.855 | 0.136 | 2.291 | 2.155 | 42 | 1.899 | 1.225 | 0.127 | 5.043 | 4.916 |
| Cash Ratio | 35 | 0.485 | 0.232 | 0.000 | 1.374 | 1.374 | 42 | 0.943 | 0.423 | 0.059 | 2.509 | 2.450 |
| Working Capital to Total Assets | 42 | 0.018 | 0.125 | -0.772 | 0.489 | 1.260 | 42 | 0.099 | 0.171 | -0.918 | 0.628 | 1.546 |
| Operating Cash Flow Ratio | 41 | -0.414 | -0.060 | -1.645 | 0.705 | 2.350 | 41 | -0.158 | 0.022 | -1.637 | 1.211 | 2.848 |
| Cash Flow to LT Debt | 42 | 0.436 | 0.000 | -0.443 | 1.294 | 1.737 | 40 | 9.224 | 0.000 | -0.884 | 2.349 | 3.233 |
| Cash Flow to Total Debt | 42 | 0.573 | 0.022 | -0.371 | 1.281 | 1.653 | 40 | 2.611 | 0.007 | -0.073 | 1.623 | 1.696 |
| Cash Flow to ST Debt | 42 | 6.917 | 0.013 | -0.515 | 19.464 | 19.979 | 41 | 6.297 | 0.000 | -0.595 | 27.190 | 27.786 |
| Average Cost of Debt | 40 | 0.095 | 0.030 | 0.000 | 0.137 | 0.137 | 36 | 0.071 | 0.057 | 0.000 | 0.164 | 0.164 |
| Cash to Total Assets | 35 | 0.110 | 0.079 | 0.000 | 0.361 | 0.361 | 42 | 0.189 | 0.130 | 0.018 | 0.584 | 0.566 |
| Non-financial factors | | | | | | | | | | | | |


5. Results

5.1 Factor analysis

In order to perform the factor analysis, 109 bankrupt companies from the sample are matched with 109 active companies, based on total assets for the year 2007 and the NAICS sector classification. There are 43 variables available for the bankruptcy prediction.

Missing values for any variable in the factor analysis are treated by excluding all the observations for that particular company from the analysis. Therefore, there is a trade-off between sample size and number of the predictors used. Consequently certain variables (i.e. earnings per share, price to book ratio, dividend pay-out ratio, number of employees) that have more data points missing (i.e. more than 17% of the total number of observations) will be eliminated from the study. The only exception from this rule will be the R&D to sales variable, because it might be an important predictor in the case of the high-tech companies. For the remaining variables, missing data points will be replaced by the mean. As a result the analysis will be carried out using 39 bankruptcy predictors.
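The variable screening and mean replacement described above can be sketched as follows. The 17% threshold and the R&D to sales exception come from the text; the function name `prepare_predictors`, the data layout (a dict of lists with `None` for missing values) and the numbers are assumptions made for illustration.

```python
def prepare_predictors(data, max_missing_share=0.17, always_keep=("R&D/Sales",)):
    """Drop sparse variables and replace remaining missing values by the mean.

    data maps a variable name to its list of values, with None marking
    a missing data point (hypothetical layout).
    """
    prepared = {}
    for name, values in data.items():
        missing_share = values.count(None) / len(values)
        if missing_share > max_missing_share and name not in always_keep:
            continue  # too many missing data points: eliminate the variable
        observed = [v for v in values if v is not None]
        mean = sum(observed) / len(observed)
        prepared[name] = [mean if v is None else v for v in values]
    return prepared


# Hypothetical values for eight firms.
data = {
    "ROA": [0.1, None, -0.2, 0.0, 0.3, 0.1, -0.1, 0.2],
    "Price to Book Ratio": [1.5, None, None, 0.7, None, 2.0, 1.1, 0.9],
    "R&D/Sales": [None, None, 0.04, None, 0.05, 0.03, None, 0.04],
}
clean = prepare_predictors(data)
```

In this made-up example the price to book ratio is eliminated, while R&D/Sales is kept despite half of its values being missing and has its gaps filled with the mean of the observed values.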

Factor analysis is performed using the principal component analysis extraction method and the Varimax method (i.e. an orthogonal method) for factor rotation. Kaiser’s criterion16 is used for factor selection, following Chen (2011). The number of factors was set to be selected automatically by the statistical software.

Factor analysis is used to distinguish the variables by testing whether the differences between the variables are significant or not. The results of the first factor analysis on the entire sample are presented in Table 6. The total variance explained by all 12 factors is 79.1%. Due to low communality (less than 0.8), 16 variables are discarded (i.e. ROE, book value per share, debt to assets, interest coverage, fixed assets to total assets, gross margin to total assets, percentage of short-term debt in total debt, equity to fixed assets, total asset turnover, fixed asset turnover, inventory turnover, operating cash flow ratio, cash to total assets, cash flow to CAPEX, cash flow to short-term debt, and cash flow to total debt). Therefore, after the first factor analysis 23 variables are retained, on which the second factor analysis is performed, see Table B1.

16


Table 6

Summary statistics for the first factor analysis for 109 bankrupt and 109 matched non-bankrupt US listed firms. The bankrupt subsample consists of US listed firms that filed for bankruptcy in 2009. The non-bankrupt companies are matched on two criteria: Total Assets for the year 2007 and the NAICS classification.

| Factor | Variable | Factor loadings | Communalities | Eigenvalues | Explained variance (%) |
|---|---|---|---|---|---|
| 1 | COGS/Sales | 0.994 | 0.990 | 6.094 | 15.234 |
| | Current Assets to Total Assets | 0.987 | 0.977 | | |
| | Quick Ratio | 0.987 | 0.976 | | |
| 2 | Accounts Payable/Sales | 0.950 | 0.945 | 4.595 | 11.487 |
| | CAPEX to Sales | 0.949 | 0.913 | | |
| | Cash Flow to Sales | 0.928 | 0.946 | | |
| 3 | Working Capital to Total Assets | 0.934 | 0.921 | 4.127 | 10.317 |
| | Gross Return on Assets | 0.820 | 0.838 | | |
| | ROA | 0.758 | 0.871 | | |
| 4 | Acid Test Ratio | 0.919 | 0.899 | 3.059 | 7.648 |
| | Current Ratio | 0.909 | 0.892 | | |
| | Cash Ratio | 0.679 | 0.805 | | |
| | Equity to Fixed Assets | 0.528 | 0.622 | | |
| | Cash to Total Assets | 0.405 | 0.680 | | |
| | R&D/Sales | 0.120 | 0.891 | | |
| | Gross Margin | 0.007 | 0.990 | | |
| 5 | Debt to Equity | 0.990 | 0.982 | 2.592 | 6.479 |
| | Equity Turnover | 0.989 | 0.982 | | |
| 6 | Net Profit Margin | 0.708 | 0.954 | 2.389 | 5.974 |
| | Pretax Margin | 0.705 | 0.953 | | |
| | ROE | 0.544 | 0.428 | | |
| | CROCI | 0.064 | 0.980 | | |
| 7 | Cash Flow to LT Debt | 0.976 | 0.955 | 1.971 | 4.926 |
| | Average Cost of Debt | 0.917 | 0.876 | | |
| | Cash Flow to ST Debt | 0.724 | 0.594 | | |
| 8 | Working Capital Turnover | 0.996 | 0.996 | 1.819 | 4.549 |
| 9 | Interest Coverage Ratio | 0.670 | 0.474 | 1.615 | 4.037 |
| | Operating Cash Flow Ratio | 0.556 | 0.477 | | |
| | Fixed Assets to Total Assets | 0.292 | 0.626 | | |
| | Debt to Assets | 0.264 | 0.656 | | |
| 10 | Fixed Assets Turnover | 0.839 | 0.759 | 1.261 | 3.152 |
| | Total Assets Turnover | 0.665 | 0.687 | | |
| | % ST Debt in Total Debt | 0.183 | 0.438 | | |
| | Current Debt Ratio | 0.092 | 0.942 | | |
| | Gross Margin to Total Assets | 0.084 | 0.589 | | |
| 11 | Book value per share | 0.727 | 0.540 | 1.088 | 2.720 |
| | Inventory Turnover | 0.672 | 0.498 | | |
| 12 | Cash Flow to CAPEX | 0.832 | 0.720 | 1.037 | 2.594 |
| | Cash Flow to Total Debt | 0.231 | 0.389 | | |


Table 7

Summary statistics for the third factor analysis for 109 bankrupt and 109 matched non-bankrupt US listed firms. The bankrupt subsample consists of US listed firms that filed for bankruptcy in 2009. The non-bankrupt companies are matched on two criteria: Total Assets for the year 2007 and the NAICS classification.

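| Factor | Variable | Factor loadings | Communalities | Eigenvalues | Explained variance (%) |
|---|---|---|---|---|---|
| 1 | COGS/Sales | 0.999 | 0.999 | 4.802 | 20.879 |
| | Current Assets to Total Assets | 0.999 | 0.999 | | |
| | Quick Ratio | 0.999 | 0.999 | | |
| 2 | CAPEX to Sales | 0.976 | 0.959 | 4.033 | 17.534 |
| | Accounts Payable/Sales | 0.971 | 0.980 | | |
| 3 | Working Capital to Total Assets | 0.908 | 0.874 | 3.376 | 14.679 |
| | ROA | 0.833 | 0.879 | | |
| 4 | Debt to Equity | 0.995 | 0.991 | 2.993 | 13.015 |
| 7 | Acid Test Ratio | 0.959 | 0.981 | 1.926 | 8.372 |
| | R&D/Sales | 0.032 | 0.939 | | |
| | Gross Margin | 0.015 | 0.999 | | |
| 8 | Working Capital Turnover | 0.998 | 1.000 | 1.643 | 7.145 |
| | CROCI | 0.040 | 0.967 | | |
| 9 | Cash Flow to LT Debt | 0.986 | 0.974 | 1.426 | 6.202 |
| Total variance explained | | | | | 96.327 |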

The second factor analysis showed that the variance explained by the factors increased to 95.214% and that there is one redundant variable (i.e. cash ratio), see Table B1. The third factor analysis shows that no further extraction is needed: all remaining variables satisfy the necessary conditions and the total explained variance is 96.327%. Consequently, 22 variables were selected as relevant for bankruptcy prediction.


Table 8

Summary statistics for the first factor analysis for 42 bankrupt and 42 matched non-bankrupt US listed high-tech firms. The bankrupt subsample consists of US listed firms that filed for bankruptcy in 2009. The non-bankrupt companies are matched on two criteria: Total Assets for the year 2007 and the NAICS classification.

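| Factor | Variable | Factor loadings | Communalities | Eigenvalues | Explained variance (%) |
|---|---|---|---|---|---|
| 1 | COGS/Sales | 0.997 | 0.996 | 6.674 | 17.112 |
| | Current Assets to Total Assets | 0.997 | 0.995 | | |
| | Quick Ratio | 0.997 | 0.995 | | |
| | ROA | 0.866 | 0.943 | | |
| | ROE | 0.395 | 0.495 | | |
| 3 | Cash Ratio | 0.926 | 0.914 | 4.864 | 12.472 |
| | Acid Test Ratio | 0.892 | 0.871 | | |
| | Cash to Total Assets | 0.769 | 0.801 | | |
| | CROCI | 0.062 | 0.989 | | |
| | Gross Margin | 0.018 | 0.996 | | |
| 4 | Debt to Equity | 0.991 | 0.990 | 3.124 | 8.012 |
| 5 | Cash Flow to CAPEX | 0.879 | 0.826 | 3.079 | 7.896 |
| | Operating Cash Flow Ratio | 0.756 | 0.789 | | |
| | Interest Coverage Ratio | 0.484 | 0.731 | | |
| 6 | Cash Flow to ST Debt | 0.864 | 0.871 | 2.538 | 6.508 |
| | Cash Flow to LT Debt | 0.833 | 0.833 | | |
| | Fixed Assets Turnover | 0.657 | 0.820 | | |
| | Equity to Fixed Assets | 0.643 | 0.795 | | |
| 7 | Book value per share | 0.645 | 0.576 | 1.681 | 4.311 |
| | Working Capital to Total Assets | 0.497 | 0.828 | | |
| 8 | Fixed Assets to Total Assets | 0.811 | 0.801 | 1.484 | 3.804 |
| | CAPEX to Sales | 0.764 | 0.714 | | |
| | Accounts Payable/Sales | 0.083 | 0.925 | | |
| | Debt to Assets | 0.073 | 0.640 | | |
| 9 | % ST Debt in Total Debt | 0.170 | 0.721 | 1.268 | 3.250 |
| | R&D/Sales | 0.121 | 0.951 | | |
| 10 | Inventory Turnover | 0.660 | 0.473 | 1.103 | 2.829 |
| | Total Assets Turnover | 0.548 | 0.720 | | |
| | Cash Flow to Total Debt | 0.405 | 0.574 | | |
| | Gross Margin to Total Assets | 0.092 | 0.985 | | |
| 11 | Working Capital Turnover | 0.870 | 0.797 | 1.028 | 2.635 |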