Beyond the credit grade: a study on the identification and application of the determinants of default in online peer-to-peer lending.


MSc Thesis Finance Daniël Nils Zange


January 12, 2017

Abstract

Online peer-to-peer (P2P) lending is a relatively new phenomenon in which lenders and borrowers are connected directly through an online platform. Online P2P lending providers are able to offer attractive rates to their clients, which has resulted in rapid growth in recent years. The largest downside for lenders, however, is the problem of asymmetric information associated with online P2P lending. The aim of this study is to examine how the additional loan data provided by online P2P lending platforms can be used to help explain the default probability of loans. Additionally, it is examined whether the determinants of default identified in one period can be used to improve the loan selection for loans issued in a subsequent period. It is found that a number of determinants significantly explain the default probability in addition to the provided credit grade for 36 month loans issued in 2012. Additionally, it is found that the determinants of default identified for loans issued in 2012 can be used to improve the loan selection for 36 month loans issued in January and February 2013. The results can provide insights to both academics and practitioners by addressing the most prominent issue in online P2P lending.

Keywords: default risk, credit risk, peer-to-peer lending, asymmetric information

Student number: S2026120 Email: d.n.zange@student.rug.nl Phone: +31636166799


1. Introduction

Online peer-to-peer (P2P) lending is a relatively new phenomenon in which the internet is used to create an online marketplace in which borrowers and lenders are connected directly. The first company to offer online P2P loans was the UK-based company Zopa, which was founded in 2005 (www.zopa.com). The online P2P lending industry has experienced rapid growth since its advent. To illustrate, Lending Club, the subject of this study, had issued approximately 76.5 million in loans by the end of 2009. This number rose significantly to more than 22.6 billion in total loans issued after the third quarter of 2016 (www.lendingclub.com).

Online P2P lending provides advantages as well as disadvantages for borrowers and lenders when compared to traditional lending alternatives. First, there is no financial intermediary (Galloway, 2009). This has implications for both lenders and borrowers. Borrowers might be able to get a more favorable rate on their loan compared to the loans offered by traditional financial institutions. Lenders might in turn be able to earn a better rate with online P2P lending compared to other lending alternatives (Emekter et al., 2015). Second, online P2P lending is fast and easy to use. Borrowers and lenders can open an online account in a few mouse clicks and apply for a loan or invest their funds. Traditional loan applications can take considerably more time and effort for borrowers. For lenders it is relatively quick and easy to invest because online P2P lending platforms generally require a low minimum investment and provide tools for lenders to diversify their investment across many different loans. One of the major downsides of online P2P lending is the potential exacerbation of the information asymmetry between lenders and borrowers when compared to traditional lending alternatives. Because of the relative anonymity provided by the internet it is harder for lenders to assess the creditworthiness of the borrowers, and this asymmetry can lead to adverse selection (Akerlof, 1970) and moral hazard (Stiglitz and Weiss, 1981). Assessing the creditworthiness of the borrowers is particularly important in online P2P lending given the fact that most platforms offer unsecured loans where the lenders carry all of the credit risk.


Online P2P lending websites try to mitigate the problems associated with information asymmetry by screening the loan applicants using the applicants' self-reported as well as verified data. A loan application gets declined if the applicant does not meet the minimum requirements. Applicants who meet the minimum requirements get assigned a credit grade with a corresponding interest rate. The greater the default risk as perceived by the online P2P lending platform, the higher the interest rate. Additionally, online P2P lending platforms provide their customers with additional loan data which can be used by lenders to make more informed decisions when selecting loans for their portfolio.

The aim of this study is to examine if lenders can use additional loan data to improve their borrower assessment in addition to the credit grade that is provided by the online P2P platform. The subject of this study is the online P2P lending platform Lending Club and determinants of default are identified by using publicly available loan data on loans issued in 2012. These determinants are then used to calculate a default score which is used to rank loans issued in the first two months of 2013. It is then examined if the performance of a loan portfolio can be improved by selecting the loans with the lowest default score within a given credit grade category. More specifically, this paper aims to answer two related research questions.

1. Which loan/borrower characteristics can help to determine the default risk in addition to the provided credit grade?

2. Can the determinants of default identified in one period be used to improve the loan selection in a subsequent period?

The results of this study show that there are multiple determinants that explain the default risk of 36 month loans issued in 2012 in addition to the provided credit grade. Furthermore it is shown that these determinants can be used to help lenders make better loan selection decisions by using these determinants of default to rank loans within credit grade categories.


In section 3 the data used in this study will be described. Section 4 describes the methodology and results of the empirical study. Finally, section 5 concludes by discussing the results and limitations of this study.

2. Literature review

In this section, financial intermediation theory will first be used to explain why online P2P lending exists and why it has grown so rapidly in recent years. Second, financial intermediation theory is used to describe the most prominent downside for lenders in online P2P lending, the problems associated with asymmetric information. Finally, empirical studies on the topic of asymmetric information in online P2P lending are reviewed.

Traditionally, the role of financial intermediaries is to connect lenders with borrowers. Boot et al. (2016) define a financial intermediary as an entity that intermediates between providers and users of financial capital. The existence of financial intermediaries can be attributed to the presence of market imperfections. As a result, financial intermediation theory has been developed to explain how financial intermediaries reduce these market imperfections. The existence and growth of online P2P lending can be explained by using financial intermediation theory.


addition online P2P lending platforms are not subject to the frictions between long term loans and short term deposits (Serrano-Cinca et al., 2015). As a result, online P2P lending providers have a cost advantage over traditional financial intermediaries, enabling online P2P lending platforms to offer relatively more attractive rates to their clients.

The existence of financial intermediaries can alternatively be explained by using transaction cost theory (Coase, 1960). Buckle and Thompson (1998) identify four types of transaction costs involved in direct lending: search costs, verification costs, monitoring costs, and enforcement costs. Financial intermediaries are able to lower these transaction costs by taking advantage of economies of scale, economies of scope, and expertise (Mishkin and Eakins, 2009). As a result it is more favorable for borrowers and lenders to make use of a financial intermediary than to incur the transaction costs of finding a counterparty directly. The existence and growth of online P2P lending can be explained using transaction cost theory. According to Benston and Smith (1976) the raison d'être for the financial intermediation industry is the existence of transaction costs. They argue that the role of financial intermediaries is to create specialized financial commodities whenever they can expect to sell them for prices that cover all costs of their production, both direct and opportunity costs. Online P2P lending platforms can be viewed as market makers for direct lending, creating specialized financial commodities and reducing transaction costs without taking positions in loans themselves. The online P2P lending platforms use economies of scale, economies of scope, and expertise to bring each of the four types of transaction costs down. Technological advancements in recent years have made it possible to further lower transaction costs, explaining the rapid growth of the online P2P lending industry.


can ex ante lead to adverse selection and ex post to moral hazard (Lin, 2009). Adverse selection in credit markets, as described by Stiglitz and Weiss (1981), is the phenomenon where informational frictions sort potential borrowers ex ante. A borrower, for example, has the incentive to overstate his creditworthiness to potential lenders. When the interest rate is set at a level that reflects the average quality of the borrowers, the borrowers most likely to drop out will be the low-credit-risk borrowers. The low-credit-risk borrowers are not willing to pay the higher interest rate or have better alternatives, leaving the high-risk borrowers (Boot et al., 2016). Moral hazard in credit markets refers to the phenomenon where borrowers increase their risk-taking behavior ex post. In the case of consumer lending this manifests itself in borrowers who have a greater incentive to default when given a higher interest rate (Karlan and Zinman, 2009). Leland and Pyle (1977) argue that the presence of information asymmetry and the resulting problems are the primary reason for the existence of financial intermediaries. Financial intermediaries such as banks are specialized in assessing the creditworthiness of borrowers ex ante and can use economies of scale when monitoring borrowers ex post to alleviate asymmetric information problems. In addition, financial intermediaries can ask borrowers to provide collateral against the loan, reducing losses in case of default.

The problems associated with asymmetric information are considered the most prominent downside for lenders in online P2P lending. The financial intermediary, the expert in dealing with credit risk, does not bear the credit risk in online P2P lending. Instead the credit risk is shifted to the lenders, who are often not specialized in determining the creditworthiness of borrowers. In addition, most online P2P lending platforms provide unsecured loans.

Theoretically, asymmetric information problems can be alleviated by financial intermediaries using informational economies of scale (Leland and Pyle, 1977) ex ante and delegated monitoring (Diamond, 1984) ex post. The delegated monitor in online P2P lending, the online P2P lending platform, can however not diversify its risk like traditional financial intermediaries do because the platform does not take positions in the loans. Additionally, monitoring becomes more


online environment and the fact that borrowers and lenders do not physically meet (Gefen et al., 2008). In practice online P2P lending platforms attempt to reduce information asymmetry in four ways (Emekter et al., 2015). First, all potential borrowers are screened, filtering out the ones that are perceived as too risky. Second, the typical size of each loan is small; in the case of Lending Club the maximum loan amount was $35,000 (this has recently been changed to $40,000). Small loan sizes ensure that potential individual defaults represent a relatively small loss. Third, online P2P lending platforms offer tools to create a diversified portfolio of loans that is matched to the risk appetite of the lender. Fourth, credit agencies are used to collect funds on behalf of the lender in case of failed payments.

Lenders can also try to alleviate information asymmetry by assessing the creditworthiness of borrowers themselves. Online P2P lending platforms facilitate these lenders and simultaneously improve transparency by providing (potential) lenders with additional information on top of the platform's own risk assessment. Both academics and practitioners have studied how the use of additional data can help assess the creditworthiness of borrowers. Prior literature on the determinants of default in online P2P lending makes a distinction between hard and soft information. Petersen (2004) defines hard information as information that is quantitative and easy to store and transmit. Additionally, the content of hard information is independent of the collection process. Hard information in online P2P lending includes standard quantitative financial information such as the interest rate, the debt-to-income ratio and the principal loan amount. Soft information, on the other hand, is defined as non-standard and often non-quantitative information. Soft information is often dependent on the collection process. Examples include sociodemographic factors and group intermediation.

Scholars have found that soft information can be used by lenders to reduce information asymmetry. Lin et al. (2013) find that social network relationships lower the probability of default. Freedman and Jin (2014) find that social ties can have a positive impact on the ability to get a loan funded. Iyer et al. (2009)


borrowers by using additional soft information. It is found that lenders are able to use soft information effectively and are able to predict default with up to 45% greater accuracy than based on credit score alone.

Prior studies show that soft information can be useful for lenders when used effectively. The non-standard nature of soft information, however, makes it more difficult to generalize findings on the topic. To be of value to a lender, soft information has to be generated and interpreted correctly. Hard information does not suffer from these drawbacks because of its quantitative and standardized nature. Emekter et al. (2015) find that credit grade, debt-to-income ratio, FICO score and revolving line utilization play an important role when determining the default probability at Lending Club. Serrano-Cinca et al. (2015) also study Lending Club and find that a combination of hard information like credit history and soft information such as loan purpose can explain the default probability between 2008 and 2011.

In sum, theories that have been developed to explain the existence of traditional financial intermediaries can be used to explain the existence and growth of online P2P lending. Online P2P lending platforms are able to promise relatively attractive rates by having a cost advantage over traditional intermediaries. Technological advancements in recent years have made it possible for online P2P lending platforms to further lower their costs, resulting in rapid growth. The cost advantage can be explained using the asset transformation theory and transaction cost theory. The most prominent downside for lenders in online P2P lending, default risk, can be explained using theories on the problems related to asymmetric information. Online P2P lending providers try to mitigate asymmetric information by screening the loan applicants ex ante and by making it possible for lenders to diversify their loans. In addition, clients are provided with additional information so that lenders can attempt to assess the creditworthiness of borrowers themselves. Prior literature recognizes the problems associated with information asymmetry in online P2P lending.


body of literature in two ways. First, determinants of default are identified ex post using data on loans that have recently reached maturity. Second, it is examined if the determinants of default that have been identified in one period can be used to improve the loan selection of lenders in a subsequent period.

3. Data


two different maturities, 36 and 60 months. For the purpose of this study only the loans with a maturity of 36 months are considered. Focusing on the loans with a 36 month maturity makes it possible to take an ex-post approach since these loans had reached maturity at the time this study was conducted. Lending Club issued a total of 53,367 loans in 2012, of which 43,470 had a maturity of 36 months. The raw dataset contains more than one hundred variables per loan. Lending Club supplies this many variables to provide (potential) lenders with as much information as possible in order to try to alleviate the information asymmetry problems that arise in online P2P lending. This study focuses on the factors that can potentially help to predict the probability of default ex ante by taking an ex-post approach. This implies that variables that lenders could observe at the time the loan was being funded were kept in the dataset, while variables relating to characteristics after the loan was funded were removed from the dataset, except for the terminal outcome of the loan. Variables that obviously have no


the least risk and G5 corresponds to a loan with the most risk as perceived by Lending Club. The distribution of loans per grade is presented in Figure 1. It can be seen that the distribution is skewed to the higher grades, with grade B being the grade with the largest number of loans. Table 2 provides a more in-depth look into the distribution by showing the number of loans as well as the total dollar amount of loans issued per subgrade. It can be seen that subgrade B3 has the largest number of loans as well as the largest dollar amount of loans issued. One can also observe that the riskier grades are substantially less populated, both in number of loans and in total dollar amount of loans issued. The interest rate that one receives or pays on a loan is determined by the rate corresponding to the subgrade assigned to the loan at the time of application: the higher the risk as perceived by Lending Club, the higher the interest rate. The interest rate corresponding to a subgrade is fixed at a particular point in time but the rates can fluctuate over time. To illustrate, the interest rate corresponding to a B3 loan was 11.71% on the 1st of January and 12.12% on the 1st of December. Figure 2 shows the average interest


F and G) actually had a lower default rate than grades D and E. While this seems counter-intuitive, it has to be pointed out that the lowest credit grades represent a relatively small percentage of the total number of loans, as shown in Figure 1 and Table 2. The overall default rate of 36 month loans issued in 2012 was approximately 13.59%. Summary statistics of the continuous variables can be found in Table 4. It can be observed that the average credit grade was a B grade and the average interest rate was approximately 12.63% for 36 month loans in the 2012 period.

The second category of variables relates to the characteristics of the loan itself and includes the loan purpose and the loan amount. Borrowers who apply for a loan have to specify the purpose of the loan in their loan application. Lending Club offers the choice between 13 different purposes, which are listed in Table 3. The loan distribution among purposes shows that the majority of loans were issued either for credit card or general debt consolidation. Table 4 shows that the variable loan amount varied from the required minimum of $1,000 to the maximum of $35,000. The average loan amount for 36 month loans in the 2012 period was $11,681.60.

The third category of variables describes the borrower characteristics and contains the annual income, the housing situation and the number of years a borrower has been working at their current employer. Table 5 shows the


rent a house, the remaining part reported to own a house and a negligible part of the loan applicants either had no house or reported to have an alternative housing situation. The annual income variable consists of the self-reported annual income at the time of the loan application and is reported in units of one thousand dollars. Approximately 55% of all incomes and reported employers were subsequently verified by Lending Club. The distribution of the employment length variable is shown in Table 6. It can be seen that more than a quarter of all borrowers reported to work for their current employer for ten years or more.

The fourth category of variables contains information about the credit history of the borrowers. The delinquency variable counts the number of incidences in which a borrower was 30 or more days past due, and an inquiry is defined as an inquiry by creditors. Table 4 displays the summary statistics of the credit history variables.


months prior to the application. Furthermore, the average borrower had 0.03 public records, 10.5 open credit lines and used around 57% of all available revolving credit. The fifth and final category contains three ratios that relate to the indebtedness of each borrower. The variables loan amount to income and annual installment to income measure the impact of the loan relative to the income of the borrower. The loan amount accounted for an average of 19.8% of the borrower's annual income and the yearly owed payments accounted for an average of 8% of the borrower's annual income. The difference between these two measures of indebtedness is that the annual installment to income ratio takes into account the interest payments while the loan amount to income ratio does not. The annual installment to income ratio will ceteris paribus be higher for low grade loans since their interest rates will be higher. The final indebtedness variable is the debt-to-income ratio as calculated by Lending Club. It is calculated by taking the borrower's total monthly debt obligations minus mortgage expenses and the requested loan, divided by the self-reported monthly income of the borrower. The average borrower had a calculated debt-to-income ratio of 16.4%.
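As a rough illustration of the two loan-impact ratios described above, the sketch below computes them for a single hypothetical loan. It assumes a standard fixed-payment amortization formula for the monthly installment, which the text does not specify, and all function names and input values are illustrative rather than taken from the dataset.

```python
def monthly_installment(principal, annual_rate, months=36):
    """Fixed monthly payment of a fully amortizing loan (assumed formula)."""
    r = annual_rate / 12.0  # monthly interest rate
    if r == 0:
        return principal / months
    return principal * r / (1.0 - (1.0 + r) ** -months)


def indebtedness_ratios(loan_amount, annual_rate, annual_income, months=36):
    """Return (loan amount to income, annual installment to income)."""
    annual_installment = 12.0 * monthly_installment(loan_amount, annual_rate, months)
    return loan_amount / annual_income, annual_installment / annual_income


# Hypothetical borrower: $12,000 loan at 12% on a $60,000 annual income.
lti, iti = indebtedness_ratios(12_000, 0.12, 60_000)
print(f"loan amount to income: {lti:.3f}")
print(f"annual installment to income: {iti:.3f}")
```

Holding the loan amount and income fixed, raising the interest rate leaves the first ratio unchanged and increases the second, which is the distinction the text draws between the two measures.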

The exploration of the variables in the dataset helps to understand the selection criteria of Lending Club when deciding whether to approve a loan application. The borrower assessment by Lending Club shows a significant skew to higher grade loans. A possible explanation is that risky applications are more likely to be rejected than to be put into a risky grade category. The reasoning behind rejecting rather than assigning a higher rate to a borrower is that giving risky borrowers a higher interest rate increases the likelihood of adverse


or general debt consolidation. This can be attributed to the fact that Lending Club is a company operating in the United States focusing on personal loans. The typical American carries credit card debt, and many Americans carry additional debt in the form of car and student loans (www.bloomberg.com). The rates on these loans are often higher than the rates Lending Club can offer, explaining the high percentage of loans in the credit card and debt consolidation categories. When looking at the borrower characteristics category it stands out that more than a quarter of all borrowers reported to work for their current employer for ten years or more, implying that Lending Club values employment length when deciding whether to approve a loan application. Moreover, it appears that Lending Club values credit history length when deciding whether to approve a loan application, given the average of more than 14 years of credit history. Table 7 is shown in appendix A and contains the correlation coefficients between all the variables in the dataset. To avoid multicollinearity problems in the regressions in the subsequent section, variables with a high correlation should not be included in the same regression. The table shows, as expected, a high correlation between grade, subgrade and the interest rate. High correlations are also observed between loan amount, loan amount to income, and annual installment to income. The high correlations between these pairs of variables are expected and intuitive since they measure similar characteristics. High correlations are also observed between the credit card and debt consolidation dummy variables as well as between the mortgage and rent dummy variables. The high correlation between these dummies will not be problematic as long as the appropriate base category is chosen. The correlation coefficients between the remaining variables appear to be low enough to avoid multicollinearity problems in the regressions in the next section.

4. Empirical results


follows. First, the mean values of all variables are compared between the defaulted and non-defaulted groups of loans. Second, the variables which appear to be significantly different between the two groups are used as variables in a binary logistic regression to further examine their effect on the probability of default.

The second part of this section concerns the second research question and uses the determinants of default that have been identified for the 2012 sample to calculate a default score. The default score is then used to rank all 36 month loans issued in the first two months of 2013 within each subgrade. Two groups are then created: the first group consists of the ten percent of loans that have the lowest default score within each subgrade and the second group consists of the remaining ninety percent of loans, which have the highest default score within each subgrade. Subsequently it is tested if the default rate of the first group is significantly lower than the default rate of the second group. If the default rate of the first group is significantly lower, it is shown that the determinants of default identified in one period can be used to improve the loan selection in a subsequent period by only considering the loans that have the lowest default score within each subgrade.
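The grouping procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionary keys, the rounding rule for the ten percent cutoff, and the use of a pooled two-proportion z-statistic to compare the default rates are all choices made here for illustration, since the text does not state which test is applied.

```python
import math
from collections import defaultdict


def split_by_score(loans, share=0.10):
    """Split loans into the lowest-score share per subgrade and the rest.

    Each loan is a dict with the hypothetical keys 'subgrade', 'score'
    (the default score) and 'defaulted' (True/False).
    """
    by_subgrade = defaultdict(list)
    for loan in loans:
        by_subgrade[loan["subgrade"]].append(loan)

    selected, rest = [], []
    for group in by_subgrade.values():
        ranked = sorted(group, key=lambda loan: loan["score"])
        cutoff = math.ceil(share * len(ranked))  # rounding rule is an assumption
        selected.extend(ranked[:cutoff])
        rest.extend(ranked[cutoff:])
    return selected, rest


def default_rate(loans):
    """Share of loans in the list that defaulted."""
    return sum(loan["defaulted"] for loan in loans) / len(loans)


def two_proportion_z(selected, rest):
    """z-statistic for H0: equal default rates (pooled standard error)."""
    n1, n2 = len(selected), len(rest)
    p1, p2 = default_rate(selected), default_rate(rest)
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# Toy data: 20 loans in each of two subgrades; in this made-up example the
# loans with the highest scores are the ones that default.
toy = [
    {"subgrade": sg, "score": i / 20, "defaulted": i >= 16}
    for sg in ("B3", "C1")
    for i in range(20)
]
selected, rest = split_by_score(toy)
print(len(selected), len(rest))
print(default_rate(selected), default_rate(rest))
print(two_proportion_z(selected, rest))
```

A negative z-statistic indicates a lower default rate in the selected group; whether it is significant depends on the chosen one-sided critical value.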

4.1 Determinants of default

The largest concern for a lender is whether a loan he or she invested in will default. In this section the factors that might influence the probability of default will be examined. The section is structured as follows. First, the factors that might influence the probability of default are compared between the defaulted and non-defaulted groups in the sample. Second, a logistic regression is employed to further examine the effect of each factor on the probability of default of a loan. Table 8 shows a comparison of the continuous variables between the defaulted and non-defaulted groups of loans. A defaulted loan is defined as a loan that has not been fully reimbursed and a non-defaulted loan is defined as a loan that has been fully reimbursed. Of the 43,470 loans in the sample, 5,906 loans are classified as defaulted, which translates to a total default rate of


whether there is a significant difference between the defaulted and non-defaulted groups of loans both a parametric and a non-parametric test are employed. Column 6 shows the values of the parametric test, the t-values of an independent sample t-test. Because a t-test assumes normality in the distribution of the variables, the results of an additional, non-parametric test are presented in column 7. The chi-squared values are obtained from the Kruskal-Wallis test and are adjusted for ties where necessary. The values of the two tests show similar results in terms of significance. All continuous variables except loan amount, number of delinquencies in the past 2 years, months since last delinquency and open accounts are significantly different between the groups at the 1% level. Moreover, all differences have the expected sign. Table 9 shows the comparison of the categorical variables between the defaulted and non-defaulted groups of loans. Columns two and three show the proportions and number of loans that defaulted per purpose or housing situation category. The fourth column shows the Pearson chi-squared statistic per category. It can be seen that the purpose categories credit card, home improvement and major purchase show a significantly lower default rate at the 1% level while the category car shows a significantly lower default rate at the 5% level. The purpose categories debt consolidation, other, and small business show a significantly higher default rate


higher default rate (p<0.1). The purpose categories house, moving, vacation and wedding have default rates that are not significantly different from the sample default rate. The housing categories show a significantly higher default rate for the rent category and a significantly lower default rate for the mortgage category (p<0.01). Furthermore, the default rate for the housing situation category own is significantly higher at the 5% level. The remaining two categories other and none show no significant differences but it has to be remembered that these categories represent a negligible part of the total sample.
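For readers who want to reproduce this kind of per-category comparison, the sketch below computes a Pearson chi-squared statistic for a 2x2 table of category membership against loan outcome. Comparing one category against all other loans, and omitting a continuity correction, are assumptions made here; the counts are made up and do not come from Table 9.

```python
def pearson_chi2_2x2(defaults_in, total_in, defaults_out, total_out):
    """Pearson chi-squared statistic (1 d.o.f., no continuity correction)."""
    observed = [
        [defaults_in, total_in - defaults_in],
        [defaults_out, total_out - defaults_out],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of category and outcome.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Made-up counts: 90 of 1,000 loans default in one category versus
# 1,400 of 9,000 loans in all other categories.
stat = pearson_chi2_2x2(90, 1_000, 1_400, 9_000)
print(f"chi-squared = {stat:.2f}")
```

With one degree of freedom, a statistic above roughly 6.63 corresponds to significance at the 1% level and one above roughly 3.84 to significance at the 5% level.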

To summarize, Tables 8 and 9 show significant differences between defaulted and non-defaulted loans in the 2012 sample. The defaulted and non-defaulted groups of loans show significant differences in certain continuous variables and the proportion of defaulted loans appears to be significantly higher or lower for certain categories of purposes and housing situations. To further examine the effects of the variables that have been shown to be significantly different between groups, a binary logistic regression is employed. The binary logistic regression technique is chosen because the dependent variable under


been proposed for use in the analysis of a dichotomous dependent variable but the logistic distribution has two primary advantages. First, the logistic function is extremely flexible and easy to use from a mathematical point of view. Second, it lends itself to an economically meaningful interpretation. Assume that $d_i$ is an unobserved continuous number representing the likelihood of default for loan $i$. The following transformation is made to convert this number into a number between zero and one:

$$p_i = \frac{1}{1 + e^{-d_i}} \qquad (1)$$

where $p_i$ is the probability of default for loan $i$ and $d_i$ is given by the equation:

$$d_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \cdots + b_n x_{in} + \varepsilon_i \qquad (2)$$

where $x_{ij}$ is the value of independent variable $j$ for loan $i$ and $n$ represents the number of covariates.
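To make the transformation concrete, the following sketch evaluates the standard logistic function for a linear index built from an intercept and two covariates. The coefficient values and covariates are invented for illustration and are not estimates from this study; the error term is omitted because only the fitted index is evaluated.

```python
import math


def default_probability(coefficients, covariates):
    """Map a linear index d_i to a probability with the logistic function.

    coefficients: [b0, b1, ..., bn]; covariates: [x_i1, ..., x_in].
    """
    d = coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], covariates))
    return 1.0 / (1.0 + math.exp(-d))


# Illustrative numbers only, not estimates from the thesis.
b = [-2.5, 0.08, -0.004]  # intercept, interest rate (%), income ($1,000s)
x = [12.6, 60.0]
p = default_probability(b, x)
print(round(p, 3))
```

A linear index of zero maps to a probability of one half, and larger values of the index map monotonically to probabilities closer to one.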

Table 10 shows the results of seven different models that have been estimated. Model one uses the subgrade as explanatory variable and model two uses the interest rate as explanatory variable. It has to be remembered that the subgrade and interest rate are highly correlated. As a result models one and two appear to be similar: both the subgrade and the interest rate are significant at the 1% level. The fit of the model, as measured by Nagelkerke's R², is improved by 0.001 when using the interest rate instead of the subgrade. This can be attributed to the fact that the interest rate corresponding to a subgrade can fluctuate over time. In model three, the loan characteristics are added to the subgrade by incorporating twelve loan purpose dummy variables with the category credit card being the base category. The third model shows that the purpose categories debt consolidation, house, medical, other, renewable energy, small business, and vacation appear to be significantly riskier than the credit card category at the 1% level. The category moving appears to be significantly riskier than the credit card category at the 5% level. The remaining purposes car, home improvement, major purchase, and wedding show no significant differences in default probability compared to the credit card category. In the fourth model the borrower characteristics are added in addition to the subgrade by including the variables

housing situation with the category mortgage being the base category. The fourth model shows that borrowers who own or rent a house appear to have a higher probability of default compared to borrowers who have a mortgage. Furthermore, it appears that annual income significantly lowers the probability of default at the 1% level, while employment length appears to have no significant effect on the probability of default. In the fifth model the credit history is added in addition to the subgrade. The fifth model shows significant results for the number of inquiries in the past 6 months (p<0.01), credit history length (p<0.01), public records (p<0.05), and revolving balance utilization (p<0.05). Furthermore, these

model which includes the variables of all 5 categories. Table 10 shows that the addition of each of the categories in models 3 to 6 improved the fit of the model. Additionally, the variables in each category do not appear to be nested in the subgrade, as can be seen from the significant chi-squared statistics of the likelihood ratio test, which compares each model to model 1. The full model appears to fit the data best, as indicated by the highest Nagelkerke R² value and chi-squared value of the likelihood ratio test. The improvement in fit from adding more covariates to the subgrade has, however, no value to lenders if the predictive ability of the model is not improved.
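The two fit measures mentioned above can be computed from the log-likelihoods of the fitted models. The sketch below uses the textbook definitions of the likelihood ratio statistic and Nagelkerke's R²; the function names and the example log-likelihood values are hypothetical and do not correspond to the values in table 10.

```python
import math


def likelihood_ratio_statistic(ll_restricted, ll_full):
    """Chi-squared statistic comparing a restricted model to a fuller model.

    Both arguments are maximized log-likelihoods; the restricted model
    would be model 1 (subgrade only) in the comparison described above.
    """
    return 2.0 * (ll_full - ll_restricted)


def nagelkerke_r2(ll_null, ll_model, n_obs):
    """Nagelkerke's R-squared from the intercept-only and fitted models."""
    # Cox and Snell R-squared, expressed in log-likelihoods
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n_obs)
    # Upper bound of Cox and Snell R-squared, used for rescaling to [0, 1]
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n_obs)
    return cox_snell / max_cox_snell


# Hypothetical log-likelihoods for a sample of 1,000 loans
print(likelihood_ratio_statistic(-380.0, -370.0))
print(nagelkerke_r2(-400.0, -370.0, 1000))
```

The likelihood ratio statistic is compared to a chi-squared distribution with degrees of freedom equal to the number of added covariates; that step is left out of the sketch.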

To test the predictive accuracy of the model, the c-statistic (calculated as the area under the receiver operating characteristic curve) has been calculated and is displayed in the bottom row of table 10. The c-statistic measures the ability of the model to discriminate between defaulted and non-defaulted loans (Hosmer and Lemeshow, 2000). The c-statistic of the seven models ranges between 0.618 and 0.655, which indicates that all models do a poor job at discriminating between defaulted and non-defaulted loans (values >0.7 indicate a good predictive ability). The c-statistic is, however, the highest for model seven, which indicates that the inclusion of the additional covariates improved the predictive ability of the model. Models 3 to 7 are replicated and shown in table 11 in appendix B using the interest rate instead of the subgrade and the loan amount to income ratio instead of the installment to income ratio; the results appear to be similar.
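The c-statistic can be interpreted as the share of (defaulted, non-defaulted) loan pairs in which the defaulted loan received the higher predicted probability, with ties counted as one half. The sketch below computes it by direct pairwise comparison. This is an illustration of the definition only, not the routine used to produce table 10, and the function name and example values are hypothetical.

```python
def c_statistic(scores, defaulted):
    """Area under the ROC curve via pairwise comparison.

    scores are predicted default probabilities (or any ranking score);
    defaulted holds 1 for defaulted loans and 0 otherwise.
    """
    pos = [s for s, y in zip(scores, defaulted) if y == 1]
    neg = [s for s, y in zip(scores, defaulted) if y == 0]
    if not pos or not neg:
        raise ValueError("both defaulted and non-defaulted loans are required")
    concordant = 0.0
    for s_default in pos:
        for s_good in neg:
            if s_default > s_good:
                concordant += 1.0   # defaulted loan ranked as riskier
            elif s_default == s_good:
                concordant += 0.5   # ties count as half
    return concordant / (len(pos) * len(neg))


# Hypothetical example: three of the four pairs are ranked correctly
print(c_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

A value of 0.5 corresponds to random ranking and a value of 1 to perfect discrimination. The pairwise loop is quadratic in the sample size, so a rank-based computation would be preferable for samples as large as the ones used here.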

In the first research question it is asked which loan/borrower

in model 7 the average marginal effects have been calculated and are presented in table 12. It can be seen that dropping a subgrade increases the probability of default significantly by approximately 0.71% (p<0.01). Furthermore, significant increases in the probability of default can be observed between the different purpose categories. The renewable energy category, for example, increases the probability of default by approximately 14.39% when compared to the credit card category (p<0.01). When looking at the average marginal effects of the housing situation category, it can be seen that the category rent increases the probability of default by approximately 1.54% when compared to the mortgage category (p<0.01). The reported annual income of a borrower has a negative effect on the probability of default: a ten thousand dollar increase in the reported annual income of a borrower significantly decreases the probability of default by 0.7% (p<0.01). When looking at the credit history variables, it can be observed that an inquiry in the past 6 months increases the default probability by 1.48% (p<0.01), a public record increases the default probability by 1.8% (p<0.05), and a ten percent increase in the revolving balance utilization increases the probability of default by 0.27% (p<0.01). The average marginal effects in the indebtedness category reveal that a ten percent increase in the debt-to-income ratio of a borrower

the default probability (p<0.01) and a ten percent increase in the installment to income ratio increases the default probability by 2.75% (p<0.01).
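For a continuous covariate in a logistic model, the marginal effect for a single loan equals the coefficient multiplied by p(1 - p), and the average marginal effect is the mean of this quantity over all loans. The sketch below shows this textbook calculation. It is not stated here how the statistical software computed the values in table 12 (for dummy variables a discrete difference may have been used instead), so the sketch is an assumption, and the function name and input values are hypothetical.

```python
import math


def average_marginal_effect(coefficient, linear_indices):
    """Average marginal effect of a continuous covariate in a logit model.

    coefficient is b_j; linear_indices holds the fitted index d_i of
    every loan in the sample (equation (2) without the error term).
    """
    total = 0.0
    for d in linear_indices:
        p = 1.0 / (1.0 + math.exp(-d))       # predicted default probability
        total += coefficient * p * (1.0 - p)  # derivative of p with respect to x_j
    return total / len(linear_indices)


# Hypothetical example: coefficient 0.8 and three loans
print(average_marginal_effect(0.8, [-2.5, -2.0, -1.5]))
```

Because p(1 - p) is at most 0.25, the average marginal effect is always smaller in absolute value than one quarter of the coefficient.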

The analysis up to this point has focused on identifying and analyzing the determinants of default in one period. Section 4.2 examines whether the significant determinants of default identified in model 7 can be used to improve the loan selection in a subsequent period, in this case the loans issued in the first two months of 2013.

4.2 Improving loan selection

To examine if a lender is able to improve his/her loan selection ability by using the determinants of default identified in an earlier period, the following methodology is used. First, publicly available data on all 36 month loans issued in the first two months of 2013 is collected from the Lending Club website and compared to the data from the loans issued in 2012. Second, the coefficients from model 7 in the previous section are used to calculate a default score, which makes it possible to rank the loans within each subgrade. Third, the default rates of the loans in the highest ranking decile within each subgrade are compared to the default rates of the ninety percent of loans that rank the lowest within each subgrade. If the default rate of the group of loans that rank the highest is significantly lower than that of the group of loans that rank the lowest, it is shown that the loan selection can be improved by only considering the top ranking loans within each subgrade. The 2013 sample consists of 11,548 loans, of which 1,474 eventually defaulted, resulting in a default rate of approximately 12.76%. The

sample shows an average of 0.148 public records while the 2012 sample shows an average of 0.029 public records. The values of the remaining averages in table 13 appear to be comparable to the values in the 2012 sample. Table 14 shows the distribution of loans per loan purpose category for the 2013 sample. It can be observed that, just as in the 2012 sample, the majority of loans are concentrated in the categories credit card and debt consolidation, containing 26.97% and 59.63% of the number of loans respectively.

Table 15 shows an additional similarity between the samples. When looking at the loan distribution by employment length, it can be seen that a significant part of the loan applicants reported having worked for their current employer for 10 years or more, 31.42% in the 2013 sample compared to 27.2% for the 2012 sample. When looking at the distribution of loans per housing situation category in table 16, it can be observed that a higher percentage of loan applicants had a mortgage in the 2013 sample, 51.02% compared to 43.75% in the 2012 sample. The reverse is true for the percentage of loan applicants that rented a house, 41.78% in the 2013 sample compared to 47.82% in the 2012 sample. In addition, the housing situation categories other and none are not represented in the 2013

default that have been identified for the 2012 sample can be used to improve the loan selection for the 2013 sample more than when selecting on credit score alone. In order to rank the loans in the 2013 sample in terms of predicted default probability, the coefficients of model 7 are used to calculate a default score for each loan within subgrades. The coefficients of the variables that have been shown to be significant at least at the 5% level in model 7 are included in the scoring formula, yielding the following expression.

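$$
\begin{aligned}
\text{Default score} ={} & 0.245 \cdot \text{purpose: debt consolidation} + 0.252 \cdot \text{purpose: home improvement} \\
& + 0.581 \cdot \text{purpose: house} + 0.558 \cdot \text{purpose: medical} + 0.41 \cdot \text{purpose: moving} \\
& + 0.487 \cdot \text{purpose: other} + 0.949 \cdot \text{purpose: renewable energy} \\
& + 0.839 \cdot \text{purpose: small business} + 0.578 \cdot \text{purpose: vacation} \\
& + 0.136 \cdot \text{house: rent} - 0.006 \cdot \text{annual income} \\
& + 0.131 \cdot \text{nr of inquiries in the past 6 months} + 0.159 \cdot \text{nr of public records} \\
& + 0.238 \cdot \text{revolving balance utilization} + 0.823 \cdot \text{debt to income ratio} \\
& + 2.432 \cdot \text{installment to income ratio} \qquad (3)
\end{aligned}
$$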
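Equation (3) can be applied to a single loan as shown in the sketch below. The coefficients are copied from equation (3); the dictionary keys, the function name, and the example loan are hypothetical. The measurement units of the inputs are an assumption as well: annual income is taken to be in thousands of dollars and the three ratios are taken to be fractions, which should be checked against the variable definitions before use.

```python
# Coefficients from equation (3); base categories (credit card, mortgage)
# and non-significant categories contribute zero.
PURPOSE_COEF = {
    "debt_consolidation": 0.245, "home_improvement": 0.252, "house": 0.581,
    "medical": 0.558, "moving": 0.41, "other": 0.487,
    "renewable_energy": 0.949, "small_business": 0.839, "vacation": 0.578,
}
HOUSING_COEF = {"rent": 0.136}
NUMERIC_COEF = {
    "annual_income": -0.006,          # assumed unit: thousands of dollars
    "inquiries_6m": 0.131,
    "public_records": 0.159,
    "revolving_utilization": 0.238,   # assumed unit: fraction
    "debt_to_income": 0.823,          # assumed unit: fraction
    "installment_to_income": 2.432,   # assumed unit: fraction
}


def default_score(loan):
    """Return the default score of equation (3) for one loan (a dict)."""
    score = PURPOSE_COEF.get(loan["purpose"], 0.0)
    score += HOUSING_COEF.get(loan["housing"], 0.0)
    for name, coef in NUMERIC_COEF.items():
        score += coef * loan[name]
    return score


# Hypothetical loan
example = {
    "purpose": "credit_card", "housing": "mortgage", "annual_income": 60,
    "inquiries_6m": 1, "public_records": 0, "revolving_utilization": 0.5,
    "debt_to_income": 0.15, "installment_to_income": 0.08,
}
print(round(default_score(example), 5))
```

The score is only used to rank loans within a subgrade, so its absolute level has no interpretation; a lower score indicates a lower predicted default risk.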

Table 17 shows the average calculated default score per subgrade. It can be observed that the average default score generally becomes higher when moving down to riskier subgrades. This indicates that loans which Lending Club perceives as more risky are also on average classified as more risky by the calculated default score. To test if lenders can improve their loan selection by using the calculated default score, the loans within each subgrade are ranked from a low default score to a high default score. The loans in each subgrade are then divided into two groups. The first group is composed of the loans in the decile with the lowest default score within each subgrade; this group will be referred to as the top 10 group. The second group consists of the ninety percent of the loans with the highest

18 shows the default rates per subgrade for the 2013 sample. Column 2 shows the default rates per subgrade for the whole sample, column 3 shows the default rates for the top 10 group, and column 4 shows the default rates for the bottom 90 group per subgrade. It can be observed that the top 10 group outperforms the bottom 90 group in terms of default rates in 23 of the 28 subgrades. The overall default rate in the top 10 group is 9.59% while the overall default rate of the bottom 90 group is 13.12%. To test whether the default rate of the top 10 group is significantly lower than that of the bottom 90 group, both a t-test and a Kruskal-Wallis test are employed. Both tests are significant at the 1% level, yielding a t-value of 3.419 and a chi-squared statistic of 11.679 respectively.
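The grouping into a top 10 group and a bottom 90 group can be sketched as follows. The field names and function names are hypothetical, and the handling of subgrades whose size is not a multiple of ten (rounding the decile size down) is an assumption, since the exact rule used for the 2013 sample is not stated.

```python
from collections import defaultdict


def split_top_decile(loans):
    """Split loans into a top 10 and a bottom 90 group within each subgrade.

    Each loan is a dict with the keys 'subgrade', 'score' (default score)
    and 'defaulted' (1 or 0).
    """
    by_subgrade = defaultdict(list)
    for loan in loans:
        by_subgrade[loan["subgrade"]].append(loan)

    top, bottom = [], []
    for group in by_subgrade.values():
        ranked = sorted(group, key=lambda loan: loan["score"])  # lowest score first
        cut = len(ranked) // 10   # assumption: decile size rounded down
        top.extend(ranked[:cut])
        bottom.extend(ranked[cut:])
    return top, bottom


def default_rate(loans):
    """Share of defaulted loans in a non-empty group."""
    if not loans:
        raise ValueError("default rate is undefined for an empty group")
    return sum(loan["defaulted"] for loan in loans) / len(loans)
```

With the two groups in hand, the default rates can be compared per subgrade or over the whole sample, after which a significance test such as the ones mentioned above can be applied.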

The second research question asks if the determinants of default identified in one period can be used to improve the selection of loans in a subsequent

5. Conclusion

Academics and practitioners have recognized the problems associated with asymmetric information in online P2P lending. This study aimed to identify specific determinants of default by using publicly available loan data on 36 month loans issued in 2012. It is found that a number of determinants significantly help to determine the default risk in addition to the provided credit grade. The determinants of default that have been shown to have a significant effect on the default probability in addition to the credit grade for 36 month loans in the 2012 period are the loan purpose, housing situation, annual income, number of inquiries in the past 6 months, number of public records, revolving balance utilization, debt-to-income ratio, and installment to income ratio. Furthermore, it has been shown that the loan selection for 36 month loans issued in the first two months of 2013 could be significantly improved by ranking loans within each subgrade based on a score calculated using the determinants of default identified in 2012.

The results of this study can have implications for both practitioners and academics. The problem of asymmetric information is widely recognized in prior literature, and examining which information can reduce the problems associated with asymmetric information is interesting from an academic point of view. The results additionally have value for practitioners. Lenders can use the results of this study to more consciously select the loans they choose to invest in.

Additionally, potential borrowers can use the results of this study to realize which signals they give to potential lenders in terms of perceived creditworthiness. Online P2P lending providers can also consider the results of this study by looking at the determinants of default that have been shown to be significant in this study and comparing them to the determinants that are used in their own credit risk assessment.

this study only considered the online P2P lending platform Lending Club. Because Lending Club screens potential borrowers ex-ante, it could be that the population of borrowers at Lending Club is different from the population of borrowers at other online P2P lending platforms. The fact that Lending Club only operates in the United States further limits the generalizability of the results. Third, only consumer loans were considered in this study. An interesting topic for future research would be to identify determinants of default in online P2P business lending. Fourth, it has been shown that the determinants of default identified in one period can help to improve the loan selection in a subsequent period; it is, however, not clear how robust these results are over time. When more loans reach maturity, further research can be done to examine how robust the determinants of default identified in 2012 are over time. Fifth, this study takes a binary approach with regard to loan defaults. In practice, loans that have not been fully reimbursed might have been partially reimbursed. This was not considered in this study but might be interesting to include in future research. Sixth, the determinants of default identified in this study helped to explain the default risk in addition to the credit grade. It is, however, reasonable to assume that Lending Club changes the way in which the credit grades are calculated over time. If the way the credit grade is calculated changes, the applicability of the results of this study would diminish.

Given the results and limitations of this study, future research on asymmetric information in online P2P lending is highly encouraged. The fact that the online P2P lending industry is still growing makes further research on the topic increasingly important and valuable for both academics and practitioners. The way in which online P2P lending platforms will address the problems

References

Akerlof, G., 1970. The market for lemons: quality uncertainty and the market mechanism. Quarterly Journal of Economics 84, 488–500.

Benston, G., Smith, C., 1976. A transactions cost approach to the theory of financial intermediation. The Journal of Finance 31, 215–231.

Boot, A., Greenbaum, S., Thakor, A., 2016. Contemporary financial intermediation. Elsevier, Amsterdam.

Buckle, M., Thompson, J., 1998. The UK financial system: theory and practice. Manchester University Press, Manchester.

Coase, R., 1960. The problem of social cost. Journal of Law and Economics 3, 1–23.

Diamond, D., 1984. Financial Intermediation and Delegated Monitoring. Review of Economic Studies 51, 393–414.

Emekter, R., Tu, Y., Jirasakuldech, B., Lu, M., 2015. Evaluating credit risk and loan performance in online peer-to-peer (P2P) lending. Applied Economics 47, 54–70.

Freedman, S., Jin, G., 2014. The signaling value of online social networks: lessons from peer-to-peer lending. Unpublished working paper. National Bureau of Economic Research, Cambridge, MA.

Gefen, D., Benbasat, I., Pavlou, P., 2008. A research agenda for trust in online environments. Journal of Management Information Systems 24, 275–286.

Gurley, J., Shaw, E., 1960. Money in a theory of finance. Brookings Institution, Washington D.C.

Hertzberg, A., Liberman, A., Paravisini, D., 2016. Adverse selection on maturity: evidence from online consumer credit. Unpublished working paper. Columbia Business School, New York.

Hosmer, D., Lemeshow, S., 2000. Applied logistic regression. John Wiley, New York.

Iyer, R., Khwaja, A., Luttmer, E., Shue, K., 2009. Screening in new credit markets: can individual lenders infer borrower creditworthiness in peer-to-peer lending? Unpublished working paper. National Bureau of Economic Research, Cambridge, MA.

Karlan, D., Zinman, J., 2009. Observing unobservables: identifying information asymmetries with a consumer credit field experiment. Econometrica 77, 1993–2008.

Leland, H., Pyle, D., 1977. Informational asymmetries, financial structure, and financial intermediation. The Journal of Finance 32, 371–387.

Lin, M., Prabhala, N., Viswanathan, S., 2013. Judging borrowers by the company they keep: friendship networks and information asymmetry in online peer-to-peer lending. Management Science 59, 17–35.

Petersen, M., 2004. Information: hard and soft. Unpublished working paper. National Bureau of Economic Research, Cambridge, MA.

Stiglitz, J., Weiss, A., 1981. Credit rationing in markets with imperfect information. American Economic Review 71, 393–419.

Steverman, M., 2017. Americans can’t help themselves from borrowing more on credit cards. Bloomberg. Available at:

Appendix A

Table 7, correlation matrix showing the correlation coefficients between all variables for 36 month loans issued in 2012

Appendix B

This table shows the results of the seven binary logistic regression models