University of Amsterdam, Amsterdam Business School MSc Finance, Quantitative Finance track

Master’s Thesis

The Impact of Artificial Intelligence on Bank Performance

Beurgaud Sara 13268082 July 2021

Thesis Supervisor: Felipe Dutra Calainho

Statement of Originality

This document is written by Student Sara Beurgaud who declares to take full responsibility for the contents of this document.

I declare that the text and the work presented in this document are original and that no sources other than those mentioned in the text and its references have been used in creating it.

The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.

Acknowledgments

I would like to thank Felipe Dutra Calainho for his invaluable help before and during the supervision of my thesis. As busy as he was, he made himself regularly available to answer my questions and address any doubts I had throughout the process. He suggested the idea of using patents to construct a proxy variable for the use of artificial intelligence even before becoming my supervisor. This master’s thesis would never have been possible without Felipe’s insight and support.

The Impact of Artificial Intelligence on Bank Performance

Sara Beurgaud

Abstract

This empirical research focuses on the impact of the use of artificial intelligence (AI) on the performance of banks in the United States of America over the period 2011 to 2019. The use of AI is proxied by the bank’s patent applications related to AI. The results show that the use of AI has a positive impact on the dependent variables size, net income, EBIT, output, number of employees, salary expenses, furniture expenses and operating expenses. The regression of EBIT on AI yields a positive coefficient of 153.7 million dollars, which is statistically significant at the 1% level. This regression has a high R-squared of 0.955 when the control variables size, industry, ownership, leverage, age, operating expenses and book-to-market ratio are included. When the same model is run with entity fixed effects, the AI coefficient remains positive but is no longer statistically significant. This master’s thesis therefore gives an indication that the use of AI increases the performance of banks through higher production over the period 2011-2019 in the US. Due to the limitations of the model and the data, further studies are needed to conclude with certainty that AI improves bank performance. Future research should investigate the impact of AI on performance by covering a wider study period, distinguishing the different subfields and applications of AI, improving the AI proxy or finding an instrumental variable, and studying the performance of other institutional investors.

Introduction

The artificial intelligence market is expected to grow at a compound annual rate of 40.2% (Grand View Research, 2021), as there are ever more incentives to collect and exploit new data. Artificial intelligence (AI) refers to the simulation of human intelligence in machines, such as the ability to learn and to solve problems (Frankenfield Jake, 2021). There is no consensus on what AI entails, but an AI technology should be able to reason and to integrate knowledge: it should not only recognise information but also act on the information it gathers (Grinaker Siw, 2019).

AI is a broad field that can be divided into different subfields (The informativo, 2020), such as:

• Machine learning (ML): functions that can automatically learn from and adapt to new data without human assistance. Deep learning (DL) is a subfield of machine learning that enables this automatic learning through the absorption of huge amounts of unstructured data such as text, images or video. Artificial neural networks (ANN) are a subset of machine learning based on a collection of connected nodes.

• Expert systems: computer systems developed to solve problems that require human knowledge or expertise.

• Natural language processing (NLP): a way for computers to analyse, recognise and infer meaning from human language.

• Computer vision: the processing of any image or video source to extract meaningful information and to take action on that basis.

• Robotics: robots are autonomous entities designed to handle objects by perceiving, capturing, transferring or changing their physical properties. Robotics frees workers from performing repetitive tasks.

• Speech recognition: the ability of a computer program to identify words and phrases in speech and convert them into a computer-readable format.

Figure 1: Subfields of Artificial intelligence (Amr Kayid, 2020)

In the future, a technology mimicking human intelligence could be achieved through a network of sub-programs handling vision (computer vision), language (NLP), adaptation (machine learning), movement (robotics), etc. (Amr Kayid, 2020).

AI is used in the financial sector through different applications (Zavadskaya Alexandra, 2017):

• Anomaly/fraud detection: identify behaviours that deviate from standard patterns.

• Portfolio management and robo-advisory: establish an optimal investment strategy.

• High-frequency trading: use algorithms to incorporate knowledge about changing market conditions and trade automatically based on the gathered information.

• Text mining and market sentiment analysis: automatically read and analyse text to investigate the behaviour of market participants.

• Robotic process automation: automate repetitive tasks to reduce time and costs.

• Credit evaluation and loan/insurance underwriting: creditworthiness assessment.

AI can reduce governance issues, human errors and behavioural biases by reducing the number of humans involved in decision-making; algorithms can overcome the harmful effects of cognitive biases (Sunstein, 2018). This innovative tool is becoming essential for institutional investors; however, it also has potential drawbacks:

• Expensive mistakes due to the application of wrong models in the financial sector (Pandio, 2021): banks manage massive amounts of cash, so an inaccurate automated task could have considerable consequences.

• Expensive implementation and maintenance (Prabhakar Mali Gaurav, 2020): human capital, hardware, and research and development in artificial intelligence are costly. The cost of skilled managers should be compared with the cost of skilled data scientists and hardware, which is why returns net of these costs (net income and EBIT) are used as the dependent variables in the model to assess performance.

• Shortage of AI engineers: AI skills are not easily accessible, and outsourcing could be necessary, which would increase operating expenses (Scanlon Luke, 2020).

• Limits due to customer protection and IT security (Scanlon Luke, 2020): new technologies are particularly difficult to implement in banks due to regulatory limitations. Other kinds of institutional investors, such as fintech start-ups and hedge funds, do not face this challenge.

While algorithmic and data mining methods are not based on traditional finance theory and are often considered black boxes, they frequently produce better forecasting results than traditional time-series models (Zavadskaya Alexandra, 2017), especially in nonlinear settings. However, they can also add unnecessary complexity in linear settings; artificial intelligence therefore seems to be a complementary tool for econometricians rather than a substitute.

This paper intends to answer the research question: Does artificial intelligence optimise banks’ performance?

Six hypotheses are formulated; the main hypothesis states that the use of artificial intelligence increases banks’ net profits (net income and EBIT). The performance of banks using AI and banks not using AI is compared using a difference-in-differences model. The 66 banks using AI are classified in the control group before the treatment occurs and are placed in the treatment group for the second period of the experiment. The 770 banks that do not use AI during the study period are classified in the control group. The results support the main hypothesis: the regression of EBIT on AI yields a positive coefficient of 153.7 million dollars, which is statistically significant at the 1% level. This regression has a high R-squared of 0.955 when the control variables size, industry, ownership, leverage, age, operating expenses and book-to-market ratio are included. When the same model is run with entity fixed effects, the AI coefficient remains positive but is not statistically significant. To address potential reverse causality, the regression with entity fixed effects is required, as it compares each bank’s performance after the treatment with its own performance before the treatment. Without entity fixed effects, the average performance of the 66 banks using AI (which are probably the best-performing banks) is compared with the average performance of all 836 banks in the dataset. A longer study period, providing more observations, may be needed to obtain a statistically significant result. This research nevertheless provides a plausible indication that AI has a positive impact on bank performance.

The finance and data science literature has demonstrated the effectiveness of artificial intelligence in tasks such as trading strategies and portfolio weight optimization. Even though 80% of equity market capitalisation is held by institutional investors (Pensions & Investments, 2017), research on the impact of artificial intelligence on the performance of institutional investors is scarce. Moreover, AI in finance is a relatively new topic, so there are few empirical studies, and most of the existing literature consists of data science papers that implement AI methods to improve asset allocation. The research method used in this master’s thesis is also innovative and built from scratch, including the way the AI variable was constructed to account for the use of artificial intelligence. This empirical research aims to contribute to optimising banks’ efficiency and to add to the debate on the relevance of artificial intelligence in finance.

1. Related literature

AI in finance

Several research papers argue that the use of artificial intelligence improves performance. Amer Awad Alzaidi (2018) studies the impact of artificial intelligence on the banking sector in the Middle East by manually collecting data from 200 banks using AI, and concludes that the use of artificial intelligence in the banking sector can have a positive impact on the overall productivity of the banking system. Similarly, Abdallah Abusalma (2021) concludes that artificial intelligence directly affects job performance and that interest in artificial intelligence increases a bank’s efficiency and strengthens its ability to perform banking activities internally and externally. Zihao Zhang, Stefan Zohren and Stephen Roberts (2020) use deep learning models to directly optimize the portfolio Sharpe ratio. Their model delivers better performance and tolerates larger transaction costs than two benchmarks: mean-variance analysis and the maximum diversification method introduced by Choueifaty and Coignard (2008). Derek Snow (2020) finds that machine learning can help with most portfolio construction tasks, such as idea generation, alpha factor design, asset allocation, weight optimization, position sizing and strategy testing. Experimental results of Carol Hargreaves, Vallaru Chandana and Vishnu Reddy (2017) confirm that machine learning and artificial intelligence methods can help select top-performing stock portfolios that outperform the stock market. Machine learning also seems proficient at predicting industry returns from the information in lagged industry returns (David E. Rapach, Jack K. Strauss, Jun Tu and Guofu Zhou, 2019).

AI and Econometrics are complementary

Shihao Gu, Bryan Kelly and Dacheng Xiu (2020) demonstrate large economic gains to investors using machine learning forecasts, in some cases doubling the performance of leading regression-based strategies from the literature, because machine learning captures nonlinear predictor interactions that other methods miss. Unlike many industry approaches, which rely on heuristics and numerical approximation, the machine learning approach of Robert Kissell and Jungsun Bae (2018) solves the exact problem and provides a dramatic improvement in calculation time. Accurate computational models can be built by combining machine learning and econometrics (Joseph A. Cerniglia and Frank J. Fabozzi, 2020). Machine learning tools allow more accurate predictions by accommodating nonlinearities in the data, capturing complex interactions among variables, and allowing the use of large, unstructured datasets. The tools of financial econometrics remain critical for questions of inference about the variables describing economic relationships in finance; when properly applied, their role has not diminished with the introduction of machine learning.

Challenges regarding the application of artificial intelligence in the financial sector

The rate of failure in quantitative finance is high, particularly in financial machine learning applications (Marcos López de Prado, 2018). Marcos López de Prado identifies 10 critical mistakes that underlie those failures. Machine learning algorithms have been developed for data environments that differ substantially from the one encountered in finance. Not only do difficulties arise from the idiosyncrasies of financial markets, but there is also a fundamental tension between the underlying paradigm of machine learning and the research philosophy of financial economics. There are four main conflicts between financial-economic research and machine learning, relating to statistical inference, causality, theoretical hypotheses and model assumptions (Kristof Lommers, Ouns El Harzli and Jack Kim, 2021). First, the main difference between econometrics and machine learning lies in the focus on inference relative to prediction (Bzdok, Altman and Krzywinski, 2018). Second, causal understanding has a central place in financial economics, while most machine learning methods place little emphasis on causation (Rudin, 2015). Third, financial economics is hypothesis-driven, while machine learning tends to be data-driven (Rudin, 2015). Fourth, in financial economics one tends to make assumptions about the relationship that the model attempts to describe, and one believes that the relationship between variables is governed by that specific model (Rudin, 2015).

However, Kristof Lommers, Ouns El Harzli and Jack Kim (2021) argue that machine learning could be unified with financial research to become a robust complement to the econometrician’s toolbox. More specifically, it can be used in various parts of the research process, such as data pre-processing, estimation, empirical discovery, testing, causal inference and prediction.

Another challenge machine learning must overcome is behavioural bias: some large allocators shy away from systematic hedge funds altogether (Campbell R. Harvey, Sandy Rattray, Andrew Sinclair and Otto Van Hemert, 2017). One possible explanation is “algorithm aversion”; however, the authors find no empirical basis for such an aversion. For the period 1996–2014, systematic and discretionary manager performance is similar after adjusting for volatility and factor exposures (that is, in terms of their appraisal ratio). It is sometimes claimed that systematic funds have greater exposure to ill-known risk factors; however, the authors find that for discretionary funds (in aggregate), more of the average return and the volatility of returns can be explained by risk factors.

Because robo-advice algorithms are applied inflexibly within a given risk profile, investing with human advisers yielded superior returns (J.P. Harrison and S. Samaddar, 2020). These unexpected results showed that at each investment level in the test range ($100k < x < $1MM), the human adviser outperformed the robo-adviser; the robo-adviser was also found to be insensitive to investment amount and age, responding only to investor-declared risk tolerance. This demonstrates the added value of human financial advisers in recent market conditions.

Literature on the Methodology

Xiaodong Yuan, Fan Hou and Xuehui Cai (2020) study the effect of patent asset ownership on firm performance. They use EBIT as the dependent variable and the following control variables: sector, ownership, size, age, selling expenses and leverage. In previous literature, the control variables include industry sector (Artz et al. 2010), firm ownership (Zhou et al. 2008), firm size (Zott and Amit 2008), firm age (Maresch, Fink, and Harms 2016), leverage (Rasoulian et al. 2017), and selling expenses (Santhanam and Hartono 2003).

In many studies, bank output is proxied by total revenues or total assets, while labour and capital inputs are proxied by number of employees and total non-labour cost respectively (Athanasoglou and Brissimis, 2004 and Athanasoglou et al., 2008).

When studying the operating performance of banks, Simon H. Kwan (2003) uses as dependent variables the ratio of total operating costs to earning assets, the ratio of labour costs to earning assets, and the ratio of physical capital costs to earning assets. He controls for the ratio of loan loss provisions to total loans, the ratio of cash and dues from banks to total assets, and the ratio of equity capital to total assets, which proxy for output quality, bank liquidity and managerial quality respectively. He also uses the ratio of retail deposits to total deposits and the ratio of total loans to total earning assets, which both control for potential variation in banking powers across countries. Since this thesis focuses on the performance of banks in the United States, these last two control variables can be dropped. Regressions similar to those of Simon H. Kwan are run to study the operating efficiency of the banks using AI, but without ratios, in order to study whether banks using AI have higher production, salary expenses, furniture expenses and number of employees.

2. Data

Data sources

The WRDS US Patents (Beta) database provides patent-level data parsed directly from the USPTO’s XML files, which record patent applications in the United States of America. The current version covers the period between 04/01/2011 and 31/12/2019; this research therefore focuses on this study period in the US.

The Compustat Bank Fundamentals Annual database provides annual data on banks including balance sheet and income statement variables.

The WRDS US Patents (Beta) Compustat Link contains linkages between individual patents and Compustat companies. It links each patent number to the identifier of the bank (GVKEY).

By merging the Compustat Bank Fundamentals Annual data with the Compustat Link, the performance of the banks that have applied for a patent is obtained.

Data cleaning

First, the variables patent number (patnum) and GVKEY (gvkey) are downloaded from the WRDS US Patents (Beta) Compustat Link database for the period 2011-2019. The data are collapsed by gvkey, taking the minimum patent number, to keep only the first patent of each company. The variable gvkey must be converted from string to numeric format (“destringed”) so that this file can be merged with the subsequent files.

Then, the variables patent number (patnum) and application date (appldate) are downloaded from WRDS US Patents (Beta) for the period 2011-2019. The data are collapsed by patent number, taking the minimum application date, to keep only the earliest application date of each patent.
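The two “collapse to the minimum” steps above can be sketched in Python, the language used for the scraping code in this thesis. This is a minimal illustration, not the code used for the thesis: each table is assumed to be a list of (key, value) pairs, the rows are hypothetical, and `collapse_min` and `destring` are illustrative helper names. Patent numbers are compared as strings, which orders them correctly only when they share the same format and length.

```python
# Minimal sketch of the two "collapse (min)" steps; the rows are hypothetical.

def collapse_min(rows):
    """Keep the smallest value for each key, like a group-by minimum."""
    result = {}
    for key, value in rows:
        if key not in result or value < result[key]:
            result[key] = value
    return result


def destring(identifier):
    """Convert a string identifier such as '001234' to an integer (1234)."""
    return int(identifier)


# (gvkey, patnum) pairs from the patent-Compustat link table (hypothetical)
link_rows = [("001234", "9876543"), ("001234", "9001234"), ("005678", "D711674")]
# (patnum, appldate) pairs; ISO dates compare correctly as strings
patent_rows = [("9001234", "2013-05-02"), ("9001234", "2012-11-20"),
               ("D711674", "2014-01-15")]

# First (lowest-numbered) patent per bank, with gvkey destringed for merging
first_patent = {destring(g): p for g, p in collapse_min(link_rows).items()}
# Earliest application date per patent
first_appl_date = collapse_min(patent_rows)

print(first_patent)     # {1234: '9001234', 5678: 'D711674'}
print(first_appl_date)  # {'9001234': '2012-11-20', 'D711674': '2014-01-15'}
```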

The following variables are downloaded from the Compustat Bank Fundamentals Annual database for the period 2011-2019: “GVKEY” (gvkey), “Net Income” (ni), “EBIT” (ebit), “Fiscal Year” (fyear), “Assets – Total” (at), “Debt in Current Liabilities – Total” (dlc), “Long-Term Debt – Total” (dltt), “Total Fair Value Assets” (tfva), “Total Fair Value Liabilities” (tfvl), “Liabilities and Stockholders’ Equity – Total” (lse), “Total Current Operating Expenses” (tcoe), “Investment Securities - Gain (Loss) – Total” (isgt), “Staff Expense - Wages and Salaries” (xstfws), “Furniture and Equipment Expense” (fedrcs), “Employees” (emp), “Provision for Loan Losses” (pclc), “Cash and Due from Banks – Total” (cdbt), “Stockholders Equity – Total” (teq), “North American Industry Classification Code” (naics), “Stock Ownership Code” (stko) and “Current ISO Country Code – Headquarters” (loc). As before, the variable gvkey is destringed.

The age variable is computed by counting the number of lags of each bank’s time series in the Compustat Bank Fundamentals Annual database over the data range 1955-12 to 2021-05. The variables gvkey and fyear are therefore downloaded from this database for the widest period available, and gvkey is destringed.
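One plausible reading of the age definition, assumed here rather than taken from the thesis code, is that a bank’s age in a given fiscal year equals the number of fiscal years in which it has appeared in Compustat Bank Fundamentals Annual up to and including that year, so a bank’s first observed year has age 1. A minimal sketch with hypothetical (gvkey, fyear) observations; `compute_age` is an illustrative name:

```python
from collections import defaultdict


def compute_age(panel):
    """Map (gvkey, fyear) to the bank's age, assumed here to be the number of
    fiscal years the bank has appeared in the database up to and including
    that year (so the first observed year has age 1)."""
    years_by_bank = defaultdict(set)
    for gvkey, fyear in panel:
        years_by_bank[gvkey].add(fyear)
    age = {}
    for gvkey, years in years_by_bank.items():
        for count, fyear in enumerate(sorted(years), start=1):
            age[(gvkey, fyear)] = count
    return age


# Hypothetical (gvkey, fyear) observations from the 1955-2021 extract
panel = [(1234, 2009), (1234, 2010), (1234, 2011), (5678, 2011), (5678, 2012)]
ages = compute_age(panel)
print(ages[(1234, 2011)], ages[(5678, 2012)])  # 3 2
```

Because this version counts observations, a bank with a gap in its coverage ages by one per observed year rather than per calendar year.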

Next, the dataset from the Compustat Bank Fundamentals Annual database for the period 2011-2019 is collapsed by gvkey, which gives a list of all the gvkeys of the banks in the dataset. A one-to-one merge is performed between the data from the WRDS US Patents (Beta) Compustat Link and the data from WRDS US Patents (Beta). The patent numbers "D0711674" (line 76) and "D0779635" (line 77) are replaced with "D711674" and "D779635" respectively, as these typing errors would cause issues in the construction of the AI variable. The resulting dataset gives the gvkey and the patent number of the banks that have applied for a patent; it is the data frame used to construct the AI variable. All the observations are kept in the dataset, as the web scraping algorithm showed that all the patents for which the banks have applied are related to AI. A one-to-many merge is then executed with the data from the Compustat Bank Fundamentals Annual database for the period 2011-2019, giving a dataset with the different variables for all the banks and the application date for the banks that applied for an AI patent. Finally, a many-to-one merge is executed with the data from the Compustat Bank Fundamentals Annual database for the period 1955-2021 to obtain the age variable.
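The merge logic can be sketched as a left join that checks key uniqueness, which is what distinguishes a one-to-one merge from a one-to-many (or many-to-one) merge. The helper `merge`, its `validate` argument and the sample rows are hypothetical; only the two patent-number corrections come from the text. Seen from the bank-year panel, the one-to-many merge of the patent data onto Compustat is a many-to-one lookup, so the sketch expresses it with `validate="m:1"`.

```python
def check_unique(rows, key):
    """Raise ValueError if `key` is not unique across `rows`."""
    seen = set()
    for row in rows:
        if row[key] in seen:
            raise ValueError(f"duplicate {key}: {row[key]!r}")
        seen.add(row[key])


def merge(base, lookup, key, validate):
    """Left-join the columns of `lookup` onto every row of `base` on `key`.

    validate="1:1": keys must be unique on both sides.
    validate="m:1": keys must be unique in `lookup` only, so `base` may repeat
    them (e.g. several fiscal years per bank). Unmatched rows get None.
    """
    check_unique(lookup, key)
    if validate == "1:1":
        check_unique(base, key)
    elif validate != "m:1":
        raise ValueError(f"unsupported validate={validate!r}")
    index = {row[key]: row for row in lookup}
    extra_cols = [c for c in (lookup[0] if lookup else {}) if c != key]
    merged = []
    for row in base:
        match = index.get(row[key], {})
        merged.append({**row, **{c: match.get(c) for c in extra_cols}})
    return merged


# Hypothetical rows: link table (first patent per bank) and patent table
link = [{"gvkey": 1234, "patnum": "D0711674"}, {"gvkey": 5678, "patnum": "9001234"}]
patents = [{"patnum": "D0711674", "appldate": "2014-01-15"},
           {"patnum": "9001234", "appldate": "2012-11-20"}]
bank_patents = merge(link, patents, "patnum", validate="1:1")

# The two typing errors corrected in the text (extra zero after "D")
PATNUM_FIXES = {"D0711674": "D711674", "D0779635": "D779635"}
for row in bank_patents:
    row["patnum"] = PATNUM_FIXES.get(row["patnum"], row["patnum"])

# Bank-year panel: banks without a patent keep appldate = None
panel = [{"gvkey": 1234, "fyear": 2013}, {"gvkey": 1234, "fyear": 2014},
         {"gvkey": 9999, "fyear": 2014}]
panel_with_patents = merge(panel, bank_patents, "gvkey", validate="m:1")
```

Failing loudly on duplicate keys mirrors the purpose of declaring the merge type: a silent duplicate in the patent table would otherwise multiply bank-year rows.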

The observations with naics equal to 721110 are dropped, as this industry code refers to “Hotels (except Casino Hotels) and Motels”; these observations were included in the Compustat banking dataset in error. The observations before 2011 are dropped, as the study period of this research is 2011-2019. Only the observations with “Current ISO Country Code – Headquarters” (loc) equal to “USA” are kept, as the geographical area of the research is the US.
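The three sample restrictions can be expressed as a single row filter. A minimal sketch, assuming each observation is a dictionary keyed by the Compustat mnemonics naics, fyear and loc; the sample rows and the helper name `keep_observation` are hypothetical. The NAICS code is compared as a string so that both text and numeric codes are handled.

```python
HOTELS_NAICS = "721110"  # "Hotels (except Casino Hotels) and Motels"


def keep_observation(row):
    """Apply the three sample restrictions described in the text."""
    return (
        str(row["naics"]) != HOTELS_NAICS  # drop misclassified hotels/motels
        and row["fyear"] >= 2011           # study period starts in 2011
        and row["loc"] == "USA"            # US-headquartered banks only
    )


# Hypothetical observations
rows = [
    {"gvkey": 1, "naics": "522110", "fyear": 2012, "loc": "USA"},
    {"gvkey": 2, "naics": 721110, "fyear": 2012, "loc": "USA"},
    {"gvkey": 3, "naics": "522110", "fyear": 2010, "loc": "USA"},
    {"gvkey": 4, "naics": "522110", "fyear": 2015, "loc": "CAN"},
]
sample = [row for row in rows if keep_observation(row)]
print([row["gvkey"] for row in sample])  # [1]
```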

The following variables are constructed: the ratio of net income to total assets, the logarithm of total assets, Tobin’s Q, Book-to-Market ratio, Leverage, the ratio of salary expenses to the number of employees, the number of banks and the number of banks using AI.
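A sketch of the constructed ratios, following the definitions given in the notes to Table 1 (leverage = dlc + dltt, Tobin’s Q = (tfva + tfvl)/lse, book-to-market = teq/tfva) and using the natural logarithm for size, which matches the values reported in Table 1 (ln 64.548 ≈ 4.167). The key `at` for total assets is an assumption (the usual Compustat mnemonic), the function names are illustrative, inputs are assumed to be numbers in the units of Table 1, and ratios with a zero denominator return None rather than raising.

```python
import math


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return None if denominator == 0 else numerator / denominator


def construct_variables(row):
    """Derived variables for one bank-year; amounts in million USD and
    employees (emp) in thousands, as in Table 1."""
    return {
        "ni_to_assets": safe_ratio(row["ni"], row["at"]),
        "size": math.log(row["at"]),  # natural log of total assets
        "leverage": row["dlc"] + row["dltt"],
        "tobins_q": safe_ratio(row["tfva"] + row["tfvl"], row["lse"]),
        "book_to_market": safe_ratio(row["teq"], row["tfva"]),
        # million USD per thousand employees = thousand USD per employee
        "salary_per_employee": safe_ratio(row["xstfws"], row["emp"]),
    }


# Hypothetical bank-year (the key "at" for total assets is an assumption)
row = {"ni": 10.0, "at": 1000.0, "dlc": 20.0, "dltt": 80.0, "tfva": 50.0,
       "tfvl": 30.0, "lse": 1000.0, "teq": 100.0, "xstfws": 30.0, "emp": 0.5}
variables = construct_variables(row)
```

Returning None for a zero denominator matters here because Table 1 reports banks with zero employees and zero total fair value assets.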

Construction of the AI variable

The AI dummy variable accounts for banks that have applied for a patent related to artificial intelligence. To create it, the bank regulatory and WRDS US Patents datasets are merged, which gives a panel of banks that have applied for a patent. Because the WRDS patent database lacks the information needed to classify patents, I wrote a web scraping algorithm in Python (Appendix, source code 1 and 2) to keep only the patents related to artificial intelligence. The algorithm uses each patent number to build a link to the Google Patents website. From each page, the algorithm retrieves the patent title using the Beautiful Soup package. Then, using the Selenium package, the algorithm checks whether any of the following words appear on the page: "neural network", "learn", "ML", "artificial intelligence", "AI", "robot", "natural language processing", "intelligen", "expert system". If any of these words appears, the patent associated with the page is kept in the data frame; otherwise it is dropped. The application date of the patent is then used to create the AI variable. For banks that never applied for a patent during the period covered by the patent data, the AI variable equals zero in every year. For banks that applied for an AI patent in this period, the AI variable equals 0 before the application date and 1 after it. In the dataset, 21 banks have applied for a patent related to machine learning, and all 66 banks that applied for patents have patents related to artificial intelligence. With more than 30 treated banks, the common rule of thumb for relying on the central limit theorem is met.
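The keyword filter and the construction of the AI dummy can be sketched as follows. This is not the thesis code (which is in the Appendix and uses Beautiful Soup and Selenium): page fetching is omitted and the page HTML is passed in directly, the standard-library `html.parser` stands in for Beautiful Soup to read the title, and all function names are hypothetical. The keyword list is the one given above. As a design choice of this sketch, the short acronyms "ML" and "AI" are matched case-sensitively as whole words, since a plain substring test would also match strings such as "HTML" or "EMAIL"; the other keywords are matched case-insensitively as substrings so that the stems "learn" and "intelligen" still catch "learning" and "intelligent". The dummy assumes the bank’s first AI application year is known and treats the application year itself as treated, which is an assumption because the text only says "after the application date".

```python
import re
from html.parser import HTMLParser

# Keywords from the text. Phrases and the stems "learn" and "intelligen" are
# matched case-insensitively as substrings; the acronyms "ML" and "AI" are
# matched case-sensitively as whole words (a choice of this sketch).
PHRASES = ["neural network", "learn", "artificial intelligence", "robot",
           "natural language processing", "intelligen", "expert system"]
ACRONYMS = re.compile(r"\b(?:ML|AI)\b")


def is_ai_related(page_text):
    """True if any keyword appears in the text of a patent page."""
    lowered = page_text.lower()
    return any(p in lowered for p in PHRASES) or bool(ACRONYMS.search(page_text))


class TitleParser(HTMLParser):
    """Collect the text inside the page's <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html):
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def ai_dummy(fyear, first_ai_application_year):
    """0 for banks without an AI patent application (None); otherwise 0 before
    the application year and 1 from that year on (assumed convention)."""
    if first_ai_application_year is None:
        return 0
    return int(fyear >= first_ai_application_year)


# Hypothetical page
html = "<html><head><title> Fraud scoring with a neural network </title></head></html>"
print(page_title(html), is_ai_related(html))  # Fraud scoring with a neural network True
```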

Summary statistics

The total number of banks observed, reported in the first line of Table 1, is 836, of which 66 use AI (line 2, Table 1). The observation period runs from 2011 to 2019, as shown in line 3 of Table 1, and the total number of observations is 5,572. Net income ranges from -3,426 million US dollars to 36,431 million US dollars, with a mean of 179.7 million US dollars (line 4, Table 1). The median (9.4 million US dollars) is substantially lower than the mean, which indicates that a few large banks pull the average net income up. EBIT also has a high standard deviation (line 5, Table 1), meaning that the performance spectrum of the banks observed is wide.

Total assets range from 64.548 million US dollars to 2,687,379 million US dollars; this wide variation is reflected in a standard deviation of 154,143 million US dollars (line 6, Table 1). Moreover, the median (1,237.695) is far below the mean (19,314.2). The logarithm of total assets is used as the size variable: it reduces the variation and scale of total assets, with values ranging from 4.2 to 14.8 and a standard deviation of 1.6 (line 7, Table 1). Leverage ranges from 0, for banks that have no debt, to 622,827 million US dollars (line 8, Table 1), and half of the observations show leverage below 82.489 million US dollars. The minimum value of Tobin’s Q is 0 because the total fair value of assets and liabilities is equal to 0 for some companies (line 9, Table 1); therefore, the book-to-market ratio is used to account for value creation (line 10, Table 1). The banks in the dataset are between 1 and 63 years old (line 11, Table 1). Yearly operating expenses average 683.8 million US dollars (line 12, Table 1). Surprisingly, the average investment gain is negative, and the negative median reported in line 13 of Table 1 shows that banks made losses on investment securities in more than 50% of the observations. The minimum salary expenses (line 14, Table 1) and minimum equipment expenses (line 15, Table 1) are 857,000 US dollars and 36,000 US dollars respectively. Line 16 of Table 1 shows that the number of employees ranges from 0 to 282,000. Output quality (line 17, Table 1), liquidity (line 18, Table 1) and management quality (line 19, Table 1) have minimum values of -202 million US dollars, 437,000 US dollars and -144 million US dollars respectively.

Table 2 shows that only 452 observations are available for the banks using AI, against 5,120 observations for the banks not using AI, as reported in Table 3. The net income (line 2, Table 2) and EBIT (line 3, Table 2) of the banks using AI vary substantially more than the net income (line 2, Table 3) and EBIT (line 3, Table 3) of the banks not using AI: their minimum values are lower and their maximum values higher, even though fewer observations are available for the banks using AI. Total assets (line 4, Table 2), which account for the size of the bank, have a substantially higher mean, minimum and maximum for the banks using AI than for the banks not using AI. The banks using AI are at least 4 years old (line 9, Table 2).
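The columns of Tables 1 to 3 (count, mean, median, minimum, maximum and standard deviation) can be reproduced with the Python standard library. A minimal sketch with hypothetical values and an illustrative function name `summarize`: missing values are skipped, which is consistent with the different counts per variable, and the standard deviation is the sample (n − 1) version, an assumption since the thesis does not state which one is used. The split on the AI dummy mirrors Tables 2 and 3.

```python
import statistics


def summarize(values):
    """Count, mean, median, min, max and sample standard deviation of the
    non-missing values (None is treated as missing)."""
    data = [v for v in values if v is not None]
    return {
        "count": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "min": min(data),
        "max": max(data),
        "sd": statistics.stdev(data) if len(data) > 1 else 0.0,
    }


# Hypothetical bank-years with net income in million USD
rows = [{"ai": 0, "ni": 5.0}, {"ai": 0, "ni": 9.0}, {"ai": 0, "ni": None},
        {"ai": 1, "ni": 200.0}, {"ai": 1, "ni": 4000.0}]
all_banks = summarize(r["ni"] for r in rows)                     # as in Table 1
ai_banks = summarize(r["ni"] for r in rows if r["ai"] == 1)      # as in Table 2
non_ai_banks = summarize(r["ni"] for r in rows if r["ai"] == 0)  # as in Table 3
print(all_banks["mean"], all_banks["median"])  # 1053.5 104.5
```

In this toy data, as for net income in Table 1, a single large value pulls the mean far above the median.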

Another interesting insight from the data is that several variables declined in 2015:

• Net income for banks not using AI (figure 4)

• EBIT (figure 6 and 7, Appendix), output (figure 8 and 9, Appendix), salary expenses (figure 10 and 11, Appendix), furniture expenses (figure 12 and 13, Appendix), number of employees (figure 14 and 15, Appendix) for both types of banks

• Operating expenses for banks using AI (figure 16, Appendix)

Table 1: Summary statistics of all the banks

Variable | Count | Mean | Median | Minimum | Maximum | Standard deviation
Number of banks | 836 | 1 | 1 | 1 | 1 | 0
Number of banks using AI | 66 | 1 | 1 | 1 | 1 | 0
Fiscal year | 5572 | 2014.8 | 2015 | 2011 | 2019 | 2.546
Net income (million USD) | 5553 | 179.7 | 9.391 | -3426 | 36431 | 1539.193
EBIT (million USD) | 5547 | 407.9 | 22.637 | -84.079 | 63197 | 3314.546
Total assets (million USD) | 5553 | 19314.2 | 1237.695 | 64.548 | 2687379 | 154143
Log(total assets) | 5553 | 7.472 | 7.121 | 4.167 | 14.804 | 1.582
Leverage (million USD) | 5546 | 3106.2 | 82.489 | 0 | 622827 | 31365.99
Tobin’s Q | 5368 | 0.179 | 0.153 | 0 | 1.927 | 0.155
Book-to-market ratio | 5335 | 9.075 | 0.714 | -9.175 | 8485.6 | 155.385
Age (years) | 5572 | 19.70 | 18 | 1 | 63 | 11.821
Operating exp. (million USD) | 5552 | 683.8 | 45.569 | 2.106 | 115304 | 5506.25
Invest. gains (million USD) | 5279 | -4.764 | -0.06 | -3374 | 138.735 | 76.952
Salary exp. (million USD) | 5540 | 264.7 | 19.018 | 0.857 | 36965 | 2161.106
Equipment exp. (million USD) | 2163 | 33.63 | 2.182 | 0.036 | 2763 | 200.843
Number of employees (thousand) | 4552 | 3.038 | 0.312 | 0 | 282 | 19.781
Output quality (million USD) | 5465 | 40.77 | 1.5 | -202 | 13410 | 377.120
Liquidity (million USD) | 5545 | 1769.0 | 52.122 | 0.437 | 512308 | 17952.14
Management quality (million USD) | 5553 | 2117.3 | 130.506 | -144.116 | 267146 | 16150.24
Number of observations | 5572

Notes: the number of banks is computed by counting the number of distinct GVKEYs in the Compustat Bank Fundamentals Annual database for the period 2011-2019. The number of banks using AI is computed by counting the number of distinct GVKEYs with AI=1. The fiscal year ranges from 2011 to 2019. Net income is COMPUSTAT item n°172. EBIT is “Earnings Before Interest and Taxes” (ebit). Total assets is COMPUSTAT item n°6. Log(total assets) is the natural logarithm of total assets. Leverage is the sum of “Debt in Current Liabilities – Total” (dlc, COMPUSTAT item n°34) and “Long-Term Debt – Total” (dltt, COMPUSTAT item n°142). Tobin’s Q = (tfva + tfvl)/lse, with tfva being “total fair value assets”, tfvl “total fair value liabilities” and lse “liabilities and stockholders’ equity – total”. The book-to-market ratio is stockholders’ equity (teq) divided by total fair value assets (tfva). The age variable is computed by counting the number of lags of each time series in the Compustat Bank Fundamentals Annual database for the data range 1955-12 to 2021-05. Operating exp. is “total current operating expenses” (tcoe). Invest. gains is “Investment Securities - Gain (Loss) – Total” (isgt). Salary exp. is “Staff Expense - Wages and Salaries” (xstfws). Equipment exp. is “Furniture and Equipment Expense” (fedrcs). Number of employees is COMPUSTAT item n°29 “Employees” (emp). Output quality is “Provision for Loan Losses” (pclc). Liquidity is “Cash and Due from Banks – Total” (cdbt). Management quality is “Stockholders Equity – Total” (teq). The first column gives the name of each variable and its unit, the second column the number of observations, and the third to seventh columns the mean, median, minimum, maximum and standard deviation. The figures are rounded to three decimal places.

Table 2: Summary statistics of the banks using AI

Variable | Count | Mean | Median | Minimum | Maximum | Standard deviation
Fiscal year | 452 | 2015.2 | 2015 | 2011 | 2019 | 2.497
Net income (million USD) | 452 | 1817.7 | 186.102 | -3426 | 36431 | 5107.537
EBIT (million USD) | 452 | 4072.0 | 410.651 | -84.079 | 63197 | 10938.11
Total assets (million USD) | 452 | 190677.7 | 20163.99 | 174.509 | 2687379 | 508846.2
Log(total assets) | 452 | 10.04 | 9.917 | 5.162 | 14.804 | 2.159
Leverage (million USD) | 452 | 33038.3 | 1302.853 | 5.592 | 622827 | 105237.4
Tobin’s Q | 440 | 0.196 | 0.171 | 0 | 0.727 | 0.119
Book-to-market ratio | 434 | 2.002 | 0.699 | -9.176 | 105.157 | 9.043
Age (years) | 452 | 34.39 | 33 | 4 | 63 | 17.197
Operating exp. (million USD) | 452 | 6679.0 | 595.099 | 6.895 | 115304 | 18183.06
Invest. gains (million USD) | 385 | -47.99 | -0.466 | -3374 | 138.735 | 262.141
Salary exp. (million USD) | 452 | 2617.1 | 229.769 | 2.579 | 36965 | 7144.876
Equipment exp. (million USD) | 218 | 260.7 | 25.9 | 0.358 | 2763 | 580.449
Number of employees (thousand) | 423 | 25.05 | 3.256 | 0.034 | 282 | 60.413
Output quality (million USD) | 452 | 353.5 | 14.793 | -202 | 13410 | 1213.771
Liquidity (million USD) | 452 | 19740.9 | 618.879 | 7.586 | 512308 | 60043.28
Management quality (million USD) | 452 | 20517.8 | 2431.064 | -144.116 | 267146 | 53053.8
Number of observations | 452

Notes: Table 2 summarises the same variables as Table 1, restricted to observations with AI=1. The figures are rounded to three decimal places.
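The split between Tables 2 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to produce the tables. Rows are hypothetical dicts with an `AI` key. Missing values are dropped variable by variable, which is consistent with N differing across rows of the tables. The standard deviation uses the sample (n - 1) denominator, which is also an assumption.

```python
import statistics


def summary_row(values):
    """N, mean, median, min, max and sample standard deviation, ignoring missing values."""
    xs = [v for v in values if v is not None]
    return {
        "n": len(xs),
        "mean": statistics.mean(xs),
        "median": statistics.median(xs),
        "min": min(xs),
        "max": max(xs),
        "sd": statistics.stdev(xs),  # n - 1 denominator (assumed)
    }


def summarise_by_ai(rows, variable, ai_value):
    """Summary row for one variable, keeping only bank-years with AI == ai_value."""
    return summary_row(row.get(variable) for row in rows if row["AI"] == ai_value)


# Hypothetical bank-years, not thesis data
rows = [
    {"AI": 1, "ebit": 1.0}, {"AI": 1, "ebit": 3.0}, {"AI": 1, "ebit": None},
    {"AI": 0, "ebit": 10.0}, {"AI": 0, "ebit": 20.0}, {"AI": 0, "ebit": 30.0},
]
print(summarise_by_ai(rows, "ebit", 1))  # Table 2 analogue (AI = 1)
print(summarise_by_ai(rows, "ebit", 0))  # Table 3 analogue (AI = 0)
```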


Table 3: Summary statistics of the banks not using AI

| Variable | N | Mean | Median | Min | Max | Std. dev. |
|---|---|---|---|---|---|---|
| Fiscal year | 5120 | 2014.8 | 2015 | 2011 | 2019 | 2.548 |
| Net income (million USD) | 5101 | 34.55 | 8.26 | -1739.375 | 2335.152 | 115.435 |
| EBIT (million USD) | 5095 | 82.82 | 20.004 | -66.401 | 5456 | 267.287 |
| Total assets (million USD) | 5101 | 4129.6 | 1135.143 | 64.548 | 190328 | 11645.6 |
| Log(total assets) | 5101 | 7.244 | 7.035 | 4.167 | 12.157 | 1.294 |
| Leverage | 5094 | 450.2 | 69.621 | 0 | 51366.07 | 1963.426 |
| Tobin's Q | 4928 | 0.177 | 0.152 | 0 | 1.928 | 0.159 |
| Book-to-Market ratio | 4901 | 9.702 | 0.716 | -4.331672 | 8485.6 | 162.083 |
| Age | 5120 | 18.40 | 18 | 1 | 63 | 10.261 |
| Operating exp. (million USD) | 5100 | 152.5 | 41.676 | 2.106 | 14126.7 | 542.836 |
| Invest. Gains (million USD) | 4894 | -1.363 | -0.054 | -1974 | 49.9 | 28.914 |
| Salary exp. (million USD) | 5088 | 55.72 | 17.184 | 0.857 | 2603 | 155.246 |
| Equipment exp. (million USD) | 1945 | 8.176 | 1.802 | 0.036 | 330 | 28.679 |
| Number of employees (thousand) | 4129 | 0.782 | 0.284 | 0 | 29.182 | 1.883 |
| Output Quality (million USD) | 5013 | 12.57 | 1.351 | -40.69 | 4012.956 | 113.4 |
| Liquidity (million USD) | 5093 | 174.0 | 47.33 | 0.437 | 13052.81 | 598.762 |
| Management Quality (million USD) | 5101 | 486.9 | 115.028 | -108.647 | 24398.83 | 1540.219 |
| Number of observations | 5120 |  |  |  |  |  |

Notes: Table 3 summarises the same variables as Table 1, restricted to observations with AI=0. The figures are rounded to three decimal places.

Table 4: Summary Statistics Categorical Variable: Industry

| North American Industry Classification Code | Description | Number of observations |
|---|---|---|
| 522110 | Commercial Banking | 4,379 |
| 522120 | Savings Institutions | 1,168 |
| 522292 | Real Estate Credit | 5 |
| 522310 | Mortgage and Other Loan Brokers | 2 |
| 522390 | Other Activities Related to Credit Intermediation | 9 |
| 523920 | Portfolio Management | 9 |

Notes: North American Industry Classification Code is the variable naics in Compustat.


Table 5: Summary Statistics Categorical Variable: Ownership

| Stock Ownership code | Description | Number of Observations |
|---|---|---|
| 0 | Publicly traded company, includes NYSE, ASE, NASDAQ, and OTC BB | 4,099 |
| 1 | Subsidiary of a publicly traded company | 27 |
| 2 | Subsidiary of a company that is not publicly traded | 9 |
| 3 | Company that is publicly traded but not on a major exchange, includes Other, Pink Sheet, other OTC, etc. | 1,437 |

Notes: Stock Ownership code is the variable stko in Compustat.

Correlation

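Table 6: Correlation Matrix Performance

| Variables | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10) | (11) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| (1) AI | 1.000 |  |  |  |  |  |  |  |  |  |  |
| (2) Fiscal Year | 0.049 | 1.000 |  |  |  |  |  |  |  |  |  |
| (3) Net Income | 0.323 | 0.038 | 1.000 |  |  |  |  |  |  |  |  |
| (4) EBIT | 0.336 | 0.018 | 0.956* | 1.000 |  |  |  |  |  |  |  |
| (5) Investment Gains | 0.175 | -0.029 | 0.398 | 0.550 | 1.000 |  |  |  |  |  |  |
| (6) Size | 0.491 | 0.138 | 0.410 | 0.432 | 0.240 | 1.000 |  |  |  |  |  |
| (7) Leverage | 0.290 | 0.010 | 0.871 | 0.960* | 0.641 | 0.380 | 1.000 |  |  |  |  |
| (8) Age | 0.374 | -0.040 | 0.337 | 0.353 | 0.195 | 0.617 | 0.305 | 1.000 |  |  |  |
| (9) Tobin's Q | 0.035 | -0.143 | 0.065 | 0.080 | 0.066 | 0.022 | 0.084 | 0.096 | 1.000 |  |  |
| (10) Book-to-Market | -0.014 | -0.018 | -0.006 | -0.007 | -0.003 | -0.024 | -0.005 | -0.023 | -0.063 | 1.000 |  |
| (11) Operating Expenses | 0.330 | 0.012 | 0.895 | 0.976* | 0.649 | 0.432 | 0.978* | 0.350 | 0.084 | -0.007 | 1.000 |

Notes: off-diagonal correlation coefficients above 0.9 are marked with an asterisk (*).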

99% confidence intervals for Pearson's product-moment correlation are computed for the correlation coefficients above 0.9, based on Fisher's z-transformation:

• EBIT and Leverage: correlation = 0.960 on 5540 observations (99% CI: 0.957 to 0.962)

• EBIT and Operating Expenses: correlation = 0.976 on 5546 observations (99% CI: 0.975 to 0.978)

• Leverage and Operating Expenses: correlation = 0.978 on 5546 observations (99% CI: 0.977 to 0.980)
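The Fisher-transformation interval can be sketched as follows. This is a minimal standard-library illustration, not the software used for the thesis: r is mapped to z = atanh(r), an interval of ±c/√(n − 3) is built around it with c the two-sided normal critical value, and the bounds are mapped back with tanh. Applied to the rounded correlations listed above, it reproduces the reported bounds to within about 0.001. The small gaps presumably reflect that the inputs are rounded to three decimals.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.99):
    """Confidence interval for a Pearson correlation via Fisher's z-transformation."""
    z = math.atanh(r)                             # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)                   # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided normal critical value
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


print(fisher_ci(0.960, 5540))  # EBIT and Leverage
print(fisher_ci(0.976, 5546))  # EBIT and Operating Expenses
```

Because tanh maps the real line into (−1, 1), the upper bound is always below 1. The substantive point is how close it stays to the point estimate when n is in the thousands.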


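Table 7: Cramér's V - correlation between categorical variables

Rows: North American Industry Classification Code; columns: Stock Ownership Code.

| NAICS | 0 | 1 | 2 | 3 | Total |
|---|---|---|---|---|---|
| 522110 | 3180 | 27 | 9 | 1163 | 4379 |
| 522120 | 894 | 0 | 0 | 274 | 1168 |
| 522292 | 5 | 0 | 0 | 0 | 5 |
| 522310 | 2 | 0 | 0 | 0 | 2 |
| 522390 | 9 | 0 | 0 | 0 | 9 |
| 523920 | 9 | 0 | 0 | 0 | 9 |
| Total | 4099 | 27 | 9 | 1437 | 5572 |

Cramér's V = 0.0379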

Notes: Table 7 shows the number of observations in each combination of the industry variable (naics) and the ownership variable (stko). Cramér's V measures the strength of the association between these two variables, from 0 (no association) to 1 (perfect association).

Table 7 shows that most observations in the dataset correspond to commercial banks publicly traded on a major stock exchange (3,180 of the 5,572 observations). The industry and ownership variables are only weakly associated, with a Cramér's V of 0.0379.
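Cramér's V can be computed directly from the counts in Table 7. This is a minimal sketch using the standard definition V = √(χ² / (n · (min(r, c) − 1))), where χ² is Pearson's chi-square statistic for independence. It is not necessarily the routine used for the thesis. With the Table 7 counts it gives a value of about 0.038, consistent with the reported 0.0379.

```python
import math


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


# Table 7: rows are NAICS codes (522110 ... 523920), columns are stock ownership codes 0-3
TABLE_7 = [
    [3180, 27, 9, 1163],
    [894, 0, 0, 274],
    [5, 0, 0, 0],
    [2, 0, 0, 0],
    [9, 0, 0, 0],
    [9, 0, 0, 0],
]
print(round(cramers_v(TABLE_7), 4))
```

Every row and column total in Table 7 is positive, so no expected count is zero and the division is always defined.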

Table 8: Correlation Matrix Performance drivers

| Variables | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10) |
|---|---|---|---|---|---|---|---|---|---|---|
| (1) AI | 1.000 |  |  |  |  |  |  |  |  |  |
| (2) Fiscal Year | 0.074 | 1.000 |  |  |  |  |  |  |  |  |
| (3) Total Assets | 0.360 | 0.026 | 1.000 |  |  |  |  |  |  |  |
| (4) Salary Expenses | 0.346 | 0.018 | 0.996* | 1.000 |  |  |  |  |  |  |
| (5) Equipment Expenses | 0.380 | 0.028 | 0.958* | 0.961* | 1.000 |  |  |  |  |  |
| (6) Nb. of Employees | 0.369 | 0.010 | 0.974* | 0.978* | 0.978* | 1.000 |  |  |  |  |
| (7) Operating Expenses | 0.345 | 0.011 | 0.985* | 0.993* | 0.958* | 0.973* | 1.000 |  |  |  |
| (8) Loan Loss Provision | 0.263 | -0.036 | 0.791 | 0.822 | 0.795 | 0.822 | 0.864 | 1.000 |  |  |
| (9) Liquidity | 0.345 | 0.015 | 0.905* | 0.886 | 0.860 | 0.905* | 0.864 | 0.616 | 1.000 |  |
| (10) Management Quality | 0.362 | 0.027 | 0.998* | 0.993* | 0.953* | 0.966* | 0.979* | 0.785 | 0.897 | 1.000 |

Notes: off-diagonal correlation coefficients above 0.9 are marked with an asterisk (*).

99% confidence intervals for Pearson's product-moment correlation are computed for the correlation coefficients above 0.9, based on Fisher's z-transformation:

• Number of Employees and Management Quality: correlation = 0.972 on 4552 observations (99% CI: 0.970 to 0.974)

• Salary Expenses and Management Quality: correlation = 0.992 on 5540 observations (99% CI: 0.992 to 0.993)

• Equipment Expenses and Management Quality: correlation = 0.953 on 2163 observations (99% CI: 0.948 to 0.958)


• Operating Expenses and Management Quality: correlation = 0.982 on 5552 observations (99% CI: 0.981 to 0.984)

• Total Assets and Management Quality: correlation = 0.994 on 5553 observations (99% CI: 0.994 to 0.995)

• Operating Expenses and Number of Employees: correlation = 0.978 on 4552 observations (99% CI: 0.977 to 0.980)

• Operating Expenses and Equipment Expenses: correlation = 0.958 on 2163 observations (99% CI: 0.953 to 0.962)

• Operating Expenses and Salary Expenses: correlation = 0.993 on 5540 observations (99% CI: 0.993 to 0.994)

• Operating Expenses and Total Assets: correlation = 0.986 on 5552 observations (99% CI: 0.985 to 0.987)

Because the number of observations is large (in the thousands), all upper bounds of the confidence intervals stay below 1 at the 99% confidence level. Therefore, the OLS assumption of no perfect multicollinearity holds. However, management quality is highly correlated with total assets, number of employees, salary expenses, equipment expenses and operating expenses; the models that use it as a control variable are therefore not considered reliable.

3. Methodology

Hypotheses

1. Large companies are more likely to use AI

2. The use of Artificial intelligence increases the net profits of banks (net income and EBIT)

3. The use of Artificial intelligence increases the investment gains of banks

4. The use of Artificial intelligence increases the production of banks

5. The use of Artificial intelligence decreases the number of employees

6. The use of Artificial intelligence increases the salary and furniture expenses

Model

The characteristics of banks using AI and banks not using AI are compared using a difference-in-differences model, in which the treatment group consists of the banks using AI and the control group consists of the banks that do not use AI. The subscript "i" denotes a given bank and "t" the fiscal year.
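The core difference-in-differences calculation can be sketched with group means. This is a minimal illustration of the textbook 2x2 case (one treated group, one control group, one common pre-period and post-period, no controls) on hypothetical numbers. It is not the estimation used in this thesis, which adds control variables. In that 2x2 case, the OLS coefficient on the interaction G × D in a regression on G, D and G × D equals exactly this difference of differences.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences: change in the treated group minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical outcome values (e.g. EBIT in million USD), not thesis data
effect = did_estimate(treated_pre=[100, 120], treated_post=[160, 180],
                      control_pre=[50, 70], control_post=[60, 80])
print(effect)
```

Subtracting the control group's change removes shocks common to both groups over the same period, such as a macroeconomic cycle. The remaining gap is attributed to the treatment.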

Dependent variables

The dependent variables are size, net income, EBIT, investment gains, output, number of employees, salary expenses, equipment expenses and operating expenses.

Independent variable

The independent variable of interest in all the regressions is the AI dummy variable, which captures the use of artificial intelligence.

Control variables

The control variables are inspired by Xiaodong Yuan, Fan Hou & Xuehui Cai (2020) and Simon H. Kwan (2003). In addition, the book-to-market ratio is added to measure value creation, and the selling expenses variable is replaced by operating expenses so as to also include general and administrative expenses, which can affect bank performance.

The coefficients of the control variables size, leverage, age and book-to-market ratio are expected to be positive in the regressions of net income, EBIT and investment gains on AI, because an increase in these control variables should increase the bank's profits. A bigger bank should have higher profits than a small bank; leverage should increase profits by lowering taxable income and increasing investable capital; an older bank should be more established and have business deals that allow it to earn higher profits; and banks with a high book-to-market ratio should have high value creation, which means high performance. The coefficient of operating expenses is expected to be negative in the regressions of net income and EBIT on AI, because these costs are deducted from profits. By contrast, operating expenses should have a positive coefficient in the regression of investment gains on AI, as higher expenses mean higher gross returns and these costs are not subtracted from investment gains.

For the regressions of number of employees, salary, equipment and operating expenses on AI, the loan loss provision is used to proxy for output quality, and its coefficient can be expected to be negative, as banks with high-quality output optimise costs and labour. For the regression of output on AI, the coefficient of output quality can be expected to be positive: the higher the output quality, the higher the demand and therefore the output. Cash and due from banks accounts for the liquidity of the bank, and its coefficient is expected to be positive: while liquid assets reduce the bank's liquidity risk, they may be more costly to handle, as they may involve additional transportation, storage, protection and labour costs (Simon H. Kwan, 2003). On the other hand, more liquid assets also mean less complexity, so less equipment and labour should be needed, and the coefficient of liquidity in the regressions of number of employees, salary, equipment and operating expenses on AI can be expected to be negative. Banks that are more liquid can issue more loans and therefore have a higher output, so the coefficient of liquidity in the regression of output on AI can be expected to be positive. Equity capital captures the quality of bank management and risk preference. To the extent that well-capitalized banks reflect both high-quality management and aversion to risk taking, these banks are likely to be more cost efficient in producing banking outputs (Simon H. Kwan, 2003).

Thus, for the regressions of number of employees, salary, equipment and operating expenses on AI, management quality is expected to have a negative coefficient; and for the regression of output on AI, management quality is expected to have a positive coefficient.

1. Large companies are more likely to use AI

$$\text{Size}_{i,t} = \alpha + \beta_1 \, AI_{i,t} + \beta_2 \, G_i + \beta_3 \, D_t + \gamma \, \text{Controls}_{i,t} + \varepsilon_{i,t}$$

$\text{Size}_{i,t}$ (billion dollars): total assets or log of total assets

$AI_{i,t} = G_i \times D_t$: treatment (the use of AI)

$G_i$: dummy variable capturing the group (=1 if the bank uses AI)

$D_t$: dummy capturing the time (=1 for periods where the firm uses AI)

$\text{Controls}_{i,t}$: leverage, age, ownership, industry, operating expenses, book-to-market ratio

Expected sign for the AI coefficient: +

2. The use of Artificial intelligence increases the net profits of banks

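$$\text{Net Income}_{i,t} = \alpha + \beta_1 \, AI_{i,t} + \beta_2 \, G_i + \beta_3 \, D_t + \gamma \, \text{Controls}_{i,t} + \varepsilon_{i,t}$$

$$\text{EBIT}_{i,t} = \alpha + \beta_1 \, AI_{i,t} + \beta_2 \, G_i + \beta_3 \, D_t + \gamma \, \text{Controls}_{i,t} + \varepsilon_{i,t}$$

$\text{Net Income}_{i,t}$ (million dollars), $\text{EBIT}_{i,t}$ (million dollars)

$AI_{i,t} = G_i \times D_t$: treatment (the use of AI)

$\text{Controls}_{i,t}$: size, leverage, age, ownership, industry, operating expenses, book-to-market ratio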

3. The use of Artificial intelligence increases the investment gains of banks

$$\text{Investment Gain}_{i,t} = \alpha + \beta_1 \, AI_{i,t} + \beta_2 \, G_i + \beta_3 \, D_t + \gamma \, \text{Controls}_{i,t} + \varepsilon_{i,t}$$

$\text{Controls}_{i,t}$: size, leverage, age, ownership, industry, operating expenses, book-to-market ratio

4. The use of Artificial intelligence increases the production of banks

$$\text{Production}_{i,t} = \alpha + \beta_1 \, AI_{i,t} + \beta_2 \, G_i + \beta_3 \, D_t + \gamma \, \text{Controls}_{i,t} + \varepsilon_{i,t}$$

$\text{Production}_{i,t}$ = total assets

$G_i$: dummy variable capturing the group (=1 if the bank uses AI)

$D_t$: dummy capturing the time (=1 for periods where the firm uses AI)

$\text{Controls}_{i,t}$: Output Quality, Liquidity, Management Quality

5. The use of Artificial intelligence decreases the number of employees

$$\text{Number of employees}_{i,t} = \alpha + \beta_1 \, AI_{i,t} + \beta_2 \, G_i + \beta_3 \, D_t + \gamma \, \text{Controls}_{i,t} + \varepsilon_{i,t}$$

$\text{Number of employees}_{i,t}$ (thousand)

$\text{Controls}_{i,t}$: Size (log of total assets), Output Quality, Liquidity, Management Quality

Expected sign for the AI coefficient: -

6. The use of Artificial intelligence increases the salary and furniture expenses

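$$\text{Salary Expenses}_{i,t} = \alpha + \beta_1 \, AI_{i,t} + \beta_2 \, G_i + \beta_3 \, D_t + \gamma \, \text{Controls}_{i,t} + \varepsilon_{i,t}$$

$$\text{Furniture Expenses}_{i,t} = \alpha + \beta_1 \, AI_{i,t} + \beta_2 \, G_i + \beta_3 \, D_t + \gamma \, \text{Controls}_{i,t} + \varepsilon_{i,t}$$

$$\text{Operating Expenses}_{i,t} = \alpha + \beta_1 \, AI_{i,t} + \beta_2 \, G_i + \beta_3 \, D_t + \gamma \, \text{Controls}_{i,t} + \varepsilon_{i,t}$$

$\text{Controls}_{i,t}$: Size (log of total assets), Output Quality, Liquidity, Management Quality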

Assumptions of the diff-in-diff model

In order to implement a diff-in-diff model, the following assumptions need to hold:

1. Common trend assumption: in the absence of treatment, the performance of US banks using AI and of US banks not using AI would have followed the same trend. This assumption should hold in this setting, as both groups of US banks should have had the same trend before the treatment.

2. Exogeneity: if AI is correlated with the error term, the estimate of 𝛽₁ is biased (upwards if the correlation is positive). The model should not suffer from omitted variable bias, as it includes the control variables recommended in the previous literature on bank performance, and the R-squared of the following regressions is above 0.9.

3. Random sample: the variables from Compustat are assumed to be independent and identically distributed. Similarly, the USPTO patent database gathers all the patents registered in the USA.


4. No perfect multicollinearity: the correlation matrices and the confidence intervals of the correlation coefficients show that all pairs of variables used in the different regressions have a correlation below 1.

5. No outliers: the summary statistics do not show values far outside the usual range of the data.
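The common trend assumption in item 1 cannot be tested directly. One common informal check, which this thesis does not report, is to compare the average pre-treatment change of the yearly group means in the treated and control groups. The sketch below does this under simplifying assumptions: a single set of pre-treatment years shared by both groups, which sidesteps the staggered treatment dates in this dataset, and hypothetical outcomes stored as `{(bank, year): value}`. A gap close to zero is consistent with common trends but does not prove them.

```python
from statistics import mean


def group_mean_path(outcomes, banks, years):
    """Yearly mean outcome of a group of banks (banks missing in a year are skipped)."""
    return [mean(outcomes[(b, y)] for b in banks if (b, y) in outcomes) for y in years]


def average_change(path):
    """Average year-on-year change along a path of yearly means."""
    return mean(later - earlier for earlier, later in zip(path, path[1:]))


def pre_trend_gap(outcomes, treated, control, pre_years):
    """Treated-minus-control gap in the average pre-treatment change; near 0 under common trends."""
    return (average_change(group_mean_path(outcomes, treated, pre_years))
            - average_change(group_mean_path(outcomes, control, pre_years)))


# Hypothetical outcomes {(bank, fiscal year): value}, not thesis data
outcomes = {
    ("A", 2011): 10, ("A", 2012): 12, ("A", 2013): 14,
    ("B", 2011): 20, ("B", 2012): 22, ("B", 2013): 24,
    ("C", 2011): 5, ("C", 2012): 7, ("C", 2013): 9,
    ("D", 2011): 15, ("D", 2012): 18, ("D", 2013): 21,
}
gap = pre_trend_gap(outcomes, treated={"A", "B"}, control={"C", "D"},
                    pre_years=[2011, 2012, 2013])
print(gap)
```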

Strength of the model

Banks in the dataset whose AI variable equals 1 had it equal to 0 in earlier years. They therefore belong to both the control and the treatment groups, which makes it possible to compare their post-treatment performance with their own pre-treatment performance.

Moreover, observing the change in the outcome variable after treatment in the treatment group and comparing it with the change in the control group makes it possible to estimate the causal effect of the treatment. The control group is a valid counterfactual for the treatment group provided that the treatment group would have behaved like the control group in the absence of treatment. The control group is not affected by the treatment of interest, but it is similarly affected by other changes that may occur around the time of treatment. Because the performance of the treatment group is compared with that of the control group, a difference-in-differences model removes the effect of variables that are constant over time and the impact of macroeconomic cycles on returns. It is intended to mitigate the effects of extraneous factors and selection bias.

The management strategy of investors changes over time, but since their performance is aggregated, the model should not suffer from endogeneity: because the effect is averaged over several banks, it is unlikely that all the banks that applied for an AI patent also changed another component that affects their performance. The model also removes the influence of the sector and of individual companies. The t-statistic indicates whether the estimated effect is statistically significant.

The application date gives a clear date for the treatment. Unlike diff-in-diff models that study policy changes, the model in this research clearly isolates the effect of interest and can be generalised.
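How the application date translates into the treatment variables can be sketched as follows. This is a minimal illustration, not the thesis's data pipeline. It follows the text's definitions: G is 1 for banks that use AI, D is 1 in periods where the bank uses AI, and AI = G × D. It additionally assumes that D switches on in the year of the bank's first AI patent application and stays on afterwards. The dictionary layouts and bank identifiers are hypothetical.

```python
def treatment_dummies(panel_years, first_ai_application):
    """G, D and AI = G * D for every observed (bank, year).

    panel_years: {bank: list of fiscal years observed}
    first_ai_application: {bank: year of the bank's first AI patent application};
        banks missing from this dict never apply for an AI patent.
    """
    dummies = {}
    for bank, years in panel_years.items():
        start = first_ai_application.get(bank)
        g = int(start is not None)  # group dummy: 1 if the bank ever uses AI
        for year in years:
            # time dummy as described in the text: 1 in periods where the bank uses AI,
            # assumed here to start in the application year
            d = int(start is not None and year >= start)
            dummies[(bank, year)] = {"G": g, "D": d, "AI": g * d}
    return dummies


# Hypothetical panel, not thesis data
panel = {"bank_A": [2011, 2012, 2013], "bank_B": [2011, 2012]}
print(treatment_dummies(panel, {"bank_A": 2012}))
```

In this example bank_A has AI = 0 in 2011 and AI = 1 from 2012 onwards. It therefore contributes both pre-treatment and post-treatment observations, as described in the first paragraph of this subsection.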

Limitation of the model and the data

This method may still be subject to certain biases inherent to difference-in-differences models, especially reverse causality (firms that perform well are more likely to be able to invest in AI technologies, so the use of AI itself can be endogenous) and omitted variable bias (the effect measured might be
