The Acceptance of AI Systems as Perceived by Employees
A Quantitative Research in the Dutch Banking Sector
University of Amsterdam – Amsterdam Business School
Executive Programme in Management Studies – Digital Business Track
Author : Frans van Vliet
Student number : 12531472
Date : January 23rd, 2021
Version : Final
Supervisor : Dr. H. Güngör
EBEC approval number : EC 20200904090949
Front cover image from: Adobe Stock file # 135958479
Statement of Originality
This document is written by Student Frans van Vliet who declares to take full responsibility for the contents of this document.
I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it.
The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.
Abstract
Abbreviations
1. Introduction
2. Literature review
2.1 The definition and capabilities of Artificial Intelligence
2.2 Examples of AI within banking
2.3 The Technology Acceptance Model (TAM)
2.3.1. The original TAM
2.3.2. TAM 2
2.3.3. Criticism on the TAM
2.4 Acceptance
2.5 Research hypotheses
2.5.1. Job relevance
2.5.2. Output quality
2.5.3. Result demonstrability
2.5.4. Perceived usefulness
2.5.5. Customer proximity
2.5.6. Knowledge of AI
2.5.7. Age
2.6 Research model
2.7 Hypotheses
3. Research design & Methodology
3.1 Research design
3.2 Population and Sample
3.3 Measurements
4. Results
4.1 Preliminary steps
4.2 Demographics
4.3 Reliability and normality analysis
4.4 Correlation analysis
4.5 Hypothesis testing
4.5.1. Direct effects
4.5.2. Indirect effects
4.6 Summary of the results
4.6.1. Outcome model
4.6.2. Results of hypothesis testing
5. Discussion
5.1 General discussion
5.2 Theoretical and managerial contribution
5.3 Limitations
5.4 Suggestions for future research
6. Conclusion
References
Appendix I: Survey measurements
Appendix II: Online survey
Appendix III: Survey responses
Appendix IV: Analysis on Customer Proximity
Abstract
Artificial Intelligence (AI) systems are already used within the banking sector, and the development of new systems is expected to accelerate in the future. This development will have an impact on bank employees and the way they perform their tasks. This study proposes a research model based on the theoretical framework of TAM2, validated by a survey conducted among 330 bank employees. The study concludes that Job relevance, Output quality and Result demonstrability have a direct positive effect on Perceived usefulness, which in turn has a direct positive effect on the Acceptance of AI systems. The moderating effects of the employee characteristics age, knowledge of AI and customer proximity were minor and not statistically significant. Suggestions for further research are provided.
Keywords: AI, Technology Acceptance Model, TAM2, bank employees, customer proximity
Abbreviations
AI Artificial Intelligence
BI Behavioural Intention
EAI Experience with AI
IT Information Technology
KAI Knowledge of AI
ML Machine Learning
NN Neural Network
OUT Output quality
PEOU Perceived Ease of Use
PU Perceived Usefulness
REL Job relevance
RES Result demonstrability
TAM Technology Acceptance Model
TRA Theory of Reasoned Action
1. Introduction
Artificial Intelligence (AI) systems have been applied within the banking sector for decades and are used across a range of diverse activities within such organizations, whether it is analysing a loan request, detecting credit card fraud and money laundering, forecasting credit risks or answering questions from a client using a chatbot. The development of new AI systems is expected to increase even more rapidly in the future (McKinsey, 2018b). The impact of this development is receiving increased academic interest, with conflicting conclusions: on the one hand it is expected that humans and AI systems will work together (Davenport & Bean, 2017; Jarrahi, 2018), while on the other hand there is the belief that employees will be replaced altogether for efficiency purposes (Frischmann & Selinger, 2017). Research performed by Güngör (2020) showed that employees, compared to other organizational stakeholders such as customers and suppliers, are perceived as the most negatively impacted stakeholders. Regardless of which scenario unfolds, it is clear that the development of this technology will have an impact on employees and the way they perform their tasks. Successful adoption of AI systems means that the user, in the context of this research an employee, needs to accept the outcome of the system, to ensure the return on organizational investments made in this technology.
Much research has been done to determine whether a user is likely to accept a new technology, with the Technology Acceptance Model (TAM) as a commonly used theoretical framework. The first model by Davis (1989) and the extended TAM2 model from Venkatesh & Davis (2000) state that perceived usefulness and perceived ease of use are fundamental determinants of user acceptance. The TAM has been applied to both consumer acceptance and employee acceptance. However, little is known about the acceptance of AI technology by employees, as the TAM has not yet been applied to AI in this context. As far as can be ascertained, current academic research has also not yet examined the effect of customer proximity on employee acceptance of a technology. Thus far the literature has treated employees as a homogeneous group of people, to which certain variables are added such as age, gender or self-efficacy (Morris & Venkatesh, 2000). However, given the various roles and functions within organizations, it is important to examine the possible effect of customer proximity on accepting a new technology. Both gaps will be addressed in this research and lead to the following research question:
What factors influence employee acceptance of AI systems and how do employee characteristics affect these factors?
This research aims to make a theoretical contribution by addressing two identified literature gaps: the application of the TAM to AI from the perspective of employees, and the effect of customer proximity on the acceptance of AI. This research aims to make a managerial contribution as well. Given the expected increase in the development of AI systems, it is important for organizations to understand whether usage behaviour is similar between roles that are more proximate to customers and those that are less proximate. The findings of this research can be used for training, communication and development purposes, where either a one-size-fits-all or a role-specific approach is needed to increase employee acceptance of AI systems within the banking sector.
This study begins with a literature review in Chapter 2, in which the theoretical background is explored, starting with the definition and capabilities of AI, followed by examples of AI in banking. Next, the Technology Acceptance Model (TAM) is elaborated, discussing the original TAM and the extended TAM2 version. This chapter also addresses the research hypotheses and the proposed research model. Chapter 3 discusses the research design and methodology, including the data collection techniques and the measurement of constructs in this research. The hypotheses are tested and the results are presented in Chapter 4, followed by a discussion of the results in Chapter 5. That chapter discusses the theoretical and managerial implications of the results, as well as the limitations of this research and suggestions for further research. Finally, this research ends with the conclusion in Chapter 6, summarizing the main findings, contributions, limitations and suggestions for future research.
2. Literature review
This chapter explores the theoretical background of this study, starting with the definition and capabilities of AI in paragraph 2.1 and followed by examples of AI in banking in paragraph 2.2. Next, the Technology Acceptance Model (TAM) is elaborated in paragraph 2.3, discussing the original TAM, the extended TAM2 version and the criticism on this model. Paragraph 2.4 elaborates on the role of acceptance, followed by a description of the research hypotheses in paragraph 2.5. The proposed research model is presented in paragraph 2.6 and this chapter ends with an overview of the hypotheses in paragraph 2.7.
2.1 The definition and capabilities of Artificial Intelligence
The definition of AI is as ambiguous as its applicability. Three concise definitions are mentioned in Table 1.
Definition of AI | Reference
"the theory and development of computer systems able to perform tasks normally requiring human intelligence" | Oxford English Dictionary
"the replication of human analytical and/or decision-making capabilities" | Finlay (2018)
"a system's ability to interpret external data correctly, to learn from such data, and to use those learnings to achieve specific goals and tasks through flexible adaptation" | Haenlein (2019)
Table 1: Definitions of AI
AI systems have the capability to capture structured and unstructured data (e.g. image and speech recognition), find patterns in this data and predict what will happen next (Burgess, 2018). The general consensus in the literature is that AI systems perform most tasks faster, better and cheaper than humans (Finlay, 2018; Burgess, 2018; Kaplan & Haenlein, 2019). Most of these capabilities are based on Machine Learning (ML) techniques, which use algorithms to analyse data and to discover clusters and useful relationships between different items of data. Once the clusters and the relations between them have been identified, predictions can be made for new inputs (Finlay, 2018).
The capabilities of ML are categorized into three major types: supervised learning, unsupervised learning and reinforcement learning (McKinsey, 2018a; Finlay, 2018). Supervised learning algorithms use data sets that have been classified or labelled by humans to find the relation between the input variables and the output. Examples of supervised learning algorithms are classification models (e.g. decision trees to predict the probability that a fraudulent transaction is occurring) and regression models (e.g. linear regression to predict the amount of credit loss). Unsupervised learning algorithms use unlabelled data sets to find patterns or clusters of similar points within the data, for example a Neural Network (NN) for text recognition or K-means clustering, which segments customers based on demographics (e.g. age, gender) or geography (e.g. city, street). Reinforcement learning algorithms learn to perform a task by trying to maximize the reward received for their actions, optimizing the best series of actions by correcting themselves over time. Examples are varied, from stock trading to physical robots learning to climb stairs.
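To make the distinction concrete, the unsupervised clustering example above can be sketched in a few lines of code. This is an illustrative toy implementation of one-dimensional k-means on hypothetical customer ages (not data from this study), not a production algorithm:

```python
# Minimal 1-D k-means sketch (k = 2) illustrating unsupervised learning:
# the algorithm receives unlabelled data and discovers clusters itself.

def kmeans_1d(data, k=2, iterations=10):
    # Initialise centroids on the first k distinct sorted values.
    centroids = sorted(set(data))[:k]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

ages = [21, 23, 25, 24, 61, 64, 66, 63]  # hypothetical, two obvious segments
centroids, clusters = kmeans_1d(ages)
print(sorted(round(c) for c in centroids))  # [23, 64]
```

The algorithm recovers the two age segments without ever being told a label, which is the defining property of unsupervised learning; a supervised model would instead be given the segment labels and learn the mapping from inputs to those labels.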
The differences between types of ML affect the level of understandability and transparency per type of algorithm. Scorecards and decision trees are more transparent in how they arrive at a certain output. Neural Networks (NN) are more complex and often considered a 'black box', making it difficult to trace, through their so-called 'hidden layers', how the system came to an output (Finlay, 2018).
Although there is a wide array of AI technologies to apply to a use case, there is no single best type of model that should always be used, as the choice depends on the type of problem, the type of data and the objectives (Finlay, 2018). This is supported by the extensive research of Bahrammirzaee (2010), who compared the use of different AI techniques across three domains (credit evaluation, portfolio management and financial planning) and concluded that some techniques outperform others depending on the domain and type of data. A similar finding was made by Coakley & Brown (2000), whose research on Artificial Neural Networks (ANN) in finance and accounting concludes that there is no formal theory to determine the optimal network model, and that during development experiments must be conducted to determine the performance differences between alternative models.
2.2 Examples of AI within banking
The use of AI systems within banks is widespread across multiple parts of the value chain, both in the front office and the back office (McKinsey, 2020). In the front office, NN and Decision Trees have been applied for decades to analyse and approve credit applications (Hawley et al., 1990; Jensen, 1992). Within the field of marketing, NN and Decision Trees are used to identify the most profitable groups of customers for a marketing campaign and to predict the purchasing behaviour of customers (Sing'oei & Wang, 2013; Moro et al., 2014). Customers can gain more insight into their personal finances, as payment transactions can be classified into categories like 'groceries' and 'housing'. Customers can also interact with a chatbot via digital channels like an app or website, and if the customer prefers the telephone, audio transcriptions can be made using Natural Language Processing (NLP). AI systems are not only used to attract new clients, but also to retain them, as Classification and Regression Tree (CART) models use customer characteristics to predict customer churn (Prasad & Madhavi, 2012).
In the back office different types of AI systems are used for anti-money laundering (AML) (Han et al., 2020) and credit card fraud detection (Ghosh & Reilly, 1994; Gomez et al., 2018), analysing millions of transactions to detect the ones that have an increased possibility
of being a malicious transaction. Image recognition technologies are able to recognize invoices and legal documents, which drastically speeds up operational processes (Deloitte, 2018).
Within the field of risk management, Deep Learning techniques are used to forecast company bankruptcy based on firms' textual disclosures in annual reports (Mai et al., 2019) or on news text and basic event information (Rönnqvist & Sarlin, 2017). A majority of examples of AI within banking involve some form of prediction, as prediction tasks are crucial (Cavalcante et al., 2016). This includes exchange rate predictions (e.g. Shen et al., 2015; Zheng et al., 2017), stock market predictions (e.g. Matsubara et al., 2018; Singh & Srivastava, 2017) and macroeconomic predictions (Sevim et al., 2014; Chatzis et al., 2018). Although AI is already applied in several parts of the value chain, many more potential applications within banking are waiting to be exploited (Hassani et al., 2020).
2.3 The Technology Acceptance Model (TAM)
The Technology Acceptance Model (TAM) is a commonly used theoretical framework to explain the drivers of users' acceptance of information technology (IT). This section briefly elaborates on the variations of the TAM and addresses some of the criticism it has received over time.
2.3.1. The original TAM
The first model by Davis (1989) was built on the Theory of Reasoned Action (Fishbein & Ajzen, 1975) and states that perceived usefulness and perceived ease of use are fundamental determinants of user acceptance (see Figure 1). In the work of Davis (1989, p. 320), perceived usefulness is described as "the extent to which a person believes that using the system will enhance his or her job performance". The same research describes perceived ease of use as
"the extent to which a person believes that using the system will be free of effort". The model has been applied to both consumer acceptance and employee acceptance of new technologies, and one of the reasons it is widely accepted is that the TAM explains on average about 40% of the variance in usage intentions (Venkatesh & Davis, 2000). Examples of prior research applying the TAM to employee acceptance of a new technology include topics such as a computer banking system (Brown et al., 2002; Dalcher & Shine, 2003), an e-learning system (Lee et al., 2011) and a frequently used application chosen by the respondent (Koh et al., 2010).
Figure 1: Technology Acceptance Model (Davis, 1989)
2.3.2. TAM 2
The original TAM has been extended with social influence processes (subjective norm, voluntariness and image) and cognitive instrumental processes (job relevance, output quality, result demonstrability and perceived ease of use) in the TAM2 model from Venkatesh & Davis (2000), as shown in Figure 2 below. Their study found that the added constructs significantly influence user acceptance. The constructs of Job relevance, Output quality and Result demonstrability are considered particularly valuable for this research on AI systems, given AI's possible impact on employees, its wide applicability and its complexity. The constructs applied in this research are elaborated in later sections.
Figure 2: Extension of the Technology Acceptance Model – TAM2 (Venkatesh & Davis, 2000)
2.3.3. Criticism on the TAM
A model as widely used as the TAM is not immune to criticism. Some studies state that the simplicity of the model limits its explanatory power of usage behaviour (e.g. Bagozzi, 2007), and adding additional variables can cause model over-fitting rather than yielding a refined model (Brock & Khan, 2017). Empirical research on multiple TAM studies, using the original TAM and TAM2, shows that results are not fully consistent, which suggests that significant factors are not included in the models (Legris et al., 2003).
The TAM has often been compared with other theoretical models. Although prior research (Taylor & Todd, 1995) concludes that the Theory of Planned Behaviour (TPB) explains more of behavioural intention than the TAM, the TAM has consistently outperformed the TRA and TPB in terms of explained variance across multiple studies (e.g. Davis et al., 1989; Venkatesh et al., 2003), making it a suitable model for this research.
2.4 Acceptance
The acceptance of an AI system, or the behavioural intention (BI) to use an AI system, is the dependent variable in this research. As described above, the main driver of user acceptance is perceived usefulness, but acceptance encompasses more than that. Acceptance is intertwined with the trust a user has in a system, as human operators tend to use only automation they trust and refuse to use automation they do not trust (Dzindolet et al., 2003; Pop, 2015). In addition, transparency and understandability lead to higher acceptance of automated systems (Seong & Bisantz, 2008). Looking at AI systems specifically, the main challenge for operationalizing AI systems is that employees need to accept and be comfortable with the automated decisions made (Finlay, 2018). Acceptance depends on trust in the technology and how it operates (Hengstler et al., 2016), yet according to one study only 16 percent of employees trust AI-generated insights (McKinsey, 2018b). Employee acceptance of an AI system cannot be taken for granted: the research of Dietvorst et al. (2014) showed that evidence-based algorithms are more accurate in predicting the future than human forecasters, yet people continue to favour the human forecaster even after seeing the algorithm outperform humans. The authors introduced the term algorithm aversion and showed that people lose confidence in algorithms more quickly after seeing them make a mistake. This aversion can result in underutilization of a system and nullify the investment of company resources. Employee attitudes towards AI are thus more complex than expected, resulting in the paradox that the same employee can hold positive or negative attitudes towards AI, depending on the particular situation and application (Lichtenthaler, 2019).
2.5 Research hypotheses
This section elaborates on the hypotheses addressed in this study and the corresponding conceptual framework. This research builds on the TAM2 model of Venkatesh & Davis (2000) and applies the cognitive instrumental processes, all of which have been demonstrated to be significant. It is important to note that one key variable in the TAM, perceived ease of use, is not included in this research, for multiple reasons. First, perceived usefulness is the main predictor of usage intention, not perceived ease of use (Davis et al., 1989; Subramanian, 1994). Second, perceived ease of use is less suitable for this research due to the wide applicability of AI systems: in contrast to much other research, which focuses on one specific, often existing system, the majority of respondents in this research will most likely not have worked with an AI system before. Third and finally, the wide applicability of AI systems makes it difficult to narrow down respondents' views on this technology and to answer questions related to interacting with and using a specific system. An extension to the TAM2 has been made to address the earlier mentioned literature gap regarding customer proximity.
2.5.1. Job relevance
Job relevance is one of the cognitive instrumental processes added in the TAM2 model and is described as "an individual's perception regarding the degree to which the target system is applicable to his or her job" (Venkatesh & Davis, 2000, p. 191). It relates to the importance of the set of tasks the system is capable of supporting, and the longitudinal study of Venkatesh and Davis (2000) has shown that job relevance has a positive effect on perceived usefulness for both mandatory and voluntary use of systems. The above-mentioned variety of examples of AI within banks indicates that AI systems are capable of supporting employees in their tasks, and a user will trust automation more if the system seems able to achieve the specific user's goal (Lee & See, 2004). To investigate the effect of Job relevance on Perceived usefulness, the following hypothesis is proposed:
Hypothesis 1: Job relevance will have a positive direct effect on Perceived usefulness
2.5.2. Output quality
The perception of output quality is described as the way people will take into consideration how well the system performs tasks (Venkatesh & Davis, 2000) and the relationship between perceived output quality and perceived usefulness has been shown before (Davis et al., 1992).
Research has shown that machines deliver higher decision quality compared to humans (Jarrahi, 2018) and algorithms are more accurate in predicting the future compared to human forecasters (Dietvorst et al., 2014). The general consensus in literature is that AI systems perform most tasks faster, better and cheaper than humans (Finlay, 2018; Burgess, 2018;
Kaplan & Haenlein, 2019). To investigate the effect of Output quality on Perceived usefulness, the following hypothesis is proposed:
Hypothesis 2: Output quality will have a positive direct effect on Perceived usefulness
2.5.3. Result demonstrability
Defined by Moore and Benbasat (1991, p. 203) as the “tangibility of the results of using the innovation”. In other words, the improvement in job performance can be attributed specifically to their use of the system. The human operator will trust automation more if the underlying algorithms are understandable (Lee & See, 2004) and a similar statement is made by Seong &
Bisantz (2008) who claim that transparency and understandability lead to a higher acceptance of automated systems. However, as addressed earlier, the level of transparency and understandability differ per type of algorithm (e.g. scorecards and decision trees are more transparent compared to NN). This could indicate that potentially the improvement in job
- 19 - performance cannot always be attributed to the AI system, dependent on the type of algorithm.
To investigate the effect of Result demonstrability on Perceived usefulness, the following hypothesis is proposed:
Hypothesis 3: Result demonstrability will have a positive direct effect on Perceived usefulness
2.5.4. Perceived usefulness
The consistent variable in all versions of the TAM is perceived usefulness, defined as "the extent to which a person believes that using the system will enhance his or her job performance" (Davis, 1989, p. 320). Perceived usefulness has been a strong determinant of usage intentions in the many empirical tests of the TAM (Venkatesh & Davis, 2000). This indicates that bank employees who consider AI systems useful for their job performance are likely to accept the AI technology at work. Therefore, the following hypothesis is proposed:
Hypothesis 4: Perceived usefulness will have a positive direct effect on the acceptance of AI systems
2.5.5. Customer proximity
Banks and other service firms are often involved with a high level of customer interaction, but not all activities related to providing a certain service require customer interaction (Metters & Vargas, 2000). Employees and their activities are often divided into groups that differ in customer interaction. According to Liao and Subramony (2008) there are three distinct categories of functional roles, each with its own proximity to external clients. The most proximate roles are the 'customer contact roles', who directly help and service external customers, followed by 'production roles'. The least proximate roles are described as 'support roles' (p. 318), who have little to no interaction with customers, for example employees working in the HR or IT department. Dividing employees into three groups that differ in customer interaction has been used before (Zomerdijk & De Vries, 2007; Wangenheim et al., 2007), as can be seen in the customer proximity spectrum in Figure 3 below.
Zomerdijk & De Vries (2007): back office | mid office | front office
Wangenheim et al. (2007): no customer interaction | limited customer interaction | high degree of customer interaction
Liao & Subramony (2008): support roles | production roles | customer contact roles
Figure 3: The customer proximity spectrum
The relation between customer interaction and employees is described in various studies. For example, Parker and Axtell (2001) state that more customer interaction leads to more accumulated knowledge of the customer's perspective and a better understanding of customer needs. Employees in more customer-facing roles have different skills, attitudes and a higher customer orientation (Liao & Subramony, 2008), and high- and low-contact activities make different demands on staff and technology (Zomerdijk & De Vries, 2007).
Client interaction involves maintaining the client relationship and justifying any decision made, e.g. accepting or rejecting a product application. Prior research has shown that transparency contributes to the success of a business relationship (Eggert & Helm, 2003) and to customer satisfaction (Eskildsen & Kristensen, 2007). However, the transparency and understandability of AI systems can differ depending on the type of AI system, as described above. This could indicate that the level of customer proximity affects the acceptance of AI systems, although this has not been researched before. Therefore, this research addresses this gap in the literature and explores the moderating effect of customer proximity on the acceptance of AI systems, where the expectation is that higher customer proximity will lead to lower acceptance of AI systems.
2.5.6. Knowledge of AI
The importance of AI knowledge amongst employees cannot be neglected in this research, as the lack of talent with AI-related skills and knowledge is seen as one of the most significant barriers to adopting AI (McKinsey, 2018b). Employees who are more knowledgeable about a system are more likely to adopt it (Thong, 1999); hence, educating all employees is needed to ensure AI adoption (Fountaine et al., 2019). There is a significant body of research supporting the relation between the knowledge a user has of a system and their acceptance of that system. The research of Eastwood & Luther (2016) states that users show an increase in both satisfaction and willingness to adopt a system after receiving information about it. A similar result was found by Yeomans et al. (2018), who state that explanation of the system leads to increased understanding and a higher willingness to use it. This research explores the moderating effect of knowledge of AI on the acceptance of AI systems, where the expectation is that higher knowledge of AI will lead to higher acceptance of AI systems.
2.5.7. Age
The effect of age is an interesting variable for multiple reasons. First, the effect of age on the acceptance of new technologies has been described in research before. For example, Morris and Venkatesh (2000) concluded that younger users are more likely to use a new technology than older users. Second, the population in the Netherlands is aging, as the average age in the Netherlands had increased to 42 years by the end of 2019 (CBS Statline). The average age at the three largest banks in the Netherlands, however, is above 50 years, and employees within this age group are not investing as much in sustainable employability (FD, 2016). This research explores the moderating effect of age on the acceptance of AI systems, where the expectation is that a higher age will lead to lower acceptance of AI systems.
To analyse the possible moderating effect of the three aforementioned employee characteristics (customer proximity, knowledge of AI and age), the following hypotheses are proposed.
Hypothesis 5: The relationship between Job relevance and Perceived usefulness is moderated by (a) customer proximity, (b) knowledge of AI and (c) age
Hypothesis 6: The relationship between Output quality and Perceived usefulness is moderated by (a) customer proximity, (b) knowledge of AI and (c) age
Hypothesis 7: The relationship between Result demonstrability and Perceived usefulness is moderated by (a) customer proximity, (b) knowledge of AI and (c) age
Hypothesis 8: The relationship between Perceived usefulness and the acceptance of AI systems is moderated by (a) customer proximity, (b) knowledge of AI and (c) age
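Moderation hypotheses such as these are commonly tested by adding an interaction term between the predictor and the moderator to a regression of the outcome. The sketch below illustrates the idea with simulated data (not the actual survey responses), using Job relevance and age as an example; the variable names and effect sizes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 330  # matches the survey sample size; the data itself is simulated

# Simulated, mean-centred scores: rel = Job relevance, age = moderator.
rel = rng.normal(size=n)
age = rng.normal(size=n)
# Simulated Perceived usefulness with a small negative REL x AGE interaction.
pu = 0.5 * rel - 0.1 * rel * age + rng.normal(scale=0.5, size=n)

# Moderated regression: PU ~ intercept + REL + AGE + REL*AGE.
X = np.column_stack([np.ones(n), rel, age, rel * age])
beta, *_ = np.linalg.lstsq(X, pu, rcond=None)
print(beta)  # [intercept, REL effect, AGE effect, interaction]
```

A statistically significant interaction coefficient (the last element of `beta`) would indicate that age moderates the relationship between Job relevance and Perceived usefulness; in practice a package such as SPSS PROCESS or statsmodels would also report the standard errors and p-values needed for that judgement.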
2.6 Research model
Based on the explored variables, and in order to address the identified gap in the literature, the following research model is applied, as shown in Figure 4.
[Figure 4 depicts the research model: Job relevance (H1), Output quality (H2) and Result demonstrability (H3) are hypothesized to affect Perceived usefulness, which in turn affects the acceptance of AI systems (H4); Customer proximity, Knowledge of AI and Age are included as moderators (H5–H8).]
Figure 4: Research model
2.7 Hypotheses
Below, an overview is presented of the proposed hypotheses from paragraph 2.5 and Figure 4, which are tested in this research.
H1 Job relevance will have a positive direct effect on Perceived usefulness
H2 Output quality will have a positive direct effect on Perceived usefulness
H3 Result demonstrability will have a positive direct effect on Perceived usefulness
H4 Perceived usefulness will have a positive direct effect on Behavioural intention
H5a The relationship between REL and PU is moderated by customer proximity
H5b The relationship between REL and PU is moderated by knowledge of AI
H5c The relationship between REL and PU is moderated by age
H6a The relationship between OUT and PU is moderated by customer proximity
H6b The relationship between OUT and PU is moderated by knowledge of AI
H6c The relationship between OUT and PU is moderated by age
H7a The relationship between RES and PU is moderated by customer proximity
H7b The relationship between RES and PU is moderated by knowledge of AI
H7c The relationship between RES and PU is moderated by age
H8a The relationship between PU and BI is moderated by customer proximity
H8b The relationship between PU and BI is moderated by knowledge of AI
H8c The relationship between PU and BI is moderated by age
3. Research design & Methodology
This chapter introduces the methods used to collect and analyse data. First, paragraph 3.1 describes the research design, followed by a description of the population and sample in paragraph 3.2. Lastly, paragraph 3.3 describes the measurement of factors.
3.1 Research design
This research is quantitative, as the majority of academic research on the TAM has used a survey with questions answered on a 7-point Likert scale (e.g. Davis, 1989; Venkatesh & Davis, 2000; Brown et al., 2002). This research applies a similar methodology with comparable questions in the questionnaire, to determine whether similar findings emerge. The online survey was created in Qualtrics and starts with an introduction which briefly explains the purpose of this research and provides a few examples of AI within banking.
The remainder of the survey consists of nine questions, some of which contain multiple statements. The survey was available in Dutch and English. Before publishing the survey, a pre-test was conducted among nine persons; based on their feedback, minor adjustments were made to the introduction and the sequence of questions. Paragraph 3.3 elaborates on the measurements, and Appendix II contains an overview of the final survey questions. A survey design summary can be found in Table 2.
Survey Design Summary
Method: Online survey
Population size: 159,300
Sampling technique: Convenience sampling
Confidence level: 95%
Margin of error: 5%
Sample size: 384
Survey tool: Qualtrics
Survey participants: Colleagues from the author's company, connections on LinkedIn
Survey response period: November 4th, 2020 till November 25th, 2020
Table 2: Survey design summary
3.2 Population and Sample
The population consists of every employee working at a bank in the Netherlands. The latest available public data indicates that 165,800 employees were working in the Dutch banking sector at the end of December 2019 (CBS Statline). Based on a 95% confidence level and a 5% margin of error, the ideal sample size is 384 (Qualtrics).
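The ideal sample size reported by the Qualtrics calculator can be reproduced with Cochran's formula plus a finite-population correction. The sketch below is an assumption about the underlying calculation, not the tool's documented method:

```python
import math

def cochran_sample_size(population: int, z: float = 1.96,
                        margin: float = 0.05, p: float = 0.5) -> int:
    """Cochran's formula with finite-population correction.

    z = 1.96 corresponds to a 95% confidence level; p = 0.5 is the
    most conservative assumption about response variability.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2          # infinite-population size
    return math.ceil(n0 / (1 + (n0 - 1) / population))  # finite-population correction

print(cochran_sample_size(165_800))  # 384
```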
The sampling technique used in this research is convenience sampling, and the data was gathered via an online survey distributed among employees within the banking sector in the Netherlands. The period for questionnaire distribution and collection was three weeks.
The author is employed at one of the three largest banks in the Netherlands. This company makes use of Yammer, a software application from Microsoft which allows companies to enhance communication and collaboration amongst employees by posting messages, asking questions and receiving answers from colleagues (Microsoft). A request to participate in the online survey was posted in the bank's 'All Company' group within Yammer, which has 46,195 members; this does not, however, mean that all members saw the request to participate.
Via LinkedIn, respondents from other banks were approached as well, both through a general post and through two LinkedIn groups related to banking and AI/ML. Due to a limited number of responses after ten days, the author asked colleagues from various departments and connections at other banks via email to participate in the survey. Although this resulted in a higher survey response, this method does constitute a limitation of the convenience sampling technique.
Between November 4th, 2020 and November 25th, 2020 a total of 368 responses were collected. After excluding all responses that did not complete the entire survey (38 responses), a total of 330 finished responses remained (N = 330). Although this is below the ideal sample size of 384, it is above the threshold proposed by Green (1991), who states that N > 50 + 8m (where m is the number of IVs) for correlation and regression analysis. The sample also exceeds the general rule of thumb of 300 cases for factor analysis (Tabachnick & Fidell, 1996).
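Green's rule of thumb and the factor-analysis threshold can be checked in a couple of lines, assuming m = 4 predictors (the control variable EAI plus REL, RES and OUT):

```python
def green_min_sample(m: int) -> int:
    """Green's (1991) rule of thumb: N > 50 + 8m for regression with m predictors."""
    return 50 + 8 * m

n = 330                           # finished responses in this study
assert n > green_min_sample(4)    # 50 + 8 * 4 = 82, so N = 330 suffices
assert n >= 300                   # Tabachnick & Fidell (1996) rule of thumb
```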
3.3 Measurements
Theoretical constructs were operationalized using validated items from prior research (see Appendix I). The TAM scales of Perceived Usefulness (PU) and Behavioural Intention (BI) were measured using items adapted from Davis (1989) and Davis et al. (1989). An additional item for BI (BI3) was adopted from Al-Jabri and Roztocki (2015) and was also validated by Verma et al. (2018).
Measures of Result Demonstrability (RES) were adapted from Moore & Benbasat (1991) and measures of Job Relevance (REL) from Davis et al. (1992). For Output Quality (OUT), one item was adapted from Davis et al. (1992) and one new item (OUT2) was added by the author to validate the statement that AI systems perform tasks better than humans (Finlay, 2018; Burgess, 2018; Kaplan & Haenlein, 2019).
The aforementioned items were all measured on a 7-point Likert scale, where 1 = Strongly disagree and 7 = Strongly agree, in line with prior research. Age is measured as a continuous variable using a drop-down menu ranging from '18' to '70'.
As described in paragraph 2.5.5, there are multiple ways to measure Customer Proximity (CP). For this research the items of Wangenheim et al. (2007) are adopted, as the author expects that this description in the survey will be clearest to the respondents.
There are multiple methods to measure knowledge of AI, each with its pros and cons. One method is to ask multiple-choice (MC) questions where only one answer is correct. These questions could range from general (e.g. the origins or often-used examples of AI) to moderately difficult (e.g. the different types of ML) to very difficult (e.g. which type of algorithm is used in an example of AI). However, a respondent can select the correct answer without actually having knowledge of AI, the so-called "guessing factor". Similarly, if a respondent selects an incorrect answer, this could indicate that the respondent does not know the answer and thus has less knowledge of AI. However, as Hunt (2003) points out, this inference of being uninformed is misleading: the respondent could be very sure that the selected incorrect answer is correct, making the respondent misinformed, which is worse than being uninformed (Hunt, 2003). Another method to measure knowledge of AI is self-assessment, where respondents rate themselves on a scale. Table 3 provides an overview of different measurements of knowledge self-assessment in prior research. For this research the 11-point measurement scale of Güngör (2020) is adopted, as the author expects that this description in the survey will be clearest to the respondents. However, it is adapted to a 10-point scale (from 1 to 10), similar to the grading system in Dutch education.
Source | Example question | Measurement scale
Güngör (2020) | What is your AI knowledge | 11-point scale, 0 ('don't know anything about AI') to 10 ('know a lot about AI')
Ifinedo (2016) | Overall computer knowledge level | 5-point scale, 'Far below average' to …
Dreyfus (2014) | | 5-point scale, 'Novice' to 'Expert'
Cegarra-Navarro et al. (2014) | You have acquired new… | 7-point Likert scale, 'Strongly disagree' to 'Strongly agree'
Table 3: Measurement of knowledge self-assessment in prior research
4. Results
Chapter four presents the results of this study based on statistical tests. First, the preliminary steps before data analysis are explained in paragraph 4.1, followed by the demographic analysis of the survey's respondents in paragraph 4.2. Next, the reliability and normality of the measurements are reported in the third paragraph. Paragraph 4.4 contains the correlation analysis, after which the hypothesis testing is presented in paragraph 4.5; that paragraph elaborates on whether the hypotheses of this study are supported or rejected by analysing the direct and indirect effects. Lastly, an overview of all results is presented in paragraph 4.6, including the outcome model.
4.1 Preliminary steps
Between November 4th, 2020 and November 25th, 2020 a total of 368 responses were collected.
The first step was to exclude all responses that did not complete the entire survey (38 responses), resulting in a total of 330 finished responses (N = 330). The second step was to correct the numerical variable 'Age': the drop-down menu in the survey contains the values 18 to 70, but in the Qualtrics output a selected age of '18' resulted in an output value of '1', a selected age of '19' in an output value of '2', et cetera. By adding 17 to the output value, the actual respondent's age was obtained for further analysis. The third step was to recode one counter-indicative item (RES4) in SPSS, as this was the only negatively keyed item out of a total of four items. The fourth and final step before starting the analysis was to create, for each of the five constructs measured on a 7-point Likert scale, a new variable representing the mean of all items in that construct. For example, the four items of Result Demonstrability (RES1 to RES4) are computed into one variable called RES_TOT. This method, also known as computing scale means (Field, 2018), is used to analyse all direct and indirect effects hypothesized in paragraph 2.7. Appendix III can be consulted for a graphical overview of all survey responses.
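As an illustration, these preprocessing steps (age recoding, reverse-scoring RES4 and computing scale means) could be scripted as follows; the column names are hypothetical stand-ins for the actual Qualtrics export:

```python
import pandas as pd

# Illustrative sketch; the real Qualtrics export labels may differ.
df = pd.DataFrame({
    "Age":  [1, 13, 30],                    # Qualtrics codes: 1 = 18, 2 = 19, ...
    "RES1": [6, 5, 7], "RES2": [6, 4, 7],
    "RES3": [5, 5, 6], "RES4": [2, 3, 1],   # RES4 is the negatively keyed item
})

df["Age"] = df["Age"] + 17                  # recover the actual age in years
df["RES4"] = 8 - df["RES4"]                 # reverse-score on a 7-point scale
df["RES_TOT"] = df[["RES1", "RES2", "RES3", "RES4"]].mean(axis=1)  # scale mean
```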
4.2 Demographics
To provide insight into the characteristics of the sample, descriptive statistics were used on the control variable and the three moderating variables, as can be seen in Table 4. The mean age reported by the respondents was 𝑥̅ = 45 years, which lies between the average age of 42 years in the Netherlands (CBS Statline) and the average age of 50 years of bank employees in the Netherlands (FD, 2016). The age range was 22 to 67 years, so the sample includes employees at the start of their career as well as at the end of it, given the current legal retirement age of 67 in the Netherlands.
A vast majority of respondents (80 percent) indicated they have (hardly) no customer interaction, compared to only 7 percent with a high degree of customer interaction. This could be caused by the sampling approach: the author also asked colleagues from the CFO department to participate, and the majority of roles within a CFO department involve (hardly) no interaction with external clients.
A plurality of respondents (45.2 percent) indicated they had not worked with an AI system before, compared to a third (34.2 percent) who indicated they had. An interesting observation is that a fifth of the respondents do not know whether they have worked with an AI system before.
The respondents' self-reported knowledge of AI averaged 5.06 on a scale from 1 to 10, just below the midpoint between 'don't know anything about AI' and 'know a lot about AI', and close to the self-reported knowledge of AI in prior research (Güngör, 2020).
Variable Frequency Percentage (%)
Age 18-30 45 13.6
(𝑥̅ = 45) 31-40 71 21.5
41-50 106 32.1
51-60 85 25.8
61 and older 23 7.0
Customer Proximity (hardly) no customer interaction 264 80.0
Limited customer interaction 43 13.0
High degree of customer interaction 23 7.0
Experience with AI Yes 113 34.2
No 149 45.2
I don’t know 68 20.6
Knowledge of AI 1 11 3.3
(𝑥̅ = 5.06) 2 28 8.5
3 39 11.8
4 41 12.4
5 56 17.0
6 81 24.5
7 47 14.2
8 19 5.8
9 5 1.5
10 3 0.9
Table 4: Demographic analysis of the respondents (N = 330)
4.3 Reliability and normality analysis
Cronbach's Alpha (α) represents the reliability of a scale and needs to exceed the threshold of 0.70 to allow reliable conclusions (Field, 2018). The constructs PU, REL, OUT, RES and BI all had a Cronbach's Alpha (α) between 0.80 and 0.92 (see Table 5 for details per construct), and α would not change substantially (Δα > 0.1) if any single item were deleted from these constructs.
Within the construct OUT, the new item added by the author (OUT2) is validated as reliable, as the construct as a whole reports α = .798. The Inter-Item Correlation Matrix measures the strength of the relationship between the items within a construct; the closer a value is to 1, the stronger the relationship. Item OUT2 has an inter-item correlation of .669 with OUT1, which indicates a strong relationship between the two items.
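For readers reproducing the reliability analysis outside SPSS, Cronbach's alpha can be computed directly from the item scores. This is a minimal sketch of the standard formula:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of Likert scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()   # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)        # variance of the sum score
    return (k / (k - 1)) * (1 - sum_item_var / total_var)
```

Perfectly correlated items yield α = 1.0; uncorrelated items push α towards 0.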
The skewness values of PU (-0.81), REL (-0.31) and OUT (-0.55) were all within acceptable boundaries. For RES, one item showed a skewness of -1.15, while the group total had a skewness of -1.02, which is still acceptable. The same applies to BI, where two items showed a skewness of -1.4 and the group total a skewness of -1.03, slightly beyond the threshold of -1. The negative skewness values can be explained by the positive attitude of employees towards AI systems: on the 7-point scale, BI (𝑥̅ = 5.45, SD = 1.00) and PU (𝑥̅ = 5.24, SD = 1.12) reported the highest sample means, followed by RES (𝑥̅ = 4.92, SD = 1.11), OUT (𝑥̅ = 4.75, SD = 1.20) and REL (𝑥̅ = 4.45, SD = 1.43).
Finally, a normality check was performed using the Kolmogorov-Smirnov test and the Shapiro-Wilk test. The results indicate that none of the five constructs is normally distributed (p < .001).
The demographic analysis in the previous paragraph showed that the respondents are very unequally distributed among the three CP groups. The group with a 'high degree of customer interaction' contains only 23 respondents, which is too small to rely on the Central Limit Theorem (CLT).
Construct No. of items α Skewness Kurtosis Kolmogorov-Smirnov Shapiro-Wilk
PU 4 0.920 -0.808 0.490 .119* .940*
OUT 2 0.798 -0.549 0.208 .129* .958*
REL 2 0.908 -0.312 -0.694 .121* .958*
RES 4 0.827 -1.024 1.651 .119* .934*
BI 3 0.830 -1.037 2.121 .130* .928*
Table 5: Skewness, Kurtosis and Normality Check (* p < .001)
One of the assumptions of a linear model is that the variances in the groups are equal. This assumption of homogeneity of variance was tested using Levene's test, which showed that the variances were equal for all five constructs, as can be seen in Table 6.
Construct Levene Statistic p Welch Statistic p
PU 2.152 .118 3.191 .050
REL .524 .593 2.456 .097
OUT .133 .876 .078 .925
RES .323 .724 1.433 .249
BI .880 .416 1.088 .345
Table 6: Levene’s and Welch’s test
The Welch test indicates a statistically significant difference between groups for PU only (F(2, 50) = 3.19, p = .05), and the Games-Howell post hoc test indicates that the only statistically significant difference (p < .05) is between group 1 ((hardly) no customer interaction) and group 3 (high degree of customer interaction). In case CP enters a model through an interaction term with PU, bootstrapped parameter estimates can be used to verify that the model parameters are robust, without affecting the F statistic itself (Field, 2018).
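A minimal sketch of the homogeneity check outside SPSS, using synthetic scores with the same group sizes as the sample (the values are illustrative, not the survey data). Welch's ANOVA and Games-Howell post hoc tests are available in, for example, the pingouin package:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic PU scale means per CP group; group sizes mirror Table 4.
no_interaction = rng.normal(5.15, 1.10, 264)
limited        = rng.normal(5.30, 1.10, 43)
high_degree    = rng.normal(5.75, 1.00, 23)

# Levene's test for homogeneity of variance across the three groups
stat, p = stats.levene(no_interaction, limited, high_degree)
# Welch's ANOVA / Games-Howell: see pingouin.welch_anova and
# pingouin.pairwise_gameshowell for robust follow-up tests.
```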
4.4 Correlation analysis
Pearson's correlation test is used to examine the relationship between continuous variables and quantifies the strength and direction of that relationship (Field, 2018). Table 7 shows a high positive correlation between REL and PU (r = 0.60) and between PU and BI (r = 0.58), both significant at p < .01. There is also a moderate positive correlation between OUT and PU (r = 0.47) and between RES and PU (r = 0.35), both significant at p < .01. REL (r = 0.47), OUT (r = 0.43) and RES (r = 0.39) all have a moderate positive correlation with BI (p < .01). Knowledge of AI (KAI) only has a moderate positive correlation with RES (r = .36, p < .01), which suggests that respondents with more knowledge of AI are better able to attribute results to the use of an AI system. Age has a weak but significant negative correlation with BI (r = -.14, p < .01), which suggests that older respondents have a lower behavioural intention to use AI systems, similar to prior research (e.g. Morris & Venkatesh, 2000). The control variable Experience with AI (EAI) has a significant moderate negative correlation with all five constructs from the TAM2 model, ranging between -.13 and -.28, and also a negative correlation with KAI (r = -0.37).
Construct M SD 1 2 3 4 5 6 7 8 9
1. KAI 5.06 1.92 -
2. Control: EAI 1.86 0.72 -.37** -
3. CP 1.27 0.58 -.11* .03 -
4. Age 44.96 10.90 -.11* .10 -.02 -
5. PU 5.24 1.12 -.11* -.18** .10 -.10 (.92)
6. REL 4.45 1.43 .16** -.25** .10 -.03 .60** (.91)
7. OUT 4.75 1.20 .12* -.13* .00 -.02 .47** .39** (.80)
8. RES 4.92 1.11 .36** -.28** -.08 -.08 .35** .37** .27** (.83)
9. BI 5.45 1.00 .18* -.25** .03 -.14** .58** .47** .43** .39** (.83)
Table 7: Means, standard deviations and correlations
** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).
(.xx) Cronbach's Alpha
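Outside SPSS, the same Pearson statistics can be obtained with pandas and SciPy; the data below is synthetic and only illustrates the calls:

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-ins for two scale means (N = 330, as in this study).
df = pd.DataFrame({"REL": rng.normal(size=330)})
df["PU"] = 0.6 * df["REL"] + 0.8 * rng.normal(size=330)  # built-in positive relation

corr_matrix = df.corr(method="pearson")          # full correlation matrix
r, p = stats.pearsonr(df["REL"], df["PU"])       # coefficient and two-tailed p-value
```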
4.5 Hypothesis testing
This paragraph tests the proposed hypotheses from paragraph 2.7. In 4.5.1 the direct effects between the independent variables and the dependent variable are examined using multiple regression analysis. In 4.5.2 the results of the indirect effects are presented, which are investigated using the PROCESS macro of Hayes (2018).
4.5.1. Direct effects
Hierarchical multiple regression was performed to examine the ability of REL, RES and OUT to predict PU, after controlling for Experience with AI (EAI). In the first step of the hierarchical multiple regression, the control variable EAI was entered. This model was statistically significant, F(1, 328) = 11.32, p < .001, and explained 3.3% of the variance in Perceived usefulness. At Step 2 the independent variables REL, RES and OUT were entered, and the total variance explained by the model as a whole was 43.4%, F(3, 325) = 62.38, p < .001. By adding the three variables, an additional 40% of the variance in Perceived usefulness was explained after controlling for EAI (R² Change = .40; F(3, 325) = 76.78; p < .001). In the final model the control variable EAI (β = -.005; p = .91) was not statistically significant. The three independent variables were all statistically significant, with REL showing a higher Beta value (β = .44; p < .001) than RES (β = .27; p < .001) and OUT (β = .11; p = .01), as can be seen in Table 8. In other words, an increase of one standard deviation in Job relevance corresponds to an increase of 0.44 standard deviations in Perceived usefulness. With the results of this multiple regression analysis, enough evidence exists to support hypotheses 1, 2 and 3.
R R2 R2 Change B SE β t
Step 1 .18 .03***
EAI -.28 .08 -.18 -3.36
Step 2 .65 .43*** .40
EAI -.00 .06 -.00 -.11
REL .35 .03 .44*** 9.35
RES .25 .04 .27*** 5.90
OUT .11 .04 .11** 2.37
Table 8: Hierarchical Regression Model of Perceived Usefulness Note: statistical significance: * p < .05; ** p<.01; *** p<.001
A similar analysis was performed to investigate the ability of PU to predict BI, after controlling for Experience with AI (EAI); the results can be found in Table 9. In the first step of the hierarchical multiple regression, the control variable EAI was entered. This model was statistically significant, F(1, 328) = 22.97, p < .001, and explained 6.5% of the variance in Behavioural intention. After entry of PU in Step 2, the total variance explained by the model as a whole was 36.1%, F(1, 327) = 92.40, p < .001. The introduction of PU explained an additional 29.6% of the variance in Behavioural intention, after controlling for EAI (R² Change = .29; F(1, 327) = 151.31; p < .001). In the final model the variable EAI (β = -.15; p < .001) was statistically significant. The variable PU was statistically significant with a standardized coefficient Beta of .55 (p < .001). In other words, an increase of one standard deviation in Perceived usefulness corresponds to an increase of 0.55 standard deviations in Behavioural intention. With the results of this multiple regression analysis, enough evidence exists to support hypothesis 4.
R R2 R2 Change B SE β t
Step 1 .25 .06***
EAI -.35 .07 -.25*** -4.79
Step 2 .60 .36*** .29
EAI -.21 .06 -.15*** -3.44
PU .49 .04 .55*** 12.30
Table 9: Hierarchical Regression Model of Behavioural Intention Note: statistical significance: * p < .05; ** p<.01; *** p<.001
4.5.2. Indirect effects
To test the moderating effects of age, knowledge of AI and customer proximity, the analysis was performed using the PROCESS macro by Hayes. Hayes (2018) provides several models that can be used to analyse whether the main effect between an independent and a dependent variable changes across the values of the moderators. One limitation is that PROCESS only allows up to two moderators simultaneously in one model, which does not fully suit this research model with its three moderators. Therefore, Model 2 was used to test whether age and knowledge of AI affect the relationships of REL, RES and OUT with PU, and Model 1 was used to test whether customer proximity affects these relationships. As CP is a categorical moderator, SPSS compares the group means with each other: W1 compares the mean of 'limited customer interaction' (CP2) with '(hardly) no customer interaction' (CP1), and W2 compares the mean of 'high degree of customer interaction' (CP3) with '(hardly) no customer interaction' (CP1). The numerical independent variables and moderators were standardized in SPSS before running the analysis; hence a new variable starting with 'Z' was created in the dataset, e.g. REL_TOT became ZREL_TOT, et cetera. A conceptual diagram of both models can be seen in Figure 5.
Figure 5: Process Model 1 and 2 by Andrew F. Hayes
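PROCESS Model 2 is, in essence, an OLS regression with interaction terms. Assuming standardized variables ZREL, ZKAI and ZAge (synthetic here, for illustration), it can be sketched as:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 330
# Synthetic standardized predictor and moderators (illustration only).
df = pd.DataFrame(rng.normal(size=(n, 3)), columns=["ZREL", "ZKAI", "ZAge"])
df["PU"] = 5.24 + 0.66 * df["ZREL"] + rng.normal(scale=0.9, size=n)

# PROCESS Model 2 as OLS: PU ~ ZREL + ZKAI + ZREL:ZKAI + ZAge + ZREL:ZAge
model = smf.ols("PU ~ ZREL * ZKAI + ZREL * ZAge", data=df).fit()
interaction_p = model.pvalues["ZREL:ZKAI"]   # moderation test for knowledge of AI
```

A non-significant interaction p-value here corresponds to rejecting the moderation hypothesis, as in Tables 10 to 12.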
Moderation between Job Relevance and Perceived Usefulness
The results in Table 10 show that the interaction terms REL x KAI (XW) and REL x Age (XZ) were both statistically not significant (p = .69 and p = .96). Thus, the effect of job relevance on the perceived usefulness of AI systems does not change significantly across the values of AI knowledge and age. In other words, the effect of REL on PU does not depend on whether someone is young or old, or very knowledgeable about AI or not. Therefore, H5b and H5c are rejected.
Coeff. SE t p
Intercept i1 5.24 .05 103.94 < .001
REL (X) c1 .66 .05 13.07 < .001
KAI (W) c2 .02 .05 .40 .68
REL*KAI (XW) c3 -.02 .05 -.39 .69
Age (Z) c4 -.08 .05 -1.77 .07
REL*Age (XZ) c5 .00 .05 -.04 .96
R² = 0.361, p < 0.001 F(5,324) = 36.678
Table 10: PROCESS Moderation summary between REL and PU
The results in Table 11 show that for W1 (c2 = .20, p = .69) the difference in perceived usefulness is not significant. For W2 (c2 = 1.52) the difference in perceived usefulness between employees with a high and a low degree of customer interaction is statistically significant (p = .05). However, the interaction terms XW1 (c3 = -.03) and XW2 (c3 = -.26) are both statistically not significant (p = .72 and p = .08). Thus, although the perceived usefulness of AI systems differs between a high degree of customer interaction and no customer interaction, the effect of job relevance on perceived usefulness is not moderated by the level of customer proximity. Therefore, hypothesis H5a is rejected.
Coeff. SE t p
Intercept i1 3.07 .17 17.24 < .001
REL (X) c1 .48 .03 12.56 < .001
CP (W1) c2 .20 .52 .38 .69
CP (W2) c2 1.52 .78 1.94 .05
REL*CP (XW1) c3 -.03 .11 -.35 .72
REL*CP (XW2) c3 -.26 .15 -1.74 .08
R² = 0.362, p < 0.001 F(5,324) = 36.829
Table 11: PROCESS Moderation summary between REL and PU
Moderation between Output Quality and Perceived Usefulness
The results in Table 12 show that the interaction terms OUT x KAI (XW) and OUT x Age (XZ) were both statistically not significant (p = .08 and p = .09). Thus, the effect of output quality on the perceived usefulness of AI systems does not change significantly across the values of AI knowledge and age. In other words, the effect of OUT on PU does not depend on whether someone is young or old, or very knowledgeable about AI or not. Therefore, H6b and H6c are rejected.