Running head: THE ROAD AHEAD TO AUTONOMOUS AI-CAD SOLUTIONS

The Road Ahead to Autonomous AI-CAD Solutions for Reliable Breast Cancer Detection

Stefanie Gante 13319949

University of Amsterdam, Amsterdam Business School

MSc Business Administration, Track Digital Business

Supervisor: Spyros Angelopoulos

Word Count: 12990

Submission Date: 23 June 2021

Version Submitted: Final Draft

EBEC approval number: EC 20210423070430

Statement of Originality

This document is written by Student Stefanie Heike Gante who declares to take full responsibility for the contents of this document.

I declare that the text and the work presented in this document are original and that no sources other than those mentioned in the text and its references have been used in creating it.

The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.

Table of Contents

1 Introduction

2 Literature Review

2.1 Mammogram Diagnosis – From Taking the Test to Detecting Cancer

2.1.1 The process – How is cancer being detected?

2.1.2 The evolution of Computer-Aided Detection/Diagnosis.

2.1.3 IT environment - PACS and DICOM.

2.2 Factors on the Conversion from Supporting Software to Fully Automated Decision Making

2.2.1 The importance of data in product development and excellent algorithm performance.

2.2.2 FDA and CE clearance.

2.2.3 It is not just about the product, it is about how it can be used.

2.2.3.1 Minimal AI integration.

2.2.3.2 Fully automatic AI integration.

2.2.3.3 Platform AI integration.

2.2.4 Consumer acceptance and algorithm aversion.

3 Methodological Framework

3.1 Data Collection

3.2 Data Analysis Strategy and Techniques

4 Results

4.1 The Pilot Interview

4.2 The Industry Experts

4.2.1 The drivers for the adoption of autonomous AI for mammography.

4.2.1.1 Product excellence.

4.2.1.2 Integration in hospital IT and the clinician’s workflow.

4.2.1.3 Acceptance of stakeholders.

4.2.2 Important topics as perceived by development versus integration focused employees.

4.2.2.1 Future outlook.

4.2.2.2 Challenges for the development.

4.2.2.3 Challenges for the integration.

4.2.2.4 Acceptance of stakeholders.

5 Discussion

5.1 Key Findings

5.2 Implications for Legislators

5.3 Implications for Companies

5.4 Implications for Research

6 Limitations of the Study

7 Conclusions

REFERENCES

APPENDIX A. FIGURES AND TABLES

APPENDIX B. INTERVIEW PROTOCOL

APPENDIX C. INTERVIEW TRANSCRIPTS

ABSTRACT

The global COVID-19 pandemic has aggravated an existing backlog in conducting and interpreting mammograms. Recent developments in the field of convolutional neural networks have facilitated the development of algorithms, in this paper referred to as AI-CAD, which can detect and classify lesions in mammograms more accurately than radiologists have done to date. However, the adoption rate of AI-CAD in Europe is low. Under the European double-reading standard, which requires two radiologists to analyze a mammogram, AI-CAD solutions may only be used as an advisory agent. In this paper, the researcher conducts interviews with market-leading medical imaging companies that develop or implement AI-CAD. The aim of the research is to identify software vendors’ challenges for the adoption of AI-CAD products. The findings suggest that the main drivers of AI-CAD adoption in the eyes of the software vendors are (1) product excellence, (2) the integration of the product with the hospital IT and the clinician’s workflow, (3) stakeholder acceptance and (4) legal and ethical aspects. Interestingly, people concerned with software development see the main challenge in stakeholder acceptance, while people on the integration side consider software integration to be the greater challenge. The evaluation exposes how both factors affect AI-CAD adoption and stimulates academic as well as practice-oriented reflection on causes, solutions and stakeholders’ responsibility.

Key Words: Computer-Aided Detection, Computer-Aided Diagnosis, AI-CAD, Mammography, Breast Cancer Detection, Diagnostic Software Adoption

1 Introduction

According to the World Cancer Research Fund, breast cancer is the most commonly occurring form of cancer in women and the second most common form of cancer overall. It is estimated that one out of eight U.S. women will develop breast cancer in their lifetime (U.S. Breast Cancer Statistics, 2021). The likelihood of developing breast cancer increases with age, which is why it is widely recommended that women receive an annual mammogram from age 40 onwards (Johns Hopkins Medicine, 2021). A mammogram is an X-ray of the breast, which helps a radiologist to detect early warning signs of breast cancer that might otherwise go unnoticed for up to several years (RadiologyInfo.org, 2019). Since the 1970s, mammograms have been used as an important screening technology to detect breast cancer, even when the patient is asymptomatic (American Cancer Society, 2021). Fifty years later, we find ourselves in the middle of the fourth industrial revolution, which has recently been reinforced by a global pandemic that challenges our traditional way of working. The medical imaging industry is no exception to this, as the shortage of radiologists in certain countries and regions and the backlog of medical examinations that has built up since early 2020 belong to the top five drivers of the global teleradiology market (Gill, 2020). Over the past 30 years, companies have turned this deficiency in the market into a competitive advantage for themselves by developing computer-aided detection/diagnosis (CAD) software (Siemens Healthineers, 2021). This software can help a radiologist to analyze a medical image, such as a mammogram, by detecting and highlighting suspicious areas in the picture. However, traditional CAD did not perform at the level of a radiologist, which is why it was used as a tool complementing the radiologist’s job. It could not function independently and never received great appreciation within the field.

With the progress made within the past five to ten years in different fields of artificial intelligence (AI), image recognition software made a huge leap forward and so did modern CAD algorithms (Sarvamangala and Kulkarni, 2021). These novel software solutions will be referred to as AI-CAD. Nowadays, there are AI-CAD solutions that are better at detecting lesions in a mammogram than the best radiologists. In 2020, a study compared the performance of an AI algorithm in detecting breast cancer in mammograms with that of two radiologists jointly making a diagnosis for each mammogram. The study showed a 1.2% reduction in false positives and a 2.7% reduction in false negatives for the algorithmic predictions (Walsh, 2020). More and more companies are developing AI-CAD for medical institutions and it seems there might finally be a way to meet the excess demand for breast cancer screenings and simultaneously improve the diagnoses made. These developments raise hope for the battle against all kinds of diseases that mankind is fighting today, breast cancer being just one of them. We appear to be on the verge of transforming traditional healthcare and taking it to a whole new level.

And yet, it appears that especially in Europe, the adoption rate of AI-CAD is moderate. When used, it still functions as a complement to a radiologist’s work, rather than an independent reader, although it seems the technology performs well. Why is that the case? What hinders us on the way to more autonomous breast cancer detection software? Is it solely the customers, that is, hospitals and clinics, who decide if a product will be adopted? What impact do other stakeholders have? To answer these questions, this research is designed as a qualitative study, entailing expert interviews with companies that are market leaders in the medical imaging industry. In four semi-structured interviews and one open interview, experts in the field shared their experiences to allow for deeper insight into their perception of the main challenges on the way to mammography analysis with AI-CAD. Hence, the research question this study aims to answer is:

“What are software vendors’ challenges for the diagnosis of breast cancer with AI-CAD?”

2 Literature Review

The following section serves to summarize the academic research findings of the past years in the field of software for mammography analysis. The literature review is divided into two main parts. The first one provides an introduction to breast cancer screening procedures, the software used and the IT infrastructure in place. The second section will focus on considerations for future development in the field. The factors that can affect the adoption of mammogram screening software as identified by the literature will be named and elaborated on.

2.1 Mammogram Diagnosis – From Taking the Test to Detecting Cancer

2.1.1 The process – How is cancer being detected? The first step of breast cancer screening is usually a mammography. As previously mentioned, this is a non-invasive procedure during which an X-ray of the patient’s chest is taken. Clinicians differentiate between 2D X-rays of the chest (mammograms) and 3D X-rays (breast tomosynthesis). The difference between the two is that the former takes pictures from the front and the side of the breast, while the latter takes pictures from various angles, so that each layer of the breast tissue is shown (El Camino Health, 2021). Both procedures are widely accepted and are colloquially referred to as mammograms. In this study, the term mammogram refers to the original 2D mammogram.

After a mammogram has been taken, a radiologist examines the X-ray. In Europe, a double-reading standard is in place, which means that according to the European guidelines for quality assurance in breast cancer screening, two radiologists must inspect the mammogram in a decentralized program. In centralized programs this is not required but advised and also broadly implemented (Rosselli del Turco et al., 2006). The double-reading standard makes breast cancer screenings more time intensive and also more costly, as the capacities of two radiologists are needed, rather than one. However, it has also been shown that regardless of the patient’s individual case at hand, double-reading increases sensitivity and, therefore, the likelihood that breast cancer is detected (Euler-Chelpin, 2018). If the radiologists come to the conclusion that the mammogram is abnormal, different (invasive) follow-up tests are needed to confirm or rule out the presence of cancer cells with certainty. A study by the Breast Cancer Surveillance Consortium in 2013 showed that out of all abnormal mammograms that were taken, only 4.4% of examinations resulted in a tissue diagnosis of cancer within one year (BCSC, 2013). While actual cancer cases are oftentimes detected by mammograms (true positives), there are also many false positives, that is, suspected cancer cases that turn out not to be cancerous after follow-up tests. False positives can lead to anxiety for patients and to follow-up tests that are expensive, time consuming and invasive. For this reason, a major challenge the industry is facing at this point is to reduce the false positive rate while still detecting very early-stage cancer that might not be obvious in a mammogram.

Lastly, it should be mentioned that mammograms are more difficult for radiologists to interpret in two cases. Firstly, a mammogram is harder to read when the patient has dense rather than fatty breasts, which refers to the density of the patient’s breast tissue. The reason is that dense breast tissue is more visible in a mammogram than fatty breast tissue. Therefore, there is a higher likelihood of overlooking a tumor in the picture or confusing the tissue with a lesion (National Cancer Institute, 2020). An example of a tumor in the mammogram of a fatty versus a dense breast is displayed below in Figure 1.

Figure 1

Mammogram with positive tumor finding for a fatty versus dense breast (DenseBreast-info and Berg, 2021)

Secondly, a radiologist can make a diagnosis more easily when he or she has a point of reference in the form of former mammograms of the patient. As previously mentioned, there is a high rate of false positive mammograms, which hints towards the fact that discoloration in a mammogram can easily be misinterpreted. When a radiologist can compare the mammogram to an older version, it is easier to rule out alleged abnormalities (ITN, 2016). In the following, it will become apparent how these two cases also have an effect on software tools for cancer detection.

2.1.2 The evolution of Computer-Aided Detection/Diagnosis. Until the late 1990s, radiologists had to analyze a patient’s mammogram without any technological aid. With the upswing of technological development and the progress made in machine learning techniques, Computer-Aided Detection/Diagnosis was first introduced in 1998 (Firmino et al., 2016). The software allowed for digital mammograms to be stored and modified (e.g., to adjust the contrast for better visibility of tumors). Zooming in on the software, Computer-Aided Detection (CADe) and Computer-Aided Diagnosis (CADx), often jointly referred to as CAD, have slightly different objectives. CADe is more focused on the detection of lesions, while CADx can characterize the abnormal finding in a medical image (Firmino et al., 2016). CADe can be understood as the origin of CAD software (Castellino, 2005).

CAD applications incorporate a supervised learning algorithm that classifies a mammogram, or areas of the mammogram, as normal or abnormal. These classification algorithms depend heavily on choices the developer makes as well as on the ground truth of the dataset used for their training (Lebovitz, 2021). Put simply, the algorithm is taught what to look for to detect a cancerous case. The developer has to decide how the different attributes that aim to predict the dependent variable (cancer or no cancer) are weighted (Sechopoulos & Mann, 2020). The predictions made by such algorithms are easy to trace back, so it is possible to understand why the algorithm has made a certain prediction.
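To illustrate why the predictions of such feature-based classifiers are easy to trace, the following minimal sketch scores one region of a mammogram with a hand-weighted logistic model. The feature names, weights and bias are hypothetical values chosen purely for illustration; they are not taken from any CAD product or from the cited literature.

```python
import math

# Hypothetical, hand-chosen weights for illustration only.
WEIGHTS = {
    "mass_density": 1.8,
    "margin_irregularity": 2.4,
    "microcalcifications": 1.1,
}
BIAS = -3.0


def explain_prediction(features):
    """Return P(abnormal) and the contribution of each feature to the score."""
    contributions = {name: WEIGHTS[name] * features[name] for name in WEIGHTS}
    score = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-score))  # logistic function
    return probability, contributions


# Made-up feature values for one region, each scaled to the range 0..1.
region = {
    "mass_density": 0.9,
    "margin_irregularity": 0.8,
    "microcalcifications": 0.2,
}
probability, contributions = explain_prediction(region)

# Every prediction can be decomposed into per-feature contributions.
for name, value in sorted(contributions.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:+.2f}")
print(f"P(abnormal) = {probability:.2f}")
```

Because the score is a plain weighted sum, each prediction can be broken down into the share every attribute contributed, which is the kind of traceability described above.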

However, CAD was not very successful and proved to be less accurate than radiologists’ predictions. One main argument for this was allegedly the fact that the aim of CAD was to do what radiologists already did well. It had been constructed to detect cancer in a mammogram like a radiologist would. The consequence was that it became good at mimicking a radiologist, but could not outperform one, which meant that it added very little value for experienced radiologists and was not considered reliable. Radiologists did not trust the prediction of the algorithm, which made it superfluous (in some cases even considered extra work) rather than helpful (Kohli and Jha, 2018).

With the introduction of convolutional neural networks, image processing and interpretation evolved and the performance of the associated software improved significantly. Neural networks are a deep learning method within AI. Their introduction in the field of medical image processing inspired the development of new software that would be able to detect breast cancer with higher sensitivity (the share of actual cancer cases that are detected) and higher precision (the share of cases flagged as cancerous that are actually cancerous). Unlike the traditional supervised learning algorithms that had been used previously, convolutional neural networks are not told how to look for a lesion. Instead, the algorithm receives a large number of mammograms including their true classification (ground truth) and adjusts its parameters to maximize the number of correct decisions. In a way, it teaches itself how to find and identify a lesion.
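The performance measures mentioned above can be made concrete with a short worked example. The sketch below computes sensitivity, specificity and precision from the four cells of a confusion matrix. The counts are hypothetical and serve only to show how a screening setting with few true cancer cases can combine high sensitivity with low precision.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute common screening metrics from confusion-matrix counts."""
    return {
        # Share of actual cancer cases that were flagged.
        "sensitivity": tp / (tp + fn),
        # Share of healthy cases that were correctly cleared.
        "specificity": tn / (tn + fp),
        # Share of flagged cases that are actually cancerous.
        "precision": tp / (tp + fp),
    }


# Hypothetical counts for 1,000 screened mammograms (not study data).
metrics = screening_metrics(tp=8, fp=92, tn=898, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, eight of the ten cancer cases are found, yet only eight of the one hundred flagged mammograms are truly cancerous, which mirrors the false positive problem discussed in section 2.1.1.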

However, the way this machine learning method makes predictions has also been a major barrier for some people to trust the results. Neural networks have several “decision making layers” and the decisions taken are numerous. It is hard, if not impossible, for a human to understand in the end why a certain attribute is given a certain weight when predicting the likelihood of a cancer case (Shin, 2021). With increasing complexity, the way the model makes a prediction becomes a “black box”, that is, not easily traceable. Not understanding how a decision is made can be hard to accept, as it might be perceived as “giving up control”. This is a novel concept to many people, especially when it comes to medical decision making. How this can affect user and patient acceptance will be discussed further in the second part of the literature review.

At the current state of the art, AI-CAD classifies a case as cancerous or not cancerous based on the data provided by the image only. As previously mentioned, considering past mammograms can improve the quality of the diagnosis; however, this is something the developed algorithms cannot do at this point. Additionally, it was mentioned that dense breast tissue is more visible in a mammogram than fatty breast tissue, which can make it harder for a radiologist to read. In the end, people are limited by their visual capabilities. An algorithm analyzes a picture differently than a human does, which can help detect lesions that a human might overlook. While sensitivity is higher for fatty breasts than for dense breasts, AI-CAD can help a radiologist detect lesions, independent of the breast density (Leichter et al., 2010).

2.1.3 IT environment - PACS and DICOM. Nowadays, mammograms are saved in a digital format on a computer, which allows for contrast modification and additional software usage, such as Computer-Aided Diagnosis (CADx) and Computer-Aided Detection (CADe; Bland and Copeland, 2010). Digital mammograms are often stored according to the international DICOM standard, which is a protocol for transmitting, storing, retrieving, printing, processing and displaying medical imaging information (DICOM, 2021). This generally accepted standard enables, among other things, a more efficient exchange of patient data, the integration of imaging devices from various vendors and ultimately more affordable and better patient care. The DICOM standard has been an essential precondition for the development and adoption of picture archiving and communication systems (PACS). While there is no requirement to enforce it, it is at the core of the vast majority of PACS. The DICOM standard facilitates the exchange of data between the PACS and other modalities (itz-medi.com, 2021).
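As a small illustration of what a shared file standard makes possible, the sketch below checks whether a byte sequence carries the marker of a DICOM file: files in the DICOM file format start with a 128-byte preamble followed by the four characters "DICM". The function name `has_dicom_prefix` is a hypothetical helper introduced here, and the sample bytes are synthetic. Real software would rely on a full DICOM parser rather than on this check alone.

```python
DICOM_PREAMBLE_LENGTH = 128  # bytes reserved before the marker
DICOM_PREFIX = b"DICM"       # four-byte marker that follows the preamble


def has_dicom_prefix(data: bytes) -> bool:
    """Return True if the bytes carry the 'DICM' marker after the preamble."""
    start = DICOM_PREAMBLE_LENGTH
    return data[start:start + len(DICOM_PREFIX)] == DICOM_PREFIX


# Synthetic bytes that only imitate the file header; this is not a real image.
sample = bytes(DICOM_PREAMBLE_LENGTH) + DICOM_PREFIX + b"\x00" * 16

print(has_dicom_prefix(sample))
print(has_dicom_prefix(b"not a dicom file"))
```

Because every vendor writes the same header and data layout, any receiving system can run the same check and parsing logic, regardless of which device produced the image.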

A PACS is installed at most hospitals and clinics to store and retrieve pictures. According to an article from 2015, nine out of ten hospitals in western Europe have a fully operating PACS installed (Brosky, 2015). A PACS has four main components: 1) imaging modalities, which are necessary to produce a medical image (in the case of breast screenings this would be an X-ray machine), 2) a network to the database, which ensures secure image transmission and upload, 3) a workstation, which is the access point for radiologists or other clinicians to work with the images and 4) an archive for storage, which can be accessed by people who have the necessary permissions (Advanced Data Systems Corporation, 2021).

While the format, in which a mammogram needs to be saved, is predetermined (DICOM), there is no standardization of a PACS system itself. This means that there are no strict regulations when it comes to the design and functionalities of a PACS system. This leads to vendors carrying out their PACS systems in different ways and focusing on different functionalities. For instance, the way an image is viewed when opened and what the radiologist can do with the picture (editing it etc.) can vary. This is also true for the integration of external software into the PACS. While some systems are fairly open and different software can be used as an add on, some systems are more closed and do not allow for interaction with external software. The little to no presence of PACS guidelines leads to different systems installed at different hospitals and clinics. This has some implications, which will be elaborated on further in the second part of the literature review.

It should be mentioned that there may be other IT systems installed and part of the customer’s workflow, such as a clinical information system (CIS). However, a PACS is most crucial for any work related to medical imaging information and would be the information system that external software for mammography analysis could be integrated with, as will be elaborated on more thoroughly in the second part of the literature review. To frame the problem at hand better, this study therefore solely focuses on PACS systems.

2.2 Factors on the Conversion from Supporting Software to Fully Automated Decision Making

Seeing the progress that has been made in the field over the past couple of years, one may wonder how long it will take for diagnostic applications such as AI-CAD to be fully autonomous, without any input needed from a professional clinician. Truthfully, research suggests that we have almost reached that point from the technical perspective. Algorithmic performance is still not optimal, however certain applications are already outperforming excellent radiologists. Taking a rational and result oriented perspective, radiologists’ input on mammogram analysis might fade in importance over the years to come, which would free up a significant amount of time and capacities that could be invested elsewhere. Nevertheless, the adoption of fully independent diagnostic software boils down to more than just excellent software performance. In the following paragraph, the author will break down the different aspects of successful software implementation as found in academia.

2.2.1 The importance of data in product development and excellent algorithm performance. Taking the perspective of an AI vendor, that is, a firm that develops the diagnostic software, there are several aspects to a successful product. First and foremost, there is the software performance. Without excellent performance, there is no market for the product in the first place. To achieve this, several factors come into play. Previous sections discussed how developments in AI in general and image analysis specifically have boosted the progress made in medical diagnostics. Having the necessary technological progress and professionals who are capable of implementing it is only one side of a well-functioning algorithm. Besides the actual building of an algorithm, it must also be trained in order to perform well. The better the training process and the better the quality of the data the algorithm has been trained on, the better the software performance (Valtorta and Sessions, 2006). Indeed, this process can be compared with that of a human radiologist: clinicians will perform better when they are given the right tools to perform their task, but in the end, they have to learn how to use them first. Their performance increases with their experience. For this reason, medical data, in this case mammograms that are correctly labeled as cancerous or not, need to be available for training.

There are public databases available for these purposes, but one can claim that medical data is generally well protected and not easily obtainable. Hospital data cannot be easily shared between parties, even if it is anonymized. Within Europe, hospitals can share anonymized data with institutions that support research in the public sector. This includes e.g. universities, but evidently excludes direct data transferal to private companies. Due to the GDPR regulations, which came into force in 2018, data sharing outside the EU/EEA has become more difficult. The Federation of European Academies of Medicine released a report jointly with two other European institutions in spring 2021, which urges that one lesson learned from COVID-19 is the importance of collaboration in research. The article claims that no-one can benefit from very rigid data protection regulations in research and pushes for more data sharing (FEAM, EASAC and ALLEA 2021). Considering these arguments, one important part of the discussion may be if and how governments should take action to facilitate secure data sharing to accelerate the progress made in research and consequently in the development of healthcare products, such as modern CAD.

2.2.2 FDA and CE clearance. Additionally, one must keep in mind that the development of software for medical diagnoses falls into the field of medical product development. For apparent reasons there are strict regulations on the sale and usage of such products and they need approval before they can be used (EMERGO, 2021). While in the United States medical products need to be FDA approved, they need a CE mark in the European Union (ASQ, 2021). A CE mark indicates compliance with certain regulations to assure performance, quality and security of the product (Clever Compliance, 2020). Technological developments over the past years have challenged the ability of the CE marking procedure to keep up with the fast pace of development in the field.

Modern algorithms are constantly improving through regular updates and improvements that are being made and it appears that there is a trade-off between optimal quality control and fast pace product improvement. One requirement made by the CE label is that static products are being released. This implies that after a product has been released, changes cannot easily be made to the product (such as algorithm improvements through extended training of the algorithm after it has been released). To put it differently, firms cannot make use of unsupervised or reinforcement learning algorithms that learn with less supervision or even continually (Lomonaco, 2021). This regulation works as a safety switch, to make sure product and data quality is assured in every version of the product. And yet, this safety regulation arguably slows down the progress an algorithm could make and shuts out a whole world of possibilities for medical software solutions.

Therefore, in the light of the development of deep learning techniques, we should ask ourselves if there may be inefficiencies due to the limitations set legally in the market that might not be the optimum anymore concerning modern software solutions and should be reconsidered.

2.2.3 It is not just about the product, it is about how it can be used. The usability of even the best technological product is in the end limited by how it integrates into the workflow of the user. The competition in the technological market has increased dramatically and customers have many products, vendors and processes to choose from for different tasks. For a long time, the focus in technological development has been on product excellence rather than product integration (Iansiti and West, 1997). But with the increase in product offers and different sellers, the variety of ways to integrate them has increased and so has the importance of high-quality product integration. As mentioned previously, there is one data transfer standard in place, called DICOM, but other than that, there is little standardization. To visualize how this influences the design of a workflow and how modern CAD software can be integrated into this workflow, the following section will discuss the main components of a radiologist’s workflow and how they can be integrated.

There are many ways of how AI can be included in a radiology workflow. In the following, three of the most common ways to do so will be described. The author decided to highlight these three integration options, as those are the ones that are mainly used by the companies that participated in the study and have also sometimes been addressed in the interviews. Note that in the examples made, the customer is assumed to have a PACS system installed. As previously discussed, this is not always the case, however the absolute majority of hospitals falls into this category. In case no PACS system is installed, the first integration option is the only one out of the three that would be applicable for the customer.

2.2.3.1 Minimal AI integration. The first integration option is very simple and requires minimal initial effort from the customer. The AI software can be, for instance, a web application or a software package that is installed on a (separate) computer. In the example of a minimal AI integration displayed in Figure 2, a separate workstation (such as a computer dedicated to analyzing the mammograms) is installed on premise. After the mammogram is taken, the scans are sent to the PACS to be stored, as well as to the AI workstation to be analyzed. The results computed on the workstation are then reported back to the PACS, where they can be accessed again through a PACS viewer. The sole purpose of the viewer is that the results saved in the PACS can be opened and viewed, but not edited. It is important to mention that the user must actively transfer the results from the AI workstation to the PACS, as the installed software cannot communicate automatically with the PACS. Thus, the radiologist can enter the workflow at two points, the AI workstation and the PACS viewer. The advantage of this integration is that it is easily implemented and might therefore also be less costly at first. Also, the AI workstation allows the radiologist to edit the results before sending them to the PACS. However, the consequence is that input from the radiologist is required for the results to be stored accurately. Additionally, the workflow can get messy, especially when several software solutions for different tasks (different medical cases) are being integrated (Metz, 2019).

Figure 2

Minimal AI integration – A separate workstation

Note. Adapted from 5 Options for AI integration into your radiology workflow, by Metz, 2019 (https://www.quantib.com/blog/5-options-for-ai-integration-into-your-radiology-workflow).

2.2.3.2 Fully automatic AI integration. The fully automatic AI integration is similar to the minimal AI integration, but they differ in one essential point: In the fully automatic integration, the results computed by the workstation (here called AI server/node) are automatically sent to the PACS. Therefore, no action is needed from the radiologist at this point to manually send over the results calculated by the AI software. However, this also implies that without a workstation, the radiologist cannot easily make adjustments and edit the calculated results. They are only accessible through the PACS viewer, which merely allows the results to be looked at, but not modified. The advantage here is that no action is needed from the radiologist to ensure the correct storing of the results. The downside is the missing possibility for the radiologist to edit the results before they are sent to the PACS, which might lead to less clean reporting of the decisions made (Metz, 2019).

Figure 3

Fully automatic integration – The AI server/node

Note. Adapted from 5 Options for AI integration into your radiology workflow, by Metz, 2019 (https://www.quantib.com/blog/5-options-for-ai-integration-into-your-radiology-workflow).
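The difference between the minimal and the fully automatic integration can be sketched as a small simulation. All class and function names below (`Pacs`, `analyze`, `minimal_integration`, `automatic_integration`) are hypothetical stand-ins introduced for illustration; they do not represent any vendor’s interface, and the AI analysis is replaced by a fixed placeholder string.

```python
from dataclasses import dataclass, field


@dataclass
class Pacs:
    """Stand-in for a PACS archive; the viewer can read results but not edit them."""
    results: dict = field(default_factory=dict)

    def store(self, study_id, finding):
        self.results[study_id] = finding


def analyze(study_id):
    """Placeholder for the AI software; returns a fixed, made-up finding."""
    return f"AI finding for {study_id}"


def minimal_integration(pacs, study_id, radiologist_edit=None, transferred=True):
    """Separate workstation: results can be edited, but must be sent manually."""
    finding = analyze(study_id)
    if radiologist_edit is not None:
        finding = radiologist_edit(finding)
    if transferred:  # nothing reaches the PACS without this manual step
        pacs.store(study_id, finding)
    return finding


def automatic_integration(pacs, study_id):
    """AI server/node: results go straight to the PACS, without editing."""
    finding = analyze(study_id)
    pacs.store(study_id, finding)
    return finding


pacs = Pacs()
minimal_integration(pacs, "study-1", radiologist_edit=lambda f: f + " (reviewed)")
minimal_integration(pacs, "study-2", transferred=False)  # transfer forgotten
automatic_integration(pacs, "study-3")
print(sorted(pacs.results))
```

In this sketch the result for the second study never reaches the archive because the manual transfer is skipped, while the third study is stored without any opportunity for review. This reflects the trade-off between the two options described above.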

2.2.3.3 Platform AI integration. An alternative to having a separate AI workstation or node is integration through an AI platform. There are several big companies that offer a platform solution, which facilitates the integration of individual AI solutions. Consequently, AI vendors can decide to let their integration be handled by a different company. That company in turn needs to integrate into the hospital’s workflow through a platform that can communicate with the PACS. This solution implies little effort from the hospital’s side and is attractive if the hospital or clinic would like to use more than one software product for different tasks. The downside is that there are again different vendors that offer such a platform, so the software vendors that develop AI solutions must decide whom to collaborate with and may then only be available through this platform (Metz, 2019).

Figure 4

Platform AI Integration

Note. Adapted from 5 Options for AI integration into your radiology workflow, by Metz, 2019 (https://www.quantib.com/blog/5-options-for-ai-integration-into-your-radiology-workflow). Copyright 2021 by Quantib B.V.

2.2.4 Consumer acceptance and algorithm aversion. Aside from product excellence and integration, consumer acceptance is essential to product success. A product will not be adopted if there is no demand for it in the market. With regard to consumer acceptance of medical imaging software, we need to distinguish between the customer of the product, the user/consumer of the product and the patient. The customers of the product are the management teams within hospitals and clinics that decide whether to invest in purchasing the product. Their interest is mainly economic, which is why this paper focuses on the consumer and the patient.

The consumers of the product in this case are the radiologists who use the product to improve their diagnoses. Their acceptance has several underlying drivers. The most pressing ones are arguably the fear of being replaced by algorithms and the level of trust they place in the quality of a diagnosis made by an algorithm. According to a recent study, 38% of radiologists fear being replaced by AI. However, the same study also showed a correlation between fear of replacement and lack of knowledge of AI algorithms, whereas radiologists with an intermediate to advanced knowledge of AI had a more open and proactive attitude towards it (Huisman et al., 2021). Another factor is the level of trust the radiologist has in the algorithm. While some radiologists under-trust the algorithm and therefore tend not to rely on its results, others rely on it too much and do not examine the mammogram with sufficient attention (Jorritsma et al., 2015). A previous section described how the complexity of the algorithms in use can cause mistrust among radiologists (the concept of the “black box”). It also described how AI-CAD cannot, at this point, consider other patient information, such as the patient’s health history or cancer risk rate, when making a diagnosis. Being able to analyze the case in its context is a significant value added by a radiologist. Hence, both under-trust and over-trust result in decreased performance, which negatively affects consumer acceptance.

Lastly, one must consider the patient’s and, ultimately, society’s attitude towards AI-CAD. Having an algorithm make a diagnosis (autonomously) is also an ethical question. While the ethical arguments that come into play exceed the scope of this paper, one concept, called algorithm aversion, shall be mentioned at this point.

Humans tend to pick a human forecaster over an algorithm after seeing the latter err, even when they see that the algorithm outperforms the human overall. They are willing to trust an algorithm as long as it performs perfectly, but they are not willing to forgive a “mistake” as they would if the wrong prediction had been made by a human (Dietvorst et al., 2015). This phenomenon, referred to as algorithm aversion, might be even more pronounced in a medical scenario, as a false prediction (a false negative being even worse than a false positive) carries a lot of weight for the patient. What causes algorithm aversion remains an open question. Plausible explanations are that 1) people do not fully understand how algorithms function: they expect a human to be capable of learning from a mistake and not repeating it, whereas they assume an algorithm would repeatedly make the same mistake; or 2) they have a general mistrust of algorithms making important decisions and feel uncomfortable with it, especially after seeing that algorithms, too, are imperfect.

From these findings it appears that the acceptance of AI for medical decision-making increases with the knowledge people have about it, yet there is a lack of academic research on patient acceptance of AI-CAD for mammography and its drivers. The main takeaway of this section is therefore that there is still some reservation toward the use of autonomous diagnostic software, and for now it remains unclear how much of an impact this has on product adoption.

3 Methodological Framework

In the following paragraphs, the data collection and analysis strategy will be elaborated on.

A detailed visualization of the entire project workflow is attached in appendix A (Figure 5).

3.1 Data Collection

This study uses a qualitative research design in the form of expert interviews and follows a purposive sampling approach. The interviewees are employees involved in software development and/or integration at the market-leading medical imaging companies in Europe. The companies were identified through an extensive analysis of the competitive landscape of the market. Next, the individual companies were screened for employees who work on the development of the company’s product or on its integration with clients. All potential interviewees (with the exception of the pilot interviewee) were contacted mainly through LinkedIn, and some were referred by colleagues.

In total, five interviews were conducted, after which the point of data saturation had been reached. The first interview served as a pilot interview. It took place after the completion of the literature review and was meant to provide an intermediary’s impression of the main considerations concerning the adoption of AI-CAD at hospitals. The first interviewee is responsible for the IT integration at 14 different hospitals in Germany and is the contact person for all stakeholders on questions concerning system infrastructure and software solutions. The interview lasted one hour and took place via video call. Apart from some general questions about the process of software integration at hospitals at the beginning, the interview was unstructured. The researcher aimed to let the interviewee lead the conversation toward topics they considered important for autonomous software solutions for medical purposes. What followed was an open discussion and an exchange of experiences gathered during the past 20 years. Based on the insights gained during the first interview, the researcher decided to focus on the challenges as perceived by medical software vendors/medical imaging companies. The design of the interview protocol for the subsequent interviews was informed by the pilot interview.

The remaining four interviews took 25 minutes on average, were conducted via video call (Zoom) and were audio recorded. Prior to the interview, the participants had been provided with a two-page information sheet about the purpose of the study, why they had been asked to participate and how their data would be collected and processed. They were informed that the study had been approved by the Economics and Business Ethics Committee of the University of Amsterdam. Additionally, they were asked to sign a consent form to confirm that they agreed with the terms for recording, processing and storing the data. In exchange for their participation, they would receive, if interested, a report summarizing the key findings of the research after completion of the thesis project. All interviewees signed the consent form and asked for a report upon submission of the thesis project.

The interviews were semi-structured. Due to the different job positions of the interviewees, there were two slightly different interview outlines: one for people focused on software development and one for people focused on product integration. One interviewee carries responsibilities in both areas, so the guiding questions they were asked were a mix of both interview outlines. One interviewee works for a PACS vendor, but their interest lies mainly in the integration of different AI solutions (such as AI-CAD for mammography) into the company’s PACS system. To establish which interview questions would be appropriate for which participant, the first question in every interview targeted the job and the responsibilities of the respondent. The interview structure was chosen to ensure that the broad focus of all interviews would be the same (so that statements made by respondents in different positions could be compared) while still leaving room to explore novel insights that arose during the interview.

The interview protocol is attached in appendix B.

After each interview, the researcher manually created an interview transcript with a predetermined structure, using the audio file. The resulting transcripts were then edited lightly by the researcher, following an intelligent verbatim transcription style. The interviewees were provided with the final transcript of their interview to verify their data and to object in case they felt their statements had not been recorded accurately. Two participants made minor adjustments to how a sentence was phrased or asked for a part to be left out for confidentiality reasons, but all participants agreed with the content of the interview. All interview transcripts can be found in appendix C.

3.2 Data Analysis Strategy and Techniques

The main strategy used for the data analysis was a relational content analysis. This approach was considered most appropriate, as the research is largely inductive in nature and aims to provide novel insights into the personal experiences of key players in the field. In the following, the individual steps of the data analysis are specified.

As previously mentioned, interview transcripts were created after each interview. NVivo was used to help encode and organize the data gathered. In a first step, the researcher skimmed the interviews to get an overview of the whole (Hsieh & Shannon, 2005) and decided on some general codes that should be applied. As the participants were usually more involved in either the software development or the integration of the software in the customer’s workflow, most codes had a subdivision so that the opinions of the two groups could be compared. From the information the respondents provided, some new codes were determined, which resulted in 30 codes overall. After all transcripts had been encoded, the different codes were scanned for patterns (second cycle coding). Some codes were merged with others and a more specific structure was applied to the codes. In the end, there were eleven thematic top-level codes with further subdivisions, to avoid the loss of context and data (see Figures 6a and 6b in the appendix). Based on the resulting codes, the answers of the two groups of interviewees were compared and relationships were established, which are specified in the next section. The advantage of this data analysis technique is that the focus is on the information gathered and there is no rigid coding structure that might influence the outcomes of the study. However, due to the manual coding, the findings can be subject to the personal interpretation or prioritization of the researcher (Ratner, 2002). Also, due to data reduction in qualitative data analysis, there is always a risk of context loss and misinterpretation. The researcher implemented different strategies to mitigate the risk of subjectivity, which are presented in Figure 9 in appendix A.
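The second coding cycle described above, in which first-cycle codes are merged under thematic top-level codes, can be sketched as a simple regrouping step. The analysis itself was carried out in NVivo; the following is only an illustrative sketch, and the code names, the `MERGE_MAP` assignments and the segment texts are hypothetical examples rather than the actual codebook.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (participant, first-cycle code, transcript segment); all values are placeholders.
Segment = Tuple[str, str, str]

# Hypothetical assignment of first-cycle codes to top-level codes.
MERGE_MAP: Dict[str, str] = {
    "algorithm performance": "product excellence",
    "data availability": "product excellence",
    "low standardization": "integration",
    "dependency on PACS vendors": "integration",
}


def second_cycle(segments: List[Segment],
                 merge_map: Dict[str, str]) -> Dict[str, List[Segment]]:
    """Regroup coded segments under their top-level code.

    Codes without an entry in merge_map are kept unchanged, and every segment
    keeps its original code so that context is not lost in the merge.
    """
    grouped: Dict[str, List[Segment]] = defaultdict(list)
    for participant, code, text in segments:
        grouped[merge_map.get(code, code)].append((participant, code, text))
    return dict(grouped)


def participants_per_code(grouped: Dict[str, List[Segment]]) -> Dict[str, List[str]]:
    """List which participants contributed to each top-level code."""
    return {code: sorted({p for p, _, _ in items}) for code, items in grouped.items()}


segments: List[Segment] = [
    ("P2", "low standardization", "segment a"),
    ("P1", "dependency on PACS vendors", "segment b"),
    ("P3", "algorithm performance", "segment c"),
    ("P4", "legal & ethical", "segment d"),
]
grouped = second_cycle(segments, MERGE_MAP)
contributors = participants_per_code(grouped)
```

Keeping the original code alongside each segment mirrors the use of subdivisions below the top-level codes, and the per-code list of participants is the kind of overview that allows the answers of the two groups to be compared.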

4 Results

In the following section, the results and findings of the interviews will be discussed. To facilitate the allocation of quotes and arguments made, Table 1 below serves as an overview of the participants that contributed to the study.

Table 1

Participants Overview

In the following, the author quotes the participants with reference to their participant number. In addition, the complete interview transcripts with the respective participant numbers are attached in appendix C. Table 1 shows that all participants have different jobs that are targeted toward product development, product integration or both. Each interview’s focus was chosen based on the participant’s current and past job responsibilities. As such, P3’s interview focus was product development, as the interviewee’s past responsibilities (and therefore their experience) were targeted toward product development. P4’s interview was a mix of integration-focused and development-focused questions. While the interviewee indicated that their past responsibilities were very much focused on product development, their attention to product integration is now increasing significantly. That said, all participants, regardless of their precise position, were asked questions about both development and integration. As the pilot interview, listed as P5 in the table above, was an open interview, its focus cannot be classified as either development or integration.

4.1 The Pilot Interview

As mentioned in the Data Collection section of this paper, the first interview served as a pilot to get a better grasp of the pressing topics in the field of AI-CAD for mammography from an expert who is responsible for IT integration at hospitals and is therefore in touch with both radiologists and software vendors.

One aspect that they mention very early in the conversation is that radiologists’ time is very valuable and that the interpretation of mammograms takes up a significant amount of it. The integration of a technological solution is always a big financial investment; however, governments now realize that change is needed to guarantee good patient care in the long run. This is why governments in some countries are passing new laws that create budgets for incremental change in hospital IT (such as the Krankenhauszukunftsgesetz in Germany). Different stakeholders, such as hospitals and lawmakers, are currently scrutinizing how decisions in healthcare are made and whether having humans do everything manually is truly the best solution. The interviewee also mentions that mentalities differ between countries and that this process is further advanced in some countries than in others.

Concerning the actual technological solutions, the interviewee highlights the difference between a decision support system and one that actively makes decisions and therefore also takes over significant responsibility from the clinician. Besides the legal implications, they mention that companies may also be reluctant to take on this liability. Contrary to what the academic literature suggests, the respondent does not see confidentiality issues in data sharing for product development, as data can be anonymized and there should be no reason to fear that it can be traced back to a natural person. In their view, a far bigger issue is the current architecture of several PACS systems, which are rather closed off and do not allow for easy integration of external software. This could require significant effort on the customer side, which could reduce the willingness to adopt such software.

4.2 The Industry Experts

4.2.1 The drivers for the adoption of autonomous AI for mammography. The aim of this study is to determine the key challenges that software vendors see for the adoption of (more autonomous) AI software for mammography analysis. First, the main drivers that determine the adoption of AI for mammography were extracted from the interviews.

Figure 7

Adoption of AI-CAD for mammography drivers

As displayed in Figure 7, all interviews jointly identify algorithm performance, integration in hospital IT and the clinician’s workflow, acceptance of stakeholders and legal & ethical questions as the most decisive variables that could positively or negatively influence the adoption of the software by a client. The legal & ethical aspects are not discussed in depth in the interviews or in the literature review of this paper, as they require significant attention and additional expert opinions from the legal departments of the companies in question. Nevertheless, they represent an important pillar of software adoption that the interviewees are aware of, and they offer potential for future research.

4.2.1.1 Product excellence. The other three pillars entail some subcategories that already suggest some components of the underlying drivers of the pillars. First, there is product excellence.

All firms mention that their software significantly helps radiologists to make a decision with confidence. The main advantage they see is that they can offer reassurance to the radiologist making the decision and help reduce their workload. As P4 puts it: “[…] at the end of the day, our goal is that the performance of reading and detecting cancer in mammography and tomosynthesis improves. So at the end they have more women whose cancer is detected and it is more possible to kind of minimize the impact of the disease. This is something we believe in and we have studies showing that this is the case”. When asked what data they work with and whether they collect data from their clients, they all reply that only a minority of the data used comes from public datasets. Some companies receive data through standing research affiliations with universities or hospitals. All companies indicate that they work with data that they receive back from their customers, but only P3 indicates that this is an essential component of their product: “This is a very important thing for all AI manufacturers. It’s to know your weaknesses and of course what you are good at. The weaknesses are the most important so that you can try to improve your algorithm all the time”. For the other companies, performance feedback is sometimes received from customers if specifically agreed upon. However, reporting data back is not a requirement when the product is purchased.

4.2.1.2 Integration in hospital IT and the clinician’s workflow. The next pillar is integration in hospital IT and the clinician’s workflow. One interesting aspect is that both employees who have a stronger background in algorithm development (P3 and P4) do not see software integration as a big issue, while the two employees who are focused on integration see it as the biggest challenge. P3 mentions that the results of their algorithm are available through a web page, which guarantees easy integration. The interviewee mentions that it can therefore be integrated in all environments. This integration is comparable to the minimal AI integration described in the literature review, which is popular because it requires little initial integration effort from the customer. By contrast, the employee working for a PACS vendor (P1) mentions that the aspect offering the most room for improvement is “definitely the integration”. According to P1, “the AI algorithms are set up mostly next to a PACS, they have their own interface, their own viewer, so the radiologist has to open up different software next to their PACS where they are working in daily and that is kind of a hassle for the radiologist at the moment”. They point out that radiologists, especially those analyzing different types of medical images, often have to use separate applications and viewers, which is inconvenient. For this reason, their company has decided to offer a platform that integrates AI vendors directly. However, the downside is that “not all AI vendors are suitable to hook up onto that platform”. This solution is comparable to the platform solution described in the literature review, with the difference that the PACS vendor is fixed. The different perceptions of software integration in the radiologist’s workflow are striking and receive more attention at a later stage of the data analysis and in the findings.

4.2.1.3 Acceptance of stakeholders. The final pillar that will be discussed in detail is the acceptance of stakeholders. The stakeholders considered for this question are the people along the product chain, with a main focus on radiologists and patients, as described in the literature review. The participants contribute more insights into the perceived acceptance by clinicians than by patients, which may well be because clinicians are the stakeholders they are closest to.

While the literature suggests a certain reluctance towards AI-CAD, especially among radiologists, all interviewees agree that radiologists are very open towards the changes and developments that the field is experiencing. During the first interview, the participant stresses that this is a positive development of recent years. They reason that “if we go five years back, it was way different, because AI was kind of a black box for most people and nowadays everybody is using AI on their phones and stuff like that. Everybody is more familiar with what AI does”. P1 also mentions that while a minority of radiologists is still concerned about losing their jobs, it is not the regular radiologist who will lose their job, “but I think it’s the radiologist who doesn’t use AI that will be replaced for the radiologist who uses AI to improve their diagnosis”.

P3 argues that this openness and optimism toward technological development and its involvement in the workflow stems from the nature of the radiologist’s job: “The good thing with radiologists is they are pretty techy people, because they are used to technology since the beginning. They use scanners, MRI, ultra sound. They need technology since the beginning, it’s not the standard physician”. They continue: “Before I was working more with surgeons and I saw the difference”. Another participant, P4, mentions that the key to openness on all sides is transparency and clear communication from the vendor, which in the end strengthens trust, support and eventually acceptance: “We talk to our customers and they know our intentions, because our papers also show our future intentions of the health. And they are also contributing to this, because some of them are partners on this”.

Lastly, it should be mentioned that, in the eyes of some participants, human aversion to algorithm usage is still the main reason for slow development in the field of AI-CAD. First and foremost, they see patient acceptance as the major hurdle: “[…] are we ready as humans to have an autonomous algorithm reading our mammograms? That is a big question, so we will need to prove scientifically that it is safe and then patients can trust it” (P3). P3 also mentions that while most radiologists are optimistic about the usage of CAD software, the radiologists who are not supportive of the developments might still hinder progress in the field. As they phrase it: “radiologists will fear to lose their jobs and we really need to enforce the position that they are not going to lose their job, their job is going to change. It is a big revolution as we’ve seen it in different times in the world history as the mechanical automation and we will face the same issue. These would be the two main things. Technology is becoming ready and now it’s really a human thing”.

The derivation of the main drivers through some key quotes extracted from the different interviews is visualized in Figure 8a below and Figure 8b in the appendix. The two figures show how important quotes by the participants led to the findings presented in Figure 7 (which has been elaborated on in the previous paragraphs). Figure 8a offers a more concise illustration of the concept development with only one exemplary citation per dimension, whereas Figure 8b in the appendix is a more elaborate representation of the arguments made during the interviews.

Figure 8a

Derivation of the Drivers of AI-CAD Adoption

[Figure 8a: diagram linking illustrative quotes to second order dimensions and driving factors. Its text elements are listed below.]

Driving factors: Product excellence; Integration in hospital IT and clinician’s workflow; Acceptance of stakeholders; Legal & ethical

Second order dimensions: Algorithm performance; Data availability; Research affiliations; Static product limitations; Dependency on PACS vendors; Low standardization; Creation of plug and play platforms; Clinicians; Patients; Common interest of optimal patient care; Product promise and responsibility; Patient data protection

Illustrative quotes:

“[…] it has been proven that AI is very well capable of predicting whether there is a tumor or not inside of the breast and they are doing a pilot right now to see if they can replace one of those radiologists. Also, the algorithms keep learning, so they keep getting better and better […]” (P1)

“I think the majority of the data that we use to train our product is not publicly available” (P2)

“Some others are datasets that were previously built through research projects. […] we had some history on already working on datasets, so we have access to those datasets that are used and some others are indeed via customers.” (P4)

“Each PACS system will require Company X to do the integration in a specific way. So there is not really a guideline or a golden standard.” (P2)

“I think one of the biggest rooms for improvement would then be standardization. So if there was more standardization between PACS systems, that would make things a lot easier.” (P2)

“[…] a lot of platforms are in development. […] it’s really a platform that will be offered to a customer with any PACS system and the platform will handle all the communication with the AI algorithm. […] I think that is quite a strong development.” (P2)

“We are more capable on showing what that black box is to the end users so that they can comprehend what the algorithm does and I think it is getting more accepted.” (P1)

“I think the general perception of AI is quite good and they [clinicians] see that they can improve diagnoses and help the patients. That’s what you’re doing it for, right?” (P1)

“Either by replacing one reader in the double reading status in Europe or maybe more to the extreme - but that’s a bit more critical, because then you run into more liabilities - to even say you have the system telling that your mammogram shouldn’t be read at all, basically.” (P4)

“So at Company X, we deliver a static product so the product will not learn anything until of course the moment that we update the algorithm.” (P2)

“That also has to do with different contracts between the vendor and the PACS vendors, but also the hospital. Like a three-way contract on how to send over data and feedback the results of the algorithm. But it’s definitely possible.” (P1)

“I think everybody is quite open in using AI algorithms. There are still some older generations of radiologists who still feel that AI is taking over their job. […] But I think the general perception of AI is quite good […]” (P1)

4.2.2 Important topics as perceived by development versus integration focused employees. The next step takes a closer look at the importance the two groups place on the individual drivers and what they associate with them. To compare their opinions based on their experience in the field, some broad topics that dominated the interviews were chosen. The view they take on the different main questions is summarized in Table 2 below.

Table 2

Different Perceptions of Integration vs Development Department

4.2.2.1 Future outlook. From the table above, it can be seen that the two groups identify the core challenge of the adoption of medical imaging software differently. One similarity is that both expect AI-CAD to be applied more often in the near future and that there will be a redefinition of the tasks that a radiologist or an algorithm is responsible for. People on the integration side have a slightly more reserved attitude toward the importance of AI-CAD within the radiologist’s workflow than people on the development side. One interesting comment was made by the medical applications engineer who works for a PACS company. P1 suggests that a possible solution in the years to come would be that people who are not trained radiologists could take over some simple tasks from radiologists to free up capacity: “Maybe even tasks that are being done by radiologists now can be done by technicians for example using AI, because the tasks are also getting simpler in that kind of way. The simple tasks are being done then by the technicians and the radiologist will then need to make the diagnosis obviously”.

In the next category, the researcher aimed to trace the challenges each party perceives for itself, but also for the other group. This makes it possible to compare whether both groups identify the same core challenges for product development as well as for product integration.

4.2.2.2 Challenges for the development. Considering the product development first, employees on the integration side and on the development side identify slightly different core challenges. The employees focused on product development indicate that the main issue they see for the product development is not the improvement of their product per se, but rather the acceptance and consequently the adoption by patients. When it comes to algorithm performance, they do see room for improvement, but in their view the cause of low adoption rates is not insufficient product performance. P3 claims: “Technology is becoming ready and now it’s really a human thing”. They rather see further improvement of the algorithm (mainly through the improvement of training data) as an opportunity to increase applicability to different patient cases and to enable autonomous decision making for all. One interviewee highlights that different ethnic groups run different risks of developing cancer throughout their lives and that their chest might be anatomically different. For this reason, they underline the importance of accurate representation of such cases in the training datasets: “So in the long term we will really have to make sure to remove all these bias as much as possible. And this will of course be a very important step before more autonomous decisions [algorithms are possible]. If we say okay, we have 100% accuracy but it is not applicable on all women but only to some ethnicities, then there is a real problem in terms of applicability”. P3 also mentions the option of combining different information sources in one algorithmic diagnosis to optimize accuracy: “The question behind is what do we want to do, because you can add many different parts. You can combine different types of exams, like ultra sound, MRI to get even better performances”.

On the other hand, the biggest challenges for product development as perceived by people on the integration side are the privacy regulations that are in place as well as certain functionalities that enable easy integration of the product in the workflow.

4.2.2.3 Challenges for the integration. Concerning the integration, the software developers see less of a concern. P3 states “Also, going back to integration, as it is a web page, we are very easy to integrate in practice, which is a big advantage”. Interviewee P4 mentions that due to mammography solutions being the most established ones in the market, processes are already standardized and if implemented correctly by everyone, integration should not be a big problem:

“[…] basically in mammography, because of CAD that has been the first radiology application where AI was used years ago, the standard is already there. It’s basically people implementing and making use of this and of course, some are specific things that are more happening. Especially in 3D [tomosynthesis], those things are kind of enhanced navigation through the slides for the standardization, that are just implementations of these features that are needed”. An opportunity developers see is the integration of other systems to improve diagnoses even further in the future. P3 says: “You can combine different types of exams, like ultra sound, MRI to get even better performances”. All these statements suggest that integration is not seen as a big hurdle or concern from the development side, but rather as an opportunity.

The comments made by people focused on integration stand in contrast to this. One big challenge they see is integrating with PACS systems or software solutions from different vendors. According to P2, one possible solution could be greater standardization of the process: “I think one of the most challenging parts is the different integrations that we have. For instance, there are a lot of different PACS systems that we use. Each PACS system will require Company X to do the integration in a specific way. So there is not really a guideline or a golden standard. The only standard that is there is the DICOM standard. So each system will use DICOM but all PACS systems want something else or cannot handle, for example, structured reports or are not able to fetch the score from the study so they cannot prioritize worklists”.

The same respondent also indicates that AI vendors are limited in their functionalities when they integrate with a PACS vendor, as the software can only provide solutions that are supported by the PACS system. For instance, if the PACS vendor does not have an integrated viewer for mammograms, then the image solutions from the AI vendor cannot be displayed through the PACS. As a possible solution, the employee mentioned that their company is planning to develop viewers for their solutions that work without the support of the PACS. However, they are also aware that this comes at the cost of user friendliness: “For a radiologist it’s a nightmare. Having different viewers for everything, dedicated viewers for everything – it is a real nightmare. When it is completely integrated – some viewers can be completely integrated in their PACS – then the radiologist won’t really mind. But normally that is not the case” (P2). An alternative