
WHY FRAUD DETECTION ALGORITHMS HAVE TO BE DATA ETHICAL READY

For implementation in the public sector: an explorative study on how SyRI contributes to data ethics readiness

Meike - Yang Cnossen

Student number: S2433494

Date: 7 August 2020

Supervisor: Alex Ingrams

Word count: 24426

Leiden University 19/20 | MSc Public Administration International and European Governance


Abstract

The use of big data in the public sector has become much more common. Algorithms are used in various domains of the public sector, one of which is the safety and security domain, to detect and prevent fraud at an early stage (van Ark, 2020). Algorithms can help to analyze data and identify irregularities that might indicate fraud, which can then be dealt with (Konasani et al., 2012; Hipgrave, 2013). The increased use of big data and algorithms in the public sector has led to more attention to the ethical aspect of working with algorithms. However, there is no concrete guidance at this moment on how to develop ethical, responsible algorithmic fraud detection tools that can be implemented in the public sector.

This research focuses, therefore, on what fraud detection algorithms need to be data ethical ready for implementation in the public sector. To answer this question, we will focus on what big data, algorithms and artificial intelligence are, and on what ethics and data readiness mean. This knowledge will be combined to develop a data ethics readiness framework, which will help to understand the importance of providing concrete guidance on how to deal with data ethics risks that might occur when working with big data and algorithms in the public sector.

For this research, we will execute a document analysis using the SyRI case to analyze the data ethics framework. In the Netherlands, a system called SyRI was designed, which uses predictive algorithmic analysis to detect welfare fraud at an early stage (WRR, 2016). This case received a lot of publicity and open debate due to its questionable approach to the privacy of citizens. The case helps to understand how algorithmic fraud detection systems work in practice and what needs to be done to create data ethics ready algorithmic instruments for fraud detection in the public sector.

The data ethics readiness framework includes concrete concepts that cover the main challenges and risks when working with algorithms. It therefore shows what algorithmic fraud instruments should look like when implementing such a system in the public sector. The framework can be divided into the ethical side and the data readiness side. In this framework, the focus lies on the main challenges when working with algorithms, such as unjustified action, opacity, bias, discrimination, autonomy, moral responsibility, accessibility, validity and context. Moreover, the framework pays attention to what is needed and how to deal with implementing algorithms in the public sector, such as good representation, transparency, context, accountability, responsibility, (legal) guidance, where data comes from, faithfulness and representation, and appropriateness for goal.

The SyRI case demonstrates that the law Wet structuur uitvoeringsorganisatie werk en inkomen (SUWI) is an essential part of the justification of implementing SyRI in practice. The law is a valuable source that explains how responsibility and accountability have been assigned to specific governmental authorities. Moreover, the law shows how SyRI fits into the approach to preventing fraud and what the different stages of SyRI look like (WRR, 2016; Raad van State, 2014; Ministerie SZW, 2014).

While the preparation and design of SyRI might look reliable, there is a lack of attention to the possible risks that can occur and to how these should be dealt with in practice. The concerns arising in practice revealed an under-representation of essential data ethics readiness elements that it is necessary to take into consideration. For example, it is unclear how the risk indicators influence the outcome of the algorithm, and the SyRI system cannot be controlled; therefore, it is challenging to justify decisions made based on this system.

The SyRI case shows that the government aimed to design a reliable fraud detection system enshrined in law; however, this is not enough to create a data ethical ready instrument that can be implemented in the public sector. While regulations are essential to guide how one should work with algorithms in the public sector, it is also crucial to focus on other factors, such as fairness, good preparation, adding value and transparency. Creating a plan guided by these factors will help to develop algorithmic fraud detection systems with a focus on the goal and purpose of using such instruments and on how this can be achieved in an ethical, responsible way. Moreover, the data ethics readiness framework will contribute not only to good design but also to concrete steps on how to accomplish this and how to deal with issues that might arise. The data ethics framework can also provide tools for evaluating or monitoring other algorithmic fraud detection systems, to assess whether they have what is needed to be implemented in the public sector.


Table of contents

Abstract

1. Introduction
1.1 Big data and its impact on societal life
1.2 Algorithms for the safety and security domain and developments in the Netherlands
1.3 Research question
1.4 Introduction method of data collection and method of analysis
1.5 Outline context thesis

2. Theoretical framework
2.1 Some definitions: big data, artificial intelligence and algorithms
2.2 Algorithms and fraud prevention
2.3 Ethics of algorithms
2.3.1 Computer ethics
2.3.2 Kant’s philosophical perspective on ethics
2.3.3 Importance of context and ethics
2.3.4 Ethical challenges when working with algorithms
2.4 Data readiness
2.5 Data ethics readiness framework
2.6 Conclusion

3. Methodology
3.1 Conceptualization and operationalization
3.2 Research strategy
3.3 Data collection and method of analysis
3.4 Validity and reliability

4. Case study and analysis
4.1 Introduction SyRI case
4.1.1 How the SyRI system functions
4.2 Critiques and concerns
4.3 Data ethics readiness framework analysis
4.4 Conclusion

5. Conclusion

6. Discussion

References

Appendices
1. SyRI categories data input
2. Elaboration data collection strategy


1. Introduction

1.1 Big data and its impact on societal life

Nowadays, data is collected and used everywhere (Yeung, 2018). All this information is labelled big data and can be valuable for organizations. Some organizations rely on it entirely, such as Facebook and Google, which use big data as a business model (Moscaritolo, 2020). Others, namely in the public sector, use it to improve the internal organization or (public) service delivery systems (Berryhill, Heang, Clogher, & McBride, 2019). Dealing with big data requires new working methods and technologies which can handle this enormous flow of constant real-time information (Klievink, Romijn, Cunningham, & Bruijn, 2017). Artificial intelligence (AI) and algorithms are technical tools and systems designed to help process and analyze big data in a way that was not possible before (Yeung, 2018). These new tools create new possibilities to innovate and to implement big data and algorithms in the public sector as a regular component of work processes. According to Carriço (2018), artificial intelligence transforms the way we (will) work. Therefore, it is essential to be aware of how this can be beneficial for society as a whole, as well as how artificial intelligence and algorithms can be implemented in the public sector in a well-considered, ethically responsible way. What are the risks at stake? And how can these be tackled?

Big data and data analytics are used for the relevant information they provide, which can assist in dealing with (societal) issues and decision-making processes in the public sector. Artificial intelligence can help governments with resource allocation, comprehending large datasets, expert shortages, predictable scenarios, and procedural and diverse data topics (Mehr, 2017, p. 4). Artificial intelligence is innovative and can help the public sector to work more efficiently in different ways and to increase its interaction with citizens while using more technology (Berryhill et al., 2019).

However, it is vital to understand that artificial intelligence is not the solution to all issues, and depending on the specific situation, additional action is needed to deal with particular societal or internal organizational challenges. Moreover, it also raises questions concerning ethical issues such as privacy, bias and lack of transparency (Fink, 2018; Mittelstadt, Allo, Taddeo, Wachter, & Floridi, 2016). Artificial intelligence and algorithms rely on specific assigned tasks using mathematical formulas (Kitchin, 2017), and therefore the outcome always needs to be put into the right context to be understood correctly. There is a need to know how big data can be used and implemented ethically.


Nevertheless, artificial intelligence is already implemented in various sectors of the government, including the safety and security field. Risk profiles and data analyses are made using big data analytics and algorithmic tools to deal with safety and security issues. These data analytics and risk profiles are explicitly deployed in, amongst others, the fight against tax and welfare fraud in the public sector in the Netherlands (WRR, 2016). Using such instruments has consequences for the privacy of citizens and the transparency of the instrument itself, in addition to the need for accountability: explaining how executing this instrument is ethically responsible.

The development of using big data technologies and tools in the public sector is relatively new. The approach to this development is still ongoing and experimental, searching for a way to deal with it correctly. Much attention is given to this development by international and national (public) organizations (Carriço, 2018). The OECD (Berryhill et al., 2019) presented a working paper on how to use artificial intelligence in the public sector, focusing mainly on the definition and context and on what possible work approaches can be used. The European Commission published a white paper on how artificial intelligence can be embedded into the shared values, following the same rules, to ensure the shared values and privacy of all EU citizens (European Commission, 2020). While the EU is learning how to work and succeed with artificial intelligence, many individual countries (UK, Germany, France, China) are competing to obtain a key position within artificial intelligence developments (Sloane, 2018). Besides the investment in the technological development of artificial intelligence itself, more attention is given to the ethical aspects. It is essential to acknowledge the limits of machine learning and deep learning and what machines can and cannot do. It is therefore "important to reevaluate AI relating to power, democracy and inequality and what this means to the human" (Sloane, 2018).

While there have been many attempts to indicate the risks and challenges, it is still not clear how to use this broad guidance in practice and what consequences implementing it in practice has. It is often stated that issues such as bias, the black box, transparency, discrimination and privacy are challenges, but an account of what this means when implementing algorithmic systems in practice is lacking. Also, when working with big data, it is essential to look at the implementation by organizations as well: the (data) readiness will also affect the ethical justification.

It is essential for the public sector to have a clear vision of what the goal is and how the public interest could be served when using artificial intelligence and algorithms, and to research how this can be achieved (Klievink et al., 2017; Lawrence, 2017). When using the outcomes of algorithms for decision-making processes or for providing public services, there should be a formulated plan on how algorithms contribute to this goal or procedure and what considerations have been made with regard to dealing with the implications that algorithms inherently contain.

Ethical consideration and justification are an essential part of working with big data technologies. In the private sector, the big tech companies which use a lot of big data and artificial intelligence systems already highlight the importance of ethical concerns and are proactive in creating more transparency and therefore responsibility (O’Brien & Lerman, 2019). These developments in the private sector have happened due to increasing demands from the public for more clarity and transparency concerning the privacy of users.

In the public sector, there is still a gap and an ongoing quest for how to deal with the ethical challenges that arise from the usage of technological systems such as algorithms and artificial intelligence. One can argue that the environment is not yet ready and able to deal with algorithms and big data ethically. There is a lack of research and knowledge about artificial intelligence within policy bodies, and according to Sloane (2018), social science is essential to provide knowledge in understanding how to use these technological systems to improve our organizations in an ethically responsible way. Besides the technical, technological aspect of artificial intelligence, it is also necessary to focus on how people will use such a system; therefore, it is legitimate to connect other fields, such as philosophy, to this discussion when talking about ethics in this seemingly objective system.

Moreover, it is essential to pay more attention to the ethical considerations and challenges, since in some areas, such as healthcare, possible risks and consequences of algorithms will affect individuals directly. People’s lives depend on the safety and efficiency of these algorithms (Tutt, 2017); even though it is complex to predict if and when an algorithm will fail, mistakes can have a significant negative impact. Still, the public sector remains interested in working (more) with big data and its technology systems. However, the complex process of how those algorithmic systems work and the (ethical) consequences they have for society make it challenging to implement them in an ethically responsible way, as is necessary in a public sector environment.

In this development, it is not clear what ‘ethical’ and ‘bias’ exactly mean within the artificial intelligence topic, nor how to deal with issues concerning privacy and other values that need to be guaranteed. As mentioned in different academic articles, the main challenges are highlighted as relating to privacy, transparency and biases. However, there is a growing demand for a clear ethical framework that provides an overview of what artificial intelligence and ethics mean and how this can be responsibly implemented in the public sector. Besides, the algorithmic systems should be shaped and be ‘ready’ to be implemented in the public sector. According to Klievink et al. (2017), the right knowledge and skills are needed to be able to work with big data. The data readiness of organizations should be aligned with the algorithmic systems, meaning that the organizations using big data technologies should know how to work with them and how to deal with the ethical challenges. The main focus should be on how to deal with ethical data readiness sustainably when using big data analytics and instruments in the public sector.

1.2 Algorithms for the safety and security domain and developments in the Netherlands

The Dutch government increasingly uses big data, artificial intelligence and algorithms. Since the development of using big data and artificial intelligence instruments is relatively new and ongoing, it is essential to understand how they can be used efficiently and to obtain goals (Algemene Rekenkamer, n.d.). The Dutch Court of Audit (Algemene Rekenkamer) is currently researching the use of algorithms within the Dutch government, looking into the different methods of algorithms and what effects and risks they have, as well as how organizations can work with algorithms and how they should and could be monitored and controlled. This research project can be seen as a follow-up to the CBS research paper1, which also looked into the usage of algorithms within the Dutch government. One of its findings is that algorithms are often implemented to identify high-risk cases concerning various risk categories (Doove & Otten, 2018). Accordingly, more research or other interventions are applied based on the risk indication.

Different organizations are currently monitoring the use of algorithms, and special attention is paid to the privacy aspect. However, it is unclear how organizations should work with algorithms, which rules apply and when their own insights can be applied. Reports such as Strategisch Actieplan voor Artificiële Intelligentie and Big Data in een vrije en veilige samenleving outline both the benefits and the risks of the use of artificial intelligence, what this entails for different policy areas, and what it means for the environment of the organization as well as for the organization itself (Ministerie van Economische Zaken en Klimaat, 2019). On the other hand, attention is given to the possible risks of using algorithms and what should be done to prevent such risks (WRR, 2016). It is concluded that the current algorithmic systems dealing with big data are not able to meet the ethical standards that are so important within the public sector. There is a growing demand for more research on how these systems using artificial intelligence can best be reformed to ensure that they fit within the public sector and add value to the organizational (internal) processes and services.

1 Research CBS use of algorithms by governmental organizations:

In the safety and security domain, big data and algorithmic tools are used to predict risks and make risk analyses to prevent fraud (WRR, 2016). A data-driven approach can help to identify irregularities in systems or data sets that can indicate fraud (OECD, 2019). What is essential is to create fraud risk assessments for acknowledging the risks and the consequences of the decisions made and what effect this might have on others. Various projects are implemented using big data analytics because of the increasing demand by citizens and politicians to tackle fraud issues (Olsthoorn, 2016).

Using risk profiles based on data analysis helps to indicate risk categories which require additional consideration. Risk profiles contribute to a better allocation of capacity and to achieving the stated goal (Doove & Otten, 2018). Big data provides information and data that can be analyzed by tools such as algorithms and artificial intelligence systems. The input used by these instruments is essential, since it will affect the outcome. The WRR (2016) acknowledges the importance of data collection, including data preparedness, data security and data storage, as well as the data analyses. Data consideration is essential, since the data is a crucial element of creating and predicting risk cases. Organizations working with big data analysis and risk profiles should be able and ‘ready’ to work with algorithms and big data while paying attention to how data is collected and used, as well as how this can be ethically justified. Organizations should be able to recognize the risks and know what can be done to minimize these risks.

One example of an algorithmic system that uses big data analysis and risk profiles to prevent fraud is SyRI. SyRI is implemented by the Dutch government to detect possible fraud cases by analyzing various data sets and information from citizens. The data is obtained via multiple involved institutions. The analysis provides a list of potential fraud cases which require further investigation. Based on this list, decisions are made that have an impact on individuals directly (Olsthoorn, 2016; WRR, 2016). While the law explains the work process of SyRI, it is challenging to understand how the input is transformed into the outcome.


Moreover, it is unclear how the work process exactly unfolds in practice. It is not clear how the possible fraud cases are detected and on which information the outcome is based. Therefore, it is challenging to justify the work process as well as to safeguard the ethical responsibilities. Furthermore, concerns were raised by civil society organizations about the lack of transparency and the violation of privacy and human rights, which indicates that, despite the rules laid down in the law underlying the SyRI system, the execution in practice did not lead to the desired outcomes.

1.3 Research question

Besides considering the ethical challenges and how to deal with them, it is essential to understand what data can be used and what consequences this has for decision-making and for individuals when follow-up actions are taken based on the algorithmic output. The current development of dealing with big data and algorithmic tools in the public sector, and the increasing attention to the ethical aspect of algorithms, show that there is a need for a concrete framework: a framework that guides how to deal with the ethical implications of working with big data and algorithms, as well as how organizations can be prepared to work with big data technologies responsibly.

The research question is as follows: What do fraud detection algorithms need to be data ethical ready for implementation in the public sector?

While big data and data analytics are used for fraud prevention and other (safety and security) goals, not much academic information has been published about the consequences or the design process. Therefore, general knowledge will be obtained from the literature on big data and algorithms, itself an ongoing development, and then adapted to the fraud prevention setting.

Working with algorithms can contribute to dealing with big data and to implementing new, innovative ways of working with and processing enormous amounts of data. On the other hand, algorithms are not yet sufficiently designed to be able to incorporate the risks and the ethical considerations into the work process. Besides, because working with new algorithmic systems in the public sector is still in the starting (experimental) phase, using the algorithms beneficially while also considering the ethical aspects can be challenging to implement within an organization. Researchers have paid attention to various aspects of algorithms, including the risks and the importance of ethics, and have debated how algorithms and big data can be implemented in practice. However, a clear framework is missing in this area. Moreover, there is a need for clarity on how to deal with systems like SyRI, which are often complex and comprehensive.

Combining the technological (data readiness) side with the ethical side of working with big data and artificial intelligence in the public sector, a framework will be developed to provide more guidance on how algorithmic systems can be developed in a way that provides ‘data ethical readiness’, so that they can be implemented in public organizations legitimately and responsibly. Big data and algorithms are already implemented in the security area at this moment; however, questions are raised about their validity. In the case of SyRI, the critique focuses on the violation of ethical standards. Therefore, this research will focus on how algorithmic systems can be implemented in a data ethical, responsible way. This research will attempt to clarify the relevance of the ethical considerations that need to be safeguarded in the design of algorithms. Besides, it will include the data readiness that helps the algorithmic system itself, as well as the executing organization, to deal with this responsibly. The outcome of this research can then be beneficial to improve systems such as SyRI and to address the criticism currently directed towards the public sector.

1.4 Introduction method of data collection and method of analysis

In this research, we will be using a document analysis focusing on the SyRI case. SyRI is an analytic data instrument implemented by the Dutch government to make risk analyses to assist with fraud prevention in the security sector. To give an in-depth impression of SyRI, different resources will be consulted, which will be further explained in the methodology section.

Often, algorithmic systems themselves are difficult to explain due to their black box characteristics. Looking into how algorithmic systems work and are implemented in the public sector adds more sensitivity, because citizens’ privacy and information are involved, as well as the interests of the public organizations which need to safeguard the privacy of these citizens. SyRI is an example that acquired a lot of attention in the media due to the civil society organizations that got involved, as well as the court case, which led to more awareness and information about the SyRI system. Also, the SyRI case has attracted a great deal of international attention from privacy experts around the world (Simonite, 2020), since the court case will decide how governments should deal with algorithms. The SyRI case even got support from United Nations rapporteur Alston (Alston, 2019; Simonite, 2020). Moreover, "the European court and regulators have influence and limit what governments are allowed to do when working with artificial intelligence and algorithms on citizens" (Simonite, 2020). The SyRI case will probably influence how other courts and countries interpret EU human rights law and the General Data Protection Regulation (GDPR), according to van Veen (Simonite, 2020). The SyRI case therefore shows the importance of addressing how to work with algorithms in the public sector and what value the human rights basis and following the GDPR have in designing and implementing such systems (de Haan, n.d.).

The court case and the public discussion around the SyRI system led to the publication of various documents and information concerning the work process and the challenges faced in executing SyRI. The parliament discussed SyRI multiple times, and these discussions have been recorded and published. Numerous research reports concerning the use of big data and algorithmic analysis to prevent fraud have been published. Based on this information, the data ethics readiness framework can be analyzed, giving insight into how SyRI dealt with the ethical challenges, whether there are shortcomings, and what can be improved in the future for similar systems. The data ethics readiness framework will help to analyze what is done, or should be done, to implement fraud detection algorithms in the public sector.

1.5 Outline context thesis

The remainder of this thesis is organized as follows. Section two describes the theoretical context of the meaning of big data, artificial intelligence and algorithms (2.1) and how big data analytics are used for fraud prevention (2.2). Furthermore, the role of ethics (2.3) and data readiness (2.4) will be reviewed, including how they relate to working with big data analytics and why this is important. Paragraph 2.5 presents the data ethics readiness framework that emerged from the essential ethics and data readiness considerations when working with big data in the public sector. In section 3, we will discuss the methodology, including the conceptualization and operationalization of the main concepts (3.1), the research strategy used (3.2), and the data collection and method of analysis (3.3). Section 4 presents an introduction of the SyRI case (4.1), the critiques and concerns (4.2), and an analysis based on the data ethics readiness framework presented in the theoretical framework section (4.3). The findings will be discussed in paragraph 4.4, as well as what they mean for the data ethics readiness framework. We will end with the conclusion (5), summing up the main findings and answering the research question, as well as making some remarks about the investigation in the discussion part (6).


2. Theoretical framework

Big data, artificial intelligence and algorithms are terms that are inextricably linked to each other. Artificial intelligence and algorithms are tools and systems that can be used when dealing with big data (Kitchin, 2017). All the concepts above deal with large data sets of information which are then processed by computer-controlled systems (Klievink et al., 2017). Artificial intelligence and algorithms can help to innovate and improve existing work processes by processing large amounts of data more efficiently. However, concerns are raised when it comes to the influence and consequences these tools can directly have on individuals in society. Often, working with artificial intelligence and algorithms conflicts with ethical considerations about what is acceptable and what is not. Besides, not all (public) organizations can implement these new technologies in the desired way.

First, the concepts will be defined, and the link between them will be explained. Next, the role of big data in fraud detection will be clarified. Moreover, the ethical side of the algorithms will be explored and why ethics are such an essential element when working with algorithms. This helps to understand the importance of ethics and how this should coincide with regulations, values and other factors. Furthermore, we will focus on data readiness and how it can influence ethical challenges. To conclude, the data ethics framework will be introduced.

2.1 Some definitions: big data, artificial intelligence and algorithms

Big data represents collections of data that are so large, varied and dynamic that only advanced computer technologies, such as artificial intelligence and algorithms, can deal with them (Kankanhalli et al., 2016, as cited in Klievink et al., 2017). Conventional hardware and software cannot deal with big data. However, what big data exactly means depends on the context. Thus, big data is a complex concept and has therefore yet to reach a definitive definition of what it entails explicitly. However, a definition is not a necessity to understand how big data works.

Klievink et al. (2017) distinguish five characteristics of how big data works. First, using big data means working with (multiple) large data sets from various sources, both internal and external to the organization. Second, the data analysis activities take place in structured (traditional) and unstructured ways. Third, the data set input often concerns real-time data, meaning data can be analyzed while it is created. Fourth, new technologies such as algorithms help computer technology to deal with the complexity of big data. And lastly, (existing) data is used in new, innovative ways, which implies that some information is used for applications other than those it was initially collected for. To sum up, big data can be used for different activities: collecting, combining, analyzing and using data in decision-making and work processes.

Big data provides the basic ingredient (data) that leads to innovations such as artificial intelligence and algorithms. Often, they have similar goals and challenges, since algorithms need big data to function, and artificial intelligence models use algorithms to work.

Artificial intelligence (AI) can be understood as a technological system that uses big data as input to process and analyze information. It can make decisions based on identifying patterns and irregularities in large data sets without human intervention (Sousa, Melo, Bermejo, Farias, & Gomes, 2019). Mehr (2017, p. 1) describes artificial intelligence as "the programming or training of a computer to do tasks typically reserved for human intelligence, whether it is recommending which movie to watch next or answering technical questions". With artificial intelligence and algorithms, it is possible to create machines that learn new skills that can be used for many different functions. Moor (2001, p. 89) describes it as: "We can design them, teach them, and even evolve them to do what we want them to do."

Carriço (2018) divides artificial intelligence into three categories. First, narrow intelligence, which includes machine learning and deep learning. It means that a machine is developed in such a way that it can perform specific tasks. It also shows the limitation of the machine, since it can only ‘answer a question’ or ‘solve a problem’ for the specified task it is fully dedicated to. While machine learning focuses on learning to make decisions based on (semi-supervised) feedback, deep learning implies that systems can detect patterns in data analyses that humans would likely fail to see (De Nationale AI-cursus, 2020, tracks 4 and 5).

Nevertheless, it is based on specific tasks and input given beforehand. Second, artificial general intelligence refers to ‘a human-level AI machine’ (Carriço, 2018, p. 30). It means that machines can conduct the same tasks a human can and act in the same way; the environment and understanding this context are an essential part. Third, artificial superintelligence means that ‘a machine is smarter than the smartest ‘Einstein’ in practically every field, including scientific creativity, general wisdom and social skills’ (Bostrom, 2016, p. 11, as cited in Carriço, 2018, p. 30).

Algorithms specify what kind of data will be analyzed and what the outcome will be. Based on this and the given input (big data), an outcome will provide information that can be used for the next algorithmic process, or it can already provide an ‘answer’ that can be used for decision-making processes (Kitchin, 2017).

When using algorithms in a real-world context, questions can be raised about the function they have and how they perform a specific task. This means there is an essential need for a clear vision of the mathematical, technological part, as well as of the rational and uncertainty side that can lead to other implications concerning objectivity, legitimacy and reliability. Algorithms can do many things, as Kitchin and Dodge (2011, as cited in Kitchin, 2017, p. 18) describe: "Algorithms search, collate, sort, categorize, group, match, analyze, profile, model, simulate, visualize and regulate people, processes and places. They shape how we understand the world, and they do work in and make the world through their execution as software, with profound consequences." Therefore, it is essential to know the implications in order to find a way to deal with them.

Yeung (2018) describes different ways in which algorithms can be used. First, as outcome-based regulation, which focuses mainly on the outcome and not on the process by which this outcome occurred, and which clashes with the ethical and socially responsible aspects. Second, as a data-driven performance management system, in which algorithms are used to focus on governmental improvement rather than on the processes. These different approaches to using algorithms show how the implementation of an algorithmic system can influence society and decision-making when decisions are based on the outcome of an algorithmic system. Besides the various interpretations of how algorithms are used and implemented in the public sector, algorithms themselves also contain challenges and limitations. The main problems concern validity and generalizability, depending on the context of algorithms and the purpose they serve (Mittelstadt et al., 2016).

The benefits of using big data and its technologies are that they provide new (innovative) ways of processing the multitude of data available in our current society. The amount of data is challenging for people to handle; however, artificial intelligence and algorithms make it possible to work with big data in an efficient way (Yeung, 2018). However, artificial intelligence and algorithms will not solve systemic problems that exist within government, as Mehr (2017, p. 1) highlights. Moreover, working with big data and algorithms can lead to challenges around service delivery, privacy and ethics. A challenge of any computer system, hence also artificial intelligence, is that it can contain bias, make unfair and discriminatory decisions, and that the system’s actions cannot always be predicted. These flaws become problematic when the system is used in a highly sensitive environment and human interests are at risk (Wachter, Mittelstadt, & Floridi, 2017).


While algorithms can help governments and public organizations to work more efficiently and precisely, it is essential to acknowledge that algorithms contain bias and are value-laden (Kraemer, Overveld, & Peterson, 2011; Vedder & Naudts, 2017). Moreover, it can be challenging to discover the bias within an algorithm (Fink, 2018), because it originates in the design of the specific algorithm. The input is often based on particular values and norms, which inherently contain a certain bias towards a perception of society. These biases influence the algorithmic process and therefore the outcome. Furthermore, it also depends on how the results are used and perceived and what consequences decisions have for society (Kraemer et al., 2011). When creating an algorithm, specific input leads to an outcome, which can be, for example, either positive or negative. The challenging part is how to deal with false positives and false negatives, which depends on the designer, since the designer decides what the input is and how the algorithm will process the data input. Decisions made based on algorithmic systems, or influenced by them, contain bias and are value-laden, since the input and output are shaped by particular perceptions and a frame of reference.
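To make the false positive/false negative trade-off concrete, the sketch below shows how a designer's choice of decision threshold shifts the balance between the two error types. It is a minimal, hypothetical illustration in Python; the risk scores, labels and thresholds are invented and do not come from any system discussed in this thesis.

```python
# Minimal, hypothetical illustration of the designer's threshold choice:
# the risk scores and fraud labels below are invented for the example.
cases = [
    # (risk score produced by some algorithm, was the case actually fraud?)
    (0.95, True), (0.80, True), (0.70, False), (0.60, True),
    (0.40, False), (0.30, False), (0.20, True), (0.10, False),
]

def confusion(threshold):
    """Count both error types at a given decision threshold."""
    false_pos = sum(1 for s, fraud in cases if s >= threshold and not fraud)
    false_neg = sum(1 for s, fraud in cases if s < threshold and fraud)
    return false_pos, false_neg

for threshold in (0.25, 0.50, 0.75):
    fp, fn = confusion(threshold)
    print(f"threshold {threshold:.2f}: {fp} false positives, {fn} false negatives")
```

Lowering the threshold flags more innocent cases (false positives), while raising it lets more fraud slip through (false negatives); neither choice is value-neutral, which is precisely the point made above.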

Also, due to the technical part of algorithms, it is challenging to understand how they exactly work, specifically how the input is analyzed and the process that leads to a specific outcome. Fink (2018) discusses the lack of transparency surrounding algorithms. Algorithms can, therefore, be compared to black boxes within the decision-making process, where it is unclear what decisions exactly lead to the outcome and what this process looks like (Sloane, 2018). Using an algorithm means that information (input) is transformed and follows a process (throughput) that leads to a result (output). But what exactly happens during the throughput phase of the process is not clear or transparent, hence the link to the ‘black box’ (Fink, 2018; Vedder & Naudts, 2017).

Using big data and algorithms in real life is entirely different from just looking at them from the technical side. The surroundings and context play an important role in how algorithms are designed and how they are used, implemented and interpreted in the social world. This has led to a new definition of ‘algorithmic decision-making’ (Yeung, 2018), where algorithms provide knowledge systems that are used for decision-making. At this point, algorithms are an essential part of the decision-making process. It is necessary to think about how to shape the context and surroundings of algorithms so that they are usable in the public administration sector in a way that does not clash with the current ‘standards’ of accountability and privacy. The development of using big data and algorithms in the public sector has led to more attention to what ethical considerations are needed to implement algorithms responsibly.


2.2 Algorithms and fraud prevention

Algorithms and big data are increasingly used to prevent fraud in various domains, such as the private sector, the banking sector and the health sector, but also the safety and security sector within the government. Big data and algorithmic analytics can be seen as smarter fraud detection tools that can be implemented by organizations (Hipgrave, 2013). The benefit of using algorithms is the possibility to detect fraud at an early stage and, therefore, to prevent fraud from happening. Discovering trends, patterns and irregularities in a tremendous number of data sets can lead to detecting fraud at an early stage. According to Hipgrave (2013), it is essential to know which fraud detection actions may be implemented within the existing legislation. Moreover, coordination and collaboration within organizations, as well as with closely involved organizations, are needed to share data and information. Collaboration increases the efficiency of preventing fraud, since work can be done more efficiently, and access to more data means having a more complete overview of specific situations, which can help to detect irregularities.

Another example of preventing fraud using big data analytics can be found in the health sector. Currently, two instruments are used to prevent fraud, showing how big data and technology have made their entrance into fraud prevention. The first is fraud audit rules, which means that health insurance claims are audited manually to detect fraud; manual auditing requires a lot of work and time. The second tool is the fraud prediction scorecard, an automated process based on computer statistical analysis (Konasani, Biwas, & Keloth, 2012). In this case, historical data was used to make prediction scorecards, which affects the quality of the data and the outcome. However, by creating a data analysis in which more data and cases can be included, and by taking influencing factors into account, the prediction scorecard can develop into a reliable and quick fraud prevention system.
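As a rough illustration of how such a scorecard works, the sketch below adds up penalty points for claim characteristics and sends high-scoring claims to manual audit. The features, point values and cut-off are invented for the example and are not taken from Konasani et al. (2012).

```python
# Hypothetical additive fraud prediction scorecard; all features,
# point values and the cut-off are invented for illustration.
SCORECARD = {
    "claim_amount_above_5000": 30,
    "provider_flagged_before": 25,
    "many_claims_this_month": 20,
    "treatment_mismatch_diagnosis": 35,
}

REVIEW_CUTOFF = 50  # claims scoring above this go to manual audit

def risk_score(claim):
    """Sum the points of every scorecard feature present in the claim."""
    return sum(pts for feature, pts in SCORECARD.items() if claim.get(feature))

claim = {"claim_amount_above_5000": True, "provider_flagged_before": True}
score = risk_score(claim)
print(score, "-> manual audit" if score > REVIEW_CUTOFF else "-> no audit")
```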

Research into insurance fraud detection points out that fraud can occur due to weaknesses in the legislation and rules of a system (Bologa, Bologa, & Florea, 2010). The benefit of using artificial intelligence for fraud detection is that it provides new tools to cluster and classify data. This means that data can be compared with existing rules and laws to find irregularities that might indicate fraud. Moreover, machine learning can help to show what the characteristics of fraud are, as well as offering the possibility to detect and classify data patterns unsupervised. Furthermore, an artificial intelligence system can learn from examples and use them in the future to identify new cases based on the existing examples.
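The sketch below illustrates the unsupervised side of this idea: a model learns what 'normal' records look like and flags outliers as candidate irregularities. The simulated data and the choice of scikit-learn's IsolationForest are assumptions made for the example; the sources above do not prescribe a particular model.

```python
# Hypothetical sketch of unsupervised irregularity detection on
# simulated insurance claims; data and model choice are illustrative.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# 500 'normal' claims: (claim amount, number of claims this year)
normal = rng.normal(loc=[200.0, 3.0], scale=[50.0, 1.0], size=(500, 2))
# two irregular records with unusually high amounts and frequency
irregular = np.array([[900.0, 12.0], [850.0, 15.0]])
records = np.vstack([normal, irregular])

model = IsolationForest(contamination=0.01, random_state=0).fit(records)
flags = model.predict(records)  # -1 marks records the model deems irregular
print("flagged records:\n", records[flags == -1])
```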


There is a broad idea of how algorithmic systems can help to detect and prevent fraud. Using artificial intelligence can lead to faster, more efficient ways to process and analyze data and to detect irregularities. Ideas are presented on what is essential to keep in mind, such as existing regulations, collaboration and data sharing. However, the main focus lies on detecting fraud, and less attention is paid to how to deal with privacy complications or other ethical challenges that might occur, which is essential when working with algorithms.

2.3 Ethics of algorithms

Working with big data and its technological tools, such as algorithms, comes with various challenges and implications, as briefly mentioned above. Some of these challenges focus on the technical side of algorithms and big data tools, while others focus on the practical side of the implementation process. Both the practical and the technological side of working with big data carry a risk of leading to implications that concern the ethical conduct of algorithms. When we talk about the ethics of algorithms, we focus on the question of what is right and what is not. According to Sloane (2018), it is essential to have an understanding of what ‘ethics’ entails to be able to deal with it. But first, what does the ethics of algorithms mean? And why is it important?

2.3.1 Computer ethics

The technical side of algorithms involves computer ethics, which focuses on values that are broadly shared by many worldwide. Talking about ethics within artificial intelligence and algorithms means having to talk about computer ethics. When using technology, such as the world wide web, there is no general law that tells us what is acceptable and what is not.

The difficulty of computer ethics is that, besides the technical computer side, it also involves humans, and those humans decide what computer ethics look like. These technological computer structures do not stop at a national border and therefore face further complications, having to account for many different cultures, values and legislations. Consequently, it is essential to create ethics that are globally accepted. However, it is not easy to construct a supranational law that will be adopted and enforced by different nations. The interpretation of what is right or not depends on the frame of reference, which is mostly shaped through national culture or community influences and hence differs for many. There could be endless discussions about which rules are deemed most important and which should be enforced (Moor, 1998). Therefore, the most important lesson is that it is essential to set up general standards that are linked to human values essential across nations, cultures and communities (such as freedom, knowledge, life, happiness, security). Policies, even informal ones concerning norms and values, are necessary so as not to violate ethical boundaries (Moor, 2001). Even though interpretation can be slightly different, the core values remain the same. The Computer Ethics - Philosophical Enquiry (CEPE) in 2000 held various conferences to discuss the different perspectives on values such as privacy, identity, anonymity, freedom, access and security (Johnson, Moor, & Tavani, 2000, p. 6). When the rules are linked to these universal values and are not too specific on the details, a country has room to interpret these rules as it sees fit in its society while the implemented core values remain the same. Nevertheless, during the implementation of these core values, human (subjective) actions remain involved. Hence, ethical considerations from a more philosophical field are relevant to understand how someone comes to a particular decision.

2.3.2 Kant’s philosophical perspective on ethics

Immanuel Kant was an influential philosopher who made a distinction between two concepts that explain why people form particular decisions and what the ethical philosophy behind these decisions is: the categorical imperative and the hypothetical imperative. His work mainly focuses on morality and what one ought to do because it derives from duty, goodwill, one’s desire, or because the (moral) law says so (Johnson & Cureton, 2016; Jackson, 1943). The categorical imperative implies that commands must be followed because it is morally the right thing to do, not because one does it out of one’s own interest or desire. Hypothetical imperatives also imply that a specific command is given, but the decision whether or not to follow this command relies on whether the outcome is beneficial and whether the choice to do so is free and autonomous. Ethics covers both the internal desire to pursue specific rules and values and the decisions made as stimulated by the (moral) law. When looking at ethical questions, it is essential to keep in mind why something is or should be done in a certain way. Is it because one ought to act like that, or is it based on ‘duty’ or ‘goodwill’, and does the interest of the individual play a role or not? And when someone does something because of a particular conviction, is that acceptable to others?

Both will help to understand the importance of ethics in algorithms and the challenges it can pose when using these in the public sector. More importantly, it connects the two sides (technical and practical) of algorithms and helps to understand why ethics is so important yet challenging to deal with. Ethics will always remain a topic of debate. But for artificial intelligence, it all depends on whether it is accepted in the context in which the algorithm takes place. Is it accepted by those who will face the consequences and those who have the power to make those decisions?

2.3.3 Importance of context and ethics

Wachter et al. (2017) discuss the importance of looking at AI, robots, algorithms and decision-making altogether. The different aspects of hardware, software and data are inseparably linked to each other, and only when looking at these as a whole can challenges such as fairness, transparency, accountability and interpretability be discussed and tackled. According to Floridi and Taddeo (2016), social acceptability and social preferability are also essential and should be guiding principles within data projects.

Floridi (as cited in Schuller, 2020) said in an interview that ‘digital ethics’ is an umbrella concept covering many ethical issues, such as issues of privacy, property, personal identity, transparency and accountability (Schuller, 2020). Therefore, ethics can be seen in a broader context, where the surroundings and the entire environment play an essential role when focusing on the ethical aspects of AI and technology. Floridi (as cited in Schuller, 2020) also makes a distinction in the perception of ethical issues. The first group of problems focuses on what happens in the present and what is visible and broadcast by newspapers, TV and online media; issues such as privacy and data are frequently discussed in public. The second group is focused on the long term and is less visible due to the less-threatening nature of the issues, for example, the erosion of autonomy. The context in which decisions take place differs per situation. First of all, it is essential to look at the whole, meaning the technical side: how the different machines and technologies interact with each other, what effect that has on the outcome, and how that can affect individuals. But also how society perceives ethical issues and how that influences the emphasis on specific ethical topics, which are then highly prioritized.

Understanding the context in which algorithms take place helps to direct responsibility to someone. Matthias (2004) highlights that there is a ‘responsibility gap’, since automated decision-making does not have to involve human intervention. It might make sense that the designer and producer are responsible for the outcome of the algorithm, but this does not make sense when the algorithm can make independent decisions. Besides, when individuals make decisions based on the result of the algorithm, it can be questioned who is responsible, because there has been human influence.


2.3.4 Ethical challenges when working with algorithms

Many scholars have written about the ethics of algorithms, either raising concerns or critiquing this development. Mittelstadt et al. (2016) distinguished six main ethical challenges of algorithms that cover the main points in the algorithm ethics discussion. While some of the challenges focus on more technical shortcomings, their outcomes or consequences have a severe influence on society or individuals and thus concern the ethical responsibility of using algorithms. Mittelstadt et al. (2016) make a distinction between epistemic concerns (quality of the evidence) and normative concerns (ethical evaluation of actions). Combined, this focuses on what can be seen as the truth, and which beliefs and justifications apply. It concentrates on the setting in which algorithms are implemented, but also on who should be held accountable for mistakes when they occur. Below is an outline of the ethical challenges and the possible consequences they can have.

First, inconclusive evidence concerns computer learning. Algorithms can learn from a given input and find links between different data sets. This must be done carefully, because linking random datasets can produce correlations that are mistaken for causation, depending on what the input is and how one interprets the outcome of the algorithmic system. Inconclusive evidence means that algorithms only say something about the data that is used as input. The result of the algorithms only shows certain correlations based on the input (Barocas, 2014). Barocas (2014) points out that only the (biased) input will be considered when analyzing a data set. This influences the outcome and can be wrongly interpreted when the shortcomings of algorithms are not acknowledged. The outcome of an algorithmic system or formulation does not say much about the bigger context or what it means for a general or bigger population. Depending on the input, correlation can be found in anything, based on the specific information and the design of the algorithms (example of how ice cream kills2).
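The 'ice cream' point can be made concrete with a few lines of simulated data: two variables that share a confounder (temperature) correlate strongly even though neither causes the other. This is an invented illustration, not an analysis from any cited source.

```python
# Hypothetical illustration of a spurious correlation via a shared
# confounder (temperature); all data is simulated.
import numpy as np

rng = np.random.default_rng(1)
temperature = rng.uniform(0, 35, size=365)               # daily temperature
ice_cream_sales = 10 * temperature + rng.normal(0, 20, 365)
drownings = 0.3 * temperature + rng.normal(0, 2, 365)

r = np.corrcoef(ice_cream_sales, drownings)[0, 1]
print(f"correlation: {r:.2f}")  # strongly positive, yet neither causes the other
```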

However, there are multiple risks involved. First, the ‘knowledge’ that is used as a basis for further research or decision-making is based on computer statistics rather than on knowledge from experience or academic research. Second, the outcome of the algorithmic system cannot justify any action or consequences. The challenge is to what extent conclusions can be drawn from the information given by algorithms based on our biased input. Using the outcome of algorithms can, therefore, lead to unjustified actions. Decision-making and data mining (obtaining a lot of data) rely on inductive knowledge and on correlations identified within datasets, not on causation. Acting on associations is problematic because of spurious correlations; combined with predictive analysis, it is doubly uncertain. The outcome of algorithmic systems often concerns populations, due to the information that was used as input. At the same time, actions are directed towards individuals (Ilori & Russo, 2014, as cited in Mittelstadt et al., 2016). When a decision is made based on general knowledge but used to deal with a specific individual case, this has consequences for the bias that can occur (Kraemer et al., 2011; Vedder & Naudts, 2017). Unjustified actions can lead to discrimination, which can clash with the values of public organizations and does not meet the expectations that citizens have of public organizations (Regan & Maschino, 2019).

Second, inscrutable evidence concerns the evidence part, in which data is used for a conclusion. When this is the case, there should be a connection between the data and the conclusion. Moreover, this connection should be accessible in order to understand it and the outcome that has been presented. The outcome must be satisfactory: is it well explained? How can one interpret it? And should it be interpreted at all? These unclarities in the process are called opacity. Evidence should be transparent and monitored so that the process and outcome can be evaluated. However, algorithms are poorly predictable and difficult to control and monitor (Tutt, 2017, as cited in Mittelstadt et al., 2016). The availability of evidence focuses on accessibility and comprehensibility, which means that information should be accessible; but often this is not the case, for reasons concerning competitive advantage, national security or privacy (Turilli & Floridi, 2009, p. 106, as cited in Mittelstadt et al., 2016). Other issues concern the black box, machine learning and human intervention. More transparency can help to overcome the challenge of inscrutable evidence and the black box part of algorithms, since it will show more of what the process looks like and how the input generates the output (Fink, 2018; Vedder & Naudts, 2017). However, there is a side note that needs to be considered: making data transparent will also affect the privacy of others, since personal information is used in these algorithms. It is crucial to examine when an invasion of privacy is desired and tolerated, under which circumstances, and when it is not. Machine learning is another form of using artificial intelligence without human intervention (Sousa et al., 2019). Again, this absence makes it more challenging to control and monitor how the data has been processed.

Third, misguided evidence elaborates on the inconclusive evidence: the output can never exceed the input, which is also called the 'garbage in, garbage out' concept. This means that ''conclusions can only be as reliable (but also as neutral) as the data they are based on'' (Mittelstadt et al., 2016, p. 5). The reliability of a conclusion thus only says something about the context in which the specific input (data) was given, and it cannot be generalized to another (larger) context. This can lead to bias.
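The following sketch illustrates 'garbage in, garbage out' with a hypothetical selection bias: two neighbourhoods have identical fraud rates, but because one is inspected far more often, it dominates the detected cases. The labels, rates and inspection policy are invented for illustration only.

    # A minimal sketch of 'garbage in, garbage out', assuming a
    # hypothetical biased inspection policy. All labels and rates invented.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Ground truth: both neighbourhoods have an identical 2% fraud rate.
    neighbourhood = rng.choice(["north", "south"], size=n)
    is_fraud = rng.random(n) < 0.02

    # Biased data collection: 'south' is always inspected, 'north' rarely.
    inspected = (neighbourhood == "south") | (rng.random(n) < 0.1)

    detected = inspected & is_fraud
    share_south = (neighbourhood[detected] == "south").mean()
    print(f"share of 'south' among detected fraud: {share_south:.0%}")
    # Prints roughly 90%, even though the true rates are equal: conclusions
    # drawn from this data would wrongly mark 'south' as the fraud hotspot.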

The values of the designer of the algorithms can influence how data will be analyzed (Kraemer et al., 2011). Algorithms are always value-laden; in this specific case, that can lead to wrong conclusions, which can then be interpreted differently by those who consult them. Therefore, algorithms are not neutral and objective when it comes to using big data analytics for non-standard numerical cases.

Fourth, unfair outcomes concern the evaluation of the actions driven by algorithms, focusing on the 'fairness' of an action and its effects. On this basis, a result can be seen as discriminatory and hence lead to discrimination, profiling and prediction. When profiling takes place, personal data is used and analyzed, and future behavior is predicted. Based on these predictions, one can take measures which may not fit the actual behavior of the person concerned and which can therefore be unfair. Whether such actions are (ethically) acceptable (fair) and appropriate depends on the effect of the treatment (Schermer, 2011, as cited in Mittelstadt et al., 2016). Whether something is seen as discriminatory depends on the context and (policy) area, and on the consequences and impact it has on individuals and society.
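A small worked example can show why acting on group-level predictions is risky for individuals: even if a profiled group genuinely has an elevated fraud rate, flagging everyone in it wrongly targets the large innocent majority. The group size and rate below are hypothetical.

    # A minimal sketch of the group-to-individual problem in profiling,
    # assuming a hypothetical group with an elevated 5% fraud rate.
    group_size = 10_000
    group_fraud_rate = 0.05

    # Profiling rule: flag every member of the 'high-risk' group.
    flagged = group_size
    true_positives = round(group_size * group_fraud_rate)
    false_positives = flagged - true_positives

    print(f"flagged: {flagged}")
    print(f"wrongly flagged: {false_positives} ({false_positives / flagged:.0%})")
    # Even with a genuinely elevated group rate, 95% of the flagged
    # individuals have done nothing wrong.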

Fifth, transformative effects imply that algorithms cannot always be backtracked to specific cases or failures. The lack of transparency makes it difficult to understand how the algorithmic process works and how the input leads to the output (Fink, 2018; Vedder & Naudts, 2017), and thus to see what causes harm and where a failure occurs. Besides, algorithms can influence our conceptualization of how we see the world (amongst others through misguided evidence), because of the framed information that algorithms produce. When an algorithm has constructed a decision without insight into the process, this creates issues concerning autonomy, which can be divided into two forms. First, algorithms decide what information you see; you cannot choose or determine what information you want to see. Based on the information the algorithm provides, decisions are made without considering alternative explanations (Mittelstadt et al., 2016). The focus here lies on the autonomy of the individual and on how algorithms influence an individual's choices. The second form of lack of autonomy concerns how others (persons, organizations) deal with the transformative effects. It is essential to keep in mind that algorithms have value-laden characteristics. When decisions are made based on the outcome of algorithms without other considerations, the algorithms have considerable influence on how individuals or organizations look at certain topics or issues, and the consequences of such a choice can affect the support for the action taken.

In some cases, the decisions made can lead to an informational privacy issue. Being in charge of what information can be accessed and shared also leads to implications when it comes to data sharing and privacy, as well as 'smart transparency'. When can data be shared, and at what risk? More transparency about how the algorithms work has the downside that private data used as input is no longer guaranteed to be private, as our current values and laws dictate it should be where data usage in the public sector is concerned. This has led to a new debate on what data can and should be openly shared, and with whom (Young et al., 2019).

And sixth, traceability is about the ethical assessment, responsibility and availability of the use of algorithms. Algorithms consist of many small, connected technological elements, which makes it challenging to find and detect errors and to determine which specific part of the algorithm causes issues or harm (Fink, 2018; Vedder & Naudts, 2017). Furthermore, traceability also concerns who can be held responsible for issues or errors that occur through the use of algorithms. Working with algorithms involves many parts, from design to the availability and manipulation of large volumes of personal data. The 'harm' in this case can be divided into its cause (what caused it) and the responsibility for it. This leads to the question of moral responsibility: who should be blamed and can be held accountable when errors occur, and how can this be done? And on which grounds has a decision been made (Kant's philosophy)? Someone can only be held responsible when that person had some degree of control (Matthias, 2004) and acted intentionally. There is also a 'loop' in making mistakes, which makes it challenging to identify whether a decision was made intentionally and how much control one had in the situation. Additionally, machine learning is difficult to deal with here, since no humans are involved: who can be blamed when the machine makes a mistake?

These six challenges cover issues concerning transparency, traceability, accountability, the black box and bias. Often, they are in some way connected or influence each other, and the ethical challenges of algorithms are consistent with the risks of using algorithms in general, as discussed earlier. Similar to what Struijs, Braaksma and Daas (2014) indicate, big data sources, such as social media messages, are often not suitable for big data analysis because they are not designed to be analyzed. This affects the quality of the data and makes it challenging to structure the data, as well as to define the target population. The interpretation of big data does not necessarily show the reality or truth, since the outcome can contain bias because of the value-laden input.

Moreover, how an individual interprets the output also influences how the 'truth' is presented, as Beer (2017) points out. The design of an algorithm can be formulated in a certain way (influenced by norms, values and preferences) to lead to a desired outcome. Therefore, it is essential to acknowledge this issue when working with algorithms and when decisions are made based on their outcome. The context is critical for understanding why the algorithms show specific results.


The complex context of working with algorithms makes it challenging for organizations to work in an ethically responsible way. Many aspects need to be taken into account when working with algorithms, such as transparency. When there is, for example, not enough transparency on how the algorithm works or on how the process of decision-making is influenced by using algorithms, it can lead to complications with regard to accountability and to the question of whether unfair actions have been taken. If just one of these challenges is overlooked, it can lead to many more, often ethically charged, difficulties. Therefore, it is essential to think carefully about how algorithms can be used, about the data and information used for algorithmic systems, and about the way in which these are used. Klievink et al. (2017) and Lawrence (2017) stress the importance of data readiness when working with algorithms, which we will discuss in the next part.

2.4 Data readiness

There are different ways of looking at data readiness. Klievink et al. (2017) discuss the data readiness of public organizations and how to evaluate whether a (public) organization can handle working with big data by looking at its means, knowledge and responsibilities; in short, how organizations deal with big data. Lawrence (2017) focuses on the readiness of the data itself and on when (big) data is actually 'ready' to be used for further analysis and models. His focus is on evaluating the quality of the data rather than those who use it: he concentrates on the big data itself and on what it takes for this data to be 'ready' to be used as input for an algorithmic system. Even though Klievink and Lawrence have different perspectives, both are relevant when looking at the challenges of algorithms and at the goals they serve for decision-making. While the main focus remains on the algorithmic system itself, the readiness of the public sector helps to understand the context and the overall environment in which algorithms are implemented, especially when we focus on how these contextual environmental factors influence the design process of algorithmic systems. The two aspects of data readiness, how it is organized and implemented within the organization and the quality of the data itself, will be further discussed in the next paragraphs.

Klievink et al. (2017) developed a framework concerning big data readiness within the public sector. They distinguish three main components which indicate whether a public organization is 'ready' to handle working with big data. The main focus within the framework is on the interaction between the organization and technology (p. 271).


The first component Klievink et al. (2017) describe is organizational alignment. It is essential to consider whether big data is suited for use and implementation in the organization, and therefore necessary to look at the organizational infrastructure. Specifically, readiness is evaluated on whether the big data projects are aligned with what the organization can do. Particularly in the public sector, it is essential to look at the statutory tasks and at whether an organization is (legally) able to perform an activity without causing conflict with other rules. As Klievink et al. (2017) point out: ''The statutory tasks thus largely determine a public organization's main activities and its data activities in support of these'' (p. 272). It should be clear what the main statutory tasks (organizational strategy) and data activities (organizational infrastructure) are, and how these are connected to the big data application type (IT strategy) with support from the big data characteristics (IT infrastructure) (p. 272). The second component of the framework is organizational maturity, which focuses on e-government developments within the organization. Organizational maturity entails the ability to implement big data, partly by evaluating to what extent this will be successful. Moreover, it draws attention to the collaboration with other organizations (in IT areas) and to how this collaboration leads to better public and citizen-oriented services and demand-driven policies (p. 273). Third in the framework are the organizational capabilities, which concern the ability of organizations to use big data while also creating value from it, with the assurance that no negative consequences arise from using big data with its applications and tools (p. 273).

Furthermore, Lawrence (2017) distinguishes different levels of data readiness. His definition of data readiness depends on the understanding of what data is and how it can be used. Because the interpretation of data can be somewhat abstract, it can be challenging for decision-makers and data analysts to grasp and use big data. Since working with big data and algorithms involves various challenges, including inconclusive and inscrutable evidence, it is essential to understand how these challenges can occur. Therefore, Lawrence describes three levels or 'bands', from A to C (A being the best and C the worst), which assess the quality of the data. In each band, several questions are asked which help to determine the quality and value of the data used for big data analytics and algorithms. These bands say something about the readiness of the data used in projects and about what needs to be considered to be data ethically ready. The band levels thus guide what is essential to consider when using data (Data Readiness Levels, n.d.a).

Band C concerns the accessibility of a data set (Lawrence, 2017). The first question is whether the data is real, verified and recorded. When data is saved and stored, certain implications can occur concerning privacy or legal issues. Hence there is a need for transparency about where the data comes from and how it can be accessed, while ethical (privacy) rules are taken into account. Band B is focused on the validity of the data, which entails the faithfulness and representation of the data (Data Readiness Levels, n.d.b; Lawrence, 2017). When the data is incomplete, or when there are missing values or other anomalies, this influences the validity and usability of the data and therefore determines whether the data can be seen as faithful and representative. Band A is about data in context (Lawrence, 2017, p. 4). It assesses the appropriateness of the data for a specific analysis while looking at the context of the organization, and it highlights the importance of using the correct data to help answer a particular issue.

The main message of the 'band' levels of data readiness is that it is essential to ask questions about the data used in data analysis. These substantive questions about the data are just as crucial as organizations being able to deal with the data; both influence how the process of working with algorithms develops and the results it will have. Overall, knowledge, skills and means are needed to be data ready in the public sector. Of primary importance is thus knowing what data is required and why, and how it can be retrieved in the given context, considering the legal and ethical constraints.
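As a rough illustration of how such substantive questions could be operationalized, the sketch below turns the band questions into automated checks on a pandas DataFrame. The column names, checks and the shape of the report are assumptions made for this example; Lawrence (2017) describes the bands conceptually, not as code.

    # A minimal sketch mapping the data readiness bands onto simple
    # automated checks. Column names ('source', 'income', 'household_size')
    # and the report structure are hypothetical.
    import pandas as pd

    def readiness_report(df: pd.DataFrame, required_columns: list) -> dict:
        return {
            # Band C: is the data accessible, and is its provenance recorded?
            "band_c_loaded": not df.empty,
            "band_c_provenance_recorded": "source" in df.columns,
            # Band B: validity - missing values and duplicates hurt faithfulness.
            "band_b_missing_share": float(df.isna().mean().mean()),
            "band_b_duplicate_rows": int(df.duplicated().sum()),
            # Band A: does the data contain the fields this analysis needs?
            "band_a_fields_present": all(c in df.columns for c in required_columns),
        }

    # Usage example with a toy data set missing one required field.
    df = pd.DataFrame({"source": ["registry", "registry", "survey"],
                       "income": [30_000, None, 45_000]})
    print(readiness_report(df, required_columns=["income", "household_size"]))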

2.5 Data ethics readiness framework

Using big data and algorithms requires knowledge, means and good preparation (Klievink et al., 2017). The design of algorithmic systems requires good preparation in order to use data correctly and for the right purpose. When working with algorithms and big data, there is a risk that the wrong data is used as input, which can lead to various implications regarding the ethical usage of personal data (Lawrence, 2017).

Moreover, having an organizational context that is aligned and ready for the implementation of big data is essential for the successful use of any big data analytics. The ability of an organization to deal and work with big data influences how algorithmic systems are perceived and how their outcomes are used in and by the organization (Klievink et al., 2017). It also indicates how organizations deal with specific ethical implications when there is a clear vision of the goal of using algorithms and of how this goal can be achieved.

Summarizing the information provided throughout this paper, we can affirm that a data ethics readiness framework can be designed which combines the ethical and the data readiness aspect. While much attention is given to ethical considerations, data readiness can be linked to certain ethical issues to a certain extent. Being able to work with big data and using the correct information may help to prevent misuse and misguided evidence, as well as unfair decision-making based on the outcomes of algorithms.

The data ethics readiness framework focuses on how algorithmic systems can be designed or evaluated and on what essential elements should be considered when implementing and using algorithms in the public sector. It combines the six ethical challenges of Mittelstadt et al. (2016) and the data readiness of Lawrence (2017), and it makes a distinction between the ethical side and the data readiness side. While the ethical challenges differ from the data readiness side, both are important when working with algorithms, and some elements complement each other and help to prevent risks such as unfair outcomes and misguided evidence due to the use of inappropriate data. Within this distinction, the main elements are presented. The framework thus guides how to design, evaluate or monitor (existing) algorithmic systems.

When constructing the data ethics readiness framework, it has been divided into two equally important parts, reflecting the combination of ethics and data readiness. The first segment focuses on the ethical part of working with algorithms, showing which essential elements need to be considered to deal with ethical implications. To make ethically responsible decisions when algorithms are involved, it is essential to tackle the possibilities of issues such as unjustified action, opacity, bias, discrimination, lack of autonomy and unclear moral responsibility, as shown in Figure 1 below.

Figure 1: Data Ethics Readiness Framework

[Figure: within an ethical context influenced by the environment, the framework pairs each ethical challenge with a data readiness element: unjustified action / good representation; opacity / transparency; bias / context; discrimination / accountability; autonomy / responsibility; moral responsibility / (legal) guidance. The data design context covers accessibility (Band C: where does the data come from), validity (Band B: faithfulness and representation) and data in context (Band A: appropriateness for the goal).]