Overcoming algorithm aversion - The effect of human intervention in adjusting the forecasts of imperfect algorithms

Student: Bianca Denisa Negrea
Student Number: 11114495
Thesis supervisor: dr. Markus Paukku
Date of Submission: June 22nd, 2018

MSc Business Administration, specialization International Management
Faculty of Economics and Business, University of Amsterdam


Statement of originality

This document is written by student Bianca Denisa Negrea who declares to take full responsibility for the contents of this document.

I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it.

The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


Abstract

This paper examines if human intervention - in the form of self-intervention or another person’s intervention - in adjusting the forecasts of imperfect algorithms lowers algorithm aversion. Studies show that evidence-based algorithms are more accurate in making predictions than humans. However, people avoid using algorithms after seeing them make errors, a phenomenon called algorithm aversion. Dietvorst et al. (2016) found that allowing people to modify the imperfect forecasts of a statistical model led to higher algorithm usage. Current literature did not investigate if giving people the option to use the algorithmic estimates adjusted by another person can aid the overcoming of algorithm aversion. Using an experiment, this paper tests the generalizability of Dietvorst et al.’s (2016) conclusion on a different sample and proposes a new solution to overcome algorithm aversion. This study reinforces the empirical regularity that algorithms improve people’s forecasting accuracy. The results reveal that human intervention does not lower algorithm aversion. Contrary to previous findings, participants showed a significant preference for using the algorithm, while allowing them to modify the model’s estimates did not lead to significantly higher algorithm usage. The option to use the algorithmic forecasts adjusted by another person actually led to a decrease in algorithm usage, offering insights on the possible causes of algorithm aversion.


Table of contents

Introduction
Literature Review
Algorithms, Forecasting Methods, and Algorithm Aversion
Evidence on Algorithmic and Judgmental Forecasting Performance
Causes of Algorithm Aversion
Overcoming Algorithm Aversion
Hypothesis Development
Alternative Solution to Overcoming Algorithm Aversion
Methods
Overview
Participants
Procedure
Independent Variables
Dependent Variables
Results
Hypothesis Testing
Use of the algorithm in making predictions
Adjustment size
Forecasting performance
Confidence
Results of hypothesis testing
Concluding discussion
Practical Implications
Limitations and Directions for Future Research
References


Introduction

Companies search for data to aid their decisions or to make a variety of predictions: employee performance, product demand, sales or marketing efficiency. For example, Netflix uses predictive analytics to help them decide what television programs to produce, while IBM started using algorithms rather than just human judgment to evaluate potential acquisition targets (MIT Sloan, 2016).

Studies show that evidence-based linear algorithms are more accurate than humans in making forecasts in a variety of fields (Grove and Meehl, 1996; Silver, 2012; Grove et al., 2000). Yet, decision-makers are usually reluctant to base their forecasts on algorithms and prefer to rely on human judgment, a phenomenon called algorithm aversion (Dietvorst, Simmons and Massey, 2014). Algorithm aversion may undermine the use of this valuable resource that contributes to increased organizational performance (Sanders and Manrodt, 2003).

This thesis examines if human intervention in adjusting the forecast of an imperfect algorithm aids the overcoming of algorithm aversion. By human intervention, this study means either the participant’s own adjustment of the algorithmic forecast or the use of a past participant’s adjustment of the algorithmic estimates.

The answer to the research question is important for organizations that want to adopt superior forecasting systems or to implement data-driven decision-making processes. If managers understand under what conditions people are more likely to use imperfect algorithms in forecasting tasks, organizations could capitalize on the trend of Big Data, improve their predictive accuracy and increase their efficiency.


The current understanding is that people’s reluctance to use imperfect algorithms originates from an “intolerance of inevitable error” (Dietvorst et al., 2016, p. 2), based on their belief that humans can achieve perfection while algorithms will necessarily make errors. Dietvorst, Simmons, and Massey (2016) found that people’s algorithm aversion is reduced considerably by allowing them to adjust the result of the imperfect algorithm. However, recent studies bring conflicting results and do not replicate the above-stated findings (Prahl & Van Swol, 2017; Logg, 2017).

Empirical findings revealed that people abandon algorithms more readily than they abandon humans for making the same mistake (Dietvorst et al., 2014). Current literature only explored people’s choice to use an algorithm, their own judgment, or other people’s forecasts. This study proposes a new solution to overcoming algorithm aversion: it investigates if allowing participants to use the model’s estimates adjusted by another human results in higher algorithm usage.

This thesis used the experimental instrument designed by Dietvorst, Simmons, and Massey and provided by the authors for the purpose of this study. Participants in the experiment were recruited through the Amazon Mechanical Turk platform and had to forecast students’ scores on a standardized math test based on nine variables. The experiment replicated two conditions of the study conducted by Dietvorst et al. (2016) where participants could choose between using their judgment or an imperfect algorithm to make estimates. This study added a third condition that allowed participants to use the algorithm’s estimates adjusted by a former participant, in order to test the new solution to overcoming algorithm aversion that represents this paper’s main contribution.


The results indicate that human intervention does not aid the overcoming of algorithm aversion. The replication revealed, in line with the original study, that the use of algorithms increases participants’ forecasting accuracy. However, the other finding was surprising, since allowing participants to modify the predictions of the model did not lead to significantly higher algorithm usage rates. Even more interesting is that participants were significantly less likely to use a past participant’s adjustments of the statistical model.

This paper makes two significant contributions to the academic literature. First, this study answers the call for future research made by Dietvorst et al. (2016). The authors encourage future research to investigate the generalizability of their findings on a different sample. This study found conflicting results compared to prior evidence on algorithm aversion. Contrary to the experiment of Dietvorst et al. (2016), this paper did not find support for the hypothesis that allowing participants to adjust the algorithmic forecasts increases people’s use of imperfect algorithms.

Secondly, this study made some incremental steps in the investigation of algorithm aversion. It proposed a new solution to lower people’s reluctance to use algorithms in forecasting tasks. It went beyond the individual level previously researched, where the participants themselves adjust the algorithmic forecast, to test if another person’s intervention in modifying the algorithmic predictions lowers algorithm aversion. The findings of this study also shed light on the possible causes of algorithm aversion.

From a practical point of view, the results of this study are essential for the managers of companies that invest in technological developments and complex algorithms and fail to see performance improvements. The findings of this paper can help managers in adopting techniques that encourage forecasters to use algorithms in making predictions. Moreover, the reinforced empirical regularity that algorithms improve people’s forecasting accuracy should persuade managers to implement automated forecasting tools in their organizations.

The remaining sections of the paper are structured as follows: the first section consists of a literature review covering the most important aspects of human or algorithmic forecasting and leads to the hypothesis of this study. The second section provides the methodology for how the experiment was conducted, followed by the results and the discussion of the findings. The last section outlines the limitations of this study and sets directions for future research.

Literature Review

The future is uncertain, but there are countless situations in which managers need to make predictions and take decisions under uncertainty. There is pressure on forecasting experts to make estimates as accurate as possible. In these circumstances, people may turn to automated tools such as algorithms to aid them in the decision-making process. This raises the question of which forecasting method is most effective: judgmental or algorithmic forecasting.

The first part of the literature review introduces the main concepts used in this paper and the steps taken in the scientific research that led to investigating algorithm aversion. The next section reviews the literature that compares the performance of algorithms with that of humans in forecasting tasks. The third part touches upon the possible causes of algorithm aversion, while the last sections outline the progress in overcoming algorithm aversion and the hypothesis of this study.


Algorithms, Forecasting Methods, and Algorithm Aversion

To understand algorithm aversion, a few theoretical constructs must be introduced first. This section defines the type of algorithms used in this paper; then it specifies what forecasting accuracy is and its importance in the forecasting field. Later on, it shows how the scientific literature came to study algorithm aversion.

Algorithms are scripts for sequences of mathematical calculations or procedural steps (Logg, 2017). Some algorithms are automated models that are meant to take the workload off humans’ shoulders, such as the autopilot in a car, or system-monitoring tools in a power plant (Madhavan & Wiegmann, 2007). Others are designed to supplement human judgment and to provide advice, like forecasting support systems or medical diagnostic tools.

In the past, algorithms placed the human in a key role in constructing them. People identified the relevant variables and features to be included in the model, resulting in a Dawesian algorithm, named after Robyn Dawes, one of the pioneering researchers in this field. Today’s algorithms are based on large data sets and do not require a human-built model. Modern algorithms use a general statistical algorithm designed by humans and learn by themselves, without human guidance on what to look for when making predictions (Yeomans et al., 2016). Although interesting and increasingly used, the current algorithms for machine learning and artificial intelligence are outside the scope of this study.

This paper is concerned with the simple, Dawesian linear models that allow comparison between humans and “routinized versions of a human process” (Yeomans et al., 2016, p. 5). Thus, this paper will use the term “algorithm” as defined by Dietvorst et al. (2014) to refer to “any evidence-based forecasting formula or rule, statistical model or mechanical procedures used for forecasting” (p. 1).

Forecasting is an activity intensive in information processing and decision-making. It is a task where both humans and automated aids have valuable inputs in the process (Fildes & Goodwin, 2013). As the end goal in forecasting tasks is to achieve the highest level of accuracy, these automated models have been introduced to aid and improve the decision-making process. The scientific literature presents different perspectives when it comes to investigating the best performing method for predicting an uncertain outcome. The most relevant fields for this study that have investigated forecasting accuracy are the advice-taking literature and the clinical psychology research. The following paragraphs will outline the main findings of these areas of research.

The advice-taking literature focuses on investigating when and how decision-makers incorporate advice from an external source. So far, researchers found, in line with the research on trust (Mayer, Davis, & Schoorman, 1995), that the perceived competence of the advisor is an important factor that influences its credibility. Credibility further affects people’s confidence in the human or statistical advisor, which is directly linked to how often people choose to integrate the advice into the final decision. In the forecasting field, this credibility translates to forecasting accuracy, and a high accuracy of a forecasting method usually determines its further usage. The accuracy is measured as the average distance between forecasts and real-life observations and is one of the focal elements in studying algorithm aversion.
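To make this accuracy measure concrete, it can be written as a mean absolute error; the notation below (f_i for a forecast, y_i for the realized outcome, n for the number of forecasts) is introduced here for illustration and is not part of the original text:

```latex
\mathrm{MAE} \;=\; \frac{1}{n} \sum_{i=1}^{n} \lvert f_i - y_i \rvert
```

The 17.5-percentile average error reported later for the statistical model, and the performance-difference variable used in the Results section, can be read as errors of this form.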


Dietvorst et al. (2014) confirm the assumption that the more accurate an algorithm is perceived to be, the more it is used in forecasting tasks. People who experienced algorithms making little to no errors relied on them and chose to use them more often in forecasting tasks. However, perfect forecasts are unachievable, as the future is unpredictable and algorithms are prone to errors. Knowing that algorithms are imperfect has proved to be a severe threat to people’s willingness to use automated models to improve their accuracy in forecasting tasks.

There is a large body of studies in the advice-taking and decision-making literature that investigates the degree to which the end decision-maker considers the information presented by an advisor when deciding in uncertain situations. Bonaccio and Dalal (2006) provided a comprehensive review of the studies on advice-taking from human advisors, while Önkal et al. (2009) and Prahl and Van Swol (2017) investigated whether people react differently when the advice comes from humans rather than from algorithms. They found that the utilization of advice decreases more when people receive inaccurate advice from algorithms than from humans. This phenomenon of disregarding the advice was called advice discounting (Yaniv & Kleinberger, 2000).

In the clinical literature, Sarbin (1944) was one of the first to draw an analogy between statistical (actuarial) judgment and human clinical reasoning, and to assess the performance of each method. Sarbin’s (1944) research and Meehl’s (1954) book found that experts made less accurate predictions than evidence-based formulas. Dawes, Faust, and Meehl (1989) were among the first to investigate the performance of clinical and actuarial methods of diagnosing patients. The former is concerned with making unaided judgmental forecasts based on the available data using one’s intuition, or with combining the data with one’s own judgment. The latter uses an automated aid, such as an algorithm, to help with the forecasting process. In clinical trials, this method was called actuarial judgment, and it eliminated the human judge, basing the prediction solely on a statistical model. The findings of Dawes et al. (1989) led to the conclusion that the actuarial method significantly outperforms judgmental forecasts.

Recently, the above-described phenomenon of algorithmic advice discounting and the performance of judgmental versus algorithmic forecasting were investigated in the forecasting field. Dietvorst, Simmons, and Massey (2014) called it “algorithm aversion” (p. 1) and defined it as people’s reluctance to use algorithms in making predictions when knowing that those algorithms are imperfect. The authors sought to understand why people underutilize algorithms and how to increase people’s reliance on automated aids.

To conclude on the progress of the literature towards studying algorithm aversion and set the framework for this study, there are some distinctions to be made. The advice-taking literature focused on investigating how and when people incorporate the feedback from humans or algorithms in their forecasting decisions. The clinical studies eliminated human judgment and reached the diagnostic decision based solely on the prediction of the algorithm. Compared to these fields of study, the research on algorithm aversion led by Dietvorst et al. (2014) focused on understanding why people are reluctant to use imperfect algorithms in making predictions in the first place, and if there are ways to overcome this aversion.

This paper will advance the study of Dietvorst et al. (2016) in overcoming algorithm aversion by investigating if human adjustment of an algorithmic forecast influences decision-makers’ initial choice to use algorithms in making predictions. The algorithms used in this study are the simple, linear statistical models used in practice to make predictions on small, automated tasks.

Evidence on Algorithmic and Judgmental Forecasting Performance

Companies have started using algorithms for daily activities such as predicting areas where there are high chances of crime occurrence so they can efficiently allocate resources (Lynch, 2016). Organizations are naturally interested in employing the best method for the most accurate predictions possible. This section will outline the findings of the existing literature on the performance of judgmental and algorithmic forecasts.

Dawes (1979) was one of the first to gather a large body of evidence across several subjects, including clinical diagnosis, forecasts of graduate students’ success, and other prediction exercises. Since the goal in making forecasts is achieving a high degree of accuracy, these studies compared human and algorithmic forecasting methods to establish which performed better. His results showed that human experts performed worse than simple linear models. In line with this finding, the review of Camerer and Johnson (1997) on expert predictive accuracy concluded that “Sometimes experts are more accurate than novices (though not always), but they are rarely better than simple statistical models” (p. 211).

In a later study, Dawes, Faust, and Meehl (1989) found evidence for the superiority of actuarial methods in clinical diagnosis in “nearly 100 comparative studies in the social sciences” where “in virtually every one of these studies, the actuarial method has equaled or surpassed the clinical method” (p. 1669). Moreover, a meta-analysis of Grove et al. (2000) shows that algorithms outperformed humans by an average of 10% in forecasting human health and behavior.

Despite the average superior algorithmic performance, a more recent survey conducted by Fildes and Goodwin (2007) on 149 professional forecasters across domains (telecom, banking, publishing, pharmaceuticals or food manufacturing) found that “only one quarter of forecasts were based exclusively on quantitative methods” (p. 572). Most of them used subjective managerial adjustments of quantitative forecasts and about half of the respondents did not check their reviewed forecasts for improved accuracy. The lack of popularity of algorithms in forecasting fields leads to lower performances, as outlined by Sanders and Manrodt (2003) who surveyed 240 companies and saw significant differences in forecasting accuracy between companies that use human judgment or quantitative algorithms. As expected, the companies focused on quantitative forecasting methods had lower error rates.

There is evidence to support the potential of algorithms to improve the accuracy of human judgment in several domains. As Armstrong (2001) concluded, quantitative methods of forecast tend to be less affected by human judgment errors and biases and to use the data more efficiently. In the advice-taking literature, there is evidence of increased accuracy in forecasting tasks that consider advice from either humans or algorithms simply because the combination of multiple forecasts decreases the random errors associated with the individual prediction (Yaniv, 2004).

However, decision-makers often discount advice from algorithms in favor of human advisors or pay greater attention to advice that comes from a human expert (Önkal et al., 2009).


People show an overall tendency to favor their own judgment over advice from other sources (Bonaccio & Dalal, 2006; Yaniv & Kleinberger, 2000) and appear resistant to letting a mechanical algorithm make decisions for them (Dawes, 1979). It seems that people’s acceptance of algorithms can take time and meets considerable resistance (Silver, 2012). This view is supported by experimental studies in which participants fail to adjust their original estimates sufficiently to incorporate advice from algorithms, despite knowing that their accuracy would be improved (Önkal et al., 2009).

On the other hand, the scientific literature on judgmental forecasting presents the advantages that people have over algorithms. Humans, unlike statistical models, can consider contextual information such as seasonal variation or sales promotions in their decision-making process. This external knowledge can give them an advantage over linear algorithms, or the ability to adjust the algorithmic forecast and increase its accuracy, as found in some studies (Lawrence et al., 2006; Fildes et al., 2009).

There is also evidence that people often identify too many exceptions and incorrectly modify the statistical predictions (Dawes et al., 1989). The experimental study of Goodwin and Fildes (1999) revealed that in situations where the statistical forecast should have been used as a baseline and human judgment should only have made corrections based on additional contextual data, participants discarded the algorithm, resulting in lower accuracy. The same outcome was observed in Lim and O'Connor's (1995) study, where participants had trouble reducing the weight of their own judgments.


Lastly, there is the possibility of a hybrid human-machine system: researchers found that human-adjusted algorithmic predictions led to overall more accurate results (Huss, 1985). In prediction markets, evidence shows that hybrid human-machine models performed better than humans alone and indistinguishably from machines alone (Nagar & Malone, 2011). Although most studies provide solid findings for the superiority of Dawesian algorithmic predictions over judgmental forecasts, their usage remains relatively low.

Causes of Algorithm Aversion

This section introduces the possible causes of algorithm aversion, which inform the solutions for overcoming it discussed later.

After seeing the average superiority of algorithms in forecasting tasks, one would expect forecasters to choose algorithms more frequently than their own judgment. Nevertheless, research shows that individuals remain resistant to using algorithms in forecasting domains and prefer human forecasts (Diab et al., 2011; Eastwood et al., 2012), rely more on human input than on algorithmic input (Önkal et al., 2009; Promberger & Baron, 2006), and judge more harshly the professionals who turn to algorithms rather than to humans for advice (Shaffer et al., 2013).

The reliance on human experts was justified by Armstrong (1980) as a shift in responsibility from the forecaster to the expert in case of inaccuracy. However, when statistical models are used to make forecasts, there is no shift of responsibility unless a human intervenes and adjusts the algorithmic forecast. The concern of appearing to reject help, one of the causes of advice discounting, does not apply here either, as algorithms are automated tools and a refusal would not affect them (Harvey & Fischer, 1997).


Another possible explanation is that decision-makers have access to their own judgment and the reasoning behind the forecasts they make, but they do not see the decision process of a machine or of another human advisor (Bonaccio & Dalal, 2006; Yaniv, 2004). However, people have little to no understanding of a variety of automated aids they use daily, so there must be more than this to their aversion.

Einhorn (1986) offered one of the most plausible explanations for the cause of algorithm aversion. He explained that forecasters’ reluctance to use algorithms might stem from the acceptance that “models are simplifications of reality that must lead to errors in prediction” (Einhorn, 1986, p. 390), while human judgment is believed to be able to achieve perfection - “the goal to perfect predictability, although difficult to attain, is not impossible” (Einhorn, 1986, p. 389). Dietvorst et al. (2016) and other researchers referred to the cause of this reluctance to use algorithms as an “intolerance of inevitable error” (p. 2). The experiment of Dietvorst et al. (2014) tested this assumption and found evidence of a decrease in both confidence in and use of algorithms after subjects saw that they were imperfect. Besides avoiding algorithms in future tasks, people also tend to exaggerate the perceived error rate of an automated tool (Dzindolet et al., 2002).

On a similar line, Madhavan and Wiegmann (2007) started from the opposite belief: that humans generally expect automation to be perfect, while for people it is natural to make mistakes. They proposed a framework called the “perfect automation schema” (Madhavan & Wiegmann, 2007, p. 291), which led to the same conclusion as the intolerance of inevitable error: that people punish algorithms more harshly for making mistakes and, consequently, their trust in automation significantly decreases.

There are some adjacent factors contributing to algorithm aversion, such as the human desire for perfect forecasts (Dawes, 1979), the assumed ability of human forecasters to learn from experience and improve in prediction accuracy (Highhouse, 2008), and the presumed incapacity of algorithms to integrate qualitative data (Grove & Meehl, 1996), which all lead to a lesser usage of algorithms. Nevertheless, there are also ways to aid the overcoming of algorithm aversion.

Overcoming Algorithm Aversion

Forecasters’ reluctance to use imperfect but superior algorithms affects organizations interested in making better decisions and more accurate predictions. This section will outline the solutions proposed by scholars that could decrease or overcome people’s algorithm aversion.

Dietvorst et al. (2014) researched algorithm aversion and showed that people were less likely to choose an algorithmic forecast over a human forecast after seeing the algorithm err. This effect persisted even when the participants saw the algorithm consistently outperform the human. Dzindolet et al. (2003) also found that people lost trust in the automated aid, even though it made half as many errors as the human, and chose self-reliance over using the algorithm.

The assumption that the intolerance of inevitable error drives people’s algorithm aversion led Dietvorst et al. (2016) to conduct an experiment where they gave participants some control over an imperfect algorithmic forecast and allowed them to reduce or eliminate the error. The effect was a significant increase in the use of algorithms in making forecasts, which proved to be an effective method to reduce participants’ algorithm aversion.

Studies from related fields bring some conflicting results over people’s reluctance to use algorithms. Prahl and Van Swol (2017) investigated algorithm aversion and the discounting of algorithmic advice and did not replicate the findings of Dietvorst et al. (2014) that human advice is used more than automated advice. Participants in the experiment made forecasts related to operating room management in a hospital and received input from either a human or an algorithm after giving their estimates. Participants then revised their estimates based on the advice, submitted their forecasts again, and received feedback on their accuracy. The results show that there was no significant difference in the frequency of using algorithmic advice or human advice. The findings of the experiment conducted by Prahl and Van Swol (2017) cast some doubt over the assumption that people are algorithm averse and avoid using algorithms in forecasting tasks.

Hypothesis Development

This section introduces the main arguments that led to the development of the hypotheses proposed in this study.

Dietvorst et al. (2014) found that people chose to use algorithms less frequently after they saw that the model was imperfect. Participants displayed algorithm aversion even after seeing the algorithm constantly outperform the human forecasts. The authors concluded that humans have a lower tolerance for errors made by algorithms than for people’s mistakes. Later on, Dietvorst et al. (2016) brought evidence that people are not always algorithm averse and that, if they are allowed to reduce or eliminate the error of the statistical model, they are more willing to use imperfect algorithms.

However, a recent study that used incentivized forecasting tasks did not find evidence of algorithm aversion (Prahl & Van Swol, 2017). In this context, there is a clear opportunity to study if there has been a change in the degree of people’s reluctance to use imperfect statistical models in forecasting tasks during the recent years. Moreover, if people display algorithm aversion, it would be interesting to find out if the solution proposed by Dietvorst et al. (2016) is still effective in encouraging forecasters to rely more on algorithms.

To clarify if allowing people to adjust the algorithmic forecast lowers their aversion, this paper will replicate the experiment conducted by Dietvorst et al. (2016) and will investigate if the conclusion of the original paper can be reinforced. To do this, this paper makes some strong assumptions. First of all, this paper assumes that people are algorithm averse and that it is the same intolerance of inevitable error proposed by Dietvorst et al. (2014) that makes people reluctant to use imperfect algorithms. Secondly, based on the reviewed literature, this paper assumes that people have lower forecasting accuracies compared to linear models.

Therefore, the first two hypotheses are derived from the original study and offer the same predictions:

H1: As compared to not giving control, allowing decision-makers to modify the algorithmic forecast will increase their use of the algorithm in forecasting tasks.

H2: The use of algorithms in forecasting tasks will increase decision-makers’ forecasting accuracy.

Alternative Solution to Overcoming Algorithm Aversion

There may be an alternative way to stimulate algorithm usage that scholars have not yet considered. As already stated, the purpose of using algorithms in forecasting tasks is to increase predictive accuracy. There are numerous examples where human adjustments of the algorithmic forecast have ended up lowering the model’s accuracy (Lim & O’Connor, 1995; Goodwin & Fildes, 1999; Önkal et al., 2009). However, if allowing people to modify the algorithmic prediction encourages them to use automated aids more frequently and improves their predictive accuracy, the benefits outweigh the costs of decreasing the algorithm’s performance. This section proposes an alternative way to overcome algorithm aversion.

The existing literature found that people forgive humans more easily than algorithms for making relatively larger mistakes. Dietvorst et al. (2014) found evidence that there is a “general tendency for people to more quickly lose confidence in algorithmic than human forecasters after seeing them make the same mistake” (p. 2). This finding led Dietvorst et al. (2016) to allow people to rectify the algorithmic forecast, assuming that they would be more forgiving of their own errors and, by extension, of the model’s errors once they could adjust them. This proved to be an effective way to lower people’s algorithm aversion.

In a recent working paper, Logg (2017) replicated one of the experiments of Dietvorst et al. (2015) where participants chose between their judgment and an algorithm’s estimates. The author added another condition, where participants could choose between another person’s forecasts and the algorithmic forecasts. Logg (2017) manipulated whether the source of human judgment was participants’ own judgment or another human’s judgment. The results show a preference for algorithmic estimates, as more participants chose the algorithm over the forecasts of another person. They also chose the algorithm more than their own estimates, although less frequently compared to choosing the algorithm over another person’s estimates.

This result can be explained through the phenomenon called “overconfidence”, where people rely more on their own judgments over other people’s judgments (Harvey, 1997). This is consistent with the finding of the literature on advice-taking, where people show a general tendency to favor their own judgment over advice from other sources (Bonaccio & Dalal, 2006; Yaniv & Kleinberger, 2000; Yaniv, 2004).

Dietvorst et al. (2016) also ran an experiment with implications in the advice-taking literature where subjects could either use the imperfect forecasts of another person or an algorithm. They were further constrained to use the forecasts of the chosen entity in one condition or enabled to adjust the predictions in the other condition. The study found that allowing participants to modify the results equally increased their choice of the human forecast and of the algorithmic forecast.

The studies mentioned above explored people’s choice between an algorithm, their judgment, or other people’s forecasts. However, the scientific literature did not investigate the possibility of choosing to use a human-adjusted algorithmic forecast. In this scenario, decision-makers use the algorithm-adjusted forecast of another person that already relied on the statistical model to make predictions.


If the assumption that people have an intolerance of inevitable error is valid, then human intervention on the algorithmic forecast should lower this intolerance, as humans are thought to be able to achieve perfection and reduce the errors of automated tools (Einhorn, 1986). Similarly, if people avoid imperfect algorithms because they expect automation to be perfect, an imperfect human-adjusted algorithmic forecast should be accepted more easily, since it is natural for people to make errors. As people rely on humans more than on machines (Dietvorst et al., 2014), there should be an increase in the use of algorithms when the automated forecast has already been adjusted by a human. Therefore, based on the stated assumptions, this paper hypothesizes that prior human intervention should aid the overcoming of algorithm aversion in forecasting tasks.

H3: Human intervention will lower algorithm aversion, hence decision-makers will use the algorithm’s forecasts modified by another participant more often than they will choose to use the algorithmic forecast alone.

Researchers have also found that people are less tolerant of an algorithm’s relatively smaller mistakes than of humans’ larger errors (Dietvorst et al., 2015). However, people used the algorithm more when they could modify its forecast (Dietvorst et al., 2016). Based on this finding, human adjustment of the algorithmic forecast should lead to a higher tolerance for errors and increased algorithm usage. Since people trust humans more than automation, there should be no difference in the frequency of choosing to use the algorithm when decision-makers themselves modify the algorithmic forecast or when someone else has already adjusted it. From here, the following hypothesis has been developed:


H4: Decision-makers will be equally likely to use the algorithm in forecasting tasks when they can modify it or when it was already adjusted by a past participant.

To illustrate the hypotheses proposed above, this paper introduces the following conceptual model:

Figure 1: Conceptual model. [Diagram omitted. The model links human intervention in adjusting the algorithmic forecast - the decision-maker’s own intervention or another person’s intervention (H1, H3, H4) - to algorithm usage, which is in turn linked to forecasting performance (H2).]

Methods

This section illustrates the empirical part of the study. First, there is an overview of the experimental setup. Then the research design is explained. Subsequently, the exact procedure and variables of the conducted experiment are outlined.

Overview

To test the hypothesis stated above, this paper used an experiment based on the one designed by Dietvorst et al. (2016). The original experimental instrument used and developed by Dietvorst, Simmons, and Massey was provided by Dr. Dietvorst on request, for the purpose of this study.


The main finding of the original study is that people are willing to use imperfect algorithms when they can (even slightly) modify their predictions. This study attempts to understand if allowing participants to use the algorithmic forecasts adjusted by another person could also be a solution to algorithm aversion. To find this out, this experiment replicated two conditions from the original paper to check if giving people control over the algorithmic forecast is still an applicable strategy to lower people’s algorithm aversion. Moreover, this study added a new condition that will test if, compared to giving people control over the algorithmic forecast, presenting participants with the option to use the model’s estimate adjusted by a past participant will be equally successful in lowering their algorithm aversion. This last condition is the main contribution of this paper.

As in the original experiment, participants in this study were asked to forecast real students’ scores on a standardized math test based on nine variables. Participants could choose to use an algorithm to forecast the results, but they were informed that the model was imperfect, with an average error of 17.5 percentiles. The experiment manipulated whether participants could use the model’s forecasts in a constrained or unconstrained manner, or the model’s estimates adjusted by a past participant, to make the official predictions. Participants’ algorithm aversion was evaluated based on their initial choice to use the model or to make their own forecasts. They had no performance feedback on the quality of their predictions.

The experiment used a one-way between-subjects design (Kirk, 1982) with three conditions: in the can’t-change condition, participants chose between exclusively using their own forecast or exclusively using the model’s forecast. In the self-adjusted condition, participants chose between exclusively using their own forecast or adjusting the model’s estimates to form their official estimates. In the last group, the participant-adjusted condition, participants chose between exclusively using their own forecast or exclusively using the model’s forecast adjusted by a past participant to form their official estimates.

Table 1: Description of the experimental conditions

Can’t-change condition: Participants chose between exclusively using their own forecast or exclusively using the model’s forecast.

Self-adjusted condition: Participants chose between exclusively using their own forecast or exclusively using the model’s forecast; they learned that they could modify the model’s forecasts however much they wanted.

Participant-adjusted condition: Participants chose between exclusively using their own forecast or exclusively using the human-adjusted model’s forecast, i.e., the model’s estimates adjusted by a past participant.

The first two conditions were the same as the ones in the initial experiment conducted by Dietvorst et al. (2016) and attempted to replicate the finding that giving people control over the algorithm will lower their aversion. The last condition was designed to test the hypotheses that are the contribution of this paper.

Participants

This study was conducted on the Amazon Mechanical Turk (MTurk) platform. Participants recruited through the platform received a $0.70 fee for completing the study and could earn an additional bonus of up to $0.50 depending on their forecasting accuracy. The number of participants was predetermined at 87 subjects per condition, adding up to 261 participants. Participants were screened so that none had experience in forecasting and all were unfamiliar with the task. There were also criteria to ensure that participants had performed high-quality tasks on Amazon MTurk in the past. At the beginning of the study, participants answered questions designed to check their comprehension of the instructions. Those who failed to give the right answers were not paid for their participation and were redirected to the end of the experiment without influencing the results of the study. They were replaced to ensure that each condition had 87 responses.

Procedure

The experiment was administered as an online survey and conducted in two phases. The first two conditions were run on Amazon MTurk between the 7th and the 10th of May 2018. The adjustments that these participants made to the model’s estimates were used to form what participants in the last condition saw as the “model’s estimates adjusted by a participant”. Since participants were told that this study uses real data from real students, running the conditions separately was necessary to avoid deception. Consequently, data for the last condition was collected between the 11th and the 15th of May 2018.

Participants gave their consent and entered their Amazon MTurk identification number. They learned that they would make 20 incentivized forecasts in which they would estimate the percentile scores of 20 real high-school students on a standardized math test. Then they received a description of the percentiles to ensure their understanding. Participants were informed that the data described real-life students and then read the detailed description of the nine variables they would use in their forecasts.

Participants were then told that they could use a statistical model designed by analysts to make the forecasts. They truthfully learned that analysts built the model using data from thousands of high school students and that it was based on the same nine variables that they would see. The model did not have any additional information, and it was presented to participants as a “black box”, in the same manner as the initial study framed it, with the following description: “a sophisticated model, put together by thoughtful analysts” (Dietvorst et al., 2016, p. 3). This is usually the case in real-life situations, where users have little insight into the workings of the algorithms. Then participants found out that the model’s estimates were off by an average of 17.5 percentiles, and hence that the model was imperfect. Moreover, they were told that the error might be larger or smaller than 17.5 percentiles from one student to another.

In the following section, participants learned about the incentives for making accurate forecasts. They received a $0.50 additional bonus for making estimates within five percentiles of the students’ actual score on the test. This bonus decreased by $0.10 for every additional five percentiles of error in participants’ estimates. Participants were also asked to type the following sentence to ensure their comprehension of the incentives: “During the official round, you will receive additional bonus money based on the accuracy of the official estimates. You can earn $0 to $0.50 depending on how close your official estimates are to the actual ranks.” (Dietvorst et al., 2016, p. 3).
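As a concrete illustration of this payout rule, the sketch below (in Python) maps an error in percentiles to the bonus earned; the function name and the assumption that the bonus is computed from a participant’s average absolute error are illustrative and not taken from the original materials:

```python
def bonus_for_error(abs_error_percentiles: float) -> float:
    """Sketch of the incentive scheme: $0.50 for estimates within five percentiles
    of the actual score, minus $0.10 for every additional five percentiles of error,
    with a floor of $0 (assumed aggregation over the 20 forecasts)."""
    if abs_error_percentiles <= 5:
        return 0.50
    # Count how many further five-percentile bands of error were exceeded (ceiling division).
    extra_bands = int(-(-(abs_error_percentiles - 5) // 5))
    return max(0.0, round(0.50 - 0.10 * extra_bands, 2))

# Example: an average error of 12 percentiles is two bands past the first, earning $0.30.
print(bonus_for_error(12))   # 0.3
print(bonus_for_error(40))   # 0.0
```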

Then participants were randomly assigned to one of the three conditions. In the can’t-change condition, participants were told that they could exclusively use their judgment to make forecasts or exclusively use an algorithm’s forecasts. In the self-adjusted condition, they learned that they could choose between exclusively using their judgment to make forecasts and exclusively using the algorithm, but that they could modify the estimates of the model however much they wanted if they chose to use it. Finally, in the participant-adjusted condition, participants could make forecasts relying exclusively on their own judgment, or they could choose to use the model’s estimates adjusted by a past participant to form their official estimates. After their assignment, participants had to type a sentence describing their condition to ensure their understanding of the procedure.

Next, participants decided if they wanted to use their judgment or the statistical model and made their 20 incentivized predictions. The 20 scenarios in which participants made predictions were randomly drawn without replacement from a pool of 50 high school students.

Table 2: Example of the stimuli used in the study

Race: More than one race, non-Hispanic
Socioeconomic status (first = lowest, fifth = highest): Fifth quintile (highest)
Desired occupation at age 30: Legal occupations
Predicted highest degree: Complete Ph.D./M.D./law degree/other high-level professional degree
Region of country: West
Times taken PSAT: Twice
How many friends are not going to college: None of them
Favorite school subject: Social studies/history/government/civics
Taken any AP test: Yes

Participants who chose to use their judgment made their forecasts without seeing the model’s predictions. In the can’t-change and participant-adjusted conditions, participants who chose to use the model submitted their own forecasts anyway, but these were not used to determine their bonus. Participants in the self-adjusted condition received a bonus according to their forecasting accuracy regardless of the forecasting method chosen.


After completing the 20 incentivized forecasts, participants rated their confidence in their own forecasts and in the model’s forecasts on a five-point scale. In the end, they entered a unique code generated by Qualtrics, used to match their MTurk ID with their response for the bonus payment, and ended the experiment.

To conclude, the procedure was the same as the one in the experiment conducted by Dietvorst et al. (2016) with a few exceptions: first, the third experimental condition was different. The original study used only the forecasts of the algorithm while giving participants permission to restrictively or freely adjust the forecasts of the statistical model. This experiment introduced the adjusted algorithmic estimates by a past participant to test the hypotheses that are the contribution of this paper and attempt to answer the research question. Second, instead of a laboratory set-up, this experiment recruited participants online, through the Amazon MTurk platform. Third, the remuneration for participating in the experiment was different: participants received $0.70 for participating, while the bonus scheme used was the one designed by Dietvorst et al. (2016) for their second experiment, which was distributed online as well.

Independent Variables

This study used one independent variable, the human intervention in adjusting the algorithmic forecast, with three conditions. The first condition involved no human intervention over the algorithmic forecast. In the second condition, the independent variable was manipulated to offer participants the option to modify the algorithm through their own intervention, while the last condition involved a past participant’s adjustment of the algorithmic forecast.


Dependent Variables

The main dependent variable of the study was the use of the algorithm in making forecasts. This was computed as the percentage of participants who chose to use the algorithm in making estimates and serves as a behavioral measure of algorithm aversion.

There were also several other dependent variables assessed at the end of the experiment to examine the differences between conditions.

Confidence: Towards the end of the experiment, participants answered questions designed to check their confidence in the model’s estimates and in their own forecasts as ratings on a 5-point scale (1= none, 5= a lot).

Performance differences: computed as the average absolute difference between the forecasted percentile rank and the students’ realized score on the standardized math test. This variable is necessary for the study as it enables comparing the accuracy of the various forecasting methods tested.

Bonus earned: computed based on the average bonus that participants earned in each condition according to the chosen forecasting method.
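To illustrate how these dependent variables could be computed from the raw responses, here is a minimal sketch with a hypothetical data frame; the column names and values are invented for illustration and are not the actual analysis script used for this thesis:

```python
import pandas as pd

# Hypothetical per-participant records; column names and values are illustrative only.
responses = pd.DataFrame({
    "condition":      ["cant_change", "cant_change", "self_adjusted", "participant_adjusted"],
    "used_model":     [1, 0, 1, 0],                  # 1 = chose the algorithm, 0 = own judgment
    "mean_abs_error": [16.2, 25.4, 15.1, 24.8],      # average |forecast - realized score|, in percentiles
    "bonus":          [0.30, 0.10, 0.40, 0.10],
})

# Use of the algorithm: percentage of participants per condition who chose the model.
algorithm_use = responses.groupby("condition")["used_model"].mean() * 100

# Performance differences and bonus earned, averaged per condition.
by_condition = responses.groupby("condition")[["mean_abs_error", "bonus"]].mean()

print(algorithm_use)
print(by_condition)
```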

Results

The participants of this study were recruited on Amazon MTurk, and the final sample consisted of N = 260 responses from participants who passed the attention-check questions. The first two conditions had 87 responses each, while one of the 87 responses in the last condition was removed because the forecasts were not within the required 0-100 range. There was no missing data. The sample consisted of 168 males (64.6%) and 92 females (35.4%).


A one-way ANOVA on age, gender and education was performed to test for differences between the three groups. The analysis revealed no significant differences across conditions in age (F(2, 257) = 0.095, p = .910), gender (F(2, 257) = 0.025, p = .976) or education (F(2, 257) = 2.406, p = .092).

Table 3: Means across conditions for age, education and gender

Age: Can’t-change M = 32.37, SD = 9.04 (n = 87); Self-adjusted M = 32.89, SD = 7.88 (n = 87); Participant-adjusted M = 32.86, SD = 9.51 (n = 86)
Gender: Can’t-change M = 1.34, SD = .47; Self-adjusted M = 1.36, SD = .48; Participant-adjusted M = 1.36, SD = .48
Education: Can’t-change M = 3.36, SD = .94; Self-adjusted M = 3.09, SD = .81; Participant-adjusted M = 3.13, SD = .82

Table 4: One-way ANOVA for age, education and gender

Age: SS = 14.79, df = 2, MS = 7.39, F = .095, p = .910 (Error: SS = 20071.406, df = 257, MS = 78.10; Total: SS = 20086.196, df = 259)
Education: SS = 3.57, df = 2, MS = 1.787, F = 2.406, p = .092 (Error: SS = 190.811, df = 257, MS = .742; Total: SS = 194.811, df = 259)
Gender: SS = .011, df = 2, MS = .006, F = .025, p = .976 (Error: SS = 59.435, df = 257, MS = .231; Total: SS = 59.446, df = 259)

The correlation matrix revealed strong linear relationships between some of the dependent variables. The lower the performance differences of participants, the higher the bonus, as revealed by the strong, significant negative correlation, r(260) = -.87, p < .001. The use of the algorithm in making predictions has a strong negative correlation with the performance differences, r(260) = -.66, p < .001, showing that participants who used the model had a smaller absolute difference between the forecasted and the realized scores. There is also a significant positive linear relationship between the use of the algorithm and the bonus earned, r(260) = .71, p < .001: participants who chose to use the algorithm earned a higher bonus.

Table 5: Correlation matrix showing means, standard deviations, and correlations

1. Gender: M = 1.35, SD = 0.47
2. Age: M = 32.7, SD = 8.8; r with (1) = .03
3. Education: M = 3.19, SD = 0.86; r with (1) = -.05, (2) = -.14*
4. Performance differences: M = 23.13, SD = 6.85; r with (1) = -.03, (2) = -.04, (3) = .99
5. Bonus: M = 1.19, SD = 0.92; r with (1) = .03, (2) = -.01, (3) = -.07, (4) = -.87**
6. Confidence in model’s estimates: M = 3.38, SD = 0.88; r with (1) = -.06, (2) = -.08, (3) = .22**, (4) = .12, (5) = -.09
7. Confidence in own estimates: M = 3.31, SD = 1.01; r with (1) = .02, (2) = -.02, (3) = .20**, (4) = .11, (5) = -.12, (6) = .51**
8. Use of the algorithm in making forecasts: M = 0.59, SD = 0.49; r with (1) = .06, (2) = -.10, (3) = -.00, (4) = -.66**, (5) = .71**, (6) = .05, (7) = -.01

** Correlation is significant at the 0.01 level (2-tailed). * Correlation is significant at the 0.05 level (2-tailed).

Hypothesis Testing

Use of the algorithm in making predictions.

As predicted in the original study and retested here, allowing participants to modify the algorithmic forecast resulted in a descriptively higher frequency of choosing to use the algorithm to form the official estimates. Whereas 62.1% of the participants in the can’t-change condition used the model, 72.4% of those in the self-adjusted condition chose to use the model’s estimates and adjust them to form their official predictions. However, a Pearson chi-square test revealed that the difference in the frequency of choosing to use the algorithm between the two groups was not significant (χ2(1, N = 174) = 2.11, p = .146).

Between adjusting the model themselves and using the model’s estimates adjusted by a past participant without being able to change them, participants had an interesting, strong preference to have their own intervention determine their bonus rather than someone else’s intervention. This result will be discussed further, as it represents the main contribution of this paper. While 72% of the participants in the self-adjusted condition chose to use the model, only 43% of the ones in the participant-adjusted condition tied their bonus to the participant-adjusted algorithm. This difference is significant, as revealed by the Pearson chi-square test (χ2(1, N = 173) = 15.31, p < .001). The result is in line with the literature in the field of cognitive sciences, which found that people rely more on their own judgments than on other people’s judgments, a phenomenon called “overconfidence” (Harvey, 1997).

When comparing the choice to use the model without being able to adjust it (the can’t-change condition) with the choice to use the model’s estimates adjusted by a past participant, participants in this experiment had a preference for the unadjusted algorithm. Out of the 87 participants in the can’t-change condition, 62% chose to use the model to determine their bonus, whereas only 43% opted for using the model’s estimates adjusted by a past participant. This difference is significant (χ2(1, N = 173) = 6.29, p = .012) and signals that having another person intervene in adjusting the algorithmic forecast actually lowers participants’ use of the model. This finding contradicts the assumption that human intervention will lower algorithm aversion.

Table 6: Pearson chi-square test results on participants’ choice to use the algorithm

Can’t-change vs self-adjusted: χ2 = 2.113, df = 1, p = .146, N = 174
Self-adjusted vs participant-adjusted: χ2 = 15.316, df = 1, p < .001, N = 173
Can’t-change vs participant-adjusted: χ2 = 6.292, df = 1, p = .012, N = 173
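The pairwise tests reported in Table 6 can be reproduced approximately with a standard Pearson chi-square test on the 2x2 choice frequencies. The sketch below uses scipy and reconstructs the cell counts from the reported percentages (62% of 87, 72% of 87 and 43% of 86 participants choosing the model), so the counts are approximations rather than figures taken from the raw data:

```python
from scipy.stats import chi2_contingency

# Approximate counts of participants who chose the model, reconstructed from the
# reported percentages and group sizes (not the raw data).
chose_model = {"cant_change": 54, "self_adjusted": 63, "participant_adjusted": 37}
group_size  = {"cant_change": 87, "self_adjusted": 87, "participant_adjusted": 86}

def pairwise_chi_square(a: str, b: str):
    """2x2 Pearson chi-square without Yates' continuity correction, to mirror the
    Pearson statistics reported in Table 6."""
    table = [
        [chose_model[a], group_size[a] - chose_model[a]],
        [chose_model[b], group_size[b] - chose_model[b]],
    ]
    chi2, p, dof, _expected = chi2_contingency(table, correction=False)
    return round(chi2, 3), round(p, 3), dof

print(pairwise_chi_square("cant_change", "self_adjusted"))           # chi2 ~ 2.11, p ~ .146
print(pairwise_chi_square("self_adjusted", "participant_adjusted"))  # chi2 ~ 15.32, p < .001
print(pairwise_chi_square("cant_change", "participant_adjusted"))    # chi2 ~ 6.29, p ~ .012
```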

Original design and findings on use of the algorithm. In their experiment, Dietvorst et al. (2016) used four conditions: in the first condition, the can’t-change condition, participants could use the model without modifying its predictions; in the remaining three conditions (adjust-by-10, change-10 and use-freely), participants were restricted in how much they could adjust the algorithmic forecast. This paper replicated two conditions of the paper published by Dietvorst et al. (2016). The authors found that allowing decision-makers to modify the algorithmic forecasts significantly increases their use of the model, as well as their forecasting accuracy.

Figure 2: Frequency of participants’ choice to use the algorithm in making predictions in this study (left: can’t-change 62%, n = 87; self-adjusted 72%, n = 87; participant-adjusted 43%, n = 86) and in the original study (right: can’t-change 32%, n = 74; adjust-by-10 76%, n = 72; change-10 73%, n = 72).

This experiment replicated two conditions of the original paper, but it did not replicate the original findings, as participants in the can’t-change condition chose to use the model far more frequently (62%) than the ones in the original paper (32%), leading to non-significant differences between constraining participants to use the algorithmic forecast as-is and allowing them to modify the model’s estimates. The implications of this finding will be discussed further in the discussion section of the paper.


Adjustment size

Participants in the can’t-change condition used the model less frequently than the ones in the self-adjusted condition and had forecasts that were significantly farther from the model’s predictions, as revealed by the results of the independent-samples t-tests (t(172) = 2.03, p = .043). Participants who chose to tie their bonus to the model in the can’t-change condition introduced their estimates anyway, and the results show that they deviated more from the algorithm’s estimates (M = 10.29, SD = 9.32) than participants in the self-adjusted condition (M = 8.3, SD = 8.23), although the differences are not statistically significant (t(106) = 1.21, p = .226). Therefore, allowing participants to modify the predictions of the algorithm resulted in them anchoring their incentivized forecasts slightly closer to the model’s predictions.

Between adjusting the algorithmic forecast and using the model's estimates modified by a past participant, the results show that participants in the self-adjusted condition deviated less from the model than those in the participant-adjusted condition. Participants who chose to use the model and modify its predictions to form their official estimates made adjustments of 8.3 percentiles on average (M = 8.3, SD = 8.23). Participants who tied their bonus to the participant-adjusted algorithmic forecast and used it as their official estimate still introduced their own estimates, which were 10.57 percentiles away from the official estimates, on average (M = 10.57, SD = 8.63). Although the difference is not significant (t(98) = -1.31, p = .194), the results show that participants anchored closer to the algorithm's forecasts than to the model's estimates adjusted by a past participant.


Between the can’t-change and participant-adjusted conditions, the forecast introduced had similar deviations from the model. Participants in the can’t-change condition had a 10.29 percentile average difference from the model, while the predictions in the participant-adjusted condition were 10.57 percentiles away from the model, on average. The independent-sample t-tests show no significant statistical differences between conditions t(89) = -.143, p = .886. The findings reveal that out of the three conditions, participants used the model’s forecasts and relied on them the most when they were able to adjust them.

Table 7: Average differences between participants' estimates and the algorithmic predictions for participants who used the model

Groups                                   T        Df    P      Mean Difference
Can't-change vs Self-adjusted¹           1.217    106   .226   1.99
Can't-change vs Participant-adjusted     -.143    89    .886   -.27
Self-adjusted vs Participant-adjusted    -1.309   98    .194   -2.27

Table 8: Average absolute deviation from model

Groups                                   T        Df    P       Mean Difference
Can't-change vs Self-adjusted            2.037    172   .043    2.90
Can't-change vs Participant-adjusted     -1.888   171   .061    1.43
Self-adjusted vs Participant-adjusted    -4.031   171   <.001   1.39
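The between-group comparisons reported in this section are independent-samples t-tests, with equal variances assumed except in the footnoted rows, which appear to relax that assumption (cf. footnote 3 below). A minimal sketch of that procedure, run on simulated deviation scores purely for illustration, could look as follows; the group sizes and distributions are assumptions, not the study's data.

import numpy as np
from scipy.stats import levene, ttest_ind

rng = np.random.default_rng(seed=1)
# Hypothetical absolute deviations from the model for two conditions (percentiles).
cant_change = rng.gamma(shape=2.0, scale=7.0, size=87)
self_adjusted = rng.gamma(shape=2.0, scale=5.5, size=87)

# Levene's test decides whether the pooled (equal-variance) or Welch t-test is used.
equal_var = levene(cant_change, self_adjusted).pvalue >= 0.01
t_stat, p_value = ttest_ind(cant_change, self_adjusted, equal_var=equal_var)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}, equal variances assumed: {equal_var}")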


Original findings on adjustment size. In the original study, participants voluntarily incorporated the model's estimates into their predictions and showed similar deviations regardless of how much they were allowed to adjust the model's predictions.

This study replicates the finding that allowing people to modify the algorithm results in lower (although not statistically significantly lower) deviations from the model's estimates, but it also reveals a surprising result: participants deviate more from the participant-adjusted forecast than from the model when they cannot adjust the estimates of the forecasting method chosen. Overall, the adjustments made by participants were larger in this study than in the original paper.

Figure 3: Average adjustment size in this study (left: average absolute deviation from model - Can't-change 14.18, Self-adjusted 11.27, Participant-adjusted 16.89) and in the original study (right: Can't-change 12.6, Adjust-by-10 8.03, Change-10 8.49, Use-freely 8.18)

Forecasting performance

Participants in the can’t-change condition had slightly higher accuracies than the ones in the self-adjusted condition, given their prevalent forecasting method of choice was to use the model (62.1%), together with their constraint to use the algorithm’s estimates as the official ones. However, the differences between conditions were not significant, t(172) = -1.16, p = .248.



Given their higher accuracy, participants in the can't-change condition earned slightly higher bonuses (M = 1.36, SD = .087) than those in the self-adjusted condition (M = 1.11, SD = .086). Although the difference is not significant (t(172) = 1.82, p = .070), it indicates that allowing participants to adjust the algorithm lowered its performance, as previously found in the reviewed literature (Carbone et al., 1983; Goodwin & Fildes, 1999; Lim & O'Connor, 1995).

There are no statistically significant differences between the self-adjusted and participant-adjusted conditions in forecasting accuracy (t(165) = -.664, p = .508) or in the bonus earned (t(167) = .154, p = .878). The differences between the can't-change and participant-adjusted conditions were also not significant, both in accuracy (t(171) = -1.720, p = .087) and in bonus earned (t(171) = 1.841, p = .067).

Table 9: Performance differences between groups

Groups                                    T        Df    P      Mean Difference
Can't-change vs Self-adjusted             -1.159   172   .248   -1.128
Can't-change vs Participant-adjusted²     -1.720   171   .087   -1.840
Self-adjusted vs Participant-adjusted²    -0.664   165   .508   -.712

Table 10: Bonus differences between groups

Groups                                    T       Df    P      Mean Difference
Can't-change vs Self-adjusted             1.826   172   .070   .013
Can't-change vs Participant-adjusted      1.841   171   .067   .026
Self-adjusted vs Participant-adjusted²    0.154   167   .878   .002


Figure 4: Performance differences and bonus earned across conditions (average bonus earned in $: Can't-change 0.136, Self-adjusted 0.111, Participant-adjusted 0.109; average absolute error: Can't-change 22.14, Self-adjusted 23.27, Participant-adjusted 23.98)

When comparing the performance of the participants who chose to use the model with that of those who made their own estimates, the results of the independent-samples t-tests show significant performance differences in every condition. Participants who chose to use the model - regardless of their adjustment constraints, or of whether it was their own or a past participant's intervention that modified the algorithm - had significantly higher accuracy, as summarized in Table 11. Participants who used the model in the can't-change condition made more accurate predictions than those who made their own estimates (t(34) = -10.58, p < .001), as did participants in the self-adjusted condition who modified the algorithmic predictions compared with those who did not use the model (t(85) = -4.11, p < .001). The same pattern holds in the last condition, where participants who used the model's estimates modified by a past participant had higher accuracy than those who made their own forecasts (t(57) = -12.31, p < .001).



Table 11: Performance differences in each condition based on participants' bonus choice

Condition              Groups                                                                      T         Df   P       Mean Difference
Can't-change           Use model's estimates vs Use my estimates³                                  -10.583   34   <.001   -10.732
Self-adjusted          Adjust model's estimates vs Use my estimates                                -4.116    85   <.001   1.415
Participant-adjusted   Use model's estimates adjusted by a past participant vs Use my estimates³   -12.317   57   <.001   -11.713

Table 12: Bonus differences in each condition based on participants' bonus choice

Condition              Groups                                                                      T        Df   P       Mean Difference
Can't-change           Use model's estimates vs Use my estimates³                                  13.042   36   <.001   .155
Self-adjusted          Adjust model's estimates vs Use my estimates                                4.901    85   <.001   .090
Participant-adjusted   Use model's estimates adjusted by a past participant vs Use my estimates³   14.616   83   <.001   .168

Figure 5: Distribution of participants' average absolute errors by condition, and whether or not they used the model

³ Levene's test was significant (p < 0.01), so equal variances were not assumed
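Footnote 3 indicates that, where Levene's test was significant, the t-tests did not pool variances. Assuming the standard Welch procedure (the usual "equal variances not assumed" output in statistical packages), the test statistic and its approximate degrees of freedom are:

\[
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}},
\qquad
\nu \approx \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
\]

Under this correction the degrees of freedom depend on the sample variances, which is consistent with some footnoted comparisons reporting fewer degrees of freedom than the corresponding pooled tests.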


Original findings on forecasting performance. As already stated, the accuracy of predictions in forecasting tasks is of utmost importance, both in theory and in practice. Dietvorst et al. (2016) found that the use of algorithms in making estimates led to significantly better predictions. Participants who could modify the algorithmic estimates outperformed those who could not adjust the model. Although the model's estimates were imperfect, reliance on the algorithm improved participants' forecasting accuracy.

This study replicated the finding that algorithm usage leads to more accurate forecasts, but only when comparing participants who chose to use the model with those who made their own estimates within the same condition. Across conditions, forecasting accuracy did not improve. On the contrary, allowing participants to modify the algorithm lowered its performance, as summarized in Figure 6 below. The worst performance was registered in the condition that differed from the initial study, where the majority of participants chose not to use the model's estimates adjusted by a past participant and thus made their own predictions.

Figure 6: Forecasting performance (average absolute error) in this study (left: Can't-change 22.14, Self-adjusted 23.27, Participant-adjusted 23.98) and in the original study (right: Can't-change 22.97, Adjust-by-10 20.08, Change-10 20.44, Use-freely 19.97)


Confidence

There were no significant differences across conditions in confidence ratings for participants' own estimates or for the model's estimates. When comparing participants who chose to use the model with those who used their own estimates, significant differences in the t-test results emerged only in the self-adjusted group, where participants who chose to use the model were significantly more confident in the model's estimates than in their own estimates (t(85) = 2.12, p = .037). However, the differences in confidence in their own estimates were not significant (t(85) = -.54, p = .591).

No significant differences were found in the performance estimates that participants made for their own forecasts or for the model's forecasts, either between conditions or between the forecasting methods chosen.

These findings are in line with those of the original study, which reported no significant differences in confidence ratings between or within the experimental conditions.

Results of hypothesis testing

The first two hypotheses concerned the replication of the original study. The results of the analysis do not support H1, suggesting that allowing decision-makers to modify the algorithmic forecast does not significantly increase their use of algorithms in making predictions.

Participants' choice to use the model led to significantly higher bonuses, due to their higher forecasting accuracy. The results support H2: the use of algorithms in making predictions significantly increases decision-makers' forecasting accuracy and leads to higher bonuses. This amounts to a partially successful replication of the original study.
