BIAS IN PERFORMANCE EVALUATIONS WITH THE BALANCED SCORECARD

Academic year: 2021

Stefan Ziengs (S2702959) University of Groningen

MSc Business Administration – Organizational & Management Control

ABSTRACT

INTRODUCTION

In 1992 Kaplan and Norton introduced a new management tool called the balanced scorecard (BSC). At the time, conventional performance measurement in the U.S. relied mainly on financial measures. Critics of this practice argued that these measures were incompatible with the increasing emphasis on innovation and led managers to focus only on short-term financial results (Hoque, 2014). The critics differed, however, on how to remedy the shortcomings of existing performance measurement systems. Some argued that the existing financial measures should be modernized, while others proposed abolishing financial measures altogether and replacing them with non-financial measures (Kaplan and Norton, 1992). Kaplan and Norton (1992) rejected both approaches, arguing that managers should not have to choose between financial and non-financial measures. After thoroughly studying the performance measurement practices of various companies, they designed the BSC.

In addition to traditional financial performance measures, the BSC contains non-financial measures of customer satisfaction, internal processes, and innovation and improvement activities (Hoque, 2014; Kaplan and Norton, 1992). The financial measures give managers timely feedback on the results of the decisions they have made, while the complementary non-financial measures provide information on the factors that drive future performance (Kaplan and Norton, 1992). Consequently, Kaplan and Norton argue, the BSC addresses the growing focus on improvement and innovation in contemporary companies.

Since its inception, the BSC has received significant attention from both scholars and practitioners. It has been continuously refined by Kaplan and Norton themselves, as well as by numerous other authors (Kaplan and Norton, 1996; Kaplan, Norton and Rugelsjoen, 2010; Lawrie and Cobbold, 2014; Norreklit, 2000). As a result, the BSC has evolved from a relatively simple tool that provides management with an overview into a complex system integrated throughout entire companies (Perkins, Grey, and Remmers, 2014). Whereas the BSC was initially intended to provide an overview of a firm, it is now used in a wide range of organizational processes, such as increasing employee involvement and evaluating employee performance (Hoque, 2014). This study examines the BSC in a performance evaluation context, in which it can also serve as a tool for comparing performance across business units and for deciding on the financial compensation of unit managers.

form. Another recent survey by 2GC Active Management (2017) examines the purposes the BSC serves within organizations. It indicates that the BSC is mainly used for strategy implementation, but also that about half of the firms in the sample use it during performance evaluations. Despite its widespread use, however, research has identified numerous cognitive biases that can prevent the BSC from functioning optimally as a performance evaluation tool (Ittner, Larcker and Meyer, 2003; Kang and Fredin, 2012; Lipe and Salterio, 2000). In this study I elaborate on two distinct groups of biases that influence evaluators during performance evaluations and thereby prevent the BSC from functioning optimally. The first group concerns biases that affect the evaluator’s attitude towards the performance measures; the second concerns biases that affect the evaluator’s attitude towards the subordinate being evaluated.

Since these two groups of biases cannot be observed directly, they have to be manipulated in order to observe their effect on performance ratings. Firstly, for the group of measures-related biases, this study distinguishes between objective and subjective measures, because previous research has found that the two are not interchangeable and are susceptible to biases to different degrees (Bommer, Johnson, Rich, Podsakoff and MacKenzie, 1995; Rich, Bommer, MacKenzie, Podsakoff and Johnson, 1999). Unlike objective measures, subjective performance measures often lack clear benchmarks and thus require more discretion from evaluators (Bommer et al., 1995), which makes them more exposed to evaluator biases. Since performance evaluations nearly always combine subjective and objective measures to capture the strengths of both (Bommer et al., 1995), this distinction makes it possible to study the evaluator biases related to the measures. Secondly, to examine the group of biases related to subordinates, I investigate the role of the trust that an evaluator has in their subordinate. In this study, trust (distrust) is defined as the evaluator’s confidence, based on past behavior, that their subordinate will continue to perform positively (negatively). Since trust affects the evaluator’s attitude towards the subordinate being evaluated, it makes it possible to study the group of biases related to subordinates. The exact role of trust in this context is also still unclear and is therefore examined in this study as well.

measures, the second is focused on the standalone effects of (dis)trusting a subordinate and the third aims to convey their combined impact.

I hypothesize that, all other factors being equal, evaluators presented with both subjective and objective measures will prefer one type of measure over the other. Since subjective measures lack clear benchmarks and are more susceptible to biases, I expect evaluators to be less willing to rely on subjective measures, to avoid being seen as unfair. Consequently, their ratings will be based more on objective than on subjective measures. When it comes to (dis)trust, I argue that evaluators will be affected by biases such as the halo effect and confirmation bias, the latter being the tendency to favor information that confirms one’s preexisting beliefs (Jonas, Schulz-Hardt, Frey and Thelen, 2001). This implies that when evaluators trust their subordinate based on positive past performance, they will focus on the information consistent with that trust, namely the positive measures. In the case of distrust, the opposite will occur, as evaluators put more weight on the negative measures. Accordingly, the distrusted subordinate will receive lower ratings than the trusted subordinate, even when all other factors are equal.

The remaining hypothesis concerns what happens when the evaluator’s preference for a certain type of measure (in this case, the objective measures) interacts with their (dis)trust in a subordinate. I hypothesize that whether evaluators base their ratings mostly on the objective measures or on their trust in a subordinate depends on whether they trust or distrust that subordinate. If evaluators trust their subordinate, they will mostly be affected by biases such as the halo effect and confirmation bias, and will therefore put more weight on the measures that are consistent with their attitude towards the subordinate. If, on the other hand, they distrust their subordinate, I argue that the leniency error comes into play. The leniency error is the tendency of evaluators to be lenient during performance evaluations, which results in disproportionally high ratings. A common explanation for this bias is that evaluators are lenient out of fear of being unfair when assigning a rating (Golman and Bhatia, 2012). I therefore hypothesize that the same bias operates when the evaluator distrusts their subordinate: evaluators will set aside their attitude towards the subordinate out of fear of being unfair and will therefore base their rating mostly on their preferred measures.

employee which primed trust or distrust and were then shown the employee’s BSC. Based on this information, they were asked to rate the employee on a seven-point scale adapted from Banker (2011). Three possible control variables are taken into account in this experiment, namely the participants’ propensity to trust, their self-perceived objectivity and whether their gender matches that of the employee.

The experiment yields several notable findings. First, there is no support for the first hypothesis, which states that evaluators prefer objective measures over subjective measures. In fact, there is very strong support for the opposite, which implies that participants strongly prefer subjective measures over objective measures. Second, the second hypothesis is strongly supported, since participants gave significantly higher ratings to trusted employees than to distrusted employees. Third, there are no significant findings related to the combined effects of trust and contradicting performance measures. The implications of these results are discussed in the conclusion of this paper.

This study contributes to the existing literature on performance evaluation and the balanced scorecard in two ways. First, although the distinction between biases related to measures and biases related to subordinates exists in practice, their respective impact on evaluators’ judgment has not been the subject of any significant research. Second, introducing trust into this context furthers our understanding of the dynamics between evaluators and subordinates during performance evaluation. Unlike much of the existing literature, this study measures trust from the perspective of the evaluator rather than the person being evaluated.

The remainder of this paper is structured as follows. First, I will elaborate on the theoretical background of this study, after which I will formulate the hypotheses. Next, I will describe the methodology of this study, and subsequently I will present its results. In the final section I will discuss those results and conclude with the implications and limitations of this research.

THEORETICAL BACKGROUND

Issues in the performance evaluation process

achieving this. It does so by complementing the traditional financial measures with non-financial measures revolving around customer satisfaction, internal processes, and innovation and improvement activities (Hoque, 2014).

However, the BSC is not flawless as a performance evaluation tool. Lipe and Salterio (2000), for instance, describe how tailoring the BSC to each business unit in a company might be pointless, since evaluators tend to put significantly more weight on the measures that all business units have in common. A follow-up study by Banker, Chang and Pizzini (2004) corroborates these findings and also shows that the benefits of the BSC cannot be fully exploited unless evaluators have an extensive understanding of the BSC’s linkages with strategy. Yet another study, by Ittner et al. (2003), finds that when evaluators award bonuses using the BSC at their own discretion, they favor certain subordinates and become uncertain about which measures to base their ratings on.

These examples show that although one of the BSC’s aims is to remedy the shortcomings of traditional performance evaluations, issues remain that prevent it from fully achieving this. Since the BSC aims to improve performance measurement by including more diverse measures (i.e. financial and non-financial measures), it is notable that many of the problems described above stem from the types of measures used in the BSC. Because performance measures can be categorized in many ways other than financial versus non-financial, the functioning of the BSC depends on many different factors. One way in which I evaluate the functioning of the BSC in this study is by distinguishing between objective and subjective performance measures. By making this distinction I hope to shed further light on the different types of performance measures and their respective roles in the BSC.

Objective versus subjective performance measures

performance of salespersons. They also conclude that objective and subjective performance measures are not substitutes for each other.

Expanding on these findings, Bommer et al. (1995) illustrate why objective and subjective measures cannot substitute for each other. On the one hand, they describe how subjective measures are more susceptible to threats to their validity than objective measures, which implies that they are disproportionally affected by evaluator biases. The underlying reason is that there are no clear benchmarks for evaluating performance with subjective measures, whereas objective measures often have well-defined standards. On the other hand, Bommer et al. (1995) argue that objective measures can be criticized for giving too narrow a view of performance. Moreover, they state that purely objective measures can never capture all aspects of performance. For this reason, performance evaluations should always be based on both types of measures (Bommer et al., 1995).

The greater susceptibility of subjective measures to biases, however, has implications for the accuracy of performance ratings. To understand how the choice between subjective and objective measures affects rating accuracy, I first elaborate on how these biases arise. Evaluators have to rely on their own discretion in assessing their subordinates’ performance (Moers, 2005). This is problematic, as previous research has consistently demonstrated that individuals are not fully rational (Kahneman, 2003; Simon, 1979). This phenomenon, known as bounded rationality, is especially relevant to the decision-making literature because managers constantly make cognitively demanding decisions in their daily activities. They therefore continually face cognitive limits that affect their ability to process and interpret information (Simon, 1979; Tiwana, Wang, Keil and Ahluwalia, 2007). The same concept applies to performance evaluation, because evaluators often face time constraints and large amounts of information when examining their subordinates’ performance. These cognitive limitations give rise to the biases present throughout performance evaluations, which distort evaluators’ judgment and lead them to give less accurate ratings (Bommer et al., 1995).

into the way they each affect evaluators’ judgment. In this study I will therefore investigate their actual impact on evaluator judgment, but first I describe several important biases relevant to this research and classify them into one of the two groups described earlier, as shown in Table 1.

Table 1: Biases in performance evaluation

| Measures-related biases | Person-related biases |
| --- | --- |
| Common-measures bias | Halo (horn) effect |
| Organization and presentation biases | Leniency error |

Biases related to evaluator’s attitude towards measures

The first group of biases relates to the measures used to evaluate performance. These biases affect how evaluators interpret the information on which their performance ratings are based. Importantly, the presence of these biases depends on the tools used to evaluate performance. I therefore look specifically at measures-related biases in the context of the balanced scorecard. The measures-related biases most relevant to this study are the common-measures bias, first identified by Lipe and Salterio (2000), and the biases stemming from the organization and presentation of the performance measures.

The common-measures bias is a well-known measures-related bias in performance evaluation, particularly in combination with the BSC. According to Lipe and Salterio (2000), it arises when evaluators compare the performance of two different business units: they are inclined to put more weight on the measures that both units have in common than on the measures unique to each unit (Kang and Fredin, 2012; Lipe and Salterio, 2000). This is problematic for the implementation of the BSC, since Kaplan and Norton regard the ability to tailor measures to different business units in a firm as an important aspect of the BSC (Lipe and Salterio, 2000). Further research has found ways to significantly reduce the common-measures bias, for instance by clarifying the strategic linkages between performance measures (Banker et al., 2004) or by providing task feedback (Kang and Fredin, 2012). Although the common-measures bias is not directly related to this research, it is included in this section because it is a notable shortcoming of the BSC in particular.
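To make the mechanism concrete, the sketch below scores two hypothetical business units as a weighted average of their common and unit-specific measures. The weighting scheme, the function name `unit_score` and all scores are illustrative assumptions, not Lipe and Salterio’s (2000) experimental materials: unit A performs better on the shared measures, unit B on its tailored ones, and shifting attention towards the common measures is enough to reverse their ranking.

```python
from statistics import mean


def unit_score(common, unique, common_weight):
    """Weighted score for one business unit (illustrative model only).

    common_weight is the share of attention given to the measures shared by
    all units; 0.5 treats common and unit-specific measures equally.
    """
    return common_weight * mean(common) + (1 - common_weight) * mean(unique)


# Hypothetical scorecards on a 1-7 scale: unit A does better on the shared
# measures, unit B does better on the measures tailored to it.
unit_a = {"common": [6, 6], "unique": [3, 3]}
unit_b = {"common": [5, 5], "unique": [6, 6]}

for weight in (0.5, 0.9):
    score_a = unit_score(unit_a["common"], unit_a["unique"], weight)
    score_b = unit_score(unit_b["common"], unit_b["unique"], weight)
    print(f"common_weight={weight}: A={score_a:.2f}, B={score_b:.2f}")
```

With equal weights unit B scores higher; with 90% of the weight on the common measures, unit A comes out ahead, which is the kind of reversal the common-measures bias can produce.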

target) of measures within the BSC affect the relative weights evaluators put on financial and non-financial measures. After conducting two experiments, they find evidence that both the organization and presentation in the BSC can significantly alter the weights that evaluators put on financial and non-financial measures. It is therefore important to take into account that the organization and presentation of the BSC can result in additional biases.

Biases related to evaluator’s attitude towards subordinates

The second group of biases, those related to the person being evaluated, also results from evaluators’ cognitive limitations. Unlike the measures-related biases described in the previous section, these person-related biases stem from evaluators’ attitude towards subordinates and therefore depend on variables such as trust (distrust) or sympathy (antipathy), not on the evaluator’s attitude towards the measures used during the performance evaluation. For this reason, the biases in this section are described in relation to performance evaluations in general rather than to the BSC in particular. The person-related biases most relevant to this research are the halo effect and the leniency error.

One well-known bias that is frequently present during performance evaluations is the halo effect. The concept dates back to the early 20th century, when Thorndike (1920) formulated it to explain why psychological ratings in previous studies were often affected by a constant error. He specifically noted that “ratings were apparently affected by a marked tendency to think of the person in general as rather good or rather inferior and to color the judgments of the qualities by this general feeling” (Thorndike, 1920). In other words, evaluators tend to judge subordinates based on their general impression of them rather than on their actual performance in specific categories (Nisbett and Wilson, 1977). An evaluator’s general impression of a subordinate, based on, for instance, their attitude, can therefore skew the ratings that subordinate receives. Another subordinate who performs better in the area being evaluated might then still receive a lower rating than the first subordinate, who leaves a better overall impression. The counterpart of the halo effect is the horn effect, which occurs when a subordinate is rated lower because their evaluator has a negative general impression of them (Boachie-Mensah and Seidu, 2012).
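One simple way to picture the halo effect is to assume that a rating is a linear blend of dimension-specific performance and the evaluator’s general impression. This is an illustrative model, not Thorndike’s (1920) formulation; the function `halo_rating`, the `halo_weight` parameter and the scores are hypothetical. The example reproduces the situation described above, in which the better performer on the evaluated dimension receives the lower rating.

```python
def halo_rating(true_score, general_impression, halo_weight):
    """Return a rating contaminated by the evaluator's general impression.

    halo_weight = 0 means the rating reflects only actual performance on the
    dimension being evaluated; halo_weight = 1 means it reflects only the
    general impression. A low general impression produces a horn effect.
    """
    if not 0.0 <= halo_weight <= 1.0:
        raise ValueError("halo_weight must lie between 0 and 1")
    return (1 - halo_weight) * true_score + halo_weight * general_impression


# Hypothetical subordinates rated on a 1-7 scale for a single dimension.
# A performs worse on that dimension but leaves a better overall impression.
rating_a = halo_rating(true_score=5, general_impression=6, halo_weight=0.5)
rating_b = halo_rating(true_score=6, general_impression=3, halo_weight=0.5)
print(rating_a, rating_b)  # A is rated above B despite weaker performance
```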

ratings to be unfair. In other words, they consider unfavorable errors to be worse than favorable errors and consequently give ratings that are skewed upwards.
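The explanation above, that evaluators treat unfavorable errors as worse than favorable ones, can be illustrated with an asymmetric loss function. The sketch assumes, purely for illustration, that the evaluator is uncertain about true performance (modelled as normally distributed around the scale midpoint) and pays a linear cost per rating point of error, with under-rating costing more than over-rating. The names `expected_loss` and `best_rating` and all parameter values are hypothetical.

```python
import random


def expected_loss(rating, beliefs, cost_under, cost_over):
    """Average asymmetric loss of `rating` over sampled true-performance values."""
    total = 0.0
    for true_value in beliefs:
        if rating < true_value:
            # Unfavorable error: the rating is below true performance.
            total += cost_under * (true_value - rating)
        else:
            # Favorable error (or exact hit): the rating is at or above true performance.
            total += cost_over * (rating - true_value)
    return total / len(beliefs)


def best_rating(beliefs, cost_under, cost_over, scale=range(1, 8)):
    """Return the seven-point-scale rating with the lowest expected loss."""
    return min(scale, key=lambda r: expected_loss(r, beliefs, cost_under, cost_over))


random.seed(0)
# Hypothetical beliefs about true performance, centred on the scale midpoint (4).
beliefs = [random.gauss(4.0, 1.0) for _ in range(10_000)]

symmetric = best_rating(beliefs, cost_under=1.0, cost_over=1.0)
lenient = best_rating(beliefs, cost_under=3.0, cost_over=1.0)
print(f"symmetric costs: {symmetric}, under-rating costs 3x: {lenient}")
```

Under symmetric costs the loss-minimizing rating is the midpoint. Under linear asymmetric costs the optimum becomes the cost_under / (cost_under + cost_over) quantile of the evaluator’s beliefs (0.75 here) rather than the median, so the chosen rating moves up the scale even though beliefs about performance are unchanged.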

The previous sections list a number of important biases that prevent performance evaluations from functioning optimally. In reality, there is a very large number of biases and cognitive limitations, so it is impossible to eliminate performance evaluation errors completely. Nevertheless, there is a long history of research on factors that might reduce or worsen rating errors. One commonly researched factor that can reduce rating errors is specific training that makes evaluators aware of the biases they exhibit (Latham, Wexley and Pursell, 1975; Smith, 1986). The problem with this approach is that its costs might outweigh its benefits and that it does not work for a significant number of biases (Smith, 1986).

Trust

In this study I investigate the role of trust, another factor that could affect evaluators’ biases during performance evaluations with the BSC. More specifically, since trust reflects the evaluator’s attitude towards their subordinates, it is reasonable to assume that it directly affects only the second group of biases (i.e. those related to the subordinate). In this context, trust could have a remedying effect by encouraging evaluators to confront their own biases in order to give a fair judgment. On the other hand, it could have the opposite effect by introducing additional biases. For instance, trust might generate a halo effect, because evaluators could give higher ratings as a consequence of their overall positive impression of a subordinate.

Before establishing a concrete definition of trust, I first elaborate on rater affect, a closely related variable that has been the subject of several prominent studies in the performance evaluation literature. In fact, according to Levy and Williams (2004), rater affect, defined as liking one’s subordinate, is one of the most studied variables in this research area. Lefkowitz (2000) conducts a comprehensive literature review of affect in performance evaluation and finds that evaluators’ affect for their subordinates is generally related to higher performance ratings, reduced accuracy and a greater halo effect. Cardy and Dobbins (1986) likewise find, in an experiment among psychology students, that affect reduces the accuracy of performance ratings.

subordinates in accurately assessing their performance (Mayer and Davis, 1999; Levy and Williams, 2004). Fewer studies take the manager as the focal point, and even then, as in the research of Whitener, Brodt, Korsgaard and Werner (1998), the aim is to understand how managers can become more trustworthy. In this study, however, I examine trust from the perspective of managers acting as evaluators during performance evaluation. In this process, trust could act similarly to affect by influencing the cognitive limitations of evaluators that lead to biases. Accordingly, trust (distrust) is defined in this study as the confidence a supervisor has, based on past behavior, that their subordinate will continue to perform positively (negatively). This definition is related to the concept of behavioral consistency, which Whitener et al. (1998) consider one of the most important aspects of trust.

Investigating the role of evaluator biases

By focusing on trust and the distinction between objective and subjective measures, I hope to shed some light on the two groups of evaluator biases. As mentioned earlier, the contrast between these groups has not been the subject of any significant research, so little is known about how they function in practice. It is especially unclear how the biases related to the person and the biases related to the measures interact and how this interaction affects evaluator ratings. Clarifying their combined effect is therefore one of the aims of this paper.

Since it is impossible to observe directly how and to what extent these biases affect evaluators, both groups of biases have to be manipulated in order to observe their effect on performance ratings. For this purpose, the two concepts described earlier are used, namely the objectivity (subjectivity) of the performance measures and the evaluator’s trust (distrust) in their subordinate. On the one hand, the distinction between objective and subjective measures can be used to observe the role of the measures-related biases, since these measures are susceptible to biases to different degrees. On the other hand, trust could play an important role in observing the person-related biases, since it affects the evaluator’s attitude towards their subordinates. By manipulating these two variables, I hope to clarify the standalone effect of each type of bias on evaluators’ ratings and how the two interact in this context. Consequently, the question that this study seeks to answer is the following:

HYPOTHESES

As mentioned earlier, subjective and objective measures are generally found not to be interchangeable (Bommer et al., 1995; Rich et al., 1999). Conflicts between the two types are therefore likely when both are present during a performance evaluation, especially when they contradict each other. In such cases, the evaluator can put more weight on one type of measure than on the other in judging a subordinate’s performance. They are likely to put the most weight on the type of measure they deem most important or reliable, and therefore base their rating disproportionally on either objective or subjective measures. Given that subjective measures lack clear benchmarks and are more affected by distortions than objective measures (Moers, 2005), I hypothesize that evaluators generally feel less comfortable using subjective measures, since they perceive them as less reliable and possibly unfair. For this reason, I assume that evaluators are inclined to prefer objective over subjective measures. Accordingly, the first hypothesis of this study is formulated as follows:

Hypothesis 1: Ceteris paribus, performance evaluators will put more weight on objective measures than subjective measures when both types of measures are present during performance evaluation.
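Why scorecards with contradicting measures can reveal such a preference can be shown with a simple additive model. Assume, for illustration only, that a rating R is a weighted sum of the objective and subjective measures with weights w_O and w_S, and that positive and negative measures take the levels p and n with p > n (all symbols are introduced here and are not part of the original design). Comparing a scorecard with positive objective and negative subjective measures (R_{O+}) to its mirror image (R_{S+}) gives:

```latex
\begin{aligned}
R_{O+} &= w_O\,p + w_S\,n, \qquad R_{S+} = w_O\,n + w_S\,p,\\
R_{O+} - R_{S+} &= (w_O - w_S)\,(p - n).
\end{aligned}
```

Because p - n > 0, the sign of the rating difference equals the sign of w_O - w_S under this model, so Hypothesis 1 corresponds to the prediction R_{O+} > R_{S+}.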

When it comes to the presence of trust during performance evaluations, the expected effect is more straightforward. I assume that trust, defined in this study as the confidence that a subordinate will continue to perform as before, leads to distortions similar to those caused by rater affect. Lefkowitz (2000), for example, found that affect is generally related to a greater halo effect, less inclination to punish poor performance, and higher performance ratings. He states that these distortions are often the result of unconscious reactions to the subordinates and the task they are asked to perform. I hypothesize that similar reactions will occur based on the past performance of subordinates. In the presence of trust, evaluators will therefore give disproportionally higher performance ratings. One of the biases responsible for this is the halo effect, which leads the evaluator to base their rating on their positive general impression of a subordinate.

case of distrust, evaluators’ initial belief will be that the subordinate’s performance is negative, and they will therefore focus on the negative information. The presence of trust thus implies that evaluators (often subconsciously) tend to put more weight on the positive measures and less on the negative measures, which results in unjustifiably high ratings. If distrust is present, the opposite occurs: more weight is put on the negative measures, resulting in unjustifiably low ratings. Therefore, the second hypothesis in this study is defined as:

Hypothesis 2: Ceteris paribus, performance evaluators will put disproportionally more weight on positive measures and less weight on negative measures when they trust a subordinate as opposed to when they distrust a subordinate.

In practice, both types of measures and some degree of trust are present during performance evaluations. The evaluator’s rating is then affected by biases related to the measures being used and biases related to the person being evaluated, and these are likely to conflict. Consider, for instance, an evaluator who has a very high level of trust in his subordinate because of excellent past performance, yet finds during the current evaluation that this subordinate has performed poorly on the measures he prefers and better on the measures he considers unreliable. He is then forced to reevaluate his existing assumptions, as his biases against the measures clash with his biases in favor of the subordinate. In this case, his rating is likely to be driven more by either his preferred measures or his trust in the subordinate.

I hypothesize that whether an evaluator in this situation bases his rating disproportionally on the measures or on trust depends on whether he trusts or distrusts the subordinate. If there is a high level of trust, I expect the evaluator to base his rating disproportionally on his trust in the subordinate, because biases such as the halo effect and confirmation bias will dominate his decision. Assuming that evaluators normally prefer objective measures, the role of trust in this conflict means that less weight is put on the objective measures when these are negative and the subjective measures are positive. However, if the objective measures are positive and the subjective measures negative, there is no conflict between the measures and trust. In this case I hypothesize that trust will reinforce the measures, leading to higher ratings than in a similar situation where trust is not a factor.

evaluator’s preference for objective measures and on the other hand their distrust in a subordinate. I hypothesize that in this scenario evaluators will base their rating disproportionally on their preferred measures, because they do not want to be unfair or make unfavorable errors. This means that when the objective measures of a distrusted subordinate are positive and the subjective measures negative, the evaluator’s rating will be higher than in a situation where trust is not a factor. Furthermore, if the objective measures are negative and the subjective measures positive, there is again no conflict between the two groups of biases, and I again assume that the biases against the negative objective measures are strengthened by distrust.

To summarize, introducing trust into a situation where subjective and objective measures contradict each other leads to a conflict between the evaluator’s biases for or against the measures and their biases for or against the subordinate. I argue that the evaluator’s rating will depend on which group of biases prevails. In the case of trust, biases resulting from trust, such as the halo effect and confirmation bias, will overshadow the evaluator’s preferences for measures. In the case of distrust, however, evaluators will be more affected by the leniency error and will therefore base their rating disproportionally on their preferred measures. The following hypotheses describe the weights that evaluators put on objective and subjective measures when trust plays a role and the measures contradict each other:

Hypothesis 3a: In the presence of trust, and assuming that both objective and subjective measures are present and signal contradicting results, evaluators will base their ratings disproportionately more on objective (subjective) measures when the objective (subjective) measures are positive and the subjective (objective) measures negative.

Hypothesis 3b: In the presence of distrust, and assuming that both objective and subjective measures are present and signal contradicting results, evaluators will base their ratings disproportionately more on objective measures, regardless of which measures are positive and which are negative.

METHODOLOGY

Experimental design

consists of four objective and four subjective measures, as well as some short background information for each manager to prime trust (or distrust). To make the distinction between objective and subjective measures clear to participants, the BSC was explicitly divided into an objective and a subjective category. The background information consists of a manager’s tenure at the company, their commitment to the company as a proxy for trust, and in some cases a general assessment of their performance in previous years. Each participant was then asked to rate four managers based on the information provided, on a seven-point scale adapted from Banker (2011) that ranges from terrible to excellent. The average rating of each employee’s performance is used as the dependent variable in this study.

Table 2: An example of the balanced scorecard as presented during the experiment. This employee has negative objective and positive subjective measures, and the background information was intended to prime trust.

Employee: Charlotte

Charlotte started working as a restaurant manager at the company 4 years ago. She has always posted good results and has shown that she knows how to manage a restaurant. She is very committed to managing her restaurant and wants there to be a pleasant atmosphere for both customers and employees. Consequently, she has always received very favorable ratings from her customers and employees.

OBJECTIVE MEASURES

| Perspective | Measure | Target | Actual | % better than target |
|---|---|---|---|---|
| Financial | Controllable margin | 20,000 | 18,950 | -5.25% ▼ |
| Customer | Number of covers served | 50,000 | 48,000 | -4.00% ▼ |
| Internal process | Food waste (in kilograms) | 26,000 | 27,200 | -4.62% ▼ |
| Learning & growth | Employee training investments | 4,000 | 3,850 | -3.75% ▼ |

SUBJECTIVE MEASURES

| Perspective | Measure | Target | Actual | % better than target |
|---|---|---|---|---|
| Financial | Realized/budgeted expenses | 100% | 96% | 4.00% ▲ |
| Customer | Mystery shopper rating | 7.2 | 7.5 | 4.17% ▲ |
| Internal process | Expert efficiency rating | 7.0 | 7.3 | 4.29% ▲ |
| Learning & growth | Employee satisfaction rating | 7.5 | 7.9 | 5.33% ▲ |
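The “% better than target” column can be reproduced with a single rule, assuming (the text does not state it) that it is the relative deviation from target, with the sign flipped for measures where less is better, such as food waste and realized/budgeted expenses. The minimal Python sketch below checks that reading against Charlotte's scorecard; the `Measure` class and its `higher_is_better` flag are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Measure:
    """One scorecard row: target, actual value, and which direction counts as better."""
    name: str
    target: float
    actual: float
    higher_is_better: bool = True


def pct_better_than_target(m: Measure) -> float:
    """Signed deviation from target in percent; positive means better than target."""
    deviation = (m.actual - m.target) / m.target * 100
    # For cost-type measures (waste, spending), ending below target is good,
    # so the sign is flipped.
    return deviation if m.higher_is_better else -deviation


# Rows copied from Charlotte's scorecard; the direction flags are an assumption
# inferred from the signs the scorecard reports.
charlotte = [
    Measure("Controllable margin", 20_000, 18_950),
    Measure("Number of covers served", 50_000, 48_000),
    Measure("Food waste (kg)", 26_000, 27_200, higher_is_better=False),
    Measure("Employee training investments", 4_000, 3_850),
    Measure("Realized/budgeted expenses (%)", 100, 96, higher_is_better=False),
    Measure("Mystery shopper rating", 7.2, 7.5),
    Measure("Expert efficiency rating", 7.0, 7.3),
    Measure("Employee satisfaction rating", 7.5, 7.9),
]

for m in charlotte:
    pct = pct_better_than_target(m)
    arrow = "▲" if pct > 0 else "▼"
    print(f"{m.name:32s} {pct:+.2f}% {arrow}")
```

Under this assumption all eight values in Charlotte's column are reproduced, so the arrows appear to encode nothing more than the sign of the deviation.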

either positive or negative values. Ultimately, this means that there are four managers for each trust situation: one with both positive objective and positive subjective measures, one with positive objective and negative subjective measures, and so forth. To ensure that the individual characteristics of each employee would not influence the outcome of the experiment, the subordinates were randomly assigned a male or female name, with a total of four male and four female names. No further information on their individual characteristics (e.g., age) was given, in order to avoid other demographic biases.
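The eight conditions form a full 2 × 2 × 2 factorial design: objective measures positive or negative, subjective measures positive or negative, and trust or distrust. The short Python sketch below enumerates the cells; pairing names with cells follows the order of Table 3, and the factor labels are illustrative rather than taken from the survey itself.

```python
from itertools import product

# The three manipulated factors, each with two levels.
OBJECTIVE = ("positive", "negative")
SUBJECTIVE = ("positive", "negative")
RELATIONSHIP = ("trust", "distrust")

# Employee names in the order of Table 3; this pairing is read off that table.
NAMES = ["Audrey", "Brian", "Charlotte", "David", "Eric", "Grace", "Harry", "Iris"]


def design_cells():
    """Enumerate all 8 conditions, varying trust slowest and subjective fastest."""
    return [
        {"objective": obj, "subjective": subj, "relationship": rel}
        for rel, obj, subj in product(RELATIONSHIP, OBJECTIVE, SUBJECTIVE)
    ]


conditions = dict(zip(NAMES, design_cells()))
for name, cell in conditions.items():
    print(name, cell)
```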

Control variables

Three possible control variables were incorporated into the experiment. The first is the participant’s propensity to trust, which Colquitt, Scott and LePine (2007) define as “a stable individual difference that affects the likelihood that a person will trust.” They state that someone’s propensity to trust is part of their personality and is perhaps the most important factor when it comes to trusting unknown persons. Because participants are given a hypothetical description of a manager they do not know, their trust propensity could affect the outcome of this experiment, which is why it is included as a control variable. To measure a participant’s propensity to trust, an 8-point scale was adapted from Mayer and Davis (1999), who use a shortened version of a scale designed by Rotter (1967).

The second control variable that could affect the outcome of the experiment is a participant’s self-perceived objectivity. Uhlmann and Cohen (2007) argue that people with a high level of self-perceived objectivity tend to feel that their thoughts and beliefs are grounded in objectivity and are therefore valid. Because of this, they tend to overlook their own inherent biases. Accordingly, Uhlmann and Cohen (2007) find that a higher level of self-perceived objectivity leads to increased gender discrimination in hiring. This variable is also relevant in the context of this research, as participants with a higher level of self-perceived objectivity might skew the results because they are more susceptible to biases. To measure self-perceived objectivity in this experiment, a 4-point scale was adapted from Uhlmann and Cohen (2007), who in turn based their scale on Armor (1998).
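Multi-item control scales such as these two are commonly scored by averaging each participant's item responses after reverse-coding negatively worded items. The text does not specify the response format or which items, if any, were reverse-coded, so the Python sketch below is a generic, hypothetical scoring function: `scale_score`, its 1-5 default range, and the example responses are all assumptions.

```python
from statistics import mean


def scale_score(responses, reverse_items=(), scale_min=1, scale_max=5):
    """Average a participant's item responses into one composite score.

    responses: item scores in questionnaire order.
    reverse_items: zero-based indices of negatively worded items; these are
        reverse-coded as (scale_min + scale_max - x) before averaging.
    """
    for x in responses:
        if not scale_min <= x <= scale_max:
            raise ValueError(f"response {x} outside [{scale_min}, {scale_max}]")
    recoded = [
        scale_min + scale_max - x if i in reverse_items else x
        for i, x in enumerate(responses)
    ]
    return mean(recoded)


# Hypothetical participant answering four items on a 1-5 scale,
# with the second item worded negatively (2 is recoded to 4).
print(scale_score([4, 2, 5, 3], reverse_items={1}))  # prints 4
```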

demographic information was given, similarity in gender can serve as a useful control variable for the similar-to-me error.

Procedure

The experiment was distributed as a survey built with Qualtrics survey software. Each participant was first shown a description of the role they would assume as evaluator, together with an explanation of the BSC and its components. Next, they were given a general overview of how the rest of the survey was structured. They then had to judge four of the eight employees specified earlier. These four employees were randomly assigned through Qualtrics’ loop function to ensure that the order in which the employees were presented would not affect ratings. Moreover, to ensure that participants would not ignore the background information that primed (dis)trust, they first had to click a confirmation button before they could see an employee’s BSC and rate them. After rating four employees in this way, participants filled out several questions measuring their propensity to trust and self-perceived objectivity. Afterward, they were asked about their familiarity with the BSC and how difficult they perceived the judgment task to be, along with some standard demographic questions. At the end of the survey they received information on how to qualify for the incentive, which consisted of two gift cards worth 20 euros each, awarded at random.
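The random draw of four of the eight employees per participant can be sketched in plain Python as below. This is not Qualtrics code and does not reproduce its loop function; it assumes a simple draw in which every employee is equally likely to be picked and no balancing across participants is enforced. `assign_employees` is a hypothetical helper.

```python
import random

EMPLOYEES = ["Audrey", "Brian", "Charlotte", "David", "Eric", "Grace", "Harry", "Iris"]


def assign_employees(rng: random.Random, k: int = 4) -> list:
    """Draw k distinct employees for one participant, in random presentation order."""
    # random.sample returns a random selection in random order, so one call
    # decides both which employees are shown and the order in which they appear.
    return rng.sample(EMPLOYEES, k)


rng = random.Random(2018)  # fixed seed only to make the sketch reproducible
for participant in range(3):
    print(participant, assign_employees(rng))
```

Because `random.sample` draws without replacement, no participant sees the same employee twice, while group sizes across employees are left to chance.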

Participants

RESULTS

Table 3 shows the descriptive statistics of all ratings given to the employees. To test the first hypothesis, the employees whose objective measures are positive and subjective measures negative (i.e., employees 2 and 6) are compared with the employees whose objective measures are negative and subjective measures positive (i.e., employees 3 and 7). The second group received significantly higher ratings than the first group, t(53) = -4.811, p < .001. Since employees with positive subjective measures receive significantly higher ratings than those with positive objective measures, evaluators appear to have put more weight on the subjective than on the objective measures. The first hypothesis, which stated that evaluators put more weight on objective measures, is therefore rejected, because these findings point in the opposite direction.
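The text does not state whether the reported t(53) comes from a pooled-variance or a Welch test. As an illustration only, the Python sketch below computes Welch's two-sample t statistic and its Welch-Satterthwaite degrees of freedom using the standard library; the ratings are invented, and a p-value would additionally require a t-distribution CDF (for example from SciPy), which is omitted here. A negative t, as reported above, simply means that the first group (positive objective measures) has the lower mean.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom.

    A negative t means the mean of `a` is below the mean of `b`.
    """
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical ratings on the 1-7 scale, not the study's data.
pos_objective = [1, 2, 3, 4, 5]
pos_subjective = [3, 4, 5, 6, 7]
t, df = welch_t(pos_objective, pos_subjective)
print(f"t({df:.1f}) = {t:.3f}")  # prints: t(8.0) = -2.000
```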

Table 3: Descriptive statistics. Ratings range from 1 (terrible) to 7 (excellent).

| Employee | Objective measures | Subjective measures | Trust or distrust | n | Mean rating | Standard deviation |
|---|---|---|---|---|---|---|
| 1. Audrey | Positive | Positive | Trust | 10 | 6.20 | 0.79 |
| 2. Brian | Positive | Negative | Trust | 10 | 4.10 | 0.88 |
| 3. Charlotte | Negative | Positive | Trust | 12 | 5.17 | 0.58 |
| 4. David | Negative | Negative | Trust | 10 | 3.40 | 0.52 |
| 5. Eric | Positive | Positive | Distrust | 9 | 4.67 | 1.12 |
| 6. Grace | Positive | Negative | Distrust | 9 | 3.78 | 0.83 |
| 7. Harry | Negative | Positive | Distrust | 13 | 4.15 | 0.80 |
| 8. Iris | Negative | Negative | Distrust | 11 | 2.18 | 0.60 |
| Total | | | | 84 | 4.21 | 1.35 |

Next, to investigate the third hypothesis, a regression was run on the full data set using three dummy variables. Two of them, objective performance and subjective performance, were coded 1 if the employee had positive information on those measures and 0 if the information was negative. The third, trust, was coded 1 for trust and 0 for distrust. This model supports the earlier findings, since the coefficient of subjective measures is much larger than that of objective measures and trust has a significant effect on ratings, but it provides no significant evidence in support of the third hypothesis.

Table 4: Regression results (Model 1; dependent variable: employee rating)

| Variable | Coefficient | P-value |
|---|---|---|
| Constant | -3.89** | .00 |
| Trust | -0.84* | .01 |
| Objective performance | -1.04** | .00 |
| Subjective performance | -1.53** | .00 |
| Propensity to trust | -0.08* | .01 |
| Preference for objectivity | -0.03 | .61 |
| Similarity in gender | -0.15 | .38 |
| Trust * Objective performance | -0.12 | .73 |
| Trust * Subjective performance | -0.39 | .26 |
| Observations | 84 | |
| Adjusted R-squared | 0.68 | |

*, ** Significant at α = 0.01 and α = 0.001, respectively.
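The dummy-coded model described above has the form rating = b0 + b1·Trust + b2·Objective + b3·Subjective + b4·Trust·Objective + b5·Trust·Subjective, plus the controls. The stdlib-only Python sketch below shows the mechanics: it builds that design matrix for the eight cells of the design and fits it by ordinary least squares through the normal equations. It runs on noiseless synthetic data generated from made-up coefficients, omits the control variables, and reproduces nothing in Table 4; a real analysis would use a statistics package (for example statsmodels) that also reports standard errors and p-values. With this coding, b1 is the trust effect for an employee with negative objective and negative subjective measures, and the two interaction terms are what Hypotheses 3a and 3b concern.

```python
from itertools import product


def design_row(trust, obj, subj):
    """Dummy-coded predictors: constant, main effects, and the two trust interactions."""
    return [1.0, trust, obj, subj, trust * obj, trust * subj]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical "true" coefficients, chosen for illustration only.
TRUE_BETA = [2.0, 1.0, 1.5, 2.0, -0.25, -0.5]

rows, y = [], []
for trust, obj, subj in product((0, 1), repeat=3):  # all 8 cells of the 2x2x2 design
    row = design_row(trust, obj, subj)
    rows.append(row)
    y.append(sum(b * x for b, x in zip(TRUE_BETA, row)))  # noiseless ratings

beta = ols(rows, y)
names = ["Constant", "Trust", "Objective", "Subjective", "Trust*Objective", "Trust*Subjective"]
for name, b in zip(names, beta):
    print(f"{name:18s} {b:6.2f}")
```

Because the synthetic ratings contain no noise, the fitted coefficients match the generating ones up to floating-point error, which is a quick check that the dummy coding and the solver agree.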

DISCUSSION AND CONCLUSION

With the introduction of the balanced scorecard, Kaplan and Norton aspired to remedy the shortcomings of traditional performance measurement systems. As the BSC has evolved from a simple management tool into a complex system that is integrated throughout entire companies, it has also become widely used in performance evaluations. Nevertheless, the existing literature on the BSC has identified significant shortcomings in the form of evaluator biases that prevent it from being used optimally in performance evaluations. With this study, my aim was to take a deeper look at two distinct groups of evaluator biases: those for or against the measures in the BSC and those for or against the subordinate whose performance is being rated. To achieve this, I investigated evaluators’ preference for objective versus subjective measures and their trust in subordinates.

Based on my research, it can be concluded that evaluators put more weight on subjective than on objective measures when performance evaluations include both. This finding differs from my initial hypothesis and is perhaps especially remarkable considering that, according to the self-perceived objectivity scale, all participants believed their decisions to be based more on objective than on subjective factors. This supports the findings of Uhlmann and Cohen (2007), who argue that people with a high level of self-perceived objectivity tend to overlook their own inherent biases. However, these findings contrast with those of other authors, such as Moers (2005), who finds that evaluators prefer objective measures over subjective measures. Despite the fact that subjective measures lack clear benchmarks and are more affected by distortions (Bommer et al., 1995), the participants in this study clearly put more weight on subjective than on objective measures. Since existing literature (Bommer et al., 1995; Moers, 2005; Rich et al., 1999) has found evidence that subjective measures are more affected by biases than objective measures, these findings imply that a BSC that includes more subjective measures leads to more distorted ratings than a BSC that contains mostly objective measures.

the other trust-related aspect that I investigated, the propensity to trust, was found to have a slight negative effect on ratings. This implies that more trusting individuals do indeed give (slightly) lower ratings than less trusting individuals.

The third and final aspect of performance evaluations that I investigated is what happens when an evaluator’s preference for certain measures contradicts their relationship with a subordinate. Regrettably, the results related to this hypothesis were not significant. In theory, a contradiction between an evaluator’s biases for or against certain measures and their biases for or against a subordinate should lead to one group of biases affecting the rating more than the other. Since the leniency bias has been shown to significantly influence ratings (Bretz, Milkovich and Read, 1992; Golman and Bhatia, 2012), I expect it to affect the evaluator’s rating in a similar way when they distrust a subordinate. If they trust their subordinate, on the other hand, the leniency bias should play little role, while other subordinate-related biases, such as the halo error and confirmation bias, should dominate. However, more extensive research is necessary to confirm whether this is the case.

These findings add to the existing literature on performance evaluations and on the BSC in multiple ways. First, by investigating evaluators’ preference for subjective over objective measures, this paper adds to existing studies on the different types of performance measures in the performance evaluation literature. Moreover, by investigating the role of trust, notably with the evaluator as the focal point, this study adds to the performance evaluation literature as well as to the broad literature on trust. Finally, by distinguishing between evaluator biases related to measures and evaluator biases related to subordinates, this research also adds to the extensive literature on cognitive biases and decision making.

Practitioners would also be wise to take these findings into account, since arguably the best way to reduce biases is to be aware of them. The results of this study show that evaluators give significantly higher ratings to subordinates they trust. In practice, while an employee’s past performance should certainly be taken into account, it should ideally not serve as an indicator of their performance in the current year. Moreover, the findings indicate that evaluators base their ratings disproportionately on subjective performance measures. This should also be taken into account during performance evaluations, as previous literature has found that these measures can distort ratings more than objective measures (Bommer et al., 1995; Moers, 2005).

study’s experiment might have influenced the ratings given by participants, even though precautions were taken to reduce these biases. Secondly, unlike the similar research on which this study is based (Kang and Fredin, 2012; Lipe and Salterio, 2000), this study’s sample consists mostly of non-MBA students, which might mean that the experiment’s findings do not generalize to the population of performance evaluators. Moreover, the sample size in this study’s experiment is smaller than that of most existing research, which could cause additional discrepancies. A third limitation is that in practice evaluators often, consciously or unconsciously, compare the performance of different employees. Ratings during actual performance evaluations might therefore differ further from the findings of this experimental study.

Regardless of these limitations, this study lays groundwork on which future research can build in several ways. To address some of the limitations mentioned above, a similar experiment should be conducted on a larger scale, preferably among actual performance evaluators with more knowledge of and experience with rating employees. More research is also needed into the interaction between evaluator biases related to performance measures and evaluator biases related to subordinates, in order to better understand how they function.

REFERENCES

2GC Active Management. (2017). Balanced scorecard usage survey 2017. Retrieved from https://2gc.eu/assets/files/site/Survey_Files/2017_Survey_Document_final_180123.pdf.

Baker, G., Gibbons, R., & Murphy, K. J. (1994). Subjective performance measures in optimal incentive contracts. The Quarterly Journal of Economics, 109(4), 1125-1156.

Banker, R. D., Chang, H., & Pizzini, M. J. (2004). The Balanced Scorecard: Judgmental effects of performance measures linked to strategy. The Accounting Review, 79(1), 1-23.

Bates, R. (2002). Liking and similarity as predictors of multi-source ratings. Personnel Review, 31(5), 540-552.

Boachie-Mensah, F., & Seidu, P. A. (2012). Employees’ perception of performance appraisal system: A case study. International Journal of Business and Management, 7(2), 73-88.

Bol, J. C. (2011). The determinants and performance effects of managers' performance evaluation biases. The Accounting Review, 86(5), 1549-1575.

Bommer, W. H., Johnson, J. L., Rich, G. A., Podsakoff, P. M., & MacKenzie, S. B. (1995). On the interchangeability of objective and subjective measures of employee performance: A meta‐analysis. Personnel Psychology, 48(3), 587-605.

Bond, S. D., Carlson, K. A., Meloy, M. G., Russo, J. E., & Tanner, R. J. (2007). Information distortion in the evaluation of a single option. Organizational Behavior and Human Decision Processes, 102(2), 240-254.

Bretz Jr, R. D., Milkovich, G. T., & Read, W. (1992). The current state of performance appraisal research and practice: Concerns, directions, and implications. Journal of Management, 18(2), 321-352.

Cardinaels, E., & van Veen-Dirks, P. M. (2010). Financial versus non-financial information: The impact of information organization and presentation in a Balanced Scorecard. Accounting, Organizations and Society, 35(6), 565-578.

Cardy, R. L., & Dobbins, G. H. (1986). Affect and appraisal accuracy: Liking as an integral dimension in evaluating performance. Journal of Applied Psychology, 71(4), 672-678.

Colquitt, J. A., Scott, B. A., & LePine, J. A. (2007). Trust, trustworthiness, and trust propensity: a meta-analytic test of their unique relationships with risk taking and job performance. Journal of Applied Psychology, 92(4), 909-927.

Feldman, J. M. (1981). Beyond attribution theory: Cognitive processes in performance appraisal. Journal of Applied Psychology, 66(2), 127-148.

Golman, R., & Bhatia, S. (2012). Performance evaluation inflation and compression. Accounting, Organizations and Society, 37(8), 534-543.

Hartmann, F., & Slapničar, S. (2009). How formal performance evaluation affects trust between superior and subordinate managers. Accounting, Organizations and Society, 34(6-7), 722-737.

Hoque, Z. (2014). 20 years of studies on the balanced scorecard: trends, accomplishments, gaps and opportunities for future research. The British Accounting Review, 46(1), 33-59.

Ittner, C. D., Larcker, D. F., & Meyer, M. W. (2003). Subjectivity and the weighting of performance measures: Evidence from a balanced scorecard. The Accounting Review, 78(3), 725-758.

Jonas, E., Schulz-Hardt, S., Frey, D., & Thelen, N. (2001). Confirmation bias in sequential information search after preliminary decisions: an expansion of dissonance theoretical research on selective exposure to information. Journal of Personality and Social Psychology, 80(4), 557-571.

Kahneman, D. (2003). Maps of bounded rationality: Psychology for behavioral economics. American Economic Review, 93(5), 1449-1475.

Kang, G., & Fredin, A. (2012). The balanced scorecard: the effects of feedback on performance evaluation. Management Research Review, 35(7), 637-661.

Kaplan, R. S., & Norton, D. P. (1992). The balanced scorecard: Measures that drive performance. Harvard Business Review, 70(1), 71-79.

Boston, MA: Harvard Business School Press.

Kaplan, R. S., Norton, D. P., & Rugelsjoen, B. (2010). Managing alliances with the balanced scorecard. Harvard Business Review, 88(1), 114-120.

Latham, G. P., Wexley, K. N., & Pursell, E. D. (1975). Training managers to minimize rating errors in the observation of behavior. Journal of Applied Psychology, 60(5), 550-555.

Lawrie, G., & Cobbold, I. (2004). Third-generation balanced scorecard: evolution of an effective strategic control tool. International Journal of Productivity and Performance Management, 53(7), 611-623.

Lefkowitz, J. (2000). The role of interpersonal affective regard in supervisory performance ratings: A literature review and proposed causal model. Journal of Occupational and Organizational Psychology, 73(1), 67-85.

Levy, P. E., & Williams, J. R. (2004). The social context of performance appraisal: A review and framework for the future. Journal of Management, 30(6), 881-905.

Lipe, M. G., & Salterio, S. E. (2000). The balanced scorecard: Judgmental effects of common and unique performance measures. The Accounting Review, 75(3), 283-298.

Lunenburg, F. C. (2012). Performance appraisal: methods and rating errors. International Journal of Scholarly Academic Intellectual Diversity, 14(1), 1-9.

Mayer, R. C., & Davis, J. H. (1999). The effect of the performance appraisal system on trust for management: A field quasi-experiment. Journal of Applied Psychology, 84(1), 123-136.

Moers, F. (2005). Discretion and bias in performance evaluation: the impact of diversity and subjectivity. Accounting, Organizations and Society, 30(1), 67-80.

Nisbett, R. E., & Wilson, T. D. (1977). The halo effect: Evidence for unconscious alteration of judgments. Journal of Personality and Social Psychology, 35(4), 250-256.

Norreklit, H. (2000). The balance on the balanced scorecard: A critical analysis of some of its assumptions. Management Accounting Research, 11(1), 65-88.

Perkins, M., Grey, A., & Remmers, H. (2014). What do we really mean by “Balanced Scorecard”? International Journal of Productivity and Performance Management, 63(2), 148-169.

Prendergast, C., & Topel, R. H. (1993). Discretion and bias in performance evaluation. European Economic Review, 37(2-3), 355-365.

Prendergast, C., & Topel, R. H. (1996). Favoritism in organizations. Journal of Political Economy, 104(5), 958-978.

Rich, G. A., Bommer, W. H., MacKenzie, S. B., Podsakoff, P. M., & Johnson, J. L. (1999). Apples and apples or apples and oranges? A meta-analysis of objective and subjective measures of salesperson performance. The Journal of Personal Selling and Sales Management, 41-52.

Rotter, J. B. (1967). A new scale for the measurement of interpersonal trust. Journal of Personality, 35(4), 651-665.

Simon, H. A. (1979). Rational decision making in business organizations. The American Economic Review, 69(4), 493-513.

Smith, D. E. (1986). Training programs for performance appraisal: A review. Academy of Management Review, 11(1), 22-40.

Thorndike, E. L. (1920). A constant error in psychological ratings. Journal of Applied Psychology, 4(1), 25-29.

Tiwana, A., Wang, J., Keil, M., & Ahluwalia, P. (2007). The bounded rationality bias in managerial valuation of real options: Theory and evidence from IT projects. Decision Sciences, 38(1), 157-181.

Uhlmann, E. L., & Cohen, G. L. (2007). “I think it, therefore it’s true”: Effects of self-perceived objectivity on hiring discrimination. Organizational Behavior and Human Decision Processes, 104, 207-223.
