The influence of control system design choices on forced distribution rating systems

Name: Regi Gaston Pothuizen
Student number: 11095067
Thesis supervisor: dhr. R.S. (Razvan) Ghita
Date: 25-6-2017
Word count: 12,980
MSc Accountancy & Control, specialization Accountancy & Control
Amsterdam Business School

Statement of Originality

This document is written by student Regi Gaston Pothuizen who declares to take full responsibility for the contents of this document.

I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it.

The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.

Abstract

This paper examines whether the rating purpose and information accuracy influence managers' perception of fairness when they use forced distribution rating systems. Prior research disagrees about the influence of the rating purpose on the perception of fairness, and it suggests that high-accuracy information increases perceived fairness. The study tests two hypotheses. The first hypothesis is that raters see a non-administrative purpose as more fair when the rating is based on low-accuracy information than when it is based on high-accuracy information. The second hypothesis is that raters see an administrative purpose as more fair when the rating is based on high-accuracy information than when it is based on low-accuracy information. The experimental data support both hypotheses. The conclusion is that the effects of the rating purpose and of information accuracy on perceived fairness depend on each other.

Keywords:

Table of Contents

1. Introduction
2. Literature review/hypothesis
2.1 Performance evaluation
2.2 Forced distribution
2.3 Fairness perception
2.4 Role of control system designs
2.5 Hypothesis development
3. Research method
3.1 Experimental design
3.2 Task
3.3 Dependent variable
3.4 Manipulations
3.5 Procedures & Participants
4. Results
4.1 Manipulation checks and descriptive statistics
4.2 Hypothesis test
4.2.1 Non-administrative purpose (H1)
4.2.2 Administrative purpose (H2)
5. Discussion and conclusion
References
Appendices

1. Introduction

This thesis examines whether the rating purpose and information accuracy influence managers' perception of fairness when they use forced distribution rating systems. In other words, I examine whether managers' perception of fairness differs when the rating purpose and the level of information accuracy differ. Some companies use a non-administrative rating purpose, where the ratings do not have any consequences, and others use an administrative purpose, where the ratings do have consequences. This thesis is based on two questions. First, I investigate how managers' perception of fairness changes between a low and a high level of information accuracy when the rating purpose is non-administrative. The underlying question is whether a non-administrative purpose is fair when the rating decisions are based on high-accuracy information. Second, I investigate whether managers' perception of fairness changes when the rating purpose is administrative. Here the question for managers is whether it is fair that ratings have consequences when they are based on low-accuracy information.

Subjective performance evaluation has the disadvantage that it is highly susceptible to biases such as compression (Bol, 2011; Bol et al., 2016). This means that the performance evaluation can deviate from the norm. One remedy is a forced distribution rating system, because prior literature assumes that one of its advantages is that it is more resistant to biases. On the other hand, one of the major disadvantages of forced distribution rating systems is that they are perceived as less fair. Scullen et al. (2005) and Schleicher et al. (2009) argue that a low fairness perception among evaluators could affect their willingness, confidence and commitment to support the forced distribution rating system. Microsoft is one of the companies that moved away from forced distribution rating systems because of the combination of complaints and a lack of hard support (Holland, 2006). It is therefore extremely important for companies that introduce or maintain forced distribution rating systems to know how to increase the perception of fairness of the managers who work with them.

The research question is important because, by understanding how the rating purpose and information accuracy together influence managers' perception of fairness, companies learn how they can increase that perception. Increasing managers' perception of fairness increases their willingness, confidence and commitment to support the forced distribution rating system, which improves the chance that the system is introduced or maintained successfully.

Current literature has investigated the influence of the rating purpose and of the level of information accuracy on the perception of fairness only separately. Prior literature disagrees about how fair the different rating purposes are perceived to be. Schleicher et al. (2009) concluded that the perceived fairness of the non-administrative purpose was not significantly higher than that of the administrative purpose. On the other hand, Blume et al. (2009) and Scullen et al. (2005) argue that a non-administrative purpose is seen as more fair than an administrative purpose. Both argue that firing employees lowers the attractiveness of the forced distribution rating system. Leventhal et al. (1980) argue that high-accuracy information is important for managers' perception of fairness and is one of the six criteria of procedural justice. They argue that the degree of information accuracy is crucial in the decision-making process of the performance evaluation.

To answer the research question, data are collected using a scenario experiment. Participants are recruited through the Prolific.ac website. This platform is used because I needed participants who had supervisory experience. The participants' task in the experiment consisted of two phases: first rating the store managers based on their performance, and second giving their view on the fairness by answering four fairness questions. The rating purpose and the information accuracy together create four conditions: administrative & low-accuracy information, non-administrative & high-accuracy information, non-administrative & low-accuracy information, and administrative & high-accuracy information. The fairness score, which is computed from the four fairness questions, indicates the perception of fairness per condition.

The results indicate that both hypotheses are supported. Support for the first hypothesis means that managers who rate the performance of store managers using a forced distribution rating system see a non-administrative purpose as more fair if they need to rate employees based on low-accuracy information than if they need to rate them based on high-accuracy information. Support for the second hypothesis means that managers who rate the performance of store managers using a forced distribution rating system see an administrative purpose as more fair if they need to rate employees based on high-accuracy information than if they need to rate them based on low-accuracy information.

This study contributes to management accounting research and practice on performance evaluation, and especially on forced distribution rating systems. By examining different cases of the perception of fairness of forced distribution rating systems, this study helps to better understand how the perception of fairness can be influenced, which could help companies in practice with introducing or maintaining forced distribution rating systems. Another contribution is that it extends the literature on fairness, forced distribution performance evaluation, rating purpose and information accuracy. By investigating the role of the rating purpose and the information accuracy, it gives more insight into how the perception of fairness develops. Lastly, this study responds to the call for future research in Bol et al. (2016), who argued that information accuracy should be tested in settings other than the one in their study.

This thesis is structured as follows. In the next chapter I review the literature background of the thesis and develop the hypotheses. The third chapter will consist of the methodology part. This part consists of the experimental design, task, dependent variable, manipulations, procedures and participants and the operationalisation. The fourth chapter reviews the results and tests the hypotheses. The fifth chapter consists of the discussion and conclusion. After chapter five the references and appendices are presented.

2. Literature review/hypothesis

2.1 Performance evaluation

One of the steps in the framework of Ferreira and Otley (2009) is the performance evaluation step. They argue that the performance of employees is primarily evaluated with objective evaluations, subjective evaluations, or a combination of both. Ferreira and Otley (2009) state that most of the efforts of employees can be captured by objective numbers. Bol (2008) contradicts this statement, because the individual contribution of the majority of employees in a firm is hardly measurable using objective performance evaluation alone. Bol argues that subjective performance evaluation plays an important role in performance evaluation contracts. Baker et al. (1993) argue that subjective performance measures can improve or complement the objective measures already in place, because they assume that objective performance evaluation can lead to distorted incentives. Berger et al. (2013) also argue that it is important for organizations to frequently use subjective appraisals to evaluate employee performance, which could strengthen the job performance evaluation. Other research confirms the importance of subjective performance evaluation by pointing to the increased alignment of interests between the firm and the employee (Gibbs et al., 2004). Gibbs et al. (2004) mention that subjectivity is most important when jobs involve multiple tasks and decision making. If the work environment is complex and unpredictable, subjective performance evaluation becomes more important relative to objective performance evaluation.

Despite its importance, subjective performance evaluation also has some disadvantages. Milkovich and Newman (2002) conclude that supervisors prefer objective performance measures to subjective ones, because subjective performance measures are more likely to generate conflicts with employees. Bol et al. (2016) conclude that managers, in their performance evaluations, consider not only how their evaluation will affect the organisation but also their own personal costs and benefits. This means that managers have incentives to reduce their personal costs in performance evaluation. Harris (1994) concludes that managers especially want to avoid confrontation costs. Bol et al. (2016) see confrontation as a costly activity because it is time-consuming, psychologically painful, or a source of tension. In line with Harris (1994), Bol (2011) argues that this avoidance of confrontation costs leads to biases in subjective performance evaluation.

Prendergast (1999) identified two types of bias that occur in subjective performance evaluation: leniency bias and centrality bias. Leniency bias means that managers are reluctant to give employees bad ratings, so they overstate the performance of employees. Centrality bias means that managers compress ratings, so that the differences between the ratings of good and bad performers are too small. As a result, the ratings of all employees deviate from the norm. Golman & Bhatia (2012) conclude that noise in performance information leads to more inflated performance evaluation ratings and demotivates employees, which results in poorer performance and lower productivity. Landy et al. (1980) see forced distribution rating systems as a remedy for biases, because they argue that one of the assumed advantages of forced distribution rating systems is that they are more resistant to manager biases such as compression. The next section reviews and explains forced distribution rating systems as a performance evaluation system.

2.2 Forced distribution

Landy et al. (1980) and Blume et al. (2009) both study forced rating systems. Landy et al. (1980) studied forced choice rating systems and Blume et al. (2009) studied forced distribution rating systems. Both papers look at the specifics of forced rating systems. Landy et al. (1980) characterize a forced choice rating system as a system in which managers are required to choose from a set of pre-set categories. They argue that the assumed advantage of forced distribution rating systems is that they are more resistant to manager biases such as compression.

Blume et al. (2009) first distinguish between absolute and relative performance evaluations. In absolute performance evaluation, individual performance is evaluated against particular standards; in relative performance evaluation, it is compared with the performance of others in the group. Forced distribution is one form of relative performance evaluation. In this system managers are forced to distinguish between high and low performers. It generally involves sorting employees into predetermined performance categories and ranking them based on their relative performance in the group. Blume et al. (2009) state that at least 20% of American businesses use forced distribution rating systems. General Electric is one of the main examples. The difference between the two systems is that in a forced distribution rating system managers must rank performers according to a fixed distribution (for example 20% to A, 70% to B and 10% to C), whereas in a forced choice rating system managers can place all performers in the two best ranks (A and B). This thesis focuses on forced distribution rating systems.
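
To make the mechanics of a fixed distribution concrete, the following minimal Python sketch assigns categories by rank using the 20% A, 70% B and 10% C split given as an example above. It is an illustration only and not part of the experiment in this thesis: the function name `forced_distribution`, the performance scores and the rule for handling ties are all assumptions.

```python
from typing import Dict, List, Tuple


def forced_distribution(scores: Dict[str, float],
                        shares: List[Tuple[str, int]]) -> Dict[str, str]:
    """Rank employees by score and fill each category up to its quota.

    `shares` lists (label, percentage) pairs from best to worst category;
    the percentages are expected to sum to 100.
    """
    if sum(pct for _, pct in shares) != 100:
        raise ValueError("category percentages must sum to 100")

    # Highest score first; tied scores keep their input order (an assumption).
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    n = len(ranked)

    ratings: Dict[str, str] = {}
    start, cumulative = 0, 0
    for label, pct in shares:
        cumulative += pct
        end = round(cumulative * n / 100)  # cumulative quota boundary
        for name in ranked[start:end]:
            ratings[name] = label
        start = end
    return ratings


# Hypothetical performance scores for ten store managers.
raw_scores = [72, 88, 95, 61, 79, 83, 67, 90, 75, 58]
scores = {f"manager_{i}": s for i, s in enumerate(raw_scores, start=1)}
ratings = forced_distribution(scores, [("A", 20), ("B", 70), ("C", 10)])
```

With these ten hypothetical scores, the two highest scorers receive an A, the lowest scorer receives a C and the remaining seven receive a B, regardless of how close the underlying scores are. This is the property that forces differentiation between high and low performers.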

Not all companies use forced distribution rating systems for the same reasons. Schleicher et al. (2009) state that General Electric uses forced distribution rating systems to terminate the bottom 10% of the performance distribution. This is one extreme. At the other extreme, some companies only record the rating in a personnel file, which could have an impact in feedback meetings. Schleicher et al. (2009) argue that companies also use forced distribution rating systems to determine, for example, demotions, promotions and compensation.

One of the most important advantages of forced distribution rating systems is the reduction of possible biases. Besides the biases, Guralnik et al. (2004) describe other advantages and disadvantages of forced distribution rating systems. According to Guralnik et al. (2004), the reasons for adopting forced distribution systems (their advantages) are more differentiation in the performance evaluation, better application, a reward and punishment system, and more openness to feedback. Differentiation among the ratings of high, medium and low performers is positive for the success of the company. If there is no differentiation in the evaluation system, the system loses its credibility and becomes redundant and burdensome. Guralnik et al. (2004) describe the advantage of application as follows: every performance, weak or good, should be rewarded based on that performance, without any compression. Creating a reward and punishment system has the advantage that it guides pay and bonus allocation and helps with selection for termination. Finally, forced distribution rating systems have the advantage that honest feedback becomes accepted and part of the company culture.

According to Blume et al. (2009) there are four key elements of forced distribution systems. These four elements have some similarities with the reasons for adopting forced distribution rating systems given by Guralnik et al. (2004). The four key elements are the consequences for low performers, the differentiation of rewards for top performers, the frequency of feedback and the comparison group size. In their experiment they aim to identify whether these four elements influence the attractiveness of the forced distribution rating system. The results indicate that the consequences for low performers play an important part in the decision to introduce forced distribution rating systems and have a negative influence on attractiveness. In line with Guralnik et al. (2004), Blume et al. (2009) conclude that reward differentiation, a larger comparison group and more feedback are all positively related to the attractiveness of forced distribution rating systems.

The advantages described by Blume et al. (2009) and Guralnik et al. (2004) pay no attention to the consequences for performance. Berger et al. (2013) investigate the consequences of a forced distribution system for performance in an experimental setting. The results show that performance under forced distribution is significantly higher when workers work independently and cannot easily harm each other. Berger et al. (2013) conclude that this substantial gain in performance arises because in the baseline, where managers could make their own choices, the performance evaluation is more susceptible to leniency and centrality biases. Besides the performance gain, they also conclude that altruistic managers give higher bonuses, whereas equity-oriented managers give significantly less differentiated ratings. They thus found evidence that the social preferences of managers drive the rating decisions. Besides the advantages of forced distribution, Berger et al. (2013) point to a potential problem with setting up forced distribution when employees have experience with freer and more liberal systems. They also indicate that it may lead only to short-term performance gains. Lastly, it could demotivate employees, which could lead to lower performance.

Scullen et al. (2005) reach the same conclusion as Berger et al. (2013) about the impressive increase in performance in the first few years, which is the effect of firing poor performers. On the other hand, they conclude that the expected growth slows down or falls to zero. They also see some negative effects on, for example, morale, teamwork, recruiting and shareholder perceptions. Guralnik et al. (2004) also see the disadvantages stated by Scullen et al. (2005). They argue that forced distribution systems could hinder teamwork and collaboration, bring legal challenges to the organisation and foster competition.

In line with Scullen et al. (2005), who conclude that perceptions of unfairness signal a lack of organisational support, Schleicher et al. (2009) examine the reactions of managers who rate employees using a forced distribution rating system. Schleicher et al. (2009) identified difficulty and fairness as the two relevant and central dimensions of managers' reactions. They argue that organizations can expect some resistance when introducing forced distribution rating systems because of perceived unfairness and difficulty. Microsoft is one of the companies that moved away from forced performance evaluation systems. The main reason was the combination of complaints and the lack of hard support from employees (Holland, 2006). Schleicher et al. (2009) furthermore suggest that managers' variability in performance evaluations will be reduced by using forced distribution rating systems. Blume et al. (2009) argue that a salient issue for the attractiveness of forced distribution rating systems is the fairness of the process and the rewards.

2.3 Fairness perception

Whereas in most literature the employees' perception of fairness plays the biggest role, Scullen et al. (2005) and Schleicher et al. (2009) address the importance of the fairness perception of the evaluators. They conclude that a lack of perceived fairness among evaluators is one of the reasons why forced distribution rating systems may fail. They argue that evaluators' perceptions of fairness could affect their willingness, confidence and commitment to support the forced distribution rating system. Roch et al. (2007) argue that the fairness perceptions of evaluators could have implications for attitudes and behavior in the organisation.

To understand evaluators' perception of fairness, the types of fairness are explained first. Erdogan (2002) argues that the taxonomy most often used to describe fairness distinguishes procedural justice and distributive justice. Colquitt et al. (2001) describe justice based on the fairness of the procedures and of the outcome distributions, and thus use the same two types as Erdogan (2002). Both Colquitt et al. (2001) and Erdogan (2002) consider interactional justice as a third type of justice. Erdogan (2002) and Greenberg (1986) define procedural justice as fairness based on the procedures by which performance has been evaluated. This means that the outcome of the evaluation could be fair while the process that led to the outcome is unfair. Prior research differentiates between procedural and interactional justice because of the thin line between the fairness of the evaluators' procedures and the procedures set by the company. The equity theory of Adams (1965) forms the basis of distributive justice. Erdogan (2002) argues that in distributive justice the level of fairness is determined by one's input-output ratio compared to that of others. This means that fairness is judged on a relative basis. Colquitt et al. (2001) argue that distributive justice only asks whether the outcomes are fair. They argue that the process of comparing results to answer the fairness question has an objective component but is still a subjective process. Colquitt et al. (2001) researched the relationship between the types of justice and concluded that the relationship is strongest when an indirect combination measure is used and weaker when a procedural fairness perception measure is used.

The perception of fairness applies both to employees and to managers as evaluators. Prior research has mostly paid attention to the fairness perception of employees, but in this research the evaluators' perception of fairness plays a major role. Schleicher et al. (2009) quote Meisler (2003) on the concerns that employees with low performance do not deserve to be fired and that it matters on which basis the system is developed. Schleicher et al. (2009) argue that the fairness perceptions of employees also concern ratee reactions. As mentioned earlier, negative fairness perceptions among raters may affect their commitment, confidence and willingness to support any kind of evaluation system. Roch et al. (2007) studied the fairness perceptions of raters and assessed how different rating systems affect these perceptions. They took the perceptions of both employees and raters into account. Roch et al. (2007) argue that raters in general see absolute rating systems as more fair than relative rating systems such as forced distribution rating systems. Within the group of relative rating systems they see some differences in fairness perceptions.

As Schleicher et al. (2009) mention, a perception of unfairness among raters could negatively affect their commitment, confidence and willingness to support evaluation systems. It is therefore important for companies to know which criteria influence the perception of fairness. Colquitt et al. (2001) argue that when companies meet the procedural justice criteria of Leventhal et al. (1980), this has positive effects on the perception of fairness. Leventhal et al. (1980) list six criteria, which affect procedural justice only. First, the procedures must be applied consistently across time and people. Second, they must be free from bias. Third, the information must be accurate. Fourth, there must be a mechanism to correct flawed or inaccurate decisions. Fifth, the evaluation must conform to the ethical or moral standards of employees. Lastly, the opinions of the various groups affected by the decision must be taken into account. Greenberg (1986) studied the determinants of the perceived fairness of performance evaluations. He concludes that the ability to challenge or change evaluations and the consistency of evaluations and standards are two important factors in the perception of fairness. These two factors correspond to criteria of Leventhal et al. (1980). Besides these procedural factors, Greenberg (1986) found two distributive factors: the relationship between performance and rating, and the relationship between rating and subsequent action, must both be fair.

2.4 Role of control system designs

Rating purpose and information accuracy are the two variables used to assess whether the fairness perceptions of raters can be influenced.

Rating purpose: Forced distribution rating systems can be used for different purposes. As mentioned, Schleicher et al. (2009) use two purposes in their research: the administrative and the non-administrative purpose. In a company with an administrative purpose, the ratings have consequences for employees. These can be rewards, but also promotions, demotions or, in the most extreme case, elimination of the lowest performers. Ratings in companies with a non-administrative purpose have no consequences and are only used as feedback. Rewards and promotions do not depend on the ratings employees receive. Schleicher et al. (2009) hypothesize that raters see forced distribution rating systems as less fair under an administrative purpose than under a non-administrative purpose. After testing the hypothesis they concluded that raters found forced distribution rating systems with an administrative purpose somewhat less fair than those with a non-administrative purpose, but that this difference was not significant. Their hypothesis is therefore not supported.

In contrast to Schleicher et al. (2009), Blume et al. (2009) and Scullen et al. (2005) argue that firing the lowest performers affects the perception of fairness. Blume et al. (2009) found significant results in their experiment supporting the hypothesis that firing the lowest performers lowers the attractiveness of forced distribution rating systems. In their hypothesis development, Blume et al. (2009) argue that the perception of fairness plays a role in this hypothesis. Scullen et al. (2005) conclude that firing requirements in forced distribution rating systems could violate the perception of fairness. They argue that if employees see that colleagues can be fired because of their performance, they may expect that they themselves could be terminated in the future. This means that both Blume et al. (2009) and Scullen et al. (2005) argue that forced distribution systems with a non-administrative purpose could be seen as more fair than systems with an administrative purpose.

Information accuracy: Bol et al. (2016) argue that information accuracy refers to the extent to which the control system provides information that is informative about the work effort of the employee. They also argue that accurate information about performance reduces the uncertainty about the employee's effort. Leventhal (1980) concluded that the level of information accuracy is one of the six factors that influence the perception of fairness. With high-accuracy information, the perception of fairness will generally be higher.

2.5 Hypothesis development

As mentioned above, one of the important arguments for using forced distribution rating systems is that they reduce the possibility of biases such as compression. Besides reducing the chance of biases in subjective performance evaluation, Guralnik et al. (2004) argue that more differentiation in the performance evaluation, better application, a reward and punishment system and more openness to feedback are advantages of forced distribution rating systems. At the same time, the perception of fairness plays an important role in the acceptance of forced distribution rating systems by managers (Schleicher et al., 2009). Forced distribution rating systems can be used for different purposes. Schleicher et al. (2009) hypothesize that a forced distribution rating system used for elimination, promotion and demotion, like that of General Electric, is the most unfair system. This can be seen as the administrative purpose. A system with a non-administrative purpose, in which the forced distribution rating has no consequences, is seen as the least unfair. This research investigates the difference between the administrative and the non-administrative purpose. Schleicher et al. (2009) found that managers' perception of fairness is slightly lower when the forced distribution rating system has an administrative purpose than when it has a non-administrative purpose, but this difference was not significant. As mentioned in the previous section, both Blume et al. (2009) and Scullen et al. (2005) concluded that a non-administrative purpose is seen as more fair than an administrative purpose. They argue that firing employees lowers the attractiveness of forced distribution rating systems because it is considered unfair. The question is in which direction the perception of fairness of both purposes develops in relation to the level of information accuracy.

Managers' fairness perceptions are primarily influenced by the kind of procedures and the potential outcome of the evaluation. High fairness perceptions are important for companies' performance evaluations. When managers' fairness perception is low, their commitment, confidence and support for the forced distribution rating system will be low, which could even result in problems. In general, the fairness perception of managers will be lower when information accuracy is relatively low. Inaccurate information is likely to give a poor view of the efforts and activities of employees. This uncertainty makes it difficult for managers to evaluate the performance of employees; in other words, it makes it hard for evaluators to justify their choices and ratings. In a forced distribution rating system, low-accuracy information could cause problems: because of inaccurate information, some employees may be placed in a better category than others. The probability that this happens with accurate information is much lower. With inaccurate information managers will feel uncomfortable giving a rating because of their low perception of fairness. Is it fair to give ratings based on information that is not accurate?

Prior research suggests that high accurate information is important for managers' perception of fairness. Accurate information is one of the six criteria of procedural justice of Leventhal et al. (1980). Leventhal et al. (1980) argue that the degree of accurate information is crucial in the decision-making process of the performance evaluation. Landy et al. (1978) also argue that accuracy is correlated with the perception of fairness. In their study they investigated how certain process variables correlated with fairness and accuracy. Accurate information is primarily important for improving the procedural justice perception of managers. Hartmann & Slapnicar (2012) conclude that increasing procedural justice increases the motivation for performance evaluation, which in this research applies in particular to the use of the forced distribution rating system.

The question in this research is how the level of information accuracy influences the perception of fairness when the rating system has a non-administrative or an administrative purpose. Based on the prior literature, the expectation is that low accurate information will increase the managers' perception of fairness when the forced distribution rating system has a non-administrative purpose. With low accurate information managers will find it hard to justify the ratings they give, because there is a high possibility of giving a wrong rating. In a forced distribution rating system it is, for example, possible that some employees are placed in the wrong category of the distribution because of the inaccurate information. When these ratings have consequences, employees who do not deserve a promotion could get one, and, even more unfair, someone who does not deserve elimination could be eliminated. With this in mind, managers will feel that it is more fair for a forced distribution rating system to have a non-administrative purpose when only low accurate information is available, because the ratings will then not have many consequences and will only give a view of the performance of the employees.

High levels of accurate information mean that managers can justify their ratings and know exactly how the employees have performed. By knowing exactly how employees have performed, managers will not have the uncertainty that they give people ratings which they do not deserve. They will have more certainty about how poorly the bad performers have performed and how well the great performers have performed. As a result, managers will feel more comfortable giving a rating to employees. Besides having more confidence in giving ratings, managers will more strongly feel that it is fair for performance to have consequences. If ratings based on accurate information do not have consequences, high performers will feel that their performance is less appreciated and does not matter to management. This will be perceived as unfair. Greenberg (1986) also argues that the relationship between the performance and the rating, and between the rating and subsequent action, must be fair. This means that with high accurate information the perception of fairness will decrease when the forced distribution rating system has a non-administrative purpose, which results in the following hypothesis:

H1: Raters assigned to rate the performances of employees under a non-administrative condition will see forced distribution rating systems as more fair when the rating is based on low accurate information, compared to basing the ratings on high accurate information.

Forced distribution rating systems with a non-administrative purpose will be seen as more fair when the information accuracy is low than when it is high. The perception of fairness will be the opposite when the system has an administrative purpose. As mentioned before, managers cannot justify the rating they give to employees when they need to base their evaluation on low accurate information. Low accurate information could give a totally different view of the performance of the employees. If the rating is then used for determining promotions, demotions, rewards or even elimination, this will influence the managers' perception of fairness. In other words, when the forced distribution rating system has an administrative purpose, managers will feel that this is unfair when the information is not accurate. They will consider it unfair that the lowest performers could be eliminated based on information which has high levels of uncertainty.

When a forced distribution rating system has an administrative purpose, managers will only see it as fair when they have accurate information at their disposal. As mentioned before, high levels of accurate information help managers to justify their choices and ratings. By knowing exactly how employees have performed, managers will not have the uncertainty that they give people ratings which they do not deserve. They will have more certainty about how poorly the bad performers have performed and how well the great performers have performed. As a result, managers will feel more comfortable giving a rating to employees. Besides this, when the information is accurate managers will be more confident that it is fair for the evaluation to result in consequences like promotions or demotions. All this will lead to an increased perception of fairness of forced distribution rating systems with an administrative purpose.

H2: Raters assigned to rate the performances of employees under an administrative condition will see forced distribution rating systems as more fair when the rating is based on high accurate information, compared to basing the ratings on low accurate information.

Figure 1: Estimation hypothesis development of perception of fairness

[Figure: perception of fairness (vertical axis) against information accuracy, from low to high (horizontal axis), with one series for the non-administrative purpose and one for the administrative purpose.]


3. Research method

3.1 Experimental design

The hypotheses are tested in an experiment with a 2 x 2 between-subjects design and a survey questionnaire about fairness. The experiment is inspired by the experiments of Bol et al. (2016) and Schleicher et al. (2009) and has some similarities with them. The participants are asked to evaluate performance by ranking the store managers; the ranking has pre-set consequences that depend on the rating purpose. Before ranking the store managers, the participants are informed of the consequences. After ranking, they are asked to give their perception of the fairness of the forced distribution rating system on a 7-point scale. In this experiment the rating purpose and the information accuracy are manipulated. The rating purpose is either administrative or non-administrative. The information accuracy is either high or low. This results in four cases, each of which gives another perspective on the participants' perception of fairness. The participants are asked to rank the store managers themselves because only then are they able to judge the fairness of the process and the outcome.

3.2 Task

The task of the participants consists of two phases: first rating the store managers and secondly giving their view on the fairness. The participants read a case scenario about the company Retail Holland, which has about 80 stores in the Netherlands that sell different kinds of articles. The stores are divided over several regions across the country. Every region (16 in total) has its own regional manager who evaluates the performance of the stores in his region on a relative basis by using a forced distribution rating system. The task of the participant is to take the role of the regional manager of Amsterdam, who has four stores. Participants must provide a rating for the four stores by using a forced distribution. The best performer gets a 1 or a 2, the two medium performers a 3 and the weakest performer a 4 or a 5.
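The forced distribution described above can be written down as a simple rule check. The following is only an illustrative sketch: the function name and the dictionary layout are assumptions made for this example and were not part of the experimental materials.

```python
from collections import Counter


def is_valid_forced_distribution(ratings):
    """Check a rating of four stores against the forced distribution.

    `ratings` maps a store label to a rating from 1 to 5 (hypothetical layout).
    Rule from the task description: exactly one store rated 1 or 2,
    exactly two stores rated 3, and exactly one store rated 4 or 5.
    """
    if len(ratings) != 4:
        return False
    counts = Counter(ratings.values())
    top = counts[1] + counts[2]      # best performer
    middle = counts[3]               # two medium performers
    bottom = counts[4] + counts[5]   # weakest performer
    return top == 1 and middle == 2 and bottom == 1


# Example rating in which store B is rated best and store C weakest.
example = {"A": 3, "B": 1, "C": 5, "D": 3}
print(is_valid_forced_distribution(example))
```

A rating that gives every store a 3, or that names two best performers, is rejected by this check, which is exactly the restriction that distinguishes a forced distribution from a free rating.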

The rating decisions of the participants should be based on a set of performance indicators of the four stores (figure 2). This set consists of five indicators: the sales revenue per square foot, return on sales, customer satisfaction, mystery shopper rating and average employee satisfaction. For each performance indicator the target from headquarters is given as well. In this experiment store B performs best compared to the other three stores. Store C performs worst, and stores A and D perform in between. This will be clear to the participants, but it is important to see how they rate the store managers and how this influences the second part of the experiment.

As mentioned, the second part asks how the participants view the fairness of the process and the outcome. This is measured by four items. The questions which cover the procedural fairness perception of the participants are “This ranking perception is a fair process of evaluating the store managers” and “Most store managers will see this process as a fair process”. The last two questions measure the distributive fairness perception. These questions are “Each store managers gets a fair rating” and “The consequences and rewards justice the performances”. The perception of fairness is measured as the mean of the four questions. The participants answer the questions with the options completely disagree, mostly disagree, slightly disagree, neutral, slightly agree, mostly agree or completely agree.

Figure 2: performance reports stores A, B, C and D

The key performance indicators, shown in figure 2, have some variability. This means that it will be clear to the participants that store B should get the best rating and store C the lowest, but it will be interesting to see whether the accuracy of the information leads the participants to doubt the efforts of the store managers.

3.3 Dependent variable

In this experiment the four different cases, which are explained in the following paragraph, differ in the consequences attached to the ratings of the store managers. In the two cases with a non-administrative purpose the ratings do not affect the rewards of the store managers. The rewards are shared equally between the store managers. In the other two cases, with an administrative purpose, the ratings do affect the rewards. The best performer gets a reward between €6.500 and €7.500. The two middle performers get a reward of €4.000 and the worst performer gets between €1.000 and €1.500. Besides this, the participants are notified that there is a chance of elimination of the lowest performers. As mentioned, these consequences and rewards could affect the perception of fairness of the participants/managers. In the four fairness questions the participants are asked to give a score on a scale from 1 to 7. The fairness score per participant is calculated as the mean of the four fairness questions. The fairness score per case is calculated as the mean over the participants of that case. If, for example, in the first case with low information accuracy and a non-administrative purpose 15 participants give a fairness score of 4 and 15 give a score of 5, then the mean of this case is 4,5. This score represents the fairness perception of the managers, which is the dependent variable. By looking at the fairness score it is possible to conclude how the perception of fairness develops. This development is the result of the manipulation of the rating purpose and the accuracy of information. This manipulation is explained in the next paragraph.
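The calculation of the dependent variable can be illustrated with a short sketch. The function names and the data layout (one list of four item scores per participant) are assumptions for this example; the numbers reproduce the worked example given in the text, not the experimental data.

```python
from statistics import mean


def participant_score(item_scores):
    """Fairness score of one participant: mean of the four items (scale 1-7)."""
    if len(item_scores) != 4 or any(not 1 <= s <= 7 for s in item_scores):
        raise ValueError("expected four item scores between 1 and 7")
    return mean(item_scores)


def case_score(participants):
    """Fairness score of one case: mean of the participant scores."""
    return mean(participant_score(items) for items in participants)


# Worked example from the text: 15 participants with a fairness score of 4
# and 15 participants with a fairness score of 5 in the same case.
case_1 = [[4, 4, 4, 4]] * 15 + [[5, 5, 5, 5]] * 15
print(case_score(case_1))  # 4.5, written as 4,5 in the text
```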

Performance indicators                     Store A     Store B     Store C     Store D
Sales revenue per square foot              € 115.450   € 135.560   € 100.345   € 121.670
Return on sales                            12,50%      16,80%      9,40%       13,80%
Customer satisfaction rating (1-100)       76          86          72          77
Mystery shopper program rating (1-100)     88          85          94          87

3.4 Manipulations

Manipulating the rating purpose and the information accuracy helps to investigate how these two control system design choices influence the fairness perception of managers. The manipulation is done by giving the participants a draft email from the corporate management about the purpose of the rating system and the level of information accuracy. In this experiment there are four email versions: emails which refer to the non-administrative purpose, the administrative purpose, the low information accuracy condition and the high information accuracy condition. Before the manipulation all participants read the same excerpt from an email from the CEO of Retail Holland. This email contains the following information:

“As regional manager it will be your job to give a rating to the store managers in your region by using the forced distribution rating system. You could give the best performer a 1 or a 2, the two middle performers a 3 and the worst performer a 4 or a 5. Base your rating decision on your assessment of the overall performances of the store managers. After the rating decision, we as board will ask you to have a private meeting with the store managers to discuss their performances and to give an explanation about their rating, rewards and future.”

The second part of the email contains the manipulation of the rating purpose of the rating system. In the email with the non-administrative purpose the CEO stated the following:

“Looking to the purpose of the rating system. We have in the company the policy that all the performance evaluations have a non-administrative purpose. This means that the given ratings do not have any consequences for the store managers. It will only give us as board insight in the performances of the store managers. The bonus rewards will be equally per store managers and there will not be any promotion, demotion or elimination, because of the rating.”

In the email with the administrative purpose the CEO stated the following:

“Looking to the purpose of the rating system. We have in the company the policy that all the performance evaluations have an administrative purpose. This means that the given ratings do have consequences for the store managers. The best performer will get a reward between €6.500 to €7.500. The two middle performers will get a reward of €4.000 and the worst performer will get €1.500 to €1.000. It will give us as board insight in the performances of the store managers which could result in consequences like promotion, demotion or elimination, because of the rating. The possibility of elimination of the weakest performers is because the board wants primarily have the best store managers in their stores.”

The information accuracy is manipulated by stating in the emails whether the information in the performance reports of the stores is accurate or not. In the low information accuracy condition the regional managers are told that the information gives a low accurate picture of the efforts of the managers. In addition, the participants are told that the customer flows are heavily volatile. In the high information accuracy condition the managers receive an email which states that the information is highly accurate and that it gives a good view of the results and the efforts of the store managers.

In the email about the information accuracy the CEO stated the following, where the wording before or after each slash depends on the condition:

“Please find attached the performance reports of the four stores that you need to evaluate. It is important to note that the four stores are located in areas with a very volatile/steady customer flow. This means that the data and information where the key performance indicators are based on is low/high accurate. This means that the Key Performance Indicators only provide a moderately/low or highly accurate picture of the managers’ efforts.”

With these manipulations there are four different cases in the experiment:

1. Non-administrative purpose – Low information accuracy
2. Non-administrative purpose – High information accuracy
3. Administrative purpose – Low information accuracy
4. Administrative purpose – High information accuracy
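The four cases follow directly from crossing the two manipulated factors. The sketch below enumerates them and assigns hypothetical participants at random. Simple random assignment with a fixed seed is an assumption made for illustration; it does not describe how the survey platform distributed the cases.

```python
import random
from itertools import product

PURPOSES = ("non-administrative", "administrative")
ACCURACY_LEVELS = ("low", "high")

# Crossing the two factors yields the four cases, in the order listed above.
CASES = list(product(PURPOSES, ACCURACY_LEVELS))


def assign_cases(participant_ids, seed=0):
    """Assign each participant to one of the four cases at random.

    Hypothetical helper: each participant sees exactly one case,
    which is what a between-subjects design requires.
    """
    rng = random.Random(seed)
    return {pid: rng.choice(CASES) for pid in participant_ids}


for number, (purpose, accuracy) in enumerate(CASES, start=1):
    print(f"{number}. {purpose} purpose, {accuracy} information accuracy")

assignment = assign_cases(range(1, 121))
```

With simple random assignment the group sizes are not guaranteed to be equal, so the number of usable participants per case can differ.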

At the end of the experiment (part 2) the participants needed to respond to some statements to check whether they understood the case and whether the manipulations were effective. The manipulation check statements were assessed on a 7-point Likert scale. The statement for the rating purpose was as follows: “Retail Holland has the policy that the forced distribution rating system has an administrative purpose (performances have consequences)”. The manipulation of the information accuracy was checked by the statement “The key performance indicators were based on highly accurate information about the efforts of the store managers”.

3.5 Procedures & Participants

The data from the participants in the experiment was collected using the online survey platform prolific.ac between 20-5-2017 and 29-5-2017. The data was collected in five sessions. The first session consisted of 120 participants. In this session the four different cases were randomly distributed. I started the first round with 120 participants to see how many participants would answer the check questions correctly. Of the 120 participants in the first session, only 50 answered the two manipulation questions correctly. Beforehand I did not expect all participants to answer the check questions correctly, but 50 participants was fewer than expected. The distribution over the four manipulation cases was also skewed. Cases one and two had eight and seven participants who correctly answered the manipulation questions, compared to fifteen and twenty participants for cases three and four. In the second session 50 participants were asked to contribute to the experiment. This session consisted of three manipulation cases. Only the administrative case with high accuracy was excluded, because after reviewing the first session this case already had enough participants (20). After the two sessions there were 69 participants who answered both manipulation questions correctly. For both non-administrative cases (low and high accuracy) it was difficult to find participants who answered the questions correctly. For sessions three and four the important sentences of the manipulation were therefore made bold and underlined. In session three the case with a non-administrative purpose and low accuracy was distributed to 16 participants. In session four the case with a non-administrative purpose and high accuracy was distributed to 11 participants. Sessions three and four resulted in 15 participants who answered the manipulation questions correctly. After these sessions one last session followed to increase the number of participants. The last session was taken by 52 participants, which resulted in 25 participants who answered the manipulation questions correctly. In total 109 participants are included in this research. As described above, I used several sessions to find the participants. Before the experiment I intended to use two sessions, but because of the skewed results of the first round I decided to use multiple rounds. Launching a second session with 120 participants would probably have resulted in an unbalanced distribution of participants. By using this many sessions I had some control over the distribution of the participants over the cases. The participants who failed the questions were excluded from further analysis. In paragraph 4.1 the manipulation checks are explained further.

In the selection of the participants one of the important elements is the supervisor position. Supervisors were chosen instead of MBA or master students because students do not have much experience with evaluating employees. The participants have on average 3,9 years of experience in a supervisory role. Of the 109 participants, 60 (55%) were male and 49 (45%) female. The average age of the participants is 34,43 years. Most of the participants are from the United Kingdom (56%) and the United States (11,9%). Translated to continents, 81,7% of the participants are from Europe, 14,7% from North and South America and the others are from Asia, Australia or unknown.


4. Results

4.1 Manipulation checks and descriptive statistics

In the second part of the experiment the participants needed to answer several manipulation check questions to see whether they had read the case and to check the effectiveness of the manipulations in the experiment. The participants responded to the presented statements on a Likert scale from strongly disagree to strongly agree. To examine the effectiveness of the manipulations two questions were asked. For the purpose manipulation the following statement was used: “Retail Holland has the policy that the forced distribution rating system has an administrative purpose (performances have consequences)”. The mean score in the administrative purpose condition was 6,05 with a standard deviation of 0,88. In the non-administrative purpose condition the mean score was 1,73 with a standard deviation of 0,81. The mean score of the administrative purpose was thus significantly higher than the mean score of the non-administrative purpose (p < .001, two-tailed), so the purpose manipulation was successful. For the information accuracy manipulation the following statement was used: “The key performance indicators were based on highly accurate information about the efforts of the store managers”. The mean score in the high accurate information condition was 6,20 with a standard deviation of 0,75. The low accurate information condition had a mean score of 1,78 and a standard deviation of 1,03. The mean score of the high accurate information condition was thus significantly higher than the mean score of the low accurate information condition (p < .001, two-tailed). Considering that both manipulation checks are significant, the conclusion is that both manipulations were effective.

As mentioned in paragraph 3.3 the fairness scores are the dependent variable. To identify the fairness perception of managers the participants needed to respond to four statements about the perception of fairness. These statements are given in paragraph 3.2. Table 1 and table 2 present the statistics of the fairness scores. Table 1 gives the means of the fairness score per statement. The means of the four statements are used to compute the overall fairness score per case. This score is presented in both table 1 and table 2. The mean of the overall fairness score is used in the ANOVA in the next paragraph. The fairness scores are based on a score from 1 to 7. A score of 1 means that the fairness perception is extremely low and a score of 7 means that the fairness perception is extremely high.

Consistent with the theoretical reasoning in chapter 2, the non-administrative and low accurate information condition has a higher fairness score than the non-administrative and high accurate information condition. The fairness score of the non-administrative and low accurate information condition has a mean of 4,54 and a standard deviation of 1,24. The fairness score of the non-administrative and high accurate information condition is lower with a mean of 3,81 and a standard deviation of 0,99. The group size of the low accurate information condition is larger than the group size of the high accurate information condition (N=25 vs N=20). The difference in the fairness scores of the two administrative purpose conditions is even bigger. The fairness score of the administrative and low accurate information condition is 3,13 with a standard deviation of 1,19. In the high accurate information condition the fairness score is 4,65 with a standard deviation of 1,18. The group sizes, N=33 and N=31, are close to each other.

Table 2

Descriptive statistics fairness scores per purpose

Condition                                N     Min    Max    Mean   SD
1. Non-administrative & Low accuracy     25    2,75   7,0    4,54   1,24
2. Non-administrative & High accuracy    20    2,0    6,0    3,81   ,99
3. Administrative & Low accuracy         33    1,0    6,0    3,13   1,19
4. Administrative & High accuracy        31    2,5    6,5    4,65   1,18

Table 1

Fairness scores per question per purpose

Condition                                N     Mean Q1   Mean Q2   Mean Q3   Mean Q4   Mean Total
1. Non-administrative & Low accuracy     25    4,40      4,40      4,68      4,68      4,54
2. Non-administrative & High accuracy    20    3,95      3,45      4,15      3,70      3,81
3. Administrative & Low accuracy         33    3,18      2,73      3,30      3,30      3,13
4. Administrative & High

4.2 Hypothesis test

4.2.1 Non-administrative purpose (H1)

H1 states that raters assigned to rate the performances of employees under a non-administrative condition will see forced distribution rating systems as more fair when the rating is based on low accurate information, compared to basing the ratings on high accurate information. This hypothesis is tested using a one-way analysis of variance (ANOVA). The ANOVA compares the fairness scores of the participants in the low accurate information condition with those in the high accurate information condition when the rating purpose is non-administrative.

Levene’s F test revealed that the homogeneity of variance assumption was met (p = ,121). The ANOVA results, which are presented in table 3 (panel A), show a significant difference between the low accurate and high accurate groups, with an F-statistic of 4,57 and a p-value of ,038. This result means that H1 is supported. Supporting H1 means that managers who rate the performance of the store managers by using a forced distribution rating system see a non-administrative purpose as more fair if they need to rate employees based on low accurate information compared to high accurate information. Figure 3 presents the fairness scores of both the low accurate and high accurate conditions.

Figure 3

Fairness score non-administrative purpose

[Chart of the mean fairness score under a non-administrative purpose: 4,54 for low accurate information and 3,81 for high accurate information.]

4.2.2 Administrative purpose (H2)

H2 states that raters assigned to rate the performances of employees under an administrative condition will see forced distribution rating systems as more fair when the rating is based on high accurate information, compared to basing the ratings on low accurate information. This hypothesis is tested using a one-way ANOVA. The ANOVA compares the fairness scores of the participants in the low accurate information condition with those in the high accurate information condition when the rating purpose is administrative.

Levene’s F test revealed that the homogeneity of variance assumption was met (p = ,704). The ANOVA results, which are presented in table 3 (panel B), show a significant difference between the low accurate and high accurate groups, with an F-statistic of 26,44 and a p-value of < 0,01. This result means that H2 is supported. Supporting H2 means that managers who rate the performance of the store managers by using a forced distribution rating system see an administrative purpose as more fair if they need to rate employees based on high accurate information compared to low accurate information. Figure 4 presents the fairness scores of both the low accurate and high accurate conditions.

Figure 4

Fairness score administrative purpose

[Chart of the mean fairness score under an administrative purpose: 3,13 for low accurate information and 4,65 for high accurate information.]

Table 3

ANOVA results, test for H1 and H2

Panel A: Analysis of variance for Non-administrative purpose

Source            Df    Mean square    F-statistic    p-value
Fairness score    1     5,88           4,57           ,038

(Levene’s F test was not significant (p=,121) so equal variance is assumed)

Panel B: Analysis of variance for Administrative purpose

Source            Df    Mean square    F-statistic    p-value
Fairness score    1     37,15          26,44          <0,01
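As a rough plausibility check, the F-statistics in table 3 can be approximated from the descriptive statistics in table 2 alone. The sketch below is not the original analysis, which was run on the individual responses. It recomputes a one-way ANOVA from group sizes, means and standard deviations, so small deviations from table 3 are expected because the inputs are rounded to two decimals. The function name is an assumption for this example.

```python
def one_way_anova_from_summary(groups):
    """One-way ANOVA F-statistic from summary statistics.

    `groups` is a list of (n, mean, sd) tuples, one per condition.
    Returns (F, df_between, df_within).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    # Between-groups sum of squares: spread of the group means.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-groups sum of squares, recovered from the sample SDs.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# (N, mean, SD) per condition, taken from table 2.
h1 = one_way_anova_from_summary([(25, 4.54, 1.24), (20, 3.81, 0.99)])
h2 = one_way_anova_from_summary([(33, 3.13, 1.19), (31, 4.65, 1.18)])
print(f"H1: F({h1[1]}, {h1[2]}) = {h1[0]:.2f}")
print(f"H2: F({h2[1]}, {h2[2]}) = {h2[0]:.2f}")
```

Under these assumptions the recomputed values come out close to the reported 4,57 and 26,44, which suggests that tables 2 and 3 are consistent with each other.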


5. Discussion and conclusion

Many organizations evaluate their employees based on subjective performance evaluations. Prior literature suggests that subjective performance evaluations lead to biased evaluations (Bol et al. 2016). Compression and leniency are two examples of such biases. Instead of subjective performance evaluations, companies could use forced distribution rating systems. This type of performance evaluation is more resistant to biases like compression (Landy et al. 1980). One disadvantage of forced distribution rating systems is that the perception of fairness of the evaluators is low, which has negative consequences for the implementation and successful maintenance of the system. To see how control system design choices influence the fairness perception of managers, this paper examined whether the rating purpose and information accuracy influence the perception of fairness of managers when using forced distribution rating systems.

The assumption in my study is that evaluators assigned to rate employees see a non-administrative purpose as more fair if their evaluations are based on low accurate information compared to high accurate information. The reason for this assumption is that managers cannot justify the rating they give to the employees when they need to base their evaluation on low accurate information. On the other hand, the assumption is that forced distribution rating systems with an administrative purpose work the other way around. The perception of fairness will be high when the evaluators base their evaluations on high accurate information. By knowing exactly how employees have performed (high accurate information), managers will not have the uncertainty that they give people ratings which they do not deserve. If the purpose is administrative, only high accurate information will be seen as fair.

In an experiment with managers, I found support for both lines of reasoning. For the first assumption I found that the fairness score of the low accurate information condition with a non-administrative purpose was significantly higher than the fairness score of the high accurate information condition with a non-administrative purpose. This means that managers who rate the performance of their employees by using a forced distribution rating system see a non-administrative purpose as more fair when they need to rate employees based on low accurate information compared to high accurate information. For the second assumption the finding is that the fairness score of the high accurate information condition is significantly higher than that of the low accurate information condition when the rating purpose is administrative. This means that managers who rate the performance of their employees by using a forced distribution rating system see an administrative purpose as more fair when rating employees based on high accurate information compared to low accurate information.

The first conclusion on the influence of the rating purpose and the information accuracy on the perception of fairness of a forced distribution rating system is that both control system design choices influence each other. In other words, the two design choices are dependent on each other. It is not possible to say that the perception of fairness will increase by using high accurate information, because in the condition with a non-administrative purpose the perception of fairness of managers will decrease. This also works the other way around. Low accurate information influences the perception of fairness in combination with the rating purpose. An administrative rating purpose in combination with low accurate information will decrease the perception of fairness significantly. This means that the influence of the two control system design choices cannot be seen independently.

The second conclusion is that it is possible to increase managers' perception of fairness through the choices made for both control system design elements. Making the right choices could increase managers' willingness, confidence and commitment to support the forced distribution rating system, which increases the chance of a successful introduction of a forced distribution rating system.

The findings of this study have a number of implications for management accounting theory. The first contribution to theory is that this thesis responds, in part, to the call for future research in Bol et al. (2016). Bol et al. (2016) argued that the control system design choice of information accuracy should be tested in settings other than the one they used. They used the setting of subjective performance evaluation, while this study uses the forced distribution rating system. The results imply that in both studies information accuracy plays a major role as a control system design choice.

Secondly, this study contributes by extending the literature on fairness, forced distribution performance evaluations, rating purposes and information accuracy. This is the first study which examines the control system design choices of rating purpose and information accuracy together in relation to the perception of fairness. By investigating the role of the rating purpose and the information accuracy, it gives more insight into how the perception of fairness develops. Besides this, most studies about fairness focus on the employee who is evaluated, whereas this study focuses on the managers as evaluators. In addition, this study provides insight into information accuracy and rating purpose themselves. It concludes that the two control system design choices influence each other and cannot be seen apart from each other with regard to the perception of fairness. Schleicher et al. (2009) examined the fairness of the forced distribution rating system using only the rating purpose, without the information accuracy. By researching the role of the information accuracy, this study adds to Schleicher et al. (2009) by giving more insight into how forced distribution rating systems influence fairness perceptions. Looking only at the rating purpose gives an incomplete view.

The findings of this study are also relevant for practice. Firstly, this study shows companies that the perception of fairness of forced distribution rating systems can be increased. As mentioned before, this study helps companies to better understand how the perception of fairness can be influenced, which could result in different choices that favour their objectives. It will be possible for companies to create circumstances to increase the perception of fairness of their managers relating