
Monitoring and Employee Productivity: Evidence

from a Field Experiment

Jannes Pijnenburg

11927240

University of Amsterdam, July 2018

MSc Business Economics: Managerial Economics & Strategy

Supervisor: Hessel Oosterbeek

15 ECTS

Abstract

This research examines the response of employees to the implementation of electronic monitoring. The response is tested in a field experiment with a sample group consisting of cashiers in a supermarket located in Terheijden (Noord-Brabant). Besides examining the causal relationship between monitoring and performance, the research supplements the existing literature by exploring potential heterogeneous treatment effects for three different subgroups: underperformers, workers who perceive monitoring as ‘controlling’ and experienced workers. Accounting for seasonal and day-specific effects, the results show that workers increased their productivity after the implementation of monitoring measures. Furthermore, high performers in the sample group were shown to be more receptive to the monitoring treatment than under- or average performers. Similarly, workers who perceived the treatment as ‘fair’ rather than ‘controlling’ were more sensitive to the treatment. The results regarding the influence of the employees’ experience level remain inconclusive.


This document is written by Student Jannes Pijnenburg who declares to take full responsibility for the contents of this document.

I declare that the text and the work presented in this document are original and that no sources other than those mentioned in the text and its references have been used in creating it.

The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


1. INTRODUCTION
2. METHODOLOGY
2.1. LOCATION AND SAMPLE
2.1.1. Location
2.1.2. Sample
2.2. EXPERIMENTAL DESIGN
2.3. MAIN TREATMENT
2.4. DEPENDENT VARIABLES
2.5. INDEPENDENT VARIABLES
2.6. POTENTIAL CONTAMINATION OF RESULTS
2.6.1. Scan-desks
2.6.2. Group obstruction and spillovers
2.6.3. Performance ceiling
2.6.4. Big clients
2.6.5. Survey inaccuracy
2.6.6. Procedural unfairness
2.6.7. Hawthorne effects
2.7. HYPOTHESES
3. RESULTS
3.1. MONITORING AND PRODUCTIVITY
3.2. PER EMPLOYEE RESULTS
4. DISCUSSION
5. CONCLUSION
APPENDIX
APPENDIX A: INTERVIEW 1
APPENDIX B: INTERVIEW 2
REFERENCE LIST


1. Introduction

This research consists of a field experiment where the cashiers of a supermarket are subjected to an (electronic) monitoring treatment to test whether it stimulates performance. Additionally, the field experiment tests whether a discrepancy can be found in the treatment effect for underperformers, inexperienced workers or workers who perceive the treatment as ‘controlling’ (as opposed to their counterpart groups: over-performers, experienced workers or those who perceive the treatment as fair).

Organizational use of electronic monitoring has been steadily increasing over the past ten years (Firoz et al., 2006; Fazekas, 2004). E-mail communications, browsing history checks, computer keystroke capturing, listening in on phone calls, video surveillance or GPS tracking in cars are all commonly used strategies to gather information or keep workers from shirking. Shirking can be costly to an employer due to reduced productivity. Since shirking tends to occur when no consequences are attached to the wrongdoing (Engel, 2010), investing in (electronic) monitoring can be an attractive and cost-effective option for employers.

In general terms, humans tend to show more socially desired behavior whenever a ‘watching eye’ is introduced (Bateson et al., 2006). Prior literature reports that a similar effect occurs in performance monitoring in working environments. Engel (2010) examined the different effects of incentive schemes on productivity and found that subjects who were given incentives to shirk do in fact shirk, but that (amongst others) monitoring could lead to increased productivity. Although Stanton (2000) predominantly mentions negative side-effects of monitoring, the research also indicates a positive relation between monitoring and both fairness appraisal and the recognition of high performers. Similarly, Mas and Moretti (2009) reported that supermarket employees drastically increased performance when they could be observed by another productive employee. Their report suggests that the increased performance is to a certain extent driven by a fear of exposure, as well as by a certain prestige concern among workers. This exposure, in combination with the fear of reprimands by an authority, is generally believed to drive positive performance stimulation (Bengtsson and Engström, 2013; Aiello et al., 1995).


Nonetheless, (electronic) monitoring can also induce negative side-effects. Employees who are subjected to performance monitoring are reported to be more likely to experience job stress, boredom, psychological tension, anxiety, depression, fatigue and other health complaints (Fairweather, 1999; Aiello & Kolb, 1995; Smith et al., 1992; Stanton, 1996). Recent developments in economic theory point to another negative side-effect and provide evidence for the ‘motivational crowd-out theory’. This theory predominantly mentions a ‘loss of intrinsic motivation’ due to either financial incentives (Mellström and Johannesson, 2010; Gneezy and Rustichini, 2000) or control (Falk and Kosfeld, 2006; Marx and Sherizen, 1998). For the latter, the motivational loss is caused by the fact that control negatively affects one’s autonomy and induces a sense of distrust (Stanton, 2000). In an experimental principal-agent game1 where a worker’s choice range is increasingly limited, Falk and Kosfeld (2006) showed that motivation and performance tend to decrease as control increases. Additionally, a heterogeneous effect was found for subjects who perceived the treatment as ‘controlling’, as opposed to those who had a positive attitude towards the implementation.

It can be concluded that the literature provides opposing conclusions when it comes to performance monitoring. The crowd-out theory stresses negative effects of monitoring, whereas performance stimulation theories support monitoring. This exposes a need for additional experiments on the causal relation between monitoring and performance. Especially since increasingly digitalized work environments allow more e-monitoring alternatives, a need arises for supplementary assessments of its potential negative effects. On top of that, the literature indicates that the effect of monitoring appears to differ per type of employee. Additional research could thus potentially add to formulating more generalisable conclusions.

1 Situation or game where a worker (the agent) and a manager (the principal) affect one another while both acting out of self-interest. Typically, these interests are unaligned and the situation is characterized by asymmetric information (Jensen & Meckling, 1976).

Whereas much of the abovementioned literature draws its conclusions from questionnaires or lab experiments, this research tests the hypotheses through a real-life workplace experiment with actual incentives. Hence, it may contribute to the existing literature and may simultaneously serve as an example for implementation within other firms. The experiment is set up using several subgroups, split according to performance level, attitude towards the treatment or experience. This allows comparison of results between different employee types and thus introduces the variable of heterogeneity in the receptiveness to monitoring. In conclusion, the main research question is two-fold and can be formulated as:

How does worker productivity change in response to the implementation of monitoring and to what extent does relative performance, a person’s perception of the intervention or experience cause heterogeneity in the treatment effect?

In short, the results show that after correcting for seasonal and day-specific effects, overall productivity increased during the monitoring treatment, though not significantly. Corrected for 2017 seasonal effects and day-specific effects, productivity levels appear to have increased on average by 0.7 products handled and 0.05 customers helped per minute, relative to the weeks before the treatment. Compared to the pre-intervention period in 2018, this amounts to a 4.9 and a 3.7 percent increase, respectively. Then, regarding heterogeneity among the subgroups: it appears that it is not underperformers who are most stimulated by monitoring, but high performers. Also, workers who perceive the treatment as ‘fair’ and workers with less experience show a higher treatment effect than their counterpart groups.
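As a quick sanity check on the arithmetic above, the percentage increases follow directly from the absolute changes and the pre-intervention baselines. A minimal sketch, where the baseline values are back-solved for illustration and are not the actual 2018 averages:

```python
# Illustrative check of the reported percentage increases.
# The baselines (14.3 PPM, 1.35 CPM) are hypothetical, back-solved values,
# not figures taken from the thesis's data.

def pct_increase(change, baseline):
    """Percentage change of a productivity measure relative to its baseline."""
    return 100 * change / baseline

ppm_gain = pct_increase(0.7, 14.3)   # products handled per minute
cpm_gain = pct_increase(0.05, 1.35)  # customers helped per minute

print(round(ppm_gain, 1), round(cpm_gain, 1))
```

With these assumed baselines, the computation reproduces increases of roughly 4.9 and 3.7 percent.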

The remainder of this paper is organised as follows: section 2 describes the methodology, in which the experiment itself is discussed elaborately. This section covers the location and sample choice, as well as the contextual factors that are of interest to the experiment. It also provides a detailed description of the actual treatment protocol, the chosen variables and the potential contaminations of the results. Subsequently, section 3 comprises the actual experimental results. To show the general performance discrepancy between pre-intervention and post-intervention periods (of both 2017 and 2018), average productivity levels are displayed in tables and by use of kernel density graphs. Difference-in-difference models indicate whether there is a significant difference in productivity between the 2018 pre- and post-intervention periods, corrected for (2017) seasonal effects. In order to see whether heterogeneous treatment effects are found for the earlier mentioned subgroups, per-employee and per-subgroup treatment effects are given. Based upon these results, the hypotheses and the main research question are answered. Sections 4 and 5 will finally discuss the potential


side effects of the treatment, comments and future research possibilities, as well as the overall conclusions of the experiment.

2. Methodology

2.1. Location and sample

2.1.1. Location

A Jumbo supermarket in Terheijden functions as the location of the field experiment and its cashiers are used as subjects. The location provides an ideal context for the experiment, because productivity is well-measurable through the software system2 installed on all cash registers. The system registers not only multiple productivity measures, but also who is operating the desk, how many customers are helped, turnover per employee and various other figures. Since this monitoring system is already installed, but currently not used by the management as a tool for feedback or contractual decisions, it serves as a perfect low-cost opportunity for the monitoring experiment. The manager in charge of the department is kept unaware of the available data.

The organisation is also characterized by flat wages, little career concern among the workers and a culture of shirking (or free-riding). The shirking can partly be attributed to the fact that many of the employees work at the supermarket as a side job, leading to low levels of career concern or fear of losing the job. Other reasons for the shirking could originate from wage- and teamwork-related factors. From a management perspective, flat wages have proven to be suboptimal in terms of performance stimulation in comparison to piece rates or bonus payments (Lazear, 2000), because there is no direct compensation for over-performing. From an economic point of view, it could be argued that a worker in a supermarket works just hard enough not to get fired; this way, (s)he maximizes ‘wage minus cost of effort’. Theory on team jobs states that it is economically unsatisfying to ‘over-perform’, because the increased marginal result will benefit all team members while only the high performer bears the extra marginal costs (Hart, 1995).

2 NCR Storeline POS: a ‘fully integrated and cross-functional Point of Sale (POS) and store management


It is assumed that cashiers work as a team and that the busier the store, the more effort is needed to handle all customers timely (increased marginal cost of effort). In this situation, all cashiers thus benefit if everyone performs at a certain level, but for an individual it makes no sense to put in more effort than the group average. By putting in more effort, the worker bears the extra marginal costs individually, decreasing his/her own total utility. Only if one works for non-monetary reasons (i.e. has intrinsic motivation) is it understandable why that person over-performs. Prestige, moral or social reasoning could be the driving factors behind intrinsic motivation. In fact, interviews with the employees showed that many fear being ‘below average’ and all acknowledge that their individual performance ceiling has not yet been reached (Appendix A). This indicates the presence of shirking, as well as the presence of a certain prestige concern.

2.1.2. Sample

In terms of active employees, 58 workers are able to occupy a cash desk at the start of the experiment, of which 55 are still in service by the time the treatment starts. Of these 55 workers, 20 people are suitable candidates for the experiment, since the remaining workers do not operate a cash desk on a regular basis, have a management function or are not employed for the entire duration of the experiment. These 20 people thus serve as the sample group. Besides the fact that the sample group predominantly consists of women (15 out of 20), it also shows a clear division in types of employees. On the one hand, the supermarket employs scholars/students aged 16 to 19, working 2 to 14 hours per week under temporary contracts. On the other hand, there is a ‘mature’ workforce for whom the work is their main source of income. These people have usually been employed at the supermarket for quite some years, generally working 22 to 40 hours per week under fixed contracts. Although the suitable labour force for the experiment is limited to 20 people, there are many observations per worker, and the clear division in the sample makes it a proper setting for measuring heterogeneity of the treatment effect among subgroups (experienced vs. non-experienced).

It should be noted that the non-random sample, in which women and youngsters are overrepresented (let alone self-selection into supermarket jobs), decreases the external validity of the experiment. Conclusions following from the results can therefore not be generalised to larger populations, but rather serve as suggestive evidence from a case study.


2.2. Experimental design

The experiment will make use of a difference-in-difference design in which 4 distinct periods are distinguished: 2018 pre- and post-intervention and 2017 pre- and post-intervention (control). The experiment takes place during week numbers 10 to 22 of 2018. In the 2018 pre-intervention period (weeks 10-18), daily productivity figures from the current employees are collected and closely examined. Weeks 19 to 22 are used as the ‘intervention’ or ‘treatment’ period. To avoid misattribution of a potential treatment effect, data from the exact same days and weeks of 2017 are used as a reference (i.e. difference-in-difference design) and accounted for if required. Usually, an experiment is characterised by random assignment of subjects to the treatment and control groups. However, our setting makes this random assignment impossible, due to a lack of influence on working schedules and because it is impossible to correct for past observations (i.e. from 2017).
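The 2×2 design described above can be summarised as a simple difference-in-differences computation on period means: the 2018 pre-to-post change minus the same-weeks change in the 2017 control year. A minimal sketch, using hypothetical period means rather than the thesis's actual data:

```python
# Minimal 2x2 difference-in-differences sketch. The control-year change
# nets out seasonal effects that occur in both years.

def did_estimate(pre_2018, post_2018, pre_2017, post_2017):
    """Classic diff-in-diff estimator on period means.

    pre_2018 / post_2018: mean productivity before / during the treatment.
    pre_2017 / post_2017: means for the same calendar weeks one year earlier.
    """
    return (post_2018 - pre_2018) - (post_2017 - pre_2017)

# Hypothetical weekly PPM averages (illustrative values only):
effect = did_estimate(pre_2018=14.3, post_2018=15.0, pre_2017=14.0, post_2017=14.1)
```

In this example the raw 2018 change of 0.7 PPM is adjusted downward by the 0.1 PPM seasonal change observed in 2017, leaving an estimated treatment effect of 0.6 PPM.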

Furthermore, a ‘pre-treatment’ interview is performed in week 18 with each of the 20 workers, consisting of standardised and predetermined questions. These give some prior insight into a worker’s job satisfaction, intrinsic motivation, performance ceiling, how (s)he could potentially be affected by the treatment, knowledge about his/her contract and various other meaningful aspects (Appendix A). Gathering these data prior to the experiment is required for determining why certain variables change during the experiment, as well as to properly divide the employees into subgroups based upon treatment perception (i.e. fair vs. controlling). The interview is also required to determine control variables and to ensure the productivity ceiling has not yet been reached before the treatment starts: in case someone believes (s)he could not work any faster than at his/her current effort, the treatment will have little to no impact. Moreover, without this prior knowledge, workers could be wrongly categorised as underperformers whereas they think their effort is above average. This could affect their incentives and makes it harder to allocate a person to the under- or overperformers subgroup. To counter this, a comprehensive list is hung up in the cashier’s office at the start of the treatment, displaying bi-monthly productivity figures per worker, ranked from most to least productive. Using this list, everyone should know his/her relative performance and the categorisation of workers as under- or overperformers can be executed.


An ‘after-treatment’ survey3 will provide extra information on experimental side-effects and/or whether workers attribute a change in their respective productivity to the treatment rather than to other factors beyond the scope of the research.

2.3. Main treatment

Since no performance measuring of any kind was deployed before the treatment started, a proper baseline could be created. The type of monitoring selected for this experiment influences the results, since not all monitoring methods are expected to have a similar effect. Yet, considering the set-up of this experiment, the choice is mainly determined by the availability of measurement tools and the willingness of the store management to cooperate. A distant or back-office monitoring tool called Storeline was shown to be the preferred option, since the tool properly stores all productivity data and the store manager did not favour the idea of routine physical monitoring. Besides the monitoring itself, an important aspect is to repeatedly remind the workers that they are being monitored. This is to avoid a mitigated or distorted effect due to a lack of information provision about what is happening. Additionally, the treatment must be in accordance with newly installed Dutch and European laws4. The main implications of these laws are that workers cannot be monitored through camera surveillance for a significant period of time and that all data should be anonymous. Nonetheless, it is not forbidden to keep track of productivity figures if a proper reasoning is given and clearly communicated to the employees. To guarantee this, a contract is signed with the supermarket stating that the data are indeed anonymous, not shared with third parties and only used for the purpose of this research.

The Sunday before the start of the intervention, all workers are notified of their involvement in this new campaign. To guarantee that subjects experience the intended treatment and to increase its effectiveness, it is not specifically communicated that they are enrolled in an experiment used for research purposes. Rather, merely the management’s motivation for the intervention is communicated, i.e. to get a proper grasp of the workers’ relative performance

3 Post-intervention interviews were made optional due to obstructionism from personnel and the author’s promise not to intervene too dominantly in the usual course of business. 9 subjects answered and gave some useful insights.

4


to one another. To guarantee that all subjects experience the same treatment at the exact same time, both email and text message are used as communication methods.

Just before the start of the treatment (the Sunday of week 18), all subjects receive an email stating that all employees are being watched and that productivity figures are being stored and passed on to their managers. The employees received the following email, where all names are omitted for privacy reasons and transcripts are translated:

Email contact 1 (6th of May, 2018):

“Dear employee,

Through the use of this email I am updating you on the fact that as of tomorrow (7th of May) I will be closely observing your performance. On behalf of the management I will, on a daily basis, be looking at each of your individual productivity figures, starting from the minute you log onto one of the cash registers. I will do this through the use of software that is installed onto the cash registers and which provides data that rightfully reflects your working speed. I will be looking at the following aspects of your work:

- (Daily) average number of items scanned per minute
- (Daily) average number of customers handled per minute

Explanation: the total number of customers/products you have handled/scanned on a given day will be divided by the total number of minutes and seconds you have operated the cash register.

The goal of this project is to give the management a proper overview of who is relatively productive and who is not. All results will be shared with the management and if you wish to know about your productivity, I will provide updates on your performance personally.

Kind regards, Jannes Pijnenburg”

On the morning of the start of the treatment, a supplementary text message is sent to every worker as a reminder that they are being observed from the moment they start working. This


text message was sent on the 7th of May, 2018 and contained the following message (translated):

“Dear employee,

As of today (7th of May 2018) I will be observing your productivity for the upcoming four weeks. I have sent you all an email with the details.

Hence, keep in mind that whenever you log-in onto a cash desk with your personal account, the performance measuring will start and I will be able to see how fast you are handling products and customers.

Kind regards, Jannes”

The purpose of the text is both to remind and to refer to the earlier mentioned email, in order to rule out the chance that anybody has missed the provided details or does not get a similar treatment. A reminder is sent by email on the 16th and the 26th of May, 2018 to prevent the treatment effect from slacking off due to a lack of communication. The content of these emails is provided below (translated).

Email contact 2 (16th of May, 2018):

“Dear employee,

We are well underway in week two of your productivity scan. I have handed over a list of everyone’s productivity in week 19 to ….(name manager 1) and ….(name manager 2). If you would like to receive some feedback, please feel free to ask either them or myself. I can see how much time you’re currently spending on scanning, ‘thinking’ and paying and how you perform on each of these factors relative to others.

Also, for clarification, I retain the possibility to see and collect your productivity figures whenever I am not physically in the store. If there are any questions or concerns, you can reply to this email or address me personally in the office. I am present on Friday, Saturday and Sunday.


Kind regards, Jannes”

Email contact 3 (26th of May, 2018):

”Dear employee,

Next week we will enter the fourth and final week of the observation period. Your results of week 20 have been handed to your managers; feel free to come by to see how you are performing. Next week’s productivity figures will be available on Friday.

To answer some of your earlier questions:

If …. (name manager) tells you your APM is around 15 it means that in a certain week, the total number of products handled by yourself is 15 per minute on average. This is calculated by dividing your total number of products handled during the week by the number of minutes you have been logged onto any of the cash desks during this same week.

The same goes for KPM (i.e. customers per minute). This is calculated by summing the number of customers handled by you in a week divided by the total number of minutes you’ve worked behind a cash desk.

Again, feel free to pass by and make inquiries if there are questions.

Kind regards, Jannes”

2.4. Dependent variables

As mentioned in the emails, employees will be monitored on their average daily number of customers handled per minute (CPM) as well as the average number of products scanned per minute (PPM). These variables adequately represent productivity, since per-minute averages display working speed regardless of the length of one’s shift. The reason that two variables are used rather than one is as follows: whenever a cashier handles many customers with few products, his/her PPM is likely to decrease while his/her CPM is likely to increase. Since handling many low-product customers involves a large number of fixed handlings, such as operating the screen or collecting change, more effort is required relative to workers who receive fewer customers with fuller shopping carts. Additionally, (s)he will spend more time waiting for customers to perform actions such as packing groceries or paying. For cashiers who receive a higher number of ‘big shoppers’ on a given day, the opposite is true. Hence, both cases are taken into account as indicators of productivity. As a result, workers who improve on both indicators are deemed to have ‘increased performance’.
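The two measures can be sketched as a simple computation over the time a cashier is logged onto a register. A minimal sketch; the field names are illustrative and not taken from the Storeline schema:

```python
# Sketch of the two dependent variables: per-minute averages over logged-in
# time, so that shift length does not bias the measure.

def daily_productivity(products_scanned, customers_handled, seconds_logged_in):
    """Return (PPM, CPM) for one cashier on one day.

    PPM: average products scanned per minute.
    CPM: average customers handled per minute.
    """
    minutes = seconds_logged_in / 60
    ppm = products_scanned / minutes
    cpm = customers_handled / minutes
    return ppm, cpm

# A hypothetical 3-hour shift with 2700 products and 190 customers:
ppm, cpm = daily_productivity(2700, 190, 3 * 3600)
```

Note how a day with many small baskets pushes CPM up while pulling PPM down, which is exactly why both indicators are tracked.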

2.5. Independent variables

The main independent variable is the treatment itself, as it aims to identify the influence of digital monitoring on employee behaviour.

Treatment: Binary variable equal to 1 if the observation falls within the intervention period and 0 if it falls within the pre-intervention (observation) period.

2.5.1. Controls

To check the robustness and the general validity and interpretation of the results, the observations in the experiment are compared to observations within the same time period of past years (in this case 2017). Using 2017 as the control year helps to exclude seasonal effects.

It is important to control for external circumstances that could potentially influence productivity levels, especially since the experiment is performed with relatively few subjects observed over a long period of time. From experience, literature and knowledge gained through conversations with the department managers and employees, it was concluded that the productivity of individuals at this specific department of the supermarket is mainly influenced by several factors: day-specific circumstances such as crowdedness in the store, the day of the week, the length of the shift, the intensity of a worker’s workweek, illness and stress, and the time spent behind a ‘scan-desk’5. The main controls thus are:

5 Cash desk where customers come to pay who have scanned their products themselves beforehand. Entire receipts are uploaded onto the screen within 2 seconds.


Customers: A measurement of crowdedness in the store, measured on a daily basis. A lot of customers means repeating many fixed handlings per client and consequently a lower average PPM. Subjects fear the customer service rule where the ‘fifth-in-line’ gets free groceries. Hence, crowdedness increases stress and potentially influences productivity. The variable is measured by count and is provided by Storeline.

Turnover: Another control variable measuring crowdedness in the store on a particular day. 16 out of 20 subjects have stated that busy days cause a sense of hurry, possibly increasing productivity. Turnover is measured in thousands of euros per day and is available in Storeline.

Shiftlength: Within the organization, a large discrepancy exists between the lengths of the shifts that employees fulfil. Since the marginal cost of effort is generally larger for longer shifts, productivity is likely to drop with the amount of time worked on a given day. Jeanmonod et al. (2008) showed that in hospitals, shorter shift lengths improved productivity (in terms of patients evaluated per hour). The reason identified was that longer shifts could induce fatigue, concentration loss and errors (Bendak, 2003). On top of that, shift lengths are not randomized between workers. Rather, the longer shifts are generally occupied by experienced workers with fixed contracts, whereas the number of hours that scholars work is limited both by school hours and by law. Hence, shift length could be a predominant factor to control for. The variable ‘Shiftlength’ entails the amount of time worked behind any of the cash desks on a given day, measured in minutes.

Scandesk: The control variable ‘Scandesk’ concerns the ratio of ‘normally’ scanned products to the number of products entered into the register by hand. The latter method takes longer, as extra time is spent on either entering a 13-figure PLU code6 or on manual product searches on the touchscreen. A worker who enters more products by hand (e.g. fruits, vegetables, breads, beers) generally shows reduced productivity. A worker behind a scan-desk does not handle any manual products. Hence, the ratio of scanned products to manually inserted products is used to indicate the relative time a worker operates a scan-desk and therefore partly accounts for variation in productivity.
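One way to operationalise this control is as the share of a day's products that were scanned rather than keyed in by hand. A sketch under that assumption; the thesis's exact definition of the ratio may differ, and the function name is illustrative:

```python
# Illustrative 'Scandesk' control: the share of a day's products scanned
# normally versus entered by hand. Values near 1.0 suggest time spent behind
# a scan-desk, where no products are keyed in manually.

def scan_share(scanned, entered_by_hand):
    total = scanned + entered_by_hand
    return scanned / total if total else 0.0
```

A share (bounded between 0 and 1) avoids the division-by-zero problem that a raw scanned-to-manual ratio would have on pure scan-desk days, where no products are entered by hand.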

6 A unique code below the barcode present on every product in a supermarket. It is mainly used for inventory management purposes.


WorkWeek: Based on the notion that longer working weeks cause fatigue and concentration loss (Bendak, 2003), this variable is used to control for the intensity of the working week. The correlation between the intensity of a workweek and productivity is partly confirmed by a randomly chosen sample of several cashiers. As a result, it makes sense to control for this variable. The variable is measured on a 1 to 7 scale, denoting whether the observation is made on the first, second, third, fourth, fifth, sixth or seventh day that a worker is working that week. This variable could be especially influential in the supermarket in question, due to the clear separation between the hours worked by the younger staff and the full-time personnel on fixed contracts.

Workyesterday: A binary variable (0 or 1) measuring whether a person had been working the day prior to the day on which the observation was made. This variable is included since the described effect of long working weeks may be mitigated whenever a worker has a full day off in between working days. This dummy variable is thus to be added to the previously mentioned control ‘WorkWeek’.

Medical factors such as illness, fatigue or stress due to circumstances outside of work could additionally influence a subject’s productivity. Yet, measuring these would require daily interviews with the supermarket staff, which was not supported by the management. Consequently, the assumption was made that every worker in the experiment experienced an equal number of good and bad days. Although it does not provide usable data for the experiment, a post-experiment inquiry was made among the workers to evaluate whether they believed they were influenced by any external factors during the treatment. The results of the inquiry serve merely as background information for the discussion.

The speed with which customers are handled is mainly affected by a worker’s own speed (e.g. in scanning products, operating the screen, registering payment, returning cash), but also by a customer’s time needed to stash products on the belt as well as the time needed to pay. Some customers take longer to stash and pay than others do. Hence, the number of ‘fast payers’ a cashier handles could have an influence on the outcome of the experiment. Since this variable is not measurable through the software, the assumption is made that every individual worker experiences an equal level of average ‘waiting time’ both before and during the treatment.


More elaborately, the average time required per customer varies between 20 and 24 seconds for any given day and any given worker. One reason for the discrepancy between workers is that there is no full randomisation in the assignment of customers to cashiers. Although many customers opt for the line with the shortest waiting time (i.e. randomisation), certain cashiers attract certain customers due to social connections or personal preferences of the customer. Moreover, customers who pay at scan-desks usually prefer a faster process, since they are in a hurry or have a lot of groceries. Despite these notions, the workers who operated scan-desks did so both before and during the treatment. Hence, their average handling times should be similar in both time periods.

On top of that, a discrepancy was found in the type of contract, wage, contract expiration dates, level of physical ability, joy experienced at work, stress levels, attitude towards the treatment, age, gender, relative productivity to the group, performance ceilings, attitude towards underperforming and feeling of autonomy during work. However, these individual fixed effects only become relevant when comparing across individuals, whereas this experiment uses aggregated data of all employees to find a general relationship between monitoring and performance. Finally, for the sake of this experiment, it is assumed that all workers work an equal number of days per week before as well as after the treatment. This assumption is required to rule out the possibility that, if faster workers take more shifts in the intervention period, overall productivity is higher regardless of the treatment.

2.6. Potential contamination of results

2.6.1. Scan-desks

As is the case in many contemporary supermarkets, there are two (or possibly even more) ways of registering a customer’s products. Customers either shop for groceries in the traditional fashion where they collect all their groceries, put them on the ‘belt’ and let the cashier scan all products. In recent years, a second possibility has arisen where customers pick up a ‘hand-scanner’ at the supermarket entrance which they use to scan products. At the supermarket checkout, customers hand over the hand-scanner to the cashier who then instantly enters the entire receipt into the cash register. This significantly reduces waiting time and leads to a higher customer flow-through at this type of register, relative to traditional cash registers. Besides, customers with an above average amount of products are more likely to use a hand-scanner.


As a result, a person who operates scan-desks generally has a higher average productivity. Over a period of days or months, the same people operate the same type of registers (i.e. traditional cash registers or scan-desks), yet from hour to hour this varies significantly. As described in section 2.5, the 'Scandesk' variable is used to control for this variation. Although this variable provides a decent indication of the time spent operating a scan-desk, the exact time remains unknown. To account for this, the assumption is made that each subject operates a scan-desk as intensively in the treatment period as in the observation period. This assumption is arguable, since scan-desks are generally operated by responsible or experienced workers, due to the higher theft risk of the scan-desk. Additionally, one of the cash desks functions as the service counter, which is exclusively operated by experienced workers.

2.6.2. Group obstruction and spillovers

In case the treatment procedure is perceived as unfair by any subject, or a subject openly opposes the treatment, this negative stance could spill over to colleagues and lead to potential contamination of the results. If part of the sample group adopts this negative stance, the possibility exists that they deliberately refrain from changing their performance. In this case, the results will indicate a less effective treatment than it might be in reality. Nonetheless, the setup of the experiment does not allow an assessment of the individuals' influence on one another.

2.6.3. Performance ceiling

There naturally is a maximum possible productivity level. In case this maximum productivity level has already been reached before the treatment, no room for improvement remains to be induced by the treatment. This would cause the overall treatment effect to be mitigated. To account for this, a pre-experiment inquiry was made among employees to check whether they felt an increase in their performance was still possible (see Appendix A).

2.6.4. Big clients

Occasionally, a company purchases large batches of a particular type of product from the supermarket. This mainly concerns crates of beer (bought by pubs) or milk powder (bought by international traders). These batches are usually pre-ordered by the business customer, who pays upfront at one of the cash desks, where the order is entered into the register manually. So in case a batch of one hundred items is ordered, this is entered onto the screen and the receipt at once, instantly and drastically increasing the cashier's productivity.

The moments when such large batches are ordered are not spread out evenly over time, since they generally coincide with discounts or moments when market prices are low. Additionally, these batches are often handled by the same cashier. Although this could potentially distort the productivity figures, the monitoring software can be used to detect and remove these batch orders.
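The detection step described above can be sketched as a simple outlier filter. The snippet below is a minimal illustration, not the actual monitoring software: the `receipts` frame, its column names and the MAD-based cut-off are all assumptions.

```python
import pandas as pd

def drop_batch_orders(receipts: pd.DataFrame, threshold: float = 3.5) -> pd.DataFrame:
    # Flag receipts whose item count deviates extremely from the median,
    # using a robust (MAD-based) z-score so that one huge batch order
    # cannot mask itself by inflating the ordinary standard deviation.
    items = receipts["items"]
    mad = (items - items.median()).abs().median()
    robust_z = (items - items.median()) / (1.4826 * mad)
    return receipts[robust_z <= threshold]

# A pre-ordered 100-item batch stands out against ordinary receipts.
data = pd.DataFrame({"receipt_id": range(8),
                     "items": [12, 9, 15, 7, 100, 11, 8, 14]})
cleaned = drop_batch_orders(data)
print(cleaned["items"].max())  # → 15, the batch order is removed
```

Only upward outliers are dropped, since batch orders inflate rather than deflate productivity.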

2.6.5. Survey inaccuracy

During the interviews, employees might be inclined to respond untruthfully. A worker may, for example, prefer socially desirable answers over truthful answers for privacy reasons or out of fear of repercussions from the management. To minimise these effects, it was communicated to the workers that the surveys were anonymous and would not be shared with any person working at Jumbo Supermarkets.

2.6.6 Procedural unfairness

It could be the case that control is not the factor that reduces the effectiveness of monitoring, but rather the notion that our methods are perceived as unfair. Procedural unfairness is described as the discrepancy between desired and perceived procedures or policies (Alexander & Ruderman, 1987). According to the literature, procedural unfairness contributes negatively to job satisfaction and psychosomatic well-being (Schmitt & Dörfel, 1999; Alexander & Ruderman, 1987), as well as to the evaluation of the supervisor, trust in management and the economic health of the organization (Bies, 1993; Meindl & Stensma, 1994; Sheppard, Lewicki & Minton, 1992). Thus, measuring the effect of monitoring would be hampered if the experiment were perceived as unfair. It is expected that this would mitigate a potential positive effect (or enlarge a negative effect). To evaluate whether the subjects perceived procedural unfairness, this was included as a question in the post-experiment survey (appendix B).

2.6.7. Hawthorne effects

The Hawthorne effect is the tendency of people to behave differently during research experiments in which they are being observed (Gillespie, 1991). Any type of experiment is possibly characterised by Hawthorne effects, which may influence or contaminate research results. Hawthorne effects generally induce a positive treatment effect (e.g. an increase in productivity/effort/motivation) regardless of the type of treatment, because workers are knowingly being observed. This effect applies to the experiment in this research as well, since the treatment comprises the observation of workers. Hawthorne effects could furthermore have occurred prior to the treatment. In the weeks prior to the experiment, workers may have noticed the presence of a researcher on the work floor, which may have triggered them to change their performance. If workers improved their performance before the treatment had started, this would have resulted in a lower measured treatment effect.

2.7. Hypotheses

Prior research has shown that monitoring stimulates performance levels, driven by a fear of exposure. Our experiment provides the proper context for testing this hypothesis, given that shirking is common on the work floor and that, according to the employees, feedback is insufficiently given (see appendix A). Moreover, subjects estimated a potential productivity increase of almost 19% if they maximised effort levels, which indicates that there is room for improvement. Hence, the first hypothesis is formulated as:

Hypothesis 1: Employees are more productive when they know they are being monitored.

Secondly, based on Falk and Kosfeld (2006) and their notion of the hidden costs of control, a perception of procedural unfairness or an aversion to being controlled is expected to negatively affect a worker's receptiveness to the treatment. Although monitoring is not identical to control, as workers still have the freedom to make choices, it may nonetheless be perceived as controlling. Therefore, the second hypothesis is the following:

Hypothesis 2: Workers who perceive the treatment as controlling show a smaller productivity increase (or a decrease) than those who deem the measure as fair or just.

In addition, older workers or workers with fixed contracts are expected to perceive the treatment as more controlling than newer or younger employees. Older workers may have developed a more distinct feeling of autonomy or may perceive intervention by the management as a sign of distrust. Newer workers, on the contrary, may not be inclined to complain but rather consider the monitoring of employees business as usual. On top of that, older or more experienced workers might dislike change in general, while workers with temporary contracts might feel a stronger urge to prove themselves worthy of contract prolongation. Therefore, hypothesis 3 is formulated as follows:

Hypothesis 3: Inexperienced workers are more positively stimulated by the monitoring treatment than experienced workers.

Finally, underperformers are expected to be more sensitive to the fear of exposure. Hence, underperformers should have a higher motivation to improve their performance than over-performers. Following this line of thought, the fourth hypothesis is formulated as follows:

Hypothesis 4: Underperformers show a larger productivity increase than average or over-performers.


3. Results

3.1. Monitoring and productivity

Table 1 displays summary statistics for the outcome and control variables. Since a single observation entails a worker's daily productivity, 1,281 observations imply that over a period of 182 days (91 days in both 2017 and 2018), approximately seven workers per day were involved. Overall, the average number of customers and products handled per minute on a given day is 1.43 and 14.06, respectively.

Table 1: Summary Statistics

VARIABLES              N      Mean    St.dev.  Min    Max

Treatment              1,281  0.313   0.464    0      1
Customers per Minute   1,281  1.427   0.227    0.760  2.450
Products per Minute    1,281  14.06   3.073    4.800  29.11
Turnover               182    23.89   6.157    8.721  43.60
Customers              182    1,565   245.5    710    3,162
Scandesk               1,281  4.514   2.753    0.643  38.69
Shiftlength            1,281  144.6   81.30    2      411
WorkWeek               1,281  2.113   1.084    1      7
WorkYesterday          1,281  0.386   0.487    0      1

Summary of the main dependent and independent variables. Turnover is measured in thousands of euros.

Tables 2 and 3 subsequently denote the average levels of PPM and CPM during the pre-intervention and post-intervention periods, in both 2017 and 2018. Notably, the baseline averages for both outcome variables are higher in 2018 than in 2017; hence, the 2018 sample was already more productive on average to begin with. Furthermore, in 2017 the average productivity is higher in the post-intervention period than in the pre-intervention (or observation) period. Workers thus appear to have worked faster in May than in April or March, regardless of the presence of a treatment, suggesting a seasonal effect. Mean values per outcome variable per period are as follows:


Table 2: Per period productivity levels: products handled per minute

Period N Mean Std. Dev. Min Max

2017 Pre-intervention 430 13.07 2.80 4.80 23.93

2017 Post-intervention (Control) 204 13.76 2.52 8.62 23.92

2018 Pre-intervention 450 14.33 2.88 5.56 27.56

2018 Post-intervention (Treatment) 197 15.92 3.61 10.72 29.11

Table 3: Per period mean productivity levels: customers handled per minute

Period N Mean Std. Dev. Min Max

2017 Pre-intervention 430 1.36 0.21 0.76 1.98

2017 Post-intervention (Control) 204 1.39 0.19 0.96 1.96

2018 Pre-intervention 450 1.45 0.22 0.85 2.24

2018 Post-intervention (Treatment) 197 1.57 0.23 1.03 2.45

During the treatment, average PPM and CPM increased by 11.10 and 8.28 percent, respectively. In 2017, the same outcome variables increased by 5.27 and 2.20 percent, respectively. Lastly, minimum values increase drastically when moving from pre-intervention to post-intervention, which is the case in both years (tables 2 and 3).

To visualise the productivity distribution per period among the sample group, the outcome variables are displayed in kernel density graphs. Figures 1 and 2 show the kernel densities of both CPM (left) and PPM (right). The figures display probability density functions, which indicate the distribution of our outcome variables. The area under each curve adds up to 1 and is interpreted as follows: the probability that x (an outcome variable) lies between x1 and x2 equals the area under the curve between those points on the horizontal axis.

Figure 1 shows data for all employees, whereas figure 2 concerns a balanced panel, because changes in the group composition from 2017 to 2018 would affect average productivity levels. Figure 2 was included to allow a better interpretation of the discrepancy between the lines shown in the kernel density graphs; its results are based only on observations of workers who were employed in both periods and in both years. From these figures it can be deduced that average productivity in the post-intervention period of 2018 is higher for both sample groups and for both outcome variables (i.e. the yellow peaks are shifted to the right).


Additionally, it can be noted that in this period the distribution is more dispersed (i.e. less concentrated around a few values) than in the other three periods. Especially for PPM (right-hand graphs), there appears to be a vast increase in observations between the values 15 and 23.
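For readers who want to reproduce this kind of figure, a kernel density can be computed with standard tools. The sketch below uses simulated draws whose means and spreads loosely mimic the 2018 rows of tables 2 and 3; it is illustrative only and does not use the thesis data.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Simulated stand-ins for daily PPM observations (illustrative only):
ppm_pre = rng.normal(loc=14.3, scale=2.9, size=450)    # pre-intervention
ppm_post = rng.normal(loc=15.9, scale=3.6, size=197)   # treatment period

grid = np.linspace(5.0, 30.0, 500)
density_pre = gaussian_kde(ppm_pre)(grid)
density_post = gaussian_kde(ppm_post)(grid)

# A kernel density is a probability density: it is non-negative and the
# area under the curve is (approximately) one over a wide enough grid.
dx = grid[1] - grid[0]
area_pre = density_pre.sum() * dx

# The rightward shift of the post-treatment peak mirrors figures 1 and 2.
peak_shift = grid[density_post.argmax()] - grid[density_pre.argmax()]
```

Plotting `density_pre` and `density_post` against `grid` with any plotting library reproduces the layered curves of the figures.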

Figure 1: CPM and PPM Kernel Density Graphs (All employees)

Figure 1: Kernel density distributions for outcome variables CPM (left) and PPM (right) for all four periods (2017 before/after and 2018 before/after) and all employees included.

Figure 2: CPM and PPM Kernel Density Graphs (Balanced Panel)

Figure 2: Kernel density distributions for outcome variables CPM (left) and PPM (right) for all four periods (2017 before/after and 2018 before/after), including only workers that worked both 2017 and 2018

Due to the presence of a potential seasonal effect, it is important to assess its scale and correct the treatment effect for it. Table 4 shows a difference-in-difference model based on panel data for all workers who were employed during the experimental time frame of either 2017 or 2018. Columns 1 and 3 present a simple difference-in-difference model, merely displaying the average treatment effect on the outcome variables 'products per minute' (PPM) and 'customers per minute' (CPM). 'Treatment' here denotes the pre-intervention vs. post-intervention dummy, whereas the 'Year2018' dummy indicates whether the observation took place in 2017 or 2018 (i.e. 1 if 2018, 0 if 2017). Standard errors are robust and clustered at the employee level. Clustering at the employee level limits the number of independent observations, which in turn drastically decreases the number of degrees of freedom. In order to avoid zero degrees of freedom (= N - K - 1), the variable 'WorkYesterday' is omitted in tables 4 and 5. The argumentation behind this omission stems from its low degree of influence on the outcome variables (lowest coefficients and significance levels) and its high degree of multicollinearity with the variable 'WorkWeek', which will capture part of the omitted variable's effect.

In tables 4 and 5, the difference-in-difference estimator is called 'Monitoring' and corresponds to the interaction term in the following formula:

Y_it = β0 + β1 Treatment_it + β2 Year2018_it + β3 (Year2018 × Treatment)_it + ε_it

where Y_it is PPM or CPM and β3 is the difference-in-difference ('Monitoring') estimator.
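Under the assumption that the model was estimated by OLS with employee-clustered standard errors (as the table notes state), the specification can be sketched with statsmodels as follows; the data here are simulated with a built-in interaction effect of 1.0, so none of the numbers correspond to the actual results.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 4000
# Simulated employee-day panel standing in for the thesis data
# (all numbers are illustrative, not the supermarket's figures).
df = pd.DataFrame({
    "employee": rng.integers(0, 20, n),
    "year2018": rng.integers(0, 2, n),   # 0 = 2017, 1 = 2018
    "treatment": rng.integers(0, 2, n),  # 0 = pre-, 1 = post-intervention weeks
})
# PPM with a seasonal (post-period) effect, a year effect and a
# true monitoring (interaction) effect of 1.0 products per minute.
df["ppm"] = (13.0 + 0.5 * df["treatment"] + 1.0 * df["year2018"]
             + 1.0 * df["year2018"] * df["treatment"]
             + rng.normal(0.0, 2.5, n))

# The interaction coefficient is the difference-in-difference
# ('Monitoring') estimator; standard errors are clustered on employees.
model = smf.ols("ppm ~ treatment * year2018", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["employee"]})
print(model.params["treatment:year2018"])  # close to the built-in effect of 1.0
```

The `treatment * year2018` formula term expands into both main effects plus the interaction, matching the equation above.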

Corrected for the potential seasonal effect observed in 2017, the monitoring intervention in the simple model shows an average productivity increase of 0.896 products scanned per minute and of 0.089 customers handled per minute. Both values are significant at the 5 percent level. After controlling for crowdedness in the supermarket, the proportion of time worked behind a scan-desk, shift length and the intensity of one's workweek, the difference-in-difference estimator ('Monitoring') becomes more significant for both outcome variables. The R² increases from 9.5 percent (in both simple models) to 32.8 and 37.3 percent after adding the day-specific controls. Furthermore, the estimated mean productivity for the post-intervention weeks 19-22 of 2017 (the control year) shows a notable positive difference relative to the pre-intervention weeks 10-18 for CPM (see the 'Treatment' variable, column 4). This provides evidence of a seasonal effect. For PPM, this effect is found to be positive but not significant.


Table 4: Difference-in-difference results: All Subjects

                 (1)        (2)        (3)        (4)
                 PPM        PPM        CPM        CPM
Monitoring       0.896**    1.027***   0.0889**   0.0714**
                 (0.423)    (0.364)    (0.0374)   (0.0277)
Year2018         1.258**    0.772      0.0868*    0.0948**
                 (0.537)    (0.462)    (0.0448)   (0.0381)
Treatment        0.691**    0.271      0.0321     0.0585**
                 (0.283)    (0.219)    (0.0289)   (0.0231)
Turnover                    0.183***              -0.0127***
                            (0.0348)              (0.00239)
Customers                   -1.886**              0.0644
                            (0.711)               (0.0607)
ScanHand                    -0.266***             0.0132***
                            (0.0526)              (0.00302)
Shiftlength                 -1.080***             -0.0804***
                            (0.183)               (0.0202)
WorkWeek                    0.524***              -0.0139
                            (0.106)               (0.0112)
Constant         13.07***   13.59***   1.359***   1.646***
                 (0.413)    (0.822)    (0.0363)   (0.0596)
R2               0.095      0.328      0.095      0.373
F                14.46      21.49      12.85      75.61
Observations     1281       1281       1281       1281

The difference-in-difference estimator is displayed by the variable 'Monitoring'. Variables 'Turnover' and 'Customers' are measured in thousands. 'Shiftlength' is measured in hundreds of minutes. Standard errors are robust and clustered at the employee level. PPM and CPM refer to 'products per minute' and 'customers per minute', respectively. Standard errors in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01

Hence, even in the absence of monitoring, workers tend to work faster in May than in April or March. For PPM this also appears to be the case, though the effect is not significant (column 2). Furthermore, a mean difference in the baseline of both PPM and CPM exists between 2018 and 2017 prior to the intervention. As the 'Year2018' variable shows, average productivity in the pre-intervention period was significantly higher in 2018 than in 2017, regardless of any treatment. Since all workers who were employed in either 2017 or 2018 during this time of the year were included in the data, it is well possible that 2018 contained more productive workers on average, or that workers have learned over time. This model, however, is only informative under the strong assumptions that workers work an equal number of days each week (to exclude overrepresentation of fast workers in the treatment period) and that the mean productivity levels of workers who worked in either 2017 or 2018 are equal.

Since these assumptions are strong, a similar regression is run for the group of workers who were employed during both 2017 and 2018. This regression (table 5) reduces the number of observations but increases explanatory value. Including only the workers who were employed by the supermarket in both 2017 and 2018, during both pre-intervention and post-intervention, provides a more balanced panel and thereby stronger evidence of a treatment effect. As table 5 shows, the difference-in-difference estimator has become smaller and less significant.

After controlling for the 2018 observation period, the seasonal trend in 2017 and day-specific effects, a positive, though not statistically significant, treatment effect is suggested for both PPM and CPM (columns 2 and 4). Monitoring seems to have increased the average number of products scanned per minute by 0.716 and the average number of customers helped per minute by 0.0536. Compared to the pre-intervention period in 2018, this is a 4.91 and a 3.7 percent increase, respectively. As was the case with table 4, this table shows proof of both a seasonal effect and a significant difference in baseline productivity levels, which could suggest a learning effect.


Table 5: Difference-in-difference results: Balanced Panel

                 (1)        (2)        (3)        (4)
                 PPM        PPM        CPM        CPM
Monitoring       0.511      0.716      0.0613     0.0536
                 (0.551)    (0.536)    (0.0411)   (0.0306)
Year2018         2.193***   1.608***   0.202***   0.201***
                 (0.541)    (0.460)    (0.0435)   (0.0313)
Treatment        0.866***   0.229      0.0540*    0.0717**
                 (0.264)    (0.268)    (0.0290)   (0.0257)
Turnover                    0.233***              0.00825***
                            (0.0367)              (0.00233)
Customers                   -2.864**              -0.0620
                            (0.945)               (0.0677)
Scandesk                    -0.309***             0.0143***
                            (0.0714)              (0.00372)
Shiftlength                 -1.030***             -0.0626**
                            (0.266)               (0.0271)
WorkWeek                    0.298***              -0.0305**
                            (0.0739)              (0.0107)
Constant         12.93***   14.47***   1.311***   1.700***
                 (0.535)    (1.107)    (0.0366)   (0.0924)
R2               0.172      0.399      0.243      0.471
F                9.627      103.2      20.90      47.13
N                701        701        701        701

The difference-in-difference estimator is displayed by the variable 'Monitoring'. 'Turnover' and 'Customers' are measured in thousands. 'Shiftlength' is measured in hundreds of minutes. Standard errors are robust and clustered at the employee level. PPM and CPM refer to 'products per minute' and 'customers per minute', respectively. Standard errors in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01

3.2. Per Employee Results

In order to verify hypotheses 2, 3 and 4, relating to expected heterogeneous effects across under- versus over-performers, years of experience and attitude towards the treatment, results per employee or per group are required. Tables 6 and 7 present before-after regressions per subject, as well as a difference-in-difference regression for those employees who worked in all four periods (2017 before/after and 2018 before/after). The aggregate levels below the tables (average1 and average2) indicate mean treatment effects. For average1, the separate estimates have been weighted by their respective sample fractions. Average2 displays mean treatment effects weighted by the inverse of their variances; this method minimises the variance of the mean estimate, which makes it the most efficient estimator. The formulas are found in the notes below tables 6 and 7.
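The two aggregation rules can be written out in a few lines. The snippet below is a generic sketch; the example estimates, standard errors and sample sizes are toy numbers, not those of tables 6 and 7.

```python
import numpy as np

def weighted_averages(estimates, std_errors, n_obs):
    b, se, n = map(np.asarray, (estimates, std_errors, n_obs))
    # average1: estimates weighted by sample fractions (n_i / sum of n_i)
    average1 = np.sum(n * b) / np.sum(n)
    # average2: inverse-variance weights minimise the variance of the
    # pooled mean; var(mean) = 1 / sum(1 / var_i)
    inv_var = 1.0 / se**2
    average2 = np.sum(inv_var * b) / np.sum(inv_var)
    se_average2 = np.sqrt(1.0 / np.sum(inv_var))
    return average1, average2, se_average2

# Toy per-employee treatment effects (illustrative only):
a1, a2, se2 = weighted_averages([0.8, 1.5, 2.0], [0.4, 0.2, 0.8], [30, 40, 25])
# a1 ≈ 1.411 (sample-fraction weights), a2 ≈ 1.390 (inverse-variance weights)
```

Note how the inverse-variance average pulls towards the most precisely estimated effect (the one with standard error 0.2), and its pooled standard error is smaller than any individual one.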

For both outcome variables, only 3 workers decreased their performance in 2018, while the other 17 improved. It should be noted that these are mean levels, not corrected for day-specific and seasonal effects. No worker experienced a decrease in both outcome variables. While this may appear illogical, it is explained by the reason why two outcome variables were chosen as dependent variables in the first place (see section 2.4): most workers who improved did so primarily in one of the two fields, rather than in both.

Relative to the observation period, the average number of products scanned per minute per employee increased by 1.453 and the average number of customers helped per minute by 0.135 (following average1). For 2017, these values were 0.474 and 0.067, respectively. Corrected for their individual seasonal effect in 2017, the balanced panel of workers on average showed a positive performance increase (column 10). Still, this is not the actual total productivity increase of the team combined, as not every worker improved by the same proportion and a discrepancy between workers in the number of hours worked exists. Logically, if workers with a relatively smaller improvement work more hours, the total productivity increase is mitigated.

Furthermore, the R² of the regressions varies from 18.4 percent (subject 8, 2017, table 7) to 88 percent (subject 6, 2018, table 7). Columns 4 and 8 of both tables show that the R² of the regressions increased in 2018. Since the treatment is the only anomaly, the change in R² could well be attributed to the monitoring treatment.


Table 6: OLS Results per Employee (PPM)

Subject   ∆2018      Se        R2     N     ∆2017     Se        R2     N     ∆2018-∆2017   Se
1         0.762      (1.634)   0.331  30    1.393**   (0.508)   0.666  37    -0.470        (1.358)
2         3.859***   (0.944)   0.548  36    1.809**   (0.658)   0.58   36    2.262*        (1.273)
3         1.717      (1.286)   0.556  35    -0.371    (1.109)   0.468  32    2.239         (1.540)
4         0.370      (0.526)   0.538  42    0.479     (0.499)   0.296  44    -0.00690      (0.637)
5         0.727      (1.099)   0.476  42    2.252     (1.408)   0.445  27    0.496         (1.607)
6         0.174      (0.738)   0.666  21    -1.93*    (1.108)   0.63   26    -0.0304       (1.210)
7         -0.334     (1.058)   0.237  24    0.872     (0.715)   0.57   36    -1.323        (1.213)
8         1.124      (0.776)   0.476  31    0.574     (0.338)   0.287  35    0.368         (0.593)
9         1.719      (1.242)   0.437  44    0.797     (0.633)   0.48   53    1.249         (1.311)
10        1.625      (1.311)   0.312  33    -1.091    (1.111)   0.247  37    3.337*        (1.799)
11        2.834***   (0.817)   0.756  15
12        0.499      (0.792)   0.706  37
13        1.458      (0.884)   0.285  26
14        2.099      (1.082)   0.554  25
15        1.408***   (0.429)   0.618  38
16        3.564*     (1.831)   0.476  25
17        0.305      (0.854)   0.404  27
18        0.482      (0.615)   0.552  22
19        2.372***   (0.756)   0.548  59
20        1.707**    (0.736)   0.545  35
21                                          2.424***  (0.572)   0.537  40
22                                          -1.109    (0.707)   0.528  31
23                                          0.0847    (0.504)   0.362  36
24                                          0.217     (0.728)   0.573  42
25                                          0.548     (1.101)   0.556  26
26                                          -0.296    (0.759)   0.619  35
27                                          0.522     (0.588)   0.265  61
Average1  1.453      (1.046)                0.474     (1.063)                0.875         (1.212)
Average2  1.235      (0.180)                0.609     (0.153)                0.330         (0.435)

The last column presents difference-in-difference regressions, which therefore include a 'year' variable to control for baseline differences. Day-specific controls are added to all regressions. Subjects 7 and 11 only once worked two days in a row, which makes the binary dummy 'Workyesterday' unusable (hence, it is excluded). Average1 contains averages weighted by their relative sample fractions: b̄ = Σ_i w_i·b_i / Σ_i w_i. Average2 uses inverse-variance weights, where w_i = var(b_i)^-1 / Σ_i var(b_i)^-1 and var(b̄) = 1 / Σ_i var(b_i)^-1. Standard errors are in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01


Table 7: OLS Results per Employee (CPM)

Subject   ∆2018      Se         R2     N     ∆2017     Se         R2     N     ∆2018-∆2017   Se
1         0.0708     (0.0558)   0.737  30    0.0955**  (0.0402)   0.495  37    -0.110        (0.0736)
2         0.202***   (0.0551)   0.531  36    0.326***  (0.0588)   0.614  36    -0.156*       (0.0783)
3         0.199**    (0.0740)   0.693  35    0.0425    (0.0502)   0.534  32    0.153*        (0.0860)
4         0.0654**   (0.0274)   0.435  42    0.0169    (0.0253)   0.650  44    0.0131        (0.0358)
5         0.105**    (0.0457)   0.648  42    -0.0164   (0.0786)   0.280  27    0.0380        (0.0984)
6         0.0728***  (0.0221)   0.880  21    0.0275    (0.0552)   0.376  26    0.0185        (0.0482)
7         0.0659     (0.0664)   0.661  24    0.00839   (0.0529)   0.668  36    0.0148        (0.0923)
8         0.180***   (0.0387)   0.761  31    -0.0197   (0.0428)   0.184  35    0.115**       (0.0568)
9         0.0722     (0.0500)   0.505  44    0.0880**  (0.0351)   0.430  53    0.0176        (0.0609)
10        0.322***   (0.112)    0.604  33    0.122     (0.0750)   0.460  37    0.147         (0.142)
11        0.0593     (0.0868)   0.787  15
12        0.106**    (0.0450)   0.472  37
13        0.214***   (0.0488)   0.674  26
14        0.162**    (0.0590)   0.655  25
15        0.0419     (0.0366)   0.538  38
16        -0.00757   (0.0433)   0.718  25
17        0.287***   (0.0867)   0.605  27
18        0.155***   (0.0423)   0.705  22
19        0.177***   (0.0288)   0.597  59
20        0.119**    (0.0475)   0.338  35
21                                           0.233***  (0.0451)   0.613  40
22                                           0.00409   (0.0429)   0.555  31
23                                           0.149**   (0.0562)   0.688  36
24                                           0.00774   (0.0502)   0.581  42
25                                           -0.0284   (0.0815)   0.649  26
26                                           0.0945**  (0.0447)   0.798  35
27                                           -0.0310   (0.0327)   0.532  61
Average1  0.135      (0.077)                 0.067     (0.095)                 0.024         (0.071)
Average2  0.110      (0.010)                 0.055     (0.011)                 0.020         (0.018)

The last column presents difference-in-difference regressions, which therefore include a 'year' variable to control for baseline differences. Day-specific controls are added to all regressions. Subjects 7 and 11 only once worked two days in a row, which makes the binary dummy 'Workyesterday' unusable (hence, it is excluded). Average1 contains averages weighted by their relative sample fractions: b̄ = Σ_i w_i·b_i / Σ_i w_i. Average2 uses inverse-variance weights, where w_i = var(b_i)^-1 / Σ_i var(b_i)^-1 and var(b̄) = 1 / Σ_i var(b_i)^-1. Standard errors are in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01


Tables 8-10 group workers into categories based upon relative performance (table 8), attitude towards the treatment (table 9) and experience level (table 10). The coefficients represent the treatment effects per subgroup, using 2018 data only. Even though difference-in-difference models are preferred because they correct for seasonal effects, such a model is not feasible here, given that only 10 workers worked in both 2017 and 2018; dividing these 10 workers into two or more subgroups would undermine the credibility of the results. Hence, 2018 data is used to see in which subgroup the treatment was most effective, uncorrected for seasonal effects.

Categorising workers into subgroups lowers the number of independent observations per regression to the exact number of employees per subgroup. Clustering standard errors at the employee level would therefore make regressions that include day-specific controls impossible (n - k - 1 ≤ 0). To avoid having as many or more regressors than independent observations (i.e. zero degrees of freedom), standard errors in tables 8-10 are only made robust. The choice not to cluster at the employee level is justified since the coefficients, rather than the significance levels, are of main interest, and the coefficients remain unaffected by clustering. Hence, the regressions are performed as if all observations are independent from one another, and significance levels should not be interpreted.
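The claim that clustering leaves the coefficients untouched and only changes the standard errors can be verified directly. The sketch below uses simulated subgroup data (all names and numbers are illustrative) and compares a robust-only fit with an employee-clustered fit of the same regression.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 300
# Simulated subgroup data (illustrative, not the thesis observations).
df = pd.DataFrame({
    "employee": rng.integers(0, 7, n),   # only a handful of workers per subgroup
    "monitoring": rng.integers(0, 2, n),
})
df["ppm"] = 13.0 + 1.2 * df["monitoring"] + rng.normal(0.0, 2.0, n)

# Same OLS fit under two covariance estimators.
robust = smf.ols("ppm ~ monitoring", data=df).fit(cov_type="HC1")
clustered = smf.ols("ppm ~ monitoring", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["employee"]})

# Point estimates coincide; only the standard errors differ.
same_coef = np.isclose(robust.params["monitoring"],
                       clustered.params["monitoring"])
```

This is why tables 8-10 can report comparable coefficients across subgroups while warning against interpreting the (non-clustered) significance levels.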

For tables 8 and 10, a fixed quota per subgroup is used to create equal groups (and thus approximately equal numbers of observations). For table 9, the categorisation is based on the worker's degree of agreement with the pre-intervention interview statement: "If someone would observe me constantly during my work, it would feel for me as if I'm not trusted or as a breach of my independency". If a worker answered 'agree' or 'strongly agree', the worker is categorised in the group that perceives the treatment as 'controlling'; workers who answered 'neutral', 'disagree' or 'strongly disagree' are considered to perceive monitoring as a 'fair' practice. Due to the homogeneity of the regressions, and for clarity and simplicity of the tables, control variable output is not shown. Most results are positive and significant, but this could well be explained by the observed seasonal effect or by the fact that errors are not clustered. Yet, if a seasonal effect is indeed present, it is assumed to occur for all workers, so the results in the subsequent tables can still be interpreted relative to one another.


Table 8: 2018 OLS treatment effects per subgroup: Relative performance

               Products scanned per minute (PPM)      Customers helped per minute (CPM)
               (1)        (2)        (3)              (4)        (5)        (6)
Productivity:  Low        Average    High             Low        Average    High

Monitoring     1.221***   0.659*     1.687***         0.111***   0.128***   0.155***
               (0.302)    (0.351)    (0.384)          (0.0213)   (0.0243)   (0.0265)
Constant       12.19***   15.09***   15.56***         1.728***   1.734***   1.929***
               (1.186)    (0.993)    (1.367)          (0.100)    (0.0921)   (0.0830)
R2             0.309      0.263      0.378            0.408      0.508      0.438
F              9.376      8.795      16.86            25.03      29.84      29.14
N              198        170        279              198        170        279
Day-controls   Yes        Yes        Yes              Yes        Yes        Yes

Table displays OLS regressions per subgroup (low-performers vs. average performers vs. high-performers) using 2018 data only. Forced distribution is used to categorise. Standard errors in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.010

Table 9: 2018 OLS treatment effects per subgroup: Attitude towards the treatment

               Products scanned per minute (PPM)      Customers helped per minute (CPM)
               (1)        (2)                         (3)        (4)
Attitude:      Fair       Control                     Fair       Control

Monitoring     1.407***   0.648                       0.143***   0.106***
               (0.257)    (0.478)                     (0.0164)   (0.0380)
Constant       13.37***   17.10***                    1.815***   1.910***
               (0.861)    (1.489)                     (0.0550)   (0.118)
R2             0.363      0.239                       0.409      0.400
F              40.92      5.786                       49.54      12.27
N              510        137                         510        137
Day-controls   Yes        Yes                         Yes        Yes

Table displays OLS regressions per subgroup (workers who perceive the treatment as 'fair' vs. workers who perceive the treatment as 'controlling') using 2018 data only. Standard errors in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.010

Table 10: 2018 OLS treatment effects per subgroup: Experience level

               Products scanned per minute (PPM)      Customers helped per minute (CPM)
               (1)         (2)                        (3)         (4)
Experience:    <10 months  >10 months                 <10 months  >10 months

Monitoring     1.685***    0.987***                   0.141***    0.126***
               (0.345)     (0.295)                    (0.0193)    (0.0210)
Constant       12.97***    15.17***                   1.635***    1.984***
               (1.133)     (0.959)                    (0.0632)    (0.0684)
R2             0.349       0.330                      0.472       0.401
F              20.39       25.74                      33.96       34.85
N              274         373                        274         373
Day-controls   Yes         Yes                        Yes         Yes

Table displays OLS regressions per subgroup (experience<10 months vs. experience>10 months) using 2018 data only. Includes all 20 workers. Standard errors in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.010
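The subgroup specification behind tables 8-10 (productivity regressed on a Monitoring indicator plus day-specific controls, estimated separately per subgroup) can be sketched as follows. The data, the helper name, and the use of day-of-week dummies as the day controls are hypothetical stand-ins for illustration:

```python
import numpy as np

def subgroup_effect(y, treated, day_ids):
    """Return the OLS coefficient on Monitoring within one subgroup,
    controlling for day dummies (first day is the omitted baseline)."""
    days = np.unique(day_ids)
    dummies = (day_ids[:, None] == days[None, 1:]).astype(float)
    X = np.column_stack([np.ones(len(y)), treated, dummies])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]  # coefficient on the Monitoring indicator

# Synthetic illustration (not the thesis data): true treatment effect = 1.0,
# with day-of-week effects that the dummies absorb.
rng = np.random.default_rng(1)
n = 300
day_ids = rng.integers(0, 7, n)                 # day-of-week identifiers
treated = rng.integers(0, 2, n).astype(float)   # monitoring indicator
y = 14.0 + 1.0 * treated + 0.2 * day_ids + rng.normal(0, 1.5, n)
effect = subgroup_effect(y, treated, day_ids)
```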
